Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis and J. van Leeuwen
1423
J.P. Buhler (Ed.)
Algorithmic Number Theory Third International Symposium, ANTS-III Portland, Oregon, USA, June 21-25, 1998 Proceedings
Springer
Series Editors Gerhard Goos, Karlsruhe University, Germany Juris Hartmanis, Cornell University, NY, USA Jan van Leeuwen, Utrecht University, The Netherlands
Volume Editor Joe P. Buhler Reed College 3203 S.E. Woodstock Blvd., Portland, OR 97202, USA E-mail:
[email protected] Cataloging-in-Publication data applied for
Die Deutsche Bibliothek - CIP-Einheitsaufnahme Algorithmic n u m b e r theory : third international symposium ; proceedings / ANTS-III, Portland, Oregon, USA, June 21 - 25, 1998. Joe Buhler (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Budapest ; Hong Kong ; London ; Milan ; Paris ; Santa Clara ; Singapore ; Tokyo : Springer, 1998 (Leclure notes m computer science ; Vol. 1423) ISBN 3-540-64657-4
CR Subject Classification (1991): 1.1, E2.2, G.2, E.3-4, J.2 1991 Mathematics Subject Classification: 11Yxx, 11T71, 68P25, 68Q40, 68Q25, 68Q20, 12Y05, 94A60 ISSN 0302-9743 ISBN 3-540-64657-4 Springer-Verlag Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law. 9 Springer-Verlag Berlin Heidelberg 1998 Printed in Germany Typesetting: Camera-ready by author SPIN 10637477 06/3142 - 5 4 3 2 1 0
Printed on acid-free paper
Preface
The Algorithmic Number Theory Symposia (ANTS) were begun in 1994 in an effort to recognize the growing importance of algorithmic thinking, both theoretical and practical, in number theory; the intent was that "number theory" was to be construed in a broad fashion. These conferences have been held every two years; the first was held at Cornell University, and the second was held at the Universit@ Bordeaux I in 1996. The third ANTS conference will be held at Reed College, in Portland, Oregon, USA, on June 21-25, 1998. The conference is being supported by grants from Reed College, the National Science Foundation, and the National Security Agency. The Program Committee consists of Eric Bach, Johannes Buchmann, Joe Buhler, Henri Cohen, Neal Koblitz, Bjorn Poonen, and Ren@ Schoof. They certainly deserve thanks for the hard work of wading through a large number of manuscripts in a short period of time. The Local Arrangements Committee consists of Cathy D'Ambrosia, Danalee Buhler, Joe Buhler, Helen Ivey, and Jerry Shurman. The conference schedule includes invited talks by Professors Daniel Boneh (Stanford University), Noam Elkies (Harvard University), and Andrew Granville (the University of Georgia) together with 46 contributed talks, which are divided into very approximate categories in the table of contents. The task of getting the conference proceedings ready by the time of the conference has been made possible by the hard work of Cathy D'Ambrosia, the Springer-Verlag staff, and especially by Jerry Shurman's generous assistance in tackling the inevitable miasma of minutiae that arise in large text processing projects.
April, 1998
Joe P. Buhler ANTS III Program Chair
Table of C o n t e n t s
Invited Talk 1: Shimura Curve C o m p u t a t i o n s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Noam D. Elkies (Harvard University) Invited Talk 2: The Decision Diffie-Hellman Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
48
Dan Boneh (Stanford University) G C D Algorithms Parallel Implementation of Sch6nhage's Integer G C D Algorithm . . . . . . . . .
64
Giovanni Cesari (Universitd degli Studi di Trieste) The Complete Analysis of the Binary Euclidean Algorithm . . . . . . . . . . . . . .
77
Brigitte Vallde (Universitd de Caen) Primality Cyclotomy Primality Proving - Recent Developments
...................
95
Preda Mih~ilescu (FingerPIN A G ~J ETH, Institut fiir wissentschaftliches Rechnen) Primality Proving Using Elliptic Curves: An U p d a t e . . . . . . . . . . . . . . . . . . .
111
F. Morain (Laboratoire d'Informatique de l'Ecole polytechnique) Factoring Bounding Smooth Integers (Extended Abstract) . . . . . . . . . . . . . . . . . . . . . . . .
128
Daniel J. Bernstein (The University of Illinois at Chicago) Factorization of the Numbers of the Form m 3 -b c2m2 q- c l m + co . . . . . . . Zhang Mingzhi (Sichuan Union University)
131
Modelling the Yield of Number Field Sieve Polynomials . . . . . . . . . . . . . . . .
137
Brian Murphy (Australian National University) A Montgomery-Like Square Root for the N u m b e r Field Sieve . . . . . . . . . . .
Phong Nguyen (Ecole Normale Supgrieure)
151
VIII
Table of Contents
Sieving Robert Bennion's "Hopping Sieve" . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
169
William F. Galway (University of Illinois at Urbana-Champaign) Trading Time for Space in Prime Number Sieves . . . . . . . . . . . . . . . . . . . . . . .
179
Jonathan P. Sorenson (Butler University) Analytic Number Theory Do Sums of 4 Biquadrates Have a Positive Density? . . . . . . . . . . . . . . . . . . . .
196
Jean-Marc Deshouillers, Franfois Hennecart, Bernard Landreau (Universitd Bordeaux) New Experimental Results Concerning the Goldbach Conjecture . . . . . . . .
204
J-M. Deshouillers (Universitd Bordeaux), H.J.J. te Riele (CWI), Y. Saouter (Institut de Recherche en Informatique de Toulouse) Dense Admissible Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
216
Daniel M. Gordon, Gene Rodemich (Center for Communications Research) An Analytic Approach to Smooth Polynomials over Finite Fields . . . . . . .
226
Daniel Panario (University of Toronto), Xavier Gourdon (INRIA), Philippe Flajolet (INRIA ) Cryptography Generating a Product of Three Primes with an Unknown Factorization .
237
Dan Boneh, Jeremy Horwitz (Stanford University) On the Performance of Signature Schemes Based on Elliptic Curves . . . . .
252
Erik De Win (Katholieke Universiteit Leuven), Serge Mister (Queen's University), Bart Preneel (Katholieke Universiteit Leuven), Michael Wiener (Entrust Technologies) NTRU: A Ring-Based Public Key Cryptosystem . . . . . . . . . . . . . . . . . . . . . . .
267
Jeffrey Hoffstein, Jill Pipher, Joseph H. Silverman (Brown University) Finding Length-3 Positive Cunningham Chains and their Cryptographic Significance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
289
Adam Young (Columbia University), Moti Yung (CertCo) Linear Algebra, Lattices Reducing Ideal Arithmetic to Linear Algebra Problems . . . . . . . . . . . . . . . . .
299
Stefan Neis (Darmstadt University of Technology) Evaluation of Linear Relations between Vectors of a Lattice in Euclidean Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
311
I. A. Semaev An Efficient Parallel Block-Reduction Algorithm . . . . . . . . . . . . . . . . . . . . . . .
Susanne Wetzel (Universit~it des Saarlandes)
323
Table of Contents
IX
Fast Multiprecision Evaluation of Series of Rational N u m b e r s . . . . . . . . . . .
338
Series, Sums
B~uno Haible (ILOG), Thomas Papanikolaou (Laboratoire A2X) A Problem Concerning a Character Sum - - Extended Abstract . . . . . . . . .
351
E. Teske (Technische Universit~it Darmstadt), H.C. Williams (University of Manitoba) Formal Power Series and Their Continued Fraction Expansion . . . . . . . . . .
358
All van der Poorten (Centre for Number Theory Research) Algebraic Number Fields Imprimitive Octic Fields with Small Discriminants . . . . . . . . . . . . . . . . . . . . .
372
Henri Cohen, Francisco Diaz y Diaz, Michel Olivier (Universitg Bordeaux I) A Table of Totally Complex N u m b e r Fields of Small Discriminants . . . . .
381
Henri Cohen, Francisco Diaz y Diaz, Michel Olivier (Universitg Bordeaux I) Generating Arithmetically Equivalent N u m b e r Fields with Elliptic Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
392
Bart de Smit (Rijksuniversiteit Leiden) Computing the Lead Term of an Abelian L-function
...................
400
David S. Dummit (University of Vermont), Brett A. Tangedal (College of Charleston) Timing Analysis of Targeted Hunter Searches . . . . . . . . . . . . . . . . . . . . . . . . . .
412
John W. Jones (Arizona State University), David P. Roberts (Rutgers University) On Successive Minima of Rings of Algebraic Integers . . . . . . . . . . . . . . . . . . .
424
Jacques Martinet (Universitd Bordeaux I) Class Groups and Fields C o m p u t a t i o n of Relative Quadratic Class Groups . . . . . . . . . . . . . . . . . . . . . .
433
Henri Cohen, Francisco Diaz y Diaz, Michel Olivier (Universitg Bordeaux I) Generating Class Fields using Shimura Reciprocity . . . . . . . . . . . . . . . . . . . . .
441
Alice Gee, Peter Stevenhagen (Universiteit van Amsterdam) Irregularity of Prime Numbers over Real Quadratic Fields . . . . . . . . . . . . . .
454
Joshua Holden (University of Massachusetts at Amherst) Experimental Results on Class Groups of Real Quadratic Fields (Extended Abstract) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
463
Michael J. Jacobson, Yr. (Technische Universit~it Darmstadt) C o m p u t a t i o n of Relative Class Numbers of I m a g i n a r y Cyclic Fields of 2~ Degrees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Stgphane Louboutin (Universit~ de Caen)
475
X
Table of Contents
Curves
Formal Groups, Elliptic Curves, and Some Theorems of Couveignes . . . . .
482
Antonia W. Bluher (National Security Agency) A Comparison of Direct and Indirect Methods for Computing Selmer Groups of an Elliptic Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
Z. Djabri (University of Kent at Canterbury), N.P. Smart (Hewlett-Packard Laboratories) An Algorithm for Approximate Counting of Points on Algebraic Sets over Finite Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
514
Ming-Deh Huang, Yiu-Chung Wong (University of Southern California) S-integral Points on Elliptic Curves and Fermat's Triple Equations . . . . .
528
A. Peth5 (Kossuth Lajos University), E. Herrmann, H. G. Zimmer (Universitiit des Saarlandes) Speeding Up Pollard's Rho Method for Computing Discrete Logarithms
541
Edlyn Teske (Technische Universitiit Darmstadt) Function Fields A General Method of Constructing Global Function Fields with M a n y Rational Places ............................................
555
Harald Niederreiter (Austrian Academy of Sciences), Chaoping Xing (The National University of Singapore) Lattice Basis Reduction in Function Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
567
Sachar Paulus (Darmstadt University of Technology) Comparing Real and Imaginary Arithmetics for Divisor Class Groups of Hyperelliptic Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
576
Sachar Paulus (Darmstadt University of Technology), Andreas Stein (University of Manitoba) Unit Computation in Purely Cubic Function Fields of Unit Rank 1 . . . . .
592
Renate Scheidler (University of Delaware), Andreas Stein (University of Manitoba) An Improved Method of Computing the Regulator of a Real Quadratic Function Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
607
Andreas Stein, Hugh C. Williams (University of Manitoba) The Equivalence Between Elliptic Curve and Quadratic Function Field Discrete Logarithms in Characteristic 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
621
Robert J. Zuccherato (Entrust Technologies) Author
Index
........................................................
639
Shimura Curve Computations Noam D. Elkies Harvard University
Abstract. We give some methods for computing equations for certain Shimura curves, natural maps between them, and special points on them. We then illustrate these methods by working out several examples in varying degrees of detail. For instance, we compute coordinates for all the rational CM points on the curves X ∗ (1) associated with the quaternion algebras over Q ramified at {2, 3}, {2, 5}, {2, 7}, and {3, 5}. We conclude with a list of open questions that may point the way to further computational investigation of these curves.
1 1.1
Introduction Why and How to Compute with Shimura Curves
The classical modular curves, associated to congruence subgroups of PSL2 (Q), have long held and repaid the interest of number theorists working theoretically as well as computationally. In the fundamental paper [S2] Shimura defined curves associated with other quaternion algebras other over totally real number fields in the same way that the classical curves are associated with the algebra M2 (Q) of 2 × 2 matrices over Q. These Shimura curves are now recognized as close analogues of the classical modular curves: almost every result involving the classical curves generalizes with some more work to Shimura curves, and indeed Shimura curves figure alongside classical ones in a key step in the recent proof of Fermat’s “last theorem” [Ri]. But computational work on Shimura curves lags far behind the extensive effort devoted to the classical modular curves. The 19th century pioneers investigated some arithmetic quotients of the upper half plane which we now recognize as Shimura curves (see for instance [F1,F2]) with the same enthusiasm that they applied to the PSL2 (Q) curves. But further inroads proved much harder for Shimura curves than for their classical counterparts. The PSL2 (Q) curves parametrize elliptic curves with some extra structure; the general elliptic curve has a simple explicit formula which lets one directly write down the first few modular curves and maps between them. (For instance, this is how Tate obtained the equations for the first few curves X1 (N ) parametrizing elliptic curves with an N -torsion point; see for instance [Kn, pp.145–148].) Shimura showed that curves associated with other quaternion algebras also parametrize geometric objects, but considerably more complicated ones (abelian varieties with quaternionic endomorphisms); even in the first few cases beyond M2 (Q), explicit formulas for these objects were obtained only recently [HM], and using such formulas to get J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 1–47, 1998. c Springer-Verlag Berlin Heidelberg 1998
2
Noam D. Elkies
at the Shimura curves seems a most daunting task. Moreover, most modern computations with modular curves (e.g. [C,E5]) sidestep the elliptic interpretation and instead rely heavily on q-expansions, i.e. on the curves’ cusps. But arithmetic subgroups of PSL2 (R) other than those in PSL2 (Q) contain no parabolic elements, so their Shimura curves have no cusps, and thus any method that requires q-expansions must fail. But while Shimura curves pose harder computational problems than classical modular curves, efficient solutions to these problems promise great benefits. These curves tempt the computational number theorist not just because, like challenging mountainpeaks, “they’re there”, but because of their remarkable properties, direct applications, and potential for suggesting new ideas for theoretical research. Some Shimura curves and natural maps between them provide some of the most interesting examples in the geometry of curves of low genus; for instance each of the five curves of genus g ∈ [2, 14] that attains the Hurwitz bound 84(g − 1) on the number of automorphisms of a curve in characteristic zero is a Shimura curve. Shimura curves, like classical and Drinfeld modular curves, reduce to curves over the finite field Fq2 of q 2 elements that attain the Drinfeld-Vl˘ adut¸ upper bound (q − 1 + o(1))g on the number of points of a curve of genus g → ∞ over that field [I3]. Moreover, while all three flavors of modular curves include towers that can be given by explicit formulas and thus used to construct good error-correcting codes [Go1,Go2,TVZ], only the Shimura curves, precisely because of their lack of cusps, can give rise to totally unramified towers, which should simplify the computation of the codes; we gave formulas for several such towers in [E6]. Finally, the theory of modular curves indicates that CM (complex multiplication) points on Shimura curves, elliptic curves covered by them, and modular forms on them have number-theoretic significance. The ability to efficiently compute such objects should suggest new theoretical results and conjectures concerning the arithmetic of Shimura curves. For instance, the computations of CM points reported in this paper should suggest factorization formulas for the difference between the coordinates of two such points analogous to those of Gross and Zagier [GZ] for j-invariants of elliptic curves, much as the computation of CM values of the Weber modular functions suggested the formulas of [YZ]. Also, as in [GS], rational CM points on rational Shimura curves with only three elliptic points (i.e. coming from arithmetic triangle groups Gp,q,r ) yield identities A + B = C in coprime integers A, B, C with many repeated factors; we list the factorizations here, though we found no example in which A, B, C are perfect p, q, r-th powers, nor any new near-record ABC ratios. Finally, CM computations on Shimura curves may also make possible new Heegner-point constructions as in [E4]. So how do we carry out these computations? In a few cases (listed in [JL]), the extensive arithmetic theory of Shimura curves has been used to obtain explicit equations, deducing from the curves’ p-adic uniformizations Diophantine conditions on the coefficients of their equations stringent enough to determine them uniquely. But we are interested, not only in the equations, but in modular covers and maps between Shimura curves associated to the same quaternion
Shimura Curve Computations
3
algebra, and in CM points on those curves. The arithmetic methods may be able to provide this information, but so far no such computation seems to have been done. Our approach relies mostly on the uniformization of these curves qua Riemann surfaces by the hyperbolic plane, and uses almost no arithmetic. This approach is not fully satisfactory either; for instance it probably cannot be used in practice to exhibit all natural maps between Shimura curves of low genus. But it will provide equations for at least a hundred or so curves and maps not previously accessible, which include some of the most striking examples and should provide more than enough data to suggest further theoretical and computational work. When a Shimura curve C comes from an arithmetic subgroup of PSL2 (R) contained in a triangle group Gp,q,r , the curve H/Gp,q,r has genus 0, and C is a cover of that curve branched only above three points, so may be determined from the ramification data. (We noted in [E5, p.48] that this method was available also for classical modular curves comings from subgroups of PSL2 (Z) ∼ = G2,3,∞, though there better methods are available thanks to the cusp. Subgroups of PSL2 (R) commensurate with1 but not contained in Gp,q,r may be handled similarly via the common subgroup of finite index.) The identification of H/Gp,q,r with P1 is then given by a quotient of hypergeometric functions on P1 , which for instance lets us compute the P1 coordinate of any CM point on C as a complex number to high precision and thus recognize it at least putatively as an algebraic number. Now it is known [T] that only nineteen commensurability classes of arithmetic subgroups of PSL2 (R) contain a triangle group. These include some of the most interesting examples — for instance, congruence subgroups of arithmetic triangle groups account for several of the sporadic “arithmetically exceptional functions” (rational functions f(X) ∈ Q(X) which permute P1 (Fp ) for infinitely many primes p) of [M¨ u]; but an approach that could only deal with those nineteen classes would be limited indeed. When there are more than three elliptic points, a new difficulty arises: even if C = H/G still has genus 0, we must first determine the relative locations of the elliptic points, and to locate other CM points we must replace the hypergeometric functions to solutions of more general “Schwarzian differential equations” in the sense of [I1]. We do both by in effect using nontrivial elements of the “commensurator” of the group G ∈ PSL2 (R), i.e. transformations in PSL2 (R) which do not normalize G but conjugate G to a group commensurable with G. Ihara had already used these commensurators in [I1] theoretically to prove that both C and its Schwarzian equation are defined over a number field, but this method has apparently not been actually used to compute such equations until now. 1.2
Overview of the Paper
We begin with a review of the necessary definitions and facts on quaternion algebras and Shimura curves, drawn mostly from [S2] and [V]. We then give 1
Recall that two subgroups H, K of a group G are said to be commensurate if H ∩ K is a subgroup of finite index in both H and K.
4
Noam D. Elkies
extended computational accounts of Shimura curves and their supersingular and rational CM points for the two simplest indefinite quaternion algebras over Q beyond the classical case of the matrix algebra M2 (Q), namely the quaternion algebras ramified at {2, 3} and {2, 5}. In the final section we more briefly treat some other examples which illustrate features of our methods that do not arise in the {2, 3} and {2, 5} cases, and conclude with some open questions suggested by our computations that may point the way to further computational investigation on these curves. 1.3
Acknowledgements
Many thanks to B.H. Gross for introducing me to Shimura curves and for many enlightening conversations and clarifications on this fascinating topic. Thanks also to Serre for a beautiful course that introduced me to three-point covers of P1 among other things ([Se], see also [Mat]); to Ihara for alerting me to his work [I1,I2] on supersingular points on Shimura curves and their relation with the curves’ uniformization by the upper half-plane; and to C. McMullen for discussions of the uniformization of quotients of H by general co-compact discrete subgroups of PSL2 (R). A. Adler provided several references to the 19thcentury literature, and C. Doran informed me of [HM]. Finally, I thank B. Poonen for reading and commenting on a draft of this paper, leading to considerable improvements of exposition in several places. The numerical and symbolic computations reported here were carried out using the gp/pari and macsyma packages, except for (70), for which I thank Peter M¨ uller as noted there. This work was made possible in part by funding from the David and Lucile Packard Foundation.
2 2.1
Review of Quaternion Algebras over Q and their Shimura Curves Quaternion Algebras over Q; the Arithmetic Groups Γ (1) and Γ ∗ (1)
Let K be a field of characteristic zero; for our purposes K will always be a number field or, rarely, its localization, and usually the number field will be Q. A quaternion algebra over K is a simple associative algebra A with unit, containing K, such that K is the center of A and dimK A = 4. Such an algebra has a conjugation a ↔ a ¯, which is a K-linear anti-involution (i.e. ¯a = a and a1 a2 = a ¯2a ¯1 hold identically in A) such that a = a ¯ ⇔ a ∈ K. The trace and norm are the additive and multiplicative maps from A to K defined by tr(a) = a + a ¯,
N(a) = a¯ a=a ¯a;
(1)
every a ∈ A satisfies its characteristic equation a2 − (tr(a))a + N(a) = 0.
(2)
Shimura Curve Computations
5
The most familiar example of a quaternion algebra is M2 (K), the algebra of 2 × 2 matrices over K, and if K is algebraically closed then M2 (K) is the only quaternion algebra over K up to isomorphism. The other well-known example is the algebra of Hamilton quaternions over R. In M2 (K) the trace is the usual trace of a square matrix, so the conjugate of a ∈ M2 (K) is tr(a)I2×2 − a, and the norm is just the determinant. Any quaternion algebra with zero divisors is isomorphic with M2 (K). An equivalent criterion is that the algebra contain a nonzero element whose norm and trace both vanish. Now the trace-zero elements constitute a K-subspace of A of dimension 3, on which the norm is a homogeneous quadric; so the criterion states that A ∼ = M2 (K) if and only if that quadric has nonzero K-rational points. The Hamilton quaternions have basis 1, i, j, k satisfying the familiar relations i2 = j 2 = k 2 = 1,
ij = −ji = k,
jk = −kj = i,
ki = −ik = j;
(3)
the conjugates of 1, i, j, k are 1, −i, −j, −k, so a Hamilton quaternion α1 + α2 i + α3 j + α4 k has trace 2α1 and norm α21 + α22 + α23 + α24 . Thus the Hamilton quaternions over K are isomorphic with M2 (K) if and only if −1 is a sum of two squares in K. In fact it is known that if K = R then every quaternion algebra over K is isomorphic with either M2 (R) or the Hamilton quaternions. In general if K is any local field of characteristic zero then there is up to isomorphism exactly one quaternion algebra over K other than M2 (K) — with the exception of the field of complex numbers, which being algebraically closed admits no quaternion algebras other than M2 (C). If A is a quaternion algebra over a number field K then a finite or infinite place v of K is said to be ramified in A if A ⊗ Kv is not isomorphic with M2 (Kv ). There can only be a finite number of ramified places, because a nondegenerate quadric over K has nontrivial local zeros at all but finitely many places of K. A less trivial result (the case K = Q is equivalent to Quadratic Reciprocity) is that the number of ramified places is always even, and to each finite set of places Σ of even cardinality containing no complex places there corresponds a unique (again up to isomorphism) quaternion algebra over K ramified at those places and no others. In particular an everywhere unramified quaternion algebra over K must be isomorphic with M2 (K). An order in a quaternion algebra over a number field (or a non-Archimedean local field) K is a subring containing the ring OK of K-integers and having rank 4 over OK . For instance M2 (OK ) and OK [i, j] are orders in the matrix and quaternion algebras over K. Any order is contained in at least one maximal order, that is, in an order not properly contained in any other. Examples of maximal orders are M2 (OK ) ∈ M2 (K) and the Hurwitz order Z[1, i, j, (1 + i + j + k)/2] in the Hamilton quaternions over Q. It is known that if K has at least one Archimedean place at which A is not isomorphic with the Hamilton quaternions then all maximal orders are conjugate in A.
6
Noam D. Elkies
Now let2 K = Q. A quaternion algebra A/Q is called definite or indefinite according as A ⊗ R is isomorphic with the Hamilton quaternions or M2 (R), i.e. according as the infinite place is ramified or unramified in A. [These names allude to the norm form on the trace-zero subspace of A, which is definite in the former case, indefinite in the latter.] We shall be concerned only with the indefinite case. Then Σ consists of an even number of finite primes. Fix such a Σ and the corresponding quaternion algebra A. Let O be a maximal order in A; since A is indefinite, all its maximal orders are conjugate, so choosing a different maximal order would not materially affect the constructions in the sequel. Let O1∗ be the group of units of norm 1 in O. We then define the following arithmetic subgroups of A∗ /Q∗ : Γ (1) := O1∗ /{±1}, Γ ∗ (1) := {[a] ∈ A∗ /Q∗ : aO = Oa, N(a) > 0}.
(4) (5)
[In other words Γ ∗ (1) is the normalizer of Γ (1) in the positive-norm subgroup of A∗/Q∗ . Takeuchi [T] calls these groups Γ (1)(A, O1 ) and Γ (∗)(A, O1 ); we use Γ (1) to emphasize the analogy with the classical case of PSL 2 (Z), which makes Γ ∗(1) a natural adaptation of Takeuchi’s notation. Vign´eras [V, p. 121ff.] calls the same groups Γ and G, citing [Mi] for the structure of their quotient.] As noted, Γ (1) ∗ ∗ is a normal subgroup of Γ ∗(1). Q In fact Γ (1) consists of the classes mod 0 Q of elements of O whose norm is p∈Σ 0 p for some (possibly empty) subset Σ ⊆ Σ, and Γ ∗ (1)/Γ (1) is an elementary abelian 2-group with #Σ generators. 2.2
The Shimura Modular Curves X (1) and X ∗ (1)
The group Γ (1), and thus any other group commensurable with it such as Γ ∗(1), is a discrete subgroup of (A ⊗ R)∗+ /R∗ (the subscript “+” indicating positive norm), with compact quotient unless Σ = ∅, and of finite covolume even in that case. Since A⊗R ∼ = M2 (R), the group (A⊗R)∗+ /R∗ is isomorphic with PSL2 (R) and thus with Aut(H), the group of automorphisms of the hyperbolic upper half plane H := {z ∈ C : Im(z) > 0}. (6) Explicitly, a unimodular matrix ±( ac db ) acts on H via the fractional linear transformation z 7→ (az + b)/(cz + d). We may define the Shimura curves X (1) and X ∗(1) qua compact Riemann surfaces by X (1) := H/Γ (1), 2
X ∗ (1) := H/Γ ∗(1).
(7)
Most of our examples, including the two that will occupy us in the next two sections, involve quaternion algebras over Q. In [S2] Shimura associated modular curves to a quaternion algebra over any totally real number field K for which the algebra is ramified at all but one of the infinite places of K. Since the special case K = Q accounts for most of our computations, and is somewhat easier to describe, we limit our discussion to quaternion algebras over Q from here until section 5.3. At that point we briefly describe the situation for arbitrary K before working out a couple of examples with [K : Q] >1.
Shimura Curve Computations
7
[More precisely, the Riemann surfaces are given by (7) unless Σ = ∅, in which case the quotient only becomes compact upon adjoining a cusp.] The hyperbolic area of these quotients of H is given by the special case k = Q of a formula of RR Shimizu [S1, Appendix], quoted in [T, p.207]. Using the normalization π −1 dx dy/y2 for the hyperbolic area (with z = x + iy; this normalization gives an ideal triangle unit area), that formula is Area(X (1)) =
1 Y (p − 1), 6
(8)
p∈Σ
from which Area(X ∗ (1)) =
1 Y p−1 1 Area(X (1)) = . [Γ ∗(1) : Γ (1)] 6 2
(9)
p∈Σ
It is known (see for instance Ch.IV:2,3 of [V] for the following facts) that, for any discrete subgroup Γ ⊂ PSL2 (R) of finite covolume, the genus of H/Γ is determined by its area together with with information on elements of finite order in Γ . All finite subgroups of Γ are cyclic, and there are finitely many such subgroups up to conjugation in Γ . There are finitely many points Pj of H/Γ with nontrivial stabilizer, and the stabilizers are the maximal nontrivial finite subgroups of Γ modulo conjugation in Γ . If the order of the stabilizer of Pj is ej then Pj is said to be an “elliptic point of order ej ”. Then if H/Γ is compact then its genus g = g(H/Γ ) is given by 2g − 2 = Area(H/Γ ) −
X
(1 −
j
1 ). ej
(10)
Moreover Γ has a presentation Γ =
e hα1 , . . . , αg , β1 , . . . , βg , sj |sj j
= 1,
Y j
g Y sj [αi , βi ] = 1i,
(11)
i=1
in which sj generates the stabilizer of a preimage of Pj in H and rotates a neighborhood of that preimage by an angle 2π/ej (i.e. has derivative e2πi/ej at its fixed point), and [α, β] is the commutator αβα−1 β −1 . [This group is sometimes called (g; e1 , . . . , eg ).] If H/Γ is not compact then we must subtract the number of cusps from the right-hand side of (10) and include a generator sj of Γ of infinite order for each cusp, namely a generator of the infinite cyclic stabilizer of the cusp. This generator is a “parabolic element” of PSL2 (R), i.e. a fractional linear transformation with a single fixed point; there are two conjugacy classes of such elements in PSL2 (R), and sj will be in the class of z 7→ z + 1. We assign ej = ∞ to a cusp. For both finite and infinite ej , the trace and determinant of sj are related by π det(sj ). (12) Tr2 (sj ) = 4 cos2 ej
8
Noam D. Elkies
Since we are working in quaternion algebras over Q, this means that ej ∈ {2, 3, 4, 6, ∞}, and only 2, 3, ∞ are possible if Γ ⊆ Γ (1). Moreover ej = ∞ occurs only in the classical case Σ = ∅. We shall need to numerically compute for several such Γ the identification of H/Γ with an algebraic curve X/C, i.e. to compute the coordinates on X of a point corresponding to (the Γ -orbit of) a given z ∈ H, or inversely to obtain z corresponding to a point with given coordinates. In fact the two directions are essentially equivalent, because if we can efficiently compute an isomorphism between two Riemann surfaces then we can compute its inverse almost as easily. For classical modular curves one usually uses q-expansions to go from z to rational coordinates; but this method is not available for our groups Γ , which have no parabolic (ej = ∞) generator. We can, however, still go in the opposite direction, computing the map from X to H/Γ by solving differential equations on X. The key is that while the function z on X is not well defined due to the Γ ambiguity, its Schwarzian derivative is. In local coordinates the Schwarzian derivative of a nonconstant function z = z(ζ) is the meromorphic function defined by Sζ (z) := −4z −1 z 0
1/2
2z 0 z 000 − 3z 00 d2 z = . dζ 2 z 0 1/2 z02 2
(13)
This vanishes if and only if z is a fractional linear transformation of ζ. Moreover it satisfies a nice “chain rule”: if ζ is in turn a function of η then 2 dζ Sζ (z) + Sη (ζ). (14) Sη (z) = dη Thus if we choose a coordinate ζ on X then Sζ (z) is the same for each lift of z from H/Γ to H, and thus gives a well-defined function on the complement in X of the elliptic points; changing the coordinate from ζ to η multiples this function by (dζ/dη)2 and adds a term Sη (ζ) that vanishes if ζ is a fractional linear transformation of η. In particular if X has genus 0 and we choose only rational coordinates (i.e. η, ζ are rational functions of degree 1) then these terms Sη (ζ) always vanish and Sζ (z) dζ 2 is a well-defined quadratic differential σ on X. Near an elliptic point ζ0 of index ej , the function z has a branch point such that (z − z0 )/(z − z¯0 ) is (ζ − ζ0 )1/ej times an analytic function; for such z the Schwarzian derivative is still well-defined in a neighborhood of ζ0 but has −2 2 2 a double pole there with leading term (1 − e−2 j )/(ζ − ζ0 ) [or (1 − ej )/ζ if ζ0 = ∞ — note that this too has a double pole when multiplied by dζ 2 ]. So σ = Sζ (z) dζ 2 is a rational quadratic differential on X, regular except for double poles of known residue at the elliptic points, and independent of the choice of rational coordinate when X has genus 0. Knowing σ we may recover z from the differential equation (15) Sζ (z) = σ/dζ 2 , which determines z up to a fractional linear transformation over C, and can then remove the ambiguity if we know at least three values of z (e.g. at elliptic points, which are fixed points of known elements of Γ ).
Shimura Curve Computations
9
Because Sζ (z) is invariant under fractional linear transformations of z, the third-order nonlinear differential equation (15) can be linearized as follows (see e.g. [I1, §1–5]). Let (f1 , f2 ) be a basis for the solutions of the linear second-order equation (16) f 00 = af 0 + bf for some functions a(ζ), b(ζ). Then z := f1 /f2 is determined up to fractional linear transformation, whence Sζ (z) depends only on a, b and not the choice of basis. In fact we find, using either of the equivalent definitions in (13), that Sζ (f1 /f2 ) = 2
da − a2 − 4b. dζ
(17)
Thus if a is any rational function and b = −σ/4dζ 2 + a0 /2 − a2 /4 then the solutions of (15), and thus a map from X to H/Γ , are ratios of linearly independent pairs of solutions of (16). In the terminology of [I1], (16) is then a Schwarzian equation for H/Γ . We shall always choose a so that a dζ has at most simple poles at the elliptic points and no other poles; the Schwarzian equation then has regular singularities at the elliptic points and no other singularities. The most familiar example is the case that Γ is a triangle group, i.e. X has genus 0 and three elliptic points (if g = 0 there must be at least three elliptic points by (10)). In that case σ is completely determined by its poles and residues: if two different σ’s were possible, their difference would be a nonzero quadratic differential on P1 with at most three simple poles, which is impossible. If we choose the coordinate on X that puts the elliptic points at 0, 1, ∞, and require that a be chosen of the form a = C0 /ζ + C1 /(ζ − 1) so that b has only simple poles at 0, 1, then there are four choices for (C0 , C1 ), each giving rise to a hypergeometric equation upon multiplying (16) by ζ(1 − ζ): ζ(1 − ζ)f 00 = [(α + β + 1)ζ − γ]f 0 + αβf.
(18)
Here α, β, γ are related to the indices e1 , e2 , e3 at ζ = 0, 1, ∞ by 1 = ±(1 − γ), e1
1 = ±(γ − α − β), e2
1 = ±(α − β); e3
(19)
then F (α, β; γ; ζ) and (1 − ζ)γ F (α − γ + 1, β − γ + 1; 2 − γ; ζ) constitute a basis for the solutions of (16), where F = 2 F1 is the hypergeometric function defined for |ζ| < 1 by # "n−1 ∞ X Y (α + k)(β + k) ζ n , (20) F (α, β; γ; ζ) := (γ + k) n! n=0 k=0
and by similar power series in neighborhoods of ζ = 1 and ζ = ∞ (see for instance [GR, 9.10 and 9.15]). In general, knowing σ we may construct and solve a Schwarzian equation in power series, albeit series less familiar than 2 F1 , and numerically compute the map X → H/Γ as the quotient of two solutions. But once Γ is not a triangle group — that is, when X has more than three
10
Noam D. Elkies
elliptic points or positive genus — the elliptic points and their orders no longer determine σ but only restrict it to an affine space of finite but positive dimension. In general it is a refractory problem to find the “accessory parameters” that tell where σ lies in that space. If Γ is commensurable with a triangle group Γ 0 then we obtain σ from the quadratic differential on H/Γ 0 via the correspondence between that curve and H/Γ ; but this only applies to Shimura curves associated with the nineteen quaternion algebras listed by Takeuchi in [T], including only two over Q, the matrix algebra and the algebra ramified at {2, 3}. One of the advances in the present paper is the computation of σ for some arithmetic groups not commensurable with any triangle group. We now return to the Shimura curves X (1), X ∗ (1) obtained from arithmetic groups Γ = Γ (1), Γ ∗(1). These curves also have a modular interpretation that gives them the structure of algebraic curves over Q. To begin with, X (1) is the modular curve for principally polarized abelian surfaces (ppas) A with an embedding O ,→ End(A). (In the classical case O = M2 (Z), corresponding to Σ = ∅, such an abelian surface is simply the square of an elliptic curve and we recover the familiar picture of modular curves parametrizing elliptic ones, but for nonempty Σ the surfaces A are simple except for those associated to CM points on X (1); we shall say more about CM points later.) The periods of these surfaces satisfy a linear second-order differential equation which is a Schwarzian equation for H/Γ (1), usually called a “Picard-Fuchs equation” in this context. [This generalizes the expression for the periods of elliptic curves (a.k.a. “complete elliptic integrals”) as 2 F1 values, for which see e.g. [GR, 8.113 1.].] The group Γ ∗(1)/Γ (1) acts on X (1) with quotient curve X ∗ (1). For each p ∈ Σ there is then an involution wp ∈ Γ ∗ (1)/Γ (1) associated to the class in Γ ∗(1)/Γ (1) of elements of O of norm p, and these involutions commute with each other. (We chose the notation wp to suggest an analogy with the Atkin-Lehner involutions wl , which as we shall see have a more direct counterpart in our setting when l∈ / Σ.) In terms of abelian surfaces these involutions wp of X (1) may be explained as follows. Let Ip ⊂ O consist of the elements whose norm is divisible by p. Then Ip is a two-sided prime ideal of O, with O/Ip ∼ = Fp2 and Ip2 = pO. Given an action of O on a ppas A, the kernel of Ip is a subgroup of A of size p2 isotropic under the Weil pairing, so the quotient surface A0 := A/ ker Ip is itself principally polarized. Moreover, since Ip is a two-sided ideal, A0 inherits an action of O. Thus if A corresponds to some point P ∈ X (1) then A0 corresponds to a point P 0 ∈ X (1) determined algebraically by P ; that is, we have an algebraic map wp : P 7→ P 0 from X (1) to itself. Applying this construction to A0 yields A/ ker Ip2 = A/ ker pO = A/ ker p ∼ = A; thus wp (P 0 ) = P and wp is indeed an involution. The quotient curve X ∗ (1) then parametrizesQsurfaces A up to the identification of A with A/ ker I where I = ∩p∈Σ 0 Ip = p∈Σ 0 Ip for some Σ 0 ⊆ Σ. Since X (1), X ∗ (1) have the structure of algebraic curves over Q, they can be regarded as curves over R. Now a real structure on any Riemann surface is equivalent to an anti-holomorphic involution of the surface. For surfaces H/Γ
Shimura Curve Computations
11
uniformized by the upper half-plane, we can give such an involution by choosing a group (Γ : 2) ⊂ PGL2 (R) containing Γ with index 2 such that (Γ : 2) 6⊂ PSL2 (R). An element ( ac db )R∗ of PGL2 (R) − PSL2 (R) (i.e. with ad − bc < 0) acts on H anti-holomorphically z 7→ (a¯ z +b)/(c¯ z +d). Such a fractional conjugatelinear transformation has fixed points on H if and only if a + d = 0, in which case it is an involution and its fixed points constitute a hyperbolic line. Thus H/Γ , considered as a curve over R using Γ : 2, has real points if and only if (Γ : 2) − Γ contains an involution of H. The real structures on X (1), X ∗ (1) are defined by (Γ (1) : 2) := O∗ /{±1}, (Γ ∗ (1) : 2) := {[a] ∈ A∗ /Q∗ : aO = Oa}.
(21) (22)
That is, compared with (4,5) we drop the condition that the norm be positive. If Σ 6= ∅ then X (1) has no real points, because if Γ (1) : 2 contained an involution ±a then the characteristic equation of a would be a2 −1 = 0 and A would contain the zero divisors a ± 1. This is a special case of the result of [S3]. But X ∗(1) may have real points. For instance, we shall see that if Σ = {2, 3} then Γ ∗(1) is isomorphic with the triangle group G2,4,6. For general p, q, r with3 1/p + 1/q + 1/r < 1 we can (and, if p, q, r are distinct, can only) choose Gp,q,r : 2 so that the real locus of H/Gp,q,r consists of three hyperbolic lines joining the three elliptic points in pairs, forming a hyperbolic triangle, with Gp,q,r : 2 generated by hyperbolic reflections in the triangle’s sides; it is this triangle to which the term “triangle group” alludes. 2.3
The Shimura Modular Curves X (N ) and X ∗ (N ) (With N Coprime to Σ); the Curves X0 (N ) and X0∗ (N ) and their Involution wN
Now let l be a prime not ramified in A. Then A ⊗ Ql and O ⊗ Zl are isomorphic with M2 (Ql ) and M2 (Zl ) respectively. Thus (O ⊗ Ql )∗1 /{±1} ∼ = PSL2 (Zl ), with the subscript 1 indicating the norm-1 subgroup as in (4). We can thus define congruence subgroups Γ (l), Γ1 (l), Γ0 (l) of Γ (1) just as in the classical case in which Σ = ∅ and Γ (1) = PSL2 (Z). For instance, Γ (l) is the normal subgroup ∗ /{±1} : a ≡ 1 mod l} {±a ∈ O+
(23)
of Γ (1), with Γ (1)/Γ (l) ∼ = PSL2 (Fl ); once we choose an identification of the quotient group Γ (1)/Γ (l) with PSL2 (Fl ) we may define Γ0 (l) as the preimage in Γ (1) of the upper triangular subgroup of PSL2 (Fl ). Likewise we have subgroups Γ (lr ), Γ0 (lr ) etc., and even Γ (N ), Γ0 (N ) for a positive integer N not divisible by any of the primes of Σ. The quotients of H by these subgroups of Γ (1) are then modular curves covering X (1), which we denote by X (l), X0 (l), etc. They parametrize ppas’s A 3
If 1/p+ 1/q + 1/r equals or exceeds 1, an analogous situation occurs with H replaced by the complex plane or Riemann sphere.
12
Noam D. Elkies
with an O-action and extra structure: in the case of X (N ), a choice of basis for the N -torsion points A[N ]; in the case of X0 (N ), a subgroup G ⊂ A[N ] isomorphic with (Z/N )2 and isotropic under the Weil pairing. In the latter case the surface A0 = A/G is itself principally polarized and inherits an action of O from A, and the image of A[N ] in A0 is again a subgroup G0 ∼ = (Z/N )2 isotropic under the Weil pairing. Thus if we start from some point P on X0 (N ) and associate to it a pair (A, G) we obtain a new pair (A0 , G0 ) of the same kind and a new point P 0 ∈ X0 (N ) determined algebraically by P . Thus we have an algebraic map wN : P 7→ P 0 from X0 (N ) to itself. As in the classical case — in which it is easy to see that the construction of A0 , G0 from A, G amounts to (the square of) the familiar picture of cyclic subgroups and dual isogenies — this wN is an involution of X0 (N ) that comes from a trace-zero element of A of norm N whose image in A∗ /Q∗ is an involution normalizing Γ0 (N ). By abuse of terminology we shall say that a pair of points P, P 0 on X (1) are “cyclically N -isogenous”4 if they correspond to ppas’s A, A0 with A0 = A/G as above, and call the quotient map A → A/G ∼ = A0 a “cyclic N -isogeny”. If 0 we regard P, P as Γ (1)-orbits in H then they are cyclically N -isogenous iff a point in the first orbit is taken to a point in the second by some a ∈ O of ¯ also norm N such that a 6= ma0 for any a0 ∈ O and m > 1; since in that case a satisfies this condition and acts on H as the inverse of a, this relation on P, P 0 is symmetric. Then X0 (N ) parametrizes pairs of N -isogenous points on X (1), and wN exchanges the points in such a pair. The involutions wp on X (1) lift to the curves X (N ), X0 (N ), etc., and commute with wN on X0 (N ). The larger group Γ ∗(1) likewise has congruence groups such as Γ ∗(N ), Γ0∗(N ), etc., which give rise to modular curves covering X ∗ (1) called X ∗ (N ), X0∗ (N ), etc. The involution wN on X0 (N ) descends to an involution on X0∗ (N ) which we shall also call wN . We extend our abuse of terminology by saying that two points on X ∗(1) are “cyclically N -isogenous” if they lie under two N -isogenous points of X (1), and speak of “N -isogenies” between the equivalence classes of ppas’s parametrized by X ∗ (1). One new feature of the congruence subgroups of Γ ∗(1) is that, while Γ ∗ (N ) is still normal in Γ ∗(1), the quotient group may be larger / Σ is prime than PSL2 (Z/N ), due to the presence of the wp . For instance if l ∈ then Γ ∗(1)/Γ ∗(l) is PSL2 (Fl ) only if all the primes in Σ are squares modulo l; otherwise the quotient group is PGL2 (Fl ). In either case the index of Γ0∗(N ) in Γ ∗(1), and thus also the degree of the cover X0∗ (N )/X ∗ (1), is l + 1. Since these curves are all defined over Q, they can again be regarded as curves over R by a suitable choice of (Γ : 2). For instance, if Γ = Γ (N ), Γ1 (N ), Γ0 (N ) we obtain (Γ : 2) by adjoining a ∈ O of norm −1 such that a ≡ ( 10 −10 ) mod N under our identification of O/N O with M2 (Z/N ). Note however that most of the automorphisms PSL2 (Z/N ) of X (N ) do not commute with ( 10 −10 ) and thus do not act on X (N ) regarded as a real curve. Similar remarks apply to Γ ∗(N ) etc. 4
This qualifier “cyclically” is needed to exclude cases such as the multiplication-by-m map, which as in the case of elliptic curves would count as an “m2 -isogeny” but not a cyclic one.
Shimura Curve Computations
13
Now fix a prime l ∈ / Σ and consider the sequence of modular curves Xr = X0 (lr ) or Xr = X0∗ (lr ) (r = 0, 1, 2, . . .). The r-th curve parametrizes lr -isogenies, which is to say sequences of l-isogenies A0 → A1 → A2 → · · · → An
(24)
such that the composite isogeny Aj−1 → Aj+1 is a cyclic l2 -isogeny for each j with 0 < j < n. Thus for each m = 0, 1, . . ., n there are n + 1 − m maps πj : Xn → Xm obtained by extracting for some j = 0, 1, . . ., n − m the cyclic lm -isogeny Aj → Aj+m from (24). Each of these maps has degree ln−m , unless m = 0 when the degree is (l + 1)ln−1 . In particular we have a tower of maps π
π
π
π
π
Xn →0 Xn−1 →0 Xn−2 →0 · · · →0 X2 →0 X1 ,
(25)
each map being of degree l. We observed in [E6, Prop. 1] that explicit formulas for X1 , X2 , together with their involutions wl , wl2 and the map π0 : X2 → X1 , suffice to exhibit the entire tower (25) explicitly: For n ≥ 2 the product map π = π0 × π1 × π2 × · · · × πn−2 : Xn → X2n−1 is a 1:1 map from Xn to the set of (P1 , P2, . . . , Pn−1) ∈ X2n−1 such that π0 wl2 (Pj ) = wl π0 (Pj+1 )
(26)
(27)
for each j = 1, 2, . . . , n − 2. Here we note that this information on X1 , X2 is in turn determined by explicit formulas for X0 , X1 , together with the involution wl and the map π0 : X1 → X0 . Indeed π1 : X1 → X0 is then π0 ◦ wl , and the product map π0 × π1 : X2 → X12 identifies X2 with a curve in X12 contained in the locus of (28) {(Q1 , Q2) ∈ X12 : π1 (Q1 ) = π0 (Q2 )}, which decomposes as the union of that curve with the graph of wl .5 This determines X2 and the projections πj : X2 → X1 (j = 0, 1); the involution wl2 is (29) (Q1 , Q2 ) ↔ (wl Q2 , wl Q1 ). Thus the equations we shall exhibit for certain choices of A and l suffice to determine explicit formulas for towers of Shimura modular curves X0 (lr ), X0∗ (lr ), / Σ ∪ {l} is known to be asymptotically towers whose reduction at any prime l0 ∈ 2 optimal over the field of l0 elements [I3,TVZ]. 2.4
Complex-Multiplication (CM) and Supersingular Points on Shimura Curves
Let F be a quadratic imaginary field, and let OF be its ring of integers. Assume that none of the primes of Σ split in F . Then F embeds in A (in many ways), 5
This is where we use the hypothesis that l is prime. The description of Xn in (26,27) holds even for composite l, but the description of X2 in terms of X1 does not, because then (28) has other components.
14
Noam D. Elkies
and OF embeds in O. For any embedding ι : F ,→ A, the image of F ∗ in A∗ /Q∗ then has a unique fixed point on H; the orbit of this point under Γ (1), or under any other congruence subgroup Γ ⊂ A∗ /Q∗ , is then a CM point on the Shimura curve H/Γ . In particular, on X (1) such a point parametrizes a ppas with extra endomorphisms by ι(F ) ∩ O. For instance if ι(F ) ∩ O = ι(OF ) then this ppas is a product of elliptic curves each with complex multiplication by OF (but not in the product polarization). In general ι−1 (ι(F ) ∩ O) is called the CM ring of the CM point on X (1). Embeddings conjugate by Γ (1) yield the same point on X (1), and for each order O ⊂ F there are finitely many embeddings up to conjugacy, and thus finitely many CM points on X (1) with CM ring O; in fact their number is just the class number of O. In [S2] Shimura already showed that all points with the same CM ring are Galois conjugate over Q, from which it follows that a CM point is rational if and only if its CM ring has unique factorization. Thus far the description is completely analogous to the theory of complex multiplication for j-invariants of elliptic curves. But when Σ 6= ∅ a new phenomenon arises: CM points on the quotient curve X ∗ (1) may be rational even when their preimages on X (1) are not. For instance, a point with CM ring OF is rational on X ∗(1) if and only if the class group of F is generated by the classes of ideals I ⊂ OF such that I 2 is the principal ideal (p) for some rational prime p ∈ Σ. This has the amusing consequence that when Σ = {2, 3} the number of rational CM points on X ∗ (1) is more than twice the number of rational CM points on the classical modular curve X(1). [Curiously, already in the classical setting X(1) does not hold the record: it has 13 rational CM points, whilst X0∗ (6) = X0 (6)/hw2 , w3 i has 14. The reason again is fields F with nontrivial class group generated by square roots of the ideals (2) or (3), though with a few small exceptions both 2 and 3 must ramify in F . In the X ∗ (1) setting the primes of Σ are allowed to be inert as well, which makes the list considerably longer.] In fact for each of the first four cases Σ = {2, 3}, {2, 5}, {2, 7}, {3, 5} we find more rational CM points than on any classical modular curve. A major aim of this paper is computation of the coordinates of these points. We must first list all possible O. The class number of O, and thus of F , must be a power of 2 no greater than 2#Σ . In each of our cases, #Σ = 2, so F has class number at most 4 and we may refer to the list of imaginary quadratic number fields with class group (Z/2)r (r = 0, 1, 2), proved complete by Arno [A].6 Given F we easily find all possible O, and imbed each into O by finding a ∈ O such that (a − ¯ a)2 = disc(O). This gives us the CM point on H. But we want its coordinates on the Shimura curve H/Γ ∗(1) as rational numbers. Actually only one coordinate is needed because X ∗ (1) has genus 0 for each of our four Σ. We recover the coordinate as a real number using our Schwarzian uniformization of X ∗ (1) by H. (Of course a coordinate on P1 is only defined up to PGL2 (Q), 6
It might be possible to avoid that difficult proof for our application, since we are only concerned with fields whose class group is accounted for by ramified primes in a given set Σ, and it may be possible to provably list them all using the arithmetic of CM points on either classical or Shimura modular curves, as in Heegner’s proof √ that Q( −163 ) is the last quadratic imaginary field of class number 1.
Shimura Curve Computations
15
but in each case we choose a coordinate once and for all by specifying it on the CM points.) We then recognize that number as a rational number from its continued fraction expansion, and verify that the putative rational coordinate not only agrees with our computations to as many digits as we want but also satisfies various arithmetic conditions such as those described later in this section. Of course this is not fully satisfactory; we do not know how to prove that, for instance, t = 132 672 1092 1392 1572163/21056 116 176 (see Tables 1,2 below) is the CM point of discriminant −163 on the curve X ∗ (1) associated with the algebra ramified at {2, 3}. But we can prove that above half of our numbers are correct, again using the modular curves X ∗ (l) and their involutions wl for small l. This is because CM points behave well under isogenies: any point isogenous to a CM point is itself CM, and moreover a point on X (1) or X ∗ (1) is CM if and only if it admits a cyclic d-isogeny to itself for some d > 1. Once we have formulas for X0∗(l) and wl we may compute all points cyclically l-isogenous either with an already known CM points or with themselves. The discriminant of a new rational CM point can then be determined either by arithmetic tests or by identifying it with a real CM point to low precision. The classical theory of supersingular points also largely carries over to the Shimura setting. We may use the fact that the ppas parametrized by a CM point has extra endomorphisms to define CM points of Shimura curves algebraically, and thus in any characteristic ∈ / Σ. In positive characteristic p ∈ / Σ, any CM point is defined over some finite field, and conversely every Fp -point of a Shimura curve is CM. All but finitely many of these parametrize ppas’s whose endomorphism ring has Z-rank 8; the exceptional points, all defined over Fp2 , yield rank 16, and are called supersingular, all other Fp -points being ordinary. One may choose coordinates on X (1) (or X ∗ (1)) such that a CM point in characteristic zero reduces mod p to a ordinary point if p splits in the CM field, and to a supersingular point otherwise. Conversely each ordinary point mod p lifts to a unique CM point (cf. [D] for the classical case). This means that if two CM points with different CM fields have the same reduction mod p, their common reduction is supersingular, and then as in [GZ] there is an upper bound on p proportional to the product of the two CM discriminants. So for instance if X ∗ (1) ∼ = P1 then the difference between the coordinates of two rational CM points is a product of small primes. This remains the case, for similar reasons, even for distinct CM points with the same CM field, and may be checked from the tables of rational CM points in this paper. The preimages of the supersingular points on modular covers such as X0 (l) yield enough Fp2 -rational points on these curves to attain the Drinfeld-Vl˘ adut¸ bound [I3]; these curves are thus “asymptotically optimal” over Fp2 . Asymptotically optimal curves over Fp2f (f > 1) likewise come from Shimura curves associated to quaternion algebras over totally real number fields with a prime of residue field Fpf . In the case of residue field Fp (so in particular for quaternion algebras over Q) Ihara [I2] found a remarkable connection between the hyperbolic uniformization of a Shimura curve X = H/Γ and the supersingular points of its reduction mod p. We give his result in the case that X has genus 0, because we will only apply it
16
Noam D. Elkies
to such curves and the result can be stated in an equivalent and elementary form (though the proof is still far from elementary). Since we are working over Fp , we may identify any curve of genus 0 with P1 , and choose a coordinate (degree-1 function) t on P1 such that t = ∞ is an elliptic point. Let ti be the coordinates of the remaining elliptic points. First, the hyperbolic area of the curve controls the number of points, which is approximately 12 (p + 1)Area(X ) — “approximately” because 12 (p + 1)Area(X ) is not the number of points but their total mass. The mass of a non-elliptic supersingular point is 1, but an elliptic point with stabilizer G has mass 1/#G. If the elliptic point mod p is the reduction of only one elliptic point on H/Γ (which, for curves coming from quaternion algebras over Q, is always the case once p > 3), then its stabilizer is Z/eZ and its mass is 1/e where e is the index of that elliptic point. [The mass formula also holds for X of arbitrary genus, and for general residue fields provided p is replaced by the size of the field.] Let d be the number of non-elliptic supersingular points, and choose a Schwarzian equation (16) with at most regular singularities at t = ∞, ti and no other singularities. Then the supersingular points are determined uniquely by the condition that their t-coordinates are the roots of a polynomial P (t) of degree d such that Q for some ri ∈ Q the algebraic function i (t − ti )ri · P (t) is a solution of the Schwarzian differential equation (16)! For instance [I2, 4.3], if Γ is a triangle group we may choose ti = 0, 1, and then P (t) is a finite hypergeometric series mod p. Given t0 ∈ Q we may then test whether t0 is ordinary or supersingular mod p for each small p. If t0 is a CM point with CM field then its reduction is ordinary if p splits in F , supersingular otherwise. When we have obtained t0 as a good rational approximation to a rational CM point, but could not prove it correct, we checked for many p whether t0 is ordinary or supersingular mod p; when each prime behaves as expected from its behavior in F , we say that t0 has “passed the supersingular test” modulo those primes p.
3 3.1
The Case Σ = {2, 3} The Quaternion Algebra and the Curves X (1), X ∗ (1)
For this section we let A be the quaternion algebra ramified at {2, 3}. This algebra is generated over Q by elements b, c satisfying b2 = 2, c2 = −3, bc = −cb.
(30)
The conjugation of A fixes 1 and takes b, c, bc to −b, −c, −bc; thus for any element α = α1 + α2 b + α3 c + α4 bc ∈ A the conjugate and norm of α are given by α ¯ = α1 − α2 b − α3 c − α4 bc,
N(α) = α21 − 2α22 + 3α23 − 6α24 .
(31)
Since A is indefinite, all its maximal orders are conjugate; let O be the maximal order generated by b and (1+c)/2. Then Γ ∗(1) contains Γ (1) with index 2#Σ = 4,
Shimura Curve Computations
17
and consists of the classes mod Q∗ of elements of O of norm 1, 2, 3, or 6. In row II of Table 3 of [T] (p.208) we find that Γ ∗(1) is isomorphic with the triangle group (32) G2,4,6 := hs2 , s4 , s6 |s22 = s44 = s66 = s2 s4 s6 = 1i. Indeed we find that Γ ∗(1) contains elements s2 = [bc + 2c],
s4 = [(2 + b)(1 + c)],
s6 = [3 + c]
(33)
[NB (2 + b)(1 + c), 3 + c ∈ 2O] of orders 2, 4, 6 with s2 s4 s6 = 1. The subgroup of Γ ∗(1) generated by these elements is thus isomorphic with G2,4,6. But a hyperbolic triangle group cannot be isomorphic with a proper subgroup (since the areas of the quotients of H by the group and its subgroup are equal), so Γ ∗(1) is generated by s2 , s4 , s6 . Note that these generators have norms 6, 2, 3 mod (Q∗ )2 , and thus represent the three nontrivial cosets of Γ ∗ (1) in O∗ /{±1}. Since Γ ∗ (1) is a triangle group, X ∗ (1) is a curve of genus 0. Moreover X ∗(1) has Q-rational points (e.g. the three elliptic points, each of which must be rational because it is the only one of its index), so X ∗ (1) ∼ = P1 over Q. Let t be a rational coordinate on that curve (i.e. a rational function of degree 1). In general a rational coordinate on P1 is determined only up to the PGL2 action on P1 , but can be specified uniquely by prescribing its values at three points. In our case X ∗ (1) has three distinguished points, namely the elliptic points of orders 2, 4, 6; we fix t by requiring that it assume the values 0, 1, ∞ respectively at those three points. None of s2 , s4 , s6 is contained in Γ (1). Hence the (Z/2)2 cover X (1)/X ∗(1) is ramified at all three elliptic points. Thus s2 lies under two points of X (1) with trivial stabilizer, while s4 lies under two points of index 2 and s6 under two points of index 3. By either the Riemann-Hurwitz formula or from (10) we see that X (1) has genus 0. This and the orders 2, 2, 3, 3 of the elliptic points do not completely specify Γ (1) up to conjugacy in PSL2 (R): to do that we also need the cross-ratio of the four elliptic points. Fortunately this cross-ratio is determined by the existence of the cover X (1) → X ∗(1), or equivalently of an involution s4 on X (1) that fixes the two order-2 points and switches the order-3 points. This forces the pairs of order-2 and order-3 points to have a cross-ratio of −1, or to “divide each other harmonically” as the Greek geometers would say. The function field of X (1) is generated by the square roots of c0 t and c1 (t−1) for some c0 , c1 ∈ Q∗ /Q∗ 2 , but we do not yet know which multipliers c0 , c1 are appropriate. If both c0 , c1 were 1 then X (1) would be a rational curve with coordinate u with t = ((u2 + 1)/2u)2 = 1 + ((u2 − 1)/2u)2 , the familiar parametrization of Pythagorean triples. The elliptic points of order 2 and 3 would then be at u = ±1 and u = 0, ∞. However it will turn out that the correct choices are c0 = −1, c1 = 3, and thus that X (1) is the conic with equation X 2 + Y 2 + 3Z 2 = 0
(34)
and no rational points even over R. [That X (1) is the conic (34) is announced in [Ku, p.279] and attributed to Ihara; that there are no real points on the Shimura
18
Noam D. Elkies
curve X (1) associated to any indefinite quaternion algebra over Q other than M2 (Q) was already shown by Shimura [S3]. The equation (34) for X (1) does not uniquely determine c0 , c1 , but the local methods of [Ku] could probably supply that information as well.] 3.2
Shimura Modular Curves X0∗ (l) and X (l) for l = 5, 7, 13
Let l be a prime other than the primes 2, 3 of Σ. We determine the genus of the curve X0∗ (l) using the formula (10). Being a cover of X ∗(1) of degree l + 1, the curve X0∗ (l) has normalized hyperbolic area (l + 1)/12. It has 1 + (−6/l) elliptic points of order 2, 1 + (−1/l) elliptic points of order 4, and 1 + (−3/l) elliptic points of order 6. This is a consequence of our computation of s2 ,√s4 , s6 , which of A that generate subfields isomorphic with Q( −6 ), √ √ lift to elements Q( −1 ), and Q( −3 ). Actually the orders 2, 4, 6 of the elliptic points suffice. Consider the images of s2 , s4 , s6 in the Galois group (⊆ PGL2 (Fl )) of the cover X0∗(l)/X ∗ (1), and the cycle structures of their actions on the l + 1 points of P1 (Fl ). These images σ2 , σ4 , σ6 are group elements of order 2, 4, 6. For 4 and 6, the order determines the conjugacy class, which joins as many of the points of P1 (Fl ) as possible in cycles of length 4 or 6 respectively and leaves any remaining points fixed; the number of fixed points is two or none according to the residue of l mod 4 or 6. For σ2 there are two conjugacy classes in PGL2 (Fl ), one with two fixed points and the other with none, but the choice is determined by the condition that the genus g(X0∗ (l)) be an integer, or equivalently by the requirement that the signs of σ2 , σ4 , σ6 considered as permutations of P1 (Fl ) be consistent with s2 s4 s6 = 1. We readily check that this means that the image of s2 has two fixed points if and only if (−6/l) = +1, as claimed. From (10) we conclude that −6 −1 −3 1 ∗ l−6 −9 − 10 . (35) g(X0 (l)) = 24 l l l We tabulate this for l < 50: 5 7 11 13 17 19 23 29 31 37 41 43 47 l g(X0∗ (l)) 0 0 1 0 1 1 2 1 1 1 2 2 3 It so happens that in the first seven cases g(X0∗ (l)) coincides with the genus of the classical modular curve X0 (l), but of course this cannot go on forever because the latter genus is l/12 + O(1) while the former is only l/24 + O(1), and indeed g(X0∗ (l)) is smaller for all l > 23. Still, as with X0 (l), we find that X0∗(l) has genus 0 for l = 5, 7, 13, but not for l = 11 or any l > 13. For the three genus-0 cases we shall use the ramification behavior of the cover X0∗(l)/X ∗ (1) to find an explicit rational function of degree l + 1 on P1 that realizes that cover and determine the involution wl . Now for any l > 3 the solution of σ2 σ4 σ6 = 1 in elements σ2 , σ4 , σ6 of orders 2, 4, 6 in PGL2 (Fl ) is unique up to conjugation in that group. Thus we know from the general theory of [Mat] that the cover X0∗ (l)/X ∗ (1) is determined by its Galois group and ramification data. Unfortunately the proof of this fact
Shimura Curve Computations
19
does not readily yield an efficient computation of the cover; for instance the Riemann existence theorem for Riemann surfaces is an essential ingredient. We use a method for finding the rational function t : X0∗(l) → X ∗(1) explicitly that amounts to solving for its coefficients, using the cycle structures of σ2 , σ4 , σ6 to obtain algebraic conditions. In effect these conditions are the shape of the divisors (t)0 , (t)1 , (t)∞ . But a rational function satisfying these conditions is not in general known to have the right Galois group: all we know is that the monodromy elements around 0, 1, ∞ have the right cycle structures in the symmetric group Sl+1 . Thus we obtain several candidate functions, only one of which has Galois group PGL2 (Fl ) (or PSL2 (Fl ) if l ≡ 1 mod 24). Fortunately for l = 5, 7 we can exclude the impostors by inspection, and for l = 13 the computation has already been done for us. l=5. Here the cycle structures of s2 , s4 , s6 are 2211, 411, 6. Curiously if the identity in the symmetric group S6 is written as the product of three permutations σ2 , σ4 , σ6 with these cycle structures then they can never generate all of S6 . This can be seen by considering their images σ20 , σ40 , σ60 under an outer automorphism of S6 : these have cycle structures 2211, 411, 321, and thus have too many cycles to generate a transitive subgroup (if two permutations of n letters generate a transitive subgroup of Sn then they and their product together have at most n + 2 cycles). It turns out that the subgroup generated by σ20 , σ40 , σ60 can be either A4 ×S2 or the point stabilizer S5 . In the former case σ2 , σ4 , σ6 generate a transitive but imprimitive subgroup of S6 : the six letters are partitioned into three pairs, and the group consists of all permutations that respect this partition and permute the pairs cyclically. In the latter case σ2 , σ4, σ6 generate PGL2 (F5 ); this is the case we are interested in. In each of the two cases the triple (σ2 , σ4 , σ6) is determined uniquely up to conjugation in the subgroup of S6 generated by the σ’s, each of which is in a rational conjugacy class in the sense of [Mat]. Thus each case corresponds to a unique degree-6 cover P1 → P1 defined over Q. We shall determine both covers. Let t be a rational function on P1 ramified only above t = 0, 1, ∞ with cycle structures 2211, 411, 6. Choose a rational coordinate x on P1 such that x = ∞ is the sextuple pole of t and x = 0 is the quadruple zero of t − 1; this determines x up to scaling. Then t is a polynomial of degree 6 in x with two double roots such that t ≡ 1 mod x4 . The double roots are necessarily the roots of the quadratic polynomial x−3 dt/dx. Thus t is a polynomial of the form c6 x6 + c5 x5 + c4 x4 + 1 divisible by 6c6 x2 +5c5 x+4c4 . We readily compute that there are two possibilities for c4 , c5 , c6 up to scaling (c4 , c5 , c6 ) → (λ4 c4 , λ5 c5 , λ6 c6 ). One possibility gives t = 2x6 − 3x4 + 1 = (x2 − 1)2 (2x2 + 1); being symmetric under x ↔ −x this must be the imprimitive solution. Thus the remaining possibility must give the PGL2 (F5 ) cover X0∗(5)/X ∗ (1). The following choice of scaling of x = x5 seems simplest: t = 540x6 + 324x5 + 135x4 + 1 (36) = 1 + 27x4 (20x2 + 12x + 5) = (15x2 − 6x + 1)(6x2 + 3x + 1)2 . The elliptic points of order 2 and 4 on X0∗ (5) are the simple zeros of t and t − 1 respectively, i.e. the roots of 15x2 − 6x + 1 and 20x2 + 12x + 5. The involution
20
Noam D. Elkies
w5 switches each elliptic point with the other elliptic point of the same order; this suffices to determine w5 . The fact that two pairs of points on P1 switched by an involution of P1 determine the involution is well-known, but we have not found in the literature an explicit formula for doing this. Since we shall need this result on several occasion we give it in an Appendix as Proposition A. Using that formula (89), we find that w5 (x) =
42 − 55x . 55 + 300x
(37)
l=7. This time s2 , s4 , s6 have cycle structures 22211, 44, 611. Again there are several ways to get the identity permutation on 8 letters as a product of three permutations with these cycle structures, none of which generate the full symmetric group S8 . There are two ways to get the imprimitive group 24 : S4 ; the corresponding covers are obtained from the S4 cover t = 4ξ 3 − 3ξ 4 by taking ξ = x2 +ξ0 where ξ0 is either root of the quadratic 3ξ 2 +2ξ +1 = (1 −t)/(ξ −1)2 . The remaining solution corresponds to our PGL2 (F7 ) cover. To find that cover, let t be a rational function on P1 ramified only above t = 0, 1, ∞ with cycle structures 2211, 411, 6, and choose a rational coordinate x on P1 such that x = ∞ is the sextuple pole of t. This determines x up to an affine linear transformation. Then there is a cubic polynomial P and quadratic relatively prime polynomials Q1 , Q2 , Q3 in x such that t = P 2 Q1 /Q3 = 1 + , i.e. such that P 2 Q1 −Q42 is quadratic. Equivalently, the Taylor expansion Q42 /Q3√ 2 of Q2 / Q1 about x = ∞ should have vanishing x−1 and x−2 coefficients, and then R(x) is obtained by truncating that Taylor expansion after its constant term. We assume without loss of generality that Q1 , Q2 are monic. By translating x (a.k.a. “completing the square”) we may assume that Q1 is of the form x2 + α. If the same were true of Q2 then t would be a rational function of x2 and we would have an imprimitive cover. Thus the constant coefficient of Q2 is nonzero, and by scaling x we√may take Q2 = x2 + x + β. We then set the x−1 , x−2 coefficients of of Q22 / Q1 to zero, obtaining the equations 3α2 − 8αβ + 8β 2 − 4α = 3α2 − 4αβ = 0.
(38)
Thus either α = 0 or α = 4β/3. The first option yields β = 0 which fails because then Q1 , Q2 have the common factor x. The second option yields β = 0, which again fails for the same reason, but also β = 2 which succeeds. Substituting −(2x + 1)/3 for x to reduce the coefficients we then find: t=−
(4x2 + 4x + 25)(2x3 − 3x2 + 12x − 2)2 108(7x2 − 8x + 37) (39) (2x2 − x + 8)4 . =1− 108(7x2 − 8x + 37)
The elliptic points of order 2 and 6 on X0∗ (7) are respectively the simple zeros and poles of t, i.e. the roots of 4x2 + 4x + 25 and 7x2 − 38x + 7. The involution
Shimura Curve Computations
21
w7 is again by the fact that it switches each elliptic point with the other elliptic point of the same order: it is w7 (x) =
116 − 9x . 9 + 20x
(40)
l=13. Here the cycle structures are 27 , 44411, 6611. The computation of the degree-14 map is of course much more complicated than for the maps of degrees 6, 8 for l = 5, 7. Fortunately this computation was already done in [MM, §4] (a paper concerned not with Shimura modular curves but with examples of rigid PSL2 (Fp ) covers of the line). There we find that there is a coordinate x = x13 on X0∗ (13) for which t=1−
=
27 (x2 + 36)(x3 + x2 + 35x + 27)4 4 (7x2 + 2x + 247)(x2 + 39)6
(41)
(x7 − 50x6 + 63x5 − 5040x4 + 783x3 − 168426x2 − 6831x − 1864404)2 . 4(7x2 + 2x + 247)(x2 + 39)6
The elliptic points of order 4 and 6 on X0∗ (13) are respectively the simple zeros and poles of t − 1, i.e. the roots of x2 + 36 and 7x2 + 2x + 247. Once more we use (89) to find the involution from the fact that it switches each elliptic point with the other elliptic point of the same order: w13 (x) =
5x + 72 . 2x − 5
(42)
From an equation for X ∗ (l) and the rational map t on that curve we recover X0 (l) by adjoining square roots of c0 t and c1 (t − 1). For each of our three cases l = 5, 7, 13 the resulting curve has genus 1, and its Jacobian is an elliptic curve of conductor 6l — but only if we choose c0 , c1 that give the correct quadratic twist. For l = 5, l = 7, l = 13 it turns out that we must take a square root of 3t(1 − t), −t, 3(t − 1) respectively. Fortunately these are consistent and we obtain c0 = −1 and c1 = 3 as promised. The resulting curves X0 (5), X0 (7), X0 (13) have no rational or even real points (because this is already true of the curve X (1) which they all cover); their Jacobians are the curves numbered 30H, 42C, 78B in the Antwerp tables in [BK] compiled by Tingley et al., and and 30-A8, 42-A3, 78-A2 in Cremona [C]. 3.3
Supersingular Points on X ∗ (1) mod l
We have noted that Ihara’s description of supersingular points on Shimura curves is particularly simple in the case of a triangle group: the non-elliptic supersingular points are roots of a hypergeometric polynomial, and the elliptic points are CM in characteristic zero so the Deuring test determines whether each one is supersingular or not. In our case, The elliptic points t = 0, = ∞ are supersingular mod l √ t = 1, t √ √ if and only iff l is inert in Q( −6 ), Q( −1 ), Q( −3 ) respectively, i.e. iff −6,
22
Noam D. Elkies
−1, −3 is a quadratic nonresidue of l. Thus the status of all three elliptic points depends on l mod 24, as shown in the next table: l mod 24 t
e
1 5 7 11 13 17 19 23 0 2 • • • • • • • • 1 4 • • • • ∞ 6 (bullets mark elliptic points with supersingular reduction). This could also be obtained from the total mass (l + 1)/24 of supersingular points, together with the fact that the contribution to this mass of the non-elliptic points is integral: in each column the table shows the unique subset of 1/2, 1/4, 1/6 whose sum is congruent to (l + 1)/24 mod 1. The hypergeometric polynomial whose roots are the non-elliptic supersingular points has degree bl/24c, and depends on l mod 24 as follows: 1 5 1 , 24 ; 2 ; t), if l ≡ 1 or 5 mod 24; F ( 24 7 11 1 F ( , ; ; t), if l ≡ 7 or 11 mod 24; 24 24 2 (43) 13 17 3 , ; ; t), if l ≡ 13 or 17 mod 24; F ( 24 24 2 19 23 3 if l ≡ 19 or 23 mod 24. F ( 24 , 24 ; 2 ; t), For example, for l = 163(≡ 19 mod 24) we find F(
19 23 3 , ; ; t) = 43t6 + 89t5 + 97t4 + 52t3 + 149t2 + 132t + 1 24 24 2 = (t + 76)(t + 78)(t + 92)(t + 127)(t2 + 65t + 74) (44)
in characteristic 163, so the supersingular points mod 163 are 0, 1, and the roots of (44) in F1632 . 3.4
CM Points on X ∗ (1) via X0∗ (l) and wl
We noted already that the elliptic points t = 0, 1, ∞ on X ∗(1) are CM points, with discriminants −3, −4, −24. Using our formulas for X0 (l) and wl (l = 5, 7, 13) we can obtain fourteen further CM points: three points isogenous to one of the elliptic CM points, and eleven more points cyclically isogenous to themselves. This accounts for all but ten of the 27 rational CM points on X ∗ (1). The discriminants of the three new points isogenous to t = 1 or t = ∞ are determined by the isogenies’ degrees. The discriminants of the self-isogenous points can be surmised by testing them for supersingular reduction at small primes: in each case only one discriminant small enough to admit a self-isogeny of that degree has the correct quadratic character at the first few primes, which is then confirmed by extending the test to all primes up to 200. On X0∗ (5) the image of x5 = ∞ under w5 is −11/60, which yields the CM point t = 152881/138240; likewise from w5 (0) = 42/55 we recover the point 421850521/1771561. These CM points are 5-isogenous with the elliptic points
Shimura Curve Computations
23
t = ∞, t = 1 respectively, and thus have discriminants −3 · 52 and −4 · 52 . Similarly on X0∗(7) we have w7 (∞) = −9/20 at which t = −1073152081/3024000000, a CM point 7-isogenous with t = ∞ and thus of discriminant −3 · 72 . For each of l = 5, 7, 13 the two fixed points of wl on X0∗ (l) are rational and yields two new CM points of discriminants −cl for some factors c of 24. For X0∗(5) these fixed points are x5 = −3/5 and x5 = 7/30, at which t = 2312/125 and t = 5776/3375 respectively; these CM points have discriminants −40, −120 by the supersingular test. For X0∗ (7) we find x7 = 2 and x7 = −29/10, and thus t = −169/27, t = −701784/15625 of discriminants −84, −168 divisible by 7. For X0∗(13) the fixed points x13 = 9, x13 = −4 yield t = 6877/15625 and t = 27008742384/27680640625, with discriminants −52 = 4 · 13 and −312 = 24 · 13. Each of these new CM points admits an l-isogeny to itself. By solving the equation t(xl ) = t(wl (xl )) we find the remaining such points; those not accounted for by fixed points of wl admit two self-isogenies of degree l, and correspond to a quadratic pair of xl values over Q(t). As it happens all the t’s thus obtained are rational with the exception of a quadratic pair coming from the quartic 167x413 −60x313 +12138x213 −1980x13 +221607 = 0. Those points are: from X0∗ (5), the known t = 1, t = −169/25, and the new t = −1377/1024, t = 3211/1024 of discriminants −51, −19; from X0∗ (7), the CM points t = 0, 152881/138240, 3211/1024, 2312/125, 6877/15625 seen already, but also t = 13689/15625 of discriminant −132; and from X0∗ (13), seven of the CM points already known and also the two new values t = 21250987/16000000, 15545888/20796875 of discriminants −43, −88. 3.5
Numerical Computation of CM Points on X ∗ (1)
If we could obtain equations for the modular cover of X ∗ (1) by the elliptic curve X ∗ (11), X ∗(17) or X ∗ (19) we could similarly find a few more rational CM points on X ∗(1). But we do not know how to find these covers, let alone the cover X ∗(l) for l large enough to get at the rational CM point of discriminant −163; moreover, some applications may require irrational CM points of even higher discriminants. We thus want a uniform way of computing the CM points of any given discriminant as an algebraic irrationality. We come close to this by finding these points and their algebraic conjugates as real (or, in the irrational case, complex) numbers to high precision, and then using continued fractions to recognize their elementary symmetric functions as rational numbers. We say that this “comes close” to solving the problem because, unlike the case of the classical modular functions such as j, we do not know a priori how much precision is required, since the CM values are generally not integers, nor is an effective bound known on their height. However, even when we cannot prove that our results are correct using an isogeny of low degree, we are quite confident that the rational numbers we exist are correct because they not only match their numerical approximations to many digits but also pass all the supersingularity tests we tried as well as the condition that differences between pairs of CM values are products of small primes as in [GZ].
24
Noam D. Elkies
To do this we must be able to compute numerically the rational function t : ∼ H/Γ ∗(1)→P1 . Equivalently, we need to associate to each t ∈ P1 a representative of its corresponding Γ ∗(1)-orbit in H. We noted already that this is done, up to a fractional linear transformation over C, by the quotient of two hypergeometric functions in t. To fix the transformation we need images of three points, and we naturally choose the elliptic points t = 0, 1, ∞. These go to fixed points of s2 , s4 , s6 ∈ Γ ∗(1), and to find those fixed points we need an explicit action of Γ ∗(1) on H. To obtain such an action we must imbed that group into Aut(H) = PSL2 (R). Equivalently, we must choose an identification of A ⊗ R with the algebra M2 (R) of 2 × 2 real matrices. Having done this, to obtain the action of some g ∈ Γ ∗(1) ⊂ A∗/Q∗ on H we will choose a representative of g in A∗, identify this representative with an invertible matrix ( ac db ) of positive determinant, and let g act on z ∈ H by z 7→ (az + b)/(cz + d). Identifying A ⊗ R with M2 (R) is in turn tantamount to solving (30) in M2 (R). We choose the following solution: √ √ 2 √ 0 3 0 √ , c := . (45) b := 0 − 2 − 3 0 The elliptic points are then the Γ ∗(1) orbits of the fixed points in the upper half-plane of s2 , s4 , s6 , that is, of √ √ √ 1+ 2 (46) P2 := (1 + 2)i, P4 := √ (−1 + 2 i), P6 := i. 3 Thus for |t| < 1 the point on H/Γ ∗ (1) which maps to t is the Γ ∗ (1) orbit of z near P2 such that (47) (z − P2 )/(z − P¯2 ) = F1 (t)/F2 (t) for some solutions F1 , F2 of the hypergeometric equation (18). Since the fractional linear transformation z 7→ (z − P2 )/(z − P¯2 ) takes the hyperbolic lines P2 P4 and P2 P6 to straight lines through the origin, √ F2 must be a power series in t, and F1 is such a power series multiplied by t; that is, 1 5 1 13 17 3 , , ,t F , , , t) . (48) (z − P2 )/(z − P¯2 ) = Ct1/2 F 24 24 2 24 24 2 for some nonzero constant C. We evaluate C by taking t = 1 in (48). Then z = P4 , which determines the left-hand side, while the identity [GR, 9.122] F (a, b; c; 1) =
Γ (c)Γ (c − a − b) Γ (c − a)Γ (c − b)
(49)
gives us the coefficient of C in the right-hand side in terms of gamma functions. We find C = (.314837 . . .)i/(2.472571 . . .) = (.128545 . . .)i. Likewise we obtain convergent power series for computing z in neighborhoods of t = 1 and t = ∞. Now let D be the discriminant of an order OD in a quadratic imaginary field √ Q( D) such that OD has a maximal embedding in O (i.e. an embedding such
Shimura Curve Computations
25
that OD = (OD ⊗ Q) ∩ O) and the embedding is unique up to conjugation in Γ ∗(1). Then there is a unique, and therefore rational, CM point on X ∗ (1) of discriminant D. Being rational, the point is real, and thus can be found on one of the three hyperbolic line segments P2 P4 , P2 P6 , P4 P6 . It is thus the fixed point of a positive integer combination, with coprime coefficients, of two of the elliptic elements s2 = bc + 2c, s4 = (2 + b)(1 + c)/2, s6 = (3 + c)/2 with fixed points P2 , P4 , P6. In each case a short search finds the appropriate linear combination and thus the fixed point z. Using (48) or the analogous formulas near t = 1, t = ∞ we then solve for t as a real number with sufficient accuracy (60 decimals was more than enough) to recover it as a rational number from its continuedfraction expansion. 3.6
Tables of Rational CM Points on X ∗ (1)
There are 27 rational CM points on X ∗ (1). We write the discriminant D of each of them as −D0 D1 where D0 |24 and D1 is coprime to 6. In Table 1 we give, for each |D| = D0 D1 , the integers A, B with B ≥ 0 such that (A : B) is the t-coordinate of a CM point of discriminant D. In the last column of this table we indicate whether the point was obtained algebraically (via an isogeny of degree 5, 7, or 13) and thus proved correct, or only computed numerically. The CM points are listed in order of increasing height max(|A|, B). In Table 2 we give, for each except the first three cases, the factorizations of |A|, B, |C| where C = A − B, and also the associated “ABC ratio” [E1] defined by r = log N (ABC)/ log max(|A|, B, |C|). As expected, the A, B, C values are “almost” perfect squares, sixth powers, and fourth powers respectively: a prime at which at which the valuation of A, B, C is not divisible by 2, 6, 4 resp. is either 2, 3, or the unique prime in D1 . When D1 > 1 its unique prime factor is listed at the end of the |A|, B, or |C| factorization in which it appears; otherwise the prime factors are listed in increasing order. In the factorization of the difference between the last two t = A/B values in this table, the primes not accounted for by common factors in the last two rows of the table are 79, 127, 271, 907, 2287, 2971, 3547, each occurring once.
4 4.1
The Case Σ = {2, 5} The Quaternion Algebra and the Curves X (1), X ∗ (1)
For this section we let A be the quaternion algebra ramified at {2, 5}. This time A is generated over Q by elements b, e satisfying b2 + 2 = e2 − 5 = be + eb = 0,
(50)
and the conjugate and norm of an element α = α1 + α2 b + α3 e + α4 be ∈ A are α ¯ = α1 − α2 b − α3 e − α4 be,
αα ¯=α ¯ α = α21 + 2α22 − 5α23 − 10α24 .
(51)
The elements b and (1 + e)/2 generate a maximal order, which we use for O.
26
Noam D. Elkies
Table 1 |D| D0 D1 A B proved? 3 3 1 1 0 Y 4 4 1 1 1 Y 24 24 1 0 1 Y 84 12 7 −169 27 Y 40 8 5 2312 25 Y 51 3 17 −1377 1024 Y 19 1 19 3211 1024 Y 120 24 5 5776 3375 Y 52 4 13 6877 15625 Y 132 12 11 13689 15625 Y 75 3 52 152881 138240 Y 168 24 7 −701784 15625 Y 43 1 43 21250987 16000000 Y 228 12 19 66863329 11390625 N 88 8 11 15545888 20796875 Y 123 3 41 −296900721 16000000 N 100 4 52 421850521 1771561 Y 147 3 72 −1073152081 3024000000 Y 312 24 13 27008742384 27680640625 Y 67 1 67 77903700667 1024000000 N 148 4 37 69630712957 377149515625 N 372 12 31 −455413074649 747377296875 N 408 24 17 −32408609436736 55962140625 N 267 3 89 −5766681714488721 1814078464000000 N 232 8 29 66432278483452232 56413239012828125 N 708 12 59 71475755554842930369 224337327397603890625 N 163 1 163 699690239451360705067 684178814003344000000 N
By (9), the curve X ∗ (1) has hyperbolic area 1/6. Since the algebra A is not among the nineteen algebras listed in [T] that produce arithmetic triangle groups, X ∗ (1) must have at least four elliptic points. On the other hand, by (10) a curve of area as small as 1/6 cannot have more than four elliptic points, and if it has exactly four then their orders must be 2, 2, 2, 3. Indeed we find in Γ ∗(1) the elements of finite order s2 = [b], s02 = [2e + 5b − be], s002 = [5b − be], s3 = [2b − e − 1]
(52)
[NB 2e + 5b − be, 5b − be, 2b − e − 1 ∈ 2O] of orders 2, 2, 2, 3 with s2 s02 s002 s3 = 1. As in the case of the G2,4,6 we conclude that here Γ ∗(1) has the presentation hs2 , s02 , s002 , s3 |s22 = s02 = s002 = s33 = s2 s02 s002 s3 = 1i. 2
2
(53)
Of the four generators only s3 is in Γ (1); thus the (Z/2)2 cover X (1)/X ∗(1) is ramified at the elliptic points of order 2. Therefore X (1) is a rational curve with four elliptic points of order 3, and Γ (1) is generated by four 3-cycles whose product is the identity, for example by s3 and its conjugates by s2 , s02 , s002 . (The
Shimura Curve Computations
27
Table 2 |D| 84 40 51 19 120 52 132 75 168 88 43 228 123 100 147 312 67 148 372 408 267 232 708 163
D0 12 8 3 1 24 4 12 3 24 8 1 12 3 4 3 24 1 4 12 24 3 8 12 1
D1 |A| B |C| 7 132 33 22 72 5 23 172 53 37 17 34 17 210 74 19 132 19 210 37 4 2 3 3 5 2 19 3 5 74 2 6 13 23 13 5 22 37 11 34 132 56 24 112 52 172 232 210 33 5 114 3 5 2 6 7 2 3 19 5 114 72 5 2 2 6 3 11 2 17 41 5 11 37 74 43 192 372 43 210 56 37 74 2 2 2 6 6 6 4 19 13 17 37 3 5 2 7 192 4 2 2 10 6 41 3 13 23 41 2 5 74 194 2 2 2 2 6 5 19 23 47 11 24 37 74 5 72 172 412 472 210 33 56 7 114 234 4 5 2 2 6 6 13 2 3 17 43 13 5 11 74 234 2 2 2 16 6 67 13 43 61 67 2 5 37 74 114 2 2 2 6 6 37 13 47 71 37 5 17 22 37 74 114 31 132 232 372 612 33 56 116 22 74 194 312 17 26 132 192 432 672 36 56 173 74 114 314 6 2 2 2 2 16 6 6 89 3 13 17 19 71 89 2 5 11 74 314 434 3 2 2 2 2 2 6 6 3 29 2 13 17 41 89 113 5 23 29 37 74 114 194 59 34 132 192 232 372 412 1092 56 176 296 28 74 114 474 592 163 132 672 1092 1392 1572 163 210 56 116 176 311 74 194 234
r 1.19410 0.80487 0.84419 0.90424 0.95729 1.00276 0.87817 0.98579 0.79278 0.86307 0.92839 0.96018 0.90513 0.88998 0.96132 0.83432 0.89267 0.94008 0.99029 0.88352 0.87610 0.91700 0.91518 0.90013
genus and number of elliptic points of X (1), X ∗(1), but not the generators of Γ (1), Γ ∗(1), are already tabulated in [V, Ch.IV:2].) 4.2
Shimura Modular Curves X0∗ (l), in Particular X0∗ (3)
The elliptic elements s3 , s2 , s02 , s002 have discriminants −3, −8, −20, −40. Thus the curve X0∗ (l) has genus −3 −2 −5 −10 1 l−4 −3 −3 −3 . (54) g(X0∗ (l)) = 12 l l l l Again we tabulate this for l < 50: 3 7 11 13 17 19 23 29 31 37 41 43 47 l g(X0∗ (l)) 0 0 1 1 2 1 2 3 3 3 3 3 4 Since g(X0∗ (l)) ≥ (l − 13)/12, the cases l = 3, 7 of genus 0 occurring in this table are the only ones. We next find an explicit rational functions of degree 4 on P1 that realizes the cover X0∗(3)/X0∗ (1), and determine the involution w3 . The curve X0∗ (3) is a degree-4 cover of X ∗(1) with Galois group PGL2 (F3 ) and cycle structures 31, 211, 211, 22 over the elliptic points P3 , P2 , P20 , P200. Thus
28
Noam D. Elkies
there are coordinates τ, x on X ∗ (1), X0∗(3) such that τ (x) = (x2 − c)2 /(x − 1)3 for some c. To determine the parameter c, we use the fact that w3 fixes the simple pole x = ∞ and takes each simple preimage of the 211 points P2 , P20 to the other simple preimage of the same point. That is, (x2 − c)−1 (x − 1)4
dx = x2 − 4x + 3c dt
(55)
must have distinct roots xi (i = 1, 2) that yield quadratic polynomials (x − 1)3 (τ (x) − τ (xi )) (x − xi )2
(56)
with the same x coefficient. We find that this happens only for c = −5/3, i.e. that τ = (3x2 + 5)2 /9(x − 1)3 . For future use it will prove convenient to use t=
(6x − 6)3 63 = , 2 9τ + 8 (x + 1) (9x2 − 10x + 7)
(57)
0 with w3 (x) = 10 9 −x. [Smaller coefficients can be obtained by letting x = 1+2/x , 0 0 02 0 2 0 0 0 0 τ = 2t /9, when t = (2x + 3x + 3) /x and w3 (x ) = −9x /(4x + 9). But our choice of x will simplify the computation of the Schwarzian equation, while the choice of t will turn out to be the correct one 3-adically.] The elliptic points are then P6 : t = 0, P200 : t = 27, and P2 , P20 : t = ∞, 2. In fact the information so far does not exclude the possibility that the pole of t might be at P20 instead of P2 ; that in fact t(P2 ) = ∞, t(P20 ) = 2 and not the other way around can be seen from the order of the elliptic points on the real locus of X ∗(1), or (once we compute the Schwarzian equation) checked using the supersingular test.
4.3
CM Points on X ∗ (1) via X0∗ (3) and w3
From w3 we obtain five further CM points. Three of these are 3-isogenous to known elliptic points: w3 takes the triple zero x = 1 of t to x = 1/9, which gives us t = −192/25, the point 3-isogenous to P3 with discriminant −27; likewise w3 takes the double root x = 5 and double pole x = −1 of t − 2 to x = −35/9, 19/9 and thus to t = −2662/169 and t = 125/147, the points 3-isogenous to t = 2 and t = ∞ and thus (once these points are identified with P20 and P2 ) of discriminants −180 and −72. One new CM point comes from the other fixed point x = 5/9 of w3 , which yields t = −27/49 of discriminant −120. Finally the remaining solutions of t(x) = t(w3 (x)) are the roots of 9x2 − 10x + 65; the resulting CM point t = 64/7, with two 3-isogenies to itself, turns out to have discriminant −35. 4.4
The Schwarzian Equation on X ∗ (1)
We can take the Schwarzian equation on X ∗ (1) to be of the form t(t − 2)(t − 27)f 00 + (At2 + Bt + C)f 0 + (Dt + E) = 0.
(58)
Shimura Curve Computations
29
The coefficients A, B, C, D are then forced by the indices of the elliptic points. Near t = 0, the solutions of (58) must be generated by functions with leading terms 1 and t1/3 ; near t = 2 (t = 27), by functions with leading terms 1 and (t − 2)1/2 (resp. (t − 27)1/2 ); and at infinity, by functions with leading terms t−e and t−e−1/2 for some e. The conditions at the three finite singular points t = 0, 2, 27 determine the value of the f 0 coefficient at those points, and thus yield A, B, C, which turn out to be 5/3, −203/6, 36. Then e, e + 1/2 must be roots of an “indicial equation” e2 − 2e/3 + D = 0, so e = 1/12 and D = 7/144. Thus (58) becomes t(t − 2)(t − 27)f 00 +
10t2 − 203t + 216 0 7t f +( + E) = 0. 6 144
(59)
To determine the “accessory parameter” E, we again use the cover X0∗ (3)/X ∗(1) and the involution w3 . A Schwarzian equation for X0∗(3) is obtained by substituting t = (6x − 6)3 /(x + 1)2 (9x2 − 10x + 17) in (59). The resulting equation will not yet display the w3 symmetry, because it will have a spurious singular point at the double pole x = −1 of t(x). To remove this singularity we consider not f(t(x)) but (60) g(x) := (x + 1)−1/6 f(t(x)). The factor (x + 1)−1/6 is also singular at x = ∞, but that is already an elliptic point of X0∗ (3) and a fixed point of w3 . Let x = u + 5/9, so w3 is simply u ↔ −u. Then we find that the differential equation satisfied by g is 4(81u2 + 20)(81u2 + 128)2 g00 + 108u(81u2 + 128)(405u2 + 424)g0
(61) +(311 u4 − 163296u2 + 170496 + 72(18E + 7)(9u − 4)(81u2 + 128))g = 0. Clearly this has the desired symmetry if and only if 18E + 7 = 0. Thus the Schwarzian equation is t(t − 2)(t − 27)f 00 + 4.5
7 10t2 − 203t + 216 0 7t f +( − ) = 0. 6 144 18
(62)
Numerical Computation of CM Points on X ∗ (1)
We can now expand a basis of solutions of (62) in power series about each singular point t = 0, 2, 27, ∞ (using inverse powers of t − 27 2 for the expansion about ∞ to assure convergence for real t ∈ / [0, 27]). As with the Σ = {2, 3} case we need to identify A ⊗ R with M2 (R), and use the solution √ √ 2 5 √ 0 0 √ , e := . (63) b := − 2 0 0 − 5 of (50), analogous to (30). We want to proceed as we did for Σ = {2, 3}, but there is still one obstacle to computing, for given t0 ∈ R, the point on the hyperbolic quadrilateral formed by the fixed points of s2 , s02 , s002 , s3 at which t = t0 . In the Σ = {2, 3} case, the solutions of the Schwarzian equation were combinations of
30
Noam D. Elkies
hypergeometric functions, whose value at 1 is known. This let us determine two solutions whose ratio gives the desired map to H. But here Γ ∗(1) is not a triangle group, so our basic solutions of (62 are more complicated power series and we do not know a priori their values at the neighboring singular points. In general this obstacle can be overcome by noting that for each nonsingular t0 ∈ R its image in H can be computed from the power-series expansions about either of its neighbors and using the condition that the two computations agree for several choices of t0 to determine the maps to H. In our case we instead removed the obstacle using the non-elliptic CM points computed in the previous section. For example, we used the fact that t0 = 125/147 is the CM point of discriminant 72, and thus maps to the unique fixed point in H of (9b + 4e − be)/2, to determine the correct ratio of power series about t = 0 and t = 2. Two or three such points suffice to determine the four ratios needed to compute our map R → H to arbitrary accuracy; since we actually had five non-elliptic CM points, we used the extra points for consistency checks, and then used the resulting formulas to numerically compute the t-coordinates of the remaining CM points. There are 21 rational CM points on X ∗ (1). We write the discriminant D of each of them as −D0 D1 where D0 |40 and D1 is coprime to 10. Table 3 is organized in the same way as Table 1: we give, for each |D| = D0 D1 , the integers A, B with B ≥ 0 such that (A : B) is the t-coordinate of a CM point of discriminant D. The last column identifies with a “Y” the nine points obtained algebraically from the computation of X0∗ (3) and w3 . Some but not all of the remaining twelve points would move from “N” to “Y” if we also had the equations for the degree-8 map X0∗(7) → X ∗ (1) and the involution w7 on X0∗(7). It will be seen that the factor 3 3 in our normalization (57) of t was needed7 to make t a good coordinate 3-adically: 3 splits in the CM field iff t is not a multiple of 3. In Table 4 we give the factorizations of |A|, B, |A−2B|, |A−27B|; as expected, |A| is always “almost” a perfect cube, and B, |A − 2B|, |A − 27B| “almost” a perfect square, any exceptional primes other than 2 or 5 being the unique prime in D1 , which if it occurs is listed at the end of its respective factorization.
5
Further Examples and Problems
Our treatment here is briefer because most of the ideas and methods of the previous sections apply here with little change. Thus we only describe new features that did not arise for the algebras ramified at {2, 3} and {2, 5}, and exhibit the final results of our computations of modular curves and CM points. 5.1
The Case Σ = {2, 7}
We generate A by elements b, g with b2 + 2 = g2 − 7 = bg + gb = 0, 7
3
(64)
On the other hand the factor 2 in (57) was a matter of convenience, to make the four elliptic points integral.
Shimura Curve Computations
31
Table 3 |D| D0 D1 A 3 1 1 0 8 8 1 1 20 20 1 2 40 40 1 27 52 4 13 −54 120 40 3 −27 35 5 7 64 27 1 33 −192 72 8 32 125 43 1 43 1728 180 20 32 −2662 88 8 11 3375 115 5 23 13824 280 40 7 35937 67 1 67 -216000 148 4 37 71874 340 20 17 657018 520 40 13 658503 232 8 29 176558481 760 40 19 13772224773 163 1 163 −2299968000
B proved? 1 Y 0 Y 1 Y 1 Y 25 N 49 Y 7 Y 25 Y 147 Y 1225 N 169 Y 98 N 3887 N 7406 N 8281 N 207025 N 41209 N 11257064 N 2592100 N 237375649 N 6692712481 N
and a maximal order O by Z[b, g] together with (1 +b +g)/2 (and b(1 +g)/2). By (9), the curve X ∗ (1) has hyperbolic area 1/4. Since Γ ∗(1) is not a triangle group (again by [T]), we again conclude by (10) that X ∗(1) has exactly four elliptic points, this time of orders 2, 2, 2, 4. We find in Γ ∗(1) the elements of finite order s2 = [b], s02 = [7b − 2g − bg], s002 = [7b + 2g − bg], s4 = [1 + 2b + g]
(65)
[NB 7b ± 2g − bg ∈ 2O] of orders 2, 2, 2, 4 with s2 s02 s002 s4 = 1, and conclude that 2 2 s2 , s02 , s002 , s4 generate Γ ∗(1) with relations determined by s22 = s02 = s002 = s44 = 0 00 ∗ s2 s2 s2 s4 = 1. None of these is in Γ (1): the representatives b, 1 + 2b + g of s2 , s4 have norm 2, while s02 , s002 have representatives (7b ± 2g − bg)/2 of norm 14. The discriminants of s4 , s2 , s02 , s002 are −4, −8, −56, −56; note that −56 is not among the “idoneal” discriminants (discriminants of imaginary quadratic fields with class group (Z/2)r ), and thus that the elliptic fixed points P20 , P200 of s02 , s002 are quadratic conjugates on X ∗ (1). Again we use the involution w3 on the modular curve X0∗ (3) to simultaneously determine the relative position of the elliptic points P4 , P2 , P20 , P200 on X ∗ (1) and the modular cover X0∗ (3) → X ∗ (1), and then to obtain a Schwarzian equation on X ∗ (1). Clearly P4 is completely ramified in X0∗ (3). Since −8 and −56 are quadratic residues of 3, each of P2 , P20 , P200 has ramification type 211. Thus X0∗(3) is a rational curve with six elliptic points all of index 2, and we may choose coordinates t, x on X ∗ (1), X0∗(3) such that t(P4 ) = ∞, t(P2 ) = 0, and x = ∞, x = 0 at the quadruple pole and double zero respectively of t.
32
Noam D. Elkies
Table 4 |D| 3 8 20 40 52 120 35 27 72 43 180 88 115 280 67 148 340 520 232 760 163
D0 1 8 20 40 4 40 5 1 8 1 20 8 5 40 1 4 20 40 8 40 1
D1 1 1 1 1 13 3 7 33 32 43 32 11 23 7 67 37 17 13 29 19 163
|A| B |A − 2B| |A − 27B| 0 1 2 33 1 0 1 1 2 1 0 52 33 1 52 0 3 2 2·3 5 23 13 36 33 72 53 2·33 52 26 7 2·52 53 6 2 2 2 3 5 2·11 172 3 3 2 2 5 7 3 13 22 312 6 3 2 2 2 2 3 5 7 2·19 36 43 2·113 132 23 53 3 52 172 33 53 2·72 172 11 36 9 3 2 2 2 2 3 13 23 2·5 11 36 53 3 3 2 3 2 3 11 2·23 7 5 13 38 52 26 33 53 72 132 2·112 312 38 67 2·33 113 52 72 132 25 172 37 38 292 3 3 2 2 3 2 2 2·3 23 7 29 2 5 13 7 36 54 3 3 3 2 2 4 2 2 3 29 2 7 47 13 5 11 17 38 52 43 33 113 173 22 52 72 232 132 192 532 36 712 29 33 173 473 72 312 712 52 112 132 372 19 2·38 53 672 29 33 53 113 72 132 292 312 2·192 592 792 36 172 732 163
We next determine the action of w3 on the elliptic points of X0∗ (3). Necessarily the simple preimages of P2 parametrize two 3-isogenies from P2 to itself. On the other hand the simple preimages of P20 parametrize two 3-isogenies from 00 that √ point to P2 and vice versa, because the squares of the primes above 3 in Q( −14) are not principal. Therefore w3 exchanges the simple preimages of P2 but takes each of the two simple points above P20 to one above P200 and vice versa. So again we have a one-parameter family of degree-4 functions on P1 , and a single condition in the existence of the involution w3 ; but this time it turns out that there are (up to scaling the coordinates t, x) two ways to satisfy this condition: t=
1 4 (x + 4x3 + 6x2 ), 3
w3 (x) =
1−x , 1+x
P20 , P200 : t2 − 3t + 3 = 0
(66)
and t=
1 4 (x + 2x3 + 9x2 ), 27
w3 (x) =
5 − 2x , 2+x
P20 , P200 : 16t2 + 13t + 8 = 0. (67)
How to choose the correct one? We could consider the next modular curve X0∗(5) and its involution to obtain a new condition that would be satisfied by only one of (66,67). Fortunately we can circumvent this laborious calculation by noting that the Fuchsian group associated with (66) is commensurable with a triangle group, since its three elliptic points of index 2 are the roots of (1 − t)3 = 1 and are thus
Shimura Curve Computations
33
permuted by a 3-cycle that fixes the fourth elliptic point t = ∞. The quotient by that 3-cycle is a curve parametrized by (1 − t)3 with elliptic points of order 2, 3, 12 at 1, 0, ∞. But by [T] there is no triangle group commensurable with an arithmetic subgroup of A∗ /Q∗; indeed √ we find there that G2,3,12 is associated with the quaternion algebra over Q( 3) ramified at the prime above 2 and at one of the infinite places of that number field.8 Therefore (67) is the correct choice. Alternatively, we could have noticed that since X (1) is a (Z/2)2 cover of X ∗ (1) ramified at all four elliptic points, it has genus 1, and then used the condition that this curve’s Jacobian have conductor 14 to exclude (66). The function field of X ∗ (1) is obtained by adjoining square roots of c0 t and c1 (16t2 + 13t + 8) for some c0 , c1 ; for the Jacobian to have the correct conductor we must have c0 c1 = 1 mod squares. The double cover of X0∗ (3) obtained by adjoining p c1 (16t2 + 13t + 8) also has genus 1, and so must have Jacobian of conductor at most 42; this happens only when c1 = −1 mod squares, the Jacobian being the elliptic curve 42-A3 (42C). The curve X (1) then has the equation y2 = −16s4 + 13s2 − 8
(t = −s2 ),
(68)
and its Jacobian is the elliptic curve 14-A2 (14D). Kurihara had already obtained in [Ku] an equation birational with (68). Let Γ00 (3r ) be the group intermediate between Γ0 (3r ) and Γ0∗(3r ) consisting of the elements of norm 1 or 7 mod Q∗ 2 . Then the corresponding curves X00 (3r ) (r > 0) of genus 3r−1 + 1 are obtained from X0∗ (3r ) by extracting a square root of t(16t2 + 13t + 8), and constitute an unramified tower of curves over the genus-2 curve X00 (3) : y2 = 3(4x6 + 12x5 + 75x4 + 50x3 + 255x2 − 288x + 648)
(69)
whose reductions are asymptotically optimal over Fl2 (l 6= 2, 3, 7) with each step in the tower being a cyclic cubic extension. (Of course when we consider only reductions to curves over Fl2 the factor of 3 in (69) may be suppressed.) Using w3 we may again find the coordinates of several non-elliptic CM points: t = 4/3 and t = 75/16 of discriminants −36 and −72, i.e. the points 3-isogenous to P4 and P2 , other than P4 , P2 themselves; t = 4/9 and t = 200/9 of discriminants −84 and −168, coming from the fixed points x = 1 and x = −5 of w3 ; and the points t = −1, t = −5 of discriminants −11 and −35, coming from the remaining solutions of t(x) = t(w3 (x)) and each with two 3-isogenies to itself. Even once the relative position of the elliptic points are known, the compuuller for tation of the cover X0∗ (5)/X ∗ (1) is not a trivial matter; I thank Peter M¨ performing this computation using J.-C. Faugere’s Gr¨ obner basis package GB. It turns out that there are eight PGL2 (F5 ) covers consistent with the ramification of which only one is defined over Q: t=− 8
(256x3 + 224x2 + 232x + 217)2 , 50000(x2 + 1)
w5 (x) =
24 − 7x . 7 + 24x
(70)
See [T], table 3, row IV. In terms of that algebra A0 , the triangle group G2,3,12 is Γ ∗ (1); the index-3 normal subgroup whose quotient curve is parametrized by the t of (66) is the normalized in Γ ∗ (1) of {[a] ∈ O∗ /{±1} : a ≡ 1 mod I2 }; and the intersection of this group with Γ0∗ (3) yields as quotient curve the P1 with coordinate x.
34
Noam D. Elkies
Table 5 |D| 4 8 11 35 36 84 72 91 43 168 88 100 67 280 148 532 232 427 163
D0 4 8 1 7 4 28 8 7 1 56 8 4 1 56 4 28 8 7 1
D1 A B 16A2 + 13AB + 8B 2 1 1 0 24 1 0 1 23 11 −1 1 11 5 −5 1 73 2 2 2 3 4=2 3 2 112 2 2 3 4=2 9=3 22 73 32 75 = 52 3 16 = 24 27 292 13 −13 81 = 34 73 112 2 4 43 −25 = −5 81 = 3 292 43 3 2 2 3 200 = 2 5 9=3 24 73 112 11 − 200 = −23 52 81 = 34 25 372 11 52 − 196 = −22 72 405 = 34 5 22 112 432 2 2 4 67 −1225 = −5 7 81 = 3 112 532 67 2 4 4 5 − 845 = −13 5 1296 = 2 3 28 73 112 37 1225 = 52 72 5184 = 26 34 24 112 672 37 19 96100 = 22 52 312 29241 = 34 192 22 73 112 292 372 5 2 2 4 4 29 135200 = 2 5 13 194481 = 3 7 23 112 532 1092 29 2 2 8 61 −3368725 = −5 47 61 6561 = 3 73 112 292 432 532 163 −2235025 = −52 132 232 1185921 = 34 114 372 1072 1492 163
This yields the CM points of discriminants −11, −35, −36, −84 already known from w3 , and new points of discriminants −91, −100, −280. This accounts for eleven of the nineteen rational CM points on X ∗ (1); the remaining ones were computed numerically as we did for the Σ = {2, 5} curve. We used the Schwarzian equation 3 3 t+ f = 0, (71) t(16t2 + 13t + 8)f 00 + (24t2 + 13t + 4)f 0 + 4 16 for which the “accessory parameter” 3/16 was again determined by pulling back to X0∗ (3) and imposing the condition of symmetry under w3 . We tabulate the coordinates t = A/B and factorizations for all nineteen points in Table 5. We see that t is also a good coordinate 3-adically: a point of X ∗ (1) is supersingular at 3 iff the denominator of its t-coordinate is a multiple of 3. (It is supersingular at 5 iff 5|t.) 5.2
The Case Σ = {3, 5}
Here the area of X ∗ (1) is 1/3. This again is small enough to show that there are only four elliptic points, but leaves two possibilities for their indices: 2,2,2,6 or 2,2,3,3. It turns out that the first of these is correct. This fact is contained in the table of [V, Ch.IV:2]; it can also be checked as we did in the cases Σ = {2, p} (p = 3, 5, 7) by exhibiting appropriate elliptic elements of Γ ∗ (1) — which we need to do anyway to compute the CM points. We chose to write write O =
Shimura Curve Computations
Z[ 21 1 + c, e] with
c2 + 3 = e2 − 5 = ce + ec = 0,
35
(72)
and found the elliptic elements s2 = [4c − 3e], s02 = [5c − 3e − ce], s002 = [20c − 9e − 7ce], s6 = [3 + c]
(73)
[NB 20c − 9e − 7ce, 3 + c ∈ 2O] of orders 2, 2, 2, 6 with s2 s02 s002 s6 = 1. The corresponding elliptic points P2 , P20 , P200, P6 have CM discriminants −3, −12, −15, −60. For the first time we have a curve X0∗ (2), and here it turns out that the elliptic points P20 is not ramified in the cover X0∗ (2)/X ∗(1): it admits two 2-isogenies to itself, and one to P 00 . Of the remaining elliptic points, P6 is complete ramified, and each of P2 , P200 has one simple and one double preimage. So we may choose coordinates x, t on X0∗ (2) and X ∗(1) such that t = x(x − 3)2 /4, with t(P6 ) = ∞, t(P2 ) = 0, t(P200 ) = 1. To determine t(P20 ) we use the involution w2 , which switches x = ∞ (the triple pole) with x = 0 (the simple zero), x = 4 (the simple preimage of P200 ) with one of the preimages x1 of P20 (the one parametrizing the isogeny from P20 to P200), and the other two preimages of P20 with each other. Then w2 is x ↔ 4x1 /x, so the product of the roots of (t(x1 ) − t(x))/(x − x1 ) is 4x1 . Thus (74) x(x − 3)2 − 4t(P20 ) = (x − x1 )(x2 + ax + 4x1 ) for some a. Equating x2 coefficients yields a = x1 − 6, and equating the coefficients of x we find 9 = 10x1 − x21 . Thus x1 = 1 or x1 = 9; but the first would give us t(P20 ) = 1 = t(P200 ) which is impossible. Thus x1 = 9 and t(P20 ) = 81, with w2 (x) = 36/x. This lets us find six further rational CM points, of discriminants −7, −28, −40, −48, −120, −240; we can also solve for the accessory parameter −1/2 in the Schwarzian equation 1 3 2 81 1 00 0 t − 82t + f + t− f = 0, (75) t(t − 1)(t − 81)f + 2 2 18 2 and use it to compute the remaining twelve rational CM points numerically. We tabulate the coordinates t = A/B and factorizations for the twenty-two rational CM points on X ∗ (1) in Table 6. An equivalent coordinate that is also good 2-adically is (t − 1)/4, which is supersingular at 2 iff its denominator is even. The elliptic curve X (1) is obtained from X ∗ (1) by extracting square roots of At and B(t − 1)(t − 81) for some A, B ∈ Q∗ /Q∗ 2 . Using the condition that the Jacobian of X (1), and any elliptic curve occurring in the Jacobian of X0 (2), have conductor at most 15 and 30 respectively, we find A = B = −3. Then X (1) has equation (76) y2 = −(3s2 + 1)(s2 + 27) (with t = −3s2 ) and Jacobian isomorphic with elliptic curve 15C (15-A1); the ∗ curve intermediate between X p (2) and X0 (2) whose function field is obtained ∗ from Q(X (2)) by adjoining −3(t − 1)(t − 81) has equation y2 = −3(x4 − 10x3 + 33x2 − 360x + 1296)
(77)
36
Noam D. Elkies
Table 6 |D| 3 12 60 15 7 40 43 195 48 120 28 115 147 123 67 240 267 435 795 235 555 163
D0 3 3 15 15 1 5 1 15 3 15 1 5 3 3 1 15 3 15 15 5 15 1
D1 A 1 1 22 0 22 1 1 81 = 34 7 −27 = −33 3 2 27 = 33 43 −27 = −33 13 81 = 34 4 2 243 = 35 3 2 −243 = −35 22 7 −675 = −33 52 23 621 = 33 23 2 7 −729 = −36 41 2025 = 34 52 67 −3267 = −33 112 24 9801 = 34 112 89 7225 = 52 172 29 21141 = 36 29 53 −6413 = −112 53 47 1269 = 33 47 37 23409 = 34 172 163 −1728243 = −33 112 232
B A−B 0 1 1 −1 1 0 1 24 5 1 −22 7 2 52 16 = 24 −43 16 = 24 5·13 1 112 2 2 −72 5 1 −22 132 16 = 24 112 5 4 112 = 2 7 −292 4 16 = 2 72 41 16 = 24 −72 67 1 23 52 72 4 16 = 2 34 89 4 16 = 2 53 132 432 = 24 33 −5·372 1024 = 210 5·72 10 1024 = 2 5·112 37 10 1024 = 2 −1032 163
81B − A −1 34 24 5 0 22 33 33 5 33 72 33 5 −34 2 34 5 22 33 7 33 52 34 112 −36 33 132 −23 35 5 −72 112 −34 5·72 5·72 132 33 52 112 35 5·72 33 72 372
and Jacobian 30C (30-A3). Fundamental domains for Γ ∗ (1) and Γ (1), computed by Michon [Mi] and drawn by C. L´eger, can be found in [V, pp.123–127]; an equation for X (1) birational with (76) is reported in the table of [JL, p.235]. 5.3
The Triangle Group G2,3,7 as an Arithmetic Group
It is well-known that the minimal quotient area of a discrete subgroup of Aut(H) = PSL2 (R) is 1/42, attained only by the triangle group G2,3,7, and that the Riemann surfaces H/Γ with Γ a proper normal subgroup of finite index in G2,3,7 are precisely the curves of genus g > 1 whose number of automorphisms attains Hurwitz’s upper bound 84(g − 1). Shimura observed in [S2] that this group is arithmetic.9 Indeed, let K be the totally real cubic field Q(cos 2π/7) of minimal discriminant 49, and let A be a quaternion algebra over K ramified at two of the three real places and at no finite primes of K. Now for any totally real number field of degree n > 1 over Q, and any quaternion algebra over that field ramified at n − 1 of its real places, the group Γ (1) of norm-1 elements of a maximal order embeds as a discrete subgroup of PSL2 (R) = Aut(H), with H/Γ of finite area 9
Actually this fact is due to Fricke [F1,F2], over a century ago; but Fricke could not relate G2,3,7 to a quaternion algebra because the arithmetic of quaternion algebras had yet to be developed.
Shimura Curve Computations
37
given by Shimizu’s formula 3/2 n Y (−1) dK ζK (2) Y (N℘ − 1) = n−2 ζK (−1) (N℘ − 1) Area(X (1)) n−1 2n 4 π 2 ℘∈Σ
(78)
℘∈Σ
(from which we obtained (8) by taking K = Q). Thus, in our case of K = Q(cos 2π/7), Σ = {∞, ∞0}, the area of H/Γ (1) is 1/42, so Γ (1) must be isomorphic with G2,3,7. From this Shimura deduced [S2, p.83] that for any proper ideal I ⊂ OK his curve X (I) = H/Γ (I) attains the Hurwitz bound. For instance, if I is the prime ideal ℘7 above the totally ramified prime 7 of Q then X (℘7 ) is the Klein curve of genus 3 with automorphism group PSL2 (F7 ) of order 168. The next-smallest example is the ideal ℘8 above the inert prime 2, which yields a curve of genus 7 with automorphism group [P]SL2 (F8 ) of order 504. This curve is also described by Shimura as a “known curve”, and indeed it first appears in [F3]; an equivalent curve was studied in detail only a few years before Shimura by Macbeath [Mac], who does not cite Fricke, and the identification of Macbeath’s curve with Fricke’s and with Shimura’s X (℘8 ) may first have been observed by Serre in a 24.vii.1990 letter to Abhyankar. At any rate, we obtain towers {X (℘r7 )}r>0 , {X (℘r8 )}r>0 of unramified abelian extensions which are asymptotically optimal over the quadratic extensions of residue fields10 of K other than F49 and F64 respectively, which are involved in the class field towers of exponents 7, 2 of the Klein and Macbeath curves over those fields. These towers are the Galois closures of the covers of X (1) by X0 (℘r7 ), X0 (℘r8 ), which again may be obtained from the curves X0 (℘7 ), X0 (℘8 ) together with their involutions. It turns out that these curves both have genus 0 (indeed the corresponding arithmetic subgroups Γ0 (℘7 ), Γ0 (℘8 ) of Γ (1) are the triangle groups G3,3,7, G2,7,7 in [T, class X]). The cover X0 (℘7 )/X (1) has the same ramification data as the degree-8 cover of classical modular curves X0 (7)/X(1), and is thus given by the same rational function t=
(x47 − 8x37 − 18x27 − 88x7 + 1409)2 213 33 (9 − x7 ) (79)
=1+
(x27
− 8x7 − 5) + 8x7 + 43) 213 33 (9 − x7 ) 3
(x27
(with the elliptic points of orders 2, 3, 7 at t = 0, 1, ∞, i.e. t corresponds to is different, though: it still switches the two simple 1 − 12−3 j). The involution √ zeros x7 = −4 ± −27 of t − 1, but it takes the simple pole x7 = 0 to itself instead of the septuple pole at x7 = ∞. Using (89) again we find w℘7 (x7 ) = 10
19x7 + 711 . 13x7 − 19
(80)
That is, over the fields of size p2 for primes p = 7 or p ≡ ±1 mod 7, and p6 for other primes p.
38
Noam D. Elkies
For the degree-9 cover X0 (℘8 )/X (1) we find t=
(1 − x8 )(2x48 + 4x38 + 18x28 + 14x8 + 25)2 27(4x28 + 5x8 + 23) (81)
=1−
+ x28 + 5x8 − 1)3 , 27(4x28 + 5x8 + 23)
4(x38
with the involution fixing the simple zero x8 = 1 and switching the simple poles, i.e. 51 − 19x8 . (82) w℘8 (x8 ) = 19 + 13x8 Note that all of these covers and involutions have rational coefficients even though a priori they are only known to be defined over K. This is possible because K is a normal extension of Q and the primes ℘7 , ℘8 used to define our curves and maps are Galois-invariant. To each of the three real places of K corresponds a quaternion algebra ramified only at the other two places, and thus a Shimura curve X (1) with three elliptic points P2 , P3 , P7 to which we may assign coordinates 0, 1, ∞. Then Gal(K/Q) permutes these three curves; since we have chosen rational coordinates for the three distinguished points, any point on or cover of X (1) defined by a Galois-invariant construction must be fixed by this action of Galois and so be defined over Q. The same applies to each of the triangle groups Gp,q,r associated with quaternion algebras over number fields F properly containing Q, which can be found in cases III through XIX of Takeuchi’s list [T]. In each case, F is Galois over Q, and the finite ramified places of the quaternion algebra are Galois-invariant. Moreover, even when Gp,q,r is not Γ (1), it is still related with Γ (1) by a Galois-invariant construction (such as intersection with Γ0 (℘) or adjoining w℘ or w℘ for a Galois-invariant prime ℘ of F ). At least one of the triangle groups in each commensurability class has distinct indices p, q, r, whose corresponding elliptic points may be unambiguously identified with 0, 1, ∞; this yields a model of the curve H/Gp,q,r , and thus of all its commensurable triangle curves, that is defined over Q. This discussion bears also on CM points on X (1). There are many CM points on X (1) rational over K, but only seven of those are Q-rational: a CM point defined over Q must come from a CM field K 0 which is Galois not only over K but over Q. Thus K 0 is the compositum of K with an imaginary quadratic field, which must have unique factorization. We check that of the nine such fields √ only five retain unique factorization when composed with K. One of these, Q( −7 ), yields the cyclotomic field Q(e2πi/7 ), whose ring of integers is the CM ring for the elliptic point P7 : t = ∞; two subrings still have unique factorization and yield CM points ℘7 - and ℘8 -isogenous to that elliptic point, which again are not only K- but even Q-rational thanks to the Galois invariance of ℘7 , ℘8 . The other four cases are the fields of discriminant −3, −4, −8, −11, which yield one rational CM point each. The first two are the elliptic points P3 , P2 : t = 1, 0. To find the coordinates of the CM point of discriminant −8, and of the two points isogenous with
Shimura Curve Computations
39
P7 , we may use the involutions (80,82) on X0 (℘7 ) and X0 (℘8 ). On X0 (℘7 ), the involution takes x7 = ∞ to 19/13, yielding the point t = 3593763963/4015905088 ℘7 -isogenous with P7 on X (1); on X0 (℘8 ) the involution takes x8 = ∞ to −19/13, yielding the point t = 47439942003/8031810176 ℘8 -isogenous with P7 . On the latter curve, the second fixed point of the involution (besides x8 = 1) is x8 = −51/13, which yields the CM point t = 1092830632334/1694209959 of discriminant −8. The two points isogenous with P7 also arise from the second fixed point of w℘7 and a further solution of t(x8 ) = t(w℘8 (x8 )). This still leaves the problem of locating the CM point of discriminant −11. We found it numerically using quotients of hypergeometric functions as we did for G2,4,6. Let c = 2 cos 2π/7, so c is the unique positive root of c3 + c2 − 2c − 1. Consider the quaternion algebra over K generated by i, j with i2 = j 2 = c,
ij = −ji.
(83)
This is ramified at the two other real place of K, in which c maps to the negative reals 2 cos 4π/7 and 2 cos 6π/7, but not at the place with c = 2 cos 2π/7; since c is a unit, neither is this algebra ramified at any finite place with the possible exception of ℘8 , which we exclude using the fact that the set of ramified places has even cardinality. Thus K(i, j) is indeed our algebra A. A maximal order O is obtained from OK [i, j] by adjoining the integral element (1 +ci+(c2 +c+1)j)/2. Then O∗ contains the elements g2 := ij/c,
g3 :=
1 (1 + (c2 − 2)j + (3 − c2 )ij), 2 (84)
1 g7 := (c2 + c − 1 + (2 − c2 )i + (c2 + c − 2)ij) 2 of norm 1, with g22 = g33 = g77 = −1 and g2 = g7 g3 . Thus the images of g2 , g3 , g7 in Γ (1) are elliptic elements that generate that group. A short search finds the linear combination (2−c2 )g3 +(c2 +c)g7 ∈ O of discriminant −11; computing its fixed point in H and solving for t to high precision (150 decimals, which turned out to be overkill), we obtain a real number whose continued fraction matches that of 88983265401189332631297917 73 432 1272 1392 2072 659211 = , 45974167834557869095293 33 137 837
(85)
with numerator and denominator differing by 29 293 413 1673 2813 . Having also checked that this number differs from the t-coordinates of the three non-elliptic CM points by products of small (< 104 ) primes,11 and that it passes the supersingular test, we are quite confident that (85) is in fact the t-coordinate of the CM point of discriminant −11. 11
If 104 does not seem small, remember that the factorizations are really over K, not Q; the largest inert prime that occurs is 19, and the split primes are really primes of K of norm at most comparable with that of 19.
40
5.4
Noam D. Elkies
An Irrational Example: The Algebras over Q[τ ]/(τ 3 − 4τ + 2) with Σ = {∞i , ∞j }
While our examples so far have all been defined over Q, this is not generally the case for Shimura curves associated to a quaternion algebra over a totally number field K properly containing Q. For instance, K may not be a Galois extension of Q; or, K may be Galois, but the set of finite ramified places may fails to be Galois-stable; or, even if that set is Galois-stable, the congruence conditions on the subgroup of A∗/K ∗ may not be Galois-invariant, and the resulting curve would not be defined over Q even though X (1) would be. In each case different real embeddings of the field yield different arithmetic subgroups of PSL2 (R) and thus different quotient curves. We give here what is probably the simplest example: a curve X (1) associated to a quaternion algebra with no finite ramified places over a totally real cubic field which is not Galois over Q. While the curve has genus 0, no degree-1 rational function on it takes Q-rational values at all four of its elliptic points, and the towers of modular curves over this X (1) are defined over K but not over Q. Let K be the cubic field Q[τ ]/(τ 3 − 4τ + 2) and discriminant 148 = 22 37, which is minimal for a totally real non-Galois field. Let A/K be a quaternion algebra ramified at two of the three real places and at no finite primes of K. Using gp/pari to compute ζK (2), we find from Shimizu’s formula (78) that the associated Shimura curve X (1) = X ∗(1) has hyperbolic area .16666 . . .; thus the area is 1/6 and, since A is not in Takeuchi’s list, the curve X (1) has genus 0 and four elliptic points, one of order 3 and three of order 2. The order-3 point P3 has discriminant −3 as expected, but the order-2 points are a bit more interesting: their CM field is K(i), but the ring of integers of that field is not OK [i]! Note that the rational prime 2 is totally ramified in K, being the cube of the prime (τ ); thus (1 + i)/τ is an algebraic integer, and we readily check that it generates the integers of K(i) over OK . One of the elliptic points, call it P2 , has CM ring OK [(1 + i)/τ ] and discriminant −4/τ 2 ; of its three (τ )-isogenous points, one is P2 itself, and the others are the remaining elliptic points P20 , P200 , with CM ring OK [i] of discriminant −4. Thus the modular curve X0 ((τ )) is a degree-3 cover of X (1) unramified above the elliptic point P2 , and ramified above the other three elliptic points with type ¯ — 3 for P3 and 21 for P20 , P200. This determines the cover up to K-isomorphism the curve X0 ((τ )) has genus 0, and we can choose coordinates x on that curve and t on X (1) such that t(P3 ) = ∞ and t = x3 − 3cx for some c 6= 0 — but not the location of the unramified point P2 relative to the other three elliptic points. To determine that we once again use the involution, this time w(τ) , of X0 ((τ )): this involution fixes the point above P2 corresponding to its self-isogeny, and pairs the other two preimages of P2 with the simple preimages of P20 , P200. We find that there are three ways to satisfy this condition: P20 , P200 : t = ±2(τ 2 − 3)3/2 , (86) and its Galois conjugates. The correct choice is determined by the condition that the Shimura curves must be fixed by the involution of the Galois closure of K/Q t = x3 − 3(τ 2 − 3)x,
P2 : t = 1300 − 188τ − 351τ 2 ,
Shimura Curve Computations
41
that switches the two real embeddings of K that ramify A: the image of τ under the the third (split) embedding must be√used in (86). √ We find that the simple and of double preimages of P20 , P200 are x = ±2 a2 − 3, ∓ a2 − 3, and the preimages √ P2 are 12 − 2τ − 3τ 2 (fixed by w(τ) ) and (−12 + 2τ + 3τ 2 ± (3a2 − 12) a2 − 3)/2. From this we recover as usual the tower of curves X0 ((τ )r ), whose reductions at primes of K other than τ are asymptotically optimal over the quadratic extensions of the primes’ residue fields, and which in this case is a tower of double (whence cyclic) covers unramified above the genus-3 curve X0 ((τ )4 ) and thus involved in that curve’s class-field tower.
5.5
Open Problems
Computing Modular Curves and Covers. Given a nonempty even set Σ of rational primes, and thus a quaternion algebra A/Q, how to compute the curve X ∗(1) together with its Schwarzian equation and modular covers such as X (1) and X0∗ (l)? Even in the simplest case Σ = {2, 3} where Γ ∗(1) is a triangle group and all the covers X0∗(l)/X ∗ (1) are in principle determined by their ramifications, finding those covers seems at present a difficult problem once l gets much larger than the few primes we have dealt with here. This is the case even when l is still small enough that X0∗ (l) has genus small enough, say g ≤ 5, that the curve should have a simple model in projective space. For instance, according to 35 the curve X0∗ (73) has genus 1. Thus its Jacobian is an elliptic curve; moreover it must be one of the six elliptic curves of conductor 6 · 73 tabulated in [C]. Which one of those curves it is, and which principal homogeneous space of that curve is isomorphic with X0∗ (73), can probably be decided by local methods such as those of [Ku]; indeed such a computation was made for X0 (11) in D. Roberts’ thesis [Ro]. But that still leaves the problem of finding the degree-74 map on that curve which realizes the modular cover X0∗(73) → X ∗(1). For classical modular curves (i.e. with Σ = ∅) of comparable and even somewhat higher levels, the equations and covers can be obtained via q-expansions as explained in [E5]; but what can we do here in the absence of cusps and thus of q-expansions? Can we do anything at all once the primes in Σ are large or numerous enough to even defeat the methods of the present paper for computing X ∗ (1) and the location of the elliptic points on this curve? Again this happens while the genus of X ∗(1) is still small; for instance it seems already a difficult problem to locate the elliptic points on all curves X ∗ (1) of genus zero and determine their Schwarzian equations, let alone find equations for all curves X ∗ (1) of genus 1, 2, or 3. By [I2] the existence of the involutions wl on X0∗ (l) always suffices in principle to answer these questions, but the computations needed to actually do this become difficult very quickly; it seems that a perspicuous way to handle these computations, or a new and more efficient approach, is called for. The reader will note that so far we have said nothing about computing with modular forms on Shimura curves. Not only is this an intriguing question in its own right, but solving it may also allow more efficient computation of Shimura curves and the natural maps between them, as happens in the classical modular setting.
42
Noam D. Elkies
In another direction, we ask: is there a prescription, analogous to (27), for towers of Shimura curves whose levels are powers of a ramified prime of the algebra? For a concrete example (from case III of [T]), let A be the quaternion al√ ℘2 }, where ∞1 is one of the two Archimedean gebra over Q( 2 ) with Σ = {∞1 , √ places and ℘2 is the prime ideal ( 2 ) above 2; let O ⊂ A be a√maximal order, I = I℘2 ⊂ O the ideal of elements whose norm is a multiple of 2, and Γn = {[a] ∈ O1∗ /{±1} : a ≡ 1 mod I n }
(87)
for n = 0, 1, 2, . . . . Then Γn+1 is a normal subgroup of Γn with index 3, 22, 2 according as n = 0, n is odd, or n is even and positive. Consulting [T], we find that Γ0 , Γ1 are the triangle groups G3,3,4 and G4,4,4. Let Xn be the Shimura curve H/Γn , which parametrizes principally polarized abelian fourfolds with endomorphisms by A and complete level-I n structure. Then {Xn }n>0 is a tower of Z/2 or (Z/2)2 covers, unramified above the curve X3 . Moreover, Xn has genus zero for n = 0, 1, 2, while X3 is isomorphic with the curve y2 = x5 − x of genus 2 with maximal√automorphism group. The reduction of this tower at any prime ℘ 6= ℘2 of Q( 2 ) is asymptotically optimal over the quadratic extension of the residue field of ℘. So we ask for explicit recursive equations for the curves in this tower. Note that unlike the tower (25), this one does not seem to offer a wl or w℘2 shortcut. CM Points. Once we have found a Shimura modular curve together with a Schwarzian equation, we have seen how to compute the coordinates of CM points on the curve, at least as real or complex numbers to arbitrary precision. But this still leaves many theoretical and computational questions open. For instance, what form does the Gross-Zagier formula [GZ] for the difference between jinvariants of elliptic curves take in the context of Shimura curves such as X0∗(1) or X (1)? Note that a factorization theorem would also yield a rigorous proof that our tabulated rational coordinates of CM points are correct. Our tables also suggest that at least for rational CM points the heights increase more or less regularly with D1 ; can this be explained and generalized to CM points of degree > 1? For CM points on the classical modular curve X(1) this is easy: a CM j-invariant is an algebraic integer, and its size depends on how√close the corresponding point of H/PSL2 (Z) is to the cusp; so for instance if Q( −D) has class number 1 then the √ CM j-invariant of discriminant −D is a rational integer of absolute value exp(π D) + O(1). But such a simple explanation probably cannot work for Shimura curves which have neither cusps nor integrality of CM points. Within a commensurability class of Shimura curves (i.e. given the quaternion algebra A), the height is inversely proportional to the area of the curve; does this remain true in some sense when A is varied? As a special case we might ask: how does the minimal polynomial of a CM point of discriminant −D factor modulo the primes contained in D1 ? That the minimal polynomials for CM j-invariants are almost squares modulo prime factors of the discriminant was a key component of our results on supersingular reduction of elliptic curves [E2,E3]; analogous results on Shimura curves may
Shimura Curve Computations
43
likewise yield a proof that, for instance, for every t ∈ Q there are infinitely many primes p such that the point on the (2, 4, 6) curve with coordinate t reduces to a supersingular point mod p. Enumeration and Arithmetic of Covers. When an arithmetic subgroup of PSL2 (R) is commensurable with a triangle group G = Gp,q,r , as was the case for the Σ = {2, 3} algebra, any modular cover H/G0 of H/G (for G0 ⊂ G a congruence subgroup) is ramified above only three points on the genus-0 curve H/G. We readily obtain the ramification data, which leave only finitely many possibilities for the cover. We noted that, even when there is only one such cover, actually finding it can be far from straightforward; but much is known about covers of P1 ramified at three points — for instance, the number of such covers with given Galois group and ramification can be computed by solving equations in the group (see [Mat]), and the cover is known [Be] to have good reduction at each prime not dividing the size of the group. But when G, and any group commensurable with it, has positive genus or more than three elliptic points, we were forced to introduce additional information about the cover, namely the existence of an involution exchanging certain preimages of the branch points. In the examples we gave here (and in several others to be detailed in future work) this was enough to uniquely determine the cover H/G0 → H/G. But there is as yet no general theory that predicts the number of solutions of this kind of covering problem. The arithmetic of the solutions is even more mysterious: recall for instance that in our final example the cubic field Q[τ ]/(τ 3 − 4τ + 2) emerged out of conditions on the cover X0 ((τ ))/X (1) in which that field, and even its ramified prime 37, are nowhere to be seen.
6
Appendix: Involutions of P1
We collect some facts concerning involutions of the projective line over a field of characteristic other than 2. We do this from a representation-theoretic point of view, in the spirit of [FH]. That is, we identify a pair of points ti = (xi : yi ) (i = 1, 2) of P1 with a binary quadric, i.e. a one-dimensional space of homogeneous quadratic polynomials Q(X, Y ) = AX 2 +2BXY +CY 2 , namely the polynomials vanishing at the two points; we regard the three-dimensional space V3 of all such polynomials AX 2 + 2BXY + CY 2 as a representation of the group SL2 acting on P1 by unimodular linear transformations of (X, Y ). An invertible linear transformation of a two-dimensional vector space V2 over any field yields an involution of the projective line P1 = P(V2∗ ) if and only if it is not proportional to the identity and its trace vanishes (the first condition being necessary only in characteristic 2). Over an algebraically closed field of characteristic other than 2, every involution of P1 has two fixed points, and any two points are equivalent under the action of PSL2 on P1 . It is clear that the only involution fixing 0, ∞ is t ↔ −t; it follows that any pair of points determines a unique involution fixing those two points. Explicitly, if B 2 6= AC, the involution fixing the distinct roots of AX 2 + 2BXY + CY 2 is (X : Y ) ↔
44
Noam D. Elkies
(BX +CY : −AX −BY ). Note that the 2-transitivity of PSL2 on P1 also means that this group acts transitively on the complement in the projective plane PV3 of the conic B 2 = AC (and also acts transitively on that conic); indeed it is well-known that PSL2 is just the special orthogonal group for the discriminant quadric B 2 − AC on V3 . Now let Q1 , Q2 ∈ V3 be two polynomials without a common zero. Then there is a unique involution of P1 switching the roots of Q1 and also of Q2 . (If Qi has a double zero the condition on Qi means that its zero is a fixed point of the involution.) This can be seen by using the automorphism group Aut(P1 ) =PGL2 to map Qi to XY or Y 2 and noting that the involutions that switch t = 0 with ∞ are t ↔ a/t for nonzero a, while the involutions fixing t = ∞ are t ↔ a − t for arbitrary a. As before, we regard the involution determined in this way by Q1 , Q2 as an element of PV3 . This yields an algebraic map f from (an open set in) PV3 × PV3 , parametrizing Q1 , Q2 without common zeros, to PV3 . We next determine this map explicitly. First we note that this map is covariant under the action of PSL2 : we have f(gQ1 , gQ2 ) = g(f(Q1 , Q2 )) for any g ∈ PSL2 . Next we show that f has degree 1 in each factor. Using the action of PSL2 it is enough to show that if Q1 = XY or Y 2 then f is linear as a function of Q2 = AX 2 + 2BXY + CY 2 . In the first case, the involution is t ↔ C/At and its fixed points are the roots of AX 2 − CY 2 . In the second case, the involution is t ↔ (−2B/A) − t with fixed points t = ∞ and t = −B/A, i.e. the roots of AXY + BY 2 . In either case the coefficients of f(Q1 , Q2 ) are indeed linear in A, B, C. But it turns out that these two conditions completely determine f: there is up to scaling a unique PSL2 -covariant bilinear map from V3 × V3 to V3 ; equivalently, V3 occurs exactly once in the representation V3 ⊗ V3 of PSL2 . In fact it is known (see e.g. [FH, §11.2]) that V3 ⊗ V3 decomposes as V1 ⊕ V3 ⊕ V5 , where V1 is the trivial representation and V5 is the space of homogeneous polynomials of degree 4 in X, Y . The factor V3 is particularly easy to see, because it is V2 V3 of V3 ⊗ V3 . Now the next-to-highest extejust the antisymmetric part Vdim V −1 rior power V of any finite-dimensional vector space V is canonically Vdim V V. isomorphic with (det V ) ⊗ V ∗ , where det V is the top exterior power Taking V = V3 , we see that det V3 is the trivial representation of PSL2 . MoreV3 is self-dual as a over, thanks to the invariant quadric B 2 − AC we know that V ∼ ∼ 2 V3 → V3∗ → V3 , PSL2 representation. Unwinding the resulting identification we find: Proposition A. Let Qi = Ai X 2 + 2Bi XY + Ci Y 2 (i = 1, 2) be two polynomials in V3 without a common zero. Then the unique involution of P1 switching the roots of Q1 and also of Q2 is the involution whose fixed points are the roots of (88) (A1 B2 − A2 B1 )X 2 + (A1 C2 − A2 C1 )XY + (B1 C2 − B2 C1 )Y 2 , i.e. the fractional linear transformation t ←→
(A1 C2 − A2 C1 )t + 2(B1 C2 − B2 C1 ) . 2(B1 A2 − B2 A1 )t + (C1 A2 − C2 A1 )
(89)
Shimura Curve Computations
45
Proof : The coordinates of Q1 ∧ Q2 for the basis of V3∗ dual to (X 2 , 2XY, Y 2 ) are (B1 C2 −B2 C1 , A2 C1 −A1 C2 , A1 B2 −A2 B1 ). To identify V3∗ with V3 we need a PSL2 -invariant element of V3⊗2 . We could get this invariant from the invariant quadric B 2 − AC ∈ V3∗⊗2 , but it is easy enough to exhibit it directly: it is X2 ⊗ Y 2 −
1 2XY ⊗ 2XY + Y 2 ⊗ X 2 , 2
(90)
the generator of the kernel of the multiplication map Sym2 (V3 ) → V5 . The resulting isomorphism from V3∗ to V3 takes the dual basis of (X 2 , 2XY, Y 2 ) to (Y 2 , −XY, X 2 ), and thus takes Q1 ∧ Q2 to (88) as claimed. 2 Of course this is not the only way to obtain (89). A more “geometrical” approach (which ultimately amounts to the same thing) is to regard P1 as a conic in P2 . Then involutions of P1 correspond to points p ∈ P2 not on the conic: the involution associated with p takes any point q of the conic to the second point of intersection of the line pq with the conic. Of course the fixed points are then the points q such that pq is tangent to the conic at q. Given Q1 , Q2 we obtain for i = 1, 2 the secant of the conic through the roots of Qi , and then p is the intersection of those secants. From either of the two approaches we readily deduce Corollary B. Let Qi = Ai X 2 + 2Bi XY + Ci Y 2 (i = 1, 2, 3) be three polynomials in V3 without a common zero. Then there is an involution of P1 switching the roots of Qi for each i if and only if the determinant A1 B1 C1 A2 B2 C2 (91) A3 B3 C3 vanishes. As an additional check on the formula (88), we may compute that the discriminant of that quadratic polynomial is exactly the resolvent A1 2B1 C1 0 0 A1 2B1 C1 (92) det A2 2B2 C2 0 0 A2 2B2 C2 of Q1 , Q2 which vanishes if and only if these two polynomials have a common zero.
References Arno, S.: The imaginary quadratic fields of class number 4. Acta Arith. 60 (1992), 321–334. [Be] Beckmann, S.: Ramified primes in the field of moduli of branches coverings of curves. J. of Algebra 125 (1989), 236–255. [BK] Birch, B.J., Kuyk, W., ed.: Modular Functions of One Variable IV. Lect. Notes in Math. 476, 1975. [A]
46 [C] [D] [E1] [E2] [E3] [E4]
[E5]
[E6]
[F1] [F2]
[F3] [FH] [Go1] [Go2] [GR] [GS] [GZ] [HM]
[I1] [I2]
[I3] [JL] [Kn]
Noam D. Elkies Cremona, J.E.: Algorithms for modular elliptic curves. Cambridge University Press, 1992. Deuring, M.: Die Typen die Multiplikatorenringe elliptische Funktionk¨ orper, Abh. Math. Sem. Hansischen Univ. 14, 197–272 (1941). Elkies, N.D.: ABC implies Mordell, International Math. Research Notices 1991 #7, 99–109. Elkies, N.D.: The existence of infinitely many supersingular primes for every elliptic curve over Q, Invent. Math. 89 (1987), 561–568. Elkies, N.D.: Supersingular primes for elliptic curves over real number fields, Compositio Math. 72 (1989), 165–172. Elkies, N.D.: Heegner point computations. Pages 122–133 in Algorithmic number theory (Ithaca, NY, 1994; L.M. Adleman and M.-D. Huang, eds.; Lect. Notes in Computer Sci. #877; Berlin: Springer, 1994). Elkies, N.D.: Elliptic and modular curves over finite fields and related computational issues. Pages 21–76 in Computational Perspectives on Number Theory: Proceedings of a Conference in Honor of A.O.L. Atkin (D.A. Buell and J.T. Teitelbaum, eds.; AMS/International Press, 1998). Elkies, N.D.: Explicit modular towers. Pages 23–32 in Proceedings of the ThirtyFifth [1997] Annual Allerton Conference on Communication, Control and Computing (T. Ba¸sar and A. Vardy, eds.; Univ. of Illinois at Urbana-Champaign, 1998). ¨ Fricke, R.: Uber den arithmetischen Charakter der zu den Verzweigungen (2, 3, 7) und (2, 4, 7) geh¨ orenden Dreiecksfunctionen, Math. Ann. 41 (1893), 443–468. Fricke, R.: Entwicklungen zur Transformation f¨ unfter und siebenter Ordnung einiger specieller automorpher Functionen, Acta Mathematica 17 (1893), 345– 395. Fricke, R.: Ueber eine einfache Gruppe von 504 Oprationen, Math. Ann. 52 (1899), 321–339. Fulton, W., Harris, J.: Representation Theory: A First Course. New York: Springer, 1991 (GTM 129). Goppa, V.D.: Codes on algebraic curves, Soviet Math. Dokl. 24 (1981), 170–172. Goppa, V.D.: Algebraico-geometric codes, Math. USSR Izvestiya 24 (1983), 75– 91. Gradshteyn, I.S., Ryzhik, I.M.: Table of Integrals, Series, and Products. New York: Academic Press 1980. Granville, A., Stark, H.M.: abc implies no Siegel zeros, preprint 1997. Gross, B.H., Zagier, D.: On singular moduli, J. f¨ ur die reine und angew. Math. 335 (1985), 191–220. Hashimoto, K.-i., Murabayashi, N.: Shimura curves as intersections of Humbert surfaces and defining equations of QM-curves of genus two, Tohoku Math. Journal (2) 47 (1995), #2, 271–296. Ihara, Y.: Schwarzian equations, J. Fac. Sci. Univ. Tokyo 21 (1974), 97–118. Ihara, Y.: On the differentials associated to congruence relations and the Schwarzian equations defining uniformizations, J. Fac. Sci. Univ. Tokyo 21 (1974), 309–332. Ihara, Y.: Some remarks on the number of rational points of algebraic curves over finite fields. J. Fac. Sci. Tokyo 28 (1981), 721–724. Jordan, B.W., Livn´e, R.A.: Local Diophantine properties of Shimura curves. Math. Ann. 270 (1985), 235–248. Knapp, A.W.: Elliptic Curves. Princeton Univ. Press, 1992 (Mathematical Notes 40).
Shimura Curve Computations [Ku]
47
Kurihara, A.: On some examples of equations defining Shimura curves and the Mumford uniformization, J. Fac. Sci. Univ. Tokyo 25 (1979), 277–301. [MM] Malle, G., Matzat, B.H.: Realisierung von Gruppen P SL2 (Fp) als Galoisgruppen u ¨ber Q, Math. Ann. 272 (1985), 549–565. [Mac] Macbeath, A.M.: On a curve of genus 7, Proc. LMS 15 (1965), 527–542. [Mat] Matzat, B.H.: Konstruktive Galoistheorie, Lect. Notes Math. 1284, 1987. [Mi] Michon, J.-F.: Courbes de Shimura hyperelliptiques, Bull. Soc. math. France 109 (1981), 217–225. [M¨ u] M¨ uller, P.: Arithmetically exceptional functions and elliptic curves. Preprint, 1998. ¯ [Ri] Ribet, K.A.: On modular representations of Gal(Q/Q) arising from modular forms, Invent. Math. 100 (1990), 431–476. [Ro] Roberts, D.P.: Shimura curves analogous to X0 (N ). Harvard doctoral thesis, 1989. [Se] Serre, J.-P.: Topics in Galois Theory. Boston: Jones and Bartlett 1992. [S1] Shimizu, H.: On zeta functions of quaternion algebras, Ann. of Math. 81 (1965), 166–193. [S2] Shimura, G.: Construction of class fields and zeta functions of algebraic curves, Ann. of Math. 85 (1967), 58–159. [S3] Shimura, G.: On the Real Points of an Arithmetic Quotient of a Bounded Symmetric Domain, Math. Ann. 215 (1975), 135–164. [T] Takeuchi, K.: Commensurability classes of arithmetic triangle groups, J. Fac. Sci. Univ. Tokyo 24 (1977), 201–212. [TVZ] Tsfasman, M.A., Vl˘ adut¸, S.G., Zink, T.: Modular curves, Shimura curves and Goppa codes better than the Varshamov-Gilbert bound. Math. Nachr. 109 (1982), 21–28. [V] Vign´eras, M.-F.: Arithm´etique des Alg` ebres de Quaternions. Berlin: Springer, 1980 (SLN 800). [YZ] Yui, N., Zagier, D.: On the singular values of Weber modular functions. Math. of Computation 66 (1997), 1629–1644.
The Decision Diffie-Hellman Problem Dan Boneh Computer Science Department, Stanford University, Stanford, CA 94305-9045
[email protected]
Abstract. The Decision Diffie–Hellman assumption (ddh) is a gold mine. It enables one to construct efficient cryptographic systems with strong security properties. In this paper we survey the recent applications of DDH as well as known results regarding its security. We describe some open problems in this area.
1
Introduction
An important goal of cryptography is to pin down the exact complexity assumptions used by cryptographic protocols. Consider the Diffie–Hellman key exchange protocol [12]: Alice and Bob fix a finite cyclic group G and a generator g. They respectively pick random a, b ∈ [1, |G|] and exchange ga , gb . The secret key is gab . To totally break the protocol a passive eavesdropper, Eve, must compute the Diffie–Hellman function defined as: dhg (ga , gb ) = gab . We say that the group G satisfies the Computational Diffie–Hellman assumption (cdh) if no efficient algorithm can compute the function dhg (x, y) in G. Precise definitions are given in the next section. Recent results provide some limited reductions from computing discrete log to computing the Diffie–Hellman function [20,3,21]. Unfortunately, cdh by itself is not sufficient to prove that the Diffie–Hellman protocol is useful for practical cryptographic purposes. Even though Eve may be unable to recover the entire secret, she may still be able to recover valuable information about it. For instance, even if cdh is true, Eve may still be able to predict 80% of the bits of gab with some confidence. With our current state of knowledge we are are unable to prove that, assuming cdh, no such attack exists (although we discuss some results along this line in Section 3.3). Consequently, based on cdh, one cannot simply use the bits of gab as a shared key – cdh does not guarantee that Eve cannot predict these bits. If gab is to be the basis of a shared secret key, one must bound the amount of information Eve is able to deduce about it, given ga , gb . This is formally captured by the, much stronger, Decision Diffie–Hellman assumption (ddh) (defined in the next section). Loosely speaking, the ddh assumption states that no efficient algorithm can distinguish between the two distributions hga , gb , gab i and hg a , gb , gc i where a, b, c are chosen at random in [1, |G|]. As we shall see in Section 3.1, the ddh assumption is equivalent to the (conceptually simpler) assumption saying there is no efficient probabilistic algorithm that given any triplet hga , gb , gc i in G3 outputs “true” if a = bc and “false” otherwise. J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 48–63, 1998. c Springer-Verlag Berlin Heidelberg 1998
The Decision Diffie–Hellman Problem
49
To illustrate the importance of ddh we show how it applies to secret key exchange. We observed above that, with our present knowledge, cdh alone does not enable one to securely use bits of gab as a shared secret — based on cdh we cannot prove that Eve cannot predict some of these bits. Nevertheless, based on cdh alone Alice and Bob can derive one unpredictable bit (known as a hard core bit [16]) from gab . If, given ga , gb , Eve could predict the hard core bit of gab , she could also compute all of gab . Hence, based on cdh alone, to exchange a k bit secret, Alice and Bob would have to run the Diffie–Hellman protocol k times. Each time they extract one hard core bit which is provably unpredictable by Eve. This is clearly inefficient and undesirable1 . In contrast, using ddh one can do much better. Suppose |G| > 2n . One can prove that based on ddh it is possible to extract from a single application of the Diffie–Hellman protocol, n/2 bits which Eve cannot distinguish from a true random string. This is done by hashing gab to an n/2 bit string using an application of the leftover hash lemma as explained in Section 4.1. This is an example of how ddh can be used to significantly increase the efficiency of a cryptographic protocol. We point out that implemented cryptographic systems that derive multiple bits from the Diffie–Hellman secret are implicitly relying on ddh, not cdh. Over the past several years ddh has been successfully used to simplify many cryptographic schemes. We discuss some of these in Section 4. 1.1
ddh in Various Group Families
The ddh assumption is very attractive. However, one must keep in mind that it is a very strong assumption (far stronger than cdh). We note that in some groups the cdh assumption is believed to be true, yet the ddh assumption is trivially false. For example, consider the group Z∗p for a prime p and generator g. The Computational Diffie–Hellman problem is believed to be hard in this group. Yet, given ga , gb one can easily deduce the Legendre symbol of gab . This observation gives an immediate method for distinguishing hga , gb , gab i from hga , gb , gci for random a, b, c. This simple attack explains why most group families in which ddh is believed to be intractable have prime order. We note that to foil the attack it suffices to ensure the group order not have any small prime divisors. We give some examples of groups in which ddh is believed to be intractable. It is remarkable (and surprising) that in all these groups, the best known algorithm for ddh is a full discrete log algorithm. 1
We note that if one assumes that dh g (x, y) cannot be computed by any algorithm running in time t then one can securely derive log t bits out of each invocation of the Diffie–Hellman protocol. This is only a minor improvement over the single bit extraction process described above. We also note that these hard core bits are not bits of gab . Rather, they are derived from gab by viewing it as a bit string over 2 and computing its inner product with a public random vector over 2 of the same length. To apply the Goldreich-Levin theorem [16] to the Diffie–Hellman function one must make use of tricks described in [29, Sect. 5].
Z
Z
50
Dan Boneh
1. Let p = 2p1 + 1 where both p and p1 are prime. Let Qp be the subgroup of quadratic residues in Z∗p . It is a cyclic group of prime order. This family of groups is parameterized by p. 2. More generally, let p = aq+1 where both p and q are prime and q > p1/10 . Let Qp,q be the subgroup of Z∗p of order q. This family of groups is parameterized by both p and q. q−1 3. Let N = pq where p, q, p−1 2 , 2 are prime. Let T be the cyclic subgroup of order (p − 1)(q − 1). Although T does not have prime order, ddh is believed to be intractable. The group is parameterized by N . 4. Let p be a prime and Ea,b /Fp be an elliptic curve where |Ea,b| is prime. The group is parameterized by p, a, b. 5. Let p be a prime and J be a Jacobian of a hyper elliptic curve over Fp with a prime number of reduced divisors. The group is parameterized by p and the coefficients of the defining equation.
2
Definitions
We formally define the notion of indistinguishable distributions and the Decision Diffie–Hellman problem. Throughout the paper we use the term efficient as short hand for probabilistic polynomial time. We use the term negligible to refer to a function (n) which is smaller than 1/nα for all α > 0 and sufficiently large n. Group families. A group family G is a set of finite cyclic groups G = {Gp } where p ranges over an infinite index set. We denote by |p| the size of binary representation of p. We assume there is a polynomial time (in |p|) algorithm that given p and two elements in Gp outputs their sum. Instance generator. An Instance Generator, IG, for G is a randomized algorithm that given an integer n (in unary), runs in polynomial time in n and outputs some random index p and a generator g of Gp . Note that for each n, the Instance Generator induces a distribution on the set of indices p. Examples of group families were given in the previous section. The index p encodes the group parameters. For instance, for the group of points on an elliptic curve we may let p = hp, a, bi denote the curve Ea,b /Fp . The instance generator is used to select a random member of G of the appropriate size. For instance, when G is the family of prime order subgroups of Z∗p the instance generator, on input n, may generate a random n-bit prime p such that (p − 1)/2 is also prime. In some cases it may make sense to generate distributions other than uniform. For instance, one may wish to avoid primes of the form 2k + 1. Definition 2.1. Let G = {Gp } be a group family. – A cdh algorithm A for G is a probabilistic polynomial time (in |p|) algorithm satisfying, for some fixed α > 0 and sufficiently large n: 1 Pr A(p, g, ga , gb ) = gab > α n
The Decision Diffie–Hellman Problem
51
where g is a generator of Gp . The probability is over the random choice of hp, gi according to the distribution induced by IG(n), the random choice of a, b in the range [1, |Gp|] and the random bits used by A. The group family G satisfies the cdh assumption if there is no cdh algorithm for G. – A ddh algorithm A for G is a probabilistic polynomial time algorithm satisfying, for some fixed α > 0 and sufficiently large n: Pr[A(p, g, ga , gb , gab ) = “true”] − Pr[A(p, g, ga, gb , gc) = “true”] > 1 nα where g is a generator of Gp . The probability is over the random choice of hp, gi according to the distribution induced by IG(n), the random choice of a, b, c in the range [1, |Gp|] and the random bits used by A. The group family G satisfies the ddh assumption if there is no ddh algorithm for G. The difference between the two probabilities in the definition of ddh is often called the advantage of algorithm A. The definition captures the notion that the distributions hp, g, ga, gb , gab i and hp, g, ga , gb , gc i are computationally indistinguishable. We will occasionally refer to the related notion of statistically indistinguishable distributions, defined as follows: Definition 2.2. Let {Xp} and {Yp } be two ensembles of probability distributions, where for each p both Xp and Yp are defined over the same domain Dp . We say that the two ensembles are statistically indistinguishable if the statistical distance between them is negligible, i.e. X |Xp (a) − Yp (a)| < Var (Xp , Yp ) = a∈Dp
where = (|p|) is negligible.
3
Known Results on the Security of ddh
We survey some of the evidence that adds to our confidence in ddh. At the moment, this evidence is circumstantial. Proving a link between ddh and a known hard problem is a critical open problem in this area. 3.1
Randomized Reduction
When studying the security of ddh one asks for the weakest assumption that implies ddh. Ideally, one would like to prove cdh implies ddh, or some other classic problem (e.g. factoring) implies ddh. At the moment these questions are open. Fortunately, one can prove that ddh is implied by a slightly weaker assumption: perfect–ddh. perfect–ddh: Let G = {Gp } be a family of finite cyclic groups. A perfect–ddh algorithm A for G correctly decides with overwhelming probability whether a
52
Dan Boneh
given triplet (x, y, z) ∈ G3p is a proper Diffie–Hellman triplet. That is, for large enough n we have Pr[A(p, g, ga, gb , gc) = “true” a = bc] > 1 − Pr[A(p, ga, gb , gc) = “true” a 6= bc] < where the probability is taken over the random bits of A, the random choice of a, b, c ∈ [1, |Gp |], and the choice of hp, gi according to the distribution induced by IG(n). As usual, = (n) is a negligible function. We say that G satisfies the perfect–ddh assumption if there is no polynomial time perfect–ddh algorithm. A perfect–ddh algorithm does more than a ddh algorithm. Namely, it correctly decides whether dhg (x, y) = z for most triplets. In contrast, a ddh algorithm is only required to correctly decide with non-negligible advantage. Stadler [30, Prop. 1] and independently Naor and Reingold [23] showed that the two assumption, ddh and perfect–ddh, are equivalent. This conversion of an imperfect oracle into a perfect one is done via a random reduction. We slightly strengthen the result by applying it to groups in which only an upper bound on size of the group is given, rather than the exact order. This is useful when discussing ddh in the group Z∗N for some N = pq. Theorem 3.1. Let G = {Gp } be a family of finite cyclic groups of prime order. Let s(p) be an efficiently computable function such that |Gp | ≤ s(p) for all p. Then G satisfies the ddh assumption if and only if it satisfies the perfect–ddh assumption. Proof Sketch. The fact that the ddh assumption implies perfect–ddh is trivial. We prove the converse. Let O be a ddh oracle. That is, there exists an α > 0 such that for large enough n, Pr[O(p, g, ga, gb , gab ) = “true”] − Pr[O(p, g, ga, gb , gc ) = “true”] ≥ 1 nα The probability is over the random choice of a, b, c in [1, |Gp|], and the random choice of hp, gi according to the distribution induced by IG(n). We construct a probabilistic polynomial time (in s(p) and |p|) perfect–ddh algorithm, A, which makes use of the oracle O. Given p, g and x, y, z ∈ Gp algorithm A must determine with overwhelming probability whether it is a valid Diffie–Hellman triplet or not. Consider the following statistical experiment: pick random integers u1 , u2, v in the range [1, s(p)2 ] and construct the triplet (x0 , y0 , z 0 ) = (xv gu1 , ygu2 , z v yu1 xvu2 gu1 u2 ) Case 1. Suppose (x, y, z) is a valid triplet, then x = ga , y = gb , z = gab For some a, b. It follows that (x0 , y0 , z 0 ) is also a valid triplet. Furthermore, one can show that (x0 , y0 , z 0 ) is chosen from a distribution which is statistically indistinguishable from the uniform distribution on proper Diffie–Hellman triplets in Gp .
The Decision Diffie–Hellman Problem
53
Case 2. Suppose (x, y, z) is not a valid triplet. Then x = ga , y = gb , z = gab+c 0 0 0 0 for some c 6= 0. In this case, x0 = ga , y0 = gb , z 0 = ga b gcv . Note that since c 6= 0 we know that gc is a generator of Gp . Consequently, the distribution of gcv is statistically indistinguishable from uniform. It is not difficult to show that the distribution on (x0 , y0 , z 0 ) is statistically indistinguishable from the uniform distribution on G3p . We see that based on whether (x, y, z) is a valid Diffie–Hellman triplet we either generate a uniformly random valid triplet or a completely random triplet. Consequently, standard amplification techniques can be used to construct the algorithm A. We describe a simple approach. Algorithm A performs two experiments: it first generates k independent triplets (x0 , y0 , z 0 ) as described above and queries the oracle at those triplets. Let w1 be a random variable counting the number of times the oracle answers “true”. In the second experiment, A generates k random triplets in G3p and queries the oracle. Let w2 be a random variable counting the number of “true” answers. Clearly, E[|w1 − w2 |] = 0 if (x, y, z) is an invalid triplet and E[|w1 − w2 |] > k otherwise. Here = (n) ≥ 1/nα is the advantage produced by the oracle O. Algorithm A outputs “true” if |w1 − w2 | > k/2 and outputs “false” otherwise. Using standard large deviation bounds one can show that when k > 1 log2 1δ algorithm A outputs the right answer with probability at least 1 − δ. t u Observe that the only place where we use the fact that the group order is prime is in arguing that gc is a generator of Gp . This fact remains true, with high probability over the choice of c, as long as the smallest prime divisor of the group order is sufficiently large. Hence the theorem also applies in any group family G in which the smallest prime divisor of |Gp| is super-polynomial in |p|. in particular, it applies to the group of quadratic residues in Z∗N when N = pq and p = 2p1 + 1 and q = 2q1 + 1 for some large primes p, q, p1, q1 . A random reduction such as Theorem 3.1 is an important part of any hardness assumption. Essentially, it shows that assuming one cannot decide the Diffie– Hellman problem with overwhelming probability then one cannot decide it in any non-negligible fraction of the input space. 3.2
Generic Algorithms
Nechaev [25] and Shoup [29] describe models enabling one to argue about lower bounds on computations of discrete log as well as ddh. We use Shoup’s terminology. To disprove ddh one may first try to come up with a ddh algorithm that works in all groups. Indeed, such an algorithm would be devastating. However, the best known generic algorithm for ddh is a generic discrete log algorithm, namely the Baby-Step-Giant-Step [9]. When applied in a group of prime order p √ this algorithm runs in time O ( p). Shoup shows that this is the best possible generic algorithm for ddh. We discuss the implications of this result at the end of the section.
54
Dan Boneh
Definition 3.2 (Shoup). An encoding function on the additive group Z+ p is an injective map σ : Zp → {0, 1}n for some integer n > 0. algorithm that takes as input A generic algorithm A for Z+ p is a probabilistic an encoding list σ(x1 ), . . . , σ(xk ) where σ is an encoding function and xi ∈ Z+ p . During its execution, the algorithm may query an oracle by giving it two indices i, j into the encoding list and a sign bit. The oracle returns the encoding σ(xi ± xj ) according to the sign bit. This new encoding is then added to the encoding list. Eventually, the algorithm terminates and produces a certain output. The output is denoted by A(σ; x1 , . . . , xk ). To illustrate these concepts we describe two encodings of Z+ p . Let q be a prime with p dividing q − 1. Let g ∈ Z∗q have order p. Then σ defined by σ(a) = ∗ ga mod q is an encoding of Z+ p inside Zq . Another encoding could be defined using an elliptic curve over Fq with p points. Let P be a point on the curve. Then σ(a) = aP is another encoding of Z+ p . As an example of a generic algorithm we mentioned the Baby-Step-Giant-Step algorithm for discrete log. On the other hand, the index calculus method for computing discrete log is not generic. It takes advantage of the encoding of group elements as integers. Shoup proved a number of lower bounds on generic algorithms. These include lower bounds on computing discrete log, computing Diffie–Hellman, deciding Diffie–Hellman and a few others. Here, we are most interested in the lower bound on deciding Diffie–Hellman. Theorem 3.3 (Shoup). Let p be a prime and S ⊂ {0, 1}∗ a set of at least p binary strings. Let A be a generic algorithm for Z+ p that makes at most m oracle + queries. Let a, b, c ∈ Z+ p be chosen at random, let σ : Zp → S be a random encoding function, and let s be a random bit. Set w0 = ab and w1 = c. Then Pr[A(σ; 1, a, b, ws, w1−s) = s] − 1 < m2 /p 2 where the probability is over the random choice of a, b, c in [1, p], the random encoding σ and the random bits used by the algorithm. Proof Sketch. We bound the amount of information available to the algorithm after m queries. Each time the algorithm interacts with the oracle it learns the encoding σ(xi ) of some xi ∈ Z+ p . One can easily see that xi = Fi (a, b, c, ab) where Fi is a linear function that can be easily deduced by examining the oracle’s previous queries. Suppose that for all i, j such that Fi 6= Fj one has that σ(xi ) 6= σ(xj ). This means the algorithm learned the random encoding of distinct values. Since these values are independent random bit strings they provide no information to the algorithm. The only way the algorithm obtains any information is if for some i, j with Fi 6= Fj we have that σ(xi ) = σ(xj ). In this case the algorithm may learn a linear relation on the values a, b, c, ab. We give the algorithm the benefit of the
The Decision Diffie–Hellman Problem
55
doubt, and say that if it is able to find such an Fi , Fj then it is able to produce the correct output. Hence, to bound the success probability, it suffices to bound the probability that given arbitrary distinct m linear polynomials and random a, b, c, ab ∈ Zp there exists an i 6= j such that Fi (a, b, c, ab) = Fj (a, b, c, ab). Let R be this event. We bound Pr[R]. For a given Fi 6= Fj the number of solutions to Fi (x, y, z, xy) = Fj (x, y, z, xy) can be bounded by considering the polynomial G(x, y, z) = Fi − Fj . This is a polynomial of total degree 2. Consequently, the probability that a random (x, y, z) ∈ Zp3 is a zero of G is bounded by 2/p such pairs Fi , Fj to consider. Hence, the probability (see [28]). There are m 2 that a random (x, y, z, xy) is the root of some Fi − Fj is bounded by Pr[R] ≤
m2 m 2 · < p p 2
The theorem now follows. When R does not occur the algorithm can only guess the answer getting it right with probability half. The only information comes t u from the event R which occurs with probability less than m2 /p. The theorem shows that any generic algorithm whose running time is less √ that ( p)1− fails to solve ddh, with non-negligible advantage, on a random encoding of the group Z+ p . It follows that there exists an encoding where the algorithm must fail. Hence, the theorem shows that if a generic algorithm is to obtain a non-negligible advantage in solving ddh it must run in exponential time (in log p). This lower bound shows there is no efficient generic ddh algorithm that works in all groups. It is important to keep this in mind when searching for efficient ddh algorithms. The algorithm must make use of the particular group encoding. Using a similar argument one can show that no efficient generic algorithm can reduce cdh to ddh. That is, suppose that in addition to the group action oracle, the algorithm also has access to an oracle for deciding ddh (i.e. given hσ(a), σ(b), σ(c)i the oracle returns “true” if a = bc and “false” otherwise). Then any generic algorithm given σ(x), σ(y) and making a total of at most m oracle queries will succeed in computing σ(xy) with probability at most m2 /p. This is important to keep in mind when searching for a reduction from cdh to ddh. At a first reading the implications of Theorem 3.3 may not be clear. To avoid any confusion we point out a few things the theorem does not imply. – The theorem cannot be applied to any specific group. That is, the theorem does not imply that in Z∗p there is no sub-exponential algorithm for ddh. In fact, we know that such an algorithm exists. Similarly, the theorem implies nothing about the group of points on an elliptic curve. – The theorem does not imply that there exists an encoding of Z+ p for which ddh is true. It is certainly possible that for every encoding there exists a ddh algorithm that takes advantage of that particular encoding.
56
3.3
Dan Boneh
Security of Segments of the Diffie-Hellman Secret
Ideally, one would like to prove that cdh implies ddh. To so, one must provide a reduction showing that an oracle for breaking the decision problem can be used to break the computational problem. This appears to be a hard open problem. Nonetheless, one may try to prove weaker results regarding the security of Diffie– Hellman bits. Unfortunately, even proving that computing one bit of gab given g a and gb is as hard as cdh is open. Currently, the only result along these lines is due to Boneh and Venkatesan [4]. At the moment these results only apply to the group Z∗p and its subgroups. We define the k most significant bits of an elements x ∈ Z∗p as the k most significant bits of x when viewed as an integer in the range [0, p). ∗ Theorem 3.4 (Boneh-Venkatesan). Let p be an √ n-bit prime and g ∈ Zp . Let > 0 be a fixed constant and set k = k(n) = d ne. Suppose there exists an expected polynomial time (in n) algorithm, A, that given p, g, ga, gb computes the k most significant bits of gab . Then there is also an expected polynomial time algorithm that given p, g, ga, gb computes all of gab .
Proof Sketch. The proof relies on lattice basis reductions and the LLL algorithm [19]. Given some ga and gb we wish to compute all of gab . To do so, we pick one random r and apply A to the points ga+r , gb+t for many random values of t. Consequently, we learn the most significant bits of g(a+r)b · g(a+r)t . Notice that, with sufficiently high probability, ga+r is a generator of hgi, the group generated by g. Hence, g(a+r)t is a random element of hgi. The problem is now reduced to the following: let α = g(a+r)b ; we are given the most significant bits of α multiplied by random elements in hgi; find α. To solve this problem one makes use of the LLL algorithm. This requires some work since one must prove that even though LLL does not produce a shortest vector, one is still able to find the correct √ α. Indeed, the quality of the shortest vector produced by LLL implies the log p bound on the number of necessary bits. To prove the result for < 1 one makes use of Schnorr’s improvement of the LLL algorithm [27]. t u Once α is found, recovering gab is trivial. The result shows that under cdh there is no efficient algorithm that computes √ roughly log p bits of the Diffie–Hellman secret. To illustrate this, one may take = 1. In this case when p is 1024 bits long, under cdh one cannot compute the 32 leading bits. The same result holds for the least significant bits as well. The smaller the value of the longer the running time of the reduction algorithm. The running time is exponential in 1/. The result is a first step in arguing about the security of segments of the Diffie–Hellman secret based on cdh. Hopefully, future results will show that fewer bits are required to reconstruct the entire secret. Interestingly, this is the only result where the LLL algorithm is used to prove the security of a cryptographic primitive. Usually, LLL is used to attack cryptosystems (for example, consider Coppersmith’s low exponent attacks on RSA [10]).
The Decision Diffie–Hellman Problem
3.4
57
Statistical Results
Although we cannot give bounds on the computational complexity of ddh some results are known on the statistical distribution of proper Diffie–Hellman triples in the group Z∗p . Recently, Canetti, Friedlander and Shparlinski [7] showed that the triples (ga , gb , gab ) are uniformly distributed modulo p in the sense of Weyl. Let p be a prime and g a generator of Z∗p . Let B be a box of size |B| = h1 h2 h3 . That is, B = [k1, k1 + h1 − 1] × [k2 , k2 + h2 − 1] × [k3, k3 + h3 − 1] where 0 ≤ ki ≤ k1 + hi − 1 ≤ p − 1. We denote by N (B) the number of Diffie–Hellman triples (ga , gb , gab ) that when reduced modulo p fall in the box B. Suppose Diffie–Hellman triples were randomly scattered in (Zp )3 . Since there are (p − 1)2 triples over all, one would expect (p − 1)2 · |B|/(p − 1)3 of these to fall inside the box. Denote the discrepancy by |B| ∆ = supB N (B) − p − 1 Then we know [7] that this discrepancy is small. Theorem 3.5 (CFS). Let p be an n-bit prime and g a generator of Z∗p . Then ∆ ≤ O (p31/16) = o(p2 ) The result shows that Diffie–Hellman triples are close to being uniformly distributed among the boxes in Z3p . The proof is based on bounding certain exponential sums. One can give an interesting interpretation of this result using statistical independence. For binary strings x, y, z define Mk (x, y, z) to be the string obtained by concatenating the k most significant bits of x to the k most significant bits of y to the k most significant bits of z. Recall that the statistical distance between two distributions P1 and P2 over {0, 1}3k is defined by X |P1 (x) − P2 (x)| Var(P1 , P2 ) = x
Corollary 3.6 (CFS). Let p be an n-bit prime and set k = dγne for some constant γ < 1/48. Let g be a generator of Z∗p . Define the following two distributions over {0, 1}3k : – P1 is the uniform distribution among all strings in the set Mk (ga , gb , gab ) where a, b are in the range [1, p] and ga , gb , gab are reduced modulo p. – P2 is the uniform distribution on {0, 1}3k . Then the statistical distance between P1 and P2 is Var(P1 , P2 ) ≤ e−c(γ)n where c(γ) > 0 is a constant depending only on γ.
58
Dan Boneh
The corollary shows that given the k most significant bits of ga , gb one cannot distinguish (in the statistical sense) the k most significant bits of gab from a truly random k bit string. This is quite interesting although it does not seem to apply to the security analysis of existing protocols. In most protocols the adversary learns all of ga and gb . The authors claim that a similar result holds for subgroups of Z∗p as long as the index is “not too large”.
4
Applications of Decision Diffie-Hellman (DDH)
We briefly describe some applications of ddh that show why it is so attractive to cryptographers. 4.1
ElGamal Encryption
Let p be a prime and g ∈ Z∗p . The ElGamal public key system encrypts a message m ∈ Zp given a public key ga by computing hgb , m · gab i. Here b is chosen at random in [1, ord(g)]. Decryption using the private key a is done by first computing gab and then dividing to obtain m. When g is a generator of Z∗p the system in not semantically secure2 . Some information about the plaintext is revealed. Namely, the Legendre symbol of ga , gb completely exposes the Legendre symbol of m. In case the symbol of m encodes important information, the system is insecure. This is an example where even though the cdh assumption is believed to be true, the system leaks information. To argue that the ElGamal system is semantically secure one must rely on the ddh assumption. Let G be a group in which the ddh assumption holds and g a generator of G. Then, assuming the message space is restricted to G it is easy to show that the system is semantically secure under ddh. This follows since given ga , gb the secret pad gab cannot be distinguished from a random group element. It follows that m · gab cannot be distinguished from a random group element. Consequently, given the ciphertext, an attacker cannot deduce any extra information about the plaintext. To summarize, ddh is crucial for the security analysis of the ElGamal system. cdh by itself is insufficient. Notice that in the above argument we rely on the fact that the plaintext space is equal to the group G. This is somewhat cumbersome since often one wishes to encrypt an n-bit string rather than a group element. This can be easily fixed using hashing. Suppose |G| > 2n . Then assuming ddh, the string gab has at least n bits of computational entropy[18]. Note that the bit string representing gab may be much longer. Hashing gab to an m-bit string for some m ≤ n results in a bit-string indistinguishable from random. Encryption can be done by xoring this m bit hashed string with the plaintext. To formally argue that this hashing results in a pseudo random string one makes use of the leftover hash lemma [18] and pairwise independent hash functions. 2
Semantic security [17] is the standard security notion for an encryption scheme. It essentially says that any information about the plaintext an eavesdropper can obtain given the ciphertext, can also be obtained without the ciphertext.
The Decision Diffie–Hellman Problem
4.2
59
Efficient Pseudo Random Functions
Naor and Reingold [23] describe a beautiful application of ddh. They show how to construct a collection of efficient pseudo random functions. Such functions can be used as the basis of many cryptographic schemes including symmetric encryption, authentication [14] and digital signatures [1]. Prior to these results, existing constructions [15,22] based on number theoretic primitives were by far less efficient. Pseudo random functions were first introduced by Goldreich, Goldwasser and Micali [15]. At a high level, a set Fn of functions An 7→ Bn is called a pseudo random function ensemble if no efficient statistical test can distinguish between a random function chosen in the set and a truly random function, i.e. a function chosen at random from the set of all functions An 7→ Bn . Here An , Bn are finite domains. The statistical test is only given “black-box” access to the function. That is, it can ask an oracle to evaluate the given function at a point of its choice, but cannot peak at the internal implementation. We refer to [23] for the precise definition. Let G = {Gp } be a group family. For a given value of n ∈ N, the NaorReingold pseudo-random function ensemble, Fn , is a set of functions from {0, 1}n to Gp for some p (the index p may be different for different functions in the ensemble). A function in the set is parameterized by a seed s = hp, g, ai where g is a generator of Gp and a = (a0 , . . . , an ) is a vector of n + 1 random integers in the range [1, |Gp|]. The value of the function at a point x = x1 x2 . . . xn ∈ {0, 1}n is defined by Qn xi fp,g,a (x) = ga0 i=1 ai The distribution on the seed s is induced by the random choice of a and the distribution induced on hp, gi by IG(n). In what follows, we let Af denote the algorithm A with access to an oracle for evaluating the function f. The following theorem is the main result regarding the above construction. Theorem 4.1 (Naor-Reingold). Let G be a group family and let {Fn }n∈N be the Naor-Reingold pseudo-random function ensemble. Suppose the ddh assumption holds for G. Then for every probabilistic polynomial time algorithm A and sufficiently large n, we have that Pr[Afp,g,a (p, g) = “true”] − Pr[ARp,g,a (p, g) = “true”] < where = (n) is negligible. The first probability is taken over the choice of the seed s = hp, g, ai. The second probability is taken over the random distribution induced on p, g by IG(n) and the random choice of the function Rp,g,a among the set of all {0, 1}n 7→ Gp functions. The evaluation of a function fp,g,a (x) in the Naor-Reingold construction can be can be done very efficiently (compared Qn to other constructions). Essentially, one first computes the product r = a0 i=1 axi i mod |Gp| and then computes gr . Hence, the evaluation requires n modular multiplications and one exponentiation. Note that we are assuming the order of Gp is known.
60
4.3
Dan Boneh
A Cryptosystem Secure against Adaptive Chosen Ciphertext Attack
Recently, Cramer and Shoup [11] presented a surprising application of ddh. They describe an efficient public key cryptosystem which is secure against adaptive chosen ciphertext attack. Security against such a powerful attack could only be obtained previously by extremely inefficient techniques [24,26,13] relying on constructions for non-interactive zero-knowledge (efficient heuristic constructions are described in [32]). In light of this, it is remarkable that the ddh assumption is able to dramatically simplify things. An adaptive ciphertext attack is an attack where the adversary has access to a decryption oracle. The adversary is given a ciphertext C = E(M ). He can then query the oracle at arbitrary inputs of his choice. The only restriction is that the queries must be different than the given ciphertext C. The adversary’s goal is to then deduce some information about the plaintext M with non-negligible advantage. To motivate this notion of security we point out that the standard semantic security model [17] provides security against passive (i.e. eavesdropping) attacks. It does not provide any security against an active attacker who is able to influence the behavior of honest parties in the network. In contrast, security against adaptive chosen ciphertext attacks provides security against any active adversary. Clearly, a cryptosystem secure against an adaptive attack must be nonmalleable – given C one should not be able to construct a C 0 such that the decryption of C and C 0 are correlated in any way. Indeed, if this were not the case, the attacker would simply query the decryption oracle at C 0 and learn information about the decryption of C. Thus, the Cramer-Shoup cryptosystem is also non-malleable (assuming ddh). Non-malleable systems are needed in many scenarios (see [13]). For instance, to cheat in a bidding system, Alice may not need to discover Bob’s bid. She may only want to offer a lower bid. Thus, if Bob encrypts his bid using a malleable system, Alice may be able to cheat by creating the encryption of a lower bid without having to break Bob’s cipher. In case Bob encrypts his bid with a non-malleable system, this form of cheating is impossible. 4.4
Others
The ddh assumption is used in many other papers as well. We very briefly mention four (see also the summary in [23]). Recently, Canetti [6] described a simple construction based on ddh for a primitive called “Oracle Hashing”. These are hash functions that let one test that b = h(a), but given b alone, they reveal no information about a. Bellare and Micali [2] use ddh to construct a non-interactive oblivious transfer protocol. Brands [5] pointed out that several suggestions for undeniable signatures [8] implicitly rely on ddh. Steiner, Tsudik and Waidner [31] show that ddh implies generalized–ddh. They consider a generalization of Diffie–Hellman enabling a group of parties to exchange a common secret key. For example, in the case of three parties, each party picks a random
The Decision Diffie–Hellman Problem
61
xi , they publicly compute gxi , gxixj for 1 ≤ i < j ≤ 3 and set their common secret to gx1 x2 x3 . This suggests a generalization of the ddh assumption. Fortunately, Steiner, Tsudik and Waidner show that, for a constant number of parties, ddh implies the generalized–ddh.
5
Conclusions and Open Problems
The Decision Diffie–Hellman assumption appears to be a very strong assumption, yet the best known method for breaking it is computing discrete log. The assumption plays a central role in improving the performance of many cryptographic primitives. We presented the known evidence for its security. This evidence includes (1) a worst-case to average case reduction for ddh. (2) no generic algorithm can break ddh. (3) certain pieces of the Diffie–Hellman secret are provably as hard to compute as the entire secret. (4) statistically, Diffie– Hellman triplets are uniformly distributed (in the sense of Weyl). We conclude with a list of the main open problems in this area. Progress on any of these would be most welcome. Open Problems: 1. Is there an algorithm for ddh in a prime order subgroup of Z∗p whose running time is better than the fastest discrete log algorithm in that subgroup? This is perhaps the most interesting problem related to ddh. It is almost hard to believe that computing discrete log is the best method for testing that a triplet hx, y, zi satisfies the Diffie–Hellman relation. At the moment we are powerless to settle this question one way or another. 2. Is there a group family in which ddh is implied by some “standard” cryptographic assumption, e.g. cdh, or factoring? For instance, let N = pq where p = 2p1 + 1 and q = 2q1 + 1 with p, q, p1, q1 prime. Can one reduce the ddh assumption in Z∗N to the hardness of distinguishing quadratic residues from non residues with Jacobi symbol +1 ? 3. Can one improve the results of [4] (see Section 3.3) and show that in Z∗p the single most significant bit of the Diffie–Hellman secret is as hard to compute as the entire secret? Also, does a similar result to that of [4] hold in the group of points of an elliptic curve?
Acknowledgments The author thanks Victor Shoup for many insightful comments on an early draft of this paper.
62
Dan Boneh
References 1. M. Bellare, S. Goldwasser, “New paradigms for digital signatures and message authentication based on non-interactive zero-knowledge proofs” Crypto ’89, pp. 194–211. 2. M. Bellare, S. Micali, “Non-interactive oblivious transfer and applications”, Crypto ’89, pp. 547–557. 3. D. Boneh, R. Lipton, “Black box fields and their application to cryptography”, Proc. of Crypto ’96, pp. 283–297. 4. D. Boneh, R. Venkatesan, “Hardness of computing most significant bits in secret keys of Diffie–Hellman and related schemes”, Proc. of Crypto ’96, pp. 129–142. 5. S. Brands, “An efficient off-line electronic cash system based on the representation problem”, CWI Technical report, CS-R9323, 1993. 6. R. Canetti, “Towards realizing random oracles: hash functions that hide all partial information”, Proc. Crypto ’97, pp. 455–469. 7. R. Canetti, J. Friedlander, I. Shparlinski, “On certain exponential sums and the distribution of Diffie–Hellman triples”, Manuscript. 8. D. Chaum, H. van Antwerpen, “Undeniable signatures”, Proc. Crypto ’89, pp. 212–216. 9. H. Cohen, “A course in computational number theory”, Springer-Verlag. 10. D. Coppersmith, “Finding a Small Root of a Bivariate Integer Equation; Factoring with high bits known”, Proc. Eurocrypt ’96, 1996. 11. R. Cramer, V. Shoup, “A practical public key cryptosystem provably secure against adaptive chosen ciphertext attack”, manuscript. 12. W. Diffie, M. Hellman, “New directions in cryptography”, IEEE Transactions on Information Theory, vol. 22, no. 6, pp. 644–654, 1976. 13. D. Dolev, C. Dwork, M. Naor, “Non-malleable cryptography”, Proc. STOC’ 91, pp. 542–552. 14. O. Goldreich, S. Goldwasser, S. Micali, “On the cryptographic applications of random functions”, Crypto’ 84, pp. 276–288. 15. O. Goldreich, S. Goldwasser, S. Micali, “How to construct random functions”, J. ACM, Vol. 33, 1986, pp. 792–807. 16. O. Goldreich, L.A. Levin, “Hard core bits based on any one way function”, Proc. STOC ’89. 17. S. Goldwasser, S. Micali, “Probabilistic encryption”, J. Computer and Syst. Sciences, Vol. 28, 1984, pp. 270–299. 18. J. Hastad, R. Impaglizzo, L. Levin, M. Luby, “Construction of pseudo random generators from one-way functions”, SIAM J. of Computing, to appear. Also see preliminary version in STOC’ 89. 19. A. Lenstra, H. Lenstra, L. Lovasz, “Factoring polynomial with rational coefficients”, Mathematiche Annalen, 261:515–534, 1982. 20. U. Maurer, “Towards proving the equivalence of breaking the Diffie–Hellman protocol and computing discrete logarithms”, Proc. of Crypto ’94, pp. 271–281. 21. U. Maurer, S. Wolf, “Diffie–Hellman oracles”, Proc. of Crypto ’96, pp. 268–282. 22. M. Naor, O. Reingold, “Synthesizers and their application to the parallel construction of pseudo-random functions”, Proc. FOCS ’95, pp. 170–181. 23. M. Naor, O. Reingold, “Number theoretic constructions of efficient pseudo random functions”, Proc. FOCS ’97. pp. 458–467.
The Decision Diffie–Hellman Problem
63
24. M. Naor, M. Yung, “Public key cryptosystems provable secure against chosen ciphertext attacks”, STOC ’90, pp. 427–437 25. V. Nechaev, “Complexity of a determinate algorithm for the discrete logarithm”, Mathematical Notes, Vol. 55 (2), 1994, pp. 165–172. 26. C. Rackoff, D. Simon, “Non-interactive zero knowledge proof of knowledge and chosen ciphertext attack”, Crypto’ 91, pp. 433–444. 27. C. Schnorr, “A hierarchy of polynomial time lattice basis reduction algorithms”, Theoretical Computer Science, Vol. 53, 1987, pp. 201–224. 28. J. Schwartz, “Fast probabilistic algorithms for verification of polynomial identities”, J. ACM, Vol. 27 (4), 1980, pp. 701–717. 29. V. Shoup, “Lower bounds for discrete logarithms and related problems”, Proc. Eurocrypt ’97, pp. 256–266. 30. M. Stadler, “Publicly verifiable secret sharing”, Proc. Eurocrypt ’96, pp. 190– 199. 31. M. Steiner, G. Tsudik, M. Waidner, “Diffie–Hellman key distribution extended to group communication”, Proc. 3rd ACM Conference on Communications Security, 1996, pp. 31–37. 32. Y. Zheng, J. Seberry, “Practical approaches to attaining security against adaptively chosen ciphertext attacks”, Crypto ’92, pp. 292–304.
Parallel Implementation of Sch¨ onhage’s Integer GCD Algorithm Giovanni Cesari Universit` a degli Studi di Trieste, DEEI, I-34100 Trieste, Italy
[email protected]
Abstract. We present a parallel implementation of Sch¨ onhage’s integer GCD algorithm on distributed memory architectures. Results are generalized for the extended GCD algorithm. Experiments on sequential architectures show that Sch¨ onhage’s algorithm overcomes other GCD algorithms implemented in two well known multiple-precision packages for input sizes larger than about 50000 bytes. In the extended case this threshold drops to 10000 bytes. In these input ranges a parallel implementation provides additional speed-up. Parallelization is achieved by distributing matrix operations and by using parallel implementations of the multiple-precision integer multiplication algorithms. We use parallel Karatsuba’s and parallel 3-primes FFT multiplication algorithms implemented in CALYPSO, a computer algebra library for parallel symbolic computation we have developed. Sch¨ onhage’s parallel algorithm is analyzed by using a message-passing model of computation. Experimental results on distributed memory architectures, such as the Intel Paragon, confirm the analysis.
1
Introduction
The greatest common divisor (GCD) of two integers a and b, not both zero, is defined as the greatest integer evenly dividing both a and b. When a and b are both zero, every integer divides them both, so the above definition does not apply; it is convenient to set gcd(0, b) = b. All GCD algorithms follow the same idea of reducing efficiently a and b to a0 and b0 , so that gcd (a0 , b0 ) = gcd (a, b). These operations are applied several times, till gcd(a0 , b0 ) can be computed directly from a0 and b0 . Extended GCD algorithms also compute two integers s and t such that gcd (a, b) = as + bt. There are several well known algorithms used to computes GCDs. Euclid’s algorithm (see [11], pp 316-320) uses a modulus reduction: if a > b, gcd(a, b) = gcd(a, a mod b). It requires comparisons and multiple-precision divisions. The binary GCD is based on the following four simple facts (see [11], pp 321323): (i) if a and b are both even, then gcd(a, b) = 2 gcd(a/2, b/2); (ii) if a is even and b is odd, then gcd(a, b) = gcd(a/2, b); (iii) gcd(a, b) = gcd(a − b, b) and (iv) if a and b are both odd, then a − b is even, and |a − b| < max (a, b). By applying these properties we can implement reductions which do not require divisions. They only rely on subtractions, testing whether a number is even or odd, and J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 64–76, 1998. c Springer-Verlag Berlin Heidelberg 1998
Parallel Implementation of Sch¨ onhage’s Integer GCD Algorithm
65
shifting. Although this reduction step is less powerful than Euclid’s one (more reduction steps are required), it is much more efficient. The binary algorithm generally outperforms Euclid’s algorithm. Lehmer’s GCD (see [11], pp 327-330) works only with the leading digits of large numbers. This allows to do most of the calculations with single-precision arithmetic, and to make a substantial saving in the number of multiple-precision operations. The reduction step does not always preserve the identity gcd(a0 , b0 ) = gcd(a, b), and therefore it can be necessary to perform corrections steps at the end of the algorithm. Recently a new efficient GCD algorithm, called the accelerated GCD, has been proposed [14]. Basically it uses a k-ary reduction [13]. The k-ary reduction replaces a and√b by |c1 a − c2 b|/k, where |c1 a − c2 b| is divisible by k and 0 < |c1 | + |c2 | ≤ 2 k, for some fixed positive integer k. It is usually good to choose k as a small power of two. This algorithm also does not preserve the GCD of the operands: it can only be said that the GCD of the original values evenly divides the GCD of the new values. The reduction in size is so large, though, that there is more than enough time to remove the spurious factors from the final answer and still beat the algorithms mentioned above.
2
Sch¨ onhage’s GCD Algorithm
The algorithms mentioned so far have OB (n2 ) bit-complexity on two n-bit integers. In 1971 Sch¨ onhage described an algorithm [12] that can calculate the GCD in OB (log(n)MB (n)) time, where MB (n) is the bit-complexity to multiply two numbers of length n. This algorithm, called also half-GCD, is described in detail in [1,16]. An implementation can be found in [7]. We have found this algorithm interesting for several reason: (i) though it is described in the literature we did not find comparisons of Sch¨ onhage’s algorithm with other implementations; (ii) having OB (log(n)MB (n)) complexity it is more convenient for large input size where we expect parallelization to be useful, and (iii) we did not find any parallel implementation of this algorithm. In the following we first describe Sch¨ onhage’s algorithm ideas and give an implementation of the half-GCD algorithm. Then we compare its performance with other GCD implementations on sequential architectures. Finally, we show how our implementation can be parallelized. Let us motivate Sch¨ onhage’s approach. Given two integers a and b consider the remainder sequence r0 , r1 , . . . , rk , where r0 = a, r1 = b and ri (for i ≥ 2) is the nonzero remainder from the division of ri−2 by ri−1 . The last term rk divides rk−1 exactly. In the worst case, the sum of the bit-length of this sequence is proportional to n2 . So any algorithm that explicitly computes each member of the remainder sequence has at least quadratic bit-complexity. On the other hand, if q0 , q1 , . . . , qk is the quotient sequence associated to the ri ’s, one can see that the sum of its bit-length is proportional to n. Moreover, as we shall see in the next lemma, we can obtain in time OB (log(n)MB (n)) any member of the remainder sequence from the
66
Giovanni Cesari
qi ’s. This suggests to work with the quotient sequence to perform reduction steps. Definition 1 Let a and b be integers with remainder sequence r0 , r1 , . . . , rk and quotient sequence q0 , q1 , . . . , qk . We define 2 × 2 matrices Rij = Ra,b ij by the following two formulas: 1. For i ≥ 0, 10 . Rii = 01 2. If j > i then, Rij =
0 1 1 −qj
0 1 0 1 · ·...· . 1 −qj−1 1 −qi+1
Two interesting properties of these matrices are given in the next lemma (see for example [1]). Lemma 1 1.
rj
rj+1
2. R0j =
= Rij
sj
ri ri+1
tj
sj+1 tj+1
for
i < j < k,
for
0 ≤ j < k,
where ri is equal to r0 si + r1 ti , as defined in the extended Euclidean algorithm, and r0 = a, r1 = b are the values of the two inputs. Let us define the norm ||a|| of an integer a as ||a|| = blog2 |a|c + 1. Let l(i) be the unique integer such that ||rl(i) || > i and ||rl(i)+1 || ≤ i. By using the matrix R0j = R0l(||r0 /2||) we can compute, directly from r0 and r1 , the term rj of length ||r0/2|| in the remainder sequence. This step can be iterated till the last term of the remainder sequence is reached, that is till the GCD is found. The computation of the R0j matrix can be performed recursively, by using only the leading digits of r0 and r1 . The half-GCD algorithm is therefore divided in two parts. First we define a function hgcd(), which has two multiple-precision integers x and y (x > y) as input. Let x = x1 2m + x0 and y = y1 2m + y0 be, where m is half the bit-length of x and x0 , y0 < 2m . The function hgcd() truncates the last m bits of x and y, and returns the matrix R0j = R0l(||x||/2), where rj is the last term in the remainder sequence originated by x and y, such that ||rj || > ||x||/2. By using the function hgcd() it is possible to write a procedure fgcd() which computes the term rj of length ||a/2|| of the remainder sequence originated by a and b. As mentioned before, this procedure is then iterated starting with
Parallel Implementation of Sch¨ onhage’s Integer GCD Algorithm
67
rj till the last term of the remainder sequence, that is the GCD of a and b, is reached. Usually, when ||rj || becomes shorter than a certain bit-length, the reduction steps are performed with a GCD algorithm better suited for integers of moderate length. Let us start with the fgcd() function. In the C++ notation it can be described as follows (see also [7]). 1 2 3 4 5 6 7
8 9
10 11 12 13
14 15 16 17 18
}
bigint fgcd(const bigint& a, const bigint& b) { matrix R; bigint x = a; bigint y = b; for (;;) { fix(x, y); if (y.bitlen() <= GCDLIMIT) return lgcd(x,y); // use Lehmer’s GCD // generate an identical matrix R = Mat I; hgcd(x, y, R); // compute R0l(||x||/2) // compute the middle term of the remainder // sequence by multipling R0l(||x||/2) with [x, y] mulvM(R, x, y); [x, y] = R0l(||x||/2) · [x, y] fix(x, y); if (y.bitlen() <= GCDLIMIT) return lgcd(x,y); // perform one step of Euclidean GCD oneGCD(x,y); if (x == 0) // if rj+1 divides rj return y; // then return the GCD }
Some remarks about this algorithm: - bigint is an abstract data type representing multiple-precision integers. Similarly, matrix is a 2 × 2 matrix of bigints. - GCDLIMIT used in lines 6 and 12 is a threshold for switching from the halfGCD algorithm to an algorithm more efficient for short integers. We use a straightforward implementation of Lehmer’s GCD. We have experimentally found GCDLIMIT equal to 8192 bytes on 32-bit RISC processors. - The for loop is interrupted when the GCD is computed (line 16) or when the remainder of the sequence is smaller than the threshold (lines 6 and 12). At each iteration step the matrix R0l(||x||/2) is computed with the function hgcd() (line 9). In line 10, [rj , rj+1] are obtained by multiplying in place R0l(||x||/2) with [x, y]. Then a new iteration is performed starting with these new input values. - fix(bigint& x, bigint& y) (lines 5 and 11) takes the absolute values of x and y and swap them if y > x.
68
Giovanni Cesari
- In line 14 one step of the Euclidean GCD is computed. This is to ensure that in our iteration ||rj || < ||x/2||. The hgcd() procedure which computes R0j = R0l(||x||/2) can be coded as follows. 1
2 3 4 5 6 7
8 9 10
11 12 13
14 15
16 17 18 19
20 21 22
}
void hgcd(const bigint& x, const bigint& y, matrix& R) { if (x.bitlen() <= y.bitlen/2) return Mat I; // return identical matrix if (x.bitlen() <= INTLIMIT) return Egcd(x[0], y[0], R); int m = 1 + x.bitlen()/2; // let x = x1 2m + x0 , y = y1 2m + y0 bigint x1 = x>>m; // chop the last m bits of x bigint y1 = y>>m; // chop the last m bits of y 1 ,y1 hgcd(x1, y1, R); // compute Rx0l(||x 1 ||/2) x,y // compute [z, w] = R0l(3||x||/4) · [x, y] mulvM(R, x, y, z, w); fix(z, w, R); if (w != 0){ // perform one step of the Euclidean GCD // z, w and R are updated // R = R · [0, 1, 1, −q] oneGCD(z, w, R); int m1 = m/2; // chop the last m1 bits of z and w bigint z1 = z>>m1; bigint w1 = w>>m1; matrix S = Mat I; 1 ,w1 hgcd(z1, w1, S); // compute S = Rz0l(||x||/4) x,y x,y // compute S0l(||x/2||) = S0l(3||x||/4) Rz,w 0l(||x||/2) mulmM(S, R); }
Some remarks on the code above: - On input x > y ≥ 0. Let x = x1 2m + x0 , y = y1 2m + y0 with m = ||x||/2. The function truncates the last m bit of x and y and computes the matrix ; x and y are left unchanged. Rx,y 0l(||x||/2) - In line 5, if x fits in a digit a more suitable extended GCD is called to compute the matrix R. 1 ,y1 is computed. Then in line 11 two succes- In line 10 the matrix Rx0l(||x 1 ||/2) sive terms rj = z and rj+1 = w of the remainder sequence are obtained
Parallel Implementation of Sch¨ onhage’s Integer GCD Algorithm
69
by multiplying R with x and y. That is, it is possible to use the matrix 1 ,y1 , computed by using only the leading bits of x and y, to obtain Rx0l(||x||/2) successive elements rj , rj+1 of the remainder sequence of x and y (see for example [1,16]). These elements have length 3||x||/4. The matrix multiplication could be performed in place by updating [x, y]. - The function oneGCD(z, w, R) in line 14 performs one step of the Euclidean GCD. It updates z and w with two successive elements in the remainder sequence and computes the quotient q = x/y. Then it performs the matrix multiplication R = R · [0, 1, 1, −q]. 1 ,w1 is computed. As before this matrix can - In line 19 the matrix S = Rz0l(||x||/4) . be thought as Rz,w 0l(||x||/2) - Finally in line 20 the matrix Rx,y 0l(||x/2||) is obtained by multiplying the matrix R · [0, 1, 1, −q], computed in line 14, with the matrix S = Rz,w 0l(||x||/2). The output of this function is therefore 0 1 0 1 0 1 · ·...· . Rij = 1 −qj 1 −qj−1 1 −qi+1 2.1
Comparisons with Other GCD Algorithms
It is interesting to compare Sch¨ onhage’s integer GCD with other implementations on sequential architectures. As reference we take the two packages GMP (V. 2.0.2) [8] and CLN (V. 1996) [9]. The first is written in C and is known in the computer algebra community to be a reliable and efficient package, while the second is more recent and includes advanced algorithms for very large inputs, such as FFT based multiplication schemes. This package is written in C++. CLN implements an improved version of Lehmer’s GCD and GMP the accelerated GCD. We have implemented Sch¨ onhage’s GCD algorithm on top of CALYPSO, a package for multiple-precision arithmetic we have developed [2,3]. CALYPSO is designed for parallel architectures by using a message passing model of computation, but can be used efficiently also on sequential machines. Several multiplication methods are available, including Karatsuba’s algorithms, 3-primes integer FFT and floating point FFT algorithms. The best algorithm is chosen at run time depending from the architecture on which the package is run and on the size of the operands. We compare GCD implementations by using as input two successive Fibonacci numbers which constitute a worst case for GCD computations. We can clearly see in Table 1 that our implementation can be used only for large inputs, when the almost linear behavior of FFT-based algorithms becomes important. For small inputs Sch¨ onhage’s GCD is not competitive. However, as we have pointed out previously, in this algorithm there is a threshold for switching to another more efficient computation scheme for small inputs. We use a straightforward Lehmer’s. An improvement of this algorithm yields good performance also for small inputs.
70
Giovanni Cesari
Table 1. Comparison between different implementations of integer GCD and integer extended GCD algorithms. Inputs are two successive Fibonacci numbers fib(N ) and fib(N − 1). Their length is expressed in bytes. The running time is in seconds and the measurements are performed on a SPARC 4 (110 MHz V8 architecture, 64 MB). GCD[fib(N), fib(N-1)] EGCD[fib(N), fib(N-1)] N length Sch¨ onhage GMP CLN Sch¨ onhage GMP CLN 103 88 0.03 0.0015 0.005 0.28 0.015 0.01 104 868 0.47 0.29 0.05 2.5 1.60 0.22 105 8680 20.5 2.9 5.4 31.0 229.0 25.2 106 86784 267.0 300.0 540.0 366.0 6.3 h 1564.0 107 867804 3630.0 8h 14.5 h 4932.0 44.2 h
2.2
Extended GCD Algorithms
It is easy to use Sch¨ onhage’s algorithm to compute the integer extended GCD. It is only necessary to modify the function fgcd() in order to update the matrix R in the main loop. This is achieved by using an extended Euclidean GCD in lines 7, 13, and 14, and by modifying the matrix R in the function fix() at line 5 and 11 . Moreover, at the end of each iteration step it is necessary to update the matrix R by multiplying it with the matrix computed in the previous iteration. Comparisons between Sch¨ onhage’s, CLN, and GMP implementations to compute extended GCD algorithms on sequential machines are shown in Table 1. Already for medium size inputs, Sch¨ onhage’s method becomes superior to the other algorithms we have considered, and it is definitely the method of choice for large inputs.
3
Parallel GCD Algorithms
It is not known whether the GCD of two n-bit inputs can be computed in polylogarithmic time on a polynomial number of processors. Kannan, Miller, and Rudolf [10] showed how to compute the GCD in OB (n(log log n)/ log n) parallel time using O(n2 log2 n) processors. Chor and Goldreich [6] improved this to OB (n/ log n) time and O(n1+ ) processors. We are interested in a more practical approach. Our goal is to design a GCD algorithm which can be implemented on a parallel machine in order to beat the best known implementations. All GCD algorithms we have mentioned in the introduction of this paper have an iterative structure which is difficult to parallelize efficiently. To increase speed, the single instructions inside the loop are chosen to be as simple as possible. This is the case for most GCD algorithms such as the binary or Lehmer’s GCD. Significant speed-up cannot be obtained by parallelizing these operations. An attempts to parallelize GCD algorithms on shared memory machines can be found in [15]. The best results have been obtained with the accelerated GCD. The focus is on the parallelization of the
Parallel Implementation of Sch¨ onhage’s Integer GCD Algorithm
71
core operation of this algorithm, namely the linear combination and right shift (c1 a − c2 b)/2s . The speed-up obtained by following this strategy on a Sequent Balance are moderate. As pointed out by Weber [15], although the speed-ups displayed by the shared memory multiprocessor implementation of the accelerated GCD algorithm are modest, they are probably as good as can be obtained using algorithms that perform some sort of linear combination reduction (which includes all of the most commonly used integer GCD algorithms). These facts suggested to try to parallelize Sch¨ onhage’s GCD. The following considerations are the base of our work. First, as we have observed for other multiple-precision algorithms [4,5], a reasonable speed-up can be obtained only for large input sizes. This means that it is probably better to work on Sch¨ onhage’s GCD algorithm, which has the complexity of the multiplication, rather than on quadratic algorithms, such as accelerated, binary and Lehmer’s GCD algorithms. The input range where we can expect some speed-up corresponds to the number size for which Sch¨ onhage’s algorithm is attractive. Second, we expect from the structure of Sch¨ onhage’s algorithm a better behavior on parallel architectures due to its very close connection with multiplication algorithms. There are several efficient parallel implementations of multiplication algorithms. Third, it is straightforward to generalize the parallelization of Sch¨ onhage’s algorithm to the extended GCD case. 3.1
Parallel Implementation of Sch¨ onhage’s GCD Algorithm
Consider the hgcd() and fgcd() routines presented in the previous section. Take as inputs of fgcd() two successive Fibonacci numbers FN = Fib(N ) and FN−1 = Fib(N − 1 ). They constitute a worst case for the GCD algorithm. The number of iteration of the algorithm is given by log2 n, where n is the length of FN . To estimate n we can use the following formula: ! √ √ 1 + 5 N+1 1 1 − 5 N+1 ( ) ) −( , (1) FN = √ 2 2 5 and therefore, √ 1+ 5 ). n ≈ N log2 ( 2
(2)
That is, the number of iteration in fgcd() is proportional to log2 N . At each iteration step of the fgcd() program a multiplication of a 2 × 2 matrix by a 2-elements vector, R0j · [x, y], is performed in line 10. The elements of the matrix are of the same length, which is half the length of the two elements of the vector. The length of the elements of the vector starts from n/2, and is divided by two at each iteration step. The iteration is stopped when the length reaches the threshold given by GCDLIMIT.
72
Giovanni Cesari
Let Tm (n, m) the time necessary to multiply two multiple-precision integers of length n and m, respectively. In the worst case the time needed to compute the GCD with the fgcd() function is therefore Tfgcd = 4
k0 X
Tm (
k=0
2
X hgcd n n )+ Tk ( k ) + Tl (n), k 2 2 k0
n
, k+1
(3)
k=0
where k0 is the number of iterations performed in fgcd(). The maximum value of k0 is log2 n. The first summation represents the four multiple-precision multiplications necessary to multiply the 2 × 2 matrix by the vector. Tkhgcd (n/2k ) is the time necessary to compute the matrix R0l(||x||/2), where ||x|| is n/2k . Tl (n) represents the time needed to perform operations which take linear time and which we shall not parallelize. Let n/2k be the length of the two inputs x and y of hgcd(). The half-GCD routine makes two recursive calls to itself. Both are performed with inputs of size 1/2 dn/2k e. After the first call there is a vector-matrix multiplication in line 11. The vector has elements of size dn/2k e and the matrix 1/4 dn/2k e. After the second recursive call, there is a multiplication between matrices (line 20) with elements of length 1/4 dn/2k e. We can therefore write Tkhgcd (
X X n n n n n )=4 2j Tm ( k+j+2 , k+j ) + 8 2j Tm ( k+j+2 , k+j+2 ), (4) k 2 2 2 2 2 j0
j0
j=0
j=0
where j0 = log(n/2k ) represents the depth of the recursion. Thus, Tfgcd = Tl (n) + 4
k0 X
Tm (
k=0
4
j0 k0 X X k=0 j=0
2j Tm (
n n , )+ 2k+1 2k n
2
, k+j+2
n
2
)+8 k+j
(5) j0 k0 X X
2j Tm (
k=0 j=0
n n , ). 2k+j+2 2k+j+2
The two recursive calls in hgcd() are not independent. Therefore, the recursive structure of the half-GCD routine is inherently sequential. Parallelization of Sch¨ onhage’s algorithm can be obtained, however, by using parallel implementations of the multiple-precision operations. The matrix multiplications can be parallelized successfully. We have parallelized them at two levels, by dividing the available processors in four groups. First we distribute the elements of the matrix between the different processors groups. One processor in each group acts as master and receives the operands. Then, we perform each multiplication in parallel using the remaining processors of each group. We have used a parallel implementation of Karatsuba’s algorithms and of the 3-primes FFT. By using a message passing model of computation we have found in [5] that the parallel time Tm (n, n) to perform parallel Karatsuba’s multiplication between two integers of length n, for n >> p, is given by Tk (n, n) ≈
k1 nlog2 3 + βn, p
(6)
Parallel Implementation of Sch¨ onhage’s Integer GCD Algorithm
73
where p is the number of processors and β depends from the network characteristics. Similarly [4], the parallel time to perform parallel multiplication by using the 3-primes FFT algorithm is given by T3p (n, n) ≈
n (α + β log n + γ log p) + δn, p
(7)
where α represents the part of the algorithm which can be parallelized without communication, β is the FFT and FFI parallel part, γ is the communication during the FFT and FFI computation, and δ is the sequential part of the algorithm. Using these results we could rewrite equation 5 as a function of the number of processors and of the characteristics of the machine. It is more convenient, however, to analyze separately the parallelization of the fgcd() and hgcd() functions. Let us consider the first summation in equation 3. If we distribute the four multiplications of the matrix-vector multiplication, and we use the parallel Karatsuba’s algorithm to perform multiplications, we find 4
k0 X 0
Tm (
n n , )= 2k+1 2k
k0 k0 X X n k1 k1 n log2 3 + 2 β k+1 ≈ nlog2 3 + 2βn. 2 k+1 p 2 2 p 0 0
(8)
The time needed to perform the hgcd() function can be expressed in a similar way. Similar results hold also if we use the 3-primes FFT algorithm to perform multiple-precision multiplications. In the extended GCD case the hgcd() function remains unchanged, while in the fgcd() function it is necessary to perform a computationally expensive multiplication between matrices at each iteration step. Again, this operation can be parallelized efficiently.
4
Experimental Results
We have implemented Sch¨ onhage’s GCD algorithm on top of our parallel library CALYPSO. Experiments have been performed on an Intel Paragon located at ETH Zurich with 160 processors. The characteristics of the machine are summarized in Table 2. It is interesting to note that while the multiplications in fgcd() can be parallelized efficiently, it is more difficult to obtain some speed-up in the half-GCD routine. This can be explained in the following way. When the operands of Equation 8 in the fgcd() function become too small, multiplications are performed sequentially. Therefore there is a part of this function which cannot be parallelized. In the fgcd() functions, however, this sequential part is independent from the size of the inputs and, therefore, it becomes negligible after a certain
74
Giovanni Cesari
Table 2. Characteristics of the Intel Paragon located at ETH, Zurich. Architecture 160 compute nodes arranged in a 2-D mesh 2 cpus per node Processors application processor clock rate distributed memory, per node 32-bit integer arithmetic IEEE 754 floating-point arithmetic floating-point pipeline instructions Network bandwidth start-up time
Intel i860 50MHz 64 MB
135 MB/s 30 µs
Table 3. Running time on the Paragon of Sch¨ onhage’s GCD. Input size is in bytes, running time is in seconds. Parallel multiplications between multipleprecision integers are performed by using Karatsuba’s algorithm GCD - Karatsuba multiplications EGCD - Karatsuba multiplications input 1 proc 4 proc 12 proc 36 proc 1 proc 4 proc 12 proc 36 proc 8192 47 36 35 35 36 43 42 41 16384 119 81 79 79 159 94 88 87 32768 306 182 170 170 420 212 190 187 65536 807 416 368 363 1139 496 414 391 131072 2193 980 808 770 3185 1208 935 838 262144 6128 2415 1836 1633 9098 3068 2186 1897 524288 17197 6420 4316 3837 25980 8120 5358 4292 1048576 48260 17185 10145 9021 - 21492 13124 9710
input size. On the other hand, the computation in the half-GCD function can be represented as a binary tree. At each node of the tree multiple-precision multiplications are performed. Therefore, if after a certain threshold operations cannot be parallelized, the sequential part of the algorithm grows with the size of the input. Our results on the Paragon are presented in Tables 3 and 4. In all our experiments we have computed the GCD of two random numbers of the same length. In the first table we use only Karatsuba’s algorithm for multiplications. In the second table we switch to the 3-primes FFT algorithm for input sizes larger than a certain threshold. As we have already mentioned, we have divided the processors in four groups. We first parallelize matrix operations by choosing a master processor in each group. Then we parallelize multiplications by using the remaining processors. This explain why the efficiency is significantly larger when only four processors are used. It can be noted in Table 4 that using eight
Parallel Implementation of Sch¨ onhage’s Integer GCD Algorithm
75
Table 4. Running time on the Paragon of Sch¨ onhage’s GCD. Input size is in bytes, running time is in seconds. Parallel multiplications between multipleprecision integers are performed by using the 3-primes FFT algorithm GCD - FFT multiplications input 1 proc 4 proc 8 proc 16 proc 8192 43 34 34 34 16384 111 77 70 75 32768 289 175 172 163 65536 763 400 394 354 131072 1956 914 812 757 262144 4856 2087 1932 1724 524288 11654 4765 4203 3926 1048576 27390 10960 9344 8940
EGCD - FFT multiplications 1 proc 4 proc 8 proc 16 proc 57 40 40 40 145 90 90 85 386 208 205 184 1328 550 508 406 3295 1264 1163 889 7973 2892 2547 2076 19295 6617 5534 4849 46301 14558 13575 11150
processors does not give significant speed-up over the case with four processors. This is because Karatsuba’s algorithm can be parallelized efficiently only on a number of processors which is a power of three. In general, we can see that to get significant speed-up the size of the operands has to be large. As expected, better results can be obtained for the extended GCD.
5
Conclusions
By parallelizing Sch¨ onhage’s GCD algorithm, speed-up can be obtained only for large inputs. Due to the large number of operations which cannot been parallelized efficiently and of the sequential structure of the recursion of the half-GCD procedure, the overhead of Sch¨ onhage’s algorithm cannot be reduced below a certain threshold. Therefore, for inputs of moderate length, a parallel version of Sch¨ onhage’s algorithm cannot compete with sequential algorithms, such as the accelerated GCD or Lehmer’s GCD. However, when the input size is large, Sch¨ onhage’s algorithm is superior and a parallel implementation provides additional speed-up. It is interesting to compare the parallel implementation of Sch¨ onhage’s GCD with the parallel implementation of the accelerated GCD presented in [15]. As in the sequential case, the two algorithms can be used efficiently on different input ranges. For medium size inputs, therefore, the accelerated GCD seems to be better suited both in the sequential and in the parallel case. After this threshold, however, Sch¨ onhage’s algorithm is the method of choice. For the extended GCD, Sch¨ onhage’s algorithm seems to be particularly well suited. However, we did not find in the literature any other parallel implementations which would have allowed us to perform quantitative comparisons.
76
Giovanni Cesari
References 1. A. Aho, J. Hopcroft, and J. Ullman. The Design and Analysis of Computers Algorithms. Addison-Wesley, 1974. 2. G. Cesari. CALYPSO: a computer algebra library for parallel symbolic computation. In Proceedings of the 2nd International Symposium of Parallel Symbolic Computation (PASCO97), Hawaii, USA. ACM Press, 1997. 3. G. Cesari. Parallel Algorithms for Multiple-Precision Arithmetic. PhD thesis, Swiss Federal Institute of Technology, ETH, CH-8092 Zurich, 1997. 4. G. Cesari and R. Maeder. Parallel 3-primes FFT. In Design and Implementation of Symbolic Computation Systems (DISCO96), volume 1128 of LNCS. Springer Verlag, 1996. 5. G. Cesari and R. Maeder. Performance analysis of the parallel Karatsuba multiplication algorithm for distributed memory architectures. Journal of Symbolic Computation, Special Issue on Parallel Symbolic Computation, 21:467–473, 1996. 6. B. Chor and O. Goldreich. An improved parallel algorithm for integer gcd. Algorithmica, 5:1–10, 1990. 7. R.E. Crandall. Projects in scientific computation. Springer New York, 1994. 8. T. Granlund. GNU MP. The GNU Multiple Precision Arithmetic Library, 1996. 9. B. Haible. CLN, a Class Library for Numbers, 1996. 10. R. Kannan, G. L. Miller, and L. Rudolph. Sublinear parallel algorithm for computing the greatest common divisor of two integers. SIAMJC, 16:7–16, 1987. 11. D. E. Knuth. Seminumerical Algorithms, volume 2 of The Art of Computer Programming. Addison-Wesley, second edition, 1981. 12. A. Schoenhage. Schnelle Berechnung von Kettenbruchentwicklungen. Acta Informatica, 1:139–144, 1971. 13. J. Sorenson. Two fast GCD algorithms. Journal of Algorithms, 16(1):110–144, January 1994. 14. K. Weber. The accelerated integer GCD algorithm. ACM Transactions on Mathematical SW, 21:111–122, March 1995. 15. K. Weber. Parallel implementation of the accelerated integer GCD algorithm. Journal of Symbolic Computation, Special Issue on Parallel Symbolic Computation, 21:457–466, 1996. 16. C.K. Yap. Fundamental Problems in Algorithmic Algebra. Princeton University Press (in press), 1996.
The Complete Analysis of the Binary Euclidean Algorithm Brigitte Vall´ee GREYC, Universit´e de Caen, 14032 Caen Cedex, France
[email protected]
Abstract. We provide here a complete average–case analysis of the binary continued fraction representation of a random rational whose numerator and denominator are odd and less than N . We analyse the three main parameters of the binary continued fraction expansion, namely the height, the number of steps of the binary Euclidean algorithm, and finally the sum of the exponents of powers of 2 contained in the numerators of the binary continued fraction. The average values of these parameters are shown to be asymptotic to Ai log N , and the three constants Ai are related to the invariant measure of the Perron-Frobenius operator linked to this dynamical system. The binary Euclidean algorithm has been previously studied in 1976 by Brent who provided a partial analysis of the number of steps, based on a heuristic model and some unproven conjecture. Our methods are quite different, not relying on heuristic hypothesis or conjecture, and more general, since they allow us to study all the parameters of the binary continued fraction expansion.
Introduction The Euclidean algorithms find the greatest common divisor (gcd) of two integers. The classical Euclidean algorithm uses divisions and exchanges, and is based on the two relations gcd(u, v) = gcd(v mod u, u),
gcd(u, v) = gcd(v, u).
The behaviour of the classical algorithm is now well–understood. Heilbronn [Hei] and Dixon [Di] have independently shown that the average number DN of Euclidean divisions on a random rational input with numerator and denominator less than N is asymptotically logarithmic, 12 log 2 log N. π2 Here, we focus on the binary Euclidean algorithm which operates on pairs of odd integers. It performs only subtractions, right binary shifts and exchanges. Let the symbol Val2 (u) denote the dyadic valuation of the integer u, i.e., the largest exponent b such that 2b divides u; then, the binary Euclidean algorithm is based on the relations v−u gcd(u, v) = gcd(v, u). gcd(u, v) = gcd Val (v−u) , v , 2 2 DN ∼
J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 77–94, 1998. c Springer-Verlag Berlin Heidelberg 1998
78
Brigitte Vall´ee
Unlike the classical Euclidean algorithm, no divisions are required so that each iteration of the binary gcd algorithm is faster than an iteration of the classical gcd algorithm. We consider here the number of operations (subtractions, shifts, and exchanges) that are performed on a random rational with numerator and denominator odd and less than N , and we show that each of these numbers has an average that grows logarithmically. The binary Euclidean algorithm operates on pairs of odd integers that belong to the set Ω := {(u, v); u, v odd, 0 < u ≤ v}. (1) as follows: Binary Euclidean Algorithm (u, v) Input: (u, v) ∈ Ω; While u 6= v do While u < v do b := Val2 (v − u); v := (v − u)/2b ; Exchange u and v;
(2)
Output: u (or v). This algorithm has two nested loops: The external loop corresponds to an exchange. Between two exchanges, there is a sequence of iterations that constitutes the internal loop. This sequence consists in subtractions and shifts and can be written as v = u + 2b1 v1 ,
v1 = u + 2b2 v2 ,
v2 = u + 2b3 v3 ,
...
v`−1 = u + 2b` v` , (3)
with v` < u. Then we exchange u and v` . If x = x0 denotes the rational u/v at the beginning of an internal loop, the global result of the sequence (3) of the iterations followed by the last exchange is the rational x1 = v` /u defined by x0 =
1 , a + 2k x1
where a is an odd integer equal to a = 1 + 2b1 + 2b1 +b2 + 2b1+b2 +b3 + . . . + 2b1 +b2 +b3 +...+b`−1 , while the exponent k is equal to k = b1 + b2 + b3 + . . . + b`−1 + b` . Then the antecedents of the rational x1 can be written as x0 = h(x1 ), where all the possible functions h are of the form h(x) :=
1 , a + 2k x
with a odd, 1 ≤ a < 2k , and k ≥ 1.
(4)
The Complete Analysis of the Binary Euclidean Algorithm
79
Thus the rational u/v has a unique binary continued fraction expansion 1
u = v a1 + a2 +
2k 1 2k 2 .. .. . .+
.
(5)
2kr−1 ar + 2 k r
We study three parameters of this continued fraction: (i) The height or the depth (i.e., the number of exchanges) ; here, it is equal to r. (ii) The total number of operations that are necessary to obtain the expansion; if p(a) denotes the number of 1’s in the binary expansion of the integer a, it is equal to p(a1 ) + p(a2 ) + . . . + p(ar ) − 1, where the ai ’s are the denominators of the binary continued fraction. (iii) The total sum of exponents of 2 in the numerators of the binary continued fraction: here, it is equal to k1 + k2 + . . . + kr . Our main results (Theorems 1 and 2) describe the average values of these three parameters: Main Result. The expectations of the main parameters of the Binary Continued Fraction on the set ΩN := {(u, v); u, v odd, 0 < u ≤ v ≤ N }
(6)
are all asymptotically equal to Ai log N . The constants Ai are related to the dominant spectral properties of the operator V2 V2 [f](x) :=
X X k≥1
a odd, 1≤a<2k
2 1 1 , f a + 2k x a + 2k x
(7)
defined on a suitable Hardy space of holomorphic functions inside a disk that contains the real segment ]0, 1]. Denote by f2 the dominant eigenvector of V2 relative to the dominant eigenvalue λ = 1, with the normalization condition R1 f (t)dt = 1, and by F2 the integral of f2 such that F2 (0) = 0. Then the three 0 2 constants can be expressed with f2 and F2 , as follows: A1 =
2 , π 2 f2 (1)
A2 =
X 1 2 1 [ F2 ( )] π 2 f2 (1) a odd 2ka a
A3 = 2A2 − A1 ,
(8)
a≥1
(Here ka denotes the integer part of log2 a). Remark: The relation between A1 , A2 , A3 in (8) is the one to be expected if, at each depth of the process, the decomposition described in the sequence (3) v = au + 2k r,
with a odd less than 2k and r < u,
80
Brigitte Vall´ee
has the same behaviour as in the beginning of the process: when (u, v) is uniformly distributed in Ω, the average values E[p] of p(a) and E[k] of the exponent k satisfy E[k] = 2E[p] − 1. The methods developed here use quite varied tools: generating functions, Ruelle operators, Tauberian methods, functional analysis. First, we use classical tools in the average–case analysis of algorithms [Fl], [FS]: we introduce the generating functions related to the three parameters to be analyzed; as is usual in the context of computational number theory, these generating functions are Dirichlet series. Second, we prove that these generating functions are closely linked to the operator Vs , X X 1 1 )s f( ). (9) ( Vs [f](x) := kx a + 2 a + 2k x a odd, k≥1
1≤a<2k
This operator contains all the information on the dynamics of the algorithm. In the context of dynamical systems [Ru], it is called the Ruelle operator relative to the system. More precisely, the generating functions involve the quasi–inverse operator Λs := (I − Vs )−1 , and the expectations to be studied are partial sums of coefficients of these Dirichlet series, so the main results of the paper will come from the application of Tauberian Theorems [De], provided that they can be applied. This will be the case as soon as the operator Vs when acting in a suitable Banach space has a “spectral gap”, i.e. a unique dominant eigenvalue separated from the remainder of the spectrum by a gap. When acting on a Hardy space of holomorphic functions relative to a suitable disk, the operator Vs is proven to be compact and positive (in the sense of Krasnoselsky) [Kr] for real values of parameter s, and then it has a spectral gap. Since Tauberian theorems link the asymptotics of coefficients to the dominant singularity of the function, the constants Ai involve the dominant singularity of the quasi–inverse (I − Vs )−1 , i.e. the dominant spectral objects of the Ruelle operator Vs . These “dynamical” methods have been already used in other domains of algorithmic number theory, mainly in the study of the classical Euclidean algorithm, or related algorithms [Ma1], [Hei ] (1 ≤ i ≤ 4), [Vai ] (1 ≤ i ≤ 3), [DFV]. In particular, the author [FV] uses them to obtain an alternative proof of the results of Heilbronn and Dixon about the logarithmic behaviour of the Classical Euclidean algorithm. Recently, the author [Va4] has extended similar methods in the general context of information theory, where she describes algorithmic properties of dynamical sources. In all previous works, the functional analysis to be used is simpler, because the Ruelle operators act on a suitable Banach space of holomorphic functions defined in an open disk and continuous on the boundary. Here, the powers of 2 introduce singularities on the boundary of the domain, and make the functional space more difficult to deal with. Our methods are quite different of those used by Brent. In fact, Brent [Br1] analysed an extension of the algorithm to a continuous (probabilistic) model. There, the input is a real number x in ]0, 1] and the line (2) of the previous algorithm algorithm is modified.
The Complete Analysis of the Binary Euclidean Algorithm
81
Continuous Probabilistic Binary Euclidean Algorithm (x) Input: x (real) ∈ ]0, 1]; While x 6= 1 do 1 u If x = for some (u, v) ∈ Ω, then b := Val2 ( − 1) v x else choose an integer b ≥ 1 with Pr[b = k] = 2−k ; 1 1 y := b ( − 1); 2 x 1 if y < 1 then x := y else x := ; y Brent studies the average number of internal loops and he introduces the operator B2 B2 [f](x) :=
X b≥1
1 X x 1 2 1 2 + f f , b b b 1+2 x 1+2 x x+2 x + 2b b≥1
that transforms densities: if f is the initial density on ]0, 1], then B2 [f] is the density on ]0, 1] after one internal step of the algorithm, and B2` [f] is the density on ]0, 1] after ` internal steps of the algorithm. Brent conjectures that there exists a limit density for the algorithm, i.e. a function g2 such that lim B2` [f] = g2 ,
`→∞
(10)
Then he makes an (essential) heuristic hypothesis: the rationals have a “typical” behaviour inside the reals, so that the restriction to the rationals of the limit (real) density defined in (10) is also the limit (rational) density. Then, he can prove the following
Theorem (Brent). Under the conjecture and the heuristic hypothesis, the average number K_N or K̃_N of total iterations of the Binary Euclidean Algorithm on the set Ω_N := {(u, v); u, v odd, 0 < u ≤ v ≤ N}, or on the set Ω̃_N := {(u, v); u, v odd, 0 < u ≤ v ≤ N, gcd(u, v) = 1}, satisfies
\[ K_N \sim \tilde K_N \sim \frac{1}{M}\log N \quad\text{with}\quad M = \log 2 - \frac{1}{2}\int_0^1 \log(1-x)\,g_2(x)\,dx. \tag{11} \]
Here, g₂ denotes the fixed point of the operator B₂,
\[ B_2[f](x) := \sum_{b\ge 1}\frac{1}{(1+2^{b}x)^{2}}\, f\!\left(\frac{1}{1+2^{b}x}\right) + \sum_{b\ge 1}\frac{1}{(x+2^{b})^{2}}\, f\!\left(\frac{x}{x+2^{b}}\right), \tag{12} \]
such that ∫₀¹ g₂(t) dt = 1.
So, the main differences between the two methods, Brent's method and ours, are the following:
(i) We obtain a proven transfer from the continuous model to the discrete model by means of Tauberian Theorems.
(ii) Brent's operator
\[ B_s[f](x) := \sum_{b\ge 1}\frac{1}{(1+2^{b}x)^{s}}\, f\!\left(\frac{1}{1+2^{b}x}\right) + \sum_{b\ge 1}\frac{1}{(x+2^{b})^{s}}\, f\!\left(\frac{x}{x+2^{b}}\right) \tag{13} \]
is less powerful than the operator Vs, since it can be used only in the study of the second parameter (the total number of steps).
(iii) For proving Brent's conjecture, one can try to exhibit a “spectral gap” for Bs. This seems difficult because it is not clear, for instance, whether Bs is compact on a suitable Banach space: another expression of Bs, namely
\[ B_s[f](x) = U_s[f](x) + \frac{1}{x^{s}}\,U_s[f]\!\left(\frac{1}{x}\right), \quad\text{with}\quad U_s[f](x) := \sum_{k\ge 1}\frac{1}{(1+2^{k}x)^{s}}\, f\!\left(\frac{1}{1+2^{k}x}\right), \tag{14} \]
involves both x and 1/x and suggests that the behaviour of Bs is more intricate.
Plan of the Paper. We first introduce the generating functions that are involved in the study of the first parameter (the height), and we exhibit their connection with the Ruelle operator Vs. Then, we state precisely the properties of the operator Vs on a suitable Banach space (compactness, nuclearity, positivity), and we show that the hypotheses of the Tauberian Theorems are fulfilled. We deduce the first result on the average behaviour of the height. Then, we show how to slightly modify the previous framework so as to study the other parameters related to the numerators and the denominators of the binary continued fraction. The properties of the operator Vs which have been previously stated allow us to obtain easily the asymptotic behaviour of their average values. Finally, we relate our work to Brent's paper and we describe principles on which numerical computations could be performed. An extended version of the paper (with all the proofs) will appear in the Algorithmica journal [Va6].
1
The Average Height of a Binary Continued Fraction
Here, we introduce generating functions that describe the height of a binary continued fraction and we relate them to the dynamical operator. We consider the following sets
\[ \Omega := \{(u, v);\ u, v\ \text{odd},\ 0 < u \le v\},\qquad \tilde\Omega := \{(u, v)\in\Omega;\ \gcd(u, v) = 1\}, \]
\[ \Omega_N := \{(u, v)\in\Omega;\ v\le N\},\qquad \tilde\Omega_N := \{(u, v)\in\tilde\Omega;\ v\le N\}, \tag{15} \]
for the possible inputs of the Binary Euclidean algorithm, and we denote by Ω^{[ℓ]}, Ω̃^{[ℓ]}, Ω_N^{[ℓ]}, Ω̃_N^{[ℓ]} the subsets of Ω, Ω̃, Ω_N, Ω̃_N for which the algorithm performs exactly ℓ exchanges. Equivalently, the height of the binary continued fraction is equal to ℓ. We study the average number of exchanges Ẽ_N of the algorithm on
Ω̃_N and the average number of iterations E_N of the algorithm on Ω_N. These quantities satisfy
\[ \tilde E_N := \frac{1}{|\tilde\Omega_N|}\sum_{\ell\ge 0}\ell\,|\tilde\Omega_N^{[\ell]}|, \qquad E_N := \frac{1}{|\Omega_N|}\sum_{\ell\ge 0}\ell\,|\Omega_N^{[\ell]}|, \tag{16} \]
and we wish to evaluate their asymptotic behaviour (for N → ∞). The operator Vs defined in (9) plays a central role for evaluating the costs. This operator can be written, in a shorthand notation, as
\[ V_s[f](x) := \sum_{h\in L}\frac{1}{D[h](x)^{s}}\; f\circ h(x). \tag{17} \]
Here, D[h] denotes the denominator of the linear fractional transformation (LFT) h, defined for h(x) = (ax + b)/(cx + d) with a, b, c, d coprime integers by
\[ D[h](x) = |cx + d| = \sqrt{\frac{|\det h|}{|h'(x)|}}, \tag{18} \]
and it can be extended in a natural way to a complex neighborhood of ]0, 1]. The sum is taken over the set L that contains all the linear fractional transformations h that express the possible antecedents of a rational x as a function of x; these h are defined in (4) and are called LFT's of height 1. More generally, the multiplicativity of D defined in (18),
\[ D[h\circ g](x) = D[h](g(x))\, D[g](x), \]
proves that the ℓ-th iterate of Vs uses all the LFT's h of L^ℓ; they are of the form h = h₁ ∘ h₂ ∘ … ∘ h_ℓ with h_i ∈ L and express all the possible ℓ-th antecedents of x₀ as a function of x₀. These LFT's are called LFT's of height ℓ, and
\[ V_s^{\ell}[f](x) := \sum_{h\in L^{\ell}}\frac{1}{D[h](x)^{s}}\; f\circ h(x). \tag{19} \]
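The following small Python fragment (an added illustration, not part of the paper) makes the height-one branches and the multiplicativity of D concrete. It uses the explicit form h(x) = 1/(a + 2^k x), a odd, 1 ≤ a < 2^k, quoted for the set L in Section 3, represents each LFT by an integer matrix, and checks D[h∘g](x) = D[h](g(x)) D[g](x) on exact rationals.

from fractions import Fraction

def lft(a, k):
    """Matrix (p, q, r, s) of the height-one branch h(x) = 1/(a + 2**k x),
    i.e. h(x) = (0*x + 1)/(2**k * x + a)."""
    return (0, 1, 2 ** k, a)

def compose(m1, m2):
    (a, b, c, d), (e, f, g, h) = m1, m2
    return (a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h)

def apply_lft(m, x):
    a, b, c, d = m
    return (a * x + b) / (c * x + d)

def D(m, x):
    a, b, c, d = m                 # denominator |c x + d| of the LFT
    return abs(c * x + d)

h, g = lft(3, 2), lft(1, 1)        # two height-one branches
x = Fraction(1, 3)
lhs = D(compose(h, g), x)          # D[h o g](x), a height-two LFT
rhs = D(h, apply_lft(g, x)) * D(g, x)
print(lhs, rhs, lhs == rhs)        # multiplicativity holds exactly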
Since the ℓ-th iterate of Vs generates all the LFT's of height ℓ, the operator Vs can be viewed as a “generating” operator, and this explains why usual generating functions can be easily expressed with it.
Proposition 1. The average number of exchanges Ẽ_N of the binary Euclidean algorithm on Ω̃_N is a ratio where the numerators and the denominators involve the partial sums of the Dirichlet series F(s) and G(s), defined from the quasi–inverse Λ_s := (I − V_s)^{-1},
\[ F(s) := \Lambda_s[1](1) = (I-V_s)^{-1}[1](1), \qquad G(s) := \Lambda_s^{2}\circ V_s[1](1) = (I-V_s)^{-2}\circ V_s[1](1). \]
In the same vein, the average number of exchanges E_N of the algorithm on Ω_N is a ratio where the numerators and the denominators involve the partial sums of
the Dirichlet series ζ̃(s)F(s) and ζ̃(s)G(s), where ζ̃(s) is the Riemann function relative to odd numbers,
\[ \tilde\zeta(s) := \sum_{v\ \mathrm{odd},\ v\ge 1}\frac{1}{v^{s}} = \left(1-\frac{1}{2^{s}}\right)\zeta(s). \tag{20} \]
Proof. Consider an element (u, v) of Ω̃^{[ℓ]} (ℓ ≥ 0); since the Binary Euclidean algorithm performs ℓ exchanges on this input, there exists exactly one linear fractional transformation h of height ℓ such that u/v = h(1). Then, from (19) and for any ℓ ≥ 0,
\[ V_s^{\ell}[f](1) = \sum_{h\in L^{\ell}}\frac{1}{D[h](1)^{s}}\, f\circ h(1) = \sum_{(u,v)\in\tilde\Omega^{[\ell]}}\frac{1}{v^{s}}\, f\!\left(\frac{u}{v}\right). \tag{21} \]
When using the Riemann series ζ̃ relative to odd numbers defined in (20), we obtain, for (21), a sum over Ω^{[ℓ]},
\[ V_s^{\ell}[f](1) = \frac{1}{\tilde\zeta(s)}\sum_{(u,v)\in\Omega^{[\ell]}}\frac{1}{v^{s}}\, f\!\left(\frac{u}{v}\right). \tag{22} \]
Now, the sum over all the possible heights (ℓ ≥ 0) of (21) or (22) gives two formal expressions of the quasi–inverse,
\[ (I-wV_s)^{-1}[f](1) = \sum_{\ell\ge 0} w^{\ell}\,V_s^{\ell}[f](1) = \sum_{\ell\ge 0} w^{\ell}\!\!\sum_{(u,v)\in\tilde\Omega^{[\ell]}}\frac{1}{v^{s}}\, f\!\left(\frac{u}{v}\right) = \frac{1}{\tilde\zeta(s)}\sum_{\ell\ge 0} w^{\ell}\!\!\sum_{(u,v)\in\Omega^{[\ell]}}\frac{1}{v^{s}}\, f\!\left(\frac{u}{v}\right). \tag{23} \]
The differentiation of (23) with respect to w gives
\[ (I-wV_s)^{-2}\circ V_s[f](1) = \sum_{\ell\ge 1} \ell\,w^{\ell-1}\!\!\sum_{(u,v)\in\tilde\Omega^{[\ell]}}\frac{1}{v^{s}}\, f\!\left(\frac{u}{v}\right) = \frac{1}{\tilde\zeta(s)}\sum_{\ell\ge 1} \ell\,w^{\ell-1}\!\!\sum_{(u,v)\in\Omega^{[\ell]}}\frac{1}{v^{s}}\, f\!\left(\frac{u}{v}\right). \tag{24} \]
We now choose f = 1 and w = 1, and we denote by ν̃_v^{[ℓ]} (resp. ν_v^{[ℓ]}) the number of elements of Ω̃^{[ℓ]} (resp. Ω^{[ℓ]}) whose second component is equal to v; then the average numbers defined in (16) are of the form
\[ \tilde E_N = \frac{\sum_{\ell\ge 0}\ell\sum_{v\le N}\tilde\nu_v^{[\ell]}}{\sum_{\ell\ge 0}\sum_{v\le N}\tilde\nu_v^{[\ell]}}, \qquad E_N = \frac{\sum_{\ell\ge 0}\ell\sum_{v\le N}\nu_v^{[\ell]}}{\sum_{\ell\ge 0}\sum_{v\le N}\nu_v^{[\ell]}}. \tag{25} \]
On the other side, the Dirichlet series that generate the quantities ν_v^{[ℓ]} and ν̃_v^{[ℓ]} to be studied,
\[ F(s) := \sum_{v\ge 1}\frac{1}{v^{s}}\sum_{\ell\ge 0}\tilde\nu_v^{[\ell]} = \frac{1}{\tilde\zeta(s)}\sum_{v\ge 1}\frac{1}{v^{s}}\sum_{\ell\ge 0}\nu_v^{[\ell]}, \tag{26} \]
\[ G(s) := \sum_{v\ge 1}\frac{1}{v^{s}}\sum_{\ell\ge 0}\ell\,\tilde\nu_v^{[\ell]} = \frac{1}{\tilde\zeta(s)}\sum_{v\ge 1}\frac{1}{v^{s}}\sum_{\ell\ge 0}\ell\,\nu_v^{[\ell]}, \tag{27} \]
are expressed in terms of the operator V_s as
\[ F(s) := (I-V_s)^{-1}[1](1), \qquad G(s) := (I-V_s)^{-2}\circ V_s[1](1). \tag{28} \]
Relations (25), (26), (27), (28) then prove the proposition.
Thus, the asymptotic evaluations of Ẽ_N and E_N (for N → ∞) are possible if we can apply the following Tauberian theorem [De] to the Dirichlet series F(s), ζ̃(s)F(s), G(s), ζ̃(s)G(s).
Tauberian Theorem. Let F(s) = Σ_{n≥1} a_n n^{-s} be a Dirichlet series with non-negative coefficients such that F(s) converges for ℜ(s) > σ > 0. Assume that
(i) F(s) is analytic on ℜ(s) = σ, s ≠ σ, and
(ii) for some γ ≥ 0,
\[ F(s) = \frac{A(s)}{(s-\sigma)^{\gamma+1}} + C(s), \]
where A, C are analytic at σ, with A(σ) ≠ 0. Then, as N → ∞,
\[ \sum_{n\le N} a_n = \frac{A(\sigma)}{\sigma\,\Gamma(\gamma+1)}\,N^{\sigma}\log^{\gamma}N\,\bigl[1+\varepsilon(N)\bigr], \qquad \varepsilon(N)\to 0. \]
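For readers unfamiliar with this statement, here is a standard worked instance (added for illustration, not taken from the paper): the divisor function. One takes F(s) = ζ(s)², which converges for ℜ(s) > 1, is analytic on ℜ(s) = 1 except at s = 1, and has a double pole there, so the theorem applies with σ = 1 and γ = 1:
\[
F(s)=\zeta(s)^{2}=\sum_{n\ge 1}\frac{d(n)}{n^{s}}=\frac{A(s)}{(s-1)^{2}}+C(s),\qquad A(s)=\bigl((s-1)\zeta(s)\bigr)^{2},\ A(1)=1,
\]
\[
\sum_{n\le N} d(n)\ \sim\ \frac{A(1)}{1\cdot\Gamma(2)}\,N\log N\ =\ N\log N,
\]
which is the classical Dirichlet estimate for the average order of the number-of-divisors function.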
In the remainder of the paper, we show that the Tauberian Theorem applies to the Dirichlet series defined in (26) and (27) with σ = 2, and respectively γ = 0 and γ = 1. This will prove the logarithmic behaviour of the height. We first examine the case of the function F(s) defined in (26). It is closely linked to the ζ̃ function defined in (20),
\[ F(s) := (I-V_s)^{-1}[1](1) = 1 + \frac{1}{\tilde\zeta(s)}\sum_{v\ \mathrm{odd}}\frac{v-1}{2}\,\frac{1}{v^{s}} = \frac{1}{2}\left[\frac{\tilde\zeta(s-1)}{\tilde\zeta(s)} + 1\right]. \tag{29} \]
It is then clear that the Tauberian Theorem applies to F(s) and ζ̃(s)F(s), with σ = 2 and γ = 0.
Lemma 1. The Tauberian Theorem applies to F(s) and ζ̃(s)F(s) with σ = 2 and γ = 0. It applies more generally (with the same parameters) to Λ_s[f](1) for functions f that satisfy |x f′(x)| ≤ K for x ∈ ]0, 1].
Proof. See the extended paper [Va6].
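The closed form (29) can be tested numerically. The following Python check (an added illustration; the truncation bound V and the evaluation point s are arbitrary) compares the definition (26), computed from the coprime counts via Euler's totient, with (1/2)[ζ̃(s−1)/ζ̃(s) + 1] at a real s > 2.

def phi_sieve(limit):
    """Euler's totient for 0..limit by a standard sieve."""
    phi = list(range(limit + 1))
    for p in range(2, limit + 1):
        if phi[p] == p:                    # p is prime
            for m in range(p, limit + 1, p):
                phi[m] -= phi[m] // p
    return phi

def zeta_odd(s, vmax):
    """Truncated zeta function restricted to odd integers, cf. (20)."""
    return sum(1.0 / v ** s for v in range(1, vmax + 1, 2))

s, V = 3.0, 100001
phi = phi_sieve(V)
# for odd v > 1 there are phi(v)/2 odd u <= v coprime to v; exactly one for v = 1
lhs = 1.0 + sum((phi[v] / 2) / v ** s for v in range(3, V + 1, 2))
rhs = 0.5 * (zeta_odd(s - 1, V) / zeta_odd(s, V) + 1.0)
print(lhs, rhs)                            # the two values agree to a few decimal places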
2
The Operator Vs
If we wish to apply the Tauberian Theorem to G(s) and ζ̃(s)G(s), with σ = 2 and γ = 1, we need to study the operator Vs when acting on a suitable functional space. It appears that the convenient space is a Hardy space H²(D) relative to some disk D that contains the interval ]0, 1]. On this space, the operator is compact, even nuclear, and its spectrum is thus discrete (Proposition 2). Furthermore, for real values of the parameter s, the operator satisfies strong positivity properties that entail the existence of dominant spectral objects (dominant eigenvalue, dominant eigenfunction, projector on the dominant eigensubspace) (Proposition 4). Finally, we exhibit a maximum property of the spectral radius (Proposition 5). All these properties allow us to prove that the hypotheses of the Tauberian Theorems are fulfilled (Proposition 6), and we thus obtain the logarithmic behaviour of the height of the binary continued fraction (Theorem 1).
The operator V₂ cannot act on the Banach space of analytic functions defined in a complex neighborhood of [0, 1], since, for instance, the function V₂[1] is not bounded in the neighborhood of 0. On the other hand, the operator V₂ acts on the Banach space L¹([0, 1]), since one has
\[ \int_0^1 |V_2[f](t)|\,dt \le \int_0^1 |f(t)|\,dt. \tag{30} \]
But the space L¹([0, 1]) seems too large, and, in the following, the main problem is to find a suitable Banach space “between” analytic functions and integrable ones, on which the operators Vs act (for s near 2) with “good” spectral properties. The Hardy space relative to a suitable disk containing the interval ]0, 1] will be convenient. We consider the open disk D of diameter J := ]0, 2ρ[, related to some fixed number ρ with 1/2 < ρ < 1, the boundary δ of D, and the circle δ_r of center ρ and radius r. We work with the space of the functions defined in D, that are analytic inside D and such that the quantity
\[ \|f\|_2^{2} = \sup_{0\le r<\rho}\ \frac{1}{2\pi\rho}\int_{\delta_r} |f(z)|^{2}\,|dz| \]
is finite. Such functions f are thus defined almost everywhere on δ, and one also has
\[ \|f\|_2^{2} := \frac{1}{2\pi\rho}\int_{\delta} |f(z)|^{2}\,|dz|. \tag{31} \]
This set of functions is classically denoted by H²(D) and is called the Hardy space of order two associated to the disk D. The quantity ‖f‖₂ defined in (31) is a norm which endows H²(D) with a Banach space structure, and even a Hilbert space structure. Remark that the restriction of an element f ∈ H²(D) to the unit interval belongs to L²([0, 1]), and thus to L¹([0, 1]).
The operator Vs is the sum of the component operators V_{s,a,k},
\[ V_{s,a,k}[f](z) := \left(\frac{1}{a+2^{k}z}\right)^{s} f\!\left(\frac{1}{a+2^{k}z}\right). \tag{32} \]
Each component operator is a so-called composition operator of the form f → h^s · f ∘ h, where h, defined in (4), is an analytic function which maps D inside D. Such operators have been studied extensively by several authors (Schwartz [Sc], Shapiro and Taylor [ST], Shapiro [Sh1], [Sh2]), who link the properties of the operator to the position of the image h(D) with respect to the boundary of D. More precisely, they prove the following fact: when acting on H²(D), a composition operator is compact as soon as h maps the closure D̄ of the disk D inside D. It is then moreover nuclear of order 0 (in the sense of Grothendieck, cf. [Gr1], [Gr2]). Here, since ρ is chosen strictly greater than 1/2, each component operator is thus compact and nuclear of order 0, and we now prove that the same properties hold for the operator Vs itself.
Proposition 2. For ℜ(s) > 3/2, the operator Vs acts on the space H²(D) and is compact on this space. Its spectrum is thus discrete with an accumulation point at 0. Moreover, Vs is nuclear of order 0 on this space.
Proof. See the extended paper [Va6].
It appears that the image of the space H²(D) by the operator Vs is contained in a very particular subspace G of H²(D), where singularities are concentrated at z = 0 and involve only logarithmic terms proportional to log z or periodic terms satisfying C(z) = C(2z). In particular, all the eigenfunctions of Vs belong to G and satisfy the hypotheses of Lemma 1. These facts will play a central role in the proofs of Propositions 4 and 5.
Proposition 3. The image of the space H²(D) by the operator Vs is contained in the subset G of H²(D),
G := {f ∈ H²(D) : f(z) = A log z + B(z) + C(z)},
where B and C are analytic inside D; furthermore, B is continuous in D̄ and satisfies B(0) = 0; C is continuous inside D̄ \ {0}, satisfies C(2z) = C(z), and is bounded on the real segment J.
Proof. See the extended paper [Va6].
We now describe the spectral properties of the operator Vs : H²(D) → H²(D), for real values of the parameter s. We show strong positivity properties for this operator, so that we can use Krasnoselsky's theorem and exhibit dominant spectral properties for the operators Vs associated to real values of the parameter s.
Proposition 4. For real s > 3/2, the operator Vs : H²(D) → H²(D) has a unique positive dominant eigenvalue λ(s). For s = 2, one has λ(s) = 1.
Sketch of the Proof. (For a detailed proof, see the extended paper [Va6].) We follow the main lines of Mayer's work [Ma2], which we adapt to our context. We
consider the real Banach space H_R formed by the elements f of H²(D) which are real on the real segment J. For real s, Vs acts on H_R. We denote by H₊ the subset of H_R formed by the elements f which are positive on the real segment J. For real s, Vs acts on H₊, and H₊ ⊂ H_R is a cone (proper and reproducing in the sense of Krasnoselsky). The interior of the cone, denoted by H₊*, is formed by the elements f of H²(D) which are strictly positive on the real segment J. Since J ⊂ ]0, 2[, the cone H₊* contains the function
\[ u_0(z) := 1 - \log_2 z, \tag{33} \]
which plays the role of a reference function: it is a “typical” function of the set G of Proposition 3. We show that the operator Vs is u₀-positive with respect to the cone H₊: for any non-zero element f of H₊, there exists an integer p such that Vs^p[f] is u₀ upper-bounded and u₀ lower-bounded; more precisely, there exist p and two strictly positive constants α and β such that α u₀(x) ≤ Vs^p[f](x) ≤ β u₀(x) for any x in J.
Then, we use Krasnoselsky's theorem: since Vs : H_R → H_R is a compact u₀-positive operator with respect to the proper and reproducing cone H₊, the restriction of Vs to the real Banach space H_R has a unique dominant eigenvalue λ(s), which is strictly positive. One can choose the dominant eigenvector f_s in the set H₊* ∩ G, which means that f_s is strictly positive on J and of the form A log₂(z) + B(z) + C(z). Moreover, the nuclearity proven in Proposition 2 shows that the spectra of the two operators, the operator Vs : H²(D) → H²(D) and its restriction to H_R, are the same.
Propositions 2 and 4 then prove the existence of a spectral gap for Vs.
Corollary 1. For real s > 3/2, the operator Vs has on H²(D) a “spectral gap”: there is a gap between the dominant eigenvalue λ(s) (which is positive and simple) and the remainder of the spectrum, i.e. the supremum ρ(s) := Sup{|µ| : µ ∈ Sp Vs, µ ≠ λ(s)} is strictly less than λ(s).
We now prove a maximum property for the spectral radius which appears to be analogous to those proven in [Fa], [Po] or [Va3] in other contexts.
Proposition 5. For complex s in the punctured half-plane {ℜ(s) ≥ 2, s ≠ 2}, the operator Vs : H²(D) → H²(D) has a spectral radius strictly less than 1.
Proof. See the extended paper [Va6]. We only describe the three main steps:
(i) On each line {ℜ(s) = σ > 2}, any eigenvalue of Vs is less than λ(σ) in absolute value.
(ii) On the line {ℜ(s) = 2, s ≠ 2}, any eigenvalue of Vs is strictly less than 1 in absolute value. One uses Lemma 1 there.
(iii) The function λ(s) is strictly decreasing along the real axis.
We finally prove that all the properties that we have established for Vs : H²(D) → H²(D) are sufficient to apply the Tauberian theorems.
Proposition 6. The conditions of the Tauberian Theorem hold for G(s) and ζ̃(s)G(s), with σ = 2 and γ = 1.
Proof. Proposition 5 shows that the quasi-inverse operator Λ_s is regular in the punctured half-plane ℜ(s) ≥ 2, s ≠ 2, so the hypothesis (i) of the Theorem is fulfilled. Consider now the hypothesis (ii). Corollary 1 proves that the operator V₂ : H²(D) → H²(D) has a spectral gap. Then it admits a spectral decomposition of the form P₂ + N₂, where P₂ is the projection on the dominant eigensubspace. By perturbation theory, this decomposition extends for Vs to a (complex) neighborhood of s = 2, the dominant eigenvalue λ(s) is analytic there, and one has, for all ℓ ≥ 1 and for z ∈ D,
\[ V_s^{\ell}[f](z) = \lambda(s)^{\ell} f_s(z)\, e_s[f] + N_s^{\ell}[f](z), \tag{34} \]
where f_s is the dominant eigenvector of V_s and e_s the projector on the dominant eigensubspace, with the normalization condition e_s[f_s] = 1. One deduces from (34) expressions for the powers of the quasi-inverse,
\[ \Lambda_s^{p}[f](z) := (I-V_s)^{-p}[f](z) = \frac{f_s(z)\,e_s[f]}{(1-\lambda(s))^{p}} + (I-N_s)^{-p}[f](z). \tag{35} \]
Now, using the derivability of s → λ(s) at s = 2 and the equality e₂[f] = ∫₀¹ f(x) dx, we obtain the decomposition of the Dirichlet series of Proposition 1 near s = 2,
\[ F(s) := (I-V_s)^{-1}[1](1) = \frac{-1}{\lambda'(2)}\,\frac{f_2(1)}{s-2} + C(s), \tag{36} \]
and
\[ G(s) := \Lambda_s^{2}\circ V_s[1](1) = \frac{1}{\lambda'(2)^{2}}\,\frac{f_2(1)}{(s-2)^{2}} + D(s), \tag{37} \]
where C(s) and D(s) are analytic at s = 2. Then G(s) satisfies the hypothesis (ii) of the Tauberian Theorem. The comparison of (29) and (36) at s = 2 proves
\[ \frac{-f_2(1)}{\lambda'(2)} = \frac{1}{4\,\tilde\zeta(2)}, \qquad\text{i.e.}\qquad \frac{-1}{\lambda'(2)} = \frac{2}{\pi^{2} f_2(1)}. \tag{38} \]
From Proposition 6 and relations (36), (37), (38), we thus deduce our first main result:
Theorem 1. The average number E_N or Ẽ_N of exchanges of the Binary Euclidean Algorithm on the set Ω_N := {(u, v); u, v odd, 0 < u ≤ v ≤ N}, or on the set Ω̃_N := {(u, v); u, v odd, 0 < u ≤ v ≤ N, gcd(u, v) = 1}, satisfies
\[ E_N \sim \tilde E_N \sim \frac{2}{\pi^{2} f_2(1)}\,\log N, \]
where f₂ is the fixed point of the operator V₂,
\[ V_2[f](x) := \sum_{k\ge 1}\ \sum_{a\ \mathrm{odd},\ 1\le a<2^{k}} \left(\frac{1}{a+2^{k}x}\right)^{2} f\!\left(\frac{1}{a+2^{k}x}\right), \]
defined by the normalization condition ∫₀¹ f₂(t) dt = 1.

3
The Average Cost of the Binary Euclidean Algorithm
The study of the other two parameters of the binary continued fraction involves a more general operator,
\[ V_{w,s}[f](x) := \sum_{h\in L}\frac{w^{c(h)}}{D[h](x)^{s}}\; f\circ h(x), \tag{39} \]
where each term involves a “cost” c(h) associated with the LFT h. As in (17), the sum is taken over the set L of the LFT's defined in (4), of the form h(x) = 1/(a + 2^k x), and D[h] is the denominator of h, defined in (18). We shall consider two different costs,
\[ c^{(1)}(h) := p(a) = \text{the number of 1's in the binary expansion of } a, \tag{40} \]
or
\[ c^{(2)}(h) := k = \text{the exponent of 2 in the numerator of the continued fraction}, \tag{41} \]
and thus two different operators V^{(i)}_{w,s}. In both cases, the “multiplicative” property of D gives the expression of the ℓ-th iterate of V_{w,s} as a sum over the set L^ℓ of the LFT's of height ℓ. This expression now involves the cost c(h) of an LFT h := h₁ ∘ h₂ ∘ … ∘ h_ℓ of L^ℓ, defined as the sum of the costs of each factor h_i,
\[ V_{w,s}^{\ell}[f](x) := \sum_{h\in L^{\ell}}\frac{w^{c(h)}}{D[h](x)^{s}}\; f\circ h(x). \tag{42} \]
This expression is completely analogous to (19) and we can now follow the same principles as in Proposition 1, where the operator wVs is replaced by the generalized operator V_{w,s}. In particular, the method involves the quasi-inverse (I − V_{w,s})^{-1} and its derivative with respect to w,
\[ \frac{d}{dw}(I-V_{w,s})^{-1} = (I-V_{w,s})^{-1}\circ \frac{d}{dw}V_{w,s}\circ (I-V_{w,s})^{-1}, \]
at w = 1. Furthermore, we remark that V_{1,s} equals V_s, and, for each cost c^{(i)}, we use the notation
\[ V_s^{(i)} := \frac{d}{dw}V^{(i)}_{w,s}\Big|_{w=1}. \]
Finally we obtain:
Proposition 7. The average values of the two last parameters of the algorithm on Ω̃_N are ratios where the numerators and the denominators involve the partial sums of the Dirichlet series F(s) and G^{(i)}(s), defined from the quasi-inverse Λ_s := (I − V_s)^{-1},
\[ F(s) := \Lambda_s[1](1), \qquad G^{(i)}(s) := \Lambda_s\circ V_s^{(i)}\circ \Lambda_s[1](1). \]
The operator V_s^{(i)} depends on the parameter. For the parameter “number of iterations”, it is defined by
\[ V_s^{(1)}[f](x) = \sum_{k\ge 1}\ \sum_{a\ \mathrm{odd},\ 1\le a<2^{k}} \frac{p(a)}{(a+2^{k}x)^{s}}\; f\!\left(\frac{1}{a+2^{k}x}\right), \]
where p(a) denotes the number of 1's in the binary expansion of a. For the parameter “total sum of exponents in the numerators”, it is defined by
\[ V_s^{(2)}[f](x) = \sum_{k\ge 1}\ \sum_{a\ \mathrm{odd},\ 1\le a<2^{k}} \frac{k}{(a+2^{k}x)^{s}}\; f\!\left(\frac{1}{a+2^{k}x}\right). \]
In the same vein, the average values of the two last parameters of the algorithm on Ω_N are ratios where the numerators and the denominators involve the partial sums of the Dirichlet series ζ̃(s)F(s) and ζ̃(s)G^{(i)}(s), where ζ̃(s) is the Riemann function relative to odd numbers.
So, the asymptotic evaluations (for N → ∞) of the average costs are possible since the Tauberian theorem can be applied to the previous Dirichlet series:
Proposition 8. The conditions of the Tauberian Theorem hold for G^{(1)}(s) and G^{(2)}(s) with σ = 2 and γ = 1.
Proof. See the extended paper [Va6].
We thus deduce our second result:
Theorem 2. The average number K_N or K̃_N of total iterations of the Binary Euclidean Algorithm on the set Ω_N := {(u, v); u, v odd, 0 < u ≤ v ≤ N}, or on the set Ω̃_N := {(u, v); u, v odd, 0 < u ≤ v ≤ N, gcd(u, v) = 1}, satisfies
\[ \tilde K_N \sim K_N \sim \frac{2}{\pi^{2} f_2(1)}\left[\sum_{a\ \mathrm{odd},\ a\ge 1}\frac{1}{2^{k_a}}\,F_2\!\left(\frac{1}{a}\right)\right]\log N, \]
where f₂ is the fixed point of the operator V₂,
\[ V_2[f](x) := \sum_{k\ge 1}\ \sum_{a\ \mathrm{odd},\ 1\le a<2^{k}} \left(\frac{1}{a+2^{k}x}\right)^{2} f\!\left(\frac{1}{a+2^{k}x}\right), \]
such that F₂(x) := ∫₀ˣ f₂(t) dt satisfies F₂(1) = 1. Here, k_a is the integer part of log₂ a.
The average number P_N or P̃_N of the sum of exponents of 2 used in the numerators of the binary continued fraction expansions on the set Ω_N or on the set Ω̃_N satisfies
\[ \tilde P_N \sim P_N \sim \frac{2}{\pi^{2} f_2(1)}\left[-1 + 2\sum_{a\ \mathrm{odd},\ a\ge 1}\frac{1}{2^{k_a}}\,F_2\!\left(\frac{1}{a}\right)\right]\log N. \]
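As a quick empirical illustration of Theorems 1 and 2 (an added experiment, not taken from the paper), one can run the binary Euclidean algorithm on random odd pairs and record the number of exchanges, the number of subtract-and-shift steps and the total power of 2 removed; all three averages grow roughly linearly in log N. The identification of "subtract-and-shift steps" with the parameter counted by K_N is made here only heuristically, and the sample sizes are arbitrary.

import math, random

def binary_euclid_stats(u, v):
    """Binary Euclidean algorithm on odd 0 < u <= v; returns the number of
    exchanges, of subtract-and-shift steps, and the sum of 2-exponents."""
    exchanges = steps = exponents = 0
    while u != v:
        w = v - u                          # even, since u and v are odd
        k = (w & -w).bit_length() - 1      # 2-adic valuation of w
        w >>= k
        steps += 1
        exponents += k
        if w < u:
            u, v = w, u                    # an exchange
            exchanges += 1
        else:
            v = w
    return exchanges, steps, exponents

random.seed(0)
for N in (10 ** 3, 10 ** 4, 10 ** 5):
    totals, trials = [0, 0, 0], 2000
    for _ in range(trials):
        v = random.randrange(3, N + 1, 2)
        u = random.randrange(1, v + 1, 2)
        totals = [t + x for t, x in zip(totals, binary_euclid_stats(u, v))]
    print(N, [round(t / trials / math.log(N), 3) for t in totals])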
4
Relations with Other Analyses and Numerical Computations
There now exist three different expressions for the asymptotics of the average number of iterations.
(i) In this work, we have obtained a proven expression that involves the fixed point f₂ of the operator V₂ and its integral F₂, defined by F₂(x) := ∫₀ˣ f₂(t) dt, with the normalization condition F₂(1) = 1:
\[ K_N \sim A_2\log N \quad\text{with}\quad A_2 = \frac{2}{\pi^{2} f_2(1)}\left[\sum_{a\ \mathrm{odd}}\frac{1}{2^{k_a}}\,F_2\!\left(\frac{1}{a}\right)\right]. \tag{43} \]
(ii) We recall that Brent [Br1] analyses this parameter when using the operator B₂ defined in (12). He obtains a conjectural–heuristic expression [Br3] that involves the fixed point g₂ of the operator B₂, with the normalization condition ∫₀¹ g₂(t) dt = 1:
\[ K_N \sim \frac{1}{M}\log N \quad\text{with}\quad M = \log 2 - \frac{1}{2}\int_0^1 \log(1-x)\,g_2(x)\,dx. \tag{44} \]
(iii) On the other hand, we proved in [Va5], without the heuristic hypothesis, but under a “spectral” conjecture on Bs analogous to our Corollary 1 for Vs, that
\[ K_N \sim B\log N \quad\text{with}\quad B = \frac{4}{\pi^{2} g_2(1)}. \tag{45} \]
We remove the heuristic hypothesis when using Tauberian Theorems which lead to a proven transfer from the continuous model (the reals) to the discrete one (the rationals). We did not succeed in proving the equality of the three constants A2 , (1/M ), B, but Brent [Br3] made extensive computations that show the numerical concordance of the last two estimates (44) and (45) to 44 decimal places. The common numerical value is 1.0185012157614367170.....
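All three constants hinge on values of a fixed point of a transfer operator. As a rough numerical sketch (an added illustration under simplifying assumptions: a plain grid discretization with linear interpolation rather than the Taylor-polynomial truncation outlined in the next paragraph, and arbitrary truncation bounds), one can power-iterate the operator V₂ of Theorem 1 on ]0, 1] and read off an approximation of f₂(1), hence of the constant 2/(π² f₂(1)) governing the average number of exchanges.

import numpy as np

def V2(f_vals, xs, kmax=14):
    """One application of the operator V2 to a function given by its values
    f_vals on the grid xs; the double sum is truncated at k <= kmax."""
    f = lambda t: np.interp(t, xs, f_vals)
    out = np.zeros_like(xs)
    for k in range(1, kmax + 1):
        for a in range(1, 2 ** k, 2):      # a odd, 1 <= a < 2^k
            d = a + (2.0 ** k) * xs
            out += f(1.0 / d) / d ** 2
    return out

xs = np.linspace(1e-4, 1.0, 600)           # small sizes, to keep the run short
f = np.ones_like(xs)
for _ in range(25):
    f = V2(f, xs)
    f /= np.trapz(f, xs)                   # normalization of Theorem 1
print("f2(1) ~", f[-1], "  2/(pi^2 f2(1)) ~", 2 / (np.pi ** 2 * f[-1]))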
In the same paper, Brent [Br3] exhibits a relation between the three operators Vs, Bs and Us defined in (9) and (14),
\[ (V_s - I)\, U_s = V_s\,(B_s - I). \]
However, this does not seem sufficient to obtain the equality between the constant A₂ of (43) and the constant B of (45).
At present, we have not yet computed numerically the dominant eigenfunction of the operator V := V₂, but the principles of the (future) computation should be very similar to those used in [DFV] or [Va3]. The idea is to approximate the operator V by a sequence of truncated operators acting on finite-dimensional subspaces generated by Taylor polynomials of functions at the point z = 1, together with the log function. These truncated operators are represented by finite matrices, and their spectral characteristics may be expected to provide reasonably good approximations to the corresponding spectral characteristics of V.
Acknowledgements. I wish to thank Hervé Daudé, Pierre Ducos, Henri Laville, Pascal Porée, and Luc Vallée for our discussions of 1992 which began this work. Many thanks to Philippe Flajolet for his help with Mellin transforms. I would like to thank Richard Brent for his interest, for six months of regular e-mail, and for his extensive computations verifying my first conjecture. Finally, I wish to thank Don Knuth for his encouragement.
References
[Br1] Richard P. Brent. Analysis of the binary Euclidean algorithm, in: Algorithms and Complexity, New Directions and Recent Results, ed. J. F. Traub, Academic Press, 1976, pp. 321–355.
[Br2] Richard P. Brent. Unpublished, cf. [Kn, 4.5.2].
[Br3] Richard P. Brent. Further analysis of the binary Euclidean algorithm, manuscript, Feb. 1998, submitted to FUN'98.
[DFV] Hervé Daudé, Philippe Flajolet, and Brigitte Vallée. An analysis of the Gaussian algorithm for lattice reduction, ANTS 1994, Lecture Notes in Computer Science 877, pp. 144–158. Extended version in Combinatorics, Probability and Computing 6 (1997), pp. 397–433.
[De] Hubert Delange. Généralisation du Théorème d'Ikehara, Ann. Sci. ENS 71 (1954), pp. 213–242.
[Di] John D. Dixon. The number of steps in the Euclidean algorithm, Journal of Number Theory 2 (1970), pp. 414–422.
[Fa] Christian Faivre. Distribution of Lévy constants for quadratic numbers, Acta Arithmetica LXI.1 (1992), pp. 13–34.
[Fl] Philippe Flajolet. Analytic analysis of algorithms, in: Proceedings of the 19th International Colloquium "Automata, Languages and Programming", Vienna, July 1992, W. Kuich, ed., Lecture Notes in Computer Science 623, pp. 186–210.
[FS] Philippe Flajolet and Robert Sedgewick. Analytic Combinatorics, book in preparation (1999); see also INRIA Research Reports 1888, 2026, 2376, 2956.
[FV] Philippe Flajolet and Brigitte Vallée. Continued fraction algorithms, functional operators and structure constants, Theoretical Computer Science 194 (1998), pp. 1–34.
[Gr1] Alexandre Grothendieck. Produits tensoriels topologiques et espaces nucléaires, Mem. Amer. Math. Soc. 16 (1955).
[Gr2] Alexandre Grothendieck. La théorie de Fredholm, Bull. Soc. Math. France 84, pp. 319–384.
[Hei] H. Heilbronn. On the average length of a class of continued fractions, in: Number Theory and Analysis, ed. P. Turán, New York, Plenum, 1969, pp. 87–96.
[He1] Doug Hensley. The distribution of badly approximable rationals and continuants with bounded digits II, Journal of Number Theory 34 (1990), pp. 293–334.
[He2] Doug Hensley. The Hausdorff dimensions of some continued fraction Cantor sets, Journal of Number Theory 33 (1989), pp. 182–198.
[He3] Doug Hensley. Continued fraction Cantor sets, Hausdorff dimension, and functional analysis, Journal of Number Theory 40 (1992), pp. 336–358.
[He4] Doug Hensley. A polynomial time algorithm for the Hausdorff dimension of a continued fraction Cantor set, Journal of Number Theory 58, No. 1 (May 1996), pp. 9–45.
[Kn] D. E. Knuth. The Art of Computer Programming, Vol. 2, third edition (1997), Sec. 4.5.2.
[Kr] M. Krasnoselsky. Positive Solutions of Operator Equations, Chap. 2, P. Noordhoff, Groningen (1964).
[Ma1] Dieter H. Mayer. Continued fractions and related transformations, in: Ergodic Theory, Symbolic Dynamics and Hyperbolic Spaces, T. Bedford, M. Keane and C. Series, eds., Oxford University Press, 1991, pp. 175–222.
[Ma2] Dieter H. Mayer. Spectral properties of certain composition operators arising in statistical mechanics, Commun. Math. Phys. 68 (1979), pp. 1–8.
[Po] Mark Pollicott. A complex Ruelle–Perron–Frobenius theorem and two counterexamples, Ergod. Th. and Dynam. Sys. 4 (1984), pp. 135–146.
[Ru] David Ruelle. Thermodynamic Formalism, Addison-Wesley (1978).
[Sc] H. Schwartz. Composition operators in H^p, Ph.D. Thesis, Univ. of Toledo.
[ST] Joel Shapiro and P. D. Taylor. Compact, nuclear, and Hilbert–Schmidt composition operators on H^2, Indiana Univ. Math. J. 23 (1973), pp. 471–496.
[Sh1] Joel Shapiro. Composition Operators and Classical Function Theory, Universitext: Tracts in Mathematics, Springer-Verlag, 1993.
[Sh2] Joel Shapiro. Compact composition operators on spaces of boundary regular holomorphic functions, Proceedings of the AMS 100 (1987), pp. 49–57.
[Va1] Brigitte Vallée. Opérateurs de Ruelle–Mayer généralisés et analyse des algorithmes de Gauss et d'Euclide, Acta Arithmetica LXXXI.2 (1997), pp. 101–144.
[Va2] Brigitte Vallée. Algorithms for computing signs of 2×2 determinants: dynamics and average-case algorithms, Proceedings of the Annual European Symposium on Algorithms, ESA'97, pp. 486–499, LNCS 1284, Springer-Verlag.
[Va3] Brigitte Vallée. Fractions continues à contraintes périodiques, Les Cahiers du GREYC 1997, Université de Caen; to appear in Journal of Number Theory.
[Va4] Brigitte Vallée. Dynamical systems and average-case analysis of general tries, Les Cahiers du GREYC 1997, Université de Caen; also in Proceedings of RALCOM'97, Santorini Island, October 1997.
[Va5] Brigitte Vallée. Unpublished, cf. [Kn, 4.5.2].
[Va6] Brigitte Vallée. Dynamics of the binary Euclidean algorithm: functional analysis and operators, Les Cahiers du GREYC, Université de Caen (1998); to appear in Algorithmica (1999).
Cyclotomy Primality Proving – Recent Developments
Preda Mihăilescu
FingerPIN AG & ETH, Institut für wissenschaftliches Rechnen
[email protected],
[email protected]
Abstract. Primality proving by cyclotomy is an extension of the Jacobi sum primality test, initially proposed by Adleman, Rumely and Pomerance [3] and implemented by H. Cohen and A. Lenstra [7]. In his presentation of the algorithm of Adleman, Rumely and Pomerance at the Bourbaki Seminar 1981 [14], H. W. Lenstra Jr. proposed under the name of “Galois theory test” the idea to combine classical Lucas – Lehmer tests with the Jacobi sum test. This idea was first studied and implemented by Bosma and van der Hulst in their thesis [6]. In our recently completed thesis [19], we considered the topic anew, from a slightly changed perspective and made an implementation which allowed establishing new general primality testing records. In this paper we shall give an overview of cyclotomy from the perspective of the recent research and implementation. We also discuss the drawbacks of the algorithm – the overpolynomial run time and lack of certificates – and mention some open problems which may lead to future improvements.
1
Introduction
The topic of primality proving is, given a positive odd integer n > 1, to give an answer to the question “is n a prime?”. The answer may be “yes” or “no”, and must be backed up by a proof of the statement – the expected answer is “yes”, while “no” may result in the case in which the proving algorithm discovers the compositeness of a candidate having passed several pseudo-prime tests. Some early and still widely used primality proving algorithms rely upon the following:
Lemma 1 (Lucas–Pocklington). Let n > 1 be an integer, let p be a prime with p | (n − 1), and suppose there is an element α ∈ (ℤ/nℤ)* such that
\[ \alpha^{n-1} \equiv 1 \pmod{n}, \qquad \gcd\bigl(\alpha^{(n-1)/p} - 1,\ n\bigr) = 1. \tag{1} \]
Then, all primes r | n are of the shape r = λp + 1. The proof is simple and it uses reduction modulo r. The tests using this lemma and its generalization to algebraic extensions of small degree k = 2, 3, 4, 6, 8 are known as Lucas – Lehmer tests and have been largely investigated ([5], J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 95–110, 1998. c Springer-Verlag Berlin Heidelberg 1998
96
Preda Mih˘ ailescu
[18], [2], [28], [30], etc.). These proofs rely upon finding factors p of n − 1 or Φk (n), k = 2, 3, 4, 6, 8, Φk being the k−th cyclotomic polynomial. They do thus not apply to any test number n, but only to such ones with a large factored part of Φk (n). A Lucas – Lehmer test has O(log(n)3 ) runtime. The first fast test allowing primality proofs for any test number – general primality proof method – was the early version of the Jacobi sum test [3]. Basically, this test implicitly applies higher reciprocity rules by the use of Gauss and Jacobi sums and succeeds: I. to find deterministically a large enough factored part of Φt (n), for t = f (n) = O log(n)c·log log log(n) , due to a theorem of analytical number theory of Prachar and Pomerance([3], [14]). II. perform tests similar to Lucas – Lehmer in extensions of degrees l|t, while the information of the different tests can be accumulated. Knowledge of precomputed factors of Φt (n) does not improve the asymptotic behavior of the Jacobi sum test. The test is polynomial in the above defined function f (n) and thus superpolynomial in log(n). A new proving algorithm based upon a variant of the Pocklington lemma in elliptic groups was proposed in 1986 by Kilian and Goldwasser [11], repeatedly improved and implemented by Atkin and Morain [4]. The ECPP primality proving program has been available on the web [10] for several years and can prove numbers of 2000 decimal digits and more. n−1 Note that under the conditions of lemma 1, the element α p is a primitive p−th root of unity modulo n. This remark motivates the following theorem of Lenstra ( [14], (8.1) ), which is at the origin of cyclotomy primality tests: Theorem 1. Let s ∈ ZZ >0 . Let A be a ring containing ZZ/(n · ZZ) as a subring. Suppose that there exists α ∈ A satisfying the following conditions: αs = 1, αs/q − 1 ∈ A∗ , for every prime q|s, Ψα (X) =
t−1
(2)
i X − αn ∈ ZZ/(n · ZZ)[X], for some t ∈ ZZ >0
i=0
Then every divisor r of n is congruent to a power of n modulo s. Further developments of this theorem ([15], [6]) allowed combining generalized Lucas – Lehmer tests using precomputed factors of Φt (n) with the Jacobi sum test; the first implementation of cyclotomy was made in [6]. Our recent implementation [19], [20] of cyclotomy brings some technical improvements and allows performance primality proofs, so that any number of 2000 decimal digits may be routinely proved in less than 2 days on a DEC Alpha workstation. We shall restrict this paper to the basic theory used for proving
Cyclotomy Primality Proving – Recent Developments
97
the consistency of the implementation. Given the limited space, we shall provide short comments instead of proofs of the propositions. A reference to the labels in the dissertation [19], which may be downloaded from [8], will be also provided for simplifying the search for a complete proof. The theoretical presentation is followed by a statement of the implemented algorithm and a discussion of the implementation results. The question of certificates and trust is treated extensively in the last section of this paper.
2
Cyclotomy of Rings
Throughout this paper we shall consider an integer n > 2 – a prime candidate – and let N = ZZ/(n · ZZ). As mentioned in the introduction, cyclotomy primality proving involves Lucas – Lehmer like tests in extensions of ZZ/(n · ZZ), where n is the prime candidate. These are thus primarily rings (until the primality proof is fulfilled and the rings are shown to be in fact fields). We need therefore to study roots of unity, Gauss and Jacobi sums and cyclotomic extensions first in this general setting. Definition 1. Let A be a ring. Let k > 1 be an integer such that k ·1 ∈ A∗ , with A∗ the group of the units in A; we shall call an element ζ ∈ A a k−th primitive root of unity if Φk (ζ) = 0, where Φk is the k−th cyclotomic polynomial. The Galois structure of finite rings is generalized to rings by: Definition 2. If G is a group, a ring R ⊃ N is called Galois extension of N with group G, if, given t = #G, there are t elements z1 , z2 , . . . , zt ∈ R such that: (i) R =
t
N zi and the map N → R is injective
i=1
(ii) det(σ(zi ))σ∈G ∈ N ∗ Note that (ii) implies that the zi must be independent. In primality proving we are interested in Galois extensions of N which contain roots of unity in the sense defined above. These structures should be possibly close to finite fields; in particular, we require that the action of the Galois group is similar to the one of the Frobenius in a field. This leads to the following definition given by Lenstra [[15], (4.1)]: Definition 3. An s−th cyclotomic extension of N is an N -algebra R together with an automorphism σ of R and an element ζ ∈ R such that: (i) R is Galois over N with group generated by σ. (ii) Φs (ζ) = 0 and σζ = ζ n . (iii) σ t = idR . s is the order and t is the degree of the extension R. The next theorem gives several of the many equivalent properties of cyclotomic extensions, which are most useful for primality proving:
98
Preda Mih˘ ailescu
Theorem 2 ([19]:(3.35), (3.36) and (3.41)). Let A be the ring of integers t−1 i in ILn = Q[ξs ]
and Ψ0 (x) = i=1 (x− ξsn ) ∈ A[x], ξs being a primitive s−th root of unity in C. The following statements are equivalent: (I) (II) (III) (IV)
an s−th cyclotomic extension of N exists r|n −→ r ∈< n mod s > there is a surjective ring homomorphism τ0 : A → N there is a polynomial Ψ (x) ∈ N [x] of degree t with: (i) Ψ (x) | Φs (x) i
(ii) if ζ = x + (Ψ (x)) ∈ N [x]/(Ψ (x)), then Ψ (ζ n ) = 0, for i = 1, 2, . . . , t (V) With ψa , the Artin symbol, i.e. the automorphism of Cs raising ξs to the power a, ψr = ψni , ∀ r|n, for some i = i(r), 1 ≤ i ≤ t. Remark 2. Note that (II) is an excellent primality criterion, while (IV) is a reformulation of Lenstra’s theorem 1: the factorization of cyclotomical polynomials over a cyclotomic extension is built up from factors of the same degree as the factors arising modulo a prime p. The property (V) allows more insight in (II): existence of cyclotomic extensions modulo composite n has consequences upon the structure of the Artin group. In some cases the existence of cyclotomic extensions is known for two coprime orders s1 , s2 and we want to know under which conditions the existence of a s1 · s2 −th extension follows. This may, for instance, be the case when one extension is proved by Lucas – Lehmer methods and the other by Jacobi sums. The following two theorems give an answer to this question: Theorem 3 ([19]:(3.47)). Let s1 and s2 be coprime such that s = s1 · s2 and suppose si −th cyclotomic extensions of N are known to exist for i = 1, 2. For χ a character modulo s, let χ = (χ1 , χ2 ) be the splitting of χ according to s1 and s2 and H(n)⊥ be the set of all characters χ modulo s, such that χ(n) = 1. Then following condition is equivalent to (I) - (V) (VI) for all r|n and χ ∈ H(n)⊥ with χ1 (n) = 1, χ(r) = 1 holds. Theorem 4 ([19]:(3.64)). Let R = N [α] be an Abelian extension with group G and (r, σ, ζ) and (r , σ , ζ ) µ−th and µ −th cyclotomic extensions, respectively. If R = r ∩ r , then the composite of r and r is an µ · µ −th cyclotomic extension. Remark 3. The first theorem uses characters and is adapted for Jacobi sum proofs. Abelian extensions are Galois extensions which may be embedded as subextensions of cyclotomic extensions; they may thus be regarded as the image of the integers in a complex cyclotomic subfield under an extension of the maps τ0 defined in theorem 2 (III). Here one must show that the automorphisms of the two extensions r, r’ acts identically on the intersection. This proposition is used in conjunction with Lucas – Lehmer methods.
Cyclotomy Primality Proving – Recent Developments
99
Knowing for which powers pk of a given prime cyclotomic extensions of N exist is a relevant practical question. It turns out that there exists a minimal exponent ks > 0 such that extensions of order pk exist for all k > ks , provided an extension of order pks exists. The following concept of saturation describes this phenomenon. Definition 4. Let (R, σ, ζ) be a pk −th cyclotomic extension of N , for a certain prime power pk , k ≥ 1. If k is such that: v2 (n2 − 1) if p = 2 and n = −1 mod 4 k ≥ ks (p) = (3) p−1 vp n − 1 otherwise Then the ring R is called saturated and ks is called the saturation exponent. If k < ks , R is sub saturated. An important property of saturated p−th extensions (R, σ, ζp ) is that p−th nt −1
roots may be taken form elements α ∈ R with α p = 1. Properties of Gauss and Jacobi sums are known over finite fields and have been used for the Jacobi sum test ([13], [7]). Let r be a prime, f, m positive integers with f |ϕ(m) and χ : ZZ/(m · ZZ)∗ −→ IL a character of order f and conductor m, with IL some extension of IFr containing primitive m−th and f −th roots of unity. If τ (χ) ∈ IL is a Gauss Sum defined with respect to some primitive m−th root of unity, then: τ (χ)r = χ−r (r) · τ (χr ).
(4)
By iteration of (4), we get: τ (χ)r
k
k
k
= χ−k·r (r) · τ (χr ), for k ≥ 1 .
(5)
The work with powers of Gauss Sums is simplified by using multiple Jacobi sums: J1 = 1 Jν+1 = Jν · j(χ, χν ), for ν = 1, 2, . . . , f − 2 Jf = χ(−1) · m · Jf −1
(6)
It is easy to verify by induction that: Jν =
τ (χ)ν , τ (χν )
for ν = 1, 2, . . . , f , where τ (χf ) = 1.
(7)
Remark 4. If (R, σ, ζ) is a saturated p−th extension of N , q a prime with pk |(q − 1) and ξ is an q−th primitive root of unity over N , χ a character of order pk and conductor q such that χ(n) = 1, then a converse of (4) holds in the following sense: If τ (χ)n = χ(n) · τ (χn ), then τ (χ) ∈ R.
100
3
Preda Mih˘ ailescu
The Lucas - Lehmer Method
In this section we shall give the generalization of classical Lucas – Lehmer tests along the lines of theorem 1 and a method for constructing small cyclotomic extensions. Both can be solved by one and the same construction – hence, the Lucas – Lehmer method. Small cyclotomic extensions are useful for performing the Jacobi sum test. The following theorem describes the Lucas – Lehmer method. Theorem 5 ([19]:(3.42)). Let s, u > 0 be positive – not necessarily different – integers with
< n mod s >= < n mod u >= t, and assume t verifies (t, s) ≤ 2 and if (t, s) = 2 then n = 3 mod 4. Let IL be an Abelian extension of Q with conductor u such that Gal (IL/Q) = < n mod u >. Suppose that ∃ α ∈ O(IL) such that ψn (α) = αn mod n · O(IL).
(8)
For all primes p|s, let kp be the p−th saturation exponent. Put q = pkp and t−1
ci (p) · ni . Let R = O(IL)/n · O(IL) and suppose (nt − 1)/p = i=0
(β(p) − 1) ∈ R∗ , where β(p) =
t−1
(ψni (αci (p) )) mod n · O(IL).
(9)
i=0
Let β = p|s β(p) mod n · O(IL) and σ be the automorphism induced by ψn in R. Then (R, σ, β) is a saturated s−th cyclotomic extension of N . Remark 5. The above theorem builds saturated p−th roots of unity β(p) for all primes p|s. Their product is a saturated s −th root of unity. Note that the order s of β may well be a multiple of s, but s is built up of the same primes as s, each raised to their saturation power. In practice, one will in general have s = pk , a prime power, and the extensions will be rather small. The combination of different prime power order roots will require combining cyclotomic extensions, using one of theorems 3 or 4. The distinction between u and s, allows us to choose a minimal u for a given extension degree t. When s = pk is a prime power, u = p is the simple choice for odd p. For p = 2, one distinguishes the case n = 1 mod 4 in which a saturated extension has degree t = 1, from n = 3 mod 4, in which t = 2. The condition (p, t) ≤ 2 ascertains that no oversaturated extension is required. =1 If µ(t) = 0 (with µ, the M¨ obius function), choosing u such that t, ϕ(u) t insures the existence of the field IL. The search of multiple roots in extensions of fixed degree is made efficient by a binary search algorithm which only takes one large exponentiation for finding all the roots ([19], §6.4).
Cyclotomy Primality Proving – Recent Developments
4
101
The Jacobi Sum Test t
Let n be as above and s a positive integer, t = ords (n) be such that (s, n s−1 ) = 1. Let Q = {q|s : q prime } and P = {(pk , q − 1) : q ∈ Q, pk (q − 1), p prime}. For all (pk , q) ∈ P let χpk ,q be some primitive character of conductor q and order pk . In [7], these characters are defined using complex roots of unity. In [6], they are defined over small cyclotomic extensions. In [16], a method is given for combining the final trial division √with additional checks resulting in a relaxation of the condition on s to s > 3 n. This method does not improve the overall runtime of the final step for the currently reachable domain of magnitude (up to 10000 decimal digits). It will not be considered in more detail. The condition checked by the initial form of the Jacobi sum test [7], [6] is the following: ∀(pk , q) ∈ P, (τ (χpk ,q ))n = τ (χnpk ,q ),
(10)
for some ∈< ζpk >, where ζpk is the primitive pk −th root of unity used in the definition of the character χpk ,q . If these conditions hold then r ∈< n mod s > ∀r|n, thus an s−th cyclotomic extension of N exists according to theorem 2, (II). Characters of pairwise coprime conductors q ∈ Q may be joined to characters of conductor s and the set {χpk ,q : (pk , q) ∈ P } generates the set ZZ/(s · ZZ)⊥ of all primitive characters modulo s. Remark 6. It may be shown that s−th cyclotomic extensions exist if condition (10) is requested “only” for a generating set of Hs⊥ (n) = {χ ∈ ZZ/(n · ZZ)⊥ : χ(n) = 1}. One shall assume from now on that P is a set of generators of Hs⊥ (n), rather then ZZ/(s · ZZ)⊥ . This is not an improvement of practical relevance. It has, though, an important theoretical consequence. Using remark 4, one finds that the Gauss periods
ζsτ σ , ∀τ ∈ (ZZ/(s · ZZ)∗ )/ < n mod s >, ητ = σ∈ with ζs some primitive s−th root of unity, may be computed in N if (10) holds for a generating set of H ⊥ (n). With these values and using the Newton formulae [27], a complete factorization of Φs (x) over N can be computed deterministically ([19], pp. 75-77). This reveals the fact that Jacobi sums and Lucas – Lehmer are in fact two sides of the same medal, an information which was not obvious in the initial form of [3]. In fact, Lucas – Lehmer seeks primitive roots of unity, and implicitly their minimal polynomials, which are factors of the cyclotomic polynomial. The Jacobi sum test proves that a direct factorization of the cyclotomic polynomial may be deterministically achieved modulo n, thus implicitly proving also the existence of primitive roots of unity. Both imply the possibility of constructing the same cyclotomic extension, construction which need not be completed for primality proving.
102
Preda Mih˘ ailescu
The expressions =
(τ (χpk ,q ))n τ (χnk )
are expected to be roots of unity of co-
p ,q
prime orders, for coprime pk . This suggests that the product of several Gauss Sums of coprime orders may be tested simultaneously: we name products Θ = k (p ,q)∈P τ (χpk ,q ) with relatively coprime orders in the product amalgams. After introducing multiple Jacobi sums (7) for simpler expressing
(τ (χpk ,q ))n τ (χnk ) ,
one
p ,q
has the following: Theorem 6 ([19]: 4.25 ). Let f = ji=1 pki i with pi primes, f = ji=1 pi , t = ki # < n mod f > and (R , σ, ζ) an f -th cyclotomic extension of N , ζi = ζ f /pi , i = 1, 2, . . . , j; thus, ζi are primitive roots of order pi ki . Let χi : ZZ/(qi · ZZ) → < ζi > be characters of order pi ki and conductor qi , with qi not necessarily different primes. For i = 1, 2, . . . , j, let: αi = Jpi ki (χi ) βi = Jνi (χi ),
and
(11)
where νi = n mod pi
ki
.
Let n = f · l + ν with 0 ≤ ν < f and ν = pi ki · λi + νi , i = 1, 2, . . . , j. Define α and β by α=
j i=1
f /pi ki
αi
∈R
and
β=
j i=1
αλi i · βi ∈ R
(12)
Suppose there is an ∈ < ζf > such that αl · β = −n
(13)
Then χi (r) = χi (n)lp (r) ∀r | n and i = 1, 2, . . . , j. Furthermore =
j
i=1
χi (n).
Remark 7. An idea similar to amalgams was used in [6], being referred to as “combining Jacobi sums”. When building amalgams with the aim of reducing computation time, it is important that the choice of the orders of characters joined in an amalgam not only be coprime, but also that the degrees of the extensions in which the respective primitive roots of unity are defined, all divide a maximal degree. If A = {(pki i , qi ) : i = 1, 2, . . . , ω(t)} ⊂ P , where ω(t) is the number of distinct prime divisors of t and the exponents ki ≥ 0, then ∃ j, 0 ≤ j ≤ ω(t), such that kj > 0 and ordpki (n)|ordpkj (n) i
j
∀i.
We shall write deg(A) = j and deg(χpk ,q ) = deg(pk ) = ordpk (n), thus determining the extension degree in which amalgams or single characters are tested.
Cyclotomy Primality Proving – Recent Developments
5
103
Algorithm
For proving primality using cyclotomy, a first step definitely consists in performing several probabilistic primality tests, which can eliminate obvious composites. Primality proving is also a sequence of more involved probable prime tests which yield additionally the possibility of combining information from single tests and eventually eliminate any doubts about a candidates’ primality: the check of conditions such as (10) or (13) may be regarded as pseudoprime tests, but they are not independent. √ In cyclotomy one starts by the choice of a parameter s > n and t = t ords (n), (t, n s−1 ) = 1. The order t should be possibly small. The next goal is proving existence of an s−th cyclotomic extension of N using Jacobi sums. The set of pairs P defined in the previous section will first be built and the single pairs joined to amalgams in a list L. The list gives indication about the degrees and orders of small cyclotomic extensions R that will act as working extensions: let E be a list of all working extensions. An extension R = O(IL)/(n · O(IL)) (see theorem 5) is given by a minimal polynomial and a matrix describing the Artin symbol ψn . A root of unity and a Jacobi sum are both elements of extensions R ∈ E. These together with the appropriate roots of unity shall next be built. At this point some computing time may be invested in trying to factor Φu (n), for some small values of u = deg(R), R ∈ E. The resulting Lucas factors will be used for Lucas – Lehmer tests, thus decreasing the size of s to s¯: let s = q |Φu (n) q be the product of the Lucas factors found. The parameter √ s¯ used in the Jacobi sum test will accordingly be decreased, so that, s¯ · s > n holds: one replaces s by s¯. The initial work in building working extensions provides the proof according to theorem 4 that an s¯ · s −th extension exists, provided s −th and s¯−th extensions do. Together with decreasing s to s¯, the amalgam lists must be adapted. After theses preparing steps, roots for the Lucas – Lehmer part will (optionally) be sought and property (13) checked for all amalgams in the list. If the roots may be found and the checks are passed without indication that n is composite, the existence of an s · s¯−th cyclotomic extension of N has been proved. The final step consists in a trial division: for all ri = ni mod s, i = 1, 2, . . . t − 1, prove that r does √ not divide n. This stage may be sensibly reduced, if one √ chooses s¯ · s > t n to start with, and only considers the remainders ri <= n. Their expected number is not larger √ than 1, but if n is composite, at least one prime factor should be r = ri < n. Rather than performing √ divisions one has only to will be the rule; a compute the residues ri and check if they are > n, which √ trial division will only be performed for the remainders ri ≤ n. If no factor is found in the trial division, n is proved prime by cyclotomy. Given that most operations in these procedure tend to fail if n is not prime, many fail stops will be built in the algorithm – we shall only distinguish the fundamental ones. We have described the following:
104
Preda Mih˘ ailescu
Algorithm 1.) Choose s >
(* Cyclotomy Primality Proving *) √
t
n with t = ords (n) and (t, n s−1 ) = 1. Set s¯ = s.
2.) Build the list of pairs P defined in §4 and an amalgam list L according to remark 7. 3.) For all degrees in D = {deg(A) : A ∈ L}, build an extension using theorem 5. Let E be the list of all extensions. Prove the existence of their compositum, using theorem 4. 4.) ( optional Lucas – Lehmer step ) 4.1) Seek Lucas factors using ECM for N (u) = Φu (n), where u ∈ D. After completing the trial factorization, 4.2) Build s = q |Φu (n),u∈D q . For all primes q |s seek a primitive q −th root of unity using theorem 5. If for a given q no root is found within a given time bound, set s = s /q and proceed. √ 4.3) Set s¯ so that s¯ · s > n. If s > 1, recompute the list L. 5.) For each amalgam A ∈ L there is an extension R ∈ E of degree deg(A). Check (13) for A in the ring R. Do this for each A ∈ L. If the test fails for some A, go to 8. 6.) Perform the final trial division. If some divisor is found, go to 8. 7.) Declare n proved prime by cyclotomy and stop. 8.) Declare n composite and stop.
6
Implementation Results and Open Problems
The cyclotomy proving algorithm and the elliptic curve one [10] are currently the only general primality proving algorithms. A practical comparison of the two is thus interesting. ECPP is a polynomial time algorithm, but the degree of the run time polynomial is high (approx 6), due to a slow recursive proof. Cyclotomy is different from ECPP in many ways. The run time, depending upon the function f (n) defined in the introduction, is super However, the main stage of the polynomial. algorithm (Jacobi sum tests) is O log(n)4.5 and thus faster than ECPP. The super polynomial aspect involved in the final trial division is not dominant in the domain of magnitude that can currently be proved. In fact, this will be definitely the case for log n < 109 ! As a consequence, cyclotomy behaves much better in practice.
Cyclotomy Primality Proving – Recent Developments
105
While cyclotomy has a well understood economy of the information from different partial tests – thus leading to a provable run time which is close to the average expected value [19] §7, the resources of elliptic curves for primality testing, as reflected by the following table, are far richer and only a part of them are actually used. Feature Cyclotomic groups Elliptic groups Group orders 1 O(nk/2 ) extension degreee Combining tests in fixed degree done ? Combining tests in embeddings done ?
Table 1. Cyclotomy and Elliptic Curve Resources
It is not known, whether results from tests in different elliptic groups of the same extension degree or embeddings of the same curve in a tower of ring extensions may or may not be combined. Tests in degrees different from 1 have not been investigated yet, and it is possible that the larger choice of group orders may offer some useful advantages. It is also conceivable that cyclotomic extension, providing a surrogate for the Frobenius may be of explicit use. One or the other of these ideas, or their combination, may provide the solution to the open problems of finding a polynomial time deterministic primality proof and a faster probabilistic one. Therefor one can hope the future will be more interesting than a mere competition in gaining small percentage improvements of the current implementations. Reducing the cyclotomy test’s superpolynomial behavior depends on the solution of the following two subproblems: ki + 1 and a bound B ≥ pki i , ∀pi |(q − 1), find an JS Given a prime q = i pi algorithm for computing a Jacobi sum of conductor q and order pki i , requiring not more than O(cB · log(q)) binary steps, for some absolute constant c. TD Given a positive composite number s with t = λ(s) = O(f (s)), with f (s) defined in §1 and λ, the Carmichael function, let G ⊂ ZZ/(s · ZZ)∗ be a cyclic subgroup with t elements. Find an algorithm which produces the subset H = {x ∈ G : x < s/t} ⊂ G in O(log(s)) binary operations. The values x representing equivalence classes in G are assumed 0 ≤ x < s. Finding a positive solution to both above problems would result in a polynomial time cyclotomy test. The cyclotomy proving algorithm was implemented in C++ using the arithmetic of LiDIA [17]. This package offers an FFT class for modular polynomial arithmetic by V. Shoup, which improves the performance considerably. Further dedicated arithmetic improvements – using, among others, the Toom – Cook algorithm [12], [26] and improving the base modular exponentiation performance – will probably lead to a speed up factor of roughly 2 − 3. The current imple-
106
Preda Mih˘ ailescu
mentation was already successful in establishing new records in general primality testing, which followed a short lived record with ECPP [22].
nDec. Av. Time Max Time Min Time Mod.Exp Max Ext Ratio Max Digits (sec) (sec) (sec) (sec) Expo: (sec) Deg 20 0.30 1.05 0.15 0.00 0.04 n.a. 3 50 0.61 0.91 0.36 0.01 0.15 60? 2 100 3.97 7.18 1.49 0.16 1.36 25 5 120 5.63 7.38 2.37 0.20 1.99 28 6 140 9.31 16.81 4.61 0.25 2.79 37 5 160 12.63 24.98 7.08 0.32 3.56 40 3 180 17.72 31.38 4.67 0.38 4.67 47 6 200 23.83 41.40 13.70 0.42 4.67 57 6 300 77.59 117.97 45.76 0.54 12.13 144 4 500 730.01 965.97 573.87 3.74 20.87 195 12 800 3900.47 5105.46 2872.56 9.82 20.63 398 8 1000 10680.22 16293.12 5067.33 14.34 28.17 745 6
Table 2. Performance CYCLOPROVE
The previous table gives run time statistics for sizes of primes between 20 and 1000 decimal digits. For each length, we display the average, maximal and minimal times (in sec.), as well as the average time for one modular exponentiation mod n and for an exponentiation in a maximal degree extension. The degree used for this average is written in the last column of the table, while the column before last displays the machine independent ratio: Average test time Average modular exponentiation time The next table compares the machine independent ratio for several implementation of general primality tests.
Digs. Test A Test A Test B Test B Test C Test C Test D Test D Time Ratio Time Ratio Time Ratio Time Ratio 100 120 140 160 180 200
7.05 13.75 23.41 35.67 47.86 68.36
486 678 839 958 1029 1139
108.65 190.83 418.03 671.45 1017.68 1458.87
201 211 310 348 377 388
22.66⊥ 44.72⊥ 74.69⊥ 122.44⊥ n.a. n.a.
112 176 251 364 n.a. n.a.
Table 3. Performance in Comparison
3.97 5.63 9.31 12.63 17.72 23.83
25 28 37 40 47 57
Cyclotomy Primality Proving – Recent Developments
107
The data about ECPP are taken from [23] and have been calculated for 128 to 512 bits, in steps of 64 bits. We shall fit these figures into our decimal digit scale, indicating with , ⊥, whether the original data are for larger or smaller decimal lengths. Test A is the implementation [7] of the Jacobi sum test, on a Cray–1; Test B is the implementation [6] on Sun4, Test C Morain’s elliptic curve test [23]. Finally, Test D is the present implementation. The average time is in seconds. The largest record achieved at the time of submission [21] was the proof 11279 of primality for N = 2 3 +1 , a 3395 decimal digit number. The proof was completed in roughly 6 days of computation on a DEC Alpha 500. This is also a good example of “cooperation” between the Lucas – Lehmer and the Jacobi sum methods. From the Cunningham tables and further personal Cunningham factorizations, about 700 digits of factors of N ± 1 could be deduced: while considerable, this factored part would never reach for performing a classical Lucas – Lehmer proof. The remaining factored part was provided by the Jacobi sum method, while the main stage of the last was definitely shortened by the use of Lucas – Lehmer factors.
7
Certifying Primes and Programs
In the factorization of large integers, the result of a long computation yielding the factors of some input number, may be very easily verified by anyone using a long integer arithmetic software, by multiplying out the factors and comparing to the initial input. Verifying the statement of a primality proving algorithm is not as easy. Algorithms using the Lucas – Pocklington lemma 1 construct by trial and error some groups and roots of unity in those groups. Once these constructed, a verifier needs only to check the conditions (1) or their analogues in the respective groups: the trial and error work is spared, thus leading to a sensible reduction of computing time, compared to the time initially spent in finding the proof. Such recipes for a quick verification of a primality proof were known to exist for Lucas – Lehmer tests, for a long time [24], and were mentioned for the case of the elliptic curve test already by Goldwasser and Kilian [11]. The certificates generated by ECPP may be checked very fast, using less than 1/100 of the time required for generating them; for instance, the certificate generated for the most recent ECPP record [22] in one month of Dec Alpha 400 MHz, took 6 hours to verify [23]. The situation is rather hopeless with cyclotomy. Practically no random trial steps are performed, with the exception of the root search for generating working extensions and eventual Lucas – Lehmer tests, which together never make up for more than 1% of the proving time. Except for the root finding, there is thus no step in the initial proof that might be spared in a verification. A certificate of a cyclotomy proof is valuable for indicating what extensions, what roots of unity and which amalgams to use. But it takes essentially as much time to check as to generate !
108
Preda Mih˘ ailescu
Since certificates are provably not the way for cyclotomy, we shall address the question of trusting a cyclotomy proof at a program level. One of the strongest practical tools defined for verification and integrity checks in the theory of programming are invariants [9]. Luckily, cyclotomy does not lack invariants, and checking some of them is part of the very task of the primality proof. The main invariant checks are (8) and (13). Note that both identities are very unprobable. The first connects the structure of the working extension to the power n exponentiation and the matrix of the Artin symbol. Highly uncorrelated transformations (as program techniques and modules) of one and the same extension elements are compared: these transformations giving one and the same result is a reliable invariant, according to the definition of Dijkstra. A similar statement holds for (13). A further invariant which may be tested using compiler directives is the absolute value of Jacobi sums, based on the known identity τ (χ)·τ −1 (χ) = χ(−1)q, where q is the conductor of the character χ [7]. This is optional, since not part of the proof itself, and is done both at the level of the sums as elements of complex cyclotomic fields as of their embedding in cyclotomic extensions. Increasing debugging levels allow checking intermediate results by another computational resource: this has been intensively done for numbers of less than 500 decimal digits, using the PARI gp calculator. A hand check using the debug levels is also necessary for ascertaining that the amalgams – which are the blueprint of the proving strategy – are consistently built. The connection between the cyclotomy test and factoring cyclotomic polynomials mentioned in remark 6 offers an unexpected additional means of verification. At the cost of intensive additional computations, a factor Ψ (x)|Φs (x) mod n may be actually computed. Using this polynomial, the proof verification becomes very simple, involving basically the verification of theorem 2 (III). It can thus be done using simple long integer arithmetic. Of course, nothing is for free, and the initial remark that from the point of view of information theory, there is (almost) no irrelevant information produced during the cyclotomy proof, still has its consequences. They reside in the fact that the degree t of the polynomials Ψ is overpolynomial O(f (n)) and in the range of millions for numbers with only 1000 decimal digits: this verification is hardly conceivable for numbers with more than 300 − 500 decimal digits, corresponding to t in the range 10000 − 25000.
8
Conclusions
We gave a quick overview of the cyclotomy primality proving basic theoretical background. The method has been implemented and we presented some statistical results and comparisons to other general primality proving implementations. The results show that cyclotomy is currently the most practical and efficient method for general primality proving – despite of its super polynomial asymptotic run time. We also gave an overview of open problems and possible directions for further research. Since building certificates that may be checked faster than
Cyclotomy Primality Proving – Recent Developments
109
the primality proof is shown not to be possible, we discussed some alternatives for the question of trust of the results of an implementation. Acknowledgments. A comprehensive list of those who helped this work reach a good end may be found in [19]. At this point, we express our gratitude to H. W. Lenstra Jr. for his supportive and attentive advice over a long period of time and M. Bronstein for the impulses given for realizing the implementation. After the very first results, it had an encouraging and friendly echo over the internet. I am grateful to Paul Leyland, who was first informed about the new implementation and provided valuable contacts and E. Mayer, who provided some large test examples and does not give up looking further. F. Morain responded enthusiastically from the first announcements on cyclotomy and the growing exchange concerning primality proving is already completing the competition of timings with collaboration and new insights. I thank him for making this possible.
References 1.
2. 3. 4. 5.
6. 7. 8. 9. 10. 11. 12. 13. 14.
L.M.Adleman, H.W.Lenstra, Jr., “Finding irreducible polynomials over finite fields”, Proc. 18-th Ann. ACM Symp. on Theory of Computing (STOC) 1986, pp. 350-355 W.Adams, D.Shanks: “Strong Primality Tests That are not Sufficient”; Math. Comp. vol. 39, Nr. 159 (July 1982), pp. 255-300. L.M. Adleman, C. Pomerance, R.S. Rumely: “On Distinguishing Prime Numbers from Composite Numb ers”, Ann. Math. 117 (1983), pp. 173-206 A.O.L. Atkin, F.Morain: “Elliptic curves and Primality Proving.”, Math. Comp., vol 61 (1993), pp. 29-68. J.Brillhart, D.H.Lehmer, J.L.Selfridge: “New Primality Criteria and Factorization of 2m ± 1”, Math. of Comp., vol. 29, Number 130 (April 1975), pp. 620-647. W.Bosma and M.van der Hulst: “Primality proving with cyclotomy”, Doctoral Thesis, Universiteit van Amsterdam 1990. H.Cohen, H.W.Lenstra Jr.: “Primality Testing and Jacobi sums”, Math. Comp. vol 48 (1984), pp 297-330. http://www.inf.ethz.ch/personal/mihailes, Homepage of Cyclotomy, Preda Mih˘ ailescu. Dijkstra, E.; Scholten, C.: “Predicate calculus and program semantics”, Springer Verlag (1990) http://lix.polytechnique.fr/~morain/Prgms/ecpp.francais.html, Site for downloading the elliptic curve primality test software of F .Morain. S.Goldwasser, J.Kilian: “Almost all primes can be quickly certified”. Proc. 18-th Annual ACM Symp. on Theory of Computing (1986), 316-329. D.E.Knuth: “The art of computer programming”, Vol.2, Semi numerical algorithms, Addison-Wesley, Reading, Mass. second edition, 1981. S.Lang: Algebraic Number Theory, Chapter IV , Addison Wesley Series in Mathematics. H.W.Lenstra Jr.: “Primality Testing Algorithm s (after Adleman, Rumely and Williams)”, Seminaire Bourbaki # 576, Lectures Notes in Mathematics, vol 901, pp 243-258
110 15.
16. 17. 18. 19. 20.
21.
22.
23. 24. 25. 26. 27. 28. 29. 30. 31.
Preda Mih˘ ailescu H.W.Lenstra Jr.: “Galois Theory and Primality Testing”, in “Orders and Their Applications, Lecture Notes in Mathematics, vol 1142, (1985) Springer Verlag H.W.Lenstra Jr.: “Divisors in residue classes”, Math. Comp. vol 48 (1984), pp 331-334. LiDIA Group: ”LiDIA - A library for computational number theory”, TH Darmstadt, Germany, 1996 D.H.Lehmer: “Computer technology applied to the theory of numbers”, MAA Studies in Mathematics. Mih˘ ailescu, P.M. : “Cyclotomy of Rings & Primality Testing”, dissertation 12278, ETH Z¨ urich, 1997. Mih˘ ailescu, P.M., “Advances in Cyclotomy Primality Proving”, EMail to the NMBRTHRY mailing list; available on http://listserv.nodak.edu/archives/nbrthry.html, November 1997 Mih˘ ailescu, P.M., “New Wagstaff Prime Proved”, EMail to the NMBRTHRY mailing list; available on http://listserv.nodak.edu/archives/nbrthry.html, January 1998 Morain, F.: “New Ordinary Primality Proving Record”, EMail to the NMBRTHRY mailing list; available on http://listserv.nodak.edu/archives/nbrthry.html, October 1997 Morain,F.:“Primality Proving Using Elliptic Curves: An Update”, Preprint, to appear in Proceedings ANTS III (1998). Plaisted, D. A.: “Fast verification, testing and generation of large primes”, Theoretical Computer Science, vol 9 (1979), pp. 1-17. H.Riesel: “Prime Numbers and Computer Methods for Factorization”, Birkh¨ auser, 1994 Toom,A.L.: Doklady Akad. Nauk SSSR 150 (1963), 496-498. B.L.van der Waerden: Algebra I, p. 87, Springer A.E Western: “On Lucas and Pepin’s Test for Primeness of Mersenne Numbers”, Journal of the London Math. Society, vol 7/I (1932) H.C.Williams: “Primality testing on a computer” , Ars Combin. vol 5 (1978), pp 127-185. H.C.Williams, J.S.Judd: “Some algorithms for prime testing, using generalized Lehmer functions”, Math. Comp. vol 30 (1976), 867-886. H.C.Williams, C.R.Zarnke: “Some Prime Numbers of the Forms 2A3n + 1 and 2A3n − 1”, Math. Comp., vol. 26 (October 1972), pp. 995-998.
Primality Proving Using Elliptic Curves: An Update F. Morain
?
´ Laboratoire d’Informatique de l’Ecole polytechnique (LIX – CNRS UMR 7650) F-91128 Palaiseau Cedex France [email protected] http://www.lix.polytechnique.fr/Labo/Francois.Morain
Abstract. In 1986, following the work of Schoof on counting points on elliptic curves over finite fields, new algorithms for primality proving emerged, due to Goldwasser and Kilian on the one hand, and Atkin on the other. The latter algorithm uses the theory of complex multiplication. The algorithm, now called ECPP, has been used for nearly ten years. The purpose of this paper is to give an account of the recent theoretical and practical improvements of ECPP, as well as new benchmarks for integers of various sizes and a new primality record.
1
Introduction
The last ten years have shown the power of the theory of elliptic curves in many areas of number theory and cryptography. Fast algorithms for integer factorization [15], primality proving [8,1] and point counting over finite fields [34] were discovered and optimized (see [17] for a bibliography on the topic and [5,18] for some recent material). The reader wishing to learn about cryptographic applications of elliptic curves is referred to [19]. Even though one could dream of using Schoof’s algorithm for primality proving – as Goldwasser and Kilian did [8] – the approach due to Atkin, using complex multiplication, is still computationally faster. An implementation of this algorithm, popularized as ECPP [1,26], has been available on the WEB since 1991 and has been used by many people. New theoretical results emerged, due in part to the author, related mostly to the computation of character sums. The purpose of this paper is to give an account of these and to detail some new algorithmic improvements. The resulting program is much faster than the old version, and was able to prove the primality of the 2196 decimal digit cofactor of 27331 − 1 (work realized by E. Mayer and the author). Actually, it was about time, since the Jacobi Sums test seems to have woken up from a long sleep; for the new developments ?
The author is on leave from the French Department of Defense, D´el´egation G´en´erale pour l’Armement.
J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 111–127, 1998. c Springer-Verlag Berlin Heidelberg 1998
112
F. Morain
of this test and a new primality record beating that of ECPP, see [21,23] and the announcements [20,22]. Section 2 contains a brief review of elliptic curves and ECPP. One of the major improvements concerns the factorization routines used and Section 3 is devoted to this. Section 4 contains the recent improvements to the proving part of ECPP, including the implementation of Stark’s theorem on the cardinality of CM curves. These results can also be used to build CM curves quickly and are of independent interest (for cryptographic applications as in [27,14,3]). Section 5 contains the benchmarks for our implementation: we give the timings for proving the primality of integers having less than 512 bits on some platforms. We also give the time needed to check the primality certificates. As a typical result, proving the primality of a 512-bit number on an Alpha 125 MHz takes only 35 seconds, compared to 5 seconds for checking the certificate. In the last section, we comment on the new package and some records obtained with it.
2
An Overview of ECPP
Good references for elliptic curves in general are [35,36]. We will work on elliptic curves E with equation Y 2 Z = X 3 + aXZ 2 + bZ 3 over fields or rings Z/N Z. When N is prime, the set of points of such a curve (i.e., solutions of the above equation in the projective plane) forms an Abelian group; the law is denoted by + and the neutral element is the point at infinity, OE = (0, 1, 0) and equations for this are given in [35]. When N is composite, we proceed as if N were prime and wait for some a Z having a non-trivial gcd with N. Let us first recall the primality theorems [8]: Theorem 1. Let N be an integer prime to 6, E an elliptic curve over Z/N Z, together with a point P on E and m and s two integers with s | m. For each prime divisor q of s, we put (m/q)P = (xq , yq , zq ). We assume that mP = OE and gcd(zq , N ) = 1 for all q. Then, if p is a prime divisor of N , one has #E(Fp ) ≡ 0 mod s. We have also:
√ Corollary 1. With the same conditions, if s > ( 4 N + 1)2 , then N is prime. The following description of ECPP comes from [1]. Remember that there are basically two phases in the ECPP algorithm. In the first one, a decreasing sequence of probable primes N1 = N > N2 > . . . > Nk is built, the primality of Ni+1 implying that of Ni . In brief, Ni+1 is the largest (probable) prime factor of the order mi of some given elliptic curve Ei . In the second phase, the curve Ei is built and the primality theorem is used to prove the primality of Ni .
Primality Proving Using Elliptic Curves: An Update
113
More formally, the algorithm is: function ECPP(N ): boolean; 1. if N < 1000 then check the primality of N directly and return the answer. √ 2. Find an imaginary quadratic field K = Q( −D) (D > 0), for which the equation (1) 4N = x2 + Dy2 has solutions in rational integers x and y. 3. For each pair (U, V ) of solutions of (1), try to factor m = ((U − 2)2 + DV 2 )/4 = N + 1 − U . If one of these can be written as F × $ where F is completely factored and $ a probable prime, then go to step 4 else go to step 2. 4. Find the equation of the curve E having m points modulo N and a point P on it. If the primality condition is satisfied for (N, E, P ), then return ECPP($). Otherwise, return composite. 5. end. The justification of this algorithm relies on the fact that if N is really prime, step 2 ensures that N splits as the product of principal √ ideals in K, and therefore is the norm of the algebraic integer π = (x + y −D)/2. In that case, m is precisely the norm of π − 1 and the theory of complex multiplication asserts that E has indeed m points modulo N . It can be shown that the probability for a prime to split as a product of principal ideals in K is 1/(2h) where h is the class number of K. Of particular interest are the nine fields for which h = 1, corresponding to D ∈ {3, 4, 7, 8, 11, 19, 43, 67, 163}. Let us look at the work involved in this algorithm. For each discriminant D tried in step 2, we have to extract a square root modulo N and use a gcdlike computation (the so-called Cornacchia algorithm [1, p. 54] and [32]). Tricks for combining probabilistic primality proving of N and square root extractions modulo N are given in [1, p. 54] and make this phase very fast. Step 3 requires finding small factors of m, plus a probabilistic primality test. Finding small factors of a given number is speeded up by a particular sieve, also explained in [1, p. 55]; new tricks for speeding this sieve are explained in Section 3. The running time of the whole algorithm is dominated by this phase and so the parameters should be chosen very carefully. We will come back to this task in section 5. Finally, Step 4 requires building CM curves, and some progress has been made in that direction, see Section 4. Checking the conditions of the primality theorem also requires computing multiples of a point on an elliptic curve. Since we are working one curve at a time, the homogeneous form of the group law is preferable, and is even more profitable when using Montgomery’s arithmetic [24].
3
Improving the Factorization Stage
The integer N being given as 4N = U 2 + DV 2 (D a fundamental discriminant > 0), we hope that N + 1 ± U can be factored easily. For this, we can try several methods, already mentioned in [1].
114
3.1
F. Morain
Improving the Sieving Part
As explained in [1], we begin our factorization stage by a sieve. We precompute rp = (N + 1) mod p for all small p ≤ Pmax . Once we have this table, we can test whether p | N ±1 (by testing whether rp = 0 or rp = 2) or whether p | N +1 ±U , by testing whether U mod p = ∓rp mod p, which is rather economical, since √ |U | ≤ p. Special Division Routines. The basic operation we have to perform is the computation of U mod p for many small p’s. It is rather embarrassing to see that modern processors perform badly as far as integer division is concerned. This is particularly true for the DecAlpha. This suggests the use of special tricks for speeding up the division process. Note that we used the BigNum package for our implementation. A first trick is to divide U by blocks of primes. Suppose we are working with a base B representation of large integers (typically B = 264 on a DecAlpha). We gather primes pi ’s in such a way that c = pi1 pi2 · · · pik < B and perform one long division of U by c, followed by division operations by the pij ’s. For the DecAlpha only, a second trick is to use a special division routine inspired by [9, Figure 4.1] for computing the quotient of U by c, with c < 232 . We made numerous experiments for integers with less than 512 bits, comparing a special routine written in BigNum for division by small integers? with the algorithm of Granlund and Montgomery. We obtained a speedup of at least 2 resulting in a 10% savings in the whole first stage of ECPP. Note that this trick can be (actually is) combined with the preceding one. A Trick for the Case D = 3. In that case, we have 6 solutions to the equation 4N = x2 + Dy2 . The first one being (U, V ) with U > 0, say, the others are: (−U, −V ), (±(U + 3V )/2, ∓(U − V )/2), (±(U − 3V )/2, ±(U − V )/2). Having computed up = U mod p and wp = 3V mod p, we can check whether p divides any of the numbers N + 1 − W for W ∈ {±U, ±(U ± 3V )/2} using linear combinations of rp , up and wp, thus saving one third of the division operations. 3.2
Modifying the ρ Method for D = 3 and D = 4
Traditionally, one uses Pollard’s ρ method with a degree 2 function. When one knows that the prime factors p of an integer m are congruent to 1 modulo a number k, it is recommended [2] to use a degree k polynomial in Pollard’s ρ method. The number of iterations of the method being reduced by a factor √ k − 1. There are two cases in ECPP where we know such a thing. When D = 3 (resp. D = 4), we know that each prime factor of our m’s are congruent to ?
We divide a 64 bit word integer by a 32 bit one using base 232 arithmetic.
Primality Proving Using Elliptic Curves: An Update
115
1 modulo D. In the case D = 3 (resp. D = 4), one can use f3 (x) = x3 + 1 (resp. f4 (x) = x4 + 1). In tables 1 and 2, we indicate the number of modular multiplications and modular squarings needed to find factors p ≤ 108 . We used Montgomery’s MCF routine [25] for f4 , f3 ?? and for f2 (x) = x2 + 3 and x0 = 1. We list those primes that are champions, namely those for which the number of iterations is larger than all the preceding primes. × 2 8535 4242 874771 5240 5188 ≤ 107 7784389 19418 19357 9992053 26851 13397 ≤ 5 · 107 48749479 66313 33125 48909031 46665 46598 ≤ 108 93490387 100980 50457 95507539 67403 67336 ≤ 106
2
3 830017
Table 1. Comparisons of the variants of ρ for D = 3.
For instance, finding all primes p congruent to 1 modulo 3 that are ≤ 107 requires 26851 modular multiplications and 13397 squarings using f3 (x), compared to respectively 19418 and 19357. If a modular multiplication requires M operations, a squaring S, then the gain τ3 for D = 3 (resp. τ4 for D = 4) for p ≤ 108 is 100980M + 50457S 61148M + 61081S , τ4 = . τ3 = 67403M + 67336S 35418M + 70707S Note that from a practical point of view, such optimizations are rather difficult to appreciate precisely, but they are nice from a theoretical point of view. 4 × 2 968729 3428 6757 994393 5173 5121 ≤ 107 7784389 19418 19357 8854621 11667 23217 ≤ 5 · 107 48659461 24217 48311 49684241 48180 48113 ≤ 108 92188529 35418 70707 ≤ 106
2
Table 2. Comparisons of the variants of ρ for D = 4.
??
Though it cannot be applied a priori with f3 (x), it is trivial to modify MCF in this particular case.
116
4
F. Morain
Building CM Curves of Prescribed Cardinality
In the proving part, one has a (probable) prime p and a putative number of points √ m of a curve E having complex multiplication by the ring of integers of Q( −D). Once we have E, we find a point P on it and we have to check the primality conditions. To find E, one has to find a root of the so-called Weber polynomial, which gives the invariant of the curve. From this, we have to compute the coefficients of E. There are up to 6 (classes of) curves having the same invariant and we have to find the one having m points. We can certainly try all of them before finding the right one. However, any gain is worthwhile. When h(−D) = 1, formulas exist for E and are now fully proved (see [11,33]). Stark has recently given a general theorem when (D, 6) = 1 and this will be explained below. For D = 15, one can use the following result that uses Rajwade’s method as improved in [11] (see [4] for the details): Theorem 2. Let p = u2 + 15v2 with u ≡ 2 mod 3 if p ≡ 1 mod 15 and u ≡ 1 mod 3 if p ≡ 4 mod 15. Let $ be any square root of 5 modulo p. Then the curve of equation y2 = x3 − (3675 + 6240$)x − (188650 + 320320$) has p + 1 + 2u points. For D = 20, we refer to [16]. As a final comment, an algorithmic method is currently being developed by the author in the case D 6≡ 1 mod 3 (see [31]). 4.1
The Cases D = 3 and D = 4
The most heavily used primality tests are the N ± 1 test and the tests corresponding to D = 3 or 4, as indicated in Table 3 where we give the statistics for numbers of b bits. For these D’s, it is natural to speed up the construction of E. For these two cases, algorithms are given in [1, p. 58]. Slightly more efficient D\b 128 −1 0.59 1 0.16 3 0.12 4 0.05
192 0.50 0.12 0.16 0.08
256 0.41 0.12 0.23 0.05
320 0.33 0.14 0.20 0.09
384 0.33 0.13 0.19 0.08
448 0.30 0.10 0.19 0.08
512 0.26 0.09 0.20 0.08
Table 3. Frequencies of discriminants used in ECPP.
approaches can be found in the literature [12]. We combine these theorems with the use of known values of quartic and cubic symbols computed in [38]. The methods to be described yield a 40% savings in the proving part for small numbers (less than 512 bits). We begin with D = 4 to describe the philosophy, giving less details for D = 3. In the sequel, we let (a/p) denote the Legendre symbol.
Primality Proving Using Elliptic Curves: An Update
117
The Case D = 4. Theorem 3 (Katre). Let p ≡ 1 mod 4 and write p = x2 + 4y2 with x ≡ 1 mod 4. The quartic symbol is (k/p)4 ≡ k (p−1)/4 mod p. If a 6≡ 0 mod p, then the curve Y 2 = X 3 + aX has cardinality p + 1 − t where 2x if (a/p)4 = 1, t = −2x if (a/p)4 = −1, −4y otherwise where y is chosen uniquely by 2y(a/p)4 = x. We will use the following result of [38]? ? ? Theorem 4. Write p = s2 + 4y2 with s = 2y + 1 mod 4 and let i = 2y/s mod p so that i2 ≡ −1 mod p. Then 1 if y ≡ 0 mod 4, i if y ≡ 3 mod 4, (2/p)4 = −1 if y ≡ 2 mod 4, −i if y ≡ 1 mod 4. When y ≡ 0 mod 2, we have
1 if i if (3/p)4 = −1 if −i if
3 | y, 3 | x + 2y, 3 | x, 12 | x − 2y.
Suppose that p = u2 + v2 and we want a curve of cardinality p + 1 − 2u, the first thing we have to do is recover x and y from u and v. Then we proceed as follows, finding a satisfying any of the three cases of Katre’s theorem† : 1. If 2u = 2x, it is enough to find an a such that (a/p)4 = 1 and we can take a = 1. 2. If 2u = −2x, then any a with (a/p)4 = −1 will do: If p ≡ 5 mod 8, we take a = −1. If p ≡ 1 mod 8, according to Theorem 4, we take a = 4 when y ≡ 1 mod 2 and a = 2 when y ≡ 2 mod 4; when y ≡ 0 mod 4 we can take a = 3 when 3 | x, and 9 if 3 | x ± 2y; otherwise, we do an exhaustive search, starting at a = 5. 3. When 2u 6= ±2x, we choose the sign of y such that 2u = −4y. Then we must find a such that (a/p)4 = x/(2y) ∈ {±i} (we cannot have x = ±2y). If y is odd, then with the notations of Theorem 4, one has s = −x; if y ≡ 3 mod 4, we take a = 2 and if y ≡ 1 mod 4, we take a = 1/2. When y is even, then s = x; if y ≡ 2 mod 4, we let w be a square root of 2 mod p and take for a the value w or 1/w such that (a/p)4 = x/(2y); if y ≡ 0 mod 4, we take a = 1/3 when 3 | x + 2y, a = 3 when 3 | x − 2y, w or 1/w with w 2 ≡ 3 mod p when 3 | x and we use exhaustive search beginning at a = 5 in the last case where 6 | y. ??? †
Lienen’s notations are not ours at this point. Though tedious to implement, this procedure is very fast, resorting to the Riemann hypothesis in as few cases as possible.
118
F. Morain
Remark. When we need a such that (a/p)4 = ξ 6= 1, it is √ enough to find a b such that (b/p)4 6= 1. Once this is done, one of ±b, ±1/b, ± b will give us our a. The case D = 3. First of all, we have Katre’s result: Theorem 5. Let p ≡ 1 mod 3 and write 4p = L2 + 27M 0 , where L ≡ 1 mod 3. The cubic symbol is denoted (k/p)3 ≡ k (p−1)/3 mod p. If b 6≡ 0 mod p, then the curve Y 2 = X 3 + b has cardinality p + 1 − t where −(b/p)L if (4b/p)3 = 1, t = 12 (b/p)(L + 9M 0 ) otherwise where M 0 is chosen uniquely by (L − 9M 0 )(4b/p)3 = (L + 9M 0 ). 2
Let us normalize things as follows: p is a prime number ≡ 1 mod 3, so that 4p = U 2 + 3V 2 = L2 + 27M 2, with L ≡ 1 mod 3 and M > 0. We want a curve E : Y 2 = X 3 +b having p+1−U points over Fp . Following [38], we can also write p = α2 − αβ + β 2 with β = 3M and α = (L + 3M )/2 for which α ≡ 2 mod 3. Proposition 1. Let X 3 − 1 = (X − 1)(X − vp )(X − wp ) mod p. Then vp ≡ (L + 9M )/(L − 9M ) mod p, wp ≡ 1/vp mod p. With all these notations, we have: Theorem 6.
1 if β ≡ 0 mod 6, 2 = vp if β ≡ 3 mod 6, α ≡ 2 mod 6, p 3 wp if β ≡ 3 mod 6, α ≡ 5 mod 6, 1 if β ≡ 0 mod 9, 3 = vp if β ≡ 6 mod 9, p 3 wp if β ≡ 3 mod 9
The algorithm used to find b such that E : Y 2 = X 3 + b has p + 1 − U points is as follows: We first search for the cubic residue and then correct the value of the Legendre symbol, remarking that if γ is a quadratic non residue, then the value of (b/p)3 is unchanged when b is replaced by bγ 3 . The algorithm is: 1. Find (L, M ) from (U, V ). 2. If |U | = |L|, we let ε = U/L and we must find b such that (4b/p)3 = 1 and (b/p) = −ε. In the case ε = −1, b = 24 is convenient; when ε = +1, we take γ such that (γ/p) = −1 and take b = γ 3 /4. 3. If U = ±(L+9θM )/2 with θ ∈ {±1}, we let ε = (2U )/(L+9θM ) and we must find b such that (4b/p)3 = (L + 9θM )/(L − 9θM ) and (b/p) = ε. For this, we first look for b0 such that (b0 /p)3 = (L + 9θM )/(L − 9θM ) and then set b = b0 /4. If M ≡ 0 mod 6, we look for b0 by enumeration, starting from 5, since 2 and 3 are cubic residues by Theorem 6. Otherwise, we take for b0 a suitable power of 2 or 3, using Theorem 6. Remark. The remark at the end of the case D = 4 applies here too.
Primality Proving Using Elliptic Curves: An Update
4.2
119
The Case (D, 6) = 1: Stark’s Theorem
We give here the main theorem of Stark [37]. Remember the definition of Weber’s functions in terms of the modular invariant j(z): p p γ2 (z) = 3 j(z), γ3 (z) = j(z) − 1728. Stark’s theorem is: √ Theorem 7. Suppose (D, 6) = 1. Put g2 = γ2 ((−3 + −D)/2) and g3∗ = √ √ −Dγ3 ((−3 + −D)/2). Let 4p = U 2 + DV 2 , g˜2 and g˜3∗ be the reductions of g2 and g3∗ modulo p. Finally, let c be an element of Z/pZ. Then the curve g2 /48 − c3 D˜ g3∗ /864 has cardinality Ec : y2 = x3 + c2 D˜ c 2U U if D ≡ 7 mod 8, p D p+1− −c 2U U if D ≡ 3 mod 8. p D Using the techniques described in [1, §7], it is possible to find the minimal polynomials of g2 and g3∗ . However, we need to compute roots of these modulo p and match these roots. From a practical point of view, it is better to use the following observation: since g3∗ belongs to Q(g2 ), there exists a polynomial PD (X) in Q[X] such that g3∗ = PD (g2 ). When working modulo p, we first find g2 by computing a root of its minimal polynomial and then substitute its value in PD to find g3∗ modulo p. We give below a formula for the conjugates of g2 and g3∗ . Then we explain how to compute PD (X). The Conjugates of g2 and g3. For g2 , we use the procedure GAMMA2 of [1, pp. 43]. After a lot of computations, we came to the following conjectural result for the conjugates of γ3∗ (1, 1, (D + 1)/4) = −g3∗ . Conjecture 1. Let D ≡ 3 mod 4. The conjugates of γ3∗ (1, 1, (D + 1)/4) are the quantities (−1)(b+1)/2+ac+a+c γ3∗ (a, b, c) where [a, b, c] runs through the primitive reduced quadratic forms of discriminant −D. This conjecture might well be proved using the techniques described in [6,7]. Computing PD (X). There are basically two approaches to the problem. The first one uses floating point computations and the second is algebraic in nature. In the floating point approach, we compute all the conjugates of g2∗ , then Ph−1 the associated values of −γ3∗ and we find PD (X) = i=0 pi X i using Lagrange’s interpolation formulae. The coefficients of PD (X) are in Q, so that once we have a floating point value for the pi ’s, we have to recognize rational numbers, which is done via continued fractions. The problem with this approach is that
120
F. Morain
the precision needed for the computations is quite high. It appeared faster to use the algebraic method to be described next. Remember that g3∗ and g2 are related by an algebraic relation: g3∗ is also the correct square root of −D(g23 − 1728). Therefore, if we know the minimal polynomial HD [g2 ](X) of g2 , we can try to factor the polynomial Y 2 + D(HD [g2 ]3 − 1728) over Q[X]/(HD [g2 ]) and then find the correct sign of the factor by substitution of the value of g2 . This factorization can be done using a computer algebra system such as Maple or Magma. Numerical Examples. For D = 23, let us give the values of the elements of the group of quadratic forms of discriminant −D and the equivalent forms given by GAMMA2 and the corresponding values of γ2 : ˜ Q γ2 (τQ˜ ) Q [1, 1, 6] [1, 3, 8] −151.73142930462826 [2, 1, 3] [2, 9, 26] −1.6342853476858265 − 12.303828997932955 i [2, −1, 3] [2, 3, 4] −1.6342853476858265 + 12.303828997932955 i From this, the minimal polynomial of g2 is: H23[g2 ](X) = X 3 + 155 X 2 + 650 X + 23375. Factoring Y 2 + 23(H23[g2 ](X)3 − 1728)
we find that P23 (X) = ±
1885 78 2 361 X + X+ 175 35 7
.
On the other hand, we compute √ √ g3∗ = −23γ3 (τ[1,3,8]) = − −23γ3 (τ[1,1,6]) = 8965.7088453344433. Evaluating at X = g2 = −151.73142930462826, we get P23 (g2 ) = 8965.7088453344433 = g3∗ and therefore the good sign is +. Let us take p = 167 such that 4p = 242 + 23 × 22 . Then a root of H23 [g2 ](X) modulo p is 106, from which g˜3∗ = 59. It is easy to check that the curve E36 : y2 = x3 + 28x + 35 has 167 + 1 − 24 = 144 points. A more complex examples involves the factorization of HD [g2 ](X) over the genus field of K, as explained in [1]. For instance, if D = 55, we have √ √ H55 [g2 ](X) = 2X 2 + (2355 + 1053 5)X − (8865 + 6075 5)
Primality Proving Using Elliptic Curves: An Update
121
for which we find g3∗ = −
4 169 2 2866 11825 g3 − g + g2 + . 10935 2 243 2 81 81
Remark. Note that we could work the other way round, by computing the minimal polynomial of g3∗ , that is: H23 [g3∗ ](X) = X 3 + 9338 X 2 + 3384381 X + 417146653 and recover g2 as:
9919 155 78 X2 − X− , 5268725 1309 425 but the coefficients are somewhat larger. −
5
The New Program
The first version of ECPP (version 3.4.1) was made available some time around 1991. Several thousands copies were taken since. Continuous (if sporadic) work on the different libraries progressively modified the program and the program continuously used for proving the primality of large numbers [13,28,29,30]. An intermediate version was donated to the Magma group and incorporated in version 2.2. It is customary to say that a program that is five years old has to be rewritten. This was done a year ago, when the author spent a lot of time rewriting the whole program, starting from the arithmetic. The new public version (5.6.1) has been available for a year, without any publicity made anywhere (at least until November 1997). It has been copied by several hundred people. The ECPP program is written in C and uses the BigNum package [10]. A small package, containing some binaries and data files, is available on http://www.lix.polytechnique.fr/Labo/Francois.Morain/. Data have been recomputed so as to meet the requirements of Stark’s theorem: such data now comprise the minimal polynomial of g2 (or of some Weber function as explained in [1]) and the polynomial PD (X). 5.1
A Few Words on Strategy
As in [1, p. 52], we use a precomputed list of 2244 discriminants. This enables us to factor up to 4500 putative m’s (corresponding to D = ±1 and all D such that h ≤ 20 and for (h, g) ∈ {(32, 16), (24, 8), (48, 16), (32, 8), (64, 16)} – g being the number of genera). ECPP can be seen as a tree search. At each level, we have to run through a set of possible branches, one per discriminant. In most cases, we have enough discriminants to find a good one and we can go down to the next level. In some cases (when we are dealing with a difficult number), our stock is not enough and we have the possibility of backtracking one level up. This backtrack feature is present in our implementation of ECPP. Another special feature is the redo option, which says that before backtracking, we should try our list of discriminants at the same level with higher factorization parameters.
122
5.2
F. Morain
Benchmarks
All benchmarks were done on a DEC 3000 - M300LX (processor alpha). For comparisons with [1, p. 61], we managed to find an old DEC 5000 (processor mips). Basic Arithmetic. We first give the timings for modular multiplication for our implementation, built on top of the BigNum package. We assume our numbers have size between 128 and 512 bit. We give the timings for the naive implementation, as well as the one using Montgomery’s arithmetic [24]. b 128 192 256 320 384 448 512 plain 58 81 101 119 144 165 196 Montgomery 28 37 44 54 76 86 102
Table 4. 64-bit arithmetic on DEC 3000 - M300LX in µs
b 128 192 256 320 384 448 512 plain 47 78 176 203 254 297 336 Montgomery 27 78 125 113 137 195 285
Table 5. 32-bit arithmetic on DEC 5000, in µs
The DEC 3000 machine is a 64-bit, the DEC 5000 is 32-bit. On a DEC 3000, Montgomery’s arithmetic is up to twice as fast as the plain version for 512 bits. Montgomery’s arithmetic is very nice, but some care has to be taken when using it, because all constants are to be normalized before use. Since this can be very cumbersome, we limit the use of this arithmetic to critical parts of the algorithm, such as probabilistic primality proving. ECPP. The speed of ECPP depends largely on the factorization parameters used. As in [1], a sieve is used, followed by Pollard’s ρ if necessary, then p − 1 is used and finally ECM as a last resort. For numbers up to 512 bits, the sieve is enough. Finding the optimal values involves running the program with a lot of different parameters, starting from b = 128 bits, up to 512. The largest prime Pmax for the sieve in our implementation is given in the following table: b 128 192 256 320 384 448 512 Pmax 10000 30000 30000 50000 50000 50000 80000 Tables 6 and 7 contain the timings for 50 random primes of each size. The first line gives the time for the building phase, the second for the proving phase,
Primality Proving Using Elliptic Curves: An Update
123
the third the total time and the fourth the number of intermediate primes. Optimizing the parameters is a tedious task. We tried to minimize the maximal time spent. Finally, we measured the time needed to check a certificate, and this only on the DEC 3000 machine. Results are given in Table 8.
b min max 128 0.34 1.82 0.18 1.36 0.70 3.01 4.00 15.00 192 0.98 6.50 0.37 3.03 1.35 9.52 9.00 21.00 256 3.15 20.62 2.39 10.51 5.97 31.13 12.00 24.00 320 6.54 24.00 5.91 13.49 12.90 34.20 16.00 25.00
mean 0.81 0.61 1.43 9.82 2.94 1.80 4.74 13.76 7.11 4.56 11.67 17.64 13.74 8.92 22.66 20.80
s.d. b min max mean s.d. 0.32 384 12.39 69.99 28.30 11.47 0.25 9.26 26.14 16.42 3.66 0.52 23.54 92.10 44.72 14.26 1.96 18.00 35.00 25.26 3.39 1.15 448 22.56 100.43 46.85 15.70 0.55 17.33 43.35 27.84 6.56 1.57 41.43 141.48 74.69 20.84 2.75 20.00 36.00 29.08 3.16 3.09 512 40.23 143.15 79.39 23.32 1.41 29.35 68.90 43.05 8.33 4.31 69.58 194.66 122.44 29.05 2.79 28.00 40.00 32.52 3.01 4.19 1.81 5.34 2.16
Table 6. Benchmarks for 50 primes on a DEC 5000
Larger Numbers. The emphasis was put so far on small numbers of cryptographic interest. For larger numbers, finding the optimal parameters is more difficult. Some other algorithmic tools are still being developed and will be described elsewhere. We content ourselves with some timings (in seconds on an Alpha 125 MHz) obtained for two primes of 500 and 600 decimal digits:
New Records: E. Mayer sent me enough emails to wake me up and force me to improve my program. This resulted in a small step for theory, but a giant step for users: a restart option was added to the program, making it very useful for long runs. With this option, E. Mayer and I were able to prove the primality of the number (27331 − 1)/458072843161 in one month of DecAlpha 400 MHz, thus setting the new ECPP record at 2196 decimal digits, which is 500 digits more than the preceding one‡ which had 1505 decimal digits (see [28]). Checking the certificate takes 6 hours. ‡
Actually, E. Mayer broke that record some time before without even realizing it!
124
F. Morain b min max 128 0.08 0.82 0.05 0.48 0.15 1.02 7.00 14.00 192 0.22 2.47 0.12 0.77 0.33 3.12 9.00 18.00 256 1.17 6.38 0.38 2.10 1.70 7.20 9.00 23.00 320 1.50 8.80 0.98 3.40 2.87 10.45 13.00 28.00
mean 0.26 0.17 0.43 9.82 1.06 0.42 1.48 13.14 2.95 0.89 3.84 16.54 4.74 1.84 6.58 19.92
s.d. b min max mean s.d. 0.15 384 3.68 27.10 9.17 3.87 0.09 1.87 8.57 3.56 1.21 0.21 5.55 32.97 12.73 4.78 1.88 18.00 33.00 24.08 3.22 0.51 448 5.12 31.38 14.67 5.43 0.13 2.82 14.28 5.93 2.18 0.59 8.23 45.67 20.60 7.10 2.27 23.00 34.00 28.42 2.88 1.42 512 12.40 45.02 25.63 8.43 0.31 5.75 27.00 10.18 4.16 1.61 18.17 68.53 35.81 11.31 2.75 24.00 39.00 31.08 3.56 1.67 0.56 1.96 2.91
Table 7. Benchmarks for 50 primes on a DEC 3000. b 128 192 256 320 384 448 512
min max 0.10 0.35 0.28 0.62 0.45 0.95 0.95 2.07 1.40 3.27 2.42 5.43 3.73 6.65
mean 0.20 0.43 0.72 1.38 2.25 3.59 4.99
s.d. 0.05 0.09 0.12 0.22 0.38 0.61 0.66
Table 8. Time for checking the certificates on a DEC 3000
6
Conclusions
We have described the recent developments in effective complex multiplication, yielding fast and direct ways of building CM curves. This together with many algorithmic improvements made it possible to release a new and faster version of ECPP. Note that the only concurrent of ECPP, the Jacobi sums test, has been recently improved by Mih˘ ailescu [21,22] who announced a new primality record [20], with N = 210000 + 177, whose primality was established in 138 hours on an Alpha 500. This might tend to prove that this cyclotomic algorithm is much faster than ECPP. However, the fact that ECPP gives a certificate of primality that can be checked independently, with a small program, continues to be an advantage. As a final remark, Mih˘ ailescu’s thesis raises interesting connections between cyclotomic ideas and elliptic curves. These connections will be investigated in the near future.
Primality Proving Using Elliptic Curves: An Update
125
p building proving checking 10499 + 153 29559 298 156 10599 + 2161 46287 852 253
Table 9. Timings for two large primes on a DEC 3000 (in seconds).
Acknowledgments. The author wants to express his gratitude to E. Mayer for his interest in ECPP and for developing much enthusiasm for intermediate versions. No doubt that without him, the record would still be at 1500 digits. Also, thanks to P. Mih˘ ailescu for his stimulating competition and many interesting questions related to his work. F. Hajir’s help was crucial in getting a copy of Stark’s article. Finally, G. Hanrot’s reading of this manuscript was heartily welcomed.
References 1. A. O. L. Atkin and F. Morain. Elliptic curves and primality proving. Math. Comp., 61(203):29–68, July 1993. 2. R. P. Brent and J. M. Pollard. Factorization of the eighth Fermat number. Math. Comp., 36(154):627–630, April 1981. 3. J. Chao, K. Harada, N. Matsuda, and S. Tsujii. Design of secure elliptic curves over extension fields with CM field method. In Proceedings of PRAGO-CRYPTO’96, pages 93–108, 1996. 4. J.-M. Couveignes, A. Joux, and F. Morain. Sur quelques sommes de caract`eres. In preparation, February 1994. 5. N. D. Elkies. Elliptic and modular curves over finite fields and related computational issues. In D. A. Buell and J. T. Teitelbaum, editors, Computational Perspectives on Number Theory: Proceedings of a Conference in Honor of A. O. L. Atkin, volume 7 of AMS/IP Studies in Advanced Mathematics, pages 21–76. American Mathematical Society, International Press, 1998. 6. A. Gee. Class invariants by Shimura’s reciprocity law. Preprint, 1998. 7. A. Gee and P. Stevenhagen. Generating class fields using Shimura reciprocity. To appear in the Proc. of ANTS-III, 1998. 8. S. Goldwasser and J. Kilian. Almost all primes can be quickly certified. In Proc. 18th STOC, pages 316–329. ACM, 1986. May 28–30, Berkeley. 9. T. Granlund and P. L. Montgomery. Division by invariant integers using multiplication. SIGPLAN Notices, 29(6):61–72, 1994. 10. J.-C. Herv´e, F. Morain, D. Salesin, B. Serpette, J. Vuillemin, and P. Zimmermann. Bignum: A portable and efficient package for arbitrary precision arithmetic. Rapport de Recherche 1016, INRIA, April 1989. 11. A. Joux and F. Morain. Sur les sommes de caract` eres li´ees aux courbes elliptiques a multiplication complexe. J. Number Theory, 55(1):108–128, November 1995. ` 12. S. A. Katre. Jacobsthal sums in terms of quadratic partitions of a prime. In K. Alladi, editor, Number Theory, volume 1122 of Lecture Notes in Math., pages 153–162. Springer-Verlag, 1985. Proceedings of the 4th Matscience Conference held at Ootacamund, India, January 5-10, 1984.
126
F. Morain
13. W. Keller and F. Morain. The complete factorization of some large Mersenne composites. Abstracts of the AMS, 13(5):506, October 1992. 92T-11-163. 14. G.-J. Lay and H. G. Zimmer. Constructing elliptic curves with given group order over large finite fields. In L. Adleman and M.-D. Huang, editors, ANTS-I, volume 877 of Lecture Notes in Comput. Sci., pages 250–263. Springer-Verlag, 1994. 1st Algorithmic Number Theory Symposium - Cornell University, May 6-9, 1994. 15. H. W. Lenstra, Jr. Factoring integers with elliptic curves. Ann. of Math. (2), 126:649–673, 1987. 16. F. Lepr´evost and F. Morain. Revˆetements de courbes elliptiques ` a multiplication complexe par des courbes hyperelliptiques et sommes de caract`eres. J. Number Theory, 64:165–182, 1997. ´ 17. R. Lercier. Algorithmique des courbes elliptiques dans les corps finis. Th`ese, Ecole polytechnique, June 1997. 18. R. Lercier and F. Morain. Algorithms for computing isogenies between elliptic curves. In D. A. Buell and J. T. Teitelbaum, editors, Computational Perspectives on Number Theory: Proceedings of a Conference in Honor of A. O. L. Atkin, volume 7 of AMS/IP Studies in Advanced Mathematics, pages 77–96. American Mathematical Society, International Press, 1998. 19. A. J. Menezes. Elliptic curve public key cryptosystems. Kluwer Academic Publishers, 1993. 20. P. Mih˘ ailescu. Advances in cyclotomy primality proving. Email to the NMBRTHRY mailing list; available on http://listserv.nodak.edu/archives/nmbrthry.html, November 1997. 21. P. Mih˘ ailescu. Cyclotomy of rings and primality testing. Diss. ETH No. 12278, Swiss Federal Institute of Technology Z¨ urich, 1997. 22. P. Mih˘ ailescu. Cyclotomy news. Email to the NMBRTHRY mailing list; available on http://listserv.nodak.edu/archives/nmbrthry.html, January 1998. 23. P. Mih˘ ailescu. Cyclotomy primality proving – recent developments. To appear in the Proc. of ANTS-III, March 1998. 24. P. L. Montgomery. Modular multiplication without trial division. Math. Comp., 44(170):519–521, April 1985. 25. P. L. Montgomery. Speeding the Pollard and elliptic curve methods of factorization. Math. Comp., 48(177):243–264, January 1987. 26. F. Morain. Courbes elliptiques et tests de primalit´ e. Th`ese, Universit´e Claude Bernard–Lyon I, September 1990. 27. F. Morain. Building cyclic elliptic curves modulo large primes. In D. Davies, editor, Advances in Cryptology – EUROCRYPT ’91, volume 547 of Lecture Notes in Comput. Sci., pages 328–336. Springer–Verlag, 1991. Proceedings of the Workshop on the Theory and Application of Cryptographic Techniques, Brighton, United Kingdom, April 8–11, 1991. 28. F. Morain. Prime values of partition numbers and the primality of p(1840926). Rapport de Recherche LIX/92/RR/11, Laboratoire d’Informatique de l’Ecole Polytechnique (LIX), 1992. 29. F. Morain. (2^10501+1)/3 is prime. Email to the NMBRTHRY mailing list; available on http://listserv.nodak.edu/archives/nmbrthry.html, April 1996. 30. F. Morain. (2^12391+1)/3 is prime. Email to the NMBRTHRY mailing list; available on http://listserv.nodak.edu/archives/nmbrthry.html, April 1996. 31. F. Morain. Torsion points on CM elliptic curves and applications. Preprint, March 1998. 32. A. Nitaj. L’algorithme de Cornacchia. Exposition. Math., 13:358–365, 1995.
Primality Proving Using Elliptic Curves: An Update
127
33. R. Padma and S. Venkataraman. Elliptic curves with complex multiplication and a character sum. J. Number Theory, 61:274–282, 1996. 34. R. Schoof. Elliptic curves over finite fields and the computation of square roots mod p. Math. Comp., 44:483–494, 1985. 35. J. H. Silverman. The arithmetic of elliptic curves, volume 106 of Graduate Texts in Mathematics. Springer, 1986. 36. J. H. Silverman. Advanced Topics in the Arithmetic of Elliptic Curves, volume 151 of Graduate Texts in Mathematics. Springer-Verlag, 1994. 37. H. M. Stark. Counting points on CM elliptic curves. Rocky Mountain J. Math., 26(3):1115–1138, 1996. 38. H. von Lienen. Reelle kubische und biquadratische Legendre-Symbole. J. Reine Angew. Math., 305:140–154, 1979.
Bounding Smooth Integers (Extended Abstract) Daniel J. Bernstein Department of Mathematics, Statistics, and Computer Science (M/C 249) The University of Illinois at Chicago Chicago, IL 60607–7045 [email protected]
1
Introduction
An integer is y-smooth if it is not divisible by any primes larger than y. Define Ψ(x, y) = #{n : 1 ≤ n ≤ x and n is y-smooth}. This function Ψ is used to estimate the speed of various factoring methods; see, e.g., [1, section 10]. Section 4 presents a fast algorithm to compute arbitrarily tight upper and lower bounds on Ψ(x, y). For example, 1.16 · 1045 < Ψ(1054 , 106) < 1.19 · 1045 . The idea of the algorithm is to bound the relevant Dirichlet series between two power series. Thus bounds are obtained on Ψ(x, y) for all x at one fell swoop. More general functions can be computed in the same way.
Previous Work The literature contains many loose bounds and asymptotic estimates for Ψ; see, e.g., [2], [4], [5], and [9]. Hunter and Sorenson in [6] showed that some of those estimates can be computed quickly.
Acknowledgments The author was supported by the National Science Foundation under grant DMS–9600083.
2
Discrete Generalized Power Series
P A series is a formal sum f = r∈R fr tr such that, for any x ∈ R, there are only finitely many r ≤ x with fr 6= 0. P P P gr tr be series. The sum f + g is r (fr + gr )tr . Let f = r fr trPand Pg = rr+s The product fg is r s fr gs t . P P P I write f ≤ g if r≤x fr ≤ r≤x gr for all x ∈ R. If h = r hr tr is a series with all hr ≥ 0, then fh ≤ gh whenever f ≤ g. J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 128–130, 1998. c Springer-Verlag Berlin Heidelberg 1998
Bounding Smooth Integers
3
129
Logarithms
Fix a positive real number α. This is a scaling factor that determines the speed and accuracy of my algorithm: the time is roughly proportional to α, and the error is roughly proportional to 1/α. For each prime p select integers L(p) and U (p) with L(p) ≤ α log p ≤ U (p). I use the method of [7, exercise 1.2.2–25] to approximate α log p.
4
Bounding Smooth Integers
P Define f as the power series p≤y tL(p) + 12 t2L(p) + 13 t3L(p) + · · · . Then Y Y X 1 1 tα log n = ≤ = exp f, 1 − tα log p 1 − tL(p) n is y smooth p≤y p≤y P P so Ψ(x, y) ≤ r≤α log x ar if exp f = r ar tr . P P r U (p) + 12 t2U (p) + 13 t3U (p) +· · · , then Ψ(x, y) ≥ P Similarly, if r br t = exp p t r≤α log x br . One can easily compute exp f in Q[t]/tm as 1 + f + 12 f 2 + · · ·, since f is divisible by a high power of t; it also helps to handle small p separately. An alternative is Brent’s method in [8, exercise 4.7–4]. It is not necessary to enumerate all primes p ≤ y. There are fast methods to count (or bound) the number of primes in an interval; when y is much larger than α, many primes p will have the same value bα log pc.
5
Results
The following table shows some bounds on Ψ(x, y) for various (x, y), along with u = (log x)/log y. x
y
α
lower
upper
u
xρ(u)
1060 1060 1060 1060 1060 1060 1060 1060 1060 1060 1060 1060 1060 1060
102 102 103 103 103 104 104 104 105 105 105 106 106 106
101 102 101 102 103 101 102 103 101 102 103 101 102 103
1018 · 5.2 1018 · 6.73 1032 · 1.44 1032 · 2.278 1032 · 2.4044 1041 · 0.70 1041 · 1.191 1041 · 1.2649 1046 · 0.99 1046 · 1.679 1046 · 1.7817 1049 · 1.82 1049 · 3.025 1049 · 3.2017
1018 · 11.6 1018 · 7.28 1032 · 5.07 1032 · 2.580 1032 · 2.4345 1041 · 2.88 1041 · 1.370 1041 · 1.2827 1046 · 4.07 1046 · 1.931 1046 · 1.8069 1049 · 7.14 1049 · 3.463 1049 · 3.2453
30 30 20 20 20 15 15 15 12 12 12 10 10 10
1011 · 0.327− 1011 · 0.327− 1032 · 0.246+ 1032 · 0.246+ 1032 · 0.246+ 1041 · 0.759− 1041 · 0.759− 1041 · 0.759− 1046 · 1.420− 1046 · 1.420− 1046 · 1.420− 1049 · 2.770+ 1049 · 2.770+ 1049 · 2.770+
In the final column, ρ is Dickman’s rho function.
130
Daniel J. Bernstein
References 1. Joseph P. Buhler, Hendrik W. Lenstra, Jr., Carl Pomerance, Factoring integers with the number field sieve, in [10], 50–94. 2. E. Rodney Canfield, Paul Erd˝ os, Carl Pomerance, On a problem of Oppenheim concerning “factorisatio numerorum”, Journal of Number Theory 17 (1983), 1– 28. 3. Ronald L. Graham, Jaroslav Neˇsetˇril, The mathematics of Paul Erd˝ os, volume 1, Algorithms and Combinatorics 13, Springer-Verlag, Berlin, 1997. 4. Adolf Hildebrand, G´erald Tenenbaum, On integers free of large prime factors, Transactions of the AMS 296 (1986), 265–290. 5. Adolf Hildebrand, G´erald Tenenbaum, Integers without large prime factors, Journal de Th´eorie des Nombres de Bordeaux 5 (1993), 411–484. 6. Simon Hunter, Jonathan Sorenson, Approximating the number of integers free of large prime factors, Mathematics of Computation 66 (1997), 1729–1741. 7. Donald E. Knuth, The art of computer programming, volume 1: fundamental algorithms, 2nd edition, Addison-Wesley, Reading, Massachusetts, 1973. 8. Donald E. Knuth, The art of computer programming, volume 2: seminumerical algorithms, 2nd edition, Addison-Wesley, Reading, Massachusetts, 1981. 9. Sergei Konyagin, Carl Pomerance, On primes recognizable in deterministic polynomial time, in [3], 176–198. 10. Arjen K. Lenstra, Hendrik W. Lenstra, Jr. (editors), The development of the number field sieve, Lecture Notes in Mathematics 1554, Springer-Verlag, Berlin, 1993.
Factorization of the Numbers of the Form m3 + c2 m2 + c1 m + c0 Zhang Mingzhi Sichuan Union University
Abstract. We give an algorithm which can factor integers of the form m3 + c2 m2 + c1 m + c0 , where the ci are small integers. It is expected λ that p the time required is Lδ andpthe space required pis L where L = exp( log n log log n) and δ = r/ 6(r − 1), λ = 2/ 6(r − 1), where r is the elimination exponent.
1
Introduction
Currently, the quadratic sieve (QS) and number field sieve (NFS) are two of the most important methods for factoring integers. NFS is a kind of linear sieve using a homomorphism from Z[α] to Z/nZ, where α is a root of a polynomial P (x) ∈ Z[x] and the number n to be factored is of the form n = P (m). For simplicity, we often assume Z[α] is a PID and we have to compute the units and the generators of the prime ideals of small prime norms. NFS is faster than QS, but QS is simpler and it is a general purpose algorithm. J.M. Pollard ([7]) has factored the number of the form m3 + c using NFS. In the following, we combine some ideas of NFS and QS. For the numbers of the form m3 + c2 m2 + c1 m + c0 (|ci | small), we give a simple algorithm which can factor such numbers faster than the QS.
2
Algorithm
Let P (x) = x3 + c2 x2 + c1 x + c0 where the ci are small integers. Let the odd composite number n to be factored be of the form n = P (m), or n = m3 + c2 m2 + c1 m + c0
(1)
x = a2 m2 + a1 m + a0
(2)
Let where ai are integers and gcd(a2 , a1 , a0 ) = 1. We choose ai suitably so that x2 has small residue mod n. From m3 ≡ −(c2 m2 + c1 m + c0 )(modn), we have x2 ≡ b2 m2 + b1 m + b0 (modn) J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 131–136, 1998. c Springer-Verlag Berlin Heidelberg 1998
132
Zhang Mingzhi
where b2 = c22 a22 − c1 a22 + a21 − 2c2 a1 a2 + 2a0 a2 b1 = c1 c2 a22 − c0 a22 − 2c1 a1 a2 + 2a0 a1 b0 = c0 c2 a22 + a20 − 2c0 a1 a2
(3)
Let b2 = 0. We obtain (a1 − c2 a2 )2 = a2 (c1 a2 − 2a0 )
(4)
Let d = gcd(a2 , c1 a2 − 2a0 ), if prime p|d, then p|2a0, a1 . Therefore, d = 1 or 2. If d = 2, then 2|a2 , 2|a1 . Let a1 = 2a∗1 , a2 = 2a∗2 . We have (a∗1 − c2 a∗2 )2 = a∗2 (c1 a∗2 − a0 ). Hence a∗2 = ±u2 c1 a∗2 − a0 = ±v2 a∗1 − c2 a∗2 = uv where gcd(u, v) = 1. Therefore a0 = ±(c1 u2 − v2 ) a1 = 2(uv ± c2 u2 ) a2 = ±2u2
(5)
where 2|(c1 u2 − v2 ). If d = 1, then a2 = ±u2 c1 a2 − 2a0 = ±v2 a1 − c2 a2 = uv and 1 a0 = ± (c1 u2 − v2 ) 2 a1 = uv ± c2 u2 a2 = ±u2
(6)
where gcd(u, v) = 1 and 2 | c2 u2 − v2 . It is easy to see that for (6) above, 2a0 , 2a1 , 2a2 have the same form as the right side of (5). Since x2 (modn) and (2x)2 (modn) generate the same vector during sieving, we need only to consider (5) and ignore (6) only if we do not mind the parity of c1 u2 − v2 in (5) and permit gcd(a0 , a1 , a2 ) = 2.
Factorization of the Numbers of the Form m3 + c2 m2 + c1 m + c0
133
It is easy to see from (5) that the pairs (−u, −v) and (u, v) generate the same value of x. So, we can assume u > 0. Since (−x)2 and x2 have the same residue mod n, then the double sign ± in (5) can be absorbed by v if v takes both positive and negative values and we need only to consider the plus sign in (5). For ai defined by (5), we have b2 = 0 and x2 ≡ R(mod n), where R = b1 m + b0 and b1 = −4u(c0 u3 + c1 u2 v + c2 uv2 + v3 ) b0 = (c21 − 4c0 c2 )u4 − 8c0 u3 v − 2c1 u2 v2 + v4 .
(7)
R = v4 + d3 v3 u + d2 v2 u2 + d1 vu3 + d0 u4
(8)
Thus, where d3 d2 d1 d0
= −4m = −2(2c2 m + c1 ) = −4(c1 m + 2c0 ) = c21 − 4c0 c2 − 4c0 m
(9)
Now, p we estimate the magnitude of R. Let u, |v| ≤ M, M = Lβ , where L = exp( log n log log n), β > 0. From (7), we have b1 = O(M 4 ), b0 = O(M 4 ), R = O(M 4 n1/3 ) = O(n1/3+ ), > 0. We can see that R is much smaller than the residues in QS for sufficiently large n. Let p be a prime factor of R, if p 6 |u, the p|v from (8), and this contradicts gcd(u, v) = 1. So we can assume p 6 |u. Let (10) f(t) = t4 + d3 t3 + d2 t2 + d1 t + d0 From (8), we have
p|R, p 6 |u ⇔ f(vu−1 ≡ 0(mod p)
where u−1 is the inverse of u mod p. Hence, if t0 is a solution to the congruence f(t) ≡ 0(mod p)
(11)
then every pair (u, v) satisfying p 6 |u, v ≡ t0 u(mod p) will generate a residue R divided by p. The solutions to (11) can be found by trial and error. If four solutions are found already, then we can stop since (11) has at most four solutions. Now, we give the factoring algorithm.
3
Algorithm
1. Generate a factor base F B, F B = {p|pis prime, p ≤ B, (11)has at least one solution}.
134
Zhang Mingzhi
Choose sufficiently large B so that #F B = Lα , α > 0. (Later we shall give the optimal value for α.) For all pi ∈ F B, find all solutions tij , j ≤ 4, to (11) by trial and error and compute log pi . 2. For 1 ≤ u ≤ Lβ , sieve (β should be sufficiently large so that enough vectors can be found). For pi ∈ F B, let rij ≡ utij (mod pi ), and let v = rij , rij ± pi , rij ± 2pi , · · ·
|v| ≤ Lβ .
If gcd(u, v) = 1, then add log pi to the sieve array at location v. Scan the sieve array. If the value at location v is larger than E(u, v) = log
m + log |b1 | pTmax
where pmax is the largest prime in F B and T is the tolerance of large prime (1 < T < 2) ([4]), then the pair (u, v) will generate a vector by trial division (perhaps with a large prime factor). 3. The elimination and the rest of the algorithm is the same as in ordinary QS.
4
Analysis of the Algorithm
Now, we estimate the expected running time of the algorithm 1. Since we have to resort to an unproved hypothesis, the following analysis is only heuristic and not fully proved. Let Ψ (x, y) be the number of positive integers not exceeding x free of prime factors exceeding y. It is well known that ([1]) Ψ (x, y) = xu−u+0(u), u =
log x . log y
Let x = nc , y = Lα . We have c
Ψ (nc , Lα ) = ncL− 2α ; specifically,
c Ψ n1/3 , Lα = n1/3 L− 6α .
To go further, we need the following hypothesis. Hypothesis: There exists a constant c > 0 such that there exist at least cπ(Lα ) primes p ≤ Lα for which (11) has at least one solution. The residues R generated by (u, v) above are distributed with respect to a certain fraction of them h havingiall of their prime factors below some point as are all of the integers in 1, n1/3+ . Therefore, the probability for residues R free of prime factors exceeding Lα is L−1/6α . We must generate Lα+1/6α residues in order to find requisite Lα of them which are composed solely of the primes p ≤ Lα . The number of the pairs (u, v) satisfying 1 ≤ u, |v| ≤ M = Lb eta and gcd(u, v) = 1 is 2Σ1≤i≤M Φ(i) = 6M 2 /π 2 + O(M log M ) > (1/2)M 2
Factorization of the Numbers of the Form m3 + c2 m2 + c1 m + c0
135
for sufficiently large M , where Φ(i) is Euler’s totient function. Therefore, if we take β = 1/2 α + 1/6α ; then there are enough requisite pairs (u, v). By the hypothesis above, in step i) we need only to choose B = Lα and the time required is L2α . In step ii), the sieving needs time L2β = Lα+1/(6α) and the trial division needs time L2α . So the time required is Lγ , γ = max{a + 1/(6α), 2α}. The exponent and elimination needs time Lαr , where r is the elimination n o 2 ≤ r ≤ 3. δ Therefore, the total running time is L , δ = max α + 1/(6α), αr . It is easy to p see that δ is minimal when √ α + 1/(6α) = αr, or α = 1/ 6(r√− 1). In this case, 6(r−1) and the space required√is L2/ 6(r−1) . For r = 3, the running time is Lr/ √ 3/2 and the space required is L 3/3 . the running time is L
5
Some Remarks
1. Let n be an integer, if there exists relatively small integer k such that kn is of the form (1). Then we can deal with kn and factor n in the way above. 2. The solution to (11) can also be found using other methods; for example, random splitting method in [2]. 3. The algorithm here is faster than MPQS for sufficiently large n which are of the form (1). For relatively small n, M 4 = L4β may exceed n1/6 and R may exceed n1/2 . In this case, we have to reduce β so that R does not exceed n1/2 and we may have not enough vectors. A remedy is to use MPQS to generate some other vectors. Namely, we combine our algorithm and MPQS if necessary. 4. When we attempt to generalize the Algorithm, some difficulties appear. For example, n = m4 + c, |c| small, let x = a3 m3 + a2 m2 + a1 m + a0 . Then x2 ≡ b3 m3 + b2 m2 + b1 m + b0 (mod n). Let b3 = b2 = 0. We obtain a system of Diophantine equations which can reduce to the Diophantine equation (12) a42 + ca43 = t2 For some values of c, for example c = −2, (12) has infinitely many non-trivial solutions ([5], [6]), but the magnitude of the solutions increases very quickly. For values of c, for example c = 1, (12) has only trivial solutions a3 = 0 which are useless for our purpose. Therefore, the perspective of generalization of the algorithm seems dim. 5. The implementation of this algorithm will be discussed in a subsequent paper. Acknowledgment: The author thanks Joe Buhler and Cathy D’Ambrosia for their help in putting this paper into TEX.
136
Zhang Mingzhi
References 1. H.W. Lenstra and R. Tijdlman (eds): Computational Methods in Number Theory. Math. Center tracts, No. 154, Math. Centrum Amsterdam (1982) 2. H. Cohen: A Course in Computational Algebraic Number Theory. Graduate Texts in Mathematics, Vol. 138, Springer-Verlag Berlin (1993). 3. A.K. Lenstra, H.W. Lenstra, M.S. Manasse, and T.M. Pollard: The Number Field Sieve. Proc. 22nd annual ACM, STOC (1990) 564-572. 4. R.D. Silverman: The Multiple Polynomial Quadratic Sieve. Math. Comp. 48 (1987) 329-339. 5. L.J. Mordell: Diophantine Equations. Academic Press, London and New York (1969). 6. L.E. Dickson: History of the Theory of Number. Vol. 2, Chelsea Publishing Company, New York (1952). 7. J.M. Pollard: Factoring with Cubic Integers. Lecture Notes in Mathematics Vol. 1554, 4-10. 8. D. Coppersmith, A.M. Odlzyko, R. Schroeppel: Discrete Logarithms in GF(p). Algorithmica 1 (1986), no. 1, 1-15.
Modelling the Yield of Number Field Sieve Polynomials Brian Murphy Computer Sciences Laboratory Research School of Information Sciences and Engineering Australian National University, Canberra ACT 0200. [email protected]
Abstract. Understanding the yield of number field sieve polynomials is crucial to improving the performance of the algorithm, and to assessing its potential impact on the practical security of cryptosystems relying on integer factorisation. In this paper we examine the yield of these polynomials, concentrating on those produced by Montgomery’s selection algorithm. Given such a polynomial f , we consider the influence of two factors; the size of values taken by f and the effect of the knowing the primes p for which f has roots mod p. Experiments show the influence of the first property, particularly whilst sieving close to real roots. Estimates of the effect of the second property show that it may effect yield by as much as a factor of two. We present sieving experiments demonstrating the effect to that extent. Finally, we suggest a preliminary model to approximate the behaviour of these polynomials across the sieving region.
1
Introduction
In practice the speed at which the number field sieve can factor a large integer N is limited mainly by the supply of smooth integers of a given form. In particular, given a polynomial f ∈ Z[x] of degree d and irreducible over Q let F ∈ Z[x, y] be F (x, y) = yd f(x/y). For the number field sieve we require two such polynomials F1 and F2 with a common root mod N . The sieving stage of the number field sieve involves searching for coprime (x, y) for which both |F1 (x, y)| and |F2 (x, y)| are B-smooth for some bound B (see [3] for details). The area in which the number field sieve has the greatest capacity for improvement is in the selection of these polynomials. “Better” polynomials are those which are more likely to take smooth values. In this paper we examine the yield of particular number field sieve polynomials, that is, the number of B-smooth values taken by some given F in the sieve region. Models of polynomial yield are useful for optimising the performance of the number field sieve, and for assessing its potential impact in practice on the security of cryptosystems relying on integer factorisation. The key to predicting yield is to have some understanding of smoothness probabilities, both for randomly chosen integers and for values taken by F . Throughout this paper the term random integer of size r means a positive integer chosen J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 137–150, 1998. c Springer-Verlag Berlin Heidelberg 1998
138
Brian Murphy
uniformly at random from {i ∈ Z+ : 0 < i ≤ r}. Let ψ(r, B) = {i ∈ Z+ : 0 < i ≤ r and r is B-smooth} . Then the probability that a random integer of size r is B-smooth is ψ(r, B)/r. For fixed B this probability is a function only of the size of r. Indeed, the probability decreases rapidly as r increases. The size of values taken by polynomials is therefore a key factor in polynomial selection. Consider a particular polynomial F (x, y). If there exist coprime x1 , x2 ∈ Z for which F (x1 , x2 ) = r then the probability that r is B-smooth is no longer just a function of its size. The primes for which F has no roots mod p never divide values taken by F . If F does have roots mod p then it is likely to have more than one. Depending on how the primes for which F does and does not have roots mod p are distributed the probability that r is smooth may differ significantly from its value if r was simply chosen at random. For a particular F , we call this property – the property by which the distribution of the primes p for which F has (or does not have) roots mod p affects the likelihood of F -values being smooth – the root property of F . To date the polynomial selection algorithm which is in some sense the best is an algorithm proposed by Peter Montgomery, reported in [5]. Montgomery’s algorithm produces pairs of quadratic polynomials with a common root mod N , whose coefficients are heuristically bounded by O(N 1/4 ). In this paper we discuss the yield of number field sieve polynomials, illustrated by polynomials selected according to Montgomery’s algorithm, with respect to their size and their root properties. We examine the yield of particular polynomials across the sieving region noting the increase in yield of all polynomials across their real roots. This increase can be exploited to give more relations from a given number of smooth polynomial values. The results in [10] suggest that, all other things being equal, the difference in yield due to the range of root properties found in candidate polynomials can be as much as a factor of two. Here we present results of sieving experiments conducted on five candidate quadratic polynomials. Results show relative changes in yield to that extent, confirming the importance of attention to root properties. Finally, we suggest a preliminary model to approximate the yield. This model can easily be extended to higher degree polynomials.
2
The Sieving Experiments
We chose five candidate polynomials, Polynomials A,B,. . .,E for a particular 106 digit integer C106 (C106 divides 12157 + 1). In Montgomery’s algorithm, the size of the coefficients produced depends on the form of sieving to be conducted. The most efficient method of sieving in the implementation described in [5] is line sieving. Line sieving is a special form of lattice sieving (see [7]) in which only the line F (x, 1) is sieved. In general sieving without fixing y is preferred. Our model can be extended to this case, but in any event behaviour of F (x, y) as y varies can reasonably be deduced from its behaviour at y = 1. Hence we
Number Field Sieve Polynomials
139
refer only to quadratic polynomials F (x, 1) = f(x). The polynomials A,B,. . .,E all have coefficients with the same number of digits as we would expect from Montgomery’s algorithm on C106, optimised for line sieving. Also, they all have two real roots in the sieve region |x| ≤ 1015 . The polynomials were chosen specifically for their root properties. We use the parameter α(f) to measure the effect of the root properties of f. If α(f) < 0 then heuristically, f takes values more likely to be smooth than random integers of the same size (and vice versa if α(f) > 0). In [10] α is adapted to number field sieve polynomials from ideas that appear in the analysis of MPQS (see [2]). We review the construction of α now. Whilst sieving with the prime p, the expected exponent of p in the factorisation of a given f-value (that is, the expected contribution of p) is removed from the f-value. Let qp be the number of distinct roots of f modulo p. The expected contribution of p is estimated by p
qp
1 1 p + p2
+···
qp
= p p−1 .
The summation 1/p + 1/p2 + . . . counts the contribution from powers of p, since any root mod p corresponds to a unique root mod pk for k > 1. Multiplication by qp counts one contribution from each root of f modulo p. So, after sieving we estimate the sieve array location corresponding to log |f(x)| to be log |f(x)| −
X p≤B
qp
log p . p−1
The corresponding value for a random integer r of the same size is log r −
X log p . p−1
p≤B
So it is suggested that f-values behave like random integers of log size α(f) + log f(x) where X log p . (1 − qp ) α(f) = p−1 p≤B
When α(f) < 0, f-values are more likely to be smooth than random integers of the same size. Moreover, α(f) is most negative when f has many roots modulo small p. Polynomials A, . . . ,E have α ∈ [−2.56, 1.51] as shown Table 1. On each of the polynomials A,. . .,E we performed line sieving in short intervals along |x| ≤ 1015 . We used the smoothness bound B = 2700000 for complete relations and B2 = 30000000 for incomplete relations (with up to two large primes) in accordance with [5]. We sieved in intervals of length 108 centred at steps of 1014 along the sieve interval, and in intervals of 108 centred at each root of each polynomial in the sieve interval.
140
Brian Murphy f α(f ) A B C D E
−2.56 −1.50 −0.50 0.52 1.51
Table 1. α values for candidate polynomials.
3
Results
We refer to the number of B-smooth f-values as the full yield, the number of f-values which are B-smooth but for the appearance of one large prime the 1LP yield, and the number of f-values which are B-smooth but for the appearance of two large primes the 2LP yield. We refer to the sum of the full, 1LP and 2LP yields as the total yield. 3.1
Yield Across the Sieve Region
For all polynomials the obvious feature of yield across the sieve region is the relative increase at real roots. This of course is due to the polynomials taking much smaller values close to roots. Common to all polynomials we investigated is an increase in total yield by a factor of at least 15 across roots. Polynomial A is typical and Figure 1 shows the total yield of polynomial A across the region. 4
18
x 10
16
14
Total Yield
12
10
8
6
4
2
0 −1
−0.8
−0.6
−0.4
−0.2
0 x
0.2
0.4
0.6
0.8
1 15
x 10
Fig. 1. Total yield (with roots) of Polynomial A with |x| ≤ 1015
Number Field Sieve Polynomials
141
During an entire sieve run then, values of x close to real roots of f(x) are a richer supply of smooth f-values than those not. Particularly for polynomials of degree greater than two, it is therefore essential to choose polynomials with as many real roots in the sieve region as possible (or to choose the sieve region to encompass as many real roots as possible). Remark 1. The location of the real roots becomes important too, particularly when sieving F (x, y) as y varies. In that case, since F (x, y) = yd f(x/y) and y 6= 0 the real roots of F lie along the lines x = τ y where f(τ ) = 0 and τ ∈ R. Ideally the region spanned by these lines in the x, y-plane should be chosen so that the smaller values of F (x, y) between and across the lines x = τ y occupy a large portion of the sieve region. However, most values of x in the sieve region are not close to real roots of f. The total yield away from real roots is not quite as flat as Figure 1 indicates. Figure 2 shows total yield across |x| ≤ 1015 just in steps of 1014 (that is, without explicitly showing the yield at real roots). 4
1.5
x 10
1.4
1.3
Total yield
1.2
1.1
1
0.9
0.8
0.7
0.6 −1
−0.8
−0.6
−0.4
−0.2
0 x
0.2
0.4
0.6
0.8
1 15
x 10
Fig. 2. Total yield (without roots) of Polynomial A with |x| ≤ 1015
Remark 2. Figure 2 suggests that, in relative terms, the yield of f varies greatly across the region. This has consesquences in the collection of relations. A relation for the number field sieve is a coprime integer pair (x, y) for which both F1 (x, y)
142
Brian Murphy
and F2 (x, y) are smooth (or almost smooth). In our case, since we use line sieving, y = 1. To this point we have considered only the yield of f1 and f2 individually. It is reasonable to assume that these yields are independent. Now, at any subinterval s of the sieve interval, the number of relations descending from the smooth values of f1 and f2 is at most the minimum of yield(f1 ) and yield(f2 ) across s. So it is wasteful of yield(f2 ) say, to have yield(f1 ) yield(f2 ) across s. Hence, for a given x, f1 and f2 having close real roots will increase the chance of f1 (x) and f2 (x) both being smooth (that is, of (x, 1) being a relation). The same argument holds for sieving F1 (x, y) and F2 (x, y) as y varies. Particularly if one is using more than two polynomials ([6]) current performance might be exceeded by considering the proximity of real roots when selecting polynomials. There is another phenomenon to be observed across the sieve region. Let T be the total yield, and Q, R, S be the full, 1LP and 2LP yields respectively. For all five polynomials the proportions Q/T and R/T increase close to real roots at the expense of S/T . For example, for Polynomial A the proportion Q/T increases from 10% to 18%, R/T increases from 38% to 44% and S/T decreases from 52% to 38%. For the other polynomials the proportions take similar values. This can be explained as follows. Suppose r is a random integer and let u = log r/ log B. Let ρ(u) be Dickman’s rho function [4]. Asymptotically, ρ(u) gives the probability that r is B-smooth. This function can be generalized to give asymptotic approximations to the probability that r has exactly one or exactly two large primes factors at most B2 , but is otherwise B-smooth. In particular, let σ1 (u, v) be the former, and σ2 (u, v) the latter function, with v = log r/ log B2 . Derivations of and effective methods for calculating these functions are given in [1] and [9] respectively. Using these methods we observe that in the range of interest ∂σ1 ∂σ2 ∂σ2 ∂σ1 > and . ∂v ∂u ∂v ∂u
(1)
Note that the inequality for σ1 (u, v) is not true for arbitrary u and v. Intuitively (1) means that as r increases the smoothness probabilities for 2LP smoothness (and to a lesser extent 1LP smoothness) depend more on r being B2 -smooth than on the cofactor (with the large primes removed) being B-smooth. That is, B2 -smoothness is the “difficult” property. The difference in (1) between σ1 and σ2 comes from ∂σ2 ∂σ1 > . ∂u ∂u Intuitively, σ1 ought to be more sensitive than σ2 to changes in u because a B2 -smooth integer with only one known prime factor between B and B2 is less likely to be otherwise B-smooth than one with two known prime factors between B and B2 . Now, since B < B2 dv du > . (2) dr dr
Number Field Sieve Polynomials
143
Ignoring for the moment the question of root properties, (1) and (2) imply that as |f(x)| decreases S/T ought to decrease relative to both Q/T and R/T , and that R/T ought to decrease slightly relative to Q/T . Note also that if the sieve threshold is set too high, sieving will fail to identify a small number of smooth values close to roots. 3.2
Yield due to Root Properties
Differences in yield amongst polynomials f1 and f2 , due only to root properties, can be observed by examining the yield across regions where f1 ≈ f2 . We sieved each polynomial B, . . . ,E in intervals of size 108 centred on a point at which the polynomials take the same value as Polynomial A. Over the entire interval the “other” polynomial has the same size as polynomial A to at least the fourth significant figure, and usually more. Any difference in yield between the polynomials over these intervals should therefore be due their different root properties. Complete results appear in Appendix 2. We summarize the results in Table 2 below. Relative yields shown are the yield of Polynomial A relative to the “other” polynomial, so for example the full yield of Polynomial A is 2.32 times that of Polynomial E. Polyn- α(f )− rel. total rel. full rel. 1LP rel. 2LP omial f α(A) yield yield yield yield B C D E
1.06 2.06 3.08 4.07
1.46 1.92 1.94 2.03
1.55 2.09 2.20 2.32
1.54 1.99 1.99 2.08
1.39 1.83 1.84 1.95
Table 2. Relative Yields due to Root Properties
According to the calculations in [10] the increases in full yield of A should be approximately 1.24, 1.51, 1.86, 2.30 relative to polynomials B,. . . ,E respectively. Also in [10] there are upper and lower bounds for the increases in 1LP and 2LP yields for two polynomials whose α values differ by 4. The values above for Polynomial E fall close to the middle of those bounds. The values taken by Polynomials C and D behave more like random integers than we expect on the basis of [10]. Probably this is because [10] considers only changes in α, not the value itself. The values α(C) and α(D) are close to zero (−0.50 and 0.50 respectively). Hence we must expect their values to behave more like random integers than if their α values were −2 and −1 for example. We conclude that in practice, differences in yield from root properties alone can indeed be as much as a factor of two. Root properties are therefore a factor which should be considered whilst modelling yield.
144
4
Brian Murphy
Modelling the Yield
In this section our aim is to present a method of estimating the number of relations from a given polynomial f. The estimate must balance the effect of the root properties of f, and the effect of the size of values taken by f. 4.1
Combining the Factors
Consider a particular quadratic number field sieve polynomial f = ax2 + bx + c. For p prime recall that qp is the number of distinct roots of f modulo p, so either qp = 0, 2 or (less frequently) qp = 1. In fact qp can be deduced in advance as follows. Let ∆ denote the discriminant of f, and let p be an odd prime with p 6 |∆. Let 2 if ∆ = 1, p qp = 0 if ∆ = −1. p If p|∆, put qp = 1. For p = 2 let 2 if a, b odd, c even, q2 = 0 if a, b even, c odd, or a, b, c all odd. Otherwise let q2 = 1. Let P (r, B) denote the probability that r ∈ Z+ is B-smooth. For random integers r, asymptotically (as r → ∞) P (r, B) is approximated by Dickman’s rho function ([4]) . In fact, for u = log r/ log B and u > 2, P (r, B) = ρ(u) + (1 − γ)
ρ(u − 1) + O((log r)−2 ) log r
where γ is Euler’s constant (see [8]). The second term in the approximation disappears as r → ∞, but contributes to the second significant figure in our range of interest. Recall that we consider f-values f(x) as likely to be smooth as random integers of size f(x) · eα(f) . Assume that f(x) > 0 and let uf (x) =
log f(x) + α(f) . log B
Then we assume that P (f(x), B) ≈ ρ(uf (x)) + (1 − γ)
ρ(uf (x) − 1) . log f(x)
Suppose I is some sieve interval. Then X X ρ(uf (x) − 1) P (f(x), B) ≈ ρ(uf (x)) + (1 − γ) log f(x) x∈I
x∈I
(3)
Number Field Sieve Polynomials
145
We use the right hand side of (3) to approximate the full yield of f across I. For our experiments |I| = 108 , so (3) is too time consuming to compute completely. Instead we approximate the summation by breaking I into s subintervals over which the right hand side of (3) does not change significantly. In fact we use s = 105 sub-intervals, each of length 103 . Let Is be the interval I so divided, so Is contains every thousandth element of I. Hence, if Xf denotes the full yield of f across I, then |I| X ρ(uf (x) − 1) · . (4) ρ(uf (x)) + (1 − γ) Xf ≈ s log f(x) x∈Is
4.2
Experimental Results
We tested estimate (4) for Xf on seven polynomials with α-values sufficiently low to be acceptable number field sieve polynomials. In particular, we used Polynomial A, and six other polynomials F,G,. . .,K. Polynomials F,. . .,K are polynomials used to factorise 105, 106 and 107 digit integers in [5]. We calculated estimate (4) in an interval of size 108 across one real root of each polynomial, and sieved the polynomial across the same root. Yields across the two roots of each polynomial are almost identical so the choice of root is arbitrary. We used B = 1.6 · 106 for polynomials F and G in accordance with [5], otherwise B = 2.7 · 106 . The complete sieving results appear in Appendix 3. Below we reproduce only the results for full relations. Poly- Est. full Full Relative nomial yield yield error (%) K J A H I F G
30462 30461 30193 27621 25583 25209 17096
30732 26100 29005 28248 24646 22186 15989
-0.9 16.7 4.1 -2.1 3.2 13.6 6.9
Table 3. Estimated vs actual full yield
The estimate places only one polynomial, J, in the incorrect position, and has an average relative error of 5.9% (most of which is contributed by polynomials J and F). Remark 3. Clearly there is a weakness in the model that causes significant overestimation for some polynomials. We believe the weakness lies in assessing the average contribution to f-values of the primes p for which p|cd , where cd is the leading coefficient of f. We leave Remarks 1,2 and 3 as subjects of further study.
146
Brian Murphy
Table 3 contains yield only at the roots of the polynomials. Whilst we expect yield at roots to reflect the “peak” yield of a given polynomial, we saw at Remark 2 that it is also of interest to note how estimate (4) changes across an entire sieve interval. In figure 3 below we show estimate (4) across the entire |x| ≤ 1015 interval for Polynomial A (excluding yield at roots). We also show estimate (4) at α = 0, that is, the expected yield if values taken by Polynomial A are as likely to be smooth as random integers of the same size. This is much lower than the actual yield.
2000 actual alpha = 0 alpha = −2.5
1800
Number of full relations
1600
1400
1200
1000
800
600
400 −1
−0.8
−0.6
−0.4
−0.2
0 x
0.2
0.4
0.6
0.8
1 15
x 10
Fig. 3. Estimated and actual yield of Polynomial A with |x| ≤ 1015 We conclude that the approach in (4) to estimating the yield is useful. This approach can be extended to more general NFS polynomials.
Acknowledgements We are very grateful to the computational number theory group headed by Herman te Riele at Centrum voor Wiskunde en Informatica (CWI) in Amsterdam for sharing with us their implementation of the general number field sieve. The implementation was developed by Peter Montgomery and partially by Arjen Lenstra and Oregon State University. The author thanks Richard P Brent and Arjen Lenstra for valuable discussions.
Number Field Sieve Polynomials
147
References 1. E Bach and R Peralta, “Asymptotic Semismoothness Probabilities” Math. Comp. 65 (1996), pp 1717–1735. 2. H Boender, “Factoring Integers with the Quadratic Sieve”, PhD Thesis, University of Leiden, 1997. 3. J P Buhler, H W Lenstra Jr, C Pomerance, “Factoring Integers with the Number Field Sieve”, The Development of the Number Field Sieve, LNM 1554 (1993) pp 50–94. 4. K Dickman, “On the Frequency of Numbers Containing Prime Factors of a Certain Relative Magnitude”, Ark. Mat., Astronomi och Fysik 22A 10 (1930), pp 1–14. 5. M Elkenbracht-Huizing, “An Implementation of the Number Field Sieve”, Experimental Mathematics 5(3) (1996) pp 375–389. 6. M Elkenbracht-Huizing, “A Multiple Polynomial General Number Field Sieve”, Algorithmic Number Theory, LNCS 1122 (1996) pp 99 – 114. 7. R A Golliver, A K Lenstra and K S McCurley, “Lattice Sieving and Trial Division”, Algorithmic Number Theory, LNCS 877 (1994) pp 18–27. 8. D E Knuth and L T Pardo, “Analysis of a Simple Factorization Algorithm”, Theor. Comp. Sci. 3 (1976) pp 321-348. 9. R Lambert, “Computational Aspects of Discrete Logarithms”, PhD Thesis, Univeristy of Waterloo, 1996. 10. B Murphy and R P Brent, “On Quadratic Polynomials for the Number Field Sieve”, Computing Theory 98, ACSC 20(3) (1998) , Springer, pp 199-215.
148
Brian Murphy
Appendix 1. Polynomials Polynomials A, . . .,K are listed below. The values of m given are m ∈ Z for which f(m) ≡ 0 mod N . The values of N are C106 for polynomials A,. . .,E and polynomials H and I ; C105 for polynomials F and G, and C107 for polynomials J and K. Values of C105, C106 and C107 can be found in [5]. Polynomial A: 10642297120196616201018579748198464994687 +157168918105124331525011637x − 323379595900x2 m = 311811767144256795964392770799295468577727849287441 417195888224875673003757757525998997704760967662422630 Polynomial B: −58535465962950604788770735849031669686845 +578123152107916050639034324x + 660940091871x2 m = 111266350151832591590373321222840072472133768682060 5812518391957850167078163045569883641392384840611818322 Polynomial C: −80444723076532128931843884067440931877697 +671898769354767184209613115x + 876541800001x2 m = 644385945238412299450097726772298730429521837407426 656132710287589175267555416671359532826085727240133210 Polynomial D: −45601329349014245961324468559468003125143 +405863886956809889611012220x + 875883403741x2 m = 57022157889652460507276414622928637851608638531004 7513013419381527088912105584724979693796690373689178237 Polynomial E: −43070512279968963999727149653384015128406 −140644997594088206014438353x + 274174364727x2 m = 21431385359461632490985189041791385017574508889045 6629204834574379795020566498337694386071915713661516800
Number Field Sieve Polynomials
Polynomial F: 540759062604782971357139536186424874771 +86817069333519465483641612x + 342910527737x2 m = 22914359055586946906211501353855768192316423575426 6217765793563500275674926893987223245481401160544005942 Polynomial G: 129128767300065233631168229536267982420800 −913049273181768816962553218x + 1242060255079x2 m = 22914359055586946906211501353855768192316423575426 6217765793563500275674926893987223245481401160544005942 Polynomial H: −32430287560495976143910317159823376255144 −101643163734436736066960294x + 190030476113x2 m = 17900441287572625768481534121337659378990978888143 77815816769105476827696665209945565825606429787588581699 Polynomial I: 164086080001456034179238766543256687713827 −401968646051742270344280172x − 785083260639x2 m = 17900441287572625768481534121337659378990978888143 77815816769105476827696665209945565825606429787588581699 Polynomial J: −311653994359418670319775330136434513506986 +763119703166287854853198889x − 241799514805x2 m = 12637530599467776761853128412624277137347729851839 924048392287605249253270797264409813230653725405155484892 Polynomial K: −46786964108579179806101863478910720071558 −425704283028714253779269315x − 540161776283x2 m = 12637530599467776761853128412624277137347729851839 924048392287605249253270797264409813230653725405155484892
149
150
Brian Murphy
2. Root Property Results Polynomials were sieved in the interval x ∈ [y + 108 ] where y is the integer given in the second column of Table 4. Poly. x ∈ [y + 108 ] total yield full yield 1LP yield 2 LP yield A B
12676212/ /6831982
11745 8035
1321 853
4609 2993
5815 4189
A C
13467778/ /6029590
11610 6056
1354 648
4513 2270
5743 3138
A D
13641626/ /0145271
11498 5931
1244 565
4379 2169
5875 3197
A E
63904732/ /3527552
11954 5880
1294 557
4601 2216
6059 3107
Table 4. Sieving results for “same size” regions
3. Estimate (4) Sieve Results
Poly. F G H I J K
α
root
-2.72 -24678703/ /4140270 -2.42 191109377/ /765832 -1.34 -22468082/ /1282226 -2.42 267964342/ /241421 -2.91 482011561/ /612155 -2.28 -65608475/ /1023559
total yield full yield 1LP yield 2 LP yield 186397
22186
76072
88139
142176
15989
57086
69101
166630
28248
73871
64511
146480
24646
64431
57403
154877
26100
68366
60411
176082
30732
78145
67205
Table 5. Yield across roots for Polynomials F,. . .,K
A Montgomery-like Square Root for the Number Field Sieve Phong Nguyen Ecole Normale Sup´erieure Laboratoire d’Informatique 45, rue d’Ulm F - 75230 Paris Cedex 05 [email protected]
Abstract. The Number Field Sieve (NFS) is the asymptotically fastest factoring algorithm known. It had spectacular successes in factoring numbers of a special form. Then the method was adapted for general numbers, and recently applied to the RSA-130 number [6], setting a new world record in factorization. The NFS has undergone several modifications since its appearance. One of these modifications concerns the last stage: the computation of the square root of a huge algebraic number given as a product of hundreds of thousands of small ones. This problem was not satisfactorily solved until the appearance of an algorithm by Peter Montgomery. Unfortunately, Montgomery only published a preliminary version of his algorithm [15], while a description of his own implementation can be found in [7]. In this paper, we present a variant of the algorithm, compare it with the original algorithm, and discuss its complexity.
1
Introduction
The number field sieve [8] is the most powerful known factoring method. It was first introduced in 1988 by John Pollard [17] to factor numbers of form x3 + k. Then it was modified to handle numbers of the form r e − s for small positive r and |s|: this was successfully applied to the Fermat number F9 = 2512 + 1 (see [11]). This version of the algorithm is now called the special number field sieve (SNFS) [10], in contrast with the general number field sieve (GNFS) [3] which GNFS factors integers n in heuristic time can handle arbitrary integers. 1/3 2/3 exp (cg + o(1)) ln n ln ln n with cg = (64/9)1/3 ≈ 1.9. Let n be the composite integer we wish to factor. We assume that n is not a prime power. Let Zn denote the ring Z/nZ. Like many factoring algorithms, the number field sieve attempts to find pairs (x, y) ∈ Z2n such that x2 ≡ y2 (mod n). For such a pair, gcd(x − y, n) is a nontrivial factor of n with a probability of Pd at least 12 . The NFS first selects a primitive polynomial f(X) = j=0 cj X j ∈ Z[X] irreducible over Z, and an integer m with f(m) ≡ 0 (mod n). Denote by F (X, Y ) = Y d f(X/Y ) in Z[X, Y ] the homogenous form of f. Let α ∈ C be a J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 151–168, 1998. c Springer-Verlag Berlin Heidelberg 1998
152
Phong Nguyen
root of f, and K = Q(α) be the corresponding number field. There is a natural ring homomorphism φ from Z[α] to Zn induced by φ(α) ≡ m (mod n). We will do as if φ mapped the whole K. If ever φ(β) is not defined for some β ∈ K, then we have found an integer not invertible in Zn , and thus, a factor N of n which should not be trivial. If n0 = n/N is prime, the factorization is over, and if not, we replace n by n0 , and φ by φ0 induced by φ0 (α) ≡ m (mod n0 ). By means of sieving, the integer pairs (ai , bi ) and a finite Q NFS finds several Q nonempty set S such that i∈S (ai − bi α) and i∈S (ai − bi m) are squares in K Q Q and in Z, respectively. We have φ i∈S (ai − bi α) ≡ i∈S (ai − bi m) (mod n), therefore 2 2 sY sY φ (ai − bi α) ≡ (ai − bi m) (mod n) i∈S
i∈S
after extracting the square roots, which gives rise to a suitable pair (x, y). The NFS does Q not specify how to evaluate these square roots. The square root of the prime factorizainteger i∈S (ai − bi m) mod n can be found using the known Q − bi α) is much tions of each ai − bi m. But extracting the square root of i∈S (aiQ more complicated and is the subject of this paper. We note γ = i∈S (ai − bi α). The following facts should be stressed: – the cardinality |S| is large, roughly equal to the square root of the run time of the number field sieve. It is over 106 for n larger than 100 digits. – the integers ai , bi are coprime, and fit in a computer word. – the prime factorization of each F (ai , bi ) is known. – for every prime number p dividing cd or some F (ai , bi ), we know the set R(p) consisting of roots of f modulo p, together with ∞ if p divides cd . The remainder of the paper is organized as follows. In Section 2, we review former methods to solve the square root problem, one of these is used in the last stage of the algorithm. Section 3 presents a few definitions and results. In Section 4, we describe the square root algorithm, which is a variant of Montgomery’s original algorithm, and point out their differences and similarities. We discuss its complexity in Section 5. Finally, we make some remarks about the implementation in Section 6, and the appendix includes the missing proofs.
2
Former Methods
UFD Method. If α is an algebraic integer and the ring Z[α] is a unique factorization domain (UFD), then each ai − bi α can be factored into primes and units, and so can be γ, which allows us to extract a square root of γ. Unfortunately, the ring Z[α] is not necessarily a UFD for the arbitrary number fields GNFS encounters. And even though Z[α] is a UFD, computing a system of fundamental units is not an obvious task (see [4]). The method was nevertheless applied with success to the factorization of F9 [11].
Square Root for the Number Field Sieve
153
Brute-Force Method. One factorizes the polynomial P (X) = X 2 − γ over K[X]. To do so, one has to explicitly write the algebraic number γ, for instance by expanding the product: one thus gets the (rational) coefficients of γ as a polynomial of degree at most d − 1 in α. But there are two serious obstructions: the coefficients that one keeps track of during the development of the product have O(|S|) digits. Hence, the single computation of the coefficients of γ can dominate the cost of the whole NFS. And even if we are able to compute γ, it remains to factorize P (X). One can overcome the first obstruction by working with integers instead of ˆ ˆ be the algebraic rationals: let f(X) be the monic polynomial F (X, cd), and α 2d|S|/2e ˆ0 0 ˆ α)2 γ f (ˆ integer cd α which is a root of f . If γ is a square in K then γ = cd 0 is a square in Z[α ˆ], where fˆ denotes the formal derivative of fˆ. It has integral coefficients as a polynomial of degree at most d−1 in α ˆ, and these can be obtained with the Chinese Remainder Theorem, using several inert primes (that is, f is irreducible modulo this prime) if there exist inert primes (which is generally true). This avoids computations with very large numbers. However, one still has to factorize the polynomial Q(X) = X 2 − γ 0 , whose coefficients remain huge, so the second obstruction holds. Furthermore, a large number of primes is required for the Chinese Remainder Theorem, due to the size of the coefficients. Couveignes’s Method. This method overcomes the second obstruction. If f has odd degree d, Couveignes [5] remarks that one is able to distinguish the two √ square roots of any square in K, by specifying its norm. Let γ 0 be the square root with positive norm. Since the prime factorization of N (γ 0 ) is known, the √ 0 integer √ N ( γ ) can be efficiently computed modulo any prime q. If q is inert (mod q). From the Chinese then γ 0 (mod q) can be computed after expanding γ 0 √ ˆ]. One can show Remainder Theorem, one recovers the coefficients of γ 0 ∈ Z[α that the complexity of the algorithm is at best O(M (|S|) ln |S|), where M (|S|) is the time required to multiply two |S|-bit integers. The algorithm appears to be impractical for the sets S now in use, and it requires an odd degree. Montgomery’s strategy [15,14,7] can be viewed as a mix of UFD and bruteforce methods. It bears some resemblance to the square root algorithm sketched in [3] (pages 75-76). It works for all values of d, and does not make any particular assumption (apart from the existence of inert primes) about the number field.
3
Algebraic Preliminaries
Our number field is K = Q(α) = Q(ˆ α), where α is an algebraic number and α ˆ = cd α is an algebraic integer. Let O be its ring of integers, and I be the abelian group of fractional ideals of O. For x1 , . . . , xm ∈ K, we note < x1 , . . . , xm > the element of I generated by x1 , . . . , xm . For every prime ideal p, we denote the numerator and by vp the p-adic valuation that maps I to Z. We define Q denominator of I ∈ I to be the integral ideals numer(I) = vp (I)>0 pvp(I) and Q denom(I) = vp (I)<0 p−vp(I) . We denote the norm of an ideal I by N (I), and
154
Phong Nguyen
Q the norm of an algebraic number x ∈ K by NK (x) = 1≤i≤d σi (x), σi denoting the d distinct embeddings of K in C. We define the complexity of I ∈ I to be C(I) = N (numer(I))N (denom(I)), and we say that I is simpler than J when C(I) ≤ C(J). We say that a fractional ideal I is a square if√there exists J ∈ I such that J 2 = I. Such a J is unique and will be denoted I. If pv11 . . . pvmm is the prime ideal factorization of I then: I is a square if and only if every vi is √ v /2 v /2 even; if I is a square, then I = p11 . . . pmm ; if x is a square in K, then so is < x > in I. We follow the notations of [3] and recall some results. Let R be an order in O. By a “prime of R” we mean a non-zero prime ideal of R. We denote by {lp,R : K∗ → Z}p the unique collection (where p ranges over the set of all primes of R) of group homomorphisms such that: – – –
lp,R (x) ≥ 0 for all x ∈ R, x 6= 0; if x is a non-zero element of R, then lp,R (x) > 0 if and only if x ∈ p; ∗ for Y each x ∈ K one has lp,R (x) = 0 for all but finitely many p, and N (p)lp,R (x) = |NK (x)|, where p ranges over the set of all primes of R. p
lp,O (x) coincide with vp ( < x > ). Let βi = cd αd−1−i + cd−1 αd−2−i + · · · + ci+1 . Pd−2 We know that A = Z + i=0 βi Z is an order of O, which is in fact Z[α] ∩ Z[α−1]. Its discriminant ∆(A) is equal to ∆(f) and we have: (d−1)(d−2)
∆(Z[α ˆ]) = cd
∆(A),
(d−1)(d−2) 2
[O : Z[α ˆ]] = cd
[O : A].
Recall that for any prime number p, R(p) is defined as the set consisting of roots of f modulo p, together with ∞ if p divides cd . Note that this R(p) is denoted R0 (p) in [3]. The pairs consisting of a prime number p and an element r ∈ R(p) are in bijective correspondence with the first degree primes p of A: – if r 6= ∞ then p is the intersection of A and the kernel of the ring homomorphism ψp,r : Z[α] → Fp that sends α to r. – if r = ∞ then p is the intersection of A and the kernel of the ring homomorphism ψp,∞ : Z[α−1 ] → Fp that sends α−1 to 0. Let p be a prime number, r an element of R(p) and a, b be coprime integers. If a ≡ br (mod p) and r 6= ∞, or if b ≡ 0 (mod p) and r = ∞, we define ep,r (a, b) = valuation. Otherwise, we set vp (F (a, b)) where vp denotes the ordinary p-adic Q ep,r (a, b) = 0. We have NK (a − bα) = ± c1d p,r pep,r (a,b), the product ranging over all pairs p, r with p prime and r ∈ R(p). Furthermore, for any coprime integers a, b and any first degree prime p of A corresponding to a pair p, r ∈ R(p), we have: if r 6= ∞ ep,r (a, b) lp,A (a − bα) = ep,r (a, b) − vp (cd ) if r = ∞ Theorem 1. Let a and b be coprime integers, and p be a prime number. Let p be a prime ideal of O above p such that vp ( < a − bα > ) 6= 0. If p does not divide [O : A] then:
Square Root for the Number Field Sieve
155
1. For every r ∈ R(p), there is a unique prime ideal pr of O that lies over the first degree prime ideal qr of A corresponding to the pair p, r. pr is a first degree prime ideal, given by pr = < p, β0 −ψp,r (β0 ), . . . , βd−2 −ψp,r (βd−2 ) > . Furthermore, we have vpr ( < a − bα > ) = lqr ,A (a − bα). 2. There is at most one finite r ∈ R(p) such that ep,r (a, b) 6= 0. 3. If p does not divide cd , such a finite r exists and p = pr . 4. If p divides cd , then either p is p∞ , or pr for r finite. 5. p divides F (a, b) or cd . Proof. Let r ∈ R(p) and qr be the first degree prime ideal of A corresponding to the pair p,r. Since P p does not divide [O : A], we have from [3] (Proposition 7.3, pages 65-66): pr |qr f(pr /qr ) = 1, where pr ranges over all primes of O lying over qr and f denotes the residual degree. This proves that pr is unique and is a first degree prime ideal. From [3] (Proposition 7.2, page 65), we also have: X f(p0 /qr )lp0 ,O (a − bα) = lpr ,O (a − bα). lqr ,A (a − bα) = p0 |qr
Hence, vpr (a − bα) = lqr ,A (a − bα). Moreover, we know a Z-basis for any ideal qr of A, namely (p, β0 − ψp,r (β0 ), . . . , βd−2 − ψp,r (βd−2 )). Since pr lies over qr , this Z-basis is a system of O-generators for pr . We therefore proved 1. From the definition of βi , one sees that βi = ci α−1 +ci−1 α−2 +· · · +c0 α−i−1 , which proves that ψp,∞ (βi ) = 0. This simplifies the formula when r = ∞. One obtains 2 from the definition of ep,r . Denote by q the intersection of p and A. q is a prime of A and p lies over q. We have lq,A (a − bα) 6= 0 since vp (a − bα) 6= 0. From [3] (page 89), this proves that q is a first degree prime ideal of A. Hence, there exists r ∈ R(p) such that q = qr . From 1, this proves that p = pr . This r is finite or infinite, and if r is finite, it is the r of 2. This proves 3 and 4. From the formula t u expressing lq,A (a − bα) in terms of ep,r (a, b), we obtain 5.
4
The Square Root Algorithm
We recall that we want to compute a square root of the algebraic number γ = Q i∈S (ai − bi α). The algorithm is split as follows: 1. Transform γ in order to make < γ > simpler. The running time of the rest of the algorithm heuristically depends on C( < γ > ). √ 2. Compute < γ > from the prime ideal factorization of < γ > given by the prime factorization of each F (ai , bi ). √ √ < γ > : using lattice reductions, construct a se3. Approximate γ from quence of algebraic integers δ1 , . . . , δL in O and signs s1 , . . . , sL in {±1} QL such that θ = γ `=1 δ`−2s` is a “small” algebraic integer. θ can be thought Q √ s` γ. as the square of the “guessing-error” in the approximation L `=1 δ` of √ 4. Since γ is a square, so is θ. Compute θ using brute-force method. One is able to explicitly write θ because θ is a “small” algebraic integer.
156
Phong Nguyen
We thus obtain
√
γ as a product of algebraic integers with exponents ±1: √ Y s √ γ= θ δ` ` . L
`=1
√ √ This enables to compute φ( γ) without explicitly calculating γ, and hopefully some factors of n. Although formalized differently, Montgomery’s algorithm uses the same strategy. Only the steps change. We use another heuristic approach in Step 1, which seems to be more effective in practice. We use a new process in Step 2, derived from Section 3. Montgomery used a process which was as efficient, but only heuristic. Step 3 is the core of the algorithm. We modified this step by using the integral basis in a systematic manner, instead of the power basis. This simplifies the algorithm and the proofs. Heuristically, this should also improve the performances. We postpone the computation of the error in Step 4, while Montgomery included it in Step 3, by updating the computations during the approximation. This decreases the running-time because it is easier to estimate the necessary computations when Step 3 is over, and sometimes, Step 4 can be avoided (when the approximation is already perfect, which can be checked without additional computations). The new algorithm might be more suited to analysis, but like Montgomery’s algorithm, its complexity has yet to be determined, even though they both display significantly better performances than former methods. 4.1
Computing in the Number Field
The Ring of Integers. During the whole algorithm, we need to work with ideals and algebraic integers. We first have to compute an integral basis of O. In general, this is a hopeless task (see [13,2] for a survey), but for the number fields NFS encounters (small degree and large discriminant), this can be done by the so-called round algorithms [16,4]. Given an order R and several primes pi , any round algorithm will enlarge this order for all these primes so that the b is pi -maximal for every pi . If we take for the pi all the primes new order R b = O. To determine all these primes, a p such that p2 divides ∆(R), then R partial factorization of ∆(R) suffices, that is a factorization of the form df 2 where d is squarefree and f is factorized. Theoretically, a partial factorization is as hard to find as a complete factorization and unfortunately, the discriminant is sometimes much larger than the number n we wish to factor. However, if one takes a “random” large number, and one removes all “small” prime factors from it (by trial division or by elliptic curves [12]), then in practice the result is quite b = likely to be squarefree. Furthermore, even in the case R 6 O, it will be true b that R has almost all of the good properties of O for all ideals that we are likely to encounter in practice, like the fact that every ideal is a product of prime ideals. This is because every order satisfies these properties for all ideals that are coprime to the index of the order in O. Hence, we can now assume that an integral basis (ω1 , . . . , ωd ) of O has been computed.
Square Root for the Number Field Sieve
157
Algebraic Numbers and Ideals. From this integral basis we can represent any algebraic number of K as a vector of Qd : this Pisd the integral representation. If x ∈ K we define x = [x1, . . . , xd ]t where x = i=1 xi ωi and xi ∈ Q. We can also represent any algebraic number as a polynomial of degree at most d − 1 in α: this is the power representation. When dealing with algebraic integers, the integral representation is preferable. We will represent any integral ideal I by an integral matrix (with respect to (ω1 , . . . , ωd )) from a Z-basis or a system of O-generators. In the case of Z-basis, we use the Hermite normal form (HNF) of the square matrix for efficiency reasons. We refer to [4] for algorithms concerning algebraic numbers and ideals. 4.2
Simplifying the Principal Ideal Q ei If γ is √ a square in K, then so is any γ 0 = i∈S (ai − bi α) √ 0 , when ei = ±1. Since Q √ √ 0 γ = γ ei =−1 (ai − bi α), we can recover γ from γ but actually, we only look for a square identity. Fortunately: 2 2 sY sY φ( (ai − bi α)ei ) ≡ (ai − bi m)ei (mod n) i∈S
i∈S
√ √ This replaces the computation of γ by the computation of γ 0 . By cleverly selecting the ei , C( < γ 0 > ) will be much smaller than C( < γ > ): this is because many < ai − bi α > share the same prime ideals, since many NK (ai − bi α) share the same primes (as a consequence of sieving). We now address the optimization problem of selecting the ei so that C( < γ 0 > ) is small. Given a distribution of ei , the complexity of < γ 0 > can be computed by the following formula (which comes from the known “factorization” of each ai − bi α into primes of A): Y Y p| i∈S ei ep,r (ai ,bi)| × p| i∈S ei [ep,∞ (ai ,bi)−vp (cd )]| .
P
p,r6=∞
P
p|cd
The simplest method is a random strategy which selects randomly ei = ±1. Another method is a greedy strategy (used in [7]): at every step, select ei = ±1 according to the best complexity (whether we put ai − bi α in the numerator or in the denominator). This behaves better than the random strategy. But the best method so far in practice is based on simulated annealing [18], a well-known probabilistic solution method in the field of combinatorial optimization. Here, the configuration space is E = {−1, +1}|S|, and the energy function U maps any e = (e1 , . . . , e|S|) ∈ E to ln C( < γ > ) where γ corresponds to e. For any e ∈ E, we define its neighbourhood V(e) = {(e1 , . . . , ei−1 , −ei , ei+1 , . . . , e|S|) | i = 1, . . . , |S|}. We try to minimize U by the following algorithm, which performances depend on three parameters Θi , Θf (initial and final temperatures) and τ : – select randomly e ∈ E and set Θ ←− Θi . – choose randomly f ∈ V(e) and set ∆ ←− U (f ) − U (e). If ∆ > 0, set p ←− exp(−∆/Θ), otherwise set p ←− 1. Then set e ←− f with probability p, and Θ ←− Θ × τ .
158
Phong Nguyen
– repeat previous step if Θ > Θf . Although this method behaves better in practice than previous methods, theoretical estimates can hardly be given. 4.3
Ideal Square Root
Q ei From now on, we forget about the initial γ and set γ = i∈S (ai − bi α) . √ We wish to obtain γ as a product of ideals with exponents lying in Z (this ideal is too large to be represented as a single matrix). This Q can be done by factoring into prime ideals the fractional ideal < γ > = < i∈S (ai − bi α)ei > . We simplify the problem to the factorization of any linear expression < ai − bi α > with coprime ai , bi . Such a factorization could be obtained by general ideal factorization algorithms (see [4]) but this would be too slow if we had to use these algorithms |S| times. Fortunately, we can do much of the work by ourself using the known factorization of each F (ai , bi ) = f(ai /bi )bdi , as shown in the previous section. We say that a prime number p is exceptional if p divides the index κ = [O : A]. Otherwise, we say that p is normal. Naturally, a prime ideal of O is said to be exceptional (resp. normal) if it lies above an exceptional (resp. normal) prime. If m is the number of prime factors of κ, there are at most md exceptional prime ideals. We compute all the exceptional prime ideals (for example, by decomposing all the exceptional primes in O using the BuchmannLenstra algorithm described in [4]), along with some constants allowing us to compute efficiently any valuation at these primes. From Theorem 1, we get the prime ideal factorization of < a − bα > as follows: for every prime number p dividing cd or such that there exists a finite r ∈ R(p) satisfying ep,r (a, b) 6= 0, – if p is exceptional, compute the valuation of < a−bα > at all the exceptional ideals lying above p. – otherwise, p is normal. If there is a finite r ∈ R(p) such that ep,r (a, b) 6= 0 (r is then unique), pick the prime ideal pr with exponent ep,r (a, b) where pr = < p, β0 − ψp,r (β0 ), . . . , βd−2 − ψp,r (βd−2 ) > . If ∞ ∈ R(p), also pick the prime ideal p∞ with exponent ep,∞ (a, b) − vp (cd ) where p∞ = < p, β0 , . . . , βd−2 > . We thus decompose < γ > as a product of ideals where every exponent is √ necessarily even, which gives < γ > . Montgomery used a different ideal factorization process (see [7,14]) by introducing a special ideal, but its correctness is not proved. 4.4
Square Root Approximation √ √ √ We now use the ideal square root < γ > to approximate γ. Since < γ > is a huge ideal, we will get an approximation through an iterative process, by selecting a small part of the ideal at each step: this small part will be alternatively
Square Root for the Number Field Sieve
159
taken in the numerator and denominator. To lift an integral ideal to an algebraic integer, we use lattice reduction techniques. We associate several variables at each step `: – an algebraic number γ` . It can be considered as the square of the error in √ the current approximation of γ. – a sign s` in {−1, +1}, indicating whether we take something in the denominator or in the numerator of the huge original ideal.√ – a fractional ideal G`, which is an approximation to < γ` > . √ – an integral ideal H` of bounded norm. It differentiates G` from < γ` > . – an algebraic integer δ` . – an integral ideal I` of bounded norm. Q √ We initialize these variables by: γ1 = γ = i∈S (ai − bi α)ei , G1 = < γ > , H1 = < 1 > , s1 = 1 if NK (γ) ≥ 1 and −1 otherwise. Each step of the approximation makes γ`+1 in some sense smaller than γ` , and G`+1 simpler than G` . After enough steps, G` is reduced to the unit ideal < 1 > , and γ` becomes an algebraic integer sufficiently small that its integral representation can be determined explicitly (using Chinese Remainders) and a square root constructed using brute-force method. At the start of step `, we need to know the following: – approximations to the |σj (γ` )| for 1 ≤ j ≤ d, giving an approximation to |NK (γ` )|. – prime ideal factorization of G` . – Hermite normal form of H`. – value of s` . For ` = 1, these information are obtained from the initial values of the variables. Each step ` consists of: 1. Select an integral ideal I` of almost fixed norm, by multiplying H` with another integral ideal dividing the numerator (resp. the denominator) of G` if s` = 1 (resp. s` = −1). Compute its Hermite normal form. 2. Pick some “nice” δ` in I` using lattice reductions. 3. Define: −s` I` < δ` > −2s` , G`+1 = G` , H`+1 = , s`+1 = −s` . γ`+1 = γ` δ` H` I` This allows to easily update necessary information: – compute the |σj (δ` )|’s to approximate the |σj (γ`+1 )|’s. – the selection of I` is actually made in order to obtain the prime ideal factorization of G`+1 simply by updating the exponents of the prime ideal factorization of G` . – H`+1 and s`+1 are directly computed. 4. Store s` and the integral representation of δ` . We now explain the meaning of the different variables, then we detail the first hQ i2 Q`−1 sL `−1 sL δ . In other words, L=1 δL is two parts. By induction on `, γ = γ` L=1 L √ s` √ < γ` > . the approximation of γ at step `. Each γ` is a square and G` = H` Notice that C(G`+1 ) = N (I`1/H` ) C(G` ).
160
Phong Nguyen
Ideal Selection. We try to select an I` with norm as close as possible to a constant LLLmax , set at the beginning of the iterative process, to be explained later on. To do so, we adopt a greedy strategy. Since we know the prime ideal factorization of G`, we can sort all the prime ideals (according to their norm) appearing in this factorization. We start with I` = H`, and we keep multiplying I` by the possibly largest prime ideal power in such manner that N (I` ) is less than LLLmax . In practice, this strategy behaves well because most of our prime ideals lie over small primes. At the same time, when we pick a prime ideal power to multiply with I` , we update its exponent in the prime ideal factorization of G` so that we obtain the prime ideal factorization of G`+1 . At the end of the approximation, when C(G` ) is small, we find an I` of small norm (not close to I` equals the whole numerator or the whole denominator LLLmax ) such that H ` of G`. Integer Selection. We look for a nice element δ` in the integral ideal I` , that is to say, an algebraic integer that looks like the ideal. For us, “looking like” will mainly mean “with norm almost alike”. This really means something since the norm of any element is a multiple of the norm of the integral ideal. So we select δ` in order to make N ( < δ` > /I` ) as small as possible, which is the same as finding a short element in a given ideal. Fortunately an ideal is also a lattice, and there exists a famous polynomial-time algorithm for lattice reduction: LLL [9,4]. We will use two features of the LLL-algorithm: computation of an LLL-reduced basis, and computation of a short vector (with respect to the Euclidean norm, not to the norm in a number field). First, we reduce the basis of I` given by its HNF. In other words, we reduce the matrix of the integral representations (with respect to (ω1 , . . . , ωd )) of the elements of the basis. We do so because the HNF matrix is triangular, therefore not well-balanced: by applying an LLL reduction, coefficients are smaller and better spread. Assume the obtained reduced basis is (v(j) )dj=1 . We specify a constant c > 0 by s LLL |NK (γ` )|s` max . cd = N (I` ) |∆(K)| Let λj = |σ (γ c)|s` /2 for 1 ≤ j ≤ d. We define a linear transformation Ω that maps Pdj ` t any v = i=1 vi ωi ∈ I` to Ωv = [v1 , . . . , vd , λ1 σ1 (v), . . . , λd σd (v)] . This is when K is totally real. If f has complex roots: for any complex conjugate pairs σi and √ σ i , we replace√σi (v) and σ i (v) in the definition of Ω by respectively, <(σi (v)) 2 and =(σi (v)) 2. In Montgomery’s implementation, the Z-basis (v(j) )dj=1 is expressed with respect to the power basis instead of the integral basis, which does not seem to be more attractive. From (v(j) )dj=1 , we form a 2d × d real matrix with the corresponding (Ωv(j) )dj=1 . Proposition 2. This matrix satisfies: 1. The determinant of the image of the first d coordinates is in absolute value equal to N (I` ).
Square Root for the Number Field Sieve
161
2. The determinant of the image of the last d coordinates is in absolute value equal to LLLmax . Proof. The image of the first d coordinates is the matrix representation of a Zbasis of I` with respect to a Z-basis of O. Hence, its determinant is in absolute value equal to [O : I` ], proving 1. For 2, we assume that K is totally real: otherwise, the determinant is unchanged by multilinearity. In absolute value, the determinant of the image of the last d coordinates of (Ωv(j) )dj=1 is equal to q |∆(v(1) , . . . , v(d) )| ×
cd , |NK (γ` )|s` /2
where ∆ denotes the discriminant of d elements of K. Since the v(j) form a Z-basis of I` , this discriminant is N (I` )2 × ∆(ω1 , . . . , ωd), where ∆(ω1 , . . . , ωd)p= ∆(K). The initial determinant is thus in absolute value cd |NK (γ` )|−s` /2 N (I` ) |∆(K)|, and we conclude from the definition of c. t u We apply a second LLL reduction to this matrix. In practice, we apply a LLL reduction to this matrix rounded to an integral matrix (notice that the upper d × d matrix has integral entries) as integer arithmetic is often preferable. We initialize LLLmax to the maximal value where the LLL-reduction algorithm supposedly performs well. The previous proposition ensures us that both LLL reductions perform well. We choose for δ` the algebraic integer defined by the first d coordinates of the first column of the matrix output by the second LLL reduction. We use the following result to prove that the approximation stage terminates. Theorem 3. There exists a computable constant C depending only on K such that the second LLL reduction outputs an algebraic integer δ` with |NK (δ` )| ≤ C × N (I` ), where C is independent of N (I` ), LLLmax and c. In particular, N (H` ) ≤ C. The proof, quite technical, is left in the appendix. End of the Approximation. We stop the iterative process when C(G` ) = √ 1. This necessarily happens if LLLmax C. Indeed, if numer( < γ > ) and √ denom( < γ > ) have close norms, then at every step `, N (I` /H`) is close to LLLmax /C, which gives C(G` ) ≈ (C/LLLmax )`−1 C(G1 ). So the number of steps √ to obtain C(G` ) = 1 is roughly logarithmic in C( < γ > ). More precisely, one can show that if LLLmax /C is greater than the largest prime appearing in C( < γ > ), √ then at most 2dlog2 C( < γ > )e steps are necessary to make C(G` ) equal to 1. Once C(G` ) = 1, we perform one more iteration if s` = +1, in which I`+1 is equal to H`. We can now assume that C(GL ) = 1 with sL = −1. This implies √ that < γL > = HL and therefore, γL is an algebraic integer of norm N (HL )2 bounded by C 2 . This does not prove that γL has a small integral representation: if the coefficients of γL are small, then we can bound NK (γL ), but the converse is false (for instance, γL might be a power of a unit).
162
Phong Nguyen
0 Proposition 4. There exists a computable Pd constant C depending only on K such that for every algebraic number θ = j=1 θj ωj ∈ K, each |θi | is bounded by
C
0
s X
|σi (θ)|2 .
1≤i≤d
Proof. Let Φ be the injective Q-linear transformation that maps any x ∈ K to t [σ1 (x), . . . , σd (x)] . Since Φ(K) and K both are Q-vector spaces of finite dimension, there exists kΦ−1 k ∈ R such that for all x ∈ K: kxk ≤ kΦ−1 k.kΦ(x)k, where we consider the “Euclidean” norms induced on K by the integral basis (ω1 , . . . , ωd), and on Φ(K) by the canonical basis of Cd . The matrix A = (σi (ωj ))1≤i,j≤d represents Φ. A can be computed, and so can be its inverse A−1 . t u This gives an upper bound to kΦ−1 k, which we note C 0 . With Lemma 5 (see the appendix), this proves that bounding the embeddings is the same as bounding the coefficients. But the linear transformation Ω is precisely chosen to reduce the embeddings: the last d coordinates reduce the sum of inverses of the embeddings of γ`+1 . This is not a proof, but it somehow explains why one obtains in practice a “small” algebraic integer. 4.5
Computing the Error
We wish to compute the last algebraic integer θ = γL of norm at most C 2 . We have a product formula for θ, of which we know every term. The partial products are too large to use directly this formula, but since we only deal with integers, we can use the Chinese Remainder Theorem if we choose good primes. A prime p is a good prime if it is inert (f is irreducible modulo p) and if p does not divide any of the NK (δ` )/N (I` ). For such a p, the integral representation of θ (mod p) can be computed. This computation is not expensive if p is not too large. In general, it is easy to find good primes. We first find inert primes. In some very particular cases, inert primes do not even exist, but in general, there are a lot of inert primes (see [3]). Then we select among these primes those who do not divide any of the NK (δ` )/N (I` ). Most of these primes will satisfy this assumption. If we selected several good primes p1 , . . . , pN , and if the coefficients of θ are all bounded by the product p1 . . . pN , then we obtain these coefficients from the coefficients of θ modulo each pi . In practice, a few good primes suffice. Then − θ over K[X] in a reasonable time. The initial square root we can factorize X 2√ QL √ √ follows since γ = θ `=1 δ`s` . Actually, we only need φ( γ), so we compute all the φ(δ` ) to avoid excessively large numbers. We thus obtain a square identity and hopefully, some factors of n.
5
Complexity Analysis
We discuss the complexity of each stage of the algorithm, with respect to the growth of |S|. We assume that f is independent of |S|, which implies that all
Square Root for the Number Field Sieve
163
ai , bi and F (ai , bi ) can be bounded independently of |S|. Recall that during the sieving, all ep,r (a, b) are computed. Simplification of < γ > : even if the simulated annealing method is used, one can easily show that this stage takes at most O(|S|) time. Ideal square root: The only expensive operations are the decomposition of exceptional primes and the computation of valuations at these primes. The decomposition of exceptional primes is done once for all, independently of |S|. Any valuation can be efficiently computed, and takes time independent of |S|. Since exceptional prime numbers appear at most O(|S|) times, this stage takes at most O(|S|) time. Square Root Approximation: We showed that the number of required steps was O(ln C( < γ > )). Since all the F (ai , bi ) are bounded, ln C( < γ > ) is O(|S|). Unfortunately, we cannot say much about the complexity of each step, although each step takes very little time in practice. This is because we cannot bound independently of |S| all the entries of the 2d × d matrix that is LLL reduced. Indeed, we can bound the entries of the upper d × d square matrix, but not the entries of the lower one, as we are unable to prove that the embeddings of the algebraic number γ` get better. However, since we perform LLL reductions on matrices with very small dimension, it is likely that these reductions take very little time, unless the entries are extremely large. This is why in practice the approximation takes at most O(|S|) time. Computing the Error: If we can bound the number and the size of necessary good primes independently of |S|, then this stage takes at most O(|S|) time. Unfortunately, we are unable to do this, because we cannot bound the embeddings of the last algebraic integer θ, as seen previously. In practice however, these embeddings are small. One sees that it is difficult to prove anything on the complexity of the algorithm. The same holds for Montgomery’s algorithm. In practice, the algorithm behaves as if it had linear time in |S| (which is not too surprising), but we are unable to prove it at the moment. We lack a proof mainly because we do not √ √ know any particular expression for γ. For instance, we do not know if γ can be expressed as a product with exponents ±1 of algebraic integers with bounded integral representation.
6
Implementation
We make some remarks about the implementation: √ 1. Since the number of ideals appearing in < γ > is huge, we use a hash-table and represent any normal prime ideal by its corresponding (p, r) pair. Exceptional prime ideals require more place, but there are very few exceptional primes. 2. It is only during the approximation process (namely, to obtain the Hermite normal form of I` ) that one needs to compute a system of O-generators for normal prime ideals. Such a computation is however very fast.
164
Phong Nguyen
3. To avoid overflows, we do not compute |σj (γ` )|, c and λj but their logarithms. Pd One checks that j=1 ln |σj (γ` )| = ln |NK (γ` )| if one is in doubt about the precision. 4. To choose the constant LLLmax , one can compute the C constant from the formulas given in the proof of Theorem 3, but one can also perform some LLL reductions to obtain the practical value of C. Notice that when one knows C and LLLmax , one can estimate the number of iterations. 5. To know how many good primes are sufficient to compute the last algebraic integer, one can compute the C 0 constant as shown in the proof of Proposition 4, which gives a bound for the coefficients of the integral representation. 6. The last algebraic integer is often a small root of unity. This is because the last ideal I` is principal, and we know an approximation to the embeddings of one of its generators. This generator has unusual short norm in the corresponding lattice, therefore it is no surprise that the LLL algorithm finds this generator, making H`+1 equal to < 1 > . In the latter case, the last algebraic integer is often equal to ±1: one should try to bypass the computation of the error and apply φ directly to find some factors of n. The algorithm has been implemented using version 1.39 of the PARI library [1] developed by Henri Cohen et al. In December, 1996, it completed the factorization of the 100-digit cofactor of 17186 + 1, using the quadratic polynomials 5633687910X 2−4024812630168572920172347X+482977515620225815833203056197828591062 and −77869128383X 2 − 2888634446047190834964717X + 346636133525639208946167278118238554489. Each dependency had about 1.5 million relations. It took the square root code about 10 hours to do both square roots on a 75Mhz Sparc 20.
7
Conclusion
We presented an algorithm suitable for implementation to solve the square root problem of the number field sieve. This algorithm is a variant of Montgomery’s square root. We modified the square root approximation process by using an integral basis instead of the power basis: this allows to work with integers instead of rationals, and to search the algebraic integer δ` in the whole ideal I` , not in some of its submodules. We introduced the simulated annealing method in the ideal simplification process. From results of [3], we proposed an efficient ideal square root process and proved its validity. We postponed the computation of the error to avoid useless computations. The present running time of the algorithm is negligible compared to other stages of the number field sieve. In practice, the algorithm behaves as if it had linear complexity, but one should note that this is only heuristic as few things are proved about the complexity. It is an open problem to determine precisely the complexity of the algorithm. Acknowledgements. I am particularly grateful to both Arjen and Hendrik Lenstra for many explanations about the number field sieve. I wish to thank Jean-Marc Couveignes and Peter Montgomery for enlightening discussions. I
Square Root for the Number Field Sieve
165
also thank Philippe Hoogvorst for his helpful comments, and for carrying out experiments.
A
Proof of Theorem 3
This theorem is related to the classical result of the geometry of numbers which states that for any integral ideal I, there exists an algebraic integer δ ∈ I such that |NK (δ)| ≤ M(K)N (I) where M(K) denotes the Minkowski constant of K. It relies on Minkowski’s convex body theorem which can be viewed as a generalization of the pigeon-hole principle. Following an idea of Montgomery [14], we use the pigeon-hole principle to estimate precisely each component of δ` . The only thing we need to know about LLL-reduced bases is that if (b1 , . . . , bd ) is an LLL-reduced basis of a lattice Λ, then det(Λ) ≤
d Y
kbi k ≤ 2d(d−1)/4 det(Λ)
i=1 (d−1)/2
kb1 k ≤ 2
kxk if x ∈ Λ, x 6= 0
(1) (2)
where det denotes the lattice determinant and k.k denotes the Euclidean norm. In the following, we will use the notation k.k even for vectors with different Pd numberq of coordinates. Here, if x = i=1 xi ωi is an algebraic number of K, then Pd 2 kxk = i=1 xi . We will use the notation (x)i to denote the i-th coordinate of x. From now on (all along the proof), we assume that K is totally real to simplify the definition of Ω, but a similar reasoning applies to other cases with a different choice of constants. Lemma 5. There exists a computable constant C1 depending only on K such that for every x ∈ K, and for any integer j = 1, . . . , d: |σj (x)| ≤ C1 kxk |(Ωx)d+j | ≤ λj C1 kxk
(3) (4)
Pd Pd Proof. We have x = i=1 xi ωi where xi ∈ Q. Therefore σj (x) = i=1 xi σj (ωi ). Using triangle inequality and Cauchy-Schwarz, we obtain:
|σj (x)| ≤
d X
v v u d u d uX uX t 2 |xi ||σj (ωi )| ≤ |xi | × t |σj (ωi )|2 ≤ kxkC1 ,
i=1
where C1 = max1≤j≤d definition of Ω.
i=1
q Pd i=1
i=1
|σj (ωi )|2 . This proves (3), which implies (4) by t u
166
Phong Nguyen
Lemma 6. There exists two computable constants C2 and C3 depending only on K such that for any integral ideal I` , there exists a real M and an algebraic integer z ∈ I` , z 6= 0 satisfying: M d ≤ C2
Y
λj
(5)
j∈J
kzk ≤ M N (I` )1/d
(6)
∀j ∈ J λj kzk ≤ M N (I` )
1/d
(7)
kΩzk ≤ C3 M N (I` )
(8)
1/d
where J = {j = 1, . . . , d / λj > 1}. Proof. Let C2 = 2d(d−1)/4 dd 2d+1 . Since 2d(d−1)/4 dd nition of J, there exists M > 0 such that 2
Y
dλj e < C2
j∈J Y d(d−1)/4 d
d
Y j∈J d
λj by defi-
dλj e < M ≤ C2
j∈J
Y
λj .
j∈J
This M satisfies (5). The number of n = (n1 , . . . , nd ) ∈ Nd such that each ni 1/d satisfies ni kv (i) k ≤ M is at least d N (I` ) d d Y Y Y Md M N (I` )1/d M N (I` )1/d e ≥ ≥ d by (1) > dλj e. dkv (i) k dkv (i)k dd 2d(d−1)/4 i=1 i=1 j∈J
(i)
k c is a positive integer less than λj . By the pigeonFor such an n, bλj MniNdkv (I` )1/d hole principle, there therefore exists two distinct n = (n1 , . . . , nd) and n0 = (n01 , . . . , n0d ) both in Nd such that for all i = 1, . . . , d:
M N (I` )1/d d M N (I` )1/d n0i kv (i) k ≤ d ni dkv(i) k n0i dkv (i) k c = bλ c ∀j ∈ J bλj j M N (I` )1/d M N (I` )1/d ni kv (i) k ≤
(9) (10) (11)
Pd Define z = i=1 (ni − n0i )v(i) . Then z ∈ I` , z 6= 0 and by (9) and (10), we have for all i = 1, . . . , d: M N (I` )1/d . |ni − n0i |.kv(i) k ≤ d This proves (6) by triangle inequality . Furthermore, for all j ∈ J and for all i = 1, . . . , d, the quantity λj |ni − n0i |.kv(i) k is equal to ni dkv (i)k n0i dkv(i) k M 1/d N (I` ) λj − λj , d M N (I` )1/d M N (I` )1/d
Square Root for the Number Field Sieve
which is, by (11), less than Finally: kΩzk = 2
d X
M 1/d . d N (I` )
|(Ωz)j | + 2
j=1
≤ kzk + 2
X
This proves (7) by triangle inequality.
d X
|(Ωz)d+j |2
j=1
X
λj C1 kzk2 +
j6∈J
≤ 1 + C1
167
λj C1 kzk2 by (4)
j∈J
X
1 + C1
j6∈J
X
h i2 1 M N (I` )1/d
j∈J
by (6), (7) and the definition of J. This proves (8) with C3 =
√
1 + dC1 .
t u
Now, if δ is the algebraic integer output by the second LLL reduction, (2) implies that kΩδk2 ≤ 2d−1 kΩzk2 . Since kδk ≤ kΩδk, (8) implies that kδk ≤ 2(d−1)/2 C3 M N (I` )1/d . Moreover, |NK (δ)| = one hand, by (3): Y
Qd j=1
|σj (δ)| =
d−|J|
|σj (δ)| ≤ (C1 kδk)
Q
Q On the other hand, j∈J |σj (δ)| = geometric mean inequality:
j∈J
Q |σj (δ)|) × j6∈J |σj (δ)| . On the
h id−|J| ≤ 2(d−1)/2 C1 C3 M N (I` )1/d .
j6∈J
Y
j∈J
Q
Q |(Ωδ)λ
j∈J
j∈J
d+j |
j
, where by the arithmetic-
|J| X |(Ωδ)d+j |2 ≤ |(Ωδ)d+j |2 ≤ (kΩδk2 )|J| ≤ (2d−1 kΩzk2 )|J| j∈J
h i|J| ≤ 2(d−1)/2 C3 M N (I` )1/d by (8).
We collect these two inequalities: d−|J|
C |NK (δ)| ≤ Q 1
j∈J
≤
λj
h id−|J|+|J| 2(d−1)/2 C3 M N (I` )1/d
max(1, C1d ) d(d−1)/2 d d Q 2 C3 M N (I` ) j∈J λj
≤ max(1, C1d )2d(d−1)/2 C3d C2 N (I` ) by (5). This completes the proof with C = 2d(d−1)/2 max(1, C1d)C2 C3d .
168
Phong Nguyen
References 1. Batut, C., Bernardi, D., Cohen, H., and Olivier, M. Pari-gp computer package. Can be obtained by ftp at megrez.math.u-bordeaux.fr. 2. Buchmann, J. A., and Lenstra, Jr., H. W. Approximating rings of integers in number fields. J. Th´eor. Nombres Bordeaux 6, 2 (1994), 221–260. 3. Buhler, J. P., Lenstra, H. W., and Pomerance, C. Factoring integers with the number field sieve. pages 50-94 in [8]. 4. Cohen, H. A course in computational algebraic number theory. Springer, 1993. 5. Couveignes, J.-M. Computing a square root for the number field sieve. pages 95-102 in [8]. 6. Cowie, J., Dodson, B., Elkenbracht-Huizing, R. M., Lenstra, A. K., Montgomery, P. L., and Zayer, J. A world wide number field sieve factoring record: On to 512 bits. In Proceedings of ASIACRYPT’96 (1996), vol. 1163 of Lecture Notes in Computer Science, Springer-Verlag, pp. 382–394. 7. Elkenbracht-Huizing, M. An implementation of the number field sieve. Experimental Mathematics 5, 3 (1996), 231–253. 8. Lenstra, A. K., and Lenstra, Jr., H. W. The development of the Number Field Sieve, vol. 1554 of Lecture Notes in Mathematics. Springer-Verlag, 1993. ´ sz, L. Factoring polynomials 9. Lenstra, A. K., Lenstra, Jr., H. W., and Lova with rational coefficients. Math. Ann. 261 (1982), 515–534. 10. Lenstra, A. K., Lenstra, Jr., H. W., Manasse, M. S., and Pollard, J. M. The number field sieve. pages 11-42 in [8]. 11. Lenstra, A. K., Lenstra, Jr., H. W., Manasse, M. S., and Pollard, J. M. The factorization of the ninth fermat number. Math. Comp. 61 (1993), 319–349. 12. Lenstra, Jr., H. W. Factoring integers with elliptic curves. Ann. of Math. 126 (1987), 649–673. 13. Lenstra, Jr., H. W. Algorithms in algebraic number theory. Bull. Amer. Math. Soc. 26 (1992), 211–244. 14. Montgomery, P. L. Square roots of products of algebraic numbers. Draft of June, 1995. Available at ftp://ftp.cwi.nl/pub/pmontgom/sqrt.ps.gz. 15. Montgomery, P. L. Square roots of products of algebraic numbers. In Mathematics of Computation 1943-1993: a Half-Century of Computational Mathematics (1994), W. Gautschi, Ed., Proceedings of Symposia in Applied Mathematics, American Mathematical Society, pp. 567–571. 16. Pohst, M., and Zassenhaus, H. Algorithmic algebraic number theory. Cambridge University Press, 1989. 17. Pollard, J. M. Factoring with cubic integers. pages 4-11 in [8]. 18. Reeves, C. R. Modern Heuristic Techniques for Combinatorial Problems. Blackwell Scientific Publications, 1993.
Robert Bennion’s “Hopping Sieve” William F. Galway Department of Mathematics University of Illinois at Urbana-Champaign 1409 West Green Street Urbana, IL 61801 [email protected] http://www.math.uiuc.edu/~galway
Abstract. This paper describes a sieving algorithm which in a cached memory environment may be superior to familiar “cross out bits” versions of the sieve of Eratosthenes. The algorithm also seems to show some advantages over “classical” versions of the sieve of Eratosthenes when adapted to the problem of factoring numbers in an interval.
1
Introduction
Suppose we want to sieve a range of numbers n, in the interval n0 ≤ n < n0 + L, eliminating those n which are multiples of any prime p ≤ Y (say p ∈ PY ). In the classical version of the sieve of Eratosthenes, if we think of the natural numbers as a sequence of numbered tiles, we would cross out those tiles in [n0 , n0 + L) which correspond to multiples of each p ∈ PY . For large L the memory requirements for sieving can be reduced by subdividing the interval into segments of length of order Y , giving the “segmented sieve” of Bays and Hudson [2], [1, §9.8]. This paper describes a variation on the sieve of Eratosthenes that was developed by Robert Bennion at the University of Utah in the early 1970s [3]. The running time and space requirements of Bennion’s algorithm are essentially the same as that for the segmented sieve — if L ≥ π(Y ) then both methods require roughly O(Y ) bits storage, and perform O(L ln ln(Y )) arithmetic operations on “words” of sufficient size to hold numbers through n0 + L + Y . However, memory references in Bennion’s hopping sieve appear to be more local, and so his algorithm may be superior in a cached memory environment. Like other versions of the sieve of Eratosthenes, the hopping sieve may be used to efficiently factor numbers in an interval. Again, the asymptotic behavior of the hopping sieve appears to be essentially identical to that of “classical” versions of the sieve, but the hopping sieve seems to to simplify the problem of storage management, (and again may show better caching behavior).
2
The Algorithm
In the hopping sieve, instead of “crossing out tiles” we may think of each tile as holding one of the primes in our set PY . Sieving proceeds by examining J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 169–178, 1998. c Springer-Verlag Berlin Heidelberg 1998
170
William F. Galway
consecutive tiles, rearranging the primes ahead of the tile being examined, and then moving on. The primes are placed so that when we examine the nth tile the prime p at n divides n if and only if n has a divisor in PY . After examining n, the p at that tile “hops forward”, preferably to the next multiple of p, as described further below. As p hops into place, the prime previously at that spot is displaced and itself hops forward, until the primes have been rearranged so that we may proceed to examine n + 1. This algorithm is similar in spirit to the algorithm of B. A. Chartres [4], [6, Exercise 5.2.3.15], which also attempts to make essentially a single pass over the interval n0 ≤ n < n0 + L. However Chartres’ algorithm, as presented in [6], requires O(L ln(Y )) word operations — while Bennion’s algorithm requires O(L ln ln(Y )) operations, as we show in Section 3. An implementation of Bennion’s algorithm, written in ANSI C, is given in Figures 1 through 4. The algorithm uses a circular buffer pbuf, of length S = π(Y ), containing an arrangement of the primes p ∈ PY . When considering whether the number n is to be eliminated, this buffer holds a prime pm for each number m, n ≤ m < n + S. The prime pm associated with m (“the prime at m”) is stored in pbuf[(m − n0 ) mod S]. This is implemented in hopsieve.h (Figure 1).
#include <stdio.h> typedef struct HoppingSieve { int n; int n0; int S; int *pbuf; } HoppingSieve;
/* The "current number". */ /* Starting value for n. */ /* Number of entries in pbuf.
*/
/* Given HoppingSieve *sv, convert m to "the prime at m". */ #define P(sv,m) sv->pbuf[((m) - (sv->n0))%(sv->S)]
Fig. 1. hopsieve.h: The hopping sieve structure, and other declarations
When examining n, primes are arranged in pbuf so that they satisfy the condition that for m ∈ [n, n + S), if m has a divisor in PY , then there is some number m0 , with n ≤ m0 ≤ m, and pm0 | m. (In other words, we ensure that at at least one prime dividing m has not hopped beyond m.) In particular, when m = n we see that if n should be eliminated then pn | n. (If n = pn one might prefer not to eliminate n, when using the sieve to find primes in an interval with n0 ≤ Y .) As we move from examining n to examining n+1 we ensure that this condition remains satisfied by moving the prime p at n forward to the next multiple of p (at n + p − (n mod p)), unless this takes p beyond the end of pbuf. The latter case occurs when p − (n mod p) > S, in which case we store p at n + S (the last
Robert Bennion’s “Hopping Sieve”
171
free spot in pbuf — note this is where the prime at n was stored, since the buffer is circular.) While placing the prime at its new location m, if m < n + S we displace another prime, which proceeds to “hop forward” to its next multiple, if possible, or to n + S otherwise. We have finished rearranging the primes once we have stored some prime at n + S. Our condition remains satisfied since any divisor of m is replaced by some other divisor of m. This process is implemented in AdvanceSieve.c (Figure 2). The method for creating the initial sieve structure is similar to that for advancing the sieve. Given our initial set of primes PY , we begin by setting the size S of pbuf to 1. One-by-one we place the primes of PY into pbuf, incrementing S. We attempt to place a given prime p at a multiple of p, perhaps displacing another prime as above. If p cannot be stored at a multiple, it is placed at the end of pbuf. This is implemented in InitSieve.c (Figure 3). The usage of these routines is illustrated in hopsieve.c (Figure 4), which counts the primes in the interval [36, 1000], given the primes up to 31.
3
Space and Running Time
The memory requirements of the hopping sieve are determined by the size of PY , from which we see that it requires O(ln(Y ) Y / ln(Y )) = O(Y ) bits of storage. To analyze the running time, it suffices to bound the number of hop-anddisplace operations as n traverses the interval [n0 , n0 + L). (See the inner loop of AdvanceSieve.c, Figure 2.) Within this loop primes p always land at some multiple m of p, m ∈ [n0 , n0 + L + S). Writing I = [n0 , n0 + L + S), and recalling that S = π(Y ), we can bound the number of operations by X X X L+S 1= O(1) + p p≤Y m∈I p|m
p≤Y
= O(S) + (L + S)
X 1 p
p≤Y
= O(S) + O(L) + (L + S) ln ln(Y ) , P where we use the fact that p≤Y 1/p = O(1) + ln ln(Y ). If we assume L ≥ S, this is O(L ln ln(Y )) operations — a cost of O(ln ln(Y )) operations per number sieved. In the same way, assuming we are given the set of primes PY , the time to create the initial sieve is bounded by the number of “hop and displace” operations performed by the inner loop of InitSieve.c (Figure 3). In this case we must have m ∈ [n0 , n0 + S), and the number of operations is Y ln ln(Y ) . O(S ln ln(Y )) = O ln(Y ) Normally this would be dominated by the time to find the set PY (either recursively using the hopping sieve, requiring O(Y ln ln(Y )) operations, or by some other method).
172
William F. Galway
int AdvanceSieve(HoppingSieve *sv) { int m, tmp, p; int rslt; m = sv->n; p = P(sv,m); if (m%p == 0) rslt = 0; else rslt = 1; while (1) { m += p - m%p; if (m >= sv->n + sv->S) break; /* p hops into place, displaces previous prime. tmp = p; p = P(sv,m); P(sv,m) = tmp; } P(sv, sv->n) = p;
*/
sv->n++; return rslt; }
Fig. 2. AdvanceSieve.c: Advances the sieve, returns 1 iff original n passes through sieve, else 0
/* Initialize a pre-allocated sieve structure to start at n0. */ void InitSieve(HoppingSieve *sv, int n0, int size, int *primes) { int m, p, tmp; sv->n = sv->n0 = n0; for (sv->S = 1; sv->S <= size; sv->S++) { p = primes[sv->S - 1]; m = n0 + (p-n0%p)%p; while (m < n0 + sv->S - 1) { tmp = p; p = P(sv,m); P(sv,m) = tmp; m += p - m%p; } P(sv, n0 + sv->S - 1) = p; } sv->S = size; }
Fig. 3. InitSieve.c: Initializes a sieve structure
Robert Bennion’s “Hopping Sieve”
173
#include "hopsieve.h" #include "AdvanceSieve.c" #include "InitSieve.c" int primes[11] = {2,3,5,7,11,13,17,19,23,29,31}; int main(int argc, char *argv[]) { int pcount = 0; int pbuf[11]; HoppingSieve sieve; sieve.pbuf = pbuf; InitSieve(&sieve, 36, 11, primes); while (sieve.n <= 1000) { if (AdvanceSieve(&sieve)) pcount++; } printf("%d primes in interval [36, 1000]\n", pcount); return 0; }
Fig. 4. hopsieve.c: Main routine, showing use of other routines
4
Variations on the Algorithm
As presented above, the algorithm requires a large number of remaindering operations (modulo p and modulo S). By roughly doubling the space, no remaindering is needed once the initial sieve structure is created. This is accomplished by maintaining two circular buffers indexed by m: pbuf as before, and dbuf which holds the distance to the next multiple of p, d = p − (m mod p). The revised versions of hopsieve.h, AdvanceSieve.c, and InitSieve.c are given in Figures 5, 6, and 7 respectively. As with the classical sieve, the algorithm may be modified to sieve only the numbers in a given congruence class modulo M , in which case we need not sieve by primes dividing M . So sieving the odd numbers requires less than half the original operation count, while further improvements can be made by considering congruence classes 1 and 5 modulo 6, etc.
5
Caching Issues
Since the hopping sieve and the segmented sieve have similar asymptotic operation counts, the choice between the two methods will depend on implementation details. A possible advantage of the hopping sieve is that its pattern of memory accesses appears to be more suited to a cache memory environment. (A description of cache memories may be found in [5].) The segmented sieve sweeps repeatedly over a long bit-vector, once for each prime eliminated. For larger primes the
174
William F. Galway
#include <stdio.h> typedef struct HoppingSieve { int n; /* The "current number". */ int nindex; /* Corresponding index into pbuf, dbuf... int S; /* Number of entries in pbuf, dbuf. */ int *pbuf; int *dbuf; } HoppingSieve;
*/
Fig. 5. hopsieve2.h: The revised hopping sieve structure, and other declarations int AdvanceSieve(HoppingSieve *sv) { int tmp, p, d; int mindex, dist; int rslt; dist = sv->S; mindex = sv->nindex; p = sv->pbuf[mindex]; d = sv->dbuf[mindex]; if (d == p) rslt = 0; else rslt = 1; while (d < dist) { dist -= d; mindex += d; if (mindex >= sv->S) mindex -= sv->S; tmp = p; p = sv->pbuf[mindex]; sv->pbuf[mindex] = tmp; d = sv->dbuf[mindex]; sv->dbuf[mindex] = tmp; } sv->pbuf[sv->nindex] = p; d -= dist; if (d == 0) d = p; sv->dbuf[sv->nindex] = d; sv->n++; sv->nindex++; if (sv->nindex == sv->S) sv->nindex = 0; return rslt; }
Fig. 6. AdvanceSieve2.c: Advances the sieve with no remaindering operations
Robert Bennion’s “Hopping Sieve”
175
/* Initialize a pre-allocated sieve structure to start at n0. */ void InitSieve(HoppingSieve *sv, int n0, int size, int *primes) { int p, d, tmp; int mindex; sv->n = n0; sv->nindex = 0; for (sv->S = 1; sv->S <= size; sv->S++) { p = primes[sv->S - 1]; mindex = (p-n0%p)%p; while (mindex < sv->S - 1) { tmp = p; p = sv->pbuf[mindex]; sv->pbuf[mindex] = tmp; d = sv->dbuf[mindex]; sv->dbuf[mindex] = tmp; mindex += d; } sv->pbuf[sv->S - 1] = p; sv->dbuf[sv->S - 1] = p - (n0%p + sv->S - 1)%p; } sv->S = size; }
Fig. 7. InitSieve2.c: Initializes the revised sieve structure
memory accesses are widely separated, which seems likely to cause cache misses. In contrast, memory references in the hopping sieve appear to cluster near the location of the n being considered for elimination by the sieve. (On the other hand, the locality of reference of the segmented sieve may be improved by using a two-tiered approach, using smaller segments that fit within the cache while sieving out multiples of those primes smaller than the cache size, and switching to a larger segment size only for the larger primes. This improvement is not implemented in the version of the segmented sieve used below.) To illustrate cache performance, Table 1 shows the cache miss rates found when simulating the performance of both algorithms. Both sieves were used to √ find the primes in intervals of the form [x, x + 8 x] for various values of x. The implementation of the segmented sieve used a segment length of 4Y (roughly √ 4 x). The implementation of the hopping sieve was essentially the same as that shown in Figures 5 through 7. For each program the indices for each read access into the appropriate data structure (bit-vector, or pbuf/dbuf pair) was scaled to give a byte offset, which was then passed to a cache simulator. This simulated a 2-way-set-associative, 16KB cache with a 32 byte line size. Table 1 shows the approximate sizes (in bytes) of the structures being indexed, and the miss rates found for the two programs.
176
William F. Galway
x 1010 1011 1012 1013 1014 1015
segmented size miss rate 5.0 · 104 28% 1.6 · 105 35% 5.0 · 105 38% 1.6 · 106 40% 5.0 · 106 42% 1.6 · 107 43%
hopping size miss rate 7.7 · 104 18% 2.2 · 105 22% 6.3 · 105 25% 1.8 · 106 27% 5.3 · 106 29% 1.6 · 107 30%
Table 1. Cache miss rates for segmented and hopping sieves
Table 2 compares the performance of the hopping sieve versus the segmented sieve on two different architectures. The programs were run on the same problems as described above (with no cache simulation) on both a SUN microSPARC II (32MB main memory, 110 MHz clock) and on a SUN superSPARC (32MB memory, 75 MHz clock). Experience has shown that cache locality is a more important issue in determining running time on the superSPARC — which seems to be reflected in the times shown. Times are given in seconds, with Tµ denoting times on the microSPARC and Ts times on the superSPARC. segmented x Tµ Ts Ts /Tµ 1010 0.5 0.5 1.0 1011 1.8 1.7 0.9 1012 6.1 6.5 1.1 1013 20.8 32.5 1.6 1014 69.0 138.0 2.0 1015 234.0 457.0 2.0
hopping Tµ Ts Ts /Tµ 2.0 1.8 0.9 6.5 6.0 0.9 24.5 20.6 0.8 74.7 79.8 1.1 247.0 287.0 1.2 829.0 1008.0 1.2
Table 2. Timing statistics for segmented and hopping sieves
6
The Hopping Factor Sieve
The same “hopping” idea can be used to factor a range of numbers. Rather than storing a single prime at m, we store a pointer to a linked list of all p dividing m, p ∈ PY , using a circular “factor buffer” fbuf. We also use a similar buffer of “leftovers”, lbuf, with entries pointing to primes which hop past the end of dbuf. When advancing from n to n + 1, primes p from both lists at n hop forward, either to the factor list at n + p − (n mod p), or if this goes past the end of the buffers, then to the leftover list at n + S. To analyze this “hopping factor sieve” when sieving the interval [n0 , n0 + L), we assume that fbuf and lbuf are both of length S, but initially drop the
Robert Bennion’s “Hopping Sieve”
177
assumption that S = π(Y ). Under the reasonable assumption that both pointers and integers fit within a word of O(ln Y ) bits, this sieve may be implemented using 2S+2π(Y ) words of storage for the buffers and linked list of primes p ∈ PY . If we assume S ≤ π(Y ) this is O(Y / ln Y ) words, or O(Y ) bits. To analyze the running time, we consider primes p ≤ S and primes p > S separately. If p ≤ S it will hop directly from one multiple of p to the next. If p > S then it cannot hop a distance greater than S, and so will require a total of p/S + O(1) hops to advance from one multiple of p to the next. To simplify the argument, we assume that S ≤ Y , in which case the total number of hops performed in sieving the interval is X X (L/p + O(1)) + (L/p + O(1))(p/S + O(1)) p≤S
S
≤ O(L ln ln S) + O(S/ ln S) + = O(L ln ln Y ) + O
Y ln Y
X
(L/S + O(L/p) + O(p/S) + O(1))
p≤Y
+O
LY S ln Y
+O
Y2 S ln Y
,
(1)
where we have used the fact that 2 X X Y . p≤ Y =O ln Y p≤Y
p≤Y
If we now let S = π(Y ), so S = Y / ln Y + O(Y / ln2 (Y )), and further assume that L ≥ Y 2 /(S ln Y ) = Y + O(Y / ln(Y )), then (1) reduces to O(L ln ln Y ) hopping operations for the hopping factor sieve. To analyze a “classical” version of the sieve of Eratosthenes applied to factoring, again drop the assumption that S = π(Y ) and divide the interval [n0 , n0 +L) into a total of L/S + O(1) segments of length bounded by S. For each segment we start with a buffer of initially NULL pointers, one pointer for each m in the segment, and then build up a list of prime factors p ∈ PY dividing each m. For this algorithm, the number of list operations per segment, and the total storage per segment (in words) will be X (S/p + O(1)) = S ln ln(Y ) + O(Y / ln Y ) . p≤Y
If we now take S ∼ Y /(ln(Y ) ln ln(Y )), and assume L ≥ S, this gives a storage requirement of O(Y / ln Y ) words, and a total of (L/S + O(1))O(Y / ln Y ) = O(L ln ln Y ) operations for the entire interval, comparable to the storage and operation counts for the hopping factor sieve. Note however that storage management appears more difficult for the “classical” version, since will not know the total storage required until we complete the sieving process.
7 Acknowledgments Of course, much of the credit for this work goes to Robert Bennion. Teresa Johnson, with the University of Illinois IMPACT research group, provided a
178
William F. Galway
quick tutorial on caching technology, and also made available the IMPACT minicache simulator used to collect the data presented in Table 1. Darrin Doud and Chris Hill made several useful remarks on this paper while it was being written.
References 1. Eric Bach and Jeffrey Shallit. Algorithmic Number Theory, volume I: Efficient Algorithms. The MIT Press, Cambridge, Massachusetts, 1996. 2. Carter Bays and Richard H. Hudson. The segmented sieve of Eratosthenes and primes in arithmetic progrssions to 1012 . BIT, 17:121–127, 1977. 3. Robert Bennion. personal communication. 4. Bruce Aylwin Chartres. Algorithm 311, prime number generator 2. Communications of the ACM, 10(9):570, September 1967. 5. John L. Hennessy and David A. Patterson. Computer Architecture: a Quantitative Approach. Morgan Kaufmann Publishers, Inc., San Mateo, California, 1990. 6. Donald Erwin Knuth. The Art of Computer Programming, volume 3: Sorting and Searching. Addison-Wesley, Reading, Massachusetts, 1973.
Trading Time for Space in Prime Number Sieves Jonathan P. Sorenson? Department of Mathematics and Computer Science, Butler University 4600 Sunset Avenue, Indianapolis, Indiana 46208, USA [email protected] http://www.butler.edu/~sorenson/
Abstract. A prime number sieve is an algorithm that finds the primes up to a bound n. We present four new prime number sieves. Each of these sieves gives new space complexity bounds for certain ranges of running times. In particular, we give a linear time sieve that uses only √ O( n/(log log n)2 ) bits of space, an Ol (n/ log log n) time sieve that uses O(n/((log n)l log log n)) bits of space, where l > 1 is constant, and two super-linear time sieves that use very little space.
1
Introduction
A prime number sieve is an algorithm that finds all prime numbers up to a bound n. In this paper we present four new prime number sieves, three of which accept a parameter to control their use of time versus space. The fastest known prime number sieve is the dynamic wheel sieve of Pritchard [11], which uses O(n/ log log n) arithmetic operations and O(n/ log log n) bits of space. Dunten, Jones, and Sorenson [6] gave an algorithm with the same asymptotic running time, while using only O(n/(log n log log n)) bits of space. Pritchard also invented a segmented wheel-based sieve that requires O(n) operations and √ only O( n/ log log n) bits of space [12]. This last sieve is more practical for larger values of n, because space becomes a serious concern when n exceeds, say, 107 . One could also apply a primality test to each integer up to n. If we were to use the Jacobi Sums test, this would take n(log n)O(log log log n) arithmetic operations and (log n)O(log log log n) space [1]. If the ERH is valid, we can improve this to O(n(log n)3 ) operations and O(log n) bits of space [9,2]. According to Bernstein [5], a method of Atkin uses O(n/ log log n) operations and n1/2+o(1) bits of space. In this paper we present four new sieves. All of them give improved complexity bounds for particular combinations of time and space. 1. Let c be a constant with 0 < c ≤ 1/2, and let ∆ := ∆(n) with ∆ = nc . Our first sieve is a modification of Pritchard’s segmented wheel-based sieve combined with trial division. This sieve uses √ n log n n n + O ∆ log log n (log log n)2 ?
Supported by NSF Grant CCR-9626877
J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 179–195, 1998. c Springer-Verlag Berlin Heidelberg 1998
180
Jonathan P. Sorenson
√ arithmetic operations and O (∆) bits of space. If we choose ∆ = n/(log n)l l with 1 ≤ l, this √ sieve gives new complexity bounds of O(n(log n) / log log n) time and O( n/(log n)l ) bits of space. c as above. Our second 2. Let c be a constant with 1/4 ≤ c ≤ 1/2, and let ∆ = n√ n/∆ + n) arithmetic sieve is a modification of our first sieve. It uses O(n √ operations and O(∆) bits of space. If we choose ∆ = n/(log n)l with 0 < l <√ 1, this sieve gives new complexity bounds of O(n(log n)l ) time and O( n/(log n)l ) bits of space. 3. Our third sieve is a modification of Pritchard’s segmented wheel-based sieve. The sieve uses O(n) arithmetic operations and √ n O (log log n)2 bits of space. This is an improvement in space use over the best previous linear time sieve by a factor proportional to log log n. √ 4. Let B := B(n) with log n ≤ B ≤ n. Our fourth sieve is a combination of the sieve of Dunten, Jones, and Sorenson and Pritchard’s segmented wheelbased sieve. This sieve uses n log(1 + log B/ log log n) O log log n arithmetic operations and O
n B log log n
bits of space. This gives an improved space bound of O(n/((log n)l log log n)) bits for O(n/ log log n)-time sieves for any fixed l ≥ 1 by choosing B = (log n)l . The rest of this paper is organized as follows: In Sect. 2 we review some preliminaries, including the wheel data structure, the use of wheels and segmentation in prime number sieves, and we review Pritchard’s segmented wheel-based sieve and the sieve of Dunten, Jones, and Sorenson. In Sects. 3, 4, and 5 we present our four new sieves. In Sect. 6 we present the results of our timing experiments.
2
Preliminaries
In this section we discuss our model of computation and review material on prime number sieves, including wheels and segmentation. 2.1
Model of Computation
Our model of computation is a RAM with a potentially infinite, direct access memory.
Trading Time for Space in Prime Number Sieves
181
If n is the input, then all arithmetic operations on integers of O(log n) bits are assigned unit cost. This includes +, −, ×, and division with remainder. In addition, comparisons, array indexing, assignment, branching, bit operations, and other basic operations are also assigned unit cost. Memory may be addressed either at the bit level or at the word level, where each machine word is composed of O(log n) bits. The space used by an algorithm under our model is counted in bits. Thus, it is possible for an algorithm to touch n bits in only O(n/ log n) time if memory is accessed at the word level. The space used by the output of a prime number sieve (the list of primes up to n) is not counted against the algorithm, as the output is the same for all algorithms discussed here. Note that, on occasion, we will use a sieve as a subroutine, and the primes it generates will be “consumed” as they are generated without being stored. So in this context it makes sense not to charge for the space used to write out this list of primes. For a justification of this choice of computation model, see [6], where the same model is used. 2.2
Some Number Theory
An integer p > 1 is prime if its only divisors are itself and 1. We always have p denote a prime, with pi denoting the ith prime, so that p1 = 2. For integers a, b let gcd(a, b) denote the greatest commong divisor of a and b. We say a and b are relatively prime if gcd(a, b) = 1. For a positive integer m let φ(m) be the number of positive integers up to m that are relatively prime to m, with φ(1) = 1 (this is Euler’s totient function). The number of primes up to x is given by π(x). We make use of the following estimates. Here x > 0, and all sums and products are only over primes. X1 = log log x + O(1); p p≤x X log p = x(1 + o(1));
(1) (2)
p≤x
X
x (1 + o(1)); log x p≤x Y p−1 1 =O . p log x 1 = π(x) =
(3) (4)
p≤x
For proofs of the estimates above, see Hardy and Wright [8]. 2.3
Two Sieves
Next we review two sieves. Both of these algorithms make use of the fact that every composite integer x can be written uniquely in the form x = p · f, where p is the least prime factor of x (p = lpf(x)).
182
Jonathan P. Sorenson
We present our algorithms as C++ code fragments [15]. When we choose to omit details, we do so by utilizing classes and objects, which we will describe only briefly, leaving the details of their implementation to the reader. Algorithm 2.1: The Sieve of Eratosthenes. Our first sieve, the sieve of Eratosthenes, is well-known. It begins with a bit vector S of length n initialized to mostly ones, representing the set {2, . . . , n}. Then, as each prime is found, its multiples are removed from the set by changing√their corresponding bit positions to zero. Once this is done for all primes up to n, only primes remain in the set represented by S. The following C++ code fragment describes this algorithm. Here BitVector is a class that supports standard operations on a bit vector such as clearing and setting bits and testing a bit to see if it is one or zero. (A bit is set if it is one, and clear if it is zero.) int p,q; BitVector S(n); S.setall(); S.clear(1); for(p=2; p<=sqrt(n); p=p+1) if(S[p]==1) for(q=p*p; q<=n; q=q+p) S.clear(q);
// // // // //
sets loop make loop q is
all bits except position 1 over primes p up to sqrt(n) sure p is actually prime over multiples q of p not prime
Note that we can rewrite the inner for-loop multiplicatively as follows: for(f=p; p*f<=n; f=f+1) // loop over multiples p*f of p S.clear(p*f); // p*f is not prime We would expect this second form to be less efficient in practice than the first, but some of our sieves are derived from this view of the algorithm. (Warning: in practice, if n approaches the word size, p*f may overflow giving a false true in the comparison p*f<=n. To avoid this problem, use f<=n/p instead.) The running time of the sieve of Eratosthenes depends on the number of times the inner for-loop executes. But this is just X X bn/pc √ p≤ n f=p
1≤
X n √ p
p≤ n
which is O(n log log n) time by (1). The algorithm uses O(n) space. Algorithm 2.2: Pritchard’s Linear Sieve. Our next sieve is a linear-time sieve devised by Pritchard [13, Algorithm 3.1]. Pritchard’s idea was to switch the order of the for-loops in the sieve of Eratosthenes so that the outer loop iterates through f-values, while the inner loop iterates through primes p. Here each composite integer x ≤ n is generated exactly once in the form x = p· f where p = lpf(x). To determine which primes p
Trading Time for Space in Prime Number Sieves
183
to iterate through for a given value of f, Pritchard observed that when p = lpf(x), then p ≤ lpf(f). Thus, p must run from 2 to lpf(f). This leads to the algorithm given below. Here PrimeList is a class that finds and holds primes up to the specified bound, and could be implemented by using the sieve of Eratosthenes. int p,i,f; BitVector S(n); S.setall(); S.clear(1); // sets all bits except position 1 PrimeList P(sqrt(n)); // finds the primes up to sqrt(n) for(f=2; f<=n/2; f=f+1) // loop over f values for(i=1; i<=P.length(); i=i+1) // loop over primes<=sqrt(n) { p=P[i]; // P[i] is the ith prime, with P[1]==2 if(p>n/f) break; // if p is too large, get the next f S.clear(p*f); // p*f is composite if(f%p==0) break; // if p|f, p==lpf(f), so get the next f } Because each composite integer x is constructed exactly once in the inner forloop, this algorithm takes O(n) time with O(n) bits of space. Next we look at reducing the running times of both sieves using a wheel. 2.4
Wheels
A wheel, as we will use it, is a data structure that encapsulates information about the integers relatively prime to the first k primes. Generally speaking, a wheel can often be used to reduce the running time of a prime number sieve by a factor of log log n. Pritchard was the first to show how to use a wheel in this way. In this section we will explain the wheel data structure and show how to apply it to the two prime number sieves we described above. We begin with the following definitions: Mk :=
k Y
pi ;
i=1
Wk (y) := {x ≤ y : gcd(x, Mk ) = 1}; Wk := Wk (Mk ). Let #S denote the cardinality of the set S. We observe the following (see (2) and (4)): log Mk = pk (1 + o(1)); #Wk = φ(Mk ) = Mk #Wk (n) = O
k Y p−1
p
i=1
n log log Mk
.
=O
Mk log log Mk
;
184
Jonathan P. Sorenson
Our data structure, then, is an array W[] of records or structs, indexed by 0 . . . (Mk − 1), defined as follows: – – – –
W[x].rp is 1 if x ∈ Wk , and 0 otherwise. W[x].dist is d = y − x, where y > x is minimal with gcd(y, Mk ) = 1. W[x].pos is #Wk (x). W[x].inv is in some sense the inverse of the pos field; it is −1 if x = 0, 0 if x ≥ φ(Mk ), and y such that W[y].pos= x otherwise.
We say that W is the kth wheel, with size Mk . For our C++ notation, we will declare W to be of class type Wheel(k), where k is an integer parameter. For examples of the wheel data structure, see [12,14]. Note that, given a value for k, the kth wheel can be constructed in time proportional to Mk . See [6]. Algorithm 2.3: The Sieve of Eratosthenes with a Wheel. Next we show how √ to use a wheel with the sieve of Eratosthenes. The idea is, for each prime p ≤ n, to generate only those multiples of p that are relatively prime to Mk . int p,i,f,m,k,x; BitVector S(n); PrimeList P(sqrt(n)); // find the primes up to sqrt(n) k = setk(n,P); // find k (see below) Wheel W(k); // create the kth wheel W m = W.size(); // m=M_k S.clearall(); // clears all bits for(i=1; i<=k; i=i+1) // include the first k primes S.set(P[i]); for(x=P[k+1]; x<=n; x=x+W[x%m].dist) // include the integers S.set(x); // relatively prime to M_k //** Main Loop for(p=P[k+1]; p<=sqrt(n); p=p+1) // loop over primes<=sqrt(n) if(S[p]==1) // make sure p is prime for(f=p; p*f<=n; f=f+W[f%m].dist) // generate p’s multiples S.clear(p*f); // p*f is not prime √ Normally, one chooses a value for k such that Mk is between n1/3 and n, implying pk = Θ(log n). This way, the time and space used by the wheel is insignificant in comparison to the complexity of the sieve as a whole, yet significant time savings are obtained, as we shall see in a moment. Our setk() function above chooses such a value for k; we leave its details to the reader. The running time, which is dominated by the inner for-loop, is at most X X X X n O(1) = O(#Wk (n/p)) = O p log log Mk √ √ √ p≤ n
f≤n/p
p≤ n
gcd(f,Mk )=1
=O
n log log n log log Mk
p≤ n
= O(n).
Trading Time for Space in Prime Number Sieves
185
Thus, the sieve of Eratosthenes with a wheel runs in linear time. Algorithm 2.4: Pritchard’s Sieve with a Wheel. Using a wheel with Pritchard’s linear sieve uses a similar technique. To generate integers x = pf that are relatively prime to Mk , we loop over only those f-values that are relatively prime to Mk , and only use primes larger than pk . In fact, the setup is identical to what we did with the Sieve of Eratosthenes above, so we need only present the Main Loop here: //** Main Loop for(f=P[k+1]; f<=n/P[k+1]; f=f+W[f%m].dist) for(i=k+1; i<=P.length(); i=i+1) { p=P[i]; // P[i] is the ith prime, with P[1]==2 if(p>n/f) break; // if p is too large, get the next f S.clear(p*f); // p*f is composite if(f%p==0) break; // if p|f, p==lpf(f), so get the next f } The running time to initialize the bit vector S is now significant in comparison to the main loop, so let us analyze the initialization time. We view S as a bit vector with Ω(log n) bits packed in each word. Thus, by assigning a 0 to each word, the S.clearall() operation takes only O(n/ log n) time. The loops that follow take at most O(#Wk (n)) = O(n/ log log n) time. From this, we see that initialization takes at most O(n/ log log n) time. The running time for the main loop is proportional to the number of composite integers in Wk (n). But this is at most O(#Wk (n)) = O(n/ log log n). Thus, the running time for Pritchard’s linear sieve with a wheel is O(n/ log log n). It uses O(n) space. Reducing Space with a Wheel. With some further modification, both of these sieves can employ the wheel to reduce their space use to O(n/ log log n). The idea is to store only #Wk (n) bits in S, one bit for each integer up to n relatively prime to Mk . The bit for the integer x would be at bit position W[x mod Mk ].pos + φ(Mk ) · bx/Mk c. Calculating the bit position for x takes O(1) time if we use the fact that φ(Mk ) = W[Mk − 1].pos. Thus, using this method does not affect the asymptotic running time of either sieve in theory. In practice, this space-saving technique is not always worthwhile, as the increase in running time is quite noticeable. 2.5
Segmentation
Segmentation is perhaps the best method for reducing the space used by a prime number sieve. In this section, we will show how to apply segmentation to√reduce the space used by the Sieve of Eratosthenes by a factor proportional to n, and by Pritchard’s sieve by a factor proportional to log n.
186
Jonathan P. Sorenson
The basic idea is to split the bit vector S into segments, each of size ∆. All the primes are found in a particular segment before moving to the next, permitting the reuse of the same space. The overall structure of the sieve is as follows: Primelist P(sqrt(n)); // find primes up to sqrt(n) Initialize(); // perform any necessary initialization for(l=sqrt(n); l
√ which is O(n) when √ ∆ n. As the list of primes up to n takes space proportional to n, √the total space used by this algorithm is minimized at √ O( n) when ∆ = O( n). By applying the space-saving technique using a wheel as √ mentioned earlier, and by playing some games in how the larger√primes below n are stored, we can further reduce the space requirement to O( n/ log log n). For more details, see [12]. √
Trading Time for Space in Prime Number Sieves
187
Algorithm 2.6: Segmenting Pritchard’s Linear Sieve, with Wheel. Segmenting this algorithm is a bit more challenging. There are two main problems: – For a given interval, which f-values should be used? Some f-values may have been “finished” on an earlier interval when a prime p is found with p | f. The solution is to maintain a bit vector fok, which keeps track of those fvalues that have “finished.” This bit vector takes O(n/pk ) = O(n/ log n) bits. This reduces by a factor of log log n with the wheel space-saving technique. – For a given f-value, which primes p should be used? The solution is to keep a reverse-index of primes. This is simply an array r[] where r[x] gives the index √ i of the smallest prime pi with pi ≥ x. We n, so this is fast to construct and takes never need to use this for x > √ O( n log n) bits of space. With these ideas, the main loop of our algorithm looks like this: for(f=P[k+1]; f<=n/P[k+1]; f=f+W[f%m].dist) if(fok[f]==1) { imin=max(k+1,r[l/f]); // find the first prime for this f for(i=imin; i<=P.length(); i=i+1) { p=P[i]; if(p>r/f) break; S.clear(p*f-l); if(f%p==0) { fok.clear(f); break; } } } The size of a segment is much larger than for Pritchard’s segmented wheel sieve; we use ∆ = n/pk = O(n/ log n). Again, this reduces to O(n/(log n log log n)) with the use of a wheel to save space. The running time remains at O(n/ log log n). For more details, see [6].
3
Sieves That Use Very Little Space
Our first sieve is a very simple modification of Pritchard’s segmented wheel sieve. √ n space, which includes the space to store the Pritchard’s sieve uses roughly √ primes up to n and the space to hold the bit vector that represents the current segment. √ Our new idea is to sieve by all the integers up to n instead of just the primes. We can then reduce the size of the segment, and we no longer need to √ store the primes up to n. The tradeoff is an increased running time. Algorithm 3.1 Choose c, with 0 < c ≤ 1/2. We use ∆ = nc for the segment size. We first find (and output) the primes up to log n, for use in choosing k to construct the wheel.
188
Jonathan P. Sorenson
We use Mk ≈ nc/2 . This way the space used by the wheel is insignificant, yet it is sufficiently large to receive the Θc(log log n) speed benefit. Below is the code fragment for the sieve() function. As before, the current interval is [l + 1, r]. The only thing new is the use of all integers relatively prime to Mk for sieving. //** Initialize the segment int x,d,f,m=W.size(); S.clearall(); for(x=l+W[l%m].dist; x<=r; x=x+W[l%m].dist) S.set(x-l); //** Main loop: sieve by integers relatively prime to m for(d=P[k+1]; d<=sqrt(r); d=d+W[d%m].dist) { firstf=l/d+W[(l/d)%m].dist; for(f=firstf; d*f<=r; f=f+W[f%m].dist) S.clear(d*f-l); } At this point, S represents the primes in the interval [l + 1, r]. Complete details have been omitted in the interest of space. Lemma 1. Let n, m be positive integers with m < n. Then X x≤n gcd(x,m)=1
φ(m) 1 = (log(n/m) + O(1)). x m
Proof. We have X x≤n gcd(x,m)=1
1 = x
X
X
a≤m gcd(a,m)=1
mk+a≤n
X
=
a≤m gcd(a,m)=1
= Here we used the fact that
1 mk + a
1 1 X 1 +O m k k2 k
φ(m) (log(n/m) + O(1)). m
Pn
i=1 1/i
= log n + γ + O(1/n)[7].
t u
Theorem 2. Let 0 < c ≤√1/2, and set ∆ = nc . Algorithm 3.1 correctly finds the primes up to n using O(n n/(∆(log log n)2 ) + (n log n)/(log log n)2 ) operations and O(∆) space. Proof. Correctness follows from our discussion above and from the previous section.
Trading Time for Space in Prime Number Sieves
189
As discussed earlier, performing the S.clearall() function takes O(∆/ log n) operations. Thus, the total number of arithmetic operations is at most proportional to X ∆ n ∆ + +1 ∆ log n x log log n √ x≤ n gcd(x,Mk )=1
√ n φ(Mk ) ∆ log n n + + n log n ∆ Mk log log n √ n log n n n + . ∆ log log n (log log n)2 The only significant space used is that of the wheel and the bit vector S, both of which are bounded by O(∆) bits. t u 2 Note that the second term in the running √ time, the O((n log n)/(log log n) ) term, is only significant if we choose ∆ n/ log n. One may also choose, say, c = 1/3 or c = 1/4. The first choice yields a sieve that uses roughly n7/6 time with n1/3 space, and the second yields a sieve that uses roughly n5/4 time and n1/4 space.
Algorithm 3.2 There is a slight modification we can make to Algorithm 3.1 that is of interest √ in the main loop of the sieve() function, we when ∆ is close to n. Currently, √ sieve by all integers up to n that are relatively prime to Mk . As an alternative, √ we √will use Pritchard’s segmented wheel sieve to find the primes up to n in O( n) operations using O(n1/4 ) bits of space. These primes do not need to be stored; they are used to sieve the interval as they are generated, and they are regenerated for each interval. The space used by this algorithm is O(n1/4 +∆), and the number of arithmetic operations used is at most proportional to √ X √ n n ∆ n ∆ + n+ +1 + n. ∆ log n p log log n ∆ √ p≤ n
1/4 This modification only makes sense √ if c ≥ 1/4, as Pritchard’s sieve requires n space to find the primes up to n anyway. p n/ log n. In This method is superior to Algorithm 3.1 when, say, ∆ = 2 ) operations, whereas Althis case, Algorithm 3.1 takes O((n log n)/(log log n) √ log n) operations. Both use the same amount of space gorithm 3.2 uses only O(n p at O( n/ log n). We have proved the following. c Theorem 3. Let 1/4 ≤ c ≤ 1/2, √ and set ∆ = n . Algorithm 3.2 correctly finds the primes up to n using O(n n/∆ + n) operations and O(∆) space.
190
4
Jonathan P. Sorenson
A Space-Efficient Linear Sieve
In this section, √ we present Algorithm 4.1 which takes O(n) operations while using only O( n/(log log n)2 ) bits of space. This is less space than Pritchard’s segmented wheel sieve [12] by a factor of log log n. Algorithm 4.1 The basic idea is as follows. On each by the primes up √ first sieve √ interval, we 2 n/(log n) . We to a bound near to, but less than, n, such as √ √ then sieve by 2 between n/(log n) and n. We choose all integers relatively prime to M k √ ∆ = n/ log log n. int l,r,k,delta,plimit; delta=sqrt(n)/log(log(n)); // compute our segment size plimit=sqrt(n)/(log(n)*log(n)); Primelist P(plimit); // find the primes up to plimit k=setk(n,P); // set a value for k (see below) Wheel W(k); // build the kth wheel output(P); // output the primes up to plimit Bitvector B(delta); for(l=plimit; l
Trading Time for Space in Prime Number Sieves
191
{ firstf=l/d+W[(l/d)%m].dist; for(f=firstf; d*f<=r; f=f+W[f%m].dist) S.clear(d*f-l); } At this point, S represents the primes in the interval [l + 1, r]. We assume the wheel space-saving technique is employed with S. Theorem 4. √ Algorithm 4.1 correctly finds the primes up to n using O(n) operations and O( n/(log log n)2 ) space. Proof. Correctness follows from our discussions √ above and from earlier sections. n/(log log n)2 ), because ∆ = The space used by the bit vector S is O( √ n/ log log n and we are employing the space-saving technique using √ the wheel2 n/(log n) to save an additional factor of log log n. The list of primes up to √ 2 1/3 uses only O( n/(log n) ) bits. The wheel requires roughly n bits. Thus, the √ total space used is O( n/(log log n)2 ) bits. The number of arithmetic operations is at most proportional to n ∆ + ∆ log log n
X p≤
√ n (log n)2
∆ +1 + p log log n
X √ √ n ≤d≤ n (log n)2 gcd(d,Mk )=1
∆ +1 d log log n
.
√ 3 The second term √ in the parentheses is easily seen to be O(∆ + n/(log n) ) = O(∆) as ∆ = n/ log log n. This dominates the first term. For the third term, we apply Lemma 1 to obtain X X ∆ ∆ +1 − +1 d log log n d log log n √ √ d≤ n d≤ n/(log n)2 gcd(d,Mk )=1
gcd(d,Mk )=1
√ √ √ φ(Mk ) n n n ∆ log − log + O(1) + O = log log n Mk Mk Mk (log n)2 log log n √ φ(Mk ) n ∆ log log n + =O log log n Mk log log n = O(∆).
From this, we see that the total number of arithmetic operations is O((n/∆)∆) = O(n). t u
5
A Space-Efficient Sublinear Sieve
In this section we present Algorithm 5.1, a sublinear sieve that uses less space than the sieve of Dunten, Jones, and Sorenson [6]. This sieve is a combination of Pritchard’s segmented wheel sieve[12] and that of Dunten, Jones, and Sorenson.
192
Jonathan P. Sorenson
Algorithm 5.1 In Algorithm 2.6, the fok[] array uses O(n/(log n log log n)) bits of space. To reduce the space use of this algorithm, we must somehow reduce the number of fvalues that we need in the main loop, thereby reducing the storage requirements for this array. The idea is to pre-sift the current segment by all primes up to a bound B, √ where log n ≤ B ≤ n. Then, we only need to use f-values satisfying B < f ≤ n/B and relatively prime to Mk . To avoid crossing off multiples of primes below B, we also pre-sieve the fok[] array by the primes up to B. //** Sieve by primes up to B for(i=k+1; P[i]<=B; i=i+1) { p=P[i]; firstf=l/p+W[(l/p)%m].dist; for(f=firstf; p*f<=r; f=f+W[f%m].dist) S.clear(p*f-l); } //** Main loop for(f=B+W[B%m].dist; f<=n/B; f=f+W[f%m].dist) if(fok[f]==1) { imin=max(k+1,r[l/f]); // find the first prime for this f for(i=imin; i<=P.length(); i=i+1) { p=P[i]; if(p>r/f) break; S.clear(p*f-l); if(f%p==0) { fok.clear(f); break; } } } We use a segment size of ∆ = n/B. The only change to preprocessing is that we need to sieve the fok[] array. We assume the wheel is roughly n1/3 in size. Theorem 5. Algorithm 5.1 correctly finds the primes up to n using n log(1 + log B/ log log n) O log log n √ arithmetic operations and O(n/(B log log n) + n) bits of space. Proof. Correctness follows from that of Algorithms 2.5 and 2.6. The space used by the algorithm is dominated by that used for the segment and the fok[] array. Both can be implemented to use the wheel-based space saving technique so that they require only O(∆/ log log n) = O(n/(B log log √n)) √ bits. The n term arises from the space needed to store the primes up to n.
Trading Time for Space in Prime Number Sieves
193
To calculate the running time of the algorithm, we will first compute the preprocessing time, then the time spent on one segment, and finally combine these to compute the total time. √ In preprocessing, we must compute the wheel, find the primes up to n, compute the reverse index array r[], and sieve √ the fok[] array. Except for the fok[] array, everything can be done in O( n) operations. Sieving the fok[] array takes time proportional to X pk
n n/B (log log B − log log pk + O(1)) p log log n B log log n
n log(1 + log B/ log log n) , B log log n
as pk = Θ(log n). In processing a segment, we first initialize the segment, then sieve by primes up to B, and finally execute our main loop. As mentioned earlier, initializing the sieve takes O(∆/ log n + ∆/ log log n) operations. This is the time needed to zero out the bit vector and the time to place ones in positions corresponding to integers relatively prime to Mk . Sieving the segment by primes between pk and B takes the same time as sieving the fok[] array: O(n log(1 + log B/ log log n)/(B log log n)) operations. Finally, the main loop crosses off each composite integer x from the current interval where x has no prime divisors less than B. This takes at most O(∆/ log log n) time. Summing this time over the n/∆ segments, we obtain a total running time of O(n log(1 + log B/ log log n)/ log log n) arithmetic operations. t u This theorem provides a nice space-time √ tradeoff between Algorithms 2.5 and 2.6. For example, one may choose B = n, essentially obtaining Pritchard’s segmented wheel sieve, or B = log n, to get the sieve of Dunten, Jones, and Sorenson. Choosing B = (log n)l for some fixed l > 1 gives new space bounds for O(n/ log log n)-time sieves.
6
Timing Results
In this section we present the timing results from our implementation of Algorithms 3.1 (with ∆ = n0.4), Algorithm 4.1, and Algorithm 5.1 (with l ranging from 1.5 to 3). We also implemented several of the algorithms mentioned in Sect. 2 for purposes of comparison. As we are primarily interested in spaceefficient algorithms, we focused on segmented sieves. All algorithms were implemented in C++. Timing results are given in CPU seconds. For smaller values of n, the time given is an average over several runs. Dashed entries indicate that the algorithm was not run for that input due to excessive space requirements. The results are presented in Table 1.
194
Jonathan P. Sorenson
Table 1. Average running times in CPU seconds Algorithm
n = 104 n = 105 n = 106 n = 107 n = 108 n = 109
Alg. 2.1 Sieve of Eratosthenes Bays and Hudson Alg. 2.5 Segmented with wheel Alg. 2.2 Pritchard’s Linear Sieve Alg. 2.4 With wheel Alg. 3.1 Alg. 4.1 Alg. 5.1 l = 1.50 Alg. 5.1 l = 1.75 Alg. 5.1 l = 1.79 Alg. 5.1 l = 2.00 Alg. 5.1 l = 2.25 Alg. 5.1 l = 2.50 Alg. 5.1 l = 2.75 Alg. 5.1 l = 3.00
0.00086 0.00282 0.00210 0.00248 0.00136 0.00904 0.00612 0.00278 0.00284 0.00280 0.00276 0.00290 0.00318 0.00402 0.00508
0.0104 0.0274 0.0212 0.0242 0.0130 0.0828 0.0674 0.0252 0.0248 0.0234 0.0244 0.0248 0.0256 0.0290 0.0410
0.136 0.264 0.216 0.258 0.146 1.002 0.750 0.266 0.262 0.234 0.256 0.260 0.266 0.270 0.268
2.80 – 2.68 27.44 2.27 23.61 2.68 – 1.74 – 12.03 122.28 8.16 78.11 2.42 25.10 2.42 25.06 2.37 24.36 2.40 24.83 2.48 25.40 2.51 25.85 2.54 26.56 2.55 26.71
– 286 242 – – 1478 816 261 258 254 255 259 262 269 274
The computing platform was a 200MHz Pentium Pro with 96MB RAM and a 512kB cache running Linux 2.0. We used the gnu g++ compiler version 2.7.2.1 with -O optimization. We remind the reader that all timing results depend not only on the particular platform, operating system, and compiler, but also on the language and the programmer. Thus, any conclusions drawn from such data must be met with a certain degree of scepticism. With that said, it seems clear that Algorithm 5.1 is very practical. If l is chosen carefully, and the code is tuned more than ours was, it may be capable of surpassing Pritchard’s segmented wheel sieve. We found l = 1.79 ± 0.005 to be nearly optimal.
Acknowledgements Special thanks to Justin Hockemeyer, who, during the summer of 1997, implemented a version of Algorithm 3.1. He was supported by the Butler Summer Institute.
References 1. Adleman, L. M., Pomerance, C., Rumely, R.: On distinguishing prime numbers from composite numbers. Annals of Mathematics 117 (1983) 173–206 2. Bach, E.: Analytic Methods in the Analysis and Design of Number-Theoretic Algorithms. MIT Press, Cambridge (1985)
Trading Time for Space in Prime Number Sieves
195
3. Bach, E., Shallit, J.: Algorithmic Number Theory, Vol. 1. MIT Press, Cambridge (1996) 4. Bays, C., Hudson, R.: The segmented sieve of Eratosthenes and primes in arithmetic progressions to 1012 . BIT 17 (1977) 121–127 5. Bernstein, D. J.: Personal communication. (1998) 6. Dunten, B., Jones, J., Sorenson, J. P.: A space-efficient fast prime number sieve. Information Processing Letters 59 (1996) 79–84 7. Greene, D. H., Knuth, D. E.: Mathematics for the Analysis of Algorithms. 3rd edn. Birkh¨ auser, Boston (1990) 8. Hardy, G. H., Wright, E. M.: An Introduction to the Theory of Numbers. 5th edn. Oxford University Press (1979) 9. Miller, G.: Riemann’s hypothesis and tests for primality. Journal of Computer and System Sciences 13 (1976) 300–317 10. Pomerance, C. (ed.): Cryptology and Computational Number Theory. Proceedings of Symposia in Applied Mathematics, Vol. 42. American Mathematical Society, Providence (1990) 11. Pritchard, P.: A sublinear additive sieve for finding prime numbers. Communications of the ACM 24(1) (1981) 18–23,772 12. Pritchard, P.: Fast compact prime number sieves (among others). Journal of Algorithms 4 (1983) 332–344 13. Pritchard, P.: Linear prime-number sieves: A family tree. Science of Computer Programming 9 (1987) 17–35 14. Sorenson, J. P., Parberry, I.: Two fast parallel prime number sieves. Information and Computation 144(1) (1994) 115–130 15. Stroustrup, B.: The C++ Programming Language. 2nd edn. Addison-Wesley (1991)
Do Sums of 4 Biquadrates Have a Positive Density? Jean-Marc Deshouillers1,2 , Fran¸cois Hennecart1 , and Bernard Landreau1 1
1
Laboratoire Algorithmique Arithm´etique Exp´erimentale, UPRES.A. 5465 CNRS-Universit´e Bordeaux 1 2 Math´ematiques Stochastiques, Universit´e Victor Segalen Bordeaux 2
Introduction
Since almost one century, the study of sums of s integral s-th powers has attracted the attention of many mathematicians. Regardless of their efforts, the precise knowledge of their asymptotic behaviour still seems quite out of reach, except in the case of sums of two squares, where Landau [6] proved in 1908 that their number up to x is asymptotically equal to Cx(log x)−1/2 , for some explicit constant C. On the faith of some computation, P. Barrucand [1] suggested in 1968 that sums of 3 cubes, or sums of 4 fourth powers have a zero asymptotic density, but Ch. Hooley [5] developped in 1986 heuristic arguments in favour of a positive asymptotic density. A third way to tackle the problem has been suggested by Erd˝ os and R´enyi [3] in 1960: they built random sequences called pseudo s-th powers, which mimic the behaviour of integral s-th powers and they suggested that the number of representations of an integer as a sum of s pseudo s-th powers should almost surely behave according to a Poisson law, implying the positive asymptotic density of sums of pseudo s-th powers. Besides the fact that their proof was not complete (it has been indeed completed by Goguel [4] and Landreau [7]), the drawback of their approach is that it leads to a positive density for sums of 2 pseudo-squares as well. For that reason, in [2], we introduced in the Erd˝ os-R´enyi model a refinement which takes into account the arithmetic behaviour of actual s-th powers (which are not well distributed in arithmetic progressions): in our model, sums of 2 pseudo-squares have a zero asymptotic density whereas this density exists and is strictly positive for sums of s pseudo s-th powers, when s ≥ 3. In this paper, we report on our study on sums of 4 biquadrates. In section 2, we explain how to compute the asymptotic density in our probabilistic model. In section 3, we report on the extensive computation we performed on sums of 4 biquadrates, and show a good agreement between the behaviour of the actual biquadrates and that of the pseudo-biquadrates of our probabilistic Erd˝ os-R´enyi model.
J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 196–203, 1998. c Springer-Verlag Berlin Heidelberg 1998
Do Sums of 4 Biquadrates Have a Positive Density?
2
197
The Probabilistic Density
The notation follows that of [2]. The arithmetic probabilistic model described in [2] provides a probabilistic density δ0 (K) which tends to a limit δ0 when K tends multiplicatively to infinity. By this we mean that for any sequence (Kn ) such that Kn divides Kn+1 and any integer d divides Kn when n is large enough, we have X 1 e−λ(k,Kn ) , δ0 := lim n→∞ Kn k mod Kn
Γ (1/4)4 4!.44
, γ ρ(k,K) K3
γ = = 0.02812374122 . . . and ρ(k, K) is the where λ(k, K) = number of solutions of the congruence k14 + k24 + k34 + k44 ≡ k
mod K.
To compute the actual value δ0 with a given precision is quite an interesting algorithmic problem. In order to find good upper and lower bounds for δ0 , we proceed in two steps: • first step: since δ0 (K) is multiplicatively increasing in K, we directly compute δ0 (K) for a given value of K with K as large as possible. This will provide the lower bound. • secound step: we get an upper bound for the difference δ0 − δ0 (K) by using the following lemma Lemma For each K, one has 0 ≤ δ0 − δ0 (K) ≤
γ2 (S − S2 (K)), 2
(1)
2 P ρ(k,K) 1 and S = limK→∞ S2 (K) where K is where S2 (K) = K k mod K K3 assumed to tend multiplicatively to infinity. Proof : This comes from the following upper bound stated in section 3.3 of [2] available for every K, q ≥ 1 |δ0 (Kq) − δ0 (K)| ≤
γ2 (S2 (Kq) − S2 (K)). 2!
For a fixed K we just let q go to infinity. For the first step, we choose K = 240 .
Q
p≥3,pα ≤200000 p
α
0.02435851 ≤ 1 − δ0 (K) ≤ 0.02435863. This value is obtained by summing the series ∞ X γi (−1)i+1 Si (K), i! i=1
, and then get (2)
198
Jean-Marc Deshouillers, Fran¸cois Hennecart, and Bernard Landreau
i P ρ(k,K) 1 where Si (K) = K . We first compute all ρ(k, pα ); for this, k mod K K3 we get formulae depending on different cases on p, namely p = 2, p ≡ 3 mod 4, p ≡ 1 mod 8 or p ≡ 5 mod 8. For the last two cases, the decomposition p = a2 + b2 with a ≡ 1 mod 4 is needed. By multiplicativity of Si , we just have to compute each Si (pα ). The involved computation is quite similar to that of S we present below. We notice that the functions Ri (x) :=
X xi+1 xi xj − +··· = (−1)j−i i! (i + 1)! j! j≥i
satisfy Ri (x) ≥ 0 for all i ∈ N and x ≥ 0. It follows by a summation over k mod K that we have for any n ≥ 1 2n X
(−1)i+1
i=1
2n−1 X γi γi Si (K) ≤ 1 − δ0 (K) ≤ (−1)i+1 Si (K). i! i! i=1
The general term of the series is at first decreasing for small values of i, then it quickly increases and finally it goes to zero for large values of i. It fortunately turns out that good inequalities are obtained with only a few first terms of the series and for our choice of K this happens when i = 12. We also notice that in the course of the computation we get the value S2 (K) = 11.084326...
(3)
For the secound step, we have to compute a good upper bound for S. Following section 3.3 of [2], we have S=
Y
X(p) =
p
with β
Ω(p ) =
∞ YX
Ω(pβ ),
p β=0
β pX −1 h=1 (h,p)=1
G4 (h, pβ ) 8 ≥ 0, pβ
Pq where Gs (h, q) = x=1 e(hxs /q). We have to consider different cases according as p = 2 (i.e. p|4) or p ≥ 3 and then according to the value of (p − 1, 4). First case: p = 2 In view of Lemma 3 of [2], for β = 4k + v ≥ 5, with 1 ≤ v ≤ 4, we get for odd h v G4 (h, 2β ) = 1 G4 (h, 2 ) . 2k 2β 2v This leads to Ω(2
4k+v
1 ) = 8k 2
p4k+v X−1 t=1 (t,2)=1
G(t, 2v ) 8 1 v pv = 24k Ω(2 ).
Do Sums of 4 Biquadrates Have a Positive Density?
199
We just have to compute Ω(2), Ω(4), Ω(8), Ω(16) and use the formula X(2) = 1 + (Ω(2) + Ω(4) + Ω(8) + Ω(16))
∞ X 1 24k k=0
16 (Ω(2) + Ω(4) + Ω(8) + Ω(16)) . =1+ 15 A quick computation gives Ω(2) = 0, Ω(4) = 1/8, Ω(8) = 17/16 et Ω(16) = 35/16, and we get X(2) =
23 . 5
(4)
Second case: p ≥ 3 Again, following Lemma 3 of [2], in the case when p does not divide 4, for β = 4k + v ≥ 2, with 1 ≤ v ≤ 4, we get v G4 (t, pβ ) = 1 G4 (t, p ) . pk pβ pv This leads to Ω(p4k+v ) =
1 Ω(pv ), p4k
and ∞ X 1 X(p) = 1 + Ω(p) + Ω(p2 ) + Ω(p3 ) + Ω(p4 ) p4k k=0 2 3 4 Ω(p) + Ω(p ) + Ω(p ) + Ω(p ) =1+ . 1 − 1/p4
(5)
Furthermore, for 2 ≤ v ≤ 4 and (h, p) = 1 we have G4 (h, pβ ) = pv−1 , and then Ω(p2 ) =
p(p − 1) p−1 = , 8 p p7
Ω(p3 ) =
p−1 p−1 et Ω(p4 ) = . 6 p p5
If p ≡ 3 mod 4, we use the fact that quartic residues and quadratic residues are the same. This gives G4 (a, p) = G2 (a, p), √ and Gauß quadratic sums are well known, G2 (a, p) = i p in the case when p ≡ 3 mod 4. This leads to p−1 . Ω(p) = p4 We finally get using (5) X(p) = 1 +
1 . p3
200
Jean-Marc Deshouillers, Fran¸cois Hennecart, and Bernard Landreau
We compute using, PARI-gp, the following product with B1 = 1 000 000 Y (1 + 1/p3 ) < 1.04115807283 . 3≤p
Then, we get an upper bound for the remaining product Y (1 + 1/p3 ) < exp(1/(2 ∗ B1 )2 ) < 1.00000000000051 . B1
This leads to
Y
Π1 :=
X(p) < 1.0411580729 .
(6)
3≤p p≡3 mod 4
If p ≡ 1 mod 4, we have Ω(p) p−1 (1/p5 + 1/p6 + 1/p7 ) + 1 − 1/p4 1 − 1/p4 1 + Ω(p) − 1/p7 = . 1 − 1/p4
X(p) = 1 +
(7)
We can use the well-known inequality for (h, p) = 1 √ |G4(h, p)| ≤ ((4, p − 1) − 1) p, which gives Ω(p) ≤ 38
6541 p−1 ≤ 3 . 4 p p
In fact we can get a little bit better by using Hooley’s work in [5] – for p ≡ 1 mod 8, Ω(p) ≤ 1 + 1641/p3, – for p ≡ 5 mod 8, Ω(p) ≤ 1 + 313/p3. We compute again with PARI-gp the following product from 5 to B2 = 1 000 000 Y (1 + Ω(p) − 1/p7 ) < 2.31438253318 . (8) 1 − 1/p4 5≤p≤B p≡1
2 mod 4
We then compute upper bounds for the products between B2 and B3 = 10 000 000. Y Y 1 + 1641/p3 1 + 313/p3 . < 1.00000000000992 4 1 − 1/p 1 − 1/p4 B2
Do Sums of 4 Biquadrates Have a Positive Density?
201
A upper bound for the remaining product is Y p>B3 p≡1 mod 4
1 + Ω(p) − 1/p7 ≤ 1 − 1/p4
Y p>B3 p≡1 mod 4
≤ (1 + ≤ (1 +
1 + 1641/p3 1 − 1/p4
1 ) exp( 3B33
X
1641/p3)
(10)
p>B3 p≡1 mod 4
1641 1 ) < 1.0000000000083 . 3 ) exp( 3B3 2B23
Combining (7), (8), (9), and (10) finally leads to Y X(p) < 2.31438253323 . Π2 :=
(11)
3≤p p≡1 mod 4
In view of (4), (6) and (11) we get S=
23 · Π1 · Π2 < 11.08433506712 . 5
(12)
Using now the value obtained in (3) S2 (K) = 11.0843264... > 11.0843264, (2), (1) and (12) we obtain for the probabilistic density 0.02435850 < 1 − δ0 < 0.02435863 .
3 3.1
(13)
Validation of the Arithmetic Model Computation of the Experimental Density
For x ≤ x0 = 6 · 1013 , we have computed, on the one hand, the number N (x) of integers up to x which are the sum of four biquadrates, and on the other hand, the number Nd (x) of such integers restricted to sums of four distinct biquadrates. The difference N (x) − Nd (x) plainly is less than the number of integers n up to x which are sums of the type n = 2n41 + n42 + n43 which is a O(x3/4 ). Thus the two functions ν(x) = N (x)/x and νd (x) = Nd (x)/x join together when x becomes infinity, i.e. ν(x) − νd (x) = o(1). We use here two different ways to compute N (x) and Nd (x). An integer which is the sum of k biquadrates is called Bk . Sums of any four biquadrates can be viewed as the sum of two B2 ’s. The basic √ principle is to compute up to x0 all the B2 ’s (there are asymptotically γ x0 such numbers, where γ is a gamma factor), and to arrange them in a string of 64 bits words. To save memory it is even possible to keep only the differences between consecutive B2 ’s, which do not exceed the 32 bits words size. To determine which elements in a given interval are B4 , we represent them by their address in a string of bits: initially, we give the value 0 to these bits; we now add the string of B2 ’s with itself, restricting addition to the sums which fall in the considered interval,
202
Jean-Marc Deshouillers, Fran¸cois Hennecart, and Bernard Landreau
0.02461 asymptotic probabilistic density experimental density (equal summands allowed) experimental density (distinct summands only)
0.02456
0.02451
0.02446
0.02441
0.02436
0.02431
0.02426
0.02421 0
2e+13
4e+13
6e+13
8e+13
1e+14
Fig. 1. The experimental and probabilistic densities
and give the value 1 to one bit as soon as its address is seen as a sum of two B2 ’s. At the end of the process, we check which bits are 0 and which are 1. We easily observe that this method is better in terms of computing time than the naive one, which consists in adding four biquadrates (set in increasing order), giving a B4 and to put the value 1 to the bit representing the address of this B4 . Nevertheless, to our knowledge, this is the only way to find the sums of four distinct biquadrates, and thus to compute νd (x). 3.2
Comparing the Probabilistic Density with the Experimental Density
Looking at Figure 1, it is impressive to observe how well our model of sums of 4 pseudo-biquadrates fits with the ordinary sums of 4 biquadrates. The general outline of the experimental density functions ν and νd looks like monotone functions, the first being decreasing, and the second being increasing. Furthermore
Do Sums of 4 Biquadrates Have a Positive Density?
203
the value of 1 −δ0 of the probabilistic density takes place with exactness between the curves of the two different experimental density functions. For the case of sums of 3 cubes, the situation remains for the moment unclear: our probabilistic density does not fit exactly with the experimental density. Further, for sums of s-th powers, s ≥ 5, we may expect that our probabilistic density is quite good but we do not yet perform enough computations to support this assertion.
References 1. P. Barrucand, “Sur la distribution empirique des sommes de trois cubes ou de quatre bicarr´es”, Note aux C. R. Acad. Sc. Paris, A 267 (1968), 409-411. 2. J-M. Deshouillers, F. Hennecart, B. Landreau , “Sums of powers: an arithmetic refinement to the probabilistic model of Erd˝os and R´enyi”, to appear in Acta Arithmetica. 3. P. Erd˝ os et A. R´enyi, “Additive properties of random sequences of positive integers”, Acta Arith. 6 (1960), 83-110. ¨ 4. J.H. Goguel, “Uber Summen von zuf¨ alligen Folgen nat¨ urlichen Zahlen”, J.-ReineAngew.-Math. 278/279 (1975),63-77. 5. C. Hooley, “On some topics connected with Waring’s problem”, J.-Reine-Angew.Math. 369 (1986), 110-153. ¨ 6. E. Landau, “Uber die Einteilung der . . . Zahlen in 4 Klassen . . . ”, Arch. Math. Phys. (3) 13 (1908), 305-312. 7. B. Landreau, “Mod`ele probabiliste pour les sommes de s puissances s-i`emes”, Compositio Math. 99 (1995), 1-31. Jean-Marc Deshouillers Math´ematiques Stochastiques Universit´e Victor Segalen Bordeaux 2 F-33076 Bordeaux Cedex e-mail: [email protected] Fran¸cois Hennecart et Bernard Landreau Laboratoire d’Algorithmique Arithm´etique Exp´erimentale Universit´e Bordeaux I F-33405 TALENCE Cedex e-mail: [email protected], [email protected]
New Experimental Results Concerning the Goldbach Conjecture? J-M. Deshouillers1 , H.J.J. te Riele2 , and Y. Saouter3 1
Math´ematiques Stochastiques Universit´e Victor Segalen Bordeaux 2 F-33076 Bordeaux Cedex, France [email protected] 2 CWI, Centre for Mathematics and Computer Science Kruislaan 413, 1098 SJ Amsterdam, The Netherlands [email protected] 3 Institut de Recherche en Informatique de Toulouse 118 route de Narbonne, F-31062 Toulouse Cedex, France [email protected] Abstract. The Goldbach conjecture states that every even integer ≥ 4 can be written as a sum of two prime numbers. It is known to be true up to 4 × 1011 . In this paper, new experiments on a Cray C916 supercomputer and on an SGI compute server with 18 R10000 CPUs are described, which extend this bound to 1014 . Two consequences are that (1) under the assumption of the Generalized Riemann hypothesis, every odd number ≥ 7 can be written as a sum of three prime numbers, and (2) under the assumption of the Riemann hypothesis, every even positive integer can be written as a sum of at most four prime numbers. In addition, we have verified the Goldbach conjecture for all the even numbers in the intervals [105i , 105i + 108 ], for i = 3, 4, . . . , 20 and [1010i , 1010i + 109 ], for i = 20, 21, . . . , 30. A heuristic model is given which predicts the average number of steps needed to verify the Goldbach conjecture on a given interval. Our experimental results are in good agreement with this prediction. This adds to the evidence of the truth of the Goldbach conjecture. 1991 Mathematics Subject Classification: Primary 11P32; Secondary 11Y99 1991 Computing Reviews Classification System: F.2.1 Keywords and Phrases: Goldbach conjecture, sum of primes, primality test, vector computer, Cray C916, cluster of workstations
Acknowledgements The first named author benefited from the support of CNRS and the Universities Bordeaux 1 and Bordeaux 2. The second author’s contribution was carried out ?
To appear in the Proceedings of the Algorithmic Number Theory Symposium III (Reed College, Portland, Oregon, USA, June 21–25, 1998).
J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 204–215, 1998. c Springer-Verlag Berlin Heidelberg 1998
New Experimental Results Concerning the Goldbach Conjecture
205
within CWI Project MAS2.5 “Computational number theory and data security”. He acknowledges the help of Walter Lioen with proving primality of many large numbers with the programs of Cohen, Lenstra and Winter, and of Bosma and Van der Hulst. Access to the Cray C916 vector computer at the Academic Computing Centre Amsterdam (SARA) was provided by the Dutch National Computing Facilities Foundation NCF. Access to the Power Challenge Array R10000 compute server was provided by the Centre Charles Hermite in Nancy, thanks to INRIA Lorraine.
1
Introduction
The binary Goldbach conjecture (BGC) states that every even integer ≥ 4 can be expressed as a sum of two prime numbers. By G2 we denote the least upper bound for the number G with the property that all even numbers n with 4 ≤ n ≤ G can be written as a sum of two prime numbers. It is known that G2 ≥ 4 × 1011 [15,17,7,16]. The ternary Goldbach conjecture (TGC) states that every odd integer ≥ 7 can be expressed as a sum of three prime numbers. Clearly, the truth of BGC implies the truth of TGC. In 1923, Hardy and Littlewood [8] proved that, under the assumption of a weak version of the Generalized Riemann hypothesis (GRH), there exists a positive integer M0 such that TGC holds for all odd integers ≥ M0 . In 1937, Vinogradov [18] proved, unconditionally, that there exists a positive integer N0 such that TGC holds for all odd integers ≥ N0 . In 1989, Chen and Wang [3] showed that one can take N0 = 1043000, and in 1993 [4] they showed, assuming GRH, that one can take M0 = 1050 . Very recently, Zinoviev [19] proved, assuming GRH, that one can take M0 = 1020. By the use of classical computations by Schoenfeld [14], this result implies [6]. Theorem A. If GRH holds and if G2 ≥ 1.615 × 1012, then every odd integer ≥ 7 can be expressed as a sum of three primes. This was one of our motivations for the present study. Remark. In [13], the third author has proved, unconditionally, the truth of TGC up to 1020 by computing an increasing sequence of about 2.5 × 108 prime numbers q0 , q1, . . . , qQ such that q0 < 4 × 1011 , qi+1 − qi < 4 × 1011 for all 0 ≤ i ≤ Q − 1 and qQ > 1020 . This shows that near every odd number N < 1020 there is a prime q such that N − q < 4 × 1011 and by [16] N − q can be expressed as a sum of two primes. A second motivation was the following result of Kaniecki [10]: Theorem B. If the Riemann hypothesis (RH) holds and if G2 ≥ 1.405 × 1012 , then every even positive integer can be written as a sum of at most four primes.
206
J-M. Deshouillers, H.J.J. te Riele, and Y. Saouter
Without any assumption, Ramar´e [12] proved that every even positive integer is a sum of at most six primes. In this paper, we report the results of extensive computer experiments to the effect of the following Theorem 1. We have G2 ≥ 1014, so the assumptions on G2 in Theorems A and B are satisfied. In addition, we have checked that all the even integers in some given intervals are sums of two primes, namely: Theorem 2. All the even integers in the intervals [105i, 105i + 108 ], for i = 3, 4, . . ., 20 and [1010i, 1010i +109 ], for i = 20, 21, . . . , 30, are sums of two primes. We have verified BGC with an algorithm which was used, but not given very explicitly, by Mok-Kong Shen [15]. In addition to extending the interval on which BGC is known to be true by a factor of 250, we give a heuristic model which predicts the average number of steps necessary to check BGC with this algorithm. This adds some theoretical evidence to the already overwhelming numerical evidence of the truth of BGC.
2
Two Algorithms to Verify the Binary Goldbach Conjecture on [a, b]
The known algorithms for verifying the Goldbach conjecture on a given interval [a, b] consist of finding two sets of primes P and Q such that P + Q covers all the even numbers in [a, b]. Let pi be the i–th odd prime number. One approach, as applied in [17,7,16], is to find, for every even e ∈ [a, b], the smallest odd prime pi such that e − pi is a prime. This amounts to taking for P the odd primes p1 , p2 , . . . , pm for suitable m and to take Q = Q(a, b) = {q | q prime and a − a ≤ q ≤ b} for some suitably chosen a . A series of sets of even numbers E0 ⊂ E1 ⊆ E2 ⊆ . . . is then generated, defined by E0 = ∅, Ei+1 = Ei ∪ (Q(a, b) + pi+1 ), i = 0, 1, . . . , 1 until for some j the set Ej covers all the even numbers in the interval [a, b]. The set Q(a, b) is generated with the sieve of Eratosthenes: this is the most time-consuming part of the computation. For the choice of a it is sufficient that a exceeds the largest odd prime pj used in the generation of the sets Ej . This approach permits to deliver, for every even integer e ∈ [a, b], the smallest prime 1
By Q(a, b) + pi+1 we mean, as usually, the set {q + pi+1 |q ∈ Q(a, b)}.
New Experimental Results Concerning the Goldbach Conjecture
207
p such that e − p is prime (the pair (p, e − p) is then called the minimal Goldbach decomposition of e). In the computations used for checking the Goldbach conjecture up to 4×1011 [16], the largest small odd prime needed was p446 = 3163 (this is the smallest prime p for which 244, 885, 595, 672 − p is prime). An expensive part of this approach is that essentially all the primes on the interval [a, b] have to be determined. A more efficient approach, as applied in [15], is to find, for every even e ∈ [a, b], a prime q, close to a, for which e − q is a prime. This amounts to choosing for P the set of all the odd primes up to about b−a and for Q the k largest primes q1 < q2 . . . < qk below a, for suitable k. For the actual check of the interval [a, b], one generates the sets of even numbers F0 ⊂ F1 ⊆ F2 ⊆ . . ., defined by F0 = ∅, Fi+1 = Fi ∪ (P + qi+1 ), i = 0, 1, . . . , until for some j the set Fj covers all the even numbers in the interval [a, b]. The large set P is generated with the sieve of Eratosthenes, but this work has to be done only once if we fix the length b − a of the intervals [a, b]. The primes in Q depend on a and could also be generated with the sieve of Eratosthenes. However, since we only need a few hundred of such primes and since they do not exceed 1014 , it is much cheaper to use results of Jaeschke [9] by which for each prime we only need to do a few pseudoprimality tests, as long as they do not exceed 3.4 × 1014 . A disadvantage of this approach is that it does not, in general, find the minimal Goldbach decomposition. In this study we have chosen to implement the second approach. Apart from extending G2 as much as possible, we are interested in the number of steps in the above algorithms, necessary to verify BGC. In the next section we discuss a heuristic model which is capable to predict the average number of steps accurately.
3
Predicting the Average Number of Steps Needed to Verify BGC on [a, b]
We present some heuristics to estimate the average number of steps needed to generate the sets Fi , i = 0, 1, . . . until all the even numbers in [a, b] are covered. Let l = b − a be large enough, compared with a, so that we can find enough primes q in the vicinity of a for our purpose. The number of primes in P is about π(l). For each prime q ∈ Q, the set P + q covers about π(l) elements in [a, b], i.e. a proportion of about 1 − 2π(l)/l of the even numbers in [a, b] is not covered. If we assume, which is not the case, a statistical independence between the fact to be covered by P +q and the fact to be covered by P +q 0 and a further hypothesis of uniformity, we may expect that, on average, all even integers are covered with the help of k elements q when (1 − 2π(l)/l)k is roughly equal to 2/l, the inverse of the number of even numbers in [a, b]. If l = 108 , this leads to k ≈ 145 and for l = 109 this yields k ≈ 187. A more detailed study of the probabilistic model leads to a Poisson behaviour for the number of integers which are not covered;
208
J-M. Deshouillers, H.J.J. te Riele, and Y. Saouter
in this model, for k ≈ 148 in the case when l = 108 (and k ≈ 191 when l = 109 ) the probability to cover the whole interval [a, b] is close to 1/2. However, this does not agree with our experimental observations described in the next sections. Although a sort of statistical quasi-independence seems a natural hypothesis, the uniform distribution of primes is definitely not a decent one. A first lack of uniformity comes from the rarification of the primes (the local density of primes around x decreases when x increases). Considering only large primes, for example those between 107 and 108 to cover an interval of length 9 ∗ 107 , leads to the value k ≈ 150; this is in good agreement with the experimental mean value of the observed k’s (cf. Section 5.1). A second and more important lack of uniformity is of arithmetical nature. Let us choose a small prime r and consider the Goldbach decomposition of all the even numbers in [a, b] which are coprime with R = 3.5 . . . r. For each large q (prime, so coprime with R), all the primes p ∈ P which satisfy (p + q, R) > 1 cannot be used to represent our numbers. The number of admissible classes of primes is thus (3 − 2)(5 − 2) . . . (r − 2) and the proportion of useful primes in P is thus (3−2)(5−2)...(r−2) (3−1)(5−1)...(r−1) . So, for each prime q, the set P + q contains about (3−2)(5−2)...(r−2) π(l) (3−1)(5−1)...(r−1)
different even numbers and so the proportion of our even numbers in [a, b] which are covered in one step is (3 − 1)(5 − 1) . . . (r − 1) l (3 − 2)(5 − 2) . . . (r − 2) π(l) , (3 − 1)(5 − 1) . . . (r − 1) 3.5 . . . r 2 i.e., 2
Y 3≤s≤r s prime
1−
1 (s − 1)2
π(l) π(l) = C(r) . l l
By the same reasoning as above, we expect k to be close to the solution of 2R )k = φ(R)l . For r = 97 and l = 108 , this leads to k ≈ 206 and for (1 − C(r) π(l) l l = 109 we find k ≈ 270. This agrees well with our experiments and this implies, as one may expect, that for the even numbers in [a, b] which are not coprime with R = 3.5 . . . r, it is easier in general to find a Goldbach decomposition than for those which are coprime with R. Again, if we improve this model by the Poisson probabilistic consideration and the rarification of the primes, we are led to k ≈ 214 when l = 108 , which is, here also, in good agreement with the experimental data of Section 5.1. This probabilistic reasoning will be developed in a forthcoming paper.
4
Computations which Extend G2 from 4 × 1011 to 1014
We have adopted Shen’s approach, described in Section 2, to extend the binary Goldbach conjecture as far as possible beyond the known bound of 4 × 1011 . The intervals [a, b] were chosen to have a length of 108 or 128 × 106 or 109 . The largest possible prime one needs in the set P lies close to b − q1 . By the
New Experimental Results Concerning the Goldbach Conjecture
209
prime number theorem, q1 ≈ a − k log a, so that b − q1 ≈ b − a + k log a. As maximum values of k we found in our experiments that k = 430 was sufficient. For a ≈ 1014 this implies that the largest prime in the set P must have a size of at least 109 + 1.4 × 104 for b − a = 109 . In our actual implementation we have chosen P to contain the odd primes up to 108 + 105 in the case b − a = 108 , and those up to 109 + 106 in the case b − a = 109 . For the actual generation of the primes close to a we have used Jaeschkes computational results [9], stating that if a positive integer n < 215, 230, 289, 8747 is a strong pseudoprime with respect to the first five primes 2, 3, 5, 7, 11, then n is prime; corresponding bounds for the first six and seven primes are, respectively, 3,474,749,660,383 and 341,550,071,728,321. Initially, both the second and the third author have checked the BGC up to 1013, independently, on a Cray C916 vector computer resp. on an SGI compute server with 18 R10000 CPUs. After learning about each other’s results, they decided to work together to reach the bound 1014 . The second author has checked the BGC on the intervals x ×1013 for x = [2, 4], [6, 8], [9, 10] and the third author those for x = [1, 2], [4, 6], [8, 9]. 4.1
Experiments on the Cray C916 Vector Computer
The second author has implemented Shen’s algorithm on a Cray C916 vector computer as follows. With the large set of odd primes P we associate a long bit-array called ODD, in which each bit represents an odd number < 109 + 106 , the bit being 1 if the corresponding odd number is prime, and 0 if it is composite. With Fi we associate a similar bit-array called SIEVE, having the same length as ODD. The first bit of SIEVE represents the even number q1 + 3, the second bit q1 + 5, and, in general, bit i represents the even number q1 + 2i + 1. Initially, ODD is copied into SIEVE, making bit i of array SIEVE equal to 1 if 2i+1 is a prime, indicating that q1 + 2i + 1 can be written as sum of the two primes q1 and 2i + 1. Now array SIEVE represents the set F1 . In the second step, array SIEVE is “or”-ed with a right-shifted version of array ODD, where the shift equals (q2 − q1 )/2. It is easy to see that now array SIEVE represents the set F2 = F1 ∪ (P + q2 ). In general, Fi+1 is generated from Fi by doing an “or” operation between array SIEVE and array ODD, right-shifted with shift (qi+1 − q1 )/2. Of course, these steps can be carried out very efficiently on the Cray C916. We compressed 64 bits into one word and vectorized the “or” operations. Checking whether all the bits of array SIEVE have become 1 is only done when the chance of occurrence of this event has become sufficiently large (after 170 steps, in our program). As soon as the number of 0-bits has dropped below 4, the remaining “stubborn” even numbers are listed in order to “see” some intermediate output. In one typical run, we handled 1000 consecutive intervals of length 109 . Close to 1014 the time to generate 1000 × 430 large primes was about 5000 CPU– seconds, and the total sieving time was about 13, 200 seconds. The average (over 1000 consecutive intervals) number of steps in each run varied between 269 and 271 with standard deviation between 18 and 20. The total (low priority) CPU
210
J-M. Deshouillers, H.J.J. te Riele, and Y. Saouter
time used to cover the intervals [4 × 1011, 1013], [2 − 4] × 1013, [6 − 8] × 1013, and [9 − 10] × 1013 was approximately 75 CPU–hours for generating the large primes, and 225 CPU–hours for the sieving. The latter means that in the sieving part an average of 3.2 × 108 64-bit words per CPU-second were “or”-ed. The largest number of large primes which we needed was 413: for e = 33, 836, 446, 494, 106 and first prime q1 = 33, 835, 999, 990, 007 it turned out that e − qi is composite for i = 1, . . . , 412, and prime for i = 413 (q413 = 33, 836, 000, 002, 499 and e − q413 = 446, 491, 607). 4.2
Experiments on the SGI Compute Server with 18 R10000’s
The algorithm as implemented by the third author on the SGI workstation is very close to the one of the Cray C916 as described in Section 4.1. Prime numbers up to 128 × 106 are represented into a binary array, that we call again ODD, of one million 64 bits long entries: the j-th bit of the i-th element of the array is equal to 1 if and only if 128 ∗ i + 2 ∗ j + 3 is prime. Similarly another array of the same size, corresponding to the array SIEVE of the previous section, is used to note decomposed numbers: the j-th bit of the i-th element of this latter array is equal to 1 if and only if 128 ∗ i + 2 ∗ j + seed is decomposable as sum of two prime numbers, where seed denotes the even integer at which the phase begins. At this point, the task of the program is to fill all the entries of SIEVE with the greatest 64 bits word i.e. 264 − 1. The program searches for the least entry i for which the value of SIEVE[i] is not maximum and then searches for the least bit j of this entry not being equal to 1. Thus, the number 128 ∗ i + 2 ∗ j + seed has still not been written as sum of two primes. The program then searches for the least value k for which 128 ∗ (i − k) + seed − 3 and 128 ∗ k + 2 ∗ j + 3 are both prime. When such a k value is found the array SIEVE beginning at the entry i can be combined with the array ODD beginning at the entry k with an or operation as previously. Having a step size of 128 in the search of prime numbers does not change the density of expected prime numbers and has the advantage of avoiding the shift of the array ODD. At last, in order to gain efficiency, the addressing of the array SIEVE was done through a chained list: this list contains only values i for which SIEVE[i] is not maximal. Then after each or operation, the resulting value is compared with 264 −1 and if there is equality, the corresponding index is removed from the chained list. Thus the size of the array decreases when time elapses and globally no useless or operation is made. The drawback is that addressing has to be done by indirect pointer redirection and this slows down the program at the beginning of the execution. Versions with and without linked chain implementation were tested on a DECSTATION 3100 with various word sizes and various sizes for the arrays ODD and SIEVE. The gain of the linked chain version appeared to be maximal for arrays with a length of 1.5 × 106 words of 32 bits, with a factor of 1.59. Later, some comparisons were made with a version with prime entries up to 109 . The ratio of time executions was 0.82 to the benefit of the latter versions. Some other improvements were not implemented, e.g. anticipating the decompositions in the block of even numbers
New Experimental Results Concerning the Goldbach Conjecture
211
following the one of the current array SIEVE, when indices go out of the range of this latter array. Typical runs consisted of checking 1350 consecutive intervals of even integers of length 128 ∗ 106 with one run on each of the 18 R10000 processors of the SGI workstation. Seven such runs were necessary to deal with an interval of 2.1013. Intervals that were checked are [1013, 2.1013], [4.1013, 6.1013], and [8.1013, 9.1013]. A total number of 324 runs was necessary to complete this whole task. User CPU times for the various runs varied from 10 hours 33 mns for the run beginning at 9,158,401,000,000 and ending at 9,331,201,000,000, up to 17 hours 12 mns for the run from 2,937,601,000,000 up to 3,110,401,000,000. The total sequential time was 4083 hours 38 mns and so the real time, which is about 18 times smaller, was about 227 hours. Those times include the search for primes and the sieving. The number of prime numbers needed to verify the decomposition of 64 ∗ 106 even consecutive integers varies from 160 for the intervals beginning at 16,182,785,000,000 and 53,917,312,000,000, up to 184 for the interval beginning at 145,793,000,000. When testing on intervals of length 109 , the average number of prime numbers grows up to 218.
5
Checking BGC Near High Powers of Ten
Apart from extending G2 , we have also checked the binary Goldbach conjecture on intervals of length 108 and 109 near high powers of ten. The second author has checked the intervals [105i , 105i + 108 ], for i = 3, 4, . . . , 20, and the third author has checked the intervals [1010i, 1010i + 109 ], for i = 20, 21, . . . , 30. 5.1
The Intervals [105i , 105i + 108 ], for i = 3, 4, . . . , 20
For each interval [B, B + 108 ] the largest 300 primes ≤ B were generated. Here, the results of Jaeschke could not be used anymore because the numbers were too large. Instead, we first generated the 300 largest numbers ≤ B which pass a strong pseudo-prime test for one randomly selected base, and next we proved primality of these numbers with a program developed by H. Cohen, A.K. Lenstra, and D.T. Winter [5]: all these numbers turned out to be prime. For the set P we took the odd primes below 108 + 106 . The sieving technique was the same as that used on the Cray C916 for the even numbers up to 1014 . A selection of the results are given in Table 1. The second column gives the value of (q300 − q1 )/(299 log 10) which should be close to log10 B, according to the Prime Number Theorem. It illustrates that the local behaviour of the primes may deviate considerably from the known global behaviour. The average number of steps needed (over the 18 intervals considered) was 217, with standard deviation 23. For a uniform distribution of bits in array ODD (instead of the distribution induced by the primes) the average number of steps was 152, with standard deviation 9. This agrees well with the expected number of steps (214 in the case of primes and 150 in the case of uniform distribution) mentioned in Section 3.
212
5.2
J-M. Deshouillers, H.J.J. te Riele, and Y. Saouter
The Intervals [1010i , 1010i + 109], for i = 20, 21, . . . , 30
Again the SGI compute server was used to make a similar implementation. For an interval of the form [B, B +109 ], as in the implementation for decompositions up to 1014 , the even numbers were represented as bits in an array of 7812500 64–bit words. The sieving technique was the same as previously and also used chained lists. However, because of the size of the numbers, again Jaeschke’s results could not be used to establish primality. Instead of that, we passed candidate numbers through Miller-Rabin pseudo-primality tests for the bases 2, 3, 5 and 7 after a quick trial division sieve. The implementation of this phase was made with the PARI system. In a second phase we certified primality of these numbers by the Elliptic Curve Primality Prover program of Fran¸cois Morain [1,11]. On one R10000 node, the CPU times for the C version of ECPP, which the third author had to his disposal, varied between 4 minutes for numbers of 200 decimal digits and 60 minutes for numbers of 300 decimal digits. As a comparison, the primality of some of these numbers was proved by Cohen, Lenstra and Winter’s program [5] (for numbers up to 220 decimal digits; the average CPU-time was two minutes per number on an 180 MHZ IP32 SGI workstation) and by a program of Bosma and Van der Hulst [2] (for numbers larger than 220 decimal digits; the average CPU-time was seven minutes per number on the same 180 MHZ IP32 SGI workstation2 ). The number of prime numbers required to verify the BGC on an interval of length 109 was in fact nearly stable, varying from 222 up to 231 for the considered intervals with an average value of 225. Table 2 summarizes the results.
2
The CPU-time asked by this program grows with the size of the prime number, but in a very erratic way.
New Experimental Results Concerning the Goldbach Conjecture
213
Table 1. Checking the Goldbach conjecture on the intervals [B, B + 108 ], for B = 1015 , 1020, . . . , 10100. Notation [B, B + 108 ]: the interval on which the Goldbach conjecture is verified; q1 , . . . , q300 : the largest 300 consecutive primes ≤ B, generated on an SGI workstation with a 100 MHZ IP22 processor; : the sum of the CPU-times in minutes spent to generate Tpr the largest 300 strong pseudo-primes < B (which pass a strong pseudo-prime test for a randomly chosen base) with the PARI package, and to prove primality with a Fortran/C code based on the Cohen-Lenstra primality proving algorithm (all the strong pseudo-primes turned out to be prime); : the smallest positive integer such that for each even number N1 e ∈ [B, B + 108 ] there is an index i with 1 ≤ i ≤ N1 such that e − qi is a prime number; W : the “worst case” in the Goldbach check of the interval [B, B + 108 ], i.e., W − qi is composite for i = 1, 2, . . ., N1 − 1, but prime for i = N1 .
log10 B
q300 −q1 299 log 10
15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
16.2 19.8 24.6 31.9 34.1 37.2 51.5 45.3 56.8 58.9 66.4 72.6 80.7 82.6 76.8 95.6 92.9 99.0
B − q1 B − q300 Tpr N1 W − B 11159 13611 17063 21941 23477 25649 35469 31247 39183 40721 45951 50093 55779 56907 52919 65981 64011 68969
11 11 123 11 23 17 9 57 111 161 269 93 191 11 27 143 53 797
1.9 243 87831838 2.4 210 40249602 3.0 240 91143618 4.2 216 70421718 5.7 182 84348372 6.0 202 61919718 8.0 283 80017866 11 198 84955228 16 218 88062574 25 210 68370894 32 193 80085838 43 210 56324104 53 224 31058458 65 206 24403128 82 203 45500944 94 209 70588714 122 207 88980634 150 250 41229036
214
J-M. Deshouillers, H.J.J. te Riele, and Y. Saouter
Table 2. Checking the Goldbach conjecture on the intervals [B, B + 109 ], for B = 10200 , 10210, . . . , 10300. Notation [B, B + 109 ]: : q1 , q2, ... : Nq : Tgen
Tpp
W
the interval on which the Goldbach conjecture is verified; the list of prime numbers needed to verify BGC on [B, B + 109 ]; the cardinality of the previous set; the time needed to sieve the interval and to generate the Nq strong pseudo-primes with the PARI package on a single node of the SGI; : the total sequential time spent by ECPP to prove primality of the Nq pseudoprimes (this task was in fact distributed on all nodes of the compute server); : the “worst case” for the verification of BGC in the interval, i.e. W − qi is composite for i = 1, 2, ..., Nq − 1 but prime for i = Nq . log10 B Nq 200 210 220 230 240 250 260 270 280 290 300
qNq −q1 64.Nq . log 10
224 222 222 228 228 231 226 223 225 226 227
30281.7 30559.2 30557.9 29754.9 29763.3 29381.5 30026.2 30418.1 30151.7 30023.9 29885.3
log10 B 200 210 220 230 240 250 260 270 280 290 300
B − q1 qNq − B
Tgen
1 1 1 1 1 2
h h h h h h
33 35 44 48 55 11 16 13 45 39 04
W −B
97283 999497853 999786382 243203 999503869 999686796 177539 999530109 999620578 112643 999634941 999983752 191747 999836541 999872854 701699 999488509 999991806 174467 999837181 999864924 112643 999503997 999697006 32003 999714813 999837064 115331 999819517 999872646 32771 999692157 999821434
mn mn mn mn mn mn mn mn mn mn mn
Tpp 52 56 47 43 32 38 42 02 09 24 44
s 14 s 39 s 48 s 62 s 72 s 91 s 104 s 120 s 98 s 177 s 219
h h h h h h h h h h h
20 20 45 57 20 10 25 35 31 37 54
mn mn mn mn mn mn mn mn mn mn mn
02 31 05 39 47 43 31 28 15 48 59
s s s s s s s s s s s
New Experimental Results Concerning the Goldbach Conjecture
215
References 1. 2. 3. 4. 5. 6.
7.
8. 9. 10. 11.
12. 13. 14. 15. 16. 17. 18.
19.
A.O.L. Atkin and F. Morain. Elliptic curves and primality proving. Mathematics of Computation, 61:29–68, 1993. Wieb Bosma and Marc-Paul van der Hulst. Primality proving with cyclotomy. PhD thesis, University of Amsterdam, December 1990. J.R. Chen and T.Z. Wang, On the odd Goldbach problem, Acta Math. Sinica 32 (1989), pp. 702–718 (in Chinese). J.R. Chen and T.Z. Wang, On odd Goldbach problem under General Riemann Hypothesis, Science in China 36 (1993), pp. 682–691. H. Cohen and A.K. Lenstra, Implementation of a new primality test, Math. Comp. 48 (1987), pp. 103–121. J-M. Deshouillers, G. Effinger, H. te Riele and D. Zinoviev, A complete Vinogradov 3-primes theorem under the Riemann hypothesis, Electronic Research Announcements of the AMS 3 (1997), pp. 99–104 (September 17, 1997); http://www.ams.org/journals/era/home-1997.html . A. Granville, J. van de Lune and H.J.J. te Riele, Checking the Goldbach conjecture on a vector computer, Number Theory and Applications (R.A. Mollin, ed.), Kluwer, Dordrecht, 1989, pp. 423–433. G.H. Hardy and L.E. Littlewood, Some problems of ’Partitio Numerorum’; III: On the expression of a number as a sum of primes, Acta Math. 44 (1922/3), pp. 1–70. G. Jaeschke, On strong pseudoprimes to several bases, Math. Comp. 61 (1993), pp. 915–926. ˇ L. Kaniecki, On Snirelman’s constant under the Riemann hypothesis, Acta. Arithm. 72 (1995), pp. 361–374. Fran¸cois Morain. Courbes Elliptiques et Tests de Primalit´ e. PhD thesis, L’Universit´e Claude Bernard, Lyon I, September 1990. Introduction in French, body in English. ˇ O. Ramar´e, On Snirel’man’s Constant, Ann. Scuola Norm. Sup. Pisa 22 (1995), pp. 645–706. Yannick Saouter, Checking the odd Goldbach conjecture up to 1020 , Math. Comp., 67 (1998), pp. 863–866. L. Schoenfeld, Sharper Bounds for the Chebyshev Functions θ(x) and ψ(x). II, Math. Comp. 30 (1976), pp. 337–360. Mok-Kong Shen, On Checking the Goldbach conjecture, BIT 4 (1964), pp. 243–245. M.K. Sinisalo, Checking the Goldbach conjecture up to 4 · 1011 , Math. Comp. 61 (1993), pp. 931–934. M.L. Stein and P.R. Stein, Experimental results on additive 2 bases, Math. Comp. 19 (1965), pp. 427–434. I.M. Vinogradov, Representation of an odd number as a sum of three primes, Comptes Rendues (Doklady) de l’Acad´emie des Sciences de l’URSS, 15 (1937), pp. 291–294. D. Zinoviev, On Vinogradov’s constant in Goldbach’s ternary problem, J. Number Th. 65 (1997), pp. 334–358.
Dense Admissible Sets Daniel M. Gordon and Gene Rodemich Center for Communications Research 4320 Westerra Court San Diego, CA 92121 {gordon,gene}@ccrwest.org
Abstract. Call a set of integers {b1 , b2 , . . . , bk } admissible if for any prime p, at least one congruence class modulo p does not contain any of the bi . Let ρ∗ (x) be the size of the largest admissible set in [1, x]. The Prime k-tuples Conjecture states that any for any admissible set, there are infinitely many n such that n+b1 , n+b2 , . . . n+bk are simultaneously prime. In 1974, Hensley and Richards [3] showed that ρ∗ (x) > π(x) for x sufficiently large, which shows that the Prime k-tuples Conjecture is inconsistent with a conjecture of Hardy and Littlewood that for all integers x, y ≥ 2, π(x + y) ≤ π(x) + π(y). In this paper we examine the behavior of ρ∗ (x), in particular, the point at which ρ∗ (x) first exceeds π(x), and its asymptotic growth.
1
Introduction
The Prime k-tuples Conjecture states that for any set {b1 , b2 , . . . , bk } of integers which do not cover all congruence classes modulo any prime, there are infinitely many integers n such that n + b1 , n + b2 , . . . , n + bk are all prime. Call such a set admissible. Let ρ∗ (x) be the size of the largest admissible set in [1, x]. It is known that x (1) π(x) + (log 2 − o(1)) 2 ≤ ρ∗ (x) ≤ 2π(x). log x The lower bound is due to Hensley and Richards [3], and the upper bound was shown by Montgomery and Vaughn [6], using the large sieve. The main interest in ρ∗ (x) is that the Hensley and Richards result shows that the widely believed Prime k-tuples Conjecture is inconsistent with a conjecture of Hardy and Littlewood that for all integers x, y ≥ 2, π(x + y) ≤ π(x) + π(y).
(2)
Let ρ(x) = lim supy→∞ π(x + y) − π(y). Then (2) is equivalent to ρ(x) ≤ π(x) for x ≥ 2. The Prime k-tuples Conjecture implies ρ∗ (x) = ρ(x). By finding dense admissible sets, Hensley and Richards [3] showed that the Prime k-tuples Conjecture implies sets of primes in short intervals (y, x+y] which are denser than the π(x) primes in [1, x]. They construct their sets by sieving out by congruence J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 216–225, 1998. c Springer-Verlag Berlin Heidelberg 1998
Dense Admissible Sets
217
classes a1 mod 2, a2 mod 3, . . ., trying to leave as many survivors as possible. When they have sieved by all primes up to x, the set of survivors is clearly admissible. Hensley and Richards showed that ρ∗ (x) > π(x) for x = 20, 000. Unfortunately, the smallest prime k-tuple will typically be of size about k k , so actually finding such a k-tuple seems hopeless. Computations by Selfridge showed that ρ∗ (x) < π(x) for x ≤ 500. Jarvis [4] computed ρ∗ (x) for x ≤ 1050, and showed that ρ∗ (x) < π(x) for x ≤ 1120. In Section 3 we describe extending those computations, in particular computing ρ∗ (x) for x < 1631, and finding that ρ∗ (1417) = π(1417). Hensley and Richards showed that one particular sieve, the midpoint sieve, resulted in an admissible set giving the lower bound in (1). Another sieve, suggested by Schinzel, gives a better bound, but depends on a conjecture concerning how far the sieve has to go before the set is admissible. We will discuss these and other sieves in the next section. The question of extreme behaviors of sieving is of independent interest. Here we are concerned with saving sieves, which leave as many survivors as possible. The problem of killing sieves, where we want to eliminate all of [1, x] using as few primes as possible, is related to Jacobstahl’s function. The Jacobsthal function j(n) is the maximal gap between consecutive integers relatively prime to n. Maier and Pomerance [5] define j (n) to be the largest x for which a sieve by the factors of n eliminate all integers in [1, x] . Then j (n) = j(n) − 1 by the Chinese Remainder Theorem. Let p. P (x) = p≤x
Then j (P (x)) is the largest interval which can be completely sieved out by primes up to x. Maier and Pomerance show j (P (x)) ≥ (c0 eγ + o(1))x log x log log log x(log log x)−2 ,
(3)
where c0 = 1.312 . . . is the solution of 4/c0 − e−4/c0 = 3. Pintz [7] recently improved the constant from c0 to 2. To rephrase this in a way more convenient for our purposes, let T (x) be the smallest number t such that there is a sieve by primes up to t which sieves out [1, x] . Then from (3) we have T (x) ≤ O(x(log log x)2 / log x log log log x) Maier and Pomerance conjecture that j (P (x)) = x(log x)2+o(1) . This would imply (4) T (x) ≈ x/(log x)2+o(1) . The killing sieve that Maier and Pomerance use to establish (3) is the same as the one suggested by Schinzel for a saving sieve. In both cases we sieve by
218
Daniel M. Gordon and Gene Rodemich
1 mod p for small primes, 0 mod p for “medium-sized” primes, and optimally modulo large primes. This strategy may not result in the best possible killing or saving sieve, but it does yield a set which can be analyzed using standard sieve methods.
2 2.1
Sieve Strategies The Sieve of Eratosthenes
The sieve of Eratothenes is an obvious starting point. Unchanged, it is not a good saving sieve, since it covers all points in [1, x] . However, it is easy to turn it into a saving sieve by stopping the sieve when p > x/2. For all larger primes the congruence classes containing [x/2] and [x/2] + 1 will each contain only a single point in [1, x] , and one of these integers had to be eliminated by the sieve modulo two. The survivors of the sieve are the primes greater than x/2. Changing over to a greedy strategy earlier will do better, √ but will clearly be worse than π(x) if the sieve of Eratothenes is done up to x. We can stop the sieve significantly earlier than x/2. It is obvious that once p is greater than the number of survivors, some congruence class will be completely covered, and the sieve can be halted. Let t(z) be the inverse function of T (x), i.e. the largest x such that [1, x] can be sieved out by primes p < z. Hensley and Richards ([3], Lemma 5*) show Lemma 1. There is an x0 such that for x > x0 , the survivors of any sieve by primes up to x/t(log x/2) are admissible. Hensley and Richards only needed T (x) = o(x), to show that their sieve could stop at x/N log x for any N > 0. If (4) is true, then a sieve by primes up to x/ log x(log log x)2+o(1) will be admissible. In Section 2.4 we give a heuristic argument that the survivors of almost all sieves by primes up to cx/ log2 x will be admissible for any c > 2. 2.2
Random and Greedy Sieves
A random sieve is easy to analyze. Sieving by a random congruence modulo p will on average eliminate 1/p of the remaining integers. Thus the expected number of survivors is x x ≈ 0.5614 . x (1 − 1/p) ≈ e−γ log x log x p<x by Mertens’ Theorem. One might hope that a greedy algorithm would do better. Instead of choosing a congruence class at random, choose the best possible for each p. For small primes, say p < log x, the distribution of survivors over congruence classes modulo p is very flat, by standard sieve arguments. For larger primes it becomes less regular, so larger improvements over the random sieve can be obtained.
Dense Admissible Sets
219
Let g(x) be the size of the admissible set in [1, x] generated by the greedy algorithm. In case of ties, we pick the first congruence class which eliminates the fewest possible survivors (other choices do not affect the behavior much). Figure 1 shows g(x) − π(x) for x < 181, 000. The first x where g(x) equals π(x) is g(11046) = 1337, and g(x) is greater than π(x) at g(11916) = 14294. It seems likely that lim sup(g(x) − π(x)) → ∞, and lim inf(g(x) − π(x)) → −∞.
x Fig. 1. g(x) − π(x)
2.3
The Midpoint Sieve
Hensley and Richards use the fact that primes are denser in [−x/2, x/2] than in [1, x] . Sieving [−x/2, x/2] by primes up to x/N log x for any N > 0 and x sufficiently large, the number of survivors is 2π(x/2) − 2π(x/N log x) ≈ π(x) + (log 2 − (N ))(x/ log2 x) by the sharp form of the Prime Number Theorem. The resulting set of survivors will be admissible by Lemma 1.
220
2.4
Daniel M. Gordon and Gene Rodemich
Schinzel’s Sieve
It is hard to find other sieve strategies that can be analyzed. One notable one is used in analyses of Jacobstahl’s function and prime gaps as well. In this context, it was suggested by Schinzel (see [3]). Choose y < z < x. The sieve is just a variation on the sieve of Eratothenes: sieve by 1 mod p for p ≤ y, and 0 mod p for y < p ≤ z. Hensley and Richards show that for y fixed, m = π(y), and z = x/N log x(log log x)m , the number of survivors will be x r log r π(x) + (1 + o(1)) 2 . log x r 2. Call an integer y-smooth if all its prime factors are at most y. The survivors will be the union R(1) ∪ R(2) , where R(1) is the set of integers in (0, x] of the form mp, where p > z is prime, m is y-smooth and mp − 1 is relatively prime to P (y), and R(2) is the set of y-smooth integers in (0, x]. We will ignore the smaller set R(2) , which has size O(x ) for any > 0, and show that R(1) has the conjectured cardinality. We have Rm , R(1) = m≤x/z
where Rm = {mp : z < p ≤ x/m, (mp − 1, P (y)) = 1}, and the prime indicates that the union (and sums below) are over y-smooth integers. By the Siegel-Walfisz Theorem on primes in arithmetic progressions, x r − 2 1 + O(log−A x) |Rm | = π m r−1 r≤y
r | m
Dense Admissible Sets
221
for any fixed A > 0. Since π(x/m) = π(x) 1/m + log m/(m log x) + O(log−2 x) , we have
|Rm | = π(x)
m≤x/z
r − 2 1 1 log m + +O r−1 m m log x log2 x
m≤x/z r≤y
r | m
survivors. As in (**) of [3], using estimates for the number of y-smooth numbers less than x/z = O(log2 x) (see [2]), this becomes ∞ 1 log m r−2 1 π(x) + +o r − 1 m m log x log x m=1 r≤y = π(x)
r | m
r≤y
∞
∞
log ra r−2 1 + + r − 1 a=1 ra a=1 ra log x
+o
1 log x
1 +o = π(x) log x r≤y r log r 1 1 +o = π(x) 1 + log x (r − 1)2 log x r≤y log y (1 + o(1)) = π(x) 1 + log x x log log log x . = π(x) + (1 + o(1)) log2 x
∞ 1 a log r 1+ log x a=1 ra
The problem is that we cannot show that this set is admissible. A heuristic argument, assuming that the survivors are distributed more or less randomly among congruence classes modulo larger primes, indicates that it should be. Each prime p > z has p > cx/ log2 x congruence classes, and s < 2x/ log x survivors (by [6]) are being put in them. If survivors were randomly distributed, then the probability that some congruence class is empty is e−λ , where λ = pe−s/p > x(c−2)/c− (see, for example, Section IV.2 of [1]). For c > 2, the probability that all congruence classes are covered for any prime p > z is o(1). The experimental data is not very helpful (see Figure 2), since asymptotics do not take over until x is quite large. The crossover point for this sieve with y = 2 is 904,036, and with larger y much larger.
222
Daniel M. Gordon and Gene Rodemich
x Fig. 2. Difference between y = 2 sieve and π(x)
3
Computing ρ∗ (x) Efficiently
The only way to compute ρ∗ (x) seems to be to exhaust over residues modulo 2, 3, . . ., looking for sets of residues with a large number of survivors. It is possible to add various tricks to make this search more efficient, greatly speeding up the search. We start by doing a sieve modulo small primes. By the Chinese Remainder Theorem, looking at survivors in [1, x] of sieves by a1 mod 2, a2 mod 3, . . ., ak mod pk , for all residue classes modulo each prime, is equivalent to looking at all integers in (y, x + y] for 0 ≤ y < P (pk ) which are relatively prime to P (pk ). Thus we can divide up work into two parts: first sieve by primes up to (say) 29 in [1, P (29)], and look for intervals with enough survivors to improve the current bound on ρ∗ (x). For each such interval, exhaust over residue classes modulo 31, 37, . . ., until either the number of survivors is less than the current bound on ρ∗ (x), or we reach a prime larger than the number of survivors, in which case the survivors are admissible. A number of simple theorems about ρ∗ (x) may be used to speed the search. If {s1 , . . . , sl } ⊂ [1, x] is an admissible set, so is {x+1−s1, . . . , x+1−s1}. Thus, we only need to sieve in [1, P (29)/2], cutting the work in half (x numbers should be added on each side of the interval to avoid problems with the boundaries). Since ρ∗ (x) = ρ∗ (x − 1) for x even, we only need to look at odd values of x. Also, since ρ∗ (x − 2) ≤ ρ∗ (x) ≤ ρ∗ (x − 2) + 1
Dense Admissible Sets
223
we can stop as soon as we find an improvement. We only need to look at intervals (y, x + y] where the endpoints y + 1 and x + y are survivors. If either one was not, then the interval without that endpoint would have been checked before. This eliminates a large fraction of the work. One further theorem is of great use: Theorem 1. If ρ∗ (x + 2) > ρ∗ (x) > ρ∗ (x − 2), then x ≡ 1 mod 3. Proof. Consider the optimal sieve on [1, x + 2]. As mentioned above, 1 and x + 2 must be survivors, or else we would have ρ∗ (x) = ρ∗ (x + 2). We also have that 3 and x are survivors, or else the interval [5, x + 2] (respectively [1, x − 2]) would give us ρ∗ (x − 2) = ρ∗ (x + 2) − 1. The only way that 1, 3, x and x + 2 can all be survivors is if we are sieving out by 2 mod 3, and x ≡ 1 mod 3. This allows us to skip x + 2 whenever ρ∗ (x) > ρ∗ (x − 2) and x ≡ 1 mod 3. If x ≡ 1 mod 3, the search is still greatly sped up by the requirement that 3 and x must be survivors. Even so, as x increases, the work becomes formidable. Finding ρ∗ (x) for all x becomes impractical, and it is quicker to just look for possible crossover points. This is accomplished by finding ρ∗ (x) for x = pk − 2, and checking if some interval has k or more survivors. This lets us skip many values of x, and have a higher threshold for the number of survivors. This search was implemented on a Cray T3D. Parallelizing was accomplished by breaking the sieve interval into equal pieces. It seemed possible that load balancing would be a problem, if one interval took much longer than others, but this does not seem to happen. In any large computation, there is some question about whether the algorithm has been implemented correctly. If some intervals were being skipped or not handled correctly, the values of ρ∗ (x) might not be right. The original program was written by the first author in C. The second author got interested in the problem and wrote an independent search program in Fortran, which got the same answers and was significantly faster. The first crossover point, with ρ∗ (x) = π(x) is at x = 1417 (Jarvis discovered an admissible set for x = 1422 with the same cardinality). Unfortunately, the prime 1423 is followed by a prime pair 1427 and 1429, sending π(x) ahead again, where it remains for a long time. The search was continued up to x = 1663. We can push the bound for the crossover point somewhat higher, using an idea of Schinzel [8]. The inequality ρ∗ (x + y) ≤ ρ∗ (x) + ρ∗ (y)
(5)
allows us to get upper bounds for ρ∗ (x) over a larger range. Using the computed values of ρ∗ (x) and (5), we find that ρ∗ (x) ≤ π(x) for x ≤ 1731. Jarvis [4] suggested looking at local maxima of Li(x) − π(x), on the grounds that these are points where π(x) is smaller than expected, and so ρ∗ (x) has a better chance of exceeding it. One such point is x = 1423, and the next is x = 1971. Unfortunately, a search found no admissible set of length 1971 with
224
Daniel M. Gordon and Gene Rodemich
298 elements. It is possible that the crossover point is smaller than 1971, but more likely that it is larger, perhaps at the next local maximum x = 2203, which is computationally infeasible to check. Jarvis also used a combination of exhaustive search on small primes and a greedy strategy for larger primes to get an upper bound for the crossover point. He showed that ρ∗ (4930) ≥ 658 > π(4930). Figure 3 shows ρ∗ (x) − π(x) for x ≤ 1631. The two functions stay extremely close for a long time, and make it tempting to conjecture that lim
x→∞
ρ∗ (x) = 1, π(x)
but as Figure 2 indicates, extrapolating from limited data can be perilous. 0
-2
-4
-6
-8
-10 0
200
400
600
800
x
1000
1200
1400
1600
1800
Fig. 3. ρ∗ (x) − π(x)
Acknowledgment. We would like to thank John Selfridge for making us aware of the work of Jarvis [4].
References 1. William Feller. An introduction to probability theory and its applications, volume 1. Wiley, third edition, 1968.
Dense Admissible Sets
225
2. Andrew Granville. On positive inegers ≤ x with prime factors ≤ t log x. In R. A. Mollin, editor, Number Theory and Applications, pages 403–422. Kluwer, 1989. 3. Douglas Hensley and Ian Richards. Primes in intervals. Bulletin AMS, 25:375–391, 1974. 4. Norman C. Jarvis. Admissible sequences. Master’s thesis, Brigham Young University, 1996. 5. Helmut Maier and Carl Pomerance. Unusually large gaps between consecutive primes. Trans. AMS., 322:201–237, 1990. 6. H. L. Montgomery and R. C. Vaughn. The large sieve. Mathematika, 20:119–134, 1973. 7. J´ anos Pintz. Very large gaps between consecutive primes. J. Number Theory, 63:286–301, 1997. 8. A. Schinzel. Remarks on the paper ’sur certaines hypoth`ese concernant les nombres premiers’. Acta Arith., pages 185–208, 1958.
An Analytic Approach to Smooth Polynomials over Finite Fields Daniel Panario1 , Xavier Gourdon2 , and Philippe Flajolet2 1
Department of Computer Science, University of Toronto M5S 3G4, Toronto, Canada [email protected] 2 Algorithms Project, INRIA Rocquencourt F-78153 Le Chesnay, France [email protected], [email protected]
Abstract. We consider the largest degrees that occur in the decomposition of polynomials over finite fields into irreducible factors. We expand the range of applicability of the Dickman function as an approximation for the number of smooth polynomials, which provides precise estimates for the discrete logarithm problem. In addition, we characterize the distribution of the two largest degrees of irreducible factors, a problem relevant to polynomial factorization. As opposed to most earlier treatments, our methods are based on a combination of exact descriptions by generating functions and a specific complex asymptotic method.
1
Introduction
The security of many applications in public-key cryptography relies on the computational intractability of finding discrete logarithms in finite fields. Examples are the Diffie-Hellman key exchange scheme [7], El Gamal’s cryptosystem [8], and pseudorandom bit generators [3,10]. On the other hand, algorithms for computing discrete logarithms in finite fields depend on finding polynomials with all of their irreducible factors with degree not greater than certain bound m — such polynomials that are the analogue of highly composite numbers are called smooth polynomials. Thus quantitative characterizations of smoothness in random polynomials over finite field are of relevance to cryptographic attacks; see [14,15,16]. In different contexts, like computer algebra and error-correcting codes, knowledge of the distribution of the largest irreducible factor of a random polynomial over a finite field permits us a fine tuning of the stopping conditions in polynomial factorization algorithms. In this paper, we give a unified treatment of the asymptotic enumeration of smooth polynomials over finite fields and quantify precisely the distribution of largest irreducible factors. The results are expressed in terms of a familiar number-theoretic function, the Dickman function, that is already known to underlie the study of numbers with no primes larger than m; see [5,6]. Our approach starts with an exact representation of enumeration problems by means J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 226–236, 1998. c Springer-Verlag Berlin Heidelberg 1998
An Analytic Approach to Smooth Polynomials over Finite Fields
227
of combinatorial generating functions. From there, we develop dedicated contour integration methods that are in the spirit of analytic number theory but have quite a different technical flavour since power series are used instead of Dirichlet series. Such an approach is of general applicability and Gourdon [11] introduced it in order to study the size of the largest cycle in random permutations (where nonconstructive Tauberian methods had been previously used), as well as largest components in several decomposable combinatorial structures, like random mappings. The results on smooth polynomials are presented in Section 2. The number of m-smooth polynomials of degree n over IFq has already been considered in the literature. Odlyzko [15] provides an asymptotic estimate when n → ∞ for the case q = 2 and n1/100 ≤ m ≤ n99/100 using the saddle point method. This generalizes to any prime power q; see [13]. Car [4] has given an asymptotic expression for this number in terms of the Dickman function, but Car’s estimates only hold for m large with respect to n, typically m > c n log log n/ log n. Finally, Soundararajan [17] completes the full range 1 ≤ m ≤ n by giving more precise boundaries. He uses the saddle point method for, log n/ log log n ≤ m ≤ 3n log log n/ log n while the cases of very small and very large m with respect to n are covered through the use of recurrences As a consequence of a large intermediate range and due to the intricate saddle point expressions, some of the quantitative estimates obtained earlier fail to be transparent. In addition, Soundararajan shows that the Dickman function approaches the number of smooth polynomials when √ 1/k m ≥ n log n. We extend this range to m ≥ (1 + ε) (log n) , for a positive integer constant k. The methods we introduce here follow a clear thread that enables us to expand the range where the Dickman function approximates the number of smooth polynomials. For instance, it can be applied to the enumeration of “semismooth” polynomials over finite fields that are defined by constraints on the degrees of several of their largest irreducible factors. (These are the equivalent for polynomials of the semismooth integers defined by Bach and Peralta [2].) We illustrate this fact by treating in some detail the joint distribution of the largest two irreducible factors, a problem that is again of relevance for polynomial factorization algorithms. Throughout this paper, we take a field IFq of fixed cardinality q; it seems possible to obtain similar results uniformly on q. Asymptotic estimates are expressed as functions of the degree n of the polynomials considered.
2
Smooth Polynomials
The Dickman function plays a central rˆ ole in our results on smooth polynomials. This classical number-theoretic function describes the distribution of the largest prime divisor of a random integer [5,6]. A survey on this topic is due to Hildebrand and Tenenbaum [12]. Our general reference for this paper is Tenenbaum’s book [18].
228
Daniel Panario, Xavier Gourdon, and Philippe Flajolet
Definition 1. The Dickman function, ρ(u), is the unique continuous solution of the difference-differential equation ρ(u) = 1 0 uρ (u) = −ρ(u − 1)
0 ≤ u ≤ 1, u > 1.
In order to prove our main result we need the following lemma that deals with the crucial technicality of approximating the required generating functions. An essential rˆole is played by the exponential integral function E that is defined as Z +∞ −s e ds. E(a) = s a Lemma 1. The remainders of the logarithm series, rm (z) =
X zk , k
k>m
are approximable in terms of the exponential integral as 1 , rm (e−h ) = E(mh) + O m where the big-Oh error term is uniform with respect to h, for <(h) > 0 and |=(h)| ≤ π. Proof. When <(h) > 0, we have ! Z +∞ X Z −h −ku e du = rm (e ) = h
k>m
+∞ −(m+1)u
h
e du = 1 − e−u
This can be written in terms of the function ψ(z) = integral as −h
rm (e
) = E(mh) + Rm (mh),
1 Rm (u) = m
Z
+∞
u
e−s
mh
1 1 ez −1 − z
Z
+∞
1/m ds. es/m − 1
and the exponential
e−s ψ
s ds, m
(1)
where we only need to observe that 1 s 1 1 s 1 1 = ψ + = ψ + . m m s/m m m s m(es/m − 1) Finally, the analyticity of ψ(z) in |z| < 2π implies that Rm (u) = O(1/m) uniformly for <(u) ≥ 0 and |=(u)| ≤ mπ. t u Theorem 1. The number of m-smooth polynomials of degree n over IFq satisfies n log n 1+O , Nq (n, m) = q n ρ m m where ρ is the Dickman function.
An Analytic Approach to Smooth Polynomials over Finite Fields
229
Proof. Let I be the collection of all monic irreducible polynomials in IFq , and |ω| the degree of ω ∈ I. The collection of monic polynomials with all irreducible factors with degree smaller than or equal to m can be symbolically written as Y Y (1 + ω + ω2 + · · ·) = (1 − ω)−1 . Sm = ω∈I, |ω|≤m
ω∈I, |ω|≤m
Let z be a formal variable. The substitution ω 7→ z |ω| gives rise to the generating function Sm (z) of m-smooth polynomials Ik m −1 Y Y 1 |ω| = . 1−z Sm (z) = 1 − zk ω∈I, |ω|≤m
k=1
In this context, the generating function of polynomials over IFq is Ik ∞ Y 1 1 . = P (z) = 1 − zk 1 − qz k=1
The number of m-smooth polynomial of degree n over IFq is given by Cauchy’s coefficient formula Z 1 dz n Sm (z) n+1 , Nq (n, m) = [z ]Sm (z) = 2πi C z where the contour C is chosen to be z = e−1/n+iθ , −π ≤ θ ≤ π. The change of variable z = e−h/n within the integral provides z n = e−1+inθ . Thus, h = 1 − inθ, and the limits of integration are (1 + niπ, 1 − niπ). Therefore, Z 1−niπ 1 1 dh −h/n Sm (e ) − . (2) Nq (n, m) = 2πi 1+niπ n e−h An equivalent expression for Sm (z) that makes explicit the singularity at z = 1/q can be obtained by taking the logarithm and inverting summations. Indeed, P [j] considering rm (z) = k>m Ik z kj , we have ! [2] [3] Y 1 rm (z) rm (z) k Ik [1] exp −rm (z) − − ··· . (1 − z ) = Sm (z) = P (z) 1 − qz 2 3 k>m
The last equality holds since Y
(1 − z )
k Ik
X
= exp
k>m
Ik log 1 − z
k>m
= exp − = exp −
X
= exp
!
! z 3k z 2k + +··· z + 2 3 ! X Ik z kj
Ik
k>m ∞ X 1 j=1
k
j
[1] (z) −rm
k
k>m
! [2] [3] rm (z) rm (z) − ··· . − 2 3
230
Daniel Panario, Xavier Gourdon, and Philippe Flajolet
Now, the estimate kIk = q k + O(q k/2 ) gives X zk z [1] = + O(q −m/2 ) rm q k
for |z| <
k>m
and, sup |z|≤1/q
[j] rm
1 z =O q q m(j−1)
1 , q
for j ≥ 2.
The estimate of the remainders rm of the logarithm given in Lemma 1 applied to Sm (z) entails −h e−E(mh)+O(1/m) e = , (3) Sm q 1 − e−h where we may disregard the error term in the exponent since it is of smaller order than the one in the statement of the theorem. Substituting this estimate in (2) yields, for µ = m n, Nq (n, m) = q n
1 2πi
Z
1+niπ
1−niπ
e−E(µh)+O(1/m) h e dh. n(1 − e−h/n )
Set ψ(z) = 1−e1 −z − 1z , that is an analytic function in |z| < 2π. We can express the above number in terms of ψ as follows. First, 1 h 1 1 h 1 1 = ψ + = ψ + . −h/n n n h/n n n h n(1 − e ) Second, 1 1 eO(1/m) = + ψ h n n(1 − e−h/n )
h 1 +O . n hm
Thus, Nq (n, m) = q n
1 2πi
Z
1+inπ
1−inπ
e−E(µh)
1 1 + ψ h n
h 1 +O eh dh. n hm
We treat separatedly the three integrals. The fact that e−E(z) is bounded in the domain <(z) ≥ 0 (see [1], § 5.1) entails that the contribution of the big-Oh term in the integral is O(log n/m). Then, an integration by parts gives also a small contribution of order O(log n/n) for the term containing ψ(h/n). Finally, we have Z 1+inπ −E(µh) log n e 1 eh dh + O . Nq (n, m) = q n 2πi 1−inπ h m We write Z 1+inπ −E(µh) Z 1+i∞ −E(µh) Z −E(µh) 1 e e e 1 h h e dh = e dh − eh dh, 2πi 1−inπ h 2πi 1−i∞ h h L
An Analytic Approach to Smooth Polynomials over Finite Fields
231
where the integration domain L is the union of the two semi-vertical lines defined by <(h) = 1, |=(h)| ≥ nπ. The last integral is O(1/n) as can be checked by partial integration. Therefore, Z 1+i∞ −E(µh) log n e 1 n h e dh + O . (4) Nq (n, m) = q 2πi 1−i∞ h m To conclude the proof, it remains to show that the above integral is ρ(n/m). The Laplace transform ρb(s) of the Dickman function satisfies (see [18], §5.4, p. 373) s ρb(s) = e−E(s) . Thus, ρ(u) =
1 2πi
Z
1+i∞
1−i∞
e−E(v) v
euv dv.
(5)
We now relate Equations (4) and (5). The change of variable µh = v in (4) implies Z 1+i∞ −E(µh) Z 1+i∞ −E(v) 1 e dv e 1 eh dh = ev/µ 2πi 1−i∞ h 2πi 1−i∞ v/µ µ Z 1+i∞ −E(v) n e 1 . evn/m dv = ρ = 2πi 1−i∞ v m The theorem follows since ρ(u) ≤ 1/Γ (u +1) for all u ≥ 0 ([18], §5.3, p. 366). u t The previous theorem shows that when m/ log n → ∞, the number of smooth polynomials is given asymptotically by the Dickman function. In the sequel, we extend the range of applicability of Theorem 1 to sublogarithmic values of m with respect to n. Note that we can restrict our attention to m < n since the case m = n corresponds to the well–known enumeration of irreducible polynomials. Theorem 2. Let m < n, and k a positive integer such that km < n and mk /logn → ∞. Then, the number of m-smooth polynomials of degree n over IFq satisfies n log n 1+O , Nq (n, m) = q n ρ m mk where ρ is the Dickman function. Proof. We use the same notation of Theorem 1, and only show the case k = 2. Using (1) as the estimate of the remainders of the logarithm, Equation (3) can be written as −h/n e−E(µh)−Rm (µh) e = . (6) Sm q 1 − e−h Integrating Rm (µh) by parts yields Rm (µh) =
1 m
Z
+∞
µh
e−s ψ
s ds m
232
Daniel Panario, Xavier Gourdon, and Philippe Flajolet
Z ∞ s 1 +∞ 0 s −s −s e + e ds −ψ ψ m m µh m µh 0 1 h h 1 + 2 e−µh ψ + O e−µh /m3 . = e−µh ψ m n m n 1 = m
Thus, R2m (µh) =
1 −2µh 2 e ψ m2
h + O e−2µh /m3 . n
Expanding e−Rm (µh) in (6), we have 1 1 h 1 e−µh h 1 e−µh 2 h 1 e−Rm (µh) = + ψ − ψ − ψ +O . h n n h m n n m n hm2 n(1 − e−h/n ) Arguments similar to the ones employed in the previous theorem lead to the conclusion that n log n 1+O . Nq (n, m) = q n ρ m m2 (In order to improve on the error estimate, it would suffice to consider successive t u terms in the expansion of e−Rm (µh) .)
3
Distribution of Largest Degrees of Factors
The distribution of the largest degree among the irreducible factors of a random polynomial over IFq underlies many problems dealing with polynomials over finite fields. An instance is in the factorization problem. The joint distribution of the [1] [2] two largest degrees Dn , Dn of the distinct factors of a random polynomial of degree n in IFq provides the halting condition for the distinct-degree factorization stage; see [9]. [1] We first investigate the distribution of the largest degree Dn which is of independent interest. The same analysis techniques are then applied in order to [1] [2] produce the joint distribution of Dn , Dn . 3.1
Largest Degree of Factors [1]
The following theorem gives a local distribution for the largest degree Dn of a random polynomial of degree n. We only sketch the proof since it is similar to that of Theorem 1. [1]
Theorem 3. The largest degree Dn among the irreducible factors of a random polynomial of degree n over IFq satisfies log n 1 m [1] +O , Pr(Dn = m) = f m n m2
An Analytic Approach to Smooth Polynomials over Finite Fields
233
where f(µ) = ρ(1/µ − 1) is a variant of the Dickman function; alternatively 1 f(µ) = 2πi
Z
1+i∞
1−i∞
e−E(µh) (1−µ)h e dh. h
(7)
Proof. The generating function of the class of m-smooth polynomials is m Y
Sm (z) =
k=1
1 1 − zk
Ik . [1]
Thus, the generating function of polynomials for which Dn = m is Lm (z) = Sm (z) − Sm−1 (z) = Sm (z) 1 − (1 − z m )Im .
(8)
The probability we are interested in is then given by the Cauchy formula Z dz [z n ]Lm (z) 1 z = L , Pr(Dn[1] = m) = m n n+1 q 2πi C q z where the contour C is chosen to be z = e−1/n+iθ , −π ≤ θ ≤ π. As in Theorem 1, the change of variable z = e−h/n within the integral gives −h/n h Z 1+niπ e 1 e [1] dh. Lm Pr(Dn = m) = 2πi 1−niπ q n Using the estimate in (3) for Sm (z) and (8), we obtain −h e−E(mh)+O(1/m) e−mh e = . Lm q 1 − e−h m The estimate of Lm in (9) yields, for µ = Pr(Dn[1] = m) =
1 1 m 2πi
Z
1+niπ
1−niπ
(9)
m n,
e−E(µh)+O(1/m) (1−µ)h e dh. n(1 − e−h/n )
A similar argument to the one employed in Theorem 1 completes the proof. u t 3.2
Joint Distribution of the Two Largest Degrees of Factors
The method used to prove the previous theorem generalizes to the joint distri[1] [2] bution of the two largest degrees Dn , Dn of distinct irreducible factors of a random polynomial of degree n in IFq [x]. In the context of the general factorization algorithm, this study appears naturally when analyzing the early-abort stopping rule during the distinct-degree factorization stage; see [9]. The joint distribution of the largest two irreducible factors is also related to semismooth polynomials. Bach and Peralta [2] define and study the asymptotics of semismooth integers. An integer n is semismooth with respect to y and z
234
Daniel Panario, Xavier Gourdon, and Philippe Flajolet
if n1 ≤ y and n2 ≤ z for ni the ith largest prime factor of n. Analogously, a polynomial f of degree n over IFq is a semismooth polynomial with respect to [1] [2] m1 and m2 , m1 ≥ m2 , if Dn ≤ m1 and Dn ≤ m2 . The next theorem provides the asymptotics for the joint distribution of the two largest degrees among the distinct irreducible factors of a random polynomial over IFq . (A similar result holds for the case when repetitions of factors are allowed.) [1]
[2]
Theorem 4. The two largest degrees Dn and Dn of the distinct factors of a random polynomial of degree n in IFq satisfy (i) for 0 ≤ m ≤ n, Pr(Dn[1] = m, Dn[2] ≤ m/2) =
1 m +O g1 m n
log n m2
,
where g1 (µ) is expressed in terms of the exponential integral E as g1 (µ) =
1 2πi
Z
1+i∞
1−i∞
e−E(µh/2) (1−µ)h e dh; h
(ii) for 0 ≤ m2 < m1 ≤ n, Pr(Dn[1] = m1 , Dn[2] = m2 ) =
m m 1 1 2 +O , g2 m1 m2 n n
log n m1 m22
,
where g2 (µ1 , µ2 ) is g2 (µ1 , µ2 ) =
1 2πi
Z
1+i∞
1−i∞
e−E(µ2 h) (1−µ1 −µ2 )h e dh. h
Proof. We only sketch the proof. With the same notations as in the proof of [1] the previous theorem, the generating function of polynomials for which Dn = m [2] and Dn ≤ m/2 is m em (z) = Sbm/2c (z) Im z . (10) L 1 − zm [n]
[n]
The generating function of polynomials with D1 = m1 and D2 = m2 , m2 < m1 is m1 em1 ,m2 (z) = Lm2 (z) Im1 z . (11) L 1 − z m1 The behavior of the nth coefficient of the generating functions in (10) and (11) is then extracted like in Theorem 3. We briefly demonstrate the process for the generating function of (11). The estimate in (9) for Lm2 (z) and (11) entails −h e−E(m2 h)+O(1/m2 ) e−m2 h e−m1 h e e = . Lm1 ,m2 q 1 − e−h m2 m1
An Analytic Approach to Smooth Polynomials over Finite Fields
Plugging this estimate in the Cauchy integral yields, for µ1 = m1 m2 Pr(Dn[1] = m1 , Dn[2] = m2 ) =
1 2πi
Z
m1 n , µ2
1+niπ −E(µ2 h)+O(1/m2 )
1−niπ
e
n(1 − e−h/n )
=
235 m2 n ,
e(1−µ1 −µ2 )h dh.
An argument once more similar to the one in Theorem 1 completes the proof. u t We note that it is possible to generalize the above theorem to the joint distribution of the jth largest distinct irreducible factors. Acknowledgements. This work was supported in part by the Long Term Research Project Alcom-IT (# 20244) of the European Union.
References 1. Abramowitz, M., and Stegun, I. Handbook of mathematical functions. Dover, New York, 1970. 2. Bach, E., and Peralta, R. Asymptotic semismoothness probabilities. Math. Comp. 65 (1996), 1701–1715. 3. Blum, M., and Micali, S. How to generate cryptographically strong sequences of pseudorandom bits. SIAM J. Comput. 13 (1984), 850–864. 4. Car, M. Th´eor`emes de densit´e dans IFq [x]. Acta Arith. 48 (1987), 145–165. 5. de Bruijn, N. On the number of positive integers ≤ x and free of prime factors > y. Indag. Math. 13 (1951), 2–12. 6. Dickman, K. On the frequency of numbers containing prime factors of a certain relative magnitude. Ark. Mat. Astr.Fys. 22 (1930), 1–14. 7. Diffie, W., and Hellman, M. New directions in cryptography. IEEE Trans. Inform. Theory 22 (1976), 644–654. 8. ElGamal, T. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Trans. Info. Theory 31 (1985), 469–472. 9. Flajolet, P., Gourdon, X., and Panario, D. The complete analysis of a polynomial factorization algorithm over finite fields. Submitted March 1998. [Extended abstract in Proc. 23rd ICALP Symp., Lecture Notes in Computer Science, vol. 1099, p. 232–243, 1996.] Full version in technical report 3370, INRIA, March 1998. 10. Gao, S., von zur Gathen, J., and Panario, D. Gauss periods: orders and cryptographical applications. Math. Comp. 67 (1998), 343–352. 11. Gourdon, X. Combinatoire, algorithmique et g´ eom´ etrie des polynˆ omes. Th`ese, ´ Ecole Polytechnique, 1996. 12. Hildebrand, A., and Tenenbaum, G. Integers without large prime factors. J. Th´eorie des Nombres de Bordeaux 5 (1993), 411–484. 13. Lovorn, R. Rigourous, subexponential algorithms for discrete logarithm algorithms in IFp2 . PhD thesis, University of Georgia, 1992. 14. Lovorn Bender, R., and Pomerance, C. Rigourous discrete logarithm computations in finite fields via smooth polynomials. In Computational Perspectives on Number Theory Proc. of a Conference in Honor of A.O.L. Atkin (Providence, 1998), vol. 7 of AMS/International Press Studies in Advanced Mathematics.
236
Daniel Panario, Xavier Gourdon, and Philippe Flajolet
15. Odlyzko, A. Discrete logarithms and their cryptographic significance. In Advances in Cryptology, Proceedings of Eurocrypt 1984 (1985), vol. 209 of Lecture Notes in Computer Science, Springer-Verlag, pp. 224–314. 16. Odlyzko, A. Discrete logarithms and smooth polynomials. In Finite fields: theory, applications and algorithms, G. Mullen and P. J.-S. Shiue, Eds. Contemporary Mathematics 168, Amer. Math. Soc., 1994, pp. 269–278. 17. Soundararajan, K. Asymptotic formulae for the counting function of smooth polynomials. To appear in J. London Math. Soc. 18. Tenenbaum, G. Introduction to analytic and probabilistic number theory. Cambridge University Press, 1995.
Generating a Product of Three Primes with an Unknown Factorization Dan Boneh and Jeremy Horwitz Computer Science Department, Stanford University, Stanford, CA 94305-9045 {dabo,horwitz}@cs.stanford.edu
Abstract. We describe protocols for three or more parties to jointly generate a composite N = pqr which is the product of three primes. After our protocols terminate N is publicly known, but neither party knows the factorization of N . Our protocols require the design of a new type of distributed primality test for testing that a given number is a product of three primes. We explain the cryptographic motivation and origin of this problem.
1
Introduction
In this paper, we describe how three (or more) parties can jointly generate an integer N which is the product of three prime numbers (N = pqr). At the end of our protocol the product N is publicly known, but neither party knows the factorization of N . Our main contribution is a new type of probabilistic primality test that enables the three parties to jointly test that an integer N is the product of three primes without revealing the factorization of N . Our primality test simultaneously uses two groups: the group ZZ ∗N and the twisted multiplicative group TN = (ZZ N [x]/(x2 + 1))∗ /ZZ ∗N . The main motivation for this problem comes from cryptography, specifically the sharing of an RSA key. Consider classical RSA: N = pq is a public modulus, e is a public exponent, and d is secret where de = 1 mod ϕ(N ). At a high level, a digital signature of a message M is obtained by computing M d mod N . In some cases, the secret key d is highly sensitive (e.g. the secret key of a Certification Authority) and it is desirable to avoid storing it at a single location. Splitting the key d into a number of pieces and storing each piece at a different location avoids this single point of failure. One approach (due to Frankel [7]) is to pick three random numbers satisfying d = d1 + d2 + d3 mod ϕ(N ) and store each of the shares d1 , d2 , d3 at one of three different sites. To generate a signature of a message M , site i computes Si = M di mod N for i = 1, 2, 3 and sends the result to a combiner. The combiner multiplies the Si and obtains the signature S = S1 S2 S3 = M d mod N . If one or two of the sites are broken into, no information about the private key is revealed. An important property of this scheme is that it produces standard RSA signatures; the user receiving the signature is totally unaware of the extra precautions taken in protecting the private key. Note that during signature generation the secret key is never reconstructed at a single location. J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 237–251, 1998. c Springer-Verlag Berlin Heidelberg 1998
238
Dan Boneh and Jeremy Horwitz
To provide fault tolerance, one slightly modifies the above technique to enable any two of the three sites to generate a signature. This way if one of the sites is temporarily unavailable the Certification Authority can still generate signatures using the remaining two sites. If the key was only distributed among two sites, the system would be highly vulnerable to faults. We point out that classic techniques of secret sharing [14] are inadequate in this scenario. Secret sharing requires one to reconstruct the secret at a single location before it can be used, hence introducing a single point of failure. The technique described above of sharing the secret key such that it can be used without reconstruction at a single location is known as Threshold Cryptography. See [9] for a succinct survey of these ideas and nontrivial problems associated with them. An important question left out of the above discussion is key generation. Who generates the RSA modulus N and the shares d1 , d2 , d3? Previously the answer was a trusted dealer would generate N and distribute the shares d1 , d2 , d3 to the three sites. Clearly this solution is undesirable since it introduces a new single point of failure — the trusted dealer. It knows the factorization of N and the secret key d; if it is compromised, the secret key is revealed. Recently, Boneh and Franklin [2] designed a protocol that enables three (or more) parties to jointly generate an RSA modulus N = pq and shares d1 , d2 , d3 of a private key. At the end of the protocol the parties are assured that N is indeed the product of two large primes; however, none of them know its factorization. In addition, each party learns exactly one of d1 , d2 , d3 and has no computational information about the other shares. Thus, there is no need for a trusted dealer. We note that Cocks [5] introduced a heuristic protocol enabling two parties to generate a shared RSA key. In this paper we design an efficient protocol enabling three (or more) parties to generate a modulus N = pqr such that neither party knows the factorization of N . Once N is generated the same techniques used in [2] can be used to generate shares d1 , d2, d3 of a private exponent. For this reason, throughout the paper we focus on the generation of the modulus N = pqr and forgo discussion of the generation of the private key. The methods of [2] do not generalize to generate a modulus with three prime factors and new techniques had to be developed for this purpose. Our main results are described in Section 4 We remark that techniques of secure circuit evaluation [1,4,16] can also be used to solve this problem; however, these protocols are mostly theoretical and result in extremely inefficient algorithms.
2
Motivation
The problem discussed in the paper is a natural one and thus our solution is of independent interest. Nonetheless, the problem is well motivated by a method for improving the efficiency of shared generation of RSA keys. To understand this we must briefly recall the method used by Boneh and Franklin [2]. We refer
Generating a Product of Three Primes with an Unknown Factorization
239
to the three parties involved as Alice, Bob, and Carol. At a high level, to generate a modulus N = pq, the protocol works as follows: Step 1. Alice picks two random n bit integers pa, qa , Bob picks two random n bit integers pb , qb and Carol picks two random n bit integers pc , qc. They keep these values secret. Step 2. Using a private distributed computation they compute the value N = (pa + pb + pc )(qa + qb + qc ). At the end of the computation, N is publicly available; however no other information about the private shares is revealed. This last statement is provable in an information-theoretic sense. Step 3. The three parties perform a distributed primality test to test that N is the product of exactly two primes. As before, this step provably reveals no information about the private shares. Step (3), the distributed primality test, is a new type of probabilistic primality test which is one of the main contributions of [2]. Step (2) is achieved using an efficient variation of the BGW [1] protocol. A drawback of the above approach is that both factors of N are simultaneously tested for primality. Hence, the expected number of times step (3) is executed is O(n2 ). This is much worse than single-user generation of N where the two primes are first generated separately by testing O(n) candidates and then multiplied together. When generating a 1024-bit modulus, this results in significant slowdown when compared with single-user generation. To combat this quadratic slowdown, one may try the following alternate approach. Step 1. Alice picks a random n-bit prime p and a random n-bit integer ra . Bob picks a random n-bit prime q and a random n-bit integer rb . Carol picks a random n-bit integer rc . They keep these values secret. Step 2. Using a private distributed computation they compute the value N = pq(ra + rb + rc ) At the end of the computation, N is publicly available; however no other information about the private shares is revealed. Step 3. The three parties use the results of this paper to test that N is the product of exactly three primes. This step provably reveals no information about the private shares. At the end of the protocol neither party knows the full factorization of N . In addition, this approach does not suffer from the quadratic slowdown observed in the previous method. Consequently, it is faster by roughly a factor of 50 (after taking effects of trial division into account). As before, step (2) is carried out by an efficient variant of the BGW protocol.
240
Dan Boneh and Jeremy Horwitz
Instead of solving the specific problem of testing that N = pq(ra + rb + rc) is a product of three primes we solve the more general problem of testing that N = (pa + pb + pc )(qa + qb + qc)(ra + rb + rc ) is a product of three primes without revealing any information about the private shares. This primality test is the main topic of this paper. For the sake of completeness we point out that in standard single-party cryptography there are several advantages to using an RSA modulus N = pqr rather than the usual N = pq (the size of the modulus is the same in both cases). First, signature generation is much faster using the Chinese Remainder Theorem (CRT). When computing M d mod N one only computes M d mod p−1 mod p for all three factors. Since the numbers (and exponents) are smaller, signature generation is about twice as fast as using CRT with N = pq. Another advantage is that an attack on RSA due to Wiener [15] becomes less effective when using N = pqr. Wiener showed that for N = pq if d < N 1/4 one can recover the secret key d from the public key. When N = pqr the attack is reduced to d < N 1/6 and hence it may be possible to use smaller values of d as the secret key. Finally, we note that the fastest factoring methods [12] cannot take advantage of the fact that the factors of N = pqr are smaller than those of a standard RSA modulus N = pq.
3
Preliminaries
In this section, we explain the initial setup for our new probabilistic primality test and how it is obtained. We then explain a basic protocol which we use in the later parts of the paper. At first reading, the reader may wish to skip to Section 4 and take on faith that the necessary setup is attainable. 3.1
Communication and Privacy Model
The communication and privacy model assumed by our protocol are as follows: Full Connectivity. Any party can communicate with any other party. This is a typical setup on a local network or the Internet. Private and Authenticated Channels. Messages sent from party A to party B are private and cannot be tampered with en route. This simply states that A and B share a secret key which they can use for encryption and authentications. Honest Parties. We assume all parties are honest in their following of the protocol. This is indeed the case when they are truly trying to create a shared key. This assumption is used by both [2] and [5]. We note that some recent work [8] makes the protocol of [2] robust against cheating adversaries at the cost of some slowdown in performance (roughly a factor of 100). These robustness results apply to the protocols described in this paper as well.
Generating a Product of Three Primes with an Unknown Factorization
241
Collusion. Our protocol is 1-private. That is to say that a single party learns no information about the factorization of N = pqr. However, if two of the three parties collude they can recover the factors. For three parties this is fine since our goal is to enable two-out-of-three signature generation. Hence, two parties are always jointly able to recover the secret key. More generally, when k parties participate in our primality test protocol, one can achieve b k−1 2 c-privacy. That is, any minority of parties learns no information about the factors of N . 3.2
Generation of N
In the previous section, we explained that Alice, Bob, and Carol generate N as N = (pa + pb + pc)(qa + qb + qc)(ra + rb + rc ), where party i knows pi , qi , ri for i = a, b, c and keeps these shares secret while making N publicly available. To compute N without revealing any other information about the private shares, we use the BGW protocol [1]. For the particular function above, the protocol is quite efficient; it requires three rounds of communication and a total of six messages. The protocol is information-theoretically secure, i.e. other than the value of N , party i has no information about the shares held by other parties. This is to say the protocol is 1-private. We do not go into the details of how the BGW protocol is used to compute N since it is tangential to the topic of this paper. For our purpose, it suffices to assume that N is public while the private shares are kept secret. An important point is that our primality test can only be applied when pa + pb + pc = qa + qb + qc = ra + rb + rc = 3 mod 4. Hence, the parties must coordinate the two lower bits of their shares ahead of time so that the sums are indeed 3 modulo 4. Indeed, this means that a priori each party knows the two least significant bits of the other’s shares. 3.3
Sharing of (p − 1)(q − 1)(r − 1) and (p + 1)(q + 1)(r + 1)
Let p = pa + pb + pc , q = qa + qb + qc , and r = ra + rb + rc . We define ϕˆ = (p − 1)(q − 1)(r − 1). Since p, q, and r are not necessarily prime, ϕ ˆ may not equal ϕ(N ). Our protocol requires that the value ϕ ˆ be shared additively among the three parties. That is, ϕ ˆ = ϕa + ϕb + ϕc , where only party i knows ϕi for i = a, b, c. An additive sharing of ϕ ˆ is achieved by observing that ϕˆ = N −pq −pr −qr + p + q + r − 1. To share ϕ, ˆ it suffices to represent pq + pr + qr using an additive sharing A + B + C among the three parties. The additive sharing of ϕ ˆ is then ϕa = N −A+pa +qa +ra −1
;
ϕb = −B+pb +qb +rb
;
ϕc = −C +pc +qc +rc .
The conversion of pq + pr + qr into an additive sharing A + B + C is carried out using a simple variant of the BGW protocol used in the computation of N . The
242
Dan Boneh and Jeremy Horwitz
BGW protocol can be used to compute the value pq; however, instead of making the final result public the BGW variant shares the result additively among the three parties. The details of this variant can be found in [2, Section 6.2]. As before, we do not give the full details of the protocol for converting pq + pr + qr into an additive sharing. Since we wish to focus on the primality test, we assume that an additive sharing of ϕˆ is available in the form of ϕa + ϕb + ϕc . In addition to a sharing of ϕ, ˆ we also require an additive sharing of ψˆ = (p + 1)(q + 1)(r + 1). Once an additive sharing of pq + pr + qr is available it is ˆ Simply set trivial to generate an additive sharing of ψ. ψa = N +A+pa +qa +ra +1 ; 3.4
ψb = B +pb +qb +rb
;
ψc = C +pc +qc +rc .
Comparison Protocol
Our primality test makes use of what we call a comparison protocol. Let A be a value known to Alice, B a value known to Bob, and C a value known to Carol. We may assume A, B, C ∈ ZZ ∗N . The protocol enables the three parties to test that ABC = 1 mod N without revealing any other information about the product ABC. We give the full details of the protocol in this section. Let P > N be some prime known to all parties. The protocol proceeds as follows: Step 1. Carol picks a random element C1 ∈ ZZ ∗N and sets C2 = CC1−1 mod N . Clearly C = C1 C2 mod N . Carol then sends C1 to Alice and C2 to Bob. Step 2. Alice sets A0 = AC1 and Bob sets B 0 = (BC2 )−1 mod N . Both values A0 and B 0 can be viewed as integers in the range [0, N ). The problem is now reduced to testing whether A0 = B 0 (as integers) without revealing any other information about A and B. Step 3. Alice picks a random c ∈ ZZ ∗P and d ∈ ZZ P . She sends c and d to Bob. Alice then computes h(A0 ) = cA0 + d mod P and sends the result to Carol. Bob computes h(B 0 ) = cB 0 + d mod P and sends the result to Carol. Step 4. Carol tests if h(A0 ) = h(B 0 ) mod P . If so, she announces that ABC = 1 mod N . Otherwise she announces ABC 6= 1 mod N . The correctness and privacy of the protocol are stated in the next two lemmata. Correctness is elementary and is stated without proof. Lemma 1. Let A, B, C ∈ ZZ ∗N . At the end of the protocol, the parties correctly determine if ABC = 1 mod N or ABC 6= 1 mod N . Lemma 2. The protocol is 1-private. That is, other than the result of the test, each party learns no new information. Proof. To prove the protocol is 1-private, we provide a simulation argument for each party’s view of the protocol. Alice’s view of the protocol is made up of the values A, C1 , c, d, h(A0 ), and the final result of the test. These values can be easily simulated by picking C1 at random in ZZ ∗N , picking c at random in
Generating a Product of Three Primes with an Unknown Factorization
243
ZZ ∗P and d at random in ZZ P . This is a perfect simulation of Alice’s view. A simulation argument for Bob is the same, mutatis mutandis. Simulating Carol’s view is more interesting. Carol’s view consists of C, C1 , C2 , h(A0 ), h(B 0 ), and the result of the test. The key observation we make is that h(A0 ) and h(B 0 ) reveal no information about A and B, since they are either equal or are random independent elements of ZZ P . Which of the two actually occurs is determined by the result of the test. The independence follows since the family of hash functions h(x) = cx + d mod P is a universal family of hash functions (i.e. knowing neither c nor d, the values h(x) and h(y) are independent for any x, y ∈ ZZ P ). To simulate Carol’s view, the simulator picks C1 , C2 ∈ ZZ ∗N at random so that C = C1 C2 mod N . Then, depending on the results of the test, it either picks the same random element of ZZ P twice or picks two random independent elements of ZZ P . This is a perfect simulation of Carol’s view. This proves that Carol gains no extra information from the protocol since, given the outcome of the test, she can generate the values sent by Alice and Bob herself. t u
4
The Probabilistic Primality Test
We are ready to describe our main result, the probabilistic primality test. As discussed in the previous section, our primality test applies once the following setup is achieved: Shares. Each party i has three secret n-bit values pi , qi , and ri for i = a, b, c. The Modulus. N = (pa + pb + pc )(qa + qb + qc )(ra + rb + rc ) is public. We set p = pa + pb + pc , q = qa + qb + qc, and r = ra + rb + rc . Throughout this section, we are assuming that p = q = r = 3 mod 4. Thus, the parties must a priori coordinate the two least significant bits of their shares so that this condition holds. ˆ The parties share (p − 1)(q − 1)(r − 1) as ϕa + ϕb + ϕc and Sharing ϕ, ˆ ψ. (p + 1)(q + 1)(r + 1) as ψa + ψb + ψc . Given this setup, they wish to test that p, q, and r are distinct primes without revealing any of p, q, and r. At this point, nothing is known about p, q, and r other than that p = q = r = 3 mod 4. Throughout the section, we use the following notation: ϕˆ = ϕa + ϕb + ϕc = (p − 1)(q − 1)(r − 1) ψˆ = ψa + ψb + ψc = (p + 1)(q + 1)(r + 1) ˆ Clearly, if N is a product of three distinct primes, then ϕ(N ) = ϕ ˆ and ψ(N ) = ψ. Otherwise, these equalities may not hold. Our primality test is made up of four steps. In the subsequent subsections, we explain how each of these steps is carried out without revealing any information about the factors of N .
244
Dan Boneh and Jeremy Horwitz
Probabilistic Test that N is a Product of Three Primes: Step 1. The parties pick a random g ∈ ZZ ∗N and jointly test that gϕa +ϕb +ϕc = 1 mod N . If the test fails, N is rejected. This step reveals no information other than the outcome of the test. We refer to this step as a Fermat test in ZZ ∗N (see Section 4.2 for details). Step 2. The parties perform a Fermat test in the twisted group TN defined as (ZZ N [x]/(x2 + 1))∗ /ZZ ∗N . Notice that x2 + 1 is irreducible modulo N , since p = q = r = 3 mod 4. If N is the product of three distinct primes then the order of TN is ψ(N ) = (p + 1)(q + 1)(r + 1). To carry out the Fermat test in TN , the parties pick a random g ∈ TN and jointly test that gψa +ψb +ψc = 1 (see Section 4.2 for details). If the test fails, N is rejected. This step reveals no information other than the outcome of the test. Step 3. The parties jointly test that N is the product of at most three prime powers. The implementation of this step is explained in Section 4.1. If the test fails, N is rejected. Step 4. The parties jointly test that gcd(N, p + q + r) = 1. This step reveals no information other than the outcome of the test. The implementation of this step is explained in Section 4.3. If the test fails, N is rejected. Otherwise, N is accepted as the product of three primes. The following fact about the twisted group TN = (ZZ N [x]/(x2 + 1))∗ /ZZ ∗N is helpful in the proof of the primality test. Fact 1. Let N be an integer and k a prime such that k 2 N . Then, k divides both ϕ(N ) and |TN |. Proof. Let α ≥ 2 be the number of times k divides N , i.e. N = k α w where gcd(k, w) = 1. Then ϕ(N ) = k α−1 (k − 1)ϕ(w) and, hence, k divides ϕ(N ). To see that k divides |TN |, note that TN ∼ = Tk α × Tw . When k = 3 mod 4, we know that x2 + 1 is irreducible in ZZ k and, hence, |Tk α | = k α−1 (k + 1). It follows that k divides |TN |. When k = 1 mod 4, we have |Tk α | = k α−1 (k − 1) t u and, again, k divides |TN |. We can now prove that the aforementioned four steps are indeed a probabilistic test for proving that N is a product of three primes. Theorem 2. Let N = pqr = (pa + pb + pc)(qa + qb + qc )(ra + rb + rc ), where p = q = r = 3 mod 4 and gcd(N, p+q +r) = 1. If N is a product of three primes, it is always accepted. Otherwise, N is rejected with probability at least half. The probability is over the random choices made in steps 1–4 above. Proof. Suppose p, q, and r are distinct primes. Then, steps (1), (2), and (3) clearly succeed. Step (4) succeeds by assumption. Hence, in this case, N always passes the test (as required).
Suppose N is not the product of three distinct primes. Assume, for the sake of deriving a contradiction, that N passes all four steps with probability greater than 1/2. Since N passes step (3) with probability greater than 1/2, we know that N = z1^α1 · z2^α2 · z3^α3 for three primes z1, z2, and z3 (not necessarily distinct). Since N passes step (4), we know gcd(N, p + q + r) = 1. Define the following two groups:

  G = { g ∈ Z_N^* : g^(ϕa+ϕb+ϕc) = 1 }   and   H = { g ∈ T_N : g^(ψa+ψb+ψc) = 1 }.

Clearly, G is a subgroup of Z_N^* and H is a subgroup of the twisted group T_N. By showing that at least one of G or H is a proper subgroup, we will prove that either step (1) or (2) fails with probability at least 1/2. There are two cases to consider:

Case 1: p, q, and r are not pairwise relatively prime. Without loss of generality, we may assume that gcd(p, q) > 1. Let k be a prime factor of gcd(p, q). Recall that N is odd, so k > 2 (since k divides N). Since N = pqr, we know that k^2 | N. Hence, by Fact 1, k | ϕ(N) and k | |T_N|. We claim that either k does not divide ϕ̂ or k does not divide ψ̂. To see this, observe that if k | ϕ̂ and k | ψ̂, then k divides ψ̂ − ϕ̂ = p(2q + 2r) + q(2r) + 2. Since k divides both p and q, we conclude that k | 2, which contradicts k > 2. First, we examine the case when k does not divide ϕ̂. Since k is a prime factor of ϕ(N), there exists an element g ∈ Z_N^* of order k. However, since k does not divide ϕ̂, we know that g^ϕ̂ ≠ 1. Hence, g ∉ G, proving that G is a proper subgroup of Z_N^*. If k does not divide ψ̂, a similar argument proves that H is a proper subgroup of the twisted group T_N.

Case 2: p, q, and r are pairwise relatively prime. We can write p = z1^α, q = z2^β, and r = z3^γ with z1, z2, and z3 distinct primes. By assumption, we know that one of α, β, or γ is greater than 1. Without loss of generality, we may assume α > 1. We first observe that none of the zi can divide gcd(ϕ̂, ψ̂). Indeed, if this were not the case, then zi would divide ϕ̂ + ψ̂ = 2(N + p + q + r). But then, since zi divides N, it must also divide p + q + r, contradicting that gcd(N, p + q + r) = 1 (as tested in step (4)). We now know that either z1 does not divide ϕ̂ or it does not divide ψ̂. However, since z1^2 divides N, we obtain, by Fact 1, that z1 | ϕ(N) and z1 | |T_N|. We can now proceed as in case (1) to prove that either G is a proper subgroup of Z_N^* or H is a proper subgroup of T_N. □

Clearly, most integers N that are not a product of three primes will already fail step (1) of the test. Hence, steps (2–4) are most likely executed only once a good candidate N is found. Notice that the condition gcd(N, p + q + r) = 1 is necessary. Without it, the theorem is false, as can be seen from the following simple example: p = w^3,
q = aw^2 + 1, and r = bw^2 − 1, where w, q, and r are three odd primes with p = q = r = 3 mod 4. In this case, N = pqr will always pass steps 1–3, even though it is not a product of three distinct primes.
4.1
Step 3: Testing that N = p^α q^β r^γ
Our protocol for testing that N is a product of three prime powers borrows from a result of van de Graaf and Peralta [11]. Our protocol works as follows:

Step 0. From our construction of ϕ̂, we know that it is divisible by 8. However, the individual shares ϕa, ϕb, and ϕc which sum to ϕ̂ may not all be divisible by 8. To correct this, Alice generates two random numbers a1, a2 ∈ Z_8 such that a1 + a2 = ϕa mod 8. She sends a1 to Bob and a2 to Carol. Alice sets ϕa ← ϕa − a1 − a2, Bob sets ϕb ← ϕb + a1, and Carol sets ϕc ← ϕc + a2. Observe that, at this point,

  ϕ̂/8 = ϕa/8 + ⌊ϕb/8⌋ + ⌈ϕc/8⌉.

Step 1. The parties first agree on eight random numbers g1, g2, . . . , g8 in Z_N^*, all with Jacobi symbol +1.

Step 2. For i, j ∈ {1, 2, . . . , 8}, we say that i is equivalent to j (this defines equivalence classes of {1, 2, . . . , 8}) if

  (gi/gj)^((ϕa+ϕb+ϕc)/8) = 1 (mod N).
Since all three parties know gi and gj, they can test whether i is equivalent to j as follows:
1. Alice computes A = (gi/gj)^(ϕa/8) mod N, Bob computes B = (gi/gj)^⌊ϕb/8⌋ mod N, and Carol computes C = (gi/gj)^⌈ϕc/8⌉ mod N.
2. Using the comparison protocol of Section 3.4, they then test whether ABC = 1 mod N. The comparison protocol reveals no information other than whether or not ABC = 1 mod N.

Step 3. If the number of equivalence classes is greater than four, N is rejected. Otherwise, N is accepted.

Testing that the number of equivalence classes is at most four requires at most twenty-two invocations of the comparison protocol. Note that we restrict our attention to the elements gi with Jacobi symbol +1 for efficiency's sake. Without this restriction, the number of equivalence classes to check for would be eight and, thus, many more applications of the comparison protocol would be necessary.
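The local computation behind the equivalence test can be sketched as follows. This is a non-private Python illustration (Python 3.8+ for modular inverses via pow): it assumes the Step 0 adjustment has already been made, so ϕa is divisible by 8, and it multiplies A, B, C openly instead of running the comparison protocol of Section 3.4.

    # Non-private sketch of the Step 2 equivalence test.  In the real protocol
    # the product A*B*C is only compared to 1 via the private comparison
    # protocol; here it is computed in the clear.
    def equivalent(gi, gj, phi_a, phi_b, phi_c, N):
        """True iff (gi/gj)^((phi_a+phi_b+phi_c)/8) == 1 (mod N)."""
        base = gi * pow(gj, -1, N) % N            # gi/gj mod N
        A = pow(base, phi_a // 8, N)              # Alice: phi_a/8 (exact after Step 0)
        B = pow(base, phi_b // 8, N)              # Bob: floor(phi_b/8)
        C = pow(base, -(-phi_c // 8), N)          # Carol: ceil(phi_c/8)
        return (A * B * C) % N == 1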
The following lemma shows that when N is a product of three distinct primes, it is always accepted. When N has more than three prime factors, it is rejected with probability at least 1/2. Note that if N is a product of three prime powers, it will always be accepted by this protocol. We will use the following notation ((g/N) denotes the Jacobi symbol):

  J = { g ∈ Z_N^* : (g/N) = +1 }
  Q = { g ∈ J : g is a quadratic residue in Z_N^* }

The index of Q in J is 2^(d(N)−1), or 2^d(N) exactly when N is a perfect square, where d(N) is the number of distinct prime factors of N.

Lemma 1. Let N = pqr be an integer with p = q = r = 3 mod 4. If p, q, and r are distinct primes, then N is always accepted. If the number of distinct prime factors of N is greater than three, then N is rejected with probability at least 1/2.

Proof. If N is the product of three distinct primes, then the index of Q in J is four. Two elements g1, g2 ∈ Z_N^* belong to the same coset of Q in J if and only if g1/g2 is a quadratic residue, i.e. if and only if (g1/g2)^(ϕ(N)/8) = 1 mod N. Since, in this case, ϕ(N) = ϕ̂ = ϕa + ϕb + ϕc, step (2) tests whether gi and gj are in the same coset of Q. Since the number of cosets is four, there are exactly four equivalence classes and, thus, N is always accepted.

If N contains at least four distinct prime factors, we show that it is rejected with probability at least 1/2. Define

  Q̂ = { g ∈ J : g^(ϕ̂/8) = 1 (mod N) }.

Since, in this case, ϕ̂ may not equal ϕ(N), the group Q̂ need not be the same as the group Q. We now show that the index of Q̂ in J is at least eight. Since p = q = r = 3 mod 4, we know that ϕ̂/8 is odd (since ϕ̂ = (p − 1)(q − 1)(r − 1)). Notice that if g ∈ J satisfies g^x = 1 for some odd x, then g must be a quadratic residue (a root is g^((x+1)/2)). Hence, Q̂ ⊆ Q, and so Q̂ is a subgroup of Q. Since the index of Q in J is at least eight, it follows that the index of Q̂ in J is at least eight.

It remains to show that when the index of Q̂ in J is at least eight, N is rejected with probability at least 1/2. In step (2), two elements g1, g2 ∈ J are said to be equivalent if they belong to the same coset of Q̂ in J. Let R be the event that all 8 elements gi ∈ J chosen randomly in step (1) fall into only four of the eight cosets. Then

  Pr[R] ≤ C(8, 4) · (1/2)^8 ≈ 0.27 < 1/2,

where C(8, 4) is the binomial coefficient. N is accepted only when the event R occurs. Since R occurs with probability less than 1/2, the number N is rejected with probability at least 1/2, as required. □

Next, we prove that the protocol leaks no information when N is indeed the product of three distinct primes. In case N is not of this form, the protocol may
leak some information; however, in this case, N is discarded and is of no interest. To prove that the protocol leaks no information, we rely on a classic cryptographic assumption (see [3]) called Quadratic Residue Indistinguishability (QRI). This cryptographic assumption states that when N = pq with p = q = 3 mod 4, no polynomial time algorithm can distinguish between the groups J and Q defined above. In other words, for any polynomial time algorithm A and any constant c > 0,

  | Pr_{g∈J}[A(g) = "yes"] − Pr_{g∈Q}[A(g) = "yes"] | < 1/(log N)^c.

Lemma 2. If N is a product of three distinct primes, then the protocol is 1-private, assuming QRI.

Proof Sketch. To prove that each party learns no information other than that N is a product of three prime powers, we provide a simulation argument. We show that each party can simulate its view of the protocol; hence, whatever values it receives from its peers, it could have generated itself. By symmetry, we need only consider Alice. Alice's view of the protocol consists of the elements g1, g2, . . . , g8 and bit values b_{i,j} indicating whether (gi/gj)^(ϕ̂/8) = 1 (recall that we already gave a simulation algorithm for the comparison protocol in Section 3.4). Thus, Alice learns whether or not each gi/gj is a quadratic residue. We argue that, under QRI, this provides no computational information (since it can be simulated). To simulate Alice's view, the simulation algorithm works as follows: it picks eight random elements g1, g2, . . . , g8 ∈ J. It then randomly associates with each gi a value in the set {0, 1, 2, 3}. This value represents the coset of Q in which gi lies. The simulator then says that gi/gj is a quadratic residue if and only if the value associated with gi is equal to that associated with gj. Under QRI, the resulting distribution on g1, g2, . . . , g8, b_{1,1}, b_{1,2}, . . . , b_{8,8} is computationally indistinguishable from Alice's true view of the protocol. We note that the value a1 ∈ Z_8 that Alice sends Bob in step (0) is an element of Z_8 chosen uniformly at random; hence, Bob can simulate it trivially. Similarly, Carol can trivially simulate a2 ∈ Z_8. □
4.2
Implementing a Fermat Test with No Information Leakage
We briefly show how to implement a Fermat test in Z_N^* without leaking any extra information about the private shares. The exact same method works in the twisted group T_N as well. To check that g ∈ Z_N^* satisfies g^(ϕa+ϕb+ϕc) = 1 mod N, we perform the following steps:

Step 1. Each party computes Ri = g^(ϕi) mod N (i = a, b, c).
Step 2. The parties test that Ra Rb Rc = 1 mod N by simply revealing the values Ra, Rb, and Rc. Accept N if the test succeeds. Otherwise, reject.

Clearly, the protocol succeeds if and only if g^ϕ̂ = 1 mod N. We show that it leaks no other information.
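A minimal sketch of this test in Python follows (the shares are exponentiated locally and only the resulting R values would be exchanged; the sampling of g is an illustrative simplification).

    # Sketch of the shared Fermat test in Z_N^*: each party raises g to its own
    # share and only the results R_a, R_b, R_c are revealed.
    import random
    from math import gcd

    def shared_fermat_test(N, phi_shares):
        g = random.randrange(2, N - 1)
        while gcd(g, N) != 1:                     # make sure g lies in Z_N^*
            g = random.randrange(2, N - 1)
        R = [pow(g, share, N) for share in phi_shares]   # R_i = g^{phi_i} mod N
        return (R[0] * R[1] * R[2]) % N == 1      # accept iff g^{phi_hat} = 1 mod N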
Lemma 3. If N = pqr is the product of three distinct primes, then the protocol is 2-private.

Proof. We show that no two parties learn any information about the private share of the third (other than that g^ϕ̂ = 1 mod N). By symmetry, we may restrict our attention to Alice and Bob. Since, by assumption, N is the product of three distinct primes, we know that g^ϕ̂ = 1 mod N. Hence, g^(ϕa+ϕb) = g^(−ϕc). To simulate the value received from Carol, the simulation algorithm simply computes Rc = g^(−ϕa−ϕb). Indeed, this is a perfect simulation of Alice and Bob's view. Thus, they learn nothing from Carol's message since they could have generated it themselves. □
4.3
Step 4: Zero-Knowledge Test that gcd(N, p + q + r) = 1
Our protocol for this step is based on a protocol similar to the one used in the computation of N. We proceed as follows:

Step 1. Alice picks a random ya ∈ Z_N, Bob picks a random yb ∈ Z_N, and Carol picks a random yc ∈ Z_N.
Step 2. Using the BGW protocol as in Section 3.2, they compute

  R = (pa + qa + ra + pb + qb + rb + pc + qc + rc)(ya + yb + yc) mod N.

At the end of the protocol, R is publicly known; however, no other information about the private shares is revealed.
Step 3. Now that R is public, the parties test that gcd(R, N) = 1. If not, N is rejected. Otherwise, N is accepted.

Lemma 4. If N is the product of three distinct n-bit primes p, q, and r, with gcd(N, p + q + r) = 1, then N is accepted with probability 1 − ε for some ε < 1/2^(n−3). Otherwise, N is always rejected.

Proof. Clearly, if gcd(N, p + q + r) > 1, then gcd(R, N) > 1 and therefore N is always rejected. If gcd(N, p + q + r) = 1, then N is rejected only if gcd(N, ya + yb + yc) > 1. Since ya + yb + yc is a random element of Z_N, this happens with probability less than (1/2)^(n−3). □

Lemma 5. If N is the product of three distinct n-bit primes p, q, and r, with gcd(N, p + q + r) = 1, then the protocol is 1-private.

Proof. Note that, since the BGW protocol is 1-private, the above protocol can be at best 1-private. By symmetry, we need only show how to simulate Alice's view. Alice's view consists of her private shares pa, qa, ra, ya and the number R. Since R is independent of her private shares, the simulator can simulate Alice's view by simply picking R in Z_N at random. This is a perfect simulation. □
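The arithmetic of Step 4 can be illustrated with a short non-private sketch: the product R is computed in the clear here, whereas the paper obtains it through the BGW protocol without revealing the shares.

    # Sketch of the Step 4 test: blind p+q+r by a random y and reveal only R.
    import random
    from math import gcd

    def step4_accepts(N, p, q, r):
        y = sum(random.randrange(N) for _ in range(3)) % N   # y_a + y_b + y_c
        R = ((p + q + r) * y) % N                            # the revealed value
        return gcd(R, N) == 1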
5
Extensions
One can naturally extend our protocols in two ways. First, one may allow more than three parties to generate a product of three primes with an unknown factorization. Second, one may wish to design primality tests for testing that N is a product of k primes for some small k. We briefly discuss both extensions below.

Our protocols easily generalize to allow any number of parties. When k parties are involved, the protocols can be made ⌊(k−1)/2⌋-private. This is optimal in the information-theoretic sense and follows from the privacy properties of the BGW protocol. The only complexities in this extension are the comparison protocol of Section 3.4 and Step (0) of Section 4.1. Both protocols generalize to k parties; however, they require a linear (in k) number of rounds of communication.

Securely testing that N is a product of k primes for some fixed k > 3 seems to be harder. Our results apply when k = 4 (indeed, Theorem 2 remains true in this case). For k > 4, more complex algorithms are necessary. This extension may not be of significant interest since it is not well motivated and requires complex protocols.

Another natural question is whether only two parties can generate a product of three primes with an unknown factorization. The answer appears to be yes, although the protocols cannot be information-theoretically secure. Essentially, one needs to replace the BGW protocol for computing N with a two-party private multiplication protocol. This appears to be possible using results of [5].
6
Conclusions and Open Problems
Our main contribution is the design of a probabilistic primality test that enables three (or more) parties to generate a number N with an unknown factorization and test that N is the product of three distinct primes. The correctness of our primality test relies on the fact that we simultaneously work in two different subgroups of (Z_N[x]/(x^2 + 1))^*, namely Z_N^* and the twisted multiplicative group T_N. Our protocol generalizes to an arbitrary number of parties k and achieves ⌊(k−1)/2⌋-privacy, the best possible in an information-theoretic setting.

Recall that our primality test can be applied to N = pqr whenever p = q = r = 3 mod 4. We note that simple modifications enable one to apply the test when p = q = r = 1 mod 4 (essentially, this is done by reversing the roles of Z_N^* and the twisted group). However, it seems that one of these restrictions is necessary; we do not know how to carry out the test without the assumption that p = q = r mod 4. The assumption plays a crucial role in the proof of Lemma 1.

A natural question to ask is whether more advanced primality testing techniques can be used to improve the efficiency of our test. For instance, recent elegant techniques due to Grantham [10] may be applicable in our scenario as well.
References

1. M. Ben-Or, S. Goldwasser, and A. Wigderson. Completeness theorems for non-cryptographic fault tolerant distributed computation. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing, pages 1–10. ACM Press, 1988.
2. D. Boneh and M. Franklin. Efficient generation of shared RSA keys. In Proceedings of Advances in Cryptology: CRYPTO '97, pages 425–439. Lecture Notes in Computer Science, Springer-Verlag, New York, 1998.
3. M. Blum and S. Goldwasser. An efficient probabilistic public key encryption scheme that hides all partial information. In Proceedings of Advances in Cryptology: CRYPTO '84, pages 289–302. Lecture Notes in Computer Science, Springer-Verlag, New York, 1985.
4. D. Chaum, C. Crépeau, and I. Damgård. Multiparty unconditionally secure protocols. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing, pages 11–19. ACM Press, 1988.
5. C. Cocks. Split knowledge generation of RSA parameters. Available from the author (cliff [email protected]).
6. R. Fagin, M. Naor, and P. Winkler. Comparing information without leaking it. Communications of the ACM, 39(5):77–85, May 1996.
7. Y. Frankel. A practical protocol for large group oriented networks. In Proceedings of Advances in Cryptology: EUROCRYPT '88, pages 56–61. Lecture Notes in Computer Science, Springer-Verlag, New York, 1990.
8. Y. Frankel, P. MacKenzie, and M. Yung. Robust efficient distributed RSA key generation. Preprint.
9. P. Gemmel. An introduction to threshold cryptography. CryptoBytes (a technical newsletter of RSA Laboratories), 2(7), 1997.
10. J. Grantham. A probable prime test with high confidence. Available online (http://www.clark.net/pub/grantham/pseudo/).
11. R. Peralta and J. van de Graaf. A simple and secure way to show the validity of your public key. In Proceedings of Advances in Cryptology: CRYPTO '87, pages 128–134. Lecture Notes in Computer Science, Springer-Verlag, New York, 1988.
12. A. Lenstra and H. W. Lenstra, eds. The development of the number field sieve. Lecture Notes in Mathematics 1554, Springer-Verlag, 1993.
13. H. W. Lenstra. Factoring integers with elliptic curves. Annals of Mathematics, 126:649–673, 1987.
14. A. Shamir. How to share a secret. Communications of the ACM, 22(11):612–613, November 1979.
15. M. Wiener. Cryptanalysis of short RSA secret exponents. IEEE Transactions on Information Theory, 36(3):553–558, 1990.
16. A. Yao. How to generate and exchange secrets. In Proceedings of the 27th Annual Symposium on Foundations of Computer Science, pages 162–167. IEEE, 1986.
On the Performance of Signature Schemes Based on Elliptic Curves

Erik De Win (1,*), Serge Mister (2), Bart Preneel (1,**), and Michael Wiener (3)

1 Katholieke Universiteit Leuven, ESAT/COSIC, K. Mercierlaan 94, 3001 Heverlee, Belgium
  {erik.dewin,bart.preneel}@esat.kuleuven.ac.be
2 Queen's University, Department of Electrical and Computer Engineering, Kingston, Ontario, K7L 3N6, Canada
  [email protected]
3 Entrust Technologies, 750 Heron Road, Ottawa (Ontario) K1V 1A7, Canada
  [email protected]

* F.W.O.-Flanders research assistant, sponsored by the Fund for Scientific Research – Flanders. Most of the work presented in this paper was done during an internship with Entrust Technologies in Ottawa, Canada.
** F.W.O.-Flanders postdoctoral researcher, sponsored by the Fund for Scientific Research – Flanders.
Abstract. This paper describes a fast software implementation of the elliptic curve version of DSA, as specified in draft standard documents ANSI X9.62 and IEEE P1363. We did the implementations for the fields GF(2n ), using a standard basis, and GF(p). We discuss various design decisions that have to be made for the operations in the underlying field and the operations on elliptic curve points. In particular, we conclude that it is a good idea to use projective coordinates for GF(p), but not for GF(2n ). We also extend a number of exponentiation algorithms, that result in considerable speed gains for DSA, to ECDSA, using a signed binary representation. Finally, we present timing results for both types of fields on a PPro-200 based PC, for a C/C++ implementation with small assembly-language optimizations, and make comparisons to other signature algorithms, such as RSA and DSA. We conclude that for practical sizes of fields and moduli, GF(p) is roughly twice as fast as GF(2n ). Furthermore, the speed of ECDSA over GF(p) is similar to the speed of DSA; it is approximately 7 times faster than RSA for signing, and 40 times slower than RSA for verification (with public exponent 3).
1
Introduction
Elliptic curve public key cryptosystems (ECPKCs) were proposed independently by Victor Miller [M85a] and Neal Koblitz [K87] in the mid-eighties, but it is only recently that they are starting to be used in commercial systems. See [M93] for an introduction to practical aspects of public key cryptosystems based on elliptic curves. The elliptic curve discrete logarithm problem (ECDLP) has been studied
for several years now, and no significant weaknesses have been found, although some special instances of it have been broken [MOV93], [S97a]. A number of publications discuss software implementations of ECPKCs. [HMV92] is probably the earliest, and uses the field GF(2^n), where the field elements are represented in an optimal normal basis [MOVW88] or as polynomials over the subfield GF(2^8). [SOOS95] uses a standard basis for GF(2^n), where the irreducible field polynomial is a trinomial. [DBV+96] and [GP97] represent the elements of GF(2^n) as polynomials over the subfield GF(2^16). Few comparisons of ECPKCs to other public key cryptosystems are available; only [SOOS95] compares Diffie-Hellman key agreement using elliptic curves over GF(2^n) to its counterpart using large integers, and concludes that the elliptic curve-based version is several times faster, the exact ratio depending on the platform and the amount of optimization used. As far as we know, [MOC97] is the only implementation of ECPKCs over GF(p) that has been reported, and no comparisons have been made between elliptic curves over GF(2^n) and over GF(p). In this paper, we present an implementation of a signature scheme based on elliptic curves. The signature scheme used is elliptic curve DSA (ECDSA), as defined in the ANSI X9.62 draft standard and the IEEE P1363 draft standard. We consider curves both over GF(2^n) and GF(p), in each case using curves that are specified in ANSI X9.62. The remaining part of this paper is organized as follows. Section 2 gives more background on elliptic curves, elliptic curve public key cryptosystems, and related standards. Sections 3 and 4 discuss implementation considerations that are specific to GF(p) and GF(2^n) respectively. Section 5 discusses issues related to operations on elliptic curve points, operations that are common to both GF(p) and GF(2^n). The overall timing results and comparisons to other public key cryptosystems appear in Section 6. A number of topics for further work and research are given in Section 7.
2
Elliptic Curve Cryptosystems
Elliptic curves have been studied by mathematicians since long before they were used in cryptography. Apart from their use for public key cryptosystems, they formed the basis of the elliptic curve factoring method [L87] and of several methods for primality proving, e.g. [AM93]. Recently, they were an important tool in the proof of Fermat's last theorem. An elliptic curve is the set of solutions of a Weierstrass equation over a mathematical structure, usually a field. For cryptographic purposes, this field is mostly a finite field of the form either GF(p) or GF(2^n). In these particular cases, the Weierstrass equation can be reduced to the following simpler forms:

  y^2 = x^3 + ax + b over GF(p), with a, b ∈ GF(p) and 4a^3 + 27b^2 ≠ 0 ;
  y^2 + xy = x^3 + ax^2 + b over GF(2^n), with a, b ∈ GF(2^n) and b ≠ 0 .

If the formal point at infinity O is added to the set of solutions, an addition operation can be defined, and this turns the set into a group. The addition
operation is defined as follows. Let P1 = (x1, y1) and P2 = (x2, y2) be two points on the elliptic curve, neither the point at infinity.

Over GF(p): The inverse of a point P1 is −P1 = (x1, −y1). If P2 ≠ −P1, then P1 + P2 = P3 = (x3, y3), with

  x3 = λ^2 − x1 − x2 ,
  y3 = λ(x1 − x3) − y1 ,

and

  λ = (y2 − y1)/(x2 − x1)   if P1 ≠ P2 ,
  λ = (3x1^2 + a)/(2y1)     if P1 = P2 .

Over GF(2^n): The inverse of a point P1 is −P1 = (x1, y1 + x1). If P2 ≠ −P1, then P1 + P2 = P3 = (x3, y3), with

  x3 = λ^2 + λ + x1 + x2 + a ,
  y3 = λ(x1 + x3) + x3 + y1 ,

and

  λ = (y2 + y1)/(x2 + x1)   if P1 ≠ P2 ,
  λ = x1 + y1/x1            if P1 = P2 .

For both fields we have the following formulas for the cases where the point at infinity is involved: P1 + (−P1) = O, O + P1 = P1 + O = P1, and O + O = O.

The basic assumption of elliptic curve public key cryptosystems is that the discrete logarithm problem in the elliptic curve group (ECDLP) is a hard problem. Hence all public key cryptographic primitives based on the discrete logarithm over the integers modulo a prime can be translated to an equivalent primitive based on the ECDLP. Moreover, the ECDLP is currently considered to be harder than the integer DLP. Therefore, the sizes of fields, keys, and other parameters can be chosen considerably smaller for elliptic curve based systems; typical field sizes are between 160 and 200 bits. This can be especially advantageous in systems where resources such as memory and/or computing power are limited, but even where this is not the case, ECPKCs turn out to be competitive with other public key cryptosystems such as RSA and DSA.

An important condition for the practical usefulness of ECPKCs is that we can efficiently implement the point multiplication operation, which is the repeated group operation, and the equivalent of exponentiation in systems based on the discrete logarithm problem for integers modulo a prime. As became clear from the definition above, the elliptic curve group operation can be expressed in terms of a number of operations in the underlying field. For all cases where the point at infinity is not involved, we see that for the calculation of one group operation, we need 1 field inversion, 2 general field multiplications, 1 or 2 field squarings, and a
number of field additions or subtractions, and a number of multiplications by a fixed small constant. For the case GF(2n ), we will see that only field inversions and multiplications need to be counted, since the other operations are much faster and their share in the overall time for a group operation is negligible. For GF(p), the time needed for a squaring is of the same order of magnitude as the time needed for a multiplication, so we have to take into account the squarings as well; the number of squarings is 1 for a general point addition and 2 for a point doubling (i.e. the case where P1 = P2 ). An important design decision is how the field elements are represented. We discuss this issue for each field separately in Sections 3 and 4. A number of standardization bodies have started initiatives to standardize ECPKCs, among them are ANSI, IEEE, ISO, IETF. Most standards are still drafts, but are expected to be approved in the near future. The specified schemes include signature schemes, encryption schemes, and key agreement schemes. ECDSA is specified in ANSI X9.62 and IEEE P1363; both descriptions are almost identical. We based our implementation on the most recent draft documents we had available, i.e. [A97] and [I97]. Both [A97] and [I97] provide the option to apply point compression to points on an elliptic curve, in order to reduce storage requirements or bandwidth. The basic idea is that specifying the two coordinates of a point is unnecessary, since the fact that they satisfy the curve equation provides redundancy. More specifically, if the x-coordinate is known, at most two values of y are possible, and they can be computed by solving a quadratic equation. One extra bit of information allows to distinguish between the two values of y; this means that an elliptic curve point needs only slightly more storage space than an element of the underlying field. We do not discuss point compression in the rest of the paper since the cost is small compared to the cost of the overall signing and verification operations.
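The affine group law given earlier in this section translates directly into code. The following Python sketch (Python 3.8+, GF(p) case; the point at infinity O is represented by None) is for illustration only and ignores the efficiency questions discussed in the rest of the paper.

    # Affine point addition on y^2 = x^3 + a*x + b over GF(p), following the
    # formulas of Section 2.  None plays the role of the point at infinity O.
    def ec_add(P1, P2, a, p):
        if P1 is None:
            return P2
        if P2 is None:
            return P1
        x1, y1 = P1
        x2, y2 = P2
        if x1 == x2 and (y1 + y2) % p == 0:
            return None                                   # P2 == -P1, result is O
        if P1 == P2:
            lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
        else:
            lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
        x3 = (lam * lam - x1 - x2) % p
        y3 = (lam * (x1 - x3) - y1) % p
        return (x3, y3)

Note the single field inversion per group operation (inside pow(..., -1, p)); Sections 3 and 4 are largely about reducing or amortizing exactly this cost.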
3
Elliptic Curves over GF(p)
In this section we describe implementation issues that are specific to curves over the field GF(p). The issues that apply to both GF(p) and GF(2n ) will be discussed in Section 5. 3.1
Representation of Field Elements
For GF(p), the most obvious way to represent the elements is as numbers in the range [0, p − 1], where each residue class is represented by its member in that range. Yet, it is not the only way. Since we will be using modular multiplications and squarings, we might consider representing the elements as Montgomery residues [M85b]. This only influences the inversion operation, since the inverse of a Montgomery residue is not the Montgomery residue of the inverse, i.e. the inverse operation does not commute with taking the Montgomery residue. Hence
an extra transformation is needed, but this problem can be alleviated by using the algorithm described in [K95a] that computes the Montgomery inverse. Moreover, if projective coordinates are used (see Section 3.4), very few inverse operations are needed anyway. Despite all this, we decided not to use the Montgomery representation for the simple reason that in our implementation the difference in speed between Montgomery and Barrett [B87] reduction is negligibly small. This also saves us the hassle of having to implement a special inverse algorithm and converting between two representations. However, in some cases the representation as Montgomery residues could be advantageous. 3.2
Field Multiplication and Squaring
For field multiplication and squaring, we started from our own implementation in C of well-known algorithms for operations on multi-precision numbers, see e.g. [K81]. Since standard C does not support the full capabilities of modern PCs for integer multiplication and division (i.e. 32-bit×32-bit→64-bit and 64-bit/32bit→32-bit), we used a number of small assembly language macros to make these available. As discussed in Section 3.1, we use a Barrett-like modular reduction algorithm. 3.3
Field Inversion
In most public key cryptosystems that are not based on elliptic curves, the time spent computing modular inverses is negligible compared to the time needed for modular exponentiation. Therefore, in many implementations not much effort has been spent on optimizing the modular inverse algorithm. However, in a straightforward implementation of the equations in Section 2, every single group operation needs to compute one modular inverse, and it turns out that this is where most of the execution time goes. It therefore is worthwhile to give some more thought to the optimization of this operation. We compared a number of algorithms, mostly variants of the extended version of Euclid’s algorithm. The best results were obtained with an algorithm that is based on the Montgomery inverse algorithm [K95a], after speeding it up by applying some extra heuristics and using the same assembly language macros as for multiplication and squaring. For lack of space, we cannot discuss the algorithm in detail here. It suffices to state that we were able to considerably improve the inversion operation, but we still found a ratio of 23 between the time for an inversion and a multiplication in a field with a 192-bit modulus. 3.4
Projective Coordinates
With a ratio of 23 between inversion and multiplication, it is clear that the former operation will be the major bottleneck of the implementation. Fortunately, there are ways to circumvent this problem, and they lie in the possibility of using different ways to specify the group operation. An alternative definition, that is
explicitly specified in the appendices to [I97], uses projective coordinates. In this representation, the elliptic curve equation has 3 variables, and a point has 3 coordinates (x, y, z), but any point with coordinates (λ^2 x, λ^3 y, λz) for arbitrary λ ≠ 0 is considered equal to the former. In fact, this can be thought of as keeping the denominator of the equations for the group operation in a separate variable, and postponing the actual inversion operation until the x- or y-coordinates are really needed, for instance at the end of a point multiplication. The drawback of projective coordinates is that a group operation involves considerably more field multiplications. In [I97], projective formulas are given that allow a point doubling to be computed using 10 field multiplications in the general case, and 8 if the curve parameter a is 3 less than the modulus. A point addition requires 16 multiplications in the general case, and only 11 if one of the points has a z-coordinate equal to 1. On the whole, assuming that an inversion takes the time of approximately 23 multiplications, we can save roughly between 10 and 19 multiplication times per group operation.
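As an illustration of this representation, the sketch below doubles a point (X, Y, Z) that corresponds to the affine point (X/Z^2, Y/Z^3). The formulas are the standard ones for Jacobian-style coordinates; we have not checked them against the exact appendix of [I97], but counting 4 general multiplications and 6 squarings gives the 10 field multiplications quoted above for general a.

    # Point doubling in projective (Jacobian) coordinates over GF(p);
    # the point at infinity is represented by Z = 0.  Illustrative sketch only.
    def ec_double_jacobian(P, a, p):
        X1, Y1, Z1 = P
        if Z1 == 0 or Y1 == 0:
            return (1, 1, 0)                        # 2P is the point at infinity
        YY = (Y1 * Y1) % p                          # Y1^2
        S = (4 * X1 * YY) % p
        M = (3 * X1 * X1 + a * pow(Z1, 4, p)) % p   # 3*X1^2 + a*Z1^4
        X3 = (M * M - 2 * S) % p
        Y3 = (M * (S - X3) - 8 * YY * YY) % p
        Z3 = (2 * Y1 * Z1) % p
        return (X3, Y3, Z3)

No field inversion appears; a single inversion converts back to affine coordinates at the end of a point multiplication, which is exactly the trade-off described above.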
4
Elliptic Curves over GF(2n )
In this section we describe implementation issues that are specific to curves over the field GF(2n ). The issues that apply to both GF(p) and GF(2n ) will be discussed in Section 5. 4.1
Representation of Field Elements
For GF(2n ), a number of representations of the field elements are known and each of them has its specific advantages. The most well known representation is the standard basis representation, used for instance in [SOOS95]. Field elements are represented as binary polynomials modulo an irreducible binary polynomial of degree n. Standard basis implementations can be made more efficient if an irreducible polynomial with low Hamming weight and no terms of high degree is chosen, such as a trinomial or a pentanomial. At least one of these can be found for any value of n. Another well known representation uses an optimal normal basis [MOVW88]. This basis gives rise to elegant hardware implementations, but for software, our experience is that a standard basis is more efficient. A third representation (see e.g. [HMV92], [DBV+96] or [GP97]) represents elements of the field as polynomials over a subfield of the form GF(2r ), where r is a divisor of n. This representation enables efficient implementations, but limits the possible values of n to multiples of r. This is not so much an implementation issue, since we can make r small enough that the number of possible values of n is still sufficiently large. But the fact that these fields have some extra structure, consisting of a fairly large subfield, could reduce the security in the sense that the ECDLP over these fields might turn out to be easier to break. Although there currently is no indication that the latter fields are less secure, we wanted to avoid the risk by choosing a prime n. And since optimal normal
bases seem to be slower in software, we opted for a standard basis representation using trinomials or pentanomials. This representation is well specified in both [A97] and [I97]. 4.2
Field Multiplication and Squaring
The algorithms for multiplication and squaring in a standard basis, as well as algorithms for reduction modulo a trinomial or pentanomial, are described in [SOOS95]. Contrary to GF(p), no assembly language was used, because most microprocessors do not have a special instruction for multiplying binary polynomials. While this may seem to result in a biased comparison between both kinds of fields, the situation in a practical application is likely to be similar, hence our comparisons are practically relevant. Note that the squaring operation is much more efficient than multiplication, because GF(2n ) has characteristic two, so that all the cross-terms vanish. 4.3
Field Inversion
The almost inverse algorithm [SOOS95] is the fastest known algorithm for computing modular inverses of binary polynomials. With a suitable choice of the field polynomial, the inversion time is approximately 3 times longer than the multiplication time. At the end of the algorithm, a Montgomery-like reduction is necessary to convert the almost inverse to the real inverse. This reduction is fast if the irreducible field polynomial has low Hamming weight and has no terms of low degree (smaller than the word size of the processor), except for the constant term. Unfortunately, most of the field polynomials specified in ANSI X9.62 do have terms of low degree. This increases the timings of the almost inverse algorithm by up to 30%. Therefore, we conclude that the choice of polynomials in ANSI X9.62 is rather unfortunate, and may be revised if that is practically feasible. This problem can be circumvented by converting the field elements and elliptic curve points from a representation based on a standardized polynomial to an internal representation based on a polynomial with better properties. We did not implement the conversion yet, but we give timing results using both a polynomial from the standard and a more suitable polynomial. Because the ratio between field inversion and field multiplication is rather low, the use of projective coordinates brings no benefit for GF(2n ) in a standard basis representation. 4.4
Basis Conversion
Although a single basis may be chosen for a program’s internal representation of field elements, it is important for interoperability with other implementations that an efficient method of converting between the chosen representation and the others exist. This is the case for the bases already discussed; the procedure
involves finding a root (in the target basis) of the field polynomial of the original basis. The field element in the target basis is then calculated as the linear combination of powers of that root. Details are provided in [A97] for conversion between standard and optimal normal bases. In practice, the calculation of the root is expensive, so the roots are tabulated and the required powers are calculated during the first conversion. Apart from interoperability, basis conversion is also useful from an efficiency point of view, for example for field inversion (see Section 4.3).
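For concreteness, here is a compact (and deliberately slow) Python sketch of the standard-basis arithmetic described in Section 4.2: field elements are integers whose bits are the polynomial coefficients, multiplication is shift-and-xor, and reduction uses a degree-191 trinomial with a low-degree middle term of the kind the paper attributes to ANSI X9.62. The specific trinomial below is an illustrative assumption, not a claim about the standard.

    # Standard-basis GF(2^m) arithmetic sketch: carry-less multiplication
    # followed by reduction modulo a trinomial.  Slow reference code only.
    M = 191
    F = (1 << 191) | (1 << 9) | 1          # x^191 + x^9 + 1 (illustrative trinomial)

    def gf2m_mul(a, b):
        c = 0
        while b:                           # shift-and-xor (polynomial) multiply
            if b & 1:
                c ^= a
            a <<= 1
            b >>= 1
        for i in range(2 * M - 2, M - 1, -1):   # reduce degree <= 2M-2 product mod F
            if (c >> i) & 1:
                c ^= F << (i - M)
        return c

    def gf2m_square(x):
        # Real implementations exploit the fact that squaring only spreads bits
        # (hence "much faster than multiplication"); here we just reuse the multiply.
        return gf2m_mul(x, x)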
5
Operations on Elliptic Curve Points
The basic group operations can be implemented in a straightforward way in terms of the field operations discussed in Sections 3 and 4. However, the core operation of ECPKCs is the repeated group operation, i.e. the multiplication of a point by an integer, and this operation deserves some more thought. It is the equivalent of modular exponentiation for integer DLP-based systems, and is therefore also referred to as elliptic curve exponentiation, and the multiplier is sometimes called the exponent. We will use both terms interchangeably; we are confident that this will not cause confusion since strictly speaking there exists no such thing as elliptic curve exponentiation. Many authors have discussed fast ways to do exponentiation under various conditions; [G96] gives a concise overview. Most of these algorithms can be extended to the point multiplication in an elliptic curve group. However, the elliptic curve group has the interesting property that the inverse of a point is extremely efficient to compute (see Section 2). This allows for some extra optimizations [R60]. On the other hand, exponents in an elliptic curve system are generally much shorter than in other systems such as RSA. Some optimizations described in the literature may only be advantageous for exponents above certain lengths, and may not be worthwhile for elliptic curves. In the next paragraphs, we discuss point multiplication for a number of cases that are relevant to ECDSA. The algorithms are mostly based on known algorithms for exponentiation, but we adapt them in order to make better use of the parameters of the elliptic curve case. Before that, we will discuss some issues related to the representation of the exponent. 5.1
Representation of Exponents
The binary representation can be considered as the generic representation for exponents, because it is the basis for the square-and-multiply algorithm. This algorithm is discussed in [K81, p. 442] for instance, and gives an extremely simple and relatively efficient way to find addition chains. It has been improved upon in a number of ways depending on the context, e.g. by using windowed methods, or precomputation, but the binary representation remains the basis of many practical implementations.
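For reference, the elliptic curve analogue of square-and-multiply (left-to-right double-and-add) can be sketched as follows; the refinements discussed next — signed digits and windows — all build on this loop. The sketch reuses the affine ec_add function sketched at the end of Section 2.

    # Left-to-right double-and-add: the elliptic curve analogue of binary
    # square-and-multiply.  Doubling a point R is simply ec_add(R, R, a, p).
    def ec_mul(k, P, a, p):
        R = None                              # start at the point at infinity
        for bit in bin(k)[2:]:                # scan exponent bits, MSB first
            R = ec_add(R, R, a, p)            # double
            if bit == '1':
                R = ec_add(R, P, a, p)        # add
        return R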
For elliptic curve exponentiation, the binary representation can still be used, but a signed binary representation, where each bit has a sign, seems more appropriate. A negative bit is processed similarly to a positive bit, but uses the inverse of the point, which can easily be calculated and used in the course of an exponentiation. It is important to note that this representation is not unique, e.g. 1000(−1) and 1111 both represent the number 15 (here (−1) stands for a negative bit). In [MO90], an algorithm is proposed to convert from a non-signed to a particular signed representation. The result has the so-called non-adjacent form (NAF); this means that of any two adjacent bits, at least one must be zero. An interesting property of the NAF representation is that it is unique. Also, for a random exponent, the expected fraction of non-zero bits is 1/3, as opposed to 1/2 for a binary representation. This results in an 11% speed-up on average for the standard square-and-multiply algorithm. As we will see, the use of the NAF can speed up windowed techniques as well. Although the recoding algorithm in [MO90] looks a little involved, the signed binary NAF of a number e can be computed easily as follows: subtract e from 3e, replacing the borrow mechanism by the rule 0 − 1 = (−1), and then discard the least significant bit. Alternative signed binary representations have been proposed in [KT92] and [MOC97]. These representations have better properties with respect to windowed exponentiation techniques. However, we will see that in comparing different representations, it is important to take into account the number of precomputations. It is an open problem what the best signed binary representation for windowed techniques is. To analyze the expected number of operations for a point multiplication, we need an estimate of the expected length of a run of zeros, since this has an impact on the expected number of additions. According to [KT92], this average length is 4/3 for the signed binary NAF and 3/2 for the improved method they propose. In [MOC97], an algorithm is proposed that results in an average zero-run length of 2. The binary representation has an average zero-run length of 1 [K95b]. 5.2
General Point Multiplication
The square-and-multiply (or double-and-add in additive notation) algorithm can easily be extended to a double-and-add/subtract algorithm based on signed binary NAF. The expected improvement is roughly 11% [MO90]. Other algorithms, such as the sliding window technique, can be extended to the signed binary NAF representation as well. We will first give an example for a particular window size, and then generalize the results to arbitrary window sizes. With a window size w of 4 bits, the only windows that can occur are 1000, 1001, 1010, 100(−1), 10(−1)0, plus their counterparts with the signs of all bits reversed. The values associated to the latter can easily be computed as the negative of the precomputed values associated to the former. All other window values are excluded because of the NAF property. Denoting the point to be multiplied by P, this means that we only have to precompute and store 6P, 7P, 8P, 9P
and 10P; the other values can be obtained from these by taking the negative. The precomputation can be done in 7 operations using the addition sequence 1, 2, 4, 6, 7, 8, 9, 10. This can even be reduced to 5 operations if trailing zeros are handled as in alg. 14.85 of [MvV97]. In this case, only 3P, 5P, 7P and 9P need to be precomputed. If we consider the window size w as a parameter, the average number of operations for a complete point multiplication is

  C(w) + λ + 2 − w + (λ + 1 − w/2) / (w + 4/3) ,      (1)

where λ = ⌊log2(k)⌋ (denoting the exponent by k), 4/3 is the average zero-run length, and C(w) is the number of operations needed for the precomputation. The expression for C(w) for a signed binary NAF is slightly more complex than for the binary case:

  C(w) = (2^w − (−1)^w) / 3 .

The algorithm described here was considered in [KT92], and in the same paper, an improvement was proposed, consisting of an alternative, slightly more complex, algorithm to convert from binary to signed binary representation. This results in an increased average length of zero runs and a reduced number of operations in the course of the algorithm. As an example, consider the bit string 00111100 as part of an exponent; this is replaced by 01000(−1)00 in the NAF. With a window size of 4, two add/subtract operations are needed to handle the NAF of this bit string, whereas the original form potentially needs one addition, depending on the other exponent bits; hence it is better not to do the substitution. The expected number of operations of the improved algorithm is [KT92]:

  (λ + 2.75 − w) + (λ + 1.25) / (w + 1.5) + 2^(w−1) − 1 .      (2)

When comparing the number of operations given by (1) and (2) for exponents up to 2000 bits, we find that the latter algorithm needs in fact more operations than the former, contrary to the conclusion in [KT92]. This is probably due to an overestimation of the cost of precomputation C(w) in (1): because of the NAF property, a considerable number of values do not have to be precomputed since they cannot occur. Since the algorithm proposed in [KT92] does not produce a NAF, we see no way to obtain comparable savings for the precomputation step. We used the first algorithm in our implementation. The optimal window size is 4 for exponents up to roughly 170 bits, 5 for the range 170–420 bits, 6 for the range 420–1290 bits and 7 for 1290 bits up to well above 2000 bits. Comparing to a sliding window technique based on the binary representation, we gain approximately 2.6% for 200-bit exponents, decreasing to only 1.3% for 2000 bits. In a recent paper [MOC97], an even better recoding algorithm is proposed, resulting in an average zero-run length of 2. To our understanding, there is no restriction on the values of the windows, and the number of values to be
precomputed is 2w−1 − 1, as in (2). When we calculate the expected number of operations under these assumptions, we find that the difference with the signed binary NAF remains under a fraction of a percent for exponent lengths up to over 2000 bits. In that range, there are alternating subranges for which signed binary NAF is better than [MOC97] and vice-versa. Note that in the estimates of the number of operations, no distinction was made between additions and doublings. For GF(2 n ), this is a good approximation, since both operations are almost equally fast. However, for GF(p) with projective coordinates, a typical doubling is 25% faster than a typical addition, so an accurate estimate of the number of operations should make a distinction between them. Fortunately, this distinction has very little influence on the optimal window size, since the number of doublings depends only lightly on it. 5.3
ECDSA Key Generation and Signing Operation
Most of the time needed for key generation and signing is typically taken by the multiplication of the EC group generator by a random number. The EC group and generator are typically known ahead of time; therefore, we can afford to do some precomputation at initialization time in order to obtain a faster signing operation. A number of algorithms for exponentiation with a fixed generator have been described in [BGMW92]. We use a rather simple one, which is also described in algorithm 14.109 of [MvV97]. We denote the group generator by P and use the additive notation. After choosing a base b, we precompute the products b^i P for all values of i up to a certain bound t so that all multipliers will be smaller than b^t. Then the algorithm does a point multiplication in at most t + h − 2 group operations, where h is the maximum value of the digits in the b-ary representation of the exponent. To avoid doing base conversions, we choose b = 2^w, which essentially results in a windowed method with window size w. If we use a binary representation for the exponent, h = b − 1. However, if we use the signed binary NAF, a number of high values of the window are impossible and h is reduced to

  h = 2(2^w − 1)/3      for even w ,
  h = (2^(w+1) − 1)/3   for odd w .

For the curves and field sizes used for the timings, using the NAF reduces the signature time by almost 10%. Note that the algorithm we implemented is not the best algorithm known. The signing time can be reduced even more using a slightly more advanced recoding algorithm from [BGMW92], which has h = 2^(w−1). With the parameters used for our timings, this would result in an additional 5% gain. Recently, an algorithm was proposed [MOC97] that gives better results for elliptic curves over GF(p). The algorithm is substantially different from the algorithms discussed in [BGMW92]; it trades point additions for doublings, which are more efficient when projective coordinates are used (see Section 3.4).
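The signed-binary NAF recoding that both the windowed method of Section 5.2 and the digit bound h above rely on can be written very compactly. The sketch below uses the standard digit-by-digit recurrence, which is equivalent to the 3e − e trick described in Section 5.1; for example, it recodes 15 as 1 0 0 0 (−1).

    # NAF recoding sketch: returns the signed digits of e, least significant
    # first, each in {-1, 0, 1}, with no two adjacent non-zero digits.
    def naf(e):
        digits = []
        while e > 0:
            if e & 1:
                d = 2 - (e & 3)        # +1 if e = 1 (mod 4), -1 if e = 3 (mod 4)
                e -= d
            else:
                d = 0
            digits.append(d)
            e >>= 1
        return digits

    assert naf(15) == [-1, 0, 0, 0, 1]     # 15 = 16 - 1, i.e. 1 0 0 0 (-1)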
5.4
ECDSA Verification Operation
Both the DSA and the ECDSA operation require the computation of a simultaneous multiple point multiplication, i.e., a group element of the form k1 P1 +k2 P2 , where P1 and P2 are elements of the group and k1 and k2 are integers. Algorithm 14.88 of [MvV97] gives a way to compute this in an interleaved way, rather than by calculating the two point multiplications separately and adding the result. If this algorithm is combined with a sliding window technique, we obtain an algorithm that is only 20%-25% slower than a single point multiplication. The optimal window length is 2 for exponents up to at least 500 bits. From simulations, we estimate that the average length of a zero run is approximately 0.6. The number of operations is given by a formula similar to (1).
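A width-1 version of this interleaved computation (algorithm 14.88 of [MvV97], often called Shamir's trick) can be sketched as follows; it again reuses the affine ec_add from Section 2 and performs a single doubling per exponent bit.

    # Simultaneous multiple point multiplication k1*P1 + k2*P2 with one shared
    # doubling per bit and a tiny precomputed table.
    def ec_mul2(k1, P1, k2, P2, a, p):
        table = {(0, 1): P2, (1, 0): P1, (1, 1): ec_add(P1, P2, a, p)}
        R = None
        for i in range(max(k1.bit_length(), k2.bit_length()) - 1, -1, -1):
            R = ec_add(R, R, a, p)                 # one doubling per bit
            bits = ((k1 >> i) & 1, (k2 >> i) & 1)
            if bits != (0, 0):
                R = ec_add(R, table[bits], a, p)   # at most one addition per bit
        return R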
6
Timings and Comparison
We timed our implementation for a number of example curves from the current draft of ANSI X9.62. For GF(p) we used a modulus of 192 bits. The curve parameter a of the example curve is 3 less than the modulus, allowing for a faster projective doubling operation. For GF(2^n), we did timings for 2 trinomials of degree 191, one that is specified in the standard, and one that has better properties with respect to the reduction step of the almost inverse algorithm. For the latter, we did not use a curve from the standard. The timings were done on a PPro-200-based PC with Windows NT 4.0 using MSVC 4.2 and maximal optimization. The code for RSA and DSA was written in C, using some small macros in assembly language. The elliptic curve code was mainly written in C++; for GF(p) the same multi-precision routines in C were called as for RSA and DSA.

Table 1 gives timing results for the field operations and the elliptic curve group operations for both GF(p) and GF(2^n). The computation of inverses is clearly more expensive over GF(p), but this is largely compensated for by the faster multiplication, since projective coordinates can be used.

Table 1. Timings for field operations over GF(p) and GF(2^n). The field size is approximately 191 bits for both. For GF(2^n), two timings are given, one using a trinomial specified in ANSI X9.62 and the other using a trinomial with better properties with respect to the almost inverse algorithm. All times in µs.

                    GF(p)   GF(2^n), standard trinomial   GF(2^n), improved trinomial
  addition           1.6               0.6                           0.6
  multiplication     7.8              39                            39
  squaring           7.6               2.6                           2.6
  inverse          180               159                           126
  EC addition      103               242                           215
  EC double         76               246                           220
Table 2 gives timing results for the overall key generation, signing, and verification operations for ECDSA, RSA and DSA, as well as for general point multiplication on an elliptic curve. For DSA and ECDSA, we assumed that the underlying group is the same for all users; if this is not the case, the key generation time has to be augmented by the time needed to generate an appropriate group (such as prime generation, point counting on an elliptic curve, etc.).

Table 2. Comparison of ECDSA to other signature algorithms. For EC, the field size is approximately 191 bits. The modulus for RSA and DSA is 1024 bits long; the RSA public exponent is 3. All times in ms, unless otherwise indicated.

                           ECDSA GF(2^n)    ECDSA GF(2^n)    ECDSA     RSA     DSA
                           standard trin.   improved trin.   GF(p)
  key generation                13.0             11.7          5.5     1 s    22.7
  signature                     13.3             11.3          6.3    43.3    23.6
  verification                  68               60           26       0.65   28.3
  general point multipl.        56               50           21.1      -       -
The modulus for both RSA and DSA is 1024 bits long. There is no general consensus about the relative security levels of EC, RSA, and DSA as a function of the size of the parameters. It is probably safe to state that EC with a group size of 190 bits is slightly stronger than RSA or DSA with a 1024-bit modulus. The RSA public exponent is 3. Note that the DSA implementation does not use precomputation for the key generation and signing operation, whereas ECDSA does.
7
Further Work
There are still a number of potential optimizations we have not used in our implementation. For GF(2n ), anomalous curves could be used [K91]. In [S97b], an algorithm is proposed that requires less than λ/3 elliptic curve additions and a number of field squarings, the latter being almost for free in GF(2n ). This would be particularly interesting to speed up the verification operation. Note that anomalous curves over GF(p) should be avoided for cryptographic use [S97a]; for anomalous curves over GF(2n ) no particular weaknesses have been found. For key and signature generation, the optimizations described at the end of Section 5.3 could be implemented. Using an advanced technique from [BGMW92] might further improve the speed of these operations. In [GP97], an improved point multiplication algorithm is described, based on a more efficient way to repeatedly double a point by trading inversions for multiplications. The paper only discusses the GF(2n ) case, and is currently not advantageous for our implementation because the inversion is relatively fast. However, a similar idea can probably be applied to GF(p), and there the benefit
could be more important because of the fast field multiplication. Note that this idea cannot be combined with projective coordinates; more work is needed to determine which of the two results in the fastest implementation.
References A97.
ANSI X9.62-199x: Public Key Cryptography for the Financial Services Industry: The Elliptic Curve Digital Signature Algorithm (ECDSA), June 11, 1997. AM93. A. Atkin and F. Morain, “Elliptic curves and primality proving,” Mathematics of Computation, Vol. 61 (1993), pp. 29–68. B87. P. Barrett, “Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor,” Advances in Cryptology, Proc. Crypto’86, LNCS 263, A. Odlyzko, Ed., Springer-Verlag, 1987, pp. 311–323. BGMW92. E. Brickell, D. Gordon, K. McCurley and D. Wilson, “Fast exponentiation with precomputation,” Advances in Cryptology, Proc. Eurocrypt’92, LNCS 658, R.A. Rueppel, Ed., Springer-Verlag, 1993, pp. 200–207. DBV+96. E. De Win, A. Bosselaers, S. Vandenberghe, P. De Gersem and J. Vandewalle, “A fast software implementation for arithmetic operations in GF(2n ),” Advances in Cryptology, Proc. Asiacrypt’96, LNCS 1163, K. Kim and T. Matsumoto, Eds., Springer-Verlag, 1996, pp. 65–76. G96. D. Gordon, “A survey of fast exponentiation methods,” draft, 1996. GP97. J. Guajardo and C. Paar, “Efficient algorithms for elliptic curve cryptosystems,” Advances in Cryptology, Proc. Crypto’97, LNCS 1294, B. Kaliski, Ed., Springer-Verlag, 1997, pp. 342–356. HMV92. G. Harper, A. Menezes and S. Vanstone, “Public key cryptosystems with very small key length,” Advances in Cryptology, Proc. Eurocrypt’92, LNCS 658, R.A. Rueppel, Ed., Springer-Verlag, 1993, pp. 163–173. I97. IEEE P1363: Editorial Contribution to Standard for Public Key Cryptography, August 18, 1997. K95a. B. Kaliski Jr., “The Montgomery inverse and its applications,” IEEE Transactions on Computers, Vol. 44, no. 8 (1995), pp. 1064–1065. K81. D. Knuth, The art of computer programming, Vol. 2, Semi-numerical Algorithms, 2nd Edition, Addison-Wesley, Reading, Mass., 1981. K87. N. Koblitz, “Elliptic curve cryptosystems,” Mathematics of Computation, Vol. 48, no. 177 (1987), pp. 203–209. K91. N. Koblitz, “CM-curves with good cryptographic properties,” Advances in Cryptology, Proc. Crypto’91, LNCS 576, J. Feigenbaum, Ed., SpringerVerlag, 1997, pp. 279–287. K95b. C. Ko¸c, “Analysis of sliding window techniques for exponentiation,” Computers Math. Applic., Vol. 30, no. 10 (1995), pp. 17–24. KT92. K. Koyama and Y. Tsuruoka, “Speeding up elliptic cryptosystems by using a signed binary window method,” Advances in Cryptology, Proc. Crypto’92, LNCS 740, E. Brickell, Ed., Springer-Verlag, 1993, pp. 345–357. L87. H.W. Lenstra Jr., “Factoring integers with elliptic curves,” Annals of Mathematics, Vol. 126 (1987), pp. 649–673. M93. A. Menezes, Elliptic curve public key cryptosystems, Kluwer Academic Publishers, 1993.
266
Erik De Win et al.
MOV93.
A. Menezes, T. Okamoto and S. Vanstone, “Reducing elliptic curve logarithms to logarithms in a finite field,” IEEE Transactions on Information Theory, Vol. 39 (1993), pp. 1639–1646. MvV97. A. Menezes, P. van Oorschot and S. Vanstone, Handbook of applied cryptography, CRC Press, 1997. M85a. V.S. Miller, “Use of elliptic curves in cryptography,” Advances in Cryptology Proc. Crypto’85, LNCS 218, H.C. Williams, Ed., Springer-Verlag, 1985, pp. 417–426. MOC97. A. Miyaji, T. Ono and H. Cohen, “Efficient elliptic curve exponentiation,” Proceedings of ICICS’97, LNCS 1334, Y. Han, T. Okamoto and S. Qing, Eds., Springer-Verlag, 1997, pp. 282–290. M85b. P. Montgomery, “Modular multiplication without trial division,” Mathematics of Computation, Vol. 44 (1985), pp. 519–521. MO90. F. Morain and J. Olivos, “Speeding up the computations on an elliptic curve using addition-subtraction chains,” Informatique Th´ eorique et Applications, Vol. 24, pp. 531–543, 1990. MOVW88. R. Mullin, I. Onyszchuk, S. Vanstone and R. Wilson, “Optimal normal bases in GF(pn ),” Discrete Applied Mathematics, Vol. 22 (1988/1989), pp. 149–161. R60. G. Reitwiesner, “Binary arithmetic,” Advances in Computers, Vol. 1 (1960), pp. 231–308 SOOS95. R. Schroeppel, H. Orman, S. O’Malley and O. Spatscheck, “Fast key exchange with elliptic curve systems,” Advances in Cryptology, Proc. Crypto’95, LNCS 963, D. Coppersmith, Ed., Springer-Verlag, 1995, pp. 43–56. S97a. N. Smart, “Elliptic Curve Discrete Logarithms,” message to newsgroup sci.math.research. no. [email protected], Sept. 30 1997. S97b. J. Solinas, “An improved algorithm for arithmetic on a family of elliptic curves,” Advances in Cryptology, Proc. Crypto’97, LNCS 1294, B. Kaliski, Ed., Springer-Verlag, 1997, pp. 357–371.
NTRU: A Ring-Based Public Key Cryptosystem Jeffrey Hoffstein, Jill Pipher, and Joseph H. Silverman Abstract. We describe NTRU, a new public key cryptosystem. NTRU features reasonably short, easily created keys, high speed, and low memory requirements. NTRU encryption and decryption use a mixing system suggested by polynomial algebra combined with a clustering principle based on elementary probability theory. The security of the NTRU cryptosystem comes from the interaction of the polynomial mixing system with the independence of reduction modulo two relatively prime integers p and q.
Contents 0. Introduction 1. Description of the NTRU Algorithm 1.1. Notation 1.2. Key Creation 1.3. Encryption 1.4. Decryption 1.5. Why Decryption Works 2. Parameter Selection 2.1. Notation and a Norm Estimate 2.2. Sample Spaces 2.3. A Decryption Criterion 3. Security Analysis 3.1. Brute Force Attacks 3.2. Meet-in-the-Middle Attacks 3.3. Multiple Transmission Attacks 3.4. Lattice Based Attacks 4. Practical Implementations of NTRU 4.1. Specific Parameter Choices 4.2. Lattice Attacks — Experimental Evidence 5. Additional Topics 5.1. Improving Message Expansion 5.2. Theoretical Operating Specifications 5.3. Other Implementation Considerations 5.4. Comparison with Other PKCS’s 6. Appendix
J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 267–288, 1998. c Springer-Verlag Berlin Heidelberg 1998
268
Jeffrey Hoffstein, Jill Pipher, and Joseph H. Silverman
§0.
Introduction
There has been considerable interest in the creation of efficient and computationally inexpensive public key cryptosystems since Diffie and Hellman [3] explained how such systems could be created using one-way functions. Currently, the most widely used public key system is RSA, which was created by Rivest, Shamir and Adelman in 1978 [9] and is based on the difficulty of factoring large numbers. Other systems include the McEliece system [8] which relies on error correcting codes, and a recent system of Goldreich, Goldwasser, and Halevi [4] which is based on the difficulty of lattice reduction problems. In this paper we describe a new public key cryptosystem, which we call the NTRU system. The encryption procedure uses a mixing system based on polynomial algebra and reduction modulo two numbers p and q, while the decryption procedure uses an unmixing system whose validity depends on elementary probability theory. The security of the NTRU public key cryptosystem comes from the interaction of the polynomial mixing system with the independence of reduction modulo p and q. Security also relies on the (experimentally observed) fact that for most lattices, it is very difficult to find extremely short (as opposed to moderately short) vectors. We mention that the presentation in this paper differs from an earlier, widely circulated but unpublished, preprint [6] in that the analysis of lattice-based attacks has been expanded and clarified, based largely on the numerous comments received from Don Coppersmith, Johan H˚ astad, and Adi Shamir in person, via email, and in the recent article [2]. We would like to take this opportunity to thank them for their interest and their help. NTRU fits into the general framework of a probabilistic cryptosystem as described in [1] and [5]. This means that encryption includes a random element, so each message has many possible encryptions. Encryption and decryption with NTRU are extremely fast, and key creation is fast and easy. See Section 5 for specifics, but we note here that NTRU takes O(N 2 ) operations to encrypt or decrypt a message block of length N , making it considerably faster than the O(N 3 ) operations required by RSA. Further, NTRU key lengths are O(N ), which compares well with the O(N 2 ) key lengths required by other “fast” public keys systems such as [8, 4]. §1.
Description of the NTRU Algorithm
§1.1. Notation. An NTRU cryptosystem depends on three integer parameters (N, p, q) and four sets Lf , Lg , Lφ , Lm of polynomials of degree N − 1 with integer coefficients. Note that p and q need not be prime, but we will assume that gcd(p, q) = 1, and q will always be considerably larger than p. We work in the ring R = [X]/(X N − 1). An element F ∈ R will be written as a polynomial or a vector, N −1 F = Fi xi = [F0 , F1 , . . . , FN −1 ]. We write
i=0
to denote multiplication in R. This star multiplication is given
NTRU: A Ring-Based Public Key Cryptosystem
269
explicitly as a cyclic convolution product, F
G =H
with Hk =
k
N −1
Fi Gk−i +
i=0
Fi GN +k−i =
i=k+1
Fi Gj .
i+j≡k (mod N )
When we do a multiplication modulo (say) q, we mean to reduce the coefficients modulo q. Remark. In principle, computation of a product F G requires N 2 multiplications. However, for a typical product used by NTRU, one of F or G has small coefficients, so the computation of F G is very fast. On the other hand, if N is taken to be large, then it might be faster to use Fast Fourier Transforms to compute products F G in O(N log N ) operations. §1.2. Key Creation. To create an NTRU key, Dan randomly chooses 2 polynomials f, g ∈ Lg . The polynomial f must satisfy the additional requirement that it have inverses modulo q and modulo p. For suitable parameter choices, this will be true for most choices of f , and the actual computation of these inverses is easy using a modification of the Euclidean algorithm. We will denote these inverses by Fq and Fp , that is, Fq
f ≡ 1 (mod q)
and
Fp f ≡ 1 (mod p).
(1)
Dan next computes the quantity h ≡ Fq
g (mod q).
(2)
Dan’s public key is the polynomial h. Dan’s private key is the polynomial f , although in practice he will also want to store Fp . §1.3. Encryption. Suppose that Cathy (the encrypter) wants to send a message to Dan (the decrypter). She begins by selecting a message m from the set of plaintexts Lm . Next she randomly chooses a polynomial φ ∈ Lφ and uses Dan’s public key h to compute e ≡ pφ h + m (mod q). This is the encrypted message which Cathy transmits to Dan. §1.4. Decryption. Suppose that Dan has received the message e from Cathy and wants to decrypt it using his private key f . To do this efficiently, Dan should have precomputed the polynomial Fp described in Section 1.1. In order to decrypt e, Dan first computes a≡f
e (mod q),
where he chooses the coefficients of a in the interval from −q/2 to q/2. Now treating a as a polynomial with integer coefficients, Dan recovers the message by computing Fp a (mod p).
270
Jeffrey Hoffstein, Jill Pipher, and Joseph H. Silverman
Remark. For appropriate parameter values, there is an extremely high probability that the decryption procedure will recover the original message. However, some parameter choices may cause occasional decryption failure, so one should probably include a few check bits in each message block. The usual cause of decryption failure will be that the message is improperly centered. In this case Dan will be able to recover the message by choosing the coefficients of a ≡ f e (mod q) in a slightly different interval, for example from −q/2 + x to q/2 + x for some small (positive or negative) value of x. If no value of x works, then we say that we have gap failure and the message cannot be decrypted as easily. For well-chosen parameter values, this will occur so rarely that it can be ignored in practice. §1.5. Why Decryption Works. The polynomial a that Dan computes satisfies a≡f
e ≡ f pφ h + f m (mod q) = f pφ Fq g + f m (mod q) from (2), = pφ g + f m (mod q) from (1).
Consider this last polynomial pφ g + f m. For appropriate parameter choices, we can ensure that (almost always) all of its coefficients lie between −q/2 and q/2, so that it doesn’t change if its coefficients are reduced modulo q. This means that when Dan reduces the coefficients of f e modulo q into the interval from −q/2 to q/2, he recovers exactly the polynomial a = pφ g + f
m
in
[X]/(X N − 1).
Reducing a modulo p then gives him the polynomial f tiplication by Fp retrieves the message m (mod p). §2.
m (mod p), and mul-
Parameter Selection
§2.1. Notation and a Norm Estimate. We define the width of an element F ∈ R to be |F |∞ = max {Fi } − min {Fi }. 1≤i≤N
1≤i≤N
As our notation suggests, this is a sort of L∞ norm on R. Similarly, we define a centered L2 norm on R by |F |2 =
N i=1
1/2 (Fi − F¯ )2
,
N 1 where F¯ = Fi . N i=1
√ (Equivalently, |F |2 / N is the standard deviation of the coefficients of F .) The following proposition was suggested to us by Don Coppersmith.
NTRU: A Ring-Based Public Key Cryptosystem
271
Proposition. For any ε > 0 there are constants γ1 , γ2 > 0, depending on ε and N , such that for randomly chosen polynomials F, G ∈ R, the probability is greater than 1 − ε that they satisfy γ1 |F |2 |G|2 ≤ |F
G|∞ ≤ γ2 |F |2 |G|2 .
Of course, this proposition would be useless from a practical viewpoint if the ratio γ2 /γ1 were very large for small ε’s. However, it turns out that even for moderately large values of N and very small values of ε, the constants γ1 , γ2 are not at all extreme. We have verified this experimentally for a large number of parameter values and have an outline of a theoretical proof. §2.2. Sample Spaces. The space of messages Lm consists of all polynomials modulo p. Assuming p is odd, it is most convenient to take 1 1 Lm = m ∈ R : m has coefficients lying between − (p − 1) and (p − 1) . 2 2 To describe the other samples spaces, we will use sets of the form has d1 coefficients equal 1, L(d1 , d2 ) = F ∈ R : d F . 2 coefficients equal −1, the rest 0 With this notation, we choose three positive integers df , dg , d and set Lf = L(df , df − 1),
Lg = L(dg , dg ),
and Lφ = L(d, d).
(The reason we don’t set Lf = L(df , df ) is because we want f to be invertible, and a polynomial satisfying f (1) = 0 can never be invertible.) Notice that f ∈ Lf , g ∈ Lg , and φ ∈ Lφ have L2 norms √ |f |2 = 2df − 1 − N −1 , |g|2 = 2dg , |φ|2 = 2d. Later we will give values for df , dg , d which allow decryption while maintaining various security levels. §2.3. A Decryption Criterion. In order for the decryption process to work, it is necessary that |f m + pφ g|∞ < q. We have found that this will virtually always be true if we choose parameters so that and |pφ g|∞ ≤ q/4, |f m|∞ ≤ q/4 and in view of the above Proposition, this suggests that we take |f |2 |m|2 ≈ q/4γ2
and
|φ|2 |g|2 ≈ q/4pγ2
(3)
for a γ2 corresponding to a small value for ε. For example, experimental evidence suggests that for N = 107, N = 167, and N = 503, appropriate values for γ2 are 0.35, 0.27, and 0.17 respectively.
272
Jeffrey Hoffstein, Jill Pipher, and Joseph H. Silverman
§3.
Security Analysis
§3.1. Brute Force Attacks. An attacker can recover the private key by trying all possible f ∈ Lf and testing if f h (mod q) has small entries, or by trying all g ∈ Lg and testing if g h−1 (mod q) has small entries. Similarly, an attacker can recover a message by trying all possible φ ∈ Lφ and testing if e − φ h (mod q) has small entries. In practice, Lg will be smaller than Lf , so key security is determined by #Lg , and individual message security is determined by #Lφ . However, as described in the next section, there is a meet-in-the-middle attack which (assuming sufficient storage) cuts the search time by the usual square root. Hence the security level is given by
Key 1 N! = #Lg = dg ! (N − 2dg )! Security
Message N! 1 = #Lφ = . d! (N − 2d)! Security §3.2. Meet-in-the-Middle Attacks. Recall that an encrypted message looks like e ≡ φ h + m (mod q). Andrew Odlyzko has pointed out that there is a meet-in-the-middle attack which can be used against φ, and we observe that a similar attack applies also to the private key f . Briefly, one splits f in half, say f = f1 + f2 , and then one matches f1 e against −f2 e, looking for (f1 , f2 ) so that the corresponding coefficients have approximately the same value. Hence in order to obtain a security level of (say) 280 , one must choose f , g, and φ from sets containing around 2160 elements. (For further details, see [13].) §3.3. Multiple Transmission Attacks. If Cathy sends a single message m several times using the same public key but different random φ’s, then the attacker Betty will be able to recover a large part of the message. Briefly, suppose that Cathy transmits ei ≡ φi h + m (mod q) for i = 1, 2, . . . , r. Betty can then compute (ei − e1 ) h−1 (mod q), thereby recovering φi − φ1 (mod q). However, the coefficients of the φ’s are so small that she recovers exactly φi − φ1 , and from this she will recover many of the coefficients of φ1 . If r is even of moderate size (say 4 or 5), Betty will recover enough of φ1 to be able to test all possibilities for the remaining coefficients by brute force, thereby recovering m. Thus multiple transmission are not advised without some further scrambling of the underlying message. We do point out that even if Betty decrypts a single message in this fashion, this information will not assist her in decrypting any subsequent messages. §3.4. Lattice Based Attacks. The object of this section is to give a brief analysis of the known lattice attacks on both the public key h and the message m. We begin with a few words concerning lattice reduction. The goal of lattice reduction is to find one or more “small” vectors in a given lattice. In theory, the smallest vector can be found by an exhaustive search, but in practice this is not possible if the dimension is large. The LLL algorithm of Lenstra-LenstraLov´ asz [7], with various improvements due to Schnorr and others, [10, 12, 11]
NTRU: A Ring-Based Public Key Cryptosystem
273
will find relatively small vectors in polynomial time, but even LLL will take a long time to find the smallest vector provided that the smallest vector is not too much smaller than the expected length of the smallest vector. We will make these observations more precise below. §3.4.1. Lattice Attack on an NTRU Private Key. Consider the 2N -by-2N matrix composed of four N -by-N blocks:
α 0 .. .
0 α .. .
··· ··· .. .
0 0 .. .
h0
hN −1 .. .
h1 h0 .. .
··· ··· .. .
hN −1 hN −2 .. .
0 0 0 .. .
0 0 0 .. .
··· ··· ··· .. .
α 0 0 .. .
h1 q 0 .. .
h2 0 q .. .
··· ··· ··· .. .
h0 0 0 .. .
0
0
···
0
0
0
···
q
(Here α is a parameter to be chosen shortly.) Let L be the lattice generated by the rows of this matrix. The determinant of L is q N αN . Since the public key is h = g f −1 , the lattice L will contain the vector τ = (αf, g), by which we mean the 2N vector consisting of the N coefficients of f multiplied by α, followed by the N coefficients of g. By the gaussian heuristic, the expected size of the smallest vector in a random lattice of dimension n and determinant D lies between D
1/n
n 2πe
and D
1/n
n . πe
In our case, n = 2N and D = q N αN , so the expected smallest length is larger (but not much larger) than N αq . s= πe An implementation of a lattice reduction algorithm will have the best chance of locating τ , or another vector whose length is close to τ , if the attacker chooses α to maximize the ratio s/ |τ |2 . Squaring this ratio, we see that an attacker should choose α so as to maximize α α2 |f |22 + |g|22
−1 = α |f |22 + α−1 |g|22 .
This is done by choosing α = |g|2 / |f |2 . (Note that |g|2 and |f |2 are both public quantities.) When α is chosen in this way, we define a constant ch by setting |τ |2 = ch s. Thus ch is the ratio of the length of the target vector to the length of the expected
274
Jeffrey Hoffstein, Jill Pipher, and Joseph H. Silverman
shortest vector. The smaller the value of ch , the easier it will be to find the target vector. Substituting in above, we obtain
2πe |f |2 |g|2 . ch = Nq For a given pair (f, g) used to set up the cryptosystem, ch may be viewed as a measure of how far the associated lattice departs from a random lattice. If ch is close to 1, then L will resemble a random lattice and lattice reduction methods will have a hard time finding a short vector in general, and finding τ in particular. As ch decreases, lattice reduction algorithms will have an easier time finding τ . Based on the limited evidence we have obtained, the time required appears to be (at least) exponential in N , with a constant in the exponent proportional to ch . §3.4.2. Lattice Attack on an NTRU Message. A lattice attack may also be directed against an individual message m. Here the associated lattice problem is very similar to that for h, and the target vector will have the form (αm, φ). As before, the attacker should balance the lattice using α = |φ|2 / |m|2 , which leads to the value
2πe |m|2 |φ|2 . cm = Nq This constant cm gives a measure of the vulnerability of an individual message to a lattice attack, similar to the way ch does for a lattice attack on h. An encrypted message is most vulnerable if cm is small, and becomes less so as cm gets closer to 1. In order to make the attacks on h and m equally difficult, we want to take cm ≈ ch , or equivalently, |f |2 |g|2 ≈ |m|2 |φ|2 . For concreteness, we will now restrict to the case that p = 3; other values may be analyzed similarly. For p = 3, ≈ 2N/3. an average message m will consist of N/3 each of 1, 0 and −1, so |m|2 √ Similarly, φ consists of d each of 1 and −1, with the rest 0’s, so |φ|2 = 2d. Thus we will want to set |f |2 |g|2 ≈ 4N d/3. This can be combined with the decryption criterion (3) to assist in choosing parameters. §3.4.3. Lattice Attack on a Spurious Key. Rather than trying to find the private key f , an attacker might use the lattice described above (in Section 3.4.1) and try to find some other short vector in the lattice, say of the form τ = (αf , g ). If this vector is short enough, then f will act as a decryption key. More precisely, if it turns out that with high probability, f e ≡ pφ g + m f (mod q)
satisfies |pφ g + m f |∞ < q, then decryption will succeed; and even if this width is 2q or 3q, it is possible that the message could be recovered via errorcorrecting techniques, especially if several such τ ’s could be found. This idea,
NTRU: A Ring-Based Public Key Cryptosystem
275
which is due to Coppersmith and Shamir, is described in [2]. However experimental evidence suggests that the existence of spurious keys does not pose a security threat. See Section 4.2 for a further discussion of this point. §4.
Practical Implementations of NTRU
§4.1. Specific Parameter Choices. We will now present three distinct sets of parameters which yield three different levels of security. The norms of f and g have been chosen so that decryption failure occurs with probability less than 5 · 10−5 (based on extensive computer experimentation). Case A: Moderate Security The Moderate Security parameters are suitable for situations in which the intrinsic value of any individual message is small, and in which keys will be changed with reasonable frequency. Examples might include encrypting of television, pager, and cellular telephone transmissions. (N, p, q) = (107, 3, 64) Lf = L(15, 14),
Lg = L(12, 12),
Lφ = L(5, 5) (i.e., d = 5).
(In other words, f is chosen with 15 1’s and 14 −1’s, g is chosen with 12 1’s and 12 −1’s, and φ is chosen with 5 1’s and 5 −1’s.) These give key sizes Private Key = 340 bits
and Public Key = 642 bits,
and (meet-in-the-middle) security levels Key Security = 250
and Message Security = 226.5 .
(We note again that meet-in-the-middle attacks require large amounts of computer storage; for straight search brute force attacks, these security levels should be squared.) Substituting the above values into the appropriate formulas yields lattice values ch = 0.257,
cm = 0.258,
and s = 0.422q.
Case B: High Security (N, p, q) = (167, 3, 128) Lf = L(61, 60),
Lg = L(20, 20),
Private Key = 530 bits Key Security = 2 ch = 0.236,
82.9
Lφ = L(18, 18) (i.e., d = 18)
and Public Key = 1169 bits and Message Security = 277.5
cm = 0.225,
and s = 0.296q.
276
Jeffrey Hoffstein, Jill Pipher, and Joseph H. Silverman
Case C: Highest Security (N, p, q) = (503, 3, 256) Lf = L(216, 215),
Lg = L(72, 72),
Private Key = 1595 bits Key Security = 2 ch = 0.182,
285
Lφ = L(55, 55) (i.e., d = 55)
and Public Key = 4024 bits
and Message Security = 2170
cm = 0.160,
and s = 0.0.365q.
§4.2. Lattice Attacks — Experimental Evidence. In this section we describe our preliminary analysis of the security of the NTRU Public Key Cryptosystem from attacks using lattice reduction methods. It is based on experiments which were performed using version 1.7 of Victor Shoup’s implementation of the Schnorr,Euchner and Hoerner improvements of the LLL algorithm, distributed in his NTL package at http://www.cs.wisc.edu/ shoup/ntl/. The NTL package was run on a 200 M Hz Pentium Pro with a Linux operating system. This algorithm has several parameters that can be adjusted to give varying types of results. In general the LLL algorithm can be tuned to either find a somewhat short point in a small amount of time or a very short point in a longer time. The key quantity is the constant ch (or cm ) described above. It is somewhat easier to decrypt messages if these constants are small, somewhat harder if they are close to 1. The idea is to choose a compromise value which makes decryption easy, while still making it difficult for LLL to work effectively. The following tables give the time required for LLL to find either the target (αf, g) or a closely related vector in the lattice L of 3.4.1 for various choices of q, ch and dimension N . As will be elaborated on further in the Appendix, the algorithm seems to find either a vector of the correct length, or one considerably too long to be useful for decryption. Even if it were to find a spurious key of length somewhat longer than the target, as discussed by Coppersmith and Shamir in [2], it appears that the time required to find such a key would not be significantly less than that required to find the true target. We have chosen parameters so that cm ≈ ch . (So the time required to break an individual message should be on the same order as the time required to break the public key). In all cases we found that when N gets sufficiently large the algorithm fails to terminate, probably because of accumulated round off errors. The tables end roughly at this point. In this version of LLL there are three parameters that can be fine tuned to optimize an attack. The tables give typical running times to break a key pair for the most optimal choices of parameters we have found to date. The two columns give results for two different floating point versions of the program, QP1 offering higher precision. We then use this information to extrapolate running times for larger values of N , assuming the algorithm were to terminate.
NTRU: A Ring-Based Public Key Cryptosystem
FP
Case A q=64 c=0.26
Case B q=128 c=0.23
Case C q=256 c=0.18
277
QP1
N 75 80 85 90 92 94 96 98
time (secs) 561 1493 2832 4435 7440 12908 28534 129938
N 75 80 85 88 90 95 96 98 100
time (secs) 1604 3406 5168 11298 16102 62321 80045 374034 183307
N 75 80 85 90 95
time (secs) 600 953 1127 3816 13588
N 75 80 85 90 95 100
time (secs) 3026 5452 8171 20195 57087 109706
N 75 80 85 90 95 100 102
time (secs) 547 765 1651 2414 2934 7471 8648
N 75 78 81 84 87 90 93 96 99 102 105 108
time (secs) 2293 3513 3453 5061 6685 9753 16946 19854 30014 51207 75860 145834
We will write t(N ) for the time in seconds necessary to break a public key corresponding to a parameter N . When we graph log t(N ) against N , the examples we have done seem to indicate that the graph has a positive slope with a small positive concavity. This would indicate that t(N ) grows at least exponentially with N , and possibly even with N log N . To extrapolate out to higher values of N , we have taken the information we have and approximated a lower bound for the slope of log t(N ) against N . This gives the following rough estimates for
278
Jeffrey Hoffstein, Jill Pipher, and Joseph H. Silverman
t(N ) in seconds using FP: t(N ) > 12908 exp[(0.396)(N − 94)]
(Moderate Security)
t(N ) > 13588 exp[(0.291)(N − 95)] t(N ) > 2414 exp[(0.10)(N − 92)]
(High Security) (Highest Security)
The running times for QP1 are longer for small N, but yield a better exponential constant, so for QP1 we obtain: t(N ) > 80045 exp[(0.207)(N − 96)]
(Moderate Security)
t(N ) > 8171 exp[(0.17315)(N − 85)] t(N ) > 30014 exp[(0.17564)(N − 99)]
(High Security) (Highest Security)
These lower bounds yield the following estimates for the time necessary to break the different levels of NTRU security using QP1 running on one 200 MHz Pentium Pro: Type
Level
q
c
N
Time (seconds)
QP 1
Moderate
64
0.26
107
780, 230 (9 days)
QP 1
High
128
0.23
167
1.198 · 1010 (380 years)
QP 1
Highest
256
0.18
503
1.969 · 1035 (6.2 · 1027 years)
A more detailed analysis and description of the lattice experiments is given in the Appendix. §5.
Additional Topics
§5.1. Improving Message Expansion. The NTRU PKCS’s for the sample parameters presented in Section 4.1 have moderate message expansions. However, as the principal use for PKCS’s is the exchange of a private key in a single message block this is not a significant problem. It may be worth mentioning, though, that there is a simple way that the NTRU technique can be used to convey a very long message, with an expansion of only 1-1 after the first mesage block. With this approach, the first encrypted message e1 that Cathy sends is decrypted as a sequence of 1’s, 0’s and −1’s (taking p = 3) and interpreted as a φ1 for the next message block. The next encrypted message block is φ1 e1 + m1 , where m1 is the first block of the actual message. As Dan knows φ1 , he can recover m1 mod q exactly. The next encrypted message block Cathy sends is e2 = φ2 e1 +m2 , where Cathy derived φ2 from m1 by squaring m1 and reducing it mod 3. Dan can now recover φ2 as he knows m1 , and hence can derive m2 mod q from e2 . This can continue for a message of arbitrary length. §5.2. Theoretical Operating Specifications. In this section we consider the theoretical operating characteristics of the NTRU PKCS. There are three
NTRU: A Ring-Based Public Key Cryptosystem
279
integer parameters (N, p, q), four sets Lf , Lg , Lφ , Lm determined respectively by integers df , dg , d, p as described in Sections 1.1 and 2.2. The following table summarizes the NTRU PKCS operating characteristics in terms of these parameters. Plain Text Block N log2 p bits Encrypted Text Block N log2 q bits Encryption Speed∗
O(N 2 ) operations
Decryption Speed Message Expansion
O(N 2 ) operations logp q-to-1
Private Key Length Public Key Length
2N log2 p bits N log2 q bits
∗
Precisely, 4N 2 additions and N divisions by q with remainder
§5.3. Other Implementation Considerations. We briefly mention some additional factors which should be considered when implementing NTRU. (1) It is important that gcd(q, p) = 1. Although in principle NTRU will work without this requirement, in practice having gcd(q, p) > 1 will decrease security. At the extreme range, if p|q, then the encrypted message e satisfies e ≡ m (mod p), so it is completely insecure. (2) We want most f ’s to have inverses modulo p and modulo q, since otherwise it will be hard to create keys. A first necessary requirement is that gcd(f (1), pq) = 1, but if this fails for some chosen f , the code creator can instead use, say, f (X) + 1 or f (X) − 1. Assuming gcd(f (1), pq) = 1, virtually all f ’s will have the required inverses if we take N to be a prime and require that for each prime P dividing p and q, the order of P in (/N )∗ is large, say either N − 1 or (N − 1)/2. For example, this will certainly be true if (N − 1)/2 is itself prime (i.e., N is a Sophie Germain prime). Examples of such primes include 107, 167 and 503. §5.4. Comparison with Other PKCS’s. There are currently a number of public key cryptosystems in the literature, including the system of Rivest, Shamir, and Adelman (RSA [9]) based on the difficulty of factoring, the system of McEliece [8] based on error correcting codes, and the recent system of Goldreich, Goldwasser, and Halevi (GGH [4]) based on the difficulty of finding short almost-orthogonalized bases in a lattice. The NTRU system has some features in common with McEliece’s system, in that -multiplication in the ring R can be formulated as multiplication of matrices (of a special kind), and then encryption in both systems can be written as a matrix multiplication E = AX + Y , where A is the public key. A minor difference between the two systems is that for an NTRU encryption, Y is the message and X is a random vector, while the McEliece system reverses these assignments. But the real difference is the underlying trap-door which allows decryption. For the McEliece system, the matrix A is associated to an error correcting (Goppa) code, and decryption works because the random contribution is small enough to be “corrected” by the Goppa code. For NTRU, the matrix A
280
Jeffrey Hoffstein, Jill Pipher, and Joseph H. Silverman
is a circulant matrix, and decryption depends on the decomposition of A into a product of two matrices having a special form, together with a lifting from mod q to mod p. As far as we can tell, the NTRU system has little in common with the RSA system. Similarly, although the NTRU system must be set up to prevent lattice reduction attacks, its underlying decryption method is very different from the GGH system, in which decryption is based on knowledge of short lattice bases. In this aspect, GGH actually resembles the McEliece system, since in both cases decryption is performed by recognizing and eliminating a small random contribution. Contrasting this, NTRU eliminates a much larger random contribution via divisibility (i.e., congruence) considerations. The following table compares some of the theoretical operating characteristics of the RSA, McEliece, GGH, and NTRU cryptosystems. In each case the number N represents a natural security/message length parameter. NTRU
RSA
McEliece
GGH
Encryption Speed
N
N
N
2
N2
Decryption Speed(3)
N2
N3
N2
N2
Public Key
N
N
N2
N2
Private Key
N
N
N2
N2
varies
1–1
2–1
1–1
(1,2)
Message Expansion(4) (1) (2) (3) (4)
2
2
NTRU encryption requires only additions and shifts, no other multiplications RSA encryption is O(N 3 ) unless small encryption exponents are used. Asymptotically, NTRU encryption and decryption are O(N log N ) using FFT. For NTRU, see Section 5.1.
We have made some preliminary timing comparisons between NTRU and RSA, using information available from RSA’s web page. The NTRU program we used was written in C and not optimized for speed. The main uses to which PKCS’s are applied are the exchange of secret keys and short messages. Also, RSA, ECC and NTRU all work in units of “message blocks,” and any message block in any of these systems is large enough to hold a secret key of very high security, or a short message. Thus for comparison purposes, in the following we interpreted a key encryption or decryption in a PKCS to be the process of encrypting or decrypting one message block. Numbers given for encryption and decryption are message blocks processed per second. The information is summarized in the following tables: Security Level
Encrypt (blks/sec)
Decrypt (blks/sec)
Create key (sec)
Moderate
1818
505
0.1080
High
649
164
0.1555
Highest
103
19
0.8571
NTRU: 75 MHz Pentium, running MSDOS
NTRU: A Ring-Based Public Key Cryptosystem
Security Level
Encrypt (blks/sec)
Decrypt (blks/sec)
Create key (sec)
Moderate
16666
2273
0.0079
High
4762
724
0.0184
Highest
730
79
0.1528
281
NTRU: 200 MHz Pentium Pro, running Linux Security Level
Encrypt (blks/sec)
Decrypt (blks/sec)
Create key (sec)
512 bit
370
42
0.45
768 bit
189
15
1.5
1024 bit
116
7
3.8
RSA: 90MHz Pentium Security Level
Encrypt (blks/sec)
Decrypt (blks/sec)
Create key (sec)
512 bit
1020
125
0.26
768 bit
588
42
0.59
1024 bit
385
23
1.28
RSA: 255 MHz Digital AlphaStation Comparing NTRU and RSA on the Pentium 75 and 90 platforms, adjusting for clock speed, and comparing the moderate NTRU security level to 512 bit RSA security level, we find that NTRU is 5.9 times faster at encryption, 14.4 times faster at decryption and 5.0 times faster at key creation. Similarly comparing the highest NTRU security level to the 1024 bit RSA security level, NTRU is the same speed at encryption, 3.2 times faster at decryption, and 5.3 times faster at key creation. The 200 MHz Pentium pro and the 256 MHz Digital Alpha are sufficiently different that there is no obvious way to precisely compare one to the other. But simply comparing the raw numbers it is interesting to note that in spite of the slower clock speed, NTRU comes out 16, 18 and 33 times faster at encryption, decryption and key creation at moderate security, and 2, 3 and 8 times faster at high security. For related timings of ECC, we refer to Certicom’s published report: “Certicom Releases Security Builder 1.2 Performance Data” According to their report (available at http://www.certicom.com/secureb.htm), on a Pentium platform ECC takes 4.57 times as long as RSA to encrypt a message block, and 0.267 times as long to decrypt a message block. Thus compared to RSA, ECC wins by a factor of about 4 when decrypting, but loses by a factor of 4 when encrypting. Acknowledgments. We would like to thank Don Coppersmith, Johan H˚ astad, Hendrik Lenstra Jr., Bjorn Poonen, Adi Shamir, Claus Schnorr and Benne de
282
Jeffrey Hoffstein, Jill Pipher, and Joseph H. Silverman
Weger for their help with lattice reduction methods, Philip Hirschhorn for his assistance in implementing NTRU and doing LLL testing, Victor Shoup for his NTL package, Martin Mohlenkamp for several enlightening conversations about this package, Andrew Odlyzko for pointing out the meet-in-the-middle attack and other helpful suggestions, Mike Rosen for his help with polynomial inverses, and Dan Lieman for his assistance in all phases of this project. In particular, our analysis of lattice-based attacks is an amalgamation of the suggestions of Don Coppersmith, Johan H˚ astad, and Adi Shamir, combined with some thoughts of our own, although we stress that any oversights or errors in this analysis are entirely of our own devising. References 1. M. Blum, S. Goldwasser, An efficient probabilistic public-key encryption scheme which hides all partial information, Advances in Cryptology: Proceedings of CRYPTO 84, Lecture Notes in Computer Science, vol. 196, Springer-Verlag, 1985, pp. 289–299. 2. D. Coppersmith, A. Shamir, Lattice attacks on NTRU, Preprint, April 5, 1997; presented at Eurocrypt 97. 3. W. Diffie, M.E. Hellman, New directions in cryptography, IEEE Trans. on Information Theory 22 (1976), 644–654. 4. O. Goldreich, S. Goldwasser, S. Halevi, Public-key cryptosystems from lattice reduction problems, MIT – Laboratory for Computer Science preprint, November 1996. 5. S. Goldwasser and A. Micali, Probabilistic encryption, J. Computer and Systems Science 28 (1984), 270–299. 6. J. Hoffstein, J. Pipher, J.H. Silverman, NTRU: A new high speed public key cryptosystem, Preprint; presented at the rump session of Crypto 96. 7. A.K. Lenstra, H.W. Lenstra, L. Lov´sz, Factoring polynomials with polynomial coefficients, Math. Annalen 261 (1982), 515–534. 8. R.J. McEliece, A public-key cryptosystem based on algebraic coding theory, JPL Pasadena, DSN Progress Reports 42–44 (1978), 114–116. 9. R.L. Rivest, A. Shamir, L. Adleman, A method for obtaining digital signatures and public key cryptosystems, Communications of the ACM 21 (1978), 120–126. 10. C.P. Schnorr, Block reduced lattice bases and successive minima, Combinatorics, Probability and Computing 3 (1994), 507–522. 11. C.P. Schnorr, M. Euchner, Lattice basis reduction: improved practical algorithms and solving subset sum problems, Mathematical Programing 66 (1994), 181-199. 12. C.P. Schnorr, H.H. Hoerner, Attacking the Chor Rivest cryptosystem by improved lattice reduction, Proc. EUROCRYPT 1995, Lecture Notes in Computer Science 921, Springer-Verlag, 1995, pp. 1–12. 13. J.H. Silverman, A Meet-In-The-Middle Attack on an NTRU Private Key, preprint.
NTRU: A Ring-Based Public Key Cryptosystem
§6.
283
Appendix - Some Remarks on the Impementation of the Schnorr-Euchner Improvements of LLL
The LLL algorithm produces, from a given basis for a lattice, a reduced basis whose first vector is guaranteed to be relatively short. Part of this procedure involves minimizing the length of linear combinations of basis vectors, taking “blocks” of two at a time. If one minimized the length of linear combinations of basis vectors, taking as a block the entire basis, then an actual shortest vector could be found, but the time to produce it would be exponential in the dimension. One of Schnorr and Euchner’s improvements (see [10, 11, 12] was to add an extra degree of flexibility. They minimize over blocks of vectors of size greater than two, but less than the dimension. This results in shorter vectors than are generally found by the original LLL algorithm, i.e with block size equal 2, but causes an increase in running time which is exponential in the block size. In NTL 1.7 the blocksize β can be chosen, as well as a second parameter p which Schnorr and Hoerner introduced. This is intended to moderate the increase in running time as β increases. The “pruning” parameter p halts the minimization process when the probability of finding a shorter vector than already found within a given block falls below a prescribed value which depends on p. This probability is computed via the gaussian volume heuristic, the validity of which depends on the randomness of the lattice. There is a third parameter δ which is allowed to vary between 0.5 and 1.0. This parameter determines how frequently a certain recursive operation is performed. The program recommends setting δ = .99, and we have followed this recommendation. In our experiments we varied the choice of ch and of the blocksize β and pruning factor p. We never observed, even for larger values of β, a noticeable improvement from the pruning procedure and finally set p = 0, so the pruning procedure was not called. The following tables give a more complete set of information which includes the choice of β and the ratio of the smallest vector found to the target vector. We observed that for small values of β the algorithm would fail to find a vector useful for decryption. In fact it would most likely produce a q-vector, that is to say a vector with a single coordinate equal to q and the rest all zero. The initial basis for L contains N of these vectors, which are in fact not much longer than the length s = N αq/πe of the shortest expected vector. As β increased, the smallest vector found would continue to be a q-vector until a certain threshold was passed, which depended on N and ch . (Increasing with N , decreasing with ch ). After this threshold, if the algorithm terminated it would usually succeed in finding the target vector. On some occasions it would find a vector slightly smaller than a q-vector and then at the next blocksize succeed in finding the target. The general pattern is that for fixed ch the blocksize would have to increase with N in order for the algorithm to succeed in finding the target. At slightly smaller blocksizes the time required would be on the same order as the time required to find the target but the vector found — either the q-vector or slightly smaller — would be useless for decryption purposes.
284
Jeffrey Hoffstein, Jill Pipher, and Joseph H. Silverman
In Table 1 timings are given for a lattice corresponding to ch = 0.26 with |f |2 = |g|2 . This is the equivalent to the moderate security lattice attack, but the balancing of f and g makes it possible to work with smaller integers and the NTL program runs, with some exceptions, more efficiently. Notice that the necessary blocksize increases monotonically with N . In the Tables 2, 3 and 4, timings are given for moderate, high and highest security. These are again formed with |f |2 = |g|2 , and the moderate security table is a repeat to give some idea of the variation that occurs. Finally, Table 5 is formed with |f |2 and |g|2 taking the same ratio as in the actual encryption procedure. The α = 0.9097 indicates that the lattice has been balanced to optimize the chances of an attacker. Note that the times are roughly the same as the equivalent situation in Tables 1 and 2, but timing deteriorates very substantially at N = 98. Notice some curiously short timings at N = 90 in Tables 2 and 5. These occurred when the algorithm terminated after locating a particular short vector: (f , f h), with f = (1, −1, 1, −1, 1, . . . ). The value of f h is then (k, −k, k, . . . ), for some k, with k taking the value 1 or −1 with probability 2/q. If this happens, (f , f h) is short, but as f is highly non-invertible it is useless for decryption purposes.
NTRU: A Ring-Based Public Key Cryptosystem
N
Block size
Running time (sec)
Actual Total Norm
Smallest Norm Found
Ratio of found to actual
75 80 80 80 80 85 85 85 85 85 85 90 90 90 90 90 90 90 90 95 95 95 95 95 95 95 95 95 95 100 100 100 100 100 100 100 100 100 100
6 4 6 8 10 4 6 8 10 12 14 4 6 8 10 12 14 16 18 4 6 8 10 12 14 16 18 20 22 4 6 8 10 12 14 16 18 20 22
1910 1823 2731 3285 3663 2091 3661 5012 5497 7438 7433 3382 3305 5910 7173 7367 12182 16102 18920 3019 4434 7707 9449 11308 14520 22348 23965 81028 62321 4020 6307 9225 11109 13381 19096 23850 40670 72130 444773
6.32 6.48 6.78 6.48 6.63 6.93 6.78 6.93 6.78 6.93 7.07 6.93 6.78 6.78 6.78 6.78 6.93 6.78 6.93 7.21 7.07 7.07 7.35 7.21 7.21 7.07 7.21 7.07 7.35 7.21 7.07 7.07 7.07 7.07 7.21 7.07 7.21 7.21 7.21
6.32 64.00 64.00 64.00 6.63 64.00 64.00 64.00 64.00 64.00 7.07 64.00 64.00 64.00 64.00 64.00 64.00 6.78 6.93 64.00 64.00 64.00 64.00 64.00 64.00 64.00 64.00 64.00 7.35 64.00 64.00 64.00 64.00 64.00 64.00 64.00 50.99 64.00 7.21
1.0 9.9 9.4 9.9 1.0 9.2 9.4 9.2 9.4 9.2 1.0 9.2 9.4 9.4 9.4 9.4 9.2 1.0 1.0 8.9 9.1 9.1 8.7 8.9 8.9 9.1 8.9 9.1 1.0 8.9 9.1 9.1 9.1 9.1 8.9 9.1 7.1 8.9 1.0
Table 1: BKZ-QP1 with Q = 64, c = 0.26, δ = 0.99, and prune = 0
285
286
Jeffrey Hoffstein, Jill Pipher, and Joseph H. Silverman
N
Block size
Running time (sec)
Actual Total Norm
Smallest Norm Found
Ratio of found to actual
75 75 80 80 85 85 85 85 85 88 90 90 95 95 95 96 98 98 100
4 6 6 8 8 10 12 14 16 16 16 18 18 19 20 20 20 22 22
1797 1604 2776 3406 4614 5898 7536 8106 5168 11298 12987 2 25908 36754 59664 80045 75365 374034 183307
6.16 6.48 6.78 6.63 6.93 6.78 6.93 7.21 6.78 6.93 6.93 6.78 7.21 7.21 7.21 7.07 7.21 7.07 7.07
64.00 6.48 64.00 6.63 64.00 64.00 64.00 64.00 6.78 6.93 64.00 13.42 64.00 64.00 64.00 7.07 64.00 7.07 7.07
10.4 1.0 9.4 1.0 9.2 9.4 9.2 8.9 1.0 1.0 9.2 2.0 8.9 8.9 8.9 1.0 8.9 1.0 1.0
Table 2: BKZ-QP1 with Q = 64, c = 0.26, δ = 0.99, and prune = 0 N
Block size
Running time (sec)
Actual Total Norm
Smallest Norm Found
Ratio of found to actual
75 75 75 75 80 80 85 85 90 90 90 95 95 95 100
2 4 6 8 8 10 10 12 12 14 16 16 18 20 20
1067 2699 3244 3026 6022 5452 10689 8171 15304 17802 20195 31338 54490 57087 109706
8.00 8.00 8.12 7.87 8.37 8.12 8.37 8.37 8.60 8.83 8.60 9.17 8.94 8.83 9.17
128.00 121.90 121.04 7.87 124.54 8.12 124.26 8.37 128.00 126.60 8.60 128.00 128.00 8.83 9.17
16.0 15.2 14.9 1.0 14.9 1.0 14.9 1.0 14.9 14.3 1.0 14.0 14.3 1.0 1.0
Table 3: BKZ-QP1 with Q = 128, c = 0.23, δ = 0.99, and prune = 0
NTRU: A Ring-Based Public Key Cryptosystem
N
Block size
Running time (sec)
Actual Total Norm
Smallest Norm Found
Ratio of found to actual
75 75 78 81 81 84 87 90 90 93 93 93 96 96 99 102 102 102 105 105 108 108
4 20 4 4 6 6 6 6 8 8 10 12 12 14 14 14 16 18 18 20 20 22
2293 1930 3513 3422 3453 5061 6685 7085 9753 11900 14671 16946 22684 19854 30014 30817 64718 51207 81336 75860 197697 145834
8.60 8.72 8.94 9.38 9.17 9.17 9.38 9.49 9.59 9.90 9.80 9.70 9.80 9.90 10.00 10.20 10.39 10.39 10.58 10.30 10.30 10.30
8.60 8.72 12.25 221.22 9.17 9.17 9.38 256.00 9.59 254.55 237.58 9.70 231.59 9.90 10.00 239.62 223.64 10.39 244.38 10.30 255.87 10.30
1.0 1.0 1.4 23.6 1.0 1.0 1.0 27.0 1.0 25.7 24.2 1.0 23.6 1.0 1.0 23.5 21.5 1.0 23.1 1.0 24.9 1.0
287
Table 4: BKZ-QP1 with Q = 256, c = 0.18, δ = 0.99, and prune = 0
288
Jeffrey Hoffstein, Jill Pipher, and Joseph H. Silverman
N
Block size
Running time (sec)
Actual Total Norm
Smallest Norm Found
Ratio of found to actual
75 75 75 80 85 85 85 85 85 85 85 90 95 95 95 96 96 98
2 4 6 6 6 8 10 12 14 16 18 18 18 20 22 22 24 24
808 1895 2363 3582 5412 7252 8633 10074 12371 17729 16095 4 37998 43108 200195 240563 68054 1369730
6000.00 6000.00 6000.00 6164.41 6324.56 6324.56 6324.56 6324.56 6324.56 6324.56 6324.56 6480.74 6633.25 6633.25 6633.25 6633.25 6633.25 6782.33
64000.0 64000.0 7857.87 6164.78 64000.0 64000.0 64000.0 64000.0 64000.0 64000.0 6630.40 12820.5 64000.0 64000.0 6900.34 64000.0 6779.54 6852.89
10.7 10.7 1.3 1.0 10.1 10.1 10.1 10.1 10.1 10.1 1.0 2.0 9.6 9.6 1.0 9.6 1.0 1.0
Table 5: BKZ-QP1 with Q = 64, c = 0.26, α = 0.9097, δ = 0.99, and prune = 0 Jeffrey Hoffstein, Mathematics Department, Box 1917, Brown University, Providence, RI 02912 USA. jhoff@ntru.com, jhoff@math.brown.edu Jill Pipher, Mathematics Department, Box 1917, Brown University, Providence, RI 02912 USA. [email protected], [email protected] Joseph H. Silverman, Mathematics Department, Box 1917, Brown University, Providence, RI 02912 USA. [email protected], [email protected]
Finding Length-3 Positive Cunningham Chains and Their Cryptographic Significance Adam Young1 and Moti Yung2 1
Dept. of Computer Science, Columbia University [email protected] 2 CertCo New York, NY, USA [email protected],[email protected]
Abstract. A Cunningham chain of length k is a finite set of primes p1 , p2 , ..., pk such that pi+1 = 2pi +1, or pi+1 = 2pi −1 for i = 1, 2, 3, ..., k− 1. In this paper we present an algorithm that finds Cunningham chains of the form pi+1 = 2pi + 1 for i = 2, 3 and a prime p1 . Such a chain of primes were recently shown to be cryptographically significant in solving the problem of Auto-Recoverable Auto-Certifiable Cryptosystems [YY98]. For this application, the primes p1 and p2 should be large to provide for a secure enough setting for the discrete log problem. We introduce a number of simple but useful speed-up methods, such as what we call trial remaindering and explain a heuristic algorithm to find such chains. We ran our algorithm on a Pentium 166 MHz machine. We found values for p1 , starting at a value which is 512 bits and ending at a value for p1 which is 1,376 bits in length. We give some of these values in the appendix. The feasibility of efficiently finding such primes, in turn, enables the system in [YY98] which is a software-based public key system with key recovery (note that every cryptosystem which is suggested for actual use must be checked to insure that its computations are feasible). Keywords: Cunningham Chains, Public-Key Cryptosystems, Auto-Recoverable and Auto-Certifiable Cryptosystem, ElGamal system, Primality testing.
1
Introduction
Cunningham chains of length greater than 2 have moved from being a number theoretic curiosity [Gu94] to having a real cryptographic significance. In a companion paper [YY98] it was shown how, given a Cunningham sequence of length 3 consisting of large primes, the problem of implementing an Auto-Recoverable Auto-Certifiable cryptosystem can be solved. More specifically, given the primes p, q, and r where p = 2q + 1 and q = 2r + 1, a public key infrastructure can be established with the capability of recovering the private keys by authorities cooperating with the Certification Authority, where the system has the same operational efficiency as a normal public key system, from the perspective of the user. This may enable secure and robust (recoverable) file systems and may also be used for law enforcement. Note however, that we do not advocate that J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 289–298, 1998. c Springer-Verlag Berlin Heidelberg 1998
290
Adam Young and Moti Yung
governments spy on their citizens, abusing the key recovery/escrow system, and we only treat the technical non-social non-political aspects of the issue (since there is commercial need for such efficient systems). The primary reason that such chains are cryptographically significant is that they provide two concurrent suitable settings for the ElGamal cryptosystem. Recall that the ElGamal Public Key Cryptosystem can be implemented in any field where the discrete logarithm problem is difficult [ElG85,St95,Ko94]. Since the fields Zp and Z2q fit this description, ElGamal can be conducted in Zp and in Zφ(p) , where the later is the field corresponding to the exponent of elements in Zp . The fact that secure public key encryptions can reside in the field Zp and in the exponents of elements in Zp (i.e., what we call a “double decker” exponentiation) is what makes the algorithm in [YY98] possible (embedding of certificate of recoverability inside the key generation procedure). Another usage of the “double decker” domain in what is known as a “group signature protocol” has been proposed independently in [CS97]. In this paper we give an algorithm that is capable of finding values for p, q, and r when run on a Pentium processor. Simply using library calls to primality testing procedures will not work fast enough since the chains we are looking for are quite sparse (based on heuristic estimates). Thus, we need to speed up the computation, which we did by employing some simple but sufficient methods. One such method which we call trial remaindering exploits the number theoretic relationships of the values in these chains. Using trial remaindering it is possible to sieve away candidate values which fail to constitute such chains. This is performed in a preprocessing stage and has the effect that all subsequent candidate values that are checked are guaranteed to be more likely to yield a chain of length 3. We then describe our heuristic algorithm. It is heuristic since the algorithm termination and time depends on the density of the chains in the interval being searched. We ran the algorithm for several weeks on a Pentium 166 MHz processor, and a partial listing of our results is given in the appendix. We started with primes of size 512, and continued the search in increments of 32 bits. The largest value for r which we found is 1,376 bits long. Overall, the computation lasted a couple of months. We are currently running statistics on smaller ranges. In summary, this accomplishment complements the theoretical work in [YY98], demonstrating the actual feasibility of the suggested system.
2
2.1
Background on Auto-Recoverable Auto-Certifiable Cryptosystems Definition
Public Key Cryptosystems (PKC’s) are highly convenient in terms of use, and permit users to conduct private communications over public channels. Public Key Infrastructure (PKI) with Certification Authorities (CA’s) enable a scalable deployment of cryptosystems. However, there are situations where private
Length-3 Positive Cunningham Chains
291
keys may be needed to be recoverable or escrowable (e.g., sensitive storage systems where losing keys implies losing data, archival systems, and also in law enforcement - a sticky political issue which is beyond the technical one). Various architectural and systems solutions have been proposed in recent years to solve the problem, some by governments have been broken, others have been too cumbersome. Informally, an Auto-Recoverable Auto-Certifiable cryptosystem is a system that allows a user to generate a public key such that the corresponding private key is automatically recoverable by the escrow authorities. Thus, the submission to the escrow authorities auto-certifies the public key that is submitted. The solution is mathematical rather than systems oriented (new cryptosystem), also no essential overhead is put on users, no changes of communication etc. are required, and users initiate their scheme with CA’s as before. The following is the formal definition. Definition 1. An Auto-Recoverable and Auto-Certifiable Cryptosystem is an (m+2)-tuple (GEN,VER,REC1 ,REC2 ,...,RECm) such that: 1. GEN is a publicly known poly-time probabilistic algorithm that takes no input and generates the triple (K1 ,K2 ,P ) which is left on the tape as output. Here K2 is a randomly generated private key and K1 is the corresponding public key. P is a poly-sized (short) certificate that proves that K2 is recoverable by the escrow authorities using P . 2. VER is a publicly known poly-time deterministic algorithm that takes (K1 ,P ) on its input tape and returns a boolean value. With very high probability, VER returns true iff P can be used to recover the private key K2 . 3. RECi , where 1 ≤ i ≤ m is a private poly-time deterministic algorithm that takes P as input and returns share i of K2 on its tape as output, assuming that K2 was properly escrowed. The algorithms RECi for 1 ≤ i ≤ m can be used collaboratively to recover K2 . 4. It is intractable to recover K2 given K1 and P . It is assumed that the CA will not publish a public key unless it is verified that the corresponding private key is escrowed properly. The CA will not collaborate with the escrow authorities unless recovering keys (or better yet, recovering session keys under the keys) is authorized. Let EAi denote Escrow Authority i. EAi knows only RECi , in addition to what is publicly known. To publish a public key, user U runs GEN() and receives (K1 ,K2 ,P ). U keeps K2 private and encrypts the pair (K1 ,P ) with the public key of the CA. U then sends the resulting ciphertext to the CA. The CA decrypts this value, and recovers (K1 ,P ). The CA then computes VER(K1 ,P ), and publishes a signed version of K1 in the database of public keys iff the result is true. Otherwise, U’s submission is ignored. In the case that the system is used for a national PKI, it is anticipated that the governing body will insist on having their own database of values P . In this case, the CA’s can forward (K1 ,P ) to them. Suppose that U’s public key is accepted and K1 appears in the database of the CA. Given P , the escrow agencies can recover K2 as follows. EAi computes share i of K2 by running RECi (P ). The agencies then pool their shares and recover K2 . This scheme is advantageous
292
Adam Young and Moti Yung
over schemes like [Mi92] since it is highly decentralized. The keys can be sent to a multitude of CA’s, and can be verified immediately. 2.2
Auto-Recoverable Auto-Certifiable Cryptosystem
The following is a description of the implementation of an Auto-Recoverable Auto-Certifiable cryptosystem.
System Setup. A large prime r is agreed upon s.t. q = 2r + 1 is prime and s.t. p = 2q + 1 is prime. A generator g is agreed upon s.t. g generates Z_p^*, and an odd value g1 is agreed upon s.t. g1 generates Z_{2q}^*. The values (p, q, r, g, g1) are made public. We give one example of organizing the escrow authorities; other settings of threshold schemes, or even schemes where users decide on which authorities to bundle together, are possible. There are m authorities. Each authority EA_i chooses z_i ∈_R Z_{2r}. They each compute Y_i = g1^{z_i} mod 2q. They then pool their shares Y_i and compute the product Y = ∏_{i=1}^{m} Y_i mod 2q. Note that Y = g1^z mod 2q, where z = ∑_{i=1}^{m} z_i mod 2r. The authorities choose their z_i over again if (g1/Y) mod 2q is not a generator of Z_{2q}^*. Each authority EA_i keeps z_i private. The public key of the authorities is (Y, g1, 2q). The corresponding shared private key is z.
Key Generation. GEN chooses a value k ∈_R Z_{2r} and computes C = g1^k mod 2q. GEN then solves for the user's private key x in Y^k x = g1^k mod 2q. GEN computes the public key y = g^x mod p. GEN computes a portion of the certificate v to be g^{Y^{-k}} mod p. GEN also computes three Non-Interactive Zero-Knowledge (NIZK) proofs (as in Fiat-Shamir) P1, P2, P3. The certificate P is the 5-tuple (C, v, P1, P2, P3). GEN leaves ((y, g, p), x, P) on the output tape (note that y need not be output by the device since y = v^C mod p). The user's public key is (y, g, p). This is the first usage of NIZK in key generation.
Public Escrow Verification. VER takes ((y, g, p), P) on its input tape and outputs a boolean value. VER verifies the following things:
1. P1 is valid, which shows that U knows k in C
2. P2 is valid, which shows that U knows k in v
3. P3 is valid, which shows that U knows k in v^C mod p
4. y = v^C mod p holds
VER returns true iff all 4 criteria are satisfied. P1 is essentially the same as the proof described first in [GHY85] for isomorphic functions, but we are specifically operating in Z_{2q}^*. It is easy to show that P1 is complete, sound, perfect zero-knowledge, and that it constitutes a proof of knowledge. P2 and P3 use the same proof system, which is given in [YY98].
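To make the arithmetic above concrete, the following is a small Python sketch using the toy Cunningham chain (r, q, p) = (5, 11, 23) and two authorities. The NIZK proofs P1, P2, P3 and the CA interaction are omitted, and all names and parameter values are illustrative assumptions, not the authors' implementation.

# Toy parameters: r = 5, q = 2r+1 = 11, p = 2q+1 = 23; g generates Z_p^*,
# the odd g1 generates Z_{2q}^*.  Proofs P1, P2, P3 are omitted.
r, q, p = 5, 11, 23
g, g1 = 5, 7
z = [1, 3]                                   # secret shares z_i of the two authorities

# System setup: Y_i = g1^{z_i} mod 2q and Y = product of the Y_i.
Y = 1
for zi in z:
    Y = (Y * pow(g1, zi, 2 * q)) % (2 * q)   # Y = g1^(z_1 + ... + z_m) mod 2q

# Key generation (GEN): C = g1^k, private key x solves Y^k * x = g1^k mod 2q.
k = 6
C = pow(g1, k, 2 * q)
Y_inv_k = pow(Y, -k, 2 * q)                  # Y^{-k} mod 2q (Python >= 3.8)
x = (C * Y_inv_k) % (2 * q)                  # user's private key
y = pow(g, x, p)                             # user's public key
v = pow(g, Y_inv_k, p)                       # certificate component

# Public escrow verification, criterion 4 of VER: y = v^C mod p.
assert y == pow(v, C, p)

# Key recovery (Section 2.3): authorities pool shares s_i = C^{z_i} mod 2q.
Yk = 1
for zi in z:
    Yk = (Yk * pow(C, zi, 2 * q)) % (2 * q)  # equals Y^k mod 2q
x_recovered = (C * pow(Yk, -1, 2 * q)) % (2 * q)
assert x_recovered == x

The final assertion also illustrates why the escrow relation holds: the exponent arithmetic for v lives modulo 2q, while the exponentiation producing y and v lives modulo p, which is exactly the "double-decker" structure the chain (r, 2r+1, 4r+3) provides.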
2.3 Key Recovery
REC_i recovers share i of the user's private key x as follows. REC_i takes C from P on its input tape. It then computes share s_i to be C^{z_i} mod 2q, and outputs s_i on its tape. The authorities then pool their shares and each computes Y^k = ∏_{i=1}^{m} s_i mod 2q. From this they can each compute x = C Y^{-k} mod 2q, which is the user's private key. Criterion 3 of Definition 1 is therefore met. The escrow authorities can recover the plaintext under the key rather than the key itself (i.e., session keys and keys encrypting individual files can be recovered without destroying the user's total privacy). To decrypt the ciphertext (a, b) of user U the escrow authorities proceed as follows:
1. Each of the m escrow authorities i receives C corresponding to U.
2. Escrow authority 1 computes s_1 = a^{C^{-z_1}} mod p.
3. Escrow authority i + 1 computes s_{i+1} = s_i^{C^{-z_{i+1}}} mod p.
4. Escrow authority m decrypts (a, b) by computing b/(s_m^C) mod p.
3 Our Heuristic Algorithm
In this section we present several of the tools that are used to speed up the primality testing algorithm. Probabilistic primality tests have been known since the late 1970's. Among these are the Solovay-Strassen and Rabin-Miller probabilistic primality tests [SS78,Mi76,Ra80]. Of the two, Rabin-Miller is the more efficient: as analyzed by Rabin, a single round declares a composite number to be prime with probability at most 1/4 (the Miller variant relies on the extended Riemann Hypothesis and was further analyzed in [Ba90]). We generate a sequence of primes (this issue was studied in a number of places; e.g., it was shown in [BD92] that there are definite advantages to generating probable primes by incremental search). To achieve a speed-up we introduced a number of simple tools and rules in computing our chain of primes. The first tool that we introduce is trial remaindering. The second tool that we introduce is a heuristic optimization on how to conduct the Rabin-Miller probabilistic primality tests (called dove-tailing).
3.1 The Method of Trial Remaindering
The method of trial division is a well-known speed-up for probabilistic primality testing. In trial division, we attempt to divide the number being tested by all primes up to some upper limit. In CryptoLib, for example, this limit is 251, since 251 is the largest prime that fits in a byte [LMS]. If one of the values divides the candidate evenly, then the prime that divides it is a witness of compositeness. We only use this step as a preprocessing optimization to find composites and not to decide primality (a task for which it would be insufficient, see [BS96]). In this section we introduce a technique which we call trial remaindering. Trial remaindering is similar to trial division in the sense that we seek to find witnesses of compositeness by expending minimal computational effort. The method
exploits the algebraic relationships among p, q, and r. The concept behind trial remaindering is best understood by the following observation. Suppose c is the large candidate value which we hope is prime. If c mod 3 = 1 then c is not a candidate value for r. To see why, note that c = 3x + 1 for some x, hence 2c + 1 = 3(2x + 1). So, if we let r = c, then q would be divisible by 3 and hence would not be prime. Clearly, if c mod 3 = 0 then c wouldn't be prime. So, for c to be a valid r it must be the case that c mod 3 = 2. For a more complex example, consider the following. If c mod 5 = 0, 2, or 3 then 5 | c, 5 | 2c + 1, and 5 | 4c + 3, respectively. Hence, c is not a valid candidate for the Cunningham sequence in these cases. As long as c mod 5 = 1 or 4, c may lead to a valid set of values for r, q, and p. As in trial division, trial remaindering uses the first several primes to identify composite values. The difference is that we take our candidate value c modulo these primes and check whether any of the would-be values for r, q, or p are composite. During these trials, if any of the candidate values for r, q, or p is found to be composite, then the prime that is used is a witness of compositeness.
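The observations above translate directly into a small residue check. The following Python sketch (helper name and prime bound are our choices, not the authors' CryptoLib code) rejects a candidate c whenever a small prime would divide c, 2c + 1, or 4c + 3.

SMALL_PRIMES = [3, 5, 7, 11, 13]      # 2 is omitted because candidates are odd

def passes_trial_remaindering(c, primes=SMALL_PRIMES):
    """True if no prime in `primes` divides c, 2c+1 or 4c+3, i.e. c survives
    as a candidate for r in the chain (r, 2r+1, 4r+3)."""
    for p in primes:
        rem = c % p
        if rem == 0 or (2 * rem + 1) % p == 0 or (4 * rem + 3) % p == 0:
            return False
    return True

# Reproduces the observations in the text: modulo 3 only c = 2 survives and
# modulo 5 only c = 1 or 4 survive, so modulo 15 only c = 11 or 14 survive.
print(sorted({c % 15 for c in range(1, 10**4) if passes_trial_remaindering(c, [3, 5])}))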
3.2 Optimizing the Use of Rabin-Miller: Dove Tailing
In the Rabin-Miller primality test, we choose a random integer a where 1 ≤ a ≤ n − 1 to test if n is composite. Since the algorithm is Monte Carlo based, an answer of true indicates that n is composite for sure, in which case a is a witness of this fact. If the answer is false then n may nevertheless be composite: a composite n survives a given round with probability at most 1/4. It is therefore a yes-biased Monte Carlo algorithm. For the problem at hand, it seems quite naive to apply, say, 20 rounds of Rabin-Miller to the candidate value for r, then 20 rounds to 2r + 1, and then 20 rounds to 4r + 3. To see this, suppose that r is in fact prime. Then we are guaranteed to conduct 20 rounds at the start. It may well be the case that q is composite, in which case we would find this out with probability at least 3/4 in the first round of tests on 2r + 1. A better and more natural approach is to dove-tail the tests among the candidate values r, 2r + 1, and 4r + 3 in succession. That is, we conduct one test on r, then one test on 2r + 1, then one test on 4r + 3, then one test on r, etc. This way we uniformly distribute our chances of detecting a composite over all three candidates. If any test indicates that the corresponding candidate is composite, then we need to choose another candidate for r and rerun the algorithm. It is well known that choosing large numbers uniformly at random is a difficult and often somewhat computationally expensive operation. We therefore opted to make another 'optimization' in our primality testing. We chose to use a fixed set of witnesses in the algorithm. Rather than generating each witness a at run-time, we index into an array of fixed witnesses to conduct the Rabin-Miller tests. By precomputing the random values, we avoid having to compute random numbers at run-time. We found that this improved the performance of our algorithm considerably. We used a fixed array of 20 witnesses. Each of these witnesses is used to test the three candidates in succession. To speed things up
even further, we chose witnesses that fit within a machine word. This also helps because the first few exponentiations of these values in Rabin-Miller can be computed very quickly (choosing small fixed witnesses is used in various prime-testing implementations). We note in passing that though we achieved good results using a fixed set of witnesses, Alford, Granville, and Pomerance proved that no fixed set of bases suffices to make the Rabin-Miller test error-free.
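The dove-tailing heuristic just described can be sketched as follows in Python. The fixed single-word witnesses below (the first 20 primes) are an illustrative assumption; the paper's actual witness table is not reproduced here.

FIXED_WITNESSES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
                   31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

def miller_rabin_round(n, a):
    """One Rabin-Miller round on odd n with base a; False means composite for sure."""
    a %= n
    if a == 0:
        return True
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(s - 1):
        x = (x * x) % n
        if x == n - 1:
            return True
    return False

def dovetailed_test(r, witnesses=FIXED_WITNESSES):
    """Dove-tail one round on r, 2r+1 and 4r+3 for each fixed witness, so that a
    composite member of the chain is detected as early as possible."""
    for a in witnesses:
        for n in (r, 2 * r + 1, 4 * r + 3):
            if not miller_rabin_round(n, a):
                return False        # composite detected: pick a new candidate r
    return True                     # all three are probable primes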
3.3 Putting It All Together
Before explaining the entire algorithm, one more critical observation needs to be pointed out. Suppose we find a candidate prime c that passes trial division and trial remaindering. However, suppose that we also find a witness of compositeness for one of the three desired primes c, 2c + 1, or 4c + 3. Do we need to choose another c and conduct trial remaindering again? The answer is no. Having found a value c that passes trial remaindering for, say, 2, 3, 5, 7, 11, and 13, we can choose our next candidate to be c' = c + 2*3*5*7*11*13 and be guaranteed that our assertions will once again hold. To see why, suppose that c mod 5 = 1. Then it must be the case that c' mod 5 = 1, since c' mod 5 = (1 + 2*3*5*7*11*13) mod 5 = 1. We argue that this speeds things up. In essence, we only consider lattice points where the assertions are guaranteed to hold. This process is very similar to the process of finding primes by incremental search, where we keep adding 2 and testing for primality. But, since we need various algebraic relations to hold among our three candidate primes, we have the flexibility of incrementing by these larger values while at the same time gaining more of an advantage than is possible by using this method to look for strong primes alone. The following is the pseudo-code for our algorithm to find these special Cunningham chains of length 3. Let MAX_PRIME be the largest prime that we use to conduct trial remaindering, and let INCR_VAL be 2*3*5*7*...*MAX_PRIME.
1. choose a large candidate c for primality
2. apply trial remaindering to c
3. if a witness for compositeness is found, goto step 1
4. i = 0
5. while (i < NUM_ITER) do
6.   c = c + INCR_VAL
7.   apply trial division to c, 2c + 1, 4c + 3
8.   if a witness of compositeness is found, goto step 15
9.   for (j = 0; j < NUM_WITNESSES; j++) do
10.    test c, 2c + 1, and 4c + 3 in succession with Rabin-Miller using base witness[j]
11.    if a witness of compositeness is found, goto step 15
12.  od
13.  if IsPrime(c) and IsPrime(2c + 1) and IsPrime(4c + 3) then output (c, 2c + 1, 4c + 3) and halt
14.  fi
15.  i = i + 1
16. od
17. goto step 1
IsPrime() is normal N-round Rabin-Miller with witnesses chosen uniformly at random. witness[] is the array of fixed witnesses. The size of the array is NUM_WITNESSES. NUM_ITER is the number of lattice points which we wish to check. In our implementation this value is 6. It is not clear what the best choice of this constant is for a given size of c. In general it seems useful to check a small number of lattice points. Once a value c is found that seems to have the desired properties, we check the three candidate primes rigorously using the Rabin-Miller implementation IsPrime(). From the above algorithm it is clear that by choosing a value c using trial remaindering and then subsequently adding INCR_VAL to it, we essentially 'sieve' away pseudo-Cunningham chains of length three for NUM_ITER rounds.
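Putting the pieces together, the following Python sketch runs the whole search loop for small bit sizes so that it finishes quickly. It reuses passes_trial_remaindering (Section 3.1 sketch) and miller_rabin_round and dovetailed_test (Section 3.2 sketch); the parameter values, bit size, and helper names are illustrative assumptions, not the authors' implementation.

# Uses passes_trial_remaindering, miller_rabin_round and dovetailed_test from
# the sketches after Sections 3.1 and 3.2; all constants here are illustrative.
import random

MAX_PRIME = 13                         # largest trial-remaindering prime in this sketch
INCR_VAL = 2 * 3 * 5 * 7 * 11 * 13
NUM_ITER = 6

def find_chain(bits=32, rounds=20):
    """Search for a length-3 Cunningham chain (r, 2r+1, 4r+3) of about `bits` bits."""
    while True:
        c = random.getrandbits(bits) | 1 | (1 << (bits - 1))   # odd, full bit length
        if not passes_trial_remaindering(c):
            continue
        for _ in range(NUM_ITER):              # walk NUM_ITER sieved lattice points
            if dovetailed_test(c):
                chain = (c, 2 * c + 1, 4 * c + 3)
                # rigorous confirmation with random witnesses (the "IsPrime" step)
                if all(all(miller_rabin_round(n, random.randrange(2, n - 1))
                           for _ in range(rounds)) for n in chain):
                    return chain
            c += INCR_VAL                      # stays on the sieved lattice
        # sieve exhausted without success: pick a fresh candidate

r, q, p = find_chain()
print(r, q, p)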
4 Conclusion
We explained how positive Cunningham chains interact with the requirements of ElGamal encryption systems and how they can be used for generating such systems based on "double-decker" exponentiation. The application to recoverability of keys was discussed. No cryptosystem is complete without computational feasibility of all its stages at the proper parameter range. This paper provided such experiments regarding Auto-Recoverable Auto-Certifiable Cryptosystems. A heuristic algorithm was given to find Cunningham chains of length 3 for large primes. We introduced the method of trial remaindering and showed how it can be used to implement a heuristic chain-finding algorithm. We also presented a heuristic involving the use of a fixed set of small witnesses for the Rabin-Miller probabilistic primality test. The results of Appendix A show that finding chains of length 3 consisting of large primes is tractable, since many cryptographically secure chains of length 3 were found on a Pentium machine over the course of several weeks. We are currently gathering statistics on the density of such length-3 chains by analyzing the running times in finding small primes (given our limited computational resources, we are unable to gather such statistics for this submission). We also implemented the rest of the cryptosystem, which was a trivial task compared to the set-up computations.
References
Ba90. E. Bach. Explicit bounds for primality testing and related problems. Mathematics of Computation, 55 (1990), 355–380.
BD92. J. Brandt, I. Damgård. On generation of probable primes by incremental search. In Advances in Cryptology—CRYPTO ’92, pages 358–370, 1992. Springer-Verlag.
BS96. E. Bach, J. Shallit. Algorithmic Number Theory: Efficient Algorithms, vol. 1, Chp. 9, 1996. MIT Press.
CS97. J. Camenisch, M. Stadler. Efficient Group Signature Schemes for Large Groups. In Advances in Cryptology—CRYPTO ’97, pages 410–424, 1997. Springer-Verlag.
ElG85. T. ElGamal. A Public-Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms. In Advances in Cryptology—CRYPTO ’84, pages 10–18, 1985. Springer-Verlag.
Fo97. T. Forbes. Prime 15-tuplet. NMBRTHRY Mailing List, March 1997.
GHY85. Z. Galil, S. Haber, M. Yung. Symmetric public-key encryption. In CRYPTO ’85, pages 128–137.
Gu94. R. Guy. Unsolved Problems in Number Theory. Springer-Verlag, Berlin, 2nd edition, 1994.
Ko94. N. Koblitz. A Course in Number Theory and Cryptography. 2nd edition, 1994. Springer-Verlag.
LMS. J. Lacy, D. Mitchell, W. Schell. CryptoLib: Cryptography in Software. AT&T Bell Laboratories, section 2.2.1.
Lo89. G. Löh. Long chains of nearly doubled primes. Math. Comp., 53, pages 751–759, 1989.
Mi76. G. Miller. Riemann’s hypothesis and tests for primality. Journal of Computer and System Sciences, vol. 13, pages 300–317, 1976.
Mi92. S. Micali. Fair Public-Key Cryptosystems. In Advances in Cryptology—CRYPTO ’92, pages 113–138, 1992. Springer-Verlag.
Ra80. M. Rabin. Probabilistic Algorithm for Testing Primality. Journal of Number Theory, vol. 12, n. 1, pages 128–138, Feb 1980.
Ro93. K. H. Rosen. Elementary Number Theory and its Applications. 3rd edition, Theorem 8.14, page 295, 1993. Addison Wesley.
SS78. R. Solovay, V. Strassen. A fast Monte-Carlo test for primality. SIAM Journal on Computing, vol. 6, pages 84–85, 1977.
St95. D. Stinson. Cryptography: Theory and Practice. Theorem 8.2, page 267, 1995. CRC Press.
YY98. A. Young, M. Yung. Auto-Recoverable and Auto-Certifiable Cryptosystems. In Advances in Cryptology—Eurocrypt ’98, Springer-Verlag.
Appendix A: Values for r
768 bit r 8f9fbca258fe64f46b7560cf21948079d89193f0d8ffe0e234340fc27ba32a50 e56de953432c258508e605a7fe1d99b17535d717b9c93b399a6d96bdafd3fc5f d83a4ea4ca9df6eae930e520a7c5d5d6303ed527b02eac009b87bebcb2cd631d 1024 bit r fd90e33af0306c8b1a9551ba0e536023b4d2965d3aa813587ccf1aeb1ba2da82 489b8945e8899bc546dfded24c861742d2578764a9e70b88a1fe9953469c7b5b 89b1b15b1f3d775947a85e709fe97054722c78e31ba202379e1e16362baa4a66 c6da0a58b654223fdc4844963478441afbbfad7879864fe1d5df0a4c4b646591
1344 bit r 88fe36d26cee18199e146540f773848e41824d7653758bc9a65606f2c852dc7f ebb564ee4787b4594e2f98973d2517eb701cb8533454805a5f2c30d7494acd8b ae3637ebcca79f28812e20097ae1894029ae4213e7f2b2dbf4eb81aae045ed35 9679e37f43b85b0d9d849682e331be49e38fecf68f442547ea47275c5244b4d7 81cf047e19472ed5e2c6ff8bf6cab1c864248275c0d0e7feb60c614bc3c1aa64 b0676e7d9583ce99 1376 bit r f6b0b9ef50c928f188a96832d721b2fc8fc65eb720b3fb307de17ece04383db7 afe1a56e66fd0e7353bb7160e74887ab01a7578af81130164e1302233ed16566 76091120aa4836983f58b8e198f7c270b10c767a1f3f8d474112a640a4432651 c9846f73c514c65a85bb9402a7772b849da7bbbf24aa5658cb7926db942f7cfe f811f679fd0d044d0038eddd651bb30d29f52659d7d5b75501c78904caa48906 1fcdc9e1a595d4e4aff561dd Currently, the above values for r that are greater than or equal to 800 bits are strong enough for use in cryptographic systems. These values were found using the AT&T multi-precision library “CryptoLib” [LMS]. These values were found on a Pentium 166 MHz machine using the algorithm described in this paper.
Reducing Ideal Arithmetic to Linear Algebra Problems
Stefan Neis
Institute of Theoretical Computer Science, Darmstadt University of Technology, 64283 Darmstadt, Germany
Abstract. In this paper, we will show a reduction of ideal arithmetic, or more generally, of arithmetic of ZZ–modules of full rank in orders of number fields to problems of linear algebra over ZZ/mZZ , where m is a possibly composite integer. The problems of linear algebra over ZZ/mZZ will be solved directly, instead of either “reducing” them to problems of linear algebra over ZZ or factoring m and working modulo powers of primes and applying the Chinese Remainder theorem.
1 Introduction
Although there are well-known algorithms to do ideal arithmetic in orders of number fields, for example, those implemented in KASH ([KASH]) and gp ([GP]), we consider the problem of ideal arithmetic again, since many of the existing algorithms only solve specific problems and each new problem involves a completely new algorithm. We will present a more general way to solve the problems of ideal arithmetic, which is based on the reduction of ideal arithmetic to linear algebra problems over ZZ/mZZ. Although this is basically a well-known technique, it is usually ignored, since the standard textbooks claim that the arising linear algebra problems are too hard to solve and therefore reject this reduction without giving further details about it. For example Cohen, when considering ideal inversion (see [Coh95], pp. 202), describes a method based on the solution of a linear equation system mod d, where d is a possibly composite number. He says that this could be done "by factoring d and working modulo powers of primes" and applying the Chinese remainder theorem. But Cohen claims that it is probably better to solve a much larger equation system over ZZ. This, however, is still too slow, so he only considers ideal inversion for ideals over the ring of integers of an algebraic number field, where a special algorithm using the different can be used, which is much faster. In particular, we will show an algorithm for pseudo-division of ideals in any order of a number field, which we implemented in LiDIA ([LiDIA]). In practice, it is as fast as the more special division algorithm for ideals of the maximal order described in [Coh95], pg. 202f. and implemented in gp. It is also as fast as the implementation in KASH, which is more specialised as well. This shows
that a general implementation of pseudo-division is no longer too slow to be competitive, contrary to what has been claimed up to now. However, we can only prove a theoretical bound for the run time of this algorithm which is worse than the run times for the more specialised algorithms presented so far, i.e., we can only show:
Theorem 1. Given an ideal a and a full module b, both contained in an order O of a number field of degree n, with exponents bounded by l and m, respectively, the computation of the (pseudo) quotient (a : b) requires at most O(n^5) arithmetic operations modulo m and O(n^4) arithmetic operations modulo lm.
The essential feature to obtain this fast pseudo-division algorithm is our algorithm for computing the kernel of a homomorphism over ZZ/mZZ, which is given in Section 3 and described in more detail in [BuNe97]. The outline of the paper is as follows: First, we present some basic facts about algebraic number fields (Section 2). Then we give the basic algorithm needed to solve the linear algebra problems involved (Section 3) and finally we describe the reduction of the arithmetic to problems of linear algebra (Section 4).
2 ZZ-Modules in Algebraic Number Fields
In this section we review a few basic facts about ZZ–modules, ideals and orders in algebraic number fields. For further information the reader may consult [BoSh66]. Let K be an algebraic number field K with degree [K : Q] = n. A ZZ–module a ⊂ K is called a module of full rank or simply a full module, if it contains n linearly independent elements (over Q). A set {α1 , . . . , αm } ⊂ a is called a basis of a, if α1 , . . . , αm are linearly independent (over Q) and if they generate the module a, i.e., if a = {c1 α1 + · · · + cm αm , c1 , . . . , cm ∈ ZZ}. An element α ∈ K is called a multiplier of the full module a, if αa ⊂ a. The set of all multipliers of a full module a is a ring containing 1 and it is called the ring of multipliers of a, denoted by Oa . Such rings are full modules, and are called orders. For a given algebraic number field K there may be many orders, however the quotient field of each order is K. Moreover, there is a unique maximal order which contains all other orders. This maximal order is denoted by OK . It consists of all elements of K, whose monic minimum polynomials have rational integers as coefficients. Any full module a additionally is called an ideal of its ring of multipliers Oa , i.e., Oa · a ⊂ a. Obviously, a also is an ideal of any smaller order O ⊂ Oa . Therefore, for a fixed order O, we may distinguish two types of full modules: those modules that are ideals of O, and those that are not. Where it is possible without increasing the run time, we will describe algorithms which operate on modules. However, if a version specific to ideals is faster, we will only describe this faster algorithm. We will assume number fields to be represented by an order, so let O be an order of an algebraic number field K with degree n, i.e., the rank of O as ZZ– module also is n. For simplicity, we identify elements of O and elements of ZZ n .
This identification is possible due to the isomorphism Ψ, which maps elements of ZZ^n to elements of O by interpreting them as coefficients relative to some fixed basis of O, i.e., let ω_1, . . . , ω_n be the fixed basis of O; then we define

    Ψ : ZZ^n −→ O,   (a_1, . . . , a_n) ↦ ∑_{i=1}^{n} a_i ω_i .        (1)
For any module a ⊂ K there is a rational integer d such that d · a ⊆ O. Therefore every module a may be represented by a pair (d, a′) with d · a = a′ ⊆ O. For a module a ⊆ O the number exp(a) := min{m ∈ IN | mO ⊆ a} is called the exponent of a, i.e., exp(a) is the exponent of the additive group O/a. If we define min{} := 0, then exp(a) is a positive integer if and only if a is a full module. Obviously we have exp(a) | N(a), where N(a) denotes the norm of a, i.e., the number of elements in O/a. For m ∈ IN there is a one-to-one correspondence between modules a with exp(a) | m and the modules of O/mO. (If a module a/mO with basis (ā_1, . . . , ā_d) is given, do the following to get back to the original module a: lift ā_i to some number a_i ∈ O for 1 ≤ i ≤ d and then let a = a_1 ZZ + · · · + a_d ZZ + mO.) If a is an ideal we have exp(a) = min(IN ∩ a). In general, only min(IN ∩ a) | exp(a) is valid. In the following we will represent a module a ⊆ O by a pair (m, A) with the following properties:
1. exp(a) | m.
2. A = {a_1, . . . , a_d} ⊆ O generates a/mO as a ZZ-module, so we have a = (m, A) = mO + a_1 ZZ + · · · + a_d ZZ.
Since A may be interpreted as a collection of column vectors (due to the given isomorphism, see (1)), we may identify A and the matrix consisting of these column vectors, i.e., we also say A ∈ ZZ^{n×d}. Note that we do not insist on any conditions on A which would force A to be somehow uniquely determined. This causes some trouble when checking two modules for equality, which in contrast is a very easy task if one uses the Hermite normal form of matrices over ZZ to represent modules. However, it seems to us that uniqueness of A would be computationally too hard to achieve in conjunction with a reasonably fast arithmetic.
Example 1. Consider the number field K = Q(√−2, √5). The maximal order O_K of this field has the ZZ-basis ω_1 = 1, ω_2 = √−2 + √5, ω_3 = 3/4 + 1/2(√−2 + √5) + 1/4(√−2 + √5)^2, ω_4 = 5/4 + 29/28(√−2 + √5) + 1/4(√−2 + √5)^2 + 1/28(√−2 + √5)^3. The set a = 21ZZ + 21ω_2 ZZ + (17 + 3ω_2 + ω_3)ZZ + (20 + 4ω_2 + ω_4)ZZ is
an ideal in O_K and we obtain exp(a) = 21. So we can write a = (17 + 3ω_2 + ω_3)ZZ + (20 + 4ω_2 + ω_4)ZZ + 21O_K and obtain the following representation:

    a = ( 21,  ( 17 20
                  3  4
                  1  0
                  0  1 )  mod 21 ).
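For concreteness, here is a tiny Python illustration of the (m, A) representation just computed. The helper name coset_count and the brute-force enumeration are our own illustrative choices (not part of LiDIA); the columns of A are the coefficient vectors of the two generators relative to the basis ω_1, . . . , ω_4.

from itertools import product

m = 21
A = [[17, 20],       # columns: coefficients of 17 + 3*w2 + w3 and 20 + 4*w2 + w4
     [ 3,  4],       # with respect to the fixed basis w1, w2, w3, w4 of O_K
     [ 1,  0],
     [ 0,  1]]

def coset_count(A, m):
    """Brute-force size of a/mO, the module generated by A's columns modulo m."""
    n, d = len(A), len(A[0])
    elems = {tuple(sum(c[j] * A[i][j] for j in range(d)) % m for i in range(n))
             for c in product(range(m), repeat=d)}
    return len(elems)

print(coset_count(A, m))      # 441 = 21^2 here; as stated above, exp(a) divides N(a)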
3 Linear Algebra over ZZ/mZZ
An algorithm for computing images of homomorphisms over ZZ/mZZ has first been described by J.A. Howell in [Ho86]. We will extend this algorithm to also compute the kernel. The idea is to perform mostly unimodular transformations of the matrix describing a homomorphism with respect to a fixed generating system. To do this, the same techniques that have been developed for ZZ can be applied. Additionally, we sometimes multiply by a zero divisor in a controlled way, which will not cause any loss of information. This approach leads to an algorithm which is described, proven as correct, and analyzed in [BuNe97]. Since the algorithm itself is rather simple, we will give at least this algorithm and some run time estimates, although we will not go into details. In the description, we will use the following functions:
– The function an(a), which returns a generator of the annihilator {x ∈ ZZ/mZZ : ax = 0} of a,
– and the function xgcdc(x, y, a, b, e, f), which computes g, x, y, e, f such that g = gcd(a, b, m) and

    ( a  b ) · ( x   f
                 y  −e )  =  ( g  0 )    and    ex + fy = 1.
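A hedged Python sketch of these two helper functions follows. The cofactor construction takes g to be the integer gcd of the representatives, a simplification of the gcd(a, b, m) normalization used in [BuNe97]; with that assumption the three identities hold exactly over the integers and hence modulo m. The tuple return value replaces the C-style reference parameters of the signature above.

from math import gcd

def an(a, m):
    """Generator of the annihilator {x in Z/mZ : a*x = 0}; returns 0 iff a is a unit."""
    return (m // gcd(a % m, m)) % m

def xgcdc(a, b, m):
    """Return (g, x, y, e, f) with a*x + b*y = g, a*f - b*e = 0 and e*x + f*y = 1.
    Assumes (a, b) != (0, 0) and takes g as the integer gcd of the representatives
    (a simplification of the gcd(a, b, m) normalization of [BuNe97])."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r:
        qt = old_r // r
        old_r, r = r, old_r - qt * r
        old_s, s = s, old_s - qt * s
        old_t, t = t, old_t - qt * t
    g, x, y = old_r, old_s, old_t            # g = gcd(a, b) and a*x + b*y = g
    e, f = a // g, b // g                    # cofactors: a*f - b*e = 0, e*x + f*y = 1
    return g % m, x % m, y % m, e % m, f % m

m, a, b = 36, 30, 21
g, x, y, e, f = xgcdc(a, b, m)
assert (a * x + b * y - g) % m == 0 and (a * f - b * e) % m == 0 and (e * x + f * y - 1) % m == 0
assert an(4, 36) == 9 and an(5, 36) == 0     # 4*9 = 36 = 0 mod 36; 5 is a unit mod 36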
The algorithm will operate on a matrix

    B = (b_1, . . . , b_l) = (b_{ij})_{1≤i≤k, 1≤j≤l} ∈ (ZZ/mZZ)^{k×l},
where bi denotes the i–th column of B. Then, the algorithm can be given as follows:
Algorithm 2 Image and Kernel k×l
Input: A homomorphism described by a matrix B ∈ (ZZ/mZZ) Output: Generating systems of the image and the kernel of B, repre-
sented by matrices A and T , where A is in Howell triangular form (if only the defined columns are considered). A transformation U such that B · U = A. (1) for (i = k; i > 0; i − −) do (2) j = l; (3) Initialize T with the l × l identity matrix; (4) while (j > 0 ∧ bij = 0) do (5) j − −; (6) od (7) if (j 6= 0) then (8) swap columns j and l of both B and T ; (9) for (j = l − 1; j > 0; j − −) do (10) if (bij 6= 0) then (11) g = xgcdc(x, y, bil , bij , e, f); (12) (bl , bj ) = (xbl + ybj , fbl − ebj ); (13) (tl , tj ) = (xtl + ytj , ftl − etj ); (14) fi (15) od (16) ai = b l ; (17) u i = tl ; (18) x = an(bil ); (19) if (x = 0) then (20) l−− (21) else (22) bl = xbl ; (23) tl = xtl ; (24) fi (25) fi (26) od (27) Remove all but the first l columns of A. Counting xgcdc(. . .) and an(. . .) as well as addition and multiplication as ring operations in ZZ/mZZ , we obtain the following complexity result, as was shown in [BuNe97]. Theorem 3. Algorithm 2 requires O((k + l)kl) operations in ZZ/mZZ to compute the kernel
304
Stefan Neis k×l
and the image of a matrix in (ZZ/mZZ) operations in ZZ/mZZ are sufficient.
. To compute only the image O(k 2 l)
In addition, the following result about operations on modules has been shown in [BuNe97]. Theorem 4. k Let M, N ⊂ (ZZ/mZZ) be two modules given by at most k generators each. The sum M + N and the intersection M ∩ N can both be computed by O(k 3 ) operations in ZZ/mZZ.
4
Ideal Arithmetic
Let a and b be two modules contained in a number field of degree n. We will consider the following operations: a + b := {m1 + m2 |m1 ∈ a, m2 ∈ b} (sum) X m1 · m2 |S ⊆ a × b finite} (product) ab := a · b := { (m1 ,m2 )∈S
a ∩ b := {m|m ∈ a and m ∈ b} (Intersection) (a : b) := {x ∈ K|x · b ⊆ a} (pseudo – division). First, we assume, that all modules we have to deal with are contained in a fixed order O, since additional denominators are easy to handle by precomputations. Consider two modules a = a0 /a and b = b0 /b such that a0 and b0 are contained in O. Then the following equations hold: a 1 b a0 + b0 · a+b= gcd(a, b) gcd(a, b) lcm(a, b) a 1 b a0 ∩ b0 · a∩b= gcd(a, b) gcd(a, b) lcm(a, b) 1 a · b = (a0 · b0 ) ab 0 0 b (a : b) = (a : b ) a
4.1
Addition and Intersection
Let a = (l, A) = (l, (a1 , . . . , ar1 )) and b = (m, B) = (m, (b1 , . . . , br2 )) be two ZZ–modules contained in O. Obviously gcd(l, m) O ⊆ a + b is valid, which implies | {z } =:q
a + b = (a + qO) + (b + qO) = (q, A) + (q, B),
Reducing Ideal Arithmetic to Linear Algebra Problems
305
where · denotes reduction modulo q. Thus, computing the sum of two modules is reduced to computing the sum of two modules in O/qO which amounts to n computing the sum of two modules over (ZZ/qZZ) . Example 2. Consider the ideals
17 20 11 8 3 4 12 1 a= 21, 1 0 mod 21 , b = 15, 1 0 mod 15 . 0 1 0 1
Since gcd(21, 15) = 3, we have a + b = (3, C) where C is given by the sum of the modules given by the reduced generating systems 22 22 0 1 0 1 1 0 mod 3 and 1 0 mod 3 01 01 over ZZ/3ZZ, so we have:
22 0 1 a+b = 3, 1 0 mod 3 . 01
Similarly, we can reduce the computation of the intersection to the computation of the intersection of two modules over ZZ/mZZ. However, there is a little quirk in this computation, since we do not know how to make use of the component lO in the representation a = lO + a1 ZZ + · · · + ar1 ZZ. However, we have lcm(l, m)O ⊆ a, b, a ∩ b. So, we first lift the representations of a and b to O/(lcm(l, m)O), and then, we can use the algorithm for modules over ZZ/lcm(l, m)ZZ immediately. This lifting can be done as follows: If a1 , . . . , ak is a generating system of a mod lO and ω1 , . . . , ωn is a basis of O, then a1 , . . . , ak , lω1 , . . . , lωn is a generating system of a mod lcm(l, m)O. This generating system can be reduced to a generating system with at most n components by the method of section 3. Looking at the complexity, computing the sum of a and b takes O(n3 ) operations mod gcd(l, m), whereas computing their intersection requires O(n3 ) operations mod lcm(l, m). 4.2
Multiplication
We will consider multiplication only for ideals, since this enables us to use an easier and faster algorithm, however the algorithm can be modified easily to handle general modules.
306
Stefan Neis
Let a = (l, A) = (l, (a1 , . . . , ar1 )) and b = (m, B) = (m, (b1 , . . . , br2 )) be two integral ideals. For any c ∈ a · b, there exist algebraic integers x0 and y0 and rational integers x1 , . . . , xr1 , y1 , . . . , yr2 such that c = (lx0 +
r1 X
xi ai ) · (my0 +
i=1
r2 X
yi bi ).
i=1
By multiplying this out, we obtain c = lmx0 y0 + | {z } ∈l·m·O
r2 r1 X X i=1 j=1
|
xi yj ai bj +
{z
}
r1 X
my0 xi ai +
i=1
r2 X
lx0 yi bi .
i=1
∈ha i bj i1≤i≤r1 ,1≤j≤r2
Since a is an ideal we have for each i: y0 ai ∈ a, thus we obtain y0 ai = l · z0 +
r1 X
zj aj
(2)
j=1
for a suitable algebraic integer z0 and suitable rational integers z1 , . . . , zr1 , resulting in r1 X xi zj maj . my0 xi ai = lmz0 xi + | {z } j=1 ∈l·m·O | {z } ∈hma 1 ,...,ma r i 1
Doing an analogous computation on b we obtain that the product a · b is given by (l · m, (a1 b1 , . . . , a1 br2 , . . . , ar1 b1 , . . . , ar1 br2 , ma1 , . . . , mar1 , lb1 , . . . , lbr2 )). Using algorithm 2, the matrix in ZZ/lmZZ is then reduced to a matrix with at most n columns. Looking at the complexity, the product of two ideals a and b can be computed by at most O(r1 r2 n2 ) operations mod lm. This is the time needed to compute the products of algebraic numbers generating the result as well as the time needed to reduce the resulting (n × (r1 r2 ))–matrix. In the worst case this amounts to O(n4 ) operations. Example 3. Choose a and b as in example 2. Then, we obtain 310 81 249 80 255 300 231 168 100 7 103 5 45 60 252 21 ab = 315, 94 2 161 311 15 0 21 0 mod 315 , 112 58 136 83 0 15 0 21
Reducing Ideal Arithmetic to Linear Algebra Problems
which can be reduced to
307
101 188 87 46 ab = 315, 1 0 mod 315 . 0 1
4.3
Division
If b is an invertible O–ideal, we know that b−1 = (O : b), i.e., b · (O : b) = (O : b) · b = O. In general, however, only b · (O : b) = (O : b) · b ⊆ O is valid. In this case, (O : b) is the best existing approximation to an inverse. If b is an O–ideal such that (O : b) · b 6= O, the order O can be easily maximized such that we are able to compute a true inverse in the larger order. (b : b) is the ring of multipliers of b and it is easily seen to be the larger order we wanted. For efficiency, we only consider the pseudo quotient of an ideal a divided by a full module b. Computing the pseudo quotient uses the following result: Proposition 1. If a is an integral O–ideal, b ⊂ O is a full module, and m is a multiple of exp(b), then ma ⊆ m(a : b) ⊆ a. Proof. Since b ⊆ O and since a is an ideal, we have a · b ⊆ a · O ⊆ a, which in turn implies a ⊆ (a : b) due to the definition of (a : b). To show the second inclusion, fix an x ∈ (a : b). Due to exp(b) ∈ b we have 1 a for all x ∈ (a : b). x · exp(b) ∈ a (by definition of (a : b)). This implies x ∈ exp(b) 1 t u Note, that exp(b) is well-defined, since b is a full module, i.e., exp(b) > 0. This proposition is used as follows: We consider the homomorphism φ : a/ma −→ Hom (b/mO → a/ma) (α + ma) 7−→ (x + mO 7→ αx + ma). This is well defined since mα O + ma(x + mO) ∈ a/ma (α + ma) · (x + mO) = |{z} α · x + |{z} {z } | ∈a
∈ma
∈ma
and the following equivalences hold: α + ma ∈ Ker(φ) ⇔ (α + ma) · (x + mO) ∈ m · a ∀x ∈ b ⇔ α · b ⊆ m · a, by proposition 1 ⇔ α ∈ ((m · a) : b) = m(a : b). Therefore we see that m · (a : b) is the kernel of the homomorphism φ. Since this algorithm is relatively complicated, we will state it explicitly in a C++-like form:
308
Stefan Neis
Algorithm 5
Ideal Division Input: An ideal a and a full module b Output: The (pseudo) quotient (a : b)
(1) Compute ZZ–bases A and B of a and b, respectively. (2) for (i = 1; i ≤ n; i + +) do (3) for (j = 1; j ≤ n; j + +) do (4) Store the representation of ai · bj as column ((i − 1) ∗ n + j) of the matrix C. (5) od (6) od (7) Compute the n2 solutions of the n2 equation systems A·X = C. // Now n successive columns of X represent the image of a // basis vector under φ. (8) Store each set of n successive columns of X as column of the n2 × n matrix M . (9) Compute the kernel of M modulo m. // So we have the kernel with respect to the ZZ–basis of a and // need to represent it with respect to the basis of O. This // involves only numbers bounded by l · m. (10) Lift the basis of a and the kernel to ZZ/lmZZ and compute the product P of both matrices. (11) Return 1/m · (lm, P ).
Now we consider the complexity of this algorithm. First, we compute a matrix representing φ (steps 1–4). To achieve this, we compute ZZ–bases of a and b, which requires O(n3 ) arithmetic operations mod l and mod m. Then we compute all pairwise products of elements of the first base with elements of the second base. This may be done mod lm and thus requires O(n4 ) arithmetic operations mod lm. The results of all these multiplications are obviously contained in a and therfore, they may be represented with respect to the ZZ–basis of a. To accomplish this, we need to simultaneously(!!) solve n2 equation systems mod lm, which requires at most O(n4 ) arithmetic operations mod lm. Now, we have to compute the kernel of an n2 × n–matrix over ZZ/mZZ . By theorem 3, this requires O(n5 ) arithmetic operations mod m. Finally, to obtain the matrix representing the result, we need to multiply the ZZ–basis of a by the columns of the kernel, which requires at most O(n3 ) operations modulo lm, since lm is a multiple of the exponent of the pseudo quotient. So we obtain the following result:
Reducing Ideal Arithmetic to Linear Algebra Problems
309
Theorem 6. Given an ideal a and a full module b both contained in and order O of a number field of degree n with exponents bounded by l and m, respectively, the computation of the (pseudo) quotient (a : b) requires at most O(n5 ) arithmetical operations modulo m and O(n4 ) arithmetical operations modulo lm. Note that this upper bound on the run time is overly pessimistic, since it assumes that during the computation of the kernel all n3 entries in the n2 × n matrix have to be eliminated by a suitable column operation. Over a field or over a principal ideal domain, this would never happen, since the rank of such a matrix would be at most n, i.e., after eliminating n2 entries of the matrix, only zeros would remain. This however is not true for ZZ/mZZ. Using the Chinese remainder theorem and the fact that it is sufficient to eliminate O(n2 ) entries for finite prime fields, one can assume that eliminating O(n2 × τ (m)) entries is sufficient, where τ (m) denotes the sum of the exponents in the prime factorization of m. However, this is not really an improvement, since n is a small number anyway. Nevertheless, cases where one has to eliminate O(n3 ) entries seem to be very rare in practice. 4.4
Testing for Equality
Let a = (l, A) = (l, (a1 , . . . , ar1 )) and b = (m, B) = (m, (b1 , . . . , br2 )) be two ZZ–modules, which are contained in O. Since l and m may be multiples of the exponents of a and b, and since uniqueness of A and B is not required, testing for equality is rather complicated and time–consuming. However, equality testing is not needed very often, so this is not a real problem. If l = m, then a and b are equal if and only if A and B generate the same module over ZZ/mZZ . This can be tested with O(n3 ) operations in ZZ/mZZ, as was shown in [BuNe97]. Otherwise, we have to check whether gcd(l, m) is a multiple of the exponents of a and b. This can be done by checking whether gcd(l, m) · ωi is contained in both a and b for all 1 ≤ i ≤ n (ω1 , . . . , ωn denotes a ZZ–Basis of O). According to [BuNe97], this will take O(n3 ) operations. If gcd(l, m) is not a multiple of both exponents, a and b are not equal, otherwise we reduce A and B mod gcd(l, m) and apply the method described for the case l = m. Hence, testing equality of two ZZ–modules in an algebraic number field of degree n which have exponents dividing l and m, respectively, will take at most O(n3 ) operations in ZZ/ gcd(l, m)ZZ. This bound however is overly pessimistic. I believe a more careful run–time analysis should enable us to show that in fact O(n2 ) operations are sufficient.
Acknowledgements I would like to thank Johannes Buchmann for inspiring discussions on the topic of this paper.
310
Stefan Neis
References [BoSh66] Z.I. Borevich, I.R. Shafarevich, Number Theory, Academic Press, New York and London, 1966 [Coh95] H. Cohen, A Course in Computational Algebraic Number Theory, 2nd corrected printing, Springer, Heidelberg, M¨ unchen, New York, 1995 [BuNe97] J. Buchmann, S. Neis, Algorithms for Linear Algebra Problems over Principal Ideal Rings, submitted to SIAM Journal of Computing [Ho86] J.A. Howell, Spans in the Module (ZZ m )s , Lin. Mult. Alg. 19, 1986, pg. 67–77 [KASH] M. Pohst, KAnt SHell (Version 1.9), TU Berlin [LiDIA] The LiDIA Group, LiDIA – a library for computational number theory, TU Darmstadt [GP] C. Batut, D. Bernardi, H. Cohen, M. Olivier, GP/PARI CALCULATOR Version 1.39, Universit´e Bordeaux I
Evaluation of Linear Relations Between Vectors of a Lattice in Euclidean Space I. A. Semaev 43-2 Profsoyusnaya ul. Apt. 723, 117420 Moscow, Russia
Abstract. We prove that to find a nontrivial integer linear relation between vectors of a lattice L ⊂ IRn , whose euclidean length is at most M , one needs O n5+ε (ln M n/λ)1+ε binary operations for any ε > O, where λ is the first successive minimum of L.
Let IRn be n-dimensional space over the field IR of real numbers. A lattice of dimension k ≤ n in IR is the set L of vectors x1¯b1 + x2¯b2 + · · · + xk¯bk ,
xi ∈ ZZ ,
(1)
where ¯b1 , ¯b2 , · · · , ¯bk are linearly independent vectors in IRn . In this paper we solve the following problem. Given vectors `¯1 , `¯2 , · · · , `¯m of some lattice L ⊂ IRn , find a nonzero integer vector z1 , z2 , · · · , zm such that z1 `¯1 + z2 `¯2 + · · · + zm `¯m = 0
(2)
if it exists. A particular case of this problem has been considered in [1]. Namely, let K be a field of algebraic numbers of degree n = r1 + 2r2 over the field Q of rational numbers, where r1 is the number of real embeddings of K and 2r2 is the number of complex embeddings. Let O be a ring of integers of K, U be its group of units and U0 be its group of roots of unity. Then the factor-group U/U0 is isomorphic to the (r − 1)-dimensional lattice L(K) ⊂ IRr , r = r1 + r2 . D. Gordon proved in [1] that for 2r vectors of L(K), whose Euclidean length is 3 no more than M r , onecan find a nontrivial integer linear relation between 2+ε binary operations for any ε > O. This algorithm is them in O r 5+ε (ln M ) based on the LLL-reduction algorithm and depends on fast arithmetic. Without 3 6 it the algorithm works in O r (ln M ) binary operations. We prove here the following theorem. Theorem 1. Let L be a lattice in IRn with λ as the first successive minimum. Then for any ε > 0, one can in O n5+ε (ln M n/λ)1+ε binary operations find a nontrivial integer linear relation between vectors of L whose Euclidean length is no more M , or establish their independence. J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 311–322, 1998. c Springer-Verlag Berlin Heidelberg 1998
312
I. A. Semaev
The proof is accomplished by giving an explicit algorithm. We assume that components of vectors of L are given by their rational approximations with some accuracy. One needs t = O(n log n + n log(M/Λ)) leading binary digits of coordinates, where for Λ one can take some positive lower bound of the first succesive minimum of the lattice L. In this case, the algorithm works in O n5+ε (ln M n/λ)1+ε binary operations. We remark that one does not need to check an integer relation founded by our algorithm, since its validity is guaranteed by Theorem 1. The complexity as stated in Theorem 1 depends on asymptotically fast arithmetic algorithms. Without using fast arithmetic our algorithm works in 2 7 O n (ln M n/λ) binary operations. For L = L(K) from [3] we have λ 1/r2 . So under the in Gordon’s work, nontrivial linear relation can be assumptions 1+ε binary operations for any ε > O. Let us note that found in O r 5+ε (ln M ) our method offers an advantage over Gordon’s method in the applications to the discrete logarithm problem in prime finite fields of order p. For example, for p ≈ 21000, we have r ≤ 10 and ln M ≈ 1015 . We shall demonstrate the idea of our method in the particular case when the vectors `¯1 , `¯2 , · · · , `¯n of some full lattice L ⊂ IRn are linearly independent and `¯n+1 ∈ L is some nonzero vector. In this case m = n + 1. For integers zi , i ∈ [1, n + 1] in (2), we have zn+1 6= 0, and the vector z1 /zn+1 , · · · , zn /zn+1 is the unique solution of a system of linear equations with a matrix whose rows are `¯1 , · · · , `¯n and the free row is `¯n+1 . We find some approximate solution of this system. One can solve to degree of accuracy that zi /zn+1 are convergents to the corresponding coordinates of the approximation above. Using the algorithms of continued fractions, we evaluate zi /zn+1 . Thus we evaluate the integers zi . This paper contains three sections. In the first section, we introduce a few definitions from the geometry of numbers and prove some auxiliary propositions. In the second section, we consider some problems in the area of the approximation of reals by their convergents. In the third section, we formulate our algorithm and prove Theorem 1.
1
Lemmas
Let L be a lattice in IRn given by (1). The vectors ¯b1 , ¯b2 , · · · , ¯bk are the rows of a k × n matrix B. If k = n, then L is called full. One defines the inner product of the vectors ¯b = (b1 , b2 , · · · , bn) and c¯ = (c1 , c2 , · · · , cn ) by h¯b, ¯ci = Σbi ci . The Euclidean length of ¯b equals k¯bk = |h¯b, ¯bi|1/2 . The Cauchy-Schwarz inequality asserts that |h¯b, c¯i| ≤ k¯bkk¯ ck .
Evaluation of Linear Relations
313
Let B ∗ = BB T be the k ×k matrix whose entries are h¯bi , ¯bj i. The determinant of L is defined by d(L) = | det B ∗ |1/2 i.e., the positive square root of the modulus of det B ∗ . We have d(L) = | det B| for a full lattice L. The number λ = λ(L) = ¯ taken on the set of nonzero vectors of L is called the first successive min k`k minimum. The following inequality is a consequence of the Minkovski’s theorem on convex bodies: (3) λk νk (1) ≤ 2k d(L) , where νk (r) = 2π k+1/2 r k /Γ (k + 1/2) is the volume of a sphere in IRk of radius r[4]. Lemma 1. Let `¯1 , `¯2 , · · · , `¯k+1 ∈ L, k`¯i k ≤ M . Then for an integer V ≥ k ((2k + 3)M/λ) there exist coprime integers z1 , z2 , · · · , zk+1 not all of them are zero such that z1 `¯1 + z2 `¯2 + · · · + zk+1 `¯k+1 = 0 . Pk+1 Proof. Let us consider i=1 zi `¯i , where O ≤ z1 < V . There are V k+1 such sums altogether. We show that there are two of them which express the same vector of L. Since
k+1
X ¯ zi `i < (k + 1)V M ,
i=1
all the vectors of this kind are in the sphere in IRn of radius (k + 1)V M . The number of points of L in this sphere is no more than νk ((k + 1)V M + λ/2)/νk (λ/2) ≤ (2(k + 1)M V /λ + 1)k < V k ((2k + 3)M/λ)k ≤ V k+1 . So there exist sums which are equal. Their difference provides the desired relation. The estimate for the number of points ∈ L in the sphere of radius (k+1)V M for k = n is obvious. The general case requires explanation. The lattice L is the image of a lattice of integer vectors in IRk under the linear map (x1 , x2 , · · · , xk ) → x1¯b1 + x2¯b2 + · · · + xk ¯bk . The inverse image of the sphere of radius r in IRn is a body in IRk defined by k X
h¯bi , ¯bj ixi xj ≤ r .
(4)
i,j=1
The quadratic form in the left of the inequality is positive definite. So the volume of this body is equal to νk (r)/| det B ∗ |1/2 . The number of integer vectors for which (4) is valid is equal to the number of points of L in the sphere of radius r in IRn . Thus, in the general case, the estimate above is valid. So the lemma is proved. t u
314
I. A. Semaev
Corollary 1. Let `¯1 , `¯2 , · · · , `¯r+1 be linearly dependent vectors in L, r ≤ k such that `¯1 , `¯2 , · · · , `¯r are linearly independent. There exists a unique Pk+1 nonzero integer vector z1 , z2 , · · · , zr+1 with coprime coordinates such that i=1 zi `¯i = O, where r zr+1 > O, |zi | < ((2r + 3)M/λ) , i ∈ [1, r + 1]. Proof. The vectors `¯1 , `¯2 , · · · , `¯r+1 span a lattice L0 ⊂ L. So λ(L0 ) ≥ λ = λ(L). From Lemma 1, this proposition is valid. t u ¯ ¯ ¯ Let `1 , `2 , · · · , `r+1 be vectors of the lattice L. Let us map these vectors to IRr as follows: `¯i → `¯∗i = h`¯i , `¯1 i, h`¯i , `¯2 i, · · · , h`¯i , `¯r i . It is obvious that `¯1 , `¯2 , · · · , `¯r are linearly independent if and only if the determinant of a matrix A with rows `¯∗1 , `¯∗2 , · · · , `¯∗r isn’t equal to zero. Let us denote by A(i) a matrix with the same rows as A except the i-th, which equals `¯∗r+1 . Let for some s |h`¯i , `¯j i| ≤ 2s ,
i ∈ [1, r + 1] , j ∈ [1, r] .
Then for t > 0, we have
`∗ij = h`¯i , `¯j i = 2s aij + 2−t a0ij ,
i ∈ [1, r + 1] , j ∈ [1, r] ,
where |aij |, |a0ij | < 1. Let A1 = (aij )i,j∈[1,r] . Let us denote by A1 , a matrix which rows are the rows of A1 except the i-th, which equals (ar+11 , ar+12 , · · · , ar+1r ). (i)
Lemma 2. det A = 2sr (det A1 + 2−t1 d), where t1 = t − r − log2 r!,
|d| < 1 .
Proof. The proposition follows from the equalities (t0 > O, |ai|, |a0i | < 1): ! m m Y Y s0 −t0 0 ms0 −t0 2 ai + 2 ai = 2 ai + 2 d 1 i=1
=2
ms0
i=1 m Y
! ai + 2
−t0+m
d2
,
i=1
where |d1| ≤ 2m − 1, |d2| < 1, and the equality (t0 > 0, |a0i | < 1): ! m m m X X X s0 −t0 0 s0 −t0 0 2 ai + 2 ai = 2 ai + 2 ai i=1
i=1
= 2s
0
m X
i=1 0
ai + 2−t +log 2 m d3
! ,
i=1
where |d3| < 1, and the following repesentation of the determinant of a matrix: X `∗1j1 · · · `∗rjr (−1)σ(j1 ···jr ) , det A = j1 ···jr
where σ (j1 · · · jr ) = O if the permutation j1 , j2 , · · · , jr of the numbers 1, 2, · · · , r t u is even and σ (j1 · · · jr ) = 1 otherwise. So Lemma 2 is proved.
Evaluation of Linear Relations
315
Lemma 3. Let t ≥ log2 2sr+3r+1 r!/ λ2r vr2 (1) . Then det A = O if and only if | det A1 | < 2−t1 . Proof. It follows from Lemma 2 that if det A = O then | det A1 | < 2−t1 . Let | det A1 | < 2−t1 . If det A 6= O, then `¯1 , `¯2 , · · · , `¯r span a lattice L0 of dimension r, whose determinant equals | det A|1/2 . The inequality (3) follows that λr νr (1)/2r ≤ | det A|1/2 , since λ is no more than the first successive minimum of the lattice L0 . So λ2r νr2 (1)/22r < 2sr−t1+1 which contradicts the conditions. This proves the lemma. t u 0 0 Lemma 4. Let α = 2s a + 2−t a0 , where t0 ≥ 2, |a| ≥ 1/2, |a0| < 1. Then 0 0 α−1 = 2−s a−1 + 2−t +3 a00 , where |a00 | < 1. Proof. From the conditions we obtain |α| > 2s
0
1/2 − 2−t
0
0
≥ 2s −2 . So
0 0 0 0 0 0 0 0 |α−1 − 2−s a−1 | = | 2s a − α /2s aα| < 2s −t a0 /2s aα < 2−t −s +3 . t u
Thus, the lemma is proved.
Let us suppose that `¯1 , `¯2 , · · · , `¯r are linearly independent vectors of the lattice L while `¯1 , · · · , `¯r , `¯r+1 are linearly dependent. Then det A 6= O. Furthermore, we have from (3), | det A| ≥ λ2r νr2 (1)/22r . The system of linear equations y¯A = `¯∗r+1
(5)
has unique solution y¯ = (y1 , y2 , · · · , yr ).
Lemma 5. Let t ≥ log2 2sr+3r+2 r!/ λ2r νr2 (1) . Then |yi − det A1 / det A1 | < 2−u , 22sr+5r+5 r!/ λ4r νr4 (1) . (i)
where u = t − log2
Proof. Let s1 = log2 | det A|. Thus s1 ≥ log2 λ2r νr2 (1)/22r . Let us denote a = 2sr−s1 det A1 , t2 = t1 − sr + s1 . From the conditions t2 ≥ 2. Lemma 2 shows det A = 2s1 (a + 2−t2 d), where |d| < 1. Then |a| ≥ 1/2. According to Lemma 4, (det A)−1 = 2−sr (det A1 )−1 + 2−t2−s1 +3 d0 , where |d0| < 1. From Cramer’s rule, the solution of (5) can be found as (i)
yi = det A1 / det A1 (i) = 2sr det A1 + 2−t2 +s1 di 2−sr (det A1 )−1 + 2−t2−s1 +3 d0 = det A1 / det A1 + 2−t2 a−1 di + 2−2t2+3 di d0 + 2−t2 +sr−s1 +3 det A1 d0 .(6) (i)
(i)
316
I. A. Semaev
(i) We have used the expression det A(i) = 2sr det A1 + 2−t1 di , where |di | < 1, (i)
from Lemma 2. It is obvious that | det A1 | ≤ r!. Since t2 ≥ 2 and s1 ≤ sr + log2 r!, we have 2t2 − 3 ≥ t2 − 1 ≥ t2 − sr + s1 − log2 r! − 3 . Thus (6) follows the inequality |yi − det A1 / det A1 | < 2−(t2 −sr+s1 −log2 r!−5) . (i)
So the lemma is proved.
2
t u
Approximation of Reals by Convergents
Let α be a positive real number less than one prescribed to any degree of accuracy. That is, for any natural s we have at our disposal s leading binary digits of α. Let ν be a real with 1 ≤ ν < 2n for some natural n. We prove here the following propositions. Theorem 2. One can evaluate convergents ps−1 /qs−1 , ps /qs to α such that qs ≤ ν < qs+1 in O(n ln2 n ln ln n) binary operations. Lemma 6. Let α, ν be positive reals. There exist at most one convergent ps /qs , qs ≤ ν to α such that |α − ps /qs | ≤ 1/4ν 2 . We shall list some results of the theory of continued fractions used in this section. All of them are cited from [5,6]. For a positive real α we denote by 1/O = p−1 /q−1 , [α]/1 = p0 /q0 , · · · , ps /qs , · · · the sequence of convergent to α. If α is a rational, then α = ps /qs for some s ≥ O. In this case we suppose that there is the convergent ps+1 /qs+1 = ∞/∞. 1. The following inequalities are valid: 1/qs (qs + qs+1 ) ≤ |α − ps /qs | ≤ 1/qsqs+1 .
(7)
2. Legendre’s Theorem: Let p/q be a fraction in its lowest terms. By p0 /q 0 we denote the next to the last convergent to p/q. Then p/q is a convergent to α if and only if |α − p/q| < 1/q(q + q 0 ) . 3. Let ps−1 /qs−1, ps /qs be convergents to α. Then ps+1 /qs+1 = (ps a + ps−1 ) / (qs a + qs−1 ) , where a = [rs+1 ] can be found from rs+1 = (αqs−1 − ps−1 ) / (−αqs + ps ) .
(8)
Evaluation of Linear Relations
317
Lemma 7. Let |β − α| < δ, for positive reals α, β. There exists a convergent pk /qk , k ≥ 1 to α such that |α − pk /qk | ≥ δ/2 . Then pk−1 /qk−1 is a convergent to β. Proof. Without the loss of generality, we suppose that β < α. Let pk /qk < α at first. If, in this case pk /qk < β < α , then |β − pk /qk | < |α − pk /qk | ≤ 1/qk qk+1 ≤ 1/qk (qk + qk−1 ), since qk+1 ≥ qk + qk−1. Since pk−1/qk−1 is a convergent to pk /qk , according to Legendre’s Theorem, pk /qk is a convergent to β. Let β ≤ pk /qk < α . Then |β −pk /qk | < δ/2 ≤ |α −pk /qk |. Similarly in this case pk /qk is a convergent to β, much less pk−1/qk−1 is a convergent to β. Suppose now that pk /qk ≥ α. Then pk−1 /qk−1 < α and |α − pk−1/qk−1 | > |α − pk /qk | ≥ δ/2 . In a similar way pk−1 /qk−1 is a convergent to β. Thus in any case pk−1/qk−1 is a convergent to β. So the lemma is proved. t u Let p0k /qk0 , k = −1, 0, 1, · · · be a sequence of convergents to β. Lemma 8. Let |β − α| < δ ≤ 1 and k ≥ 0 such number that qk ≤ δ −1/2 qk+1 . Then 1. pi /qi = p0i /qi0 , i ∈ [0, k − 2] for k ≥ 2; 0 , then ` ∈ [k − 2, k + 2]. 2. if q`0 ≤ δ −1/2 < q`+1 Proof. The following inequalities are valid: |α − pk−1/qk−1 | ≥ 1/qk−1 (qk + qk−1 ) ≥ 1/2qk2 ≥ δ/2 . By Lemma 7, pk−2 /qk−2 is a convergent to β. So (1) holds. Let us prove (2). 0 = It is obvious that k − 2 ≤ ` for k = 0, 1. For k ≥ 2, it is evident from qk−2 −1/2 0 −1/2 . Let ` ≥ k + 3. Then qk+3 ≤ δ . Consequently qk−2 < qk ≤ δ 2 0 0 0 0 | ≥ 1/qk+2 + qk+3 qk+2 ≥ 1/2q 0 k+3 ≥ δ/2 . |β − p0k+2 /qk+2 0 is a convergent to α. So According to Lemma 7, we have that p0k+1 qk+1 0 ≤ δ −1/2 . qk+1 = qk+1
This is in contradiction to the choice of k. Thus ` ≤ k + 2. So the lemma is proved. u t
318
I. A. Semaev
Proof (of Theorem 2). Let k be a minimal integer that is greater than or equal to log2 ν 2 . Let us consider the 2-adic expansion α = α−1 2−1 + α−2 2−2 + · · · + α−i 2−i · · · , Let
α−i ∈ [0, 1] .
β = α−1 2−1 + α−2 2−2 + · · · + α−k 2−k = b/a, 0 ≤ b < a = 2k .
Then |α − β| < 2−k ≤ ν −2 . Let p0i /qi0 , i = −1, 0, 1, · · · be a sequence of convergents to β. Let us apply the extended Euclid’s algorithm [7] to the integers a, b. Let us denote by ri the sequence of remainders. It is easy to see that ri = (−1)i+1 (p0i a − qi0 b), i = −1, 0, 1, · · ·. This is a decreasing sequence of natural numbers: a > b > · · · > 0. The asymptotically fast Sh¨ onhage algorithm for the 0 0 /qi+1 to evaluation of the gcd evaluates in particular the convergents p0i /qi0 , qi+1 2 1/2 > ri+1 . This algorithm works in O(k ln k ln ln k) β = b/a such that ri ≥ a 0 binary operations. In this time one can evaluate convergents p0j /qj0 , p0j+1 /qj+1 such that rj ≥ a/ν > rj+1 . These inequalities with (7) show that 0 0 0 0 ≤ ν < qj+1 + qj+2 ≤ qj+3 . qj+1
So we have to do one evaluation by (8) to find p0j 0 −1 /qj0 0−1 , p0j 0 /qj0 0 , such that qj0 0 ≤ ν < qj0 0+1 . According to Lemma 2, the following convergents pi /qi to α for i ≤ j 0 − 2 are equal to p0i /qi0 . In addition, for s such that qs ≤ ν < qs+1 , we have j 0 − 2 ≤ s ≤ j 0 + 2. So to evaluate ps−1 /qs−1 , ps /qs , we need only find by (8) no more than 6 convergents. This can be done in O(n ln2 n ln ln n) binary operations. Thus, the theorem is proved t u Proof (of Lemma 6). Let us suppose that there exist two convergents ps /qs , pt /qt , where qs < qt ≤ ν. Then ps /qs is a convergent to pt /qt . From (7) 1/qs (qs + qs+1 ) ≤ |pt /qt − ps /qs | ≤ |pt /qt − α| + |α − ps /qs | < 1/2ν 2 . So 2ν 2 < qs (qs + qs+1 ) < 2ν 2 . This contradiction proves our lemma.
3
t u
The Proof
Let us prove Theorem 1. We are given vectors `¯i = (`i1 , `i2 , · · · , `in ), i ∈ [1, m] of some lattice L ⊂ IRn , whose Euclidean length is no more than M . Thus, we have any number of binary digits of the coordinates of these vectors. Let λ denote the first successive minimum of the lattice L. The following algorithm n finds Pm an¯integer nonzero vector z1 , z2 , · · · , zm , |zi | < ((2n + 3)M/λ) , such that i=1 zi `i = 0, or determines the linear independence of the vectors above. It is obvious that we can assume m ≤ n + 1. 1. Let s = 2s1 + 1, s1 = dlog2 M e and t = dlog2
(2n + 3)2n (n!)2 211n+7 /νn4 (1) (M/λ)6n e
where dαe is the minimal integer ≥ α ∈ IR, and νn (1) is the volume of the sphere of radius 1 in IRn .
Evaluation of Linear Relations
319
2. Let us present `ik = 2s1 bik + 2−t−4−[log2 n] b0ik , where |bik |, |b0ik | < 1. To find bik we take the binary expansion of `ij /2s1 which is < 1 in absolute value and take out the digits with the following Pn numbers: −t − 4 − [log2 n], −t−5 −[log 2 n], · · ·. Then wePevaluate sums k=1 bik bjk . To find aij we take the binary expansion of 2−1 nk=1 bik bjk , which is < 1 in absolute value, and take out the digits with the following numbers: −t − 1, −t − 2, · · ·. So (9) h`¯i , `¯j i = 2s aij + 2−t a0ij , where |aij |, |a0ij | < 1. 3. We find a number r ≤ m such that `¯i , i ∈ [1, r] are linearly independent and `¯i , i ∈ [1, r + 1] are linearly dependent. To determine the linear dependence of `¯i , i ∈ [1, r] we need only evaluate the determinant of the matrix (aij )i,j∈[1,r] . If its absolute value is < 2−t+r+log2 r! , then the vectors are linearly dependent, otherwise they are linearly independent. Let µ = dm/2e. If `¯i , i ∈ [1, µ] are linearly dependent, we have r < µ, otherwise r ≥ µ. At the following step, we test the linear dependence of the vectors `¯i , i ∈ [1, µ1] for µ1 = dm/4e in the first case and µ1 = d3m/4e in the second. Finally, if r = m, then the vectors `¯i , i ∈ [1, m] are linearly independent. (i) 4. Let r < m. Let us denote A1 = (aij )i,j∈[1,r] and let A1 be a matrix whose ¯ = (ar+11 , · · · , ar+1r ). rows are the rows of A1 except that the i-th row is a By Gaussian elimination, we solve the following system ¯ 1 = ¯a , βA
(10)
for β¯ = (β1 , β2 , · · · , βr ). (i) (i) 5. If βi = 0, then zi = 0. If βi 6= 0, then we evaluate a convergent pki /qki to (i)
r
(i)
βi such that qki ≤ ((2r + 3)M/λ) < qki+1 . 6. We suppose (i) (i) (i) zr+1 = lcmβi 6=0 qki , zi = zr+1 pki /qki . We shall prove that this algorithm actually solves the problem above. We note that |`ik | ≤ k`¯i k ≤ M so |`ik |/2s1 ≤ 1. Thus the presentation `ik = 2s1 bik + 2−t−4−[log2 n] b0ik , |bik |, |b0ik| < 1 is correct. We have h`¯i , `¯j i =
n X
`ik `jk = 22s1
k=1
n X
! bik bjk + 2−t−1dij
,
|dij | < 1 .
k=1
By the Cauchy-Schwarz inequality |h`¯i , `¯j i| ≤ M 2 , it follows that n X bik bjk + 2−t−1 dij ≤ 1 . k=1
320
I. A. Semaev
Pn So | k=1 bik bjk | ≤ 2. The correctness of (9) follows. It is easy to verify the inequality . t ≥ log2 2sr+3r+2 r!/ λ2r νr2 (1)
(11)
Thus we can use Lemma 3 at step 3 of the algorithm since the linear independence of the vectors `¯i , i ∈ [1, r] is equivalent to the matrix A = h`¯i , `¯j ii,j∈[1,r] having determinant zero. Since `¯i , i ∈ [1, r] are linearly independent and `¯i , i ∈ [1, r + 1] are linearly dependent, then by the Corollary to Lemma 1, there r exists integer nonzero vector z1 , z2 , · · · , zr+1 , |zi | < ((2r + 3)M/λ) with coPr+1 ¯ ¯ = prime coordinates such that i=1 zi `i = 0. Since zr+1 6= 0 the vector y (z1 /zr+1 , z2 /zr+1 , · · · , zr /zr+1 ) is the unique solution of the linear system y¯A = `¯∗r+1 . The inequality (11) is valid, so we can use Lemma 5. Furthermore, βi = (i) det A1 / det A1 . Thus |βi − zi /zr+1 | < 2−u ≤ 1/4 ((2r + 3)M/λ)
2r
.
(12)
We used the inequality u = t − r − 2 sr − log2 λ2r νr2 (1)/22r r!
− 5 ≥ 2 + 2r log2 ((2r + 3)M/λ) ,
which follows from t ≥ log2
(2n + 3)2n (n!)2 211n+7 /νn4 (1) (M/λ)6n
since 2 log2 M + 3 ≥ s and by the choice of t. So if β1 = 0 and zi 6= 0, then the inequality (12) follows 4 ((2r + 3)M/λ)
2r
r
< |zr+1 | < ((2r + 3)M/λ)
,
which is the contradiction. Thus, βi = 0 shows zi = 0. Let βi 6= 0. By the (i) (i) Legendre’s Theorem and Lemma 6, zi /zr+1 = pki /qki is the unique convergent to βi such that (i) r (i) qki ≤ ((2r + 3)M/λ) < qki+1 . It follows that (i)
zr+1 = lcmβi 6=0 qki ,
(i)
(i)
zi = zr+1 pki /qki
since gcd (z1 , · · · , zr+1 ) = 1. Thus, our algorithm is correct. Let ε be any positive real number. At step 2 of the algorithm, one has to find of integers of binary length ≤ t + 4 + dlog2 ne. no more than (n + 1)2 n products One can do it in O n3 t1+ε = O n4+ε (ln M n/λ)1+ε binary operations [7]. At step 3, one has to evaluate no more than dlog2 ne determinants of matrices of degree no more than n × n with rational entries the numerators and denominators of which have no more than t + 2 binary digits. For example, A1 = (aij )i,j∈[1,r] where aij = a00ij /2t+1 , a00ij ∈ ZZ, |a00ij | < 2t+1 .
Evaluation of Linear Relations
321
At step 4, one has to solve the system of linear equations (10). These problems can be solved by Gaussian elimination. Let us consider this procedure in more detail. ¯ = (a1 , a2 , · · · , an ) be a vectorLet A = (aij )i,j∈[1,n] be a square matrix and a row. Entries of the matrix and the row are considered as independent variables. Let ¯0 , (13) A¯ x0 = a ¯. be a system of linear equations, where a ¯0 is a column associated to the row a 0 Let us denote by A a n × (n + 1) matrix whose first columns are those of A and the last column is a ¯0 . Let us denote by A0k,l a matrix derived from A0 by the commutation of the lth and k-th columns. Let A0 (i) be a submatrix of A0 in the first i rows and i columns of A0 . Applying Gaussian elimination to A0 we obtain the upper triangular matrix B = (bij )i∈[1,n],j∈[1,n+1] that is bij = 0 for n ≥ i > j ≥ 1. Let us divide the i-th row of B by bii . We have a matrix C = (cij )i∈[1,n],j∈[1,n+1] with 1 on the (k) principal diagonal. Let Ck = (cij )i∈[1,n],j∈[1,n+1] be obtained by the reduction of the submatrix in the first k rows and k columns of C to the identity k × k matrix. Thus Cn such that Cn (n) is the identity n × n matrix and its (n + 1)-th column is a solution of the system (13). Lemma 9. The following inequalities are valid: 1. bij = det A0ij (i)/ det A0 (i − 1), 1 ≤ i < j ≤ n + 1, 2. cij = det A0ij (k)/ det A0 (k), i ∈ [1, k], j ∈ [k + 1, n + 1]. (k)
The proof of the lemma follows from the Cramer’s rule by induction on n. ¯ as integers no more We can consider entries of the matrix A1 and the row a than 2t+1 in absolute value. From Lemma 9, it follows that in the Gaussian elimination we are dealing with the rationals, the numerators and denominators of which are determinants of matrices of degree no more than n×n which entries ¯. By Hadamard’s inequality, the absolute are those of the matrix A1 and the row a values of these determinants are no more than n1/2 2(t+1)n . Thus we can solve the system and evaluate the determinants in O n4+ε t1+ε = O n5+ε (ln M n/λ)1+ε binary operations. At step 5, one needsto evaluate no more thann convergents. By section 2, n 1+ε 1+ε 2+ε (ln M n/λ) = O n this can be done in O n (ln ((2n + 3)M/λ) ) binary operations. At step 6, one needs to evaluate the lcm. That is, one needs to evaluate no n more than n gcd’s of natural numbers at most ((2n + 3)M/λ) . By the asymp- totically fast method [7], this evaluation can be done in O n2+ε (ln M n/λ)1+ε binary operations. This proves the theorem. I am grateful to Joe Buhler and Jerry Shurman for their transformations of my English prose and to MacCentre, Moscow for technical assistance.
322
I. A. Semaev
References 1. Gordon, D.M., Discrete logarithms in GF (p) using the number field sieve, SIAM J. Disc. Math. 6 (1993) 124–138 2. Lenstra, A.K., Lenstra, Jr., H.W., Lovasz, L. Factoring polynomials with rational coefficients Math. Ann. 261 (1982) 515–534 3. Dobrowolski, E. On the maximal modulus of conjugates o an algebraic integer Bull. Acad. Polon. Sci. Ser. Sci. Math. Astronom. Phys. 26 (1978) 291–292 4. Schmidt, W.M. Diophantine approximation, Springer-Verlag NY (1980) 5. Khinchin, A.Ja., Continued fractions, “Nauka,” Moscow, 1978 (in Russian) 6. Vorob’ev, N.N., Jr., Fibonacci numbers, “Nauka,” Moscow, 1984 (in Russian) 7. Aho, A., Hopcroft, J., Ullman, J. The design and analysis of computer algorithms, Addison-Wesley Reading MA (1974)
An Efficient Parallel Block-Reduction Algorithm Susanne Wetzel? Daimler Benz AG FTK/A HPC 0507 D-70564 Stuttgart Germany
Abstract. In this paper, we present a new parallel block-reduction algorithm for reducing lattice bases which allows the use of an arbitrarily chosen block-size between two and n where n denotes the dimension of the lattice. Thus, we are building a hierarchy of parallel lattice basis reduction algorithms between the known parallel all-swap algorithm which is a parallelization for block-size two and the reduction algorithm for block-size n which corresponds to the known sequential lattice basis reduction algorithm. We show that even though the parallel all-swap algorithm as well as the parallel block-reduction algorithm have the same asymptotic complexity in respect to arithmetic operations in theory, in practice neither block-size two nor block-size n are a priori the best choices. The optimal block-size in respect to minimizing the reduction time rather depends strongly on the used parallel system and the corresponding communication costs.
1
Introduction
Lattice basis reduction algorithms were once of interest primarily to number theorists studying quadratic forms. However, starting with the work of H.W. Lenstra about 15 years ago, lattice basis reduction algorithms have emerged as an important tool in integer programming [19]. The LLL lattice basis reduction algorithm [18], which was invented soon afterwards, has spurred extensive research in lattice theory, thus leading to the improvement of sequential lattice basis reduction algorithms [2,15,22,25,26,27] as to reduce the computational costs of the reduction and achieve better reduction results. Moreover, it has revolutionized combinatorial optimization [6] and cryptography (e.g., [4,14,16,17,26]). Nevertheless, the run time for sequentially reducing lattice bases of large dimension or with big entries is still quite high. Thus, there is great interest in parallelizing lattice basis reduction algorithms as to achieve an additional improvement in reducing the run times so that reductions of even larger lattice bases with bigger entries can also practically be performed in a reasonable amount of time. ?
The research was done while the author was a member of the Graduiertenkolleg Informatik at the Universit¨ at des Saarlandes (Saarbr¨ ucken), a fellowship program of the DFG (Deutsche Forschungsgemeinschaft).
J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 323–337, 1998. c Springer-Verlag Berlin Heidelberg 1998
324
Susanne Wetzel
Recently, various parallel lattice basis reduction algorithms have been developed based on the so-called all-swap algorithm which is a modification of the classical LLL algorithm, doing the reductions in parallel on blocks of size two [7,8,9,10,11,12,13,24,29]. In this work, we present a new parallel block-reduction algorithm which allows the use of an arbitrarily chosen block-size between two (known parallelization) and n (sequential algorithm) thus building a hierarchy of parallel lattice basis reduction algorithms (n denotes the dimension of the lattice). On the basis of practical results, we show that neither block-size two nor block-size n are a priori the best choices as to minimize the reduction time even though the parallel all-swap algorithm (known parallelization for blocksize two) and the parallel block-reduction algorithm have the same asymptotic complexity in respect to arithmetic operations in theory. It turns out that the optimal block-size rather depends strongly on the lattice to be reduced as well as the parallel system and the corresponding communication costs used for the reduction. This paper is organized as follows: At first we give a brief introduction to the theory of lattice basis reduction. We then provide an overview of the parallel lattice basis reduction algorithms known so far. Based on the so-called all-swap algorithm, we then present and analyze our new parallel block-reduction algorithm. In the following theoretical considerations, we are using a distributed memory computational model with n or n2 processors connected by a communication network as our parallel system. The performance of the parallel algorithms will be measured by means of the speed-up Sp =
T∗ Tp
(1)
which is defined as the ratio of the optimal run time T ∗ of the fastest sequential algorithm for the problem to the run time Tp of the parallel algorithm using p processors. Thus, the speed-up describes the advantage of the parallel algorithm, compared to the best possible sequential algorithm. Ideally, Sp = p.
2
Lattice Basis Reduction
In this section we give a brief introduction to the theory of lattice basis reduction. For further details we refer to [3,6,23,30]. Definition 1. A lattice L ⊂ IRn is a discrete additive subgroup of IRn such that L=
n nX
o xi bi xi ∈ ZZ, i = 1, . . . , n ,
(2)
i=1
where b1 , b2 , . . . , bn ∈ IRn are linearly independent vectors. We call B = (b1 , . . . , bn ) ∈ IRn×n a basis of the lattice L = L(B) with dimension n. Obviously, a lattice has various bases whereas the dimension is uniquely determined:
An Efficient Parallel Block-Reduction Algorithm
325
Theorem 1. Let B = (b1 , . . . , bn ) ∈ IRn×n be a basis of the lattice L ⊂ IRn . Then, B 0 = (b01 , . . . , b0n ) ∈ IRn×n is also a basis of the lattice L iff there is a unimodular matrix U ∈ ZZ n×n such that BU = B 0 . The aim of lattice basis reduction is to construct one of the many bases of a lattice (in polynomial time) such that the base vectors are as small as possible (by means of the Euclidean norm) and as orthogonal as possible to each other. Theorem 2. For a basis B = (b1 , . . . , bn ) ∈ IRn×n , the associated orthogonal basis is denoted by B ∗ = (b∗1 , . . . , b∗n ) ∈ IRn×n and can be computed by the GramSchmidt orthogonalization procedure as b∗1 = b1 ,
(3)
b∗i = bi − µij =
i−1 X
j=1 hbi , b∗j i kb∗j k2
µi,j b∗j
for 2 ≤ i ≤ n and
(4)
for 1 ≤ j < i ≤ n.
(5)
With µii = 1 for 1 ≤ i ≤ n and µij = 0 for i < j, M = (µij )1≤i,j≤n is a lower triangular matrix and the following equation holds: (b1 , . . . , bn) = (b∗1 , . . . , b∗n )M T
(6)
In general, for a lattice L ⊂ IRn with basis B ∈ IRn×n , the corresponding M is not integral (but det(M ) = 1) and therefore B ∗ is not a basis of the lattice L. Definition 2 ([18]). For a lattice L ⊂ IRn with basis B = (b1 , . . . , bn ) ∈ IRn×n and corresponding Gram-Schmidt orthogonalization B ∗ = (b∗1 , . . . , b∗n ) ∈ IRn×n , the basis B is called LLL-reduced if the following conditions are satisfied: 1 for 1 ≤ j < i ≤ n (7) |µij | ≤ 2 3 for 1 < i ≤ n. (8) kb∗i + µii−1 b∗i−1 k2 ≥ kb∗i−1 k2 4 Property (7) is the criterion for size-reduction [3,6,23,30]. The original LLL lattice basis reduction algorithm can be found in [18]. From the analysis in [18] it is known that the algorithm is polynomial in time: Theorem 3. Let L ⊆ ZZ n be a lattice with basis B = (b1 , . . . , bn ) ∈ ZZ n×n and let C ∈ IR, C ≥ 2 be such that kbi k2 ≤ C for 1 ≤ i ≤ n. Then, the number of arithmetic operations needed by the LLL algorithm is O(n3 n log C), and the integers on which these operations are performed each have binary length O(n log C). Theorem 4. Let L be a lattice in IRn and B = (b1 , . . . , bn ) ∈ IRn×n be an LLL-reduced basis of L. Then, the following estimate holds: p (9) kb1 k ≤ 2(n−1)/4 n | det(B)| Based on the introduced notations and the original LLL algorithm [18], we will now focus on parallel lattice basis reduction and present the newly-developed parallel block-reduction algorithm.
326
3
Susanne Wetzel
Parallel Lattice Basis Reduction Algorithm
In the last few years, various parallel lattice basis reduction algorithms based on the LLL algorithm have been developed [7,8,9,10,11,12,13,24,29]. While in [7] an efficient parallelization for the original Schnorr-Euchner algorithm [26] and thus also for the original LLL algorithm for a ring of n processors is presented, all the other parallel lattice basis reduction algorithms proposed so far are parallelizations of a modification of the original LLL algorithm. This is due to the fact that it seems to be impossibly to efficiently parallelize the classical reduction algorithms for n2 processors because both the original LLL algorithm and the Schnorr-Euchner algorithm work step-by-step. Since the new parallel blockreduction algorithm presented in the sequel is a generalization of the so-called parallel all-swap algorithm which is based on the modified LLL algorithm, we will now focus on the modification of the classical LLL and introduce its parallelization: From [18] it is known that the LLL-reduction of a given lattice basis is achieved by performing a particular sequence of size-reductions and exchange steps. Moreover, it is also known that independent of the order in which these operations are performed, the computations will result in an LLL-reduced bases after a finite number of steps [19]. Thus, the original step-by-step LLL algorithm [18] can be modified as follows: Algorithm 1. Modified LLL(b1 , . . . , bn ) Input: Lattice basis B = (b1 , . . . , bn ) ∈ ZZ n×n Output: LLL-reduced lattice basis
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12)
compute the Gram-Schmidt orthogonalization compute the size-reduction of the basis (b1 , . . . , bn ) swapping := true while (swapping) do swapping := false if (there is an index i with kb∗i+1 k2 < ( 34 − µ2i+1i )kb∗i k2 ) then swapping := true for such an index i swap bi and bi+1 , i.e., consider the new basis (b1 , . . . , bi+1 , bi , . . . , bn ) update the Gram-Schmidt orthogonalization compute the size-reduction of the basis (b1 , . . . , bn ) fi od
Instead of reducing the basis step-by-step and executing the while-loop as long as the stage index is not larger than the dimension of the lattice as it is done in the original LLL-reduction algorithm [18], the while-loop in the modified algorithm is executed as long as there is at least one pair of base vectors bi , bi+1 (1 ≤ i < n) for which the LLL condition (8) does not hold yet. Obviously, the modified algorithm yields an LLL-reduced basis upon termination. The following
An Efficient Parallel Block-Reduction Algorithm
327
theorem shows that the algorithm is polynomial in time [28] but needs more arithmetic operations than the original LLL algorithm (see Theorem 3): Lemma 1. Let L be a lattice with basis B = (b1 , . . . , bn ) ∈ ZZ n×n and let C ∈ IR, C ≥ 2 be such that kbi k2 ≤ C for 1 ≤ i ≤ n. The modified LLL algorithm needs at most O(n5 log(C)) arithmetic operations for computing an LLL-reduced lattice basis. Proof. From [18] it is known that at most n2 log(C) exchanges have to be performed in order to obtain an LLL-reduced basis of B = (b1 , . . . , bn ) ∈ ZZ n×n . Since each run through the while-loop causes O(n3 ) arithmetic operations, we obtain the assertion. t u Now, the crucial idea for the parallelization of the modified LLL algorithm is to exchange (swap) at once as many vectors of the lattice basis as possible, leading to the so-called (parallel) all-swap lattice basis reduction algorithm [7,8,9,10,11,12,13,24,29]: Algorithm 2. All-Swap-Reduction(b1 , . . . , bn ) Input: Lattice basis B = (b1 , . . . , bn ) ∈ ZZ n×n Output: LLL-reduced lattice basis
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14)
compute the Gram Schmidt orthogonalization compute the size-reduction of the basis (b1 , . . . , bn ) order := odd while (it is still possible to do some swaps) do if (order = odd) then for all i swap bi and bi+1 if i is odd and kb∗i+1 k2 < ( 34 − µ2i+1i )kb∗i k2 order := even else for all i swap bi and bi+1 if i is even and kb∗i+1 k2 < ( 34 − µ2i+1i )kb∗i k2 order := odd fi compute the Gram-Schmidt orthogonalization compute the size-reduction of the basis (b1 , . . . , bn ) od
On (parallel) execution of the all-swap-reduction algorithm, in an odd phase all vectors bi and bi+1 with i odd and kb∗i+1 k2 < ( 34 − µ2i+1i )kb∗i k2 are swapped (in parallel) and in an even phase, the same is done for i even. The combination of those two phases is called an all-swap phase. Even though as many exchanges as possible will be performed in one all-swap phase, in the serial case the all-swap algorithm still requires O(n5 log(C)) arithmetic operations for the computations of an LLL-reduced bases of B = (b1 , . . . , bn ) ∈ ZZ n×n . This is due to the fact that the complexity of an all-swap phase is dominated by the complexity of the orthogonalization and the size-reduction process which is O(n3 ) arithmetic
328
Susanne Wetzel
operations in both cases. According to [7], the size-reduction process as well as the Gram-Schmidt orthogonalization are well suited for parallelization. I.e., the computational costs can be reduced to O(n) (respectively O(n2 )) arithmetic operations by doing the computations in parallel on a mesh-connected network of n2 processors (ring of n processors) thus achieving a speed-up in the order of magnitude of O(n) (O(1)) in comparison to the fastest known LLL algorithm (requiring O(n4 log(C)) arithmetic operations (see Theorem 3)). If one is willing to refrain from the demand to compute an LLL-reduced lattice basis (thus satisfying (9)) and accepts the computation of a reasonably short vector of the lattice L(B) with basis B = (b1 , . . . , bn ) ∈ ZZ n×n instead, it is, of course, possible to improve the complexity of the (parallel) all-swap n5 log(C)) arithmetic operations using p(n) ∈ algorithm which originally is O( p(n) {1, n, n2} processors since fewer all-swap phases are required in order to achieve the relaxed reduction result. For example, by performing at most n log(C) allswap phases, the following notion of reducedness (in comparison to the LLLreducedness and (9)) is attainable: Theorem 5 ([7]). Let L be a lattice with basis B = (b1 , . . . , bn ) ∈ ZZ n×n and let C ∈ IR, C ≥ 2 be such that kbi k2 ≤ C for 1 ≤ i ≤ n. After at most n log(C) all-swap phases, the following situation holds for all i with 1 ≤ i ≤ n: kb∗1 k2 · kb∗2 k2 · . . . · kb∗i k2 ≤ with c =
4 i(n−i) 2i c 2 (det(L) n ) 3
(10)
32 9 .
This leads to the following property [7]: Corollary 1. Let L be a lattice with basis B = (b1 , . . . , bn ) ∈ ZZ n×n and let C ∈ IR, C ≥ 2 be such that kbi k2 ≤ C for 1 ≤ i ≤ n. After at most n log(C) all-swap phases the vector b1 = b∗1 of the resulting lattice basis satisfies 2 n−1 1 kb1 k ≤ √ c 4 det(L) n 3 with c =
(11)
32 9 .
Thus, the following corollary holds: Corollary 2. Let L be a lattice with basis B = (b1 , . . . , bn ) ∈ ZZ n×n and let C ∈ IR, C ≥ 2 be such that kbi k2 ≤ C for 1 ≤ i ≤ n. Using the (parallel) all-swap algorithm, the computation of a short vector of the lattice (satisfying n4 log(C)) arithmetic operations using p(n) procesinequality (11)) requires O( p(n) sors with p(n) ∈ {1, n, n2}.
4
Parallel Block-Reduction Algorithm
While in the case of the original LLL algorithm the reduction of a lattice basis B = (b1 , . . . , bn ) is done globally at once, in the case of the all-swap algorithm
An Efficient Parallel Block-Reduction Algorithm
329
global update operations and size-reductions take turns with odd or even phases where blocks of size two are LLL-reduced locally. By introducing the newlydeveloped parallel block-reduction algorithm, this method will be generalized such that m disjoint blocks of size l will locally be LLL-reduced first before global update operations are performed and the process is iterated in order to compute the sought LLL-reduced bases. Thus, we build a hierarchy of lattice basis reduction algorithms between the (parallel) all-swap algorithm (block-size two) and the LLL algorithm (block-size n). Without loss of generality, we assume that n = ml with 2 ≤ l ≤ n. (Otherwise, m = d nl e and the m-th block has size n − (m − 1)l.) For B ∗ and M corresponding to B (see Theorem 2), let B[k] = (bl(k−1)+1 , . . . , blk ), ∗ = (b∗l(k−1)+1 , . . . , b∗lk ) and M[k] be the l×l submatrix MU V of M (see Figure 1 B[k] (only non-zero elements are put down)) with U = V = {l(k − 1) + 1, . . . , lk} and 1 ≤ k ≤ m. Hence, the new (parallel) block-reduction algorithm can be stated as follows: Algorithm 3. Block-Reduction(b1 , . . . , bn , l) Input: Lattice basis B = (b1 , . . . , bn ) ∈ ZZ n×n and block-size l Output: LLL-reduced lattice basis
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10)
(11) (12) (13) (14) (15) (16) (17) (18) (19) (20) (21) (22) (23) (24)
if (l < n) then compute the Gram-Schmidt orthogonalization compute the size-reduction of the basis (b1 , . . . , bn ) fi split the basis into m blocks of size l for (k := 1; k ≤ m; k + +) do exchange[k] := true od while (there are k’s with 1 ≤ k ≤ m such that exchange[k] = true) do for all such k compute the LLL-reduction of the block ∗ and bl(k−1)+1 , . . . , blk according to [18] for given M[k] , B[k] B[k] for (k := 1; k ≤ m; k + +) do exchangek := false od for (k := 1; k < m; k + +) do compute the size-reduction of µkl+1,kl if (kb∗lk+1 k2 < ( 34 − µ2lk+1lk )kb∗lk k2 ) then swap blk and blk+1 exchange[k] := true exchange[k + 1] := true update and size-reduce µlk,i for lk − 1 ≥ i ≥ l(k − 1) + 1 update and size-reduce µj,lk+1 for lk + 2 ≤ j ≤ l(k + 1) update kb∗lk k2 , kb∗lk+1 k2 fi od
330
Susanne Wetzel
(25) for (k := 1; k < m; k + +) do (26) for (i := lk + 1; i ≤ n; i + +) do (27) update and size-reduce µij for lk ≥ j ≥ l(k − 1) + 1 (28) od (29) od (30) od After splitting the original basis into m blocks of size l, the reductions are done ∗ and M[k] for 1 ≤ k ≤ m. (in parallel) on the blocks B[k] using the data of B[k] Upon completion, B[k] is size-reduced and kb∗l(k−1)+i+1 k2 ≥
3 4
− µ2l(k−1)+i+1l(k−1)+i kb∗l(k−1)+i k2
(12)
for 1 ≤ k ≤ m and 1 ≤ i ≤ l − 1. In order to achieve an overall LLL-reduced basis, the borders of the blocks have to be checked for possible additional swaps. If swaps occur, the µlk,i for l(k − 1) + 1 ≤ i ≤ lk − 1 and µj,lk+1 for lk + 2 ≤ j ≤ l(k + 1) have to be recomputed and size-reduced since the corresponding Gram-Schmidt coefficients belonged to a part of M which did not get updated during the local block-reduction of the B[k] ’s. The kb∗lk k2 , kb∗lk+1 k2 are updated according to the formulas in [18]. In order to guarantee that B is size-reduced after each iteration, all the other Gram-Schmidt coefficients which are not needed in the course of the local reductions also have to be updated and size-reduced. Then, the process is iterated. The algorithm terminates as soon as no swaps occur at the borders of the blocks. It obviously yields an LLL-reduced basis. For block-size l = n, the block-reduction algorithm is just the LLL algorithm and if l = 2 it is the all-swap algorithm (see Algorithm 2). For l = 2, the arithmetic complexity of the algorithm is summarized in Corollary 2 and for the case l = n, Theorem 3 provides the corresponding information. In the following, we analyze the block-reduction algorithm for 2 < l < n. Lemma 2. Let L be a lattice with basis B = (b1 , . . . , bn ) ∈ ZZ n×n and let C ∈ IR, C ≥ 2 be such that kbi k2 ≤ C for 1 ≤ i ≤ n. Then, the local LLL∗ (1 ≤ k ≤ m) can be computed reduction of block B[k] in conjunction with B[k] 3 with O(nl log(C)) arithmetic operations. ∗ = (b∗l(k−1)+1 , . . . , b∗lk ) and M[k] one Proof. For B[k] = (bl(k−1)+1 , . . . , blk ), B[k]
can compute a basis U[k] = (ul(k−1)+1 , . . . , ulk ) ∈ Qn×l of an l-dimensional ∗ T M[k] where lattice L[k] ⊂ Qn such that U[k] = B[k] X
l(k−1)+i
ul(k−1)+i =
µl(k−1)+i,j b∗j
(13)
j=l(k−1)+1
with 1 ≤ i ≤ l. Since U[k] ∈ Qn×l and kb∗i k2 ≤ C for l(k − 1) + 1 ≤ i ≤ lk, the claim ensues from the analysis in [18]. t u Thus, the following theorem holds for the complexity of the block-reduction algorithm:
331 An Efficient Parallel Block-Reduction Algorithm
.
1 ..
1)l;1
1)l+1;1
(k
(k
(k
n;1
1)l+2;1 . . .
2l+1;1 . . .
2l;1
. . .
l+2;1
l+1;1
l;1
. . .
2;1
1
0 BB BB BB BB BB BB BB BB BB BB BB BB BB , BB BB , BB , @B .. .
l
;l
;l
l;l
. . .
l
l;l
;l
l
. . .
l;l
,1 +1 ,1 +2 ,1
l
k
;l
2 , 1 2 +1 ,1
k
k
;l
;l
1
l
l;l
l;l
;l
l
+1 +2 . . .
. . .
l
k
l
2 2 +1
k
k
n;l
;l
1)l+2;l . . .
( ,1) ,1 ( ,1) ( ,1) +1 ,1 ( ,1) +1 ( , ,1 ( , 1)l+2;l . . .
n;l
,1
l
..
1
.
. . .
. . .
;l
l;l
;l+1
1
+2
l
2 +1 2 +1 +1
l;l+1
k
( ,1)
l
..
. . .
.
l; l
; l
l; l
2 2 ,1 2 +1 2 ,1 k
l
1
. . .
; l
l;2l
2 +1 2 k
( ,1) 2 ,1 ( ,1)
1 ..
.
..
.
( ,1) ( ,1)
l; k
k
Fig. 1. M = (µij ) with 1 ≤ i, j ≤ n
l
k
k
l
1
; k
l
l
k
1 l
,1) +1
1 .. .
,1) +1
n;(k
l
1)l+2;(k . . .
( ,1) +1 ( ,1) ( , ,1) ( , l
,1) +1
1)l+2;(k . . .
n;(k
.. .
n;n
,1
1
1 CC CC CC CC CC CC CC CC CC CC CC CC CC CC CC CC AC
332
Susanne Wetzel
Theorem 6. Let L be a lattice with basis B = (b1 , . . . , bn ) ∈ ZZ n×n and let C ∈ IR, C ≥ 2 be such that kbi k2 ≤ C for 1 ≤ i ≤ n. For computing an LLLreduced basis, the block-reduction algorithm on m disjunctive blocks of size l (n = ml) requires O(n5 log(C)) arithmetic operations. Proof. In the following, we will distinguish between so-called “heavy” and “slight” swaps, occurring in (14) - (29) of Algorithm 3. A swap of bk and bk+1 for 1 ≤ k ≤ m − 1 is said to be a slight swap if in the next phase of the reduction process only a constant number of local swaps are performed for reducing the blocks B[k] and B[k+1] . It is called a heavy swap if O(l2 log(C)) swaps are done in at least one of the neighboring blocks. While a slight swap causes O(nl) arithmetic operations in the next phase, a heavy swap implies O(nl3 log(C)) operations (see Lemma 2). Let sh (ss ) be the number of heavy (slight) swaps and cmax be the maximum number of local swaps caused by slight swaps. Then, the number of arithmetic operations performed in (9) - (29) is at most c1 ss nl + c2 sh nl3 log(C) + c3 (ss + sh )n3
(14)
with (15) cmax ss + c4 sh l2 log(C) + (ss + sh ) denoting the maximum overall number of swaps (c1 , c2 , c3 , c4 ∈ IN). From [18] we know that at most n2 log(C) swaps are necessary for computing an LLL-reduced basis of B. Therefore, the following holds: (cmax ss + c4 sh l2 log(C) + (ss + sh ))c3 n2 (n − m) = O(n5 log(C))
(16)
Thus, c1 ss nl + c2 sh nl3 log(C) + c3 (ss + sh )n2 (n − m) = O(n5 log(C)).
(17) t u
Since the Gram-Schmidt orthogonalization, the size-reduction and the updating are well suited for parallelization according to [7], the parallel block-reduction algorithm requires O(n3 log(C)) (respectively O(n4 log(C)) arithmetic operations on a mesh-connected network of n2 processors (ring of n processors) thus achieving the same speed-up as for the parallel all-swap algorithm, i.e., an speed-up in the order of magnitude of O(n) (O(1)). As in the case of the all-swap algorithm (see Corollary 2), the complexity of the block-reduction for computing a reasonably short vector of a lattice L(B) n4 log(C)) arithmetic with basis B = (b1 , . . . , bn ) ∈ ZZ n×n can be reduced to O( p(n) 2 operations (using p(n) processors with p(n) ∈ {1, n, n }) by relaxing the bound (9) to (11), thus reducing the number of necessary swaps to O(n log(C)). Remark 1. From the construction of the basis Uk = (ul(k−1)+1 , . . . , ulk ) ∈ Qn×l (see Theorem 2) and the analysis of the binary complexity of the LLL algorithm in [18], it follows that the binary complexity of the block-reduction algorithm is worse than the one of the original LLL algorithm. This is also the case for the all-swap algorithm [7]. Since the all-swap algorithm as well as the new blockreduction algorithm are based on the modified LLL algorithm for which no binary complexity results are known, it cannot be stated which one of the algorithms has the worst binary complexity in the final analysis.
An Efficient Parallel Block-Reduction Algorithm
5
333
Practical Results
Since in theory both the all-swap algorithm and the new block-reduction algorithm have the same asymptotic complexity in respect to arithmetic operations, on the basis of some tests it shall be demonstrated in the sequel how the chosen block-size affects the reduction time of the block-reduction algorithm in practice. For this purpose, Algorithm 3 has been implemented on a serial machine (using LiDIA1 [1,20,21]), simulating the parallel block-reductions by simply doing one after another in each iteration step. The block-reductions were computed using the implementation of the original Schnorr-Euchner algorithm [26] in LiDIA on input of the local bases. In order to prevent floating point errors in the input data for the local reductions, the initial Gram-Schmidt orthogonalization as well as the corresponding updates and size-reductions beyond the block-reduction were done using rational arithmetic. In the following, we present timings of tests which have been performed on a Sparc 4 with 110 MHz and 32 MB main memory (see also [30]). Block- Blocks Iterations RT BT BT/Block size in seconds 2 25 141 1767.85 540.94 21.64 5 10 91 641.70 170.07 17.01 10 5 67 371.12 94.67 18.93 25 2 34 206.76 89.38 44.69 50 1 1 114.26 114.26 114.26
OT 1226.91 471.63 276.45 117.34 0
Table 1. Damg˚ ard Lattices - Dimension n = 50 Table 1 summarizes the test results in the case of Damg˚ ard lattices [14,30] with dimension n = 50 where the size of the entries is about 100 bits. We distinguish between the block-reduction time BT which was determined as the sum of the times for each single block-reduction, OT as the time needed for performing the size-reductions, the updates as well as the initial Gram-Schmidt orthogonalization (which were computed using the usual sequential algorithms) and the overall sequential reduction time RT = BT+OT. Disregarding the costs for the communication, the second to last column of the table contains the blockreduction time per block, i.e., the expected reduction time if the block-reductions of the # blocks are done in parallel. The results in Table 1 show that in respect √ to the block-reduction time (per block), the optimal block-size is close to n. Moreover, OT decreases since less iterations occur as the block-size increases. For unimodular lattices (see Table 2), the optimal block-reduction time (per block) is achieved for block-size two while the minimal reduction time per block is obtained for the maximal block-size. As the block-size increases, the number of iterations decreases only slightly. In the case of random lattices (see Table 3), √ the minimal block-reduction time BT is also obtained for block-sizes close to n whereas the block-reduction 1
LiDIA is a library for computational number theory
334
Susanne Wetzel Block- Blocks Iterations RT BT BT/Block size in seconds 2 10 287 2045.20 68.53 6.85 4 5 271 993.06 114.84 22.97 5 4 263 798.28 146.89 36.72 10 2 200 572.59 358.25 179.13 20 1 1 96.29 96.29 96.29
OT 1976.67 878.22 651.39 214.34 0
Table 2. Unimodular Lattices - Dimension n = 20, Size of the Entries ≈ 50 bits time per block is minimal for block-size two. As before, the number of iterations decreases with increasing block-size. Block- Blocks Iterations RT BT BT/Block size in seconds 2 15 20 2089.30 13.56 0.90 3 10 15 1309.03 9,77 0.98 5 6 12 851.57 9.02 1.50 6 5 11 768.10 9.56 1.91 10 3 10 629.04 13.26 4.42 15 2 9 564.97 19.33 9.67 30 1 1 29.77 29.77 29.77
OT 2075.74 1299.26 842.65 758.54 615.78 545.64 0
Table 3. Random Lattices - Dimension n = 30, Size of the Entries ≈ 50 bits We have to note that for all test classes it was necessary to do the approximations (occurring during the local block-reductions) using xdoubles or even bigfloats (see [20]) in order to guarantee a correct reduction result. From [26,30] it is known that this is not necessary for those kind of lattices in the serial case. Consequently, the size of the occurring intermediate results in the (simulated) parallel computation is much bigger than in the serial one. Despite the facts that the parallel computations were only simulated and slow rational arithmetic was used for the implementation of the block-reduction algorithm, the tests show clearly how the reduction time for particular test classes depends on the chosen block-size. However, on basis of the test data in hand, for none of the test classes it can be stated whether the parallel block-reduction algorithm (including parallel implementations of the Gram-Schmidt orthogonalization etc.) will be profitable in respect to the run time in practice in comparison to the Schnorr-Euchner algorithm (especially on application to larger lattice bases with huge entries) since this strongly depends on the actually used parallel system and the corresponding communication costs. I.e., in order to allow the comparison it would be necessary to run the block-reduction algorithm on a parallel machine or use a distributed system, thus also allowing a quantification of the communication costs. In order to be able to implement this process using LiDIA, this system would have to provide a parallel interface. Moreover, for a practical implemen-
An Efficient Parallel Block-Reduction Algorithm
335
tation of the algorithm it would be necessary to use a floating point arithmetic for all the operations (Gram-Schmidt orthogonalization, size-reductions, updates etc.), thus requiring the implementation of additional heuristics for preventing error propagation and correcting occurring floating point errors. As a first step, the orthogonalization would have to be done using Givens rotations since this method is more stable than the original Gram-Schmidt orthogonalization [5]. Moreover, for stability reasons it would be necessary to recompute the complete orthogonalization after checking the swap condition at the boundaries of the blocks. Thus, the steps (20) – (22) of the block-reduction algorithm would have to be replaced by an overall orthogonalization. Furthermore, for performing the block-reductions itself, it would be necessary to adjust the actual reduction algorithm such that it works on input of an orthogonalization instead of a basis, thus requiring new heuristics, e.g., for correcting large reduction coefficients especially at the boundaries of the blocks.
6
Summary
In this paper, we have built a hierarchy between the all-swap algorithm and the LLL algorithm by proposing a new block-reduction algorithm, thus allowing reductions on blocks of the lattice basis in parallel. In theory, both the allswap algorithm and the block-reduction algorithm have the same asymptotic complexity in respect to arithmetic operations. Practical tests have illustrated that there is no general optimal block-size in respect to the reduction time, it rather depends on the test class. Furthermore, the tests have shown that the size of the intermediate results is reasonably bigger than during the sequential reduction. In order to allow a comparison of the Schnorr-Euchner algorithm and the newly-proposed parallel block-reduction algorithm, it would be necessary to implement and test the block-reduction algorithm on a parallel computer or for a distributed system. Moreover, it would be necessary to adjust the blockreduction algorithm such that the operations could be done using floating point arithmetic, thus requiring additional heuristics for correcting floating point errors and preventing error propagation.
Acknowledgements The author would like to thank her supervisor Prof. Dr. J. Buchmann as well as her colleagues Patrick Theobald and Christoph Thiel for helpful remarks and suggestions.
336
Susanne Wetzel
References 1. Biehl, I., Buchmann, J., and Papanikolaou, T.: LiDIA: A Library for Computational Number Theory. Technical Report 03/95, SFB 124, Universit¨ at des Saarlandes (1995). 2. Buchmann, J., and Kessler, V.: Computing a Reduced Lattice Basis from a Generating System. Preprint, Universit¨ at des Saarlandes, Saarbr¨ ucken (1992). 3. Cohen, H.: A Course in Computational Algebraic Number Theory. Second Edition, Springer Verlag Heidelberg (1993). 4. Coster, M.J., LaMacchia, B.A., Odlyzko, A.M., and Schnorr, C.P.: An Improved Low-density Subset Sum Algorithm. Proceedings EUROCRYPT ’91, Springer Lecture Notes in Computer Science LNCS 547, pp. 54–67 (1991). 5. Golub, G.H., and van Loan, C.F.: Matrix Computations. John Hopkins University Press Baltimore (1996). 6. Gr¨ otschel, M., Lov´ asz, L., and Schrijver, A.: Geometric Algorithms and Combinatorial Optimization. Second Edition, Springer Verlag Heidelberg (1993). 7. Heckler, C.: Automatische Parallelisierung und parallele Gitterbasisreduktion. PhD Thesis, Universit¨ at des Saarlandes, Saarbr¨ ucken, Germany (1995). 8. Heckler, C., and Thiele, L.: On the Time Complexity of Parallel Algorithms for Lattice Basis Reduction. Technical Report 05/93, SFB 124, Universit¨ at des Saarlandes (1995). 9. Heckler, C., and Thiele, L.: A Parallel Lattice Basis Reduction for MeshConnected Processor Arrays and Parallel Complexity. Proceedings SPDP ’93, pp. 400–407 (1993). 10. Heckler, C., and Thiele, L.: Parallel Complexity of Lattice Basis Reduction and a Floating-Point Parallel Algorithm. Proceedings PARLE ’93, Springer Lecture Notes in Computer Science LNCS 694, pp. 744–747 (1993). 11. Heckler, C., and Thiele, L.: Complexity Analysis of a Parallel Lattice Basis Reduction Algorithm. To appear in SIAM J. Comput. (1998). 12. Joux, A.: A Fast Parallel Lattice Reduction Algorithm. Proceedings Second Gauss Symposium, pp. 1–15 (1993). 13. Joux, A.: La R´eduction des R´eseaux en Cryptographie. PhD Thesis Laboratoire d’Informatique de L’Ecole Normale Superieure LIENS, Paris, France (1993). 14. Joux, A., and Stern, J.: Lattice Reduction: A Toolbox for the Cryptanalyst. Preprint (1994). 15. Kaltofen, E.: On the Complexity of Finding Short Vectors in Integer Lattices. Computer Algebra, Springer Lecture Notes in Computer Science LNCS 162, pp. 236–244 (1983). 16. Lagarias, J.C., and Odlyzko, A.M.: Solving Low-Density Subset Sum Problems. JACM 32, pp. 229–246 (1985). 17. LaMacchia, B.A.: Basis Reduction Algorithms and Subset Sum Problems. Master’s Thesis MIT, (1991). 18. Lenstra, A.K., Lenstra, H.W., and Lov´ asz, L.: Factoring Polynomials with Rational Coefficients. Math. Ann. 261, pp. 515–534 (1982). 19. Lenstra, H.W.: Integer Programming With a Fixed Number of Variables. Mathematics Operations Research, pp. 538–548 (1983). 20. LiDIA Group: LiDIA Manual. Universit¨ at des Saarlandes/TU Darmstadt, see LiDIA homepage: http://www.informatik.tu-darmstadt.de/TI/LiDIA (1997). 21. Papanikolaou, T.: Software-Entwicklung in der Computer-Algebra am Beispiel einer objektorientierten Bibliothek f¨ ur algorithmische Zahlentheorie. PhD Thesis, Universit¨ at des Saarlandes, Saarbr¨ ucken, Germany (1997). 22. Pohst, M.E.: A Modification of the LLL Reduction Algorithm. Journal of Symbolic Computation 4, pp. 123–127 (1987).
An Efficient Parallel Block-Reduction Algorithm
337
23. Pohst, M.E., and Zassenhaus, H.J.: Algorithmic Algebraic Number Theory. Cambridge University Press (1989). 24. Roch, J.L., and Villard, G.: Parallel Gcd and Lattice Basis Reduction. Proceedings CONPAR ’92, Springer Lecture Notes in Computer Science LNCS 634, pp. 557– 564 (1992). 25. Schnorr, C.P.: A More Efficient Algorithm for Lattice Basis Reduction. Journal of Algorithms 9, pp. 47–62 (1988). 26. Schnorr, C.P., and Euchner, M.: Lattice Basis Reduction: Improved Practical Algorithms and Solving Subset Sum Problems. Proceedings of Fundamentals of Computation Theory ’91, Springer Lecture Notes in Computer Science LNCS 529, pp. 68–85 (1991). 27. Sch¨ onhage, A.: Factorization of Univariate Integer Polynomials by Diophantine Approximation and an Improved Basis Reduction Algorithm. Proceedings ICALP ’84, Springer Lecture Notes in Computer Science LNCS 172, pp. 436–447 (1984). 28. Schrijver, A.: Theory of Linear and Integer Programming. J. Wiley & Sons, New York (1986). 29. Villard, G.: Parallel Lattice Basis Reduction. Proceedings ISSAC ’92, ACM Press, pp. 269–277 (1992). 30. Wetzel, S.: Lattice Basis Reduction Algorithms and their Applications. PhD Thesis, Universit¨ at des Saarlandes, Saarbr¨ ucken, Germany, submitted (1998).
Fast Multiprecision Evaluation of Series of Rational Numbers Bruno Haible1 and Thomas Papanikolaou2 1
2
ILOG, 9 rue de Verdun, F – 94253 Gentilly Cedex [email protected] Laboratoire A2X, 351 cours de la Lib´eration, F – 33 405 Talence Cedex [email protected]
Abstract. We describe two techniques for fast multiple-precision evaluation of linearly convergent series, including power series and Ramanujan series. The computation time for N bits is O((log N )2 M (N )), where M (N ) is the time needed to multiply two N -bit numbers. Applications include fast algorithms for elementary functions, π, hypergeometric functions at rational points, ζ(3), Euler’s, Catalan’s and Ap´ery’s constant. The algorithms are suitable for parallel computation.
1
Introduction
Multiple-precision evaluation of real numbers has become efficiently possible since Sch¨ onhage and Strassen [15] have showed that the bit complexity of the multiplication of two N -bit numbers is M (N ) = O(N log N log log N ). This is not only a theoretical result; a C++ implementation [8] can exploit this already for N = 40000 bits. Algorithms for computing elementary functions (exp, log, sin, cos, tan, asin, acos, atan, sinh, cosh, tanh, arsinh, arcosh, artanh) have appeared in [4], and a remarkable algorithm for π was found by Brent and Salamin [14]. However, all these algorithms suffer from the fact that calculated results are not reusable, since the computation is done using real arithmetic (using exact rational arithmetic would be extremely inefficient). Therefore functions or constants have to be recomputed from the scratch every time higher precision is required. In this note, we present algorithms for fast computation of sums of the form S=
∞ X
R(n)F (0) · · · F (n)
n=0
where R(n) and F (n) are rational functions in n with rational coefficients, provided that this sum is linearly convergent, i.e. that the n-th term is O(c−n ) with c > 1. Examples include elementary and hypergeometric functions at rational points in the interior of the circle of convergence, as well as π and Euler’s, Catalan’s and Ap´ery’s constants. J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 338–350, 1998. c Springer-Verlag Berlin Heidelberg 1998
Fast Multiprecision Evaluation of Series of Rational Numbers
339
The presented algorithms are easy to implement and extremely efficient, since they take advantage of pure integer arithmetic. The calculated results are exact, making checkpointing and reuse of computations possible. Finally, the computation of our algorithms can be easily parallelised.
2
Evaluation of Linearly Convergent Series
The technique presented here applies to all linearly convergent sums of the form S=
∞ X a(n) p(0) · · · p(n) b(n) q(0) · · · q(n) n=0
where a(n), b(n), p(n), q(n) are integers with O(log n) bits. The most often used case is that a(n), b(n), p(n), q(n) are polynomials in n with integer coefficients. Algorithm. Given two index bounds n1 and n2 , consider the partial sum S=
X n1 ≤n
a(n) p(n1 ) · · · p(n) b(n) q(n1 ) · · · q(n)
It is not computed directly. Instead, we compute the integers P = p(n1 ) · · · p(n2 − 1), Q = q(n1 ) · · · q(n2 − 1), B = b(n1 ) · · · b(n2 − 1) and T = BQS. If n2 − n1 < 5, these are computed directly. If n2 − n1 ≥ 5, they are computed using binary splitting: Choose an index nm in the middle of n1 and n2 , compute the components Pl , Ql , Bl , Tl belonging to the interval n1 ≤ n < nm , compute the components Pr , Qr , Br , Tr belonging to the interval nm ≤ n < n2 , and set P = Pl Pr , Q = Ql Qr , B = Bl Br and T = Br Qr Tl + Bl Pl Tr . Finally, this algorithm is applied to n1 = 0 and n2 = nmax = O(N ), and a T is performed. final floating-point division S = BQ Complexity. The bit complexity of computing S with N bits of precision is O((log N )2 M (N )). Proof. Since we have assumed the series to be linearly convergent, the n-th 2 + O(1) will ensure term is O(c−n ) with c > 1. Hence choosing nmax = N log log c −N that the round-off error is < 2 . By our assumption that a(n), b(n), p(n), q(n) are integers with O(log n) bits, the integers P , Q, B, T belonging to the interval n1 ≤ n < n2 all have O((n2 − n1 ) log n2 ) bits. nmax + O(1). At recursion depth The algorithm’s recursion depth is d = loglog 2 nmax k (1 ≤ k ≤ d), integers having each O( 2k log nmax ) bits are multiplied. Thus, the entire computation time t is
340
Bruno Haible and Thomas Papanikolaou
t=
d X k=1
=
d X
n max 2k−1 O M log n max 2k O (M (nmax log nmax ))
k=1
= O(log nmax M (nmax log nmax )) N ) and Because of nmax = O( log c
M
N N log log c log c
=O
1 N (log N )2 log log N log c
we have
t=O
=O
1 log N M (N ) log c
1 (log N )2 M (N ) log c
Considering c as constant, this is the desired result. Checkpointing / Parallelising. A checkpoint can be easily done by storing the (integer) values of n1 , n2 , P , Q, B and T . Similarly, if m processors are available, then the interval [0, nmax] can be divided into m pieces of length l = bnmax /mc. After each processor i has computed the sum of its interval [il, (i + 1)l], the partial sums are combined to the final result using the rules described above. Notes 1. For the special case a(n) = b(n) = 1, the binary splitting algorithm has already been documented in [3], section 6, and [2], section 10.2.3. 2. Explicit computation of P , Q, B, T is only required as a recursion base, for n2 − n1 < 2, but avoiding recursions for n2 − n1 < 5 gains some percent of execution speed. 3. The binary splitting algorithm is asymptotically faster than step-by-step evaluation of the sum – which has binary complexity O(N 2 ) – because it pushes as much multiplication work as possible to the region where multiplication becomes efficient. If the multiplication were implemented as an M (N ) = O(N 2 ) algorithm, the binary splitting algorithm would provide no speedup over step-by-step evaluation. 2.1
Example: The Factorial
This is the most classical example of the binary splitting algorithm and was probably known long before [2].
Fast Multiprecision Evaluation of Series of Rational Numbers
341
Computation of the factorial is best done using the binary splitting algorithm, combined with a reduction of the even factors into odd factors and multiplication with a power of 2, according to the formula k Y Y (2m + 1) n! = 2n−σ2 (n) · k≥1
n 2k
<2m+1≤
n 2k−1
and where the products Y
P (n1 , n2 ) =
(2m + 1)
n1 <m≤n2
are evaluated according to the binary splitting algorithm: 2 if n2 − n1 ≥ 5. P (n1 , n2 ) = P (n1 , nm )P (nm , n2 ) with nm = n1 +n 2 2.2
Example: Elementary Functions at Rational Points
The binary splitting algorithm can be applied to the fast computation of the elementary functions at rational points x = uv , simply by using the power series. We present how this can be done for exp(x), ln(x), sin(x), cos(x), arctan(x), sinh(x) and cosh(x). The calculation of other elementary functions is similar (or it can be reduced to the calculation of these functions). exp(x) for Rational x. This is a direct application of the above algorithm with a(n) = 1, b(n) = 1, p(0) = q(0) = 1, and p(n) = u, q(n) = nv for n > 0. Because the series is not only linearly convergent – exp(x)is an entire function N –, nmax = O( log N+log
1 |x|
), hence the bit complexity is O
(log N)2 1 log N+log |x|
M (N ) .
Considering x as constant, this is O(log N M (N )). exp(x) for Real x. This can be computed using the addition theorem for exp, by a trick due to Brent [3] (see also [2], section 10.2, exercise 8). Write x = x0 +
∞ X uk k=0
k
with x0 integer, vk = 22 and |uk | < 22
vk
k−1
, and compute Y uk exp exp(x) = exp(x0 ) · vk k≥0
This algorithm has bit complexity O(log N) 2 X (log N ) M (N ) = O((log N )2 M (N )) O log N + 2k k=0
342
Bruno Haible and Thomas Papanikolaou
ln(x) for Rational x. For rational |x − 1| < 1, the binary splitting algorithm can also be applied directly to the power series for ln(x). Write x − 1 = uv and compute the series with a(n) = 1, b(n) = n + 1, q(n) = v, p(0) = u, and p(n) = −u for n > 0. This algorithm has bit complexity O((log N )2 M (N )). ln(x) for Real x. This can be computed using the “inverse” Brent trick: Start with y := 0. As long as x 6= 1within the actual precision, choose k maximal with |x −1| < 2−k . Put z = 2−2k 22k (x − 1) , i.e. let z contain the first k significant bits of x − 1. z is a good approximation for ln(x). Set y := y + z and x := x · exp(−z). Since x · exp(y) is an invariant of the algorithm, the final y is the desired value ln(x). This algorithm has bit complexity O(log N) 2 X (log N ) M (N ) = O((log N )2 M (N )) O log N + 2k k=0
sin(x), cos(x) for Rational x. These are direct applications of the binary splitting algorithm: For sin(x), put a(n) = 1, b(n) = 1, p(0) = u, q(0) = v, and p(n) = −u2 , q(n) = (2n)(2n + 1)v2 for n > 0. For cos(x), put a(n) = 1, b(n) = 1, p(0) = 1, q(0) = 1, and p(n) = −u2 , q(n) = (2n − 1)(2n)v2 for n > 0. Of course, when both sin(x) and cos(x)p are needed, one should only compute sin(x) this way, and then set cos(x) = ± 1 − sin(x)2 . This is a 20% speedup at least. The bit complexity of these algorithms is O(log N M (N )). sin(x), cos(x) for Real x. To compute cos(x) + i sin(x) = exp(ix) for real x, again the addition theorems and Brent’s trick can be used. The resulting algorithm has bit complexity O((log N )2 M (N )). arctan(x) for Rational x. For rational |x| < 1, the fastest way to compute arctan(x) with bit complexity O((log N )2 M (N )) is to apply the binary splitting algorithm directly to the power series for arctan(x). Put a(n) = 1, b(n) = 2n+1, q(n) = 1, p(0) = x and p(n) = −x2 for n > 0. arctan(x) for Real x. This again can be computed using the “inverse” Brent trick: 1 x + i √1+x and ϕ := 0. During the algorithm z will Start out with z := √1+x 2 2 be a complex number with |z| = 1 and Re(z) > 0. As long as Im(z) 6= 0 within the actual precision, choose k maximal with | Im(z)| < 2−k . Put α = 2−2k 22k Im(z) , i.e. let α contain the first k significant bits of Im(z). α is a good approximation for arcsin(Im(z)). Set ϕ := ϕ + α and z := z · exp(−iα).
Fast Multiprecision Evaluation of Series of Rational Numbers
343
Since z · exp(iϕ) is an invariant of the algorithm, the final ϕ is the desired x . value arcsin √1+x 2 This algorithm has bit complexity O(log N) 2 X (log N ) M (N ) = O((log N )2 M (N )) O log N + 2k k=0
. sinh(x), cosh(x) for Rational and Real x. These can be computed by similar algorithms as sin(x) and cos(x) above, with the same asymptotic bit complexity. The standard computation, using exp(x) and its reciprocal (calculated by the Newton method) results also to the same complexity and works equally well in practice. The bit complexity of these algorithms is O(log N M (N )) for rational x and O((log N )2 M (N )) for real x. 2.3
Example: Hypergeometric Functions at Rational Points
The binary splitting algorithm is well suited for the evaluation of a hypergeometric series X ∞ an1 · · · anr n a1 , . . . , ar x x = F b 1 , . . . , bs bn1 · · · bns n=0
with rational coefficients a1 , ..., ar , b1 , ..., bs at a rational point x in the interior of the circle of convergence. Just put a(n) = 1, b(n) = 1, p(0) = q(0) = 1, and p(n) r +n−1)x = (a(b11+n−1)···(a for n > 0. The evaluation can thus be done with bit q(n) +n−1)···(bs +n−1) complexity O((log N )2 M (N )) for r = s and O(log N M (N )) for r < s. 2.4
Example: π
The Ramanujan series for π ∞ 12 X (−1)n (6n)!(A + nB) 1 = 3/2 π (3n)!n!3C 3n C n=0
with A = 13591409, B = 545140134, C = 640320 found by the Chudnovsky’s 1 and which is used by the LiDIA [10, 11, 9] and the Pari [7] system to compute π, is usually written as an algorithm of bit complexity O(N 2 ). It is, however, possible to apply binary splitting to the sum. Put a(n) = A + nB, b(n) = 1, p(0) = 1, q(0) = 1, and p(n) = −(6n − 5)(2n − 1)(6n − 1), q(n) = n3 C 3 /24 for n > 0. This reduces the complexity to O((log N )2 M (N )). Although this is theoretically slower than Brent-Salamin’s quadratically convergent iteration, which has a bit complexity of O(log N M (N )), in practice the binary splitted Ramanujan sum is three times faster than Brent-Salamin, at least in the range from N = 1000 bits to N = 1000000 bits. 1
A special case of [2], formula (5.5.18), with N=163.
344
Bruno Haible and Thomas Papanikolaou
Example: Catalan’s Constant G
2.5
A linearly convergent sum for Catalan’s constant G :=
∞ X
(−1)n (2n + 1)2 n=0
is given in [2], p. 386: G=
∞ 3X 8 n=0
√ π 1 + log(2 + 3) 2 8 (2n + 1)
2n n
The series is summed using binary splitting, putting a(n) = 1, b(n) = 2n + 1, p(0) = 1, q(0) = 1, and p(n) = n, q(n) = 2(2n + 1) for n > 0. Thus G can be computed with bit complexity O((log N )2 M (N )). 2.6
Example: The Gamma Function at Rational Points
For evaluating Γ (s) for rational s, we first reduce s to the range 1 ≤ s ≤ 2 by the formula Γ (s + 1) = sΓ (s). To compute Γ (s) with a precision of N bits, choose a positive integer x with xe−x < 2−N . Partial integration lets us write Z
∞
Γ (s) =
e−t ts−1 dt
0
= xs e−x
∞ X n=0
xn + s(s + 1) · · · (s + n)
Z
∞
e−t ts−1 dt
x
The last integral is < xe−x < 2−N . The series is evaluated as a hypergeometric function (see above); the number of terms to be summed up is O(N ), since x = O(N ). Thus the entire computation can be done with bit complexity O((log N )2 M (N )). Note 1. This result is already mentioned in [4]. 2. For Γ (s) there is no checkpointing possible because of the dependency on x in the binary splitting. 2.7
Example: The Riemann Zeta Value ζ(3)
Recently, Doron Zeilberger’s method of “creative telescoping” has been applied to Riemann’s zeta function at s = 3 (see [1]), which is also known as Ap´ery’s constant:
Fast Multiprecision Evaluation of Series of Rational Numbers
ζ(3) =
345
∞ 1 X (−1)n−1 (205n2 − 160n + 32) 5 2 n=1 n5 2n n
This sum consists of three hypergeometric series. Binary splitting can also be applied directly, by putting a(n) = 205n2 + 250n + 77, b(n) = 1, p(0) = 1, p(n) = −n5 for n > 0, and q(n) = 32(2n + 1)5 . Thus the bit complexity of computing ζ(3) is O((log N )2 M (N )).
3
Evaluation of Linearly Convergent Series of Sums
The technique presented in the previous section also applies to all linearly convergent sums of the form ∞ X c(n) p(0) · · · p(n) a(n) c(0) +···+ U = b(n) d(0) d(n) q(0) · · · q(n) n=0 where a(n), b(n), c(n), d(n), p(n), q(n) are integers with O(log n) bits. The most often used case is again that a(n), b(n), c(n), d(n), p(n), q(n) are polynomials in n with integer coefficients. Algorithm. Given two index bounds n1 and n2 , consider the partial sums X
S=
n1 ≤n
and U=
X n1 ≤n
a(n) b(n)
a(n) p(n1 ) · · · p(n) b(n) q(n1 ) · · · q(n)
c(n1 ) c(n) + ···+ d(n1 ) d(n)
p(n1 ) · · · p(n) q(n1 ) · · · q(n)
As above, we compute the integers P = p(n1 ) · · · p(n2 − 1), Q = q(n1 ) · · · q(n2 − 1), B = b(n1 ) · · · b(n2 − 1), T = BQS, D = d(n1 ) · · · d(n2 − 1), C = c(n1) c(n2−1) + · · · + d(n and V = DBQU . If n2 − n1 < 4, these are computed D d(n 1) 2 −1) directly. If n2 − n1 ≥ 4, they are computed using binary splitting: Choose an index nm in the middle of n1 and n2 , compute the components Pl , Ql , Bl , Tl , Dl , Cl , Vl belonging to the interval n1 ≤ n < nm , compute the components Pr , Qr , Br , Tr , Dr , Cr , Vr belonging to the interval nm ≤ n < n2 , and set P = Pl Pr , Q = Ql Qr , B = Bl Br , T = Br Qr Tl + Bl Pl Tr , D = Dl Dr , C = Cl Dr + Cr Dl and V = Dr Br Qr Vl + Dr Cl Bl Pl Tr + Dl Bl Pl Vr . Finally, this algorithm is applied to n1 = 0 and n2 = nmax = O(N ), and final T V and U = DBQ are performed. floating-point divisions S = BQ Complexity. The bit complexity of computing S and U with N bits of precision is O((log N )2 M (N )).
Proof. By our assumption that a(n), b(n), c(n), d(n), p(n), q(n) are integers with O(log n) bits, the integers P, Q, B, T, D, C, V belonging to the interval n1 ≤ n < n2 all have O((n2 − n1) log n2) bits. The rest of the proof is as in the previous section.
Checkpointing / Parallelising. A checkpoint can easily be done by storing the (integer) values of n1, n2, P, Q, B, T and additionally D, C, V. Similarly, if m processors are available, then the interval [0, nmax] can be divided into m pieces of length l = ⌊nmax/m⌋. After each processor i has computed the sum of its interval [il, (i + 1)l], the partial sums are combined into the final result using the rules described above.
3.1 Example: Euler's Constant C
Theorem. Let f(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!^2} and g(x) = \sum_{n=0}^{\infty} H_n \frac{x^n}{n!^2}. Then, for x → ∞,

\frac{g(x)}{f(x)} = \frac{1}{2}\log x + C + O\left(e^{-4\sqrt{x}}\right).
Proof. The Laplace method for asymptotic evaluation of exponentially growing sums and integrals yields

f(x) = e^{2\sqrt{x}}\, x^{-1/4}\, \frac{1}{2\sqrt{\pi}} \left(1 + O(x^{-1/4})\right)

and

g(x) = e^{2\sqrt{x}}\, x^{-1/4}\, \frac{1}{2\sqrt{\pi}} \left( \frac{1}{2}\log x + C + O(\log x \cdot x^{-1/4}) \right).

On the other hand, h(x) := g(x)/f(x) satisfies the differential equation

x f(x)\, h''(x) + \left(2x f'(x) + f(x)\right) h'(x) = f'(x),

hence

h(x) = \frac{1}{2}\log x + C + c_2 \int_x^{\infty} \frac{dt}{t f(t)^2} = \frac{1}{2}\log x + C + O\left(e^{-4\sqrt{x}}\right).
Algorithm. To compute C with a precision of N bits, set x = ⌈(N + 2) log 2 / 4⌉², and evaluate the series for g(x) and f(x) simultaneously, using the binary-splitting algorithm, with a(n) = 1, b(n) = 1, c(n) = 1, d(n) = n + 1, p(n) = x, q(n) = (n + 1)². Let α = 3.591121477... be the solution of the equation α log α = α + 1. Then α√x − (log x)/(4 log α) + O(1) terms of the series suffice for the relative error to be bounded by 2^{−N}.
Complexity. The bit complexity of this algorithm is O((log N)² M(N)).
Note 1. This algorithm was first mentioned in [6]. It is by far the fastest known algorithm for computing Euler's constant.
2. For Euler's constant no checkpointing is possible, because of the dependency on x in the binary splitting.
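To illustrate the seven-component recursion of this section with the parameters of 3.1, here is a hedged Python sketch (our own, not the CLN/LiDIA code); the choice of x, the term count, and the use of mpmath for the final floating-point step are illustrative assumptions.

```python
from mpmath import mp, mpf, log

def binsplit(n1, n2, x):
    """(P, Q, B, T, D, C, V) over [n1, n2) for a(n)=b(n)=c(n)=1,
    d(n)=n+1, p(n)=x, q(n)=(n+1)^2, combined as described above."""
    if n2 - n1 == 1:
        n = n1
        p, q, d = x, (n + 1)**2, n + 1
        return p, q, 1, p, d, 1, p        # T = a*p, C = c, V = a*c*p
    m = (n1 + n2) // 2
    Pl, Ql, Bl, Tl, Dl, Cl, Vl = binsplit(n1, m, x)
    Pr, Qr, Br, Tr, Dr, Cr, Vr = binsplit(m, n2, x)
    return (Pl*Pr, Ql*Qr, Bl*Br,
            Br*Qr*Tl + Bl*Pl*Tr,
            Dl*Dr, Cl*Dr + Cr*Dl,
            Dr*Br*Qr*Vl + Dr*Cl*Bl*Pl*Tr + Dl*Bl*Pl*Vr)

def euler_constant(digits):
    mp.dps = digits + 10
    r = int(0.58 * digits) + 2            # want e^(-4*sqrt(x)) < 10^(-digits)
    x = r * r
    nterms = int(3.6 * r) + 10            # ~ alpha*sqrt(x) terms, alpha = 3.5911...
    P, Q, B, T, D, C, V = binsplit(0, nterms, x)
    S = mpf(T) / (mpf(B) * mpf(Q))        # f(x) = 1 + S
    U = mpf(V) / (mpf(D) * mpf(B) * mpf(Q))   # g(x) = U
    return U / (1 + S) - log(x) / 2

print(euler_constant(50))   # 0.5772156649015328606065120900824...
```

Because the whole tail of the recursion depends on x, increasing the target precision forces a recomputation from scratch; this is exactly the checkpointing limitation stated in Note 2.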
4 Computational Results
In this section we present some computational results of our CLN and LiDIA implementations of the algorithms presented in this note. We use the official version (1.3) and an experimental version (1.4a) of LiDIA. We have taken advantage of LiDIA's ability to replace its kernel (multiprecision arithmetic and memory management) [10, 11, 9], so we were able to use in both cases CLN's fast integer arithmetic routines.
4.1 Timings
Table 1 shows the running times for the calculation of exp(1), log(2), π, C, G and ζ(3) to precisions of 100, 1000, 10000 and 100000 decimal digits. The timings are given in seconds and denote the real time needed, i.e. system plus user time. The computation was done on an Intel Pentium with 133 MHz and 32 MB of RAM.

Table 1. LiDIA-1.4a timings of the computation of constants using binary splitting

D       exp(1)   log(2)   π        C        G        ζ(3)
10^2    0.0005   0.0020   0.0014   0.0309   0.0179   0.0027
10^3    0.0069   0.0474   0.0141   0.8110   0.3580   0.0696
10^4    0.2566   1.9100   0.6750   33.190   13.370   2.5600
10^5    5.5549   45.640   17.430   784.93   340.33   72.970
The second table summarizes the performance of exp(x) in various computer algebra systems². For a fair comparison of the algorithms, both the argument and the precision are chosen in such a way that system-specific optimizations (BCD arithmetic in Maple, FFT multiplication in CLN, special exact argument handling in LiDIA) do not work. We use x = −√2 and precision 10^{i/3} decimal digits, with i running from 4 to 15.
² We do not list the timings of LiDIA-1.4a since these are comparable to those of CLN.
Bruno Haible and Thomas Papanikolaou √ Table 2. Timings of computation of exp(− 2) D 21 46 100 215 464 1000 2154 4641 10000 21544 46415 100000
Maple 0.00090 0.00250 0.01000 0.03100 0.11000 0.4000 1.7190 8.121 39.340 172.499 868.841 4873.829
Pari 0.00047 0.00065 0.00160 0.00530 0.02500 0.2940 0.8980 5.941 39.776 280.207 1972.184 21369.197
LiDIA-1.3 CLN 0.00191 0.00075 0.00239 0.00109 0.00389 0.00239 0.00750 0.00690 0.02050 0.02991 0.0704 0.0861 0.2990 0.2527 1.510 0.906 7.360 4.059 39.900 15.010 129.000 39.848 437.000 106.990
Maple V R3 is the slowest system in this comparison. This is probably due to the BCD arithmetic it uses. However, Maple seems to have an asymptotically better algorithm for exp(x) for numbers having more than 10000 decimals. In this range it outperforms Pari-1.39.03, which is the fastest system in the 0–200 decimals range. The comparison indicating the strength of binary splitting is between LiDIA-1.3 and CLN itself. Having the same kernel, the only difference here is that LiDIA-1.3 uses Brent's O(√n M(n)) method for exp(x), whereas CLN changes from Brent's method to a binary-splitting version for large numbers. As expected, in the range of 1000–100000 decimals CLN outperforms LiDIA-1.3 by far. The fact that LiDIA-1.3 is faster in the range of 200–1000 decimals (also for some trigonometric functions) is probably due to a better optimized O(√n M(n)) method for exp(x).
4.2 Distributed Computing of ζ(3)
Using the method described in 2.7, the authors were the first to compute 1,000,000 decimals of ζ(3) [13]. The computation took 8 hours on a Hewlett Packard 9000/712 machine. After distributing it over a cluster of 4 HP 9000/712 machines, the same computation required only 2.5 hours; the extra half hour was needed for reading the partial results from disk and for recombining them. Again, we used binary splitting for the recombination: the 4 partial results produced 2 intermediate results, which were combined into the final 1,000,000-decimal value of ζ(3). This example shows the importance of checkpointing. Even if a machine crashes during the calculation, the results of the other machines are still usable. Additionally, being able to parallelise the computation reduced the computing time dramatically.
4.3 Euler's Constant C
We have implemented a version of Brent's and McMillan's algorithm [6] and a version accelerated by binary splitting as shown in 3.1. The computation of C was done twice on a SPARC Ultra machine with 167 MHz and 256 MB of RAM. The first computation, using the non-accelerated version, required 160 hours. The result of this computation was then verified by the binary-splitting version in (only) 14 hours. The first 475006 partial quotients of the continued fraction of C were computed on an Intel Pentium with 133 MHz and 32 MB of RAM in 3 hours, using a program by H. te Riele based on [5], which was translated to LiDIA for efficiency reasons. Computing the 475006th convergent produced the following improved theorem: if C is a rational number, C = p/q, then |q| > 10^{244663}. Details of this computation (including statistics on the partial quotients) can be found in [12].
5 Conclusions
Although powerful, the binary splitting method has not been widely used; in particular, no information existed on the applicability of this method. In this note we presented a generic binary-splitting summation device for evaluating two types of linearly convergent series. From this we derived simple and computationally efficient algorithms for the evaluation of elementary functions and constants. These algorithms work with exact objects, making them suitable for use within computer algebra systems. We have shown that the practical performance of our algorithms is superior to current system implementations. In addition to existing methods, our algorithms provide the possibility of checkpointing and parallelising. These features can be useful for huge calculations, such as those done in analytic number theory research.
6 Thanks
The authors would like to thank Jörg Arndt for pointing us to chapter 10 in [2]. We would also like to thank Richard P. Brent for his comments and Hermann te Riele for providing us with his program for the continued fraction computation of Euler's constant.
References
1. Theodor Amdeberhan and Doron Zeilberger: Acceleration of hypergeometric series via the WZ method. Electronic Journal of Combinatorics, Wilf Festschrift Volume (to appear)
2. Jonathan M. Borwein and Peter B. Borwein: Pi and the AGM, Wiley (1987)
3. Richard P. Brent: The complexity of multiple-precision arithmetic. Complexity of Computational Problem Solving (1976)
4. Richard P. Brent: Fast multiple-precision evaluation of elementary functions. Journal of the ACM 23 (1976), 242–251
5. Richard P. Brent, Alf van der Poorten and Hermann te Riele: A comparative study of algorithms for computing continued fractions of algebraic numbers. In H. Cohen (editor), Algorithmic Number Theory: Second International Symposium, ANTS-II (1996), Springer Verlag, 37–49
6. Richard P. Brent and Edwin M. McMillan: Some new algorithms for high-precision computation of Euler's constant. Mathematics of Computation 34 (1980), 305–312
7. Henri Cohen, C. Batut, D. Bernardi, and M. Olivier: GP/PARI calculator – version 1.39.03. Available via anonymous FTP from ftp://megrez.math.u-bordeaux.fr (1995)
8. Bruno Haible: CLN, a class library for numbers. Available via anonymous FTP from ftp://ftp.santafe.edu/pub/gnu/cln.tar.gz (1996)
9. LiDIA Group: LiDIA 1.3 – a library for computational number theory. Available from ftp://ftp.informatik.tu-darmstadt.de/pub/TI/systems/LiDIA or via WWW from http://www.informatik.tu-darmstadt.de/TI/LiDIA, Technische Universität Darmstadt (1997)
10. Thomas Papanikolaou, Ingrid Biehl, and Johannes Buchmann: LiDIA: a library for computational number theory. SFB 124 report, Universität des Saarlandes (1995)
11. Thomas Papanikolaou: Entwurf und Entwicklung einer objektorientierten Bibliothek für algorithmische Zahlentheorie. PhD thesis, Universität des Saarlandes (1997)
12. Thomas Papanikolaou: Homepage, http://www.math.u-bordeaux.fr/~papanik
13. Simon Plouffe: ISC: Inverse Symbolic Calculator. Tables of records of computation, http://www.cecm.sfu.ca/projects/ISC/records2.html
14. Eugene Salamin: Computation of π using arithmetic-geometric mean. Mathematics of Computation 30 (1976), 565–570
15. Schönhage, A., and Strassen, V.: Schnelle Multiplikation großer Zahlen. Computing 7 (1971), 281–292
A Problem Concerning a Character Sum (Extended Abstract)*
E. Teske¹ and H.C. Williams**²
¹ Technische Universität Darmstadt, Institut für Theoretische Informatik, Alexanderstraße 10, 64283 Darmstadt, Germany
² University of Manitoba, Dept. of Computer Science, Winnipeg, MB, Canada R3T 2N2
Abstract. Let p be a prime congruent to −1 modulo 4, \left(\frac{n}{p}\right) the Legendre symbol and S(k) = \sum_{n=1}^{p-1} n^k \left(\frac{n}{p}\right). The problem of finding a prime p such that S(3) > 0 was one of the motivating forces behind the development of several of Shanks' ideas for computing in algebraic number fields, although neither he nor D. H. and Emma Lehmer were ever successful in finding such a p. In this extended abstract we summarize some techniques which were successful in producing, for each k such that 3 ≤ k ≤ 2000, a value for p such that S(k) > 0.
1 Introduction
Let d denote a fundamental discriminant of an imaginary quadratic field K = Q(√d) and let h(d) denote the class number of K. Let p be a prime (≡ 3 (mod 4)), \left(\frac{n}{p}\right) the Legendre symbol, and

S(k) = \sum_{n=1}^{p-1} n^k \left(\frac{n}{p}\right).    (1)
Ayoub, Chowla and Walum [1] showed that while S(1) = −p h(−p), S(2) = −p² h(−p) and S(k) < 0 whenever k ≥ p − 2, we nevertheless have S(3) > 0 infinitely often. It is illustrated in [12] that the problem of finding a prime p (≡ 3 (mod 4)) for which S(3) > 0 is connected with the problem of producing a small value for the ratio λ(p) = h(−p)/√p. Namely, if p (≡ 3 (mod 4)) is a prime such that \left(\frac{p}{q}\right) = −1 for all primes q ≤ 41 and S(3) > 0, then λ(p) < .041. A collection of correspondence between Shanks
* The full version of this paper is to appear in Experimental Mathematics.
** Research supported by NSERC of Canada grant A7649.
and D. H. and Emma Lehmer covering the period between 1968–71 shows that these two problems were the focus of Shanks' and the Lehmers' investigations; it was during this period that Shanks was very active in formulating techniques that would be of very great significance to the development of computational algebraic number theory. Despite a concerted effort made by the Lehmers and Shanks, they never found a value of p for which either λ(p) < .041 or S(3) > 0. They did, however, find several values of p for which S(4) > 0 [6], and noted that S(5), S(6) > 0 for p = 163. Later Fine [4] proved the following result.
Theorem 1. For each real k > 2 there are infinitely many primes p ≡ 3 (mod 4) for which S(k) > 0 and infinitely many for which S(k) < 0.
Unfortunately, Fine's method is not easily adapted to the problem of finding values for p such that S(k) > 0. The purpose of this paper is to show how to find such values of p for small integer values of k. Our initial objective was to discover values of p such that S(k) > 0 for 3 < k ≤ 50, but we were somewhat surprised to learn that we could extend our methods to do this for all 3 < k ≤ 2000. We also exhibit a value of p for which S(3) > 0 and λ(p) < .041 under the ERH. As usual, we define the Dirichlet L-function by

L(s, \chi) = \sum_{n=1}^{\infty} n^{-s} \chi(n).
Also, if χ(n) is the Kronecker symbol \left(\frac{d}{n}\right), then the analytic class number formula for K = Q(√d) asserts that

2\pi h(d) / \left(w \sqrt{|d|}\right) = L(1, \chi),    (2)

where w is the number of roots of unity in K (w = 2 if |d| > 4). When d = −p ≡ 1 (mod 4), then χ(n) = \left(\frac{d}{n}\right) = \left(\frac{n}{p}\right).
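The quantities above are easy to experiment with; the following small Python sketch (ours, for illustration only) evaluates S(k) directly from the definition via Euler's criterion, for example with p = 163, where the text below notes that S(5) and S(6) are positive.

```python
def legendre(n, p):
    """Legendre symbol (n/p) for an odd prime p, via Euler's criterion."""
    r = pow(n, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def S(k, p):
    """The character sum S(k) = sum_{n=1}^{p-1} n^k (n/p)."""
    return sum(n**k * legendre(n, p) for n in range(1, p))

p = 163
for k in range(1, 8):
    s = S(k, p)
    print(k, s, s > 0)
# S(1) = -p*h(-p) should be negative, while S(5) and S(6) come out positive.
```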
2 Our Initial Strategy
In [12] it is shown that

S(k) = \frac{-p^k \sqrt{p}}{\pi} \sum_{i=0}^{\lfloor (k-1)/2 \rfloor} \frac{(2i)!\, k^{2i}}{(2\pi)^{2i}} (-1)^i L(2i+1, \chi),

where χ(m) = \left(\frac{m}{p}\right) = \left(\frac{-p}{m}\right). In order to get S(k) > 0, we need

L(1, \chi) < A(k, \chi),    (3)

where we define

A(k, \chi) = \sum_{i=1}^{\lfloor (k-1)/2 \rfloor} \frac{(2i)!\, k^{2i}}{(2\pi)^{2i}} (-1)^{i+1} L(2i+1, \chi).
Our first strategy was to find p such that χ(q) = −1 for as many small primes q as possible. Suppose that χ(q) = −1 for all primes q ≤ Q; then from the Euler product representation of L(s, χ) we have L(s, χ) = F_s(Q) T_s(Q, χ), where

F_s(Q) = \prod_{q \le Q} \frac{q^s}{q^s + 1}, \qquad T_s(Q, \chi) = \prod_{q > Q} \frac{q^s}{q^s - \chi(q)}.
For T_s(Q, χ) we have the estimate (cf. [12])

(-1)^j T_s(Q, \chi) > (-1)^j - 3 Q^{-s+1} / \log Q,    (4)

for any j ∈ Z as long as s ≥ 3 and Q ≥ 90. From (4) it follows that

A(k, \chi) = \sum_{i=1}^{\lfloor (k-1)/2 \rfloor} \frac{(2i)!\, k^{2i}}{(2\pi)^{2i}} (-1)^{i+1} F_{2i+1}(Q)\, T_{2i+1}(Q, \chi) > B(k, Q),

where

B(k, Q) = \sum_{i=1}^{\lfloor (k-1)/2 \rfloor} \frac{(2i)!\, k^{2i}}{(2\pi)^{2i}} (-1)^{i+1} F_{2i+1}(Q) - \frac{3}{\log Q} \sum_{i=1}^{\lfloor (k-1)/2 \rfloor} \frac{(2i)!\, k^{2i}}{(2Q\pi)^{2i}} F_{2i+1}(Q).
Thus, if L(1, χ) < B(k, Q), then S(k) > 0. In order to find values of p ≡ 3 (mod 4) such that χ(q) = −1 for all q ≤ Q we made use of the number sieve MSSU (see Lukes et al. [8] or [9]). Jacobson [5], p. 128, found the number N257 = 7961860547428719787, which is the least positive prime p satisfying p ≡ 3 (mod 8) and \left(\frac{-p}{q}\right) = −1 for all odd primes q ≤ 257. Indeed, we even have \left(\frac{-N_{257}}{q}\right) = −1 for all odd primes q ≤ 269; also h(−N257) = 140879803 and L(1, χ) = .156852. This means that for p = N257 we have S(k) > 0 if .156852 < B(k, Q), with some Q between 90 and 270. We next computed a table of values for B(k, Q) for Q = 270; because of the growth rate of the terms of B(k, Q), we computed it to 800 digits of precision. We found that B(k, Q) > .156855 for all k such that 4 ≤ k ≤ 142, which implies that S(k) > 0 for these values of k when p = N257.
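As a quick consistency check of the quoted properties of N257 (our own sketch; the sieve computation itself is not reproduced), one can verify the congruence and character conditions directly:

```python
def legendre(a, q):
    """Legendre symbol (a/q) for an odd prime q."""
    r = pow(a % q, (q - 1) // 2, q)
    return -1 if r == q - 1 else r

def odd_primes_up_to(limit):
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(limit**0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = [False] * len(sieve[i*i::i])
    return [q for q in range(3, limit + 1, 2) if sieve[q]]

N257 = 7961860547428719787
print(N257 % 8 == 3)                                          # p = 3 (mod 8)
print(all(legendre(-N257, q) == -1 for q in odd_primes_up_to(269)))
```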
3 A Second Approach
The idea behind our second strategy for finding values of p such that S(k) > 0 is to allow for a greater degree of freedom than that afforded by insisting that χ(q) = −1 for all primes q ≤ Q. To this end we define F_s(Q, χ) by

F_s(Q, \chi) = \prod_{q \le Q} \frac{q^s}{q^s - \chi(q)}

and

B(k, Q, \chi) = \sum_{i=1}^{\lfloor (k-1)/2 \rfloor} \frac{(2i)!\, k^{2i}}{(2\pi)^{2i}} (-1)^{i+1} F_{2i+1}(Q, \chi) - \frac{3}{\log Q} \sum_{i=1}^{\lfloor (k-1)/2 \rfloor} \frac{(2i)!\, k^{2i}}{(2Q\pi)^{2i}} F_{2i+1}(Q, \chi).
By using the same reasoning as before, we see that S(k) > 0 if L(1, χ) < B(k, Q, χ), or

T_1(Q, \chi) < B(k, Q, \chi) / F_1(Q, \chi).    (5)

If we define

G_s(Q, \chi) = F_s(Q, \chi) / F_1(Q, \chi) = \prod_{q \le Q} \frac{q^{s-1}(q - \chi(q))}{q^s - \chi(q)}

and

C(k, Q, \chi) = \sum_{i=1}^{\lfloor (k-1)/2 \rfloor} \frac{(2i)!\, k^{2i}}{(2\pi)^{2i}} (-1)^{i+1} G_{2i+1}(Q, \chi) - \frac{3}{\log Q} \sum_{i=1}^{\lfloor (k-1)/2 \rfloor} \frac{(2i)!\, k^{2i}}{(2Q\pi)^{2i}} G_{2i+1}(Q, \chi),
then by (5) we see that S(k) > 0 if

T_1(Q, \chi) = L(1, \chi) / F_1(Q, \chi) < C(k, Q, \chi).    (6)
Now a result of Elliott (see [10]) implies that for z between 0 and 2 it is very likely that T1 (Q, χ) < 1 + z, especially if z is small. This suggests that if k, Q, p and z are chosen such that C(k, Q, χ) ≥ 1 + z, the chance that T1 (Q, χ) < C(k, Q, χ) is very good. For example, if p is selected such that χ(q) = +1 for q = 2, 3, 5 and χ(q) = −1 for 7 ≤ q < Q = 220, then C(k, Q, χ) > 1.011 for 26 < k ≤ 800. (This was determined by computing C(k, Q, χ) to 2000 digits of precision.) But, by using the MSSU we found that p = 2754235520364791
(7)
satisfies the conditions above and h(−p) = 25834697; hence, since F1 (220, χ) = 1.52969893, we get T1 (220, χ) = L(1, χ)/F1 (220, χ) = 1.01098973 < 1.011 .
Thus, for p given by (7) we have S(k) > 0 for 18 ≤ k ≤ 800. We also found it useful to make Q in (6) much larger than the limit to which we can sieve with the MSSU. This is because if Q∗ denotes the upper bound on the prime moduli used by MSSU, then T1 (Q∗ , χ) will likely not differ very much from T1 (Q, χ) when Q is much larger than Q∗ . Since we found that for k and χ fixed, C(k, Q, χ) grows with Q, it is likely that C(k, Q, χ) > T1 (Q, χ) for a larger interval of values of k. For example, if we put Q∗ = 230 and Q = 1000 and specify that χ(q) = 1 for q = 2, 3, 5, 7 and χ(q) = −1 for all the remaining q ≤ Q∗ , then p = 164093214527675999 (8) satisfies our conditions on χ(q) for q ≤ Q∗ . For this value of p we get h(−p) = 263229907; hence, T1 (Q, χ) = L(1, χ)/F1 (Q, χ) = 1.01102065. On tabulating C(k, Q, χ) for the χ values produced by p and Q = 1000, we found that if 29 ≤ k ≤ 35, then C(k, Q, χ) > 1.0128 and C(k, Q, χ) > 1.085 for 35 ≤ k ≤ 2000. Thus, since 1.085 > 1.011021, we see that for p given by (8) we get S(k) > 0 for 29 ≤ k ≤ 2000. That the value 1.085 is quite a lot larger than 1.011 suggests that if we had tabulated C(k, Q, χ) even further, we would likely have produced an even larger value for k such that S(k) > 0; however, at this point the computation of the C(k, Q, χ) values was very expensive because we were using 6000 digits of precision.
4 The Problem of S(3)
In [6] it is shown that if L(1, χ) < ζ(6)/(4ζ(2)ζ(3)) = .12863 ,
(9)
then S(3) > 0. The value F_1(1283) = .12854204 is already less than the bound on L(1, χ) required by (9). But the least prime p satisfying p ≡ 3 (mod 8) and \left(\frac{-p}{q}\right) = −1 for all odd primes q ≤ 1283 might have as many as 54 digits [12], a number far too large for any current sieve device to find. There is, however, thanks to a recent result of Bach [2], another way to find a candidate for p. Because F_1(1279) is quite close to ζ(6)/(4ζ(2)ζ(3)), we simply found values for N such that \left(\frac{-N}{q}\right) = −1 for all q ≤ 1279. We did this by specifying that, for all prime values of q ≤ 1279,
N ≡ 3 (mod 8),
N ≡ 1 (mod q) when q ≡ −1 (mod 4),
N ≡ r(q) (mod q) when q ≡ 1 (mod 4).
Here r(q) denotes a randomly selected quadratic nonresidue of q. Notice that if N satisfies these conditions we have \left(\frac{-N}{q}\right) = −1 for all q ≤ 1279. The difficulty with this process is that the values we get for N are very large, 535 or more digits. However, testing the numbers for primality is very easy because N − 1
is divisible by all the primes q ≡ −1(mod 4) (q ≤ 1279). √ Thus it is easy to find a completely factored part of N − 1 which exceeds N, and the method of Pocklington mentioned in Brillhart, Lehmer and Selfridge [3] (Theorem 4) can easily be used to establish the primality of N . We produced 10 prime values for p in this way and selected that one such that F1 (200000, χ) was least. This value is the 535 digit p = 881974625057785931222613817074917532086866157498333873986616\ 772405314952314649125430692674421301535335822565110383045261\ 662288884171496652768853130693547568926092470486468758067960\ 339622958266444317598747950276228195628141063361018553506872\ 307865094282349696360084281769391483388553654419029093991970\ 223187255252971434802826943154408037354452295695797112414760\ 456576881727709666986157386200364701289849665480127513654606\ 154630655217220710053068332795778436402430725458959096262770\ 8420000628672269188450606570430205509080296159176108667 . To show that L(1, χ) for this p satisfies (9), we used the method of [2] to estimate L(1, χ). We define X
2Q−1
C(Q) =
i log i ,
aj = (Q + j) log(Q + i)/C(Q)
i=Q
(j = 0, 1, . . . , Q − 1). Bach showed that under the ERH Q−1 X ai log F1 (Q + i − 1) ≤ A(Q, d) , log L(1, χ) − i=0
√ where A(Q, d) = (A log |d| + B)/( q log Q) and A, B are explicit constants tabulated in [2], Table 3. Carrying 40 digits of precision, for Q = 275000000 and d = −p we computed that X
Q−1
ai log F1 (Q + i − 1) = −2.074865302036 .
i=0
and A(Q, p) = .0239249754, which implies that .12260465 ≤ L(1, χ) ≤ .12861391. Thus, for the 535-digit prime p above we get S(3) > 0 and λ(p) < .041 under the ERH.
Acknowledgment. The authors wish to thank the LiDIA Group [7] and the SIMATH Research Group [11] in Darmstadt and Saarbrücken, respectively, for providing software and computing time.
References 1. R. Ayoub, S. Chowla, and H. Walum. On sums involving quadratic characters. J. London Math. Soc., 42:152–154, 1967. 2. E. Bach. Improved approximations for Euler products. In Number Theory: CMS Conference Proceedings, volume 15, pages 13–28. AMS, Providence, R.I., 1995. 3. J. Brillhart, D.H. Lehmer, and J. Selfridge. New primality criteria and factorizations of 2m ± 1. Math. Comp., 29:620–647, 1975. 4. N.J. Fine. On a question of Ayoub, Chowla and Walum concerning character sums. Illinois J. Math., 14:88–90, 1970. 5. M.J. Jacobson, Jr. Computational techniques in quadratic fields. Master’s thesis, University of Manitoba, 1995. M.Sc. Thesis. 6. D.H. Lehmer, E. Lehmer, and D. Shanks. Integer sequences having prescribed quadratic character. Math. Comp., 24:433–451, 1970. 7. LiDIA Group, Technische Universit¨ at Darmstadt, Darmstadt, Germany. LiDIA A library for computational number theory, Version 1.3, 1997. 8. R.F. Lukes, C.D. Patterson, and H.C. Williams. Numerical sieving devices: their history and some applications. Nieuw Archief voor Wiskunde, 13(4):113–139, 1995. 9. R.F. Lukes, C.D. Patterson, and H.C. Williams. Some results on pseudosquares. Math. Comp., 65:361–372, 1996. 10. D. Shanks. Class number, a theory of factorization and genera. In Proc. Symp. Pure Math. 20, pages 415–440. AMS, Providence, R.I., 1971. 11. SIMATH Research Group, Chair of Prof. Dr. H.G. Zimmer, University of Saarland, Saarbr¨ ucken, Germany. SIMATH Manual, 1997. 12. E. Teske and H.C. Williams. A problem concerning a character sum. Experimental Mathematics, to appear.
Formal Power Series and Their Continued Fraction Expansion Alf van der Poorten Centre for Number Theory Research, Macquarie University, Sydney [email protected]
1 Introduction
1.1 Basics. The familiar continued fraction algorithm, normally applied to real numbers, can just as well be applied to formal Laurent series \sum_{h=-m}^{\infty} g_h X^{-h} in a variable X^{-1}, with the 'polynomial portion' \sum_{h=-m}^{0} g_h X^{-h} of the complete quotient taken to be its 'integer part'. Then the partial quotients are polynomials in X, and we learn that continued fraction expansions [ a0(X), a1(X), . . . , ah(X), . . . ] with partial quotients polynomials of degree at least 1 in X and defined over some field apparently converge to formal Laurent series in X^{-1} over that field. It is an interesting exercise to prove that directly and to come to understand the sense in which the convergents provide best approximations to Laurent series. Specifically, given a Laurent series F(X) — unless the contrary is clearly indicated we will assume it not to be a rational function — define its sequence (F_h)_{h≥0} of complete quotients by setting F_0 = F, and F_{h+1} = 1/(F_h − a_h(X)). Here, the sequence (a_h)_{h≥0} of partial quotients of F is given by a_h = ⌊F_h⌋ where ⌊ ⌋ denotes the polynomial part of its argument. Plainly we have

F = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}}
Only the partial quotients matter, so such a continued fraction expansion may be conveniently detailed by [ a_0 , a_1 , a_2 , a_3 , . . . ]. The truncations [ a_0 , a_1 , . . . , a_h ] are rational functions p_h/q_h. Here, the pairs of relatively prime polynomials p_h(X), q_h(X) are given by the matrix identities

\begin{pmatrix} a_0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} a_1 & 1 \\ 1 & 0 \end{pmatrix} \cdots \begin{pmatrix} a_h & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} p_h & p_{h-1} \\ q_h & q_{h-1} \end{pmatrix}

and the remark that the empty matrix product is the identity matrix. This alleged correspondence, whereby these matrix products provide the sequences of
continuants (ph )h≥0 and (qh )h≥0 , and thus the convergents ph /qh for h ≥ 0, may be confirmed by induction on the number of matrices on noticing the definition [ a0 , a1 , . . . , ah ] = a0 + 1/[ a1 , . . . , ah ],
[ a0 ] = a0 .
It follows that the continuants qh satisfy deg qh+1 = deg ah+1 + deg qh . It also clearly follows, from transposing the matrix correspondence, that [ ah , ah−1 , . . . , a1 ] = qh /qh−1 ,
for
h = 1, 2, . . . .
The matrix correspondence entails p_h/q_h = p_{h-1}/q_{h-1} + (-1)^{h-1}/(q_{h-1} q_h), whence, by induction, F = a_0 + \sum_{h=1}^{\infty} (-1)^{h-1}/(q_{h-1} q_h), and so
deg(qh F − ph ) = − deg qh+1 < − deg qh , displaying the excellent quality of approximation to F provided by its convergents. Proposition 1. Let p, q be relatively prime polynomials. Then deg(qF − p) < − deg q if, and only if, the rational function p/q is a convergent to F . Proof. The ‘if’ part of the claim has already been noticed, so we may take h so that deg qh−1 ≤ deg q < deg qh , and note that supposing p/q is not a convergent entails that q is not a constant multiple of qh−1 . Because ph qh−1 − ph−1 qh = ±1, there are nonzero polynomials a and b such that q = aqh−1 + bqh p = aph−1 + bph , and so qF − p = a(qh−1 F − ph−1 ) + b(qh F − ph ). Now suppose that the two terms on the right are of different degree, deg a − deg qh and deg b − deg qh+1 , respectively. In that case plainly deg(qF − p) > deg(qh−1 F − ph−1 ) > deg(qh F − ph ), confirming that the convergents provide the locally best approximations to F. To verify the suggestion that the degrees of the two terms are different, notice that deg aqh−1 = deg bqh , otherwise deg q < deg qh is not possible, so deg a − deg qh = deg b−deg qh−1 > deg b−deg qh+1 . Moreover, deg a−deg qh = deg(qF − p). So it remains to confirm that deg a − deg qh ≥ − deg q. But that’s plain because, of course, deg a must be at least as large as deg qh −deg qh−1 . These arguments are noticeably clearer with a nonarchimedean absolute value, namely degree in X of a Laurent series in X −1 , than in the traditional archimedean case where one deals with the usual absolute value of real numbers.
1.2 Generalisations. There is of course an extensive literature touching on the topics of power series and continued fractions, going back to the very beginnings of modern mathematics. However the expansions involved are typically not the simple continued fractions we consider here but have the more general shape b1
F = a0 +
=: [ a0 , b1 : a1 , b2 : a2 , b3 : a3 , . . . ] .
b2
a1 + a2 +
b3 a3 + .
..
The abstract theory is not all that different from our ‘basics’ above, but now questions about the quality of convergence of the convergents are relevant and dominate. In brief, neither the series nor their continued fractions are a priori ‘formal’. The bible of these matters is H. S. Wall [23]; there’s a nice introduction in Henrici [11]. One might further study [12] and the very extensive literature on Pad´e approximation. And then there are wondrous identities a` la Ramanujan; see for example the five volume series [4].
2 Remarks and Allegations
2.1 A Generic Example. A brief computation by PARI GP reveals that

G(X) = \prod_{h \ge 0} \left(1 - X^{-2^h}\right) =
[1 , −X+1 , − 12 X−1 , 2X 2 −2X+4 , − 12 X , 2X 2 +2X , 12 X−1 , X+ 12 , 43 X+ 14 9 , 2 2 27 9 8 8 16 81 8 8 16 81 81 8 X + 4 , − 81 X − 81 X − 81 , 8 X , 81 X − 81 X − 81 , 8 X − 4 , 4 10 1 1 32 56 X + 243 , −243X + 729 , − 486 X − 729 , − 2187 X − 729 , − 10935 X − 18225 , 243 2 2 8 2 273375 54675 128 128 256 4100625 4100625 128 X − 32 , − 4100625 X − 4100625 X − 4100625 , − 2176 X + 2312 , 39304 26299 83521 83521 X + 2733750 , 131220000 X + 1960098750 , 31492800000 X − 1968300000 , 4100625 83521 83521 472392000000 2 472392000000 83521 1085773 X − X , − 31492800000 X − 62985600000 , − 83521 83521 8398080000 802016640000 183495637 1256239361 X + 451630080000 , − 1085773 X + 14115049 , 184757760000 17884551168000 1681960743936000 189659438942467 1292330595584717 102574061083 X − 4410684626569 , − 5901901885440000 X + 11066066035200000 , 53537627478297600000 − 663963962112000000 236505320361256349 X − 6858654290476434121 , 23792671733662749965749 − 198900974423816589509 52287162016320000000 X − 2196060804685440000000 , 11529319224598560000000 693955214280599040000000 X + 7896567585599942420096809 , − 213420745556755200543157 292173000667197869543581933 16337998334606280867180297821 − 2421157037165697600000000 X + 40352617286094960000000000 , 288232980614964000000000000 12040726884489681840000000000 X − 5718702142059063900576529174609 , − 248639223567785386981588224983 10117703789796805362558474693539 2342739085214054266441507954708313 − 6647005471324680000000000000 X − 1314445331954455470000000000000 , 673615975105911420000000000000 182621794871498446530000000000000 X− 45234697074367701882866239051013863 , , . . . . . . ]. 183136425402298388189741858506129
−
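The leading partial quotients of such an expansion are easy to reproduce by applying the Euclidean algorithm to a truncation of the product, since a truncation at h = 4 already determines the first several partial quotients of G. The following Python sketch is ours; the polynomial helpers and the number of terms printed are illustrative choices, and beyond the initial agreement the truncation's own (finite) expansion takes over.

```python
from fractions import Fraction

def polymul(a, b):                       # coefficient lists, lowest degree first
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            out[i + j] += ca * cb
    return out

def polydiv(num, den):
    """Return (quotient, remainder) of polynomial division over Q."""
    num, q = num[:], [Fraction(0)] * max(1, len(num) - len(den) + 1)
    while len(num) >= len(den) and any(num):
        shift, factor = len(num) - len(den), num[-1] / den[-1]
        q[shift] = factor
        for i, c in enumerate(den):
            num[i + shift] -= factor * c
        while num and num[-1] == 0:
            num.pop()
    return q, num

def cf(num, den, nterms):
    """Partial quotients of num/den via the polynomial Euclidean algorithm."""
    out = []
    for _ in range(nterms):
        if not any(den):
            break
        a, r = polydiv(num, den)
        out.append(a)
        num, den = den, r
    return out

def show(poly):
    terms = [f"{c}" if i == 0 else f"{c}*X^{i}" for i, c in enumerate(poly) if c]
    return " + ".join(reversed(terms)) if terms else "0"

# truncation: prod_{h=0..4} (1 - X^(-2^h)) = N(X) / X^31
num = [Fraction(1)]
for e in (1, 2, 4, 8, 16):
    num = polymul(num, [Fraction(-1)] + [Fraction(0)] * (e - 1) + [Fraction(1)])
den = [Fraction(0)] * 31 + [Fraction(1)]

for a in cf(num, den, 8):                # compare with the display above
    print(show(a))
```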
2.2 Thoughts and Remarks. The following are among the thoughts I mean to provoke by this example. a. PARI is a fine program indeed; the computation truly is brief. b. This is computational mathematics. It’s nearly impossible to notice this kind of thing by hand; one thinks one must have blundered in the calculation. c. The example is no more than an example, and it seems quite special. But the general appearance of the expansion in fact is typical. d. It is striking that the complexity of the coefficients grows at a furious rate, yet the mindful eye sees pattern, of sorts. It will be worthwhile to hint at an explanation for that. e. Most of the partial quotients are of degree 1; the others have degree 2. It turns out that it is the partial quotients of degree 2 that should surprise. Partial quotients of formal Laurent series ‘want’ to have degree 1. A kind of sort of? ‘quasi-repetition’ in our particular example in fact ‘perpetuates’ an ‘initial accident’ which happens to yield a partial quotient of degree 2. f . One should give in to the temptation to wonder what happens to our example when it is considered to be defined over some field of characteristic p 6= 0. Of course, if p never occurs in a denominator of a partial quotient then the expansion has good reduction, and we can just reproduce it reduced modulo p. But what happens when the expansion has bad reduction at p? By the way. It is not just that it’s reasonable to reduce mod p. It’s unreasonable not to. The example, let alone my claim that it is generic, shouts a reminder that formal power series want to be defined over a finite field, and not over Q. 2.3 Two Examples of Reduction mod p. It’s easy to begin to answer that last question by computing a few instances. For example, over F3 we find that Y
h
(1 − X −2 ) =
h≥0
= [1 , 2X +1 , X +2 , 2X 2 +X +1 , X , 2X 2 +2X , 2X +2 , X +2 , X 2 +2X +2 , 2X 2 + 2X + 1 , X , X 2 + 2X + 1 , X + 1 , 2X 6 + X 5 + X 4 + 2X 3 + X 2 + 2X + 2 , X 2 + X + 2 , 2X + 2 , X , 2X + 1 , 2X 4 + X 3 + X 2 + 2X + 2 , 2X + 1 , X + 1 , 2X 12 +2X 11 +X 8 +X 7 +X 4 +X 3 +2 , X +2 , 2X 2 +2X +1 , X 2 +2X +2 , X +1 , X 2 + 2X , 2X + 1 , X + 2 , . . . . . . ]. Not too surprisingly, to the extent that the original expansion has good reduction the new expansion is its reduction; the first term with bad reduction ‘collapses’ to a term of higher degree. Beyond that term the expansion is not immediately recognisable in terms of the original. Of course, Y h (1 + X −2 ) = 1/(1 − X −1 ) = X/(X − 1) , h≥0 ?
For those not Anglophone: The phrase ‘kind of sort of’, though rarely produced in print, is thought often. It’s, well, kind of sort of a little more vague than just ‘kind of’, or ‘sort of’, alone.
and so we should not be shocked to find that over F2 , for example (1 − X −1 )(1 − X −2 )(1 − X −4 )(1 − X −8 )(1 − X −16 )(1 − X −32 )(1 − X −64 ) = [ 1 , X + 1 , X 126 + X 125 + · · · + X + 1 ]. In this case, the collapse to high degree is exceptionally vivid. 2.4 An Atypical Example. On the other hand, consider X −1 + X −2 + X −3 + X −5 + · · · + X −Fh + . . . = [0 , X−1 , X 2 +2X+2 , X 3 −X 2 +2X−1 , −X 3 +X−1 , −X , −X 4 +X , −X 2 , − X 7 + X 2 , −X − 1 , X 2 − X + 1 , X 11 − X 3 , −X 3 − X , −X , X , X 18 − X 5 , − X , X 3 + 1 , X , −X , −X − 1 , −X + 1 , −X 29 + X 8 , X − 1 , . . . ] . Here the sequence of exponents (Fh ) of the series is defined by the recurrence relation Fh+2 = Fh+1 + Fh and the initial values F2 = 1, F3 = 2. The following thoughts and remarks will surely have sprung to the reader’s mind. a. This example is likely to have first been noticed by persons excessively interested in Fibonacci numbers. b. The continued fraction expansion appears to have good reduction everywhere; that entails that on replacing X by any integer of absolute value at least 2 we obtain a numerical expansion defective only to the extent that it may include nonpositive integer partial quotients. Indeed, Jeff Shallit had long known that 2−1 + 2−2 + 2−3 + 2−5 + · · · + 2−Fh + · · · = [0 , 1 , 10 , 6 , 1 , 6 , 2 , 14 , 4 , 124 , 2 , 1 , 2 , 2039 , 1 , 9 , 1 , 1 , 1 , 262111 , 2 , 8 , 1 , 1 , 1 , 3 , 1 , 536870655 , 4 , 16 , 3 , 1 , 3 , 7 , 1 , 140737488347135 , . . . ] . It seemed difficult to explain the apparent patterns of the numerical expansion; the more rigid formal power series case appeared relatively accessible [20]. c. One expects, with considerable confidence, that the example is representative of the nature of the continued fraction expansion of a very wide class of power series. After all, it’s well known that for mathematical purposes the Fibonacci numbers have no property not generalised by way of units of real quadratic number fields or, according to the case, by higher order recurrence sequences, say those ‘generated’ by Pisot numbers. d. On the other hand, this example is fairly startling in that the sequential truncations of the series do not themselves provide convergents. It shares that property with the product with which we began. What if the exponent 2h in that product were replaced with powers of larger integers?
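The numerical expansion quoted in item b above is easy to reproduce with exact rational arithmetic, since the continued fraction of a sufficiently long partial sum agrees with that of the full number in its initial partial quotients. A small Python sketch of ours (helper names and the number of terms are illustrative):

```python
from fractions import Fraction

def fibonacci_exponents(count):          # 1, 2, 3, 5, 8, 13, ...
    a, b, out = 1, 2, []
    for _ in range(count):
        out.append(a)
        a, b = b, a + b
    return out

def continued_fraction(x, nterms):
    """Initial partial quotients of a positive rational x."""
    out = []
    for _ in range(nterms):
        a = x.numerator // x.denominator
        out.append(a)
        x -= a
        if x == 0:
            break
        x = 1 / x
    return out

s = sum(Fraction(1, 2**e) for e in fibonacci_exponents(20))
print(continued_fraction(s, 20))
# compare with [0, 1, 10, 6, 1, 6, 2, 14, 4, 124, 2, 1, 2, 2039, 1, 9, ...]
```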
3 Various Hilfsätze and Related Principles
3.1 Negating the Negative. The following very simple lemmata provide most of the results we will need. We will here occasionally write a bar, as in x̄, to denote negation, so x̄ = −x. We draw attention to the following.
Lemma 1. −β = [ 0 , 1 , 1 , 1 , 0 , β ]. Proof. According to taste, study either of the two columns of computation below. −β =0 + β −1/β =1 + (β − 1)/β β/(β − 1) =1 + 1/(β − 1) β − 1 =1 + β 1/β =0 + 1/β β =β
or − β = [ 0 + β ] = [ 0 , −1/β ] = [ 0 , 1 + (β − 1)/β ] = [ 0 , 1 , β/(β − 1) ] = [ 0 , 1 , 1 + 1/(β − 1) ] = [ 0 , 1 , 1 , β − 1 ] = [ 0 , 1 , 1 , 1 + β ] = [ 0 , 1 , 1 , 1 , 1/β ] = [ 0 , 1 , 1 , 1 , 0 + 1/β ] = [ 0 , 1 , 1 , 1 , 0 , β ]
One needs to recall, say by the matrix correspondence, that [ a , 0 , b ] = [ a + b ]. Since, of course, −[ b , γ ] = [ −b , −γ ], we have, for example, − π = −[ 3 , 7 , 15 , 1 , 292 , 1 , . . . ] = [ −3 , −[7 , 15 , 1 , 292 , 1 , . . . ] ] = [ −3 , 0 , 1 , 1 , 1 , 0 , 7 , 15 , 1 , 292 , . . . ] = [ −4 , 1 , 6 , 15 , 1 , 292 , . . . ]. Corollary 1. Alternatively, −β = [ 0 , 1 , 1 , 1 , 0 , β ]. Using Lemma 1 we readily remove negative partial quotients from expansions. Thus [ a , b , c , δ ] = [ a , 01110 , b , −c , −δ ] = [ a − 1 , 1 , b − 1 , −c , −δ ] , and that’s [ a − 1 , 1 , b − 1 , 01110 , c , δ ] = [ a − 1 , 1 , b − 2 , 1 , c − 1 , δ ]. If b = 1 one proceeds differently. It seems best to work from first principles, applying the Lemma repeatedly, rather than trying to apply consequent formulas. 3.2 Removing and Creating Partial Quotients. For continued fractions of real numbers the ‘admissible’ partial quotients are the positive integers. That makes it useful to have techniques for removing inadmissible partial quotients, specifically 0 and negative integers; it’s rather more difficult to neatly remove more complicated quantities. For continued fraction expansions of formal power series, however, the corresponding admissibility criterion is that the partial quotients be polynomials of degree at least 1. It is now constant partial quotients that are inadmissible but which can be dealt with fairly readily. We had best first remark that x[ a , b , c , δ ] = [ xa , x−1 b , xc , x−1 δ ], a fact that is obvious but that is somehow not terribly widely known. Lemma 2. [ a , x , γ ] = [ a + x−1 , −x2 γ − x ]. Proof. Set F = [ a , x , γ ], so xF = [ xa , 1 , xγ ] = [ xa , 1 , 01110 , −xγ ]. Then xF = [ xa + 1 , −xγ − 1 ] yields F as claimed. Corollary 2. Conversely, [ a + x , γ ] = [ a , x−1 , −x2 γ − x ].
We see that 'moving x' propagates through the tail of the expansion as alternate multiplication and division by x². I suggest — this is 'philosophy', not mathematics — that the explosive increase in complexity of the rational coefficients of the partial quotients in the continued fraction expansion of a 'typical' formal power series is the consequence of a sequence of 'movings' of rational quantities. I will illustrate this explicitly for the example function G.
3.3 Paperfolding. The matrix correspondence readily yields the following extraordinarily useful result; I learned it from Mendès France [14]. As above, we have set [ a_0 , a_1 , . . . , a_h ] = p_h/q_h for h = 0, 1, . . . . For convenience, we think of the string of symbols a_1 · · · a_h as the 'word' w_h. Naturally, given that, by ←w_h we then mean the reversed word a_h · · · a_1, and by −←w_h the word −a_h · · · −a_1.
Proposition 2 (Folding Lemma). We have

p_h/q_h + (-1)^h/(x q_h^2) = [ a_0 , w_h , x - q_{h-1}/q_h ] = [ a_0 , w_h , x , -←w_h ].
is given sequentially by 1 + X −1 = [ 1 , X ], 1 + X −1 + X −3 = [ 1 , X , X , X ], 1+X −1 +X −3 +X −7 = [ 1 , X , X , X , X , X , X , X ], . . . , where the addition of each term is done by a ‘fold’ with x = −X; see [19]. There is a different way of producing that folded sequence, but I’ll use the more conventional symbols 0 and 1 in place of X and X. We’ll just ‘fill the spaces’ with ‘1 · 0 · ’ repeatedly . . . ; having begun with ‘0 · 1 · ’. Here the · s denote a space about to be filled. 0·1·0·1·0·1·0·1·0·1·0·1·0·1·0·1·0·1·0·1·0·1·0·1·0 . . . 011 · 001 · 011 · 001 · 011 · 001 · 011 · 001 · 011 · 001 · 011 · 001 · 0 . . . 0111001 · 0110001 · 0111001 · 0110001 · 0111001 · 0110001 · 0 . . . 011100110110001 · 011100100110001 · 011100110110001 · 0 . . . 0111001101100011011100100110001· 01110011011000100 . . . 0111001101100011011100100110001101110011011000100 . . .
Formal Power Series and Their Continued Fraction Expansion
365
This remark actually seems useful in understanding some continued fraction expansions of formal power series. For me, it motivated the following result. Proposition 3 (Ripple Lemma). [ z , a , b , c , d , e , f , g , h , i , j , . . . ] = [ z − 1 , 1 , a , 1 , b , 1 , c , 1 , d , 1 , e , 1 , f , 1 , g , 1 , h , 1 , i , 1 , j , 1 , . . . ]. Proof. Appropriately apply Lemma 1, equivalently the Corollary to Lemma 2 with x = ±1, again, and again, and . . . . The series F is given by the functional equation 1 +X −1 F (X 2 ) = F (X). But it’s easy to see that the folded continued fraction F (X) claimed for F above has the property that XF (X) − X is just a rippled version of F (X 2 ), providing a new proof that F has the continued fraction expansion F (X). This new viewpoint [17] readily allowed a noticeable simplification and generalisation of the work of [1] detailing various more delicate properties of the expansion.
4
Some Details
We display a computation illustrating how there is an explosion in complexity of the rational coefficients of the partial quotients of formal power series, and mention the effect of reduction mod p on such continued fraction expansions. 4.1 A Painful Computation. Suppose we have discovered, either laboriously by hand, or aided by the miracle of PARI, that G(X; 1) = (1 − X −1 )(1 − X −2 ) = [ 1 , −X + 1 , − 12 X −
3 4
, 8X − 4 ].
We bravely set out to compute, by hand, the continued fraction expansion of G(X; 2) = (1 − X −1 )G(X 2 ; 1). Replacing X by X 2 is easy, but then we’ll want to divide by X, and multiply by X − 1. To that end we first ready the expansion for being divided by X by repeated applications of Lemma 2; thus generalised ‘rippling’. We see that [ 1 , −X 2 + 1 , − 12 X 2 −
3 4
, 8X 2 − 4 ] = [ 0 , 1 , X 2 − 2, 12 X 2 +
3 4
, −8X 2 + 4 ]
= [ 0 , 1 , X 2 , − 12 , −2X 2 − 1 , 2X 2 − 1 ] = [ 0 , 1 , X 2 , − 12 , −2X 2 , −1 , −2X 2 + 2 ], so we’ll divide [ 0 , 1 , X 2 , − 12 , −2X 2 , −1 , −2X 2 ,
1 2
] by X. We obtain
[ 0 , X , X , − 12 X , −2X , −X , −2X , 12 X ] = [ 0 , X − 1 , 1 , −X − 1 , 12 X , . . . ] = [ 0 , X − 1 , 1 , −X + 1 , − 12 , −2X + 2 , − 12 X , −4X , − 12 X , 2X ], where we’ve started to ripple the expansion to ready it for multiplication by X − 1. The exciting feature is the underlined term −2X + 2. It is ‘accidentally’ ready — without our having had to ripple it into submission. It’s that unlikely
to be repeated accident that causes the expansion of G to have partial quotients of degree 2. Next [ 0 , X − 1 , 1 , −X + 1 , − 12 , −2X + 2 , − 12 X , −4X + 4 , − 14 , 8X + 4 , − 18 X ] 1 1 , 18X − 12 ] = [ . . . , − 14 , 8X − 8 , 12 , 18X − 18 , 16 ]. = [ . . . , − 14 , 8X − 8 , 12 On multiplying by X − 1, as we now can easily do, we see that G(X; 2) is [ 0 , 1 , X − 1 , −1 , − 12 (X − 1) , −2 , − 12 X(X − 1) , −4 , − 14 (X − 1) , 8 ,
1 12 (X
− 1) , 18 , 16 (X − 1) ].
Finally, we tidy that up, again risking an increase in complexity of the rational coefficients. When the dust has settled we obtain G(X; 2) = [1 , −X +1 , − 12 X −1 , 2X 2 −2X +3 , −X + 12 , − 43 X − 14 9 ,
27 9 8 X−4
].
I don’t want to claim that the method used here is a sensible way of pursuing the computation; it’s far more convenient to type a few lines on one’s computer. But I do suggest that it enables us to see both how complexity of the coefficients propagates, and that it requires an unlikely accident — so unlikely as to be near impossible other than at the beginning of the expansion, when the coefficients still are orderly — to newly create a partial quotient of degree other than 1. It is the functional equation satisfied by G(X) that generates the ‘kind of sort of quasi-periodicity’ I vaguely spoke of above. Q h 4.2 On the Other Hand . . . . Consider H4 (X) = h≥0 (1 + X −4 ). The surprise this example provides is not just that its partial quotients have high degree. The coefficients of the partial quotients all are integers! Just as in the clumsy approach just tried, we obtain sequentially that H4 (X; 0) = [ 1 , X 4 ] ; =
H4 (X; 1) =
X −1 [ 1 , X4 ] = X
X −1 [ 0 , 1 , −X 4 , 1 ] = (X − 1)[ 0 , X , −X 3 , −X ]. X
Next, we ripple again, now to permit the multiplication by X − 1. We obtain (X − 1)[ 0 , X − 1 , 1 , X 3 − 1 , X ] = [ 0 , 1 , X − 1 , (X 3 − 1)/(X − 1) , X(X − 1) ] = [ 1 , −X , −(X 3 − 1)/(X − 1) , −X(X − 1) ]. The reader caring to pursue this process will find that each iteration adds just two partial quotients, and that all have integer coefficients. Indeed, Mend`es France and I remark in [15] that for k ≥ 3 the truncations of Hk (X) are readily seen to be convergents of Hk and, when k is even, not 2, they are every second convergent of Hk . However, for k ≥ 3 odd, we show in [2] with the aid of
Allouche that the expansion is ‘normal’? — up to the partial quotients given by the truncations of the product, and their ‘quasi-repetition’ occasioned by the functional equation. In the case k = 3, the partial quotients whose existence is given by the truncations of the product too have degree 1. However, the ingenious proof in [2] that indeed all the partial quotients of H3 have degree 1 — which relies on multiplying by using the ‘Raney automata’ [21] — was nugatory. It is remarked by Cantor [7] that the degree of the partial quotients of H3 is an obvious consequence of the fact that H3 (X) is (1 + X −1 )−1/2 when reduced mod 3 and that its partial quotients all have degree 1. 4.3 Beal’s Principle. Cantor’s observation follows from a general principle. We’ll need to mind our p s and q s, since we’ll want to use p to denote a prime; so our convergents will here be x/y. Given a series F , we denote its sequence of partial quotients by (ah ), and of its complete quotients by (Fh ). My remarks are inspired by a question put to me by Guillaume Grisel (Caen) at Eger, 1996. The principle underlying Grisel’s question was that it seemed likely that every reduction F , mod p, of F has no more partial quotients than does F itself. Notice that F must have reduction at p for this to make sense at all and that, naturally, if we mention the number of partial quotients then we must apparently be alluding to the number of partial quotients of rational functions; thus of truncations of the series F . However, I’ve now realised that the idea is to understand the first principles genesis of the sequence of polynomials (yh ) yielding the convergents to F . Recall that, by Proposition 1, those are the polynomials of least degree not exceeding dh , say, respectively, so that the Laurent series yh F has no terms of degree −1, −2, . . . , and −dh . There is no loss of generality (but there is a significant change in definition of the yh ) in our determining that the yh have been renormalised so that each has integer coefficients not sharing a common factor. Now consider this story in characteristic p. It can be told in the same words, other than that it’s not relevant to fuss about normalisation of the yh and that we mark all quantities with an overline . Theorem 1. The distinct reductions yh of the yh yield all the convergents of F . Proof. Certainly, each yh yields a convergent to F , because deg(yh F − xh ) < − deg yh implies that deg( yh F − xh ) < − deg yh ≤ − deg yh . However, some of the yh may coincide. Denote representatives of the distinct yh by yh(0) , yh(1) , . . . , yh(j) , . . . , where each h(j) is maximal; that is yh(j) = yh(j)−1 = · · · = yh(j−1)+1 . Then deg(yh(j) F −xh(j)) = − deg yh(j)+1 entails deg( yh(j) F − xh(j) ) ≤ − deg yh(j)+1 . ?
That is, all the partial quotients are of degree 1. But I also intend to invoke the notion that the coefficients of those polynomials are ‘typical’ and explode in complexity.
The last inequality informs us that the corresponding next partial quotient of F , let’s call it bj+1 , has degree at least deg yh(j)+1 − deg yh(j) . But n X
n X deg yh(j)+1 − deg yh(j−1)+1 ≥ deg yh(j)+1 − deg yh(j−1)+1 = deg yh(n)+1 ,
j=0
j=0
where we recall yh(j) = yh(j−1)+1 , and that by the formalism yh(−1)+1 = 1, so that yh(−1)+1 is a constant, and thus is of degree zero. However, it’s plain by induction on a remark in the introduction, that n X
deg bj+1 = deg yh(n)+1 ≤ deg yh(n)+1 .
j=0
It follows that the ‘polite’ inequalities above (where we wrote ‘≤’ because we could not be certain that we were allowed to write ‘=’) all are equalities, that is, deg yh(j−1)+1 = deg yh(j−1)+1 and deg yh(j)+1 − deg yh(j) = deg bj+1 , and the yh(j) must account for all the convergents of F as claimed. This yields a verification of Beal’s principle in the best sense, because we show that the convergents of F arise from a subset of those of F , so that it always makes sense to claim in that sense that the number of convergents of F cannot exceed that of F . It is then a triviality that if deg qh = h for all h necessarily the same, that is deg qh = h for all h, is true for the original function. But it is easily confirmed that deg qh = h for all h for (1 + X −1 )−1/2 over F3 , so of course also deg qh = h Q h is true for the product h≥0 (1 + X −3 ), as Cantor pointed out.
5 In Thrall to Fibonacci
We remark that to our surprise, and horror, continued fraction expansion of formal power series appears to adhere to the cult of Fibonacci. 5.1 Specialisable Continued Fraction Expansions. Suppose (gh )h≥0 is a sequence of positive integers satisfying gh+1 ≥ 2gh . Then the Folding Lemma, together with Lemma 2 whenever gh+1 = 2gh , readily shows that every series P −gh ±X has a continued fraction expansion with partial quotients polynoh≥0 mials with integer coefficients. Since such expansions are precisely the expansions that continue to make sense when X is replaced by an integer at least 2, we call them specialisable. When the exponents gh increase less rapidly more ad hoc tricks become necessary. Shallit and I noticed [20], mostly experimentally but with proofs for several P −T simpler cases, that certain series X h are specialisable, where the recurrence sequence (Th ) satisfies Th+n = Th+n−1 + Th+n−2 + · · · + Th — the dreaded Fibonacci, Tribonacci [sic], and more generally, forgive me, Polynacci? sequences. ?
Surely ‘n-acci’ is no better?
Since it seemed absurd that continued fractions be in thrall to Fibonacci, I was keen to discover a larger class of examples of which those instances were part. 5.2 A Shocking Surprise. It seems one should study the continued fractions of the sequence of sums X Gm+1 (X −Gm+1 + X −Gm+2 + · · · + X −Gm+h ) h≥0 . Having, somehow, obtained the expansion for h, one changes m −→ m+1, divides by X Gm+2 −Gm+1 , and finally one adds 1. The ripple lemma makes it feasible to do this ‘by hand’ and to see what ‘really’ happens in moving h −→ h+1. Whatever, one finds that the folded sequence obtained is first perturbed after n + 1 steps by the behaviour of the ‘critical’ exponent Gm+n+2 − 2Gm+n+1 + Gm+1 . Call this quantity Gm,n . Specifically, if for some n, Gm,n−1 > 0 but Gm,n < 0 then the expansion is not specialisable; and if both are positive, then n is not critical. So the interesting case is Gm,n−1 > 0 and Gm,n = 0. In that case, and only that case, the expansion is specialisable. But, horribile dictu, the condition Gm,n = 0 for all m says that (Gh ) is some constant translate of the Polynacci sequence of order n. Contrary to decency and common sense, it does seem that these cases really are special when it comes to specialisable continued fraction expansion. The perturbation caused by the vanishing of Gm,n spreads through the expansion by the inductive step. One also notices that specialisability is lost if one makes arbitrary changes to the signs of the terms. Finally, one obtains cases such as the examples of [20], see §2.4 above, by presuming G0 = G1 = · · · = Gn−2 = 1, Gn−1 = 1 and taking m = n − 1. There’s still work to do then to show that the expansion remains specialisable on division by X. That division further perturbs the pattern in the expansion, explaining why the work of [20] was so complicated.
6 Concluding Remarks
6.1 Normality. That, for formal power series over an infinite field, all partial quotients are almost always of degree one, is just the observation that remainders P −h a X have a reciprocal with polynomial part of degree greater than one, h h≥1 and thus give rise to a partial quotient of degree greater than one, if and only if −1 a1 = 0. Moreover, if a1 6= 0, the partial quotient is a−1 1 X − a2 a1 and the next −3 −1 2 remainder is (a2 − a1 a3 )a1 X + terms of lower degree in X. This same viewpoint shows that the reduction mod p of a formal power series almost always has partial quotients of degree greater than one, since now the nonvanishing of the coefficient of X −1 of all remainders is as unlikely as the nonappearance of the digit 0 in the base p expansion of a random real number. These two remarks combine to explain my claim that one should expect a formal power series with integer coefficients to have partial quotients of degree one, that the continued fraction expansion will have bad reduction at all primes, and that — noting the shape of the coefficient of X −1 of the ‘next’ remainder — the coefficients of the partial quotients will quickly explode in complexity.
6.2 Announcements. Other remarks also are announcements with hint of their eventual proof; one might call them conjectures with some justification. Thus, I certainly give no proof that the partial quotients of G(X), see §2.1, are all of degree at most 2; notwithstanding the strong hints of §4.1. Here, I’m not sure that a proof warrants the effort. On the other hand, §5 both reports some results proved in [20] and the announcement that I now know how to prove the conjectures of that paper; evidently with details to appear elsewhere. 6.3 Power Series over Finite Fields. There is a well studied analogy between number fields and function fields in positive characteristic leading, for example, to a theory of diophantine approximation of power series in finite characteristic as in the work of de Mathan at Bordeaux. Iteration of references from the recent paper [13] will readily lead the reader into that literature. By the way, Theorem C sketched on p.224 of that paper is a trivial application of the Folding Lemma, Proposition 2 above. I might add that it was the work of Baum and Sweet [3] that first interested me in the present questions. Beal’s Principle informs on these matters; description of that will be the subject of future work. 6.4 Power Series with Periodic Continued Fraction Expansion. It is of enormous interest to find infinite √ classes of positive integers D for which the D has a ‘long’ period, continued fraction expansion of √ in principle of length O( D log log D), but in practice of length O (log D)k since no better is known than some cases with p small k. One approach, as exemplified by [9], leads to a study of families f(n), with f a given polynomial taking integer values at the integers, and for integers n. Here a theorem of Schinzel [22] shows that the period has uniformly bounded length, thus independent of n, exactly when p the power series f(X) has a periodic expansion with good reduction at all primes, perhaps other than 2. These issues connect closely with recent work of Bombieri and Paula Cohen [6] showing that the coefficients of simultaneous Pad´e approximants to algebraic functions are large — this is essentially the explosive growth of the coefficients of partial quotients of which we make much above — in effect unless a ‘generalisation’ of Schinzel’s conditions holds. For the hyperelliptic case of this phenomenon, which goes back to Abel, see [5].
References 1. Jean-Paul Allouche, Anna Lubiw, Michel Mend`es France, Alf van der Poorten and Jeffrey Shallit, ‘Convergents of folded continued fractions’, Acta Arith., 77 (1996), 77–96. 2. J.–P. Allouche, M. Mend`es France and A. J. van der Poorten, ‘An infinite product with bounded partial quotients’, Acta Arith. 59 (1991), 171–182. 3. L. Baum and M. Sweet, ‘Continued fractions of algebraic power series in characteristic 2’, Ann. of Math. 103 (1976), 593–610. 4. Bruce C. Berndt, Ramanujan’s notebooks, Springer-Verlag. Part I, with a foreword by S. Chandrasekhar, New York-Berlin, 1985; Part II, New York-Berlin, 1989; Part III, New York, 1991; Part IV, New York, 1994; Part V, New York, 1998.
5. T. G. Berry, ‘On periodicity of continued fractions in hyperelliptic function fields’, Arch. Math. 55 (1990), 259–266. 6. Enrico Bombieri and Paula B. Cohen, ‘Siegel’s Lemma, Pad´e Approximations and Jacobians’ (with an appendix by Umberto Zannier), to appear in the De Giorgi volume, Annali Scuola Normale Superiore, Pisa 7. David G. Cantor, ‘On the continued fractions of quadratic surds’, Acta Arith. 68 (1994), 295–305. 8. Michel Dekking, Michel Mend`es France and Alf van der Poorten, ‘FOLDS!’, The Mathematical Intelligencer 4 (1982), 130-138; II: ‘Symmetry disturbed’, ibid. 173181; III: ‘More morphisms’, ibid. 190-195. 9. E. Dubois etpR. Paysant-Le Roux, ‘Sur la longeur du developpement en fraction continue de f (n)’, Ast´erisque 198–200 (1991), 107–119 10. David Goss, David R. Hayes and Michael I. Rosen eds., The arithmetic of function fields, Proceedings of the workshop held at The Ohio State University, Columbus, Ohio, June 17–26, 1991. Ohio State University Mathematical Research Institute Publications, 2. Walter de Gruyter & Co., Berlin, 1992. viii+482 pp. 11. Peter Henrici, Applied and computational complex analysis, Vol. 2, Special functions—integral transforms—asymptotics—continued fractions. Reprint of the 1977 original. Wiley Classics Library. A Wiley-Interscience Publication. John Wiley & Sons, Inc., New York, 1991. x+662 pp. 12. William B. Jones and Wolfgang J. Thron, Continued fractions. Analytic theory and applications, with a foreword by Felix E. Browder and an introduction by Peter Henrici. Encyclopedia of Mathematics and its Applications, 11. Addison-Wesley Publishing Co., Reading, Mass., 1980. xxix+428 pp. 13. Alain Lasjaunias, ‘Diophantine approximation and continued fractions expansions of algebraic power series in positive characteristic’, J. Number Theory 65 (1997), 206–225. 14. Michel Mend`es France, ‘Sur les fractions continues limit´ees’, Acta Arith. 23 (1973), 207–215. 15. M. Mend`es France and A. J. van der Poorten, ‘From geometry to Euler identities’, Theoretical Computing Science 65 (1989), 213–220. 16. M. Mend`es France and A. J. van der Poorten, ‘Some explicit continued fraction expansions’, Mathematika 38 (1991), 1–9. 17. Michel Mend`es France, Alf van der Poorten and Jeffrey Shallit, ‘On lacunary formal power series and their continued fraction expansion’, to appear in the Proceedings of the Number Theory Conference in honour of Andrzej Schinzel on the occasion of his 60th birthday, K. Gy˝ ory, H. Iwaniec and J. Urbanowicz eds, 6pp. 18. A. J. van der Poorten, ‘An introduction to continued fractions’, in J. H. Loxton and A. J. van der Poorten eds., Diophantine Analysis (Cambridge University Press, 1986), 99–138. 19. A. J. van der Poorten and J. Shallit, ‘Folded continued fractions’, J. Number Theory 40 (1992), 237–250 . 20. A. J. van der Poorten and J. Shallit, ‘A specialised continued fraction’, Canad. J. Math. 45 (1993), 1067-1079. 21. G. N. Raney, ‘On continued fractions and finite automata’, Math. Ann. 206 (1973), 265–283. 22. A. Schinzel, ‘On some problems of the arithmetical theory of continued fractions’, Acta Arith. 7 (1962), 287–298. 23. H. S. Wall, Analytic Theory of Continued Fractions, D. Van Nostrand Company, Inc., New York, N. Y., 1948; xiii+433 pp.
Imprimitive Octic Fields with Small Discriminants
Henri Cohen, Francisco Diaz y Diaz, and Michel Olivier
Université Bordeaux I, Laboratoire A2X, 351 cours de la Libération, 33 405 Talence, France
{cohen,diaz,olivier}@math.u-bordeaux.fr
Abstract. We give the complete table of minimum discriminants of octic fields having a quartic subfield, for all signatures and for all possible Galois groups. Moreover, we give some pairs of arithmetically equivalent octic fields (i.e., fields with the same Dedekind zeta function).
1 Introduction
In a forthcoming paper (see [3]), we describe the computation of extended tables of octic fields with a quartic subfield using an algorithmic version of global class field theory (see also [2]). In degree 8, only the first three minimal discriminants for totally real fields and the first fifteen for totally imaginary fields were known (see [5] and [12]). For each of the 50 possible Galois groups for octics (see [1]), one example of a parametrized family of polynomials is given in [14]. But nothing was known about the minimal discriminants of octics with a given signature and a given Galois group, apart from some results which can be found incidentally in the literature (see [9] and [10]). Note that the reference [9] contains some wrong results that we have corrected in the above-mentioned paper (see [3]). Our extended tables of octics with a quartic subfield contain respectively the first 11639 fields with signature (0, 4), the first 12301 fields with signature (2, 3), the first 13077 fields with signature (4, 2), the first 11680 fields with signature (6, 1), and the first 13796 fields with signature (8, 0). In the second section, we describe the methods used for finding all such octic fields with a given signature and a given Galois group (for the fields which are not in the extended tables of octics), and we give the table of all minimal discriminants. Finally, in the last section we give a table of pairs of octics which have the same Dedekind zeta function.
2 Galois Groups
The notation that we use here for the Galois groups of octic fields is the notation of [1]. We have computed the Galois group of the octic fields that we have found in the extended tables (see [3]) using the methods described in [6] and [7]. The
Galois groups that we obtain are necessarily among those corresponding to octic fields containing a quartic subfield, and this gives 36 of the 50 possible Galois groups in degree 8 (see the table in signature (0, 4) or (8, 0) for the complete list). Even though we found many octic fields, there was no guarantee that we would find all the possible combinations of signatures and Galois groups, and indeed we found only 97 of the 114 possible combinations. For the 17 missing ones, we used the following methods.
– Specializations of the parametrized solutions given in the literature, for example in [14].
– Particular polynomials one can find in the literature. More precisely, we checked that the minimal discriminant for the Galois group T5+ and signature (0, 4) is indeed as given by S.-H. Kwon in [10]. On the other hand, the discriminant for the Galois group T23 and signature (0, 4) given by A. Jehanne in [9] gives us an upper bound but is not minimal.
– The "mirror effect": an octic field L having a quartic subfield K can be defined by an even polynomial P(X^2) ∈ Z[X]. Let D be a rational integer such that √D is not in the Galois closure N of L over Q, and denote by G the Galois group of the extension N/Q. The field obtained by adjoining to the rationals the roots {±θ1, ±θ2, ±θ3, ±θ4} of P(X^2) as well as √D is Galois over Q and its Galois group is isomorphic to G × C2. In this group, the intersection of the stabilizers of the elements √D·θj is trivial if there does not exist σ ∈ G such that σ(θj) = −θj for j = 1, 2, 3, 4, and it is equal to H = {(1, 1), (σ, −1)} ≅ C2 otherwise. In this case, since σ is a central element in G we have (G × C2)/H ≅ G as an abstract group. Thus, replacing P(X^2) with P((X√D)^2) = P(DX^2) gives a polynomial whose Galois group is (as an abstract group) isomorphic to G × C2 in the first case above, and to G in the second case. Using this, one can prove (Y. Eichenlaub, personal communication) that the Galois group of the Galois closure of L/Q is not changed except when G is the group T4+ (resp. T14+), in which case the new group becomes T9+ (resp. T24+).
– The direct study of the group structure. For example, the group T14+ is the group S4 considered as a transitive group of degree 8. It is not difficult to prove that an octic field having such a Galois group is obtained by taking a quartic field with Galois group S4 and adjoining the square root of the discriminant of the quartic field (which of course belongs to the Galois closure); see the sketch after the tables below.
– Pushing this idea further, we adjoined to quartic fields square roots of divisors of the discriminant, and we obtained in this way practically all the missing groups and signatures.
The following tables give, for all signatures and for all possible Galois groups of the Galois closure corresponding to this signature:
– The name of the Galois group in the notation of [1]. We chose not to use the more recent (but more complex) notation used in [4].
– The minimal discriminant (in absolute value) corresponding to this Galois group. When the minimal discriminant was obtained by a specific method the value of the discriminant is followed by a ∗ . Signature (0,4) G dmin G dmin + ∗ T1 21474 83648 T13 178 50625 + T2+ 12 65625 T14 608 86809 T3+ 53 08416 T15 314 43200 T4+ 17 50329 T16 94 53125 + ∗ T5 1 22305 90464 T17 12 57728 + T6 41 02893 T18 60 36849 + T7 51200 00000∗ T19 671 08864 + T8 10 49633 09568∗ T20 262 65625 + T9 32 11264 T21 335 54432 + + T10 18 90625 T22 254 01600 + T11 32 40000 T23 3 39710 01237∗ + + T12 7059 11761∗ T24 17 63584
G dmin T26 187 53525 T27 15 78125 T28 378 79808 + T29 35 04384 T30 2153 78125 T31 15 13728 + T32 11424 40000∗ T35 13 27833 T38 1671 86432 + T39 42 27136 T40 120 08989 T44 13 61513
Signature (2,3)
G    dmin           G    dmin           G    dmin
T6   −42 86875      T26  −74 86875      T35  −44 61875
T8   −1071 71875    T27  −746 71875     T38  −49413 82327∗
T15  −409 60000     T30  −214 34375     T40  −226 65187
T23  −226 65187     T31  −793 60000     T44  −47 11123
Signature (4,2) G dmin G + T7 569 53125 T20 + T9 409 60000 T21 + + T10 640 00000 T22 + + T11 230 40000 T24 ∗ T15 1 16625 89952 T26 T16 3200 00000 T27 T17 152 43125 T28 + + T18 193 60000 T29 + ∗ T19 2 77102 63296 T30
dmin 2684 35456 88305 03125∗ 3686 40000 393 75625 3219 78368 713 03168 318 78125 257 55625 10768 90625
G dmin T31 1049 60000 + T32 4 51783 52704∗ T35 173 18125 T38 56472 94088∗ + T39 205 02784 T40 74950 14493∗ T44 152 97613
Signature (6,1)
G    dmin           G    dmin           G    dmin
T27  −746 71875     T35  −688 56875     T44  −1034 05923
T31  −2494 95552    T38  −49413 82327
Signature (8,0) G T1 T2+ T3+ T4+ T5+ T6 T7 T8 T9+ + T10 + T11 + T12
dmin 4103 38673 3240 00000 33177 60000 4420 50625 1 22305 90464 59101 06112 51200 00000 1 60984 53125 15341 32224 10643 90625 4326 40000 58873 39441
G + T13 + T14 T15 T16 T17 + T18 + T19 + T20 T21 + T22 T23 + T24
dmin 6 05238 72256 82 13869 40416∗ 11 94356 44125 4 78975 78125 2823 00416 41634 75625 8 75781 16096 1 48840 00000 17 51562 32192 1 80633 60000 2 14154 71433 23936 55625
G dmin T26 25760 88125 T27 12922 03125 T28 81608 00000 + T29 50694 40000 T30 12 39119 40625 T31 19481 60000 + T32 315 03303 56889∗ T35 3095 93125 T38 6 28261 46729 + T39 52400 22544 T40 46 31434 05393∗ T44 11527 84549
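As an illustration of the direct construction described above for the group T14+ (taking a quartic field with Galois group S4 and adjoining the square root of its discriminant), here is a short sympy sketch. It is our own illustration rather than the authors' code: the sample quartic x^4 − x − 1 and the choice of θ + √d as a (generically primitive) element are assumptions made only for this example.

```python
# Defining polynomial for the octic field K(sqrt(d_K)), where K = Q(theta) is a
# quartic S4 field: eliminate y between quartic(y) = 0 and (x - y)^2 = d, so the
# roots of the resulting octic are theta_i +/- sqrt(d).
from sympy import symbols, discriminant, resultant, expand, factor_list

x, y = symbols('x y')
quartic = x**4 - x - 1                 # a quartic with Galois group S4
d = discriminant(quartic, x)           # -283; squarefree, so equal to the field discriminant
octic = expand(resultant(quartic.subs(x, y), (x - y)**2 - d, y))

print(d)                               # -283
print(octic)                           # monic octic defining K(sqrt(-283)) over Q
_, factors = factor_list(octic)
print("irreducible:", len(factors) == 1 and factors[0][1] == 1)
```

If θ + √d happened not to be a primitive element, the resultant would factor, which the last line detects.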
3 Arithmetically Equivalent Fields
In this section we give all the examples of non-isomorphic octic fields having the same Dedekind ζ-function that we have found in the tables; such fields are called arithmetically equivalent (see [11]). Note that the equality of the Dedekind zeta functions of two fields K1 and K2 does not imply the equality of the ramification exponents of the primes in K1 and K2, but only of the residue degrees. So, for some pairs of such fields, it is possible to decide that they are not isomorphic by factoring ramified primes in each of them.
The following theorem gives a necessary and sufficient condition for two fields to be arithmetically equivalent (see [8]). In fact, this is a purely group-theoretical property.
Theorem 1. Let K1 and K2 be two number fields (assumed to be in a fixed algebraic closure of Q). The fields K1 and K2 are arithmetically equivalent if and only if the following two conditions are satisfied.
– The fields have a common Galois closure N.
– Let G = Gal(N/Q), G1 = Gal(N/K1) and G2 = Gal(N/K2). Then for each conjugacy class C in G, we must have |G1 ∩ C| = |G2 ∩ C| (where | · | denotes cardinality).
From this theorem, it is not difficult to deduce the following (Y. Eichenlaub, personal communication).
Corollary 1. If K1 and K2 are non-isomorphic arithmetically equivalent octic fields, their Galois group is isomorphic to T15 or to T23. Conversely, if an octic field K1 has Galois group isomorphic to T15 or to T23, there exists a non-isomorphic field K2 arithmetically equivalent to K1.
Note that T15 can be interpreted as the group Hol(C8) of order 32, that is, as the semi-direct product of C8 by its group of automorphisms acting in the natural way (C8 denotes the cyclic group of order 8); T23 is the group GL2(F3) of order 48. Since T15 cannot occur in signature (6, 1), we cannot have arithmetically equivalent fields with such a Galois group in that signature. Within the limits of our tables, we found no example in signature (4, 2), but the desired (minimal) example was found during the search for the minimal discriminant with Galois group T15. Similarly, T23 cannot occur in signatures (4, 2) and (6, 1). Within the limits of our tables, we found no example in signature (0, 4), but the desired (minimal) example was found during the search for the minimal discriminant with Galois group T23 using the upper bound from [9]. For each signature we give all the examples found within the limits of our tables (plus the example of T15 in signature (4, 2) and T23 in signature (0, 4)). For each pair of fields having the same ζ-function, we give octic polynomials generating the corresponding fields. Within the limits of our tables, we have found two examples of quadruples of number fields having the same discriminant, signature and Galois group, forming two pairs of arithmetically equivalent fields (all having T23 as Galois group). These examples occur for discriminants −150730227 and −1327373299. Two arithmetically equivalent fields have the same product h(K)R(K) of the class number by the regulator. Since the class number is very often equal to 1, it is usually the case that the class numbers and the regulators are equal. It has however been noticed by several authors (see for example [13]) that the class numbers (hence the regulators) of arithmetically equivalent fields may be different. Two of the 18 pairs of arithmetically equivalent fields with T15 as Galois group that we have found give such examples. Both are in signature (2, 3). The pairs are for discriminants −518711875 and −1097440000, for which the second field given below has class number 2 while the first has class number equal to 1. In both of these cases, the narrow class numbers of the fields coincide and are equal to 2.
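Since T23 ≅ GL2(F_3) has only 48 elements, the criterion of Theorem 1 can be verified by brute force. The following Python sketch (our own illustration, not code from the paper) takes for G1 and G2 the stabilizers of a nonzero vector and of a nonzero linear form under the natural action on F_3^2, two subgroups of index 8, and checks that every conjugacy class meets them in sets of the same size while the subgroups themselves are not conjugate; this is the group-theoretic situation behind the T23 pairs of Corollary 1.

```python
# Brute-force check of the criterion of Theorem 1 for T23 = GL2(F3).
from itertools import product

p = 3

def mul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) % p
                       for j in range(2)) for i in range(2))

def inv(A):
    (a, b), (c, d) = A
    t = pow((a * d - b * c) % p, p - 2, p)        # inverse of det mod p
    return ((d * t % p, -b * t % p), (-c * t % p, a * t % p))

G = [((r[0], r[1]), (r[2], r[3])) for r in product(range(p), repeat=4)
     if (r[0] * r[3] - r[1] * r[2]) % p != 0]

v, phi = (1, 0), (0, 1)
G1 = {A for A in G                                  # stabilizer of the vector v
      if tuple(sum(A[i][k] * v[k] for k in range(2)) % p for i in range(2)) == v}
G2 = {A for A in G                                  # stabilizer of the linear form phi
      if tuple(sum(phi[k] * A[k][i] for k in range(2)) % p for i in range(2)) == phi}
assert len(G) // len(G1) == len(G) // len(G2) == 8

seen, gassmann = set(), True
for g in G:
    if g in seen:
        continue
    C = {mul(mul(x, g), inv(x)) for x in G}         # conjugacy class of g
    seen |= C
    gassmann &= len(C & G1) == len(C & G2)

conjugate = any({mul(mul(x, h), inv(x)) for h in G1} == G2 for x in G)
print("Gassmann condition:", gassmann)              # expected: True
print("subgroups conjugate:", conjugate)            # expected: False
```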
Signature (0,4), group T15 dK P olynomial 8 6 31443200 x − x − 4x5 − 2x4 + 4x3 + 12x2 + 6x + 1 8 7 31443200 x − 2x + 3x6 − 6x5 + 13x4 − 20x3 + 21x2 − 14x + 5 70304000 x8 − 4x7 + 7x6 − 2x5 − 8x4 + 8x3 + 2x2 − 4x + 1 70304000 x8 + x6 − 2x5 + 5x4 − 2x3 + x2 + 1 143327232 x8 + 6x6 + 15x4 + 12x2 + 3 143327232 x8 + 3x4 + 3 8 7 6 5 4 3 212556032 x − 4x + x + 8x + 5x − 14x − 13x2 + 4x + 17 212556032 x8 − 2x7 + 5x6 − 10x5 + 23x4 − 36x3 + 51x2 − 48x + 17 Signature (0,4), group T23 dK P olynomial 33971001237 x8 − 4x7 + 8x6 − 9x5 + 27x4 − 39x3 + 35x2 − 17x + 7 33971001237 x8 − x7 − x5 + 25x4 − 54x3 + 50x2 − 8x + 9 Signature (2,3), group T15 dK P olynomial −40960000 x8 + 4x6 + 5x4 + 2x2 − 1 −40960000 x8 − x4 − 1 8 7 6 5 4 3 −131274675 x − x + x − 4x + x − 4x + x2 − x + 1 8 −131274675 x − 2x7 + 4x6 − 5x5 + x4 − 5x3 + 4x2 − 2x + 1 −342102016 x8 + 8x4 − 1 8 6 −342102016 x + 4x + 5x4 + 2x2 − 4 −359661568 x8 − 4x6 − 4x5 + 2x4 − 2 8 −359661568 x − 4x5 + 8x3 + 4x2 + 4x + 1 8 6 5 −518711875 x − 6x − x + 4x4 + 13x3 + 9x2 + 10x − 5 8 7 −518711875 x − x − 6x6 + 8x5 + 11x4 − 15x3 − 29x2 + 55x − 25 −1024000000 x8 − 15x4 − 50x2 − 25 −1024000000 x8 + 5x4 − 25 8 6 −1097440000 x − 5x + 3x4 + 15x2 − 19 −1097440000 x8 + 5x6 + 3x4 − 15x2 − 19 8 7 6 −1119744000 x − 2x + x − 8x5 + x4 − 8x3 + x2 − 2x + 1 −1119744000 x8 − 2x7 + x6 + 4x5 − 5x4 + 4x3 + x2 − 2x + 1 −1344252672 x8 − 3x6 + 6x2 − 3 −1344252672 x8 + 3x6 − 6x2 − 3 8 7 6 5 4 −1517535243 x − x − 5x + 8x + 4x − 25x3 − 5x2 + 11x − 5 −1517535243 x8 − x7 + 4x6 − x5 + 7x4 − 10x3 − 8x2 + 14x − 5
Signature (2,3), group T23 dK P olynomial 8 7 6 −22665187 x − x + x − 2x5 + x4 − 9x3 + 7x2 − 6x + 1 8 −22665187 x − 3x7 + 4x6 − 8x5 + 8x4 − x3 + 2x2 − 3x − 1 −32019867 x8 − 4x7 + 7x6 − 7x5 + 7x4 − 7x3 + 2x2 + x − 1 −32019867 x8 − x7 + x6 − 2x5 − x4 − 2x3 + x2 − x + 1 −36264691 x8 − 4x7 + 5x6 − x5 − 3x4 + 3x3 − x − 1 8 −36264691 x − 2x7 + 3x6 − 7x4 + 17x3 − 17x2 + 11x − 1 −81415168 x8 − 4x7 + 8x6 − 8x5 + 6x3 − 2x2 − 2 8 7 6 −81415168 x − 4x + 10x − 14x5 + 10x4 + 2x3 − 10x2 + 8x − 2 −110716875 x8 − 4x7 + 5x6 − x5 − 2x4 + x3 + 4x2 − 4x − 1 −110716875 x8 − 4x7 + 7x6 − 7x5 + 4x4 − x3 − 4x2 + 4x − 1 −118370771 x8 − 2x7 − x5 + 7x3 + x − 9 8 7 6 5 −118370771 x − 3x + 7x − 10x + 11x4 − 5x3 + 3x2 − 2x − 3 −150730227 x8 − x7 + x6 + 5x5 − 11x4 + 8x3 − 7x2 + 4x − 1 −150730227 x8 − 2x7 + 2x6 − x5 + 2x4 + x3 − 5x2 + 4x − 1 −150730227 x8 − 4x7 + 7x6 − 6x5 − 3x4 + 9x3 − x2 − 5x − 1 −150730227 x8 + 2x6 − 2x5 − 3x4 − 7x3 − 11x2 − 6x − 1 8 −178453547 x − 2x7 + 3x6 + x5 − 4x4 + 12x3 − 7x2 + 2x − 9 −178453547 x8 − x7 − 2x6 + 6x5 − 11x3 + 5x2 + 8x − 3 −181398528 x8 − 2x6 − 2x5 − 2x3 − 2x2 + 1 −181398528 x8 − 6x4 + 4x2 − 3 8 7 6 5 4 −182660427 x − 4x + 7x − 7x + 4x − x3 + 4x2 − 4x + 1 −182660427 x8 − x7 + 4x6 − 4x5 − 2x4 − 4x3 − 5x2 − x + 1 8 −265847707 x − 2x7 + 3x6 + x5 − 7x4 + 18x3 − 12x2 + 4x − 1 −265847707 x8 − x7 − x6 + 5x5 − 3x4 + 2x3 − 6x2 + 9x − 1 −286557184 x8 − 5x6 + 6x4 + 3x2 − 1 −286557184 x8 + 5x6 + 6x4 − 3x2 − 1 8 7 6 5 −325660672 x − 2x + 6x − 12x + 16x4 − 10x3 − 4x2 + 4x − 1 −325660672 x8 − 2x7 + 4x6 − 12x5 + 16x4 − 18x3 + 10x2 − 1 8 −423564751 x − 2x7 + 2x6 − 4x5 − 4x4 + 20x3 − 14x2 + 13x − 4 −423564751 x8 − x7 − 2x6 + 4x5 − 9x4 + 17x3 − 21x2 + 7x − 4 −425329947 x8 − x7 − 3x6 + 2x5 + 4x4 + 3x3 − 5x2 − 7x − 3 −425329947 x8 − 6x4 − x2 − 3 −725594112 x8 − 6x4 − 4x2 − 3 8 −725594112 x − 4x6 + 12x2 − 12 8 7 6 5 4 −941391011 x − 2x − 2x + 17x − 32x + 31x3 − 15x2 + 4x − 1 −941391011 x8 + 2x6 − 9x4 + 6x2 − 11 8 7 6 5 −999406512 x − 4x + x + 11x − 11x4 − x3 + 4x2 − x − 2 8 −999406512 x − 4x7 + 7x6 − 7x5 − 2x4 + 11x3 − 2x2 − 4x − 2 −1280239375 x8 − 2x7 + 5x5 − 6x4 − 10x3 + 21x2 + 5x − 13
Signature (2,3), group T23 (continued) dK P olynomial −1280239375 x8 − 4x7 + 10x6 − 16x5 + 18x4 − 14x3 + 4x2 + x − 1 −1327373299 x8 − x7 − 2x6 − 6x5 + 10x4 + 15x3 − 6x2 − 5x − 7 −1327373299 x8 − 3x7 + 2x6 − 3x4 + 11x3 − 13x2 + 15x − 11 −1327373299 x8 − 6x6 − 5x5 + 19x4 + 21x3 − 18x2 − 36x − 5 −1327373299 x8 − x7 − 4x6 + 4x5 + 4x4 − 9x3 + 2x2 − 3x − 5 −1399680000 x8 − 6x6 + 12x4 − 6x3 − 6x2 + 18x − 3 −1399680000 x8 − 6x4 − 12x2 − 3
Signature (4,2) Group T15 dK P olynomial 8 7 11662589952 x − 4x + 2x6 − 4x5 + 12x4 + 12x3 − 4x − 2 8 7 11662589952 x − 4x + 6x6 + 8x5 − 36x4 + 32x3 + 14x2 − 24x + 1
Signature (8,0), group T15 dK P olynomial 119435644125 x8 − 4x7 − 3x6 + 23x5 − 3x4 − 37x3 + 8x2 + 15x − 5 119435644125 x8 − x7 − 11x6 + 4x5 + 21x4 − 4x3 − 11x2 + x + 1 131153375232 x8 − 12x6 + 45x4 − 54x2 + 12 131153375232 x8 − 9x6 + 24x4 − 21x2 + 3 8 186601439232 x − 14x6 + 44x4 − 46x2 + 13 186601439232 x8 − 10x6 + 32x4 − 38x2 + 13
Signature (8,0), group T23 dK P olynomial 8 6 5 21415471433 x − 15x − 8x + 66x4 + 61x3 − 57x2 − 53x + 1 21415471433 x8 − 4x7 − 4x6 + 26x5 + 2x4 − 52x3 + 31x + 1 8 60276601856 x − 2x7 − 8x6 + 8x5 + 16x4 − 8x3 − 8x2 + 2x + 1 8 60276601856 x − 4x7 − 4x6 + 24x5 + 4x4 − 42x3 + 4x2 + 22x − 7 95281280000 x8 − 2x7 − 10x6 + 14x5 + 16x4 − 22x3 − 2x2 + 8x − 2 95281280000 x8 − 14x6 − 14x5 + 38x4 + 54x3 − 10x2 − 28x − 2 8 108105297381 x − 4x7 − 5x6 + 29x5 − 14x4 − 25x3 + 10x2 + 8x + 1 108105297381 x8 − 3x7 − 9x6 + 21x5 + 33x4 − 33x3 − 54x2 − 12x + 3
References
1. G. Butler and J. McKay: The transitive groups of degree up to eleven. Comm. in Algebra 11 (1983) 863–911
2. H. Cohen, F. Diaz y Diaz and M. Olivier: Algorithmic methods for finitely generated Abelian groups. Submitted to J. of Symb. Comp. (1997)
3. H. Cohen, F. Diaz y Diaz and M. Olivier: Tables of Octic Fields with a Quartic Subfield. Submitted to Math. of Comp. (1998)
4. J. Conway, A. Hulpke and J. McKay: Names and generators for the transitive groups of degree up to 15. Preprint (1996)
5. F. Diaz y Diaz: Petits discriminants des corps de nombres totalement imaginaires de degré 8. J. of Number Th. 25 (1987) 34–52
6. Y. Eichenlaub: Problèmes effectifs de théorie de Galois en degrés 8 à 11. Thèse, Université Bordeaux I (1996)
7. Y. Eichenlaub and M. Olivier: Computation of Galois groups for polynomials with degree up to eleven. Submitted to Math. of Comp. (1997)
8. F. Gassmann: Bemerkungen zur vorstehenden Arbeit von Hurwitz. Math. Z. 25 (1926) 665–675
9. A. Jehanne: Sur les extensions de Q à groupe de Galois S4 et S̃4. Acta Arith. LXIX (1995) 259–276
10. S.-H. Kwon: Sur les discriminants minimaux des corps quaternioniens. Arch. Math. 67 (1996) 119–125
11. R. Perlis: On the equation ζK(s) = ζK′(s). J. Number Th. 9 (1977) 342–360
12. M. Pohst, J. Martinet and F. Diaz y Diaz: The Minimum Discriminant of Totally Real Octic Fields. J. Number Th. 36 (1990) 145–159
13. B. de Smit and R. Perlis: Zeta functions do not determine class numbers. Bull. Am. Math. Soc. 31 (1994) 213–215
14. G. W. Smith: Some polynomials over Q(t) and their Galois groups. Preprint (1993)
A Table of Totally Complex Number Fields of Small Discriminants
Henri Cohen, Francisco Diaz y Diaz, and Michel Olivier
Université Bordeaux I, Laboratoire A2X, 351 cours de la Libération, 33 405 Talence, France
{cohen,diaz,olivier}@math.u-bordeaux.fr
Abstract. Using the explicit class field theory developed in [3] and tables of number fields in low degree, we construct totally complex number fields of degree at most 80 with root discriminant close to Odlyzko's bounds. For some degrees, we extend and improve the table of totally complex number fields of small discriminants given by Martinet [7]. For all these fields L (with 4 exceptions) we also give the relative equation over a base field K, and the absolute equation of L/Q.
1 Introduction
The purpose of this paper is to extend and improve the table of totally complex number fields of small discriminants given by Martinet [7]. For a fixed degree N and a fixed signature (R1, R2) with N = R1 + 2R2, it is known since Minkowski that there exists a positive constant C(R1, R2) such that for any number field L of signature (R1, R2) we have |dL|^(1/N) ≥ C(R1, R2), where dL denotes the absolute discriminant of L/Q. We will call |dL|^(1/N) the root discriminant of L. Since the work of several people, among whom Stark, Odlyzko, Poitou, Serre and Diaz y Diaz, one has good values of C(R1, R2), the best ones being obtained by assuming the Generalized Riemann Hypothesis (GRH), which we will do here for the purpose of comparison (otherwise the results presented in this paper do not depend on the GRH). It is interesting to test how close the GRH bounds are to discriminants of existing number fields. Evidently, since they are obtained as consequences of exact explicit formulas, it cannot be expected that the bounds are very close, since they depend on the splitting behavior of small primes and on the ordinates of the smallest zeroes of the Dedekind zeta functions. For example, the totally real number field of degree 7 of smallest discriminant (which is known) has root discriminant 14.909% above the GRH bound. Another interest of number fields of small discriminants is the construction of dense lattices, either through the additive structure or through the multiplicative structure. Indeed, it is easy to show that the denseness of these lattices is closely related to the size of the root discriminant.
Since the number of signatures up to degree N grows quadratically with N, it is usual to consider only a subset of all possible signatures. In this paper we will restrict to totally complex number fields L (i.e. R1 = 0, R2 = N/2), both because the discriminants are the smallest in absolute value, and because they have been the most studied (for that reason). In [7], Martinet gives a table of the smallest known discriminants of totally complex number fields of degree up to 80; he does not include all possible degrees, and does not give explicit defining equations. It is essential to have these equations if one wants to work in the lattices associated to the number fields. To our knowledge, since the work of Martinet, the only improvement of his table is a paper by Leutbecher and Niklash [6] which contains, among many other things, a totally complex number field of degree 10 having smaller discriminant than the one in Martinet's table. In this paper, we will extend Martinet's table in three ways. First, we found 9 totally complex number fields in degrees 12, 16, 18, 32, 36, 40, 48, 52 and 56 with smaller discriminants than the ones given by Martinet. Secondly, we found new fields of small discriminant for degrees 64, 68 and 76 which were not included in Martinet's table. Finally, we give, for every even degree up to 32 and every degree divisible by 4 up to 80, the totally complex number field L having the smallest known root discriminant, by giving the discriminant in factored form, the root discriminant, the percentage above the GRH bounds and, except for degrees 44, 52, 68, 76 and 80, the relative equation over a base field K, and the absolute equation over Q. Note that, using optimized techniques, we have completely recomputed the GRH bounds using [8] and obtained values which are very slightly better than the published ones. These values for every possible signature up to N = 100 are available by anonymous ftp from ftp://megrez.math.u-bordeaux.fr/pub/numberfields/odlyzkobounds. For the sake of completeness, the values corresponding to totally complex fields are reproduced here.
2 Methods Used for Computing the Tables
We have proceeded as follows. As explained in [3], for a given base field K of absolute degree m, if we want to look for all Abelian extensions L of K having root discriminant less than or equal to a constant C, it is enough to consider moduli whose norm is less than or equal to C^(2m)/d(K)^2. Since this bound is in general pessimistic, and also too large for practical computations for small base fields, we have often lowered the actual bound used, at the risk of missing some fields which may be obtained by this method and have smaller root discriminant than those which have been found. Since we limit ourselves to fields L whose absolute degree is at most 80, we have chosen C to be 20, which is 10.6% above the largest GRH bound. Once the bound is chosen, we use a recursive procedure to compute all discriminants of totally complex Abelian extensions of K obtained as ray class fields
of K, using the methods and formulas of [3]. Note that we should also consider subfields of these ray class fields, but we have not done so for two reasons. First, because the amount of computation would become extremely large, but second and foremost because extensive experiments have shown that these subfields do not usually give the smallest possible discriminants (in fact never in the numerous cases that we have tried). Once interesting extensions have been detected, there remains the problem of computing defining equations. For this purpose we have used two methods (the method of Stark units mentioned in [3] and explained in [9] is not mature enough and is not applicable when the base fields are not totally real). The main method we have used is Kummer theory (see [4] and [3]). In the case where the base field is an imaginary quadratic field, we have also used the method based on elliptic functions. We thank C. Fieker for performing the computations for us using the most recent version of KANT. Note that frequently the field L is obtained through intermediate extensions and not directly as a relative extension of K. When a relative equation is obtained, it is easy to obtain an absolute equation. However, the coefficients of these equations are often large, and it is important to reduce these equations. For this, we proceed as follows. First, after improving an algorithm of C. Fieker and M. Pohst explained in [5], we have written a relative polynomial reduction algorithm analogous to the POLRED algorithm explained in [1] and [2]. The new absolute equation obtained from this is usually simpler, and we apply the usual absolute polynomial reduction algorithms to find a nice equation. This final reduction process becomes quite difficult to apply when the absolute degree of L is greater than or equal to 30 (it is already quite difficult in degree 24 and above), and hence we have not spent the considerable amounts of computation necessary to reduce the equations in these cases.
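For concreteness, the search bound mentioned at the start of this section is easy to evaluate; the following small Python sketch (our own illustration, not part of the paper) computes C^(2m)/d(K)^2 for two of the base fields appearing in the tables below, with C = 20 as in the text.

```python
# Bound C^(2m) / d(K)^2 on the norm of the moduli to enumerate, as stated at the
# beginning of this section: C is the target root discriminant, m the absolute
# degree of the base field K and d(K) its discriminant.
def modulus_norm_bound(C, m, dK):
    return C ** (2 * m) / dK ** 2

C = 20
# two base fields taken from the tables below: Q(i) (degree 8 entry, d(K) = -4)
# and the quartic field with d(K) = 257 used for the degree 80 entry
for m, dK in [(2, -4), (4, 257)]:
    print(m, dK, modulus_norm_bound(C, m, dK))
# prints 10000.0 for Q(i) and roughly 3.9e5 for the quartic field
```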
3 The Tables
The tables are presented in the following way. The smallest known totally complex number field L of degree N will be obtained as a ray class field extension of degree n of some base field K of degree m corresponding to a modulus m. We give K by a defining equation, the discriminant of K in factored form, the modulus m, the discriminant of L in factored form, the root discriminant dr(L), the Odlyzko bound, the percentage above the Odlyzko bound, the relative equation of L/K when it was obtained, and the absolute equation of L/Q. The modulus m is coded as in [7], that is, it is written as a product of primes, where Pp denotes a prime ideal of degree 1 above the prime p and pp denotes a prime ideal of degree 2 above p. In certain cases, instead of giving directly a relative equation of L/K, we pass through an intermediate extension K1/K. A comment should be made about the quality of the number fields found. In most cases, the root discriminant is less than 2% above the GRH bounds, which
is very satisfactory, both to show that the GRH bounds are sharp and that the number field found is close to optimal. In certain cases, however, the percentage above the GRH bound is much larger. For example, in degree 26, the best known polynomial is 6.468% above the GRH bound. The main reason for this is that we have had to use ray class field constructions above quadratic fields. It is very plausible that if we took as base fields number fields of degree 13, we would get much better results. Unfortunately, tables of such fields are not available (we could of course try a few individual degree 13 fields, but we have not done so). It is also for similar reasons that we do not give number fields whose degree is not divisible by 4 and greater than 32. In degrees 68 and 76, a similar phenomenon occurs: for lack of tables of number fields of degree 17 or 19, we have to be content with using a base field of degree 4, which gives percentages of relatively poor quality (3.777% and 5.656% respectively).
Degree 2 N = 2, m = 1, n = 2 K : y, d(K) = 1, m = 3 ∞1, d(L) = −3 L/K: x2 − x + 1 L/Q: x2 − x + 1 dr(L) = 1.73205080757, GRH = 1.722443139, % = 0.557793
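The three numerical quantities in each entry are related by dr(L) = |d(L)|^(1/N) and % = 100 (dr(L)/GRH − 1). As a quick sanity check (our own illustration, not part of the paper), the following Python lines recompute them for the degree 2 entry above and for the degree 8 entry below.

```python
# Recompute dr(L) = |d(L)|^(1/N) and the percentage above the GRH bound for the
# degree 2 entry (d(L) = -3) and the degree 8 entry (d(L) = 2^8 * 17^3).
entries = [(2, 3, 1.722443139), (8, 2**8 * 17**3, 5.7378391765)]
for N, abs_d, grh in entries:
    dr = abs_d ** (1.0 / N)
    pct = 100.0 * (dr / grh - 1.0)
    print(f"N = {N}: dr = {dr:.10f}, % = {pct:.6f}")
# expected: about 1.7320508076 / 0.557793 and 5.7869314938 / 0.855589
```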
m P
Degree 4 N = 4, m = 2, n = 2 K : y2 − y + 1 d(K) = −3 = 13 d(L) = 32 · 13 2 L/K: x + (y − 1)x + (−y + 1) L/Q: x4 − x3 − x2 + x + 1 dr(L) = 3.2888681680, GRH = 3.266493871, % = 0.684964
m P
Degree 6 N = 6, m = 2, n = 3 K : y2 − y + 1 d(K) = −3 = 19 d(L) = −33 · 192 3 2 L/K: x + (−y + 1)x + (−2y + 1)x − y L/Q: x6 − x5 + x4 − 2x3 + 4x2 − 3x + 1 dr(L) = 4.6218072306, GRH = 4.595330090, % = 0.576175
m P
Degree 8 N = 8, m = 2, n = 4 K : y2 + 1 d(K) = −22 = 17 d(L) = 28 · 173 4 3 2 L/K: x + (−y + 2)x + (−y + 1)x + (−y + 1)x − y L/Q: x8 − 2x7 + 4x5 − 4x4 + 3x2 − 2x + 1 dr(L) = 5.7869314938, GRH = 5.7378391765, % = 0.855589
m P
Degree 10 N = 10, m = 5, n = 2 K : y5 − y2 + 1 d(K) = 7 · 431 = 23 ∞1 d(L) = −72 · 23 · 4312 2 3 2 3 L/K: x + (−y − y + y + 1)x − y L/Q: x10 − 3x9 + 7x8 − 11x7 + 13x6 − 12x5 + 9x4 − 5x3 + 3x2 − 2x + 1 dr(L) = 6.79341030426, GRH = 6.7301825388, % = 0.939466
m P
Degree 12 N = 12, m = 6, n = 2
m P
K : y6 − y5 + 2y3 − 2y2 + 1 d(K) = 37 · 857 = 41 ∞1 ∞2 d(L) = 372 · 41 · 8572 L/K: x2 + (y5 + y4 − y3 + 2y2 + y − 1)x + (y5 − y3 + 2y2 − y − 1) L/Q: x12 − 2x11 + 2x10 − x9 + 2x8 − 5x7 + 8x6 − 7x5 + 4x4 − 3x3 + 4x2 − 3x + 1 dr(L) = 7.6664753870, GRH = 7.6023702394, % = 0.843226 Degree 14 N = 14, m = 2, n = 7 K : y2 − y + 18 d(K) = −71 = ZK d(L) = −717 L/K: x7 − 3x6 + 2x5 + x4 − 2x3 + 2x2 − x + 1 L/Q: x14 − 7x13 + 25x12 − 59x11 + 103x10 − 141x9 + 159x8 − 153x7 + 129x6 − 95x5 + 58x4 − 27x3 + 10x2 − 3x + 1 dr(L) = 8.42614977318, GRH = 8.3774697780, % = 0.581082
m
Degree 16 N = 16, m = 4, n = 4 K : y4 − y − 1 d(K) = −283 = 17 37 ∞1 ∞2 d(L) = 172 · 372 · 2834 L/K: x4 − y2 x3 + x2 − y2 x + 1 L/Q: x16 +2x14 −x13 +3x12 −4x11 +4x10 −7x9 +5x8 −7x7 +4x6 −4x5 +3x4 −x3 +2x2 +1 dr(L) = 9.17863161063, GRH = 9.0730290358, % = 1.163917
m P P
Degree 18 N = 18, m = 6, n = 3 K : y6 − 2y5 + 3y4 + y2 + 3y + 1 d(K) = −232 · 107 = 2ZK d(L) = 12 6 3 −2 · 23 · 107 L/K: x3 + (−3y5 + 8y4 − 14y3 + 7y2 − 5y − 7)x2 + (3y5 − 7y4 + 11y3 − 4y2 + 4y + 5)x + (−2y5 + 5y4 − 8y3 + 4y2 − 4y − 4) L/Q: x18 − x17 + 3x16 + 2x15 − x14 + 11x13 + 3x12 + 3x11 + 28x10 − 18x9 + 47x8 − 27x7 + 45x6 − 23x5 + 27x4 − 11x3 + 9x2 − 2x + 1 dr(L) = 9.8361823651, GRH = 9.7025076307, % = 1.377734
m
Degree 20 N = 20, m = 4, n = 5 K : y4 + 1 d(K) = 28 = 11 d(L) = 240 · 118 5 3 2 4 3 2 L/K: x + (y − y − 1)x + (−y + y + y)x3 + (−y2 − 2y + 1)x2 + (y3 + y − 1)x − y3 L/Q: x20 − 4x19 + 8x18 − 8x17 − x16 + 12x15 − 8x14 − 16x13 + 43x12 − 44x11 + 24x10 − 12x9 + 24x8 − 44x7 + 48x6 − 36x5 + 21x4 − 12x3 + 8x2 − 4x + 1 dr(L) = 10.43799454111, GRH = 10.2763715085, % = 1.572763
m p
Degree 22 N = 22, m = 2, n = 11 K : y2 − y + 2 d(K) = −7 = 23 d(L) = −711 · 2310 11 10 9 8 L/K: x + (−y − 2)x + (y + 2)x − 5x + (3y + 6)x7 + (−7y − 1)x6 + 7yx5 + (−9y − 3)x4 + (12y − 1)x3 + (−9y + 6)x2 + (3y − 5)x + 1 L/Q: x22 − 5x21 + 13x20 − 26x19 + 48x18 − 82x17 + 127x16 − 179x15 + 238x14 − 309x13 + 391x12 − 475x11 + 560x10 − 644x9 + 703x8 − 690x7 + 578x6 − 398x5 + 220x4 − 95x3 + 31x2 − 7x + 1 dr(L) = 11.0031293437, GRH = 10.8028794413, % = 1.853672
m P
Degree 24 N = 24, m = 4, n = 6 K : y4 − y3 − y2 + y + 1
d(K) = 32 · 13
m = P397
d(L) = 312 · 136 · 3975
L/K: x6 + (−2y3 + 2y2 − 2)x5 + (y3 − 2y2 − y − 1)x4 + (4y3 − y2 − 2y + 1)x3 + (−2y3 + 5y2 + y − 1)x2 + (−y3 − 4y2 + 3y + 3)x + (y2 − y − 1) L/Q: x24 − 6x23 + 22x22 − 62x21 + 146x20 − 295x19 + 522x18 − 829x17 + 1191x16 − 1559x15 +1874x14 −2078x13 +2127x12 −2007x11 +1752x10 −1403x9 +1023x8 −683x7 + 407x6 − 216x5 + 103x4 − 41x3 + 15x2 − 4x + 1 dr(L) = 11.4409254140, GRH = 11.2886417987, % = 1.348999 Degree 261 d(L) = −239 · 15357612 · 7036903 L/Q: x26 − x25 + 3x24 − 4x23 + 6x22 − 8x21 + 9x20 − 12x19 + 12x18 − 14x17 + 14x16 − 14x15 + 15x14 − 13x13 + 15x12 − 14x11 + 14x10 − 14x9 + 12x8 − 12x7 + 9x6 − 8x5 + 6x4 − 4x3 + 3x2 − x + 1 dr(L) = 12.41851167599, GRH : 11.7390198188, % = 5.788319 Degree 28 N = 28, m = 4, n = 7 K : y4 + 2y2 − 2y + 1 d(K) = 24 · 37 = 71 d(L) = 228 · 377 · 716 L/K: x7 + (−3y3 − 3y2 − 9y − 3)x6 + (23y3 + 18y2 + 55y − 15)x5 + (−66y3 − 30y2 − 124y + 82)x4 + (75y3 − 3y2 + 126y − 148)x3 + (−20y3 + 31y2 − 29y + 115)x2 + (−14y3 − 18y2 − 35y − 32)x + (6y3 + 5y2 + 16y) L/Q: x28 − 6x27 + 14x26 − 12x25 − 15x24 + 64x23 − 94x22 + 38x21 + 106x20 − 230x19 + 198x18 + 20x17 − 268x16 + 324x15 − 128x14 − 132x13 + 241x12 − 164x11 + 6x10 + 82x9 − 68x8 + 28x7 − 2x6 − 10x5 + 9x4 − 2x3 + 1 dr(L) = 12.2964187438, GRH = 12.15841433838, % = 1.135053
m P
Degree 30 N = 30, m = 5, n = 6 K : y5 − y − 1 d(K) = 19 · 151 = 307 ∞1 d(L) = −196 · 1516 · 3075 6 3 2 5 4 3 2 L/K: x + (−3y + y − 2y − 1)x + (6y + y + 6y + 3y − 2)x4 + (−2y4 − 7y3 − 8y2 − 8y − 9)x3 + (5y4 + 9y3 + 6y2 + 12y + 6)x2 + (−5y4 − 5y3 − 5y2 − 7y − 2)x + (2y4 + y3 + 2y2 + 2y) L/Q: x30 − 5x29 + 13x28 − 20x27 + 22x26 − 36x25 + 77x24 − 141x23 + 211x22 − 237x21 + 247x20 − 329x19 + 456x18 − 543x17 + 580x16 − 538x15 + 327x14 − 54x13 − 34x12 − 85x11 + 176x10 − 109x9 + 16x8 + x7 + 13x6 − 9x5 − 4x3 + 9x2 − 5x + 1 dr(L) = 12.76642129721, GRH = 12.5504775347, % = 1.720602
m P
Degree 32 N = 32, m = 4, n = 8 K : y4 − y3 + 2y + 1 d(K) = 33 · 7 = 3 13 d(L) = 328 · 78 · 1314 8 3 2 6 2 4 L/K: x + (7y − 7y + 5y + 7)x + (−2y − 5y − 5)x + (−9y3 + 12y2 − 24y − 21)x2 + (12y2 − 9y − 9) L/Q: x32 − 5x31 + 17x30 − 40x29 + 77x28 − 131x27 + 200x26 − 295x25 + 385x24 − 496x23 + 575x22 −647x21 +669x20 −585x19 +561x18 −292x17 +323x16 +52x15 +162x14 +183x13 + 111x12 + 146x11 + 92x10 + 67x9 + 31x8 + 22x7 + 11x6 + 11x5 + 11x4 + 7x3 + 8x2 + 5x + 1 dr(L) = 13.06489201515, GRH = 12.9182704463, % = 1.134994
m Pp
Degree 36 N = 36, m = 4, n = 9 K : y4 − y3 + 31y2 − 24y + 252 1
d(K) = 32 · 4057
m = ZK
d(L) = 318 · 40579
The polynomial given here was found by D. Simon at Bordeaux after this paper was submitted. The corresponding number field has a root discriminant nearer to Odlyzko's bound than our field.
L/K: x9 + (−1/48y3 − 5/48y2 − 13/48y − 9/8)x8 + (1/16y3 − 11/16y2 + 29/16y − 93/8)x7 + (1/24y3 + 5/24y2 + 37/24y + 1/4)x6 + (−1/3y3 + 1/3y2 − 13/3y + 3)x5 + (−1/8y3 + 3/8y2 − 29/8y + 17/4)x4 + (−1/12y3 + 7/12y2 − 25/12y + 9/2)x3 + (3/16y3 − 1/16y2 + 39/16y + 17/8)x2 − x + 1 L/Q: x36 + 2x35 − x34 − 6x33 − 10x32 − 7x31 + 6x30 + 16x29 + 64x28 + 18x27 − 72x26 − 119x25 +140x24 +20x23 +96x22 −528x21 +429x20 −237x19 +613x18 −533x17 +1151x16 − 484x15 +664x14 −464x13 +161x12 +1006x11 −1324x10 +716x9 −36x8 −239x7 +245x6 − 197x5 + 121x4 − 55x3 + 17x2 − 4x + 1 dr(L) = 13.8233046436, GRH = 13.5910188106, % = 1.709113 Degree 40 N = 40, m = 2, n = 20 K : y2 + 2 d(K) = −23 = 3 03 11 d(L) = 260 · 320 · 1118 20 19 18 17 L/K: x − yx + (y − 3)x + (y + 2)x + (−2y + 2)x16 + (−y − 5)x14 + 2x13 + (−8y − 9)x12 − 6yx11 + (−12y + 24)x10 + (6y + 29)x9 + (15y + 21)x8 + (18y − 15)x7 + (2y − 13)x6 + (−7y − 12)x5 − yx4 + (−y + 1)x3 + (−y − 1)x + 1 L/Q: x40 − 4x38 + 11x36 − 24x34 + 8x33 + 20x32 − 4x31 + 90x30 + 22x29 − 143x28 − 164x27 + 294x26 + 330x25 + 107x24 − 224x23 − 274x22 − 272x21 − 62x20 + 548x19 + 881x18 + 388x17 − 59x16 − 444x15 − 359x14 − 264x13 + 13x12 + 166x11 + 98x10 − 50x9 + 6x8 − 8x7 + 29x6 − 20x5 + 2x4 + 2x3 + 3x2 − 2x + 1 dr(L) = 14.41226799431, GRH = 14.19319756384, % = 1.543489
m PPP
Degree 44 N = 44, m = 4, n = 11 K : y4 − y3 + 2y + 1 d(K) = 33 · 7 = 463 d(L) = 333 · 711 · 46310 dr(L) = 14.9599214311, GRH = 14.7371519769, % = 1.511618
m P
Degree 48 N = 48, m = 4, n = 12 K : y4 −y3 +4y2 +3y +9 d(K) = 32 ·132 = 2 5 d(L) = 216 ·324 ·520 ·1324 L/K: x12 +(−7/12y3 −2/3y2 −7/3y −7/4)x11 +(11/12y3 −14/3y2 −19/3y −49/4)x10 + (49/6y3 − 8/3y2 + 5/3y − 33/2)x9 + (55/6y3 + 76/3y2 + 74/3y + 33/2)x8 + (−49/12y3 + 124/3y2 + 176/3y + 303/4)x7 + (−265/12y3 + 58/3y2 + 77/3y + 375/4)x6 + (−89/4y3 − 22y2 − 29y + 5/4)x5 + (−5y3 − 33y2 − 42y − 41)x4 + (47/12y3 − 47/3y2 − 70/3y − 149/4)x3 + (13/3y3 − 10/3y2 − 5/3y − 13)x2 + (7/6y3 + 4/3y2 + 2/3y − 3/2)x + (1/6y3 + 1/3y2 + 2/3y + 1/2) L/Q: x48 + 7x47 + 25x46 + 71x45 + 175x44 + 347x43 + 572x42 + 888x41 + 1362x40 + 1986x39 + 3151x38 + 5481x37 + 7847x36 + 9363x35 + 11957x34 + 15267x33 + 18675x32 + 28428x31 + 40477x30 + 34765x29 + 21534x28 + 37241x27 + 67183x26 + 58487x25 + 27439x24 +31600x23 +55433x22 +41601x21 +5795x20 −1065x19 +13853x18 +12575x17 − 1237x16 − 5312x15 − 467x14 + 1315x13 − 17x12 − 414x11 + 206x10 + 378x9 + 166x8 + 7x7 + 31x6 + 55x5 + 32x4 + 6x3 + x2 + x + 1 dr(L) = 15.3855539416, GRH = 15.2323247026, % = 1.005948
m pp
Degree 52 N = 52, m = 4, n = 13 K : y4 −2y3 +21y2 −20y +68 d(K) = 26 ·1009 = ZK dr(L) = 15.9410816066, GRH = 15.6860889956, % = 1.625597
m
d(L) = 278 ·100913
Degree 56 N = 56, m = 4, n = 14 K : y4 − y3 − 2y + 8
d(K) = 22 · 33 · 241
m=P
3 2
d(L) = 249 · 342 · 24114
L/K: x14 + (1/2y3 + 1/2y2 − 4y)x13 + (−7/2y3 + 15/2y2 − 5y + 14)x12 + (−5/2y3 + 25/2y2 − 13y − 46)x11 + (−41/2y3 + 33/2y2 + 36y + 4)x10 + (13/2y3 + 9/2y2 + 25y − 172)x9 + (−34y3 − 10y2 + 92y + 15)x8 + (61/2y3 − 71/2y2 + 93y − 226)x7 + (3y3 − 97y2 + 109y + 94)x6 + (63y3 − 77y2 + 13y + 32)x5 + (15y3 − 50y2 − 47y + 202)x4 + (22y3 − 5y2 − 46y + 44)x3 + (−1/2y3 − 3/2y2 − 13y + 44)x2 + (3/2y3 − 1/2y2 − 4y)x + 1 L/Q: x56 + 7x54 + 24x52 + 9x51 + 50x50 − 96x49 − 231x48 − 767x47 − 522x46 + 1561x45 + 3671x44 + 4986x43 − 24x42 − 23635x41 − 19107x40 + 26056x39 + 17348x38 − 10245x37 + 22417x36 + 64623x35 + 11484x34 − 200511x33 − 51054x32 + 413079x31 − 100323x30 − 842611x29 +117846x28 +1319851x27 +405558x26 −1372689x25 −1172107x24 +799649x23 + 1695840x22 +374529x21 −1300189x20 −1033710x19 +387818x18 +463217x17 −292251x16 − 269473x15 + 201823x14 + 461967x13 + 294401x12 − 11411x11 − 40950x10 + 14659x9 + 5655x8 + 1275x7 + 1995x6 − 298x5 − 363x4 − x3 + 23x2 + 6x + 1 dr(L) = 16.4720043344, GRH = 16.1043051623, % = 2.283235 Degree 60 N = 60, m = 4, n = 15 K : y4 − y3 − 2y2 + 3 d(K) = 32 · 37 = 19 d(L) = 330 · 1928 · 3715 15 3 2 14 3 2 L/K: x + (−8y − 2y + 10y + 12)x + (y − 12y − 4y − 3)x13 + (105y3 + 47y2 − 175y − 214)x12 + (199y3 + 158y2 − 185y − 423)x11 + (−459y3 − 154y2 + 862y + 1182)x10 + (−1666y3 − 838y2 + 1942y + 3117)x9 + (1617y3 + 72y2 − 3165y − 4074)x8 + (6709y3 + 3301y2 −8399y−12687)x7 +(−3220y3 +448y2 +7403y+8961)x6 +(−14535y3 −6455y2 + 19179y + 28338)x5 + (−49y3 − 2486y2 − 3871y − 3342)x4 + (4963y3 + 1550y2 − 7013y − 9524)x3 + (3796y3 + 563y2 − 7849y − 9282)x2 + (9689y3 + 5226y2 − 12089y − 19194)x + (−1398y3 − 165y2 + 2831y + 3442) L/Q: x60 +10x59 +37x58 +46x57 −75x56 −269x55 +27x54 +1097x53 +1214x52 −1780x51 − 4206x50 + 1575x49 + 10806x48 + 3321x47 − 20476x46 − 21268x45 + 25388x44 + 49225x43 − 3910x42 − 58388x41 − 5946x40 − 49016x39 − 12871x38 + 186537x37 + 388195x36 − 555813x35 −813507x34 +56424x33 +2686112x32 −712674x31 −2606256x30 −2175134x29 + 7476159x28 − 1622459x27 − 2692394x26 − 7284708x25 + 14342600x24 − 6382068x23 + 2822924x22 − 11565293x21 + 12450662x20 + 978554x19 − 9322316x18 + 2707565x17 + 5477798x16 −4069228x15 −894005x14 +1975761x13 −403570x12 −428317x11 +161042x10 + 67809x9 − 14563x8 − 12779x7 + 2955x6 + 2343x5 − 558x4 − 218x3 + 127x2 − 19x + 1 dr(L) = 16.8796228533, GRH = 16.4917030908, % = 2.352212
m p
Degree 64 N = 64, m = 4, n = 16 K : y4 − 2y3 − 2y + 5 d(K) = 26 · 13 = 32 3 03 d(L) = 2128 · 348 · 1316 16 3 2 15 3 2 L/K: x + (−16y + 4y + 8y + 44)x + (104y − 46y − 74y − 324)x14 + (216y3 + 12y2 − 4y − 500)x13 + (−2079y3 + 651y2 + 1131y + 6125)x12 + (1856y3 − 1192y2 − 1832y − 6540)x11 + (6224y3 − 1182y2 − 2374y − 16988)x10 + (−10233y3 + 3891y2 + 6553y + 31337)x9 + (−697y3 − 1302y2 − 1726y − 619)x8 + (7964y3 − 2084y2 − 3748y − 22708)x7 + (−4191y3 + 1655y2 + 2741y + 12933)x6 + (1056y3 − 632y2 − 992y − 3628)x5 + (180y3 + 123y2 + 162y − 214)x4 + (−433y3 + 75y2 + 147y + 1159)x3 + (293y3 − 82y2 − 146y − 837)x2 + (−90y3 + 37y2 + 62y + 283)x + (−6y3 + 13) L/Q: x64 − 16x63 + 148x62 − 984x61 + 5204x60 − 23008x59 + 88052x58 − 298124x57 + 907746x56 − 2514344x55 + 6395456x54 − 15054016x53 + 33041716x52 − 68149264x51 + 133176154x50 − 248547252x49 + 445915596x48 − 771961928x47 + 1290208116x46 − 2077591740x45 + 3213358850x44 − 4760474288x43 + 6740970516x42 − 9107529416x41 + 11715979780x40 −14324538888x39 +16649722294x38 −18489120556x37 +19841696568x36 − 20928451724x35 +22048245450x34 −23329999988x33 +24569577446x32 −25287735528x31 + 24982192532x30 −23317244656x29 +20296007468x28 −16211263464x27 +11646125770x26 −
m Ppp
7285750840x25 + 3745909126x24 − 1317074608x23 − 89926728x22 + 801153924x21 − 1070585098x20 + 1027689256x19 − 750812554x18 + 405964220x17 − 134792341x16 − 12643528x15 +70491898x14 −69798408x13 +43382222x12 −16437836x11 +2088690x10 + 2564252x9 −2324514x8 +1247320x7 −373830x6 +46236x5 +29860x4 −20864x3 +9622x2 − 2880x + 397 dr(L) = 17.31357571165, GRH = 16.8521519243, % = 2.738070 Degree 68 N = 68, m = 4, n = 17 K: y4 − y + 1 d(K) = 229 = 647 d(L) = 22917 · 64716 dr(L) = 17.8380802376, GRH = 17.1888544112, % = 3.777016
m P
Degree 72 N = 72, m = 4, n = 18 K : y4 + 1 d(K) = 28 = 577 d(L) = 2144 · 57717 18 3 2 17 3 L/K: x − (10y + 7y + y − 3)x − (17y + 58y2 + 91y + 10)x16 − (646y3 − 1672y2 + 1894y − 88)x15 + (591y3 − 1983y2 + 5403y − 5591)x14 + (−17767y3 + 37293y2 − 21757y + 4281)x13 +(−1451989y3 +953818y2 +119069y−1090994)x12 +(3190683y3 −5877491y2 + 5098137y − 1306195)x11 + (−26599655y3 + 37464015y2 − 26483478y − 58084)x10 − (407075367y3 −573387591y2 +403940890y+2270006)x9 +(619132777y3 −850876509y2 + 584163098y + 24614772)x8 + (592356855y3 + 1328523115y2 − 2471081009y + 2166098035)x7 + (7800677025y3 + 27954300280y2 − 47333921882y + 38986042155)x6 + (162089754270y3 − 204244324605y2 + 126755388209y + 24985198246)x5 − (77852060842y3 − 146334786767y2 + 129096576321y − 36235364814)x4 + (329274157643y3 − 589411894650y2 + 504280131618y − 123747905748)x3 − (19664357803y3 − 198050048014y2 + 260420708061y − 170240448043)x2 + (82385860238y3 − 451202797445y2 + 555711255098y − 334691596501)x + (247471898361y3 − 121326572078y2 − 75890214657y + 228651542887) L/Q: x72 + 12x71 + 152x70 + 3488x69 + 30257x68 + 364332x67 + 12268024x66 + 139910168x65 + 1288102264x64 + 18724236696x63 + 161911222284x62 + 1683107717960x61 + 32679649469296x60 + 373240693040164x59 + 58 57 3296191053266432x + 28212091660883444x + 188216287154933386x56 + 1056612755483073508x55 + 6821432170056881116x54 + 43772098823905528992x53 + 240812354654079672656x52 + 1237565713922393565524x51 + 5562913588888281806906x50 + 18864843280711079675584x49 + 45560004259483619102538x48 + 82505496889751639163980x47 + 299714324876238245579180x46 + 2852531920264595224517204x45 + 22986761837686523315292768x44 +141161493887026894230111212x43 + 707092270739003339874674450x42 + 3044340585503231730516353400x41 + 11617220452543804239821064974x40 +40!059790243399604768626461292x39 + 126662802373823337076998208332x38 + 370465763667503417338061369288x37 + 1009755069523221888485479162864x36 +2578048250828731439608558447564x35 + 6190274144554536387411590751512x34 + 14020799514845961862492048379616x33 + 30026173689133157560196450198434x32 + 60904170402984085738245558927100x31 + 117153859492914967543272540245744x30 +213900678149921246044936472716352x29 + 370902100847465564073341180573752x28 +610951657992400557998997597240968x27 + 955946578169090135641325278933990x26 +1420584617855117381669976753865272x25 + 2003925482350390659066822187138742x24 +2681651440495162326479856150804272x23 + 3401511057616793757665547023483436x22 +4085089015005293196057477686601688x21 + 4640270244498445873358874553990138x20 +4976162663320426358234392889213744x19 + 5031210450944746799752078622965352x18 +4782626748544873329680413356313408x17! +
m P
4263945989218703027366084651345768x16 +3549867798215055899282781820690204x15 + 2747202368238217519265567372133112x14 +1963818090109919245327229142293044x13 + 1287663376817944644060907735821961x12 +767959042615169074803616242563504x11 + 412490214458326686433652705510922x10 + 197288506738666695848571604361948x9 + 82872265348689029062530016836455x8 + 30060143890232630390315590311036x7 + 9234912627781582916641908697716x6 + 2343411674504523779697954335928x5 + 483905066337894263861078168502x4 + 77567453405027786965053453688x3 + 8647139603109548792721591036x2 + 564799741054021199470795860x + 15809712786173624251723561 dr(L) = 17.9475131639, GRH = 17.5044897217, % = 2.530913 Degree 76 N = 76, m = 4, n = 19 K: y4 − 2y3 + 21y2 − 20y + 32 d(K) = 172 · 433 = ZK dr(L) = 18.8081653755, GRH = 17.8013202253, % = 5.656014
m
d(L) = 1738 · 43319
Degree 80 N = 80, m = 4, n = 20 K: y4 + y2 − y + 1 d(K) = 257 = 641 d(L) = 25720 · 64119 L/K: x20 + (6y3 + 12y2 + 11y + 10x19 + (122y3 + 101y2 + 150y − 64)x18 + (685y3 + 152y2 + 462y − 918)x17 + (947y3 − 1199y2 − 923y − 3398)x16 + (−3604y3 − 6502y2 − 8503y − 4081)x15 + (−17412y3 − 14363y2 − 22074y + 7022)x14 + (−34569y3 − 15201y2 − 29258y + 35252)x13 + (−39088y3 + 699y2 − 13952y + 67874)x12 + (−19396y3 + 31043y2 + 27196y + 81458)x11 + (24606y3 + 64190y2 + 81800y + 59810)x10 + (79668y3 + 84542y2 + 123553y − 2072)x9 + (111121y3 + 70378y2 + 117622y − 82980)x8 + (91926y3 + 22013y2 + 56727y − 125519)x7 + (40486y3 − 19366y2 − 15141y − 100878)x6 + (−1541y3 − 33826y2 − 43647y −52688)x5 +(−12353y3 −28686y2 −36231y −17101)x4 +(−11231y3 −14451y2 − 19742y − 3249)x3 + (−5123y3 − 6193y2 − 7166y − 36)x2 + (−1801y3 − 1285y2 − 1999y + 603)x + (−139y3 − 183y2 − 139y + 202) L/Q: x80 +34x79 +581x78 +6543x77 +53869x76 +340641x75 +1689241x74 +6542599x73 + 18901578x72 + 33517881x71 − 13647014x70 − 367819537x69 − 1567676193x68 − 3994021115x67 −5420232876x66 +5004424142x65 +53457817110x64 +171379541438x63 + 328449257966x62 + 267253766323x61 − 707440947598x60 − 3658306054182x59 − 8938820603447x58 − 13314203808787x57 − 5737328788161x56 + 34171746068889x55 + 125740136059400x54 + 254207859883948x53 + 316384198953795x52 + 80065990857116x51 − 762198218647921x50 − 2350069358074344x49 − 4211705209922402x48 − 4824565558574113x47 − 1653043433357809x46 + 7787215207391369x45 + 23685434618610528x44 + 41279871164910640x43 + 49558309145408389x42 + 33325057779269493x41 − 2051272390217648!7x40 − 113839224069372179x39 − 228815002602117610x38 − 323915396652114978x37 − 337965834715596619x36 − 205380455543079065x35 + 115046961862625290x34 + 601263759107536821x33 + 1135654718099107546x32 + 1509381506768577092x31 + 1490341171862413076x30 + 948612363659112474x29 − 20477283903105652x28 − 1068959054500679494x27 −1724060241976981233x26 −1629731372268309199x25 − 761331158996540162x24 + 535386704066533113x23 + 1723032391939006310x22 + 2360523689350512610x21 +2314068496152744373x20 +1766582338444309777x19 + 1052067665851235327x18 + 453963382245646072x17 + 95826664603922826x16 − 45552126867855222x15 − 60517604243103603x14 − 32761669444193750x13 − 7490119288625120x12 + 3812969951188627x11 + 5390015215845779x10 + 3533018329602310x9 +1654!398129081621x8 +604698611486903x7 +
m P
178842175271649x6 + 44427085090496x5 + 9865942652298x4 + 2056780339178x3 + 389076025908x2 + 53633378920x + 4890723961 dr(L) = 18.5828389409, GRH = 18.0812725668, % = 2.773955
References
1. H. Cohen: A Course in Computational Algebraic Number Theory. GTM 138, Springer-Verlag (1993)
2. H. Cohen and F. Diaz y Diaz: A polynomial reduction algorithm. Sém. Th. Nombres Bordeaux (Série 2) 3 (1991) 351–360
3. H. Cohen, F. Diaz y Diaz and M. Olivier: Computing ray class groups, conductors and discriminants. Math. Comp. To appear
4. M. Daberkow and M. Pohst: Computations with relative extensions of number fields with an application to the construction of Hilbert class fields. Proc. ISSAC '95, ACM Press (1995) 68–76
5. C. Fieker and M. Pohst: On lattices over number fields. Algorithmic Number Theory Symposium II, Lecture Notes in Computer Science 1122, Springer-Verlag (1996) 133–139
6. A. Leutbecher and G. Niklash: On cliques of exceptional units and Lenstra's construction of Euclidean fields. Journées arithmétiques 1987, Lecture Notes in Math. 1380, Springer-Verlag (1989) 150–178
7. J. Martinet: Petits discriminants des corps de nombres. Journées arithmétiques 1980, London Math. Soc. Lecture Note Ser. 56, Cambridge Univ. Press (1982) 151–193
8. A. Odlyzko: Bounds for discriminants and related estimates for class numbers, regulators and zeros of zeta functions: a survey of recent results. Sém. Th. des Nombres Bordeaux (Série 2) 2 (1990) 119–141
9. X. Roblot: Unités de Stark et corps de classes de Hilbert. C. R. Acad. Sci. Paris 323 (1996) 1165–1168
Generating Arithmetically Equivalent Number Fields with Elliptic Curves
Bart de Smit
Rijksuniversiteit Leiden, Postbus 9512, 2300 RA Leiden, The Netherlands
[email protected]
Abstract. In this note we address the question whether for a given prime number p, the zeta-function of a number field always determines the p-part of its class number. The answer is known to be no for p = 2. Using torsion points on elliptic curves we give for each odd prime p an explicit family of pairs of non-isomorphic number fields of degree 2p + 2 which have the same zeta-function and which satisfy a necessary condition for the fields to have distinct p-class numbers. By computing class numbers of fields in this family for p = 3 we find examples of fields with the same zeta-function whose class numbers differ by a factor 3.
1 Introduction
Two fields are said to be arithmetically equivalent if they have the same zeta-function. The easiest examples of non-isomorphic arithmetically equivalent fields are the fields K = Q(a^(1/8)) and K' = Q((16a)^(1/8)), where a is any integer for which both |a| and 2|a| are not squares. One can show that the class number quotient h(K)/h(K') is 1 or 2 or 1/2; see [4]. By actually computing the class numbers for some small a one finds that all three values occur [5]. The question we will address in this paper is the following. For a given odd prime number p, do there exist arithmetically equivalent number fields for which the p-parts of the class numbers are distinct? We expect the answer to be yes for all p. In this paper we will construct, for each prime p > 2, a family of pairs of fields of degree 2p + 2 which have the same zeta-function but which also satisfy a necessary condition for the class numbers to have distinct p-parts. By computing class groups of some fields in the family for p = 3 of relatively small discriminant, we found examples which settle the question in the affirmative for p = 3. To find examples for larger p by this method will require a considerable amount of computation with class groups or units of fields of degree at least 12. We hope that the families of fields given in this paper will provide interesting testing material for those working on improving the performance of software for computing class groups and units. In Section 2 we will describe the necessary combinatorial conditions that the Galois groups of arithmetically equivalent fields have to satisfy in order to have
any hope that they may have distinct p-parts of the class numbers. Since we want to compute class numbers, we want our fields to have small degree. The smallest degree for which we could produce the right combinatorial setting is 2p + 2. For p = 3, 5 and 7 we know that this degree is minimal. Since our construction is based on the group G = GL2(F_p), we can find our fields in any Galois extension of Q with Galois group GL2(F_p). It is well known that the group GL2(F_p) can be realized as a Galois group over Q by adjoining the coordinates of p-torsion points of an elliptic curve. These torsion points are described by explicit division polynomials. In Section 3 we show how one can produce the equations for our particular subfields. We can control the discriminant of the fields we obtain by starting with an elliptic curve with small conductor. In Section 4 we address the issue of deciding when two arithmetically equivalent fields have the same p-class number, and we give a small table of results for p = 3.
2 Group Theoretic Setting
Let N be a finite Galois extension of Q with Galois group G. By Galois theory, the category of fields that can be embedded in N is anti-equivalent to the category of transitive G-sets X. Under this equivalence a field K corresponds to the set of field embeddings of K in N. By the formalism of the Artin L-function, two such fields have the same zeta-function if and only if for the corresponding G-sets X and X' we have an isomorphism of C[G]-modules C[X] ≅ C[X']; see [2]. This last condition is also equivalent to the Q[G]-modules Q[X] and Q[X'] being isomorphic (cf. [2, p. 110]). One can show that the two number fields must have isomorphic p-parts of the class group if we have a Z_p[G]-module isomorphism Z_p[X] ≅ Z_p[X']; see [8], [9]. We sketch a short proof: if C_N is the idele class group of N, and U_N denotes the group of ideles which are units at the finite primes, then we have a canonical map f: U_N → C_N. For a subgroup H of G we have U_{N^H} = U_N^H and C_{N^H} = C_N^H. The p-part of the class group of N^H is the cokernel of the map that we get by applying the functor Hom_{Z_p[G]}(Z_p[G/H], Z_p ⊗_Z −) to f, so it depends only on the field N and the Z_p[G]-module Z_p[G/H]. Thus, our first, purely combinatorial, task is to find, for given p, a finite group G and two transitive G-sets X and X' of smallest cardinality possible so that
(∗)  Q[X] ≅ Q[X'] as Q[G]-modules, but Z_p[X] and Z_p[X'] are not isomorphic as Z_p[G]-modules.
The key to our construction is to consider the standard action of the group G = GL2(F_p) on the set V of column vectors of length 2 over F_p. Let V* = Hom(V, F_p) be the dual of V with G-action given by (gϕ)(x) = ϕ(g^(-1)x) for g ∈ G, x ∈ V and ϕ ∈ V*. The character of the representation C[V] of G assigns to each element g ∈ G the number of points of V fixed under g. For g ∈ G the numbers of fixed points in V and V* are the same, so it follows that
C[V] ≅ C[V*] as C[G]-modules. Taking out the trivial representation, i.e., the zero-elements of V and V*, and changing scalars we get Q[V\{0}] ≅ Q[V*\{0}] as Q[G]-modules. Note that the G-sets V\{0} and V*\{0} are transitive of order p^2 − 1. If p > 2 then the stabilizer of a point of V\{0} fixes no element of V*\{0}, so that the G-sets are not isomorphic. Thus, the G-sets give non-isomorphic arithmetically equivalent fields. The degree of these fields is the cardinality of V\{0}, which is p^2 − 1. Note that the group F_p* is embedded in G as the scalar multiplications on V. To find fields of smaller degree we consider the action of subgroups S of F_p*. Since S lies in the center of G, we have a quotient G-set X/S for any G-set X. We now consider X = (V\{0})/S and X' = (V*\{0})/S. We can also take the quotient by S for G-modules, so Q[X] ≅ Q[V\{0}]/S ≅ Q[V*\{0}]/S ≅ Q[X'] as Q[G]-modules. The stabilizers of elements of X are the conjugates of the subgroup H of G consisting of the upper triangular matrices whose upper left entry lies in S, and the stabilizers of the elements of X' are the conjugates of the subgroup H' of upper triangular matrices whose lower right entry lies in S. Note that both H and H' have only one stable 1-dimensional subspace of V. If S ≠ F_p* then the number of orbits of H and H' on their stable lines is not the same, so that X and X' are not isomorphic as G-sets. For p > 2 and S = F_p*^2 we thus obtain non-isomorphic arithmetically equivalent fields of degree 2p + 2. In order to check that Z_p[X] and Z_p[X'] are not isomorphic as Z_p[G]-modules, we consider the subgroup H of G consisting of the upper triangular matrices whose upper left entry is 1. Note that H has orbit lengths 1, 1, 2p on X and 2, p, p on X'. This implies that the Z_p[G]-modules Z_p[X] and Z_p[X'] have distinct Tate cohomology groups Ĥ^0(H, −), where Ĥ^0(H, M) = M^H / (Σ_{h∈H} h)M. This completes the group-theoretic part of the construction. One can summarize as follows:
Proposition 1. Suppose p is an odd prime number. Let G = GL2(F_p), and let H and H' be the subgroups of upper triangular matrices in G whose upper left entry, respectively lower right entry, is a square in F_p*. Then H and H' have index 2p + 2 in G, and the G-sets X = G/H and X' = G/H' satisfy (∗).
For p = 3, 5 and 7 we have checked computationally that the degree, i.e., the cardinality of the G-sets X and X' in this proposition, is minimal by using the classification of transitive groups of degree up to 15. Moreover, for p = 3 and for p = 5 we know that the configuration in the proposition is the only one with this minimal degree. It would be nice to have a more conceptual proof of these statements which may also say something for larger p. For p = 2 our construction fails because then F_p* has no proper subgroups. The smallest degree in this case is obtained in the same way by taking G = GL3(F_2) rather than GL2(F_2). This leads to number fields of degree 7 as in [8]. In this case G is the simple group of order 168, and it is quite some work [6] to realize this group as a Galois group over Q and find explicit equations [1]. An
Arithmetical Equivalence and Elliptic Curves
395
example of such fields with distinct 2-parts of the class numbers has been found by Wieb Bosma and the author: x7 + 8x6 + x5 − 15x4 + 13x3 + 8x2 − 20x + 8 x7 + 24x6 + 194x5 + 604x4 + 653x3 + 816x2 + 359x + 212 . These polynomials define two arithmetically equivalent fields with class numbers 2 and 1 respectively.
3
Realization as Number Fields
It is well known that we can realize the group GL(2, IFp ) as a Galois group over Q by considering p-torsion points on elliptic curves. Such a Galois extension of Q always contains a p-th root of unity, so the families of fields obtained in this way are somewhat limited. In this section E denotes an elliptic curve E: y2 = x3 + ax + b,
d = 4a3 + 27b2 6= 0
¯ of p-torsion points is a vector with coefficients a, b ∈ Q. The set V = E(Q)[p] ¯ acts linearly. space of dimension 2 over IFp on which the Galois group Gal(Q/Q) This means that we have a group homomorphism ¯ ∼ ¯ → Aut(E(Q)[p]) ρ: Gal(Q/Q) = GL2 (IFp ) . We will assume that a and b are chosen in such a way that ρ is surjective. This is true generically (see [10, Rem. 6.7] or [7, Chap. 6, §3]) and by Hilbert’s irreducibility theorem the pairs (a, b) for which ρ is not surjective form a “thin” set. Let us first consider the particularly easy case that p = 3. We take X = V \{0} and X 0 = V ∗ \{0}. The field corresponding to X is obtained by adjoining both coordinates of a non-trivial 3-torsion point of E. Writing µ3 for the group ¯ we have isomorphisms of Galois representations of third roots of unity in Q, V ∗ = Hom(V, IF3 ) ∼ = Hom(V, µ3 )⊗µ3 ∼ = V ⊗µ3 . The first isomorphism holds because µ3 ⊗µ3 has trivial Galois action. The second isomorphism is due to the Weil-pairing [11, Chap. 3, §8]. It follows that we get V ∗ as a Galois representation by twisting √ V with the quadratic character associated with of the number field Q(µ3 ) = Q( −3). But it is also possible to twist the entire elliptic curve by a quadratic character, that is, we have ¯ V ⊗µ3 ∼ = E 0 (Q)[3], where E 0 is the twist of E given by E 0 : −3y2 = x3 + 3ax + b .
396
Bart de Smit
Thus, the number field corresponding to X 0 is obtained by adjoining the coordinates of a non-trivial 3-torsion point of E 0 . Let give some explicit equations for in this case: the x-coordinates of the nontrivial 3-torsion points of E are the four zeros of the division polynomial (see [11, Ex. 3.7]) P (x) = 3x4 + 6ax2 + 12bx − a2 . By our hypothesis that ρ is surjective, the 4-dimensional Q-algebra Q[x]/(P ) is a field. A purely formal computation shows that the minimum polynomial of the image of x3 + ax + b in Q[x]/(P ) is 2 1 f(t) = t4 + 8bt3 + dt2 − d2 . 3 27 This means that the y-coordinates of the nontrivial 3-torsion points of E are the isomorphism the zeros√ of the octic polynomial f(t2 ) ∈ Q[t]. By considering √ over Q( −3) from E 0 to E that sends (x, y) to (x, −3y) one sees that the y-coordinates of the non-trivial 3-torsion points of E 0 are the zeros of f(−3t2 ). It turns out that the x-coordinate of a non-trivial 3-torsion point of E or E 0 is contained in the field generated by its y-coordinate (this follows from the next proposition). √ p = 3 the two arithmetically equivalent fields are the √ Thus, for fields Q( α) and Q( −3α), where α is a zero of the polynomial f. We will now show how to obtain equations for any odd prime p. We will not use the standard equations for p-torsion points. Let Q(E) denote the function field of E over Q. Any rational function ϕ ∈ Q(E) gives a map ¯ =Q ¯ ∪ {∞} , ¯ → IP1 (Q) E(Q) ¯ Suppose that we have a function ϕ ∈ Q(E) that which is Gal(Q/Q)-equivariant. satisfies the following hypotheses. (1) (2) (3)
¯ ϕ has no poles in E(Q)[p]\{0}; ¯ ϕ is constant on each IF∗2 p -orbit of E(Q)[p]\{0}; ∗ ¯ ϕ is not constant on each IFp -orbit of E(Q)[p]\{0}.
Let the “quadratic twist” of ϕ be the function ϕ¯ = ϕ ◦ [n] where n ∈ ZZ is not a square modulo p and [n] denotes multiplication by n on E. Note that ϕ¯ does not depend on the choice of n. We now set ψ = (ϕ − ϕ) ¯ 2 . Let the groups H, H 0 and G = GL2 (IFp ) be as in Proposition 1. ¯ for a nonProposition 2. Let p∗ = ±p ≡ 1 mod 4, and let α√= ψ(P ) ∈ √Q ¯ trivial p-torsion point P ∈ E(Q). Then the fields Q( α) and Q( p∗ α) are the fields of invariants of H and H 0 in a Galois extension of Q with Galois group isomorphic to G. ¯ map Proof. The function ϕ restricts to a Gal(Q/Q)-equivariant ϕ:
¯ . ¯ → Q E(Q)[p]\{0}
Arithmetical Equivalence and Elliptic Curves
397
¯ Choose an IFp -basis for E(Q)[p] with P as the first basis element. Since the homomorphism ρ is surjective, the image of ϕ lies in a Galois extension N of Q ¯ ¯ whose Galois group is identified with Aut(E(Q)[p]) = GL2 (IFp ) = G. within Q Moreover, ϕ(P ) is fixed by the subgroup ( 10 ∗∗ ) of G. ¯ also lies in N . A diagonal matrix M = The element β = ϕ(P ) − ϕ(P ¯ )∈Q ( a0 ∗b ) ∈ G now sends β to ( ap )β, where ( ap ) denotes the quadratic symbol. By the Weil-pairing the composite map ρ det ¯ → G → IF∗p Gal(Q/Q)
is equal to the restriction map to Gal(Q(µp )/Q) = IF∗p , where µp denotes the ¯ Thus, the matrix M sends √p∗ to ( ab )√p∗ . group of p-th roots of unity in Q. √ p This implies that β is fixed by the subgroup ( 0 ∗∗ ) of G, and that β p∗ is fixed by ( ∗0 ∗ ). √ It remains to show that β and β p∗ are not fixed √ by larger subgroups, because we then know that the fields Q(β) and Q(β p∗ ) are non-isomorphic and arithmetically equivalent by Proposition 1. Thus, we must show that Q(β) √ and Q(β p∗ ) have degree 2p + 2. We first claim that Q(β) contains no abelian extension of Q of degree at least 2. To see this, note that the commutator subgroup of G is SL2 (IFp ), and that the group ( 0 ∗∗ ) maps surjectively to IF∗p by the determinant. We have β 6= 0 by hypothesis (3) above, and since −β is conjugate to β, it follows that the degree of Q(β) is larger than 2. Thus, the field Q(α) where α = β 2 is a nontrivial extension of Q. The element α is fixed by the maximal subgroup that Q(α) has degree p + 1. We B = ( ∗0 ∗∗ ) of G. Since Q(α) 6= Q it follows √ already saw that B does not fix β, or β p∗ , so these algebraic numbers have have degree 2p + 2. This proves the proposition. There are some obviousP candidates for the function ϕ above. If p ≡ − 1 mod 4 then we can take ϕ = n y ◦ [n] where n ranges over a set of representatives 2 ¯ = −ϕ, in ZZ of IF∗2 p . In this case we have ϕ P and ψ = 4ϕ . If p ≡ 1 mod 4 then −1 is a square in IFp , and we take ϕ = n x ◦ [n] where n ranges over a set of representatives in ZZ of IF∗2 p /h−1i. In both cases hypotheses (1) and (2) are clearly satisfied. For given p, a, and b we would now like to find the minimal polynomial ¯ To do this, it is convenient to first comf ∈ Q[t] of the element α = ψ(P ) of Q. pute approximations of its complex roots by explicitly computing Weierstrass functions. The Pari program (see [3]) is well suited for this. For small p one could also use the addition formulas or division polynomials and do formal computations over the field Q(a, b) with transcendental a and b, but typically this will take much more effort. In fact, the best method to compute f as a polynomial with coefficients in the transcendental field Q(a, b), is to compute the polynomial for enough sample values of a and b and then interpolating. Let us treat some small cases explicitly. For p = 3 take ϕ = y; for p = 5 take ϕ = (x − x ◦ [2])/2, and for p = 7 take ϕ = y + y ◦ [2] + y ◦ [4]. This gives rise to
398
Bart de Smit
the following polynomials for α: p=3: p=5: p=7:
1 2 d f(t) = t4 + 8 bt3 + 23 dt2 − 27 5 1 2 6 5 3 f(t) = 5 t + 12 at − 2 d t + 16 d 8 7 f(t) = 7 t +13824 bt + 51586416 dbt5 + 319956 dt6 −42 d(6237547 d − 4976640 b2)t4 + 10947369888 d2bt3 −28 (150387289 d + 4417425072 b2)d2 t2 +226800 409637 d + 1174176 b2 bd2 t −81 d2 (17161 d − 41472 b2)2 .
Here we use the notation d = 4a3 + 27b2 . These “generic” minimal polynomials can be used as follows. If for given a, b ∈ Q with d 6= 0 the homomorphism ρ is surjective, and 0 is not a root of f, then by Proposition 2 the polynomials f(t2 ) and f(p∗ t2 ) define realizations of the G-sets of Proposition 1 as field extensions of Q, so that we indeed obtain non-isomorphic arithmetically equivalent fields. In practice, we do not test whether ρ is surjective for given a and b ∈ Q, but we test whether f(t2 ) and f(p∗ t2 ) are irreducible. If this is the case, then the Galois group of the minimal common normal field will be a subgroup of the group GL2 (IFp )/IF∗2 p , which we obtain generically. Then the fields are arithmetically equivalent, because if two G-sets give isomorphic permutation representations of G, then they also give isomorphic permutation representations of any subgroup of G. It is still possible that the fields are isomorphic. However, if we are searching for arithmetically equivalent fields with distinct class numbers, then this is of no concern, since fields with distinct class numbers are certainly not isomorphic, and we do not expect to waste a lot of computing time on the thin set of pairs (a, b) with non-generic behavior.
4
Computing Class Numbers
By explicit computations with the equations of the last section, we can answer the question in the introduction for p = 3. Proposition 3. There exist two number fields with the same zeta-function for which the 3-parts of the class numbers are distinct. To find such fields we used the Pari program. We computed the class numbers of 819 pairs of fields of relatively small discriminant. Of those pairs, 118 had one or both class numbers divisible by 3, and 88 pairs had distinct class numbers. In all these 88 cases the class numbers differed by a factor 3, and one can actually prove that this is the only possibility [1]. We did not use the rigorous version of the routines for class number computation, but we did check correctness of the class number quotients for all 819 pairs by the method given in [5]. In the next table one finds a small selection of these fields with the notation of Section 3: the a and b give the elliptic curve E and its twist E 0 , and the number D is the absolute value of the discriminant of the number fields K and
Arithmetical Equivalence and Elliptic Curves
399
K 0 that one gets by adjoining a non-trivial 3-torsion point of E and E 0 . The class numbers of K and K 0 are denoted by h and h0 a 12 6 −51 6 −24 48
b 64 8 78 −3 −60 48
D 2
18
3
4
3 17 2 311 37 534 10 7 2 3 414 24 37 974 28 37 734 22
h
h0
1 12 3 3 1 2
3 4 1 1 3 6
Since it seems unlikely, by the Cohen-Lenstra heuristics, that a degree 12 number field has class number divisible by 5, one would have to sieve through many pairs before finding arithmetically equivalent fields whose class numbers differ by a factor 5. But perhaps this is feasible as routines for class group computations become faster. A theoretical construction which forces a factor 5 in the class number would be even more helpful.
References 1. Bosma, W., De Smit, B.: On arithmetically equivalent fields of small degree. (in preparation) 2. Cassels, J.W.S., Fr¨ ohlich, A. (eds.): Algebraic number theory. Academic Press, London-New York 1967 3. Cohen, H.: A course in computational number theory. Springer-Verlag New York 1993 4. De Smit, B.: On Brauer relations for S-class numbers. Technical Report 97-10 Universiteit van Amsterdam 1997 5. De Smit, B., Perlis, R.: Zeta functions do not determine class numbers. Bull. Amer. Math. Soc. 31 (1994) 213–215 6. LaMacchia, S.E.: Polynomials with Galois group PSL(2, 7). Comm. Algebra 8 (1980) 983–982 7. Lang, S.: Elliptic functions. Springer-Verlag, New York 1987 8. Perlis, R.: On the class numbers of arithmetically equivalent fields. J. Number Theory 10 (1978) 489–509 9. Roggenkamp, K., Scott, L.: Hecke actions on Picard groups. J. Pure Appl. Algebra 26 (1982) 85–100 10. Shimura, G.: Introduction to the arithmetic theory of automorphic functions. Princeton University Press, Princeton 1971 11. Silverman, J.H.: The arithmetic of elliptic curves. Springer-Verlag, New York 1986
Computing the Lead Term of an Abelian L-Function David S. Dummit1? and Brett A. Tangedal2?? 1 2
University of Vermont, Burlington VT, 05401, USA College of Charleston, Charleston SC, 29424, USA
Abstract. We describe the extension of the techniques implemented in [DST] to the computation of provably accurate values for the lead term at s = 0 of Abelian L-functions having higher order zeros, and provide some explicit examples. In particular we raise the question of applying the higher order extensions of the Abelian Stark Conjecture to the explicit construction of an interesting field extension in a manner analogous to the applications here and in [DST], [Ro] in the case of zeros of rank one.
1
Introduction
In [DST], a method for computing the first derivative at s = 0 of Abelian Lfunctions was implemented and then applied to numerically verify the rank one Abelian Stark conjecture for some totally real cubic fields. These computations were based on work of Friedman [F] and the purpose of this paper is to indicate how the results in [DST] can also be used to compute the lead term at s = 0 of Abelian L-functions having higher order zeros. These values arise in various higher rank generalizations of Stark’s conjecture ([G], [P], [R], [T]).
2
Abelian L-Functions
Let k be an algebraic number field with signature (r1 , r2 ) and degree n over Q. Let mm∞ be a modulus in k where m is an integral ideal and m∞ is a formal product of real infinite primes, and let G = G(mm∞ ) be the corresponding ray class group. If χ : G → C× is a character on G then in general χ will not be primitive. Let ff∞ be the conductor of the associated primitive character, again denoted by χ; we have f | m and every infinite prime appearing in f∞ also appears in m∞ . The L-function associated to χ is given for Re(s) > 1 by L(s, χ) =
X χ(a) , Nas
(a,f)=1 ? ??
Partially supported by grants from the National Science Foundation and the National Security Agency. Partially supported by a grant from the National Security Agency.
J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 400–411, 1998. c Springer-Verlag Berlin Heidelberg 1998
Computing the Lead Term of an Abelian L-Function
401
with the sum running over all integral ideals a of k relatively prime to f. As is well known, L(s, χ) can be analytically continued to a meromorphic function in the whole complex plane with at most a simple pole at s = 1. We shall refer to the order of vanishing r of L(s, χ) at s = 0 as the rank of the character χ. The Abelian Stark Conjecture concerns the values of the derivatives at s = 0 of the L-series in the case of rank 1 characters. The various generalizations to higher rank characters involve the coefficients cr of the leading term of L(s, χ) as a Taylor series L(s, χ) = cr sr + cr+1 sr+1 + · · · near s = 0, so it is important to be able to compute provably accurate numerical values for these coefficients. The rank of the trivial character χ0 is r1 + r2 − 1 and L(s, χ0 ) = −
hk Rk r1 +r2 −1 s +··· wk
(1)
where hk , Rk , and wk denote the class number, the regulator, and the number of roots of unity of k, respectively. If χ 6= χ0 is a nontrivial character, then χ has rank r = r1 + r2 − q where q is the number of infinite primes in the conductor of χ. The computation of the leading term for L(s, χ) is based on consideration of the “completed” L-function Λ(s, χ) = (A(f))s (Γ ((s + 1)/2))q (Γ (s/2))r1 −q (Γ (s))r2 L(s, χ)
(2)
of k, Nf is the absolute norm of the ideal f, and where dk is the discriminant np A(f) = 2−r2 π − 2 |dk | Nf. The function Λ(s, χ0 ) is analytic except for simple poles at s = 0 and s = 1 and Λ(s, χ) is entire for χ 6= χ0 . Suppose χ is primitive and nontrivial, in which case L(s, χ) is also entire. If χ is primitive and nontrivial, then the same is true for the complex conjugate character χ, and we have the functional equation Λ(s, χ) = W (χ)Λ(1 − s, χ),
(3)
where the root number W (χ) is a complex number of absolute value 1. The computation of the root number W (χ) is considered in Section 4. Proposition 1. If χ is primitive and nontrivial, then Λ(0, χ) 6= 0. Proof. Combining (2) and (3) we have Λ(s, χ) = W (χ)(A(f))1−s (Γ ((2 −s)/2))q (Γ ((1 −s)/2))r1 −q (Γ (1 −s))r2 L(1 −s, χ) since χ has precisely the same conductor ff∞ as χ and thus the same Γ -factors and the same factor A(f) when defining Λ(s, χ). Hence Λ(0, χ) = W (χ)A(f)(Γ (1))q (Γ (1/2))r1 −q (Γ (1))r2 L(1, χ) and since L(1, χ) 6= 0, also Λ(0, χ) 6= 0.
402
David S. Dummit and Brett A. Tangedal
It follows from the proposition and equation (2) that the rank of χ 6= χ0 is r1 + r2 − q, as previously mentioned, since this is the order of the pole at s = 0 contributed by the Γ -factors, and then equation (2) gives Λ(0, χ) = π q/2 2r1 −q
L(r1 +r2 −q) (0, χ) = 2r1 −q π q/2 cr . (r1 + r2 − q)!
(4)
The computation of Λ(0, χ) is described in the following section, so (1) and (4) compute the leading term of L(s, χ) about s = 0 for any primitive character χ. It is now a simple matter to compute the lead term for imprimitive characters. If S is any set of primes in k including the infinite primes and the primes dividing m, then the imprimitive L-series of χ with respect to S is given by Y (1 − χ(p)Np−s ) LS (s, χ) = L(s, χ) p∈Yχ
where Yχ is the set of finite primes in S not dividing the conductor f of χ. For each p ∈ Yχ such that χ(p) = 1, the order of the zero of LS (s, χ) at s = 0 will increase by one and the leading term for the primitive L-series L(s, χ) will be multiplied by log Np. For each of the remaining primes p ∈ Yχ the leading term will be multiplied by (1 − χ(p)).
3
Computing Λ(0, χ)
The computation of Λ(0, χ) is as in [DST] and is based on Friedman [F]. In [DST] the emphasis was on computations related to numerically confirming the rank one Abelian Stark Conjecture, so the computations involved a particular linear combination of values of derivatives of L-series tailored to that situation (and due originally to Stark). The principal observation here is that the same techniques provide provably accurate computation of the leading terms of the L(s, χ) directly, in particular the leading terms for the characters of rank greater than one as well. P Write L(s, χ) = n≥1 an n−s where X χ(a) an = (a,f)=1,N(a)=n
is the sum of the values of χ on the integral ideals of k of norm n. Then X A(f) A(f) , 0 + W (χ) an f ,1 , an f Λ(0, χ) = n n n≥1
where f(x, s) is given by the line integral Z δ+i∞ dz 1 xz (Γ ((z + 1)/2))q (Γ (z/2))r1 −q (Γ (z))r2 f(x, s) = 2πi δ−i∞ z−s for any δ > 1.
(5)
Computing the Lead Term of an Abelian L-Function
403
A(f) The values f A(f) , 0 and f , 1 in (5) are computed by shifting the line n n of integration to the left and computing residues. Note that these residues depend principally only on q, r1 and r2 , so the computation of values to a particular (provable) accuracy are essentially the same for different fields k having the same signature. See Section 3 in [DST] for a detailed discussion of the integral computations. The values of an are computed by determining the decomposition of prime ideals in k and their class in the ray class group G and then computing the number of integral ideals of given norm n sorted according to their class in G. The recently added functionality in Pari 2.0.2 for ray class group computations makes this a straightforward computation. Note that some caution must be exercised since the ideals of interest for the character χ may involve primes dividing m (but not dividing f), and these primes have no corresponding class in G. It remains in (5) to compute the root number W (χ) (this portion of the computation in [DST] is completely hidden in the computation of ideals of norm n in the ‘dual’ class).
4
Computation of the Root Number
One method of approaching this computation is to decompose the root number into a product of local root numbers and then compute the latter individually (see [Ro] pp. 47-50). We shall instead outline a method for computing W (χ) based upon a classical (global) formula for the root number (see pp. 76-78 of [L] and [Ta]). As above, we assume that χ is a primitive ray class group character (1) (q) with conductor ff∞. Order the real infinite primes of k so that f∞ = p∞ · · · p∞ where 0 ≤ q ≤ r1 . We say that a nonzero element α ∈ k is “f∞ -positive” if α(1) > 0, . . . , α(q) > 0, where α(i) is the image of α in the real embedding (i) defined by p∞ . Let d denote the different of k/Q. Then the root number can be expressed in the form (−i)q C(χ) √ (6) W (χ) = Nf where C(χ) is a certain finite sum defined by X χ((β)) e2πiTr(βµ/λ) . C(χ) = χ(h)
(7)
β
The trace is the trace from k to Q and the sum is over a complete residue system of f∞ -positive integers β mod f. As usual, χ((β)) = 0 if the principal ideal (β) is not relatively prime to f. The integers λ, µ and the integral ideal h in k are defined as follows: (A) Choose λ ∈ df so that (i) the integral ideal g =
(λ) is relatively prime to f, and df
404
David S. Dummit and Brett A. Tangedal
(ii) λ is f∞ -positive. (B) Choose µ ∈ g so that (i) the integral ideal h = (ii) µ is f∞-positive.
(µ) is relatively prime to f, and g
It is easy to see that such λ and µ exist—one way to achieve this is indicated below. The sum in (7) is independent of the choice of λ and µ and the choice of representatives β, and gives the root number as a sum of Nf-th roots of unity times the values of χ. If the character χ has no finite primes in its conductor, i.e., f = (1), then the sum in (7) has a single term, we can take β = 1, λ = Nd, and µ = Ng, so that in this case the root number is given by the simple formula W (χ) = (−i)q χ(d).
(8)
In particular, for the trivial character we have W (χ0 ) = 1. We mention also the b is a subgroup of characters defined on G, then basic result that if H Y W (χ) = 1. b χ∈H
This provides a good numerical check for computations and also implies that W (χ) = 1 if χ is a quadratic character. We now give more details for computing W (χ), when f 6= (1), in terms of commands available in Pari 2.0.2 (see [BBBCO]). Assume that the Z-basis for the ideal f in Hermite normal form is f = [f11 , f12 + f22 ω2 , . . . , f1n + f2n ω2 + · · · + fnn ωn ] where 1, ω2 , . . . , ωn is the integral basis for the integers of k computed by Pari. A complete set of representatives mod f is given by {b1 + b2 ω2 + · · · + bn ωn | 0 ≤ bi < fii }. If the algebraic integer β = b1 + b2 ω2 + · · · + bn ωn is not f∞-positive, we can replace it by (b1 + u · Nf) + b2 ω2 + · · · + bn ωn which is in the same class as β mod f and can be made f∞ -positive by choosing u to be a large positive integer. To compute λ we proceed as follows. Applying strong approximation (the function ‘idealappr’ in Pari) to the product ideal fd gives an element λ1 with λ1 ∈ df and (λdf1 ) relatively prime to f. If λ1 is f∞-positive set λ = λ1 . If λ1 is not f∞ -positive, determine an algebraic integer α ≡ 1 (mod f) having the same signature with respect to f∞ as λ1 and set λ = αλ1 . Such an integer α is easily constructed. For example, suppose that (1)
(q )
(q +1)
λ1 < 0, . . . , λ1 1 < 0, λ1 1
(q)
> 0, . . . , λ1 > 0.
Computing the Lead Term of an Abelian L-Function
405
Choose u to be a sufficiently large positive integer so that (1)
λ1 +
1 1 (q ) < 0, . . . , λ1 1 + < 0. u · Nf u · Nf
An easy check shows that it suffices to take α = 1 + u · Nf · λ1 . To compute µ, apply strong approximation to the quotient ideal (λ)/(fd) (being careful to include zero exponents for all prime ideals dividing f) to obtain an element µ1 and then construct an f∞ -positive element µ from µ1 as above.
5 5.1
Examples Fields K/k in [DST]
The leading terms of the L-functions for the 55 fields K/k considered in [DST] have been computed. In each case the field K is a cyclic extension of k of degree 6 unramified at all finite primes; there are 3 characters of rank 1 (for which L0 (0, χ) was computed in [DST]), one character of rank 2 (the trivial character), and the remaining 2 characters have rank 3. For these characters the root number W (χ) is most easily computed using equation (8). For example, suppose k = Q(α) where α3 − α2 − 9α + 8 = 0. This is the original (and unique) cubic example computed by Stark (and the first of the 55 (1) (2) (3) examples in [DST]). Take m = 1 and m∞ = p∞ p∞ p∞ , the product of all three (1) (real) infinite places of k, where p∞ is defined by the embedding α 7→ 3.079118... The ray class field K to conductor mm∞ (the strict Hilbert class field to k) is cyclic of order 6 over k. With respect to a choice of primitive character χ of G of order 6, the characters of G and the values of the leading terms of their L-series are: character conductor rank
leading coefficient cr
2
−7.193985833045266981800893012474...
p∞ p∞
1
3.264637725613672236133252689303... −i 0.147170752903187778573392249690...
χ2
1
3
2.530444424919522699924324889540... −i 0.100038865411813113339436425093...
χ3
p∞ p∞
1
1.339502188216318921900003862702...
χ4
1
3
2.530444424919522699924324889540... +i 0.100038865411813113339436425093...
χ5
p∞ p∞
1
3.264637725613672236133252689303... +i 0.147170752903187778573392249690...
χ0
1
χ
(2) (3)
(2) (3)
(2) (3)
406
David S. Dummit and Brett A. Tangedal
The product of these leading coefficients is −659.9887647869219683465342584..., which agrees with the leading coefficient −hK RK /wK of the zeta function ζK (s) at s = 0 computed by Pari (hK = 1 and wK = 2). 5.2
A Z/4Z × Z/4Z Relative Extension
Let k = Q(β) where β 3 + β 2 − 14β − 23 = 0, a totally real cubic field of discriminant 2777. There is a unique prime p3 in k of norm 9 and a unique (1) prime p5 in k of norm 5. Define the infinite prime p∞ by the embedding β 7→ (1) (2) (3) −3.124784... and let m = p3 p5 and m∞ = p∞ p∞ p∞ . The ray class field K to conductor mm∞ is the strict ray class field of k of conductor p3 p5 (and in this (2) (3) case, this is in fact the same field as the ray class field to conductor p3 p5 p∞ p∞ ). The ray class group G is of order 16 with Galois group Z/4Z × Z/4Z. The isomorphism is given explicitly by mapping the class of the unique prime p03 of norm 3 to (1, 0) and the class of the unique prime p53 of norm 53 to (0, 1). The characters χ of G can be enumerated in the form χj,j 0 with j, j 0 taken modulo 0 4, where χj,j 0 (a, b) = iaj+bj for (a, b) ∈ Z/4Z × Z/4Z. The lattice of subfields of K/k is K
E
E1
F3
F4
H1
F1
F
H
E2
F2
F5
F6
H2
k where the fields are defined by the kernels of the following sets of characters: H H1 H2 F F1 F2 F3 F4 F5 F6 E E1 E2
: {χ0,0, χ2,0 }, the Hilbert class field of k : {χ0,0, χ0,2 } : {χ0,0, χ2,2 } : {χ0,0, χ2,0 , χ0,2, χ2,2 }, the ray class field for p3 p5 : {χ0,0, χ1,2 , χ2,0, χ3,2 } : {χ0,0, χ1,0 , χ2,0, χ3,0 }, the strict Hilbert class field of k : {χ0,0, χ2,1 , χ0,2, χ2,3 } : {χ0,0, χ0,1 , χ0,2, χ0,3 } : {χ0,0, χ1,1 , χ2,2, χ3,3 } : {χ0,0, χ1,3 , χ2,2, χ3,1 } : {χ0,0, χ1,2 , χ2,0, χ3,2 , χ1,0, χ3,0 , χ2,2, χ0,2 } (3) : {χ0,0, χ2,0 , χ0,2, χ2,2 , χ0,1, χ0,3 , χ2,1, χ2,3 }, the ray class field for p3 p5 p∞ (2) : {χ0,0, χ2,0 , χ0,2, χ2,2 , χ1,1, χ3,3 , χ1,3, χ3,1 }, the ray class field for p3 p5 p∞ .
Computing the Lead Term of an Abelian L-Function
407
The usual rank one Abelian Stark Conjecture applies nontrivially to the fields whose character sets above include a character of rank 1. These are the fields F1 , F2 , E, and K. The conductor ff∞, rank r, root number W (χ) and coefficient cr of the leading term of L(s, χ) at s = 0 for the 16 ray class characters are given in the following table. The root numbers were computed using formulas (6) and (7). We let W = 0.850650808352039932181540497063... +i 0.525731112119133606025669084847...
χ
fχ
rχ
W (χ)
1. χ0,0
1
2
1
−3.949038016309490466058004299913...
2. χ0,1
p3 p5 p∞
2
W
38.895690190913818091252121632837... +i 17.219802878738922275276249305507...
3. χ0,2
p3 p5
3
1
53.923686177816848972225515188256...
4. χ0,3
p3 p5 p∞
(3)
2
W
38.895690190913818091252121632837... −i 17.219802878738922275276249305507...
5. χ1,0
p∞ p∞
(2) (3)
1
1
2.635246871694050424250868328601... −i 0.09604035325601181425197878862...
6. χ1,1
p3 p5 p∞
(2)
2
W
27.151361821789085658692943413441... +i 9.0505756960662943559989590053836...
7. χ1,2 p3 p5 p∞ p∞ 1
−1
−13.234862742780104747441617846173...
(3)
(2) (3)
(2)
cr
8. χ1,3
p3 p5 p∞
2
W
22.846895112357827158784180518642... −i 14.354206761606988602692297654014...
9. χ2,0
1
3
1
3.038773403795369756303052908247...
10. χ2,1
p3 p5 p∞
2
W
17.610528583642463727541646316518... +i 8.6183776076083412933041900700832...
11. χ2,2
p3 p5
3
1
28.952806760286896962929047412660...
12. χ2,3
p3 p5 p∞
2
W
17.610528583642463727541646316518... −i 8.6183776076083412933041900700832...
(3)
(3)
408
David S. Dummit and Brett A. Tangedal
(cont.) fχ
rχ
W (χ)
(2) (3)
1
1
2.635246871694050424250868328601... +i 0.09604035325601181425197878862...
(2)
2
W
22.846895112357827158784180518642... +i 14.354206761606988602692297654014...
15. χ3,2 p3 p5 p∞ p∞ 1
−1
−13.234862742780104747441617846173... −i 2.316971845280172505460649648305...
W
27.151361821789085658692943413441... −i 9.0505756960662943559989590053836...
χ 13. χ3,0
p∞ p∞
14. χ3,1
p3 p5 p∞
(2) (3)
16. χ3,3
(2)
p3 p5 p∞
2
cr
Using the values of the derivatives at s = 0 of the characters for F1 we can compute a numerical value for the values of the conjugates of the associated (1) Stark unit 1 (in the real embedding defined by p∞ ), and then use the techniques in [DST] to determine the polynomial of degree 12 satisfied by 1 . The result is the irreducible polynomial f1 (x) = x12 −559552x11 + 9079464x10 − 44303150x9 + 116721128x8 −198980632x7 + 236087507x6 − 198980632x5 + 116721128x4 −44303150x3 + 9079464x2 − 559552x + 1. The polynomial f1 (x2 ) factors into two irreducible polynomials of degree 12, one of which is f√1 (x) = x12 −748x11 − 24x10 + 6058x9 + 8060x8 − 4804x7 − 15061x6 −4804x5 + 8060x4 + 6058x3 − 24x2 − 748x + 1. It is now easy to verify that f√1 (x) generates the appropriate subfield of the ray class field K, numerically confirming Stark’s rank one Abelian conjecture for F1 , proving that 1 is a square in F1 and providing a relatively small polynomial defining this extension. A similar computation for the field F2 , the strict Hilbert class field for k, numerically confirms Stark’s rank one Abelian conjecture for F2 , and again the Stark unit 2 is a square, in this case the square root satisfies the irreducible polynomial f√2 (x) = x12 −6x11 + 11x10 − 10x9 − x8 + 16x7 − 23x6 +16x5 − x4 − 10x3 + 11x2 − 6x + 1.
Computing the Lead Term of an Abelian L-Function
409
All of the characters associated to the field F have rank at least 2, so the functorial behavior of Stark units predicts that the Stark unit E for E should have norm 1 to F , norm 1 to F1 , and norm 22 to F2 (note that the set S in Stark’s Conjecture for E includes the two finite primes p3 and p5 , so the L-series values for the characters χ corresponding to F2 are multiplied by (1 −χ(p3 ))(1 −χ(p5 )), √ which is why the norm to F2 is not simply 2 ). This implies that E = 1 2 (1) (taking the positive square root at p∞ ). Note that the element on the right defined by the algebraic elements above exists in E by the computations above. Computing the polynomial of degree 24 satisfied by the algebraically defined therefore gives a polynomial generator of the class field E, and a quick check shows that its Galois conjugates give the numerical values of the appropriate L-series for E as predicted by Stark’s Conjecture. This use of the Stark units in the subfields F1 and F2 avoids the necessity of checking that the polynomial of degree 24 one can obtain directly from the L-series values in fact defines the appropriate ray class field, for example by trying to use Pari to determine its discriminant 38 54 27778 (which is resource intensive since this polynomial has relatively large coefficients). √ The Abelian condition of Stark’s Conjecture predicts that E( E ) is an √ √ √ √ Abelian extension of k. Since E = 4 1 2 and 2 ∈ F2 , this implies that √ √ F1 ( 4 1 ) would be an Abelian extension of k. In fact 4 1 is an element of F1 : the polynomial f√1 (x2 ) factors into two irreducible polynomials of degree 12, one of which is 12 4 (x) = x −24x11 − 86x10 − 144x9 − 254x8 − 338x7 − 335x6 f√ 1 −338x5 − 254x4 − 144x3 − 86x2 − 24x + 1. √ √ Since E is an element of E, this suggests computing the polynomial for E = √ √ 4 1 2 to find a generator for the class field E. This produces a polynomial f√E (x):
x24 −65x23 + 1338x22 − 9309x21 + 24370x20 − 34479x19 + 31229x18 − 4203x17 −16177x16 + 21182x15 + 3202x14 − 31344x13 + 44639x12 − 31344x11 +3202x10 + 21182x9 − 16177x8 − 4203x7 + 31229x6 − 34479x5 + 24370x4 −9309x3 + 1338x2 − 65x + 1 The characters of K not belonging to E all have rank greater than 1, which √ √ √ implies that the Stark unit K for K is given by K = E , i.e., K = 4 1 2 ∈ E, the element whose minimal polynomial was computed above. Stark’s Con√ √ jecture asserts that K( K ) is an Abelian extension of k, equivalently, E( K ) is Abelian over k since K ∈ E. The polynomial f√E (x2 ) factors into two irreducible polynomials of degree 24, one of which is f√K (x) = x24 −13x23 + 52x22 − 35x21 − 228x20 + 671x19 − 909x18 +825x17 − 749x16 + 922x15 − 1284x14 + 1662x13 − 1831x12 +1662x11 − 1284x10 + 922x9 − 749x8 + 825x7 − 909x6 + 671x5 −228x4 − 35x3 + 52x2 − 13x + 1.
410
David S. Dummit and Brett A. Tangedal
√ In particular, K is in fact a square (in E), i.e., K = 1 1/8 2 1/4 ∈ E, and its minimal polynomial above defining E is sufficiently small (its discriminant is on the order of 1087 and factors easily) that it is possible to find the class number hE and regulator RE without difficulty using Pari. The result is that hE = 1 and −hE RE /2 = −47039012.4980528180338912880199... This is also the product of the values of the computed leading coefficients for the characters defining E, confirming the computations for these 8 characters. Similar regulator computations for the subfields of E confirm some of the individual L-series values. This completes the numerical confirmation of the rank one Abelian Stark Conjecture for all the subextensions of K/k. Each of the Stark units in these extensions is actually a square. We have as yet found no explanation for this behavior, although in light of [DH], this suggests the involvement of the local Stark Conjecture. It would be of interest to know whether there is some (conjectural) criterion in k determining when a Stark unit is a square (more generally, an e-th power where e is the number of roots of unity in the extension of k)—possibly in the form of additional Abelian extensions of k (as in [DH]) (note that then these extensions are not generated by the Stark unit, but “hinted at” by the Stark unit). Finally, we end with the observation that the usual higher-order Abelian Stark Conjectures apply to the fields E1 and E2 , since the first is the ray class (3) (1) (2) field for conductor p3 p5 p∞ and so has the two totally split primes p∞ , p∞ , and (2) the second is the ray class field for p3 p5 p∞ , with the two totally split primes (1) (3) p∞ , p∞ . The computations described here provide the numerical values of the higher-order leading coefficients required for these conjectures, and as indicated above, can also be used to help provide the necessary algebraic information to describe the associated fields. Note, however, that this algebraic information was provided by the rank one Stark Conjecture. Can one use the various suggested generalizations to the higher-rank case in a similar manner? More specifically, for the example above, can one use the higher-order conjectures for E1 and E2 to describe these fields explicitly in a fashion similar to the use of the rank-one conjecture to explicitly describe E?
References BBBCO. C. Batut, K. Belabas, D. Bernardi, H. Cohen, M. Olivier: User’s Guide to PARI-GP, 1997. DST. David S. Dummit, Jonathan W. Sands, and Brett A. Tangedal: Computing Stark units for totally real cubic fields. Math. Comp. 66 (1997) 1239–1267 DH. David S. Dummit and David R. Hayes: Checking the refined p-adic Stark Conjecture when p is Archimedean, in Algorithmic Number Theory, Proceedings ANTS 2, Talence, France, Lecture Notes in Computer Science 1122, Henri Cohen, ed,˙ Springer-Verlag, Berlin-Heidelberg-New York, 1996. 91–97 F. Eduardo Friedman: Hecke’s integral formula. S´eminaire de Th´eorie des Nombres de Bordeaux (1987-88) Expos´e No. 5
Computing the Lead Term of an Abelian L-Function G. L. P. Ro.
R. T. Ta.
411
David Grant: Units from 5-torsion on the Jacobian of y2 = x5 + 1/4 and the conjectures of Stark and Rubin, preprint ¨ Edmund Landau: Uber Ideale und Primideale in Idealklassen. Math. Zeit. 2 (1918) 52–154 Cristian D. Popescu: Base change for Stark-type conjectures “over ”, preprint X.F. Roblot: Algorithmes de factorisation dans les extensions relatives et applications de la conjecture de Stark ` a la construction des corps de classes de rayon, Th`ese, Universit´e Bordeaux I, 1997 Karl Rubin: A Stark conjecture “over ” for Abelian L-functions with multiple zeros. Ann. Inst. Fourier (Grenoble) 46 (1996) 33-62 Brett A. Tangedal: A question of Stark. Pacific J. Math. 180 (1997) 187–199 Tikao Tatuzawa: On the Hecke-Landau L-series. Nagoya Math. J. 16 (1960) 11–20
Z
Z
e-mail: [email protected], [email protected]
Timing Analysis of Targeted Hunter Searches John W. Jones1 and David P. Roberts2 1
Department of Mathematics, Arizona State University, Box 871804 Tempe, AZ 85287 [email protected] 2 Department of Mathematics, Hill Center, Rutgers University New Brunswick, NJ 08903 [email protected]
Abstract. One can determine all primitive number fields of a given degree and discriminant with a finite search of potential defining polynomials. We develop an asymptotic formula for the number of polynomials which need to be inspected which reflects both archimedean and non-archimedean restrictions placed on the coefficients of a defining polynomial.
Several authors have used Hunter’s theorem to find a defining polynomial xn + a1 xn−1 + · · · + an−1 x + an ∈ Z[x] for each primitive degree n field of absolute discriminant D less than or equal to some cutoff ∆. The method requires a computer search over all vectors (a1 , . . . , an ) satisfying certain bounds. In [JR1] we explained that one is sometimes particularly interested in the fields with D = ∆, especially when all primes dividing D are very small. To find just these fields by a Hunter search, one imposes not only archimedean inequalities on the ai as above, but also p-adic inequalities for each prime p dividing D. This is an example of a targeted search, the target being D. In this paper we investigate the search volume of such Hunter searches, which approximates the number of polynomials one is required to inspect. We find that these search volumes have the form Search Volumen (D ≤ ∆) =
C(n, ∞)∆ Y Search Volumen (D = ∆) = C n, pd C(n, ∞)∆(n−2)/4 . (n+2)/4
pd ||D
In Section 1 we work over R. The constant C(n, ∞) is a sum of constants C n, ∞d, one for each possible signature r + 2d = n. We identify the constant C n, ∞0 using a Selberg integral; the remaining integrals are harder and we evaluate them in the cases n ≤ 7. In Sections 2 and 3 we work over Qp . The constant C n, pd is a sum of con stants C n, pd, K , one for each possible p-adic completion K with discriminant J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 412–423, 1998. c Springer-Verlag Berlin Heidelberg 1998
Timing Analysis of Targeted Hunter Searches
413
pd . Evaluating C n, pd, K requires evaluating an Igusa integral. We evaluate a few cases exactly and get a reasonable simple upper bound in all cases. In Sections 4 and 5 we work over Q. Section 4 describes Hunter’s theorem and gives an asymptotic formula for the number of defining polynomials of a degree n algebra within a given search radius. In Section 5 we prove the above search volume formulas, and discuss how our results apply in practice. We have carried out all targeted searches for n ≤ 5, and D of the form paq b with p and q primes ≤ 19. Complete tables are available at [J1]. Our computations here show that the enormously harder case n = 6 is feasible too. Search results will appear at [J1] as they become available. We now fix some notation. Let F be a field of characteristic zero; typically F = Q or one of its completions Qv in this paper. We work with finite dimensional F -algebras K. Here, all algebras areQ assumed to be separable. So, K factors canonically as a product of fields, K = Ki . We will work with monic degree n polynomials f(x) = xn + a1 xn−1 + · · · + an−1 x + an ∈ F [x] . Often we think of such polynomials as simply elements (a1 , . . . , an ) of F n . If f(x) is separable, then we call f(x) a defining polynomial for the F -algebra Q K = FQ [x]/f(x). The factorization K = Ki is induced by the factorization f(x) = fi (x) into irreducibles, via Ki = F [x]/fi(x). Conversely, let K be an algebra and y ∈ K. Let fy (x) be the characteristic polynomial of y acting on K by multiplication. Basic algebraic facts about the map c : K → F n defined by y 7→ fy underlie many of our considerations. For example, c induces a surjection (Regular elements of K) → (Defining polynomials for K) with Aut(K) acting freely and transitively on the fibers. This accounts for the presence of | Aut(K)| in many formulas. Q Q If f(x) = ni=1 (x − yi ) we put D(f) = i<j (yi − yj )2 and think of D as a polynomial function of the aj , as usual. Finally, if F ⊆ C we let T2 (f) = P n 2 i=1 |yi | .
1
Archimedean Volumes
Let A be a degree n algebra over R. So, we can simply take A = Rr × Cd for some r + 2d = n. The characteristic polynomial of y ∈ A is fy (x) = xn + a1 xn−1 + · · · + an−1 x + an ∈ R[x] . Let A0 be the set of elements of A with trace 0, and consider the corresponding space of polynomials P 0 (A, r) = fy ∈ R[x] : y ∈ A0 and T2 (fy ) ≤ r 2 . We measure the volume of P 0 (A, r) with respect to the usual volume form da2 · · · dan .
414
John W. Jones and David P. Roberts
Proposition 1.1. vol(P 0 (A, r)) = vol(P 0 (A, 1)) r (n+2)(n−1)/2 Proof. One has a linear map L: P 0(A, 1) → P 0 (A, r) (a2 , . . . , an ) 7→ (r 2 a2 , . . . , rn an ) . The Jacobian of this map is r to the power n X j=2
j=
(n + 2)(n − 1) . 2 t u
This simple observation is the most important point in analyzing Hunter searches. We work more generally with Z 1 |D|s− 2 da2 · · · dan . ζA (s) := P 0 (A,1)
The desired volume vol(P 0 (A, 1)) is just the special value ζA (1/2). A general formula for ζA (s) would be desirable, since it would give one the moments of the polynomial discriminants encountered in a Hunter search. For example, to compute the average polynomial discriminant encountered one needs the number ζA (3/2), as well as ζA (1/2). Let A0 (1) be the unit ball in A0 ; it is a degree | Aut(A)| cover of P 0 (A, 1) via the characteristic polynomial map c. One can pull back the defining integral to A0 (1); at this step the Jacobian |D|1/2 enters the integrand. One can next extend the integral to the full unit ball A(1). Using the homogeneity of the integrand, one can replace the sharp radial cutoff ρ ≤ 1 by an integral over all of A against 2 a Gaussian e−ρ /2 . The net result is Z 2 2n(s−ns−1)/2 e−ρ /2 |D|s ω . ζA (s) = √ (ns + 1)(n − 1) ! A | Aut(A)| πn 2 Here ω is the standard volume form on A, giving the unit ball ρ ≤ 1 its usual volume π n/2 /(n/2)!. Also | Aut(Rr × Cd )| = r!d!2d. Proposition 1.2. Qn 2−n(n−5)/4 j=1 (j/2)! vol(P (R , 1)) = √ (n + 2)(n − 1) ! n! πn 4 0
n
Timing Analysis of Targeted Hunter Searches
415
Proof. In the case A = Rn the roots y1 ,. . . ,yn are coordinates on A and ω = dy1 · · · dyn . A special case of Selberg’s integral is Z ∞ Z ∞ n Y 2 (js)! ; ··· e−r /2 |D|s dy1 · · · dyn = (2π)n/2 s! −∞ −∞ j=1
see e.g. [M1], 17.6.7. Evaluating at s = 1/2 this becomes 2 ing the proposition.
3n/2
Qn
j=1 (j/2)!,
yieldt u
Proposition 1.3. The ratios vol(P 0 (Rr × Cd , 1))/vol(P 0 (Rn , 1)) for n ≤ 7 are as follows. 3 4 5 6 7 d\n 0 1 1 1 1 1 5 18 58 179 543 1 9 134 31 1355 11875 2 451 23 17466 31 3 Proof. Let
Y
I=
(yi − yj )
1≤i<j≤n
be the indicated square root of D. We need to compute Z 2 e−ρ /2 |I| ω . Rr ×Cd
√ Taking yr+1 , . . . , yr+d as coordinates on Cd and writing yk = (uk + ivk )/ 2 one has ρ2 =
r X j=1
yj2 +
d X
(u2k + vk2 )
k=1
ω = dy1 · · · dyr du1 · · · dud dv1 · · · dvd |I| = f(y1 , . . . , yr , u1 , . . . , ud , v1 , . . . , vd )
Y 1≤i<j≤r
|yi − yj |
d Y
|vk |
k=1
with f a polynomial. One can expand f and integrate out the uk ’s and the vk ’s using Z ∞ −x2 /2 j (j−1)/2 j − 1 ! e x dx = 2 2 0 2d times on each term. One is left with an integral of the form Z ∞ Z ∞ Y 2 2 ··· e−(y1 +...+yr )/2 |yi − yj | g(y1 , . . . , yr ) dy1 · · · dyr −∞
−∞
1≤i<j≤r
with g(y1 , . . . , yr ) a symmetric polynomial in the yi . Here the absolute values pose a problem. For r ≤ 3 this obstruction can be surmounted in an elementary way and one can again integrate term-by-term. When d = 1 the moment formulas in [M1], Section 17.8 suffice. This covers all cases with n ≤ 7. t u
416
John W. Jones and David P. Roberts
The case of cubics f(x) = x3 + a2 x + a3 is illustrative. The two regions P 0 (A, 1) are shown in Figure 1. In the case A = R × C, let r2 = y12 + |y2 |2 + |y3 |2 0.2 0.1
a3
0
R3
Rx C
-0.1 -0.2 -0.4
-0.2
0
a2
0.2
0.4
Fig. 1. The sets P 0 (A, 1) for cubics f(x) = x3 + a2 x + a3 and express the real root y1 as rt. Then the defining integral p for ζR×C (s) can p be evaluated by changing variables from (a2 , a3 ) to 0 ≤ r ≤ 1, − 2/3 ≤ t ≤ 2/3. The result is √ −s−1 2s− 1 2 (s − 1/2)! 2 π2 3 1 + s; 1 + s; . ζR×C (s) = 2 F1 −2s, (1 + 3s) s! 2 3 The presence of the hypergeometric function 2 F1 indicates that the general ζRr ×Cd (s) is more complicated than the general ζRn (s).
2
Ultrametric Masses
The set of isomorphism classes of degree n algebras A over Qp is much more complicated than in the archimedean case v = ∞. A starting point for analyzing this set is a mass formula, due to Krasner and Serre [S1]. Here is a quick summary, more details being contained in [R2]. The mass of A is by definition mA := 1/| Aut(A)|. Let Qun p be a maximal . Call two algebras A1 and unramified extension of Qp and put Aun = A ⊗ Qun p un ∼ . Then, the sum of m A A2 geometrically equivalent if Aun = A over A in a 1 2 geometric equivalence class is 1. Let mn,pd be the sum of mA over all totally ramified A of degree n and discriminant pd . Then, in the tame case p - n, the only non-vanishing mn,pd is mn,pn−1 = 1. The first few wild cases are shown in Table 1. Other wild cases are but are also governed by the Krasner-Serre mass formula P more complicated, n−1−d m d p = 1. n,p d The general case reduces to the totally ramified case just summarized. Given Q A, from the canonical factorization Aun = Aun i one gets an unordered collection of (ni , di ). The sum of mA over algebras A giving rise to these (ni , di ) is Q mni ,pdi . In particular, the degree partition λA = (n1 , n2 , . . .) is a complete geometric invariant of a tame algebra.
Timing Analysis of Targeted Hunter Searches
417
Table 1. Masses for low degree wildly ramified algebras
mp,pd
8< p − 1 if p ≤ d ≤ 2p − 2 if d = 2p − 1 = p : 0 else
d m4,2d m6,2d m6,3d 4 1 5 6 2 1 2 7 2 8 4 2 9 4 6 10 4 4 6 11 8 8 9
Let M (n, pd ) be the sum of mA over all p-adic algebras with degree n and discriminant pd . The above discussion is sufficient for computing M (n, pd ) for n ≤ 7. The integers M (n, pd) appear in Corollary 3.3 and also the table in Section 5.
3
Ultrametric Volumes
Let A be a degree n algebra over Qp , with ring of integers O, and discriminant pdA . Define P (A) = {fy (x) ∈ Zp [x] : y ∈ O} . We measure volumes with da1 · · · dan , so that all of Znp gets volume 1. Proposition 3.1. With ZA (t) as defined in the proof below, (i) vol(P (A)) = (ii) ZA (1/p) ≤ 1.
ZA (1/p) | Aut(A)| pdA
Proof. Let ω be Haar measure on A, normalized so that ω(O) = 1. We use the characteristic polynomial map c: O → P (A). Pulled back to O, the polynomial discriminant function D factors as pdA I 2 . Here, I is a polynomial function on O with Zp coefficients. Note, Zp [y] has index 1/|I(y)|p in O. The Jacobian function c∗ (da1 · · · dan )/ω is p−dA |I|p . On regular elements, i.e. elements on which I is non-zero, c has degree Aut(A). So Z Z 1 s− 12 s− 1 |D|p da1 · · · dan = |pdA I 2 |p 2 p−dA |I|p ω ζA (s) := | Aut(A)| O P (A) Z 1 |I|2s = 3 p ω | Aut(A)| pdA ( 2 −s) O ZA (p−2s ) . = 3 | Aut(A)| pdA ( 2 −s)
418
John W. Jones and David P. Roberts
Here we have defined ZA (t) =
∞ X
ω(O[j])tj
j=0
with O[j] the set of y ∈ O with |I(y)|p = 1/pj . Plugging in s = 1/2 gives part (i). To prove part (ii), we note that ZA (t) is a power series with positive coefficients such that ZA (1) = ω(O) = 1. It is an increasing function on [0, 1] and so t u ZA (1/p) ≤ 1. The function ZA (t) is an example of an Igusa zeta function [D1]. Thus, it is known to be in Q(t). Proposition 3.1 and its proof make no reference to the classification of p-adic algebras sketched in Section 2. Define [ P (A) P (λ) = λA =λ
[
d
P (n, p ) =
P (A) .
dA =d
Summing over A with λA = λ in Proposition 3.1 and using the Krasner-Serre mass formula gives Corollary 3.2 below. Summing over A with dA = d in Proposition 3.1 and using the definition of M (n, pd ) gives Corollary 3.3. Corollary 3.2. For λ a partition of n, vol(P (λ)) ≤
1 . pn−`(λ)
where `(λ) denotes the length of λ. Corollary 3.3. For d ∈ Z≥0 , vol P n, pd
≤
M (n, pd ) . pd
On the other hand, one can also prove Corollary 3.2 directly, using neither the Krasner-Serre mass formula, nor Proposition 3.1. Direct proof of Corollary 3.2. Write λ = (λ1 , . . . , λ`(λ)), with each λi > 0. Let P (λ)1 ⊂ Fnp be the reduction of P (λ) ⊂ Znp . For e a positive integer, let µe be the number of i such that λi = e. Very simply, ( ) Y e fe (x) P (λ)1 = e
µe . To give an element of P (λ)1 is to where fe (x) ∈ Fp [x] is monic of degree P µe = `(λ) coefficients, and so P (λ)1 give the coefficients of the fe . There are is p`(λ) /pn of Fnp . t u
Timing Analysis of Targeted Hunter Searches
419
It would be nice to compute ZA (1/p) exactly. To do this it seems necessary to compute all of ZA (t). We have succeeded when n is prime and A is a field; the results in the unramified case U and the totally ramified case R are ! 2 1 t(n−1) /2 1 1− 1− 1 − n−1 p pn−1 p Z (t) = ZU (t) = . R tn(n−1)/2 tn(n−1)/2 t(n−1)/2 1− 1− 1− pn−1 p pn−1 We have also computed several more difficult ZA (t), sometimes directly, and sometimes making use of the stationary phase formula [D1], Theorem 3.4. The resulting formulas are quite complicated.
4
The Search Set
Let K be a degree n algebra over Q with absolute discriminant D. With respect to the quadratic form T2 , one has an orthogonal decomposition K = K 0 ⊕ Q, K 0 being the subspace of traceless elements. 0 Let O be the ring of integers in K. Let O0 be the projection p of O to K . As 0 0 a lattice in the Euclidean space K ⊗ R, O has covolume D/n. Let gm be the smallest real number so that every lattice in Euclidean space Rm with covolume V has a non-zero vector of length ≤ (gm V 2 )1/(2m). The value of gm is known ([CS1], Table 1.2) for m ≤ 8. m gm
1 1
2 1 31
3 2
4 4
5 8
6 21 31
7 64
8 256
In the literature one often sees Hermite’s constant γm = Define 1/(2n−2) gn−1 D . rD = n
√ gm instead of gm .
m
One gets immediately that in O0 there is a non-zero vector y0 of length ≤ rD . The subalgebra Q(y0 ) of K strictly contains Q; so if K is a primitive field, Q(y0 ) is automatically all of K. Henceforth in this paper we take n ≥ 3 to avoid trivialities. By replacing y0 by −y0 one can assume that a03 ≥ 0 in its characteristic polynomial. As j varies from 0 to n − 1, exactly one of y = y0 − j/n is in O. This element y has characteristic polynomial fy ∈ P (rD ); here the search set P (r) is the set of polynomials f(x) =
n Y (x − yi ) = xn + a1 xn−1 + · · · + an−1 x + an ∈ Z[x] i=1
satisfying the two conditions
420
John W. Jones and David P. Roberts
(i) trace condition: a1 ∈ {0, . . . , n − 1} and a3 ≥
(n − 1)(n − 2)a31 (n − 2)a1 a2 − n 3n2
(ii) length condition: T2 (f) ≤
a21 + r2 . n
Proposition 4.1. Let K be a degree n algebra over Q with absolute discriminant D. Let m(K, ∆) be the number of defining polynomials for K in P (r∆ ). (i) If K is a primitive field and D ≤ ∆, then m(K, ∆) ≥ 1. (ii) For general K, mn m(K, ∆) ∼ | Aut(K)|
r
∆ D
√ with
mn =
gn−1 π (n−1)/2 2 n−1 ! 2
as ∆ → ∞. Proof. Part (i) is essentially Hunter’s theorem, see e.g. [C1], Theorem 6.4.1. It is proved by our discussion above. Our trace condition is a modification of the standard one. We make this modification in order to fully exploit the involution y0 7→ −y0 , thereby making |P (r)| as small as possible. 0 be the subset of O0 consisting of elements y0 with For Part (ii), let O∆,+ 0 0 0 be the subset of O∆,+ consisting of length ≤ r∆ and a3 ≥ 0. Let Oreg,∆,+ regular elements. Then (Volume of ball of radius r∆ ) 0 0 | ∼ |O∆,+ |∼ | Aut(K)| m(K, ∆) = |Oreg,∆,+ 2 (Covolume of O0 ) r √ n−1 n−1 √ / 2 ! gn−1 π (n−1)/2 ∆ (r∆ π) p = = D 2 n−1 ! 2 D/n 2 as ∆ → ∞.
t u
Part (ii) relates to the phenomenon that searches tend to find several defining polynomials for each primitive field sought, as well as defining polynomials for non-primitive fields. For 3 ≤ n ≤ 9 one has n mn to one decimal place.
3 1.8
4 3.0
5 4.9
6 7.4
7 11.9
8 18.9
9 32.5
Timing Analysis of Targeted Hunter Searches
5
421
Timing Analysis
To incorporate targeting into the formalism, let S be a finite set of places of Q containing ∞. For v ∈ S, let Av be a degree n algebra over Qv . Let P ({Av }, r) = {f(x) ∈ P (r) : Qv [x]/f(x) ∼ = Av for v ∈ S} . From Proposition 4.1, P ({Av }, r∆ ) contains a defining polynomial for every primitive degree n field K with absolute discriminant D ≤ ∆ and Kv ∼ = Av , v ∈ S. Proposition 5.1. ! Y n gn−1 (n+2)/4 0 vol(P (A∞ , 1)) vol(P (Ap )) ∆(n+2)/4 |P ({Av }, r∆ )| ∼ 2 n p as ∆ → ∞. Proof. For elements in P ({A∞ }, r∆ ) there are n possible values of a1 , each giving asymptotically the same number of polynomials; this accounts for the factor n. Those with a1 = 0 are the intersection of the standard lattice Zn−1 with the interior of the region P 0 (A∞ , r∆ )+ in Rn−1 . The + indicates the extra condition a3 ≥ 0, and accounts for the 2 in the denominator. Proposition 1.1 and the definition of r∆ account for the factor vol(P 0 (A∞ , r∆ )) = (gn−1 ∆/n)(n+2)/4vol(P 0 (A∞ , 1)). Finally the ultrametric conditions account for t u the extra factors vol(P (Ap )). The deeper Propositions 1.2 and 1.3 determine the archimedean volumes for n ≤ 7; Proposition 3.1 bounds the ultrametric volumes in general. Proposition 5.1 is an asymptotic formula. However one would expect, and experience shows, that it applies well when ∆ is simply the product of the discriminants of the Ap ’s. In this restricted context, and summing over p-adic algebras with a given discriminant, the formula can be restated as follows. Define n gn−1 (n+2)/4 vol P 0 (Rn−2d × Cd , 1) C(n, ∞d) = 2 n C(n, pd) = vol P n, pd pd . Then, a Hunter searchQfor all primitive fields of signature (n − 2d∞, d∞ ) and absolute discriminant pdp requires inspection of approximately Y C(n, pdp )pdp (n−2)/4 C(n, ∞d∞ ) p
Q polynomials. (Naturally there are no such fields unless (−1)d∞ pdp is congruent to 0 or 1 modulo 4. If n = 6, some searches can be replaced by easier searches via sextic twinning [R1].) In the literature there are several methods which allow one to implement the length inequality and target the local algebra at ∞ with little loss [BFP1],
422
John W. Jones and David P. Roberts
[SPD1], [O1], [DO1]. In principal, p-adic bounds on the ai giving only the slight loss C(n, pd) ≤ M (n, pd) of Corollary 3.3 are easy to describe since they amount to collections of congruences on the ai . For example, in the tame case one can follow the direct proof of Corollary 3.2. In practice, for large p and/or n it can become unwieldy to implement sharp p-adic bounds as well. Table 2 below gives what we call local difficulty ratings, namely the numbers log10 (C(n, ∞d)) and log10 (M (n, pd)pd(n−2)/4 ). All entries are rounded to the nearest tenth. The rows labelled All give totals for all values of d. Table 2. Local Difficulty Ratings d ∞ 2 0 −3.1 1 −1.9 − 2 −2.2 0.6 3 0.8 4 0.9 5 1.1 6 1.7 7 − 8 1.8 9 2.0 10 2.1 11 2.6 All −1.7 2.9 0 −8.1 1 −5.9 − 2 −5.0 0.9 3 −5.5 1.2 4 1.9 5 2.1 6 2.8 7 2.7 8 3.5 9 3.9 10 4.1 11 4.8 12 4.7 13 5.1 14 5.4 All −4.8 5.7
3
5 7 Quartics 0.2 0.3 0.4 0.5 1.0 1.1 1.2 1.0 1.3 1.3 1.7
1.9 1.4 1.5 Sextics 0.5 0.7 0.8 1.0 1.7 2.0 2.0 2.6 3.0 2.7 3.1 3.9 3.1 4.2 4.2 3.8 4.8 4.1 5.5 4.8 6.2 5.4 7.0 5.9 6.2
6.4 7.1 4.4
11
13
∞ −5.3 −3.5 −3.1
2
3
− 0.4 0.8 0.7 1.0 1.6 1.5 2.1 1.7 2.5 2.1 2.6 − 2.4 2.6 2.9 3.4 1.8 1.9 −3.0 3.6 3.0 −11.4 1.0 1.1 −8.7 − 0.6 2.4 2.5 −7.3 1.1 1.2 3.6 3.8 −7.2 1.4 2.4 4.6 4.9 2.2 3.2 5.2 5.6 2.5 3.9 3.4 4.7 3.5 5.2 4.3 5.9 4.6 6.4 5.0 7.1 5.6 7.5 5.7 6.3 6.5 5.3 5.7 −6.9 6.8 7.7 0.5 1.3 1.6
0.6 1.4 1.7
5 7 Quintics 0.5 0.6 1.3 1.6 1.9 2.2 − 2.5 3.2 3.7 4.3 4.8 5.4
5.5 2.7 Septics 0.9 1.1 2.0 2.4 3.1 3.6 4.0 4.8 5.1 5.8 6.2 − 7.0 8.2 7.9 9.2 8.8 10.3 9.4 11.3 12.4 13.5 14.6
11
13
0.8 1.9 2.6 3.1
0.8 2.0 2.8 3.3
3.3 3.5 1.3 2.9 4.4 5.8 7.0 7.8
1.4 3.1 4.7 6.2 7.4 8.4
9.5 14.6 7.9 8.4
The translation from search volumes to search times requires that one incorporate a number of practical concerns as well. All told, one can expect that a degree n search with difficulty x will take longer than a degree n − 1 search with
Timing Analysis of Targeted Hunter Searches
423
difficulty x. For our current programs, in degrees 5 and 6 on a medium speed personal computer, the translation from volumes to times goes as follows. For quintics, searches with difficulty rating x take us about 10x−7.9 days. For example, the search for all 211 36 59 quintics has difficulty rating −3.0 + 3.4 + 2.6 + 5.4 = 8.4 and took 100.5 ≈ 3 days. For sextics, searches with difficulty rating x take us about 10x−7.3 days. For example, the search for all primitive 21459 sextics has difficulty rating −4.8 + 5.4 + 7.0 = 7.6 and took around 100.3 ≈ 2 days. In [JR1] we found all sextic fields ramified within S = {∞, 2, 3}. Table 2 shows that a few other {∞, p, q} are easier, while harder cases like {∞, 3, 5} are feasible too.
References BFP1. C1. CS1. D1. DO1. J1. JR1.
M1. O1. R1. R2. S1.
SPD1.
Buchmann, J., Ford, D., and Pohst, M., Enumeration of quartic fields of small discriminant, Math. Comp. 61 (1993) 873–879. Cohen, H.: A Course in Computational Algebraic Number Theory, GTM 138 Springer Verlag, 1995. Conway, J. and Sloane, N.: Sphere Packings, Lattices, and Groups, Springer Verlag, 1988. Denef, J.: Report on Igusa’s local zeta function, S´eminaire Bourbaki 741, Ast´erisque 201-202-203, 359–386. Diaz y Diaz, F. and Olivier, M., Imprimitive ninth-degree number fields with small discriminants, Math. Comp. 64 (1995) 305–321. Jones, J.: Tables of number fields with prescribed ramification, a WWW site, http://math.la.asu.edu/~jj/numberfields Jones, J. and Roberts, D.: Sextic number fields with discriminant −j 2a 3b , to appear in the Proceedings of the Fifth Conference of the Canadian Number Theory Association. Mehta, M.: Random Matrices, 2nd edition, Academic Press, 1991. Olivier, M., The computation of sextic fields with a cubic subfield and no quadratic subfield, Math. Comp. 58 (1992) 419–432. Roberts, D.: Twin sextic algebras, to appear in Rocky Mountain J. Math. Roberts, D.: Low degree p-adic fields, in preparation. Serre, J.-P.: Une “formule de masse” pour les extensions totalement ramifi´ ees de degr´ e donn´e d’un corps local, C. R. Acad. Sci. Paris S´er. A-B 286 (1978), no. 22, A1031-A1036. Schwarz, A., Pohst, M., and Diaz y Diaz, F.: A table of quintic number fields, Math. Comp. 63 (1994) 361–376.
On Successive Minima of Rings of Algebraic Integers Jacques Martinet Universit´e Bordeaux I, Laboratoire A2X, 351 cours de la Lib´eration, 33 405 Talence, France [email protected]
Abstract. We give an account of some largely experimental results about the successive minima of the ring of integers of an algebraic number field endowed with its canonical Euclidean norm. This throws light on some interesting facts which could be important for the algorithmic theory of number fields.
1
The Notation
Let K be a number field of degree n and signature (r1 , r2 ) (r1 + 2r2 = n). We order the embeddings σk : K → C in the usual way: σk is real for 1 ≤ k ≤ r1 and σk+r2 = σ k for r1 + 1 ≤ k ≤ r1 + r2 . We consider on K the standard positive definite quadratic form (the “twisted” trace form) defined by q(x) =
n X
σk (x)σ k (x) ,
k=1
in which we shall often write xk for σk (x). The completion Nof K for the real embedding of Q yields the real algebra with involution E = R Q K. It is a Euclidean space for the form TrE/R (xx), whose restriction to K is the form q considered above. Let ZK be the ring of integers of K. It is a lattice (discrete subgroup of rank n) of E. We can thus apply to it the usual notions of geometry of numbers. Given a lattice Λ with scalar product (x, y) 7→ x.y, its determinant is det(Λ) = det(ei .ej ) where (e1 , . . . , en ) is any basis of Λ. Here, the determinant is the absolute value of the discriminant of K: det(ZK ) = |dK | . Next, for any λ ∈ R, let Sλ (Λ) = {x ∈ Λ | 0 < x.x ≤ λ}. The minimum (or norm) of Λ is m(Λ) = minx∈Λr{0} x.x ; we let S(Λ) = Sm (Λ). More generally, we define for 1 ≤ k ≤ n the successive minima of Λ by mk (Λ) = inf λ ∈ R | rkSλ (Λ) ≥ k . We simply write m, mk for m(ZK ), mk (ZK ). We refer to Pohst and Zassenhaus ([10], pp. 195–200) for an algortihm to compute the successive minima of a lattice. J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 424–432, 1998. c Springer-Verlag Berlin Heidelberg 1998
On Successive Minima of Rings of Algebraic Integers
425
We have m1 = m ≤ m2 ≤ · · · ≤ mn , and the above infimum is actually a minimum. The relative lattice Λr generated by the vectors x ∈ Λ with x.x ≤ mr has dimension r 0 , the largest of the integers t such that mt = mr , i.e., the smallest jump r 0 ≥ r between two minima. We shall use the notation Λsm to denote a (full) sub-lattice of Λ generated by independent vectors which represent the successive minima; it is a (not welldefined) sub-lattice of Λ. The values of our twisted trace form are algebraic. However, we must quote two special cases where this form has an interesting arithmetical meaning: • For totally real K, q is the integral form x 7→ TrK/Q (x2 ) . • For C.M. fields, denoting by K + the maximal real subfield of K, q is the (integral) even form x 7→ 2TrK + /Q (NK/K + (x)). The determinant of ZK is the classical invariant |dK |; the calculation of m is an easy matter, see proposition 1 below; however, except for very special fields, we are not able to say a lot about the upper minima.
2
Some Easy Results
In this section, we consider the lattice ZK . The following result is well known: Proposition 1. One has m(ZK ) = n, and the equality q(θ) = n holds exactly on the roots of unity contained in K. Proof. Let θ ∈ K ∗ . Applying the arithmetico-geometric inequality yields the lower bound !1/n n n X Y θk θk ≥ n θk θ k = n |NK/Q (θ)|2/n ≥ n . q(θ) = k=1
k=1
Equality clearly holds for roots of unity. Conversely, for equality to hold in the arithmetico-geometric inequality, it is necessary that all terms θk θk be equal. But their sum must be n. Hence, we must have |θk | = 1 for all k, and Kronecker’s lemma shows that θ must be a root of unity. t u Example 1. Suppose K is a quadratic field with discriminant d, embedded in and is then E identified with C. The second minimum m2 is greater than m1 √ √ 1 ±1± d if d ≡ 1 attained exactly on θ = ± 2 d if d ≡ 2, 3 mod 4 and on θ = 2 mod 4, except for d = −3 andpd = −4 where m2 = m1 = 2. Moreover, the lattice is the rectangular √ lattice h1, i |d|i in the first case and the centered rectangular lattice h1, i
|d|
1+
2
i in the second case. Thus, we have m1 = 2 and m2 = b |d|+1 c. 2
The group µK of roots of unity of K ∗ acts on K by multiplication. The form q is obviously invariant under this action. Under certain hypotheses (e.g., K/Q
426
Jacques Martinet
is Abelian, or K is totally real, or . . . ), the elements of G = Gal(K/Q) also preserve q, for the formula σθ = σθ holds for all θ ∈ K and all σ ∈ G. We then ˜ = µK o G which preserves q. even have an action of the semi-direct product G As a consequence, denoting by r the degree over Q of the maximal cyclotomic subfield of K, we have m1 = · · · = mr = n < mr+1 = mr+2 = · · · = m2r ≤ m2r+1 ≤ · · · ≤ mn (and there may be further equalities between minima, some of them induced by Galois actions). The form q is the basic tool in the determination of fields with small discriminants for a given signature. Actually, for θ ∈ K with characteristic polynomial χθ (X) = X n − T1 X n−1 + T2 X n−2 + · · · + (−1)n Tn , one has TrK/Q (θ2 ) = T12 − 2T2
and
q(θ) = T12 − 2T2 + 4
rX 1 +r2
=(θj )2 .
j=r1 +1
Thus, q(θ) − Tr(θ2 ) is a measure of the deviation of the roots of χθ from being real. An other easy result concerns the trace TrK/Q = T1 of representatives of minima: Proposition 2. Any representative θ 6= ±1 of one of the successive minima satisfies the double inequality −
n n ≤ TrK/Q (θ) ≤ . 2 2
Proof. Given a lattice Λ and a minimal vector x of Λ, one has |x.y| ≤ any y ∈ Λ other than ±x, as one sees from the calculation
y.y 2
for
(x ± y).(x ± y) ≥ x.x + y.y ∓ 2x.y ≥ x.x . This proves the proposition, since the trace TrK/Q (θ) is precisely the scalar product 1.θ. t u The existence of wildly ramified primes may force the trace to take its values in some restricted subset of Z/nZ. Apart from this restriction, and disregarding the trivial case of roots of unity (for which we have TrK/Q (ζ) = 0 or ± [K : Q(ζ)]), not much seems to be known from n = 4 onwards about the possible values of the trace on vectors which represent the minima. Conjecture 2 below suggests that the absolute value of the trace could often be close to 0 (e.g., 0 or ±1) rather than to ±b n2 c. However, large values of the trace may be forced by the existence of subfields, see proposition 5 below.
On Successive Minima of Rings of Algebraic Integers
3
427
The Index Problem
We come back in this section to the general case of a lattice Λ in some Euclidean space also denoted by E, for which we keep the notation q(x) = x.x. m(Λ) and the Hermite constant is The Hermite invariant of Λ is γ(Λ) = det(Λ)1/n γn = supdim Λ=n γ(Λ). Let us recall two famous theorems: Theorem 1 (Hermite). Every lattice possesses a basis (e1 , . . . , en ) which satisfies the inequality q(e1 ) q(e2 ) . . . q(en ) ≤
n(n−1)/2 4 det(Λ) . 3
Theorem 2 (Minkowski). The product of the successive minima of Λ satisfies the inequality m1 (Λ) m2 (Λ) . . . mn (Λ) ≤ γnn det(Λ) . As one knows, Minkowski found linear upper bounds for γn (one can for instance take the universal bound γn ≤ 1 + n4 ). Thus, the upper bound of the geometric mean of the product in theorem 2 is linear, whereas it is exponential in theorem 1 (and all known bounds are indeed exponential). As a consequence, one expects to find large possible values for [Λ : Λsm ], the index in Λ of any sub-lattice generated by independent vectors e1 , . . . , en with ei .ei = mi . There is a general upper bound for this index: n/2
Theorem 3. One has [Λ : Λsm ] ≤ γn . Sketch of proof. By a deformation argument (which can also be used to prove theorem 2), we reduce to the case when m1 = · · · = mn = m. We rescale Λ to det(Λsm ) , and m = 1, then calculate the index by the formula [Λ : Λsm ]2 = det(Λ) finally bound the numerator from above by 1 using the Hadamard inequality t u and the denominator from below by γn−n . The value of γn is known up to n = 8, and the above bound is optimal in all cases. The largest possible indices are thus 1 for n = 1, 2, 3, 2 for n = 4, 5 and 4, 8, 16 for n = 6, 7, 8 respectively. However, if for n = 4, 6, 7, 8 respectively we exclude lattices similar to D4 , D6 , E7 , E8, we find the smaller bounds 1, 3, 4, 8, see [12] for n = 4, 6, 7 and [6] for n = 8. Work by Watson, extended first by Ryˇskov, then by Zahareva, and recently revisited by myself yields a complete picture of the possible structures of
Λ/Λsm
up to dimension
8.
One
must consider finer invariants than the mere structure of the previous quotient. There is a considerable amount of details, which cannot be discussed here; see the references [6], [11], [12], [13].
428
Jacques Martinet
We now explain for further use Watson’s method. The basic idea is to use the following identity relating the norms (=square of the length) q(ei ) of r ina1 e1 + · · · + ar er for some integer dependent vectors e1 , . . . , er with that of e = d d≥2 : r r X X |ai |[q(e − sgn(ai )ei ) − q(ei )] = [( |ai |) − 2d)] q(e) . (*) i=1
i=1
[Proof: replace ei by −ei for ai < 0 and develop.] Applying this identity to representatives of the successive minima, we obtain: Proposition 3. Let Λ be a lattice, let e1 , . . . , en ∈ Λ be independent vectors of Λ with q(ei ) = mi and let a1 , . . . , an be rational integers. Denote by imin (resp. imax ) the smallest (resp. the largest) index k such that ak 6= 0. Suppose a1 e1 + · · · + ar er for some integer d ≥ 2. that Λ contains a non-zero vector e = d Pn Then, one has i=1 |ai | ≥ 2d. Moreover, if equality holds, q(e − ej ) and q(ej ) are equal to a same real number for all j ∈ [imin , imax ]. Proof. We first remark that the inequality q(e − sgn(ai )ei ) ≥ q(ei ) holds for all i with ai 6= 0, since otherwise the i-th minimum could have been defined by and e − sgn(ai )ei instead of ei . Hence, the left hand side of (*) is non-negative, P so is the right hand side. We have thus proved the inequality ai ≥ 2d. Suppose now that equality holds in the above inequality. We then have q(e − ei ) = q(ei ) for all index i with ai 6= 0, and we must show that one has moreover q(ei ) = q(ej ) for all i, j such that imin ≤ i < j ≤ imax . Otherwise, let i, j with q(ei ) < q(ej ). If ai = 0 (resp. if aj = 0), there exist i0 with imin ≤ i0 < i and ai0 6= 0 (resp. j 0 with j < j 0 ≤ jmax and a0j 6= 0). Replacing if need be i by i0 and j by j 0 , we may assume that ai and aj are non-zero. But replacing ej by e − ei then contradicts the definition of mj . t u
4
Rings of Algebraic Integers
The following natural question arises for number fields: Problem 1. Let K be a number field of degree n endowed with its twisted trace form. What are the possible values for the index in Λ = ZK of the sublattice Λsm ? This problem is of some importance: for algorithmic purposes, it is necessary to have some reduction procedure to handle number fields efficiently; it appeared clearly to the authors of packages (e.g., KANT, PARI, . . . ) that computations of units and class numbers are faster when working with reduced bases rather than standard bases obtained by a triangular construction from a basis (1, θ, . . . , θn−1 ) of some order of K ; also, the computation of quadratic extensions of cubic fields done in 1992 by Olivier in [8] showed better efficiency when working over sharply reduced bases of cubic fields.
On Successive Minima of Rings of Algebraic Integers
429
A basis of ZK corresponding to the successive minima, if any, is certainly a very good choice for a reduced basis. The following theorem gives some indications about the possible indices for fields of degree up to 8 : Theorem 4. Let K be a number field of degree n ≤ 8, let θ1 , . . . , θn be Qlinearly independent elements of ZK such that q(θi ) = mi and let Λ0 be the Z-lattice they generate. Then one has Λ0 = ZK for n ≤ 4, [ZK : Λ0 ] ≤ 2 for n = 5, 6, [ZK : Λ0 ] ≤ 4 for n = 7 and [ZK : Λ0 ] ≤ 8 for n = 8. Proof. We must exclude the possibility that [ZK : Λ0 ] be maximal for n = 4, 6, 7, 8 as well as the equality [ZK : Λ0 ] = 3 for n = 6. If [ZK : Λ0 ] is maximum, ZK is a root lattice, and is in particular wellrounded, i.e. possesses n independent minimal vectors. Now, ZK is well-rounded if and only if K is a cyclotomic field (proposition 1). We thus have n = ϕ(a) for a uniquely defined even integer a, and the number of pairs of minimal vectors of ZK is then s = a2 . This is obviously impossible for odd n > 1, and for ϕ(a) = 4, 6, 8 respectively, we find the upper bound a2 ≤ 6, 9, 15 whereas one has s(D4 ) = 12, s(D6 ) = 30 and s(E8 ) = 120. If n = 6 and [ZK : Λ0 ] = 3, we apply proposition 3: we can take ai = 0 or P indeed ai = 1 for all i because of the inequality P 1 for i = 1, . . . , 6. We have a ≥ 2d = 6, hence i i i ai = 2d; again by proposition 3, we must have t s = a2 ≥ 2.6 = 12, and this contradicts the upper bound s ≤ 9 found above. u Extensive computations have been done in 1995 by Napias ([7]) and in 1996 by Boucher ([1]), relying on the available tables of fields of degree 5, 6, 7. The result is really astonishing: among the somewhat 65,000 fields they considered (62533, 2424, 638 for degrees 5, 6, 7 respectively), all are of index 1 ! In order to avoid phenomena which might be related to small discriminants, a good solution is to test random polynomials. This was recently done by Diaz y Diaz and Olivier ([3]), who considered random polynomials with degrees between 5 and 12. Once more, they systematically found integral bases made with representatives of the successive minima. With a modicum of doubt (there still might exist some scarce counterexamples occurring e.g. for fields with many subfields), and although I am not able to produce any reason for it to be true, it propose the following conjecture: Conjecture 1. Rings of algebraic integers possess bases whose elements represent their successive minima. The case of dimension 5 is striking: lattices with and without a basis of representative of successive minima define a partition into two sets of the space of lattices, both containing open subsets, but no example of the second set have been found. Note that, for algorithmic purposes, it would be enough that the result should hold for “almost all” not too large discriminants. At any rate, it proves often more convenient when handling numerically lattices to consider a lattice together
430
Jacques Martinet
with a sub-lattice of small index possessing a well reduced basis rather than a single lattice whose bases will all contain very long vectors.
5
Cyclic Fields of Odd Prime Degree
We consider in this section cyclic extensions of Q of prime degree ` ≥ 3. The first minimum is `, attained exactly on ±1. Taking into account the action of the Galois group G of K/Q (of order `), we immediately see that the minima m2 , . . . , m` are equal and strictly greater than `. Recall (theorem of Kronecker-Weber) that Abelian extensions of Q embed into cyclotomic fields. More precisely, such an extension has a conductor f in the sense of class field theory, and the smallest cyclotomic field containing K is Cf = Q(ζ), were ζ = ζf is any root of unity of order f in a given algebraic closure K of K. One has then K = Q(θ) for θ = TrCf /K (ζ). Proposition 4. If ` = 3, m2 and m3 are represented exactly by ±θ and its conjugates. Proof. For any `, 1 together with `−1 conjugates of θ constitute a basis of ZK /Q (“quasi-normal” basis) in the sense of [9]; if ` does not ramify in K, ZK even possesses the normal basis {σθ | σ ∈ G}). Now, if θ0 is an other quasi-normal basis, one has θ0 = a + εθ for some a ∈ Z and ε ∈ Z[G]∗. When ` = 3, the only units of Z[G] are the six elements ±σ, σ ∈ G. But, in dimension 3, representatives of the successive minima of a lattice Λ constitute a basis of Λ. Moreover, if ϕ represents m2 = m3 , then any two conjugates of ϕ constitute together with 1 a quasi-normal basis. We thus have θ = ±σϕ for some σ ∈ G. t u For ` ≥ 5, the unit group of Z[G] is infinite as is the unit group of C`+ , and the above counting argument no more works. Nevertheless, experiments done by H. Napias ([7]) for cyclic fields of degree 5 and 7 support the following conjecture, which might well follow from a not too difficult direct computation of Gauss sums: Conjecture 2. The minima m2 = · · · = m` of any cyclic field K of prime degree ` are represented by the conjugates of TrCf /K (ζf ). Note that whenever conjecture 2 holds, the elements of ZK which represent a non-trivial minimum have trace 0 or ±1.
On Successive Minima of Rings of Algebraic Integers
6
431
The Subfields Problem
For an imprimitive field K, the following question naturally arises: what are the minima which can have or which must have representatives belonging to a proper subfield ? Disregarding the trivial cases involving roots of unity, it seems that no easy answer may be expected. The smallest possible degree is n = 4, for which imprimitive fields share out among three Galois types, namely C2 × C2 , C4 and D4 (dihedral case). A systematic study could be done (but, as far as I know, has not been done) for the easier first case; several possibilities immediately show up for the possible repartitions of representatives for m2 , m3 , m4 among quadratic or quartic elements. Before entering into some details, we first quote the following general (and easy) result: Proposition 5. Let k ⊂ K and let θ ∈ k. Then the values taken on θ by the forms q relative to K and to k satisfy the relation qK (θ) = [K : k] qk (θ). We now come back to quartic fields, and concentrate ourselves on the D4 case. Then, K possesses a unique proper quadratic field k, and there are four systems of signatures for the pair (K, k), three for real k and one for imaginary k. We denote by x 7→ x0 the non-trivial element of Aut(K/Q) = Gal(K/k), and the conjugates of θ ∈ K by θ1 , θ10 , θ2 , θ20 , which gives the form q one of the expressions θ12 + θ10 + θ22 + θ20 , 2
2
θ12 + θ10 + 2θ2 θ20 , 2
2(θ1 θ10 + θ2 θ20 ) or 2(θ1 θ2 + θ10 θ20 ) .
For the sake of simplicity, we shall restrict ourselves to the case of a real K ; we denote by d the discriminant of k. If there exists a quartic θ with q(θ) ≤ 2b d+1 2 c, m2 is then represented by such a quartic θ and m3 by θ0 ; since θ + θ0 belongs to k, no quadratic element may appear among the minima, except when equality holds in the above inequality. On the contrary, if the opposite inequality q(θ) > 2b d+1 2 c holds, we then have c < m ≤ m , and m , m are represented by quartic elements of K m2 = 2b d+1 3 4 3 4 2 which are not conjugate. Now, we are going to show by an explicit calculation that the first case is special and the second one is generic: write K = Q(ϕ) with M = ϕ2 ∈ Zk , for some α, β ∈ Zk ; we so that the elements of ZK are of the form θ = α+βϕ 2 thus have q(θ) = 12 Trk/Q (α2 )+ 12 Trk/Q (M β 2 ), and it is clear that for given k, the c and (since M is totally positive) Trk/Q (M β 2 ) ≤ inequalities Trk/Q (α2 ) ≤ 2b d+1 2 d+1 2b 2 c have only finitely many solutions. [The same result holds for all admissible signatures of (K, k) ; moreover, the case of cyclic extensions is quite similar, except that conjugacy offers wider possibilities.] Example 2. (After [7].) The dihedral quartic fields with discriminants 52 .29 and 5.292 are respectively generic and special.
432
Jacques Martinet
√ √ 1+ 7+2 5 Actually, the first one is K = Q . It has the smallest possible 2 discriminant among totally real quartic fields. The outer automorphism of the of K a field with discriminant dihedral group D4 produces inside a Galois closure q ˜ where θ˜ = ˜ = Q(θ) 5.292, namely K
√ 1+ 29 2
+
√ 7+ 29 2
is a root of the polynomial 2 2 2 ˜ c = 15. X − X − 5X − X + 1, and we have Tr(θ ) = 1 + 2 × 5 = 11 < b d+1 2 4
3
2
Of course, to make a complete study of dihedral quartic extensions, one √ √ p should consider the cases of equality (example: d = 6, θ = (2 − 6) 3 + 6), look at all systems of signatures (the cases of real and imaginary k are quite different), and consider aside the two cases d ∈ {−3, −4}. Added in proof Two days after this paper was sent for publication, H. Lenstra and Bart de Smit ([4]) sent me a counterexample to conjecture 1. Actually, for the field of degree 13 defined by the polynomial X 13 − 110.10712, one has [Λ : Λsm ] = 13. There still remains the intriguing question of small degrees, e.g, degree n = 5.
References 1. D. Boucher: Minima successifs. Applications aux corps de nombres Rapport de stage de D.E.A., Universit´e Bordeaux 1 (1996) 2. J.H. Conway and N.J.A. Sloane: Sphere Packings, Lattices and Groups Grundlehren290 (1988) Springer-Verlag, Heidelberg 3. F. Diaz y Diaz and M. Olivier: Private communication (1997) 4. H.W. Lenstra, Bart de Smit: E-mail, dated April 2 nd , 1998 5. J. Martinet: Les r´eseaux parfaits des espaces euclidiens. (1996) Masson, Paris 6. J. Martinet: Sur l’indice d’un sous-r´eseau. Preprint (1997) 7. H. Napias: Th`ese, Bordeaux (1996) 8. M. Olivier: The computation of sextic fields with a cubic subfield and no quadratic subfield. Math. Comp. 58 (1992) 419–432 9. J.-J. Payan: Contribution ` a l’´etude des corps ab´eliens absolus de degr´e premier impair. Ann. Inst. Fourier series 15, 2 (1965), 133–199 10. M. Pohst and H. Zassenhaus: Algorithmic algebraic number theory. Cambridge University Press (1989), Cambridge, U.K. 11. S.S. Ryˇskov: On the problem of the determination of quadratic forms in many variables. Proc. Steklov Inst. math. series 142 (1979), 233–259; Russsian original: 1976 12. G.L. Watson: On the minimum points of a positive quadratic form. Mathematika series 18 (1971), 60–70 13. N.V. Zahareva: Centerings of 8-dimensional lattices that preserve a frame of successive minima. Proc. Steklov Inst. math. series 152 (1982) 107–134; Russian original: 1980
Computation of Relative Quadratic Class Groups Henri Cohen, Francisco Diaz y Diaz, and Michel Olivier Universit´e Bordeaux I, Laboratoire A2X 351 cours de la Lib´eration, 33 405 Talence, France {cohen,diaz,olivier}@math.u-bordeaux.fr
Abstract. Using the theory of binary pseudo-quadratic forms over Z developed in [5], we sketch an algorithm for computing the relative class group of quadratic extensions. We end by a striking example which can be treated orders of magnitude faster using the relative method than using the absolute one.
1 1.1
Relative Class and Unit Group Computations Relative Class Groups
Let L/K be a relative extension of number fields, which in this section are not assumed to be necessarily quadratic. There are (at least) two natural maps linking ideals and units of K and L. If a is an ideal of K, then we can set i(a) = aZL which is an ideal of L. Conversely, if I is an ideal of L, the relative norm NL/K (I) is an ideal of K. These maps are defined at the level of ideal classes, and are also defined for units. Thus, it is possible to give two different notions of relative class and unit groups, and these notions are closely linked. Definition 1. 1. A pseudo-element of L/K is a ZK -module of the form αa, where α ∈ L and a is an ideal of K (this is not the same as the ZL -module αaZL ) . 2. We say that an ideal I of L is pseudo-principal if it is of the form I = αaZL , i.e. if it is generated by a pseudo-element. We denote by I the group of all fractional ideals of L, and by P ∗ the subgroup of pseudo-principal ideals. 3. The quotient group I/P ∗ will be called the relative class group of L/K for the map i, and denoted Cli (L/K). We set also hi (L/K) = |Cli (L/K)|. 4. Equivalently, we can define Cli (L/K) = Cl(L)/i(Cl(K)) = Coker(i) . 5. We also define the capitulation subgroup Cli (K) of Cl(K) in L/K by Cli (K) = Ker(i) . The equivalence of the two definitions of Cli (L/K) is clear. Similarly, we can give the analogous definitions for the norm map. J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 433–440, 1998. c Springer-Verlag Berlin Heidelberg 1998
434
Henri Cohen, Francisco Diaz y Diaz, and Michel Olivier
Definition 2. 1. The relative class group of L/K with respect to the map NL/K is the subgroup of Cl(L) defined by ClN (L/K) = Ker(NL/K ) . We set also hN (L/K) = |ClN (L/K)|. 2. We also define the norm default quotient ClN (K) of Cl(K) in L/K by ClN (K) = Cl(K)/NL/K (Cl(L)) = Coker(NL/K ) . In the following, if G is a finite Abelian group and p is a prime, we denote by Gp its p-Sylow subgroup. We give without proof the following (easy) result. Proposition 1. 1. If p - h(K), then Cl(K)p = Cli (K)p = ClN (K)p = 1 and Cl(L)p ' Cli (L/K)p ' ClN (L/K)p , and these last three groups are generated by the classes of the same ideals. 2. If p - [L : K], then Cli (P ) = ClN (P ) = 1 and Cli (L/K)p ' ClN (L/K)p , and these last two groups are generated by the classes of the same ideals. 3. If h(K) is coprime to [L : K] (in particular if h(K) = 1), then Cli (K) = ClN (K) = 1 and Cli (L/K) ' ClN (L/K). Remarks 1. It is easy to give examples where Cli (L/K) and ClN (L/K) are not isomor√ phic: take for instance K = Q(y) with y2 +30 = 0, and L = K( y). Then one can show that Cl(L) ' C4 × C2 , Cli (L/K) ' C2 × C2 and ClN (L/K) ' C2 (we thank D. Simon for this example). 2. The theory of capitulation (i.e. of the map i) is well known to be difficult, and not too much is known in general, apart from the beautiful results of Furtw¨ angler on capitulation in the Hilbert class field (where [L : K] = h(K)) and more recent generalizations. On the other hand the theory of the map NL/K is much better understood, and in the case L/K Abelian is essentially class field theory. 3. In algorithmic practice however, as we shall see presently, the situation is different: the group which will be naturally computed is the group Cli (L/K), not the group ClN (L/K), although of course the latter can also be computed, but less naturally. 1.2
Relative Unit Groups and Regulators
For the same reason as above (the maps i and NL/K ), we can give two definitions of relative units. Definition 3. 1. We say that a pseudo-element αa (see Definition 1) is a relative unit for the map i if αaZL = ZL , i.e. if the pseudo-principal ideal it generates is equal to ZL . The group of relative units for i will be denoted by Ui (L/K).
Computation of Relative Quadratic Class Groups
435
2. We say that an element α ∈ U (L) is a relative unit for the map NL/K if NL/K (α) is a root of unity of K. The subgroup of relative units for NL/K will be denoted by UN,0 (L/K), and we set UN (L/K) = ClN (K) × UN,0 (L/K). 3. We also define UN (K) = U (K)/(µ(K) · NL/K (U (L))) and Ui (K) = Ker(i), considered as a map from U (K) to U (L), where µ(K) denotes the subgroup of roots of unity in K. The following (easy) result gives the main properties of these groups, and also explains somewhat the apparently artificial definition of UN (L/K). Proposition 2. 1. We have Ui (K) = 1 and there exist two natural exact sequences i
1 −→ U (K) −→ U (L) −→ Ui (L/K) −→ i
−→ Cl(K) −→ Cl(L) −→ Cli (L/K) −→ 1 and NL/K
1 −→ ClN (L/K) −→ Cl(L) −→ Cl(K) −→ NL/K
−→ UN (L/K) −→ U (L) −→ U (K)/µ(K) −→ UN (K) −→ 1 . 2. In particular, we have the exact sequence 1 −→ U (L)/i(U (K)) −→ Ui (L/K) −→ Cli (K) −→ 1 . 3. Ui (L/K) and UN (L/K) are finitely generated Abelian groups of equal rank r(L) − r(K), where r(K) (resp. r(L)) denotes the unit rank of K (resp. L) . Although Ui (K) = 1 (trivially), note that we do not have UN (K) = 1 in general. We finally define the two notions of relative regulators. Definition 4. Denote by R(K) (resp. R(L)) the regulator of K (resp. L) . Then we define the relative regulators Ri (L/K) and RN (L/K) by Ri (L/K) = and RN (L/K) =
2 2.1
R(L) R(K)
R(L) 1 . |UN (K)| R(K)
Computation of Relative Class and Unit Groups Introduction
We now come to the algorithmic part of the paper. We consider a relative quadratic extension L/K, and we want to compute one of the relative groups
436
Henri Cohen, Francisco Diaz y Diaz, and Michel Olivier
defined above. Of course one could do this using absolute methods, but it would be much more costly than the relative methods presented below. Note however that, if we want to recover the absolute class or unit groups, this is easily done from the relative ones by using the exact sequences given above and the Abelian group techniques explained in [4]. Please note that the restriction to the relative quadratic case is absolutely not necessary, but enables us to √ use the √ efficient representations described in [5]. We have L = K( D) = K( Dk 2 ), with D ∈ K ∗ \ K ∗ 2 and we may assume WLOG that D ∈ ZK , and we make this assumption from now on. Let us summarize very briefly the results of [5]: there exists a√ unique integral −1 = Z ⊕ q ( D − δ). Every ideal q of ZK and an element δ of ZL such that Z√ L K ideal I of ZL can be represented as I = n(a ⊕q−1 ( D −b)) where a is an integral ideal such that aq2 | (D − b2 ) and b − δ ∈ q, and a and n are unique, as is the class of b modulo aq. To such an ideal is associated the pseudo-quadratic form (a, b, c; n), and ideal operations are easily translated into operations on forms which are immediate generalizations of the operations used in the absolute case. The now standard method to compute class and unit groups consists in first choosing a factor base P1 , . . . , Pk of ideals whose classes generate the class group, then of finding relations among these ideals in the class group using several techniques, and terminating by Hermite and Smith normal form algorithms to compute the class group structure. Simultaneously, one keeps information about the generators of the principal ideals implicitly contained in the class group relations, and after the class group computation one is then able to compute the regulator and if desired a system of fundamental units. Finally, to check whether enough relations have been found, one uses either Minkowski bound techniques which are slow but sure, or assume the Generalized Riemann Hypothesis and use estimates of the product h(K)R(K) given by a partial Euler product of the Dedekind zeta function, which is much faster but assumes GRH. See [2], Chapter 6 and [3] for a detailed description of this method. In the relative case, we copy this method essentially verbatim, but we must describe the technical details which have to be changed, hence we describe the method more precisely. We choose to assume the GRH in the presentation which follows, but the necessary modifications to be made if GRH is not assumed are immediate. 2.2
Choice and Representation of the Factor Base
We fix some limit A, which will determine the size of the factor base. We choose as factor base the prime ideals P of L whose absolute norm is less than or equal to A, and which are not inert in L/K (in our case this means that the ideal p of K below P is either split or ramified). We also choose a small integer s (typically 2 ≤ s ≤ 6) and we construct a sub-factor base by taking the s elements of the factor base of smallest norm which are unramified in L/K (if A is not large enough to obtain s such prime ideals, we of course enlarge it). The ideals P considered as elements of the factor base will be used only for computing valuations vP (I) of ideals of L as described in [5]. Thus, they will be
Computation of Relative Quadratic Class Groups
437
√ stored as P = p ⊕ q−1 ( D − bp ), i.e. as the pair (p, bp ). To have uniqueness of this representation, we choose bp HNF reduced modulo the ideal pq in the sense described in [5]. When the ideals P are considered as elements of the sub-factor base, we are going to use them to compute power products, and here we will represent them completely by their associated pseudo-quadratic form. While computing the factor base, we will find a number of relations in the relative class group of the form pZL =
Y
PeP
P|p
(in our case, either pZL = P1 P2 or pZL = P2 ). Note that pZL is by definition a pseudo-principal ideal, hence we indeed naturally get relations in the relative class group Cli (L/K). These relations are called the trivial relations. There are two ways to use the relations coming from the split case pZL = P1 P2 . We can either store them as such, or we can decide to choose in our factor base only one of the two ideals P1 and P2 , say P1 . When later we obtain a relation involving the ideal P2 , we simply use the equality P2 = pP−1 1 . This method leads to matrices having approximately half the number of rows, hence which are a priori simpler to reduce. On the other hand, if we keep both P1 and P2 in our factor base, one a complete matrix of relations is obtained we can start reducing the matrix by grouping together the rows corresponding to P1 and P2 , so the methods are equivalent. 2.3
Finding Relations in the Relative Class Group
In the absolute case, apart from the trivial relations, there are essentially two methods to obtain relations in the class group. The first one is by looking for elements of small norm, which will then factor on our factor base and give us a relation. The second method is the random relation method, where we take (small) random exponents of our sub-factor base, reduce the resulting product, and hope that it factors on our factor base since a reduced ideal should be “small” in some sense. We use here only the second method, since we have not yet found an efficient way to find elements of small relative norm. Thus we choose limits for our exponents, and for each element gi in our sub-factor base, we compute a reduced pseudo-quadratic form equivalent to gim for m between the chosen limits (for example for −8 ≤ m ≤ 7). To find relations, we then compute random integers mi in the chosen exponent and compute a reduced ideal equivalent in the relative class group to Q range, mi g , where s is the size of our sub-factor base. The product, inverses and i 1≤i≤s reduction of the representatives of the classes are computed using the algorithms mentioned in [5]. This reduction is very naive but at least it has allowed us to solve some non-trivial problems (see §3 below) hence is not completely useless.
438
Henri Cohen, Francisco Diaz y Diaz, and Michel Olivier
We then factor this reduced ideal (more precisely the pseudo-quadratic form associated to it) using P-adic valuations. If it does factor on our factor base, we have found a relation. Q vi Each relation found is of the form Pi = αaZL . Thus, we see that the natural object which enters this algorithm is the relative class group Cli (L/K). We store such a relation by storing first the exponents vi , and second the pseudoelements αa (the technical way in which the latter is important in practice, but need not be mentioned here). In addition, if we know that the absolute extension L/Q is Galois, we can compute the Galois automorphisms and apply them to the relations, thus giving quite cheaply more relations (although sometimes identical to relations obtained before). 2.4
The Complete Algorithm
We can now briefly sketch the complete algorithm, without writing it formally. We do not details when they are identical to the absolute case, and refer to [2] for those. Let L/K be a given extension. A) As a first initialization step, compute everything that will be needed about the base field K, including its class and unit groups (the bnfinit function of Pari). B) Compute basic data about the relative extension L/K, including in particular the ideal q and the data allowing to go back and forth from ideals of K to L (the rnfinit function of Pari). C) After choosing a suitable constant A, compute ramified and split ideals P of L of absolute norm less than A, and express them as pseudo-quadratic forms as explained above. This will be the factor base. At the same time, store the corresponding trivial relations including the pseudo-elements. D) Choose small values s, l1 and l2 (for example s = 3, l1 = −8, l2 = 8). Extract from the factor base the s non-ramified prime ideals Pj of smallest norm, and compute the reduced quadratic forms associated to Pm j for 1 ≤ j ≤ s and l1 ≤ m < l2 . E) For 1 ≤ j ≤ s, choose random exponents mj such that l1 ≤ mj < l2 , and Q m compute a reduced form F equivalent to 1≤j≤s Pj j . Using P-adic valuations, try to factor F on our factor base. If it does factor, store the resulting relation in the format explained above: a column vector of integer exponents, together with a pseudo-element generating a principal ideal. F) If one believes that one has enough relations, compute the Hermite normal form of the relation matrix, and simultaneously the corresponding pseudoelements. As in the absolute case, the pseudo-elements which correspond to zero columns will be relative units for i, i.e. pseudo-elements αa such that αaZL = ZL . G) Compute a tentative relative class group (the Smith Normal Form of our relation matrix) and class number (its determinant). From this and knowledge of the class group of the base field K, one easily deduces a tentative absolute class number. Similarly, using the solution to the principal ideal problem in the base field, from the relative units that we have obtained we can obtain a set of absolute units of L and compute a tentative absolute regulator.
Computation of Relative Quadratic Class Groups
439
H) As in the absolute case, since we have assumed GRH, we check that a suitable partial Euler product coming from the absolute Dedekind ζ function of L is sufficiently close (up to a factor of 2) to the tentative product of the class number by the regulator. If it is not, compute more relations and go back to step F. I) Otherwise, we have computed the relative class group and regulator under some reasonable hypotheses. The ideals occurring in the pseudo-elements that have been kept for computing the relative units will by definition generate the capitulation subgroup Cli (K), which we thus compute at the same time. As in the absolute case, we can also compute a fundamental system of units if desired. A word about the correctness of the result: as in the absolute case, we need to assume GRH in two essential places. First in the numerical verification of the product h(L)R(L), to insure fast convergence of the Euler product. But second, we also need that our factor base generates the class group, by taking as constant A the value 12 log(|d(L)|)2 and using a theorem of E. Bach. As in the absolute case however, we choose a much lower constant A, and then we must “be honest”, i.e. check that all the prime ideals of norm between A and Bach’s bound are generated in the relative class group by the prime ideals of norm less than A. This can easily be done by generating more random relations involving the specific prime which is examined. Finally note that, as in the absolute case, if we keep the full HNF of the reduction matrix and the corresponding pseudo-elements (and not only the class group and the relative units), it is easy to solve the principal ideal problem in L (in fact more precisely the pseudo-principal ideal problem!)
3 3.1
An Example and Conclusion A Striking Example
The following example was given to us by C. Fieker and shows some of the limi√ tations of the absolute method. Let L = Q(ζ9 , −4201), where ζ9 is a primitive 9-th root of unity. Compute its class group, regulator, √ units, etc... For completeness, note that L is the compositum of√Q(ζ9 ) and Q( −4201), and that Q(ζ9 ) has class number equal to 1 while Q( −4201) has class group isomorphic to C36. The field L enters naturally if you want √ to apply Kummer theory to the construction of the Hilbert class field of Q( −4201) (which can in this specific case of an imaginary quadratic field be constructed very simply by using complex multiplication). The field L is a totally complex number field of degree 12 over Q, with root discriminant approximately 673.6, so neither the degree nor the discriminant are large compared to what can be presently attacked. However, if you feed it to the best existing programs (Kant, Pari,...), even a week of CPU time on a good workstation does not seem to produce enough relations in the class group. This is due mainly to the fact that L has many subfields. Thus, when searching for example for elements of small norm, they tend to be in the smaller subfields, and so the relations they generate are highly dependent.
440
Henri Cohen, Francisco Diaz y Diaz, and Michel Olivier
This example was one of the main motivations for the work described in this paper (although relative class group computations are now a natural goal). We chose of course K = Q(ζ9 ), which happens to have class number 1 so the relative and absolute class groups coincide, and the pseudo-elements are simply elements. As parameters in the algorithm described above, we chose (almost arbitrarily) A = 600, s = 5, l1 = −8, l2 = 8. By laziness, we did not check that the ideals up to Bach’s bound (here 73291) are generated by the small ones, although it would be easy. In less than 2 hours time on a Pentium pro 200, we found the desired result: the class group is isomorphic to C377244 × C6 (note that 377244 = 22 · 33 · 7 · 499) and the absolute regulator is approximately equal to 3338795.5921522... We of course have explicitly the generators of the class group and the fundamental units themselves, as well as the information necessary to solve the principal ideal problem in L. 3.2
Conclusion
We have developed an algorithm to compute relative class and unit groups. Although explicited for relative quadratic extensions, this algorithm can evidently be extended to a general relative extension, except that the representation of the relative ideals will be more complicated. It shows in addition that the basic theoretical notions which are useful in a computational context are the notions linked to the map i, i.e. the relative class group Cli (L/K), unit group Ui (L/K), the capitulation subgroup Cli (K), and the notion of pseudo-element. The main weakness of the algorithm, which is completely independent of the rest, is that we have not been able to develop a reasonable theory of relative reduction of ideals. As the example that we have just given shows, however, even a very naive definition of reduction such as the one used here suffices to give highly non-trivial results.
References 1. A.-M. Berg´e and J. Martinet: Notions relatives de regulateurs et de hauteurs. Acta Arith. 54, No.2, (1989) 155–170 2. H. Cohen: A Course in Computational Algebraic Number Theory. GTM 138, (1993) Springer-Verlag 3. H. Cohen, F. Diaz y Diaz and M. Olivier: Subexponential Algorithms for Class and Unit Group Computations. J. Symb. Comp. 24 (1997) 433–441 4. H. Cohen, F. Diaz y Diaz and M. Olivier: Algorithms for Finite Abelian Groups. Submitted to J. Symb. Comp. (1997) 5. H. Cohen, F. Diaz y Diaz and M. Olivier: Pseudo-quadratic forms and their applications. Preprint (1998)
Generating Class Fields Using Shimura Reciprocity Alice Gee and Peter Stevenhagen Universiteit van Amsterdam, Plantage Muidergracht 24, 1018 TV Amsterdam, The Netherlands [email protected] [email protected]
Abstract. The abelian extensions of an imaginary quadratic field can theoretically be generated by the values of the modular j-function, but these values are too large to be useful in practice. We show how Shimura’s reciprocity law can be applied to find small generators for these extensions, and to compute the corresponding irreducible polynomials.
1
Introduction
Among the finite extensions of a number field, the abelian extensions play a special role. Over the rational number field Q, the Kronecker-Weber theorem states that the abelian fields are the subfields of the cyclotomic fields Q(ζ) obtained by adjoining a root of unity ζ ∈ C to Q. If ζ is of order N , the corresponding irreducible polynomial is the N -th cyclotomic polynomial ΦN . It is easy to compute ΦN for small N , and for such N the coefficients of ΦN are very small. By Galois theory, essentially the theory of cyclotomic periods developed by Gauss, we can descend and find explicit generators for subfields of Q(ζ). The arithmetic theory of abelian fields is much nicer than that of arbitrary number fields. They come with explicit groups of cyclotomic units, which can be exploited to find their class groups in situations where general class group algorithms currently have no hope of succeeding [10]. Over an arbitrary number field K, the abelian extensions K ⊂ L are described by class field theory. For such L, the Galois group Gal(L/K) is canonically isomorphic to a quotient JK /(K ∗ NL/K JL ) of the id`ele group JK of K by the open subgroup generated by K ∗ and the norm image NL/K JL of the id`ele group of L. Conversely, every open subgroup B ⊂ JK containing K ∗ corresponds in this way to a unique abelian extension of K, the class field of B. The explicit determination of class fields is one of the main tasks of computational class field theory, an algorithmic area that has only recently come to enjoy popularity. The id`ele group JK is a large object that is not always convenient for explicit computations. It is however possible to describe the finite quotients of JK /K ∗ corresponding to abelian extensions of K in a different way, as quotients of the ray class groups Cn of K. These are finite abelian groups depending on a conductor n, and they play a role that is analogous to that of the Galois groups (Z/N Z)∗ in the case of the cyclotomic extensions Q(ζN ) of Q. A special role is J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 441–453, 1998. c Springer-Verlag Berlin Heidelberg 1998
442
Alice Gee and Peter Stevenhagen
played by the ray class field corresponding to the trivial conductor n = 1: this is the Hilbert class field H of K. The Galois group Gal(H/K) is canonically isomorphic to the class group C of K, and for this reason it is one of the most important extension fields of K. Just as in the case K = Q, the explicit determination of abelian extensions of K reduces to the problem of generating the ray class fields Hn corresponding to the ray class group Cn , at least if we measure the size of an abelian extension by its conductor. By generating Hn we mean computing a polynomial h ∈ K[X] for which we have Hn ' K[X]/(h). Even though class field theory proves the existence of class fields in a constructive way, it does not readily provide an algorithm to compute generators for class fields, not even for the Hilbert class field. The theory indicates that these extensions can in principle be generated over large extensions of K. Algorithmically it is often not feasible to do computations over these large number fields, and this is a serious obstruction. The only class of fields K different from Q for which there exists a theory that yields generators of class fields is the class of imaginary quadratic fields, and it is this class that we will address in the current paper. The theory of complex multiplication asserts that the ray class fields over an imaginary quadratic field K can be generated by the values of suitable modular functions. These modular functions can be viewed as elliptic analogues of the m at the rational points exponential function q(z) = exp(2πiz), whose values ζN ∈ Q generate the class fields of Q. A basic example of the theorems from z= m N complex multiplication is the following. Theorem 1. Let K be imaginary quadratic with ring of integers O = Z[θ]. Then the Hilbert class field H of K is generated by the value j(θ) of the modular function j. The modular function j : H → C is a complex valued function on the complex upper half plane H that occurs in may contexts. As a function on lattices Λ = [ω1 , ω2] ∈ C, the value j(Λ) = j(ω1 /ω2 ) is the j-invariant of the complex elliptic curve E = C/Λ. The conjugates of j(θ) = j(O) over either K or Q are the values j(a) for [a] ranging over the ideal classes Q in C(O). These values are algebraic integers, so the class polynomial FO = [a]∈C(O) (X − j(a)) ∈ Z[X] can be computed using complex approximations of the values j(a). As it is relatively easy to approximate these values, this method is to be preferred over the computationally unfeasible algebraic computation of FO as a factor of some modular polynomial Φm (X, X) that is explained in [2]. Numerical examples show that the polynomial FO is already huge for small values of the discriminant of K. For the field of discriminant −71, which has
Generating Class Fields Using Shimura Reciprocity
443
class number 7, the class polynomial equals FO = X 7 + 313645809715 X 6 − 3091990138604570 X 5 + 98394038810047812049302 X 4 − 823534263439730779968091389 X 3 + 5138800366453976780323726329446 X 2 − 425319473946139603274605151187659 X + 737707086760731113357714241006081263 , and the situation rapidly gets worse. Weber discovered that in many cases, one can generate H using functions that are considerably smaller than the j-function. For the example above, there exists a modular function f of level 48 and degree 72 over Q(j) that gives rise to a class invariant f(θ) with irreducible polynomial X 7 + X 6 − X 5 − X 4 − X 3 + X 2 + 2X + 1 . The observations on such ‘lucky occurrences’ in [11] range from theorems and numerical observations to open questions, and the distinction between them is not always clear. Following the confusion around Heegner’s purported proof of the class number one problem for imaginary quadratic orders, some of the obscure points were clarified by Birch [1] and Stark [9] in 1969. The revival of interest in modular forms in the seventies, and more in particular the contributions to the subject by Shimura [7], [8] have resulted in the development of abstract tools that are ideally suited to deal with the questions raised by Heegner. The aim of this paper is to show that Shimura’s techniques can be applied to answer the following basic questions: 1. given a modular function f, determine for which K the value f(θ) at a generator θ of OK generates the Hilbert class field H of K; 2. if f(θ) generates H, compute its irreducible polynomial. In fact the techniques can be used to identify the field K(f(θ)) in all cases, or to generate other class fields than the Hilbert class field. Even if f(θ) is not a class invariant, the information obtained is usually sufficient to produce a generator of H from f(θ)). As we will see (theorem 3), a modular function f that yields a class invariant for some imaginary quadratic field K does so for a positive proportion of all imaginary quadratic fields. From a complexity point of view, the improvement in using ‘small’ modular functions is not dramatic: as we are still working with exponential functions, the size of the coefficients of the generating polynomials for the Hilbert class field grows exponentially with the discriminant of K. This seems to be an unavoidable consequence of the theory of complex multiplication. On the other hand, the improvement by a constant factor (like 72 or 48) in the size of the coefficients enables us to produce decent generating polynomials when the discriminant of K is of moderate size. The modular function j does not have this property and is therefore never useful in computational practice.
444
2
Alice Gee and Peter Stevenhagen
Modular Functions
Before we can start our investigations on class invariants, we provide concise definitions of the modular functions that we use and indicate some ‘small’ modular functions of not too high level that can be used to produce class invariants. Proofs of all statements in this section can be found in [4]. The basic example of a modular function is the j-function encountered in the introduction. This is a holomorphic function on H that respects the action of the elements of the modular group Γ = SL2 (Z) on H. By this we mean that we have a b j( az+b cz+d ) = j(z) for z ∈ H and c d ∈ Γ . Note that the action of Γ on H factors via the quotient Γ/ ± 1. The j-function has simple pole at infinity and extends ∼ to an isomorphism of Riemann surfaces j : (Γ \ H)− −→ P1 (C) between the compactified orbit space and the complex projective line. The elements of the corresponding function field C(j) of rational functions in j are called modular functions of level 1. One obtains modular functions of arbitrary level N ≥ 1 by replacing the modular group Γ in the setting above by the congruence subgroup Γ (N ) = ker[SL2 (Z) → SL2 (Z/N Z)]. The compactified Riemann surface (Γ (N ) \ H)− is then isomorphic to the modular curve X(N ) of level N over C. The natural map X(N ) → X(1) is a Galois covering with group Γ/(±1 · Γ (N )) = SL2 (Z/N Z)/ ± 1. This implies that the function field FN,C of X(N ), whose elements are the modular functions of level N , is a Galois extension of F1,C = C(j) with group SL2 (Z/N Z)/ ± 1. Thus the modular functions of level N are simply the Γ (N )invariant meromorphic functions on H that are also ‘meromorphic at infinity’. Note that such functions are invariant under z 7→ z + N , hence periodic. As we want our modular functions to have algebraic values, we need to define our function field FN,C over a smaller base field than C. For N = 1, the modular curve X(1) = P1 (C) can clearly be defined over Q, with function field F1 = Q(j). In the general case one needs to pass to the cyclotomic base field Q(ζN ). This means that there exists a Galois extension FN of Q(ζN , j) with group SL2 (Z/N Z)/ ± 1 that yields FN,C after base change from Q(ζN ) to C. More precisely, we can write the elements of FN,C as Laurent series in q 1/n with q = exp(2πiz), and then FN is the subfield of FN,C ⊂ C((q 1/N )) consisting of the functions with Fourier coefficients in Q(ζN ). The action of the cyclotomic Galois group (Z/N Z)∗ = Gal(Q(ζN , j)/Q(j)) has a natural extension to FN ⊂ Q(ζN )((q 1/N )), and this leads to a description of Gal(FN /F1 ) = Gal(FN /Q(j)) as a semidirect product (SL2 (Z/N Z)/ ± 1) o (Z/N Z)∗ ' GL2 (Z/N Z)/ ± 1 . Here (Z/N Z)∗ is embedded in GL2 (Z/N Z)/ ± 1 as the subgroup of elements of the form 10 d0 ∈ GL2 (Z/N Z). For the full modular function field F∞ = ∪N≥1 FN one obtains the Galois group over Q(j) by passing to the projective limit: b ± 1. Gal(F∞ /Q(j)) = lim (GL2 (Z/N Z)/ ± 1) = GL2 (Z)/ ←N
The main theorem of complex multiplication is the following generalization of the theorem stated in the introduction.
Generating Class Fields Using Shimura Reciprocity
445
Theorem 2. Let K be imaginary quadratic with ring of integers O = Z[θ] and N ≥ 1 an integer. Then the ray class field HN of conductor N of K is generated over K by the values f(θ) of the functions f ∈ FN that do not have a pole at θ. It follows that the maximal abelian extension K ab of K is generated by the finite function values f(θ) for f ∈ F∞ . Our example for discriminant −71 in the introduction illustrates the fact that the j-function is already large for values of θ with small imaginary part. There are however modular functions f of higher level that are ‘smaller’ than the j-function. As we are working with functions that are integral over Z[j], this simply means that the coefficients occurring in the q-expansion of f are smaller than those we encounter for j. We mention a couple of possibilities for f, leaving a fuller treatment to [3]. The oldest examples of modular functions yielding class invariants are obtained by modification of the j-function itself. One has representations j=
(216g3)2 (12g2 )3 = 1728 + ∆ ∆
of j in terms of the normalized Eisenstein series g2 and g3 and the discriminant function ∆. It follows that there exist holomorphic branches γ2 of the cube root of j and γ3 of the square root of j − 1728 on H, both with rational q expansions. The modular group Γ acts on these functions via characters of order 3 and 2, respectively. In terms of the standard generators S = 10 11 and T = 01 −1 0 of Γ/ ± 1 we have S
(γ2 , γ3 ) −→ (γ2 , −γ3 )
and
(γ2 , γ3 ) −→ (ζ3−1 γ2 , −γ3 ). T
One deduces that we have γ2 ∈ F3 and γ3 ∈ F2 . Much smaller functions can be obtained using the fact that the discriminant function ∆ equals (2π)12 η 24 , with η the Dedekind η-function. Q This is a holomorphic function on H with rational q-expansion η(q) = q 1/24 n>0 (1 − q n ), and its transformation behavior under Γ is given by a simple formula ([4], §18.5). The discriminant function is a modular form of weight 12, not a modular function. One can obtain modular functions by forming suitable quotients of modular forms of the same weight, see ([4], §11.2). A good example is the modular function ∆(2z)/∆(z) of √ level 2, which has a holomorphic 24-th root η(2z)/η(z) on H. The function f2 = 2·η(2z)/η(z), which is integral over Z[j], has been studied by Weber, who introduced related modular functions f and f1 with rational qexpansions satisfying (X + f8 )(X − f81 )(X − f82 ) = X 3 − γ2 X + 16 . These functions f, f1 and f2 are in F48 , and they are much smaller than j. Each of them generates an extension of degree 72 of the field F1 = Q(j), and the action of S and T is given by S
(f, f1, f2 ) −→ (f, f2 , f1)
and
−1 −1 2 (f, f1 , f2) −→ (ζ48 f1 , ζ48 f, ζ48 f2 ) . T
446
Alice Gee and Peter Stevenhagen √
The miraculous fact that the function value ζ48 f2 ( −1+2 −71 ) lies in the Hilbert √ class field of Q( −71) accounts for the existence of the small polynomial given in the introduction. There are other modular functions than those introduced by Weber that yield small generators as well. Schertz constructs small generators from quotients of η-functions in [6]. In a similar vein, one can generalize Weber’s √ classical functions of level 48 by considering the holomorphic 24-th root n · η(nz)/η(z) of n12 ∆(nz)/∆(z) for any integer n > 1. For √ n = 2 this yields Weber’s function f2 , for n = 3 we obtain a function g3 = 3 · η(3z)/η(z) ∈ F72 of degree 48 over Q(j) that satisfies an analogue (X + g6 )(X − g61 )(X − g62 )(X − g63 ) = X 4 + 18X 2 + γ3 X − 27 √ of the identity for √ f2 above. The relation gg1 g2 g3 = 3 is analogous to Weber’s identity ff1 f2 = 2. It shows that, unlike the case of the modular function j, the values of these functions at singular moduli are ‘almost’ algebraic units.
3
Finding Class Invariants
As in the previous sections, we let K be an imaginary quadratic field with ring of integers O = Z[θ]. For uniqueness’ sake, we normalize θ such that we have TrK/Q (θ) ∈ {−1, 0}. We formulate the problem of finding class invariants as follows: given some modular function f ∈ FN , determine for which K the value f(θ) lies in the Hilbert class field H of K. By the complex multiplication theorem from section 2, we know that f(θ) is an element of K ab , and even of the ray class field of conductor N of K. In order to prove that f(θ) actually lies in K, it suffices to show that all automorphisms in Gal(K ab /H) act trivially on f(θ). The group Gal(K ab /H) can be described by class field theory. The main theorem of class field theory for imaginary quadratic fields can be phrased as a single exact sequence A b ∗ −→ Gal(K ab /K) −→ 1 . 1 −→ K ∗ −→ K
b ∗ of K. b ∗ = (K ⊗Z Z) Here A denotes the Artin map on the finite id`ele group K Q0 b ∗ is the b= eles of K, and that K Note that K p (K ⊗Q Qp ) is the ring of finite ad` factor group of JK obtained by dividing out the component group corresponding to the infinite prime of K, which is also the connected component of the identity b we have the profinite completion O b = O ⊗Z Z b = lim←N (O/N O) in JK . Inside K Q ∗ b b ∗ is of the ring of integers O of K. The subgroup O = p (O ⊗Z Zp )∗ of K ab the inverse image under the Artin map of Gal(K /H), and we need to check b∗ on f(θ) induced by A is trivial. The Shimura whether the natural action of O b ∗ can be obtained as the reciprocity law says that the image of f(θ) under x ∈ O value in θ of a modular function that is conjugate to f over Q(j). More precisely,
Generating Class Fields Using Shimura Reciprocity
447
there is a map g = gθ connecting the exact rows A b∗ O −→ Gal(K ab /H) −→ 1 g yθ b −→ Gal(F∞ /F1 ) −→ 1 . 1 −→ {±1} −→ GL2 (Z)
1 −→ O∗ −→
(1)
such that we have Shimura’s reciprocity relation (f(θ))x = (f g(x
−1
)
)(θ)
(2)
and the fundamental equivalence (f(θ))x = f(θ)
⇐⇒
f g(x) = f .
Note that only the implication ⇐ is immediate from the reciprocity relation, the implication ⇒ requires an additional argument ([7], prop. 6.33). With the exponent −1 in the reciprocity relation (2), the definition of the b is the transpose connecting homomorphism g is the following: g(x) ∈ GL2 (Z) ∗ b b on the free Z-module of the matrix describing the multiplication by x ∈ O b = Z b ·θ+Z b with respect to the basis [θ, 1]. Explicitly, if θ has irreducible O polynomial X 2 + BX + C, then we have t − Bs −Cs (3) gθ : x = sθ + t 7−→ s t If f ∈ FN is a modular function of level N , the value f(θ) lies in the ray class b ∗ can be computed via the finite quotient (O/N O)∗ . field and the action of O Diagram (1) may then be reduced to a diagram of finite abelian groups ∗ O∗ −→ (O/N O) −→ Gal(HN /H) −→ 1 g yθ {±1} −→ GL2 (Z/N Z) −→ Gal(FN /F1 ) −→ 1 .
(4)
In order to prove that f(θ) lies in H, we compute generators x1 , x2, . . . , xk of the group (O/N O)∗ , map them to GL2 (Z/N Z) using the reduction g of g modulo N and check that each of the matrices g(xi ) ∈ GL2 (Z/N Z) acts trivially on f. This is relatively straightforward if we know the action on f of the standard generators S, T ∈ Γ and of the Galois group Gal(Q(ζN )/Q) = (Z/N Z)∗ . We refer to section 5 for some explicit examples. If we replace the base field K in diagram (4) by another quadratic field whose discriminant is in the same residue class modulo 4N , the integers B and C occurring in (3) coincide modulo N and the image of g : (O/N O)∗ −→ GL2 (Z/N Z) is the same for both fields. We get the following result. Theorem 3. Let K be imaginary quadratic with ring of integers Z[θ], and let f ∈ FN be a modular function with the property that f(θ) lies in the Hilbert class field of K. Then the same statement holds for all for all imaginary quadratic fields whose discriminant is congruent to disc(K) modulo 4N .
448
4
Alice Gee and Peter Stevenhagen
Computing Class Polynomials
Suppose that we have found a modular function f for which f(θ) lies in the Hilbert class field H of K. In order to find its irreducible polynomial over K, we need to determine the conjugates of f(θ) over K. This means that we have to compute the action of the class group C(O) = Gal(H/K) on f(θ). For any imaginary quadratic order O of discriminant D, we can list the elements of C(O) as reduced primitive binary quadratic forms [a, b, c] of discriminant D. For our purposes, it suffices to know that these are triples [a, b, c] of integers satisfying gcd(a, b, c) = 1 and b2 − 4ac = D. They are reduced if they satisfy |b| ≤ a ≤ c and, in case we have |b| = a or a = c, also b ≥ 0. For given discriminant D < 0, there are only finitely many such triples, and they are easily enumerated if D is not too large. The correspondence between reduced forms and elements of the class group is obtained by associating to [a, b, c] the class of √ −b+ D the ideal with Z-basis [ 2 , a]. Note that [a, b, c] and [a, −b, c] correspond to inverse ideal classes. In terms of quadratic forms, the action of the class group of O = Z[θ] on the value j(θ) of the j-function is given by √
j(θ)[a,−b,c] = j( −b+ 2a
D
).
For a general modular function f with f(θ) ∈ H, Shimura reciprocity enables us to determine the conjugate f˜ of f over Q(j) for which we have √
˜ −b+ f(θ)[a,−b,c] = f( 2a
D
).
(5)
For this one needs a more general form of the law, which gives us the analogue of b ⊗Z Q)∗ on the values f(θ). For this b ∗ = (O (1) for the action of the full group K we need to replace Gal(F∞ /F1 ) in (1) by the full automorphism group Aut(F∞ ), b and those coming which is generated by the automorphisms coming from GL2 (Z) + from the group GL2 (Q) of rational 2 × 2-matrices of positive determinant. The right action of GL2 (Q)+ on F∞ comes, just like in the case of Γ = SL2 (Z), from the natural action of GL2 (Q)+ on H via fractional linear transformations: Q + b = f α = f ◦ α. The groups GL2 (Z) p GL2 (Zp ) and GL2 (Q) are subgroups of Q0 b = GL2 (Qp ) of invertible 2 × 2- matrices over the finite the group GL2 (Q) b of Q. They have intersection Γ = SL2 (Z), and every b = Q ⊗Z Z ad`ele ring Q b can be written in a non-unique way as u · α with u ∈ GL2 (Z) b element of GL2 (Q) + b and α ∈ GL2 (Q) . One proves that this induces an action of GL2 (Q) on F∞ u·α given by f −→ (f u )α . We can now enlarge diagram (1) to A b ∗ −→ K Gal(K ab /K) −→ 1 gθ y ∗ b −→ Aut(F∞ ) −→ 1 . 1 −→ Q −→ GL(Q)
1 −→ K ∗ −→
(6)
and with this diagram the reciprocity relation (2) holds unchanged. Here g = gθ is the natural Q-linear extension of the map defined in (3).
Generating Class Fields Using Shimura Reciprocity
449
b ∗ that locally For every ideal a of O, we can find an id`ele x = (xp )p ∈ K generates a at p for all rational primes p. This means that we have a ⊗Z Zp = xp (O ⊗Z Zp ) √
for all p. For the ideal with Z-basis [ −b+2 D , a] corresponding to [a, b, c], we can take x = (xp )p with if p - a a √ −b+ D if p | a and p - c xp = 2√ −b+ D − a if p | a and p | c . 2 This id`ele maps to [a] ∈ C(O) under the Artin map, so we have f(θ)[a,b,c] = f(θ)x for this x. Applying the reciprocity relation (3) for x−1 , we find f(θ)[a,−b,c] = to the basis [θ, 1] the Q(f g(x) )(θ). Let M ∈ GL2 (Q)+ describe with respect √ −b+ D linear map K → K that maps [θ, 1] to [ 2 , a]. Then M acts on H by M (θ) =
√ −b+ D , 2a
and we obtain f(θ)[a,−b,c] = f g(x)·M
−1
√
( −b+ 2a
D
).
(7)
b so f˜ = A straightforward check shows that ux = g(x) · M −1 is in GL2 (Z), g(x)·M −1 is a conjugate of f over Q(j), and (7) is the explicit form of (5). f Computing the function f˜ from f is another instance of the problem solved in the previous section. Clearly, we only need to compute ux modulo the level N of f. If pk is the largest power the prime p dividing N , we find that we have a b/2 if p - a pk 0 1 mod −b/2 −c k (8) if p | a and p - c mod p ux ≡ 1 0 −b/2−a −b/2−c mod pk if p | a and p | c. 1 −1 in the case that D is odd. For even discriminants there is a similar formula.
5
Numerical Examples
We apply the methods explained in this paper to find small generators for a few imaginary quadratic fields K. For each of these fields, the class polynomial obtained by a straightforward application of theorem 1 is huge. Example 1. Let D = −71 be the prime discriminant occurring in the introduction. Then 2 and 3 do not divide D, and it was already known to Weber that in invariants when evaluated at this situation, the functions γ2 and γ3 yield class √ −1+ −71 has irreducible polynomial appropriate values in O = Z[θ], where θ = 2 X 2 + X + 18. In order to check this for γ2 , which has level 3, one notes first that 3 splits in K, so (O/3O)∗ /O∗ is a cyclic group of order 2 generated by θ − 1. The action
450
Alice Gee and Peter Stevenhagen
of the matrix g θ (θ − 1) = 11 02 ∈ GL(Z/3Z) is given by ζ3 7→ ζ32 and γ2 7→ ζ32 γ2 , so it leaves α = ζ3 γ2 (θ) invariant. In order to find the irreducible polynomial of α, we list the 7 reduced quadratic forms of discriminant −71 and for each form the matrix u ∈ GL2 (Z/3Z) corresponding to it by (8). From the complex approximations of the conjugates of α we find a polynomial α = X 7 + 6745 X 6 − 327467 X 5 + 51857115 X 4 + 2319299751 X 3 fK + 41264582513 X 2 − 307873876442 X + 903568991567
that is somewhat smaller than the class polynomial listed in the introduction. In order to discover the small class invariant arising from the Weber function f2 ∈ F48 , we need to compute the action of the generators of (O/48O)∗ /O∗ on f2 (θ). As 2 is also split in K, this is an abelian 2-group of type (2) × (2) × (2) × (4) × (4). One can take {17, 16θ + 17, 6θ + 19, 19, 36θ + 1} as a set of generators. Applying gθ , one sees that the first generator acts trivially on Q(ζ48 , f2 ). Table 1. Action of (O/48O)∗ on Q(ζ48 , f2) for θ = σ
σ−1 ζ48
σ−1 f2
16θ + 17 6θ + 19 19 36θ + 1
ζ3 ζ85 −1 ζ4
ζ32 ζ83 −1 ζ43
√ −1+ −71 2
Table 1 gives the action of the gθ -image σ of the remaining generators. It follows that ζ48 f2 (θ) is left invariant. Its conjugates over K are computed as before, and we find the small polynomial from the introduction: ζ
fK48
f2 (θ)
= X 7 + X 6 − X 5 − X 4 − X 3 + X 2 + 2X + 1 .
In fact, one can show in this way that ζ48 f2 yields a class invariant whenever 2 splits and 3 is unramified in K. One can also construct class invariants for D = −71 using the generalized Weber function g(z) = η(z/3)/η(z) ∈ F72 . The group (O/72O)∗ /O∗ is of type (2)×(2)×(2)×(6)× (6) with set of generators {19, 36θ +1, 18θ +1, 65, 64θ +65}. One finds that β = ζ3 g2 (θ) is a class invariant, with irreducible polynomial β = X 7 + (2 + 2θ)X 5 − (30 + 3θ)X 4 + (51 − 3θ)X 3 − (8 − 10θ)X 2 − (47 + 2θ) . fK
A normalization of the function g2 always works when 3 splits and 2 is unramified in K. In this case we find irreducible polynomials over K, not Q. Example 2. For K of discriminant D = −580 = −4 · 5 · 29 the prime 2 is ramified and the prime 3 inert. In this case one can construct a class invariant from the
Generating Class Fields Using Shimura Reciprocity
Table 2. Action of (O/72O)∗ on Q(ζ3 , g) for θ = σ
ζ3σ−1
19 36θ + 1 18θ + 1 65 64θ + 65
1 1 1 1 ζ3
√ −1+ −71 2
σ−1
g
−1 1 1 −1 ζ3
Table 3. Action of (O/48O)∗ on Q(ζ48 , f) for θ = σ
σ−1 ζ48
16θ + 1 33θ + 34 19 33θ + 16
ζ3 ζ43 −1 1
451
f
√
−145
σ−1
1 ζ43 1 1
√ Weber function f(z) = η(z/2)/η(z) ∈ F48 evaluated at θ = −145. To see this, one computes the action of the group (O/48O)∗ of type (8) × (8) × (4) × (4) generated by {16θ + 1, 33θ + 34, 19, 33θ + 16}. Clearly f4 yields a class invariant. However, the action on ζ48 shows that ζ8 is left invariant by all √ generators except the second, which maps ζ8 to −ζ8 . It follows that α = f2 (θ)/ 2 is also a class invariant. In this case K has class number 8 and C(O) is of type (2) × (4). We find α = X 8 − 17X 7 + 7X 6 + 12X 5 − 42X 4 + 12X 3 + 7X 2 − 17X + 1 . fK
Example 3. For K of discriminant D = −471 = −3 · 157 the primes 2 and 3 are respectively split and ramified in K, and C(O) is cyclic of order 16. This case resembles that in example 1, but the ramification at 3 yields a different action of (O/48O)∗ /O∗ , which is now a group of type (6) × (4) × (4) × (2) with generating set {32θ + 33, 19, 36θ + 1, 6θ + 19}. Table 4. Action of (O/48O)∗ on Q(ζ48 , f2) for θ = σ
σ−1 ζ48
32θ + 33 19 36θ + 1 6θ + 9
1 −1 ζ4 ζ85
f
σ−1
ζ3 −1 ζ43 ζ83
√ −1+ −471 2
452
Alice Gee and Peter Stevenhagen
From the Galois action in table√ 4 one deduces that ζ16 f3 yields a class invariant α when evaluated at θ = −1+ 2 −471 . Its irreducible polynomial is α = X 16 + 6X 15 + 62X 14 − 106X 13 + 382X 12 − 942X 11 + 4756X 10 fK − 9629X 9 + 18987X 8 − 22281X 7 + 36601X 6 − 44222X 5 + 60470X 4 − 29217X 3 + 4085X 2 + 1775X − 1 .
This is not as good as in example 1, and we can do better by using the ‘generalized Weber functions’ of level 72. More precisely, we can use the function √ 2 2 g2 (z) = η( z+2 3 )/η(z) ∈ F72 , which yields a class invariant β = ζ3 g2 (θ)/ −3 with irreducible polynomial β = X 16 + 20X 15 − 127X 14 + 342X 13 + 183X 12 − 427X 11 − 1088X 10 fK + 794X 9 + 1333X 8 + 794X 7 − 1088X 6 − 427X 5 + 183X 4 + 342X 3 − 127X 2 + 20X + 1 .
As in the previous example, we see that our class invariant is a unit — quite a contrast with the modular value j(θ). Example 4. We finally take K of discriminant D = −803 = −11 · 73. Then 2 is inert and 3 is split in K, and (O/48O)∗ /O∗ is a group of type (2)×(4)×(8)×(6) with generating set {16θ + 17, 36θ + 1, 18θ + 1, 15θ + 40}. This time the Galois action is more complicated, involving more than just multiplication by roots unity. Table 5. Action of (O/48O)∗ on Q(ζ48 , f, f1, f2) for θ = σ
σ−1 ζ48
σ f
σ f1
σ f2
16θ + 17 36θ + 1 18θ + 1 15θ − 8
ζ3 ζ4 ζ83 1
ζ32 f −f −f 3 ζ16 f2
ζ32f1 ζ43f1 ζ87f1 11 ζ16 f
ζ32 f2 ζ 4 f2 ζ 8 f2 ζ 8 f1
√ −1+ −803 2
The action of 15θ − 8 in the bottom row shows that none of the Weber functions can be normalized such √as to yield a class invariant. A close approximation is 16 f(θ), an element invariant under the first 3 genhowever given by α = 2ζ48 erators. It generates a cubic extension the Hilbert class field H of K, with √ of √ 31 25 f1 (θ) and γ = 2ζ48 f2 (θ) over H. As we have αβγ = −4, conjugates β = 2ζ48 it follows that H is generated over K by the symmetric expressions α + β + γ and αβ + αγ + βγ = −4(α−1 + β −1 + γ −1 ) . One finds that that the single expression δ = 12 (α+β+γ) is sufficient by checking that C(O), which has order 10, acts transitively on δ. We obtain δ = X 10 + 8X 9 + 19X 8 + 35X 7 + 101X 6 + 179X 5 + 220X 4 fK + 263X 3 + 230X 2 + 100X + 16 .
Generating Class Fields Using Shimura Reciprocity
453
In this case, the symmetric expression ε = α−1 + β −1 + γ −1 also yields a class invariant, with irreducible polynomial ε = X 10 + 22X 9 + 32X 8 + +45X 7 + 109X 6 + 92X 5 + 266X 4 fK + 161X 3 − 104X 2 + 48X + 128 .
References Birch B.: Weber’s class invariants. Mathematika 16 (1969) 283–294 Cox D.A.: Primes of the form x2 + ny2 . Wiley-Interscience (1989) Gee, A.C.P.: thesis Universiteit van Amsterdam (in preparation) Lang, S.: Elliptic functions, 2nd edition. Springer Graduate Text in Mathematics 112 (1987) 5. Schertz, R.: Die singul¨ aren Werte der Weberschen Funktionen f, f1 , f2 , γ2 , γ3 . J. Reine Angew. Math. 286/287 (1976) 46–74 6. Schertz, R.: Probl`emes de construction en multiplication complexe. S´em. Th´eor. Nombres Bordeaux, (2) 4 (1992) no. 2, 239–262 7. Shimura G.: Introduction to the Arithmetic Theory of Automorphic Functions. Iwanami Shoten and Princeton University Press (1971) 8. Shimura, G.: Complex Multiplication. Modular functions of One Variable I Springer Lecture Notes in Mathematics 320 (1973) 9. Stark, H.M. On a ”gap” in a theorem of Heegner. J. Number Theory 1 (1969) 16–27 10. Washington, L.C.: Introduction to Cyclotomic Fields, 2nd edition. Springer Graduate Text in Mathematics 83 (1997) 11. Weber, H.: Lehrbuch der Algebra, Vol. III. Chelsea reprint original edition 1908 12. Yui, N. and Zagier, D.: On the singular values of Weber modular functions. Math. Comp. 66 (1997) no. 20, 1645–1662
1. 2. 3. 4.
Irregularity of Prime Numbers over Real Quadratic Fields Joshua Holden Department of Mathematics and Statistics, University of Massachusetts at Amherst Amherst, MA 01003, USA [email protected]
Abstract. The concept of regular and irregular primes has played an important role in number theory at least since the time of Kummer. We extend this concept to the setting of arbitrary totally real number fields k0 , using the values of the zeta function ζk0 at negative integers as our “higher Bernoulli numbers”. Once we have defined k0 -regular primes and the index of k0 -irregularity, we discuss how to compute these indices when k0 is a real quadratic field. Finally, we present the results of some preliminary computations, and show that the frequency of various indices seems to agree with those predicted by a heuristic argument.
1
Definitions and Basic Concepts
Let k0 be a totally real number field, and let p be an odd prime. Let k1 = k0 (ζp ), where ζpn will denote a primitive pn -th root of unity. Let ∆ = Gal(k1 /k0 ), and let δ = |∆|. Let pe be the largest power of p such that ζpe ∈ k0 (ζp ). Definition 1. Let ζk0 be the zeta function for k0 . We say that p is k0 -regular if p is relatively prime to ζk0 (1 − 2m) for all integers m such that 2 ≤ 2m ≤ δ − 2 and also p is relatively prime to pe ζk0 (1 − δ). The number of such zeta-values that are divisible by p will be the index of k0 -irregularity of p. It is well-known that the numbers ζk0 (1 − 2m) and pe ζk0 (1 − δ) are rational; see, e.g., [16]. In fact we can show that they are p-integral. Deligne and Ribet [6] have shown that one can construct an p-adic L-function Lp (s, ρ) for even characters ρ over a totally real field k0 . Furthermore, there exists a power series fχ ∈ Zp [[T ]] such that Lp (1 − s, ωχ−1 ) = fχ (us − 1)/hχ (us − 1) uller character) and for s ∈ Zp , where hχ (T ) = T if χ = ω (the p-adic Teichm¨ hχ (T ) = 1 otherwise, and u is a certain element of the principal units of Zp . The p-adic L-function Lp (1 − n, ρ) is equal to L(1 − n, ρω−n ) at negative integers, up to some fudge factors which are units. If s = 2m is a positive integer then u2m − 1 is in Zp and so is fχ (u2m − 1). The case 2m < δ corresponds to χ 6= ω, so Lp is p-integral and thus so is L(1 − 2m, 1) = ζk0 (1 − 2m). The case of 2m = δ J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 454–462, 1998. c Springer-Verlag Berlin Heidelberg 1998
Irregularity of Prime Numbers over Real Quadratic Fields
455
is similar, with hχ (uδ − 1) = uδ − 1 being equal to pe , up to a unit. (For more details, see [14].) The definition of p being k0 -regular is consistent with that of p being regular, since if k0 = Q we have δ = p − 1, L(1 − 2m, 1) = −B2m /(2m) for B2m the 2m-th Bernoulli number, 2 ≤ 2m ≤ p − 3, and p never divides pe L(2 − p, 1) = −pBp−1 /(p − 1) by the Claussen-von Staudt Theorem (Theorem 5.10 of [20]) or by the argument given in Section 2 of [17]. Also, k0 -regularity seems to share at least some of the properties of regularity. For example, by a proof analogous to one for the case over Q, there are infinitely many k0 -irregular primes for any given k0 . As applications of this concept, we have the following two theorems (see [14]). The first is a special case of the Fontaine-Mazur Conjecture for number S fields. Let kn = k0 (ζpn ) for n > 1 be a non-trivial extension of k1 . Let K = kn . Theorem 1 (Holden). Suppose p is k0 -regular. Then there are no unramified infinite powerful pro-p extensions M of kn , Galois over k0 , such that K ∩M = kn and Gal(Mel /kn ) = Gal(Mel /kn )− according to the action of ∆, where Mel is the maximal elementary abelian subextension of M/kn . The second theorem is an improvement of a theorem of Greenberg. A more limited version of the concept of k0 -regularity, though not the definition itself, appeared in Greenberg’s paper [11], which presents a generalization of Kummer’s criterion for the class number of k1 to be divisible by `. Let k1+ denote the maximal real subfield of k1 , which is equal to k0 (ζp + ζp−1 ). Let h(k1 ) denote the class number of k1 and h+ (k1 ) denote the class number of k1+ . It is known that h+ (k1 ) | h(k1 ); we let the relative class number h− (k1 ) be the quotient. Theorem 2 (Greenberg, Holden). Assume that no prime of the field k1+ lying over p splits in k1 . Then p divides h− (k1 ) if and only if p is not k0 -regular. Ernvall, in [7,8,9], has defined a notion of “generalized irregular primes” which is closely related to mine for abelian k0 . Also, Hao and Parry, in [12], defined m-regular primes for any integer m and showed they are equivalent to what I have called D-regular primes when m is a positive discriminant. In both cases, results stronger than Theorem 2 are proved, but only for abelian k0 .
2
Computation
√ For the rest of this paper, k0 will be a real quadratic field Q( D), with D a positive fundamental discriminant. For such a k0 , we will say that primes are D-regular or have given index of D-irregularity, and we will let the zeta function ζk0 be also denoted by ζD . In this case δ will be equal to p − 1 unless D = p, in which case δ = (p − 1)/2. Also, e is always equal to 1 when p does not divide the order of k0 over Q, which is true in this case since p is odd. (The case of p = 2 is quite similar, but we will not go into it here.)
456
Joshua Holden
The first order of business is clearly to compute ζD (1 − 2m) for m ≥ 1 an integer. We know that ζ(1 − 2m) = −B2m /(2m), and using similar methods Siegel showed that D B2m 2m−1 X D χ(j)B2m (j/D) . 4m2 j=1 (See [18] and [21].) Here χ(j) = Dj , the Kronecker symbol, and B2m (j/D) indicates the 2m-th Bernoulli polynomial evaluated at the fraction j/D. The Bernoulli polynomial Br (x) can be computed from the Bernoulli numbers as r X r Br−s xs . Br (x) = s s=0
ζD (1 − 2m) =
(This is not the most profound formula Siegel found for such zeta functions, and it is probably also not the fastest. It is, however, the easiest to understand and to program, and so it seems reasonable to start with a thorough analysis of this formula before going on to others. For more on these other expressions, see [19], [21], and [4]. Also, there is a formula called the “continued fraction formula” for these zeta-values, which is likely to be faster but uses more computer storage space. For more on this, see [13].) In order to determine whether a prime p is D-irregular, we will need to know the values of ζD (1 − 2m) for all 2 ≤ 2m ≤ δ, which in most cases will be 2 ≤ 2m ≤ p−1. Thus it is most efficient to first compute all ζD (1−2m) for a range 2 ≤ 2m ≤ M and then use them to test all primes p such that 3 ≤ p ≤ M + 1. The time to check whether p divides ζD (1 − 2m) or pζD (1 − δ), and even to what order, is then much smaller than the time to compute ζD (1 − 2m) in the first place. Therefore it is interesting, and sometimes useful, to find the order of the running time (in m and D) of the computation of ζD (1 − 2m). Since the Bernoulli numbers are ubiquitous in this computation, we will assume that they have been precomputed. For more on the computation of Bernoulli numbers, see e.g. [1] and [10]. Our cost model will assume “naive” arithmetic: if the arguments are integers with bit length t and t0 , we assign a cost of O(t + t0 ) to addition and subtraction and O(tt0 ) to multiplication and division. For rational numbers, there is the phenomenon of “coefficient explosion” when we put fractions over a common denominator to add them. We will deal with this by computing a common denominator of the Bernoulli polynomials in the zeta function beforehand and assuming that all of our values are expressed over it, thus reducing the problem to precomputation and integer arithmetic. For more on the time-complexity of arithmetic, see e.g. Section 1.1.2 of [5]. First we will need to calculate 2m X 2m B2m−s (j/D)s B2m (j/D) = s s=0 =
2m X s=0
(2m)! B2m−s (j/D)s . s!(2m − s)!
Irregularity of Prime Numbers over Real Quadratic Fields
457
All of the fractions in this expression have a common denominator equal to the product of the denominators of B0 , . . . , B2m times D2m , and this is the same for each j. Thus we will assume that the Bernoulli numbers are precomputed over this denominator. (We will discuss the cost of this precomputation later.) The way to compute the Bernoulli polynomial which is asymptotically fastest seems to be to start with B0 and repeatedly perform the operation of multiplying by s (j/D) 2m − s + 1 and then adding Bs , as s goes from 1 to 2m. (This idea, which is used by PARI to compute Bernoulli numbers, may go back to Lehmer. The papers [10] and [1] investigate some fast methods of computing Bernoulli numbers using “inexact” arithmetic due to Wilf, Buhler, et al., which may also be useful for Bernoulli polynomials. However, they require a faster multiplication algorithm to be worthwhile.) In order to figure out how long this will take, we first need to know the bit length of Bs , stored as a rational number. Using a standard estimate of the size of Bernoulli numbers, we see that s s . |Bs | = O 2πe (See Chapter 15 of [15], or Section 3 of [1], for example.) Since we need to keep the Bernoulli numbers over a common denominator, our denominator will be D2m times the product of the denominators of B0 , . . . , B2m , and the Claussenvon Staudt Theorem shows that this second factor is Y p . 1≤2k≤m (p−1)|2k
Bach shows in [1] that the bit size of this is O(m lg m), so the bit size of our denominator as a whole is O(m(lg m + lg D)). Thus the bit length of Bs , which is the bit length of the numerator plus the bit length of the denominator, is s s + 2m(lg m + lg D) , O lg 2πe or O(m(lg m + lg D)). The order of running time is controlled by 2m multiplications of (s/(2m − s + 1))(j/D) by the result of the previous step. The first factor has bit length of order O(lg m + lg D), while the second factor has a bit length of the same order as the final result, B2m (j/D), which from its definition can be seen to have bit length of order O(m + m(lg m + lg D) + m lg D) + O(lg D) = O(m(lg m + lg D)). (Note that the binomial coefficient is less than 22m .) Thus the total time to compute B2m (j/D) is O(m)O(lg m + lg D)O(m(lg m + lg D)) = O(m2 (lg2 D + lg2 m + lg D lg m)). Now we need to compute ζD (1 − 2m) =
D B2m 2m−1 X D χ(j)B2m (j/D) . 4m2 j=1
458
Joshua Holden
Computing the Kronecker symbol χ(j) =
D j
takes time O(lg2 D) which is not
significant compared to computing B2m (j/D). (See, for example, Section 1.4.2 of [5].) The product χ(j)B2m (j/D) needs to be calculated D times, which takes O(Dm3 lg D(lg m + lg D)), and then added up, which is not significant. The rest of the calculations are of about the same running time as one calculation of B2m (j/D), and thus do not contribute. The bit lengths are also of the same order as those we have already calculated, giving us a total computation time of O(Dm2 (lg2 D + lg2 m + lg D lg m)) and a total bit length of O(m(lg m + lg D)). We should add to this the time to put the Bernoulli numbers over a common denominator. Since we are planning to compute all ζD (1 − 2m) for a range 2 ≤ 2m ≤ M , we can start with a small common denominator and update it as we increase m. In fact, since we need a precomputed table of Bernoulli numbers up to BM , it seems to make more sense to precompute them initially over a common denominator for all of them. Then we need only to update the factor of D2m , which involves a multiplication by D2 of each of O(m) numbers of bit size O(m lg m), for a total time of O(m2 lg D lg m), which is of the same order as the above computations. (If we also updated the common denominator of the Bernoulli numbers at each step, we would have a similar time-complexity.) Finally, we conclude that if the time necessary to compute ζD (1 − 2m) is O(Dm2 (lg2 D + lg2 m + lg D lg m)), then the time necessary to compute all ζD (1 − 2m) for a range 2 ≤ 2m ≤ M and then use them to test all primes p such that 3 ≤ p ≤ M + 1 is O(DM 3 (lg2 D + lg2 M + lg D lg M )). Implementing the algorithm using the PARI-GP program (see [2] for more information) and a Sun SPARCstation-10 computer, we have in Table 1 a few examples of actual processor times in hours, minutes, and seconds. These times include an estimated 14.5 seconds used in starting up PARI-GP and loading the programs and Bernoulli numbers from disk. Table 1. Examples of processor times D 5 8 12 13
M running time 1000 55:01.9 1000 1:28:47.7 1000 2:17:34.5 1000 2:43:22.6
We may compare these time estimates to those of others working with generalized notions of irregularity. Ernvall does not seem to have a general algorithm, and his algorithms for special cases do not overlap ours, although they are similar to those of Hao and Parry. Hao and Parry have an algorithm for testing D-irregularity which does not compute zeta-values explicitly. It checks
Irregularity of Prime Numbers over Real Quadratic Fields
459
whether p divides ζD (1−2m) (or pζD (1−δ)) in time O(Dp lg3 p+D lg2 D). Thus the index of D-irregularity for p is calculated in time O(Dp2 lg3 p + Dp lg2 D). To then check all primes p such that 3 ≤ p ≤ M + 1 takes O(DM 2 lg3 M + DM lg2 D)O(M/ lg M ) = O(DM 3 lg2 M +DM 2 lg2 D/ lg M ) by the Prime Number Theorem. This is comparable to our running time in both D and M . Since Hao and Parry’s algorithm does not use Bernoulli numbers, and in fact fundamentally uses only integers, it requires no precomputation and is likely always to be faster in practice. On the other hand, since it does not compute zeta-values it cannot provide any extra information. In particular, it cannot tell whether a higher power of p divides ζD (1 − 2m), an issue we will address in the next section.
3
Predictions and Data
There is a heuristic argument which has been used to predict the fraction of primes with a given index of Q-irregularity. If one assumes that for any prime p the Bernoulli numbers are distributed randomly modulo p (i.e. B2m is divisible by p with probability 1/p), then the probability that the index of irregularity i(p) is equal to k should be 1 (p−3)−k k p−3 1 2 1 2 1− , k p p which approaches
k 1 1 e−1/2 k! 2 as p goes to infinity. (See, e.g., section 5.3 of [20].) The resulting predicted fractions appear to agree well with computer calculations of actual percentages of irregular primes, for example the work of Buhler, Crandall, Ernvall, and Mets¨ ankyl¨ a in [3] for the primes less than 4000000. A similar argument can be given for D-irregular In this case it is primes. , the Kronecker symbol. well-known that ζD (s) = ζ(s)L(s, χ), where χ(j) = D j Thus each ζD (1 − 2m) and each pζD (1 − δ) breaks up into two pieces. The first piece behaves essentially like B2m in terms of distribution modulo p. We will assume that the second piece is independent of the first modulo p, and is also evenly distributed. We would then predict that the probability that p divides the second piece is 1 (p−1)−k k p−1 1 2 1 2 1− , k p p f(k) =
which also goes to f(k). Then the expected probability of p having index of d-irregularity iD (p) equal to k would be X f(i)f(j) , i+j=k
where i, j ≥ 0.
460
Joshua Holden
Since the behavior of the first piece is well-studied for primes up to 4000000, (2) we concentrate on the second piece. We let iD (p) be the number of zeta-values for which the second piece is divisible by p. The PARI-GP program was used to compute the index of D-irregularity for various ranges of primes and various values of D. For D = 5, 8, 12, and 13, the actual numbers and fractions of the (2) 167 primes less than 1000 with iD (p) = k are shown in Table 2. For comparison, Table 3 gives values of f(k) for small k. Table 2. Results for D = 5, 8, 12, 13 and p < 1000 k D=5 Number 0 112 1 43 2 11 3 1 4 0
D=5 D=8 Fraction Number .670659 108 .257485 47 .065868 11 .005988 1 0 0
D = 8 D = 12 Fraction Number .646707 102 .281437 53 .065868 11 .005988 0 0 1
D = 12 D = 13 Fraction Number .610778 103 .317365 44 .065868 17 0 3 .005988 0
D = 13 Fraction .616766 .263473 .101796 .017964 0
Table 3. Values of f(k) k 0 1 2 3 4
f (k) .606531 .303265 .075816 .012636 .001580
(2)
The average number and fraction of the 24 primes less than 100 with iD (p) = k was also calculated, for all D less than 100. Table 4 gives the results. Table 4. Results for D < 100 and p < 100 k Average Number Average Fraction 0 15.3 .637500 1 7.16667 .298611 2 1.23333 .051389 3 0.26667 .011111 4 0.03333 .001389
Irregularity of Prime Numbers over Real Quadratic Fields
461
While these samples are far too small to be conclusive, they seem to indicate some merit to the heuristic argument. One other notable observation is that p2 sometimes divides ζD (1 − 2m) or pζD (1 − δ). This is something which has not yet been observed over Q, and a similar heuristic to the one above predicts that the probability of it drops to 0 as p goes to infinity. However, for discriminants less than 100 we observe 12 total occurrences for primes less than 50, including 2 where p3 divides. In the case of ζ77 (1 − 32), p = 37, we have p dividing both the first piece and the second piece; in the other cases p2 divides the only second piece. For the same discriminants there are only 4 occurrences for primes between 50 and 100; in each of them p2 exactly divides the second piece and p does not divide the first. For D = 5 this phenomenon happens once more in the primes less than 1000, at ζ5 (1 − 216), p = 443, where p2 exactly divides the second piece. For D = 8, at ζ8 (1 − 92), p = 587, and for D = 12, at ζ12 (1 − 520), p = 929, p divides the first and second pieces once each. For D = 13, p2 does not divide any of the zeta-values we have calculated. Once again, the evidence so far seems to agree with the heuristic prediction.
Acknowledgments The author would like to thank David Hayes, for his comments on a draft of this paper; Michael Rosen, for supervising the thesis which was the starting point from which this work diverged; and the Brown Mathematics Department and especially Joseph Silverman for the use of the department’s computing facilities in the preliminary stages of this research. I would also like to thank Masanobu Kaneko for the e-mail exchange which led me to start thinking about faster ways to compute Bernoulli polynomials.
References 1. Eric Bach. The complexity of number-theoretic constants. Information Processing Letters, 62:145–152, 1997. 2. C. Batut, D. Bernardi, H. Cohen, and M. Olivier. User’s Guide to PARI-GP. Laboratoire A2X, Universit´e Bordeaux I, version 1.39 edition, January 14, 1995. ftp://megrez.math.u-bordeaux.fr. 3. J. Buhler, R. Crandall, R. Ernvall, and T. Mets¨ ankyl¨ a. Irregular primes and cyclotomic invariants up to four million. Mathematics of Computation, 59:717–722, 1992. 4. Henri Cohen. Variations sur un th`eme de Siegel et Hecke. Acta Arithmetica, 30:63–93, 1976. 5. Henri Cohen. A Course in Computational Number Theory, volume 138 of Graduate Texts in Mathematics. Springer-Verlag, 1993. 6. Pierre Deligne and Kenneth Ribet. Values of abelian L-functions at negative integers over totally real fields. Inventiones Mathematicae, 59:227–286, 1980. 7. Reijo Ernvall. Generalized Bernoulli numbers, generalized irregular primes, and class number. Annales Universitatis Turkuensis. Series A. I., 178, 1979. 72 pp.
462
Joshua Holden
8. Reijo Ernvall. Generalized irregular primes. Mathematika, 30:67–73, 1983. 9. Reijo Ernvall. A generalization of Herbrand’s theorem. Annales Universitatis Turkuensis. Series A. I., 193, 1989. 15 pp. 10. Sandra Fillebrown. Faster computation of Bernoulli numbers. Journal of Algorithms, 13:431–445, 1992. 11. Ralph Greenberg. A generalization of Kummer’s criterion. Inventiones Mathematicae, 21:247–254, 1973. 12. Fred H. Hao and Charles J. Parry. Generalized Bernoulli numbers and m-regular primes. Mathematics of Computation, 43:273–288, 1984. 13. David Hayes. Brumer elements over a real quadratic field. Expositiones Mathematicae, 8:137–184, 1990. 14. Joshua Holden. On the Fontaine-Mazur Conjecture for Number Fields and an Analogue for Function Fields. PhD thesis, Brown University, 1998. 15. Kenneth Ireland and Michael Rosen. A Classical Introduction to Modern Number Theory, volume 84 of Graduate Texts in Mathematics. Springer-Verlag, second edition, 1990. Second Corrected Printing. 16. Kenneth Ribet. Report on p-adic L-functions over totally real fields. Asterisque, 61:177–192, 1979. 17. Michael Rosen. Remarks on the history of Fermat’s last theorem 1844 to 1984. In Gary Cornell, Joseph H. Silverman, and Glenn Stevens, editors, Modular Forms and Fermat’s Last Theorem, pages 505–525. Springer-Verlag, 1997. 18. Carl Ludwig Siegel. Bernoullische Polynome und quadratische Zahlk¨ orper. Nachrichten der Akademie der Wissenschaften in G¨ ottingen, Mathematischphysikalische Klasse, 2:7–38, 1968. 19. Carl Ludwig Siegel. Berechnung von Zetafunktionen an ganzzahligen Stellen. Nachrichten der Akademie der Wissenschaften in G¨ ottingen, Mathematischphysikalische Klasse, 10:87–102, 1969. 20. Lawrence C. Washington. Introduction to Cyclotomic Fields, volume 83 of Graduate Texts in Mathematics. Springer-Verlag, second edition, 1997. 21. Don Zagier. On the values at negative integers of the zeta-function of a real quadratic field. L’Enseignement Math´ ematique II S´ erie, 22:55–95, 1976.
Experimental Results on Class Groups of Real Quadratic Fields (Extended Abstract) Michael J. Jacobson, Jr.? Technische Universit¨ at Darmstadt, FB Informatik Institut f¨ ur theoretische Informatik Alexanderstr. 10, 64283 Darmstadt, Germany
In an effort to expand the body of numerical data for real quadratic fields, we have computed the class groups and regulators of all real quadratic fields with discriminant ∆ < 109 . We implemented a variation of the group structure algorithm for general finite Abelian groups described in [2] in the C++ programming language using built-in types together with a few routines from the LiDIA system [12]. This algorithm will be described in more detail in a forth-coming paper. The class groups and regulators of all 303963581 real quadratic fields were computed on 20 workstations (SPARC-classics, SPARC-4’s, and SPARCultra’s) by executing the computation for discriminants in intervals of length 105 on single machines and distributing the overall computation using PVM [8]. The entire computation took just under 246 days of CPU time (approximately 3 months real time), an average of 0.07 seconds per field. In this contribution, we present the results of this experiment, including data supporting the truth of Littlewood’s bounds on the function L (1, χ∆ ) [13] and Bach’s bound on the maximum norm of the prime ideals required to generate the class group [1]. Data supporting several of the Cohen-Lenstra heuristics [6,7] is presented, including results on the percentage of non-cyclic odd parts of class groups, percentages of odd parts of class numbers equal to small odd integers, and percentages of class numbers divisible by small primes p. We also give new examples of irregular class groups, including examples for primes p ≤ 23 and one example of a rank 3 5-Sylow subgroup (3 non-cyclic factors), the first example of a real quadratic class group which has a p-Sylow subgroup with rank greater than 2 and p > 3.
1
The L (1, χ ) Function
Much interest has been shown in extreme values of the L (1, χ∆ ) function [3,14,10,4]. A result of Littlewood [13] and Shanks [14] shows that under the Extended Riemann Hyptothesis (ERH) −1
{1 + o(1)} (c1 log log ∆) ?
< L (1, χ∆ ) < {1 + o(1)} c2 log log ∆,
(1)
The author is supported by the Natural Sciences and Engineering Research Council of Canada
J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 463–474, 1998. c Springer-Verlag Berlin Heidelberg 1998
464
Michael J. Jacobson, Jr.
where the values of the constants c1 and c2 depend upon the parity of ∆: c1 = 12eγ /π 2 γ
c1 = 8e /π
2
and and
c2 = 2eγ γ
c2 = e
when 2 6 | ∆ when 2 | ∆ .
For a fixed ∆, Shanks [14] defines the upper and lower Littlewood indices as U LI = L (1, χ∆ ) / (c2 log log ∆)
(2)
LLI = L (1, χ∆ ) c1 log log ∆ .
(3)
If (1) is true, then as ∆ increases, we would expect that extreme values of the U LI and LLI would tend to approach 1. A U LI value greater than 1 or LLI value less than 1 would probably indicate a violation of the ERH [14]. Following [11] and [14], we define the function Y p . (4) L∆ (1) = 4∆ p − p p prime Note that this function is essentially L (1, χ∆ ) with the 2-factor divided out, i.e., if ∆ ≡ 0 (mod 4) L (1, χ∆ ) L∆ (1) = (1/2)L (1, χ∆ ) if ∆ ≡ 1 (mod 8) (3/2)L (1, χ∆ ) if ∆ ≡ 5 (mod 8) . Since the 2-factor is determined by the congruence class of ∆ modulo 8, dividing it out allows us to compare the quadratic residuosity of all discriminants regardless of their congruence modulo 8. In [14], Shanks derives bounds for L∆ (1) analogous to (1) (also under ERH) {1 + o(1)}
8 log log 4∆ π2
−1 < L∆ (1) < {1 + o(1)} eγ log log 4∆,
(5)
and the corresponding indices U LI∆ = L∆ (1)/ (eγ log log 4∆) LLI∆ = L∆ (1) π82 log log 4∆ .
(6) (7)
If (5) is true, then as ∆ increases, we would also expect the extreme values of the U LI∆ and LLI∆ to approach 1. We have recorded the successive L (1, χ∆ ) maxima and minima for even ∆, ∆ ≡ 1 (mod 8), and ∆ ≡ 5 (mod 8) where ∆ < 109 , together with U LI values and L∆ (1) and U LI∆ values where appropriate. The maximum L (1, χ∆ ) value found was 7.07046680 . . . (U LI = 0.65623747 . . .) for ∆ = 872479969 and the maximum L∆ (1) value was 3.74995980 . . . (U LI∆ = 0.68501570 . . .) for ∆ = 612380869. The minimum L (1, χ∆ ) value found was 0.18948336 . . .
Experimental Results on Class Groups of Real Quadratic Fields
465
(LLI = 1.12478715 . . .) for ∆ = 5417453 and the minimum L∆ (1) value was 0.27822361 . . . (LLI∆ = 1.20515814) for ∆ = 133171673. We found no surprises here — the L (1, χ∆ ) , U LI, and LLI values seem to behave similarly to those of imaginary quadratic fields [3] and correspond to previous observations [10]. At first glance, it may appear that the LLI value for ∆ = 1592 indicates a violation of the ERH. However, as discussed by Shanks in [14], the apparent violation can almost certainly be accounted for by the o(1) term in (1), since this discriminant is so small. In [4], Buell also looked at the mean values of L (1, χ∆ ) for imaginary quadratic fields of both even and odd discriminant. His computations suggest that the mean value of L (1, χ∆ ) is approximately 1.186390 for even discriminants and 1.581853 for odd discriminants. Our computations show that the same mean values probably hold for real quadratic fields. We have computed a mean value of 1.18639 for the even discriminants less than 109 , and our computed value of 1.58154 for odd discriminants less than 109 is close to Buell’s value. We suppose that the difference can be accounted for by the fact that Buell has considered over twice as many fields as we have (|∆| < 2.2 × 109 ). Indeed, it seems that at ∆ ≈ 109 the mean value of L (1, χ∆ ) is still slowly approaching Buell’s value in our case.
2
Odd Parts of Class Numbers
√ ∗ Let Cl∆ be the odd part of the class group of Q( ∆). Cohen and Lenstra [6,7] ∗ . For example, if we provide some heuristics on the distribution of various Cl∆ define Y 1 , (8) w(n) = α (1 − p−1 )(1 − p−2 ) . . . (1 − p−α ) p α p kn
η∞ (p) = C∞ =
∞ Y (1 − p−i ) (η∞ (2) = 0.288788095 . . .), i=1 ∞ Y
(9)
ζ(j + 1) = 2.294856589 . . .,
(10)
1 = 0.754458173 . . . , 2η∞(2)C∞
(11)
j=1
C=
∗ | is equal to k is the probability that h∗∆ = |Cl∆
Cw(k) . (12) k This gives us Prob(h∗∆ = 1) = 0.754458173 . . . , Prob(h∗∆ = 3) = 0.125743028 . . ., and Prob(h∗∆ = 5) = 0.037722908 . . . for the first few small values of k. Using this heuristic assumption, Lukes, Williams, and the author were also able to derive [10] log x 1 ∗ +O , (13) Prob(h∆ > x) = 2x x2 Prob(h∗∆ = k) =
466
Michael J. Jacobson, Jr.
a generalization of a conjecture of Hooley for prime discriminants [9] and k+1=
1 2
1 1 − Prob(h∗∆ ≤ k)
+O
log k k2
,
(14)
which can be used to test the validity of (13). √ We have used our computation of all class groups of fields Q( ∆) where ∆ < 109 to extend the numerical evidence supporting (12) and (13) presented in [10]. Define qi (x) to be the observed ratio of odd discriminants less than x with h∗∆ = i divided by the conjectured asymptotic probability given by (12). Similarly, we define si (x) to be the observed ratio of odd discriminants less than x with h∗∆ ≤ i and 1 1 . ti (x) = 2 1 − si (x) Tables 1 and 2 contain values of qi (x) and ti (x) for various i and x for ∆ ≡ 1 (mod 4), ∆ < 109 . If (12) is correct, we would expect the values in Tab. 1 (qi (x) values) to approach 1 for each value of i as x increases. Similarly, if (13) is correct, by (14) we would expect the values in Tab. 2 (ti (x) values) to approach i + 1 for each value of i as x increases. As observed in [10], this does appear to happen in both cases. Our extended computation also supports this, although the convergence is still rather slow. The corresponding tables for even discriminants are so similar that in the interest of brevity we do not include them here.
3
Divisibility of h by Odd Primes
Another heuristic presented in [6,7] is the probability that h∆ is divisible by an odd prime p is given by Prob(p | h∆ ) = 1 −
η∞ (p) , 1 − p−1
(15)
where η∞(p) is defined in (9). For example, Prob(3 | h∆ ) = 0.159810883 . . . , Prob(5 | h∆ ) = 0.049584005 . . . , and Prob(7 | h∆ ) = 0.023738691 . . . for the first few small odd primes. Define pp (x) to be the observed ratio of odd discriminants less than x with p | h∆ divided by the conjectured asymptotic probability given by (15). As x increases, we expect pp (x) to approach 1 for a specific odd prime p. In Tab. 3 we provide values of pp (x) for various p and x for ∆ ≡ 1 (mod 4), ∆ < 109 . Unlike the case in imaginary fields [4], the values of pp (x) seem to approach 1 fairly smoothly from below. The corresponding table for ∆ ≡ 0 (mod 4) is very similar and hence not included here.
Experimental Results on Class Groups of Real Quadratic Fields
4
467
Non-cyclic p-Sylow Subgroups
∗ As above, let Cl∆ be the odd part of Cl∆ . Then, under the heuristic assumptions ∗ is cyclic, namely in [6,7] one can easily derive the probability that Cl∆
Y
∗ cyclic) = C Prob(Cl∆
p odd prime
p3 − p2 + 1 = 0.997630528 . . . (p − 1)(p2 − 1)
(16)
where C is given by (11). Define c(x) to be the observed ratio of odd (or even) ∗ cyclic divided by the conjectured asymptotic discriminants less than x with Cl∆ probability given by (16). This function should approach 1 as x increases if (16) is true. Table 4 provides values of c(x) for various values of x and both even and odd ∆. The total number of fields with discriminant less than x and the number of ∗ are also listed for even and odd ∆. As expected, the values of non-cyclic Cl∆ c(x) appear to approach 1 in both cases. For an odd prime p, define the p-rank of Cl∆ to be the number of non-cyclic factors of the p-Sylow subgroup of Cl∆ . Yet another heuristic of Cohen and Lenstra [6,7] states that the probability that the p-rank of Cl∆ is equal to r is given by Prob(p-rank of Cl∆ = r) =
η∞ (p) Q . pr(r+1) (1 − p−(r+1) ) 1≤k≤r (1 − p−k )2
(17)
For example, Prob(3-rank of Cl∆ = 2) = 0.002272146 . . . , Prob(3-rank of Cl∆ = 3) = 0.000003277 . . ., and Prob(5-rank of Cl∆ = 2) = 0.000083166 . . . . Define prp,r (x) to be the observed ratio of odd discriminants less than x with p-rank = r divided by the conjectured asymptotic probability given by (17). As x increases, we expect prp,r (x) to approach 1 for a specific odd prime p and p-rank r if (17) is true. In Tab. 5 we provide values of prp,r (x) for various values of p, r, and x for ∆ ≡ 1 (mod 4), ∆ < 109 . These values do seem to approach 1, but due to the scarcity of examples the convergence is extremely slow, especially for pr3,3 (x). The corresponding table for ∆ ≡ 0 (mod 4) is very similar and hence not included here.
5
First Occurrences of Non-cyclic p-Sylow Subgroups
Following Buell [4], we list the total number and first occurrences of discriminants for which the p-Sylow subgroup is non-cyclic for various primes p. For the prime 2, we consider only the principal genus (the subgroup of squares) instead of the whole class group, since much of the information on the 2-Sylow subgroup of Cl∆ is easily obtainable from the factorization of ∆. In Tab. 6 and 7 we present those discriminants for which the p-Sylow subgroup has rank 2, and in particular has the structure C(pe1 ) × C(pe2 ). Table 6
468
Michael J. Jacobson, Jr.
contains the data corresponding to the principal genus, and Tab. 7 contains data for odd primes p. In both tables, we list the smallest discriminant and the total number of discriminants ∆ < 109 whose class groups contain the specified p-Sylow subgroup, odd and even discriminants being tabulated separately. We have found class groups with non-cyclic p-Sylow subgroups for primes p ≤ 23. There are obviously not as many examples as in the case of imaginary fields [4], as one would expect from the heuristic p-rank probabilities derived by Cohen and Lenstra [6,7]. In Tab. 8 and 9 we present the corresponding data for class groups with pSylow subgroups of rank 3, i.e., having structure C(pe1 ) × C(pe2 ) × C(pe3 ). Once again, we consider the 2-Sylow subgroup of the principal genus, not Cl∆ . Again, we have significantly fewer examples as in the case of imaginary fields [4], and no examples with rank greater than 3. However, the discriminant 999790597 has class group isomorphic to C(5) × C(5) × C(40) and is believed to be the only discriminant known with p-rank greater than 2 for an odd prime p > 3. The smallest discriminants and total number of discriminants ∆ < 109 whose class groups contain 2 non-cyclic p-Sylow subgroups are presented in Tab. 10. When one examines the probability that the p-Sylow subgroup is non-cyclic presented in the last section, it is easy to see why so few examples of fields with doubly non-cyclic class groups were found.
6
The Number of Generators Required
In [1], Bach gives a theorem which states that under the ERH, the prime ideals of norm 6 log2 ∆ are sufficient to generate the class group. In practice, it has been observed that this bound does not seem to be tight, i.e., fewer generators are sufficient [5]. During the course of our computation, we have kept track of the maximum norm of the prime ideals required to generate the class 9 group √ of each discriminant ∆ < 10 . Of all 303963581 fields considered, the field Q( 519895977) required the prime ideal with largest norm to construct a full generating system, namely 197. For a specific ∆, define maxp (∆) to be the √ largest norm of the prime ideals required to generate the class group of Q( ∆). If Bach’s theorem is true, we would expect that maxp (∆)/ log2 ∆ should always be less than 6. For ∆ < 109 , this is in fact the case, and indeed if we exclude the very smallest discriminants (like ∆ = 5), the maximum value obtained for this ratio is 0.55885 . . . for ∆ = 519895977. As one would expect due to the high probability of cyclic odd parts of class groups, the average value of this ratio is significantly less than 6 — for ∆ < 109 we have obtained a value of 0.01984 . . .. It has been conjectured [5] that a tighter bound of the form c log1+ ∆ for any > 0 may hold in this case. Hence, in order to get an idea of the order of magnitude of the constant c, we also considered the ratio maxp (∆)/ log ∆. For ∆ < 109 , the largest value we obtained was 9.81607 . . . for the discriminant 519895977 and the average value was 0.38982 . . . .
Experimental Results on Class Groups of Real Quadratic Fields
A
Appendix
Table 1. Values of qi (x) for ∆ ≡ 1 (mod 4). x 1000000 10000000 20000000 30000000 40000000 50000000 60000000 70000000 80000000 90000000 100000000 200000000 300000000 400000000 500000000 600000000 700000000 800000000 900000000 1000000000
q1 (x) 1.06119 1.03676 1.03178 1.02923 1.02752 1.02634 1.02541 1.02461 1.02389 1.02333 1.02284 1.01994 1.01839 1.01739 1.01662 1.01604 1.01558 1.01515 1.01480 1.01449
q3 (x) 0.85263 0.89604 0.90683 0.91246 0.91613 0.91893 0.92078 0.92235 0.92374 0.92480 0.92605 0.93304 0.93699 0.93972 0.94173 0.94313 0.94444 0.94556 0.94654 0.94739
q5 (x) 0.95644 0.99125 0.99465 0.99592 0.99663 0.99664 0.99588 0.99632 0.99637 0.99702 0.99695 0.99698 0.99776 0.99796 0.99830 0.99839 0.99867 0.99887 0.99903 0.99925
q7 (x) 0.94918 0.99564 1.00142 1.00250 1.00194 1.00315 1.00446 1.00504 1.00623 1.00608 1.00581 1.00554 1.00567 1.00537 1.00476 1.00498 1.00457 1.00473 1.00491 1.00484
q9 (x) 0.70424 0.83023 0.84625 0.85705 0.86264 0.86638 0.87092 0.87567 0.87874 0.88182 0.88409 0.89658 0.90286 0.90680 0.91021 0.91269 0.91438 0.91642 0.91783 0.91907
q11 (x) 0.90228 0.97519 0.98812 0.99247 0.99791 0.99846 0.99982 1.00148 1.00372 1.00418 1.00528 1.00676 1.00637 1.00635 1.00679 1.00725 1.00665 1.00663 1.00686 1.00690
q27 (x) 0.47347 0.69086 0.74718 0.76587 0.78753 0.79660 0.80705 0.81494 0.82014 0.82863 0.83205 0.86198 0.87221 0.87994 0.88370 0.88921 0.89328 0.89664 0.89807 0.90041
469
470
Michael J. Jacobson, Jr.
Table 2. Values of ti (x) for ∆ ≡ 1 (mod 4). x 1000000 10000000 20000000 30000000 40000000 50000000 60000000 70000000 80000000 90000000 100000000 200000000 300000000 400000000 500000000 600000000 700000000 800000000 900000000 1000000000
t1 (x) 2.50786 2.29561 2.25667 2.23723 2.22443 2.21560 2.20874 2.20287 2.19765 2.19354 2.18998 2.16921 2.15828 2.15123 2.14593 2.14189 2.13868 2.13575 2.13331 2.13123
t3 (x) 5.42530 4.75574 4.64952 4.59746 4.56286 4.54032 4.52115 4.50462 4.48987 4.47810 4.46955 4.41793 4.39186 4.37589 4.36363 4.35361 4.34659 4.33986 4.33435 4.32982
t5 (x) 8.91565 7.38079 7.14116 7.02378 6.94593 6.89384 6.84708 6.81076 6.77728 6.75275 6.73308 6.61671 6.56095 6.52601 6.49985 6.47794 6.46330 6.44904 6.43741 6.42809
t7 (x) 12.81041 10.02841 9.61024 9.40226 9.26159 9.17287 9.09414 9.03188 8.97656 8.93314 8.89798 8.69513 8.59942 8.53874 8.49242 8.45561 8.42964 8.40581 8.38650 8.37053
t9 (x) 17.88166 13.58368 12.91103 12.59204 12.36781 12.22767 12.10904 12.02043 11.93639 11.87337 11.82131 11.51779 11.37594 11.28573 11.21844 11.16400 11.12534 11.09178 11.06361 11.04056
t11 (x) 22.96408 16.60010 15.64977 15.19731 14.88841 14.68742 14.52054 14.39801 14.28389 14.19501 14.12367 13.69636 13.49526 13.36847 13.27520 13.20015 13.14471 13.09783 13.05911 13.02709
t27 (x) 109.65097 55.01249 48.43620 45.43814 43.53718 42.23651 41.36938 40.61104 39.94645 39.48465 39.02412 36.62777 35.52398 34.83250 34.32766 33.93718 33.63894 33.39694 33.19010 33.01701
Table 3. Values of pp (x) for ∆ ≡ 1 (mod 4). x 1000000 10000000 20000000 30000000 40000000 50000000 60000000 70000000 80000000 90000000 100000000 200000000 300000000 400000000 500000000 600000000 700000000 800000000 900000000 1000000000
p3 (x) 0.79263 0.86203 0.87781 0.88602 0.89166 0.89565 0.89875 0.90138 0.90359 0.90548 0.90723 0.91756 0.92311 0.92681 0.92958 0.93165 0.93340 0.93494 0.93627 0.93736
p5 (x) 0.85146 0.92211 0.93583 0.94186 0.94644 0.94941 0.95110 0.95319 0.95474 0.95692 0.95769 0.96437 0.96849 0.97071 0.97254 0.97381 0.97503 0.97597 0.97676 0.97748
p7 (x) 0.81554 0.90990 0.92884 0.93593 0.93995 0.94450 0.94890 0.95187 0.95493 0.95641 0.95775 0.96503 0.96937 0.97204 0.97360 0.97533 0.97625 0.97733 0.97837 0.97896
p11 (x) 0.75676 0.87157 0.89645 0.90931 0.91964 0.92469 0.92865 0.93207 0.93650 0.93841 0.94103 0.95327 0.95884 0.96205 0.96488 0.96747 0.96881 0.97022 0.97156 0.97250
p13 (x) 0.78022 0.88008 0.90250 0.91832 0.92549 0.92899 0.93287 0.93397 0.93619 0.93792 0.93852 0.95101 0.95706 0.95890 0.96211 0.96460 0.96618 0.96828 0.96979 0.97123
p17 (x) 0.64981 0.83734 0.86862 0.88125 0.89038 0.89572 0.90293 0.90733 0.91241 0.91248 0.91572 0.93335 0.94071 0.94696 0.95253 0.95654 0.95852 0.96099 0.96284 0.96415
p19 (x) 0.64482 0.82371 0.85824 0.87796 0.88702 0.89286 0.90142 0.90522 0.91046 0.91254 0.91663 0.93290 0.94133 0.94566 0.94938 0.95244 0.95441 0.95553 0.95672 0.95859
Experimental Results on Class Groups of Real Quadratic Fields
Table 4. Number of non-cyclic odd parts of class groups. x 1000000 10000000 20000000 30000000 40000000 50000000 60000000 70000000 80000000 90000000 100000000 200000000 300000000 400000000 500000000 600000000 700000000 800000000 900000000 1000000000
∆ ≡ 0 (mod 4) total non-cyclic c(x) 101322 50 1.00188 1013213 919 1.00147 2026421 2129 1.00132 3039631 3385 1.00126 4052850 4733 1.00120 5066064 6108 1.00117 6079270 7595 1.00112 7092461 9048 1.00110 8105723 10519 1.00107 9118933 12028 1.00105 10132112 13508 1.00104 20264226 28941 1.00094 30396405 44996 1.00089 40528481 61286 1.00086 50660585 78144 1.00083 60792730 94989 1.00081 70924833 112001 1.00079 81056948 129369 1.00078 91189082 146508 1.00076 101321191 164246 1.00075
∆ ≡ 1 (mod 4) total non-cyclic c(x) 202635 114 1.00181 2026440 2088 1.00134 4052851 4627 1.00123 6079260 7365 1.00116 8105666 10137 1.00112 10132117 13008 1.00109 12158544 15999 1.00106 14184949 19007 1.00103 16211387 22000 1.00101 18237802 25091 1.00100 20264212 28150 1.00098 40528477 60347 1.00088 60792687 93517 1.00083 81056963 127467 1.00080 101321188 161867 1.00077 121585380 197074 1.00075 141849691 232554 1.00073 162113906 267801 1.00072 182378148 303469 1.00071 202642390 339554 1.00070
Table 5. Values of prp,r (x) for ∆ ≡ 1 (mod 4). x 1000000 10000000 20000000 30000000 40000000 50000000 60000000 70000000 80000000 90000000 100000000 200000000 300000000 400000000 500000000 600000000 700000000 800000000 900000000 1000000000
pr3,2 (x) 0.24109 0.43263 0.47803 0.50844 0.52500 0.53875 0.55223 0.56236 0.56963 0.57777 0.58360 0.62506 0.64581 0.66070 0.67127 0.68148 0.68942 0.69479 0.70012 0.70513
pr3,3 (x) 0.00000 0.00000 0.00000 0.00000 0.03765 0.06023 0.05019 0.06454 0.05647 0.08366 0.09035 0.13552 0.15058 0.14682 0.18973 0.19325 0.21082 0.20893 0.20747 0.20780
pr5,2 (x) pr7,2 (x) pr11,2 (x) pr13,2 (x) 0.17802 0.00000 0.00000 0.00000 0.49842 0.58526 0.00000 0.00000 0.61116 0.60965 0.00000 0.00000 0.62303 0.61778 0.26275 0.00000 0.64825 0.59745 0.59119 0.54645 0.67525 0.55600 0.63060 0.43716 0.69423 0.53649 0.52550 0.36430 0.70186 0.58526 0.45043 0.31225 0.70610 0.61574 0.49265 0.54645 0.70742 0.64487 0.43791 0.48573 0.71026 0.63891 0.55177 0.65574 0.78116 0.70231 0.63060 0.54645 0.80539 0.75434 0.68315 0.58288 0.81291 0.76815 0.68971 0.54645 0.82311 0.79498 0.69366 0.56830 0.83002 0.78604 0.70942 0.51002 0.83817 0.79985 0.74320 0.43716 0.84376 0.80290 0.73898 0.40983 0.84409 0.81069 0.73570 0.46144 0.84898 0.81936 0.74883 0.56830
471
472
Michael J. Jacobson, Jr.
Table 6. Non-cyclic rank 2 2-Sylow subgroups. e1 1 2 2 3 3 3 4 4 4 4 5 5 5 6 6 6 7 7 8 9
e2 1 1 2 1 2 3 1 2 3 4 1 2 3 1 2 3 1 2 1 1
first odd ∆ # odd ∆ first even ∆ # 26245 625278 12104 134249 233132 69064 1717505 8914 1781004 563545 57267 796552 2044369 3267 5324556 22325605 111 34560024 1397321 13789 1542748 8443681 742 19369756 48365305 34 103252696 * * 683376268 7182401 3053 10562504 82670065 138 107723544 327805705 3 522315292 18727689 603 31610632 256055305 13 592435596 938900353 1 887803144 64209289 73 187432072 351270505 4 * 216442945 6 325080904 438986305 1 *
even ∆ 437912 164617 6132 39791 2138 82 9535 496 18 1 2091 83 2 353 9 1 35 * 2 *
Table 7. Non-cyclic rank 2 p-Sylow subgroups. p 3 3 3 3 3 3 3 3 3 3 5 5 5 7 7 11 13 13 17 19 23
e1 e2 1 1 2 1 2 2 3 1 3 2 3 3 4 1 4 2 5 1 6 1 1 1 2 1 3 1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1
first odd ∆ # odd ∆ first even ∆ # even ∆ 32009 279754 94636 135945 255973 39982 626264 19100 8739521 313 25725176 147 2178049 4184 1559644 1771 49831633 33 82435336 15 395659153 1 * * 4822921 381 51236956 115 * * 793667548 1 125609177 13 412252408 2 604420177 2 * * 244641 13691 1277996 6929 3874801 605 52929592 220 225225057 12 569204156 2 1633285 1652 3626536 799 30883361 28 96847468 8 26967253 95 81903208 54 39186673 25 41912572 14 900384041 1 * * 810413473 1 361880744 3 65028097 4 * * 763945277 1 * *
Experimental Results on Class Groups of Real Quadratic Fields
Table 8. Non-cyclic rank 3 2-Sylow subgroups. e1 1 2 2 3 3 4 4 5 6
e2 1 1 2 1 2 1 2 1 1
e3 first odd ∆ # odd ∆ first even ∆ # even ∆ 1 5764805 1409 12490568 879 1 17737705 620 38922248 396 1 110255245 32 270453068 22 1 100282145 133 87572168 80 1 230818741 8 155979976 7 1 154877545 27 37970248 12 1 689289745 1 387642264 2 1 499871221 7 216461884 4 1 * * 708776776 1
Table 9. Non-cyclic rank 3 p-Sylow subgroups. p 3 3 3 5
e1 e2 1 1 2 1 3 1 1 1
e3 first odd ∆ # odd ∆ first even ∆ # even ∆ 1 39345017 122 66567068 44 1 88215377 15 157753592 10 1 545184113 1 * * 1 999790597 1 * *
Table 10. Doubly non-cyclic p-Sylow subgroups. p1 2 2 2 3 3
p1 first odd ∆ # odd ∆ first even ∆ # even ∆ 3 10876805 1299 9622408 908 5 66376409 43 200600008 20 7 230181505 3 630353080 1 5 57586597 15 492371864 4 7 204242449 3 * *
473
474
Michael J. Jacobson, Jr.
References 1. E. Bach. Explicit bounds for primality testing and related problems. Math. Comp., 55(191):355–380, 1990. 2. J. Buchmann, M.J. Jacobson, Jr., and E. Teske. On some computational problems in finite abelian groups. Math. Comp., 66(220):1663–1687, 1997. 3. D.A. Buell. Small class numbers and extreme values of L-functions of quadratic fields. Math. Comp., 31(139):786–796, 1977. 4. D.A. Buell. The last exhaustive computation of class groups of complex quadratic number fields. To appear in Number Theory: Fifth Conference of the Canadian Number Theory Association, 1996. 5. H. Cohen. A Course in Computational Algebraic Number Theory. Springer-Verlag, Berlin, 1993. 6. H. Cohen and H.W. Lenstra, Jr. Heuristics on class groups of number fields. In Number Theory, Lecture notes in Math., volume 1068, pages 33–62. SpringerVerlag, New York, 1983. 7. H. Cohen and H.W. Lenstra, Jr. Heuristics on class groups. In Number Theory (Noordwijkerhout, 1983), Lecture Notes in Math., volume 1052, pages 26–36. Springer-Verlag, New York, 1984. 8. A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam. PVM: Parallel Virtual Machine - A User’s Guide and Tutorial for Networked Parallel Computing. MIT Press, Cambridge, Mass., 1994. 9. C. Hooley. On the Pellian equation and the class number of indefinite binary quadratic forms. J. reine angew. Math., 353:98–131, 1984. 10. M.J. Jacobson, Jr., R.F. Lukes, and H.C. Williams. An investigation of bounds for the regulator of quadratic fields. Experimental Mathematics, 4(3):211–225, 1995. 11. D.H. Lehmer, E. Lehmer, and D. Shanks. Integer sequences having prescribed quadratic character. Math. Comp., 24(110):433–451, 1970. 12. LiDIA. http://www.informatik.tu-darmstadt.de/TI/LiDIA, 1997. √ 13. J.E. Littlewood. On the class number of the corpus P ( − k). Proc. London Math. Soc., 27:358–372, 1928. 14. D. Shanks. Systematic examination of Littlewood’s bounds on L(1, χ). In Proc. Sympos. Pure Math, pages 267–283. AMS, Providence, R.I., 1973.
Computation of Relative Class Numbers of Imaginary Cyclic Fields of 2-Power Degrees St´ephane Louboutin Universit´e de Caen, UFR Sciences D´epartement de Math´ematiques 14032 Caen cedex, France [email protected] Abstract. In this abridged version of [Lou], we outline an efficient technique for computing relative class numbers of imaginary Abelian fields. It enables us to compute relative class numbers of imaginary cyclic fields of degrees 32 and conductors greater than 1013 , or of degrees 4 and conductors greater than 1015 . Our major innovation is a technique for computing numerically root numbers appearing in some functional equations. Mathematics Subject Classification: Primary 11R20, 11R29, 11Y40; Secondary 11M20, 11R42. Keywords: Imaginary Abelian number field, relative class number.
1
Introduction
Proposition 1. (See [Wa]). Let N be an imaginary cyclic field of prime conductor p and 2-power degree [N : Q] = n = 2r ≥ 4. Set ζn = exp(2πi/n). Then, p ≡ n + 1 (mod 2n) and for any prime p ≡ n + 1 (mod 2n) there exists exactly one imaginary cyclic field of conductor p and degree n, to be denoted by Np . We then let h− p denote the relative class number of Np and χp denote any one of the n/2 characters of order n modulo p. We also set − = {χkp ; 1 ≤ k ≤ n and k odd}. Moreover, Xn,p 1. If p = n + 1 then Np = Q(ζp ). 2. If p > n + 1 then h− p = and
2 2n/2
1X = xχp (x) = p x=1 p−1
B1,χp
NQ(ζn )/Q (B1,χp ), X
(1)
(n/2)−1
def
ak ζnk ∈ Z[ζn ]
(2)
k=0
is an algebraic integer of Q(ζn ) where each ak =
n−1 2 2 X −ik TrQ(ζn )/Q (ζn−k B1,χp ) = ζ B1,χip n n i=1 n i odd
is a rational integer which satisfies |ak | ≤
1 √ 2π p(log p
J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 475–481, 1998. c Springer-Verlag Berlin Heidelberg 1998
+ 2) (use (8)).
(3)
476
St´ephane Louboutin
Remark 1. 1. h− p is always odd (use [Wa, Theorem 10.4 (b)]). If one of these ak ’s were even then all of them would be even, 2n/2 would divide NQ(ζn )/Q (B1,χp ) and h− p would be even, a contradiction. Therefore, all these ak ’s are odd. n 2. According to the Brauer-Siegel theorem log h− p which is asymptotic to 4 log p when p goes to infinity is usually a very large integer. 3. Let n = 2r ≥ 2 be a given 2-power and p ≡ 1 (mod n) be an odd prime. Set a np = min{a ≥ 1; a(p−1)/2 = ( ) = −1 (Legendre’s symbol)}. p We shall choose χp to denote the character modulo p well defined by def
χp (np ) = ζn = exp(2πi/n). (p−1)/n
(modulo p), we have the following efficient technique Setting mp = np for computing the values of χp : χp (x) = ζ kx where kx = min{k ∈ {0, 1, 2, · · ·, n − 1}; x(p−1)/n ≡ mkp
(mod p)}.
Note that χp is odd if and only if p ≡ 1 + n (mod 2n). . In Roughly speaking, using (2) we have to do O(p) operations to compute h− √ p this paper will reduce this amount of required computation down to O( p log p) elementary operations. The idea is to compute good approximations of all the B1,χ ’s, to use (3) and to use the fact that all the ak ’s must be rational integers to deduce their exact values from their good enough numerical approximations. We finally compute the exact value of h− p from these ak ’s (here of course, we need to work with large precision arithmetic on integers): setting Sp (r) = B1,χp ∈ Z[ζn ] and Sp (i) = NQ(ζ2i+1 )/Q(ζ2i ) (Sp (i + 1))
=
2i−1 X−1
ai (j)ζ2ji ∈ Z[ζ2i ] (1 ≤ i ≤ r − 1),
j=0
we can write
Sp (i) =
2i−1 X−1
2
ai (2j)ζ2ji − ζ2i
j=0
=
i 2X −1
j=0
2i−1 X−1
Aj ζ2ji =
j=0
2i−1 X−1 j=0
ai (j)ζ2ji
2 ai (2j + 1)ζ2ji
Relative Class Numbers
477
where A0 = (ai (0))2 , A2i −1 = −(ai (2i − 1))2 , min(j,2i−1 −1)
X
Aj =
ai (2k)ai (2j − 2k)
k=max(0,j−(2i−1 −1))
min(j−1,2i−1 −1)
X
−
ai (2k + 1)ai (2j − 2k − 1)
k=max(0,j−2i−1 )
(for 1 ≤ j ≤ 2i − 2) and ai−1 (j) = Aj − Aj+2i−1 (for 0 ≤ j ≤ 2i−1 − 1), which enables us to compute the exact value of the positive integer Sp (1) = NQ(ζn )/Q (B1,χp ) = 2(n/2)−1h− p.
2
Numerical Computation of B1;
Let χ be an odd primitive Dirichlet character of conductor f and order m. We set f−1 X χ(a)e2aπi/f , τχ = a=1 1 τχ and χ = √ i
f
g(x, χ) =
X
nχ(n)e−πn
2
x/f
(x > 0).
(4)
n≥1
It is known that χ has absolute value equal to one and that ¯ = χ x3/2 g(x, χ) (x > 0) g(1/x, χ) = χ x3/2 g(x, χ)
(5)
and we now express B1,χ = −L(0, χ) as the limit of a rapidly absolutely convergent series: √ X χ(n) 2 f X χ(n) ¯ χ e−πn /f + F (πn2 /f) . (6) B1,χ = − π n n n≥1
n≥1
Z
where F (X) = X
1
∞
dx e−Xx √ ≤ e−X . x
In particular, (6) is a rapidly absolutely convergent series which can be used to compute numerical approximations of values of Dirichlet generalized Bernoulli’s numbers B1,χ (see below). However, since there is no known general formula for Gauss sums (see [BE]), we will compute χ numerically: we will use (5)
478
St´ephane Louboutin
to verify that g(1, χ) 6= 0 and to compute good approximations of all χ = − (see Theorem 1). Second, we will use the rapidly g(1, χ)/g(1, χ)’s for χ ∈ Xn,p absolutely convergent series (6) to obtain good enough approximations of all B1,χ ’s to use (3) to deduce the exact values of the ak ’s. We set r B(t, M, f) =
f f (t log( ) + M ) π π
(7)
and we will replace various infinite sums similar to (6) by sums up to the least integer greater than or equal to B(t, M, f) where t and M will be be suitably chosen. Note that n ≥ B(t, M, f) implies 0 ≤ F (πn2 /f) ≤ e−πn
2
/f
≤ (π/f)t e−M .
Roughly speaking, we will prove first that we need compute only B(1, M, f) terms in (4) to compute χ with an error not exceeding e−M (see Theorem 1), second that we need compute only B( 32 + , M, f) terms in (6) (where χ is replaced by its just computed approximation) to compute B1,χ with an error not exceeding e−M (see Theorem 2), and third we will show that this enables us to compute the exact values of the coordinates of the algebraic integer B1,χ in the canonical Z-basis of the ring of algebraic integers Z(ζm ) of the cyclotomic field Q(ζm ). Theorem 1. 1. Set g = g(1, χ) and
gm =
m X
nχ(n)e−πn
2
/f
.
n=1
Then m ≥ B(1, M, f) and |gm | > g 6= 0, and
1 −M 2e
imply |g − gm | ≤ 12 e−M , gm 6= 0,
gm | ≤ e−M /|gm |. |χ − gm /¯ 2. Whenever p is an odd prime let gp ≥ 2 denote the least primitive root modulo p and let χp be the odd character modulo p defined by χp (gp ) = exp(2πi/(p − 1)). Hence, the χkp ’s with 1 ≤ k < p and k odd are the (p−1)/2 odd characters modulo p. Choosing M = 25 and letting χkp range over the 2867583 odd characters for the 1228 odd primes p ≤ 10000 we get Table 1 of the ten least values of |g(1, χkp )| (with 1 ≤ k ≤ (p − 1)/2 and k odd) according to which we have g(1, χ) 6= 0 for the 2867583 odd characters modulo any prime p ≤ 10000.
Relative Class Numbers
479
Table 1 p k ord(χkp ) |g(1, χkp )| 6007 1545 2002 0.000450 · · · 6551 2635 1310 0.000391 · · · 8539 3087 2846 0.000389 · · · 9281 1775 1856 0.000254 · · · 9661 4397 9660 0.000205 · · · 3061 143 3060 0.000196 · · · 8849 3251 8848 0.000113 · · · 9643 4635 3214 0.000090 · · · 3373 615 1124 0.000080 · · · 2803 1337 2802 0.000005 · · · Here, we let ord(χkp ) = (p − 1)/ gcd(p − 1, k) denote the order of χkp . 3. Point 2 makes it reasonable to put forward the following Conjecture: for any primitive odd Dirichlet character χ (of conductor f) we have g(1, χ) 6= 0. We finally explain how to use (6) to compute as good as desired numerical approximations of B1,χ , provided that g(1, χ) is not equal to zero. Theorem 2. Let χ be an odd primitive Dirichlet character modulo f. Then, |B1,χ | ≤ Set
√
f B1,χ (m) = − π
1 p f (log f + 2). 2π
(8)
! m m X X χ(n) ¯ χ(n) −πn2 /f 2 χ e F (πn /f) . + n n n=1 n=1
Then m ≥ B( 12 , M, f) implies |B1,χ − B1,χ (m)| ≤ √
2e−M . π(log(f/π) + 2M )
(9)
Now, assume gm 6= 0 and set √
˜1,χ (m) = − f B π
! m m X 2 gm X χ(n) ¯ χ(n) e−πn /f + F (πn2 /f) g¯m n=1 n n n=1
and t=
3 3 M + log log f) + = + o(1). 2 log(f/π) 2
˜1,χ (m)| ≤ 2e−M . Then m ≥ B(t, M, f) and |gm | ≥ 12 e−M imply |B1,χ − B
(10)
480
3
St´ephane Louboutin
Numerical Computation of hp
− Now, we use Theorem 1 to verify that for all the χ ∈ Xn,p we have g(1, χ) 6= 0. We have not yet found any χ such that Theorem 1 would not imply g(1, χ) 6= 0. Then we use Theorem 2: we let m be the least integer greater than or equal to − ˜1,χ (m) of all B1,χ for χ ∈ Xn,p B( 32 + , M, f) and compute approximations B (in practice, we choose M = 15). Setting
a ˜k =
n−1 2 X −ik ˜ ζ B1,χip (m) n i=1 n i odd
ak | < 2e−M , so that ak is the nearest integer to a ˜k , and we have we get |ak − ˜ It is worth noticing that the absolute computed the exact values of all the ak ’s. √ p values of all the B1,χip being less than 2π (log p + 2), then even for very large values of p we need only work with complex numbers of reasonable absolute values to compute the exact values of the coordinates of B1,χp . Since we do not have any positive lower bound on the absolute values of the g(1, χ)’s, for we do not even know how to prove they are never equal to zero, we cannot give any proved upper bound on the number of elementary operations our algorithm requires for computing h− p . However, in practice, |g(1, χ)| is never very small so that we may use the bound B( 32 + , M, f) with M = 15. We have programmed our formulas in Kida’s language Ubasic, which allows fast arbitrary precision calculation on PC’s. For example, let Np be the imaginary Abelian field of degree 32 and conductor p = 1013 + 609. We get min |g(1, χip)| = 173 010 991.29 · · ·
1≤i≤p−1 i odd
and B1,χp =
15 X
k ak ζ32
k=0
with k ak k ak k ak k ak
0 1 2 3 −216157 −211319 74357 396321 4 5 6 7 −213847 −264627 −25413 −238953 8 9 10 11 −160929 35681 309661 −15135 12 13 14 15 152601 −271679 388853 537675
and h− p =10 57160 41460 14284 21537 30049 35283 89944 64043 49937 90979 09467 19576 76809 23876 07191 38750 25726 21601 ≈ 1091 . Note that h− p and all the ak ’s are indeed odd.
Relative Class Numbers
4
481
The Cyclic Quartic Case
We now focus on imaginary cyclic quartic fields of prime conductors p ≡ 5 (mod 8), for here we almost have an explicit formula for χp . Proposition 2. Let p = a2 + b2 ≡ 5 (mod 8) be prime, where a and b are rational integers chosen such that a ≡ −1 (mod 4) and b ≡ 2 (mod 4). Hence, ab ≡ 2 (mod 4). Choose the sign sgnb of b so that ab ≡ 2 (mod 8). Then, q √ Np = Q( −(p + b p)) and for some p = ±1, we have τχp = p αp where q αp =
q √ √ (p + a p)/2 + isgnb (p − a p)/2.
Once using Theorem 1 we have proved that g(1, χp ) 6= 0 and have computed good approximations of χp , we can deduce the exact values of ±1 = p = √ τχp /αp = i pχp /αp and τχp = p αp . Then, according to Theorem 2 and since B1,χp = x + yi is in Z[i], we need compute only B( 12 , M, f) terms to determine the exact value of B1,χp . For example, if p = 1015 + 37 = 179368792 + (−26043586)2 then p = −1, B1,χp = −9475929 + 163987i and
1 |B |2 = 44 910 061 074 605. 2 1,χp Note that we could not have computed τχp or B1,χp easily by simply using their definition, for this p is much too large. h− p =
References BE. Lou. Wa.
B.C. Berndt and R.J. Evans. The determination of Gauss sums. Bull. Amer. Math. Soc. 5 (2) (1981), 107-129. S. Louboutin. Computation of relative class numbers of imaginary abelian number fields. Experimental Math, to appear. L.C. Washington. Introduction to Cyclotomic Fields. Grad.Texts Math. 83, Springer-Verlag (1982); Second Edition: 1997.
Formal Groups, Elliptic Curves, and Some Theorems of Couveignes Antonia W. Bluher National Security Agency, 9800 Savage Road, Fort George G. Meade, MD 20755-6000 Abstract. The formal group law of an elliptic curve has seen recent applications to computational algebraic geometry in the work of Couveignes to compute the order of an elliptic curve over finite fields of small characteristic ([2], [6]). The purpose of this paper is to explain in an elementary way how to associate a formal group law to an elliptic curve and to expand on some theorems of Couveignes. In addition, the paper serves as background for [1]. We treat curves defined over arbitrary fields, including fields of characteristic two or three. The author wishes to thank Al Laing for a careful reading of an earlier version of the manuscript and for many useful suggestions.
1
Definition and Construction of Formal Group Laws
Let R be a commutative ring with a multiplicative identity 1 and let R[[X]] denote the ring of formal power series of R. In general it is not possible to compose two power series in a meaningful way. For example, if we tried to form the composition f ◦ g with f = 1 + τ + τ 2 + τ 3 + · · · and g = 1 + τ we would get f ◦ g = 1 + (1 + τ ) + (1 + τ )2 + (1 + τ )3 + · · · The constant term is 1 + 1 + 1 + · · ·, which makes no sense. But there are some cases where f ◦ g does make sense, namely when f is a polynomial or when the constant term of g is zero. Let R[[X, Y ]] = R[[X]][[Y ]], the ring of formal power series in two variables. If F ∈ R[[X, Y ]] and g, h ∈ τ R[[τ ]] then F (g, h) makes sense and belongs to R[[τ ]]. If in addition F has a zero constant term, then F (g, h) ∈ τ R[[τ ]]. A one dimensional (commutative) formal group law over R is a power series F ∈ R[[X, Y ]] with zero constant term such that the “addition” rule on τ R[[τ ]] given by g ⊕F h = F (g, h) makes τ R[[τ ]] into an abelian group with identity 0. In other words, for every g, h we must have (f ⊕F g) ⊕F h = f ⊕F (g ⊕F h) (associative law), f ⊕F g = g ⊕F f (commutative law), f ⊕F 0 = f (0 is identity), and for each f ∈ τ R[[τ ]] there exists g ∈ τ R[[τ ]] such that f ⊕F g = 0 (inverses). Denote this group by C(F ). An equivalent and more widely known definition is the following: a formal group J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 482-501, 1998. Springer-Verlag Berlin Heidelberg 1998
Formal Groups, Elliptic Curves, and Some Theorems of Couveignes
483
law over R is a power series F (X, Y ) ∈ R[[X, Y ]] such that (i ) (ii ) (iii )
F (X, 0) = X; F (X, Y ) = F (Y, X) F (F (X, Y ), Z) = F (X, F (Y, Z))
(Additive Identity) (Commutative Law) (Associative Law).
(1.1)
The first property implies that F has the form X + Y H(X, Y ). By symmetry in X and Y , it must therefore be of the form F (X, Y ) = X + Y + XY G(X, Y ),
G ∈ R[[X, Y ]].
(1.2)
Proposition 1.1 Let F be a power series in two variables with coefficients in R such that F (0, 0) = 0. The following are equivalent. (1) The three conditions in (1.1) hold; (2) The binary operation on τ R[[τ ]] defined by f ⊕F g = F (f, g) makes τ R[[τ ]] into an abelian group with identity 0; (3) The binary operation on τ R[[τ ]] defined by f ⊕F g = F (f, g) makes τ R[[τ ]] into an abelian semigroup with identity 0. Proof. We will show (1) ⇒ (2) ⇒ (3) ⇒ (1). Assume (1) holds. Define a binary operation on τ R[[τ ]] by f ⊕F g = F (f, g) for f, g ∈ τ R[[τ ]]. The three conditions immediately imply f ⊕F 0 = f, f ⊕F g = g⊕F f, and (f ⊕F g)⊕F h = f ⊕F (g⊕F h) for f, g, h ∈ τ R[[τ ]]. It remains only to prove the existence of inverses. For this, it suffices to prove there is a power series ι ∈ τ R[[τ ]] such that F (g, ι ◦ g) = 0 for all g ∈ τ R[[τ ]]. Let ι(1) = −τ . By (1.2) F (τ, ι(1)) ≡ τ − τ ≡ 0 mod τ 2 . Now assume inductively that ι(N) ∈ τ R[[τ ]] satisfies F (τ, ι(N)) ≡ 0 mod τ N+1 and ι(N) ≡ ι(N−1) mod τ N . Then there is a ∈ R such that F (τ, ι(N)) ≡ aτ N+1 mod τ N+2 . Let ι(N+1) = ι(N) − aτ N+1 . By (1.2) F (ι(N) , −aτ N+1 ) ≡ ι(N) − aτ N+1 = ι(N+1) mod τ N+2 . Thus F (τ, ι(N+1)) ≡ F (τ, F (ι(N), −aτ N+1 )) = F (F (τ, ι(N)), −aτ N+1 ) ≡ F (τ, ι(N)) − aτ N+1 ≡ 0 mod τ N+2 . This completes the induction. Let ι ∈ τ R[[τ ]] be the power series such that ι ≡ ι(N) mod τ N+1 for all N . Then F (τ, ι(τ )) = 0, and hence F (x, ι(x)) = 0 for all x ∈ τ R[[τ ]]. This proves (1) ⇒ (2). It is obvious that (2) ⇒ (3). Now assume (3) holds. We will prove condition (iii) of (1.1) holds; the other conditions in (1.1) can be proved similarly. Let G(X, Y, Z) = F (F (X, Y ), Z) − F (X, F (Y, Z)). We must show G = 0. By hypothesis, if a, b, c are any positive integers then G(τ a , τ b , τ c) = (τ a ⊕F τ b ) ⊕F τ c − τ a ⊕F (τ b ⊕F τ c) = 0 as an element of R[[τ ]]. We must show that every coefficient of G is zero. Write
484
Antonia W. Bluher
G=
X
gijk X i Y j Z k .
i,j,k≥0
Since the N th coefficient of G(τ a , τ b, τ c ) is zero we have X gijk = 0 { i,j,k∈Z≥0 | (a,b,c)·(i,j,k)=N }
(1.3)
for all positive integers a, b, c, N . We need to show each gijk = 0. Suppose not. Among all i, j, k for which gijk is nonzero, consider those for which N1 = i+j +k is minimal. Among all i, j, k with gijk 6= 0 and i + j + k = N1 , consider those for which N2 = i+j is minimal. Finally, among all i, j, k with gijk 6= 0, i+j+k = N1 , and i + j = N2 select the one for which N3 = i is minimal. Call this triple (i0 , j0 , k0 ); that is, i0 + j0 + k0 = N1 , i0 + j0 = N2 , i0 = N3 . Choose integers M1 , M2 , M3 such that M3 ≥ 1,
M2 > M3 N3 ,
M1 > M2 N2 + M3 N3 .
Let (a, b, c) = (M1 + M2 + M3 , M1 + M2 , M1 ),
N = M1 N1 + M2 N2 + M3 N3 .
We will obtain a contradiction by showing that X gijk = gi0,j0 ,k0 6= 0. { i,j,k∈Z≥0 | (a,b,c)·(i,j,k)=N }
(1.4)
Suppose gijk 6= 0 and (a, b, c) · (i, j, k) = N . The equality can be written M1 (i + j + k) + M2 (i + j) + M3 i = N.
(1.5)
Now i + j + k ≥ N1 by the minimality of N1 . Strict inequality cannot hold, since otherwise N = M1 (i + j + k) + M2 (i + j) + M3 i ≥ M1 (N1 + 1) > M1 N1 + M2 N2 + M3 N3 = N. Thus i + j + k = N1 . By minimality of N2 we know i + j ≥ N2 . Again strict inequality cannot hold, since otherwise N = M1 (i + j + k) + M2 (i + j) + M3 i ≥ M1 N1 + M2 (N2 + 1) > M1 N1 + M2 N2 + M3 N3 = N. Thus i + j = N2 . Now the equality (1.5) shows i = N3 . This establishes (1.4) and completes the proof. t u The following proposition gives a general method to construct formal group laws. Proposition 1.2 Let G be an abelian group, 0G its identity element, and write its multiplication law additively. Suppose there is a one-to-one map T : τ R[[τ ]] → G
Formal Groups, Elliptic Curves, and Some Theorems of Couveignes
485
such that T (0) = 0G, and a power series F ∈ R[[X, Y ]] with zero constant term such that T (g) + T (h) = T (F (g, h)) (1.6) for all g, h ∈ τ R[[τ ]]. Then F defines a formal group law. Some easy examples of the above proposition are: (1) G = R[[τ ]] under addition, T = inclusion, F (X, Y ) = X + Y (called the additive group law), and (2) G = R[[τ ]]× under multiplication, T (g) = 1 + g, F (X, Y ) = X + Y + XY (called the multiplicative group law). A less trivial example is the construction of the group law associated to an elliptic curve, which will be given in §4. Proof of Proposition 1.2:. The hypothesis is that there is an injective map T from τ R[[τ ]] into an abelian group G such that T (0) = 0G , and there is a power series F (X, Y ) with zero constant term such that T (g) + T (h) = T (F (g, h)) for all g, h ∈ τ R[[τ ]]. We need to show that F gives an abelian group law on τ R[[τ ]]. By Prop. 1.1, it suffices to show F makes τ R[[τ ]] into an abelian semigroup with identity 0; that is, if f, g, h ∈ τ R[[τ ]] then f ⊕F (g ⊕F h) = (f ⊕F g) ⊕F h,
f ⊕F g = g ⊕F f,
f ⊕F 0 = f.
Now T (f ⊕F (g ⊕F h)) = T (f) + T (g ⊕F h) = T (f) + T (g) + T (h) and similarly T ((f ⊕F g) ⊕F h) = T (f) + T (g) + T (h). This proves the first identity, since T is one-to-one. The other two identities are proved similarly. t u
2
Homomorphisms of Formal Group Laws
If F is a formal group law then write C(F ) for the group it determines. That is, C(F ) = τ R[[τ ]] as a set, and the group law is given by g ⊕F h = F (g, h). If F, F 0 are two formal group laws then a homomorphism from F to F 0 is defined as a power series U (τ ) ∈ τ R[[τ ]] with zero constant term such that g 7→ U (g) defines a homomorphism from C(F ) into C(F 0 ). Explicitly, U ◦ (x ⊕F y) = (U ◦ x) ⊕F 0 (U ◦ y) for all x, y ∈ τ R[[τ ]]. In terms of power series this can be written U (F (X, Y )) = F 0 (U (X), U (Y )).
(2.1)
The reason that U has zero constant term is that U must take τ R[[τ ]] into itself. An example of a homomorphism from F to itself is the multiplication by n map, denoted [n] or [n]F , which is defined by the rules: [0] = 0,
[1] = τ,
[n + 1]τ = [n]τ ⊕F τ = F ([n]τ, τ ) if n > 0, [n] = ι ◦ [−n] if n < 0.
(2.2)
486
Antonia W. Bluher
Let G1 , G2 be abelian groups, and let Ti : τ R[[τ ]] → Gi (i = 1, 2) be one-toone maps such that Ti (0) is the identity element of Gi . Let Fi be power series with zero constant term such that Ti (g) ⊕Gi Ti (h) = Ti (g ⊕Fi h),
i = 1, 2,
where ⊕Gi denotes addition on the group Gi and g ⊕Fi h = Fi (g, h). We showed that Fi is a formal group law, and the above equation simply states that Ti is a group homomorphism from C(Fi ) into Gi . Lemma 2.1 Let Gi , Ti , Fi , C(Fi ) be as above. Suppose there is a group homomorphism ψ : G1 → G2 and a power series U with zero constant term such that (2.3) ψ(T1 (g)) = T2 (U (g)) for all g ∈ τ R[[τ ]]. Then U is a homomorphism between the formal group laws defined by F1 and F2 . Proof. It suffices to show that U is a homomorphism from C(F1 ) to C(F2 ). By hypothesis there is a commutative diagram T
C(F1 ) ,−−−−1−→ G1 ψ U y y T
C(F2 ) ,−−−−2−→ G2 Here T1 , T2 , ψ are homomorphisms and T1 , T2 are injective. It follows by diagram chasing that U is a homomorphism, as claimed. t u As a special case, let G1 = G2 = G, T1 = T2 = T , F1 = F2 = F , and ψ(g) = ng, where n ∈ Z. Then U = [n], which was defined by (2.2). The power series for [n] may either be computed from the recursion (2.2) or from the formula (2.3), which in this context reads nT (g) = T ([n](g))
for g ∈ τ R[[τ ]].
(2.4)
For the additive formal group law we have T = inclusion of τ R[[τ ]] into R[[τ ]] and the formula reads ng = [n](g). So in that case, [n](τ ) = nτ
(Additive Formal Group)
For the multiplicative formal group law we have G = R[[τ ]]× and T (g) = 1 + g, so the formula reads (1 + g)n = 1 + [n](g). In the special case where n = p = the characteristic of R with p > 0 we have (1 + g)p = 1 + gp , and therefore [p](τ ) = τ p
(Multiplicative Formal Group in Char. p).
Formal Groups, Elliptic Curves, and Some Theorems of Couveignes
3
487
Height
If R has characteristic p then the height of a homomorphism U , written h ht(U ), is the largest integer h such that U (τ ) = V (τ p ) for some power series V , or ∞ if U = 0. The height of the formal group law is defined as the height of the homomorphism [p]. For the additive formal group law defined by F (X, Y ) = X + Y we have [p](τ ) = pτ = 0, so the height of F is ∞. For the multiplicative formal group law given by F (X, Y ) = X + Y + XY we have [p](τ ) = τ p , therefore the multiplicative formal group law has height one. P law over an integral Example 3.1 Let F = fij X i Y j be a formal P p group X i Y j . We claim that F (p) is domain R of characteristic p > 0. Let F (p) = fij a formal group law, and φ = τ p is a homomorphism (evidently of height 1) from F to F (p) . For the first assertion, replace X, Y, Z by X 1/p , Y 1/p , Z 1/p in the relation (1.1) then take the pth power. This yields the corresponding relations for F (p). For the second assertion, note that F (p) φ(X), φ(Y ) = F (X, Y )p = φ F (X, Y ) . k
Observe that φk : F → F (p ) .
t u
formal group laws over an integral domain R Proposition 3.2 Let F1 , F2 be P of characteristic p. Let U (τ ) = ui τ i be a homomorphism from F1 to F2 of height k. Then the first nonzero coefficient of U is upk . Moreover, there is a (pk )
homomorphism V : F1
→ F2 such that U = V ◦ φk .
Proof. If P k = 0 then uj 6= 0 for some j which is prime to p, therefore U 0 (τ ) = m mum τ m−1 is nonzero. Differentiate the equation U (F1 (X, Y )) = F2 (U (X), U (Y )) with respect to Y and then set Y = 0. We obtain ∂F1 ∂F2 (X, 0) = U (X), U (0) U 0 (0). U 0 F1 (X, 0) ∂Y ∂Y Since Fi (X, Y ) = X + Y + XY Gi (X, Y ) for i = 1, 2, this becomes U 0 (X) 1 + XG1 (X, 0) = 1 + U (X)G2 (U (X), 0) u1 . The left side is nonzero, therefore u1 6= 0. Now let k ≥ 1 and set q = pk . By definition of height, there is a power series V (τ ) ∈ τ R[[τ ]] such that U (τ ) = V (τ q ). Now V 0 is nonzero, since otherwise V would be a function of τ p , so that q could be replaced by pq. We (q) (q) claim V is a homomorphism from F1 to F2 . We have to show V F1 (X, Y ) = F2 V (X), V (Y ) . The left side is V (F1 (X 1/q , Y 1/q )q ) = U F1 (X 1/q , Y 1/q ) . The right side is F2 U (X 1/q ), U (Y 1/q ) . These two are equal because U is a homomorphism from F1 to F2 . Since V 0 6= 0, V has height zero. It follows from the case k = 0 that the first coefficient of V is nonzero. Thus the coefficient of τ q in U is nonzero. t u
488
Antonia W. Bluher
Proposition 3.3 Let F, F 0, F 00 be formal group laws over an integral domain R of characteristic p. In parts (a), (b), (d) and (e) assume p > 0. (a) If U : F → F 0 , and V : F 0 → F 00 , then ht(V ◦ U ) = ht(V ) + ht(U ). (b) If there is a nonzero homomorphism U from F to F 0 then F and F 0 have the same height. (c) For n ∈ Z, [n]F = nτ + τ 2 (· · ·). (d) Every formal group F over a ring of characteristic p has height at least one. (e) If n = apt with (a, p) = 1 then ht([n]F ) = t ht(F ). P Proof. (a) Define the degree of a nonzero power series ai τ i to be the smallest i such that ai 6= 0. Prop. 3.2 asserts that if U is a nonzero homomorphism of formal group laws then deg(U ) = pht(U ) . The degrees of power series multiply when they are composed, therefore pht(V ◦U ) = pht(V ) pht(U ) = pht(V )+ht(U ) . (b) Certainly [p]F 0 ◦ U = U ◦ [p]F , so [p]F and [p]F 0 have the same height by (a). (c) can easily be shown by induction, using (2.2). (d) is immediate from (c) and Prop. 3.2. (e) ht([n]F ) = ht([a]F ) + t ht([p]F ) by (a). The height of [a]F is zero t u by (c), and ht([p]F ) = ht(F ) by definition. If F, F 0 are formal group laws over an integral domain R and U1 , U2 : F → F 0 , define U1 ⊕F 0 U2 = F 0 (U1 , U2 ). U1 ⊕F 0 U2 is a homomorphism from F to F 0 . This composition rule makes Hom(F, F 0) into an abelian group. In particular, it is a Z-module. Suppose that R has characteristic p > 0. We put a topology on Hom(F, F 0 ) by decreeing that U and V are close iff U F 0 V has a large height. In other words, the topology on Hom(F, F 0 ) is induced from the height metric t u |U | = cht(U ), where 0 < c < 1. Proposition 3.4 Let F, F 0 be formal groups over an integral domain R of characteristic p > 0. (a) ht(U1 ⊕F 0 U2 ) ≥ inf{ ht(U1 ), ht(U2 ) }. If ht(U1 ) < ht(U2 ) then ht(U1 ⊕F 0 U2 ) = ht(U1 ). Hence, the height metric is nonarchimedean. (b) The map Z×Hom(F, F 0) → Hom(F, F 0 ) given by (n, U ) 7→ [n]F 0 ◦U is continuous with respect to the p-adic metric on Z and the height metric on Hom(F, F 0). Hence, Hom(F, F 0 ) is naturally a Zp -module. (c) If ht(F ) < ∞ then Hom(F, F 0 ) is a faithful Zp -module. Proof. (a) Write F 0 (X, Y ) = X + Y + XY G0 (X, Y ). Then U1 ⊕F 0 U2 = F 0 (U1 , U2 ) = U1 +U2 +U1 U2 G0 (U1 , U2 ). Part (a) is therefore true when the word “degree” is substituted for the word “height”. Since ht(Ui ) = logp (deg(Ui )), (a) follows. (b) We must show that if n = m + apt with t large and if U, V ∈ Hom(F, F 0 ) are close then n · U is close to m · V . But n · U F 0 m · V = [n]F 0 ◦ (U F 0 V ) ⊕F 0 [apt ]F 0 ◦ V. The height of [n]F 0 ◦ (U F 0 V ) is ≥ ht(U F 0 V ). The height of [apt ]F 0 ◦ V is ≥ t. Both these heights are large, so the height of the sum is large by (a). (c) We must show that if a ∈ Zp and 0 6= U ∈ Hom(F, F 0 ) then a · U = 0 iff a = 0. k Write a = pk b, where b ∈ Z× p . We have a · U = [p ] ◦ b · U . Certainly b · U 6= 0,
Formal Groups, Elliptic Curves, and Some Theorems of Couveignes
489
since b is invertible, and [pk ] is nonzero since it has finite height. Thus a · U is the composition of two nonzero formal power series over R, and since R is an integral domain, this composition is nonzero. t u It is a theorem of M. Lazard ([3], [4]) that if R is a separably closed field of characteristic p then two formal group laws F, F 0 defined over R are isomorphic iff they have the same height; this gives a partial converse to Prop. 3.3(b). We will see that the height of the formal group law associated to an elliptic curve E defined over a field R of characteristic p is one or two according as E is ordinary or supersingular. Thus Lazard’s Theorem implies that the formal group laws of any two ordinary elliptic curves (or any two supersingular elliptic curves) are isomorphic over the algebraic closure of R. On the other hand, the condition that two elliptic curves over R be isomorphic is much more restrictive (the two curves must have the same j-invariant; see [7], p. 47-50) This means that isomorphisms of formal group laws are far more abundant than isomorphisms of elliptic curves.
4
Constructing the Formal Group Law of an Elliptic Curve
Let E be an elliptic curve over a field K determined by a nonsingular Weierstrass equation W (X, Y, Z) = Y 2 Z + a1 XY Z + a3 Y Z 2 − (X 3 + a2 X 2 Z + a4 XZ 2 + a6 Z 3 ), (4.1) ai ∈ K. Let L be the quotient field of K[[τ ]]. Since K ⊂ L, we can consider the points in E(L). Let R be a subring of K (possibly R = K) containing 1 and all the Weierstrass coefficients ai . We will construct a formal group law by embedding τ R[[τ ]] into E(L) and “stealing” the group law from E(L). Consider points of the form (t, −1, s) in E(K). Then t can be regarded as the function −X/Y ∈ K(E), where K(E) denotes the function field of E over K, and t is a uniformizer at the identity O = (0, 1, 0). Also s can be regarded as the function −Z/Y , and s has a triple zero at O. Let Ω be the ring of functions in K(E) which are defined at O and M the ideal of functions in Ω which vanish at O. Then M is principal, generated by t, and Ω/M ∼ = K by the map f + M 7→ f(O). Ω has a metric induced by M , namely |f| = cn , where 0 < c < 1 and n is the largest integer such that f ∈ M n . The uniformizer t determines an isometry Ψ : Ω → K[[τ ]] (where K[[τ ]] has the τ -adic topology) PN P∞ as follows: f 7→ i=0 ai τ i (where ai ∈ K) iff for each N , f − i=0 ai ti ∈ M N+1 . The image of Ψ is densePin K[[τ ]], since it contains all polynomials. ∞ Let S(τ ) = Ψ (s) = i=3 si τ i . We will prove below that if f ∈ τ R[[τ ]] then (f, −1, S(f)) ∈ E(L), so there is an embedding T : τ R[[τ ]] → E(L) given by T (f) = (f, −1, S(f)).
(4.2)
The formal group law ofE will be the power series F ∈ τ R[[τ ]] such that T (g) + T (h) = T F (g, h) . All we need to do is to prove this power series F exists; it will automatically be a formal group law because of Prop. 1.2.
490
Antonia W. Bluher
By dividing through the Weierstrass equation by Y 3 we see that s and t satisfy the equation s = t3 + a1 ts + a2 t2 s + a3 s2 + a4 ts2 + a6 s3 .
(4.3)
The series S can be computed by recursively substituting approximations for s into the right hand side of (4.3) and expanding to get improved approximations. We start with the approximation s = O(t3 ) to obtain s = t3 + a1 t O(t3 ) + a2 t2 O(t3 ) + a3 (O(t3 ))2 + a4 t(O(t3 ))2 + a6 (O(t3 ))3 = t3 + O(t4 ). On the next round substitute t3 + O(t4 ) for s in the right side of the equation to obtain s = t3 + a1 t4 + O(t5 ). This procedure yields the general rule: s3 = 1, and if n ≥ 4 then s0 = s1 = s2 = 0, X X X si sj +a4 si sj +a6 si sj sk . (4.4) sn = a1 sn−1 +a2 sn−2 +a3 i+j=n
i+j=n−1
i+j+k=n
Lemma 4.1 Let W be the Weierstrass equation (4.1), where ai ∈ R and R is an P integral domain. Let si ∈ R be defined by the recursion (4.4) and let S = si τ i ∈ τ R[[τ ]]. Then W (τ, −1, S) = 0 in R[[τ ]]. If f, g ∈ τ R[[τ ]] and W (f, −1, g) = 0 then g = S ◦ f.
Remark. Since the Weierstrass equation is cubic in the variable Z, it follows that for fixed f ∈ τ R[[τ ]], the equation W (f, −1, g) = 0 has three solutions for g in the algebraic closure of the quotient field of R[[τ ]]. The lemma asserts that exactly one of these solutions lies in τ R[[τ ]]. Proof. Let K be the quotient ring of R and let E be the elliptic curve over K with equation W . Let t = −X/Y , s = −Z/Y ∈ K(E), and Ψ : Ω → K[[τ ]] be as described in the beginning of this section. Then ψ(t) = τ , Ψ (s) = S. Now W (t, −1, s) = 0, so 0 = Ψ (W (t, −1, s)) = W (τ, −1, S). From this it follows that W (f, −1, S ◦ f) = 0 for any f ∈ τ K[[τ ]]. Now suppose f, g ∈ τ R[[τ ]] and W (f, −1, g) = 0. Let h = S ◦ f. Then 0 = W (f, −1, h) − W (f, −1, g)
= (g − h) −1 + a1 f + a2 f 2 + a3 (g + h) + a4 f(g + h) + a6 (g2 + gh + h2 ) .
t u Since −1 + a1 f + · · · is a unit in R[[τ ]], g − h must be zero. The above lemma establishes that the map T : τ K[[τ ]] → E(L) is welldefined, furthermore it is obviously one-to-one. Recall Prop. 1.2, which guarantees that if we can find a power series F in two variables with the properties that
Formal Groups, Elliptic Curves, and Some Theorems of Couveignes
491
F (0, 0) = 0 and T (f) + T (g) = T (F (f, g)) then F will be a formal group law. We now show such an F can be found. First we need to know addition formulas for points of the form (t1 , −1, s1 ). Such formulas are provided below. Proposition 4.2 Let Pi = (ti , −1, si ) for i = 1, 2 be points on the elliptic curve with Weierstrass equation (4.1). (a) Suppose t1 6= 0 and let m = s1 /t1 . If 1 + a2 m + a4 m2 + a6 m3 6= 0 then −t1 −s1 , −1, . (4.5) −P1 = 1 − a1 t1 − a3 s1 1 − a1 t1 − a3 s1 (b) Suppose t1 6= t2 and let m = (s1 − s2 )/(t1 − t2 ), b = s1 − mt1 , A = 1 + a2 m + a4 m2 + a6 m3 . If A 6= 0 then P1 + P2 = −(t3 , −1, mt3 + b), t3 = −t1 − t2 −
a1 m + a2 b + a3 m2 + 2a4 mb + 3a6 m2 b . A
(4.6)
Proof. (b) P1 , P2 lie on the line mX − bY − Z = 0. Let P3 be the third point of intersection of this line with the elliptic curve. Write P3 = (x3 , y3 , z3 ). If y3 = 0 then P3 = (1, 0, m). From the Weierstrass equation (4.1), 1 + a2 m + a4 m2 + a6 m3 = 0, contrary to the hypothesis. Thus y3 6= 0, and hence P3 can be written P3 = (t3 , −1, mt3 + b). Likewise Pi = (ti , −1, mti + b) for i = 1, 2. When (t, −1, mt + b) is substituted for (X, Y, Z) in the Weierstrass equation, the result must be of the form A(t − t1 )(t − t2 )(t − t3 ) with A 6= 0. Hence −(mt + b) + a1 t(mt + b) + a3 (mt + b)2 + t3 + a2 t2 (mt + b) + a4 t(mt + b)2 + a6 (mt + b)3 = A(t − t1 )(t − t2 )(t − t3 ). The left side is of the form (1 + a2 m + a4 m2 + a6 m3 )t3 + (a1 m + a3 m2 + a2 b + 2a4 mb + 3a6 m2 b)t2 + (· · ·)t + (· · ·) and the right side is of the form At3 − A(t1 + t2 + t3 )t2 + · · ·. Now (b) follows immediately. (a) Let P2 = (0, 1, 0), m = s1 /t1 , A = 1 + a2 m + a4 m2 + a6 m3 . Since A 6= 0, (b) implies that P1 + (0, 1, 0) + (t3 , −1, mt3 ) = (0, 1, 0), where t3 = −t1 − (a1 m + a3 m2 )/A. Thus −P1 = (t3 , −1, mt3 ). Now t31 A = t31 + a2 t21 s1 + a4 t1 s21 + a6 s31 = s1 − a1 t1 s1 − a3 s21 , thus
−t1 (t31 A) − (a1 t21 s1 + a3 t1 s21 ) a1 m + a3 m2 = A t31 A −t1 −t1 s1 = . = s1 − a1 t1 s1 − a3 s21 1 − a1 t1 − a3 s1
t3 = −t1 −
t u
492
Antonia W. Bluher
Theorem 4.3 There is a power series F (t1 , t2 ) ∈ R[[X, Y ]] with zero constant term such that for f, g ∈ τ R[[τ ]], T (f) + T (g) = T (F (f, g)).
(4.7)
Therefore F is a formal group law. Proof. Consider Prop. 4.2, but treat t1 , t2 as indeterminates and substitute S(t1 ), S(t2 ) for s1 , s2 . In other words, we are working over the field L0 = the quotient field of R[[t1 , t2 ]]. We need to show t3 of equation (4.6) is a power series in t1 , t2 . Let M be the ideal of R[[t1, t2 ]] generated by t1 and t2 . That is, M is the set of elements µ ∈ R[[t1 , t2 ]] for which µ(0, 0) = 0. If µ ∈ M and u is a unit of R then u + µ is a unit in R[[t1, t2 ]]. Now ∞
m=
S(t1 ) − S(t2 ) X si (ti1 − ti2 ) = t1 − t2 t1 − t2 i=3
=
∞ X
i−1 si (ti−1 + ti−2 + ti−1 1 1 t 2 + · · · + t1 t 2 2 )
i=3
so m belongs to M 2 . Then A = 1 + a2 m + a4 m2 + a6 m3 is a unit in R[[t1, t2 ]], since A is the sum of a unit in R and an element of M . In particular, A 6= 0, so Prop. 4.2(b) applies. Also b = S(t1 ) − mt1 ∈ M 3 . Now (4.6) shows that t3 ∈ M . Thus we can write t3 = G(t1 , t2 ), G ∈ M . Certainly t3 6= 0, because G ≡ −t1 − t2 mod M 2 . We have (t1 , −1, S(t1 )) + (t2 , −1, S(t2 )) = −(t3 , −1, s3 ) in E(L0 ), where s3 = mt3 + b ∈ M 3 . By Prop. 4.2(a), the right side is −s3 −t3 , −1, . 1 − a1 t3 − a3 s3 1 − a1 t3 − a3 s3 Let F (t1 , t2 ) =
−t3 ∈ M, 1 − a1 t3 − a3 s3
H(t1 , t2 ) =
−s3 ∈ M 3. 1 − a1 t3 − a3 s3
If we substitute t1 = f(τ ), t2 = g(τ ) for f, g ∈ τ R[[τ ]] we get a homomorphism R[[t1 , t2 ]] → R[[τ ]], which induces a homomorphism E(L0 ) → E(L). It follows that (f, −1, S(f)) + (g, −1, S(g)) = (F (f, g), −1, H(f, g)). By Lemma 4.1 H(f, g) = S(F (f, g)). This proves (4.7). The fact that F is a formal group law follows from Prop. 1.2. u t The first few terms of F are: F (X, Y ) = X + Y − a1 XY − a2 (X 2 Y + XY 2 ) − (2a3 X 3 Y + (3a3 − a1 a2 )X 2 Y 2 + 2a3 XY 3 ) + · · ·
Formal Groups, Elliptic Curves, and Some Theorems of Couveignes
5
493
Homomorphisms of Formal Group Laws Arising from Isogenies
Let E, E 0 be two elliptic curves defined over the same field K. An algebraic map from E to E 0 is a function α : E(K) → E 0 (K) such that for each P ∈ E there exist homogeneous polynomials f1 , f2 , f3 of the same degree and not all vanishing at P such that for all but finitely many Q ∈ E(K), α(Q) = (f1 (Q), f2 (Q), f3 (Q)). An example of an algebraic map from E to itself is the translation by P map τP (Q) = P + Q for P, Q ∈ E. The algebraic map is said to be defined over a field K if E, E 0 are defined over K and if all the coefficients of f1 , f2 , f3 can be chosen to belong to K. It is a theorem ([7], p. 75) that every nonconstant algebraic map from E into E 0 which takes the origin to the origin is a group homomorphism. Such an algebraic map is called an isogeny. If τ : E → E 0 and −Q = τ (0, 1, 0) ∈ E 0 then τQ ◦ τ takes the origin of E into the origin of E 0 . Thus every nonconstant algebraic map is the composition of an isogeny with a translation. Two curves E, E 0 are called isogenous over K if there exists an isogeny defined over K from E into E 0 . The endomorphism ring of E, written EndK (E), is the set of isogenies over K from E to itself, together with the constant zero map, with the addition and multiplication laws: (α + β)(P ) = α(P ) + β(P ),
αβ = α ◦ β.
Note that Z ⊂ EndK (E). If K is the finite field with q elements then the Frobenius endomorphism ϕq is defined by ϕq (X, Y, Z) = (X q , Y q , Z q ). Since ϕq coincides with the Galois action, it commutes with any endomorphism of E which is defined over K. In particular, ϕq commutes with Z. We claim that an isogeny of elliptic curves over K gives rise to a homomorphism of the corresponding formal group laws over K. Indeed, let I(X, Y, Z) = (f1 (X, Y, Z), f2 (X, Y, Z), f3 (X, Y, Z)) be an isogeny between elliptic curves E, E 0 over K. Here f1 , f2 , f3 are homogeneous polynomials of the same degree, say d, and f1 , f2 , f3 do not simultaneously vanish at the origin. Since the origin of E is carried to the origin of E 0 , f1 and f3 vanish at O = (0, 1, 0) but f2 (O) 6= 0. Thus f1 /Y d ∈ M and f2 /Y d ∈ Ω × . Now f1 /Y d = f1 (X/Y, 1, Z/Y ) = f1 (−t, 1, −s) = (−1)d f1 (t, −1, s) ∈ M and similarly f2 /Y d = (−1)d f2 (t, −1, s) ∈ Ω × . Thus f1 (X, Y, Z)/f2 (X, Y, Z) = f1 (t, −1, s)/f2 (t, −1, s) ∈ M. P∞ Let U (τ ) = i=1 ui τ i denote the expansion of f1 /f2 with respect to t. Practically speaking, U can be obtained by expanding s as a power series S and then computing f1 (τ, −1, S(τ ))/f2 (τ, −1, S(τ ))
494
Antonia W. Bluher
in the ring K[[τ ]]. Note that f2 (τ, −1, S(τ )) is invertible since its constant term is nonzero. Proposition 5.1 Let E, E 0 , E 00 be elliptic curves over K and let F, F 0 , F 00 denote the associated formal group laws over K. If I : E → E 0 is an isogeny then the power series U constructed above belongs to Hom(F, F 0 ). The map I 7→ U is a one-to-one group homomorphism from Isog(E, E 0 ) ,→ Hom(F, F 0). If I 0 : E 0 → E 00 and I 0 corresponds to U 0 ∈ Hom(F 0 , F 00) then I 0 ◦ I corresponds to U 0 ◦ U ∈ Hom(F, F 00). Proof. Let L be the quotient field of K[[τ ]]. Since I is defined over K, it is a priori defined over L. The discussion above shows that I can be written in a neighborhood of the origin as f3 (t, −1, s) f1 (t, −1, s) , −1, . I(X, Y, Z) = f2 (t, −1, s) f2 (t, −1, s) Let T : τ K[[τ ]] → E(L) and T 0 : τ K[[τ ]] → E 0 (L) be the embeddings (4.2). Substitute (X, Y, Z) → T (f) = (f, −1, S(f)) ∈ E(L), where f ∈ τ K[[τ ]]. Then t = −X/Y changes to f and s = −Z/Y changes to S ◦ f. Thus I(T (f)) = (U (f), −1, V (f)), where U (τ ) = f1 (τ, −1, S(τ ))/f2 (τ, −1, S(τ )) ∈ τ K[[τ ]] and V (τ ) = f3 (τ, −1, S(τ ))/f2 (τ, −1, S(τ )) ∈ τ K[[τ ]]. By Lemma 4.1, V = S 0 ◦ U , where S 0 (t) is the power series expansion for −Z/Y in the curve E 0 . Thus I(T (f)) = T 0 (U (f)).
(5.1)
By Lemma 2.1, this equation proves that U is a homomorphism of formal group laws. If I1 , I2 ∈ Isog(E, E 0 ), and if U1 , U2 ∈ Hom(F, F 0) are the corresponding homomorphisms of formal group laws then on the elliptic curve E(L), (I1 + I2 )(τ, −1, S τ ) by definition of I1 + I2 = I1 τ, −1, S(τ ) + I2 τ, −1, S(τ ) 0 0 by (5.1) = T (U1 ) + T (U2 ) 0 0 by (4.7). = T (F (U1 , U2 )) On the other hand, if I1 + I2 corresponds to U3 then (I1 + I2 ) τ, −1, S(τ ) = T 0 (U3 ). Since T 0 is one-to-one, U3 = F 0 (U1 , U2 ) = U1 ⊕F 0 U2 . This shows that the map I 7→ U is a group homomorphism. Finally, if I : E → E 0 , I 0 : E 0 → E 00 correspond to U, U 0 , respectively, then since U is the unique solution in τ K[[τ ]] to I ◦ T = T 0 ◦ U , I 0 ◦ I ◦ T = I 0 ◦ T 0 ◦ U = T 00 ◦ U 0 ◦ U, whence I 0 ◦ I corresponds to U 0 ◦ U .
t u
Formal Groups, Elliptic Curves, and Some Theorems of Couveignes
495
Example 5.2 Let F be the formal group law over R associated to an elliptic curve E with Weierstrass equation (4.1), where the coefficients ai ∈ R, and R is an integral domain. We will compute [−1]F . Let g ∈ τ R[[τ ]]. By Proposition 4.2(a), −S ◦ g −g , −1, [−1]E T (g) = [−1]E (g, −1, S ◦ g) = 1 − a1 g − a3 S ◦ g 1 − a1 g − a3 S ◦ g The right side is T −g/(1 − a1 g − a3 S ◦ g) by Lemma 4.1. Now Lemma 2.1 implies ∞ X −τ = −τ (a1 τ + a3 S)n . [−1]F = 1 − a1 τ − a3 S n=0 t u An isogeny I : E → E 0 is called separable if it has the property: if t0 is a uniformizer at the origin of E 0 then t0 ◦ I is a uniformizer at the origin of E. This definition does not depend on the choice of uniformizer t0 . An isogeny which is not separable is called inseparable. In characteristic zero, all isogenies are separable. In characteristic p, the Frobenius is not separable, since it carries uniformizers into pth powers of uniformizers. It is a theorem ([7], II.2.12) that every isogeny can be factored as ϕkp from E into E (q) (q = pk ) composed with a separable isogeny from E (q) into E 0 . P ui τ i be the Lemma 5.3 Let I be an isogeny from E to E 0 and let U (τ ) = corresponding homomorphism between the formal group laws. I is separable iff u1 6= 0. Proof. Let t0 be the function −X/Y ∈ K(E 0 ). U is the power series expansion of t0 ◦ I with respect to the uniformizer t = −X/Y ∈ K(E). Thus t0 ◦ I is not a 2 iff u1 = 0. t u uniformizer at the identity of E iff t0 ◦ I ∈ M(0,1,0) Example 5.4 Let E be an elliptic curve whose Weierstrass coefficients ai belong to a field K of characteristic p > 0, and let F be its associated formal group law. Let E (p) be the elliptic curve with Weierstrass coefficients api . Then the Frobenius map ϕp : E → E (p) defined by ϕ(X, Y, Z) = (X p , Y p , Z p ) corresponds to the t u homomorphism of formal group laws φ = τ p : F → F (p).
6
Height of an Elliptic Curve
We begin this section with some facts about elliptic curves over finite fields. If α : E → E 0 is an isogeny, define α∗K(E 0 ) = { f ◦ α | f ∈ K(E 0 ) }; this is a subfield of K(E). The degree of an isogeny α : E → E 0 is the index of α∗ K(E 0 ) in K(E). This number is finite because both fields have transcendence degree 1 and α is a nonconstant map. If K has characteristic p then the Frobenius isogeny ϕp (X, Y, Z) = (X p , Y p , Z p ) from E into E (p) has degree p. Here E (p) is the curve
496
Antonia W. Bluher
whose Weierstrass equation is obtained from that of E by raising the coefficients to the pth power. ˆ : E 0 → E. The dual isogeny Every isogeny α : E → E 0 has a dual isogeny α ˆ ◦ α = [deg(α)]E , is characterized by the property that α ◦ α ˆ = [deg(α)]E 0 and α where [n]E denotes multiplication by n. If E = E 0 , then there is an integer a(α), called the trace of α, such that α + α ˆ = [a(α)]E . The endomorphism α satisfies the quadratic equation α2 − [a(α)]α + [deg(α)] = 0
in End(E).
In particular, if K has q elements then there is t ∈ Z such that ϕ2q − [t]ϕq + [q] = 0. The integer t is called the trace of Frobenius. It is well known ([7], Ch. 5) √ that |t| ≤ 2 q and the cardinality of E(K) is q + 1 − t. The height of a formal group law was defined in §3. Naturally, the height of an elliptic curve is defined to be the height of the associated formal group law. Proposition 6.1 An elliptic curve over a field of characteristic p, where p > 0, has height one or two. E (p) → E its dual. Proof. Let ϕp : E → E (p) be the pth power Frobenius and ϕˆp : P Let F be the formal group law associated to E, and let V (τ ) = vi τ i : F (p) → F be the homomorphism of formal group laws associated to ϕˆp . Then [p]F = V (τ p ). If ϕˆp is separable then v1 6= 0, so E has height one. If ϕˆp is inseparable, it can be written as a composition of a power of ϕp and a separable isogeny ([7], Corollary II.2.12). Since the degree of ϕˆp equals the degree of ϕp , only one power of ϕp can occur ˆp = α ◦ ϕp with α an isomorphism. Let P in this decomposition. Thus ϕ A = ai τ i be the power series corresponding to α and let A0 be the power series 2 2 corresponding to α−1 . Then [p]E = A(τ p ) = a1 τ p + · · ·, and a1 6= 0 because 0 t u A ◦ A (τ ) = τ . In this case E has height two. An elliptic curve in characteristic p of height one is called ordinary. An elliptic curve in characteristic p of height 2 is called supersingular. The next lemma gives another characterization of supersingular and ordinary curves when the underlying field is finite. Proposition 6.2 An elliptic curve E over a finite field K with q = pn elements is supersingular iff p divides the trace of Frobenius iff |E(K)| ≡ 1 mod p. If E is √ supersingular and n is even then |E(K)| = q + 1 + m q, m ∈ { −2, −1, 0, 1, 2 }. If E is supersingular, n is odd, and p ≥ 5, then |E(K)| = q + 1. If E is supersin√ gular, n is odd, and p ≤ 3 then |E(K)| = q + 1 + m pq, where m ∈ { −1, 0, 1 }. For a more precise statement about which values of |E(K)| can occur, the reader may consult [8], Theorem 4.1.
Formal Groups, Elliptic Curves, and Some Theorems of Couveignes
497
Proof. As above, let F be the formal group law corresponding to E and V : F (p) → F the homomorphism of formal group laws corresponding to ϕˆp . In other words, V is defined by [p]F = V (τ p ). Recall that E (p) denotes the elliptic curve whose Weierstrass equation is obtained by taking the pth powers of the Weierstrass coefficients for E, and we use similar notation for isogenies. Now k k+1 k k k+1 ϕˆ(p ) : E (p ) → E (p ) is the dual of the map ϕp : E (p ) → E (p ) , so n−1
ˆp(p ϕ ˆp ◦ ϕˆ(p) p ◦···◦ϕ
)
is the dual of ϕnp . The corresponding formal group law homomorphism is n−1
N (V ) = V ◦ V (p) ◦ · · · ◦ V (p
)
.
Let t be the trace of Frobenius, so that |E(K)| = q + 1 − t. Since [t]E is the sum of ϕnp and its dual in End(E), it follows that n
n
[t]F = N (V ) ⊕F τ p = F (N (V ), τ p ). If E is supersingular then V has height one, so N(V ) has height n. In that case, [t]F has height at least n, so [t2 ]F has height at least 2n. Since the height of F is √ two in this case, Prop. 3.3(e) implies t2 is divisible by pn . Since |t| ≤ 2 q and q|t2 , we deduce that t2 ∈ { 0, q, 2q, 3q, 4q }. Since t ∈ Z, we√find t ∈ { 0, ±q 1/2 , ±2q 1/2 } if n is even; √ t = 0 if n is odd and p > 3; t ∈ { 0, ± 2q } if n is odd and p = 2, t ∈ { 0, ± 3q } if n is odd and p = 3. Since |E(K)| = q + 1 − t, the cardinality of E(K) must be of the form stated. Next suppose E is ordinary. Then N(V ) has height zero, so [t]F has height zero. In that case Prop. 3.3(e) implies t is prime to p. t u Proposition 6.3 If E is an ordinary elliptic curve defined over a field K of cardinality pn and F is its associated formal group law then the trace of the Frobenius endomorphism is equal mod p to the norm from K to Fp of the first nonzero coefficient of [p]F . Proof. Let |K| = pn = q. The homomorphism of F associated to ϕ2q + [−t]E ϕq + [q]E is zero, thus each of its coefficients is zero. Now ϕq corresponds to the power series τ q , and [−t]E corresponds to a power series of the form −tτ + τ 2 (· · ·), 2 therefore ϕ2q + [−t]E ◦ ϕq corresponds to F (τ q , −tτ q + τ 2q (· · ·)), which is of the form −tτ q +τ 2q (· · ·). Finally, we evaluate [q]F . Let φ = τ p . Since φ◦V = V (p) ◦φ, n−1
[q]F = (V ◦ φ)n = V ◦ V (p) ◦ · · · ◦ V (p
)
◦ φn = (NK/Fp (v)τ + (· · ·)τ 2 ) ◦ τ q ,
so [q]F = NK/Fp (v)τ q + (τ 2q )(· · ·). Thus 0 = F −tτ q + τ 2q (· · ·), NK/Fp (v)τ q + τ 2q (· · ·) = (−t + NK/Fp (v))τ q + τ 2q (· · ·). t u
498
7
Antonia W. Bluher
Some Theorems of Couveignes
Let R be an integral domain of characteristic p. Let Fp ⊂ R be the field with p elements if p is prime, and Fp = Z if p = 0. Let X X 0 fij X i Y j , F0 = fij XiY j F = i,j
i,j
P∞ be two formal group laws over R, and let U (τ ) = i=1 ui τ i ∈ τ R[[τ ]] be a homomorphism from F to F 0 . Couveignes proved with an elementary argument in his PhD thesis that the coefficients ui satisfy some simple relations over R. He used these relations to compute the orders of elliptic curves over finite fields of small characteristic (see [2] and [6]). In [1] it is shown that Couveignes’ method is closely related to the modified Schoof algorithm which was developed by Atkins and Elkies; see [5] and its bibliography. In this section we state and prove Couveignes’ theorems. In the next section we prove related results which are used in [1]. Theorem 7.1 Let i be a positive integer which is not a power of p. If p = 0 assume mi is a unit in R for some 1 ≤ m < i. There is a polynomial Ci in several variables with coefficients in Fp such that for each F, F 0 , U as above we have 0 | 1 ≤ j < i, 1 ≤ k + ` ≤ i ). ui = Ci (uj , fk`, fk` Proof. Let A be transcendental and work in the integral domain R[A]. Since U is a homomorphism, U (F (τ, Aτ )) = F 0 (U (τ ), U (Aτ )). By (1.2) there are power series G, G0 ∈ R[[X, Y ]] such that F (X, Y ) = X + Y + XY G(X, Y ) and F 0 (X, Y ) = X + Y + XY G0 (X, Y ). Therefore X uj (τ + Aτ + Aτ 2 G(τ, Aτ ))j = X X uj (Aτ )j + U (τ )U (Aτ )G0 (U (τ ), U (Aτ )). uj τ j + This can be rewritten X 0= uj τ j {(1 + A + Aτ G(τ, Aτ ))j − (1 + Aj )} ∞ ∞ ∞ ∞ X X X X j j 0 j uj+1 τ )( uj+1 (Aτ ) )G ( uj τ , uj (Aτ )j ). − Aτ ( 2
j=0
j=0
j=1
j=1
The coefficient of τ i is of the form ui {(1 + A)i − (1 + Ai )} + Mi , where Mi is a polynomial in A, u1 , u2 , . . . , ui−1 and in some of the coefficients of G, G0. This gives the relation ui {(1 + A)i − (1 + Ai )} − Mi = 0.
Formal Groups, Elliptic Curves, and Some Theorems of Couveignes
499
The hypothesis that i is not a power of p implies (1 + A)i 6= 1 + Ai . If p = 0 choose m such that mi is a unit in R, and if p > 0 let m be a positive integer such that the coefficient of Am is nonzero in the polynomial (1 + A)i − (1 + Ai ). In characteristic p this coefficient is a unit in R because it is a nonzero element of the prime field Fp . Since A is transcendental, the coefficient of Am in our relation must be identically zero. This coefficient gives our desired formula for t u ui in terms of the uj and the coefficients of F and F 0 . The next theorem accounts for the ui when i is a power of p. It was proved by Couveignes for formal group laws associated to ordinary elliptic curves, but his argument generalizes easily to formal group laws of any height. Theorem 7.2 Let i be a power of a prime p and let h > 0. There is aP polynomial variables with coefficients in F such that: if F = fk` X k Y ` Ci in several p P 0 j ` 0 and F = fj` X Y are formal group laws of height h over a domain R of P characteristic p and U = uj τ j : F → F 0 a homomorphism then 0 | j < i, k + ` ≤ qi ) v10 uqi − v1i ui = Ci (uj , fk`, fk`
where q = ph and v1 , v10 are the first nonzero coefficients of the power series [p]F , [p]F 0 , respectively. Proof. By Prop. 3.2 we can write [p]F (τ ) = V ◦ φh (τ ) = V (τ q ), where V (τ ) = P vj τ j is a homomorphism of height zero from F (q) to F 0 . It is easy to show by induction on n that for n > 0 the jth coefficient of [n]F is a polynomial in the fk` with k + ` ≤ j. Since vj is the jqth coefficient of [p]F , vjPis a polynomial in vj0 τ j , and vj0 is a the fk` with k + ` ≤ jq. Similarly [p]F 0 = V 0 ◦ φh , V 0 (τ ) = 0 polynomial in the fk` with k + ` ≤ jq. Since [p]F 0 ◦ U = U ◦ [p]F , V 0 (U (τ )q ) = U V (τ q ) . Let σ = τ q . The left side is ∞ ∞ X X uqj σ j ) + v20 ( uqj σ j )2 + · · · , v10 ( j=1
j=1
and the coefficient of σ i is of the form v10 uqi plus terms involving uj for j < i and vj0 for j ≤ i. The right side is X X X vj σ j ) + u2 ( vj σ j )2 + · · · + ui ( vj σ j )i + · · · . u1 ( j
j
j
This time the coefficient of σ i is of the form ui (v1 )i plus terms involving uj for j < i and vj for j ≤ i. By equating the two sides we get v10 uqi − v1i ui equals a t u polynomial in the uj for 1 ≤ j < i and the vj , vj0 for 1 ≤ j ≤ i.
500
8
Antonia W. Bluher
Further Results Relating to Couveignes’ Theorems
Fix the following notation throughout this section. Let R be an integral domain of characteristic p > 0, F and F 0 formal group laws of height h over R, and q = ph . Let C1 , C2 , . . . denote Couveignes’ relations given in §7 evaluated at the coefficients of F, F 0 but leaving the ui as indeterminates; thus Ci ∈ R[X1 , . . . , Xi ] and Ci = Xi + a certain polynomial in X1 , . . . , Xi−1 if i is not a power of p; Ci = v10 Xiq − v1i Xi + a certain polynomial in X1 , . . . , Xi−1 if i is a power of p. Here the vi and vi0 lie in R, since they are polynomials inPthe coefficients of F ui τ i ∈ Hom(F, F 0 ) and F 0 , respectively. Couveignes’ theorems assert that if then Ci (u1 , . . . , ui ) = 0 for all i. Let K denote the separable algebraic closure of the quotient field of R. Lemma 8.1 There are exactly q n solutions (u1 , . . . , upn −1 ) with ui ∈ K to the first pn − 1 of Couveignes’ relations. Proof. For each solution (w1 , . . . , wi−1) to the first i−1 of Couveignes’ equations over K there are q values or 1 value of wi such that (w1 , . . . , wi ) is a solution to the ith relation, according as i is or is not a power of p. (To see that the q solutions for wi are distinct when i is a power of p, note that the derivative with respect to Xi of Ci is v1i , which is nonzero.) The lemma now follows easily by induction on n. t u Theorem 8.2 If u1 , u2 , . . . is a solution to Couveignes’ relations then Hom(F, F 0 ).
P
ui τ i ∈
Proof. Without loss of generality we can replace R by K. In Chapter III, §2 of [3] it is shown that Hom(F, F 0) is free over Zp of rank h2 and pn Hom(F, F 0 ) is the set of homomorphisms with height ≥ nh. (In fact, it is shown that Hom(F, F 0) is the maximal order of a central division algebra over Qp of rank h2 and invariant 1/h, but we do not need this here.) It follows that a complete set of Zp -module generators U1 , . . . , UP h2 can be found such that the height of each generator is less than h, and if ci Ui has height ≥ nh for some ci ∈ Zp then each ci is divisible by pn . If U, U 0 ∈ Hom(F, F 0) and U ≡ U 0 mod deg q n (meaning that the ith coefficient of U and U 0 coincide for all i ≤ q n ) then 0 = F 0 (U 0 , [−1]F 0 ◦ U 0 ) ≡ F 0 (U, [−1]F 0 ◦ U 0 ) = U F 0 U 0 mod deg q n , P ci Ui ≡ so U F 0 U 0 has height ≥ nh, and it is therefore divisible by pn . Thus P c0i Ui mod deg q n (ci , c0i ∈ Zp ) implies ci ≡ c0i mod pn . This shows that the Pqn −1 number of distinct elements i=1 ui τ i which are truncations of power series in 2 Hom(F, F 0 ) is the cardinality of (Z/pn Z)h , which is q nh . Each truncation gives rise to a solution (u1 , . . . , uqn−1 ) of the first q n −1 of Couveignes’ relations. Since this coincides with the total number of solutions, each solution of Couveignes’ t u relation arises from Hom(F, F 0).
Formal Groups, Elliptic Curves, and Some Theorems of Couveignes
501
Corollary 8.3 If h = 1 and if Hom(F, F 0 ) contains a homomorphism (with coefficients in R) of height k then all the solutions (v1 , v2 , . . .) in K to Couveignes’ relations for which vi = 0 for i < pk actually lie in R. Proof. Let U be the homomorphism of height k and Zp · U = { c · U | c ∈ Zp }. As mentioned in the previous proof, Hom(F, F 0) ∼ = Zp , and it is generated by a homomorphism U0 of height zero. Find a ∈ Zp such that U = a · U0 . Since ht(a · U0 ) = vp (a), vp (a) = k. Thus Zp · U = Zp a · U0 = pk Zp · U0 . Since U is defined over R, so is c · U for each c ∈ Zp . Thus every element of pk Zp · U0 has coefficients in R. The coefficients of such elements are precisely the solutions t u (v1 , v2 , . . .) to Couveignes’ relations which have vi = 0 for all i < pk − 1.
References 1. A. W. Bluher, Relations between certain power sums of elliptic modular forms in characteristic two, to appear in J. Number Theory 2. J. M. Couveignes, Quelques calculs en theorie des nombres, Ph.D. thesis, Bordeaux, 1995 3. A. Frohlich, Formal Groups, Lect. Notes in Math. 74, Springer-Verlag, 1968 4. M. Hazewinkel, Formal Groups and Applications, Academic Press, New York, 1978 5. R. Lercier and F. Morain, Counting the number of points on elliptic curves over Fpn using Couveignes’ algorithm, Research report LIX/RR/95/09, Ecole PolytechniqueLIX, September 1995 6. R. Lercier and F. Morain, Counting the number of points on elliptic curves over finite fields: strategies and performances, Advances in Cryptology – EUROCRYPT ’95 Lect. Notes in Computer Science 921, Springer, 1995, 79-94 7. J. H. Silverman, The Arithmetic of Elliptic Curves, Springer-Verlag, New York, 1986 ´ Norm. Sup. 8. W. C. Waterhouse, Abelian varieties over finite fields, Ann. Scient. Ec. 2 1969, 521-560
A Comparison of Direct and Indirect Methods for Computing Selmer Groups of an Elliptic Curve Z. Djabri1 and N.P. Smart2 1
2
Institute of Maths and Statistics, University of Kent at Canterbury, Canterbury, Kent, CT2 7NF, U.K. [email protected] Hewlett-Packard Laboratories, Filton Road, Stoke Gifford, Bristol, BS12 6QZ, U.K. [email protected]
Abstract. In this paper we examine differences between the two standard methods for computing the 2-Selmer group of an elliptic curve. In particular we focus on practical differences in the timings of the two methods. In addition we discuss how to proceed if one fails to determine the rank of the curve from the 2-Selmer group. Finally we mention briefly ongoing research into generalizing such methods to the case of computing the 3-Selmer group.
Computing the 2-Selmer group is a basic problem in the computational theory of elliptic curves over the rationals. It is, assuming the Tate-Shaferevich group, X, has no 2-primary part, the most efficient way known of computing the rank and generators of the Mordell-Weil group. That we do not have an algorithm to compute the Mordell-Weil group in general is one of the major open problems in the theory of elliptic curves. The computation of the Mordell-Weil group is basic to many Diophantine problems such as computing the set of integral points on a curve via elliptic logarithms, [12], [23], [22], or verifying the Birch-Swinnerton-Dyer conjecture, [2], [3]. Throughout this paper, by an elliptic curve we shall mean a curve of the form E : Y 2 = X 3 − 3IX + J
(1)
where I, J ∈ ZZ. We let ∆ = 4I 3 − J 2 denote the discriminant of the curve. There are currently two methods used to compute the 2-Selmer group, S2 . The first method, which is essentially part of the standard proof of the MordellWeil theorem, uses number field arithmetic. This method works directly with the Selmer group and can therefore make explicit use of the underlying group structure of the elements. The second method, due to Birch and SwinnertonDyer, [2], computes S2 in an indirect way by computing a set of binary quartic forms which indirectly represent the elements of S2 . Although the method still has access to the group structure, it comes in an indirect way, through the invariant theory of the quartic forms. The indirect method of Birch and Swinnerton-Dyer has recently undergone major improvement due to the work of Cremona, see [7] and [8]. The methods J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 502–513, 1998. c Springer-Verlag Berlin Heidelberg 1998
Selmer Groups of an Elliptic Curve
503
p estimated complexity is O( |∆|), but is known to be very fast in practice. Cremona has implemented this method in his program, mwrank, which is now widely used. On the other hand the direct method can be shown to have conjectured subexponential complexity in |∆|, see [21]. This sub-exponential behaviour is due to the conjectured sub-exponential complexity of determining the basic invariants of cubic number fields, such as generators of the unit and class groups. We decided to compare the practical behaviour of the two methods. This paper describes our findings. For the indirect method we used the code in mwrank which we used in a way so that it only output S2 and did not try to determine which quartics had small rational points. It turned out that this modification made very little difference in practice. The version of mwrank used was one of March 1998, this version had many improvements incorporated from [8] which were not available six months previously. This reduced the running times for the indirect method considerably from those in our initial experiments. After comparing the methods to compute S2 we look at how one can overcome the obstruction to computing E(Q)/2E(Q). The first way is by performing further descents on the elements in S2 . We shall see that the indirect method of computing S2 is more suited to performing these second descents. Finally we report on problems that one encounters when trying to generalize the direct and indirect methods to compute the 3-Selmer group of an elliptic curve, S3 . If such a method could be made practical this would allow the computation of the Mordell-Weil group when there exists no 3-torsion in X. This clearly would be of importance when there are elements of order 4 in X, as then the methods for constructing S2 and performing further descents become less useful. The authors would like to thank J. Merriman, P. Swinnerton-Dyer and E. Schaefer for useful conversations and communications during which the work in this paper was carried out. In particular the authors express their gratitude to J. Cremona for making available his latest improvements to mwrank and the associated preprints which explain the changes and enhancements.
1
The Direct Method
If F (X) = X 3 − 3IX + J is reducible then the curve has a point of order two so decent via two-isogeny should always be the preferred method. We therefore assume that F (X) is irreducible. We shall quickly recap on the direct method, so as to explain our implementation in more detail. Let θ denote a root of F (X) and set K = Q(θ). Let S denote the set of places of K which either divide 2∆ or are infinite. We define K(S, 2) to be the set of elements of K, modulo squares, which give an unramified extension away from S on addition of their square root to K. Using the LiDIA, [15], and PARI, [1], libraries we wrote a C++ program to compute a set of generators for the group K(S, 2) in any given example. This
504
Z. Djabri and N.P. Smart
is the “hard” part of the direct method which has conjectured sub-exponential complexity. The method used was the one described in [21]. We then restrict our attention to the subgroup, H, of K(S, 2) which is the kernel of the norm map: NK/Q : K(S, 2) → Q∗ /Q∗2 . Clearly determining the generators of H is simply an application of linear algebra over IF2 . Then for every element α ∈ H we need to determine whether there exists X, Z ∈ Q such that we can find a β ∈ K with X − θZ 2 = αβ 2 . Using a standard method, see [6][Page 70], this reduces the problem to determining a simultaneous solution to a system of quadratic forms Q1 (x1 , x2, x3 ) = 0 ,
Q2 (x1 , x2 , x3 ) = −x24 ,
where Q1 and Q2 are quadratic forms in three variables. The 2-Selmer group is those set of α’s which give rise to a pair as above which have a solution everywhere locally. The first test is whether Q1 = 0 has a solution everywhere locally. If it does we can find a global solution, by the Hasse Principle for curves of genus zero, and use this to express the general solution, (x1 , x2, x3 ), as three quadratic forms in two variables. Substituting these into the second equation gives a “quartic” of the form x24 = G(m, n) where G(m, n) is a binary quartic form. We can then test whether this equation is locally soluble everywhere using the random polynomial time method described in [16]. This last method only works for p 6= 2 but for small p, in particular p = 2, we can use the standard method which is explained in [9]. If one naively carries out this method the two forms, Q1 and Q2 , we obtain can have rather large coefficients. The global solution to Q1 = 0 can be hard to determine as the standard solution method, due to Lagrange, requires square root extraction modulo composite moduli. The quartic form, G(m, n), will in general also have prohibitively large coefficients. It should be noted that the indirect method not does suffer from this problem as it computes “reduced” quartic forms. To get around these problems we note that we need only check locally solubility at each stage for primes dividing 2∆ and infinity. We could therefore carry the above computations out for each prime in turn and not work globally. Suppose we wish to test α for the prime p, it would be advantageous if we could decide what level of p-adic precision we would need before starting any computation. To see how to do this write the two quadratic forms as Q1 (x) = xt Ax = 0 ,
− Q2 (x) = xt Bx = x24 ,
where A and B are symmetric integer matrices. Then by a unimodular change of variable we can diagonalize A. The matrices of the new equivalent quadratic forms we shall by abuse of notation also refer to as A and B. We let ∂(A, B) denote the discriminant of det(XA − B).
Selmer Groups of an Elliptic Curve
505
Lemma 1. There is an algorithm to detect the local solubility of the pair of quadratic forms at an odd prime p which runs in random polynomial time and which requires working to a p-adic accuracy of pe where e = ordp (212 ∂(A, B) det(A)2 ). Proof. We first check whether xt Ax = 0 has a solution modulo p. If it does then we find a p-adic solution (x1 , x2 , x3 ) ≡ (χ1 , χ2 , χ3 ) (mod pe ) such that, after a possible reordering of the variables, we have χi ∈ ZZ and χ1 6≡ 0 (mod p). This last step can be done in polynomial time using Hensel’s Lemma. As A = diag(a1 , a2 , a3 ) we set a2 m2 + a3 n2 , r=− 2a2 mχ2 + 2a3 nχ3 for two new variables m and n. Then all solutions to Q1 ≡ 0 (mod pe ) are parameterized by m and n where x1 = rχ1 ,
x2 = rχ2 + m ,
x3 = rχ3 + n.
Substituting these into x24 = xt Bx and clearing the denominator of 4(a2 mχ2 + a3 nχ3 )2 we obtain a binary quartic form, G(m, n) (mod pe ). The discriminant of G(m, n) is equal to Λ = 212 ∂(A, B)a21 a22 a23 χ61 , as can be verified by a computer algebra system. To check the local solubility of x24 = G(m, n) at p we need only have computed G(m, n) to an accuracy of at most pordp (Λ) = pe . That we can determine the local solubility of x24 = G(m, n), when p is odd, in random polynomial time follows from Section 7 of [16], as has already been mentioned. A similar method to the one above can be applied when one wishes to check the pair of forms for local solubility over IR or over Q2 . Care of course needs to be taken that one works to a sufficient number of decimal or 2-adic digits. One does not actually have to perform the above for all values of α ∈ H, which we recall was the kernel of the norm map from K(S, 2) to Q∗ /Q∗ 2 . This is because we can make use of the underlying group structure. We adopted the method in [21] for this purpose which greatly sped up the overall computation. The entire method was programmed in C++ using the LiDIA library and using the program for K(S, 2) which we mentioned previously.
2
The Indirect Method
The indirect method proceeds by computing the binary quartic forms directly. This is done by application of what is essentially 19th century invariant theory. This idea is due to Birch and Swinnerton-Dyer, [2], and in recent years the method has been greatly improved and simplified by work of Cremona, [7] and [8]. We note that the map, ξ, from the curve D : z 2 = G(m, 1) to the elliptic curve, E, in the “descent diagram”
506
Z. Djabri and N.P. Smart
[2] E 6
ξ
-E
D is given by the covariant syzygy of the binary quartic G. This map, ξ, can be used to map rational points on D to representatives of the cosets of E(Q) in E(Q). Classical invariant theory, see [11] or [13], tells us that G(m, n) has two fundamental invariants denoted by I and J. These are the values of I and J used to define our elliptic curve in equation (1). By appealing to [2][Lemmata 3, 4 and 5] we can assume that G(m, n) has integral coefficients at the expense of increasing the possible values of (I, J) from a single pair to a couple of pairs. It is then possible to construct all the possible binary quartic forms, G(m, n), up to equivalence, using a standard reduction theory, [14]. This reduction theory gives a finite search region for the coefficients of the binary quartics. The size of the search region appears to depend heavily on the size of diam(θ) = max |θ(i) − θ(j) |, which gives rise to the expected√exponential behaviour of the complexity of the method, since diam(θ) ≤ ∆/δ(θ)2 where δ(θ) = min |θ(i) − θ(j) |. As this method is explained well elsewhere we shall be content with just giving the following references, [2], [9], [7] and [8].
3
Numerical Results
We ran both mwrank and our own program for the direct method on a list of over 2900 curves, made up of a subset of the list of curves of conductor less than one thousand plus some randomly chosen curves with large coefficients or ranks of the order of 4 and 5. The results we summarize in the table below. The indirect method in the range of ∆ we considered usually computed the Selmer group in a much shorter time than that taken by the direct method. However when log |∆| > 30 we noticed a significant improvement in using the direct method. However there was some variation in the running times for the indirect method, but not as much as had been experienced before the incorporation in mwrank of the improvements explained in [8]. The direct method on the other hand exhibited remarkably small variation in running times. Clearly, the direct method is as fast as the underlying programs one is using to compute field invariants, in our case PARI. We divided the curves up into twelve groups ranked according to the size of |∆|. In Table 1 we give the average and worst case running times for each of these twelve groupings. The timings are given in seconds and represent processor time and not user time, on a HP-UX 9000/780.
Selmer Groups of an Elliptic Curve
507
Table 1. Comparison of the Direct and Indirect Methods Sample Indirect Method log |∆| Size Mean Worst 11-14 39 0.12 0.3 14-17 180 0.13 0.47 17-20 410 0.14 3.94 20-23 1445 0.14 1.2 23-26 405 0.26 4.56 26-29 269 0.66 7.7 29-32 77 9.06 171.63 32-35 47 28.57 118.58 35-38 23 94.05 393.24 38-41 13 592.93 2284.09 41-44 4 5025.08 6486.64 44-47 4 17428 32236.6
4
Direct Mean 14.33 14.16 13.96 15.48 16.48 18.97 18.71 24.86 25.19 16.83 19.31 19.04
Method Worst 16.45 18.92 37.31 102.15 111.06 152.16 57.81 215.52 170.33 19.44 29.07 21.14
Further Descents and the Mordell-Weil Group
We now discuss how one can perform further descents with both the direct and the indirect methods and in addition determine elements in the Mordell-Weil group and not just S2 . As will be seen the indirect method for computing S2 is more preferable than the direct method if further descents are required. We now let r denote the rank of the Mordell-Weil group of E. The above two methods as described compute S2 which, assuming there is no two torsion on E, has order 2r . If one can find r independent rational points on E then we know that the rank is r. We hence have a sublattice of finite index which we can pass to a procedure for enlargement to the whole of E(Q), see [20] for such a procedure. If we cannot find enough points we could be in one of two positions: 1. The points exist but the smallest ones are far too large to be spotted by a simple search procedure. 2. There exists a non-trivial element in X of order two. We would clearly like to be able to cope with both problems. In the first situation it is better to work with the representation that the indirect method gives us for elements of S2 , namely the global quartics; z 2 = G(m, 1) = am4 + bm3 + cm2 + dm + e. In the second situation it is better to represent elements in S2 by means of an algebraic integer α ∈ K(S, 2). If we let p = 3b2 − 8ac denote the seminvariant of G(m, n) of degree two and weight two then we can pass from the representation used in the indirect method to that used in the direct method using the formula α=
4aθ + p , 3
508
Z. Djabri and N.P. Smart
see [7]. To go the other way is a little more tricky, which is a major drawback of the direct method as we shall now see. In the indirect method we can, given an element of S2 D : z 2 = G(m, 1), determine whether it has any small solutions and then map these to E using the covariant syzygy. Any element of E(Q) we find in this way we would expect to have much larger height than the corresponding point on D. Hence this is a way of finding points of large height on the curve. As the image of a point on each D ∈ S2 gives a representative of a coset of 2E(Q) in E(Q), this hopefully allows us to determine r independent points on E. Now suppose that we cannot find any small solutions on the curve D. We first test whether the curve D could arise as an image of an element of the 4-Selmer group, S4 , if not then D must be an element of order two in X. To test whether D is an image of an element in S4 we map the curve, D, to the corresponding element α ∈ K(S, 2) and then use the method of [4], which makes use of the Cassels-Tate pairing on X. Finally, if D does arise as an image of an element of S4 , we can apply the method of [16] to perform a further descent on the curve D and actually determine the element of S4 . Searching for points on this further descent gives us a way of finding points on D of large height, which in turn gives us a way of finding points on E of very large height. In the direct method such a variety of techniques are not available to us. Given α ∈ K(S, 2) we cannot determine a corresponding curve D with ease, which is why we only worked locally in the algorithm of Section 1. We only have available the Cassels-Tate pairing to detect whether α corresponds to the image of an element of S4 . In practice this may be all that is required but we do fail to obtain a method of searching for points of large height.
5
The 3-Selmer Group
Clearly the problem that remains is; what should we do when we have a curve which has an element of 4-torsion in X ? If we could construct S3 we would have at least solved our problem in the cases where X has no elements of order three. However there is a problem, although the methods to construct the mSelmer group in the standard literature are constructive they are not practical methods. For example they often involve determining the m-Selmer group of the curve over a large degree number field and then constructing the m-Selmer group over Q using Galois theory. We shall assume that our curve does not possess a 3-isogeny. If a 3-isogeny does exist one can attempt to determine the rank of the curve using descent via 3-isogeny. This has been explained in the literature in many places, see [25] for a very accessible account. Firstly we look at the generalization of the indirect method. As was noted by Swinnerton-Dyer, [5][Page 269], an element of S3 can be represented as a ternary
Selmer Groups of an Elliptic Curve
509
cubic form C:
a300x3 + 3a210 x2 y + 3a201x2 z + 3a120 xy2 + 3a102xz 2 +6a111xyz + a030y3 + 3a021y2 z + 3a012 yz 2 + a003 z 3 = 0
and the map, ξ, in the “descent diagram” [3] E 6
ξ
-E
C is given by the covariant syzygy, [17][Page 203], of the ternary form, C, just as it was in the indirect method for computing S2 . Hence if we can determine all possible curves C up to the necessary equivalence then we can test them for local solubility and determine S3 . In addition, using the covariant syzygy, we can map any rational points on C to representatives of cosets of 3E(Q) in E(Q). The curve C has two classical fundamental invariants, usually denoted S and T , these play the exactly the same role as the invariants I and J before. The curve is non-degenerate if T 2 + 64S 3 6= 0. Using the covariant syzygy one can relate the pair (S, T ) to the pair (I, J) which define the elliptic curve (1). Swinnerton-Dyer has pointed out to us, [24], that if the curve C has integral coefficients, by which we mean the aijk above are integral, then one can construct representatives of equivalence classes of all such curves with given invariants. A simplification of the method of Swinnerton-Dyer is given in the following result; Theorem 1. Let S and T be two given integers such that T 2 + 64S 3 6= 0. There is an algorithm which computes a complete set of representatives from the GL3 (ZZ)-equivalence classes of ternary cubic forms with integral coefficients and invariants given by S and T . Proof. We first determine a finite set of SL3 (IR)-equivalence classes. Let F (x, y, z) be a ternary cubic form with invariants S and T . By [10] there is a real unimodular transformation which sends F to the form G = α(X 3 + Y 3 + Z 3 ) + 6βXY Z, where α, β ∈ IR. The invariants of G, and hence of F , are given by S = α3 β − β 4 ,
T = 8β 6 − α6 + 20α3 β 3 .
Hence, by solving these two equations for α and β, we can determine a finite set of possible pairs (α, β) ∈ IR2 . Now X x Y = Ay Z z
510
Z. Djabri and N.P. Smart
where A = (λ1 , λ2 , λ3 ) ∈ SL3 (IR). We then apply a GL3 (ZZ) transformation to F (x, y, z) to obtain a form, which we also denote by F (x, y, z), for which the columns of A form a Minkowski reduced basis. Hence |λ1 | ≤ |λ2 | ≤ |λ3 | and |λ1 λ2 λ3 | ≤ 2. Now if F (x, y, z) =
X i1 +i2 +i3 =3
3! ai i i x i 1 y i 2 z i 3 , i1 !i2 !i3 ! 1 2 3
on setting c1 = |3|α| + 6|β||, it is easy to see that |aijk | ≤ c1 |λ1 |i |λ2 |j |λ3 |k ,
(2)
−1/3
then we obtain |a300|, |a210|, |a120|, |a030| < 1. As these Suppose |λ2 | < c1 are all integers we have a300 = a210 = a120 = a030 = 0, which implies that −1/3 T 2 + 64S 3 = 0, a contradiction. So we can assume that |λ2 | ≥ c1 . −4/3 Now suppose that |λ1 | < c1 /2. Then as |λ1 |2 |λ3 | ≤ 2||λ1 |/|λ2 | < c−1 1 we have that a300 = a210 = a201 = 0 which again implies that T 2 + 64S 3 = 0. So −4/3 we must have |λ1 | ≥ c1 /2. All that remains is to bound the values of aijk which we can now do us−4/3 −1/3 5/3 /2, |λ2 | ≥ c1 and |λ3 | ≤ 4c1 and ing the three inequalities |λ1 | ≥ c1 inequality (2). We obtain |a300|, |a210|, |a120|, |a201|, |a111| ≤ 2c1 , |a102 | ≤ 8c31 , |a030|, |a021|, |a012 ≤ 16c41 and |a003| ≤ 64c61 . So to find all forms F (x, y, z) up to GL3 (ZZ )-equivalence we loop through all coefficients which are bounded by the inequalities above and determine which forms have invariants given by S and T . Such a set will contain a representative from each GL3 (ZZ)-equivalence class. To determine a unique representative from each class we need to determine which forms in the list are GL3 (ZZ )-equivalent. But this is just a matter of solving a set of eleven non-linear equations in nine integer unknowns. However there is a problem; before we can apply this result we need to reduce to the consideration of forms with integral coefficients. As mentioned earlier this was done in the case of computing S2 by applying to Lemmata 3, 4 and 5 of [2]. The standard method for doing this for binary quartic forms, which is explained in detail in [19], appears to suffer from combinatorial explosion when applied to ternary cubic forms. We have therefore been unable to fully work out the details of how this can be done for S3 . We now turn our attention to the direct method. We apply the procedure which is explained in [18]. Let L denote the algebra L = Q[σ, τ ]/(f(σ), g(σ, τ )) ∼ = Q[τ ]/(h(τ )) where g(σ, τ ) = τ 2 − σ 3 − Aσ − B, f(σ) = σ 4 + 2Aσ 2 + 4Bσ − A2 /3 , 8 6 3 2 4 h(τ ) = τ + 8Bτ + (8A /3 + 18B )τ − 16A6 /27 − 8B 2 A3 − 27B 4 ,
Selmer Groups of an Elliptic Curve
511
with A = −3I and B = J. Then (σ, τ ) represents a generic point of order 3 on our Paelliptic curve. The algebra L decomposes into a sum of number fields, L = i=1 Ki , and as we are assuming that E possesses no rational 3-isogeny we have a = 1 or 2. Every element of E(Q) can be represented by a rational divisor class of degree zero, n n X X Pi − Qi , i=1
i=1
where Pi , Qi ∈ E(Q) are not points of order 3. Let Si denote the set of primes of Ki lying above 3, ∞ and the primes of bad reduction of the curve E. If we then let G denote the group Ker{NL/Q :
a X
Ki (Si , 3) → Q∗ /Q∗3 }
i=1
then there is an injective group homomorphism given by →Q G E(Q)/3E(Q) Pn φ : Pn n P − Q → Θ(P i )/Θ(Qi ) i=1 i i=1 i i=1 where
Θ(x, y) = 2τ y − 2τ 2 + (3σ 2 + A)(σ − x) (mod L∗3 ).
Let βp denote the natural map from L∗ /L∗ 3 to L∗p /L∗p 3 , and let Ep denote E(Qp )/3E(Qp ). Lemma 2. The image of the 3-Selmer group in L∗ /L∗ 3 is contained in the intersection of βp−1 (φp (Ep )) over all primes p of ZZ. Proof. This follows from the definition of the 3-Selmer group of an elliptic curve. Using a minor adaption of the program mentioned earlier we can compute Ki (S, 3). However to carry out the procedure in [18], it appears that we also need the equivalent local maps, φp , to be injections. A little group cohomology reveals that this means that for all primes, p, the galois group Gp = Gal(Qp (E[3]), Qp ) must not be equal to either the cyclic or symmetric group on three elements. Unluckily such primes can occur with positive density. However all is not lost. We proceed in the standard way. For a prime p we determine a subgroup of G whose image modulo p is contained in the image of φp. This subgroup will contain the Selmer group, we then take this subgroup as the new G and repeat the process with another prime. So we keep decreasing the size of the group G, and always obtain a group containing the 3-Selmer group. If all the local maps, φp , were injections then this process would terminate with G = S3 as soon as we had used all the primes in S. However as the local maps φp may not be injections for a positive density of the primes we can obtain S3 < G. We have found that by looking at the image of φp for some primes p 6∈ S we can reduce the group G so that eventually we obtain S3 = G. It is an open
512
Z. Djabri and N.P. Smart
question as to whether this can always be achieved after using a finite number of “non-injective good primes”. We end this discussion with an example, in which we determine the rank of E : y2 = x3 − 456219x − 118606410 via 3-descent. This is a curve such that 4|X(E/Q). The set S of bad primes of E is S = {2, 3, 11}. The algebra (in fact a field), L, is obtained by adjoining a root β of x8 − 4x7 + 7x6 − 7x5 + 4x4 − x3 + 8x2 − 8x + 2 to Q. By applying our algorithm, we found that L(S, 3) had a basis consisting of 12 elements. From these, a simple application of linear algebra allowed us to compute a basis of 9 elements for the subgroup of L(S, 3) whose elements are in 3 the kernel of the map from L(S, 3) to Q∗ /Q∗ . Of the 9 basis elements computed, exactly one maps down to φp (Ep ) for every bad prime p. Since 3 and 11 are primes such that φp is not injective we still have to decide whether this element genuinely represents an element of the 3-Selmer group. A simple check using the good prime 5 reveals that this element does not map down to an element of φ5 (E5 ). Hence, we can deduce that the 3-Selmer group is trivial, and that the rank is zero. The example chosen is a trivial one, since mwrank can compute the rank in around 20 seconds after performing a second descent. However the example should convince the reader that using 3-descent is possible.
6
Summary
We have shown that for the case of computing the 2-Selmer group that the indirect method of Birch and Swinnerton-Dyer appears to be more suitable for curves with moderately sized discriminants. However as the discriminant increases the direct method becomes more applicable, as one would expect from the complexity estimates. On the other hand our early investigation of the case of computing the 3Selmer group seems to point in the direction that the direct method via number fields is to be the preferred one, even for curves of small discriminant. In a future paper we will explain, with full proofs, the justification for our method of computing the 3-Selmer group.
References 1. C. Batut, D. Bernardi, H. Cohen, and M. Olivier. GP/PARI version 1.39.03. Universit´ e Bordeaux I, 1994. 2. B.J. Birch and H.P.F. Swinnerton-Dyer. Notes on elliptic curves. I. J. Reine Angew. Math., 212:7–25, 1963. 3. B.J. Birch and H.P.F. Swinnerton-Dyer. Notes on elliptic curves. II. J. Reine Angew. Math., 218:79–108, 1965. 4. J.W.S. Cassels. Second descents for elliptic curves. J. Reine Angew. Math., 494:101–127, 1998.
Selmer Groups of an Elliptic Curve
513
5. J.W.S. Cassels. Diophantine equations with special reference to elliptic curves. J. of LMS, 41:193–291, 1966. 6. J.W.S. Cassels. Lectures on Elliptic Curves. LMS Student Texts, Cambridge University Press, 1991. 7. J.E. Cremona. Classical invariants and 2-descent on elliptic curves. Preprint 1997. 8. J.E. Cremona. Reduction of cubic and quartic polynomials. Preprint 1998. 9. J.E. Cremona. Algorithms for Modular Elliptic Curves. Cambridge University Press, 1992. 10. H. Davenport. On the minimum of a ternary cubic form. J. London Math. Soc., 19:13–18, 1944. 11. E.B. Elliott. An Introduction to the Algebra of Quantics. Oxford University Press, 1895. 12. J. Gebel, A. Peth˝o, and H.G. Zimmer. Computing integral points on elliptic curves. Acta. Arith., 68:171–192, 1994. 13. D. Hilbert. Theory of Algebraic Invariants. Cambridge University Press, 1993. ´ 14. G. Julia. Etude sur les formes binaires non quadratiques. Mem. Acad. Sci. l’Inst. France, 55:1–293, 1917. 15. LiDIA Group. LiDIA v1.3 - a library for computational number theory. TH Darmstadt, 1997. 16. J.R. Merriman, S. Siksek, and N.P. Smart. Explicit 4-descents on an elliptic curve. Acta. Arith., 77:385–404, 1996. 17. G. Salmon. Higher Plane Curves. Hodges, Foster and Figgis, 1879. 18. E.F. Schaefer. Computing a Selmer group of a Jacobian using functions on the curve. Preprint. 19. P. Serf. The rank of elliptic curves over real quadratic number fields of class number 1, Phd Thesis, Universit¨ at des Saarlandes, 1995. 20. S. Siksek. Infinite descent on elliptic curves. Rocky Mountain Journal of Maths, 25:1501–1538, 1995. 21. S. Siksek and N.P. Smart. On the complexity of computing the 2-Selmer group of an elliptic curve. Glasgow Math. Journal., 39:251–258, 1997. 22. N.P. Smart. S-integral points on elliptic curves. Proc. Camb. Phil. Soc., 116:391– 399, 1994. 23. R.J. Stroeker and N. Tzanakis. Solving elliptic diophantine equations by estimating linear forms in elliptic logarithms. Acta. Arith., 67:177–196, 1994. 24. P. Swinnerton-Dyer. Private communication. 1996. 25. J. Top. Descent by 3-isogeny and the 3-rank of quadratic fields. In F.Q. Gouvea and N. Yui, editors, Advances in Number Theory, pages 303–317. Clarendon Press, Oxford, 1993.
An Algorithm for Approximate Counting of Points on Algebraic Sets over Finite Fields Ming-Deh Huang and Yiu-Chung Wong Computer Science Department University of Southern California Los Angeles, CA 90089 huang@@cs.usc.edu ycwong@@cs.usc.edu
Abstract. We present a randomized algorithm that takes as input a prime number p, and an algebraic set (represented by a system of polynomials) over the finite field p , and counts approximately the number of p -rational points in the set. For a fixed number of variables, the algorithm runs in random polynomial time with parallel complexity polylogarithmic in the input parameters (number of input polynomials, their maximum degree, and the prime p), using a polynomial number of processors. However, the degree of the polynomial bound on the running time grows sharply with the number of variables. A combinatorial analysis of the algorithm also shows that, when p is sufficiently large, a good approximate count is represented by N pD , where D is the highest possible dimension of an ¯p -irreducible subvariety of the input defined over p , and N is the number of such distinct subvarieties. In addition, the algorithm computes these two numbers efficiently. It is also applied to obtain an asymptotic lower bound counting result in the case when an algebraic set defined over is reduced mod p, where p goes to infinity.
1
Introduction
In this paper we study the counting version of the following problem: Polynomial Congruences Problem: Given a prime number p, and a set of m integer polynomials f1 , . . . , fm in x1 , . . . , xn of total degree bounded by d, solve the following system of congruences: f1 (x1 , x2 , . . . , xn ) ≡ 0 (mod p) .. . fm (x1 , x2 , . . . , xn ) ≡ 0 (mod p).
(1)
Equivalently, the polynomial congruences problem can be considered as an arithmetic problem of solving for p -rational points of an algebraic set defined by a system of equations f1 , . . . , fm ∈ p [x1 , . . . , xn ]. J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 514–527, 1998. c Springer-Verlag Berlin Heidelberg 1998
Approximate Counting of Points
515
We provide in [HW96] an algorithm to solve the decision version, that is, to decide the non-emptiness of p -rational points of the set. In this paper we extend our investigation to the counting version, namely, that of counting the number of p -rational points of an algebraic set. An algorithm that solves the problem in an approximate sense (to be explained below) will be presented. The problem of counting points on geometric objects over a finite field has appeared in many applications. For example, in the area of primality testing, the algorithm of Adleman and Huang [AH92] needs to count the number of points on the Jacobian of a hyperelliptic curve over p , while Goldwasser and Kilian’s method [GK86] requires a similar count on an elliptic curve. In the design of error-correcting codes of Goppa, it is desirable to find curves with large number of rational points over some finite field [vLvdG88]. An efficient algorithm to count (even approximately) this number for a given curve is thus essential. Also, in the special case when p = 2, the problem is interesting from the viewpoint of circuit complexity [KL93]. In this context a polynomial defined over 2 can be viewed as defining a special case of constant depth circuits. The counting problem is clearly #P -complete. The related problem of approximating the number of satisfiable truth assignments for a disjunctive normal form can be solved in subexponential deterministic time [NW88]. The more general use of low degree polynomials in the study of small-depth circuits is surveyed in [Bei93]. The mathematical quest for counting zeroes of polynomial congruences can be traced back to Gauss, when he studied laws of reciprocity. It was followed by the works of Davenport, Mordell, Hasse, Weil, and others. This line of work finally culminated in Deligne’s celebrated result on the Riemann Hypothesis over finite fields (also known as Weil’s Conjecture). It implies, among other things, an explicit bound on the number of rational points of a smooth absolutely irreducible hypersurface over a finite field. The bound is governed by a geometric invariant of the hypersurface. Various computational techniques have been devised in recent years to study the complexity-theoretic issues related to the problem in some special cases. In the bivariate case of curves, for example, von zur Gathen, Karpinski and Shparlinski [vzGKS93] propose a “strip counting” method to estimate its size over a finite field. They also show that the counting problem for sparsely represented curves over finite fields is #P-complete, under an unproven number-theoretic assumption. In a subsequent paper [vzGS95], von zur Gathen and Shparlinski give an algorithm to find all rational points deterministically, in amortized polynomial time per point, over a prime field, when the curve is non-exceptional. Huang and Ierardi [HI93] provide a probabilistic algorithm that computes the number of points on a projective plane curve of degree d that are rational over O(1) p , with running time (log p)d . In the case of counting zeroes of a single multivariate polynomial, Karpinski and Luby [KL93] give the first polynomial time approximation algorithm over the field 2 . Grigoriev and Karpinski [GK91] give a Monte Carlo algorithm that approximates the count for a single polynomial over arbitrary finite fields, and it works in polynomial time when the finite field is fixed and the polynomial is given in sparse representation.
516
Ming-Deh Huang and Yiu-Chung Wong
In this paper, we tackle the counting version of the polynomial congruences problem in the general setting as stated at the beginning of this section. Our main contribution is a (randomized) algorithm that computes a good approximation to the counting. It runs in polynomial time when the number of variables n is fixed (but p can vary). In fact, its complexity is doubly exponential only in n but polynomial in the size of all other input parameters. Hence the algorithm is theoretically efficient in those cases with small n and very large p. More precisely, we prove: Theorem 1. There exists an algorithm that, given a prime p, and non-zero polynomials f1 , . . . , fm ∈ p [x1 , . . . , xn ] of total degree bounded by d, approximately counts the number #V of p -rational points in V(f1 , . . . , fm ) in the following sense. It calculates two non-negative integers N and D with D ≤ n − 1, where D is the highest possible dimension of an ¯p -irreducible subvariety of V(f1 , . . . , fm ) defined over p , and N is the number of such distinct subvarieties. Then N pD is an approximation to the count #V with an error of #V − N pD ≤ dc nn pD−1/2 for some computable constant c . The algorithm can be run in randomized sequential time O(dcn (m log p)O(1) ), where cn = nO(n) , and parallel time polynomial in cn , log d, log m and log p using O(dcn (m log p)O(1) ) processors. The interpretation of this result is as follows. We treat the input system as defining an algebraic set and consider each of its p -irreducible components. The Riemann hypothesis over finite fields allows us to bound its number of p -rational points if it is an absolute variety (with some technical condition as well). In general, however, it is not the case. But one can show that every p -rational point on it must fall inside some ¯p -irreducible subvariety of the component which is defined over p . On these subvarieties, we can apply Schmidt’s result [Sch74], which is in the same spirit as the Riemann Hypothesis over finite fields as far as counting points is concerned, but it gets rid of some extra technical condition. Roughly speaking, we can use Schmidt’s result to show that the number of p rational points of an absolute p -subvariety of dimension D is about pD , when p is sufficiently large. Therefore our focus is to find these absolute p -subvarieties of each component. Now our analysis says that the count of such subvarieties of highest possible dimension will dominate the total count. Therefore we only need to find out the number (N ) of distinct subvarieties of this kind and their dimension (D). In other words, with respect to approximate counting of p rational points, the information is represented by these two numbers. For any field K, and any set G of polynomials with coefficients in K, denote the zero set defined by G over K by VK (G). For an algebraic set V (G) defined by a system of integer polynomials G, its reduction modulo p, for any prime p, is defined as the algebraic set Vp (G(p) ), where G(p) is the reduction of G modulo p. To count the p -rational points of the latter, one can set up a corresponding polynomial congruences problem, and apply the algorithm in Theorem 1 for an
Approximate Counting of Points
517
approximate count. The answer is encoded by a pair of integers (Np , Dp ) as described in the theorem. An interesting question is: “How does the corresponding count behave when p varies?” Based on Theorem 1, we show that there is an invariant pair of non-trivial integers (N, D) depending only on G, such that for all sufficiently large primes p, the algebraic set defined by G over p contains at least N pD (1 + o(1)) p -rational points. This paper is organized as follows. In section 2, we describe the modification of the algorithm in [HW96] to achieve the counting procedure of Theorem 1. Section 3 lists some technical lemmas, which are applied in Section 4 to give an analysis to justify our main result. Section 5 deals with the lower bound result for the number of rational points in the reduction of a -algebraic set modulo primes.
2
Overview
We begin with a brief review of our decision algorithm for the polynomial congruences problem reported in [HW96]. It relies on a decomposition algorithm that finds all the p -irreducible components of any dimension of an algebraic set defined by polynomials over p . Each component is output as an p -irreducible hypersurface birational to the component. The birational map in both directions is also computed. Equipped with this tool, the decision algorithm works as follows. Recall that the input is a set of polynomials f1 , . . . , fm ∈ p [x1 , . . . , xn ]. The algorithm first decomposes the set V(f1 , . . . , fm ) into p -irreducible components such that each of them is represented by a birational p -irreducible hypersurface. For each component C, this is a hypersurface H such that, except for a closed subset, every point on it is in 1-1 correspondence with a point in C. The exceptional subset is defined by the intersection of H and another hypersurface E that depends on the birational map. If H happens to be absolutely irreducible, then one can use Schmidt’s result to show that C contains (many) p -rational points, when p is sufficiently large (which is the case we are interested in), and the algorithm stops here. If it is not absolutely irreducible, and let h be the defining polynomial for H, then it can be shown that the p -rational points on H must also lie in the hypersurface defined by the first derivative of h with respect to any variable. Thus the algorithm will construct a polynomial h , where h is a first derivative of h with respect to some variable such that h is not identically zero. The pullback h∗ of h is added to the original system {f1 = · · · = fm = 0}. Then we look for p -rational points in the intersection of C and V(h∗ ), which can be shown to have dimension one less than that of C. All components of this dimension of the new system are extracted and the algorithm continues by recursion on those, among these components, that are subvarieties of C. This can be done by checking with generic points, which are made available by the birational maps. It is noted that the dimension of each subvariety is dropped by one. In the degenerate case when H is of dimension 0, it either corresponds to a single point in p or to a set of conjugate points defined in an algebraic extension of p (but not in p ). In the former the algorithm will
518
Ming-Deh Huang and Yiu-Chung Wong
check whether it corresponds to a genuine p -rational point of V(f1 , . . . , fm ) through the birational map. If it does, the algorithm is done. If this test fails, or if it is the latter case (when H is a set of conjugate points) the current recursion path terminates and the recursive search will continue on other remaining components. Conceptually the execution of the decision algorithm corresponds to constructing the following “decomposition tree”. The root represents the input V(f1 , . . . , fm ), the irreducible components of which form the immediate children of the root. Each non-root node i is associated with an p -irreducible algebraic set Ci and a p -hypersurface Hi birational to it. For each node i, if the associated set is absolutely irreducible, then it is a leaf. Otherwise we intersect the set Ci with a hypersurface Hi which corresponds to a non-zero first derivative of the defining polynomial of Hi . The set Ci ∩ Hi is decomposed over p . All those components of dimension one less than dim Ci are extracted and become the children of node i. The decision algorithm executes recursively, implicitly building the tree in a depth-first manner. When it comes to a leaf node, it will find either a single point, a set of conjugate points, or an absolute subvariety of positive dimension. In the first case, it checks whether the point gives rise to a witness point of the input set. If it does, it stops. Otherwise, or in the second case it has to go on with the recursion (on another part of the tree). In the last case, it terminates with a “yes” answer. We are going to extend this to a counting algorithm that computes a good approximation (asymptotically) of the number of all p -rational points in an algebraic set, when p is large, say p > dcn , for some cn = nO(n) . The idea is to apply the algorithm to the input system (f1 , . . . , fm ) to compute all the birational p -hypersurfaces which are absolutely irreducible and have the highest dimension possible. Essentially the algorithm goes through the decomposition tree as does the decision algorithm, but this time it is done in a breadth-first like manner: all internal nodes whose associated sets have codimension 1 are explored first, then those of co-dimension 2, and so on. It terminates at the dimension when the first absolutely irreducible subvariety is found. All birational hypersurfaces of this dimension which are absolutely irreducible are computed. We collect non-duplicates in the sense that no two of them share the same set of p -rational points. We then compute a bound of the number of p -rational points in the corresponding closed subsets of V(f1 , . . . , fm ) that are birational to these hypersurfaces. We are going to show that this bound is good enough to approximate the total number of p -rational points of the whole input set. More specifically, let D be the highest possible dimension of the absolutely irreducible birational hypersurfaces found by the above counting algorithm. We remove duplicates by checking generic points. Let there be N such distinct hypersurfaces. Assume that p > dcn , where cn = ncn for some computable constant c. Then our result below shows that the number #V of p -rational points of V(f1 , . . . , fm ) satisfies
n
|#V − N pD | ≤ dc n pD−1/2 for some computable constant c .
(2)
Approximate Counting of Points
3
519
Useful Lemmas
These are useful results for our analysis in the next section. Some of them come from [HW97], and are repeated here for completeness as well as future use. Unless otherwise stated, the proofs we omit here can be found in that paper. The result on decomposing an algebraic set represented by a system of polynomials over p is given below. Theorem 2. There is an algorithm that given a prime p and a set of polynomials f1 , . . . , fm ∈ k[x1 , . . . , xn ] (k is the finite field p ), constructs a birational hypersurface hW for each irreducible component W of Vk (f1 , . . . , fm ) with sequential 2 complexity of mO(1) dO(n ) field operations where d = max(n, max{deg fi : 1 ≤ i ≤ m}), and with parallel complexity of a number of field operations polynomial 2 in n, log m and log d using mO(1) dO(n ) processors, such that the total degree of hW , as well as that of the inverse map from hW to W , are in the order of dO(n) . (Note: the inverse map from hW to W is given by n rational functions σ1 , . . . , σn , and the previous statement says that each of these σi has a degree bound of dO(n) in both of its numerator and denominator polynomials.) We can also bound the number of components at a particular dimension: Lemma 3. Suppose f1 , . . . , fm ∈ p [x1 , . . . , xn ] and deg fi ≤ d for each fi . Then the number of s-dimensional components that can be extracted by the decomposition algorithm is at most dn−s . The following lemma is an implication of Bezout’s Theorem, and is proven in [Ier89]. We will use it to bound the number of closed subsets in the intersection of two hypersurfaces. Lemma 4. Let k be an arbitrary field. Let f1 , . . . , fm ∈ k[x0 , . . . , xn ] be homogeneous polynomials. Let Z1 , . . . , Zr be the irreducible components of V(f1 , . . . , fm ). Then m r deg fi ≥ degZj . i=1
j=1
The complexity of our randomized decision algorithm for the polynomial congruences problem is summarized below: Theorem 5. There exists an algorithm that given a prime p, and polynomials f1 , . . . , fm ∈ p [x1 , . . . , xn ] of total degree bounded by d, decides for the non-emptiness of p -rational points in V(f1 , . . . , fm ) in randomized sequential time O(dcn (m log p)O(1) ), where cn = nO(n) , and parallel time polynomial in cn , log d, log m and log p using O(dcn (m log p)O(1) ) processors. Furthermore, a witness point can be found in expected time of the same complexity.
520
Ming-Deh Huang and Yiu-Chung Wong
The next lemma allows us to estimate the degree growth of the polynomials defining the birational hypersurfaces obtained in the decision algorithm: Lemma 6. Let f1 , . . . , fm ∈ p [x1 , . . . , xn ] be polynomials whose degrees are each bounded by d. Let H be an p -irreducible hypersurface of dimension D produced by applying the decision algorithm to this system. Then H is birational to a component of the system V(f1 , . . . , fm , hD , hD −1 , . . . , hD+1 ) for some D in the range D ≤ D ≤ n − 1 and some hD+1 , . . . , hD ∈ p [x1 , . . . , xn ] such that for all D + 1 ≤ i ≤ D , hi has degree no larger than d(n−i+1) D −D+1 of H is bounded by d(n−D+1) .
D −i+1
. The degree
For our purpose of approximate counting, it is important for us to be able to estimate the number of p -rational points on an absolutely irreducible hypersurface defined over p . We avoid using Deligne’s result directly, since it requires the hypersurface to be smooth. Instead we rely on Theorem 7 below, due to Schmidt [Sch74], which gives a lower bound for the estimate. Making use of Kaltofen’s effective version of Hilbert’s irreducibility theorem [Kal95], we are able to derive a corresponding upper bound. The results are combined in Lemma 8 to serve our purpose. Theorem 7. Suppose f (x1 , . . . , xn ) is an absolutely irreducible polynomial of total degree d > 0, with coefficients in the finite field q . Let A be the number of solutions (x1 , . . . , xn ) with coordinates in q of the equation f (x1 , . . . , xn ) = 0. Suppose q > 104 n3 d5 P 3 (4 log d)
(3)
where P (1) = 2, P (2) = 3, . . . is the sequence of primes. (In particular, P (x) ∼ x log x, and hence the right-hand side of the above inequality is O(n3 d5+ ) for every > 0.) Then A > q n−1 − (d − 1)(d − 2)q n−(3/2) − 6d2 q n−2 . Lemma 8. Let h ∈ p [x1 , . . . , xn ] be an absolutely irreducible polynomial over the finite field p of degree d. Suppose p > 104 n3 d5 P 3 (4 log d) where P (i) is the sequence of primes. Then the number N of p -rational points in V(h) satisfies |N − pn−1 | ≤ d2 pn−3/2 + (d2 + 2d5 )pn−2 + 2d7 pn−5/2 . Proof. The lower bound is a weaker form of Schmidt’s result. Therefore we only need to prove the upper bound. Consider a selection of 3n − 2 elements w2 , . . . , wn , u2 , . . . , un , v1 , . . . , vn ∈ p and the polynomial ˆ y) def h(x, = h(x + v1 , w2 x + u2 y + v2 , . . . , wn x + un y + vn ) ∈ p [x, y]. ˆ There are 3 possibilities regarding the irreducibility of h: (i) it is reducible over p ,
Approximate Counting of Points
521
(ii) it is irreducible over p but not over ¯p , (iii) it is absolutely irreducible over p . ˆ Then in case (ii), ˆ denote the number of p -rational solutions of h. Let N ˆ we know that all p -rational points must satisfy both h and its first derivative ˆ ≤ d2 by Bezout’s theorem. In case (iii), Corollary 2b of [HW96]. Therefore N ˆ ≤ p + d2 p1/2 . In case (i), N ˆ ≤ d(p + d2 p1/2 ), since [LY94] allows us to get N ˆ consists of at most d plane curves, and hence reduces to in the worst case V(h) either cases (ii) or (iii). If the values w2 , . . . , vn are chosen uniformly from p , then Kaltofen’s effective version of Hilbert’s irreducibility theorem [Kal95] shows that the probability ˆ having distinct irreducible factors over p is no more than 2d4 /p. Therefor h fore, if we try every possible choice and sum up the number of p solutions of ˆ the total number Ntotal is upper bounded by the corresponding h, Ntotal ≤ p3n−2 · (
2d4 2d4 2 d(p + d2 p1/2 ) + (1 − )(d + p + d2 p1/2 )). p p
For each p -rational point (x1 , . . . , xn ) of V(h), and for each (x, y) pair in 2p , there are p2n−2 choices of the elements w2 , . . . , wn , u2 , . . . , un , v1 , . . . , vn from p satisfying x1 = x + v1 , x2 = w2 x + u2 y + v2 , .. . xn = wn x + un y + vn . Therefore the number N of p -rational points of V(h) is upper bounded as follows: Ntotal p2 p2n−2 2d4 2d4 2 ≤ pn−2 · ( d(p + d2 p1/2 ) + (1 − )(d + p + d2 p1/2 )) p p
N=
≤ pn−1 + d2 pn−3/2 + (d2 + 2d5 )pn−2 + 2d7 pn−5/2 . We will approximate the number of p -rational points in a closed absolute p -subset by the number of p -rational points in its birational p -hypersurface. The next two results are used in the analysis of the error bound. Theorem 9 is again due to Schmidt [Sch74]. Theorem 9. Suppose u0 (x1 , . . . , xn ), . . . , ut (x1 , . . . , xn ) are polynomials of degree ≤ d with coefficients in q and without a common factor. Then the number of solutions of u0 (x1 , . . . , xn ) = . . . = ut (x1 , . . . , xn ) = 0 with coordinates in q is ≤ 2nd3 q n−2 .
522
Ming-Deh Huang and Yiu-Chung Wong
Lemma 10. Suppose the decomposition algorithm is applied to the set of polynomials f1 , . . . , fm ∈ p [x1 , . . . , xn ], and d = max{ deg fi }. Let H be a Ddimensional birational hypersurface resulting from the corresponding extraction. Then there are at most dn−D pre-images in V(f1 , . . . , fm ) corresponding to each point of H. Proof. In the decomposition algorithm [HW97] the fi ’s are preprocessed to form n − D polynomials p1 , . . . , pn−D , each of whose degrees is at most d, such that they define a pure D-dimensional set. Then after a suitable linear transformation V(p1 , . . . , pn−D ) is projected to a union of hypersurfaces of which H is a member. As a result if we fixed a point of H, then V(p1 , . . . , pn−D ) restricted to the corresponding coordinate specialization is a 0-dimensional set. Now apply Lemma 3 to this set. The number of p -rational points in this set is therefore bounded by dn−D .
4
Counting Points
In this section we carry out a combinatorial analysis on the error bound in approximating the number #V of p -rational points on V(f1 , . . . , fm ) by the quantity N pD as described in section 2. There are three contributions to the error bound. First, the absolutely irreducible hypersurfaces we found are only birationally equivalent to the corresponding closed subsets of V(f1 , . . . , fm ). We have to account for those points where the 1-1 correspondence fails. This can affect both the upper and lower bound of the error. Second, there are overcounting of those p -rational points in the intersection of two or more of these hypersurfaces. We have to subtract these counts in estimating the lower bound. And finally, we have omitted counting the p -rational points coming from absolutely irreducible hypersurfaces of lower dimensions. These contribute to the upper bound of the error. In the following we will justify that all these errors have been taken care of by the simple bound given in (2). Let us look at the first type of errors. We know from Lemma 8 that an absolutely irreducible hypersurface of dimension D defined over p has pD + n−D O(d2(n−D+1) pD−1/2 ) p -rational points, since the degree bound of the hyn−D persurface is d(n−D+1) by Lemma 6. For each such hypersurface H calculated by the algorithm, those p -rational points of H falling outside an exceptional subset E are in 1-1 correspondence with the p -rational points in a birational closed subset C of V(f1 , . . . , fm ). It is easy to see that n − D times the degree of H is an upper bound on the degree of the polynomial e defining the exceptional set E on H (i.e. V(e) ∩ H = E) (recall from [HW97] that e is the product of the denominator polynomials of all the rational functions defining the inverse map from H to C, and each of these denominator polynomials has the same degree bound as that of H). Now both e and h involve D + 1 variables. By Schmidt’s result (Theorem 9) the number of p -rational points in H ∩ E is at most 2(D + 1) · max(deg h, deg e)3 · pD+1−2 ≤2(D + 1)(n − D)d3(n−D+1)
n−D
pD−1 .
Approximate Counting of Points
523
By Lemmas 6 and 10 we know that each point of H (and hence of H ∩ E) has n−D+1 pre-images in V(f1 , . . . , fm ). Therefore from the above at most d(n−D+1) inequality we deduce that the difference between the numbers of p -rational points #C of C and the number of p -rational points #H of H satisfies |#H − #C| ≤ d(n−D+1)
n−D+1
≤ c1 d2(n−D+1)
· 2(D + 1)(n − D)d3(n−D+1)
n−D+1
n−D
pD−1
pD−1
where c1 is a computable polynomial in n alone. Therefore, combining with the observation made at the beginning of this paragraph, we have (from our assumption on p) |#C − pD | ≤ O(d2(n−D+1) = O(d2(n−D+1)
n−D
pD−1/2 ) + c1 d2(n−D+1)
n−D+1
pD−1/2 ).
n−D+1
pD−1 (4)
Though the above analysis concentrates on hypersurfaces of dimension D, it can be extended similarly to other birational hypersurfaces. In other words, the inaccuracy in approximating the number of p -rational points on a subset by that of the corresponding birational hypersurface bears a similar form. This completes the analysis of the first type of errors. Before proceeding to consider the other two types of errors, we want to classify in more detail the hypersurfaces generated by the algorithm. We say that a hypersurface of dimension D is obtained through i steps of recursion if it is birational to a component of V(f1 , . . . , fm , hD , hD −1 , . . . , hD −i+1 ) for some hj ’s, D ≥ j ≥ D − i + 1 ≥ D + 1, such that D and the hj ’s follow the meaning in Lemma 6. Intuitively we try to quantify how deep such a hypersurface is embedded in an ordinary component of V(f1 , . . . , fm ). For example, a hypersurface birational to an ordinary component of V(f1 , . . . , fm ) is obtained through 0 step of recursion, that of V(f1 , . . . , fm , hD ) through 1 step of recursion, etc. Conceptually we classify all the hypersurfaces according to their dimensions as well as the number of steps of recursion required to obtain them. The following lemma gives an upper bound of the size of these classes.
Lemma 11. Suppose f1 , . . . , fm ∈ p [x1 , . . . , xn ] and deg fi ≤ d for each fi . Then the number of s-dimensional birational hypersurfaces obtained from the i+1 algorithm through i levels of recursions is at most d2(n−s) . Proof. Denote the number in question by N (s, i). First note that when i = 0, we are dealing with hypersurfaces birational to components of V(f1 , . . . , fm ). Therefore by Lemma 3 N (s, 0) ≤ dn−s , which obviously satisfies the assertion.
524
Ming-Deh Huang and Yiu-Chung Wong
For i > 0, the hypersurfaces are obtained by extracting the s-dimensional components from the subset corresponding to an (s + 1)-dimensional hypersurface obtained through i − 1 steps of recursion by the algorithm. The detail is as described by Lemma 6, which shows that the degree of a (s + 1)-dimensional hyi persurface obtained through i − 1 steps of recursion is at most d(n−s) . Therefore by induction, and applying Lemma 3 we have i
N (s, i) ≤ N (s + 1, i − 1) · (d(n−s) )n−s i
≤ d2(n−s−1) · d(n−s) ≤ d2(n−s)
i+1
i+1
.
Now we come back to analyze the error bound in (2). There is a lower bound contribution due to over-counting the overlapped p -rational points of any two absolutely irreducible hypersurfaces of the highest dimension D. By Lemma 11 n−1−D i+1 d2(n−D) such hypersurfaces, the degree of each is there are at most i=0 n−D (n−D+1) (Lemma 6). Now the intersection of any two of upper bounded by d them gives a number of closed subsets of dimension at most D−1. The number of these subsets is upper bounded by the square of their degrees (Lemma 4). Similar to the analysis arriving at (4), we can show that the number of p -rational points n−D+1 pD−3/2 ). Therefore the total in each of these subsets is pD−1 + O(d2(n−D+2) number of p -rational points in all these intersections is at most n−1−D
(
d2(n−D)
i+1
) · (d(n−D+1)
n−D
)2 · (pD−1 + O(d2(n−D+2)
n−D+1
pD−3/2 ))
i=0
≤ O(dc2 (n−D+1)
n−D
pD−1 )
(5)
where c2 is some computable polynomial depending on n alone. We also need to account for the p -rational points in lower dimensional absolutely irreducible hypersurfaces that are ignored in our approximation counting scheme. This contributes to the upper bound of the error in (2). Reasoning as n−1−s i+1 before, there are at most i=0 d2(n−s) absolutely irreducible hypersurfaces of dimension s that can be obtained. Each of them contains at most O(ps ) p rational points. Therefore the upper bound that we need to take into account is D−1 n−1−s n s 2(n−s)i+1 d (6) O(p ) · ≤ dc3 n pD−1 s=0
i=0
where c3 is some computable polynomial depending on n alone. Combining the bounds in (4), (5) and (6), the inequality in (2)
n
|#V − N pD | ≤ dc n pD−1/2 is justified.
Approximate Counting of Points
525
Since the number of the birational hypersurfaces of the highest possible diO(n) mension obtained through the algorithm is at most dn , its running time has the same asymptotic order as in the decision algorithm of Theorem 5. Therefore we have given a proof of Theorem 1. Note that if we assume that p > dcn , where cn ≥ 2c nn , then the error bound in the theorem can be simplified to pD . In other words, the relative error is at most 1/N .
5
Number of Points in the Reduction of an Algebraic Set Modulo Primes
In light of Theorem 1, an interesting question is:“Given a system of polynomials with integer coefficients, is there an invariant pair of numbers (N, D) that provides good information on counting the number of p -rational points on the algebraic set defined by this system, when p is sufficiently large?” We provide the following observations in relation to this. First note that the algebraic arguments we make for the counting algorithm, up to the point where Schmidt’s result is used, remains valid over the rational field . More specifically, the decomposition algorithm also works over the field (refer to [HW96] for details). And, if an -irreducible polynomial f is not absolutely irreducible over , then any -rational point of V (f ) must also be a zero of any first derivative of f . Now let G be a set of integer polynomials. Consider it as defined over the rational field . Suppose we carry out the same procedure as in the above counting algorithm to the system G over (instead of a finite field). Then it will still end up with a set of absolutely irreducible hypersurfaces over , each of which is birational to an absolute subvariety of V (G). Let D be the maximum among the dimensions of these subvarieties, and N be the number of distinct subvarieties of this dimension. By Shimura’s result ([Shi55] Section 6 Lemma 3) each of the absolutely -irreducible polynomial defining these birational hypersurfaces over remains absolutely irreducible over p for all but a finite number of p. Therefore, there are at least N distinct absolute p -subvarieties of dimension D in the reduction of V (G) mod p, or Vp (G(p) ) (recall that G(p) is the reduction of G mod p), for all but a finite number of primes p (though D may not be the highest possible dimension of an absolute p -subvariety of Vp (G(p) )). Hence, together with the arguments in Section 4, the quantity N pD is an approximate lower bound of the number of p -rational points in Vp (G(p) ) for all sufficiently large primes p. An alternative approach to the question is to derive a lower bound in terms of the dimension D(G) of V (G). (Note that D < D(G) if no absolutely irreducible component of V (G) is defined over .) Consider a finite Galois extension L of over which the absolutely irreducible components of V (G) are all defined. For sufficiently large primes p, we may assume that p is unramified in L and that all absolutely irreducible components of V (G) remain absolutely irreducible upon reduction mod p. Since p is unramified in L, the Frobenius element σp ∈
526
Ming-Deh Huang and Yiu-Chung Wong
Gal(L/ ) is well-defined up to conjugation, and upon reduction mod p, the number of ¯p -irreducible components that are defined over p equals the number of ¯ -irreducible components (before the reduction) that are fixed by σp . Call this number Np . Then the quantity Np pD(G) is an approximate lower bound of the number of p -rational points in Vp (G(p) ). The number Np depends on the conjugacy class of σp in Gal(L/ ). Let N (G) be the number of absolutely irreducible components of V (G) of dimension D(G). Then Np = N (G) when σp is the identity. By Chebotarev density theorem, the density of such primes p is at least 1/|Gal(L/ )|.
Acknowledgement We would like to thank one of the referees for suggesting the alternative approach in the last section. This research is supported in part by National Science Foundation Grant CCR-9412383. In addition, please note that the current address of the second author, Yiu-Chung Wong, is: Synopsys, Inc. 700 East Middlefield Road, Mountain View, CA 94043; ycwong@@synopsys.com.
References [AH92]
[Bei93]
[GK86]
[GK91]
[HI93]
[HW96]
[HW97] [Ier89]
[Kal95]
Leonard M. Adleman and Ming-Deh Huang, Primality testing and two dimensional Abelian varieties over finite fields, Lecture Notes in Mathematics, vol. 1512, Springer-Verlag, 1992. Richard Beigel, The polynomial method in circuit complexity, Proceedings of 8th Annual Structure in Complexity Theory Conference, IEEE Computer Society Press, May 1993, pp. 82–95. Shafi Goldwasser and Joe Kilian, Almost all primes can be quickly certified, Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing (Berkeley, California), 28–30 May 1986, pp. 316–329. Dima Grigoriev and Marek Karpinski, An approximation algorithm for the number of zeros of arbitrary polynomial over GF[q], Proceedings of 32nd IEEE Symposium on Foundation of Computer Science, 1991, pp. 662–669. Ming-Deh Huang and Doug Ierardi, Counting rational points on curves over finite fields, Proceedings of 34th IEEE Symposium on Foundation of Computer Science, IEEE, 1993, pp. 616–625. Ming-Deh Huang and Yiu-Chung Wong, Solving systems of polynomial congruences modulo a large prime, Proceedings of IEEE Symposium on Foundations of Computer Science, 1996, pp. 115–124. Ming-Deh Huang and Yiu-Chung Wong, Solving systems of polynomial equations modulo a large prime, manuscript, a full version of [HW96]. Douglas John Ierardi, The complexity of quantifier elimination in the theory of an algebraically closed field, Ph.D. thesis, Department of Computer Science, Cornell University, Ithaca, New York 14853-7501, 1989, also available as Technical Report no. TR 89-1030 of Computer Science Department, Cornell University. Erich Kaltofen, Effective Noether irreducibility forms and applications, Journal of Computer and System Sciences 50 (1995), no. 2, 274–295.
Approximate Counting of Points
527
Marek Karpinski and Michael Luby, Approximating the number of zeroes of a GF[2] polynomial, Journal of Algorithms 14 (1993), 280–287. [LY94] David B. Leep and Charles C. Yeomans, The number of points on a singular curve over a finite field, Arch. Math. 63 (1994), 420–426. [NW88] Noam Nisan and Avi Wigderson, Hardness vs randomness, Proceedings of 29th Annual IEEE Symposium on Foundations of Computer Science, 1988, pp. 2–11. [Sch74] Wolfgang M. Schmidt, A lower bound for the number of solutions of equations over finite fields, Journal of Number Theory 6 (1974), 448–480. [Shi55] Goro Shimura, Reduction of algebraic varieties with respect to a discrete valuation of the basic field, American Journal of Mathematics 77 (1955), 134–176. [vLvdG88] Jacobus H. van Lint and Gerard van der Geer, Introduction to coding theory and algebraic geometry, DMV Seminar, no. Band 12, Birkhauser Verlag, 1988. [vzGKS93] Joachim von zur Gathen, Marek Karpinski, and Igor Shparlinski, Counting curves and their projections, Proceedings of 25th ACM Symposium on Theory of Computing, The Association of Computing Machinery, May 1993, pp. 805–812. [vzGS95] Joachim von zur Gathen and Igor Shparlinski, Finding points on curves over finite fields, Proceedings of 36th IEEE Symposium on Foundation of Computer Science, 1995. [KL93]
S-Integral Points on Elliptic Curves and Fermat’s Triple Equations Attila Peth˝ o1? , Emanuel Herrmann2 ?? , and Horst G. Zimmer2 1
1
Institute of Mathematics and Informatics, Kossuth Lajos Universit¨ at H-4010 Debrecen, P.O. Box 12, Hungary 2 Fachbereich 9 Mathematik, Universit¨ at des Saarlandes D-66041 Saarbr¨ ucken, Germany
Introduction
In an earlier paper [15] we developed an algorithm for computing all S-integral points on an elliptic curve E over Q, the field of rationals. Here we show how this algorithm can be used to solve certain Fermat triple equations. For the convenience of the reader, we also give an outline of the algorithm in [15]. Let E:
y2 = x3 + ax + b (a, b ∈ Z)
be given in short Weierstraß form with integral coefficients a, b. Denote by ∆ = 4a3 + 27b2 the discriminant and by j = 123
4a3 ∆
the modular invariant of E over Q. Fix a finite set S = {q1, . . . , qs−1, ∞} of places of Q including the infinite one. Let E(ZS ) denote the set of S-integral points of the curve E over Q. It was proved by K. Mahler [11] that E(ZS ) is finite, and Coates [2] showed that E(ZS ) is effectively computable. A practical algorithm for determining the set E(ZS ), based on ideas of Lang [9] and Zagier [18], is described in a paper of Smart [14]. However, Smart’s approach depends on an assumed lower bound for linear forms in q-adic elliptic logarithms. In [15] we circumvent this assumption by bringing in an explicit upper bound, recently derived by Hajdu and Herendi [8], for the coordinates of S-integral points in the Mordell-Weil group E(Q). In this way, all S-integral points in E(Q) can be computed unconditionally. Our procedure relies on Theorem 1 stated below.
? ??
Research partially supported by Hungarian National Foundation for Scientific Research Grant No. 16791 and 16975 Research partially supported by the Siemens AG
J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 528–540, 1998. c Springer-Verlag Berlin Heidelberg 1998
S-Integral Points on Elliptic Curves and Fermat’s Triple Equations
2
529
Theorem 1 and Supplements
The method of Lang and Zagier presumes the knowledge of the rank r and a basis P1 , . . . , Pr ∈ E(Q) of the Mordell-Weil group E(Q) modulo the torsion group Etors (Q). Such a basis can be calculated, e.g., by an algorithm of Manin described in [5]. Whenever it is possible, that algorithm is made independent of the conjecture of Birch and Swinnerton-Dyer (cf. [4]). Once a basis P1 , . . . , Pr ∈ E(Q) is known, the task arises of finding a bound on the coefficients ni ∈ Z in the unique representation (1)
P = n1 P 1 + · · · + nr P r + T
(ni ∈ Z)
of any point P ∈ E(ZS ), where T ∈ Etors (Q) is a torsion point. Actually, the torsion point T can be removed from the representation (1) by multiplying the equation (1) by the maximal order g of T which is at most 12 by the well-known theorem of Mazur [12]. Our task is therefore to find an upper bound for the integer (2)
N := max{|ni |; i = 1, . . . , r}.
Such a bound is given in Theorem 1 below. To state the theorem, we require some notation. Denote by ˆ h the N´eron-Tate height as well as the associated symmetric bilinear form on E(Q) λ
the smallest positive eigenvalue of the regulator matrix (ˆ h(Pµ , Pν ))1≤µ,ν≤r ,
Q
:= max{qν ; ν = 1, . . . , s − 1},
log∗ Q := max{log Q, 1}, uν,q u0ν,q
for q ∈ S the q-adic elliptic logarithm of the basis point Pν := 1/ω · (muν,q ) real period of E if q = ∞ with ω := 1 if q 6= ∞ g if q = ∞ , and m := 6 ∞ lcm(g, Nq ) if q = where Nq := #E˜ns (Fq ) is the number of rational points of the reduced curve E modulo q over the finite field Fq (when one leaves out the singular point in the case of bad reduction).
530
Attila Peth˝ o, Emanuel Herrmann, and Horst G. Zimmer
Theorem 1. For any point P ∈ E(ZS ) in the representation (1) in terms of a basis of E(Q), the maximum N of the coefficients defined by (2) satisfies the upper estimate p (3) N ≤ N0 := (c1 /2 + c2 )/λ with constants c2 := log max{(8|a|)1/4, (32|b|)1/6}, c1 := 7 · 1038s+49s20s+15Q24 (log∗ Q)4s−2 c3 (log c3 )2 (c3 + 20(s − 1)c3 + log(ec4 )), where
p |∆|(8 +
log |∆|)4, p 3 c4 := 104 max{16a2 , 256 |∆| }.
c3 :=
32 3
1 2
Moreover, there exists a place q ∈ S such that the normalized q-adic elliptic logarithm of the point mP ∈ E(Q) satisfies the inequality r X 0 nν uν,q + nr+1 ≤ c5 exp{−(λ/s)N 2 + c2 /s} (4) ν=1
q
√
with c5 :=
8g/ω if q = ∞ 1 if q 6= ∞
and with an integer nr+1 ∈ Z which is 0 if q 6= ∞. We shall explain in section 4 how to employ this theorem in order to compute all S-integral points on E over Q. Before we sketch the proof of Theorem 1, we wish to make some comments on the bound N0 in (3). – In the case of elliptic curves E over Q of rank r ≤ 2, R´emond and Urfels [13] established a lower bound for linear forms in q-adic elliptic logarithms (q 6= ∞) which leads to another upper bound for N which can be much smaller than the bound N0 in (3). This new bound was used in [7]. – As the proof of Theorem 1 will reveal, the constant c1 in (3) can be replaced by any upper bound for the height of S-integral points in E(Q) (cf. the Proposition in section 3). If, for instance, we want to compute all ordinary integral points in E(Q), that is, if we consider the special case of s = 1, by Theorem 1 in the paper [8] by Hajdu and Herendi, the constant c1 can be replaced by the smaller value c01 = 5 · 1064 c3 (log c3 )(c3 + log c4 ). In this way, we obtain the sharper estimate q N ≤ N00 := (c01 /2 + c2 )/λ (30 ) in place of (3).
S-Integral Points on Elliptic Curves and Fermat’s Triple Equations
531
– In the special case of s = 1, i.e. of ordinary integral points, another bound N1 for N was established by the authors of [6] and by Stroeker and Tzanakis [16], viz. √ N ≤ N1 = 2r+3 c6 c7 log(r+3)/2 (c7 (r + 3)r+3 ) ˆ ν ) of the with some constants c6 , c7 depending on λ, g, ω, a, b, the heights h(P basis points and the rank r. In particular, c7 depends exponentially on the square of the rank r of E over Q since the essential part of c7 arises from a lower bound derived by S. David [3] for linear forms in complex elliptic logarithms (cf. [6], [16]). It is difficult to compare the size of the constants N00 and N1 because they depend on different parameters. For instance, N1 depends on the rank r of E over Q whereas N00 is independent of the rank r. Numerical experiments suggest that (see section 5)
and
3
N1 ≤ N00
if r ≤ 2
N1 > N00
if r > 2.
Outline of Proof of Theorem 1
Inequality (3) originates from height estimates. Recall that the N´ eron-Tate height ˆ h on E(Q) is defined by virtue of the Weil height h on E(Q). The Weil height h of a rational point P = (x, y) ∈ E(Q) is given by the formula h(P ) :=
1 log h(x), 2
where, for a rational number c in simplest fraction representation c = γδ , γ, δ ∈ Z, δ > 0, gcd(γ, δ) = 1, h(c) := log max{|γ|, |δ|}. The N´eron-Tate height of P is then the limit n ˆ ) = lim h(2 P ) . h(P 2n n→∞ 2
The two height functions on E(Q) are related by the inequality h(P ) ≥ ˆh(P ) − c2
(P ∈ E(Q)).
On the other hand, for non-torsion points, we have ˆ ) ≥ λN 2 h(P
(P ∈ E(Q) \ Etors(Q))
for the maximum N in (2) of the coefficients of P in the representation (1). The last two inequalities yield (5)
h(P ) ≥ λN 2 − c2
(P ∈ E(Q) \ Etors (Q)).
532
Attila Peth˝ o, Emanuel Herrmann, and Horst G. Zimmer
Now we choose P ∈ E(ZS ) and bring in a result by Hajdu and Herendi [8]: Proposition. For S-integral points P = (x, y) ∈ E(ZS ), the coordinates satisfy max{h(x), h(y)} ≤ c1 with the constant c1 from Theorem 1. Hence, for P ∈ E(ZS ), we have h(P ) ≤ c1 /2. Combined with (5), the last inequality entails λN 2 − c2 ≤ c1 /2. This proves (3). To verify (4), we consider an S-integral point P = (x, y) ∈ E(ZS ) and choose q ∈ S such that (6)
|x|q = max{|x|q1 , . . . , |x|qs−1 , |x|∞}
for the multiplicative absolute values corresponding to the places in S = {q1 , . . . , qs−1, ∞}. By the definition of h we then conclude that h(P ) ≤
(7)
s log |x|q . 2
Combining (5) and (7) yields 1 1/2
|x|q
≤ c8 exp{−c9 N 2 }
with
c2 λ ), c9 := . s s Transforming the last inequality into an inequality for elliptic logarithms leads to the asserted estimate (4). Here both the complex (q = ∞) as well as the q-adic (q 6= ∞) elliptic logarithms are to be taken into account. We omit the details (see [15]). c8 := exp(
4
Algorithmic Application of Theorem 1
The computation of all S-integral points on the curve E over Q is achieved on the basis of the two inequalities (3) and (4) in Theorem 1. The bound for N has to be reduced to a size which makes the direct search feasible.
S-Integral Points on Elliptic Curves and Fermat’s Triple Equations
533
The first step consists in determining a basis P1 , . . . , Pr of the free part of E(Q). This can be done by Manin’s “conditional” algorithm (see [5], [20]) which is to be carried out without the assumption of any conjectures whenever this turns out to be possible. Once a basis of E(Q) is known, the smallest positive eigenvalue λ can be calculated. The constants c1 amd c2 are easily computed in terms of the given set S = {q1 , . . . , qs−1, ∞} and the coefficients a, b ∈ Z of the Weierstrass equation for E. This gives the bound N0 for N in (3) in Theorem 1. However, the bound N0 for N is in general very large. Here the inequality (4) of Theorem 1 comes into play. The q-adic elliptic logarithms for q ∈ S have to be calculated with a very high precision. The technical details are explained in [15]. The system of inequalities (3) and (4) is then solved by de Weger’s reduction method [17] as described by Smart in [14], pp. 396-397. It should be pointed out that de Weger reduction must be performed at all places q ∈ S. Denote by Nq the bound for N obtained after de Weger reduction with respect to a place q ∈ S assuming that (6) holds for this q ∈ S. Then we replace the bound N0 by max{Nq ; q ∈ S} and iterate the reduction whenever it leads to a smaller bound. Experience shows that the last value of N0 obtained after several reductions is sufficiently small to facilitate a test for S-integrality of the points P ∈ E(Q) in their representation (1) within the range N ≤ N0
(8) for the N in (2).
5
Comparison of Bounds
In [1] Bremner, Stroeker and Tzanakis consider the family of elliptic curves Ek :
y2 = x3 − 36x − 864k(k − 1)(2k − 1)
parametrized by k ∈ Z. We compare the bounds N00 and N1 for N mentioned in section 2. To this end we choose four curves Ek of ranks rk = 1, 2, 3 and 4. Table 1 k rk 1 1 3 2 7 3 20 4
N1 8.9 · 1023 5.8 · 1039 2.4 · 1060 2.1 · 1086
N00 1.3 · 1041 1.8 · 1044 5.9 · 1045 2.9 · 1047
In these examples, we see that N1 < N00 if r ≤ 2 and N1 > N00 if r > 2, as already indicated at the end of section 2.
534
6
Attila Peth˝ o, Emanuel Herrmann, and Horst G. Zimmer
Fermat’s Triple Equations
In connection with his study of the equation xn + yn = z n , P. Fermat considered those rational triplets A, B, C for which there exists a u ∈ Q such that (9)
Au + 1 = , Bu + 1 = , and Cu + 1 = ,
where denotes the square of a rational number. It is clear that, if u is a solution of (9), then there exists a v ∈ Q such that v2 = (Au + 1)(Bu + 1)(Cu + 1),
(10)
i.e. (u, v) ∈ Q2 represents a rational point on the elliptic curve (10). Leech [10] invented a method for producing triplets A, B, C ∈ Q for which (9) is solvable. Zagier [19] gave sufficient conditions on A, B, C under which (9) has infinitely many solutions u ∈ Q. On the other hand, if A, B, C ∈ Q are distinct and S is a finite set of primes, then by the above-quoted results of Mahler and Coates, (10) has only finitely many effectively computable S-integer solutions (u, v) ∈ Q2 . Thus there exists only finitely many effectively computable S-integers u satisfying (9). Our method described in sections 2-4 makes it possible to compute all S-integral solutions of (10), hence of (9) too. As an illustration of the method, we have chosen the triplet (A, B, C) = (40, 24, −15) from the paper of Leech. In fact, we prove the following theorem. Theorem 2. Let u, z, y1 , y2 , y3 ∈ Z be such that (11)
40u + z 2 = y12 , 24u + z 2 = y22 , −15u + z 2 = y32
and that z is divisible by the primes 2, 3, 5 and 7 only. Then u ±z ±y1 ±y2 ±y3 0 1 1 1 1 3 7 13 11 2 -1 7 3 5 8 19592 74 2559 2497 2339 Note that (u, z) = (0, 1) is a trivial solution and that the solutions (u, z) = (3, 7) and (−1, 7) were found by Leech, but the solution (u, z) = (19592, 74) appears to be new. Proof of Theorem 2. Let u, z, y1 , y2 , y3 ∈ Z be a solution of (11) such that z is divisible by the primes 2, 3, 5 and 7 only. On choosing S = {2, 3, 5, 7, ∞} and putting x0 = zu2 and y0 = y1 yz23 y3 , we see that (x0 , y0 ) is an S-integral point of the elliptic curve E0 :
y0 = (40x0 + 1)(24x0 + 1)(−15x0 + 1). 2
S-Integral Points on Elliptic Curves and Fermat’s Triple Equations
535
2y −x 0 The birational transformation x0 = 60 = 60 2, y 2 yields the short Weierstraß form E : y2 = x3 − 44100x + 3240000
of E 0 . It is clear that all solutions of (11) are S-integral points of E. First we compute the constants c1 , . . . , c4 in Theorem 1 to obtain N0 on dependence on λ only. We have a = −44100 and b = 3240000, hence ∆ = 59629284000000, c2 = 3.1934, c3 = 2.67 · 1013 , c4 = 1.18 · 1027 . Taking s = 5 and Q = 7 we obtain c1 = 2.86 · 10377 and eventually
√ N0 = 3.76 · 10188 / λ.
Next we compute the group structure and a basis of the Mordell-Weil group E(Q) of the elliptic curve E over Q. On inserting the coefficients of E into the CA-system SIMATH, we obtain the torsion group Etors(Q) = {T1 = (−240, 0), T2 = (150, 0), T3 = (90, 0), T4 = O} ' Z/2Z ⊕ Z/2Z, the rank r = 3 of E(Q) and a basis of the Mordell-Weil group E(Q), viz. P1 = (−75, 2475), P2 = (−110, 2600), P3 = (−6, 1872). The regulator of the curve is R = 4, 0237, the real period is ω = 0, 240762 and the smallest positive eigenvalue corresponding to the chosen basis is λ = 0.95. Thus we compute N0 = 3.88 · 10188. We may take g = 2. The curve E has additive reduction with respect to the primes 2,3 and 5, thus m2 = 2, m3 = 6 and m5 = 10. On the other hand, E has good reduction modulo 7; then N7 = 12, hence m7 = 12. By Theorem 1, for each q ∈ S, we have to solve the following diophantine approximation problems 3 X 0 ni ui,q + n4 ≤ exp{−0.19N 2 + c10 } i=1
q
N = max{|n1|, |n2 |, |n3|} ≤ N0 = 3.88 · 10188, where c10 = 3.8, if q = ∞, and c10 = 0.64 otherwise. In order to reduce the huge upper bound for N , we first take q = ∞ and perform de Weger reduction with C = 10950. We obtain the new upper bound N ≤ N∞ = 8 in this case. Next, for each q ∈ S \ {∞}, we compute the q-adic elliptic
536
Attila Peth˝ o, Emanuel Herrmann, and Horst G. Zimmer
logarithms of Pi , i = 1, 2, 3 with a precision of at least ν2 = 1880, ν3 = 1190, ν5 = 810, ν7 = 670. In Table 2 we list the first and last three significant digits of ψq (mq Pi ), i.e. the digits a2 , a3 , a4 , . . . , aνq −2 , aνq −1 , aνq in the series approximation Ψq (mq Pi ) ≈
νq X
ai q i .
i=1
Note that, as mq Pi ∈ E1 (Q), we always have a1 = 0. Table 2 i 1 2 3 Pi (−75, 2475) (−110, 2600) (−6, 1872) ψ2 (2Pi ) (1, 1, 1, . . ., 1, 1, 1) (1, 1, 0, . . ., 0, 0, 1) (0, 1, 1, . . . , 1, 1, 0) ψ3 (6Pi ) (1, 2, 2, . . ., 1, 1, 1) (1, 1, 0, . . ., 2, 2, 1) (2, 2, 0, . . . , 2, 1, 0) ψ5 (10Pi ) (2, 0, 3, . . ., 2, 3, 1) (4, 1, 3, . . ., 0, 0, 0) (4, 3, 1, . . . , 2, 0, 4) ψ7 (12Pi ) (2, 6, 4, . . ., 6, 2, 3) (1, 4, 2, . . ., 1, 5, 2) (0, 0, 0, . . . , 1, 4, 4) Now we perform the q-adic de Weger reduction with the values Cq = q νq and obtain the new bound N0 = 120 = max{N∞ , N2 , N3 , N5 , N7 }. This new upper bound can be further reduced. On repeating the reduction procedure 3-times, we eventually arrive at N ≤ 15, which cannot be reduced any further. Finally we test all the points P=
3 X
ni Pi + Tj , |ni | ≤ 15, j = 1, . . . , 4
i=1
for S integrality. In Table 3 we list all S-integral points on E. Finally, we check for which entries x, y in Table 3 the quantity x0 = −x/(4 · 3 · 5)2 satisfies the triple equation 40x0 + 1 = , 24x0 + 1 =
and − 15x0 + 1 = .
We found exactly the values given in Theorem 2. These values arose from those S-integral points in Table 3 marked by an asterisk.
S-Integral Points on Elliptic Curves and Fermat’s Triple Equations
537
Table 3 S-integral points P = (x, y) =
ξ η , ζ2 ζ3
=
3 X
ni Pi + Tj , j ∈ {1, . . . , 4} on
i=1
E : y2 = x3 − 44100x + 3240000 for S = {2, 3, 5, 7, ∞}
rank basis torsion
3 P1 = (−75, 2475), P2 = (−110, 2600), P3 = (−6, 1872) T1 = (−240, 0), T2 = (150, 0), T3 = (90, 0), T4 = O
#
ξ
1 2 3 4 5 6 7 8 9 10 ∗11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
1710 −174 165 750 60 189 −110 1965 374550 310 0 −6 −240 150 90 13840 384 250 −234 21210 −210 156 85 4050 540
η −70200 2376 675 19800 900 −1287 −2600 −86625 −229226400 4400 1800 −1872 0 0 0 −1628000 6552 −2800 864 −3088800 −1800 −396 325 257400 −11700
ζ
F 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
(n1 , n2 , n3 , j) (0, −1, −1, 2) (0, −1, −1, 3) (0, −1, −1, 4) (0, −1, 0, 1) (0, −1, 0, 2) (0, −1, 0, 3) (0, −1, 0, 4) (0, −1, 1, 4) (0, 0, −2, 2) (0, 0, −1, 1) (0, 0, −1, 2) (0, 0, −1, 4) (0, 0, 0, 1) (0, 0, 0, 2) (0, 0, 0, 3) (1, −1, −2, 2) (1, −1, −1, 1) (1, −1, −1, 3) (1, −1, 0, 1) (1, −1, 0, 4) (1, 0, −1, 1) (1, 0, −1, 2) (1, 0, −1, 3) (1, 0, −1, 4) (1, 0, 0, 1)
538
Attila Peth˝ o, Emanuel Herrmann, and Horst G. Zimmer
E : y2 = x3 − 44100x + 3240000 # 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
ξ 46 210 −75 3324 2256 54 640 2650 101010 12765 72261 889 1185 −135 −375 4081 721 101265 2401 1425 −3015 −14775 5265 43665 93694185 700 1360 −1835 1450 11590 −11315 −5426 3613990 65410 16009
η −1144 1800 2475 191268 −106704 1008 15400 −136000 32103000 1442025 −19424691 16813 −32175 −17325 −20475 255529 −8569 32207175 2449 −8775 141075 541125 −210375 3378375 121993742475 −14300 4400 −52325 15200 −1232000 1883375 1769768 6864299200 −1453400 −1827323
(continued) ζ 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 4 4 4 4 8 8 16 1024 3 3 3 3 3 9 9 27 27 6
F
2 2 2 2 2 2 22 22 22 22 23 23 24 210 3 3 3 3 3 32 32 33 33 2×3
(n1 , n2 , n3 , j) (1, 0, 0, 2) (1, 0, 0, 3) (1, 0, 0, 4) (1, 0, 1, 2) (1, 1, −1, 1) (1, 1, 0, 1) (1, 1, 0, 2) (1, 1, 1, 3) (1, 2, 0, 3) (2, −1, −1, 4) (2, 1, 0, 3) (0, −2, 0, 4) (0, 0, −1, 3) (1, −1, −1, 4) (1, 1, 0, 3) (2, 0, −2, 4) (2, 0, 0, 4) (0, −2, −1, 3) (0, 0, −2, 4) (1, −1, 0, 3) (1, 1, −1, 4) (1, −1, −2, 3) (1, 1, 1, 4) (2, 0, −1, 3) (1, −3, −1, 4) (0, −1, −1, 1) (1, −1, 0, 2) (1, 0, 1, 3) (1, 1, −1, 3) (2, 0, −1, 1) (1, −2, −1, 3) (1, 0, −2, 2) (0, −2, 1, 1) (2, 1, 0, 4) (0, −2, −2, 4)
S-Integral Points on Elliptic Curves and Fermat’s Triple Equations
E : y2 = x3 − 44100x + 3240000 # ξ η 61 3055129 −5335908733 62 1986 −60984 63 26076 −4130676 64 2214 −21312 65 3984 63648 66 −3894 314496 67 −53714 −1360216 ∗68 −10800 −514800 69 40149 7800507 70 −8900 786500 71 1110 514800 72 24690 −3580200 73 4110 124200 74 9690 −514800 ∗75 3600 −216000 76 417810 −107526600 ∗77 −70531200 −26902432794600 78 122089 8471413 79 156409 33882227 80 293301 108523701 81 4614381 677241279 82 3883773601 −203504210260849
539
(continued) ζ
F
24 2 ×3 5 5 5 5 5 5 5 5 5 5 15 3×5 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 49 72 2401 74 28 22 × 7 28 22 × 7 35 5×7 175 52 × 7 3360 25 × 3 × 5 × 7 3
(n1 , n2 , n3 , j) (2, −2, 0, 4) (0, −1, 1, 3) (1, −2, −1, 2) (1, −1, −2, 1) (1, 1, 1, 1) (2, 0, −1, 4) (1, 2, 0, 2) (0, −2, −1, 2) (0, −1, −2, 3) (0, −1, 1, 1) (1, −1, −1, 2) (1, 0, −2, 3) (1, 0, 1, 1) (1, 1, 0, 4) (2, 0, −1, 2) (1, −2, 0, 3) (0, 0, −3, 2) (2, −2, −2, 4) (2, 2, 0, 4) (2, −1, 0, 3) (2, 1, −2, 3) (4, 0, −2, 4)
References 1. 2. 3. 4.
5.
6.
A. Bremner, R.J. Stroeker and N. Tzanakis, On sums of consecutive squares. J. Number Theory 62 (1997), 39 - 70. J. Coates, An effective p−adic analogue of a theorem of Thue III; The diophantine equation y2 = x3 + k, Acta Arith. 74 (1970), 425 - 435. S. David, Minorations de formes lin´eaires de logarithmes elliptiques. M´em. Soc. Math. France 62 (N.S.), 1995, 143+iv p.p. J. Gebel, Bestimmung aller ganzen und S-ganzen Punkte auf elliptischen Kurven u ¨ber den rationalen Zahlen mit Anwendung auf die Mordellschen Kurven. PhDThesis, Saarbr¨ ucken 1996. J. Gebel and H. G. Zimmer, Computing the Mordell-Weil group of an elliptic curve over . In: Elliptic Curves and Related Topics, Eds.: H. Kisilevsky and M. Ram Murty. CRM Proc. and Lect. Notes, Amer. Math. Soc., Providence, RI, 1994, 61 - 83. J. Gebel, A. Peth˝ o and H. G. Zimmer, Computing integral points on elliptic curves. Acta Arith. 68 (1994), 171 - 192.
Q
540 7.
8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20.
Attila Peth˝ o, Emanuel Herrmann, and Horst G. Zimmer J. Gebel, A. Peth˝ o and H. G. Zimmer, Computing S−integral points on elliptic curves. In: Algorithmic Number Theory, Second International Symposium, ANTSII, Talence, France, May 1996, Ed.: H. Cohen, Lect. Notes in Comp. Sci., Vol. 1122, Springer Verlag, Heidelberg 1996, 157 - 171. L. Hajdu and T. Herendi, Explicit bounds for the solutions of elliptic equations with rational coefficients. To appear in J. Symbolic Comp. S. Lang, Elliptic Curves: Diophantine Analysis. Grundl. Math. Wiss. 231, SpringerVerlag, Berlin 1978. J. Leech, Four integers whose twelve quotients sum to zero, Can. J. Math. 38 (1986), 1261-1280. ¨ K. Mahler, Uber die rationalen Punkte auf Kurven vom Geschlecht Eins. J. Reine Angew. Math. 170 (1934), 168 - 178. B. Mazur, Rational isogenies of prime degree. Invent. Math. 44 (1978), 129 - 162. G. R´emond and F. Urfels, Approximation diophantienne de logarithmes elliptiques p-adiques. J. Numb. Th. 57 (1996), 133-169. N. Smart, S-integral points on elliptic curves. Proc. Cambr. Phil. Soc. 116 (1994), 391 - 399. A. Peth˝ o, H. G. Zimmer, J. Gebel, E. Herrmann, Computing all S-integral points on elliptic curves. To appear. R.J. Stroeker and N. Tzanakis, Solving elliptic diophantine equations by estimating linear forms in elliptic logarithms. Acta Arith. 67 (1994), 177-196. B. M. M. de Weger, Algorithms for diophantine equations. PhD Thesis, Centr. for Wiskunde en Informatica, Amsterdam 1987. D. Zagier, Large integral points on elliptic curves. Math. Comp. 48 (1987), 425 436. D. Zagier, Elliptische Kurven: Fortschritte und Anwendungen, Jber. d. Dt. Math.Verein. 92 (1990), 58-76. H. G. Zimmer, Generalization of Manin’s conditional algorithm. SYMSAC ’76. Proc. 1976 ACM Sympos. on Symb. Alg. Comp. Ed. R. D. Jenks, Yorktown Heights, N.Y. 1976, 285 - 299.
Attila Peth˝ o Horst G. Zimmer Institute of Mathematics and Informatics Emanuel Herrmann Kossuth Lajos Universit¨ at Fachbereich 9 Mathematik H-4010 Debrecen, P.O. Box 12 Universit¨ at des Saarlandes Hungary D-66041 Saarbr¨ ucken Germany
Speeding Up Pollard’s Rho Method for Computing Discrete Logarithms Edlyn Teske Technische Universit¨ at Darmstadt Institut f¨ ur Theoretische Informatik Alexanderstraße 10, 64283 Darmstadt, Germany [email protected]
Abstract. In Pollard’s rho method, an iterating function f is used to define a sequence (yi ) by yi+1 = f (yi ) for i = 0, 1, 2, . . ., with some starting value y0 . In this paper, we define and discuss new iterating functions for computing discrete logarithms with the rho method. We compare their performances in experiments with elliptic curve groups. Our experiments show that one of our newly defined functions is expected to reduce the number of steps by a factor of approximately 0.8, in comparison with Pollard’s originally used function, and we show that this holds independently of the size of the group order. For group orders large enough such that the run time for precomputation can be neglected, this means a real-time speed-up of more than 1.2.
1
Introduction
Let G be a finite cyclic group, written multiplicatively, and generated by the group element g. Given an element h in G, we wish to find the least non-negative number x such that gx = h. This problem is the discrete logarithm problem (DLP) in G, and x is the discrete logarithm of h to the base g. We write x = logg h. We write hgi to denote the cyclic group generated by g, and |hgi|, or |G|, to denote the order of g, which is the least positive number x such that gx = 1. An efficient method to solve the DLP is based on the rho method: Assume we are given a function f : G → G. We select a starting value y0 ∈ G and then compute the sequence (yi ) formed by the rule yi+1 = f(yi ), for i = 0, 1, 2, . . .. Since G is finite, this sequence is ultimately periodic so that there exist two uniquely determined smallest integers µ ≥ 0 and λ ≥ 1 such that yi = yi+λ for all i ≥ µ. We call µ the preperiod and λ the period of (yi ) and any pair (yi , yj ) with yi = yj and i 6= j a match. If the function f, which we refer to as the iterating function, is a random function in the sense that each of the |G||G| functions f : G →p G is equally probable, the expected value for λ + µ is close to p π|G|/2 = 1.253 |G| . Pollard [12] showed how this theory can be applied to p solve the DLP in expected run time O( |G| ) multiplications in G. Pollard’s algorithm [12] is generic in the sense that it can be applied to any group for which the following is satisfied. • Given any two group elements g and h we can compute the product g ∗ h. J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 541–554, 1998. c Springer-Verlag Berlin Heidelberg 1998
542
Edlyn Teske
• Given any two group elements g and h we can check whether g = h. • Given a small integer r (in Pollard’s case: r = 3), we can divide the group into r disjoint sets T1 , . . . , Tr of roughly equal size, and given any group element g we can check to which of these sets it belongs. The space requirements of algorithms using the rho method are negligible. Therefore, to solve the DLP in groups of large group orders, this method is superior to Shanks’ baby step-giant p step method [14] that has roughly the same run time but space requirements O( |G| ). Pollard’s original algorithm for discrete logarithm computation [12] could be used on a programmable calculator, and Pollard applied it to residue class groups (ZZ/pZZ)∗ (p prime) with group orders up to 106 . Nowadays, algorithms based on the rho method run on powerful workstations to solve the DLP in various finite Abelian groups of considerably larger group orders. Some work has been done to speed-up the rho method. There are better methods to find matches, e.g. by Brent [2], and van Oorschot and Wiener [18] have developed a method for efficient parallelization of the rho method. We now suggest to choose a more efficient iterating function to obtain further speed-up. Recently, the author has elaborated a generic algorithm [17] that uses the rho method to compute the structure of a finite Abelian group. This algorithm uses a type of iterating functions specially designed to meet the requirements for the group structure computation. Experiments show that the sequences produced by such functions have average values for λ + µ very close to the value p π|G|/2 expected for the random function case. The question naturally arises whether, for solving the DLP with the rho method, these functions lead to better performances than the function Pollard [12] used. In this paper, we compare the performances of totally four types of iterating functions, among which we find the functions successfully used for group structure computation as well as Pollard’s original function. In comprehensive experiments, we apply them to solve the DLP in groups of points of elliptic curves over prime fields (ECDLP). We chose this particular application since the ECDLP is one of the most fashionable examples for which the rho method is the best algorithm known up to date. This makes these groups particularly interesting for cryptographic applications, and much work has been done in this area since elliptic curve cryptosystems have first been proposed by Miller [10] and Koblitz [7]. However, since we do not exploit any special properties of the groups, the results we obtain are very likely to hold in any other finite Abelian group. We also use a result about random walks on the integers modp to show that the property of being-close-to-random of the functions used for group structure computation is independent of the size of the group orders, thus answering an open question in Teske [17] in the case of prime group orders. Due to the method of Pohlig and Hellman [11], both in our experiments and in our theoretical considerations we restrict ourselves to groups of prime order. Then our main result consists of the following empirical estimates. Let G be a group of prime group order p. Let g be a generator of G and h ∈ G. Let y0 be a randomly chosen element of G and the sequence (yi ) be
Speeding Up Pollard’s Rho Method for Computing Discrete Logarithms
543
formed according to the rule yi+1 = f(yi ), i = 0, 1, 2, . . ., with some function f : G → G that suits to find the discrete logarithm of h to the base g. Let E(λ + µ) denote the expected value of the sum of the preperiod and the period of (yi ). If we use the function fP suggested by Pollard [12] to define the sequence (yi ), we have √ E(λ + µ) ≥ 1.59 p . With the same function fT as used for group structure computation [17], we have √ E(λ + µ) ≤ 1.30 p . In the following, we show how we solve the DLP using the Pohlig-Hellman and the rho methods. Then, in Section 3, we define our new iterating functions and discuss some important features of the sequences generated by them. In Section 4, we describe our experiments in detail. We give a representative selection of our experimental results and show how they lead to our main result.
2
Pollard’s Rho Method for Discrete Logarithm Computation
Let g be a generator of G and h ∈ G. We outline how we use the methods of Pohlig-Hellman and Pollard to compute logg h. First we use the Pohlig-Hellman method [11] to reduce the DLP in G to the DLP in groups of prime group orderQ p, with p dividing |G|. Let the prime factorization of |G| be given as |G| = p pt(p). For each p dividing |G|, we compute x = logg h modulo pt(p), as follows. Let t = t(p). We write x mod pt in P j its base p expansion as x = t−1 j=0 xj p . Then the coefficients xj are computed by solving the equations Pj−1 i |G|/pj+1 xj = g|G|/p , h ∗ g− i=0 xi p
j = 0, 1, 2, . . ., t − 1 .
Each of these equations represents a DLP in the group of order p generated by g|G|/p . Having computed x mod pt(p) for all prime factors p of |G|, we use the Chinese Remainder Theorem to determine logg h. So for the following we assume that G has group order |G| = p with p prime. To describe how to find logg h using the rho method, we exemplarily use Pollard’s iterating function, to which we refer as Pollard’s original walk. So we divide G into three pairwise disjoint sets T1 , T2 and T3 of roughly equal size. Let fP : G → G, y ∈ T1 , g∗y , y ∈ T2 , y2 , fP (y) = h∗y , y ∈ T3 . We choose a random number α in the range {1, . . . , |G|}, compute a starting element y0 = gα , and put yi+1 = fP (yi ), for i = 0, 1, 2, . . .. This induces two
544
Edlyn Teske
integer sequences (αi ) and (βi ) with the property that yi = gαi ∗ hβi for i = 0, 1, 2, . . .. These sequences are given by α0 = α and β0 = 0, and αi+1 = αi + 1 , βi+1 = βi ,
αi+1 ≡ 2αi mod |G| , or αi+1 = αi ,
βi+1 ≡ 2βi mod |G| , or βi+1 = βi + 1 ,
according to the three cases above. While computing the terms (yi , αi , βi ), we try to find a match (yj , yi ) for some j < i. We use the same method as in Teske [17], which is based on a method of Schnorr and Lenstra [13] but with optimized parameters. This means that we work with a chain of 8 cells, which in each stage of the algorithm store altogether 8 triplets (yσd , ασd , βσd ), d = 1, . . . , 8. In the beginning we put σd = 0 for all d, thus storing (y0 , α0, β0 ) in each cell. After the computation of each new triplet (yi , αi , βi ), we check whether yi matches one of the stored terms yσd . If this is the case for some d, we return the corresponding triplet (yσd , ασd , βσd ) and stop. Otherwise, we check whether i ≥ 3σ1 . If this is the case, we put σd = σd+1 for d = 1, . . . , 7, thus shifting the contents of the 8 cells to the left; we put σ8 = i and store (yi , αi , βi ) in the last cell. With µ denoting the preperiod and λ denoting the period of the sequence (yi ), we have shown [17] that this method finds a match (yσ , yσ+λ ) with σ + λ ≤ 1.25 · max(λ/2, µ) + λ .
(1)
Having found a match (yj , yi ), we have gαj −αi = hβi −βj , from which we can compute x = logg h by solving the equation αj − αi ≡ (βi − βj )x mod p , provided that gcd(βi − βj , p) = 1. If gcd(βi − βj , p) 6= 1, we repeat the whole computation with another starting value y0 ; but this case is very rare for large group orders |G| = p. Remark 1. Under the assumption that f is a random mapping and that |G| is large enough such that a continuous approximation pis valid, the expected value of the right-hand side of (1) is approximately 1.229 π|G|/2 . See [16] for details.
3
The Iterating Functions
We next define some new iterating functions. For r ∈ IN, let T1 , . . . , Tr be a partition of G into r pairwise disjoint and roughly equally large sets. The set {1, 2, . . ., |G|} is denoted by [|1, |G|[|. To indicate that an element m is randomly chosen from the set M , according to the uniform distribution, we write m ∈R M . As in the case of Pollard’s original walk, together with the sequence (yi ) we always compute two sequences (αi ) and (βi ) which keep track of the exponents
Speeding Up Pollard’s Rho Method for Computing Discrete Logarithms
545
of g and h in the representation yi = gαi ∗ hβi . Therefore, together with the definition of each iterating function f we indicate how the application of f to yi effects on αi and βi . As before, we have y0 = gα with α ∈R [|1, |G|[|, and α0 = α, β0 = 0. 1. Pollard’s walk, modified. As in Pollard’s original walk, we use a partition of G into 3 sets. But now we let m, n ∈R [|1, |G|[| and put M = gm , N = hn . Then we define fPm : G → G, y ∈ T1 , M ∗ y , y ∈ T2 , y2 , fPm (y) = N ∗y , y ∈ T3 . For the sequences (αi ) and (βi ) this means or αi+1 = αi (mod|G|) , αi+1 ≡ αi + m , αi+1 ≡ 2αi , or βi+1 ≡ βi + n (mod|G|) , βi+1 ≡ βi , βi+1 ≡ 2βi , according to the three cases above. 2. Linear walk, with 20 multipliers. (This is the walk used in [17].) We use a partition of G into 20 sets T1 , . . . , T20 . Let m1 , n1 , . . . , m20 , n20 ∈R [|1, |G|[| and put s = 1, . . . , 20 . Ms = gms ∗ hns , Define fT : G → G, fT (y) = Ms ∗ y ,
with s = s(y) such that y ∈ Ts .
For the sequences (αi ) and (βi ) this means αi+1 ≡ αi + ms
and
βi+1 ≡ βi + ns
(mod|G|) ,
where s such that yi ∈ Ts . 3. Combined walk, with 16 multipliers and 4 squarings. Again, we use a partition of G into 20 sets. Choose 4 pairwise distinct numbers u1 , . . . , u4 between 1 and 20, and let m1 , n1 , . . . , m20 , n20 ∈R [|1, |G|[|. Put Ms = gms ∗ hns , Define fC : G → G, Ms ∗ y , fC (y) = y2 ,
s ∈ {1, . . . , 20} \ {u1 , u2 , u3 , u4} .
if s ∈ / {u1 , u2 , u3 , u4 } and with s s.th. y ∈ Ts , otherwise .
For the sequences (αi ) and (βi ) this means αi+1 ≡ αi + ms βi+1 ≡ βi + ns according to the cases above.
or or
αi+1 ≡ 2 ∗ αi (mod|G|) , βi+1 ≡ 2 ∗ βi (mod|G|) ,
546
Edlyn Teske
Note that only when using fT as iterating function we do not need to know the group order. An upper bound of |G| is already suitable for defining the multipliers M1 , . . . , M20 appropriately, and we have shown [17] how one can do without knowing anything about |G|. On the contrary, for fP , fPm and fC the knowledge of |G|, or at least of a multiple of it, is indispensable because of the otherwise exponential growth of (αi ) and (βi ). Since fPm and fC can be viewed as variations of fP and fT , we restrict our further discussion of the iterating functions to fP and fT . Let x = logg h. Then the terms yi can be written as yi = gei ,
where
ei ≡ αi + βi x mod p ,
i = 0, 1, 2, . . . .
This means there is a one-to-one correspondence between the walks (yi ) in G and the walks (ei ) on the integers modp. A truly random walk in G, for instance, corresponds to the walk e0 ∈R [|1, p[| ,
ei+1 ≡ ei + d mod p ,
with d ∈R [|1, p[| .
In this case, the numbers ei are uniformly distributed on the integers modp. Since we want to produce sequences (yi ) with expected preperiods and periods as close as possible to the case of the random walk, it is desirable that the distributions of the ei generated by our iterating functions get as close as possible to uniformly distributed on the integers modp, and this should happen after as few steps as possible. In the cases of both fP and fT , we have e0 ∈R [|1, p[|. In the case of fP , we then have ei+1 = ei + 1 ,
ei+1 ≡ 2 ∗ ei ,
or
ei+1 ≡ ei + x
(modp) ,
(2)
with s = s(yi ) such that yi ∈ Ts .
(3)
whereas fT produces ei+1 ≡ ei + (ms + ns x) mod p ,
At first glance at these equations, it is not clear how to predict anything about how fast and how close the sequences (ei ) get to uniformly distributed modp. Maybe the fact that one third of the steps in (2) has step-width one and one third has step-width x slows down the process of coming close to uniformly distributed? Maybe this effect varies for different sizes of group orders? One may also have some concerns that in the case of (3), the process of getting close to uniformly distributed slows down with increasing group orders, due to the fact that the number of multipliers in the definition of fT remains constant (= 20). We are not aware of any theoretical result about the behavior of the sequence given through (2). In the case of the sequence given through (3), we can make use of a result of Hildebrand [4], which we present in the following. Let n be a prime. Let k ≥ 2 and p1 , . . . , pk such that pj > 0 for all j and Pk a = (a1 , . . . , ak ) and j=1 pj = 1. For a1 , . . . , ak ∈ [|1, n[|, let ˜ if a = aj for some j , pj Pa˜ (a) = 0 otherwise .
Speeding Up Pollard’s Rho Method for Computing Discrete Logarithms
547
(m)
Then for m ∈ IN, let Pa˜ be the probability distribution of the sum of m independent random variables distributed as Pa˜ . We consider the random walk (ei ) on the integers modn defined by e0 = 0 ,
ei+1 = ei + a ,
i = 0, 1, 2, . . . ,
(4)
where each a is randomly chosen from the set a1 , . . . , ak , according to the prob(m) ability distribution Pa˜ . Then Pa˜ gives the probability distribution of the position of the random walk after m steps. Then, with the distance of a probability distribution P on a finite group G from the uniform distribution U being defined as 1 X 1 max |P (A) − U (A)| , kP − U k := P (v) − |G| = A⊆G 2 v∈G
we have the following theorem. Theorem 1 ([4]). Let pj (j = 1, . . . , k), ˜a and Pa˜ be as above. Given ε > 0, then for sufficiently large primes n there exists some constant γ > 0, which may depend on k and on the values for pj but not on n, such that for m = bγn2/(k−1) c we have (m) E(kPa˜ − U k) < ε , where the expectation is taken over a uniform choice of all possible a ˜ such that a1 , . . . , ak ∈ [|1, n[| and such that all values of a1 , . . . , ak are pairwise distinct. It is worth noting that Greenhalgh [3] has shown the following lower bound, which nicely complements Theorem 1. ˜ and Pa˜ be as above. Then there exists a Theorem 2. Let pj (j = 1, . . . , k), a value β = β(p1 , . . . , pk ) > 0 and n0 = n0 (p1 , . . . , nk ) such that for all choices of a ˜, m = bβn2/(k−1)c and n > n0 , (m)
kPa˜
− Uk ≥
1 . 4
Comparing the number m = bγn2/(k−1) c of steps after which the random walk (4) is expected p to be close to uniformly distributed with the expected number E(λ + µ) ≈ πn/2 of steps until the first match occurs in the rho method, we see that limn→∞ m/E(λ + µ) = 0 for k ≥ 6. Let us now go back to the walk (ei ) defined through (3). Let as = ms + ns x, s = 1, . . . , 20. Let X denote the number of pairwise distinct numbers in the set {a1 , . . . , a20}. We Q19 have P (X = 20) = l=0 (1 − l/n) > 0.99998 for n ≥ 107 , so that in the very most cases we work with k = 20 pairwise distinct numbers as . Apart from this, the situation of (3) differs from the situation of Theorem 1 only by the fact that ei+1 = ei + as with s such that yi ∈ Ts rather than s randomly chosen. These differences do not change the following conclusion from Theorems 1 and 2, which answers an open question in Teske [17].
548
Edlyn Teske
Corollary 1. If for the sequences (yi ) defined by fT we observe the same average performance for some range of prime group orders, this performance does not considerably change when passing over to much larger group orders. Remark 2. On the other hand, it is very likely that if a certain stable performance for any of the iterating functions fP , fPm , fT , fC is observed over a sufficiently large range of group orders, it will not considerably improve when passing over to much larger group orders.
4
Experimental Results
Using the computer algebra system LiDIA [9], we implemented the PohligHellman and the rho methods and conducted experiments to compare the performances of the iterating functions fP , fPm , fT and fC to solve the DLP in elliptic curve groups over prime fields of characteristic 6= 2, 3. In this section, we describe these experiments and give a representative selection of our experimental results. Let us first introduce elliptic curve groups over prime fields and the notation we use in the following. We refer to Koblitz [6] for an elementary introduction to elliptic curves, and to Silverman [15] for more details. So let q be a prime 6= 2, 3, and let IFq denote the field ZZ/qZZ of integers modulo q. Let a, b ∈ IFq such that 4a3 + 27b2 6= 0. Then the elliptic curve Ea,b over IFq is defined through the equation Ea,b : y2 = x3 + ax + b . The set of all solutions (X, Y ) ∈ IFq × IFq of this equation, together with the element O called the “point at infinity”, forms a finite Abelian group which we denote by Ea,b (IFq ). Usually, this group is written additively. But we remain in the multiplicative setting of the previous sections and therefore write it multiplicatively, which is just a matter of notation. For r ∈ {3, 20}, we define the partition of Ea,b (IFq ) into r sets T1 , . . . , Tr as follows. First we compute a rational approximation A of the golden mean √ ( 5 − 1)/2, with a precision of 2 + blog10 (qr)c decimal places. Let (AY ) mod 1 if P 6= O , P = P (X, Y ) 7→ u∗ : Ea,b (IFq ) → [0, 1) , 0 if P = O , where c mod 1 denotes the (non-negative) fractional part of c, namely c − bcc. Then let u : Ea,b (IFq ) → {1, . . . , r} ,
u(P ) = bu∗ (P ) · rc + 1
and Ts = {P ∈ Ea,b (IFq ) : u(P ) = s} . From the theory of multiplicative hash functions we know [5] that among all √ numbers between 0 and 1, choosing A as a rational approximation of ( 5 − 1)/2
Speeding Up Pollard’s Rho Method for Computing Discrete Logarithms
549
with a sufficiently large precision (that is, in comparison with the input size) leads to the most uniformly distributed hash values, even for non-random inputs. The precision indicated above is large enough such that all decimal places of the golden mean that are significant for the value of u(P ) appear in A. The purpose of our experiments is to produce data on which we can base reliable statements about the expected number of steps until a match is found. These statements are needed for all four iterating functions defined in Sections 2 and 3, and they have to be made in terms of the square root of the orders of the groups in which we use the rho method. For our experiments, this means that we actually do not need to perform all steps of the discrete logarithm computation (as presented in Section 2) in order to get the data we want: Given a discrete logarithm problem, we restrict ourselves to solving it in the subgroup whose order p is the largest prime factor of |G|, thus producing data relevant for groups of group order p. When using the rho method to compute this discrete logarithm, we count the number of steps we perform until we find a match. Then √ we determine the ratio R of the number of steps and p. We do this a couple of times for each iterating function, for a couple of DLPs, in a couple of groups of some group order p between 102 and 1012 . Let us describe this explicitly. First we produce a data file containing approximately 2000 6-tuples (q, a, b, n, p, k) with the following properties: q > 102 and prime, Ea,b is an elliptic curve over IFq , the corresponding elliptic curve group Ea,b (IFq ) has group order n, and p is the largest prime factor of n and has k digits, k ≥ 3. To compute a 6-tuple we select a number l, 2 ≤ l ≤ 20, randomly choose a prime q between 10l and 10l+1 , then randomly chose a, b ∈ (IFq )∗ and check whether 4a3 + 27b2 6= 0 mod q. If this is the case, we use our implementation for the group structure computation [17] or, for primes q > 109 , the implementation [8] of an algorithm of Atkin [1], to compute the order n of Ea,b (IFq ). Finally we factor n to find p and k. Having built up this file, for k = 3, 4, . . . , 13 we go through the following algorithm: 1. Read 6-tuple (q, a, b, n, p, k) from file. 2. Use the algorithm for group structure computation to find a group element g such that gn/p has group order p and therefore is a generator of the subgroup G(p) = {P n/p : P ∈ Ea,b (IFq )}. 3. Randomly choose h ∈ Ea,b (IFq ) and compute hn/p . 4. Put g0 = gn/p and h0 = hn/p and G = G(p). 5. Use the rho-method as described in Section 2 to compute logg0 h0 = logg h mod p. For each of the four iterating functions fP , fPm , fT , fC , do this st times, where st = st(k) and between 100 and 1. 6. For each of the four iterating functions, keep track of the average run times and of the average number of steps computed until a match has been found. 7. Go back to 1. until m 6-tuples have been used, where m = m(k) and between 100 and 30. Our results are listed in Tables 1 – 3.
550
Edlyn Teske
Table 1. DL-computation in groups of prime order, average number of steps √ Number of average (number of steps/ p) with Number of digits in Pollard’s Pollard’s linear combined walk examples largest prime original walk, walk, with 20 16 multipliers, computed factor p walk modified multipliers 4 squarings (m · st) 3 1.891 1.871 1.454 1.463 100 · 100 4 1.776 1.844 1.453 1.477 100 · 100 5 1.773 1.832 1.453 1.461 100 · 100 6 1.800 1.837 1.462 1.469 100 · 100 7 1.825 1.820 1.445 1.469 100 · 100 8 1.703 1.832 1.443 1.459 80 · 40 9 1.773 1.842 1.440 1.461 40 · 30 10 1.804 1.817 1.441 1.474 30 · 30 11 1.948 1.872 1.428 1.489 25 · 20 12 1.856 1.801 1.431 1.481 30 · 5 13 1.895 1.785 1.319 1.313 40 · 1 ave 1.807 1.841 1.452 1.467
In Table 1, each row shows the averages taken over the m ratios √ average number of steps until match is found p, (average taken over the st computations for the same DLP) where p denotes the order of the group in which the respective computation took place. In the last row we list the averages taken over all ratios in the rows above, where we weighted each ratio by the number of examples having contributed to it. We see a clear difference between the average performances of fP and fPm on the one hand and of fT and fC on the other hand. The performances are convincingly stable, which gives us a good basis for drawing conclusions in the sense of Corollary 1 and Remark 2. Note that the higher oscillation in the first row is mostly due to the fact that as soon as the partition {T1 , T2 , T3 } and the group elements g and h are defined, the functional graph associated with the map fP is completely determined. Therefore, when using Pollard’s original walk the only variation between the st different runs for the same DLP comes through the different starting points, but important properties such as the number of components of the graph or the cycle lengths in the components remain the same. We illustrate this phenomenon in Table 2. For both examples given in this table, we ran our algorithm st = 100 times. We see that while the average ratios for fPm , fT and fC differ only slightly from the values in Table 1, the ratios for fP differ considerably. We also show the average values taken over the 10 smallest numbers of steps and the 10 largest numbers of steps. These two examples are extreme but typical cases. In all cases where we took averages over 100 computations, the ratios for fPm varied between 1.6 and 2.0, for fT and fC between 1.2 and 1.7, but for fP we often found
Speeding Up Pollard’s Rho Method for Computing Discrete Logarithms
551
Table 2. Two examples with m = 1 Pollard’s Pollard’s lin. walk, comb. walk Number of original walk, with 20 16 multipl., examples walk modified multipl. 4 squarings (m · st) q = 422827, a = 334851, b = 138169, n = 422613, p = 46957, g = (29541, 46435), h = (105820, 396164), x = 7855 √ av.no.steps/ p 3.193 1.892 1.527 1.481 1 · 100 av.(10 smallest) 528 87 73 124 av. (10 largest) 938 769 766 617 q = 34158689, a = 5903203, b = 12110056, n = 34152717, p = 81901, g = (1663637, 28574918), h = (27578155, 12646030), x = 48707 √ av.no.steps/ p 0.974 1.77 1.415 1.488 1 · 100 av.(10 smallest) 115 86 95 109 av. (10 largest) 497 986 864 898
similar deviations as shown in Table 2. In this respect, the experiments with the modified walk fPm can be viewed as control experiments for fP . It is interesting to see what the different average ratios mean for the expected run times. Since we do not want do take average run times when different groups are involved, for k = 5, . . . , 13 we select one elliptic curve group each from the previously computed examples such that the largest prime factor p of the group √ order has k digits and such that the average ratio (number of steps)/ p is close to the corresponding value of Table 1. These ratios together with the average run times are listed in Table 3; all run times were taken on a SPARCstation ULTRA170. We see that for prime group orders up to seven digits, the smaller number of steps needed by fT and fC does not pay off in run time, whereas for prime group orders with 9 or more digits we notice a clear speed-up. This is due to the fact that for fT and fC we have to precompute the multipliers. Using the method of fast exponentiation for this, the precomputation requires O(log p) multiplications so that the run time for it becomes more and more negligible with increasing group orders. In our experiments, it never took more than two seconds to compute the multipliers. Finally, we want to recover the expected values for λ+µ from our experimental data. For this, we need the expected “delay factor” δ = E(l(λ, µ))/(λ + µ)), where l(λ, µ) denotes the number of steps until a match is found by our algorithm. For the case that the iterating function is a random function, an upper bound for δ is given in Remark 1: δ ≤ 1.229. A sharp value for δ can be found experimentally. For this, we run our algorithm for groups of small prime group orders, but we store the whole sequence. In each run, in addition to l(λ, µ) we determine λ + µ and compute the ratio of both numbers. Having done this 50 times, we take the average over these ratios. The results are shown in Table 4.
552
Edlyn Teske
Table 3. Selected run times, on SPARCstation ULTRA170 Number of digits in largest p, (p) 5 (62219) 6 (690611) 7 (2994463) 8 (29738497) 9 (102982171) 10 (1485244759) 11 (2189335923) 12 (416701214639) 13 (4105475030323)
√ no. of steps/ p run time √ no. of steps/ p run time √ no. of steps/ p run time √ no. of steps/ p run time √ no. of steps/ p run time √ no. of steps/ p run time √ no. of steps/ p run time √ no. of steps/ p run time √ no. of steps/ p run time
Pollard’s Pollard’s lin. walk, comb. walk original walk, with 20 16 multipl., walk modified multipl. 4 squarings 1.8 1.82 1.455 1.447 0.10 s 0.11 s 0.19 s 0.17 s 1.838 1.829 1.425 1.462 0.36 s 0.37 s 0.48 s 0.46 s 1.831 1.813 1.452 1.486 0.67 s 0.69 s 0.68 s 0.69 s 1.711 1.883 1.447 1.465 2.07 s 2.31 s 1.90 s 1.95 s 1.727 1.846 1.44 1.472 4.53 s 4.72 s 3.55 s 3.94 s 1.818 1.857 1.467 1.469 19.84 s 19.89 s 15.22 s 15.23 s 1.833 1.862 1.476 1.449 1 m 23.63 s 1 m 25.70 s 1 m 7.18 s 1 m 7.09 s 1.809 1.832 1.434 1.458 7 m 41.84 s 7 m 49.9 s 6 m 1.22 s 6 m 4.18 s 1.799 1.831 1.429 1.452 27 m 17.7 s 27 m 48.55 s 22 m 13.52 s 22 m 29.79 s
Dividing the average values of the last row of Table 1 by the corresponding average delay factors of Table 4, we obtain the following approximations of the expected values for λ + µ: p 1. Pollard’s original walk (fP ): E(λ + µ) ≈ 1.596 |G|. p 2. Pollard’s walk, modified (fPm ): E(λ + µ) ≈ 1.623 |G|. p 3. Linear walk, 20 multipliers (fT ): E(λ + µ) ≈ 1.292 |G|. p 4. Combined walk, 20 mult., 4 squarings (fC ): E(λ + µ) ≈ 1.3 |G|. Comparing these values, we see that we obtain a speed-up of more than 1.2 if we use fT instead of fP . Corollary 1 and Remark 2 ensure that this holds not only for group orders up to 1013 but also beyond this bound.
5 Acknowledgments This paper was written during the author’s stay at the Department of Computer Science of the University of Manitoba. The author wishes to thank Hugh Williams for giving her this wonderful opportunity. The author also wishes to thank Eric Bach for pointing out Hildebrand’s work, and Martin Hildebrand for his helpful comments on his results.
Speeding Up Pollard’s Rho Method for Computing Discrete Logarithms
553
Table 4. Delay factors Number of average(# steps to find match/(λ + µ)) with digits in Pollard’s Pollard’s lin. walk, comb. walk largest prime original walk, with 20 16 multipl., factor p walk modified multipl. 4 squarings 3 1.127 1.138 1.12 1.125 4 1.138 1.135 1.126 1.126 5 1.137 1.134 1.131 1.128 6 1.127 1.129 1.118 1.132 average 1.132 1.134 1.124 1.128
References 1. O. Atkin. The number of points on an elliptic curve modulo a prime. Manuscript. 2. R. P. Brent. An improved Monte Carlo factorization algorithm. BIT, 20:176–184, 1980. 3. A. Greenhalgh. Random walks on groups with subgroup invariance properties. PhD thesis, Department of Mathematics, Stanford University, 1989. 4. M. V. Hildebrand. Random walks supported on the random points of ZZ/nZZ . Probability Theory and Related Fields, 100:191–203, 1994. 5. D. E. Knuth. The art of computer programming. Volume 3: Sorting and searching. Addison-Wesley, Reading, Massachusetts, 1973. 6. N. Koblitz. A Course in Number Theory and Cryptography. Springer-Verlag, New York, 1987. 7. N. Koblitz. Elliptic curve cryptosystems. Mathematics of Computation, 48:203– 209, 1987. 8. F. Lehmann, M. Maurer, V. M¨ uller, and V. Shoup. eco - a tool for elliptic curve group order computations, 1997. TI, Technische Universit¨ at Darmstadt. 9. LiDIA Group, Technische Universit¨ at Darmstadt. LiDIA - A library for computational number theory. Available from http://www.informatik.tudarmstadt.de/TI/LiDIA. 10. V. Miller. Uses of elliptic curves in cryptography. In Advances in Cryptology CRYPTO ’85, volume 218 of Lecture Notes in Computer Science, pages 417–426, 1986. 11. S. C. Pohlig and M. E. Hellman. An improved algorithm for computing logarithms over GF (p) and its cryptographic significance. IEEE-Transactions on Information Theory, 24:106–110, 1978. 12. J. M. Pollard. Monte Carlo methods for index computation (mod p). Mathematics of Computation, 32(143):918–924, 1978. 13. C. P. Schnorr and H. W. Lenstra, Jr. A Monte Carlo factoring algorithm with linear storage. Mathematics of Computation, 43(167):289–311, 1984. 14. D. Shanks. Class number, a theory of factorization and genera. In Proc. Symp. Pure Math. 20, pages 415–440. AMS, Providence, R.I., 1971. 15. J. Silverman. The arithmetic of elliptic curves. Springer-Verlag, 1986. 16. E. Teske. New algorithms for finite abelian groups. PhD thesis, Technische Universit¨ at Darmstadt, 1998.
554
Edlyn Teske
17. E. Teske. A space efficient algorithm for group structure computation. To appear in Mathematics of Computation, 1998. 18. P. C. van Oorschot and M. J. Wiener. Parallel collision search with cryptanalytic applications. To appear in Journal of Cryptology.
A General Method of Constructing Global Function Fields with Many Rational Places Harald Niederreiter1 and Chaoping Xing2 1
Institute of Information Processing, Austrian Academy of Sciences, Sonnenfelsgasse 19, A–1010 Vienna, Austria [email protected] 2 Department of Information Systems and Computer Science The National University of Singapore, Singapore 11926 [email protected]
Abstract. We present a general method of constructing global function fields with many rational places based on Drinfeld modules of rank 1 and narrow ray class fields. This method leads to many improvements on previous constructions. We tabulate improvements for constant fields of order q = 4, 8, 9, 16, and 27.
1
Introduction
Let q be an arbitrary prime power and let K be a global function field with full constant field IFq , i.e., K is an algebraic function field over the finite field IFq with IFq algebraically closed in K. We use the notation K/IFq if we want to emphasize the fact that IFq is the full constant field of K. By a rational place of K we mean a place of K of degree 1. We write g(K) for the genus of K and N (K) for the number of rational places of K. For fixed g ≥ 0 and q we put Nq (g) = max N (K), where the maximum is extended over all global function fields K/IFq with g(K) = g. Equivalently, Nq (g) is the maximum number of IFq -rational points that a smooth, projective, absolutely irreducible algebraic curve over IFq of given genus g can have. The calculation of Nq (g) is a very difficult problem, so in most cases we have only bounds for this quantity. In an informal way, we say that a global function field K/IFq of genus g has many rational places if N (K) is reasonably close to Nq (g) or to a known upper bound for Nq (g). Global function fields with many rational places, or equivalently algebraic curves over IFq with many IFq -rational points, allow applications in algebraic coding theory (see [12], [13]) and in recent constructions of lowdiscrepancy sequences (see [5], [7], [9], [16]). In view of these applications, the subject of global function fields with many rational places has generated a lot of interest. We refer to Garcia and Stichtenoth [1] and Niederreiter and Xing [10], [11] for recent surveys of the literature. J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 555–566, 1998. c Springer-Verlag Berlin Heidelberg 1998
556
Harald Niederreiter and Chaoping Xing
In this paper we generalize a method due to the authors [8] for the construction of global function fields with many rational places. This method can be applied whenever q is composite and is based on narrow ray class fields obtained from Drinfeld modules of rank 1. The general method is quite powerful and leads to many improvements on previous constructions (see the tables in Section 3).
2
The General Method
We follow the notation and terminology in [8] as much as possible. For the general background on Drinfeld modules we refer to Goss [2] and Hayes [3]. Let F/IFq be a global function field with N (F ) ≥ 1. We distinguish a rational place ∞ of F , let A be the ∞-integral ring of F (i.e., the ring of elements of F that have no poles outside ∞) and HA the Hilbert class field of F with respect to A. Then [HA : F ] = h(F ), the divisor class number of F . We fix a sign function sgn and let φ be a sign-normalized Drinfeld A-module of rank 1 defined over HA . The additive group of the algebraic closure H A of HA forms an A-module under the action of φ. For a nonzero integral ideal M in A, let Λ(M ) be the M -torsion submodule of H A . The field E = HA(Λ(M )) generated by the elements of Λ(M ) over HA is called the narrow ray class field modulo M. This field is independent of the specific choice of the sign-normalized Drinfeld A-module φ of rank 1. We have Gal(E/F ) ' PicM (A) := IM (A)/RM (A), where IM (A) is the group of all fractional ideals of A that are prime to M and RM (A) is the subgroup of IM (A) consisting of all principal ideals bA with sgn(b) = 1 and b ≡ 1 mod M . Note that (1)
|PicM (A)| = h(F )|(A/M )∗|,
where (A/M )∗ is the group of units of the ring A/M . For r ≥ 2 the constant field extension Fr = IFqr · F of F is viewed as a global function field with full constant field IFqr . Then the place ∞ can also be viewed as a rational place of Fr /IFqr with ∞-integral ring Ar of Fr . Let Q 6= ∞ be a place of F of degree d with gcd(d, r) = 1. Then similarly, Q is a place of Fr /IFqr of the same degree d. Note that Q corresponds to a nonzero prime ideal in A and Ar , respectively. For a given n ≥ 1, the groups (Ar /Qn )∗ and PicQn (A) can both be viewed as subgroups of PicQn (Ar ), as explained in [8, p. 84]. Let I∞ be the inertia group of ∞ in the extension HAr (Λ(Qn ))/Fr . Then I∞ is a subgroup of (Ar /Qn )∗ isomorphic to IF∗qr and I∞ is also the decomposition group of ∞ in the same extension. With the notation above we have the following auxiliary result. Lemma 1. Let T be a subgroup of I∞ . Then: (i) (Ar /Qn )∗ ∩ (T · PicQn (A)) = T · (A/Qn )∗ ; (ii) |T ∩ PicQn (A)| = gcd(|T |, q − 1). Proof. (i) This result generalizes [8, Lemma 1] and its proof is analogous to that of [8, Lemma 1].
Function Fields with Many Rational Places
557
(ii) This follows from the fact that I∞ is formed by the residue classes mod Qn of the elements of IF∗qr , so that I∞ ∩ PicQn (A) is a cyclic subgroup of (A/Qn )∗ of order q − 1. u t The following result generalizes Theorems 1 and 2 of [8]. We note that in the latter theorems and their corollaries in [8], we can replace the equalities in the results on N (K) by lower bounds since in the proofs we considered only rational places of K lying over IFq -rational places rather than over arbitrary IFqr -rational places. Theorem 1. Let F/IFq be a global function field of genus g(F ). For an integer r ≥ 2 let t be a positive divisor of q r − 1 and put s = gcd(q − 1, t). Suppose that F has a place of degree d with gcd(d, r) = 1 and that N (F ) ≥ 1 + εd , where εd = 1 if d = 1 and εd = 0 if d ≥ 2. Then for every integer n ≥ 1 there exists a global function field Kn /IFqr with s(q r − 1)h(Fr ) · 2g(Kn ) − 2 = t(q − 1)h(F )
s(q r − 1) t(q − 1) r s(q − 1) + t(q − 1)
+
(q − 1)(q dr − 1) d(r−1)(n−1) q (2g(F ) + dn − 2) (q d − 1)(q r − 1) d(q − 1)(q dr − 1)(q d(r−1)(n−1) − 1) −d − (q d − 1)(q r − 1)(q d(r−1) − 1) h(Fr ) −1 d h(F ) (q − 1)(q dr − 1)h(Fr ) d(r−1)(n−1) −1 q (q d − 1)(q r − 1)h(F )
and N (Kn ) ≥
s(q dr − 1)h(Fr ) d(r−1)(n−1) q (N (F ) − 1 − εd ) t(q d − 1)h(F ) +
(q − 1)(q dr − 1)h(Fr ) d(r−1)(n−1) h(Fr ) q εd . + (q d − 1)(q r − 1)h(F ) h(F )
Proof. Let ∞ be a distinguished rational place of F/IFq and let Q 6= ∞ be a place of F/IFq of degree d. As noted above, Q is still a place of degree d of Fr /IFqr . For given n ≥ 1 let En = HAr (Λ(M )) be the narrow ray class field modulo the ideal M = Qn in Ar . Then we can identify Gal(En/Fr ) with PicM (Ar ). Let T be a subgroup of I∞ with |T | = t. Now let Kn be the subfield of the extension En /Fr fixed by the subgroup H = T ·PicM (A) of PicM (Ar ). We have |H| =
t |T | · |PicM (A)| = h(F )(q d − 1)q d(n−1) |T ∩ PicM (A)| s
by (1) and Lemma 1(ii), and so (2)
[Kn : Fr ] =
s(q dr − 1)h(Fr ) d(r−1)(n−1) |PicM (Ar )| = q , |H| t(q d − 1)h(F )
558
Harald Niederreiter and Chaoping Xing
where we used again (1). The rational place of Fr /IFqr lying over ∞ is again denoted by ∞. Let P∞ be a place of Kn lying over ∞. Then the inertia group of P∞ in the extension En /Kn is I∞ ∩ H, and so the ramification index e(P∞ |∞) of P∞ over ∞ is given by |I∞ · H| |I∞ · PicM (A)| |I∞ | = = |I∞ ∩ H| |H| |H| s(q r − 1) |I∞ | · |PicM (A)| = , = |I∞ ∩ PicM (A)| · |H| t(q − 1)
e(P∞ |∞) =
where we used Lemma 1(ii) in the last step. Let R be a place of Kn lying over Q. Since the inertia group of Q in En /Fr is (Ar /M )∗ by the theory of narrow ray class fields (compare with [8, Propositions 1 and 2]), the inertia group of R in En /Kn is (Ar /M )∗ ∩ H = T · (A/Qn )∗ in view of Lemma 1(i). Thus, the ramification index e(R|Q) of R over Q is given by (3) |(Ar /M )∗| · |T ∩ (A/Qn )∗ | s(q dr − 1) d(r−1)(n−1) |(Ar /M )∗ | = = q . e(R|Q) = n ∗ n ∗ |T · (A/Q ) | |T | · |(A/Q ) | t(q d − 1) Let Ln be the subfield of En /Fr fixed by I∞ ·PicM (A). Then ∞ is unramified in Ln /Fr , and from the special case t = q r − 1 in (3) we get that Q is ramified in Ln /Fr with ramification index (q − 1)(q dr − 1) d(r−1)(n−1) q . (q r − 1)(q d − 1) Furthermore, from (2) applied to Kn and Ln we obtain [Kn : Ln ] =
s(q r − 1) . t(q − 1)
It follows that the places of Ln lying over ∞ or Q are all totally and tamely ramified in the extension Kn /Ln , and by the theory of narrow ray class fields these are the only ramified places in this extension. By the proofs of Theorems 1 and 2 in [8], the sum of the degrees of the places of Ln lying over Q is dh(Fr )/h(F ). Now we apply the Hurwitz genus formula to the extension Kn /Ln and note that ∞ splits completely in Ln /Fr , then we obtain r h(Fr ) s(q r − 1) s(q − 1) (2g(Ln ) − 2) + −1 d 2g(Kn ) − 2 = t(q − 1) t(q − 1) h(F ) r (q − 1)(q dr − 1)h(Fr ) d(r−1)(n−1) s(q − 1) −1 q . + t(q − 1) (q d − 1)(q r − 1)h(F ) The desired formula for g(Kn ) follows now from the formulas for g(Ln ) in Theorems 1 and 2 in [8].
Function Fields with Many Rational Places
559
By construction, all rational places of Fr counted by N (F ), with the possible exception of ∞ and Q, split completely in Kn /Fr . By what we have shown above, ∞ splits into (q − 1)(q dr − 1)h(Fr ) d(r−1)(n−1) q (q d − 1)(q r − 1)h(F ) rational places in Kn /Fr . If d = 1, then Q splits into h(Fr )/h(F ) rational places of Ln , as shown in the proof of Theorem 1 in [8], and we have noted above that these are totally ramified in Kn /Ln . Putting these facts together and using (2), t we get the desired lower bound on N (Kn ). u
3
Tables
Theorem 1 is a powerful tool for constructing global function fields with many rational places and thus for getting lower bounds on Nqr (g). Tables 2 to 6 list examples of global function fields K with full constant fields IF4 , IF8 , IF9 , IF16 , and IF27 that are obtained from Theorem 1 and yield at least as large a value of N (K) as the previously best example (according to the tables in [11], [14]) for the given genus. As an additional condition for including a field K/IFqr in these tables we have used √ N (K) > ( q r − 1)g(K), which is suggested by the Vlˇ adut-Drinfeld bound [15]. The resulting tables are much more extensive than those obtained by the methods of other authors such as Lauter [4]. In Tables 2 to 6 we list data that are required in Theorem 1, namely the values of g(F ), d, t, and n. If g(F ) ≥ 1, then in the column labeled “F ” there is a pointer to the list of explicitly described base fields F in Table 1. In the second column of Tables 2 to 6, the first number is a lower bound for Nqr (g) obtained from Theorem 1 and the second is an upper bound for Nqr (g). If only one number is given, then this is the exact value of Nqr (g). A program for calculating upper bounds for Nqr (g), which is based on Weil’s explicit formula for the number of rational places in terms of the zeta function and on the trigonometric polynomials of Oesterl´e, was kindly supplied to us by Jean-Pierre Serre. In Table 1 we list the base fields F/IFq that are needed for Tables 2 to 6. The field F = IFq (x, y) is given either by a reference or by the defining equation of y over IFq (x). The quotient h(Fr )/h(F ) of divisor class numbers, that will simply be denoted by hr /h in Table 1, is obtained by standard methods (compare with [12, Chapter V]) from the L-polynomial LF (u) = (1 − u)(1 − qu)ZF (u) of F , where ZF (u) is the zeta function of F .
560
Harald Niederreiter and Chaoping Xing
Table 1. Base fields F/IFq q g(F ) N (F ) equation or reference 2 1 5 y 2 + y = x3 + x 2 1 4 y2 + y = (x + 1)2 /x 2 1 3 y2 + y = x(x2 + x + 1) 2 2 6 y2 + y = x(x + 1)/(x3 + x + 1) F.5 2 2 5 y2 + y = x2 (x + 1)(x2 + x + 1) F.6 2 2 4 y2 + y = x/(x3 + x + 1) F.1 F.2 F.3 F.4
F.7 2 F.8 2
2 3
F.9 3 F.10 3 F.11 3 F.12 3 F.13 3
1 1 1 1 2
F.14 3
2
F.15 4 F.16 4 F.17 4
1 1 1
F.18 4
1
F.19 4 F.20 4
1 2
LF (u) h2 /h h3 /h 2u2 + 2u + 1 1 1 2u2 + u + 1 2 1 2u2 + 1 3 4u4 + 6u3 + 5u2 + 1 4 3u + 1 4u4 + 4u3 + 4u2 + 3 2u + 1 4u4 + 2u3 + 3u2 + 5 u+1 3 y2 + y = x(x2 + x + 1)2 4u4 + 2u2 + 1 7 7 [6, Example 3A] 8u6 + 16u5 + 18u4 + 1 7 15u3 + 9u2 + 4u + 1 7 y 2 = x3 − x + 1 3u2 + 3u + 1 1 4 2 2 6 y = x(x + x − 1) 3u2 + 2u + 1 2 3 5 y 2 = x3 − x2 + 1 3u2 + u + 1 3 4 y2 = −x(x2 + 1) 3u2 + 1 4 7 2 6 2 8 y = x −x +1 9u4 + 12u3 + 9u2 + 3 16 4u + 1 7 y 2 = x5 − x + 1 9u4 + 9u3 + 7u2 + 5 19 3u + 1 9 y 2 + y = x3 4u2 + 4u + 1 1 2 2 8 y + y = (x + x + 1)/x 4u2 + 3u + 1 2 7 y2 + y = αx(x + 1)(x + α) 4u2 + 2u + 1 3 with α2 + α = 1 6 y2 + y = α(x2 + x + 1)/x 4u2 + u + 1 4 with α2 + α = 1 5 y2 + y = x2 (x + 1) 4u2 + 1 5 2 3 10 y + y = x/(x + x + 1) 16u4 + 20u3 + 13u2 + 5 5u + 1
Function Fields with Many Rational Places
Table 2. Constructions over IF4 g 3 4 5 6 7 8 9 10 13 15 18 21 25 26 29 30 31 33 34 35 37 41 43 46 48 49 51 57 59 61 65 70 76 81 88 91 92 94 97 101 105 109 113 114 115
N4 (g) g(F ) F d t n 14 1 F.2 1 3 2 15 1 F.3 1 3 2 17-18 0 134 20 2 F.6 1 3 1 21-22 0 311 21-24 2 F.7 1 3 1 26 1 F.2 1 3 3 27-28 1 F.3 3 3 1 33 1 F.1 1 3 4 33-37 0 531 41-42 1 F.1 1 1 3 41-47 2 F.4 1 3 4 51-53 2 F.5 1 3 3 55 1 F.1 5 3 1 49-60 3 F.8 1 3 4 53-61 2 F.4 1 1 3 60-63 2 F.6 3 3 1 65-66 1 F.1 1 3 5 57-68 3 F.8 3 1 1 58-69 1 F.2 1 1 3 66-72 2 F.4 5 3 1 65-78 2 F.6 1 3 3 72-81 0 332 81-86 1 F.1 1 1 4 77-89 3 F.8 5 3 1 81-90 2 F.4 1 3 5 88-93 1 F.2 5 3 1 63-102 2 F.7 1 3 3 77-105 0 511 99-108 2 F.5 1 3 4 98-114 1 F.2 1 3 5 105-121 2 F.4 1 1 4 99-130 1 F.3 5 3 1 129-137 1 F.1 1 3 6 123-147 2 F.5 1 1 3 144-151 2 F.4 3 3 2 143-152 1 F.1 5 1 1 129-155 3 F.8 1 1 4 99-159 1 F.3 1 3 5 125-165 2 F.6 1 3 4 129-170 0 731 165-176 2 F.5 5 3 1 161-181 2 F.4 1 3 6 161-183 1 F.1 1 1 5 168-184 3 F.8 3 3 2
g 121 125 145 148 154 158 161 162 181 183 191 193 199 208 210 234 241 257 274 295 298 321 337 370 373 379 449 451 466 492 571 577 621 705 750 766 769 937 1015 1108 1207 1731 2083 2435
N4 (g) g(F ) F d t n 150-192 2 F.6 3 1 1 176-198 2 F.4 5 1 1 195-225 2 F.5 1 3 5 215-229 1 F.1 7 3 1 168-237 0 312 209-243 3 F.8 5 1 1 194-247 1 F.2 1 3 6 209-248 2 F.4 1 1 5 220-274 2 F.6 5 3 1 220-276 1 F.2 5 1 1 258-287 2 F.4 7 3 1 257-290 1 F.1 1 3 7 216-298 1 F.3 3 3 2 243-309 2 F.5 1 1 4 257-312 3 F.8 1 1 5 301-343 3 F.8 7 3 1 245-353 2 F.6 1 3 5 321-373 2 F.4 1 3 7 321-396 1 F.1 1 1 6 344-423 1 F.2 7 3 1 384-427 2 F.4 3 1 2 385-456 3 F.8 1 3 7 387-477 2 F.5 1 3 6 417-520 2 F.4 1 1 6 429-523 2 F.5 5 1 1 456-531 3 F.8 3 1 2 513-619 1 F.1 1 3 8 480-622 2 F.6 3 3 2 513-641 3 F.8 1 1 6 559-673 1 F.1 7 1 1 645-772 2 F.5 7 3 1 641-779 2 F.4 1 3 8 688-834 2 F.4 7 1 1 769-939 3 F.8 1 3 8 817-994 3 F.8 7 1 1 855-1014 1 F.1 9 3 1 771-1018 2 F.5 1 3 7 1026-1223 2 F.4 9 3 1 1152-1318 2 F.4 3 3 3 1197-1430 3 F.8 9 3 1 1344-1550 3 F.8 3 3 3 1760-2179 1 F.1 5 3 2 2112-2596 2 F.4 5 3 2 2464-3011 3 F.8 5 3 2
561
562
Harald Niederreiter and Chaoping Xing
Table 3. Constructions over IF8 g N8 (g) g(F ) F d t n 6 33-36 0 173 9 45-47 0 211 45 144-156 0 272 53 120-179 2 F.4 1 1 1 54 129-181 0 174 77 195-242 1 F.1 4 7 1 78 175-245 3 F.8 1 7 2 93 192-284 1 F.2 2 7 2 118 257-348 1 F.1 1 7 4 141 259-407 3 F.8 1 1 1 149 324-428 2 F.4 1 7 3 225 453-616 0 571 376 755-977 1 F.1 5 7 1 461 936-1178 2 F.4 4 7 1
Function Fields with Many Rational Places
Table 4. Constructions over IF9 g N9 (g) g(F ) F d t n 1 16 0 142 3 28 0 112 5 32-36 1 F.10 1 8 2 6 35-40 2 F.14 1 8 1 7 39-43 1 F.11 1 8 2 9 40-51 1 F.12 1 8 2 12 55-63 1 F.9 1 8 3 15 64-74 1 F.9 1 1 2 19 84-88 1 F.10 3 8 1 21 82-95 0 184 22 78-98 2 F.13 1 1 1 23 92-101 1 F.10 1 8 3 24 91-104 0 311 25 64-108 1 F.12 1 4 2 28 105-117 1 F.11 3 8 1 29 104-120 1 F.10 1 1 2 34 111-136 1 F.11 1 8 3 36 110-142 2 F.14 1 1 1 37 120-145 2 F.13 1 4 2 43 120-164 1 F.11 1 1 2 45 112-170 1 F.12 1 8 3 47 154-177 1 F.10 3 4 1 48 163-180 1 F.9 1 8 4 49 168-183 2 F.13 3 8 1 52 175-192 1 F.9 3 1 1 55 164-201 1 F.10 1 4 3 60 190-217 1 F.9 1 1 3 61 192-220 2 F.13 1 8 3 70 189-247 1 F.11 3 4 1 79 228-273 2 F.13 1 1 2 81 245-279 2 F.14 3 8 1 82 192-282 1 F.11 1 4 3 90 244-304 0 581 93 196-313 1 F.12 3 4 1 95 272-318 1 F.10 1 8 4 101 275-335 2 F.14 1 8 3 102 244-338 0 185 103 294-341 1 F.10 3 1 1
g N9 (g) g(F ) F d t n 109 298-358 1 F.9 1 4 4 112 315-366 2 F.13 3 4 1 119 308-386 1 F.10 1 1 3 131 320-419 2 F.14 1 1 2 136 354-433 2 F.13 1 4 3 142 327-449 1 F.11 1 8 4 151 427-474 1 F.9 5 8 1 154 357-483 1 F.11 3 1 1 183 487-563 1 F.9 1 8 5 186 455-571 2 F.14 3 4 1 212 427-642 0 541 217 488-656 1 F.10 1 4 4 223 570-672 2 F.13 1 8 4 226 500-681 2 F.14 1 4 3 231 568-694 1 F.9 1 1 4 238 609-713 2 F.13 3 1 1 286 678-840 2 F.13 1 1 3 301 732-879 1 F.10 5 8 1 334 793-965 1 F.9 5 4 1 365 812-1045 1 F.10 1 8 5 367 756-1050 0 382 371 815-1061 2 F.14 1 8 4 396 875-1125 2 F.14 3 1 1 406 892-1151 1 F.9 1 4 5 451 915-1267 1 F.11 5 8 1 487 1056-1359 2 F.13 1 4 4 556 1323-1536 1 F.9 3 8 2 634 1464-1735 2 F.13 5 8 1 667 1342-1819 1 F.10 5 4 1 669 1459-1824 1 F.9 1 8 6 700 1525-1903 1 F.9 5 1 1 790 1704-2132 2 F.13 1 8 5 1056 2135-2790 2 F.14 5 8 1 1111 2268-2925 1 F.10 3 8 2 1207 2457-3160 1 F.9 3 4 2 1366 2745-3548 2 F.13 5 4 1 1912 3829-4875 1 F.9 7 8 1 2233 4536-5652 2 F.13 3 8 2
563
564
Harald Niederreiter and Chaoping Xing
Table 5. Constructions over IF16 g N16 (g) g(F ) F d t n 6 65 0 112 36 185-223 2 F.20 1 5 2 37 208-228 1 F.16 3 5 1 43 226-259 1 F.16 1 5 3 51 250-295 1 F.16 1 1 2 54 257-309 0 154 55 273-313 1 F.17 3 5 1 58 273-327 0 311 60 257-336 0 113 64 291-354 1 F.17 1 5 3 73 312-393 1 F.18 3 5 1 76 315-407 1 F.17 1 1 2 85 324-446 1 F.18 1 5 3 91 325-472 1 F.19 3 5 1 101 340-516 1 F.18 1 1 2 106 325-538 1 F.19 1 5 3 118 513-590 1 F.15 1 5 4 123 533-611 1 F.15 3 1 1 140 577-685 1 F.15 1 1 3 156 650-754 2 F.20 3 5 1 186 725-884 2 F.20 1 5 3 226 825-1054 2 F.20 1 1 2 235 898-1090 1 F.16 1 5 4 245 936-1131 1 F.16 3 1 1 279 994-1267 1 F.16 1 1 3 306 1025-1374 0 551 352 1155-1557 1 F.17 1 5 4 367 1209-1616 1 F.17 3 1 1 511 1845-2181 1 F.15 5 5 1 598 2049-2521 1 F.15 1 5 5 716 2305-2980 1 F.15 1 1 4 906 2885-3719 2 F.20 1 5 4 936 2990-3835 2 F.20 3 1 1 1021 3280-4163 1 F.16 5 5 1 1195 3586-4812 1 F.16 1 5 5 2476 7488-9525 1 F.15 3 5 2
Function Fields with Many Rational Places
565
Table 6. Constructions over IF27 g N27 (g) g(F ) F d t n 17 128-185 2 F.13 1 13 1 19 126-199 1 F.10 2 13 1 20 133-207 2 F.14 1 13 1 25 196-242 1 F.9 2 13 1 33 220-298 1 F.9 1 13 2 36 244-319 0 1 13 3 37 162-326 1 F.10 1 1 1 42 280-360 0 2 11 43 196-367 1 F.12 2 13 1 48 244-402 0 1 12 49 268-409 1 F.9 1 1 1 209 896-1404 2 F.13 2 13 1 References 1. A. Garcia and H. Stichtenoth, Algebraic function fields over finite fields with many rational places, IEEE Trans. Inform. Th. 41, 1548–1563 (1995). 2. D. Goss, Basic Structures of Function Field Arithmetic, Springer, Berlin, 1996. 3. D.R. Hayes, A brief introduction to Drinfeld modules, The Arithmetic of Function Fields (D. Goss, D.R.Hayes, and M.I. Rosen, eds.), pp. 1–32, W. de Gruyter, Berlin, 1992. 4. K. Lauter, Ray class field constructions of curves over finite fields with many rational points, Algorithmic Number Theory (H. Cohen, ed.), Lecture Notes in Computer Science, Vol. 1122, pp. 187–195, Springer, Berlin, 1996. 5. H. Niederreiter and C.P. Xing, Low-discrepancy sequences and global function fields with many rational places, Finite Fields Appl. 2, 241–273 (1996). 6. H. Niederreiter and C.P. Xing, Explicit global function fields over the binary field with many rational places, Acta Arith. 75, 383–396 (1996). 7. H. Niederreiter and C.P. Xing, Quasirandom points and global function fields, Finite Fields and Applications (S. Cohen and H. Niederreiter, eds.), London Math. Soc. Lecture Note Series, Vol. 233, pp. 269–296, Cambridge University Press, Cambridge, 1996. 8. H. Niederreiter and C.P. Xing, Drinfeld modules of rank 1 and algebraic curves with many rational points. II, Acta Arith. 81, 81–100 (1997). 9. H. Niederreiter and C.P. Xing, The algebraic-geometry approach to low-discrepancy sequences, Monte Carlo and Quasi-Monte Carlo Methods 1996 (H. Niederreiter et al., eds.), Lecture Notes in Statistics, Vol. 127, pp. 139–160, Springer, New York, 1997. 10. H. Niederreiter and C.P. Xing, Algebraic curves over finite fields with many rational points, Proc. Number Theory Conf. (Eger, 1996), W. de Gruyter, Berlin, to appear. 11. H. Niederreiter and C.P. Xing, Global function fields with many rational places and their applications, Proc. Finite Fields Conf. (Waterloo, 1997), submitted. 12. H. Stichtenoth, Algebraic Function Fields and Codes, Springer, Berlin, 1993. 13. M.A. Tsfasman and S.G. Vlˇ adut, Algebraic-Geometric Codes, Kluwer, Dordrecht, 1991. 14. G. van der Geer and M. van der Vlugt, Tables for the function Nq (g), preprint, 1997.
566
Harald Niederreiter and Chaoping Xing
15. S.G. Vlˇ adut and V.G. Drinfeld, Number of points of an algebraic curve, Funct. Anal. Appl. 17, 53–54 (1983). 16. C.P. Xing and H. Niederreiter, A construction of low-discrepancy sequences using global function fields, Acta Arith. 73, 87–102 (1995).
Lattice Basis Reduction in Function Fields
Sachar Paulus Institute of Theoretical Computer Science Darmstadt University of Technology, 64283 Darmstadt, Germany [email protected]
Abstract. We present an algorithm for lattice basis reduction in function fields. In contrast to integer lattices, there is a simple algorithm which provably computes a reduced basis in polynomial time. This algorithm works only with the coefficients of the polynomials involved, so there is no polynomial arithmetic needed. This algorithm can be generically extended to compute a reduced basis starting from a generating system for a lattice. Moreover, it can be applied to lattices over the field of puiseux expansions of a function field. In that case, this algorithm represents one major step towards an efficient arithmetic in Jacobians of curves.
1
Previous Work
In [5], A. Lenstra published a work on factoring multivariate polynomials over finite fields. Part of the problem was solved by computing a smallest vector of a lattice in a polynomial ring. To solve this problem, he formulated an algorithm which works “only” with coefficients of the finite field. The “only” means that except addition and subtraction no polynomial arithmetic is performed; every reduction step consists in the solution of a triangular linear system of equations with coefficients in the finite field. A. Lenstra proposed this algorithm for lattice bases which are not necessarily of full rank. We argue that this algorithm can also be used (with some minor changes) for computing a reduced basis starting from a generating system. The main argument for its correctness is analogous to the MLLL justification. Moreover, there is no need to restrict it to polynomials over finite fields; we formulate it for polynomials over any field. The analogon to “real” lattices are lattices over K((X)), the field of puiseux expansions over the field K. The reduction algorithm proposed in this paper can also be applied to such lattices, although some precision problems have to be dealt with and consequently its complexity is not as predictable as in the J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 567–574, 1998. c Springer-Verlag Berlin Heidelberg 1998
568
Sachar Paulus
“integral” case. Such a variant of the algorithm has been proposed in [9] to compute integral bases of function fields. An appropriately adapted version can be used to formulate a reasonably fast arithmetic in Jacobians of curves of higher degree.
2
Reduced Lattice Bases in Function Fields
Let n be a positive integer and K a field. For a function g ∈ K[X] we denote by |g| its degree in X. The norm |a| of a n-dimensional vector a = (a1 , . . . , an ) ∈ K[X] is defined as max{|aj | : 1 ≤ j ≤ n}. Let b1 , b2 , . . . , bn ∈ K[X]n be linearly independent over K(X). The lattice L ⊂ K[X]n of rank n spanned by b1 , . . . , bn is defined as n n X X K[X]bj = rj bj : rj ∈ K[X] (1 ≤ i ≤ n) . L= j=1
j=1
The determinant d(L) ∈ K[X] of L is defined as the determinant of the n × n matrix B having the vectors b1 , . . . , bn as columns. The value of d(L) does not depend on the choice of a basis of L up to units of K. The orthogonality defect OD(b1 , . . . , bn ) of a basis b1 , . . . , bn for a lattice L is defined as n X
|bi | − |d(L)|.
i=1
Clearly OD(b1 , . . . , bn ) ≥ 0. For 1 ≤ j ≤ n a j-th successive minimum |mj | of L is defined as the norm of a vector mj of smallest norm in L that is linearly independent of m1 , . . . , mj−1 over K(X). |mj | is independent of the particular choice of m1 , . . . , mj−1 . See [6]. Proposition 1. Let b1 , . . . , bn be a basis for a lattice L with OD(b1 , . . . , bn ) = 0, ordered in such a way that |bi | ≤ |bj | for 1 ≤ i < j ≤ n. Then |bj | is a j-th successive minimum of L for 1 ≤ j ≤ n. Proof. See [5]. We say that the basis b1 , . . . , bn is reduced if OD(b1 , . . . , bn) = 0.
Lattice Basis Reduction in Function Fields
569
Proposition 2. Let b1 , . . . , bn be a basis for a lattice L and denote bi,j the j-th coordinate of bi . If the coordinates of the vectors b1 , . . . , bn can be permuted in such a way that they satisfy for 1 ≤ i < j ≤ n and 1. |bi | ≤ |bj | for 1 ≤ j < i < k ≤ n, 2. |bi,j | < |bi,i | ≤ |bi,k | then the basis b1 , . . . , bn is reduced. Proof. The second condition implies that d(L) = reduced.
Pn
j=1 |bj |,
so b1 , . . . , bn is
The second condition is illustrated by the following figure, where the i-th column of the matrix is bi . The j-th position in the i-th column gives the condition that holds for |bi,j |: = |b1 | < |b2 | < |b3 | · · · < |bn | ≤ |b1 | = |b2 | < |b3 | · · · < |bn | ≤ |b1 | ≤ |b2 | = |b3 | · · · < |bn | .. .. .. .. . . . . ≤ |b1 | ≤ |b2 | ≤ |b3 | · · · = |bn | We extend this theory to the case of a lattice whose rank is smaller than n. Let m be a positive integer < n, let b1 , . . . , bm ∈ K[X] be linearly independent over K(X) and let L be the lattice in K[X]n of rank m spanned by b1 , . . . , bm . Denote by B the n × m matrix having the bi as columns. We define the determinant d(L) of L to be the maximum of the norms of the determinants of the m × m submatrices of B. The orthogonality defect is again defined as OD(b1 , . . . , bm ) = Pm |b | − d(L). A basis is called reduced if OD(b1 , . . . , bm) = 0. If the vectors i i=1 are sorted according to their norm, then |bi | is a i-th successive minimum of L. We have an analogous proposition to the one above: Proposition 3. Let b1 , . . . , bm be a basis for a lattice L of rank m < n and denote bi,j the j-th coordinate of bi . If the coordinates of the vectors b1 , . . . , bm can be permuted in such a way that they satisfy for 1 ≤ i < j ≤ m and 1. |bi | ≤ |bj | for 1 ≤ j < i ≤ m and i < k ≤ n, 2. |bi,j | < |bi,i | ≤ |bi,k | then the basis b1 , . . . , bm is reduced.
570
Sachar Paulus
Proof. The second condition implies that d(L) = reduced.
Pn
j=1 |bj |,
so b1 , . . . , bn is
The second condition is illustrated by the following figure, where the i-th column of the matrix is bi . The j-th position in the i-th column gives the condition that holds for |bi,j |: = |b1 | < |b2 | < |b3 | · · · < |bm| ≤ |b1 | = |b2 | < |b3 | · · · < |bm| ≤ |b1 | ≤ |b2 | = |b3 | · · · < |bm| .. .. .. .. . . . . ≤ |b1 | ≤ |b2 | ≤ |b3 | · · · = |bm| ≤ |b1 | ≤ |b2 | ≤ |b3 | · · · ≤ |bm| . .. .. .. .. . . . ≤ |b1 | ≤ |b2 | ≤ |b3 | · · · ≤ |bm| Finally, we want to compute a reduced basis starting from a generating system. Therefore we need the following Proposition 4. Let b1 , . . . , bm be a generating system for a lattice L and denote bi,j the j-th coordinate of bi . If the coordinates of the vectors b1 , . . . , bm can be permuted in such a way that they satisfy for 1 ≤ i < j ≤ m and 1. |bi | ≤ |bj | for 1 ≤ j < i ≤ m and i < k ≤ n, 2. |bi,j | < |bi,i | ≤ |bi,k | then the system b1 , . . . , bm forms a (reduced) basis of L. Proof. The determinant of the submatrix (bi,j )i,j=1,... ,m has the largest degree Qm of all m × m submatrices, namely i=1 |bi | and is obviously 6= 0 . If b1 , . . . , bm were linear dependent, then the vectors resulting from cutting the last n − m coefficients were also linear dependent and thus the determinant would be 0 which is a contradiction. Thus b1 , . . . , bm are linear independent over K(X) and so form a basis. We have formulated these facts for “integral” lattices, i.e. lattices over K[X] for simplicity. The same facts can be showed with analogous arguments for “real” lattices, i.e. lattices over the field of puiseux expansions ) ( n X ai x i : ai ∈ K . K((X)) = −∞
We will formulate an application in this setting in the last section.
Lattice Basis Reduction in Function Fields
3
571
The Algorithm
We will now describe an algorithm which will compute a reduced basis of a lattice of full rank given by a generating system of vectors. In the course of the algorithm the coordinates of the vectors will be permuted several times. The original ordering of the coefficients can be restored by applying the appropriate permutation. For a polynomial bi,j we denote by bi,j,p the coefficient of X p . Algorithm 1 Input: b1 , . . . , bl ∈ K[X] Output: a1 , . . . , am basis of hb1 , . . . , bl i 1. k ← 0 2. WHILE k < l DO 2.1. Choose c ∈ {bk+1 , . . . , bl } such that |c| = min{|bj | : k + 1 ≤ j ≤ l}, let ic be the corresponding index, swap(bk+1 , bic ) k P ai,j,|ai|ri = cj,|c| for 1 ≤ j ≤ k in K 2.2. Solve i=1
0
2.3. c ← c −
k P
ri X |c|−|ai | · ai
i=1
2.4. IF |c0 | = |c| THEN 2.4.a1 ak+1 ← c 2.4.a2 Permute the coordinates (k + 1, . . . , n) such that |ak+1,k+1| = |ak+1 | 2.4.a3 k ← k + 1 ELSE /* We have found a shorter vector, possibly 0 */ 2.4.b1 IF c0 = 0 THEN 2.4.b1.a1 Eliminate bk+1 2.4.b1.a2 l ← l − 1 ELSE /* Insert the new vector at the right place and restart from there */ 2.4.b1.b1 p ← max{0, . . . , k : |al | ≤ |c0 |} 2.4.b1.b2 FOR j = k + 1 DOWNTO p + 2 DO bj ← aj−1 2.4.b1.b3 bp+1 ← c0 2.4.b1.b4 k ← p Remark: We have denoted the vectors which are assumed to be correct during the computation with a and those which are assumed to be reviewed with b. Some assignments have been done in the case where these sets are subject to change (2.4.a1, 2.4.b1.b2-3). Those are clearly not to be done in an implementation: an easy pointer arithmetic can produce the same effect very fast.
572
Sachar Paulus
Correctness: The following invariants are easy to check to hold before step 2.1: I1 I2 I3 I4 I5
|ai | ≤ |aj | for 1 ≤ i < j ≤ k |ak | ≤ bj | for k < j ≤ l |ai,j | < |ai,i| ≤ |ai,h | for 1 ≤ j < i ≤ k and i < h ≤ n ai,i,|ai| 6= 0 for 1 ≤ i ≤ k ai,j,|ai| = 0 for 1 ≤ j < i ≤ k
Note that I4 and I5 imply that the linear system to be solved in step 2.2. is in fact triangular with non-zero entries on the diagonal. Thus there exists a unique solution. Pl Pk The algorithm terminates, since in step 2.4. either i=1 |ai | + i=k+1 |bi | becomes smaller, where k becomes also smaller, or stays unchanged, in which case Pk k is increased by 1. The algorithm terminates if k = l, so exactly when i=1 |ai | equals the determinant of the lattice. Thus only a finite number of passes through 2.4. is possible. If the algorithm terminates, then the vectors a1 , . . . , ak fulfill I1,I2,I3 with k = l, thus with proposition 4 they form a reduced basis of the lattice.
We will express the complexity of the algorithm in terms of arithmetical operations in K. By an arithmetical operation in K, we mean addition, subtraction, multiplication or division of two elements of K. We will first study the case where the input of the algorithm is a basis b1 , . . . , bl . In that case, the number of passes of step 2.4. of the algorithm is bounded by (l + 1) · (OD(b1 , · · · , bl ) + 1), since Pl either i=1 |bi | decreases by at least 1 or stays unchanged, in which case at most l + 1 passes are possible, since then k is increased by 1. Now every pass of the main loop consists of O(k 2 ) operations in K for step 2.2. and O(k · n · max |bi |) operations in K for step 2.3. Thus we get the following result: Proposition 5. Algorithm 1 takes O(l2 ·n·max |bi |·OD(b1 , . . . , bk )) arithmetical operations in K to compute a reduced basis starting from a basis b1 , . . . , bl .
Now if the input of the algorithm is not a basis, the analysis stays unchanged, but the upper bound given by OD(b1 , . . . , bl ) makes no longer sense. In that case, we P use as upper bound for the number of passes trough the main loop (l + 1) · ( li=1 |bi | − d(L) + 1). We get the following Proposition 6. Algorithm 1 takes O(l3 · n · (max |bi |)2 ) arithmetical operations in K to compute a reduced basis starting from a generating system b1 , . . . , bl .
Lattice Basis Reduction in Function Fields
573
If the lattice is “real”-valued, then given a sufficient accurate precision p, the algorithm above can be used without changes. The complexity of the algorithm is then O(l · (l + p) · n · max |bi | · OD(b1 , . . . , bk )). The determination of the a priori precision needed is not obvious and subject to further research.
4
An Application in Divisor Class Groups
There exist several applications for this algorithm. As stated above A.K. Lenstra used it for factoring multivariate polynomials over finite fields. It can also be used for the presentation of large simple groups. We will give a new application in the context of the arithmetic of Jacobians of curves. It is a major goal in function field theory to have a reasonably fast arithmetic for Jacobians of non-hyperelliptic curves, in other words for the divisor class group of function fields of degree > 2. The important work of Coates [2] yields a polynomial time algorithm [10] which is nevertheless not suitable for practical needs. Another approach is to try and apply the mechanisms known from number fields, although an efficient algorithm for computing in the class group of a non-imaginary quadratic number field is still missing. But it appears that many problems are easier solved in the function field case (such as lattice basis reduction), so there is hope that there exists such an arithmetic. As already mentioned in [1], the (degree zero) divisor class group (which is isomorphic to the Jacobian variety) of a hyperelliptic curve can be uniquely represented by reduced ideals in an imaginary quadratic function field. In the composition algorithm of reduced ideals, the reduction process of non-reduced ideals plays an important role. We will now sketch how we expect to find an analogous arithmetic in the divisor class group of curves of degree > 2 and what is the role of the lattice reduction. In contrast to the number field case, there may exist a special situation which is in some sense very similar to imaginary quadratic function fields, namely function fields where the (chosen) infinite prime is totally ramified (if it exists). In this situation, first results concerning uniqueness of representations of divisor classes by reduced ideals are obtained (see [3]). One major result is that if an ideal has the smallest norm in its equivalence class, then there is no other equivalent ideal with this same smallest norm. Thus any ideal class can be uniquely represented by this integral ideal I with smallest norm. It is now natural to investigate how to compute a unique representation of this ideal. The idea is as follows: assume that we can compute for a given ideal A an element e of A which has the smallest possible norm. Then dividing A by e yields an equivalent (fractional) ideal R including 1. Now compute the least
574
Sachar Paulus
common multiple m of all denominators of the elements of the basis of this ideal and we get I = mR. A unique representation is then given by e.g. a Hermite reduction of the basis of mR. The computation of the element e can be achieved by lattice reduction as follows: We compute for a generating system of A the corresponding basis in the logarithmic embedding (see [9]). This is then a generating system of a lattice in puiseux expansions. Now we apply our reduction algorithm and get a smallest vector of that “real” lattice. Applying the same transformations to the original basis will yield an element e of smallest norm in A. This works if the a priori precision of the algorithm is sufficiently high. We expect our algorithm nevertheless to work fine in most cases. The assumption of the existence of a totally ramified prime made here may be removed in the future by developing a similar result to [7], where a unique representation together with an arithmetic has been developed for divisor class groups of hyperelliptic funcion fields without a ramified prime.
References 1. E. Artin: Quadratische K¨ orper im Gebiete der h¨ oheren Kongruenzen I. Mathematische Zeitschrift 19 (1924). pp. 153 – 206. In: S. Lang, J. Tate (eds.): The collected papers of Emil Artin. Reading, Mass.: Addison Wesley 1965. 2. J. Coates: Construction of rational functions on a curve. Proc. Camb. Phil. Soc. 68 (1970). 3. S. Galbraith, S. Paulus: Unique representation of divisor class groups of function fields of degree > 2. In preparation. 4. A. K. Lenstra, H. W. Lenstra Jr., L. Lovasz: Factoring polynomials with rational coefficients. Math. Ann. 261 (1982). 5. A. K. Lenstra: Factoring multivariate polynomials over finite fields. J. Computer & System Sciences 30 (1985) No. 2. 6. K. Mahler: An analogue of Minkowski’s geometry of numbers in a field of series. Annals of Math. 42 (1941). 7. S. Paulus, H. G. R¨ uck: Real and imaginary quadratic representations of hyperelliptic function fields. To appear in Mathematics of Computation. 8. M. E. Pohst, H. Zassenhaus: Algorithmic algebraic number thoery. Cambrigde University Press: Cambridge 1989. 9. M. E. Pohst, M. Sch¨ orning: On integral basis reduction in global function fields. Proceedings of ANTS II. Lecture Notes in Computer Science 1122. Springer Verlag 1996. 10. G. Walsh: A polynomial-time complexity bound for the computation of the singular part of a Puiseux expansion of an algebraic function. Department of Mathematics, University of Ottawa. Unpublished.
Comparing Real and Imaginary Arithmetics for Divisor Class Groups of Hyperelliptic Curves Sachar Paulus1 and Andreas Stein2 1
Institute of Theoretical Computer Science Darmstadt University of Technology 64283 Darmstadt Germany 2 Department of Computer Science University of Manitoba Winnipeg, Manitoba R3T 2N2 Canada
Abstract. We compare optimized arithmetics with ideals in real resp. imaginary quadratic function fields for divisor class groups of hyperelliptic curves. Our analysis shows that the new real quadratic arithmetic presented by R¨ uck and the first author in [6] and an appropriate modification of the algorithm of Cantor both require a number of operations which is O(g2 ) in the field of constants, where g is the genus of a hyperelliptic curve.
1
Introduction
Until recently, the only relatively fast method known to compute in the divisor class group of a hyperelliptic curve was the algorithm of Cantor. This algorithm uses ideal multiplication and reduction in an imaginary quadratic function field. Thus one needs at least one ramified point on the curve. H.G. R¨ uck and the first author presented a new method using ideal reduction and multiplication in a real quadratic function field which does not need an assumption like Cantor’s algorithm. The authors compared the complexity and got similar results for both arithmetics (7g3 + O(g2 ) for the imaginary quadratic case and 9g3 + O(g2 ) for the real quadratic case). First experiments showed that neither seems to have an advantage over the other. In this paper, we provide a more detailed analysis on the arithmetics proposed and outline variants of the proposed algorithms which seem to be optimal. Such results are of practical interest since hyperelliptic curves over finite fields are becoming of greater interest for cryptographers. We show that both the real and the imaginary quadratic arithmetics require a quadratic number of arithmetical operations in the field of constants. J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 576–591, 1998. c Springer-Verlag Berlin Heidelberg 1998
Real and Imaginary Hyperelliptic Curve Arithmetics
577
More precisely, we prove the following theorem: Theorem 1. 1. Composition of two classes in the divisor class group of a hyperelliptic curve of genus g over a field of characteristic 6= 2 using an imaginary quadratic model can be performed in 22g2 + O(g) field operations. 2. Composition of two classes in the divisor class group of a hyperelliptic curve of genus g over a field of characteristic 6= 2 using a real quadratic model can be performed in 23g2 + O(g) field operations. More detailed statements can be formulated concerning squaring and other special cases. The complexity of the composition and reduction of reduced ideals (Giant step) in the real quadratic case as defined in [7] needs only 21g2 +O(g) field operations. Although the underlying structure is no longer a group, it can also be used in a cryptographic protocol. This paper is organized as follows. We will first recall the two variants of doing arithmetic in the divisor class group and then analyze the three main components: the Euclidean algorithm and the reduction processes both in the real and the imaginary quadratic cases.
2
The Arithmetics
Let k be a field (not necessarily finite) whose characteristic is different from 2 (note that analogous results are possible in the case of characteristic 2). We consider a hyperelliptic function field K over k of genus g, i.e. a quadratic extension of the rational function field over k of one variable. Then K can be generated over the rational function field by the square root of a polynomial of degree 2g + 1 or 2g + 2. p In the first case we assume K = k(x)( F (x)), where F (x) ∈ k[x] is a separable polynomial of degree 2g + 1. This can only be achieved if at least one of the ramified prime divisors in K/k(x) is rational over k. One calls p K then an imaginary quadratic function field. The second case is K = K(t)( D(t)), where
578
Sachar Paulus and Andreas Stein
D(t) ∈ k[t] is a monic, separable polynomial of degree 2g + 2. This occurs if a prime divisor in k(t) splits into two extensions in K. Then K is called a real quadratic function field. We neglect here the case that the leading coefficient of the polynomial D(t) is not a square in k ∗ . A constant field extension of degree 2 over k leads to our second case. We denote by Div0 (K) the group of divisors of degree 0. The group of principal divisors P (K) = {(f) | f ∈ K ∗ } is a subgroup of Div0 (K) and the factor group Cl0 (K) = Div0 (K)/P (K) is called the divisor class group (of degree 0) of K. We denote by [D] ∈ Cl0 (K) the class of D ∈ Div0 (K). We will express in both cases the arithmetic in the (degree 0) divisor pclass group of Kpin terms of reduced ideals in the corresponding orders k[x][ F (x)] resp. k[t][ D(t)]. We will fix an effective divisor D∞ of degree g. If D ∈ Div0 (K) is any divisor, the Riemann-Roch theorem yields that dim(D +D∞ ) ≥ 1, i.e. there is a function f ∈ K ∗ and an effective divisor D0 of degree g such that (f) = D0 − (D∞ + D). Hence any divisor class [D] ∈ Cl0 (K) has a representative of the form [D] = [D0 − D∞ ], where D0 is an effective divisor of degree g.
2.1
The Imaginary Quadratic Case
p Let F (x) ∈ k[x] be a separable polynomial of degree 2g + 1. K = k(x)( F (x)) is a function field over k of genus g. The pole divisor ∞ of x in k(x) is ramified under the extension to K, let P∞ be its extension in K. We fix the divisor D∞ := gP∞ and represent each element of Cl0 (K) in the form [D0 − gP∞ ]. If B is a divisor in K which is the conorm of a divisor of k(x), then deg(B) is even and B − deg(B)P∞ is a principal divisor. Therefore one can get rid of conorms in D0 . Furthermore one cancels contributions of P∞ in D0 . One gets the well known result [1,2,5]: Proposition 1. Each divisor class [D] ∈ Cl0 (K) has a unique representation of the form [D] = [A − deg(A)P∞ ], where A is an effective divisor of K with deg(A) ≤ g which is divisible neither by P∞ nor by the conorm of a divisor of k(x). p (x) Now we consider the Dedekind domain OK = k[x][ F (x)] which is the integral (x) closure of k[x] in K. Any ideal a ⊂ OK can be given in the form p a = T (x)(U (x)k[x] + (V (x) + F (x))k[x]) with T (x), U (x), V (x) ∈ k[x], where U (x) divides F (x) − V (x)2 . If deg V (x) < deg U (x) and if the leading coefficients of U (x) and T (x) are 1, then this representation by (T (x), U (x), V (x)) is unique. The degree of a satisfies deg(a) = deg(U (x)T (x)2 ). If T (x) = 1, then a is called primitive.
Real and Imaginary Hyperelliptic Curve Arithmetics
579
(x)
Each prime ideal in OK defines a valuation on K. Therefore one can associate with it a prime divisor of K. This gives an isomorphism from the group of ideals (x) of OK onto the group of divisors of K which are prime to P∞ (and induces an (x) isomorphism between the ideal class group of OK and Cl0 (K)). Hence we can (x) associate to each divisor A of Proposition 2.1 a primitive ideal a ⊂ OK with deg(a) = deg(A) ≤ g. An ideal a which corresponds to a divisor A of Proposition 2.1 is called reduced ideal. It has a unique reduced basis (U (x), V (x)) where the leading coefficient of U (x) is 1, deg V (x) < deg U (x) ≤ g and U (x) divides F (x) − V (x)2 . Now we formulate Proposition 2.1 in terms of ideals and get Theorem 2. There exists a canonical bijection between the divisor class group (x) Cl0 (K) and the set of reduced ideals in OK . This bijection induces the following group law a ∗ b = c on the set of reduced ideals: multiply the ideals a and b and let c be the unique reduced ideal in the ideal class of ab. See [1,2,5] for the proof. This yields the following algorithm for computing the composition law in the divisor class group. Algorithm 3 (Algorithm of Cantor) Input: (Ua (x), Va (x)), (Ub (x), Vb (x)) with Ui (x) monic, deg Ui (x) ≤ g, deg Vi (x) < deg Ui (x) and Ui (x) | (F (x) − Vi (x)2 ) for i = a, b, representing two reduced ideals a and b. Output: (U (x), V (x)) with U (x) monic, deg U (x) ≤ g, deg V (x) < deg U (x) and U (x) | (F (x) − V (x)2 ) representing the unique reduced ideal c in the ideal class of ab. 1. /* Compute a primitive ideal c∗ equivalent to ab */ 1.1. Compute the extended gcd (S(x), X(x), Z(x)) ← XGCD3(Ua (x), Ub (x), Va (x) + Vb (x)) Ua (x)Ub (x) 1.2. U ∗ (x) ← (NORM ALIZE(S(x))2 ) 2
(x)−(Va (x)) ) 1.3. V ∗ (x) ← Va (x) + X(x)Ua (x)(Vb(x)−Va (x))+Z(x)(F mod U ∗ (x) S(x) ∗ 2. /* Reduce c */ 2.1 (r(x), U (x), V (x), U ∗(x), V ∗ (x)) ← RED IM AG1(U ∗ (x), V ∗ (x)) 2.2 WHILE deg U (x) > g DO 2.2.2 (r(x), U (x), V (x), U ∗ (x), V ∗ (x)) ← RED IM AG2(r(x), U (x), V (x), U ∗(x), V ∗ (x)) 2.4 RETURN (U (x), V (x))
580
2.2
Sachar Paulus and Andreas Stein
The Real Quadratic Case
Let D(t) p ∈ k[t] be a monic, separable polynomial of degree 2g + 2. Then K = k(t)( D(t)) is a function field over k of genus g. The pole divisor ∞ of t in k(t) decomposes into two different prime divisors P1 and P2 of K. Let ν1 and ν2 be the corresponding normalized valuations of K. We choose and fix the divisor D∞ := gP2 and represent each element of Cl0 (K) in the form [D0 − gP2 ]. If B is a divisor in K which is the conorm of a divisor of k(t), then deg(B) is even and B − (deg(B)/2)(P1 + P2 ) is a principal divisor. With this remark we can cancel conorms in D0 and we get [D0 − gP2 ] = [A + nP1 − mP2 ], where A is an effective divisor in K which is not divisible by a conorm, by P1 or by P2 . Since A is effective n and m are integers with 0 ≤ deg(A) + n = m ≤ g. We change this slightly to [A + nP1 − mP2 ] = [A − (m − n)P2 ] + n[P1 − P2 ].
Proposition 2. Each divisor class [D] ∈ Cl0 (K) has a unique representation of the form [D] = [A − deg(A)P2 ] + n[P1 − P2 ], where A is an effective divisor of K with deg(A) ≤ g which is divisible neither by P1 or P2 nor by the conorm of a divisor of k(t), and where n is an integer with 0 ≤ n ≤ g − deg(A). Proof: See [6]. p (t) We consider the Dedekind domain OK = k[t][ D(t)] which is the integral (t) closure of k[t] in K. Any ideal a ⊂ OK can be given in the form p a = S(t)(Q(t)k[t] + (P˜ (t) + D(t))k[t]) with S(t), Q(t), P˜ (t) ∈ k[t], where Q(t) divides D(t) − P˜ (t)2 . If deg P˜ (t) < deg Q(t) and if the leading coefficients of Q(t) and S(t) are 1, then this representation is unique. In this case, we call it the standard representation. The degree of a satisfies deg(a) = deg(Q(t)S(t)2 ). If S(t) = 1, we call a primitive. Again we (t) get a canonical isomorphism from the group of ideals of OK onto the group of divisors of K which are prime to P1 and P2 . (t)
An ideal a ⊂ OK which corresponds to a divisor A with the properties of Proposition 2.4 is called a reduced ideal. It is an ideal a with deg(a) ≤ g which is (t) not divisible by an ideal of the form S(t)OK with S(t) ∈ k[t] and it is therefore uniquely represented by the pair (Q(t), P˜ (t)).
Real and Imaginary Hyperelliptic Curve Arithmetics
581
We consider the following ideal in Z {m ∈ Z | m(P1 − P2 ) is a principal divisor} = RZ, (t)
where the generator R with R ≥ 0 is called the regulator of OK . Either R = 0 or R ≥ g + 1. If ab−1 = fOK , we define the distance between a and b as (t)
d(b, a) := ν1 (f) mod RZ. This corresponds to the definition in [7,8]. We want to compute with small representatives of the residue class d(b, a), therefore we define for λ ∈ R d(b, a, λ) := max{n ∈ d(b, a) | n ≤ λ}.
Theorem 4. There exists a canonical bijection between the divisor class group (t) Cl0 (K) and the set of pairs {(a, n)}, where a is a reduced ideal of OK and n is an integer with 0 ≤ deg(a)+n ≤ g. This bijection induces the following group law (a1 , n1 ) ∗ (a2 , n2 ) = (a3 , n3 ) on the set of these pairs: multiply the ideals a1 and a2 , find in the ideal class of a1 a2 a reduced ideal a3 such that d(a3 , a1a2 , n1 + n2 ) is maximal and define n3 = n1 + n2 − d(a3 , a1 a2 , n1 + n2 ). Proof: See [6]. This yields the following algorithm for computing the composition law in the divisor class group. The computation of the extended gcd and the reduction will be addressed in their own section. Hereby, we assume two reduced ideals a and b given in standard representation. Algorithm 5 Composition Input: (Qa (t), Pa (t), na ), (Qb (t), Pb(t), nb ) with Qi (t) monic, deg Pi (t) < deg Qi (t) ≤ g, Qi (t) | (D(t) − Pi (t)2 ) and 0 ≤ ni ≤ g − deg Qi (t) for i = a, b, representing two reduced ideals with distances (a, na ) and (b, nb ). Output: (Q(t), P (t), n) with Q(t) monic, deg P (t) < deg Q(t) ≤ g, Q(t) | (D(t) − P (t)2 ) and 0 ≤ n ≤ g −deg Q(t) representing the unique reduced ideal with distance (c, n) in the ideal class of ab such that n = na + nb − d(c, ab, na + nb ). 1 /* Compute a primitive ideal c∗ equivalent to ab */ 1.1 Compute the extended gcd (S(t), X(t), Z(t)) ← XGCD3(Qa (t), Qb(t), Pa (t) + Pb (t))
582
Sachar Paulus and Andreas Stein
1.2 Q∗ (t) ←
Qa (t)Qb (t) NORM ALIZE((S(t))2 ) X(t)Qa (t)(Pb (t)−Pa (t))+Z(t)(D(t)−(Pa (t))2) Pa (t) + S(t)
1.3 P ∗(t) ← mod Q∗ (t) ∗ 2 /* Reduce c */ 2.1 dist ← 0 ∗ ∗ 2.2 (r(t), Q(t), P (t), Q∗(t), P ∗(t)) ← p RED REAL1(Q (t), P (t)) 2.3 dist ← deg Q(t) − deg(P (t) − D(t)) 2.4 WHILE deg Q(t) > g 2.4.1 (r(t), Q(t), P (t), Q∗(t), P ∗(t)) ∗ ← RED REAL2(r(t), Q(t), P (t), Q∗(t), p P (t)) 2.4.2 dist ← dist + deg Q(t) − deg(P (t) − D(t)) 3 /* Compute closest ideal to deg S(t) + na + nb */ 3.1 WHILE dist ≤ deg S(t) + na + nb 3.1.1 P ∗ (t) ← P (t), dist∗ ← dist 3.1.2 (r(t), Q(t), P (t), Q∗(t), P ∗(t)) ∗ ← RED REAL2(r(t), Q(t), P (t), Q∗(t), p P (t))) 3.1.3 dist ← dist + deg Q(t) − deg(P (t) − D(t)) 3.2 RETURN (Q∗ (t), P ∗(t), deg S(t) + na + nb − dist∗ Note that in step 2.4.2 a nonpositive number is added to dist and in step 3.1.2 a positive number is added to dist. Thus the algorithm terminates with the correct reduced ideal and its distance. Also note that there existspa more efficient way to compute dist instead of having to evaluate deg(P (t) − D(t)) (see [7]). We will now analyze the complexities of these two algorithms. The first part of both algorithms is almost identical, so we treat this analysis for both algorithms in the next section. The reduction processes, although similar, will be investigated in their own section.
3
Computing an Equivalent Primitive Ideal
Step 1 of both algorithms differ only in the fact that deg F (x) = 2g + 1 and deg D(t) = 2g + 2. This difference will not appear in our complexity statements. Therefore, we will only give one analysis. We use the classical algorithms (see [3] or [4]) for the elementary operations: addition, subtraction, normalization, negation, multiplication and division with remainder. Let A, B be two polynomials of degree a, b respectively, where a ≥ b. We will only count field operations which are of quadratic complexity, such as multiplication, and inversion, since they strongly dominate over field operations of linear complexity. We will use as complexities the following number of field operations for basic polynomial operations:
Real and Imaginary Hyperelliptic Curve Arithmetics
583
addition,subtraction,negation of A and B 0 normalization of A (a + 1) multiplication of A and B (a + 1)(b + 1) division with remainder of A and B (b + 1)(a − b + 1) Although these techniques might be well-known, we analyze the Euclidean algorithm to obtain explicit bounds for our purpose. To compute S = gcd(A, B) only, we let R0 = A, R1 = B, and, for i ≥ 2, Ri−2 = qi−1 Ri−1 + Ri such that deg Ri ≤ deg Ri−1 − 1 (division with remainder). Then, there exists n ≥ 1 such that Rn+1 = 0, and Rn = gcd(A, B) = S. The extended Euclidean algorithm computes, in addition, polynomials X, Y in k such that S = XA + Y B. We let s0 = 1, s1 = 0, t0 = 0, t1 = 1, and, for i = 2, . . . , n, si = si−2 − qi−1 si−1 , ti = ti−2 − qi−1ti−1 . Then, X = sn and Y = tn . Note that s2 = 1 and t2 = −q1 . Proposition 3. Let A, B be two polynomials, where deg A ≥ deg B ≥ 1. Let S = gcd(A, B) = XA + Y B with polynomials X, Y computed by the extended Euclidean algorithm. 1. The number of operations to compute S is bounded by (deg A + 1)(deg B + 1) − deg S − (deg S)2 2. The number of operations to compute X resp. Y is bounded by (deg B − deg S)(deg B − deg S − 1) resp. by (deg A − deg S)(deg A − deg S − 1). Proof: The proof is straightforward using standard techniques, observing that n X (deg Ri−1 − deg Ri + 1) deg Ri i=1
= deg A deg B + deg S − (deg S)2 +
n−1 X
(deg Ri deg Ri+1 − (deg Ri )2 + deg Ri )
i=1
≤ deg A deg B + deg S − (deg S) , 2
since deg Ri+1 ≤ deg Ri − 1 for i ≥ 1. Note that the number of operations is equal to the given bounds in the above proposition provided that n − 1 = deg B − deg S, i.e. deg qj = 1 for j = 2, . . . , n. We now formulate the algorithm XGCD3 as it is used in the algorithms 3 and 5. We denote by HXGCD the half-extended Euclidean algorithm which computes X and S, and by XGCD the extended Euclidean algorithm which, in addition, computes Y . Since the input for both algorithms is slightly different, we will formulate the algorithm twice.
584
Sachar Paulus and Andreas Stein
Algorithm 6 XGCD3 CANTOR Input: Ua (x), Ub (x), Va (x) + Vb (x) with Ui (x) monic, deg Ui (x) ≤ g, deg Vi (x) < deg Ui (x) , Ui (x) | (F (x) − Vi (x)2 ) for i = a, b. Output: S(x), X(x), Y (x) such that S(x) = gcd(Ua (x), Ub (x), Va (x) + Vb (x)), and S(x) = X(x)Ua (x) + V (x)Ub (x) + Y (x)(Va (x) + Vb (x)) with a polynomial V (x).
1. 2. 3. 4. 5. 6.
IF Ua (x) = Ub (x) RETURN XGCD(Ua (x), Va (x) + Vb (x)) (S1 (x), X1 (x)) ← HXGCD(Ua (x), Ub (x)) IF deg S1 (x) = 0, RETURN (S1 (x), X1 (x), 0). (S(x), X2 (x), Y (x)) ← XGCD(S1 (t), Va (x) + Vb (x)) X(x) = X1 (X) · X2 (x) RETURN (S(x), X(x), Y (x)).
Remark that S1 (x) is monic, since Ua (x) and Ub (x) are monic. Moreover, S(x) is in general not monic, since Va (x) and Vb (x) are in general not monic. Proposition 4. Algorithm XGCD3 CANTOR requires the following number of field operations: – If gcd(Ua (x), Ub(x)) = 1: 2g2 + O(g) – In all other cases: 3g2 + O(g).
Proof: To prove these bounds, we proceed as follows: first of all, we bound the degree of Ua (x) and Ub (x) by g and the degree of Va (x) + Vb (x) by g − 1. Note that this bound is sharp in most cases. Next, we assume the worst case of deg S being 0. This is also mostly the case. By using proposition 3 we obtain our results. 2 These considerations yield bounds on the number of operations of step 1 of Algorithm 3. Note that if Ua (x) = Ub (x) and Va (x) = −Vb (x), a primitive ideal equivalent to the product is given by (U (x) = 1, V (x) = 0). The computation of a reduced ideal equivalent is thus “for free”. This case will not be mentioned any more. Note furthermore that in almost all cases, we will have gcd(Ua (x), Ub(x)) = 1.
Real and Imaginary Hyperelliptic Curve Arithmetics
585
Proposition 5. Step 1 of Algorithm 3 requires the following number of operations in the finite field: – If Ua (x) = Ub (x) and Va (x) = Vb (x): 9g2 + O(g) . – If gcd(Ua (x), Ub (x)) = 1: 8g2 + O(g) . – In all other cases: 10g2 + O(g) .
Proof: We simply have to compute the amount of operations which is necessary in addition to the extended gcd computation. 2 In the real quadratic setting, there are only a few changes to the situation described above: the degrees of Pa (t) and Pb (t) are equal to g + 1 and their p two highest coefficients are always equal to the two highest coefficients of D(t) which is now of degree 2g + 2. We get the same complexity statements which can be proved in an analogous fashion to the situation above; we skip them. Algorithm 7 XGCD3 COMPOSITION Input: Qa (t), Qb (t), Pa(t) + Pb(t) with p p Qi (t) monic, deg Qi (t) ≤ g, deg(Pi (t) − D(t)) < deg Qi (t) < deg(Pi (t) + D(t)) , Qi (t) | (D(t) − Pi (t)2 ) for i = a, b. Output: S(t), X(t), Y (t) such that S(t) = gcd(Qa (t), Qb (t), Pa (t) + Pb(t)), and S(t) = X(t)Qa (t) + V (t)Qb (t) + Y (t)(Pa (t) + Pb(t)) with a polynomial V (t).
1. 2. 3. 4. 5. 6.
IF Qa (t) = Qb (t) RETURN XGCD(Qa(t), Pa (t) + Pb (t)) (S1 (t), X1 (t)) ← HXGCD(Qa (t), Qb(t)) IF deg S1 (t) = 0, RETURN (S1 (t), X1 (t), 0). (S(t), X2 (t), Y (t)) ← XGCD(S1 (t), Pa (t) + Pb (t)) X(t) = X1 (t) · X2 (t) RETURN (S(t), X(t), Y (t)).
Remark that S1 (t) is monic, since Qa (t) and Qb (t) are monic. Moreover, S(t) is in general not monic, since Pa (t) and Pb(t) are in general not monic. This yields for XGCD3 COMPOSITION the same running time as XGCD3 CANTOR.
586
4
Sachar Paulus and Andreas Stein
Reduction
We show that both in the imaginary and in the real quadratic case the reduction of a nonreduced ideal can be performed in O(g2 ) operations in k. The formulas used in [6] yield O(g3 ) operations in k. The quadratic complexity can be achieved by making use of slightly modified formulas which are known as Tenner’s algorithm. The correctness of this variant is well known in the case of imaginary resp. real quadratic number fields and has e.g. been proved for real quadratic congruence function fields in [8]. But these algorithms work also for both real and imaginary quadratic function fields over any field of constants of characteristic 6= 2 and the proof of correctness is substantially identical. The reduction process is split into a first, expensive step and in the subsequent, much more efficient steps. They have been used in the algorithms 3 and 5.
4.1
The Imaginary Quadratic Case
We now formulate the first reduction step and its complexity: Algorithm 8 RED IMAG1 Input: U ∗ (x), V ∗ (x) such that U ∗ (x) | (F (x) − (V ∗ (x)2 )) and deg V ∗ (x) < deg U ∗ (x) Output: a(x), U (x), V (x), U ∗ (x), V ∗ (x) such that U (x) | (F (x) − (V (x)2 )) and deg V (x) < deg U (x), deg U (x) ≤ deg U ∗ (x) − 2, a(x) = b−V ∗ (x)/U (x)c and the ideal described by (U (x), V (x)) is equivalent to the ideal described by (U ∗ (x), V ∗ (x)) ∗
2
(x)) 1. U (x) ← F (x)−(V U ∗ (x) 2. (a(x), V (x)) ← DIVREM(−V ∗ (x), U (x)) 3. RETURN (a(x), U (x), V (x), U ∗(x), V ∗ (x))
Proposition 6. Algorithm RED IMAG1 requires at most 2g deg U ∗ (x) − (deg U (x))2 + (3 + deg U ∗ (x)) deg U (x) + 2g) operations if the result represents a reduced ideal; if it is not, the complexity is at most 2(deg U ∗ (x))2 − deg U ∗ (x) − (deg U (x))2 + (3 + deg U ∗ (x)) deg U (x) − 3).
Real and Imaginary Hyperelliptic Curve Arithmetics
587
2
Proof: Straigtforward. Corollary 1. Step 2.1 of algorithm 3 requires at most 9g2 + O(g) . operations in the field.
Proof: We simply use that deg U ∗ (x) ≤ 2g and take the maximum of the possibilities. 2 Now we formulate and analyze a single additional reduction step: Algorithm 9 RED IMAG2 Input: a∗ (x), U ∗ (x), V ∗ (x), U ∗∗ (x), V ∗∗ (x) output of either RED IMAG1 or RED IMAG2 Output: a(x), U (x), V (x) , U ∗ (x), V ∗ (x) such that U (x) | (F (x) − (V (x)2 )) and deg V (x) < deg U (x), deg U (x) ≤ deg U ∗ (x) − 2, a(x) = b−V ∗ (x)/U (x)c and the ideal described by (U (x), V (x)) is equivalent to the ideal described by (U ∗ (x), V ∗ (x)). 1. U (x) ← U ∗∗(x) + a∗ (x)(V ∗ (x) − V ∗∗ (x)) 2. (a(x), V (x)) ← DIVREM(−V ∗ (x), U (x)) 3. RETURN (a(x), U (x), V (x), U ∗(x), V ∗ (x))
Proposition 7. The complexity of algorithm RED IMAG2 is at most (deg U ∗ (x)(deg U (x) − deg U ∗∗ (x)) + (deg U ∗∗ (x))2 −(deg U (x))2 + deg U ∗ (x) − deg U (x)) 2
Proof: Immediate. We can now formulate the complexity of the whole reduction process:
Proposition 8. Let (U (x), V (x)) the representation of an ideal in an imaginary quadratic function field. If the reduction process does not finish after the application of RED IMAG1, the reduction of this ideal, i.e. the computation of a representation of an equivalent reduced ideal requires no more than 3(deg U (x))2 + 3 deg U (x) − g − 12 operations.
588
Sachar Paulus and Andreas Stein
Proof: Denote by u−1 , v−1 the values of deg U (x), deg V (x), u0 , v0 the degrees of the new values computed by RED IMAG1 and by ui , vi the degrees of the new values computed by the i-th application of RED IMAG2. Assume that Ul (x), Vl (x) is the representation of a reduced ideal. Let us first compute the complexity of the reduction of U1 (x), V1 (x): l X
(ui−1 (ui − ui−2 ) + u2i−2 − u2i + ui−1 − ui )
i=1
= u2−1 + u20 − u2l−1 − u2l − u−1 u0 + ul−1 ul + u0 − ul ≤ u2−1 + u20 − u−1 u0 + u0 − ul−1 . Now we add the complexity of RED IMAG1 and get, for that case that the result of RED IMAG1 is not reduced, i.e. if l ≥ 1, the following complexity bound: 3u2−1 + 4u0 − u−1 − 3 − ul−1 ≤ 3u2−1 + 4(u−1 − 2) − u−1 − 3 − (g + 1) , since ul−1 ≥ g + 1. This finishes the proof.
2
Corollary 2. Step 2 of algorithm 3 requires no more than 12g2 + O(g) operations. Proof: This follows now easily for deg U (x) ≤ 2g in the proposition above. An exact upper bound for the number of operations of Step 2 is given by 12g2 + 5g − 12. 2 This finishes the proof of the first statement of theorem 1.
4.2
The Real Quadratic Case
We p analyze the complexity of the first reduction step. Let H(t) be the polynomial b D(t)c. Algorithm 10 RED REAL1 Input: Q∗(t), P ∗ (t) such that Q∗ (t) | (D(t)−(P ∗ (t)2 )) and deg P ∗ (t) < deg Q∗ (t) Output: r(t), Q(t), P (t), Q∗ (t), P ∗ (t) such that Q(t) | (D(t) − (P (t)2 )) and deg P (t) < deg Q(t), deg Q(t) ≤ deg Q∗ (t) − 2, r(t) = (P ∗(t) + H(t)) mod Q(t) and the ideal described by (Q(t), P (t)) is equivalent to the ideal described by (Q∗ (t), P ∗(t))
Real and Imaginary Hyperelliptic Curve Arithmetics
1. 2. 3. 4.
589
r(t) ← (P ∗(t) + H(t)) mod Q∗ (t) P (t) ← H(t) − r(t) (t))2 Q(t) ← D(t)−(P Q∗ (t) RETURN (r(t), Q(t), P (t), Q∗(t), P ∗(t))
Proposition 9. Algorithm RED REAL1 requires at most 2g2 + O(g) operations if deg Q∗ (t) = g + 1; if deg Q∗ (t) > g + 1, the complexity is at most 2(deg Q∗ (t))2 − 1 . Proof: Straigtforward, by using in the second case that r(t) = P ∗ (t) + H(t) and 2 P (t) = −P ∗(t). Corollary 3. Step 2.1 of algorithm 5 requires at most 8g2 + O(g) . operations in the field. Proof: We simply use that deg Q∗ (t) ≤ 2g.
2
Now we formulate and analyze a single additional reduction step: Algorithm 11 RED REAL2 Input: r ∗ (t), Q∗ (t), P ∗ (t), Q∗∗ (t), P ∗∗ (t) output of either RED REAL1 or RED REAL2 Output: r(t), Q(t), P (t) , Q∗ (t), P ∗ (t) such that Q(t) | (D(t) − (P (t)2 )) and deg P (t) < deg Q∗ (t), deg Q(t) ≤ deg Q∗ (t) − 2, r(t) = (P ∗ (t) + H(t)) mod Q(t) and the ideal described by (Q(t), P (t)) is equivalent to the ideal described by (Q∗ (t), V ∗ (t)).
1. 2. 3. 3.
(a(t), r(t)) ← DIVREM(P ∗(t) + H(t), Q∗(t)) P (t) ← H(t) − r(t) Q(t) ← Q∗∗ (t) + a(t)(r(t) − r ∗ (t)) RETURN (r(t), Q(t), P (t), Q∗(t), P ∗(t))
590
Sachar Paulus and Andreas Stein
Proposition 10. The complexity of algorithm RED REAL2 is at most (deg Q∗∗ (t) − deg Q∗ (t))(deg Q∗∗ (t) + deg Q∗ (t) + 1) . 2
Proof: Immediate. By summing up we obtain the complexity of Step 2 of algorithm 5:
Proposition 11. Let (Q(t), P (t)) the representation of an ideal in real quadratic function field. If the reduction process does not finish after the application of RED REAL1, the reduction of this ideal, i.e. the computation of a representation of an equivalent reduced ideal requires no more than 3(deg Q(t))2 + deg Q(t) − g2 − 3g − 3 operations. Proof: We let q−1 = deg Q(t), q0 be the degree of the new value computed by RED REAL1 and let qi be the degrees of the new values computed by the i-th application of RED REAL2. Assume that Ql0 (t), Pl0 (t) is the representation of a reduced ideal. By 10, we need the following operations to compute Ql0 (t), Pl0 (t): l0 X i=1
2 (qi−2 − qi−1 )(qi−1 + qi−2 + 1) = q−1 + q−1 − ql20 −1 − ql0 −1 .
If the reduction process stops after the application of RED REAL1, then l0 = 0 and no additional operations must be performed. Otherwise, i.e. l0 ≥ 1, we add the complexity of RED REAL1 and obtain the following complexity bound for Step 2 of algorithm 5: 2 2 + q−1 − ql20 −1 − ql0 −1 − 1 ≤ 3q−1 + q−1 − (g + 1)2 − (g + 1) − 1 , 3q−1
since ql0 −1 ≥ g + 1. This finishes the proof
2
Corollary 4. Step 2 of algorithm 5 requires no more than 11g2 + O(g) operations. Proof: This follows now easily for deg Q(t) ≤ 2g in the proposition above. An exact upper bound for the number of operations of Step 2 is given by 11g2 −g −2. 2
Real and Imaginary Hyperelliptic Curve Arithmetics
591
Proposition 12. Step 2 and 3 of algorithm 5 require no more than 13g2 + O(g) operations. Proof: Straightforward, by adding the complexity of step 2 and 3 together and using that deg Q(t) ≤ 2g, l0 ≤ (g + 1)/2 and the fact that the input Q∗ (t), P ∗(t) 2 of RED REAL2 in step 3 represents a reduced ideal. This completes the proof of theorem 1. Finally, one may be interested in the composition and reduction operation as defined as a Giant step operation in [7]. In that case one just has to perform one additional operation instead of step 3 of algorithm 5, namely to compute the standard base of the reduced ideal represented by Ql0 (t), Pl0 (t). This means, one has to compute P˜l0 (t) = Pl0 (t) mod Ql0 (t) which gives a total complexity of 11g2 + g − 1 operations.
References 1. E. Artin: Quadratische K¨ orper im Gebiete der h¨ oheren Kongruenzen I. Mathematische Zeitschrift 19 (1924). pp. 153 – 206. . In: S. Lang, J. Tate (eds.): The collected papers of Emil Artin. Reading, Mass.: Addison Wesley 1965. 2. D. G. Cantor: Computing in the Jacobian of a hyperelliptic curve. Mathematics of Computation 48 (1987). pp. 95 – 101. 3. H. Cohen, A Course in computational algebraic number theory, Springer Verlag, Berlin 1995. 4. D. E. Knuth, The Art of Computer Programming, vol. 2: Seminumerical Algorithms, Addison-Wesley, Reading (Mass.) 1981. 5. D. Mumford: Tata Lectures on Theta I, II. Boston: Birkh¨ auser Verlag 1983/84. 6. S. Paulus, H. -G. R¨ uck: Real and imaginary quadratic representations of hyperelliptic function fields. To appear in Mathematics of Computation. 7. R. Scheidler, A. Stein, H. C. Williams: Key-exchange in real quadratic congruence function fields. Designs, Codes and Cryptography 7 (1996). pp. 153 – 174. 8. A. Stein: Equivalences between elliptic curves and real quadratic congruence function fields. Journal de Th´eorie des Nombres de Bordeaux 9. 1997. pp. 79 – 95.
Unit Computation in Purely Cubic Function Fields of Unit Rank 1 Renate Scheidler1 and Andreas Stein2 1
2
University of Delaware, Newark DE 19716, USA [email protected] University of Manitoba, Winnipeg MB R3T 2N2, CANADA [email protected]
Abstract. This paper describes a method for computing the fundamental unit and regulator of a purely cubic congruence function field of unit rank 1. The technique is based on Voronoi’s algorithm for generating a chain of successive minima in a multiplicative cubic lattice which is used for calculating the fundamental unit and regulator of a purely cubic number field.
1
Introduction
Voronoi’s Algorithm [14,7] computes a system of fundamental units of a cubic number field. The method is based on computing chains of successive minima in the maximal order O of the field K. An implementation in purely cubic number fields was given by Williams et al. [16,17,15]. Since then, the general method has been extended to fields of higher degree; see [1,2,3,4,5,6]. The first algorithm for computing fundamental units in cubic funcion fields was given by Mang [9]. His technique is based on the Pohst-Zassenhaus method for number fields [10, Chap. 5]. By Mang’s own admission, his technique is slow and is infeasible for even modest size fields; an example that took 273 seconds of CPU time on a Siemens mainframe using Mang’s method required only 0.04 seconds on a Silicon Graphics Challenge workstation with our algorithm. In this paper, we show how to adapt Voronoi’s algorithm to purely cubic congruence function fields of unit rank 1. While the number field and function field situations are similar in many ways, there are also significant differences between the two settings; most notably, the different behavior of the valuation (which is non-archimedian in the function field case) and the lack of geometric lattice structure in function fields. For an introduction to congruence function fields, see [13]; the purely cubic case is discussed in more detail in [9]. Let k = IFq be a finite field of order q and let t be a an element that is transcendental over k. As usual, we denote by k(t) the rational function field and by k[t] the ring of polynomials over k in the variable
Research supported by NSF grant DMS-9631647
J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 592–606, 1998. c Springer-Verlag Berlin Heidelberg 1998
Unit Computation in Purely Cubic Function Fields of Unit Rank 1
593
t. A purely cubic (congruence) function field K over the field of constants k is a cubic extension of k(t) of the form K = k(t, ρ) where ρ3 = D ∈ k[t] and D = D(t) is cubefree in k[t]; write D = GH 2 where G, H ∈ k[t] are relatively prime. The algebraic closure O =√ k[t] of k[t] in K is a√k[t]-module of rank 3 3 3 with a (t−)integral basis {1, ρ = GH 2 , ω = ρ2 /H = G2 H}. Its unit group ∗ ∗ ∗ O is the (t−)unit group of K. O = k × E where E is the product of r infinite cyclic groups and r ∈ IN0 is the (t−)unit rank of K. The units in k ∗ are the trivial units. If r > 0, an independent set of r generators of E is a system of fundamental (t−)units of K. Denote by k((1/t)) the field of Puiseux series ∞ i i=m ai /t (m ∈ ZZ, ai ∈ k for i ≥ m) over k. Then the number of irreducible factors over k((1/t)) of the polynomial F (t, y) = y 3 − D ∈ k[t, y] is r + 1. Henceforth, we assume that q ≡ −1 (mod 3) (so k does not contain any primitive cube roots of unity), the degree deg(D) of D is divisible by 3, and the leading coefficient sgn(D) of D is a cube in k. In this case, ρ ∈ k((1/t)), so K ≤ k((1/t)), and F (t, y) splits into two irreducibles over k((1/t)), namely F (t, y) = (y−ρ)(y 2 + ρy + ρ2 ), so r = 1 and O∗ = k ∗ × with a fundamental unit (see [11]). If g denotes the genus of K, then we have g = deg(GH) − 2.
(1.1)
Let D be the divisor group of K over k, D0 the subgroup of D of divisors of degree 0, and P ≤ D0 the group of principal divisors of K|k. The divisor class group (of degree 0) of K|k is the factor group C 0 = D0 /P; its order h = #C 0 is finite and is the divisor class number of K. In analogy to D and D0 , denote by U the subgroup of D generated by the infinite places (with respect to t) of K and by U 0 the subgroup of divisors in U of degree 0. The (t−)regulator of K is the index R = [U 0 : P ∩ U 0 ]. If I is the group of fractional (t−)ideals of K and H the subgroup of fractional principal (t−)ideals of K, then the (t−)ideal class group of K is C = I/H; its order h = #C is also finite and is the (t−)ideal class number of K. We have h = Rh . For α =
∞
i=m
(1.2)
ai /ti ∈ k((1/t)) (m ∈ ZZ, ai ∈ k for i ≥ m, am = 0), we define deg(α) = −m, |α| = q −m = q deg(α) , sgn(α) = am , 0 ai
α = . ti i=m
We also set deg(0) = −∞ and 0 = 0. Note that α ∈ k[t] and |α − α| < 1. If is a fundamental unit with deg() > 0, then is unique up to a trivial unit factor. Then we have for the regulator R = deg()/2.
594
Renate Scheidler and Andreas Stein
Let ι be a primitive cube root of unity in some algebraic closure of k, so ι2 + ι + 1 = 0 and ι3 = 1. Then k((1/t))(ι) is a quadratic extension of k((1/t)) whose nontrivial K-automorphism is “complex conjugation” − : k((1/t))(ι) → k((1/t))(ι) via ι = ι−1 . For φ ∈ k((1/t))(ι), we define deg(φ) = |φ| =
1 deg(φφ), 2
1
|φφ| = q 2 deg(φφ) = q deg(φ) .
K(ι) = k(ι, t, ρ) is a cyclic extension of k(ι, t) of degree 3 for which we fix the k(ι, t)-automorphism : K(ι) → K(ι) via ρ = ιρ. Write γ for (γ ) (γ ∈ K(ι)). Note that α = α for α ∈ K. For α ∈ K, the norm of α (over k(t)) is N (α) = αα α . We have N (α) ∈ k(t), and if α ∈ O, then N (α) ∈ k[t].
2
Reduced Ideals and Minima
A subset A of O is an integral ideal if for any α, β ∈ A and θ ∈ O, α + β ∈ A and θα ∈ A. A subset A of K is a fractional ideal if there exists a nonzero d ∈ k[t] such that dA is an integral ideal of O. A fractional or integral ideal A is principal if A = (α) = {θα | θ ∈ O} for some α. Henceforth, we assume all ideals (fractional and integral) to be nonzero, i.e. the term “ideal” will be synonymous with “nonzero ideal”. An integral ideal A is primitive if there exists no nonconstant polynomial f ∈ k[t] such that every α ∈ A is a multiple in O of f . For a primitive integral ideal A, the greatest common divisor of all polynomials in A ∩ k[t] is denoted by L(A). Every integral or fractional ideal A of O is a k[t]-module of rank 3. If A has a k[t]basis {λ, µ, ν}, write A = [λ, µ, ν]. Specifically, if a fractional ideal A contains 1, then A has a k[t]-basis of the form {1, µ, ν} where µ = (m0 + m1 ρ + m2 ω)/d, ν = (n0 + n1 ρ + n2 ω)/d, with m0 , m1 , m2 , n0 , n1 , n2 , d ∈ k[t]. If gcd(m0 , m1 , n0 , n1 , n2 , d) = 1, then dA is a primitive integral ideal with L(dA) = d/sgn(d). The (t−)norm of a fractional ideal A = [λ, µ, ν] is N (A) = sgn(det(T ))−1 det(T ) ∈ k(t)∗ where T ∈ Gl3 (k(t)) such that λ 1 µ = T ρ . ν ω N (A) is independent of the choice of bases for A and O. The norm of an integral ideal A is N (A) = L(A)3 N (L(A)−1 A) ∈ k[t]. For an integral ideal A, we have L(A) | N (A), and if A is primitive, then N (A) | L(A)2 .
Unit Computation in Purely Cubic Function Fields of Unit Rank 1
595
The (t−)discriminant of a fractional or integral ideal A = [λ, µ, ν] is the quantity 2 λ λ λ k(t) if A is a fractional ideal, ∆(A) = det µ µ µ ∈ k[t] if A is an integral ideal. ν ν ν ∆(A) is independent of the choice of k[t]-basis of A. The discriminant of O = [1, ρ, ω] is ∆ = −27G2 H 2 . We have ∆(A) = a2 N (A)2 ∆ for some a ∈ k ∗ .
(2.1)
If A is a fractional ideal and α ∈ A, α = 0, then α is a minimum in A if for β ∈ A with β = 0, |β| ≤ |α| and |β | ≤ |α | imply β ∈ k ∗ α, i.e. β and α differ only by a factor that is a trivial unit. A is reduced if 1 ∈ A and 1 is a minimum in A. An integral ideal A is reduced if the fractional ideal (L(A)−1 )A is reduced, i.e. if and only if L(A) is a minimum in A. It is easy to see that O is reduced. If A is a fractional ideal of O with a minimum θ ∈ A, then ηθ is a minimum in A for every unit η ∈ O∗ . In particular, every unit in O is a minimum in O. Theorem 2.1. If A is a reduced fractional ideal, then |∆(A)| > 1, so |N (A)| > √ 1/| ∆|. Proof. See [11].
√ Corollary 2.2. If A is a reduced integral ideal, then |L(A)| < | ∆| and |N (A)| < |∆|. Proof. Since A is reduced, we have L(A) | N (A) | L(A)2 . Also B = (L(A)−1 )A is a reduced fractional ideal, so by√Theorem 2.1, |L(A)|2 ≥ |N (A)| = |L(A)|3 |N (B)| √ 3 > |L(A)| /| ∆|, so |L(A)| < | ∆| and |N (A)| ≤ |L(A)|2 < |∆|. Corollary 2.3. If A is a reduced fractional ideal and α ∈ A is nonzero, then |N (α)| > 1/|∆|. Proof. Let d ∈ k[t] be of minimal degree so that B = dA is an integral ideal. Then dα ∈ B, so (dα)(d2 α α ) = N (dα) = d3 N (α) ∈ B. Hence L(B) = d | d3 N (α), so |N (α)| ≥ 1/|d|2 = 1/|L(B)|2 > 1/|∆| by Corollary 2.2. Let A be a fractional ideal and let θ ∈ A be a minimum in A. An element φ ∈ A is a minimum adjacent to θ in A if (M1) (M2) (M3)
φ is a minimum in A, |θ| < |φ|, For no α ∈ A, |θ| < |α| < |φ| and |α | < |θ |.
596
Renate Scheidler and Andreas Stein
Note that conditions (M1) and (M2) imply |φ | < |θ |, as |θ | ≤ |φ | would yield θ ∈ k ∗ φ by (M1) and hence |θ| = |φ|, contradicting (M2). In the number field setting, the existence of adjacent minima is guaranteed by Minkowski’s lattice point theorem. However, in function fields, we have no such tool available, so we need to establish their existence analytically. Theorem 2.4. Let A be a fractional ideal and let θ ∈ A be a minimum in A. Then a minimum φ adjacent to θ in A exists and is unique up to a trivial unit factor. Proof. The set H(θ) = {α ∈ A | |α| > |θ| and |α | < |θ |} is nonempty as θ ∈ H(θ). Let α ∈ H(θ) have minimal degree. Then the set D(θ) = {deg(N (α)) | α ∈ H(θ), |α| is minimal} is a nonempty subset of ZZ which is bounded below by − deg(∆) by Corollary 2.3. Let φ ∈ H(θ) so that |φ| is minimal and deg(N (φ)) is a smallest element of D(θ). Then (a) (b) (c)
|φ| > |θ| and |φ | < |θ |, if α ∈ A with |α| > |θ| and |α | < |θ |, then |α| ≥ |φ|, if α ∈ A with |α| = |φ| and |α | < |θ |, then |α | ≥ |φ |.
Conditions (M2) and (M3) for φ follow from properties (a) and (b), respectively, so we only need to show that φ is a minimum in A. Let α ∈ A, α = 0 with |α| ≤ |φ| and |α | ≤ |φ |. By (a), |α | < |θ |. If |α| ≤ |θ|, then α ∈ k ∗ θ as θ is a minimum in A, implying |θ | = |α | < |θ |. So |α| > |θ|. By (b), |α| ≥ |φ|, so |α| = |φ|. Hence by (c), |α | ≥ |φ |, so |α | = |φ |. Thus we have |α| = |φ| and |α | = |φ |. Let β = α − (sgn(α)sgn(φ)−1 )φ, then β ∈ A, |β| < |φ| and |β | ≤ max{|α |, |φ |} < |θ |. Suppose β = 0, then by (M3), |β| ≤ |θ|, so β ∈ k ∗ θ. But then |θ | = |β | < |θ |. So we must have β = 0 and thus α ∈ k ∗ φ. Therefore, φ is a minimum in A. To see that φ is unique up to a factor in k ∗ , let φ1 , φ2 be two minima in A adjacent to θ. Without loss of generality, assume |φ1 | ≤ |φ2 |. Both φ1 and φ2 are minima in A by (M1) and |θ| < |φ1 |, |φ2 | by (M2). If |φ1 | < |φ2 |, then by (M3), |φ1 | ≥ |θ |, so since φ1 is a minimum in A, θ ∈ k ∗ φ1 , implying the contradiction |θ| = |φ1 | > |θ|. Similarly we can rule out |φ1 | > |φ2 |. Hence |φ1 | = |φ2 |, so φ1 ∈ k ∗ φ2 . We will henceforth speak of the minimum adjacent to an element in a fractional ideal, keeping in mind that it is only unique up to a trivial unit factor. If A is a reduced fractional ideal with a minimum θ ∈ A, then it is easy to see that A∗ = (1/θ)A is reduced. Furthermore, if θ∗ is the minimum adjacent to 1 in A∗ , then θθ∗ is the minimum adjacent to θ in A.
Unit Computation in Purely Cubic Function Fields of Unit Rank 1
3
597
The Algorithm
The basic idea for our algorithm is the same as in the unit rank 1 case of number fields. Start with the reduced ideal A1 = O, and recursively define a sequence of reduced fractional ideals An as follows. Let µn be the minimum adjacent to 1 in An and set An+1 = (µ−1 n )An . Then An+1 is a reduced fractional ideal. Define θ1 = 1,
θn =
n−1
µi
for n ≥ 2.
(3.1)
i=1
Then An = (θn−1 ) and θn+1 = µn θn , so by our above remarks, θn+1 is the minimum adjacent to θn in O (n ∈ IN). Thus we have a chain θ1 = 1, θ2 , θ3 , . . .
(3.2)
of successive minima in O. This sequence can easily be shown to contain all the minima in O of nonnegative degree. In particular, the fundamental unit must appear in the sequence (3.2), and since is the unit of smallest positive degree, the first index l ∈ IN such that N (θl+1 ) is a trivial unit yields θl+1 = (up to a constant factor). l is the period of (or of K). We have Al+1 = A1 , µl+1 = µ1 and in fact µml+i = µi for m, i ∈ IN, where the last two equalities again only hold up to a trivial unit factor. Hence the sequence (3.2) is equal to 1, θ2 , . . . , θl , , θ2 , . . . , θl , 2 , 2 θ2 , . . . , 3 , . . . and contains all nonnegative powers of . A simpler termination condition for the computation of the chain (3.2) that avoids computing norms is given as follows. Let A = (θ−1 ) = [1, µ, ν] where θ is an element of the chain (3.2) and µ = (m0 +m1 ρ+m2 ω)/d, ν = (n0 +n1 ρ+n2 ω)/d with m0 , m1 , m2 , n0 , n1 , n2 , d ∈ k[t] and gcd(m0 , m1 , m2 , n0 , n1 , n2 , d) = 1. Then N (θ) ∈ k ∗ if and only if d ∈ k ∗ . We are now ready to present our algorithm for computing the fundamental unit of K. In each iteration, we have a basis {1, µ ˜n = (m0 + m1 ρ + m2 ω)/d, ν˜n = (n0 +n1 ρ+n2 ω)/d} of our current ideal An = (θn−1 ) where θn = (e0 +e1 ρ+e2 ω)/f (mi , ni , d, ei , f ∈ k[t] for i = 0, 1, 2). This basis is replaced by a reduced basis {1, µn , νn }; that is, a basis containing the minimum µn adjacent to 1 in An . Details on how to obtain such a basis are given in the next section. Then θn is updated to θn+1 = µn θn , and since An+1 = (µ−1 n )An , µn and νn are replaced by µ ˜n+1 = 1/µn = µ n µ n /N (µn ) and ν˜n /µn = νn µn+1 , respectively. Initially, θ1 = 1, µ1 = ρ, and ν1 = ω. According to our termination condition, we end the algorithm as soon as we encounter a basis denominator d that is a constant. Algorithm 3.1 (Fundamental Unit Algorithm). Input: The polynomials G, H where D = GH 2 .
598
Renate Scheidler and Andreas Stein
Output: e0 , e1 , e2 ∈ k[t] where = e0 + e1 ρ + 2 ω is the fundamental unit of K. Algorithm: 1. Set e0 = f = 1, e1 = e2 = 0; m0 = m2 = n0 = n1 = 0, m1 = n2 = d = 1. 2. Repeat (a) { Reduce the basis } Use Algorithm 4.1 below to replace m0 , m1 , m2 , n0 , n1 , n2 , d by the coefficients of a reduced basis. (b) { Update θn } i. Replace e0 e0 m0 + (e1 m2 + e2 m1 )GH e1 e 0 m1 + e 1 m0 + e 2 m2 G by e . e m + e m H + e m 2 0 2 1 1 2 0 f df ii. Compute g = gcd(e0 , e1 , e2 , f ). For i = 0, 1, 2, replace ei by ei /g and f by f /g. (c) { Update µ and ν } i. Set a0 = m20 − m1 m2 GH, a1 = m22 G − m0 m1 , a2 = m21 H − m0 m2 , b = m30 + m31 GH 2 + m32 G2 H − 3m0 m1 m2 GH. ii. Replace
m0 m1 m2
a0 d a1 d . a2 d
by
iii. Replace n0 n1 n2
by
a0 n0 + (a1 n2 + a2 n1 )GH a0 n 1 + a1 n 0 + a2 n 2 G . a0 n 2 + a1 n 1 H + a2 n 0
iv. Replace d by b. v. Compute h = gcd(m0 , m1 , m2 , n0 , n1 , n2 , d). For i = 0, 1, 2, replace mi by mi /h, ni by ni /h and d by d/h. until d ∈ k ∗ . The number of reduction steps is exactly the period l of . This number can be quite large. 1
1 deg(∆)−2
Theorem 3.2. l ≤ 2R = deg() = O(q 2 deg ∆−2 ), so || = O(q q 2
).
Unit Computation in Purely Cubic Function Fields of Unit Rank 1
599
Proof. For n ∈ IN, let δn = deg(θn ) ∈ IN0 . Since δ1 = 0 and δn strictly increases with n, a simple induction argument shows δn ≥ n − 1. Hence l ≤ deg(θl+1 ) = √ deg() = 2R. Using the inequality h ≤ ( q + 1)2g deduced in [12], together with √ (1.1) and (1.2), we obtain R ≤ ( q + 1)deg(∆)−4 = O(q 1/2 deg(∆)−2 ), whence follows the bound on . The above theorem shows that the coefficients e0 , e1 , e2 of can be so huge that it might be infeasible to compute or even simply write down the fundamental unit for large values of |∆|. For this situation, we modify Algorithm 3.1 to avoid calculating the minima θn and compute only the regulator R of K as follows. In step 1, initialize only the mi , ni (i = 0, 1, 2), and d, as well as setting R = 0. Perform step 2 as in Algorithm 3.1, except omit part (b) of step 2. Instead, we need to add deg(µn ) to R. Since deg(µn ) = deg(m0 /d) (see Theorem 4.4), we replace step 2 (b) of Algorithm 3.1 by the instruction “replace R by R + deg(m0 ) − deg(d)”. Since the algorithm with these modifications computes deg() = 2R, we must divide the value of R by 2 after the loop in step 2 terminates to obtain the correct value for the regulator.
4
Computation of a Minimum Adjacent to 1
The above discussion shows that the task of finding (or R) reduces to the problem of computing a reduced basis of a reduced fractional ideal A. In particular, we need to be able to generate the minimum adjacent to 1 in A. This is accomplished by applying a sequence of suitable unimodular transformations to the pair (˜ µ, ν˜) where {1, µ ˜, ν˜} is a k[t]-basis of A, until a basis {1, µ, ν} is obtained such that µ is our desired minimum. Before we present the details of this reduction technique, we require several somewhat technical definitions. Henceforth, we exclude the characteristic 2 case; that is, we require k to be a finite field of characteristic at least 5. If α = a + bρ + cω ∈ K with a, b, c ∈ k(t), let ξα = bρ + cω
= α − a, 1 (α − α ), ηα = bρ − cω = 2ι + 1 ζα = 2a − bρ − cω = α + α ,
(4.1)
where we recall that ι is a primitive cube root of unity. Then ξf α+gβ = f ξα +gξβ , ηf α+gβ = f ηα +gηβ , ζf α+gβ = f ζα +gζβ for any α, β ∈ K and f, g ∈ k(t). Simple calculations show α=
1 (3ξα + ζα ), 2
α α =
1 (3ηα2 + ζα2 ). 4
and if A = [1, µ, ν] is a fractional ideal, then
ξ η det µ µ = ξµ ην − ξν ηµ = −2 ∆(A), ξν ην
(4.2)
(4.3)
600
Renate Scheidler and Andreas Stein
so this determinant is independent of the choice of basis of A. We are now ready to present our reduction method. Algorithm 4.1 (Reduction Algorithm). Input: µ ˜, ν˜ where {1, µ ˜, ν˜} is a basis of some reduced fractional ideal A. Output: µ, ν where {1, µ, ν} is a basis of A such that |ζµ | < 1, |ζν | < 1, |ξµ | > |ξν |, |ηµ | < 1 ≤ |ην |. Algorithm: 1. Set µ = µ ˜, ν = ν˜. 2. If |ξµ | < |ξν | or if |ξµ | = |ξν | and |ηµ | < |ην |, replace
µ 0 1 µ by . ν −1 0 ν 3. If |ηµ | ≥ |ην | (a) while ξµ /ξν = ηµ /ην , replace
µ µ 0 1 by . ν ν −1 ξµ /ξν (b) Replace
µ ν
by
0 1 −1 ξµ /ξν
µ . ν
(c) If |ηµ | = |ην |, replace µ ν
by
1 −a 0 1
µ ν
where a = sgn(ηµ )sgn(ην )−1 ∈ k ∗ . 4. (a) While |ην | < 1, replace
µ µ 0 1 by . ν ν −1 ξµ /ξν (b) While |ηµ | ≥ 1, replace µ ν
by
ην /ηµ −1 1 0
5. If |ζµ | ≥ 1, replace µ by µ − (1/2) ζµ . If |ζν | ≥ 1, replace ν by ν − (1/2) ζν .
µ . ν
Unit Computation in Purely Cubic Function Fields of Unit Rank 1
601
Proposition 4.2. Algorithm 4.1 terminates and produces the output specified above. Proof. It is easy to see that all transformations of µ and ν in steps 2, 3 and 4 maintain a basis {1, µ, ν} of A because the basis transformation matrices all have determinant 1. We claim that after step 3, we have |ξµ | > |ξν |,
|ηµ | < |ην |.
(4.4)
This can be seen as follows. Since step 2 replaces µ by ν and ν by −µ, we have |ξµ | > |ξν | or |ξµ | = |ξν | and |ηµ | > |ηµ | after step 2. If at the beginning of step 3, |ηµ | < |ην |, then from the previous step |ξµ | > |ξν |, so conditions (4.4) hold and step 3 is skipped. Assume now that |ηµ | ≥ |ην |, so step 3 is entered. Consider step 3 (a) and set α = ν and β = ξµ /ξν ν − µ, so α and β are obtained by applying the linear transformation of step 3 (a) to µ and ν. Then ξµ |ξβ | = ξν − ξµ < |ξν | = |ξα |, ξν ηµ |ηβ | = ην − ηµ < |ην | = |ηα |. ην Hence, |ξν | and |ην | strictly decrease in each iteration, so the loop must terminate at the latest before |ξν ην | ≤ 1, for otherwise by (4.3): | ∆(A)| = |ξν ην ||ηµ /ην − ξµ /ξν | < |ξν ην | ≤ 1, contradicting Theorem 2.1. After step 3 (b), we have |ξβ | < |ξν | = |ξα | and
ξµ ηµ ηµ |ηβ | = − ην + ην − ηµ ≥ |ην | = |ηα | ξν ην ην because | ξµ /ξν − ηµ /ην | ≥ 1 and | ηµ /ην ην − ηµ | < |ην |. Finally, observe that in step 3 (c), a = ηµ /ην . If we set α = µ − aν and β = ν, then as before |ηα | < |ηβ |, and since |ξµ | > |ξν |, we have |ξα | = |ξµ − aξν | = |ξµ | > |ξν | = |ξβ |. So step 3 achieves the inequalities (4.4) above. In step 4, we ensure that |ηµ | < 1 ≤ |ην |. From (4.4), it is clear that at most one of the while loops in step 4 is entered. Consider first the case |ην | < 1, i.e. case 4 (a). Set α = ν and β = ξµ /ξν ν − µ. Then ξµ ην − ηµ > |ην | = |ηα |, |ηα | = |ην | < 1, |ξβ | < |ξν | = |ξα |, |ηβ | = ξν so inequalities (4.4) and the condition |ηµ | < 1 are maintained throughout the loop. Furthermore, |ην | strictly increases in each iteration, so the while loop will terminate with the desired basis. Step 4 (c) can be analyzed analogously. Finally, step 5 achieves |ζµ |, |ζν | < 1. To see this, let α = µ − (1/2) ζµ , then by (4.1) |ζα | = |ζµ − (1/2)ζζµ | = |ζµ − ζµ | < 1. Similarly for ν.
602
Renate Scheidler and Andreas Stein
We proceed to prove that the basis of Algorithm 4.1 is indeed a reduced basis, Using the identities (4.2), one can show that if α ∈ K, then |α | < 1 if and only if |ηα | < 1 and |ζα | < 1. Theorem 4.3. Let {1, µ, ν} be a basis of a reduced fractional ideal A such that |ζµ | < 1, |ζν | < 1, |ξµ | > |ξν |, |ηµ | < 1 ≤ |ην |. Then µ is the minimum adjacent to 1 in A, so {1, µ, ν} is a reduced basis of A. Proof. Let θ be the minimum adjacent to 1 in A, θ = l + mµ + nν with l, m, n ∈ k[t]. We need to show that l = n = 0 and m ∈ k ∗ . Since |θ | < 1, we have |ζθ | < 1 and |ηθ | < 1. Also |ζµ | < 1 and |ηµ | < 1 imply |µ | < 1. Then |µ| > 1 as otherwise µ ∈ k. Hence |µ| ≥ |θ| since otherwise 1 < |µ| < |θ| and |µ | < 1, contradicting (M3) for θ. Now |ξθ | = |2θ − ζθ |, so since |ζθ | < 1 and |θ| > 1, |θ| = |ξθ |. Similarly, |µ| = |ξµ |. If n = 0, then m = 0 as θ ∈ k[t], so |m| > |n| and |mξµ | > |nξν |. If n = 0, then 1 > |ηθ | = |mηµ + nην | with |nην | ≥ 1 implies |mηµ | = |nην |. Thus, |n| ≤ |nην | = |mηµ | < |m|, so |m| > |n| and |mξµ | > |nξν | as well. It follows that |θ| = |ξθ | = |mξµ + nξν | = |mξµ | = |mµ| ≥ |mθ|, so |m| ≤ 1. Thus, 1 ≥ |m| > |n|, so n = 0 and m ∈ k ∗ . Now 1 > |ζθ | = |ζl+mµ | = |2l + ζµ |, so since |ζµ | < 1, |l| < 1, so l = 0 and θ = mµ ∈ k ∗ µ. The coefficients of the basis generated by Algorithm 4.1 are small: Theorem 4.4. Let A be a reduced fractional ideal and let {1, µ, ν} be the basis of A produced by Algorithm 4.1. Let µ = (m0 + m1 ρ + m2 ω)/d, ν = (n0 + n1 ρ + n2 ω)/d with m0 , m1 , m2 , n0 , n1 , n2 , d ∈ k[t] and gcd(m √ 0 , m1 , m2 , n0 , n1 , n2 , d) = | = |m ρ| = |m ω| ≤ | ∆|, and |n0 |, |n1 ρ|, |n2 ω| < 1. Then |d| < |dµ| = |m 0 1 2 √ | ∆|. Proof. |d| < |dµ| follows from |µ| > 1. From |µ| > 1 and |ζµ | = |3m0 /d − µ| < 1, it follows that |dµ| = |m0 |. The inequalities |ξµ | > 1 and |ηµ | < 1 imply |m1 ρ| = |m2 ω| = |dξµ |. From |ζµ | = |2m0 /d − ξµ | < 1, we obtain |dξµ | = |m0 |. So |d| < |dµ| = |m0 | = |m1 ρ| = |m2 ω|. Now dA is a reduced integral ideal with L(dA) = sgn(d)−1 d, so d3 N (A) = N (dA) | d2 , and thus |dN (A)| ≤ 1. From (2.1) and (4.3), we obtain √ √ | ∆| ≥ |dN (A) ∆| = |d ∆(A)| = |d(ξµ ην − ξν ηµ )| ≥ |dξµ | as |ξµ | > |ξν | and |ηµ | < 1 ≤ |ην |. √ Since |ξµ | > |ξν |, √ we have | ∆| ≥ |m1 ρ + m2 ω| > |n1 ρ + n2 ω|. Also,√| ∆(A)| = |ξµ ην | > |ην |, so | ∆| > |dην | = |n1 ρ − n2 ω|.√Hence |n1 ρ|, |n√ 2 ω| < | ∆|. Finally, |ζν | < 1 implies |2n0 − n1 ρ + n2 ω| < |d| < | ∆|, so |n0 | < | ∆|.
Unit Computation in Purely Cubic Function Fields of Unit Rank 1
5
603
Implementation
We implemented our algorithm on a Silicon Graphics Challenge workstation using the computer algebra system SIMATH developed by the research group of Professor H. G. Zimmer at the Universit¨ at des Saarlandes in Saabr¨ ucken, Germany. To compute with Puiseux series, it was necessary to use truncated series as approximations, in analogy to using rational approximations when computing with real numbers. To that end, we employed the method for extracting cube roots as described in [8] and implemented by Mang in [9] to compute “approximations” ρˆ and ω ˆ of the basis elements δ ρ and ω, respectively. That is, if ∞ ρ = i=− deg(ρ) ri /ti , then for δ ≥ 0, ρˆ = i=− deg(ρ) ri /ti is an approximation of precision δ to ρ, so |ρ − ρˆ| < q −δ . Similarly for ω. In contrast to Voronoi’s algorithm in number fields, it was possible to establish conditions on the required precision δ that could be checked throughout the algorithm; it is a simple matter to flag the cases where the precision is not large enough and increase it as required. It turned out that a uniform precision of δ = deg(∆) was sufficient throughout our computations. Examples show that reducing the precision to deg(∆)/2 or even deg(∆)/4 might still produce correct results, but computation times improved only marginally with smaller precision. Since the polynomials and series approximations in our algorithm generally had few zero coefficients, they were given in dense representation; that is, as a list starting with the degree of the polynomial or the series, followed by the coefficients in order of decreasing degree of monomial. The main difficulty in our implementation was the computation of the principal parts of quotients as required in steps 3 – 5 of Algorithm 4.1. Here, an approximation ξˆµ of ξµ = (m1 ρ + m2 ω)/d was represented as a pair (αµ , d) where ˆ ; similarly for ξν , ηµ , and ην . To compute a quotient ξµ /ξν αµ = m1 ρˆ + m2 ω for example, we performed “division with remainder” on the quanitities αµ and ˆ . Note that it is possible to reduce the division with remainder αν = n1 ρˆ + n2 ω of two truncated series to a division of a truncated series by just a polynomial by using formulas such as ξµ A − Bην = ξν C where A = m1 n21 H + m2 n22 G,
B = m1 n 2 − m2 n 1 ,
C = n31 H + n32 G.
Then ξµ /ξν = (A − B ηˆν )/C, provided |n1 |, |n2 | < |C| which is extremely likely. Here, ηˆν is an approximation of precision deg(B) to ην . Similar formulas, involving different values of A and C, but using the same B value, hold for the other quotients. Note that N (dA) = dB/sgn(dB), so B is independent of the basis and need only be computed once per reduction. Furthermore, |B| <
604
Renate Scheidler and Andreas Stein
|∆|/|d| ≤ |∆| by Corollary 2.2, so deg(B) < deg(∆). We performed computations with both explicit division with remainder and the above formulas, and the division with remainder version of the algorithm turned out to be about 20 percent faster. In step 5 of Algorithm 3.1, we approximate ζµ = 2m0 /d + ξµ by ζˆµ = (2m0 + αµ )/d. Then the principal part ζµ of ζµ can be computed as simply (2m0 − αµ )/d. This will always produce the correct polynomial as |ζµ −(2m0 +αµ )/d| √< max{|m1 |, |m2 |}/|d| q −δ < 1 since |d| ≥ 1 and at this point |m1 |, |m2 | < | ∆| by Theorem 4.4. Similarly for ζν .
6
Numerical Examples
All our examples were done over prime fields k = IFp where p is a prime with p ≡ −1 (mod 3), and used monic polynomials G and H. Not surprisingly, our regulator algorithm was significantly faster than our unit algorithm due to the time-consuming polynomial arithmetic involved in updating θn in step 2 (b) of each iteration of Algorithm 3.1. √ 3 The largest unit we computed was the fundamental unit of K = IF17 ( GH 2 ) 4 3 2 where G = t + 4 and H = t + t + 11t + 5t + 12. Here, = e0 + e1 ρ + e2 ω where deg(e0 ) = 1554, deg(e1 ) = 1551, and deg(e2 ) = 1552, so || = 171554 , a number of 3109 decimal digits. The period of is 775. It took just under 15 CPU minutes to compute . For the examples given in the table below, we randomly generated monic polynomials G, H ∈ IFp [t] so that deg(GH 2 ) ≡ 0 (mod 3), G and H are both squarefree, and gcd(G, H) = 1. Each row of the table specifies the prime p, the √ polynomials 3 G and H, the period l of the fundamental unit of K = IFp (t, GH 2 ), the regulator R of K, and the CPU time required to compute R. We point out that for small genus and large field of constants, knowledge of the regulator oftentimes uniquely determines the divisor class number h of the field, or at least narrows h down to only a few possible values. From the Hasse-Weil Theorem (see [13, Theorem V.1.15, p. 166, and Theorem V.2.1 , p. 169]), we can √ √ infer that ( q − 1)2g ≤ h ≤ ( q + 1)2g . By (1.2), h is a multiple of R. Usually, there are only a few multiples of R that fall within these bounds. For example, the last five examples in our table below each permit only three possible values for h. We plan to investigate the computation of a suitable approximation of h by means of truncated Euler products in a forthcoming paper.
Unit Computation in Purely Cubic Function Fields of Unit Rank 1
605
Table 1. Regulator Computations p
G
5 t+4
H 7
6
5
l 4
R
Time
3
t +t +t +4t +2t + 6387 6655 38.52 t2 + t + 1 5 t2 + 4t + 2 t8 + t7 + 3t5 + 3t4 + 57105 59501 8 min 13 3t3 + 2t2 + t + 2 5 t4 + t3 + 2t2 + 3t + 3 t4 + t2 + 2t + 3 2834 2950 17.31 5 4 3 2 5 t +t +3t +2t +2t+4 t5 + t4 + 4t3 + 4t2 + 3 251783 262322 37 min 9 t7 + 4t6 + 2t5 + 9t3 + 189893 191487 22 min 58 t2 + 4t + 10 11 t3 + 4t2 + 7t + 8 t3 + 2t2 + t + 1 855 870 3.97 4 2 11 t + 10t + 2t + 6 t4 + 2t3 + 10t2 + 6t + 6 122619 123718 15 min 7 11 t5 + 2t4 + 8t3 + t2 + t + 2 t2 + 4t + 8 61702 62204 8 min 45 11 t + 4
17 t3 + 9t2 + 12t + 2 t3 + 5t2 + 3t + 5 17 t4 + 15t3 + 12t2 + 14t + 6 t + 3 17 t5 + 3t4 + 13t3 + 15t2 + t2 + 6t + 3 7t + 13 23 23 23 23
t+3 t4 + 3t3 + 17t + 13 3 t + 5t + 2 t3 + 22t2 + 2t + 2 4 3 2 t + 22t + 16t + 4t + 4 t + 7 t5 + 15t4 + 16t3 + 16t2 + t2 + 21t + 10 4t + 16
29 t3 + 24t2 + 12t + 24 t3 + 16t2 + 10t + 1 4 3 2 29 t + 22t + 17t + 12 t+5 5 4 3 2 29 t + 27t + 13t + 10t + t2 + 4t + 17 23t + 3 41 t4 + 15t3 + 4t2 + 37t + 14 t + 28 41 t3 + 30t2 + 35t + 9 t3 + 29t2 + 15t + 38
sec sec sec sec sec sec sec sec
31987 32077 2 min 40 sec 892 894 3.38 sec 562601 564510 58 min 3 sec 1145 1146 4.20 sec 102347 102553 8 min 42 sec 4251 4256 16.50 sec 744378 745808 1 h 21 min 80008 80103 8508 8520 1483564 1485310
7 min 3 sec 33.62 sec 2 h 44 min
24238 24248 1 min 37 sec 961413 962005 1 h 25 min
71 t4 + 9t3 + 9t2 + 3t + 20 t + 56 41058 41064 2 min 49 sec 3 2 3 2 71 t + 30t + 37t + 2 t + 13t + 66t + 34 1408409 1408658 2 h 7 min 89 t2 + 8t + 56 t2 + 22t + 67 4 3 2 89 t +23t +50t +67t+35 t + 79 107 t2 + 58t + 74 2
t2 + 54t + 86 2
1317 1318 116511 116520 3862
3.87 sec 8 min 1 sec
3863
11.98 sec
6526
20.20 sec
197 t + 27t + 125
t + 65t + 158
6525
401 t2 + 51t + 400
t2 + 71t + 59
26925
26926 1 min 24 sec
797 t2 + 526t + 353
t2 + 765t + 687
70680
70681 3 min 42 sec
2
983 t + 15t + 279
2
t + 740t + 864
107574 107575 5 min 33 sec
606
Renate Scheidler and Andreas Stein
References 1. Buchmann, J. A.: A generalization of Voronoi’s algorithm I, II. J. Number Theory 20 (1985) 177–209 2. Buchmann, J. A.: The computation of the fundamental unit of totally complex quartic orders. Math. Comp. 48 (1987) 39–54 3. Buchmann, J. A.: On the computation of units and class numbers by a generalization of Lagrange’s algorithm. J. Number Theory 26 (1987) 8–30 4. Buchmann, J. A.: On the period length of the generalized Lagrange algorithm. J. Number Theory 26 (1987) 31–37 5. Buchmann, J. A.: Zur Komplexit¨ at der Berechnung von Einheiten und Klassenzahlen algebraischer Zahlk¨ orper. Habilitationsschrift, Universit¨ at D¨ usseldorf, Germany, (1987) 6. Buchmann, J. A., Williams, H. C.: On the infrastructure of the principal ideal class of an algebraic number field of unit rank one. Math. Comp. 50 (1988) 569–579 7. Delone, B. N.,Fadeev, D. K.: The Theory of Irrationalities of the Third Degree. Transl. Math. Monographs 10, Amer. Math. Soc., Providence, Rhode Island (1964) 8. Jung, E.: Theorie der Algebraischen Funktionen einer Ver¨ anderlichen. Berlin (1923) 9. Mang, M.: Berechnung von Fundamentaleinheiten in algebraischen, insbesondere rein-kubischen Kongruenzfunktionenk¨ orpern. Diplomarbeit, Universit¨ at des Saarlandes, Saarbr¨ ucken, Germany, (1987) 10. Pohst, M., Zassenhaus, H.: Algorithmic Algebraic Number Theory. Cambridge University Press, 1st paperpack ed., Cambridge (1997) 11. Scheidler, R., Stein, A.: Voronoi’s Algorithm in Purely Cubic Congruence Function Fields of Unit Rank 1 (in preparation) 12. Stein, A., Williams, H. C.: Some Methods for Evaluating the Regulator of a Real Quadratic Function Field. Experimental Mathematics (to appear) 13. Stichtenoth, H.: Algebraic Function Fields and Codes. Springer, Berlin (1993) 14. Voronoi, G. F.: On a Generalization of the Algorithm of Continued Fractions (in Russian). Doctoral Dissertation, Warsaw, Poland, (1896) 15. Williams, H. C.: Continued fractions and number-theoretic computations. Rocky Mountain J. Math. 15 (1985) 621–655 16. Williams, H. C., Cormack, G., Seah, E.: Calculation of the regulator of a pure cubic field. Math. Comp. 34 (1980) 567–611 17. Williams, H. C., Dueck, G. W., Schmid, B. K.: A rapid method of evaluating the regulator and class number of a pure cubic field. Math. Comp. 41 (1983) 235–286
An Improved Method of Computing the Regulator of a Real Quadratic Function Field Andreas Stein1 and Hugh C. Williams?2 1 2
University of Manitoba, Winnipeg MB R3T 2N2, Canada, [email protected] University of Manitoba, Winnipeg MB R3T 2N2, Canada hugh [email protected]
Abstract. There exists an effective algorithm for computing √ the regulator of a real quadratic congruence function field K = k(X)( D) of genus 2 g = deg(D)/2 − 1 in O(q 5 g ) polynomial operations. In those cases where 8 the regulator exceeds 10 , this algorithm tends to be far better than the g Baby step-Giant step algorithm which performs O(q 2 ) polynomial op2 erations. We show how we increased the speed of the O(q 5 g )-algorithm such that we are able to large values of regulators of real quadratic congruence function fields of small genus.
1
Introduction
Let k = IFq be a finite field of odd characteristic with q elements and let K be a real quadratic function field over k of genus g. √Then K can be generated over the rational function field k(X) as K = k(X)( D), where D is a monic, squarefree polynomial of degree 2g + 2, with respect to the real quadratic order √ OK = k[X][ D], i.e. the maximal order. ∗ = k ∗ × hi, where ∈ K is We know that the group of X-units E = OK a fundamental unit. In this case, the decomposition of the infinite place ∞ of k(X) is ∞ = ∞1 · ∞2 , where ∞1 and ∞2 are the infinite places of K/k with respect to OK . Denoting by v1 and v2 the corresponding normalized valuations of K, we define the regulator of K over k with respect to OK as R := |v1 ()|. F. K. Schmidt [6] showed that h = Rh0 , where h0 denotes the ideal class number of K with respect to OK and h the divisor class number of K. In [14], an algorithm was presented which computes 2 R in O(q 5 g ) polynomial operations. For small genus, i.e. 2g + 2 < log q, this method is so far the best algorithm known. We first present the Baby step-Giant step algorithm for computing R by extending the infrastructure techniques of Shanks [8] (see also [18], [15] and [16]) to real quadratic function fields. For a more detailed discussion of Shanks’s infrastructure ideas in real quadratic function fields we refer to [11], [12] and ?
Research supported by NSERC of Canada Grant #A7649
J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 607–620, 1998. c Springer-Verlag Berlin Heidelberg 1998
608
Andreas Stein and Hugh C. Williams
[13], As shown in [14], the ideas of Lenstra [3] and Schoof [7] can be applied to 2 the problem of determining R unconditionally in O(q 5 g ) polynomial operations. This method can be improved considerably; indeed, we were able to compute 25-digit regulators in the case that the genus is 3 in approximately 10 hours CPU-time. We implemented the algorithms and compared their running times. Here, we remark that the regulator is the same as the order of the divisor [∞1 − ∞2 ] in the divisor class group, i.e. the group of k-rational points of the Jacobian of the curve having the given function field. The method we describe below should extend to give an algorithm with the same running time that can compute the order of any k-rational point on the Jacobian (presented as the class of a divisor). Moreover, the same algorithm should work for the function field of any curve (not necessarily hyperelliptic) over a finite field, given a way to compute the group operation in its Jacobian.
2
Continued Fraction Expansion
Let L = k((1/x)) be the field of Puiseux series over k. Then K is a subfield of k((1/x)) and the completions of K with respect to ∞1 and ∞2 are isomorphic to L. We consider the continued fraction expansion in K via Laurent series at ∞1 in the variable 1/x. Many properties of these continued fractions can be found in [1], [13], [5], and [17]; many others can easily be established by analogy Pmto results given in [4], [18]. Let α ∈ L \ k(X) be a non-zero element, i.e. α = i=−∞ ci X i with cm 6= 0. Set deg(α) = m |α| = q m
(2.1)
sgn(α) = cm P i bαc = m i=0 ci x .
If m is negative we have bαc = 0. For completeness, we set deg(0) = −∞ and |0| = 0. Set 1 α0 := α, a0 := bα0 c, αi+1 =
1 , ai+1 = bαi+1 c (αi − ai )
(i ∈ IN0 ) .
(2.2)
Also, θ1 := 1 θi+1 :=
i Y
1 αj
(i ∈ IN) .
(2.3)
j=1 1
Here and in the sequel, IN, respectively, IN0 , denote the set of positive, respectively, nonnegative integers
The Regulator of a Real Quadratic Function Field
609
We note that deg(αi ) = deg(ai ) ≥ 1 (i ∈ IN). For α ∈ L, we say that the continued fraction expansion of α is quasi-periodic if there are integers ν > ν0 ≥ 0 and a constant c ∈ k ∗ such that αν = cαν0
.
(2.4)
The smallest positive integer ν −ν0 for which (2.4) holds is called the quasi-period of the continued fraction expansion of α. The expansion of α is called periodic if (2.4) holds with c = 1. The smallest positive integer ν − ν0 for which (2.4) holds with c = 1 is called the period of the continued fraction expansion of α. In the periodic case, the quasi-period divides the period, and they both start at the same index ν0 . We now investigate the√ continued fraction expansion of real quadratic irra√ tionalities. We set d = b Dc. Let α = (P + D)/Q, α ∈ L \ k(X), where 0 6= Q, P ∈ k[X], and Q|(D − P 2 ). We put Q0 = Q, P0 = P , α0 = α, Q−1 = (D − P 2 )/Q, We iterate 2 )/Qi (i ∈ IN0 ) . Pi+1 = ai Qi − Pi ; Qi+1 = (D − Pi+1 Then 0 6= Qi , Pi , ∈ k[X], Qi | D − Pi2 and √ αi = (Pi + D)/Qi (i ∈ IN0 ) .
(2.5)
(2.6)
Defining ri ∈ k[X] to be the remainder on division of Pi + d by Qi , we obtain the optimized formulas (i ∈ IN0 ) Pi+1 =d − ri Qi+1 =Qi−1 + ai (ri − ri−1 ) (i ∈ IN) (2.7) (i ∈ IN0 ) ai =(Pi + d) div Qi (i ∈ IN0 ) . ri =(Pi + d) mod Qi By definition, deg(ri ) < deg(Qi ) for i ≥ 0. Finally, Q0 θi+1 θ i+1 = (−1)i Qi
(i ∈ IN0 ) .
(2.8)
√ α = (P + √ if deg(α) < 0 < deg(α), or equivalently, √ D)/Q is called reduced, deg(P − D) < deg(Q) < deg(P + D). Artin, [1], p.193, showed that if one αi is reduced for i ∈ IN0 , then all αj are reduced for j ≥ i. Of course, the continued fraction expansion of real quadratic irrationalities is√periodic, if k is finite. Let us now consider the special case that α = D. The continued fraction expansion of α is periodic and quasi-periodic. We easily see that α is not reduced; but α1 is reduced, and, therefore, so is αi for any i ≥ 1. We know that E = k ∗ × h θ m+1 i, and the regulator R of K with respect to OK is then R = deg(θ m+1 ) .
(2.9)
We also know that for s ∈ IN0 , we have Qs ∈ k ∗ if and only if s = λm with λ ≥ 0. Furthermore, θλm+1 θλm+1 ∈ k ∗ for λ ≥ 1.
610
3
Andreas Stein and Hugh C. Williams
Baby Step Giant Step Method
In this section, we point out the close relation between primitive ideals and the continued fraction expansion of real quadratic irrationalities. Any non-zero subset a of OK is an integral ideal if and only if√there exist S, P, Q ∈ k[X] with case, we call Q|(D − P 2 ) such √ that a = SQIFq [X] + (SP + S D)IFq [X]. In this √ SQ, SP + S D a k[X]-basis of a, and we write a = [SQ, SP + S D]. We say that an integral OK -ideal a is primitive, if S can be chosen to be 1. √ The k[X]-basis [Q, P + D] of a primitive OK -ideal a is unique, if sgn(Q) = 1 and deg(P ) < deg(Q). This gives a method to test ideal equality. For any OK -ideal a, the OK -ideal a := {α; α ∈ a} is called the conjugate ideal of a. If a = (α) = αOK with α ∈ K, we call a a principal OK -ideal. We say that two integral OK -ideals a and b are equivalent, written a ∼ b, if there exist some non-zero elements α, β ∈ OK such that (α)a = (β)b. Let a and b be two primitive OK -ideals. By using essentially the same ideas as Gauss (see [9], [3] or [15]), we can compute the product of a and b, i.e. a primitive OK -ideal c and a polynomial S ∈ k[X] such that ab = (S)c. This can be done in O(deg(D)) polynomial operations. 2 An primitive OK -ideal a is called reduced √ if there exists a IFq [X]-basis for a √ of the form { Q, P + D} such that (P + D)/Q is a reduced √ real quadratic irrationality. We know that a primitive OK -ideal a = [Q, P + D] is reduced if and only if deg(Q) ≤ g. The continued fraction expansion of primitive OK -ideals is defined via√the continued fraction expansion of real quadratic irrationalities. Let a = [Q, P + D] √ be any primitive OK -ideal, and set α := (P + D)/Q. With Qi , Pi ∈ IFq [x] defined as in (2.5), we let a1 := a, Q0 := Q, and P0 := P . Then, for i ∈ IN0 , h √ i (3.1) ai+1 := Qi , Pi + D is a primitive integral OK -ideal, and αi = (Pi + ai+1 a Baby step. Furthermore,
√
(Qθi+1 ) ai+1 = (Qi ) a .
D)/Qi . We call a step ai → (3.2)
√ If αi = (Pi + D)/Q √i is reduced (i ∈ IN0 ), then the ideal ai+1 is reduced. Let a = a1 = [Q0 , P0 + D] be any non-reduced, primitive OK -ideal, then there is a l ∈ IN with l ≤ 12 deg(Q) − (g + 1)/2 + 1 such that al+1 is reduced, i.e. deg(Ql ) ≤ g. Each ideal class contains exactly one cycle of reduced OK -ideals, and the continued fraction expansion of a reduced OK -ideal a will produce all equivalent ideals. In particular, if a is a reduced, principal OK -ideal, then the continued 2
Here and in the sequel we will use the term “polynomial operations” to mean one of the basic arithmetic operations of addition, subtraction, multiplication, division with remainder, degree comparison, or assignment in k[X].
The Regulator of a Real Quadratic Function Field
611
fraction expansion of a gives us a method to compute all reduced, principal OK -ideals. That means, if a = a1 and b are two reduced, principal OK -ideals, then there exists some ν ∈ IN0 such that b = aν+1 , and by (3.2), we have (Qθν+1 )aν+1 = (Qν )a. Then we define the distance from a to b = aν+1 as δ(b, a) = δ(aν+1 , a) := deg θ ν+1 . (3.3) We always put δi := δ(ai , a). Note that the distance function δi is integer valued and strictly increasing in i. Thus, if δi = δj , we conclude that ai = aj . We have that ai = aj if and only if δi = δj + lR where R is the regulator of K. In this case θ i and θ j differ only by a unit. We can compute δi by δi =
1 2
deg(D) − deg(Q) +
i−2 X
deg(aj )
(i ∈ IN , i ≥ 2) .
(3.4)
j=1
√ √ In the sequel, we let a = a1 = (1) = OK = [1 , D] and α0 = α = D. Clearly, a is reduced. Also ai = (θ i ) are reduced principal ideals for i√∈ IN0 . Let m be the quasi-period of the continued fraction expansion of α = D. Then, am+1 = a1 = a = OK , and R = δm+1 . By iteration, we obtain aλm+i+1 = ai+1 and δλm+i+1 = λR + δi+1 for i ∈ IN. Furthermore, we have ai+1 = [Qi , Pi+1 + √ D], and then ai+1 = am−i+1
(0 ≤ i ≤ m) .
(3.5)
If we set δ˜i := δ(ai , a), we get R = δ˜i+1 + δi+1 − deg(Qi ) (0 ≤ i ≤ m) .
(3.6)
Fix any s, t ∈ IN. Then, we can find a polynomial S ∈ IFq [X] and a primitive OK -ideal c such that as at = (S)c. We apply the continued fraction algorithm to c = c1 . Let l ∈ IN minimal such that cl+1 is reduced. Since cl+1 is also principal, we must have cl+1 = aν with some ν ∈ IN. We then know from [12] that δν = δt + δs + f , where f ∈ ZZ such that −2g ≤ f ≤ 0. We call the computation of aν and f from as and at a Giant step, and denote this operation by as ∗ bt := (aν , f) .
(3.7)
Consequently, a Giant step is a composition of two operations, namely computation of the product of two primitive OK -ideals and reduction of the primitive part of the product using the continued fraction algorithm. This provides us with an algorithm to compute the regulator K (see [14], [12]). Theorem 3.1. There exists an effective algorithm (Baby step-Giant step) for √ computing the regulator R of K = k(X)( D) in OK in O(q g/2 ) polynomial operations.
612
Andreas Stein and Hugh C. Williams
The idea of the Baby step-Giant step algorithm is to create a stock of principal, reduced ideals up to an index s + T where T ≥ 14 deg(D), and s should be g g of order q 2 . By using z = O(q 2 ) Giant steps we jump to principal ideals in the same chain lying at a distance of about 2δs away from each other. √ Because of the quasi-periodicity of the continued fraction expansion of α = D, we must reach one of the stored ideals. We only have to make sure that the step size is not greater than the length of the initial interval. Also, note that this algorithm can be used to test√whether R is less than a given √ bound G. For instance, if and z = O( G) Giant steps, then one can one performs s = b Gc Baby steps √ determine whether R ≤ G in O( G) polynomial operations.
4
The New Method
We now need √ the concept of “closest ideals” as defined in [5]. Again, let a = a1 = OK = [1 , D]. For any non-negative integer y we define the reduced, principal ideal a(y) by a(y) = aj , where δj ≤ y < δj+1 . We also know from [14] that a(y) = aj and δj can be computed effectively in O(g log y) polynomial operations. First, we sketch the method of [14]. Basically, one has to perform the following steps: a) Compute E, L ∈ IN such that |h − E| < L2 . √ b) If R ≤ G := E/ q L, then stop. (Baby step-Giant step) c) Compute a multiple h0 = h∗ R of R by searching the interval [E −L2 , E +L2 ] of length 2L2 with Baby steps and Giant steps. d) Let B := (E + L2 )/G > h∗ . Compute h∗ by trying all primes less than B as follows: r | h∗ ⇔ a(h∗ R/r) = OK . We refer to this algorithm as Regulator1. One disadvantage of this method is that G and B are dependent on E and L. If G is too small, then B is too large which means that too many primes have to be tested in the last step of the algorithm. The main theoretical step is the approximation of h. For deg(D) ≤ 6, i.e. g = 1 or 2, we use √ √ 2g 2g ( q − 1) ≤ h ≤ ( q + 1) .
(4.1)
In this case, the approximation of h is given immediately without further computations. For g ≥ 3 we bound the tail of the truncated Euler product. Let ζK (s) be the zeta function for K. Let P represent any prime polynomial in k[X] and define
The Regulator of a Real Quadratic Function Field
613
χ(P ) ∈ {−1, +1, 0} by Artin’s [1] symbol [D/P ]. We know that −1 Y χ(P ) −s+1 −s −s −s −1 . 1− 1−q = LK (q ) = 1 − q ζK (s) 1 − q |P |s P
(4.2) Since h = LK (1), we see that h = q g LK (1/q) = q g
1 − q −1
−1 Y
1−
P
For n ∈ IN, we put E 0 (n, D) =
q g+1 q−1
Y |P |≤qn
Y
B(n, D) = log
1−
1−
|P |>qn
χ(P ) |P |
χ(P ) |P |
χ(P ) −1 . |P |
−1 ,
−1 ,
(2g + 3) . ψ(n, D) = √ ( q − 1) q n/2 We define E = E(n, D) and L = L(n, D) by E(n, D) = Ne ( E 0 (n, D) ) ,
(4.3)
&r E 0 (n, D)(eψ(n,D)
L(n, D) = Then,
1 − 1) + 2
' .
(4.4)
h = E 0 (n, D) · e B(n,D) ,
and | B(n, D) | < ψ(n, D) .
(4.5)
Also, we have for any n ∈ IN that | h − E(n, D) | < L(n, D)2 . −n−1
g
n+1
Since E(n, D) = O(q g ), ψ(n, D) = O(q 2 ), L(n, D) = O(q 2 − 4 ), and g n+1 L(n, D) = Ω(q 2 − 4 ), the optimal choice for n, provided that g ≥ 3, is n = Ne((deg(D) − 3)/5). Thus, n = bdeg(D)/5c − 1, if deg(D) ≡ 0 (mod 10), and n = bdeg(D)/5c, otherwise. We also assume that q is sufficiently large that ψ(n, D) < 1. Theorem 4.1. There √ exists tor R of K = k(X)( D) in the algorithm needs O(q 1/4 ), Furthermore, if g ≡ 3 ( mod nomial operations.
an effective algorithm for computing the regulaO(q 2g/5 ) polynomial operations. For g = 1, 2, 3, O(q 3/4 ), O(q) polynomial operations, respectively. 5), then the algorithm performs O(q (2g−1)/5 ) poly-
614
5 5.1
Andreas Stein and Hugh C. Williams
Improvements General Improvements
In the third step of Regulator1, one computes a multiple h0 = h∗ R such that h0 ≤ (E + L2 ). Since R is an integer, we can factor h0 and simply try the factors of h0 by the method described in step 4.) of the algorithm. Since factorization can be done in subexponential time, and there are only O(log(E + L2 )) factors to be considered, the speed of the final step can be considerably decreased. Also, √ note that G can be chosen much smaller than G = E/ q L, for instance as the fixed value 100000. Step 3.) is the most time-consuming step of Regulator1, since we have to perform O(L) Baby steps and O(L) Giant steps. Hereby, we must store O(L) reduced, principal OK -ideals. Each of these ideals is represented by two polynomials Q, P such that deg(P ) < deg(Q) ≤ g. Therefore, one has to store roughly 2g + 1 integer coefficients and one additional integer, the distance of the ideal. Thus, the number of reduced, principal OK -ideals, which could be stored, was restricted, in our implementation, to 100000. For larger regulators, this caused an increase of the number of Giant steps and slowed down the algorithm considerably. However, we found that the time needed to perform 1 Giant step coincides with that of l Baby steps, where some experimental values for l can be found in Tab. 1. In the first row, there are values for q = p prime. The first column contains values for g = deg(D)/2 − 1. Table 1. Comparison of Baby steps and Giant steps g\p 17 97 991 10009 100003 1000003 10000019 100000007 1073741741 3 9 9 5 15 15 8 20 24 13 34 36
9 15 26 38
10 16 28 40
11 18 30 44
11 18 30 46
11 18 30 48
11 18 30 49
11 18 30 50
In the sequel, we denote by N the number of Baby steps the computer is able to store. We now perform l · N Baby steps to compute the ideals b1 , . . . , blN , but store every l th ideal only. This set of ideals covers a range which is l times bigger than the set {b1 , . . . , bN }, because δ(bN·l , b1 ) ≈ l · δ(bN , b1 ). This means, the new Giant steps can be l times larger than the previous Giant steps. For each Giant step we have to compute l additional Baby steps, since we stored every l th ideal only. This only doubles the time needed for one Giant step, but increases the speed of the time needed for the Giant steps by a factor of l/2. Thus, we were able to compute much larger regulators. Finally, we do not need to store every coefficient of the polynomials Q, P representing a reduced, principal OK -ideal a. If q is large, it is sufficient to store only the last coefficient of P , some aspect of Q (for instance the second coefficient
The Regulator of a Real Quadratic Function Field
615
of Q) and the distance δ. If the search for a multiple produces a value h00 which might be a multiple of the regulator, one has to test whether a(h00 ) = OK . If this is the case, then h00 is a multiple. If a(h00 ) 6= OK , then continue with the search. In practice, this test had to be performed only a few times. 5.2
The Algorithm
We are now able to present the improved algorithm. We allow G as an input parameter. In practice, we used G = 100000. Also, l was selected as an experience parameter with respect to the above table such that the algorithm is optimal. Algorithm 5.1. Regulator2 Input: G ∈ IN, l ∈ IN, k = IFq , D ∈√k[X] monic, squarefree of even degree, Output: R, the regulator of k(X)( D). √ 1 1.) If g = 1, then s := bq 4 c; E := q + 1; L := d 2 q 1/4 e ; √ 2 If g = 2, then s := bq 3 c; E := q 2 + 6q + 1; L := d2 q 1/4 q + 1e; 2g−1 If g ≥ 3, then s := bq 5 c; n := Ne((2g − 1)/5). Compute E and L by (4.3) and (4.4). 2.) Use Algorithm Regulator1 to test, whether R ≤ G. If R ≤ G, then return(R). 3.) { R > G and |h − E| < L2 . Compute a multiple h0 = h∗ R of R such that h0 < E + L2 .} a.) Determine ak = a(E), δk , as = a(L), and δs . b.) Let b1 := ak and proceed in Baby steps from b1 to produce b1 , bl , b2l , b3l , 0 , . . . , δlt0 , where δi0 := δ(bi , b1 ), until δlt0 > δs + 12 deg(D) . . . , blt and δl0 , δ2l or t + 1 > N . Put S = {b1 , bl , b2l , b3l , . . . , blt }. c.) If t + 1 > s, then L1 := δlt0 , and compute ar = a(L1 ) and δr . We put c1 := ar and δ1∗ := δr . If t + 1 ≤ s, then we set c1 := as and δ1∗ := δs . For j ≥ 2 define cj and δj∗ recursively by (cj , fj ) := c1 ∗ cj−1 ; δj∗ := δ(cj , a1); Proceed until cj+λ or cj+λ ∈ S for some j ∈ IN and 0 ≤ λ ≤ l. d.) If cj+λ = bi then h0 := δj∗ + δ(cj+λ , cj ) − δ(bi , a1 ). If cj+λ = bi , then h0 := δj∗ + δ(cj+λ , cj ) + δ(bi , a1 ) − deg N (bi ). 4.) { h0 = h∗ R, where h0 < E + L2 . } Factor h0 and put B := (E + L2 )/G. For each rational prime divisor r of h0 such that r < B a.) Compute a(h0 /r γ ) for γ = 1, 2, . . . until one finds the least β such that |N (a(h0 /r β ))| 6= 1. Then r β−1 k h∗ . b.) B := B/r β−1 ; h0 := h0 /r β−1 ; 5.) R := h0 ; return(R).
616
Andreas Stein and Hugh C. Williams
We know that h = h0 R, h0 = h∗ R, and E − L2 < h, h0 < E + L2 . Note that, if R > 2L2 , then h = h0 and h0 = h∗ . Our experiments showed that in almost every case, R was bigger than 2L2 , and we could compute the values for h and h0 with no further efforts. Also, as described in [14], the methods of Buchmann and Williams [2] can be employed to provide an algorithm which will find h0 (given a ˜ 2 ) polynomial operations. In most cases h ˜=1 divisor ˜ h of h0 ) in O(q deg(D) /(Rh) was sufficient. 5.3
E(1, D) and E(2, D)
E(1, D) and E(2, D) represent the approximation of h, if 1 ≤ g ≤ 4 and 5 ≤ g ≤ 7, respectively. For n ∈ IN, we know that E(n, D) = Ne(E 0 (n, D)) and E 0 (n, D) =
n q g+1 Y F (ν, D) , q − 1 ν=1
where, for 1 ≤ ν ≤ n, F (ν, D) =
Y |P |=qν
q ν s ν q ν t ν qν = , q ν − χ(P ) qν − 1 qν + 1
and sν , respectively tν , denote the sum over all monic prime polynomials of degree ν with χ(P ) = 1, respectively, χ(P ) = −1. Thus, in order to compute F (ν, D), we have to generate each monic, prime polynomial of degree ν and evaluate its Artin’s symbol χ(P ) = [D/P ]. We now assume that k = IFq , where q = p is a prime. In this case we improved the running time for the approximation considerably. We have that s1 = #{P = X − c : [D(X)/(X − c)] = 1} = #{c ∈ IFq : (D(c)/q) = 1} , t1 = #{c ∈ IFq : (D(c)/q) = −1} , where D(c) denotes the value of the polynomial D ∈ k[X] at c and (D(c)/q) the ordinary Legendre symbol. Thus, we have to evaluate D(c) for each c ∈ IFq . Here, we made use of the method of finite differences which needs only (deg(D)2 − 1) multiplications and (q · deg(D) + deg(D)2 /2 − deg(D)/2) additions in IFq . In addition, we have to compute (D(c)/q) for each c ∈ IFq . First, we precom√ pute a table of all Legendre symbols (z/q) for each z ≤ b qc. To compute (a/q) we put r0 = q, r1 = a, B1 = 0, and B2 = 1. For i ≥ 2 we let ri = ri−2 (mod ri−1 ) , qi−1 = ri−2 (div ri−1 ) , Bi+1 = qi−1Bi + Bi−1 , √ until we find a minimal i such that Bi+1 > b qc. Then, i
(a/q) = (Bi /q) (ri−1 /q) (−1/q) ,
The Regulator of a Real Quadratic Function Field
617
√ where Bi , ri−1 ≤ b qc, and (−1/q)i = 1, if q ≡ 1 ( mod 4) or i even, and √ (−1/q)i = −1, otherwise. To show that ri−1 ≤ b qc, we note that (−1)i ri−1 = aBi − qAi
(i ≥ 1) ,
where A0 = 1, A1 = 0, and Ai+1 = qi−1Ai + Ai−1 for i ≥ 2. Now, Ai /Bi is a convergent in the continued fraction expansion of a/q, Bi < Bi+1 and √ |a/q − Ai /Bi | < 1/(Bi Bi+1 ). If i is minimal such that Bi ≤ b qc < Bi+1 , then √ ri−1 = |aBi − Ai q| < q/Bi+1 < q. In order to compute E(2, D) we have to determine s2 and t2 . We first have to 2 generate all monic, prime polynomials of degree 2. Note that X +AX+B ∈ k[X] 2 is prime, if and only if (A − 4B)/q = −1. Furthermore, the number of monic, prime polynomials of degree 2 is q(q − 1)/2. For large q, this number is large. Proposition 5.2. Let a, b ∈ IFq with (a2 − 4b)/q = −1 . Then, all monic primes of degree 2 are given by the sequence X 2 + AX + B, where A = A(l, m) = l(a − 2m)
,
B = B(l, m) = l2 (b − ma + m2 )
for m = 0, . . . , q − 1, l = 1, . . . , q−1 2 . Thus, we only need to find one monic, prime polynomial of degree 2, and the above proposition tells us how to find all of them. Let A = A(l, m) and B = B(l, m) be given as in the proposition, and let D
mod (X 2 + AX + B) = rX + s ,
(5.6)
where r = r(l, m), s = s(l, m) ∈ IFq . Then, it is a simple matter to see that (5.7) D/(X 2 + AX + B) = (s2 − Asr + Br 2 )/q , where f(l, m) := s2 − Asr + Br 2 is a polynomial of degree 2(2g + 2) in both l and m. Here, the method of finite differences can be applied again. For fixed l or m, the first 2(2g + 2) + 1 values of (f(l, m)/q) can be computed by (5.6) and (5.7). The remaining values can be determined with finite differences and (5.7).
6
Computations
Our computations were performed on a Sun SPARC Ultra 1/140 under Solaris 2.5. We made use of the Computer Algebra System SIMATH [10] which is based on the programming language C. All our computations were done over prime fields IFp , i.e q = p prime, and p < 230 − 1. We concentrated on real quadratic function fields of genus 3, i.e. deg(D) = 8. The discriminants D were selected as follows: For a prime p we randomly constructed a monic, squarefree √ polynomial D of degree 8 in IFp [X]. We calculated the regulator R of IFq (X)( D), and compared the running times of Algorithm Regulator1 and Regulator2 in Tab. 2. Here, Time1 denotes the time for determining the regulator with Algorithm
618
Andreas Stein and Hugh C. Williams
Regulator1, and Time2 the time for determining the regulator with Algorithm Regulator2. Notice that in all cases Regulator2 produces a considerable saving in the amount of time needed to compute R. In Tab. 3, we list examples with large regulators which can not be computed in a reasonable amount of time by Regulator1. For these computations, we used a Pentium Pro/200 under Linux. Table 2. Comparison of the regulator computations on Sun SPARC Ultra 1/140 p
D
R
h0 Time1 Time2
10009 X 8 + 6496X 7 + 5200X 6 + 2832X 5 + 8736X 4 + 8695X 3 + 4883X 2 + 8797X + 2903
15894599452 64
19 s
5s
11003 X 8 + 9536X 7 + 4706X 6 + 3039X 5 + 2291X 4 + 3949X 3 + 8403X 2 + 7501X + 1971
42196128039 32
21 s
6s
12007 X 8 + 6823X 7 + 4262X 6 + 11348X 5 +3943X 4 +10142X 3 + 8163X 2 + 4734X + 1849
7590933683 228
23 s
8s
16001 X 8 + 10484X 7 + 12899X 6 + 15735X 5 +12388X 4 +1694X 3 + 6393X 2 + 6916X + 10016
84960266440 48
33 s
9s
59999 X 8 + 41207X 7 + 11741X 6 + 11960X 5 +4931X 4 +55683X 3 + 58644X 2 + 57422X + 50393
1654185507576 130 1
3 4
m
36 s
70001 X 8 + 1798X 7 + 10632X 6 + 61470X 5 +11788X 4 +8582X 3 + 12335X 2 + 62507X + 17036
2701509852858 128 3
1 2
m 1
1 2
1000003 X 8 + 395982X 7 + 594024X 6 + 282144X 5 + 861840X 4 + 389178X 3 + 108847X 2 + 245026X + 602782
33348834711480068 30
3h
m
22 m
2999999 X 8 + 637021X 7 + 1126126X 6 + 2701685961518879123 10 1503554X 5 + 1345264X 4 + 2946924X 3 + 1822234X 2 + 1118142X + 203383
3
3 4
h 45
1 2
2999999 X 8 +1714883X 7 +2925166X 6 + 9001031984873848717 256938X 5 + 2705750X 4 + 722268X 3 + 1261069X 2 + 2139572X + 1286480
3
9
1 2
h
1
1 8
h
4000037 X 8 +1951801X 7 +3708092X 6 + 32003976721016837378 3700497X 5 + 33188X 4 + 3264226X 3 + 1754294X 2 + 3133810X + 2240125
2 17
1 2
h
1
3 4
h
m
The Regulator of a Real Quadratic Function Field
619
Table 3. Regulator computations on Pentium Pro/200 under Linux p
D
R
h0 Time2
1000099 X 8 + 376676X 7 + 409564X 6 + 364348X 5 + 211552X 4 + 642542X 3 + 945020X 2 + 810762X + 86535
250112704595878790 4
20 m
2000003 X 8 +1234570X 7 +1224049X 6 + 1399371X 5 + 296564X 4 + 451456X 3 + 272553X 2 + 20274X + 554588
8014381361254268607 1
3h
10000019 X 8 +6059305X 7 +5710629X 6 + 2372603X 5 + 5659597X 4 + 8469475X 3 + 8007833X 2 + 2142015X + 5273278
1000105118373556911188 1
7
3 4
h
30000001 X 8 + 16421527X 7 6 19249697X + 10198529X 5 217185X 4 + 14817291X 3 7647976X 2 + 21826962X 20299762
1350236945849657791993 20
9
1 4
h
9
1 4
h
+ + + +
100000007 X 8 + 14075936X 6 90596192X 4 31539469X 2 55743875
88645202X 7 + 90052032X 5 + 35705398X 3 + 32354275X
+ 999988515289041165142833 1 + + +
100000007 X 8 + 14154736X 6 18736251X 4 3879894X 2 77756256
11607007X 7 + 2837523X 5 + 22879699X 3 + 48555574X
+ 1000008785601260429574717 1 + + +
10 h
Acknowledgments: We would like to thank an anonymous referee for some helpful comments.
References 1. Artin, E.: Quadratische K¨ orper im Gebiete der h¨ oheren Kongruenzen I, II. Mathematische Zeitschrift 19 (1924) 153–246 2. Buchmann, J., Williams, H.C.: On the Computation of the Class Number of an Algebraic Number Field. Math.Comp. 53 (1989) 679–688 3. Lenstra, H.W., Jr.: On the Calculation of Regulators and Class Numbers of Quadratic Fields. London Math.Soc.Lec.Note Ser. 56 (1982) 123–150 4. Perron, O.: Die Lehre von den Kettenbr¨ uchen. Teubner, Leipzig (1913)
620
Andreas Stein and Hugh C. Williams
5. Scheidler, R., Stein, A., Williams, H.C.: Key-exchange in Real Quadratic Congruence Function Fields. Designs, Codes and Cryptography 7, Nr.1/2 (1996) 153–174 6. Schmidt, F.K.: Analytische Zahlentheorie in K¨ orpern der Charakteristik p. Mathematische Zeitschrift 33 (1931) 1–32 7. Schoof, R.J.: Quadratic Fields and Factorization. Computational Methods in Number Theory (H.W.Lenstra and R.Tijdemans, eds.). Math.Centrum Tracts 155 II, Amsterdam (1983) 235–286 8. Shanks, D.: The Infrastructure of a Real Quadratic Field and its Applications. Proc.1972 Number Th.Conf., Boulder, Colorado (1972) 217–224 9. Shanks, D.: Class Number, A Theory of Factorization and Genera. Proc.Symp.Pure Math.20 (1971) 415–440 10. SIMATH Manual Chair of Prof.Dr.H.G.Zimmer, University of Saarland (1997) 11. Stein, A., Zimmer, H.G.: An Algorithm for Determining the Regulator and the Fundamental Unit of a Hyperelliptic Congruence Function Field. Proc. 1991 Int. Symp. on Symbolic and Algebraic Computation, ISSAC, Bonn, July 15–17, ACM Press (1991) 183–184 12. Stein, A.: Algorithmen in reell-quadratischen Kongruenzfunktionenk¨ orpern PhD Thesis, Universit¨ at des Saarlandes, Saarbr¨ ucken (1996) 13. Stein, A.: Equivalences between Elliptic Curves and Real Quadratic Congruence Function Fields. Journal de Theorie des Nombres de Bordeaux 9 (1997) 75–95 14. Stein, A., Williams, H.C.: Some Methods for Evaluating the Regulator of a Real Quadratic Function Field. Experimental Mathematics (to appear) 15. Stephens, A.J., Williams, H.C.: Some Computational Results on a Problem Concerning Powerful Numbers. Mathematics of Computation 50 (1988) 619–632 16. Stephens, A.J., Williams, H.C.: Computation of Real Quadratic Fields with Class Number One. Mathematics of Computation 51 (1988) 809–824 17. Weis, B., Zimmer, H.G.: Artin’s Theorie der quadratischen Kongruenzfunktionenk¨ orper und ihre Anwendung auf die Berechnung der Einheiten- und Klassengruppen. Mitt.Math.Ges.Hamburg Sond., XII, No. 2 (1991) 18. Williams, H.C., Wunderlich, M.C.: On the Parallel Generation of the Residues for the Continued Fraction Algorithm. Mathematics of Computation 48 (1987) 405–423
The Equivalence between Elliptic Curve and Quadratic Function Field Discrete Logarithms in Characteristic 2 Robert J. Zuccherato? Entrust Technologies 750 Heron Road Ottawa, Ontario Canada K1V 1A7 [email protected]
Abstract. In this paper we show that solving the discrete logarithm problem for non-supersingular elliptic curves over finite fields of even characteristic is polynomial-time equivalent to solving a discrete logarithm type of problem in the infrastructure of a certain function field. We give an explicit correspondence between the two structures and show how to compute the equivalence.
1
Introduction
Shanks first introduced the concept of the infrastructure of a quadratic number field in 1972 [14]. Since then the concept has been generalized to function fields of odd characteristic [16] and also to function fields of even characteristic [18]. The infrastructure is the inner structure in an equivalence class of the ideal class group. The main tool that is used in the exploration of the infrastructure is the continued fraction algorithm. Scheidler, Buchmann and Williams [11] were able to use this infrastructure, a non-group structure, to implement a Diffie-Hellman [2] type key exchange system. Unfortunately, this system was plagued by problems of ambiguity. Using function fields of odd characteristic, these problems were overcome [12]. Recently, function fields of even characteristic were used to implement the key exchange scheme and also ElGamal [3] type signature schemes were introduced [8,10]. Elliptic curves were first proposed for use in public key cryptography by Koblitz [4] and Miller [6] in 1985. Since then a tremendous amount of work has been done both in implementation of elliptic curve cryptosystems and in showing their security (see for example [5]). Stein has been able to show, using results of Adams and Razar [1], that if we are working in odd characteristic, breaking elliptic curve systems is actually polynomial-time equivalent to breaking systems using the infrastructure of certain function fields [15]. (By “polynomial-time ?
This work was performed while the author was a student at Dept. of Combinatorics and Optimization, University of Waterloo, Waterloo, Ontario, Canada, N2L 3G1
J.P. Buhler (Ed.): ANTS-III, LNCS 1423, pp. 621–638, 1998. c Springer-Verlag Berlin Heidelberg 1998
622
Robert J. Zuccherato
equivalent” we mean that each problem may be reduced to the other in polynomial time.) This may provide further evidence of the security of elliptic curve systems as there is no known way to break systems based on the infrastructure. His result did not apply to characteristic 2 however, which is disappointing since systems using fields of characteristic 2 are the most attractive for implementation. This paper will show that breaking elliptic curve cryptosystems of even characteristic is also polynomial-time equivalent to breaking infrastructure cryptosystems of a certain type. This is accomplished by showing that the problems on which these systems are based, the elliptic and infrastructure discrete logarithm problems, respectively, are polynomial-time equivalent. Our explanation follows closely that of [1]. Section 2 gives an overview of the result. Sections 3 and 4 will provide background on quadratic function fields of characteristic 2 and their ideals. Section 5 gives an explicit correspondence between elliptic curves and certain function fields. In Section 6 we examine the connection between the periodicity of the continued fraction expansion and orders of points and finally in Section 7 the polynomial-time equivalence is shown.
2
An Overview
The remainder of this paper will describe the polynomial-time equivalence between the elliptic discrete logarithm problem and certain instances of the infrastructure discrete logarithm problem. Sections 3 and 4 provide the necessary background. Definition 1. Let E be a non-supersingular elliptic curve E : w 2 + vw = v3 + a2 v + a6 defined over the finite field k = GF (2n) and let P = (a, b) be a point on the curve. Let #E be the order of the curve. Then the elliptic discrete logarithm problem is, given a point Q, also on the curve, find the integer l, 0 < l < #E, such that Q = lP if such an l exists, otherwise return “No solution”. Definition 2. Let k = GF (2n) and K = k(x)(y) be a function field defined by the non-singular equation y2 + By = C as described in Section 3 and let O = k[x][y]. Let R be the regulator. Then the infrastructure discrete logarithm problem is, given a primitive reduced ideal A find δ(A, O) < R if it exists, otherwise return, “No solution”. This section will give an outline of the proof of this polynomial-time equivalence. In [5] it is defined what is meant by a non-supersingular elliptic curve over a field, k, of characteristic 2. The group law for this curve is also given. Thus, given a point P = (a, b) on an elliptic curve, E, we can compute all multiples of this point. Since an elliptic curve of this type is a finite group, P has a finite order, µ.
Discrete Logarithms in Characteristic 2
623
In Section 5 we will show how to use the curve E and the point P to produce an equation EP . We will give a birational transformation between E and EP so that given a point on E, (v, w) 6= P, ∞, we will be able to easily produce the corresponding point (x, y) on EP . We will be interested in the multiples of P , as shown in the following diagram. E, P 0P = ∞ P = (a, b) 2P = (v2 , w2 ) 3P = (v3 , w3 ) .. .
multiples of P
iP = (vi , wi ) .. .
EP −→
∞
−→ −→
(x2 , y2 ) (x3 , y3 ) .. .
−→
(xi , yi ) .. .
(µ − 1)P = (vµ−1 , wµ−1 ) −→ (xµ−1 , yµ−1 )
The equation EP will be of the form y2 + By = C with y ∈ k(( x1 )) and EP will be non-singular, so we will be able to use the results of Sections 3 and 4. In particular, we will be able to compute the continued fraction expansion of elements of K = k(x)(y). Section 6 will introduce a family of elements of K, fQ for all Q ∈ E, Q 6= P . We will examine the continued fraction expansion of fQ and see that its quasi-period is related to the order of P . In fact, the quasi-period of f∞ is m = µ − 1 and the elements of K produced by the continued fraction expansion of f∞ are (up to scalar factors) f∞ , f2P , f3P , . . . , f(µ−1)P . Since we can compute the continued fraction expansion of f∞ , we can use the results of Section 4 to produce O-ideals, Ai , corresponding to each of these quadratic irrationals. Section 6 will show that the ideal Ai , which corresponds to fiP for 2 ≤ i ≤ m, has the form Ai = [x+xi , yi +y] where (xi , yi ) is, as before, the point on EP corresponding to iP = (vi , wi ) on E. It is this final correspondence that will show that the two discrete logarithm problems are polynomial-time equivalent. This is outlined in the following diagram and the two main results of the paper. fQ ∈ K obtained from the continued fraction expansion of f∞
O-ideals
A1 = [1, y] f∞ −→ f2P −→ A2 = [x + x2 , y2 + y] f3P −→ A3 = [x + x3 , y3 + y] .. .. . . fiP −→ Ai = [x + xi , yi + y] .. .. . . fmP −→ Am = [x + xm , ym + y]
624
Robert J. Zuccherato
Theorem 1. Let E be a non-supersingular elliptic curve defined over k = F2n and let P be a point on the curve. Let EP be the quadratic model for E which defines the quadratic function field K = k(x)(y). If the elliptic discrete logarithm problem for E can be solved in polynomial time, then the infrastructure discrete logarithm problem for EP can also be solved in polynomial time. Theorem 2. Let E be a non-supersingular elliptic curve defined over k = F2n and let P be a point on the curve. Let EP be the quadratic model for E which defines the quadratic function field K = k(x)(y). If the infrastructure discrete logarithm problem for EP can be solved in polynomial time, then the elliptic discrete logarithm problem for E can also be solved in polynomial time.
3
Continued Fractions in Quadratic Function Fields of Characteristic 2
In this section we will review some results concerning the continued fraction algorithm in quadratic function fields of characteristic 2. For a more detailed description please see [18,19]. Let k be a field with q = 2n elements and let x be a transcendental element over k. We will be concerned with function fields of the form K = k(x)(y) where y satisfies the equation y2 + By = C for some B, C ∈ k[x]. We will assume that C is monic and that y2 + By = C has no singular points (u, v) ∈ k × k. The completion of k(x) with respect to the place at infinity is k(( x1 )). We need k(x)(y) ⊆ k(( x1 )), so we need y ∈ k(( x1 )) \ k(x). This is equivalent to saying that the place at infinity, P∞ splits completely as P∞ = P1 · P2 in K. Thus we are in the “real” case [17]. It is therefore necessary that deg(B) ≥ 1. For the remainder we will assume that this is the case. Since there are two embeddings of K ⊆ k(( x1 )), we must choose one. These embeddings correspond to two solutions in k of z 2 + z = γ for some γ ∈ k. We will consider k as being represented by the polynomial basis whose defining polynomial has smallest Gray Code rank. We will then choose as the solution to z 2 + z = γ, the one whose binary vector representation has smallest Gray Code rank. This fixes our embedding. The following definitions now make sense. If α ∈ K then α has the form α=
m X
ci x i
i=−∞
where m ∈ ZZ and ci ∈ k for i = −∞, . . . , m. Define deg(α) = m |α| = q m sgn(α) = cm m X bαc = ci x i i=0
Discrete Logarithms in Characteristic 2
625
with deg(0) = −∞ and |0| = 0. If the ring of integers of K is OK then the order O = k[x][y] ⊆ OK . For α = u + vy ∈ K with u, v ∈ k(x), define the conjugate of α by α = u + v(y + B) and the norm by N (α) = αα = u2 + uvB + v2 C. For any α ∈ K we can define the continued fraction expansion of α by α0 = α, a0 = bα0 c and the recursion 1 αi−1 + ai−1 ai = bαi c
αi =
for all i ≥ 1. +y where P, Q ∈ k[x] and Q|P 2 + P B + C then we call α a quadratic If α = PQ irrational. We can then compute the continued fraction expansion of α to get a i +y for i ≥ 0. Let d = byc. The following series of quadratic irrationals αi = PQ i recursions can be used to compute the αi , Pi+1 = d + ri + B
= ai Q i + P i + B P2
+Pi+1 B+C Qi
Qi+1 = Qi−1 + ai (ri + ri−1 ) = i+1 ai = (Pi + d) div Qi = bαi c ri = (Pi + d) mod Qi .
We say the continued fraction expansion of α is quasi-periodic if there exist integers ν > ν0 ≥ 0 and c ∈ k ∗ such that αν = cαν0 . The smallest integer ν − ν0 for which this holds is called the quasi-period. The expansion is called periodic if it holds with c = 1 and then ν − ν0 is called the period. If ν0 = 0 then the expansion is called pure quasi-periodic or pure periodic, respectively.
4
Ideals in O
This section will examine ideals in the order O and their relation with the continued fraction algorithm. This forms the infrastructure of K. For more details on these ideas see [18,19]. If A is an O-ideal, then it is known that A = [SQ, SP + Sy] for some S, P, Q ∈ k[x] with Q|P 2 + P B + C. If S can be chosen to be 1 then we say that A is primitive. We will now only consider primitive O-ideals. For such primitive ideals, Q and P can be chosen such that Q is monic and deg(P ) < deg(Q). If A is written in this way, then we say that it has been written in adapted form. It is now easy to see that there is a correspondence between quadratic irrationals and representations of primitive O-ideals. So, given an O-ideal A1 = [Q0, P0 + y] we can use the given recursions for the continued fraction algorithm to produce a series of O-ideals Ai = [Qi−1 , Pi−1 + y]
626
Robert J. Zuccherato
for i ≥ 1. A primitive O-ideal is called reduced if there exists a k[x]-basis {Q, P + y} for A with Q|P 2 + P B + C and |P + y + B| < |Q| < |P + y| . +y is also called In this situation, the corresponding quadratic irrational, α = PQ reduced. The following result from [18] describes the reduced O-ideals:
Theorem 3. Let A be a primitive reduced O-ideal with k[x]-basis {Q, P + y} as described above. Then the following hold 1. |P + y| = |B| and sgn(P + y) = sgn(B). Also the second highest coefficient of P + y must equal the second highest coefficient of B. 2. |B| ≥ |P | or |y| = |P |. 3. 1 ≤ |Q| < |B|. So if a = (P + d) div Q then 1 < |a| ≤ |B|. In fact A is reduced if and only if |Q| < |B|. Let A1 , A2 , . . . be a sequence of primitive ideals produced by the continued fraction algorithm and α0 , α1 , . . . be the corresponding sequence of quadratic irrationals. If A1 = [1, y] = O then α0 = y. Expanding the continued fraction algorithm on A1 = O will produce a series of reduced O-ideals, A1 , A2 , . . .. By our bounds on |P | and |Q| from Theorem 3 we see that this series will eventually repeat. Let < = {A1 = O, A2 , . . . , Am } be the set of all primitive reduced ideals produced by the continued fraction algorithm on O. Then |<| = m is also the quasi-period of α0 = y. Qi We define θ1 = 1 and θi+1 = j=1 α1j for all i ≥ 2. It is now true that θi+1 θi+1 =
Qi Q0
and when A1 = O that Ai = (θi ).
Since A1 = O = Am+1 , we must have θm+1 ∈ O∗ . Actually, O∗ = k ∗ × hθm+1 i. (See [18, Theorem 10].) The regulator, R, is therefore defined to be the degree of the fundamental unit of O∗ or R = deg(θm+1 ). We define the distance from A1 to Ai to be the degree of θi and denote it by δi = δ(Ai ) = δ(Ai , A1 ) = deg(θi ). The distance function can be computed as δ1 = 0 and δi = deg(B) − deg(Q0 ) +
i−2 X
deg(aj )
j=1
for i ≥ 2. From this formula and the bound on |a| we get that 1 ≤ δi+1 − δi ≤ deg(B) for all i ≥ 1. The distance between two reduced ideals Ai and Aj produced by the continued fraction algorithm on A1 = O can be defined as δ(Aj , Ai ) = δj − δi
Discrete Logarithms in Characteristic 2
627
when j ≥ i ≥ 1. Since a reduced ideal with a given distance may not exist, we define the ideal closest to the left of k ∈ ZZ to be the reduced ideal Ai such that k − δi is minimal and positive. Now, R = δ(Am+1 , O), so we get that δλm+i = δi + λR for all i ≥ 1 and λ ≥ 0.
5
The Correspondence
Assume that we have a non-supersingular elliptic curve defined over k = GF (q) where q = 2n . Let the curve be defined by E : w 2 + vw = v3 + a2 v2 + a6 for a2 , a6 ∈ k, a6 6= 0. In order to avoid confusion with addition of divisors, we will denote the usual addition on the curve E by the operation ⊕. Let P = (a, b) be a point on the curve with a, b ∈ k and 2P 6= ∞ (i.e. P is not a point of order 2). Then K = k(E) = k(v, w) is the function field for E. Now let w+b+a v+a 2 w+b+a + a2 . y=v+ v+a
x=
Notice that x and y are functions of v and w (i.e. x, y ∈ K). Substituting into E we get the following equation EP : y2 + (x2 + x + a + a2 )y = x3 + a2 x + a2 + b + a which we will call the quadratic model for E. It is this transformation between E and EP that will give the connection between the elliptic curve group and the infrastructure. Also, we have the following formulae for v and w in terms of x and y: v = y + x 2 + a2 w = x(v + a) + b + a. Since this is a birational transformation between E and EP we see that K can also be written as k(EP ) = k(x, y). Notice that y ∈ k(( x1 )) and that EP is non-singular, so K is a quadratic function field as described in Sections 3 and 4. For any f ∈ K, let (f) be the divisor of the function. Similarly, let (f)∞ and (f)0 be the divisors of poles and zeros, respectively. We would now like to find the divisors of poles of x and y. Since P is not a point of order 2, its uniformizing parameter is v + a and so it is easy to see from
628
Robert J. Zuccherato
the formula for x that P is a pole of x of order 1. Also, since wv is a uniformizing parameter for ∞ there is a pole of order 1 at ∞. These are the only poles, so (x)∞ = (∞) + (P ) . Similarly, it is easy to see that (y)∞ = 2 (P ) + (∞) . There is a k(x)-automorphism of K that takes y to y + x2 + x + a + a2 . That is, if f = g(x) + yh(x) is in K where g(x), h(x) ∈ k(x), write f ∗ = g(x)+yh(x)+(x2 +x+a+a2)h(x). This is the conjugate automorphism described earlier for general function fields. Notice that x∗ = x and y∗ = y +x2 +x +a+a2. Also v∗ = y + x + a and So and
w ∗ = x(v∗ + a) + b + a. v + v ∗ = x 2 + x + a + a2 vv∗ = ax2 + b + a + aa2 + a2 .
If Q0 = (x0 , y0 ) ∈ k × k is a solution to the equation EP , then so is Q∗0 = (x0 , y0 + x20 + x0 + a + a2 ). For Q ∈ E, Q 6= P, ∞, we can define Q∗ = (v∗ (Q), y∗ (Q)). Also define, ∞∗ = P and P ∗ = ∞. If we start with the curve EP then ∞ and P are the two points at infinity. We will now distinguish between the two. Now vv∗ has double poles at ∞ and P . If v and v∗ both have simple poles at ∞ (and at P ), then v + v∗ has at most a simple pole at ∞ (and at P ). This contradicts the fact that v + v∗ has double poles at ∞ (and P ). So v has a double pole at one of ∞ or P and v∗ has a double pole at the other. Using the uniformizing parameter for ∞ we can see that ∞ is a double pole of v and so P is a double pole of v∗ . Thus (v)∞ = 2 (∞) , (v∗ )∞ = 2 (P ) . There are two possibilities for y expressed as a Laurent series in x1 . We have chosen, as in Section 3, y = x + · · ·, so then v = x2 + x + a2 + · · ·. Also y∗ = x2 + (a + a2 ) + · · · and v∗ = a + · · ·. From this we get d = byc = x. Notice now that the place at infinity, P∞ , of k(x) extends to the place at ∞ in K. This follows from our choice for y. If we had made the other choice, then P∞ would have extended to the place at P . Since (x + x(Q)) = (Q) + (Q∗ ) − (∞) − (P ) for Q ∈ E, Q 6= ∞, P , we get that Q ⊕ Q∗ = P . If Q = (v0 , w0 ) ∈ E, then −Q = (v0 , w0 + v0 ). So, if Q 6= ∞, then v(−Q) = v(Q) and w(−Q) = w(Q) + v(Q).
Discrete Logarithms in Characteristic 2
6
629
Periodicity of the Continued Fraction Expansion and Orders of Points
This section will examine the continued fraction expansion of a specific function in K, fQ . Its periodicity will be related to the order of the point P and its special form will therefore give us the equivalence we want. Definition 3. Let Q be any k-rational point on E, with Q 6= P . Define v+v(Q∗ ) x+x(Q) if Q 6= ∞ fQ = v + v(Q∗ ) = v + a if Q = ∞. The function fQ is a quadratic irrational in K. Lemma 1. Let Q be any k-rational point on E, with Q 6= P . Then, up to multiplication by a non-zero constant, there is one and only one function f on E such that (f)∞ = (∞)+(Q) and f(P ) = 0. It is given by f = fQ . Furthermore, (fQ ) = (P ) + (−Q∗ ) − (Q) − (∞) . Proof. If Q 6= ∞, then (v + v(Q∗ )) = (Q∗ )+(−Q∗ )−2 (∞). Since x(Q) = x(Q∗) we have (x + x(Q)) = (Q) + (Q∗ ) − (∞) − (P ). Hence (fQ ) = (P ) + (−Q∗ ) − (Q) − (∞) .
If Q = ∞ then (fQ ) = (v + v(Q∗ ) = (Q∗ ) + (−Q∗ ) − 2 (∞). Since Q∗ = P , (fQ ) = (P ) + (−Q∗ ) − (Q) − (∞)
.
Let (f) = (P )+(Q0 ) −(∞)−(Q) for any point Q0 . Then f −1 fQ = (−Q∗ )− (Q0 ) which says that −Q∗ = Q0 and that f −1 fQ is a constant. Thus, up to multiplication by a non-zero constant, fQ is unique. Definition 4. Let ϕ(f) :=
1 , f + bfc
for any f ∈ K ∗ . This is one step in the continued fraction algorithm performed on f. Lemma 2. Let Q be a k-rational point on E, with Q 6= P . Let f be a function such that (f)∞ = (∞) + (Q). Then (P ) + (Q) − (∞) − (Q0 ) if Q 6= ∞ (ϕ(f)) = 2 (P ) − (∞) − (Q0 ) if Q = ∞
630
Robert J. Zuccherato
where Q0 =
P ⊕ Q if Q 6= ∞ 2P
if Q = ∞.
Thus, ϕ(f) is a constant multiple of fQ0 . Proof. Let Q 6= ∞. Since P is not a pole of f, f is not a polynomial in x and so f + bfc 6= 0 and has a zero at ∞. Thus, ϕ(f) has a pole at ∞. Now f has a simple pole at ∞, so bfc is a linear polynomial in x, and hence has a simple pole at P . So ϕ(f) has a zero at P . Also, f has a simple pole at Q and bfc does not, so ϕ(f) has a zero at Q. Now, f has poles at ∞ and Q, and bfc has poles at P and ∞, so ϕ(f) has no other zeros. Thus, (ϕ(f)) = (P ) + (Q) − (∞) − (Q0 ) and then Q0 = P ⊕ Q. If Q = ∞ then again ϕ(f) has a pole at ∞. Now f has a double pole at ∞, so bfc is a quadratic polynomial in x and has a double pole at P . Telling us that ϕ(f) has a double zero at P . There are no other poles for f or bfc, so (ϕ(f)) = 2 (P ) − (∞) − (Q0 ) and Q0 = 2P . Let Q 6= ∞, P . Then from the definition of fQ , v + v(Q∗ ) + x2 + x(Q)2 + x + x(Q) x + x(Q) ∗ v + a + a2 + x(Q)2 + v(Q∗ ) + x(Q) = x + x(Q) ∗ v + v(Q) = x + x(Q∗ ) ∗ = fQ ∗
fQ + (x + x(Q) + 1) =
since v = v∗ + x2 + x + a + a2 , v∗ (Q) = v(Q∗ ) and x(Q) = x(Q∗ ). ∗ ∗ ∗ = ∞. Thus deg(fQ Since fQ∗ has a zero at P , fQ ∗ has a zero at P ∗) < 0 and so, bfQ c = x + x(Q) + 1. This tells us that ϕ(fQ ) =
1 f ∗ (Q∗ )
x + x(Q∗ ) v∗ + v(Q) (x + x(Q∗ )) (v + v(Q)) = ∗ (v + v(Q)) (v + v(Q))
=
Discrete Logarithms in Characteristic 2
631
(x + x(Q∗ )) (v + v(Q)) v∗ v + (v + v∗ )v(Q) + v(Q)2 (x + x(Q∗ )) (v + v(Q)) = 2 (a + v(Q))x + v(Q)x + b + a + aa2 + a2 + av(Q) + a2 v(Q) + v(Q)2 = cfQ⊕P =
1 for some c ∈ k ∗ by the previous lemma. Thus c = a+v(Q) if v(Q) 6= a and 1 c = v(Q) if v(Q) = a. (Note that if v(Q) = a then Q = −P since we are not allowing Q = P .) Now let Q = ∞, so that fQ = v + a. We get
(v + a)∗ = (v + a) + x2 + x + a + a2 . Since v+a has a zero at P , (v+a)∗ has a zero at P ∗ = ∞. Thus deg((v+a)∗ ) < 0 and so bfQ c = x2 + x + a + a2 . Also, as before 1 (v + a)∗ v+a = ∗ (v + a)(v + a) v+a = ∗ vv + a(v + v∗ ) + a2 v+a = ax + b + a + a2 = cf2P ,
ϕ(fQ ) =
for some c ∈ k ∗ . Thus, c = a1 . These results allow us to state the following lemma. Lemma 3. Let Q be a k-rational point on E with Q 6= P . Then x + x(Q) + 1 if Q 6= ∞ bfQ c = x2 + x + a + a if Q = ∞ 2 and
1 v(Q)+a fQ⊕P if Q 6= ∞, −P ϕ(fQ ) = 1a f∞ if Q = −P 1 f if Q = ∞. a 2P
Let Q 6= P , we can then define 1 v(Q)+a if Q 6= ∞, −P λ(Q) = 1 if Q = ∞, −P a
632
Robert J. Zuccherato
Q ⊕ P if Q 6= ∞
and ψ(Q) =
2P
if Q = ∞.
As a consequence of the previous lemma ϕ(fQ ) = λ(Q)fψ(Q) . We will use the notation ϕj , λj and ψj to mean the j-fold composition of ϕ, λ and ψ with themselves. It is easy to see that ν
ϕν (cf) = c(−1) ϕν (f). Let ρν (Q) =
ν−1 Y
λ(ψj (Q))(−1)
ν−1−j
j=0
and we get the following proposition. Proposition 1. Let Q 6= P be a k-rational point on E. Then for ν ∈ ZZ≥0 , ϕν (fQ ) = ρν (Q)fψν (Q) and
ρν (Q) (x + x(ψν (Q)) + 1) if ψν (Q) 6= ∞ bϕν (fQ )c = ρν (Q) x2 + x + a + a2 if ψν (Q) = ∞.
Moreover, the formulae for ψν (Q) in terms of the group law on E are: Case 1: −Q is not a non-negative multiple of P and P has finite order µ. Write ν = qµ + r for q, r ∈ ZZ, 0 ≤ r < µ. Then ψν (Q) = Q ⊕ rP. Case 2: −Q = ν0 P and P has finite order µ. We may assume that 0 ≤ ν0 < µ−1 (since Q = P is not allowed). Write ν − ν0 = q(µ − 1) + r for q, r ∈ ZZ, 1 ≤ r ≤ µ − 1. Then ψν (Q) = (r + 1)P. Proof. These follow directly from repeated applications of Lemma 3 and the above definitions. To see Case 2, notice that if l(µ − 1) ≥ ν − ν0 > (l − 1)(µ − 1) for some l ∈ ZZ≥1 then ψν (Q) = ψr (ψ(l−1)(µ−1) (ψν0 (Q))) = ψr (ψ(l−1)(µ−1) (∞)) = ψr (∞) = (r + 1)P.
If ν − ν₀ ≤ 0 then ν ≤ ν₀ and r = ν − ν₀ + µ − 1, so we simply get

   ψ_ν(Q) = Q ⊕ νP = (ν − ν₀)P = (ν − ν₀ + µ)P = (r + 1)P.

Corollary 1. Let ν ≥ 1 be an integer and P be a point of finite order µ. Write ν = q(µ − 1) + r with q, r ∈ ℤ and 1 ≤ r ≤ µ − 1. Then ϕ_ν(f_∞) = ρ_ν(∞) f_{(r+1)P}.

Proof. This follows directly from Proposition 1 and the fact that ∞ = −ν₀P when ν₀ = 0.

From [18, Corollary 1] we know that if α is a quadratic irrational then the following hold.
1. If the continued fraction expansion of α is quasi-periodic with odd quasi-period m, then it is periodic with period n, and n = m or n = 2m.
2. If the continued fraction expansion of α is periodic with odd period n, then it is quasi-periodic with quasi-period m = n.

Theorem 4. Let Q ≠ P be any k-rational point on E. Then the continued fraction expansion of f_Q is quasi-periodic; indeed, it is pure quasi-periodic. Moreover, if P has order µ and the continued fraction expansion of f_Q has quasi-period m(Q), then

   µ = m(Q) + 1   if −Q = ν₀P, ν₀ ≥ 0,
   µ = m(Q)       otherwise.

Proof. Let P have finite order µ. If −Q is not a non-negative multiple of P, then ψ_µ(Q) = Q ⊕ 0P = Q. So

   ϕ_µ(f_Q) = ρ_µ(Q) f_{ψ_µ(Q)} = ρ_µ(Q) f_Q.

Thus, f_Q has pure quasi-period m(Q) ≤ µ. If −Q = ν₀P for 0 ≤ ν₀ < µ − 1, then ψ_{µ−1}(Q) = (µ − ν₀)P = µP − ν₀P = Q. So

   ϕ_{µ−1}(f_Q) = ρ_{µ−1}(Q) f_{ψ_{µ−1}(Q)} = ρ_{µ−1}(Q) f_Q

and f_Q has pure quasi-period m(Q) ≤ µ − 1. Now m(Q) ≤ µ (resp. µ − 1). Since m(Q) is the quasi-period of f_Q,

   ϕ_{m(Q)}(f_Q) = ρ_{m(Q)}(Q) f_{ψ_{m(Q)}(Q)} = c f_Q

for some c ∈ k*. Then ψ_{m(Q)}(Q) = Q by the uniqueness of f_Q. This is only possible when m(Q) = µ (resp. µ − 1).
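For illustration (with arbitrarily chosen numbers, not taken from the paper): suppose P has order µ = 7. If −Q is not a non-negative multiple of P, then ψ₁₀(Q) = Q ⊕ 3P, since 10 = 1·7 + 3, and Theorem 4 gives quasi-period m(Q) = µ = 7; if instead −Q = 2P, then writing 10 − 2 = 1·6 + 2 gives ψ₁₀(Q) = 3P, and the quasi-period is m(Q) = µ − 1 = 6.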
Theorem 5. Let P have order µ, let ν₀ ≢ 1 (mod µ) be an integer, and let n be the period of the continued fraction expansion of f_{ν₀P}. Then

   n = µ − 1        if ρ_{µ−1}(ν₀P) = 1,
   n = 2(µ − 1)     if ρ_{µ−1}(ν₀P) ≠ 1,

where the second case only occurs if µ is even.

Proof. We can assume without loss of generality that 2 ≤ ν₀ ≤ µ, and let Q = ν₀P. Then the continued fraction expansion of f_Q has pure quasi-period µ − 1. Of course, if ρ_{µ−1}(Q) = 1, then n = µ − 1. If µ is even, then µ − 1 is odd and so the period of f_Q must be either n = µ − 1 or n = 2(µ − 1). We must show that if µ is odd, then ρ_{µ−1}(Q) = 1. Since we are in Case 2,

   ψ_j(Q) = ψ_j((ν₀ − µ)P) = (j + ν₀)P       if 0 ≤ j ≤ µ − ν₀,
   ψ_j(Q) = ψ_j((ν₀ − µ)P) = (j + ν₀ + 1)P   if µ − ν₀ < j ≤ µ − 2.

So, using that µ is odd,

   ρ_{µ−1}(ν₀P) = ∏_{j=0}^{µ−2} λ(ψ_j(ν₀P))^((−1)^(µ−2−j))

      = ∏_{j=0}^{µ−ν₀} λ((j + ν₀)P)^((−1)^(µ−2−j)) · ∏_{j=µ−ν₀+1}^{µ−2} λ((j + ν₀ + 1)P)^((−1)^(µ−2−j))

      = ∏_{i=ν₀}^{µ} λ(iP)^((−1)^(µ−2−i+ν₀)) · ∏_{i=µ+2}^{µ+ν₀−1} λ(iP)^((−1)^(µ−1−i+ν₀))

      = ∏_{i=ν₀}^{µ} λ(iP)^((−1)^(i−ν₀+1)) · ∏_{i=µ+2}^{µ+ν₀−1} λ(iP)^((−1)^(i−ν₀))

      = ∏_{i=2}^{µ−2} λ(iP)^((−1)^(i−ν₀+1)) · λ(−P)^((−1)^(µ−ν₀)) · λ(∞)^((−1)^(µ−ν₀+1))

      = 1,

since λ(iP) = λ((µ − i)P) and λ(−P) = λ(∞).

Corollary 2. The continued fraction expansion of y is periodic. If the order of P is µ and the period of y is n, then

   n = µ − 1        if ρ_{µ−1}(∞) = 1,
   n = 2(µ − 1)     if ρ_{µ−1}(∞) ≠ 1,

where the second case can only occur when µ is even.
Proof. Notice that

   f_∞ + y = v + a + y = x² + y + a² + a + y = x² + a² + a.

So the continued fraction expansion for y differs from that of f_∞ only in the first term. Thus, for all ν ≥ 1, ϕ_ν(y) = ϕ_ν(f_∞). The result now follows from Theorem 5.
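The point-level part of the machinery above is easy to check by machine. The following sketch (an illustration added here, not taken from the paper) implements the standard characteristic-2 group law on a toy non-supersingular curve w² + vw = v³ + Av² + B over GF(2⁴), picks a point P of odd order µ ≥ 5, sets a = v(P) (consistent with the earlier remark that v(Q) = a exactly when Q = ±P), and verifies numerically that the iterated map ψ_ν agrees with the closed form of Proposition 1 (Case 2) and that ρ_{µ−1}(ν₀P) = 1 when µ is odd, as used in Theorem 5 and Corollary 2. The field, the curve coefficients and the search strategy are illustrative assumptions; the continued fraction expansion itself is not computed.

```python
# Illustrative check of the maps psi, lambda, rho over a toy field (assumptions noted above).
MOD, M = 0b10011, 4          # GF(2^4) with reduction polynomial x^4 + x + 1

def gf_mul(a, b):
    """Carry-less multiplication in GF(2^4), reduced modulo x^4 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= MOD
    return r

def gf_inv(a):
    """Inverse by exhaustive search (fine in a 16-element field)."""
    return next(x for x in range(1, 1 << M) if gf_mul(a, x) == 1)

def on_curve(v, w, A, B):
    # E : w^2 + v*w = v^3 + A*v^2 + B, non-supersingular since B != 0
    return gf_mul(w, w) ^ gf_mul(v, w) == gf_mul(gf_mul(v, v), v) ^ gf_mul(A, gf_mul(v, v)) ^ B

def neg(p):
    return None if p is None else (p[0], p[0] ^ p[1])

def add(p, q, A):
    """Standard characteristic-2 addition formulas; None represents the point at infinity."""
    if p is None: return q
    if q is None: return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and y1 != y2: return None       # q = -p
    if p == q:
        if x1 == 0: return None                 # a point with v = 0 is 2-torsion
        lam = x1 ^ gf_mul(y1, gf_inv(x1))
        x3 = gf_mul(lam, lam) ^ lam ^ A
        return (x3, gf_mul(x1, x1) ^ gf_mul(lam ^ 1, x3))
    lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))
    x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
    return (x3, gf_mul(lam, x1 ^ x3) ^ x3 ^ y1)

def order(p, A):
    n, r = 1, p
    while r is not None:
        r, n = add(r, p, A), n + 1
    return n

def find_point():
    """Search the toy curves over GF(2^4) for a point P of odd order at least 5."""
    for A in range(1 << M):
        for B in range(1, 1 << M):
            for v in range(1 << M):
                for w in range(1 << M):
                    if on_curve(v, w, A, B):
                        mu = order((v, w), A)
                        if mu % 2 == 1 and mu >= 5:
                            return A, B, (v, w), mu
    raise RuntimeError("no suitable point found")

A, B, P, mu = find_point()
a = P[0]                                        # a = v(P): v(Q) = a exactly for Q = P or -P

def lam_map(Q):                                 # lambda(Q) as defined after Lemma 3
    return gf_inv(a) if Q is None or Q == neg(P) else gf_inv(Q[0] ^ a)

def psi_map(Q):                                 # psi(Q) = Q + P, with psi(infinity) = 2P
    return add(P, P, A) if Q is None else add(Q, P, A)

def rho_psi(Q, nu):
    """Return (rho_nu(Q), psi_nu(Q)) computed directly from the definitions."""
    r, R = 1, Q
    for j in range(nu):
        f = lam_map(R)
        if (nu - 1 - j) % 2 == 1:               # exponent (-1)^(nu-1-j)
            f = gf_inv(f)
        r, R = gf_mul(r, f), psi_map(R)
    return r, R

mult = [None]                                   # mult[i] = i*P, with mult[0] = infinity
for i in range(mu):
    mult.append(add(mult[-1], P, A))

for nu0 in range(2, mu + 1):                    # Q = nu0*P, so -Q = (mu - nu0)*P (Case 2)
    Q, n0 = mult[nu0 % mu], mu - nu0
    for nu in range(3 * (mu - 1)):              # Proposition 1, Case 2: psi_nu(Q) = (r+1)*P
        r = ((nu - n0 - 1) % (mu - 1)) + 1
        assert rho_psi(Q, nu)[1] == mult[(r + 1) % mu]
    rho, R = rho_psi(Q, mu - 1)                 # Theorem 5: rho_{mu-1}(nu0*P) = 1 for odd mu
    assert R == Q and rho == 1
print("P of order", mu, "found; Proposition 1 and Theorem 5 checks passed")
```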
7   The Discrete Logarithm Problems
This section shows a polynomial-time equivalence between two types of discrete logarithm problems over underlying fields of characteristic 2 on which implementations of Diffie-Hellman [2] and ElGamal [3] type cryptosystems have been based: the elliptic discrete logarithm problem and the infrastructure discrete logarithm problem. The polynomial-time equivalence follows from the next two theorems.

Theorem 1. Let E be a non-supersingular elliptic curve defined over k = F_{2^n} and let P be a point on the curve. Let E_P be the quadratic model for E which defines the quadratic function field K = k(x)(y). If the elliptic discrete logarithm problem for E can be solved in polynomial time, then the infrastructure discrete logarithm problem for E_P can also be solved in polynomial time.

Proof. Let A be a primitive reduced ideal in O. If A = O, then the solution to the infrastructure discrete logarithm problem is δ(A, O) = 0. We will therefore assume that A = [x + x₀, ỹ₀ + y] is the ideal in adapted form for some x₀, ỹ₀ ∈ k. Let y₀ = x₀² + x₀ + a + a² + ỹ₀. Then A = [x + x₀, x₀² + x₀ + a + a² + y₀ + y]. Since A is an ideal,

   x + x₀ | ỹ₀² + (x² + x + a + a²) ỹ₀ + (x³ + a²x + a² + b + a),

and thus (x₀, ỹ₀) is a solution to the equation E_P. Notice that y₀ = ỹ₀ + x₀² + x₀ + a + a² is the conjugate root, so (x₀, y₀) is also a solution to the equation E_P. Let Q = (v₀, w₀) be the corresponding point on E, using the formulae of Section 5. Notice that Q ≠ P, ∞.

Let (x_i, y_i) be the solution to the equation E_P corresponding to iP ∈ E, iP ≠ ∞, P. We can assume that 2 ≤ i < µ, where µ is the order of P. By Theorem 4 we know that µ − 1 is the quasi-period of y. Now

   ϕ_{i−1}(y) = ϕ_{i−1}(f_∞) = ρ_{i−1}(∞) f_{ψ_{i−1}(∞)} = ρ_{i−1}(∞) f_{iP}.

If α₀ = y = (0 + y)/1, then ϕ_{i−1}(y) = α_{i−1} = (P_{i−1} + y)/Q_{i−1}, since ϕ just performs one step in the continued fraction algorithm. Thus,

   (P_{i−1} + y)/Q_{i−1} = ρ_{i−1}(∞) (x_i² + x_i + a + a² + y_i + y)/(x + x_i).
This implies that

   P_{i−1} = x_i² + x_i + a + a² + y_i,
   Q_{i−1} = (1/ρ_{i−1}(∞)) (x + x_i).

So, we get the reduced ideal in adapted form A_i = [x + x_i, x_i² + x_i + a + a² + y_i + y]. Thus, if x_i = x₀ and y_i = y₀ then Q = iP and also A_i = A, and if no such x_i and y_i exist, then δ(A, O) does not exist. Now

   δ(A_i, O) = deg(B) − deg(Q₀) + Σ_{j=1}^{i−2} deg(a_j) = 2 − 0 + (i − 2) = i,

since deg(a_j) = 1 for 1 ≤ j < µ − 1. So if we can solve the elliptic discrete logarithm problem on E (i.e., find i ≥ 2 such that iP = Q or determine that no such i exists), then we can solve the infrastructure discrete logarithm problem. Since v₀ and w₀ can be computed in polynomial time, the infrastructure discrete logarithm problem can be solved in polynomial time if the elliptic discrete logarithm problem can be solved in polynomial time.

Theorem 2. Let E be a non-supersingular elliptic curve defined over k = F_{2^n} and let P be a point on the curve. Let E_P be the quadratic model for E which defines the quadratic function field K = k(x)(y). If the infrastructure discrete logarithm problem for E_P can be solved in polynomial time, then the elliptic discrete logarithm problem for E can also be solved in polynomial time.

Proof. Let Q be a point on E. If Q = ∞ then the solution to the elliptic discrete logarithm problem is 0. If Q = P then the solution is 1. So, we will assume that Q = (v₀, w₀), and that (x₀, y₀) is the corresponding solution to the equation E_P. Now let A = [x + x₀, x₀² + x₀ + a + a² + y₀ + y] be a primitive reduced ideal. As was shown in the proof of Theorem 1, if we can find δ(A, O) or determine that it does not exist, then we have found i such that Q = iP or determined that such an i does not exist. Again, the correspondence between (v₀, w₀) and (x₀, y₀) can be computed in polynomial time, so if the infrastructure discrete logarithm problem can be solved in polynomial time then so can the elliptic discrete logarithm problem.

We have just shown that solving the elliptic discrete logarithm problem on E is polynomial-time equivalent to solving the infrastructure discrete logarithm problem on E_P. Recently, a probabilistic sub-exponential algorithm was developed for solving the infrastructure discrete logarithm problem for function fields
whose genus is at least logarithmic in the order of the underlying odd-characteristic finite field [7]. It seems reasonable that a version of this algorithm should also work in characteristic 2. Since this algorithm is only applicable to function fields of relatively large genus and E_P has genus 1, it does not appear that this attack is feasible. In [8] a Pohlig-Hellman type algorithm [9] is described that solves the infrastructure discrete logarithm problem in

   O((⌈√p⌉ + log₂ R) deg(B))

polynomial operations, where p is the largest prime that divides R. This attack has the same expected running time as the Pohlig-Hellman algorithm in the elliptic curve group. No known method for solving the infrastructure discrete logarithm problem, combined with the correspondence described in this paper, gives an improvement over known methods of solving the elliptic discrete logarithm problem. Since we know of no other way of solving the infrastructure discrete logarithm problem, this may provide further evidence of the intractability of the elliptic discrete logarithm problem.

It is easy to see that the proofs of the above theorems give a bijection between the sets

   {Q ∈ E | Q = iP, 2 ≤ i ≤ µ − 1}

and

   {A ⊂ O, A ≠ O | A can be obtained from the continued fraction expansion of O},

and that µ − 1 equals the quasi-period, m, of the continued fraction expansion of y. Now, since the regulator R = δ_{m+1} = m + 1, we get that R is also the order of P. Thus, computing the order of a point P on E is polynomial-time equivalent to finding the regulator of the function field defined by E_P. Also, producing a curve with a point of a given order is polynomial-time equivalent to producing a function field of the form given by E_P with a given regulator. The problem of finding curves and points with large prime order is of great interest in elliptic curve cryptography. Thus, it would also be of great interest if we could efficiently compute regulators of such fields. In [16] a method is given that determines the regulator of quadratic function fields of odd characteristic in

   O(q^((1/5) deg(D) + ε))

operations, where Y² = D(X) defines the function field. It is unclear if this method generalizes to function fields of even characteristic. Schoof's algorithm [13] for computing the number of points on an elliptic curve requires O(log⁸ q) bit operations, and so at the present time computing regulators is not as efficient as counting points. Stein has empirically observed that in the odd-characteristic case there are certain classes of function fields that tend to have large regulator [15]. At the present time it is unclear whether there is a characteristic 2 analog of these classes.
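The ⌈√p⌉ term quoted above is the cost of a baby-step giant-step search inside the subgroup of prime order p, as in Pohlig and Hellman [9]. The following generic-group sketch (an illustration only; it is not the infrastructure algorithm of [8], and the modulus and exponent below are arbitrary choices) shows where that cost comes from.

```python
from math import isqrt

def bsgs(g, h, p, mul, identity):
    """Baby-step giant-step: find 0 <= x < p with g^x = h in a group of prime
    order p, given only the group operation `mul` and the neutral element.
    Uses about ceil(sqrt(p)) group operations and the same amount of storage."""
    m = isqrt(p - 1) + 1                       # m = ceil(sqrt(p))
    table, cur = {}, identity                  # baby steps: g^j for 0 <= j < m
    for j in range(m):
        table.setdefault(cur, j)
        cur = mul(cur, g)
    g_to_m = cur                               # g^m
    inv_g_to_m, base, e = identity, g_to_m, p - 1
    while e:                                   # (g^m)^(p-1) = g^(-m), since g^p = 1
        if e & 1:
            inv_g_to_m = mul(inv_g_to_m, base)
        base = mul(base, base)
        e >>= 1
    cur = h                                    # giant steps: h * g^(-m*i)
    for i in range(m):
        if cur in table:
            return (i * m + table[cur]) % p
        cur = mul(cur, inv_g_to_m)
    raise ValueError("h is not in the subgroup generated by g")

# toy check: 1019 is prime, 1018 = 2 * 509, and 9 = 3^2 lies in (and generates)
# the subgroup of prime order 509 of (Z/1019Z)*
mod = 1019
def mul(a, b):
    return a * b % mod
g = 9
h = pow(g, 123, mod)
assert bsgs(g, h, 509, mul, 1) == 123
```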
References

1. W.A. Adams and M.J. Razar, Multiples of points on elliptic curves and continued fractions, Proc. London Math. Soc. 41 (1980), pp. 481-498.
2. W. Diffie and M.E. Hellman, New directions in cryptography, IEEE Trans. Inform. Theory 22 (1976), pp. 644-654.
3. T. ElGamal, A public key cryptosystem and a signature scheme based on discrete logarithms, IEEE Trans. Inform. Theory 31 (1985), pp. 469-472.
4. N. Koblitz, Elliptic curve cryptosystems, Math. Comp. 48 (1987), pp. 203-209.
5. A.J. Menezes, Elliptic Curve Public Key Cryptosystems, Kluwer, Boston, 1993.
6. V. Miller, Uses of elliptic curves in cryptography, Advances in Cryptology - CRYPTO '85, Lecture Notes in Computer Science 218 (1986), Springer-Verlag, pp. 417-426.
7. V. Müller, A. Stein and C. Thiel, Computing discrete logarithms in real quadratic congruence function fields of large genus, preprint.
8. V. Müller, S.A. Vanstone and R.J. Zuccherato, Discrete logarithm based cryptosystems in quadratic function fields of characteristic 2, to appear in Designs, Codes and Cryptography.
9. S. Pohlig and M. Hellman, An improved algorithm for computing logarithms over GF(p) and its cryptographic significance, IEEE Trans. Inform. Theory 24 (1978), pp. 918-924.
10. R. Scheidler, Cryptography in real quadratic congruence function fields, Proceedings of Pragocrypt 1996, CTU Publishing House, Prague, Czech Republic (1996).
11. R. Scheidler, J.A. Buchmann and H.C. Williams, A key exchange protocol using real quadratic fields, J. Cryptology 7 (1994), pp. 171-199.
12. R. Scheidler, A. Stein and H.C. Williams, Key-exchange in real quadratic congruence function fields, Des. Codes Cryptogr. 7 (1996), pp. 153-174.
13. R. Schoof, Elliptic curves over finite fields and the computation of square roots mod p, Math. Comp. 44 (1985), pp. 483-494.
14. D. Shanks, The infrastructure of a real quadratic field and its applications, Proc. 1972 Number Theory Conf., Boulder, Colorado, 1972, pp. 217-224.
15. A. Stein, Equivalences between elliptic curves and real quadratic congruence function fields, Proceedings of Pragocrypt 1996, CTU Publishing House, Prague, Czech Republic (1996).
16. A. Stein and H.C. Williams, Baby step-giant step in real quadratic function fields, preprint.
17. B. Weiss and H.G. Zimmer, Artin's Theorie der quadratischen Kongruenzfunktionenkörper und ihre Anwendung auf die Berechnung der Einheiten- und Klassengruppen, Mitt. Math. Ges. Hamburg XII (1991), pp. 261-286.
18. R.J. Zuccherato, The continued fraction algorithm and regulator for quadratic function fields of characteristic 2, Journal of Algebra 190 (1997), pp. 563-587.
19. R.J. Zuccherato, New Applications of Elliptic Curves and Function Fields in Cryptography, Ph.D. Thesis, Department of Combinatorics and Optimization, University of Waterloo, Canada (1997).
Author Index
Bernstein, Daniel J.  128
Bluher, Antonia W.  482
Boneh, Dan  48, 237
Cesari, Giovanni  64
Cohen, Henri  372, 381, 433
De Win, Erik  252
Deshouillers, Jean-Marc  196, 204
Diaz y Diaz, Francisco  372, 381, 433
Djabri, Z.  502
Dummit, David S.  400
Elkies, Noam D.  1
Flajolet, Philippe  226
Galway, William F.  169
Gee, Alice  441
Gordon, Daniel M.  216
Gourdon, Xavier  226
Haible, Bruno  338
Hennecart, François  196
Herrmann, E.  528
Hoffstein, Jeffrey  267
Holden, Joshua  454
Horwitz, Jeremy  237
Huang, Ming-Deh  514
Jacobson, Jr., Michael J.  463
Jones, John W.  412
Landreau, Bernard  196
Louboutin, Stéphane  475
Martinet, Jacques  424
Mihăilescu, Preda  95
Mister, Serge  252
Morain, F.  111
Murphy, Brian  137
Neis, Stefan  299
Nguyen, Phong  151
Niederreiter, Harald  555
Olivier, Michel  372, 381, 433
Panario, Daniel  226
Papanikolaou, Thomas  338
Paulus, Sachar  567, 576
Pethő, A.  528
Pipher, Jill  267
Poorten, Alf van der  358
Preneel, Bart  252
Riele, H.J.J. te  204
Roberts, David P.  412
Rodemich, Gene  216
Saouter, Y.  204
Scheidler, Renate  592
Semaev, I.A.  311
Silverman, Joseph H.  267
Smart, N.P.  502
Smit, Bart de  392
Sorenson, Jonathan P.  179
Stein, Andreas  576, 592, 607
Stevenhagen, Peter  441
Tangedal, Brett A.  400
Teske, Edlyn  351, 541
Vallée, Brigitte  77
Wetzel, Susanne  323
Wiener, Michael  252
Williams, Hugh C.  351, 607
Wong, Yiu-Chung  514
Xing, Chaoping  555
Young, Adam  289
Yung, Moti  289
Zhang, Mingzhi  131
Zimmer, H.G.  528
Zuccherato, Robert J.  621