May 2005
Volume 112, Number 5
Arturo Magidin David McKinnon Florian Deloup Antonio Behn Christopher Kribs-Zaleta Vadim Ponomarenko Paul E. Becker
Gauss's Lemma for Number Fields
385
The Fundamental Group of the Circle Is Trivial The Convergence of Difference Boxes
417
Do Normal Subgroups Have Straight Tails?
440
Avoiding Eigenvalues in Computing Matrix Powers Exact Solutions for Minimax Optimization Problems A Simple Proof of the Gauss-Winckler Inequality Copositive Symmetric Cubic Forms
450
426
NOTES
Raghib Abu-Saris Wajdi Ahmad Francese Comellas J. Luis A. Yebra F. G. Avkhadiev Mowaffaq Hajja PROBLEMS AND SOLUTIONS
454 459 462 467
REVIEWS
Daniel J. Madden
Abstract Algebra. By Ronald Solomon. Integers, Polynomials and Rings.
475
By Ronald S. Irving.
John Sylvester
Introduction to the Mathematics of Medical Imaging. By Charles L. Epstein.
AN OFFICIAL PUBLICATION OF THE MATHEMATICAL ASSOCIATION OF AMERICA
479
Gauss's Lemma for Number Fields Arturo Magidin and David McKinnon
1. INTRODUCTION. This article arose when the following question was asked on the newsgroup sci. math: Question 1.1. Can every polnwlIlial lI'ith integer coefficients be factored into (not necessarily monic) linear terms, each \\'ith algebraic integer coefficients? The answer is yes. and follows from a version of Gauss's lemma applied to number fields. Gauss's lemma plays an important role in the study of unique factorization, and it was a failure of unique factorization that led to the development of the theory of algebraic integers. These developments were the basis of algebraic number theory, and also of much of ring and module theory. We take the opportunity afforded by this problem to discuss some of these historical developments, on the (at times flimsy) excuse of introducing the necessary notions for the proofs. It wi II take us some time to get to the answer to Question 1.1, so we ask for the reader's patience. To make for easier reading, we give names to many of the results. While some of these are standard, others are the inventions of the authors. The paper is organized as follows: First we discuss some of the history of unique factorization. In sections 3 and 4 we discuss how Euler and Lame ran afoul of unique factorization when dealing with Fermat's Last Theorem. Section 5 relates Kummer's own struggle with the failure of unique factorization, and his solution for cyclotomic fields. Section 6 deals with Kronecker's and Dedekind's extensions of Kummer's work, setting up the stage for our treatment of Question 1.1. In section 7 we recall the basic notions associated with number fields that we need, and proceed in section 8 to prove Gauss's lemma and a version of its corollaries for number fields, providing an answer to Question 1.1. In section 9 we discuss some related ring theoretic notions and provide an alternative approach to answering our question. Finally, in section lOwe consider the question in the setting offunction fields (which arc closely related to number fields) and give an example to show that Question 1.1 has a negative answer there.
2. UNIQUE FACTORIZATION AND GAUSS'S LEMMA. Gauss was the first to give a proof of the following fact [9, art. 161:
Theorem 2.1 (Fundamental Theorem of Arithmetic). Ever\' positil'e integer call be factored uniquel\' into a product of'prime /llImbers. The proof proceeds in two steps: Step 1. First, we show that every positive integer can be written as a product of primes in at least one way. This argument relies on a kind of "finiteness" that the positive integers exhibit, namely, that there can be no infinite strictly decreasing sequence of positive integers. If n = I, it equals, by definition, the empty product. Otherwise, it is either a prime. in which case we can write 11 = n. or it is not a prime. in which case we can write it as a product of two strictly smaller positive integers. We can proceed by applying the same argument to each of those factors. or alternatively. we May 20051
GAUSS'S LEMMA FOR NUMBER FIELDS
385
can appeal to an induction hypothesis to assert that each must be a finite product of primes. We conclude that n itself is a product of primes. (In fact, Gauss skips this step in his proof, saying merely that "it is clear from elementary considerations that any composite number can be [factored] into prime factors.") Step 2. Once we know that the number can be written as a product of primes in at least one way, we prove the uniqueness by using the "prime divisor property"[8, Proposition 30]: Proposition 2.2 (Prime Divisor Property). If p is a prime number and p divides the product ab of two integers a and b, then p divides a or P divides b. To prove the uniqueness of the factorization, we suppose that n = PI ... P,. = ql ... q,
are two factorizations of n into primes. From the prime divisor property, it follows that, since PI divides the product ql ... q" it divides at least one of the qi' Because qi is prime, we must have PI = qi' at which point we may cancel them and apply induction. Although the prime divisor property goes back at least to the time of Euclid, unique factorization had not been explicitly formulated (nor proved) until Gauss did so. Gauss points out in the first paragraph of his proof that "it is often wrongly taken for granted" that factorization is unique. In fact, Euclid himself implicitly invokes this fact: in his discussion of Pythagorean triples a key step uses the fact that, if the product of two relatively prime integers is a square, then each of the integers must itself be a square, a result that requires unique factorization. An argument following the same broad outline as the proof of the fundamental theorem of arithmetic is used to prove that the ring of polynomials (/l[x] is also a unique factorization domain (UFD), and in fact that F[x] is a UFD whenever F is a field. First, we replace the notion of "prime" with that of "irreducible": Definition 2.3. Let R be a commutative ring with identity. An element x of R is irreducihle if (I) x is not a unit (i.e., x does not have a multiplicative inverse in R) and (2) whenever x factors in R as x = yz, either y is a unit or z is a unit (i.e., either y or
z has a multiplicative inverse). The finiteness condition necessary to guarantee that every polynomial f(x) in F[x] is a product of irreducibles follows by looking at the degree function, while the fact that every irreducible satisfies the prime divisor property is a corollary of the division algorithm: Theorem 2.4 (Division Algorithm for Polynomials over a Field). Let F he a .field, and let a (x) and h(x) he polynomials with coefficients in F and with b f= O. Then there exist unique polynomials q(x) and rex) in F[x] (the "quotient" and the "remainder," respectively) such that a(x) = b(x)q(x)
+ rex),
where either r = 0 or deg(r) < deg(b).
(This, of course, establishes that F[x] is a Euclidean domain, a condition that is well known to imply unique factorization.) 386
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
In [9] Gauss is mostly concerned with integers, however, and if we restrict ourselves to polynomials with integer coefficients, the division algorithm no longer holds. For example, if a (X) = 3x + 2 and h(x) = 2x + 3, we cannot find q (x) and r (x) in Z[x] satisfying the conditions stipulated by the division algorithm. So unique factorization in Z[x] is harder to establish. The degree function, together with the fundamental theorem of arithmetic, shows that the first step in the proof of unique factorization can be achieved: every polynomial in Z[x] can be written as a product of irreducibles, where an irreducible is either a prime integer or a nonconstant irreducible polynomial f (x) in Z[x] (i.e., f (x) cannot be written as the product of two nonconstant polynomials) whose coefficients have no common factor other than I and -I (this condition is known as primitivity, and we will have much more to say about it). It is the prime divisor property that seems a bit more difficult. That is where Gauss's lemma comes in. What we would like to do is make use of the already established facts that both Z and Q[x] are unique factorization domains. First, we want to show that the primes from Z remain irreducible in Z[x], which is of course trivial. It is also clear that if f (x) in Z[x] is irreducible when we consider it in Q[x], then it will be the product of an integer times a polynomial that is irreducible in Z[x] (since all nonzero constants have multiplicative inverses in Q[x], they do not affect irreducibility; this is not the case in Z[x], where the only constants with multiplicative inverses are I and -I). If we separate out the integer and factor it into primes, the result will be a factorization into irreducibles in Z[x] for f(x). The immediate difficulty, however, lies in the converse question: Is it possible that there is a polynomial f (x) in Z[x J that can be factored in Q[x], but not in Z[x]? And the second issue, which we can perhaps already see approaching us, is whether the integer primes are actually primes in Z[x]: if p I g(x)h(x), does it follow that p I g(x) or p I hex) in Z[x]? As it happens, both difficulties can be dealt with at the same time. We deal with the latter first by establishing the contrapositive, which is nothing other than Gauss's lemma [9, art. 42]. We then prove unique factorization by first factoring out any constants that we can, using Q[x] as a "stepping stone" to factor the polynomial part, and then "lifting" this factorization back to Z[ x].
Definition 2.5. Let f(x) = allx ll + ... + alx + ao be a nonzero memberofZ[x]. The content cont(j) of f is the greatest common divisor of ao ..... all' Definition 2.6. A polynomial f (x) in Z[x 1 is primitive if cont(j)
= I.
Theorem 2.7 (Gauss's Lemma). The product of primitive polynomials is itselfprimitive. Proof Let f(x) and g(x) be primitive polynomials in Z[xl. and let hex) = f(X)R(X).
Write
g(x) hex)
ll
+ ... + alx + ao. = b/llx/ll + ... + blx + bo. = c +/IIX + m + ... + CIX + Co.
f(x) = allx
Il
lI
To prove that hex) is primitive, consider an arbitrary prime p. We must show that p does not divide all the Ci. May 2005]
GAUSS'S LEMMA FOR NUMBER FIELDS
387
Let io be the smallest index such that p t ([i o ; such an index exists since f(x) is primitive. Likewise, let jo be the smallest index such that p t bjo' We consider cio+io' We have: Cio+io = Uio+iobO
+ aio+io-Ibl + ... + aio+lb jo - I
+ uiob jo + Gio-lbio+1 + ... + Glbio+jo-I + Ciobio+jo' Each summand in the flrst expression is a multiple of p by the choice of jo, while each summand in the third expression is a multiple of p by the choice of i o. Furthermore, Uiobjll is not a mUltiple of p, ensuring that p cannot divide cio+io' • The argument given also shows that if p is a prime in Z and divides the product, then it must divide one of the two factors, showing that integer primes are sti II primes in Z[xJ. Two more auxiliary results demonstrate that Z[xl is a UFO. They establish the connection between Q[x 1 and Z[x 1 that allows us to "lift" factorizations from the former to the latter. The first uses, implicitly, unique factorization in Z in order to express an element of Q as the quotient of two relatively prime integers. The second uses Gauss's lemma explicitly.
Lemma 2.8 (Factoring Out the Content). ff" f (x) is G nonzero polynomial in Q[x], then there exist ci in Q und a primitive polynomial f* (x) in Z[x 1 sueh that f (x) = elf*(x). Moreol'('I; CI and f*(x) are unique lip to sign. Proot Write
f .(x)
=
((/11) x I + ... + (ao) - , bll
bo
where ai and hi are relatively prime integers for i b o ... /)11 to clear denominators, we obtain: (b o ' .. hll)f(x) = allx"
Let g(x) = allx"
+ ... + ao, and let C =
= O.
I, .... n. Multiplying by
+ ... + ao·
cont(g). Then g(x) can be written as
g(x) = cg*(x). where g*(x) is a primitive polynomial. Therefore. (17 0
, ..
bll)f(x) = cg*(x).
Letting ci = e/(h o '" bll) and f*(x) = g*(x) proves the existence assertion. Uniqueness will follow provided we can prove the following: if g*(x) = c . h*(x), where g*(x) and h*(x) in Z[x1 are both primitive and c is rational, then c = ±1. Write C = lI/v, with relatively prime integers u and v. Then vg*(x) = uh*(x). Each coefflcient of IIh*(x) must be a multiple of v. As gcd(u. v) = I and h*(.r) is primitive, • this forces v = ± I. Similarly, u = ± I, so c = ± I, as desired.
Theorem 2.9 (Lifting the Factorization). Let f (x) belong to Z[x]. (Ind suppose that f (x) admits the factorization f (x) = G (x) H (x) in Qlx], Then there exist polynomials g(x) and hex) in Z[xJ such that g(x) is a rational multiple of" G(x). hex) is a rationall11l1ltiple olH(x), and f(x) = g(x)h(x). 388
©
THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
Proof: We can factor out the content to write: I(x)
= C(r(x),
G(x) =
c~g;(x),
Then we have ('I
F(x)
=
(c~ch)g;(x)h8(x).
By Gauss's lemma, the product g*(x)h"(x) is primitive. The uniqueness clause in Lemma 2.8 thus gives c I = C~Ch or CI = -C~Ch. In either case, C~Ch is an integer, so we let g(x) = (C~Ch)g*(X) and h(x) = h*(x) to complete the proof. •
Corollary 2.10. The ring Zlx] is (/ UFO The irreducihles in Z[x1 are the integer primes unci the primitive pO/Yllomiuls that (ire irreducihle when considered as polynomials il1 Q[ x 1. It is not hard to see that the same sequence of results holds if we replace Z with an arbitrary unique factorization domain Rand Q with the field of fractions of R. We must, however, modify the lemma on factoring out the content (Lemma 2.8) so that uniqueness means "up to units of R" rather than "up to sign." Since any witness to the fact that a ring R is not a UFO will confirm that R[x J is not either, we obtain: Theorem 2.11. Let R he (/ ring. Then R is ({ U FD
if and Olll\' if R Ix]
is aUFD.
Gauss employs his lemma in the study of cyclotomic polynomials (see, for example, [9, art. 341]), among other places. The study of cyclotomic polynomials is used in turn
to prove Gauss's celebrated result that the construction of a regular N -gon using only compass and straightedge can be achieved if and only if the odd prime factors of N are all distinct Fermat primes, meaning primes of the form 22'" + I.
3. UNIQUE FACTORIZATION AND EULER'S PROOF OF FERMAT'S LAST THEOREM FOR 11 = 3. Unique factorization had already made a covert appearance in the study of quadratic forms by Fermat, though it was hard to recognize it in that guise. We refer the reader to John Stillwell's introduction in [5]. Gauss was very interested in quadratic forms and dedicates section 5 of the Disquisitiones Arithmeticae 19J to their study. In the English edition, it comprises over 260 pages, compared with only 107 for sections I through 4. It is likely that it was through the study of quadratic forms and their connection to unique factorization that Gauss recognized the unstated assumption of unique factorization of integers, and so was led to state explicitly the fundamental theorem of arithmetic. Perhaps the first overt appearance occurs in the work of Euler. who ran afoul of unique factorization in his attempt to prove Fermat's Last Theorem for n = 3. The idea behind Euler's attempt was similar to Fermat's own proof for the case n = 4: to establish an infinite descent by proving that the existence of a nontrivial solution to x3 + = ;::" with x, y, and;:: positive, pairwise coprime integers, necessarily leads to the existence of another solution X'l + \,'3 = ;::'.1 with smaller integers (i.e" ;::' < z), Since there can be no infinite descending sequence of positive integers, no solution could exist in the first place, Note that the finiteness property of the positive integers we discussed earlier is at play here again, For 11 = 3, Euler starts with x' + y3 = ;::', with gcd(x, y, ;::) = I, If both x and yare odd, then x + y and x - yare both even, say 2p and 2q, respectively, so x = p + q,
,,3
May 2005J
GAUSS'S LEMMA FOR NUMBER FIELDS
389
y = p - q, and
Since x and yare odd and coprime, p and q must be of opposite parities and coprime. And as x 3 + y3 = z3, 2p(p2 + 3q2) must be a cube. A similar argument yields the same conclusion if z is odd and one of x or y is even. Next, we want to show that both 2p and p2 + 3q2 are cubes. If 3 t p, this follows easily by noting that 2p and p2 + 3q2 are relatively prime and appealing to unique factorization; if 3 I p, then we must write p = 3s and then rewrite
from which we infer that 32 . 2s and 3s 2 + q2 are relatively prime. Each must therefore be a cube (see [7, sec. 2.2] for the details). Euler noted that one way in which both 2p and p2 + 3q2 can be cubes when p is not a multiple of 3 is for p and q to have the forms
17 = a(a - 3b)(a
+ 3b),
q = 3b(a - b)(a
+ b)
(3. I)
(a similar expression is found for sand q when p is a multiple of 3). If this is indeed the case, then a and b must be relatively prime, because p and q are relatively prime, and must have opposite parities. From here one shows that 2a, a - 3h, and a + 3b must be pairwise coprime. Since 2p = 2a(a - 3b)(a + 3b) is a cube, each of 2a, a - 3b, and a + 3b must be a cube. Then (a - 3b) + (a + 3b) = 2a gives a new solution to the Fermat equation, one with 2a < Z3, setting up the infinite descent and thus proving the result. A similar argument is used when 3 I p. At this point, of course, we need to show that the only way for both 217 and p2 + 3q2 to be cubes if 3 p is for p and q to be expressible as in (3.1). It is here that the argument presented by Euler fails (he did, however, have at his disposal other results on quadratic forms that he could have used to establish this claim about p and q). Euler factors p2 + 3q2 = (p + qH)(p - qH) and proceeds to work in ZrHl. Since
t
(p +qH)
+ (p - q H ) =
2p,
(p +qH) - (17 - q H ) = 2qH, any common divisor of p + q H and p - q H would be a divisor of 2p and of 2qH. One can show that both 2 and H are irreducible in Z[Hl (see the argument to follow). From the fact that p and q have opposite parities it follows that 2 does not divide 17 + q H, and from the fact that 3 t p one deduces that H does not divide p + q H either. Accordingly, any common divisor of p + q Hand p - q H must in fact be a common divisor of both p and q, which are relatively prime. Hence p + q H and p - q H have no common divisors in Z[Hl other than I and - 1. Since their product is a cube, Euler concludes that each must be a cube, so in fact we have:
p+qH=(a+bH)l. p - q H = (a - hH)3 390
©
THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
for some integers a and h. It now follows that
~
')
= (a - 9ulr)
+ (3cz-/) '1
1;--::;
3h)V' -3,
from which the desired equations (3.1) follow. The problem, of course, is that hidden in that argument is an assumption of unique factorization: we know that p - q H and p + q H have no common divisors in Z[ H l (other than I and -I) and that their product is a cube. If we have unique factorization into irreducibles in Z[ Hl. then we are able to conclude that each factor must itself be a cube. But in fact we do not have uniqueness:
and each of the numbers 2, 1 + H, and I - H is irreducible in Z[ Hl. as we demonstrate shortly. Thus, Euler's argument breaks down. Euler must have realized that there was something wrong with the argument [7, sec. 2.3], for in a later portion of his Algehra he noted that his methods would indicate that 2X2 - 5 cannot be a cube, even though 2(4)2 - 5 = 27 = 33 . Euler attributed the difficulty to the minus sign in the equation, however, and so apparently did not feel that his argument for the n = 3 case of Fermat's Last Theorem was in danger. This is probably a good place to establish the asserted irreducibility of 2, I + H, and I - H in Z[ Hl. A central role in the proof is played by a "norm" function defined from Z[ H l to Z. This function allows us to translate certain divisibility questions concerning Z[ H l to questions in the more familiar setting of Z. Each element a of Z[ H l can be written uniquely as a = a + hH with a and h in Z. We define a florm N: Z[ H 1-----* Z as follows:
that is, N(a + hH) is the product of all images of a + hH under the distinct embeddings of Q( H) (the field of fractions of HD into cc.
Zr
Lemma 3.1 (Properties of the Norm). The norm N satisfies: (1) N(af3) = N(a)N(f3)forall a and f3 in Z[H]'
I f3 in Z[Hl. then N(a) I N(f3) in Z. The element a of Z[ H l is a unit (i.e.. has a multiplicative inverse Z[Hl) ifand only if N(a) = 1.
(2) Ifa (3)
III
Proof Statements (2) and (3) follow directly from (I), which can be established through direct computation. • The two most important points to observe right now are these: first, the properties of the norm provide the "finiteness" needed to accomplish at least the first step in a proof of unique factorization; namely, every element of Z[ H l can be expressed as a product of irreducible elements. This is true because any proper divisor in Z[ H l of an element a of Z[ H l will necessarily have a norm that is a proper divisor of N (a) in Z, ensuring that no infinite descent is possible. And, second, that property (2) gives a way to study divisibility in Z[ H l by referring to divisibility in Z. Since the imMay 20051
GAUSS'S LEMMA FOR NUMBER FIELDS
391
plication is not reversible, it is not a perfect translation, but even so it is extremely useful. Consider the number ;=3. Since N ( R ) = 3, it follows from the properties of the norm that ;=3 is irreducible. Moreover, because N(2)
=
N(l
+;=3)
= N(l -
vC3) =
4,
each of 2, I + ;=3, and I - R must also be irreducible (a proper divisor of any of the three would have norm 2, but 2 = {/2 + 3b 2 has no solution with (/ and b integers). As the only units of Z[;=31 are I and -I, we see that even though 2 . 2 = (I + ;=3)( I - ;=3), the two factorizations are not related by multiplications by units. In other words, unique factorization does indeed fail in Zl vC31. This is what Euler failed to take into account.
4. LAME AND A GENERAL PROOF OF FERMAT'S LAST THEOREM. In 1847, G. Lamc announced to the Paris Academy that he had found a proof of Fermat's Last Theorem (our account is taken from [7, chap. 4]). His brief sketch consisted in factoring the equation x" + .v" = ;;:" as ;:" = x"
+.v"
= (x
+
y)(x
+ S",v)··· (x +
(,,-1.\"),
where ( is a primitive 11th root of unity, say
( = cos(2nln) + i
sin(2nln).
Lamc's idea was to obtain a contradiction by proving that each of the factors x + (i Y would necessarily be an 11th power. He would prove this by showing either that no two of the factors had common divisors in Z[ (] other than units or that there was a factor 111 common to all 11 factors sLlch that (x + v) / 111. (x + (, y) / /J1, and so forth, had the property that no two had common divisors other than units, and then use a similar argument. A sketch of this argument for a prime exponent p such that p f xvz can be found in [16, chap. L Exercises 16-28]. Lame made a point of mentioning that he had come up with the idea after a casual conversation with Liouville a few months earlier. Liouville, for his part. took the floor immediately after Lame and cast doubt on the viability of the latter's program. He quickly pointed out the gap in the argument: to conclude that each factor was an 11th power from the fact that there were no common divisors (other than units) of any two, one needed a property analogous to unique factorization for the elements of Z[( J, and this was by no means a given. Lame acknowledged the gap, and in the following months attempted to fill it. It was not thought a hopeless task: two small cases had already been treated. The case 11 = 4 had been studied by Gauss, who proved in 1831 that Z[ i I is a UFO. Gauss had been led to study this ring through his interest in biquadratic reciprocity. The case p = 3 had been studied by Eisenstein during his analysis of cubic reciprocity, and he had demonstrated in 1844 that Z[ (-I + R) 12J is also a UFO. However, although neither Liouville nor Lame were aware of it, Kummer had three years earlier published a memoir [14] in which he had shown that the domains Z[( I did not always enjoy unique factorization. Kummer communicated this to Liouville, who passed the news on to the Academy. Lame then abandoned his attack on Fermat's Last Theorem, defeated by the failure of unique factorization. Kummer's study of the rings ZI (, I had also been fueled by a desire to obtain higher reciprocity laws, though 392
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
it is often incorrectly attributed to an interest in proving Fermat's Last Theorem (his results could be used to present a variant of Lame's argument; see the next section).
5. KUMMER AND IDEAL NUMBERS. Starting in 1837. Kummer had begun to study the arithmetic of certain cyclotomic fields, extensions of Q obtained by adjoining primitive 11th roots of unity. Kummer studied divisibility in rings Z[ (,,], where (" is a primitive pth root of unity for a prime p. He quickly found that these rings were not in general UFDs. It was only after several years of effort that he discovered a way to circumvent the difficulties, with the introduction of "ideal numbers." We will borrow from Dedekind's exposition of ideal numbers in [51. Dedekind explains the situation skillfully, but rather than consider the case of a cyclotomic field, he considers the similar situation in the much simpler ring Zl ;=5l. This ring has a norm analogous to the one we introduced earlier for Zl Hl. Here we define N by
the product of all images of a + hJ=S under the different embeddings of Q(;=5) (the field of fractions of Zl ;=5]) into C It is now easy to verify that this map also satisfies the properties in Lemma 3.1. so it can be used to establish the fact that every element in Z[ J=Sl can be written as a product of irreducibles. When trying to proceed to uniqueness. however. we again run into a problem: not every irreducible has the prime divisor property. Consider. for example. the two factorizations of 6:
6 = 2 . 3 = (I
+
vC5)( I - vC5).
Since N (2) = 4. any proper divisor of 2 would necessarily have norm equal to 2. but a C + Sb c = 2 has no solution for a and h integers, hence no element can have norm 2. Thus 2 is irreducible. as are I + H and I - H. both of norm 6. Since a C + Sb c = 3 also has no integer solution and N(3) = 9, we also see that 3 is irreducible in Z[ HI. Furthenn;re. the two factorizations are patently distinct. and we cannot pass from one to the other by multiplication by suitable units (the only units of Z[ J=Sl again being I and -I). Thus. ZI J=Sl is not a UFD. However, Kummer realized that one can tell a lot about the prime factorization of an integer without actually having to factor it into primes: how it behaves with respect to divisibility can help tell the complete story. For example. we already know that we can tell that an integer p greater than I is a prime by noting that it has the prime divisor property. For a slightly more complex result. consider the following two conditions: (i) If 11 divides a Ch 2 , then either
/I
divides a 2 or
(ii) There exists an integer III such that
II
11
divides b 2 .
does not divide
111.
but
II
divides
1112.
An integer II larger than I that satisfies condition (i) must be either a prime or the square of a prime. An integer that satisfies condition (ii) must be divisible by the square of a prime. If we could show that an integer 11 satisfies hoth (i) and (ii). then it would follow immediately that II is the square of a prime. Consider the number 2 in Zl Hl. Clearly. 2 divides a number (l + if and only if both a and h are even integers. The square of a + bJ=S is given by
bH
(a
+ b\! -5)-
~,
= (cr" - Sir)
+ (2ab)\! -5. ~
a" -
Thus, 2 divides (0 + hH)C if and only if Shc is even. This occurs exactly when a and b have the same parity. If a and 17 are both odd. then 2 divides (a + hH)2, May 2005]
GAUSS'S LEMMA FOR NUMBER FIELDS
393
despite the fact that it does not divide a
+ bH. For example, 2 divides
yet it does not divide I + H. SO 2 satisfies condition (ii). Note next that if 2 divides the product of two squares, say (a (c + dH)2, then it must divide (
0 0 0 (a-0 - Slr)(c- Set-) - 20ahcd )
+2
(
0 ab(c-0 - Sd-)
+ cd(a0
+ hH)2
and
1) ~ - SIr) v' -5,
whence (a 2 - Sh2)(C 2 - Sd 2) must be even. This implies that at least one of the factors is even, so 2 divides at least one of (a + bH)2 and (c + d H)2. Hence 2 satisfies conditions (i) and (ii), and by all rights should be called the "square of a prime." In fact, one can show that with regards to all divisibility properties in Z[ Hl. the number 2 behaves as if it were the square of a prime. But of course, we know there is no such prime in Z[ Hl. So we introduce an "ideal prime number" ex with the property that ex 2 = 2. Here we are using ideal in a sense similar to that found in the dictionary: "existing as a mental image or in fane)' or imagination ollly" (Wehster's Ninth New College Dictionarv). We say that a number a + h H is divisible by the ideal prime ex if and only if its square is divisible by the number 2. More generally, ex' is the highest power of ex that divides a + b H if and only if 2' is the highest power of 2 that divides (a + bH)2. A similar process leads to the conclusion that the irreducible 3 behaves in all respects as the product of two distinct ideal prime numbers, call them j3 and y, and that and I - H, One way to establish this is to show that the same is true of I + H each of these numbers can play the role of 11 in the following three conditions: (iii) There exist x and y such that n divides neither x nor y, but divides their product. (iv) If 1/ divides x 2 , then
11
divides x.
(v) If n divides the product
xy~,
then n divides at least one of -,y, xz, or -"Z.
The first two conditions are easy to establish. The last is a bit more difficult and lengthy, so we will not prove any of them here, We now know that in Z[ Hl the number 2 behaves like the square of an ideal prime ex, and that each of 3, I + H, and I - H behaves like the product of two distinct ideal primes. If we are to have some kind of unique factorization in Zl then in light of the fact that
HJ,
2·3 = (I
+ 0)(1
-
H),
it must be the case that ex is one of the prime factors of each of I + H and I - H and that the other factors are the prime factors of 3. That is, we must have 2 = ex 2 , 3 = j3y, 1+ = exj3, and 1= exy for distinct ideal primes ex, j3, and y. Luckily, as far as divisibility in Z[ Hl is concerned, these identifications do work out perfectly, Of course, the "ideal prime numbers" ex, j3, and y do not actually exist in Zl Hl, which is why we call them ideal primes, after all. But, in studying divisibility, we can in fact proceed as if they did exist, as if we had unique factorization into primes, whether actual-numbers in Z[ that have the prime divisor property (for example, II or l3)-or ideal (such as ex).
H
H
HJ
394
©
THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
We can think of "ideal prime numbers" as analogous to quarks in the study of matter. Elementary particles (the irreducible numbers) are made up of quarks that combine in different ways. We never observe isolated quarks, we only observe them in combinations making up certain particles. Likewise, we never observe these ideal prime numbers as occurring independently, we only "see" them when they combine with one another to form actual numbers in the domain. Kummer does not in fact define what "ideal prime numbers" are but instead always speaks of them indirectly, in terms of the divisibility properties of actual numbers. It takes a fair amount of work to make sure that everything does work out properly (and a certain amount of luck: if we attempted to proceed analogously in Z[ HL for instance, we would soon encounter unsurmountable difficulties and fail completely). For the cyclotomic rings Z[SI'] where £'1' is a primitive pth root of unity for a prime p, Kummer proved [15]:
Theorem 5.1 (Unique Factorization into Ideal Primes). Even element of Z[£,I'] factors uniquelY as a product
(~l (ideal alld
actual) primes.
The finiteness condition can be established once again through a norm, which maps each element of Zlsl} 1to the product of all its images under the different embeddings of iQl( £'1') (the field of fractions of Z[ £'1' j) into C Enough of unique factorization is then recaptured to proceed with the arithmetic almost as usuaL Among other things, Kummer used his ideal prime numbers to establish a variant of Lame's argument for Fermat's Last Theorem. The argument does run, however, into certain subtle technical difficulties and does not work for an arbitrary prime. Kummer listed a set of two conditions that p would have to satisfy to be able to deduce the n = p case of Fermat's Last Theorem using this approach. He even informed the Berlin Academy that he "had reason to believe" that p = 37 did not satisfy them (it does not; see [7]). On the other hand, Kummer considered his proof of Fermat's Last Theorem for the so-called regular primes a by-product of his research into higher reciprocity laws, not the main interest of his development.
6. KRONECKER, DEDEKIND, AND ALGEBRAIC INTEGERS. Kummer's approach had a number of drawbacks, not the least of which was the imprecise nature of "ideal numbers." It was also hard to see how to extend the notion to extensions of iQl other than cyclotomic fields. Although the ideas worked very well in Z[ Nl and some rings of the form Z[8], they failed completely when applied to others, like
Z[Hl The first difficulty lies in finding the correct notion of number or integer with which to work. In general simply taking Z[8] as the set of "integers" for the field iQl(8) will not do. In some cases, a small adjustment is all that is needed (if one works with Z[(l + H)/2] instead of Z[ H]' all difficulties disappear), but in others there is no apparent way of avoiding problems. The right definition was eventually given by Dedekind (it had also been discovered independently by Kronecker): the correct generalization of integer is that of "algebraic integer." The concept had already appeared in the work of Eisenstein and others, although it had received no special attention. An algebraic number is a complex number Q' that satisfies some monic polynomial with rational coefficients. In contrast, an algebraic integer is a complex number Q' that satisfies a monic polynomial with integer coefficients. Every integer is an algebraic integer, and in fact the only rational numbers that are algebraic integers are the integers themselves. This follows, for example, using the May 2005J
GAUSS'S LEMMA FOR NUMBER FIELDS
395
rational root test from precalculus. It is a bit harder, but not much, to show that the product and sum of any two algebraic integers is again an algebraic integer, and that the roots of any monic polynomial with algebraic integer coefficients are again algebraic integers. In general, given a domain D and a subring R of D, we say that an element of Dis integral over R if it satisfies a monic polynomial with coefficients in D. The algebraic integers are the elements of ([ that are integral over 21:, and if a complex number is integral over the ring of algebraic integers, then it is an algebraic integer as well. At this point, it would seem that algebraic integers are too much trouble to be of use. If A signifies the collection of all algebraic integers, then we do not even have factorization into irreducibles. For if 0' belongs to A and 0' is not a unit there, then fo is also an clement of A and not a unit, so 0' = fo.ja is not irreducible. In particular, no element of A is irreducible! What is more, we can find many more factorizations of 0': for instance, 0' = PIP2, where PI and P2 are the roots of x 2 - x + 0'. The very first step towards a proof of unique factorization, which we managed for other rings, breaks down in A. In order to surmount this difficulty, we restrict ourselves to subrings of A where we do have a natural finiteness condition. This is accomplished by first specifying a .finite extension K of Ql (called a number .field) and then considering its ring (~r integers OK = K n A, the collection of all algebraic integers that are in K. This has the virtue of also providing the correct subring of an arbitrary extension Ql(8) in which to continue the pursuit of unique factorization. The reason 21:[;=3J gives so much trouble is that the collection of all algebraic integers in Ql(;=3) is not 21:[ ;=3L but 21:[( 1+ ;=3)/2]: (I + ;=3)/2 satisfies the monic polynomial x 2 - x + I. Another difficulty lies in trying to establish unique factorization into ideal numbers. Kummer's method relied very strongly on properties ofQl(~fI) that are not shared by arbitrary extensions Ql(H). In particular, the ring of integers of Ql(~I») admits a basis over 21: consisting of powers of the same element (namely, ~I»)' so it is of the form 21:1~,) J. But some number fields K do not admit such bases: for example, if K = Q(,J7 + ,flO), then OK f Z[HI for any H in OK (see [16, chap. 2, Exercise 30]). Several attempts had been made to generalize Kummer's arguments. In 1865, for instance, Selling, a student of Dedekind, produced an argument that in fact ended up in nonsense (it can be made rigorous by using q-adic numbers, but these numbers would not be introduced by Hensel until 1897). Dedekind had attempted a different generalization in 1857 (later redeveloped independently by Zolotareff in 1880), but both Dedekind and Kronecker were stymied by the difficulties presented by any field whose ring of integers was not of the form 21:[81. (For a more detailed explanation of the difficulties, see l3. chap. 71.) In order to avoid these difTiculties, it was necessary to give some substance to Kummer's ideal numbers, to have something tangible to work with rather than these shadowy constructs that were never explicitly defined. This was accomplished independently by Kronecker and Dedekind using two very different techniques.
Kronecker and forms. Kronecker was a student and colleague of Kummer. His approach generalized Gauss's theory of forms, the aforementioned subject of section 5 of the Disquisitiones. Aj(mn over a number field K is a homogeneous polynomial in arbitrarily many variables whose coefficients are algebraic integers in K (Gauss had considered binary quadratic forms with integer coefficients). Suitably chosen forms play the role of the ideal prime numbers and ideal numbers of Kummer. In modern terms, we adjoin the ideal prime numbers to our field by first adjoining indeterminates and then taking the quotient by the corresponding (ideal of) relations, in order to cre396
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
ate a bigger field K', If we concentrate only on the algebraic integers of K, then each can be written uniquely as a product of prime elements of () K and certain elements of () K'. The former correspond to actual primes. while the latter play the role of the ideal primes. Of course. there are now new numbers in OK' that may not be expressible uniquely as products of irreducibles of () K', but we do not worry about them. As long as we only care about the algebraic integers of K. we can proceed as Kummer does. Kronecker's methods survive as major tools in algebraic geometry. but they have had a lesser impact in number theory. He was slow to publish. and he did not have Dedekind's gift for exposition. Although Dedekind's work was mostly ignored when it first appeared. his "theory of ideals" would take center stage in Hilbert's landmark Zahlbericht [Ill and form the basis of algebraic number theory. Even today, over a hundred years after its appearance. Dedekind's exposition in [5] could still be used as a suitable introduction to the subject (in itself a great testament to its impact and clarity). Nonetheless, it should be mentioned that a complete correspondence can be established between Kronecker's approach and Dedekind's. as Hilbert does in the Zahlbericht. so the two methods are, in essence. equivalent.
Dedekind and ideals. Dedekind understood the dangers of the somewhat shadowy approach of Kummer. As he writes [5. pp. 571: While this introduction of new numbers is entirely legitimate, it is nevertheless to he feared at first that the language which speaks of ideal numhers being determined by their products. presumahly in analogy with the theory of rational numbers, may lead to hasty conclusions and incomplete proofs. And in fact this danger is not always completely avoided. On the other hand. a precise definition covering (/1/ the ideal numbers that may be introduced in a particular numerical domain 1(') 1\)' and at the same time a general definition of their multiplication. seems all the more necessary since the ideal numhers do not actually exist in the numerical domain [(') 1\ I·
Dedekind preferred, if at all possible. to make this definition solely in terms of objects he already had in hand. Moreover. he wanted a definition that would "create" all these new objects simultaneously, rather than through a recursive process, and that would allow for ease of calculation. His classical construction of the reals as Dedekind cuts illustrates this general philosophy: assuming we understand the rationals and all their arithmetic properties. we define the reals as sets of rational numbers satisfying certain properties and define their operations in terms of operations of the rational numbers. What is more. his construction of the reals also illustrates another very interesting idea, namely, identifying an object with a set that somehow characterizes it. For the real numbers, constructed with a view towards their order. a real number a is uniquely determined by the collection of all rational numbers less than or equal to a, so it can be identified with such a set. Dedekind accomplished a similar program for Kummer's ideal numbers, through the introduction of ideals. Dedekind defines ideals to be nonempty subsets J of the ring of integers OK satisfying two conditions: (I) If a and b are in J. then both ([ + band ([ - b are also in J. (2) If a is in J and r is in OK. then r([ is in J. Dedekind named such collections "ideals" because they would play the role of Kummer's ideal numbers. as we will see shortly. The motivation for this definition is the observation that the collection of all multiples of a given number (including alll11ultiples of a given ideal number in the cycloMay 20051
GAUSS'S LEMMA FOR NUMBER FIELDS
397
tomic case) satisfies these two conditions. In the case of the integers, we can identify a number with the collection of all its multiples, and since we are dealing with a principal ideal domain, every collection that satisfies these conditions corresponds to an integer. (To be more precise, the collections correspond to an equivalence class of integers, where a '" b if a and b differ by a unit, but we are interested in divisibility; multiplication by units becomes irrelevant.) However, in the case of rings of integers that are not UFDs, there are collections satisfying these two conditions that do not correspond to actual numbers. In the case of Z[ J="51, to consider a now familiar situation, the collection of all multiples of the ideal prime a. where a 2 = 2, satisfies the conditions. We simply identify each such collection with a number, ideal or actual, since it will correspond to "all multiples" of that number. One defines an addition and a multiplication of ideals: the sum / + f of two ideals / and f is the ideal
/+f
=
Ii + j liE /.
j E f}.
while the product /J is somewhat more complicated, defined as follows:
that is, all (finite) sums of products of an element of / by an element of f. It would have been nice to define /J as the set of all products i j, but unfortunately this is not an ideal, so we have to include all finite sums of such products as well. Given an element a of R, the principal ideal generated by a, which is denoted (a), is defined by (a) = ira IrE R}.
(i.e., it consists of all multiples of a). It is then easy to verify that ([ divides h in R if and only if (b) is contained in (a) as sets. We generalize this observation and say that the ideal/divides the ideal f if f is contained in /. In the case of Z, the division algorithm shows that every ideal is principal and that (a) + (b) is none other than the ideal generated by the greatest common divisor of (a) and (b), as can be expected from the fact that the ideal (a) + (b) is the smallest ideal that contains both (a) and (b) (hence, is generated by the largest number dividing both a and b). To mirror all the divisibility properties of Z in its ideals, we say that an ideal P is a prime ideal if whenever P contains a product /J of two ideals either P contains / or P contains f (this is equivalent to the usual definition: P is a prime ideal if and only if whenever a product xy lies in P at least one of x or y must lie in P). It is now a nice exercise to verify that an ideal (p) in Z is a prime ideal if and only if p is a prime number. In short, all the arithmetic theory of Z may be restated in terms of ideals, ideal multiplication, ideal division, and prime ideals. We can then use the ideals of OK to play the role of the ideal numbers; rather than defining divisibility by a in terms of divisibility by 2, we define divisibility by a in terms of belonging to the ideal we have identified with a. In general, principal ideals correspond to actual numbers (up to units)-namely. their generators-while ideals that are not principal correspond to Kummer's ideal numbers, with no actual counterpart. The main difficulty in this approach, as Dedekind candidly admits, is showing that the notions of divisibility and multiplication of ideals are connected as we hope. That is, it is easy to verify that if / and f are ideals of 0 K. then 1 and f both divide /J, in the sense that they both contain it. More difficult, however, is showing 398
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
that if I divides the ideal I' (i,e" if I contains /'), then there exists a unique ideal 1 such thaUl = 1'. This Dedekind succeeded in doing, though only after considerable effort. Once that was done. the next step was to prove that every ideal could be factored uniquely as a product of prime ideals. Thus, the ideal numbers of Kummer are replaced by ideals, and we can rescue for all rings of integers OK as much of unique factorization as Kummer had restored to 2[s{,].
7. PROPERTIES OF RINGS OF INTEGERS. It has been a while since we broached the topic, so it may be appropriate to remind our readers of where we were headed when we took them on the foregoing detour through history. We are interested in answering the following question: Question 1.1. Can every polynomial with integer coefficients be factored into (not necessarily monic) linear terms, each with algebraic integer coefficients')
Rings of integers provide the context in which we can analyze this question, It seems reasonable to wonder if, with unique factorization into prime ideals now at our command, we have salvaged enough to push through the results used to establish the UFD property for 2lx J. If we could do that, then starting with a polynomial f(x) from 2[x], we could let K be its splitting field and then attempt to lift the factorization of I(x) into linear terms in K[xJ to a factorization in OdxJ. Unfortunately, we do not have quite enough to be able to do this directly, but a solution suggests itself quickly enough. First, we recall the necessary definitions and properties of algebraic integers and rings of integers. Some of the results are not triviaL but they are classical, so we will simply refer the reader to a standard textbook in algebraic number theory for details. What we want to highlight is how these important classical results combine to give an answer to our question, in a way that ulmost parallels the development for 2[x]. Recall that a complex number CI is an algebraic number if and only if there exists a monic polynomial I(x) in Q[x] such that f(C1) = 0, A complex number a is said to be an algebraic integer if and only if there exists a monic polynomial g(x) in 2[x] such that g(a) = O. We denote the collection of all algebraic numbers by Q and the collection of all algebraic integers by A. A number .field is a finite extension of Q. Given a number field K, its ring of integers is the collection of all algebraic integers lying in K, and we denote it by OK' It is not hard to verify that K is the field of fractions of 0 K, so every element of a number field can be written as a quotient of two elements of its ring of integers, The most important result that we need to know, due to Dedekind. reads as follows [5, sec. 25, Prop. 41:
Theorem 7.1 (Unique Factorization of Ideals). Let K be a 1l1ll71berfield, and let OK be its rillg (~/,illtege,.s. Each 1l(}Il~ero ideal O/,OK call befactored ulliquely as a product prime ideals, with the trivial ideal 0 K correspollding to the empty product.
()f
The general philosophy when working with rings of integers is to avoid the use of divisibility statements in terms of numbers. resorting instead to divisibility in terms of ideals. So we work with ideals rather than elements, translating many (but not all) notions and properties of divisibility from 2 to the ring of integers of a number field. We say that an ideal b of OK dil'ides the ideal a if b contains a; this is in fact equivalent to the existence of an ideal e such that be = a. Likewise. we pronounce two May 2005J
GACSS'S LEMMA FOR NUMBER FIELDS
399
ideals a and b coprime if their factorizations into prime ideals have no prime ideal factor in common or, equivalently, if the ideal a + b, the smallest ideal dividing both, is the trivial ideal OK. We call elements a and b of 0 K coprime (or relatively prime) if the ideals (a) and (b) are coprime or. equivalently, if the ideal (a. b) = (u) + (b) is the trivial ideal OK. Since rings of integers are not, in general, UFOs, they are also not in general principal ideal domains (the two conditions are in fact equivalent for rings of integers). Let K and K' be number fields, with K a subfield of K'. Given an ideal a of 0 K, we can extend it to an ideal of 0 K' by taking the ideal it generates in the larger ring. that is. the ideal aO K'. Conversely, given an ideal b of 0 K', we can restrict it to OK by taking OK n b. If a is not the trivial ideal in OK, then its extension to OK' is also nontrivial: if an element a of a had a multiplicative inverse in OK'. then this inverse would be in K and therefore would already lie in OK. We say that the ideal b of 0 K' lies O\'er the ideal a of OK if a = b n OK (we also say that a lies under b).
Theorem 7.2 (Lying Over). Let K und K' be /lumber/ields. \\"ith K (I slIbfield oj" K'. Ij"p is a prime ideal OJ"OK. Ihel1 there exists u prime ideul q OjOK' lying over p.
One way to see this is to look at the ideal of 0 K' generated by p, factor it into prime ideals, and let q be any prime factor. A full proof, together with other properties, is found in [16, Theorem 20[. Note that if a = (a) is principaL then its extension is also principaL generated by a. However, it is in general false that an ideal lying under or over a principal ideal must be principal. The analogy between ideals and elements is unfortunately not complete. In a UFO, for example, any two elements have a greatest common divisor that is an element, which among other things allows one to take a quotient (J I b of elements of the domain and rewrite it in lowest terms (i.e., alb = cld, where c and d are elements of the domain that are relatively prime). With ideals, however, the greatest common divisor is defined as an ideal and may have no actual counterpart in the domain. One key difficulty in simply extending Gauss's lemma and its associated corollaries to arbitrary number fields is precisely the absence of greatest common divisors: for a number field K and for given u and b in OK, it may be impossible to express alb in lowest terms. For example, in K = Qi( v'iO), we have OK = Z[ v'iO[, where v'iO/2 cannot be expressed in lowest terms. On the other hand, if the ideal (a. b) is principaL say generated by d. then we may write (J = dx and b = dy for x and y in OK, in which case ulh = xl.\' and (x. v) = OK. Thus. a final property that we need is the following, which is a consequence of the finiteness of the class number [16. Corollary 2, p. 132J:
Theorem 7.3 (Extending to a Principal Ideal). Let K be a numher .field, and let a be (In ideal oj 0". Then there exists a positive integer k such that a' is principal. In purticlll{//: there exists afinite extension L oj" K such that
aO I. is principal.
Our main use of this theorem involves its second assertion. To see how to obtain L from the first part of the theorem, suppose that ak = (u) and that L = K (a 1/ k). where ai/I. is any fixed kth root of u, and consider the principal ideal of 0 1 generated by a 1/,. Since
400
©
THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
unique factorization into prime ideals shows that aOI = (a 1/"). as needed. Note that one way to interpret the first conclusion of Theorem 7.3 is that an ideal number (that is, an ideal) can be transformed into an actual number (principal ideal) by raising it to a sufficiently high power (i.e .. every ideal number is the kth root of an element of 0 K for suitable k). By extending to a principal ideal we conclude that for any given members {/ and b of OK there exists a finite extension L of K and elements x and y of 0 1 such that alb = xly and (x. y) = OL. If (a, h) was already principal, we can take L = K. In the case of JIOI2 in QJ( JIO). for example. it suffices to go to the extension L = QJeJlO,)2) = QJ(y's, )2). In 0 1 • the ideal (JIO, 2) is principal. generated by)2. Cancelling that factor. we have jlo12 = j51)2. and the latter fraction expresses this number in lowest terms (relative to Orl. The extension to principal ideals shows that in the ring A of all algebraic integers every finitely generated ideal is principal. If
we let K = QJ(({I .... , all) and extend the ideal generated by ai, .... ([II in OK to a principal ideal in OJ for some extension L of K. The original ideal in A. which is the extension of this ideal of L, must likewise be principal. This means. in particular. that for any algebraic integers ([ and h the ideal (a, b) of A is principal, so a and b have a greatest common divisor d in A. Moreover. d can be written as a linear combination d = 0'(/ + f3b with 0' and f3 in A. This greatest common divisor is unique only up to units. however. and we do not have an obvious choice among them. as we do in Z. As usual. we will not be overly concerned with this. for we are interested in divisibility. Dedekind calls the fact that any two algebraic integers ([ and b have a greatest common divisor in A that can be expressed as a linear combination of ([ and b an "important theorem:' but he notes that "it is not at all easy to prove" without first establishing the finiteness of the class number [5, pp. 106]. Domains in which every finitely generated ideal is principal are sometimes called Be,;ou[ domains. The usual proof of Gauss's lemma and associated results can be extended to any Bezout domain, or more generally, to any GCD-dolllain, meaning an integral domain in which any pair of elements have a greatest common divisor (which need not be expressible as a linear combination of them) [13, sec. 1.6, Exercise 8 and Theorem 102]. It is worth noting, however, that A is not a principal ideal domain: there are ideals of A that are not even finitely generated. For example, it is not hard to verify that
cannot be finitely generated. Although in this discussion we have glossed over the finiteness of the class number, most number theorists should recognize it as the arithmetical heart of our argument in the next section. We emphasize Theorem 7.3 instead because it entails a slightly weaker condition. The exact analogy is that the finiteness of the class number asserts that a certain group is finite. whereas Theorem 7.3 asserts merely that the group is a torsion group (i.e., each element has finite order).
8. GAUSS'S LEMMA FOR NUMBER FIELDS. This section is patterned after [2. sec. 11.9]. We only sketch some of the proofs. since they mimic very closely the proof that Z[x] is a UFD. May 2005J
GAUSS'S LEMMA FOR NUMBER FIELDS
401
Suppose that K is a number field. We say that a polynomial I(x) in Q[x] is defined ol'er K if the coefficients of I (x) lie in K. Now let I (x) belong to A[x], let K be a number field such that I (x) is defined over K, and let OK denote the ring of integers of K. Then the coefficients of I (x) all lie in OK. Accordingly, we can make the following definition:
Definition 8.1. For I (x) in 0 dx], the content contdf) of" I (x) in K is the ideal of OK generated by the coefficients of I (x). The polynomial I (x) is primitive in K if contK(f) = OK.
This is in fact very similar to the definition of content of" a Iorm given by Kronecker (see [11, chap. 6]). The main difference is that Kronecker considers the norm of this ideal, so the content is a positive integer rather than just in ideal of 0 K. The content of a polynomial clearly depends on K, since it must be an ideal of 0 K. However, the property of being primitive does not depend on the specific K:
Lemma 8.2 (Independence of Primitivity). Let I(x) helong to Alx], (lnd let K and K' he two nlll1lherfields ol'er which I(x) is de.fined. Then I(x) is primitive in K if" {lnd only ifit is prilllitil'e in K'.
Proof By looking at the compositum of K and K' (i.e., the smallest field containing both K and K'), it suffices to establish the result when K is contained in K'. If I(x) is not primitive in K, then there is a prime ideal p of 0 K such that every coefficient of I (x) lies in p. (To see this, factor the content into primes, cont K (f) = Pl' .. p", and take P = Pi for any i.) Now let q be a prime of OK' lying over p. Then every coefficient of I(x), when considered as a polynomial in OK'[X], lies in q, implying that the content of I (x) in K' cannot be the trivial ideal OK', since q divides it. Thus, I(x) is not primitive in K' either. Conversely, if I(x) is not primitive in K', then there is a prime ideal q of OK' such that every coefficient of I(x) lies in q. Let p = q n OK. As f(x) is also defined over K, the coefficients of I (x) lie in p, so p divides the content of f (x) in K. This ensures that I (x) is not primitive in K. • Since primitivity does not depend on K, we simply say I(x) is primitive to mean that it has algebraic integer coefficients and is primitive in any number field Kover which it is defined. Gauss's lemma itself is now straightforward, provided we remember to use prime ideals instead of prime numbers:
Theorem 8.3 (Gauss's Lemma for Number Fields). The product ()f" two primitive polYl1omials is prilllitil'e.
Proof Proceeding as in the case of Z[x], consider primitive polynomials I(x) and g(x) in Alx], and let K be any number field over which both I(x) and g(x) are
defined. Write
402
©
I(x)
=
I:>i Xi ,
g(x)
=
I>jX i ,
hex)
=
I(x)g(x)
=
I>kXk.
THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
Pick an arbitrary prime ideal p of 0 K, tind the smallest indices io and jo for which neither Gin nor bin lies in p, and observe that Cin+io
= (/oiJ in + io
+ ... + {/io-lbio +
1
+ (/iobjo + (/io+lb jo - I + ... + (/in+job o does not lie in p. It follows that hex) is primitive.
•
At this point one might wonder if. with unique factorization of ideals and Gauss's lemma in hand, it might be possible to prove that a factorization of a polynomial I(x) in OdxJlifts from K[x] to Odx]. Unfortunately. the results at our disposal are not quite strong enough.
Example 8.4. Consider the field K = Q( )10\ The ring of integers of K is known to be OK = Z[ ~]. which is not a UFD. For instance.
6
= 2·3
= (2
+ Fa)( -2 + JiO).
where all four of 2. 3. 2 + Fa. and -2 + JTO are irreducible. To prove they are indeed irreducible, we once again rely on a norm N: Z[ JTOJ -+ Z given by N (a + bJTO) = (/2 - I Ob 2 . The norm has the same basic properties as the norms on Z[ H] and Z[ yCS] discussed previously (e.g .. it is multiplicative). but now we must expand the characterization of units to those elements of Z[ JTO] with norms equal to I or - I. (For elementary properties of the norm map for general number fields, see [16, chap. 2].) Note that N(2) = 4, N(3) = 9, and N(±2 + JTO) = -6. If any of these were factorable into a product x\' of non units x and.\' of OK, then at least one of x or y would have to have norm ±2 or ±3. Since the equations {/2 - IOb 2 = ±2 and a 2 - I ~b" = ±3 have no solutions modulo 5. they have no solutions in Z either. Thus. there are no elements of 0 K of norm ±2 or ±3. All four of the elements in the indicated factorizations of 6 must therefore be irreducible. and hence OK is not a UFD (the two factorizations cannot be transformed into one another by multiplication by suitable units because the norms do not match). Now consider the polynomial pix) = 2X2 - 5. Its splitting field is Q( Fa). for 2X2 - 5 = 2(x - JTO/2)(x + Fa/2). But there is no way to rewrite this factorization so that all the coefficients lie in OK. To see this, assume that pix) = (ax + b)(cx + d) with {/. b, c. and d all in OK' We then have ac = 2. Since 2 is irreducible. we may assume without loss of generality that a = I and c = 2 (by possibly switching a and c and adjusting the factors by a unit). This means that b must be the negative of a root of pix). but neither JTO/2 nor -~/2 is an algebraic integer. Our original assumption must therefore be wrong. whence p(x) = 2x 2 - 5 cannot be factored over OK = Zl JTO]. To see where things go wrong in Example 8.4. we go back to the proof that Z[x] is a unique factorization domain in section 2. It is, in fact, in the step corresponding to the next proposition (factoring out the content) that we find unique factorization of ideals insufficient for our purposes, forcing us to modify our result somewhat: May 2005]
GAUSS'S LEMMA FOR NUMBER FIELDS
403
Proposition 8.5 (Factoring Out the Content in Number Fields). Let I(x) be a /lon::,ero po/mol7lial in QlxJ, and let K be a numberfield over which I(x) is dc.fined. Then there exists a finite extensio/l L of K such that I(x) = elr(x), Hihere e I is a constant lying in Land I* (x) is a primitive po/nlOmial dc.fined over L. Moreovel; CI and (x) arc unique up to multiplication bv units of 0 I>.
.r
Proof: We begin just as before: we write I(x) as .
f (x)
.
all = -x" b ll
+ ... + -al) 170
with the ai and hi in OK. We mUltiply by ho .. . 17 11 to clear denominators and see that
(bl)" . b,JI(x) = g(x), with g(x) in OK. We cannot simply write g (x) as a constant times a primitive polynomial, however, because cont K (g) is only an ideal and may not be principal. Of course, there is a finite extension L of K such that we can extend contdg) to a principal ideal in Since contdg)O[ is the same as cont[(g), we have contr(g) = (c) with c in 0[>. We can then write g(x) = c· g*(x), with g*(x) primitive and defined over L. We thus take CI = cj(ho '" bll)' an element of L, and f*(x) = g*(x). For uniqueness up to a unit of L, it is again enough to consider the case of I* (x) = eg*(x), with both f'(x) and g*(x) primitive and defined over L. Write e = ujv with u and v in 0[. Going to a finite extension of L if necessary, we may assume that u and v arc relatively prime by the remarks following Theorem 7.3. We now proceed as in the case of Z[x] to conclude that both u and v are units, making ujv a unit in a" as claimed. •
at..
It is worth noting that if 0 K is a unique factorization domain (or if contK (g) is principal), then in the proof of this proposition we may use L = K, since we can factor out the content in K itself. Although we cannot simply lift factorizations from K[x] to Odx], as we saw in Example 8.4, by factoring out the content we are able to lift a factorization from K [x 1 to Odx], where L is a finite extension of K. This is the message of the following result:
Theorem 8.6 (Lifting a Factorization). Let K be (lllwnberfield, and let f(x) belong to OdxJ. Iff(x) = g(x)h(x)Ior po/.vnol11ials g(x) and hex) in K[x], then there is a .finite extension L of K sllch that I (x) = G (x) H (x), ,\'here G (x) is (In L-multiple ()f g(x), H(x) an L-multiple ofh(x), and both G(x) and H(x) have coefficients in 0[>. Proot In view of Proposition 8.5, we can pass to a finite extension L of K where we can write I(x) = cIF(x),
g(x) = cgg*(x),
with f*(x), g*(x), and h*(x) primitive, ci in
a,>, and c~ and e"
in L. We then have
elf*(x) = f(x) = g(x)h(x) = (c~c,,)g*(x)h*(x),
404
© THE
MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
and Gauss's lemma for number fields tells us that g*(x)h*(x) is primitive. By the uniqueness of the representation, c I = lIC~C" for some unit u of O/.. In particular C~C" lies in 0 1., permitting us to write
•
and thus completing the proof.
Remark 8.7. It is very important to note that in Theorem 8.6 it is typically necessary to pass to an extension L of K. Of course, if we can factor out the content of I (x), g(x), and hex) in OK (as happens if OA. is a UFD), then we may again set L = K and get the usual result.
Corollary 8.8 (Complete Factorization in A[x]). E\'(!r.'" nonconstant polynomial I(x) in A[.\"] can befauored into a prodllct o{(not necessarilY l11onic) linearfactors, each 1,vith algebraic integer coefficients.
Proof Let I(x) be a nonconstant polynomial in A[x], and let K be the splitting field of I(x). We can now lift the factorization in K[xl to a factorization in Odx] for a finite extension L of K. Since we are lifting a factorization into linear terms, the factorization in Odx], a subdomain of Alxj, is a factorization into linear factors as well. and we are done. • Specializing to integer coefficients we finally obtain the affirmative answer to Question 1.1:
Corollary 8.9. Every polYllomial fix) in Z[xj can be factored into a product of(not necessarily lIlonic) linearlactors Ivith algebraic integer coefficients. Example 8.10. Let us go back to Example 8.4 and find a factorization with algebraic integer coefficients. Recall that pix) = 2x" - 5. Any factorization of p(x) over Q must be equivalent to
2x' - 5
~ 2 (, + ~) (,- ~)
up to mUltiplication by elements of Q. We want to factor 2 = ab in A in such a manner that the two elements aJIO/2 and -hJTO/2 also lie in A. In Zl JIO], the greatest common divisor of JTO and 2 is defined as the ideal (JlQ. 2), which is not principal since 2 is irreducible and does not divide JIO. On the level of ideals, we have (2) = (JIO, 2)". To see this, note that
(JIO.2)" = (IO,2J1O.4), and since all generators are multiples of 2, it is clear that (JIO. 2)" is contained in (2). Because it is also true that 2 = 10 - 2(4), it follows that 2 lies in (JIQ, 2)2, so we have equality. As (2) is the square of the greatest common divisor of 2 and JlQ, it would suffice to choose ([ = b, doing it in such a way that a is a generator of the ideal (JIO, 2) in some extension of K. We adjoin J2 to K to obtain the number field L = Q( JIQ. J2) = May 2005]
GAUSS'S LEMMA FOR NUMBER FIELDS
405
Q( j5, ~), and in OL we have inO!.:
,,(x)
~
(JT5, 2) = (v'2). Taking a =
+
=h
+
h
= v'2, we then have
~) (x - ~)
JT5) h (x-~2~ JT5) (x+~2~
= ( hx +
vis) (hx - vis) ,
which is a factorization into linear polynomials with algebraic integer coefficients. It is not hard to verify that O[ = Z[(l + ~)/2, v'2] (see [16, chap. 2, Exercise 42(c)]), although the factorization may in fact be achieved over the smaller subring Z[~, v'2]. 9. SOME RING THEORY. Dedekind's ideas and exposition, as found in [5], are surprisingly modern. Modulo a few edits (what Dedekind calls a "module" we would nowadays call a "Z-module," for example), it would be right at home in a modern algebra text. Unfortunately, his ideas were not immediately recognized or adopted. For one thing, there was great resistance at the time to dealing with infinite sets, such as Dedekind's ideals, as completed objects that one could manipulate. Moreover, those who opposed such infinite constructions had a powerful spokesperson in Kronecker. It was probably not until Hilbert's Zahlhericht appeared that the theory came fully into its own. Hilbert brought together several strands and approaches, and he presented a unified treatment for the algebraic theory of numbers as it had been developed up to that point. Hilbert passed effortlessly between Dedekind's theory of ideals and Kronecker's theory of forms but found in the unique factorization of ideals into prime ideals one of the "foundation pillars" of algebraic number theory. Indeed, these notions are the very language in which we described our answer to Question 1.1. Dedekind's approach to the problem of unique factorization in number fields did much more than extend Kummer's work and provide the basic framework on which algebraic number theory would later be built. The notions of modules and ideals were taken up by Emil Artin and Emmy Noether in the 1920s and generalized into what we now call ring theory. Dedekind's influence was powerful. Indeed, according to Stillwell [5, pp. 3]: But even then. EI1lIllY Noether used to say "Eoi' Siehl schon bei Dedekind" (It's already in Dedekindl, and urged her students to read all of Dedekind's work in ideal theory.
When we first encountered Question 1.1, our minds naturally turned to algebraic number theory, which explains the solution we found. One can think of that solution as a sort of "bottoms up" solution, in which we proceed by taking finite extensions of number fields to get the appropriate greatest common divisors, and thus mimic the proof of lifting the factorization. As noted earlier, this is very much in the spirit of Kummer, Kronecker, and Dedekind. However, much can be said as well for a "top down" approach, which would proceed instead by asking: Question 9.1. For rvhich integral domains D does the analogue o.f'Theorem 2.9 (lifting ti1efactori:::ation) hold'! 406
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
This question had already been asked and answered from a purely ring-theoretic point of view, though we were unaware of it. We would be remiss if we did not also take the opportunity to discuss here some of the notions involved. Many deserve to be better known, and they again revolve around ideas of factorization, treading close to the origins of algebraic number theory. We will need a few considerations before coming back to this question. We again beg the reader's indulgence.
Uniqueness of the factorization. We know that a domain D is a UFO if and only if the polynomial ring Dlx] is a UFO. The "only if" direction follows because any evidence of the fact that D is not a UFO will show that D[x] is not a UFO either. On the other hand, even though Alx 1 is not a UFD, it comes very close to being one, as we see in the following two results:
Theorem 9.2.
Let f(x) be a primitive polYllomial in A[x]. If' f(x) = gl (x)··· g,,(x) =
where the gi (x) alld
h j (x)
hi (x)··· h,,(x).
are polYllomials in A[x 1(if degree I. thell LIp to a reordering
n
of the hi (x) there exist lIllits u I . . . . • lI" of A sllch that u i = I and [Ii hi (x) = Ri (x). III particul(lJ; each lIi is a lIllit ill Or: lor any numherfield K containing it.
Proof Note that if a product of polynomials in Alx] is primitive. then each of the factors must be primitive. Thus. each of the gi (x) and hi (x) is primitive. Let K be a number field over which the gi (x) and h j (x) are defined. Consider both factorizations in K Ix]' which is a UFO. The two factorizations must be equivalent up to multiplication by elements of K. Up to a reordering of the h j (x). we may assume that each gi(X) is a K -multiple of h;(x). Fix an index i. Write gi(X) = ax + band hi(x) = cx + d. with a. h. c, and d in OK and (Ie !- O. Since gi(X) and hi(x) are both primitive, it follows that each of the ideals (a. b) and (c, d) is the trivial ideal Or:. We also have an element lIi of K such that gi(X) = lIihi(X). By passing to an extension of K if necessary, we may write Ui = Vi/Wi with Vi and Wi algebraic integers and (Vi. Wi) the trivial ideal. Therefore. Wigi(X) = Vihi(X), hence (wi)(a. b) = (Vi)(C. d) as ideals. Since both (a. b) and (c. d) are trivial and since (Wi) and (Vi) are relatively prime. from the unique factorization of ideals it follows that both (Wi) and (Vi) reduce to the unit ideal. We infer that Wi and Vi are algebraic integer units, as then is lIi. A substitution and cancellation now establishes that U i = I. •
n
Theorem 9.3.
Let f (x) be
a Iloll;:ero polynomial
lex)
in A[x]. Then H'e can write
= CrRI (x)··· g,,(x).
where Cj belongs to A and each gi (x) is a primitive polynomial in A[x 1 of'deRree I. Moreover, the factori;:ation is unique lip to units orA.
Proof This follows by combining Proposition 8.5 (factoring out the content in number fields) and Theorem 9.2.
•
In short. even though A[x] is not a UFO. we have a certain "uniqueness up to constants" in factorizations. It is "merely" the fact that the constants cannot be factored May 2005]
GAUSS'S LEMMA FOR NUMBER FIELDS
407
into a product of irreducibles (uniquely or otherwise) that prevents A[x] from being a UFO. We restate this unique factorization up to constants in the following corollary in a way that is more amenable to generalization for other domains. Recall that Q is the field of fractions of A.
Corollary 9.4 (Unique Factorization up to Constants). Let f (x) be a polynomial in Alx] of" positive deRree. Then f(x) factors into a product o{lIoncollstant polynomials with coefficiellts ill Alx J, f(x) = RI (x) ... R,,(X). such thatllo Ri (x) can helactored as a product O{tH'O llOllCOllstallt polYllomials. Moreovel; W1\' t\t'O such factori,;atiol7s oj' f (x) are equivalent, in the sense that ij' f(x)
=
gl (x)", gnC,)
=
hi (x),,· hll/(x),
then n = III alld up to a reordering oj'the hi (x) there exist constants such that Ri(X) = uih;(x) and lIi = I.
n
U I. ' ..•
u" in
Q
Compare this with what happens in a ring of integers that is not a UFO:
JIO],
Example 9.5. Let D = :21:1 We saw in Example 8.4 that the polynomial p(x) = 2x" - 5 cannot be written as the product of two linear polynomials in Dlx J, The same argument shows that q (x) = 5x 2 - 2 cannot be written as a product of two linear polynomials in Dlx] either. Consider now the following two polynomials in D[x J:
+ Fo. Fox" + 7x + Fo,
Fox" - 7x
rex) =
sex) = We have p(x)q(x) = (2x 2
5)(5x 2
-
-
2)
+ 10 7x + Fo)( Fox" + 7x + Fo)
= 10x~ - 29x 2 =
(-!lOx
2
-
= r(x)s(x),
We claim that neither rex) nor sex) can be written as a product of two linear polynomials in Drx 1. Indeed. if
Fox" - 7x
+ Fo =
(ax
+ b)(cx + d),
with a. h. c. and din D. then ab = ~. By looking at the norms and remembering that no element of D has norm 2. we see that up to multiplication by a unit we may assume that (/ = 1 and c = ~. But then -b must be a root of ~x2 - 7x + ~, whereas neither of its roots lies in D. We conclude that no such factorization exists. A similar argument shows that the same is true of the other factor. Example 9.5 shows that in D[x 1 for D = :21:[ JIO] we do not have unique factorization up to constants: the factorizations of 10x 4 - 29x 2 + 10 as p(x)q(x) and as
408
©
THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
r(x)s(x) are not related by multiplication by constants. When do we get this kind of uniqueness of factorization'? Now is the time to get back to Question 9.1: since the ring of polynomials over a field is a UFO. if D is a domain in which we can lift factorizations from K[x] to Dlx], then in D[x] we will have unique factorization up to constants. Perhaps somewhat surprisingly. the converse also holds:
Lemma 9.6. Let D be a domain, and let K be its field olFactiol/.l'. The following statements are equivalent: (a) Any polynomial f(x) in D[x] ol degree at least two that call be factored in K [x] call also be factored as a product of two 11OIlcoilstant polmomials with coefficients in Drx J,
(b) The analogue of Theorelll 2.9 holdsfor D.
(c) There is unique factori~atioll lip to cOllstallts ill D Ix 1, Proof: The fact that (a) and (b) are equivalent is clear. That (c) follows from (a) is the observation that K [x] is a UFD whose units are the nonzero constants. Finally, to see that (c) implies (a). assume that f (x) is a polynomial in Dlx] that cannot be written as a product of two nonconstant polynomials in D[x J. but is nevertheless not irreducible in K[xj, say f(x) = G(x)H(x) with G(x) and H(x) of positive degrees. Multiplying by a suitable constant from D to clear denominators, we have cf(x) = g(x)h(x). where g(x) and hex) are polynomials of positive degrees with coefficients in D. Therefore, we have cxf(x) = xg(x)h(x). The assumption on I(x) means that the polynomial exf(x) can be factored in Dlx] as (cx)I(x). Each term is of positive degree and cannot be written as the product of nonconstant polynomials in D[x). On the other hand. we have the alternative factorization x . g(x) . hex) with at least three terms (more if either g(x) or hex) is further reducible). so DrxJ does not have unique factorization up to constants. •
Lifting factorizations and the Schreier property. Most students are familiar with the hierarchy of domains that is presented in most upper-division abstract algebra courses: Fields C Euclidean Domains C PIDs C UFOs C Domains. and all of the inclusions are proper. (Note that PID stands for "principal ideal domain.") However, there are other important properties of domains that are in turn closely related to factorizations and that are not nearly so well known. We have already mentioned a few in passing. The Bezout property (that every finitely generated ideal is principal) and the greatest common divisor property (that every pair of elements has a greatest common divisor) are two of these. The latter gives us the ability to lift factorizations. Notions that are closely related to the foregoing properties have been investigated. In [1]. there is a list of twelve conditions for domains that are linked with the concept of unique factorization. existence and behavior of greatest common divisors. and polynomial factorizations (including, for example, the "Gauss's Lemma Property." which requires that products of primitive polynomials be primitive). Three of these conditions were considered in detail by P. M. Cohn in his wonderful paper [4]. Once again. they revolve around the notions of divisibility. factorizations, and integral elements. We present these three notions here.
Definition 9.7. A domain D is a pre-Schreier domain if the following holds for every a. b. and c in D: if a I be. then there exist f3 and y in D such that a = f3y. f3 I b, and May 2005]
GAUSS'S LEMMA fOR NUMBER FIELDS
409
y I c. Equivalently, D is a pre-Schreier domain if and only if any two factorizations of an element have common refinements.
Definition 9.8. A domain D is a Schreier domain if it is a pre-Schreier domain that is integrally closed in its field of fractions (i.e., any element of the field of fractions of D that satisfies a monic polynomial with coefficients in D already lies in D). Definition 9.9. We say that a domain D has the AP-property (atoms are primes) if every irreducible element a of D satisfies the prime divisor property (i.e., for any elements band c of D satisfying a I be, either a I b or a I c). We have the following implications among these properties: PID
n
C UFD
n
Bezout c GCD c Schreier c pre-Schreier cAP. All inclusions are known to be proper (see [1]). One connection between these notions and unique factorization is the following [4, Theorem 2.3]:
Theorem 9.10. A domain D is a UFD (f and only if it is a Schreier domain such that each nonunit of D can be factored into a product of irreducible elements in at least one way. The Schreier property is exactly the missing link between being able to factor into irreducibles in at least one way and being able to do so (modulo units) in exactly one way. Since in rings of integers we always have factorization in at least one way, the Schreier property for a given OK is equivalent to the UFD property. The Bezout property. a consequence of the finiteness of the class number. implies that the ring A of all algebraic integers is a Schreier domain. It is precisely the Schreier property that characterizes the domain in which the analogue of lifting the factorization holds, thus answering Question 9.1. One of the key ingredients in the proof of this result once again brings us close to the very foundations of algebraic number theory. It is a generalization of Dedekind's "Prague Theorem." which Hilbert used in the Zahlbericht as the key step in establishing the unique factorization of ideals into prime ideals. We quote it here in its original version for algebraic integers:
Theorem 9.11 (Dedekind's Prague Theorem). Let f(x) and g(x) be polynomials with algebraic integer coefficients. f(x) = arx r + ar_lx rg(x) = b,x
2
1
+ b,_lx'-'
+ ... + ao. + ... + boo
If every coefficient of the product f (x )g(x) is divisible by the integer m, then each of the numbers aob o. aob l •...• aob,. al bo, ... , arb, is also divisible by m.
A proof of this result, together with the classical proof of unique factorization into prime ideals, can be found in [18, Lemma 8.12]. Through suitable definitions, one can interpret the Prague theorem as saying that the content of a product of polynomials is the product of their contents. To make this precise, however, requires a modification of our definition of content to something akin 410
©
THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
to Kronecker's definition, which makes the content of such a polynomial a positive integer rather than an ideal. We will not go into it here. Viewed in that light, the Prague theorem is itself a generalization of Gauss's lemma, so we are still circling the same notions with which we began. Using a generalization of the Prague theorem for more general rings, P. M. Cohn proved the following theorem:
Theorem 9.12 (P. M. Cohn). If D is a Schreier domain and x is all indeterminate, then D[x] is a Schreier domain. This result gives the top-down answer to our question (see [17]):
Theorem 9.13. Let D be a domain, and let K he its field of fractions. The analogue of Theorem 2.9 mfting of factorization) holds for D[x 1 if and only (f D is a Schreier domain. Proof First suppose that D is a Schreier domain. By Theorem 9.12 D[x] is a Schreier domain as well. Let f (x) = G (x) H (x) be a factorization of a polynomial f (x) from D[x] in K[x]. Multiplying by suitable constants to clear denominators, we obtain c in D such that cf(x) = g(x)h(x), with g(x) and hex) in D[x], g(x) a D-multiple of G(x), and hex) a D-multiple of H(x). Since f (x) I g(x)h (x), the Schreier property implies the existence of a (x) and b(x) in D[x] such that f(x) = a(x)b(x), a(x) I g(x), and b(x) I hex). By considering the degrees of a(x), g(x), G(x), b(x), hex), and H(x), we see that deg(a) = deg(G) and deg(b) = deg(H). Accordingly, we can lift factorizations from K[x] to D[x]. Conversely, suppose that the analogue of Theorem 2.9 holds for D[x]. We must prove that D is integrally closed and satisfies the pre-Schreier property. Consider an element k of K that is integral over D, and let f(x) = x"
+ a,,_lx"-1 + ... + ao
be a monic polynomial in D[xj such that f(k) = O. In K[x], we have
f(x) = (x - k)g(x) for some g(x); lifting this factorization, we find that f(x) = a(x)b(x) for polynomials a (x) and b(x) in D[x], where a (x) is linear and has k as a root. Multiplying by suitable units if necessary, we may assume that both a(x) and hex) are monic. Then a(x) = x - k, which implies that k belongs to D. Thus D is integrally closed. To see that D satisfies the pre-Schreier property, let a, b, and c be elements of D such that a I be. If either b = 0 or c = 0, then to verify the pre-Schreier property we simply factor a as a = a . I or as a = I . a, respectively. If a = 0, then we must have b = 0 or c = 0, so we may assume that ahe i= O. Let r in D be such that ar = he. Consider the polynomial
which has coefficients in D[x] and has been factored in K [x]. Since we can lift factorizations, we must have f(x) = g(x)h(x), with g(x) = ax - f3, hex) = 8x - y, and 0', f3, y, and 8 in D. Exchanging g(x) and hex) if necessary, we may assume that f3la = bla, so af3 = ba; and y j8 = cia, so ay = c8. We also have a = 0'8 and May 2005]
GAUSS'S LEMMA FOR NUMBER FIELDS
411
r = fJy. Thus we see that he = ar = afJy = cfJo. implying that h = fJo. Similarly. be = afJy = hO'y. whence c = O'y. Thus we have factored a = 0'0. where 0' I c and o I b. confirming that D satisfies the pre-Schreier property. We conclude that if we can lift factorizations, then D is a Schreier domain. as claimed. •
It is not hard to verify that we can lift factorizations of l7lonic polynomials in Dlx J if and only if D is integrally closed. so the notion of integral closure is also closely tied to factorizations. Theorem 9.13 provides a "top-down" answer to Question 1.1: invoke Theorem 7.3 (extending to a principal ideal) to show that A is a Schreier domain. note that its field of fractions is algebraically closed, and lift the factorization from Ql[x 1 to A[xl. We also obtain: Corollary 9.14. Let K he a number field, and let OK he its ring ailalogue of'Theorem 2. 9 110 Ids for OK if'and only if'OK is (/ UFD.
of integers.
The
10. DEDEKIND DOMAINS AND FUNCTION FIELDS. Before finishing, we exhibit a situation that closely parallels the case of number fields, but in which Question 1.1 has a negative answer. Most of the details require, unfortunately, some heavy technical machinery, so we merely assert many of the necessary results and point the reader to suitable references. The ring of integers in a number field is an example of a Dedekind domain. which is an integral domain in which ideals can be uniquely factored into prime ideals. (There are many equivalent definitions of a Dedekind domain; see [12. sec. 10.21.) The other main example of a Dedekind domain is the coordinate ring of a nonsingular curve over an algebraically closed field. These two cases are referred to in terms of their fields of fractions as "the number field case" and "the function field case." respectively. There is a strong general theory of Dedekind domains. The close connection between the number field and function field cases was in fact noted by Dedekind, who together with Weber applied these ideas to develop an arithmetic theory of Riemann surfaces [6], which marks the beginning of modern algebraic and arithmetic geometry. The fact that they are analogous was made even clearer when Dedekind developed the notions of ideals and ideal multiplication. His first proofs of unique factorization were very computational and relied throughout on the exact nature of the elements in the rings in question. I Later he abstracted the key properties of the ideals. and he proved most of his results in terms of those properties rather than in a computational manner. For any specific setting, the exact nature of the rings in question would be used to show that ideals satislied those key properties. but after that all further results would follow automatically. Once again. this represents a very modern approach. Thus. many of the results we have discussed hold for general Dedekind domains: unique factorization of ideals, the lying over theorem for prime ideals, Lemma 8.2, and Gauss's lemma can all be proved in the general setting. Unfortunately, the key property we used to prove the factorization result. Theorem 7.3. fails to hold in general: if u is an ideal in the function field analogue of the ring of integers. it is possible that Uk is not principal for any positive integer k and that there is no extension of the underlying field in whose ring of integers u becomes principal. Because of this failure, the proof lin his exposition of ideals published in 1871. for example, Dedckind docs not even define multiplication of ideals until afier he has proved the unique factorization theorem. Though this sounds paradoxical. recall that divisibility was defined in terms of inclusion of ideals. not in terms of multiplication. Dedekind proved that every ideal is the intersection of all prime ideal powers that divide it.
412
© THE MATHEMATICAL ASSOCIATION OF
AMERICA
[Monthly 112
of an analogue of the complete factorization theorem for function fields breaks down. We exhibit an explicit counterexample here. A precise definition of "function field" would take us too far away from the main discussions on this paper. Thus. we discuss only the two specific examples necessary to answer Question 1.1 for general Dedekind domains. The easiest example of a function field is the field CC(x) of rational functions of a single variable x; this is the analogue of the field of rational numbers ([JJ. Inside CC(x) is the ring of polynomials CC[x L which is the function field analogue of Z. Since CClx] is a unique factorization domain. it will certainly not furnish the counterexample we seek. This motivates us to find a second example of a function field. We consider the larger ring A = CC[x. y]/(p(x. where p(x. y) is a nonconstant polynomial in two variables that satisfies an additional technical hypothesis." We consider the following polynomial:
y».
p(x. y) = y-' - x'
+ 3x
- 49.
For this choice of p(x. y). the ring A is a Dedekind domain. If we denote its field of fractions by K = CC(x)[yl/(p(x. then K is a function field. and A is the analogue of the ring of integers OK in the number field setting. Just as in the number field case. the ring A is precisely the set of elements 0' of K that satisfy monic polynomials with coefficients in CC[x J:
y».
where qi
=
qJr) belongs to CC[x] for each i. In other words. A is the set of CC[x[-
illfegers in K.
Notice that A is closely associated with the elliptic curve C defined by the following equation: y-'
= .\", - 3x
+ 49.
The nonzero prime ideals of A arc associated naturally with points lying on C via the one-to-one correspondence
(a.b)EC
~
p=(x-a.y-b)
For example. the prime ideal (x. 7 + y) corresponds to the point (0. -7). The ideal (x - a. y - b) is precisely the set of polynomials in A that vanish at the point (a. b).' If the point (a. b) does not lie on C. then the ideal (x - a. \' - b) will be the unit ideal of A. so in particular will not be prime. It is not at all obvious that every nonzero prime ideal of A has the form (x - a. y - h) for some point (a. h) in C. but it is nevertheless true. This correspondence also sheds some light on prime factorization. For instance. the ideal (7 + y) can be factored in A as (7
+ -y) =
(x. 7
+ .y)(x
-
J3. 7 + .y)(x + J3. 7 + .y).
-'Thc tcchnical hypothesis is that there should he no solutions (x. \") in complex numbers x and y to the equations 1'., (x. y) = 1', (x. y) = I,er. y) = O. where V, and 1', denote partial derivatives. 'Technically. the elements of II are equivalence classes of polynomials. hut if ler. y) and g(x. r) are congruent modulo (I'(x. y)). then they have thc samc value at any point (II. h) of C.
May 2005]
GAUSS'S LEMMA FOR NUMBER fIELDS
413
which reflects the fact that the three points (0, -7), (-J3, -7), and (--J3, -7) are precisely the points of the curve C such that 7 + y = 0. Another factorization, which will be a key player in our counterexample, is the factorization of the ideal (x); namely, (x) = (x, 7
+ y)(x, 7 -
y),
again corresponding to the fact that (0, ±7) are the only two points in C with x-component equal to 0. This is a factorization of (x) into two distinct prime ideals of A; if the two ideals were the same, then they would both contain (7 + y) + (7 - y) = 14, which is obviously untrue because 14 is a unit of A. The important feature of A, for our purposes, is the failure of the analogue of Theorem 7.3. In fact, for most prime ideals p of the ring A, the ideal pk is not principal for any positive integer k. In particular, we have the following result:
Theorem 10.1. For ever.V positive integer k the ideal (x, 7 + y)k is not principal. A proof of Theorem 10.1 is unfortunately well beyond the scope of this paper.4 Taking it for granted, however, we are now able to describe the promised counterexample to the analogue of Question 1.1 for function fields. Let I (T) be the following polynomial, with coefficients in A:
=X
( 7+V)( 7-,,) T+~
T+~.
(10.1 )
Any factorization of I(T) over K (the algebraic closure of K) will be equivalent to Equation 10.1 up to multiplication by elements of K. To show that we cannot factor I (T) into linear factors with Clx J-integer coefficients, it suffices to show that we cannot factor x as x = ab with a and b integral over CC[x] in such a manner that the following two elements of K are also integral over C[x J: a(7
+ y) x
b(7 - y)
x
If a(7 + y)/x is an integral element, the ideal (x) must divide the ideal (a)(7 + y). From the factorizations of (x) and (7 + y) already computed, this means that (a) contains (x,7 + y). Similarly, (b) must contain (x,7 - y). As (x) = (a)(b) = (x,7 + y)(x, 7 - y), this implies that (a) = (x, 7 + y) and (b) = (x, 7 - y), where the ideals should now be interpreted in the ring of CC[x ]-integers in some finite extension L of K. Thus, in order to prove that ICT) cannot factor into linear factors with integral coefficients, it suffices to prove that the ideal (7 + y, x) is not principal in the ring of C[x ]-integers of any finite extension of K. Just as in the number field case, it turns out that this is equivalent to showing that (7 + y, x)k is not principal in A, which 4The multiplication of ideals in A turns out to be closely related to the group of points of C, described in much more generality in [19] and [20]. In particular, if a prime ideal p corresponds to a point P on C, then pk is principal if and only if the order of P divides k in the group of points. Since only countably many of the uncountably many points on C have finite order, pk is almost always not principal. Using techniques of arithmetic geometry, it is a straightforward matter to confirm that the point (0. - 7) has infinite order in the group of points on C, and therefore that the ideal (x, 7 + v)k is not principal for any positive integer k.
414
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
is precisely the content of Theorem 10, I. Thus, f (T) cannot be factored into linear factors with integral coefficients, In fact, notice that even though x divides the product (7 + y)(7 - y) (since plainly 49 - y2 = 3x - x 3 in A), we cannot factor x into a product of an element that divides 7 + Y and one that divides 7 - y, even in the integral closure of A in K, In other words, the integral closure of A in K is not a pre-Schreier domain, so we cannot always lift factorizations from K, as noted in Theorem 9,13,
11. FINAL REMARKS. Question 1,1 goes to the very heart of Dedekind's vision of algebraic number theory, exploring not only the parallels between the rationals and number fields and between the integers and rings of integers, but also the circumstances under which those parallels break down, Our answer has taken us on a tour of the history of some of the fundamental building blocks of ring theory and algebraic number theory, and even when we thought we were leaving number theory behind for the wider field of ring theory, we found ourselves drawn back to the notions of factorizations and to Dedekind's work, It is easy to understand, then, Emmy Noether's dictum: Es steht schon bei Dedekind.
ACKNOWLEDGMENTS. The authors thank Derek Holt and George Bergman for helpful comments. We also thank Bill Dubuque for many suggestions, and particularly for bringing P. M, Cohn's paper to our attention. We are grateful to the anonymous referee for glimpsing some potential of the paper based on our original submission and suggesting that we build the paper around the historical investigation of factorizations.
REFERENCES
I,
2. 3. 4, 5. 6. 7. 8. 9. 10. II.
12. 13. 14.
15. 16. 17. 18. 19. 20.
D. D. Anderson and R. O. Quintero, Some generalizations of GCD-domains, in F(/ct()ri~(/tion in Imel(ral Domains (Iowa Citro IA, J99fJ), Lecture Notes in Pure and App!. Math .. no. 189, Marcel Dekker. New York, 1997. pp. 189-195. G. Birkhoff and S. MacLane. A SlIn'er o(Modem Algehra, 3rd ed .. MacMillan. New York, 1970. N. Bourbaki. Elements o(the History o(Mathematics (trans. J. Meldrum), Springer-Verlag, New York, 1991. P. M. Cohn, Bezout rings and their subrings. Proc. Cambridge Philos. Soc. 65 ( 1968) 251-264. R. Dedekind, Theor\' o( Aigehraic Integers (trans. J. Stillwell). Cambridge University Press, Cambridge. 1996. R. Dedekind and H. Weber, Theorie der algebraischen Functionen einer Veranderlichen, 1. Reine AngeH'. Math. 92 (1882) 181-290. H. M. Edwards, Fermat's Last Theorem: A Genetic Imroduction to Algebraic Number Theor\', SpringerVerlag, New York, 1977. Euclid, Elemems, Book VII. C. F. Gauss, Disquisitiofles Arithmeticae (trans. A. A. Clarke), Springer- Verlag. New York, 1986. R. Hartshorne, Algebraic GeometrY, Springer- Verlag, New York, 1977. D. Hilbert, The Theory o( Algebraic Number Fields (trans. I. T. Adamson), Springer-Verlag. New York. 1998. N. Jacobson. Basic Algebra fl. 2nd ed., W. H. Freeman, New York, 1989. I. Kaplansky, COll1l1lllltltil'C Rings, revised ed .. University of Chicago Press, Chicago, 1974. E. E. Kummer, De numeris complex is, qui radicibus unitatis et numeris integris realibus constant. Gratulationschrift der Univ. Breslau zur lubelfeier der Univ. Konigsberg; reprinted as: Sur les nombres complexes qui sont formes avec les nombres entiers reels et les racines de l'unite, J. Math. 12 (1847) 185-212. - - - , Ueber die Zerlegung der aus Wurzeln der Einheit gebildeten complexen Zahlen in Primfactoren, 1. de Cyelle 35 (1847) 327-367. D. A. Marcus, Number Fields, Springer- Verlag. New York, 1977. S. McAdam and D. E. Rush. Schreier rings, Bull. London Math. Soc. 10 (1978) 77-80. H. Pollard and H. G. Diamond, The Theory of Algebraic Numbers, 3rd ed., Dover, Mineola, NY. 1998. J. H. Silverman, The Arithmetic o(Elliptic Curves. Springer-Verlag, New York, 1986. 1. H. Silverman and J. Tate. Ratiol1al Points 011 Elliptic Curves, Springer- Verlag, New York. 1992.
May 2005]
GAUSS'S LEMMA FOR NUMBER FIELDS
415
ARTURO MAGIDIN earned his degrees at the Universidad Nacional Aut6noma de Mexico (Matematico. 1993) and at UC Berkeley (Ph.D .. 199X). the latter as a student of George Bergman. He then spent four years at the Instituto de Matematieas at the UNAM and has heen at the University of Montana for three. only the second time in his life that he has lived north of his coauthor. Although his main research has heen at the intersection of general algehra and group theory. he considers himself a "number theory groupie" and is always keen to try his hand at it when timc pcrmits. Dept. o(Mathellllltica! Scienccs. Vnil'Nsitv o(Mo/ltlllu/. Missou!a MT 59RI2-0R(j.j /1/ogidin0)lI/clI/l!cl:allls.org
DAVID MCKINNON earned a perfectly respectable BAIMA degree in geometry from Harvard in 1992. whereal'ter he was almost immediately seduced by the lure of numher theory. This disreputahle path led him to a Ph.D. in 1999 from UC Bcrkeley under the delightful direction of Paul Voita. Sinee then. he has held a postdoctoral position at Tufts and a tenurc-track position at the University of Waterloo. and he was even lucky enough to become the husband of Jennifer and the father of Heather and Robert. He wonders if he is the only numher theorist whose Erd{)s number depends on symplectic topologists. Pllrt' Matht'lI/atics f)"flartll/et/t. UlliI·cr.lit\· Water/oo. \l.Yiter/()(}. ON N2L 3G /
or
dll/(-killllon ([l'lI/ath.l/II·ater!o(l. ClI
Arithmetic Cheer Three, five, seven, nine, Some odd numbers are not prime. Add the digits, everyone. See if nine divides their sum. Repeating nines, when analyzed, Show that one has been disguised. To reduce a ratio, it's common wisdom: Divide with Euclid's algorithm. Some infinite sets of different size Are found when we diagonalize. GOdel's numbers were quite a feat. Arithmetic is incomplete.
--Submitted by Nathaniel L. Silver,
CCNY
416
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
The Fundamental Group of the Circle Is Trivial Florian Deloup
1. INTRODUCTION. The somewhat provocative title of this article does not refer to an actual refutation of a basic theorem of algebraic topology. Rather, it points to an experiment I would like to suggest to teachers of introductory courses in algebraic topology. When I was a undergraduate student, I was often puzzled by the apparent arbitrariness of the notions that were introduced. At the same time, like many students, I was amazed at how those notions would fit together perfectly to produce nice theories. Understanding why one definition rather than another one is "right" is a fine art, and there is much room for argument about it. However, this kind of understanding lies at the core of doing mathematics. I will illustrate this point with three closely related examples, all at the undergraduate level. Each relics on the difference, for a function of two variables, between continuity (respectively, smoothness) in each variable separately and continuity (respectively, smoothness) in both variables together. The idea of linking the second to the third example via an inversion in the unit sphere was suggested by one of the referees of this article. 2. WHY IS THE FUNDAMENTAL GROUP FUNDAMENTAL? It is reasonable to assume that the typical introductory course in algebraic topology includes the definitions of homotopy and homotopy groups. A homotopy between two continuous maps f, g : X ~ Y of topological spaces X and Y is usually defined as a continuous family (f;);E[O.I[ of continuous maps from X to Y such that fo = I and fl = g. The problem with this definition is that it requires one to define beforehand what is meant by a "continuous family of continuous maps." This, of course, is not a problem if we know what topology to put on the space C(X. Y) of continuous maps from X to Y. Or, we may argue, we can just define a homotopy F between .f and g as a continuous map F : X x [0, I [ ~ Y such that F(x, 0) = I(x) and F(x. I) = g(x) for all x in X, where X x 10, I] is endowed with the product topology. This definition makes perfect sense. But then, we may ask-and in fact our students should-why is it that we require global continuity (or why is it that we choose this particular topology for C(X, Y))?
Let us relax slightly the condition of global continuity and define a pselldohomotopy between f. g : X ~ Y to be a family U; );E[O.I [ of continuous maps from X to Y such that the map (x, t) f-? ./; (x) is continuous in each variable separately (i.e., for fixed x in X the map t ~ f; (x) is continuous on [0, I J and for fixed t in [0, I] the map x f--+ .ft (x) is continuous on X). From the usual argument that one presents in class (or leaves as an exercise for one's students), there is a well-defined l1ot-sofundamental group for any pathwise-connected topological space X and any specified basepoint x in X. Its elements are the pseudo-homotopy classes of continuous maps y : lO, 1] ~ X such that yeO) = y(l) = x. (Such maps are called loops at .Y.) Just as for the fundamantal group, the group operation is induced by the concatenation of loops. We make the following bold assertion: May 200S]
THE fUNDAMEt\TAL GROUP Of THE CIRCLE IS TRIVIAL
417
Theorem 1. The not-so}undamental group of the circle trivial.
51
=
{z
E
rc : Iz 1=
I} is
Pro(){ We take x = I to be the basepoint. What could be a pseudo-homotopy between an arbitrary loop Yo and the constant loop YI at x = I (defined by YI (.1') = 1)7 The idea is that, as the pseudo-homotopy evolves, we can stay longer and longer at the point I, then go along (the image of) the loop Yo faster and faster, until in the limit we truly stay at the point 1. We look for a pseudo-homotopy of the type Y, (x) = Yo (f (.I', t)), where f : [0, I] x [0, I] ---+ [0, I] is continuous in each variable separately. When 0 ::: .I' ::: t, we want to sit at the point I; ass grows from t to I, we want to go along (the image of) the loop Yo. Looking for f (.I', t) as an affine function of .I' on (t, I], we find that
.f
s-t (s,t) = - 1- t
(t <
x:::
I)
is an appropriate choice. Hence the formula
Y'(S)=jlYo
(S-t) -1- t
if 0 :::
.I' :::
t,
if t < s ::: I,
defines a suitable pseudo-homotopy. It is clear that the map (.I', t)
J--+
y,(s)
is (jointly) continuous on [0,1] x [0. I). In addition, we have lim y,(s) = I t~l-
for each .I' in [0. I J, so the partial maps t J--+ Y, (s) (s fixed) and both continuous on [0. I] and we are done.
S J--+
Y, (s) (t fixed) are
•
Since an arbitrary loop Y : [0, I] ---+ X in a path wise-connected topological space X can be written as a composite map [0, I] ---+ 51 ---+ X, we deduce the following (rather annoying for the theory of pseudo-homotopy) fact:
Corollary 2. The
n()t-s()~fundamental
group is always trivial.
However, as we all know, the fundamental group of the circle is not trivial. It may be worthwhile for the reader to reconsider the proof of this fact and discover where global continuity is used. Here we merely check that the pseudo-homotopy that occurs in the proof of Theorem I is not in general a homotopy. Consider the nontrivial loop Yo that goes around the circle once and is defined on [0,1] by Yo(s) = exp(27Tis). We show that the pseudo-homotopy Y, from Yo to YI constructed in the proof of Theorem I is not a homotopy. The reason is that the convergence of Y, (s) is not uniform in s as t ---+ I. Indeed, the quantity
does not converge to 0 as (.I', t) ---+ (1,1). In fact, the right-hand side has no limit. To see this, let a be an arbitrary real number such that a > I. If we choose .I'll
418
©
I = 1--.
n
til =
a 1 - -, n
THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
so that s" -
tIl
= (a - l)/n > 0 and I -
I ( Sin
for each n. Noting that a
s" - tIl ) Jr ~
til
= a/n, we find that
I= I ( Sin
Jr
I) I
a -a-
= I and a = 2 produce different values for
we see that the expression
has no limit as (s. t) tends to (l, 1).
3. WHY MUST AN AMBIENT ISOTOPY BE AMBIENT? We now consider a second example. Again, it is safe to assume that any introductory course in knot theory will introduce the notions of isotopy and ambient isotopy. An isotopy from a topological space X to a topological space Y is "a continuous one-parameter family of embeddings." It is formally defined as a continuous map F : X x [0, I] ---+ Y, (x, t)
~
F(x, t) = .ft(x),
such that for each t in [0, I] the map .ft : X ---+ Y is an embedding (i.e., it is a homeomorphism between X and fl (X), the latter endowed with the relative topology it inherits from Classically, a knot is defined as a smooth (infinitely differentiable) embedding of Sl in ]]{3. Equivalently, a knot is a smooth loop y : [0, 1] ---+ ]]{3 whose image is a closed curve with no self-intersection (a simple closed curve). Sometimes, a knot is taken to be the image of such an embedding. Rather than giving away the accepted definition in class, I suggest proceeding differently once more and asking the following question:
n.
Question. What is the "good definition" of knot isotopy? A first attempt at answering this question would be to define an isotopy between two knots fo : Sl ---+ ]]{3 and fl : Sl ---+ ]]{3 to be an isotopy (ft)tEIO.II from Sl to ]]{3, that begins and ends with the given knots. We also require the mappings.ft (0 S t S 1) to be smooth. Classically, one tells students that this attempt will not work because of the following observation:
Theorem 3. With the foregoing definition, any two knots are isotopic. Theorem 3 is usually used as a pretext for introducing another notion of isotopy (that of ambient isotopy). We will challenge this notion soon (keep on reading), but we first give an idea of the proof of Theorem 3. It is not hard to see that the relation "being isotopic to" is an equivalence relation. It is therefore sufficient to show that any knot is isotopic to some specified knot. A knot is called an unknot if its image in ]]{3 lies in a fixed plane. It follows from a classical result in planar topology that any two unknots are isotopic. Hence it is sufficient to show that any knot is isotopic to an unknot. The idea of a possible isotopy is described by the picture in Figure 1: it consists in shrinking the knotted part to a point. Though this idea works (details are left to the May 2005]
THE FUNDAMENTAL GROUP OF THE CIRCLE IS TRIVIAL
419
(f2
(f2
0
.~
~
.(j;l
(j;l
• Ii'
Figure 1. Moments in the isotopy of a (part of a) knot.
Figure 2. Moments in another isotopy of a (part of a) knot.
interested reader as an exercise). we proceed in a slightly different fashion. We keep the idea of shrinking the knot to a point of ~\ but we simultaneously move the knot towards the origin. as shown in Figure 2.
Proof: We prove that any knot is isotopic to an unknot. Represent the given knot as a smooth map y : [0. I] ---+ ~\ say y = (YI. Y2. y,) with yeO) = y(I). Without loss of generality. we can assume that y (0) = (0,0,0). Furthermore, since y is smooth. subjecting y to an isotopy if necessary, we may assume that the following properties hold (see Figure 3): (I) There exist .1'1 and .1'2 with 0 < .\'1 < .1'2 < I such that y (s) = (.1'.0,0) whenever .1'1 :s s :s ,1'2, (In particular, the restriction of y to [,1'1 • .\'21 describes a straight segment.) (2) There is a closed ball B such that B y ([ O. I J) is planar.
n y([O,
I J)
=
y([O,.\'1 D and (R1
-
B)
n
Figure 3. A smooth knot satisfying conditions (I) and (2),
We now work out the idea of shrinking the knot towards the origin from the right side (first coordinate positive). Since all the "knottedness" occurs between s = 0 and s = S I. we are Jed to the formula
' ,I /(.1')
(I -
(s, 0, 0)
=
1 420
f)
Y (.I')
Y
(_.I' ) I - t
if 0
:s .I' :s
(I - t) ,I I •
if (I - f),11 :s if ,1'2 :s s :s I.
.I'
:s ,12,
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
when 0 .:: t < I. while at the end of the isotopy (i.e .. for t = I). we set (.I
.f l )
= { (.I. O. 0)
yes)
if 0 .::
.I .:: .12 •
if.12'::S.::I.
Let us verify that the function (.I. 1) t---* fJI") is continuous. From the formulas. this is clear at any point (.I. t) with t < I. We must check global continuity when t = I. Let 0.:: s .:: I. When s > 0, continuity at (.I. I) still follows from the formulas. So the real issue is continuity at (.I. 1) = (0. I). For .I satisfying 0.:: s .:: .12, we have
It follows that
•
as (.I. t) ---+ (0. I). The conclusion follows.
In one respect, the proof is incomplete: it should explain why the smoothness of the curve implies the existence of an isotopy transforming the given knot to a knot of the kind depicted in Figure 3, where the un knotted portion curve lies in a ball disjoint from (the interior of a ball containing) the knotted part. Can we imagine a knot so badly knotted or so pathological that we cannot straighten out any subarc without creating intersections, or even so bad that the complement of any proper subarc is not simply connected? The latter can happen if the embedding is a topological embedding (i.e., is continuous but not smooth). For this. see [9]. Here lies a plausible motivation for the "classic" definition of a tallle knot: there exists a plane P in IR' such that the projection of the (image of the) knot onto P has no multiple points apart from afinite number of double points. Such a definition could be classified as "strategic withdrawal" in Lakatos's terminology [5]. We (innocently, of course) avoided this problem by decreeing that our embeddings be smooth. I do not know whether Theorem 3 holds in general for nontame knots.l There are. of course, other motivations for the classical definition of tame knots: 1. (Combinatorial motivation) Knots should be "finite" objects. Two equivalent knots should be seen as equivalent after a finite number of combinatorial operations (called Reidemeister mOl'es). Nontame knots (also called "wild" knots) involve an infinite amount of knotting. as evidenced by the fact that uny planar diagram of a wild knot has an infinite number of double points (crossings). 2. (Motivation from physics) Our concept of a knot comes from physical knots, made of rope or wire. having nonzero thickness. Such physical knots are in fact solid tori. But tameness of a knot K specifically implies the existence of a solid torus tubular neigborhood of K. 3. (Thomian motivation) Knottedness should be a global concept rather than a local one. Knots should be locally unknotted. The reader who is familiar with isotopies will recognize that the isotopy that appears in the proof of Theorem 3 bears some res semblance to the one appearing in Alexander's trick. I Specifically for knots K in 1R2.' such that the complement in 1R2.' of connected.
May 20051
1111\'
proper subare in K is not simply
THE FUNDAMENTAL GROUP OF THE CIRCLE IS TRIVIAL
421
Theorem 4 (Alexander's Trick). Any self-homeomorphism ¢ : D
-'>- D of a disk D that fixes the boundary a D pointwise is isotopic to the identity through selfhomeomorphisms of D fixing the boundary pointwise.
Proof Define an isotopy (¢t)tE[O,IJ by
¢,(x) = { /¢
Gx)
if
Ilxll ::::
t,
if
Ilx II :::
t
for 0 < t < I and ¢o = id D . This is the required isotopy between ¢o = ida and ¢I =¢. • Now students are told that "in view of Theorem 3," a finer notion should be defined, that of ambient isotopy. Two embeddings /0, fl : X -'>- Yare ambient isotopic in Y if there is an isotopy (Ft )tE[O,1 J from Y to itself such that Fo = id y, the identity mapping of Y, and fl = FI 0 fo. The main point is that Ft is a map from the whole space Y to itself. The map FlIY-/i)(Xl sends the complement of fo(X) homeomorphically to the complement of fl (X). This notion applies in particular to knots, and it leads to the usual notion of equivalence for knots: two knots are equivalent if they are ambient isotopic in JR.' (for convenience, we do not take orientation into account here). However, one can be psychologically impressed by Theorem 3. It says that the relation "is isotopic to" is not very useful in classifying knots, though we might have easily thought otherwise (after all, Alexander's trick plays a fundamental role in both high- and low-dimensional topology, in particular in the dynamical study of selfhomeomorphisms). What is wrong with the usual notion of isotopy? On the one hand, we can say that, unlike isotopy, ambient isotopy takes into account the space surrounding the knots; this is reflected in the fact that the topological type of the complement is preserved by ambient isotopies. But it is not so clear a priori why isotopiesfail to preserve the topological type of the complement. On the other hand, we may account for the failure in a different way. Our intuition tells us that an isotopy of the kind occurring in the proof of Theorem 3 cannot be smooth at every point. A second look at the proof shows, in fact, that the isotopy (s, t) f--+ ft (s) is not smooth at (1, 0) (provided that the knot is indeed nontrivial). Nontriviality of the knot implies that there exists s* with o < s* < Sl such that dy /ds(s*) =1= (1,0,0). We have aft(s) I dy -= -(s*) = (1, 0, as s=( I-f)s, ds
as t
-'>-
0)~(1,
0, 0)
1-, whereas aft(s) I dy dy -= -(s*)~-(s*) =1= (1,0,0) as s=(l-t)s, ds ds
as t -'>- 1-. It follows that lim(s,t)-+(l,O) aft(s)/as does not exist. Given the idea that led us to the isotopy in the proof of Theorem 3, one suspects that the isotopy in question cannot be made smooth in general (unless the knots are really equivalent). This is indeed true. Here a knot isotopy (ft )tE[O, IJ is smooth if the map (x, t) f--+ f,ex) from Sl x [0,1] to JR.3 is (jointly) smooth. Let (f,) tE[O, IJ be an isotopy between two knots fo : S I -'>- JR.3 and fl : S I -'>- JR.'. We say that an ambient isotopy (Ft)tE[o,IJ oflR' extends the isotopy (ft)tE[O,IJ provided that 422
©
THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
I,
= F,
0
10 for each t in [0, I]. In other words, the diagram
]R' x [0, 1]
is commutative. Theorem 5. Any smooth isotopy (s. t) extends to an ambient isotopy ol]R'.
r-+ .1;(.1')
between two knots
.Il), 11
: SI
--+ ]R'
This is a particular case of the "Isotopy Extension Theorem." See [4, Theorem 1.3, p.180] for a proof (which definitely uses differentiability) accessible to an undergraduate. The fact that SI is compact plays a key role. In particular, Theorem 5 says what our intuition suggests, namely. that smoothly isotopic knots are ambient isotopic in ]R' (i.e., equivalent). So if any two knots were isotopic through a smooth isotopy Cft )'E[O.1 [, then the given isotopy would extend to an ambient isotopy of ]R'. Hence the knots would be equivalent! But of course there exist nonequivalent knots-although students usually have to wait a while before they see an actual example with a complete proof.
Figure 4. Two nonequivaIent knots.
In particular, a smooth isotopy satisfying the conditions of Theorem 5 does preserve the topological type of the complement! This point seems to be overlooked in all introductory books on knot theory that I checked,2 whereas it is often emphasized that a general (i.e., not smooth) isotopy does not invariably have that property (Theorem 3). In fact, the intuition that the notion of equivalence between knots coincides with isotopy is correct! (Intuition involves only smooth isotopies.) An introductory course in knot theory might well begin with the usual notion of isotopy. Is it the right definition (with respect to what is expected: for instance, the knots in Figure 4 should be nonequivalent)? On intuitive grounds, it is. Why not even work with this definition until one is forced by the mathematics to introduce further the smooth (or piecewise linear) category? There is some justification to the fact that smoothness is a notion that our intuition often takes for granted! 2See the references at the end of the article. References 12] and [6] use Theorem 3 to "justify" the notion of ambient isotopy; [10] does not motivate the notion of ambient isotopy; [8] does not discuss it; interestingly, [7, p. 9] defines an ambient isotopy as a smooth isotopy, which is closer to the view we are advocating here.
May 2005]
THE FUNDAMENTAL GROUP OF THE CIRCLE IS TRIVIAL
423
There is another notion that our intuition usually takes for granted, especially when a smooth isotopy is involved: compactness. We illustrate this in the next section.
4. RUNNING OFF TO INFINITY. When isotopies involve noncompact spaces, ambient isotopy and smooth isotopy may be two distinct notions. We illustrate this with Theorem 6. Consider a long knot L in JR.': it is (the image of a) smooth embedding t f---+ Y (t) = (x (t), y (t). ;: (t)) from the open interval (0, I) to JR.' such that y and;: are compactly supported, x (t) --+ -00 as t --+ 0+, and x (t) --+ +00 as t --+ 1-. In particular, there exists E > 0 such that both yeO, E) and y(i - E. I) are contained in the x-axis. An example is shown in Figure S. (We are really thinking of y : [0. 11 --+ 5' = R' U {oo), with y (0) = y (I) = 00 supplying the missing point from the knot.)
--~-Figure 5. A long knot in R'.
We say that a long knot is a planar if its image lies in some plane. It follows from the definition that a planar long knot must lie in a plane containing the x-axis. It is a classical exercise to show that any two planar long knots are smoothly isotopic. Actually. a stronger statement is true (see [4. Exercise 9. p. 183]):
Theorem 6. Any two long knots are smoothly isotopic. Proof It is sufficient to prove that any two long knots are smoothly isotopic to the trivial long knot whose image is the x-axis. Here the isotopy we are looking for is SlIIooth. which seems to preclude the kind of solution developed in the proof of Theorem 3. We can follow the original suggestion of M. Hirsch Croll the knot to infinity") and build the isotopy by hand. However, it is possible to deduce Theorem 6 from Theorem 3 as follows. First notice that the isotopy (fl )IEIO.II in the proof of Theorem 3 is smooth everywhere except at the point (0. I), where it satisfies II (0) = (0.0,0). the origin of JR.'. Now we can use the inversion I in the unit sphere of JR." the map JR.' U {oo} --+ JR.' U {oo} given by 7
,
f---+
1(7) -
"' -
-"-
11;:11 2
for;: tic {O. oo}. with 1(0) = 00 and 1(00) = O. Notice that I transforms long knots into knots that lie on the x-axis in a neighborhood of the origin (and vice versa). Thus each knot I 0 .1; is a long knot, a well-defined map from the open interval (0. I) to JR.'. In particular. the image of I 0 II is easily seen to be planar. hence smoothly isotopic to the trivial long knot. Moreover, it is not hard to see that any long knot is smoothly isotopic to a long knot that is the image under I of a knot satisfying the conditions (1 )-(2) of Lemma 3. Since 11,,3-101 : JR.' - {OJ --+ JR.' is smooth, the map from [0. II x [0, 11 to JR.' given by (.I'. t)
f---+
10 IU)
is the required smooth isotopy.
424
©
THE MATHEMATICAL ASSOCIATION OF AMERICA
• [Monthly 112
Basically the idea consists in pushing off to infinity the non smoothness of the isotopy Cft)IE[OII' The reader can check that the isotopy (s. t) f--+ I 0 .f;C~) does not have bounded velocity at infinity. Running off to infinity is sometimes a way to escape problems of regularity.
5. CONCLUSION. As a matter of fact. no student has cver complained to the author-nor to any teacher whom hc knows-that the not-so-fundamental group could be ill-defined or trivial. But he wishes he had heard more complaints about the right (accepted) definition. Questioning the right definition is the only way to understand it. And if it's not the right definition. you may discover a bettcr one. REFERENCES I. 2. 3. 4. 5. 6. 7. 8. 9. 10.
E. Artin and R. H. rox. Some wild cells and spheres in three-dimensional space. AIIII. M{(th. (2) 49 ( 1948) 979-990. G. Burde and H. Zieschang. Knots. Walter de Gruyter. New York. 1985. B. S. Edwards and M. B. Ward. Surprises from mathematical education research: Student (mis)use of mathematical delinitions. this MOi\THLY 111 (2004) 41 1--124. M. W. Hirsch. Ditf"erellli{(1 7ill'ologr. 4th cd .. Springer- Ycrlag. New York. 1991. I. Lakatos. ProoFI' alld Rejilllllions: The Logic o(MlIlhellliltical lJil"CfII'erl". Camhridge University Press. Cambridge. 1976. W. B. Lickorish. All Introdllctioll to Knot Theon'. Springer-Verlag. New York. 1997. L. H. Kauffman. Oil Knots. Princeton University Press. Princeton, 1987. A. Kawauchi. A SIIIWY oj Kllot Theorl". Birkh~iuser. Basel. 1996. J. Milnor. Most knots are wilu, Flllld. M{(th. 54 (1964) :135-338. D. Rolfsen. Knots und Links. 2nd eu .. Publish or Perish Pre". Houston. I 99().
FLORIAN DELOUP. who was born in 1971 in Belfort. France. still spends much of his time struggling with all kinds of definitions. He studieu at the University of Strasbourg and at Columbia University. where he received his Ph,D. in mathematics in 1998 under the supervision of V G. Turaev and J. M. Morgan, A fonner Marie Curie Research Fellow. he has held visiting positions at the University of Calgary and at the Hebrew University of Jerusalem, He is currently maitre ue confCrences at the Institute of Mathematics at the Universitc Paul Sahatier of Toulouse. France, His main research interest lies in the rich interplay of algebra and topology. especially in low dimensions. Institllt de Mathellliiliqllcs, L{(i>ol'illOire Eillile Picard, UMR 55S0 CNR,)JUnil'er,\ite PUIiI Sa/}((tie!; 118 mllte
de Nari>olllle, 31062 7illllollse. h{(nce delollp (gl picu rd.lIfis-tlse.f;-
May 2005]
THE FUNDAMENTAL GROUP or THE CIRCLE IS TRIVIAL
425
The Convergence of Difference Boxes Antonio Behn, Christopher Kribs-Zaleta, and Vadim Ponomarenko 1. INTRODUCTION. "Difference boxes," also known as "ditTy boxes," are the basis for a simple mathematical puzzle that provides elementary-school students with subtraction practice. The idea appears to have originated in the late nineteenth century with E. Ducci of Italy, who applied it to integers [2J. In this setting it has been well studied (see [11 for current open problems in this context). Another source cites a World War II prisoner of war as a popularizer of the idea [71. More recently, Ullman [6] gave a more probabilistic treatment for difference boxes involving real numbers. Its use among teachers can be traced back to Professor Juanita Copley of the University of Houston, who introduced it about twenty years ago as a mechanism for teachers in elementary schools to provide problem-solving and arithmetic practice without the tedium of drill (and who, in turn, credits her grandmother). One creates a difference box as follows: I. Draw a (large) square, and label each vertex with a (real) number. 2. On the midpoint of each side write the (unsigned) difference between the two numbers at its endpoints. 3. Inscribe a new square in the old one, using these new numbers to label the vertices. 4. Repeat this process, and continue inscribing new boxes until reaching a square that has all four vertices labeled O. It is perhaps surprising that diffy boxes always tend to "converge" rather quickly, that
is, it usually takes no more than a handful of iterations to get a box with all zeroes. Figure I shows a simple example, which converges to all zeroes after four iterations (on the fifth box).
------12
-2------2
Figure 1. A simple diffy box (left) and its "descendants" (right).
The question we want to investigate is whether all diffy boxes really do converge to the zero box, and, if so, how quickly? It turns out that the answer is no if we require a
426
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
finite number of iterations; however. the essentially unique counterexample also converges to zero after a fashion. We approach the problem by considering the diffy box process as a map from the set of all possible four-tuples of real numbers (i.e., R4) into itself.
2. DEFINITIONS. We begin our analysis by establishing some notation. We denote by [a b c d] the ditfy box that contains the numbers a, 17, c, and d on its upper left, upper right, lower right, and lower left corners, respectively (which makes the first box in Figure 1 [7 12 2 - 21). Next we introduce notation to describe the diffy box iteration process. as well as some terms related to convergence: Definition 1. The child of a given box B = [a he dj is C(B) = fib - al Ie - bl Id cl la - dl]; B is a parent of CCB). (We shall see that any given box has many parents.) The parent-child relation is signified by B c> C(B). We denote C(C(B» by C 2 (B), etc. The box CCB) is more commonly called the iterate of B. Definition 2. A given box B cOllverges to ;:ero ill 11 generations if C" (B) but en-I (B) i= [00001; we also say in this case that B has longevity 11.
=
[0000]
Definition 3. The range of B. denoted IBI. is the largest difference between two (not necessarily adjacent) vertices of B. Definition 4. A box B is monotone if its vertices are distinct and occur around the square in numerical order from least to greatest. Note that we consider boxes such as [2 I 4 3] monotone, since the definition allows us to start with any vertex and proceed clockwise or counterclockwise from it. A couple of relatively quick results may help the reader begin to develop some intuition as to how and why diffy boxes tend to converge to zero.
Proposition 1. The estimate IC(B)I the four numbers in B are distinct.
::: IBI always holds, and the inequality is strict if
Proof Let w ~ x ~ y ~ z be the numbers in B (not necessarily in order of appearance). We have IBI w - z. The numbers in C(B) all fall between 0 and w - z; hence IC(B)I ::: (w - z) - 0 = IBI. Furthermore, if w, x, y, and z are distinct, then C(B) does not contain zero, whence IC(B)I < IBI. •
=
Proposition 2. Any nonmonotone diffy box converges to zero in six or fewer generations. Proof Table I details the longevity of all diffy boxes whose vertices are nonmonotone. The proof is an exhaustive case analysis (a simpler proof follows from Figure 3 in section 3). To put a given diffy box into a form listed in the table, reorder the vertices (using reflection and/or rotation, isometry properties that we shall discuss in section 4) so that the smallest vertex is listed first, followed by the smaller of the two vertices adjacent to it. (If two copies of the smallest number occupy adjacent vertices, put them first and second, followed by the smaller of the two vertices adjacent to them.) This reordering does not affect longevity. •
May 2005]
THE CONVERGENCE OF DIFFERENCE BOXES
427
Table I. Longevities for all nonlllonotone diffy boxes (here 1I < " < (' < iI).
Longevity
Isometric form of ditTy box
:)
op 0 /J a h] [a b c b]. h = (a ]a a h 17]
4
lahdc]. a-iJ=c-d ]acbd]. a-iJ=c-d [a a a b]
la 2
a a a]. a
[a
+ c)/2
]ahhc]. b=(a+c)/2
[a a b cJ, h at least as close to c as to a
]a b a c1 [a h h h] ]abhc]. hop(1I+c)/2
la h c hi. hop (a + c)/2 fa h c cJ. h at least as close to a as to c la h d c]. a - hope - d. (a + d)/2 between la c h c] ]a c
6
band c
/J iI]. a - hop c - d
]a ({ h c1. h closer to a than to c [a bee]. /J closer to c than to a [a h d cJ. hand c on the same side of (a
+ d)/2
We remark that Proposition 2 includes any box whose vertex numbers are not all distinct. We will observe in section 4 that ditTy boxes [(I b c d] whose vertices are monotone (with ([ < h < c < d or ([ > h > c > d) have longevity five or greater. Since. in view of Proposition I. IC(R)I = IRI only when the vertices of Rare not all distinct, we can bound the longevity of any box that has integer vertices by observing that the range IB I of any such box must decrease by at least I per iteration, until we reach a box C with at least one pair of identical vertices. At this point we compare in Table I the possible longevities of C (2, 3. 4, or 6) with the corresponding minimum possible range (I, I, I. or 3, respectively) and record the greatest difference. We therefore have the following result:
Corollary.
Ir B cOllsists or integers, then the IOllgevitv or B is less than or equal to
IBI +3. We leave improvements on this specialized estimate as an exercise for the reader. For example. it can be sharpened from linear to logarithmic; namely. to 4( I + ,log 2 IB Il).
3. CANONICAL FORM. A little experimentation with different sets of numbers quickly leads one to the observation that there are different boxes that behave the same way with respect to the iteration process. For example. adding the same number k to each of the vertices of a box B (written B + k) or changing the signs of all the vertices (- B) will produce other parents of C (B), since each iteration records only differences between successive vertices; thus the map B f---* C (B) is many-to-one. Furthermore, since our real interest is the history of families rather than of individuals. we observe three other types of changes that generate family histories parallel to the original. Multiplying each vertex of a box B by a positive constant k will produce
428
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
a box (kB) whose child is that same multiple of C(B) (i.e., kB [> kC(B»; Band k B therefore take the same number of iterations to reach the all-zero box. We may think of Band k B as "cousins" with the same family histories. Finally, rotating or reflecting the numbers on the vertices of a box B (call these rotC B) and ref( B» will create further cousins, boxes whose respective children are the rotated and reflected versions of C(B) (i.e .. rot(B) [> rot(C(B». ref(B) [> ref(C(B». and so on through successive iterations). These changes are merely cosmetic, since the numbers retain their positions relative to each other. To simplify the analysis that follows, we define a set of equivalence classes that reduces the number of distinct boxes we must consider (and thereby reduces the dimension of the problem considerably. as we shall see).
Definition 5. The equivalence class of a box B = [a bed J consists of all boxes that can be obtained from B via a finite sequence of the following flve elementary operations: (I) translation: [a bed]
(2) negation: [a b c eI]
f--+
[(a+a) (h+a) (c+a) (d+a)] (a E R):
f--+ [-(/
(3) positive scaling: [a he dl (4) rotation: [a 17 c
elJ
f--+
(5) reflection: [a bed]
f--+
[17 cd
f--+
-
b - c - dl; [(aa) (ah) (ae) (ad) I (a E R+):
al:
[d c b a].
We can consider the flrst three operations as field operations. and the last two as isometries. It is simple to verify that the relation ~ thus defined is indeed an equivalence relation on the set B of all diffy boxes (i.e., it is refiexive, symmetric, and transitive). We also observe. following the same arguments given informally earlier. that if B \ ~ B 2 , then C(B\) ~ C(H:.), and B\ and Be have the same longevity unless one of them is the zero box and the other is [a a a a I (a f= 0). We would like to flnd a way to select a unique member from each equivalence class in B so that we can concentrate our remaining analysis on a reduced domain. To this end, we define the canonical form for an equivalence class.
Definition 6. The collollica!fonn for an equivalence class in B is one of the following: (i) [0000] for the class containing this (zero) box: (ii) [00 I II for the class containing this box: or (iii) the unique class member of the form [0 I x y I for which (x. to S = {(x. y) : x :::: o. y :::: I. x - I S y S x + I} otherwise.
y)
belongs
Equivalence class (i) consists of all boxes [a a a oj (a E R), each of which converges to zero in one iteration (if a f= 0). Equivalence class (ii) consists of all boxes [a a b h] and [a b b al (a. hER). which are seen to converge to zero in three iterations ([00 I II [> [0 I 0 II [> [I I I I] [> [0000]). Identifying the canonical form for all other classes requires the notion of an exlreme elemenl. that is. an clement ({ of a box that is either maximal (at least as big as each of the other elements) or minimal (at least as small as each of the other elements). Note that at least two of the four elements of each box must be extreme. The following result gives a procedure for determining a type (iii) canonical form. as well as a justification of its uniqueness.
Theorem 1. Any equi\'(/!cnce class ill B or tyPC (iii) (i.e., not including a hox oj' thcform [a a a a] or [a a h h]) has a uniquc rcprcscntatil'c [0 I x y] with (x. S = {(x. y) : x :::: o. y:::: I. x - I S y S x + I}. May 2005J
THE CONVERCiENCE OF DIFFERENCE BOXES
y)
ill
429
Proof We first provide an algorithm for finding the representative, using our elementary operations. We begin with an arbitrary diffy box [a h c dl and operate on it as follows (relabeling the results [a h e dl). I. Rotate (operation (4» until a box is obtained for which among la - bl, Ib - cl, Ie - dl, Id - al·
Id - al is maximal
2. (a) Observe that either a or d must be extreme. If necessary, reflect [a b e (operation (5» to ensure that a is extreme.
dl
(b) If a and d are both extreme, it's possible that la - bl < Ie - dl. If necessary, reflect (operation (5» to make la - bl ~ Ie - dl. 3. If a is maximal, use negation (operation (2» to make a minimal. 4. Use translation (operation
(I»
to make a = 0.
5. Use positive scaling (operation (3» to make b
=
I.
Observe that properties created at any step are preserved in subsequent steps. The last step is always possible since steps 2(b) and 4 together imply that e = d when b = 0, and the only two such cases correspond to equivalence classes of type (i), e = d = 0, or (ii), e = d i=- 0. Otherwise h > (from steps 3 and 4), so positive scaling can be used to normalize b. We now have a class representative of the form [0 led] (from steps 4 and 5). Since a = is minimal, c and d are nonnegative. From steps 1,4, and 5, we infer that d ~ I. On the basis of2(b) we have that Ie - dl S I, so e - I S d S e + I. Thus (c, d) lies in 5. It remains to show that the canonical form (iii) is unique: if [0 I a b] ~ [0 led], where (a, h) and (c, d) belong to 5, then (a, b) = (c, d). This can be seen first by observing that 0 is a unique minimal number (i.e., a, b, e, d > 0 except for [0 I 0 I], which has no other [0 led] representation), and second on a case-by-case basis by taking a or b to be maximal and showing that each of the seven transformations that would transform 0, I, a, or b to 0 and normalize either of the numbers adjacent to it results in a box with (c, d) not in S. As the details are simple but technical, we leave them as an exercise for the reader (we likewise leave as an exercise the proof that the equivalence classes containing [0000] and [00 I 11 have no type (iii) canonical form representation). •
°
°
4. A TWO-DIMENSIONAL MAP. We can now focus our remaining analysis on what happens to equivalence classes in B of the form [0 I x y] with (x, y) in 5. We first observe that in the special case y = I the children are of type (i) or (ii) (since [0 I x I] [> [I Ix - II Ix - II I D, hence convergence to zero occurs in two (x = 0, 2) or four generations. We now consider the diffy box process as a continuous map from 5\ {(x, y) : y = I} into 5 that calculations show to be given by (u, v) = I(x, y)
XX - --) ( 1 +y - I' y - I
430
2-x - - -X) ( 1+ y-l'y-I
if x > I. x < y Sx+ I (ex,y) E 52);
2-x 2-X) --,2+-( 1+ y-I y-I
if x, y > I, x - I S y S x (x, y)
©
THE MATHEMATICAL ASSOCIATION OF AMERICA
E
5,).
[Monthly 112
Figure 2 illustrates the three sets SI. S2. and S, into which this definition decomposes S. y 2.5
2
1.5
0.5
2
1.5
2.5
Figure 2. Suhdi\'ision of S into regions corresponding to the three hranches of f.
By inspection we can see that the first and third branches of this function are infiniteto-one mappings: the first branch sends all points to the line v = II - I (with u ::: 2. since)' S x + I implies that x I (Y - I) ::: I). while the third branch sends all points to the line v = 1I + I (with 1I ::: O. since y ::: x - I implies that (2 - x)/()' - I) ::: -I). On the first branch (where x S I). f(xi. )'1) = f(X2. \'2) holds when )'1 -
I
XI -
0
that is. when (X1, .\'1) and (x> )'2) lie on the same line segment from (0,1). Likewise, on the third branch (where Y S x). f(xi . .\'1) = f(X2, .\'2) occurs when
that is, when (X1, yJl and (X2, )'2) lie on the same line segment from (2, I). On these branches, f compresses two-dimensional regions into rays on the boundary of S. The second branch, however, is one-to-one: for (X1, )'1) and (X2' )'2) both in S2, f(X1, yd = f(X2' V2) implies that (X1' )'1) = (X2, V2). We can see this by noting that
and substituting the latter expression into 2-
XI
2-
Xo
1+--=1+---. .\'1 -
May 2005]
I
)'2 -
I
THE CONVERGENCE OF DIFFERENCE BOXES
431
We can also see that the first and third branches have no fixed points inside their respective domains: the first branch sends all points to the line v = u - I, which isn't in SI, and branch 3 sends all points to the line v = u + I, which isn't in S1. (In fact, there is no possible periodicity in these regions, either, since for (x. y) in SI or S,. (c. d) = I(f(x . .1'» has d = I.) Therefore any fixed points must lie in S2. Indeed, straightforward calculations show that S7. contains the unique fixed point (q(q - I). q) ~ (1.5437. 1.8393), where
is the unique real solution to c/ - q2 - q - I = O. Here C(lO I q(q - I) qJ) ~ [0 I q(q - I) q] (i.e .. the child is in the same equivalence class as the parent. and therefore takes just as long to converge-in other words, it has infinite longevity). The only other equivalence class for which this is true is [0000] (but properly speaking this class has no longevity). We should note, however, that the foregoing does not imply that B = C(B) for members of the class containing [0 I q (q - I) q J. In fact, in an R+ -norm sense, as well as that of Definition 3, members of this class do approach the zero box under iteration-they just take infinitely long to get there. For example, calculation of a few iterations of the ditt"y box process beginning with [0 I q (q - I) q] shows that after the first step the four entries gradually diminish in size (by a factor of q - I) as they cycle around counterclockwise, in keeping with Proposition I. We now determine whether the fixed point (q (q - I). q) is stable, in the sense of other nearby points' images approaching it under repeated application of f. To determine the stability of a fixed point of a complex map I. one looks at its Jacobian matrix Jf. This is derived from the linearization of the map and consists of the map's partial derivatives, evaluated at a given fixed point. For f = (fl (x. y) . .t? (x, the Jacobian is given by
\'»,
Jf
This (second) branch of
f
l
all
all
(ix
i.Jy
ii.t?
i.J.t2
ax
ay
]
has
l
--I-
JtCe v)
Jt (q (q - I).
=
q) ~ [
1'-1
(yx-2 _ 1)2
J
~ If!
.
_-._I_ y-I
-1.1915 I. 1915
(y
-0.6478 ] -2.1915 .
The matrix on the right has eigenvalues A ~ -1.6915 ± 0.7224i. Because the eigenvalues have magnitude greater than I. the fixed point is unstable. Because the imaginary components are nonzero, we see that points near (q (q - I). q) spiral away from it under (repeated) application of f. (To read more about stability analysis of fixed points, see [4] or [5J.) Because this is the only fixed point. one might expect that further applications of f will eventually move any other point to the boundary of the
432
©
THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
domain, and then out of it entirely. Therefore, we might expect that those ditTy boxes that take longest to converge to zero correspond to those points in 5 closest to the fixed point (q(q - I), q). As we shall see later, these intuitive notions turn out to be correct. If we begin a case-by-case analysis of the successive applications of f in 5, we notice the domain subdividing into regions beginning along the boundaries and working in toward the fixed point.
Example 1. Any box [0 I x y] with x :::; I converges to zero within four generations (three if .y = x + I, two if (x, .y) = (0, I». We calculate [0 I x
y]
[> [I (I - x) [>
(y -
x)
Y]
[> [x
(y -
I) x (y -
I)]
[Iy - x - Illy - x - Illy - x - Illy - x - II]
[> [00001
and observe convergence to zero one or two generations sooner in the aforementioned special cases. (This corresponds to region 51 in Figure 2.)
Example 2. Any box [0 I x y] with x 2: y and x 2: 2 converges to zero within four generations (three if y = x - I, two if (x. y) = (2, I». We compute [0 I x y] [> [I (x - I) (x - y) [> [(y - x
+ I)
(v -
Y] [> [(x x + I) (v -
2) (y - I) (2y - x) (y -
x
+ I)
(y - x
+ I)] [>
I)]
[0000],
again observing the quicker convergence for the special cases.
Example 3. Any box [0 I x vj with I < v :::; x < 2 converges to zero within six generations. Here we find that [0 I x v] [> [I (x -I) (x-y) 2] [> [(2-x) (v-I) (2\'-x) (y- I)] [>
[Ix + y
- 31 (y -
x
+ I)
(y -
x
+ I) Ix + y -
31] [> [p 0 pO]
[>[ppppj[>[OOOOj.
where p = Ilx in Figure 2.)
+y -
31 -
y+x
-
II. (Examples 2 and 3 together comprise region 53
Note that Examples I, 2. and 3, together with the case v = I discussed earlier and the type (i) and (ii) classes. cover all nonmonotone classes of diffy boxes. I (Monotone classes have 0 < I < x < L) Further calculations show that monotone classes have longevity at least five. Figure 3 demonstrates how 5 is subdivided into regions of different longevities (the lighter the region. the greater the longevity). The only two equivalence classes not depicted are (i) and (ii): here (ii) can be considered as the point at infinity (by which is meant the point that must be adjoined to R2 in order to compactify it). The black dot in the center is the fixed point, and the white region around it represents all equivalence classes of longevity ten or more generations. Note that where two regions of different longevity meet. the boundary between them belongs to the region of lower longevity. To determine the longevity of monotone classes in full detail, we change our approach from the sort of increasingly detailed calculations in the foregoing examples to a consideration of preimages under f. We will need to invoke the following result regarding the invertibility of the map f. I They
thus provide an alternate proof of Proposition 2.
May 200S]
THE CONVERGENCE OF DIFFERENCE BOXES
433
4
3.5
2.5
2
1.5
2 0.5
4
2
1.5
2
x
2.5
Figure 3. Suhdivision of S into regions colored hy longevity.
Theorem 2. The map
I
1.12 has an inverse g that is defined on the set
A
=
int SUi (x. I): 0 < x < 2}.
l/1aps the interior of" 5 into the interior ()f" 52, and preserves line segmellts.
Proof We have already seen that of the three branches in the definition of I. only the second is one-to-one. Since the images of the first and third branches lie on the left and right boundaries of 5. the inverse map g = Ils,1 should be well-defined on the set A. We can invert the expression for I on the second branch to find that
g(x. y) = ( 2y . _x_+---=--,"_+_I) . x+y-I x+v-I for (x. r) in A. We observe that g maps the interior of 5 into the interior of 52: for g(x. r) = (u. v). x > 0, r > 1, and x - I < y < x + I together imply that u > I and
u
I) inductively, as follows. Let T} = 5, and for n > I let T,,+I = I-I(T Because the backward map g is not defined on the left and right boundaries of 5, we shall consider the first few examples individually. until we arrive at a T" such that T" C int 5. We also need to consider the type (ii) class (the I1 ) .
434
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
point at infinity), because when 0 < x < 2, [0 I x I] c> [I Ix - II Ix - II I] ~ [0 0 I I] c> [0 I 0 I], that is, the diffy box process sends points (x, I) on the boundary (0 < x < 2) to the type (ii) class and then sends the type (ii) class to the point (0, I) in 5. Tj
T2 y
T..,
oc
r
;1
\' /
/
---_/ x
0
Y
/>
/
x
0
Tr,
Ts
/
/
/
x
0
;1
/
/
T7
\'
\'
/
I
/
,\/
0
0
X
0 Figure 4, The sets Tn for 2 :::
0
X
11 :::
.r
7.
Four further calculations are necessary before we can make use of g. Excluding the point at infinity from 5, we find that T j = I-I (5) is 5 without its lower boundary {(x, I) : 0 < x < 2} but with the point at infinity (see Figure 4), since the only points in 5 that do not have images in 5 are on the lower boundary y = I. (The previous paragraph addresses the point at infinity.) Next, T.., = I-I (T}) is 5 without its left and right boundaries (i.e., T... = A), as the preimage of the interior of 5 is the interior of 52, the preimage of the left boundary of 5 is 53 n int 5, the preimage of the right boundary of 5 is 51 n int 5, and the preimage of the point at infinity is the lower boundary of 5 (endpoints excluded). The left and right boundaries of 5 (endpoints excluded), which we exclude from T4 , are the preimage of the lower boundary of 5 (endpoints included), which we excluded from T j • (The corners are not preimages of anything in 5.) Next we find that To, = I-I (T..,) = 52: the included boundary is the preimage of the lower boundary of 5, endpoints excluded. Finally. we find that Tn = I-I(T,,) is the interior of the triangle with vertices (I, I), (2,1), and (2,3), again by excluding the preimages of the parts of 5 excluded from T" (see Figure 4 for sketches of all these). Tn lies in int S. If we continue in this way, we discover that the 0, for 11 > 6 are also interiors of triangles. When n ~ 8 the vertices of these triangles are in the interior of 5. which allows us to keep track of the 0, via their vertices (Table 2 provides a partial list, and Figure 5 shows boundaries of some of the T" (n :s 10) superimposed upon each other). The utility of the T" derives from the fact that all equivalence classes of longevity n (n > I) are represented in 0,. It is simple enough to check this for the first few examples; thereafter the result follows by induction. It is also worth noting that. although Tn does not contain all classes of longevity 11 + I, it does contain all classes May 2005]
THE CONVERGENCE OF DIFFERENCE BOXES
435
Table 2. Vertice, of Til for 6 S
II
S 10.
Vertex I
Vertex 2
Vertex 3
6
(I. I)
(2. I)
(2.3)
7
(2.3)
(I. 2)
. ~)
R
. ~)
(2.2)
.2)
9
.2)
10
. ~)
11
. ~) (
.Jt)
\'
3.)
2.)
2
I.)
I.)
0.)
Figure 5. 7;,
(II
2
2.)
S I OJ sllperimpmco upon each other.
of longevity 11 + 2 or greater. Furthermore, we observe (again by induction, starting with 11 = 2) that T,,+2 C T", so that T,,+, C T,,+ I, and we can classify the set of all equivalence classes of longevity 11 (11 > I) as precisely T" \ (T,,+I U T,,+2)' That is, an equivalence class has longevity 11 if and only if its canonical representative is in T" but not T,,+I or T,,+2' At this point the question arises of how to test where a given point (i.e., equivalence class) falls relative to the sequence of triangles T" (n 2: 6). Arguably the simplest is just to plot it on a graph containing (enough of) the T". There are also several simple algebraic approaches, however, to test whether a point falls within a given triangle. One is to write the points involved in vector form. First let the interior of each triangle be written as the set of points whose coordinates are a weighted average of the coordinates of the three vertices VI, V2 , and V,:
f 436
=
I(x. -") : (x. -") =
r VI
+ S V2 + (I
- r - s) V,:
r.
S
> 0, r
© THE MATHEMATICAL ASSOCIATION OF AMERICA
+s
< I).
[Monthly 112
Now, to test whether a point P is inside rand .1':
t, calculate the corresponding "coordinates"
and check whether r > 0, S > 0, and r + s < I. For example, To is the set of (x, y) such that (x, y) = r(l. I) + .1(2, I) + (1 - r - .1')(2, 3) for some rand s satisfying r > 0, S > 0, and r +.1 < I, and the fixed point P is (q (q - I), q) ~ (1.5437. 1.8393), making ----+
V3 VI = (-1. -2).
----+
V, V2 = (0, -2),
and
[ :: ] =
[=~ _~
----+
V,P
r =~:~~~~ ]
~
(-0.4563, -1.16(7),
l
[
= (0.4563.0.124),
verifying that P lies in T6 . The only drawback to an algebraic approach is that it is inescapably recursive, and the number of calculations required to continue testing whether a given point falls inside each ~, is comparable to the number of calculations required simply to take the diffy box process toward its eventual end. A graphical approach merely requires plotting a sufficient number of ~, so that the point falls outside two consecutive triangles. We close this section with one more way to look at the domain S. Figure 6 divides S into three disjoint invariant regions by shades of gray: that is, each shade (light. jumping medium, or dark) represents a sequence of images and preimages under around and toward the fixed point. The fading of the colors near the fixed point indicates increasing longevity.
t,
5. CONCLUSIONS, APPLICATIONS, AND EXTENSIONS. We now return to our original question: Does every diffy box converge to the zero box in a finite number of generations, and, if so, how many generations will it take'? We can now reinterpret the results of our analysis on equivalence classes in terms of boxes as four-tuples. The answer to our question as phrased is no, for the diffy boxes belonging to the class containing the box [0 I q(q - I) q J associated with the fixed point of the map require an infinite number of iterations to reach zero: any ditTy box in this class has entries (vertex numbers) that become smaller and smaller but never actually reach zero. However, the counterexample class is unique, and all other ditTy boxes converge to zero in finitely many generations (the number depends on how close their canonical forms are to the fixed point). In fact. for any specified longevity, there are classes of ditty boxes that take precisely that long to converge to zero. Nonmonotone boxes converge especially quickly (in no more than six generations), while monotone boxes have longevity at least five. To determine the longevity of a monotone diffy box, it is simplest to put the box in canonical form and compare its coordinates (x. y) with a graph of the regions of various longevities identified in the previous section. We might also make the observation that (as seen by the regions into which S is subdivided in Figure 3) the use of "complicated" numbers such as radicals or transcendentals does not really prolong convergence much, since within a couple of generations the differences have propagated through the four vertices and get subtracted out.
t
May 2005J
THE CONVERGENCE OF DIf'f'ERENCE BOXES
437
r
0.5
1.5
2
2.5
x
Figure 6. Subdivision of S into 3 invariant regions (by color).
As mentioned in the introduction, diffy boxes can be used as problem-solving contexts for clemcntary grades students (see [3]). After working through several diffy boxes, children can group them according to longevity and begin to observe some patterns in the forms of boxes that converge in one, two, three, and possibly four generations. They can also observe the properties we used in Definition 5 to define equivalence, as well as the effects (or lack thereof) of using numbers other than whole numbers. A natural extension of the diffy-box problem that we leave to the reader is the generalization from squares to other polygons. For example, a quick investigation of "diffy triangles" reveals a peculiar chasing pattern and the surprising(?) result that no "diffy triangles" ever converge to the all-zero triangle, except for those with all three numbers the same (to convince yourself of this, try to construct the parent of such a diffy triangle). From this observation one might try to classify the types of possible behavior of diffy triangles, or else move to a larger scale and perhaps consider convergence of "diffy polygons" with varying numbers of edges. (If we place "diffy polygons" in the context of graph theory, we see that any simple generalization to a more general class of graphs is prevented by the fact that only the cycle graphs CII (i.e., polygons) have line graphs isomorphic to themselves. Considering polygons as cycle graphs, however, does allow us to include the trivial example C 2 , the two-sided polygon, which converges to zero in two steps for any two starting values.) The reader interested in the "diffy n-gon" may wish to take a hint from Winkler [7, chap. 2], who observes (p. 17) that, for the special case where all vertices are integers, "A little analysis via polynomials over the integers modulo 2 shows that the salient issue is whether 11 is a power of 2." Winkler's book also uses integers modulo 2 to establish 438
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
the logarithmic bound suggested at the end of section 2 for integer-valued ditTy boxes (using the maximum entry rather than 181). Finally, it is interesting to note that the irrational number q involved in the fixedpoint ditfy box class is also associated with sequences of numbers called tribol1Clcci numbers. Similar to the notion of Fibonacci numbers, tribonacci numbers are a sequence of numbers til that obey the recursive relation til = til_I + t ll -2 + t ll -3. (The sequence typically begins with tl = 1, t2 = Land t, = 2.) Like Fibonacci numbers, any sequence of tribonacci numbers tends toward a geometric increase, with the ratio of any two successive numbers in the sequence approaching a fixed constant. For tribonacci numbers that constant is q. In fact, beginning as indicated. til ~ q" as 11 ---+ 00. (If we look for geometric solutions til = {/" to the defining recursive relation, we see that we must have {/" = a"-I + (111-2 + (111-3 or (r' = (/2 + (/ + I. the same equation we solved to obtain q.) It is remarkable how mathematically rich such a simple notion can be. We invite the reader to explore further. ACKNOWLEDGMENT. The authors thank George Christ for introducing them to ditly boxes.
REFERENCES I.
M. Chamberland and D. Thomas. The n-number Ducci game-Open problems. 1.
DifJ~m/Ce
EI{. AI'I'I. 10
(2004) 339-342. 2.
C. Ciamberlini and A. Marengoni. Sa una interessante curiosita numerica. Paiod. Mat. Sa 4 (1937) 25-
30. 3. 4. 5. 6. 7.
Charles A. Dana Center. TEXTEAMS Rethinking Elelllellturl School Mul!wl/I(I[ic.l. Purl I. University of Texas. Austin, 2002. J. Guckcnheimer and P. Holmes. NOlllinear Oscill
ANTONIO BEHN received a Ph.D. in mathematics from the University of Wisconsin. He held a postdoctoral position at Texas A&M University before returning to Chile to teach mathematics at the Universidad de Chile. His research interests include ring theory. Universidad de Chile, Santiago, Chile [email protected]
CHRISTOPHER KRIBS-ZALETA is a professor of mathematics and curriculum & instruction at the University of Texas at Arlington. He received undergraduate degrees in electrical engineering and mathematics from Duke University, a master's degree in electrical engineering from Georgia Tech, and a master's and Ph.D. in mathematics from the University of Wisconsin. His research interests include mathematical biology and mathematics education. University of Texas at Arlington, Arlington, TX 76019-0408 kriiJs@u/a.edu
VA DIM PONOMARENKO received an undergraduate degree in mathematics and computer science from the University of Michigan and a master's degree in computer science and Ph.D. in mathematics from the University of Wisconsin. He is a professor of mathematics at Trinity University. His diverse research interests lie primarily within combinatorics. Trinity University, Sail Antonio, TX 78212 [email protected]
May 2005]
THE CONVERGENCE OF DIFFERENCE BOXES
439
Do Normal Subgroups Have Straight Tails? Paul E. Becker
1. INTRODUCTION. In this article we show that group representations permit simple, visual explanations of semidirect products and related concepts. Along the way, we prove an extension of Cayley's theorem, define blocks and tails for the resulting representations. and investigate the title question: Do normal subgroups have straight tails? We consider representations of finite groups over the field of complex numbers. Specifically, we define a representatio/1 o( degree n of a group G to be a homomorphism from G into a multiplicative group of complex 11 x 11 matrices. A more general viewpoint is certainly possible. If F is any field, then an F -representation of G is a homomorphism from G into the automorphism group of some F -vector space. Our first example is somewhat trivial. View the cyclic group of order three as the powers (modulo 3) of a single variable x: Z, = (x : x J = e). We define a homomorphism (jJ into the multiplicative group of 3 x 3 complex matrices as follows:
w(x)
w(x')
n U
~ [~
0
~
0 0
~ [~
I
0
0
0
n
( 1.1 )
~l 0
Thus (jJ is a representation of Z, of degree three. This representation is not very efficient. It docs, however, have several nice features. First. (jJ is faith(u/: the kernel of (jJ is the identity, so the matrix group (jJ (Z,) is isomorphic to ZJ. Second, (jJ is a permlltation represe11tation: (jJ maps group elements to binary matrices with a single one in each row and column. Cayley's theorem implies that any finite group admits a similar representation. For the record. we state:
Theorem 1.1 (Cayley's Theorem). Each group is isol7101phic to some group o(per!Illlta tions.
For any group G. let Sc; denote the permutation group on the elements of G. Suppose that G is finite. with IG I = n. Then Sc; is isomorphic to Sn. the group of permutations on {I. 2.... , /1}. Cayley's theorem implies the existence of an injective homomorphism p : G -+ Sc;. This homomorphism identifies each element g of G with a permutation peg) of the elements. We make a fairly standard choice for peg); namely, the map given by h r-+ gh. Next. we identify the resulting permutations with matrices. Order the elements of Gas gl • .... gn' For each element g of G we define the 11 x n binary matrix Mp(g) as follows:
440
©
THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
M
,[i 1'(\)
'l={ 0I
ifggj~gi'
,.I
otherwIse.
Then G is isomorphic to the multiplicative group of permutation matrices {Mp(g) g E GJ. The isomorphism w given by K ---+ M/)(g) is called the (lejt-) reKular representation of G. We saw the regular representation of Z, in display (1.1). This terminology can vary: some authors call p the regular representation of G, leaving the matrix interpretation w to the reader. Cayley's theorem guarantees a faithful permutation representation for each finite group. It does not, however, guarantee an efficient representation. Section 2 shows that a careful application of Cayley's theorem provides efficient representations of direct products. We call these representations "faithful blockings." Section 3 describes a method for constructing faithful blockings of semidirect products. Section 4 proves an extension of Cayley's theorem, showing that finite semidirect products always admit faithful blockings. In the final section, we briefly describe classroom applications, and suggest related undergraduate research projects.
2. FAITHFUL BLOC KINGS. The external direct product of groups A and B is the group of ordered pairs P = {(a, h) : a E A, h E B}, where multiplication is defined component-wise. We denote this by A EB B. A group G is factori~ahle if it contains nontrivial subgroups Nand H such that G = NH = {nh : n E N, Iz E H}. We say that G is a semidirect product of N by H if the following hold: G = NH, N is normal in G, and N n H = {e}. This is denoted by G = N /1 H. If G = N /1 Hand H is normal in G, then G is an internal direct product of Nand H; we denote this by G = N x H. The concepts of internal and external direct product are equivalent. Suppose that P is the external direct product of groups A and B. The sets A' = {(a, e) : a E A} and B' = lee, b) : b E B} form normal subgroups of P. Furthermore, P factors as P = A' B' = {a'b' : ct' E A'. b' E B'}: we may view P as the internal direct product A' x B'. Conversely, every internal direct product is isomorphic to an external direct product. Therefore. we occasionally describe groups as "direct products" without specifying their precise constructions. Assume that G is a finite direct product, say G = N x H. Cayley's theorem promises a representation of G wi th degree 1 G 1 = 1 N 1 . 1 H I. We create a more convenient substitute, with degree 1 N 1 + 1 HI, by combining the regular representations of Nand H. Let and f3 denote the regular representations of Nand H. respectively. If g = /liz with 11 in Nand h in H. then O'(n) is an INI x INI permutation matrix, while /}(h) is an 1 H 1 x IH 1 permutation matrix. Our substitute for the regular representation of G is the blocked representation <1>:
0'
cP(g) =
[0'(0
11
)
0] f3 (/1)
.
In particular,
0] I
'
The formal statement of this idea, Corollary 2.1, is relatively easy to prove. In section 4. we show that a similar statement is true for semidirect products. May 2005]
DO NORMAL SUBGROUPS HAVE STRAIGHT TAILS')
441
Corollary 2.1 (Cayley's Theorem for Direct Products). ff G is a finite direct product of Nand H, then G is isomorphic to a subgroup of 5 N EEl 5 H and admits afait~ful permutation representation in IN I x IN I and I H I x IH I blocks. Assume that G is a finite factorizable group, G = NH. We hope to construct a faithful permutation representation of G from representations of Nand H. Afaithful blocking of G is a faithful permutation representation
where a and f3 are representations of G whose restrictions to Nand H, respectively, are faithful. We use the terms "blocks" and "tails" to describe the nonzero parts offaithful blockings. If
Example 2.1. Faithful blocking of a direct product. G
= =
= a 3 = b 2 = e, xa = ax, xb = bx, ab = ba- 1 ) = e) x (a, b: a 3 = b 2 = e, ab = ba- 1)
(x, a, h : x 3 (x: x 3
;-::: £, EEl 5,
[~ g] 0 0
0
1
0
o
o
o
442
0 100 1 0 000 o 1 0 0 0 00000 000 I 0 o 0 0 0 1
0 0 0 1 0 0
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
0
0 0 0 I 0 0
= 0
0 0 0 0 0 I
0 0 0 0 I 0
0 0 0 0 0
0 0 I 0 0 0
0 I 0 0 0 0
We view elements of
Conjecture 1 (Normal Subgroups). Anyfactoriz.able finite group G = NH admits a faithful blocking. Furthermore, N (respectively, H) is normal in G If and only if N (respectively, H) has a straight tail under the faitMul blocking. Conjecture I is certainly true if G = N x H. Example 2.2 demonstrates that the conjecture is credible for the broader class of semidirect products. Recall that G is a semidirect product of N by H if the following hold: G = NH, N H = Ie}, and N is normal in G. In Example 2.2, we create a faithful blocking for the semi direct product
n
G
= (x, y
: x3
= l = e,
yxy-I
= X-I) = (x
: x3
= e)
>
(y :
l = e) = X
>
Y.
The generator of X must have a block with multiplicative order three. We assume the tail of X is straight. The generator of Y must have a block with order four. As Y is not normal, we suspect that tail(Y) cannot be straight. Therefore, we create a twisted tail, with multiplicative order two, for its generator y. The resulting matrix group is isomorphic to G. Indeed, it represents G in a faithful blocking.
Example 2.2. Faithful blocking of a semidirect product.
G
= (x, y
: x3 =
l = e, yxy-I = X-I) = X >
[! HJ
0
o
o
o
May 2005]
DO NORMAL SUBGROUPS HAVE STRAIGHT TAILS')
443
In Example 2.1 (a direct product), the tails are largely place-holders. In Example 2.2 (a semidirect product), the tails are significant. In particular, Conjecture I predicts that normal subgroups can be identified by their tails. We test this prediction for the subgroup Y.
[(i
()
0
I
[
(\
()
0
]
()
][
()
,
()
[
. .
,
. I ()
I .
()
[':
0 0
0 ()
()
I
0
0
0
I
,
"l
][[:;
0
I 0 0
~]
0
()
()
o
• ] I .
. I
()
[~
0 0
0
()
()
()
I
()
()
I
!ill
The resulting matrix is not in
Conjecture 2 (Faithful B10ckings of Semidirect Products). If" G is a .finite semidirect product of" N bv H. then G admits a faith/iii blocking in I N I x I N I and I H I x I H I blocks. Furthermore. N has a straight tail. and tails jimn H act on blocks/imn N by cOlljugatioll. Suppose that our conjecture is accurate. We can then view any semidirect product G = N )<1 H as a group of blocked matrices. The group of blocks (block(x) : x E N} faithfully represents N. while the group of blocks (block(y) : y E H} faithfully represents H. The group of tails (tail (y) : y E H) accurately represents the interaction between the subgroups Nand H.
3. FAITHFUL BLOCKINGS OF SEMIDIRECT PRODUCTS. In this section. we describe a construction for faithful blockings of semidirect products. We delay until section 4 the formal proof that this construction produces faithful blockings. Recall the faithful blocking of Example 2.2. The blocks of X and Y were essentially the regular representations of X and Y. We would like to explain tails of faithful blockings in a similar manner. To do this. we require a well-known generalization of Cayley's theorem. 444
©
THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
The generalized Cayley theorem. Cayley's theorem identifies each element of a group G with a permutation of the elements of G, A more general viewpoint identifies each element of G with a permutation of the cosets of G relative to some subgroup. Assume that A is a subgroup of G. and let G / A signify the collection of left cosets of G with respect to A (i.e .. G / A = (g A : g E G}). For each p in G define the coset permutation cp;\[p] to be the bijection on G/A given by gA f--* (pg)A. Now G/A forms a group only if A is normaL but the set SCi/;\ of all coset permutations is always a group under composition. (For more details. see Adkins [1].) The resulting generalization of Cayley's theorem paraphrases a result from Gal!ian's book [S. p. 420]:
Theorem 3.1 (Generalized Cayley Theorem). If A is a sllbgroup of G, then SCi/A is a group and the //lapping cp;\ : G ~ SCi/,\ is ([ hO//lo//lorphism. If M is (J normal subgroup ofG that is contained in A, then M ~ Ker(cp;\) ~ A. Assume that a group G factors as G = N H. Theorem 3.1 defines corresponding homomorphisms cp/{ and CPN. The kernel of CPH is the largest subgroup of H that is normal in G (it is frequently called the core of H): core(H)
=
Ker(CPH)
= (p
E
G: (pg)H
= gH
for all g E Gl.
Thus, Theorem 3.1 implies that CPII(G) ~ G / core(H) and. by similar reasoning, CPN(G) ~ G/core(N).
Cayley's theorem and semidirect products. We earlier produced faithful blackings of direct products by mapping G = N x H into Sly EO SH' Since N is normal in G, G / N ~ Hand SCi/N ~ SII. Similarly. SN and SCi/II are isomorphic. We could say that, because G = N x H is isomorphic to a subgroup of SCi/H EO SC/N. direct products admit faithful blockings. Theorem 3.1 suggests an extension of this logic to general factorizations. If G = NH, then G admits the homomorphisms CPN : G ~ SC/N and CPH : G ~ SCi/H. Define cP = (CPII. CPN) to be the homomorphism from G into SCIf/ EO SC/N given by cp[g] = (CPH[A']. CPN[gJ)· We are searching for faithful blockings. A natural question is: Does cP produce the faithful components we require? From Theorem 3.1. we know that Ker(CPII) ~ H. while Ker(CPN) ~ N. Thus, Ker(cp) ~ N H. so cP is one-to-one if N H = Ie}. We thus obtain an extension of Cayley's theorem:
n
n
Proposition 3.1. IfG = NH ond N
nH
= Ie}. then Gis isoll101phic to a subgroup
of SC/H EO SC/N.
A modification of cP should produce the desired matrix representations of G = NH. Define WH to be the injective homomorphism that identifies each permutation p of SC/H with the corresponding binary matrix MI" We can convert the homomorphism CPH : G ~ SC/H to a matrix representation by composing it with WHo Thus. cDH = WH 0 CPH maps G into IG/HI x IG/HI matrices. Similarly. we define W,y and cDN = WN 0 CPN. We assemble these pieces to form cD = (cD H. cD N). which maps G into blocked permutation matrices. If N H = Ie}. then cD is faithful. We now have sufficient information to prove that semidirect products admit faithful blockings. Before completing the proof. however. we use the methods that we have just described to construct an example of a faithful blocking.
n
May 2005]
DO NORMAL SUBGROUPS HAVE STRAIGHT TAILS')
445
Construction of faithful blockings. Let G be the semidirect product with presentation: G
=
l = e, yxy-I = XC) e) >
(x. y : x 5 = 5
= (x : x =
=N>
Note that N is normal in G. Furthermore, H = (y) acts on N = (x) by conjugation. We expect G to admit a faithful blocking in which N has a straight tail. The representation should produce matrices containing 8 x 8 and 5 x 5 blocks. (This is somewhat more manageable than the regular representation, with its 40 x 40 matrices.) We first deal with cP N. For this, we order the elements of the quotient group G / N as N,yN, ... ,/N.
We define ¢ N via the left -action of the generators y and x on G / N. In other words, ¢N [y 1and ¢N [x 1are the permutations of G / N that occur when we multiply cosets on the left by y and x, respectively. We begin with ¢N[yj. For any exponent k, we see that ¢NLv](/ N) = (y/)N =
/+I(modS) N.
Thus, ¢N [y 1cyclically permutes the collection of cosets. Now consider the action of x on G / N. From the group presentation, we have yxv- I = Xl. This implies that xy = yx', whence for any k ¢N[x](/N)
=
(x/)N
=
=
/(x 3 k(modS)N)
/N.
In other words, ¢N [x 1is the identity permutation of G / N. The homomorphism ¢N, and the corresponding matrix representation CPN = WN 0 ¢N, are now fully determined. The restrictions of ¢N and CPN to H are faithful, while their restrictions to N are trivial. The mapping cP N defines tail(x) and block(y) in a partially completed faithful blocking: tail(x)
block(y)
0
CPN[Xj =
CPN[yj =
I
0 0
0 0 0 0 0 0
I
0 0 0
0 0 0 0 0
I
0 0 0 0
0 0 0 0
I
0 0 0 0 0
0 0 0
I
0 0 0 0 0 0
0 0
0
I
0 0 0 0 0 0 0
I
0 0 0 0 0 0 0
We might have predicted the straight tail of x, for x belongs to Nand N is normal in G. We next turn to cP H. Here we order the co sets that comprise G / H as
The permutations ¢ H [x j and ¢ H [y j are the left-actions of x and y on G / H. 446
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
The mapping ¢H[X] is a cyclic permutation given by Xk H ---+ xk+l(mod5) H. The action of y is more complex. From the group presentation, we have yx = x 2 y, so
for any exponent k. We concl ude that ¢ H [y 1 fixes H. while exchanging the other cosets. The corresponding matrix representation
block(x)
I
=
0 0 0 100 0
I
0
[ 001 000
I]
o
0
00. o 0 1 0
As tail(y) is not the identity, H has a twisted tail. This is again not surprising, since H is not normal in G. The representation
0
0
] = [
bIO~(X)
] = [
tai~Y)
0 tail (x)
bIO~C.}')
l ].
The representation is consistent with our comments following Conjecture 2: N is faithfully represented by the group of blocks from N; H is faithfully represented by its group of blocks; and the action of H on N is represented by the twisted tails from H. We now prove that this construction produces a faithful blocking for any finite semidirect product.
4. EXISTENCE OF FAITHFUL BLOCKINGS. Suppose that G is a semidirect
n
H = {e}, so the mapping ¢ = (¢ H, ¢ N) is well-defined product of N by H. Then N and one-to-one. We can invoke Proposition 3.1 to assert that G is isomorphic to a subgroup of Sc / H EB Sc / N . Furthermore, N is normal and GIN is isomorphic to H, so SC/N ;:: SH. This gives:
Corollary 4.1. If G is a semidirect product of N by H, then G is isomorphic to a subgroup of SC/H EB
SH'
From here, the proof that semidirect products admit faithful blockings is straightforward. We simply verify the behavior of
Theorem 4.1 (Faithful Blockings of Semidirect Products). If G is a finite semidirect product of N by H, then G admits a faithful blocking
DO NORMAL SUBGROUPS HAVE STRAIGHT
TAILS~
447
Proof: From Corollary 4. L we know that
n
n
=[
block(n) . tail(h) . block(n- I ) 0
tail (11) .
block(n)· tail(h) . block(n- I )
o
1.
n
bIOC~h) . tail (n- I) ]
bIO~(h) . 1 ].
Sincetail(h) = I,
-
[I0 bIO~(h)
] =
We conclude that H is normal in G and that G is a direct product.
•
5. CONCLUSION. Faithful blockings have proved useful for teaching introductory abstract algebra. Students have constructed these representations with computer algebra systems, creating and exploring models of small groups. The advantage of this approach is that it is visual. Students can see the internal structure of a group in its blocks and tails. Normal subgroups, quotient groups, and other fundamental concepts admit simple visual explanations. Consider a final example. We have indicated that a group G is a semidirect product of N by H by writing G = N :xl H, but the notation G = N :xlii H is equally common. The symbol fj represents the action of H on N. More formally, fj is a homomorphism from H into the automorphism group of N. Faithful blockings display such mappings as matrices: the automorphism group 8(H) is isomorphic to the matrix group (tail(y) : y E H}. We stated two conjectures. Conjecture 2 is correct: Theorem 4.1 confirms that finite semidirect products admit faithful blockings and that their tails reflect group structure. In semidirect products, normal subgroups have straight tails. We neither proved nor disproved Conjecture I, which asserts that every group factorization implies a faithful blocking. This conjecture would make a good subject for undergraduate research projects. We suggest several related projects: I. We showed that semidirect products admit faithful blockings. The proof of Proposition 3.1, however, requires only G = NH and N H = {e}. Are there groups that admit faithful blockings, but that are not semidirect products?
n
2. We found that, for G = N :xl H,
© THE
MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
4. Erdos and Strauss [4J proposed several interpretations of the question: How Abelian is a finite group? Is there is a simple visual interpretation for this question? Is there a simple visual answer?
5. Complements of normal subgroups are not always unique. It is even possible for a group to be simultaneously a direct product of N by H and a semidirect product of N by another subgroup K [3]. Are there visual criteria for possible complements to a normal subgroup?
6. Wild [7] classifies the groups of order sixteen as cyclic extensions from groups of order eight. Which of these extensions correspond to nontrivial faithful blockings? These questions are only a beginning. The literature on permutation representations is extensive. Dozens of authors (reference [2], for example) give partial answers to the question: Which groups admit convenient permutation representations? ACKNOWLEDGMENT. Thanks to the anonymous referee. whose detailed comments greatly improved this paper.
REFERENCES
I. 2. 3.
4. 5. 6. 7.
W. Adkins and S. Weintraub. Algebra, All Approach Via Modllie Thcory. Springcr- Verlag. New York, 1992. L. Babai, A, 1. Goodman, and L. Pyber. Groups without faithful transitive permutation representations of small degree. 1. Algcbra 195 (1997) 1-29. H. Bechtel!. On semi
PAUL E. BECKER received B.S. and M.S. degrees in mathematics from Michigan State University. After several years of teaching at Saginaw Valley State University. he completed a Ph.D. at Central Michigan University. He is currently an assistant professor at Penn State Erie, His hobbies include algebraic combinatorics and kayaking, School o{Science, Pcnn State Eric, Thc Behrend College, Erie, PA 16563 [email protected]
May 200SJ
DO NORMAL SUBGROUPS HAVE STRAIGHT TAILS"
449
NOTES Edited by William Adkins
Avoiding Eigenvalues in Computing Matrix Powers Raghib Abu-Saris and Wajdi Ahmad 1. INTRODUCTION. Computing the powers of a square matrix A is of paramount importance from both pedagogical and computational points of view. For example, if one is studying the dynamical system governed by the system of difference equations x(n
+ I) = Ax(n) + g(n)
(n
= 0,
1,2 .... ; x(O)
= xo),
then (see [2, p. 116]) n-I
x(n)
=
A"xo
+L
A"-r-Ig(r)
(n
= 0,
1,2 .... ; A O =
/).
r=O
In particular, if the forcing term g(n) is absent, the solution is given by x(n) = A"xo. To compute A", one can use similarity transformation methods: A" = 5 B" 5- 1, where B is a square matrix (e.g., the Schur's canonical matrix (triangular) or the Jordan canonical matrix) whose powers are relatively easier to compute than those of A. To do so, however, one needs to determine the eigenvalues of A and the associated eigenvectors or generalized eigenvectors. This process can be tedious and computationally involved. Alternatively, one can apply the discrete analog [2, pp. 106-109] of Putzer's algorithm for the matrix exponential [5] or the recently developed algorithm by Elaydi and Harris [3], which is analogous to Leonard's algorithm for the matrix exponential [4]. A common feature of the aforementioned methods is that they, too, require the determination of the eigenvalues of A, which means finding the zeros of the characteristic polynomial of A, a challenging and difficult problem to solve exactly for polynomials of degree five or more. So we pose the question: Do we really need the eigenvalues of A to compute A" ?
2. AN ALGORITHM FORAn. Consider a k x k matrix A (k
~
2), and let
k-l
p()..) = det(U - A) =)..k
+ La))..) )=0
be the characteristic polynomial of A. By the celebrated Cayley-Hamilton theorem [1, p. 210] peA) = 0 and by the division algorithm, we have A" = q,,(A) peA)
+ r,,(A)
= r,,(A)
(n
~
k),
where the remainder polynomial r" is of degree at most k - I. Therefore A" can be computed once the coefficients of r" are determined. To do this, we write 450
©
THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
k-I
rll(A) = L,hj(n)A J . j=O Since A' = - I:~:~ajAj, we obtain for
I hiCk) = -aj
(j
=
/I
k
= O.... , k -
I).
Furthennore, k-I
L,hj(n
k-I
+ I)N =
A Il + I
=
k
AAII = L,hj(n)Ai+ 1 = L,hj_l(n)Aj
j=O
j=O
j=1
k-I
= h k - I (n)A
k
+ L, h j _ 1(n)N i=1 k-I
= -aOhk - 1(n)l
+ L, [-Gjh k - I (n) + h j _ 1(n)] Aj. j=1
Hence, comparing coefficients, we arrive at
Observe that the boxed formulas provide a recursive algorithm for computing A" that does not require determination of the eigenvalues of A. Moreover, these recursive fonnulas can be solved explicitly in terms of the coefficients of the associated characteristic polynomial. In fact, by the Laplace expansion of determinants through the first column, one can show that hj (n) is a signed detenninant of size (n - k + 1) X (n - k + I) given by ifn = k, aj_1
if n = k
+ 1,
ifn = k
+ 2,
ifn = k
+ 3,
ak-l
1
o 1
o
o
aj_1
aj-2
ak-I
ak-2
1
ak-I
aj_1
aj-2
ak-l
ak-2
ak-3
1 0
ak-l
ak-2
1
ak-I
aj-3
where we adopt here the convention that af = 0 if £ < O. We summarize these findings in the following theorem:
Theorem 1. If A is a k x k matrix with characteristic polynomial p()...) = ",k-l ,j h L...j=oajA ,t en May 2005]
NOTES
)...k
+
451
k-I
A" = I>j(n)N j=O
when n ::: k, where b; (k) = -(I; and b j (n) is given recursively by
with h_1 (n) = O. When n > k, bj(n) can be expressed as bj(n) = (_I)"-k+1 det(T;(I1)), in which T;(n) is the (n - k
+ I) a .I
X
(n - k
+ I) matrix (/;-II+k
{1j-2
{/;-I
I
T;(n) =
(
where
(It
)
= 0 whenever {; < O.
Remark 1. Observe that Tk - I (n) is a Toeplitz matrix (the entries of Tk - I (11) are constant down the diagonals parallel to the main diagonal), and T; (11) can be obtained from Tk- I (n) by replacing its first row with (aj a;_1 ... a;-IIH)' Also, notice that if A is singular, then (/o = 0, which implies that bo(n) = O. 3. EXAMPLES. To illustrate the applicability of Theorem I, we present the following examples. Example 1. This example is borrowed from l3]. If A is the 3 x 3 matrix
then the characteristic polynomial of A is peA) = AO -
Therefore, C/o = -12, follows:
Cil
= 16, and
II
3 4 5 6 7 8 9 10 11
452
©
Ci2
n + 16A -
12.
= -7, so by Theorem I the bien) are given as
hll(lI)
hl(n)
h 2 (n)
12 84 396 1572 5676 19332 63372 202404 634860
-16 -100 -444 -1700 -5996 -20100 -65164 -206500 -644076
7 33 131 473 1611 5281 16867 52905 163835
THE MATHEMATICAL ASSOCIATION OF AMERICA
lMonthly 112
Note that
+ n)2" + 4.3".
bo(n)
=
bl(n)
= (8+5n)2"-
-3(1
b2 (n) = -(2
-4.3",
1
+ 11)2"- + 3". 1
Example 2. Consider the square matrix
A =
(-~ 3 I). -4 6 2
Observe that A is singular. Its characteristic polynomial is p(),.) = ),.3 - 5),.2 + 6),.. Therefore, an = 0, (II = 6, and (/2 = -5. Applying Theorem I we obtain the followIng: n
ho(n)
0 0 0 0 0 0 0 0 0
3 4 5 6 7 R 9 10 11
hl(n)
h 2 (n)
-6 -30 -114 -390 -1266 -3990 -12354 -37R30 -115026
5 19 65 211 665 2059 6305 19171 58025
Example 3. Finally, we look at the 5 x 5 matrix
A~U
0 I I 0 2
2 0 I I 0
0 2 0
I 0 2 0 I
)
whose characteristic polynomial is p(),.)
=),." -
5),.4
+ 10)'"
- 20A 2
-
IS)" - 4.
an = -4, (/1 = -15,02 = -20, (/, = 10, and (14 = -5. Using MAPLE, we were unable to compute the zeros of p! However. with the aid of Theorem 1 we produce the following table:
In this instance,
May 2005]
n
born)
hl(n)
b 2 (n)
h,(n)
h 4 (n)
5 6 7 R 9 10 11
4 20 60 180 760 3516 14560
15 79 245 735 3030 13945 5R116
20 115 379 1145 4535 20610 86745
-10 -30 -35 -71 -755 -4255 -15790
5 15 45 190 879 3640 13945
NOTES
453
REFERENCES I. 2. 3. 4. 5.
T. M. Apostol, Linear Algel>ra: A Firsr Course \I'irh Al'l'licariol1s ro Differenrial Equariol1s, Wiley, New York,1997. S. N. Elaydi, An Inrmdllcrion ro Difference Equarions, 2nd ed., Springer- Verlag, New York, 1999. S. N. Elaydi and W. A. Harris, 1r., On the computation of An, SIAM Rn'. 40 (1998) 965-971. I. E. Leonard, The matrix exponentiaL SIAM Ret'. 38 (1996) 507-512. E. 1. Putzcr, Avoiding the 10rdan canonical form in the discussion of linear systems with constant coefficients, this MONTHLY 73 (1966) 2-7.
Del'orrlllent of Basic Sciences, Unit'ersity (!/'S/wrjaiz, PO.Box 27272, Sh{//jo/z, United AmI> Emirates. ral>[email protected](·.oe Departlllent of Eiecrrical and Electronics Engineering, University or S/wrjoh, Po.Box 27272, Sh(l1joh, United Aral> Elllirares. I\'[email protected]
Exact Solutions for Minimax Optimization Problems Francese Comellas and J. Luis A. Yebra
1. INTRODUCTION. Functions defined as pointwise maxima or minima of a finite set of smooth functions need not be differentiable, thus disallowing the characterization of their extrema by the standard methods of elementary calculus, However, the minimization of functions such as f = max {fl, 12, , , , , f;,,} do often surface in applications, so methods for sorting out their critical points may be of interest In fact, an appropriate characterization can be used to find effectively exact solutions to problems of this type that are generally studied in the field of nonsmooth optimization and solved approximatively by various algorithmic methods (see [1], [3], [5], and [6]). A concrete application by the authors can be found in [2], To give a feel for the problem, we consider first a one variable example in which we look for the minima of the function f(x) = max
{II (x),
h(x)} = max { xc,
8 + 4x - XC} 3
whose graph is shown in Figure I, The function has two critical points at which f' (x) does not exist: x* = -I and x* = 2. It attains its minimum value at the former, while at the latter it does not have even a local minimum, This may be inferred from a known version of the first derivative test that assumes the existence of the left- and right-hand derivatives of f at x* and states: f'(x*-) :::: 0 and f'(x H ) :::: 0 are necessary conditions for f to have a minimum at x*, while f' (x*-) < 0 and f' (x H ) > 0 are sufficient conditions for it In the example f satisfies the necessary and sufficient conditions at -I but only the necessary conditions at 2, Of course, reversing all the inequalities characterizes a maximum. Now a simple and common characterization of an extremum (either a maximum or a minimum) comes out: namely, the existence of
454
©
THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
6
5
-I
2
0 -I
x
Figure!. !(x)=max(x 2 • (8+4x-x 2 )f3).
constants a and
f3
such that a
+ f3
> 0 and
is a sufficient condition for an extremum when both constants are required to be positive, whereas it is a necessary condition only when one of them is allowed to vanish. Moreover, this characterization can be carried over to the case of several variables.
2. CHARACTERIZATION OF THE MINIMA. Consider now the minimization of a real function f defined in some open set in R" as the maximum of m continously differentiable functions of x = (XI, X2, ..• , XII): (1)
Besides regular critical points x* where some function Ji is larger than the others, hence characterized by 'V.r (x*) = 0, we have to study the critical points x* where the values of several functions coincide and their common value is larger than those of the remaining functions. The following characterization can be found in [3, chap. 3, sec. 2] and [6, sec. 2.1] in a somewhat more general framework.
Theorem 1. Let f = max {fl, f2, ... , fill}, where the functions Ji are continously differentiable in some open set in R", and let x* be a critical point of f. Suppose that the labeling of the functions is such that
for some k. Then a necessary condition for f to have a local minimum at x* is that the k + I gradients 'V fl (x*), 'V h(x*) .... , 'V fk+1 ex*) have a vanishing nontrivial linear combination with nonnegative coefficients: k+1
k+1
I>"; 'V f;(X*)
= 0,
C; :::
;=1
0,
Le; >0.
(3)
;=1
Moreover, when k = n and the n + 1 gradients 'V fi (x*) (1 SiS n + 1) span R", a sufficient condition for f to have a strict minimum at x* is that (3) hold with positive coefficients c;: 11+1
I>;'Vf;(x*) =0,
c; > O.
(4)
;=1
May 2005]
NOTES
455
Characterizations (3) and (4) still hold when looking for the maxima of the minimum of a (finite) set of functions.
3. EXACT SOLUTIONS. Any point x* at which (3) holds is called a stationary point of I. While the determination of global minima is often accomplished via different iterative algorithmic methods (see, for instance, [6]), in many cases stationary points can be determined exactly through an appropriate use of condition (3). Indeed, when k < 11 the linear dependence expressed by (3) provides 11 - k equations that, together with the k equations II (x*) = ./l(X*) = ... = .h+1 (x*), lead to the critical points among which stationary points (and thus extrema) are to be found, as shown in the following examples.
Example 1. We look here for the point closest to three given points in the plane (i.e., at minimum distance from the furthest one). In the preceding formulation, we must find the minimum of the maximum of three functions: I = maxUI, .12, .f~}' where II, .h. and .h are the (squared) distances from a point (x, y) to the given points. I. Consider first the points (-4,0), (4,0), and (0, -2). Then II (x, y)
=
(x +4)2 +
We have II (x, y)
.'1'2,
=
.h(x, y)
hex, y)
=
=
(x - 4)2 + v
.h(x, y)
=
2
,
hex, y)
=
25 at (x*, y*)
2
=
x + (y + 2)2.
(0,3). At this point
5VII (0,3) + 5Vh(0, 3) - 6Vh(0, 3) = 5(8,6) + 5( -8,6) - 6(0, 10) = (0,0). Since the necessary condition (3) does not hold, no minimum can be attained at this point. Considering now II (x, y) = .f2(X, y), we obtain x = 0, and for points on this line VII (0, y) = (8, 2y) and VI:. (0, y) = (-8, 2,v), which are linearly dependent only when y = 0. Now, at (x*, y*) = (0, 0) we have II (0, 0) = ,h (0, 0) = 16> h(O,O) = 4 and also VII (0,0) + V,h(O, 0)
=
(8,0) + (-8,0)
=
(0,0),
satisfying the necessary condition (3). Direct verification along the line x = 0, where I (0, .v) = 16 + ,v2, shows that the function does attain its minimum value at this point. 2. Now let the points be (0,0), (2,0), and (0, 2), so that
= x2 +
./'c(x, y)
=
(x - 2)2 + )'2,
These functions satisfy II (x, y) at which point
=
I2(X, y)
II (x, y)
/,
=
f,(x, y)
.h(x, y)
=
=
x 2 + (y - 2)2.
2 at (x*, y*)
=
(I, I),
OVII (I , I) + Vh(l, I) + Vh(l, I) = 0(2, 2) + (-2,2) + (2, -2) = (0,0). Thus at (I, I) the necessary condition (3) is satisfied, but not the sufficient condition (4). However, direct verification (f (1 + EI, 1 + E2) - I (1, I) :::: ET + Ei near ( 1, I» shows that the function does attain its minimum val ue at this point. 3. Finally, consider the points (-4,0), (4,0), and (0, 8). In this instance
II (x, 456
y)
=
=
x 2 + (y - 8)2.
THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
(x + 4)2 + / '
©
./l(X, y) =
(x - 4)2 + / ,
hex, v)
Now fl(X, 5V fl (0.3)
y)
=
.h(x. Y)
=
,hex. Y)
=
25 at (x'. v*)
=
(0.3). where
+ 5V.l2(O, 3) + 6Vf,(0. 3) = 5(8. 6) + 5(-8. 6) + 6(0, -10) =
(0.0).
At (0. 3) both the necessary condition (3) and the sufficient condition (4) are satisfied, so this is the point we were after (see Figure 2a).
80 70
60 50
40 2
30
1.5 1
()
6
(l,)
5 4
()
-0.5
3 2
Y
-I
\'
-l
.r
(al
-1.5
(b) Figure 2. (a) Example I, case 3: (b) Example 2,
Of course. this is an easy example because each function fi (hence f) is convex, so the minimum is always attained: at an inner point (the triangle's circumcenter) when the three points are the vertices of an acute triangle, and at the midpoint of the longest side when the points are the vertices of an obtuse or right triangle. The next example describes quite a different situation.
Example 2. Consider
f(x. y)
= max {II (x. Y). hex. y)j = max{x
- / , -3x +x"
+ y2}
(see Figure 2b). We have II = 12 on the ellipse E : x 2 - 4x + 2y2 = 0, with II > h inside and fl < h outside E. Inside the ellipse Vfl (x. y) = (I. -2y) I- (0.0), while outside it V h(x. y) = (-3 + 2x. 2.1') I- (0.0) as well. Next, for points on the ellipse V fl and V.h can be linearly dependent only under the following conditions: (a) if y
I-
0, then x = I. which yields the two points A = (I. ~3/2) and B
=
(I. -J3/2);
(b) if v
= 0,
then x 2
-
4x
= O. giving two new points, C =
(0.0) and D
=
(4.0).
At A and B, V II + V h = O. Parameterizing the ellipse E in the standard way by g: [-Jr. Jr]---+ R" we obtain g(s)
with A
= (x(s). yes») = (2 -
2coss. hsins).
= g(Jr /3) and B = g( -Jr /3). Set ¢(s)
May 2005J
= I[g(s)] =
2 (cos" .I
NOTES
-
coss). 457
Then ¢' (±rr /3) = 0 and ¢// (±rr /3) = 3 > 0, revealing that I attains its minimum value ¢(±rrj3) = -1/2 at these points. At C, we get 3V'II + V' h = O. Since C = g(O) and ¢'(O) = 0 but ¢//(O) = -2 < 0, I does not attain a minimum at C. Lastly, at the point D, V'II = (1,0) and V' h = (5,0), so the necessary condition (3) does not hold at D. Finally, we consider one of the standard test examples used to compare different algorithms in nonsmooth optimization (see [5, p. 139] or [4]). Most problems in these tests admit analogous treatment.
Example 3. We look for the minimum of I(x, y) = max {f1(X. y), hex, y), hex, y») = max
{x 2 + l, (x - 2)2
+ (y -
2)2, 2e- x +').
All the algorithms find approximate solutions near (x', y') = (1.14, 0.9), for which II (x', y')
=
1.9557,
hex', y')
=
1.9496,
f,(x', y')
=
1.5733.
In our context we have to study the behavior of I at a critical point (x*, y*) for which II (x*, y*) = h(x*, y*) > f,(x*, y*). Therefore (x*, y*) must be a solution of x2
+l
= (x - 2)2
+ (y -
2)2,
x(y - 2) = 2y\x - 2),
where the second equation comes from the linear dependence of the gradients V'II (x, y) = (2x,4 y 3) and V' hex, y) = 2(x - 2, y - 2). Solving these equations, we find the "exact" optimal solution: x* = 1.139037651992656, y* = 0.899559938395897, I(x*, y*) = 1.952224493870659. ACKNOWLEDGMENTS. This research was supported by the MCyT Grant TIC-2002-00155, Spain.
REFERENCES I.
2. 3. 4. 5. 6.
F. H. Clarke, Y. S. Ledyaev, R. J. Stern, and P. R. Wolensky, Nonsmooth Analysis and Control Theory, Springer-Verlag, New York, 1998. F. Comellas and J. L. A. Yebra, New lower bounds for Heilbronn numbers, Elect. 1. Combinatorics 9 (2002) R6. V. F. Dem'yanov and V. N. Malozenov, Introduction to Minimax, John Wiley & Sons, New York, 1974; reprinted by Dover, New York, 1990. L. Luksan and J. Vlcek, Test Problems for Nonsmooth Unconstrained and Linearly Constrained Optimization. ICS Technical Report, no. 798, 2000; available at http://www.cs.cas.czf'luksan/test.html M. M. Makela and P. Neittaanmaki, Nonsmooth Optimization, World Scientific, Singapore, 1992. E. Polak, Optimization, Springer-Verlag, New York, 1997.
Departament de Matematica Aplicada IV, E. P S. C, Ulliversitat Politixnica de Catalunya, Av. del Canal Olfmpic, S.Il., 08860 Castel/defels, Catalonia, Spain (comellas,[email protected]
458
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
A Simple Proof of the Gauss-Winckler Inequality F. G. Avkhadiev 1. INTRODUCTION. Let that
I : (0, (0)
I
cc
---+ [0, (0) be a nonincreasing function such
I(x) dx = I.
(I)
II
The Gauss-Winckler inequality is an old comparison theorem in probability theory for moments (p>-I).
According to Faber [1] and von Mises [2], in 1821 Gauss stated without proof that
In 1866 Winckler presented the following generalization: [ (I
+ r)M,.(j) ] 1;" :s:
[
(I
+ s)M,(j) ] I/.I
(2)
whenever 0 < r < s. But Winckler's proof of (2) was insufficient (see Beesack [3]). In 1896 Kriiger proved (2) for s = 2r > 0 and for certain other cases. The first proof of (2) for all positive rand s with s > r is due to Faber [1]. Faber established that
provided that 0 :::: a < b < c. For a = 0 inequality (3) is equivalent to the GaussWinckler inequality with b = rand c = s. Later, von Mises [2] gave a new proof of (2) when -I < r < sand f is continuously differentiable on (0, (0). By a modification of the method from [2], Beesack [3] proved (2) in the case when 0 < r < sand f is continuous on (0, (0). Except for the special case of s = 2r in (2) (see [4, p. 164]), the case of equality in (2) and (3) does not seem to have been investigated. The aim of this paper is to present a simple proof of the Gauss-Winckler inequality in its full generality and to determine the extremal functions. More precisely, we give a proof of the following assertion: Theorem 1. Let f be a nonnegative and non increasing function on (0, (0) satisfying condition (I). If -1 < r < s, then inequality (2) is valid. The case 0<
occurs
[0
+r)M,.(f)]' =
[0
+s)MI(f)r < 00
if and only !f
May 2005]
NOTES
459
c {0
f(x) = fo(x):=
ifO<x < I/C, 'f I/C < X < 00, I
(4)
for some positive constant C. We also obtain Faber's inequality (3) for a and b such that -I < a < h < c.
2. THE PROOF. We show that inequalities (2) and (3) are consequences of the Holder inequality for integrals and of the following lemma:
Lemma 1. If a nonnegative and nonincreasing function f : (0, (0) ---+ [0. (0) satisfies (1), then the function 1/1 : CO, (0) ---+ R defined by
V/CX)
=
r
io
f(t) dt - xf(x)
has thefo//owing properties:
Ci)
VI
is a nondecreasing function;
Cii) 1/1(0) := lim,---,o+ 1/I(x) = 0 and 1/1(00) := lim,---,x 1/1 (x) = 1; (iii) if MI,c.n isfinite, then
+ I)M,,(f)
(p
=
[0<- Xl' d1/l(x).
io
(5)
Proof: The property
fCt) :::: f(x) :::: 0
(6)
(0 < t :S x < (0)
implies that
if X2 > Xl > O. This gives (i). Using (6) and the existence of the integral C1) we obtain
r
xfCx):s
io
fCt) dt ---+ 0
(as
X
---+ 0+)
and
xfCx) :S 2
IX
f(t) dt ---+ 0
Cas
x ---+ (0).
,/2
Together with (I) these prove (ii). It is clear that MI,(f) = 00 when p :S -I. Consequently (cf. [2], [3]), if M,,(f) is finite, then p + I > 0 and Xl'+l
460
©
f(x) :S (I
+ p)
r t" fCt) dt ---+ 0
io
Cas
x ---+ 0+)
THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
and X,,+I
f(x) :::: 2 /,+1
1'\ r
tl'f(t) dt ---+ D
x ---+ (0).
(as
/2
Integrating by parts we get
Thus,
I'"
1%
x"d1/l(x) =
IJ
1~ xl'd(xf(x»
xl'f(x)dx -
IJ
+ pM"U),
= MI,U)
()
•
which is (5). This completes the proof of the lemma.
We now suppose that -I < a < IJ < c < Mc(f) are finite. Using Holder's inequality
00
and that the quantities M" U) and
for the exponents C-(l
C-{/
a=-->L
/3=-->1
c-IJ
IJ-a
and the functions 1I
= x,,/a,
v = x'l!!
we obtain
I
o
"" x"cl1/1(x):::: (1"-()
)
I, -"I/IC-ill
X"dVf(X)
(1cx.. x d1/l(x) )
(Ii-"I/Ic-ill
C
0
(7)
Thanks to (5), inequality (7) is equivalent to Faber's inequality (3). Since a < c, equality occurs in (7) (see [4, sec. 6.17]) if and only if
.
..
1/I(x)=1/IoC\).=
{D I
ifD::::x < lie. 't'I/C I
.
<.X <
00,
for some constant C > O. For the corresponding extremal function ifD < x < IIC, if I I C < x < 00,
x\' , -
{ 0 . .\' = -I
iil(x) one has (8)
where
y
May 2005J
=
1'\ iii
(t ) d t .
o
NOTES
461
The solution of (8) is given by C1x
y = { C x 2
+I
if 0 < x < lie. if I I C < x < 00,
where C 1 and C2 are constants. Consequently, ifO<x
Theorem 2. ff' -I <
(I < b < c < 00 and if' f is a nonnegative, Ilonincrea.l'ing .fimction with property (I), then the Faber inequality (3) is valid. Moreover,forfinite Mil (f) and MJf') equality occurs in (3) (f'and only (f'the.filllction f equals hFoll1 (4).
According to the classical theory of Holder inequalities (sec r4]), inequality (7) implies that
r~
(J()
r
x d1/l(x)
)
l/r
(r~
:s J()
x'd 1/1 (x)
)
1/,
(9)
when -I < r < s < 00 (with a special interpretation of the integral means for r = 0 or s = 0). Moreover, there is no equality in (9) if 1/1 (x) I- 1/10 (x ) and the integral means in question are finite. Consequently, Theorem 1 follows from Theorem 2. ACKNOWLEDGMENTS. This work was supported by a grant from thc Deutsche Forschungsgcmeinschafl. The author also thanks Professor K.-J.Wirths for helpful discussions.
REFERENCES I. 2. 3. 4.
G. Faber. Bermerkungen zu Satzen der Gausschen theoria combinations observationum. Sitzunl{sha Bayer. Akad. Wiss. 1 (1922) 7-21. R. von Mises, Uber einige Abschatzungen von Erwartungswerten, 1. Reine Anl{ew. Math. 165 (1931) 184193. P. R. Beesack, Inequalities for absolute moments of a distribution: From Laplace to von Mises. 1. Math. Ana/. App/. 98 (1984) 435-457. G. H. Hardy, J. E. Littlewood, and G. P6lya, Inequalities, 2nd ed., Cambridge University Press, London, 1952.
Institute of Math. and Mech., Kazan State University, R-420008 Kazan, Russia [email protected]
Copositive Symmetric Cubic Forms Mowaffaq Hajja A homogeneous polynomial j in n variables over IR is called ajorm. Such a form j is said to be cofJositive if f (x I , . . . , x,,) ::: 0 whenever Xi ::: 0 for i = I, ... , n. This
462
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
term seems to have first appeared in [3], where copositive symmetric quadratic forms were investigated. These forms are also encountered in [2], where Theorem I (a) says in effect that a symmetric quadratic form f in 11 variables is copositive if and only if it is nonnegative at the two test n-tuples (I, I, ... , I) and (1,0, ... , 0). A similar algorithm for deciding whether a symmetric cubic form f in 11 variables is copositive consists in testing f at the n-tuples v\") .... , v;;/), where m
v;::)
JI-/11
~~
=
(1, .... I, 0 ..... 0)
(1:5: m :5: n).
This was established in [5] when 11 :5: 3 and in [1. Theorem 3.7] for all n. In Theorem I we give two shorter inductive proofs, one using elementary calculus and the other utilizing a simple form of [1, Lemma 3.1]. It is worth mentioning that the exact analogue for quartics does not hold even when Il = 3, as one can use methods from [6] to construct a non-copositive quartic form that is nonnegative at the points (1, O. 0). (1, I, 0), and (I, I, I). However, an algorithm for deciding whether a symmetric quartic form f in n variables is copositive that consists in testing f on n-tuples of the type 11-11/-"
III
~~~
(a .... ,a.I, .... I,O, ... ,O)
(O:5:m,r :5:n.m+r :5:n)
is established in [4]. It is also proved there that if 11 = 3, then the same algorithm works for quintics but does not work for forms of higher degrees. Note that the authors of [1] and [4] study even symmetric forms F with F(x) 2: 0 for all x in ffi.1l, where a form is even if all the exponents in its terms are even. Such a positive even form F of degree 2d corresponds to a copositive form f of degree d by setting F(xJ, ... , XII) = f(x~ . .... X,~), Note also that the set of symmetric cubic forms in any number n of variables is a vector space over ffi. that is spanned by the three forms M(Il) M(Il) M(Il) and (Mill)), where :1'
2
I'
I'
M r(Il) (x 1,
••• ,
x) 11 == x.r1
+ . . . + xr
1/'
In fact, these three forms form a basis for the space in question when n 2: 3. In view of the fact that a symmetric cubic form in the two variables X and y is necessarily of the simple form (x + y) Q (x, y), where Q is a quadratic, one may confine oneself to the case in which n 2: 3, although everything that follows applies to the case n = 2 as welL
Theorem 1. Let
pn) be a symmetric cubic form in n variables, say f
(ll) -
-
aMen) 3
+ bM(n) M(n) + c(M(n»)3 2 1 l'
Then pn) is copositive if and only if pn) (v~)) 2: 0 for m = I, ... , n. In other words, fen) is copositive if and only if a + bm + cm 2 2: 0 for m = 1, ... , n. Proof Clearly, we need prove only the "if" direction. We proceed by induction on n, the theorem being trivial for n = 1. Thus we assume that the theorem holds for n = r - 1 for some r 2: 2, and we let prj be such that fer) (v;~)) 2: 0 for m = I, ... , r. We must show that pr)(x) 2: 0 whenever x 2: 0, where x in ffi.n is said to be nonnegative if each of its coordinates is nonnegative. By homogeneity, this is equivalent to May 2005]
NOTES
463
proving that
f
,[,Ir)
(x) :::: 0 for all x in the simplex =
{x =
(XI, ... ,
xr )
E JR;r :
'['(I)
x :::: 0
defined by and
XI
+ ... + Xr
= I).
Since pr-I)(v;:,-I) = pr)(v;;,) :::: 0 for m = I, ... , r - I, it follows from the inductive assumption that pr-I) (x) :::: 0 for all x in JR;r-1 such that x :::: O. Equivalently, (I)
where
signifies the boundary of
a,[,lr)
pr-I)(XI, " "
This is because
,[,(r).
P')
is symmetric and
Xr-I) = pr)(XI,"" Xr-I, 0), and because the boundary 3,[,(r) is given
by ')'11'(r) () II -
(( •
X I, ... ,
Xr )
' Ieast one.l.) . Xi -- 0 f or at
'11'(r). E ll.
We claim that min {pr) (x) : x E ,[,(r)} is attained either at a boundary point (in which case we are done by (I» or at the interior point v;r) / r whose coordinates are all equal (in which case we are again done since P') (v;r) :::: 0). For if not, let w f. v;rI / r be an interior point of '['(r) at which P') attains its minimum. Due to symmetry, one may assume that w = (x, Y, lh" ... , Pr) with 0 < X < y. We describe two different ways for reaching a contradiction. Before doing so, however, we make the useful observation that the restriction of P') to '['(r) coincides with the much simpler function R(r) given by glr)(x) = aM~r)(x) .
+ bM~r)(x) + c. -
(2)
First, the gradient of the Lagrangian function r
.(
Ir) (XI,.
r
" . ,xj-g . _ (r) " .. ,Xr)+AL. (XI, ... ,Xr)+AL...,.Xi j=1
j=1
r
= a
r
r
LX~ + b LX; + C + A LX) i=1
j=1
i=i
must vanish at w. Thus each coordinate of w is a root of
3aX 2 + 2bX + A = O. Since then
X
f. y, it follows that a f. 0 and
X
+y =
-2b/3a. But if
~
+ 77 =
-2b/3a,
_ (-2b)' -a - - +b (-2b)2 -3a 3a which does not depend on ~ and 77. Taking u = (0, -2b/3a, p" ... , P,), we conclude that pr)(w) = pr)(u), and we obtain the contradiction that the minimum of P') on '['(I) is attained at the boundary point u. Alternatively, consider the points u = (0,
464
X
©
+ y, P"
... , Prj,
q
=
(
X + y x + y ) -2-' -2-' P3, ... , Pr
THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
of 1[(rl. One can easily check that
for j = 1.2. 3. Since the restriction g(rl of prl to 1[(11 (as given in (2)) is a linear combination of M~rl. Mi rl , and M;rl (= I), it follows that
Since prl does not attain its minimum at u (thus yielding prl(u) > f(JI(W)) and since (x - y)2 > 0, we obtain the contradiction
•
This completes the proof.
We end this note by giving an example. adapted from [6], of a non-copositive symmetric quartic f (x. y . .::) that is nonnegative at the triples (I. 1. I), (I. 1. 0). and (I. O. 0). Letting Mr = xr + v r + ::r and -413M4 + 208M,M 1 + 343Mi - I36M2M~. gem) = -413m + 208m 2 + 343m 2 - 136m' = m(-413
f(X, y. Z)
=
+ 551m -
136m 2 ),
one readily verifies that
I(1. I, I)
I(l. 1. 0) I(1. 0, 0)
= g(3) = 48> O. = g(2) = 290 > 0, = g(1) = 2> O.
whereas fO, 1. I) = -16 < O. This is consistent with the result in [4], according to which f can be declared copositive only after the polynomials
(3)
f(a. a, I), f(a. 1. I). f(a. 1. 0) have been proved nonnegative for all a :::: O. In our case, none of them is, since
f(.2 • .2. I) = -6.272.
f(3, 1. I) = -16.
f(9. 1. 0) = -574.
It is worth investigating whether it is indeed necessary when deciding if a given symmetric quartic I is copositive to establish the copositivity of all of the three types mentioned in 0), and whether the results of [4] have simpler proofs. ACKNOWLEDGMENTS. This work is supported by a research grant from Yarmouk University. The author would like to express his thanks to Yarmouk University for that grant and to the referees for their valuable comments.
May 2005]
NOTES
465
REFERENCES I.
2. 3.
4. 5.
6.
M. D. Choi, T. Y. Lam, and B. Reznick, Even symmetric sextics, Math. Z. 195 (1987) 559-580. M. Hajja, Radical and rational means of degree two. Math. In equal. Appl. 6 (2003) 581-593. M. Hall. Jr.. and M. Newman. Copositive and completely positive quadratic forms. Pmc. Camhrh/f,e Philos. Soc. 59 (1963) 329-339. W. R. Harris. Real even symmetric ternary forms. J. Algehra 222 (1999) 204-245. J. Rigby. A method of obtaining related triangle inequalities, with applications, Univ. Beograd Publ. Elektmtehn. Fak. Set: Mat. Fi~. 412--460 (1973) 217-226. K. B. Stolarsky. Cubic triangle inequalities, this MONTHLY 78 (1971) 879-881.
Mathematics Department. Yannollk University, Irhid, Jordan [email protected]
Computing with Relative Ease I had a dream a few days ago in which I discovered and proved a theorem in the theory of computation. The good news: I remembered it upon awakening. The bad news: its validity is somewhat questionable. Still, it seemed worth sharing.
Theorem. There is a .fixed constant C such that every finite computation, no matter how complex, can be done in C seconds.
Proof Given a computation I wish to perform, I start a Turing machine working on the problem. Then I board a rocket ship and travel sufficiently close to the speed of light so that when I return before C seconds have elapsed for me, enough time on Earth has passed (because of the theory of relativity) to complete the computation. •
Corollary. P
= NP. --Submitted by Jerry Grossman, Oakland University, Rochester, MI
466
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
REFERENCES I.
2. 3.
4. 5.
6.
M. D. Choi, T. Y. Lam, and B. Reznick, Even symmetric sextics, Math. Z. 195 (1987) 559-580. M. Hajja, Radical and rational means of degree two. Math. In equal. Appl. 6 (2003) 581-593. M. Hall. Jr.. and M. Newman. Copositive and completely positive quadratic forms. Pmc. Camhrh/f,e Philos. Soc. 59 (1963) 329-339. W. R. Harris. Real even symmetric ternary forms. J. Algehra 222 (1999) 204-245. J. Rigby. A method of obtaining related triangle inequalities, with applications, Univ. Beograd Publ. Elektmtehn. Fak. Set: Mat. Fi~. 412--460 (1973) 217-226. K. B. Stolarsky. Cubic triangle inequalities, this MONTHLY 78 (1971) 879-881.
Mathematics Department. Yannollk University, Irhid, Jordan [email protected]
Computing with Relative Ease I had a dream a few days ago in which I discovered and proved a theorem in the theory of computation. The good news: I remembered it upon awakening. The bad news: its validity is somewhat questionable. Still, it seemed worth sharing. Theorem. There is a .fixed constant C such that every finite computation, no matter how complex, can be done in C seconds.
Proof Given a computation I wish to perform, I start a Turing machine working on the problem. Then I board a rocket ship and travel sufficiently close to the speed of light so that when I return before C seconds have elapsed for me, enough time on Earth has passed (because of the theory of relativity) to complete the computation. • Corollary. P
= NP. --Submitted by Jerry Grossman, Oakland University, Rochester, MI
466
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
PROBLEMS AND SOLUTIONS Edited by Gerald A. Edgar, Doug Hensley, Douglas B. West with the collaboration of Paul T. Bateman. Mario Benedicty, Itshak Borosh, Paul Bracken, Ezra A. Brown. Randall Dougherty, Dennis Eichhorn, Tamas Erdelyi, Zachary Franco. Christian Friesen. Ira M. Gessel. JeITold R. Griggs. Jerrold Grossman, Kiran S. Kedlaya. Andre Kundgen. Frederick W. Luttman, Vania Mascioni. Frank B. Miles. Richard Pfiefer. Cecil C. Rousseau. Leonard Smiley, John Henry Steelman, Kenneth Stolarsky. Richard Stong. Walter Stromquist. Daniel Ullman, Charles Vanden Eynden. and Fuzhen Zhang.
Proposed problems and solutions should be sent in duplicate to the MONTHLY problems address on the inside front cover. Submitted solutions should arrive at that address before September 30, 2005. Additional information, such as generalizations and references, is welcome. The problem number and the solver's name and address should appear on each solution. An acknowledgement will be sent only if a mailing label is provided. An asterisk (*) after the number of a problem or a part of a problem indicates that no solution is currently available.
PROBLEMS 11152. Proposed by Mircea /I'([n, Technical University of Cluj-Napoca, Romania. Evaluate ./;)llog(cos(nx/2»/(x(1 +x»dx.
Ch~i-Napoca,
11153. Proposed by Paul Bracken, University of Texas-Pan American, Edinburg, TX. Let XI = I. and for n 2: 1 let X,,+I = X" + 2 + I/x". If y" = 2n + (1/2) log(n) - x"' show that the sequence (y,,) is eventually increasing. 11154. Proposed by Kent Holing, Trondheim, Norway. Let a, b. and c be positive integers with ab > I. (a) Show that if u and b are relatively prime. then (I + ab) f (a 2 + b2 ). (b)* Suppose now that a and 17 need not be relatively prime. but that a 2 + 17 2 is square. Can it happen that (l + ab) I (a 2 + bC)? 11155. Proposed by Emmanuel Kowalski, Universite Bordeaux, Bordeaux, France, and Tunguy Ril'oal, /nstitllt FourieJ; Unil'C/"site de Grenoble 1, Saint Martin D'Heres, France. Let P" be the polynomial function on JR; given by P,,(x)
=
)C ,c/(l -x>-("-/). L" (n .. l '
i=O
.I
(a) Show that on the interval [0. I] P" (x) is minimized at x = 1/2. (b) Show that when P" is written as a sum of powers of x - I /2, say, P" (x) = L~:o a".dx - I /2)k. the odd coefficients (/".2.(+1 are zero and the even coefficients a".ck are nonnegative.
11156. Proposed by Richard Stanley, Massachusetts Institute of Technology, Cambridge, MA. Let s (n. k) denote a signed Stirling number of the first kind. It is well known that the matrix [s(n.k)] (n.k 2: I) has inverse [S(n,k)] (n.k 2: I), where S(Il, k) denotes a Stirling number of the second kind. Find a combinatorial formula May 2005]
PROBLEMS AND SOLUTIONS
467
for the entries of the inverse of the finite matrix [s (II, n - k + I) 1 (I s n, k S N), where we set s (n, .i) = 0 if .i s n. For example, when N = 4 this formula should give the numbers obtained by inverting
(1
o
0
-I
0 2 II
-3 -6
~).
-6
11157. Proposed h.v Hillel Gauchman, Eastern Illinois Universit)" Charleston, IL. Let 2: X2 2: ... 2: XII 2: 0, and assume that L'; = I Xj S 400 and L'; = I x} 2: 104 , Prove that JXI + ft2 2: 10.
xI
11158. Proposed hy David Beckwith, Sag HarhOl; NY Let n be a positive integer, and let p be a prime number. Prove that 171' I n! implies that 1'1'+1 In!.
SOLUTIONS Gamma Function Bounds 11016 [2003, 439). Proposed h.Y' Juan-Bosco Romero Mcirqlle;, Universidad de Val/ado/id, Valladolid, Spain, Prove that if a, h, and x are real numbers with a 2: -1/2, h 2: 0, and x 2: I, thenx- h (~r-I S rex) s ex ll (~)\,
Solution hy Martin Muldooll, York University, Toronto, Ontario, Canada. A stronger result was published by M. E. Muldoon in "Some Monotonicity Properties and Characterizations of the Gamma Function," Aequationes Mathematicae 18 (1978), 54-63. Here, we follow that method. Let h,Ax) = log Ix"r(x)(e/x)'I. Using Binet's integral representation (see (22) on page 18 of Higher Transcendental Functions, vol. I, by A. Erdelyi et al. (McGraw-Hili, 1953» for o/(x) = (d/dx) log I(x), we get h~(x) =
1 o
"XJ[1- t
I I - e
_I
+ a]
e- x1 dt
(x > 0).
The integrand here has the same sign as fa(t) = (I + at)( I - e- ' ) - t. If a S 1/2 andO < t < 00, then f,,(t) S fl/2(t) = (l + t/2)(l - e- I ) - t < O.lfa > landO < t < 00, then .f~(t) 2: fl(t) = (I + t)(1 - e- I ) - t > O. Thus when I < x < 00 we haveh,,(x) > 17,,(1) if a S 1/2andh,,(x) < h,,(I) if a 2: I. Taking a = -aanda = h + I, we get the desired inequalities. with equality only for x = I. The inequalities are reversed if 0 < x < I.
Editorial comment. This problem was also submitted to Mathematics Magaz.ine, where it was published as Problem 1651. Walther Janous proved the related inequality, with \' _
\I+···+XI/. II
n f(.x~) s (r(s»)11 il
/1 j=1 Xj
1
S,-I
which is a counterpart to the "log-convexity" inequality
n;=
I
r (Xi)
2:
r (S)II.
Also solved by S. Alllghibech (Canada). M. Bataille (France). P. Bracken. R. Chapman (U. K.). C. Curtis. D. Donini (Italy). P. J. Fitzsimll1ons. O. Furdui. 1. Grivaux (France). W. Janous (Austria). O. P. Lossers (Netherlands). R. Stong. L. ZhOli. and the proposer.
468
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
One Good Factorization Deserves Another 11029 [2003, 636]. Proposed hy Tim Ke//e/; Fair Oaks, CA. Let a. b. e. d be positive integers with (/ > b > c > d such that (1(' + bd = (h + d + a - c)(h + d - 0 + c). Problem 6 from the 200 I International Mathematical Olympiad asks for a proof that ab + cd is composite. Find the minimum possible number of prime factors (not necessarily distinct) of each of ab + cd. ae + hd. ad + he. Solution by Doyle Henderson, Unil'ersity ot'Nebraska, Omaha, NE. The answers are 3. 3, and 2, respectively. When (a. h. c. el) = (11.9.6. I). we have (/b + cd = 3 . 5·7, LIe + bd = (9 + I + 6 - I) (9 + I - I + 6) = 3 . 5 c, and ad + he = 5 . 13. so these values are achievable. simultaneously. Let a = h + d + 0 - e and 13 = h + d - 0+ e. We first show that ad + be is not prime. Since af3 = a(b + d + a - a) + hd = aa + (a + 17)(0 + d). a I (a + b)(a + d). As a > (I + b. a shares a prime factor PI with (/ + h. Since e - d = (a + b) - a. PI I (e - d). Now (ld + he = d(u + b) + b(e - d). so PI I (ad + he). But PI f::. ad + be because PI .s: a while ad + be = d(a + b) + b(e - d) = da + (d + b)(e - d) > a. To complete the proof it suffices to obtain primes P2 and Pl. not necessarily distinct. such that Pc/J.' properly divides both ab + ed and ac + hd. Now c - d < h < a. so a + 17 = a + e - d < a + (a + b)/2. which gives u + h < 2a. In tandem with the earlier result a I (u + b)(e + d). this forces a to share a prime factor P2 with a + d, and since b - e = a - (a + d). Pc I (h - c) as well. From ab + cd = b(a + eI) - d(b - c). we conclude that 1'2 I (017 + cd). Next, we note that 13 I (e + b)(e + d) because af3 = ae + bd = (h + d + cf3)c + bd = -f3c + (h + c)(e + d). We claim that 13 and (h + c) have a common prime factor Pl. Otherwise,f3 I (e + d). and then 13 I (a - b) because a - h = c + d - 13. Let kl = (c +d)/f3 and ko. = (a - h)/f3. Since 13 > O. e +d > a - b so that kl > kc 2: I. We now compute
2af3
= 2(ac + bd) =
(a
+ b)(e + d) + (a
= (a + b)k l f3 + ko.f3(e - d). and cancelling 13 gives 0 = (k l - 2)(a + b) + (k 2 + - b)(c - d)
Subtracting left from right here 2)(e - d). which is a contradiction in view of kl 2: 2. With Pc and pl as described, we have P2 I a and 1'1 113 so that PcP1 I (ae + bdl. Since a > b - e 2: P2. we have af3 > P2P,. Thus (lC + hd must have another factor in its prime factorization. Since c + b - 13 = a-d. /J., I (a - d). Now oh + cd = ue + bd + (h - e)(a - d). so P2p, I (ab + cd). From 017 + cd > oe + bd > Po./J." we conclude that there must also be another factor in the prime factorization of ab + cd.
Also solved by R. Chapman (U. K.). O. P. Lossers (Netherlands). J. Robertson. R. Stong. H. T. Tang. GCHQ Problem Group (U. K.). and Ihe proposer.
The Stationary Distribution of a Tridiagonal Row-Stochastic Matrix 11032 [2003.637]. Proposed by Mark Pinsky, Northwestern Unil'ersity, E\'{/nstoJl, IL. For each nonnegative integer N. let P be the (N + 3) x (N + 3) tridiagonal matrix with entries P ij (for i. j E {G ..... N + 2}) defined by p o. t = PNc"cN+1 = L i pi.i-I = 'N~o. ,V+'-i t' . Th"IS IS a Pi.i+1 = N+2' or I .s: I. .s: N + I . an d Pi.} = () ot herwise. row-stochastic matrix. The stationary distribution JTo •..•• JTi\'+2 is the solution of the system ,\,+2
L
JTi Pi)
=
JT)
(O.s: j .s:
N
+ 2)
i=1I
May 200S]
PROBLEMS AND SOLUTIONS
469
with L~~;][j = I. In terms of N, find ][ and the positions in ][ of its maximal and minimal entries.
Solution by Nicholas Singer. Annandale, VA. With constant c yet to be determined, let ][0 = ][ N +:' = 1/ c and, for I :s k :s N + I, let N+2
We compute ][ P. For the extreme values, (][ P)o
(][ PLv+:' ( ][ P)I
N+I
I
+ I) N + 2
c
N+2
= ][1 PI.IJ =
c(N
= ][,'HI PN ·11 . N +:, = = ][(Jp,(J 1 + .
][0
Po
- -
--- = - = ][0.
N+2 N+I c(N + I) N + 2
I
1
N+2
=-+ c
c(N
N+2 N -------+ -I c(N+I)NN+2
I ~
=
=
c
= ][,'H2.
N
= + I)N --N +2 N+2
=
N+2 c(N
+ I) = ][1.
][N+I.
c(N+I)
If2:sk:sN.
N+2
k-I
c(N+I)C~JN+2
N+2 N+I-k + -------:-;----c(N+I)C)
N+2
N +2
- - - - - : - ; - = ][,.
c(N Thus ][ P
+
I)C'~I)
r
= ][, as required. To normalize the vector ][, set c = 2 + :~ :7 L~~ 1 C~ 1
J
I.
The unimodality of C'~ implies that ][k has equal maxima at k = I and k = N + I (these are equal when N = 0) and has a minimum at k = I + N /2 (when N is even and at least 2) or at k = I + (N ± I )/2 (when N is odd and at least 3). When N = 0 or N = I. the minimum occurs at k = 0 and k = N + 2.
Editorial comment. Olaf Krafft and Martin Schaefer observed that their paper "Mean Passage Times for Tridiagonal Transition Matrices and a Two-Parameter Ehrenfest Urn Model," 1. Appl. Prob. 30 (1993),964-970, solves a more general problem. Also solved by R. A. Agnew. S. Amghibcch (Canada). M. Andreoli. D. Beckwith. R. Chapman (l!. K.). w. Chu (lialy). 6. Ciaurri & L. Roncal (Spain). J. Clark. K. Dale (Norway). D. Donini (Italy). E. A. Herman. W. Janous (Austria). 0. Krafft & M. Schaefer (Gcrmany). Y Lee (Korea). J. H. Lindsey II. D. M. Rosenblum. R. Stong. L. Zhou. GCHQ Problcms Group (l!. K.). NSA Problems Group. and the proposer.
Expansion By Inclusion-Exclusion 11033 [2003, 7421. Proposed bv M. N. Deshpande and R. M. Weiukar. Institute of Science. NagpuI; India. Let
~
.(m +
P(m.n. r) = L.,.,(-I)k k=()
470
2(k
11 fl
+
I)) (r)
..
k
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
Let m, n, and r be integers such that 0 :s r positive and that L~=o P(m, 11, r) = ('":").
:s
11
:s
m - 2. Show that P(m.ll, r) is
Composite solution I by Richard Stong, Rice University, Houston, TX, Jan,Y C. Bin;;" University of Bern, Bern, Switz.erland, and the editors. We prove a more general result. A sports league with teams T 1 , ••• , Til + 1 is expanding by adding one more team. The new team will select p players from a pool consisting of new players and s unprotected players from each of the n + I current teams. There are clearly C+I~'+II) possible selections. Let Q(e, n, p, r, s) be the number of these selections such that Tr+1 is the first-indexed team whose players are not used. Note that Q(£, 11, p, r, .1') is positive whenever r :s p :s 11.1', and Q(£, 11, p. r, .1') = (+11;:+11). With the requirements of avoiding players from ~+ 1 and using at least one player from each of T1 , ••• , ~, the inclusion-exclusion principle yields
e
L;'=o
~ I )k Q(£,n,p,r,s)=L.../-
(r) (£ +
k=1I
. k
k») .
s(11 -
P
Thus
e
Setting = m - 2 - 11, P = n, and .I' = 2 yields both assertions of the problem, since P(m,n,r) = Q(m-2-n,n,n,r,2). Solution II by BSI Problems Group, Bonn, Germany. We use the generating function i .x, -- (I - x )-111+11 , we d e fi ne d b Y FlI.r (X ) -- '\"' L...1II:>:11 P( m, n, r ).111 x . S'Ince '\"' L...J:>:II ("+ II ).i have
t(-ll(:)
FIIAx) =
r
-L
(11 +111 -n2 (k + I»)XIII
lII=clk+ II
k=O
-
t
(-1/
(r) xck + 2 .k (I -
k=()
X)II+I
o
x- - - - ( 1 _X2)'.
(1 - X)II+1
Using the "coefficient operator" [Xl],
P(m, n, r) =
[X
Ill
-
2
](1
+ x)'(l
- X)"-1+1 =
2 - k) L..(r)k (17 - r+ m-. '
111-2 k=1I
11 -
The summands and thus P (m, n, r) are positive. Also, since II
"
~ 1=0
2
P(m, n, r) = [XIII]
(I
= [xlll](1
x -
X
II
" 0 - x 2)' -
)"+1 ~ 1=11
11,
PROBLEMS AND SOLUTIONS
+ 2,
1-(I-x 2 )11+1 [xlllj---:-------:--:-(I - X)II+1
- X)-III+Ji - [Xlll]( 1 + X)"+1 =
Editorial comment. When p is an integer and at most
May 2005]
-
171 :::: /1
1
n (
+11 In) + O.
the identity of Solution I gen471
fJ
eralizes, using standard identities. Let I.X and
~ ~ (_I)h (I.X - fJ ~)k + I») =
L." (_1),(11 k
k =()
+
I) (I.X -
+I
G)
ti(k +
be any complex numbers. We compute
~ ( _ I)' (I.X - fJ ~k + I») ~
=
I») = ~(_I)i+1 (n + I) (I.X -
p
i=
In the last step, we use the fact that (U~;i) is a polynomial in
fJ.i) = P
.I
1
G)
.i
(I.X).
P
with degree at most n.
-I)i
Inclusion-exclusion yields L';~:)( (";1)/ = 0 when t :::: n (there are no sllI:jective functions from a t -set to an (/1 + I )-set), and the last step is a linear combination of this identity with the term .i = 0 left out. Setting I.X = 111 + nand ti = s yields the identity of the problem. This argument does not directly give the positivity statement. A number of solvers gave inductive proofs based on recurrences. Also solved hy S. Alllghihech (Canada). M. Bataille (France). D. Beckwith. R. Chapman (U. K.). W. Chu & M. R. Maglio (Italy). P. P. Dalyay (Hungary). J. P. Grivaux (France). W. Janous (Austria). O. P. Losscrs (Netherlands). Y. H. Mikaelian (Armenia). I. Sofair. R. Sprugnoli (Italy). A. Stadler (Switzerland). 1. H. Steelman. C. N. Swanson. A. Tefera & A. Zeleke. W. F. Trench. J. T. Ward. GCHQ Prohlem Solving Group (U. K.). "FejentalaltuKa" SIeged Problem Solving Group (Hungary). and the proposers.
Evaluation of an Integral 11035 [2003, 7421. Proposed bv Omer Yayenie, Murray State University, Murray, KY Find all reall.X and all positive ti such that
1
"/2
.
0
efiU"I-\)cos(2(1-I.X)x +fJsin(2x»dx =
II
sin(I.XIT)
2fJl-u
If! -dx. e-\ ()
xO'
Solution by Joe Geiman, student, McDaniel Col/eRe, Westminster. MD. The integral on the right diverges if I.X ~ I. We show that the equation is valid whenever I.X < I and ti > O. The left side is A = -~ll I
2
Since
ti cos (}
1" ()
exp(ticose)exp(i(e(l-I.X)+tisine))de.
is even and (} (I - I.X) +
If"
A = -
fJ sin 8
is odd, we have
exp(ticos8)exp(i(8(1-I.X) +tisin8)).
4" This may be written as a contour integral
A = ti
I
u -
4i
1. e dz. fr.(f!) ZU C
where r(ti) is the circle centered at 0 with radius ti. If I.X is a (nonpositive) integer, then the integrand is analytic inside the contour, so A = O. When I.X is an integer, the proposed right side is also O. If I.X is not an integer. then the denominator z" must be understood as the branch of a multi-valued function that is defined in the complex plane with the negative real axis removed and corresponds to the branch of the argument taking values between -IT and IT. To evaluate this contour integral, we deform the contour, as explained for example in Titchmarsh's Theorv FUllctiollS. Let 8 > 0 be very small. and consider a closed
or
472
© THE
MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
contour made up of four parts: the circle r (13) going counterclockwise from argument -Jr to argument Jr, the line segment T (Jr l along the negative real axis (with argument Jr) from -f3 to -8, the circle r(8) going clockwise from argument Jr to argument -Jr, and the line segment T (-Jr l along the negative real axis (with argument -Jr) from -8 to -f3. The contour integral around the whole path is 0 because the integrand is analytic inside it. Therefore we have
The integral around r( 8) goes to 0 as 8 --)- O. so we get
Therefore, as required. A
13,,-1
.
= - - 2i sm(JrO'l 4i
If! ()
e-\ - dx
=
x"
sin(JrO') 1
213 -"
If! ()
e- r -
x(l'
dx.
Also solved by S. Amghibcch (Canada). R. Chapman (U. K.). M. L. Glasser. J. Grivaux (France). 1. A. Grzesik. G. Lamb. O. P. Lossers (Netherlands). A. Stadler (Swiucriandl. R. Stong. N. Thornher. Y. Zoluin. (Hungary). and the proposer.
Shrinking Frcchet Differentiable Maps 11037 r2003. 743]. Proposed by M. L. 1. Halltlls. Technical Vnil'ersity o{Eindhoven, Eindhoven, The Netherlands. Let Band C be complex Banach spaces. and let F: B--)C be a Frechet differentiable map satisfying II F (x) II s Ilx II for all x E B. Show that F is linear. Solution by Julien Grivallx, studellt, Vnil'ersite Pierre et Marie Curie, Paris. France.
We first prove a lemma, which is an extension to Frechet differentiable maps of a classical result. Lemma. Let C be (/ complex Banach space and let ¢: C --)- C he a Free-het differentiable map. [{there exist positi!'e Ilumhers A LInd B such that II¢(:::)II SA + BI:::lfor all Z E C then there exist A.. /-1 E C such thqJ ¢ (::) = A. -t::/-1 for all :: E C Proof We shall prove that the function ¢ defined by ¢(::) = ¢(::) - ¢(O) is linear. If not, then there exist::. w.~. ( E C with ( 1= 0 such that ¢(::
+ ~lJIl -
¢(:::) -
~¢(w)
= (.
By the Hahn-Banach theorem there exists a continu£us linear form 0': C --)- C such that 0'(1;) 1= O. Since ¢ is Frcchet differentiable. 0' 0 ¢ is holornorphic. and 10'
0
¢'(::ll S 110'11 ·11¢'(:::lll
+ II¢(O)II) s 110011(11¢(olll + A + BI::I). S 110011(11¢(::lll
Therefore 0' 0 ¢': C --)- C is affine and holomorphic. Because ¢'(O) = 0, it follows that 0' 0 ¢' is linear. Hence (0' 0 ¢')(::: + ~wl = (0' 0 ¢')(::) + ~ . (0' 0 ¢,)(w). This proves that O'(n = 0, a contradiction. We conclude that ¢ is affine. D May 2005]
PROBLEMS AND SOLUTIONS
473
Now we turn to the proof of the problem. Let a, b by
E
B be given. Define
ce --+ ce
11
=
E
A(a, b)
C such that for all
zE
ce
+ Zfl(a, b).
If a = 0, then z = 0 yields A(O, b) = 0, and z = I yields fl(O, b) = F(b), so for all b E B and all z E C, F(zb) = zF(b). Now for general a and b, if z = 0 then A(a, b) = F(a). If z f= 0, then F(a + zb) = F(a) + Zfl(a, b). Multiply by liz to get
F (lIz
+ b) -
F(a)/z = flea, b).
Since F is continuous,
flea, b) =
lim 1;I-----,o.(XJ
Therefore F(a so F is linear.
+ zb)
= F(a)
[
F ( -a z
+ zF(b).
+ b)
F(a)] - - = F(b). Z,
This is true for all a, b E B and all z E
ce, so
Also solved by R. Bagby. J. A. Baker (Canada). M. W. Botsko. R. Chapman (U. K.). E. A. Herman, N. Passell, R. Stong, R. Tauraso. '"f'cjcntalaltuka'" Szeged Problem Group (Hungary), GCHQ Problem Group (U. K.), and the proposer.
Morley's Triangle and Lemma ll038l2003, 743]. Proposed by Floor van Lamoen, Goes, The Netherlands. Let ABC be a triangle, and let LAB, L BC , and LCA be the entire lines of which AB, BC, and CA are segments, respectively. Let a circle intersect these lines at A" A2 on L BC, B" B2 on LAc, and C" C 2 on LAB, in such a way that the chords A, B 2, B, C 2 , and C, A2 are parallel. Prove that these three chords are also parallel to one of the sides of Morley's equilateral trisector triangle in A Be.
Solutioll by Michel Bataille, Rouen, France. We make use of Morley's lemma: A line l is parallel to a side of Morley's equilateral trisector triangle in 6ABC if and only if L.(AB, I)
+ L.(Be, I) + L.(CA, I)
= 0
(mod Jr),
(1)
where L.(u, v) denotes the directed angle/rom line u to line v. The lemma can be found in I.-P. Ehrmann & B. Gilbert's paper "A Morley Configuration," Forum Geometricum 1 (2001),55. Take for I the common direction of the lines A, B 2 , B, C 2 , and C, A 2 , and note that L.(AB,/) = L.(C,C 2 ,C,A 2 ), L.(BC,/) = L.(A,A 2 ,A,B2 ), and L.(CA,/) = L.(B, B 2 , B, C 2 ). Also, depending on the order of the three parallel lines, it may occur that L.(C,C 2 , C,A 2 ) = L.(A,C2 , A,A 2 ) and L.(B,B 2 , B,C2 ) = L.(A,B 2 , A I C 2 ), as angles subtending the same arc of circle. In other cases, two such angles are replaced by the supplements. In every case, L.(AB, I) + L.(BC, I) + L.(C A, I) = L.(A, C 2 , A, A 2 ) + L.(A,A 2 , A,B2 ) + L.(A 2 B 2 , A,C2 ) = L.(A,C 2 , A,C2 ) = 0 (mod rr). Equation (1) is therefore valid for the selected line I, and the conclusion follows. Editorial comment. The proposer remarked that the lemma can be proved by showing that the three sides of Morley's triangle satisfy the condition and noting that there can only be three directions for I that satisfy the condition. As an alternative reference, we cite pages 24 and 25 of Coxeter's Introduction to Geometry,. Also solved by S. Amghibech (Canada), R. Chapman (U. K.), R. A. Simon (Chile), R. Stong, L. Zhou, '"Fejentalaltuka'" Szeged Problem Solving Group (Hungary), GCHQ Problem Group (U. K.), and the proposer.
474
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
REVIEWS Edited by Gerald B. Folland Mathematics Department, Unil'ersit\'
of Washington,
Seattle, WA 98195-4350
Ahstract AlgeiJra. By Ronald Solomon. Brooks/Cole. Belmont. CA. 20()3. xii + 227 pp .. ISBN
0-534-39996-7. $99.95. Integers, Polynomials and Rings. By Ronald S. Irving. Springer. New York. 2004. xv + 284 pp .. Cloth: ISBN 0-387-40397-3. $79.95. Paper: ISBN 0-387-20172-6. S39.95.
Reviewed by Daniel J. Madden Where do college mathematics textbooks come from? My guess is that most midlevel texts begin as instructors' lecture notes. The process begins when a mathematician is assigned to teach a course. sayan introductory course in abstract algebra. After choosing an appropriate text to follow. he finds that the material in the text needs certain adjustments to fit his particular style and the students in the class. By the time the course ends. he has a much clearer idea of what the course should be and asks to be reassigned for another try. Each time the instructor teaches the class. the course improves as ideas become better organized, explanations are improved. and exercises expanded. Classroom experience helps set the limits of the course. Some topics might be sacrificed to make the subject accessible to the students, or fully developed ideas in the class notes might be replaced to give other topics a try. Students have almost as much input into the final course as the instructor and the original syllabus. In time. a course emerges with clear and accessible goals, a course that fits the students' needs and abilities. The lecture notes have become so organized that they are almost a text already. It certainly seems worth the effort to polish the writing and turn the notes into a publishable manuscript. This way the mathematician can allow others to benefit from all this effort. Judging by the introductions of both Ronald Solomon's Abstract Algebra and Ronald Irving's Integers, Polynomials, and Rings, both books came out of a process very similar to that just described. As part of the rationale for their texts, both authors cite their experience with an undergraduate course in abstract algebra that was generally the last course in the subject that their students took. Both point out that many students using their texts will probably be preparing for careers as secondary school teachers. The authors agree that the way to get students past the abstract nature of the material is through guided exercises that force students to construct important proofs and examples. They also agree that these exercises work best if they are integrated into the exposition and engage the students in the development of the mathematics. They both use the history of the subject to introduce many of the ideas and problems, although Solomon emphasizes history more heavily. The results of so many agreements, however, are two very different textbooks. Solomon and Irving come to quite distinct solutions to the same problem. One reason for this is that Ronald Solomon and Ronald Irving each tackled his own local problem in his own style. Basically, there are two ways to mOve a load. You can hitch it to a rope and drag it along behind you, or you can get behind and push it along. The best tactic is to switch back and forth whenever necessary. but most people prefer May 2005]
REVIEWS
475
to rely primarily on the method that comes natural to them. Solomon likes to pull. He presents new ideas quickly and forthrightly as if they are perfectly natural. He pulls the students along through explanation and example as they become more familiar with the concepts. His overall approach is ultimately expository, even though he does use exercises to get students involved in developing the theory. Mostly, the text is laid out as a narrative, and the exercises are not always critical to the mathematical exposition. Solomon presents students with abstraction, and then explains the need for it and shows its value by answering historical questions. In contrast, Irving's text is much more exploratory. In his approach, the exercises are essential in the development and the presentation of the material. Major ideas are introduced in exercises, and students are pushed into accepting a general theory that ties together concrete examples that they already understand. Many important and valuable proofs are only outlined, and the student truly needs to fill in the details. Irving urges his students to accept the increased abstraction by building on the experiences gained from his exercises, and each step in the exposition presumes a good understanding of the preceding exercises. Another reason for differences between the books is that they come from experiences with distinct groups of students. Neither author addresses the issue of prerequisites for the course specifically. Nevertheless, Solomon's students are definitely expected to be more mathematically experienced than Irving's. In his "Section 0: Background," Solomon outlines basic set theory, including the language of functions and binary operations, the pigeonhole principle, equivalence relations, partitions, and induction. This takes less than six pages. Later in the book, Solomon has no qualms about calling on ideas from linear algebra such as linear independence and finite bases. He shows no fear in introducing related topics and further abstraction as evidence of the values of the main ideas. In contrast, Irving covers the basics he needs as part of the algebra itself. He eases students into the ideas of set theory, but only as they are absolutely necessary to the algebraic content. He even shies away from abstraction that would offer immediate advantages to the exposition. In the introduction, Irving makes it clear that he has a class dominated hy prospective secondary school teachers in mind. whereas Solomon aims more for general math majors, including people interested in secondary education. (He says, "I like to think of the typical student as a future high school math teacher." I would like to think he is right. but have a hard time doing so given the depth and spread of material he covers.) No doubt the ditTerence between the students has had an etTect on the material, but I suspect that the authors' different styles are evident here as well. At some point in any course, a topic surfaces that is either complicated or technical, essential to the development of the main theme, but in the end not directly part of the principal subject. For the sake of illustration, let us agree that the idea of equivalence relations is such a secondary topic in abstract algebra. One approach is to introduce the technicalities gingerly through example. but go over them as thoroughly as possible at the first opportunity. The teacher spends the time necessary to illustrate and explain the ideas in great detail so that students arc prepared when the topic reappears later in the course. When it does, the instructor reinforces and supplements students' understanding. If one follows this path, the complete abstract version does not appear until the underlying ideas are fully developed. One might say that the students are pushed toward the abstraction. Another approach is to introduce the idea as a simple matter, and pull students along until they get used to it. The topic is presented as just another convenience; the details and technicalities are covered only when they are immediately necessary, and new aspects of the definition that arise later are treated as a natural consequence of the original ideas. Irving's text usually takes the former approach; Solomon seems to prefer the latter. In our example of general equivalence, Irving slowly moves to-
476
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
ward it using very thorough discussions of integer and polynomial congruence, the resulting partitions they cause, and thc impact congruence has on definitions of functions and operations. As mentioned earlier, Solomon includes the general set theory definition as part of his brief account of background information. Yet even though it is background, he does take the time to illustrate integer and polynomial congruence almost as thoroughly as Irving. Also, he chooses to refer to equivalence classes only in passing when discussing cosets of subgroups. Both authors recognize the problems that students can have as they become comfortable with this simple but abstract idea. Each chooses to deal with these problems in a way that best fits his individual style. Irving moves slowly and cautiously, going over the details carefully and thoroughly. Solomon jumps in with a matter-of-fact attitude. Solomon's approach allows him to venture into more daring materiaL while Irving's tack requires a more select range of topics. The most surprising thing about these two abstract algebra texts is the difference in the material they cover. There is a central core common to both: rings, domains, the Euclidean algorithm, fields, finite fields, polynomial rings, roots of unity, roots of general polynomials, and the fundamental theorem of algebra. Nevertheless, given the manner in which the topics are covered, this really appears to be only a coincidence. Irving's main focus is on ring theory, and specific commutative rings at that. The central motivating problem throughout is solving polynomial equations. This does not require ideals, so they do not appear. Methods for solving linear, quadratic, cubic, and quartic equations are given, and their relevance to various coefficient rings is discussed in detail. This leads into other topics like quadratic reciprocity and characterizing irreducible polynomials over the real and complex numbers. Irving explores the arithmetic of Gaussian integers as a good example of unique factorization in an unfamiliar situation. Little is mentioned that isn't thoroughly explained. As a result. Irving produces a self-contained textbook for a well-defined course. Solomon builds along a path of historical problems and their solutions, and group theory plays the prominent role. Mathematical symmetry is introduced at the beginning through geometry and permutations. Eventually the definition of an abstract group appears, and soon afterward one finds a very concrete version of the Galois group. The Gaussian integers are used to prove Fermat's two-squares theorem. This is followed by an optional section on the quaternions and a left-ideal factorization proof of the foursquares theorem that is too deep for most undergraduates. A concrete construction of the Galois group of a cyclotomic field is motivated by Gauss's interest in Euclidean constructions. Solomon ends up with the Galois correspondence and its implications about the roots of higher degree polynomials. He repeatedly mentions ideas from a wide range of areas of mathematics, often with limited explanation, simply as something interesting to be noted in passing. His text is definitely meant to extend far past the confines of any actual undergraduate course. In terms of the mathematical topics covered, Solomon's text seems much more ambitious than Irving's, but this is not really the case. Solomon outlines much more than a year-long course, but in his introduction he suggests how to trim it to fit into one semester. His notes to the instructor also recommend that certain sections "be given a light, heuristic treatment." This is definitely in keeping with his goal of creating a course that is meant as an introduction to the ideas of abstract algebra but not a preparation for graduate studies. Solomon can cover more mathematical topics because, in the end, his aim is mostly to skim the surface, only occasionally dipping into deeper mathematical ideas. He offers a smorgasbord of deep ideas that allows instructors to pick topics in amounts that suit their appetites. Irving's sparser text is just as ambitious as Solomon's, but in a different way. It provides a reasonable one-semester course. May 20051
REVIEWS
477
Irving.s aim is for his students to develop a working understanding of all the ideas covered in the course. His text uses detailed examples and exercises to help students understand where the abstract ideas come from and exactly why they make the previous problems easier to solve. In the end, Irving expects students to have the skills necessary to use the ideas developed in the course in new and different ways. There is one way in which the texts are similar, one that bothers me. Both authors use historical references to move their exposition along. Commendably, both authors have researched the true history and have avoided repeating, or creating, the mathematical folklore that appears as historical fact in too many textbooks. Still, neither book contains a list of references. Perhaps mathematical references are unnecessary in such personal approaches to well-known algebra, but the historical parts really should have general citations that would allow the reader to check the author's account and learn the whole story. This is a particularly glaring omission in Solomon's text, since it follows a historical approach and is meant to serve as a continuing resource after the class is over. For an instructor who has the necessary discipline to use an exploratory approach, Irving's Integers, Polynomials, and Rings is a great choice. The text is self-contained and could be read and understood without other assistance. This book would be a good choice for a teacher who can step back and let the students work through the material themselves, adding ideas only when the students get bogged down. With the instructor pushing the students along at a well-planned pace, the full text could be covered in a semester, as intended. The students would emerge with a good idea of the basic ideas of abstract algebra. However, Irving's book would not be right for instructors who cannot stop themselves from giving away the secrets when things slow down. Irving really does expect the students to have suffered through all the (not unreasonable) work necessary for the exercises. Instructors who prefer to pull rather than push would be much better off with Solomon's Abstract Algebra. This text is better for those whose natural inclination is to lecture and to pick and choose their own topics and use their own lines of attack. Solomon gives an instructor ample opportunity. and reason, to add explanations and points of view to the textual material. The text offers lots of choices and contains many engaging ideas. It is not a good text for an independent learner, but with a good teacher leading the way, it presents a very complete introduction to modern algebra. These texts were developed by two obviously talented teachers to match their different styles and to reach different groups of students. Each is very good in its own category. The result is that the rest of us have two more great choices for texts to fit our own styles and our own students. University of Arizona, Tucson, AZ 85721-0089 [email protected]
478
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112
Introduction to the Mathematics otMedicalll1lllgillg. By Charles L. Epstein. Pearson Education, Upper Saddle River. NJ. 2003. xxvii + 739 pp .. ISBN 0-13-067548-2. $92.
Reviewed by John Sylvester Introduction to the Mathematics of" Medical Imaging is a surprising and delightful blend of pure and applied mathematics. I say "surprising" because I have come to expect that mathematical rigor and serious attention to applications can't-or at least don't often-coexist. It is unfortunate that disparate mathematical communities have evolved in such a way that many of us who value rigorous proofs often pay little attention to scientific needs, while those who carefully connect and tailor mathematical problems to technological needs are too ready to dismiss any article containing the words "theorem" or "lemma" as unrealistic. I am delighted that this is not at all the case in Charles Epstein's book. which should serve as a prototype for the successful integration of rigor and real applied mathematics. It is, to be sure, a mathematics book. As the author points out in the preface, "The book is written in the language of mathematics, which, as I have learned, is quite distinct from the language of physics or the language of engineering." He should go further and point out that the student (or colleague) who studies this book will gain a substantial basic vocabulary that will put him in a position to understand the other two languages. The target audience of the book is advanced undergraduate mathematics students, and the course in which it is used could be an alternative to a standard Fourier analysis course. Chapters 4, 5, 6, and 7 treat the Fourier transform, convolution, the Radon transform, and Fourier series, respectively. The treatment is about what one would expect from an advanced undergraduate or beginning graduate text on these subjects. The theory is developed in L 2. The perspective is very much that of introductory functional analysis, emphasizing the relationship between smoothness and decay of the Fourier transform or Fourier series coefficients. It includes a substantial discussion of the Hilbert transform and weak derivatives that serves to relate the Fourier and Radon transforms. The rigorous mathematics is complemented by chapters entitled "A Basic Model for Tomography," "Reconstructions in X-ray Tomography," "Imaging Artifacts in X-ray Tomography," and "Algebraic Reconstruction Techniques," which put as much effort into carefully developing the models for X-ray computerized tomography (CT) machines as has been put into developing the mathematical techniques. The application is placed on an equal footing; it is not merely an excuse for doing the mathematics. X-ray tomography is perhaps the best example of a technology that has been revolutionized by sophisticated mathematics. Not only has interesting mathematics been developed to interpret the technology, but technology has developed in response to the mathematics. The author includes enough of the history of the subject to relate this intriguing story. The mix of illustrations, including mathematical graphics, CT-scans, and true anatomy, give a nice perspective on how far the field has come and how much is still left to accomplish. I haven't taught a course from this text, but I am anxious to do so. What appeals to me most is not the chapters on medical imaging-my personal interests lie in other applications-but the chapters in the middle. Building the bridge between the theoretical development of the Fourier and Radon transforms and the practical issues in tomography requires the material in chapters 8, 9, and 10, which treat sampling and
May 2005]
REVIEWS
479
filtering. These topics fit very naturally into a course on Fourier analysis, and, at little additional cost. open up a whole world of applications and a new perspective that motivates us to reinterpret and extend these tools in different ways. There is a vast literature in engineering and the physical sciences, including signal processing, remote sensing, electromagnetics, and acoustics, that is inaccessible to many mathematicians even though the main tool used is Fourier analysis. These areas are closed to many of us simply because of a difference in language. Learning the basics of sampling and filtering goes a very long way towards bridging that gap. Sampling studies the effects of discretization: if we can only measure a signal every microsecond, we must understand how the basics of Fourier analysis change when we restrict functions to discrete, but often numerous, sets of points. What properties of the Fourier series coefficients continue to make sense under the equivalence relation that two functions, originally defined on the circle, are equivalent if they have the same values at the nth roots of unity? Filtering is in some sense complementary: it treats the averaging effects of measurement, studying the range of convolution mappings, the image of this range under the Fourier transform, and how properties of the convolution affect the resolution of an image. The definition of a point spread function appears here. This is another example of a concept that can be introduced simply and easily in terms of what we already teach but adds significantly to the student's repertoire. Introduction to the Mathematics oj"Medicalllllaging also contains chapters entitled "Probability and Random Variables," "Applications of Probability," and "Random Processes." These chapters are not related to Fourier analysis, but they address an issue that dominates many applied problems: the best theories and algorithms may be impossible to use until we understand how they are affected by the unavoidable presence of noise. Randomness thus plays an essential role in medical imaging problems, and, with its inclusion, the book has treated every significant aspect of the subject. In conclusion, Charles Epstein's book illustrates, with crystal clarity, that rigorous mathematics needn't be separate from applied mathematics. Many of us can benefit from reading it. With this book, and more books like it, the next generation can come away with a more integrated view of mathematics and the world of science and technology. Unil'"rsi/\' o{WasiJing/on. SI'II//il'. WA 98195-4350 s\,Il'l's/i!ll/i/{I/iJ.ll·a.liJing/on.l'dll
480
© THE MATHEMATICAL ASSOCIATION OF AMERICA
[Monthly 112