The Mathematical Intelligencer encourages comments about the material in this issue. Letters to the editor should be sen...
9 downloads
423 Views
27MB Size
Report
This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
Report copyright / DMCA form
The Mathematical Intelligencer encourages comments about the material in this issue. Letters to the editor should be sent to the editor-in-chief, Sheldon Axler.
Calculus Reform I would like to comment on the Opinion column on Calculus Reform by Professor Protter, which appeared in the Fall 1990 Mathematical Intelligencer. I wholeheartedly agree with Professor Protter with respect to the current excessive size and flashiness of calculus textbooks, the exaggerated influence of publishers, the irrelevance of the so-called "applications," and the fashionable, but as yet unestablished, usefulness of computer aids. However, I disagree with his rejection of the "apprenticeship" model of studying, based on practice, imitation, and interpersonal communication. Professor Protter claims that the optimal method of studying consists of absorbing the principles on which the various techniques are based and developing correct applications of those techniques in an individual way. In my opinion, this is true for students planning to major in mathematics or in other areas of the socalled "hard sciences." However, Professor Protter himself admits that currently "the overwhelming majority of students taking calculus do not become mathematics majors." Therefore, it is more important for these students to understand the types of problems that can be solved by calculus methods and how to solve them, than to acquire a full appreciation of the principles on which they are based. I consider myself a competent driver (based on my accident-free record) and can operate m y car in all kinds of traffic conditions. I do have a general understanding of how the engine of a car works, but I do not know all the principles on which driving is based, nor did I learn to drive by analyzing such principles and then deriving the combination of actions needed to move a car safely through traffic. I believe that our love for the mathematical principles of calculus should not obscure the fact that other professionals like and use calculus for its practical applications rather than its internal beauty. To be successful and appreciated by our varied audience we must understand and respect this fact. This brings me to point out another weakness I have noticed in most calculus textbooks but one that was not reported by Professor Protter, namely, the great breadth but shallow depth of applications. Perhaps we 4
should follow the examples of statistical textbooks and develop more area-specific calculus textbooks. I am sure that a chemistry student is interested in methods and applications different from those of an economics student or a social sciences student. A textbook that tries to be appropriate to everybody is likely to interest nobody. On the other hand, area-specific textbooks could concentrate on appealing problems and convince students that calculus may be useful to them and that it is therefore worth investigating. Of course, this should be accompanied by the offering of separate, area-specific sections of calculus courses by colleges and universities. I realize that both ideas are being applied to some extent in some places, b u t - - i n m y o p i n i o n - - t h e y should be practiced more widely. I look forward to a continued lively debate on this matter, so that we may keep improving what we have to offer to our students. Roberto Bencivenga Learning Assistance Centre Red Deer College Red Deer, Alberta T4N 5H5 Canada
Boy's Surface Ulrich Brehm's paper about the Boy surface (Mathematical Intelligencer, Fall 1990) reminded me that a picture of the Boy surface using "windows" appeared in an earlier Mathematical Intelligencer article by George K. Francis and Bernard Morin (volume 2, number 4, 1980) a b o u t e v e r s i o n of t h e sphere. After reading that article I m a d e a p l a s t e r m o d e l , s h o w n h e r e , of Boy's surface. Benno Artmann Fachbereich Mathematik Technische Hochschule Darmstadt 6100 Darmstadt Federal Republic of Germany
THE MATHEMATICAL INTELLIGENCER VOL 13, N O 2 9 1991 Spnnger-Verlag New York
The Opinton column offers mathematiczans the opportunity to write about any zssue of interest to the international mathematical community. Disagreements and controversy are welcome. An Opinion should be submztted to the edztor-in-chief, Sheldon Axler.
ICM 90 Walter Rudin
The most recent International Congress of Mathematicians was held in August 1990 in Kyoto, Japan. Almost every speaker at the opening ceremony emphasized that this was the first ICM outside Europe or North America. Our Japanese hosts did a beautiful job organizing this event (there were about 3,800 participants), all of which took place in the well air-conditioned Kyoto International Convention Hall. The lecture rooms accommodated the audiences without difficulty, the sound systems worked, and food and drink were easily available, good, and inexpensive. There was plenty of r o o m to stand a r o u n d (or sit) and gossip, and transportation to and from the hotels ran smoothly.
One of the most pleasing aspects of this Congress was that many more Soviet mathematicians came and lectured than was the case at any of the earlier ones (Edinburgh, Stockholm, Nice, Vancouver, Helsinki, Berkeley) I have attended. Vladimir G. Drinfeld, Vaughan F. R. Jones (who lectured on von Neumann algebras and knot theory in his "All Blacks" rugby outfit), Shigefumi Mori, and Edward Witten were the Fields Medal winners. The Nevanlinna Prize went to A. A. Razborov. As usual, there were three categories of lectures: (1) 15 "'Plenary Lectures" (60 minutes), aimed at the Congress as a whole, (2) 250 "Section Lectures" (45 minutes), split among 18 sections, corresponding to
The Golden Pavilion, Kyoto. 6
THE MATHEMATICALINTELLIGENCERVOL 13, NO 2 9 1991 Spnnger-Verlag New York
v a r i o u s subfields of M a t h e m a t i c s , a n d (3) a large n u m b e r (I didn't c o u n t them) of "Short C o m m u n i c a tions," about 4 or 5 scheduled per hour. There were also some seminars that were not included in the prep a r e d program, including one in which a disproof of the R i e m a n n h y p o t h e s i s was claimed. This was received with some skepticism. The Plenary Lectures are of course m e a n t to be the "Main Event" of each ICM. A n d it is here that a lot of negative c o m m e n t s w e r e heard. Every one of the 15 lecturers is an outstanding, top-rank mathematician, no question about that. The complaints dealt with the p r o g r a m ' s heavy slant t o w a r d Mathematical Physics and the resulting neglect of m a n y other topics. About 10 of these 15 lectures dealt with things like string t h e o r y , gauge fields, low-dimensional t o p o l o g y related to these, and q u a n t u m groups, q u a n t u m groups, q u a n t u m groups . . . . (One of the plenary lecturers, I believe it was Sinai, e v e n c o m m e n t e d on this imbalance in his talk.) A n d these 10 lecturers s e e m e d to talk largely to each o t h e r a n d about each other. If y o u d i d n ' t k n o w the subject, it was v e r y h a r d to learn m u c h about it by listening. Four years earlier, at Berkeley, this a s p e c t of the Congress was m u c h better. A large majority of the Plen a r y Lectures given there were a m o n g the best I ever h e a r d (and this o p i n i o n was e x p r e s s e d b y m a n y ) . There was real variety, and most of the speakers had successfully p r e p a r e d their lectures with n o n - e x p e r t audiences of 2,000 or so in mind. I believe that the Program Committee u r g e d t h e m to do so, and t h e y took this advice seriously. May the 1994 Congress in Z~irich return to that high standard!
Department of Mathematics University of Wzsconsin Ma&son, WI 53706 USA
Is information an independent quality of matter, like mass, momentum, or electric charge? Professor Tom Stonier believes it is. Information exists independently of anyone being able to understand it. For over two thousand years, nobody could read Egyptian hieroglyphics, but the reformation was there just the same. At a microscopic level, it is not hard to see that a DNA crystal contains more information than a salt crystal of the same mass. Crucially, the DNA crystal also requires more energy to assemble. It takes energy to create information, that is, to increase the degree of organization of the universe, or reduce its entropy. It appears to be possible to calculate an equivalence between energy and information. This may explain mysteries like potential energy. It takes energy to lift a brick from the ground to the table, and we say the brick gains potential energy, but there ~s no test we can perform on the brick to measure it. An explanation may be that the energy has been converted into information, reducing the entropy of the universe. This book sets the scene for a useful theory of information. Tom Stomer suggests that information may even take the form of particles, which might not be limited to travelling at the speed of light. The implications for the physical sciences are profound. Anyone who deals with information in science should read this book. It makes compelling and ~ , . ~ challenging reading.
Springer-Verlag [] Heldelberger Platz 3, D-1000 Berhn 33 [] 175 Fifth Ave, NewYork, NY 10010,USA [] 8 Alexandra Rd, London SW19 7JZ, England [] 26, rue des Carmes, F-75005 Pans [] 37-3, Hongo 3-chome, tm 9639/W2h$ Bunkyo-ku, Tokyo 113, Japan [] CJtlcorpCentre, Room 1603,18Whltfield Road, Causeway Bay, Hong Kong [] Avlnguda Diagonal, 468-4~C, E-08006 Barcelona
Kyoto International Convention Hall. THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
7
Karen V. H. Parshall*
One Hundred Years of the Deutsche Ma thema ti ker- Verein igun g Friedrich Hirzebruch Column Editor's Note: In 1990, the Deutsche Mathematiker-Vereinigung celebrated its one-hundredth anniversary at a meeting in Bremen, the site of its inception. As the Union's President, Friedrich Hirzebruch delivered a speech on 17 September 1990 in which he briefly sketched the history of the Union from its nineteenth-century origins, through the difficult interwar years, to the spectacular developments of the centennial year of 1990. The following is a somewhat free translation of part of that speech. I would like to thank my colleague, Bill Jackson of the German Department here at the University of Virginia, for giving so generously of his time in helping me over several difficult points in the text. The Society of German Scientists and Physicians (GDNA) met in Bremen 15-20 September 1890, and at this time, the Union of German Mathematicians (DMV) grew out of the group's Section 1 for mathematics and astronomy. The statutes and bylaws of the DMV were passed at its first meeting, which was held in Halle along with the sixty-fourth meeting of the GDNA. As its goal, the Union resolved "jointly to promote and to improve mathematics in all ways, to establish active connections and interaction among its various parts and scattered publications, to elevate it to its rightful place in the intellectual life of the nation, to offer its representatives and disciples the opportunity for free and friendly communication and for the interchange of ideas, skills, and hopes.'" The statutes called for the executive committee "to prepare for the annual meeting by arranging a full program, comprised, if possible, of talks on the develop-
ment of a single area of the science." When the DMV was founded here in Bremen a hundred years ago at the sixty-third meeting of the GDNA, the lecturers included Georg Cantor, Paul Gordon, David Hilbert, Felix Klein, Hermann Minkowski, Eduard Study, and Heinrich Weber, among others. Efforts to found the DMV began as early as 1867, when the mathematical physicist and algebraic geometer, Alfred Clebsch, put forth an appropriate proposal at the meeting of the GDNA in Frankfurt-amMain, where he was lecturing on binary forms. The mathematicians recognized that it had become necessary for them to set up regular specialized meetings and to found a new publication outlet. As a result of a two-day-long hike in which twenty mathematicians participated, the Mathematische Annalen were subse-
* C o l u m n Editor's a d d r e s s : D e p a r t m e n t s of M a t h e m a t i c s a n d History, University of Virginia, Charlottesville, VA 22903 USA. 8 THE MATHEMATICALINTELLIGENCERVOL 13, NO 2 9 1991Sprmger-Verlag New York
quently founded in 1868, although the intended outcome of the gathering had actually been the founding of the DMV itself. Edited initially by Alfred Clebsch and Carl Neumann, the journal published its 288th volume in 1990. The founding of the DMV, however, was delayed again and again. In 1872, Alfred Clebsch died from diphtheria at the age of thirty-nine. Although his student, the then twenty-three-year-old Felix Klein, kept the idea alive, the crucial initiative came much later from Georg Cantor. From its f o u n d i n g , the DMV has published its Jahresbericht, a journal conceived as a publication medium principally for survey articles on branches of pure and applied mathematics. A glimpse at the v o l u m e s from the first ten years s h o w s h o w the founders realized this conception. Reports on soil pressure, trusses, photogrammetry, mechanics, probability theory, kinetics, and oscillating functions show clearly the role of applications. Invariant theory, algebraic functions, n u m b e r theory, and Cantorian set theory are some of the subjects of reports in pure mathematics. In particular, in his 370page Zahlbericht, Hilbert developed the theory of algebraic number fields. His report remains the basis for an abundance of research even today. In the numbers of the Jahresbericht that appeared in 1989 and 1990, one finds such topics as the frontier between geometry and physics, free boundary-value problems, elliptic curves, minimal surfaces, statistics, control theory, homotopy theory and number theory, complexity theory, sphere-packing, and the finite-element method in the mechanics of solid bodies. Thus, there is still a healthy mixture of pure and applied mathematics in the Jahresbericht today. In spite of the promotion of applied mathematics within the DMV, the Society for Applied Mathematics and Mechanics (GAMM) was formed in 1922 with the stated purpose "of fostering and advancing the scientific exploration of all branches of mechanics, mathematics, and physics which are among the bases of the engineering sciences." The DMV considers itself the mother of the GAMM, just as it regards itself as the daughter of the GDNA. An honorary member of the GAMM, Felix Klein once said: "May the words of Leonardo da Vinci, that 'mechanics is the mathematician's paradise,' be fulfilled again." In the time of the DMV's founding and up to the beginning of the era of National Socialism, German mathematics led on the international scene. Several remarks on the work and curriculum vitae of Cantor, Klein, and Hilbert, three of the DMV's founding members, ought to give an indication of its high level. Georg Cantor (1845-1918) served as the president of the DMV from 1890 to 1893. As early as 1878, Cantor had published his paper, entitled A Contribution to the Theory of Manifolds, in which he used the terminology "'Mannigfaltigkeit" or "manifold" instead of "'Menge'" or
"set.'" In this work, he introduced the notion of the "potency" (cardinal number = number of elements) of a set, and using this, he discovered the transfinite numbers, numbers which could be not only compared but also manipulated arithmetically. In addition, he stated the now-famous " c o n t i n u u m hypothesis," namely, an infinite set of points on the real line is either countable or equivalent to the set of all points on the line (that is, equivalent to the continuum). As far as its proof was concerned, however, Cantor closed his 1878 paper by "'[postponing] the full investigation of this question to a later occasion." At the meeting of the DMV in Halle in 1891, Cantor gave a famous lecture in which he proved, using the diagonal process now named for him, that the power 2M is greater than M for every cardinal number M. Thus, there are larger and larger infinite cardinal numbers. He ended his lecture with these words: The potencies represent the simple and important generalization of the finite 'cardinal numbers.' These are nothing more than the transfinite cardinal numbers, and they possess the same reality and certainty as the former, except that the lawful relations among them, that is to say the arithmetic related to them, is of a different kind from that of the finite realm. The further exploration of this field of inquiry is a project for the future. These ideas and others exposed Cantor to the hostility of colleagues who, for example, declared as absurd his result that the line and the plane contained the same n u m b e r of points. By 1895 at the latest, Cantor discovered paradoxes in his set theory based on all sorts of strange sets, and he communicated these n e w findings to, for example, Hilbert. The axiomatization of set theory due to Zermelo and Fraenkel resulted from attempts to overcome this foundational crisis. In 1940, Kurt G6del showed that the continuum hypothesis is consistent with Zermelo-Fraenkel set theory (with the axiom of choice). Then, in 1964, Paul J. Cohen proved the independence of the continuum hypothesis, a result for which he won the Fields Medal at the International Congress of Mathematicians in Moscow in 1966. Today, set theory has become a deep area of foundational research in mathematics. One of Cantor's successors, Felix Klein (1849-1925) served as president of the DMV in 1897, 1903, and 1908 and held the honorary presidency in 1919. On the golden anniversary of his doctoral degree, his colleagues dedicated a special salute to him in which they highlighted some of his work and achievements: 12 December 1918 marked the fiftieth anniversary of your receipt of the doctorate, at the age of nineteen, from the Philosophical Faculty of Bonn University. . . . Shortly after your dissertation appeared in print . . . . . you conducted, together with Sophus Lie, your first group-theoretic researches. In the early 1870s, your deep discoveries in non-Euclidean geometry followed, and in assuming your professorship at Erlangen [in 1872] you made public your epoch-making Erlanger Program. The essential charTHE MATHEMATICALINTELL1GENCERVOL 13, NO 2, 1991 9
acter, which would distinguish the brilliant progression of your subsequent research, already manifested itself here: the interplay and mutual stimulation of the various mathematical disciplines and the ingenious insight into their inner connections. This mathematical research, marked by your very personal approach, blossomed most beautifully into your admirable work on the transformation of seventh-order elliptic functions. These findings will always be honored as a jewel of mathematical research. But we also see the merits of your approach reflected in the rich abundance of your later work, on the applications of mathematics to the natural sciences. Klein was the foremost politician of science among the mathematicians of his day. He was involved in plans concerning the amalgamation of the universities and the technical schools, collaboration with industry, cooperation in the founding of the Institute for Aerodynamical Research in G6ttingen, cooperation on the Encyclopedia of the Mathematical Sciences, the presidency of the International Commission on Mathematical Instruction, and reforms of mathematical and scientific instruction at the elementary through the college levels. All of these activities are as important today as they were then. Furthermore, the "interplay and mutual stimulation of the various mathematical disciplines" as well as the "applications to the natural sciences" referred to in the salute have never been of such pressing importance as they are today. The split of mathematics into pure and applied parts is both arbitrary and harmful 9 In past decades, it has led to many false steps. Felix Klein would be pleased that this division has been largely overcome in recent years and that the unity of mathematics once again stands out. Today, instead of his interest in the windtunnel at the Institute for Aerodynamical Research, Klein would perhaps focus on theories of mathematical modeling, which would effect, for example, the optimal design of an airplane wing with the help of high-speed computers. Richard Courant was called as a successor to Klein at GOttingen in 1921. Thanks to Courant's initiative and with the help of the Rockefeller Foundation, a mathematical institute was built, which was dedicated in 1929. Another one of Felix Klein's dreams had come true. Finally, from this early period of the DMV's history, David Hilbert (1862-1943) held its presidency in 1900. In this same year, he lectured on his twenty-three problems at the International Congress of Mathematicians in Paris, and he asked: "What n e w methods and new facts will this new century uncover in the broad and rich field of mathematical thought?" Hilbert's problems, which represented a glimpse into the future, range from Cantor's c o n t i n u u m h y p o t h e s i s (problem 1) to the extension of the methods of the calculus of variations (problem 23). Since he presented them, many of these problems have been solved 9 In the eighth, h o w e v e r , on the distribution of prime 10
THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
numbers, the Riemann hypothesis still remains a central unanswered question. Hilbert died in 1943 during the War and the Nazi terror. Writing [in English] from Princeton, where he had gone with his Jewish wife in 1933, H e r m a n n Weyl, the president of the DMV in 1932 and Hilbert's successor in the mathematical chair at G6ttingen, submitted an obituary of his predecessor that was published during the War by the Royal Society and the American Philosophical Society. In Weyl's words, At the beginning of this year died in Gottingen, Germany, David Hilbert, upon whom the world looked during the last decades as the greatest of the living mathematicians. Hilbert and Minkowski were the real heroes of the great and brilliant period which mathematics experienced during the first decade of this century in Gottingen . . . . Among the authors of the great number of valuable dissertations . . . written under Hilbert's guidance we find many Anglo-Saxon names, names of men who subsequently have played a considerable role in the development of American mathematics. 9
.
.
Yet, as Hermann Weyl goes on to write in his obituary of Hilbert, "soon the Nazi storm broke and those w h o had laid the foundations and who taught there besides Hilbert were scattered over the earth and the years after 1933 became for Hilbert years of ever deepening tragic loneliness." I would like to say a few words here about this terrible era of the so-called Third Reich. The law of April 1933 for the restoration of the professional Civil Service and the subsequent strengthenings of that law led to the dismissal of Jewish academics as well as of all officials who, in the official sense, were politically unreliable. This represented a beginning of the Nazi crimes that e n d e d in genocide. Many scientists left G e r m a n y (Hermann Weyl said, "[they] were scattered over the earth"); many lost their lives. We can see the effects of this horrible past in our o w n universities, and we should be conscious of it. Relative to the university in Bonn, Felix Hausdorff and Otto Toeplitz come to mind. The former committed suicide with his wife in 1942 in order to escape deportation to a concentration camp, while the latter emigrated to Jerusalem in 1939 and later died there9 M. Pin] has dubbed such scholars "Academics in a Dark Era" in his report of that title that appeared in the Jahresbericht of the DMV between 1969 and 1974. In 1934 there were considerable political discussions within the DMV. I find the letter of a prominent mathematician to the editor of the Jahresbericht particularly shameful. In it the author asserted that the DMV could not fulfill its statutory duty "to elevate mathematics to its rightful place in the intellectual life of the nation," since 40% of its members did not belong to the German race. He continued that there was need for a change in the statutes to clarify the place of the DMV within the National Socialist state. However, the DMV was able to ward off author-
itarianism [Fiihrerprinzip] in the case of the statutes, and in its leadership elements came to power (Wilhelm Siig from 1937) who put mathematics first. Following a decree from the Ministry of Culture in 1938, only so-called "Reichsbiirger" or "German nationals" were allowed to be members of the DMV, and all Jewish members had to withdraw. The DMV and its members did not behave heroically during the era of National Socialism, but who among us can maintain that he [or she] would have acted better? If mathematics in Germany was still at the highest level after the First World War, then it lost its role as international leader after the Second World War. Reestablished under the presidency of Erich Kamke (who also served as the vice-president of the International Mathematical Union in 1950), the DMV held its first postwar meeting in Tiibingen in September 1946. T h e Year 1990 The changes in Eastern, Southeastern, and Central Europe, in particular, the peaceful revolution in the DDR, its attendant removal of the Socialist Unity Party dictatorship, and the imminent reunification, place new problems before us and offer us exceptional opportunities. During the Mathematical Congress of the DDR in Dresden, the general meeting of the Mathematical Society of the DDR (MGDDR) resolved last Wednesday [12 September 1990] that it and the DMV should be united. The general meeting of the DMV will discuss a similar recommendation next Thursday [20 September 1990]. How should we proceed? The society that has arisen out of the union of the DMV and the MGDDR will probably be called the German Mathematical Union, where the word "union" now takes on a particularly special meaning. Yet, what are the duties of this new DMV? Through the promotion of mathematics in all areas, it should appeal to teachers of mathematics as well as to mathe-
maticians in industry, thereby developing into the representative of all mathematicians. The Mathematical Society of the DDR can bring much to the combined DMV. It is already open to teachers and to industrial mathematicians. Its Communications will prove very suggestive for the new Communications of the DMV. It has special sections and interest groups in the so-called "fringe areas" of mathematics from which mathematical research will be disseminated. It has small regular meetings on new developments in the various branches of mathematics, which have generated lively interest internationally and which should be continued in order to supplement the Institute at Oberwolfach (which is always completely booked) as well as the DMV Seminars. I would like to quote an introductory remark to the collection of examples from the David Report: "The unification and cross-fertilization of areas within core mathematics, i n c r e a s e d . . , applications (which often uncover unusual and unexpected uses of mathematics), and the growing role of the computer are old themes that are illustrated in these descriptions." In the salute to Felix Klein in 1918, we find the words "interplay and mutual stimulation" instead of "'unification and cross-fertilization," but the sentiments remain the same. The David Report also m e n t i o n s "core" mathematics, where "'core" means "the innermost," the "heart," the "kernel." Felix Klein spoke of the "central parts." Here, we are talking about the kind of mathematics that, without view toward application, discovers or creates; the kind of mathematics that its "representatives and disciples" love and develop for its beauty; and the kind of mathematics that is pursued with the same right as that with which artists pursue composition or painting. Mathematics is also an art. Max-Planck-Institut-fiir-Mathematik Gottfried Claren Strasse 26 W-5300 Bonn, FRG THE MATHEMATICAL INTELLIGENCER VOL 13, N O 2, 1991
11
The Story of 1, 2, 7, 42, 429, 7 4 3 6 , . . . David P. Robbins
The sequence of integers in the title, the first six terms of the sequence $1, $2, $3 999 given by . - I (3i + 1)!
sn
(n + 0!'
-- ,oo ]-[
(I)
has the combinatorial enumeration community puzzled. In the last few years this same sequence has arisen in three distinct combinatorial enumeration problems and no one has been able to explain why. Table I lists the first 20 values of S,. It is not obvious from (1), but it is known that the values of S, are integers.
for a total of 7. Let An denote the number of n-by-n alternating sign matrices. Then we see that A n = S, for n = 1, 2, 3. Also, I have checked by computer that An -- Sn for n ~< 16. (The best methods for computing An will not allow the computation of many more An's. ) I guessed the formula An = Sn for all n some years ago, but this has never been proved. Meanwhile the plot has thickened considerably. Before going on to the rest of the story, I want to convince you that the evidence for this conjecture is overwhelming. The main reason is that it follows from a simpler conjecture that has even more empirical evi-
Alternating Sign Matrices Consider the matrix
-1 1 0 0
0 0 0 1
1 -I
.
1 0
Its entries are all O's, l's, and - l ' s and its rows and columns have sum 1. Also notice that, upon omitting the zeroes, the l's, and - l ' s alternate in every row and column. Such matrices are called alternating sign matrices. O r d i n a r y permutation matrices are alternating sign matrices. For n = 1 and 2, these are the only alternating sign matrices. When n = 3, there are the 6 permutation matrices as well as the matrix
--1
i
1 12
THE MATHEMATICAL INTELLIGENCER VOL 13, N O 2 9 1991 Spnnger-Verlag New York
dence to support it. When faced with combinatorial enumeration problems, I have a habit of trying to make the data look similar to Pascal's triangle. There is a natural way to do this with alternating sign matrices. One sees easily that every alternating sign matrix has a single 1 in its top row. Suppose we count them according to where this 1 is. The counts are naturally organized into a triangle of positive integers whose first five rows are shown below. 1 1 2 7 42
1 3
14 105
135
(2)
2 14
7 105
42
Here the n-th row contains the counts for n-by-n matrices. For example, the first 105 in (2) indicates that there are precisely 105 5-by-5 alternating sign matrices with a 1 in the second position of the top row. You will be able to prove to yourself easily that the first entry in each row of (2) is the sum of the entries in the previous row. Then the whole pattern becomes clear if we insert ratios between the adjacent entries of (2) as shown below. 1 1 2:2 1 2 2:3 3 3:2 2 7 2 : 4 14 5 : 5 14 4 : 2 7 42 2 : 5 105 7 : 9 135 9 : 7 105 5 : 2 4 2 (The ratio 5:5 is the most creative part of this step.) Imagine m y surprise w h e n I noticed that the right and left sides of the ratios separately really did satisfy the Pascal's triangle recursion. The computer calculation mentioned above also verifies that this pattern continues at least for the first 16 rows. Note that, knowing this pattern and using our previous observation about the first entry of each row, one can quickly calculate as many rows as desired. A little algebra also yields the formula An = S,, given that this Pascal's triangle observation is true. This is the overwhelming evidence. Alternating sign matrices arose in connection with a study, by me and m y colleague Howard Rumsey, of a method of Dodgson (otherwise known as Lewis Carroll) [3] for c o m p u t i n g determinants. D o d g s o n ' s method, which he called condensationof determinants, is illustrated below.
1341
011 121 213
1 -1
-1 --~ -3
-1 5
-i
-4
~
-2
(3)
Here the second matrix is formed from the 2-by-2 connected minors (minors formed from consecutive rows and consecutive columns) of the first. The third is
formed by taking the 2-by-2 connected minors of the second and dividing these by the corresponding entries from the center of the first matrix. Finally the last matrix, which is the determinant of the original matrix, is obtained by taking the determinant of the third matrix and dividing by the center entry of the second. This m e t h o d has the pleasant feature that if the starting matrix has integer entries, then all intermediate results are integers. (In fact, each entry is a connected minor of the original matrix.) Dodgson gives no references in his paper, but in his proof of the validity of his method he appears to be making use of an identity of Jacobi [6], page 9. Also see Turnbull [16], page 77. Dodgson's method implies that the determinant of a square matrix is a rational function of all its connected minors of any two consecutive sizes. Rumsey and I [13] were able to give an explicit formula of this type as well as another related formula involving connected minors of three consecutive sizes. Both formulas involve alternating sign matrices. One is particularly interesting because it expresses the determinant as a sum in which alternating sign matrices index the terms in much the same way that permutations, or, if you wish, permutation matrices, can be thought of as indexing the ordinary expansion of the determinant. This is where Rumsey and I came across alternating sign matrices. THE MATHEMATICAL INTELLIGENCER VOL 13, N O 2, 1991
13
After several months of trying to prove our conjecture about the number of alternating sign matrices, we began to suspect that the theory of plane partitions might be involved. We called Richard Stanley and told him about our conjecture. He startled us a few days later when he told us that, although he did not know a proof of our conjecture, he had seen the sequence S, before, and that it was known to enumerate another class of combinatorial objects which had appeared only a few years earlier in a paper by George Andrews [1]. Naturally we headed straight for the library to find out what it was all about.
0 2 3 31 32 33 33 2
where 0 is the empty partition, which one includes by convention. Note that the number of these partitions is 7 = $3. In spite of the peculiar defining conditions, Andrews had actually proved in [1] that the number D, of descending plane partitions with no parts exceeding n was S,. How was Andrews able to do this? First he proved that D, is given by the determinant formula D, = G.j+ /~ i + j + 2}\ ]
Descending Plane Partitions Consider the following array 7 7 6 5 3 5541 3 3 2 which is an example of a strict shifted plane partition. It is an array of positive integers with weakly decreasing rows and strictly decreasing columns. It is called shifted because each row is indented one space to the right of the row above it and called strict because the columns are strictly decreasing. This partition* also satisfies two additional, frankly rather bizarre, conditions. (i) Each row leader (the left-most entry in its row) exceeds the length of its row. (In the example we have 7>5,5>4,3>2,2>1.) (ii) The row leaders of all rows other than the first do not exceed the length of the preceding row. (In the example we have 5 <~ 5, 3 ~< 4 and 2 ~< 2.) Strict shifted plane partitions that satisfy also (i) and (ii) were termed descending plane partitions by Andrews: The positive integers that appear in plane partitions are traditionally called the parts. Here is a list of all the descending plane partitions with no parts exceeding 3 * The use of the word "partition" here is currently standard but is rather far evolvedfrom its original meaning. Originally a partition of a positive integer n was a way of writing it as a sum of positive integers, where two ways differing only in the order of the summands were considered the same. Thus a standard way of writing a partition is to write the summands in weakly decreasing order. For example, the partitions of 4 are, m flus scheme, 4
3+1
2+2
2+1+1
14
(5) id=O, ..,n-2"
(This fact was relatively difficult to discover and to prove at that time. Now, however, there is a fairly standard technique for expressing plane partition enumerations in terms of determinants. The key to the method, described in [5], page 322, is to interpret the plane partitions in terms of paths. It can be used to find the determinant (5).) Secondly, A n d r e w s has nearly incredible facility with hypergeometric function identities, which allowed him to evaluate this determinant. (Just leaf through his paper if you want to see what I mean.) Naturally we set about looking for analogies between descending plane partitions and alternating sign matrices and eventually we found several. In each case we found a correspondence between certain sets of alternating sign matrices and sets of descending plane partitions where we could conjecture that corresponding sets h a d the same cardinality. None of these has been proved. They are all described in [10]. The simplest of these yields an analog for descending plane partitions of the "Pascal's triangle" (3). In a descending plane partition with no parts exceeding n there can be anywhere from 0 to n - I parts that are actually equal to n. These all must appear on the top row. We found (empirically speaking) that descending plane partitions with k occurrences of n correspond to alternating sign matrices with a 1 in position k + 1 of the top row. Thus, for example, in (4) you can see that there are respectively 2, 3, and 2 descending plane partitions with no threes, one three, and two threes in their top rows. This 2, 3, 2 pattern agrees with the third row of (2).
1+1+1+1.
If we drop the plus signs, then a partition becomes a weakly decreasing sequence of positive integers and the sum of these integers plays a less promment role. Such partitions are "linear" because they are written in a row. The notion of a "plane" partition is an extension in which we replace this one-dimensional array with a two-dtmensional array. THE MATHEMATICALINTELLIGENCER VOL 13, NO 2, 1991
(4)
II
IIII1[
[I
IIIII
,'12,
; ~; ' ~ ":~'~"
Although the definition of descending plane partitions makes t h e m seem less aesthetically appealing than alternating sign matrices, they are more appealing in at least one way. Their enumeration formula has a natural "q-analog." Suppose we assign to every descending plane partition r weight qWwhere w is the sum of the parts in ~r. Andrews had conjectured that the sum D,(q) of the weights of descending plane partitions with parts ~ n is given by the formula . - I Ha'+Ij=1~'rI"/_ 1)
D,.(q) = i1--[ nn+. t,.i_ I ) ' =0 li/=l ~ which can be obtained from the formula (1) for S, by replacing all the factorials m! by their standard q-analogs which are (q - 1 ) ( q 2 - 1 ) ' ' ' ( q m _ 1 ) . For example, when n = 3 his conjecture predicts (after some cancellation) that (q6 _ 1)(q7 _ 1) D3(q) =
rOW.
(q2 _ 1)(q3 _ 1)
= l+q2+q3+q4+qS+q6+q
s,
which is the generating function for the plane partitions in (4). Thus this q-analog formula not only enumerates descending plane partitions but gives their weight distribution as well. There is no weight known for alternating sign matrices that corresponds to this weight for descending plane partitions. The study of alternating sign matrices led me and my colleagues, Bill Mills and Howard Rumsey, to the proof [9] of Andrews's conjecture. Our key observation was that, for descending plane partitions, there was a natural q-analog of the Pascal's triangle (2). By following Andrews's method we were able to express each of the entries in this Pascal's triangle as a determinant similar to (5). We then had to prove that adjacent entries in the triangle were in certain ratios. This required a little ingenuity with determinants, but nothing like the calculations in [1]. By this time we had read (at least the easy parts of) Andrews's paper more carefully. We found that his primary interest was not really in his own conjecture but in a conjecture of Ian G. Macdonald [7], page 53, which concerns what are known as cyclically symmetric plane partitions. Consider the following array. 431 221 2 1
pile of cubes is known as the Ferrers graph of the partition. The Ferrers graphs of some plane partitions, such as the one above, are invariant under cyclic permutation of the three coordinate axes (the three edges of the box emanating from the selected corner). These are cyclically symmetric plane partitions. Macdonald's conjecture concerned the number of these cyclically symmetric plane partitions that can fit in an n-by-n-byn box. In [1] Andrews had proved this conjecture with essentially the same technique as he used for descending plane partitions, but he wished to prove a q-analog that had been conjectured by Macdonald. He had invented descending plane partitions because they seemed to be a natural companion problem. Our study of alternating sign matrices also led us to the resolution of this conjecture of Macdonald. The method was identical, including another Pascal's triangle of counts, obtained by classifying these plane partitions according to the number of n's in the top
1
It is an array of positive integers with weakly decreasing rows a n d c o l u m n s . N o w t h i n k of the numbers as a sort of contour map of a pile of unit cubes in the corner of a box. For example, the 4 represents a pile of four unit cubes. This representation as a
The best thing about this work, however, was that it led us in a few steps to lots of new combinatorial objects with mysterious enumeration properties.
Self-complementary Totally Symmetric Plane Partitions Let us return for a m o m e n t to the analogies between alternating sign matrices and descending plane partitions. Early on Bill Mills had observed that there was a natural involution, described in [10], on the set of descending plane partitions with no parts exceeding n, which seemed to correspond to the flip of an alternating sign matrix obtained by exchanging columns 1 and n, 2 and n - 1, etc. This Mills mapping, as we called it, was rather complicated. We had noted, however, that one could conjecture another (still unproved) formula for the number of flip-symmetric alternating sign matrices. Eventually it occurred to us that there might be a construction analogous to the Mills mapping for cyclically symmetric plane partitions. Once we asked the question, the answer was easily found. The analogous operation was "complementation." The complement of a plane partition whose Ferrers graph lies in a given box is the set of cubes that naturally fill the volume not filled by the given partition. This set of cubes can be viewed as the Ferrers graph of a partition from the point of view of the opposite corner of the box (see Figure 1). Apparently this complementation operation had not been studied previously. We first counted the numbers of self-complementary and cyclically symmetric plane partitions with Ferrers graph in an n-byn-by-n box w h e n n is even. (There are none when n is odd.) The usual vague analogy b e t w e e n cyclically symmetric plane partitions and alternating sign matrices worked again. It was easy to conjecture and not THE MATHEMATICAL INTELLIGENCER VOL 13, NO, 2, 1991 1 5
even that hard to prove [11] a formula for these counts. From here it was not long before we started wondering about the "most symmetric" plane partitions, namely those that are self-complementary and totally symmetric (invariant under all six permutations of the coordinate axes). Again there are none in an n-by-nby-n-box if n is odd, but there are some for even n = 2m. For example, if m = 3, we have the following selfcomplementary and totally symmetric plane partitions. 666333 666333 666333 333 333 333
666433 666333 665332 4331 333 332
666433 666433 664322 4432 332 332
666543 665332 655331 53311 4331 321
666543 665432 654321 54321 4321 321 (6)
666553 655331 655331 53311 53311 311
666553 655431 654321 54321 53211 311
These plane partitions are pictured in Figure 1.** Let T, be the number of self-complementary totally symmetric plane partitions in a 2n-by-2n-by-2n box. The experience was a little eerie when I calculated the first few values 1, 2, 7, 42 by hand and then, by computer, 429, 7436, etc. In fact, it has n o w been checked that T, = S. for n ~< 20 and could be checked quite a bit higher. (These computer calculations make use of methods more sophisticated than direct enumeration.) This numerical evidence for the conjecture T, = S, is quite convincing in itself. Again the evidence is strengthened by the existence
Figure 1. The 7 self-complementary totally symmetric plane partitions with Ferrets graph in a 6 - b y - 6 - b y - 6 box. of a Pascal's triangle array of subcounts analogous to (2). In fact there are now three essentially different ways of doing this. Strangely, the most elegant was the last to be discovered. It was found by Bill Doran [4], then an undergraduate at the University of Michigan. Suppose we write a self-complementary totally symmetric plane partition with Ferrers graph in a 2n-by-2n-by-2n box as a 2n-by-2n matrix az], i, j = 1, . . . . 2n, with missing entries replaced by zeroes for convenience. Doran classifies these matrices a according to the value of 12-
** Self-complementary totally symmetric plane parhtions are not usually studied in the form illustrated m (6). The problem is that, because of the high degree of symmetry, the plane partitions are highly redundant. O n e can s h o w without m u c h difficulty [12] that self-complementary totally symmetric plane partitions with Ferrers graph in a 2n-by-2n-by-2n box are in one-to-one correspondence with a certain class of shifted plane partitions. A n example for n = 5 is s h o w n below. 5 5 5 4 544 4 2 1 These are - 1, n tries ~< n, the list of
33 2
33 32 1 2
32 22 1 2
I=1
For example, in the case n = 3, Doran's function assigns the values 3, 2, 1, 2, 1, 3, 2 to the plane partitions in (6). Notice that the numbers of l's, 2's and 3's is 2, 3, and 2, in agreement with the third row of (2). The other two functions of this type were discovered by Howard Rumsey and were reported in [12]. One of these is the function n
characterized b y their triangular shape with row lengths n 2. . . . . 1, weakly decreasmg rows a n d columns, all enand that every e l e m e n t in the ith row is/> n - z. Here is all seven m case n = 3. 33 3
f,(a) = ~ ~ ( - 1 ) ' + 1 a , .
22 1
16 THE MATHEMATICALINTELUGENCERVOL 13, NO 2, 1991
f2(a)
=
Z(a,,
zffil
-
In case n = 3, the values assigned to the list in (6) are 3, 2, 1, 3, 2, 2, 1, with the same distribution of l's, 2's, and 3"s. The last is f3(a), the number of values of i, i = 1. . . . . n for which a,.,i takes on its minimum value
(which happens to be 2n - i + 1). Applied to our list for n = 3, this yields the values 1, 1, 2, 2, 3, 2, 3, again with the same distribution. These three functions have been observed to generate the Pascal's triangle array (2) at least through the first 7 rows. Now a reasonable thought is the following. An alternating sign matrix has four natural functions of this type, one giving the location of the 1 on each side of the square. One wonders if the joint distribution of these three functions is like the joint distribution of the positions of three of these l's. It is a little disappointing to find out that the answer is no. However, it is a little mind-boggling to find out that any two of the three functions f,, i = 1, 2, 3 for self-complementary totally symmetric plane partitions appear to have the same joint distribution of values as the position of the 1 in the top and bottom rows of an alternating sign matrix. All this evidence gives extra weight to the conjecture that Tn = Sn. Nevertheless it has not been proved either that Tn = Sn or even that Tn --- A n. From here the subject has exploded into a stag-
gering array of conjectures because of two suggestions of Richard Stanley. He first suggested taking a more systematic view of the complementation operation on plane partitions. He observed that the six permutations of the coordinate axes together with the complementation operation generate a 12-element group. (This group is isomorphic to the dihedral group of symmetries of the regular hexagon. See the recent attide of David and Tomei [2] for an observation that makes this quite vivid and that inspired the computerdrawn images of Figure 1 of self-complementary totally symmetric plane partitions for n = 3.) One sees easily that this group has 10 conjugacy classes of subgroups. For each conjugacy class one could attempt to enumerate plane partitions invariant under one of its subgroups. Stanley and I checked all the symmetry classes and we found that there were nice formulas for all 10 classes. (Table 2 summarizes these formulas.) In some cases these had already been proved. Another case was a previously known but still unproved conjecture of Macdonald concerning totally symmetric plane partitions. There were three new conjectures.
THE MATHEMATICALINTELLIGENCER VOL 13, NO 2, 1991 ~.~
Stanley quickly proved one of these himself in [14]. (This paper also contains a brief discussion of what is known about all the symmetry classes.) Another is the conjecture above about self-complementary totally symmetric plane partitions. The last n e w conjecture is worth mentioning. There are really two types of selfcomplementary cyclically symmetric plane partitions. This depends on whether one requires that the complementary partition actually be geometrically congruent to the original or congruent to its mirror image. It is the latter case that was mentioned above and treated in [11]. When one counts the cyclically symmetric plane partitions that are really congruent to their complements in a 2n-by-2n-by-2n box, one finds that the numbers are 12, 22, 72, 422, 4292, 74362 for n = 1. . . . . 6. It has not been proved either that these numbers are S2 or that they are the squares of the numbers Tn or An. Thus in a sense we have a fourth independent occurrence of the sequence Sn. 18
THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
Stanley's second suggestion was to study symmetry classes of altemating sign matrices. All the symmetries of a square act in a natural w a y on altemating sign matrices. There are 8 conjugacy classes of subgroups of the group of symmetries of a square. For each conjugacy class one can ask for the number of alternating sign matrices invariant under a group of symmetries in this conjugacy class. I have enumerated altemating sign matrices in each of these classes by computer. Mills and I [8] have found nice formulas, summarized in Table 3, that enumerate the alternating sign matrices in six of these eight classes. (These conjectures, together with some other related ones, are described in a paper of Stanley [15].) The two classes for which we could not find formulas are the sets of alternating sign matrices that are self-transpose and those that are invariant under all symmetries of the square. The empirically derived formulas in the other six cases have never been proved.
These conjectures are of such compelling simplicity that it is hard to understand how any mathematician can bear the pain of living without understanding why they are true.
7. 8.
I will conclude this section by stating the one that I consider the most surprising. For each positive integer n there exist some 4n-by-4n alternating sign matrices that are invariant u n d e r 90-degree rotation. For example, when n = 1, there are the two matrices,
10i)(i01
0
0 0
0 0
0
1
and
0
1
0
10. 11.
0 0 0 0
0
9.
"
12.
0
13. The number of these appears to be given by the formula A3Cn where An is the number of n-by-n alternating sign matrices and Cn is the number of cyclically symmetric plane partitions whose Ferrers graphs fit in an n-by-n-by-n box.
Conclusion You have now seen a number of empirically verified but unproved combinatorial enumeration formulas. As far as I know, this is the complete list of such conjectures with such simple formulas. These conjectures are of such compelling simplicity that it is hard to understand how any mathematician can bear the pain of living without understanding why they are true. The conjectures concerning alternating sign matrices are probably the most difficult. Here the only interesting contributions so far are the empirical observations. No progress has been made on any of the proofs. Some think that there will be a single key idea that will unravel all these conjectures, but I cannot see how the solution of one of these conjectures can help with the others. I expect that these problems will remain with us for some time.
14. 15.
16.
geneis secundi ordinis per substitutiones lineares in alias binas transformandis, quae solis quadratis variabilium constant; una cum variis theorematis de transformatione et determinatione integralium multiplicium, J. Reine Angew. Math. 12 (1834), 1-69. Ian G. Macdonald, Symmetric Functions and Hall Polynomials, Oxford: Clarendon Press (1979). W. H. Mills and David P. Robbins, Symmetries of alternating sign matrices, manuscript. W. H. Mills, David P. Robbins, and Howard Rumsey, Jr., Proof of the Macdonald conjecture, Invent. Math. 66 (1982), 73-87. W. H. Mills, David P. Robbins, and Howard Rumsey, Jr., Alternating sign matrices and descending plane partitions, J. Combin. Theory Ser. A 34 (1983), 340-359. W. H. Mills, David P. Robbins, and Howard Rumsey, Jr., Enumeration of a symmetry class of plane partitions, Discrete Mathematics 67 (1987), 43-55. W. H. Mills, David P. Robbins, and Howard Rumsey, Jr., Self-complementary totally symmetric plane partitions. J. Combin. Theory Ser. A 42 (1986), 277-292. David P. Robbins and Howard Rumsey, Jr., Determinants and alternating sign matrices, Advances in Mathematics 62 (1986), 169-184. Richard P. Stanley, Symmetries of plane partitions, J. Combin. Theory Ser. A 43 (1986), 103-113. Erratum, 44 (1987), 310. Richard P. Stanley, A baker's dozen of conjectures concerning plane partitions, Combinatoire Enumerative, Lecture Notes in Mathematics, Vol. 1234, New York: Springer-Verlag (1985), 285-293. H. W. Turnbull, The Theory of Determinants, Matrices, and Invariants, New York: Dover (1960).
Institute for Defense Analyses Thanet Road Princeton, NJ 08540 USA
References 1. G. E. Andrews, Plane Partitions (III): The weak Macdonald conjecture, Invent. Math. 53 (1979), 193-225. 2. Guy David and Carlos Tomei, The problem of calissons, Amer. Math. Monthly 96 (1989), 429-431. 3. C. L. Dodgson, Condensation of determinants, Royal Society of London, Proceedings 15 (1866), 150-155. 4. William F. Doran, A connection between alternating sign matrices and totally symmetric (2n, 2n, 2n)-selfcomplementary plane partitions, manuscript. 5. I. P. Goulden and D. M. Jackson, Combinatorial Enumeration, New York: Wiley (1983). 6. C. G. J. Jacobi, De binis quibuslibet functionibus homoTHE MATHEMATICAL 1NTELLIGF.NCER VOL 13, N O 2, 1991
19
The Axiom of Choice and Calculus Revision: A Dialogue Seymour Haber
M. and P. are mathematicians--an analyst and a logician. Both teach calculus, of course. This is not their first conversation. M. is feeling aggrieved. M. Well, of course the axiom of choice is true. It's obvious. If you have a set A and another set B, you can define a set C that is the set of all ordered pairs (a,13) where a is in A and 13 is in B. P. You mean the set A • B exists. M. Why put it that way? P. It's making what you said more explicit. For you to "define" A • B and use it, you have to believe it exists. M. That's not perfectly clear to me, but I'll go along. Okay. P. And that raises the question, " H o w do you know it exists?" M. What's the problem? The set A is there, and so is B---so of course the ordered pairs (et,13) are there, and the set of them is A x B. P. But you're asserting that all these things exist! They're not physical objects--in what sense do they "exist"? M. It seems your problem is about h o w to put this perfectly clear idea into words. Everyone knows putting ideas into words is difficult--the computer types call it "documentation" and avoid doing it at all costs. But you remember that years ago, before you specialized in logic, you learned about ordered pairs and cartesian products, and saw it all perfectly well. P. Well I don't see it now. In what sense can a cartesian product of infinite sets be said to exist? M. You've lost it? That's too bad.
P. It wasn't really clear ever. When questions were raised about it, I couldn't answer them. Now it's clear to me that I was sort of fooling myself--the concepts involved are really very obscure. M. It's clear to you? You mean the concepts you use in questioning the axiom of choice are themselves all clearer to you than the concept of cartesian product of sets? For instance, w h e n y o u ask " W h a t is the meaning of 'exists' in the statement 'A x B exists'?", is the term 'meaning' so clear to you? Also, what exactly, is the logical status of the phrase 'what is'? You know that when we wish to raise questions we can raise very difficult q u e s t i o n s about the v e r y clearest ideas and attitudes. Some of them are taken seriously by philosophers--for example the question of whether there is such a thing as "free will," as they put i t - - a n d some are not taken seriously, such as Zeno's paradoxes and the question of solipsism. I don't know how they choose which to dismiss and which to talk about.
The next day. P. But you have to be able to put it into words! Mathematical proofs are public, not private knowledge. You have to be able to say them explicitly. M. We do. When we bring cartesian products in, we say it explicitly and everyone sees what's being said. You remember, it used to be clear to you. 20
THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2 9 1991 Spnnger-Verlag New York
Why do you choose to worry at the "'axiom of choice"--no worse than any axiom in the ZermeloFraenkel list, it seems to me--rather than worry about whether there is any kind of infinity at all? It seems reasonable to s u p p o s e that there are only finitely many distinct physical objects--electrons, protons, etc., and configurations of them . . . . P. The physicists get more and more elementary particles all the time. Maybe the list will go infinite soon. And there has to be at least one of each type. M. More likely it'll be finite. And when it comes to thoughts, each person manages only finitely many distinct thoughts, I suppose. And there are only finitely many human beings, now and in the past and in the f u t u r e - - s a y for the next 101~176 years. 10l~176 years seems a conservative upper bound on the life of the human race; or 102~176 seconds, or nanoseconds--it's much the same thing. So it seems there are only finitely many "things" at all, and infinite sets don't exist in any sense. Why not worry about that? P. Some people do. Quine, Mannoury. But it seems silly to me. M. It gets rid of the axiom of choice problem. You don't worry about the cartesian product of 2 finite sets. P. Maybe. If there are only finitely many things, so that THE u n i v e r s e - - t h e set of all things--is a finite set, h o w about its cartesian product with itself? You'd have to conclude that that doesn't exist. M. Interesting. What does Quine say about that? P. I don't know. M. The idea of only finitely many positive integers seems problematic. What do you do about N + 1, where N is the highest one? P. Perhaps it's not a problem because we don't know N yet. So "N + 1" isn't a definite concept. M. I agree with you, it seems silly. But I feel that doubts about the "axiom of choice" are equally silly! They've just been legitimized by everybody talking about the matter. That sort of discussion generates the same haze in my m i n d - - a feeling of growing meaninglessness--as does the discussion of there being only finitely many "things." It's only less funny. P. But there really are problems there. M. Perhaps so, and in both cases. Maybe we can get somewhere by looking at an older annoyance that's been s e t t l e d - - Z e n o ' s paradoxes. Seeing h o w they were settled might help. P. It was done by the concepts of limit, and derivative, and sum of an infinite series. Of course those themselves led to . . . . M. Forward, not back! If I remember the one about the arrow, it just came to questioning that an arrow can have position and velocity simultaneously. P. Yes. The argument was this. Look at a certain point p on its flight p a t h - - s o m e w h e r e in the middle of the path. At some time the arrow must be at point p.
When it's at p ("when its center of gravity is at p,'" if you want to be picky) it's at p; so it can't be moving. So its flight stops. M. It just stays there, or it falls down? P. That's just making it all silly. The point was, if it's at a place, it isn't moving; and if it's moving it isn't at a specific place. M. Again it sounds like an argument about phrases, not about physical facts. P. Well, they had to use words to discuss physical facts, and here the words were betraying them. Nobody knew a really good answer, and they mostly just laughed the problem out of court. M. Well, now we have a clear answer. The arrow has a position at each instant of time--there's a function p(t) describing its path and just how the arrow traverses it. And this function has a derivative, p'(t) tells us the velocity and at each time t the arrow has both a position and a velocity. P. Yes. But isn't it sort of by fiat? We're not answering Zeno's question, we're just asserting that the position and velocity exist simultaneously. M. Well, of course they do. Everybody knew that. The question was how to talk about the two in a manner that was logically--i.e., verbally--consistent and also fit the facts. That's what was developed.
Everyone knows putting ideas into words is difficultmthe computer types call it "'documentation" and avoid doing it at all costs. P. Does it really fit the facts? The physicists don't seem to believe that things really have position and velocity--they usually talk more directly about "mom e n t u m " - - s i m u l t a n e o u s l y . While for everyday objects the uncertainty is very small, it's enough to kill the definition of velocity as the derivative of position, at least in principle. M. I suppose that's so, and I don't know just what to say about it. I'm too weak on the physics. I suppose they have a w a y of defining the v e l o c i t y - - o r mom e n t u m - t h a t ' s different from differentiating the position function. I wonder h o w it works. There's something else that bothers me about the "velocity is the derivative" answer to Zeno's paradox. It's that a static d e s c r i p t i o n - - t h e function p(t), which is fundamentally a set of ordered pairs--is made the basic part of the analysis of the flight of the arrow and the aspect of movement, the velocity, is made a secondary (one might say "derivative"!) part. The intuitive dynamic picture of the situation has been placed at a remove; our analysis sets up a static o b j e c t - - a function, which we think of as a set of ordered p a i r s - - a n d works on that. THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
21
P. Hmm. That's not the way it went historically. Newton and those guys didn't think of a function as a set of ordered pairs; they thought in terms of variable quantities and functional expressions that showed the connections between the values of one variable and those of another. They were thinking in dynamic terms. M. Yeah, variable quantities. We don't seem to have them anymore. And there's a reason for that. What exactly are they? P. You've lost them? That's too bad.
The next day. M. Okay, you win one. But maybe it really is too bad. The scientists' concept of a varying quantity is intuitive and clear. Everybody knows about the temperature varying during the day and night, and about the speed of a car varying as you brake and accelerate. But we replace that with a static picture of a mapping in which each point in a domain is connected to a certain point in a range. P. So? M. Well, it works for me. But I also have fuzzy intuitive pictures of movement and change that I use in thinking, in m y research work. H a d a m a r d talked about mathematicians doing that. Nothing in our deftnitions suggests such pictures to our s t u d e n t s - - t h e definitions are almost algebraic, even in analysis. The word "variable" is kind of bootlegged into the calculus course nowadays. We never really defined it and we try to avoid using it. We mostly avoid it when teaching the chain rule, where it would really be natural to use it. It's pretty hard to avoid w h e n teaching partial derivatives. Do you remember being told in high school, years back, that calculus is "the mathematics for handling varying quantities?" It's true, you know. P. Well, a variable is a symbol that can represent various . . . . M. Yeah. I've even seen a high school text use the term "place holder" for variable. Ugh. P. It's essentially correct . . . . M. You know, to the computer types a variable is just the name of a memory location. Maybe that's where "place holder" came from. It's awful. P. I don't see . . . . M. Look. If y is a function of x and x is a function of t and these are all varying quantities, and x changes dx/dt times as fast as t, and y changes dy/dx times as fast as x, then y changes d g . dx dx dt times as fast as t, so
dy = dy dx dt dx dt " _
22
_
.
_
_
THE MATHEMATICAL 1NTELLIGENCER VOL 13, NO 2, 1991
Compare that to the contortions we go through to prove the chain r u l e - - o r to being forced to avoid proving it, and telling the students "the proof is too difficult to take up now!" P. Is that really a proof? Isn't it relying on an intuitive idea of "x changes dx/dt times as fast as t"? M. Well, do we want them to have that intuitive idea or don't we? Of course we do. And if they have it, are they never to use it? P. Still, there's something not rigorous there. M. Y e s - - a n d it can be handled two w a y s - - t h e rigorous w a y and the intuitive way. What's missing is absolute certainty that the local fact about y's varying with t - - d y / d t - - d e p e n d s only on the local facts dy/dx and dx/dt. Rigorously you can say Given a n y , > 0 there is a 81 such that
dx I[x(t) - x(t0)l - (t - t0)-~-(t0)I < ,It - t0] whenever It - to[ < 81; and there is a 82 such that IIy(x) - g(x0)] - (x - Xo)
U Jr
(xo)l < ,Ix - x o l - -
where x 0 = x(t0)--whenever Ix - x01 < 82 And so . . . . But you can also handle it intuitively. You'll have already said that (dy/dx)(xo) can't be used to figure the change in y when x varies by too much; that the approximation (x - Xo) " (dy/dx)(xo) loses accuracy, generally, as x gets further away from x0. And they'll have seen examples. N o w you tell them that as t gets further from t o, (dx/dt)(to) gets to be a worse and worse version of the relative rate with which x is changing; so that you're multiplying (dy/dx)(Xo) by the wrong quantity. And dy/dx is also changing. So they'll know that ay = dx dt dx dt is "locally valid"--i.e., strictly equal only as limits, and to be used with caution for finite changes in t. I think the intuitive approach is just as much apt to get them to avoid errors in the application of the chain r u l e - - s a y to integration--as the rigorous approach. And in h o w many calculus courses do we actually teach the rigorous approach? You know, we throw out intuitive reasoning in the name of rigor--and then we leave out the rigorous reasoning and tell the students to accept the theorem on our authority. That's real "honest mathematics!" P. There's a lot more h i d d e n in your approach. You're taking it for granted that the derivative has the intuitive properties of a rate of change. And that bit about (dx/dt)(to) getting to be a poorer version of x's rate of change as t gets further from t0--you're really picturing dx/dt being continuous. It's not needed for the chain rule argument, but it's there.
M. Yes, it is. And it's true for the functions the students should be thinking of at that point in their studies. There's no point to starting them off with a general concept of function--top-down teaching! They should be thinking about linear functions, and quadratics and cubics, and V~, and sine and cosine and a few other things. By the way, isn't it incredible luck that sine and cosine are analytic, considering where they come from--right triangles! We can tell them about discontinuous functions w h e n they're needed. We can tell them about the characteristic function of the rationals when we're teaching the Lebesgue integral.
A Guide for ambitious professionals
We throw out intuitive reasoning in the name of rigor--and then we leave out the rigorous reasoning and tell the students to accept the theorem on our authority. P. There's a lot vague there. But back to variables: H o w would you define "variable quantity?" M. I don't know. I can see discussing physical situations and introducing it in connection with them, but I don't see how to do it in a self-contained mathematical way. I seem to need the concept of time. P. Then you can't put it in a math course. M. Is that necessarily true? Back w h e n p e o p l e - mathematicians--did think that way, there were math courses, after all. Say before 1850. The idea that a topic has to be solidly established pure mathematics before you are permitted to deal with it as a mathematician at all is probably a hangover from the 1870 to 1930 foundations-crisis period. Why can't I teach ideas that are intuitively very clear--though not perfectly s o - - a n d not at all errorprone, and wait for somebody else to produce the pure mathematical theory that clarifies them a good deal further? Right n o w n o b o d y is even thinking about such a theory, as far as I know. Our present rigorous theory is not necessarily the ultimate, and perhaps we shouldn't regard it as sacrosanct, so that every departure from it must be considered shameful. You know, for over 1500 years students of Euclid spent an awful amount of effort on the geometrical solution of what amounted to quadratic equations, but later a better way came along. I'm not advocating loose math. If we teach an intuitive dynamic calculus, I see no w a y to continue with it in more advanced courses other than to tie it in with our present ~-~ way of rigor. Perhaps someday there will be another way. In the meantime I think a lot will be gained by starting with the dynamic approach. P. About the axiom of c h o i c e . . .
Department of Mathematics Temple Umversity Philadelphza, PA 19122 USA
1990. XV, 221 pp. Hardcover DM 49,ISBN 3-540-51375-2
In this book Ron Kay discusses the variety of management problems facing scientists and engineers in high technology enterprises. Drawing upon his personal experience, the author illustrates many of the issues encountered in dealing with people, planning, and financial matters in the course of scientific work. The emphasis throughout is on fostering a creative environment. Its entertaining style and stimulating content make this book enjoyable and invaluable reading for all professional scientists and engineers. "Compared to the many bloodless theoretical offerings in this field, here is a lucid, pleasant, and highly commendable guide from a practitioner."
H.-J. Queisser
[] Heldelberger Platz 3, D-1000 Berhn 33 [] 175 Fifth Ave, NewYork, NY 10010,USA [] 8 Alexandra Rd, London SW19 7JZ, England [] 26, rue des Carmes, F-75005 Pans [] 37-3, Hongo 3-chome, Bunkyo-ku, Tokyo 113,Japan [] Cltlcorp Centre, Room 1603, 18 Whafield Road, Causeway Bay, Hong Kong [] Avmguda Diagonal, 468-4o C, E-08006 Barcelona THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991 2 3
Dmitrii Egorov: Mathematics and Religion in Moscow Charles E. Ford
The intensified class war in the USSR has pushed the right wing of the professorate into the camp of counter-revolution. The reactionary professoriate has been at the head of all the recently uncovered wrecking organizations and counter-revolutionary parties. Thanks go to the brilliant efforts of the OGPU [predecessor to the KGB] for uncovering the crimes of a whole series of scientific bonzes who have known how to conceal themselves artfully behind various masks--from cold loyalty to a loudly advertised warm attachment to Soviet power. Active counter-revolutionaries have appeared even among mathematicians. Professor Egorov was arrested for participation in a counter-revolutionary organization. He is the acknowledged leader of the Moscow school of mathematics, president of the Mathematical Society, former director of the Mathematical Institute and the candidate of Moscow mathematics in the Academy of Sciences. This same Egorov is the preserver of academic traditions, against which the proletarian student body had already undertaken struggle. Nearly unanimously the Moscow mathematicians came to his defense. There has been a full clarification of the role of academic traditions in our nation, traditions coming from pre-revolutionary Russia, in the promotion of counter-revolutionary and restorationist attitudes among scientists. By the preaching of "pure science," by the renunciation of the class struggle among scientific workers, by the preservation of caste prejudices among scientists, the counter-revolutionaries have preserved for themselves the leadership positions in scientific organizations.
"Shakhty affair," in which a large group of engineers was convicted of sabotage or "wrecking" in a widely publicized show trial. Thousands of engineers were arrested. Attacks were made on the Academy of Sciences, the most prestigious scientific institution in the country. Over the next two years, more than one hundred workers in the Academy were arrested, including six academicians. The Academy was accused, among other things, of containing an anti-Soviet religious and philosophical circle [15, p. 267]. Appeals were issued to students and faculty at academic and research institutions across the country, calling on those who sympathized with the revolution to expose "reactionaries," "counter-revolutionaries,"
This is the opening paragraph of the Declaration of the
Initiative Group for the Reorganization of the Mathematical Society [18]. This group consisted of five young mathematicians. It was formed in the fall of 1930 shortly after the arrest of Professor D. F. Egorov, u n d e r whose leadership Moscow had just emerged as a center of m o d e m mathematical research. This attack and the arrest of Egorov took place in the context of an unprecedented upheaval in the Soviet Union, launched by Stalin in 1928. It began with the 24
THE MATHEMATICAL INTELLIGENCER VOL 13, N O 2 9 1991 Spnnger-Ver|ag New York
and "wreckers" within their ranks. This is the context in which the "Initiative Group" was formed. The Declaration had five signatories: L. A. Lyusternik, L. G. S h n i r e l m a n , A. G e l f o n d , L. P o n t r y a g i n , a n d Nekrasov. Pontryagin's name was misspelled as Pontryashi. No initials were given for Nekrasov. The Declaration p r o c e e d e d to e n u m e r a t e the counter-revolutionary positions held by "bourgeoisdemocratic fellow travellers," w h o had been "unm a s k e d . " They i n c l u d e d "pacificism in the class struggle," "renunciation of the necessity of revolutionary terror," and "the preaching of 'general human' morals." The name of Egorov was turned into an epithet of opprobrium, "Egorovshchina," and a conciliatory stance towards it was denounced. Particularly condemned was the idea that "it is possible to be Egorov by conviction, but honestly work with Soviet power." Referring to the founding of the Moscow Mathematical Society in 1864, it asserted that the establishment of a serious scientific school of mathematics was not possible in Czarist times. This only became possible as a result of the revolution. The Society was accused of excluding communist mathematicians from its ranks, while continuing to list as members mathematicians w h o had fled the country after the revolution. Finally it turned to religion. Connected with the traditions of philosophical idealism, inherited from Bugayev and others, the Society of course would not consider Marxist methodology in science. Instead, "priestcraft" and clerical obscurantism flourished in its ranks. The entire membership of the Society, beginning with its president, were active church people, using the name of Soviet science to hold up the authority of the church among the masses. These charges are based on fact, unlike the fictitious accusations about "wrecking organizations and counter-revolutionary parties." To appreciate this we turn to a little-known aspect of the origins of the Moscow school of mathematics, namely its connections with the Russian religious-philosophical renaissance of the early twentieth century.
Origins As Marxists, the Soviet leaders were convinced that the advance of science would necessarily cause the retreat of religion. The drive against religion exerted considerable influence on Soviet philosophy of science [11, pp. 76-78]. Particularly unwelcome was a group of scientists whose members shared a religious outlook. The founders of the Moscow school of mathematics were just such a group. N. V. Bugayev, a member of the Mathematics faculty of Moscow University, was the most outstanding mathematician in Moscow in the late nineteenth century. He was one of the founders of the Moscow
Dmitrii Egorov Mathematical Society in 1864 and one of Egorov's teachers. He was an exponent of the philosophical idealism that Marxism condemned. Under his leadership the Moscow Mathematical Society became firmly identiffed with his philosophy. Like the Marxists, he saw a connection b e t w e e n different philosophical world views and different approaches to mathematics. Bugayev had a religious perspective and opposed materialism. In rejecting a mechanistic world view, he came to view discontinuity as a mathematical concept consonant with "free will." This together with his own mathematical interests led him to advocate the study of discontinuous functions. Bugayev and his followers sensed deeply that the development of m o d e m science, including the social sciences, would depend increasingly on mathematical methodology. They c h a m p i o n e d the d e v e l o p m e n t of a more comprehensive theory of functions that could include disc o n t i n u o u s as well as c o n t i n u o u s functions [27, pp. 352-353]. V. Ya. Tsinger, a contemporary and colleague of Bugayev, used non-Euclidean geometry to reaffirm an idealist philosophy of science and to oppose materialism and empiricism. Empiricism, he said, could lead to materialism, which "degrades the dignity of man by negating his spiritual nature and by striving to make him a slave of matter" [27, p. 350]. THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991 2 5
Egorov enrolled as an undergraduate in this Department in the fall of 1887, graduating in 1891. Among his t e a c h e r s w e r e Bugayev, Tsinger, a n d P. A. Nekrasov, who shared the idealist philosophy of his two colleagues. Tsinger highly praised the paper Egorov submitted to the examination commission for his degree. He and Nekrasov recommended Egorov for a fellowship for graduate studies. Egorov was awarded a doctorate in 1901. After spending a year abroad, he was appointed to the faculty in 1903, the year that Bugayev died [13, pp. 125-126]. P. I. Kuznetsov has written about the influence of Bugayev on Egorov [13, pp. 127-128]: Egorov's first published paper [in 1892] was written on a theme which at the end of the last century was in Moscow the subject of a considerable amount of work under the influence of Professor N. V. Bugayev--the theory of numerical integrals and derivatives. To indicate the value of this work we have to say a few words about this direction of research. At its basis lies an attempt to draw an analogy between certain transformations of elementary number theory (summation over divisors) and the operations of infinitesimal analysis. Bugayev aimed at "discontinuous analysis." This analogy is quite valid. Nowadays it would be stated as the analogy between the dosure [completion] of the ring of integers with respect to the metric of absolute value and the closure [completion] with respect to other [p-adic] metrics. However, in Bugayev's work the analogy had a fairly formal character and did not lead him to any profound results. Bugayev's influence is also evident in Egorov's textbook Number Theory, in which there is an account of the theory of "'numerical integrals." Egorov wrote a paper about the work of Bugayev after his death. Kuznetsov wrote in 1971 of a revival of interest in Bugayev's ideas, particularly with reference to combinatorial analysis. The Moscow school of mathematics d e v e l o p e d under Egorov and his first and most important student, N. N. Luzin, w h o entered the University in 1901. It is from Bugayev that they had inherited an
N. N. Luzin 26
THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
interest in the theory of functions. Luzin had studied under Bugayev. So had B. K. Mlodzeyevskii, one of Egorov's colleagues. He had been one of the two official "opponents" against w h o m Egorov defended his thesis. Mlodzeyevskii offered the first course in the theory of functions at Moscow in the fall of 1900, introducing concepts of set theory. He repeated it in the fall of 1902 [20, p. 280].
L u z i n and Florensky
The Soviet historian of mathematics Sergei S. Demidov [2] has investigated the influence of Bugayev on the origins of the Moscow school of mathematics. Much of what follows is based on his paper. The central figure in this story is P. A. Florensky. He entered the University in 1900, one year ahead of Luzin. Florensky's interest was aroused by Bugayev's ideas about the mathematics of discontinuous functions. Florensky chose as his research topic "The idea of discontinuity as an element of a worldview." He worked intensely, studying a vast literature in mathematics, physics, chemistry, biology, and philosophy. The resulting manuscript connected the study of discontinuity with Cantor's theory of sets and with the latest work of the French school of the theory of functions of a real variable. The first part of it was sufficient for his degree. As early as January 1901 (during his freshman year!) Florensky was studying the work of Cantor, Peano, and E. Borel. He organized the extraordinary student sessions of the Moscow Mathematical Society. At these he gave a whole series of reports, including some on the new set theory. Some faculty attended these sessions, including Mlodzeyevskii and N. E. Zhukovskii, the aerodynamics pioneer and then head of the Moscow Mathematical Society. Luzin also attended. He was one year younger than Florensky and one year b e h i n d him in school. A friendship developed that was subsequently maintained. Demidov has discovered twenty-two years of correspondence between them. Demidov surmises that the better-read Florensky, with his interest in settheoretic and function-theoretic themes, influenced the young Luzin. Luzin kept a portrait of Florensky as a young man on his desk. Florensky, in his senior year, published the first Russian language outline of Cantor's set theory. This artide, entitled On Symbols of the Infinite, appeared not in a mathematics journal, but in a short-lived monthly called Novyi Put which was issued in 1903-1904 by the Religious-Philosophical Society of Writers and Symbolists! Central to the founding of this Society was the theologian and mystic V. S. Solovyev who died in 1900, the year before the Society was founded. He had en-
A photograph of Pavel Aleksandrovich Florensky in the uniform of a student of the Moscow Theological Academy, which he attended between 1904 and 1908. This was the picture of Florensky which Luzin kept on his desk.
gaged in a long and lonely effort against the growing hostility to religion among intellectuals [31, p. 100]. One of his students was Alexander Blok, the great symbolist poet, w h o played a role in the Society [31, p. 89]. Bugayev's son Andrei Belyi, another symbolist poet, was also active in the Society. He wrote an autobiographical p o e m tiffed A Moscow Eccentric in which his father is the major character [17, p. 193]. Thus, it seems, Bugayev was the first Russian mathematician to be represented as a hero in a poem! The Society sponsored well-attended public lectures in which spokesmen for the Russian Orthodox Church and the intelligentsia called for political reform and a revival of parish life. The lectures were suspended by the regime in 1903 for their outspoken criticism of state control of the Church [31, pp. 90-91]. The Society was dissolved under state pressure the following year. The Moscow Psychological Society, a center of philosophical idealism, was a n o t h e r organization influenced by the legacy of Solovyev. Bugayev contribu t e d papers on " t h e f r e e d o m of the will" to the journal of this Society. The idealist philosopher L. M. Lopatin e n d o r s e d Bugayev's emphasis on discontinuous functions as a recognition of "the mathematical indispensability of freedom" [27, p. 354]. Lopatin and S. N. Trubetskoi were convinced disciples of Solovyev's efforts to interest intellectuals in the Church. Florensky studied under both of them at the University [24, p. 25]. Florensky was to become one of the greatest figures in this religious-philosophical revival. Florensky's remarkable abilities in mathematics were apparent in high school. He had come to regard science as the key to the secrets of existence. After
graduation he experienced a crisis when "the limitations of physical knowledge were revealed to m e . " This realization, paradoxically, freed him to pursue the practical uses of science. "My strivings towards the technical applications were instilled by my father, but took form only when science ceased to be an object of faith. And later on, from that very crisis, came my interest in religion" [10, p. 196]. It is not surprising that Bugayev made a great impression on Florensky. The applications of mathematics to other fields and the relationship between theology and science, which were primary concerns for Bugayev, became so for Florensky as well. U p o n graduation from the University in 1904, Florensky t u r n e d d o w n its offer of a graduate fellowship in mathematics and enrolled in the Moscow Theological Academy. It is difficult today to appreciate the unprecedented nature of this decision. At that time most priests were trained in religious schools. Rarely did aspiring priests enter the seminary from a secular university. Rarer still was a seminarian w h o had studied mathematics and science, much less one w h o had been a top student in these fields at the prestigious Moscow University. Florensky graduated from the Moscow Theological Academy in 1908 and was appointed to its faculty. In 1914 he received his master's degree in theology for his thesis The Pillar and Foundation of Truth. This work is over 800 pages in length, including 400 pages of footnotes and commentaries, and "marked the beginning of a new era in Russian theology" [31, p. 101]. Egorov read it and discussed it in a letter to Luzin, stating "I found much of interest in it." (See [8, p. 355], letter of 27 July 1914.) Luzin wrote to Florensky that he frequently read and reread the work and that "besides its profundity," he admired "your ability to work." He added that Egorov was pleased by it ([6, p. 174], letter of 4 January 1915). Luzin was t h e n w o r k i n g on his o w n master's thesis, about which Florensky inquired in a letter later that year ([6, p. 179], letter of 24 October 1915). This was the famous treatise on trigonometric series for w h i c h Luzin was, quite exceptionally, awarded the doctorate [20, p. 286]. Florensky became editor of the theological journal of the M o s c o w Theological A c a d e m y , Bogoslovskii Vestnik, and n a m e d Luzin as one he hoped would write articles for it [25, p. 301]. In his letter of 24 October 1915, Florensky encouraged Luzin to consider writing something for it "in the manner which we discussed." We do not know whether Luzin ever did so. Luzin's interest in such questions is evident both from his s u b s e q u e n t w o r k s a n d his c o r r e s p o n d e n c e . Luzin's contemporaries, in particular A. N. Krylov and H. Lebesgue, noted his philosophical interests. Unlike Florensky, however, Luzin was primarily a THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
27
mathematician. Still, I find it interesting that Luzin waited three years after graduation before committing himself completely to mathematics [20, p. 283]. He attended lectures first in medicine and then in philosophy. In a letter to Luzin during this period Egorov besought him not to give up mathematics. (See [8, p. 341], letter of 21 January 1907. See also the letter translated in [22].) He expressed his utmost concern for Luzin who, in a "spiritual depression," had left town to see Florensky. During this period Luzin apparently experienced a desire to work among the people [1, p. 335]. Did Florensky's dramatic switch to theology cause him to also consider a radical change in career?
promisingly as he understood them, even when it led to inevitable conflicts" [17, p. 199]. Lyusternik pair}ted a positive picture of Bugayev, [17, p. 192], and treated most of the pre-revolutionary mathematicians favorably. He told of an episode in which, on behalf of a student, Egorov intervened with Lunacharskii, a high Soviet official [17, p. 189]. The s t u d e n t s a n d y o u n g c o l l e a g u e s w e n t to E g o r o v ' s h o u s e t h r e e t i m e s a year, at Easter, Christmas, and on his name-day. The name-day refers to the day in the church calendar devoted to Saint Dmitrii, for w h o m Egorov was named. He was not only a member but an elder (lay leader) in the Russian Orthodox Church.
The Moscow School When Luzin finally decided, in 1909, to focus entirely on mathematics, he studied one year with Egorov and then went abroad for four years. During his first two years abroad, Luzin and Egorov each published a significant paper in the theory of functions. This is commonly taken as the birthdate of the Moscow school of the theory of functions. As Egorov had done before him, Luzin traveled to GSttingen and Paris, where he became acquainted with the leading mathematicians working on the theory of functions. He returned to Moscow and in two more years completed his famous treatise on trigonometric series. In 1917 he was appointed to the faculty. Over the next ten years he and Egorov attracted a whole series of students who became first-rate mathematicians, thus creating the Moscow school of mathematics. Whereas Egorov was reserved and formal, Luzin was extroverted and theatrical, inspiring real devotion among these students and young colleagues [30, p. 24]. A member of this group, M. A. Lavrentev [14], has described the intense camaraderie and ritual forms of the group, which resembled a secret order. This was inspired by Luzin. Egorov seems to have been completely outside of it. Indeed, the "special language'" of the order might have seemed to Egorov to border on sacrilege. Lyusternik, one of the "Initiative Group" who had been part of this group, wrote a poem about the gripping e n t h u s i a s m - - a n d jealousy - - t h a t he experienced [20, p. 301]. He described an unnamed professor who expressed contempt for this behavior [26]. Might this have been Egorov? It is interesting to observe how Lyusternik handled his attack on Egorov in his memoirs. He approached the topic several times, referring to a "crisis" in Moscow mathematics "around 1930," but was unable to bring himself to say anything more about it. The closest he came was the following r e m a r k about Egorov: "this steadfast man evidently saw it as his duty to preserve the old university traditions uncom28
THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
Egorov under Attack The Church experienced a wave of violent repression after the revolution that culminated in a mass execution of clergy in 1922-1923. This led to a rise in religious commitment across the country. Lay people, including members of the intelligentsia, stepped in to defend the Church [21, pp. 99-102]. The attack was renewed in 1928. During the next ten years, nearly all religious communities were swept away in a tide of destruction [21, chapter 5]. Egorov publicly d e f e n d e d his leadership in the Church and tried to shelter academics who had been dismissed from their positions. He resisted the impo-
The name of E g o r o v w a s t u r n e d i n t o an epithet of opprobrium, "Egorovshchina," and a conciliatory stance towards it was denounced. sition of Soviet controls on academia and the admitting of large numbers of students chosen for their political rather than mathematical ability. Once w h e n a graduate student, V. Molodshii, came to ask him about mathematics wearing the symbol of the Komsomol (Communist Youth League), Egorov refused to talk with him [4]. Egorov was a leading figure in the Moscow Mathematical Society. He was elected secretary in 1917, vicepresident in 1921, and president in 1923. In 1921, u n d e r the influence of the Society, the Institute for Mechanics and Mathematics was created at Moscow State University to promote research. In 1923 Egorov became director. One of his students, V. A. Kostitsyn, became secretary. In 1923 Egorov was also appointed Chairman of the Mathematical Syllabus Commission of the University [13, p. 127]. Egorov had been elected a corresponding member of the Academy of Sciences in 1924, and an honorary
decline to send greetings to the Sixteenth Congress of the Communist Party meeting at the same time [12, p. 79] and [11, p. 243], [28, p. 102]. Another mathematician entered the conflict, Ernst Kolman, a militant Marxist. He denounced the Society for stubbornly refusing to expel Egorov. From him we have the only publicly recorded response that Egorov gave to the charge of "wrecking: .... genuine wrecking is nothing other than the imposition of a standard worldview on scientists" [12, p. 79]. This referred to the attempt to impose Marxist methodology on science. The statement was made during the discussion following a report delivered by O. Yu. Shmidt at the Congress of Mathematicians [3, p. 11]. Shmidt had replaced Egorov as director of the Institute and was a member of the Communist Party. Kolman was particularily incensed, because Shmidt,
A painting by the famous Russian religious artist N. N. Nesterov in 1917 entitled Philosophers. Florensky is on the left, in white. On the fight is his close friend Sergei Bulgakov, a priest and religious philosopher. After being expelled from the Soviet Union in 1923, Bulgakov became one of the leading figures in the newly founded Saint Sergius Theological Institute in Paris. Florensky was, he wrote, the major inspiration behind the institute.
member on 13 February 1929 [13, p. 127]. This last appointment is noteworthy, because the Academy was then under severe pressure. At that meeting, for the first time, candidates of the party were forced into the Academy (including the recently rehabilitated party leader Nikolai Bukharin) [9, p. 103]. In the mid-1920s a "war" was declared on Egorov in his capacity as head of both the Society and the Institute. He was forced to resign as Chairman of the Mathematical Syllabus Commission of the University. In 1929, he was dismissed as director of the Institute, which was reorganized, and "a sharp proletarianization of personnel was conducted" [29]. Egorov was given a public rebuke at a meeting of graduate students of Moscow State University on 21 December 1929. They then adopted an annual work plan that committed themselves, among other things, to do "antireligious work" [7]. The Soviet historian of mathematics A. P. Yushkevich has described the accusations against Egorov as "religious-political" [28, p. 103]. (See Joravsky [11, pp. 242-244] for this entire section. See also [16].) Although it was hopeless, Egorov continued to resist. As president of the Society, he continued to make it "a refuge where the old spirit reigned." He was the leading figure at the All-Union Congress of Mathematicians held in June 1930, which had the effrontery to
a communist, not only did not rebuke him, but in his concluding statement he ignored a proposal to have the organization deal with Egorov's statement, having explained everything as a "misunderstanding." Kolman went on to denounce the guardians of the traditions of Tsinger, Bugayev, Nekrasov, and Lopatin for attempting to uphold "orthodoxy, autocracy, and nationalism." Egorov's Fate
Not long after that Egorov was arrested. Even then, the Society continued to resist by conducting its next m e e t i n g as t h o u g h n o t h i n g h a d h a p p e n e d . The meeting was devoted to reports by S. P. Finikov, one of Egorov's students, his close relative, and his closest administrative collaborator, and by Kurosh. For that Kurosh was expelled from the Komsomol [18, p. 70]. (He was later readmitted [11, p. 372, note 51].) This p r o m p t e d the formation of the "Initiative Group" and the Declaration. It declared with satisfaction that Finikov had been expelled from the faculty of the Institute, and that the Society was about to undergo "reorganization," in accordance with an eightpoint program, which included political reeducation and struggle against "a conciliatory attitude toward religion and idealistic philosophy." Tribute was paid to the "Leninist guidance of our experienced leader comrade Stalin." Although the majority of the signers were not party members, they promised to "compensate" for not having been involved in the struggle earlier. At the next meeting of the Society on 21 November 1930, the "Initiative Group" took control. Finikov attempted to protect Egorov and was expelled [29]. The results were announced in a note from the editors in the next issue of the journal published by the Society [19]. It reported that the Society had adopted the Declaration. Egorov was denounced as "a reactionary and THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
29
a c h u r c h m a n . " He a n d "other reactionaries" had been expelled. N e w leadership had been elected. Lyustemik was n o w editor-in-chief. Gelfond was secretary. The word Soviet was being added to the n a m e of the journal, although the n e w name was never actually used [23]. In fact the Society ceased functioning for a year. It was revived in 1932 [11, p. 244]. Egorov resisted to the end. He was exiled to Kazan, where his friend Chebotarev was on the faculty, and kept in isolation. Although Egorov was a member of the Kazan Physico-Mathematical Society, only Chebotarev had the courage to attempt to help him. At first not allowed to visit him, Chebotarev m a n a g e d to bribe his w a y in. He f o u n d Egorov on a h u n g e r strike and finally c o n v i n c e d h i m to give it u p , b u t too late. Egorov was already seriously ill. He asked for a Bible. His wife was finally allowed to join him [4]. On 10 September 1931 at age 61, he died in Chebotarev's arms [30, p. 24], in the local hospital [28, p. 103], [5]. Only Chebotarev a n d Egorov's wife were present at the funeral. On 25 September 1931 Izvestiya a n n o u n c e d his death and his burial in the Arskoye Cemetery in Kazan [13, p. 127].
References and Notes Note: Except for the widely used spellings of Egorov and Florensky, I have tried to use a consistent transliteration scheme. 1. Douglas E. Cameron, The birth of Soviet topology, Topology Proceedings 7 (1982), 329-378. 2. S. S. Demidov, From the early history of the Moscow school of function theory, Istoriko-Matematicheskiye Issledovamya 30 (1986), 124-129 (Russian). This volume of the journal contains several articles on the role of P. A. Florensky in the early history of the Moscow school. An English translation has appeared in Phzlosophia Mathematica 3 (1988), 29-35. 3. S. S. Demidov, The Moscow school of the theory of functions in the 1930's, preprint of a talk given at the meeting of the American Mathematical Society in Louisville KY, January 1990. 4. S. S. Demidov, phone conversation with the author, late May 1988. 5. S. S. Demidov, in a conversation with the author, August 1989, identified the hospital as belonging to the Instztut usovershenstvovaniya vrachei, an institute for retraining doctors who had worked for long periods in small towns. 6. S. S. Demidov, A. N. Parshin, and S. M. Polovinkin, ed., The correspondence between N. N. Luzin and P. A. Florensky, Istonko-Matematzcheskiye Issledovaniya 31 (1989), 116-191 (Russian). 7. I. G. Dolin, At the meeting of the graduates, Nauchnyi rabotnik, 1930, No. 1, 18-20 (Russian). 8. D. F. Egorov, Letters of D. F. Egorov to N. N. Luzin, ed. F. A. Medvedyev and A. P. Yushkevich, IstonkoMatematicheskiye Issledovamya 25 (1981), 124-129 (Russian). 9. Loren R. Graham, The Soviet Academy of Sciences and the Commumst Party, 1927-1932, Princeton: Princeton University Press (1967). 30
THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
10. Abbot Herman and Fr. Damascene, St. Paul Florensky, The Orthodox Word 23 (1987), 195-209, 223-231. Paul is the Latin equivalent of Pavel, Florensky's given name. 11. David Joravsky, Soviet Marxism and Natural Science 1917-1932, New York: Columbia University Press (1961). 12. Ernst Kolman, Wrecking in Science, Bolshevzk, 1931, No. 2, 73-81 (Russian). 13. P. I. Kuznetsov, Dmitrii Fedorovich Egorov, Russian Mathematical Surveys 26 (1971), 125-164. 14. M. A. Lavrentev, Nikolai Nikolaevich Luzin, Russian Mathematical Surveys 29 (1974), 173-178. 15. Aleksey E. Levin, Expedient catastrophe: A reconsideration of the 1929 crisis at the Soviet Academy of Sciences, Slamc Review 47 (1988), 261-279. 16. Aleksey E. Levin, Anatomy of a public campaign: "Academician Luzin's Case" in Soviet political history, Slavic Review 49 (1990), 211-252. 17. L. A. Lyusternik, The early years of the Moscow mathematical school (Part U), Russzan Mathematical Surveys 22 (1967), 171-211. 18. L. A. Lyusternik, L. G. Shmrelman, A. Gelfond, L. Pontryagin, and Nekrasov, Declaration of the Initiative Group for the Reorganization of the Mathematical Society, Nauchnyi rabotnik, 1930, No. 11-12, 67-71 (Russian). I thank David Webb for his translation of this article and the one by Demidov. 19. Matematichesku Sbornik 38 (1931), Ot redaktsii, inside front cover. 20. Esther R. Phillips, Nicolai Nicolaevich Luzin and the Moscow school of the theory of functions, H~storiaMathematica 5 (1978), 275-305. 21. Dimitry Pospielovsky, The Russian Church under the Soviet Regime 1917-1982, Crestwood, NY: St. Vladimir's Seminary Press (1984). 22. Allen Shields, Luzin and Egorov, The Mathematical Intelligencer 9 no. 4 (1987), 24-27. 23. Allen Shields, Luzin and Egorov: Part 2, The Mathematical Intelligencer 11 no. 2 (1989), 5-8. 24. Robert Slesinski, Pavel Florensky: A metaphysics of love, Crestwood, NY: St. Vladimir's Seminary Press (1984). 25. Andronik Trubachev, Priest Pavel Florensky--Editor of Bogoslovskii Vestnik, Bogoslovskiye Trudy 28 (1987) (Russian). This is a publication of the Moscow Patriarchate. Trubachev is Florensky's grandson and himself a priest. 26. Uspekhi Matematicheskikh Nauk 25 (1970), Lazar Aronovich Lyusternik, 3-10 (Russian). In the poem on page 4 Lyusternik represented the unnamed professor with the Latin letter Y. This suggests Egorov, because Y is the first letter in many transliterations of that name. 27. Alexander Vucinlch, Science in Russian Culture1861-1917, Stanford, CA: Stanford University Press (1970). 28. A. P. Yushkevich, The case of Academician N. N. Luzin, Vestnik akademiyi nauk USSR, 1989, No. 4, 102-113 (Russian). 29. I. Zaidenvar, The October Revolution in the Mathematical Society and in the Institute of Mechanics and Mathematics, VARNITSO, 1930, No. 11-12, 73-74 (Russian). This journal was merged with Front naukt i tekhniki. 30. Smilka Zdravkovska, Listening to Igor Rostislavovich Shafarevich, The Mathematical Intelligencer 11 no. 2 (1989), 16-28. 31. Nicolas Zemov, The Russian Religious Renaissance of the Twentieth Century, London: Darton, Longman & Todd (1963).
Department of Mathematics and Computer Sczence Saint Louzs Unwersity Saint Lores, MO 63103 USA
David Gale* For the general philosophy of this section see Vol. 13, no. 1 (1991). Contributors to this column who wish an acknowledgment of their contribution should enclose a self-addressed postcard. We All Make Mistakes Some of us more than others, so it may be comforting to realize that occasionally even the great mathematicians have published erroneous results. Readers are invited to contribute examples of this phenomenon, where a mistake should mean not just a gap in a proof or a case of meaning one thing and writing another, but rather an explicit assertion that is false. Are there any such examples in the work of Gauss, I wonder? R. M. Robinson has called my attention to a lapse by Minkowski in which he asserts that the difference set of a tetrahedron is an octahedron. In his 1906 paper "'Dichteste gitterf6rmige Lagerung kongruenter K6rper," Nachrichten der K6nig. Gesellschafl der Wissenschaften zu G6ttingen 5(1904), 311-355, Minkowski writes For example, if K is a tetrahedron, then 89 + K') becomes an octahedron with faces parallel to the faces of the tetrahedron. Here K' is "the reflection of K with respect to the point O." It's a curious sort of error since the polyhedron in question clearly has 12 rather than 6 vertices, namely all pairs a, - ar i # j, where the a. are the vertices of the tetrahedron. The correct polyhedron is in fact the convex hull of the midpoints of the 12 edges of a cube, so it has eight triangular and six square faces.
Paradoxes and a Pair of Boxes What is a paradox? Perhaps the best-known examples in mathematics are Russell's paradox and the BanachTarski paradox, but it should be noted that the nature of these two results is very different. The BanachTarski Theorem is considered paradoxical because it shows that sets can behave in a way very different from our intuitive notions about them. Russell's paradox, on the other hand, shows that starting from
* C o l u m n editor's a d d r e s s : D e p a r t m e n t of M a t h e m a t i c s , University of California, Berkeley, C A 94720 USA.
what seem to be plausible axioms one can arrive at a contradiction. The proper term for this is not paradox but antinomy, which, according to Webster, is "a contradiction between two apparently equally valid principles or between inferences correctly drawn from such principles," while a paradox is "a statement that is seemingly contradictory or opposed to common sense and yet is perhaps true." In the examples to follow the first two are antinomies and the last two are paradoxes. 1. The other box. You are presented with two boxes, each containing a certain amount of money that has been placed there by the following rule. A fair coin was tossed until it fell tails. If n heads were tossed then one of the boxes contain 3" dollars and the other 3,+ 1. You are allowed to open one of the boxes and count its contents. You may then either pocket this money or switch and take the money in the as yet unopened box. What should you do? Well, clearly if the box you open contains one dollar you should take the three dollars in the other box. N o w suppose the box you open contains 3" dollars. Then one easily sees that the other box will contain 3"-1 or 3" + 1 dollars with respective probabilities 2~ and 1/3, so your expectation from switching is 2 3,_1 + 1 3,+1 = 11 3" > 3", so y o u m a x i m i z e y o u r e x p e c t e d w i n n i n g s b y switching; so, assuming you are an expectation maximizer, this is what you will do. (The debate as to whether expectation maximizing is "reasonable" is of course beside the point, since we are concerned here only with the mathematics and not with its implications for behavior.) But now, since you know in advance y o u will always switch, there is no point in w a s t i n g time o p e n i n g and counting: y o u should simply choose "the other box" to begin with, and the same argument then shows that whichever box you choose, you would have been better off expectationwise to have chosen the other. Readers will have noticed that this game has a definite Petersburgian flavor in that the expected win-
THE MATHEMATICALINTELLIGENCERVOL 13, NO 2 9 1991Spnnger-VerlagNew York 31
nings in the game are infinite. The novelty of this variant is the fact that it seems to lead to a contradiction and thus we are dealing with an antinomy rather than a mere paradox. 2. Beat the house. A somewhat similar example was told to me by Lester Dubins, who is uncertain as to its origin. In a certain casino one can play the following game. The house posts a positive integer n. In this game it is you the customer who is invited to toss the fair coin until it falls tails. If you tossed n - 1 times then you pay the house 8"-1 dollars, but if you tossed n + 1 times you win 8" dollars from the house. In all other cases the payoff is zero. Since the probability of tossing exactly n times is 1/2", your expected winnings are 8"/2" +1 _ 8"-1/2"-1 = 4"-1 for n > 1, and 2 for n = 1, so your expected gain, which is the house's expected loss, is positive. But now it turns out that the house arrived at the number n by tossing that same fair coin and counting the number of tosses up to and including the first tails. Thus, you and the house are behaving in a completely symmetric manner. Each of y o u tosses the coin and if the n u m b e r of tosses happens to be the consecutive integers n and n + 1, then the n-tosser pays the (n + 1)-tosser 8" dollars. But we have just seen that the game is to your advantage as m e a s u r e d by expectation no matter what number the house announces. How can there be this asymmetry in a completely symmetric game? 3. The other box again. (A mild modification of an example due to David Blackwell.) This time the boxes contain not money but each box contains an integer (perhaps printed on a card), and the only thing you know about them is that they are distinct. You draw one of them at r a n d o m and are then s u p p o s e d to guess whether the other is higher or lower. Is there anything you can do so that you will have a better than even chance of guessing correctly? Surprisingly the answer is yes, provided you have some mechanism for randomizing as, for example, a true coin to toss. To be specific, suppose you have a spinner like those used in children's board games. You should proceed to spin and then record the angle 0 between the initial and final position of the pointer. Now draw your number and guess higher or lower according as cot 0/2 is greater or less than the number you drew. Let us assume for convenience that 0 has the uniform distribution on [0,2~r). Claim: If the two numbers are p > q, then the probability that you will guess correctly is 1/2 + (cot-lq _ cot-lp)/2~r. Namely, q < cot 0/2 < p if and only if 2 cot-lp < 0 < 2 cot-lq and from uniformity this has probability ~/ = (cot-lq - cot-lp)/~r. In this case because of the guessing rule you will be right no matter which number you draw. In the other cases where 0 is either greater or less than both p and q your probability of guessing right is one-half, since you are equally likely to draw p or q. Thus your probability of a correct guess is 1/2(1 - -,/) + ~/ = 1/2 + ~/2 as claimed. 32
THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
Mathematically the argument above is air-fight but it raises some interesting philosophical questions about the applicability of probability theory in decision making. For example, s u p p o s e you d o n ' t have a spinner but are wearing a watch, the old-fashioned nondigital kind, and choose 0 to be the angle between the m i n u t e and hour h a n d in order to make y o u r guess. Is this in any sense a kind of randomization and if not why not? Or, suppose instead of numbers the boxes contain stones of different weights and you have a balance so you can compare weights but no scale for making measurements. You draw a stone and must guess whether the other is heavier or lighter. Is there anything you can do? This might lead one to think that randomization is possible only for quantitative comparisons, meaning that one must associate numbers with the objects drawn, but this need not be true either. Suppose, for example, the boxes contain slices of pie. Then a spinner is just what you need. Guess bigger or smaller depending on whether the spinner angle is greater or less than the angle of the slice you draw. This can be determined by direct visual comparison. There is no need for a protractor and numbers are not involved in any way. 4. The other's number. This time the integers in the boxes are positive and consecutive. Each player draws one a n d is s u p p o s e d to find out the o p p o n e n t ' s number by the following procedure. The players are equipped with a blank card and a pencil. If at any time a player knows her opponent's number, she writes it on her blank card and wins the game. If neither player knows the other's number, they exchange blank cards and start over. The assertion is that, with perceptive players, this game will terminate. More precisely we have THEOREM. If the two numbers are n and n + 1, then the player holding n will win after n - 1 exchanges.
The proof is by induction. If n = 1 then the player holding 1 will know the opponent's number is 2 and the game ends with no exchanges. Now assume the conclusion is true up to n a n d suppose the lower number is n + 1. Then the player holding this number knows that if her opponent holds n he will end the game after the (n - 1)st exchange (induction hypothesis), so when he doesn't do this she knows after the exchange that he must hold n + 2 and she wins. The paradoxical point is this: suppose the numbers held are, say, 72 and 73. Then neither player knows the other's number and both are aware of their opponent's ignorance so they know for sure that the first stage of the game will be an exchange of cards; so w h e n this indeed takes place they have apparently gained no new knowledge, yet since the game is now one step closer to termination, something must have changed. What was it?
Problems Supporting cords of convex sets (91-2) by Serge L. Tabacnikov (Moscow, USSR). Let A,B be plane convex sets with A C int B. Prove there are at least two cords of B that are tangent to A at their midpoints.
Boomerangproblem: Quickie (91-3) (origin unknown to column editor; information would be appreciated). A boomerang is a nonconvex quadrilateral. Prove that it is impossible to tile a convex polygon with a finite number of (not necessarily congruent) boomerangs.
Two contributions from Lee Sallows
letter frequencies are determined and then substituted for these, the new version then furnishing the argument for the next iteration, and so on. The result is a series of approximations tending toward the goal. I like to picture this process as a machine that takes sentences as input and yields sentences as output, the latter coupled back to the input via a feedback loop. This makes it easier to see that a self-descriptive sentence is effectively a virus able to subvert the machine so as to get itself perpetually reproduced. Such a sentence has only to appear once at the input in order to trigger a closed loop of period 1 and thus be regurgitated ad nauseam (if you see what I mean). The only trouble is, there are still other viruses that will probably infect the machine first! These are the sentence chains of longer period, in any one of which it may easily become ensnared, and thus be prevented from converging onto a self-descriptor. H o w can we immunize the machine against such interloopers? My answer is a modified machine that will scramble possible cycles through performing non-repetitively: Instead of correcting every total on every pass, I have it correct a single total chosen at random each time. N o w no recurrent cycle can survive such irregular exchanges, except of course in the special case where the totals remain unchanged because already correct: a self-descriptor! [Note here the complete analogy with a neural network settling into a stable solution state, while avoiding latch-up in pseudo-solution states through "jiggling."] In fact the "random" selection need not be truly random, provided only that the repetition period of its own pattern be longer than that of any possible loop the machine may fall into. Hence, any conventional pseudo-random number generator serves well. A few million iterations (mutations) normally suffice to evolve (naturally select) a viable solution (virus) provided one exists. If not we can try again with a modified text." Sallows" second contribution involves neither letters nor numbers and is presented below without comment.
First the following: "This computer-generated sentence contains two hundred forty-seven letters: four a's, one b, four c's, five d's, forty-four e's, nine f's, three g's, seven h's, eleven i's, one j, one k, three l's, two m's, twenty-nine n's, nineteen o's, two p's, one q, fourteen r's, thirtyone s's, twenty-five t's, seven u's, eight v's, seven w's, two x's, six y's, and one z." N o w that you've had a chance to verify the correctness of the sentence you may wonder h o w the computer generated it. Here is Sallows' description of how it's done. "The algorithm that generated the above sentence implements an iterated function. Starting with a similar text, but using randomly selected totals, its true THE MATHEMATICAL INTELLIGENCER VOL 13, N O 2, 1991 3 3
Information-Based Complexity: New Questions for Mathematicians J. F. Traub and H. Wo
1. Introduction The computational complexity of a problem is the intrinsic difficulty of its solution and is independent of the particular algorithm used. It is a fundamental invariant of computation. Information-based complexity typically studies the computational complexity of infinite dimensional problems, that is, problems defined on an infinite-dimensional space. For brevity we denote information-based complexity as IBC. As a fairly recent field, it poses many new questions for mathematicians. This paper consists of five sections. To make the paper self-contained, Section 2 is devoted to the basic
34
.niakowski
concepts of IBC, confined to material required for the rest of this paper. The reader is referred to a recent monograph [1] for a comprehensive treatment of IBC and an extensive bibliography of the world literature. A survey may be found in [2]. Readers already familiar with the basic concepts may wish to move directly to Section 3. Sections 3 and 4 discuss eight open research questions. Many important problems are provably intractable in the deterministic worst-case setting; that is, there can never be sufficient computer resources to obtain a solution. Can randomization make these problems tractable? What are the power and limitations of randomization? Our current knowledge and
THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2 9 1991 Spnnger-Verlag New York
ignorance is discussed and a number of open questions are posed. Whether settling for an assurance on the average (rather than for the worst case) makes hard problems tractable is also discussed. Assigning values to mathematical h y p o t h e s e s is taken up in Section 4. Hypotheses regarding convexity, smoothness, symmetry, etc. are often made for mathematical convenience. We suggest that computational complexity can be used to quantify the effects of these hypotheses. In the concluding section we compare and contrast IBC and combinatorial complexity. We also briefly discuss a complexity theory recently introduced by L. Blum, M. Shub, and S. Smale [3].
2. Fundamental Concepts of IBC 2.1. An Expository Example. We use an elementary example to motivate the abstract model. For a scalar function, f: [0,1] ~ R, we wish to approximate /- 1
S(f) = J0 f(t)dt. For most integrands, the antiderivative cannot be symbolically expressed in finite terms and we have to integrate numerically. We sample the function at n points. Let
N(f) = [f(tl) . . . . .
/(t,)]
denote the information available concerning the function f. It is clear that this is not sufficient to compute an approximation to the definite integral since the function can have arbitrary behavior between the sample points. We assume that f is r times continuously differentiable with uniformly bounded rth derivative, say by 1, and let F denote the class of all such functions. Our goal is to compute an c-approximation, U(f), such that IS(f) - U(f)l ~< ~,
V / E F,
for any pre-assigned error threshold e. Here U(f) = r for some mapping 6: R" ---> R. That is, the computation of U(f) requires n function values plus a number of arithmetic operations, comparison of real numbers, and evaluations of certain elementary functions needed to compute •(y) for given y = N(f). Define the cost of computing U(f) as the cost of n function values and the cost of operations used in computing q~(y). We can now informally define the computational complexity of this example. The computational complexity, comp(~), is the minimal cost of computing an e-approximation to J'~ f(t)dt for all f from F. The computational complexity is an intrinsic property of computing an e-approximation to a definite integral for the class F. It turns out that comp(r is of order (1/r llr.
Although univariate and multivariate integration are classical problems, there is a huge recent literature on the computational complexity of integration for various classes of integrands. See [1], Chapters 5 and 7, and [4], Chapter 6, Section 4. 2.2 Assumptions of IBC. IBC is defined by three assumptions, namely, that information is 9 partial 9 contaminated 9 priced. We motivate each of these assumptions, in turn, using the integration example. For this example, the information is the vector N(f) = [f(tl) . . . . . f(t,)] and the knowledge that f E F. From this it is, in general, impossible to uniquely recover the function f. That is, N is a many-to-one operator from the function class F to R". We say the information is partial. We must usually deal with partial information w h e n solving an infinite-dimensional problem on a digital computer. The reason is that a digital computer can handle only finite sets of numbers. Objects such as functions must be replaced by finite sets of numbers. Although the focus of IBC has been on infinite-dimensional problems there has been some work on finite-dimensional problems. Examples are large linear systems and eigenvalue problems ([1], Chapter 5, Sections 9 and 10), as well as the discrete problem of synchronizing clocks in a distributed system (see [5]). The function values in the integration example are computed with round-off error. Typically, information is contaminated with errors such as round-off error, measurement error, and h u m a n error. The primary cost of solving the integration problem numerically is computing the function values. We assume information is priced. Combining these function values into an estimate of the definite integral is relatively inexpensive. We give a second example of priced information. In seismographic exploration for oil, the information may consist of data collected from explosions. This is an extremely expensive process. Many computational problems from mathematical analysis, science, engineering, and economics have information that is partial, contaminated, and priced. If the information is partial and/or contaminated we cannot solve the problem exactly; we are forced to accept an approximate solution. How to obtain such solutions at minimal cost is studied in IBC.
2.3. An Abstract Model. We now present an abstract formulation that includes m a n y i m p o r t a n t problems as special cases. Consider an operator
S:F--~G, THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991 3 5
where F is a subset of a linear space and G is a normed linear space. For f E F, we wish to compute an approximation to S(f). To do this we must know something about f. A basic assumption is that we have only partial information about f. We gather this partial information about f by information operations L(f). Here we will assume that L is a linear functional; see [1], page 27, for a more general definition of L. Let A denote the class of informarion operations that we will permit. The choice of A will depend on the problem we wish to solve. For example, if we wish to approximate a definite integral we must exclude definite integration as a permissible information operation, and for this problem A is usually defined as the class of function evaluations. For other problems, such as the solution of nonlinear equations, we may permit any linear functional. Let
N(f) = ILl(f) . . . . .
Ln(f) ],
for L, E A. The L, as well as the number n can be chosen adaptively. N(f) is called the information on f and N the information operator. The motivation for introducing the information operator N is to replace the element f from an infinite-dimensional space by n numbers. An idealized algorithm rb is an operator q~:N(F) ~ G. The approximation U(f) is then computed by
U(f) = r The assumption that the approximation is the composition of q~ with N is made without loss of generality. In the worst-case setting we seek U(f) such that IIS(f) -
U(f)ll ~ e,
Vf (~ F.
In the average-case setting we seek U(f) such that
~I Hs(f)-
U(f)ll2dpo(f) ~ e,
where ~ is a probability measure on F. Such an U(f) is called an e-approximation. The integration example is an instance of this abstract model with 1
S(f) = fo f(t)dt,
IS(f) - U(f)l ~ e,
Vf ~ F,
or in the average-case setting
~ f l IS(f) - U(f)12d~(f) ~ e, where ~ may be a truncated Wiener measure placed on rth derivatives. 2.4. Model of Computation. To define computational complexity we must first introduce our model of computation, which is defined by two postulates: (i) We are charged for each information operation. That is, for every L E A and f E F, the computation of L(f) costs c, where c > 0. (ii) Let 1~ denote the set of permissible combinatory operations, including the addition of two elements in G, multiplication by a scalar in G, arithmetic operations, comparison of real numbers, and evaluations of certain elementary functions. We ass u m e that each combinatory operation is performed exactly with unit cost. In particular, we assume the real number model, that is, we can perform operations on real numbers exactly and at unit cost. See Section 5 for a discussion and motivations underlying the model of computation and the real number model. We briefly describe how the computation is carried out and h o w its cost is calculated. See [1], page 34, for a complete description. Let cost(N,f) denote the cost of computing the information N(f). Knowing the information N(f), the approximation U(f) = r is computed by combining the information to produce an element of G that approximates S(f). Let cost(q~, N(f)) denote the cost of computing U(f) = r given N(f). Then the total cost of computing U(f), cost(U,f), is cost(U,f) = cost(N,f) + cost(b, N(f)). 2.5. Computational Complexity. We are ready to define the basic concept of our subject, the computational complexity of an t-approximation. As in Section 2.3, let U(f) = r be an approximation to S(f). Then the computational complexity, comp(e), is defined as comp(e) = inf{cost(U): U such that e(U) ~ e},
F = {f:f ~ C r (0,1) and IIf(r)La. ~ 1}, and G is the set of real numbers. The functionals are chosen as L,(f) = f(t,). An example of an algorithm is
r
1 = - ~_~ f(t,). nz=l
We seek U(f) = rb(N(f)) such that in the worst-case setting 36
THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
with the convention that inf r = ~. The definitions of cost(U) and the error e(U) vary according to the setring. As in Section 2.4, let cost(U,f) denote the cost of computing U(f). Settings studied in IBC, see [1], include worst case, average case, probabilistic, randomized, and asymptotic. Mixed settings are also studied. We confine ourselves here to the definition of just the worst-case and average-case settings.
Worst-Case Setting: The worst-case error and worst-case cost of U are defined by e(U) = sup IIS(f) fEF
-
U(f)ll,
cost(U) = sup cost(U, f). f~F
Average-Case Setting: Let ~ be a probability measure defined on F. The average-case error and average-case cost of U are defined by e ( U ) = ~fF IIS(f) - U(f)ll2dp'(f)" cost(U) = (cost(U,f)dw(f). & For the remainder of this paper we shall, for brevity, refer to computational complexity as complexity. We began this paper by pointing out that the complexity of a problem is the intrinsic difficulty of its solution and is independent of the particular algorithm used. We can n o w amplify this observation. Complexity depends on S and F. It also depends on the setting, and for the average-case setting on the a priori measure ~. Finally, it depends on A, the class of permissible information operations, and on the model of computation. The concept of complexity permits us to introduce the fundamental concepts of optimal information and optimal algorithm. Information N and an algorithm r that uses N are called optimal information and optimal algorithm, respectively, if and only if U = ~b o N satisfies cost(U) = comp(e) and e(U) ~< e.
3. Intractability and Noncomputability for Continuous Problems 3.1. Fundamental Concepts. A central question in combinatorial complexity is w h e t h e r NP-complete problems are intractable. This question has been extensively studied for discrete problems; see [6]. It has recently been posed for finite-dimensional continuous problems by Blum, Shub, and Smale [3]. We will define and discuss intractable and noncomputable problems of IBC. As we shall see, many important problems are probably intractable or noncomputable in the deterministic worst-case setting. Can randomization or settling for an average or probabflistic assurance break intractability and noncomputability? We shall indicate w h a t is k n o w n and then pose a number of open questions. Given an error tolerance ~ > 0, it has been shown for many problems defined on function spaces that in the deterministic worst-case setting the computational complexity, comp(~), is of order (1/e) d/r, where d is the problem dimension, i.e., d is the number of variables
of functions, and r measures the smoothness; see [1]. That is, comp(e) = O We use big O notation, which means that there exist positive constants c u c2, and c3 such that Cle-d/r comp(e) ~< c2~.-d/r for e ~ [0,c3]. Among the problems with such complexity are (see [1]) approximation, nonlinear optimization, systems of nonlinear equations, and Fredholm integral equations. A numerical example illustrates the implications of this complexity. If one wants eight-place accuracy in computing triple integrals of functions that are once differentiable, then e = 10 -8, d = 3, r = 1, and (l/e) a/r = 1024. Even if we generously assume the existence of a computer that performs 101~ operations per second, the computation will take on the order of 1014 seconds or over 3 million years. Note two important differences between a typical NP-complete problem such as the traveling-salesman problem and the multivariate problems discussed above. Exponential complexity for the travelingsalesman problem (and many other problems) is currently only conjectured, but exponential complexity has been established for the indicated problems in information-based complexity. Secondly, the role of problem size in combinatorial complexity is, in information-based complexity, often taken by the dimension d. We observe that problems of high dimensionality occur in such diverse disciplines as physics and nonlinear economic modeling. If the complexity is of order (1/e) d/r, then for fixed smoothness r and increasing dimension d the problem is exponential in d and is intractable. An even more serious impediment to solving a problem than intractability may occur; a problem may be noncomputable. We say a problem is noncomputable if it is impossible to compute an e-approximation at finite cost. That is, there exists % > 0 such that comp(e) = 0%
Ve ~< %.
THE MATHEMATICALINTELLIGENCERVOL 13, NO 2, 1991 3 7
The motivation for this definition is that if the complexity is infinite, there does not exist an algorithm for computing even an approximate solution that uses a finite number of functionals and a finite number of operations. We give a simple example of a noncomputable problem. We wish to compute an e-approximation to a definite integral in the deterministic worst-case setring. It is enough to consider the scalar case. Assume that we know the class of integrands consists of continuous and u n i f o r m l y b o u n d e d functions. This problem is noncomputable. This is a very special case of a general situation. We observed above that often comp(e) is of order (l/e) a/r for positive r. If the functions are only continuous, that is, if r = 0, then generally the complexity is infinite for all these problems. We e m p h a s i z e that these negative results concerning intractability and noncomputability are instrinsic. There is no way around them by being clever. The intractability and noncomputability results stated above are for the deterministic worst-case setting. This is the same setting in which a problem in combinatorial complexity is NP-complete. If we wish to mitigate these negative conclusions we are forced to consider other settings. 3.2. Does Randomization Help? One possible way to break intractability or noncomputability is to permit randomization. Physicists have long known about the power of randomization. For example, classical Monte Carlo methods may be regarded as using randomized information to make the number of function evaluations needed to compute an e-approximation to a multivariate integral independent of dimension. More recently, computer scientists have used randomized algorithms to solve problems with complete information such as primality testing; see, for example, [7]. We will permit randomization of all the following: 9 The information can be randomized. That is, both the functionals and their number can be randomly chosen. 9 The algorithms can be randomized. We emphasize that whenever randomization is introduced it weakens the assurance we can offer regarding the error. Even if we are willing to live with this weaker assurance, will randomization decrease complexity? As we shall discuss below there are both negative and positive results. To pose questions and indicate results we must specify the settings. The first setting we will describe is the setting u s e d in classical Monte Carlo. The r a n d o m i n f o r m a t i o n a n d a l g o r i t h m d e p e n d on random variables whose distribution is p. The error and cost of an algorithm are defined by averaging with respect to p, which may depend on f, and then taking 38
THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
the worst case with respect to problem elements f. The complexity is defined as the minimal cost for approximating the solution to within e. That is, the information, the algorithms, and the distribution p are all selected optimally. This is called the worst-case setting with randomization and the complexity of a problem in this setting is denoted by compW~ See [1], Chapter 11, for a precise definition. The complexity of a problem in the deterministic worst case setting is denoted by compw~ Obviously, compW~
~ compW~
Ve /> 0,
because determinism is a special case of randomization. A central question is w h e n is compW~ much less than compW~ Furthermore, is it possible for a problem that is intractable !or even noncomputable) in the deterministic worst-case setting to be tractable (or computable) in the worst-case setting with randomization? It Lurns out that randomization may help a great deal or it may not help at all. A problem where rand o m i z a t i o n helps is multivariate integration; see Bakhvalov [8]. For the class of r times differentiable functions defined on the d-dimensional unit cube with uniformly bounded derivatives in the sup norm, we have :
and the problem is intractable. On the other hand, compW~
= O
.
Since the exponent is bounded by 2,
and the problem is tractable. Now let r = 0. Then there exists % such that compWO,-aet(e) = o%
Ve ~ %,
and the problem is noncomputable. On the other hand, compWO,-,-~(e) = @ and the problem is computable and tractable. Indeed, this bound is achieved by the classical Monte Carlo method. Thus for multivariate integration, randomization helps enormously and compw~
~ compWO'-det(e).
On the other hand, for certain approximation prob-
lems, see [1], pages 423-425, randomization does not help and compWO,-r~(9 = @(compWO~-a~(e)). We only have examples where randomization does or does not break intractability and noncomputability. There is no theory that characterizes for which problems randomization can help. Furthermore, there is no theory that sets limits on how much randomization can help. This suggests the following questions. Question 1. Characterize those problems for which randomization helps significantly. In particular, for which problems does randomization break noncomputability or intractability? Question 2. What are the limits on h o w much randomization can help?
3.3. Does the Average-case Setting Help? As we have seen, randomization may break intractability or noncomputability. Another possibility is to replace the worst-case setting by average or probabilistic settings. That is, we w e a k e n a worst-case assurance to an average or probabilistic assurance. For simplicity of exposition we limit ourselves here to two average-case settings: the deterministic average-case setting and the average-case setting with randomization. In the deterministic average-case setting the error and cost of an algorithm are defined by averaging with respect to a given measure on the set of problem elements. The average-case complexity is defined as the minimal cost of approximating the solution to within 9 and is denoted by compa~g'aet(e). This is entirely analogous to the definition of average-case complexity in the study of combinatorial problems. The average-case setting with randomization is defined similarly with deterministic information and algorithm replaced by randomized. See [1], Chapters 6 and 11, for precise definitions. The complexity in this setting is denoted by comp'~g'm(e). We will limit our discussion to linear problems. In the worst-case settings, a problem is said to be linear if S is a linear operator and F is a ball of finite radius. In the average-case setting, we additionally assume that the measure is a truncated Gaussian measure over F; see [1], Chapters 4 and 6, for more details. Examples of linear problems include multivariate integration, multivariate approximation, and linear partial differential equations, with appropriate measure. For linear problems, if we ignore multiplicative constants of order I which are independent of e, we have the following basic relations among the four settings defined above, comp,Vg-m(~) = compavg~det(e) ~< compWO~-~n(e) compwor'det(e).
The equality compavg-r~(e) = compavg-d~(e) means that randomization does not help on the average; see [1], page 420. Consider the inequality comp,~g*l,t(e) ~ compWO'-aa(e). It turns out that settling for an average assurance may help a great deal or it may help only a little. A problem where an average assurance helps a great deal is integration of continuous functions. As mentioned before, in the deterministic worst-case setting the problem is noncomputable. In the average case with Wiener measure, comp'Vg~(e) is at most of order e -2, and it was shown by Papageorgiou [9] that it is at least of order e-1. The proof of the u p p e r b o u n d is based on the Monte-Carlo algorithm and therefore does not provide a constructive w a y to find deterministic points at which the function should be evaluated to achieve the b o u n d e -2. Very recently, Wo~niakowski [10] showed that
O( (Io4))
To achieve this bound the function should be evaluated at the deterministic points that can be derived from Hammersley points. Therefore, with an average assurance, the problem is tractable. Another problem where an average assurance may help a great deal is approximation of smooth periodic functions studied by Kowalski and Sielski [11]. (We take their parameters rI - r and aj =-- 1.) They proved that the problem is intractable in the deterministic worst-case setting: compW~'a~(e) = O
.
In the average-case setting, consider a Gaussian measure with mean zero and with a correlation operator whose eigenvalues are proportional to j-v, v > 1, for j = 1,2 . . . . . Then compavg-det(~) = @ where q = 2d/(2r + (v - 1)d); see [1], pages 308-311. If v is chosen independently of d, the problem becomes tractable since q ~ 2/(v - 1). On the other hand, consider the same approximation problem with a Gaussian measure dependent on d by letting v --- 1 + 2ra/d for some fixed positive a. Then the average-case complexity is given by compavg'aet(e) = O
r(lq-a)
9
Thus, the problem is still intractable on the average, and the complexities in worst-case and average-case settings are almost the same for small a. As in the case of randomization, we lack a theory THE MATHEMATICAL INTELL1GENCER VOL 13, NO 2, 1991 ~ 9
that characterizes for which linear problems the complexity in the average-case settings is significantly less than for the worst-case setting. Furthermore, almost nothing is known for nonlinear problems. Exceptions are the recent work of Graf, Novak, and Papageorgiou [12] on zeros of scalar functions that change sign, and the work of Jackowski [13] for multilinear problems. Our discussion suggests the following questions. Question 3. Characterize linear operators S and measures for which the average-case setting helps significantly. In particular, for which problems and measures does the average-case setting break noncomputability or intractability? Question 4. Extend the analysis of average case to nonlinear problems; in particular, to such nonlinear problems as nonlinear equations, optimization, and nonlinear differential equations.
4. Assigning Values to Mathematical Hypotheses 4.1. Introduction. Consider the approximate solution of a large linear algebraic system, A x = b. We would expect that if A is symmetric the computation is easier than if A is non-symmetric, and if A is also posifive-definite the computation is easier still. Generally, computational complexity gives us the opportunity to quantify the effect of mathematical hypotheses. Since complexity is an intrinsic property of a problem, a n y c h a n g e in complexity in using hypothesis H 1 rather than hypothesis H 2 is solely due to the change in hypotheses. We can use computational complexity to see the effect of assuming properties such as convexity and smoothness of functions, or symmetry and positive definiteness of matrices. We consider three examples: 9 Integration of smooth scalar functions 9 Convexity versus smoothness in nonlinear optimization 9 Positive definiteness in large linear systems For simplicity we shall confine ourselves to the worst-case setting. 4.2. Integration. As mentioned before, the complexity of univariate integration for the class of functions of smoothness r is comp(e) = @ Thus, for integration we know what effect changing the hypothesis about smoothness has. For example, if we change from once differentiable to twice differentiable, the complexity decreases by a factor of Ve~. 40
THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
Indeed, many problems defined for scalar functions have complexity of order (1/9 1/r. Examples may be found in [1], Chapter 5. In such cases, we know how smoothness affects complexity.
4.3. Nonlinear Optimization. N e m i r o v s k y a n d Yudin [14] study a rather general nonlinear constrained optimization problem for both smooth and convex functions. Their results tell us how m u c h easier the assumption of convexity makes the problem. The problem is to minimize a nonlinear function subject to nonlinear constraints. Let f = [f0, fl . . . . . fm] where f0 denotes the objective function and the f,, i = 1. . . . . m, denote the constraints. Let F be a class of functions f. First, let F = F r be the class of r-times continuously differentiable functions such that the rth derivative of fj in any direction is uniformly bounded on a nonempty compact set D. Then compr(9) = | Thus, the problem is intractable for the class of smooth functions. Now, let F = Fc be the class of convex functions that satisfy a Lipschitz condition with a uniform constant on a bounded convex set D. The complexity for the class of convex functions is of the form compc(e) = | where the constant in the @ notation depends polynomially on d and m. Thus, for convex functions, the problem is tractable. These results quantify the relative value of smoothness and convexity for this optimization problem. 4.4. Large Linear Systems. Let A x = b denote a large linear system where A is an n - b y - n nonsingular, large sparse matrix. If all elements of A and b are known, then the information is complete and we can, in principle, solve the problem exactly by a direct method. However, if n is large, say of order 103 or 104, then the time of a direct solution may be prohibitive. Furthermore a direct algorithm may require too much storage. We can adopt the following strategy. Even if the matrix A is available we will not use it. We will replace the complete information about A with partial information and settle for an approximate solution. That is, the problem is to compute x such that IIAx - b{I ~ 9 provided that Ilbll = 1. This is the idea behind iterative methods for solving large linear systems. What partial information about A is it reasonable to know? For sparse matrices, it is reasonable to use matrix-vector products A v since they can usually be corn-
puted in time and storage proportional to n. Assume then that the information about A and b consists of matrix-vector multiplications,
Nk(A,b ) = [b, Azl . . . . .
AZk],
where z, may depend on b and on the previously computed vectors Az 1. . . . . Azz- 1. To be of interest, k ~ n. Krylov information is the special choice z I = b, z z = Az,_ 1. Thus Krylov information is given by
Nk(A,b) = [b, Ab . . . . .
Akb].
Many of the standard iterative methods, such as conjugate gradient, Chebyshev, minimal residual, and successive approximation, use Krylov information. Chow [16] showed that Krylov information and the minimal residual algorithm are almost optimal; that is, they are optimal up to a multiplicative factor of 2 if A belongs to a class F of orthogonally invariant matrices. A class F is orthogonaUy invariant if
A E V ~ QTAQ E V for any orthogonal matrix Q, QTQ = L Examples of such F are positive definite matrices, symmetric matrices, and matrices with uniformly b o u n d e d condition number. Let compv(e,F ) denote the complexity of solving large linear systems if information consisting of matrix-vector multiplications is used. Formulas for compv for a number of classes may be found in [15]. We are now ready to give a partial answer to how much easier the problem is if A is positive definite. Let F 1 be the class of symmetric positive definite matrices with condition numbers uniformly b o u n d e d by M. Class F2 differs from F 1 by the lack of positive definiteness. Then it follows from the general expressions for compd 9 that compv(e,F1) = M-V~compde,F2), for 9 small, M large, and n so large that n > Mln(2/e). Thus if we limit ourselves to information consisting of matrix-vector multiplications, we see that the property of positive definiteness makes a problem easier by a factor of about M - ~ . This is only a partial answer; why should we limit ourselves to matrix-vector multiplications? If we permit more general information about A, we might lower the complexity from compv or change our conclusion of what positive definiteness is worth. (As we remarked in Section 2.5, the complexity depends on the class of permissible information operators.) We consider two generalizations of information consisting of matrix-vector multiplications. First assume that inner products aTv can be computed, where a T denotes the ith row of A and v is a column vector. This is clearly a generalization of the information we considered above. Such inner product information was studied by Rabin [17] for the exact solution of
linear systems, 9 = 0, and for an arbitrary matrix. He obtained the surprising result that about n2/2 inner products are sufficient to find the exact solution. Nothing is known if 9 > 0.
Question 5. H o w many inner products are needed for certain classes of matrices for an e-approximation, ~ > O? Question 6. What is the complexity for certain classes of matrices if inner-product information is used? Next we permit still more general information. Suppose we compute L,(A,b), where L, denotes an arbitrary linear functional on A and b. Consider information N = [L1. . . . .
Lk],
w h e r e the choice of L, may d e p e n d arbitrarily on LI(A,b) . . . . . L,_I(A,b). We believe the following questions are difficult.
Question 7. H o w m a n y linear functionals L, are needed for certain classes of matrices for an e-approximation? Question 8. What is the complexity for certain classes of matrices if arbitrary linear functional information is used?
5. IBC and Combinatorial Complexity We compare and contrast IBC and combinatorial complexity. Our discussion will be rather brief and limited to some of the major similarities and differences. At the end of this Section we will briefly consider a complexity theory recently introduced by Blum, Shub, and Smale [3]. We first point out two characteristics that all complexity theories share. 9 There is always a model of computation that states what operations are permitted and h o w much they cost. 9 The complexity is defined as the minimal cost of performing a computation. IBC and combinatorial complexity differ in two significant ways: 9 The model of computation. 9 Assumptions concerning the available information. We discuss these in turn. IBC uses the real-number model characterized by real numbers with unit cost for each operation. Combinatorial complexity usually uses the bit model charTHE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
41
Combinational Complexity
Blum-Shub-Smale Complexity
IBC
Model o f Computation
Bit Model
Real-Number Model
Real-Number Model
Information
Complete Exact
Complete Exact
Partial Contaminated
Figure 1. acterized by n-bit integers With the cost of operations proportional to the length of operands. We emphasize that IBC can use models other than the real-number model, but the work to date has been done mostly for the real-number model. An interesting exception is the work of Bojaficzyk [18], who analyzes linear systems in variable precision using the parallel model of computation. The rationale for the real-number model is that fixed-precision floating-point numbers are almost universally used for numerical calculations whether they occur in science, engineering, or economics. The cost of floating-point operations is independent of the size of operands. Complexity results are essentially the same as in the real-number model provided optimal algorithms are stable. The real-number model is used in IBC to decouple complexity from round-off issues. The numerical stability of optimal algorithms requires further study; see [1] for some known results. The real-number model is also used in algebraic complexity for such problems as polynomial evaluations and matrix multiplication; see [19, 20, 21, 22, 23]. We stress that the complexity of a problem can be totally different in real-number and bit models. Traub and Wo~niakowski [24] discuss this in the context of the complexity of linear programming. Even though Khachian's algorithm shows that linear programming has polynomial complexity in the bit integer model, they conjecture that this problem is intractable in the real-number model. We now consider assumptions concerning the available information. To motivate our discussion we use a typical problem of combinatorial complexity, the traveling-salesman problem (TSP), which is formulated as follows. One is given a set of n cities together with the distances between all pairs of cities. A trip must be made that visits each city exactly once and returns to the starting point. The problem is to find a tour, essentially an ordering of the n cities, that minimizes the total distance traveled. Since the distances between the cities are given, the 42
THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
information for TSP is complete. Furthermore, it is usually assumed that the distances are given as integers and that this information is exact. Since the in- 9 formation is complete and exact TSP can, in principle, be solved exactly. To summarize, IBC assumes information is partial and contaminated. Combinatorial complexity makes the opposite assumptions: information is complete and exact. The difference between combinatorial complexity and IBC can be succinctly summarized as follows: Combinatorial Complexity: Given f, compute S(f), IBC: Given N(f), approximate S(f). In closing we briefly discuss the complexity theory recently introduced by Blum, Shub, and Smale [3]. They adopt the real-number model. Information is assumed to be complete and exact. Finite-dimensional continuous problems are considered. In particular, they show that the problem of determining whether a system of n real polynomials of degree at most four has a real zero is NP-complete in the real-number model. Figure 1 summarizes the assumptions regarding model of computation and information in combinatorial complexity, Blum-Shub-Smale complexity, and IBC.
Acknowledgements: We are grateful to T. Boult, M. Kon, G. W. Wasilkowski, and A. G. Werschulz for their comments on this paper.
References 1. J. F. Traub, G. W. Wasilkowski and H. Wo~niakowski, Information-Based Complexzty, New York: Academic Press (1988). 2. E. W. Packel and H. Wo~niakowski, Recent developments in information-based complexity, Bull. Amer. Math. Soc. (2) 17 (1987), 9-36.
3. L. Blum, M. Shub and S. Smale, On a theory of computation and complexity over the real numbers: NP-completeness, recursive functions and universal machines, Bull. Amer. Math. Soc. (2) 21 (1989), 1-46. 4. J. F. Traub and H. Wo2niakowski, A General Theory of Optimal Algorzthms, New York: Academic Press (1980). 5. G. W. Wasilkowski, A clock synchronization problem with random delays, J. Complexity 5 (1989), 1-11. 6. M. R. Garey and D. S. Johnson, Computers and Intractabd~ty: A Guide to the Theory of NP-Completeness, New York: W. H. Freeman and Company (1979). 7. M. O. Rabin, Probabilistic algorithms, Algorithms and Complexity: New Directwns and Recent Results, (J. F. Traub, ed.) New York: Academic Press (1976), pp. 21-39. 8. N. S. Bakhvalov, On approximate computation of integrals (in Russian), Vestnik MGV, Ser. Math. Mech. Astron. 'Phys. Chem. 4 (1959), 3-18. 9. A. Papageorgiou, Average Case Complexity Bounds for Continuous Problems, Ph.D. Thesis, Dept. of Computer Science, Columbia University, 1989. 10. H. Wo2niakowski, Average-case complexity of multivariate integration, to appear in Bull. Amer. Math. Soc 11. M. A. Kowalski and W. Sielski, Approximation of smooth periodic functions in several variables, J. Complexity 4 (1988), 356-372. 12. S. Graf, E. Novak and A. Papageorgiou, Bisection is not optimal on the average, Numer. Math. 55 (1989), 481-491. 13. T. Jackowski, Compleraty of multilinear problems in the average case setting, J. Complexity (to appear). 14. A. S. Nemirovsky and D. B. Yudin, Problem Complexzty
15. 16. 17.
18. 19. 20. 21. 22. 23. 24.
and Method Efficzency m Optimization, New York: WileyInterscience (1983). J. F. Traub and H. Wo~niakowski, On the optimal solution of large linear systems, J. Assoc. Comput. Mach. 31 (1984), 545-559. A. W. Chou, On the optimality of Krylov information, J. Complexity 3 (1987), 26-40. M. O. Rabin, Solving linear equations by means of scalar products, Complexlty of Computer Computations, (R. E. Miller and J. W. Thatcher, eds.) New York: Plenum (1972), 11-20. A. Bojaficzyk, Complexity of solving linear systems in different models of computation, SIAM J. Comput. 21 (1984), 591-603. A. M. Ostrowski, On two problems in abstract algebra connected with Homer's rule, Studies Presented to R. von Mises, New York: Academic Press (1954), 40-48. V. Ya. Pan, On methods of computing polynomial values, Russian Mathematical Surveys 21 (1966), 105-137. V. Strassen, Gaussian elimination is not opbmal, Numer. Math. 13 (1969), 354-356. V. Pan, How to multiply matrices fast, Lecture Notes in Computer Sczence, vol. 179. Berlin: Springer-Verlag (1984). D. Coppersmith and S. Winograd, Matrix multiplication via arithmetic progression, Proceedings of the Nineteenth ACM Symp. on Theor. of Comp., New York (1987), 1-6. J. F. Traub and H. Wo~.niakowski, Complexity of Linear Programming, Oper. Res. Lett. 1 (1982), 59-61.
J. F. Traub and H. Wo2makowski Department of Computer Science Columbia Unwerszty New York, NY 10027 USA
H. Wo~makowski Institute of Informatics University of Warsaw Warsaw, Poland
9000000 9 0 We need your new address so that you do not miss any issues of The Mathematical InteUigencer. Please fill out the f o r m below and send it to:
".. ..*. 0000
Springer-Verlag New York, Inc. Journal Fulfillment Services 44 Hartz Way Secaucus, NJ 07094
: -
Old Address (or label) Name
Address 00 0000 9
O 000000
City/State/Zip
New Address N~ne
Please give us six weeks notice.
Address
City/State/Ztp
THE MATHEMATICALINTELLIGENCERVOL 13, NO 2, 1991 43
Ian Stewart* The catapult that Archimedes built, the gambling-houses that Descartes frequented in his &ssolute youth, the field where Galois fought his duel, the bridge where Hamdton carved quaternions--not all of these monuments to mathematzcal hzstory survwe today, but the mathematzczan on vacatzon can stall find many reminders of our subject's glomous and inglorious past: statues, plaques, graves, the cafd where the famous conjecture was made, the desk where the
famous initials are scratched, bzrthplaces, houses, memorials. Does your hometown have a mathematzcal tourist attraction? Have you encountered a mathematical sight on your travels? If so, we invite you to submit to this column a picture, a descrzption of its mathematical significance, and either a map or &rectzons so that others may follow m your tracks. Please send all submtsszons to the European Editor, Ian Stewart.
The Cloisters of Hauterive Benno Artmann One of the most typical elements of Gothic architecture is the tracery found in windows, on walls, and in many other places in Gothic churches. What is mathematical about it? Tracery is exclusively constructed from circular arcs (and straight line segments)! It is the most mathematical kind of art known to me. In many of the t h o u s a n d s of Gothic c h u r c h e s a n d other buildings of that time surviving in Europe you can find nice examples, take photos, and analyse them geometrically at home. In what follows I will first give a short introduction to tracery (German: MaJdwerk, French: r~seau) and then direct your attention to one mathematically outstanding example. General Remarks about Tracery
The Gothic style originates from France, more precisely from the parts of France close to Paris, from about 1150. Tracery, however, first appears in the 1220s in Reims, so that for instance you will find no tracery windows in the older parts of the cathedral of Chartres. G~inther Binding [1] defines three principal periods of stylistic development: (i) 1220-1270 High Gothic (ii) 1270-1360/80 Radiant (iii) 1350-1520 Flamboyant Obviously, the years are to be taken approximately; in various parts of Europe the architecture of the same year may be very different. * Column Editor's address: Mathematics Institute, University of Warwick, Coventry CV4 7AL England. 44
The first tracery windows were built by the architect Jean d'Orbais in the cathedral of Reims during the years 1211-1221 (Figure 1). Their construction is based on the equilateral triangle as shown in Figure 2. An iteration and variation of the first construction can be seen in the north w i n d o w of the church of Haina, Figure 3. (The former Cistercian monastery of Haina is located about 30 kilometers north of Marburg, Germany.) The two smaller parts of this window are clearly repetitions of the first Reims tracery. The upper part can be found in Reims as well as the whole composition in St. Denis ([1], p. 47, 51). From the large upper circle we are able to learn how architects free themselves from the overly strict rules of geometry in favor of aesthetic considerations. H o w is that part of the window constructed? Start with the two small pointed arcs, which are constructed from equilateral triangles. The points of abutment of these arcs are A, B, C. N o w the architect selects the point M - - o r , equivalently, the radius r of the great circle. The size of this circle is determined by the architect's artistic judgment. The distance AM will then be a + r, because the arc and circle are in contact; see Figure 4. For the completion of the window the architect needs the points X, Y of abutment for the great arc such that XYZ is an equilateral triangle and the circle about M is touched by the arcs. The distance from Y to the point of contact T must be 2a, hence the distance from M to X (or Y) should be 2a - r, and X, Y can be found. O b s e r v e that, in the real w i n d o w , the architect marks A, B, and C but conceals X and Y. The most important mathematical tools in t h i s - and all other constructions of traceries--are circles in
THE MATHEMATICAL INTELLIGENCERVOL 13, NO 2 9 1991 Spnnger-Verlag New York
Figure 1. Reims 1211-1221 (from [1], page 46).
Figure 2. C o n s t r u c t i o n Figure 3. H a i n a , north Figure 4. Construction of based on the equilateral tri- window, dating from 1250 window in Figure 3. (from [9], Tafel CXXIV). angle.
contact (and the division of a circle into equal parts). Sometimes a little more has to be known. Readers may amuse themselves by constructing a so-called 8-foil as in Figure 5 or by finding M and r in Figure 6, where ABC is again an equilateral triangle. The design of traceries became rapidly more complicated. Figure 7 shows an example from Strasbourg about 1285. It is basically an iteration of equilateral tria n g l e s - r e c o n s t r u c t it yourself! Some eighty years after the north window the great west w i n d o w of Haina was built (Figure 8, about 1330). Observe the w a v y pentagram consisting of (curved) equilateral triangles and the division of the great circle into fifteen parts. One final example, Figure 9, shows the different stylistic means of the late Gothic times in Germany. T h e Role of G e o m e t r y i n G o t h i c A r c h i t e c t u r e
In their fundamental work on early Gothic architecture Kimpel and Suckale ([6], p. 28) say "tracery is the specially favored medium of the Gothic 'love of geometry.' One of the main pursuits of architects up to the sixteenth century was to decorate a building with a profusion of variants and to invent new ones. Tracery is that part of the Gothic style that is most distant from
Figure 7. Strasbourg, north window of the cathedral dating from 1285 (from [1], page 246).
C
A
Figure 5. An e i g h t - f o i l drawing.
B
Figure 6. Find M and r.
the anthropomorphic architecture of antiquity. To put it positively: It is one of the few creations of ornaments in Europe that doesn't owe anything to antiquity." The mastery of geometry raised Gothic architects above the artisans and m a d e t h e m - - i n medieval terms--scientists. In fact the stonemasons considered geometry an essential part of their trade. Figure 10 shows a late medieval woodcut associating geometry with stonemasonry. We have seen a little of the freedom and the constraints of the geometric methods for the construction of t r a c e r i e s in the e x a m p l e f r o m H a i n a ( n o r t h window). More elaborate designs follow the same
Figure 8. Haina, west window dating from 1330 (from [9], Tafel CXXV).
Figure 9. Esslingen, south window dating from 1410 (from [1], page 331).
THE MATHEMATICAL INTELLIGENCER VOL 13, N O 2, 1991 4 5
Figure 10. Geometry and stonemasonry (from [4], page 58). rules: Every detail has to be geometrically constructed, except for the smallest ornamental pieces or things such as the profiles of the mullions. In m o d e m terms we could clearly speak of geometrical or "concrete" art long before the "Neo-Geo" of the twentieth century. Once one has seen a specimen of tracery, it is in most cases relatively easy to reconstruct it and hence "understand" such a window to a much higher degree than other works of art. But always remember that geometry is the servant and not the master of the architect. The geometric methods were easily understood and adopted by the nineteenth century architects who created so many "neo-Gothic" churches in Europe and America. One of their textbooks was Ungewitter [9]. H o w e v e r elaborate the construction of a tracery window might be, it was not an exercise in mathematical geometry as we know it. Just as little was it geom-
etry in the sense of the contemporary mathematicians such as Leonardo Fibonacci ([6], p. 45). Euclid was known in Latin translation since about 1150, but long before that people learned geometry from the so-called pseudo-Boethius [2], which in its essential parts is a b o i l e d - d o w n version of the first books of Euclid without proofs. The first proposition in Euclid's first b o o k is the construction of an equilateral triangle using the arcs that we see so frequently in Gothic traceries. The use of circles in contact and subdivisions of a circle into equal parts was the principal method of construction. The traceries became more and more complicated, but geometric methods remained the same. Essentially the same phenomenon has been observed by Jens Hoyrup [5] in Babylonian mathematics: A fixed method (in that case a way to solve quadratic problems) remains in constant use, and its school of practitioners adds more and more complicated examples. H o y r u p calls this "subscientific m a t h e matics"; it is the trademark of Babylonian schools of scribes as well as of medieval guilds of masons. I believe the concept of "subscientific mathematics" is most suitable to describe what we see in Gothic architecture. A fixed supply of elementary geometry was used for traceries and for other problems in the plan of a Gothic building as well. It is of no use to speculate further about constructions according to the golden section and the like. K. Hecht [3] has rejected dozens of theories about "secret methods" of medieval architects by confronting them with the actual buildings. (If you measure enough distances and allow generously for tolerances, you will find the golden section in any old building.) We have some Gothic design booklets from about 1500, edited by Shelby [7]. A careful analysis by Hecht ([3], p. 171-201) confirms what was said above: They are as "subscientific" as everything else. T h e Cloisters of Hauterive
Figure 11. Ground plan of the cloisters of Hauterive (from [10], page 132). 46
THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
The tracery windows in the cloisters of Hauterive are quite different from the traceries we have discussed so far. T h e y are not only c o n s t r u c t e d by geometric methods, but in fact they show geometry itself. The Cistercian monastery of Hauterive is situated on the banks of the river Saane (Saarine) some ten kilometers south of Fribourg in Switzerland. It is still inh a b i t e d b y Cistercian m o n k s . The History of its buildings is presented in detail by Catherine WaeberAntiglio [10]. The cloisters and the choir of the church were built in the years 1320-1328. Some reconstruction took place around 1910, but what you see in Hauterive are essentially the original medieval windows. The west, north, and east parts of the cloisters are in their original condition. The south part was taken a w a y in the e i g h t e e n t h c e n t u r y and t w o of its windows have been placed on the west and east side (numbers I and XIX). On the ground plan (Figure 11)
Church
you can see the foundations of the old south wing. In each bay of the cloisters (except for the ones in the c o r n e r s ) w e f i n d the s a m e c o m p o s i t i o n of windows: three small ones, separated by double columns and round arches, looking almost Romanesque, on the lower level and above t h e m either pointed or r o u n d tracery windows. The pointed windows have elaborate and delicate traceries, but they are not what we are looking for. The round ones have the geometrical motifs. Before giving a mathematical analysis, let us look at the cloisters as a whole. In the north wing we see only round arches; the east and west wings have alternating pointed and round arches. Windows 1 and XIX come from the old south wing ([10], p. 1360. The great window in the choir of the church was built in the same period and clearly shows the same style as the cloisters. Waeber-Antiglio ([10], p. 123-178) gives a detailed stylistic and historical analysis of the cloisters and the choir of Hauterive, placing them firmly in context with their neighbors in space and time and especially with the building tradition of Cistercian monasteries. On page 178 she points out the cloisters" specific originality: by using a very conservative (for the time) design of the triplets below and placing above them the most modern traceries, the cloisters of Hauterive are uniquely distinguished. Spahr ([8], p. 23) says that the cloisters are among the finest preserved north of the Alps. Neither Waeber-Antiglio nor Spahr mention the m a t h e m a t i c a l significance of the d e s i g n s of the
48
THE MATHEMATICALINTELLIGENCERVOL 13, NO 2, 1991
windows, which we now discuss. First observe that the architect stresses interest in the circle by selecting round arches. He approaches this topic systematically by subdividing the circle in three (window II), four (IX), five (VI), six (XI) equal parts. With little ornamental subdesigns the architect tries to alleviate the dry geometric diagrams, sometimes successfully, as in IX, sometimes less convincingly as in XI. Five-fold symmetry is repeated by the pentagram in VIII and again in the beautiful and ingeniously constructed rose in window IV. The subject of triangles is taken up and iterated in window XII. Observe that II has the sequence 3-9-27 built into it. Three- and six-fold symmetries are combined in window XVI, which has a very simple construction once you view the circle as the incircle of an equilateral triangle. Four and eight are combined in XIV. Window X tries to have it its own way: In spite of the supplied round arch, we see no circle but two squares with conventional pointed arcs above them. A second view reveals the role of this window: Its bilateral symmetry stresses its position in the middle of the cloisters right across from the (former) f o u n t a i n chapel. It is quite understandable in the overall composition of the cloisters. So far I have described what we see. Is there an interpretation that gives a general and coherent explanation of the designs? I offer this one: The architect was interested in theoretical geometry, especially in Eu-
clid's theory of the subdivision of a circle as presented in the Elements, Book IV. In that Book Euclid treats, in a rather Bourbakist fashion, the construction of the regular n-gons, more specifically the inscription of regular 3-, 4-, 5-, 6-, 15-gons into given circles. I understand the cloisters of Hauterive as a commentary to Euclid Book IV, carved out of stone. Can I maintain this interpretation in the case of window XVIII with its implicit 9-gon? First observe the five-foil in the center and the three bars of an equilateral triangle in the surrounding annulus. The combination of these two designs would result in the regular 15-gon as constructed by Euclid. By subdividing each of the three parts of the annulus again into three equal parts the architect goes beyond Euclid: The construction of the regular 9-gon by ruler and compass is impossible. I believe that in this particular case esthetic considerations o v e r w h e l m e d mathematics: fifteen little ornamental triangles would have required a very narrow annulus and a much greater five-foil with a rather empty center. As it is, this window is one of the most beautiful of the cloisters. Go to Hauterive and judge for yourself: Is it about mathematics or not? In any case you will enjoy it.
References 1. G~nther Binding, MaJJwerk, Darmstadt: Wiss. Buchgesellschaft (1989). [History and style of traceries in
2. 3.
4. 5. 6. 7.
8.
9.
10.
France, England and Germany. Over 400 pictures, up-to-date literature.] Menso Folkerts, "'Boethms'" Geometrie II, Wiesbaden: F. Steiner (1970). [Edition of the principal medieval textbook on geometry before Euclid was translated.] Konrad Hecht, MaJJ und Zahl in der gotischen Baukunst, Hildesheim: Olms (1979). [Discusses dozens of theories about geometrical secrets in the construction of Gothic buildings and finds no positive evidence.] "~" M. Heidelberger, and S. Thiessen, Natur und Erfahrung, Reinbeck bei Hamburg: Rowohlt (1981). Jens Hoyrup, Sub-scientific mathematics, History of Science 28, part 1, no. 79 (1990), 63-87. D. Kimpel, and R. Suckale, Die gotische Archltektur in Frankreich 1130-1270, Munchen: Hirmer (1985). [Authoritative work. Excellent pictures.] Lon R. Shelby, (ed.), Gothic design techniques: The fifteenth-century design booklets of Mathes Roriczer and Hans Schmuttermayer, Carbondale: Southern Illinois Univ. Press (1977). Columban Spahr, O. Cist., Hautemve, M/inchen und Ziirich: Schnell & Steiner (1984). [A small booklet containing all basic information about Hauterive, which you will get for a few francs in Hauterive.] G. Ungewitter, Lehrbuch der gotischen Konstruktionen, 2 volumes, Leipzig (1890-1892). [If you want to build a Gothic church, you will find complete instructions in this book.] Catherine Waeber-Antiglio, Hauterive: La Construction d' une abbaye Cistercienneau moyen dge, Fribourg: Ed. Universitaires (1976). [Everything about the buildings and sculptures of Hauterive.]
Fachbereich Mathematik Technische Hochschule Darmstadt 6100 Darmstadt, Federal Republic of Germany
THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
49
The Cinderella Career of Projective Geometry Abe Shenitzer
To have "'class,'" an essay must have a motto. The best motto for this paper is: Z u s a m m e n g e s t o h l e n yon Verschiedenem diesem u n d jenem (roughly: a motley collection of stolen items). Since I tend to rely on secondary sources, I stole this motto from H. M. Schey's Div, Grad, and all that. (Schey attributes this statement to Ludwig van Beethoven; I do not know where Schey found it.) The paper is a flowchart of the devious route by which projective geometry moved from its humble Greek origins to its apotheosis by Arthur Cayley and beyond. The term "beyond" refers to Felix Klein"s contribution that culminated in the Erlangen Program. Klein"s contribution not only filled gaps in Cayley's work but also--and this is crucial--supplied a much-needed corrective to Cayley's potentially misleading dictum that "'projective geometry is all geometry.'" Projective geometry is important as the source (in the sense of Klein's Erlangen Program) of many classical geometries and as the preliminary stage of the study of algebraic geometry. The paper is meant for readers who know---or knew---the basics of projective geometry and are interested in the historical aspects of its evolution. It opens with an Introduction that describes some of the conceptual basics of projective geometry. A reader who has forgotten the meaning of some basic terms will find them in the Appendix, which is followed by a Bibliographical Note. I wish to thank Benno Artmann, Donald Coxeter, John Fauvel, and John Stillwell for encouraging remarks and helpful suggestions. Of course, I am solely responsible for all remaining flaws.
to the existence of parallel l i n e s - - vanish. In some cases (such as the theorem of Bezout) the validity of a result requires the setting of the complex projective plane. One of the many useful models of the projective plane is based upon lines and planes in R 3 passing through the origin. A point of the projective plane is a line in R 3 containing the origin and a line of the projective plane is a plane in R 3 containing the origin. This being so, a projective point can be thought of as a class of triples of direction numbers (p,q,r) of a (Euclidean) line in R 3 containing the origin; in particular, (p,q,0) and its nonzero multiples is a point at infinity. Because a Euclidean plane is determined by its normal, the analytic counterpart of a projective line is also a class of triples of direction numbers [p,q,r]; in particular, the class determined by [0,0,w] is the line at infinity. At least one number in a triple must be different from zero. The line [u,v,w] is said to be on the point (p,q,r) if and only if up + vq + wr = O. The same equation
The Euclidean plane lacks symmetry: two points always determine a unique line but two lines don't always determine a unique point. This is remedied by adding to each pencil of parallel lines in the plane a "point at infinity." All points at infinity form the "line at infinity." The n e w plane is called the projective plane. In this setting, some theorems of Euclidean and affine g e o m e t r y are eliminated, while others achieve great simplicity: the many special c a s e s - - d u e 50
THE MATHEMATICALINTELLIGENCERVOL 13, NO. 2 O 1991 Spnnger-Verlag New York
can also be taken to mean that the point (p,q,r) is on the line [u,v,w]. The symmetry of the roles of points and lines in the projective plane is obvious. One could guess that an axiomatic description of the projective plane would be characterized by duality, in the sense that interchanging "point" and "line" in an axiom would yield another axiom. A similar statement applies to theorems. If the number triples in our model of the real projective plane are complex rather than real numbers, then we obtain a model of the complex projective plane. (This is a hint and not an adequate description.) Some of the issues dealt with below will make the transition from real to complex projective geometry absolutely necessary. For us, the most important objects in the projective plane will be lines and conics. They are given by homogeneous polynomials of degree 1 and 2 respectively. (The polynomials are homogeneous because the coordinates are homogeneous.) The C o n c e p t u a l E v o l u t i o n of Projective G e o m e t r y The origins of affine and projective transformations go back to Greek antiquity. ApoUonius's work on conics relies heavily on affine transformations. More specifically, Apollonius obtained many of his results on ellipses by viewing them as affine images of circles (the term "affinity" of figures is due to Euler). In particular, he knew of the invariance of the simple ratio AB/BC of three collinear points under parallel projections. We know from Pappus (third century A.D.; the discoverer of Pappus's Theorem, one of the fundamental theorems of projective geometry) that Greek geometers were aware of the invariance of the cross ratio (A,B,C,D) = (AC/CB)/(AD/DB) of four collinear points under central projection and that they used it to compute the dimensions of figures from their perspective images. They (in particular, Apollonius) also knew that the collinear tetrads A,B,X,Y depicted in Figure 1 are harmonic, that is, their cross ratio (A,B,X,Y) is - 1 (note our use of directed lengths). In spite of the endless talk about the contributions of Renaissance painters to the study of perspective, these contributions come down to empirical rules, and, reg a r d l e s s of their u s e f u l n e s s , are n o t p r o j e c t i v e theorems. The first clear and unmistakable connection between perspective representation and the study of projective properties of figures as a distinct part of Euclidean geometry is found in the work of G6rard Desargues (1593-1662). His famous Brouillon Project appeared in 1639. Among other things, Desargues created and applied the theory of poles and polars (see Appendix) and made systematic use of points and
///~] ///X~ N
N
N
\ ~
Figure 1. Example of a harmonic tetrad A,B,X,Y (associated with a polarity induced by a conic).
lines at infinity. The great creative follower of Desargues's ideas (ideas presented in a less than crystalclear manner) was Blaise Pascal (1623-1662). Another champion of his ideas was P. de la Hire (1640-1718). A number of relevant individual results, and indeed a profound projective conceptualization of cubic curves, are due to Isaac Newton. The seventeenth-century flowering of the study of projective properties of figures was shortlived. This was due not so much to Desargues's use of new and unusual terms and his limited expository skill as to the spectacular successes of Descartes's method of coordinates. The negative uniformizing effect of the method of coordinates on the conceptual evolution of geometry is described by N. A. Glagolev in these words. From the very beginning, the study of projective properties of figures invariably involved the use of ad hoc synthetic methods. Up to the time of Descartes the synthetic method was the sole method of solution of geometric problems. Descartes's method of coordinates placed at the disposal of geometry the powerful apparatus of algebraic analysis. Its use made possible quick and simple solutions of problems that were either unsolvable by synthetic methods or, if so solvable, required the use of many awkward constructions. The method of coordinates systematized the study of geometric forms and opened to geometry prospects undreamt of in the earlier era of synthetic investigations. At the same time, however, Descartes's analytic method erased all differences of distract geometric propositions and properties by impressing on them its particular analytic characteristic. That is why, while granting its power, the most profound geometers saw in the method of coordinates, from its very inception, a very great danger to geometry. The analytic method posed the twin danger of replacing geometry with algebraic analysis and of blocking the direct perception of "live" properties of real physical space. Newton was the first to point out this danger and consciously favored the synthetic method in geometry over Descartes's analytic method. Newton called a solution of a geometric problem obtained by analytic means "not a solution but a mere computation," and regarded it as incomplete. But there was nothing he could oppose the analytic method with. The synthetic method had no al-
THEMATHEMATICAL INTELLIGENCERVOL13.NO2,1991 51
Klein's contribution not only filled gaps in Cayley's work but alscr--and this is crucialm supplied a much-needed corrective to Cayley's potentially misleading dictum that "'projective geometry is all geometry.'" gorithmic apparatus with which to counter the method of coordinates, a method which developed ever more rapidly and found an ever growing number of applications. Newton's geometric genius notwithstanding, the synthetic method of ancient Greece was no match for the new geometric thought equipped with the powerful investigative tool of the algorithms of algebra. Synthetic methods began to die out. They were used less and less and were eventually forgotten. The analytic method became not only the dominant but also the only method of solution of complex geometric problems. Practical uses of projections in the late 18th century led to a revival of interest in (synthetic) geometry. From this it was a short step to the resurrection of projective geometry. In 1822 Victor Poncelet (1788-1867) published his Treatise on projective properties of figures. In it he defined a projective property of a (plane) figure as a property unchanged by central projection. In this way he created a geometric discipline within Euclidean geometry. His methods were synthetic. How far Poncelet was from the modern notions of projective geometry as a distinct geometry in the sense of Klein, or in David Hilbert's sense of a discipline based on a set of axioms, is indicated by his division of projective properties into graphic properties (exemplified by the incidence of points and lines) and metric properties (exemplified by harmonic conjugacy of point pairs (see Appendix)). Poncelet gave the first examples of projective transformations (= products of projections) such as homologies and polarities (see Appendix). The latter transformation suggested to him the principle of duality--a term due to Joseph Gergonne (1771-1859), who came to this insight by following a line of thought close to an axiomatic approach to projective phenomena. Of the other concepts in Poncelet's Treatise we mention circular points at infinity. In terms of homogeneous coordinates in the (projective) plane they are the triples (1,i,0) and'(1,-i,0). (These points lie on a conicax 2 + bxy + cxz + dyz + fy2 + gz 2 = 0iffb = 0 and a = f.) The fuss we make about Poncelet's introduction of these points is amply justified. For one thing, their use initiated the use of complex numbers in geometry. For another, they were used by Edmond Laguerre (in 1853, in the paper " O n the theory of foci") to give a projective definition of the (Euclidean) angle between two lines (see below). The study of projective properties of figures was greatly advanced by Jakob Steiner (1796-1863) in his 1832 work Systematic development of the mutual depen52 THE MATHEMATICALINTELLIGENCERVOL 13, NO 2, 1991
dence of geometric forms. A remarkable illustration of Steiner's approach is his projective definition of a conic as the set of points of intersection of two projectively--not p e r s p e c t i v e l y - - r e l a t e d pencils of lines (see Appendix). This endowed the study of projective properties of conics with a measure of autonomy. Steiner's methods were synthetic throughout. A towering figure in our story is Christian von Staudt (1798-1867). Whereas the contributions of Poncelet, Steiner, and their contemporaries made the study of projective properties of figures a discipline with a rich and clearly described content, its dependence on metric (Euclidean) geometry was obvious, in as much as its key concepts--the cross ratios of four collinear points and four concurrent lines--were tied to lengths of segments and sines of angles respectively (see Appendix). Staudt was the first to p o s e - - a n d essentially solve--the problem of making the study of projective properties of figures independent of Euclidean metric relations. 1 Staudt realized this very important methodological goal in The geometry of position (1847) and in Contributions to the geometry of position (1856, 1857, 1860). Staudt defined harmonic tetrads by means of a complete quadrangle (see Appendix) and a complete quadrilateral ( = the dual of a complete quadrangle). He defined projective mappings as mappings preserving harmonic tetrads. He gave a synthetic prescription for generating a net of rationality 2 on a line using repeatedly the construction of a fourth harmonic to three given points on the line. The absence of a suitable axiomatic basis for the reals prevented him from proving with all necessary rigor that his net of rationality was everywhere dense on the line and could therefore be e x t e n d e d - - b y using limits--to a projective coordinate system on the (projective) line. His scheme for a projective coordinatization of a line could readily be ext e n d e d to the (projective) plane and to (projective) space. Staudt was the first to outline an axiomatic basis for projective geometry. He must also be credited with the creation of complex projective geometry and with having anticipated Fano in the discovery of finite projective planes. 3 The study of projective properties of figures--including properties called "metric" by Poncelet (see a b o v e ) - - h a d by now become an autonomous disci-
1 Staudt did not elilmnate parallehsm from his development. 2 Start with three collinear points P,A,B. They gave rise to three n e w points D1,D2,D3 such that D1,P,A,B; D2,A,P,B; D3,B,P,A are harmonic tetrads. These szx points are split into groups of three, each of w h i c h gaves three n e w points as before, and so on. The resulting s y s t e m of p o i n t s can be " u s e f u l l y " labelled w i t h t h e r a t i o n a l n u m b e r s N o w hmlts are used to coordmataze the remaining points 3 In the latter connectaon see H S. M. Coxeter, Prolectwe Geometry (2nd ed.), Blalsdell, 91-92
pline c o n c e p t u a l l y i n d e p e n d e n t of its E u c l i d e a n parent. The next phase was the subordination of Euclidean geometry (and of other geometries) to projective geometry. This phase began with the remarkable discovery of Laguerre that 0, the Euclidean measure of the angle b e t w e e n lines u~ a n d u2 w i t h respective slopes k: and k2 intersecting at a point (xo,Yo) is given by i
,
.
i
0 = ~ log(ul,u2,h,]2) = ~ log(kl,k2,i,- i), w h e r e jl a n d j2 are the isotropic lines y - Y0 = +_ i(x - Xo) passing t h r o u g h (x0,Y0) (see Appendix). Stimulated by Laguerre's remarkable definition and by the n e w analytic m e t h o d s developed earlier (in the n i n e t e e n t h century) by A u g u s t M6bius a n d Julius PlUcker, Cayley (1821-1895) (who along with James Sylvester had t h o r o u g h l y investigated the question of algebraic invariants of forms under linear substitions) u s e d analytic m e t h o d s to give projective definitions of angle a n d distance tied to quadratic forms. This made projective g e o m e t r y - - p r e s u m a b l y a s t u d y of incidence p r o p e r t i e s - - t h e source of a host of metric geometries, including Euclidean, hyperbolic, and elliptic geometry. 4 A minimal sketch of Cayley's approach follows. Cayley starts with an imaginary absolute--a pair of c o n j u g a t e i m a g i n a r y points d e t e r m i n e d o n an ext e n d e d Euclidean line by a quadratic form f~(x,y) -= ax 2 + 2bxy + cy2 with ac - b2 > 0. This imaginary absolute is obtained by putting f~ = 0. It is the starting point of a coordinate system on the line, constructed by repeated use of harmonic tetrads. If the (homogeneous) coordinates of two real points are (x~,y~) and (x~,y~), then ~.
t
i
t
~*
p
!
t
0 < [ fl(x:,yl,x2,Y2) / X/fl(xl,yOf~(x2,Y2)[ < 1. s This justifies Cayley's defining the distance 0 between the two points by means of the equation r
cos 0 = J =
p.
!
•(xl"Yl"X2'Y2) _
, 0 ~ 0 ~ ~.
(1)
x/a(x:,y:)a(x:,y~)
0 is additive and its values for coincident points are 0 and ~. Quite generally, this distance is two-valued: if 0 is one of its values t h e n ~r - 0 is the other. We speak of an elliptic metric and of an elliptic line a n d say that the elliptic line has finite length. Cayley considered the case of coincidence of the points of the absolute and was led to the usual Eu4 The idea was Cayley's (see his "A sixth memoir upon quantlcs" of 1859). Its full utilization was due to Klein (1849-1925, see his "On the so-called non-Euchdean geometry" of 1871) s The numerator is the bihnear form assooated wRh the quadrahc form 1~
In spite of the endless talk about the contributions of Renaissance painters to the study of perspective, these contributions come down to empirical rules, and, regardless of their usefulness, are not projective theorems. clidean metric on the line. He did not fully u n d e r s t a n d hyperbolic geometry and so did not consider the case of a real two-point absolute. The latter leads to a hyperbolic metric p on the line, where cosh p = J, with j2 > 1. Cayley w e n t b e y o n d the question of metrics on a line. We quote from Dubnov's essay [3]. Cayley noted that the same arguments and conclusions are applicable to a plane pencil as well as to a pencil of planes, that is, to a so-called one-dimensional form. However, Cayley did not specify the modifications demanded by such an extension of his design. Cayley also extended his ideas to two-dimensional forms--the points of the plane and the lines of the plane. To this end he introduced a conic as the absolute of a two-dimensional metric. A line intersects the conic in two points which form the absolute for measuring distances on that line. Cayley indicated how to coordinate the units of length on different lines. Similarly, we can draw two tangents from a point to the conic which plays the role of the absolute. They serve as the absolute for the metric (the measure of angles) on the pencil of lines with center at that point. Cayley also provided for the case when the two points of intersection of a line and the absolute coincide and for the case when the two tangents (the absolute for an angle measure) coincide. In sum, we can say that Cayley showed that it is possible to extend the concepts of distance between points and angles between intersecting lines. There is not the slightest doubt that, essentially, all of Klein's constructs are to be found in Cayley's remarkable paper. But there is no doubt that Cayley's conclusions are not sufficiently precise. Their main defect is that Cayley did not examine the question of when the principal relation (1) assigns a real or complex value to the distance between points, and when the corresponding scheme assigns a real or complex value to the angle between intersecting lines. This question is of fundamental importance in that its analysis leads to different geometric systems. It is clear that Cayley's scheme leads to three socalled Cayley-Klein geometries (projective metrics) on the line, to 32 = 9 such g e o m e t r i e s in the plane, 33 = 27 such geometries in 3-space, and so on. (The reason is that there are two types of one-forms in the plane, three types of one-forms in 3-space, and so on). One of the nine plane projective geometries is the Klein model of the hyperbolic plane (first introduced by Beltrami in 1865). This model contributed decisively to the acceptance of hyperbolic geometry as a bona fide geometric system. The entirely n e w element introduced by Klein was the group-theoretic one. Klein u s e d the absolute of T H E M A T H E M A T I C A L I N T E L L I G E N C E R V O L 13, N O 2, 1991
53
each of the Cayley-Klein projective metric geometries to define a subgroup of the group of projective transformations of the appropriate projective space, namely the subgroup that m a p p e d the absolute onto itself. This subgroup played the role of the group of motions of the relevant metric geometry. We have come to the end of our story. But we cannot possibly leave it w i t h o u t m e n t i o n i n g the crowning intellectual achievement that grew out of this circle of remarkable ideas and developments and goes entirely beyond it. This achievement is Klein's definition--in the Erlanger Program of 1872--of "a geometry" of a set S and a group G of permutations of S as the totality of properties of the subsets of S that are invariant under the permutations of G.
$
/ Figure 2. (a,b,c,d) = (A,B,C,D).
$
Appendix The key concept of projective geometry is the cross ratio of four collinear points (four concurrent lines). The definitions given below are "pre-projective" in the sense that they rely on Euclidean distances (sines of Euclidean angles). The elimination of Euclidean metric elements marked the "'coming of age" of projective geometry. Cross ratio. The cross ratio of four collinear points A,B,C,D (in this order) is the number
D
"ffA
I
P[
",
Q
Figure 3. A complete quadrangle (with implicit construction of a fourth harmonic to three given points). complete quadrilateral is the dual of a complete quadrangle.)
(A,B,C,D) = (AC/CB)/(AD/DB). 6 All segments are directed. (The cross ratio is sometimes thought of as the quotient of two simple ratios (A,B,C) = AC/CB and (A,B,D) = AD/DB.) The cross ratio of four coplanar and concurrent lines a,b,c,d (in this order) is the number
Range.
This is the set of points on a line.
(Flat) pencil. The set of (coplanar) lines through a point O is called a (fiat) pencil with center O.
(a,b,c,d) = (sin ac/sin cb)/(sin ad/sin db).
Axial pencil. The class of planes through a line p is called an axial pencil with axis p.
Note that the four numbers on the right can be replaced by the (signed) areas of four triangles. It turns out that (a,b,c,d) = (A,B,C,D); see Figure 2.
Bundle. The class of lines and planes through a point O is called a bundle with center O.
Harmonic tetrad. If (A,B,C,D) = - 1 then we say that A,B,C,D form a harmonic tetrad and that the point pairs (A,B) and (C,D) divide each other harmonically. Analogous terminology applies to a concurrent tetrad of (coplanar) lines. There is a remarkable (projective) construction for finding the "fourth harmonic" for three given collinear points. It is implicit in a configuration called a complete quadrangle. Four points A,B,C,D in general position together with the six lines determined by them form a complete quadrangle (see Figure 3). The points A,B,P,Q form a harmonic tetrad(!). (A
The use of the term "dimension.'" A point has no dimension, a line has one dimension, and a plane has two dimensions. The lines of a fiat pencil, and likewise the planes of an axial pencil, correspond (by incidence) to the points of a range, which is the section of a pencil by a line. For this reason, ranges and pencils together are described as one-dimensional forms. Similarly, planes and bundles are two-dimensional forms, the points and lines of a plane being sections of the lines and planes of a bundle; and the whole space is three-dimensional.
6 N . B . ( z l , z 2 , z 3 , z 4 ) = [(z 3 - z l ) / ( z 2 complex numbers za,z2,z3,z4.
54
z3)l/[(z4
-
za)/(z 2 -
THE MATHEMATICAL 1NTELLIGENCERVOL 13, NO 2, 1991
z4)] f o r f o u r
Perspectivity. This is the correspondence established between two coplanar lines, or two planes, by regarding them as different sections of the same fiat
Polarity in the plane. This is a correlation which is its o w n inverse, so that if it relates a point A to a line a, it also relates a to the same point A. We call A the pole of a, and a the polar of A. If B lies on a, its polar, b, passes through A. We call A and B conjugate points, a and b conjugate lines. An example of a polarity is shown in Figure 1. (A polarity induces an involution of conjugate points on any line which is not self-conjugate and an involution of conjugate lines through any point which is not self-conjugate.)
Figure 4. Two ranges in perspective.
Isotropic lines. The lines y - Y0 = - i (x - x0) are the isotropic lines on (x0,Y0). (In homogeneous coordinates these equations are xl +- ix2 + cx3 = O. The isotropic lines pass through the so-called circular points at infinity (1,i,0), and (1, - i,0), respectively.) Note that the distance between two points on an isotropic line is zero.
Bibliographical Note
Figure 5. Two pencils in perspective. pencil or bundle, respectively. In the case of lines we say that the fiat pencil projects the one range into the other, and the two ranges are said to be in perspective; see Figure 4. It is natural to define a perspective correspondence of two coplanar pencils as one in which corresponding lines meet in points all of which lie on one line; see Figure 5.
One-dimensional projectivity.
This is a correspondence between two ranges (or two pencils, or a range and a pencil) that preserves harmonic tetrads. (This definition is due to von Staudt.) It can be shown that every projectivity can be constructed as a product of perspectivities.
Involution. This is a nontrivial projectivity which is its own inverse. Two-dimensional projectivities. These are collineations and correlations. A collineation b e t w e e n two planes (which m a y coincide) is a correspondence which relates collinear points to collinear points (and thus concurrent lines to concurrent lines). (Since this preserves quadrangles, it automatically preserves harmonic tetrads; and the correspondence induced between two corresponding ranges is a one-dimensional projectivity.) A correlation between two planes (which may coincide) is a correspondence which relates cob linear points to concurrent lines (and thus concurrent lines to collinear points).
I relied, for the most part, on the following three items: [1] N. A. Glagolev, Projective geometry, Moscow (1963) (in Russian). [2] G. B. Gurevich, Projective geometry, Moscow (1960) (in Russian). [3] Ya. S. Dubnov's essay on Cayley's A sixth memoir upon quantics. This is a chapter in V. F. Kagan's Foundations of geometry, vol II, Moscow (1956) (in Russian). Dubnov's essay was translated from the Russian by A. Shenitzer.) For an elementary account of the issue of projective metrics see Appendix A (pp. 214-241) in [4] I. M. Yaglom, A simple non-Euclidean geometry and its physical basis, Springer-Verlag (1979) (translated from the Russian by A. Shenitzer). For a comprehensive historical account of the evolution of projective geometry see [5] J. L. Coolidge, A history of geometry methods, Dover (1963). 79-83 in [6] R. Bonola, Non-Euclidean Geometry, Dover (1955), contain a very useful discussion of the contributions of Cayley and Klein described in this paper. The works [7] H. S. M. Coxeter, Non-Euclidean geometry, U. of Toronto Press (1957) and [8] H. Busemann and P. J. Kelly, Projective geometry and projective metrics, Academic Press (1953), include extensive discussions of all the issues sketched in this paper. (Most of the material in Appendix A comes from [7].)
Department of Mathematics York University North York, Ontario M3J 1P3 Canada THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991 ~
Huygens' Principle and Hadamard's Conjecture Paul Giinther
In this article I shall try to report on a problem connected with hyperbolic differential equations that was posed by Jacques H a d a m a r d in his 1923 Yale Lectures ([11], p. 236). In spite of its age, the problem is still far from being completely solved.
hand, the past D_(t,x) is the set of those events (-r,y) from which (t,x) can be reached with a velocity v < c; in other words, ('r,y) E D+(t,x) if and only if (t,x) D_('r,y). The b o u n d a r y of D+(t,x) (resp., D_(t,x)) is called the forward (resp., backward) characteristic half cone C + (t,x) (resp., C_ (t,x)). Using the function
The Wave Equation
F(t,x; "r,y): = ca(t - ,r)2 - r2(x,y)
Let us begin with a discussion of the w a v e equation for one u n k n o w n function u of m = n + 1 independent variables. If f denotes a given real function and c a positive n u m b e r , t h e n the o p e r a t o r O m and the second-order differential equation (era) are defined as follows: 1 O2u
[3mU: = ca 3t 2
C~2U
02U
Ox2
Oxn2 = f.
(era)
we have
(x, - y,)
.
t=l
The reader may think of a graphic time table with m = 2, say, for Phileas Fogg during his 80-day voyage ([20], Chap. 7); see Figure 1. The open set D+(t,x) C R m of those events that can be reached from (t,x) with a velocity v < c is called the future of (t,x); on the other 56
D+(t,x) U D_(t,x) = {('t,y)lF(t,x; ,c,y) > 0},
(2)
C+(t,x) U C_(t,x) = {('c,y)lF(t,x;'r,y) = 0}.
(3)
The double cone C+(t,x) U C_(t,x) looks like an hour g l a s s - - a n old symbol of transitoriness; see Figure 2.
The variable t represents the time and (xl . . . . . xn) determines a point x of Rn; consequently, w e can think of c as having the dimensions of a velocity (length/ time). We call each pair (t,x) ~ R m an event and interpret the line segment between two events (t,x), (~,y) with "r > t as the straight, uniform m o v e m e n t of a particle; its velocity v is given by v = r(x,y)/('c - t), r(x,y) =
(1)
THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2 9 1991 Spnnger-Verlag New York
In order to form an idea about what kind of physical processes are described by equation (e4), we consider sound waves in the air. Sound waves are small but fast variations of pressure p and density p. They are generated by certain sound sources, e.g., a musical instrument, and propagate into the still air. For the density we put 9(t,x) = 90(1 + 9 with a small parameter 9 and a constant 90, which gives the density for the state of rest. We assume that the pressure is a unique function of the density, p = @(9). From Euler's equations and the equation of continuity one finds that u satisfies (e4) with ca = (d@/dp)(po). If at the event (t,x) none of the sound sources acts, then we have f(t,x) = 0; otherwise we have, generally, f(t,x) # O. In the sequel we assume (for the sake of simplicity) that f is a given C'-function on R4 having compact support, which we abbreviate by writing f ~ C~0(R4); the support of a real or complex function f defined on some manifold M is the smallest dosed subset K C M such that f vanishes on M N K . One writes K = supp f. Equation (e4) also occurs in the theory of electromagnetism. In a vacuum each coordinate of the electric and the magnetic fields satisfies equation (e4) with f = 0 (the homogeneous equation). The number c depends on the dielectric constant and the magnetic permeability; it equals the velocity of light. If charges and currents are present, then one obtains equation (e4) with f # 0. Some
Solutions
Time 1872
5. Oct.
16.00
4. Oct. 6 35
3. Oct. 7.20 --
20.45
2. Oct.
I
I
London Pans
9
Future of (t, x)
\ \ \ \ \
We give first a useful identity. For a fixed x = (xx,x2,x3) E R 3 one defines the accelerated function {6}+ and the retarded function {d~}- of a function ~b:R4 ---* R by the formulas
= r
I Brxndlsl
Figure 1. Graphic timetable for Mr. Fogg.
o f (e4)
{r177
I Torlno
/
" , /
c§ (t, x)
/ / / / / /~.
A :t, x)
++-r(x,y)/c, yx,y2,y3).
If r is differentiable and x # y, we put
A'[r
/4 (t,y) + y, - x , / l {+}-(t,y)
_1
= r
+-
(t,y , i =
C
c _ ( t , . ) ~ - "
1,2,3.
1 r
3 aA'[r
t=l
c_ (t, x)
Past of (t, x)
Confir afion of the following identity is a simple exercise (see also [1]): - {E]46}- (t,y) = - ~
"
Oyt
(t,y) V(b (C2(R4).
Figure 2. Structure of space-time.
(4)
We integrate for fixed (t,x) over y (R3NK(e,x), where K(e,x) is the ball around x with radius 9 Applying the divergence theorem and passing to the limit 9 --* 0, we obtain for every r ~ C~(R4):
4-~
r~x,y) {D4qJ}-(t'y)dyldydly3 = r
(5)
If we e q u i p C• with the measure element dyldy2dy 3, then we can rewrite (5) as THE MATHEMATICALINTELLIGENCERVOL 13, NO 2, 1991 ~
1 fc 1 R4rbdyldy2dy3 4"tr _(t,x) r
+(t,x),
(6)
and this equality is the integrated form of the identity (4). Of course, the same formula holds with C+(t,x) instead of C_ (t,x). A function ~b defined in R m is called a forwar_d function (resp., backwa_rd function) if s u p p ~ n D_(t,x) (resp., supp ~b N D+(t,x)) is compact for every event (t,x). (Example: If ~b(t,x) = 0 for all events (t,x) with t < a, then + is a forward function.) We can n o w easily see that the identity (6) is valid not only for functions with compact support, but also for forward functions (C2(R4). We therefore obtain for a forward solution u + of (e4) the representation
u +(t,x) = - 4r
1
- fdyldy2dy 3,
(7)
event (t,x); one can say that G+(t,x) describes an ideal sound wave, whose source works only at the event (t,x). We disregard these more technical details and say only a few words about the supports of the functionals G• The relevant definition is as follows. Let M be a manifold and @: C~0(M) ~ R be a functional. A point ~ ( M belongs to M N s u p p @ if there exists a neighbourhood U(~) such that (I)[+] = 0 for every q~ ~ C~(M) with supp q~ C U(~). This definition, when applied to G• leads to the statement supp G+(t,x) = C+(t,x),
(12)
which is, so to speak, the limiting case of (8) if supp f reduces to the single event (t,x). Hadamard's
"Minor
Premise"
_(t,~) r
where f (C~0(R4). The analogous formula with C+(t,x) instead of C_ (t,x) holds for a backward solution u_ of (e4). On the other hand, if for a given f ( C~0(R4) the function u+ is defined by (7), then u+ is a solution of (e4). The proof of this assertion can be established again by means of (6), this time applied to f. Consequently, there exists exactly one forward (backward) solution of (e4). The solution u + gives the sound wave that arises from a source described by f. An event (t,x) belongs to the support of u+ only if C_ (t,x) N supp f # O. This can be reformulated as supp u+ C U {C+('r,y)l("r,y) ( suppf}.
(8)
The general notion "Huygens' Principle" comprises several ideas, which are related to wave propagation (diffraction and interference problems included). J. Hadamard formulated some of them in a famous syllogism; in the present paper we are concerned with its "minor premise": "If at the instant t = to, or more precisely, in the short interval to - 9 ~< t ~ to + e, we produce a s o u n d disturbance localized at the immediate neighbourhood of a point x 0, the effect at the (B) + subsequent instant t = t is localized in a very thin spherical shell with center x 0 and radius c(t' - to), where c is the velocity of sound" ([11], p. 54).
Let us give another interpretation of the formulas (6) and (7). For any fixed event (t,x) we associate to every function ~ ~ C~0(R4) the real number
G+(t,x)[6]: = - ~
+(,,x) r
+dyldy2dy3'
(9) y-dmgram
A linear mapping of a vector space 9; into R is usually called a linear functional on ~; equation (9) defines just such a functional G+(t,x) on C~0(R4). Analogously one defines a functional G_ (t,x) by (9) with C_ (t,x) replacing C+(t,x). Formula (7) can now be rewritten as
u•
= G~(t,x)[f].
(10)
t t'
One calls the functional G+(t,x) (resp., G_(t,x)) the forward (resp., backward) fundamental solution of (e4) belonging to (t,x). A reader who is familiar with distributions realizes at once from (9) and (6) that G• are those distributions that satisfy
D4G•
) = ~(t,x),
(11)
where 8(t,x) is the Dirac measure concentrated at the 58
THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
to+ e t o -- e
I I Xo
Y
Figure 3. Hadamard's minor premise. The cross-hatched area represents supp u.
1
u+(t,xa,x2) = - ~1 J~3p ( -fit - O/C,X1 + Zl,X2 q- z2)dzldz2dz3, t
= {z1 +
+
Clearly, u + (t,x) also does not depend on x3 and, consequently, solves equation (e3). The integrand is an even function of z3; for z3 > 0 we make the substitution Ya = xl + z v Y2 = x2 + z 2, T
=
t - p/c
and arrive at l fD f(%Yl,Y2) dyfly2d,r" u +(t,x) = ~ _(t,x) X/F(t,x; "r,y)
(13)
: supp u Orchestra
Listener
Figure 4. Music. As a consequence of (B)+ we state: if we produce at some locus several sound impulses, interrupted by p r o p e r intervals of silence, then (because of the missing aftereffects) a listener at another locus also receives separated sound impulses and not a confusing mixture. Clearly, (B)+ is fundamental for a suitable acoustic signal transmission and its artistic shape, which we call music. (See Figures 3 and 4.) Let us again consider the forward solution u + of (e4) with an f whose support is a neighbourhood of the event (t0,x0). According to (8) supp u§ is contained in a neighbourhood of C+(to,Xo). For a fixed t' > to the support of u+(t', "), considered as a function in R 3, is thus contained in a n e i g h b o u r h o o d of the sphere S(c[t' - to], x0) around x0 with radius c [ t ' - to]. If supp f shrinks to (to, x0), then supp u+(t', .) shrinks to S(c[t' - to], Xo). These considerations show not only the validity of (B)+ for the forward solutions u+, but also prove that the number c occurring in equation (e4) equals the velocity of sound. Of course, we can give a formulation (B)_ of the minor premise, which is then fulfilled by the backward solutions u_ of (e4). The validity of (B)_+ is the reason w h y we call E34 a Huygens operator.
r(t,x; %y) = c2(t - ~.)2 _ (x1 _ yl)2 _ (x2 _ y2)2. For every f ~ C~(R3) formula (13) gives the only forward solution u+ of (e3); replacing D_(t,x) by D§ one obtains the backward counterpart. The forward (backward) f u n d a m e n t a l solution G § (resp., G_(t,x)) is defined for fixed (t,x) as the functional C~o(R3) ~ 6--~ G+(t,x)[q~]: 1 fD ~b('r'yl'Y2) dyfly2d'r 2~ +_(t,x)V F ( t , x ; %y)
(14)
We can again write u+(t,x) = C_(t,x)[fl. Contrary to formula (9) the domain of integration in (14) is an open set, which suggests that the function Go(t,x ) defined by D+(t,x) U D_(t,x) ~ ( % y ) ~
1
=
Go(t,x)('r,y )
F(t,x; "r,y)-w
2"rr also solves equation (e3) with respect to (,r,y). An easy calculation confirms this. N o w let us choose a function f ( C~(R3) which is positive in a neighbourhood U~(t,x) of an event (t,x) and zero elsewhere; taking
The Equation (ea) Let us consider equation (e3). We shall point out that the minor premise (B)_+ is not satisfied for its forward and backward solutions; therefore E33 is not a Huygens operator. We start from the somewhat modified formula (7) with a function f that does not depend on the third spatial variable and write
U,(t,x) = {('r,y) ~ D+(t - e,x)l'r ( (t - e,t + e)} we find from (13) that supp u+ = D+(t - e,x) U C+(t - e,x). The support of u+(t', .) for any t' > t is now given by THE MATHEMATICALINTELLIGENCERVOL 13, NO 2, 1991 5 9
supp u§
") = K(c[t' - t + ~], x) C R 2.
In the limit e ~ 0 this set shrinks to the ball K(c[t' - t], x) and not to its boundary as demanded in (B)+. We also have supp G+(t,x) = D+(t,x) U C+(t,x). Of course, analogous results are valid for the backward solutions. Consequently, (B)• is not satisfied by the solutions of (e3). In a world with only two spatial dimensions an annoying noise produced somewhere and at any time never comes to an end. What luck that we have three dimensions at hand!
are defined as the boundary sets of D• respectively. The integration theory guarantees the existence of exactly one forward solution u+ and exactly one backward solution u_ of equation (15) provided that f C~0(f~). Both solutions are smooth; for fixed x ~ 11 we write
u._(x)
= G(x)If]
thus obtaining linear functionals G• (x) over C~(I~), which we call the forward and backward fundamental solutions of the differential operator L. For their supports we find supp u+ C U{D+(x) U C+(x)lx ~ suppf},
The General Hyperbolic Equation
supp G +(x) C D +(x) U C +(x).
Let us consider a second-order differential equation for one unknown function u:
L[u]:
=
0(., V ~ Oxi
0u
,1
+ 2A'~
cgx,
(15)
+ Cu = f .
The independent variables are x = (x0,x1. . . . . xn), m = n + 1; moreover, the sum convention for dispensing with the summation symbol ~ is used. The coefficients gO, A', C and f are smooth functions of x. We assume that L is a hyperbolic operator; i.e., the quadratic form giJ(x)~,*~1 has signature ( + . . . . . . . - ) for every x. Finally, the positive number ~/equals the absolute value of the determinant of the inverse matrix (g,j) = g of (g'0. Just as in general relativity, (g,1) determines a pseudo-Riemannian metric. We restrict our considerations to certain o p e n subsets of the x-space that are called causal domains. (For an exact definition, see [5], sec. 4.4.) In such a domain a satisfactory integration theory of equation (15) can be established. (See [5] or [9], Chap. W.) Let fl be a fixed causal domain. We first procure a function II x f~ ) (x,y) ~ F(x,y) ( R that generalizes our former F of formula (1). For that purpose we solve (using the Hamilton-Jacobi theory) the first-order differential equation OF
Ox,
o~F = 0,
Ox,,gx1
= 2g.s
).
For any fixed x ( ll the set {y E l~lF(x,y) > 0} decomposes naturally into two open subsets of ll; one of them we call the future D+(x) and the other one the past D_ (x) of x. The characteristic half conoids C• 60
THE MATHEMATICAL
= G mg(x) +
where supp G~-~(x) = C• and G~8(x)[f]: = fo •
. . . dy,
(16)
with a smooth kernel function T ~ C~(f~ x gl). One can say, apart from the very harmless integral (16), the fundamental solutions are concentrated on the characteristic conoids. The function T is called a "tail term" and corresponds with the "logarithmic term" in J. Hadamard's original version of the integration theory (see [11]). In the case m odd, such a decomposition with a smooth tail term is impossible; for L = D 3 we see this from (13), where the denominator of the integrand vanishes on C+(x).
u• the forward and backward solutions of L[u] = f; the operator L is called a Huygens operator if u + (x) = 0 for every pair (x,f) with C_ (x) f3 supp f = O and u _ (x) = 0 for e v e r y p a i r (x,f) w i t h C+(x) fq s u p p f
with the initial conditions oF
G•
Definition: Assume x ( 12, f (C~(I~), and denote by
OF
g'J(x) Ox---i(x,y) Oxj (x,y) = 4F(x,y)
r(y,y) = 0,-
There seems to be no incisive difference between the cases m even and m odd. Far from it! Let us consider the case m even. Then each of the fundamental solutions splits into the sum of two functionals over C~0(II) according to
INTELLIGENCER V O L 13, N O 2, 1991
=~. If for a Huygens operator L the function f vanishes outside a neighbourhood of x0 ~ I~, then u§ vanishes outside a neighbourhood of C+ (x0) in accordance with the minor premise (B)+. J. H a d a m a r d ([11], p. 236) gave the following famous criterion.
Hadamard's criterion: The operator L is a Huygens operator, if and only if for every x ( f~ we have supp G•
= C•
This is equivalent to m/> 4, m even and T(x,y) = 0 for
x ~ n, y ~ D•
e: = C
Hadamard's Conjecture When studying Huygens operators we must take into consideration certain simple transformations, which do not change the Huygensian character of an operator. Let L be a Huygens operator; L' is also a Huygens operator if it arises from L by one of the following transformations: (a) Non-singular transformations of the independent variables.
Oa) L'[u] = k-lL[ku] for some positive, smooth function k in fl (gauge transformation). (c) L'[u] = crL[u] for some positive, smooth function (r in l~ (conformal transformation). The wave operator D m, m I> 4 even, is a Huygens operator (for m = 4 we have seen this above). A Huygens operator that arises from I-Ira by (a), (b), and (c) is called a trivial Huygens operator.
Hadamard's Conjecture:
Every Huygens operator is trivial. In their well-known book Methoden der Mathematischen Physik, Vol. 2, Richard Courant and David HAlbert treated the validity of Huygens' principle for [-]r~ and formulated the above conjecture. It is doubtful whether J. Hadamard himself explicitly stated the conjecture; nevertheless it served as a guideline for the investigation of the topic. N o w a d a y s non-trivial Huygens operators are known; therefore it is better to speak of
Hadamard's problem: Find all H u y g e n s operators. What do their coefficients look like? Hadamard's criterion is indeed an elegant characterization of Huygens operators, but the functional relationship between the tail term and the coefficients of an operator is mysterious. Hadamard himself writes that his criterion gives "one answer but not the answer" (see [11], p. 236). The following result is a little step forward (see [9], Chap. VI, w By means of a certain differentiation process and passing to the limit y ---* x one can derive from the taft term T(x,y) an infinite sequence of symmetric, trace-free tensors, which we call the moments of order 0, 1, 2 . . . . . k. . . . of the operator L:
I(x), I,l(x), I, v2(x) . . . . .
These moments are invariant under the transformations (b) and (c), covariant under (a). More precisely, the moments arise by means of the usual tensor operations from (i), the metric tensor g and its curvature tensor; (ii) the tensor H,/ = a A 1 / O x i - OA/Oxj, where A, = g,tA1; (iii) the so-called Cotton invariant
1,1,2. ,k(x) . . . . .
1
0
m-2 (X/-~A') - Ae4' + R V ~ Ox, 4(m - 1)
(R being the scalar curvature of g); and (iv) the covariant derivatives of (i), (ii), (iii). For a Huygens operator the moments must vanish; if the coefficients of L are analytic this condition is even sufficient. In principle the moments can be calculated by use of suitable algorithms, but in practice the matter is complicated. If some m o m e n t I . . . is k n o w n as a function of the tensors (i)-(iv), then one can try to get information about the coeff• of Huygens operators from the equations I . . . = 0. This procedure is the "method of moments."
Some Results Concerning Hadamard's Problem In 1938 E. H61der proved that for a Huygens operator (15) with A' = 0, C = 0, and m = 4, the scalar curvature R of the metric g vanishes. For the case under consideration this is indeed equivalent to the vanishing of the zero-order moment [13]. In 1939 the Polish mathematician Myron Mathisson calculated the moments of order at most 2 for an operator of the form
i)2u
L[u] =- OX2o
~ __O2u
3u
~=1 Ox~ + 2Ai(x)--Oxi +
C(x)u
with m = 4 independent variables and a pseudo-euclidean metric. These assumptions simplify the matter considerably. On the other hand, one can say that M a t h i s s o n c r e a t e d the m e t h o d of m o m e n t s . H e proved that a Huygens operator of the type under consideration is trivial [15]. In 1952 the author expanded Mathisson's calculations to the case of a general operator (15) with m = 4 and curved metric; he gave the complete expressions of the moments of order at most 3 [6]. In 1955 Karl L. Stellmacher found the first examples of non-trivial H u y g e n s operators for even m I> 6, namely operators of the Euler-Poisson-Darboux type in the domains x, # 0:
LIu]-
+ - - ~ 0 ]u
e=l
+
with odd integers ~, I> 1, such that m < ~ ~ , ~ 2 m - 4, m = n + 1. t=O THE MATHEMATICAL INTELLIGENCER VOL 13, N O 2, 1991
61
This result disproves Hadamard's conjecture for m I> 6 [19]. In 1957 the author considered "static" operators of the form L[u] -
Ll[U ],
ax~
where L1 is a linear elliptic operator on a 3-dimensional manifold M3. He confirmed Hadamard's conjecture for this operator class under additional assumptions (e.g., compactness of M 3, or constant scalar curvature of the metric associated to L1) [7].
In a world with only two spatial dimensions an annoying noise produced somewhere and at any time never comes to an end. What luck that we have three dimensions at hand!
Here A denotes the Laplace-Beltrami operator of certain symmetric Riemannian manifolds (M,g) of odd dimension n / 5, m = n + 1, with scalar curvature R. For instance, M = G is a compact simple Lie group and g the bi-invariant metric arising from the negative of the associated Killing form. Taking G = SO(R,3), one obtains a trivial H u y g e n s operator, but G = SO(R,6), n = 15, leads to a non-trivial one [12]. In 1986/88 J. Carminati and R. G. McLenaghan treated again Hadamard's problem for the operator (15) with A z = 0 and m = 4. Their starting point was the Petrov classification for the Weyl curvature tensor of the associated metric g. They settled the problem i n the cases of Petrov type N, D and (partially) III; in these investigations they found no other Huygens operators than the above-mentioned ones. The same result w a s p r o v e d by R. G. M c L e n a g h a n and T. F. Walton even in the case A' # 0 for a Petrov type N metric [3], [17]. Concluding
In 1965 the author gave examples of non-trivial Huygens operators for every even m = n + 1 i 4, thus ultimately disproving Hadamard's conjecture. Let (a~'~), r [3 = 2. . . . . n, be a positive-definite matrix, whose elements are smooth functions of the sole variable x0; further, let (a~) be its inverse matrix and a = Det(a~o). Then 02U
1
0 V a 0u
~
02u
--"
L[u] =- 2 OXoOXl + V a
ar
OXo OX1
a,f~=2
- -
(17)
C~XrLOXfJ
is always a Huygens operator, which is trivial only for certain special matrices (a~). The associated pseudoRiemannian metric reads as follows ds 2 = 2dxodxl -
n ~ a~(xo)dx~dx~; a,[~=2
(18)
metrics of this kind are known as plane-wave metrics [8]. In 1969 R. G. McLenaghan studied the general operator (15) with m = 4 under the assumption that the Ricci tensor R,j of the associated metric vanishes. (In general relativity such a metric defines an e m p t y space-time.) He proved by use of the moments of order at most 4 that a Huygens operator of this kind is equivalent under (a) and (b) to an operator of the form (17) [16]. In 1981 Sigurdur Helgason discovered a new class of non-trivial Huygens operators with m I> 6 independent variables
L[u] =- Ox2
A -
u.
62 THEMATHEMATICALINTELLIGENCERVOL 13,NO 2, 1991
Remarks
Several other papers contain further contributions to H a d a m a r d ' s problem. We m e n t i o n t h o s e of V. W~insch, R. Schimming, R. G. McLenaghan, N. X. Ibragimov, and R. Illge. Their papers also deal with operators that act on sections of vector bundles instead of scalar functions u; they are quoted in the bibliography of [9]. In general relativity the propagation of electromagnetic waves is described by Maxwell's equations taken for a 4-dimensional pseudo-Riemannian metric of signature ( + , , , - ) . The following result in some sense solves Hadamard's problem within the framework of Einstein's theory. Let (M,g) be an empty space-time. Then Maxwell's equations obey Huygens" principle if and only if g is a plane-wave metric (18). For the "if" part see H. P. Kiinzle [14], R. Schimming [18]; for the "only if" part see P. Gfinther and V. W/insch [10]. On the other hand, the non-validity of Huygens' principle leads to an aftereffect; it is an effect of second order at least for an e m p t y space-time, and therefore not so remarkable as the classical ones (deviation of light rays, red shift).
References 1. Leifur Asgeirsson, Some hints on Huygens' principle and Hadamard's conjecture, Comm. Pure Appl. Math. 9 (1956), 307-326. 2. Luigi Bianchi, Vorlesungen uber Differentzalgeometrie, Leipzig: B. G. Teubner-Verlag (1899). 3. J. Carminati and R. G. McLenaghan, An explicit determination of the space-times on which the conformaUy invariant scalar wave equation satisfies Huygens' principle, Ann. Inst. Henri Pomcard, Phys. thdor. 44 (1986), 115-153; Part II, 47 (1987), 337-354; Part III, 48 (1988), 77-96. 4. Richard Courant and David Hilbert, Methoden der Mathemattschen Physik, Band 2, Berlin: Springer-Verlag (1937). 5. F. G. Friedlander, The wave equation on a curved spacetzme, Cambridge: Cambridge Univ. Press (1975). 6. Paul Gtinther, Zur Gifltigkeit des H u y g e n s ' s c h e n Prinzips bei partiellen Differentialgleichungen v o m normalen hyperbolischen Typ, Bet. Verb. Sachs. Akad. d. Wiss. Leipzig, Math.mat. Klasse 100, Heft 2 (1952). 7. Paul Gfinther, Uber einige spezielle Probleme aus der Theorie der linearen partiellen Differentialgleichungen zweiter Ordnung, Bet. Verh. Sachs. Akad. d. Wiss. Leipzig, Math.mat. Klasse 102, Heft 1 (1957). 8. P a u l Gianther, Ein Beispiel e i n e r n i c h t t r i v i a l e n Huygens'schen Differentialgleichung mit vier unabhangigen Verandedichen, Archive Rat. Mech. and Analysts 18 (1965), 103-106. 9. Paul Gianther, Huygens' Principle and Hyperbolic Equations, Boston: Academic Press (1988). 10. Paul Giinther and Volkmar Wtinsch, Maxwell'sche Gleichungen und Huygens'sches Prinzip, Math. Nachr. 63 (1974), 97-121. 11. Jacques Hadamard, Lectures on Cauchy's Problem m Linear Partial Differential Equatwns, New Haven: Yale University Press (1923). 12. Sigurdur Helgason, Wave Equatwns on Homogeneous Spaces, Preprint 1981; Lectures Notes in Mathematics 1077, Springer-Verlag (1984). 13. Ernst Holder, Poissonsche Wellenformel in nichteuklidischen Raumen. Bet. Verh. Sachs. Akad. d. Wzss. Leipzig, 99 (1938), 55-66. 14. H. P. Kiinzle, Maxwell fields satisfying Huygens' principle, Proc. Cambr. Phil. Soc. 64 (1968), 779-785. 15. Myron Mathisson, Le probl~me de M. Hadamard relatif ~t la diffusion des ondes, Acta Math. 71 (1939), 249-282. 16. R. G. McLenaghan, A n explicit determination of the empty space-times on which the wave equation satisfies H u y g e n s ' principle, Proc. Cambr. Phd. Soc. 65 (1969), 139-155. 17. R. G. McLenaghan and T. F. Walton, An explicit determination of the non-self-adjomt wave equations on curved space-time that satisfy Huygens' pnnciple. Part I: Petrov type N background space-times, Ann. Inst. Henri Poincard, Phys. Thdor. 48(3) (1988), 267-280. 18. Rainer Schimming, Zur Gtiltigkeit des Huygens'schen Prinzips bei emer speziellen Metrik, ZAMM 51 (1971), 201-208. 19. Karl L. Stellmacher, Eine Klasse huygensscher Differentialgleichungen und ihre Integration, Math. Ann. 130 (1955), 219-233. 20. Jules Verne, Le tour du monde en 80 ]ours.
Sektion Mathematik, Karl-Marx-Universitfit Karl-Marx-Platz
o-7OlO Leipzig Federal Republic of Germany
Excellent Titles for Mathematicians G. Gra6hoff, Universtty of Hamburg, FRG
The History of Ptolemy's Star Catalogue Here is a two-fold history of Ptolemy's star catalogue, found in the seventh and eighth books of Ptolemy's Alamagest, a work which from its conception in the second century untd the late Renaissance defined astronomy. The book describes the history of the various interpretations of the catalogue's origins. Each contribution ls treated in its historical sequence and an appendix includes a catalogue of all data and identification maps for all constellations. 1990/app. 352 pp./61 illus./I-Iardcover $70.00 ISBN 0-387-97181-5 Studies in the History of Mathematical and Physical Sciences, Volume 14 J. LiJtzen, Umversity of Copenhagen, Denmark
Joseph Liouville 1809-1882 Master of Pure and Applied Mathematics
The first part of this scientific biography contains a chronological account of Llouville's career, and the second part gives a detailed analysis of his major contributions to mathematics and mechanics. It reconstructs his work on potential theory, Galois theory, electrodynamlcs, and theories on the stability of rotating masses of flutd. It also incorporates valuable added reformation from Llouville's notes regarding his works on dffferenttatlon of arbitrary order, integration in finite terms, Sturm-LlouvIlle theory, transcendental numbers, doubly penodlc functions, geometry and mechanics. 1990/app. 815 pp./96 illus./Hardcover $98.00 ISBN 0-387-97180-7 Studies in the History of Mathematical and Physical Sciences, Volume 15 T.M. Apostol, Caltech, Pasadena, CA, USA
Modular Functions and Dirichlet Series in Number Theory Second Edition
"Beautifully wntten, discusses some fascinating topics and is a delight to read. In fact, it is surprisingly easy to read, m particular m view of the enormous amount of material presented.. Each of ~ts chapters ends with a set of exercises, so that the book may well be used as a textbook." --Mathematical Revtews 1989/204 pp., 25 illus./Hardcover $49.80 ISBN 0-387-97127-0 Graduate Texts in Mathematics, Volume 41 Ordering Information: Call Toll-Free 1-800-SPRINGER(In NJ call 201-348-4033) Or send FAX 201-348-4505 For mad orders, send paymentplus $2 50 for postageand handhngto Springer-VerlagNew York, Inc., Attn S Klamkln-- Dept $282, 175 Fifth Avenue, New York, NY 10010 We acceptVisa, MC, and Amexcharges (with signature and exp date noted) as well as personal checksand money orders
Springer-Verlag New York Berlin Heidelberg Vienna London Paris Tokyo HongKong
Catalan Numbers, Their Generalization, and Their Uses Peter Hilton and Jean Pedersen
Probably the most prominent among the special integers that arise in combinatorial contexts are the binomial coefficients (~). These have many uses and, often, fascinating interpretations [9]. We would like to stress one particular interpretation in terms of paths on the integral lattice in the coordinate plane, and discuss the celebrated ballot problem using this interpretation. A path is a sequence of points Po P1 999 Pro, m >I O, where each P, is a lattice point (that is, a point with integer coordinates) and Pz+l, i 1> 0, is obtained by
64
stepping one unit east or one unit north of P,. We say that this is a path from P to Q if Po = P, Pm= Q. It is now easy to count the number of paths. THEOREM 0.1. The number of paths from (0,0) to (a,b) is the binomial coefficient if+b).
Proof: We may denote a path from (0,0) to (a,b) as a sequence consisting of a Es (E for East) and b Ns (N for North) in some order. Thus the number of such paths
THE MATHEMATICALINTELUGENCERVOL 13, NO 2 @ 1991 Spnnger-Verlag New York
is the number of such sequences. A sequence consists of (a + b) symbols and is determined w h e n we have decided which a of the (a + b) symbols should be Es. Thus we have (o+b) choices of symbol.
p=2 k=3
p=3 k=2
3
2
4
5
3
Figure 1. Left: a binary (or 2-ary) tree with 3 source-nodes (Q) and 4 end-nodes (o) and, at right, a ternary (or 3-ary) tree with 2 source-nodes (o) and 5 end-nodes (o).
p=2 k=3 (Sl((S2
We will present another family of special integers, called the Catalan n u m b e r s (after the nineteenth-century Belgian 1 mathematician Eugene Charles Catalan; see the historical note at the end of this ar~cle), which are closely related, both algebraically and conceptually, to the binomial coefficients. These Catalan numbers have many combinatorial interpretations, of which we will emphasize three, in addition to their interpretation in terms of 2-good paths on the integral lattice. Because all these interpretations remain valid if one makes a conceptually obvious generalization of the original definition of Catalan numbers, we will present these interpretations in this generalized form. Thus the definitions we are about to give depend on a parameter p, which is an integer greater than one. The original Catalan n u m b e r s - - o r , rather, characterizations of these n u m b e r s - - a r e obtained by taking p = 2. Let us then present three very natural combinatorial concepts, which arise in rather different contexts, but which turn out to be equivalent. Let p be a fixed integer greater than 1. We define three sequences of positive integers, depending on p, as follows:
t H e lived his life m Belgium; but, b e i n g born in 1814, h e w a s n o t b o r n a Belgian!
S3)S4))
p=3 k=2
(sl s2(ss
Figure 2. Left: an expression involving 3 applications of a binary operation applied to 4 symbols and, at right, an expression involving 2 applications of a ternary operation applied to 5 symbols.
p=2 k=3
p=3 k=2
Figure 3. Left: a S-gun subdivided into three 3-gons by 2 diagonals and, at right, a 6-gon subdivided into two 4-gons by I diagonal. ak: rao = 1, ~a k = the number of p-ary trees with k
s o u r c e - n o d e s , k ~> 1 (see Figure 1). bk: pbo = 1, ~bk = the number of ways of associating k applications of a given p-ary operation, k I> 1 (see Figure 2). ck: f o = 1, t,ck = the n u m b e r of w a y s of subdividing a convex polygon into k disjoint (p + 1)-gons by means of non-intersecting diagonals, k I> 1 (see Figure 3). THE MATHEMATICALINTELLIGENCERVOL. 13, NO. 2, 1991 65
p=2 k=3
(1 ((23) 4))
(~ 2 s 4 5))
p=3 k=2
3 Figure 4. The dissected polygons of Figure 3, with diagonals and last side labeled.
( 1 ( 2 3 ( 4 ( 5 6 7 ) 8))9)
3 ( 4 ( 5 6 7 ) 8)) /
p=3 k=4 ~
'Y 5 Figure 5. The polygonal dissection associated with
(s~(s2s3(s,(sss~ )ss) )~ . (As indicated, we suppress the 'p' from the symbol if no ambiguity need be feared.) Note that, if k t 1, (i) a p - a r y t r e e w i t h k s o u r c e - n o d e s h a s (p - 1)k + 1 end-nodes and pk + 1 nodes in all (Figure 1); (ii) the k applications of a given p-ary operation are applied to a sequence of (p - 1)k + 1 symbols (Figure 2); (iii) the polygon has (p - 1)k + 2 sides and is subdivided into k disjoint (p + 1)-gons by (k - 1) diagonals (Figure 3). A well-known and easily proved result (for the case p = 2 see Sloane [8]) is the following: THEOREM 0.2. pak = pbk = pCk" The reader will probably have no trouble seeing how the trees of Figure I are converted into the corresponding expressions of Figure 2 - - a n d vice versa. 66
THE M A T H E M A T I C A L INTELLIGENCER VOL 13,N O 2, 1991
However, relating the trees of Figure 1 (or the parenthetical expressions of Figure 2) to the corresponding dissected polygons of Figure 3 is more subtle (and the hint given in [8], in the case p = 2, is s o m e w h a t cryptic), so we will indicate the proof that bk = Ck. Thus, suppose that we are given a rule for associating k applications of a given p-ary operation to a string of (p - 1)k + 1 symbols sl, s2. . . . . s0,_l~,+ v Label the successive sides (in the anticlockwise direction) of the convex ((p - 1)k + 2)-gon with the integers from 1 to (p - 1)k + 1, leaving the top, horizontal side unlabeled. Pick the first place along the e x p r e s s i o n (counting from the left) w h e r e a succession of p symbols is enclosed in parentheses. If the symbols enclosed run from st+ 1 to sj+p, draw a diagonal from the initial vertex of the (j + 1)st side to the final vertex of the (j + p)th side, and label it (j + 1. . . . . j + p) Also imagine the part (sj+a . . . . sj+p) replaced by a single symbol. We now have effectively reduced our word to a set of (k - 1) applications and our polygon to a set of 2 polygons, one a (p + 1)-gon, the other a [(p - 1)(k - 1) + 2]-gon, so we proceed inductively to complete the rule for introducing diagonals. Moreover, the eventual label for the last side will correspond precisely to the string of (p - 1)k + 1 symbols with the given rule for associating them, that is, to the original expression. Figure 4 illustrates the labeling of the dissections of the pentagon and hexagon that are d e t e r m i n e d by the c o r r e s p o n d i n g expressions of Figure 2. The converse procedure, involving the same initial labeling of the original sides of a dissected polygon, and the successive labeling of the diagonals introduced, leads to an expression that acts as label for the last (horizontal) side. We now illustrate a slightly more complicated situation: Example O. 1. Let p = 3, k = 4 and consider the expression sl(s2s3(s,(sss6sT)sg))sg). The dissection of the convex 10-gon associated with this expression, with the appropriate labeling, is shown in Figure 5. The corresponding ternary tree is shown in Figure 6, which also demonstrates how the tree m a y be associated directly with the dissected polygon. The rule is "Having entered a (p + 1)-gon through one of its sides, we may exit by any of the other p sides." In the case p = 2, any of at, bk, Ckmay be taken as the definition of the kth Catalan number. Thus we may regard any of pat, pbk, pck as defining the generalized kth Catalan number. Moreover, the following result is
known (see [6]). THEOREM 0.3.
k~>l.
Pak=
k
1
(p-1)k
+ l
'
The "classical" proof of this result is rather sophisticated. One of our principal aims is to give an elementary proof of Theorem 0.3, based on yet a fourth interpretation of the generalized Catalan n u m b e r s which, like that of the binomial coefficients given earlier, is also in terms of paths on the integral lattice. We lay
particular stress on the flexibility of this fourth interpretation, which we n o w describe. We say that a path from P to Q is p-good if it lies entirely below the line y = (p - 1)x. Let dk = pdk be the n u m b e r of p-good paths from (0, - 1) to (k, (p - 1)k - 1). (Thus, by our convention, do = 1.) We extend Theorem 0.2 to assert the following. T H E O R E M 0.4. pak = pbk
1
O•j•
=
pCk = p d k.
Proof: First we restate (with some precision) the definitions of bk a n d dk, which n u m b e r s we will prove equal. bk = the n u m b e r of expressions involving k appli-
p=3 k=4
cations of a p-ary o p e r a t i o n (on a w o r d of
k(p - 1) + 1 symbols). Let E be the set of all
9
such expressions.
dk = t h e n u m b e r of g o o d p a t h s f r o m ( 0 , - 1 ) to 2
5
6
7
Figure 6 (a). The tree associated with (sl(s2ss(s,(ssssS~)s.))sg). Ente[ here
I
(k, (p - 1)k - 1); by following such a path by a single vertical segment, we m a y identify such paths with certain paths (which we will also call 'good') from ( 0 , - 1) to (k, (p - 1)k). Let P be the set of all such paths. We set up functions ~: E---* P, ~F: P - - , E, which are obviously mutual inverses. Thus is obtained by rem o v i n g all final parentheses 2 a n d (reading the resulting expression from left to right) interpreting a parenthesis as a horizontal move a n d a symbol as a vertical move. The inverse rule is ~ . It remains to show (i) that ImCI>consists of elements of P (and not merely of arbitrary paths from ( 0 , - 1) to (k, (p - 1)k); and (ii) that Im~F consists of elements of E (that is, of wellformed, meaningful expressions). (i) Let E be an expression. We argue by induction on k that ~ E is a good path. Plainly ~E is a path from (0, - 1) to (k, (p - 1)k), so we have only to prove it good. This obviously holds if k = 1. Let k I> 2 a n d let E involve k applications. There m u s t occur somewhere in E the section
(st+is1+2..
9 st+ p)
Call this the key section. We wish to show that if (u,v) is a point on the path Ethat is not the endpoint, t h e n v < (p - 1)u. Let E' be the expression obtained from E by replacing the key section by s. Then E' involves (k - 1) applications. Now if A is the point on ~E corresponding to the parenthesis of the key section, then the inductive hypothesis tells us that v < (p - 1)u if (u,v) occurs prior to A. A s s u m e n o w that (u,v) is not prior to A. There are t h e n 2 possibilities:
Figure 6 (b). Associating a tree with a dissected p o l y g o n (in this case, the tree s h o w n in (a) and the p o l y g o n of Figure 5). Note that 9 = interior node = source-node; o = exterior node = end node.
2 This converts our expression into a meaningful reverse Polish expression (MRPE); see [2]. Note that the closing parentheses are, strictly speaking, superfluous, prowded that we know that we are dealing with a p-ary operation. It is not even necessary to know this if we are sure that the expression is well-formed, since the 'arity' is ~-1 + 1, where ~ is the number of symbols and Kis the number of K opening parentheses. THE MATHEMATICAL INTELL1GENCER VOL 13, NO 2, 1991
67
/
y=2x
/(.
p=3 k=4
~2
,C
X
(0, -1)
t
~=~bE E =~b~ Figure 7 (a). The path associated w i t h the expression E = (s~(s2s3(s,(sss6sT)ss))Sg).
Figure 7 (b). The inductive step, in either direction.
If E ends with S/+p, so that E' e n d s with s, then 3(u',v'), the last point of ~E', such that u' = u - 1, v' > v - p + 1, v' = (p - 1)u'. H e n c e v - p + 1 < (p - 1)(u - 1), s o y < ( p - 1)u. If E does not end with st+ ~, so that E' does not end with s, then 3(u',v'), not the last point of ~E', such that u' = u - 1, v' I> v - p + 1, v' < (p - 1)u' (by the inductive hypothesis). Hence v-p + l<(p1)(u- 1),sov<(p1)u. This c o m p l e t e s the i n d u c t i o n in the ~ - d i r e c t i o n . Figure 7, which gives a special, but not particular, case, may be helpful in following the argument. (ii) Let ~ be a good path to (k, (p - 1)k). We argue by induction on k that ~@ is a well-formed expression. This holds if k = 1, because then there is only one good path @ and ~@ is (sis2. 9 9 Sp). Let k ~> 2. Then b e c a u s e the path climbs altogether k(p - 1) + 1 places in k jumps, there must be a j u m p s o m e w h e r e (not at the initial point) of at least p places. Let A be a point on ~ where the path takes one horizontal step followed by p vertical steps, bringing it to C. Let B be the point one step above A (so that B is not on ~). Then B is on y = (p - 1)x if and only if C is on y = (p - 1)x, that is, if and only if C is the endpoint of ~; otherwise B is below y = (p - 1)x. Let ~1, ~2 be the parts of ending in A and beginning in C, respectively. Let ~2' be the translate of ~2, given by
b y AB, followed by ~2'" It is trivial that ~ ' is a good path to (k - 1, (p - 1)(k - 1)). Thus, by the inductive hypothesis, ~ ' is a well-formed expression involving (k - 1) applications. Let s be the symbol in ~@' corresponding to AB. Replace s by (sj+ls1+ 2 . . . sj+p, w h e r e these symbols do not occur in ~@'. Then the n e w expression is wellformed and is obviously ~@. This completes the induction in the ~-direction so that Theorem 0.4 is established. With Theorem 0.4 established, we may proceed to calculate d k. This is achieved in Section 2; a more general algorithm for counting p-good paths is described in Section 3. By considering the last application of the given p-ary operation, it is clear that pbk satisfies the recurrence relation (generalizing the familiar formula in the case p = 2) Pbk = E pbtl pbi2 . . . pbip, k I> 1. (0.1)
u' = u - 1, v' = v -
(p-
1);
and let @' be the path consisting of ~a, followed 68
T i m MATHEMATICAL INTELLIGENCER VOL 13, N O 2, 1991
tl+t2+
.
+tp=k-1
In Section 2 w e also enunciate a similar recurrence relation for pdk, thus providing an alternative proof that ~bk = pdk. We will also s h o w in Section 2, in outline, o w (0.1) leads to a proof of Theorem 0.3 via generating functions and the theory of B/~rmann-Lagrange series (see [5]). The case p = 2, dealing with the Catalan numbers as originally defined, is discussed in Section 1. In that special case w e have available an elegant proof that 2dk = ~(k~l), k I 1, using Andre's Reflection Method [1]; however, w e have not discovered in the literature a means of generalizing this m e t h o d effectively.
m
The authors wish to thank Richard Guy, Richard Johnsonbaugh, H u d s o n Kronk, and Tom Zaslavsky for very helpful conversations and communications during the preparation of this article. 1. Catalan Numbers and the Ballot Problem The ballot problem (see box) was first solved by Bertrand, but a very elegant solution was given in 1887 by the French mathematician D4sir6 Andr6 (see [1], [4]). In fact, his method allows one easily to count the number of 2-good paths from (c,d) to (a,b), where (c,d), (a,b) are any two lattice points below the line y = x. Because p = 2 throughout this discussion, we will speak simply of good paths; any path from (c,d) to (a,b) in the complementary set will be called a bad path. We will assume from the outset that both good paths and bad paths exist; it is easy to see that this is equivalent to assuming
d
composition, @ = ~1~2; and let @1 be the path obtained from @1 by reflecting_in the line y = x. Then, if = ~ 2 , @ is a path from P(d,c) to Q(a,b); and the rule ---* @ sets up a one-one correspondence between the set of bad paths from P to Q and the set of paths from to Q. It follows that there are ((a+b)2(~+a)) bad paths from P to Q, and hence
((a + b) - c(c + d))
((a + b) - d(c +
(1.1)
good paths from P to Q. The argument, illustrated in Figure 8, is called Andre's Reflection Method. Notice, in particular, that if c = 1, d = 0, this gives the number of good paths from (1,0) to (a,b) as
(o+b l) (a+b l) a-1 a
=
a
Thus the probability that a path from (0,0) to (a,b) proceeds first to (1,0) and then continues as a good path to (a,b) is a~bb;this then is the solution to the ballot problem. We return now to Catalan numbers. The kta Catalan number ck is expressible as 2dk, the number of good paths from ( 0 , - 1 ) to (k,k - 1). According to (1.1) this is 1
((a + b) - c(c + thus it suffices to count the number of bad paths from P to Q; see Figure 8. If ~ is a bad path, let it first make contact with the line y = x at F; let ~1, ~2 denote the subpa~hs PF, FQ, so that, using juxtaposition for path-
Thus the kth Catalan number ck is given by the formula 1 ( 2k ) k~>l. co = 1, ck =-~ k - 1 "
(1.2)
Notice that an alternative expression is Ck=k+
Figure 8. Andr4's reflection method.
1
(1.3)
Notice, too, that, by translation, ck may also be interpreted as the n u m b e r of good paths from (1,0) to (k + 1,k). Andr4's Reflection Method does not seem to be readily applicable to obtaining a formula corresponding to (1.2) in the case of a general p I> 2. Indeed, in the next section, where we calculate pck, p/> 2, we do not obtain a general closed formula for the number of p-good paths from (c,d) to (a,b). We do introduce there and calculate explicitly the number dqk of p-good paths from (1,q - 1) to Pk(k,(p - 1)k - 1) for q ~< p - 1. This enables us, by translation, to calculate the number of p-good paths from any lattice point C below the line y = (p - 1)x to Pk. For if C = (c,d), then the number of p-good paths from C to Pk is the number of p-good paths from (1,q - 1) to Pk-c+l, where q = d + 1 - (p - 1)(c - 1). On the other hand, although THE MATHEMATICAL INTELLIGENCER VOL 13, N O 2, 1991
69
a convenient explicit formula w h e n the terminus of the path is an arbitrary lattice point below the line y = (p - 1)x does not seem available, we will nevertheless derive in Section 3 an algorithmic formula in the form of a finite sum, d e p e n d i n g on the quantities dqk. This formula, while readily applicable, has a mysterious conceptual feature to which we draw attention at the e n d of Section 3. The standard procedure for calculating Ck (we here revert to the original case p = 2) is based on the evident recurrence relation
Ck =
~
c,cl, k >~ l; co = 1.
This relation is most easily seen by considering bk and concentrating on the last application of a given binary operation within a certain expression. If we form the power series oo
(1.5) k=O
then (1.4) shows that S satisfes xS 2 -
S -t- 1 -~ 0, S(0) = 1,
so that S =
1 - X/1 - 4 x 2x
(1.6)
It is n o w straightforward to expand (1.6) a n d thus to obtain confirmation of the value of ck already obtained in (1.2). As w e will d e m o n s t r a t e , this a n a l y t i c a l m e t h o d for calculating ck does generalize, but the generalization is highly sophisticated a n d we prefer to emphasize a m u c h more elementary, combinatorial arg u m e n t to calculate pck. 2.
Generalized
Catalan
Definition 2.1 Let q ~< p - 1. Then dq0 = I and dqk is the n u m b e r of p-good paths from (1, q - 1) to Pk, if k >1 1. The quantities dqk will play a crucial role in counting p-good paths a n d turn out to be no more difficult to calculate than the quantities dk. Of course, we have ~Ck = ~/k = pd0k, k ~> 0,
(2.1)
where pck is the kth (generalized) Catalan number. For, if k ~> 1, every p-good path from (0, - 1) to Pk m u s t first proceed to ( 1 , - 1). We further note that we have the relation
70
(2.3)
=/k =/0,k =/p-l,k§
the generalized kth Catalan number. (2.4)
We will also suppress the 'p' from these symbols a n d from the term 'p-good' if no ambiguity need be feared. We n o w enunciate two fundamental properties of the n u m b e r s dqk, for a fixed p 1 2. A special case of the first property was pointed out to us by David Jonah during a Mathematical Association of America Short C o u r s e , c o n d u c t e d by us in the s u m m e r of 1987, u n d e r the auspices of the Michigan Section. His original f o r m u l a concerned the case p = 2 and was restricted to q = 0 (and the parameter n appearing in the formula w a s restricted to being an integer not less t h a n 2k instead of an arbitrary real number). Recall that, for any real n u m b e r n, the binomial coefficient (8) m a y be interpreted as 1, and the binomial coefficient (~) m a y be interpreted as the expression n(n-1) .... (n-r+1), provided r is a positive integer (see, r! e.g., [3]). Using this interpretation, we prove the following. T H E O R E M 2.2 (Generalized Jonah Formula). Let n be any real number. If k >I 1, and q ~ p - 1, then -
k
Numbers
We will base our calculation of ~ck on that of a more general quantity. Let p be a fixed integer greater than 1, and let Pk be the point (k,(p - 1)k - 1), k ~ 0. We define the n u m b e r s ~/qk = dqk, as follows.
6_1, k
2dk = Ck, the kth Catalan number, and
(1.4)
z+/=k-1
S(x) =
for a n e a s y t r a n s l a t i o n a r g u m e n t s h o w s t h a t the n u m b e r of p-good paths from (1, p - 2) to Pk is the same as the n u m b e r of p-good paths from ( 0 , - 1) to Pk-l" Thus dp-l,1 = 1 = d00 if k = 1, and, if k I-- 2, the n u m b e r of p-good paths from ( 0 , - 1) to Pk-1 is, as arg u e d above, the number of p-good paths from (1, - 1) to Pk- 1, that is, do,k- 1. We m a y write ~dk for pd0k, so that, if k I> 0,
= d0,k_l, k ~ 1;
THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
(2.2)
Proof: We first assume (see Figure 9) that n is an integer not less than pk, so that (k, n - k) is a lattice point not below the line y = (p - 1)x. Partition the paths from (1, q - 1) to (k, n - k) according to where they first meet the line y = (p - 1)x. The n u m b e r of these paths that first meet this line where x = i is ~i ~ i , where ~/z = the number of paths from (1, q - 1) to (i, (p - 1)i) which stay below y = (p - 1)x except at the endpoint = the number of good paths from (1, q - 1) to P, t/q ; and the n u m b e r of paths from (i, (p - 1)i) to (k, n - k)
( k , n - k)
(Recall that k t> 2.) N o w we h a v e the universal identity (r"+l) = 7~-i(r)"-r n Hence
(k, (p - :) k) - pi-
(p k k - 1
1)
pk-
=
pi-
k -~i
k + i
kk
(P~
_
9
PZ---11)
i-1]'
so that
(p-- 1)z) ' Pz
= dqk
(o 1) = dqk.
Finally,
1
(P
1)( pk k - 2
( 1 , q - 1) pk-
Figure 9. Partitioning paths from (1, q - 1) to (k, n - k) according to where they first meet the line y = (p - 1)x. Because i ranges from I to k and there are (~--0 paths in all from (1, q - 1) to (k, n - k), formula (2.5) is proved in this case. We n o w observe that each side of (2.5) is a polynomial in n over the rationals Q of degree (k - 1), and we have proved that these two polynomials agree for infinitely m a n y values of n. They therefore agree for all real values of n. We m a y use Theorem 2.2 to calculate the value of dqk; recall that q ~< p - 1. THEOREM 2.3 If k >>-1, then
=
q(pk-
q~
pk- q\k-
P-
1]"
q
-
and (2.6) is proved. Note that nothing prevents us from taking q negative in Theorems 2.2 and 2.3. Taking q = 0, we obtain the values of the generalized Catalan numbers: 1 Corollary 2.4 ~lo = 1 and ~dk = ~_~), k >I 1.
We come n o w to the second fundamental property of the n u m b e r s dqk, for a fixed p ~> 2. THEOREM 2.5 (Recurrence relation for dqk). Choose a
fixed r >>-1. If k >I 1, and q < p - r, then Proof: Because, from its definition, dqa = 1, formula
dqk = Z dp-r,'dq+r,J' where i >~ 1, j >I 1, i + j = k + 1.
(2.6) holds if k = 1. We therefore assume k / > 2. We first substitute 3 n = pk - 1 into (2.5), obtaining
(2.7)
1
2 d~ (Pk k _ 1
We next substitute n = pk - 1 in (2.5), but n o w replace k by (k - 1), obtaining 2
~
z,!
The proof proceeds by partitioning the good paths from (1, q - 1) to Pk according to where they first meet the line y = (p - 1)x - r (see Figure 10). We suppress the details of the argument, which resemble those for Theorem 2.2. However, notice that formula (2.7) generalizes the familiar recurrence relation (1.4),
d , ( P ~ - /P'_--11).
Ck=
~
c,q, k > ' l ,
i+y=k-1 3 It is a m u s i n g to note that t h e combinatorial a r g u m e n t in t h e proof of T h e o r e m 2.2 required n >t pk, a n d we are here s u b s t i t u t i n g a value of n less t h a n pk.
for the Catalan numbers, in the light of (2.3) and (2.4) - - w e have merely to substitute p = 2, q = 0, r = 1. In THE MATHEMATICALINTELLIGENCERVOL 13, NO 2, 1991 71
Corollary 2.7 If k >I 1, then =
E
A1 A2 . . .
,l+Z2+...+Zp=k-1
Because the numbers pbk are easily seen to satisfy the same relation, Corollary 2.7 provides another proof of t h e e q u a l i t y of pd k w i t h t h e k th g e n e r a l i z e d Catalan number. It also s h o w s us that, if S(x) is the p o w e r series oo
+
S(x) = (p - - 1 ) 1 - - r )
k=O then
xSp = S - 1.
-"-X
(2.9)
Klamer [6] attributes to P61ya-Szeg6 [7] the observation that we may invert (2.9) to obtain
S(x) = 1 + ~ , - s
(1,p - 1 - r )
(o, --r)
(1, q - - l )
Figure 10. Proof of Theorem 2.5.
fact, we may use Theorem 2.5 to express dqk in terms of the generalized Catalan numbers ~di, at the same time generalizing (2.4): T H E O R E M 2.6. If q ~ p - 1, and if k >! 1, then
dqk =
E q+i2 + .
rdq ~d,2 . . . ~dlp_q.
(2.8)
+ip_q=k-1
Proof: The case q = p - 1 is just (2.4), so w e assume q < p - 1. Then, by Theorem 2.5, with r = 1,
',] wherei/1, = ~
j1>l,i+j
= k + 1
~d,ldq+ 1,12, where j2 ~ 1, il + j2 = k,
'1,12
using (1.4). Iterating this formula, we obtain
dqk = Zl+Z2+
E pdH p d z 2 . . , pdzp_q_l pdp_l,lp_q , . +,p-q- l +lp-q=k where jp_q >i 1.
1
indeed, this is the solution given to Problem 211 on p. 125 of [71, based on the theory of Bfirmann-Lagrange series. Thus, Corollary 2.7 leads to a different, but far more sophisticated, evaluation of the generalized Catalan n u m b e r s pdk, though it does not, in any obvious way, yield Theorem 2.3. Note that Corollary 2.4 could, of course, have b e e n o b t a i n e d w i t h o u t introducing the quantities pdqk for q # 0. H o w e v e r , these quantities have an obvious combinatorial significance and satisfy a recurrence relation (Theorem 2.5) m u c h simpler than that of Corollary 2.7. In the light of (2.1) the quantities pdqk themselves deserve to be regarded as generalizations of the Catalan n u m b e r s - - t h o u g h certainly not the ultimate generalization! We will see in the next section that they are useful in certain important counting algorithms. 3. C o u n t i n g
dqk = ~ dp-l,, dq+l a,
k-
p-good
Paths
We will give in this section an algorithm for counting the n u m b e r of p-good paths from (c,d) to (a,b), each of those lattice points being a s s u m e d below the line y = (p - 1)x. As explained in Section 1, it suffices (via a translation) to replace (c,d) b y the point (1, q - 1), with q ~ p - 1. Further, in order to blend our notation with that of Section 2, w e write (k, n - k) for (a,b), so that n < pk. We assume there are paths from (1, q - 1) to(k,n - k),i.e.,thatk1>l,nk / > q - 1. Of course, there are (~-~) paths from (1, q - 1) to (k, n - k). It is, moreover, easy to see that there exist p-bad paths from (1, q - 1) to (k, n - k) if and only if n - k I> p - 1, so we assume this--otherwise, there are only p-good paths and we have our formula, namely,
(~--0. A second application of (2.4) yields the formula (2.8). Setting q = 0 in (2.8), and again using (2.4), yields our next result. 72 THE MATHEMATICALINTELLIGENCERVOL 13, NO 2, 1991
To s u m up, we are counting the p-good paths from (1, q - 1) to (k, n - k) u n d e r the (nontrivializing) assumption that 1 ~ k ~ n - p + 1 ~ p(k - 1); notice
that this composite inequality implies that, in fact, k/> 2, so that we assume
y = (p-l)/
2~k~n-
J
g =
9 (k,. - k )
p + l <-p(k-
1).
(3.1)
There are h~-~) paths in all, so we count the p-bad paths. Let (~ = [~-1~], where [x] is the integral part of x. Then (3.1) ensures that (~/> 1, a n d a p-bad path from (1, q - 1) to (k, n - k) will meet the line y = (p - 1)x first at some point Q(j,(p - 1)j), j = 1, 2 . . . . . e (see Figure 11). Because there are dqj paths from (1, q - 1) to Qi that stay below the line y = (p - 1)x except at QI' and because there are (~-ro paths from Qj to (k, n - k), it follows that there are dq1(~-rO bad paths from (1, q - 1) to (k, n - k) that first meet the line y = (p - 1)x at Qj. Thus the number of p-bad paths from (1, q - 1) to (k, n -
k) i s
(1, q - 1 Figure 11. A bad path from (1, q - 1) to (k, n - k), n < pk. Notice that Qe is the last possible initial crossing point of the line y = (p - 1)x for such a path.
We have proved the following result. THEOREM 3.1. The number of p-good paths from (1, q - 1) to (k, n - k), under the conditions (3.1), is
Yl
,
[n,]
y = 2x
xxxx [
We m a y express this differently, however, and in a w a y that will be more c o n v e n i e n t if ~ is relatively large. If we invoke Theorem 2.2, we obtain the following corollary.
(5, b x
Corollary 3.2 The number of p-good paths from (1, q - 1) to (k, n - k), under the conditions (3.1), is (3, 6 ) ~ \ NN
'
y=4
/
9 (1, -1)
Figure 12. Corollary 3.3 with q = 0, k = 5, p = 3, n = 9; h e n c e l = 2 , m = 3. all paths bad paths good paths --
J
r ,-k 1 '
LP-lJ
It is interesting to observe that the binomial coeffid e n t s entering into the formula of Corollary 3.2 are certainly 'generalized,' because k - j > n - pj if j > 2. Thus if e + 1 ~ j ~< k, then (~--ro = o, so long as n - pj >/0, i.e., j ~ In~p]. Because n/p > (n - k)/(p - 1), it follows that In~p] <~ [(n - k)/(p - 1)], so that we m a y improve Corollary 3.2, at least formally, by replacing (~ by m = In~p]. Thus we obtain our final corollary.
1, 2)
/
-,,J
l=e+l
\ " ~ (5, 4)
+
r
-3
Corollary 3.3. The number of p-good paths from (1, q - 1) to (k, n - k), under the conditions (3.1), is
6
= 1(15) + 3(1) + 12(0) + 55(-3) + 273(1) i.e., 126 = 18 + 108
1=m+1
J '
We give an example (see Figure 12). THE MATHEMATICALINTELLIGENCERVOL 13, NO 2, 1991 73
Example 3.1. Let q = 0, k -- 5, p = 3, n = 9. The inequalities (3.1) arc certainly satisfied and m = 3. Thus the number of 3-good paths from (1, - 1) to (5,4) is
Now dj is the jth (generalized) Catalan number for 3; thus, by Corollary 2.4, d4 = 88 = 55 and ds = ~(i~) = 273. Thus the number of 3-good paths is 108. Notice that the total number of paths is (9) = 126. Notice also that, in this case, e = [(n - k)/(p - 1)] = 2, so that there is an advantage in replacing e by m. Circumstances will determine whether it is more convenient to use Theorem 3.1 or Corollary 3.3. In the example above there is little to choose. However, whereas in Theorem 3.1 each term has an obvious combinatorial meaning, it is difficult to assign a combinatorial meaning to the terms in Corollary 3.3; the binomial coefficient (~2r0, j I> m + 1, does not count the paths from Q1 to (k, n - k), because there are none - - m i g h t it be said to count 'phantom paths'? Example 3.1 is really "generic" and thus merits closer study. Theorem 2.2 tells us, in this case, that 5
(9-
The expression of the right of (3.2) breaks up into
33/ N o w the binomial coefficient (9) of the left of (3.2) counts the number of paths from ( 1 , - 1) to (5,4); the first block in (3.3) counts the bad paths from (1, - 1) to (5,4), partitioning them into those that first meet the danger line y = 2x at (1,2) and those that first meet the danger line at (2,4); the second block in (3.3) makes a zero contribution; and the third block in (3.3) thus counts the good paths from ( 1 , - 1) to (5,4). In some mysterious sense the third block--which intrigues us e n o r m o u s l y - - i s also partitioned by the points (4,8) and (5,10) on the line y = 2x. But this is only a notational partitioning, because we have 108 represented as (55)(-3) + (273)(1). Here the coefficients d,(i = 4,5) have an obvious combinatorial significance, but the binomial coefficients, (]3) and (56), do not. They result from applying the formula (~_-p') for the number of paths from (i, (p - 1)i) to (k, n - k) outside its domain of validity 1 ~ i ~ 2. There is here much food for thought!
him in connection with his solution of the problem of dissecting a polygon by means of non-intersecting diagonals into triangles; thus his definition of the kth Catalan number is our Ck. In fact, the problem of enumerating such dissections had already been solved by Segner in the eighteenth century, but not in convenient or perspicuous form. Euler (see the bibliography) had immediately obtained a simpler solution involving generating f u n c t i o n s - - w e have sketched this approach in Section 1. Catalan's own solution, achieved almost simultaneously with that of Binet a r o u n d 1838, was even simpler, not requiring the theory of generating functions, which was regarded at the time as rather 'delicate' in view of its (apparent) dependence on the notion of convergence of series. A h o s t of other interpretations of the Catalan numbers have been given, largely inspired by work in combinatorics and graph theory, and have been collected by Gould (see the bibliography). The interpretations have led to generalizations, of which the most obvious and important, treated in this article, have been studied by probabilists (e.g., Feller) and others. The ballot problem itself leads to an evident generalization, considered by Feller, in which the line y = x is replaced by the line y = (p - 1)x. An interesting generalization of a somewhat different kind is treated by Wenchang Chu (see the bibliography).
References 1. D. Andr6, Solution directe du probl~me r6solu par M. Bertrand, Comptes Rendus Acad. Sc~. Paros 105 (1887), 436-7. 2. A. P. Hillman, G. L. Alexanderson and R. M. Grassl, Discrete and Combinatorial Mathematzcs, San Francisco: Dellen (1987). 3. Peter Hilton and Jean Pedersen, Extending the binomial coefficients to preserve symmetry and pattern, Computers Math. Apphc. 17 (1-3) (1989), 89-102. 4. Peter Hilton and Jean Pedersen, The ballot problem and Catalan numbers, Nzeuw Archief voor Wiskunde, 1990 (to appear). 5. A. Hurwitz and R. Courant, Allgemeine Funktzonentheorze und elliptische Funktionen. Geometrische Funktionentheorie, Berlin: Springer (1922), 128. 6. David A. Klarner, Correspondences between plane trees and binary sequences, J. of Comb. Theory 9 (1970), 401-411. 7. G. P61ya and G. Szego, Aufgaben und Lehrs~tze aus der Analyszs, Vol. I, Berlin, Gottingen, Heidelberg: Springer (1954), 125. 8. N. J. A. Sloane, A Handbook of Integer Sequences, New York and London: Academic Press (1973). 9. Marta Sved, Counting and recounting, Mathematzcal Intelligencer 5, no. 4 (1983), 21-26.
Historical Note
Bibliography
The versatile Belgian mathematician Eugene Charles Catalan (1814-1894) defined the numbers named after
[Items [2], [6] and [8] among the References may also be regarded as general sources on Catalan numbers.]
74
THE MATHEMATICAL INTELLIGENCER VOL 13, NO 2, 1991
R. Alter, Some remarks and results on Catalan numbers, Proc. 2nd Louisiana Conf. on Comb., Graph Theory and Comp. (1971), 109-132. D. Andre, Solution directe du probl~me r~solu par M. Bert-rand, Comptes Rendus Acad. Sc~. Parrs 105 (1887), 436-7. W. G. Brown, Historical note on a recurrent combinatorial problem, Amer. Math Monthly 72 (1965), 973-977. Wenchang Chu, A new combinatorial interpretation for generalized Catalan numbers, DiscreteMath. 65 (1987), 91-94. K. L. Chung and W. Feller, On fluctuations in coin-tossing, Proc. Nat. Acad. Sc~., U.S.A. 35 (1949), 605-608. John H. Conway and Richard K. Guy, The Book of Numbers, Sci. Amer. Library, W. H. Freeman (to appear 1990). I. Z. Chomeyko and S. G. Mohanty, On the enumeration of certain sets of planted trees, J. Combm. Theory Ser. B18 (1975), 209-21. T. T. Cong and M. Sato, One-dimensional random walk with unequal step lengths restricted by an absorbing barrier, DzscreteMath. 40 (1982), 153-162. R. Donoghey and L. W. Shapiro, Motzkin numbers, J. Combin. Theory Ser. A23 (1977), 291-301. N. Dershowitz and S. Zaks, Enumeration of ordered trees, Discrete Math. 31 (1980), 9-28. L. Euler, Novz Commentarii Academiae Sczentarzum Imperialis Petropohtanque 7 (1758/9), 13-14. Roger B. Eggleton and Richard K. Guy, Catalan strikes again! How likely is a function to be convex?, Math. Mag. 61 (1988), 211-219. A. Erd~lyi and I. M. H. Etherington, Some problems of nonassociative combinatorics, Edinburgh Math. Notes. 32 (1941), 7-12. I. J. Good, Generalizattons to several variables of Lagrange's expansion, Proc. Camb. Phil. Soc. 56 (1960), 367-380. Henry W. Gould, Bell and Catalan Numbers, Research bibliography of two special number sequences, available from the author (Department of Mathematics, West Virginia University, Morgantown, WV 26506). The 1979 edition sells for $3.00 and contains over 500 references pertaining to Catalan numbers. Henry W. Gould, Combinatorial Identities, available from the author (Department of Mathematics, West Virginia University, Morgantown, WV 26506). Richard K. Guy, Dissecting a polygon into triangles, Bulletin of the Malayan Mathematical Society 5 (45) (1958), 57-60. Richard K. Guy, A medley of Malayan mathematical memories and mantissae, Math. Medley (Singapore), 12, no. 1 (1984), 9-17. V. E. Hoggatt, Jr., and Marjorie Bicknell, Pascal, Catalan, and general sequence convolution arrays in a matrix, Fibonaccz Quarterly 14, no. 2, 135-143. V. E. Hoggatt, Jr., and Marjorie Bicknell, Sequences of matrix inverses from Pascal, Catalan, and related convolution arrays, Fibonaccz Quarterly 14, no. 3, 224-232. V. E. Hoggatt, Jr., and Marjorie Blcknell, Catalan and related sequences arising from inverses of Pascal's triangle matrices, Fibonacci Quarterly (December 1976), 395-405. M. S. Klamkin, Problem 4983, Amer. Math. Monthly 69 (1962), 930-931. Mike Kuchinski, Catalan structures and Correspondences, M.Sc. thesis, Department of Mathematics, West Virginia University, Morgantown, WV 26506. Th. Motzkin, Relations between hypersurface cross ratios, and a combinatorial formula for partitions of a polygon, and for non-associative products, Bull. Amer. Math. Soc. 54 (1948), 352-360. G. P61ya, On picture-writing, Amer. Math. Monthly 63 (1956), 689 697. G. N. Raney, Functional composition patterns and power
series inversion, Trans. Amer. Math. Soc. 94 (1960), 441-451. D. G. Rogers, Pascal triangles, Catalan numbers and renewal arrays, Discrete Math. 22 (1978), 301-310. A.D. Sands, On generalized Catalan numbers, Dzscrete Math. 21 (1978), 218-221. Memoirs by Eugene-Charles Catalan relevant to the theme of Catalan numbers may be found in Journal de Math&natlques pures et appliqu~es de Liouville, (1), III, 508-816; W, 91-94, 95-99; VI, 74, 1838-41. For details of Catalan's life and work see "Les travaux math~matiques de Eugene-Charles Catalan," Annuaire de L'Acaddmie Royale des Sczences, des Lettres et des Beaux-Arts de Belgique, Brussels (1896), 115-172.
Peter H~lton Jean Pedersen Department of Mathematical Department of Mathematzcs Sciences Santa Clara University State University of New York Santa Clara, CA 95053 USA Binghamton, NY 13901 USA and Abteilung fi~r Mathematik E.T.H., 8092 Ziirich, Switzerland
-
THE MATHEMATICAL INTELLIGENCER VOL 13, N O 2, 1991 7 5
Chandler Davis*
Mathematics Tomorrow edited by Lynn Arthur Steen New York: Springer-Veflag, 1981, vi + 250 pp.
Math4matiques/i V e n i r edited by Karine Chemla and Ivar Ekeland Paris: Gauthier-Villars, 1987, 400 pp.
Reviewed by Martin Zerner And one of the games to which [the human race] is most attached is called "keep tomorrow dark," and which is also named (by the rustics in Shropshire, I have no doubt) "Cheat the Prophet." The players listen very carefully and respectfully to all that the clever men have to say about what is to happen in the next generation. The players then wait until all the clever men are dead, and bury them nicely. They then go and do something else. That is all. For a race of simple tastes, however, it is great fun. G. K. Chesterton (The Napoleon of Notting Hill) First of all: will there be mathematics tomorrow? If the greenhouse effect, holes in the ozone layer, or any other of the hazards generated by a socially uncontrolled technology leaves no human being to study mathematics, then mathematics is out. Next: has the mathematician of today a responsibility in that respect? The answer of Neal Koblitz in his contribution to Mathematics Tomorrow and of Robertson and Hogg in theirs is "Yes, we have a responsibility due to the need of giving students statistical literacy and helping them to develop an ability to discuss a sdentific--or pseudosdentific--argument in matters such as pollution, education and discrimination (the list could be made much longer!).'" Well, let us assume that there will be mathematics tomorrow. Of course, we can't peek at the answers and say what it will be. So when people write about
* C o l u m n editor's address: Mathematics Department, Uraverslty of Toronto, Toronto, Ontario M5S 1A1 Canada
the mathematics of the future, they actually say something either about what they would like the mathematics of today to be (subject A) or about what they would like people to believe the mathematics of today is (subject B). We have subject A for instance when Steen writes in his very helpful introduction to Mathematics Tomorrow, "As research in core mathematics reaches unpreced e n t e d heights of p o w e r a n d sophistication, the growth of diverse applied specialties threatens to fragment mathematics into distinct and frequently hostile mathematical sciences." Generally speaking, the book deals with subject A, many contributors going on to evaluate the chances that the mathematics of tomorrow will be what the authors would like and even suggesting steps to make these chances better; the steps suggested by various authors are in sharp contradiction. Mathematics Tomorrow was published in 1981 and its tomorrow was the eighties, which are over by now. Still we cannot play "Cheat the Prophet" with it for two reasons, each of them sufficient. For one thing, the opinions expressed in it are too different, no doubt some will be true and some wrong. The other reason is that we do not see clearly what changed in mathematics in the eighties; m y view is that the latest turning point was in the seventies (emphasis on specific problems and "experiment," de-emphasizing of rigour, rehabilitation of a p p l i e d mathematics), a turning point which is everywhere apparent in the book. Math&natiques ~ Venir deals with subject B. The introductory contribution of M61a is crystal clear: the aim of the book is to advertise French Mathematics to the French public (the capital M is more conspicuous in French, as "fran~aises" is written with a lower case f). The book as such is a poor advertising device, as it will be read by practically nobody except mathematicians, but the colloquium it sums up was a big affair with ample press coverage. There are nearly a hundred contributors (not all mentioned in the table of con-
76 THE MATHEMATICALINTELLIGENCERVOL 13, NO 2 9 1991 Spnnger-VerlagNew York
"'Business needs mathematics. Mathematics needs business." From a brochure of the D~partement de Math~matiques, Universit~ de Nice.
tents). In spite of their huge number, they are much more homogeneous in their opinions than those of Mathematics Tomorrow. The abundance of contributors is easy to understand: It was a prestigious show and everybody wanted to appear. The rules of advertisement may account for the homogeneity of opinions, but I fear it also reflects a current lack of debate among French mathematicians on issues of importance to the profession. A comparison of the Gazette des Mathd~naticiens (the internal bulletin of the Soci~t~ Math~matique de France) and the Notices of the American Mathematical Society also shows this lack of debate. Both these books depart strikingly from earlier ones dealing with general questions about mathematics in that most authors now acknowledge the importance of something they call the real world. Utter confusion arises when one tries to understand what they mean by the real world, or, to put it more mildly, what they consider important for the development of mathematics in this real world. Here is an unclassified and incomplete list of items: applications; technology; business; government; physics; computers; the social, behavioral, managerial, decision, and life sciences; the educational establishment; the general society; secondary schools; employers; women (considered as students and potential mathematicians); mathematicians" working conditions. Clearly, a comprehensive study of today's trends in mathematics would have to take all these into account and see how they relate to one another, along with a few others that are missing (most notably the military). The trouble arises because each author has a favorite important item, neglecting the others; a few pay lip service to the items they neglect, which makes the confusion worse. We may try to clarify things a little by first looking at the attitudes toward applications and then seeing what sort of a real world these attitudes relate to.
Contributors who write about applications belong to one of three types: the purist, the utilitarian, and the worshipper of the two-faced god Janus. Halmos is well known as an exponent of purism and is joined in Mathematics Tomorrow by the nostalgic King. To the utilitarian, usefulness is not only the aim and the justification of mathematics, but also the only source of its vitality and of its renewal. Lucas is the only exponent of utilitarianism in Mathematics Tomorrow and he is a subtle one; Math~mtiques ~ Venir is dominated by utilitarianism and we should not expect advertising to be subtle. One bad consequence of utilitarianism is that it enables some people to publish articles of pure mathematics that would not be accepted in journals, however indiscriminating, were it not for a fictitious relevance to some application. Now comes a disturbing question: Did utilitarianism gain the upper hand between 1981 when Mathematics Tomorrow was published and 1988 when Math~aatiques r Venir was, or does the difference come from some other reason? In any case, the advent of utilitarianism in French mathematics was exposed as early as 1976 by C. Chevalley [1] (the article is, in the words of the editors of the journal, "anonymous but not clandestine"). To the worshipper of Janus, mathematics is both beautiful and useful; it cannot lose its beauty without lessening its efficiency, and in the long run, losing it. Poston is the strongest exponent of this attitude. Spanier also appears to me as a worshipper of Janus. Of course, his concern is the training of "'applied mathematicians" (a strange phrase, by the way; as Poston puts it, "who applies the mathematician?"). But he writes, "The successful applied mathematician should know as much mathematics as possible.'" Both he and Tucker offer precious indications about what a Janusian teaching of mathematics can be (my apology for the neologism). Utilitarians and worshippers of Janus are bound together by the ideology of the model. I know that this phrase will surprise most readers; I should give more arguments for it than can be done within the framework of this article. Let me at least try to clarify it. First, I am fully in favor of teaching what is wrongly called mathematical modelling. Indeed I was quite excited by the contribution of Spanier in Mathematics Tomorrow. But let us see what is wrong with his definition of a model (he is one of the very few authors who gives a clear one). "The applied mathematician's first task is to create a mathematical i m a g e - - o r m o d e l - - o f the physical process which incorporates realistic assumptions and constraints. From this model, mathematical evidence is gathered which must then be compared with physical evidence typically gained through experiments or observation. This comparison determines the worth of the mathematical model." First objection: Spanier restricts the discussion to physical processes, whereas the notion of model is widely used in all sorts of fields. This is natural, as the examples he THE MATHEMATICAL INTELLIGENCER VOL 13, N O 2, 1991 7 7
gives and he actually works on are physical: oil production, the swelling of nuclear fuel elements, diffusion of groundwater nitrates. The second objection is that the equations that describe a physical process are by no means a mere image, they always incorporate theoretical elements. Hence the last sentence quoted is not true: Spanier would not accept a model that violates, say, the conservation of mass, even if it fitted the data. Well, I know: in actual practice such a model would not be proposed to begin with. But this objection takes its importance when combined with the first. Quite different is the final contribution, Thompson's "Mathematization in the sciences." Whereas Spanier speaks about what is called a model in the concrete problems that constitute the work of what is called an applied mathematician, Thompson is interested in the evolution of the various sciences in the long run; he speaks about much more general objects, such as the wave and particle models for light. But there is one thing he shares with Spanier: he downplays theory. He goes so far as to say nothing about the formation of concepts. This error b e c o m e s two-sided. O n the side of physics, scientificity seems to be acquired by the mere work of quantification and mathematization, without a critical discussion of basic concepts. On the side of the social "sciences," they can then pretend they have the same scientificity as physics (admittedly in an earlier stage). This is especially apparent as Thompson gives the example of Gossen's law in economics ("the marginal utility of a good decreases as more of the good is consumed"). The notion of utility is still used, but calls for (and has received) much criticism. 1 So we this double operation: first shrinking the meaning of mathematization in physics to a simple translation into mathematical language of relations b e t w e e n data, secondly establishing a false similarity between physics and the social sciences. This widely promulgated d e c e p t i o n tries to give a scientific aura to opinions on economic, social, and political issues that are at the very best questionable. Among scholars, this happens in a sophisticated context. In Mathematics Tomorrow, Neal Koblitz describes a quite typical reaction of historians to a critique of Huntington's pseudoequations of political and social change. 2 For the general public, this leads to the acceptance of Ehrlich's model, as it is described by Neal Koblitz, again in Mathematics Tomorrow: s e e
1 This is only half controversial, as some c n h a s m has come from neo-classical economists a n d more from their o p p o n e n t s , for mformed readers, I point out that the law, as formulated, speaks of cardinal utility; see in S c h u m p e t e r [5] the note on utility. 2 See the debate in the Mathematzcal Intelhgencer 10 (1988) and 11 (1989), passim. 78 THE MATHEMATICALINTELLIGENCERVOL 13, NO 2, 1991
The occasion was an interview of Paul Ehrhch, author of The Populatzon Bomb and popularizer of population control as a solution to the world's problems. At that time the ecology movement had just started to capture the attention of the public, and Mr. Ehrlich was arguing that the solution, as always, was in population control. He took a piece of posterboard and wrote in large letters for the TV audience: 9
.
.
D=N• 'In this equation,' he explained, 'D stands for the damage to environment, N stands for the number of people, and I stands for the impact of each person on the environment. This equation shows that the more people, the more pollution. We cannot control pollution without controlling the number of people.' As Koblitz points out, "Is Ehrlich saying that the T for the president of Hooker Chemicals (of Love Canal notoriety) is the same as the 'I' for you and me? Preposterous, isn't it? But what if the viewer is too intimidated by a mathematical equation to apply some common sense?"
One bad consequence of utilitarianism is that it enables some people to publish articles of pure mathematics that would not be accepted in journals, however indiscriminating, were it not for a fictitious relevance to some application. The fact that intellectuals have so well accepted the notion of model is one of the reasons w h y they do not give such fakery the strong public criticism it deserves. A sometimes unmentioned component of the real world is money. Money is important for what it can buy and for the prestige that goes with it. In France, things have changed in recent years. Traditionally, one was not supposed to show one's money; people in the upper middle class would boast of the success of their corporation, or of their race horses and other purchases. N o w they talk about their income. This change is not good for mathematics: how can one take seriously people who spend so much of their time preparing underpaid teachers? But money has grown still more important in another sense9 Although profit in a capitalist society is no doubt still made by having factories where workers produce goods with a larger value than the raw material plus salary, it is more and more harvested on the financial market, reinforcing the illusion that money creates more money, that it creates wealth by some marvellous property of its own. This bears on the social demand for mathematics, which formerly required people trained in technologies based on physics, whereas now, as we all know, the emphasis is much more on financial mathematics, the mathematics of decision (whatever that means), and suchlike.
So we have found one ground for the threat that diverse applied specialities will split mathematics into frequently hostile fragments. Of course, I do not claim it is the only reason. What about the much-spoken-of "revolution brought about in mathematics by the computer"? To form a clear idea of what it consists of, we would have to look at the history of its development. After the first applications, most of them military, the driving force was provided by its use for the managem e n t of large corporations, until microcomputers made it available to small enterprises. N o w that other applications of the computer (computer-assisted design and production, robotics, etc.) are on their way, shall we expect an impact on different branches of mathematics? Which ones: geometry? traditional applied mathematics? Is this not already happening? I have already said that the military is a component of the real world that we must not ignore. We know that it has been providing a little over half the federal money for mathematical research in the United States since 1982 (see [2]). It must have been already pretty high two years earlier, when Mathematics Tomorrow was written. While I was writing this article, I received the yearly report of m y own department and found that the contracts with the French army and NATO brought in as much m o n e y as we receive from our normal funding institutions, namely the Ministry of Education and the CNRS. I wish I could give a breakdown, but I doubt that it is available. I see my ignorance as one more testimony to the lack of debate among French mathematicians, especially ironic as this was formerly a subject of bitter controversy (see Godement's contribution [4]). The job market, the money market, the market for computers, the military. On a global level they orient our work, whether we like it or not and whether we know'it or not. They push us into a mathematics that is quickly done, quickly published, quickly taught, and quickly forgotten. Is there no choice other than dancing to their tune or confining ourselves to a supposedly pure research and an abstract teaching?
We would not know whether the results of our research would stay pure, and very likely they would become less and less communicable. As for teaching about our research, its abstractness would make it unintelligible except to a small minority of students, while the majority would turn to computer science, mathematics for management, etc. That would mean shunning the responsibility I spoke about at the beginning, namely the responsibility of equipping as many people as we can with the ability to understand and criticize a supposedly scientific argument. Of course, this is mainly to be done in the training of teachers and a large part of it is to be done in the teaching of statistics. But some ways of teaching statistics are worse than none. Let me take an example. In a commonly used French textbook for (roughly) grade 11, I found the following exercise. A table is given with the number of farms in Tunisia according to their areas (I did not check, but I assume the figures are the real outcome of a census); the pupils are asked to compute the average area of Tunisian farms. N o w a sensible a n s w e r w o u l d be something like this: "I won't compute a meaningless average. These figures do not distinguish between very highly productive olive groves and dry fields of grain or even dry pastures. Moreover, there has been a land reform in Tunisia, and we are not told whether the data mix individual and cooperative farms.'" Of course, a math teacher cannot tackle this problem alone, but must work with a colleague in geography or the social sciences, which makes things more complicated. But we have tools if we look for them. (Huff's How to Lie with Statistics [3] is unfortunately out of print and needs updating. Still it remains one of m y favorites.) Will we be able to train in this critical way the thousands of math teachers and the future experts in various fields that our countries need? Under the present circumstances, it is clear that we will not, at least for the foreseeable future. But if we can do it for a significant number of them, I am sure our efforts will not be lost.
References 1. Anonymous, Fini de folhtrer, une explication de texte, Impascience 4/5 (1976), 17-20. 2. Browder, W., et al., Forum on military funding of mathematics, Mathematical Intelligencer 9, no. 4 (1987), 52-62. 3. Huff, D., How to Lie with Statistics, New York: Norton (1982). 4. Godemont, R., Math~maticiens (purs) ou putains (respectueuses)?, [Auto] critutue de la science (A. Jaubert and J.-M. L~vy-Leblond, eds.), Paris: Le Seuil (1973). 5. Schumpeter, J., History of Economic Analysis (E. B. Schumpeter, ed.), London: Allen and Unwin (1954). Laboratoire de MathOnatiques Universit~ de Nice 06034 Nice Cedex, France THE MATHEMATICAL INTELLIGENCER VOL 13, N O 2, 1991 7 9
Robin Wilson*
Nineteenth-Century Analysis Bernard Bolzano (1781-1848) and Augustin-Louis Cauchy (1789-1857) were both instrumental in formalizing the idea of a continuous function, thereby laying the groundwork for a rigorous approach to real analysis. Bolzano's primary concern was to prove the Intermediate Value Theorem for continuous functions, as a step to rigorizing Gauss's proof of the Fundamental Theorem of Algebra (every polynomial with complex coefficients has a complex root). But Bolzano, living in Prague, was isolated from the mainstream of European mathematics, and his contributions made little impact. Cauchy, on the other hand, lived in Paris and became a highly influential figure. His formalizations of the concepts of limit, continuity, and derivative transformed the area of real analysis, while he (almost single-handedly) developed the subject of complex analysis. He was also, along with Galois, one of the founders of the theory of groups. The Bolzano stamp was issued by Czechoslovakia in 1981 to commemorate the 200th anniversary of his birth. The Cauchy stamp was issued by France in 1989 for the same reason.
The 1990 International Congress of Mathematicians The 1990 Congress was held in Kyoto, Japan, from 21-29 August, with about 4000 mathematicians attending. As with the Congress in Moscow (1966), Helsinki (1978), and Warsaw (1983) [see Stamp Corner, vol. 12, no. 3], a special stamp, pictured at left, was issued to commemorate the event.
* C o l u m n editor's address: Faculty of Mathematics, T h e O p e n University, Milton K e y n e s M K 7 6AA E n g l a n d 80 THE MATHEMATICALINTELLIGENCERVOL 13, NO 2 9 1991 Spnnger-VeflagNew York