PHYSICS AND THEORETICAL COMPUTER SCIENCE
NATO Security through Science Series

This Series presents the results of scientific meetings supported under the NATO Programme for Security through Science (STS). Meetings supported by the NATO STS Programme are in security-related priority areas of Defence Against Terrorism or Countering Other Threats to Security. The types of meeting supported are generally "Advanced Study Institutes" and "Advanced Research Workshops". The NATO STS Series collects together the results of these meetings. The meetings are co-organized by scientists from NATO countries and scientists from NATO's "Partner" or "Mediterranean Dialogue" countries. The observations and recommendations made at the meetings, as well as the contents of the volumes in the Series, reflect those of participants and contributors only; they should not necessarily be regarded as reflecting NATO views or policy.

Advanced Study Institutes (ASI) are high-level tutorial courses to convey the latest developments in a subject to an advanced-level audience. Advanced Research Workshops (ARW) are expert meetings where an intense but informal exchange of views at the frontiers of a subject aims at identifying directions for future action.

Following a transformation of the programme in 2004 the Series has been re-named and reorganised. Recent volumes on topics not related to security, which result from meetings supported under the programme earlier, may be found in the NATO Science Series. The Series is published by IOS Press, Amsterdam, and Springer Science and Business Media, Dordrecht, in conjunction with the NATO Public Diplomacy Division.

Sub-Series:
A. Chemistry and Biology – Springer Science and Business Media
B. Physics and Biophysics – Springer Science and Business Media
C. Environmental Security – Springer Science and Business Media
D. Information and Communication Security – IOS Press
E. Human and Societal Dynamics – IOS Press

http://www.nato.int/science
http://www.springeronline.nl
http://www.iospress.nl
Sub-Series D: Information and Communication Security – Vol. 7
ISSN: 1574-5589
Physics and Theoretical Computer Science From Numbers and Languages to (Quantum) Cryptography
Edited by
Jean-Pierre Gazeau APC, Université Paris 7-Denis Diderot, Paris, France
Jaroslav Nešetřil Department of Applied Mathematics and ITI, MFF, Charles University, Prague, Czech Republic
and
Branislav Rovan Department of Computer Science, Comenius University, Bratislava, Slovakia
Amsterdam • Berlin • Oxford • Tokyo • Washington, DC Published in cooperation with NATO Public Diplomacy Division
Proceedings of the NATO Advanced Study Institute on Emerging Computer Security Technologies, Cargèse, Corsica, France, 17–29 October 2005
© 2007 IOS Press. All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without prior written permission from the publisher.

ISBN 978-1-58603-706-2
Library of Congress Control Number: 2006939922

Publisher: IOS Press, Nieuwe Hemweg 6B, 1013 BG Amsterdam, Netherlands; fax: +31 20 687 0019; e-mail: [email protected]

Distributor in the UK and Ireland: Gazelle Books Services Ltd., White Cross Mills, Hightown, Lancaster LA1 4XS, United Kingdom; fax: +44 1524 63232; e-mail: [email protected]

Distributor in the USA and Canada: IOS Press, Inc., 4502 Rachael Manor Drive, Fairfax, VA 22032, USA; fax: +1 703 323 3668; e-mail: [email protected]
LEGAL NOTICE The publisher is not responsible for the use which might be made of the following information. PRINTED IN THE NETHERLANDS
Physics and Theoretical Computer Science, J.-P. Gazeau et al. (Eds.), IOS Press, 2007. © 2007 IOS Press. All rights reserved.
Preface

As a part of the NATO Security Through Science Programme, the goal of the Advanced Study Institute Physics and Computer Science was to reinforce the interface between physical sciences, theoretical computer science, and discrete mathematics. No one can dispute the current importance of applied as well as theoretical Computer Science in the development and the practice of Physical Sciences. Physicists of course use computers in communication as well as in teaching tasks and research: software for symbolic calculus, data processing, programming, modeling and numerical simulations, learning and teaching with the aid of computers... On the other hand, and besides the fundamental role played by mathematics in physics, methods imported from computer science are of increasing importance in theoretical physics: algorithmics, symbolic calculus, non-standard numeration systems, algebraic combinatorics, automata, cryptography... Some of them, like numeration, tilings and their associated dynamical systems, and algebraic combinatorics, have already played an important role in recent developments in physics, like those accompanying the emergence of new materials (e.g. quasicrystals, incommensurate structures), the research around quantum information and cryptography (entanglement), or quantum spin systems and related questions of integrability, and more generally in statistical physics. The intersection of combinatorics and statistical physics has been an area of great activity over the past few years, fertilized by an exchange not only of techniques but of objectives. Spurred by computing theoreticians interested in approximation algorithms, statistical physicists and discrete mathematicians have overcome language problems and found a wealth of common ground in probabilistic and algebraic combinatorics.
Close connections between percolation and random graphs, between graph morphisms and hard-constraint models, and between slow mixing and phase transition have led to new results and new perspectives. These connections can help in understanding typical, as opposed to extremal, behavior of combinatorial phenomena such as graph coloring and homomorphisms. Some of the topics of particular interest are: percolation, random coloring, mixing, homomorphisms from and to a fixed graph, phase transitions, and threshold phenomena.

Hence, this NATO ASI School was aimed at assembling theoretical physicists and specialists of theoretical informatics and discrete mathematics in order to learn more about recent developments in cryptography, algorithmics, symbolic calculus, non-standard numeration systems, algebraic combinatorics, automata... which could reveal themselves to be of crucial interest in the natural sciences. In turn, the School offered specialists in statistical physics, dynamical systems, quantum information and quantum cryptography, or new materials (e.g. quasicrystals, incommensurate structures), the opportunity to describe aspects of their research in which new approaches imported from computer science are particularly needed. Therefore, nearly 70 participants (students + lecturers + organizers), coming from 20 different countries (actually more than 25 nationalities), most of them PhD students or post-doctoral researchers working in various fields, attended courses given by
16 specialists in algorithmics, numeration systems, algebraic combinatorics, automata, languages, cryptography, quantum information, graphs and statistical mechanics. Generally, the lectures were introductory and pedagogical. They perfectly complied with the objective of a real transmission of knowledge between the various communities attending the Institute. During the ten working days of the School, a total of 40 hours was reserved for lectures, and two half days were devoted to short presentations (30 or 45 min), mainly by young researchers and PhD participants. Around 35 participants presented their own research on posters displayed throughout the School. The list of participants is given in the annex of this book.

Three lectures and one concert were organized with the support of the Institut Scientifique de Cargèse:

• Roman OPALKA, Artist, France/Poland: The River of Time,
• Pierre SIMONNET, Université de Corse: Automata and Games,
• Jaroslav NEŠETŘIL and Xavier VIENNOT: Arbres et Formes, Art et Mathématiques,
• Maria COLOMÉ (flute) and Jean-Yves THIBON (piano): Sonate, F. Poulenc; Sonate en si mineur, J.S. Bach.

They were aimed at attracting a wide audience from the Cargèse region. Moreover, the pupils of the Cargèse primary school enjoyed two pedagogical and playful presentations of the combinatorics of trees. During the last auditorium meeting, the participants discussed the important question of the relations, on a pedagogical as well as an institutional level, between physics and computer science in higher education.

This volume is organized along the following rough thematic divisions:

• Physics,
• Chaos and Fractals,
• Quasi-Crystals and Tilings,
• Numeration, Automata, and Languages,
• Algebraic Combinatorics,
• Graphs and Networks.
Acknowledgements

This NATO-ASI "PHYSICS AND COMPUTER SCIENCE" has also been supported by l'ITI-DIMATIA, Charles University, Prague, the Collectivité Territoriale de Corse (Corsica Region), the French Ministry of Foreign Affairs, the University of Marne-La-Vallée and the GDR 673 (CNRS) "Algorithmique, Langage et Programmation".

Jean-Pierre Gazeau, Jaroslav Nešetřil, and Branislav Rovan
Co-directors of the Advanced Study Institute Physics and Computer Science
Lecturers & Participants, NATO-ASI No. 981339
1. ABD AL-KADER Gamal, Al-Azhar University, Cairo, Egypt
2. AKIYAMA Shigeki, Niigata University, Japan
3. AMBROZ Petr, Université Paris 7-Denis Diderot, France
4. ANDRLE Miroslav, Aston University, Birmingham, UK
5. AUDENAERT Koenraad M.R., Imperial College, London, UK
6. BALKOVA Lubomira, Czech Technical University, Prague, Czech Republic
7. BANDT Christoph, Ernst-Moritz-Arndt-Universität, Greifswald, Germany
8. BERNAT Julien, Université de la Méditerranée, Marseille, France
9. BERTI Stefano, Università di Torino, Italy
10. BISTAGNINO Andrea, Università di Torino, Italy
11. BUCHA Viktar, National Academy of Sciences of Belarus, Minsk, Belarus
12. CARAMETE Laurentiu-Ioan, Institute of Space Science, Bucharest-Magurele, Romania
13. CHI Dong Pyo, Seoul National University, Korea
14. COLOMÉ Tatché Maria, Université Paris-Sud, Orsay, France
15. COMELLAS Francesc, Universitat Politècnica de Catalunya, Barcelona, Spain
16. DALFÓ Cristina, Universitat Politècnica de Catalunya, Barcelona, Spain
17. DE MIER Anna, University of Oxford, UK
18. DE SOUZA Rodrigo, École Nationale Supérieure des Télécommunications, Paris, France
19. DMITROCHENKO Oleg, Bryansk State Technical University, Russia
20. DONCHENKO Lyudmyla, Donetsk State Medicine University, Ukraine
21. FROUGNY Christiane, Université Paris 7-Denis Diderot and Université Paris 8, France
22. GADJIEV Bahruz, SAM International University of Nature, Society and Man, Dubna, Russia
23. GAJJAR Pankajkumar, Sardar Patel University, Gujarat, India
24. GARCIA DE LEON Pedro, Université de Marne-la-Vallée, France
25. GARNERONE Silvano, Quantum Computation ISI Foundation, Torino, Italy
26. GAZEAU Jean-Pierre, Université Paris 7-Denis Diderot, France
27. GOLINSKI Tomasz, University of Bialystok, Poland
28. HADAR Ofer, Ben-Gurion University of the Negev, Beer-Sheva, Israel
29. HIVERT Florent, Université de Rouen, France
30. HONNOUVO Gilbert, Concordia University, Montreal, Canada
31. JUN Jin Woo, Inje University, Kimhae, Republic of Korea
32. JUSHENKO Ekaterina, Taras Shevchenko University, Kyiv, Ukraine
33. KAROL Andrei, Joint Institute for Nuclear Research, Dubna, Russia
34. KHACHATRYAN Suren, American University of Armenia, Yerevan, Armenia
35. KITLAS Agnieszka, University of Bialystok, Poland
36. KLYACHKO Alexander, Bilkent University, Ankara, Turkey
37. KOSINAR Peter, Comenius University, Bratislava, Slovakia
38. KOTECKY Roman, Charles University, Prague, Czech Republic & University of Warwick, UK
39. KWON DoYong, Korea Institute for Advanced Study, Seoul, Republic of Korea
40. LEFRANC Marc, Université des Sciences et Technologies de Lille, France
41. LOEBL Martin, Charles University, Prague, Czech Republic
42. LUQUE Jean-Gabriel, Université de Marne-La-Vallée, France
43. MAKHLOUF Amar, University of Annaba, Algeria
44. METENS Stéphane, Université Paris 7-Denis Diderot, France
45. MOFFATT Iain, Charles University, Prague, Czech Republic
46. NESETRIL Jaroslav, Charles University, Prague, Czech Republic
47. NOUVEL Bertrand, École Normale Supérieure, Lyon, France
48. OLSHEVSKIY Alexander, Bryansk State Technical University, Russia
49. OPALKA Roman, Bazérac Thézac, France
50. ORLEANDEA Marius-Ciprian, National Institute of Physics and Nuclear Engineering, Bucharest, Romania
51. PELANTOVA Edita, Czech Technical University, Prague, Czech Republic
52. POLISENSKA Hana, Charles University, Prague, Czech Republic
53. POPESCU Bogdan, National Institute of Materials Physics, Bucharest, Romania
54. POPOVYCH Stanislav, Taras Shevchenko University, Kyiv, Ukraine
55. PROGULOVA Tatyana, SAM International University of Nature, Society and Man, Dubna, Russia
56. ROVAN Branislav, Comenius University, Bratislava, Slovakia
57. SAKAROVITCH Jacques, École Nationale Supérieure des Télécommunications, Paris, France
58. SARGSYAN Lusine, American University of Armenia, Yerevan, Armenia
59. SHISHKOVA Natalya, Ukrainian National Academy of Sciences, Donetsk, Ukraine
60. SHMILOVICI LEIB Armin, Ben-Gurion University, Beer Sheva, Israel
61. SIMONNET Pierre, Université de Corse, Corte, France
62. SMOLINSKI Kordian, University of Lodz, Poland
63. STEINER Wolfgang, Université Paris 7-Denis Diderot, France
64. SYCH Denis, M.V. Lomonosov Moscow State University, Russia
65. THIBON Jean-Yves, Université de Marne-La-Vallée, France
66. TRALLE Igor, University of Rzeszow, Poland
67. VIENNOT Xavier, Université Bordeaux 1, France
68. YAZYKOV Vladislav, Bryansk State Technical University, Russia
69. ZAPATRIN Romàn, The State Russian Museum, St. Petersburg, Russia
Contents

Preface (Jean-Pierre Gazeau, Jaroslav Nešetřil and Branislav Rovan) — v
Lecturers & Participants — vii
Mathematical Aspects of Quantum Information Theory (Koenraad M.R. Audenaert) — 3
Dynamical Symmetry Approach to Entanglement (Alexander Klyachko) — 25
Mathematics of Phase Transitions (Roman Kotecký) — 55
The Topology of Deterministic Chaos: Stretching, Squeezing and Linking (Marc Lefranc) — 71
Random Fractals (Christoph Bandt) — 91
Quasicrystals: Algebraic, Combinatorial and Geometrical Aspects (Edita Pelantová and Zuzana Masáková) — 113
Pisot Number System and Its Dual Tiling (Shigeki Akiyama) — 133
Non-Standard Number Representation: Computer Arithmetic, Beta-Numeration and Quasicrystals (Christiane Frougny) — 155
An Introduction to the Theory of Finite Transducers (Jacques Sakarovitch) — 171
Generating Languages (Branislav Rovan) — 189
Basic Enumerative Combinatorics (Xavier Gérard Viennot) — 211
An Introduction to Noncommutative Symmetric Functions (Jean-Yves Thibon) — 231
An Introduction to Combinatorial Hopf Algebras: Examples and Realizations (Florent Hivert) — 253
Complex Networks: Deterministic Models (Francesc Comellas) — 275
Homomorphisms of Structures (Concepts and Highlights) (Jaroslav Nešetřil) — 295
Some Discrete Tools in Statistical Physics (Martin Loebl) — 317
Author Index — 333
Trees and Forms: Natural Trees, Virtual Trees, Science or/and Art?
Mathematical Aspects of Quantum Information Theory

Koenraad M.R. Audenaert
Imperial College London

Abstract. In this chapter we give a brief, self-contained introduction to quantum information theory, which is the theory of information processing using systems that obey the laws of quantum mechanics. These processing systems exploit the purely quantum effect of entanglement as a novel resource, and we give a short overview of entanglement and its characterisations. The stress here is on the mathematical framework, rather than on the physical or engineering aspects.

Keywords. Quantum Information Theory, Entanglement Measures, Additivity Problems
1. Introduction

Quantum Information Theory (QIT) is a recently emerged field within quantum mechanics, serving as the theoretical basis for the exploitation of the laws of quantum physics in communication and computation. In this chapter, we give the briefest of introductions to QIT. It is quite obvious that we cannot give a complete overview of QIT here, as doing so would require a complete volume in its own right. We have chosen to put a lot of emphasis on the mathematical framework underpinning QIT, and much less on the physics. We have tried to make this chapter as self-contained as possible, and understandable to readers who have only a modest knowledge of mathematics and quantum mechanics. Quoting [1], "Quantum information theory is not merely a theoretical basis for physics of information and computation. It is also a source of challenging mathematical problems, often having elementary formulation but still resisting solution." In that spirit we have tried to supply the reader with a number of open "challenging mathematical problems" along the way. In our view, one of the most alluring traits of quantum information is exactly that so many of these problems arise even in very simple settings.

(1) Correspondence to: Koenraad M.R. Audenaert, Imperial College London, The Blackett Lab–QOLS and Institute for Mathematical Sciences, Prince Consort Road, London SW7 2BW, United Kingdom. E-mail: [email protected]

Quantum Mechanics (QM) is the physical theory of the micro-world of particles. The two most basic notions in QM are states and operations, which we explain below, and its most distinctive features are the linearity of the operations and the positivity of the mathematical descriptions of both states and operations. Quantum Information Processing (QIP) is about exploiting quantum mechanical features in all facets of information processing (data communication, computing). The states
K.M.R. Audenaert / Mathematical Aspects of Quantum Information Theory
act as information carriers, while the communication channels are the quantum operations. Quantum Information Theory is the underlying theory, an interdisciplinary field combining Information Theory with Quantum Mechanics.

QIT typically considers systems containing multiple particles, potentially widely separated. One of the most amazing features of these systems is the phenomenon of entanglement. Assigning a Hilbert space to every particle in the system, the state of the entire system lives on the tensor product space of the single-particle spaces. However, quantum mechanics allows for states that are not themselves tensor products of single-particle states. Systems in such states are not completely described by the states of their constituents, because of correlations that exist between the particles. Roughly speaking, these correlations fall into two categories: classical and quantum correlations. States exhibiting the latter are called entangled states, and they have a number of surprising characteristics. These characteristics are, as it turns out, rather useful, so much so that in QIP we can exploit entanglement as a novel resource for information processing. Necessarily, we need to find ways to fully characterise entanglement, and this is an important topic of study in QIT.

Many questions that are being raised in QIT are very hard to answer. The reason for this is that, because of the tensor structure of the underlying Hilbert space, the states of multi-particle systems are higher-order tensors. Tensors are more complicated objects than ordinary matrices, and much less is known about them.

In Section 2 we explain the basic concepts of QIT, stressing the underlying mathematics, while only briefly touching upon some of the physical ramifications. The interested reader is kindly referred to the splendid work [37] for an in-depth treatment of all facets of quantum information processing. The lecture notes on QIP by J.
Preskill [41] and the book on Foundations of quantum mechanics by A. Peres [39] are also valuable sources of information and are warmly recommended. In Section 3 we then follow a certain direction that leads us deeper into the land of quantum information. Namely, we give a very short introduction to the topic of entanglement measures, again pointing out some of the open problems. Due to space limitations we restrict ourselves to the bipartite case; moreover, many important entanglement measures had to be left out of the discussion. This filled us with sadness, leaving us to refer readers wishing a more in-depth treatment to a forthcoming review article by Virmani and Plenio [40].
2. Basic Concepts in QIT

2.1. Notations and Basic Mathematical Facts

Let us start this section on the basics of QIT by collecting most of the notations we will be using, and some basic facts from matrix analysis that are used over and over again in QIT. Much of the mathematics used in QIT is matrix analysis, and can be learned from [9,19,20]. The main mathematical objects in QIT are vectors and matrices, both of which form linear spaces. The standard basis elements of these respective spaces are, for d-dimensional vectors, {e_i}_{i=1}^d, where e_i is the vector with i-th entry 1 and all others 0, and, for d × d matrices, {e_ij}_{i,j=1}^d, where e_ij is the matrix with (i, j)-th entry 1 and all others 0.
In QM, state vectors have norm 1, where we are using the vector norm ‖x‖ = √(x, x) = (Σ_i |x_i|²)^(1/2). The identity matrix will be denoted 𝟙. Three basic operations on matrices are the matrix transpose, A^T, the complex conjugate, Ā, and the Hermitian conjugate, A^* = Ā^T. Note, however, that in physics the notation A^† is used for A^*. The trace of a matrix is the sum of its diagonal entries: Tr(A) = Σ_i A_ii.

It is customary in physics to use Dirac notation for vectors. Column vectors x are denoted by the so-called ket symbol, x → |x⟩, and their complex conjugate transposes by the bra symbol, x^* → ⟨x|. The basis vectors in Dirac notation are e_i → |i⟩ and (e_i)^T → ⟨i|. The usefulness of this notation can be seen when considering inner products: (x, Ax) = x^* A x → ⟨x|A|x⟩, and (x, x) = x^* x → ⟨x|x⟩. The shape of this bra-ket (bracket!) pair makes it visually clear that we are dealing with a scalar. Compare this to xx^* = |x⟩⟨x|, which is a matrix (or operator).

Special types of matrices are the Hermitian matrices, which are square matrices obeying A = A^*, and the unitary matrices, which are square matrices obeying U U^* = 𝟙. The column vectors of a unitary matrix form an orthonormal basis. This implies that also U^* U = 𝟙. Hermitian matrices can be unitarily diagonalised, using the eigenvalue decomposition, as A = U Λ U^*, where U is unitary and Λ is real diagonal; the diagonal entries λ_i of Λ are the eigenvalues. The set of λ_i is called the spectrum of A. Not every matrix has an eigenvalue decomposition (e.g. a basic requirement is that it be square), but every matrix has a singular value decomposition: A = U Σ V^*, where U and V are unitary, and Σ is diagonal, with non-negative diagonal entries σ_i, called the singular values. It is customary that the singular values appear on the diagonal in decreasing order. Finally, of all kinds of matrices, the positive-semidefinite (PSD) matrices will occur most often in the setting of QIT.
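These decompositions are easy to experiment with numerically. The following is a small illustrative NumPy sketch (not part of the original text; the particular random matrix is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

# A Hermitian matrix A = A* and its eigenvalue decomposition A = U Lambda U*
A = X + X.conj().T
lam, U = np.linalg.eigh(A)          # lam: real eigenvalues (the spectrum)
assert np.allclose(U @ np.diag(lam) @ U.conj().T, A)
assert np.allclose(U @ U.conj().T, np.eye(3))   # U is unitary

# Every matrix has a singular value decomposition X = U Sigma V*
Us, sigma, Vh = np.linalg.svd(X)
assert np.allclose(Us @ np.diag(sigma) @ Vh, X)
assert np.all(sigma >= 0)                        # non-negative singular values
assert np.all(sigma[:-1] >= sigma[1:])           # in decreasing order
```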
These are Hermitian matrices with non-negative eigenvalues λ_i. The notation used for PSD-ness is A ≥ 0. Of the many characterisations of PSD matrices we mention two: A ≥ 0 if and only if there exists a matrix X such that A = X^* X (if X exists, it is certainly not unique); and A ≥ 0 if and only if ⟨x|A|x⟩ ≥ 0 for all vectors x.

Matrix functions on Hermitian matrices can be defined via their eigenvalue decomposition. Let A = U Λ U^*; then one easily sees for the square that A² = A·A = U Λ U^* U Λ U^* = U Λ² U^*. One can do the same for all other integer powers. Hence, if a function f(x) has a Taylor expansion, one can extend it to the matrix function f(A) = U f(Λ) U^*, where f operates on the diagonal elements of Λ. An example of such a function is the matrix logarithm of a positive definite matrix. This matrix logarithm is used in the definition of the entropy functional of a PSD matrix: S(A) = −Tr(A log A) = −Σ_i λ_i log λ_i.

2.2. Single-Particle Quantum Mechanics

After having set up the notations, we will now begin introducing the quantum-mechanical formalism as it is used in QIT. Let us start with a warning: this formalism looks much different from the standard treatment found in many undergraduate courses and textbooks on quantum mechanics. In the next few pages I will show how the QIT formalism can be built up from "standard" QM. I am assuming that the following will at least look vaguely familiar to many a reader: while the mechanical state of a point-like particle can be classically described
by its position at any time, x(t), quantum-mechanically it is described by a wave function ψ(x, t), which is a complex function of position x and time t, the values of which are called the amplitudes. This wave function satisfies the normalisation condition (ψ, ψ) := ∫ dx |ψ(x, t)|² = 1. The wave function has to be interpreted such that the quantity |ψ(x, t)|² dx is the probability of finding the particle in the interval [x, x + dx] if a measurement of position were made. The QM counterpart to the classical evolution law of Newtonian mechanics is the famous Schrödinger equation

−i ∂ψ/∂t = Ĥψ,
where Ĥ is the Hamiltonian operator (a self-adjoint operator). Finally, the wave function being given, QM allows one to predict the outcomes of experiments in a restricted sense. To any possible experiment there corresponds a so-called observable, which is a Hermitian operator. Because of Heisenberg's quantum uncertainty, the outcomes of experiments are subject to statistical fluctuations, and QM only allows one to predict the average outcome, or expectation value, (ψ, Ôψ).

The first point where QIT already deviates from this picture is that in Quantum Information Processing we use a particle's discrete degrees of freedom, rather than a continuous one such as its position. Examples of discrete degrees of freedom are the polarisation of a photon, the spin of an electron, or the excitation level of an atom (is it in its ground state or in an excited state?). The wave function turns into a wave (or state) vector ψ = (ψ_1, ψ_2, ...)^T, where ψ_i is the amplitude of the particle being in the i-th state. In QIP, we mostly work with 2-level systems ψ = (ψ_1, ψ_2)^T: qubits. The rough idea here is to replace bits by qubits.

The normalisation condition for the state vector is (ψ, ψ) := Σ_i |ψ_i|² = 1. The quantity |ψ_i|² is the probability of finding the particle in its i-th level. The particle undergoes a certain evolution in time, according to Schrödinger's equation, which now reads

−i ∂ψ/∂t = H·ψ.
Here H is the Hamiltonian matrix (which is Hermitian), and '·' is the matrix product. The second point to remark is that in QIP we rarely care about the details of this evolution, nor about the time dependence of ψ, and we don't care much about expectation values either. What we do first is hide the Hamiltonian in a black box. Supposedly, the Hamiltonian is "turned on" for some amount of time Δt, and we only look at what comes out after this time Δt. Without solving Schrödinger's equation, we know that the relation between ψ(t = 0) and ψ(t = Δt) must be of the form ψ(t = Δt) = U ψ(t = 0), where U is a certain unitary matrix, i.e. U U^* = 𝟙, the reason being that Schrödinger's equation is linear and norm-preserving.

The formalism used in QIT is based on John von Neumann's approach to QM, who called the above process unitary evolution [48]. He also identified a second process, called the measurement process. Rather than working with observables Ô, we work with
so-called POVMs (positive operator-valued measures) [39], which are sets of POVM elements {E_i}. Since we work with discrete degrees of freedom, our measurement outcomes typically are also discrete. To outcome i we assign an element E_i, such that its expectation value ψ^* E_i ψ is precisely the probability p_i of getting outcome i in the measurement. This imposes the following requirements on any E_i that purport to form a POVM: E_i ≥ 0 and Σ_i E_i = 𝟙, because probabilities are non-negative and add up to 1. It has to be remarked that the POVM formalism is the modern way of treating the measurement process; J. von Neumann considered orthogonal measurements only, which are such that E_i E_j = 0 for i ≠ j and E_i² = E_i. We will not be concerned with this restriction here.

Depending on the underlying physics, a measurement may or may not destroy the particle. Even if the particle is not destroyed, its state will have been altered by the measurement. Since measurements are repeatable, the probability of obtaining the same outcome when measuring a second time must be 1, no matter what the probability of that outcome was for the first measurement. This can only be if the pre-measurement states differed before the first and before the second measurement. Some researchers call this phenomenon the collapse of the wave function, but it is still a topic of heated debate whether this is a real physical process (whatever "real" means here) [45].

It has to be remarked that the POVM formalism does not allow one to calculate the post-measurement state. In the most general formalism, which does allow one to do so, measurements are described by a set of measurement operators {M_i}. The post-measurement state is then given by ψ → M_i ψ/√p_i. The POVM elements that correspond to these operators are E_i = M_i^* M_i. Note, however, that this general formalism is seldom used in QIT.

Example. A photon has a certain polarisation, expressed by two complex numbers a, b such that |a|² + |b|² = 1.
The polarisation vector is (a, b)^T. Conventionally, a horizontally polarised photon has vector (1, 0)^T, a vertically polarised one has vector (0, 1)^T, and any other linear polarisation, with angle α, is expressed by (cos α, sin α)^T. We get complex coefficients for circular and elliptical polarisation. The POVM elements {e_11, e_22} correspond to the two outcomes "photon is polarised along direction H" and "photon is polarised along direction V". These operators would correspond to two detectors, H and V. A photon in state (a, b)^T has probability |a|² of being "detected" by detector H, and probability |b|² by detector V.

2.3. Density Matrix Formalism

Many undergraduate QM courses and textbooks stop at the concept of state vectors, leaving many beginning physicists to think that this is the way to do quantum mechanics, and not preparing them for the shock that everywhere else in the world state matrices are used. In real-life experiments, we have to deal with many uncertainties and uncontrollable factors. For example, the preparation of a particle in some state is never perfect. What we get is ψ = (cos α, sin α)^T, with some α close to the desired value, but subject to errors. How can we efficiently deal with those and other errors in QM? The naive method would be to simply specify the distribution of the parameters (α) or of the state itself. That is in practice a very tedious way, and it is also completely
unnecessary, since what we can measure are only the expectation values, like ⟨ψ|Ô|ψ⟩. Any statistical uncertainty on ψ, described by a distribution P(ψ)dψ, will cause us to measure ∫ dψ P(ψ) ⟨ψ|Ô|ψ⟩ instead. We can rewrite this as Tr[(∫ dψ P(ψ) |ψ⟩⟨ψ|) Ô]. This is what we will measure, whatever the nature of Ô. We can calculate all expectation values once we know the matrix ∫ dψ P(ψ) |ψ⟩⟨ψ|. Hence, fapp (for all practical purposes) this is "the" state! We call it the density matrix (cf. probability density) and, as is customary, denote it with the symbol ρ [34]. Mathematically speaking, a matrix can be a density matrix if and only if it is PSD and has trace equal to 1.

A set of state vectors ψ_i with given probabilities p_i is called an ensemble. The corresponding density matrix is the barycenter of the ensemble. Now note that different ensembles may yield the same density matrix:

{p_1 = 1/2, ψ_1 = |H⟩; p_2 = 1/2, ψ_2 = |V⟩} and {p_1 = 1/2, ψ_1 = |NE⟩; p_2 = 1/2, ψ_2 = |NW⟩},

where

|H⟩ = (1, 0)^T, |V⟩ = (0, 1)^T, |NE⟩ = (1, 1)^T/√2, |NW⟩ = (1, −1)^T/√2.
Both ensembles yield the density matrix ρ = 11/2, the maximally mixed state. In fact, the uniform ensemble, in which every possible vector appears with uniform probability density, also has the maximally mixed state as barycenter. Absolutely astonishing fact about QM #29: We can never figure out which ensemble a density matrix originated from, not even how many elements it contained. A state with density matrix of rank 1 is a pure state and corresponds to a state vector. We can write such ρ as ρ = ψψ ∗ = |ψψ|; the state vector is the eigenvector ψ corresponding to eigenvalue 1. When the rank is greater than 1, we call the state a mixed state (cf. statistical mixing). 2.4. Tensor Products and Partial Traces (a mathematical interlude) For the later purpose of describing the state of a many-particle QM system, we now need to introduce two new mathematical concepts: the tensor product, and the partial trace. A somewhat unorthodox definition of the tensor product, but which in my view best describes its essence, goes as follows (readers who prefer the usual definition are kindly referred to textbooks like [20]). The Tensor Product, a.k.a. Kronecker Product of matrices A1 , . . . , AN can be thought of as an ordered list (A1 , . . . , AN ) with the following rules: 1. Product rule: (A1 , . . . , AN ).(B1 , . . . , BN ) = (A1 B1 , . . . , AN BN ); 2. Reduces to the ordinary product when all Ai are scalars ai .
The notation for the tensor product is A_1 ⊗ ... ⊗ A_N. For the purposes of calculation, the tensor product of matrices can be represented by a matrix: (e_i^T ⊗ e_j^T)(A ⊗ B)(e_k ⊗ e_l) = A_{ik} B_{jl}. I like to denote the LHS by (A ⊗ B)_{(i,j),(k,l)}, where (i,j) is a composite (row) index. This is actually a block matrix, where the indices i, k index the blocks, and j, l index the entries within the blocks. E.g. when A is 2 × 2,

A ⊗ B = [ A_{11}B  A_{12}B ; A_{21}B  A_{22}B ].

From the above easily follows the trace rule: Tr(A ⊗ B) = Tr(A) Tr(B). The tensor product is distributive w.r.t. addition: (A + B) ⊗ C = (A ⊗ C) + (B ⊗ C). Every block matrix (with equally sized blocks) can be (trivially) written as a sum of tensor products. Given that the elements of A are A_{(i,j),(k,l)},

A = Σ_{i,j,k,l} A_{(i,j),(k,l)} (e_{ik} ⊗ e_{jl}).
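The block structure and the trace rule are easy to check numerically. Below is a minimal sketch using numpy's `np.kron` (an illustration added here, not part of the original text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

# Kronecker product: the block matrix [A11*B, A12*B; A21*B, A22*B]
AB = np.kron(A, B)

# block (i,k) of A ⊗ B is A[i,k] * B
assert np.allclose(AB[:3, 3:], A[0, 1] * B)

# trace rule: Tr(A ⊗ B) = Tr(A) Tr(B)
assert np.isclose(np.trace(AB), np.trace(A) * np.trace(B))

# distributivity: (A + C) ⊗ B = A ⊗ B + C ⊗ B
C = rng.standard_normal((2, 2))
assert np.allclose(np.kron(A + C, B), np.kron(A, B) + np.kron(C, B))
```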
The tensor product is about combining matrices into a bigger object. The opposite operation, breaking a tensor product down into its constituents, is called the partial trace. The partial trace of a tensor product w.r.t. its i-th factor is obtained by replacing the i-th factor with its trace:

Tr_1(A ⊗ B) = Tr(A) ⊗ B = Tr(A) B
Tr_2(A ⊗ B) = A ⊗ Tr(B) = Tr(B) A.

In block matrix form this reads:

Tr_1 [ A_{11}B  A_{12}B ; A_{21}B  A_{22}B ] = A_{11}B + A_{22}B
Tr_2 [ A_{11}B  A_{12}B ; A_{21}B  A_{22}B ] = [ A_{11} Tr B  A_{12} Tr B ; A_{21} Tr B  A_{22} Tr B ].
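In practice the partial trace can be computed by reshaping the composite indices; a sketch assuming numpy and the index convention above (the helper names `ptrace1`/`ptrace2` are mine):

```python
import numpy as np

def ptrace1(M, d1, d2):
    """Trace out the first factor of a (d1*d2) x (d1*d2) block matrix."""
    return np.einsum('ijik->jk', M.reshape(d1, d2, d1, d2))

def ptrace2(M, d1, d2):
    """Trace out the second factor."""
    return np.einsum('ijkj->ik', M.reshape(d1, d2, d1, d2))

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((3, 3))
M = np.kron(A, B)

assert np.allclose(ptrace1(M, 2, 3), np.trace(A) * B)   # Tr_1(A ⊗ B) = Tr(A) B
assert np.allclose(ptrace2(M, 2, 3), np.trace(B) * A)   # Tr_2(A ⊗ B) = Tr(B) A
```

Reshaping to a 4-index array makes the composite index (i,j),(k,l) explicit, after which the partial traces are single `einsum` contractions.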
By linearity of the trace, this extends to all block matrices:

Tr_1 [ A  B ; C  D ] = A + D,    Tr_2 [ A  B ; C  D ] = [ Tr A  Tr B ; Tr C  Tr D ].

An alternative definition of the partial trace is:

Tr((11 ⊗ X) A) = Tr(X Tr_1 A), ∀X
Tr((X ⊗ 11) A) = Tr(X Tr_2 A), ∀X.

Exercise.
Given an undetermined Hermitian block matrix A, determine the exact relation between the spectra of its partial traces Tr_1 A and Tr_2 A, for A of rank 1. The same question for general A with prescribed spectrum used to be a research problem until it was recently solved by A. Klyachko (see his contribution in this volume).

2.5. Multi-Particle States

Given two particles, where the first is in state ψ_1 and the second in state ψ_2, the two particles taken together can be described in one go as being in the joint state ψ_1 ⊗ ψ_2. In Dirac notation:

ψ_1 ⊗ ψ_2 → |ψ_1 ⊗ ψ_2⟩ = |ψ_1⟩|ψ_2⟩
e_i ⊗ e_j → |i⟩|j⟩.

Suppose we now make a measurement E_i on particle 1 and an independent measurement F_j on particle 2; that corresponds to making a joint measurement E_i ⊗ F_j on the joint state. Indeed, the probability of obtaining outcome i on particle 1 and outcome j on particle 2 is just the product of the probabilities (a purely statistical rule!),

⟨ψ_1 ⊗ ψ_2|X|ψ_1 ⊗ ψ_2⟩ = ⟨ψ_1|E_i|ψ_1⟩ ⟨ψ_2|F_j|ψ_2⟩,

which can only be true if X = E_i ⊗ F_j.

2.5.1. Entanglement

Let's say the states ψ_1 ⊗ ψ_2 and φ_1 ⊗ φ_2 are possible state vectors of a joint 2-particle system. QM's principle of superposition says that every superposition (that is, weighted sum) of allowed state vectors is again an allowed state vector. Absolutely astonishing fact about QM #30: one can also have state vectors

ψ = (1/K)(ψ_1 ⊗ ψ_2 + φ_1 ⊗ φ_2)

(with K the normalisation constant) which can never be written as a single tensor product. Such state vectors are called entangled states. In fact, in Nature, product states ψ = ψ_1 ⊗ ψ_2 are the exception and entangled states the rule. Only when the state is a product state do the particles have a state of their own; in an entangled state, they have not. Perhaps the most famous example of an entangled state is the EPR state

ψ = (1/√2)(|00⟩ + |11⟩) = (1/√2)((1, 0)^T ⊗ (1, 0)^T + (0, 1)^T ⊗ (0, 1)^T) = (1, 0, 0, 1)^T/√2.

EPR stands for the initials of Einstein, Podolsky and Rosen, who first posited this state and explored its unusual characteristics [15]. The EPR state is one of the 4 so-called Bell states, which form an orthonormal basis:

(1/√2)(|00⟩ ± |11⟩),    (1/√2)(|10⟩ ± |01⟩).
2.5.2. Determination of Entanglement

How can we mathematically determine whether a given state vector is entangled or not? Given that state vectors can be written as a sum of tensor products in lots of different ways, we need to know the minimum required number of terms. Write the (block) vector ψ as a matrix, denoted ψ̃, by stacking the blocks as columns:

ψ_{(i,j)} → ψ̃_{ij}.

A sum of n tensor products turns into a sum of n rank-1 matrices in this way. The minimal number of terms in a tensor decomposition of ψ is thus given by the rank of ψ̃. The rank can be determined using the Singular Value Decomposition (SVD) of ψ̃: ψ̃ = U Σ V^*, where U and V are unitary matrices and Σ is a non-negative diagonal matrix. If u_k and v_k are the k-th columns of U and V, and σ_k = Σ_{kk}, then

ψ = Σ_k σ_k |u_k⟩|v̄_k⟩,

where the sum is over all k for which σ_k is non-zero. This minimal decomposition of ψ is the Schmidt decomposition of ψ, the number of terms is the Schmidt rank, and the σ_k are the Schmidt coefficients. Normalisation implies that the sum of squares of the Schmidt coefficients is 1. The conclusion of this is that a state is entangled if and only if its Schmidt rank is greater than 1. For example, a product state has Schmidt coefficients (1, 0, ..., 0). This is an extreme case, with no entanglement. The Bell states, on the other hand, have Schmidt coefficients (1/√2, 1/√2), which is also an extreme case, with maximal entanglement. They are maximally entangled states. Note that if ψ is a Bell state, ψ̃ = U/√2, for some unitary U. In a 3-particle system (or larger) there is a slight notational problem: how do you, e.g., write that particles 1 and 3 are in an entangled state, not entangled with particle 2? We choose to attach particle labels as indices: e.g. ρ_{1,3} ⊗ σ_2.

2.5.3. Strange Behaviour of the EPR State

Although it does not ostensibly appear from the mathematics, EPR states have some disturbing physical properties. Consider thereto the following Gedanken experiment (which has actually been performed recently). Send the first particle of an EPR state to, say, Amsterdam, and the second particle to Brussels. In both towns orthogonal 2-outcome measurements are performed on the local particles (e.g. H versus V, or NE vs. NW). It is easy to check that, no matter what the chosen measurement alternatives are, each of the two outcomes will occur with probability 50%. However, when in A and B the same set of alternatives is used, they always obtain the same outcome! If A gets H as outcome, so will B. If A and B switch to the NE/NW basis, and A obtains NE, then, again, so will B. It is as if in an EPR state, Nature has fixed the correlations between the A- and B-outcomes without yet having chosen the outcomes themselves!
When A suddenly chooses a different set of alternatives than B, this immediately shows up as a change in the correlations between the outcomes. If A switches to the NE/NW basis while B still uses the H/V basis, their outcomes will no longer be perfectly correlated. In fact, they will no longer be correlated at all. That this happens immediately, without delay due to the finite speed of light, apparently violates special relativity's limit on the speed of information transmission. Einstein called this "Spooky Action at a Distance", and it was the main reason why he thought quantum mechanics was an incomplete theory [15]. We will not discuss this issue further, but just raise the question: can one use entanglement to transmit messages faster than light, and thus violate special relativity? The answer is a clear No: the effect only shows up in the correlations, not in the local (marginal) distributions. Since one needs the outcomes from both A and B to compare the correlations, one also needs classical (= slow) communication. Of course, entanglement could be useful in other ways, and that is exactly what QIP is all about!

2.6. State Reductions

We have just seen how the tensor product can be used to describe a number of independent (i.e. uncorrelated) particles with just one joint state vector. Now we consider the opposite question of how to describe the state of a subset of particles when the state of the whole set is given. The reader should be forewarned that this question might not even make sense, in the light of the existence of entanglement. Let us first consider the seemingly innocuous question of how to ignore a particle in a set of particles in a given joint state. In classical physics this is answered trivially: by not considering its state. In quantum physics the situation is not so clear, as the particle might be entangled with the other particles and, hence, does not have a state of its own.
The only reasonable answer to the question of how to ignore the particle that still makes sense in QM is to perform a measurement on the particle and then forget the result. Mathematically, this is equivalent to performing a measurement whose result is certain to always be the same: the single-element POVM {11} does exactly that, as it has only one outcome. Specifically, a particle in a group of particles can be ignored by taking the partial trace of the joint state w.r.t. the particle being ignored. In QIT lingo, we say that we trace out the particle. Let us illustrate this first for a product state:

|ψ⟩⟨ψ| = |ψ_1 ⊗ ψ_2⟩⟨ψ_1 ⊗ ψ_2| → Tr_1 |ψ_1 ⊗ ψ_2⟩⟨ψ_1 ⊗ ψ_2| = |ψ_2⟩⟨ψ_2|.

Indeed, performing a joint measurement that disregards particle 1, we are measuring using the POVM 11 ⊗ E_i, giving

⟨ψ|(11 ⊗ E_i)|ψ⟩ = Tr[(11 ⊗ E_i)|ψ⟩⟨ψ|] = Tr(E_i Tr_1 |ψ⟩⟨ψ|).

The state Tr_1 |ψ⟩⟨ψ| is called the reduced state, or the reduction of |ψ⟩⟨ψ| to particle 2. For a general entangled state, disregarding a particle is done by taking the reduction w.r.t. the remaining particles: |ψ⟩⟨ψ| → Tr_1 |ψ⟩⟨ψ|.
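As a numerical illustration (numpy assumed; the helper `ptrace1` is my own name), one can check that the reduction of a product state is again a pure state, while the reduction of the EPR state is the maximally mixed state:

```python
import numpy as np

def ptrace1(M, d1, d2):
    # trace out particle 1 of a bipartite density matrix
    return np.einsum('ijik->jk', M.reshape(d1, d2, d1, d2))

# product state |psi1> ⊗ |psi2>: the reduction is still pure
psi1 = np.array([1.0, 0.0]); psi2 = np.array([0.6, 0.8])
prod = np.kron(psi1, psi2)
rho2 = ptrace1(np.outer(prod, prod.conj()), 2, 2)
assert np.allclose(rho2, np.outer(psi2, psi2))   # = |psi2><psi2|

# EPR state: the reduction is maximally mixed
epr = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho_epr = ptrace1(np.outer(epr, epr.conj()), 2, 2)
assert np.allclose(rho_epr, np.eye(2) / 2)

# the eigenvalues of the reduction are the squared Schmidt coefficients
assert np.allclose(np.linalg.eigvalsh(rho_epr), [0.5, 0.5])
```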
Except for product states, this is no longer a pure state! One sees that a general feature of an entangled pure state is that its reductions are mixed states. Incidentally, this is the second reason for needing density matrices, the first being the required ability to incorporate classical statistical fluctuations in the framework. (Actually, there really is no fundamental distinction between these two reasons, as classical fluctuations can be understood quantum-mechanically by assuming the system under consideration is entangled with its environment.)

Exercise. Show that the spectrum of a reduced pure 2-particle state is given by the squared Schmidt coefficients.

More generally, instead of tracing out some tensor factors, one can perform measurements on the corresponding particles, i.e. partial measurements. For a POVM {E_i} acting on particle 1 of a bipartite state A, the probability of getting outcome i is

p_i = Tr((E_i ⊗ 11) A) = Tr(E_i Tr_2 A),

and Tr_1((E_i ⊗ 11) A) can be written as p_i ρ_i, where ρ_i is the post-measurement "left-over" state, conditional on measurement outcome i:

ρ_i = Tr_1((E_i ⊗ 11) A)/p_i.

2.7. Completely Positive Maps

We've already seen various ways of operating on states:
• Unitary evolution: |ψ⟩ → U|ψ⟩;
• Adding particles (in a determined state): |ψ⟩ → |ψ⟩ ⊗ |0⟩;
• Removing/ignoring particles: |ψ⟩⟨ψ| → Tr_1 |ψ⟩⟨ψ|;
• Measurements: |ψ⟩ → ⟨ψ|E_i|ψ⟩;
• Combinations thereof;
• Measurement outcomes may determine the choice of subsequent operations.
Absolutely astonishing fact about QM #31: there are no other options, and all of this can be combined into one simple formula! Every quantum operation composed of the above basic operations can be written as a completely positive, trace-preserving, linear map, or CPT map, Φ acting on the density matrix: ρ → Φ(ρ). A positive map is a linear map that transforms any PSD matrix (state) into a PSD matrix (again a state, provided it has trace 1). That a quantum operation should be represented by a positive map is an obvious requirement, because a state should remain a state. Complete positivity means more: the map should remain positivity preserving even when it acts only on a subset of the particles of a larger system (i.e. Φ ⊗ 11 must be positive as well). The distinction between positivity and complete positivity is easily illustrated by means of an example. The matrix transpose is a positive, trace-preserving linear map, but not a completely positive one: when it acts on 1 particle of an EPR state, one gets a non-positive matrix. The converse of the above remarkable statement is again a remarkable statement, and is called Stinespring's Theorem [46]: any CPT map can be decomposed as the composition of three basic operations, A-U-T:
• adding a particle (called an Ancilla),
• some Unitary evolution,
• Tracing out the ancilla:

ρ → Φ(ρ) = Tr_aux(U(ρ ⊗ |0⟩⟨0|)U^*).

Every combination of basic operations can be rewritten in Stinespring form. This is a very useful theorem indeed.

2.7.1. Characterisation of CP(T) maps

By dropping the trace-preservation requirement, we get a CP map. Any linear map Φ can be represented by its so-called Choi matrix Φ: this is a block matrix with d_in × d_in blocks of size d_out × d_out, where block (i,j) of Φ is given by Φ(e_ij). The relation between the map and the Choi matrix is simply

Φ(ρ) = Σ_{i,j} ρ_{ij} Φ(e_{ij}).
Now the very useful fact about the Choi matrix is that a map Φ is CP if and only if its Choi matrix Φ is PSD [10]. Since the Choi matrix, just as a bipartite state, is a block matrix, we can define its partial traces, where Tr_in = Tr_1 (over the block index) and Tr_out = Tr_2 (over the index within the blocks). This allows us to convert the condition that a map be trace preserving into a condition on its Choi matrix, namely Tr_out Φ = 11. Incidentally, a unital CP map is defined by the condition Φ(11) = 11; this translates to the condition Tr_in Φ = 11 on the Choi matrix. Every CP map also has a representation as a set of matrices A_k, the so-called Kraus representation [33], the A_k being called the Kraus elements:

Φ(ρ) = Σ_k A_k ρ A_k^*.

For a CPT map we have the additional requirement Σ_k A_k^* A_k = 11. For a unital CP map the requirement is Σ_k A_k A_k^* = 11.
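The Choi characterisation is easy to explore numerically. The sketch below (the helper `choi` is my own; numpy assumed) builds the Choi matrices of the identity and transpose maps on qubits, confirming that the transpose, though positive, is not CP:

```python
import numpy as np

def choi(phi, d):
    """Choi matrix: block (i,j) equals phi(e_ij)."""
    C = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            e_ij = np.zeros((d, d)); e_ij[i, j] = 1
            C += np.kron(e_ij, phi(e_ij))
    return C

d = 2
C_id = choi(lambda M: M, d)     # identity map
C_T  = choi(lambda M: M.T, d)   # transpose map

# the identity map is CP: its Choi matrix is PSD
assert np.min(np.linalg.eigvalsh(C_id)) >= -1e-12

# the transpose is positive but not CP: its Choi matrix (the SWAP) has eigenvalue -1
assert np.isclose(np.min(np.linalg.eigvalsh(C_T)), -1.0)

# trace preservation: tracing out the output (within-block) factor gives 11
Tr_out = np.einsum('ijkj->ik', C_T.reshape(d, d, d, d))
assert np.allclose(Tr_out, np.eye(d))
```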
Example. Any POVM {E_i} with rank-1 elements E_i = X_i X_i^* can be implemented by a CPT map with Kraus elements e_i X_i^*, followed by an orthogonal measurement, namely the POVM {e_{ii}}.

2.8. Basic QIP Tasks

In the following sections we discuss a number of basic operations that a quantum information processor should be able to do; at least, when naively extrapolating from what a classical processor can do. However, it will turn out that some operations that are absolutely trivial on a classical processor are very hard to generalise to the quantum realm, or even impossible. This does not mean that a quantum processor is less powerful than its classical counterpart, though, as one can show that any classical operation can be efficiently simulated on a quantum processor, while the opposite is not true.
2.8.1. Distinguishing States

Suppose I give you one of the two following states: ψ = (1, 0)^T or φ = (cos α, sin α)^T. Can you tell me which one I gave you by making a measurement? Consider first making an orthogonal measurement, using the POVM {E_1 = e_{11}, E_2 = e_{22}}. A simple calculation shows that the probability of outcome 1 is 1 for ψ, and cos²α for φ. The probability of outcome 2 is 0 for ψ, and 1 − cos²α for φ. If you get outcome 2, you know it must be the state φ. However, if you get outcome 1, you don't know anything, as it could have come from either state! To make a long story short, it can be shown that no other measurement would work either. The upshot of this is that states can be distinguished perfectly only when they are orthogonal (in the above example, when α = π/2) [17]. Incidentally, this means that classical information (bits) can be described within the QM framework using orthogonal states (qubits): '0' → |0⟩, '1' → |1⟩.

2.8.2. Cloning States

In the previous section we noted that states can be distinguished perfectly only when they are orthogonal. A possible way out would exist, however, if we could make lots of copies of the state and perform a measurement on each copy. Indeed, the probability of getting outcome 1 on every one of n copies of state φ is cos^{2n}α, which does tend to 0 for α ≠ 0. Unfortunately, we have the Absolutely astonishing fact about QM #32: states cannot be cloned. This is the content of the No-Cloning Theorem [54,13], the proof of which is very easy. A cloner would be a machine with behaviour |ψ⟩ → |ψ⟩ ⊗ |ψ⟩, for any ψ [11]. As copying data is a basic computer operation, it would be nice to have this in a quantum computer too! We can certainly do it for the states |0⟩ and |1⟩. However, for other states linearity of QM comes in and demands that the cloner act on ψ = (a, b)^T as |ψ⟩ = a|0⟩ + b|1⟩ → a|00⟩ + b|11⟩, which is not the same as the expected (a|0⟩ + b|1⟩) ⊗ (a|0⟩ + b|1⟩).
So, to be exact, there exists no quantum operation with the behaviour |ψ⟩ → |ψ⟩ ⊗ |ψ⟩ for every ψ, but such an operation does exist when the ψ are restricted to come from an orthonormal set. The last statement is obvious, as otherwise many companies producing photocopiers would be out of business. Every cloud has a silver lining, and the impossibility of general cloning can be exploited. If we cannot clone a state, then no one else can either. This simple observation forms the basis of the BB-84 protocol [5] in Quantum Cryptography. It exploits the fact that an eavesdropper cannot clone a set of non-orthogonal states, and hence cannot drop eaves on them (measure them) without going unnoticed.

2.8.3. Information contained in a state

To prepare a particle in a specified qubit state, one actually needs an infinite amount of information: all the digits in the complex numbers that specify the state vector's amplitudes. However, when making a measurement of a qubit state, at most 1 bit of information comes out. The simplistic argument here is that either the 'H' detector or the 'V' detector makes a "click". The precise, information-theoretical argument is that the maximal information gain (called the accessible information [43]) obtainable by doing any measurement on a qubit is 1 bit [42].
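The linearity argument behind the No-Cloning Theorem can be made concrete in a few lines (numpy assumed; `would_be_cloner` is a hypothetical name for illustration):

```python
import numpy as np

ket0 = np.array([1.0, 0.0]); ket1 = np.array([0.0, 1.0])

def would_be_cloner(psi):
    # A linear map defined by |0> -> |00> and |1> -> |11>,
    # extended linearly: a|0> + b|1>  ->  a|00> + b|11>.
    a, b = psi
    return a * np.kron(ket0, ket0) + b * np.kron(ket1, ket1)

# On the basis states it clones perfectly:
assert np.allclose(would_be_cloner(ket0), np.kron(ket0, ket0))

# On a superposition, linearity forces a|00> + b|11> ...
psi = np.array([0.6, 0.8])
linear_out = would_be_cloner(psi)

# ... which differs from the true clone (a|0> + b|1>) ⊗ (a|0> + b|1>):
true_clone = np.kron(psi, psi)
assert not np.allclose(linear_out, true_clone)
```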
Even if the measurement does not destroy the particle, it destroys its initial state. If detector 'H' clicks, the state will have collapsed onto an 'H' state. It is therefore no use measuring the same particle a second time. Hence, it is impossible to determine an unknown state of a particle completely. Nature allows us 1 measurement, and 1 bit of information per measurement. We cannot measure the state of a single particle! If, however, there is a supply of many particles prepared in the same state, ρ^⊗N, we can approximate ρ using State Tomography [12].

2.8.4. Moving States

Another basic classical operation is moving data. Again we can ask whether this can be generalised to the quantum realm: can we transfer a state from one particle to another, without actually moving the particles themselves (which is quite impossible if the particles are, say, atoms in a crystal lattice)?

Method #1. If both particles are close enough that a joint unitary operation can be performed on them, one can apply the swap or flip unitary

F = [ 1 0 0 0 ; 0 0 1 0 ; 0 1 0 0 ; 0 0 0 1 ].
This unitary actually interchanges the states of the source and destination particles: F(ψ_1 ⊗ ψ_2) = ψ_2 ⊗ ψ_1.

Method #2. Let's now say the particles are so far away from each other that we can't apply joint unitaries to them, for example because the particles are in different labs. Introducing some QIT lingo: rather than use the word "lab", we use "party", and call these parties Alice, Bob, Charlie, ... In tune with this terminology, states shared between 2 or more parties are called bi-partite and multi-partite states. Note that each party may hold several particles. Global quantum operations encompassing several parties are, by assumption, not possible: each party can only perform quantum operations local to its own lab. On top of performing local quantum operations, parties may also engage in classical communication (e.g. via the phone). These operations, and combinations thereof, are called LOCC operations: Local Operations plus Classical Communication. So the question is: can states be transferred from one party to another using LOCC only? A "naive" way of doing this would be to measure the state at party A, transfer the measurement outcome to party B using classical communication, and then prepare the measured state at party B. This is of course not possible, as we cannot determine a state with a single measurement. Absolutely astonishing fact about QM #33: perfect state transfer between two distant parties is still possible if, apart from the ability to perform LOCC operations, the parties share an EPR pair! The protocol for doing so is called state teleportation [6]. (A somewhat unfortunate choice of terminology, I may add, because of the undesired connotations with a certain science-fiction series.) Let Alice hold particle 1, whose state ρ she wishes to transfer to Bob. Alice and Bob share a pure entangled state ψ in particles 2 and 3. The total initial state is thus |ψ⟩⟨ψ|_{2,3} ⊗ ρ_1.
Alice performs a joint measurement {E_i}_{i=1}^{4} on her particles 1 and 2, where the E_i are pure orthogonal states, E_i = |φ_i⟩⟨φ_i|. If this measurement yields outcome i, the state of Bob's particle 3 will be ψ̃ φ̃_i ρ φ̃_i^* ψ̃^* / p_i, where the probability is p_i = 1/4. If we choose for ψ an EPR state, and for the φ_i the Bell states, then √2 ψ̃ and √2 φ̃_i are unitaries. After Alice tells Bob the outcome i of her measurement (via CC), Bob can perform a unitary evolution on his particle that undoes the unitary 2 ψ̃ φ̃_i, and he ends up with the desired state ρ (whatever the value of i).
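The whole protocol fits in a short numerical sketch (numpy assumed; the qubit ordering and the standard Pauli correction unitaries are my own bookkeeping choices, not notation from the text):

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]]); Z = np.array([[1, 0], [0, -1]])

# Bell basis |Phi+>, |Phi->, |Psi+>, |Psi-> and Bob's fix-up for each outcome
bell = np.array([[1, 0, 0, 1], [1, 0, 0, -1],
                 [0, 1, 1, 0], [0, 1, -1, 0]]) / np.sqrt(2)
corrections = [I2, Z, X, Z @ X]

def ptrace12(M):
    # trace out the first two qubits (dim 4) of an 8x8 density matrix
    return np.einsum('ijik->jk', M.reshape(4, 2, 4, 2))

# Alice's unknown qubit state rho (here a generic pure state)
psi = np.array([0.6, 0.8j]); rho = np.outer(psi, psi.conj())

epr = np.array([1, 0, 0, 1]) / np.sqrt(2)      # particles 2 (Alice) and 3 (Bob)
total = np.kron(rho, np.outer(epr, epr))        # ordering: qubit 1 ⊗ qubits 2,3

for i in range(4):
    P = np.kron(np.outer(bell[i], bell[i].conj()), I2)  # Bell measurement on 1,2
    post = P @ total @ P
    p = np.real(np.trace(post))
    assert np.isclose(p, 0.25)                   # each outcome has probability 1/4
    bob = ptrace12(post) / p
    U = corrections[i]
    assert np.allclose(U @ bob @ U.conj().T, rho)  # Bob recovers rho
```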
3. Entanglement Measures

3.1. Pure State Entanglement

Consider bipartite states: states that are shared between parties A and B. Recall that the Schmidt coefficients of a pure bipartite state (the σ_k in ψ = Σ_k σ_k |u_k⟩|v̄_k⟩) allow us to distinguish between product states and entangled states. A state is entangled if and only if its Schmidt rank is greater than 1. A product state has Schmidt coefficients (1, 0) and has no entanglement. The Bell states have Schmidt coefficients (1/√2, 1/√2) and have maximal entanglement. Intuition suggests that pure states with Schmidt coefficients close to (1, 0) should be "a little" entangled; that there should be a continuous measure of entanglement. To make a long story short: one can indeed define such continuous measures. Moreover, for pure bipartite states, there is one such measure that stands out, namely the entropy of the squared Schmidt coefficients

E(|ψ⟩⟨ψ|) = −Σ_k σ_k² log(σ_k²) = S(Tr_A |ψ⟩⟨ψ|) = S(Tr_B |ψ⟩⟨ψ|),
where S is the von Neumann entropy S(ρ) = −Tr(ρ log ρ). The name of this entanglement measure is the Entropy of Entanglement [7]. The σ_k² formally make up a probability distribution, and E expresses the amount of "mixedness" of that distribution. It is easily checked that for product states E = 0, while for Bell states E = 1 (taking the base-2 logarithm). This immediately provokes a number of questions. First of all, what about mixed states? Does E(ρ) make sense in that case? The answer is: no. Only for pure states ρ do S(Tr_A ρ) and S(Tr_B ρ) coincide. Worse, the maximally mixed state has E(11/d_A d_B) > 0, although we know it is not entangled. A second, and deeper, question asks about the meaning of E.

3.2. Operational Definitions of Entanglement

To make headway, we need to define entanglement in an operational way. Let us propose a new definition of entangled states, as those states that cannot be "made from" product states using LOCC [49]. This loose phrase actually means the following: if, starting from a large supply of N initial (bipartite) states ρ, ρ^⊗N, there is an LOCC operation
transforming ρ^⊗N to a state τ, where τ is either identical to σ^⊗αN or indistinguishable from it (D(τ, σ^⊗αN) → 0) as N tends to ∞, then ρ can be asymptotically LOCC-transformed to σ with yield α. Here D(ρ, σ) = Tr|ρ − σ|/2 is the Trace Distance between the two states.

3.3. Entanglement Cost

One defines the entanglement cost [7,16] E_C(ρ) of ρ as the inverse of the maximal obtainable yield of transforming EPR states into ρ using LOCC. Obviously, E_C for an EPR state is 1: here, the best LOCC operation is doing nothing, and this already gives yield 1. One can show that for pure states E_C = E [7]. For some states you don't need initial EPR states at all: these are the so-called separable states. They are of the form

ρ_{A,B} = Σ_i p_i ρ_A^i ⊗ σ_B^i,
where the p_i form a probability distribution [49]. For example, the maximally mixed state is separable. To show that separable states have E_C = 0, consider the following LOCC protocol for creating them. A and B have a list of prescriptions for making the states ρ^i and σ^i. A generates a random number i with known probability p_i, which she sends to B using CC. A makes ρ^i and B makes σ^i, and both subsequently forget i. After this willful act of forgetting, the joint state, which represents the only remaining information A and B have about how they made it, will be Σ_i p_i ρ_A^i ⊗ σ_B^i. There were no EPR pairs needed, so indeed E_C = 0. Only very recently has it been shown that E_C > 0 for non-separable states [55].

3.4. Separability

Another research problem: find an efficient method to determine from its matrix elements whether or not a bipartite state is separable. The solution for states of dimension 2 × 2 and 2 × 3 is rather simple and is given by the famous Peres criterion [38,21]: states of these dimensions are separable if and only if their partial transpose is still positive, (T ⊗ 11)(ρ) ≥ 0. We say such states are PPT. Recall that the transpose is a positive map, but not a completely positive one, which is why the partial transpose is of any use here. For higher dimensions, the situation is much more difficult: ρ is separable if and only if (Φ ⊗ 11)(ρ) ≥ 0 for all positive maps Φ [21]. However, this is not an efficient criterion, as the set of positive maps is infinite, nor can it be generated from a finite number of generating positive maps. One can of course resort to the Peres criterion, because if a state is separable, it must be PPT. Unlike in the 2 × 2 and 2 × 3 cases, however, the converse is not true: there exist PPT states that are not separable [22]. Such states fall in the class of bound entangled states (see below).
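The Peres criterion is straightforward to implement; a sketch assuming numpy (the helper `partial_transpose` is my own name):

```python
import numpy as np

def partial_transpose(rho, d1, d2):
    # transpose the first factor: (T ⊗ 11)(rho)
    R = rho.reshape(d1, d2, d1, d2)
    return R.transpose(2, 1, 0, 3).reshape(d1 * d2, d1 * d2)

# EPR state: the partial transpose has a negative eigenvalue -> entangled
epr = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho_epr = np.outer(epr, epr)
assert np.min(np.linalg.eigvalsh(partial_transpose(rho_epr, 2, 2))) < -1e-6

# a separable mixture of |00><00| and |11><11| stays PPT
rho_sep = 0.5 * np.diag([1.0, 0, 0, 0]) + 0.5 * np.diag([0, 0, 0, 1.0])
assert np.min(np.linalg.eigvalsh(partial_transpose(rho_sep, 2, 2))) >= -1e-12
```

For 2 × 2 states PPT is also sufficient for separability, so this check fully decides the question in that dimension.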
3.5. Entanglement of Formation

The entanglement of formation (EoF) E_F is an entanglement measure akin to the entanglement cost but defined in a way more amenable to calculation [8]. Let the state ρ be the barycenter of the ensemble {p_i, ψ_i}. The states in the ensemble have (entropy of) entanglement E(ψ_i), so the ensemble has average entanglement Σ_i p_i E(ψ_i). There may be other ensembles realising ρ but with lower average E. The EoF of ρ is the minimal value: the minimal average entanglement over all ensembles realising ρ. Another way of saying this is that the EoF functional is the convex hull of the pure-state entanglement functional [3]. For 2 × 2 states, there is an analytical formula for the EoF, due to Wootters [53].

3.6. Entanglement of Distillation

The entanglement cost measures the yield of LOCC-transforming EPR states into the desired state. Conversely, the entanglement of distillation E_D is the yield of LOCC-transforming the state ρ back into EPR states [8]. One can show that for pure states, again, E_D = E [7]; one says that pure states can be reversibly converted to EPR states. This is no longer true for mixed states: in general E_D < E_C. One can show that for every entanglement measure E_X that obeys certain axioms, one has E_D ≤ E_X ≤ E_C [24]. Furthermore, if the partial transpose of ρ is positive, its E_D is 0 [22]. Thus, for non-separable PPT states, E_C > 0 while E_D = 0: this is called bound entanglement [22], as opposed to free entanglement (these names have been chosen with the thermodynamical analogues of free and bound energy in mind [23]). Research question: does the set of bound entangled states coincide with the non-separable PPT states?

3.7. Entanglement Cost, Revisited

Big Research Problem: how do you calculate E_C? As mentioned above, E_C is defined in an operational way, and is nearly impossible to calculate.
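For two-qubit states, Wootters' formula [53] makes the EoF directly computable via the so-called concurrence C. A sketch assuming numpy (helper names are mine; the formula is quoted from memory, so treat it as an illustration):

```python
import numpy as np

sy = np.array([[0, -1j], [1j, 0]])
YY = np.kron(sy, sy)

def h(x):
    # binary entropy (base 2)
    return 0.0 if x in (0.0, 1.0) else -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def eof_wootters(rho):
    """Wootters' closed-form EoF for a 2-qubit density matrix."""
    rho_tilde = YY @ rho.conj() @ YY
    lam = np.sort(np.sqrt(np.abs(np.linalg.eigvals(rho @ rho_tilde))))[::-1]
    C = min(1.0, max(0.0, lam[0] - lam[1] - lam[2] - lam[3]))  # concurrence
    return h((1 + np.sqrt(1 - C**2)) / 2)

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
assert np.isclose(eof_wootters(np.outer(bell, bell)), 1.0)  # maximally entangled
assert np.isclose(eof_wootters(np.eye(4) / 4), 0.0)         # maximally mixed: separable
```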
The first theoretical breakthrough towards finding a way to calculate it was the discovery [16] that E_C equals the regularisation of E_F, the entanglement of formation (EoF); this means that

E_C(ρ) = lim_{n→∞} E_F(ρ^⊗n)/n.
Here, A and B share n copies of the state ρ, i.e. ρ^⊗n. Although an important step, this still leaves us with calculations over states of unbounded dimension. The regularisation ρ → ρ^⊗n is about calculating potential "wholesale discounts" in making the state: is making n copies of ρ one by one as expensive as making ρ^⊗n in one go, or does Nature give us quantum discounts? If there are no such discounts, E_F satisfies the property of additivity,

E_F(ρ_1 ⊗ ρ_2) = E_F(ρ_1) + E_F(ρ_2),

and then, simply, E_C = E_F. Additivity has been proven in specific instances [47,36]. Some of these additivity results are sufficiently powerful to allow calculating E_C for certain classes of mixed states. The much sought-after general proof, however, remains elusive for the time being and, in fact, general additivity is still a conjecture.
3.8. Equivalence of Additivity Problems

We have just met the additivity problem for the EoF. In QIT there are several other additivity problems, related to the so-called classical capacity of a quantum channel. Mathematically speaking, a quantum channel is just a CPT map; it carries that name because it is the proper generalisation of a classical information-carrying channel to the quantum realm. As space limitations prohibit us from giving even the shortest treatment here, we limit ourselves to mentioning that the classical capacity is a bound on the amount of quantum error correction that is needed for sending classical information reliably along a quantum channel [18,37]. One can show that the classical capacity equals the so-called Holevo capacity – which is a quantity one can calculate – provided the latter is additive (just as the entanglement cost equals the EoF if the latter is additive). Although channel capacities seem to have nothing in common with entanglement measures, it has been proven that the two mentioned additivity problems are equivalent [36,3,44]: the EoF is additive if and only if the Holevo capacity is. Moreover, these additivities are equivalent to a third one, which may well be the easiest to prove, namely additivity of the minimal output entropy (MOE) of a channel. As the Holevo capacity looks quite complicated as a mathematical entity, the MOE had been introduced more or less as a toy problem – that it would later turn out to be just as important was a pleasant surprise. In general, if a quantum operation (channel) acts on a pure state, the result (channel output) will be a mixed state. The MOE, ν_S, of an operation or channel quantifies how close to purity one can get by appropriately choosing the input state, where the von Neumann entropy S is used as a measure of purity:

ν_S(Ω) = min_ψ {S(Ω(|ψ⟩⟨ψ|)) : ‖ψ‖ = 1}.
So, here we have yet another research problem: Is the MOE additive? I.e., for channels Φ and Ω, does one have ν_S(Φ ⊗ Ω) = ν_S(Φ) + ν_S(Ω)? As a measure of purity, one can alternatively use the Schatten q-norm ‖X‖_q = (Tr X^q)^{1/q}. This norm is equal to the q-norm of the eigenvalues of X:

    ‖X‖_q = (Σᵢ λᵢ(X)^q)^{1/q}.
Then the maximal output purity (MOP), ν_q, is

    ν_q(Ω) = max_ψ { ‖Ω(|ψ⟩⟨ψ|)‖_q : ‖ψ‖ = 1 }.
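To make ν_S and ν_q concrete, here is a numerical sketch (ours, not the author's) for the qubit depolarizing channel Φ_p(ρ) = (1 − p)ρ + p·I/2, one of the channels for which additivity results are known [27]. For this channel every pure input yields the output spectrum {1 − p/2, p/2}, so a scan over the Bloch sphere should reproduce the closed forms exactly:

```python
import numpy as np

def depolarizing(rho, p):
    """Qubit depolarizing channel: mixes the input with the maximally mixed state."""
    return (1 - p) * rho + p * np.eye(2) / 2

def output_entropy_and_purity(p, q, n_samples=200):
    """Scan random pure qubit inputs; return (min output entropy, max output q-norm)."""
    rng = np.random.default_rng(0)
    best_S, best_q = np.inf, 0.0
    for _ in range(n_samples):
        # Random pure state |psi> = cos(t/2)|0> + e^{i phi} sin(t/2)|1>
        t, phi = np.arccos(rng.uniform(-1, 1)), rng.uniform(0, 2 * np.pi)
        psi = np.array([np.cos(t / 2), np.exp(1j * phi) * np.sin(t / 2)])
        lam = np.linalg.eigvalsh(depolarizing(np.outer(psi, psi.conj()), p))
        lam = lam[lam > 1e-12]
        best_S = min(best_S, -np.sum(lam * np.log2(lam)))   # entropy in base 2
        best_q = max(best_q, np.sum(lam ** q) ** (1 / q))
    return best_S, best_q

p, q = 0.3, 2
nu_S, nu_q = output_entropy_and_purity(p, q)   # spectrum {0.85, 0.15} for every input
```

Since the output spectrum is input-independent here, the scan is trivially tight; for less symmetric channels the same scan gives only an upper bound on ν_S (and a lower bound on ν_q).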
The MOE is additive if and only if ν_q is multiplicative for q in a finite interval [1, 1 + ε]:

    ν_q(Φ ⊗ Ω) = ν_q(Φ) ν_q(Ω)?

Multiplicativity of ν_q was conjectured by Amosov, Holevo and Werner [1], and proven for various classes of channels [25,26,27,28,29,30,31,32]. There is, however, a
counterexample for q > 4.79 and channels that are at least 3-dimensional [50]. Note that this does not compromise additivity of the MOE, as multiplicativity of the MOP is only required to hold for q in some interval [1, 1 + ε], ε > 0. In the author's view, the way forward is the study of matrix inequalities. For example, the following inequality would imply multiplicativity of the MOP in case one of the channels is a qubit channel. For q ≥ 2 and general matrices A, B, X and Y, is it true that

    ‖A ⊗ X + B ⊗ Y‖_q ≤ max_θ ‖ ‖X‖_q A + e^{iθ} ‖Y‖_q B ‖_q ?
We have verified this conjecture numerically for small dimensions, but finding a proof seems to be a hard matrix problem.
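Such a numerical verification can be reproduced in a few lines. The sketch below (our reconstruction of this kind of experiment, with θ maximised over a grid) tests the inequality on random complex 2 × 2 matrices at q = 2, where it in fact follows from the Cauchy–Schwarz inequality for the Hilbert–Schmidt inner product:

```python
import numpy as np

def schatten(M, q):
    """Schatten q-norm of a matrix via its singular values."""
    return float(np.sum(np.linalg.svd(M, compute_uv=False) ** q) ** (1 / q))

def check_inequality(q=2, dim=2, trials=100, seed=1):
    """Check ||A(x)X + B(x)Y||_q <= max_theta || ||X||_q A + e^{i theta} ||Y||_q B ||_q
    on random complex Gaussian matrices; theta is maximised over a fine grid."""
    rng = np.random.default_rng(seed)
    thetas = np.linspace(0.0, 2 * np.pi, 721)
    rand = lambda: rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    for _ in range(trials):
        A, B, X, Y = rand(), rand(), rand(), rand()
        lhs = schatten(np.kron(A, X) + np.kron(B, Y), q)
        nX, nY = schatten(X, q), schatten(Y, q)
        rhs = max(schatten(nX * A + np.exp(1j * t) * nY * B, q) for t in thetas)
        if lhs > rhs + 1e-3:   # small slack for the finite theta grid
            return False
    return True
```

For q = 2 the check should always pass; for general q ≥ 2 such random searches only fail to find a counterexample, which is of course no substitute for a proof.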
Acknowledgements The author was supported by The Leverhulme Trust (grant F/07 058/U), and by EPSRC under the QIP-IRC (www.qipirc.org) (grant GR/S82176/0).
References [1] G.G. Amosov, A.S. Holevo and R.F. Werner, “On Some Additivity Problems in Quantum Information Theory”, Problems in Information Transmission 36, 25–34 and arXiv.org preprint math-ph/0003002 (2000). [2] G. G. Amosov and A. S. Holevo, “On the multiplicativity conjecture for quantum channels”, Theor. Probab. Appl. 47, 143–146 (2002) and arXiv.org preprint math-ph/0103015. [3] K.M.R. Audenaert and S.L. Braunstein, “Strong Superadditivity of the Entanglement of Formation”, Commun. Math. Phys. 246 No 3, 443-452 (2004). [4] K.M.R. Audenaert, F. Verstraete and B. De Moor, “Variational Characterizations of Separability and Entanglement of Formation”, Phys. Rev. A 64, 052304 (2001). [5] C.H. Bennett and G. Brassard, “Quantum cryptography: public key distribution and coin tossing”, Proc. IEEE Intl. Conf. on Computers, Systems and Signal Processing, Bangalore, India, 175–179 (1984). [6] C.H. Bennett, G. Brassard, C. Crépeau, R. Jozsa, A. Peres and W. Wootters, “Teleporting an unknown quantum state via dual classical and EPR channels”, Phys. Rev. Lett. 70, 1895–1899 (1993). [7] C. H. Bennett, H. Bernstein, S. Popescu and B. Schumacher, “Concentrating Partial Entanglement by Local Operations”, Phys. Rev. A 53, 2046 (1996). [8] C.H. Bennett, D.P. DiVincenzo, J.A. Smolin, W.K. Wootters, “Mixed State Entanglement and Quantum Error Correction”, Phys.Rev. A 54, 3824–3851 (1996). [9] R. Bhatia, Matrix Analysis, Springer Verlag, New York (1997). [10] M.-D. Choi, “Completely Positive Linear Maps on Complex Matrices”, Lin. Alg. Appl. 10, 285–290 (1975). [11] V. Buzek and M. Hillery, “Quantum cloning”, Physics World 14, 25–29 (2001). [12] G.M. D’Ariano, M.G.A. Paris and M.F. Sacchi, ‘Quantum tomography”, Advances in Imaging and Electron Physics 128, 205–308 (2003). [13] D. Dieks, “Communication by EPR devices”, Physics Letters A, 92, 271–272 (1982). [14] A.C. Doherty, P.A. Parrilo and F.M. Spedalieri, “A Complete Family of Separability Criteria”, Phys. Rev. 
A 69, 022308 (2004). [15] A. Einstein, B. Podolsky and N. Rosen, Phys. Rev. 47, 777 (1935).
[16] P. Hayden, M. Horodecki and B.M. Terhal, J. Phys. A 34, 6891 (2001). [17] C.W. Helstrom, Quantum detection and estimation theory, Academic Press, New York (1976). [18] A.S. Holevo, “Quantum Coding Theorems”, Russian Math. Surveys 53:6, 1295–1331 (1998) and arXiv.org preprint quant-ph/9808023. [19] R.A. Horn and C.R. Johnson, Matrix Analysis, Cambridge University Press, Cambridge (1985). [20] R.A. Horn and C.R. Johnson, Topics in Matrix Analysis, Cambridge University Press, Cambridge (1991). [21] M. Horodecki, P. Horodecki, and R. Horodecki, “Separability of mixed states: necessary and sufficient conditions”, Phys. Lett. A 223, 1–8 (1996). [22] M. Horodecki, P. Horodecki, R. Horodecki, “Mixed-state entanglement and distillation: is there a “bound” entanglement in nature?”, Phys. Rev. Lett. 80, 5239–5242 (1998). [23] P. Horodecki, M. Horodecki and R. Horodecki, “Entanglement and thermodynamical analogies”, Acta Phys. Slovaca 48, 141 (1998). [24] M. Horodecki, P. Horodecki and R. Horodecki, “Limits for entanglement measures”, Phys. Rev. Lett. 84, 2014 (2000). [25] Ch. King, “Maximization of capacity and lp norms for some product channels”, J. Math. Phys. 43, 1247–1260 (2002) and arXiv.org preprint quant-ph/0103086. [26] Ch. King, “Additivity for unital qubit channels”, J. Math. Phys. 43, 4641–4653 (2002) and arXiv.org preprint quant-ph/0103156. [27] Ch. King, “The capacity of the quantum depolarizing channel”, IEEE Trans. Info. Theory, 49, 221–229 (2003) and arXiv.org preprint quant-ph/0204172. [28] Ch. King, “Maximal p-norms of entanglement breaking channels”, Quant. Info. Comp. 3, 186–190 (2003) and arXiv.org preprint quant-ph/0212057. [29] Ch. King, “Inequalities for trace norms of 2 × 2 block matrices”, Commun. Math. Phys. 242, 531–545 (2003) and arXiv.org preprint quant-ph/0302069. [30] Ch. King and M.B. Ruskai, “Minimal Entropy of States Emerging from Noisy Quantum Channels”, IEEE Trans. Info. Theory, 47, 192–209 (2001) and arXiv.org preprint quant-ph/9911079.
[31] Ch. King and M.B. Ruskai, “Capacity of Quantum Channels Using Product Measurements”, J. Math. Phys. 42, 87–98 (2001) and arXiv.org preprint quant-ph/0004062 (2000). [32] Ch. King and M.B. Ruskai, “Comments on multiplicativity of maximal p norms when p = 2”, Festschrift in honor of A. Holevo’s 60th birthday, and arXiv.org preprint quant-ph/0401026 (2004). [33] K. Kraus, States, Effects, and Operations, Springer-Verlag, Berlin (1983). [34] L.D. Landau and E.M. Lifschitz, Quantum Mechanics, volume 3 of Course of Theoretical Physics, Butterworth-Heinemann (1996). [35] M. Lewenstein, D. Bruß, J. Cirac, B. Kraus, M. Kus, J. Samsonowicz, A. Sanpera and R. Tarrach, arXiv.org preprint quant-ph/0006064 (2000). [36] K. Matsumoto, T. Shimono and A. Winter, “Remarks on additivity of the Holevo channel capacity and of the entanglement of formation”, Comm. Math. Phys. 246, 427–442, (2004). [37] M.A. Nielsen and I.L. Chuang, “Quantum Computation and Quantum Information,” Cambridge University Press, Cambridge (2000). [38] A. Peres, “Separability Criterion for Density Matrices”, Phys. Rev. Lett. 77, 1413–1415 (1996). [39] Asher Peres, Quantum Theory: Concepts and Methods, Kluwer (1993). [40] Martin B. Plenio and S. Virmani, “An introduction to entanglement measures”, arXiv.org preprint quant-ph/0504163 (2005). [41] John Preskill, Caltech PH-229 Lecture Notes, Quantum Computation and Information, http://www.theory.caltech.edu/people/preskill/ph229. [42] B. Schumacher, “Quantum coding”, Phys. Rev. A 51, 2738 (1995).
[43] B. Schumacher, “Information from quantum measurements”, in Complexity, Entropy and the Physics of Information, W.H. Zurek, ed., 29–37, Addison-Wesley, Santa Fe Institute Studies in the Sciences of Complexity, vol. VIII (1990). [44] P. W. Shor, “Equivalence of additivity questions in quantum information theory,” Commun. Math. Phys. 246 No 3 (2004). [45] C. Southgate, ed., God, Humanity and the Cosmos, T&T Clark, Edinburgh (1999). [46] W.F. Stinespring: “Positive functions on C*-algebras”, Proc. Amer. Math. Soc. 6, 211–216 (1955). [47] G. Vidal, W. Dür and J. I. Cirac, “Entanglement cost of mixed states”, Phys. Rev. Lett. 89, 027901 (2002). [48] J. von Neumann, Mathematical Foundations of Quantum Mechanics, Princeton University Press, Princeton (1996). [49] R.F. Werner, “Quantum states with Einstein-Podolsky-Rosen correlations admitting a hidden-variable model”, Phys. Rev. A 40, 4277–4281 (1989). [50] R.F. Werner and A.S. Holevo, “Counterexample to an Additivity Conjecture for Output Purity of Quantum Channels”, J. Math. Phys. 43(9), 4353–4357 (2002) and arXiv.org preprint quant-ph/0203003. [51] H. Woerdeman, “Checking 2 × M quantum separability via semidefinite programming”, Phys. Rev. A 67, 010303 (2003). [52] H. Woerdeman, “The separability problem and normal completions”, Lin. Alg. Appl. 376, 85–95 (2004). [53] W. Wootters, “Entanglement of Formation of an Arbitrary State of Two Qubits”, Phys. Rev. Lett. 80, 2245–2248 (1998). [54] W.K. Wootters and W.H. Zurek, “A Single Quantum Cannot be Cloned”, Nature 299, 802–803 (1982). [55] D. Yang, M. Horodecki, R. Horodecki and B. Synak-Radtke, “Irreversibility for all bound entangled states”, Phys. Rev. Lett. 95, 190501 (2005).
Physics and Theoretical Computer Science J.-P. Gazeau et al. (Eds.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
Dynamical Symmetry Approach to Entanglement

Alexander Klyachko 1
Bilkent University, Turkey

Abstract. In these lectures I explain a connection between geometric invariant theory and entanglement, and give a number of examples of how this approach works.

Keywords. Entanglement, Dynamic symmetry, Geometric invariant theory
1. Physical background

1.1. Classical mechanics

Let me start with the classical nonlinear equation

    d²θ/dt² = −ω² sin θ,   ω² = g/ℓ   (ℓ being the pendulum's length),   (1)
describing the graceful swing of a clock pendulum in a corner of a Victorian drawing room. It has a doubly periodic solution θ(t) = θ(t + T) = θ(t + iτ), with real period T and imaginary period iτ. Out of this equation, carefully studied by Legendre, Abel, and Jacobi, stems the whole theory of elliptic functions. Physicists are less interested in mathematical subtleties, and usually shrink equation (1) to the linear one

    d²θ/dt² = −ω² θ,   |θ| ≪ 1,

with simple harmonic solution θ = e^{±iωt}. This example outlines a general feature of classical mechanics, where linearity appears mainly as a useful approximation.

1 Correspondence to: A. Klyachko, Bilkent University, 06800, Bilkent, Ankara, Turkey. Tel.: +90 312 290 2115; Fax: +90 312 266 4579; E-mail:
[email protected]
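Returning to equation (1): the accuracy of the small-angle linearisation can be quantified, since the exact period of the pendulum is given by the classical elliptic-integral formula T = (4/ω) K(sin²(θ₀/2)), with K the complete elliptic integral of the first kind. A small sketch (ours, for illustration):

```python
import numpy as np
from scipy.special import ellipk  # complete elliptic integral K(m), parameter m = k^2

def period_ratio(theta0):
    """Exact pendulum period of eq. (1) divided by the small-angle period 2*pi/omega.

    For amplitude theta0 the exact period is T = (4/omega) K(sin^2(theta0/2)),
    so the ratio 2 K(m)/pi is independent of omega.
    """
    m = np.sin(theta0 / 2.0) ** 2
    return 2.0 * ellipk(m) / np.pi
```

For small amplitudes the ratio is essentially 1, while for large swings the harmonic approximation underestimates the period substantially, which is exactly why the linearisation carries the restriction |θ| ≪ 1.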
1.2. Quantum mechanics

In striking contrast to this, quantum mechanics is intrinsically linear, and therefore simpler than classical mechanics, in the same way as the analytic geometry of Descartes is simpler than the synthetic geometry of Euclid. As a price for its simplicity, quantum mechanics runs into enormous difficulties in manifesting itself in a harsh macroscopic reality. This is what makes quantum phenomenology so tricky. Mathematicians encounter a similar problem when they try to extract the geometrical gist from a mess of coordinate calculations. In both cases the challenge is to cover the formal bonds of the mathematical skeleton with the flesh of meaning. As we know from Klein's Erlangen program, geometrical meaning rests upon invariant quantities and properties (with respect to a relevant structure group G). This thesis effectively reduces "elementary" geometry to invariant theory. As far as physics is concerned, we have witnessed its progressive geometrization in the last decades [65,25]. To name a few examples: general relativity and gauge theories, from electro-weak interactions to chromodynamics, are all geometrical in their ideal essence. In these lectures, mostly based on the preprint [32], I explain a connection between geometric invariant theory and entanglement, and give a number of examples of how this approach works. One can find further applications in [33,34].

1.3. Von Neumann picture

The background of a quantum system A is a Hilbert space HA, called the state space. Here, by default, the systems are expected to be finite: dim HA < ∞. A pure state of the system is given by a unit vector ψ ∈ HA, or by the projector operator |ψ⟩⟨ψ| if the phase factor is irrelevant. A classical mixture ρ = Σᵢ pᵢ |ψᵢ⟩⟨ψᵢ| of pure states is called a mixed state or density matrix. This is a nonnegative Hermitian operator ρ : HA → HA with unit trace Tr ρ = 1. An observable of the system A is a Hermitian operator XA : HA → HA.
Actual measurement of XA upon the system in state ρ produces a random quantity xA ∈ Spec XA, implicitly determined by the expectations

    ⟨f(xA)⟩_ρ = Tr(ρ f(XA)) = ⟨ψ|f(XA)|ψ⟩

for an arbitrary function f(x) on Spec XA (the second equation holds for a pure state ψ). The measurement process puts the system into an eigenstate ψλ with the observed eigenvalue λ ∈ Spec XA. Occasionally we use the ambiguous notation |λ⟩ for the eigenstate with eigenvalue λ.

1.4. Superposition principle

The linearity of quantum mechanics is embedded from the outset in the Schrödinger equation describing the time evolution of the system,

    iℏ dψ/dt = HA ψ,   (2)
where HA : HA → HA is the Hamiltonian of the system A. Being linear, the Schrödinger equation admits the simple solution
    ψ(t) = U(t) ψ(0),   (3)

where U(t) = exp(−(i/ℏ) ∫₀ᵗ HA(t′) dt′) is the unitary evolution operator. Solutions of the Schrödinger equation (2) form a linear space. This observation is the source of the general superposition principle, which claims that a normalized linear combination aψ + bϕ of realizable physical states ψ, ϕ is again a realizable physical state (with no recipe for how to cook it). This may be the most important revelation about physical reality after the atomic hypothesis. It is extremely counterintuitive and implies, for example, that one can set the celebrated Schrödinger cat into the state ψ = |dead⟩ + |alive⟩, intermediate between death and life. As the BBC put it: "In quantum mechanics it is not so easy to be or not to be." From the superposition principle it follows that the state space of a composite system AB splits into the tensor product HAB = HA ⊗ HB of the state spaces of the components, as opposed to the direct product PAB = PA × PB of configuration spaces in classical mechanics.

1.5. Consequences of linearity

The linearity imposes severe restrictions on possible manipulations with quantum states. Here are a couple of examples.

1.5.1. No-cloning Theorem

Let's start with a notorious claim.

Theorem ([67], [12]). An unknown quantum state can't be duplicated.

Indeed, the cloning process would be given by an operator

    ψ ⊗ (state of the Cloning Machine) → ψ ⊗ ψ ⊗ (another state of the Machine),

which is quadratic in the state vector ψ of the quantum system.

1.5.2. Inaccessibility of quantum information

As another application of linearity, consider the following.

Theorem. No information on a quantum system can be gained without destruction of its state.
Indeed, the measurement process is described by a linear operator

    U : ψ_ini ⊗ Ψ_ini → ψ_fin ⊗ Ψ_fin,

where ψ and Ψ are states of the system and the measurement device respectively. The initial state Ψ_ini of the apparatus is supposed to be fixed once and for all, so that the final state ψ_fin ⊗ Ψ_fin is a linear function of ψ_ini. This is possible only if

• ψ_fin is linear in ψ_ini and Ψ_fin is independent of ψ_ini,
• or vice versa, Ψ_fin is linear in ψ_ini and ψ_fin is independent of ψ_ini.

In the former case the final state of the measurement device contains no information on the system, while in the latter the unknown initial state ψ_ini is completely erased in the measurement process. Immanuel Kant, who persistently defended the absolute reality of the unobservable "thing-in-itself", or noumenon, as opposed to the phenomenon, should be very pleased with this theorem identifying the noumenon with the quantum state. The theorem suggests that complete separation of a system from a measuring apparatus is unlikely. As a rule the system remains entangled with the measuring device, with the two exceptions described above.

1.6. Reduced states and first glimpse of entanglement

The density matrix of a composite system AB can be written as a linear combination of separable states,

    ρAB = Σ_α a_α ρ_A^α ⊗ ρ_B^α,   (4)

where ρ_A^α, ρ_B^α are mixed states of the components A, B respectively, and the coefficients a_α are not necessarily positive. Its reduced matrices or marginal states may be defined by the equations

    ρA = Σ_α a_α Tr(ρ_B^α) ρ_A^α := Tr_B(ρAB),
    ρB = Σ_α a_α Tr(ρ_A^α) ρ_B^α := Tr_A(ρAB).

The reduced states ρA, ρB are independent of the decomposition (4) and can be characterized intrinsically by the following property:

    ⟨XA⟩_{ρAB} = Tr(ρAB XA) = Tr(ρA XA) = ⟨XA⟩_{ρA},   ∀ XA : HA → HA,   (5)
which tells us that ρA is the "visible" state of subsystem A. This justifies the terminology.

Example 1.6.1. Let's identify a pure state of a two-component system,

    ψ = Σ_{ij} ψ_ij α_i ⊗ β_j ∈ HA ⊗ HB,

with its matrix [ψ_ij] in orthonormal bases α_i, β_j of HA, HB. Then the reduced states of ψ in the respective bases are given by the matrices
    ρA = ψ†ψ,   ρB = ψψ†,   (6)

which have the same nonnegative spectra,

    Spec ρA = Spec ρB = λ,   (7)

except for extra zeros if dim HA ≠ dim HB. The isospectrality implies the so-called Schmidt decomposition

    ψ = Σᵢ √λᵢ ψ_i^A ⊗ ψ_i^B,   (8)

where ψ_i^A, ψ_i^B are eigenvectors of ρA, ρB with the same eigenvalue λᵢ. In striking contrast to the classical case, the marginals of a pure state ψ ≠ ψA ⊗ ψB are mixed ones, i.e., as Schrödinger put it, "maximal knowledge of the whole does not necessarily include the maximal knowledge of its parts" [58]. He coined the term entanglement just to describe this phenomenon. The von Neumann entropy of the marginal states provides a natural measure of entanglement:

    E(ψ) = −Tr(ρA log ρA) = −Tr(ρB log ρB) = −Σᵢ λᵢ log λᵢ.   (9)
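The Schmidt decomposition (8) and the entanglement measure (9) are exactly what the singular value decomposition of the matrix [ψ_ij] computes. A minimal numpy sketch (ours; the entropy is taken in base 2, so that it is measured in ebits):

```python
import numpy as np

def schmidt(psi_matrix):
    """Schmidt data of a bipartite pure state identified with its matrix [psi_ij].

    The singular values of [psi_ij] are the Schmidt coefficients sqrt(lambda_i)
    of eq. (8); the reduced states psi^dag psi and psi psi^dag share the spectrum
    {lambda_i} of eq. (7).  Returns (Schmidt coefficients, entropy of eq. (9)).
    """
    coeffs = np.linalg.svd(psi_matrix, compute_uv=False)
    lam = coeffs ** 2
    lam = lam[lam > 1e-12]
    return coeffs, float(-np.sum(lam * np.log2(lam)))

# Bell state (|00> + |11>)/sqrt(2): psi_ij = delta_ij / sqrt(2)
bell = np.eye(2) / np.sqrt(2)
# Product state |0> (x) |0>: rank-one matrix, zero entanglement
prod = np.zeros((2, 2)); prod[0, 0] = 1.0
```

The Bell state yields two equal Schmidt coefficients 1/√2 and one ebit of entanglement; the product state yields a single nonzero coefficient and zero entropy.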
In an equidimensional system, dim HA = dim HB = n, the maximum of entanglement, equal to log n entangled bits (ebits), is attained for a state with scalar reduced matrices ρA, ρB.

1.7. Quantum dynamical systems

In the above discussion we tacitly supposed, following von Neumann, that all observables XA : HA → HA, or, what is the same, all unitary transformations e^{itXA} : HA → HA, are equally accessible for manipulation with quantum states. However, the physical nature of the system may impose unavoidable constraints.

Example 1.7.1. The components of a composite system HAB = HA ⊗ HB may be spatially separated by tens of kilometers, as in EPR pairs used in quantum cryptography. In such circumstances only local observations XA and XB are available. This may be even more compelling if the components are spacelike separated at the moment of measurement.

Example 1.7.2. Consider a system of N identical particles, each with space of internal degrees of freedom H. By the Pauli principle, the state space of such a system shrinks to symmetric tensors S^N H ⊂ H^{⊗N} for bosons, and to skew-symmetric tensors ∧^N H ⊂ H^{⊗N} for fermions. This superselection rule imposes severe restrictions on manipulations with quantum states, effectively reducing the accessible measurements to those of a single particle.

Example 1.7.3. The state space Hs of a spin s system has dimension 2s + 1. Measurements upon such a system are usually confined to spin projections onto a chosen direction. They generate the Lie algebra su(2) rather than the full algebra of traceless operators su(2s + 1).
This consideration led many researchers to the conclusion that the available observables should be included in the description of any quantum system from the outset [24,16]. Robert Hermann stated this thesis as follows: "The basic principles of quantum mechanics seem to require the postulation of a Lie algebra of observables and a representation of this algebra by skew-Hermitian operators." We'll refer to the Lie algebra L as the algebra of observables and to the corresponding group G = exp(iL) as the dynamical symmetry group of the quantum system in question. Its state space H, together with a unitary representation of the dynamical group G : H, is said to be a quantum dynamical system. The choice of the algebra L depends on the measurements we are able to perform over the system, or, what is the same, on the Hamiltonians which are accessible for manipulation with quantum states. For example, if we are restricted to local measurements of a system consisting of two remote components A, B, with full access to the local degrees of freedom, then the dynamical group is SU(HA) × SU(HB) acting in HAB = HA ⊗ HB. In the settings of Example 1.7.2, suppose that a single particle is described by a dynamical system G : H. Then an ensemble of N identical particles corresponds to the dynamical system G : S^N H for bosons, and to G : ∧^N H for fermions. The dynamical group of the spin system from Example 1.7.3 is SU(2) in its spin s representation Hs.
2. Coherent states

Coherent states, first introduced by Schrödinger [57] in 1926, lapsed into obscurity for decades until Glauber [22] recovered them in 1963 in connection with laser emission. He had to wait more than 40 years to win the Nobel Prize in 2005 for three papers published in 1963–64. Later, in the 1970s, Perelomov [47,48] put coherent states into the general framework of dynamical symmetry groups. We'll use a similar approach for entanglement, and to warm up we recall here some basic facts about coherent states.

2.1. Glauber coherent states

Let's start with the quantum oscillator, described by a canonical pair of operators p, q with [q, p] = iℏ, generating the Weyl-Heisenberg algebra W. This algebra has a unique unitary irreducible representation, which can be realized in the Fock space F spanned by the orthonormal set of n-excitation states |n⟩, on which the dimensionless annihilation and creation operators

    a = (q + ip)/√(2ℏ),   a† = (q − ip)/√(2ℏ),   [a, a†] = 1

act by the formulae

    a|n⟩ = √n |n−1⟩,   a†|n⟩ = √(n+1) |n+1⟩.
A typical element of the Weyl-Heisenberg group W = exp W, up to a phase factor, is of the form D(α) = exp(αa† − α∗a) for some α ∈ ℂ. The action of this operator on the vacuum |0⟩ produces the state

    |α⟩ := D(α)|0⟩ = exp(−|α|²/2) Σ_{n≥0} (αⁿ/√(n!)) |n⟩,   (10)

known as a Glauber coherent state. The number of excitations in this state has a Poisson distribution with parameter |α|². In many respects its behavior is close to classical; e.g., Heisenberg's uncertainty ΔpΔq = ℏ/2 for this state is the minimal possible. In the coordinate representation

    q = x,   p = −iℏ d/dx,

its time evolution is given by harmonic oscillation of a Gaussian distribution of width √(ℏ/2) with amplitude √(2ℏ)|α|. Therefore for a large number of photons, |α|² ≫ 1, coherent states behave classically. Recall also Glauber's theorem [23], which claims that a classical field or force excites a quantum oscillator into a coherent state. We'll return to these aspects of coherent states later, and focus now on their mathematical description:

    Glauber coherent states = W-orbit of the vacuum,

which sounds more suggestive than the explicit equation (10).

2.2. General coherent states

Let's now turn to an arbitrary quantum system A with dynamical symmetry group G = exp iL. By definition its Lie algebra L = Lie G is generated by all essential observables of the system (like p, q in the above example). To simplify the underlying mathematics, suppose in addition that the state space HA of the system is finite, and that the representation of G in HA is irreducible. To extend (10) to this general setting we have to understand the special role of the vacuum, which is primarily considered as a ground state of a system. For the group-theoretical approach, however, another of its properties is more relevant:

    Vacuum is a state with maximal symmetry.

This may also be spelled out by saying that the vacuum is a most degenerate state of the system.

2.3. Complexified dynamical group

Symmetries of a state ψ are given by its stabilizers

    Gψ = {g ∈ G | gψ = μ(g)ψ},
    Lψ = {X ∈ L | Xψ = λ(X)ψ}   (11)
in the dynamical group G or in its Lie algebra L = Lie G. Here μ(g) and λ(X) are scalars. Looking back at the quantum oscillator, we see that some symmetries are actually hidden, and manifest themselves only in the complexified algebra Lᶜ = L ⊗ ℂ and group Gᶜ = exp Lᶜ. For example, the stabilizer of the vacuum |0⟩ in the Weyl-Heisenberg algebra W is trivial, W_{|0⟩} = scalars, while in the complexified algebra Wᶜ it contains the annihilation operator, W^c_{|0⟩} = ℂ + ℂa. In the last case the stabilizer is big enough to recover the whole dynamical algebra:

    Wᶜ = W^c_{|0⟩} + (W^c_{|0⟩})†.
This decomposition, called complex polarization, gives a precise meaning to the maximal degeneracy of a vacuum or a coherent state.

2.4. General definition of coherent state

A state ψ ∈ H is said to be coherent if

    Lᶜ = L^c_ψ + (L^c_ψ)†.

In the finite-dimensional case all such decompositions come from a Borel subalgebra, i.e. a maximal solvable subalgebra B ⊂ Lᶜ. The corresponding Borel subgroup B = exp B is a minimal subgroup of Gᶜ with compact factor Gᶜ/B. A typical example is the subgroup of upper triangular matrices in SL(n, ℂ) = complexification of SU(n). It is a basic structural fact that B + B† = Lᶜ, and therefore

    ψ is coherent ⇔ ψ is an eigenvector of B.

In representation theory an eigenstate ψ of B is called a highest vector, and the corresponding eigenvalue λ = λ(X),

    Xψ = λ(X)ψ,   X ∈ B,

is said to be the highest weight. Here are the basic properties of coherent states:

• For an irreducible system G : H the highest vector ψ₀ (= vacuum) is unique.
• There is only one irreducible representation H = Hλ with highest weight λ.
• All coherent states are of the form ψ = gψ₀, g ∈ G.
• A coherent state ψ in a composite system HAB = HA ⊗ HB with dynamical group GAB = GA × GB splits into a product ψ = ψ₁ ⊗ ψ₂ of coherent states of the components.
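As a concrete check of the simplest instance of these notions, the Glauber construction (10) of n° 2.1 can be verified numerically in a truncated Fock space (a sketch of ours; the truncation size is an assumption, ample for α = 1):

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

N = 40                                        # Fock-space truncation
a = np.diag(np.sqrt(np.arange(1.0, N)), k=1)  # annihilation operator: a|n> = sqrt(n)|n-1>
alpha = 1.0

# Displacement operator D(alpha) = exp(alpha a^dag - alpha* a) acting on the vacuum |0>
D = expm(alpha * a.conj().T - np.conj(alpha) * a)
vac = np.zeros(N); vac[0] = 1.0
coh = D @ vac                                 # the Glauber coherent state |alpha> of eq. (10)

mean_n = (coh.conj() @ (a.conj().T @ a) @ coh).real    # should equal |alpha|^2
probs = np.abs(coh) ** 2                                # photon-number statistics
poisson = np.array([np.exp(-abs(alpha) ** 2) * abs(alpha) ** (2 * n) / factorial(n)
                    for n in range(N)])
```

Within truncation error, the excitation-number distribution is Poissonian with parameter |α|², exactly as stated below eq. (10).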
Remark. Coherent state theory, in the form given by Perelomov [48], is a physical equivalent of the Kirillov–Kostant orbit method [31] in representation theory. The complexified group plays a crucial role in our study. Its operational interpretation may vary. Here are a couple of examples.
Example 2.4.1. Spin systems. For a system of spin s (see Example 1.7.3) coherent states have a definite spin projection s onto some direction:

    ψ is coherent ⟺ ψ = |s⟩.

The complexification of the spin group SU(2) is the group of unimodular matrices SL(2, ℂ). The latter is locally isomorphic to the Lorentz group and controls relativistic transformations of spin states in a moving frame.

Example 2.4.2. For a two-component system HAB = HA ⊗ HB with full access to local degrees of freedom the coherent states are the decomposable ones:

    ψAB is coherent ⟺ ψAB = ψA ⊗ ψB.

The dynamical group of this system is G = SU(HA) × SU(HB), see Example 1.7.1. Its complexification Gᶜ = SL(HA) × SL(HB) has an important quantum informational interpretation as the group of invertible Stochastic Local Operations assisted with Classical Communication (SLOCC transformations), see [61]. These are essentially LOCC operations with postselection.

2.5. Total variance

Let's define the total variance of a state ψ by the equation

    D(ψ) = Σᵢ (⟨ψ|Xᵢ²|ψ⟩ − ⟨ψ|Xᵢ|ψ⟩²),   (12)

where Xᵢ ∈ L form an orthonormal basis of the Lie algebra of essential observables with respect to its invariant metric (for the spin group SU(2) one can take for the basis the spin projection operators Jx, Jy, Jz). The total variance is independent of the basis Xᵢ, hence G-invariant. It measures the total level of quantum fluctuations of the system in the state ψ. The first sum in (12) contains the well-known Casimir operator

    C = Σᵢ Xᵢ²,
which commutes with G and hence acts as a scalar in every irreducible representation. Specifically:

Theorem 2.5.1. The Casimir operator C acts in the irreducible representation Hλ of highest weight λ as multiplication by the scalar Cλ = ⟨λ, λ + 2δ⟩.

One can use two dual bases Xᵢ and Xʲ of L, dual with respect to the invariant bilinear form B(Xᵢ, Xʲ) = δᵢⱼ, to construct the Casimir operator

    C = Σᵢ Xᵢ Xⁱ.

For example, take a basis of L consisting of an orthonormal basis Hᵢ of a Cartan subalgebra h ⊂ L and its root vectors Xα ∈ L normalized by the condition B(Xα, X₋α) = 1. Then the dual basis is obtained by the substitution Xα → X₋α, and hence

    C = Σᵢ Hᵢ² + Σ_{α root} Xα X₋α = Σᵢ Hᵢ² + Σ_{α>0} Hα + 2 Σ_{α>0} X₋α Xα,

where in the last equation we use the commutation relation [Xα, X₋α] = Hα. Applying this to the highest vector ψ ∈ H of weight λ, which by definition is annihilated by all operators Xα, α > 0, and satisfies Hψ = λ(H)ψ for H ∈ h, we get

    Cψ = Σᵢ λ(Hᵢ)² ψ + Σ_{α>0} λ(Hα) ψ = ⟨λ, λ + 2δ⟩ ψ,   (13)

where 2δ = Σ_{α>0} α is the sum of the positive roots and ⟨∗, ∗⟩ is the invariant form B translated to the dual space h∗. Hence the Casimir operator C acts as the scalar Cλ = ⟨λ, λ + 2δ⟩ in the irreducible representation with highest weight λ.
2.6. Extremal property of coherent states

For the spin s representation Hs of SU(2) the Casimir is equal to the square of the total moment, C = J² = Jx² + Jy² + Jz² = s(s + 1). Hence

    D(ψ) = ⟨λ, λ + 2δ⟩ − Σᵢ ⟨ψ|Xᵢ|ψ⟩².   (14)

Theorem 2.6.1 (Delbourgo and Fox [11]). A state ψ is coherent iff its total variance is the minimal possible, and in this case D(ψ) = ⟨λ, 2δ⟩.

Let ρ = |ψ⟩⟨ψ| be a pure state and ρ_L its orthogonal projection onto the subalgebra L ⊂ Herm(H) of the algebra of all Hermitian operators in H with the trace metric (X, Y) = Tr(X·Y). Note that, in contrast to R. Hermann [25], we treat L as an algebra of Hermitian, rather than skew-Hermitian, operators, and include the imaginary unit i in the definition of the Lie bracket [X, Y] = i(XY − YX). By definition we have ⟨ψ|X|ψ⟩ = Tr_H(ρX) = Tr_H(ρ_L X). Choose a Cartan subalgebra h ⊂ L containing ρ_L. Then ⟨ψ|Xᵢ|ψ⟩ = Tr_H(ρ_L Xᵢ) = 0 for Xᵢ ⊥ h, and we can restrict the sum in (14) to an orthonormal basis Hᵢ of the Cartan subalgebra h ⊂ L, for which, by the definition of the highest weight, ⟨ψ|H|ψ⟩² ≤ λ(H)² with equality for the highest vector ψ only. Hence

    Σᵢ ⟨ψ|Xᵢ|ψ⟩² = Σᵢ ⟨ψ|Hᵢ|ψ⟩² ≤ Σᵢ λ(Hᵢ)² = ⟨λ, λ⟩,   (15)

and therefore D(ψ) ≥ ⟨λ, λ + 2δ⟩ − ⟨λ, λ⟩ = ⟨λ, 2δ⟩, with equality holding for coherent states only.
The theorem supports the thesis that coherent states are the closest to classical ones, cf. n° 2.1. Note however that such a simple characterization holds only for finite-dimensional systems. The total variance, for example, makes no sense for the quantum oscillator, for which we have the minimal uncertainty ΔpΔq = ℏ/2 instead.

Example 2.6.1. For a coherent state of a spin s system Theorem 2.6.1 gives D(ψ) = s. Hence the amplitude √s of quantum fluctuations for such a state is of smaller order than the spin s, which by Example 2.4.1 has a definite direction. Therefore for s → ∞ such a state looks like a classical rigid body rotating around the spin axis.
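Both numbers can be checked directly for spin 1: the highest-weight state |1,1⟩ attains the minimum D = s = 1, while the weight-zero state |1,0⟩, whose spin expectations all vanish, has D = s(s+1) = 2. A sketch (ours) of the computation of (12):

```python
import numpy as np

def spin_matrices(s):
    """Spin operators Jx, Jy, Jz in the (2s+1)-dimensional representation H_s (hbar = 1)."""
    m = np.arange(s, -s - 1, -1)                                       # weights s, ..., -s
    jp = np.diag(np.sqrt(s * (s + 1) - m[1:] * (m[1:] + 1)), k=1)      # raising operator J+
    Jx = (jp + jp.conj().T) / 2
    Jy = (jp - jp.conj().T) / 2j
    Jz = np.diag(m).astype(complex)
    return Jx, Jy, Jz

def total_variance(psi, s):
    """D(psi) of eq. (12): sum_i <J_i^2> - <J_i>^2 = s(s+1) - sum_i <J_i>^2."""
    D = 0.0
    for J in spin_matrices(s):
        D += (psi.conj() @ J @ J @ psi).real - (psi.conj() @ J @ psi).real ** 2
    return D

coherent = np.array([1.0, 0, 0])   # highest-weight state |1,1>: D = s = 1
weight0 = np.array([0, 1.0, 0])    # |1,0>: all <J_i> vanish, so D = s(s+1) = 2
```

As a sanity check, the three matrices satisfy Jx² + Jy² + Jz² = s(s+1)·1, the Casimir identity used in (14).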
2.7. Quadratic equations defining coherent states

There is another useful description of coherent states, by a system of quadratic equations.

Example 2.7.1. Consider a two-component system HAB = HA ⊗ HB with full access to local degrees of freedom, G = SU(HA) × SU(HB). Coherent states in this case are just the separable states ψ = ψA ⊗ ψB, i.e. those whose matrix [ψ_ij] (see Example 1.6.1) has rank one. Such matrices can be characterized by the vanishing of all minors of order two. Hence coherent states of a two-component system can be described by a system of quadratic equations.

It turns out that a similar description holds for an arbitrary irreducible system G : Hλ with highest weight λ, see [37].

Theorem 2.7.1. A state ψ ∈ Hλ is coherent iff ψ ⊗ ψ is an eigenvector of the Casimir operator C with eigenvalue ⟨2λ + 2δ, 2λ⟩:

    C(ψ ⊗ ψ) = ⟨2λ + 2δ, 2λ⟩ (ψ ⊗ ψ).   (16)
Indeed, if ψ is a highest vector of weight λ then ψ ⊗ ψ is a highest vector of weight 2λ, and equation (16) follows from (13). Vice versa, in terms of an orthonormal basis Xᵢ of the Lie algebra L = Lie G the Casimir operator in the doublet Hλ ⊗ Hλ looks as follows:

    C = Σᵢ (Xᵢ ⊗ 1 + 1 ⊗ Xᵢ)² = Σᵢ Xᵢ² ⊗ 1 + 1 ⊗ Σᵢ Xᵢ² + 2 Σᵢ Xᵢ ⊗ Xᵢ.

Hence under the conditions of the theorem

    ⟨2λ + 2δ, 2λ⟩ = ⟨ψ ⊗ ψ|C|ψ ⊗ ψ⟩ = 2⟨λ + 2δ, λ⟩ + 2 Σᵢ ⟨ψ|Xᵢ|ψ⟩².

It follows that

    Σᵢ ⟨ψ|Xᵢ|ψ⟩² = ⟨λ, λ⟩,

and hence by inequality (15) the state ψ is coherent.
2.7.2 Remark. The above calculation shows that equation (16) is equivalent to Xi ψ ⊗ Xi ψ = λ, λ ψ ⊗ ψ,
(17)
i
which in turn amounts to a system of quadratic equations on the components of a coherent state ψ. Example 2.7.2. For a spin s system the theorem tells that state ψ is coherent iff ψ ⊗ ψ has definite spin 2s. Equations (17) amounts to Jx ψ ⊗ Jx ψ + Jy ψ ⊗ Jy ψ + Jz ψ ⊗ Jz ψ = s2 ψ ⊗ ψ.
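The spin 1 case of equation (17) can be verified directly with the standard spin-1 matrices; a small numerical sketch (state and variable names are ours):

```python
import numpy as np

# spin-1 matrices in the basis |1>, |0>, |-1>
r = 1 / np.sqrt(2)
Jx = np.array([[0, r, 0], [r, 0, r], [0, r, 0]], dtype=complex)
Jy = np.array([[0, -1j * r, 0], [1j * r, 0, -1j * r], [0, 1j * r, 0]])
Jz = np.diag([1.0, 0.0, -1.0]).astype(complex)

def lhs(psi):
    """sum_a (Ja psi) ⊗ (Ja psi), the left-hand side of equation (17)"""
    return sum(np.kron(J @ psi, J @ psi) for J in (Jx, Jy, Jz))

coh = np.array([1, 0, 0], dtype=complex)               # coherent state |1, 1>
ent = np.array([1, 0, 1], dtype=complex) / np.sqrt(2)  # noncoherent state

print(np.allclose(lhs(coh), np.kron(coh, coh)))   # True: (17) holds, s^2 = 1
print(np.allclose(lhs(ent), np.kron(ent, ent)))   # False: (17) fails
```

For the coherent state the Jx and Jy contributions cancel and the Jz term alone reproduces s²ψ ⊗ ψ; for the noncoherent state the quadratic equations fail.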
3. Entanglement

From a thought experiment for testing the very basic principles of quantum mechanics in its early years [15,58], entanglement is nowadays growing into an important tool
for quantum information processing. Surprisingly enough, so far there is no agreement among the experts on the very definition and the origin of entanglement, except a unanimous conviction in its fundamental nature and in the necessity of its better understanding. Here we discuss a novel approach to entanglement [32], based on the dynamical symmetry group, which puts it into a broader context, eventually applicable to all quantum systems. This sheds new light on known results, providing for them a unified conceptual framework, opens a new prospect for further developments of the subject, reveals its deep and unexpected connections with other branches of physics and mathematics, and provides an insight into the conditions in which an entangled state can be stable.

3.1. What is entanglement?

Everybody knows, and nobody understands, what entanglement is. Here are some virtual answers to the question, borrowed from Dagmar Bruß's collection [6]:

• J. Bell: . . . a correlation that is stronger than any classical correlation.
• D. Mermin: . . . a correlation that contradicts the theory of elements of reality.
• A. Peres: . . . a trick that quantum magicians use to produce phenomena that cannot be imitated by classical magicians.
• C. Bennett: . . . a resource that enables quantum teleportation.
• P. Shor: . . . a global structure of the wave function that allows for faster algorithms.
• A. Ekert: . . . a tool for secure communication.
• Horodecki family: . . . the need for the first application of positive maps in physics.

This list should be enhanced with the extensively cited Schrödinger definition given in n◦ 1.6. The very term was coined by Schrödinger in the famous "cat paradox" paper [58], which in turn was inspired by the no less celebrated Einstein–Podolsky–Rosen gedanken experiment [15]. While the latter authors were amazed by the nonlocal nature of the correlations between the involved particles, J.
Bell was the first to note that the correlations themselves, putting aside the nonlocality, are inconsistent with classical realism. Since then Bell's inequalities have been produced in industrial quantities and remain the main tool for testing "genuine" entanglement. Note however that in some cases LOCC operations can transform a classical state into a nonclassical one [54]. Besides, in a sense every quantum system of dimension at least three is nonclassical, see n◦ 3.4 and [40,41]. Below we briefly discuss nonlocality and the violation of classical realism. None of these effects, however, allows us to decisively characterize entangled states. Therefore eventually we turn to another approach, based on the dynamical symmetry group.

3.2. EPR paradox

The decay of a spin zero state into two components of spin 1/2 is subject to a strong correlation between the spin projections of the components, caused by conservation of angular momentum. The correlation creates an apparent information channel between the components, acting beyond their light cones. Let me emphasize that quantum mechanics refutes the possibility that the spin projections were fixed at the moment of decay, rather than at the moment of measurement. Otherwise two spatially separated observers could see the same event, like the burst of a supernova, simultaneously even if they are spacelike separated, see [50]. There is no such "event" or "physical reality" in the Bohm version of the EPR experiment.
This paradox, recognized in the early years of quantum mechanics [15,3], nowadays has many applications, but no intuitive explanation. It is so disturbing that sometimes physicists just ignore it. For example, one of the finest recent books justifies QFT commutation relations as follows [69]: A basic relativistic principle states that if two spacetime points are spacelike with respect to each other then no signal can propagate between them, and hence the measurement of an observable at one of the points cannot influence the measurement of another observable at the other point. Experiments with EPR pairs tell just the opposite [1,19]. I am not in a position to comment on this nonlocality phenomenon, and therefore turn to the less involved Bell approach, limited to the quantum correlations per se.

3.3. Bell's inequalities

Let's start with the classical marginal problem, which asks for the existence of a "body" in Rⁿ with given projections onto some coordinate subspaces R^I ⊂ Rⁿ, I ⊂ {1, 2, . . . , n}, i.e. the existence of a probability density p(x) = p(x1, x2, . . . , xn) with given marginal distributions

p_I(x_I) = ∫_{R^J} p(x) dx_J,   J = {1, 2, . . . , n}\I.

In its discrete version the classical MP amounts to the calculation of the image of a multidimensional simplex, say Δ = {p_ijk ≥ 0 | Σ p_ijk = 1}, under a linear map like

π : R^{ℓmn} → R^{ℓm} ⊕ R^{mn} ⊕ R^{nℓ},   p_ijk → (p_ij, p_jk, p_ki),

p_ij = Σ_k p_ijk,   p_jk = Σ_i p_ijk,   p_ki = Σ_j p_ijk.

The image π(Δ) is the convex hull of π(Vertices Δ). So the classical MP amounts to the calculation of the facets of a convex hull. In high dimensions this may be a computational nightmare [17,52].

Example 3.3.1. Classical realism. Let Xi : HA → HA be observables of a quantum system A. Actual measurement of Xi produces a random quantity xi with values in Spec(Xi) and density pi(xi) implicitly determined by the expectations ⟨f(xi)⟩ = ⟨ψ|f(Xi)|ψ⟩ for all functions f on the spectrum Spec(Xi). For commuting observables Xi, i ∈ I, the random variables xi, i ∈ I have a joint distribution p_I(x_I) defined by the similar equation

⟨f(x_I)⟩ = ⟨ψ|f(X_I)|ψ⟩,   ∀f.   (18)
Classical realism postulates the existence of a hidden joint distribution of all variables xi. This amounts to compatibility of the marginal distributions (18) for commuting sets of observables X_I. Bell inequalities, designed to test classical realism, stem from the classical marginal problem.

Example 3.3.2. Observations of disjoint components of a two qubit system HA ⊗ HB always commute. Let Ai, Bj be spin projection operators in sites A, B onto directions i, j. Their observed values ai, bj = ±1 satisfy the inequality

a1b1 + a2b1 + a2b2 − a1b2 + 2 ≥ 0.

Indeed, the product of the monomials ±aibj in the LHS is equal to −1. Hence one of the monomials is equal to +1 and the sum of the rest is ≥ −3. If all the observables have a hidden joint distribution then taking the expectations we arrive at the Clauser–Horne–Shimony–Holt inequality for testing "classical realism"

⟨ψ|A1B1|ψ⟩ + ⟨ψ|A2B1|ψ⟩ + ⟨ψ|A2B2|ψ⟩ − ⟨ψ|A1B2|ψ⟩ + 2 ≥ 0.   (19)
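For the singlet state and suitably rotated measurement directions the left-hand side of (19) reaches 2 − 2√2 ≈ −0.83 < 0. A numerical sketch (directions restricted to the x–z plane for simplicity; the singlet correlations ⟨σ·a ⊗ σ·b⟩ = −a·b are computed directly):

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
proj = lambda n: n[0] * sx + n[1] * sz          # spin projection onto (x, z) direction

singlet = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)

def corr(a, b, psi):
    """<psi| (sigma.a) ⊗ (sigma.b) |psi>"""
    return (psi.conj() @ np.kron(proj(a), proj(b)) @ psi).real

r = 1 / np.sqrt(2)
a1, a2 = (0, 1), (1, 0)                         # A-side: z and x directions
b1, b2 = (r, r), (r, -r)                        # B-side: rotated by 45 degrees

chsh = (corr(a1, b1, singlet) + corr(a2, b1, singlet)
        + corr(a2, b2, singlet) - corr(a1, b2, singlet) + 2)
print(chsh)   # -> 2 - 2*sqrt(2) ≈ -0.83: inequality (19) is violated
```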
All other marginal constraints can be obtained from inequality (19) by spin flips Ai → ±Ai.

Example 3.3.3. For three qubits with two measurements per site the marginal constraints amount to 53856 independent inequalities, see [53].

Bell's inequalities make it impossible to model quantum mechanics by classical means. In particular, there is no way to reduce quantum computation to classical one.

3.4. Pentagram inequality

Here I'll give an account of nonclassical states in a spin 1 system. Its optical version, called a biphoton, is a hot topic both for theoretical and experimental studies [59,28,64]. The so-called neutrally polarised state of a biphoton is routinely treated as entangled, since a beam splitter can transform it into an EPR pair of photons. This is the simplest one component system which manifests entanglement.

Spin 1 state space may be identified with the complexification of Euclidean space E³,

H = E³ ⊗ C,

where the spin group SU(2), locally isomorphic to SO(3), acts via rotations of E³. The Hilbert space H inherits from E³ bilinear scalar and cross products, denoted by (x, y) and [x, y] respectively. Its Hermitian metric is given by ⟨x|y⟩ = (x*, y), where the star means complex conjugation. In this model the spin projection operator onto a direction ℓ ∈ E³ is given by the equation

J_ℓ ψ = i[ℓ, ψ].

It has a real eigenstate |0⟩ = ℓ and two complex conjugate ones |±1⟩ = (m ± in)/√2, where (ℓ, m, n) is an orthonormal basis of E³. The latter states are coherent, see Example 2.4.1. They may be identified with isotropic vectors
ψ is coherent ⟺ (ψ, ψ) = 0.

Their properties are drastically different from those of real vectors ψ ∈ E³, called completely entangled spin states. They may be characterized mathematically as follows:

ψ is completely entangled ⟺ [ψ, ψ*] = 0.

Recall from Example 2.4.1 that the Lorentz group, being the complexification of SO(3), preserves the bilinear form (x, y). Therefore it transforms a coherent state into another coherent state. This however fails for completely entangled states. Every noncoherent state can be transformed into a completely entangled one by a Lorentz boost. In this respect the Lorentz group plays a rôle similar to the SLOCC transform for two qubits, which allows one to filter a nonseparable state into a completely entangled Bell state, cf. Example 2.4.2. By a rotation every spin 1 state can be put into the canonical form

ψ = m cos ϕ + in sin ϕ,   0 ≤ ϕ ≤ π/4.   (20)
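The E³ ⊗ C model above is easy to verify numerically; a minimal sketch (helper names are ours):

```python
import numpy as np

# spin-1 state space as E3 ⊗ C; bilinear form (x, y) = sum x_i y_i (no conjugation)
bil = lambda x, y: x @ y
l, m, n = np.eye(3)                              # orthonormal basis of E3

Jl = lambda axis, psi: 1j * np.cross(axis, psi)  # spin projection J_l psi = i [l, psi]

plus = (m + 1j * n) / np.sqrt(2)                 # the state |+1> for direction l
print(np.allclose(Jl(l, plus), plus))            # True: eigenstate with eigenvalue +1
print(abs(bil(plus, plus)))                      # -> 0: isotropic, hence coherent
print(np.linalg.norm(np.cross(m, m.conj())))     # -> 0: real m is completely entangled

phi = 0.3                                        # canonical form (20)
psi = m * np.cos(phi) + 1j * n * np.sin(phi)
print(abs(bil(psi, psi)), np.cos(2 * phi))       # both cos(2*phi): the concurrence
```

The last line anticipates the generalized concurrence μ(ψ) = cos 2ϕ introduced below: for the canonical form (20) it equals |(ψ, ψ)|.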
The angle ϕ, or the generalized concurrence μ(ψ) = cos 2ϕ, is the unique intrinsic parameter of a spin 1 state. The extreme values ϕ = 0, π/4 correspond to completely entangled and coherent states respectively.

Observe that J_ℓ²ψ = −[ℓ, [ℓ, ψ]] = ψ − ℓ(ℓ, ψ), so that

S_ℓ = 2J_ℓ² − 1 : ψ → ψ − 2ℓ(ℓ, ψ)

is the reflection in the plane orthogonal to ℓ. Hence S_ℓ² = 1, and the operators S_ℓ and S_m commute iff ℓ ⊥ m. Consider now a cyclic quintuplet of unit vectors ℓ_i ∈ E³, i mod 5, such that ℓ_i ⊥ ℓ_{i+1}, and call it a pentagram. Put S_i := S_{ℓ_i}. Then [S_i, S_{i+1}] = 0 and for all possible values s_i = ±1 of the observables S_i the following inequality holds:

s1s2 + s2s3 + s3s4 + s4s5 + s5s1 + 3 ≥ 0.   (21)

Indeed, the product of the monomials s_i s_{i+1} is equal to +1, hence at least one of them is +1, and the sum of the rest is ≥ −4. Being commutative, the observables S_i, S_{i+1} have a joint distribution. If all S_i had a hidden joint distribution then taking the average of (21) one would get the Bell type inequality

⟨ψ|S1S2|ψ⟩ + ⟨ψ|S2S3|ψ⟩ + ⟨ψ|S3S4|ψ⟩ + ⟨ψ|S4S5|ψ⟩ + ⟨ψ|S5S1|ψ⟩ + 3 ≥ 0   (22)

for testing classical realism. Note that all marginal constraints can be obtained from this inequality by flips S_i → ±S_i. Using the equation S_ℓ = 1 − 2|ℓ⟩⟨ℓ| one can recast it into the geometrical form

Σ_{i mod 5} |⟨ℓ_i|ψ⟩|² ≤ 2 ⟺ Σ_{i mod 5} cos²α_i ≤ 2,   α_i = ∠(ℓ_i, ψ).   (23)
40
A. Klyachko / Dynamical Symmetry Approach to Entanglement
Completely entangled spin states easily violate this inequality. Say, for the regular pentagram and ψ ∈ E³ directed along its axis of symmetry one gets

Σ_{i mod 5} cos²α_i = 5 cos(π/5)/(1 + cos(π/5)) ≈ 2.236 > 2.
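The regular pentagram and the violation above can be reproduced directly; a sketch (the explicit parametrization of the five directions is ours):

```python
import numpy as np

# regular pentagram: five unit vectors l_k with l_k ⊥ l_{k+1} (indices mod 5),
# arranged symmetrically around the z-axis
c = np.cos(np.pi / 5)
cos_theta = np.sqrt(c / (1 + c))        # polar angle making neighbours orthogonal
sin_theta = np.sqrt(1 - cos_theta**2)
ls = [np.array([sin_theta * np.cos(4 * np.pi * k / 5),
                sin_theta * np.sin(4 * np.pi * k / 5),
                cos_theta]) for k in range(5)]

# neighbouring vectors are orthogonal
print(max(abs(ls[k] @ ls[(k + 1) % 5]) for k in range(5)))   # ~ 1e-16

psi = np.array([0.0, 0.0, 1.0])         # entangled state along the symmetry axis
total = sum((l @ psi) ** 2 for l in ls)
print(total)   # -> 5*cos(pi/5)/(1+cos(pi/5)) ≈ 2.236 > 2: (23) is violated
```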
We'll see below that, to a lesser extent, every non-coherent spin state violates inequality (23) for an appropriate pentagram. The coherent states, on the contrary, pass this test for any pentagram. To prove these claims write inequality (23) in the form

⟨ψ|A|ψ⟩ ≤ 2,   A = Σ_{i mod 5} |ℓ_i⟩⟨ℓ_i|,

and observe the following properties of the spectrum λ1 ≥ λ2 ≥ λ3 ≥ 0 of the operator A.

1. Tr A = λ1 + λ2 + λ3 = 5.
2. If the pentagram contains parallel vectors ℓ_i ∥ ℓ_j then λ1 = λ2 = 2, λ3 = 1.
3. For any pentagram with no parallel vectors (a) λ1 > 2, (b) λ3 > 1, (c) λ2 < 2.

Proof. (1) Tr A = Σ_{i mod 5} Tr |ℓ_i⟩⟨ℓ_i| = 5.

(2) Let, say, ℓ1 = ±ℓ3; then ℓ3, ℓ4, ℓ5 form an orthonormal basis of E³. Hence A is the sum of the identity operator |ℓ3⟩⟨ℓ3| + |ℓ4⟩⟨ℓ4| + |ℓ5⟩⟨ℓ5| and the projector |ℓ1⟩⟨ℓ1| + |ℓ2⟩⟨ℓ2| onto the plane ⟨ℓ1, ℓ2⟩.

(3a) Take a unit vector x ∈ ⟨ℓ1, ℓ2⟩ ∩ ⟨ℓ3, ℓ4⟩, so that

x = ⟨ℓ1, x⟩ℓ1 + ⟨ℓ2, x⟩ℓ2 = ⟨ℓ3, x⟩ℓ3 + ⟨ℓ4, x⟩ℓ4.

Then

Ax = ⟨ℓ1, x⟩ℓ1 + ⟨ℓ2, x⟩ℓ2 + ⟨ℓ3, x⟩ℓ3 + ⟨ℓ4, x⟩ℓ4 + ⟨ℓ5, x⟩ℓ5 = 2x + ⟨ℓ5, x⟩ℓ5

and λ1 ≥ ⟨x|A|x⟩ = 2 + |⟨x|ℓ5⟩|² > 2.

(3b) This property is more subtle. It amounts to the positivity of the form

B(x, y) = ⟨x|A − 1|y⟩ = Σ_{i mod 5} ⟨x|ℓ_i⟩⟨ℓ_i|y⟩ − ⟨x|y⟩.

One can show that

det B = 2 ∏_{i<j} sin²(∠(ℓ_i, ℓ_j)).
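Properties (1)–(3) and the determinant identity for B = A − 1 can be checked numerically on a generic pentagram; a sketch (the random-pentagram construction below is ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def unit(v):
    return v / np.linalg.norm(v)

def random_perp(l):
    """random unit vector orthogonal to l"""
    v = rng.normal(size=3)
    return unit(v - (v @ l) * l)

# build a generic pentagram: l_i ⊥ l_{i+1}, indices mod 5
l1 = unit(rng.normal(size=3))
l2 = random_perp(l1)
l3 = random_perp(l2)
l4 = random_perp(l3)
l5 = unit(np.cross(l4, l1))      # the direction orthogonal to both l4 and l1
ls = [l1, l2, l3, l4, l5]

A = sum(np.outer(l, l) for l in ls)
lam = np.sort(np.linalg.eigvalsh(A))[::-1]       # lambda1 >= lambda2 >= lambda3
print(np.trace(A))                               # -> 5       (property 1)
print(lam[0] > 2, lam[2] > 1, lam[1] < 2)        # -> True True True (property 3)

# determinant identity for B = A - 1: sin^2 of the angle is 1 - (l_i . l_j)^2
prod = np.prod([1 - (ls[i] @ ls[j]) ** 2 for i in range(5) for j in range(i + 1, 5)])
print(np.linalg.det(A - np.eye(3)), 2 * prod)    # the two numbers agree
```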
This implies that B is nondegenerate for every pentagram of noncollinear vectors. Therefore B has the same inertia index for all such pentagrams. Finally one can check that for the regular pentagram B is positive.

(3c) Follows from (1), (3a), and (3b).

Theorem 3.4.1. Bell's inequality ⟨ψ|A|ψ⟩ ≤ 2 holds for a coherent state ψ and any pentagram, while every non-coherent state violates this inequality for some pentagram.

Proof. Take ψ = m cos ϕ + in sin ϕ, 0 ≤ ϕ ≤ π/4, in the canonical form (20). Then

⟨ψ|A|ψ⟩ = ⟨m|A|m⟩ cos²ϕ + ⟨n|A|n⟩ sin²ϕ.

To violate Bell's inequality we have to make the right hand side maximal. This happens for m = |λ1⟩, the eigenvector of A with maximal eigenvalue λ1, and n = |λ2⟩. The maximal value thus obtained is

⟨ψ|A|ψ⟩_max = λ1 cos²ϕ + λ2 sin²ϕ = (λ1 + λ2)/2 + ((λ1 − λ2)/2) cos 2ϕ.   (24)

For a coherent state 2ϕ = π/2 we arrive at Bell's inequality

⟨ψ|A|ψ⟩_max = (λ1 + λ2)/2 = (5 − λ3)/2 ≤ 2,

which holds for all pentagrams by property (3b). The other part of the theorem follows from the following

Claim. For every noncoherent state 0 ≤ ϕ < π/4 there exists a pentagram such that

⟨ψ|A|ψ⟩_max = (λ1 + λ2)/2 + ((λ1 − λ2)/2) cos 2ϕ > 2.

Indeed, for a degenerate pentagram Π, containing parallel vectors, the corresponding operator A has the multiple eigenvalue λ1 = λ2 = 2 and the simple one λ3 = 1. In this case equation (24) amounts to ⟨ψ|A|ψ⟩_max = 2. Let Ã be the operator corresponding to a small nondegenerate ε-perturbation Π̃ of the pentagram Π, and λ̃_i be its spectrum. Then for the simple eigenvalue λ3 we have by property (3b)

Δ(λ3) = λ̃3 − λ3 = O(ε) > 0,

and hence Δ(λ1 + λ2) = Δ(5 − λ3) = O(ε) < 0. Hereafter O(ε) denotes a quantity of exact order ε. The increment of the multiple roots λ1, λ2 is of smaller order,

Δ(λ1) = O(√ε) > 0,   Δ(λ2) = O(√ε) < 0,   Δ(λ1 − λ2) = O(√ε) > 0,

where the signs of the increments are derived from properties (3a) and (3c). As a result

Δ(⟨ψ|A|ψ⟩_max) = Δ((λ1 + λ2)/2 + ((λ1 − λ2)/2) cos 2ϕ) = O(ε) + O(√ε) = O(√ε) > 0,
provided cos 2ϕ > 0 and ε ≪ 1. Hence for a noncoherent state Bell's inequality fails: ⟨ψ|Ã|ψ⟩_max > 2.

3.4.2 Remark. The product of orthogonal reflections S_i S_{i+1} in the pentagram inequality (22) is a rotation by angle π in the plane ⟨ℓ_i, ℓ_{i+1}⟩, i.e. S_i S_{i+1} = 1 − 2J²_{[ℓ_i, ℓ_{i+1}]}, and the inequality can be written in the form

⟨ψ|J²_{[ℓ1,ℓ2]}|ψ⟩ + ⟨ψ|J²_{[ℓ2,ℓ3]}|ψ⟩ + ⟨ψ|J²_{[ℓ3,ℓ4]}|ψ⟩ + ⟨ψ|J²_{[ℓ4,ℓ5]}|ψ⟩ + ⟨ψ|J²_{[ℓ5,ℓ1]}|ψ⟩ ≤ 4.

Observe that ℓ_i, ℓ_{i+1}, [ℓ_i, ℓ_{i+1}] are orthogonal and therefore J²_{ℓ_i} + J²_{ℓ_{i+1}} + J²_{[ℓ_i, ℓ_{i+1}]} = 2. This allows us to return to the operators J_i = J_{ℓ_i}:

⟨ψ|J1²|ψ⟩ + ⟨ψ|J2²|ψ⟩ + ⟨ψ|J3²|ψ⟩ + ⟨ψ|J4²|ψ⟩ + ⟨ψ|J5²|ψ⟩ ≥ 3.

The last inequality can be tested experimentally by measuring J_ℓ and calculating the average of J_ℓ². Thus we managed to test classical realism in the framework of a spin 1 dynamical system in which no two operators J_ℓ ∈ su(2) commute, cf. Example 3.3.1. The trick is that squares of the operators may commute.

3.4.3 Remark. The difference between coherent and entangled spin states disappears for the full group SU(H). Hence with respect to this group all states are nonclassical, provided dim H ≥ 3, cf. [49].

3.5. Call for a new approach

Putting aside the highly publicized philosophical aspects of entanglement, its physical manifestation is usually associated with two phenomena:

• violation of classical realism,
• nonlocality.

As we have seen above, every state of a system of dimension ≥ 3 with full dynamical group SU(H) is nonclassical. Therefore violation of classical realism is a general feature of quantum mechanics, in no way specific to entanglement. Nonlocality, understood as a correlation beyond the light cones of the systems, is a more subtle and enigmatic effect. It tacitly presumes spatially separated components in the system. This premise eventually ended up with the formal identification of entangled states with nonseparable ones.
The whole understanding of entanglement was formed under the heavy influence of two qubit, or more generally two component, systems, for which the Schmidt decomposition (8) gives a transparent description and quantification of entanglement. However, later on it became clear that entanglement does manifest itself in systems with no clearly separated components, e.g.

• Entanglement in an ensemble of identical bosons or fermions [35,21,20,56,14,36,63,60,68,44].
• Single particle entanglement, or entanglement of internal degrees of freedom, see [7,30] and references therein.

Nonlocality is meaningless for a condensate of identical bosons or fermions with strongly overlapping wave functions. Nevertheless we still can distinguish a coherent Bose–Einstein condensate of bosons Ψ = ψ^N, or a Slater determinant for fermions Ψ = ψ1 ∧ ψ2 ∧ . . . ∧ ψN, from generic entangled states in these systems. Recall that entangled states of the biphoton were extensively studied experimentally [59,28], and Bell inequalities can be violated in such a simple system as a spin 1 particle, see n◦ 3.4. Thus nonlocality, being indisputably the most striking manifestation of entanglement, is not its indispensable constituent. See also [40,41]. Lack of common ground has already led to a controversy in the understanding of entanglement in bosonic systems, see n◦ 3.8, and the Zen question about single particle entanglement calls for a completely novel approach.

Note finally that there is no place for entanglement in the von Neumann picture, where the full dynamical group SU(H) makes all states equivalent, see n◦ 1.7. Entanglement is an effect caused by superselection rules or symmetry breaking which reduce the dynamical group to a subgroup G ⊂ SU(H) small enough to create an intrinsic difference between states. For example, entanglement in a two component system HA ⊗ HB comes from the reduction of the dynamical group to SU(HA) × SU(HB) ⊂ SU(HA ⊗ HB). Therefore entanglement must be studied within the framework of quantum dynamical systems.

3.6. Completely entangled states

Roughly speaking, we consider entanglement as a manifestation of quantum fluctuations in a state where they come to their extreme. Specifically, we look for states with maximal total variance

D(ψ) = Σ_i (⟨ψ|Xi²|ψ⟩ − ⟨ψ|Xi|ψ⟩²) = max.

It follows from equation (14) that the maximum is attained for a state ψ with zero expectation of all essential observables:

⟨ψ|X|ψ⟩ = 0,   ∀X ∈ L.   Entanglement equation   (25)

We use this condition as the definition of a completely entangled state and refer to it as the entanglement equation. Let's outline its distinctive features.

• Equation (25) tells us that in a completely entangled state the system is at the center of its quantum fluctuations.
• This ensures maximality of the total variance, i.e. of the overall level of quantum fluctuations in the system. In this respect completely entangled states are opposite to coherent ones, and may be treated as extremely nonclassical. They should manifest purely quantum effects, like violation of classical realism, to the utmost.
• Maybe the main flaw of the conventional approach is the lack of a physical quantity associated with entanglement. In contrast to this, we consider entanglement as a manifestation of quantum fluctuations in a state where they come to their extreme. This, for example,
may help to understand the stabilizing effect of the environment on an entangled state, see [9].
• The entanglement equation (25) and the maximality of the total fluctuations play an important heuristic rôle, similar to variational principles in mechanics. It also has a transparent geometrical meaning discussed below in n◦ 3.7. This interpretation puts entanglement into the framework of Geometric Invariant Theory, which provides powerful methods for solving quantum informational problems [33].
• The total level of quantum fluctuations in an irreducible system G : Hλ varies in the range

⟨λ, 2δ⟩ ≤ D(ψ) ≤ ⟨λ, λ + 2δ⟩   (26)

with the minimum attained at coherent states, and the maximum at completely entangled ones, see n◦ 2.6. For a spin s system this amounts to s ≤ D(ψ) ≤ s(s + 1).
• The extremely high level of quantum fluctuations makes every completely entangled state manifestly nonclassical, see Example 3.6.2 below.
• The above definition makes sense for any quantum system G : H and is in conformity with the conventional one when the latter is applicable, e.g. for multi-component systems, see Example 3.6.3. For a spin 1 system completely entangled spin states coincide with the so called neutrally polarized states of the biphoton, see n◦ 3.4 and [59,28].
• As expected, the definition is G-invariant, i.e. the dynamical group transforms a completely entangled state ψ into a completely entangled one gψ, g ∈ G.

3.6.1 Remark. There are a few systems where completely entangled states fail to exist; e.g. in a quantum system H with full dynamical group G = SU(H) all states are coherent. In this case the total variance (12) still attains some maximum, but it doesn't satisfy the entanglement equation (25). We use the different terms maximally and completely entangled states to distinguish these two possibilities and to stress the conceptual, rather than quantitative, origin of genuine entanglement governed by equation (25). In most cases these notions are equivalent, and all exceptions are actually known, see n◦ 3.9. To emphasize the aforementioned difference we call a quantum system G : H stable if it contains a completely entangled state, and unstable otherwise.

Example 3.6.1. The conventional definition of entanglement explicitly refers to a composite system, which from our point of view is no more reasonable for entangled states than for coherent ones. As an example let's consider a completely entangled state ψ ∈ Hs of a spin s system. According to the definition this means that the average spin projection onto every direction should be zero:

⟨ψ|J_ℓ|ψ⟩ = 0.
This certainly can't happen for s = 1/2, since in this case all states are coherent and have definite spin projection 1/2 onto some direction. But for s ≥ 1 such states do exist and will be described later in n◦ 3.11. For example, one can take ψ = |0⟩ for integral spin s, and

ψ = (|+s⟩ − |−s⟩)/√2

for any s ≥ 1. They have extremely big fluctuations D(ψ) = s(s + 1), and therefore are manifestly nonclassical: the average spin projection onto every direction is zero, while the standard deviation √(s(s + 1)) exceeds the maximum s of the spin projection.
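Both claims of Example 3.6.1 can be verified with the standard spin-s matrices; a minimal sketch for s = 2 (helper names are ours):

```python
import numpy as np

def spin_ops(s):
    """spin-s matrices Jx, Jy, Jz in the basis |s>, ..., |-s>"""
    m = np.arange(s, -s - 1, -1)
    jp = np.diag(np.sqrt(s * (s + 1) - m[1:] * (m[1:] + 1)), 1)
    return (jp + jp.T) / 2, (jp - jp.T) / (2j), np.diag(m).astype(complex)

s = 2
ops = spin_ops(s)
psi = np.zeros(int(2 * s + 1), dtype=complex)
psi[0], psi[-1] = 1 / np.sqrt(2), -1 / np.sqrt(2)   # (|+s> - |-s>)/sqrt(2)

means = [abs(psi.conj() @ J @ psi) for J in ops]
D = sum((psi.conj() @ J @ J @ psi).real for J in ops) - sum(x ** 2 for x in means)
print(max(means))   # -> 0: zero spin expectation in every direction
print(D)            # -> 6.0 = s(s+1): the total variance is maximal
```

Since Jx, Jy, Jz span su(2), vanishing of these three expectations already gives ⟨ψ|J_ℓ|ψ⟩ = 0 for every direction ℓ.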
Example 3.6.2. This consideration can be literally transferred to an arbitrary irreducible system G : Hλ, using the inequality ⟨λ, λ⟩ < ⟨λ, λ + 2δ⟩ instead of s² < s(s + 1), to the effect that a completely entangled state of any system is nonclassical.

Example 3.6.3. The entanglement equation (25) implies that a state of a multicomponent system, say ψ ∈ HABC = HA ⊗ HB ⊗ HC, is completely entangled iff its marginals ρA, ρB, ρC are scalar operators. This observation is in conformity with the conventional approach to entanglement [13], cf. also Example 1.6.1.

3.7. General entangled states and stability

From an operational point of view a state ψ ∈ H is entangled iff one can filter out from ψ a completely entangled state ψ0 using SLOCC operations. As we know from Example 2.4.2, in standard quantum information settings the SLOCC group coincides with the complexification Gc of the dynamical group G. This leads us to the following

Definition 3.7.1. A state ψ ∈ H of a dynamical system G : H is said to be entangled iff it can be transformed into a completely entangled state ψ0 = gψ by the complexified group Gc (possibly asymptotically, ψ0 = lim_i g_iψ for some sequence g_i ∈ Gc).

In Geometric Invariant Theory such states ψ are called stable (or semistable if ψ0 can be reached only asymptotically). Their intrinsic characterization is one of the central problems both in Invariant Theory and in Quantum Information. The relation between these two theories can be summarized in the following table, with some entries to be explained below.

DICTIONARY

Quantum Information | Invariant Theory
Entangled state | Semistable vector
Disentangled state | Unstable vector
SLOCC transform | Action of complexified group Gc
Completely entangled state ψ0 prepared from ψ by SLOCC | Minimal vector ψ0 in complex orbit of ψ
State obtained from completely entangled one by SLOCC | Stable vector
Completely entangled states can be characterized by the following theorem, known as Kempf–Ness unitary trick.
Theorem 3.7.2 (Kempf–Ness [29]). A state ψ ∈ H is completely entangled iff it has minimal length in its complex orbit,

|ψ| ≤ |g · ψ|,   ∀g ∈ Gc.   (27)
The complex orbit Gcψ contains a completely entangled state iff it is closed. In this case the completely entangled state is unique up to the action of G.

3.7.3 Remark. Recall that an entangled state ψ can be asymptotically transformed by SLOCC into a completely entangled one. By the Kempf–Ness theorem the question of when this can be done effectively depends on whether the complex orbit of ψ is closed or not. The following result gives a necessary condition for this.

Theorem (Matsushima [42]). The complex stabilizer (Gc)ψ of a stable state ψ coincides with the complexification of its compact stabilizer (Gψ)c.

The square of the length of the minimal vector in the complex orbit,

μ(ψ) = inf_{g ∈ Gc} |gψ|²,   (28)

provides a natural quantification of entanglement. It amounts to cos 2ϕ for the spin 1 state (20), to the concurrence C(ψ) [26] in two qubits, and to the square root of the 3-tangle τ(ψ) for three qubits (see below). We call it the generalized concurrence. Evidently 0 ≤ μ(ψ) ≤ 1. The equation μ(ψ) = 1 tells us that ψ is already a minimal vector, hence a completely entangled state. Nonvanishing of the generalized concurrence μ(ψ) > 0 means that the closure of the complex orbit Gcψ doesn't contain zero. Then the orbit of minimal dimension O in this closure is closed and nonzero. Hence by the Kempf–Ness unitary trick it contains a completely entangled state ψ0 ∈ O which can asymptotically be obtained from ψ by the action of the complexified dynamical group. Therefore by Definition 3.7.1

μ(ψ) > 0 ⟺ ψ is entangled.

3.8. Coherent versus unstable states

The minimal value μ(ψ) = 0 corresponds to unstable vectors that can asymptotically fall into zero under the action of the complexified dynamical group. They form the so-called null cone. It contains all coherent states, along with some other degenerate states, like the W state in three qubits, see Example 3.10.1. Noncoherent unstable states cause many controversies. There is unanimous agreement that coherent states are disentangled. In the approach pursued in [63] all noncoherent states are treated as entangled. Other researchers [21,20] argue that some noncoherent unstable bosonic states are actually disentangled. From our operational point of view all unstable states should be treated as disentangled, since they can't be filtered into a completely entangled state even asymptotically. Therefore we accept the equivalence

DISENTANGLED ⟺ UNSTABLE ⟺ NOT SEMISTABLE.
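For the spin 1 state (20) the infimum (28) can be found by scanning a one-parameter family of Lorentz boosts. Since |(gψ, gψ)| = |(ψ, ψ)| is Gc-invariant and bounds |gψ|² from below, this family already attains the global minimum μ(ψ) = cos 2ϕ. A sketch (the boost parametrization is ours):

```python
import numpy as np

# spin-1 state in the E3 ⊗ C model, canonical form (20)
phi = 0.3
m, n = np.array([1.0, 0, 0]), np.array([0, 1.0, 0])
psi = m * np.cos(phi) + 1j * n * np.sin(phi)

# a Lorentz boost exp(i*theta*J_l), l = [m, n], mixes m and n:
#   m -> cosh(theta) m + i sinh(theta) n,  n -> -i sinh(theta) m + cosh(theta) n
def boosted_norm2(theta):
    gm = np.cosh(theta) * m + 1j * np.sinh(theta) * n
    gn = -1j * np.sinh(theta) * m + np.cosh(theta) * n
    gpsi = gm * np.cos(phi) + 1j * gn * np.sin(phi)
    return np.linalg.norm(gpsi) ** 2

mu = min(boosted_norm2(t) for t in np.linspace(-2, 2, 20001))
print(mu, np.cos(2 * phi))   # both ≈ 0.8253: mu(psi) = cos(2*phi) = |(psi, psi)|
```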
3.8.1. Systems in which all unstable states are coherent

The above controversy vanishes iff the null cone contains only coherent states, or equivalently the dynamical group G acts transitively on unstable states. Spin one and two qubit systems are the most notorious examples. They are low dimensional orthogonal systems with dynamical group SO(n) acting in Hn = En ⊗ C by Euclidean rotations. The null cone in this case consists of isotropic vectors (x, x) = 0, which are at the same time coherent states, cf. n◦ 3.4.

Theorem 3.8.1. A stable irreducible system G : H in which all unstable states are coherent is one of the following:

• Orthogonal system SO(H) : H,
• Spinor representation of the group Spin(7) of dimension 8,
• Exceptional group G2 in the fundamental representation of dimension 7.

The theorem can be deduced from Theorem 2.7.1 characterizing coherent states by quadratic equations. Indeed, the null cone is given by the vanishing of all invariants. Hence in the conditions of the theorem the fundamental invariants should have degree two. For an irreducible representation there is at most one invariant of degree two, the invariant metric (x, y). Thus the problem reduces to the description of subgroups G ⊂ SO(H) acting transitively on the isotropic cone (x, x) = 0. The metric (x, x) is the unique basic invariant of such a system. Looking into the table in the Vinberg–Popov book [62] we find only one indecomposable system with a unique basic invariant of degree two not listed in the theorem: the spinor representation of Spin(9) of dimension 16 studied by Igusa [27]. However, as we'll see below, the action of this group Spin(9) on the isotropic cone is not transitive.

Coherent states of a decomposable irreducible system GA × GB : HA ⊗ HB are products ψA ⊗ ψB of coherent states of the components. Hence the codimension of the cone of coherent states is at least dAdB − dA − dB + 1 = (dA − 1)(dB − 1). As we've seen above, in the conditions of the theorem the codimension should be equal to one, which is possible only for a system of two qubits dA = dB = 2, which is equivalent to an orthogonal system of dimension four. One can also argue that a projective quadric Q : (x, x) = 0 of dimension greater than two is indecomposable, Q ≠ X × Y.
Both exceptional systems carry an invariant symmetric form (x, y). The scalar square (x, x) generates the algebra of invariants, and therefore the null cone consists of isotropic vectors (x, x) = 0, as in the orthogonal case. These mysterious systems emerge also as exceptional holonomy groups of Riemannian manifolds [2]. Their physical meaning is unclear.

Élie Cartan [8] carefully studied coherent states in irreducible (half)spinor representations of Spin(n) of dimension 2^ν, ν = ⌊(n − 1)/2⌋. He calls them pure spinors. In general the cone of pure spinors is the intersection of 2^(ν−1)(2^ν + 1) − C(2ν + 1, ν) linearly independent quadrics, where C denotes the binomial coefficient. For n < 7 there are no equations, i.e. all states are coherent. In such systems there is no entanglement whatsoever, and we exclude them from the theorem. These systems are very special and have a transparent physical interpretation.

• For n = 3 the spinor representation of dimension two identifies Spin(3) with SU(2). The vector representation of SO(3) is just the spin 1 system, studied in n◦ 3.4.
• The two dimensional halfspinor representations of Spin(4) identify this group with SU(2) × SU(2) and the orthogonal system of dimension 4 with two qubits.
• For n = 5 the spinor representation H4 of dimension 4 carries an invariant symplectic form ω and identifies Spin(5) with the symplectic group Sp(H4, ω). The standard vector representation of SO(5) in this setting can be identified with the space of skew symmetric forms on H4 modulo the defining form ω.
• For n = 6 halfspinor representations of dimension 4 identify Spin (6) with SU (H4 ) and the orthogonal system of dimension 6 with SU (H4 ) : ∧2 H4 . This is a system of two fermions of rank 4. The previous group Spin (5) Sp (H4 ) is just a stabilizer of a generic state ω ∈ ∧2 H4 . In the next case n = 7 coherent states are defined by the single equation (x, x) = 0 and coincide with unstable ones. Thus we arrive at the first special system Spin (7) : H8 . The stabilizer of a non isotropic spinor ψ ∈ H8 , (ψ, ψ) = 0 in Spin (7) is the exceptional group G2 and its representation in the orthogonal complement to ψ gives the second system G2 : H7 . Alternatively it can be described as the representation of the automorphism group of Cayley octonionic algebra in the space of purely imaginary octaves. Halfspinor representations of Spin (8) : H8 also carry invariant symmetric form (x, y). It follows that Spin (8) acts on halfspinors as full group of orthogonal transformations. Hence these representations are geometrically equivalent to the orthogonal system SO (H8 ) : H8 . The equivalence is known as Cartan’s triality [8]. Finally the spinor representation of Spin (9) of dimension 16 also carries the invariant symmetric form (x, y) which is the unique basic invariant of this representation. However according to Cartan’s formula the cone of pure spinors is the intersection of 10 independent quadrics, hence differs from the null cone (x, x) = 0. 3.8.2. Fermionic realization of spinor representations Spinor representations of the two-fold covering Spin (2n) of the orthogonal group SO (2n) have a natural physical realization . Recall that all quadratic expressions in creation and annihilation operators a†i , aj , i, j = 1 . . . n in a system of fermions with n intrinsic degrees of freedom form the orthogonal Lie algebra so (2n) augmented by scalar operator (to avoid scalars one has to use 12 (a†i ai − ai a†i ) instead of a†i ai , ai a†i ). 
It acts in the fermionic Fock space F(n), and is known as the spinor representation of so(2n). In contrast to the bosonic case it has finite dimension dim F(n) = 2^n and splits into two halfspinor irreducible components F(n) = F_ev(n) ⊕ F_odd(n), containing even and odd numbers of fermions respectively. For fermions of dimension n = 4 the halfspinors can be transformed into vectors by Cartan's triality. This provides a physical interpretation of the orthogonal system of dimension 8. To sum up, the orthogonal systems of dimension n = 3, 4, 6, 8 have the following physical description:
• n = 3. Spin 1 system.
• n = 4. Two qubit system.
• n = 6. System of two fermions SU(H_4) : ∧²H_4 of dimension 4.
• n = 8. System of fermions of dimension 4 with a variable number of particles (either even or odd).
The last example is a fermionic analogue of the system of quantum oscillators no. 2.1. The lack of the aforementioned controversy makes the description of pure and mixed entanglement in orthogonal systems very transparent, and quite similar to that of the two-qubit and spin 1 systems.
3.9. Unstable systems
Halfspinor representations of the next group Spin(10) were discussed as an intriguing possibility that quarks and leptons may be composed of five different species of fundamental fermionic objects [69,66]. This is a very special system in which all states are unstable, hence disentangled. In other words, the null cone amounts to the whole state space and there is no genuine entanglement governed by equation (25). Such systems are opposite to those considered in the preceding section, where the null cone is as small as possible. We call them unstable. There are very few types of such indecomposable irreducible dynamical systems [62,43]:
• the unitary system SU(H) : H;
• the symplectic system Sp(H) : H;
• the system of two fermions SU(H) : ∧²H of odd dimension dim H = 2k + 1;
• a halfspinor representation of dimension 16 of Spin(10).
All (half)spinor irreducible representations for n < 7 fall into this category. There are many more such composite systems, and their classification is also known, due to M. Sato and T. Kimura [55].
3.10. Classical criterion of entanglement
The Kempf–Ness theorem 3.7.2 identifies closed orbits of the complexified group G^c with completely entangled states modulo the action of G. Closed orbits can be separated by G-invariant polynomials. This leads to the following classical criterion of entanglement.
Theorem 3.10.1 (Classical Criterion). A state ψ ∈ H is entangled iff it can be separated from zero by a G-invariant polynomial,
f(ψ) ≠ f(0),  f(gx) = f(x), ∀g ∈ G, x ∈ H.
Example 3.10.1. For a two-component system ψ ∈ H_A ⊗ H_B all invariants are polynomials in det[ψ_ij] (there are no invariants for dim H_A ≠ dim H_B). Hence a state is entangled iff det[ψ_ij] ≠ 0. The generalized concurrence (28) is related to this basic invariant by the equation μ(ψ) = n |det[ψ_ij]|^{2/n}. The unique basic invariant for 3 qubits is the Cayley hyperdeterminant [18]
Det[ψ] = (ψ_{000}² ψ_{111}² + ψ_{001}² ψ_{110}² + ψ_{010}² ψ_{101}² + ψ_{011}² ψ_{100}²)
 − 2(ψ_{000}ψ_{001}ψ_{110}ψ_{111} + ψ_{000}ψ_{010}ψ_{101}ψ_{111} + ψ_{000}ψ_{011}ψ_{100}ψ_{111} + ψ_{001}ψ_{010}ψ_{101}ψ_{110} + ψ_{001}ψ_{011}ψ_{110}ψ_{100} + ψ_{010}ψ_{011}ψ_{101}ψ_{100})
 + 4(ψ_{000}ψ_{011}ψ_{101}ψ_{110} + ψ_{001}ψ_{010}ψ_{100}ψ_{111}),
related to the 3-tangle [10] and the generalized concurrence (28) by the equations
τ(ψ) = 4|Det[ψ]|,  μ(ψ) = √τ(ψ).
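These relations are easy to probe numerically. A minimal sketch (the helper `hyperdet` and the state dictionary are ours, not from the text) evaluates the hyperdeterminant formula above for the GHZ state (|000⟩ + |111⟩)/√2, where only the first group of terms survives, Det = 1/4 and hence τ = 1:

```python
from itertools import product

def hyperdet(psi):
    """Cayley hyperdeterminant Det[psi] of a 2x2x2 amplitude array,
    term by term as in the displayed formula (amplitudes keyed by bit triples)."""
    a = lambda i, j, k: psi[(i, j, k)]
    sq = (a(0,0,0)**2 * a(1,1,1)**2 + a(0,0,1)**2 * a(1,1,0)**2
          + a(0,1,0)**2 * a(1,0,1)**2 + a(0,1,1)**2 * a(1,0,0)**2)
    db = 2 * (a(0,0,0)*a(0,0,1)*a(1,1,0)*a(1,1,1)
              + a(0,0,0)*a(0,1,0)*a(1,0,1)*a(1,1,1)
              + a(0,0,0)*a(0,1,1)*a(1,0,0)*a(1,1,1)
              + a(0,0,1)*a(0,1,0)*a(1,0,1)*a(1,1,0)
              + a(0,0,1)*a(0,1,1)*a(1,1,0)*a(1,0,0)
              + a(0,1,0)*a(0,1,1)*a(1,0,1)*a(1,0,0))
    quad = 4 * (a(0,0,0)*a(0,1,1)*a(1,0,1)*a(1,1,0)
                + a(0,0,1)*a(0,1,0)*a(1,0,0)*a(1,1,1))
    return sq - db + quad

ghz = {idx: 0.0 for idx in product((0, 1), repeat=3)}
ghz[(0, 0, 0)] = ghz[(1, 1, 1)] = 2 ** -0.5   # (|000> + |111>)/sqrt(2)
tau_ghz = 4 * abs(hyperdet(ghz))              # 3-tangle; equals 1 for GHZ
```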
One can check that the Cayley hyperdeterminant vanishes for the so-called W-state
W = (|100⟩ + |010⟩ + |001⟩)/√3,
which therefore is neither entangled nor coherent.
3.10.2 Remark. This example elucidates the nature of entanglement introduced here. It takes into account only those entangled states that spread over the whole system, and disregards any entanglement supported in a smaller subsystem, very much like the 3-tangle does. For example, the absence of entanglement in a two-component system H_A ⊗ H_B with dim H_A ≠ dim H_B reflects the fact that in this case every state belongs to a smaller subspace V_A ⊗ V_B, V_A ⊂ H_A, V_B ⊂ H_B, as follows from the Schmidt decomposition (8). Entanglement of such states should be treated in the corresponding subsystems.
3.11. Hilbert-Mumford criterion
The above examples, based on Theorem 3.10.1, show that invariants are essential for understanding and quantifying entanglement. Unfortunately, finding invariants is a tough job, and more than 100 years of study give no hope for a simple solution. There are few cases where all invariants are known; some of them were mentioned above. In addition, invariants and covariants of four qubits and three qutrits were found recently [39,4,5]. For five qubits only partial results are available [38]. See more on invariants of qubits in [45,46]. For the system of format 4 × 4 × 2 the invariants are given in [51]. Spin systems have an equivalent description in terms of binary forms, see Example 3.11.2. Their invariants are described by the theory of binary quantics, diligently pursued by mathematicians from the second half of the 19th century. This is an amazingly difficult job, and complete success was achieved by the classics for s ≤ 3, the cases s = 5/2 and 3 being among the crowning glories of the theory [43]. Modern authors advanced it up to s = 4. Other classical results of invariant theory are still awaiting physical interpretation and applications. In a broader context Bryce S. DeWitt described the situation as follows: "Why should we not go directly to invariants? The whole of physics is contained in them.
The answer is that it would be fine if we could do it. But it is not easy." Now, due to Hilbert's insight, we know that the difficulty is rooted in a perverse desire to put geometry into the Procrustean bed of algebra. He created Geometric Invariant Theory just to overcome it.
Theorem 3.11.1 (Hilbert-Mumford Criterion [43]). A state ψ ∈ H is entangled iff every observable X ∈ L = Lie(G) of the system in state ψ assumes a nonnegative value with positive probability.
By changing X to −X one deduces that X should assume nonpositive values as well. So in an entangled state no observable can be biased either to strictly positive or to strictly negative values. Evidently, completely entangled states, with zero expectations ⟨ψ|X|ψ⟩ = 0 of all observables, pass this test.
Example 3.11.1. Let X = X_A ⊗ 1 + 1 ⊗ X_B be an observable of the two qubit system H_A ⊗ H_B with
Spec X_A = ±α,  Spec X_B = ±β,  α ≥ β ≥ 0.
Suppose that ψ is unstable and the observable X assumes only strictly positive values in the state ψ. Since those values are α ± β, the state is decomposable:
ψ = a|α⟩ ⊗ |β⟩ + b|α⟩ ⊗ |−β⟩ = |α⟩ ⊗ (a|β⟩ + b|−β⟩),
i.e. the Hilbert-Mumford criterion characterizes entangled qubits. The general form of the H-M criterion may shed some light on the nature of entanglement. However, it was originally designed for application to geometrical objects, like linear subspaces or algebraic varieties of higher degree, and its efficiency entirely depends on our ability to express it in geometrical terms. Let's give an example.
Example 3.11.2. Stability of spin states. The spin s representation H_s can be realized in the space of binary forms f(x,y) of degree d = 2s,
H_s = {f(x,y) | deg f = 2s},
in which SU(2) acts by linear substitutions f(x,y) → f(ax + by, cx + dy). To make the swap from physics to mathematics easier we denote by f_ψ(x,y) the form corresponding to a state ψ ∈ H_s. A spin state ψ ∈ H_s can be treated algebraically, physically, or geometrically according to the following equations:
ψ = Σ_{μ=−s}^{s} a_μ \binom{2s}{s+μ}^{1/2} x^{s+μ} y^{s−μ} = Σ_{μ=−s}^{s} a_μ |μ⟩ = Π_i (α_i x − β_i y).
The first one is purely algebraic, the second gives the physical decomposition over the eigenstates
|μ⟩ = \binom{2s}{s+μ}^{1/2} x^{s+μ} y^{s−μ},  J_z|μ⟩ = μ|μ⟩,
of the spin projection operator J_z = (1/2)(x ∂/∂x − y ∂/∂y), and the last one is geometrical. It describes the form f_ψ(x,y) in terms of the configuration of its roots z_i = (β_i : α_i) in the Riemann sphere C ∪ ∞ = S² (known also as the Bloch sphere for spin 1/2 states, and the Poincaré sphere for polarization of light). According to the H-M criterion the state ψ is unstable iff the spin projections onto some direction are strictly positive. By rotation we reduce the problem to the z-component J_z = (1/2)(x ∂/∂x − y ∂/∂y), in which case the corresponding form
f_ψ(x,y) = Σ_{μ>0} a_μ \binom{2s}{s+μ}^{1/2} x^{s+μ} y^{s−μ},  i.e. ψ = Σ_{μ>0} a_μ |μ⟩,
has the root x = 0 of multiplicity more than s = d/2. As a result we arrive at the following criterion of entanglement (= semistability) for spin states:
ψ is entangled ⟺ no more than half of the roots of f_ψ(x,y) coincide.  (29)
One can show that if less than half of the roots coincide then the state is stable, i.e. it can be transformed into a completely entangled one by the Lorentz group SL(2,C) acting on the roots z_i ∈ C ∪ ∞ through Möbius transformations z → (az + b)/(cz + d). In terms of these roots the entanglement equation (25) amounts to the following condition:
ψ completely entangled ⟺ Σ_i (z_i) = 0,  (30)
where the parentheses denote the unit vector (z_i) ∈ S² ⊂ E³ mapping into z_i ∈ C ∪ ∞ under stereographic projection. For example, for integral spin the completely entangled state |0⟩ can be obtained by putting equal numbers of points at the North and the South poles of the Riemann sphere. Another balanced configuration (30), consisting of 2s points evenly distributed along the equator, produces the completely entangled state |ψ⟩ = (1/√2)(|s⟩ − |−s⟩) for any s ≥ 1, cf. Example 3.6.1. Note also that a configuration with half of its points at the South pole can't be transformed into a balanced one (30), unless all the remaining points are at the North pole. However, this can be done asymptotically by the homothety z → λz as λ → ∞, which sends all points except zero to infinity. This gives an example of a semistable but not stable configuration.
Summary. Solvability of the nonlinear problem of conformal transformation of a given configuration into a balanced one (30) depends on the topological condition (29) on its multiplicities. One can find an application of this principle to the quantum marginal problem in [33,34].
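Condition (30) is easy to test numerically. A minimal sketch (helper names are ours; `None` encodes the point at infinity, mapped to the north pole) checks the two balanced examples just mentioned via inverse stereographic projection:

```python
import cmath

def unit_vector(z):
    """Inverse stereographic projection C ∪ {∞} → S²; None encodes ∞ (north pole)."""
    if z is None:
        return (0.0, 0.0, 1.0)
    n = abs(z) ** 2 + 1.0
    return (2 * z.real / n, 2 * z.imag / n, (abs(z) ** 2 - 1.0) / n)

def is_balanced(roots, tol=1e-9):
    """Condition (30): the unit vectors of the roots sum to the zero vector."""
    totals = [sum(c) for c in zip(*(unit_vector(z) for z in roots))]
    return all(abs(t) < tol for t in totals)

s = 3
# 2s points evenly spaced on the equator (|z| = 1): balanced for any s >= 1
equator = [cmath.exp(2j * cmath.pi * k / (2 * s)) for k in range(2 * s)]
# equal numbers of points at the two poles (z = 0 and z = infinity)
poles = [0j, 0j, None, None]
```

A configuration with more points at one pole than the other fails the test, in line with the semistability discussion above.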
References
[1] A. Aspect, J. Dalibard, and G. Roger. Phys. Rev. Lett., 47:1804, 1982.
[2] M. Berger. A panoramic view of Riemannian geometry. Springer-Verlag, Berlin, 2003.
[3] D. Bohm. Quantum theory. Prentice-Hall, 1951.
[4] E. Briand, J.-G. Luque, and J.-Y. Thibon. A complete set of covariants of the four qubit system. quant-ph/0304026.
[5] E. Briand, J.-G. Luque, J.-Y. Thibon, and F. Verstraete. The moduli space of three qutrit states. quant-ph/0306122.
[6] D. Bruß. Characterizing entanglement. J. Math. Phys., 43(9):4237–4251, 2002.
[7] M. Can, A. Klyachko, and A. Shumovsky. Single-particle entanglement. Journal of Optics B: Quantum Semiclass. Opt., 7, 2005.
[8] É. Cartan. The theory of spinors. Hermann, Paris, 1966.
[9] Ö. Çakır, A. Klyachko, and A.S. Shumovsky. Stationary entanglement of atoms induced by classical field. Applied Physics Letters, 86(1), 2005.
[10] V. Coffman, J. Kundu, and W.K. Wootters. Distributed entanglement. Phys. Rev. A, 61:052306, 2000.
[11] L. Delbourgo and J.R. Fox. Maximum weight vectors possess minimal uncertainty. J. Phys. A, 10:1233, 1970.
[12] D. Dieks. Communication by EPR devices. Phys. Lett. A, 92(6):271–272, 1982.
[13] W. Dür, G. Vidal, and J.I. Cirac. Three qubits can be entangled in two inequivalent ways. Phys. Rev. A, 62:062314, 2000.
[14] K. Eckert, J. Schliemann, D. Bruß, and M. Lewenstein. Quantum correlations in systems of identical particles. Annals of Physics, 299:88–127, 2002.
[15] A. Einstein, B. Podolsky, and N. Rosen. Can quantum-mechanical description of physical reality be considered complete? Phys. Review, 47:777, 1935.
[16] G.G. Emch. Mathematical and conceptual foundations of 20th century physics. North Holland, Amsterdam, 1984.
[17] R. Freund and J. Orlin. On the complexity of four polyhedral set containment problems. Math. Programming, 33:133–145, 1985.
[18] I.M. Gelfand, M.M. Kapranov, and A.V. Zelevinsky. Discriminants, resultants, and multidimensional determinants. Birkhäuser, Boston, 1994.
[19] M. Genovese. Research on hidden variable theories: A review of recent progresses. Physics Reports, 413:319–396, 2005.
[20] G.C. Ghirardi and L. Marinatto. Criteria for the entanglement of composite systems of identical particles. quant-ph/0410086.
[21] G.C. Ghirardi and L. Marinatto. General criterion for the entanglement of two indistinguishable particles. Phys. Rev. A, 70:012109, 2004.
[22] R.J. Glauber. Photon correlations. Phys. Rev. Lett., 10(3):84–86, 1963.
[23] R.J. Glauber. Optical coherence and the photon statistics. In C. DeWitt, A. Blandin, and C. Cohen-Tannoudji, editors, Quantum optics and electronics. Gordon and Breach, 1965.
[24] R. Hermann. Lie groups for physicists. Benjamin, New York, 1966.
[25] R. Hermann. Topics in physical geometry. Math Sci Press, Brookline, MA, 1988.
[26] S. Hill and W.K. Wootters. Entanglement of a pair of quantum bits. Phys. Rev. Lett., 78:5022, 1997.
[27] J.-I. Igusa. A classification of spinors up to dimension twelve. Am. J. Math., 92:997–1028, 1970.
[28] G. Jaeger, M. Teodorescu-Frumosu, A. Sergienko, B.E.A. Saleh, and M.C. Teich. Multiphoton Stokes-parameter invariant for entangled states. Phys. Rev. A, 67:032307, 2003.
[29] G. Kempf and L. Ness. Lengths of vectors in representation spaces, pages 233–244. Springer, 1978.
[30] Y.-H. Kim. Single-photon two-qubit "entangled" states: preparation and measurement. quant-ph/0303125.
[31] A.A. Kirillov. Elements of the theory of representations. Springer-Verlag, New York, 1976.
[32] A. Klyachko. Coherent states, entanglement, and geometric invariant theory. arXiv:quant-ph/0206012, 2002.
[33] A. Klyachko. Quantum marginal problem and representations of the symmetric group. arXiv:quant-ph/0409113, 2004.
[34] A. Klyachko. Quantum marginal problem and N-representability. arXiv:quant-ph/0511102, 2005.
[35] A.J. Leggett. Bose–Einstein condensation in the alkali gases: Some fundamental concepts. Rev. Mod. Phys., 73:307–356, 2001.
[36] P. Lévay, S. Nagy, and J. Pipek. Elementary formula for entanglement entropies of fermionic systems. Phys. Rev. A, 72:022302, 2005.
[37] W. Lichtenstein. A system of quadrics describing the orbit of the highest weight vector. Proc. Amer. Math. Soc., 84(4):605–608, 1982.
[38] J.-G. Luque and J.-Y. Thibon. Algebraic invariants of five qubits. quant-ph/0506058.
[39] J.-G. Luque and J.-Y. Thibon. The polynomial invariants of four qubits. quant-ph/0212069.
[40] J.D. Malley. All quantum observables in a hidden-variable model must commute simultaneously. Phys. Rev. A, 69:022118, 2004.
[41] J.D. Malley and A. Fine. Noncommuting observables and local realism. Phys. Lett. A, 347:51–55, 2005.
[42] Y. Matsushima. Espaces homogènes de Stein des groupes de Lie complexes. Nagoya Math. J., 16:205–218, 1960.
[43] D. Mumford, J. Fogarty, and F. Kirwan. Geometric invariant theory. Springer, Berlin, 1994.
[44] T.J. Osborne and M.A. Nielsen. Entanglement, quantum phase transitions, and density matrix renormalization. Quant. Inf. Proc., 1:45, 2002.
[45] A. Osterloh and J. Siewert. Constructing N-qubit entanglement monotones from antilinear operators. quant-ph/0410102, 2005.
[46] A. Osterloh and J. Siewert. Entanglement monotones and maximally entangled states in multipartite qubit systems. quant-ph/0506073, 2005.
[47] A. Perelomov. Coherent states for arbitrary Lie groups. Comm. Math. Phys., 26:222–236, 1972.
[48] A. Perelomov. Generalized coherent states and their applications. Springer, New York, 1986.
[49] A. Peres. Quantum theory: concepts and methods. Kluwer, Dordrecht, 1995.
[50] A. Peres and D.R. Terno. Quantum information and relativity theory. quant-ph/0212023, 2003.
[51] D.D. Pervushin. Invariants and orbits of the standard SL4 × SL4 × SL2 module. Izvestiya Math., 64(5):1003–1015, 2000.
[52] I. Pitowsky. Quantum Probability – Quantum Logic. Springer, Berlin, 1989.
[53] I. Pitowsky and K. Svozil. Optimal tests of quantum nonlocality. Phys. Rev. A, 64:014102, 2001.
[54] S. Popescu. Bell's inequalities and density matrices: Revealing "hidden" nonlocality. Phys. Rev. Lett., 74:2619–2622, 1995.
[55] M. Sato and T. Kimura. A classification of irreducible prehomogeneous vector spaces and their relative invariants. Nagoya Math. Journal, 65, 1977.
[56] J. Schliemann, J.I. Cirac, M. Kuś, M. Lewenstein, and D. Loss. Quantum correlations in two-fermion systems. Phys. Rev. A, 64:022304, 2001.
[57] E. Schrödinger. Naturwissenschaften, 14:664, 1926.
[58] E. Schrödinger. The present situation in quantum mechanics. Proc. Amer. Phil. Soc., 124:323–338, 1980.
[59] T. Tsegaye, J. Söderholm, M. Atatüre, A. Trifonov, G. Björk, A. Sergienko, B.E.A. Saleh, and M.C. Teich. Experimental demonstration of three mutually orthogonal polarisation states of entangled photons. Phys. Rev. Lett., 85:5013–5017, 2000.
[60] F. Verstraete and J.I. Cirac. Quantum nonlocality in the presence of superselection rules and data hiding protocols. Phys. Rev. Lett., 91:010404, 2003.
[61] F. Verstraete, J. Dehaene, and B. De Moor. Normal forms of entanglement monotones and optimal filtering of multipartite quantum systems. quant-ph/0105090.
[62] E. Vinberg and V. Popov. Invariant theory. Springer, Berlin, 1992.
[63] L. Viola, H. Barnum, E. Knill, G. Ortiz, and R. Somma. Entanglement beyond subsystems. quant-ph/0403044.
[64] Z.D. Walton, A.V. Sergienko, B.E.A. Saleh, and M.C. Teich. Generating polarization-entangled photon pairs with arbitrary joint spectrum. arXiv:quant-ph/0405021.
[65] F. Wilczek. Geometric phases in physics. World Scientific, Singapore, 1989.
[66] F. Wilczek and A. Zee. Families from spinors. Phys. Rev. D, 25:553–565, 1982.
[67] W.K. Wootters and W.H. Zurek. A single quantum cannot be cloned. Nature, 299:802–803, 1982.
[68] P. Zanardi. Quantum entanglement in fermionic lattices. Phys. Rev. A, 65:042101, 2002.
[69] A. Zee. Quantum field theory in a nutshell. Princeton University Press, Princeton, 2003.
Physics and Theoretical Computer Science J.-P. Gazeau et al. (Eds.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
Mathematics of phase transitions
Roman Kotecký 1,2
Charles University, Praha, Czech Republic, and the University of Warwick, UK
Abstract. This is a very brief introduction to the theory of phase transitions. Only a few topics are chosen, with a view to possible connections with discrete mathematics. The cluster expansion theorem is presented with a full proof. Finite-size asymptotics and locations of zeros of partition functions are discussed among its applications to the simplest lattice models. A link with the study of zeros of the chromatic polynomial as well as the Lovász local lemma is mentioned.
Keywords. Phase transitions, lattice models, cluster expansions, zeros of partition functions, finite-size effects
1. Introduction
A prototype of a phase transition is liquid-gas evaporation. With increasing pressure p (at a fixed temperature), the density ρ abruptly increases (figure: an isotherm of ρ as a function of p, with a jump at the transition pressure).
Follow Gibbs's prescription: start from the microscopic energy of the gas of N particles,
H_N(p_1, ..., p_N, r_1, ..., r_N) = Σ_{i=1}^{N} p_i²/(2m) + Σ_{i,j=1}^{N} U(r_i − r_j),  (1)
1 Correspondence to: Roman Kotecký, CTS, Jilská 1, 110 00 Praha 1, Czech Republic; E-mail: kotecky@cucc.ruk.cuni.cz
2 These are lecture notes: an edited version of lectures' transparencies. As a result, some topics are treated rather tersely and the reader should consult the cited literature for more detailed information.
R. Kotecký / Mathematics of Phase Transitions
with the interaction, for realistic gases, something like the Lennard-Jones potential, U(r) ∼ −(α/r)⁶ + (α/r)¹², with strong short-range repulsion and long-range attraction (figure: the potential U as a function of r).
Basic thermodynamic quantities are then given in terms of the grand-canonical partition function
Z(β,λ,V) = Σ_{N=0}^{∞} (z^N/N!) ∫_{R^{3N}×V^N} e^{−βH_N} Π_i d³p_i d³r_i / h^{3N} = Σ_{N=0}^{∞} (λ^N/N!) ∫_{V^N} e^{−β Σ_{i,j} U(r_i−r_j)} Π_i d³r_i.  (2)
Namely, for a given inverse temperature β = 1/(kT) and fugacity λ, the pressure is
p(β,λ) = (1/β) lim_{V→∞} (1/|V|) log Z(β,λ,V)  (3)
and the density
ρ(β,λ) = lim_{V→∞} (λ/|V|) (∂/∂λ) log Z(β,λ,V).  (4)
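Definitions (2)-(4) can be illustrated on the ideal gas, U ≡ 0, where the series (2) sums to Z = e^{λ|V|}, so (3) and (4) give p = λ/β and ρ = λ, with no discontinuity anywhere. A small numerical sketch (our own truncation cutoff `nmax`, large enough for the chosen λ|V|):

```python
import math

def log_Z_ideal(lam, vol, nmax=100):
    """Truncated grand-canonical sum (2) with U ≡ 0 (so Z = e^{λ|V|} exactly)."""
    return math.log(sum((lam * vol) ** n / math.factorial(n) for n in range(nmax)))

beta, lam, vol = 1.5, 0.3, 50.0
p = log_Z_ideal(lam, vol) / (beta * vol)      # pressure, definition (3): p = λ/β
eps = 1e-6                                    # numerical λ-derivative for (4)
rho = lam * (log_Z_ideal(lam + eps, vol) - log_Z_ideal(lam - eps, vol)) / (2 * eps) / vol
```

For an interacting U the same definitions apply, but no analogous closed form is available; that is exactly where the open problem below lives.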
However, to really prove the existence of the gas-liquid phase transition along these lines remains to this day an open problem. One can formulate it as follows:
Prove that for β large there exists λ_t(β) such that ρ(β,λ) is discontinuous at λ_t.
Much more is known and understood for lattice models, with the Ising model as the simplest representative.
2. Ising model
For x ∈ Z^d take σ_x ∈ {−1,+1} and, using σ_Λ to denote σ_Λ = {σ_x ; x ∈ Λ} for any finite Λ ⊂ Z^d, we introduce the energy
H(σ_Λ) = − Σ_{⟨x,y⟩⊂Λ} σ_x σ_y − h Σ_{x∈Λ} σ_x.
The ground states (with minimal energy) for h = 0 are the constant configurations σ_Λ ≡ +1 and σ_Λ ≡ −1. At nonzero temperature one considers the Gibbs state, i.e. the probability distribution
⟨f⟩_Λ^{β,h} = (1/Z_Λ(β,h)) Σ_{σ_Λ} f(σ_Λ) e^{−βH(σ_Λ)},
where
Z_Λ(β,h) = Σ_{σ_Λ} e^{−βH(σ_Λ)}.
Phase transitions are discussed in terms of the free energy
f(β,h) = −(1/β) lim_{Λ↗Z^d} (1/|Λ|) log Z_Λ(β,h)
and the order parameter
m(β,h) = lim_{Λ↗Z^d} (1/|Λ|) Σ_{x∈Λ} ⟨σ_x⟩_Λ^{β,h},
which should feature a discontinuity at low temperatures and h = 0 (figure: m as a function of h for several temperatures T, with a jump at h = 0 at low T).
Notice:
• m(β,h) = −∂f(β,h)/∂h whenever f is differentiable,
• f is a concave function of h.
Define the spontaneous magnetization: m*(β) = lim_{h→0+} m(β,h).
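On a small box these quantities can be evaluated exactly by brute-force enumeration. A sketch (free boundary conditions; the helper name is ours), illustrating that m(β,h) > 0 for h > 0 and that m(β,−h) = −m(β,h) with free boundary conditions:

```python
import math
from itertools import product

def ising_stats(L, beta, h):
    """Exact Z_Λ(β,h) and magnetization per site on an L×L box with free
    boundary conditions, by brute-force enumeration (small L only)."""
    sites = [(x, y) for x in range(L) for y in range(L)]
    bonds = ([((x, y), (x + 1, y)) for x in range(L - 1) for y in range(L)]
             + [((x, y), (x, y + 1)) for x in range(L) for y in range(L - 1)])
    Z = M = 0.0
    for conf in product((-1, 1), repeat=len(sites)):
        s = dict(zip(sites, conf))
        energy = -sum(s[a] * s[b] for a, b in bonds) - h * sum(conf)
        w = math.exp(-beta * energy)
        Z += w
        M += w * sum(conf)
    return Z, M / (Z * len(sites))

Z, m = ising_stats(3, beta=0.5, h=0.1)  # m > 0 for h > 0
```

Of course, in a finite volume m(β,h) is analytic in h; the discontinuity only emerges in the limit Λ ↗ Z^d.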
An alternative formulation of the discontinuity is in terms of instability with respect to boundary conditions (up to now we have actually used free boundary conditions). Given a configuration σ̄, take
H_Λ(σ_Λ | σ̄) = H(σ_Λ) − Σ_{x∈Λ, y∉Λ} σ_x σ̄_y
and, correspondingly, ⟨·⟩_{Λ,σ̄}^{β,h} and Z_{Λ,σ̄}(β,h). One can rather straightforwardly claim:
• f does not depend on σ̄:
f(β,h) = −(1/β) lim_{Λ↗Z^d} (1/|Λ|) log Z_{Λ,σ̄}(β,h)
• m_σ̄(β,h) = lim_{Λ↗Z^d} (1/|Λ|) Σ_{x∈Λ} ⟨σ_x⟩_{Λ,σ̄}^{β,h} may depend on σ̄. Actually,
m*(β) = m_+(β,0) = lim_{Λ↗Z^d} ⟨σ_x⟩_{Λ,+}^{β,0}.
Idea of the proof:
• −∂_h^− f(β,h) ≤ m_σ̄(β,h) ≤ −∂_h^+ f(β,h),
• lim_{h→0+} m_σ̄(β,h) = −∂_h^+ f(β,0) does not depend on the boundary condition,
• monotonicity of (1/|Λ|) Σ_{x∈Λ} ⟨σ_x⟩_{Λ,+}^{β,h} in Λ and h,
lim_{h→0+} lim_Λ = inf_{h≥0} inf_Λ = inf_Λ inf_{h≥0} = m_+(β,0).
For high temperatures, the spontaneous magnetization vanishes:
tanh β < 1/(2d − 1) ⟹ m_+(β,0) = 0.
Proof: Expand Π_{⟨x,y⟩∈E(Λ)} e^{βσ_xσ_y} with the help of
e^{βσ_xσ_y} = cosh β (1 + σ_xσ_y tanh β).
Z_{Λ,+} = (cosh β)^{|E(Λ)|} Σ_{σ_Λ} Σ_{E⊂E(Λ)} Π_{⟨x,y⟩∈E} σ_x σ_y (tanh β)^{|E|} = 2^{|Λ|} (cosh β)^{|E(Λ)|} Σ′_{E⊂E(Λ)} (tanh β)^{|E|}
(the sum Σ′ running over the edge sets E that survive the summation over σ_Λ). As a result,
⟨σ_x⟩_{Λ,+} ≤ Σ_{ω: x→∂Λ} (tanh β)^{|ω|} ≤ Σ_{n=dist(x,∂Λ)}^{∞} (2d − 1)^n (tanh β)^n → 0.
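The expansion can be verified on a tiny graph. The sketch below (helper names are ours) checks the free-boundary version of the identity, in which exactly the edge sets with all vertex degrees even survive the spin summation; with + boundary conditions paths reaching ∂Λ survive as well, which is what the bound above exploits:

```python
import math
from itertools import product, combinations

def Z_direct(vertices, edges, beta):
    """Brute-force spin sum of exp(β Σ σ_x σ_y) over a finite graph."""
    Z = 0.0
    for conf in product((-1, 1), repeat=len(vertices)):
        s = dict(zip(vertices, conf))
        Z += math.exp(beta * sum(s[a] * s[b] for a, b in edges))
    return Z

def Z_high_T(vertices, edges, beta):
    """High-temperature expansion: only edge sets with all degrees even survive."""
    total = 0.0
    for k in range(len(edges) + 1):
        for sub in combinations(edges, k):
            deg = {v: 0 for v in vertices}
            for a, b in sub:
                deg[a] += 1
                deg[b] += 1
            if all(d % 2 == 0 for d in deg.values()):
                total += math.tanh(beta) ** k
    return 2 ** len(vertices) * math.cosh(beta) ** len(edges) * total

V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0)]  # 4-cycle: even subsets are the empty set and the full cycle
```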
On the other hand, for low temperatures there is a non-vanishing spontaneous magnetisation: for d ≥ 2 there exists β_0 such that β ≥ β_0 ⟹ m_+(β,0) > 0.
Proof: This is the famous Peierls argument. Start with the contour representation, σ_Λ ⟷ Γ = {γ_1, γ_2, ...} (figure: a spin configuration and the contours separating its + and − regions). It yields
H(σ_Λ | +) − H(+ | +) = 2 Σ_{γ∈Γ} |γ|
and thus
Z_{Λ,+}(β,0) = e^{β|E(Λ)|} Σ_{Γ in Λ} e^{−2β Σ_{γ∈Γ} |γ|}.
Writing ⟨σ_x⟩_{Λ,+}^{β,0} = P_{Λ,+}(σ_x = 1) − P_{Λ,+}(σ_x = −1) = 1 − 2P_{Λ,+}(σ_x = −1), we evaluate
P_{Λ,+}(σ_x = −1) ≤ Σ_{γ surr. x} e^{−2β|γ|} ≤ Σ_{k=4}^{∞} k² 3^{2(k−1)} e^{−2βk},
using that #{γ surrounds x : |γ| = k} is (for d = 2) bounded by k² 3^{2(k−1)}.
Analysing the proof, there are 2 main ingredients:
• Independence of contours (taking away any one of them (by flipping all spins inside it), what remains is still a valid configuration).
• Damping (e^{−2β|γ|} is small for β large).
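The Peierls mechanism can be seen directly on a small box by exact enumeration: with the exterior frozen at +1, the expectation of the central spin stays strictly positive and approaches 1 as β grows. A sketch (the function name and box setup are ours):

```python
import math
from itertools import product

def center_spin_plus_bc(L, beta):
    """<σ_x> at the center of an L×L box whose exterior is frozen at +1
    (brute-force enumeration, small L only)."""
    sites = [(x, y) for x in range(L) for y in range(L)]
    Z = S = 0.0
    for conf in product((-1, 1), repeat=len(sites)):
        s = dict(zip(sites, conf))
        energy = 0.0
        for (x, y) in sites:
            for nb in ((x + 1, y), (x, y + 1)):
                if nb in s:
                    energy -= s[(x, y)] * s[nb]
            # bonds to the frozen +1 spins just outside the box
            n_out = (x == 0) + (x == L - 1) + (y == 0) + (y == L - 1)
            energy -= s[(x, y)] * n_out
        w = math.exp(-beta * energy)
        Z += w
        S += w * s[(L // 2, L // 2)]
    return S / Z
```

In a finite box this expectation is positive for every β > 0; the content of the Peierls argument is that it stays bounded away from 0 uniformly in Λ once β is large.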
We met two expansions:
Σ_F Π_{g∈F} (tanh β)^{|g|}  and  Σ_Γ Π_{γ∈Γ} e^{−2β|γ|}
(in the first sum we view the set E ⊂ E(Λ) as a collection F of its connected components, the high-temperature polymers). Both expressions have the same structure of a sum over collections of pairwise independent contributions. This is the starting point of an abstract theory of cluster expansions. Its mature formulation is best presented as a claim about graphs with weights attributed to their vertices, and I cannot resist presenting its full proof, as it was substantially simplified in recent years [5,8,11,13].
3. Cluster expansions
Consider a graph G = (V,E) (without selfloops) and a weight w : V → C. The term abstract polymers is also used for the vertices v ∈ V, with pairs (v,v′) ∈ E being called incompatible (no selfloops: only distinct vertices may be incompatible). For L ⊂ V, we use G[L] to denote the induced subgraph of G spanned by L. For any finite L ⊂ V, define
Z_L(w) = Σ_{I⊂L} Π_{v∈I} w(v),  (5)
with the sum running over all independent sets I of vertices in L (no two vertices in I are connected by an edge). In other words, the sum is over all collections I of compatible abstract polymers. The partition function Z_L(w) is an entire function in w = {w(v)}_{v∈L} ∈ C^{|L|} and Z_L(0) = 1. Hence it is nonvanishing in some neighbourhood of the origin w = 0 and its logarithm is, on this neighbourhood, an analytic function yielding a convergent Taylor series
log Z_L(w) = Σ_{X∈X(L)} a_L(X) w^X.  (6)
Here, X(L) is the set of all multi-indices X : L → {0,1,...} and w^X = Π_v w(v)^{X(v)}. Inspecting the Taylor formula for a_L(X) in terms of the corresponding derivatives of log Z_L(w) at the origin w = 0, it is easy to show that the coefficients a_L(X) actually do not depend on L: a_L(X) = a_{supp X}(X), where supp X = {v ∈ V : X(v) ≠ 0}. As a result, one gets the existence of coefficients a(X) for each X ∈ X = {X : V → {0,1,...}, |X| = Σ_{v∈V} |X(v)| < ∞} such that
log Z_L(w) = Σ_{X∈X(L)} a(X) w^X  (7)
for every finite L ⊂ V (with convergence on a small neighbourhood of the origin depending on L).
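Definition (5) is easy to evaluate by brute force on small graphs. A sketch (the helper name is ours) for the path a-b-c, whose independent sets are ∅, {a}, {b}, {c}, {a,c}:

```python
from itertools import combinations

def Z_L(vertices, edges, w):
    """Partition function (5): sum over independent sets I of the products of w(v)."""
    incompatible = {frozenset(e) for e in edges}
    total = 0.0
    for k in range(len(vertices) + 1):
        for sub in combinations(vertices, k):
            if any(frozenset(p) in incompatible for p in combinations(sub, 2)):
                continue  # sub contains an incompatible pair
            prod = 1.0
            for v in sub:
                prod *= w[v]
            total += prod
    return total

# path a - b - c with uniform weight 0.5: Z = 1 + 3(0.5) + 0.25 = 2.75
w = {"a": 0.5, "b": 0.5, "c": 0.5}
Z = Z_L(["a", "b", "c"], [("a", "b"), ("b", "c")], w)
```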
Notice that a(X) ∈ R for all X (consider Z_L(w) with real w) and a(X) = 0 whenever G[supp X] is not connected (just notice that, from the definition, Z_{supp X}(w) = Z_{L_1}(w) Z_{L_2}(w) once supp X = L_1 ∪ L_2 with no edges between L_1 and L_2). In addition, the coefficients a(X) have alternating signs:
(−1)^{|X|+1} a(X) ≥ 0.  (8)
To prove this claim we verify the validity of an equivalent formulation:
Lemma (alternating signs). For every finite L ⊂ V, all coefficients of the expansion of −log Z_L(−|w|) in powers |w|^X are nonnegative.
Indeed, the equivalence with the alternating signs property follows by observing that, due to (7), one has
−log Z_L(−|w|) = − Σ_{X∈X(L)} a(X) (−1)^{|X|} |w|^X
(and every X has supp X ⊂ L for some finite L).
Proof. Proof of the Lemma by induction in |L|: Using the shorthand Z*_L = Z_L(−|w|), we notice that Z*_∅ = 1 with −log Z*_∅ = 0, and Z*_{{v}} = 1 − |w(v)| with
−log Z*_{{v}} = Σ_{n=1}^{∞} |w(v)|^n / n.
Using N(v) to denote the set of vertices v′ ∈ V adjacent in the graph G to the vertex v, for w small and L̄ = L ∪ {v}, from the definition one has Z*_{L̄} = Z*_L − |w(v)| Z*_{L\N(v)}, yielding
−log Z*_{L̄} = −log Z*_L − log(1 − |w(v)| Z*_{L\N(v)}/Z*_L)
(we consider |w| for which all the concerned Taylor expansions for log Z*_W with W ⊂ L̄ converge). The first term on the RHS has nonnegative coefficients by the induction hypothesis. Taking into account that −log(1 − z) has only nonnegative coefficients and that
Z*_{L\N(v)}/Z*_L = exp{ Σ_{X∈X(L)\X(L\N(v))} |a(X)| |w|^X }
also has only nonnegative coefficients, all the expressions on the RHS necessarily have only nonnegative coefficients.
What is the diameter of convergence? For each finite L ⊂ V, consider the polydiscs D_{L,R} = {w : |w(v)| ≤ R(v) for v ∈ L} with the set of radii R = {R(v); v ∈ V}. The most natural condition for the inductive proof (leading at the same time to the strongest claim) turns out to be the Dobrushin condition:
There exists a function r : V → [0,1) such that, for each v ∈ V,
R(v) ≤ r(v) Π_{v′∈N(v)} (1 − r(v′)).  (∗)
Saying that X ∈ X is a cluster if the graph G[supp X] is connected, we can summarise the cluster expansion claim for an abstract polymer model in the following way:
Theorem (Cluster expansion). There exists a function a : X → R that is nonvanishing only on clusters, so that for any sequence of radii R satisfying the condition (∗) with a sequence {r(v)}, the following holds true:
(i) For every finite L ⊂ V and any contour weight w ∈ D_{L,R}, one has Z_L(w) ≠ 0 and
log Z_L(w) = Σ_{X∈X(L)} a(X) w^X;
(ii)
Σ_{X∈X: supp X ∋ v} |a(X)| |w|^X ≤ −log(1 − r(v)).
Proof. Again, by induction in |L| we prove (i) and (ii)_L, obtained from (ii) by restricting the sum to X ∈ X(L). Assuming Z_L ≠ 0 and
Σ_{X∈X(L): supp X ∩ N(v) ≠ ∅} |a(X)| |w|^X ≤ − Σ_{v′∈N(v)} log(1 − r(v′)),
obtained by iterating (ii)_L, we use
Z_{L̄} = Z_L (1 + w(v) Z_{L\N(v)}/Z_L)
and the bound
|1 + w(v) Z_{L\N(v)}/Z_L| ≥ 1 − |w(v)| exp{ Σ_{X∈X(L)\X(L\N(v))} |a(X)| |w|^X } ≥ 1 − |w(v)| Π_{v′∈N(v)} (1 − r(v′))^{−1} ≥ 1 − r(v) > 0
to conclude that Z_{L̄} ≠ 0. To verify (ii)_{L̄}, we write
Σ_{X∈X(L̄): supp X ∋ v} |a(X)| |w|^X = −log Z*_{L̄} + log Z*_L = −log(1 − |w(v)| Z*_{L\N(v)}/Z*_L) ≤ −log(1 − r(v)).
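The nonvanishing claim can be probed numerically: saturate (∗) on a small path graph and evaluate Z_L at the most negative admissible weights, the worst case. A sketch (helper names are ours); the inductive estimate in the proof also gives Z_L(−R) ≥ Π_v (1 − r(v)):

```python
from itertools import combinations

def Z_L(vertices, edges, w):
    """Independent-set partition function (5), brute force."""
    incompatible = {frozenset(e) for e in edges}
    total = 0.0
    for k in range(len(vertices) + 1):
        for sub in combinations(vertices, k):
            if any(frozenset(p) in incompatible for p in combinations(sub, 2)):
                continue
            prod = 1.0
            for v in sub:
                prod *= w[v]
            total += prod
    return total

def prod_one_minus_r(vs, r):
    prod = 1.0
    for v in vs:
        prod *= 1.0 - r[v]
    return prod

V = list(range(5))                      # path 0-1-2-3-4
E = [(i, i + 1) for i in range(4)]
nbrs = {v: [u for a, b in E for u in (a, b) if v in (a, b) and u != v] for v in V}
r = {v: 0.4 for v in V}
R = {v: r[v] * prod_one_minus_r(nbrs[v], r) for v in V}   # radii saturating (*)
Z_min = Z_L(V, E, {v: -R[v] for v in V})  # most negative admissible weights
```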
4. Harvesting
4.1. Ising model at low temperatures
The low temperature expansion is an instance of an abstract polymer model. Contours γ are its vertices, with intersecting pairs connected by an edge:
Z_{Λ,+}(β,0) = e^{β|E(Λ)|} Σ_{Γ in Λ} e^{−2β Σ_{γ∈Γ} |γ|} = e^{β|E(Λ)|} Σ_{I⊂L(Λ)} Π_{γ∈I} w(γ),  w(γ) = e^{−2β|γ|}.
Here L(Λ) is the set of all contours in Λ. Let us check that (for β large) the weights w lie in D_R: assume that β is large enough so that
Σ_{γ′: A(γ′) ∋ x} e^{−(2β−1)|γ′|} ≤ 1
(for any fixed x ∈ Z^d and A(γ′) = {x ∈ Z^d : dist(x, γ′) ≤ 1}). Then choose r(γ) = 1 − exp{−e^{−(2β−1)|γ|}} and verify (instead of (∗)) the weaker [7] condition
|w(γ)| ≤ −(1 − r(γ)) log(1 − r(γ)) Π_{γ′∈N(γ)} (1 − r(γ′)),
as
e^{−2β|γ|} ≤ e^{−(2β−1)|γ|} exp{ −e^{−(2β−1)|γ|} − Σ_{γ′∈N(γ)} e^{−(2β−1)|γ′|} }
(the exponential factor on the right being ≥ e^{−|γ|}). (It implies (∗) since −(1 − t) log(1 − t) ≤ t.) Thus the cluster expansion applies:
log Z_{Λ,+}(β,0) = β|E(Λ)| + Σ_{X∈X(L(Λ))} a(X) w^X.
The dependence on Λ is only through the set of used multi-indices; the individual terms are Λ-independent! This implies an explicit expression for the free energy:
−βf(β,0) = lim (log Z_{Λ,+}(β,0)/|Λ|) = dβ + Σ_{X∈X: A(X) ∋ x} a(X) w^X / |A(X)|,
where A(X) = ∪_{γ∈supp X} A(γ). Indeed,
|log Z_{Λ,+} − (−βf)|Λ|| = |β|E(Λ)| − dβ|Λ| + Σ_{x∈Λ} ( Σ_{X∈X(L(Λ)): A(X) ∋ x} a(X)w^X/|A(X)| − Σ_{X∈X: A(X) ∋ x} a(X)w^X/|A(X)| )|
≤ βO(|∂Λ|) + Σ_{x∈Λ} Σ_{X∉X(L(Λ)): A(X) ∋ x} |a(X)| |w|^X / |A(X)|
≤ βO(|∂Λ|) + Σ_{x∈Λ} Σ_{y∈∂Λ} e^{−β|x−y|} Σ_{X: A(X) ∋ y} |a(X)| (√w)^X
≤ βO(|∂Λ|) + Σ_{y∈∂Λ} Σ_{x∈Λ} e^{−β|x−y|} = βO(|∂Λ|).
Thus, there exists β0 such that f (β, 0) is analytic on (β0 , ∞) (being, in this interval, an absolutely convergent series of analytic functions in β). Similarly, at high temperatures: there exists β1 such that f (β, h) is real analytic in β and h for (β, h) : β < β1 , βh < 1.
4.2. Applications in discrete mathematics
4.2.1. Zeros of the chromatic polynomial
Sokal [12], Borgs [2]. For a graph G = (V,E) let
P_G(q) = Σ_{E′⊂E} q^{C(E′)} (−1)^{|E′|},
with C(E′) denoting the number of components of the graph (V, E′).
Theorem. Let G be of maximal degree D and K = min_a (a + e^a)/log(1 + a e^{−a}). Then all zeros of P_G(q) lie inside the disc {q ∈ C; |q| < DK}.
Idea of proof: Set
Φ(G) := Σ_{E′⊂E, E′ connected} (−1)^{|E′|}.
Each E′ yields a partition π of V (into the vertex sets of its connected components). Resumming over all E′ → π:
P_G(q) = Σ_{π of V} Π_{γ∈π} q Φ(G(γ)) = q^{|V|} Σ_{π of V} Π_{γ∈π, |γ|≥2} q^{1−|γ|} Φ(G(γ)),
with the last factors playing the role of the weights w(γ).
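The subgraph expansion for P_G(q) can be checked against a known chromatic polynomial. A sketch (brute force over edge subsets, a small union-find for components; the helper names are ours), verifying P(q) = q(q − 1)(q − 2) for the triangle:

```python
from itertools import combinations

def n_components(vertices, edges):
    """Number of connected components of (V, E') via union-find."""
    parent = {v: v for v in vertices}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return len({find(v) for v in vertices})

def chromatic(vertices, edges, q):
    """Subgraph expansion P_G(q) = sum over E' of q^{C(E')} (-1)^{|E'|}."""
    total = 0
    for k in range(len(edges) + 1):
        for sub in combinations(edges, k):
            total += (-1) ** k * q ** n_components(vertices, sub)
    return total

V = [0, 1, 2]
E = [(0, 1), (1, 2), (0, 2)]  # triangle
```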
4.2.2. Connection with Lovász local lemma
"Bad events" A_v that are not too strongly dependent (bounded influence outside of a "neighbourhood" of v) ⟹ there is a positive probability that none of them occurs:
Theorem (Lovász). Let G = (V,E), let A_v, v ∈ V, be a family of events, and let r(v) ∈ (0,1) be such that, for all Y ⊂ V \ (N(v) ∪ {v}),
P(A_v | ∩_{v′∈Y} Ā_{v′}) ≤ r(v) Π_{v′∈N(v)} (1 − r(v′)).
Then
P(∩_{v∈V} Ā_v) ≥ Π_{v∈V} (1 − r(v)) > 0.
Scott-Sokal [11]: P(A_v | ∩_{v′∈Y} Ā_{v′}) ≤ R(v) ⟹ P(∩_{v∈V} Ā_v) ≥ Z_G(−R) > 0 once R(v) ≤ r(v) Π_{v′∈N(v)} (1 − r(v′)).
5. Models without symmetry
For example: the Ising model with
H → H + h Σ σ_x σ_y σ_z
should yield a phase diagram (where κ stands for h) with a transition line h_t(β,κ) ending at a critical temperature T_c (figure: the line h_t(β,κ) in the (T = 1/β, h) plane).
Can h_t(β,h) be computed? Can the contour representation be used? The answer is: Yes, with some tricks (Pirogov-Sinai theory [6,9,10,14]). Main ideas: Again,
Z_{Λ,+}(β,h) = e^{β|E(Λ)|} Σ_{Γ in Λ} e^{−βe_+|Λ_+(Γ)| − βe_−|Λ_−(Γ)|} Π_{γ∈Γ} w(γ).
However, contours cannot be erased without changing the remaining configuration:
• Λ_±(Γ) changes,
• the weights w(γ) change.
Actually, we have here labeled contours with a "hard-core long range interaction".
First trick: restoring independence. The cost of erasing γ, including flipping of the interior:
$$
w_+(\gamma) = w(\gamma)\,\frac{Z_{\mathrm{Int}\gamma,-}}{Z_{\mathrm{Int}\gamma,+}}, \qquad
w_-(\gamma) = w(\gamma)\,\frac{Z_{\mathrm{Int}\gamma,+}}{Z_{\mathrm{Int}\gamma,-}}.
$$
We get
$$
Z_{\Lambda,+} = e^{-\beta e_+|\Lambda|} \sum_{\Gamma\ \text{in}\ \Lambda}\ \prod_{\gamma\in\Gamma} w_+(\gamma)
$$
by induction in |Λ|:
$$
Z_{\Lambda,+} = \sum_{\theta\ \text{exterior contours}} e^{-\beta e_+|\mathrm{Ext}\,\theta|} \prod_{\gamma\in\theta} \underbrace{w(\gamma)\,\frac{Z_{\mathrm{Int}\gamma,-}}{Z_{\mathrm{Int}\gamma,+}}}_{w_+(\gamma)}\; Z_{\mathrm{Int}\gamma,+},
$$
with $Z_{\mathrm{Int}\gamma,+} = e^{-\beta e_+|\mathrm{Int}\gamma|}\sum_{\Gamma\ \text{in}\ \mathrm{Int}\gamma}\prod_{\gamma'\in\Gamma} w_+(\gamma')$ by the induction step. The contour partition function $Z_{L(\Lambda)}(w_+)$ yields the same probability for external contours as the original physical system. If $|w_+(\gamma)| \le e^{-\tau|\gamma|}$ with large τ ⟹ a typical configuration is a sea of pluses with small islands. For any (h, β) with β large, either w₊ or w₋ (or both) should be suppressed. But which one?
Second trick: metastable states. Define
$$
\bar w_\pm(\gamma) := \begin{cases} w_\pm(\gamma) & \text{if } w_\pm(\gamma) \le e^{-\tau|\gamma|},\\[2pt] e^{-\tau|\gamma|} & \text{otherwise,}\end{cases}
$$
and
$$
\bar Z_{\Lambda,\pm} := e^{-\beta e_\pm|\Lambda|}\ \underbrace{Z_{L(\Lambda)}(\bar w_\pm)}_{\text{cluster exp.}\;\to\; g(\bar w_\pm)}
$$
with $-\beta^{-1}\log \bar Z_{\Lambda,\pm} \sim |\Lambda|\, f_\pm$, where $f_\pm := e_\pm + g(\bar w_\pm)$. Notice: f₊ and f₋ are inductively (through $\bar w_\pm$) unambiguously defined.
Once we have them, we can introduce h_t:

[Graph: the metastable free energies f₊ and f₋ as functions of h cross at h = h_t.]
The final step is to prove (again by a careful induction):
$$
h \le h_t \;\Longrightarrow\; f_- = \min(f_-, f_+) \;\Longrightarrow\; \bar w_- = w_- \quad (\text{and } \bar w_+(\gamma) = w_+(\gamma) \text{ for } \gamma:\ \beta(f_+ - f_-)\,\mathrm{diam}\,\gamma \le 1)
$$
and
$$
h \ge h_t \;\Longrightarrow\; f_+ = \min(f_-, f_+) \;\Longrightarrow\; \bar w_+ = w_+ \quad (\text{and } \bar w_-(\gamma) = w_-(\gamma) \text{ for } \gamma:\ \beta(f_- - f_+)\,\mathrm{diam}\,\gamma \le 1).
$$
Standard example: the Blume-Capel model. The spin takes three values, σx ∈ {−1, 0, 1}, with Hamiltonian
$$
H = \sum_{\langle x,y\rangle} (\sigma_x - \sigma_y)^2 - \lambda \sum_x \sigma_x^2 - h \sum_x \sigma_x.
$$
The phase diagram features three competing phases: +, −, and 0:

[Phase diagrams in the (λ, h)-plane, at T = 1/β = 0 and at T > 0: the + phase for h > 0, the − phase for h < 0, and the 0 phase for sufficiently negative λ.]

For the origin h = λ = 0, the phase 0 is stable (f0 < f+, f−): indeed, one has e+ = e− = e0 = 0, and $g(\bar w_\pm) \sim -e^{-4\beta} > g(\bar w_0) \sim -2e^{-4\beta}$ (lowest excitations: one 0 in the sea of + (or −), while, favourably, either one + or one − (two possibilities) in the sea of 0).
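At zero temperature the competition is decided by the ground-state energy densities alone; for constant configurations the bond term vanishes, giving per-site energies e± = −λ ∓ h and e0 = 0. A sketch (the function names are mine, for illustration):

```python
def ground_state_energies(lam, h):
    """Per-site energies of the constant configurations sigma = +1, 0, -1;
    the bond term (sigma_x - sigma_y)^2 vanishes on constant configurations."""
    return {"+": -lam - h, "0": 0.0, "-": -lam + h}

def stable_phases(lam, h, tol=1e-12):
    """Ground states = constant configurations of minimal per-site energy."""
    e = ground_state_energies(lam, h)
    lowest = min(e.values())
    return sorted(p for p, v in e.items() if v <= lowest + tol)

assert stable_phases(-1.0, 0.0) == ["0"]           # 0 phase for lambda < 0, h = 0
assert stable_phases(1.0, 0.0) == ["+", "-"]       # +/- coexistence for lambda > 0
assert stable_phases(0.0, 0.0) == ["+", "-", "0"]  # triple point at the origin
assert stable_phases(1.0, 0.5) == ["+"]
```

At T > 0 the degeneracy at h = λ = 0 is lifted by the excitation counting quoted above: the 0 phase wins because its cheapest excitation comes in two flavours.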
6. Second harvest

Finite volume asymptotics: Using Pirogov-Sinai theory, one has good control over the finite volume behaviour. For example, for the Ising model with an asymmetry, we get an asymptotics of the magnetization $m^{per}_N(\beta, h)$ in volume $N^d$ with periodic boundary conditions [4]:

[Graph: $m^{per}_N(\beta, h)$ as a function of h interpolates steeply between m₋ and m₊, with the steepest point at h = h_max(N) near h_t.]
In particular,
$$
h_{\max}(N) = h_t + \frac{3\chi}{2\beta^2 m^3}\, N^{-2d} + O(N^{-3d}).
$$

Zeros of partition function: One can obtain results about the asymptotic loci of zeros of the partition function $Z^{per}_N$ of the Blume-Capel model in the variable $z = e^{-\beta h}$, with periodic boundary conditions, by analyzing
$$
Z^{per} \sim e^{-\beta f_+ N^d} + e^{-\beta f_- N^d} + e^{-\beta f_0 N^d},
$$
obtained with the help of a complex extension of Pirogov-Sinai theory and cluster expansions [1, 3].
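In the simplest two-phase caricature of such a sum, with $f_\pm = \mp m h$, the partition function reduces to $Z(h) = 2\cosh(Nmh)$, whose zeros sit on the imaginary h-axis with spacing $\pi/(Nm)$ (a Lee–Yang-type picture; the numbers below are illustrative):

```python
import cmath

# Two-phase approximation Z ~ exp(-N f_+) + exp(-N f_-) with f_pm = -/+ m*h
# gives Z(h) = 2*cosh(N*m*h): zeros at h_k = i*pi*(2k + 1) / (2*N*m).
N, m = 50, 0.8
for k in range(4):
    h = 1j * cmath.pi * (2 * k + 1) / (2 * N * m)
    Z = cmath.exp(N * m * h) + cmath.exp(-N * m * h)
    assert abs(Z) < 1e-9   # the approximate partition function vanishes here
```

As N grows the zeros accumulate on the real point h = 0 (the transition point), which is the mechanism behind the first-order transition analysis of [1, 3].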
References

[1] M. Biskup, C. Borgs, J. T. Chayes, and R. Kotecký, Partition function zeros at first-order phase transitions: Pirogov-Sinai theory, Jour. Stat. Phys. 116 (2004), 97–155.
[2] C. Borgs, Absence of zeros for the chromatic polynomial on bounded degree graphs, Combin. Probab. Comput. 15 (2006), 63–74.
[3] M. Biskup, C. Borgs, J. T. Chayes, R. Kotecký, and L. Kleinwaks, Partition function zeros at first-order phase transitions: A general analysis, Commun. Math. Phys. 251 (2004), 79–131.
[4] C. Borgs and R. Kotecký, A rigorous theory of finite-size scaling at first-order phase transitions, Jour. Stat. Phys. 61 (1990), 79–119.
[5] R. L. Dobrushin, Estimates of semi-invariants for the Ising model at low temperatures, in: R. L. Dobrushin, R. A. Minlos, M. A. Shubin, A. M. Vershik (eds.), Topics in Statistical and Theoretical Physics, Amer. Math. Soc., Providence, RI, 1996, pp. 59–81.
[6] R. Kotecký, Pirogov-Sinai theory, in: Encyclopedia of Mathematical Physics, vol. 4, eds. J.-P. Françoise, G. L. Naber, and S. T. Tsou, Elsevier, Oxford, 2006, pp. 60–65.
[7] R. Kotecký and D. Preiss, Cluster expansion for abstract polymer models, Commun. Math. Phys. 103 (1986), 491–498.
[8] S. Miracle-Solé, On the convergence of cluster expansions, Physica A 279 (2000), 244–249.
[9] S. A. Pirogov and Ya. G. Sinai, Phase diagrams of classical lattice systems (Russian), Theor. Math. Phys. 25 (1975), no. 3, 358–369.
[10] S. A. Pirogov and Ya. G. Sinai, Phase diagrams of classical lattice systems. Continuation (Russian), Theor. Math. Phys. 26 (1976), no. 1, 61–76.
[11] A. D. Scott and A. D. Sokal, The repulsive lattice gas, the independent-set polynomial, and the Lovász local lemma, Jour. Stat. Phys. 118 (2005), 1151–1261.
[12] A. D. Sokal, Bounds on the complex zeros of (di)chromatic polynomials and Potts-model partition functions, Combin. Probab. Comput. 10 (2001), 41–77.
[13] D. Ueltschi, Cluster expansions & correlation functions, Moscow Math. J. 4 (2004), 511–522.
[14] M. Zahradník, An alternate version of Pirogov-Sinai theory, Commun. Math. Phys. 93 (1984), 559–581.
Physics and Theoretical Computer Science
J.-P. Gazeau et al. (Eds.)
IOS Press, 2007
© 2007 IOS Press. All rights reserved.
The Topology of Deterministic Chaos: Stretching, Squeezing and Linking

Marc Lefranc 1
UMR CNRS 8523/CERLA, Université de Lille 1

Abstract. Chaotic behavior in a deterministic dynamical system results from the interplay in state space of two geometric mechanisms, stretching and squeezing, which conspire to separate arbitrarily close trajectories while confining the dynamics to a bounded subset of state space, a strange attractor. A topological method has been designed to classify the various ways in which stretching and squeezing can organize chaotic attractors. It characterizes knots and links formed by unstable periodic orbits in the attractor and describes their topological organization with branched manifolds. Its robustness has allowed it to be successfully applied to a number of experimental systems, ranging from vibrating strings to lasers. Knotted periodic orbits can also be used as powerful indicators of chaos when their knot type is associated with positive topological entropy and thus implies mixing in state space. However, knot theory can only be applied to three-dimensional systems. Extension of this approach to higher-dimensional systems will thus require alternate formulations of the principles upon which it is built: determinism and continuity.

Keywords. Chaotic dynamics, Knot theory, Branched manifolds, Symbolic dynamics, Topological entropy, Nielsen-Thurston theory
1. Introduction

1.1. The geometry of chaos

The counterintuitive properties of "deterministic chaos" were first unveiled by Poincaré more than a century ago [1]. However, it was essentially when dynamics could be visualized with computers that it became widely known that physical systems governed by deterministic laws of motion can not only display stationary, periodic, or quasiperiodic regimes but also irregular behavior (Fig. 1). Indeed, the occurrence of chaotic behavior can only be understood through a geometric description of the dynamics, where the time evolution of the system is represented by the trajectory of a representative point in an abstract space, the phase space, whose coordinates are the state variables [2, 3]. When only a single state variable can be measured, as is usually the case in experiments, the dynamics can be embedded in a reconstructed phase space, whose coordinates are, for example, successive time derivatives or values of the time series at different times [4, 5]. In phase space, each dynamical regime is associated with a geometrical object on which motion takes place after transients have died out: an attractor. Attractors for stationary,

1 Correspondence to: M. Lefranc, PhLAM, UFR de Physique, Bât. P5, Université de Lille 1, F-59655 Villeneuve d'Ascq, France. E-mail:
[email protected]
periodic and quasiperiodic regimes are a point, a closed cycle, a torus, respectively. When the dynamics becomes chaotic, trajectories are confined to a highly organized geometrical object with a complex, fractal structure: a strange attractor (Fig. 1).
Figure 1. Left: Chaotic time series X(t) delivered by a CO2 laser with modulated losses ; Right: reconstruction of the underlying strange attractor in a phase space with cylindrical coordinates {X(t), X(t + τ ), ϕ} where τ is a suitably chosen time delay and ϕ is the modulation phase. Reprinted from [6].
1.2. Geometric mechanisms of chaos An essential feature of chaotic dynamics is sensitivity to initial conditions: the difference in the states of two identical systems starting from arbitrarily close initial conditions increases exponentially with time and becomes macroscopic in finite time [2]. Because individual trajectories in phase space are unstable, it only makes sense to study how entire regions are transformed under the evolution laws, and to characterize the global structure of the dynamical flow in phase space.
Figure 2. Left: Intersections of a chaotic attractor with a series of section planes are computed. Right: Their evolution from plane to plane shows the interplay of the stretching and squeezing mechanisms.
A closer examination of the structure of a strange attractor reveals how local instability and global confinement in a bounded region of phase space can coexist. By looking at intersections of the strange attractor with a series of surfaces of section (Poincaré sections), it can be seen that chaotic behavior results from the interplay of two geometrical mechanisms: stretching, that separates close trajectories, and folding (or more generally
squeezing), that maintains them at finite distance (Fig. 2). At a given point of phase space, the two effects generally act along different directions, with the contraction rate along the squeezing direction being larger than the expansion rate along the stretching direction. As a consequence, line elements are stretched while volume elements are contracted. A simple geometric model that incorporates the stretching and squeezing mechanisms is Smale’s horseshoe map [7]. It is a map of the plane into itself that stretches the unit square in one direction, squeezes it in the orthogonal direction, and then folds the resulting region over itself so that it intersects the original square along two rectangles H0 and H1 (Fig. 3). The beauty of the horseshoe map is that points in the invariant set (i.e., points whose forward and backward iterates remain in the unit square forever) can be shown to be in one-to-one correspondence with bi-infinite sequences of symbols 0 and 1, which indicate in which of the two rectangles successive iterates of the points fall [2, 3, 8]. By studying this symbolic dynamics, the horseshoe map is found to display properties that are typical of deterministic chaos: existence of an infinite number of periodic orbits, density of periodic orbits in the invariant set, existence of a dense orbit [8]. This simple dynamical system appears in some form in many chaotic systems.
Figure 3. The geometric definition of Smale's horseshoe map: the unit square S is stretched and folded so that its image f(S) intersects S along the two rectangles H0 and H1.
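The one-to-one correspondence with binary sequences makes the "infinite number of periodic orbits" quantitative: period-p points of the full 2-shift number $2^p$, and the distinct orbits of least period p follow by Möbius inversion. A sketch (helper names are mine):

```python
def primitive_orbits(p, n_symbols=2):
    """Number of periodic orbits of least period p in the full shift:
    (1/p) * sum over d | p of mu(d) * n^(p/d)  (Moebius inversion)."""
    def mobius(n):
        result, d = 1, 2
        while d * d <= n:
            if n % d == 0:
                n //= d
                if n % d == 0:
                    return 0          # squared prime factor
                result = -result
            d += 1
        return -result if n > 1 else result
    total = sum(mobius(d) * n_symbols ** (p // d)
                for d in range(1, p + 1) if p % d == 0)
    return total // p

# 2 fixed points; then orbits 01; 001, 011; 0001, 0011, 0111; ...
assert [primitive_orbits(p) for p in range(1, 7)] == [2, 1, 2, 3, 6, 9]
```

The exponential growth of these counts with p is exactly the positive topological entropy (log 2 for the full horseshoe) invoked later in the text.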
1.3. Characterizing chaos Given a strange attractor observed in experiments or numerical simulations, one of the main objectives of the topological analysis of chaos outlined in this survey is to determine the horseshoe-like map organizing the dynamics and to describe in which way stretching and squeezing interact to generate chaotic behavior [3, 9]. This will not be carried out by mere visual inspection of Poincaré sections but through a systematic analysis of the knot invariants of closed trajectories embedded in the strange attractor, whose intertwining provides signatures of the underlying geometrical mechanisms. Following this approach and closely related ideas, we will be able to handle the following problems: 1. Classify chaotic regimes according to the dynamical mechanism generating them. This provides a robust way to compare predictions of a theoretical model with experimental observations. 2. Construct a symbolic encoding of a strange attractor and characterize the dynamics through the set of forbidden sequences (the “grammar of chaos”). 3. Understand the structure of bifurcation diagrams, that is the sequence of qualitative changes (e.g., creation of a periodic orbit) that occur as a system evolves towards fully developed chaos. 4. Obtain convincing evidence that deterministic chaos is responsible for the irregular behavior observed in a short or nonstationary time series.
2. Periodic orbits and knots

2.1. Periodic orbits and ergodicity

In deterministic chaos, order and disorder are intimately related. On the one hand, a typical trajectory explores the strange attractor without ever returning exactly to its initial condition. On the other hand, the strange attractor is densely filled with an infinite number of periodic orbits associated with closed curves in phase space. The existence of these periodic orbits is linked to the ergodicity of the chaotic dynamics: since one can return arbitrarily close to a given point after a sufficiently long time, there must be arbitrarily many neighboring points whose orbit closes exactly. This is easily seen in the horseshoe map of Fig. 3, which has as many periodic orbits as there are bi-infinite periodic sequences of 0 and 1 (i.e., a countable infinity). These periodic orbits can coexist with a chaotic regime because they are unstable: trajectories enter their neighborhood and stay there for some time, but eventually leave it. In fact, a chaotic attractor can only appear when all periodic orbits in its neighborhood have become unstable. Since any segment of a trajectory on the attractor can in principle be approximated with arbitrary accuracy by an unstable periodic orbit, it is possible to fully characterize the natural measure on the attractor from its spectrum of unstable periodic orbits [10–12].

2.2. Periodic orbits in experiments

Unstable periodic orbits are certainly more than a theoretical concept. Since they are visited at regular time intervals, they manifest themselves in experimental time series as bursts of almost periodic behavior occurring as the system trajectory temporarily shadows them. Long periodic orbits are hardly distinguishable from an aperiodic trajectory, but the frequent occurrence of intervals of low-periodic behavior is an unmistakable hallmark of low-dimensional deterministic chaos (Fig. 4).
Periodic orbits are extracted from the time series by searching for places where the trajectory returns close to a previously visited location in a short time ("close returns"). Although infinitely many periodic orbits are embedded in a chaotic attractor, only a finite number of them can be extracted from a finite time series usually contaminated by noise. Obviously, shorter orbits are more easily identified than longer ones because they are less unstable and thus are visited more often and shadowed for a longer time by the trajectory. However, this is not a problem because large-scale features of an attractor are determined by its lowest-period orbits, with small-scale details being specified by higher-order orbits. As a matter of fact, it turns out that for practical purposes, a little more than ten periodic orbits may provide an excellent approximation of a strange attractor, as illustrated in Fig. 5. This makes it plausible that we can fully characterize a chaotic attractor by restricting the analysis to its unstable cycles, decomposing its complexity into simple geometric objects with a systematic organization. However, we shall see in Sec. 5.2 that valuable information can still be extracted from a single orbit.

If we are interested in computing metric properties (fractal dimensions, Lyapunov exponents, ...) of the strange attractor, we will seek to extract metric information (e.g., Floquet multipliers) from the spectrum of periodic orbits [10–12]. However, this information is not always easy to obtain from experimental signals. We shall rather focus on simple topological indices that indicate how periodic trajectories are intertwined. This
Figure 4. Signatures of periodic orbits of lowest period in a time series from a chaotic CO2 laser are shown inside boxes, with a shade that depends on period. A significant amount of time is spent near these orbits.
Figure 5. Left: a chaotic attractor reconstructed from a time series from a chaotic laser ; Right : Superposition of 12 periodic orbits of periods from 1 to 10.
will provide us with a robust description of the global topological organization of the strange attractor. Remarkably, this description not only captures the essential properties of chaotic dynamics but does so in a very robust way.

2.3. Knots

Consider the pair of orbits embedded in the attractor of Fig. 6. A simple way to characterize their configuration is to count how many times one of them rotates around the other orbit in one period, which gives the linking number of the two orbits. The linking number is possibly the simplest invariant of knot theory, a branch of topology concerned with properties of closed curves that do not change when they are deformed continuously without inducing intersections. If we deform the two orbits of Fig. 6 and bring them in the configuration at the right of Fig. 6, it is easily seen that their linking number is 2 and that the two orbits cannot be deformed into each other. They have different knot types:
Figure 6. Left: two periodic orbits of periods 1 and 4 embedded in a strange attractor; Right: a link of two knots (a trivial knot and a trefoil knot) that is equivalent to the pair of periodic orbits up to continuous deformations without crossings.
one is a trivial loop while the other is a realization of the simplest non-trivial curve, the trefoil knot. More refined invariants can also be used to characterize knots and links [13], but linking numbers and recent generalizations of them, relative rotation rates [14, 15], are already sufficient to provide us with key dynamical information. What makes knot theory dynamically relevant is determinism. Indeed, two trajectories cannot intersect in phase space: otherwise, the common point would have two futures. As a result, knots and links formed by periodic orbits are well defined. Furthermore, changing a control parameter generally deforms the orbit continuously, but without having it intersect itself, so that its knot type is not modified. Thus, the knot invariants of a periodic orbit are constant over its entire domain of existence and are real fingerprints. Even if they have a dynamical significance, knot types and link invariants can only be useful in experiments if they can reliably be estimated from time series. By searching for close returns we only obtain trajectory segments which approach periodic orbits but are not exactly closed. We then make a knot out of such a segment by closing the small gap. In doing so, we assume that in a small neighborhood of the almost closed orbit we have observed, there is a true periodic orbit with the same knot type. This is most certainly true when the distance of close return is small compared to the average distance between strands of the knot, so that the deformation from the segment observed to the neighboring periodic orbit cannot possibly induce self-intersections. Similarly, knot invariants are very robust to noise, whose effect can be considered to be a small deformation of the orbit. In any case, we shall see later that topological invariants of different periodic orbits provide redundant information, so that we can verify that the set of numbers measured is geometrically consistent.

2.4. Stretching and squeezing link periodic orbits

After having extracted some periodic orbits from a time series, embedded them in a reconstructed phase space and computed their (self-) linking numbers and relative rotation rates, we are left with possibly large tables of numbers. What do we do with this information? As we shall see in the next section, the key property is that there is a systematic organization in the knots and links formed by periodic orbits and that this organization can be recovered through their invariants.
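In practice, the linking number of two sampled closed curves can be estimated with the discretized Gauss linking integral (a standard formula, though not spelled out in the text; the Hopf-link test curves below are illustrative):

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def gauss_linking(c1, c2):
    """Discretized Gauss integral (1/4pi) oint oint (r1-r2).(dr1 x dr2)/|r1-r2|^3
    for two closed polygonal curves given as lists of 3D points."""
    def segments(c):
        n = len(c)
        for i in range(n):
            p, q = c[i], c[(i + 1) % n]
            mid = tuple((p[k] + q[k]) / 2 for k in range(3))   # midpoint rule
            d = tuple(q[k] - p[k] for k in range(3))
            yield mid, d
    total = 0.0
    for m1, d1 in segments(c1):
        for m2, d2 in segments(c2):
            r = tuple(m1[k] - m2[k] for k in range(3))
            dist = math.sqrt(sum(x * x for x in r))
            c = cross(d1, d2)
            total += sum(r[k] * c[k] for k in range(3)) / dist ** 3
    return total / (4 * math.pi)

# Hopf link: two unit circles in orthogonal planes, linking number +/-1
n = 400
A = [(math.cos(2*math.pi*i/n), math.sin(2*math.pi*i/n), 0.0) for i in range(n)]
B = [(1 + math.cos(2*math.pi*i/n), 0.0, math.sin(2*math.pi*i/n)) for i in range(n)]
assert abs(abs(gauss_linking(A, B)) - 1.0) < 1e-2
```

Because the integrand only depends on the curves up to deformation without intersections, the estimate is robust to the small gap-closing and noise deformations discussed above.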
Figure 7. “Combing” the intertwined periodic orbits (left) reveals their systematic organization (right) created by the stretching and squeezing mechanisms.
As any trajectory on the attractor, periodic orbits experience the stretching and squeezing mechanisms that organize the chaotic dynamics. However, because they are closed trajectories, they bear the mark of these mechanisms in their own way. More precisely, they are knotted and braided in a way that depends directly on how stretching and squeezing act on phase space. It should thus be possible with continuous deformations to bring a link of periodic orbits into a configuration where the effect of stretching, folding and squeezing is easily visualized (Fig. 7). This is a difficult problem that is elegantly solved by means of branched manifolds, thanks to the Birman–Williams theorem [16].

3. Branched manifolds

3.1. The Birman–Williams theorem

A strange attractor is a complex object because it has a fractal structure which stems from the repeated action of stretching and folding. Roughly speaking, the strange attractor may be viewed as an infinitely folded two-dimensional surface tangent to the unstable direction and to the flow direction, with the fractal structure being observed in the stable direction, along which squeezing occurs. Thus, the attractor would collapse to a much simpler object if we could somehow squeeze it along the latter direction. This is exactly what the Birman–Williams projection achieves by identifying points whose orbits converge to each other as time goes by and are eventually indistinguishable. Given a three-dimensional (3D) hyperbolic chaotic flow Φt, Birman and Williams define the following equivalence relation which identifies points of the invariant set Λ having the same asymptotic future:
$$
\forall x, y \in \Lambda,\qquad x \sim y \iff \lim_{t\to\infty} \|\Phi^t(x) - \Phi^t(y)\| = 0, \tag{1}
$$
where Φt(x) is the time-t image of x under the flow Φ. The Birman–Williams theorem then consists of two main statements [16, 17]:

1. In the set of equivalence classes of relation (1), the hyperbolic flow Φt induces a semi-flow Φ̄t on a branched manifold K. The pair (Φ̄t, K) is called a template, or knot-holder, for a reason that the second statement makes obvious.
2. Unstable periodic orbits of Φt in Λ are in one-to-one correspondence with unstable periodic orbits of Φ̄t in K. Moreover, every link of unstable periodic orbits of (Φt, Λ) is isotopic to the corresponding link of (Φ̄t, K).

In other words, there exists a two-dimensional branched manifold such that all periodic orbits in the original flow can be projected onto it without modifying any of their topological invariants (Fig. 8). The proof of the theorem is based on the fact that two distinct points belonging to one or two periodic orbits stay at finite distance for all times: their evolution is periodic forever. Therefore, periodic points cannot become identified under equivalence relation (1). The Birman–Williams projection then amounts to an isotopy since no crossings between periodic orbits are induced.
Figure 8. The Smale horseshoe template with a period-1 and a period-4 orbit. These orbits are the projections of the two orbits of Fig. 6 and have exactly the same knot invariants. The manifold can be seen as a squeezed suspension of the horseshoe map in Fig. 3, with the semi-flow rotating counterclockwise. The longitudinal and transverse directions correspond to the flow direction and to the unstable direction of the original flow, respectively. The stable direction has been squeezed out.
Because of the splitting of the manifold into several branches (Fig. 8), there is a natural classification of orbits on a template in terms of symbolic dynamics. Labeling each branch with a different symbol, there is a one-to-one correspondence between periodic orbits and periodic symbol sequences up to cyclic permutation, as for the horseshoe map. Because the symbolic code of an orbit determines its itinerary on the branched manifold, the knot types that are realized on a given template can be systematically studied [16–22]. Moreover, simple relations between the symbolic itinerary and knot invariants can be obtained, as we shall see in the next section.

3.2. Algebraic description of templates and experimental determination

Since the Birman–Williams theorem holds for hyperbolic flows, it may seem dubious to apply it to real-life strange attractors, which are never hyperbolic. Indeed, periodic orbits are continuously created or destroyed as a control parameter is varied, which can be traced back to tangencies between stable and unstable directions. However, we may assume that there is a way to make the experimental attractor become hyperbolic by changing some control parameter. Orbits that exist in the operating conditions are then organized exactly as in the hyperbolic limit, since knot and link types are not modified under control parameter variation. Thus, for each set of periodic orbits detected in an experiment there must be at least one hyperbolic template carrying them, and the simplest
such template can be used as a model of the stretching and squeezing organizing the strange attractor in which the orbits are embedded. Experimental template determination therefore relies on solving the following problem: given a set of (self-) linking numbers, what is the simplest template with a set of orbits having the same invariants?
Figure 9. Left: the structure of a template can be described algebraically using a small set of integer numbers. Right: Linking numbers of two orbits can be computed by following itineraries on the template and counting contributions from the different terms. Only branches are shown. Reprinted from [3].
This can be achieved by describing the geometry of the template with a small set of integer numbers that specify the torsions of the branches, their linking numbers and the order in which they are stacked at the branch line when they join again. These numbers can be grouped in two matrices, the template matrix t whose diagonal (resp., nondiagonal) terms tii (resp., tij) are given by the torsions in half-turns (resp., twice the linking number of the two branches), and the layering matrix l such that lij = 1 when the leftmost of the two branches i and j falls under the other and −1 otherwise [3, 9]. The parities of branch torsions, πi = tii mod 2, play an important rôle. (Self-) linking numbers of periodic orbits then have the general form
$$
I = \sum_{i\le j} \alpha_{ij}\, t_{ij} + \sum_{i<j} \Pi_{ij}(\pi_0, \pi_1, \dots, \pi_{n-1})\, l_{ij} = L(t_{ij}) + N(\pi_i, l_{ij}) \tag{2}
$$
where the coefficients αij and the polynomial functions Πij depend only on the symbolic codes of the orbits, and the characteristic numbers tij, πi and lij depend only on the template structure. Expressions (2) are almost linear with respect to the characteristic numbers and are thus relatively easy to solve for the template structure. For example, let us assume that we have obtained the orbits and invariants shown in Table 1.

Orbit       1    01   0111  01010111
1           0    1    2     4
01               1    3     6
0111                  5     13
01010111                    23

Table 1. (Self-) linking numbers of 4 orbits extracted from an experiment.
The equations associated with the first three orbits are:
$$
\begin{aligned}
\mathrm{lk}(1, 01) &= \tfrac{1}{2}\, t_{01} + \tfrac{1}{2}\, t_{11} + \tfrac{\pi_1}{2}\, l_{01} &&= 1\\
\mathrm{slk}(01) &= t_{01} + l_{01} &&= 1\\
\mathrm{lk}(1, 0111) &= \tfrac{1}{2}\, t_{01} + \tfrac{3}{2}\, t_{11} + \tfrac{\pi_1}{2}\, l_{01} &&= 2\\
\mathrm{lk}(01, 0111) &= 2\, t_{01} + \tfrac{1}{2}\, t_{00} + \tfrac{3}{2}\, t_{11} + \bigl(1 + \tfrac{\pi_0 + \pi_1}{2}\bigr)\, l_{01} &&= 3\\
\mathrm{slk}(0111) &= 3\, t_{01} + 3\, t_{11} + (3 - \pi_1)\, l_{01} &&= 5
\end{aligned}
\tag{3}
$$
Remarkably, these 5 equations in 4 unknowns are compatible and have a single solution:
$$
t_{01} = 0, \qquad t_{00} = 0, \qquad t_{11} = 1, \qquad l_{01} = 1, \tag{4}
$$
which corresponds to the Smale horseshoe template of Fig. 8. This example illustrates that template determination is an extremely overdetermined problem: the four numbers in (4) are the solution of an infinite number of equations such as (3), since they describe the global topological organization of all periodic orbits in the attractor. This is a key advantage of template analysis. Indeed, the structure determined using part of the observed orbits may be checked for consistency using the remaining orbits. For example, we can verify that solution (4) correctly predicts a self-linking number of 23 for orbit 01010111. This makes the analysis extremely robust, with low-period orbits being used to extract the global structure, and high-period orbits serving to confirm it. A recent advance in the theory of templates has been to recognize that branched manifolds are enclosed in bounding tori, and that the structure of these tori has a deep influence on the global dynamics [23–25]. Another important (and related) problem not covered here is analyzing systems with symmetries, such as the Lorenz attractor [26,27].
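For the standard horseshoe template in particular, every entry of Table 1 can be regenerated from the symbolic names alone: order the periodic points by the unimodal (kneading) order and count the order exchanges induced by the shift — since all crossings on this template are positive, each exchange is one crossing. A sketch (this exchange-counting shortcut is standard horseshoe lore, not spelled out in the text):

```python
def unimodal_less(s, t, depth=64):
    """Kneading order on one-sided binary itineraries (tuples of 0/1 repeated
    periodically): at the first difference, the natural order holds iff the
    number of preceding 1s is even, and is reversed otherwise."""
    ones = 0
    for i in range(depth):
        a, b = s[i % len(s)], t[i % len(t)]
        if a != b:
            return (a < b) if ones % 2 == 0 else (a > b)
        ones += a
    return False

def linking(w1, w2):
    """(Self-) linking number of horseshoe-template orbits given by periodic
    itineraries, by counting order exchanges under the shift map."""
    strands = lambda w: [w[i:] + w[:i] for i in range(len(w))]
    shift = lambda w: w[1:] + w[:1]
    if w1 == w2:   # self-linking: pairs of distinct strands, no halving
        s = strands(w1)
        pairs = [(a, b) for i, a in enumerate(s) for b in s[i + 1:]]
        half = 1
    else:          # linking number: all cross pairs, crossings halved
        pairs = [(a, b) for a in strands(w1) for b in strands(w2)]
        half = 2
    crossings = sum(unimodal_less(a, b) != unimodal_less(shift(a), shift(b))
                    for a, b in pairs)
    return crossings // half

o1, o2, o4, o8 = (1,), (0, 1), (0, 1, 1, 1), (0, 1, 0, 1, 0, 1, 1, 1)
assert linking(o1, o2) == 1 and linking(o1, o4) == 2 and linking(o1, o8) == 4
assert linking(o2, o2) == 1 and linking(o2, o4) == 3
assert linking(o4, o4) == 5 and linking(o8, o8) == 23
```

The last assertion reproduces the consistency check quoted in the text: the horseshoe organization predicts a self-linking number of 23 for the orbit 01010111.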
4. Classification of strange attractors by templates

Applying knot theory and template theory to physical systems was proposed by Mindlin et al. [28–30] as a robust method to classify chaotic regimes: attractors associated with different templates would fall in different topological classes. This was a very promising method for validating models of real systems. Indeed, there is no chance a model can be faithful if it does not predict correctly the topological structure observed experimentally. Furthermore, template analysis could possibly unfold large-scale changes in the structure of a chaotic attractor as a parameter was varied, if it was found that different templates described the same system in different regions of parameter space. However, all the first experimental template investigations evidenced the standard Smale horseshoe template shown in Fig. 8 [6, 29, 31–35]. This was disappointing, since a classification method where all experimental systems fall in the same class is not very useful. At the same time there were some theoretical indications that more complicated structures could be found. A Shilnikov scenario in the Rössler system, in which chaotic regimes of increased complexity are observed, had been shown to be accompanied by a sequence of spiral templates with increasing numbers of branches [36]. Moreover, Gilmore and McCallum had shown that there was a systematic organization in the parameter space of forced nonlinear oscillators [37], with systematic changes in the templates as amplitude and frequency of the modulation were scanned (see below).
The first observations of non-horseshoe templates were made in an electronic circuit [38] and in a modulated YAG laser [39], respectively associated with a spiral template and a “reverse”, or “twisted” horseshoe template. The latter case is an interesting example as it shows that even if the template describes the organization of all periodic orbits, a single orbit may provide a hint at a different topological organization: the period-3 orbit detected in this experiment is incompatible with a standard horseshoe template (Fig. 10).
Figure 10. The four crossings of the period-3 braid (left) are correctly predicted by the reverse horseshoe template (right) but not by the standard horseshoe template (center). The branches of the former template are twisted by a half-turn compared to those of the latter.
Moreover, the parameter spaces of two modulated chaotic lasers were explored and their structure was found to be in agreement with the observations of Gilmore and McCallum in the Duffing model [37]. The response of forced nonlinear oscillators is generally much stronger when the modulation period is close to a multiple of their natural frequency (Fig. 11), which explains why chaotic regimes are usually found in tongues centered around these multiples. Gilmore and McCallum found that the global torsion of the template systematically increased by one when going from one chaotic tongue to the next one, which is indeed what was observed in a fiber laser experiment [40]. Furthermore, there is a systematic change in the template structure across a chaotic tongue, which is associated with a gradual increase in torsion when the modulation period is increased: new branches with high torsion appear (i.e., new periodic orbits populating these branches are created), and branches with low torsion disappear. When torsion has globally increased by one, then we are in the next tongue and the gradual transition repeats. This scenario was confirmed in experiments with a YAG laser [41] (Fig. 11). Numerical simulations also show that the number of branches generally increases when driving is stronger, which corresponds to a shift towards more fully developed chaos. The spiral template shown in Fig. 11 ("outside-to-inside" spiral) is only one of the three three-branch templates that are combinatorially possible, the two others being an S-shaped and an "inside-to-outside" spiral template [3]. Recent preliminary results suggest that the S-shaped template has been observed in an Erbium-doped fiber laser [42].
5. Characterizing the orbit spectrum: symbolic encodings and orbit forcing Classification of strange attractors according to their associated template is certainly robust, but a finer level of description is often needed. How can we distinguish two attractors associated with a standard horseshoe template? How can we detect topological discrepancies between a model and an experiment when the predicted and observed templates match? The key is, once the global orbit organization has been characterized by templates, to identify periodic orbits individually using their knot invariants and to describe the orbit spectrum. Since the latter is invariant under coordinate changes, this provides a way to characterize an attractor independently of the physical variable measured.
M. Lefranc / The Topology of Deterministic Chaos: Stretching, Squeezing and Linking
m (modulation amplitude) C1
C2
FIBER LASER C3
C4
Exp. 2 (YAG laser) Exp. 1 (fiber laser) 3
4
X(t)sin(wt)
X(t)sin(wt)
X(t)sin(wt)
(modulation period) 1 2 T/Tr
X(t)cos(wt)
X(t)cos(wt)
X(t)cos(wt)
2 1
1
2
0
0
1
spiral template
reverse horseshoe .
standard horseshoe
YAG LASER
Figure 11. Various templates observed in two laser experiments. Top left: schematic representation of the parameter space of forced nonlinear oscillators showing resonance tongues. Right: templates observed in the fiber laser experiment: global torsion increases systematically from one tongue to the next [40]. Bottom left: templates observed in the YAG laser experiment (only the branches are shown): there is a variation in the topological organization across one chaotic tongue [39, 41].
As a control parameter is varied, periodic orbits are created or destroyed. Attractors observed at different values of the control parameters generally have different periodic orbit spectra. These spectra can be characterized with invariants such as the topological entropy, which quantifies how the number of period-p orbits increases with p [8]. In this section, we study the fine structure of a chaotic attractor in two complementary ways. In Sec. 5.1, we use the fact that there is a natural connection between knot invariants and a description of orbits in terms of symbol sequences, as illustrated by expressions (3). This allows us to obtain information about the symbolic names of the detected periodic orbits from their topological structure. Because good symbolic codings are continuous mappings from phase space into the set of symbol sequences, we can then interpolate this information to the entire attractor in which the orbits are embedded. This allows us to characterize the chaotic dynamics through its “grammar” (i.e., the set of forbidden symbol sequences). In Sec. 5.2, we look more closely at the order in which periodic orbits appear as one proceeds towards fully developed chaos. This order is not arbitrary but is governed by rigid topological constraints that can be deduced from knot invariants: the presence of some orbits forces the existence of infinitely many others. This allows us to describe the orbit spectrum concisely in terms of forcing orbits, and to obtain signatures of chaos from a single orbit.

5.1. Construction of symbolic encodings

As mentioned when discussing the horseshoe map (Fig. 3) in Sec. 1.2, there is a natural mapping from trajectories in a chaotic invariant set to infinite symbol sequences (e.g., binary sequences). This mapping is based on a partition of state space into regions, each labeled with a different symbol, and represents an orbit by the sequence of symbols indicating the regions successively visited.
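For the hyperbolic horseshoe, where every binary itinerary is realized, the number of prime period-p orbits can be counted exactly by Möbius inversion, and its growth rate recovers the topological entropy log 2 mentioned above. A stdlib-only illustration (of the full binary shift, not of any experimental system discussed here):

```python
from math import log

def mobius(n):
    """Moebius function, computed by trial division."""
    res, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0        # squared prime factor
            res = -res
        d += 1
    return -res if n > 1 else res

def orbit_count(p):
    """Number of prime period-p orbits of the binary full shift:
    Moebius inversion of the 2**p fixed points of the p-th iterate."""
    divisors = [d for d in range(1, p + 1) if p % d == 0]
    return sum(mobius(p // d) * 2 ** d for d in divisors) // p

# The growth rate log(p * orbit_count(p)) / p converges to the
# topological entropy log 2 of the full shift as p increases.
h12 = log(12 * orbit_count(12)) / 12
```

The counts 2, 1, 2, 3, 6, 9 for p = 1, ..., 6 match the familiar census of horseshoe orbits, and already at p = 12 the entropy estimate agrees with log 2 to better than one percent.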
When two arbitrary points in the invariant set
are never associated with the same symbol sequence, the partition is said to be generating. The symbolic dynamics in sequence space then faithfully reflects the dynamics in phase space. It is also generally required that the mapping be continuous (i.e., points that are close in phase space have sequences that are close in sequence space). In hyperbolic systems such as the horseshoe map, where there is a clear distinction between the stable and unstable directions at every point of phase space, there is generally an obvious partition of the invariant set into two or more disconnected components (Fig. 3) that is generating. However, most experimental attractors are connected and are not hyperbolic. Except in limit cases such as extremely dissipative systems, it is generally not obvious how to construct a generating partition, although important steps have been made towards solving this problem [33,43–49]. Once a good partition is known, the dynamics can be characterized through the list of forbidden symbolic sequences (the “grammar of chaos”), which are related to the orbits that have not yet been created, and by various invariants such as the topological entropy. This approach is all the more relevant as the structure of these grammar rules directly reflects that of the strange attractor in phase space [50], in a combinatorial form that is easily manipulated. Periodic orbits, knots and templates have proved to be invaluable tools for studying the symbolic dynamics of a strange attractor [46, 47]. Indeed, templates describe the topological structure of hyperbolic flows and have a natural symbolic coding induced by the splitting into several branches (which is the coding of the horseshoe shown in Fig. 3): the canonical symbolic name of a template orbit indicates which branches it visits. There is a direct connection between this symbolic name and the knot type of the orbit (Fig. 12).
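As a toy illustration (not the laser system of [33]), the logistic map x ↦ rx(1−x) with the partition at the critical point x = 1/2 gives a binary coding; sampling a long trajectory then reveals which symbol blocks the grammar allows. The parameter values, lengths and initial condition below are illustrative choices:

```python
def logistic_words(r, n, x0=0.3, burn=1000, wlen=3):
    """Iterate x -> r*x*(1-x), encode x < 1/2 as '0' and x >= 1/2 as '1',
    and return the set of length-wlen symbol blocks observed."""
    x = x0
    for _ in range(burn):               # discard the transient
        x = r * x * (1 - x)
    symbols = []
    for _ in range(n):
        symbols.append('0' if x < 0.5 else '1')
        x = r * x * (1 - x)
    s = ''.join(symbols)
    return {s[i:i + wlen] for i in range(len(s) - wlen + 1)}

words_chaotic  = logistic_words(4.0, 10000)   # fully developed chaos
words_periodic = logistic_words(3.5, 4000)    # inside the period-4 window
```

At r = 4 all eight length-3 words occur (the grammar of the full shift has no forbidden words), while inside the period-4 window only the four blocks of the periodic itinerary 0111 survive.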
Knot types are preserved as a control parameter is modified, and thus make a bridge between the non-hyperbolic case and the hyperbolic limit. Assume that the template associated with an attractor has been determined and that the knot invariants of a periodic orbit P allow only one possible projection onto this template (Fig. 12). Then it is natural to assign to P the symbolic name of this projection. Indeed, this is the symbolic name of the only hyperbolic orbit into which P can be deformed continuously by changing a control parameter. This guarantees that the non-hyperbolic coding can be connected continuously to the hyperbolic coding.
Figure 12. Left: horseshoe orbit 0111. The symbolic code imposes an itinerary on the branches, which in turn determines the knot type. Right: there is only one period-7 horseshoe orbit with a self-linking number of 16 and a torsion of 5, the 0110111 orbit. This orbit can thus be unambiguously identified through its invariants.
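The identification quoted in this caption can be reproduced combinatorially. On the standard horseshoe template, the periodic points of an orbit are the cyclic shifts of its symbolic name, ordered along the branch line by the unimodal (kneading) order; each passage through the twisted branch (symbol 1) contributes a half-turn of torsion, and, because all crossings in the standard-insertion projection carry the same sign, the self-linking number is the number of order reversals per period. The following sketch rests on these assumptions (it is not the algorithm of [46] verbatim) and reproduces the values of Fig. 12:

```python
from itertools import combinations
from functools import cmp_to_key

def unimodal_cmp(a, b):
    """Compare two binary itineraries in the unimodal (kneading) order:
    at the first differing symbol, the usual order 0 < 1 holds if the
    common prefix contains an even number of 1s, and is reversed otherwise."""
    ones = 0
    for x, y in zip(a, b):
        if x != y:
            return (x - y) if ones % 2 == 0 else (y - x)
        ones += x
    return 0

def horseshoe_invariants(name):
    """Torsion and self-linking number of a standard-horseshoe-template
    orbit, given its symbolic name (a primitive binary word)."""
    w = [int(c) for c in name]
    p = len(w)
    # The p periodic points are the cyclic shifts of the itinerary;
    # extend each to 2p symbols so every comparison terminates.
    shifts = [(w[i:] + w[:i]) * 2 for i in range(p)]
    order = sorted(range(p),
                   key=cmp_to_key(lambda i, j: unimodal_cmp(shifts[i], shifts[j])))
    pos = {i: k for k, i in enumerate(order)}           # position on the branch line
    perm = [pos[(order[k] + 1) % p] for k in range(p)]  # action of the shift map
    # Each passage through the twisted branch (symbol 1) adds a half-turn:
    torsion = sum(w)
    # With standard insertion, every crossing has the same sign, so the
    # self-linking number is the number of order reversals (inversions):
    sl = sum(1 for i, j in combinations(range(p), 2) if perm[i] > perm[j])
    return torsion, sl
```

For instance, `horseshoe_invariants('0110111')` returns `(5, 16)`, matching the torsion 5 and self-linking number 16 quoted above, and the period-2 orbit 01 has the familiar single self-crossing.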
It is generally observed that a number of orbits, often of low period, have only one symbolic name compatible with their topological invariants, and that most orbits have only a few possible symbolic names [46]. Once the symbolic dynamics of a number of orbits in phase space is known, a natural idea is to interpolate this information to other points in phase space, since symbolic encodings are required to be continuous mappings, and thus to construct a partition of phase space taking into account topological
information. When an orbit has several possible topological symbolic names, it is most often found that only one of them is eventually compatible with a partition achieving a continuous encoding of trajectories [46]. This approach has been successfully used to construct a generating partition for a chaotic laser experiment [33]. In order to assess its validity, it was furthermore checked in numerical simulations involving about 1500 periodic orbits up to period 64 of a chaotic laser model that it was possible to construct a partition completely compatible with topological analysis [46] (Fig. 13). Moreover, it was shown that the partition obtained agrees to an accuracy of 10−4 with that obtained by connecting points of tangency between the unstable and stable manifolds [47] (Fig. 13), which is the reference method to construct a symbolic encoding for a non-hyperbolic attractor [43]. This is remarkable, as the two methods are completely independent, and confirms that the topological analysis of chaos provides a description of the dynamics that is valid down to scales much smaller than can be resolved in almost all experiments.
Figure 13. Left: partition of a section of a strange attractor observed in numerical simulations of a chaotic laser model. Center: enlarged view of the partition border. The width of the box is 5 × 10−2 in units of the attractor width. Reprinted from [46]. Right: the size of the box is 2.5 × 10−3 . Light dots represent selected periodic points. Heavy dots indicate homoclinic tangencies (angle between the two invariant manifolds smaller than 2 × 10−4 radians). The linewidth is 2.5 × 10−5 . Reprinted from [47].
5.2. Orbit forcing and signatures of chaos

We have seen in previous sections that periodic orbits of a chaotic dynamical system are intertwined in a very complex link. Here we focus on how this link is gradually woven as new orbits are created, or unwoven as orbits are destroyed. This is a subtle problem because of two important constraints: (i) periodic orbits are created with the topological invariants they will have on their entire domain of existence; (ii) periodic orbits are not born isolated but degenerate with another orbit, a saddle-node partner or a period-doubling mother [2, 3, 9]. Indeed, there are two main types of bifurcations in which new periodic orbits are created: the saddle-node bifurcation, in which a pair of twin orbits appears simultaneously, and the period-doubling bifurcation, where a periodic orbit becomes unstable while giving birth to an orbit of twice the period. Thus, a template not only describes the global organization of periodic orbits of a flow at fixed parameters but also determines which orbits can interact in bifurcations and the order in which orbits appear as one proceeds towards fully developed chaos. Saddle-node partners must have the same knot type, because they are indistinguishable at some point of parameter space. Similarly, there is a simple relation between the knot types of
the mother and daughter orbits of a period-doubling bifurcation. Moreover, all the orbits that are already present when two other orbits become degenerate at a bifurcation cannot distinguish them topologically: they link them in exactly the same way. Conversely, an orbit that links two twin orbits differently must have appeared after their creation (i.e., at parameter values where they are well separated and distinct) (Fig. 14). This orbit is said to force the existence of the two other orbits. All forcing relations for a given template can be summarized in a forcing diagram (Fig. 14). Orbit forcing makes it possible to describe the orbit spectrum in a concise way using basis sets of orbits: orbits that force the other orbits in the attractor but are forced by no other orbit [3, 9, 51]. Basis sets of orbits are a powerful tool for describing the different possible routes towards fully developed chaos.
Figure 14. Left: in these three configurations, (AR , AF ) and (BR , BF ) are saddle-node pairs. In (a), orbits AR and AF cannot interact before BF and BR annihilate each other. Thus, pair B has appeared after pair A. The opposite conclusion holds in (c). When the members of a pair cannot be distinguished through their linking numbers with the other pair, they have appeared after the other pair. In (b), there is no forcing relation. Right: forcing diagram for horseshoe orbits obtained from forcing relations such as those illustrated at left. Lines indicate forcing relations, with the forcing orbit to the right and top. Reprinted from [51].
Studying the topology of bifurcations is not the only way to determine forcing relations. The knot type of an individual orbit can imply the existence of infinitely many other orbits and provide information about the global structure of the flow [52], using tools from the Nielsen–Thurston classification of surface homeomorphisms [53]. By building a geometric model of the simplest homeomorphism of the plane compatible with such an orbit, it can be shown that the complement of the orbit is stretched at each iteration (Fig. 15). The minimal stretching rate obtained is the topological entropy of the orbit [52, 54]. It is a topological invariant which depends only on the braid type. Forcing relations are obtained by noting that the periodic orbits carried by the minimal flow supporting a given orbit are forced by this orbit. Using these ideas, much progress has been made towards understanding the bifurcation structure of horseshoe-like systems [55–57]. A closely related problem is that of forcing relations in homoclinic tangles, where the structure of the intersections between the stable and unstable manifolds is given [58, 59]. Periodic orbits with positive-entropy braid type are powerful indicators of chaos: detecting only one such orbit is sufficient to prove that the dynamics is chaotic [6, 29, 60]. As was shown recently in an optical experiment [61], they are invaluable tools for obtaining signatures of chaos in nonstationary systems (Fig. 16), when methods that rely on reconstructing an entire strange attractor fail because of parameter drift.
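A simple computable lower bound on the entropy of a braid type is the logarithm of the spectral radius of its reduced Burau matrix evaluated at t = −1; this is a standard bound (not the train-track machinery of [54]), and it is sharp for the simplest pseudo-Anosov example, the period-3 braid σ1 σ2⁻¹. A minimal sketch for 3-strand braids:

```python
import math

# Reduced Burau matrices of the 3-strand braid group, evaluated at t = -1.
# The log of the spectral radius of a braid word's matrix is a lower bound
# for the topological entropy of the braid type.
S1  = [[1, 1], [0, 1]]   # sigma_1 at t = -1
S2I = [[1, 0], [1, 1]]   # sigma_2^{-1} at t = -1

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def entropy_bound(word):
    """Lower bound on braid entropy from a list of 2x2 generator matrices."""
    m = [[1, 0], [0, 1]]
    for g in word:
        m = matmul(m, g)
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = tr * tr - 4 * det
    # largest eigenvalue modulus of a real 2x2 matrix
    rho = (abs(tr) + math.sqrt(disc)) / 2 if disc >= 0 else math.sqrt(abs(det))
    return math.log(max(rho, 1.0))

h = entropy_bound([S1, S2I])   # the braid sigma_1 sigma_2^{-1}
```

The bound here is log((3 + √5)/2) ≈ 0.962 for the standard period-3 example; the period-9 braid of the experiment in Fig. 16 has a different (smaller) entropy, hT = 0.377.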
Figure 15. Left: a period-three braid and action of the associated homeomorphism of the disk. Right top: representation of the braid as a line diagram. Right bottom: action of the homeomorphism on a set of Markov rectangles and transition matrix. Reprinted from [52] with permission from Elsevier Science.
[Figure 16 artwork: (a) signal intensity (arb. units) versus time, 790–870 μs; (b) enlarged segment, 805–825 μs; right panel: X(t + τ) versus X(t) (arb. units).]
Figure 16. Top left: time series from an optical parametric oscillator showing a burst of irregular behavior. Bottom left: segment of the time series containing a periodic orbit of period 9. Right: embedding of the periodic orbit in a reconstructed phase space and representation of the braid realized by the orbit. The braid entropy is hT = 0.377, showing that the underlying dynamics is chaotic. Reprinted from [61].
6. Towards a higher-dimensional topological analysis

In chaotic systems with a three-dimensional state space, knots and links formed by periodic orbits are extremely powerful tools: they can be used to classify strange attractors, construct symbolic encodings, understand bifurcation diagrams, obtain signatures of chaos, etc. In four dimensions and above, they are absolutely worthless, as all closed curves become isotopic to each other. Unfortunately, not all chaotic attractors are three-dimensional. Therefore, it would seem that template analysis is a deep theory, but not very relevant for general chaotic dynamical systems. However, the Birman–Williams theorem (Sec. 3.1) is not entirely about knots. Obviously, the statement about isotopy of periodic orbits does not survive in higher dimensions, but it is reasonable to believe that semi-flows on branched manifolds can be constructed from flows of arbitrary dimension by squeezing the flow along its stable directions [i.e., by applying the Birman–Williams reduction (1)]. Templates for 3D flows have return maps of the 1D branch line into itself that display the fold singularities which are the signatures of the folding process in phase space. Similarly, branched manifolds for n-dimensional flows will have a singular return map from the (n − 2)-dimensional branch line into itself. It should thus be possible to characterize the flow through the organization of the leading singularities of this return map, which will typically be folds for n = 1,
cusps for n = 2, etc. [3, 9, 62–64]. These singularities are the signatures of the stretching and squeezing mechanisms organizing the chaotic dynamics (Fig. 17).
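These leading singularities have simple polynomial normal forms (the fold and the cusp of catastrophe theory [63, 64]). A small numerical check, purely illustrative, verifies that the Jacobian of the Whitney cusp normal form (x, y) ↦ (x³ + xy, y) is singular exactly on the curve y = −3x²:

```python
# Whitney normal forms of the generic map singularities: the fold
# f(x) = x**2 (singular at x = 0) and the cusp F(x, y) = (x**3 + x*y, y),
# whose Jacobian determinant 3*x**2 + y vanishes on the curve y = -3*x**2.
def cusp(x, y):
    return (x**3 + x * y, y)

def jac_det(x, y, h=1e-6):
    """Numerical Jacobian determinant of the cusp map at (x, y),
    via central finite differences."""
    fx = [(cusp(x + h, y)[i] - cusp(x - h, y)[i]) / (2 * h) for i in (0, 1)]
    fy = [(cusp(x, y + h)[i] - cusp(x, y - h)[i]) / (2 * h) for i in (0, 1)]
    return fx[0] * fy[1] - fx[1] * fy[0]
```

The determinant vanishes on the singular curve, e.g. at (0.5, −0.75), and equals 3x² + y away from it.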
Figure 17. Left: by going around a 3D branched manifold for 4D flows, a 2D section will be stretched, squeezed and folded over itself, typically developing cusp singularities. Right: branched manifolds could be classified by how their singularities are organized. Reprinted from [3].
More generally, the key principles upon which the topological analysis of chaos is built, namely continuity and determinism, hold in any dimension. In order to extend the topological analysis of chaos to higher dimensions, it is important to recognize that knot theory is not a crucial ingredient but only a convenient tool to explore how continuity and determinism constrain phase-space trajectories. It is because two trajectories cannot intersect that the knot type of a 3D periodic orbit is well defined and that it is not modified as the orbit is deformed under control parameter variation. However, the non-intersection of two 1D curves is a geometric problem that is nontrivial only in 3D. Thus, a formulation of determinism that adapts naturally to the phase-space dimension is needed. It has recently been proposed to base topological analysis on an integral version of the non-intersection theorem [65]. As a volume element of phase space is advected by the flow, the image of its boundary can be stretched and squeezed, but it cannot intersect itself, as this would mean that the trajectories of two of its points intersect. In other words, the exterior and the interior of a droplet in a fluid remain separated at all times. A technical formulation of this property is that volume orientation is preserved. This requirement is obviously dimension-independent and can also be applied to hypersurfaces in Poincaré sections (Fig. 18).
Figure 18. Left: time evolution of a closed curve in successive Poincaré sections of a 3D flow. The orientation of the enclosed area is preserved. Right: a dynamics on a triangulation is constructed.
Given a periodic trajectory, a dynamics on surfaces can be constructed by computing triangulations of the periodic points in successive Poincaré sections and determining the simplest dynamics on these triangulations that is consistent with the motion of the periodic points and preserves volume orientation. Remarkably, in dimension three this leads to a formalism from which the correct entropies for horseshoe orbits can be recovered [65]. The extension to higher dimensions is currently under investigation.
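The orientation-preservation constraint itself is elementary to check numerically: the sign of the signed area of a simplex of trajectory points must be conserved from one Poincaré section to the next. A toy sketch (the rotation and reflection below are illustrative stand-ins, not the section-to-section maps of [65]):

```python
import math

def signed_area(p, q, r):
    """Twice the signed area of triangle (p, q, r); its sign is the orientation."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def preserves_orientation(f, tri):
    """True if the planar map f keeps the orientation of the triangle tri."""
    return signed_area(*tri) * signed_area(*(f(p) for p in tri)) > 0

# Illustrative maps: a rotation (a caricature of a section-to-section map of
# a flow) and a reflection, which a deterministic flow could not realize.
rotate  = lambda p: (math.cos(0.7) * p[0] - math.sin(0.7) * p[1],
                     math.sin(0.7) * p[0] + math.cos(0.7) * p[1])
reflect = lambda p: (p[0], -p[1])

triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
```

The rotation preserves the triangle's orientation, while the reflection reverses it and would therefore be rejected as a candidate dynamics.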
7. Conclusion

The stretching and squeezing mechanisms that organize strange attractors can be characterized systematically. For three-dimensional systems there is a powerful analysis that relies on the fact that unstable periodic orbits in a chaotic attractor can be projected onto a branched manifold (a template) without modifying their topological invariants. It proceeds by detecting periodic orbits in time series, computing their topological invariants and reconstructing the branched manifold from the invariants. Attractors that are associated with different templates experience different stretching and squeezing mechanisms. This classification has been used to understand the systematic changes in the topological structure of chaotic forced nonlinear oscillators as the modulation frequency and amplitude are varied. Knots and templates can also be used to study the fine structure of chaotic attractors. Because the topological invariants of a periodic orbit carry information about its symbolic dynamics, template analysis has proved a powerful tool for constructing symbolic encodings and obtaining a combinatorial description of the trajectory set. Knots are also useful for understanding the structure of bifurcation diagrams and the order in which periodic orbits are created. Certain knot types imply positive entropy and can only occur in a chaotic system: they can be used as signatures of chaos when they can be extracted from short or nonstationary time series. Knots do not survive in higher dimensions. However, branched manifolds do, and there is currently some hope that a topological analysis where continuity and determinism are enforced through volume orientation preservation can be built.
References
[1] H. Poincaré, Les Méthodes nouvelles de la mécanique céleste, Gauthier-Villars, Paris, 1892.
[2] E. Ott, Chaos in Dynamical Systems, Cambridge University Press, Cambridge, 1993.
[3] R. Gilmore and M. Lefranc, The Topology of Chaos, Wiley, New York, 2002.
[4] F. Takens, Detecting strange attractors in turbulence, in Lecture Notes in Mathematics, Vol. 898, D. A. Rand and L.-S. Young, Eds., Springer-Verlag, New York, 1981, pp. 366–381.
[5] N. H. Packard, J. P. Crutchfield, J. D. Farmer, and R. S. Shaw, Geometry from a time series, Phys. Rev. Lett. 45 (1980), 712–715.
[6] M. Lefranc and P. Glorieux, Topological analysis of chaotic signals from a CO2 laser with modulated losses, Int. J. Bifurcation Chaos Appl. Sci. Eng. 3 (1993), 643–649.
[7] S. Smale, Differentiable dynamical systems, Bull. Am. Math. Soc. 73 (1967), 747–817.
[8] A. Katok and B. Hasselblatt, Introduction to the Modern Theory of Dynamical Systems, Cambridge University Press, Cambridge, 1995.
[9] R. Gilmore, Topological analysis of chaotic dynamical systems, Rev. Mod. Phys. 70 (1998), 1455–1530.
[10] D. Auerbach, P. Cvitanović, J.-P. Eckmann, G. Gunaratne, and I. Procaccia, Exploring chaotic motion through periodic orbits, Phys. Rev. Lett. 58 (1987), 2387–2389.
[11] P. Cvitanović, Invariant measures of strange sets in terms of cycles, Phys. Rev. Lett. 61 (1988), 2729–2732.
[12] C. Grebogi, E. Ott, and J. A. Yorke, Unstable periodic orbits and the dimension of chaotic attractors, Phys. Rev. A 36 (1987), 3522–3524.
[13] L. H. Kaufmann, Knots and Physics, World Scientific, Singapore, 1991.
[14] H. G. Solari and R. Gilmore, Relative rotation rates for driven dynamical systems, Phys. Rev. A 37 (1988), 3096–3109.
[15] N. B. Tufillaro, H. G. Solari, and R. Gilmore, Relative rotation rates: Fingerprints for strange attractors, Phys. Rev. A 41 (1990), 5717–5720.
[16] J. S. Birman and R. F. Williams, Knotted periodic orbits in dynamical systems I: Lorenz's equations, Topology 22 (1983), 47–82.
[17] J. Birman and R. Williams, Knotted periodic orbits in dynamical systems II: knot holders for fibered knots, Cont. Math. 20 (1983), 1–60.
[18] P. J. Holmes and R. F. Williams, Knotted periodic orbits in suspensions of Smale's horseshoe: torus knots and bifurcation sequences, Arch. Rat. Mech. Anal. 90 (1985), 115–194.
[19] P. J. Holmes, Knotted periodic orbits in suspensions of Smale's horseshoe: period multiplying and cabled knots, Physica D 21 (1986), 7–41.
[20] P. J. Holmes, Knotted periodic orbits in suspensions of annulus maps, Proc. Roy. Soc. Lond. A411 (1987), 351–378.
[21] P. J. Holmes, Knotted periodic orbits in suspensions of Smale's horseshoe: extended families and bifurcation sequences, Physica D 40 (1989), 42–64.
[22] R. W. Ghrist, P. J. Holmes, and M. C. Sullivan, Knots and Links in Three-Dimensional Flows, vol. 1654 of Lecture Notes in Mathematics, Springer, Berlin, 1997.
[23] T. D. Tsankov and R. Gilmore, Strange attractors are classified by bounding tori, Phys. Rev. Lett. 91 (2003), 134104.
[24] T. D. Tsankov and R. Gilmore, Topological aspects of the structure of chaotic attractors in R3, Phys. Rev. E 69 (2004), 056215.
[25] C. Letellier, T. D. Tsankov, G. Byrne, and R. Gilmore, Large-scale structural reorganization of strange attractors, Phys. Rev. E 72 (2005), 026212.
[26] C. Letellier and R. Gilmore, Covering dynamical systems: Twofold covers, Phys. Rev. E 63 (2001), 016206.
[27] R. Gilmore and C. Letellier, Dressed symbolic dynamics, Phys. Rev. E 67 (2003), 036205.
[28] G. B. Mindlin, X.-J. Hou, H. G. Solari, R. Gilmore, and N. B. Tufillaro, Classification of strange attractors by integers, Phys. Rev. Lett. 64 (1990), 2350–2353.
[29] G. B. Mindlin, H. G. Solari, M. A. Natiello, R. Gilmore, and X.-J. Hou, Topological analysis of chaotic time series data from Belousov–Zhabotinski reaction, J. Nonlinear Sci. 1 (1991), 147–173.
[30] G. B. Mindlin and R. Gilmore, Topological analysis and synthesis of time series, Physica D 58 (1992), 229–242.
[31] F. Papoff, A. Fioretti, E. Arimondo, G. B. Mindlin, H. G. Solari, and R. Gilmore, Structure of chaos in the laser with saturable absorber, Phys. Rev. Lett. 68 (1992), 1128–1131.
[32] N. B. Tufillaro, R. Holzner, L. Flepp, R. Brun, M. Finardi, and R. Badii, Template analysis for a chaotic NMR laser, Phys. Rev. A 44 (1991), R4786–R4788.
[33] M. Lefranc, P. Glorieux, F. Papoff, F. Molesti, and E. Arimondo, Combining topological analysis and symbolic dynamics to describe a strange attractor and its crises, Phys. Rev. Lett. 73 (1994), 1364–1367.
[34] C. Letellier, L. Le Sceller, P. Dutertre, G. Gouesbet, Z. Fei, and J. L. Hudson, Topological characterization and global vector field reconstruction of an experimental electrochemical system, J. Phys. Chem. 99 (1995), 7016–7027.
[35] N. B. Tufillaro, P. Wyckoff, R. Brown, T. Schreiber, and T. Molteno, Topological time series analysis of a string and its synchronized model, Phys. Rev. E 51 (1995), 164–174.
[36] C. Letellier, P. Dutertre, and B. Maheu, Unstable periodic orbits and templates of the Rössler system: toward a systematic topological characterization, Chaos 5 (1995), 271–282.
[37] R. Gilmore and J. W. L. McCallum, Structure in the bifurcation diagram of the Duffing oscillator, Phys. Rev. E 51 (1995), 935–956.
[38] C. Letellier, G. Gouesbet, and N. F. Rulkov, Topological analysis of chaos in equivariant electronic circuits, Int. J. Bifurcation Chaos Appl. Sci. Eng. 6 (1996), 2531–2555.
[39] G. Boulant, S. Bielawski, D. Derozier, and M. Lefranc, Experimental observation of a chaotic attractor with a reverse horseshoe topological structure, Phys. Rev. E 55 (1997), R3801–R3804.
[40] G. Boulant, M. Lefranc, S. Bielawski, and D. Derozier, Horseshoe templates with global torsion in a driven laser, Phys. Rev. E 55 (1997), 5082–5091.
[41] G. Boulant, J. Plumecoq, S. Bielawski, D. Derozier, and M. Lefranc, Model validation and symbolic dynamics of chaotic lasers using template analysis, in Proceedings of the 4th Experimental Chaos Conference, M. Dong, W. Ditto, L. Pecora, M. Spano, and S. Vohra, Eds., World Scientific, Singapore, 1998, pp. 121–126.
[42] J. Used, J.-C. Martín, and M. Lefranc, unpublished.
[43] P. Grassberger and H. Kantz, Generating partitions for the dissipative Hénon map, Phys. Lett. A 113 (1985), 235–238.
[44] M. Finardi, L. Flepp, J. Parisi, R. Holzner, R. Badii, and E. Brun, Topological and metric analysis of heteroclinic crisis in laser chaos, Phys. Rev. Lett. 68 (1992), 2989–2991.
[45] R. L. Davidchack, Y.-C. Lai, E. M. Bollt, and M. Dhamala, Estimating generating partitions of chaotic systems by unstable periodic orbits, Phys. Rev. E 61 (2000), 1353–1356.
[46] J. Plumecoq and M. Lefranc, From template analysis to generating partitions I: Periodic orbits, knots and symbolic encodings, Physica D 144 (2000), 231–258.
[47] J. Plumecoq and M. Lefranc, From template analysis to generating partitions II: Characterization of the symbolic encodings, Physica D 144 (2000), 259–278.
[48] M. B. Kennel and M. Buhl, Estimating good discrete partitions from observed data: symbolic false nearest neighbors, Phys. Rev. Lett. 91 (2003), 084102.
[49] Y. Hirata, K. Judd, and D. Kilminster, Estimating a generating partition from observed time series: symbolic shadowing, Phys. Rev. E 70 (2004), 016215.
[50] P. Cvitanović, G. H. Gunaratne, and I. Procaccia, Topological and metric properties of Hénon-type strange attractors, Phys. Rev. A 38 (1988), 1503–1520.
[51] G. B. Mindlin, R. Lopez-Ruiz, H. G. Solari, and R. Gilmore, Horseshoe implications, Phys. Rev. E 48 (1993), 4297–4304.
[52] P. Boyland, Topological methods in surface dynamics, Topology and its Applications 58 (1994), 223–298.
[53] W. P. Thurston, On the geometry and dynamics of diffeomorphisms of surfaces, Bull. Am. Math. Soc. 19 (1988), 417–431.
[54] M. Bestvina and M. Handel, Train-tracks for surface homeomorphisms, Topology 34 (1995), 109–140.
[55] T. Hall, Weak universality in two-dimensional transitions to chaos, Phys. Rev. Lett. 71 (1993), 58–61.
[56] T. Hall, The creation of horseshoes, Nonlinearity 7 (1994), 861–924.
[57] A. de Carvalho and T. Hall, How to prune a horseshoe, Nonlinearity 15 (2002), R19–R68.
[58] P. Collins, Forcing relations for homoclinic orbits of the Smale horseshoe map, Experiment. Math. 14 (2005), 75–86.
[59] P. Collins and B. Krauskopf, Entropy and bifurcations in a chaotic laser, Phys. Rev. E 66 (2002), 056201.
[60] F. Papoff, A. Fioretti, E. Arimondo, G. B. Mindlin, H. G. Solari, and R. Gilmore, Structure of chaos in the laser with a saturable absorber, Phys. Rev. Lett. 68 (1992), 1128–1131.
[61] A. Amon and M. Lefranc, Topological signature of deterministic chaos in short nonstationary signals from an optical parametric oscillator, Phys. Rev. Lett. 92 (2004), 094101.
[62] R. Gilmore, Topological analysis of chaotic time series, in Applications of Soft Computing, B. Bosacchi, J. C. Bezdek, and D. B. Fogel, Eds., vol. 3165 of Proc. SPIE, SPIE, Bellingham, 1997, pp. 243–257.
[63] R. Gilmore, Catastrophe Theory for Scientists and Engineers, Wiley, New York, 1981; reprinted by Dover, New York, 1993.
[64] V. I. Arnol'd, S. M. Gusein-Zade, and A. N. Varchenko, Singularities of Differentiable Mappings I, Birkhäuser, 1985.
[65] M. Lefranc, Alternate determinism principle for topological analysis of chaos, ArXiv preprint nlin.CD/050305, 2005.
Physics and Theoretical Computer Science
J.-P. Gazeau et al. (Eds.)
IOS Press, 2007
© 2007 IOS Press. All rights reserved.
Random Fractals

Christoph Bandt¹
Arndt University, Greifswald, Germany

Abstract. This is a mathematical but non-technical survey on random fractals and random processes on deterministic fractals. We focus on important principles of self-similarity and randomness, and discuss a number of interesting examples.

Keywords. Self-similar, fractal, random process, harmonic function
1. A Brief History of Fractals

When we approach a subject in Science, we often look at its history. In the case of fractals, various periods of development can be distinguished roughly as follows. Around 1900, strange counterexamples were found by famous mathematicians:
• Cantor's set consisting of "continuum many" separate points (Section 3)
• plane-filling curves (Peano, Hilbert)
• curves without intersection points but with positive area (Osgood)
• nowhere differentiable functions (Weierstraß, von Koch)
• graphs where each point is a branch point (Sierpiński)
Starting in 1920, important theory was created but not yet publicly recognized.
• Hausdorff and Besicovitch developed the concepts of fractional dimension and fractional measure.
• Wiener and Kolmogorov founded the theory of Brownian motion.
• In the 1940s Lévy discovered more general self-similar random processes which now carry his name.
After 1975, fractals became popular, and a wave of simple applications followed.
• Benoit Mandelbrot, who considers himself a student of Lévy, created the word "fractal" and wrote his book "Les objets fractals: forme, hasard et dimension", which in English became "The Fractal Geometry of Nature".
• Thousands of papers throughout science, in engineering and the social sciences were inspired by Mandelbrot's book.
• Computer graphics provided impressive visualizations of fractals.
• In probability theory, new self-similar constructions were developed: branching Brownian motion, Aldous' tree, Le Gall's Brownian snake, etc.
Since 2000, new important theory is being worked out.
¹ Correspondence to: Institut für Mathematik und Informatik, Arndt-Universität, 17487 Greifswald, Germany, Tel.: +49 3834 864632; Fax: +49 3834 864615; E-mail:
[email protected].
92
C. Bandt / Random Fractals
• Lawler, Schramm and Werner developed a theory of conformally invariant random fractal curves.
• S. Smirnov proved conformal invariance of two-dimensional percolation.
• Analysis on fractals (which started in 1982 with two letters in J. Physique) is developing into a mathematical area of its own (see Sections 7-9).
Thus we live in a time where a deep probabilistic theory of fractals becomes visible. In this survey, we cannot go into details. Instead we discuss examples and general principles of self-similarity and randomness, and we try to stay close to the viewpoint which made Mandelbrot's book so influential: fractals are a tool for modelling nature. When a good theory is available, another wave of more profound applications can certainly be expected.
2. Two Self-similarity Arguments
What is so special about self-similar constructions? Small pieces look the same as the whole figure, so more and more details appear when we magnify the picture. This is in contrast to the topics treated by calculus. If we magnify a differentiable curve, we approach the shape of a line, and in a similar way all local phenomena are assumed to be linear in the classical approach, at least approximately. In the period of counterexamples, fractals were considered as monsters since they could not be approached by calculus. In the period of applications, it was realized that natural phenomena are often organized in a hierarchical or even self-similar manner: look at cauliflower, snowflakes, the human lung, fern leaves, clouds. Of course this is also a crude approximation. Wherever you cut your skin, blood will appear. Thus, Mandelbrot concludes, the blood vessels must form a space-filling curve. In the intermediate period, when fractional dimension and random processes were developed, it was found that self-similarity is not so bad for mathematical proofs either. Self-similarity is a symmetry assumption which simplifies structure and calculation. Let us demonstrate this with two examples.
Figure 1. The Koch curve does not admit a tangent.
Example 1. The Koch curve is differentiable at none of its points x. Why couldn't there be exceptional points? To see this, take any line ℓ through the Koch curve K in Figure 1. Let h and b denote the height and base length of K. It is easy to see that some points of K (one of the endpoints, or the tip in the middle) must have distance ≥ h/2 from ℓ. A simple calculation shows h/2 > b/7.
Now take any point x in K and the ball Br(x) around x with radius r. If K is differentiable with tangent line ℓ at x, then for given ε > 0 and small enough r, all points of K ∩ Br(x) must have distance smaller than εr from ℓ. However, there is a little piece of K which contains x, is contained in Br(x), and has base length b > r/3. Since this piece is similar to K, we can apply the above argument to see that there are points in K ∩ Br(x) with distance from ℓ larger than b/7 > r/21. So ℓ cannot be a tangent, since the condition is not fulfilled for ε < 1/21. •
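The numerical inequality used in this argument can be checked in a few lines; a small sketch, taking h to be the height b·√3/6 of the central tip of the Koch curve over a base of length b (an elementary geometric fact, stated here as an assumption):

```python
import math

b = 1.0                    # base length of the Koch curve
h = b * math.sqrt(3) / 6   # height of the central tip over the base

# The argument needs h/2 > b/7: some point of K lies at distance
# at least h/2 from any line through K, which beats b/7.
print(h / 2, b / 7)   # 0.1443... versus 0.1428...
```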
Figure 2. Sierpiński's gasket is a graph with branching at every point.
Example 2. Each point x in the Sierpiński gasket is a branching point: it can be reached by three disjoint polygonal paths from the vertices of the gasket. To see this, it is enough to prove that for any tiny subtriangle, the three vertices can be reached by disjoint paths from the three vertices of the whole triangle. Then the paths to x can be obtained as limits of the paths to subtriangles containing x which become smaller and smaller. However, to show the assertion for subtriangles, it is enough to consider only the three subtriangles of the first level, for which the assertion is obvious (one of the paths will be a single point, the other two are intervals). Now we use induction and repeat this argument on the other levels to approach smaller triangles. •
3. The Cantor Set
The Cantor set is the father of all fractals. It is an abstract concept which appears everywhere in modern mathematics and which every mathematician should know. We start with the concrete one-dimensional construction of Cantor.
The middle-third set. From [0, 1], remove the open middle third (1/3, 2/3). From the 2 remaining intervals of length 1/3, again remove the middle third. From the 4 remaining intervals of length 1/9, again remove the middle, and so on. Since the total length (2/3)^k of the remaining intervals tends to 0 for k → ∞, the resulting set C has total length 0. However, C has as many points as [0, 1].
C = { ∑_{k=1}^∞ ak 3^{-k} : ak ∈ {0, 2} for all k }
while
[0, 1] = { ∑_{k=1}^∞ bk 2^{-k} : bk ∈ {0, 1} for all k }.
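The ternary-digit description of C can be made concrete; a minimal sketch, enumerating only the finite level-n approximations (the left endpoints of the remaining intervals, i.e. finite digit sums with ak ∈ {0, 2}):

```python
from itertools import product

def cantor_points(n):
    """Left endpoints of the 2**n intervals of level n of the middle-third
    Cantor set: finite sums a_1/3 + a_2/9 + ... with digits a_k in {0, 2}."""
    return sorted(sum(a * 3.0 ** -(k + 1) for k, a in enumerate(digits))
                  for digits in product((0, 2), repeat=n))

pts = cantor_points(3)
print(len(pts))   # 8, one endpoint per remaining interval of level 3
```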
Similarly, we can take the set C̃ of all decimal numbers 0.c1c2c3... with ck ∈ {0, 9} for all k. This set C̃ is even smaller than C since the holes are much larger. Still it has the same number of points.
Cantor set as limit of a tree. This description is more abstract and applies to both C and C̃. Instead of cutting holes into [0, 1], we can take a binary tree and consider C as its set of limit points:
Figure 3. The Cantor set as limit set of the binary tree
The nodes of the tree correspond to the intervals which are left on the different levels. The limit points correspond to infinite paths in the tree.
Cantor set and symbol sequences. Each point in the Cantor set is characterized by a sequence b1b2b3... of zeros and ones. In the tree, going left means 0 and going right means 1. In probability theory, C is taken as the coding space for sequences of coin tosses: 1 = head, 0 = tail. Other codings with three or more symbols, and with certain forbidden combinations bk bk+1, lead to equivalent Cantor sets, as is shown in topology. There, the Cantor set is considered as the infinite product space C = ∏_{k=1}^∞ {0, 1}, which is exactly the space of 0-1-sequences. In a more general setting, C can be characterized by topological properties.
Theorem. Each compact totally disconnected set without isolated points is homeomorphic to C.
Totally disconnected means that there is a base {B1, B2, ...} of open-and-closed sets. The idea of the proof is to assign to each x the sequence b1b2... with bi = 1 if x ∈ Bi and bi = 0 else, for an appropriate base. •
Theorem. Each compact metric space X is a continuous image of C.
The proof idea here is to assign the basic sets of C which are given by 0-1-words to basic sets which cover X. Fractal examples are given below. There the assignment is defined by contractive mappings fj associated to the symbols j. •
Now we turn our attention to the self-similarity of the Cantor set. Clearly, C consists of small pieces Cw which are similar to C itself. In the binary tree, each piece corresponds to a node and to the binary subtree below this node. In the symbol representation, each piece corresponds to a 0-1-word w = w1...wn, and Cw is given by all 0-1-sequences which start with w. They have the form w1...wn b1b2b3... with bk ∈ {0, 1}.
4. Self-similarity and Dimension
The Cantor set and similar examples led Mandelbrot to formulate the concept of self-similarity: small pieces are similar to the whole. A rigorous concept was given in 1981 by Hutchinson. Let f1, ..., fm be contracting maps on R^n: |fk(x) − fk(y)| ≤ rk · |x − y| with 0 < rk < 1. We shall consider only similarity maps, where equality holds instead of ≤, and rk is called the factor of fk.
Definition. A compact set A ≠ ∅ is called self-similar with respect to f1, ..., fm if
A = f1(A) ∪ ... ∪ fm(A).
Hutchinson's Theorem. Given the fk, there exists a unique such A.
A proof, by simple use of the contraction mapping principle, can be found in textbooks [1,6]. The Sierpiński gasket (Figure 2) with vertices c1, c2, c3 is obtained from the mappings fk(x) = (x + ck)/2, k = 1, 2, 3.
Addresses. Self-similar sets can be addressed in a natural way by symbols. Let S = {1, ..., m}. Consider the set S^∞ of all sequences s = s1s2s3... from S and the set S* = ∪_{n=0}^∞ S^n of all words w = w1...wn from S. Each piece of a self-similar set A corresponds to a word, Aw = f_{w1} f_{w2} ... f_{wn}(A), and each point x can be assigned an address s ∈ S^∞ by
x = lim_{n→∞} f_{s1} f_{s2} ... f_{sn}(x0)
where x0 is arbitrary. Points in the intersection of different pieces have several addresses. In this way the Cantor set S^∞ and the corresponding m-ary tree describe the structure of any self-similar set.
Pythagoras and similarity dimension. A right-angled triangle with side lengths r1 ≤ r2 ≤ 1 is self-similar: the altitude on the hypotenuse divides it into two smaller similar triangles, and the corresponding maps f1, f2 have factors r1, r2. Pythagoras' theorem
r1² + r2² = 1
comes from the fact that the areas of the two small triangles are r1² and r2² times the area of the big triangle.
Figure 4. Proof of Pythagoras' theorem by area and similarities
Proposition. For a self-similar set A with factors ri, there is a unique α with r1^α + ... + rm^α = 1. This α is called the similarity dimension. •
For the Sierpiński gasket, 3 · (1/2)^α = 1 gives α = log 3 / log 2 ≈ 1.58.
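The defining equation r1^α + ... + rm^α = 1 can also be solved numerically, since its left-hand side is strictly decreasing in α; a minimal bisection sketch:

```python
import math

def similarity_dimension(ratios):
    """Solve sum(r**alpha for r in ratios) = 1 by bisection.
    The left side decreases strictly from m (at alpha = 0) to 0,
    so the root alpha is unique."""
    lo, hi = 0.0, 50.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if sum(r ** mid for r in ratios) > 1:
            lo = mid          # sum still too large: alpha must grow
        else:
            hi = mid
    return (lo + hi) / 2

print(similarity_dimension([0.5, 0.5, 0.5]))   # gasket: log 3 / log 2 = 1.5849...
print(similarity_dimension([1/3, 1/3]))        # Cantor set: log 2 / log 3 = 0.6309...
```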
Natural measure on self-similar sets. In the same way as Pythagoras' theorem can be derived from area, similarity dimension is based on Hausdorff's fractional measures. On a self-similar set A, a natural measure μ can be defined by putting
μ(A) = 1, μ(Aj) = rj^α, μ(Ajk) = rj^α rk^α, ... for j, k, ... ∈ {1, ..., m}.
The equation for α is necessary here, as well as the fact that the Ak do not substantially overlap. The latter is expressed by the open set condition: there is an open set U such that fk(U) ⊆ U for all k and fj(U) ∩ fk(U) = ∅ for j ≠ k. If this condition holds, then μ coincides with the α-dimensional Hausdorff measure on A, up to a constant factor [6]. In particular, α is the famous Hausdorff dimension.
Mass scaling and dimension. Let us explain a more general concept of fractal dimension. Given a measure μ on R^n and a point x in R^n, one can ask whether there is an exponent α such that the measure of the ball Ur(x) is approximately Cr^α, for some constant C and all sufficiently small r. For the length measure on the line we have μ(Ur(x)) = 2r, so α = 1 with C = 2. Similarly, α = n for the volume measure in R^n. If
lim_{r→0} log μ(Ur(x)) / log r = α,
then α is called the local dimension of μ at x. If μ is the natural measure on a self-similar set A with open set condition, then the similarity dimension is the local dimension of μ at each point x of A. So the two concepts of dimension coincide.
Remark: If the local dimension takes different values, μ is called a multifractal measure. For example, we can define other measures on the gasket by μ(fk(A)) = pk, where pk > 0 and ∑ pk = 1 (not necessarily pk = rk^α), and μ(fj fk(A)) = pj pk etc.
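Random iteration makes both the addresses and the natural measure tangible; a sketch for the gasket with assumed vertices ck, where each step applies a randomly chosen fk (choosing k with probability pk = rk^α = 1/3 samples the natural measure μ):

```python
import random

corners = [(0.0, 0.0), (1.0, 0.0), (0.5, 3 ** 0.5 / 2)]   # assumed c_1, c_2, c_3

def f(k, x):
    """f_k(x) = (x + c_k) / 2, a similarity map with factor 1/2."""
    return ((x[0] + corners[k][0]) / 2, (x[1] + corners[k][1]) / 2)

rng = random.Random(0)
x = (0.3, 0.3)
orbit = []
for _ in range(5000):
    x = f(rng.randrange(3), x)    # each map with probability p_k = 1/3
    orbit.append(x)

# the constant address 111... leads to the fixed point c_1 = (0, 0):
y = (0.3, 0.3)
for _ in range(60):
    y = f(0, y)
```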
5. Fractals as Models
In this section, we briefly present some computer simulations of random phenomena with a fractal nature, to get an idea of the concepts which are needed here. Similar material can be found on the web, including fractal landscapes, turbulence, the structure of galaxies or financial markets, which will not be touched here. We recommend the new web pages by Mandelbrot and co-workers at Yale, http://classes.yale.edu/fractals/ and, for landscapes, www.gameprogrammer.com/fractal.html or www.cg.tuwien.ac.at/courses/Fraktale/PDF/fractals7.pdf. All illustrations here look more impressive with color and animation.
Figure 5. A realization of simple random walk in the plane
Modelling irregular motion. The motion of a bird or an insect does not follow differential equations, like the motion of stars. A first model is random walk in discrete time. In the plane this means that, independently in each time step, we go with probability 1/4 one unit to North, South, East or West. On the line, there is only Right and Left, or Up and Down if the axis is oriented vertically, as is done when motion on the stock market is observed. The random steps are repeated many times. How can we describe the trace of the walk? A classical result is that on the line and in the plane, the walk is recurrent: with probability 1 it will reach any point, infinitely many times. What happens if the step size in time and space tends to zero while the number of steps tends to infinity? This will be the topic of Section 6.
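A minimal simulation of this planar walk (each step one unit N, S, E or W with probability 1/4):

```python
import random

def walk2d(steps, rng):
    """Simple random walk on Z^2: each step one unit North, South,
    East or West, each with probability 1/4."""
    x = y = 0
    path = [(0, 0)]
    for _ in range(steps):
        dx, dy = rng.choice(((0, 1), (0, -1), (1, 0), (-1, 0)))
        x, y = x + dx, y + dy
        path.append((x, y))
    return path

path = walk2d(10 ** 4, random.Random(7))
# recurrence: a long enough walk returns to the origin again and again
visits_origin = sum(1 for p in path[1:] if p == (0, 0))
```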
Figure 6. Modelling the growth of seaweed
Modelling irregular growth. Physicists are interested in the growth of a sticky cluster by the addition of small particles which move randomly in the neighborhood. For instance, metal ions moving in a fluid may collect on a charged electrode and form strange shapes. Similar processes may occur in dielectric breakdown, the formation of snowflakes, the growth of certain bacteria cultures, and corrosion. While most models assume that the particles in the environment undergo random walk, our simulation program, written by Götz Gelbrich, models the growth of seaweed on the ground under the influence of light: random rays are sent from above, and where they first hit the aggregate, a new pixel is added.
Predator-prey dynamics in space. The classical Lotka-Volterra equations
x′ = ax − bxy , y′ = −cy + dxy
for prey x(t) and predator y(t) assume that any prey will meet any predator with rate xy, and this results in a reduction of the number of prey and an increase of the number of predators with factors b, d, respectively. The sizes of prey and predator populations in such models (in particular when competition terms with x², y² are built in) will form cycles in time which depend on each other: if prey goes up, predator will follow, then prey goes down, etc.
Another model is a particle system which takes into account the spatial structure of the populations. Points in the lattice Z² are either empty (black), occupied by one prey (blue), or by one predator (green). Only neighboring sites interact. The parameters a, b, c, d above are replaced by probabilities for the death and birth of a new particle on an empty site, depending on the types of neighbors. At each time step, two neighbors are randomly chosen. Similar models work for competition of species, or for the spread of a disease, or a fashion. Not much is known about the structure of the resulting clusters. In critical cases, they will form fractals.
Our program was written by Jan Fricke, www.math-inf.uni-greifswald.de/∼fricke/StoKanAu. Now there are big packages on the web, such as www.antsinfields.de by F. Friedrich.
Figure 7. Below and above the percolation threshold
Critical processes - percolation. Imagine we have a mixture of tiny plastic and metal particles. Can electricity pass through this material? There is a threshold for the percentage of metal above which current will flow. At the critical value, those metal particles which are electrically connected with the boundary of our region will form a random fractal, a so-called critical percolation cluster. Those pieces which belong to the electrical connection from a given point to the boundary form a fractal curve, the so-called backbone of the percolation cluster. Percolation is usually modelled with occupied and empty sites of a lattice of squares or hexagons, see www.physics.buffalo.edu/gonsalves/ComPhys_1998/Java/Percolation.html. Our simulation is based on a Poisson process model.
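The lattice version can be sketched in a few lines; this toy uses site percolation on a square grid (not the Poisson model of our simulation) and checks with a flood fill whether an open cluster joins the top row to the bottom row:

```python
import random

def percolates(p, n, rng):
    """Open each site of an n x n grid with probability p, then test
    whether an open nearest-neighbor path joins the top row to the
    bottom row (a rough finite-size stand-in for the threshold)."""
    grid = [[rng.random() < p for _ in range(n)] for _ in range(n)]
    stack = [(0, j) for j in range(n) if grid[0][j]]
    seen = set(stack)
    while stack:
        i, j = stack.pop()
        if i == n - 1:
            return True
        for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= a < n and 0 <= b < n and grid[a][b] and (a, b) not in seen:
                seen.add((a, b))
                stack.append((a, b))
    return False

rng = random.Random(0)
# far below and far above the threshold the answer is (almost) deterministic
print(percolates(0.1, 40, rng), percolates(0.9, 40, rng))
```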
6. Brownian Motion
This is an old concept from the 1920s, but it is definitely the most important random fractal and the basis for a large part of modern probability. So it is absolutely necessary to discuss this topic here.
Definition as scaling limit. Brownian motion is obtained as the scaling limit of random walk where the step size both in time and space tends to zero. The convergence in time must be faster since n² more time steps are needed in order to get n times more distance from the initial point, at least in the quadratic average. This is the basic √n-rule in elementary probability which asserts that the sum of n independent quantities with standard deviation σ has standard deviation √n σ.
We start in dimension 1. If we start at zero on the line and make n² random steps of length 1/n, the endpoint will follow a (modified) binomial distribution with mean zero and standard deviation 1. Thus the distribution of the endpoint quickly converges to a standard normal distribution. Similarly, the distribution of the midpoint of the curve, after n²/2 random steps of length 1/n, will become normally distributed with mean zero and standard deviation 1/√2. If R(n) denotes the position of random walk after n steps, Brownian motion can now be defined as
B(t) = lim_{n→∞} R([n²t]) / n .
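The convergence of the rescaled endpoint can be watched empirically; a sketch checking that R(n²)/n has mean near 0 and standard deviation near 1:

```python
import random
import statistics

def endpoint(n, rng):
    """Endpoint after n**2 random steps of length 1/n on the line."""
    return sum(rng.choice((-1, 1)) for _ in range(n * n)) / n

rng = random.Random(2)
samples = [endpoint(20, rng) for _ in range(2000)]
print(statistics.mean(samples), statistics.stdev(samples))
# both should be close to the standard normal values 0 and 1
```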
Note that for random objects, we should write R(n, ω), B(t, ω). However, this is not an ordinary limit: it means that the distributions of whole collections of renormalized random walks converge to the distribution of Brownian motion. All ω are present on both sides. From this construction follows the usual axiomatic characterization of Brownian motion as a process with independent and stationary increments which starts at point zero. The normal distribution is a consequence of these axioms.
Stochastic self-similarity. Consider the process B(r²t), where r is first an integer and later a real number. By our construction, this process can be approximated by
R([(nr)²t]) / n = r · R([n²r²t]) / (nr) .
The fraction on the right-hand side approximates B(t), so we get the self-similarity of Brownian motion:
B(r²t) = r · B(t) for all r > 0 .
Again, equality is meant for the distribution of the collection of all realizations. Roughly speaking, for any realization of B(r²t), we have equality with another realization r · B(t). This stochastic self-similarity holds for arbitrary positive values of r. However, the center of magnification is always the initial point on the curve.
Midpoint displacement. A self-similar construction of the graph of Brownian motion for 0 ≤ t ≤ 1 follows from a simple study of its distribution. Let z1, z2, ... denote random numbers from a standard normal distribution.
• Choose B(1) = z1. The segment from (0, 0) to (1, B(1)) is our first approximation.
• The midpoint of the segment is displaced: B(1/2) = (B(0) + B(1))/2 + z2/2.
• Now we have two segments which are treated in the same way. In each step, the displacements are decreased by the factor 1/√2.
Mathematically, midpoint displacement is a wavelet representation of the graph of B(t) on [0, 1].
Brownian motion in R^d. For computer visualization of fractal landscapes, midpoint displacement was used for two-dimensional time domains (Brownian sheet). Midpoint displacement can also construct trajectories B = {B(t) | t > 0} of Brownian motion in R^d, d ≥ 2. We just choose the zk from a d-dimensional standard normal distribution. The trajectories can have lots of self-intersections, in particular for d = 2, as we see in Figure 5. Note that the trajectories, considered as sets, have a simpler self-similarity property than the graphs:
B = rB for all r > 0.
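The three steps of midpoint displacement can be sketched directly; a minimal version storing B at dyadic times (the first displacement z2/2 has standard deviation 1/2, and each level scales the displacement by 1/√2):

```python
import random

def midpoint_displacement(levels, rng):
    """Graph of Brownian motion on [0,1] at the dyadic points k/2**levels."""
    b = {0.0: 0.0, 1.0: rng.gauss(0.0, 1.0)}   # B(0) = 0, B(1) = z_1
    step, scale = 0.5, 0.5                      # first displacement: z_2 / 2
    for _ in range(levels):
        t = step
        while t < 1.0:   # displace all midpoints of the current level
            b[t] = 0.5 * (b[t - step] + b[t + step]) + rng.gauss(0.0, scale)
            t += 2 * step
        step /= 2
        scale /= 2 ** 0.5                       # decrease by the factor 1/sqrt(2)
    return sorted(b.items())

graph = midpoint_displacement(10, random.Random(5))
```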
Of course, this holds for the distribution of the sets, not for single realizations Bω. When you magnify Brownian trajectories, what you see is essentially always the same.
Brownian motion and dimension. Brownian trajectories in R^d almost surely have fractal dimension 2, for all d ≥ 2. To see this, consider the mass scaling at the point 0 of the occupation measure μ on B (i.e. μ(A) is the amount of time which B(t) spends in the set A). From random walk scaling we have
μ(Ur(0)) = r² μ(U1(0)) in distribution.
Moreover, the graph of one-dimensional Brownian motion has dimension 3/2, and the set of its zeros has dimension 1/2, almost surely. Taylor obtained much more detailed results: the exact Hausdorff dimension function for B is s² log log(1/s) for d ≥ 3, and still more complicated for d = 2.
7. Examples of Random Constructions
Fractional Brownian motion. Many random fractals are obtained from Brownian motion in one way or the other. A self-similar modification with less independence is fractional Brownian motion, where the increments B(t) − B(s) are still Gaussian but with variance |t − s|^{2H} instead of |t − s|. The construction by midpoint displacement works, and H determines the roughness of the trajectories.
Swiss cheese construction. In the same way as for the Cantor set we can punch holes into R^d and consider the remaining fractal set. Holes are chosen randomly. For simplicity, let them all have circular shape. All holes should be independent of each other. Their midpoints should be uniformly distributed. However, small holes should appear much more often than large ones. Mandelbrot in 1972 and U. Zähle in 1984 studied the following self-similar construction:
A(ω) = R^d \ ∪_{i=1}^∞ U_{ri}(xi)
where ω = {(xi, ri) | i = 1, 2, ...} is a Poisson process on R^d × (0, ∞) with intensity function g(x, r) = a r^{−(d+1)} for r ≤ |x| and g(x, r) = 0 for r > |x|. The last condition guarantees that 0 is in A(ω).
Figure 8. A realization of Swiss cheese. Is the black set a Cantor set?
Properties of the Swiss cheese fractals.
• Self-similarity in distribution: A = rA for all r > 0.
• Dimension d − a · Vol(B1(0)) (volume in R^d).
• Almost every point of A is disconnected from the rest of A.
• In R¹, we have the Markov property: for x ∈ A, the sets {y ∈ A | y > x} and {y ∈ A | y < x} are independent.
• For every α < 1, there is exactly one self-similar Markov random Cantor set Aα in R, the support of a corresponding Lévy subordinator.
• For α = 1/2, we get the zero set of Brownian motion.
Lévy flights. These are modifications of Brownian motion where jumps are possible. The most important class is obtained by restricting the time domain of Brownian motion to a Swiss cheese Cantor set Aα on the line. We get a random Cantor subset of the random trajectory B, with dimension 2α. If we like to obtain a curve, we connect the two endpoints of every cutout time interval by a straight line segment. The flights of albatrosses resemble this model, with long straight flights interrupted by fractal search for food.
Branching Brownian motion. We take a number of particles which undergo Brownian motion and at the same time critical branching. For instance, after a certain time step, every particle either dies or divides into two particles which go on independently. To get a scaling limit, one uses measures. On level k we consider k particles of mass 1/k which divide with a time step of 1/k. The limit in the sense of distributions is a measure-valued process which was termed super-Brownian motion and studied in detail by Dawson and others. Le Gall proved that not only the motion but even the branching of this process can be described in terms of Brownian motion (the so-called Brownian snake).
Self-similar sets and branching. There is an analogy between the hierarchy of pieces in a self-similar set and the family structure of a spatial branching process. In order to work this out, one has not only to consider the successors of one fixed particle but also the ancestors and their successors, that is, the complete family tree. Instead of the open set condition, which is not appropriate here, a weaker transience condition for the generating similarity maps has to be fulfilled. Super-Brownian motion can be embedded in such a framework.
Random self-similar sets: not just adding noise. In physics, randomness is often introduced by noise. However, if we introduce noise into the random iteration algorithm for a self-similar set, x_{n+1} = fk(x_n) with P(k) = rk^α, then the fractal structure will be destroyed, as can be seen in Figure 9. However, random versions of self-similar sets can be obtained by choosing f1, ..., fm randomly from a fixed distribution on the space of m-tuples of similarity maps. This
choice has to be made for each word w = w1...wn in S* independently, in order to obtain f_{w1}, ..., f_{wm}. Then Aw = f_{w1} f_{w1w2} ... f_{w1...wn}(A). The first random choices, concerning f_{w1} and then f_{w1w2}, are decisive; random changes on subsequent levels change less and less, in contrast to noisy iteration. Under the open set condition, it was shown by Falconer, Graf, Mauldin and Williams that the dimension α of the resulting random set is almost surely given by E(∑_{k=1}^m rk^α) = 1. Unfortunately, the topological structure of the fractal can hardly be controlled with this construction. Midpoint displacement can be interpreted as a particular example.
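The almost-sure dimension equation E(∑_k rk^α) = 1 can be solved numerically for a given distribution of contraction ratios; a Monte Carlo bisection sketch (the ratio sampler below is an illustrative assumption, not a construction from the text):

```python
import math
import random

def random_dimension(sample_ratios, trials, rng):
    """Bisection for alpha with E(sum_k r_k**alpha) = 1, where the
    expectation is estimated on a fixed Monte Carlo sample of tuples."""
    draws = [sample_ratios(rng) for _ in range(trials)]
    lo, hi = 0.0, 50.0
    for _ in range(100):
        mid = (lo + hi) / 2
        mean = sum(sum(r ** mid for r in rs) for rs in draws) / trials
        lo, hi = (mid, hi) if mean > 1 else (lo, mid)
    return (lo + hi) / 2

# Sanity check with a degenerate "distribution": three fixed ratios 1/2
# must reproduce the Sierpinski gasket dimension log 3 / log 2.
alpha = random_dimension(lambda rng: (0.5, 0.5, 0.5), 100, random.Random(3))
print(alpha)   # 1.5849...
```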
Figure 9. Left: Adding noise when randomly iterating similarity functions will destroy the structure. Right: Randomly selecting similarity mappings at each stage gives a random structure.
Random fractals and conformal invariance. Some profound recent results in probability give hope for the development of a theory of random fractals. Let us just list them here and refer to [9] for details.
• O. Schramm found in 1999 a description of random planar fractal curves in terms of conformal mappings and Brownian motion (stochastic Löwner equations, with one parameter).
• Two of the resulting curves were identified as the scaling limits of the loop-erased random walk and of the curve dividing a large region in critical percolation (Lawler, Werner, Schramm).
• S. Smirnov proved certain invariance properties of critical site percolation on the plane triangular lattice.
• Twenty years ago, Mandelbrot conjectured that the outer boundary of planar Brownian motion has dimension 4/3. This has now been proved by Lawler, Schramm and Werner.
Some problems. We mention some questions which need further work.
• Construction of random fractals with strong independence properties: what can replace the one-dimensional Markov property?
• Moving fractals which model smoke, fire, clouds, and turbulence.
• Find a convincing fractal model for percolation phenomena (several approaches exist).
8. Harmonic Functions and the Laplace Operator
In the second part of this little survey we shall consider random processes on fractals. The aim of our presentation is to
• show the interplay of analysis and probability, and of discrete and continuous models,
• provide a somewhat unusual view of the Laplace operator, which plays a central role in mathematical physics, and
• explain how self-similarity as a symmetry property simplifies problems which are otherwise not tractable.
The Laplace operator, which is applied to problems of heat transfer, gravitation, vibrations, electricity, fluid dynamics, and many others, is usually defined for twice differentiable functions u : R^d → R as follows:
Δu = ∑_{j=1}^d ∂²u/∂xj² .
The function u = u(x1, ..., xd) is called harmonic if Δu = 0. The simplest harmonic functions are the linear functions u = ∑ aj xj + b. For d = 1, there are no others. For d = 2, plenty of harmonic functions are obtained as the real or imaginary part of any complex analytic function. Examples in polar coordinates r, φ are u = φ and u = log r. For d = 3, the Newton potential u(x) = 1/|x| is harmonic outside the origin. By adding or integrating Newton potentials of different masses, a lot of harmonic functions can be constructed.
The mean-value property. A function u is harmonic in an open region V if and only if for any x ∈ V and any sphere C = Cr(x) around x and inside V, the value of u at x is the average of the values on the sphere. In two dimensions this reads
u(x) = (1/2π) ∫_0^{2π} u(x + re^{iφ}) dφ .
The mean-value property is our starting point. It serves to define harmonic functions and Laplace operators on finite sets and on fractals. In fact, a function need not be differentiable when we want to check this property.
A direct consequence of the mean-value property is the maximum principle: the maximum and minimum of a continuous function defined on the closure V̄ and harmonic in V must be assumed on the boundary. If there is a point x ∈ V with u(x) = max_{y∈V̄} u(y), then u is constant on V. The proof is an exercise.
Dirichlet problem. Let V be an open domain in R^d with boundary B and g : B → R a continuous function on the boundary of V. Then there is a unique continuous function f on V̄ which is harmonic on V and coincides with the given g on the boundary. Physically, one can interpret g as the temperature which by some mechanism is kept constant at each point of the boundary, and f as the temperature distribution in the interior which will result from the exchange of heat. In the following, we shall get to know other equivalent interpretations, even on finite structures. The classical reference for the next section is Doyle and Snell [5], a more recent one is Häggström [7].
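The mean-value property is easy to test numerically; a sketch for the harmonic function u = log r in the plane, averaging over a circle that avoids the origin:

```python
import math

def circle_average(u, x, y, r, n=4096):
    """Approximate (1/2pi) * integral of u(x + r*e^{i*phi}) d phi by an
    equally spaced Riemann sum (very accurate for smooth periodic data)."""
    return sum(u(x + r * math.cos(2 * math.pi * k / n),
                 y + r * math.sin(2 * math.pi * k / n)) for k in range(n)) / n

u = lambda a, b: math.log(math.hypot(a, b))   # harmonic away from the origin
avg = circle_average(u, 3.0, 4.0, 1.0)
print(avg, math.log(5))   # the circle average reproduces u at the center
```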
9. Harmonic Functions from Markov Chains and Electric Networks
Random walk on finite graphs. Many games of chance can be modelled by finite Markov chains. Let us consider a finite undirected graph with vertex set X, the states of the game, and edge set E (no loops, no multiple edges). An edge {x, y} indicates a possible move of the game from state x to state y, or conversely. When we are in state x, we choose the next state randomly among all endpoints of edges at x, with equal probability. Now let us consider in X two disjoint subsets X1 of winning states and X0 of losing states. The game is over as soon as we reach a point in X0 ∪ X1. The question is to determine the winning probability u(x) for any initial state x.
Example (Brémaud [3]). The states are the five rooms of a flat. The mouse performs a random walk. What is her chance to reach the cheese before it is eaten by the cat?
Figure 10. Help the mouse survive the random walk
Winning chances are harmonic. Obviously, u(x) = 0 for x ∈ X0 and u(x) = 1 for x ∈ X1. For other x, conditional probabilities lead to
u(x) = (1/dx) ∑_{y∼x} u(y) .
Here y ∼ x means that {x, y} ∈ E, that is, y is a neighbor of x. The number of neighbors is the degree dx of x. This system of equations can be solved to give a unique function u, provided that the graph is connected, which we always assume. For the mouse in the above initial point x we get u(x) = 2/7. The equation can be interpreted as a mean-value property of u on the set V = X \ (X0 ∪ X1), and the values 0 and 1 are just the given boundary values of u. Thus we have a discrete probabilistic version of the Dirichlet problem, with boundary set X0 ∪ X1.
Resistance networks. Now we consider a finite network of resistances. We take the graph (X, E) as before, and a resistance R_xy with conductance C_xy = 1/R_xy is assigned to each edge {x, y} ∈ E. Now we connect a unit voltage with − to X0 and with + to X1, so that u(x) = 0 on X0 and 1 on X1. We would like to know the voltage u(x) at all vertices x ∈ V. This is done using Kirchhoff's laws:
0 = ∑_{y∼x} i_xy = ∑_{y∼x} (u(x) − u(y)) / R_xy .
If we define C_x = ∑_{y∼x} C_xy, this implies
u(x) = ∑_{y∼x} u(y) · C_xy / C_x .
Again we have a mean-value property. The equal probabilities 1/dx are now replaced by p_xy = C_xy / C_x.
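The resulting discrete Dirichlet problem can be solved by simply iterating the weighted mean-value update; a sketch on an assumed toy graph (gambler's ruin on a path of five states, not the flat of Figure 10), with unit conductances so that p_xy = 1/dx:

```python
# States 0..4 on a path; boundary: X0 = {0} (losing), X1 = {4} (winning).
neighbors = {1: (0, 2), 2: (1, 3), 3: (2, 4)}
u = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0}

for _ in range(500):                      # mean-value (Gauss-Seidel) sweeps
    for x, ys in neighbors.items():
        u[x] = sum(u[y] for y in ys) / len(ys)

print(u[1], u[2], u[3])   # converges to the linear function u(k) = k/4
```

The iteration converges because the update is a contraction on the interior values; for this path graph the harmonic function is linear, the classical gambler's-ruin answer.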
In probabilistic version, we have a random walk with transition probabilities pxy from x to y. This is a general case of finite Markov chain, though some symmetry is still assumed. Probabilistic solution of Dirichlet’s problem. There is nothing special about boundary values 0 and 1. Instead of X0 ∪ X1 we can take an arbitrary subset B ⊂ X as boundary of our domain, and prescribe arbitrary voltages u(x) for every x ∈ B. Together with the mean-value property for x ∈ V this will give us a unique solution of our Dirichlet problem. In the probabilistic interpretation, u(y) for y ∈ B is the gain which we get in our game when we arrive at the terminal state y. Harmonic functions, or voltages in a resistance network can be interpreted as average gain in a game which describes a random walk. Such a probabilistic version does also exist for the classical Dirichlet problem. Theorem. Let V be an open domain in Rd with boundary B and u : B → R a continuous function. Then u can be extended to a harmonic function on V by u(x) = E u(y) where y is the first (random) point of B reached by Brownian motion starting in x, and E denotes expectation. This gives a clear existence proof for the solution of the Dirichlet problem though details are difficult. Uniqueness is a direct consequence of the maximum principle. Laplace operator on finite networks. Now we know what harmonic functions u are, which should correspond to Δu = 0. So let us define the Laplace operator by Cxy v(y) − Cx v(x) . Δv(x) = y∼x
When our graph has n vertices, v is an n-dimensional vector, and Δ is a symmetric square matrix which has the conductances C_xy outside the diagonal and the negative entries −C_x on the diagonal, so that the row sums are zero. We can also consider the positive semidefinite quadratic form

E(v) = −(v, Δv) = (1/2) Σ_{x,y∈X} C_xy (v(x) − v(y))^2

which represents energy dissipation. One can prove that for given boundary values, the minimum energy dissipation is realized by taking for v the harmonic voltage function u.
Effective resistance. Physically, the effective resistance between two points x, y ∈ X is obtained as ρ_xy = 1/|i_xy| when a unit voltage 1 = u(y) − u(x) is put between these
two points and the current is measured. From this physical interpretation it can be seen that ρ is a metric on X, i.e. it fulfils the triangle inequality. We can use elementary calculations with resistances to determine ρ. Remember that two resistances in series add, R = R1 + R2, while for two parallel resistors

1/R = 1/R1 + 1/R2 ,   that is, R = R1 R2/(R1 + R2) .
Figure 11. How to replace a Δ by a Y

Example: Δ-Y-transform. Take a triangle xyz with resistances R_xy, R_yz, R_xz on the edges. To determine ρ_xy we note that we have the parallel resistances R_xy and R_xz + R_yz:
ρ_xy = R_xy (R_xz + R_yz) / (R_xy + R_xz + R_yz) .
Similarly for ρ_xz and ρ_yz. Now the triangle can be replaced by the Y-shaped circuit with resistances R_x = (1/2)(ρ_xy + ρ_xz − ρ_yz), and similarly for R_y, R_z. Both circuits have exactly the same electrical behavior when they are measured only at x, y, z.
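The Δ-Y transform is easy to verify with exact rationals. A minimal sketch (the helper names are ours):

```python
from fractions import Fraction as F

def series(r1, r2):
    return r1 + r2

def parallel(r1, r2):
    return r1 * r2 / (r1 + r2)

def delta_to_y(Rxy, Ryz, Rxz):
    """Replace a triangle of resistances by the equivalent Y-circuit."""
    # effective resistances of the triangle, e.g. R_xy parallel to R_xz + R_yz
    rho_xy = parallel(Rxy, series(Rxz, Ryz))
    rho_yz = parallel(Ryz, series(Rxy, Rxz))
    rho_xz = parallel(Rxz, series(Rxy, Ryz))
    # legs of the Y-circuit, from rho via the half-sum formula
    Rx = (rho_xy + rho_xz - rho_yz) / 2
    Ry = (rho_xy + rho_yz - rho_xz) / 2
    Rz = (rho_xz + rho_yz - rho_xy) / 2
    return Rx, Ry, Rz

# Unit triangle: effective resistance 2/3 between any two vertices,
# hence Y-legs of 1/3 each, as used for the gasket below.
print(delta_to_y(F(1), F(1), F(1)))
```

For the unit triangle this reproduces legs of 1/3; for an asymmetric triangle it agrees with the classical product formula R_x = R_xy R_xz/(R_xy + R_yz + R_xz).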
10. Harmonic Analysis on Sierpiński's Gasket

From finite graphs we now turn to fractals. We explain the basic ideas; technical details can be found in [8] and in recent papers by Kumagai (www.kurims.kyoto-u.ac.jp/∼kumagai).
Resistance of the gasket. A first approximation of the Sierpiński gasket is the triangle xyz with equal resistances R = 1 on the edges. We get ρ = 2/3 for the effective resistance between any two vertices, so that we can replace the triangular circuit by the Y-circuit with R_x = R_y = R_z = 1/3. Now we compose three of these Y-circuits, which form a bigger Sierpiński gasket. What will be the effective resistance ρ' between the vertices?
Figure 12. Renormalizing resistance on the gasket

By basic resistance formulas, ρ' = 1/3 + (2/3 · 4/3)/(2/3 + 4/3) + 1/3 = 10/9. Thus our bigger Sierpiński gasket can be replaced by a single Y-circuit with R_x = R_y = R_z = 5/9. In other words: when the Sierpiński gasket is increased by one level, composing three smaller gaskets, the effective resistance increases by the factor ρ'/ρ = 5/3. One should stop for a moment to understand that self-similarity and our simple calculation really provide this conclusion. Indeed, the scaling will hold for arbitrarily fine approximations of the gasket.
Resistance exponent. In Section 4 we mentioned how the ansatz μ(U_r(x)) ≈ C·r^α leads to the (local) fractal dimension α = lim_{r→0} log μ(U_r(x))/log r, which can be called the mass-scaling exponent. For the gasket, inserting r = 2^{−n} and μ(U_r(x)) = 3^{−n} gives α = log 3/log 2 ≈ 1.58. For resistance we now make a similar ansatz ρ(r) ≈ C·r^β, where ρ(r) denotes the effective resistance between two vertices in a piece of side length r. This leads to the resistance-scaling exponent

β = lim_{r→0} log ρ(r)/log r .

For the gasket, inserting r = 2^{−n} and ρ(r) = (5/3)^{−n} gives β = (log 5 − log 3)/log 2 ≈ 0.737. Since β < 1 < α, resistance changes more slowly and mass more rapidly than length.
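The renormalization step and the resulting exponent can be reproduced in a few lines of exact arithmetic (a sketch; variable names are ours):

```python
from fractions import Fraction as F
import math

def parallel(a, b):
    return a * b / (a + b)

rho = F(2, 3)                  # effective resistance of one triangle cell
leg = rho / 2                  # legs of the equivalent Y-circuit: 1/3
# Composing three Y-circuits into the next-level gasket: from one outer
# vertex, a leg in series with (2 legs parallel to 4 legs), then a leg.
rho_big = leg + parallel(2 * leg, 4 * leg) + leg
lam = rho_big / rho            # the renormalization factor rho'/rho
beta = math.log(lam) / math.log(2)
print(rho_big, lam, round(beta, 3))
```

This recovers ρ' = 10/9, the factor 5/3, and β = log(5/3)/log 2 ≈ 0.737.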
Random walk exponent. In a game of chance, besides the winning probability u(x) we are also interested in the expected length t(x) of the game when it is started at x. There is a simple mean-value formula for random walk on finite graphs, similar to that for u(x) in the previous section:

t(x) = 1 + Σ_{y∼x} t(y) · p_xy .

Here p_xy denotes the transition probability from x to y, and t(y) = 0 for "points of the boundary", that is, winning or losing states. The 1 is the time step from x to the neighbor, so t(x) is not a harmonic function.
Figure 13. Random walk scaling on the Sierpiński gasket

Since this system of equations can be solved for any finite graph, we can now study the scaling of the average time of random walk on the Sierpiński gasket, from the top vertex z to one of the two other vertices, {x, y}. In the first approximation this is one step. Let us determine t(z) for the second approximation, using the equation above and t(x) = t(y) = 0. Note that p_za = 1/2 while p_az = 1/4. By symmetry we can assume t(a) = t(b). Now t(z) = 1 + t(a), t(a) = 1 + (1/4)(t(z) + t(a) + t(c)) and t(c) = 1 + (1/2)t(a) yields t(c) = 3, t(a) = 4 and t(z) = 5. Thus on the gasket random walk needs 5 times more time to double the distance. In R^d this factor was 4, as discussed in Section 6. We write T(r) for the average time to traverse a gasket of length r and assume T(r) ≈ C·r^γ. This leads to the random walk exponent

γ = lim_{r→0} log T(r)/log r ,

which is 2 for R^d and γ = log 5/log 2 ≈ 2.32 for the gasket, where γ = α + β.
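The small linear system for the exit times can be solved by substitution; a sketch in exact arithmetic:

```python
from fractions import Fraction as F
import math

# Expected exit times on the level-2 gasket, from the equations in the text:
#   t(z) = 1 + t(a),  t(a) = 1 + (t(z) + t(a) + t(c))/4,  t(c) = 1 + t(a)/2.
# Substituting the first and third into the second gives
#   t(a) = 3/2 + (5/8) t(a).
ta = F(3, 2) / (1 - F(5, 8))
tz = 1 + ta
tc = 1 + ta / 2
print(tz, ta, tc)          # doubling the distance costs a factor 5 in time

gamma = math.log(5) / math.log(2)
print(round(gamma, 2))     # the random walk exponent, about 2.32
```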
Dirichlet problem on the gasket. Instead of defining the Laplace operator, we solve the Dirichlet problem on the gasket directly. The boundary of G consists of the vertices x, y, z – the only points where a neighboring triangle can touch if we extend the fractal outwards. Suppose u(x), u(y), u(z) are given. We construct a harmonic function on G which assumes the given boundary values. It is enough to know the function on the vertices of all small triangles; by continuity we can then extend it to the whole of G. Consider the second discrete approximation of G. Which values u(a), u(b), u(c) make the function harmonic? We just use the definition with neighbors in the graph. At a we get 4u(a) = u(z) + u(b) + u(c) + u(y). Taking the similar equations at b and c we obtain

u(a) = (1/5)(u(x) + 2u(y) + 2u(z))

and similarly for u(b) and u(c). Now we have boundary values for the small triangles and can determine in the same way the values at the vertices of the next approximation. It turns out that the values at a, b, c fulfil the mean-value property also with respect to their new, nearer neighbors.
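The subdivision rule is easily checked in code. A sketch, with our own labeling of the midpoints (a on edge yz, b on xz, c on xy):

```python
from fractions import Fraction as F

def subdivide(ux, uy, uz):
    """Harmonic values at the midpoints a (on yz), b (on xz), c (on xy)
    from the boundary values at x, y, z (the 1/5-2/5-2/5 rule)."""
    ua = (ux + 2 * uy + 2 * uz) / 5
    ub = (uy + 2 * ux + 2 * uz) / 5
    uc = (uz + 2 * ux + 2 * uy) / 5
    return ua, ub, uc

ux, uy, uz = F(0), F(0), F(1)
ua, ub, uc = subdivide(ux, uy, uz)
# Mean-value property at each interior vertex of the level-2 graph:
assert 4 * ua == uz + ub + uc + uy
assert 4 * ub == uz + ua + uc + ux
assert 4 * uc == ux + uy + ua + ub
print(ua, ub, uc)
```

With boundary values (0, 0, 1) this gives u(a) = u(b) = 2/5 and u(c) = 1/5, and all three mean-value equations hold exactly.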
Self-similarity will do the rest. Let a computer repeatedly apply the above formula and determine the values at all vertices of small triangles. We know that the mean-value property will be fulfilled for neighbors of all generations.
Hölder continuity. We check whether the resulting function u on all vertices is continuous, that is, whether nearby points have nearby values. Let δ0 and δn be the maximum differences between the boundary values u(x), u(y), u(z) and between the values at the vertices of any triangle of level n, respectively. Since

u(a) − u(b) = (1/5)(u(y) − u(x))  and  u(a) − u(z) = (2/5)(u(y) − u(z)) + (1/5)(u(x) − u(z)) ,

we have δ1 ≤ (3/5)·δ0 and by self-similarity δn ≤ (3/5)^n δ0. This shows continuity. We can get more. Take two points c, d in G with distance in [2^{−(n+1)}, 2^{−n}], so that they lie in the same or in neighboring triangles of level n, and

|u(c) − u(d)| ≤ 2δ0 (3/5)^n ≤ K · |c − d|^β

with K = 4δ0, since our resistance exponent was defined so that (1/2)^β = 3/5. This is Hölder continuity of the harmonic function u with Hölder exponent β. If we had β = 1, the function would be Lipschitz, or almost differentiable. The classical theorem in R^d says that every harmonic function is twice differentiable, corresponding to exponent 2. Here we get a weaker result which is still much better than mere continuity.
Theorem. All harmonic functions on the gasket are Hölder continuous with exponent β ≈ 0.737.
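The contraction of oscillations by the factor 3/5 can be observed numerically. A self-contained sketch (boundary values and midpoint labeling are our own choices):

```python
from fractions import Fraction as F
from itertools import combinations

def subdivide(ux, uy, uz):
    # the 1/5-2/5-2/5 rule: a on edge yz, b on xz, c on xy
    return ((ux + 2*uy + 2*uz) / 5,
            (uy + 2*ux + 2*uz) / 5,
            (uz + 2*ux + 2*uy) / 5)

def spread(vals):
    """Maximum pairwise difference of the values on a triangle."""
    return max(abs(p - q) for p, q in combinations(vals, 2))

ux, uy, uz = F(0), F(1, 3), F(1)          # arbitrary boundary values
ua, ub, uc = subdivide(ux, uy, uz)
delta0 = spread((ux, uy, uz))
# the three level-1 triangles are (x, b, c), (y, a, c), (z, a, b)
delta1 = max(spread(t) for t in [(ux, ub, uc), (uy, ua, uc), (uz, ua, ub)])
assert delta1 <= F(3, 5) * delta0
print(delta0, delta1)
```

Here δ0 = 1 and δ1 = 8/15, comfortably below the bound 3/5.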
11. Analysis on other Fractals

Harmonic analysis on fractals has been extended in different directions:
• Brownian motion on fractals: Lindstrøm, Barlow, Bass
• the Laplacian, its eigenvalues, PDE: Kigami, Fukushima, Strichartz, Falconer
• function spaces: Kumagai, Triebel, Johnson
Nevertheless, very few examples were actually calculated: d-dimensional or randomized Sierpiński gaskets, Lindstrøm's nested fractals, the pentagasket, fractal trees. They all have strong mirror symmetry, and the pieces of the fractals meet in very few points. The following new example without mirror symmetry is obtained by the methods of [1] and is published here for the first time. It is generated by f_k(x) = cx + c_k(1 − c), k = 1, 2, 3, where the fixed points c_k are vertices of an equilateral triangle and c = 1/2 + i·(1 − √3/2). The contraction factor is r = |c| = √(2 − √3) ≈ 0.518, and the dimension is α = log 3/log(1/r) ≈ 1.67.
Figure 14. A modified Sierpiński gasket without mirror symmetry

While the Sierpiński gasket has three boundary points, our modified gasket has six. Inspection of the small pieces shows, however, that at most four of them can be neighbors at the same time. In fact, up to rotation we have just four neighborhood types with the following boundary points (cf. Figure 14): {w, y}, {x, w, y}, {w, y, z}, and {x, w, y, z}. We get a self-similar resistance network and determine the effective resistances a = ρ_wy, c = ρ_wz, d = ρ_wx, and b = ρ_yz = ρ_xy = ρ_xz. The equations become rather complicated, and the parameter λ = ρ'/ρ, which was 5/3 for the usual gasket, is now the Perron–Frobenius root of p(x) = 27x^5 − 27x^4 − 33x^3 + 4x^2 + 4x + 1. Numerically, λ ≈ 1.65 and the resistance exponent β ≈ 0.759 are not so different from the Sierpiński case.
Two final examples. Now we give examples for which harmonic analysis has not yet been developed.
Figure 15. Two fractals for which Dirichlet’s problem has not been solved
The picture on the left is new; its mappings have the complex factor c = τ·(1 + i)/2, where τ = (√5 − 1)/2 is the golden mean. Two pieces will always meet in a Cantor set of dimension α/4, where α ≈ 1.675 is the dimension of the whole fractal. The picture on the right-hand side was recently studied by Sidorov [4], who called it the golden gasket since it is obtained from the Sierpiński gasket by replacing the factor 1/2 of the mappings by τ. Instead of Σ_k r_k^α = 1, the defining equation for the natural measure is 3b − 3b^3 = 1, because of the overlaps. It turns out that b = (2/√3)·cos 70°, and the dimension (as mass-scaling exponent) is α = log b/log τ.
Some problems for further research should be mentioned at the end.
• Find further simple examples of fractals like those in Figure 15.
• Develop harmonic analysis on such spaces. In contrast to the usual gasket, the space of harmonic functions must have infinite dimension!
• Find a harmonic calculus for suitable random fractals.
References
[1] C. Bandt, Self-similar measures, in: Ergodic Theory, Analysis, and Efficient Simulation of Dynamical Systems (B. Fiedler, ed.), Springer 2001, 31–46.
[2] M.F. Barnsley, Fractals Everywhere, 2nd ed., Academic Press 1988.
[3] P. Brémaud, Markov Chains, Springer 1999.
[4] D. Broomhead, J. Montaldi and N. Sidorov, Golden gaskets: variations on the Sierpiński sieve, Nonlinearity 17 (2004), 1455–1480.
[5] P.G. Doyle and J.L. Snell, Random Walks and Electric Networks, Math. Association of America 1984; arxiv.org/abs/math.PR/0001057
[6] K.J. Falconer, Fractal Geometry, Wiley 1990.
[7] O. Häggström, Streifzüge durch die Wahrscheinlichkeitstheorie, Springer 2006.
[8] J. Kigami, Analysis on Fractals, Cambridge University Press 2001.
[9] G. Lawler, Conformally Invariant Processes in the Plane, Math. Surveys and Monographs 114, Amer. Math. Soc. 2005.
[10] B.B. Mandelbrot, The Fractal Geometry of Nature, Freeman 1982.
[11] S.J. Taylor, The measure theory of random fractals, Math. Proc. Cambridge Phil. Soc. 100 (1986), 383–406.
Physics and Theoretical Computer Science J.-P. Gazeau et al. (Eds.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
113
Quasicrystals: algebraic, combinatorial and geometrical aspects

Edita Pelantová 1 and Zuzana Masáková
Department of Mathematics, FNSPE, Czech Technical University
Abstract. The paper presents mathematical models of quasicrystals with particular attention given to cut-and-project sets. We summarize the properties of higher-dimensional quasicrystal models and then focus on the one-dimensional ones. For the description of their properties we use the methods of combinatorics on words.
Keywords. Quasicrystals, cut-and-project set, infinite word, palindrome, substitution
1. Introduction

Crystals have been admired by people for a long time. Their geometrical shape distinguished them from other solids. The rigorous study of crystalline structures started in the years 1830–1850 and was crowned around 1890 by the famous list of Fedorov and Schoenflies, which contained 32 classes of crystals. Their classification was based purely on geometry and algebra. The simplest arrangement, found in natural crystals, is a simple repetition of a single motive. In mathematics it is described by lattice theory; in physics the subject is studied by crystallography. Repetition of a single motive means periodicity. Another remarkable property, characteristic of crystals, is their rotational symmetry, i.e. invariance under orthogonal linear transformations. An important consequence of lattice theory is that neither planar nor space (three-dimensional) periodic structures can reveal rotational symmetry of order 5 or higher than 6, see [27]. The discovery made by Max von Laue in 1912 enabled the study of the atomic structure of crystals via X-ray diffraction patterns and, in fact, justified the theoretical work developed by crystallography before. In the case of periodic structures, the type of rotational symmetry of the crystal corresponds to the type of rotational symmetry of the diffraction diagram.
The discovery, made by Shechtman et al. [28] in 1982, that rapidly solidified aluminium-manganese alloys have a three-dimensional icosahedral symmetry was therefore an enormous surprise for the crystallographic community. The diffraction diagram of this alloy revealed five-fold rotational symmetry. Materials with this and other crystallographically forbidden symmetries were later produced in other laboratories with different technologies as well. They came to be called quasicrystals.
1 Correspondence to: Edita Pelantová, Department of Mathematics, Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University, Trojanova 13, 120 00 Praha 2, Czech Republic. Tel.: +420 224 358 544; Fax: +420 224 918 643; E-mail:
[email protected]fi.cvut.cz.
E. Pelantová and Z. Masáková / Quasicrystals
Shechtman's discovery shows that periodicity is not synonymous with long-range order. The definition of long-range order is, however, not clear. By this term crystallographers usually understand an ordering of atoms in the material necessary to produce a diffraction pattern with sharp bright spots. This is also used in the general definition of crystal adopted by the International Union of Crystallography at its meeting in 1992. The only clear requirement agreed upon by all scientists is that the set modeling a quasicrystal, i.e. the positions of atoms in the material, should be a Delone (or Delaunay) set. Roughly speaking, this property says that atoms in the quasicrystal should be 'uniformly' distributed in the space occupied by the material. Formally, a set Σ ⊂ R^d is called Delone if
(i) (uniform discreteness) there exists r1 > 0 such that each ball of radius r1 contains at most one element of Σ;
(ii) (relative density) there exists r2 > 0 such that each ball of radius r2 contains at least one element of Σ.
The requirement of the Delone property is however not sufficient, since the positions of atoms in amorphous matter also form (a section of) a Delone set. Therefore Delone sets modeling quasicrystals must satisfy further properties. According to the type of these additional requirements there exist several approaches to quasicrystal definitions [21,22]. The concept of a Bohr (Besicovitch) almost periodic set is based on Fourier analysis. The second concept, of Patterson sets, is based on a mathematical analogue of X-ray diffraction, developed by Hof. The third concept, developed by Yves Meyer, is based on a restriction on the set Σ − Σ of interatomic distances. It is elegant and of purely geometric nature: a Meyer set Σ ⊂ R^d is a Delone set having the property that there exists a finite set F such that

Σ − Σ ⊂ Σ + F .

In [21], Lagarias has proven that a Meyer set can equivalently be defined as a Delone set Σ such that Σ − Σ is also Delone.
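The two Delone conditions are easy to test on a finite one-dimensional sample. A sketch (function name, window convention and the examples are our own; in one dimension the conditions reduce to bounds on the gaps between consecutive points):

```python
def is_delone(points, r1, r2, lo, hi):
    """Check uniform discreteness (radius r1, open balls) and relative
    density (radius r2) for a finite 1-D sample on the window [lo, hi]."""
    pts = sorted(points)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    # (i) each open ball of radius r1 contains at most one point
    uniformly_discrete = all(g >= 2 * r1 for g in gaps)
    # (ii) each ball of radius r2 inside the window contains at least one point
    relatively_dense = (all(g <= 2 * r2 for g in gaps)
                        and pts[0] - lo <= r2 and hi - pts[-1] <= r2)
    return uniformly_discrete and relatively_dense

print(is_delone(range(11), 0.5, 0.5, 0, 10))            # a lattice: True
print(is_delone([0, 1, 2, 8, 9, 10], 0.5, 0.5, 0, 10))  # big hole: False
```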
There exists a general family of sets Σ that are known to have quasicrystalline properties: the so-called cut-and-project sets, here abbreviated to C&P sets. Various subclasses of these sets appear to satisfy all three of the above quoted definitions of quasicrystals.
The paper is organized as follows. The construction of quasicrystal models by cut and projection is introduced in Section 2 and illustrated with an example of a cut-and-project set with 5-fold symmetry in Section 3. The remaining part of the paper focuses on the properties of one-dimensional cut-and-project sets. Section 4 provides their definition. Sections 5, 6 and 7 show their diverse properties, both geometric and combinatorial.
2. Cut-and-project Sets

The construction of a cut-and-project set (C&P set) starts with the choice of a full rank lattice: let x1, x2, ..., xd ∈ R^d be linearly independent vectors over R; the set

L = {a1 x1 + a2 x2 + ... + ad xd | a1, ..., ad ∈ Z}
is called a lattice. It is obvious that a lattice is a Delone set. The mathematical model for an ideal crystal (or perfect crystal) in R^d is a set Λ formed by a finite number of shifted copies of a single lattice L. Formally, Λ is a perfect crystal if Λ = L + S, where S is a finite set of translations. Since lattices satisfy L − L ⊂ L and a perfect crystal satisfies Λ − Λ ⊂ Λ − S, they are both Meyer sets. The Meyer concept of quasicrystals thus generalizes the classical definition of crystals. A perfect crystal is however a periodic set, i.e. Λ + x ⊂ Λ for any x ∈ L, and therefore it is not a suitable model for quasicrystalline materials, which reveal rotational symmetries incompatible with periodicity.
We shall now describe a large class of Meyer sets which are not invariant under translation. Let L be a full rank lattice in R^d and let R^d be written as a direct sum V1 ⊕ V2 of two subspaces. One of the subspaces, say V1, plays the role of the space onto which the lattice L is projected; we call V1 the physical space. The other subspace, V2, determines the direction of the projection map and is called the inner space. Let us denote by π1 the projection map onto V1 along V2, and analogously by π2 the projection onto V2 along V1. We further require that the full rank lattice be in general position. This means that π1 is one-to-one when restricted to L, and that the image of the lattice L under the projection π2 is dense in V2. The situation can be summarized as V1 ← R^d → V2 (by π1 and π2, respectively), with L ⊂ R^d.
For the definition of C&P sets we also need a bounded set Ω ⊂ V2, called the acceptance window, which realizes the "cut". The C&P set is then defined as

Σ(Ω) := {π1(x) | x ∈ L and π2(x) ∈ Ω} .
A cut-and-project set Σ(Ω) with acceptance window Ω is formed by lattice points projected onto V1, but only those whose projection onto V2 belongs to Ω, i.e. by the projections of lattice points found in the cartesian product V1 × Ω. Figure 1 shows the construction of a C&P set with one-dimensional physical and one-dimensional inner space. The acceptance window here is an interval in V2, and the cylinder V1 × Ω is a strip in the plane.
Let us list several important properties of C&P sets:
• Σ(Ω) + t ⊄ Σ(Ω) for any nonzero t ∈ V1, i.e. Σ(Ω) is not an ideal crystal.
• If the interior Ω° of the acceptance window is not empty, then Σ(Ω) is a Meyer set.
• If Σ is a Meyer set, then there exists a C&P set Σ(Ω) with Ω° ≠ ∅ and a finite set F such that Σ ⊂ Σ(Ω) + F.
The first two properties can be derived directly from the definition of a C&P set; the third one has been shown in [22].
The aim of physicists is to find a mechanism which forces the atoms in quasicrystalline matter to occupy given positions. All physical approaches to describing crystals are based on a minimum energy argument. If one wants at least to have a chance to find a physical explanation why a given Delone set is a suitable model for a quasicrystal, then the number of distinct neighborhoods of atoms (of points) in the Delone set must be finite. This requirement is formalized in the notion of finite local complexity: we say that
Figure 1. Construction of a one-dimensional cut-and-project set.
a Delone set Σ is of finite local complexity if for any fixed radius r all balls of this radius contain only finitely many different configurations of points of Σ up to translation.
It follows from their definition that Meyer sets have finite local complexity. Therefore the condition Ω° ≠ ∅ ensures that a C&P set Σ(Ω) has finite local complexity. Another physically reasonable requirement on the model of a quasicrystal is that every configuration of points be found in the modeling set infinitely many times. This property may for example be ensured by the requirement that the boundary of the acceptance window have empty intersection with the image of the lattice under the projection, i.e. ∂Ω ∩ π2(L) = ∅.
3. Cut-and-project Sets with Rotational Symmetry

Recalling the motivation for introducing the notion of quasicrystals, one should ask about conditions ensuring the existence of a crystallographically forbidden symmetry. For this, conditions on the acceptance window alone are not sufficient. The construction of a two-dimensional C&P set with 5-fold symmetry has been described by Moody and Patera in [25]. In [6] one can find a more general construction of C&P sets with rotational symmetry of order 2n + 1, for n ∈ N, n ≥ 2.
Consider the lattice L ⊂ R^4 to be generated by unit vectors α1, α2, α3, α4 whose mutual position is given by the diagram

A4 ≡ α1 – α2 – α3 – α4

in which the vectors connected by an edge make an angle π/3 and are otherwise orthogonal. Such vectors are root vectors of the root system A4 in the Cartan classification of simple Lie algebras. It can be verified that the lattice generated by α1, ..., α4 is invariant under 5-fold rotational symmetry. Let us mention that dimension 4 is the smallest which allows a lattice with such a rotational symmetry.
The physical space V1 and the inner space V2 in our example both have dimension 2; thus each is spanned by two vectors, say v, u and v*, u* respectively. We can choose them as unit vectors such that u and v form an angle 4π/5 and u* and v* form an angle 2π/5. The following scheme shows the definition of the projection π1, which is uniquely given by its values on the four basis vectors α1, ..., α4:
π1(α1) = v ,  π1(α2) = τv ,  π1(α3) = u ,  π1(α4) = τu ,  where τ = (1 + √5)/2 .
The irrational number τ , usually called the golden ratio, is the greater root of the quadratic equation x2 = x + 1. Recall that regular pentagon of side-length 1 has the diagonal of length τ , which exemplifies the correspondence of the golden ratio with the construction of a point set having 5-fold rotational symmetry. The projection π2 is defined analogically,√substituting vectors u, v in the diagram by v , u , and the scalar factor τ by τ = 12 (1 − 5), which is the other root of the quadratic equation x2 = x + 1. With this choice of projections π1 and π2 , a point of a lattice given by four integer coordinates (a, b, c, d) is projected as π1 (a, b, c, d) = (a + τ b)v + (c + τ d)u , π2 (a, b, c, d) = (a + τ b)v ∗ + (c + τ d )u∗ , and the C&P set has the form ! " Σ(Ω) = (a + τ b)v + (c + τ d)u a, b, c, d ∈ Z, (a + τ b)v ∗ + (c + τ d )u∗ ∈ Ω . To complete the definition of the C&P set Σ(Ω) we have to provide the acceptance window Ω. Its choice strongly influences geometrical properties of Σ(Ω). In [12] it is shown that with the above cut-and-project scheme the set Σ(Ω) has 10-fold rotational symmetry if and only if the 10-fold symmetry is displayed by the acceptance window Ω. Figure 2 shows a cut-and-project set Σ(Ω) where Ω is a disk centered at the origin. If one studies the inter-atomic interactions, it is impossible to consider contribution of all atoms in the matter; one must limit the consideration to ‘neighbours’ of a given atom. Thus it is necessary to define the notion of neighbourhood in a general point set, which has not a lattice structure. A natural definition of neighbours is given in the notion of a Voronoi cell. Consider a Delone set Σ ⊂ Rd and choose a point x in it. The Voronoi cell of x is the set V (x) = {y ∈ Rd | x − y ≤ z − y for all z ∈ Σ} . The Voronoi cell of the point x is thus formed by such part of the space, which is closer to x than to any other point of the set Σ. Since Σ is a Delone set, the Voronoi cells
Figure 2. Two-dimensional cut-and-project set with disc acceptance window.
of all points are well defined convex polytopes in R^d, filling the space without overlaps of their interiors. Therefore they form a perfect tiling of the space. The notion of Voronoi cells allows a natural definition of neighbourhood of points in a Delone set Σ ⊂ R^d: two points may be declared neighbours if their Voronoi polytopes share a face of dimension d − 1. The Voronoi tiling of the cut-and-project set Σ(Ω) from Figure 2 is shown in Figure 3.
Figure 3. Voronoi tiling of the cut-and-project set shown in Figure 2.
In the Voronoi tiling of Figure 3 one finds only 6 basic types of tiles (Voronoi polygons). They all appear together with their copies rotated by the angles πj/10, j = 0, 1, ..., 9.
Figure 4. The tiles appearing in the Voronoi tiling of Figure 3.
Let us mention that the Voronoi tiling shown in Figure 3 is aperiodic, since the C&P set Σ(Ω) is aperiodic. For certain classes of acceptance windows with 10-fold rotational symmetry, the collections of appearing Voronoi tiles are described in the series of articles [23]. The geometry of the Voronoi tilings generated by cut-and-project sets is, except for several special cases, unknown. The only known fact is that the number of types of tiles is finite for every cut-and-project set, which follows from the finite local complexity of cut-and-project sets. The situation in dimensions d ≥ 2 is quite complicated. Therefore we focus in the remaining part of the paper on one-dimensional cut-and-project sets.
4. One-dimensional C&P Sets and C&P Words

Let us describe in detail the construction of a one-dimensional C&P set, as illustrated in Figure 1. Consider the lattice L = Z^2 and two distinct straight lines V1: y = εx and V2: y = ηx, ε ≠ η. If we choose the vectors x1 = (1/(ε − η))(1, ε) and x2 = (1/(η − ε))(1, η), then for any point of the lattice Z^2 we have

(p, q) = (q − pη)x1 + (q − pε)x2 ,

where the first summand is π1(p, q) and the second is π2(p, q).
Let us recall that the construction by cut and projection requires that the projection π1 restricted to the lattice L be one-to-one, and that the set π2(L) be dense in V2. If η and ε are irrational numbers, then these conditions are satisfied. The projections of the lattice L = Z^2 on the straight lines V1 and V2 are described using the additive abelian groups

Z[η] := {a + bη | a, b ∈ Z} ,   Z[ε] := {a + bε | a, b ∈ Z} .

These groups are obviously isomorphic; the isomorphism ' : Z[η] → Z[ε] is given by the prescription

x = a + bη  ↦  x' = a + bε .

The cut-and-project scheme can then be illustrated by the following diagram.
Z^2 → Z[η]x1 (by π1)   and   Z^2 → Z[ε]x2 (by π2).
Every projected point π1(p, q) lies in the set Z[η]x1. The notation will be simplified by omitting the constant vector x1; in a similar way we omit the vector x2 when writing the projection π2(p, q). With this convention, one can define a one-dimensional cut-and-project set as follows.
Definition 1. Let ε, η be distinct irrational numbers, and let Ω ⊂ R be a bounded interval. Then the set

Σε,η(Ω) := {a + bη | a, b ∈ Z, a + bε ∈ Ω} = {x ∈ Z[η] | x' ∈ Ω}

is called a one-dimensional cut-and-project set with parameters ε, η and Ω.
From the properties of general cut-and-project sets, in particular from their finite local complexity, we derive that the distances between adjacent points of Σε,η(Ω) take only finitely many values. The following theorem [19] limits the number of distances to three.
Theorem 2. For every Σε,η(Ω) there exist positive numbers Δ1, Δ2 ∈ Z[η] such that the distances between adjacent points in Σε,η(Ω) take values in {Δ1, Δ2, Δ1 + Δ2}. The numbers Δ1, Δ2 depend only on the parameters ε, η and on the length |Ω| of the interval Ω.
As the theorem states, the distances Δ1, Δ2 and Δ1 + Δ2 depend only on the length of the acceptance window Ω and not on its position in R or on whether Ω is an open or closed interval. These properties can, however, influence the repetitivity of the set Σε,η(Ω). More precisely, they can cause the largest or the smallest distance to appear in Σε,η(Ω) at one place only. From the proof of Theorem 2 it follows that if Ω is semi-closed, then Σε,η(Ω) is repetitive. Therefore in the sequel we consider, without loss of generality, intervals Ω of the form Ω = [c, c + ℓ). Nevertheless, let us mention that even if Σε,η([c, c + ℓ)) is repetitive, the distances between adjacent points may take only two values, both of them appearing infinitely many times.
The knowledge of the values Δ1, Δ2 allows one to easily determine the neighbours of an arbitrary point of the set Σε,η([c, c + ℓ)) and thereby generate progressively the entire set.
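Theorem 2 can be observed numerically. The sketch below builds a finite piece of Σε,η(Ω) by brute force, with golden-ratio parameters of our own choosing, and counts the distinct gap lengths:

```python
import math

# One-dimensional C&P set from Definition 1, with parameters of our
# own choosing: eps = -1/phi, eta = phi, window [0, 1).
phi = (1 + math.sqrt(5)) / 2
eps, eta = -1 / phi, phi
c, ell = 0.0, 1.0

pts = sorted(a + b * eta
             for b in range(-200, 200)
             for a in range(-400, 400)
             if c <= a + b * eps < c + ell)
# distinct distances between adjacent points (rounded to absorb float noise)
gaps = {round(q - p, 9) for p, q in zip(pts, pts[1:])}
print(sorted(gaps))        # at most three distinct gap lengths
assert len(gaps) <= 3
```

For this particular window only two distances occur, both infinitely often in the full set, in line with the remark above.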
Denote by (xn)n∈Z the increasing sequence such that {xn | n ∈ Z} = Σε,η([c, c + ℓ)). Then for the images of the points of the set Σε,η([c, c + ℓ)) under the mapping ' one has

x'_{n+1} = x'_n + Δ'_1           if x'_n ∈ [c, c + ℓ − Δ'_1) =: ΩA ,
x'_{n+1} = x'_n + Δ'_1 + Δ'_2    if x'_n ∈ [c + ℓ − Δ'_1, c − Δ'_2) =: ΩB ,    (1)
x'_{n+1} = x'_n + Δ'_2           if x'_n ∈ [c − Δ'_2, c + ℓ) =: ΩC .

The mapping which associates to x'_n the element x'_{n+1} is a piecewise linear bijection f : [c, c + ℓ) → [c, c + ℓ). Its action is illustrated in Figure 5. Such a mapping is called, in the theory of dynamical systems, a 3-interval exchange transformation. In the case that Δ'_1 − Δ'_2 = |Ω| = ℓ, the interval denoted by ΩB is empty. This is the situation when the distances between adjacent points in Σε,η([c, c + ℓ)) take only two values. The mapping f is then an exchange of two intervals.
ΩB
ΩC
@ @ @ @ @ @ @ @ @ @ @ @ @
@ f:
ΩC+Δ2
↑
ΩA+Δ1
Figure 5. The diagram illustrating the prescription (1). The function f is a three-interval exchange transformation.
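The prescription (1) can be implemented directly. A sketch with made-up parameters c = 0, ℓ = 1, Δ'_1 = 7/10 and Δ'_2 = −2/5 (chosen only so that ΩA, ΩB, ΩC partition [0, 1)), checking on a rational grid that f is a bijection of the interval:

```python
from fractions import Fraction as F

c, ell, d1, d2 = F(0), F(1), F(7, 10), F(-2, 5)

def f(x):
    """The three-interval exchange of prescription (1)."""
    if x < c + ell - d1:       # Omega_A = [0, 3/10)
        return x + d1
    if x < c - d2:             # Omega_B = [3/10, 2/5)
        return x + d1 + d2
    return x + d2              # Omega_C = [2/5, 1)

grid = [c + F(i, 1000) for i in range(1000)]
image = sorted(f(x) for x in grid)
assert image == grid           # f permutes the sampled interval
```

The three branches translate ΩA, ΩB, ΩC onto three disjoint intervals whose union is again [c, c + ℓ), which is exactly what the assertion verifies on the grid.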
In order to record some finite segment of the set Σε,η([c, c + ℓ)), one can list the individual elements of this set. However, this is not the most efficient way: every point x ∈ Σε,η([c, c + ℓ)) is of the form x = a + bη ∈ Z[η], and with growing size of the considered segment the integer coordinates a, b needed for recording x = a + bη grow considerably. It is much simpler to find a point x0 ∈ Σε,η([c, c + ℓ)) in the segment and record the sequence of distances between consecutive points on the left and on the right of x0. With this in mind, the entire set Σε,η([c, c + ℓ)) can be coded by a bidirectional infinite word (un)n∈Z in the alphabet {A, B, C} given by

un = A if x_{n+1} − x_n = Δ1 ,   un = B if x_{n+1} − x_n = Δ1 + Δ2 ,   un = C if x_{n+1} − x_n = Δ2 .

Such an infinite word is denoted by uε,η(Ω).
Example 1 (Mechanical words). Let us choose irrational ε ∈ (−1, 0) and irrational η > 0. We shall consider a one-dimensional C&P set with acceptance window Ω = (β − 1, β], for some β ∈ R. For simplicity of notation we put α = −ε ∈ (0, 1). From the definition of C&P sets it follows that

a + bη ∈ Σε,η(Ω)  ⇔  a + bε ∈ Ω  ⇔  β − 1 < a − bα ≤ β  ⇔  a = ⌊bα + β⌋ ,

and therefore the C&P set is of the form

Σ−α,η((β − 1, β]) = {⌊bα + β⌋ + bη | b ∈ Z} .

Since α, η > 0, the sequence xn := ⌊nα + β⌋ + nη is strictly increasing, and thus the distances between adjacent points of the C&P set Σ−α,η((β − 1, β]) are of the form

x_{n+1} − x_n = η + ⌊(n + 1)α + β⌋ − ⌊nα + β⌋ ∈ { η + 1 = Δ1 , η = Δ2 } .    (2)
This C&P set can therefore be coded by an infinite word (un)n∈Z in a binary alphabet. Usually one chooses the alphabet {0, 1}, so that the n-th letter of the infinite word can be expressed as

    un = ⌊(n + 1)α + β⌋ − ⌊nα + β⌋.    (3)
Such infinite words (un)n∈Z were introduced already in [26] and have since been extensively studied under the name mechanical words. The parameter α ∈ (0, 1) is called the slope and the parameter β the intercept of the mechanical word.
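Formula (3) is easy to experiment with numerically. The sketch below is plain Python; the golden-ratio slope and the intercept 0.3 are arbitrary illustrative choices, not parameters fixed by the text.

```python
import math

def mechanical_word(alpha, beta, n_range):
    """n-th letter, cf. (3): u_n = floor((n+1)*alpha + beta) - floor(n*alpha + beta)."""
    return [math.floor((n + 1) * alpha + beta) - math.floor(n * alpha + beta)
            for n in n_range]

# illustrative slope and intercept (any irrational alpha in (0, 1) works)
alpha = (math.sqrt(5) - 1) / 2
beta = 0.3

word = mechanical_word(alpha, beta, range(30))
print("".join(map(str, word)))

# binary letters: adjacent points of the C&P set are at distance eta or eta + 1, cf. (2)
assert set(word) <= {0, 1}

# the frequency of the letter 1 approaches the slope alpha
n = 100_000
freq = sum(mechanical_word(alpha, beta, range(n))) / n
assert abs(freq - alpha) < 1e-3
```

The last assertion reflects a telescoping sum: the partial sums of (3) equal ⌊nα + β⌋ − ⌊β⌋, so the letter 1 occurs with frequency α.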
5. Equivalence of One-dimensional C&P Sets

In the previous section we defined, for a triplet of parameters ε, η and Ω, the set Σε,η(Ω) and we associated to it the infinite word uε,η(Ω). A natural question is how these objects differ for different triplets of parameters. Recall that our construction is based on the projection of the lattice Z². The group of linear transformations of this lattice onto itself,

    G = { A ∈ M2(Z) | det A = ±1 },

is known to be generated by the three matrices

    ( 1 1 )    ( 1  0 )    ( 0 1 )
    ( 0 1 ),   ( 0 −1 ),   ( 1 0 ).

Directly from the definition of C&P sets it follows that the action of these three generators on the lattice provides the identities

    Σε,η(Ω) = Σ1+ε,1+η(Ω),
    Σε,η(Ω) = Σ−ε,−η(−Ω),
    Σε,η(Ω) = η Σ1/ε,1/η((1/ε) Ω).
Another identity for C&P sets is obtained using the invariance of the lattice Z² under translations:

    a + bη + Σε,η(Ω) = Σε,η(Ω + a + bε)    for any a, b ∈ Z.
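These identities can be checked on finite patches of the lattice. The following Python sketch is illustrative only; the parameter values ε = −1/√2, η = √3, the window [0, 1) and the shift (a0, b0) = (2, 3) are arbitrary choices.

```python
import math

def cp_set(eps, eta, omega, b_range):
    """Finite patch of Sigma_{eps,eta}(omega): points a + b*eta with a + b*eps in [lo, hi)."""
    lo, hi = omega
    pts = []
    for b in b_range:
        a = math.ceil(lo - b * eps)      # smallest integer a with a + b*eps >= lo
        while a + b * eps < hi:
            pts.append(a + b * eta)
            a += 1
    return sorted(pts)

eps, eta = -1 / math.sqrt(2), math.sqrt(3)   # illustrative parameters
omega = (0.0, 1.0)
N = 50
S = cp_set(eps, eta, omega, range(-N, N + 1))

# first identity: Sigma_{eps,eta}(Omega) = Sigma_{1+eps,1+eta}(Omega), point by point
S1 = cp_set(1 + eps, 1 + eta, omega, range(-N, N + 1))
assert len(S) == len(S1)
assert all(abs(p - q) < 1e-9 for p, q in zip(S, S1))

# translation identity: a0 + b0*eta + Sigma(Omega) = Sigma(Omega + a0 + b0*eps)
a0, b0 = 2, 3
shifted = sorted(p + a0 + b0 * eta for p in S)
S2 = cp_set(eps, eta, (omega[0] + a0 + b0 * eps, omega[1] + a0 + b0 * eps),
            range(-N + b0, N + b0 + 1))
assert len(shifted) == len(S2)
assert all(abs(p - q) < 1e-9 for p, q in zip(shifted, S2))
print("both identities hold on the finite patch")
```

Note that the translation shifts the index b by b0, which is why the second patch is generated over the shifted range of b.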
The mentioned transformations were used in [19] for the proof of the following theorem.

Theorem 3. For every irrational numbers ε, η, ε ≠ η, and every bounded interval Ω, there exist ε̃, η̃ and an interval Ω̃ satisfying

    (P)    ε̃ ∈ (−1, 0),  η̃ > 0,  max(1 + ε̃, −ε̃) < |Ω̃| ≤ 1,

such that

    Σε,η(Ω) = s Σε̃,η̃(Ω̃)    for some s ∈ R.
Multiplying the set Σε̃,η̃(Ω̃) by a scalar s can be understood as a choice of a different scale. From the physical point of view the sets are therefore de facto the same. In geometry such sets are said to be similar. In the study of one-dimensional C&P sets one can therefore limit the considerations to parameters satisfying the condition (P).

One may ask whether the family of parameters can be restricted even more; more precisely, whether for different triples of parameters satisfying (P) the corresponding C&P sets are essentially different. The answer to this question is almost always affirmative, except for certain, in some sense awkward, cases. A detailed analysis can be found in [19].

Theorem 3 concerned geometrical similarity of C&P sets. If we are interested only in the corresponding infinite words uε,η(Ω), we can restrict the considerations even more. This is a consequence of the following two assertions.

Claim 1. If the parameters ε, η1, Ω and ε, η2, Ω satisfy (P), then the infinite word uε,η1(Ω) coincides with uε,η2(Ω).

Consequently, we can choose the slope of the straight line V2 in the cut-and-project scheme to be η = −1/ε, where ε is the slope of the straight line V1. The straight lines V1 and V2 can therefore be chosen, without loss of generality, mutually orthogonal.

Claim 2. If the parameters ε, η, Ω satisfy (P), then the infinite word uε,η(Ω) coincides with u−1−ε,η(−Ω) up to a permutation of the letters.

This statement implies that for the study of combinatorial properties of infinite words associated with C&P sets, one can limit the choice of the parameter ε to the range (−1/2, 0).

6. Factor and Palindromic Complexity of C&P Words

For the description of combinatorial properties of infinite words associated to one-dimensional C&P sets one uses the terminology and methods of language theory. Consider a finite alphabet A and a bidirectional infinite word

    u = (un)n∈Z,    u = ··· u−2 u−1 u0 u1 u2 ··· .
The set of factors of u of length n is denoted

    Ln = {ui ui+1 ··· ui+n−1 | i ∈ Z}.

The set of all factors of the word u is the language of u,

    L = ⋃n∈N Ln.
If every factor occurs in u infinitely many times, then the infinite word u is called recurrent. If moreover for every factor the gaps between its individual occurrences in u are bounded, then u is called uniformly recurrent. The factor complexity of the infinite word u is a mapping C : N → N such that
C(n) = #{ui ui+1 ··· ui+n−1 | i ∈ Z} = #Ln.

The complexity of an infinite word is a measure of the ordering in it: for periodic words it is bounded, for random words it equals (#A)^n. Since every factor ui ui+1 ··· ui+n−1 of the infinite word u = (un)n∈Z has at least one extension ui ui+1 ··· ui+n, it is clear that C(n) is a non-decreasing function. If C(n) = C(n + 1) for some n, then every factor of length n has a unique extension and therefore the infinite word u is periodic. The complexity of an aperiodic word is thus a strictly increasing function, which implies C(n) ≥ n + 1 for all n. It is known that mechanical words (defined by (3)) with irrational slope are aperiodic words with minimal complexity, i.e. C(n) = n + 1. Such words are called bidirectional sturmian words. A survey of which functions can express the complexity of some infinite word can be found in [16]. The following theorem has been proven in [19] for the infinite words obtained by the cut-and-project construction.

Theorem 4. Let uε,η(Ω) be a C&P infinite word with Ω = [c, c + ℓ).
• If ℓ ∉ Z[ε], then for any n ∈ N we have C(n) = 2n + 1.
• If ℓ ∈ Z[ε], then for any n ∈ N we have C(n) ≤ n + const.
Moreover, the infinite word uε,η(Ω) is uniformly recurrent.

From the theorem it follows that the complexity of the infinite word uε,η(Ω) depends only on the length ℓ = |Ω| of the acceptance window and not on its position. This is a consequence of the fact that the language of uε,η(Ω) depends only on |Ω|. If the parameters ε, η and Ω = [c, c + ℓ) satisfy the condition (P), then the complexity of the infinite word uε,η(Ω) is minimal (i.e. C(n) = n + 1) if and only if ℓ = 1. Thus the infinite words uε,η([c, c + 1)) are sturmian words. Nevertheless, a sturmian structure can be found also in words uε,η([c, c + ℓ)) with ℓ ∈ Z[ε]. As a matter of fact, one can prove the following proposition [18].

Proposition 5.
If ℓ ∈ Z[ε], then there exists a sturmian word v = ··· v−2 v−1 v0 v1 v2 ··· in {0, 1}^Z and finite words W0, W1 over the alphabet {A, B, C} such that

    uε,η(Ω) = ··· Wv−2 Wv−1 Wv0 Wv1 Wv2 ··· .

The proposition in fact states that the infinite word uε,η(Ω) can be obtained by concatenation of the words W0, W1 in the order of the 0's and 1's in the sturmian word v. Let us mention that Cassaigne [11] has shown a similar statement for arbitrary one-directional infinite words with complexity n + const. He calls such words quasisturmian.

A reasonable model of a quasicrystalline material cannot distinguish between the ordering of neighbours on the right and on the left of a chosen atom. In terms of the infinite word which codes the one-dimensional model of a quasicrystal, this means that the language L must contain, together with every factor w = w1 w2 ... wn, also its mirror image w̄ = wn wn−1 ... w1. The language of C&P words satisfies this requirement. A factor w which satisfies w = w̄ is called a palindrome, just as in natural languages. The study of palindromes in infinite words has a great importance for describing the spectral properties of the one-dimensional Schrödinger operator, which is associated to (un)n∈Z in the following way: to every letter of the alphabet a ∈ A one associates a potential V(a) in such a way that the mapping V : A → R is injective. The one-dimensional Schrödinger operator H is then defined as

    (Hφ)(n) = φ(n + 1) + φ(n − 1) + V(un)φ(n),    φ ∈ ℓ²(Z).
The spectral properties of H influence the conductivity properties of the given structure. Roughly speaking, if the spectrum is absolutely continuous, then the structure behaves like a conductor, while in the case of pure point spectrum it behaves like an insulator. In [20] one shows the connection between the spectrum of H and the existence of infinitely many palindromes in the word (un)n∈Z.

The function that counts the number of palindromes of a given length in the language L of an infinite word u is called the palindromic complexity of u. Formally, the palindromic complexity of u is a mapping P : N → N defined by

    P(n) := #{w ∈ Ln | w = w̄}.

Upper estimates on the number P(n) of palindromes of length n in an infinite word u can be obtained using the factor complexity C(n) of u. In [3] the authors prove a result which brings into relation the factor complexity C(n) with the palindromic complexity P(n). For a non-periodic infinite word u it holds that

    P(n) ≤ (16/n) C(n + ⌊n/4⌋).    (4)
By combining the above estimate with the knowledge of the factor complexity, we obtain for C&P infinite words that P(n) ≤ 48. Infinite words constructed by cut and projection are uniformly recurrent. For such words the upper estimate of the palindromic complexity can be improved, using the observation that uniformly recurrent words either have P(n) = 0 for all sufficiently large n, or their language L is invariant under the mirror image, see [5]. If L contains with every factor w also its mirror image w̄, then

    P(n) + P(n + 1) ≤ 3ΔC(n) := 3 (C(n + 1) − C(n)).    (5)

This estimate of the palindromic complexity is better than (4) when the factor complexity C(n) is subpolynomial. In particular, for C&P infinite words we obtain P(n) ≤ 6. In [14] one can find the exact value of the palindromic complexity for infinite words coding three-interval exchange transformations. Since this is the case of C&P words, we have the following theorem.

Theorem 6. Let uε,η(Ω) be a C&P infinite word with Ω = [c, c + ℓ) and let ε, η, Ω satisfy the conditions (P). Then

    P(n) = 1  for n even,
           2  for n odd and ℓ = 1,
           3  for n odd and ℓ < 1.
7. Substitution Invariance of C&P Words

To generate the set Σε,η(Ω) from the definition amounts to deciding, for every point of the form a + bη, whether a + bε belongs to the interval Ω or not. This is done by verifying certain inequalities between irrational numbers. If we use a computer working with finite precision arithmetic, rounding errors take place and in fact the computer generates a periodic set instead of the aperiodic Σε,η(Ω). The following example gives a hint towards a much more efficient and at the same time exact generation of a C&P set.

Consider the most popular one-dimensional cut-and-project set, namely the Fibonacci chain. It is a C&P set with parameters η = τ, ε = τ′ and Ω = [0, 1). (Recall that the golden ratio τ = (1 + √5)/2 and its conjugate τ′ = (1 − √5)/2 are the roots of the equation x² = x + 1.) Since τ² = τ + 1, the set of all integer combinations of 1 and τ is the same as the set of all integer combinations of τ² and τ, formally

    τ Z[τ] = Z[τ].

Moreover, Z[τ] is closed under multiplication, i.e. Z[τ] is a ring. Since τ + τ′ = 1, we have also Z[τ′] = Z[τ], and the mapping ′ which maps a + bτ ↦ a + bτ′ is in fact an automorphism of the ring Z[τ]. Note that Z[τ] is the ring of integers in the field Q(τ) and ′ is the restriction of the Galois automorphism of this field. Using the mentioned properties one can derive directly from the definition of C&P sets that

    τ² Στ′,τ(Ω) = Στ′,τ((τ′)² Ω),

which is valid for every acceptance window Ω. In the case Ω = [0, 1) we moreover have (τ′)² Ω ⊂ Ω. Therefore

    τ² Στ′,τ(Ω) ⊂ Στ′,τ(Ω),

i.e. Στ′,τ(Ω) is selfsimilar with the scaling factor τ², as illustrated in Figure 6. Example 1, namely equation (2), implies that Στ′,τ(Ω) has two types of distances between adjacent points, namely Δ1 = τ² and Δ2 = τ. In Figure 6 the distance Δ1 is coded by the letter A and the distance Δ2 by the letter B.
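Because a point a + bτ of the Fibonacci chain is determined by the integer pair (a, b), and multiplication by τ² = 1 + τ acts on these pairs as (a, b) ↦ (a + b, a + 2b), the selfsimilarity τ² Στ′,τ([0, 1)) ⊂ Στ′,τ([0, 1)) can be checked with exact integer bookkeeping. A minimal Python sketch:

```python
import math

TAU = (1 + math.sqrt(5)) / 2
TAU_C = (1 - math.sqrt(5)) / 2           # the conjugate tau'

def in_window(a, b):
    """(a, b) codes a + b*tau; accepted iff the star image a + b*tau' is in [0, 1)."""
    return 0 <= a + b * TAU_C < 1

# patch of the Fibonacci chain: for each b there is exactly one admissible a
N = 500
chain = set()
for b in range(-N, N + 1):
    a = math.ceil(-b * TAU_C)            # unique integer in [-b*tau', 1 - b*tau')
    assert in_window(a, b)
    chain.add((a, b))

# tau^2 * (a + b*tau) = (a + b) + (a + 2b)*tau, since tau^2 = tau + 1
for (a, b) in chain:
    aa, bb = a + b, a + 2 * b
    if abs(bb) <= N:                     # stay inside the generated patch
        assert (aa, bb) in chain         # the scaled point is again in the chain
print("tau^2 * Sigma is contained in Sigma on a patch of", len(chain), "points")
```

The scaled star image is (τ′)²(a + bτ′) ∈ (τ′)²[0, 1) ⊂ [0, 1), which is exactly why every scaled point passes the window test again.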
For our considerations it is important that every distance A scaled by the factor τ² is filled by two distances A followed by one B. Similarly, every scaled distance B is filled by A followed by B. This property can be proven from the definition of Στ′,τ(Ω). As a consequence, the Fibonacci chain can be generated by taking an initial segment of the set, scaling it by τ², and filling the gaps by new points in the above described way, symbolically written as the rule

    A → AAB    and    B → AB.
Repeating this, one obtains step by step the entire C&P set. Since the origin 0, as an element of Στ′,τ(Ω), has its left neighbour at distance Δ2 and its right neighbour at distance Δ1, we can generate the Fibonacci chain symbolically as

    B|A → AB|AAB → AABAB|AABAABAB → ···
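The symbolic rule is a two-line program. The sketch below (Python) iterates ϕ(A) = AAB, ϕ(B) = AB from the seed B|A and reproduces the chain of words displayed above; the prefix/suffix assertions express that the one-sided words converge to the bidirectional fixed point.

```python
PHI = {"A": "AAB", "B": "AB"}

def phi(word):
    """Apply the morphism phi letter by letter."""
    return "".join(PHI[c] for c in word)

left, right = "B", "A"                   # seed around the origin: ... B | A ...
steps = ["B|A"]
for _ in range(2):
    left, right = phi(left), phi(right)
    steps.append(left + "|" + right)

print(" -> ".join(steps))

assert steps == ["B|A", "AB|AAB", "AABAB|AABAABAB"]
# phi(A) begins with A and phi(B) ends with B, so the halves grow consistently:
assert phi(right).startswith(right)      # right half extends by a prefix relation
assert phi(left).endswith(left)          # left half extends by a suffix relation
```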
Figure 6. Selfsimilarity and substitution invariance of the Fibonacci word.
A natural question arises whether such an efficient and exact generation is possible also for other one-dimensional cut-and-project sets, respectively their infinite words. Let us introduce several notions which allow us to formalize this question.

A mapping ϕ on the set of finite words over the alphabet A is a morphism if the ϕ-image of a concatenation of two words is the concatenation of the ϕ-images of the individual words, i.e. ϕ(vw) = ϕ(v)ϕ(w) for every pair of words v, w over the alphabet A. For the determination of a morphism it suffices to specify the ϕ-images of the letters of the alphabet. The action of a morphism extends naturally to infinite words,

    ϕ(u) = ϕ(··· u−2 u−1 | u0 u1 u2 ···) := ··· ϕ(u−2)ϕ(u−1) | ϕ(u0)ϕ(u1)ϕ(u2) ···

An infinite word u invariant under the action of the morphism, i.e. satisfying ϕ(u) = u, is called a fixed point of ϕ. In this terminology one can say that the Fibonacci chain (or the infinite word coding it) is a fixed point of the morphism ϕ over the two-letter alphabet {A, B} determined by the images of the letters, ϕ(A) = AAB, ϕ(B) = AB.

The identity map, which maps every letter of the alphabet to itself, is also a morphism, and an arbitrary infinite word is its fixed point. However, one cannot use the identity map for the generation of infinite words; therefore we must put additional requirements on the morphism ϕ. The morphism ϕ over the alphabet A is called a substitution if for every letter a ∈ A the length of the associated word ϕ(a) is at least 1, and if there exist letters a0, b0 ∈ A such that the words ϕ(a0) and ϕ(b0) have length at least 2, the word ϕ(a0) starts with the letter a0, and the word ϕ(b0) ends with the letter b0. A morphism which is at the same time a substitution necessarily has a fixed point u, which can be generated by repeated application of the morphism to the pair of letters b0|a0. Formally,

    ϕ(u) = u = lim n→∞ ϕⁿ(b0)|ϕⁿ(a0).
To every substitution ϕ over the alphabet A = {a1 , a2 , . . . , ak } one associates a k × k square matrix M (the so-called incidence matrix of the substitution). The element
Mij is given as the number of letters aj in the word ϕ(ai). The incidence matrix of the substitution generating the Fibonacci word is

    M = ( 2 1 )
        ( 1 1 ).

The incidence matrix of a substitution is by definition a non-negative matrix, for which the Perron–Frobenius theorem holds [17]. A substitution ϕ is said to be primitive if some power of its incidence matrix is positive. In this case the spectral radius of the matrix is an eigenvalue of multiplicity 1, and the corresponding eigenvector (the so-called Perron eigenvector of the matrix) can be chosen positive. Although the incidence matrix M does not allow one to reconstruct the substitution ϕ, many properties of the fixed points of ϕ can be derived from it. Let us mention some of them.

• If the infinite word u is invariant under a substitution, then there exists a constant K such that for the complexity function of the word u we have

    C(n) ≤ K n²    for all n ∈ N.

• If the infinite word u is invariant under a primitive substitution, then there exist constants K1 and K2 such that for the factor complexity and the palindromic complexity of the word u we have

    C(n) ≤ K1 n    and    P(n) ≤ K2    for all n ∈ N.

• An infinite word which is invariant under a primitive substitution is uniformly recurrent.

• If the infinite word u is invariant under a primitive substitution ϕ over an alphabet A = {a1, a2, ..., ak}, then every letter ai has a well defined density in u, i.e. the limit

    ρ(ai) := lim n→∞ (number of letters ai in the word u−n ··· u−1 | u0 u1 ··· un−1) / (2n + 1)

exists. Let (x1, x2, ..., xk) be the Perron eigenvector of the matrix M^T (the transpose of the incidence matrix M of the substitution ϕ). Then the density ρ(ai) is equal to

    ρ(ai) = xi / (x1 + x2 + ··· + xk).
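The density formula is easy to illustrate on the Fibonacci substitution ϕ(A) = AAB, ϕ(B) = AB. The Python sketch below builds the incidence matrix, approximates the Perron eigenvector of M^T by power iteration (a pragmatic substitute for an exact eigenvalue computation), and compares the prediction with letter counts in a long prefix of the fixed point.

```python
def incidence_matrix(phi, letters):
    """M[i][j] = number of occurrences of letters[j] in phi(letters[i])."""
    return [[phi[a].count(b) for b in letters] for a in letters]

def perron_eigenvector(M, iters=200):
    """Power iteration; assumes M is primitive so Perron-Frobenius applies."""
    k = len(M)
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(k)) for i in range(k)]
        s = sum(w)
        v = [x / s for x in w]
    return v                              # normalized so the entries sum to 1

phi = {"A": "AAB", "B": "AB"}
letters = ["A", "B"]
M = incidence_matrix(phi, letters)
assert M == [[2, 1], [1, 1]]              # the matrix displayed above

MT = [[M[j][i] for j in range(len(M))] for i in range(len(M))]
rho = perron_eigenvector(MT)              # predicted densities rho(A), rho(B)

# empirical letter frequencies in a long prefix of the fixed point
w = "A"
for _ in range(12):
    w = "".join(phi[c] for c in w)
freq = [w.count(a) / len(w) for a in letters]
assert all(abs(r - f) < 1e-4 for r, f in zip(rho, freq))
print("densities:", rho)                  # close to (1/tau, 1 - 1/tau)
```

For this substitution M happens to be symmetric, so M^T = M; the transpose is kept in the code to match the general statement.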
The question of describing all C&P infinite words invariant under a substitution is still unsolved. A complete solution is known only for C&P words over a binary alphabet, which can be, without loss of generality, represented by mechanical words (3) with irrational slope α ∈ (0, 1) and intercept β ∈ [0, 1). The substitution invariance of mechanical words was first solved in [13] for the so-called homogeneous mechanical words, i.e. those with β = 0.

Theorem 7. The homogeneous mechanical word with slope α ∈ (0, 1) is invariant under a substitution if and only if α is a quadratic irrational number whose conjugate α′ does not belong to (0, 1).
A quadratic irrational number α ∈ (0, 1) whose conjugate α′ ∉ (0, 1) is called a Sturm number. Let us mention that in the paper [13] Sturm numbers are defined using their continued fraction expansions; the simple algebraic characterization above was given in [2]. The substitution invariance of general (inhomogeneous) mechanical words is solved in [4] and [29].

Theorem 8. Let α be an irrational number, α ∈ (0, 1), and let β ∈ [0, 1). The mechanical word with slope α and intercept β is invariant under a substitution if and only if the following three conditions are satisfied:
(i) α is a Sturm number,
(ii) β ∈ Q(α),
(iii) α′ ≤ β′ ≤ 1 − α′ or 1 − α′ ≤ β′ ≤ α′,
where β′ is the image of β under the Galois automorphism of the field Q(α).

Unlike the case of binary C&P words, the question of substitution invariance for ternary C&P words has so far been solved only partially. The following result is a consequence of [1,9].

Theorem 9. Let Ω = [c, d) be a bounded interval. If the infinite word uε,η(Ω) is invariant under a primitive substitution, then ε is a quadratic irrational number and the boundary points c, d of the interval Ω belong to the quadratic field Q(ε).

All C&P words satisfying the properties of the theorem have a property weaker than substitution invariance, the so-called substitutivity [15]. It allows one to generate even those infinite words which are not fixed points of a morphism.

8. Conclusions

In the theory of mathematical quasicrystals, best known are the properties of the one-dimensional models, be it the geometric or the combinatorial aspects of these structures. However, this information can be used also in the study of higher-dimensional models, since the one-dimensional ones are embedded in them. In fact, every straight line containing at least two points of a higher-dimensional cut-and-project set contains infinitely many of them, and their ordering is a one-dimensional cut-and-project sequence.
Nevertheless, the notions of combinatorics on words, as they were presented here, are being generalized also to higher-dimensional structures; for example, one speaks about complexity and substitution invariance of two-dimensional infinite words, and even two-dimensional sturmian words are well defined [7,8].

Besides cut-and-project sets, there are other aperiodic structures which can serve as quasicrystal models; they are based on non-standard numeration systems [10]. The sets of numbers with integer β-expansions share many of the properties required of one-dimensional quasicrystal models: in particular, they are Meyer sets, they are self-similar, and the corresponding infinite words are substitution invariant.

Acknowledgements

The authors acknowledge partial support by the Czech Science Foundation GA ČR 201/05/0169.
References
[1] B. Adamczewski, Codages de rotations et phénomènes d'autosimilarité, J. Théor. Nombres Bordeaux 14 (2002), 351–386.
[2] C. Allauzen, Simple characterization of Sturm numbers, J. Théor. Nombres Bordeaux 10 (1998), 237–241.
[3] J.-P. Allouche, M. Baake, J. Cassaigne, D. Damanik, Palindrome complexity, Theoret. Comput. Sci. 292 (2003), 9–31.
[4] P. Baláži, Z. Masáková, E. Pelantová, Complete characterization of substitution invariant sturmian sequences, Integers: Electronic Journal of Combinatorial Number Theory 5 (2005), #A14, 23pp.
[5] P. Baláži, E. Pelantová, Interval exchange transformations and palindromes, in Proceedings of the 5th International Conference on Words, Montreal (2005), 8pp.
[6] D. Barache, B. Champagne, J.-P. Gazeau, Pisot-cyclotomic quasilattices and their symmetry semigroups, in Quasicrystals and Discrete Geometry, ed. J. Patera, Fields Institute Monographs, AMS (1998), 15–66.
[7] V. Berthé, L. Vuillon, Palindromes and two-dimensional Sturmian sequences, J. Autom. Lang. Comb. 6 (2001), 121–138.
[8] V. Berthé, L. Vuillon, Suites doubles de basse complexité, J. Théor. Nombres Bordeaux 12 (2000), 179–208.
[9] M.D. Boshernitzan, C.R. Carroll, An extension of Lagrange's theorem to interval exchange transformations over quadratic fields, J. d'Analyse Math. 72 (1997), 21–44.
[10] Č. Burdík, Ch. Frougny, J.P. Gazeau, R. Krejcar, Beta-integers as natural counting systems for quasicrystals, J. Phys. A: Math. Gen. 31 (1998), 6449–6472.
[11] J. Cassaigne, Sequences with grouped factors, in Developments in Language Theory III (1997), Aristotle University of Thessaloniki (1998), 211–222.
[12] L. Chen, R.V. Moody, J. Patera, Non-crystallographic root systems, in Quasicrystals and Discrete Geometry, Fields Inst. Monogr. 10, Amer. Math. Soc., Providence, RI (1998), 135–178.
[13] D. Crisp, W. Moran, A. Pollington, P. Shiue, Substitution invariant cutting sequences, J. Théor. Nombres Bordeaux 5 (1993), 123–137.
[14] D. Damanik, L.Q. Zamboni, Combinatorial properties of Arnoux–Rauzy subshifts and applications to Schrödinger operators, Rev. Math. Phys. 15 (2003), 745–763.
[15] F. Durand, A characterization of substitutive sequences using return words, Discrete Math. 179 (1998), 89–101.
[16] S. Ferenczi, Complexity of sequences and dynamical systems, Discrete Math. 206 (1999), 145–154.
[17] M. Fiedler, Special Matrices and Their Applications in Numerical Mathematics, translated from the Czech by Petr Přikryl and Karel Segeth, Martinus Nijhoff Publishers, Dordrecht, 1986.
[18] J.P. Gazeau, Z. Masáková, E. Pelantová, Nested quasicrystalline discretisations of the line, to appear in IRMA Lectures in Mathematics and Theoretical Physics (2005), 56pp.
[19] L.S. Guimond, Z. Masáková, E. Pelantová, Combinatorial properties of infinite words associated with cut-and-project sequences, J. Théor. Nombres Bordeaux 15 (2003), 697–725.
[20] A. Hof, O. Knill, B. Simon, Singular continuous spectrum for palindromic Schrödinger operators, Commun. Math. Phys. 174 (1995), 149–159.
[21] J.C. Lagarias, Geometric models for quasicrystals I. Delone sets of finite type, Discrete Comput. Geom. 21 (1999), 161–191.
[22] J.C. Lagarias, Mathematical quasicrystals and the problem of diffraction, in Directions in Mathematical Quasicrystals, CRM Monogr. Ser. 13, Amer. Math. Soc., Providence, RI (2000), 61–93.
[23] Z. Masáková, J. Patera, J. Zich, Classification of Voronoi and Delone tiles in quasicrystals: I. General method, J. Phys. A: Math. Gen. 36 (2003), 1869–1894; II. Circular acceptance window of arbitrary size, J. Phys. A: Math. Gen. 36 (2003), 1895–1912; III. Decagonal acceptance window of any size, J. Phys. A: Math. Gen. 38 (2005), 1947–1960.
[24] R.V. Moody, Meyer sets and their duals, in The Mathematics of Long-Range Aperiodic Order (Waterloo, ON, 1995), NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci. 489, Kluwer Acad. Publ., Dordrecht (1997), 403–441.
[25] R.V. Moody, J. Patera, Quasicrystals and icosians, J. Phys. A: Math. Gen. 26 (1993), 2829–2853.
[26] M. Morse, G.A. Hedlund, Symbolic dynamics I. Sturmian trajectories, Amer. J. Math. 60 (1938), 815–866.
[27] M. Senechal, Quasicrystals and Geometry, Cambridge University Press (1995).
[28] D. Shechtman, I. Blech, D. Gratias, J.W. Cahn, Metallic phase with long-range orientational order and no translational symmetry, Phys. Rev. Lett. 53 (1984), 1951–1953.
[29] S. Yasutomi, On Sturmian sequences which are invariant under some substitutions, in Number Theory and Its Applications (Kyoto, 1997), Dev. Math. 2, Kluwer Acad. Publ., Dordrecht (1999), 347–373.
Physics and Theoretical Computer Science
J.-P. Gazeau et al. (Eds.)
IOS Press, 2007
© 2007 IOS Press. All rights reserved.
Pisot number system and its dual tiling

Shigeki Akiyama*
Niigata University, Japan

Abstract. Number systems in Pisot number base are discussed in relation to the arithmetic construction of quasicrystal models. One of the most important ideas is to introduce a 'dual tiling' of this system. This provides us with a geometric way to understand the 'algebraic structure' of the above models, as well as a dynamical understanding of arithmetic algorithms.

Keywords. Pisot number, number system, symbolic dynamical system, tiling
1. Beta expansion and Pisot number system

For this section the reader may consult the nice survey by Frougny [41]; however, we give a brief review and concise proofs of fundamental results to make this note more self-contained. Let us fix β > 1 and A = [0, β) ∩ Z. Denote by A∗ the set of finite words over A and by AN the set of right infinite words over A. With the concatenation ⊕,

    a1 a2 ... an ⊕ b1 b2 ... bm = a1 a2 ... an b1 b2 ... bm,

A∗ forms a monoid with the empty word λ as identity. An element of A∗ is embedded into AN by concatenating the infinite word 00... to the right. AN becomes a compact metric space with the distance function p(a1 a2 ..., b1 b2 ...) = 2^{−j} for the smallest index j with aj ≠ bj. A lexicographical order on AN is given by declaring a1 a2 ... <lex b1 b2 ... if aj < bj at the smallest index j with aj ≠ bj.
S. Akiyama / Pisot Number System and Its Dual Tiling
The beta transformation is the piecewise linear map Tβ on [0, 1) defined by

    Tβ : x ↦ βx − ⌊βx⌋,

which was shown to be ergodic by Rényi [45]. Parry [42] gave the invariant measure of this system, which is absolutely continuous with respect to the Lebesgue measure, and made its Radon–Nikodym derivative explicit. For each real x = x1 ∈ [0, 1), iterating the beta transformation we obtain

    Tβ : x1 → x2 → x3 → ···,

where the i-th arrow carries the label ai = ⌊βxi⌋. One can expand x ∈ [0, 1) into

    x = a1/β + a2/β² + a3/β³ + ···

with ai ∈ A. Denote by dβ the map [0, 1) ∋ x ↦ a1 a2 ··· ∈ AN. Then dβ is order preserving, that is, x < y implies dβ(x) <lex dβ(y), and dβ intertwines Tβ with the shift σ : a1 a2 a3 ··· ↦ a2 a3 ···, i.e. the diagram

    [0, 1) --Tβ--> [0, 1)
      |dβ            |dβ
      ↓              ↓
      AN   --σ-->    AN               (1)

commutes.
Define the realization map

    π = πβ : a1 a2 ··· ∈ AN ↦ Σ_{i=1}^∞ ai/β^i ∈ R.

Note that πβ is continuous but dβ is not. Since πβ ∘ dβ(x) = x by definition, we have πβ(AN) ⊃ [0, 1). However, AN ⊄ dβ([0, 1)). If a1 a2 ··· ∈ AN is contained in dβ([0, 1)), we say that a1 a2 ··· is admissible. A finite word a1 a2 ... am of A∗ is admissible if its right completion a1 a2 ... am ⊕ 00 ··· ∈ AN is admissible. For a given positive x there is an integer m ≥ 0 with β^{−m−1} x ∈ [0, 1), and x can be expanded as

    x = a−m β^m + a−m+1 β^{m−1} + ··· + a0 + a1/β + ···

This is the beta expansion, a natural generalization of the usual decimal or binary expansion. By abuse of terminology (note that the symbol '•' is not in A), we sometimes write

    dβ(x) = a−m a−m+1 ... a−1 a0 • a1 a2 a3 ...

or even simply
x = a−m a−m+1 ... a−1 a0 • a1 a2 a3 ...

if there is no room for confusion. The expansion is finite if there is an ℓ such that an = 0 for all n > ℓ, and we then write x = a−m a−m+1 ... a0 • a1 a2 a3 ... aℓ. Set

    dβ(1 − 0) = lim ε↓0 dβ(1 − ε),

the limit being taken in the metric of AN. Then dβ(1 − 0) cannot be finite.

Theorem 1 ([42], [39]). A right infinite word ω = ω1 ω2 ··· ∈ AN is admissible if and only if σ^n(ω) <lex dβ(1 − 0) for all n = 0, 1, 2, ...
Proof. We sketch the 'if' direction. Write dβ(1 − 0) = c1 c2 .... Under the assumption we show that

    Σ_{i=1}^∞ ω_{n+i} β^{−i} ∈ [0, 1)

for every n. By the assumption, there exists an admissible block decomposition

    ω_{n+1} ω_{n+2} ··· = c1 ... c_{k1−1} g1  c1 ... c_{k2−1} g2  c1 ... c_{k3−1} g3 ···,

where gi < c_{ki} and ki ≥ 1. It is easily seen that

    π(c1 ... c_{ki−1} gi) ≤ π(c1 ... c_{ki−1} (c_{ki} − 1)) < 1 − 1/β^{ki},

and therefore

    π(ω_{n+1} ω_{n+2} ...) < 1 − 1/β^{k1} + (1/β^{k1})(1 − 1/β^{k2}) + (1/β^{k1+k2})(1 − 1/β^{k3}) + ··· = 1.
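Parry's condition in Theorem 1 is directly algorithmic once dβ(1 − 0) is known. The Python sketch below takes the golden mean β = (1 + √5)/2 as an illustration: there dβ(1 − 0) = (10)^ω, and the lexicographic test reduces to forbidding the factor 11.

```python
import math

beta = (1 + math.sqrt(5)) / 2

def d_beta_digits(x, n):
    """First n digits of the beta expansion of x in [0, 1) via T_beta.
    Floating point only -- fine for a short illustration, not for exact work."""
    digits = []
    for _ in range(n):
        x *= beta
        a = math.floor(x)
        digits.append(a)
        x -= a
    return digits

# d_beta(1 - 0) for the golden mean is (10)^omega
ref = d_beta_digits(1 - 1e-12, 12)
assert ref == [1, 0] * 6

def admissible(word, ref):
    """Parry's condition: every shift of the word (right-padded with 0s)
    must be lexicographically smaller than d_beta(1 - 0)."""
    padded = list(word) + [0] * len(ref)
    return all(padded[i:i + len(ref)] < ref for i in range(len(word)))

assert admissible([1, 0, 1, 0, 0, 1], ref)
assert not admissible([0, 1, 1, 0], ref)      # contains the forbidden factor 11
print("admissibility check agrees with the no-11 rule")
```

Comparing against a finite prefix of dβ(1 − 0) suffices here because the padded shifts are eventually 0 while the reference word is not.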
Take F ⊂ A∗ and define a subset A_F of AN or AZ consisting of the infinite words none of whose subwords lie in F. Then A_F is a subshift, and any subshift can be written in this manner; F is the set of forbidden words. A subshift is called of finite type if it can be expressed as A_F with a finite set F. A subshift A_F is called sofic if one can choose an F which is recognizable by a finite automaton. A subshift of finite type is sofic, and sofic shifts are characterized as the factors of shifts of finite type. A sofic shift A_F is nothing but the set of infinite label sequences generated by infinite walks on a fixed finite directed graph labelled by A (cf. [40]). The beta shift Xβ is the subshift of AZ defined as the set of bi-infinite words all of whose finite subwords are admissible. Xβ is sofic if and only if dβ(1 − 0) is eventually periodic. Such a β is designated a Parry number. Further, dβ(1 − 0) is purely periodic
if and only if Xβ is of finite type. In this case the number β is called a simple Parry number ([42], [19]).

A Pisot number β > 1 is a real algebraic integer whose other conjugates have modulus less than one. A Salem number β > 1 is a real algebraic integer whose other conjugates have modulus not greater than one, with at least one conjugate of modulus exactly one. Denote by R+ the non-negative real numbers.

Theorem 2 (Bertrand [18], Schmidt [50]). If β is a Pisot number, then each element of Q(β) ∩ R+ has an eventually periodic beta expansion.

Proof. We denote by β^(j) (j = 1, ..., d) the conjugates of β with β^(1) = β, and use the same symbol to express the conjugate map Q(β) → Q(β^(j)) which sends x ↦ x^(j). As the conjugate map does not increase the denominator of an element x ∈ Q(β), it is enough to show that Tβ^n(x)^(j) is bounded for all j. (The number of lattice points in a bounded region is finite.) This is trivial for j = 1 by definition. For j > 1, we have

    Tβ^n(x) = β^n x − Σ_{i=1}^n xi β^{n−i}

with xi ∈ A. Thus

    |Tβ^n(x)^(j)| ≤ |x^(j)| + Σ_{i=1}^n xi |β^(j)|^{n−i} < |x^(j)| + ⌈β⌉ / (1 − |β^(j)|),

since |β^(j)| < 1 for j > 1.

Hence a Pisot number is a Parry number. In [50] a partial converse is shown: if every rational number in [0, 1) has an eventually periodic beta expansion, then β is a Pisot or a Salem number. It is not yet known whether each element of Q(β) ∩ R+ has an eventually periodic expansion if β is a Salem number (Boyd [20], [21], [22]). See Figure 1 for a brief summary. The finiteness will be discussed in §4. A Parry number β is a real algebraic number greater than one whose other conjugates are less than min{|β|, (1 + √5)/2} in modulus ([42], Solomyak [53]), but the converse does not hold. It is a difficult question to characterize Parry numbers among algebraic integers ([25], [15]).

Hereafter we simply say Pisot number system for the method of expressing real numbers by beta expansion in a Pisot number base. Results like [50] and [18] suggest that the Pisot number system is very close to the usual decimal expansion.
2. Delone set and β-integers

Let X be a subset of R^d. The ball of radius r > 0 centered at x is denoted by B(x, r). A point x of X is isolated if there is an ε > 0 such that B(x, ε) ∩ X = {x}. The set X is called discrete if each point of X is isolated. The set X is uniformly discrete if there exists an r > 0 such that B(x, r) ∩ X is empty or a single point for any x ∈ R^d, and X is relatively dense if there exists an R > 0 such that B(x, R) ∩ X ≠ ∅ for any x ∈ R^d. A Delone set is a set in R^d which is both uniformly discrete and relatively dense. One can expand any positive real number x by beta expansion:
S. Akiyama / Pisot Number System and Its Dual Tiling
Figure 1. The classification of Parry numbers
  x = a_{−m} a_{−m+1} ... a_0 • a_1 a_2 ...

The β-integer part (resp. β-fractional part) of x is defined by [x]_β = π(a_{−m} ... a_0) (resp. ⟨x⟩_β = π(a_1 a_2 ...)). A real number x is a β-integer if ⟨x⟩_β = 0. Denote by Z_β the set of β-integers and put Z_β^+ = Z_β ∩ R+.

Proposition 1. For any β > 1, the set of β-integers Z_β is relatively dense, discrete and closed in R.

Proof. As any positive real number x is expressed by beta expansion, one can take R = 1 to show that Z_β^+ is relatively dense in R+, which is equivalent to the fact that Z_β is relatively dense in R. Since π(a_{−m} ... a_0) > β^m, there are only finitely many β-integers in a given ball B(0, β^m). Thus Z_β has no accumulation point in R. This proves that Z_β is closed and discrete.

From now on, we assume that β is not an integer. Then lim_{ε↓0} T_β(1 − ε) = β − [β] ∈ [0, 1), and therefore we formally consider the orbit of 1 under the beta transform T_β by putting T_β(1) = β − [β].² By using (1), it is easily seen that T_β^n(1) = π_β(σ^n(d_β(1 − 0))) unless T_β^n(1) = 0. As Z_β is discrete and closed, we say that x, y ∈ Z_β are adjacent if there is no z ∈ Z_β between x and y.

Proposition 2. If x, y ∈ Z_β are adjacent, then there exists some non-negative n with |x − y| = T_β^n(1).

Proof. To prove this proposition, we use Theorem 1 and transfer the problem into the equivalent one in A^N, under the abusive terminology introduced in the previous section. Put d_β(1 − 0) = c_1 c_2 .... Without loss of generality, assume that x > y, x = a_{−m} ... a_0

² 1 is not in the domain of definition of T_β.
with a_{−m} ≠ 0 and y = b_{−m} ... b_0, by permitting b_{−m} ... b_{−m+ℓ} = 0^{ℓ+1}. As we are interested in x − y, we may assume that b_{−m} = 0, since otherwise one can substitute x and y by (a_{−m} − b_{−m}) a_{−m+1} ... a_0 and 0 b_{−m+1} ... b_0. (Both of them are admissible by Theorem 1.) Since x and y are adjacent, a_{−m} = 1, since otherwise (a_{−m} − 1) a_{−m+1} ... a_0 is admissible and lies between x and y. Next we see that a_{−m+1} = 0, since otherwise a_{−m} (a_{−m+1} − 1) ... a_0 is between x and y. In the same manner, we see that x = 10^m. Since x and y are adjacent, b_{−m+1} ... b_0 must be the largest admissible word of length m, that is, b_{−m+i} = c_i for i = 1, ..., m, and hence x − y = β^m − Σ_{i=1}^m c_i β^{m−i} = T_β^m(1).
The real number β > 1 is a Delone number³ if {T_β^n(1)}_{n=0,1,2,...} does not accumulate at 0. If β is a Delone number, then Z_β is uniformly discrete with r = min_{n=0,1,...} T_β^n(1). With the help of Propositions 1 and 2, Z_β is a Delone set if and only if β is a Delone number. It is clear that a Pisot number is a Delone number, since eventual periodicity of d_β(1 − 0) is equivalent to the fact that {T_β^n(1)}_{n=0,1,2,...} is a finite set. Verger-Gaugry proposed a working hypothesis that any Perron number is a Delone number (c.f. [19], [57], [29]). However it is not yet known whether there exists an algebraic Delone number which is not a Parry number. By ergodicity, when we fix a β, d_β(x) is almost 'normal' with respect to the invariant measure. This means that {T_β^n(x)}_{n=0,1,2,...} is dense in [0, 1) for almost all x. Therefore one might also make the completely opposite prediction: that an algebraic Delone number must be a Parry number. Schmeling [49] showed a very subtle result: the set of Delone numbers has Hausdorff dimension 1 and Lebesgue measure 0, and is dense but meager in [1, ∞). Which conjecture is closer to the reality?
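Propositions 1 and 2 can be checked in a small case. The following sketch (an illustration for the golden mean, not from the paper) enumerates the β-integers below β^8 and verifies that the gaps between adjacent ones take only the two values T_β^0(1) = 1 and T_β(1) = β − 1:

```python
import math
from itertools import product

beta = (1 + math.sqrt(5)) / 2

def admissible(w):
    # digit strings for the golden mean: digits 0/1, no factor "11"
    return all(not (w[i] == 1 and w[i + 1] == 1) for i in range(len(w) - 1))

# all beta-integers in [0, beta^8): values of admissible words of length 8
ints = sorted({sum(a * beta**i for i, a in enumerate(w))
               for w in product((0, 1), repeat=8) if admissible(w)})
gaps = {round(y - x, 9) for x, y in zip(ints, ints[1:])}
```

The two gap lengths are exactly the two tile lengths of the direct tiling discussed in §4.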
3. Definition of Pisot dual tiling

For a point ξ = (ξ_i)_{i∈Z} = ... ξ_{−2} ξ_{−1} ξ_0 • ξ_1 ξ_2 ... in the two-sided subshift (X_β, σ), let us call the left infinite word ... ξ_{−2} ξ_{−1} ξ_0 • the integer part and • ξ_1 ξ_2 ... the fractional part of ξ. To make the situation clear, here we put the decimal point • on the right/left end to express an integer/fractional part. The symbol • should be neglected if we treat them as words in A*. If ξ_{−i} = 0 for sufficiently large i, the integer part is expressed by a finite word, and if ξ_i = 0 for large i then the fractional part is written by a finite word. For an admissible finite or right infinite word ω = ω_1 ω_2 ..., denote by S_ω the set of finite integer parts a_{−m} a_{−m+1} ... a_0 • such that the concatenation of a_{−m} a_{−m+1} ... a_0 • and ω is admissible, i.e.,

  S_ω = { a_{−m} a_{−m+1} ... a_0 • | a_{−m} a_{−m+1} ... a_0 ⊕ ω_1 ω_2 ... is admissible }.

³ Probably we may call it also a Bertrand number. See the description of Prop. 4.5 in [19].
This set S_ω is the predecessor set of ω. It is shown that the number of distinct predecessor sets is finite if and only if the subshift is sofic. Since the realization map π_β : A^N → [0, 1) is continuous, the set of fractional parts is realized as a compact subset of [0, 1]. However, the set of integer parts is not bounded in R. Thurston embedded this set of integer parts into a compact set in Euclidean space in the case of a Pisot number system ([56]). We explain this idea in the formulation of [2] and [4]. Let β be a Pisot number of degree d, let β^(i) (i = 1, ..., r_1) be its real conjugates and β^(i), β̄^(i) (i = r_1 + 1, ..., r_1 + r_2) its imaginary conjugates, where β^(1) = β. Thus d = r_1 + 2r_2. Define a map Φ : Q(β) → R^{d−1} by

  Φ(x) = (x^(2), ..., x^{(r_1)}, Re x^{(r_1+1)}, Im x^{(r_1+1)}, ..., Re x^{(r_1+r_2)}, Im x^{(r_1+r_2)}).

It is shown that Φ(Z[β] ∩ R+) is dense in R^{d−1} ([2]). Since β is a Pisot number, Φ(S_ω) is bounded in the Euclidean topology. Take the closure of Φ(S_ω + ω) and call it T_ω. One can also write

  T_ω = Φ(ω) + closure of { Σ_{i=0}^∞ a_{−i} Φ(β^i) | a_{−m} a_{−m+1} ... a_0 ⊕ ω_1 ω_2 ... is admissible },

where a_{−i} = 0 for i > m.
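The boundedness of Φ(S_ω) can be seen numerically. Here is a sketch (not from the paper) for the Tribonacci Pisot unit θ, the root of x³ = x² + x + 1 used again in §4, where d = 3 and Φ amounts to the embedding x ↦ x′ into the complex plane; the admissibility condition "no factor 111" for this θ is the one stated in §4:

```python
import cmath
from itertools import product

# real Tribonacci root of x^3 - x^2 - x - 1, by bisection
lo, hi = 1.0, 2.0
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if mid**3 - mid**2 - mid - 1 < 0 else (lo, mid)
theta = (lo + hi) / 2

# a complex conjugate root, from the remaining quadratic factor
p, q = theta - 1, theta**2 - theta - 1
theta_c = (-p + cmath.sqrt(p * p - 4 * q)) / 2

def admissible(w):
    # no factor 111 (the Tribonacci condition of section 4)
    return all(w[i:i + 3] != (1, 1, 1) for i in range(len(w) - 2))

# finitely many points Phi(integer part); bounded because |theta'| < 1
pts = [sum(a * theta_c**i for i, a in enumerate(w))
       for w in product((0, 1), repeat=12) if admissible(w)]
radius = max(abs(z) for z in pts)
```

All 1705 points stay inside the disk of radius Σ_i |θ′|^i = 1/(1 − |θ′|); their closure over longer and longer words is the central tile.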
A Pisot unit is a Pisot number which is also a unit in the ring of algebraic integers of Q(β). If the beta expansions ω are taken over all elements of Z[β] ∩ [0, 1) (i.e. the fractional parts of Z[β] ∩ R+), we trivially have R^{d−1} = ⋃_ω Φ(S_ω + ω). If β is a Pisot unit, the compact sets Φ(S_ω + ω) form a locally finite covering of R^{d−1}, and we get R^{d−1} = ⋃_ω T_ω ([4]). This is a covering of R^{d−1} by the T_ω. If it is a covering of degree one, the predecessor sets of a sofic shift are realized geometrically and give us a tiling of R^{d−1} by a finite number of tiles up to translation. Moreover congruent tiles must be translationally identical, and this tiling has self-similarity. Indeed we have

  β^{−1} S_ω = ⋃_{a ⊕ ω : admissible} (a + S_{a⊕ω}) / β.
The union on the right is taken over all a ∈ A such that a ⊕ ω is admissible. The map z ↦ β^m z from Q(β) to itself is realized as an affine map G_m on R^{d−1} satisfying the following commutative diagram:

  Q(β)  ---- ×β^m ---->  Q(β)
   | Φ                    | Φ
  R^{d−1} ---- G_m ---->  R^{d−1}

The explicit form of G_m is G_m(x_2, x_3, ..., x_d) = (x_2, x_3, ..., x_d) A_m, where A_m is a (d − 1) × (d − 1) matrix:
A_m is block diagonal: its first r_1 − 1 diagonal entries are the scalars (β^(2))^m, (β^(3))^m, ..., (β^{(r_1)})^m, followed by the 2 × 2 blocks B_1, ..., B_{r_2}, where

  B_j = (  Re((β^{(r_1+j)})^m)   Im((β^{(r_1+j)})^m) )
        ( −Im((β^{(r_1+j)})^m)   Re((β^{(r_1+j)})^m) )
for j = 1, ..., r_2. G_m is contractive if m > 0 and expansive if m < 0, with respect to a suitable norm on R^{d−1}. Applying G^{−1}, the tile T_ω emerges and is subdivided as

  G^{−1}(T_ω) = ⋃_{a⊕ω} T_{a⊕ω}.    (2)
Therefore the sofic shift is geometrically realized as a self-affine tiling. In [56], under different notation, Thurston wrote: 'It does not quite follow that the K_x determine a tiling of S, for they could in principle have substantial overlap. (...) However, in many cases of this construction, the shinglings are tilings, and the tiles are disks.'

Thurston expected that they should give a tiling in many cases, i.e. that the degree is one, and that T_ω may be homeomorphic to a (d − 1)-dimensional disk. The former statement is conjectured to hold for all Pisot units, but the latter has many counterexamples.
4. Examples in low degree cases

Let us explain the Pisot dual tiling through concrete examples in degrees two and three. It is already non-trivial in the quadratic case, and it naturally generates a special type of sturmian sequences and substitutions. Put η = (1 + √5)/2 and let θ be the positive root of x³ − x² − x − 1. Then both of them are Pisot units and we see d_η(1 − 0) = (10)^∞ and d_θ(1 − 0) = (110)^∞. Thus they are simple Parry numbers. Write η′ = (1 − √5)/2 and let θ′ ∈ C \ R be one of the complex conjugates of θ. For understanding, let us begin with the tiling of R+ by the direct embedding of fractional parts. Start with the fundamental tile

  A = { Σ_{i=1}^∞ a_i η^{−i} | a_i ∈ {0, 1}, a_i a_{i+1} = 0 }.

This is symbolically written as A = {•a_1 a_2 ...}. This is nothing but a realization of the fractional parts of X_η by the convergent power series, and by the definition of beta expansion we have A = [0, 1]. Note that .0101... is not admissible, but the corresponding
beta shift X_η does contain such right infinite sequences, and hence the right end-point 1 must be included. Multiplication by η acts as a shift on the symbolic space and yields a set equation

  ηA = A ∪ (1 + B),  ηB = A,

obtained by classifying the left-most symbol, 0 or 1. Here B = {x ∈ A | a_1 = 0}. The reason that B carries an additional restriction is that the left-end symbol 1 must be followed by 0, since 11 is forbidden. This gives B = [0, 1/η], and A = [0, 1] and 1 + B = [1, 1 + 1/η] are adjacent. One can omit the translation and write B instead of 1 + B; in fact this makes the situation clearer. Under multiplication by η, the tile A grows to the right into AB, a concatenation of two tiles of different lengths, and the tile B grows into A. This is nothing but the Fibonacci substitution A → AB, B → A, and the half line R+ is tiled aperiodically like ABAABABAABAAB..., which forms the fixed point of the Fibonacci substitution. In general, if β is a Parry number, then the corresponding subshift is sofic and one has an aperiodic tiling of R+ by a finite number of tiles through beta expansion. This construction is well known; we coin the term direct tiling for it.

Now we introduce a dual tiling by embedding integer parts. The fundamental dual tile is

  T = T_λ = { Σ_{i=0}^∞ x_{−i} (η′)^i | x_{−i} ∈ {0, 1}, x_{−i} x_{−i−1} = 0 }.
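The series defining T converges because it is evaluated at the conjugate η′ = (1 − √5)/2, whose modulus is less than one. A numerical sketch (not from the paper), approximating T by finite sums of depth 16, already indicates that the closure fills the interval [−1, η]:

```python
import math
from itertools import product

eta = (1 + math.sqrt(5)) / 2
eta_c = (1 - math.sqrt(5)) / 2          # conjugate root, |eta_c| < 1

def admissible(w):
    # digits 0/1 with no factor "11"
    return all(not (w[i] and w[i + 1]) for i in range(len(w) - 1))

pts = sorted({sum(x * eta_c**i for i, x in enumerate(w))
              for w in product((0, 1), repeat=16) if admissible(w)})
span = (pts[0], pts[-1])                # approaches [-1, eta]
max_gap = max(b - a for a, b in zip(pts, pts[1:]))
```

The truncation at depth 16 misses at most |η′|^16/(1 − |η′|) ≈ 0.0012 of each infinite series, so the points form a fine net of the interval.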
This extends the beta expansion in the opposite direction; symbolically we may write {... x_{−3} x_{−2} x_{−1} x_0 •}. However, this is not convergent in the usual base η, so we use the conjugate η′ instead to have convergence. The geometric feature is sometimes troublesome, but in this case it is easy to see that T = [−1, η] is an interval. Let us make a right shift by dividing by η′, obtaining (η′)^{−1} T = T ∪ T_{.1}. The set T_{.1} is symbolically {... x_{−3} x_{−2} x_{−1} . 1}, i.e., the set of left infinite expansions with the fixed fractional part .1. As 11 is forbidden, x_{−1} = 0. Therefore (η′)^{−1} T = T ∪ (η′ T + (η′)^{−1}) holds. Put U = η′ T = [−1, 1/η]. As (η′)^{−1} = −η and η′ T + (η′)^{−1} = [−1 − η, 1/η − η] = [−η², −1], the interval T grows into U T under the right shift. The new tile U is concatenated to the left of T. The situation is described by a monoid anti-homomorphism σ on the two letters {T, U} (i.e. it satisfies σ(xy) = σ(y)σ(x)) with

  σ(T) = UT,  σ(U) = T.
Iterating σ, the tile grows like

  T → UT → UTT → UTUTT → UTUTTUTT → UTUTTUTUTTUTT → ...
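As a minimal sketch (not from the paper), the anti-homomorphism can be iterated symbolically: reverse the word, then apply the letter images σ(T) = UT, σ(U) = T.

```python
IMG = {"T": "UT", "U": "T"}

def sigma(w):
    # anti-homomorphism: sigma(xy) = sigma(y) sigma(x)
    return "".join(IMG[c] for c in reversed(w))

words = ["T"]
for _ in range(6):
    words.append(sigma(words[-1]))
```

The word lengths grow like Fibonacci numbers, and each iterate extends the previous one alternately on the left and on the right.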
The growing direction alternates, the word extending to the left and to the right in turn. This bi-infinite sturmian sequence satisfies several interesting properties. One of the most illuminating might be the cut sequence. Prepare an xy-lattice together with all horizontal and vertical lines passing through integer points. Draw the line y = x/η, and put the symbol T on the intersection with each vertical line and the symbol U on the intersection with each horizontal line. Let us agree that at the origin the line passes just slightly above it, and put UT there. Then we get the cut sequence (see Figure 2), which is identical to the bi-infinite sturmian sequence above. This is one of the general properties of sturmian sequences, which are occasionally named after this property (c.f. [23]; see [55] for higher dimensional cases).
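The cut sequence itself is easy to generate. A sketch (not from the paper) for the positive half of the line y = x/η: crossings with vertical lines happen at x = k, crossings with horizontal lines y = j at x = jη, and sorting them along the line yields the symbolic sequence.

```python
import math

eta = (1 + math.sqrt(5)) / 2

# vertical-line crossings give T, horizontal-line crossings give U
events = sorted([(k, "T") for k in range(1, 30)] +
                [(j * eta, "U") for j in range(1, 19)])
cut = "".join(s for _, s in events)
```

Two U's are never adjacent and at most two T's occur in a row, in accordance with the admissibility of the golden mean shift.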
Figure 2. Cut sequence
The essential reason for this phenomenon is that the sequence is a coding of the 1-dimensional irrational rotation x → η′ x. We proceed in the same way in the case of θ. Put

  T_λ = { Σ_{i=0}^∞ x_{−i} (θ′)^i | x_{−i} ∈ {0, 1}, x_{−i−1} = x_{−i−2} = 1 ⇒ x_{−i−3} = 0 },

which is a compact set in the complex plane. Similarly, the fundamental tile grows like
  (θ′)^{−1} T_λ = T_λ ∪ T_{.1}
  (θ′)^{−2} T_λ = T_λ ∪ T_{.1} ∪ T_{.01} ∪ T_{.11}
  (θ′)^{−3} T_λ = T_λ ∪ T_{.1} ∪ T_{.01} ∪ T_{.11} ∪ T_{.001} ∪ T_{.101} ∪ T_{.011}.

See Figure 3.
Figure 3. Rauzy Fractal
There are three tiles up to translation. As in the Fibonacci dual case, the origin is an inner point of T_λ, and it is shown that the complex plane is aperiodically tiled by these tiles. This tiling may be regarded as a coding of the irrational rotation z → θ′ z. Unlike the Fibonacci shift, this coding is not realized by words, and the geometric nature is not simple ([9]). Another example, for the minimal Pisot number, the root of x³ − x − 1, is shown in Figure 4. In this case d_β(1 − 0) = (10000)^∞.
5. Finiteness condition implies non overlapping

The properties of number systems are intimately related to the tilings introduced in §3 and §4, especially to whether they give a tiling, i.e. a covering of degree one, or not. Let Fin(β) be the set of finite beta expansions. Fin(β) clearly consists of non-negative elements of Z[1/β]. (Note that Z[β] ⊂ Z[1/β] as β is an algebraic integer.) Frougny-Solomyak [27] asked whether
Figure 4. Minimal Pisot case
Fin(β) = Z[1/β] ∩ R+ holds or not for a given number system. If it does, we say that β satisfies the finiteness condition (F). A weaker condition, Z ∩ R+ ⊂ Fin(β), already implies that β is a Pisot number ([3]). Therefore the finiteness (F) can hold only when β is a Pisot number. The converse is not true. In particular, if the constant term of the minimal polynomial of β is positive, then β has a positive other conjugate and hence (F) does not hold. Further, there exists an algorithm to determine whether (F) holds or not ([1]). The relationship is depicted in Figure 1. Several sufficient conditions for (F) are also known. For d_β(1) = c_1 c_2 ..., if c_i ≥ c_{i+1} holds for each i, then β is a Pisot number and any number expressed as a polynomial in β with non-negative integer coefficients belongs to Fin(β). Additionally, if β is a simple Parry number, then β satisfies (F). Let us call this type of β of Frougny-Solomyak type ([27]). Let x^d − a_{d−1} x^{d−1} − a_{d−2} x^{d−2} − ... − a_0 be the minimal polynomial of β; if a_i ≥ 0 and a_{d−1} > a_0 + a_1 + ... + a_{d−2}, then β satisfies (F). This is called the Hollander type ([33]). The minimal polynomials of cubic Pisot units with the finiteness (F) are classified as follows ([3]):

1. x³ − ax² − (a + 1)x − 1, a ≥ 0
2. x³ − ax² − bx − 1, a ≥ b ≥ 1 (Frougny-Solomyak type)
3. x³ − ax² − 1, a ≥ 1 (Hollander type)
4. x³ − ax² + x − 1, a ≥ 2
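The condition (F) can be tested experimentally with exact arithmetic. A sketch (not from the paper) for the golden mean, which satisfies (F): elements of Q(β) are represented as pairs (a, b) meaning a + bβ with rational a, b, and the greedy expansion of a positive integer is computed until it terminates.

```python
from fractions import Fraction as F
import math

PHI = (1 + math.sqrt(5)) / 2

def val(x):
    return float(x[0]) + float(x[1]) * PHI     # x = (a, b) means a + b*beta

def finite_expansion(x, max_digits=40):
    """Greedy beta expansion of x >= 0; (n, digits) if finite, else None."""
    n = 0
    while val(x) >= 1:                 # divide by beta until x < 1,
        a, b = x                       # using 1/beta = beta - 1
        x = (b - a, a)
        n += 1
    digits = []
    while x != (F(0), F(0)) and len(digits) < max_digits:
        a, b = x
        x = (b, a + b)                 # multiply by beta (beta^2 = beta + 1)
        d = math.floor(val(x))
        digits.append(d)
        x = (x[0] - d, x[1])
    return (n, digits) if x == (F(0), F(0)) else None
```

For instance, finite_expansion((F(2), F(0))) returns (2, [1, 0, 0, 1]), i.e. 2 = β + β^{−2} with expansion 10.01; every positive integer terminates, in line with Z ∩ R+ ⊂ Fin(β).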
If β is a Pisot unit with the finiteness (F), the origin of R^{d−1} is an inner point of T_λ, and no other T_ω (ω ≠ λ) contains the origin. An inner point of a tile is called
exclusive if it does not belong to any other tile. As above, if the origin is an exclusive inner point of T_λ, the tiling is generated by successive use of (2):

  G^{−n}(T_ω) = ⋃_{a_{−n+1} a_{−n+2} ... a_0 ⊕ ω} T_{a_{−n+1} a_{−n+2} ... a_0 ⊕ ω}.
A general Pisot number does not always satisfy this finiteness (F). In such cases, the origin belongs to several tiles. Even then, if the following weaker finiteness condition is valid, one can construct a similar tiling:

(W) For any ε > 0 and z ∈ Z[1/β] ∩ R+ there exist x, y ∈ Fin(β) such that z = x − y and |y| < ε.

More precisely, let us denote by P the set of elements of Z[β] having purely periodic β-expansions. Then the origin is shared exactly by the tiles T_ω with ω ∈ P, and other tiles cannot contain 0. Permitting an abuse of terminology, the origin 0 is an exclusive inner point of the union ⋃_{ω∈P} T_ω. Using this, the condition (W) is equivalent to the fact that the family {T_ω : ω ∈ [0, 1) ∩ Z[β]} forms a covering of R^{d−1} of degree one, i.e. a tiling. In particular, under (W) the boundary of T_ω has (d − 1)-dimensional Lebesgue measure zero ([4]):

Theorem 3 ([2],[4]). Let β be a Pisot unit with the property (W). Then

  R^{d−1} = ⋃_{ω ∈ Z[β] ∩ [0,1)} T_ω

is a tiling.

This weak finiteness (W) is believed to hold for all Pisot numbers (Sidorov [51], [52]), which is an important unsolved problem. In [8], (W) is shown for several families of Pisot numbers, including cubic Pisot units. For an example of a Pisot unit β with the property (W) but not (F), let β be the cubic Pisot unit defined by x³ − 3x² + 2x − 1. It gives the tiling in Figure 5. In this case d_β(1 − 0) = 20 1^∞ and we write ω = 1^∞ = 111....

The condition (F) has been applied in many different contexts (c.f. [54], [13], [34], [17], [28]). Characterization of the Pisot numbers with the property (F) among algebraic integers is an important and difficult problem. One can transfer this problem to a problem on the shift radix system (SRS for short), a concrete and simple dynamical system on Z^{d−1}. In fact, SRS unifies two completely different number systems: Pisot number systems and canonical number systems. The study of SRS is an ongoing project for us ([5], [6], [7], [11]; I recommend [12] for a first access).
6. Natural extension and purely periodic orbits

For a given measure-theoretical dynamical system (X, T_1, μ_1, B_1), if there exists an invertible dynamical system (Y, T_2, μ_2, B_2) such that (X, T_1, μ_1, B_1) is a factor of (Y, T_2, μ_2, B_2), then (Y, T_2, μ_2, B_2) is called a natural extension of (X, T_1, μ_1, B_1). There is a general way to construct a natural extension due to Rohlin [46]. However, if
Figure 5. x3 − 3x2 + 2x − 1
you wish to answer number theoretical problems, a small and good extension is desired, one which keeps the algebraic properties of the system. The Pisot dual tiling gives a way to construct such a natural extension of ([0, 1), T_β) equipped with the Parry measure. Assume that β is a Pisot unit with the property (W). As β is a Parry number, the set {T_β^n(1) | n = 0, 1, 2, ...} is finite. Number its non-zero elements 0 < t_1 < t_2 < ... < t_ℓ = 1 and set t_0 = 0. Take u ∈ [t_i, t_{i+1}). Then by Theorem 1 and the construction of the Pisot dual tiling, the set T_u − Φ(u) (the closure of Φ(S_u)) does not depend on the choice of u. Introduce

  X̂_β = ⋃_{i=0}^{ℓ−1} (−T_{t_i} + Φ(t_i)) × [t_i, t_{i+1})

and the map acting on X̂_β:
  T̂_β : X̂_β ∋ (x, y) ↦ (G_1(x) − Φ([βy]), βy − [βy]) ∈ X̂_β.

Consider the restriction of the Lebesgue measure μ_d on R^d to X̂_β and the collection B of Lebesgue measurable sets. Then T̂_β preserves the measure, since β is a unit, and (X̂_β, T̂_β, μ_d, B) gives an invertible dynamical system. This extended dynamical system gives a 'bi-infinite' extension of ([0, 1), T_β); it is a factor of the beta shift X_β, and the following diagram commutes:
  X_β  ---- σ ---->  X_β
   | φ                | φ
  X̂_β ---- T̂_β ---> X̂_β                                  (3)
   | res              | res
  [0, 1) -- T_β --->  [0, 1)
where

  φ(... a_{−1} a_0 • a_1 a_2 ...) = ( lim_{m→∞} −Φ(a_{−m} ... a_0 •), π_β(• a_1 a_2 ...) )
and res(x, y) = y. This extension is realized in the d-dimensional Euclidean space, and it is a good one, since Φ is an additive homomorphism defined through the conjugate maps, which are ring homomorphisms. By definition X̂_β consists of several cylinder sets (−T_{t_i} + Φ(t_i)) × [t_i, t_{i+1}), and this natural partition gives a Markov partition. The Parry measure of ([0, 1), T_β) is retrieved as a restriction of the Lebesgue measure μ_d. As an application, the purely periodic orbits of T_β are completely described. Using our formulation, we have

Theorem 4 ([32], [31], [38], [37]). An element x ∈ Q(β) ∩ [0, 1) has a purely periodic β-expansion if and only if (Φ(x), x) ∈ X̂_β.

T̂_β is almost one-to-one. The main part of the proof of this theorem is to discuss the intersection of two cylinder sets, the boundary problem. In fact, this is always a problem for a Markov partition. As we wish to have an exact statement, such a set of measure zero is not negligible. In this case, we can show that there are no elements of Q(β) ∩ [0, 1) on such intersections. To show this, the main idea is simple. As X̂_β is compact, there are only finitely many points in X̂_β which correspond to elements of Q(β) ∩ [0, 1) having a fixed denominator. We can easily show that T̂_β is surjective. But surjectivity and injectivity are equivalent for a finite set. Therefore T̂_β is bijective on the set of points in X̂_β which correspond to Q(β) ∩ [0, 1). On the other hand, bijectivity breaks down only on the cylinder intersections.

To know more on periodic orbits, we need to give an explicit shape of X̂_β. If β is a quadratic Pisot unit, X̂_β is a union of two rectangles and the shape is quite easy. For
cubic or higher degree Pisot units, the tile has a fractal boundary. We shall discuss a way to characterize the boundary in the last section. For non-unit Pisot numbers, we also have to take the p-adic embeddings into account (c.f. [16]). As Markov partitions based on number systems are simple and concrete, when the topological structure is not complicated one can deduce geometric information from algebraic considerations on number systems, and conversely, from the fractal nature of the tiles one can deduce number theoretical consequences. For example, in [1] it is shown that

Theorem 5. If a Pisot unit β satisfies (F), the beta expansion of every sufficiently small positive rational number is purely periodic.

This is just a consequence of the fact that the origin is an exclusive inner point of T_λ under the condition (F). For a concrete case, we can show a strange phenomenon ([10]):

Theorem 6. For the minimal Pisot number θ, the supremum of c such that all elements of [0, c] ∩ Q are purely periodic is precisely computed as 0.66666666608644067488.... Moreover there exists an increasing sequence a_0 < a_1 < a_2 < ..., lying in (0, 1), such that all rationals in [a_{4i}, a_{4i+1}] are not purely periodic and all rationals in [a_{4i+2}, a_{4i+3}] are purely periodic.

The latter statement reflects the fractal structure of the boundary of T_λ, and perhaps it is not so easy to obtain this conclusion in a purely algebraic manner. This type of tight connection between fractal geometry and number theory is one of the aims of our research.
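Such pure periodicity statements invite experiments. For the golden mean β, a Pisot unit with (F), it is classical (Schmidt [50], see also Hama-Imahashi [31]) that every rational in [0, 1) is purely periodic, and this can be verified for small denominators with exact arithmetic in Q(β). A sketch, not from the paper:

```python
from fractions import Fraction as F
import math

PHI = (1 + math.sqrt(5)) / 2

def T(x):
    # beta transform on Q(beta); x = (a, b) means a + b*beta
    y = (x[1], x[0] + x[1])                     # multiply by beta
    d = math.floor(float(y[0]) + float(y[1]) * PHI)
    return (y[0] - d, y[1])

def purely_periodic(x0, bound=2000):
    x = x0
    for _ in range(bound):
        x = T(x)
        if x == x0:                             # orbit returns to its start
            return True
    return False

checks = [purely_periodic((F(p, q), F(0)))
          for q in range(2, 8) for p in range(1, q)]
```

For example 1/2 has the purely periodic expansion (010)^∞, and its orbit returns after three steps.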
7. Periodic Tiling and Toral automorphism

Arnoux-Ito [14] realized Pisot type substitutions in a geometric way for higher dimensional irrational rotations; the construction also applies to higher dimensional continued fractions. The idea goes back to Rauzy [44], and the fractal sets arising from this construction are widely called Rauzy fractals (c.f. [14], [36], [24], [43], [26]). The addition of 1 in the number system is realized as a domain exchange acting on the central tile T_λ of the aperiodic tiling defined in the previous sections. Further, according to their theory, under a certain condition we can also tile the space R^{d−1} periodically (!) by the central tile T_λ and its translates. The multiplication by β in the number system then gives rise to an explicit construction of a Markov partition for the automorphism of (R/Z)^d associated to the companion matrix of the Pisot unit β. For this construction, the existence of the periodic tiling is essential. It is therefore worthwhile to give a direct construction of the periodic tiling from the point of view of β-expansion. This section is devoted to this task.

Let β be a Pisot unit of degree d with the property (W). A crucial assumption in this section is that the cardinality of {T_β^n(1) | n = 0, 1, ...} \ {0} is equal to d. (By considering the degree of the minimal polynomial of β, the cardinality is never less than d.) Set d_β(1 − 0) = c_1 c_2 .... It is easy to see that {1, T_β(1), T_β²(1), ..., T_β^{d−1}(1)} forms a basis of Z[β] as a Z-module. Put r_n = 1 − T_β^n(1) = 1 − π(c_{n+1} c_{n+2} ...) and

  W(β) = { Σ_{i=0}^{d−1} f_i T_β^i(1) | f_i ∈ Z, f_0 + f_1 + ... + f_{d−1} ≥ 0 }.
Similarly to Z[β] ∩ R+, one may identify W(β) with the lattice points in Z^d lying above a fixed hyperplane, and Φ(W(β)) is dense in R^{d−1}.

Lemma 1. P := { Σ_{i=0}^k b_i β^i | b_i ∈ Z_+ } ⊂ W(β).

Proof. Consider the regular representation of the multiplication by β with respect to the basis {1, T_β(1), ..., T_β^{d−1}(1)}. As T_β^{j+1}(1) = β T_β^j(1) − c_{j+1}, one has

  β (1, T_β(1), T_β²(1), ..., T_β^{d−1}(1))^t = M (1, T_β(1), T_β²(1), ..., T_β^{d−1}(1))^t,

where M is the d × d matrix whose first column is (c_1, c_2, ..., c_d)^t, whose superdiagonal entries equal 1, and whose remaining entries are zeros, except that a 1 appears at most once in the last row. The matrix M being non-negative, the result follows.

W(β) ∩ (Z[β] ∩ R+) corresponds to the lattice points in the cone cut out by two hyperplanes. Lemma 1 supplies a large subset of this intersection. Figure 6 shows the regions in the case β = (1 + √5)/2, where (x, y) corresponds to x + βy.
Figure 6. W (β) and Z[β]+
Proposition 3. The set of β-integers forms a complete system of representatives of W(β) (mod r_1 Z + r_2 Z + ... + r_{d−1} Z).

Proof. As shown in Proposition 2, Z_β^+ is a uniformly discrete subset of R+ such that the distances between adjacent points lie in {1, T_β(1), T_β²(1), ..., T_β^{d−1}(1)}. Therefore Z_β^+ ⊂ W(β).⁴ We write

  Z_β^+ = {z_0, z_1, z_2, ... | z_i < z_{i+1}}

⁴ This proves Lemma 1 again.
and consider the order-preserving bijection ι : Z_β^+ → {0, 1, 2, ...} defined by z_i ↦ i. Note that, taking modulo r_1 Z + r_2 Z + ... + r_{d−1} Z, all the distances between adjacent points are identified with 1. Therefore the image of the map ι is uniquely determined by

  ι(z) ≡ z (mod r_1 Z + r_2 Z + ... + r_{d−1} Z).

On the other hand, for any element w = Σ_{i=0}^{d−1} f_i T_β^i(1) ∈ W(β), there exists a unique non-negative integer k such that w ≡ k (mod r_1 Z + r_2 Z + ... + r_{d−1} Z), given by k = Σ_{i=0}^{d−1} f_i.

The next corollary seems interesting on its own.

Corollary 1. For any x ∈ Z[β], there exists a unique y ∈ Z_β such that x ≡ y (mod r_1 Z + r_2 Z + ... + r_{d−1} Z).

Proof. By definition, Z[β] = −W(β) ∪ W(β). Therefore we can naturally extend the map ι in the proof of Proposition 3 to ι : Z[β] → Z, and the assertion follows.

By Proposition 3, through the map Φ we have

  Φ(W(β)) = ⋃_{(m_1,...,m_{d−1}) ∈ Z^{d−1}} ( Φ(Z_β^+) + Σ_{i=1}^{d−1} m_i Φ(r_i) ).
Taking the closure in R^{d−1} we get a periodic locally finite covering: R^{d−1} = T_λ + Φ(r_1)Z + ... + Φ(r_{d−1})Z.

Theorem 7. If β is a Pisot unit with (W) and the cardinality of {T_β^n(1) | n = 0, 1, ...} \ {0} coincides with the degree d of β, then

  R^{d−1} = T_λ + Φ(r_1)Z + ... + Φ(r_{d−1})Z

forms a periodic tiling.

Proof. Take an element w ∈ W(β) \ Z_β^+. We wish to prove that Φ(w) is not an inner point of T_λ. Assume on the contrary that Φ(w) is an inner point.

First we prove the case w > 0. Choose a sufficiently large k such that Φ(β^k + w) ∈ Inn(T_λ) and ⟨β^k + w⟩_β = ⟨w⟩_β. This is always possible: indeed, if the beta expansion of w > 0 is a_{−m} ... a_0 • a_1 a_2 ... with a_1 a_2 ... ≠ 0^∞, then we may choose k > m + d + 1 such that β^k + w = 1 0^{k−m−1} ⊕ a_{−m} ... a_0 • a_1 a_2 ... is admissible. This means that β^k + w ∉ Z_β^+ and Φ(β^k + w) ∈ T_{a_1 a_2 ...}. However this is impossible, since Φ(β^k + w) is an inner point of T_λ and we already know that {T_ω : ω ∈ Z[β] ∩ [0, 1)} forms a tiling by Theorem 3. This proves the case w > 0.

Second, assume that w < 0. Recall that β^k ∈ W(β) for k = 0, 1, ... by Lemma 1. By Proposition 3 there exist 0 ≠ (m_1, ..., m_{d−1}) ∈ Z^{d−1} and y ∈ Z_β^+ such that w = y + Σ m_i r_i, with beta expansion y = a_{−m} ... a_0 •. Choose k as above; then Φ(β^k + w) is still an inner point of T_λ and β^k + y = 1 0^{k−m−1} ⊕ a_{−m} ... a_0 • is admissible. Then β^k + w = β^k + y + Σ m_i r_i ∉ Z_β^+ by the uniqueness of the expression in Proposition 3. Therefore, without loss of generality, the problem is reduced to the first case w > 0.
Coming back to the example θ, the root of x³ − x² − x − 1: we have r_1 = 1 − T_θ(1) = θ^{−3} and r_2 = 1 − T_θ²(1) = θ^{−2} + θ^{−3}, and we obtain a periodic tiling

  C = T_λ + (θ′)^{−3} Z + ((θ′)^{−2} + (θ′)^{−3}) Z

depicted in Figure 7. In the case of x³ − 3x² + 2x − 1, we have r_1 = 1 − T_β(1) = 2β^{−1} − β^{−2} and r_2 = 1 − T_β²(1) = β^{−1} − β^{−2}. Figure 8 is the corresponding figure.
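The identities for r_1 and r_2 are quick to verify numerically. A sketch (not from the paper) for the root θ of x³ − x² − x − 1:

```python
# real root of x^3 - x^2 - x - 1 (the Tribonacci number), by bisection
lo, hi = 1.0, 2.0
for _ in range(80):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if mid**3 - mid**2 - mid - 1 < 0 else (lo, mid)
theta = (lo + hi) / 2

T1 = theta - 1               # T_theta(1): first digit of (110)^inf is 1
T2 = theta * T1 - 1          # second digit is also 1
r1 = 1 - T1                  # should equal theta^(-3)
r2 = 1 - T2                  # should equal theta^(-2) + theta^(-3)
```

Both identities follow from θ³ = θ² + θ + 1, e.g. (2 − θ)(θ² + θ + 1) = 1 gives r_1 = θ^{−3}.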
Figure 7. Periodic Rauzy Tiling
Figure 8. Periodic sofic Tiling
8. Boundary Automaton

The boundary of the tiles is captured by a finite state automaton (more precisely, a Büchi automaton, which accepts infinite words) in several ways. We wish to describe one method, whose essential idea is due to Kátai [35]. Under the condition (W), {T_ω : ω ∈ Z[β] ∩ [0, 1)} forms a covering of R^{d−1} of degree one, and a boundary point of a tile is
a common point of two tiles. Define a labeled directed graph on the vertex set Z[β] by drawing an edge

  z_0 --a|b--> z_1

whenever the two vertices z_0, z_1 satisfy z_0 = β z_1 + a − b with a, b ∈ A. Labels belong to A × A. An essential subgraph of a directed graph is a subgraph in which each vertex has at least one incoming and one outgoing edge. Take a sufficiently large interval containing the origin and a large constant B, and consider the subgraph induced by the vertices z which fall in the interval and satisfy |Φ(z)| ≤ B. The essential subgraph of this induced subgraph does not depend on the choice of the interval and B, provided they are large enough. Such an interval and B are explicitly given by

  |z| ≤ [β] / (β − 1)  and  B ≤ max_{i=2,...,d} [β] / (1 − |β^(i)|).

On the other hand, the admissible infinite words of the beta shift are described by an automaton. By the standard technique of taking the Cartesian product of two automata, one obtains a finite automaton which recognizes the common infinite words. The infinite walks obtained in this manner give us the intersections T_λ ∩ T_ω (ω ≠ λ) in terms of infinite words, and therefore the boundary of T_λ. By this automaton the boundary of T_ω is given as an attractor of a graph directed set. This automaton, called the neighbor automaton, plays an essential role in the study of the topological structure of tiles. If there is a conjugate of β with modulus close to 1, then the size of the neighbor automaton becomes huge. This is an obstacle to investigating properties of a family of tiles. If we restrict ourselves to the description of the boundary, there is a better way to construct a smaller automaton, the contact automaton (c.f. [30], [47], [48]).
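The vertex enumeration behind this construction is small in the quadratic case. A sketch (not from the paper) for the golden mean, with d = 2 and digit set A = {0, 1}, where Φ(z) is just the conjugate z′; candidates are seeded using the bounds above and then trimmed to the essential subgraph:

```python
import math

beta = (1 + math.sqrt(5)) / 2
conj = (1 - math.sqrt(5)) / 2
A = (0, 1)

# candidates z = m + n*beta with |z| <= [beta]/(beta-1), |z'| <= [beta]/(1-|beta'|)
V = {(m, n) for m in range(-6, 7) for n in range(-6, 7)
     if abs(m + n * beta) <= 1 / (beta - 1) + 1e-9
     and abs(m + n * conj) <= 1 / (1 - abs(conj)) + 1e-9}

def edges(V):
    # z0 --a|b--> z1 whenever z0 = beta*z1 + a - b;
    # for z1 = p + q*beta this gives z0 = (q + a - b) + (p + q)*beta
    return {((q + a - b, p + q), (p, q), a, b)
            for (p, q) in V for a in A for b in A
            if (q + a - b, p + q) in V}

# essential subgraph: every vertex keeps an incoming and an outgoing edge
while True:
    E = edges(V)
    keep = {z for z in V
            if any(e[0] == z for e in E) and any(e[1] == z for e in E)}
    if keep == V:
        break
    V = keep
```

The surviving vertices form a small set symmetric about the origin; the walks in this graph whose labels are also admissible describe the points shared by T_λ and its translates.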
References

[1] S. Akiyama, Pisot numbers and greedy algorithm, Number Theory (K. Győry, A. Pethő, and V. Sós, eds.), Walter de Gruyter, 1998, pp. 9–21.
[2] S. Akiyama, Self affine tilings and Pisot numeration systems, Number Theory and its Applications (K. Győry and S. Kanemitsu, eds.), Kluwer, 1999, pp. 1–17.
[3] S. Akiyama, Cubic Pisot units with finite beta expansions, Algebraic Number Theory and Diophantine Analysis (F. Halter-Koch and R. F. Tichy, eds.), de Gruyter, 2000, pp. 11–26.
[4] S. Akiyama, On the boundary of self affine tilings generated by Pisot numbers, J. Math. Soc. Japan 54 (2002), no. 2, 283–308.
[5] S. Akiyama, T. Borbély, H. Brunotte, A. Pethő, and J. M. Thuswaldner, On a generalization of the radix representation: a survey, High Primes and Misdemeanours, Fields Inst. Commun., Amer. Math. Soc., Providence, R.I., 2004, pp. 19–27.
[6] S. Akiyama, T. Borbély, H. Brunotte, A. Pethő, and J. M. Thuswaldner, Generalized radix representations and dynamical systems I, Acta Math. Hungar. 108 (2005), no. 3, 207–238.
[7] S. Akiyama, H. Brunotte, A. Pethő, and J. M. Thuswaldner, Generalized radix representations and dynamical systems II, Acta Arith. 121 (2006), 21–61.
[8] S. Akiyama, H. Rao, and W. Steiner, A certain finiteness property of Pisot number systems, J. Number Theory 107 (2004), no. 1, 135–160.
[9] S. Akiyama and T. Sadahiro, A self-similar tiling generated by the minimal Pisot number, Acta Math. Info. Univ. Ostraviensis 6 (1998), 9–26.
[10] S. Akiyama and K. Scheicher, Intersecting two-dimensional fractals with lines, to appear in Acta Math. Sci. (Szeged).
[11] S. Akiyama and K. Scheicher, Symmetric shift radix systems and finite expansions, submitted.
[12] S. Akiyama and K. Scheicher, From number systems to shift radix systems, to appear in Nihonkai Math. J. 16 (2005), no. 2.
[13] P. Ambrož, Ch. Frougny, Z. Masáková, and E. Pelantová, Arithmetics on number systems with irrational bases, Bull. Belg. Math. Soc. 10 (2003), 1–19.
[14] P. Arnoux and Sh. Ito, Pisot substitutions and Rauzy fractals, Bull. Belg. Math. Soc. Simon Stevin 8 (2001), no. 2, 181–207, Journées Montoises d'Informatique Théorique (Marne-la-Vallée, 2000).
[15] F. Bassino, Beta-expansions for cubic Pisot numbers, Lecture Notes in Comput. Sci., vol. 2286, Springer, 2002, pp. 141–152.
[16] V. Berthé and A. Siegel, Purely periodic β-expansions in a Pisot non-unit case, preprint.
[17] V. Berthé and A. Siegel, Tilings associated with beta-numeration and substitutions, Elect. J. Comb. Number Th. 5 (2005), no. 3.
[18] A. Bertrand, Développements en base de Pisot et répartition modulo 1, C. R. Acad. Sci. Paris Sér. A-B 285 (1977), no. 6, A419–A421.
[19] F. Blanchard, β-expansions and symbolic dynamics, Theoret. Comput. Sci. 65 (1989), no. 2, 131–141.
[20] D. W. Boyd, Salem numbers of degree four have periodic expansions, Number Theory, Walter de Gruyter, 1989, pp. 57–64.
[21] D. W. Boyd, On beta expansions for Pisot numbers, Math. Comp. 65 (1996), 841–860.
[22] D. W. Boyd, On the beta expansion for Salem numbers of degree 6, Math. Comp. 65 (1996), 861–875.
[23] D. Crisp, W. Moran, A. Pollington, and P. Shiue, Substitution invariant cutting sequences, J. Théor. Nombres Bordeaux 5 (1993), no. 1, 123–137.
[24] H. Ei, Sh. Ito, and H. Rao, Atomic surfaces, tilings and coincidences II: Reducible case, to appear in Annal. Institut Fourier (Grenoble).
[25] L. Flatto, J. Lagarias, and B. Poonen, The zeta function of the beta transformation, Ergod. Th. and Dynam. Sys. 14 (1994), 237–266.
[26] N. Pytheas Fogg, Substitutions in Dynamics, Arithmetics and Combinatorics, Lecture Notes in Mathematics, vol. 1794, Springer-Verlag, Berlin, 2002, edited by V. Berthé, S. Ferenczi, C. Mauduit and A. Siegel.
[27] Ch. Frougny and B. Solomyak, Finite beta-expansions, Ergod. Th. and Dynam. Sys. 12 (1992), 713–723.
[28] C. Fuchs and R. Tijdeman, Substitutions, abstract number systems and the space filling property, to appear in Annal. Institut Fourier (Grenoble).
[29] J.-P. Gazeau and J.-L. Verger-Gaugry, Geometric study of the beta-integers for a Perron number and mathematical quasicrystals, J. Théor. Nombres Bordeaux 16, no. 1, 125–149.
[30] K. Gröchenig and A. Haas, Self-similar lattice tilings, J. Fourier Anal. Appl. 1 (1994), 131–170.
[31] M. Hama and T. Imahashi, Periodic β-expansions for certain classes of Pisot numbers, Comment. Math. Univ. St. Paul. 46 (1997), no. 2, 103–116.
[32] Y. Hara and Sh. Ito, On real quadratic fields and periodic expansions, 1989, pp. 357–370.
[33] M. Hollander, Linear numeration systems, finite beta expansions, and discrete spectrum of substitution dynamical systems, Ph.D. thesis, University of Washington, 1996.
[34] P. Hubert and A. Messaoudi, Best simultaneous diophantine approximations of Pisot numbers and Rauzy fractals, to appear in Acta Arithmetica.
[35] K.-H. Indlekofer, I. Kátai, and P. I. Racskó, Number systems and fractal geometry, Probability Theory and Applications (L. Lakatos and I. Kátai, eds.), Math. Appl., vol. 80, Kluwer Acad. Publ., Dordrecht, 1992, pp. 319–334.
154
S. Akiyama / Pisot Number System and Its Dual Tiling
[36] Sh. Ito and H. Rao, Atomic surfaces, tilings and coincidences I: Irreducible case., to appear in Israel J. , Purely periodic β-expansions with Pisot unit base, Proc. Amer. Math. Soc. 133 [37] (2005), no. 4, 953–964. [38] Sh. Ito and Y. Sano, On periodic β-expansions of Pisot numbers and Rauzy fractals, Osaka J. Math. 38 (2001), no. 2, 349–368. [39] Sh. Ito and Y. Takahashi, Markov subshifts and realization of β-expansions, J. Math. Soc. Japan 26 (1974), 33–55. [40] D. Lind and B. Marcus, An introduction to symbolic dynamics and coding, Cambridge University Press, Cambridge, 1995. [41] M. Lothaire, Chapter 7, numeration systems, Algebraic combinatorics on words, Encyclopedia of Mathematics and its Applications, vol. 90, Cambridge University Press, 2002. [42] W. Parry, On the β-expansions of real numbers, Acta Math. Acad. Sci. Hungar. 11 (1960), 401–416. [43] B. Praggastis, Numeration systems and Markov partitions from self-similar tilings, Trans. Amer. Math. Soc. 351 (1999), no. 8, 3315–3349. [44] G. Rauzy, Nombres algébriques et substitutions, Bull. Soc. Math. France 110 (1982), no. 2, 147–178. [45] A. Rényi, Representations for real numbers and their ergodic properties, Acta Math. Acad. Sci. Hungar. 8 (1957), 477–493. [46] V. A. Rohlin, Exact endomorphisms of a Lebesgue space, Izv. Akad. Nauk SSSR Ser. Math. 25 (1961), 499–530. [47] K. Scheicher and J. M. Thuswaldner, Canonical number systems, counting automata and fractals, Math. Proc. Cambridge Philos. Soc. 133 (2002), no. 1, 163–182. , Neighbours of self-affine tiles in lattice tilings., Fractals in Graz 2001. Analysis, [48] dynamics, geometry, stochastics. Proceedings of the conference, Graz, Austria, June 2001 (Peter (ed.) et al. Grabner, ed.), Birkhäuser, 2002, pp. 241–262. [49] J. Schmeling, Symbolic dynamics for β-shifts and self-normal numbers, Ergodic Theory Dynam. Systems 17 (1997), no. 3, 675–694. [50] K. Schmidt, On periodic expansions of Pisot numbers and Salem numbers, Bull. London Math. Soc. 
12 (1980), 269–278. [51] N. Sidorov, Bijective and general arithmetic codings for Pisot toral automorphisms, J. Dynam. Control Systems 7 (2001), no. 4, 447–472. , Ergodic-theoretic properties of certain Bernoulli convolutions, Acta Math. Hungar. [52] 101 (2003), no. 2, 345–355. [53] B. Solomyak, Conjugates of beta-numbers and the zero-free domain for a class of analytic functions, Proc. London Math. Soc. 68 (1994), 477–498. [54] W. Steiner, Parry expansions of polynomial sequences, Integers 2 (2002), A14, 28. [55] J. Tamura, Certain sequences making a partition of the set of positive integers, Acta Math. Hungar. 70 (1996), 207–215. [56] W. Thurston, Groups, tilings and finite state automata, AMS Colloquium Lecture Notes, 1989. [57] J.-L. Verger-Gaugry, On lacunary Rényi β-expansions of 1 with β > 1 a real algebraic number, Perron numbers and a classification problem, Prepublication de l’Institute Fourier no.648 (2004).
Physics and Theoretical Computer Science
J.-P. Gazeau et al. (Eds.)
IOS Press, 2007
© 2007 IOS Press. All rights reserved.
Non-standard number representation: computer arithmetic, beta-numeration and quasicrystals

Christiane Frougny a,*
a LIAFA UMR 7089 CNRS, and Université Paris 8

Abstract. The purpose of this survey is to present the main concepts and results in non-standard number representation, and to give some examples of practical applications. This domain lies at the interface between discrete mathematics (dynamical systems, number theory, combinatorics) and computer science (computer arithmetic, cryptography, coding theory, algorithms). It also plays an important role in the modelling of physical structures like quasicrystals.

Keywords. Number representation, computer arithmetic, quasicrystal
1. Introduction

Non-standard number representation is emerging as a new research field, with many difficult open questions and several important applications. The notions presented in this contribution are strongly related to the chapters of this volume written by Akiyama, Pelantová and Masáková, and Sakarovitch. Our purpose is to explain how the simplest way of representing numbers — an integer base β and a canonical set of digits {0, 1, . . . , β − 1} — is not sufficient for solving some problems. In computer arithmetic, the challenge is to perform fast arithmetic. We will see how this task can be achieved by using a different set of digits. This will also allow on-line arithmetic, where it is possible to pipeline additions, subtractions, multiplications and divisions. Beta-numeration is the use of a base β which is an irrational number. This field is closely related to symbolic dynamics, as the set of β-expansions of real numbers of the unit interval forms a dynamical system. In this survey, we will present results connected with finite automata theory. Pisot numbers, which are algebraic integers greater than 1 whose Galois conjugates lie inside the open unit disk, play a key role, as they generalize the integers nicely.

* Correspondence to: Christiane Frougny, LIAFA UMR 7089 CNRS, 2 place Jussieu, 75251 Paris cedex 05, France. E-mail:
[email protected].
C. Frougny / Non-Standard Number Representation
Quasicrystals are a kind of solid in which the atoms are arranged in a seemingly regular but non-repeating structure. The first one, observed by Shechtman in 1982, presents a five-fold symmetry, which is forbidden in classical crystallography. In a quasicrystal, the pattern of atoms is only quasiperiodic. The first observed quasicrystal is strongly related to the golden mean, and in this theory also, Pisot numbers are deeply rooted. I will explain how beta-numeration is an adequate tool for the modelling of quasicrystals.
2. Preliminaries

We refer the reader to [17] and to [42]. An alphabet A is a finite set. A finite sequence of elements of A is called a word, and the set of words on A is the free monoid A∗. The empty word is denoted by ε. The set of infinite sequences or infinite words on A is denoted by A^N. Let v be a word of A∗; denote by v^n the concatenation of v to itself n times, and by v^ω the infinite concatenation vvv · · ·. A word is said to be eventually periodic if it is of the form uv^ω.

An automaton over A, A = (Q, A, E, I, T), is a directed graph labelled by elements of A. The set of vertices, traditionally called states, is denoted by Q, I ⊂ Q is the set of initial states, T ⊂ Q is the set of terminal states, and E ⊂ Q × A × Q is the set of labelled edges. If (p, a, q) ∈ E, we write p −a→ q. The automaton is finite if Q is finite. The automaton A is deterministic if E is the graph of a (partial) function from Q × A into Q, and if there is a unique initial state. A subset H of A∗ is said to be recognizable by a finite automaton if there exists a finite automaton A such that H is equal to the set of labels of paths starting in an initial state and ending in a terminal state. A subset K of A^N is said to be recognizable by a finite automaton if there exists a finite automaton A such that K is equal to the set of labels of infinite paths starting in an initial state and going infinitely often through a terminal state (Büchi acceptance condition, see [17]).

We are also interested in 2-tape automata, or transducers. Let A and B be two alphabets. A transducer is an automaton over the non-free monoid A∗ × B∗: A = (Q, A∗ × B∗, E, I, T) is a directed graph the edges of which are labelled by elements of A∗ × B∗. Words of A∗ are referred to as input words, and words of B∗ as output words. If (p, (f, g), q) ∈ E, we write p −f|g→ q. The transducer is finite if Q and E are finite. A relation R of A∗ × B∗ is said to be computable by a finite transducer if there exists a finite transducer A such that R is equal to the set of labels of paths starting in an initial state and ending in a terminal state. A function is computable by a finite transducer if its graph is computable by a finite transducer. These definitions extend to relations and functions of infinite words as above. A left sequential transducer is a finite transducer where edges are labelled by elements of A × B∗, and such that the underlying input automaton, obtained by taking the projection over A of the label of every edge, is deterministic. For finite words, there is a terminal partial function ω : Q −→ B∗, whose value is concatenated to the output word corresponding to a computation in A. The same definition works for functions of infinite words, considering infinite paths in A, but there is no terminal function ω in that case. The notion of a right sequential transducer is defined similarly.
3. Computer arithmetic

Computer arithmetic is the field which gathers techniques to build fast, efficient and robust arithmetic processors and algorithms. In what follows, we focus on the problems concerning number representation.

3.1. Standard number representation

We consider here only positional number systems, defined by a base β and a set of digits Aβ. In the standard number representation, β is an integer greater than one, β > 1, and Aβ = {0, 1, . . . , β − 1} is also called the canonical alphabet of digits. A β-representation of a positive integer N is a sequence of digits from Aβ, that is to say a word ak · · · a0 on Aβ, such that N = Σ_{i=0}^{k} ai β^i. It is denoted ⟨N⟩β = ak · · · a0, most significant digit first. This representation is unique (called normal) if ak ≠ 0. A β-representation of a number x in [0, 1] is an infinite sequence (word) (xi)i≥1 of elements of Aβ such that x = Σ_{i≥1} xi β^{−i}. It is denoted ⟨x⟩β = .x1x2 · · · This representation is unique if it does not end in (β − 1)^ω, in which case it is said to be the β-expansion of x. By shifting, any real x > 1 can be given a representation. We now recall some properties satisfied by the standard number system; see [31] for the proofs. Let C be an alphabet of positive or negative digits containing Aβ = {0, . . . , β − 1}. The numerical value in base β is the function πβ : C∗ → Z such that πβ(ck · · · c0) = Σ_{i=0}^{k} ci β^i. Define the digit-set conversion on C as the function χβ : C∗ → A∗β such that χβ(ck · · · c0) = an · · · a0, where an · · · a0 is a β-representation on Aβ of the number πβ(ck · · · c0).

PROPOSITION 1 [17] For any alphabet C the digit-set conversion on C is a right sequential function.

Addition can be seen as a digit-set conversion on the alphabet {0, . . . , 2(β − 1)}, subtraction is a digit-set conversion on {−(β − 1), . . . , (β − 1)}, and multiplication by a fixed positive integer m is a digit-set conversion on {0, m, . . . , m(β − 1)}.
Notice that arbitrary multiplication of two integers is not computable by a finite automaton.
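As an illustration, here is a minimal sketch of the numerical value πβ and of a digit-set conversion χβ that, like the right sequential function of Proposition 1, processes the word from the least significant end while propagating a carry. The function names and the non-negative-value restriction are ours, not the survey's.

```python
def pi_beta(digits, beta):
    """Numerical value of a digit word (most significant digit first) in base beta."""
    value = 0
    for d in digits:
        value = value * beta + d
    return value

def chi_beta(digits, beta):
    """Digit-set conversion: rewrite a word over an arbitrary integer digit set
    into the canonical digits {0, ..., beta-1}, reading the word from the right
    and propagating a carry.  Assumes the represented value is non-negative."""
    out = []
    carry = 0
    for d in reversed(digits):
        t = d + carry
        r = t % beta            # canonical digit in {0, ..., beta-1}
        out.append(r)
        carry = (t - r) // beta
    assert carry >= 0, "value must be non-negative"
    while carry:                # flush the remaining carry (terminal function)
        out.append(carry % beta)
        carry //= beta
    return out[::-1]
```

For instance, adding 011 and 011 digitwise gives the word 022 on {0, . . . , 2(β − 1)}, which χβ converts back to the canonical word 110.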
Figure 1. Addition of integers in base 2
By contrast, division by a fixed positive integer m is a left sequential function. As mentioned above, the representation of real numbers is not unique, since, for 0 ≤ d ≤ β − 2, the words d(β − 1)^ω and (d + 1)0^ω have the same value.
PROPOSITION 2 The normalization function ν : A^N_β −→ A^N_β which transforms improper representations ending in (β − 1)^ω into normal expansions ending in 0^ω is computable by a finite transducer.
The picture in Fig. 2 shows a transducer for normalization in base 2. Infinitely repeated states are indicated by double circles.

Figure 2. Normalization of real numbers in base 2
Notice that this transducer is not left sequential.

3.2. Redundant representations

Redundant representations are popular in computer arithmetic. Let us take for alphabet of digits a set C = {c, c + 1, . . . , c + h − 1}. Fix n to be the number of positions, that is to say the length of the representations of the integers. Then the following result is folklore.

THEOREM 1 Let I = [c(β^n − 1)/(β − 1), (c + h − 1)(β^n − 1)/(β − 1)].
If |C| < β, some integers in I have no representation in base β with n positions.
If |C| = β, every integer in I has a unique representation.
If |C| > β, every integer in I has a representation, not necessarily unique.

The same result has a real-number version, with I = [c/(β − 1), (c + h − 1)/(β − 1)]. When |C| > β, the system is said to be redundant. Cauchy [12] already considered the case β = 10 and C = {−5, −4, . . . , 4, 5}. In computer arithmetic, the most interesting cases are β = 10 and C = {−6, . . . , 6}, introduced by Avizienis [3], and β = 2 with C = {−1, 0, 1}, see Chow and Robertson [13]. In a redundant number system, it is possible to design fast algorithms for addition. More precisely, take an integer a ≥ 1 and let C = {−a, −a + 1, . . . , a} be a signed-digit alphabet. Since the alphabet is symmetric, the opposite of a number is simply obtained by taking opposite digits. From the result above, there is redundancy when 2a ≥ β. To be able to determine the sign of a number represented as a word cn−1 · · · c0 only by looking at the sign of the most significant digit cn−1, we must take a ≤ β − 1. Under these hypotheses, it is possible to perform addition in constant time in parallel, since there is no propagation of the carry. The idea is the following. First suppose that β/2 < a ≤ β − 1.
C. Frougny / Non-Standard Number Representation
159
Take two representations cn−1 · · · c0 and dn−1 · · · d0 on C, of the numbers M and N respectively. For 0 ≤ i ≤ n − 1 set zi = ci + di. Then
1. if a ≤ zi ≤ 2a, set ri+1 = 1 and si = zi − β;
2. if −2a ≤ zi ≤ −a, set ri+1 = −1 and si = zi + β;
3. if −a + 1 ≤ zi ≤ a − 1, set ri+1 = 0 and si = zi.
Then set r0 = 0 and, for 0 ≤ i ≤ n − 1, ei = si + ri, and en = rn. Thus en · · · e0 is a β-representation of M + N, with all the digits ei belonging to C. A slightly more complicated algorithm works in the case β = 2a, where a window is used to look at the sign of the right neighbour of the current position:
1. if a + 1 ≤ zi ≤ 2a, set ri+1 = 1 and si = zi − β;
2. if −2a ≤ zi ≤ −a − 1, set ri+1 = −1 and si = zi + β;
3. if −a + 1 ≤ zi ≤ a − 1, set ri+1 = 0 and si = zi;
4. if zi = a then: if zi−1 ≤ 0 set ri+1 = 0 and si = zi; if zi−1 > 0 set ri+1 = 1 and si = zi − β;
5. if zi = −a then: if zi−1 ≥ 0 set ri+1 = 0 and si = zi; if zi−1 < 0 set ri+1 = −1 and si = zi + β.
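The first case (β/2 < a ≤ β − 1) can be sketched as follows; note that the output digit at position i is ei = si + ri, with the final digit en = rn, so no carry ever propagates more than one position. The function name and the digit-list convention (most significant digit first) are ours.

```python
def sd_add(c, d, beta, a):
    """Carry-free signed-digit addition for beta/2 < a <= beta - 1.

    c, d: equal-length digit lists over {-a, ..., a}, most significant first.
    Returns the (n+1)-digit sum, most significant first, with digits in C."""
    assert beta / 2 < a <= beta - 1 and len(c) == len(d)
    n = len(c)
    cl, dl = c[::-1], d[::-1]          # work least significant digit first
    r = [0] * (n + 1)                  # transfer digits r_0 ... r_n
    s = [0] * n                        # interim sums
    for i in range(n):
        z = cl[i] + dl[i]
        if z >= a:                     # case 1:  a <= z <= 2a
            r[i + 1], s[i] = 1, z - beta
        elif z <= -a:                  # case 2: -2a <= z <= -a
            r[i + 1], s[i] = -1, z + beta
        else:                          # case 3: |z| <= a - 1
            r[i + 1], s[i] = 0, z
    e = [s[i] + r[i] for i in range(n)] + [r[n]]   # e_i = s_i + r_i, e_n = r_n
    return e[::-1]
```

Each output digit depends only on positions i and i − 1, which is what makes the computation possible in constant time in parallel.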
Special representations in base 2 with digit set {−1, 0, 1} such that the number of non-zero digits is minimal were considered by Booth [10]. It is a right-to-left recoding of a standard representation: every factor of the form 01^n, with n ≥ 2, is transformed into 10^(n−1)1̄, where 1̄ denotes the signed digit −1. The Booth recoding is a right sequential function from {0, 1}∗ to {−1, 0, 1}∗ realized by the transducer depicted in Fig. 3.

Figure 3. Booth right sequential recoding

The applications of the Booth normal form are multiplication, internal representation for dividers in base 4 with digits in {−3, . . . , 3}, and computations on elliptic curves, see [33]. Another widely used representation is the so-called carry-save representation. Here the base is β = 2 and the alphabet of digits is D = {0, 1, 2}. Addition of a representation with digits in D and a representation with digits in {0, 1}, with result on D, can be done in constant time in parallel. This has important applications for the design of internal adders in multipliers, see [30,20].
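The rewriting rule 01^n → 10^(n−1)1̄ can be sketched as a right-to-left scan, re-applying the rule when the newly created leading 1 extends a run on its left. This is a string-level sketch (function name ours), not the sequential transducer of Fig. 3, but it preserves the value.

```python
def booth(bits):
    """Booth recoding of a binary word (most significant bit first):
    every factor 0 1^n with n >= 2 becomes 1 0^(n-1) (-1), applied
    repeatedly from the right.  Returns signed digits over {-1, 0, 1}."""
    out = [0] + [int(b) for b in bits]   # leading 0: every run of 1s has a 0 before it
    j = len(out) - 1
    while j >= 0:
        if out[j] != 1:
            j -= 1
            continue
        k = j
        while k >= 0 and out[k] == 1:    # find the maximal run of 1s ending at j
            k -= 1
        n = j - k                        # run length; out[k] precedes the run
        if n >= 2 and k >= 0:            # rewrite 0 1^n -> 1 0^(n-1) (-1)
            out[k] = 1
            for t in range(k + 1, j):
                out[t] = 0
            out[j] = -1
            j = k                        # the new 1 at k may extend a run on its left
        else:
            j = k - 1
    return out
```

For instance 0111 (seven) is recoded as 0 1 0 0 1̄, i.e. 8 − 1.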
3.3. On-line computability

In computer arithmetic, on-line computation consists of performing arithmetic operations in Most Significant Digit First (MSDF) mode, digit serially after a certain latency delay. This allows the pipelining of different operations such as addition, multiplication and division. It is also appropriate for the processing of real numbers having infinite expansions: it is well known that when multiplying two real numbers, only the left part of the result is significant. To be able to perform on-line addition, it is necessary to use a redundant number system (see [44], [19]). We now give a formal definition of on-line computability. Let A and B be two finite digit sets, and let

ϕ : A^N → B^N, (aj)j≥1 → (bj)j≥1.

The function ϕ is said to be on-line computable with delay δ if there exists a natural number δ such that, for each j ≥ 1, there exists a function Φj : A^(j+δ) → B such that bj = Φj(a1 · · · aj+δ), where A^(j+δ) denotes the set of sequences of length j + δ of elements of A. This definition extends readily to functions of several variables. Recall that a distance ρ can be defined on A^N as follows: let v = (vj)j≥1 and w = (wj)j≥1 be in A^N; set ρ(v, w) = 2^(−r), where r = min{j | vj ≠ wj} if v ≠ w, and ρ(v, w) = 0 otherwise. The set A^N is then a compact metric space, and this topology is equivalent to the product topology. Any function from A^N to B^N which is on-line computable with delay δ is 2^δ-Lipschitz, and is thus uniformly continuous [23]. It is well known that some functions are not on-line computable, like addition in the standard binary system with canonical digit set {0, 1}. When the representation is redundant, addition and multiplication can be computed on-line. More precisely, in integer base β, addition on the alphabet {−a, . . . , a} is on-line computable with delay 1 if β/2 < a ≤ β − 1, and with delay 2 if β = 2a. Multiplication of two numbers represented in integer base β > 1 with digits in C = {−a, . . . , a}, β/2 ≤ a ≤ β − 1, is computable by an on-line algorithm with delay δ, where δ is the smallest positive integer such that

β/2 + 2a²/(β^δ (β − 1)) ≤ a + 1/2.

Thus for current cases, the delay is as follows. If β = 2 and a = 1, δ = 2. If β = 3 and a = 2, δ = 2. If β = 2a ≥ 4, then δ = 2. If β ≥ 4 and a ≥ β/2 + 1, δ = 1. A left on-line finite automaton is a particular left sequential transducer, defined as follows:
• there is a transient part: during a time δ (the delay) the automaton reads without writing;
• there is a synchronous part, where the transitions are letter-to-letter.
The following result follows easily from the properties recalled above.

PROPOSITION 3 Let β > 1 be an integer. Every affine function with rational coefficients is computable in base β by a left on-line finite automaton on C = {−a, −a + 1, . . . , a}, with β/2 ≤ a ≤ β − 1.
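The delay bound for on-line multiplication can be evaluated mechanically; the following sketch (helper name ours) just searches for the smallest δ satisfying the inequality and reproduces the four cases listed above.

```python
def online_mult_delay(beta, a):
    """Smallest delay delta for on-line multiplication in integer base beta
    with digit set {-a, ..., a}, beta/2 <= a <= beta - 1, from the bound
    beta/2 + 2*a^2 / (beta^delta * (beta - 1)) <= a + 1/2.  A sketch."""
    assert 2 * a >= beta and a <= beta - 1
    delta = 1
    while beta / 2 + 2 * a * a / (beta ** delta * (beta - 1)) > a + 0.5:
        delta += 1
    return delta
```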
The following result, which is a kind of converse, has been proved by Muller [35]. Again let a be such that β/2 ≤ a ≤ β − 1, and take D = {−d, . . . , d} with d ≥ a. Set I = [−a/(β − 1), a/(β − 1)] and J = [−d/(β − 1), d/(β − 1)]. Let χ be a function such that there exists a function χR making the following diagram commute:

D^N --χ--> B^N
 |πβ        |πβ
 J --χR--> I
The function χR is called the real interpretation of the function χ.

PROPOSITION 4 Let χ be a function as above. Suppose that χ is computed by a left on-line finite automaton. If the second derivative χ″R is piecewise continuous, then, in each interval where χ″R is continuous, χR is affine with rational coefficients.

3.4. Complex base

To represent complex numbers, complex bases have been introduced in order to handle a complex number as a sequence of integer digits.

3.4.1. Knuth number system

Knuth [29] used base β = i√b, with b an integer ≥ 2, and digit set Aβ = {0, . . . , b − 1}. In this system every complex number has a representation. If b = c², every Gaussian integer has a unique finite representation of the form ak · · · a0.a−1.

EXAMPLE 1 Let β = 2i; then Aβ = {0, . . . , 3} and z = 4 + i is represented as 10310.2.

The following results are derived from the ones valid in integer base. On Aβ, addition in base β = i√b is right sequential. On C = {−a, −a + 1, . . . , a} with b/2 ≤ a ≤ b − 1, addition is computable in constant time in parallel, and realizable by an on-line finite automaton, see [36,23,43].

3.4.2. Penney number system

In this complex number system, the base is of the form β = −b + i, with b an integer ≥ 1, and digit set Aβ = {0, . . . , b²}. The case b = 1 was introduced by Penney [39]. We summarize the main results. Every complex number has a representation. Every Gaussian integer has a unique integer representation of the form ak · · · a0 ∈ A∗β. On Aβ, addition in base β = −b + i is right subsequential [41]. The case β = −1 + i and Aβ = {0, 1} has received a lot of attention in computer arithmetic for implementation in arithmetic processors. On C = {−a, −a + 1, . . . , a}, with a = 1, 2 or 3, addition in base −1 + i is computable in constant time in parallel, and realizable by an on-line finite automaton, see [15,36,23,43].
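For the much-studied case β = −1 + i with digits {0, 1}, the classical division-remainder algorithm for Gaussian integers can be sketched as follows (function name ours; the digit is the parity of a + b, which is the unique choice making the quotient a Gaussian integer again):

```python
def rep_base_m1pi(a, b):
    """Digits {0, 1} of the Gaussian integer a + b*i in base -1+i,
    most significant digit first.  A sketch of the standard
    division-remainder algorithm for this base."""
    if a == 0 and b == 0:
        return [0]
    digits = []
    while a != 0 or b != 0:
        d = (a + b) % 2            # digit: a + b*i - d must be divisible by -1+i
        a -= d
        # exact Gaussian division: (a + b*i)/(-1+i) = (b - a)/2 + (-(a + b)/2) i
        a, b = (b - a) // 2, -((a + b) // 2)
        digits.append(d)
    return digits[::-1]
```

For example, −1 is represented by the word 11101, since −4 + (2 + 2i) + (−2i) + 1 = −1.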
3.5. Real basis

Muller [34] introduced an original way of representing real numbers, for application to the CORDIC algorithms for the computation of elementary functions. Let U = (un)n≥0 be a decreasing, summable sequence of positive real numbers, and let D be a finite alphabet of integer digits. Under certain conditions a real number x can be represented as

x = Σ_{n≥0} dn un

with dn ∈ D, by a greedy algorithm. For instance, take un = log(1 + 2^(−n)) and D = {0, 1}. Then every positive real number has a representation. If x = Σ_{n≥0} dn log(1 + 2^(−n)), then

e^x = Π_{n≥0} (1 + 2^(−n))^(dn)

is obtained with no computation.
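A sketch of the greedy algorithm for this particular basis (function name ours): at each step the digit is 1 exactly when the remainder still exceeds un. The exponential then comes out as a product of precomputed factors 1 + 2^(−n), with no call to exp.

```python
import math

def greedy_log_digits(x, nterms=60):
    """Greedy digits d_n in {0, 1} with x ~ sum d_n * log(1 + 2^-n)
    (Muller's representation for CORDIC-style evaluation; a sketch)."""
    digits = []
    r = x
    for n in range(nterms):
        u = math.log1p(2.0 ** -n)   # u_n = log(1 + 2^-n)
        if r >= u:
            digits.append(1)
            r -= u
        else:
            digits.append(0)
    return digits
```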
4. Beta-numeration

When the base β is not an integer, numbers may have more than one representation. This natural redundancy raises questions on the problem of normalization. Here we focus on computations by finite automata. For more details on the relations with symbolic dynamics, see [31] and [1]. There is a nice survey by Berthé and Siegel [8] on the connections with tilings.

4.1. Beta-expansions

Let β > 1 be a real number and let D be an alphabet of digits. A β-representation on D of a number x of [0, 1] is an infinite sequence (dj)j≥1 of D^N such that Σ_{j≥1} dj β^(−j) = x. Any real number x ∈ [0, 1] can be represented in base β by the following greedy algorithm [40]. Denote by ⌊·⌋ and by {·} the integral part and the fractional part of a number. Let x1 = ⌊βx⌋ and let r1 = {βx}. Then iterate: for j ≥ 2, xj = ⌊βrj−1⌋ and rj = {βrj−1}. Thus x = Σ_{j≥1} xj β^(−j), where the digits xj are elements of the canonical alphabet Aβ = {0, . . . , ⌊β⌋} if β ∉ N, and Aβ = {0, . . . , β − 1} otherwise. The sequence (xj)j≥1 of A^N_β is called the β-expansion of x. When β is an integer, it is the standard β-ary number system. When β is not an integer, a number x may have several different β-representations on Aβ: this system is naturally redundant. The β-expansion obtained by the greedy algorithm is the greatest one in the lexicographic order. When a β-representation ends with infinitely many zeroes, it is said to be finite, and the 0's are omitted. Let dβ(1) = (tj)j≥1 be the β-expansion of 1. If dβ(1) is finite, dβ(1) = t1 · · · tN, set d∗β(1) = (t1 · · · tN−1(tN − 1))^ω; otherwise set d∗β(1) = dβ(1). We recall the following result of Parry [37]. An infinite word s = (sj)j≥1 is the β-expansion of a number x of [0, 1[ if and only if, for every p ≥ 1, sp sp+1 · · · is smaller in the lexicographic order than d∗β(1).
EXAMPLE 2 Consider the golden ratio τ = (1 + √5)/2. Then Aτ = {0, 1}, dτ(1) = 11 and d∗τ(1) = (10)^ω. The number x = 3 − √5 has greedy τ-expansion 1001. Other τ-representations of x are 0111, 100(01)^ω, 011(01)^ω, . . . It is easily seen that the factor 11 is forbidden in the greedy expansion of any x.

A number β such that dβ(1) is eventually periodic is called a Parry number. If dβ(1) is finite, it is a simple Parry number. If β is a Parry number, the set of β-expansions of numbers of [0, 1] is recognizable by a finite automaton. A Pisot number is an algebraic integer > 1 such that all its algebraic conjugates are smaller than 1 in modulus. The natural integers and the golden ratio are Pisot numbers. Recall that if β is a Pisot number then it is a Parry number [9].

Let D be a digit set. The numerical value in base β on D is the function πβ : D^N −→ R such that πβ((dj)j≥1) = Σ_{j≥1} dj β^(−j). The normalization on D is the function νD : D^N −→ A^N_β which maps any sequence (dj)j≥1 ∈ D^N, where x = πβ((dj)j≥1) belongs to [0, 1], onto the β-expansion of x. A digit-set conversion in base β from D to Aβ is a function χ : D^N −→ A^N_β such that, for each sequence (dj)j≥1 ∈ D^N where x = πβ((dj)j≥1) belongs to [0, 1], the image (aj)j≥1 = χ((dj)j≥1) satisfies x = πβ((aj)j≥1). Remark that the image χ((dj)j≥1) belongs to A^N_β, but need not be the greedy β-expansion of x. Some of the results which hold true in the case where β is an integer can be extended to the case where β is not an integer. Let D = {0, . . . , d} be a digit set containing Aβ, that is, d ≥ β.

THEOREM 2 [24] There exists a digit-set conversion χ : D^N −→ A^N_β in base β which is on-line computable with delay δ, where δ is the smallest positive integer such that β^(δ+1) + d ≤ β^δ(β + 1). If β is a Pisot number then the digit-set conversion χ is computable by a left on-line finite automaton.

Note that multiplication in real base β is also on-line computable [26].
We now consider the problem of normalization, see [22,7,25].

THEOREM 3 If β is a Pisot number, then for every alphabet D of non-negative digits, normalization νD on D is computable by a finite transducer. Conversely, if β is not a Pisot number, then for any alphabet D of non-negative digits with D ⊇ {0, . . . , β, β + 1}, the normalization νD on D is not computable by a finite transducer. The transducer realizing normalization cannot be sequential.

4.2. U-representations

Let U = (un)n≥0 be a strictly increasing sequence of integers with u0 = 1. A U-representation of an integer N ≥ 0 is a finite sequence of integers (di)k≥i≥0 such that N = Σ_{i=0}^{k} di ui. It is denoted (N)U = dk · · · d0.
A normal or greedy U-representation of N is obtained by the following greedy algorithm [21]. Denote by q(m, p) and r(m, p) the quotient and the remainder of the Euclidean division of m by p. Let k be such that uk ≤ N < uk+1. Put dk = q(N, uk) and rk = r(N, uk), and, for k − 1 ≥ i ≥ 0, di = q(ri+1, ui) and ri = r(ri+1, ui). Then N = dk uk + · · · + d0 u0. The word dk · · · d0 is called the normal U-representation of N, and is denoted ⟨N⟩U = dk · · · d0. Each digit di is an element of the canonical alphabet AU.

EXAMPLE 3 Let U = {1, 2, 3, 5, 8, . . .} be the set of Fibonacci numbers. Then AU = {0, 1} and ⟨6⟩U = 1001.

The results in this domain are linked to those on β-expansions. Let G(U) be the set of greedy or normal U-representations of all the non-negative integers. If U is linearly recurrent and its characteristic polynomial is exactly the minimal polynomial of a Pisot number, then G(U) is recognizable by a finite automaton. Under the same hypothesis, normalization on every alphabet is computable by a finite transducer, see [31]. A set S ⊂ N is said to be U-recognizable if the set {⟨n⟩U | n ∈ S} is recognizable by a finite automaton. Recall the beautiful theorem of Cobham [14] in standard number systems. Two numbers p > 1 and q > 1 are said to be multiplicatively dependent if there exist positive integers k and ℓ such that p^k = q^ℓ. If a set S is both p- and q-recognizable, where p and q are multiplicatively independent, then S is a finite union of arithmetic progressions. A generalization of Cobham's theorem is the following: let β and γ be two multiplicatively independent Pisot numbers, and let U and Y be two linear sequences with characteristic polynomial equal to the minimal polynomial of β and γ respectively. The only sets of integers that are both U-recognizable and Y-recognizable are finite unions of arithmetic progressions [6]. A generalization of Cobham's theorem for substitutions was given in [16].
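The greedy algorithm for the Fibonacci scale of Example 3 can be sketched as follows (Zeckendorf-style; function name ours). Since uk+1 < 2uk for this scale, every digit is 0 or 1, and the normal representations contain no factor 11.

```python
def fib_representation(n):
    """Normal (greedy) U-representation of n for the Fibonacci scale
    U = 1, 2, 3, 5, 8, ...  Returns the digit string, most significant first."""
    if n == 0:
        return "0"
    u = [1, 2]                 # build the scale up to n
    while u[-1] <= n:
        u.append(u[-1] + u[-2])
    digits = []
    for uk in reversed(u):
        if uk <= n:            # greedy: take the largest scale element that fits
            digits.append(1)
            n -= uk
        elif digits:           # skip leading zeros only
            digits.append(0)
    return "".join(map(str, digits))
```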
5. Quasicrystals

For definitions and more results, see the survey by Pelantová and Masáková in this volume. We are interested here in the connection with beta-numeration. A set X ⊂ R^d is uniformly discrete if there exists a positive real r such that for any x ∈ R^d, the open ball of center x and radius r contains at most one point of X. A set X ⊂ R^d is relatively dense if there exists a positive real R such that for any x ∈ R^d, the open ball of center x and radius R contains at least one point of X. A Delaunay set is a set which is both uniformly discrete and relatively dense. A set X of R^d is a Meyer set if it is a Delaunay set and if there exists a finite set F such that the set of differences X − X is a subset of X + F. Meyer [32] showed that if X is a Meyer set and if β > 1 is a real number such that βX ⊂ X, then β must be a Pisot or a Salem number¹. Conversely, for each d and for each Pisot or Salem number β, there exists a Meyer set X ⊂ R^d such that βX ⊂ X.

¹ A Salem number is an algebraic integer such that every conjugate has modulus smaller than or equal to 1, and at least one of them has modulus 1.
5.1. Beta-integers

Let β > 1 be a real number. The set Zβ of β-integers is the set of real numbers x such that the β-expansion of |x| has no fractional part, that is,

Zβ = {x ∈ R | ⟨|x|⟩β = xk · · · x0}.

Then βZβ ⊂ Zβ and Zβ = −Zβ. Denote by Z⁺β the set of non-negative β-integers, and set Z⁻β = −(Z⁺β).
PROPOSITION 5 [11] If β is a Pisot number then Zβ is a Meyer set.

EXAMPLE 4 Let τ be the golden ratio. Then

Zτ = Z⁺τ ∪ (−Z⁺τ) = {0, 1, τ, τ², τ² + 1, . . .} ∪ {−1, −τ, −τ², −τ² − 1, . . .}

The set Z⁺τ is generated by the Fibonacci substitution L → LS, S → L, and Zτ is obtained by symmetry for the negative part.
[Figure: the point set Zτ on the real line; consecutive points · · · , −τ² − 1, −τ², . . . , −τ, −1, 0, 1, τ, τ², τ² + 1, . . . are separated by tiles of two lengths, labelled L and S.]
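The example can be checked numerically; a sketch assuming, as the picture suggests, tile lengths 1 for L and τ − 1 for S (names ours):

```python
TAU = (1 + 5 ** 0.5) / 2

def tau_integers(nsteps):
    """Points of Z_tau^+ generated by the Fibonacci substitution L -> LS,
    S -> L, reading the resulting word as a tiling of the half-line with
    tile lengths 1 (for L) and tau - 1 (for S) -- an assumption read off
    the example's points 0, 1, tau, tau^2, tau^2 + 1, ..."""
    w = "L"
    for _ in range(nsteps):
        w = "".join("LS" if c == "L" else "L" for c in w)
    pts, x = [0.0], 0.0
    for c in w:
        x += 1.0 if c == "L" else TAU - 1.0
        pts.append(x)
    return pts
```

The generated points start 0, 1, τ, τ², τ² + 1, τ³, . . ., and every gap between consecutive points is either 1 or τ − 1, as in the picture.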
Zτ is a Meyer set which is not a model set, see [38] for the definition. The τ-expansions of elements of Z⁺τ are exactly the expansions in the Fibonacci numeration system of the non-negative integers, that is to say, {0, 1, 10, 100, 101, 1000, . . .}. It is an open problem to characterize the minimal finite sets F such that Zβ − Zβ ⊂ Zβ + F; see in particular [11,28,2] for partial answers.

5.2. Cyclotomic Pisot numbers

Bravais lattices are used as mathematical models for crystals. A Bravais lattice is an infinite discrete point-set such that the neighborhoods of a point are the same whichever point of the set is considered. Geometrically, a Bravais lattice is characterized by all Euclidean transformations (translations and possibly rotations) that transform the lattice into itself. The condition 2 cos(2π/N) ∈ Z, which implies that N = 1, 2, 3, 4, 6, characterizes Bravais lattices which are invariant under rotation of 2π/N, the N-fold Bravais lattices, in R² (and in R³). For these values, N is said to be crystallographic. Let us set ζ = e^(2iπ/N). The cyclotomic ring of order N in the plane is the Z-module:
Z[ζ] = Z[2 cos(2π/N)] + Z[2 cos(2π/N)] ζ .
This N-fold structure is generically dense in C, except precisely in the crystallographic cases. Indeed Z[ζ] = Z for N = 1 or 2, Z[ζ] = Z + Zi for N = 4 (square lattice), and Z[ζ] = Z + Z e^{iπ/3} for the triangular and hexagonal cases N = 3 and N = 6. Note that a Bravais lattice is a Meyer set with F = {0}. For a general non-crystallographic N, the number 2 cos(2π/N) is an algebraic integer of degree m = φ(N)/2 ≤ (N − 1)/2, where φ is the Euler function. A cyclotomic Pisot number with symmetry of order N is a Pisot number β such that

Z[2 cos(2π/N)] = Z[β] .
What is striking is the fact that, up to now, all the quasicrystals actually obtained by physicists are linked to cyclotomic quadratic Pisot units. More precisely, denote by Mβ the minimal polynomial of β. Then

• N = 5 or N = 10: β = (1 + √5)/2 = 2 cos(π/5), Mβ(X) = X² − X − 1
• N = 8: β = 1 + √2 = 1 + 2 cos(π/4), Mβ(X) = X² − 2X − 1
• N = 12: β = 2 + √3 = 2 + 2 cos(π/6), Mβ(X) = X² − 4X + 1.

Other cyclotomic Pisot units are

• N = 7 or N = 14: β = 1 + 2 cos(2π/7), Mβ(X) = X³ − 2X² − X + 1
• N = 9 or N = 18: β = 1 + 2 cos(π/9), Mβ(X) = X³ − 3X² + 1.

A complete classification of cyclotomic Pisot numbers of degree ≤ 4 was given by Bell and Hare in [5].

5.3. Beta-lattices in the plane

Let β be a cyclotomic Pisot number with order-N symmetry. Then Z[ζ] = Z[β] + Z[β]ζ, with ζ = e^{2iπ/N}, is a ring invariant under rotation of order N (see [4]). This ring is the natural framework for two-dimensional structures having β as scaling factor and 2π/N as rotational symmetry. More generally, let β be a Pisot number; a beta-lattice is a point set

Γ = Σ_{i=1}^{d} Zβ eᵢ ,

where (eᵢ) is a basis of Rᵈ. Such a set is a Meyer set with self-similarity factor β. Observe that β-lattices are based on β-integers as lattices are based on integers. So β-lattices are good frames for the study of quasiperiodic point-sets and tilings; see [18]. Examples of beta-lattices in the plane are point-sets of the form

Γq(β) = Zβ + Zβ ζ^q ,
with β a cyclotomic Pisot unit of order N, for 1 ≤ q ≤ N − 1. Note that the latter are not rotationally invariant. Examples of rotationally invariant point-sets based on beta-integers are

Λq = ⋃_{j=0}^{N−1} Γq(β) ζ^j ,   1 ≤ q ≤ N − 1 ,

and

Zβ[ζ] = ⋃_{j=0}^{N−1} Zβ ζ^j .
All these sets are Meyer sets.
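The data of Section 5.2 are easy to verify numerically. The following sketch (floating point only; an exact treatment would use algebraic-number arithmetic) checks that each listed β, with the angle 2π/7 in the 7-fold case, is a root of the stated minimal polynomial Mβ.

```python
import math

def poly(coeffs, x):
    """Evaluate a polynomial given by its coefficient list
    (highest degree first) at x, by Horner's rule."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

# (beta, coefficients of M_beta) for the cyclotomic Pisot units of Sec. 5.2
cases = [
    ((1 + math.sqrt(5)) / 2,            [1, -1, -1]),     # N = 5, 10: X^2 - X - 1
    (1 + math.sqrt(2),                  [1, -2, -1]),     # N = 8:     X^2 - 2X - 1
    (2 + math.sqrt(3),                  [1, -4,  1]),     # N = 12:    X^2 - 4X + 1
    (1 + 2 * math.cos(2 * math.pi / 7), [1, -2, -1, 1]),  # N = 7, 14: X^3 - 2X^2 - X + 1
    (1 + 2 * math.cos(math.pi / 9),     [1, -3,  0, 1]),  # N = 9, 18: X^3 - 3X^2 + 1
]

for beta, coeffs in cases:
    assert beta > 1                       # a Pisot number is > 1
    assert abs(poly(coeffs, beta)) < 1e-9  # beta is a root of M_beta
```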
Figure 4. The τ -lattice Γ1 (τ ) with points (left), and its trivial tiling made by joining points along the horizontal axis, and along the direction defined by ζ.
In the particular case where β is a quadratic Pisot unit, the set of β-integers Zβ can be equipped with an internal additive law, which gives it an abelian group structure [11].
References [1] S. Akiyama, Pisot number system and its dual tiling, this Volume. [2] S. Akiyama, F. Bassino, and Ch. Frougny, Arithmetic Meyer sets and finite automata, Information and Computation 201 (2005), 199-215. [3] A. Avizienis, Signed-digit number representations for fast parallel arithmetic, IRE Transactions on electronic computers 10 (1961), 389–400. [4] D. Barache, B. Champagne and J.P. Gazeau, Pisot-Cyclotomic Quasilattices and their Symmetry Semi-groups, in Quasicrystals and Discrete Geometry (J. Patera ed.) Fields Institute Monograph Series, Volume 10, Amer. Math. Soc., (1998).
[5] J.P. Bell and K.G. Hare, A classification of (some) Pisot-cyclotomic numbers, J. Number Theory 115 (2005), 215–229. [6] A. Bès, An extension of the Cobham-Semënov Theorem. Journal of Symbolic Logic 65 (2000), 201–211. [7] D. Berend and Ch. Frougny, Computability by finite automata and Pisot bases, Math. Systems Theory 27 (1994), 274–282. [8] V. Berthé and A. Siegel, Tilings associated with beta-numeration and substitution, Integers: electronic journal of combinatorial number theory 5 (2005), A02. [9] A. Bertrand, Développements en base de Pisot et répartition modulo 1, C.R.Acad. Sc., Paris 285 (1977), 419–421. [10] A.D. Booth, A signed binary multiplication technique, Quart. J. Mech. Appl. Math. 4 (1951), 236–240. ˇ Burdík, Ch. Frougny, J.-P. Gazeau, R. Krejcar, Beta-integers as natural counting systems [11] C. for quasicrystals, J. Phys. A, Math. Gen. 31 (1998), 6449–6472. [12] A. Cauchy, Sur les moyens d’éviter les erreurs dans les calculs numériques, C. R. Acad. Sci. Paris 11 (1840), 789–798. Reprinted in A. Cauchy, Oeuvres complètes, 1è série, Tome V, Gauthier-Villars, 1885, pp. 431–442. [13] C.Y. Chow and J.E. Robertson, Logical design of a redundant binary adder, Proc. 4th Symposium on Computer Arithmetic, I.E.E.E. Computer Society Press (1978), 109–115. [14] A. Cobham, On the base-dependence of sets of numbers recognizable by finite automata. Math. Systems Theory 3 (1969), 186–192. [15] J. Duprat, Y. Herreros, and S. Kla, New redundant representations of complex numbers and vectors. I.E.E.E. Trans. Computers C-42 (1993), 817–824. [16] F. Durand, A generalization of Cobham’s Theorem. Theory of Computing Systems 31 (1998), 169–185. [17] S. Eilenberg, Automata, Languages and Machines, vol. A, Academic Press, 1974. [18] A. Elkharrat, Ch. Frougny, J.P. Gazeau, J.L. Verger-Gaugry, Symmetry groups for betalattices, Theoret. Comput. Sci. 319 (2004), 281–305. [19] M.D. 
Ercegovac, On-line arithmetic: An overview, Real time Signal Processing VII SPIE 495 (1984), 86–93. [20] M.D. Ercegovac and T. Lang, Digital Arithmetic, Morgan Kaufmann Publishers - An Imprint of Elsevier Science, 2004. [21] A.S. Fraenkel, Systems of numeration, Amer. Math. Monthly 92(2) (1985), 105–114. [22] Ch. Frougny, Representation of numbers and finite automata, Math. Systems Theory 25 (1992), 37–60. [23] Ch. Frougny, On-line finite automata for addition in some numeration systems, Theoretical Informatics and Applications 33 (1999), 79–101. [24] Ch. Frougny, On-line digit set conversion in real base, Theoret. Comp. Sci. 292 (2003), 221– 235. [25] Ch. Frougny and J. Sakarovitch, Automatic conversion from Fibonacci representation to representation in base ϕ, and a generalization, Internat. J. Algebra Comput. 9 (1999), 351–384. [26] Ch. Frougny and A. Surarerks. On-line multiplication in real and complex base, Proc. IEEE ARITH 16, I.E.E.E. Computer Society Press (2003), 212–219. [27] J.-P. Gazeau, Pisot-cyclotomic integers for quasi-lattices, in The mathematics of long-range aperiodic order (Waterloo, ON, 1995), Kluwer Acad. Publ., Dordrecht, 1997, 175–198. [28] L. S. Guimond, Z. Masáková, E. Pelantová, Arithmetics on beta-expansions, Acta Arithmetica 112 (2004) 23–40. [29] D.E. Knuth, An Imaginary Number System. C.A.C.M. 3 (1960), 245–247. [30] I. Koren, Computer Arithmetic Algorithms, Second Edition, A. K. Peters, Natick, MA, 2002. [31] M. Lothaire, Algebraic Combinatorics on Words, Cambridge University Press, 2002. [32] Y. Meyer, Nombres de Pisot, nombres de Salem et analyse harmonique, Lecture Notes in Math. 117, Springer-Verlag (1970).
[33] F. Morain and J. Olivos, Speeding up the computations on an elliptic curve using additionsubtraction chains, RAIRO Inform. Théor. Appl. 24 (1990), 531–543. [34] J.-M. Muller, Discrete basis and computation of elementary functions, I.E.E.E. Trans. on Computers, C-35 (1985). [35] J.-M. Muller, Some characterizations of functions computable in on-line arithmetic, I.E.E.E. Trans. on Computers, 43 (1994), 752–755. [36] A.M. Nielsen and J.-M. Muller, Borrow-Save Adders for Real and Complex Number Systems. In Proceedings of the Conference Real Numbers and Computers, Marseilles, 1996, 121–137. [37] W. Parry, On the β-expansions of real numbers, Acta Math. Acad. Sci. Hungar. 11 (1960), 401–416. [38] E. Pelantová and Z. Masáková, Quasicrystals: algebraic, combinatorial and geometrical aspects, this Volume. [39] W. Penney, A “binary" system for complex numbers. J.A.C.M. 12 (1965), 247–248. [40] A. Rényi, Representations for real numbers and their ergodic properties. Acta Math. Acad. Sci. Hungar. 8 (1957), 477–493. [41] T. Safer, Radix Representations of Algebraic Number Fields and Finite Automata, in Proceedings Stacs’98, L.N.C.S. 1373 (1998), 356–365. [42] J. Sakarovitch, Introduction to the Theory of Finite Transducers, this Volume. [43] A. Surarerks, Digit Set Conversion by On-line Finite Automata, Bulletin of the Belgian Mathematical Society Simon Stevin 8 (2001), 337–358. [44] K.S. Trivedi and M.D. Ercegovac, On-line algorithms for division and multiplication, I.E.E.E. Trans. on Computers C 26 (1977), 681–687.
Physics and Theoretical Computer Science J.-P. Gazeau et al. (Eds.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
An Introduction to the Theory of Finite Transducers

Jacques Sakarovitch *
CNRS / ENST, Paris

Abstract. These notes give a brief introduction to the theory of finite automata with output, called transducers, and, to that end, to the part of classical automata theory which is necessary. The main results that shape the theory and give it its significance are stated, without proofs, and are illustrated with a few examples, thus setting the general framework of this theory.

Keywords. Finite automata, transducers, word functions
Introduction

The notes that follow correspond to the two lectures I gave at the NATO-ASI School Physics and Computer Science held in Cargèse from 17 to 29 October 2005. The subject, the study of finite automata with output, would certainly deserve a full book, and not too small a one; within the page limit of a chapter in these proceedings, I am bound to stay both sketchy and picky. The text is adapted from excerpts of my book entitled Elements of Automata Theory, to be published soon by Cambridge University Press, which is a translation of [11]. In contrast, that book is fully comprehensive and probably too rich for someone who has never touched automata theory. This introduction will serve to set the framework, give a few examples, but not many, and state, in general without proofs, the main results that shape the theory and give it its significance.
1. From Turing machines to matrix representations

In the beginning, Alan Turing invented the Turing machine, which happened to be the most general possible definition of what is computable. Turing's goal was to establish undecidability results [15]; no surprise then that almost everything about Turing machines is undecidable, to begin with knowing whether a computation of such a machine eventually finishes or not, the halting problem. To be in the midst of an undecidable world is certainly not what people whose aim is to build computation models, and ultimately computing machines, want.

* Correspondence to: Jacques Sakarovitch, École Nationale Supérieure des Télécommunications, 46, rue Barrault, 75634 Paris Cedex 13 (France), E-mail:
[email protected]
In 1959, Michael Rabin and Dana Scott proposed to restrict the Turing machine's abilities and studied various possibilities [9]. Doing this, they met a model introduced and studied by Stephen Kleene a few years earlier under the name of finite automaton [8]. This model has been given several descriptions; I choose the labelled finite graph description as it is the one that is apt to both generalisation and mathematical abstraction. One of these is the translation into the matrix representation.

1.1. From Turing machines to finite automata

A Turing machine consists, as in Figure 1, of a finite control equipped with a finite memory, represented by a finite number of states in which the finite control can be, and an arbitrarily long input tape, which is divided into cells in which are written symbols from the input alphabet. The finite control is connected to a read/write head which moves over the tape and reads or writes a symbol in the cell over which it is placed. At each step of the computation the finite control, considering the state p in which it is and the symbol a read by the read/write head, writes, or not, a new symbol b in place of a, changes to a state q, and moves the head one square to the left or to the right.

Fig. 1: The Rabin–Scott machine

The (drastic) limitation proposed by Rabin and Scott was to restrict the read/write head to move in only one direction, from left to right say (and thus not to write anymore). Because there is one input tape, this type of machine is called a one-way one-tape automaton. If the state q is determined uniquely by the state p and the letter a, the machine is deterministic; otherwise, the automaton is non-deterministic. At the start of a computation, a word is written on the input tape, the finite control is in a distinguished state, called the initial state, and the read head is placed on the first square of the input tape. A computation terminates, after a series of steps like that described above, when the read head reaches the square in which is written an endmarker, represented in Figure 1, according to common usage, by a $. The computation is termed successful if the state in which the finite control finds itself after reading the endmarker is a special state, called the final state. A word is accepted by the automaton if it can be read by a successful computation.

The characteristic of this model is the finiteness of the information that can be used along a computation, even if this computation, and the input data, have an unbounded size. It will be useful, and more effective, to give a more abstract model of this kind of machine. Nevertheless we still keep in our terminology input alphabet, final state, reading from left to right, traces if not of the description of physical computation devices then at least of their modelling in terms less removed from reality.

1.2. Automata as labelled graphs

An alphabet A is a non-empty set, a priori finite, of letters. The set of finite sequences of letters in A, called words, is equipped with concatenation, an associative operation
called product, and is thus a monoid, denoted A∗, whose identity element is the empty word, denoted 1A∗. An automaton is a directed graph which is labelled with letters of the alphabet, and in which two subsets of vertices are distinguished. The rest of the subsection elaborates this basic definition, which can be found in other books such as [3,1,7]. An automaton A is specified by giving the following elements:

• a non-empty set Q, called the set of states of A;
• a set A, also non-empty, called the (input) alphabet of A;
• two subsets I and T of Q; I is the set of initial states and T is the set of final or terminal states of A;
• a subset E of Q×A×Q, called the set of transitions of A.

We write, therefore, fairly naturally, A = ⟨Q, A, E, I, T⟩, a notation we keep for the rest of these notes, and we say that A is an automaton over (the alphabet) A.

Example 1.1
The automata A1, P2 and Z1; cf. Figure 2.
Figure 2. Three automata: (a) the automaton A1, (b) the automaton P2, (c) the automaton Z1.
Definition 1.1 Let A be a finite alphabet. An automaton over A is finite if and only if its set of states is finite. □

If e = (p, a, q) is a transition of A, that is, if e is in E, we say that a is the label of e and we write p −a→ q, or p −a→_A q where it might be ambiguous which automaton we are considering. We also say that p is the source, and q the destination, of the transition e. A computation c in A is a sequence of transitions where the source of each is the destination of the previous one, which can be written thus:

c := p0 −a1→ p1 −a2→ p2 ⋯ −an→ pn .

The state p0 is the source of the computation c, and pn its destination. The length of the computation c is n, the number of transitions which make up c. The label of c is the
concatenation (product) of the labels of the transitions of c; in the above case, the label of c is a1 a2 ⋯ an and is written thus:

c := p0 −a1a2⋯an→ pn   or   p0 −a1a2⋯an→_A pn .

A computation in A is successful if its source is an initial state and its destination is a final state. A word in A∗ is called accepted or recognised by A if it is the label of a successful computation in A. The language accepted, or recognised, by A, also called the behaviour of A, written L(A) or |A|, is the set of words accepted (or recognised) by A:

L(A) = {f ∈ A∗ | ∃p ∈ I, ∃q ∈ T,  p −f→_A q} .
Two automata are equivalent if they recognise the same language.

Example 1.1 (continued) L(A1) is the set of words in {a, b}∗ that contain a factor ab. L(P2) is the set of binary representations of numbers divisible by 3. The language recognised by the automaton Z1 is the set of words in {a, b}∗ that contain as many a's as b's: Z1 = L(Z1) = {f ∈ {a, b}∗ | |f|a = |f|b}. □

Definition 1.2 Let A be a finite alphabet. A language L of A∗ is called rational if there exists a finite automaton A over the alphabet A such that L = L(A). The family¹ of rational languages of A∗ is written Rat A∗. □

An automaton is trim if every state belongs to a successful computation. An automaton is ambiguous if some word is the label of two distinct successful computations, and unambiguous otherwise.

1.3. Matrix representation of automata

Let A = ⟨Q, A, E, I, T⟩ be a finite automaton over A; the (boolean) matrix representation of A is the triple (λ, μ, ν) where μ : A∗ → B^{Q×Q} is the morphism defined by

aμ_{p,q} = 1 if (p, a, q) ∈ E, and 0 otherwise, for every a in A,

and where λ ∈ B^{1×Q} and ν ∈ B^{Q×1} are the two (boolean) vectors, respectively row and column, of dimension Q, defined by

λ_q = 1 if q ∈ I, and 0 otherwise,   and   ν_p = 1 if p ∈ T, and 0 otherwise.
¹ Classically, Rat A∗ is defined as the smallest family of subsets of A∗ that contains the finite subsets and that is closed under union, product and generated submonoid. And it is a theorem, Kleene's theorem indeed, that this family is precisely the family of languages accepted by finite automata. We shall not be interested in this side of the theory and we thus take this shortcut.
Conversely, a triple (λ, μ, ν) where μ : A∗ → B^{Q×Q} is a morphism and where λ ∈ B^{1×Q} and ν ∈ B^{Q×1} are two (boolean) vectors completely defines the automaton A whose matrix representation is (λ, μ, ν). We can thus write A = (λ, μ, ν) as well. The morphism μ warrants being called a representation of the automaton A since, for any word f, the entry (p, q) of the matrix fμ is 1 if and only if there exists in A a computation from p to q labelled f; that is, for every f in A∗:

fμ_{p,q} = 1 if p −f→_A q, and 0 otherwise.
From this we deduce:

Property 1.1  L(A) = {f ∈ A∗ | λ·fμ·ν = 1} = {r ∈ B^{Q×Q} | λ·r·ν = 1} μ⁻¹ .
Example 1.1 (continued) The representation (λ1, μ1, ν1) of the automaton A1 (Figure 2(a)), which recognises L1 = A∗abA∗, is (rows separated by semicolons)

λ1 = (1 0 0) ,   aμ1 = (1 1 0 ; 0 0 0 ; 0 0 1) ,   bμ1 = (1 0 0 ; 0 0 1 ; 0 0 1) ,   ν1 = (0 0 1)ᵗ .

We compute, for example,

(abab)μ1 = (1 0 1 ; 0 0 0 ; 0 0 1) ,

from which λ1 · (abab)μ1 · ν1 = 1. □
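The boolean matrix calculus of this example is easy to machine-check; a minimal sketch (names ours), with the matrices of (λ1, μ1, ν1) for an automaton recognising the words that contain the factor ab, as A1 does:

```python
# Boolean matrix representation of an automaton recognising A* ab A*
# (three states p, q, r, matrix rows and columns in that order).
LAM = [1, 0, 0]   # row vector: p is the unique initial state
NU = [0, 0, 1]    # column vector: r is the unique final state
MU = {
    "a": [[1, 1, 0], [0, 0, 0], [0, 0, 1]],
    "b": [[1, 0, 0], [0, 0, 1], [0, 0, 1]],
}

def bool_mat_mul(m1, m2):
    """Boolean matrix product: entry (i, j) is an OR of ANDs."""
    n = len(m1)
    return [[int(any(m1[i][k] and m2[k][j] for k in range(n)))
             for j in range(n)] for i in range(n)]

def accepts(word):
    """lambda . (word)mu . nu = 1 iff the word is recognised."""
    m = [[int(i == j) for j in range(3)] for i in range(3)]  # (1_{A*})mu = identity
    for letter in word:
        m = bool_mat_mul(m, MU[letter])
    return any(LAM[i] and m[i][j] and NU[j]
               for i in range(3) for j in range(3))

assert accepts("abab") and not accepts("ba")
```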
Let A and B be two automata with representations (λ, μ, ν) and (η, κ, ξ), of dimensions Q and R respectively. The product A×B is the automaton whose representation is (λ ⊗ η, μ ⊗ κ, ν ⊗ ξ), where ⊗ is the tensor product; the verification that μ ⊗ κ is a morphism is straightforward. It then holds that L(A×B) = L(A) ∩ L(B). An automaton A is unambiguous iff the trim part of A×A is isomorphic to A. It is thus decidable whether an automaton is ambiguous or not.

1.4. Kleene Theorem

An automaton A is deterministic if its matrix representation (λ, μ, ν) is row-monomial, that is, λ has only one non-zero entry and for every letter a the matrix aμ is row-monomial. The phase space of a finite automaton A = ⟨Q, A, E, I, T⟩ = (λ, μ, ν) is the set R_A = {λ·fμ | f ∈ A∗}. It is a subset of B^Q and is thus finite. The phase space R_A of A = (λ, μ, ν) is naturally the set of states of a (finite) deterministic automaton A_det equivalent to A. The unique initial state is λ = λ·(1A∗)μ, a state λ·fμ is final iff λ·fμ·ν = 1, and all transitions of A_det are of the form

λ·fμ −a→_{A_det} λ·(fa)μ .
We have thus proved that any finite automaton is equivalent to a (finite) deterministic one.
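The phase space construction above is precisely the classical subset construction; a compact sketch (names ours), with the reachable rows λ·fμ encoded as frozensets of states:

```python
from collections import deque

def determinise(init, final, delta, alphabet):
    """Subset construction: the states of the result are the reachable
    rows lambda . f mu, i.e. the sets of states reachable from `init`."""
    start = frozenset(init)
    trans, finals = {}, set()
    queue, seen = deque([start]), {start}
    while queue:
        s = queue.popleft()
        if s & final:                 # the row hits a final state
            finals.add(s)
        for a in alphabet:
            t = frozenset(q for p in s for q in delta.get((p, a), ()))
            trans[(s, a)] = t
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return start, finals, trans

# An A1-like automaton: states p, q, r; words containing the factor ab.
delta = {("p", "a"): {"p", "q"}, ("p", "b"): {"p"},
         ("q", "b"): {"r"}, ("r", "a"): {"r"}, ("r", "b"): {"r"}}
start, finals, trans = determinise({"p"}, {"r"}, delta, "ab")

def det_accepts(word):
    s = start
    for a in word:
        s = trans[(s, a)]
    return s in finals
```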
Definition 1.3 Let A be a finite alphabet. A language L of A∗ is called recognisable if there exists a morphism α : A∗ → N, where N is a finite monoid, such that L = Lαα⁻¹. The family of recognisable languages of A∗ is written Rec A∗. □

It then holds:

Theorem 1.1 [Kleene 56]  If A is finite, Rat A∗ = Rec A∗.
In particular, this implies that Rat A∗ is a Boolean algebra and that it is decidable whether two finite automata over A are equivalent or not.
2. Finite transducers

What makes the richness of the finite automaton model, what explains its ubiquity, is the possibility of choosing the labels of transitions in structures other than free monoids. Many properties are changed then, but some remain and new ones appear. This is the case with transducers, which remarkably exemplify automata theory, the variety of its methods and of its applications; “automata with output” admit a particularly elementary presentation, and yet some of their properties involve deep algebraic methods.

2.1. Definitions

An automaton A over a monoid M, A = ⟨Q, M, E, I, T⟩, is defined as above, except that the labels of the transitions are taken in M. The behaviour of A, denoted |A|, is the set of labels of its successful computations, a subset of M. The automaton A is finite if E, the set of transitions, is finite. A subset of a monoid M is rational iff it is accepted by a finite automaton over M, and the family of rational subsets of M is denoted Rat M.

A (finite) transducer T is a (finite) automaton over a direct product of free monoids A∗×B∗ (or A∗1×A∗2×⋯×A∗n). The behaviour of a transducer T over A∗×B∗ is thus a relation from A∗ into B∗, a rational relation if T is finite (cf. [3,1]). In the sequel, we consider only finite transducers, which we simply call transducers. In a transducer T over A∗×B∗, if we project every label on the first component, we get an automaton over A∗ called the underlying input automaton of T, which recognises the domain (thus rational) of the relation θ realised by T. Similarly, the image of θ is in Rat B∗. Exchanging the components of the labels in T yields a transducer that realises the inverse of θ, a rational relation from B∗ into A∗.

Example 2.1 The automaton P2 (of Figure 2(b)) is easily transformed into a transducer Q2 (Figure 3) whose behaviour is the set of pairs (f, g) where f is the binary writing of 3n and g the binary writing of n, possibly prefixed by one or two 0's. □
Figure 3. The transducer Q2 which computes the quotient by 3 of a number written in binary.
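Read most-significant bit first, each state of Q2 stores the remainder, modulo 3, of the prefix read so far, and the quotient bits are emitted on the fly; a sketch of that computation (function names ours):

```python
def quotient_by_3(bits):
    """Run the divide-by-3 transducer on a list of bits (MSB first).
    State = remainder of the prefix read so far, modulo 3; the input
    is accepted (final state 0) exactly when it writes a multiple of 3."""
    state, out = 0, []
    for b in bits:
        t = 2 * state + b
        out.append(t // 3)   # emitted output bit (always 0 or 1 since t <= 5)
        state = t % 3        # next state
    assert state == 0, "input is not a multiple of 3"
    return out

def to_bits(n):
    return [int(c) for c in bin(n)[2:]]

# e.g. 3n = 21 = 10101 in binary; the output 00111 is 7 = 21/3,
# prefixed by leading 0's as in Example 2.1
assert quotient_by_3(to_bits(21)) == [0, 0, 1, 1, 1]
```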
A Rat B∗-representation of A∗ is a triple (λ, μ, ν) where μ : A∗ → (Rat B∗)^{Q×Q} is a morphism (Q being a finite set) and where λ and ν are two vectors, respectively row and column, of dimension Q, with entries in Rat B∗. Such a representation realises the relation θ from A∗ into B∗ defined by fθ = λ·fμ·ν for every f in A∗.

Theorem 2.1 [Kleene–Schützenberger 60]  A relation θ from A∗ into B∗ is rational iff there exists a Rat B∗-representation of A∗ which realises θ.

2.2. The pitfall of undecidability

Contrary to what happens in A∗, Rat A∗×B∗ is not a Boolean algebra: it is not closed under intersection. Figure 4 shows two automata; the behaviour of the one on the left-hand side is V1 = {(aᵐbⁿ, cᵐ) | n, m ∈ N}, and the behaviour of the one on the right-hand side is W1 = {(aᵐbⁿ, cⁿ) | n, m ∈ N}. The intersection V1 ∩ W1 = {(aⁿbⁿ, cⁿ) | n ∈ N} is not rational since its domain {aⁿbⁿ | n ∈ N} is not.
Figure 4. Two automata over {a, b}∗ ×{c}∗
Bad news never comes alone. It is not only the case that the intersection of two rational relations is not necessarily rational: it is even undecidable whether the intersection of two rational relations is empty or not [9]. Along the same lines, it is shown that it is undecidable whether two given finite transducers are equivalent or not [5].

2.3. The composition theorem

In spite of this shortcoming, the rational relations form nevertheless a family with many properties, and finite transducers provide a relevant modelling of computing machines. This is due on the one hand to the fact that rational relations make up a framework inside which subfamilies with stronger properties are studied more specifically, as we shall do below for rational functions, and on the other hand to the stability of the family, which is expressed through the following closure result.
The composition product of two rational relations
From which we deduce that the image of a rational language by a rational relation is a rational language. There are several proofs for Theorem 2.2 which correspond indeed to slightly different statements. We sketch and examplify here the proof based on the composition product of representations [12]. Let μ : A∗ → Rat B ∗ Q×Q and κ : B ∗ → Rat C ∗ R×R be two morphisms. It is first shown that if L is in Rat B ∗ then Lκ is in Rat C ∗ (as for relations, κ is extended additively to subsets of B ∗ ). Moreover κ is applied component wise to any matrix (and thus vector) whose entries are subsets of B ∗ . The composition product of μ by κ, denoted π = μ ◦ κ , is the mapping π : A∗ → Rat C ∗ (Q×R)×(Q×R) defined by
178
J. Sakarovitch / An Introduction to the Theory of Finite Transducers
∀w ∈ B ∗
(w)μ ◦ κ = [wμ]κ .
(1)
It is then shown that μ ◦ κ is a morphism. The composition product of the representations (ζ, π, ω) = (λ, μ, ν) ◦ (η, κ, χ) is defined by the formulas: π =μ◦κ,
ζ = η · [λ]κ
ω = [ν]μ · χ .
and
(2)
Theorem 2.2 then follows from:

Theorem 2.3  Let θ : A∗ → B∗ and σ : B∗ → C∗ be two rational relations that are realised respectively by the representations (λ, μ, ν) and (η, κ, χ); then the composed relation θσ : A∗ → C∗ is realised by the composition product (ζ, π, ω) = (λ, μ, ν) ◦ (η, κ, χ).
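In the special case of letter-to-letter sequential transducers, such as Q2 of Example 2.1, the pairing of states behind Theorem 2.2 takes a particularly simple form; a sketch (representation and names ours) that composes the divide-by-3 transducer with itself:

```python
# A letter-to-letter sequential transducer: (initial state, transition map)
# with delta[state, letter] = (output letter, next state).
DIV3 = (0, {(s, b): ((2 * s + b) // 3, (2 * s + b) % 3)
            for s in range(3) for b in (0, 1)})

def compose(t1, t2):
    """Composition: run t1 and feed its output, letter by letter, into t2.
    The states of the result are pairs of states, as in Theorem 2.2."""
    (i1, d1), (i2, d2) = t1, t2
    states2 = {s for (s, _) in d2}
    delta = {}
    for (s1, b), (out1, n1) in d1.items():
        for s2 in states2:
            out2, n2 = d2[(s2, out1)]
            delta[((s1, s2), b)] = (out2, (n1, n2))
    return ((i1, i2), delta)

def run(t, bits):
    state, d = t
    out = []
    for b in bits:
        o, state = d[(state, b)]
        out.append(o)
    return out

DIV9 = compose(DIV3, DIV3)  # dividing by 3 twice divides by 9
```

For instance, running DIV9 on the binary writing of 45 yields 000101, the writing of 5 prefixed by leading 0's.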
Figure 5. Two transducers: (a) the transducer D2, (b) the transducer E2.
Example 2.2 The transducers D2 and E2 shown in Figure 5 have the representations (λ2, μ2, ν2) and (η2, κ2, ξ2) respectively. The composition product of (λ2, μ2, ν2) by (η2, κ2, ξ2) gives the representation (χ2, π2, ψ2) = (λ2, μ2, ν2) ◦ (η2, κ2, ξ2), which corresponds to the transducer shown in Figure 6. □
Figure 6. The composition of D2 by E2
3. Functional transducers

A transducer is functional if its behaviour is a functional (rational) relation, also called a rational function, that is, if the image of any word in the domain is a singleton. As we shall see below, and in contrast with the general case of rational relations, equivalence is decidable for rational functions, which makes this subfamily even more appealing. The first question which arises is thus to know whether, given a rational relation defined by a (finite) transducer which realises it, we can effectively recognise that it is a function, and thereby exploit the functional properties which we shall prove. Consider for instance the transducer G1 in Figure 7; it is not obvious that the relation which it realises is functional.
Figure 7. Is the relation realised by the transducer G1 functional?
3.1. Deciding functionality

Let θ : A∗ → B∗ be a rational relation and (λ, μ, ν) an arbitrary trim representation of θ. If θ is a function, then all the non-zero entries (of the vectors λ and ν and) of the matrices fμ, for all f in A∗, are monomials. However, since the property must hold for all matrices fμ, this characterisation is not in itself effective and does not enable us to decide the functionality of a relation, which we shall now do. A transducer T = ⟨Q, A, B∗, E, I, T⟩ is not functional if and only if there exist in T two distinct successful computations
c := q0 −a1|u1→_T q1 −a2|u2→_T ⋯ −an|un→_T qn
and
c′ := q0′ −a1|u1′→_T q1′ −a2|u2′→_T ⋯ −an|un′→_T qn′ ,   (3)
with u1 u2 ⋯ un ≠ u1′ u2′ ⋯ un′. There exists at least one j such that uj ≠ uj′ and, by the above property, at least one j such that qj ≠ qj′. This implies in particular that, ignoring the second component of the labels, the automaton A underlying T is ambiguous. We shall reuse and extend the method of the cartesian square of an automaton, which enables us to decide whether or not an automaton is ambiguous. By definition, the (cartesian) product of T by itself (the square of T) is the transducer T×T from A∗ to B∗×B∗:

T×T = ⟨Q×Q, A, B∗×B∗, F, I×I, T×T⟩ ,

whose transition set F is defined by

F = {((p, r), (a, (u′, u″)), (q, s)) | (p, (a, u′), q) ∈ E and (r, (a, u″), s) ∈ E} .
In particular, the underlying input automaton of T×T is the cartesian square of the underlying input automaton A of T. If A is unambiguous, T is clearly functional, and we have seen that in this case the trim part of A×A, or of T×T, is reduced to its diagonal. To decide whether T is functional when A is ambiguous, we shall extend T×T with a valuation. The valuation describes the conditions under which two words u1 u2 ⋯ un and u1′ u2′ ⋯ un′ are equal or not and, more precisely, what minimal information we must preserve at each step i to be able to reach a conclusion at the final step n. Let B∗ be the output monoid of T, and HB the set defined by

HB = (B∗×1B∗) ∪ (1B∗×B∗) ∪ {0} ,

where 0 is a new symbol. We shall define an action of B∗×B∗ on HB in the following manner. We write u ≼ v if u is a prefix² of v, and start by defining a map ψ from B∗×B∗ to HB by
(u, v)ψ = (v⁻¹u, 1B∗) if v ≼ u ,   (u, v)ψ = (1B∗, u⁻¹v) if u ≼ v ,   and (u, v)ψ = 0 otherwise,

for all u, v in B∗. Intuitively, (u, v)ψ is either the 'lead' (v⁻¹u, 1B∗) of the first component u over the second, or its 'delay' (1B∗, u⁻¹v), or 0, the observation that u and v are not prefixes of the same word and cannot therefore be 'completed' to give the same output. We then verify:

² If v is a prefix of u, then u = v w and, by definition, v⁻¹u = w.
Lemma 3.1  The map ωB from HB×(B∗×B∗) to³ HB , defined by

∀(f,g) ∈ HB \ 0 , ∀(u,v) ∈ B∗×B∗    ((f,g),(u,v))ωB = (fu, gv)ψ   and   (0,(u,v))ωB = 0 ,

is an action (of B∗×B∗ on HB) which we shall call a Lead or Delay action (relative to B∗) and which we shall write from now on as a dot.

We verify in particular that (u,v)ψ = (1B∗, 1B∗) if and only if u = v , that is:

Property 3.1  The stabiliser set of (1B∗, 1B∗) for ωB is the diagonal of B∗×B∗ :

(1B∗, 1B∗) · (u,v) = (1B∗, 1B∗)   ⇐⇒   u = v .    (4)
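The map ψ and the Lead or Delay action can be rendered concretely. The Python sketch below is only an illustration of the definitions (the empty string stands for 1B∗, and None plays the role of the absorbing symbol 0); the function names are mine, not the chapter's notation:

```python
def psi(u, v):
    """(u, v)ψ: lead of u over v, delay of u behind v, or None for 0."""
    if u.startswith(v):
        return (u[len(v):], "")   # the lead (v^-1 u, 1)
    if v.startswith(u):
        return ("", v[len(u):])   # the delay (1, u^-1 v)
    return None                   # u and v diverge

def act(h, u, v):
    """The Lead or Delay action omega_B: h . (u, v)."""
    if h is None:
        return None               # 0 is absorbing
    f, g = h
    return psi(f + u, g + v)
```

On this rendering, Property 3.1 reads: act(("", ""), u, v) equals ("", "") exactly when u = v.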
In what follows, we shall implicitly consider ωB also to be an action of A∗×(B∗×B∗) on HB by setting h · (a,(u,v)) = h · (u,v) for all a; that is, by ignoring the first component. Recall that we say an action δ is a valuation of an automaton A if the product A×δ is in bijection with A. We can now state the promised characterisation.

Proposition 3.2  A transducer T from A∗ to B∗ is functional if and only if the Lead or Delay action ωB is a valuation of the trim part U of T×T such that the value of each final state (of U×ωB) is (1B∗, 1B∗).

Example 3.1  Figure 8 shows the valuation of the cartesian square of the transducer G1 from Figure 7 by the Lead or Delay action relative to x∗. In this case we can identify H{x} with Z and label the states of the square by the corresponding integers. The figure is already crowded and we could not easily add the labels of the transitions without losing legibility. The input label is always a and remains implicit. Instead of an output label of the form (xⁿ, xᵐ) we just indicate the quantity n − m, symbolised by the style of the arrow: 0 by a dotted arrow, 1 by a normal arrow, 2 by a bold arrow, 3 by a double arrow, and negative values by the same types of arrow, but dashed. 2

The construction of Figure 8 is thus in itself a proof that the transducer G1 is functional. More generally, the computations of the square T×T, of its trim part U and of the product of U and ωB are effective: their complexity is proportional to the square of the number of transitions of T, as long as the computations on the words are counted as having a fixed cost, independent of the length of the words. We can therefore state:

Theorem 3.1 [Schützenberger [13]]  It is decidable whether a finite transducer T realises a functional relation.
It follows from the above sketch of proof that the complexity of the decision procedure is O(m²) (if T has m transitions).

³ We shall not forget that if (f,g) ∈ HB \ 0, at least one of f and g is equal to 1B∗.
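The decision procedure can be put into code. The following Python sketch assumes a transducer given as a list of transitions (p, a, u, q) with u a word over the output alphabet; it builds the square, restricts to the trim part, and propagates lead-or-delay values, rejecting on any value conflict, on the value 0, or on a non-trivial value at a final pair. The data format and function names are mine, not the chapter's:

```python
from collections import deque

def psi(u, v):
    """Lead-or-delay value of a pair of output words; None plays 0."""
    if u.startswith(v):
        return (u[len(v):], "")
    if v.startswith(u):
        return ("", v[len(u):])
    return None

def is_functional(trans, initial, final):
    """Decide functionality by valuing the trim part of the square."""
    # transitions of the square (input letters must agree)
    sq = {}
    for (p, a, u, q) in trans:
        for (r, b, v, s) in trans:
            if a == b:
                sq.setdefault((p, r), []).append(((u, v), (q, s)))
    starts = {(i, j) for i in initial for j in initial}
    # accessible pairs
    reach, stack = set(starts), list(starts)
    while stack:
        pr = stack.pop()
        for (_, qs) in sq.get(pr, []):
            if qs not in reach:
                reach.add(qs)
                stack.append(qs)
    # co-accessible pairs, by backward saturation
    co = {pr for pr in reach if pr[0] in final and pr[1] in final}
    changed = True
    while changed:
        changed = False
        for pr in reach - co:
            if any(qs in co for (_, qs) in sq.get(pr, [])):
                co.add(pr)
                changed = True
    trim = reach & co
    # propagate lead-or-delay values over the trim part
    val = {pr: ("", "") for pr in starts & trim}
    queue = deque(val)
    while queue:
        pr = queue.popleft()
        f, g = val[pr]
        for ((u, v), qs) in sq.get(pr, []):
            if qs not in trim:
                continue
            h = psi(f + u, g + v)
            if h is None:
                return False          # value 0 on a useful state
            if qs in val and val[qs] != h:
                return False          # the action is not a valuation
            if qs not in val:
                val[qs] = h
                queue.append(qs)
    # every useful final pair must carry the value (1, 1)
    return all(val[pr] == ("", "")
               for pr in trim if pr[0] in final and pr[1] in final)
```

For instance, a one-state transducer with the two transitions a | x and a | xx is detected as non-functional, while an ambiguous transducer whose two computations produce the same output is accepted.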
Figure 8. The square of G1 is valued by ωB (with B = {x} ): G1 is functional
Remark 3.1  If T is a transducer from A∗ to B∗ which recognises a relation α, the automaton obtained from T×T by ignoring the first component is an automaton over B∗×B∗ whose behaviour is the graph of the relation α⁻¹α. The relation α is a function if and only if α⁻¹α is the restriction of the identity to Im α, and the verification of this last property comes down exactly to the proof of Proposition 3.2. 2

3.2. The family of rational functions

Given two functions θ : A∗ → B∗ and σ : A∗ → B∗, we say that θ is contained in σ (θ ⊆ σ) if, as for relations, the graph of θ is included in that of σ. Since σ is a function, this is the same as saying that, for all f in A∗, if fθ is defined, then fθ = fσ. Recall also that θ ∪ σ is the relation whose graph is the union of the graphs of θ and σ. It is obvious that θ ⊆ σ if Dom θ ⊆ Dom σ and if θ ∪ σ is a function. Theorem 3.1 therefore implies:

Corollary 3.3  The equivalence of rational functions is decidable.
Note, by contrast, that it is undecidable in general whether the intersection of the graphs of two rational functions θ and σ is empty; that is, whether there exists a word f in A∗ such that f θ = f σ . Rational functions can be characterised by a remarkable structure theorem which makes use in its statement of a particular class of functions, the sequential ones, that we consider now.
4. Sequential transducers

The most important family of rational functions, which is so because it corresponds to the most natural model of a physical computer as much as because it is theoretically important, is that of functions realised by a deterministic finite automaton producing an output at each transition. Theoretical efficiency calls for a concept, due to Schützenberger [14], that differs slightly from this down-to-earth intuition.

4.1. Definitions and examples

Definition 4.1  We shall say that a rational function is sequential if it has a row-monomial representation. 2

More explicitly, α : A∗ → B∗ is sequential if there exists a representation (λ, μ, ν) of α such that: (i) the row vector λ has a single non-zero entry; (ii) for each a in A, the matrix aμ is row monomial; that is, each row of aμ has at most one non-zero entry; (iii) the non-zero elements of λ, aμ and ν are words of B∗. In other words, α : A∗ → B∗ is sequential if it is realised by a real-time transducer whose underlying input automaton is deterministic and whose outputs, and initial and final functions, take values in B∗.⁴

Definition 4.2  A rational function is co-sequential if it has a column-monomial representation. 2
More explicitly, α : A∗ → B∗ is co-sequential⁵ if there exists a representation (λ, μ, ν) of α such that: (i) the row vector ν has a single non-zero element; (ii) for each a in A, the matrix aμ is column monomial; (iii) the non-zero elements of λ, aμ and ν are words of B∗. In other words, α : A∗ → B∗ is co-sequential if it is recognised by a real-time left transducer whose underlying input automaton is co-deterministic, or, which is equivalent, by a real-time right transducer whose underlying input automaton is deterministic, and whose outputs and the initial and final functions take values in B∗.

Remark 4.1  I continue to flout established terminology, and here is why: I reserve the qualifiers ‘left’ and ‘right’ for automata (and hence for transducers) which model physical machines, according as they read words from left to right or from right to left respectively. Functions are themselves neither left nor right, but are as they are, and are realised by transducers which can be left or right, with dual properties. 2

⁴ This terminology risks creating confusion for the reader already familiar with the subject, since I call sequential that which is commonly called (following Schützenberger [14]) sub-sequential. I do so because I think that the fundamental object is indeed that which is called here a sequential function, and that it thus merits the basic term.
⁵ In the usual terminology: right sub-sequential function.
Example 2.2 (continued)  The representations (λ2, μ2, ν2) and (η2, κ2, ξ2) are row-monomial and column-monomial respectively: the transducers D2 and E2, and the functions they realise, are sequential and co-sequential respectively. 2

Example 4.1  Addition in base 2. The addition of two numbers is most obviously performed by a machine with three tapes: one for each of the operands and one for the result. In fact, we can first add the operands digit by digit – this operation can obviously be performed by a very simple one-state three-tape automaton – which gives a number written in the alphabet {0, 1, 2}. The operation of addition then comes down to rewriting this word in the alphabet {0, 1} while preserving the numeric value. For example, 1101 + 101 (that is, 13 + 5 = 18) is first written 1202 (that is, 18 = 2³ + 2·2² + 2·2⁰), which must be rewritten as 10010 (that is, 18 = 2⁴ + 2¹). In Figure 9 (b) we show the co-sequential transducer which realises this normalisation on the alphabet {0, 1, 2}. (Since we use digits as letters, we will not give the matrix representation of this example.) In fact, it is not easy to understand the behaviour of a co-sequential (left) transducer (or even of a co-deterministic (left) transducer); that is why we prefer to start by constructing a right sequential transducer – shown in Figure 9 (a) – which therefore reads and writes words from right to left.⁶ It is then enough to reverse all the arrows, including the initial and final arrows, to obtain the transducer we want. 2
Figure 9. Two transducers for addition in base 2: (a) the right transducer; (b) the left transducer
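The normalisation of Example 4.1 can be sketched directly: scanning the word over {0, 1, 2} from right to left while propagating a carry reproduces the behaviour of a right sequential transducer with a no-carry and a carry state. This Python rendering is an illustration of the idea, not a general transducer implementation:

```python
def normalize(word):
    """Rewrite a word over {0, 1, 2} as an ordinary binary word with
    the same numeric value, scanning right to left with a carry."""
    out, carry = [], 0
    for d in reversed(word):
        total = int(d) + carry
        out.append(str(total % 2))   # digit written on the output
        carry = total // 2           # at most 1, since total <= 3
    if carry:
        out.append("1")              # final output when a carry is left
    return "".join(reversed(out))
```

For instance, normalize("1202") yields 10010, the binary writing of 18, in agreement with the example in the text.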
Example 2.1 (continued)  The transducer Q2, shown in Figure 3, is both sequential and co-sequential. For the same reason as above, we do not give its matrix representation. More generally, operations on the written forms of numbers, addition and various normalisations, multiplication and division by fixed integers, are an almost inexhaustible field of examples of sequential and co-sequential functions (cf. [6] in this volume). 2

The consistency of the theoretical study of sequential functions is secured by the following.

Theorem 4.1 [Choffrut [2]]  It is decidable whether a functional transducer realises a sequential function.
⁶ The reader should not be unduly disconcerted by this change in the direction of reading, since this is the way in which one is accustomed to perform addition.
4.2. Semi-monomial representations

Since the composition product of two row-monomial (resp. column-monomial) matrices is obviously a row-monomial (resp. column-monomial) matrix, and a representation of the composition of two relations is given by the composition product of the representations, it follows that:

Proposition 4.1  The composition of two sequential (resp. co-sequential) functions is a sequential (resp. co-sequential) function.

Definition 4.3  A matrix m is called (row-column) semi-monomial if there is a block decomposition of m such that m is row monomial as a matrix of blocks and each non-zero block is column monomial. 2

Naturally, we say that a representation is (row-column) semi-monomial if all the matrices λ, ν, and aμ, for all a, are semi-monomial matrices, with respect to the same block decomposition. Just as naturally, we have the dual notions of the column-row semi-monomial matrix and representation: column-monomial matrices of row-monomial blocks. The compatibility of matrix multiplication with their block decompositions implies:

Property 4.1  The product of two semi-monomial matrices (of congruent size and block decomposition) is a semi-monomial matrix.

The formulas for the composition of representations therefore give the two following propositions:

Corollary 4.2  The composition product of a sequential function by a co-sequential function has a row-column semi-monomial representation.

Corollary 4.3  The composition product of a co-sequential function by a sequential function has a column-row semi-monomial representation.

Example 2.2 (continued)  The representation (χ2, π2, ψ2) = (λ2, μ2, ν2) ◦ (η2, κ2, ξ2) is (row-column) semi-monomial. 2

4.3. A structure theorem for rational functions

Rational functions are characterised by the following.

Theorem 4.2 [Elgot & Mezei [4]]  Every rational function from one free monoid to another is the composition of a sequential function with a co-sequential function.
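The closure fact behind Proposition 4.1 can be checked on small examples. Below is a Python sketch of the product of matrices with entries in B∗ ∪ {0} (None standing for 0, concatenation for the product of words), together with a row-monomiality test; this encoding is an assumption of mine, chosen only for illustration:

```python
def mat_mul(M, N):
    """Product of matrices with entries in B* or None (the zero);
    the product of two words is their concatenation."""
    n, m, p = len(M), len(N), len(N[0])
    R = [[None] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            if M[i][k] is None:
                continue
            for j in range(p):
                if N[k][j] is not None:
                    # for monomial operands each entry is a single term
                    R[i][j] = M[i][k] + N[k][j]
    return R

def row_monomial(M):
    """Each row has at most one non-zero entry."""
    return all(sum(e is not None for e in row) <= 1 for row in M)
```

Multiplying two row-monomial matrices with this function yields a row-monomial matrix, as the proposition asserts; the dual check for column-monomial matrices is identical.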
The original proof of this fundamental result is rather obscure. In [3], Eilenberg gave a proof based on the construction of a new model of transducers called the bi-machine. I prefer to derive Theorem 4.2 from another (?) construction, due to Schützenberger as well, that can be stated as follows (cf. [10] also).

Theorem 4.3 [[14]]  Every rational function has a semi-monomial representation.
Based on this result, Theorem 4.2 is the converse of Corollary 4.2:

Proposition 4.4  Let τ : A∗ → B∗ be a rational function with a semi-monomial representation (χ, π, ψ). Then there exist: (i) an alphabet Z; (ii) a sequential function θ : A∗ → Z∗ with a row-monomial representation (λ, μ, ν); and (iii) a co-sequential function σ : Z∗ → B∗ with a column-monomial representation (η, κ, ξ), such that (χ, π, ψ) = (λ, μ, ν) ◦ (η, κ, ξ) (and hence τ = θσ).

Proof. Let (χ, π, ψ) be a semi-monomial representation of dimension Q×R; that is, for all a in A, aπ is a Q×Q matrix of row-monomial blocks, each block being a column-monomial R×R matrix. Then let Z = (Q×A) ∪ Q. The representation (λ, μ, ν), where μ is a morphism from A∗ to (Rat Z∗)Q×Q, is defined by

∀p, q ∈ Q , ∀a ∈ A    aμp,q = (p,a)   if the block aπ(p,R),(q,R) is non-zero,
                      aμp,q = 0       otherwise
(since aπ is semi-monomial, aμ is row monomial), and by

∀q ∈ Q    λq = 1   if the block χ(q,R) is non-zero,
          λq = 0   otherwise,

and

∀p ∈ Q    νp = p .
The morphism κ of the representation (η, κ, ξ) is defined, for elements of Z of the form (p,a), by

∀p ∈ Q , ∀a ∈ A    (p,a)κ = aπ(p,R),(q,R) ,
where q is the unique state such that aμp,q is non-zero. Since aπ is semi-monomial, (p,a)κ is column monomial. We then choose an arbitrary element t of R and, for all p in Q, define pκ as the R×R matrix whose tth column is equal to ψ(p,R) and all of whose other columns are zero. Since ψ is semi-monomial, pκ is column monomial. The two vectors η and ξ are defined by

η = χ(q,R) ,   where χ(q,R) is the non-zero block of χ ,

and

∀s ∈ R    ξs = 1   if s = t ,
          ξs = 0   otherwise.
It follows from Equations (1) and (2) that (χ, π, ψ) = (λ, μ, ν) ◦ (η, κ, ξ) and hence, by Theorem 2.3, that τ = θσ.
Example 2.2 (continued)  Let us decompose the semi-monomial representation (χ2, π2, ψ2), of dimension (Q×R)×(Q×R) with Q = {p, q, r} and R = {s, t, u}, into a product (χ2, π2, ψ2) = (λ′2, μ′2, ν′2) ◦ (η′2, κ′2, ξ′2). We can obviously take (λ′2, μ′2, ν′2) = (λ2, μ2, ν2) and (η′2, κ′2, ξ′2) = (η2, κ2, ξ2). But if we follow the construction of the previous proof, we obtain (writing matrices row by row between vertical bars)

λ′2 = | 1  0  0 | ,

aμ′2 = | 0  (p,a)  0 |        bμ′2 = | (p,b)  0  0    |        ν′2 = | p |
       | 0  (q,a)  0 |               | 0      0  (q,b) |              | q |
       | 0  (r,a)  0 | ,             | 0  (r,b)  0    | ,            | r | ,

then

(p,a)κ′2 = | 1  0  0 |        (p,b)κ′2 = | b  1  0 |
           | 0  1  0 |                   | 0  0  1 |
           | 0  0  1 | ,                 | 0  0  0 | ,

(q,a)κ′2 = | 0   0   0 |      (q,b)κ′2 = | 1  0  0 |
           | aa  0   0 |                 | 0  1  0 |
           | 0   ab  a | ,               | 0  0  1 | ,

(r,a)κ′2 = | 0    0   0  |    (r,b)κ′2 = | aa  0   0 |
           | aab  aa  0  |               | 0   ab  a |
           | 0    0   ab | ,             | 0   0   0 | ,

and, if we choose t as the distinguished state of R,

pκ′2 = | 0  0  0 |        qκ′2 = | 0  0  0 |        rκ′2 = | 0  0   0 |
       | 0  0  0 |               | 0  0  0 |               | 0  0   0 |
       | 0  1  0 | ,             | 0  a  0 | ,             | 0  ab  0 | ,

we end up with

η′2 = | bb  b  1 |   and   ξ′2 = | 0 |
                                 | 1 |
                                 | 0 | .
The transducers corresponding to these matrix representations are given in Figure 10. 2
References

[1] J. Berstel, Transductions and Context-Free Languages, Teubner, 1979.
[2] C. Choffrut, Une caractérisation des fonctions séquentielles et des fonctions sous-séquentielles en tant que relations rationnelles, Theoret. Computer Sci., vol. 5 (1977), 325–337.
[3] S. Eilenberg, Automata, Languages and Machines, vol. A, Academic Press, 1974.
[4] C. C. Elgot and J. E. Mezei, On relations defined by generalized finite automata, IBM J. Res. and Develop., vol. 9 (1965), 47–68.
[5] P. C. Fischer and A. L. Rosenberg, Multitape one-way nonwriting automata, J. Computer System Sci., vol. 2 (1968), 88–101.
[6] C. Frougny. In this volume.
[7] J. E. Hopcroft, R. Motwani and J. D. Ullman, Introduction to Automata Theory, Languages and Computation, Addison-Wesley, 2000.
Figure 10. Decomposition of the semi-monomial representation (χ2, π2, ψ2): (a) the sequential transducer; (b) the co-sequential transducer
[8] S. C. Kleene, Representation of events in nerve nets and finite automata, in: Automata Studies, C. Shannon and J. McCarthy (eds.), Princeton Univ. Press, 1956, 3–41.
[9] M. O. Rabin and D. Scott, Finite automata and their decision problems, IBM J. Res. Develop., vol. 3 (1959), 125–144. Reprinted in Sequential Machines: Selected Papers (E. Moore, ed.), Addison-Wesley, 1965.
[10] J. Sakarovitch, A construction on automata that has remained hidden, Theoret. Computer Sci., vol. 204 (1998), 205–231.
[11] J. Sakarovitch, Éléments de théorie des automates, Vuibert, 2003. English translation: Elements of Automata Theory, Cambridge University Press, to appear.
[12] M. P. Schützenberger, A remark on finite transducers, Inform. and Control, vol. 4 (1961), 381–388.
[13] M. P. Schützenberger, Sur les relations rationnelles entre monoïdes libres, Theoret. Computer Sci., vol. 3 (1976), 243–259.
[14] M. P. Schützenberger, Sur une variante des fonctions séquentielles, Theoret. Computer Sci., vol. 4 (1977), 47–57.
[15] A. M. Turing, On computable numbers, with an application to the Entscheidungsproblem, Proc. London Math. Soc., vol. 42 (1936), 230–265. Addendum vol. 43 (1937), 544–546.
Physics and Theoretical Computer Science J.-P. Gazeau et al. (Eds.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
Generating Languages

Branislav Rovan *
Department of Computer Science, Comenius University, Bratislava

Abstract. This article is based on a talk given to the participants of the NATO Advanced Study Institute on Physics and Computer Science. It introduces some key concepts in computer science, intended mainly for the non-specialist, and illustrates them on a particular topic, that of generating languages. The second part, which brings results on g-systems, may also be of interest to specialists.

Keywords. Language, model of computation, complexity, grammar, generative system.
1. Introduction

This article is based on a talk given to the participants of the NATO Advanced Study Institute on Physics and Computer Science. Most of the participants came from physics and combinatorics, but a number of computer scientists were present as well. The article represents an attempt to present something interesting for everyone without losing most of the participants to either the triviality or the difficulty of the material. It provides a necessarily personal and possibly biased bird's-eye view of computer science and some of its key concepts, and zooms quickly in on a particular area, that of generating languages. This makes it possible to illustrate the way these concepts are studied, and may inspire the use of this way of thinking in other areas.

1.1. Introducing Computer Science

There are several good reasons why it is necessary to introduce computer science. First, everyone "knows" what computer science is. Unfortunately, computer science is mostly perceived as a service discipline only. Moreover, it is usually identified with some of its very narrow subareas. The most frequent opinions are that computer science is about programming, about designing web pages, maybe about computer games. More experienced people are likely to mention databases, networks, computer graphics and more. A false implication of this narrow and shallow perception is that computer science is something that everyone can do. There are many who consider themselves computer experts because they can work with a spreadsheet program, send an e-mail, or browse the web. The important role of computer science in providing insight into, and understanding of, the world of information and information processing is not recognized in general.

* Correspondence to: Branislav Rovan, Department of Computer Science, Comenius University, 84248 Bratislava, Slovakia; E-mail:
[email protected]
B. Rovan / Generating Languages
Second, everyone knows it is important. This is both a blessing and a curse for computer science. There are good job opportunities: many more job openings than there are computer science professionals available. As a result, many computer science positions in industry and various institutions are taken by amateurs. It comes as no surprise that software systems are unreliable, hard to maintain, and usually perform below expectations. Computer science has an excellent application drive. It is hard to find an area which cannot benefit from an interaction with computer science. This is also excellent for computer science, which benefits from these interactions in turn. However, rushing toward ever newer applications leaves little time for strengthening the foundations of computer science. There is a danger of using up the pool of tools, methods and knowledge in core computer science, and application areas may be left with no new ideas to tap. There is usually good money in various grant schemes for computer science. Since this is known, there are many who try to claim some of this money, often successfully, by disguising their projects in computer science clothes. As a result, the competition is more crowded than it perhaps should be.

Third, everyone knows it is useful. However, this usefulness is often overestimated. In some applications it is not useful to introduce computers at all. More frequently, satisfied users of some system do not realize that a better designed system or software could help them much more. New dangers arising from using computers are mostly underestimated and often not realized at all. At the same time we are becoming critically dependent on computer systems in very basic aspects of our daily lives. Very few questions are asked.

What is computer science? There are plenty of definitions (and arguments about them) around.
Most often the definition says it is a science about storing, organizing, manipulating and communicating information (or some variant of this). A different type of definition, stressing the purpose, states that it is a science about using information processing to improve our lives. "Processing information" seems to be the common denominator of these definitions. We shall take a brief look at what it means in the following section. Without attempting to come up with yet another definition of computer science, we conclude by stressing that computer science is a fundamental science that should bring understanding of what information is, what fundamental laws it obeys, how it is to be used and manipulated, etc.

1.2. Processing Information

One can identify the following steps which are needed in order "to do information processing". First, one has to "create" information. This involves representing, encoding, . . . , real-world objects into some machine-readable form, into a form that can be stored and manipulated in a computer. The importance of this step is not always appreciated. And it can make a difference. A frequently cited example is the representation of natural numbers: we could choose Roman numerals or decimal notation, and it is clear which choice will let us multiply numbers more easily. Second, find efficient ways to manipulate information. This involves the design of algorithms, data structures, . . . , but also ways to present (visualize) information. Third, make sure that what was done is what was intended to be done. One has to specify what a (software) system should do and prove it does, best in conjunction with
the second step. Many formal and semiformal specification languages have been designed, and associated logics and proof systems studied. Fourth, one has to really make it work. This means implementation, and involves many of the engineering aspects of computer science. It involves writing programs (transferring the design obtained above into machine-executable code) and of course it needs hardware, the machinery that executes programs.

1.3. Computer Science – the Landscape

Let us now have a brief look at the theory part of the landscape of computer science.

Models of Computation and Algorithms. This is the oldest part of the computer science landscape. The quest for finding algorithms (recipes for performing some sequence of computations) has been with us for hundreds of years and precedes the time of computer science as an established scientific discipline. It was the need to prove that certain things cannot be computed that required a formalisation of the computation process and the study of models of computation. This also made it possible to measure the complexity of computations and problems. This part covers mainly the second step in Section 1.2, but it relates to the first step as well.

Semantics. This part of the landscape relates to the third step in Section 1.2 and strives to provide guarantees that computations behave as expected and deliver the desired results. Such guarantees are becoming increasingly important as the size of software systems exceeds any imaginable limits and our reliance on such systems turns failures into large-scale disasters. Providing guarantees requires first a language in which to state precisely what a system should do. Many such specification languages, both general purpose and specific to certain types of problems, have been developed. Each such language has to be accompanied by a suitable logic in which proofs can be carried out (sometimes supported by a software proof assistant). Despite great progress in this area, our ability to really guarantee something is still limited.
This is best illustrated by reading the "guarantee" on any software package and trying to translate it into the kind of guarantee given when buying a car.

Borderlines to Other Disciplines. It is certainly not an exaggeration to say that computer science borders on every other discipline. In some cases it may be a short and mostly unexplored border, in others a long border with many border crossings. In the fifties the borderline with linguistics was explored. An attempt to use computers to automate language translation made linguists realize that they needed a more precise understanding and description of the structure of sentences of a language. The formal grammar model was defined, which was later used in computer science to define the syntax of programming languages. Currently the borderlines to physics, biology, and psychology appear to be the most vivid ones. Each of these disciplines enjoys the increased power of computers and better simulation and visualisation tools; algorithms for string matching have helped in genome sequencing, etc. On the other hand, computer science looks to nature for inspiration for new methods of computing – computing by nature. This includes quantum computing, inspired by physics, and molecular computing, inspired by replication processes in cells. The vast knowledge accumulated in particle physics has brought about nano-engineering, which is paving the way to the construction of ever smaller devices. Cognitive science explores the borderline "between the computer and the brain" in an effort to help psychology better understand the cognitive processes in the human brain, and to help computer science find new ways to organize and manipulate information.
2. Key Concepts in Computer Science

In Section 1 we first took a very broad view of computer science and then looked at the theory part of its landscape. We shall now continue this zooming-in approach and present some of the key concepts in the "models of computation" part of the landscape. We cannot treat these notions in the formal and precise way that is usual in this part of computer science (an interested reader can find more details in some of the classic books, e.g., [3], or many of the more recent ones, e.g., [8]). However, the informal explanations should suffice for understanding the reasons for having these notions and the problems they should help us to solve.

2.1. Models of Computation

The need for formal models of computation arose in the 1920s and 1930s, when it was realized that an informal notion of an algorithm does not allow one to prove that some problem cannot be solved by an algorithm. We can view a model of computation as a type of computer with a specific (usually very restricted) set of instructions it can execute and possibly some restrictions on the use of its memory. Fixing a particular model (i.e., a particular instruction set and memory type) allows one to study the problems that can or cannot be solved using this particular model of computation. The two models most likely to be known outside the computer science community are Turing Machines and Finite Automata. For studying particular types of problems many other models were introduced: e.g., the sorting tree is a model introduced to study sorting problems, cellular automata were introduced to study problems of self-replication, and there are many more – Boolean circuits, Binary Decision Diagrams, Minsky Machines, RAM, λ-Calculus, Formal Grammars, etc. Each of the models represents a certain approach to solving problems. A natural key question is that of comparing the models, in particular comparing their "power" and comparing the "complexity" of solving problems using a particular model.
In the area of models of computation a problem is encoded as a language. By a language we understand a set of words, i.e., of finite sequences of symbols from a given finite set called an alphabet. For example, the problem of computing squares can be encoded as the language {a^n b^(n^2) | n ≥ 1} of words consisting of a sequence of n letters a followed by n² letters b. If we need to find out whether a particular word a^i b^j belongs to this language, we have to be able to compute i² and compare it to j. Encoding problems as languages allows us to treat seemingly unrelated problems in a uniform way, and in fact to reduce all problems to the same question – whether a given word belongs to a given language (the membership problem).¹ We are now left with the problem of how to define (describe) a language. We need finite descriptions of languages through which they can be manipulated (examined). There are two major approaches to defining languages. First, the procedural approach, in which languages are described by automata ("abstract computers"). A "program" for an automaton specifies the way the automaton should "process" the word in question and indicate whether the word belongs to the language. The second is the denotational approach, in which languages are described by grammars, which specify how to construct (generate) precisely the words in the language, usually exploiting the structure of the words.

¹ We shall leave out the discussion of what constitutes a "reasonable" way of encoding problems by languages.
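For this particular language the membership test is easy to state in code. The following Python sketch (the function name is mine) checks that a word has the shape a^n b^(n^2) for some n ≥ 1:

```python
def in_squares_language(w):
    """Membership test for {a^n b^(n^2) | n >= 1}."""
    i = 0
    while i < len(w) and w[i] == "a":
        i += 1                        # count the leading run of a's
    if i == 0 or any(c != "b" for c in w[i:]):
        return False                  # need some a's, then only b's
    return len(w) - i == i * i        # the number of b's must be i^2
```

Of course, this uses the full power of a programming language; the interesting question in the text is which restricted models of computation can decide such a language.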
Describing languages by logical formulae also belongs to the denotational approach. Besides these two there are other approaches, e.g., using algebra, as shown elsewhere in this book. In this article we shall focus on defining languages via grammars and give more details and examples in Section 3.

We can associate a family of languages L(MOD) to each model of computation MOD, namely the family of those languages for which there exists a description by a particular instance of the model. Denoting by FA the set of all finite automata, we can thus write L(FA) for the family of languages definable by finite automata and similarly L(TM) for the family of languages definable by Turing machines. The family L(FA) is the family of regular languages R (denoted by Rat A∗ in the article by J. Sakarovitch in this book) and the family L(TM) is the family LRE of recursively enumerable languages. It is natural to compare the computational power of models of computation via the families of languages they define. We say that MOD2 is at least as powerful as MOD1 if L(MOD1) ⊆ L(MOD2). We also say that MOD1 can be simulated by MOD2. This means that any problem that can be solved by means of MOD1 can also be solved by means of MOD2. Some models may be equivalent, i.e., they can simulate each other.

2.2. Classifying problems (languages) according to their difficulty

We explained above that it suffices to consider the membership problem, i.e., given a word w and a language L, find out whether w ∈ L. The first major classification of problems (languages) was the classification into decidable and undecidable problems, i.e., problems that can be solved by algorithms and problems for which no algorithmic solution exists. The language (problem) L is said to be decidable if there exists a Turing machine² which halts on every input word and solves the membership problem for L. The subtle difference between decidable and undecidable rests in the halting property.
For a Turing machine which is not guaranteed to halt on every input word, we cannot distinguish between a computation that will never finish and a computation that will declare the input word to belong to the language only after a very long time (since we have no a priori bound on the lengths of computations). An example of a decidable problem is the membership problem for regular languages: decide, for a given finite automaton A and a word w, whether A accepts w, i.e., whether the computation of A declares that w belongs to the language defined by A. An example of an undecidable problem is the membership problem for recursively enumerable languages; this is the same problem with the finite automaton replaced by a Turing machine. A more famous undecidable problem is the halting problem for Turing machines: decide, for a given Turing machine A and a word w, whether the computation of A on w halts. When solving "practical" problems it turned out that some decidable problems are easy to decide, while for others the computational power of existing computers did not seem to suffice. There was a need to refine the classification of problems by subdividing the family of decidable languages into several families. Since the reason for this was the difference in the difficulty of solving problems, it was necessary to introduce some measures of complexity of problems (languages). The most common

² One could argue that Turing machines are indeed a good formal model for the intuitive notion of an algorithm (this is the widely accepted Church-Turing hypothesis) by pointing out that all models that resulted from the efforts to formalize the notion of an algorithm turned out to be equivalent.
and natural measures of complexity are TIME and SPACE, measuring the number of steps in a computation (the number of instructions executed) and the amount of memory used. For a given (nice) function f(n) we can then consider the subset of decidable languages for which there exists a Turing machine deciding the membership problem for words of length n in time at most f(n), and similarly for space.

We now briefly introduce the notion of nondeterminism; the readers are referred to [3] for more information. The model of Turing machines that naturally corresponds to common computer programs is that of deterministic Turing machines. These are the ones we have been using so far, and they are characterized by the property that in each situation there is at most one possibility to continue the computation. Nondeterministic Turing machines may have several choices for continuing a computation. The language defined by such a machine is the set of those words for which there exists a computation leading to acceptance (i.e., a computation that declares the word to be in the language defined by the Turing machine). Other possible computations on the same word, which may not halt or may declare the word not to belong to the language, "do not count". This can be interpreted as if, in each case where the nondeterministic machine can choose how to continue, it makes the choice which leads to acceptance of the word processed. Thus, nondeterministic choice is not to be interpreted as random choice or some kind of "indeterminacy". Nondeterminism in models of computation enables us to forget about "backtracking" when looking for a solution to a problem. Nondeterminism is one of the notions in computer science that are difficult to grasp. There are models of computation for which we do not know whether nondeterminism increases their power, and the situation is even less clear when considering complexity.
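To make the deterministic case concrete, here is a minimal sketch (the automaton is my own illustration, not from the text) of deciding membership for a regular language: a deterministic finite automaton reads its input symbol by symbol and always halts, so the membership problem is decidable.

```python
def accepts(delta, start, finals, word):
    """Run a DFA on `word`; the run always halts, so membership is decidable."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]   # deterministic: exactly one move
    return state in finals

# Transitions of a two-state DFA over {a, b} tracking the parity of a's.
delta = {
    ("even", "a"): "odd",  ("even", "b"): "even",
    ("odd", "a"): "even",  ("odd", "b"): "odd",
}

print(accepts(delta, "even", {"even"}, "abba"))   # True: two a's
print(accepts(delta, "even", {"even"}, "ab"))     # False: one a
```

A nondeterministic automaton would instead allow several moves per (state, symbol) pair, and acceptance would mean that at least one run ends in a final state.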
The complexity classes for Turing machines are families of languages that can be defined by a deterministic or nondeterministic machine within a given bound f(n) on time resp. space. These are usually denoted by DTIME(f(n)), NTIME(f(n)), DSPACE(f(n)), and NSPACE(f(n)). The class of problems that are considered to be "practically solvable", meaning solvable within a reasonable time, is the class P = ∪_{i≥1} DTIME(n^i), i.e., the class of problems for which the time complexity on deterministic Turing machines can be bounded by a polynomial in the size (e.g., length) of the input. Similarly we can define NP = ∪_{i≥1} NTIME(n^i) and PSPACE = ∪_{i≥1} NSPACE(n^i), which can be shown to be the same as ∪_{i≥1} DSPACE(n^i). It is easy to see that P ⊆ NP ⊆ PSPACE, but it is not known whether these classes are different. The problem whether P = NP is the most famous open problem in computer science. Intuitively we see a huge difference between P and NP, not being able to simulate nondeterministic polynomial time machines better than in exponential time. Our inability to prove these classes to be different thus indicates there are still some fundamental aspects of computing we do not understand. Moreover, there are many problems of practical importance that are known to be in NP but not known to be in P, i.e., no practically feasible solution is known for these problems. The traveling salesman problem belongs to this category. It is an optimisation problem in which a shortest route of a salesman through a given set of cities is to be found.

Since practical problems in NP need to be solved, the question "How to deal with hard problems?" had to be addressed. Several approaches were taken. Parallel computing turned out to be of some help but brought no substantial breakthrough. Giving up exact solutions and settling for some approximation brought some success, and practically feasible solutions were found. Unfortunately, the method does not work for all problems in NP. In recent years there have been some (rising and falling) hopes in the "computing by nature" paradigm. Mimicking the way nature can "compute" is expected to bring feasible solutions to problems in NP. One such avenue being explored is quantum computing.
3. Language Generating Devices

We shall now take the notion of a language and consider possible ways of defining (describing) a language. In case a language L is finite, it is possible to simply enumerate all the words in L. We are, however, interested in finite descriptions for infinite languages as well. In what follows we shall be interested in defining languages by specifying a way to "produce" any word in a language.

3.1. Grammars

Let us consider the language L of all words in the alphabet {a, b} containing an equal number of occurrences of a and b. We could describe it formally using the standard set notation by L = {w ∈ {a, b}* | #a(w) = #b(w)}, where #c(w) denotes the number of occurrences of the letter c in the word w. This description does not say what the words look like; it does not indicate their structure. It is easy to realize that there are two types of nonempty words in L: ones that begin with a and ones that begin with b. We can describe (define) this language using two other languages: the language L_a = {w ∈ {a, b}* | #a(w) = #b(w) + 1} containing words with one more a, and the language L_b = {w ∈ {a, b}* | #b(w) = #a(w) + 1} containing words with one more b. We can thus define L by L = {ε} ∪ aL_b ∪ bL_a. We succeeded in defining L using two other languages in a way that says more about the composition of its words. To obtain a similar definition for L_a and L_b we can write L_a = {a} ∪ aL ∪ bL_aL_a and L_b = {b} ∪ bL ∪ aL_bL_b. We can then simultaneously define all three languages as the least fixpoint solution of the system of set equations

L = {ε} ∪ aL_b ∪ bL_a
L_a = {a} ∪ aL ∪ bL_aL_a
L_b = {b} ∪ bL ∪ aL_bL_b

Grammars use a generative approach to defining languages which resembles the above way of using sets of words.
Nonterminal symbols are used to represent some sets of words, and a mechanism is defined through which words of the language to be defined are obtained by successively replacing (rewriting) these nonterminals using the rewriting rules of the grammar. The following rules correspond to the above way of defining the language L (the | sign separates the alternatives for rewriting the nonterminal symbol on the left).

S → ε | aS_b | bS_a
S_a → a | aS | bS_aS_a
S_b → b | bS | aS_bS_b
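A brute-force sketch of how these rules generate words (the encoding below, with A and B standing for S_a and S_b, is my own illustration, not from the text): rewriting the leftmost nonterminal in all possible ways enumerates exactly the terminal words derivable from S, and each of them is indeed balanced.

```python
# Nonterminals S, A (= S_a), B (= S_b); all other symbols are terminal.
RULES = {
    "S": ["", "aB", "bA"],
    "A": ["a", "aS", "bAA"],
    "B": ["b", "bS", "aBB"],
}

def terminal_count(u):
    """Number of terminal (non-rule) symbols in a sentential form."""
    return sum(c not in RULES for c in u)

def generate(max_len, steps=12):
    """Leftmost breadth-first derivation from S; collect terminal words."""
    forms, words = {"S"}, set()
    for _ in range(steps):
        nxt = set()
        for u in forms:
            i = next((k for k, c in enumerate(u) if c in RULES), None)
            if i is None:                 # no nonterminal left: a terminal word
                words.add(u)
                continue
            for rhs in RULES[u[i]]:
                v = u[:i] + rhs + u[i + 1:]
                if terminal_count(v) <= max_len:   # terminals never disappear
                    nxt.add(v)
        forms = nxt
    return words

words = generate(4)
for w in words:                           # every generated word is balanced
    assert w.count("a") == w.count("b")
print(sorted(words, key=len))
```

Pruning by the number of terminal symbols is safe because no rule ever erases a terminal, so a form that already carries too many terminals cannot lead to a short word.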
The process of generating words in L can be illustrated as follows (two of the rewriting steps use the rules S_b → aS_bS_b and S_b → bS). The word generated is aababb; the sequence of words leading to it is called a derivation, and the words in a derivation are called sentential forms.

S ⇒ aS_b ⇒ aaS_bS_b ⇒ aabS_b ⇒ aabaS_bS_b ⇒ aababSS_b ⇒ aababS_b ⇒ aababb

Formally, a (context-free) grammar is defined as a four-tuple G = (N, T, P, S), where N and T are finite alphabets (of nonterminal and terminal symbols resp.), P ⊆ N × (N ∪ T)* is a finite set of rewriting rules, and S is the initial nonterminal symbol; the language it defines (generates) is L(G) = {w ∈ T* | S ⇒* w}, where ⇒* is the reflexive and transitive closure of the rewrite relation ⇒ on words in (N ∪ T)*. Already very simple grammars can generate languages composed of words which are not easy to describe by other means. Consider, e.g., a grammar with just the two rewriting rules S → a | bSS and try to describe the resulting language by other means or in English. Grammars can be more general. The rewriting rules may be more complex (not context-free), e.g., S → aSBC, S → ε, CB → BC, aB → ab, bB → bb, C → c, or the way the rules are applied, i.e., the rewrite relation ⇒, may be more complex.
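For the two-rule grammar S → a | bSS just mentioned, one can at least explore the language experimentally. The following brute-force derivability test (a hypothetical helper, not from the text) recurses directly on the two rules: a word is derivable iff it is "a", or it starts with b and splits into two derivable parts.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def derivable(w):
    """Can S derive w using the rules S -> a and S -> bSS?"""
    if w == "a":
        return True
    return w[:1] == "b" and any(
        derivable(w[1:i]) and derivable(w[i:]) for i in range(2, len(w))
    )

# All derivable words over {a, b} of length at most 5.
lang = ["".join(p) for n in range(1, 6)
        for p in product("ab", repeat=n) if derivable("".join(p))]
print(lang)
```

Listing the short words this way already suggests the shape of the language (one more a than b, with a constraint on prefixes), which is hard to see from the two rules alone.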
3.2. Parallelism in Grammars

One of the phenomena studied in computer science is parallelism. We shall discuss the possibilities of introducing parallelism in grammars and later, in Section 4, introduce an abstract model enabling us to study this phenomenon. There are several possibilities for introducing parallelism in grammars. Following their brief description we shall concentrate on the first one.

Parallel ⇒. In this case several (possibly all) symbols in u are rewritten simultaneously in order to obtain the v in u ⇒ v. We shall give several examples of grammars using this type of parallelism in the next subsection and introduce a general framework for studying them (introduced in [13]) in Section 4.

Several grammars, one sentential form. In this case grammars "take turns" in rewriting a common sentential form following given "rules of conduct". This model was studied under the name of Cooperating Distributed Grammar Systems (CDGS, see e.g., [10], [1]). It models the "blackboard architecture" of problem solving in AI.

Several grammars and several sentential forms. In this case several grammars synchronously rewrite their own sentential forms and communicate occasionally by providing their intermediate sentential forms to requesting grammars. The way of communicating resembles message passing. This model was studied under the name of Parallel Communicating Grammar Systems (PCGS, see e.g.,
[11], [1]).
3.3. Parallel Rewriting

We shall illustrate in an informal way the variety of ways in which the parallel ⇒ was introduced and studied in the literature. We start with the sequential case of context-free grammars. In fact, all parallel models we use in the illustration are based on context-free rewriting rules.

3.3.1. Context-free Grammar

Illustrating a derivation step aabASbA ⇒ aababSASbA realized using the rewrite rule A → abSA.
In the context-free grammar case a single occurrence of a nonterminal symbol is selected and rewritten in one derivation step.

3.3.2. Indian parallel grammar

Illustrating a derivation step aabASbA ⇒ aababSASbabSA realized using the rewrite rule A → abSA.
In the Indian parallel grammar case [17] a nonterminal and a rewriting rule for this nonterminal are selected. Subsequently the selected rule is applied to rewrite all occurrences of the selected nonterminal in the sentential form in one derivation step (i.e., all these occurrences are rewritten in parallel).

3.3.3. Absolutely parallel grammar

Illustrating a derivation step aabASbA ⇒ aababSAbbbbAAb realized using the rewrite rule (A, S, A) → (abSA, bb, bAAb).

In the absolutely parallel grammar case ([12]) the rewriting rules are given as "tuples". In order for such a rule to be applicable to a sentential form, the sequence of nonterminal symbols on its left-hand side must match exactly the sequence of all occurrences of nonterminal symbols in this sentential form. If this is the case, all nonterminals are rewritten in one derivation step to the corresponding words of the right-hand side tuple of the rewriting rule.
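The two parallel derivation steps illustrated above can be sketched as follows (the helper names are my own; nonterminals are encoded as uppercase letters):

```python
def indian_step(form, nonterminal, rhs):
    """Indian parallel: rewrite ALL occurrences of one nonterminal at once."""
    return form.replace(nonterminal, rhs)

def absolutely_parallel_step(form, lhs, rhs):
    """Absolutely parallel: the tuple `lhs` must match the sequence of all
    nonterminal occurrences in `form`; each is rewritten by its `rhs` word."""
    if tuple(c for c in form if c.isupper()) != lhs:
        return None                       # the tuple rule is not applicable
    out, it = [], iter(rhs)
    for c in form:
        out.append(next(it) if c.isupper() else c)
    return "".join(out)

print(indian_step("aabASbA", "A", "abSA"))               # aababSASbabSA
print(absolutely_parallel_step("aabASbA", ("A", "S", "A"),
                               ("abSA", "bb", "bAAb")))  # aababSAbbbbAAb
```

The outputs reproduce the two derivation steps shown in the illustrations above.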
3.3.4. Lindenmayer Systems

Illustrating a derivation step aabab ⇒ abbbabbababb realized using the rewrite rules a → ab, a → bb, b → abb.
aabab
abbbabbababb
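The step shown above, in which every symbol of aabab is rewritten simultaneously and each occurrence may use any applicable rule, can be sketched as follows (a minimal 0L-system interpreter; the helper names are my own):

```python
from itertools import product

RULES = {"a": ["ab", "bb"], "b": ["abb"]}

def ol_step(form):
    """All one-step successors of `form`: every symbol is rewritten at once,
    each occurrence by any rule applicable to that symbol."""
    return {"".join(choice) for choice in product(*(RULES[c] for c in form))}

succ = ol_step("aabab")
print(len(succ))                   # 2*2*1*2*1 = 8 possible successor words
print("abbbabbababb" in succ)      # the particular step shown above: True
```

Because the a-occurrences may pick different rules, one sentential form has several successors; the derivation relation ⇒ is the union over all these choices.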
In the Lindenmayer system case ([16]) no distinction is made between terminal and nonterminal symbols, and in one derivation step all symbols of the sentential form must be rewritten using some rewriting rule (possibly a different rule for different occurrences of the same symbol). The original motivation for introducing this model came from biological systems, namely modeling cell development, hence the potential change of each symbol (cell status) in each time step. In the meantime other applications (e.g., in graphics) were discovered.

3.3.5. Discussion

Despite the sketchiness of the above examples one can see that parallelism may speed up the derivation process of a word, i.e., we may derive a word in fewer derivation steps. On the other hand, describing what should happen in one derivation step may become more complex. Would more complex rules for the parallel ⇒ bring even faster derivations? How much does parallelism help, and does it always help? How can the models be compared, taking into account also the complexity of the definition of ⇒? These and similar questions led to the introduction of a general model of a grammar, the generative system (g-system), that provides a unifying framework for studying sequential and parallel grammars [13]. Section 4 is devoted to this model.
4. Generative Systems

4.1. Definition and Basic Properties

We now formally introduce g-systems and in Example 4.1 present their pictorial representation, which helps to follow the way they generate words.

Definition 4.1. A one-input finite state transducer with accepting states (1-a-transducer)³ is a 6-tuple M = (K, X, Y, H, q0, F), where K is a finite set of states, X and Y are finite alphabets (input and output resp.), q0 in K is the initial state, F ⊆ K is the set of accepting (final) states, and H is a finite subset of K × X × Y* × K. In case H is a subset of K × X × Y+ × K, M is said to be ε-free. By a computation of such an a-transducer we understand a word h1 … hn over H such that (i) pr1(h1) = q0, (ii) pr4(hn) is in F, and (iii) pr1(h_{i+1}) = pr4(h_i) for 1 ≤ i ≤ n − 1, where pr_i are the homomorphisms on H defined by pr_i((x1, x2, x3, x4)) = x_i for i = 1, 2, 3, 4. The set of all computations of M is denoted by Π_M. An a-transducer mapping is then defined for each language L ⊆ X* by M(L) = pr3(pr2^{-1}(L) ∩ Π_M). For a word w let M(w) = M({w}).

³ We shall only use one-input transducers, and write briefly a-transducer.
Definition 4.2. A generative system (g-system) is a 4-tuple G = (N, T, M, S), where N and T are finite alphabets of non-terminal and terminal symbols respectively (not necessarily disjoint), S in N is the initial non-terminal symbol, and M is an a-transducer mapping.

Definition 4.3. The language generated by a g-system G = (N, T, M, S) is the language L(G) = {w ∈ T* | S ⇒*_G w}, where ⇒*_G is the transitive and reflexive closure of the rewrite relation ⇒_G defined by u ⇒_G v iff v is in M(u).

Note that without loss of generality we may assume there is only one accepting state in the a-transducers used in g-systems. We shall denote this state qF.

Example 4.1. The following is a pictorial representation of an example g-system G = (N, T, M, S), where N = {S} and T = {a, b, c}. The a-transducer M is given by the directed graph below using the standard notation from finite automata theory. The states are the nodes of this graph, the arrows represent the four-tuples in H (e.g., the arrow from q1 to q2 labeled by S, Sb represents the four-tuple (q1, S, Sb, q2)), and the initial and final states are denoted by the in-arrow and a double circle respectively. The derivation step SaaSbbScc ⇒_G SaaaSbbbSccc is realized by the following computation of the a-transducer M:

(q0, S, Sa, q1)(q1, a, a, q1)(q1, a, a, q1)(q1, S, Sb, q2)(q2, b, b, q2)(q2, b, b, q2)(q2, S, Sc, qF)(qF, c, c, qF)(qF, c, c, qF).

Note that the computation can be viewed as a path in the graph representing M which starts in q0, ends in qF, and follows the edges so as to match the first components of their labels to the sentential form to be rewritten.
[Figure: the transition graph of the a-transducer M, with states q0, q1, q2, q3, q4 and the accepting state qF; its labeled arrows include the copying loops (q1, a, a, q1), (q2, b, b, q2), (qF, c, c, qF), the arrows labeled S, Sa; S, Sb; S, Sc used in the computation above, and an arrow labeled S, SSS from q0 to qF.]

We shall call a g-system sequential if the change of the sentential form in one derivation step is only local, i.e., there is a constant limiting the maximal length of a subword of the sentential form that can change in one derivation step. Formally we have the following.

Definition 4.4. A g-system G is said to be sequential if the only cycles in the graph representation of its a-transducer are copying cycles in its initial and final states (i.e., cycles of the form (q0, a, a, q0) or (qF, a, a, qF) for some symbol a).

Notation. We shall denote the class of all g-systems by G, the class of all sequential g-systems by S, and the respective classes of g-systems without erasing by G′ and S′.
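The a-transducer mapping M(w) of Definition 4.1 can be sketched as a nondeterministic search over sequences of four-tuples from H. The set H below contains only the tuples quoted verbatim in the computation of Example 4.1 (the full transducer has further arcs not reproduced here):

```python
H = [("q0", "S", "Sa", "q1"), ("q1", "a", "a", "q1"),
     ("q1", "S", "Sb", "q2"), ("q2", "b", "b", "q2"),
     ("q2", "S", "Sc", "qF"), ("qF", "c", "c", "qF")]

def transduce(w, q0="q0", finals={"qF"}):
    """All outputs in M(w): read w symbol by symbol along tuples of H,
    starting in q0 and ending in an accepting state."""
    results, stack = set(), [(q0, 0, "")]   # (state, input position, output)
    while stack:
        q, i, out = stack.pop()
        if i == len(w):
            if q in finals:
                results.add(out)
            continue
        for p, x, y, r in H:
            if p == q and x == w[i]:
                stack.append((r, i + 1, out + y))
    return results

print(transduce("SaaSbbScc"))   # {'SaaaSbbbSccc'}: the derivation step shown
```

One derivation step u ⇒_G v of the g-system then simply means v ∈ transduce(u).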
Furthermore, we denote by L(X) the family of languages defined by the g-systems in the class X. One of the first questions about a new model is usually the question about its power, i.e., about the family of languages that can be defined by this model. It is easy to see that the following holds [13].

Theorem 4.1. L(G) = L(S) = L_RE, the family of recursively enumerable languages, and L(G′) = L(S′) = L_CS, the family of context-sensitive languages.

Thus g-systems have the power of Turing machines. This alone would not make the model interesting. The interesting part is that g-systems can simulate the grammar-like generating devices studied so far tightly, i.e., the derivations of words in some model (e.g., Indian parallel grammars, absolutely parallel grammars, and almost all other ones) can be mimicked by a suitable g-system in essence exactly, without a need for additional steps to be performed in between. This makes g-systems a suitable unifying framework for studying the time complexity of various types of grammars.

4.2. Complexity Measures and Complexity Classes

We shall now introduce, in a standard way, the complexity measures we shall consider for g-systems. First the usual computational measures of complexity, time and space.

Definition 4.5. We shall define the time and space complexity as usual and employ the following notation, where X is a class of g-systems.

GTIME(w, G) = min{k | S ⇒^k w},
GTIME(n, G) = max{GTIME(w, G) | |w| ≤ n, w ∈ L(G)},
GSPACE(w, G) = min{max{|S|, |w1|, …, |wm|} | S ⇒ w1 ⇒ ··· ⇒ wm = w},
GSPACE(n, G) = max{GSPACE(w, G) | |w| ≤ n, w ∈ L(G)},
GTIME_X(T(n)) = {L | L = L(G), G ∈ X and GTIME(n, G) = O(T(n))}, and
GSPACE_X(S(n)) = {L | L = L(G), G ∈ X and GSPACE(n, G) = O(S(n))}.

We shall now turn to descriptional complexity measures for g-systems. They measure the complexity of defining the derivation step ⇒, i.e., the complexity of the a-transducer of the g-system.
An additional measure of descriptional complexity is the number of nonterminal symbols of G.

Definition 4.6. Let G = (N, T, M, S) be a g-system with M = (K, X, Y, H, q0, F). We define the STATE and ARC complexity, measuring the number of states and the number of elements in H, as follows: STATE(G) = |K| and ARC(G) = |H|. For languages and a class X of g-systems we define STATE_X(L) = min{STATE(G) | G ∈ X, L(G) = L} and similarly ARC_X(L) = min{ARC(G) | G ∈ X, L(G) = L}. We can also define the complexity classes STATE_X(n) = {L | STATE_X(L) ≤ n} and ARC_X(n) = {L | ARC_X(L) ≤ n}. Similarly we define the measure VAR: VAR(G) = |N|, VAR_X(L) = min{VAR(G) | G ∈ X, L(G) = L}, and VAR_X(n) = {L | VAR_X(L) ≤ n}.

Complexity classes, i.e., families of languages defined by restrictions on the complexity of languages, are studied from different perspectives. One is interested to find out
how and whether the complexity of languages changes under the usual language operations. This translates into the study of closure properties of these language classes under various operations. One is also interested in what can be said about the complexity of the languages in some complexity class under a different complexity measure. This translates into the study of inclusion relations among the complexity classes. It is also important to know that, for a given complexity measure, a certain language or family of languages cannot have complexity below some threshold. These "lower bound" problems belong to the most difficult ones in computer science and are mostly open or waiting for nontrivial answers.

Properties of the complexity classes STATE_S(n) (in fact for g-systems in some normal form⁴) were studied in [14], where it was shown that they have good closure properties (they form an AFL [2]) and that their hierarchy is finite. It was shown later that the hierarchy of the complexity classes for the combined complexity measure STATE-VAR is also finite (for sequential g-systems in normal form).

Notation. Let N_{i,j} = {G | G ∈ S is in normal form, STATE(G) ≤ i, and VAR(G) ≤ j} and let L(N_{i,j}) be the corresponding family of languages.

It was shown in [5] that the following holds.

Theorem 4.2. L(N_{6,2}) = L(N_{5,3}) = L(N_{4,4}) = L_RE.

This result can be translated to a very strong normal form result for phrase structure grammars of the Chomsky hierarchy [4].

Corollary 4.1 (Geffert normal form). Any L ∈ L_RE can be generated by a grammar with all rules of the form S → u, where S is the initial nonterminal, having
1. two additional rules AB → ε and CD → ε, or
2. one additional rule ABC → ε, or
3. one additional rule ABBBA → ε.
In each case the nonterminals shown are the only ones used in the grammar.

The exact relation of the complexity classes L(N_{5,2}), L(N_{4,3}), and L(N_{4,2}) to L_RE or other families of languages is still open.

4.3. Sequential vs. Parallel g-systems

Theorem 4.1 states that from the point of view of generative power there is no difference between sequential and parallel (or general) g-systems. Clearly, for any complexity measure MEASURE and any L ∈ L_RE it holds that MEASURE_G(L) ≤ MEASURE_S(L). Is the complexity indeed greater if we restrict ourselves to sequential g-systems only? It follows from Theorem 4.2 that this question is trivial for the measures STATE and VAR. In [13] a construction of a sequential g-system equivalent to any given g-system was given, from which the following upper bounds were obtained.

Theorem 4.3. The following relations on sequential vs. parallel complexity hold.
1. There exists a constant k such that ARC_S(L) ≤ k · ARC_G(L) for each L.

⁴ It is not important for this article to give the details here.
2. GSPACE_S(S(n)) = GSPACE_G(S(n)).
3. L ∈ GTIME_S(GSPACE(G, n) · GTIME(G, n)) for each L and any G such that L = L(G).

Restricting ourselves to sequential g-systems thus does not influence the measure GSPACE, we lose at most a constant factor for the measure ARC, and the restriction has no significance for the measures STATE and VAR. Let us now concentrate on time complexity.

Notation. Let GTIME-bar_X(f(n)) = {L | GTIME(G, n) = Ω(f(n)) for each G ∈ X such that L = L(G)}. (Thus, GTIME-bar_X(f(n)) is the family of languages which cannot be generated by any g-system in X asymptotically faster than f(n).)

Since sequential g-systems can prolong a sentential form by only a constant number of symbols in one step, and parallel g-systems can extend the sentential form in one step by at most a constant factor, the following lower bound theorem holds.

Theorem 4.4. L ∈ GTIME-bar_S(n) and L ∈ GTIME-bar_G(log n) for each infinite L ∈ L_RE.

The above theorem suggests that the difference in time complexity between sequential and parallel g-systems can be exponential. This is indeed the case. The Lindenmayer system with the single rewriting rule a → aa (which can easily be simulated by a g-system within the same time complexity) generates the word a^{2^i} in i steps; thus L = {a^{2^i} | i ≥ 1} ∈ GTIME_G(log n), while at the same time L ∈ GTIME-bar_S(n). Some results about languages for which parallelism really helps, i.e., which can be generated fast (in polylogarithmic time) by parallel g-systems, appear in [6].

The significance of Theorem 4.3 comes from the fact that its statement (3) allows one to transfer lower bounds for time complexity from the "sequential world" to the parallel one. Considering the fact that proving lower bounds is difficult enough in the sequential case, this may be a useful tool. In case we consider ε-free g-systems only, their space complexity is n. We can thus state the following corollary to Theorem 4.3.

Corollary 4.2. Let L ∈ GTIME-bar_S(f(n)).
Then L ∈ GTIME-bar_G(f(n)/n).

Since it is known that the language L = {wcw | w ∈ {a, b}*} needs quadratic time on sequential grammars, it follows that it cannot be generated faster than in linear time by a nonerasing g-system (and, due to the tight simulations, by any of the grammars with a parallel ⇒ studied in the literature).

5. Relating Sequential and Parallel Worlds

In Section 4.3 we have seen that some languages can be generated fast in parallel (in logarithmic time) and some need linear time. In [6] we defined "fast in parallel" to be "polylogarithmic time". It was shown that the family of languages that can be generated fast in parallel is an AFL, i.e., has good closure properties. In an attempt to find an alternative characterisation of this family a more general result was shown. Namely, the following theorem holds, where 1NSPACE(f(n)) is the complexity class of all languages accepted by nondeterministic Turing machines with a one-way read-only input tape and a working tape bounded in size by f(n).
Theorem 5.1. GTIME(f(n)) = 1NSPACE(f(n)) for f(n) ≥ log n.
Proof. Although we shall not present a formal proof here (see [6]), we shall present the main idea of the simulation. To prove the inclusion GTIME(f(n)) ⊆ 1NSPACE(f(n)) we utilize the notion of a derivation tree for g-systems. It is a tree labeled by four-tuples from H (the set of edges of the a-transducer M of the given g-system G). The labeling must satisfy certain constraints. The sequence of four-tuples at each "level" of the tree corresponds to a computation of the a-transducer M. Furthermore, the concatenation of the second components of the labels of the daughter nodes is exactly the third component of the label of their father node. We shall illustrate this notion by the following example (using the g-system from Example 4.1).
[Derivation tree, shown level by level:
level 1: (q0, S, SSS, qF)
level 2: (q0, S, Sa, q1) (q1, S, Sb, q2) (q2, S, Sc, qF)
level 3: (q0, S, Sa, q1) (q1, a, a, q1) (q1, S, Sb, q2) (q2, b, b, q2) (q2, S, Sc, qF) (qF, c, c, qF)]
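The two labeling constraints can be checked mechanically. The sketch below (the list-of-levels representation is my own) encodes the three levels of the tree above and verifies that each level is a computation of M and that the outputs of each level spell exactly the inputs read by the next level:

```python
LEVELS = [
    [("q0", "S", "SSS", "qF")],
    [("q0", "S", "Sa", "q1"), ("q1", "S", "Sb", "q2"), ("q2", "S", "Sc", "qF")],
    [("q0", "S", "Sa", "q1"), ("q1", "a", "a", "q1"), ("q1", "S", "Sb", "q2"),
     ("q2", "b", "b", "q2"), ("q2", "S", "Sc", "qF"), ("qF", "c", "c", "qF")],
]

def is_computation(level, q0="q0", finals={"qF"}):
    """pr1 of the first tuple is q0, pr4 of the last is final, states chain."""
    ok = level[0][0] == q0 and level[-1][3] in finals
    return ok and all(a[3] == b[0] for a, b in zip(level, level[1:]))

for upper, lower in zip(LEVELS, LEVELS[1:]):
    # the outputs of a level spell the input word rewritten by the next level
    assert "".join(t[2] for t in upper) == "".join(t[1] for t in lower)

print(all(is_computation(level) for level in LEVELS))   # True
```

This is exactly the check the simulating Turing machine performs, except that it never stores a whole level: it guesses the tree path by path within space O(f(n)).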
The computations of the a-transducer at the three levels of this tree realize the derivation steps of the derivation S ⇒ SSS ⇒ SaSbSc ⇒ SaaSbbScc. This derivation tree is said to correspond to this derivation. The basic idea of the work of the Turing machine A with a one-way input tape which simulates G in space O(f(n)) is as follows. It successively guesses and stores on its working tape the paths leading from the root of a derivation tree of G to its leaves, in left-to-right order. It simultaneously checks whether the elements of one level form a computation of the a-transducer M and whether the concatenation of the outputs of the leaf nodes forms its input word. If it succeeds, it accepts; otherwise it rejects. This process is depicted in the figure below. The simulation proving the reverse inclusion is more complicated to describe, and we refer the reader to [6].
[Figure: the g-system G derives its word via a derivation tree of depth O(f(n)); the simulating Turing machine A guesses and checks the root-to-leaf paths of this tree one by one, storing a single path at a time on its working tape.]
Corollary 5.1. GTIME(f(n)) = NSPACE(f(n)) for f(n) ≥ n.

Since it easily follows from standard constructions that GSPACE(f(n)) = NSPACE(f(n)) for f(n) ≥ n, the following corollary holds. Note that an analogous result about the equal power of time and space is not known (and is believed not to hold) for other models studied in the literature.

Corollary 5.2. GTIME(f(n)) = GSPACE(f(n)) for f(n) ≥ n.
6. Nondeterminism and Communication
In this section we shall turn our attention to the relation of g-systems and the Parallel Communicating Grammar Systems (PCGS) described in Section 3.2. The parallelism in g-systems is in the derivation step relation ⇒. In a PCGS we have a number of grammars working in parallel and communicating (sending their intermediate sentential forms). It was shown in [15] that a simple type of PCGS can be tightly simulated by g-systems (i.e., within a constant time loss). This simulation exploits substantially the nondeterminism of g-systems. We shall see that in this case communication can be replaced by nondeterminism. In what follows we shall present the main ideas; due to space constraints we resort to an informal description of the PCGS model and leave out many details (see [15] for more details).
[Figure: a PCGS with three component grammars G1, G2, G3, each rewriting its own sentential form; arrows indicate which grammars may request the sentential forms of which others.]
Above is an example PCGS in which three grammars G1, G2, G3 synchronously rewrite their respective sentential forms (a rewriting step of the PCGS) or request the sentential form generated by some other grammar (a communication step of the PCGS). In the example the arrows indicate the "communication structure", showing that G1 can request the sentential form from G2 and G3, the grammar G2 can request the sentential form from G1 and G3, while G3 cannot request the sentential form from the other grammars. The language generated by a PCGS consists of the terminal words generated by G1. Various complexity measures for PCGS were considered, e.g., the number of component grammars, the type of the component grammars, the complexity of the communication structure, and time (see e.g., [1], [7], [9]). In this article we shall consider the class of PCGS with an arbitrary number of component grammars, each of them regular, and consider the time complexity bounded by some f(n) (based on the length of derivations, formally defined similarly to Definition 4.5). We shall denote by REG-PCGS-TIME(f(n)) the corresponding complexity class of languages. Since g-systems have the power of Turing machines, it is clear that they can simulate PCGS. The following theorem shows that this simulation can be tight.

Theorem 6.1. REG-PCGS-TIME(f(n)) ⊆ GTIME(f(n)) for f(n) ≥ log n.

Proof. We only give the basic idea behind the technically involved proof of this theorem. A straightforward simulation (e.g., representing the m sentential forms of the m component grammars of the PCGS, separated by some special symbol, within the sentential form of the g-system) would require copying possibly large parts of a sentential form from one place to another during the simulation of a communication step.
Since g-systems, despite parallel rewriting, can only copy a constant number of symbols from one part of the sentential form to another in one derivation step, the straightforward simulation would not be tight. The basic idea behind the tight simulation is the following. Instead of copying the requested sentential form from one part of the sentential form to another, the g-system
206
B. Rovan / Generating Languages
will nondeterministically guess the positions where the sentential form of some Gi will be needed (possibly at several places) and generate in parallel a copy of this sentential form at each of these places. For regular component grammars it is possible to guarantee that all copies of the sentential form of Gi so generated are indeed identical. For regular component grammars the rewriting and communication requests can only occur at the right end of the sentential form of each grammar. In case the communication was nondeterministically guessed properly, the requested sentential form immediately follows the communication request symbol. The communication step can thus be realized by simply deleting some markers in the sentential form of the simulating g-system. Applying Theorem 5.1 we obtain the following relation of PCGS time to sequential space. Corollary 6.1. REG-PCGS-TIME(f(n)) ⊆ 1NSPACE(f(n)) for f(n) ≥ log n. Besides giving an upper bound for PCGS time, the above corollary also shows how g-systems can be used to find relations to sequential complexity classes for those types of grammars for which a tight simulation can be found. The fact that the component grammars of the simulated PCGS were regular was crucial for the construction used in the proof of Theorem 6.1. An attempt to use the same construction for context-free grammars in the components of the PCGS would fail for several reasons, the main one being that it is no longer possible to guarantee the identity of all copies of the sentential form of Gi generated at "proper" places within the sentential form of the simulating g-system. It was shown in [15] that the problem is not with the construction but with the fact itself, as the following theorem states. Theorem 6.2. There exists a language L such that L ∈ CF-PCGS-TIME(log n) and L ∉ GTIME(log n).
The proof required finding a suitable language L ∈ CF-PCGS-TIME(log n) for which L ∉ GTIME(log n); the latter fact was subsequently proved using sophisticated results from communication complexity theory [7].
7. Determinism and Nondeterminism in Generative Systems

7.1. Measuring Nondeterminism

Nondeterminism of g-systems was used substantially in Section 6 to simulate PCGS by g-systems. It is natural to ask what happens if we limit nondeterminism in g-systems. In fact, the relation between deterministic and nondeterministic versions is one of the key questions studied for any model of computation. First we ask "how much nondeterminism" we need. In order to do so we need some measures for the amount of nondeterminism in g-systems. We shall consider the following "static" measures, based on the "description" of the g-system (naturally, the underlying a-transducer). Notation: NSTATE(G) is the number of nondeterministic states in the a-transducer of G, i.e., states q for which there exist a letter a and two distinct 4-tuples (q, a, u, p) and (q, a, v, r) in H.
We can introduce a finer measure by counting the number of nondeterministic decisions possible for each state and each symbol. Notation: For each state q and each symbol a, let dec(q, a) be the number of outgoing a-arcs (elements of H having q and a in their first and second components) minus one. Furthermore, for each state q let dec(q) = Σ_{a∈N∪T} dec(q, a), and let dec(G) = Σ_{q∈K} dec(q), where N, T and K are the sets of nonterminal symbols, terminal symbols and states of the a-transducer of G, respectively. The following theorems show that these static measures of nondeterminism can be bounded by constants. Theorem 7.1. For every g-system there is an equivalent one having at most one nondeterministic state. Proof. We present only the basic idea: in the construction we "move the nondeterministic decisions forward" to the initial state. We can assume the given g-system is sequential. Suppose q is a nondeterministic state in the a-transducer M for which there are two possibilities to move on a, i.e., (q, a, u1, r1) and (q, a, u2, r2) are in H. Let (p, b, v, q) be in H. We can split the state q into two new states q1, q2 and replace the above elements of H by (p, b, v, q1), (p, b, v, q2), (q1, a, u1, r1), and (q2, a, u2, r2). There is no nondeterministic decision on a in q1 and q2, and the nondeterministic decision is moved forward, to be taken in the state p on b. It can be shown that this procedure of moving nondeterministic decisions forward preserves the language generated by the g-system, that it terminates, and that the only remaining nondeterministic state is the initial state. Theorem 7.2. For every g-system there is an equivalent one having dec(G) = |N ∪ T| + 1 (i.e., the number of nondeterministic decisions can be reduced to the size of the terminal and nonterminal alphabets of the g-system increased by one). Proof. The formal proof is quite technical. The basic idea is not too complicated.
We assume that the initial state is the only nondeterministic state of the g-system. We can number all arcs leaving the initial state. The new g-system will use a new initial state in which it decides between copying the symbol read or starting to build a unary counter within the sentential form using some new nonterminal symbol. In the derivation steps that follow there will be a decision between increasing the counter or applying the rule with the number represented by the counter and continuing in the original g-system (this deletes the counter, and a new decision on where to build it again will be taken in the next derivation step). There will be just two possibilities to choose from for each symbol in N ∪ T and for the new nonterminal introduced for the counter. The above theorems show that the static measures of nondeterminism are rather coarse. At present we study different measures that would allow us to measure nondeterminism in derivations.

7.2. Deterministic g-systems

We shall now consider an extreme case of the amount of nondeterminism by not allowing any nondeterminism at all. We shall see that, depending on other parameters, the power of deterministic g-systems can vary significantly.
First we shall consider g-systems without nonterminal symbols (except the initial nonterminal, which does not appear in any of the output (third) components of the elements of H). We shall call these g-systems deterministic terminal g-systems and we shall denote the corresponding family of languages by L_DTG. Note that each sentential form of a deterministic terminal g-system (except the initial nonterminal) belongs to the language generated, and the sequence in which these sentential forms are generated is uniquely determined. Theorem 7.3. L_DTG does not contain all finite languages and contains languages which are not context-free. Proof. It is easy to see that L = {a^n b^n c^n | n > 0} is generated by a deterministic terminal g-system. It can be shown that the finite language L' = {a, aa, b, bb, bba, bbb} cannot be generated by any deterministic terminal g-system. Next we consider the role of nonterminals in deterministic g-systems. By a detailed analysis of the properties of computations of deterministic g-systems it was possible to prove the following for the family L_DG of languages generated by deterministic g-systems. Theorem 7.4. L_DG contains all finite languages. Furthermore, there exists an ε-free regular language not contained in L_DG. When we analyzed the above proofs we were able to identify a difficulty the deterministic g-systems had when generating languages. It seemed the main problem was their inability to recognize that they are reading the last symbol of a sentential form. To analyze this problem we modified the definition of g-systems and introduced g-systems with an endmarker. We defined the language generated by a g-system with an endmarker to be the set of all terminal words w such that the g-system can generate the word w$, where $ is a special endmarker symbol. The following theorem shows that the inability to recognize reading the last symbol of a sentential form was indeed the main reason limiting the power of deterministic g-systems.
Theorem 7.5. Deterministic g-systems with an endmarker generate all recursively enumerable languages. Its proof is technically involved and beyond the scope of this article.
8. Conclusion

In this article we have indicated that computer science has a role that goes much deeper than its frequently recognized role of a service discipline. We have touched on some of the key concepts of computer science and concentrated on the area of defining formal languages by generative devices. The key notions of nondeterminism, parallelism and complexity were demonstrated in this setting. The material presented will hopefully motivate an interested reader to study these and other key notions of computer science in more depth. There are still many particular questions one can ask about the model of g-systems. One could study additional complexity measures, or look for subclasses of G that characterize some complexity classes or correspond to particular generating devices.
The single most important idea of g-systems that can also be applied to physics or other disciplines is the idea of describing the change of state of some system by a finite state transduction. In the language generating setting these states of the system were the sentential forms, represented by strings of symbols. We could, however, use more complex structures to describe the state of a system. If we succeed in describing the transitions of some system by a g-system-like transducer, we can capture the complexity of system state transitions via the complexity of the transducer. This could enable us to compare and measure the complexity of system behaviour.
Acknowledgements This research was supported in part by the VEGA grant 1/3106/06.
References

[1] E. Csuhaj-Varjú, J. Dassow, J. Kelemen, Gh. Păun, Grammar Systems: A Grammatical Approach to Distribution and Cooperation, Gordon and Breach Science Publishers Ltd., London, 1994.
[2] S. Ginsburg, Algebraic and Automata Theoretic Properties of Formal Languages, North-Holland, Amsterdam, 1975.
[3] J.E. Hopcroft and J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, 1979.
[4] V. Geffert, Normal forms for phrase-structure grammars, RAIRO Theoretical Informatics and Applications 25, 5 (1991), 473–496.
[5] V. Geffert, Generative Systems, Ph.D. Thesis, Comenius University, 1986.
[6] P. Gvozdjak, B. Rovan, Time-Bounded Parallel Rewriting, in Developments in Language Theory II (J. Dassow, G. Rozenberg, A. Salomaa, eds.), World Scientific, 1996, pp. 79–87.
[7] J. Hromkovič, Communication Complexity and Parallel Computing, Springer Verlag, 1997.
[8] J. Hromkovič, Theoretical Computer Science, Springer Verlag, 2004.
[9] J. Hromkovič, J. Kari, L. Kari, D. Pardubská, Two lower bounds on distributive generation of languages, in Proc. MFCS'94, LNCS 841, Springer Verlag, 423–432.
[10] R. Meersman, G. Rozenberg, Cooperating grammar systems, in Proc. MFCS'78, Springer Verlag, Berlin, 364–373.
[11] Gh. Păun, L. Santean, Parallel communicating grammar systems: the regular case, Ann. Univ. Buc. Ser. Mat.-Inform. 37, vol. 2 (1989), 55–63.
[12] V. Rajlich, Absolutely parallel grammars and two-way finite state transducers, Journal of Computer and System Sciences 6 (1972), 324–342.
[13] B. Rovan, A framework for studying grammars, in Proc. MFCS '81, LNCS 118, Springer Verlag, 1981, pp. 473–482.
[14] B. Rovan, Complexity classes of g-systems are AFL, Acta Math. Univ. Com., Vol. 48–49 (1986), 283–297.
[15] B. Rovan, M. Slašťan, Eliminating Communication by Parallel Rewriting, in Proc. DLT 2001, LNCS 2295, Springer Verlag, 2002, 369–378.
[16] G. Rozenberg, A. Salomaa, The Mathematical Theory of L-systems, Academic Press, New York, 1980.
[17] R. Siromoney, K. Krithivasan, Parallel context-free languages, Information and Control 24 (1974), 155–162.
Physics and Theoretical Computer Science, J.-P. Gazeau et al. (Eds.), IOS Press, 2007. © 2007 IOS Press. All rights reserved.
Basic enumerative combinatorics

Xavier Gérard Viennot 1
CNRS, LaBRI, Université Bordeaux 1

Abstract. We give a survey of basic tools and motivations in contemporary enumerative combinatorics. We start with the classical example of binary trees and the generating function of the Catalan numbers, and extend it to "decomposable structures". A typical but non-trivial example is given by the Schaeffer decomposition of planar maps, which explains the algebricity of the corresponding generating function. We then systematically show the correspondence between algebraic operations on formal power series (generating functions) and the corresponding operations at the level of combinatorial objects (sum, product, substitution, quasi-inverse, ...). A section is devoted to rational generating functions and a basic inversion lemma (interpreted in physics as the transition matrix methodology). We also investigate the world of q-series and q-analogues, with some examples of bijective proofs of identities and the "bijective paradigm". We finish with an introduction to the theory of heaps of pieces and an inversion formula, as a typical example of the active domain of algebraic combinatorics, in connection with theoretical physics.

Keywords. Enumerative, bijective, algebraic combinatorics. Generating functions. Catalan numbers, Fibonacci numbers, Strahler numbers. Dyck paths, binary trees, planar maps. Decomposable structures. q-series, q-analogues. Ferrers diagrams, partitions of integers. Bijective proofs. Transition matrix methodology, inversion lemma. Heaps of pieces.
1. An example: binary trees and generating function

We begin with a classical combinatorial object, called a binary tree, displayed in Figure 1. A binary tree is either reduced to a single vertex, or is a triple formed by a root and a pair of binary trees (the left and the right subtree). Each vertex has either two sons (a left and a right son) or no sons; in the first case the vertex is called an internal vertex, in the second a leaf or external vertex. A binary tree with n internal vertices has n + 1 external vertices. The number of binary trees with n (internal) vertices is the classical Catalan number Cn. The sequence of Catalan numbers appears everywhere in combinatorics, and also in some parts of pure mathematics and in theoretical physics. The first values (for n = 0, 1, ...) are 1, 1, 2, 5, 14, 42, 132, ... Historically, these numbers appeared as the number of triangulations of regular polygons with n + 2 vertices, that is, maximal configurations of two by two disjoint diagonals, as displayed on Figure 2. The problem of counting such triangulations goes back to Euler and Segner in the 18th century, and around 1830-1840 to Binet, Lamé, Rodrigues, Catalan, and others. A nice, simple and explicit formula for Catalan numbers is:

1 Correspondence to: Xavier Gérard Viennot, CNRS, LaBRI, Université Bordeaux 1, 33405 Talence Cedex, France; E-mail: viennot@labri.fr
X.G. Viennot / Basic Enumerative Combinatorics
Figure 1. A binary tree.
Figure 2. A triangulation of a polygon.
C_n = \frac{1}{n+1} \binom{2n}{n} = \frac{(2n)!}{(n+1)!\, n!}. \qquad (1)
The variety of different proofs is typical of the evolution of combinatorics through the last two centuries. Classically, from the very definition of a binary tree, and from the fact that the choices of the left and right subtrees are independent, one would write the following recurrence relation for the Catalan numbers, from which one can deduce formula (1):

C_{n+1} = \sum_{i+j=n} C_i C_j, \qquad C_0 = 1. \qquad (2)
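Recurrence (2) is easy to run mechanically and to check against the closed formula (1); a minimal Python sketch (the function name is ours, for illustration):

```python
from math import comb

def catalan(n_max):
    """Catalan numbers via recurrence (2): C_{n+1} = sum_{i+j=n} C_i C_j."""
    c = [1]                              # C_0 = 1
    for n in range(n_max):
        c.append(sum(c[i] * c[n - i] for i in range(n + 1)))
    return c

cs = catalan(10)
# agreement with the closed form (1): C_n = binom(2n, n) / (n + 1)
assert all(cs[n] == comb(2 * n, n) // (n + 1) for n in range(11))
print(cs[:7])  # → [1, 1, 2, 5, 14, 42, 132]
```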
Enumeration of binary trees is the typical situation in enumerative combinatorics: we have a class of "combinatorial objects α of size n", that is, a set A and a "size" function α ↦ |α| from A to the natural numbers N, such that the set A_n of objects of size n is finite. The problem is to find a "formula" for a_n, the number of elements of A_n. A powerful tool in combinatorics is the notion of generating function (which is neither generating nor a function); it is simply the formal power series whose coefficients are the numbers a_n of objects:

f(t) = a_0 + a_1 t + a_2 t^2 + \cdots + a_n t^n + \cdots. \qquad (3)
As for polynomials, we can define the sum and the product of two formal power series. In the case of Catalan numbers, recurrence (2) is equivalent to the following algebraic equation:

y = 1 + t\, y^2. \qquad (4)
In modern combinatorics, a standard lemma enables one to go directly from the definition of binary trees to the equation (4). The philosophy is to define a kind of operations on combinatorial objects, such as sum and product, and to consider equation (4) as the "projection" onto the algebra of generating power series of the analogous equation in the space of binary trees. Historically, it took a certain time before generating functions in combinatorics were considered as formal power series, without consideration of convergence in the real or complex domain. Nevertheless, there subsists a kind of formal convergence for power series: for example, infinite sums such as 1 + 1 + 1 + ... or t + t + t + ... are meaningless. Formal power series in one variable, with coefficients in a ring K (in practice, K = Z or Q), form an algebra. Extensions to several variables are immediate. There is the notion of quasi-inverse, such as 1/(1 − t) = 1 + t + t^2 + t^3 + ..., and of substitution f(u(t)), where f and u are two formal power series with u(0) = 0, i.e. u(t) has no constant term. For example, the reader will check, by substituting u = t + t^2 in the power series 1/(1 − t), that the power series 1/(1 − t − t^2) is the generating function of the Fibonacci numbers F_n defined by the following linear recurrence:

F_{n+1} = F_n + F_{n-1}; \qquad F_0 = F_1 = 1. \qquad (5)
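The substitution u = t + t^2 into the quasi-inverse 1/(1 − t) can be carried out concretely on truncated coefficient lists; a small Python sketch (the helper name `substitute` is ours):

```python
def substitute(f, u, order):
    """Coefficients of f(u(t)) up to t^order, for coefficient lists f and u
    with u(0) = 0 (so the composition is well defined on formal series)."""
    assert u[0] == 0
    u = u + [0] * (order + 1 - len(u))
    result = [0] * (order + 1)
    power = [1] + [0] * order            # u(t)^k, truncated; u^0 = 1
    for k in range(order + 1):
        coeff = f[k] if k < len(f) else 0
        for i in range(order + 1):
            result[i] += coeff * power[i]
        # multiply 'power' by u(t), truncating at t^order
        power = [sum(power[j] * u[i - j] for j in range(i + 1))
                 for i in range(order + 1)]
    return result

fib = substitute([1] * 9, [0, 1, 1], 8)  # 1/(1-t) composed with u = t + t^2
print(fib)  # → [1, 1, 2, 3, 5, 8, 13, 21, 34]
```

The output is indeed the Fibonacci sequence of (5), i.e. the coefficients of 1/(1 − t − t^2).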
Going back to Catalan numbers, from the algebraic equation (4) we get an explicit expression for their generating function:

y = \frac{1 - (1 - 4t)^{1/2}}{2t}. \qquad (6)
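Expanding (6) as a power series with exact rational arithmetic recovers the Catalan numbers; an illustrative Python sketch (function names are ours):

```python
from fractions import Fraction
from math import comb

def sqrt_series(order):
    """Coefficients of (1 - 4t)^{1/2} via the binomial series with m = 1/2."""
    m = Fraction(1, 2)
    coeffs, binom = [], Fraction(1)
    for k in range(order + 1):
        if k > 0:
            binom *= (m - k + 1) / k     # binomial coefficient C(1/2, k)
        coeffs.append(binom * (-4) ** k)
    return coeffs

order = 9
s = sqrt_series(order)
# y = (1 - (1 - 4t)^{1/2}) / (2t): drop the constant term, shift, divide by 2
catalan_coeffs = [-s[k + 1] / 2 for k in range(order)]
assert all(c == comb(2 * n, n) // (n + 1) for n, c in enumerate(catalan_coeffs))
print([int(c) for c in catalan_coeffs])  # → [1, 1, 2, 5, 14, 42, 132, 429, 1430]
```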
Using the "binomial" formula

(1+u)^m = 1 + \frac{m}{1!}\, u + \frac{m(m-1)}{2!}\, u^2 + \frac{m(m-1)(m-2)}{3!}\, u^3 + \cdots

for m = 1/2 and u = −4t, one gets formula (1) for the Catalan numbers. The generating function for Catalan numbers is the archetype of an algebraic generating function, that is, a formal power series y such that P(y, t) = 0 for some polynomial P in y and t. Going from the recursive definition of binary trees to the algebraic equation (4) is typical of modern enumerative combinatorics. But during the process, at the end, in the computation following relation (6), we are far from the combinatorics of binary trees. Another approach, in the spirit of "bijective combinatorics", is to explain identities by the construction of bijections relating the objects interpreting each member of the equality. For example, relation (1) is equivalent to the following identity:

(n+1)\, C_n = \binom{2n}{n}. \qquad (7)
A bijective proof of (7) can be obtained by constructing a bijection between binary trees having n internal vertices with one of their leaves distinguished, and the n-element subsets of a set having 2n elements. Another identity equivalent to (1) or (7) is

2(2n+1)\, C_n = (n+2)\, C_{n+1}, \qquad (8)
and a completely different combinatorial construction gives a bijective proof of it. Such a construction has been given by Rémy in terms of binary trees and (surprisingly for the context of that epoch) by Rodrigues in terms of triangulations. An interest of this bijection is that it yields a linear time algorithm for constructing a random binary tree (with uniform distribution over all binary trees having a given number of vertices).
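The growth procedure behind identity (8) can be sketched as follows (a Python illustration in the spirit of Rémy's algorithm; class and function names are ours, and only structural invariants are checked here, the uniformity being the content of the standard argument via (8)):

```python
import random

class Node:
    __slots__ = ("left", "right", "parent")
    def __init__(self):
        self.left = self.right = self.parent = None

def remy(n, rng=random):
    """Grow a binary tree with n internal vertices: repeatedly pick a uniform
    vertex x, splice a new internal vertex in its place, and attach x together
    with a fresh leaf on a uniformly chosen side."""
    root = Node()                        # start from a single leaf
    vertices = [root]
    for _ in range(n):
        x = rng.choice(vertices)
        internal, leaf = Node(), Node()
        p = x.parent
        internal.parent = p
        if p is None:
            root = internal
        elif p.left is x:
            p.left = internal
        else:
            p.right = internal
        if rng.random() < 0.5:           # uniform side for the new leaf
            internal.left, internal.right = x, leaf
        else:
            internal.left, internal.right = leaf, x
        x.parent = leaf.parent = internal
        vertices += [internal, leaf]
    return root

def counts(node):
    """(#internal, #external) vertices of the tree rooted at node."""
    if node.left is None:                # leaves have no children
        return (0, 1)
    li, le = counts(node.left)
    ri, re = counts(node.right)
    return (li + ri + 1, le + re)

t = remy(50, random.Random(0))
assert counts(t) == (50, 51)             # n internal, n + 1 external vertices
```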
2. Algebricity and decomposable structures

Binary trees are the standard example of a decomposable structure leading to an algebraic equation for the generating function. From the definition, the decomposition of a binary tree into smaller binary trees is immediate. There are many examples of classes of combinatorial objects having a nice algebraic generating function, but where the explanation of this algebricity by a recursive decomposition is far from evident. We give a typical and well known example with planar maps.
Figure 3. A rooted planar map.
A planar map is an embedding on the sphere of a planar graph, up to homeomorphism. It may have loops and multiple edges. Counting planar maps is easier when one edge is selected, with an orientation on it. These are the so-called rooted planar maps (see Figure 3). The generating function y for such objects, counted according to the number of edges, is algebraic and is a solution of the following system of algebraic equations, given by W. Tutte in the 60's:

y = A - tA^3, \qquad A = 1 + 3tA^2. \qquad (9)

From equations (9), using standard tools such as the Lagrange inversion formula, one gets the following formula for the number a_m of rooted planar maps with m edges:

a_m = \frac{2 \cdot 3^m}{m+2}\, C_m. \qquad (10)
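Formula (10) can be checked against the system (9) by solving (9) iteratively as a truncated power series; a Python sketch (helper names are ours):

```python
from math import comb

ORDER = 10

def mul(p, q):
    """Product of two truncated power series (coefficient lists)."""
    r = [0] * (ORDER + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= ORDER:
                r[i + j] += a * b
    return r

# Solve A = 1 + 3 t A^2 by iteration: each pass fixes one more coefficient.
A = [1] + [0] * ORDER
for _ in range(ORDER + 1):
    A2 = mul(A, A)
    A = [1] + [3 * A2[i] for i in range(ORDER)]

# y = A - t A^3
A3 = mul(mul(A, A), A)
y = [A[i] - (A3[i - 1] if i > 0 else 0) for i in range(ORDER + 1)]

# Formula (10): a_m = 2 * 3^m * C_m / (m + 2)
closed = [2 * 3**m * comb(2 * m, m) // ((m + 1) * (m + 2))
          for m in range(ORDER + 1)]
assert y == closed
print(y[:6])  # → [1, 2, 9, 54, 378, 2916]
```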
Tutte's method for proving (9) is indirect, with the use of so-called "catalytic" extra variables and the "kernel method". Many efforts were made in order to "explain" directly the algebraic equations (9) and the formula (10): in the 70's by Cori and Vauquelin with the introduction of "well labeled trees", in the 80's by Arquès, until the "final" explanation by Schaeffer [4] in 1998, with the introduction of "balanced blossoming trees". A blossoming tree is a binary tree, with the choice, for each internal vertex, of a bud in one of the three possible regions delimited by the edges incident to that vertex (an extra edge has been added to the root of the binary tree, see Figure 4). The number of such trees is 3^n C_n, obviously satisfying the second equation of the system (9). By a process connecting these n buds with n of the n + 2 external edges of the binary tree one gets a planar map, where two external edges are left unconnected (see Figure 5). The
external root edge is not connected if and only if the blossoming tree is balanced. Then, by connecting this root edge with the other unconnected external edge, one gets a rooted planar map where every vertex has degree 4. Such rooted maps with n vertices are in bijection with rooted planar maps having n edges. Finally, Figure 6 explains visually the first algebraic equation of the system (9) (following a proof by Bouttier, Di Francesco and Guitter, 2002).
Figure 4. Binary tree and blossoming tree.
Figure 5. Balanced and unbalanced blossoming tree.
3. Substitution in generating function

We have seen above how decompositions of combinatorial structures are related to some operations on generating functions such as sum and product. This is the philosophy of modern enumerative theory, considering some "operations" on combinatorial objects
Figure 6. Enumeration of blossoming trees.
(sum, product, ...) which are the "lifting" of the analogous operations in the algebra of formal power series. It would be possible to write a standard abstract lemma for each operation, but at this level of exposition we prefer to work with examples. We will see in section 5 more examples related to the operation product and also to the operation "quasi-inverse". In this section, we give an example of the operation "substitution", on power series and on combinatorial objects. We define the Strahler number of a binary tree by the following recursive procedure. The leaves (external vertices) are labeled 0. If an internal vertex has two sons labeled k and k', then the label of that vertex is max(k, k') if k ≠ k', and k + 1 if k = k' (see Figure 8). In this recursive way, every vertex is labeled, and the label of the root is called the Strahler number of the binary tree (see Figure 7). This parameter was introduced in hydrogeology by Strahler, following Horton, and has a long history in various sciences including the fractal physics of ramified patterns, computer graphics, molecular biology and theoretical computer science (see the survey paper [6]). We are interested in the enumeration of binary trees according to the parameter "Strahler number". Let S_{n,k} be the number of binary trees having n (internal) vertices and Strahler number k. Let S_k(t) and S(t, x) be the corresponding generating functions:

S_k(t) = \sum_{n \ge 0} S_{n,k}\, t^n; \qquad S(t, x) = \sum_{k \ge 0} S_k(t)\, x^k. \qquad (11)

We are going to give an idea of the fact that this double generating function satisfies the following functional equation:

S(t, x) = 1 + \frac{xt}{1-2t}\, S\!\left( \left(\frac{t}{1-2t}\right)^{2},\; x \right). \qquad (12)
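The coefficients S_{n,k} can be tabulated directly from the recursive definition of the Strahler labeling; a brute-force Python sketch (illustrative only):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count(n, k):
    """Number of binary trees with n internal vertices and Strahler number k,
    straight from the recursive definition of the Strahler labeling."""
    if n == 0:
        return 1 if k == 0 else 0
    total = 0
    for i in range(n):                    # i internal vertices on the left
        for a in range(k + 1):            # Strahler numbers of the two sons
            for b in range(k + 1):
                if (a + 1 if a == b else max(a, b)) == k:
                    total += count(i, a) * count(n - 1 - i, b)
    return total

# Summing over k recovers the Catalan numbers C_n
assert all(sum(count(n, k) for k in range(n + 1)) == comb(2 * n, n) // (n + 1)
           for n in range(1, 9))
print([count(3, k) for k in range(4)])   # → [0, 4, 1, 0]
```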
Figure 7. Strahler number of a binary tree.
Figure 8. Strahler labeling.
First we recall the (very classical) bijection between binary trees and Dyck paths. A Dyck path is displayed on Figure 9, and the bijection is obtained by following the vertices of the binary tree in the so-called "prefix order" and associating a North-East (resp. South-East) step in the Dyck path when one reaches an internal (resp. external) vertex of the binary tree. The height H(ω) of a Dyck path ω is the maximum level of its vertices (here 4 on Figure 9). We define the "logarithmic height" LH(ω) of the Dyck path ω as the integer part of the logarithm in base 2 of the height of ω augmented by 1. In other words, we have the following characterization:

LH(\omega) = k \iff 2^{k-1} - 2 < H(\omega) \le 2^k - 2. \qquad (13)
A remarkable fact is that the distribution of the parameter LH among Dyck paths of length 2n is exactly the same as that of the Strahler number among binary trees having n internal vertices. In other words, the corresponding double generating function for Dyck paths also satisfies the functional equation (12). The proof relies on a bijective interpretation of this functional equation on both binary trees and Dyck paths (due to Françon [2]). We briefly summarize the key ideas of this bijective proof. We need to introduce another family of binary trees (general binary trees), i.e. binary trees having four kinds of vertices: no son, one left son, one right son, or two sons. The family introduced at the beginning of this paper will be called complete binary trees to avoid confusion. The second member of the equation (12) is obtained by replacing S(u, x) by uS(u^2, x) and then substituting u by t/(1 − 2t). The first substitution is interpreted by a bijection between general binary trees having n vertices and complete binary trees having a total of 2n + 1 vertices (n internal and n + 1 external). Such a bijection is shown on Figure 10. This first substitution is also interpreted on Dyck paths by the bijection displayed on Figure 12. A Dyck path of length 2n is put in bijection with a "2-colored
Figure 9. Bijection between binary trees and Dyck paths.
Motzkin path" of length n − 1, that is, a path having four kinds of elementary steps: North-East and South-East, as for Dyck paths, together with elementary East steps, colored blue or red.
Figure 10. Bijection between general binary trees and complete binary trees.
Figure 11. Substitution in binary trees.
Figure 12. Bijection between Dyck paths and 2-colored Motzkin paths.
The generating function t/(1 − 2t) corresponds to "zig-zag filaments" in binary trees (the pieces shown on Figure 11), or to sequences of level steps colored blue or red in the paths of Figure 12. The substitution of u into t/(1 − 2t) in the double generating function S(u, x) corresponds, in the class of (complete) binary trees, to "substituting" each of the 2n + 1 vertices by a "zig-zag filament" as shown on Figure 11. The complete binary tree becomes a bigger binary tree. The same substitution is also interpreted on Dyck paths: each of the 2n + 1 vertices of a Dyck path is "substituted" by a sequence of blue or red East steps staying at the same level as the vertex of the Dyck path. In these constructions, each of the parameters (Strahler number of the binary tree, logarithmic height of the Dyck path) is increased by one. These ideas, involving "substitution" inside combinatorial objects, are at the basis of the proof that the parameters "Strahler number" and "logarithmic height" have the same distribution. A recursive bijection between binary trees and Dyck paths preserving these parameters could be deduced from these considerations (Françon [2]). A direct and deep bijection has just been obtained by D. Knuth [3]. The problem of computing the generating function S_k(t) for binary trees having Strahler number k is thus reduced to finding the generating function for Dyck paths of a given height. It is a rational power series and will be given in the next section as a consequence of a general inversion property.
4. Rational generating function

A formal power series is rational if it has the following form:
\sum_{n \ge 0} a_n t^n = \frac{N(t)}{D(t)}, \qquad (14)
where N(t) and D(t) are polynomials in the variable t with D(0) ≠ 0. We give here a general inversion theorem, which takes many forms in different domains: inversion of matrices in linear algebra, the transition matrix in physics, generating functions of finite automata in theoretical computer science. We define a path (or walk) on an arbitrary set S to be a sequence ω = (s_0, s_1, ..., s_n) with s_i ∈ S. The vertex s_0 (resp. s_n) is the starting (resp. ending) point, n is the length of the path, and the pairs (s_i, s_{i+1}) are its elementary steps. We suppose that a function v : S × S → K[X] is given, where K[X] is the algebra of polynomials with coefficients in the ring K. The weight v(ω) of the path ω is defined to be the product of the weights of its elementary steps. The following proposition gives the generating function for weighted paths in a finite set S.

Proposition. Let i and j be elements of the finite set S. The generating function for weighted paths ω starting in i and ending in j is rational and given by

\sum_{\omega : i \to j} v(\omega) = \frac{N_{ij}}{D}, \qquad (15)

where

D = \sum_{\{\gamma_1, ..., \gamma_r\} \ \text{two by two disjoint cycles}} (-1)^r\, v(\gamma_1) \cdots v(\gamma_r), \qquad (16)

N_{ij} = \sum_{\{\eta;\, \gamma_1, ..., \gamma_r\}} (-1)^r\, v(\eta)\, v(\gamma_1) \cdots v(\gamma_r). \qquad (17)
In D and in N_{ij}, a cycle means a circular sequence of distinct vertices (as in the decomposition of a permutation into disjoint cycles). The weight of a cycle is the product of the weights of its oriented edges. In D the cycles γ_1, ..., γ_r are two by two disjoint; in N_{ij} the path η is a self-avoiding path from i to j, and the cycles γ_1, ..., γ_r are two by two disjoint and disjoint from the path η (see Figure 13). In fact, formula (15) is nothing but another form of the classical inversion formula for a matrix in linear algebra. If we define the matrix A = (a_{ij})_{1 ≤ i,j ≤ n} by

v(i, j) = a_{ij}, \qquad (18)

then the entry ij of the inverse of the matrix (I − A) is the sum of the weights of the paths ω going from i to j. The denominator D is the determinant of the matrix (I − A), while N_{ij} is the cofactor ji of the same matrix.
Figure 13. Determinant D and cofactor Nij .
Example 1. Fibonacci numbers. We consider the segment graph of length n. A matching is a collection of two by two disjoint edges (i, i + 1) (see Figure 14). Such matchings are in bijection with paths on the set S = {1, 2}, with weighted edges as displayed on Figure 15, starting and ending at the vertex 1. Applying formula (15) gives the generating function for the Fibonacci numbers:

\sum_{n \ge 0} F_n t^n = \frac{1}{1 - t - t^2}. \qquad (19)
Figure 14. Matching and Fibonacci numbers.
Example 2. Bounded Dyck paths. We define F_n(x) to be the alternating generating polynomial for matchings of the segment of length n, that is,

F_n(x) = \sum_{k=0}^{n} a_{n,k}\, (-x)^k, \qquad (20)

where a_{n,k} is the number of matchings of the segment graph of length n having k edges. The generating function for Dyck paths bounded by the height k is given by the following
rational function:

\sum_{\substack{\omega \ \text{Dyck path} \\ H(\omega) \le k}} t^{|\omega|/2} = \frac{F_k(t)}{F_{k+1}(t)}. \qquad (21)
Figure 15. Bounded Dyck path.
A Dyck path is in bijection with a path on the segment [0, k], starting and ending at the vertex 0, with elementary steps (i, i+1) and (i, i−1). Equation (21) is just a consequence of the above proposition. The corresponding matrix A = (a_{ij}) is a tridiagonal matrix with 0 entries on the diagonal and t entries above and below the diagonal. The polynomials F_k(t) are usually called Fibonacci polynomials. Up to a change of variable, they are Tchebycheff polynomials of the second kind. Figure 16 gives the Fibonacci polynomial of order 4.
Figure 16. Fibonacci polynomial F4 (t).
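Formula (21) can be checked against a direct count of bounded Dyck paths. We use the indexing convention F₀ = F₁ = 1 with the three-term recurrence F_{k+1} = F_k − t·F_{k−1} (under which F₄(t) = 1 − 3t + t², consistent with Figure 16); the height convention in the brute-force counter is chosen to match this indexing:

```python
from fractions import Fraction

def fib_polys(kmax):
    """Fibonacci polynomials F_0 = F_1 = 1, F_{k+1} = F_k - t*F_{k-1},
    as coefficient lists (index = power of t)."""
    F = [[1], [1]]
    for _ in range(kmax - 1):
        a, b = F[-1], F[-2]
        nxt = a + [0] * (len(b) + 1 - len(a))
        for i, c in enumerate(b):
            nxt[i + 1] -= c
        F.append(nxt)
    return F

def series_div(num, den, n_terms):
    """Power-series expansion of num/den (assumes den[0] != 0)."""
    c = []
    for n in range(n_terms):
        s = Fraction(num[n]) if n < len(num) else Fraction(0)
        for i in range(1, n + 1):
            if i < len(den):
                s -= den[i] * c[n - i]
        c.append(s / den[0])
    return c

def bounded_dyck(n, k):
    """Dyck paths of semilength n staying at height <= k (simple DP)."""
    cur = {0: 1}
    for _ in range(2 * n):
        nxt = {}
        for h, m in cur.items():
            for h2 in (h - 1, h + 1):
                if 0 <= h2 <= k:
                    nxt[h2] = nxt.get(h2, 0) + m
        cur = nxt
    return cur.get(0, 0)

F = fib_polys(6)
for k in range(1, 5):
    series = series_div(F[k], F[k + 1], 8)
    assert all(series[n] == bounded_dyck(n, k) for n in range(8))
```

For k = 2, for instance, F₂/F₃ = (1 − t)/(1 − 2t) expands as 1 + t + 2t² + 4t³ + ..., matching the counts of Dyck paths of height at most 2.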
Combining equation (21) with section 3 gives the generating function S≤k(t) for binary trees with Strahler number ≤ k. Subtracting S≤k−1(t) from S≤k(t), and applying some trigonometric formulae about modified Tchebycheff polynomials, gives the following generating function for binary trees with given Strahler number k (enumerated by number of vertices):

$$S_{\le k}(t) = \frac{R_{2^k-2}(t)}{R_{2^k-1}(t)}, \qquad S_k(t) = S_{\le k}(t) - S_{\le k-1}(t) = \frac{t^{2^{k-1}}}{R_{2^k-1}(t)\, R_{2^{k-1}-1}(t)}. \tag{22}$$
5. q-series and q-analogues

For various reasons, some generating functions are denoted with the variable "q". In this section, we are in the garden of q-series, q-calculus and q-analogues, sometimes also called quantum combinatorics. A typical example is the generating function for partitions of integers. As in the previous sections, this section will show the relationship between elementary operations on power series and the corresponding operations on combinatorial structures. Here we will illustrate a new operation: the "quasi-inverse" 1/(1 − u), corresponding to sequences of combinatorial objects. A partition of the integer n is nothing but a decreasing sequence of non-zero integers (λ1 ≥ ... ≥ λk) such that λ1 + ... + λk = n. Such a partition can be visualized with the so-called Ferrers diagram, as shown on Figure 17. The ith row (from bottom to top) has λi cells.
Figure 17. Ferrers diagram.
We now make explicit the generating function for partitions of integers, or Ferrers diagrams. First, the generating function for a single row of length i reduces to the single monomial q^i. A rectangle with i columns is nothing but a sequence of rows of length i. Now a general lemma says that if a combinatorial object A has generating function u, then a "sequence" of objects A has generating function 1/(1 − u). Thus a rectangle with i columns has generating function 1/(1 − q^i). Any Ferrers diagram can be decomposed in a unique way into a family of rectangles (see Figure 18). Applying the "product" lemma as in sections 1 and 2, we get the generating function for Ferrers diagrams having at most m columns: 1/((1 − q)(1 − q²)⋯(1 − q^m)). Going to the limit, we get the generating function for partitions of n (or Ferrers diagrams with n cells):

$$\sum_{n \ge 0} a_n q^n = \prod_{i \ge 1} \frac{1}{1 - q^i}. \tag{23}$$
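The product formula (23) is easy to check by machine: multiply the factors 1/(1 − q^i) one at a time and compare with a direct recursive count of partitions (a minimal sketch; the function names are ours):

```python
def partition_series(N):
    """Coefficients of prod_{i>=1} 1/(1-q^i), exact up to q^N
    (factors with i > N cannot affect these coefficients)."""
    c = [0] * (N + 1)
    c[0] = 1
    for i in range(1, N + 1):          # multiply by 1/(1 - q^i)
        for n in range(i, N + 1):
            c[n] += c[n - i]
    return c

def partitions_count(n, max_part=None):
    """Brute-force count of partitions of n into parts <= max_part."""
    if n == 0:
        return 1
    if max_part is None:
        max_part = n
    return sum(partitions_count(n - p, p)
               for p in range(1, min(n, max_part) + 1))

c = partition_series(15)
assert [c[n] for n in range(8)] == [1, 1, 2, 3, 5, 7, 11, 15]
assert all(c[n] == partitions_count(n) for n in range(16))
```

Multiplying by 1/(1 − q^i) is the same in-place sweep used for coin-change counting: each pass allows arbitrarily many parts of size i.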
q-analogues

The generating function of Ferrers diagrams included in an m × n rectangle is given by the following expression:

$$\frac{(1-q)(1-q^2)\cdots(1-q^{m+n})}{(1-q)\cdots(1-q^m)\,(1-q)\cdots(1-q^n)}. \tag{24}$$
Figure 18. Generating function for Ferrers diagrams.
If we take the formal variable q to be a real number, then in the limit q → 1 the expression (24) tends to the binomial coefficient $\binom{m+n}{m}$. The polynomial (24), also called a Gaussian polynomial, is a q-analogue of the binomial coefficient. Ferrers diagrams included in a rectangle are defined by a path of length m + n with m elementary steps East and n elementary steps South, and are thus enumerated by that binomial coefficient. The parameter q is the counting parameter for the area below the path. This is a typical situation of a combinatorial q-analogue. Another example is counting permutations on n elements according to the number of inversions, which is given by the following polynomial:

$$(1+q)(1+q+q^2)\cdots(1+q+\cdots+q^{n-1}) = \frac{(1-q)(1-q^2)\cdots(1-q^n)}{(1-q)^n}. \tag{25}$$
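Both q-analogues can be verified by brute force for small parameters. For (25) we compare the product with the inversion statistic over all permutations; for the Gaussian polynomial we check the q → 1 limit and the (well-known, though not stated here) palindromicity of the box-partition polynomial rather than the product formula itself:

```python
from itertools import permutations
from math import comb, factorial

def polymul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def inv_poly(n):
    """The product (1+q)(1+q+q^2)...(1+...+q^{n-1}) from (25)."""
    p = [1]
    for k in range(2, n + 1):
        p = polymul(p, [1] * k)
    return p

def inversions(w):
    return sum(w[i] > w[j] for i in range(len(w)) for j in range(i + 1, len(w)))

def box_partitions(m, n):
    """Generating polynomial (by area) of Ferrers diagrams inside an
    m-column x n-row box: at most n rows, each of length <= m."""
    coeffs = [0] * (m * n + 1)
    def rec(rows_left, max_row, area):
        coeffs[area] += 1          # each non-increasing row sequence once
        if rows_left == 0:
            return
        for r in range(1, max_row + 1):
            rec(rows_left - 1, r, area + r)
    rec(n, m, 0)
    return coeffs

for n in range(1, 6):
    p = inv_poly(n)
    brute = [0] * len(p)
    for w in permutations(range(n)):
        brute[inversions(w)] += 1
    assert p == brute and sum(p) == factorial(n)

for m in range(1, 5):
    for n in range(1, 5):
        g = box_partitions(m, n)
        assert sum(g) == comb(m + n, m)   # the q -> 1 limit
        assert g == g[::-1]               # Gaussian polynomials are palindromic
```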
For q = 1, this polynomial gives back n!, the number of permutations.

Bijective proof of an identity

Already in ancient Greek times such proofs were given, as for example a "bijective" or "visual" proof of the identity n² = 1 + 3 + ... + (2n − 1), displayed on Figure 19. In the same spirit as the proof of equation (23), we illustrate on Ferrers diagrams the philosophy of bijective proofs of identities. We will prove the following identity:

$$\sum_{m \ge 0} \frac{q^{m^2}}{[(1-q)(1-q^2)\cdots(1-q^m)]^2} = \prod_{i \ge 1} \frac{1}{1-q^i}. \tag{26}$$

A bijective proof is displayed on Figure 20.
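The identity (26) can be checked coefficientwise up to any fixed order; in the sketch below the m = 0 term accounts for the empty partition (empty square):

```python
def parts_at_most(m, N):
    """Series of 1/((1-q)...(1-q^m)): partitions with parts <= m."""
    c = [0] * (N + 1)
    c[0] = 1
    for i in range(1, m + 1):
        for n in range(i, N + 1):
            c[n] += c[n - i]
    return c

N = 20
rhs = parts_at_most(N, N)              # all partitions, exact up to q^N
lhs = [0] * (N + 1)
m = 0
while m * m <= N:
    P = parts_at_most(m, N)
    # convolve P with itself, shifted by m^2 (the square of side m)
    for a in range(N + 1):
        for b in range(N + 1 - a):
            n = m * m + a + b
            if n <= N:
                lhs[n] += P[a] * P[b]
    m += 1
assert lhs == rhs
```

The two copies of P correspond to the diagrams to the right of and above the largest square, exactly the decomposition discussed below.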
Figure 19. Identity n2 = 1 + 3 + ... + (2n − 1).
Figure 20. Bijective proof of an identity.
The left hand side of identity (26) is a product of three generating functions. The numerator q^{m²} is the generating function for a Ferrers diagram having a square shape m × m. The denominator 1/((1 − q)⋯(1 − q^m)) is the generating function for Ferrers diagrams having at most m columns (or, by symmetry, at most m rows). The right hand side of identity (26) is the generating function for (general) Ferrers diagrams. An arbitrary Ferrers diagram F can be decomposed in a unique way into a triple of Ferrers diagrams of the type described by the numerator and denominator in the left hand side of (26): it suffices to consider the largest square contained in the Ferrers diagram F. Such a bijective proof, relating calculus and the manipulation of combinatorial objects, is typical of the so-called bijective paradigm. Identities coming from various parts of mathematics can be treated this way. First one must find a combinatorial interpretation of both sides of the identity, which will appear as the sum of certain weighted objects. Then the identity will be seen as a consequence of the construction of a weight-preserving bijection between the combinatorial objects interpreting both sides of the identity. One possible interest is to give a better understanding of the identity. In recent years, much work has been done, in particular putting on the combinatorial level domains of mathematics such as special functions, orthogonal polynomials and continued fractions theory, elliptic functions, symmetric functions, representation theory of groups and algebras, etc. Combinatorics is "plural", with various attributes such as enumerative, bijective or algebraic. We finish this lesson with a typical chapter of algebraic combinatorics by introducing heaps of pieces and an inversion lemma. The theory of heaps of pieces has been very useful for interaction with theoretical physics and will be explained in more detail in our second lesson.
6. Algebraic combinatorics: an example with heaps of pieces

The theory of heaps of pieces (in French: "empilements de pièces") was introduced by the author in 1985 [5] as a geometric and combinatorial interpretation of the so-called commutation monoids defined by Cartier and Foata in 1969 [1]. Commutation monoids have been used in computer science as models for parallelism and concurrent access to data structures. They are also called trace monoids, after the pioneering work of Mazurkiewicz.
Figure 21. Heaps of dimers on a chessboard (with the projection π of each piece P, a domino, onto the board B = R × R).
The intuitive idea of a heap of pieces can be visualized on Figure 21. Here the pieces are “dimers on a chessboard". They are put one by one on the chessboard, such that the projection of each dimer is the union of two consecutive cells. Then, intuitively, the heap is symbolized by what can be seen when one forgets in which order the dimers have been put. Here we will develop only the particular case of “heaps of dimers on the integers N", as shown on Figure 22. A basic dimer is a pair (i, i + 1) of consecutive integers and a heap of dimers is formed by dimers, lying at a certain level k ≥ 0 and having as
projection on the horizontal axis a basic dimer (i, i + 1). In that case, the commutation monoid is the monoid generated by the variables σ0, σ1, ..., σn with commutations of the form:

$$\sigma_i \sigma_j = \sigma_j \sigma_i \quad \text{iff } |i - j| \ge 2. \tag{27}$$
Figure 22. A heap of dimers on N (positions 0–7 on the horizontal axis, levels 0–3 on the vertical axis; the maximal pieces are indicated).
We briefly review the basic idea relating commutation monoids and heaps of pieces with the example of Figure 22. We start with the word w = σ2σ3σ5σ1σ4σ1σ3. The columns above the horizontal axis are labeled σ0, ..., σ5, corresponding to the basic dimers (0, 1), ..., (5, 6). Reading the word w from left to right, each letter σi produces a dimer falling down in the column labeled σi, above the basic dimer (i, i + 1). The dimers fall down one by one, either onto the "floor" (the horizontal axis, at level 0), or onto another dimer which is in "concurrency" (i.e., the corresponding columns are labeled σi and σj with |i − j| ≤ 1). Two different words can lead to the same heap, as for example the words w and w′ = σ5σ2σ1σ1σ3σ4σ3, which give the heap displayed on Figure 22. The fundamental lemma is that two words give the same heap if and only if they are in the same commutation class, that is, if they can be transformed one into the other by a sequence of elementary commutations of the form (27). Thus, we get a bijection between commutation classes of the commutation monoid and the set of heaps of dimers over N. We suppose that each basic dimer α = (i, i + 1) is weighted by a certain polynomial v(α) of the polynomial algebra K[X]. We suppose that v(α) has no constant term. In general, v(α) will be a monomial. We define the weight v(E) of a heap of dimers E as the product of the weights of the pieces of the heap, the weight of a piece being the weight of its projection on the horizontal axis. We suppose that the generating function for weighted heaps, that is, the infinite sum Σ_E v(E) (over all heaps E), makes sense. A heap is called trivial when all its pieces are at level 0. We also suppose that the generating function for trivial heaps is a well defined formal power series. It will be a polynomial when the set of basic pieces is finite, as for example if we restrict the heaps of dimers on N to be on the segment [0, n − 1] of length n. A fundamental lemma, which
would be considered in physics as a boson-fermion identity, is the following:

Inversion lemma. The generating function for weighted heaps is the inverse of the alternating generating function for trivial heaps, that is,

$$\sum_{E \text{ heap}} v(E) = \frac{1}{D}, \quad \text{with } D = \sum_{T \text{ trivial heap}} (-1)^{|T|}\, v(T). \tag{28}$$
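The falling-dimer construction and the fundamental lemma on commutation classes can be simulated directly. Below, the two words w and w′ of Figure 22 produce the same heap, a legal commutation leaves the heap unchanged, and a forbidden swap changes it (encoding a heap as a set of (position, level) pairs is our own choice of data structure):

```python
def heap_of_word(word, n):
    """Drop dimers one by one: letter i means a dimer over (i, i+1).
    A falling dimer lands one level above the highest dimer in
    concurrency with it (columns i-1, i, i+1); the heap is the
    resulting set of (position, level) pairs."""
    top = [-1] * n                      # highest occupied level per column
    pieces = set()
    for i in word:
        lvl = 1 + max(top[j] for j in range(max(0, i - 1), min(n, i + 2)))
        pieces.add((i, lvl))
        top[i] = lvl
    return frozenset(pieces)

n = 6                                   # dimer positions 0..5, i.e. sigma_0..sigma_5
w1 = [2, 3, 5, 1, 4, 1, 3]              # the word w of the text
w2 = [5, 2, 1, 1, 3, 4, 3]              # the word w' of the text
assert heap_of_word(w1, n) == heap_of_word(w2, n)

w3 = [2, 5, 3, 1, 4, 1, 3]              # sigma_3, sigma_5 commuted (|3-5| >= 2)
assert heap_of_word(w3, n) == heap_of_word(w1, n)
w4 = [3, 2, 5, 1, 4, 1, 3]              # sigma_2, sigma_3 swapped (|2-3| = 1)
assert heap_of_word(w4, n) != heap_of_word(w1, n)
```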
A typical example is given with heaps of dimers on the segment [0, n − 1]. If each basic dimer is weighted by the variable t, then the generating function for weighted heaps is simply the generating function enumerating heaps of dimers according to the number of dimers. From section 4, it is the inverse of the Fibonacci polynomial Fn (t).
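This example can be tested numerically: build the alternating polynomial D of trivial heaps on the segment, invert it as a power series, and compare with a brute-force count of distinct heaps (exhaustive over words, so only small cases; all names below are ours):

```python
from itertools import combinations, product

def trivial_heap_poly(npos, N):
    """Alternating GF D of trivial heaps: sets of dimer positions in
    {0,...,npos-1} that are pairwise non-concurrent (|i-j| >= 2)."""
    D = [0] * (N + 1)
    for k in range(N + 1):
        for S in combinations(range(npos), k):
            if all(b - a >= 2 for a, b in zip(S, S[1:])):
                D[k] += (-1) ** k
    return D

def series_inverse(D, N):
    """Coefficients of 1/D as a power series (assumes D[0] = 1)."""
    c = [0] * (N + 1)
    c[0] = 1
    for n in range(1, N + 1):
        c[n] = -sum(D[i] * c[n - i] for i in range(1, min(n, len(D) - 1) + 1))
    return c

def count_heaps(npos, k):
    """Distinct heaps built from all words of length k (brute force),
    using the same falling-dimer rule as in the text."""
    def heap(word):
        top = [-1] * npos
        pieces = []
        for i in word:
            lvl = 1 + max(top[j] for j in range(max(0, i - 1), min(npos, i + 2)))
            pieces.append((i, lvl))
            top[i] = lvl
        return frozenset(pieces)
    return len({heap(w) for w in product(range(npos), repeat=k)})

npos, N = 4, 5                        # dimers with 4 possible positions
D = trivial_heap_poly(npos, N)        # here D = 1 - 4t + 3t^2
c = series_inverse(D, N)
for k in range(N + 1):
    assert c[k] == count_heaps(npos, k)
```

For 4 positions the counts come out as 1, 4, 13, 40, 121, 364: the inverse of the alternating matching polynomial, as the lemma predicts.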
Figure 23. Inversion lemma (the generating functions D and 1/D).
It will be very useful to use an extension of the inversion lemma. We define the maximal pieces of a heap as the pieces which can be "removed" from the heap by sliding them up without bumping into another piece (dually, the minimal pieces are the pieces lying at level 0). Let M be a set of basic pieces. We state the following extension of the inversion lemma:

Extended inversion lemma. Let M be a set of basic pieces. The generating function for weighted heaps of pieces such that the projections of the maximal pieces are contained in M is given by the ratio

$$\sum_{\substack{E \text{ heap} \\ \pi(\max E) \subseteq M}} v(E) = \frac{N}{D}, \tag{29}$$

where D (resp. N) is the alternating generating function of trivial heaps of pieces (resp. of trivial heaps with pieces which are not in M).
If the set of basic pieces is finite, then this heap generating function is rational. In particular, in the next lesson we will apply this extended inversion lemma to pyramids, that is, heaps having a unique maximal piece. The theory of heaps of pieces is particularly useful for the interaction between combinatorics and theoretical physics, in particular for solving in a purely combinatorial way some models from statistical physics such as: the directed animals model, the hard gas model, staircase polygons enumerated by perimeter and area, the SOS (Solid-on-Solid) model and paths with neighbour interactions. Heaps of pieces have also reappeared in 2D Lorentzian quantum gravity after the works of Ambjørn, Loll, Di Francesco, Guitter, Kristjansen, James and the author.

Selected further reading

For further study, the reader may turn to the well known reference in enumerative combinatorics, the two-volume book by Richard Stanley, "Enumerative combinatorics", Cambridge University Press, Vol. 1 (1986, 1997, 2000, 338p.) and Vol. 2 (1999, 2001, 598p.). Another reference, with emphasis on so-called analytic combinatorics and the analysis of algorithms in computer science, is the book in preparation by P. Flajolet (717p. in April 2006), available on his web site (pauillac.inria.fr/algo/flajolet/Publications/book.html). The first part (Part A: Symbolic Methods, final version), with its 3 chapters, is in the spirit of this Cargèse lesson. Finally, in this lesson we have completely put aside the vast domain of combinatorics dealing with exponential generating functions. A good exposition of exponential structures is the book by F. Bergeron, G. Labelle and P. Leroux, "Combinatorial species and tree-like structures", Encyclopaedia of Mathematics and its Applications, Cambridge University Press, 1997, 479p. A French version is available in the "Publications du LACIM", LACIM, UQAM, Montréal, 1996.
References
[1] P. Cartier and D. Foata, Problèmes combinatoires de commutation et réarrangements, vol. 85 of Lecture Notes in Maths, Springer-Verlag, Berlin, 1969.
[2] J. Françon, Sur le nombre de registres nécessaires à l'évaluation d'une expression arithmétique. RAIRO Informatique Théorique 18 (1984), 355–364.
[3] D. Knuth, www-cs-faculty.stanford.edu/~knuth/programs/francon.w.
[4] G. Schaeffer, Conjugaison d'arbres et cartes combinatoires aléatoires. PhD thesis, Université Bordeaux 1, 1998.
[5] X.G. Viennot, Heaps of pieces I: basic definitions and combinatorial lemma. In Combinatoire énumérative, G. Labelle and P. Leroux, Eds., no. 1234 in Lecture Notes in Maths, 1986, pp. 321–350.
[6] X.G. Viennot, Trees everywhere. A. Arnold, Ed., vol. 431 of Lecture Notes in Computer Science, Springer-Verlag, pp. 18–41.
Physics and Theoretical Computer Science J.-P. Gazeau et al. (Eds.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
An Introduction to Noncommutative Symmetric Functions

Jean-Yves Thibon 1
Institut Gaspard-Monge, Université de Marne-la-Vallée
Abstract. We give a short introduction to the theory of Noncommutative Symmetric Functions, including noncommutative Vieta formulae, relations with Solomon’s descent algebras, quasi-symmetric functions and Hecke algebras at q = 0. Keywords. Symmetric functions, noncommutative polynomials, quasi-symmetric functions, descent algebras, Hecke algebras
1. Symmetric functions, physics, and computers

The aim of these notes is to present a short introduction to the recent theory of noncommutative symmetric functions. It is an extension of the usual theory of symmetric functions, and since this is a school on Physics and Computer Science, I have to say a few words about the role of symmetric functions in these disciplines. As everybody knows, the elementary theory of symmetric functions has to do with algebraic equations, and algebraic equations occur in every field involving a bit of mathematics. So it is not this point that I have in mind. I rather want to stress the advanced part of the theory, which is related to group representations, integrable systems, invariant theory and algebraic geometry, among others. From the very beginning, computers have been used for physical calculations. Many of these were related to group representations. Although the standard references for the theory were for a long time Hermann Weyl's books [41,42], it was rather in the book by Francis D. Murnaghan [36] and then mostly in the one by Dudley E. Littlewood [24] that practical methods of calculation were presented. All of these methods were based on non-trivial identities on symmetric functions. The first computer programs devoted to group-theoretical calculations in Atomic Spectroscopy and Nuclear Physics were based on Littlewood's methods. The algorithms are described in [44,45,46]. Up to 1979, Littlewood's book remained the only reference dealing with these advanced aspects of symmetric functions. Then appeared the first edition of Ian G. Macdonald's book [28], giving a modernized presentation of the previous topics, and including some new ones, in particular the theory of Hall-Littlewood functions. To cut the story short, Hall-Littlewood functions, discovered by Littlewood in 1959 [25], are certain symmetric functions depending on a parameter q, which realize what is called the Hall algebra.
1 Correspondence to: J.-Y. Thibon, IGM, Université de Marne-la-Vallée, 77454 Marne-la-Vallée Cedex 2, France. Tel.: +33 1 60 95 77 22; Fax: +33 1 60 95 75 57; E-mail: [email protected].

The interest of this discovery is that James A. Green had expressed the character table of the general
linear group over a finite field GL(n, Fq) in terms of the structure constants of this Hall algebra [17]. Littlewood's discovery then allowed for an efficient algorithm to compute these tables. The same Hall-Littlewood functions were rediscovered by Macdonald, this time in the guise of spherical functions on p-adic groups [29]. Since then, they have arisen in various mathematical problems, and also in the analysis of certain exactly solvable models of statistical mechanics [19,37]. In the meantime, knowing that real and p-adic Lie groups had many properties in common, and that a one-parameter family of symmetric functions (the Jack polynomials) interpolated between spherical functions of the different series, Macdonald looked for, and eventually found [30], a two-parameter family of symmetric functions (the famous Macdonald polynomials) containing as specializations the Jack and Hall-Littlewood families. These new polynomials are known to be related to conformal field theory (they express singular vectors of Virasoro and W-algebras in various representations) and to exactly solvable one-dimensional n-body problems in the quantum and relativistic cases (cf. [8]). Research in this area is still very active, and relies a lot upon Combinatorics and Computer Algebra. Now, after the advent of Quantum Groups and Noncommutative Geometry, it became a natural reflex to look for noncommutative analogues of all interesting mathematical theories. At the time, Israel M. Gelfand and Vladimir S. Retakh were developing a general theory of noncommutative determinants [13,14]. The classical theory of symmetric functions involves a lot of determinantal identities, and the story began with an attempt to lift these to the Gelfand-Retakh theory [12].
2. The quest of the Noncommutative Symmetric Functions

2.1. Back to basics

The relations between the coefficients of a polynomial and its roots,

$$P(x) = \prod_{i=1}^{n} (x - x_i) = \sum_{k=0}^{n} (-1)^k e_k(X)\, x^{n-k}, \tag{1}$$
attributed to François Viète, were known in the sixteenth century, and certainly to the ancient civilisations in the case of quadratic polynomials. Our first requirement for a theory of noncommutative symmetric functions will then be that it should have something to say about roots of polynomials with coefficients in noncommutative rings (or, at least, in skew fields), and about the expansion of products of linear factors (x − xi). The advent of linear algebra and matrices allowed for a fresh interpretation of the Viète formulas. If P(x) is the characteristic polynomial of a matrix M, the xi are its eigenvalues, and ek(X) = tr Λ^k(M), where Λ^k(M) is the kth exterior power of M, i.e., the matrix whose entries are the minors of order k of M. It is often more convenient to assume that the xi are the reciprocals of the eigenvalues, so that

$$|I - xM| = \prod_{i=1}^{n} (1 - x_i x) = \sum_{k=0}^{n} e_k(X)\, (-x)^k \tag{2}$$
is invertible as a formal power series, and its inverse

$$|I - xM|^{-1} = \prod_{i=1}^{n} (1 - x_i x)^{-1} = \sum_{k \ge 0} h_k(X)\, x^k \tag{3}$$
has as coefficients the complete homogeneous symmetric functions hk(X), which can be interpreted as the traces of the symmetric powers S^k(M). This last statement is essentially MacMahon's "Master Theorem". The power sums $p_k(X) = \sum_i x_i^k$ are obviously the traces of the powers M^k, and at a more advanced level, one knows that the traces of the images of M under the irreducible polynomial representations of GLn, labelled by partitions λ, are the so-called Schur functions sλ(X) [24,28]. Another natural requirement for a theory of noncommutative symmetric functions is therefore that some of these properties should have an analogue for matrices with entries in a noncommutative field.

2.2. Roots of noncommutative polynomials

We first have to choose a reasonable definition of noncommutative polynomials, i.e., with coefficients in a noncommutative algebra R. If our variable x does not commute with R, the resulting algebra is in general intractable. So we will rather assume that x commutes with R, and introduce the notion of left and right roots. For

$$P(x) = \sum_{k=0}^{n} a_k x^k, \quad a_k \in R, \tag{4}$$
we say that c ∈ R is a right root of P if

$$\sum_{k=0}^{n} a_k c^k = 0. \tag{5}$$
Now, if x1, ..., xn ∈ R, where we assume that R is a division ring, can we find

$$P(x) = x^n - \Lambda_1 x^{n-1} + \Lambda_2 x^{n-2} - \cdots + (-1)^n \Lambda_n \tag{6}$$

such that the xk are right roots of P(x)? If the xi are pairwise non-conjugate, the unique solution is [4]

$$P(x) = \begin{vmatrix} 1 & \cdots & 1 & 1 \\ x_1 & \cdots & x_n & x \\ x_1^2 & \cdots & x_n^2 & x^2 \\ \vdots & & \vdots & \vdots \\ x_1^n & \cdots & x_n^n & \boxed{x^n} \end{vmatrix}, \tag{7}$$

where, for a matrix A = (aij) with coefficients in a noncommutative ring, the notation |A|ij, which can be displayed like a determinant with the entry aij in a box, denotes one of the quasi-determinants of A. The quasi-determinants of A can be defined for a generic matrix by the formulae [13,14]
$$|A|_{ij} = \left( (A^{-1})_{ji} \right)^{-1} \tag{8}$$

(these are analogues of the ratio of a determinant and one of its principal minors). One finds for example

$$\Lambda_1(x_1, x_2) = (x_2^2 - x_1^2)(x_2 - x_1)^{-1}, \qquad \Lambda_2(x_1, x_2) = (x_2^2 - x_1 x_2)(x_1^{-1} x_2 - 1)^{-1}.$$
These expressions are symmetric functions of the xi, but are not polynomials. The correct setting for developing their rigorous theory is the free field generated by the xi [15].

2.3. Products of linear factors

Since the Viète formulae are no longer true for noncommutative polynomials, we should also have a look at the expansion of products of linear factors. Let Y = {y1 < ... < yn} be an ordered alphabet of non-commuting variables. We could define elementary symmetric functions of Y by

$$\lambda_t(Y) = (1 + t y_n)(1 + t y_{n-1}) \cdots (1 + t y_1) = \sum_{k=0}^{n} \Lambda_k(Y)\, t^k, \tag{9}$$
where t commutes with the yi. An immediate objection to this naive definition would be that the Λk(Y) are not symmetric. This is an illusion. First, they are actually symmetric, but for a special action of the symmetric group, the plactic action of Lascoux and Schützenberger, now known to be a particular case of Kashiwara's action of Weyl groups on crystal graphs [26]. The Λk do not generate the full algebra of invariants of this action, but a very interesting subalgebra denoted by Sym(Y), and called the algebra of noncommutative symmetric polynomials over Y, the full algebra of invariants being FSym(Y), the algebra of free symmetric polynomials [9]. Suppose now that x = {x1, ..., xn} is a generic set of right roots of P(x), in the sense that the Vandermonde quasi-determinants

$$v_k = \begin{vmatrix} \boxed{x_1^{k-1}} & x_2^{k-1} & \cdots & x_k^{k-1} \\ x_1^{k-2} & x_2^{k-2} & \cdots & x_k^{k-2} \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{vmatrix} \neq 0. \tag{10}$$
Then, if we define

$$y_k = v_k\, x_k\, v_k^{-1}, \tag{11}$$

we have

$$P(x) = (x - y_1)(x - y_2) \cdots (x - y_n) = \sum_{k=0}^{n} (-1)^k \Lambda_k(Y)\, x^{n-k}. \tag{12}$$
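For n = 2, the quasi-determinant expressions for Λ1 and Λ2 given after (8) can be tested on explicit 2 × 2 rational matrices (the particular matrices below are arbitrary invertible choices of ours, and the tiny matrix helpers are hard-coded to size 2):

```python
from fractions import Fraction

def mmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def madd(A, B, s=1):
    """A + s*B, so madd(A, B, -1) is subtraction."""
    return [[A[i][j] + s * B[i][j] for j in range(2)] for i in range(2)]

def minv(A):
    d = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]

F = Fraction
x1 = [[F(1), F(2)], [F(0), F(1)]]
x2 = [[F(3), F(1)], [F(1), F(2)]]
I = [[F(1), F(0)], [F(0), F(1)]]
sq = lambda A: mmul(A, A)

# Lambda_1 = (x2^2 - x1^2)(x2 - x1)^{-1}
L1 = mmul(madd(sq(x2), sq(x1), -1), minv(madd(x2, x1, -1)))
# Lambda_2 = (x2^2 - x1 x2)(x1^{-1} x2 - 1)^{-1}
L2 = mmul(madd(sq(x2), mmul(x1, x2), -1),
          minv(madd(mmul(minv(x1), x2), I, -1)))

# both x1 and x2 are right roots of x^2 - Lambda_1 x + Lambda_2:
for x in (x1, x2):
    assert madd(madd(sq(x), mmul(L1, x), -1), L2) == [[0, 0], [0, 0]]
```

Exact Fraction arithmetic avoids any floating-point ambiguity; the check requires x2 − x1, x1 and x1⁻¹x2 − 1 to be invertible, which holds for these matrices.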
This is the relation between the two kinds of symmetric functions already considered. Moreover, R. Wilson has shown that any polynomial in the yi which is symmetric in the xi is a polynomial in the Λi(Y) [43]. Therefore, the two questions lead to the same algebra, although in a nonobvious manner. These considerations can lead to new results, even in elementary linear algebra. For example, Connes and Schwarz [6] obtained the following identity. Suppose that R = Mk(C), that is, a1, ..., an, x1, ..., xn are k × k complex matrices such that the xi form a nondegenerate set of right roots of P(x) = x^n + a1 x^{n−1} + ⋯ + an. Then,

$$\det(1 - \lambda x_1) \cdots \det(1 - \lambda x_n) = \det(1 + a_1 \lambda + \cdots + a_n \lambda^n). \tag{13}$$

In particular,

$$\operatorname{tr} a_1 = -(\operatorname{tr} x_1 + \cdots + \operatorname{tr} x_n), \tag{14}$$

$$\det a_n = (-1)^{nk} \det x_1 \cdots \det x_n, \tag{15}$$

a result previously obtained by D. Fuchs and A. Schwarz [11].
For Y infinite, following the usage of the commutative theory, we speak of the Λi(Y) as noncommutative symmetric functions (instead of polynomials).

2.4. Matrices over noncommutative rings

Quasideterminants allow one to define analogues of the characteristic polynomial for an n × n matrix M with noncommutative entries. Instead of a single characteristic polynomial, we get (in general) n quasi-characteristic polynomials

$$|1 + tM|_{ii} = \lambda_t(\alpha^{(i)}), \tag{16}$$
where it is convenient to introduce n "virtual alphabets" α^(i) of quasi-eigenvalues. For example, with M = En − (n − 1)I, where

$$E_n = \begin{pmatrix} e_{11} & e_{12} & \cdots & e_{1n} \\ e_{21} & e_{22} - 1 & \cdots & e_{2n} \\ \vdots & \vdots & & \vdots \\ e_{n1} & e_{n2} & \cdots & e_{nn} - n + 1 \end{pmatrix} \tag{17}$$

(the Capelli matrices of classical invariant theory), in which the eij are the generators of the universal enveloping algebra U(gln) corresponding to the matrix units,

$$e_{ij} = \sum_{k=1}^{n} x_{ik}\, \frac{\partial}{\partial x_{kj}},$$

the coefficients of the quasi-characteristic polynomials provide new generators of its center Z(gln). This has been generalized by A. Molev to the case of Sklyanin determinants in twisted Yangians [34].
2.5. Formal noncommutative symmetric functions

We can see from the previous considerations that there are several reasonable but non-equivalent notions of noncommutative elementary symmetric functions. However, in each case, we end up with a sequence of elements Λk, which do not commute, and are to be considered as being of degree k in some reasonable sense. We should therefore introduce an algebra of Formal Noncommutative Symmetric Functions

$$\mathbf{Sym} = K\langle \Lambda_1, \Lambda_2, \ldots, \Lambda_n, \ldots \rangle \tag{18}$$
as the free associative algebra over an infinite sequence of noncommuting indeterminates Λk, k ≥ 1, with Λk of degree k, and try to generalize as much as possible the classical constructions on the algebra of commutative symmetric functions, which, after all, is nothing but

$$Sym = K[e_1, e_2, \ldots, e_n, \ldots]. \tag{19}$$
The previous examples should then be considered as specializations.
3. Hopf algebras

3.1. Commutative symmetric functions

The ring of symmetric polynomials in n variables Sym(X) = K[x1, ..., xn]^{Sn} (K is some field of characteristic 0) is freely generated by the elementary symmetric polynomials

$$e_k = \sum_{i_1 < \cdots < i_k} x_{i_1} x_{i_2} \cdots x_{i_k} \qquad (k = 1, 2, \ldots, n).$$

Thus, Sym(X) is just a polynomial algebra K[e1, ..., en], with a particular grading defined by deg(ek) = k. The algebra Sym of symmetric functions is obtained by letting n → ∞. The classical theory of symmetric functions can therefore be regarded as the study of the polynomial algebra K[ek : k ≥ 1] over an infinite sequence of indeterminates, graded by deg(ek) = k. Note that the original variables xi can be eliminated from this definition. As an algebra, Sym has several distinguished sets of generators. Let $\lambda(t) = \sum_{k \ge 0} e_k t^k$ be the generating series of elementary symmetric functions. Then, the complete homogeneous functions hn can be defined by their generating series $\sigma(t) = \lambda(-t)^{-1} = \sum_{k \ge 0} h_k t^k$, and the power sums pn by $\psi(t) = \sum_{k \ge 1} p_k t^{k-1} = \frac{d}{dt} \log \sigma(t)$ (the Newton formulas). One has as well Sym = K[h1, h2, ...] = K[p1, p2, ...]. A Hopf algebra structure is defined by means of the comultiplication Δ(pk) = pk ⊗ 1 + 1 ⊗ pk, and the antipode ω̃(pk) = −pk. It can be shown that Sym is isomorphic to its graded dual [28].
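The Newton formulas and the relation λ(−t)σ(t) = 1 can be verified on numerical stand-ins for the variables, here evaluating the symmetric functions at X = {2, 3, 5, 7} (a sketch with our own helper names):

```python
from itertools import combinations, combinations_with_replacement

def prod(vals):
    out = 1
    for v in vals:
        out *= v
    return out

X = [2, 3, 5, 7]
N = 6
e = [sum(prod(S) for S in combinations(X, k)) for k in range(len(X) + 1)]
e += [0] * (N + 1 - len(e))            # e_k = 0 for k > number of variables
h = [sum(prod(S) for S in combinations_with_replacement(X, k))
     for k in range(N + 1)]
p = [sum(x ** k for x in X) for k in range(N + 1)]

for n in range(1, N + 1):
    # Newton formulas for h and for e
    assert n * h[n] == sum(h[n - k] * p[k] for k in range(1, n + 1))
    assert n * e[n] == sum((-1) ** (k - 1) * e[n - k] * p[k]
                           for k in range(1, n + 1))
    # lambda(-t) * sigma(t) = 1: alternating convolution of e and h vanishes
    assert sum((-1) ** k * e[k] * h[n - k] for k in range(n + 1)) == 0
```

Specializing the variables to integers only checks identities that hold as polynomial identities, but it catches sign and index errors immediately.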
The dimension of the homogeneous component of degree n of Sym is dim Sym_n = |Part(n)| = p(n), the number of partitions of n, i.e., non-increasing sequences of positive integers summing to n. Linear bases of Sym are therefore naturally labelled by partitions. The simplest ones are mλ (monomial symmetric functions), eλ, hλ, pλ (products eλ1 eλ2 ⋯ eλr, etc.), and the Schur functions sλ = det(h_{λi+j−i}), which are the fundamental ones in representation theory. There is a canonical scalar product, defined by either of the equivalent formulas ⟨sλ, sμ⟩ = ⟨mλ, hμ⟩ = ⟨pλ, p*μ⟩ = δλμ (where p*μ = pμ/zμ and $z_\mu = \prod_i i^{m_i} m_i!$, mi being the multiplicity of i in μ). This scalar product materializes the self-duality of Sym, in that Δ is the adjoint of the multiplication map f ⊗ g → fg. The equivalence of these definitions, as well as the self-duality, is a consequence of the following property: for two bases (uλ) and (vλ) of Sym,

$$\langle u_\lambda, v_\mu \rangle = \delta_{\lambda\mu} \iff \prod_{i,j} (1 - x_i y_j)^{-1} = \sum_\lambda u_\lambda(X)\, v_\lambda(Y),$$

and of the classical Cauchy identity for Schur functions

$$\sum_\lambda s_\lambda(X)\, s_\lambda(Y) = \prod_{i,j} (1 - x_i y_j)^{-1}.$$
There is also a second comultiplication defined by δ(pk) = pk ⊗ pk. This comultiplication is dual to a multiplication known as the internal product ∗, i.e., ⟨f ∗ g, h⟩ = ⟨f ⊗ g, δ(h)⟩.

3.2. Noncommutative symmetric functions

Imitating the description of Sym as a polynomial algebra in independent indeterminates ek, graded by deg(ek) = k, we have defined the algebra Sym of formal noncommutative symmetric functions as the free associative algebra on an infinite sequence Λk of noncommuting indeterminates (the noncommutative elementary functions), graded by deg(Λk) = k. As in the commutative case, one introduces the generating series

$$\lambda(t) = \sum_{n \ge 0} \Lambda_n t^n, \qquad \sigma(t) = \lambda(-t)^{-1} = \sum_{n \ge 0} S_n t^n, \tag{20}$$
of elementary and complete functions, where t is an indeterminate in the ground field K. A concrete realization Sym(A) of this algebra can be given by taking an infinite sequence A = {an | n ≥ 1} of noncommuting indeterminates of degree 1, and setting

$$\lambda(A; t) = \sum_{n \ge 0} \Lambda_n(A)\, t^n = \prod_{i \ge 1}^{\longleftarrow} (1 + t a_i) \tag{21}$$

(the arrow indicating that the factors are multiplied in decreasing order of the index i),
so that Λn (A) gets identified with the sum of all strictly decreasing words of length n, and Sn (A) with the sum of all nondecreasing words of the same length, which are respectively represented as column-shaped and row-shaped Young tableaux.
The algebra homomorphism Sym(A) → Sym(X) defined by ai → xi, called the commutative image F → F̄, maps Λn to en, so that Sym is actually a noncommutative lifting of the algebra of symmetric functions. The first really interesting question is: what are the noncommutative power sums? There are indeed several possibilities. If one starts from the classical expression

$$\sigma(t) = \exp\left\{ \sum_{k \ge 1} p_k \frac{t^k}{k} \right\}, \tag{22}$$
one can choose to define noncommutative power sums Φk by the same formula

$$\sigma(t) = \exp\left\{ \sum_{k \ge 1} \Phi_k \frac{t^k}{k} \right\}, \tag{23}$$
but a noncommutative version of the Newton formulas

$$n h_n = h_{n-1} p_1 + h_{n-2} p_2 + \cdots + p_n, \tag{24}$$

which are derived by taking the logarithmic derivative of (22), leads to different noncommutative power sums Ψk, inductively defined by

$$n S_n = S_{n-1} \Psi_1 + S_{n-2} \Psi_2 + \cdots + \Psi_n. \tag{25}$$
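The recursion (25) and the logarithm from (23) can both be expanded in the S-basis by machine; in the sketch below, words in the Si are encoded as tuples of indices (our own representation), and one sees the two families of power sums agree up to degree 2 but differ from degree 3 on:

```python
from fractions import Fraction

def compositions(n):
    """All compositions of n (ordered tuples of positive parts)."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def psi(n, cache={}):
    """Psi_n in the S-basis, from the recursion (25).
    (The mutable default dict serves as a memo table.)"""
    if n in cache:
        return cache[n]
    res = {(n,): Fraction(n)}
    for k in range(1, n):
        for word, c in psi(k).items():
            key = (n - k,) + word            # S_{n-k} * (term of Psi_k)
            res[key] = res.get(key, Fraction(0)) - c
    cache[n] = res
    return res

def phi(n):
    """Phi_n = n [t^n] log sigma(t), expanded in the S-basis."""
    res = {}
    for word in compositions(n):
        j = len(word)                        # log(1+u): (-1)^{j-1} u^j / j
        res[word] = res.get(word, Fraction(0)) + \
            Fraction(n) * Fraction((-1) ** (j - 1), j)
    return res

assert psi(1) == phi(1) == {(1,): 1}
assert psi(2) == phi(2)           # both equal 2 S_2 - S_1 S_1
assert psi(3) != phi(3)
# Psi_3 = 3 S_3 - S_2 S_1 - 2 S_1 S_2 + S_1 S_1 S_1, while
# Phi_3 = 3 S_3 - (3/2)(S_1 S_2 + S_2 S_1) + S_1 S_1 S_1.
```

The non-symmetric coefficients of S1S2 versus S2S1 in Ψ3, against the symmetric −3/2 in Φ3, are a first concrete glimpse of the continuous BCH phenomenon discussed below.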
Introducing the generating function $\psi(t) = \sum_{k \ge 1} \Psi_k t^{k-1}$, one may also regard σ(t) as the unique solution of the differential equation

$$\frac{d}{dt}\, \sigma(t) = \sigma(t)\, \psi(t) \tag{26}$$
satisfying the initial condition σ(0) = 1. The generating function of the Φk, taken in the form

$$\Phi(t) = \sum_{k \ge 1} \Phi_k \frac{t^k}{k}, \tag{27}$$
is then the logarithm of this solution. From this, one realizes that the relation between the two kinds of noncommutative power sums is of a rather complicated nature. The expression of Φ(t) as a function of the Ψk is known as the continuous Baker-Campbell-Hausdorff (BCH) series [27,3,33]. It can be written as a Chen series (iterated integrals) in a quite explicit form, and it is usually interpreted as expressing the logarithm of the "evolution operator" σ(t) in terms of the "Hamiltonian" ψ(t) [3]. It is known that the continuous BCH series is a Lie series, so that the Φk are elements of the free Lie algebra L generated by the Ψk, of which they form another system of generators. In fact, any sequence (Fn) of generators of L, with deg(Fn) = n, can be shown to provide an admissible family of noncommutative power sums, in the sense that the commutative image of Fn is a nonzero multiple of pn. Moreover, the Poincaré-Birkhoff-Witt theorem shows that Sym can be identified with the universal enveloping algebra U(L) of L. As such, Sym is endowed with a canonical comultiplication Δ, for which L is the space of primitive elements (Friedrichs' theorem, see [38]). In particular,
J.-Y. Thibon / An Introduction to Noncommutative Symmetric Functions
ΔΨ_k = Ψ_k ⊗ 1 + 1 ⊗ Ψ_k,   ΔΦ_k = Φ_k ⊗ 1 + 1 ⊗ Φ_k,   (28)
and also

ΔΛ_n = Σ_{k=0}^{n} Λ_k ⊗ Λ_{n−k},   ΔS_n = Σ_{k=0}^{n} S_k ⊗ S_{n−k}.   (29)
That is, the comultiplication of Sym is mapped onto the usual one on Sym under the commutative image homomorphism. There is also an analogue of the canonical involution ω: e_n ↔ h_n, defined in the same way by ω(Λ_n) = S_n, and it is easy to check that the signed version ω̃(Λ_n) = (−1)^n S_n is an antipode for Δ. As for ordinary symmetric functions, we first define linear bases by taking monomials in the various families of algebraic generators. Here, our generators do not commute, so that basis elements of the homogeneous component Sym_n of degree n will be labelled by compositions of n, i.e., ordered sequences I = (i_1, . . . , i_r) of positive integers with i_1 + i_2 + · · · + i_r = n. For a sequence (G_n) of homogeneous generators with deg(G_n) = n, we set G^I = G_{i_1} G_{i_2} · · · G_{i_r}. Therefore, we already have the four bases Λ^I, S^I, Φ^I and Ψ^I. A composition I of n is conveniently pictured as a ribbon diagram, which is a rim-hook shaped skew Young diagram whose successive rows have lengths i_1, i_2, . . . , i_r (read from top to bottom in the French convention). For example, the ribbon diagram of shape I = (3, 2, 1, 4) is shown below.

[Figure: ribbon diagram of the composition (3, 2, 1, 4); not reproduced here.]
The number of compositions of n is equal to 2^{n−1}. A useful way to realize this is to encode the ribbon diagram of a composition I of n by the subset Des(I) = {i_1, i_1 + i_2, . . . , i_1 + i_2 + · · · + i_{r−1}} of {1, 2, . . . , n − 1}. The elements of Des(I) are called the descents of the composition. The next basis that should be defined, to pursue the parallel with the commutative theory, would be the analogue of the monomial symmetric functions. However, we have no way to achieve this at this point. This is because, in the classical case, the monomial basis (m_λ) is dual to the homogeneous one (h_λ), of which we already have the noncommutative analogue (S^I). But since the comultiplication Δ of Sym is obviously cocommutative, Sym cannot be self-dual, and the analogues of the monomial functions will have to live in the dual Hopf algebra Sym*, to be discussed in the next section. On the other hand, we can define the analogues of Schur functions. These are the so-called ribbon Schur functions. The set of all compositions of a given integer n is equipped with the reverse refinement order, denoted ≼: one has J ≼ I if and only if Des(J) ⊆ Des(I). For instance, the compositions J of 4 such that J ≼ (1, 2, 1) are (1, 2, 1), (3, 1), (1, 3) and (4). The ribbon Schur functions (R_I) are defined by the alternating sum

R_I = Σ_{J≼I} (−1)^{ℓ(I)−ℓ(J)} S^J   ⟺   S^I = Σ_{J≼I} R_J,   (30)

where ℓ(I) denotes the length of I. Clearly, (R_I) is also a homogeneous basis of Sym.
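The bijection between compositions of n and subsets of {1, . . . , n−1} is easy to make concrete. The following Python sketch (the helper names are ours) implements it and checks the coarsening example above:

```python
from itertools import combinations

def descents(I):
    """Descent set of a composition I = (i1,...,ir): partial sums except the last."""
    D, s = set(), 0
    for part in I[:-1]:
        s += part
        D.add(s)
    return D

def composition(D, n):
    """Inverse bijection: the composition of n with descent set D ⊆ {1,...,n-1}."""
    cuts = [0] + sorted(D) + [n]
    return tuple(cuts[k + 1] - cuts[k] for k in range(len(cuts) - 1))

def coarser(I):
    """All J with J ≼ I, i.e. Des(J) ⊆ Des(I)."""
    n, D = sum(I), sorted(descents(I))
    return {composition(set(E), n)
            for r in range(len(D) + 1) for E in combinations(D, r)}
```

Since subsets of {1, . . . , n−1} are in bijection with compositions of n, the count 2^{n−1} falls out immediately.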
The commutative image of a ribbon Schur function R_I is the corresponding ordinary ribbon Schur function, which will be denoted by r_I. The r_I were first introduced by MacMahon [31]. An important property of the ribbon Schur functions is their very simple multiplication formula

R_I R_J = R_{I▷J} + R_{I·J},   (31)

where I ▷ J denotes the composition (i_1, . . . , i_{r−1}, i_r + j_1, j_2, . . . , j_s) and I · J the composition (i_1, . . . , i_r, j_1, . . . , j_s).
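Iterating (31) expands any product S^I = S_{i_1} · · · S_{i_r} = R_{i_1} · · · R_{i_r} in the ribbon basis. The sketch below (illustrative code, names are ours) does this and recovers the descent-set description of S^I given by (30):

```python
def ribbon_product(f, g):
    """Product of linear combinations of ribbons, using R_I R_J = R_{I>J} + R_{I.J} (31)."""
    h = {}
    for I, a in f.items():
        for J, b in g.items():
            glued = I[:-1] + (I[-1] + J[0],) + J[1:]   # I > J: glue last and first parts
            concat = I + J                             # I . J: concatenate
            for K in (glued, concat):
                h[K] = h.get(K, 0) + a * b
    return h

def s_to_ribbons(I):
    """Expand S^I = R_{i1} R_{i2} ... R_{ir} in the ribbon basis."""
    f = {(I[0],): 1}
    for part in I[1:]:
        f = ribbon_product(f, {(part,): 1})
    return f
```

For I = (1, 2, 1), one gets exactly the four ribbons R_J with Des(J) ⊆ {1, 3}, i.e. with J ≼ (1, 2, 1), as predicted by (30).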
4. Quasi-symmetric functions

Recall that the Hopf algebra of commutative symmetric functions is self-dual. This is no longer true of Sym, since it is obviously cocommutative. We must therefore compute the dual Hopf algebra Sym* (we mean the graded dual). As pointed out in the preceding section, we need it to define the proper generalization of the monomial symmetric functions. In the commutative case, the duality between the monomial symmetric functions m_λ and the homogeneous products h_λ can be traced back to the Cauchy-type identity

σ(XY; 1) = ∏_{i,j} (1 − x_i y_j)^{−1} = Σ_λ m_λ(X) h_λ(Y),   (32)

where the sum runs over all partitions. By analogy, let us consider the ordered product

σ(XA; 1) := ∏→_{i≥1} σ(A; x_i) = ∏→_{i≥1} ∏→_{j≥1} (1 − x_i a_j)^{−1},   (33)
where A is a totally ordered set of noncommutative variables, and X a totally ordered set of commutative variables, also commuting with those of A. This is a natural choice, since we know that the dual of Sym will be commutative. Expanding the product, we find

σ(XA; 1) = Σ_I M_I(X) S^I(A),   (34)

where the polynomials M_I are

M_I(X) = Σ_{j_1 < j_2 < · · · < j_r} x_{j_1}^{i_1} x_{j_2}^{i_2} · · · x_{j_r}^{i_r}.   (35)

Thus, the M_I are pieces of monomial symmetric functions. That is, m_λ is equal to the sum of the M_I labelled by the distinct rearrangements I of the partition λ. The M_I form the basis of a subalgebra of K[X], already introduced in 1984 by Gessel [16] under the name quasi-symmetric functions. It is usually denoted by QSym. It is naturally graded, and its homogeneous component QSym_n has, like Sym_n, dimension 2^{n−1}. One can then define a pairing ⟨ , ⟩ between QSym and Sym by

⟨M_I, S^J⟩ = δ_{IJ}.   (36)
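In a finite set of variables the M_I are easy to tabulate. The sketch below (names are ours) represents polynomials as dictionaries mapping exponent vectors to coefficients, and checks that m_{(2,1)} is the sum of M_{(2,1)} and M_{(1,2)}, the two rearrangements of the partition (2, 1):

```python
from itertools import combinations, permutations

def quasi_monomial(I, nvars):
    """M_I(x_1,...,x_N) of (35) as a dict {exponent vector: coefficient}."""
    poly = {}
    for positions in combinations(range(nvars), len(I)):
        expo = [0] * nvars
        for pos, part in zip(positions, I):
            expo[pos] = part
        poly[tuple(expo)] = poly.get(tuple(expo), 0) + 1
    return poly

def monomial_sym(partition, nvars):
    """Ordinary monomial symmetric function m_lambda, by brute force."""
    poly = {}
    for positions in permutations(range(nvars), len(partition)):
        expo = [0] * nvars
        for pos, part in zip(positions, partition):
            expo[pos] = part
        poly[tuple(expo)] = 1   # each distinct monomial appears exactly once
    return poly
```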
With this at hand, it is easy to see that QSym is indeed the graded dual Hopf algebra of Sym ([32], see also [12,20]). That is,

⟨f ⊗ g, Δ(P)⟩ = ⟨fg, P⟩,   (37)

⟨γ(f), P ⊗ Q⟩ = ⟨f, PQ⟩,   (38)
where the comultiplication γ of QSym maps a quasi-symmetric function f = f(X) to f(X +̂ Y), where X +̂ Y denotes the ordered sum of two disjoint ordered sets, and u(X)v(Y) is identified with u ⊗ v. The dual basis of (R_I), denoted by (F_I), which is in some sense a quasi-symmetric analogue of the Schur basis, is given by

F_I = Σ_{J≽I} M_J.   (39)
Finally, since any symmetric function f is in particular quasi-symmetric, one can expand it on the various bases of QSym. A useful property is then

⟨f, G⟩ = ⟨f, g⟩,   (40)

where, on the right-hand side, g = Ḡ is the commutative image of G and the brackets denote the ordinary scalar product of symmetric functions [16].
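With the conventions above, F_I is the sum of the M_J over the compositions J finer than I, i.e. those with Des(J) ⊇ Des(I). A small self-contained sketch (names are ours) lists this M-support of F_I:

```python
from itertools import combinations

def desc(I):
    """Descent set of a composition."""
    s, D = 0, set()
    for part in I[:-1]:
        s += part
        D.add(s)
    return D

def fundamental_in_M(I):
    """Compositions J with Des(J) ⊇ Des(I): the M-support of F_I in (39)."""
    n = sum(I)
    rest = sorted(set(range(1, n)) - desc(I))
    result = []
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            D = sorted(desc(I) | set(extra))
            cuts = [0] + D + [n]
            result.append(tuple(cuts[k + 1] - cuts[k] for k in range(len(cuts) - 1)))
    return sorted(result)
```

For instance F_{(2,1)} = M_{(2,1)} + M_{(1,1,1)}, and F_{(n)} = Σ_J M_J = h_n, the sum over all compositions of n.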
5. Representation theoretical interpretations

5.1. Commutative case: symmetric groups and generic Hecke algebras

It is known that the irreducible characters χ^λ of the symmetric group S_n are parametrized by the partitions λ of n. The Frobenius characteristic map is a linear correspondence between characters and symmetric functions, defined by χ^λ ↦ ch(χ^λ) = s_λ. This is an isometric isomorphism R(S_n) → Sym_n, where R(S_n) is the vector space spanned by the irreducible characters. One has then ch(φψ) = ch(φ) ∗ ch(ψ), where ∗ is the internal product. The Frobenius character formula expresses the value χ(σ) of a character χ on a permutation σ of cycle type μ as χ^λ(μ) = ⟨s_λ, p_μ⟩. Another important point is the induction formula, interpreting the ordinary multiplication of symmetric functions in terms of induced representations. If f = ch(ξ) and g = ch(η) are the characteristics of two representations of S_m and S_n, then fg = ch(χ), where χ is the character of S_{m+n} induced from the character ξ × η of its subgroup S_m × S_n. Schur functions can also be interpreted as characters of GL(n, C). The polynomial representations V_λ of GL(n, C) are parametrized by partitions of length at most n, and s_λ(x_1, . . . , x_n) = trace_{V_λ}(X), where X is the diagonal matrix X = diag(x_1, . . . , x_n). Schur functions correspond in a similar way to the irreducible representations of the q-deformed structures H_n(q), U_q(gl_n) or F_q(GL_n), for generic q. In the case of the Hecke algebra H_n(q), which is the algebra generated by elements T_1, . . . , T_{n−1} satisfying the braid relations together with (T_i + 1)(T_i − q) = 0, the character formula can be written as [5]

χ^λ_μ(q) = ⟨s_λ(X), (q − 1)^{−ℓ(μ)} h_μ((q − 1)X)⟩,   (41)
where the "λ-ring notation" h_μ((q − 1)X) means the following. In general, if X and Y are two (multi-)sets of variables, identified with the formal sums of their elements, the symmetric functions of X + Y, X − Y and XY are defined by

p_k(X ± Y) = p_k(X) ± p_k(Y),   p_k(XY) = p_k(X) p_k(Y),   (42)

and then by expressing any symmetric function as a polynomial in the power sums. Therefore, the symmetric functions of (q − 1)X = qX − X are the images of those of X under the ring homomorphism p_k ↦ (q^k − 1) p_k. The Hecke algebra H_n(q) was introduced by Iwahori in [18]. The original definition was as follows. Let G = GL(n, F_q) and let B be the subgroup of G formed by the upper triangular matrices. Then G acts on the vector space M = C[G/B] spanned by the left cosets of B. To decompose this representation, one can use Schur's lemma and look at the centralizer of CG in End(M). Iwahori showed that this centralizer is isomorphic to H_n(q).

5.2. Noncommutative case: 0-Hecke algebras

When q is a generic complex number, i.e., neither zero nor a nontrivial root of unity, the H_n(q) are semi-simple (in fact isomorphic to CS_n), and the direct sum

R(q) = ⊕_{n≥0} R(H_n(q)),   (43)
where R(H_n(q)) is the vector space spanned by the isomorphism classes [M] of finite dimensional H_n(q)-modules, with addition induced by direct sum, can be identified with Sym, the simple modules S_λ(q) (q-Specht modules) being represented by Schur functions. This defines a characteristic map ch: R(H_n(q)) → Sym_n. The usual product of symmetric functions corresponds then to induction from H_m(q) ⊗ H_n(q) to H_{m+n}(q), i.e., for λ ⊢ m and μ ⊢ n,

ch( S_λ(q) ⊗ S_μ(q) ↑^{H_{m+n}(q)}_{H_m(q)⊗H_n(q)} ) = s_λ s_μ.   (44)

This statement summarizes a good deal about the representation theory of Hecke algebras in the generic case. For non-generic values of the parameter, when H_n(q) is not semi-simple, it is a difficult task to describe R(H_n(q)) as defined above, since this would amount to understanding all the indecomposable representations up to isomorphism. The usual strategy to investigate a non-semi-simple algebra A is to introduce two kinds of Grothendieck groups (cf. [7]). The first one, usually denoted by G_0(A), is the quotient of R(A) by the relations [M] = [M′] + [M″] whenever there is a short exact sequence 0 → M′ → M → M″ → 0. In G_0(A), [M] = [N] whenever M and N have the same simple composition factors, occurring with the same multiplicities. The second one, denoted by K_0(A), is the free abelian group generated by isomorphism classes of finite dimensional projective A-modules. Here, addition corresponds to direct sum. In the sequel, we will rather mean by G_0 and K_0 the complexified Grothendieck groups. A particularly interesting example of this situation occurs for A = H_n(ζ), where ζ is a primitive k-th root of unity. Then, the direct sums
G(ζ) = ⊕_{n≥0} G_0(H_n(ζ))   and   K(ζ) = ⊕_{n≥0} K_0(H_n(ζ))   (45)
can be respectively identified with a quotient and with a subalgebra of Sym. Precisely, if we denote by J_k the ideal of Sym generated by the power sums p_{nk}, n ≥ 1, and by S_k the subalgebra generated by the p_m such that m ≢ 0 mod k, then

G(ζ) ≃ Sym/J_k   and   K(ζ) ≃ S_k.   (46)

The characteristic map ch: G(ζ) → Sym/J_k realizing the first isomorphism is compatible with the previous one, in the sense that the specialized Specht module S_λ(ζ) is mapped to the class s̄_λ = s_λ mod J_k. The induction formula remains valid, but does not tell us that much about the representation theory of H_n(ζ). This is because S_λ(ζ) is usually not irreducible, and the essential information is contained in the multiplicities d_{λμ} of its simple composition factors D_μ. These numbers are called the decomposition numbers, and they form the decomposition matrix of H_n(ζ). These matrices have been determined only recently [23,1].
On the other hand, the non-semi-simple Hecke algebras H_n(0) are quite well understood. The representation theory of 0-Hecke algebras for general type has been worked out by Norton [35], and the special combinatorial features of type A have been described by Carter [5]. There are 2^{n−1} simple H_n(0)-modules, which are all one-dimensional [5,35]. These irreducible representations are obtained by sending a set of generators to −1 and its complement to 0. We shall label these representations by compositions of n rather than by subsets of generators. Let I be a composition of n and let Des(I) be the associated subset of {1, . . . , n − 1}. The irreducible representation φ_I of H_n(0) is defined by

φ_I(T_i) = −1 if i ∈ Des(I),   φ_I(T_i) = 0 if i ∉ Des(I).   (47)

The corresponding H_n(0)-module is denoted by C_I. These modules (when I runs over all compositions of n) form a complete system of simple H_n(0)-modules. The direct sum of the Grothendieck groups

G(0) = ⊕_{n≥0} G_0(H_n(0))   (48)

has therefore a canonical basis, the classes [C_I] of the simple modules, which is naturally labelled by compositions. One may then look for a characteristic map with values in Sym or in QSym. The correct choice is to identify G(0) with the algebra of quasi-symmetric functions, the characteristic map being given by

ch([C_I]) = F_I.   (49)
This map is again compatible with the characteristic map of the generic case. One has a decomposition map d: G(q) → G(0) sending the class [S_λ(q)] of a generic Specht module to the class [S_λ(0)] of its 0-specialization, which is usually not irreducible, nor even semi-simple. The choice (49) has the property that

ch ∘ d = ch,   (50)

that is, ch(S_λ(0)) = s_λ, regarded as a quasi-symmetric function.
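One can check directly that the scalars (47) define representations of H_n(0): at q = 0 the quadratic relation becomes T_i(T_i + 1) = 0, and for one-dimensional modules the braid relation T_i T_{i+1} T_i = T_{i+1} T_i T_{i+1} reduces to a scalar identity. A quick exhaustive sketch for n = 4 (the helper names are ours):

```python
from itertools import product

def is_rep(values):
    """Check that T_i -> values[i] (scalars) satisfies the 0-Hecke relations."""
    # quadratic relation T_i (T_i + 1) = 0 at q = 0
    if any(v * (v + 1) != 0 for v in values):
        return False
    # braid relation for commuting scalars: a^2 b = a b^2, i.e. ab(a - b) = 0
    return all(a * b * (a - b) == 0 for a, b in zip(values, values[1:]))

# all 2^{n-1} assignments of 0/-1 to the generators T_1, T_2, T_3 (n = 4)
assert all(is_rep(v) for v in product((0, -1), repeat=3))
```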
From this, one can easily recover Carter's combinatorial description of the decomposition matrix of H_n(0). The multiplicity d_{λI} of the simple module C_I as a composition factor of S_λ(0) is equal to the coefficient of F_I in the quasi-symmetric expansion

s_λ = Σ_I d_{λI} F_I,   (51)
so that, by duality, d_{λI} = ⟨s_λ, R_I⟩, and by (40), this is equal to the ordinary scalar product of symmetric functions ⟨s_λ, r_I⟩. The characteristic map at q = 0 is also compatible with the induction product, that is, once again,

ch( [M ⊗ N] ↑^{H_{m+n}(0)}_{H_m(0)⊗H_n(0)} ) = ch([M]) ch([N]).   (52)

In particular, the composition factors of the induction product of two simple modules C_I and C_J are described by the product F_I F_J, which is given by an interesting combinatorial formula. To state it, let us first define the shape composition I = C(σ) of a permutation σ as the unique composition of n such that Des(σ) = Des(I). Now, suppose that |I| = m and |J| = n. Take any permutation α ∈ S_m such that C(α) = I and any permutation β ∈ S_n such that C(β) = J. We consider permutations as words on the letters 1, 2, . . ., and we denote by β[m] the shifted word

β[m] = (β_1 + m) · (β_2 + m) · · · (β_n + m).   (53)
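Combining the shifted word (53) with the shuffle product recalled next, the product rule for the F_I (formula (55) below) becomes completely effective. Here is a sketch in Python (function names are ours; since all letters of α and β[m] are distinct, shuffles can be listed as plain words, with no multiplicities):

```python
def shuffle(u, v):
    """All shuffles of the words u and v (sequences of distinct letters)."""
    if not u:
        return [v]
    if not v:
        return [u]
    return ([u[:1] + w for w in shuffle(u[1:], v)]
            + [v[:1] + w for w in shuffle(u, v[1:])])

def shape(sigma):
    """Shape composition C(sigma): the descents of sigma cut it into increasing runs."""
    comp, run = [], 1
    for a, b in zip(sigma, sigma[1:]):
        if a < b:
            run += 1
        else:
            comp.append(run)
            run = 1
    comp.append(run)
    return tuple(comp)

def f_product(alpha, beta):
    """Expansion of F_{C(alpha)} F_{C(beta)} on the F-basis, via shifted shuffles."""
    m = len(alpha)
    shifted = tuple(b + m for b in beta)
    result = {}
    for w in shuffle(alpha, shifted):
        K = shape(w)
        result[K] = result.get(K, 0) + 1
    return result
```

For instance, F_1 F_1 = F_2 + F_{11}, coming from the two shuffles 12 and 21 of the words 1 and 2.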
Recall that the shuffle product on words is defined inductively by

au ⧢ bv = a(u ⧢ bv) + b(au ⧢ v),   a, b ∈ A, u, v ∈ A*,   (54)

(where A* is the set of words on the alphabet A), the initial condition being that the empty word is the unit element. The product formula is then [32]

F_I F_J = Σ_σ ⟨α ⧢ β[m], σ⟩ F_{C(σ)},   (55)

where ⟨α ⧢ β[m], σ⟩ denotes the coefficient of σ in α ⧢ β[m]. It is tempting to interpret this formula as the shadow of something like

F_α F_β = Σ_σ ⟨α ⧢ β[m], σ⟩ F_σ   (56)
in a noncommutative algebra based on all permutations. Such an algebra does exist: it is the Hopf algebra of Free Quasi-Symmetric Functions (cf. [32,9] and F. Hivert's article in this volume). Let us now look at the K_0-groups of the 0-Hecke algebras. The indecomposable projective H_n(0)-modules have also been classified by Norton (cf. [5,35]). From the general theory of finite dimensional algebras, one knows that for each simple module S, there is a unique indecomposable projective module M such that S = M/rad(M). Thus, one associates to each composition I of n the unique indecomposable projective H_n(0)-module M_I such that M_I/rad(M_I) ≃ C_I. The canonical duality between G_0(A) and K_0(A), defined by the pairing
⟨N, M⟩ = dim Hom_A(M, N),   (57)

tells us that the direct sum of Grothendieck groups

K(0) = ⊕_{n≥0} K_0(H_n(0))   (58)
has to be identified with the algebra of noncommutative symmetric functions. The functorial duality between induction and restriction implies that the characteristic map

Ch: K(0) → Sym,   [M_I] ↦ R_I,   (59)–(60)

is a ring homomorphism, i.e.,

Ch( M_I ⊗ M_J ↑^{H_{m+n}(0)}_{H_m(0)⊗H_n(0)} ) = R_I R_J = R_{I·J} + R_{I▷J}.   (61)

The family (M_I)_{|I|=n} forms a complete system of indecomposable projective H_n(0)-modules, and the decomposition of the left regular representation is multiplicity free:

H_n(0) = ⊕_{|I|=n} M_I.   (62)
6. Descent algebras and internal product

It remains to introduce an important operation on Sym, namely, the noncommutative analogue of the internal product ∗ of Sym. Recall that it is the dual of the comultiplication δ(f) = f(XY). It occurs in algebraic identities such as

σ(XYZ; 1) := ∏_{i,j,k} (1 − x_i y_j z_k)^{−1} = Σ_{λ,μ} s_λ(X) s_μ(Y) (s_λ ∗ s_μ)(Z),   (63)

which generalizes the Cauchy identity to three sets of variables. The comultiplication f ↦ f(XY) makes sense as well for quasi-symmetric functions, if one takes care of the order on the product set XY. That is, X and Y need to be totally ordered, and one can set x_i y_j < x_k y_l whenever (i, j) < (k, l) for the lexicographic order. Let us denote by X ×̂ Y the product XY endowed with this order. Then, we can extend δ from Sym to QSym by setting [16]

δ(f) = f(X ×̂ Y).   (64)
The internal product ∗ of Sym can now be defined as the dual of δ, that is,

⟨f, P ∗ Q⟩ = ⟨δ(f), P ⊗ Q⟩.   (65)
Clearly, each homogeneous component Sym_n is a ∗-subalgebra. In fact, it follows from results of Gessel [16] that this algebra is anti-isomorphic to the so-called descent algebra Σ_n of S_n. The descent algebras were introduced by Solomon [39] for general finite Coxeter groups in the following way. Let (W, S) be a Coxeter system. One says that w ∈ W has a descent at s ∈ S if w has a reduced word ending by s. For W = S_n and s_i = (i, i + 1), this means that w(i) > w(i + 1), whence the terminology. In this case, we rather say that i is a descent of w. Let Des(w) denote the descent set of w, and for a subset E ⊆ S, set

D_E = Σ_{Des(w)=E} w ∈ ZW.   (66)

Then, Solomon showed that the D_E span a Z-subalgebra of ZW. Moreover,

D_{E′} D_{E″} = Σ_E c^E_{E′E″} D_E,   (67)

where the coefficients c^E_{E′E″} are nonnegative integers. The canonical anti-isomorphism α: Σ_n → Sym_n maps the descent class D_E to the ribbon Schur function R_I, with I such that E = Des(I). From one of Solomon's formulas, one obtains the following multiplication rule. Let I = (i_1, . . . , i_p) and J = (j_1, . . . , j_q) be two compositions of n. Then,

S^I ∗ S^J = Σ_{M ∈ Mat(I,J)} S^M,   (68)

where Mat(I, J) denotes the set of matrices of nonnegative integers M = (m_{ij}) of size p × q such that Σ_s m_{rs} = i_r and Σ_r m_{rs} = j_s for r ∈ [1, p] and s ∈ [1, q], and where S^M = S_{m_{11}} S_{m_{12}} · · · S_{m_{1q}} · · · S_{m_{p1}} · · · S_{m_{pq}}. Note that, by definition, if F and G are homogeneous of different degrees, then F ∗ G = 0, and that S_n is the unit element of the ∗-subalgebra Sym_n. Let h_λ = h_I = S̄^I and h_μ = h_J = S̄^J be the commutative images of S^I and S^J. From the known expression of h_λ ∗ h_μ in the commutative case, one can see that the commutative image of S^I ∗ S^J is S̄^I ∗ S̄^J, so that in general

(F ∗ G)‾ = F̄ ∗ Ḡ,   (69)

that is, the commutative image is a homomorphism for the internal products. From equation (68), one derives a fundamental formula, whose commutative version is just a special case of the Mackey formula for the restriction of an induced character. Let F_1, F_2, . . . , F_r, G ∈ Sym. Then,

(F_1 F_2 · · · F_r) ∗ G = μ_r [(F_1 ⊗ · · · ⊗ F_r) ∗ Δ^r G],   (70)

where, in the right-hand side, μ_r denotes the r-fold ordinary multiplication and ∗ stands for the operation induced on Sym^{⊗r} by ∗ [12]. In the commutative case, the power sums p_n, and more generally the power-sum products, are quasi-idempotents (i.e., idempotents up to a scalar factor) for the internal product. Precisely,

p_μ ∗ p_μ = z_μ p_μ,   (71)
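Formula (68) is directly implementable: enumerate the matrices with prescribed row and column sums and read them row by row, dropping the zero entries since S_0 = 1. The sketch below (names are ours) does this on the S-basis and checks the quasi-idempotence Ψ_2 ∗ Ψ_2 = 2Ψ_2, with Ψ_2 = 2S_2 − S_1 S_1 as computed from (25):

```python
def first_rows(total, caps):
    """All ways to split `total` over the columns, entry s bounded by caps[s]."""
    if not caps:
        if total == 0:
            yield ()
        return
    for v in range(min(total, caps[0]) + 1):
        for rest in first_rows(total - v, caps[1:]):
            yield (v,) + rest

def matrices(rows, cols):
    """Nonnegative integer matrices with given row sums and column sums."""
    if not rows:
        if all(c == 0 for c in cols):
            yield ()
        return
    for row in first_rows(rows[0], cols):
        new_cols = tuple(c - v for c, v in zip(cols, row))
        for rest in matrices(rows[1:], new_cols):
            yield (row,) + rest

def internal(f, g):
    """Internal product on the S-basis via formula (68)."""
    h = {}
    for I, a in f.items():
        for J, b in g.items():
            if sum(I) != sum(J):        # different degrees: F * G = 0
                continue
            for M in matrices(I, J):
                K = tuple(v for row in M for v in row if v > 0)   # drop S_0 = 1
                h[K] = h.get(K, 0) + a * b
    return {K: c for K, c in h.items() if c}
```

For instance, S^{(1,1)} ∗ S^{(1,1)} = 2 S^{(1,1)}, reflecting (Σ_{w∈S_2} w)² = 2 Σ_{w∈S_2} w in the descent algebra.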
where, for μ = (1^{m_1} 2^{m_2} · · · n^{m_n}), z_μ = ∏_i i^{m_i} · m_i!. Therefore, the commutative images of the noncommutative power sums and of their products are quasi-idempotents, and one may wonder whether there are true quasi-idempotents among them. Thanks to the anti-isomorphism α with the descent algebra, we could then use them to construct idempotents in the group algebras of the symmetric groups. As an illustration of the formalism, let us try this program with the power sums Ψ_n. We want to know whether Ψ_n ∗ Ψ_n = nΨ_n. To this end, we can write a generating function for the ∗-squares in the form

Σ_{n≥1} (xy)^{n−1} (Ψ_n ∗ Ψ_n) = ψ(x) ∗ ψ(y),   (72)

since Ψ_i ∗ Ψ_j = 0 for i ≠ j. Now, writing (26) in the form

ψ(t) = Σ_{n≥1} t^{n−1} Ψ_n = λ(−t) σ′(t)
and applying the splitting formula (70), we get

Σ_{n≥1} (xy)^{n−1} (Ψ_n ∗ Ψ_n) = λ(−x) σ′(x) ∗ ψ(y)
  = μ[(λ(−x) ⊗ σ′(x)) ∗ (ψ(y) ⊗ 1 + 1 ⊗ ψ(y))]
  = μ[(λ(−x) ∗ 1) ⊗ (σ′(x) ∗ ψ(y))]   (since σ′(x) has no term of degree zero)
  = ( Σ_{n≥1} n x^{n−1} S_n ) ∗ ( Σ_{n≥1} y^{n−1} Ψ_n ) = Σ_{n≥1} (xy)^{n−1} n Ψ_n,

the last equality following from the fact that S_n ∗ F = F for F ∈ Sym_n. Hence Ψ_n ∗ Ψ_n = nΨ_n, so that θ_n = α^{−1}(Ψ_n) is a quasi-idempotent of Σ_n. To see what it looks like, we have to expand Ψ_n on the ribbon basis. The linear recurrence (25), together with the multiplication formula for ribbons (31) (recall that S_k = R_k), yields

Ψ_n = R_n − R_{1,n−1} + R_{1,1,n−2} − · · · = Σ_{k=0}^{n−1} (−1)^k R_{1^k, n−k},   (73)
which is analogous to the classical expression of p_n as an alternating sum of hook Schur functions. Therefore, in the descent algebra,

θ_n = Σ_{k=0}^{n−1} (−1)^k D_{{1,2,...,k}}.   (74)
In this expression, we recognize a famous element of the group algebra of the symmetric group, namely, Dynkin's left-bracketing operator (see [38]). The standard left bracketing of a word w = x_1 x_2 · · · x_n is the Lie polynomial
L_n(w) = [· · · [[[x_1, x_2], x_3], x_4], . . . , x_n].   (75)

This formula defines a linear operator L_n on the homogeneous component K_n⟨A⟩ of the free associative algebra K⟨A⟩. In terms of the right action of the symmetric group S_n on K_n⟨A⟩, defined on words by

x_1 x_2 · · · x_n · σ = x_{σ(1)} x_{σ(2)} · · · x_{σ(n)},   (76)

one can write

L_n(w) = Σ_{σ∈S_n} a_σ (w · σ) = w · θ_n,
the coefficient a_σ being ±1 or 0, according to whether σ is a "hook permutation" or not. To see this, one has only to write the permutations appearing in the first θ_i as ribbon tableaux and then to argue by induction. For example,

θ_3 = [[1, 2], 3] = 123 − 213 − 312 + 321,

the four terms being pictured as signed ribbon tableaux of hook shape,
and it is clear that when expanding θ_4 = [θ_3, 4] one will only get those (signed) tableaux obtained from the previous ones by adding 4 at the end of the last row, minus those obtained by adding 4 on top of the first column. Thus, we have proved that (1/n) L_n is a projector, whose image is obviously a subspace of the free Lie algebra. By iteration of Jacobi's identity, it is easy to see that any Lie element can be written as a linear combination of standard left bracketings, so that what we have actually obtained is a proof of Dynkin's characterization of Lie elements: a noncommutative homogeneous polynomial P ∈ K_n⟨A⟩ is a Lie polynomial if and only if L_n(P) = nP (cf. [38]). Idempotents such as (1/n) θ_n, acting as projectors onto the free Lie algebra, are usually called Lie idempotents [38]. So far, our formalism has just led us to an exotic proof of a classical result. Let us now see whether the method contains the germ of some generalization. One ingredient in our proof was the analogue (73) of the expansion of a power sum as an alternating sum of hook Schur functions. This expression has a well-known q-analogue, namely, the one involved in the character formula for Hecke algebras (41). It can be written in the form

h_n((1 − q)X) / (1 − q) = Σ_{k=0}^{n−1} (−q)^k s_{n−k,1^k}.   (77)
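Dynkin's characterization is easy to test by machine. The sketch below (names are ours) computes the left bracketing L_n(w) in the free algebra, with noncommutative polynomials stored as dictionaries keyed by words, and checks the projector property L_n ∘ L_n = n L_n on a small example:

```python
def bracket(f, g):
    """Lie bracket [f, g] = fg - gf of noncommutative polynomials (words as tuples)."""
    h = {}
    for u, a in f.items():
        for v, b in g.items():
            h[u + v] = h.get(u + v, 0) + a * b
            h[v + u] = h.get(v + u, 0) - a * b
    return {w: c for w, c in h.items() if c}

def left_bracketing(word):
    """Dynkin operator on a single word: [...[[w1,w2],w3],...,wn] as in (75)."""
    f = {word[:1]: 1}
    for letter in word[1:]:
        f = bracket(f, {(letter,): 1})
    return f

def L(f):
    """Extend the left bracketing linearly to polynomials."""
    h = {}
    for w, a in f.items():
        for v, b in left_bracketing(w).items():
            h[v] = h.get(v, 0) + a * b
    return {w: c for w, c in h.items() if c}
```

For the standard word 123, one finds L_3(123) = 123 − 213 − 312 + 321, and applying L_3 again multiplies this Lie element by 3, as Dynkin's theorem predicts.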
Let us look for a noncommutative analogue of this expression. To this aim, it will be convenient to extend as much as possible the λ-ring notation of the classical theory. Given two totally ordered sets A and B of noncommuting variables, we can define the virtual alphabet A − B by specifying its complete symmetric functions S_n(A − B). Their generating series is defined by

σ(A − B; t) := λ(B; −t) σ(A; t).   (78)

One also defines the symmetric functions of A + B by

σ(A + B; t) := σ(A; t) σ(B; t),   (79)

where now A and B can be either real or virtual, and for a real ordered commutative alphabet X, the virtual alphabet XA is defined by

σ(XA; t) = ∏→_{i≥1} σ(A; t x_i).   (80)
These definitions imply a definition of the quasi-symmetric functions of a difference, f(X − Y), which is the same as the one obtained by composing the comultiplication and the antipode, and we can now give a meaning to the noncommutative symmetric functions of all virtual alphabets of the type (X ± Y)(A ± B). The case we have in mind corresponds to X = {1} and Y = {q}. Here,

σ((1 − q)A; t) = σ(A − qA; t) = λ(qA; −t) σ(A; t) = ( Σ_{k≥0} t^k (−q)^k Λ_k(A) ) ( Σ_{l≥0} t^l S_l(A) ).

Taking into account the fact that Λ_k = R_{1^k} and applying the multiplication rule for ribbons, we find that

Θ_n(q) := S_n((1 − q)A) / (1 − q) = Σ_{k=0}^{n−1} (−q)^k R_{1^k, n−k}   (81)

is the q-analogue we were looking for. Moreover, it is easy to see that the image of Θ_n(q) under the anti-isomorphism α^{−1} is the left q-bracketing of the standard word 1 2 . . . n, that is, α^{−1}(Θ_n(q)) = θ_n(q) = [[. . . [[1, 2]_q, 3]_q, . . . ]_q, n]_q, where [R, S]_q = RS − q SR. Now, we can prove the following q-analogue of Dynkin's theorem, which, according to what we have already seen, can be understood as describing the left q-bracketing of a homogeneous Lie polynomial:

Θ_n(q) ∗ Ψ_n = [n]_q Ψ_n.   (82)
The proof works exactly as the previous one. We have

ϑ(t) = Σ_{n≥1} Θ_n(q) t^{n−1} = (σ((1 − q)A; t) − 1) / ((1 − q)t) = λ(A; −qt) σ′_q(A; t),   (83)

where σ′_q denotes the q-derivative with respect to t. Then,

ϑ(t) ∗ Ψ_n = μ((λ(A; −qt) ⊗ σ′_q(A; t)) ∗ Δ(Ψ_n)) = μ((λ(A; −qt) ⊗ σ′_q(A; t)) ∗ (1 ⊗ Ψ_n + Ψ_n ⊗ 1)),

which implies that
ϑ(t) ∗ Ψ_n = (λ(A; −qt) ∗ 1) (σ′_q(A; t) ∗ Ψ_n) = [n]_q Ψ_n t^{n−1}.

Equation (82) means that homogeneous Lie polynomials of degree n are again eigenvectors of the left q-bracketing operator, now with the q-integer [n]_q as eigenvalue. They actually constitute the [n]_q-eigenspace. However, the q-Dynkin element θ_n(q) is invertible in the group algebra for generic values of q, and its other eigenvalues are nonzero. Its spectral decomposition has been obtained in [20]. The case q = −1 is of special importance: it governs the combinatorics of the peak algebras [2].
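The q-analogue can be tested in the same way as Dynkin's theorem: apply the left q-bracketing [[. . . [1, 2]_q . . . ]_q, n]_q word by word to a homogeneous Lie polynomial and compare with the [n]_q multiple. A sketch (names are ours), run at the specialization q = 2, where [3]_q = 1 + 2 + 4 = 7:

```python
def q_bracket(f, g, q):
    """[f, g]_q = fg - q*gf on noncommutative polynomials (words as tuples)."""
    h = {}
    for u, a in f.items():
        for v, b in g.items():
            h[u + v] = h.get(u + v, 0) + a * b
            h[v + u] = h.get(v + u, 0) - q * a * b
    return {w: c for w, c in h.items() if c}

def left_q_bracketing(word, q):
    """theta_n(q) acting on one word: [[...[w1,w2]_q,...]_q, wn]_q."""
    f = {word[:1]: 1}
    for letter in word[1:]:
        f = q_bracket(f, {(letter,): 1}, q)
    return f

def apply_q(f, q):
    """Extend the left q-bracketing linearly to polynomials."""
    h = {}
    for w, a in f.items():
        for v, b in left_q_bracketing(w, q).items():
            h[v] = h.get(v, 0) + a * b
    return {w: c for w, c in h.items() if c}

# a Lie element of degree 3: the ordinary left bracketing of 123 (q = 1)
lie = left_q_bracketing((1, 2, 3), 1)
```

At q = 1 this recovers the classical Dynkin eigenvalue 3.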
References

[1] S. Ariki, On the decomposition numbers of the Hecke algebra of G(m, 1, n), J. Math. Kyoto Univ. 36 (1996), 789–808.
[2] N. Bergeron, F. Hivert and J.-Y. Thibon, The peak algebra and the Hecke-Clifford algebras at q = 0, J. Combin. Theory Ser. A 107 (2004), 1–19.
[3] I. Bialynicki-Birula, B. Mielnik and J. Plebański, Explicit solution of the continuous Baker-Campbell-Hausdorff problem, Ann. Phys. 51 (1969), 187–200.
[4] J. Bray and G. Whaples, Polynomials with coefficients from a division ring, Canad. J. Math. 35 (1983), 509–515.
[5] R.W. Carter, Representation theory of the 0-Hecke algebra, J. Algebra 15 (1986), 89–103.
[6] A. Connes and A. Schwarz, Matrix Vieta theorem revisited, Lett. Math. Phys. 39 (1997), 349–353.
[7] C.W. Curtis and I. Reiner, Methods of representation theory, Vol. I–II, John Wiley & Sons, Inc., New York, 1981.
[8] J.F. van Diejen and L. Vinet, The quantum dynamics of the compactified trigonometric Ruijsenaars-Schneider model, Comm. Math. Phys. 197 (1998), 33–74.
[9] G. Duchamp, F. Hivert and J.-Y. Thibon, Noncommutative symmetric functions VI: Free quasi-symmetric functions and related algebras, Internat. J. Alg. Comp. 12 (2002), 671–717.
[10] G. Duchamp, A. Klyachko, D. Krob and J.-Y. Thibon, Noncommutative symmetric functions III: Deformations of Cauchy and convolution algebras, Discr. Math. and Theoret. Computer Sci. 1 (1997), 159–216.
[11] D. Fuchs and A. Schwarz, Matrix Vieta theorem, Amer. Math. Soc. Transl. Ser. 2, Vol. 169, AMS, Providence, 1995.
[12] I.M. Gelfand, D. Krob, A. Lascoux, B. Leclerc, V.S. Retakh and J.-Y. Thibon, Noncommutative symmetric functions, Adv. in Math. 112 (1995), 218–348.
[13] I.M. Gelfand and V.S. Retakh, Determinants of matrices over noncommutative rings, Funct. Anal. Appl. 25 (1991), 91–102.
[14] I.M. Gelfand and V.S. Retakh, A theory of noncommutative determinants and characteristic functions of graphs, Funct. Anal. Appl. 26 (1992), 1–20.
[15] I.M. Gelfand, V.S. Retakh and R.L. Wilson, Quasideterminants, Adv. Math. 193 (2005), 56–141.
[16] I. Gessel, Multipartite P-partitions and inner product of skew Schur functions, Contemp. Math. 34 (1984), 289–301.
[17] J.A. Green, The characters of the finite general linear groups, Trans. Amer. Math. Soc. 80 (1955), 402–447.
[18] N. Iwahori, On the structure of the Hecke ring of a Chevalley group over a finite field, J. Fac. Sci. Univ. Tokyo Sect. I 10 (1964), 215–236.
[19] S. Kerov, A.N. Kirillov and N. Yu. Reshetikhin, Combinatorics, Bethe Ansatz, and representations of the symmetric group, J. Sov. Math. 41 (1988), 916–924.
[20] D. Krob, B. Leclerc and J.-Y. Thibon, Noncommutative symmetric functions II: Transformations of alphabets, Int. J. Alg. Comput. 7 (1997), 181–264.
[21] D. Krob and J.-Y. Thibon, Noncommutative symmetric functions IV: Quantum linear groups and Hecke algebras at q = 0, J. Alg. Comb. 6 (1997), 339–376.
[22] D. Krob and J.-Y. Thibon, Noncommutative symmetric functions V: A degenerate version of Uq(glN), Internat. J. Alg. Comp. 9 (1999), 405–430.
[23] A. Lascoux, B. Leclerc and J.-Y. Thibon, Hecke algebras at roots of unity and crystal bases of quantum affine algebras, Commun. Math. Phys. 181 (1996), 205–263.
[24] D.E. Littlewood, The Theory of Group Characters and Matrix Representations of Groups, 2nd ed., Oxford, Clarendon Press, 1950.
[25] D.E. Littlewood, On certain symmetric functions, Proc. London Math. Soc. 43 (1961), 485–498.
[26] M. Lothaire, Algebraic Combinatorics on Words (Chapter 5), Cambridge University Press, 2002.
[27] W. Magnus, On the exponential solution of differential equations for a linear operator, Comm. Pure Appl. Math. VII (1954), 649–673.
[28] I.G. Macdonald, Symmetric functions and Hall polynomials, Oxford, 1979; 2nd ed. 1995.
[29] I.G. Macdonald, Spherical functions on a group of p-adic type, Publications of the Ramanujan Institute, No. 2, Centre for Advanced Study in Mathematics, University of Madras, Madras, 1971, vii+79 pp.
[30] I.G. Macdonald, A new class of symmetric functions, Séminaire Lotharingien de Combinatoire B20a (1988), 41 pp (electronic).
[31] P.A. MacMahon, Combinatory Analysis, 2 vol., Cambridge University Press, 1915, 1916; Chelsea reprint, 1960.
[32] C. Malvenuto and C. Reutenauer, Duality between quasi-symmetric functions and the Solomon descent algebra, J. Algebra 177 (1995), 967–982.
[33] B. Mielnik and J. Plebański, Combinatorial approach to Baker-Campbell-Hausdorff exponents, Ann. Inst. Henri Poincaré, Section A, Vol. XII (1970), 215–254.
[34] A. Molev, Laplace operators for classical Lie algebras, Lett. Math. Phys. 35 (1995), 135–143.
[35] P.N. Norton, 0-Hecke algebras, J. Austral. Math. Soc. Ser. A 27 (1979), 337–357.
[36] F.D. Murnaghan, The Theory of Group Representations, Baltimore, The Johns Hopkins Press, 1938.
[37] A. Nakayashiki and Y. Yamada, Kostka polynomials and energy functions in solvable lattice models, Selecta Math. (N.S.) 3 (1997), 547–599.
[38] C. Reutenauer, Free Lie Algebras, Oxford, 1993.
[39] L. Solomon, A Mackey formula in the group ring of a Coxeter group, J. Algebra 41 (1976), 255–268.
[40] R. Steinberg, A geometric approach to the representations of the full linear group over a Galois field, Trans. Amer. Math. Soc. 71 (1951), 274–282.
[41] H. Weyl, The Theory of Groups and Quantum Mechanics, New York, 1931.
[42] H. Weyl, The Classical Groups, their Invariants and Representations, Princeton, 1938.
[43] R.L. Wilson, Invariant polynomials in the free skew field, Selecta Math. (N.S.) 7 (2001), no. 4, 565–586.
[44] B.G. Wybourne, Symmetry Principles and Atomic Spectroscopy, including an Appendix of Tables by P.H. Butler, New York, Wiley-Interscience, 1970.
[45] B.G. Wybourne, Classical Groups for Physicists, New York, Wiley-Interscience, 1974.
[46] B.G. Wybourne, SCHUR, An Interactive Program for Calculating Properties of Lie Groups and Symmetric Functions, Euromath Bulletin 2 (1996), 145–159.
Physics and Theoretical Computer Science J.-P. Gazeau et al. (Eds.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
An Introduction to Combinatorial Hopf Algebras — Examples and realizations —

Florent Hivert ¹, LIFAR, Université de Rouen

Abstract. The appearance of combinatorial structures like partitions, trees and graphs in algebraic computation is not new. Two famous early examples can be found in the formula of Cayley for integrating vector fields and the theory of Young tableaux in the representation theory of the symmetric group. The fact that combinatorial objects naturally carry some Hopf-algebraic structures was then widely advertised by Rota. Recently, a large class of examples of very rich algebraic structures (Hopf algebras, Lambda-rings, ...) appeared naturally in several seemingly unrelated fields of science. One can cite, for example, renormalization of quantum electrodynamics in physics (Connes-Kreimer), categories of algebras and operads in algebra (Loday), generalized symmetric functions (Gessel), and generalized generating series in combinatorics. Several of these algebras are closely related to well-known algorithms of computer science (the Schensted algorithm for computing the length of a longest increasing subsequence, bubble sort, search-tree insertion). The knowledge of this connection allows for very simple proofs and very efficient algorithms for computing in these structures. The goal of this course is to present to the audience the kind of algebraic structures, namely Hopf algebras, that appear naturally in these fields. We will then show, on two similar examples, how two of these Hopf algebras can be naturally constructed in a very efficient way, starting from simple computer science algorithms.

Keywords. Combinatorics, combinatorial Hopf algebras, symmetric functions, sorting algorithms, search trees
1. Introduction – some motivations

In the past recent years, several very different people coming from various parts of science were naturally led to the same kind of objects, namely "combinatorial Hopf algebras". A Hopf algebra is, roughly speaking, both an algebra and the dual of an algebra, in a compatible way. Consequently there is a product rule which allows one to compose the objects, and a co-product rule which decomposes

¹ Correspondence to: Florent Hivert, LIFAR – Université de Rouen – Faculté des Sciences et des Techniques – Avenue de l'université – 76801 Saint Etienne du Rouvray – France; E-mail:
[email protected]
the objects. Though there is for the moment no good definition of what a "combinatorial" Hopf algebra is, there are plenty of examples, each of them having a very rich combinatorics. It seems to be a field where algebraists, computer scientists and physicists can more or less understand each other and manage to work together, as shown during the first meetings on this subject in Montreal (2001) and Banff (2003).
One of the precursors of the subject is certainly Rota [27,28,14], who was the first to emphasize the importance of Hopf algebras in combinatorics. He was inspired by the work of MacMahon [24] on symmetric functions. Indeed, the Hopf algebra of symmetric functions is the prototype of a combinatorial Hopf algebra. This was deeply used by Zelevinsky [30] in his study of the representations of the classical groups. Representation theory is one of the very important motivations in the study of combinatorial Hopf algebras: they appear as character rings or Grothendieck rings of towers of algebras. This point of view has been further extended to various extensions of the Hecke algebras (Hecke-Clifford, cyclotomic Hecke algebras or Ariki-Koike algebras) in [16,17,6,2,13]. Another strong motivation coming from algebra is the study of operads [19,20]. One of the main objects of the present paper was defined in this setting, namely the algebra of binary trees of Loday and Ronco [21,22]. The motivation was the study of a certain class of algebras called dendriform algebras; these are algebras where the product decomposes into the sum of two operations verifying some relations. The physicists were led to the study of combinatorial Hopf algebras through quantum field theory. The main problem here is to give a meaning to some complicated divergent series. The classical approach makes use of Feynman diagrams and so is more a recipe than a well-founded theory. A systematic approach has been initiated by Connes and Kreimer [5] using some combinatorial Hopf algebras on planar trees. Their algebra is very close to the algebra of Loday and Ronco. This approach is extended by Brouder and Frabetti in [3,4].
In this paper, we present a contribution coming from theoretical computer science to the subject. It appears that the most important examples arise naturally in the study of some very basic searching and sorting algorithms, and that this knowledge allows us to construct these objects in a very simple way. All the axioms of Hopf algebras, which are usually very tedious to check when following the standard way of defining Hopf algebras, are here mostly obvious, for they come from simple facts. The key idea is not to define, as usual, a combinatorial object together with a product and a coproduct and then to show the various axioms like the associativity, the co-associativity, and the compatibility, but rather to realize the Hopf algebra as a vector space of some possibly complicated non-commutative polynomials verifying some "symmetry" invariance. When this invariance is appropriate, the product is inherited from the product of polynomials and the coproduct is defined
using the so-called "alphabet doubling trick". Then the only thing to prove is that the space is stable under the product and the co-product. This is usually done by proving the rules that are usually given as a definition. The main ingredient of the construction of a Hopf algebra is then an algorithm which computes the "symmetry class" S(w) of a word. Then the definition

Fσ := Σ_{w | S(w)=σ} w ,    (1)
where σ is a "symmetry class", defines all that is needed, provided the algorithm verifies certain compatibility properties which are very easy to check. Here the word "symmetry" is to be understood in a very large sense, and I must confess that it took my research team several years to become aware of how wide this sense is.
The goal of the present paper is to introduce interested people to this beautiful subject, and certainly not to build a formal basis for the theory of Hopf algebras. Though we give all the formal definitions, the reader is strongly advised to refer to basic textbooks on this subject such as [29,1,26].
2. A very simple example

We start by giving a simple example, namely the non-commutative polynomials, hoping that this will smoothly introduce the formalism and the key ideas.

2.1. The free algebra

One of the simplest combinatorial objects is the word, that is, a finite sequence of letters coming from a set X called the alphabet. For example, ababd is a word over the alphabet X = {a, b, c, d, e}. There exists an empty word, denoted ε, and the concatenation (gluing) of two words w and w′ is denoted by w · w′ or simply by ww′. The word ababd concatenated with bab gives ababdbab. The empty word is a neutral element for the concatenation, which is associative: for all words w, w′ and w″, one has

ε · w = w · ε = w   and   (w · w′) · w″ = w · (w′ · w″) =: w · w′ · w″ .    (2)
One encapsulates all these properties by saying that the set of words X∗ endowed with the unit ε and the product · is a monoid. If one wants to count some multiplicities, one is naturally led to consider linear combinations of words with integer coefficients, that is, elements of the free algebra generated by X with scalars in Z, denoted by Z⟨X⟩. A typical element of Z⟨X⟩ is of the form Σ_w c_w w, for example 3abc + 5baba + 2. The product is naturally extended by linearity:

(Σ_w c_w w) (Σ_{w′} c′_{w′} w′) := Σ_{w,w′} c_w c′_{w′} w · w′ .    (3)
For example, (3abc + 5baba + 2)(3a + bb) = 9abca + 15babaa + 6a + 3abcbb + 5bababb + 2bb. Note that the product symbol · is here implicit. It should be noticed that each word w has a length, denoted ℓ(w), and that the concatenation is compatible with this length in the sense that

ℓ(w · w′) = ℓ(w) + ℓ(w′) .    (4)
Hence the space Z⟨X⟩ decomposes into

Z⟨X⟩ = ⊕_{n∈N} Z_n⟨X⟩ ,    (5)
where Z_n⟨X⟩ is the span of the words of length n. Such a linear combination is called homogeneous of degree n, and the compatibility property can be restated as the inclusion:

Z_n⟨X⟩ Z_m⟨X⟩ ⊂ Z_{n+m}⟨X⟩ .    (6)
All these facts (unit, associativity, homogeneous decomposition compatible with the product) are summarized by saying that Z⟨X⟩ is a graded algebra.

2.2. A first co-product

In order to define a Hopf algebra we now want to introduce the notion of co-product. Recall that a coproduct is a notion dual to the notion of a product. A product is a composition law, and consequently a coproduct is a decomposition law. A very simple but powerful way to define a coproduct is the "alphabet doubling" trick, presented here through an example. Let w be a word. We compute the coproduct Δ(w) through the following sequence of operations. First we compute the sum of all possible replacements of the letters of w by one of two different copies of it, say a small one and a capital one: the letter a is replaced by a or A, the letter b is replaced by b or B, and so on. Then we decide that the small letters and the capital ones commute with each other, so we can put the small ones on the left and the capital ones on the right. Finally, the concatenation of the two words wW is replaced by a tensor product w ⊗ w′ where w′ is the word with small letters associated to W. Here is an example:

Δ(aba) = aba + Aba + aBa + abA + ABa + AbA + aBA + ABA
       = aba + ba A + aa B + ab A + a AB + b AA + a BA + ABA
       = aba ⊗ ε + ba ⊗ a + aa ⊗ b + ab ⊗ a + a ⊗ ab + b ⊗ aa + a ⊗ ba + ε ⊗ aba

Recall now that the tensor product symbol is bilinear, that is, for any scalar c and objects U and V one has

c (U ⊗ V) = (cU) ⊗ V = U ⊗ (cV) .    (7)

It is then possible to collect the terms of such a computation to get the multiplicities, here for the word aab:
Δ(aab) = aab ⊗ 1 + 2 (ab ⊗ a) + aa ⊗ b + 2 (a ⊗ ab) + b ⊗ aa + 1 ⊗ aab ,

where the empty word ε is identified with the scalar 1. Moreover, we decide to extend this operation by linearity, hence defining a linear operation

Δ : Z⟨X⟩ −→ Z⟨X⟩ ⊗ Z⟨X⟩ .    (8)
This operation goes in the reverse way of a product: indeed, a product on A is usually defined as a bilinear operation A × A → A, but it can consequently be seen as a linear operation A ⊗ A → A. This can be formalized using duality, but for the moment we prefer to concentrate on some properties of this operation.

2.3. Co-associativity

If w is a word, the coproduct Δ(w) is a linear combination Σ c (w1 ⊗ w2) of symbols w1 ⊗ w2. We can therefore compute the linear combination Σ c (Δ(w1) ⊗ w2), that is (Δ ⊗ Id)(Δ(w)), where Id is the identity map and

(f ⊗ g)(x ⊗ y) := f(x) ⊗ g(y) .    (9)
If we do the same on the right we get the same result:

Σ c (Δ(w1) ⊗ w2) = Σ c (w1 ⊗ Δ(w2)) .    (10)
This is not surprising: going back to the definition of the coproduct, we realize that both of these expressions are computed by taking three copies of the original alphabet. In one expression each letter a is first replaced by a or A and then the first copy is doubled again (a or a′), whereas in the other the second copy is doubled (A or A′). If we had used three different colors, there would have been no difference. Thus we have the following identity

(Δ ⊗ Id) ◦ Δ = (Id ⊗ Δ) ◦ Δ    (11)
where ◦ is the composition of functions. This property is called co-associativity because it is dual to the associativity property. Indeed, if the product A × A → A is considered as a linear operation μ : A ⊗ A → A, then the associativity property can be rewritten as

μ ◦ (μ ⊗ Id) = μ ◦ (Id ⊗ μ) .    (12)
The word dual is here to be understood as "reversing" (transposing the matrices) the operation. In this paper we decide not to emphasize duality; the reader is therefore strongly encouraged to have a look at [1,29,26] for a more developed theory, including the diagrammatic notation together with the "reversing the arrows" principle. If we denote by c the operation which sends all words to 0, except the empty one which is sent to 1, we have the following identity
(c ⊗ Id)(Δ(w)) = (Id ⊗ c)(Δ(w)) = w    (13)

through the identification s ⊗ w = w ⊗ s = w if s is a scalar. Again, this is obvious in our example since (Id ⊗ c)Δ(w) amounts to doing the alphabet doubling trick and then erasing the terms w′ ⊗ W where W is not empty. This property is dual to the property

μ(ε ⊗ w) = μ(w ⊗ ε) = w    (14)
and c is therefore called a co-unit. We summarize these properties in the following definition.

Definition 1. A co-algebra over the field K is a K-vector space H together with two maps

Δ : H → H ⊗ H   and   c : H → K    (15)
such that Δ is a co-associative coproduct (Equation (11)) admitting c as co-unit (Equation (13)).

Before going further, it should be noted that the co-product is compatible with the length in the sense that

Δ(Z_m⟨X⟩) ⊂ ⊕_{i+j=m} Z_i⟨X⟩ ⊗ Z_j⟨X⟩ .    (16)
One says that the co-algebra is graded.

2.4. Compatibility between the product and the co-product

We want here to introduce in our example the main axiom of the theory of Hopf algebras. First of all, we naturally extend the multiplication to tensors as

(u ⊗ v)(u′ ⊗ v′) = uu′ ⊗ vv′ .    (17)
Let v and w be two words. Their respective co-products can be written as linear combinations Σ s (v1 ⊗ v2) and Σ t (w1 ⊗ w2), where s and t are scalars. We want to compare Δ(v)Δ(w), that is,

(Σ s (v1 ⊗ v2)) (Σ t (w1 ⊗ w2)) = Σ st (v1w1 ⊗ v2w2) ,    (18)
with the co-product Δ(v · w). Let us do it first on an example. We start with

Δ(a) = a ⊗ ε + ε ⊗ a
Δ(ab) = ab ⊗ ε + a ⊗ b + b ⊗ a + ε ⊗ ab

First we expand the product of these two expressions:
Δ(ab)Δ(a) = (ab ⊗ ε + a ⊗ b + b ⊗ a + ε ⊗ ab)(a ⊗ ε + ε ⊗ a)
          = aba ⊗ ε + aa ⊗ b + ba ⊗ a + a ⊗ ab + ab ⊗ a + a ⊗ ba + b ⊗ aa + ε ⊗ aba .

The direct computation of the coproduct of ab · a gives

Δ(aba) = aba ⊗ ε + ba ⊗ a + aa ⊗ b + ab ⊗ a + a ⊗ ab + b ⊗ aa + a ⊗ ba + ε ⊗ aba ,

which is the same linear combination. Actually, this is easily seen to be always true, since the concatenation clearly commutes with the "alphabet doubling trick". Moreover, the co-product of a word v = l1 l2 . . . lr is nothing but the tensor notation of the product

(l1 + L1)(l2 + L2) . . . (lr + Lr)    (19)

where the small l's and the capital L's commute with each other. Thus it is clear that if w = l′1 . . . l′s, both Δ(v)Δ(w) and Δ(vw) are tensor notations for

(l1 + L1)(l2 + L2) . . . (lr + Lr) (l′1 + L′1)(l′2 + L′2) . . . (l′s + L′s) ,    (20)
and thus are equal. This leads us to the following definition Definition 2. A bi-algebra is a vector space H endowed with a structure of algebra (·, 1) together with a structure of co-algebra (Δ, c) satisfying the compatibility relation Δ(xy) = Δ(x)Δ(y) .
(21)
A bi-algebra is said to be graded for a degree if both the algebra and the co-algebra are graded. If moreover the homogeneous component of degree 0 is the line spanned by the unit, the bi-algebra is said to be connected.

Note 1. All the bi-algebras considered in this paper are graded and connected. In the usual definition of a Hopf algebra one more ingredient is required, called the antipode. The graded-connected hypothesis ensures that the antipode is defined, and thus all the bi-algebras considered here are Hopf algebras. Therefore we will stick to the name Hopf algebra without speaking about the antipode.
3. The bubble sort algorithm and the free quasi-symmetric functions

3.1. Basic properties

The goal of this section is to construct the Hopf algebra of permutations, first defined by Malvenuto-Reutenauer [25] and called here the algebra of free quasi-symmetric functions. This algebra arises naturally in the study of a very simple (and inefficient) sorting algorithm called the bubble sort. Let us recall this algorithm.
Algorithm 3 (Bubble sort).
INPUT: a word.
OUTPUT: the associated sorted word.
• Given a word w = l1 l2 . . . ln, find two adjacent letters li, li+1 in the wrong order (li > li+1) and exchange them.
• Repeat the procedure until the word is sorted.

Of course one needs a strategy for searching for the two letters. Actually there are several, for example:
• left-to-right scanning strategy;
• right-to-left scanning strategy;
• go-and-back scanning strategy;
• try the leftmost one, etc.
The chosen strategy is irrelevant for our study; we therefore stick to the left-to-right scanning strategy. Here are examples of the execution of the bubble sort on three words. [The original figure, not reproduced here, runs the bubble sort in parallel on the words CABAEAD, DACBEBD and 5142736, which all have the same relative order of letters; the successive states of the third word are 5142736, 1542736, 1452736, 1425736, 1425376, 1425367, 1245367, 1243567, 1234567.]
As can be seen, though the words are different, the bubble sort behaves the same way on all of them. We want to formalize this notion of execution of the bubble sort.

Definitions 4. An inversion of a word w = l1 l2 . . . ln is a pair (i, j) of integers such that

i < j   and   li > lj .    (22)
A descent of a word w = l1 l2 . . . ln is an integer i such that li > li+1 .
(23)
Clearly, at each step the bubble sort chooses a descent and removes it. Moreover, doing so it removes exactly one inversion, and thus the number of inversions is the number of steps of the bubble sort algorithm. Let Sn denote the symmetric group of size n, that is, the group of the n! permutations of the set {1, . . . , n}. A permutation σ is identified with the word σ(1)σ(2) · · · σ(n). Such a word is called standard. Among the permutations, we are interested in the elementary transpositions σi = (i, i+1), which exchange i and i+1 and leave all other integers unchanged. There is an action of the symmetric group on words from the right:
(a1 a2 · · · an) · σ = aσ(1) aσ(2) · · · aσ(n) .    (24)
For example, abbac · 31452 = baacb.

Definition 5. An execution of the bubble sort is a sequence of transpositions [σi1 , . . . , σim ] sorting the word in increasing order.

Actually, one easily sees that the resulting permutation

exec(w) := σ = σi1 ◦ · · · ◦ σim    (25)
encodes the whole information. Hence the goal of the bubble sort is to compute the smallest (i.e., using the minimal number of elementary transpositions) permutation σ := exec(w) such that w · σ is sorted. A natural question is to describe the set of words which have the same bubble sort execution. The answer is provided by the following simple notion:

Definition 6. Given a word w = l1 l2 . . . ln of length n, there exists a unique permutation Std(w) := σ of Sn which has the same inversions as w: for i < j,

σ(i) > σ(j)   iff   li > lj .    (26)
The permutation Std(w) ∈ Sn is called the standardization of w. The simplest way to prove this is to give an effective algorithm: Std(w) can be computed by scanning w from left to right and labelling 1, 2, . . . the occurrences of its smallest letter, then numbering the occurrences of the next one, and so on. For example Std(abcadbcaa) = 157296834, as seen on the following picture:

a  b  c  a  d  b  c  a  a
a1 b5 c7 a2 d9 b6 c8 a3 a4
1  5  7  2  9  6  8  3  4

Here is the link between the notion of standardization and bubble sort executions.

Proposition 7. For any word w, the word w · Std(w)⁻¹ is sorted. Moreover,

exec(w) = Std(w)⁻¹ .    (27)
Proof. Clearly, exec(w) = exec(Std(w)), because exec(w) depends only on the inversions of w, and by definition w and Std(w) have the same inversions. Now, for any permutation σ, the result of the sorting is σ · exec(σ), which is the identity permutation. But for permutations, the action and the composition coincide: σ · μ = σ ◦ μ. Consequently exec(σ) = σ⁻¹.
3.2. Execution and concatenation

We are now interested in the following natural question: let u and v be two words with executions σ and μ. What are the possible executions of the sorting of the word uv? More formally, let A be a totally ordered alphabet. Define the language (set of words) Lσ(A):

Lσ(A) := {w ∈ A∗ | exec(w) = σ} .    (28)
For example:

L12 = {sorted words of length 2}
L12...n = {sorted words of length n}
Ln n−1 ... 2 1 = {strictly decreasing words of length n}
L2143 = {bacb, badc, cadc, cbdc, . . . } = {yxtz | x < y ≤ z < t}

The question is now restated as: describe the language Lα(A)Lβ(A) for any permutations α and β. The answer is provided by the shuffle product.

Definition 8. The shuffle product ⧢ of two words is the element of Z⟨A⟩ defined recursively by

ε ⧢ w = w ⧢ ε = w ,
xu ⧢ yv = x(u ⧢ yv) + y(xu ⧢ v) ,    x, y ∈ A,  u, v ∈ A∗ .

Alternatively, u ⧢ v is defined as the sum of all ways of building a word w together with two complementary subwords equal to u and v. Here is an example:

aba ⧢ cb = abacb + abcab + abcba + acbab + acbba + acbba + cabab + cabba + cabba + cbaba
        = abacb + abcab + abcba + acbab + 2 acbba + cabab + 2 cabba + cbaba

Of course the sum of the coefficients of u ⧢ v is the binomial coefficient ( ℓ(u)+ℓ(v) choose ℓ(u) ). Then the main result of this section is the following theorem.
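The recursive definition of the shuffle product turns into a short Python sketch (returning a Counter, our choice of representation for formal sums with nonnegative integer coefficients):

```python
from collections import Counter

def shuffle(u, v):
    """Shuffle product of two words as a multiset (Counter) of words.

    Implements the recursion: eps sh w = w sh eps = w and
    (x u) sh (y v) = x (u sh y v) + y (x u sh v).
    """
    if not u:
        return Counter([v])
    if not v:
        return Counter([u])
    result = Counter()
    for w, c in shuffle(u[1:], v).items():   # first letter taken from u
        result[u[0] + w] += c
    for w, c in shuffle(u, v[1:]).items():   # first letter taken from v
        result[v[0] + w] += c
    return result
```

On the example of the text, shuffle("aba", "cb") has total mass C(5, 2) = 10 and gives the words acbba and cabba with multiplicity 2.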
Theorem 9 (Duchamp-H.-Thibon [6]). For any permutations α ∈ Sm and β ∈ Sn, the language Lα Lβ is a disjoint union of languages Lμ:

Lα Lβ = ⊔_{μ ∈ α ⧢ β[m]} Lμ ,    (29)

where β[m] := (β1 + m)(β2 + m) . . . (βn + m) ∈ S({m+1, m+2, . . . , m+n}).

Here are some examples of products:
L12 L123 = L34512 ⊔ L34152 ⊔ L34125 ⊔ L31452 ⊔ L31425 ⊔ L31245 ⊔ L13452 ⊔ L13425 ⊔ L13245 ⊔ L12345
L21 L123 = L34521 ⊔ L34251 ⊔ L34215 ⊔ L32451 ⊔ L32415 ⊔ L32145 ⊔ L23451 ⊔ L23415 ⊔ L23145 ⊔ L21345
Thus we work in a sub-algebra of the free algebra. Definition 10. The subalgebra of CA FQSym(A) =
4 4
C Fσ (A)
(31)
n≥0 σ∈Sn
is called the algebra of free quasi-symmetric functions. It is convenient as this point to take an infinite alphabet A. Indeed, if A is infinite then the structure of FQSym(A) is independent of A, the resulting algebra is denoted FQSym. Note that there is an empty permutations () and that F() = which can be identified with the scalar 1. Thus the theorem 9 is now seen as the product rule of FQSym: Proposition 11 (Duchamp-H.-Thibon [6]). α ∈ Sm and β ∈ Sn . Then, Fσ Fα Fβ = σ∈α
(32)
β[m]
This was the original definition of Malvenuto-Reutenauer [25]. For example: F132 F21 = F13265 + F13625 + F13652 + F16325 + F16352 + F16532 + F61325 + F61352 + F61532 + F65132 .
264
F. Hivert / An Introduction to Combinatorial Hopf Algebras
3.4. The co-product of FQSym We want now to give FQSym a structure of a Hopf algebra, namely we need to define a coproduct. We will use an adapted alphabet doubling trick. From now on all alphabets are supposed infinite. Definition 12. Let A and B be two infinite, totally ordered, mutually commuting ˆ of A and B is the union of A and B where the alphabets. The ordered sum A+B variables of A are smaller than the variables of B. Then the coproduct of Fσ defined is defined as follows: We expand Fσ over ˆ Since the variables of A and B commute mutually, we reorder the alphabet A+B. the resulting expression to put the letters from A on the left and those from B on the right. We have now an expression in KA KB. But it happens, and this is the only thing to prove, that this expression actually belongs to the sub-algebra FQSym(A)FQSym(B). Then we use the tensor notation to get an element of FQSym(A) ⊗ FQSym(B) ≈ FQSym ⊗ FQSym. To summarize the coproduct is defined by ˆ −→ Fσ −→ Fσ (A+B)
Fα (A)Fβ (B) −→
Fα ⊗ Fβ
(33)
ˆ one has For example, let A = {a < b < · · · } and B = {A < B < · · · }. In A+B, z < A. then by definition
F312 =
yzx =
x
x
yzx +
yyx
x
This given ˆ = bba + bca + bAa + AAa + ABa + BBA + · · · F312 (A+B) = bba 1 + bca 1 + ba A + a AA + a AB + 1 BBA + · · · Δ(F312 ) = F312 ⊗ 1 + F21 ⊗ F1 + F1 ⊗ F12 + 1 ⊗ F312 The precise rule is given by Proposition 13 (Duchamp-H.-Thibon [6]). The coproduct in FQSym is given by Δ(Fσ ) =
n
FStd(w1 ...wk ) ⊗ FStd(wk+1 ...wn ) ,
(34)
k=0
for all permutation σ = w1 . . . wn . The proof is very similar to Theorem 9 and can be found in [6]. At this stage some remarks are in order. By construction, the co-product is coˆ +C ˆ = A+(B ˆ +C). ˆ associative because (A+B) Moreover the compatibility relation ˆ ˆ holds because the expansion of Fα (A+B)F (A +B) is the same as the expansion β of Fα (A)Fβ (A) provided that A and B be infinite. Thus the only non-trivial thing is that
F. Hivert / An Introduction to Combinatorial Hopf Algebras
265
ˆ ⊂ FQSym(A)FQSym(B) . FQSym(A+B)
(35)
which is precisely the result of the previous propositions. Hence we have proved for free that Theorem 14. FQSym is a Hopf algebra. This illustrate the strategy of the construction of Hopf algebra by realizations. Let us summarize it. The element of the Hopf algebra are defined (realized) as a vector space H of some kind of complicated polynomials (commutative or not). The product is inherited from the product of polynomials. The co-product is defined using an alphabet doubling trick. Then one must prove that the space H is stable by the product and the coproduct. This is usually done by proving an explicit formula. Then for free we have the associativity, the co-associativity and the compatibility. This is a interesting exercise to prove the compatibility relation using the explicit formulas for the product and co-product.
Going back to the bubble sort, the coproduct answers the following question: If a word w is sorted by a permutation σ = exec(w), what is the execution of the sort of the sub-word obtained from the k biggest (n − k smallest) letters of the word w ?
4. The binary search trees We want to apply the preceding strategy to define a Hopf algebra on binary rooted trees. This algebra has been first defined by Loday and Ronco [21] and is closely related to the renormalisation Hopf algebra of Connes and Kreimer [5]. As in the previous section, we start with a simple and powerful algorithm from computer science, namely the binary-searching algorithm. The reader interested in this algorithm can refer to [15]. The construction given here is fully described in [9,10,12].
4.1. The binary search insertion algorithm Through this paper, by binary tree we means (un-complete) planar rooted binary tree, precisely a binary tree is either void ∅ or is a pair of (possibly void) binary trees grafted on a node. The size of a binary tree is its number of nodes. The (2n n) . Here are the number of binary trees of size n is the Catalan number Cn := n+1 first value together with the associated trees:
266
F. Hivert / An Introduction to Combinatorial Hopf Algebras
1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796 ∅
By labeled trees we means a tree with a label attached to each node, the label are taken either from the alphabet A or from N. Definition 15. A binary search tree T is a labeled binary tree such that for each node n the label of n is greater or equal than all the labels of the left subtree and strictly smaller than all the label of the right one. A decreasing tree T is a labeled binary tree such that for each node n the label of n is greater or equal than all the labels of the left subtree and strictly greater than all the label of the right one. A labeled tree which is labeled by the first natural numbers 1, . . . , n where each number appears only once is called standard. Here are some examples: ≤
b
a a
binary tree (BT)
≤
<
d b
d
d
>
c
d e
a
c
b
a
c
b
search tree (BST)
decreasing tree (DT)
The fundamental property of binary search trees is given by the following proposition. Proposition 16. If T is a binary search tree and a is a letter, there exists one and only one position to add a leaf labeled a such that the resulting tree T ← a is a binary search tree. This is proved by the following algorithm. Algorithm 17 (Binary search tree insertion). INPUT: a binary search tree T and a letter a OUTPUT: the binary search tree T ← a. • • • •
if T is empty then return the tree a ; compare a with the root of T ; if it’s smaller or equal insert recursively a in the left subtree; if it’s greater insert recursively a in the right subtree.
Here is an example:
267
F. Hivert / An Introduction to Combinatorial Hopf Algebras
b b a a
b
a
←c
d
d
= a
b
e
d
e
d
c
Then the insertion of a word w is the result of the consecutive insertions of its letters. The resulting tree is denoted by Tree(w). The reader has to be careful that, for compatibility with the convention of [21], we have decided to read the letters of the word from the right to the left. The figure 1 shows the insertion of the word cadbaedb. The meaning of the second tree will be explained later.
b,8
b
d,7
∅, ∅ −−−−→ b , 8 −−−−→
7
−−−−→ a 6
a d
e
, 5
d
d,3
−−−−→
7
e
b a,2
−−−−→
7 4
6
4
6 8
b
, 5
d
a,5
−−−−→
7
8
8
b
,
d e
b,4
7
e
b
e,6
−−−−→ b
, 5
d
8
b
8
8
b a
d
,
3
a a
6
,
d b
e
d
5 2
4
−−−−→
a
d b
3
6
8
b a
c,1
7
d
e
,
5 2
c
7 4
3
6
1
Figure 1. sylvester insertion of the word cadbaedb.
4.2. The sylvester monoid In this subsection, we first focus on the following question. Given a binary search tree T , how to describe the set of words that inserts to T ? We are lead to a monoid structure on binary search trees called the sylvester monoid. We start be reading some words from a tree Definitions 18. • The infix reading of a labeled tree is the word w obtained by reading recursively the left subtree, the root and the right subtree; • the (left to right) postfix reading of a tree T is the word wT obtained by reading the left subtree, then the right and finally the root.
268
F. Hivert / An Introduction to Combinatorial Hopf Algebras
It is clear that the infix reading gives the sorting of the word. This is the binary search tree sorting algorithm. Proposition 19. Let T be a binary search tree and wT its postfix reading. Then Tree(wT ) = T . Moreover wT is the smallest word (for the lexicographic order) w such that Tree(w) = T . The word corresponding to the previous tree is the word wBT = abacdedb. Note that the biggest word w such that Tree(w) = T is obtained by a right to left postfix reading, in our example we get ecddbaab. Let us first define the sylvester monoid by means of congruencies without using any tree. Definition 20. Let w1 and w2 be two words. They are said to be sylvester adjacent if there exists three words u, v, w and three letters A ≤ B < C such that w1 = u AC v B w
and
w2 = u CA v B w .
(36)
The sylvester equivalence is the transitive closure of the sylvester adjacency. That is, two words u, v are sylvester equivalent if there exists a chain u = w1 , w2 , . . . , wk = v
(37)
of words such that wi and wi+1 are adjacent for all i. In this case we write u ≡sylv v. Definition 21. The sylvester monoid Sylv(A) is the quotient of the free monoid A∗ by the sylvester equivalence: Sylv(A) := A∗ / ≡sylv . Note that ≡sylv is a congruence on A∗ , that is for any words u, v1 , v2 , w such that v1 ≡sylv v2 then uv1 w ≡sylv uv2 w. For example the class of the word 21354 is the set {52134, 25134, 21534, 21354} . The fundamental theorem is the following: Theorem 22. Two words u and v are equivalent if and only if they correspond to the same binary search tree: Tree(u) = Tree(v). Consequently the postfix readings (wT )T of a binary search tree give a section of the sylvester monoid, that is there is one and only one tree-word in each equivalence class of ≡sylv . This is very similar to a classical construction in algebraic combinatoric called the plactic monoid [18,23]. The plactic monoid is related to an algorithm called Schensted algorithm in the same way as the sylvester monoid is related to the binary search algorithm. Actually, it is possible to have not only the analogue of Schensted insertion, but also the full Robinson-Schensted correspondence. Suppose that σ ∈ Sn is a
269
F. Hivert / An Introduction to Combinatorial Hopf Algebras
permutation of {1, 2, . . . , n}. To σ we associate a decreasing tree DT(σ) as follows. The root is labeled by the biggest letter n of σ and if, as a word, σ = unv, then the left subtree is DT(u) and the right subtree is DT(v). For each w ∈ A∗, we set Q(w) = DT((Std w)−1). For example, with w = cadbaedb, one has Std(w) = 51632874 and its inverse is Std(w)−1 = 25481376; the decreasing tree Q(cadbaedb) is therefore the tree with root 8, left subtree 5 (with children 2 and 4), and right subtree 7 (with left child 3, itself with left child 1, and right child 6).
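The computation of Q(w) on this example can be sketched in a few lines of Python (trees as nested tuples and the function names are our own choices for this illustration):

```python
# Sketch of the Q-symbol Q(w) = DT(Std(w)^(-1)) on the example cadbaedb.
# Trees are nested tuples (label, left_subtree, right_subtree).

def standardize(word):
    """Std(w): relabel the letters 1..n, equal letters numbered left to right."""
    order = sorted(range(len(word)), key=lambda i: (word[i], i))
    std = [0] * len(word)
    for rank, i in enumerate(order, start=1):
        std[i] = rank
    return std

def inverse(perm):
    """Inverse of a permutation given as a list of values 1..n."""
    inv = [0] * len(perm)
    for i, v in enumerate(perm, start=1):
        inv[v - 1] = i
    return inv

def decreasing_tree(seq):
    """DT: root = biggest letter, then recurse on the left and right factors."""
    if not seq:
        return None
    m = seq.index(max(seq))
    return (seq[m], decreasing_tree(seq[:m]), decreasing_tree(seq[m + 1:]))

w = "cadbaedb"
std = standardize(w)
assert std == [5, 1, 6, 3, 2, 8, 7, 4]        # Std(w) = 51632874
assert inverse(std) == [2, 5, 4, 8, 1, 3, 7, 6]  # Std(w)^(-1) = 25481376
q = decreasing_tree(inverse(std))
assert q[0] == 8 and q[1][0] == 5 and q[2][0] == 7  # root 8, subtrees 5 and 7
```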
The following theorem is the analogue of the Robinson-Schensted correspondence w → (P(w), Q(w)), where P(w) and Q(w) are two Young tableaux of the same shape (Q(w) is standard). The trees play here the role of the tableaux [18,23].

Theorem 23. For each word w ∈ A∗, the tree Tree(w) is obtained by replacing in Q(w) each label i by the i-th letter of w. The tree Q(w) records the reverse order of creation of the nodes in the binary insertion of w. The map w → (P(w), Q(w)) is a bijection between A∗ and the set of pairs formed by a binary search tree and a standard decreasing tree of the same shape (unlabeled tree).

The first two points are easy to see and can be checked on the example w = cadbaedb: replacing in Q(cadbaedb) the labels 8, 5, 7, 2, 4, 3, 6, 1 by the corresponding letters of w yields the binary search tree with root b, left subtree a (with children a and b), and right subtree d (with left child d, itself with left child c, and right child e).
The third point is proved by giving the converse bijection: the word corresponding to a pair (BST, DT) is obtained by reading the labels of BST in the order given by the labels of DT. Note that, in contrast with plactic insertion, the labels never move once they have reached their leaf.

4.3. The algebra of trees

We now want to investigate the following question. When we project Fσ(A) into the sylvester algebra,

Fσ(A) = Σ_{Std(w)=σ−1} w  ⟼  Fσ(A/ ≡sylv) := Σ_{Std(w)=σ−1} Tree(w),   (38)
what remains of the Hopf-algebra structure? Let us denote Gσ = Fσ−1, so that Gσ(A) = Σ_{Std(w)=σ} w. It is clear that Tree(Std(w)) and Tree(w) have the same shape. Moreover, there is only one binary search tree of a given shape (once the multiset of labels is fixed), so that
Proposition 24 (compatibility with standardization). Let u, v be two words. Then the following assertions are equivalent:
• u ≡sylv v;
• Std(u) ≡sylv Std(v) and, for every letter a, |u|a = |v|a;
• Sh(Tree(Std(u))) = Sh(Tree(Std(v))) and, for every letter a, |u|a = |v|a;
where |w|a denotes the number of occurrences of a in the word w.

Therefore Gσ(A/ ≡sylv) depends only on the shape of Tree(σ). Let us define

QT = Gσ(A/ ≡sylv)   (39)
for any σ such that Sh(Tree(σ)) = T. Then the monoid structure of Sylv(A) makes it clear that the product Gσ Gμ(A/ ≡sylv) depends only on the shapes of σ and μ. Therefore the quotient FQSym(A/ ≡sylv) naturally inherits an algebra structure. As an example, equation (40) expands the product QT QT′ of two basis elements as a sum of ten basis elements QS indexed by binary trees (the tree pictures are not reproduced here).   (40)
To deal with the co-algebra structure we need to investigate Gσ((A+̂B)/ ≡sylv). This is done by the following proposition. For any subset I of the alphabet A and any word w on A, let us denote w/I the word obtained from w by erasing the letters that are not in I.

Proposition 25 (compatibility with restriction to intervals). Suppose that I is an interval of A. Then u ≡sylv v implies u/I ≡sylv v/I.

As a consequence, since A and B are intervals of the alphabet A+̂B, one has that Gσ((A+̂B)/ ≡sylv) depends only on the shape of σ. Therefore the quotient FQSym(A/ ≡sylv) naturally inherits a co-algebra structure. As an example, equation (41) expands the co-product ΔQT as a sum of terms of the form QT′ ⊗ QT″, beginning with QT ⊗ 1 and ending with 1 ⊗ QT (the tree pictures are not reproduced here).   (41)
Hence we have proved the following.

Theorem 26. The equivalence relation ≡sylv is compatible with the Hopf-algebra structure on FQSym. The quotient is a Hopf algebra whose bases are indexed by binary trees.
This Hopf algebra is isomorphic to the algebra of Loday and Ronco. Actually, we could have worked in a dual way:

Theorem 27. The space spanned by

PT := Σ_{DT(w)=T} w = Σ_{Sh(BST(σ))=T} Fσ   (42)

is a sub-Hopf algebra of FQSym which is dual to FQSym(A/ ≡sylv).

The compatibility with restriction to intervals shows that the vector space spanned by the PT is stable under the product.
5. Toward a general theory

5.1. Free Schur functions

In the preceding section we used a certain relation on words to define a quotient Hopf algebra of FQSym. Actually, we only used very few properties of this congruence, and there are several other well-known examples. Let us just describe, without proof, another very important one. It is related to the Robinson-Schensted algorithm. We will not recall the algorithm; the reader should refer to [23] for the details. Recall that a partition λ = (λ1 ≥ · · · ≥ λk) of n is a non-increasing sequence of positive numbers of sum n. A partition is depicted by a so-called Ferrers diagram: the partition (5, 3, 2) is depicted as a left-justified array of rows of 5, 3 and 2 boxes.
Then a filling of the boxes of such a diagram is a tableau if and only if the contents of the boxes are weakly increasing along rows and strictly increasing along columns. The Robinson-Schensted correspondence is a bijection from the set of words to the set of pairs formed by a tableau and a standard tableau of the same shape. The first tableau corresponding to a word w is denoted P(w), the second Q(w). For example, on the word acdbaedbc the algorithm produces, letter by letter, a growing pair of tableaux whose final value is (rows listed from top to bottom; the intermediate steps are omitted here):

P(acdbaedbc):
c e
b d d
a a b c

Q(acdbaedbc):
5 8
4 7 9
1 2 3 6
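The row-insertion algorithm can be cross-checked with a short Python sketch (our own helper names; rows are kept weakly increasing, and an inserted letter bumps the first strictly greater entry):

```python
# Row-insertion Robinson-Schensted sketch, checked on the word acdbaedbc.
from bisect import bisect_right

def rsk(word):
    """Return (P, Q): P semistandard, Q standard, rows bottom-up weakly increasing."""
    P, Q = [], []
    for step, x in enumerate(word, start=1):
        row = 0
        while True:
            if row == len(P):                 # start a new row on top
                P.append([x]); Q.append([step]); break
            r = P[row]
            j = bisect_right(r, x)            # first entry strictly greater than x
            if j == len(r):                   # x fits at the end of the row
                r.append(x); Q[row].append(step); break
            r[j], x = x, r[j]                 # bump and carry into the next row
            row += 1
    return P, Q

P, Q = rsk("acdbaedbc")
assert P == [list("aabc"), list("bdd"), list("ce")]   # rows, bottom to top
assert Q == [[1, 2, 3, 6], [4, 7, 9], [5, 8]]
```

The assertions reproduce the final pair of the example (rows listed bottom to top here, matching the bottom row a a b c / 1 2 3 6 of the display).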
The quotient of the free monoid by the relation “have the same P(w) tableau” is a monoid called the plactic monoid. It can be equivalently defined by the Knuth relations:

Theorem 28. Let ≡plact be the transitive closure of the two following relations:

· · · xzy · · · ≡ · · · zxy · · ·   for x ≤ y < z,   (43)
· · · yxz · · · ≡ · · · yzx · · ·   for x < y ≤ z.   (44)
Then, for two words w1 and w2 , the equality P (w1 ) = P (w2 ) holds if and only if w1 ≡plact w2 .
Here are two examples of plactic rewriting: abaacbc ≡plact abacabc and acabdbc ≡plact acadbbc.
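Both examples can be verified by brute force, generating the whole plactic class with the Knuth relations (43)-(44) (a Python sketch; the function names are ours):

```python
# Brute-force plactic classes via the Knuth relations:
# xzy ~ zxy for x <= y < z, and yxz ~ yzx for x < y <= z.
from collections import deque

def knuth_neighbours(w):
    """Yield all words obtained from w by one Knuth move (both directions)."""
    for i in range(len(w) - 2):
        a, b, c = w[i], w[i + 1], w[i + 2]
        if a <= c < b:                 # w = ..xzy.. -> ..zxy..
            yield w[:i] + b + a + c + w[i + 3:]
        if b <= c < a:                 # w = ..zxy.. -> ..xzy..
            yield w[:i] + b + a + c + w[i + 3:]
        if b < a <= c:                 # w = ..yxz.. -> ..yzx..
            yield w[:i] + a + c + b + w[i + 3:]
        if c < a <= b:                 # w = ..yzx.. -> ..yxz..
            yield w[:i] + a + c + b + w[i + 3:]

def plactic_class(w):
    """Breadth-first closure of {w} under Knuth moves (finite: fixed multiset)."""
    seen, todo = {w}, deque([w])
    while todo:
        for v in knuth_neighbours(todo.popleft()):
            if v not in seen:
                seen.add(v); todo.append(v)
    return seen

assert "abacabc" in plactic_class("abaacbc")
assert "acadbbc" in plactic_class("acabdbc")
```

By Theorem 28 the same check could be done by comparing P-tableaux instead of enumerating the class.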
We then define

St := Σ_{Tab(σ)=t} Fσ = Σ_{Q(w)=t} w,   (45)

where w → (P(w), Q(w)) is the usual Robinson-Schensted map. Then we have the following analogue of Theorem 27.

Theorem 29. The space spanned by

St := Σ_{Tab(σ)=t} Fσ = Σ_{Q(w)=t} w   (46)

is a sub-Hopf algebra of FQSym which is dual to FQSym(A/ ≡plact).

This provides a very simple proof of the Littlewood-Richardson rule for computing the multiplication of Schur functions or, equivalently, for computing the decomposition of the tensor product of representations of GLN.
5.2. General theorems

The planar binary trees Hopf algebra and the free symmetric functions Hopf algebra are actually two examples of a more general construction. The main data is a plactic-like monoid. Let us summarize the following definitions.

Definition 30. A congruence ≡ on the free monoid A∗ is an equivalence relation which is compatible with the concatenation, that is, for all words u, v1, v2, w, if v1 ≡ v2 then uv1w ≡ uv2w. A congruence ≡ is generated by transpositions if it is the transitive closure of a (not necessarily finite) set of relations of the type

uabv ≡ ubav   (47)

where u, v are words and a, b letters.

We are now in a position to state the main theorem.

Theorem 31. Suppose that ≡ is a congruence generated by transpositions which is compatible with the standardization and the restriction to intervals. For any class C of standard words (i.e. permutations) for the congruence ≡, let

Pt := Σ_{σ∈C} Fσ.   (48)

Then the space spanned by the Pt is a sub-Hopf algebra of FQSym which is dual to the quotient FQSym(A/ ≡).

The proof is essentially the same as in the case of the plactic monoid and can be found in [6]. There are further examples of this construction: for instance, the hypoplactic monoid [16], leading to the pair quasi-symmetric functions / noncommutative symmetric functions.
References

[1] E. Abe, Hopf algebras, Cambridge Tracts in Mathematics, Cambridge University Press, 1980.
[2] N. Bergeron, F. Hivert and J.-Y. Thibon, The peak algebra and the Hecke-Clifford algebras at q = 0, J. Combinatorial Theory A, 117 (2004), 1–19.
[3] C. Brouder and A. Frabetti, Renormalization of QED with planar binary trees, Europ. Phys. J. C 19 (2001), 715–741.
[4] C. Brouder and A. Frabetti, QED Hopf algebras on planar binary trees, J. Algebra, 267 (2003), no. 1, 298–322.
[5] A. Connes and D. Kreimer, Hopf algebras, renormalization and noncommutative geometry, Comm. Math. Phys. 199 (1998), no. 1, 203–242.
[6] G. Duchamp, F. Hivert and J.-Y. Thibon, Noncommutative symmetric functions. VI. Free quasi-symmetric functions and related algebras, Internat. J. Algebra Comput. 12 (2002), no. 5, 671–717.
274
F. Hivert / An Introduction to Combinatorial Hopf Algebras
[7] I.M. Gelfand, D. Krob, B. Leclerc, A. Lascoux, V.S. Retakh and J.-Y. Thibon, Noncommutative symmetric functions, Adv. in Math., 112 (1995), 218–348.
[8] I. Gessel, Multipartite P-partitions and inner products of skew Schur functions, in Combinatorics and algebra, C. Greene, ed., Contemporary Mathematics 34 (1984), 289–301.
[9] F. Hivert, J.-C. Novelli and J.-Y. Thibon, Un analogue du monoïde plaxique pour les arbres binaires de recherche, C. R. Acad. Sci. Paris, 335 (2002), 1–4.
[10] F. Hivert, J.-C. Novelli and J.-Y. Thibon, Sur quelques propriétés de l'algèbre des arbres binaires, C. R. Math. Acad. Sci. Paris, 337(9) (2003), 565–568.
[11] F. Hivert and N. Thiéry, MuPAD-Combinat, an open-source package for research in algebraic combinatorics, Séminaire Lotharingien de Combinatoire, 51 (2003), 70 pp., electronic.
[12] F. Hivert, J.-C. Novelli and J.-Y. Thibon, The algebra of binary search trees, Theoretical Computer Science, 339(1) (2005), 129–165.
[13] F. Hivert, J.-C. Novelli and J.-Y. Thibon, Yang-Baxter bases of 0-Hecke algebras and representation theory of 0-Ariki-Koike-Shoji algebras, Advances in Mathematics, to appear.
[14] S.A. Joni and G.-C. Rota, Coalgebra and bialgebra in combinatorics, Stud. Appl. Math. 61 (1979), 93–139.
[15] D.E. Knuth, The art of computer programming, vol. 3: Sorting and searching, Addison-Wesley, 1973.
[16] D. Krob and J.-Y. Thibon, Noncommutative symmetric functions IV: Quantum linear groups and Hecke algebras at q = 0, J. Alg. Comb., 6 (1997), no. 4, 339–376.
[17] D. Krob and J.-Y. Thibon, Noncommutative symmetric functions V: A degenerate version of Uq(glN), Internat. J. Algebra Comput., 9 (1999), no. 3-4, 405–430.
[18] A. Lascoux and M.-P. Schützenberger, Le monoïde plaxique, in Noncommutative structures in algebra and geometric combinatorics (Naples, 1978), pp. 129–156, Quad. Ricerca Sci., 109, CNR, Rome, 1981.
[19] J.-L. Loday, Dialgebras and related operads, Lecture Notes in Mathematics, 1763, Springer-Verlag, 2001, 7–66.
[20] J.-L. Loday, Realization of the Stasheff polytope, math.AT/0212126, to appear in Arch. Math. (Basel).
[21] J.-L. Loday and M.O. Ronco, Hopf algebra of the planar binary trees, Adv. Math., 139 (1998), no. 2, 293–309.
[22] J.-L. Loday and M.O. Ronco, Order structure on the algebra of permutations and of planar binary trees, J. Algebraic Combin., 15 (2002), no. 3, 253–270.
[23] A. Lascoux, B. Leclerc and J.-Y. Thibon, The plactic monoid, Chapter 5 of M. Lothaire, Algebraic Combinatorics on Words, Cambridge University Press.
[24] P.A. MacMahon, Combinatorial analysis, Cambridge University Press, 1915; Chelsea reprint, 1960.
[25] C. Malvenuto and C. Reutenauer, Duality between quasi-symmetric functions and the Solomon descent algebra, J. Algebra, 177 (1995), 967–982.
[26] S. Montgomery, Hopf algebras and their actions on rings, AMS, 1994, 240 pp.
[27] G.-C. Rota, Hopf algebra methods in combinatorics, Colloques internationaux C.N.R.S., Orsay (1976), 363–365; reprinted in Gian-Carlo Rota on Combinatorics, Birkhäuser, 1995.
[28] G.-C. Rota, Baxter algebras and combinatorial identities I, II, Bull. A.M.S. 75 (1969), 325–334.
[29] M. Sweedler, Hopf algebras, Benjamin, 1969.
[30] A. Zelevinsky, Representations of Finite Classical Groups. A Hopf Algebra Approach, Lecture Notes in Mathematics, 869, Springer-Verlag, Berlin-New York, 1981, 184 pp.
Physics and Theoretical Computer Science J.-P. Gazeau et al. (Eds.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
275
Complex Networks: Deterministic Models

Francesc Comellas 1
Universitat Politècnica de Catalunya, Barcelona

Abstract. The recent discovery that many networks associated with complex systems belong to a new category known as scale-free small-world has led to a surge in the number of new models for these systems. Many studies are based on probabilistic and statistical methods which capture well some of the basic properties of the networks. More recently, a deterministic approach has proven useful to complement and enhance the probabilistic and simulation techniques. In this paper, after a short introduction to the main concepts and models, we survey recent deterministic models.

Keywords. Complex networks, small-world networks, scale-free networks, deterministic models, graph theory
1. Introduction

Recent research shows that many networks associated with complex systems, like the World Wide Web, the Internet, telephone networks, transportation systems (including the power distribution network), and biological and social networks, belong to a class of networks known as small-world scale-free networks [4,35,41,42,56,52]. These networks exhibit both strong local clustering (nodes have many mutual neighbors) and a small average path length and diameter (maximum distance between any two nodes). Another important common characteristic is that the number of links attached to the nodes usually obeys a power-law distribution (is scale-free). Moreover, by introducing a new measuring technique, it has recently been discovered that many real networks are self-similar, see [70]. Alongside these observational studies, researchers have developed different models and techniques (borrowed in some cases from statistical physics, computer science and graph theory) which should help us to understand and predict the behavior and characteristics of these systems. The origin of the interest in these studies may be found in the papers by Watts and Strogatz on small-world networks [75] and Barabási and Albert on scale-free networks [9]. Since then the study of complex networks has received a considerable boost as an interdisciplinary subject. Several excellent general reviews and books are available, and therefore in this paper we refer the reader who would like to obtain more information on the topic to them, see references [74,71,3,31,11,57,73,8,20,32,67].

1 Correspondence to: Francesc Comellas, Dep. Matemàtica Aplicada IV, EPSC, Universitat Politècnica de Catalunya, Avda. Canal Olímpic s/n, 08860 Castelldefels, Barcelona, Catalonia, Spain
F. Comellas / Complex Networks: Deterministic Models
To describe these complex networks several models have been proposed and analysed through simulations and probabilistic methods. The first, which triggered a sharp interest in the study of the different properties of small-world networks, was the simple computational method to produce small-world networks proposed by Watts and Strogatz in their often-cited paper [75]. Shortly after, Barabási and Albert [9,12] introduced a network model which uses two main mechanisms to produce a power-law distribution for the degrees: growth and preferential attachment. Dorogovtsev, Mendes, and Samukhin [33] used a “master equation” to obtain an exact solution for a class of growing network models; Krapivsky, Redner, and Leyvraz [47] examined the effect of a nonlinear preferential attachment on network dynamics and topology. Models that incorporate aging and cost and capacity constraints were studied by Amaral et al. [6] to explain deviations from the power-law behavior in several real-life networks. Dorogovtsev and Mendes [27] also considered the evolution of networks with aging of sites. Bianconi and Barabási [16] introduced a model addressing the competitive aspect of many real networks such as the WWW. Additionally, in real systems microscopic events affect the network evolution, including the addition or rewiring of new edges or the removal of vertices or edges. Albert and Barabási [2] discussed a model that incorporates new edges between existing vertices and the rewiring of old edges. Dorogovtsev and Mendes [28] considered a class of undirected models in which new edges are added between old vertices and existing edges can be removed. It is now well established that preferential attachment can explain the power-law characteristic of networks, but some other alternative mechanisms affecting the evolution of growing networks can also lead to the observed scale-free topologies. Kleinberg et al. [48] and Kumar et al.
[49,50] suggested certain copying mechanisms in an attempt to explain the power-law degree distribution of the World Wide Web. Chung et al. [26] also introduced a duplication model for biological networks. Krapivsky and Redner [46] use an edge redirection mechanism which is equivalent to the model of Kumar et al. [49,50]. All of these models have been studied intensively. Barthélémy and Amaral studied the origins of the small-world behavior in Ref. [15]. Barrat and Weigt addressed analytically as well as numerically the structural properties of the Watts-Strogatz model [13]. Amaral et al. investigated the statistical characteristics of many real-life networks [6]. Latora and Marchiori introduced the concept of efficiency of a network and found that small-world networks are both globally and locally efficient [51]. References [65,62,66,60] deal with the percolation properties of the networks and in particular the spread of information and disease along the shortest path in the graph or the spanning trees. More recently, researchers have also focused their attention on other aspects characterising properties of small-world scale-free networks [61,54,39,38,19,36,17,44]. While most of the models referenced above are stochastically produced and analysed, small-world scale-free networks can also be created by deterministic methods. Deterministic models have the strong advantage that it is often possible to compute their properties analytically, and these may be compared with experimental data from real and simulated networks. Deterministic networks can be created by various techniques: modification of some regular graphs [23], addition and product of graphs [25], and other mathematical methods, like those which appear in [84]. Another important technique producing families of deterministic small-world scale-free networks is based on recursive constructions. Recursive and general scale-free constructions are given in [12,25,43,29,68,63].
Recursive methods based on the existence of cliques in a given network have appeared in [22,84,7,34,80,81,79]. After a short introduction to the basic definitions and probabilistic models, we present a review of recent deterministic models, mainly graph constructions which share the property that the graphs contain many complete subgraphs. Although under different names (hierarchical, pseudo-fractal, Apollonian, geometrical, recursive cliques) they all follow the same principle: the successive addition of vertices, each one connected to all the vertices of a subgraph isomorphic to a clique or complete graph. The rules used to add vertices produce different final networks sharing many basic properties: they are small-world, scale-free, with high clustering and small average distance.
2. Basic properties of complex networks

Many measures and parameters have been considered and studied to analyze complex networks. The use of a certain subset of them is in most cases sufficient to capture the rough structure of a given network. Recent research focuses on three main concepts, namely the mean distance of the network (average path length), and in some cases the diameter, the clustering coefficient, and the degree distribution. Although interest in the analysis of networks has always been present in the scientific community through the work in the social sciences, the paper by Watts and Strogatz [75] extended this interest to other scientists. In their work they produce networks with a small average path length, similar to that of a random graph, and a relatively large clustering coefficient, as occurs in many structured networks, and show that real networks like the WWW, a power grid, and the neuronal network of the worm C. elegans have a similar relation. A few months later, Barabási and Albert [9] discovered that many of these networks have a degree distribution that follows a power law (are scale-free), and introduced a model to produce that distribution. In this section we provide a short introduction to these concepts. First we give the following definitions, see also [76]. A network is represented by a graph G = (V, E) with vertex (node) set V = V(G) and edge (link) set E = E(G). The order of the graph, n = |V|, is the number of vertices or nodes of it. The degree of a vertex i, which we denote ki, is the number of edges incident to i, and the degree of a graph G is Δ = max_{i∈V} ki. A graph is Δ-regular if the degree of all its vertices is Δ. A complete graph Kd (also referred to in the literature as a d-clique) is a graph with d vertices, where there are no loops or multiple edges and every vertex is joined to every other by an edge.
Generally speaking, two graphs are said to be isomorphic if the vertices and edges of one graph match up with vertices and edges of the other, and the edge matching is consistent with the vertex matching. The basic family of graphs considered by Watts and Strogatz (and other studies) are known as circulant graphs. They considered the particular case Cn,Δ, Δ even, which has n nodes labelled with the integers modulo n, and Δ links per node, such that each node i is adjacent to the nodes i ± 1, i ± 2, · · · , i ± Δ/2 (mod n).
Diameter and mean distance

In a graph the distance between two vertices i and j, d(i, j), is defined as the number of edges along a shortest path between i and j. The maximum distance between any pair of vertices, D = max_{i,j∈V} d(i, j), is the diameter of the graph. The mean distance of the graph is defined to be (1/(n(n − 1))) Σ_{i,j∈V} d(i, j). In some probabilistic models the average path length (APL) of the network is introduced as the average value of d(i, j), with i and j chosen uniformly at random. In a social network, for example, the APL is associated with the average number of acquaintances in the shortest chain which connects any two persons of the network. Note that all these definitions only make sense if we require the graph to be connected. As Watts and Strogatz noticed in [75], the average path length of most real complex networks is relatively small, even when the networks are sparse (they have many fewer edges than a complete graph with the same number of nodes). Some authors call this the small-world effect, and hence the name small-world networks. However, random networks also have a small diameter (and mean distance) [18], and they are different from real networks. For this reason it is perhaps better to refer to small-world networks as those networks that also have a relatively large clustering with respect to a similar random network.

Clustering coefficient

Clustering measures the “connectedness” of a graph and is another of the parameters used to characterize small-world networks. For example, in a friendship network it is very likely that two of your friends are also friends with each other, reflecting the clustering nature of this social network. The clustering coefficient was introduced to quantify this concept. First, for each node i of a graph G, Ci is defined as the fraction of the ki(ki − 1)/2 possible edges among the neighbours of i that are present in G. More precisely, if εi is the number of edges connecting the ki vertices adjacent to the vertex i, the clustering coefficient of the vertex is Ci = 2εi/(ki(ki − 1)). Then the clustering coefficient of G, denoted CG, is the average over all nodes i ∈ V(G) of Ci. Obviously, the clustering coefficient varies between 0 and 1. A value near 0 means that most of the vertices connected to any given vertex i are not connected to each other.
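These two definitions translate directly into code. The following Python sketch (our own helper names; adjacency stored as a dict of sets) computes the mean distance and the clustering coefficient of a toy graph:

```python
# Mean distance and clustering coefficient computed from the definitions.
from collections import deque
from itertools import combinations

def distances_from(g, s):
    """BFS distances from s in an unweighted graph g (dict of sets)."""
    d = {s: 0}
    todo = deque([s])
    while todo:
        u = todo.popleft()
        for v in g[u]:
            if v not in d:
                d[v] = d[u] + 1
                todo.append(v)
    return d

def mean_distance(g):
    """(1 / (n(n-1))) * sum of d(i, j) over ordered pairs i != j."""
    n = len(g)
    total = sum(sum(distances_from(g, s).values()) for s in g)
    return total / (n * (n - 1))

def clustering(g):
    """Average over all nodes of C_i = 2*eps_i / (k_i (k_i - 1))."""
    cs = []
    for i in g:
        k = len(g[i])
        if k < 2:
            cs.append(0.0)
            continue
        links = sum(1 for u, v in combinations(g[i], 2) if v in g[u])
        cs.append(2 * links / (k * (k - 1)))
    return sum(cs) / len(cs)

# A triangle (0, 1, 2) with a pendant vertex 3.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
assert mean_distance(g) == 16 / 12
assert abs(clustering(g) - 7 / 12) < 1e-12
```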
Conversely, a value near 1 means that those neighbours tend to be connected to one another.

Degree distribution

One simple and important characteristic of a given vertex is its degree. The degree ki of a vertex gives the total number of its connections. The average of ki over all i is called the average degree of the network, and is denoted by ⟨k⟩. The spread of vertex degrees over a graph can be characterized by a distribution function P(k), which gives the probability that a randomly selected vertex has degree k. A structured graph, for example a circulant graph Cn,Δ, which is regular, will have a degree distribution consisting of a single sharp spike (delta distribution). In a random network (in the Erdős-Rényi model) the degree sequence will obey the well-known Poisson distribution, with a peak at ⟨k⟩ and exponential decline. (The probability of finding vertices with k edges is negligible for k ≫ ⟨k⟩.) In the past few years, many observational results have shown that for most real large-scale networks the degree distribution deviates significantly from the Poisson distribution, and in many instances it can be better described by a power law, P(k) ∝ k−γ. Because these power laws are free of any characteristic scale, such a network with a power-law degree distribution is called a scale-free network.
3. Complex Network Models

3.1. Watts-Strogatz small-world graphs

Watts and Strogatz suggest a simple method for constructing graphs with the small-world property [75]. The method is as follows. Start with a circulant graph Cn,Δ; then choose vertex 0 and the edge that connects it to 1. With probability p, reconnect this edge to a vertex chosen uniformly at random over the entire set of vertices (without duplicating any edge); otherwise leave it unchanged. The process is repeated for all the remaining vertices in succession (1, . . . , n − 1), considering each vertex in turn until one lap is completed. Next, do the same with the edges connecting i to i + 2, i = 0, 1, . . . , n − 1: as before, randomly rewire each of these edges with probability p, and continue this process, circulating around the ring and proceeding outward to more distant neighbours, i + 2, i + 3, · · · , i + Δ/2, after each lap, until each edge in the original graph Cn,Δ has been considered once. The rewiring process therefore stops after Δ/2 laps. With this process, for p = 0 the original graph is unchanged, whereas for p = 1 all edges are rewired randomly (in that case we obtain a random graph, whose degree distribution is approximately Poisson). Intermediate values of p lead to different states of disorder. With p around 0.01, small-world graphs are obtained with a large clustering coefficient, similar to that of the starting graph, and a small average path length and diameter, as in a random graph [18]. (See Fig. 2.) Watts and Strogatz realized that their model captures some aspects of many real networks: namely, they have a low average path length and diameter, relative to a random network of similar order and size, while they have a relatively high clustering (the clustering coefficient of a random network is almost zero), see Table 1.

Table 1. Values of parameters for some real small-world scale-free networks, see [75].
Network                    | order   | APL  | ⟨k⟩   | Clustering | γ
WWW                        | 153,127 | 3.1  | 35.21 | 0.11       | 1.94 [1]
Internet (domain) [35]     | 3,015   | 3.52 | 4.75  | 0.18       | 2.1
Power grid                 | 4,014   | 18.7 | 2.67  | 0.08       | 4
Silwood Pk food web [53]   | 154     | 3.40 | 4.75  | 0.15       | 4.75
C. elegans                 | 282     | 2.65 | 14    | 0.28       | -
Movie actors               | 225,226 | 3.65 | 61    | 0.79       | 2.3
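The lap-by-lap rewiring procedure of Section 3.1 can be sketched as follows (a Python sketch with our own helper names; it follows the description above rather than any reference implementation):

```python
# Watts-Strogatz rewiring sketch: start from the circulant graph C(n, Delta)
# and rewire each edge (i, i+s), s = 1..Delta/2, with probability p.
import random

def watts_strogatz(n, delta, p, seed=0):
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):                         # build C(n, Delta)
        for s in range(1, delta // 2 + 1):
            adj[i].add((i + s) % n)
            adj[(i + s) % n].add(i)
    for s in range(1, delta // 2 + 1):         # one lap per distance s
        for i in range(n):
            j = (i + s) % n
            if j not in adj[i] or rng.random() >= p:
                continue                       # keep this edge
            # rewire (i, j) to a uniformly chosen non-neighbour of i
            candidates = [k for k in range(n) if k != i and k not in adj[i]]
            if not candidates:
                continue
            k = rng.choice(candidates)
            adj[i].discard(j); adj[j].discard(i)
            adj[i].add(k); adj[k].add(i)
    return adj

g0 = watts_strogatz(20, 4, 0.0)
assert all(len(g0[i]) == 4 for i in g0)        # p = 0 leaves C(20, 4) intact
g1 = watts_strogatz(20, 4, 1.0)
assert sum(len(g1[i]) for i in g1) == 20 * 4   # rewiring preserves edge count
```

Each rewiring step removes one edge and adds one, so the number of edges (nΔ/2) is an invariant of the procedure, as the second assertion checks.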
3.2. Barabási-Albert scale-free graphs

To explain the origin of the power-law degree distribution of real networks, Barabási and Albert proposed and analyzed a simple graph model (BA) based on two main concepts: growth and preferential attachment. In this model a graph is dynamically formed by a continuous addition of new vertices. Each new vertex is joined to several existing vertices, selected proportionally to their degree. The generation algorithm of a BA scale-free graph is as follows:

Growth: Start with a small number, m0, of vertices. At each step, a new vertex is introduced and is connected to m < m0 already existing vertices.

Preferential attachment: The probability that the new vertex will be connected to an existing vertex i depends on its degree ki according to P(ki) = ki / Σ_{j∈V} kj.

Considering these two rules, they proved analytically that the graph evolves into a scale-invariant state: the shape of the degree distribution does not change over time and is described by a power law P(k) ∝ k−γ, with γ = 3. This means that scale-free graphs have a few nodes with a high degree (called hubs). The analytical results can be contrasted easily with numerical simulations and compared with a random network produced according to the Erdős-Rényi method (start with a given number of vertices and add edges connected at random), see Fig. 1. The BA model does not allow, however, the analytical computation of the average path length and the clustering coefficient. It is, therefore, a minimal model capturing the mechanisms responsible for the power-law degree distribution, but with some evident limitations when compared to some real-world networks (as it does not explain their relatively high clustering).
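The two rules above can be sketched in a few lines of Python (our own helper names; we assume a complete seed graph on m0 = m + 1 vertices, one common concrete choice, and implement P(ki) = ki / Σj kj with a repeated-endpoint list, so that uniform sampling from it is degree-proportional sampling):

```python
# Barabasi-Albert sketch: growth plus preferential attachment.
import random

def barabasi_albert(n, m, seed=0):
    rng = random.Random(seed)
    m0 = m + 1                                            # assumed seed size
    edges = [(i, j) for i in range(m0) for j in range(i)]  # complete seed K_m0
    # Each endpoint occurrence stands for one unit of degree, so picking a
    # uniform element of this list picks a vertex with probability k_i / sum k_j.
    endpoints = [v for e in edges for v in e]
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:                           # m distinct targets
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints += [new, t]
    return edges

edges = barabasi_albert(200, 2)
deg = {}
for u, v in edges:
    deg[u] = deg.get(u, 0) + 1
    deg[v] = deg.get(v, 0) + 1
assert len(edges) == 3 + 2 * (200 - 3)   # K_3 seed plus m = 2 edges per vertex
assert min(deg.values()) >= 2            # every vertex keeps its m initial links
```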
Figure 1. The bell-shaped degree distribution of an Erdős-Rényi random network and the power-law distribution obtained with preferential attachment [9].
The BA model has allowed a thorough study of some important properties of real-world networks. One of them, robustness to the random failure of nodes [4], results from the fact that these networks tend to stay connected, maintaining a small APL, when a node is deleted at random (because the probability of deleting a hub is small). However, they are particularly vulnerable to targeted attacks aimed at removing hubs. Other models have been proposed to overcome some of the limitations of the BA model and to produce scale-free networks with a small average path length and relatively high clustering, see the references given in the introduction.
Figure 2. (a) C(16, 4), a circulant graph. (b) Random graph. (c) Small-world graph. (d) Scale-free graph.
4. Deterministic small-world scale-free graphs In contrast to the random models of Watts and Strogatz and Barabási and Albert, and many other modifications and variations, it is possible to produce small-world scale-free graphs deterministically. The deterministic model very often allows a complete analytical study of the relevant parameters of the graphs and can be contrasted with the random model. As most real life networks are clustered, this characteristic can be reproduced in the deterministic model by considering complete graphs (or cliques). This is one of the reasons why many deterministic models are based on complete graphs. In this section we will introduce several deterministic models and compare, in some cases, with their randomized versions. 4.1. Deterministic WS small-world graphs In [23], small-world networks were constructed by choosing h nodes of Cn,Δ to be hubs and then using a graph with a very small diameter (star graph, complete graph, optimal double loop, etc.) of order h to interconnect the hubs. In this way, the clustering parameter of the final graph is high and very near to that of the original graph while the diameter is reduced considerably. This deterministic construction allows an analytical computation of all its main characteristics which can be compared with the numerical simulation, see Fig. 3.
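The hub construction of [23] can be illustrated with a small sketch (Python; the concrete choices of evenly spaced hubs joined by a complete graph are our illustrative assumptions, one of the variants mentioned above):

```python
# Deterministic small-world sketch in the spirit of [23]: take C(n, Delta),
# pick h evenly spaced hub nodes, and interconnect the hubs completely.
from collections import deque

def circulant(n, delta):
    g = {i: set() for i in range(n)}
    for i in range(n):
        for s in range(1, delta // 2 + 1):
            g[i].add((i + s) % n)
            g[(i + s) % n].add(i)
    return g

def add_hubs(g, h):
    n = len(g)
    hubs = [i * (n // h) for i in range(h)]
    for a in hubs:                         # complete graph on the hubs
        for b in hubs:
            if a != b:
                g[a].add(b)
                g[b].add(a)
    return g

def diameter(g):
    best = 0
    for s in g:                            # BFS from every vertex
        d = {s: 0}
        todo = deque([s])
        while todo:
            u = todo.popleft()
            for v in g[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    todo.append(v)
        best = max(best, max(d.values()))
    return best

d_before = diameter(circulant(100, 4))
d_after = diameter(add_hubs(circulant(100, 4), 10))
assert d_after < d_before   # hubs cut the diameter while clustering stays local
```

On this example the diameter of C(100, 4) is 25, and the shortcut through the nearest hub bounds the new diameter by a small constant, while the clustering of the non-hub vertices is untouched.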
Figure 3. Comparison of the values of the diameter and clustering obtained from the simulation model of Watts and Strogatz [75] with those from the deterministic model introduced in [23].
4.2. Small-world scale-free networks from graph products and sums

Two simple deterministic construction techniques for small-world networks were introduced in [25]. The first method replaces the vertices of a graph by clustered graphs (a graph product). If the original graph has a low diameter and we use cliques to replace the vertices, a graph with low diameter and high clustering is obtained. In the second construction a small-world network is obtained by connecting each node of a network of diameter d to a complete network of any size, which may differ from node to node. In this case the resulting network has diameter d + 2, high clustering, and nodes may have different numbers of neighbors. This technique is very flexible, as it allows different final degree distributions, including scale-free ones.

4.3. Hierarchical networks

In [12], Barabási et al. introduced a simple hierarchical family of networks and showed that it has a small-world scale-free nature. In [40] the authors compute some other properties of this family of graphs (such as the spectrum). The model is generalized in [68] and further studied in [63]. Hierarchical graphs have been used to model metabolic networks in [69]. Several authors claim that the signature of a hierarchical network is that, besides the small-world scale-free characteristics, the clustering of its vertices follows C_i ∝ 1/k_i. Hierarchical networks can be constructed starting from a complete graph K_n and connecting n − 1 replicas of K_n to a selected root node. Next, n − 1 replicas of the new whole structure are added and connected to the root. At this step the graph has n³ vertices. The process continues until the desired graph size is reached. There are many variations of these hierarchical networks, depending on the initial graph, the introduction of extra edges among the different copies of the complete subgraphs, etc.
However, given the starting graph, they have no parameters to adjust, and the main characteristics become fixed. In [14] a general model is considered and a labeling system is introduced to allow the study of routing and other communication properties of hierarchical networks.
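The replication step is easy to sketch in code. The variant below is a simplification and an assumption on our part: every vertex of each new replica is joined to the root, whereas the models in [12,68] join only selected peripheral vertices. The vertex count n^(t+1) after t steps is the same either way.

```python
from itertools import combinations

def hierarchical(n, steps):
    """Recursively replicate a K_n, wiring each replica to the root (vertex 0).

    Simplified variant: all vertices of the n-1 new replicas are joined
    to the root; the published models join only peripheral vertices."""
    edges = set(frozenset(e) for e in combinations(range(n), 2))
    size = n
    for _ in range(steps):
        base = list(edges)
        new_edges = set(edges)
        for rep in range(1, n):
            shift = rep * size          # vertex ids of this replica
            for e in base:
                a, b = tuple(e)
                new_edges.add(frozenset((a + shift, b + shift)))
            for v in range(size):       # wire the replica to the root
                new_edges.add(frozenset((0, v + shift)))
        edges = new_edges
        size *= n
    return size, edges

size, edges = hierarchical(4, 2)
# after t steps the graph has n**(t+1) vertices: 4 -> 16 -> 64
```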
Figure 4. Recursive construction of a hierarchical network based on K4, from [14].
4.4. Deterministic recursive clique-trees

While in the hierarchical models the deletion of some edges decomposes the graph into different complete graphs, another construction, also based on complete graphs, intermixes them, producing a more complex structure. A generic recursive d-clique-tree is a graph-theoretical construction which starts at t = 0 with a complete graph K(d, 0) = K_d. For any step t ≥ 1, the clique-tree K(d, t) is constructed from K(d, t−1) by selecting one or more existing d-cliques in K(d, t−1) and adding, for each clique, a new vertex connected to all the vertices of the clique. See Fig. 5. Note that a recursive clique-tree contains numerous cycles and hence is not a tree in the strict sense.
Figure 5. Iterative construction of a deterministic clique-tree graph.
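The with-repetition variant of this construction (every existing d-clique receives a new vertex at every step, i.e. the complete recursive clique-trees of [22] discussed below) can be sketched directly by tracking the list of d-cliques:

```python
from itertools import combinations

def clique_tree(d, t):
    """Complete recursive d-clique-tree K(d, t), with repetition:
    at every step each existing d-clique receives a new vertex."""
    vertices = list(range(d))
    edges = set(frozenset(e) for e in combinations(vertices, 2))
    cliques = [frozenset(vertices)]      # all d-cliques, kept forever
    for _ in range(t):
        new_cliques = []
        for c in cliques:
            v = len(vertices)            # fresh vertex joined to clique c
            vertices.append(v)
            for u in c:
                edges.add(frozenset((u, v)))
            # the new vertex forms d new d-cliques with the old one
            for sub in combinations(c, d - 1):
                new_cliques.append(frozenset(sub + (v,)))
        cliques.extend(new_cliques)
    return len(vertices), len(edges)

n, m = clique_tree(2, 3)   # d = 2 gives the pseudofractal graph
```

For d = 2, t = 3 this yields 15 vertices and 27 edges, matching the closed forms (1) and (2) derived in the Appendix.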
Several modifications of this general construction have been considered and are discussed next. Different networks are associated with the choice of the value d. Authors have
considered d = 2, d = 3 and the general case d (d ∈ N, d ≥ 2). The way in which the existing d-cliques to be joined to a new vertex are selected is also important: if we select all the cliques (even those that have been used before) we obtain complete recursive clique-trees [22], which include, as a particular case (d = 2), the pseudo-fractal networks [29]. If we select only cliques which have never been used before we obtain (for any d ≥ 3) high dimensional Apollonian networks [34,81], which include, for d = 3, Apollonian networks [7,34]. Table 2 summarizes these deterministic constructions.

Table 2. Deterministic recursive clique-tree constructions.
• Case d = 2, adding at the same time a vertex to each d-clique with repetition: pseudofractal scale-free network — Dorogovtsev, Goltsev, Mendes, Phys. Rev. E 65 (2002) 066122.
• Case d = 2, without repetition: deterministic SW network — Zhang, Rong, Guo, arXiv: cond-mat/0502335.
• Case d = 3, without repetition: Apollonian network — Andrade, Herrmann, Andrade, da Silva, Phys. Rev. Lett. 94 (2005) 018702; Doye, Massen, Phys. Rev. E 71 (2005) 016128.
• General case d ≥ 2 (includes d = 2, 3), with repetition: recursive clique-trees — Comellas, Fertin, Raspaud, Phys. Rev. E 69 (2004) 037104.
• General case d ≥ 2, without repetition: high dimensional Apollonian network — Zhang, Comellas, Fertin, Rong, J. Phys. A 39 (2006) 1811 (introduced by Doye and Massen, Phys. Rev. E 71 (2005) 016128).
4.5. Random recursive clique-trees

The same clique-based principles used for the deterministic constructions can be used to produce randomly grown graphs. Interestingly, the final graphs differ in the values of several parameters. In some cases the discrepancy can be explained by a biased choice of the substructures selected in the process of growing the random graph, see [24]. If we select a clique at random (allowing even those that have been used before), we obtain, for generic d, the random recursive clique-trees studied in the Appendix of this survey. If we select a clique at random (avoiding repetitions), we obtain, for generic d, high dimensional random Apollonian networks [80], which include, as the particular case d = 3, random Apollonian networks [82]. We note that for random constructions it is possible to introduce a parameter that controls part of the structural properties of the growing network. By tuning this parameter, one can allow the introduction at each step of one vertex attached to one clique, or of different vertices attached to different cliques, up to the deterministic case. For d = 2, avoiding clique repetitions, this has been studied in [80]. A similar study could easily be done for the general case. Finally, the next table compares the values of the (asymptotic) degree distribution (scale-free in most cases, in which case we give the γ exponent) of Apollonian graphs [7], random Apollonian graphs [82], and their high dimensional versions [81,80] (they include as
Table 3. Random recursive clique-tree constructions.
• Case d = 2, adding a single vertex to a random clique without repetition: random SW network — Ozik, Hunt, Ott, Phys. Rev. E 69 (2004) 026108.
• Case d = 3, without repetition: random Apollonian network — Zhou, Yan, Wang, Phys. Rev. E 71 (2005) 046141.
• General case d ≥ 2 (includes d = 2, 3), with repetition: random recursive clique-trees — see Appendix.
• General case d ≥ 2, without repetition: high dimensional random Apollonian network — Zhang, Comellas, Rong, Physica A, cond-mat/0502591.
particular cases the former two families of graphs). We also present the random versions of the pseudo-fractal scale-free graphs (introduced by Dorogovtsev, Goltsev and Mendes [29]), their generalization, the complete recursive clique-trees [22], and the so-called deterministic and random small-world graphs from [77,78]. We observe that these last two cases do not produce scale-free networks.

Table 4. Comparison of deterministic and random Apollonian graphs and recursive clique-trees.

Graph family | P(k) or γ-exponent | Clustering
Deterministic SW [78] | P(k) ∝ 2^{−k/2} | 0.69 (= ln 2)
Random SW [77] | P(k) = (3/4)(3/2)^{−k} | 0.65 (= (3/2) ln 3 − 1)
Pseudo fractal scale-free [29] | 1 + ln 3/ln 2 = 2.58 | 0.80 (= 4/5)
Random pseudo fractal scale-free | 5/2 = 2.5 | 0.74
Apollonian [7,34] | 2.58 (= 1 + ln 3/ln 2) | 0.83
Random Apollonian [82] | ≈ 3 | 0.74 (= 46/3 − 36 ln(3/2))
High-Dim. Apollonian [81] | 1 + ln(d+1)/ln d (2 to 2.58) | 0.83 to 1
High-Dim. Random Apollonian [80] | (2d−1)/(d−1) (2 to 3) | 0.74 to 1
Determ. recursive clique-trees [22] | 1 + ln(d+1)/ln d (2 to 2.58) | 0.80 to 1
Random rec. clique-trees [see Appendix] | (2d−1)/(d−1) (2 to 3) | 0.74 to 1
5. Acknowledgement Research supported by the Ministerio de Educación y Ciencia, Spain, and the European Regional Development Fund under projects TIC2002-00155 and TEC2005-03575/TCM.
APPENDIX: Deterministic vs random recursive clique-trees

In this appendix we compute analytically the order, size and degree distribution of deterministic recursive clique-trees and their random variations.

Deterministic recursive clique-trees

The results for the order, size, clustering, degree distribution and diameter of recursive clique-trees appeared in [22]. In this section we provide an alternative, simpler method for the calculation of some of these parameters. We denote a deterministic recursive d-clique-tree network after t iterations by K(d, t), d ≥ 2, t ≥ 0. This network is constructed as follows. For t = 0, K(d, 0) is the complete graph K_d (or d-clique), with d vertices and d(d − 1)/2 edges. For t ≥ 1, K(d, t) is obtained from K(d, t − 1) by adding, for each of its existing subgraphs isomorphic to a d-clique, a new vertex and joining it to all the vertices of this subgraph (see Fig. 5). Thus, at t = 1, we add one new vertex and d new edges, creating d new d-cliques. At t = 2 we add d + 1 new vertices, each of them connected to all the vertices of one of the d + 1 cliques K_d, introducing d(d + 1) new edges, and so on. Note that the addition of each new vertex leads to d new d-cliques and d new edges. So the numbers of new vertices and edges at step t_i are L_v(t_i) = (d + 1)^{t_i − 1} and L_e(t_i) = d(d + 1)^{t_i − 1}, respectively. Therefore, a deterministic recursive clique-tree K(d, t) is a growing network whose number of vertices increases exponentially with time. At step t, the network K(d, t) has

    N_t = d + Σ_{t_i=1}^{t} L_v(t_i) = ((d+1)^t − 1)/d + d    (1)

vertices, and the total number of edges is

    |E|_t = d(d−1)/2 + Σ_{t_i=1}^{t} L_e(t_i) = d(d−1)/2 + (d+1)^t − 1.    (2)
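Formulas (1) and (2) can be checked against the step-by-step recurrences L_v(t_i) = (d+1)^{t_i−1} and L_e(t_i) = d(d+1)^{t_i−1}; a quick numerical sketch:

```python
def counts_by_recurrence(d, t):
    """Accumulate vertices and edges step by step from K(d, 0) = K_d."""
    n = d
    e = d * (d - 1) // 2
    for ti in range(1, t + 1):
        n += (d + 1) ** (ti - 1)          # L_v(t_i)
        e += d * (d + 1) ** (ti - 1)      # L_e(t_i)
    return n, e

def counts_closed_form(d, t):
    n = ((d + 1) ** t - 1) // d + d                # Eq. (1)
    e = d * (d - 1) // 2 + (d + 1) ** t - 1        # Eq. (2)
    return n, e

for d in range(2, 6):
    for t in range(0, 8):
        assert counts_by_recurrence(d, t) == counts_closed_form(d, t)
```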
Table 5. Number of new edges added to K(d, t) at each step and the total number of K_d's at that step.

Step | New edges | Number of K_d
0 | d(d−1)/2 | 1
1 | d | d + 1
2 | d(d+1) | d(d+1) + (d+1) = (d+1)^2
3 | d(d+1)^2 | d(d+1)^2 + (d+1)^2 = (d+1)^3
··· | ··· | ···
i | d(d+1)^{i−1} | (d+1)^i
i+1 | d(d+1)^i | d(d+1)^i + (d+1)^i = (d+1)^{i+1}
··· | ··· | ···

The average degree k̄_t is then
    k̄_t = 2|E|_t / N_t = (2d(d+1)^t + d^3 − d^2 − 2d) / ((d+1)^t + d^2 − 1).    (3)
For large t it is approximately 2d. We see that when t is large enough the resulting networks are sparse graphs, as is the case for many real-world networks, whose vertices have far fewer connections than the maximum possible. The dimension d is thus a tunable parameter controlling all the relevant characteristics of the network. When a new vertex i is added to the graph at step t_i (t_i ≥ 1), it has degree d and forms d new d-cliques. From the iterative algorithm we can see that each new neighbor of i generates d − 1 new d-cliques having i as one of their vertices. In the next iteration, these d-cliques add to the already existing cliques and introduce new vertices that are connected to the vertex i. Let k_i(t) be the degree of i at step t (t > t_i + 1). Then

    Δk_i(t) = k_i(t) − k_i(t−1) = d Δk_i(t−1).    (4)
Combining the initial conditions k_i(t_i) = d and Δk_i(t_i + 1) = d, we obtain

    Δk_i(t) = d^{t − t_i},    (5)

and the degree of vertex i becomes

    k_i(t) = Σ_{t_m = t_i}^{t} Δk_i(t_m) = d( (d^{t−t_i} − 1)/(d−1) + 1 ).    (6)
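A quick numerical check of Eqs. (4)–(6): iterate Δk_i(t) = d·Δk_i(t−1) from the initial conditions and compare with the closed form (6):

```python
def degree_recurrence(d, ti, t):
    """Iterate Eq. (4) with k_i(t_i) = d and Δk_i(t_i + 1) = d."""
    k = d                      # initial degree at step ti
    delta = d                  # Δk_i(ti + 1) = d
    for step in range(ti + 1, t + 1):
        k += delta
        delta *= d             # Δk_i(t) = d · Δk_i(t-1)
    return k

def degree_closed_form(d, ti, t):
    return d * ((d ** (t - ti) - 1) // (d - 1) + 1)   # Eq. (6)

for d in (2, 3, 4):
    for ti in (1, 2, 3):
        for t in range(ti, ti + 6):
            assert degree_recurrence(d, ti, t) == degree_closed_form(d, ti, t)
```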
The distribution of all vertices and their degrees at step t is given in Table 6.

Table 6. Distribution of vertices and their degrees for K(d, t) at step t.

Num. vert. | Degree
d + 1 | d + Σ_{j=1}^{t−1} d^j
d + 1 | d + Σ_{j=1}^{t−2} d^j
··· | ···
(d+1)^{t−2} | 2d
(d+1)^{t−1} | d
Therefore, the degree spectrum of the graph is discrete and some values of the degree are absent. To relate the exponent of this discrete degree distribution to the standard γ exponent as defined for a continuous degree distribution, we use the cumulative distribution P_cum(k) ≡ Σ_{k′ ≥ k} N(k′, t)/N_t ∼ k^{1−γ}, where k and k′ are points of the discrete degree spectrum. The analytic computation proceeds as follows. For a degree

    k = d( (d^{t−l} − 1)/(d−1) + 1 ),

there are (d + 1)^{l−1} vertices with this exact degree, all of which were introduced at step l.
All vertices introduced at step l or earlier have this or a higher degree. So we have

    Σ_{k′ ≥ k} N(k′, t) = d + Σ_{s=1}^{l} L_v(s) = ((d+1)^l − 1)/d + d.

As the total number of vertices at step t is given by Eq. (1), we have

    [ d( (d^{t−l} − 1)/(d−1) + 1 ) ]^{1−γ} = ( ((d+1)^l − 1)/d + d ) / ( ((d+1)^t − 1)/d + d )
                                          = ((d+1)^l + d^2 − 1) / ((d+1)^t + d^2 − 1).    (7)

Therefore, for large t we obtain (d^{t−l})^{1−γ} = (d+1)^{l−t}, and

    γ ≈ 1 + ln(d+1)/ln d,    (8)
so that 2 < γ < 2.58496. Notice that when t gets large, the maximal degree of a vertex is roughly d^{t−1} ∼ N_t^{ln d/ln(d+1)} = N_t^{1/(γ−1)}.

Random recursive clique-trees

In this section we study the random version of the construction of the previous section. A random recursive d-clique-tree network after t iterations is denoted by R(d, t), d ≥ 2, t ≥ 0, and it is constructed as follows. For t = 0, R(d, 0) is the complete graph K_d (or d-clique), with d vertices and d(d − 1)/2 edges. For t ≥ 1, R(d, t) is obtained from R(d, t − 1) by selecting uniformly at random one subgraph isomorphic to a d-clique, adding a new vertex, and joining it to all the vertices of this subgraph. This construction differs from the deterministic complete clique-tree of the last subsection in that at each iteration step only one vertex is added (and joined to a randomly selected clique). The selection of cliques which have been used before is allowed. Since the network size is incremented by one with each step, we use the step value t to label the vertex created at that step. We can easily see that at step t the network has N = d + t vertices. We can compute the degree distribution as follows. First note that, after a new vertex is added, its degree is d, and the number of d-cliques that can be chosen in the following step increases by d. If the degree of a vertex increases by 1, then this vertex belongs to d − 1 more cliques which could be chosen at any following step. When a vertex i attains degree k_i, the number of K_d's to which it belongs is

    d + (k_i − d)(d − 1) = k_i(d − 1) − d^2 + 2d.
Table 7. Distribution of vertices and number of K_d's for the random recursive clique-tree R(d, t) at each step t.

Step (i) | Num. vertices | Number of K_d
0 | d | 1
1 | d + 1 | d + 1
2 | d + 2 | (d+1) + d
3 | d + 3 | (d+1) + d + d
··· | ··· | ···
t | d + t | (d+1) + (t−1)d = td + 1
The first term is the degree of the vertex when it is introduced into the network (equal to the number of K_d's to which it then belongs). The second term is the increase of its degree up to the value k_i, multiplied by d − 1, the number of cliques introduced with each unit increase of the degree. Note that after t steps the number of d-cliques available for selection is td + 1. If we consider k to be continuous, we can write for a vertex i

    ∂k_i/∂t = (k_i(d−1) − d^2 + 2d) / (td + 1).    (9)

The solution of this equation, with the initial condition that vertex i was added to the network at t_i with degree k_i(t_i) = d, is

    k_i(t) = d(d−2)/(d−1) + (d/(d−1)) ( (dt+1)/(dt_i+1) )^{(d−1)/d}.    (10)
The probability that vertex i has degree k_i(t) smaller than k, P(k_i(t) < k), is

    P(k_i(t) < k) = P( t_i > (dt+1)(d/(d−1))^{d/(d−1)} / [ d (k − d(d−2)/(d−1))^{d/(d−1)} ] − 1/d ).    (11)

Assuming that we add the vertices to the network at equal time intervals, the probability density of t_i is

    P_i(t_i) = 1/(d + t).    (12)
Substituting this into Eq. (11) we obtain

    P( t_i > (dt+1)(d/(d−1))^{d/(d−1)} / [ d (k − d(d−2)/(d−1))^{d/(d−1)} ] − 1/d )
    = 1 − P( t_i ≤ (dt+1)(d/(d−1))^{d/(d−1)} / [ d (k − d(d−2)/(d−1))^{d/(d−1)} ] − 1/d )
    = 1 − (dt+1)(d/(d−1))^{d/(d−1)} / [ (d+t) d (k − d(d−2)/(d−1))^{d/(d−1)} ] + 1/((d+t)d).    (13)

Thus the degree distribution is

    P(k) = ∂P(k_i(t) < k)/∂k = ( (dt+1) d^{d/(d−1)} / (d+t) ) ((d−1)k − d(d−2))^{(1−2d)/(d−1)}.    (14)
For large t,

    P(k) = d^{(2d−1)/(d−1)} ((d−1)k − d(d−2))^{(1−2d)/(d−1)},    (15)

and if k ≫ d then P(k) ∼ k^{−γ} with degree exponent γ(d) = (2d−1)/(d−1). When d = 2 one has γ(2) = 3, while as d goes to infinity, γ(∞) → 2.
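The random construction R(d, t) is straightforward to simulate. The sketch below only checks the deterministic structural counts (N = d + t vertices and td + 1 selectable d-cliques, as in Table 7); estimating the exponent γ(d) itself would require large runs and a tail fit.

```python
import random
from itertools import combinations

def random_clique_tree(d, t, seed=0):
    """R(d, t): at each step one uniformly random d-clique (repetition
    allowed) receives a new vertex joined to all its vertices."""
    rng = random.Random(seed)
    vertices = d
    degree = [d - 1] * d                 # degrees inside the initial K_d
    cliques = [tuple(range(d))]          # selectable d-cliques
    for _ in range(t):
        c = rng.choice(cliques)          # with repetition: c stays in the pool
        v = vertices
        vertices += 1
        degree.append(d)                 # new vertex joined to all of c
        for u in c:
            degree[u] += 1
        for sub in combinations(c, d - 1):
            cliques.append(sub + (v,))   # d new selectable d-cliques
    return vertices, degree, cliques

n, deg, cliques = random_clique_tree(3, 500)
# n == d + t, and the clique pool has t*d + 1 entries, as in Table 7
```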
References [1] L.A. Adamic, B.A. Huberman, Power-law distribution of the World Wide Web, Science 287 (2000), 2115. [2] R. Albert and A.-L. Barabási, Topology of evolving networks: Local events and universality, Phys. Rev. Lett. 85 (2000), 5234–5237. [3] R. Albert, A.-L. Barabási, Statistical mechanics of complex networks, Rev. Mod. Phys. 74 (2002), 47–97. [4] R. Albert, H. Jeong, A.-L. Barabási, Diameter of the world wide web, Nature 401 (1999), 130–131. [5] R. Albert, H. Jeong, A.-L. Barabási, Error and attack tolerance of complex networks, Nature 406 (2000), 378–382. [6] L.A.N. Amaral, A. Scala, M. Barthélémy, H.E. Stanley, Classes of small-world networks, Proc. Natl. Acad. Sci. U.S.A. 97 (2000), 11149. [7] J.S. Andrade Jr., H.J. Herrmann, R.F.S. Andrade and L.R. da Silva, Apollonian networks: Simultaneously scale-free, small world, Euclidean, space filling, and with matching graphs, Phys. Rev. Lett. 94 (2005), 018702. [8] A.-L. Barabási. Linked: How Everything Is Connected to Everything Else and What It Means. Perseus Publishing, Cambridge, MA, 2002. [9] A.-L. Barabási, R. Albert, Emergence of scaling in random networks, Science 286 (1999), 509–512. [10] A.-L. Barabási, R. Albert, H. Jeong, Mean-field theory for scale-free random networks, Physica A 272 (1999), 173–187. [11] A.-L. Barabási, E. Bonabeau. Scale-free networks. Scientific American 288 No. 5 (2003), 50–59. [12] A.-L. Barabási, E. Ravasz, and T. Vicsek, Deterministic scale-free networks, Physica A 299 (2001), 559–564.
[13] A. Barrat and M. Weigt, On the properties of small-world network models, Eur. Phys. J. B 13 (2000), 547. [14] L. Barrière, F. Comellas, and C. Dalfó, Deterministic hierarchical networks, manuscript. [15] M. Barthélémy and L.A.N. Amaral, Small-World Networks: Evidence for a Crossover Picture, Phys. Rev. Lett. 82 (1999), 3180. [16] G. Bianconi, A.-L. Barabási, Competition and multiscaling in evolving networks, Europhys. Lett. 54 (2001), 436–442. [17] P. Blanchard, T. Krueger and A. Ruschhaupt, Small world graphs by iterated local edge formation, Phys. Rev. E 71 (2005), 046139. [18] B. Bollobás, F. de la Vega. The diameter of random regular graphs, Combinatorica 2 (1982), 125–134. [19] L.A. Braunstein, S.V. Buldyrev, R. Cohen, S. Havlin and H.E. Stanley, Optimal Paths in Disordered Complex Networks, Phys. Rev. Lett. 91 (2003), 168701. [20] M. Buchanan. Nexus: Small Worlds and the Groundbreaking Theory of Networks. W.W. Norton and Company Inc., New York, N.Y., 2002. [21] D. Cohen. All the world's a net. New Scientist 174 (2002), 24–27. [22] F. Comellas, G. Fertin, and A. Raspaud, Recursive graphs with small-world scale-free properties, Phys. Rev. E 69 (2004), 037104. [23] F. Comellas, J. Ozón, and J.G. Peters, Deterministic small-world communication networks, Inf. Process. Lett. 76 (2000), 83–90. [24] F. Comellas, H.D. Rozenfeld, D. ben-Avraham, Synchronous and asynchronous recursive random scale-free nets, Phys. Rev. E 72 (2005), 046142. [25] F. Comellas and M. Sampels, Deterministic small-world networks, Physica A 309 (2002), 231–235. [26] F. Chung, Linyuan Lu, T.G. Dewey, D.J. Galas, Duplication models for biological networks, J. of Comput. Biology 10 (2003), 677–688. [27] S.N. Dorogovtsev, J.F.F. Mendes, Evolution of networks with aging of sites, Phys. Rev. E 62 (2000), 1842. [28] S.N. Dorogovtsev, J.F.F. Mendes, Scaling behaviour of developing and decaying networks, Europhys. Lett. 52 (2000), 33–39. [29] S.N. Dorogovtsev, A.V. Goltsev, and J.F.F.
Mendes, Pseudofractal scale-free web, Phys. Rev. E 65 (2002), 066122. [30] S.N. Dorogovtsev, J.F.F. Mendes, Comment on "Breakdown of the internet under intentional attack", Phys. Rev. Lett. 87 (2001), 219801. [31] S.N. Dorogovtsev, J.F.F. Mendes, Evolution of networks, Adv. Phys. 51 (2002), 1079–1187. [32] S.N. Dorogovtsev, J.F.F. Mendes. Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford University Press, Oxford, UK, 2003. [33] S.N. Dorogovtsev, J.F.F. Mendes, A.N. Samukhin, Structure of growing networks with preferential linking, Phys. Rev. Lett. 85 (2000), 4633–4636. [34] J.P.K. Doye, C.P. Massen. Self-similar disk packings as model spatial scale-free networks, Phys. Rev. E 71 (2005), 016128. [35] M. Faloutsos, P. Faloutsos, C. Faloutsos. On power-law relationships of the internet topology. Comput. Commun. Rev. 29 (1999), 251–260. [36] H. Guclu and G. Korniss, Extreme fluctuations in small-world networks with relaxational dynamics, Phys. Rev. E 69 (2004), 065104. [37] K.-I. Goh, E. Oh, H. Jeong, B. Kahng, and D. Kim. Classification of scale-free networks. Proc. Natl. Acad. Sci. USA 99 (2002), 12583–12588. [38] C.P. Herrero and M. Saboyá, Self-avoiding walks and connective constants in small-world networks, Phys. Rev. E 68 (2003), 026106. [39] S.Y. Huang, X.W. Zou, Z.J. Tan, Z.G. Shao and Z.Z. Jin, Critical behavior of efficiency dynamics in small-world networks, Phys. Rev. E 68 (2003), 016107. [40] K. Iguchi, H. Yamada. Exactly solvable scale-free network model. Phys. Rev. E 71 (2005),
036144. [41] H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, A.-L. Barabási, The large-scale organization of metabolic networks, Nature 407 (2000), 651–654. [42] H. Jeong, S. Mason, A.-L. Barabási, Z.N. Oltvai, Lethality and centrality in protein networks, Nature 411 (2001), 41–42. [43] S. Jung, S. Kim, B. Kahng, Geometric fractal growth model for scale-free networks, Phys. Rev. E 65 (2002), 056101. [44] M. Kaiser and C. Hilgetag, Spatial growth of real-world networks, Phys. Rev. E 69 (2004), 036103. [45] R. Kasturirangan, Multiple scales in small-world graphs, arXiv: cond-mat/9904055. [46] P.L. Krapivsky, S. Redner, Organization of growing random networks, Phys. Rev. E 63 (2001), 066123. [47] P.L. Krapivsky, S. Redner, F. Leyvraz, Connectivity of growing random networks, Phys. Rev. Lett. 85 (2000), 4629–4632. [48] J.M. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, The Web as a graph: Measurements, models and methods, Proceedings of the 5th Annual International Conference, COCOON'99, Tokyo, July 1999 (Springer-Verlag, Berlin), (1999) 1. [49] R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A.S. Tomkins, E. Upfal, The Web as a graph, Proceedings of the 19th Symposium on Principles of Database Systems (2000) 1. [50] R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A.S. Tomkins, E. Upfal, Stochastic models for the Web graph, Proceedings of the 41st IEEE Symposium on Foundations of Computer Science (IEEE Computing Society, Los Alamitos, Calif.) (2000) 57–65. [51] V. Latora and M. Marchiori, Efficient Behavior of Small-World Networks, Phys. Rev. Lett. 87 (2001), 198701. [52] F. Liljeros, C.R. Edling, L.A.N. Amaral, H.E. Stanley, Y. Åberg, The web of human sexual contacts, Nature 411 (2001), 907–908. [53] J.M. Montoya, R.V. Solé. Small world patterns in food webs. J. Theor. Biol. 214 (2002), 405–412. [54] K. Medvedyeva, P. Holme, P. Minnhagen and B.J. Kim, Dynamic critical behavior of the XY model in small-world networks, Phys. Rev.
E 67 (2003), 036118. [55] R. Monasson, Diffusion, localization and dispersion relations on small-world lattices, Eur. Phys. J. B 12 (1999), 555. [56] M.E.J. Newman, The structure of scientific collaboration networks, Proc. Natl. Acad. Sci. U.S.A. 98 (2001), 404–409. [57] M.E.J. Newman, The structure and function of complex networks, SIAM Review 45 (2003), 167–256. [58] M.E.J. Newman, C. Moore and D.J. Watts, Mean-Field solution of the small-world network model, Phys. Rev. Lett. 84 (2000), 3201. [59] M.E.J. Newman and D.J. Watts, Phys. Lett. A 263 (1999), 341. [60] M.E.J. Newman and D.J. Watts, Scaling and percolation in the small-world network model, Phys. Rev. E 60 (1999), 7332. [61] T. Nishikawa, A.E. Motter, Y.C. Lai and F.C. Hoppensteadt, Smallest small-world network, Phys. Rev. E 66 (2002), 046139. [62] C.F. Moukarzel, Spreading and shortest paths in systems with sparse long-range connections, Phys. Rev. E 60 (1999), 6263. [63] J.D. Noh, Exact scaling properties of a hierarchical network model, Phys. Rev. E 67 (2003), 045103. [64] J. Ozik, B.R. Hunt, and E. Ott, Growing networks with geographical attachment preference: Emergence of small worlds, Phys. Rev. E 69 (2004), 026108. [65] S.A. Pandit and R.E. Amritkar, Characterization and control of small-world networks, Phys. Rev. E 60 (1999), R1119. [66] S.A. Pandit and R.E. Amritkar, Random spread on the family of small-world networks, Phys.
Rev. E 63 (2001), 041104. [67] R. Pastor-Satorras, A. Vespignani. Evolution and Structure of the Internet : A Statistical Physics Approach. Cambridge University Press, Cambridge, UK, 2004. [68] E. Ravasz, A.-L. Barabási, Hierarchical organization in complex networks, Phys. Rev. E 67 (2003), 026112. [69] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A.-L. Barabási, Hierarchical organization of modularity in metabolic networks, Science 297 (2002), 1551–1555. [70] C.M. Song, S. Havlin, H.A. Makse, Self-similarity of complex networks Nature 433 (2005), 392–395. [71] S.H. Strogatz, Exploring complex networks. Nature 410 (2001), 268–276. [72] A. Vázquez, Disordered networks generated by recursive searches, Europhys. Lett. 54 (2001), 430–435. [73] X.F. Wang, G. Chen, Complex Networks: Small-world, scale-free and beyond, IEEE Circuits and Systems Magazine 1 (2003), 6–20. [74] D.J. Watts Small Worlds: The Dynamics of Networks between Order and Randomness. Princeton University Press, Princeton, NJ, 1999. [75] D.J. Watts, S.H. Strogatz, Collective dynamics of ‘small-world’ networks, Nature 393 (1998), 440–442. [76] D.B. West, Introduction to Graph Theory. Prentice-Hall, Upper Saddle River, NJ, 2001. [77] Z.Z. Zhang, L.L. Rong, Growing Small-World Networks Generated by Attaching to Edges arXiv: cond-mat/0503637 [78] Z.Z. Zhang, L.L. Rong, C. Guo, A deterministic small-world network arXiv: condmat/0502335 [79] Z. Zhang, L. Rong, F. Comellas, Evolving small-world networks with geographical attachment preference, J. Phys. A: Math. Gen. to appear. arXiv cond-mat/0510682. [80] Z.Z. Zhang, L.L. Rong, F. Comellas, High dimensional random Apollonian networks, Physica A. to appear. ArXiv: cond-mat/0502591. [81] Z.Z. Zhang, F. Comellas, G. Fertin, L.L. Rong, High dimensional Apollonian networks, J. Phys. A: Math. Gen. 39 (2006), 1811–1818 . [82] T. Zhou, G. Yan, P. L. Zhou, Z. Q. Fu, B. H Wang, Random Apollonian networks, arXiv: cond-mat/0409414. [83] T. Zhou, G. Yan, B. 
H. Wang, Maximal planar networks with large clustering coefficient and power-law degree distribution, Phys. Rev. E 71 (2005), 046141. [84] T. Zhou, B.H. Wang, P.M. Hui, K.P. Chan, Integer networks, arXiv: cond-mat/0405258.
Physics and Theoretical Computer Science J.-P. Gazeau et al. (Eds.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
Homomorphisms of Structures: Concepts and Highlights
Jaroslav Nešetřil 1
Department of Applied Mathematics and ITI, MFF, Charles University
Abstract. In this paper we survey recent results on graph homomorphisms, perhaps for the first time across the broad range of their relationships to applications in computer science, physics and other branches of mathematics. We illustrate the development in each area by a few results.
Keywords. finite model, finite structure, relational object, homomorphism, duality, partition function, descriptive complexity
1. Introduction

Graph theory receives its mathematical motivation from two main areas of mathematics, algebra and geometry (topology), and it is fair to say that graph notions stood at the birth of algebraic topology. Consequently the various operations on and comparisons of graphs stress either their algebraic side (e.g. various products) or their geometrical side (e.g. contraction, subdivision). It is only natural that a key place in modern graph theory is occupied by (fortunate) mixtures of both approaches, as exhibited best by the various modifications of the notion of a graph minor. However, from the algebraic point of view perhaps the most natural graph notion is the following notion of a homomorphism: given two graphs G and H, a homomorphism f of G to H is any mapping f : V(G) → V(H) which satisfies the following condition: [x, y] ∈ E(G) implies [f(x), f(y)] ∈ E(H). This condition should be understood as follows: on both sides of the implication one considers the same type of edges, undirected ({u, v}, often denoted just uv) or directed ((u, v), often likewise written uv). It is important that this definition is flexible enough to induce analogous definitions of homomorphisms for hypergraphs (set systems) and relational systems (with a given signature; this will be specified later). Homomorphisms arise naturally in various and very diverse situations:
• in extremal combinatorics (and particularly in problems related to colorings, partitions and decompositions of graphs and hypergraphs);
1 Correspondence to: Jaroslav Nešetřil, Department of Applied Mathematics and ITI, MFF, Charles University, CZ 11800 Praha 1, Malostranské nám. 25.
J. Nešetřil / Homomorphisms of Structures: Concepts and Highlights
• in statistical physics (as a model for partition functions);
• in probability (as a model of random processes, for example random walks);
• in logic (any satisfying assignment of a formula may be viewed as a homomorphism);
• in Artificial Intelligence (as a model and criterion of satisfaction, leading to Constraint Satisfaction Problems);
• in finite model theory (as a natural way of comparing and classifying models);
• in the theory of algorithms (as an example and a reduction tool);
• in complexity theory (and more recently in logic, descriptive complexity in particular);
• in algebraic combinatorics (providing a vital link to algebraic topology);
• in category theory (as a motivating example, a thoroughly studied particular case).
This paper cannot even touch upon all these topics; that would be too ambitious even for a monograph. To a certain extent this was the plan of the recently published book [22]. But progress has been fast, and we want to complement it by giving some highlights of this development. The interested reader can also consult the surveys and papers [30,63,32,3,48]. This paper is a (much) extended version of the talk given by the author at the Cargèse school Physics and Computer Science, October 17–29, 2006. The purpose of this text is to illustrate the rich conceptual framework of the contemporary study of homomorphisms in various mathematical as well as non-mathematical contexts and related applications. Because of this (and of size limitations) we cannot present full proofs or even define all the related concepts, but we aim to present at least an outline of the recent trends, and perhaps for the first time we bring together topics which have never coexisted in one paper.
Contents
1. Introduction
2. Preliminaries
3. Counting
4. Weighted Counting, Random and Quantum
5. Existence
6. Constraint Satisfaction Problems
7. Dualities
8. Restricted Dualities
9. Homomorphism Order
2. Preliminaries

We rely on standard texts such as [2], [41] (for graph theory), [22] (for graphs and homomorphisms), and [35,41] (for general combinatorics). However, the field of combinatorial structures related to homomorphisms (which this author likes to call the combinatorics of mappings, see the forthcoming [47]) is currently developing very fast, and it is the purpose of this short survey to cover this recent development.
J. Nešetřil / Homomorphisms of Structures: Concepts and Highlights
297
3. Counting

3.1. Hom and hom

The symbol Hom(G, H) will denote the set of all homomorphisms of G to H. The symbol hom(G, H) will denote the number of all homomorphisms of G to H. These sets carry much of the information about the structure of the graphs G and H.

Consider for example the simple situation when G is an undirected graph and H = K2. In this case hom(G, H) = 2^k where k is the number of bipartite components of G. But this simplicity is an exception. Already when we consider the graph H = K2*, which consists of two vertices, one edge joining them and one loop at one of the vertices (if the vertices are denoted by 0, 1 then the edges are 01, 11; sometimes this graph is called a "lollipop", sometimes even "io"), the situation changes dramatically. What is the meaning of hom(G, K2*)? This is suddenly much more interesting: a homomorphism f : G −→ K2* corresponds exactly to an independent subset of vertices of G (a subset A ⊂ V(G) is said to be independent if it does not contain any edge; the correspondence is easy: we can put A = f^{−1}(0)), and thus hom(G, K2*) is just the number of independent sets in the graph G. It follows that hom(G, K2*) is a difficult parameter, related to the hard-core model in statistical physics. It is difficult even in simple (and important) cases such as the d-dimensional cube (and there its determination is known as the "Dedekind problem"). K2* is of course not an isolated example. The triangle K3 is another (hom(G, K3) is the number of 3-colorings of the graph G).

On the other side, the set Hom(G, H) may be endowed not only with the categorial structure (inherited from the category of graphs; this leads to sums and products as well as to the notion of the power graph G^H) but, more recently, also with the following geometric structure: given graphs G, H we consider all mappings f : V(G) −→ P(V(H)) (here P(X) denotes the set of all non-empty subsets of X) which satisfy

xy ∈ E(G), u ∈ f(x), v ∈ f(y) ⇒ uv ∈ E(H).
It is natural to call such a mapping f a multihomomorphism: every homomorphism is a multihomomorphism and, moreover, for every multihomomorphism f, every mapping g : V(G) → V(H) satisfying g(v) ∈ f(v) for every v ∈ V(G) is a homomorphism. By abuse of notation (for this moment) denote the set of all multihomomorphisms G → H also by Hom(G, H). This set may be naturally partially ordered: for multihomomorphisms f, f′ we put f ≤ f′ iff f(v) ⊆ f′(v) holds for every v ∈ V(G). This construction is called the Hom complex and it crystallized in the long and intensive history of coloring special graphs, most notably Kneser graphs, see [42,30,12,65]. It plays the key role in this application of topology to combinatorics. The Hom complex Hom(G, H) is viewed as an order complex, and this in turn as a topological space (in its geometric realization). All these constructions are functorial. (Ref. [60] is an early study of graphs from the categorical point of view.)

3.2. Lovász' theorem

Let F1, F2, F3, . . . be a fixed enumeration of all non-isomorphic finite graphs. The Lovász vector of a graph G is hom(G) = (n1, n2, n3, . . . ), where nk = hom(Fk, G).
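The quantities hom(G, H) are easy to experiment with by brute force. The following sketch (the graph encoding and helper names are mine, not from the text) confirms the two examples above, hom(G, K2) = 2^k and hom(G, K2*) = the number of independent sets, on the path with three vertices; truncated Lovász vectors can be computed the same way.

```python
from itertools import product

def hom(G, H):
    """Count homomorphisms G -> H by brute force.
    A graph is (vertex list, set of directed arc pairs); an undirected
    edge is stored in both directions, a loop as (v, v)."""
    GV, GE = G
    HV, HE = H
    return sum(all((f[u], f[v]) in HE for (u, v) in GE)
               for f in (dict(zip(GV, img)) for img in product(HV, repeat=len(GV))))

def undirected(vs, es):
    return (vs, {(u, v) for (a, b) in es for (u, v) in ((a, b), (b, a))})

P3  = undirected([0, 1, 2], [(0, 1), (1, 2)])   # path on three vertices
K2  = undirected([0, 1], [(0, 1)])
K2s = undirected([0, 1], [(0, 1), (1, 1)])      # the "lollipop" K2*

print(hom(P3, K2))    # 2 = 2^1: one bipartite component
print(hom(P3, K2s))   # 5 = number of independent sets of P3
```

The independent sets of P3 are ∅, {0}, {1}, {2}, {0, 2}, matching the count 5.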
Theorem. [36] Two finite graphs G and H are isomorphic if and only if hom(G) = hom(H).

We include a short proof of this important result.

Proof. It is more than evident that if G ≅ H then hom(G) = hom(H). Let homi(F, G) denote the number of all monomorphisms (injective homomorphisms) of F to G. Suppose that hom(G) = hom(H). We claim that then also homi(F, G) = homi(F, H) for an arbitrary graph F. This claim will be proved by induction on the number of vertices of the graph F. First, if |V(F)| = 1, then homi(F, G) = hom(F, G) = hom(F, H) = homi(F, H). Suppose |V(F)| > 1. Then we can write

hom(F, G) = Σ_{Θ∈Eq(V(F))} homi(F/Θ, G) = homi(F, G) + Σ_{Θ∈Eq(V(F)), Θ≠id} homi(F/Θ, G),

where Eq(V(F)) is the set of all equivalence relations on V(F) and F/Θ is the graph whose vertex set is the set of all equivalence classes of Θ, and an edge connects two classes c and c′ if there are vertices u ∈ c and u′ ∈ c′ so that {u, u′} is an edge of F. (Note that loops may occur in F/Θ.) This is because every homomorphism f : F → G corresponds to a monomorphism of F/Θ to G for Θ = {(u, u′); f(u) = f(u′)}. Similarly, we get

hom(F, H) = homi(F, H) + Σ_{Θ∈Eq(V(F)), Θ≠id} homi(F/Θ, H).

By induction we know that for any Θ ∈ Eq(V(F)), Θ ≠ id, homi(F/Θ, G) = homi(F/Θ, H), since |V(F/Θ)| < |V(F)|. It follows that we also have homi(F, G) = homi(F, H). Applying this to the choices F = G and F = H we get homi(G, H) = homi(G, G) ≥ 1 and homi(H, G) = homi(H, H) ≥ 1. As there is a monomorphism of G to H and a monomorphism of H to G, and as our graphs are finite, G and H are isomorphic.

The Lovász theorem has a number of interesting (and, despite its seeming simplicity, profound) consequences. For example one can easily prove the following cancellation law for products of graphs. (There are many graph products. Here we mean the product G × H defined by the property that the projections are homomorphisms. This is the categorical product in the category of all graphs and their homomorphisms.)

3.3. Corollary

Let G and H be graphs. If the graphs G × G = G² and H × H = H² are isomorphic then so are the graphs G and H.

Proof. (sketch) Let F be a graph. Every homomorphism f : F → G² corresponds to a pair of homomorphisms (f1, f2) of F to G; if f(u) = (x1, x2), then fi(u) = xi.
Moreover, the correspondence is one-to-one (due to the categorical properties of the product). Therefore hom(F, G)² = hom(F, G²) = hom(F, H²) = hom(F, H)², and so hom(F, G) = hom(F, H).

This particular case was conjectured by Ulam (for finite partially ordered sets) ([5]). Along the same lines one can also prove the following: let A, B and C be graphs, and let C have a loop. If A × C ≅ B × C, then A ≅ B. These results hold in fact not just for graphs but for arbitrary finite structures (with mild conditions on the underlying category). For example it is important to observe that the following dual form of the Lovász theorem above holds: if hom(G, Fi) = hom(H, Fi) for every i = 1, 2, 3, . . . , then the graphs G and H are isomorphic. The proof uses (again) the inclusion-exclusion principle. These results form an important part of Tarski's and Birkhoff's project of arithmetization of the theory of finite structures (see [43,5]).

It is not generally true that A × C ≅ B × C implies A ≅ B. A counterexample: A consists of two isolated loops, B = C = K2. Another counterexample: A the disjoint union of two triangles, B = C6 (the cycle of length 6), C = K2. With more effort one can prove that if A, B and C are not bipartite, then they have the cancellation property: A × C ≅ B × C ⟹ A ≅ B, [36].
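The one-to-one correspondence used in the sketch above gives hom(F, G × H) = hom(F, G) · hom(F, H), which can be checked by brute force on small graphs (the encoding and helper names are mine):

```python
from itertools import product

def hom(G, H):
    """Count homomorphisms G -> H by brute force; a graph is
    (vertex list, set of directed arc pairs, undirected edges both ways)."""
    GV, GE = G
    HV, HE = H
    return sum(all((f[u], f[v]) in HE for (u, v) in GE)
               for f in (dict(zip(GV, img)) for img in product(HV, repeat=len(GV))))

def gprod(G, H):
    """Categorical product G x H: both projections are homomorphisms."""
    GV, GE = G
    HV, HE = H
    return ([(g, h) for g in GV for h in HV],
            {((g1, h1), (g2, h2)) for (g1, g2) in GE for (h1, h2) in HE})

def undirected(vs, es):
    return (vs, {(u, v) for (a, b) in es for (u, v) in ((a, b), (b, a))})

P3 = undirected([0, 1, 2], [(0, 1), (1, 2)])
K3 = undirected([0, 1, 2], [(0, 1), (0, 2), (1, 2)])
K2 = undirected([0, 1], [(0, 1)])

print(hom(P3, gprod(K3, K2)), hom(P3, K3) * hom(P3, K2))  # 24 24
```

(Here K3 × K2 is the 6-cycle, and hom(P3, C6) = 6 · 2 · 2 = 24 = 12 · 2.)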
4. Weighted Counting, Random and Quantum

In statistical physics we deal with a structure of (typically) a large number of particles, each in a finite number of states σ1, . . . , σt. The particles are positioned in the vertices of a graph G (with vertices {1, 2, . . . , t}) and interactions occur only between neighboring vertices. Two neighboring particles σi, σj interact with energy γ(σi, σj), and the total energy of the state σ of the structure (i.e. of all the states of the vertices of the graph G) is given by H(σ) = Σ_{ij∈E(G)} γ(σi, σj). Finally, the partition function (in a simplified form) is given by

Z = Σ_σ e^{−H(σ)}.

The partition function relates to the number of weighted homomorphisms. This was developed recently in a series of papers by Lovász et al. in a broad spectrum of asymptotic graph theory, random structures and abstract algebra (see e.g. [16,35,38,37,3,4]). Let G, H be (undirected) graphs. Additionally, let the vertices and edges of H be weighted: α : V −→ R+, β : E −→ R. In this situation define the weighted version hom(G, (H, α, β)) of hom(G, H) as follows:

hom(G, (H, α, β)) = Σ_{ϕ:V(G)→V(H)} Π_{v∈V(G)} α_{ϕ(v)} Π_{uv∈E(G)} β_{ϕ(u)ϕ(v)}.
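The weighted homomorphism function is easy to realize by brute force. The following sketch (the encoding is mine) checks that 0/1 edge weights give back plain hom(G, H), and that with β(a, b) = e^{−γ(a,b)} the function coincides with the partition function Z discussed in the text (for a toy choice of energies γ):

```python
import math
from itertools import product

def hom_w(G, H_vertices, alpha, beta):
    """hom(G, (H, alpha, beta)) = sum over ALL maps phi: V(G) -> V(H)
    of prod_v alpha[phi(v)] * prod_{uv in E(G)} beta[phi(u), phi(v)].
    G = (vertex list, list of edges uv, each edge listed once)."""
    GV, GE = G
    total = 0.0
    for img in product(H_vertices, repeat=len(GV)):
        phi = dict(zip(GV, img))
        w = 1.0
        for v in GV:
            w *= alpha[phi[v]]
        for (u, v) in GE:
            w *= beta[phi[u], phi[v]]
        total += w
    return total

P3 = ([0, 1, 2], [(0, 1), (1, 2)])             # path, each edge once
K3 = [0, 1, 2]
alpha1 = {a: 1.0 for a in K3}
adj = {(a, b): (1.0 if a != b else 0.0) for a in K3 for b in K3}
print(hom_w(P3, K3, alpha1, adj))              # 12.0 = hom(P3, K3)

gamma = {(a, b): 0.3 * (a - b) ** 2 for a in K3 for b in K3}   # toy energies
boltz = {ab: math.exp(-g) for ab, g in gamma.items()}
Z = sum(math.exp(-sum(gamma[s[u], s[v]] for (u, v) in P3[1]))
        for s in (dict(zip(P3[0], img)) for img in product(K3, repeat=3)))
print(abs(hom_w(P3, K3, alpha1, boltz) - Z) < 1e-9)  # True
```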
Of course if α and β are the functions identically equal to 1, then the weighted homomorphism function hom(G, (H, α, β)) is just hom(G, H). The partition function Z may be expressed by this weighted homomorphism function. Towards this end write

Z = Σ_σ Π_{ij∈E(G)} e^{−γ(σi,σj)} = Σ_σ Π_{ij∈E(G)} β(σi, σj),

where we put β(σi, σj) = e^{−γ(σi,σj)}. It follows that the partition function may be computed as a weighted homomorphism function. This has many variants and consequences. For example, in analogy with the number of 3-colorings expressed by hom(G, K3) and the hard-core model expressed by hom(G, K2*), one can ask which partition functions can be expressed as weighted functions hom(−, (H, α, β)) for a (finite) weighted graph (H, α, β). The surprising and elegant solution to this question was given in [16], and we finish this section by formulating this result. A graph parameter is a function p which assigns to every finite graph G a real number p(G) and which is invariant under isomorphisms.

4.1. Theorem ([16])

For a graph parameter p the following two statements are equivalent:
1. there exists a weighted graph (H, α, β) such that p(G) = hom(G, (H, α, β)) for every graph G;
2. there exists a positive integer q such that for every k ≥ 0 the matrix M(p, k) is positive semidefinite and its rank is at most q^k.

Motivated by the physical context, the parameter p is called reflection positive if the matrix M(p, k) (called in [16] the connection matrix) is positive semidefinite for every k. There is no place to define the connection matrix here; let us just say that it is an infinite matrix induced by the values of the parameter p on amalgams of graphs along k-element subsets (roots).

Where are the random aspects of all this (as claimed by the title of this section)? For this consider the following:

t(G, H) = hom(G, H) / |V(H)|^{|V(G)|},   t(G, (H, α, β)) = hom(G, (H, α, β)) / |V(H)|^{|V(G)|}.
These quantities are called homomorphism densities. They express the probability that a random mapping is a homomorphism, or the average weight of a mapping V(G) −→ V(H). This connection leads to a homomorphism-based interpretation of important asymptotic properties of large graphs such as the Szemerédi Regularity Lemma [66] or properties of quasirandom graphs [10,67]; see [38,4,37].

Where are the quantum aspects of this? Well, it appears that in proving Theorem 4.1 it is both natural and useful to extend the homomorphism function to formal finite linear combinations of graphs. These combinations, called quantum graphs [16], are natural in the physical context and they appear as a convenient tool in the proof of Theorem 4.1.

5. Existence and CSP

Perhaps this section should precede the counting sections: what can be easier than deciding, as opposed to the seemingly more difficult counting? Well, the answer is not so simple
and in fact both parts of the theory point in different directions: as we indicated above, counting relates to probability and to properties of random structures in general, and to partition models of physical phenomena; on the other side, the existence problems relate to the computational complexity of decision models (such as the Constraint Satisfaction Problem (CSP), logic and descriptive complexity, and dualities). Some of this will be covered in this and the next sections. The sections on counting preceded the "existence" sections as they are perhaps conceptually more uniform and they are also closer to this volume (of the Cargese school). We consider here the following decision problem:

5.1. H-coloring Problem

Consider the following decision problem (for a fixed graph H):
Instance: A graph G.
Question: Does there exist a homomorphism G −→ H?

This problem covers many concrete problems which were and are studied (see [22]):
(i) For H = Kk (the complete graph with k vertices) we get the k-coloring problem;
(ii) For the graphs H = K_{k/d} we get circular chromatic numbers; see e.g. [69];
(iii) For the Kneser graphs K(k, d) we get so-called multicoloring; [22].
Further examples include so-called T-colorings, see e.g. [69,62], which in turn are related to the recently popular channel assignment problem.

Perhaps the most extensively studied aspect of H-coloring problems is their complexity. This is interesting and in general still unresolved. The situation is well understood for complete graphs: for any fixed k ≥ 3 the Kk-coloring problem (which is equivalent to deciding whether χ(G) ≤ k) is NP-complete. On the other hand, the K1- and K2-coloring problems are easy. Thus, in the undirected case, we can always assume that the graph H is not bipartite. Some other problems are easy to discuss.
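For small instances the H-coloring problem is amenable to brute force. A sketch (encoding and helper names mine) illustrating the easy and hard cases mentioned above:

```python
from itertools import product

def hom_exists(G, H):
    """Brute-force test for the H-coloring problem: is there a
    homomorphism G -> H?  A graph is (vertex list, set of directed
    arc pairs); an undirected edge is stored in both directions."""
    GV, GE = G
    HV, HE = H
    return any(all((f[u], f[v]) in HE for (u, v) in GE)
               for f in (dict(zip(GV, img)) for img in product(HV, repeat=len(GV))))

def cycle(n):
    return (list(range(n)), {(i, (i + 1) % n) for i in range(n)} |
                            {((i + 1) % n, i) for i in range(n)})

def clique(n):
    return (list(range(n)), {(i, j) for i in range(n) for j in range(n) if i != j})

print(hom_exists(cycle(5), clique(3)))   # True:  C5 is 3-colorable
print(hom_exists(clique(4), clique(3)))  # False: chi(K4) = 4
print(hom_exists(cycle(6), clique(2)))   # True:  even cycles are bipartite
print(hom_exists(cycle(5), clique(2)))   # False: odd cycles are not
```

The last two lines are the easy K2-coloring case: G → K2 exists exactly when G is bipartite.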
For example, if H = C5 then we can consider the following (arrow replacement) construction: for a given graph G let G** be the graph which we obtain from G by replacing every edge of G by a path of length 3 (these paths are supposed to be internally disjoint). Another way to say this is to consider the subdivision of G in which each edge is subdivided by exactly two new vertices. It is now easy to prove that for any undirected graph G the following two statements are equivalent:
(i) G −→ K5;
(ii) G** −→ C5.
This example is not isolated (a similar trick may be used e.g. for any odd cycle). Using analogous, but more involved, edge-, vertex- and other replacement constructions (called indicators, subindicators, and edge-subindicators) the following has been proved in [23]:

5.2. Theorem

For a graph H the following two statements are equivalent:
(1) H is non-bipartite;
(2) the H-coloring problem is NP-complete.

This theorem (and its proof) has some particular features which we are now going to explain:

1. The result claimed by the theorem is expected. In fact the result had been a long-standing conjecture, but it took nearly 10 years before the conjecture was verified.

2. While the statement of 5.2 is expected, its proof is unexpected. What would one expect in this situation? Well, we should first prove that C_{2k+1}-coloring is NP-complete (which is easy, and in fact we sketched this above) and then we would "observe" that the problem is monotone: formally, if the H-coloring problem is NP-complete and H ⊆ H′, then the H′-coloring problem is also NP-complete. The monotonicity may sound plausible, but no direct proof of it is known. It is certainly a true statement (by virtue of Theorem 5.2), but presently the only known proof is via Theorem 5.2 itself. In fact there is more here than meets the eye: for oriented graphs the NP-complete instances are not monotone!

3. We have to stress that the analogy of Theorem 5.2 for oriented graphs fails to be true. One can easily construct an orientation H̃ of a bipartite graph H such that the H̃-coloring problem is NP-complete. Even more, one can construct a balanced oriented graph H̃ with the property that the H̃-coloring problem is NP-complete (an oriented graph is called balanced if every cycle has the same number of forward and backward arcs). One can go even further and (perhaps a bit surprisingly) one can omit all cycles. Namely, one has the following [24]:

5.3. Theorem

There exists an oriented tree T (i.e. T is an orientation of an undirected tree) such that the T-coloring problem is NP-complete. Presently the smallest known such tree T has 45 vertices.
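Returning to the arrow replacement construction earlier in this section: the equivalence G → K5 ⇔ G** → C5 can be tested mechanically on small graphs. The sketch below (encoding and the names star2, hom_exists are mine) uses backtracking rather than full enumeration, since G** is larger than G; it checks positive cases and the direct negative case K6 ↛ K5 (the equivalence predicts K6** ↛ C5 as well, but that check is slower).

```python
def hom_exists(G, H):
    """Backtracking search for a homomorphism G -> H.
    A graph is (vertex list, set of symmetric arc pairs)."""
    GV, GE = G
    HV, HE = H
    adj = {v: [] for v in GV}
    for (u, v) in GE:
        adj[u].append(v)
    f = {}
    def place(i):
        if i == len(GV):
            return True
        v = GV[i]
        for c in HV:
            if all((c, f[w]) in HE for w in adj[v] if w in f):
                f[v] = c
                if place(i + 1):
                    return True
                del f[v]
        return False
    return place(0)

def star2(G):
    """G**: replace each edge of G by an internally disjoint path of
    length 3 (subdivide each edge by exactly two new vertices)."""
    V, E = G
    edges = {frozenset(e) for e in E if len(frozenset(e)) == 2}
    newV, newE = list(V), set()
    for k, e in enumerate(edges):
        u, v = tuple(e)
        a, b = ("p", k, 0), ("p", k, 1)
        newV += [a, b]
        for (x, y) in ((u, a), (a, b), (b, v)):
            newE |= {(x, y), (y, x)}
    return (newV, newE)

def cycle(n):
    return (list(range(n)), {(i, (i + 1) % n) for i in range(n)} |
                            {((i + 1) % n, i) for i in range(n)})

def clique(n):
    return (list(range(n)), {(i, j) for i in range(n) for j in range(n) if i != j})

print(hom_exists(clique(4), clique(5)))          # True
print(hom_exists(star2(clique(4)), cycle(5)))    # True, as (i) <=> (ii)
print(hom_exists(clique(6), clique(5)))          # False: chi(K6) = 6
```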
6. Constraint Satisfaction Problems (CSP)

Every part of mathematics has some typical features which present both its advantages and its limitations. One such feature of the study of homomorphisms is the fact that its problems are usually easy to generalize and formulate, and that there is a basic thread which allows one to concentrate on important and "natural" questions (to try to explain this is also the main motif of this paper). The H-coloring problem (explained in the previous section) is a good example of this. One can formulate it more generally for every finite structure. We consider general relational structures (so general that they are sometimes called just finite structures):

6.1. Relational Structures

A relational structure of a given type generalizes the notion of a relation and of a graph to more relations and to higher (non-binary) arities. The concept was isolated in the thirties by logicians (e.g. Löwenheim, Skolem) who developed a logical "static" theory. As we
shall see, this influences terminology even today, as we find it useful to speak about models (of our chosen relational language). In the sixties new impulses came from the study of algebraic categories, and the resulting "dynamic" studies called for a more explicit approach, see e.g. [61]. We shall adopt here a later notation (with a touch of logical vocabulary).

A type Δ is a sequence (δi; i ∈ I) of positive integers. A relational system A of type Δ is a pair (X, (Ri; i ∈ I)) where X is a set and Ri ⊆ X^{δi}; that is, Ri is a δi-ary relation on X. In this paper we shall always assume that X and I are finite sets (thus we consider finite relational systems only). The type Δ = (δi; i ∈ I) will be fixed throughout this paper. Note that for the type Δ = (2) relational systems of type Δ correspond to directed graphs, and the case Δ = (2, 2) corresponds to directed graphs with blue-green colored edges (or rather arcs). Relational systems (of type, or signature, Δ) will be denoted by capital letters A, B, C, . . . . A relational system of type Δ is also called a Δ-system (or a model). If A = (X, (Ri; i ∈ I)) we also denote the base set X by A and the relation Ri by Ri(A).

Let A = (X, (Ri; i ∈ I)) and B = (Y, (Si; i ∈ I)) be Δ-systems. A mapping f : X → Y is called a homomorphism if for each i ∈ I the following holds: (x1, . . . , x_{δi}) ∈ Ri implies (f(x1), . . . , f(x_{δi})) ∈ Si. In other words, a homomorphism f is any mapping f : A → B which satisfies f(Ri(A)) ⊆ Ri(B) for each i ∈ I. (Here we extended the definition of f by putting f(x1, . . . , xt) = (f(x1), . . . , f(xt)).)

For Δ-systems A and B we write A → B if there exists a homomorphism from A to B. Hence the symbol → denotes a relation that is defined on the class of all Δ-systems. This relation is clearly reflexive and transitive, and thus induces a quasi-ordering of all Δ-systems.
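For concreteness, the homomorphism test for general Δ-systems can be sketched as follows (the encoding of a Δ-system as a universe plus named sets of tuples is my own):

```python
from itertools import product

def is_hom(f, A, B):
    """f: dict mapping the universe of A into the universe of B.
    A, B: (universe, {relation_name: set of tuples}) of the same type."""
    _, RA = A
    _, RB = B
    return all(tuple(f[x] for x in t) in RB[name]
               for name, rel in RA.items() for t in rel)

def hom_exists(A, B):
    XA, _ = A
    XB, _ = B
    return any(is_hom(dict(zip(XA, img)), A, B)
               for img in product(XB, repeat=len(XA)))

# Directed graphs = type (2) structures with one binary relation "E".
P2 = ([0, 1, 2], {"E": {(0, 1), (1, 2)}})   # directed path with two arcs
C2 = ([0, 1],    {"E": {(0, 1), (1, 0)}})   # directed 2-cycle
T1 = ([0, 1],    {"E": {(0, 1)}})           # a single arc

print(hom_exists(P2, C2))  # True: fold the path onto the 2-cycle
print(hom_exists(P2, T1))  # False: two consecutive arcs cannot map to one arc
```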
As is usual with quasi-orderings, it is convenient to reduce it to a partial order on classes of equivalent objects: two Δ-systems A and B are called homomorphically equivalent if we have both A → B and B → A; we then write A ∼ B. For every A there exists an (up to isomorphism unique) A′ such that A ∼ A′ and A′ has the smallest size |A′|. Such an A′ is called the core of A.

The relation → induces an order on the classes of homomorphically equivalent Δ-systems, which we call the homomorphism order. (So this is a partial order when restricted to non-isomorphic core structures.) The homomorphism order will be denoted by CΔ (as it is also called the coloring order). We denote by Rel(Δ) the class of all finite relational structures of type Δ and all homomorphisms between them. This category plays a special role in model theory and the theory of categories [61]. It is also central in the branch of Artificial Intelligence (AI) dealing with Constraint Satisfaction Problems [11]. The expressive power of homomorphisms between relational structures leads to the following:

6.2. Theorem ([14])

Every Constraint Satisfaction Problem can be expressed as a membership problem for a class CSP(B) of relational structures (of a certain type Δ) defined as follows:

CSP(B) = {A; A −→ B}.

Recall that the membership problem for a class K is the following: given a structure A, does A belong to K, i.e. is A ∈ K? For brevity we call the membership problem for the class CSP(B) simply the B-coloring problem, or the CSP(B) problem. The structure B is usually called the template of CSP(B).

(Generalized) coloring problems cover a wide spectrum of (application-rich) problems. This has recently attracted very active research on the boundary of complexity theory, combinatorics, logic and universal algebra. Only some of it will be reviewed in this paper. However, the complexity status of the CSP(B) problem is solved only in special and rather restricted situations. The following are the principal results:

1. undirected graph coloring (i.e. Theorem 5.2), see [23];
2. the characterization of the complexity of B-coloring problems for structures B which are binary (i.e. for which |B| = 2), see [64];
3. the characterization of the complexity of B-coloring problems for structures B which are ternary (i.e. for which |B| = 3), see [7].

The last two results may seem easy, or limited, but the reader should realize that while the size of B may be small (such as 2 or 3), the relational system can in fact be very large, as the arities δi of the relations Ri(B) may be arbitrarily large. The whole book [11] is devoted to the case |B| = 2.

Nevertheless, in all known instances one proves that the CSP(B) problem is either polynomial (the class of polynomial problems is denoted by P) or NP-complete. This is remarkable, as such a dichotomy does not hold in general. Of course, there is the possibility that the classes P and NP coincide (this constitutes the famous P-NP problem, one of the millennium problems). But if these classes are distinct (i.e. if P ⊂ NP) then there are infinitely many intermediate classes (by a celebrated result of Ladner [39]). This (and other, more theoretical, evidence) prompted Feder and Vardi [14] to formulate the following by now well-known problem:

6.3. Dichotomy Conjecture

Every CSP(B) problem is either in P or NP-complete.

Although this is open, a lot of work has been done. Let us finish this section by formulating two related results.

6.3.1.
Oriented Graphs Suffice

At first glance the complexity of the CSP(B) problem lies in the great variety of possible relational structures. Already in [14] it was realized that this is not so.

Theorem. The dichotomy conjecture follows from the dichotomy of the complexity of the H-coloring problem where H is an oriented graph.

This is interesting, as it puts Theorem 5.2 in a new light and shows a surprising difference between colorings (partitions) of undirected and directed graphs which was not realized before. See [14] for the original proof; see also [22].

6.3.2. Dichotomy is Asymptotically Almost Surely True

Relational structures and homomorphisms express various decision and counting combinatorial problems such as colouring, satisfiability, and linear algebra problems. Many of them can be reduced to special cases of a general Constraint Satisfaction Problem CSP(B). A number of such problems have been studied and have known complexity, e.g., when we deal with undirected graphs or when the problem is restricted to small sets
A (see [64,23,7]). However, at this moment we are far from understanding the behavior of the CSP(B) problem even for binary relations (i.e., for relational systems of type Δ = (2)). It seems that the Dichotomy Conjecture holds in a stronger sense:

Dichotomy Conjecture*. Most CSP(B) problems are NP-complete, with a few exceptions which are polynomial.

For example, for undirected graphs the CSP(B) problem is always a hard problem, with exactly 3 exceptions: B is homomorphically equivalent either to the loop graph, or to the single-vertex graph (with no edges), or to the single (symmetric) edge, [23]. Results for the other solved cases have a similar character, supporting the modified Dichotomy Conjecture* (see [64,7]). One can confirm this feeling by proving that the Dichotomy and Dichotomy* Conjectures are equivalent:

Theorem. Let Δ = (δi)_{i∈I} be such that max_{i∈I} δi ≥ 2. Then CSP(B) is NP-complete for almost all relational systems B of type Δ. (Note that for B of type (1, 1, . . . , 1) the problem CSP(B) is trivial.)

In order to make the statement of the Theorem precise, let R(n, k) denote a random k-ary relation defined on the set [n] = {1, 2, . . . , n}, for which the probability that (a1, . . . , ak) ∈ R(n, k) is equal to 1/2, independently for each (a1, . . . , ak) with 1 ≤ ar ≤ n for r = 1, . . . , k in which not all ai are equal: for a ∈ [n] we assume (a, a, . . . , a) ∉ R(n, k). Let ([n], (R(δi, n))_{i∈I}) denote the random relational system of type Δ = (δi)_{i∈I}. In this situation the probability that the ([n], (R(δi, n))_{i∈I})-coloring problem is NP-complete tends to one as either n or max_i δi tends to infinity. Note that the B-coloring problem for a relational system B is NP-complete provided it is NP-complete for the system (B, Ri0(B)) of type (δi0) for some i0 ∈ I. (The converse implication does not hold in general.) Thus it suffices to prove the result for 'simple' relational systems which consist of just one k-ary relation.

Theorem.
For a fixed k ≥ 2,

lim_{n→∞} Pr[([n], R(n, k))-coloring is NP-complete] = 1,

while for a fixed n ≥ 2,

lim_{k→∞} Pr[([n], R(n, k))-coloring is NP-complete] = 1.
The proof uses properties of random graphs together with an algebraic approach to the dichotomy conjecture (based on the analysis of clones of polymorphisms) which was pioneered by [28,8].
7. Dualities

From the combinatorial point of view there is a standard way to approach (and sometimes to solve) a monotone property P: one investigates those structures without the property P which are critical (or minimal) without P. One proceeds as follows: denote by F the class of all critical structures and define the class Forb(F) of all structures which do not "contain" any F ∈ F. The class Forb(F) is the class of all structures not containing any of the critical substructures, and thus it is easy to see that Forb(F)
coincides with the class of structures with the property P. Of course in most cases the class F is infinite, yet a structural result about it may shed some light on the property P. For example this is the case with 3-colorability of graphs, where 4-critical graphs were (and are) studied thoroughly (historically mostly in relation to the Four Color Conjecture). Of particular interest (and as the extremal case in our setting) are those monotone properties P of structures which can be described by finitely many forbidden substructures.

The object of the theory of homomorphism duality is to characterize a family F of obstructions to the existence of a homomorphism into a given structure B. In a broad sense such a class F always exists; for instance, the class of all structures not admitting a homomorphism to B has this property. However, it is desirable to seek a more tractable family of obstructions to make this characterization meaningful. The classical examples of graph theory make this point clear: a graph is bipartite if and only if it does not contain an odd cycle; hence the odd cycles are a family of obstructions to the existence of a homomorphism into the complete graph K2. However, the class of directed graphs provides a much more fertile ground for the theory, and numerous examples of tree dualities and of bounded treewidth dualities are known (see [24]).

When the family F of obstructions is finite (or algorithmically "well behaved"), such theorems clearly provide examples of good characterisations (in the sense of Edmonds). Any instance of such a good characterisation is called a homomorphism duality. This concept was introduced in [52] and applied to various graph theoretical good characterisations. The simplest homomorphism dualities are those where the family of obstructions consists of singletons (i.e. single structures) only.
In other words, such homomorphism dualities are described by a pair A, B of structures as follows:

(Singleton) Homomorphism Duality Scheme
A structure C admits a homomorphism into B if and only if A does not admit a homomorphism into C.

Despite the fact that singleton homomorphism dualities are scarce for both undirected and directed graphs, for more general structures (such as oriented matroids with a suitable version of strong maps) the (singleton) homomorphism duality may capture general theorems such as the Farkas Lemma (see [25]). In [52] all singleton homomorphism dualities for undirected graphs are described. As a culmination of several partial results, all homomorphism dualities for general relational structures were finally described in [57]. This is not the end: more recently, homomorphism dualities emerged as an important phenomenon in a new context. This will now be briefly described.

7.1. Generalized CSP Classes

For a finite set of structures D in Rel(Δ) we denote by CSP(D) the class of all structures A ∈ Rel(Δ) satisfying A −→ D for some D ∈ D. Thus CSP(D) is the union of the classes CSP(D) for all D ∈ D. This definition (of a generalized CSP class) is sometimes more convenient, and in fact generalized CSP classes are polynomially equivalent to the classes described syntactically as MMSNP ([14]; the equivalence is non-trivial and follows from [31]). These classes are sometimes called color classes and denoted by →D, so that →D = CSP(D).
7.2. Forb Classes

Let F be a finite set of structures of Rel(Δ). Denote by Forb(F) the class of all structures A ∈ Rel(Δ) which do not permit a homomorphism F −→ A for any F ∈ F. Formally:

Forb(F) = {A | there is no homomorphism f : F → A with F ∈ F}.

Let us remark that these classes are sometimes denoted by Forb(F) = F→.

7.3. Finite Duality

Finite duality is the equality of two classes: of the class Forb(F) and of the class CSP(D), for a particular choice of the forbidden set F and the dual set D. Formally:

Forb(F) = CSP(D).

We also say that D has finite duality. Finite dualities were defined in [52]. They are being intensively studied from the logical point of view, and also in the optimization (mostly CSP) context. We say that a class K ⊂ Rel(Δ) is First Order definable if there exists a first order formula φ (i.e. quantification is allowed only over elements) such that K is just the class of all structures A ∈ Rel(Δ) in which φ is valid. Formally: K = {A; A |= φ}.

It has recently been shown [1,63] that if the class CSP(D) is first order definable then it has finite duality. (This is a consequence of the solution of an important homomorphism preservation conjecture solved in [63].) On the other hand, the finite dualities in the categories Rel(Δ) were characterized in [15] as an extension of [57]. By combining these results we obtain:

Theorem. For a finite set D of relational structures in Rel(Δ) the following statements are equivalent:
(i) The class CSP(D) is first order definable;
(ii) D has finite duality; explicitly, there exists a finite set F such that Forb(F) = CSP(D);
(iii) Forb(F) = CSP(D) for a finite set F of finite forests.

We did not define what a forest in a structure is (see [57,15]). For the sake of completeness let us say that a forest is a structure not containing any cycle. And a cycle in a structure A is either a sequence of distinct points and distinct tuples x0, r1, x1, . . .
, rt, xt = x0, where each tuple ri belongs to one of the relations R(A) and each xi is a coordinate of ri and ri+1, or, in the degenerate case t = 1, a relational tuple with at least one multiple coordinate. The length of the cycle is the integer t in the first case and 1 in the second case. Finally, the girth of a structure A is the length of a shortest cycle in A (if a cycle exists; otherwise A is a forest).
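A classical concrete instance of finite duality for directed graphs (of Gallai–Roy flavour) is Forb({P3}) = CSP(T2), where P3 is the directed path with two arcs and T2 the transitive tournament on two vertices (a single arc): a digraph maps into T2 if and only if P3 does not map into it. The sketch below (encoding mine) verifies this exhaustively on all 3-vertex digraphs, loops included.

```python
from itertools import product

def hom_exists(A, B):
    """A, B: digraphs (vertex list, set of arcs). Brute-force hom test."""
    VA, EA = A
    VB, EB = B
    return any(all((f[u], f[v]) in EB for (u, v) in EA)
               for f in (dict(zip(VA, img)) for img in product(VB, repeat=len(VA))))

T2 = ([0, 1], {(0, 1)})           # transitive tournament on two vertices
P3 = ([0, 1, 2], {(0, 1), (1, 2)})  # directed path with two arcs

V = [0, 1, 2]
arcs = [(u, v) for u in V for v in V]   # loops allowed
for bits in product([0, 1], repeat=len(arcs)):
    C = (V, {a for a, b in zip(arcs, bits) if b})
    # duality: C -> T2  iff  P3 does not map into C
    assert hom_exists(C, T2) == (not hom_exists(P3, C))
print("duality Forb({P3}) = CSP(T2) verified on all 3-vertex digraphs")
```

The point of the check is exactly the shape of a finite duality: membership in CSP(T2) is certified negatively by a single finite (tree) obstruction.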
In sharp contrast with that, there are no finite dualities for (general) finite algebras; it has recently been shown [33] that there are no such dualities at all. Namely, one has:

Theorem. For every finite set 𝒜 of finite algebras of a given type (δi)_{i∈I} and every finite algebra B there exists a finite algebra A such that A ∈ Forb(𝒜) and A ∉ CSP(B).

(This concerns the standard homomorphisms f : (X, (αi)_{i∈I}) → (X′, (α′i)_{i∈I}) satisfying

x = αi(x1, . . . , x_{ni}) ⇒ f(x) = α′i(f(x1), . . . , f(x_{ni})).)   (∗)
8. Restricted Dualities
8.1. Special Classes
In this section we deal with graphs only. We motivate this section by the following two examples.
Example 1. The celebrated Grötzsch theorem (see e.g. [2]) says that every triangle-free planar graph is 3-colourable. In the language of homomorphisms this says that for every triangle-free planar graph G there is a homomorphism of G into K3. Using the partial order terminology (for the homomorphism order CΔ), Grötzsch's theorem says that K3 is an upper bound (in the homomorphism order) for the class P3 of all planar triangle-free graphs. As obviously K3 ∉ P3, a natural question (first formulated in [48]) suggests itself: is there a yet smaller bound? The answer, which may be viewed as a strengthening of Grötzsch's theorem, is positive: there exists a triangle-free 3-colorable graph H such that G → H for every graph G ∈ P3. Explicitly: K3 ↛ G ⇐⇒ G −→ H for every planar graph G. Because of this we call such a theorem a restricted duality. A restricted duality asserts the duality, but only for structures in a restricted class of graphs. The (non-trivial) existence of the graph H above has been proved in [49] (in a stronger version, for proper minor-closed classes). The case of planar graphs and the triangle is interesting in its own right, as it is related to the Seymour conjecture and its partial solution [18], see [44]; it seems that a proper setting of this case is in the context of TT-continuous mappings, see [55]. Restricted duality results have since been generalized to other classes of graphs and to other forbidden subgraphs. In fact, for every finite set of "forbidden" connected graphs we have a duality restricted to any proper minor-closed subclass K of all graphs, see [49]. This then implies that Grötzsch's theorem can be strengthened by a sequence of ever stronger bounds, and that the supremum of the class of all triangle-free planar graphs does not exist in the homomorphism order.
Example 2. Let us consider all sub-cubic graphs (i.e. graphs with maximum degree ≤ 3).
By Brooks' theorem (see e.g. [2]) all these graphs are 3-colorable, with the single connected exception K4. What about the class of all sub-cubic triangle-free graphs? Does there exist a triangle-free 3-colorable bound? The positive answer to this question is given in [20]. In fact, for every finite set F = {F1, F2, . . . , Ft} of connected graphs there exists a graph H with the following properties:
- H is 3-chromatic;
- G −→ H for every sub-cubic graph G ∈ Forb(F).
It is interesting to note that while sub-cubic graphs have restricted dualities (and, more generally, this also holds for classes of graphs with bounded degree), for the classes of degenerate graphs a similar statement is not true (in fact, with a few trivial exceptions, it is never true). Where lies the boundary for the validity of restricted dualities? We clarify this after introducing the formal definition.
Definition. A class K of graphs has all restricted dualities if, for any finite set of connected graphs F = {F1, F2, . . . , Ft}, there exists a finite graph D_F^K such that Fi ↛ D_F^K for i = 1, . . . , t and such that for all G ∈ K the following holds:
(Fi ↛ G, i = 1, 2, . . . , t) ⇐⇒ (G −→ D_F^K).
It is easy to see that using the homomorphism order we can reformulate this definition as follows: a class K has all restricted dualities if for any finite set of connected graphs F = {F1, F2, . . . , Ft} the class Forb(F) ∩ K is bounded in the class Forb(F). The main result of [50] can then be stated as follows:
Theorem. Any class of graphs with bounded expansion has all restricted dualities.
Of course we have yet to define bounded expansion (and we do so in the next section). But let us just note that both proper minor-closed classes and classes of graphs with bounded degree have bounded expansion. Consequently this result generalizes both Examples 1 and 2. In fact, the seeming incomparability of bounded-degree graphs and minor-closed classes led the authors of [50] to the definition of bounded expansion classes.
8.2. Bounded Expansion Classes
Recall that the maximum average degree mad(G) of a graph G is the maximum over all subgraphs H of G of the average degree of H, that is mad(G) = max_{H⊆G} 2|E(H)|/|V(H)|. The distance d(x, y) between two vertices x and y of a graph is the minimum length of a path linking x and y, or ∞ if x and y do not belong to the same connected component. We introduce several notations:
• The radius ρ(G) of a connected graph G is ρ(G) = min_{r∈V(G)} max_{x∈V(G)} d(r, x).
• A center of G is a vertex r such that max_{x∈V(G)} d(r, x) = ρ(G).
Definition. Let G be a graph. A ball of G is a subset of vertices inducing a connected subgraph. The set of all families of pairwise disjoint balls of G is denoted by B(G). Let P = {V1, . . . , Vp} be a family of pairwise disjoint balls of G.
• The radius ρ(P) of P is ρ(P) = max_{X∈P} ρ(G[X]).
• The quotient G/P of G by P is the graph with vertex set {1, . . . , p} and edge set E(G/P) = {{i, j} : (Vi × Vj) ∩ E(G) ≠ ∅ or Vi ∩ Vj ≠ ∅}.
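For small graphs the radius of a ball and the quotient graph can be computed directly from the definitions. A minimal sketch, under our own adjacency-dict encoding (the helper names are hypothetical):

```python
import itertools
from collections import deque

def dist(adj, src):
    """BFS distances from src in an undirected graph given as an adjacency dict."""
    d = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in d:
                d[v] = d[u] + 1
                q.append(v)
    return d

def radius(adj, ball):
    """rho(G[ball]): minimum over centers r of the eccentricity inside the ball."""
    sub = {u: [v for v in adj[u] if v in ball] for u in ball}
    return min(max(dist(sub, r).get(x, float("inf")) for x in ball)
               for r in ball)

def quotient(adj, balls):
    """G/P for a family P of pairwise disjoint balls: vertices 0..p-1,
    with edge {i,j} iff some edge of G joins ball i to ball j."""
    edges = set()
    for i, j in itertools.combinations(range(len(balls)), 2):
        if any(v in adj[u] for u in balls[i] for v in balls[j]):
            edges.add(frozenset((i, j)))
    return edges

# a 6-cycle, contracted along two opposite paths of radius 1
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
P = [{0, 1, 2}, {3, 4, 5}]
assert radius(adj, P[0]) == 1
assert quotient(adj, P) == {frozenset((0, 1))}
```

Computing ∇r(G) exactly requires maximizing |E(G/P)|/|P| over all admissible families P, which is feasible only for very small instances.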
We introduce several invariants that generalize the maximum average degree:
Definition. The greatest reduced average density (grad) of G with rank r is
∇r(G) = max { |E(G/P)| / |P| : P ∈ B(G), ρ(P) ≤ r }.
The following is our key definition:
Definition. A class of graphs K has bounded expansion if there exists a function f : N → N such that for every graph G ∈ K and every r
∇r(G) ≤ f(r)    (1)
holds. The function f is called the expansion function. Proper minor-closed classes have an expansion function bounded by a constant, regular graphs by an exponential function, and geometric graphs such as d-dimensional meshes have polynomial expansion. The expansion function can grow arbitrarily fast. Finally, note that bounded expansion classes have many applications; some of them are included in [51].
8.3. Lifts and Shadows
We return to general relational structures, and this time we generalize rather than restrict. Duals of structures are a fascinating subject (see e.g. [34,58,56]). In closing these sections on dualities we want to briefly stress the following aspect, which is at first glance surprising. Consider again the general duality scheme: a structure C admits a homomorphism into some D ∈ D if and only if no F ∈ F admits a homomorphism into C; formally: Forb(F) = CSP(D). One is somehow tempted to think that the left side of the definition is more restrictive, that finitely many obstacles make the problem easy if not trivial. After all, while the class CSP(D) may be complicated and have an NP-complete membership problem even for a simple graph D (such as K3), the left side is always polynomially decidable (for every finite set F). But in a way this is a misleading argument. The expressive power of the classes Forb(F) for finite sets F is very large. This follows from recent work [32], which we now briefly describe. We start with an example. Think of 3-coloring of a graph G = (V, E). This is a well-known hard problem, and there is ample evidence for this: concrete instances of the problem are difficult to solve (if you want a non-trivial example, consider Kneser graphs [42]), there is an abundance of minimal graphs which are not 3-colorable (these are called 4-critical graphs, see e.g. [29]), and in full generality (and even for important "small" subclasses such as 4-regular graphs or planar graphs) the problem is a canonical NP-complete problem.
Yet the problem has an easy formulation. A 3-coloring is simple to formulate even at the kindergarten level. This is in sharp contrast with the usual definition of the class NP by means of polynomially bounded non-deterministic computations. Fagin [13] gave
a concise description of the class NP by means of logic: NP languages are just the languages accepted by an Existential Second Order (ESO) formula of the form ∃P Ψ(S, P), where S is the set of input relations, P is a set of existential relations (the proof of membership in the class), and Ψ is a first-order formula without existential quantifiers. This definition of NP inspired a sequence of related investigations, and these descriptive complexity results established that most major complexity classes can be characterized in terms of logical definability of finite structures. In particular, this led Feder and Vardi [14] to their seminal reduction of Constraint Satisfaction Problems to so-called MMSNP (Monotone Monadic Strict Nondeterministic Polynomial) problems, which also nicely links MMSNP to the class NP in the computational sense. Inspired by these results we would like to ask an even simpler question: can one express the computational power of the class NP by combinatorial means? It may seem surprising that the classes of relational structures defined by ESO formulas (i.e. the whole class NP) are polynomially equivalent to canonical lifts of structures which are defined by a finite set of forbidden substructures. In short, finitely many forbidden lifts determine any language in NP. Let us briefly illustrate this by our example of 3-colorability: instead of a graph G = (V, E) we consider the graph G together with three unary relations C1, C2, C3 which cover the vertex set V; this structure will be denoted by G′ and called a lift of G (G′ has one binary and three unary relations). There are 3 forbidden substructures or patterns: for each i = 1, 2, 3 the graph K2 together with the cover Ci = {1, 2} and Cj = ∅ for j ≠ i forms the pattern Fi (where the signature of Fi contains one binary and three unary relations).
The class of all 3-colorable graphs then corresponds just to the class Φ(Forb(F1, F2, F3)), where Φ is the forgetful functor which transforms G′ to G, and the language of 3-colorable graphs is just the language of the class satisfying the formula ∃G′ (G′ ∈ Forb(F1, F2, F3)). This extended language (of structures G′) of course expresses the membership of 3-colorability in the class NP. Let us define lifts and shadows more formally: we will work with two (fixed) signatures, Δ and Δ ∪ Δ′ (the signatures Δ and Δ′ are always supposed to be disjoint). For convenience we denote structures in Rel(Δ) by A, B, etc. and structures in Rel(Δ ∪ Δ′) by A′, B′, etc., and we shall denote Rel(Δ ∪ Δ′) by Rel(Δ, Δ′). The classes Rel(Δ) and Rel(Δ, Δ′) will be considered as categories endowed with all homomorphisms. The interplay of the categories Rel(Δ, Δ′) and Rel(Δ) is the central theme here. Towards this end we define the following notion: let Φ : Rel(Δ, Δ′) → Rel(Δ) denote the natural forgetful functor that "forgets" the relations in Δ′. Explicitly, for a structure A′ ∈ Rel(Δ, Δ′) we denote by Φ(A′) the corresponding structure A ∈ Rel(Δ) defined by X(A) = X(A′) and R(A) = R(A′) for every R ∈ Δ (for homomorphisms we have Φ(f) = f). These object transformations call for a special terminology: for A′ ∈ Rel(Δ, Δ′) we call Φ(A′) = A the shadow of A′. Any A′ with Φ(A′) = A is called a lift of A. The analogous terminology is used for subclasses of Rel(Δ, Δ′) and Rel(Δ). The following combinatorial characterization of NP was recently proved in [32]:
Theorem. For every language L ∈ NP there exist relational types Δ, Δ′ and a finite set F′ of structures in Rel(Δ, Δ′) such that L is computationally equivalent to Φ(Forb(F′)). Moreover, we may assume that the relations in Δ′ are at most binary.
We omit the technical details (which are involved), but let us add the following: there seems to be more here than meets the eye. This scheme fits nicely into the mainstream of combinatorial and complexity research. Building upon the Feder-Vardi classification of MMSNP we can isolate three computationally equivalent formulations of the class NP:
1. By means of shadows of forbidden homomorphisms of relational lifts (the corresponding category is denoted by Relcov(Δ, Δ′));
2. By means of shadows of forbidden injections (monomorphisms) of monadic lifts (i.e. with the type Δ′ consisting of unary relations only);
3. By means of shadows of forbidden full homomorphisms of monadic lifts (full homomorphisms preserve both edges and non-edges).
Our results imply that each of these approaches captures the whole class NP. It is interesting to note how nicely these categories fit the combinatorial common sense about the difficulty of problems. On the one side, the problems in CSP correspond to and generalize ordinary (vertex) coloring problems; one expects a dichotomy here. On the other side, the above classes 1-3 model the whole class NP, and thus we cannot expect a dichotomy there. But this is in accordance with the combinatorial meaning of these classes: class 1 expresses colorings of edges, triples, etc. and thus involves problems of Ramsey theory [19,45]. Class 2 may express vertex colorings of classes with restricted degrees of vertices (a difficult restriction in the homomorphism context). Class 3 relates to vertex colorings with a given pattern among classes, which appear in many graph decomposition techniques (for example in the solution of the Perfect Graph Conjecture [9]).
The point of view of forbidden partitions (in the language of graphs and matrices) is taken for example in [17]. This clear difference between the combinatorial interpretations of syntactic restrictions on formulas expressing the computational power of NP is one of the pleasant consequences of this approach. See [32] for details and other related problems.
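As a concrete sanity check of the 3-colourability example, the lift/shadow point of view can be replayed by brute force: G is 3-colourable iff some lift G′ = (V, E, C1, C2, C3) avoids the three forbidden patterns, i.e. no edge lies inside a single colour class. A minimal sketch (the encodings and function names are our own):

```python
from itertools import product

def forbidden_pattern_free(edges, cover):
    """A lift (V, E, C1, C2, C3) avoids the patterns F1, F2, F3
    iff no edge has both endpoints in the same colour class Ci."""
    return all(not (u in Ci and v in Ci) for Ci in cover for (u, v) in edges)

def three_colorable(vertices, edges):
    """Shadow view: G is 3-colourable iff SOME lift of G (a choice of
    unary relations C1, C2, C3 covering V) avoids all three patterns."""
    vertices = list(vertices)
    for colouring in product(range(3), repeat=len(vertices)):
        cover = [set() for _ in range(3)]
        for v, c in zip(vertices, colouring):
            cover[c].add(v)
        if forbidden_pattern_free(edges, cover):
            return True
    return False

C5 = [(i, (i + 1) % 5) for i in range(5)]                 # 5-cycle, chromatic number 3
K4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]  # needs 4 colours
assert three_colorable(range(5), C5)
assert not three_colorable(range(4), K4)
```

The exponential search over lifts mirrors the existential quantifier ∃G′ in the logical formulation; nothing here is efficient, only definitional.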
9. Homomorphism Order
Recall that CΔ denotes the homomorphism quasiorder of all relational structures of type Δ: A ≤ B ⇔ A −→ B. There are surprisingly close connections between algorithmic questions (which motivated dualities) and order-theoretic properties of CΔ. We mention two such results (characterization theorems).
9.1. Gaps and Density
A pair (A, B) of structures is said to be a gap in CΔ if A < B and there is no structure C such that A < C < B.
Similarly, for a subset K of CΔ, a pair (A, B) of structures of K is said to be a gap in K if A < B and there is no structure C ∈ K such that A < C < B. The Density Problem for a class K asks for the description of all gaps of the class K. This is a challenging problem even in the simplest case of the class of all undirected graphs. The question was first asked in the context of structural properties of classes of languages and grammar forms. The problem has been solved in [68]:
Theorem. The pairs (K0, K1) and (K1, K2) are the only gaps in the class of all undirected graphs. Explicitly, given undirected graphs G1, G2 with G1 < G2, G1 ≠ K0 and G1 ≠ K1, there is a graph G satisfying G1 < G < G2.
The density problem for the general classes Rel(Δ) was solved only in [57], in the context of the characterization of finite dualities.
Theorem. For every class Rel(Δ) the following holds:
1. For every (relational) tree T there exists a unique structure P_T, the predecessor of T, such that the pair (P_T, T) is a gap in Rel(Δ);
2. Up to homomorphism equivalence there are no other gaps in Rel(Δ) of the form A < B with B connected.
The importance of this lies in the next result (a gap of the form A < B with B connected is called a connected gap).
Theorem. For every category Rel(Δ) there is a one-to-one correspondence between connected gaps and singleton dualities.
In fact this theorem holds in a broad class of posets called Heyting posets [53]. The characterization of gaps in subclasses of structures presents a difficult problem.
9.2. Maximal Antichains
Let P = (P, ≤) be a poset. We say that a subset Q of P is an antichain in P if neither a ≤ b nor b ≤ a for any two distinct elements a, b of Q (such elements are called incomparable; this fact is usually denoted by a ∥ b). A finite antichain Q is called maximal if no set S with Q ⊊ S ⊆ P is an antichain. One can determine the maximal antichains in the classes Rel(Δ). Consider a duality Forb(F) = CSP(D) and consider the set M = F ∪ D.
Then M has the property that any other structure A ∈ Rel(Δ) is comparable to one of its elements (as any structure A ∈ Rel(Δ) either satisfies F → A for some F ∈ F or A → D for some D ∈ D). One can prove the converse of this statement:
Theorem. Let Δ = (k). There is a one-to-one correspondence between generalized dualities and finite maximal antichains in the homomorphism order of Rel(Δ).
9.3. Universality
The homomorphism order CΔ has spectacular properties. One of them is related to the following notion: a countable partially ordered set P is said to be universal if it contains any countable poset (as an induced subposet).
A poset P is said to be homogeneous if every partial isomorphism between finite subposets extends to an isomorphism (of the whole poset). It is a classical model-theoretic result that the universal homogeneous poset exists and is uniquely determined. Such a universal homogeneous poset can be constructed in a standard model-theoretic way as the Fraïssé limit of all finite posets. The poset CΔ fails to be homogeneous (due to its algebraic structure), but it is universal [21]. The oriented graphs create here again a little surprise: denote by Path the partial order of finite oriented paths (i.e. "zig-zags") with the homomorphism ordering. It seems that the order of paths is an easy one:
• Paths can be coded by 0-1 sequences;
• One can decide whether P ≤ P′ (by easy rewriting rules);
• The density problem for paths has been solved [59].
We finish this paper with the following non-trivial result, which immediately found several applications [26,27]:
Theorem. The partial order Path is universal.
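Testing P ≤ P′ for oriented paths can be sketched by brute force; this is of course not the efficient rewriting procedure alluded to above, and the 0-1 encoding of arc directions is our own convention:

```python
from itertools import product

def path_hom(p, q):
    """Is there a homomorphism between oriented paths?  A path is given by
    its sequence of arc directions: p[i] = 1 means arc i goes forward,
    0 means backward; the vertices are 0..len(p).  Brute force over maps."""
    def arcs(word):
        return [(i, i + 1) if d else (i + 1, i) for i, d in enumerate(word)]
    n, m = len(p) + 1, len(q) + 1
    qa = set(arcs(q))
    for f in product(range(m), repeat=n):
        if all((f[u], f[v]) in qa for (u, v) in arcs(p)):
            return True
    return False

# the single arc maps into every longer path, but the directed path with
# two consecutive forward arcs does not map into the single arc
assert path_hom([1], [1, 1])
assert not path_hom([1, 1], [1])
```

Note the second assertion is the path/tournament singleton duality again: a path maps into the single arc iff it contains no directed subpath of length two.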
Acknowledgements The author would like to express thanks for support by the project 1M0021620808 of the Ministry of Education of the Czech Republic. Part of this paper was written while visiting Theory Group at Microsoft Research.
References
[1] A. Atserias, On digraph coloring problems and treewidth duality, 20th IEEE Symposium on Logic in Computer Science (LICS) (2005), 106-115.
[2] B. Bollobás, Modern Graph Theory, Springer, 2002.
[3] Ch. Borgs, J. Chayes, L. Lovász, V.T. Sós, K. Vesztergombi, Counting graph homomorphisms (to appear).
[4] Ch. Borgs, J. Chayes, L. Lovász, V.T. Sós, B. Szegedy, K. Vesztergombi, Graph limits and parameter testing, STOC'06, ACM (2006), 261-270.
[5] G. Birkhoff, Generalized arithmetics, Duke Math. J. 9 (1942), 283-302.
[6] G. Brightwell, P. Winkler, Graph homomorphisms and phase transitions, J. Comb. Th. B 77 (1999), 415-435.
[7] A. Bulatov, A dichotomy theorem for constraints on a three-element set, FOCS'02 (2002), 649-658.
[8] A. Bulatov, P.G. Jeavons, A. Krokhin, The complexity of maximal constraint languages, STOC'01 (2001), 667-674.
[9] M. Chudnovsky, N. Robertson, P. Seymour, R. Thomas, The strong perfect graph theorem, Annals of Mathematics 164, 1 (2006), 51-229.
[10] F. Chung, R.L. Graham, R.M. Wilson, Quasi-random graphs, Combinatorica 9 (1989), 345-362.
[11] N. Creignou, S. Khanna, M. Sudan, Complexity Classification of Boolean Constraint Satisfaction Problems, SIAM, 2001.
[12] A. Dochtermann, Hom complexes and homotopy theory in the category of graphs, arXiv:math.CO/0605275 (2006).
[13] R. Fagin, Generalized first-order spectra and polynomial-time recognizable sets, in: Complexity of Computation (ed. R. Karp), SIAM-AMS Proceedings 7 (1974), pp. 43-73.
[14] T. Feder, M. Vardi, The computational structure of monotone monadic SNP and constraint satisfaction: A study through Datalog and group theory, SIAM J. Comput. 28, 1 (1999), 57-104.
[15] J. Foniok, J. Nešetřil, C. Tardif, Generalized dualities and maximal finite antichains in the homomorphism order of relational structures, to appear in European J. Comb.
[16] M. Freedman, L. Lovász, L. Schrijver, Reflection positivity, rank connectivity, and homomorphism of graphs (to appear).
[17] T. Feder, P. Hell, S. Klein, R. Motwani, Complexity of graph partition problems, 31st Annual ACM STOC (1999), 464-472.
[18] B. Guenin, Edge coloring plane regular multigraphs, manuscript.
[19] R.L. Graham, J. Spencer, B.L. Rothschild, Ramsey Theory, Wiley, New York, 1980.
[20] R. Häggkvist, P. Hell, Universality of A-mote graphs, Europ. J. Combinatorics 14 (1993), 23-27.
[21] Z. Hedrlín, On universal partly ordered sets and classes, J. Algebra 11 (1969), 503-509.
[22] P. Hell, J. Nešetřil, Graphs and Homomorphisms, Oxford University Press, Oxford, 2004.
[23] P. Hell, J. Nešetřil, Complexity of H-coloring, J. Comb. Th. B 48 (1990), 92-110.
[24] P. Hell, J. Nešetřil, X. Zhu, Duality and polynomial testing of tree homomorphisms, Trans. Amer. Math. Soc. 348, 4 (1996), 1281-1297.
[25] W. Hochstättler, J. Nešetřil, Linear programming duality and morphisms, Comment. Math. Univ. Carolinae 40, 3 (1999), 557-592.
[26] J. Hubička, J. Nešetřil, Finite paths are universal, Order 21, 3 (2004), 181-200; Order 22, 1 (2005), 21-40.
[27] J. Hubička, J. Nešetřil, Universal partial order represented by means of oriented trees and other simple graphs, European J. Comb. 26 (2005), 765-778.
[28] P.G. Jeavons, On the algebraic structure of combinatorial problems, Theor. Comp. Sci. 200 (1998), 185-204.
[29] T. Jensen, B. Toft, Graph Coloring Problems, Wiley, 1995.
[30] D.N. Kozlov, Chromatic numbers, morphism complexes, and Stiefel-Whitney characteristic classes, in: Geometric Combinatorics, AMS (2006).
[31] G. Kun, Constraints, MMSNP and expander structures, manuscript, 2006.
[32] G. Kun, J. Nešetřil, Forbidden lifts (NP and CSP for combinatorists), submitted.
[33] G. Kun, J. Nešetřil, Density and dualities for algebras, submitted.
[34] B. Larose, C. Loten, C. Tardif, A characterisation of first-order constraint satisfaction problems, LICS 2006.
[35] L. Lovász, The rank of connection matrices and the dimension of graph algebras, European J. Comb. (to appear).
[36] L. Lovász, Operations with structures, Acta Math. Acad. Sci. Hung. 18 (1967), 321-329.
[37] L. Lovász, B. Szegedy, Szemerédi's lemma for the analyst (to appear).
[38] L. Lovász, B. Szegedy, Limits of dense graph sequences (to appear).
[39] R.E. Ladner, On the structure of polynomial time reducibility, Journal of the ACM 22, 1 (1975), 155-171.
[40] T. Luczak, J. Nešetřil, A probabilistic approach to the dichotomy problem (to appear in SIAM J. Comp.).
[41] J. Matoušek, J. Nešetřil, Invitation to Discrete Mathematics, Oxford University Press, 1998.
[42] J. Matoušek, Using the Borsuk-Ulam Theorem, Springer Verlag, Berlin, 2003.
[43] R. McKenzie, The zig-zag property and exponential cancellation of ordered sets, to appear.
[44] R. Naserasr, Homomorphisms and edge-coloring of planar graphs, J. Comb. Th. B (2006), to appear.
[45] J. Nešetřil, Ramsey theory, in: Handbook of Combinatorics (eds. R.L. Graham, M. Grötschel, L. Lovász), Elsevier (1995), 1331-1403.
[46] J. Hubička, J. Nešetřil, Universal partial order represented by means of trees and other simple graphs, European J. Comb. 26 (2005), no. 5, 765-778.
[47] J. Nešetřil, Combinatorics of Mappings, Birkhäuser Verlag (to appear).
[48] J. Nešetřil, Aspects of structural combinatorics, Taiwanese J. Math. 3 (1999), no. 4, 381-424.
[49] J. Nešetřil, P. Ossona de Mendez, Tree depth, subgraph coloring and homomorphism bounds, European Journal of Combinatorics (2005), in press.
[50] J. Nešetřil, P. Ossona de Mendez, Grad and classes with bounded expansion III. Restricted dualities, submitted.
[51] J. Nešetřil, P. Ossona de Mendez, Linear time low tree-width partitions and algorithmic consequences, STOC'06 (2006), ACM, 391-400.
[52] J. Nešetřil, A. Pultr, On classes of relations and graphs determined by subobjects and factorobjects, Discrete Math. 22 (1978), 287-300.
[53] J. Nešetřil, A. Pultr, C. Tardif, Gaps and dualities in Heyting categories, submitted.
[54] J. Nešetřil, V. Rödl, Chromatically optimal rigid graphs, J. Comb. Th. B 46 (1989), 133-141.
[55] J. Nešetřil, R. Šámal, Tension continuous maps - their structure and applications, ITI Series 2005-242, 2005.
[56] J. Nešetřil, I. Švejdarová, Diameters of duals are linear, KAM-DIMATIA Series 2005-729, 2005.
[57] J. Nešetřil, C. Tardif, Duality theorems for finite structures (characterizing gaps and good characterizations), J. Comb. Th. B 80 (2000), 80-97.
[58] J. Nešetřil, C. Tardif, Short answers to exponentially long questions: Extremal aspects of homomorphism duality, KAM-DIMATIA Series 2004-714; SIAM J. Disc. Math. (to appear).
[59] J. Nešetřil, X. Zhu, Path homomorphisms, Proc. Camb. Phil. Soc. 120 (1996), 207-220.
[60] A. Pultr, The right adjoints into the categories of relational systems, Lecture Notes in Mathematics 137 (1970), 100-113.
[61] A. Pultr, V. Trnková, Combinatorial, Algebraic and Topological Representations of Groups, Semigroups and Categories, North Holland, 1980.
[62] F.S. Roberts, T-colorings of graphs: recent results and open problems, Discrete Math. 93 (1991), 229-245.
[63] B. Rossman, Existential positive types and preservation under homomorphisms, 20th IEEE Symposium on Logic in Computer Science (LICS) (2005), 467-476.
[64] T. Schaefer, The complexity of satisfiability problems, in: STOC'78 (1978), 216-226.
[65] C. Schulz, Graph colourings, spaces of edges, and spaces of circuits (preprint).
[66] E. Szemerédi, On sets of integers containing no k elements in arithmetic progression, Acta Arith. 27 (1975), 299-345.
[67] A. Thomason, Pseudorandom graphs, in: Random Graphs '85, North Holland (1987), 307-331.
[68] E. Welzl, Color families are dense, J. Theoret. Comp. Sci. 17 (1982), 29-41.
[69] X. Zhu, Circular chromatic number: a survey, in: Combinatorics, Graph Theory, Algorithms and Applications (M. Fiedler, J. Kratochvíl, J. Nešetřil, eds.), Discrete Math. 229, 1-3 (2001), 371-410.
Physics and Theoretical Computer Science J.-P. Gazeau et al. (Eds.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.
Some Discrete Tools in Statistical Physics Martin Loebl 1 Charles University, Prague Abstract. We will be walking for some time where the connections between combinatorics and statistical physics lead us. Keywords. Graph, partition function, Ising problem, dimer arrangement, knot diagram
1. Beginning
The purpose of this note is to describe in passing some beautiful basic concepts interlacing statistical physics, combinatorics and knot theory. There are many sources, and the time constraint prevented me from adding the references; this is just an informal writeup anyway. If you get hooked on a topic, your library will probably have more detailed books on the subject. I am also writing a book which should roughly cover the themes of this paper. A graph is a pair (V, E) where V is a set of vertices and E is a set of unordered pairs from V, called edges. The notions of graph theory we will use are so natural that there is no need to introduce them.
1.1. Euler's Theorem
Perhaps the first theorem of graph theory is Euler's theorem, and it is also about walking.
Theorem 1 A graph G = (V, E) has a closed walk containing each edge exactly once if and only if it is connected and each vertex has an even number of edges incident with it.
This theorem has an easy proof. Let us call a set A of edges even if each vertex of V is incident with an even number of edges of A. Connectivity and evenness are clearly necessary conditions for the existence of such a closed walk. Sufficiency follows from the following two lemmas.
Lemma 1 Each even set of edges is a disjoint union of sets of edges of cycles.
1 Correspondence to: Martin Loebl, Dept. of Applied Mathematics and Institute of Theoretical Computer Science (ITI), Charles University, Malostranske n. 25, 118 00 Praha 1, Czech Republic; E-mail: [email protected]
M. Loebl / Some Discrete Tools in Statistical Physics
Lemma 2 A connected set of disjoint cycles admits a closed walk which goes through each edge exactly once.
The first lemma might be called the greedy principle of walking: to prove it we first observe that each non-empty even set contains a cycle; if we delete it, we again get an even set, and we can continue in this way until the remaining set is empty. The proof of the second lemma is also simple: we can compose the closed walk from the walks along the disjoint cycles.
1.2. Even sets of edges as a kernel
We will often not distinguish between a subset A of edges and its incidence vector χA, i.e. the 0-1 vector indexed by the edges of G with (χA)e = 1 iff e ∈ A. Let E(G) be the set of the even subsets of edges of the graph G. We denote by IG the incidence matrix of the graph G, i.e. the matrix with rows indexed by V(G), columns indexed by E(G), and (IG)ve equal to one if v ∈ e and zero otherwise. We immediately have
Observation 1 E(G) forms the GF[2]-kernel of IG, i.e. E(G) = {v; IG v = 0 modulo 2}.
What is the orthogonal complement of E(G) in GF[2]^{E(G)}? It is the set C(G) of edge-cuts of G; a set A of edges is called an edge-cut if there is a set U of vertices such that A = {e ∈ E; |e ∩ U| = 1}.
1.3. Max-Cut, Min-Cut problems
The Max-Cut and Min-Cut problems belong to the basic hard problems of computer science. Given a graph G = (V, E) with a (rational) weight w(e) assigned to each edge e ∈ E, the Max-Cut problem asks for the maximum value of ∑_{e∈C} w(e) over all edge-cuts C of G, while the Min-Cut problem asks for the minimum of the same function. The Max-Cut problem is hard (NP-complete) already for non-negative edge-weights, and hence both the Max-Cut and Min-Cut problems are hard for general rational edge-weights. The Min-Cut problem is efficiently (polynomially) solvable for non-negative edge-weights. This has been a fundamental result of computer science, and it is known as the 'max-flow min-cut algorithm'.
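Observation 1 can be checked by enumeration: for a connected graph the even sets form a GF[2]-vector space of dimension |E| − |V| + 1 (the cycle space), a standard fact we use below. A small sketch, under our own edge-list encoding:

```python
from itertools import chain, combinations

def even_sets(vertices, edges):
    """Enumerate the even edge sets of G, i.e. the GF(2)-kernel of the
    incidence matrix: A is even iff every vertex meets A in an even
    number of edges."""
    out = []
    for A in chain.from_iterable(combinations(edges, k)
                                 for k in range(len(edges) + 1)):
        deg = {v: 0 for v in vertices}
        for (u, w) in A:
            deg[u] += 1
            deg[w] += 1
        if all(d % 2 == 0 for d in deg.values()):
            out.append(frozenset(A))
    return out

# K4: the even sets are the empty set, the four triangles and the three
# 4-cycles -- 8 sets in all, matching dimension |E| - |V| + 1 = 3.
V = range(4)
E = [(i, j) for i in range(4) for j in range(i + 1, 4)]
assert len(even_sets(V, E)) == 2 ** (len(E) - len(V) + 1)
```

The enumeration is exponential in |E|; it is meant only as an executable reading of Observation 1.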
Still, there are some important special classes of graphs where the Max-Cut problem is efficiently solvable. One such class is the class of planar graphs.
1.4. Max-Cut problem for planar graphs
A graph is called planar if it can be represented in the plane so that the vertices are distinct points, and the edges are arcs (by an arc we mean an injective continuous map of the closed interval [0, 1] to the plane) connecting the representations of their vertices and disjoint from the rest of the representation. We will also say that planar graphs have a proper planar drawing, and a properly drawn planar graph will be called a topological planar graph. Let G be a topological planar graph and let γ be the subset of the plane consisting of the planar representation of G. After the deletion of γ, the plane is partitioned
into ‘islands’ which are called faces of G. We let F (G) be the set of the faces of G and we will denote by v(G), e(G), f (G) the number of vertices, edges and faces of G and recall the Euler’s formula: v(G) − e(G) + f (G) = 2. An important concept we need is that of dual graph G∗ of a topological graph G. It turns out convenient to define G∗ as an abstract (not topological) graph. But we need to allow multiple edges and loops which is not included in the concept of the graph as a pair (V, E), where E ⊂ V2 . A standard way out is to a graph as a triple (V, E, g) where V, E are sets and define g is a function from E to V2 ∪ V which gives to each edge its terminal vertices. For instance e ∈ E is a loop iff g(e) ∈ V . Now we can define G∗ as triple (F (G), {e∗ ; e ∈ E(G)}, g) where g(e∗ ) = {f ∈ F (G); e belongs to the boundary of f }. If G is a topological planar graph then G∗ is planar. There is a natural way to properly draw G∗ to the plane: represent each vertex f ∈ F (G) as a point in the face f , and represent each edge e∗ by an arc between the corresponding points, which crosses exactly once the representation of e in G and is disjoint with the rest of the representations of G and G∗ . We will say that a set A of edges of a topological planar graph is dual even if {e∗ ; e ∈ A} is an even set of edges of G∗ . Observation 2 The dual even subsets of edges of G are exactly the edge-cuts of G∗ . These considerations reduce the Max-Cut problem in the class of the planar graphs to the following problem, again in the class of the planar graphs: Maximum even subset problem. Given a graph G = (V, E) with rational weights on the edges, find the maximum value of e∈H w(e) over all even subsets H of edges. Finally the following theorem means that the Max-Cut problem is efficiently solvable for the planar graphs. T HEOREM 2 The Maximum even subset problem is efficiently solvable for general graphs. 1.5. 
Edwards-Anderson Ising model

The Max-Cut problem has a long history in computer science, but one of its basic applications comes from the study of the Ising model, a theoretical physics model of nearest-neighbor interactions in a crystal structure. In the Ising model, the vertices of a graph G = (V, E) represent particles and the edges describe interactions between pairs of particles. The most common example is a planar square lattice, where each particle interacts only with its neighbors. Often one adds edges connecting the first and last vertices of each row and column, which represent periodic boundary conditions in the model; this makes the graph a toroidal square lattice. Now we assign a factor Jij to each edge {i, j}; this factor describes the nature of the interaction between particles i and j. A physical state of the system is an assignment of σi ∈ {+1, −1} to each vertex i, describing the two possible spin orientations the particle can take. The Hamiltonian (or energy function) of the system is then defined as
H(σ) = − ∑_{{i,j}∈E} Jij σi σj.
One of the key questions we may ask about a specific system is: “What is the lowest possible energy (the ground state energy) of the system?” Before we seek an answer to this question, we should realize that the physical states (spin assignments) correspond exactly to the edge-cuts of the underlying graph with specified ‘shores’. Let us define

V1 = {i ∈ V ; σi = +1},
V2 = {i ∈ V ; σi = −1}.

This partition of the vertices uniquely encodes the assignment of spins to particles. The edges contained in the edge-cut C(V1, V2) are those connecting pairs of particles with different spins, and those outside the cut connect pairs with equal spins. This allows us to rewrite the Hamiltonian in the following way:
H(σ) = ∑_{{i,j}∈C} Jij − ∑_{{i,j}∈E\C} Jij = 2w(C) − W,
where w(C) = ∑_{{i,j}∈C} Jij denotes the weight of the cut, and W = ∑_{{i,j}∈E} Jij is the sum of all edge weights in the graph. Clearly, if we find the value of MAX-CUT, we have found the maximum energy of the physical system. Similarly, MIN-CUT (the cut with minimum possible weight) corresponds to the minimum energy of the system. The distribution of the physical states over all possible energy levels is encapsulated in the partition function:

Z(G, β) = ∑_σ e^{−βH(σ)}.
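The identity H(σ) = 2w(C) − W is easy to check by brute force; the following sketch does this on a 4-cycle with made-up couplings (a hypothetical toy instance, not one from the text) and then evaluates the partition function directly from its definition.

```python
import itertools
import math

# A 4-cycle with arbitrary couplings J_ij; a made-up toy instance.
edges = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): -1.0, (0, 3): 0.5}
W = sum(edges.values())

def hamiltonian(sigma):
    return -sum(J * sigma[i] * sigma[j] for (i, j), J in edges.items())

def cut_weight(sigma):
    """w(C): total coupling on edges joining vertices of opposite spin."""
    return sum(J for (i, j), J in edges.items() if sigma[i] != sigma[j])

# H(sigma) = 2 w(C) - W for every state sigma.
for sigma in itertools.product([1, -1], repeat=4):
    assert math.isclose(hamiltonian(sigma), 2 * cut_weight(sigma) - W)

# The partition function, summed over all 2^4 states.
beta = 0.7
Z = sum(math.exp(-beta * hamiltonian(s)) for s in itertools.product([1, -1], repeat=4))
```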
The variable β stands for K/T in the Ising model, where K is a constant and T is a variable representing the temperature. It follows from Section 1.4 that there is an efficient algorithm to determine the ground state energy of the Ising model on any planar graph. In fact, the whole partition function may be determined efficiently for planar graphs; a principal ingredient is the following concept of ‘enumeration duality’.
1.6. An enumeration duality

It turns out that the Ising partition function of a graph G may be expressed in terms of the generating function of the even sets of the same graph G. This is the seminal theorem of Van der Waerden, whose proof is so simple that we include it here. We will use the following standard notation: sinh(x) = (e^x − e^{−x})/2, cosh(x) = (e^x + e^{−x})/2, tanh(x) = sinh(x)/cosh(x).

THEOREM 3 Let G = (V, E) be a graph with edge weights Jij, ij ∈ E. Then

Z(G, β) = 2^{|V|} ∏_{ij∈E} cosh(βJij) · E(G, x)|_{x^{Jij} := tanh(βJij)}.
Proof. We have

Z(G, β) = ∑_σ e^{β ∑_{ij∈E} Jij σi σj} = ∑_σ ∏_{ij∈E} (cosh(βJij) + σi σj sinh(βJij)) =

∏_{ij∈E} cosh(βJij) ∑_σ ∏_{ij∈E} (1 + σi σj tanh(βJij)) =

∏_{ij∈E} cosh(βJij) ∑_σ ∑_{A⊂E} ∏_{ij∈A} σi σj tanh(βJij) =

∏_{ij∈E} cosh(βJij) ∑_{A⊂E} (U(A) ∏_{ij∈A} tanh(βJij)),

where

U(A) = ∑_σ ∏_{ij∈A} σi σj.
The proof is complete when we notice that U(A) = 2^{|V|} if A is even and U(A) = 0 otherwise.

We saw above that Z(G, β) may be viewed as the generating function of the edge-cuts with specified shores. The theorem of Van der Waerden expresses it in terms of the generating function E(G, x) of the even sets of edges. We can also consider the honest generating function of edge-cuts defined by

C(G, x) = ∑_{cut C} x^{w(C)},

where the sum is over all edge-cuts of G and w(C) = ∑_{e∈C} w(e). It turns out that C(G, x) may also be expressed in terms of E(G, x). This is a consequence of another seminal theorem, due to MacWilliams, which we explain now. Let C ⊂ GF[2]^n be a binary code, i.e. a subspace of GF[2]^n. Let Ai(C) denote the number of vectors of C with exactly i occurrences of 1. The weight enumerator of C is defined as
A_C(y) = ∑_{i≥0} Ai(C) y^i.

Let us denote by C* the dual code, i.e. the orthogonal complement of C. MacWilliams' theorem reads as follows:

THEOREM 4

A_{C*}(y) = (1/|C|) (1 + y)^n A_C((1 − y)/(1 + y)).
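MacWilliams' identity can be checked directly on a tiny code. The sketch below uses the length-3 repetition code C = {000, 111} (a made-up toy instance, not an example from the text); its dual is the even-weight parity code, and exact rational arithmetic makes the comparison exact.

```python
from fractions import Fraction
from itertools import product

# The length-3 repetition code C = {000, 111}; a made-up toy instance.
n = 3
C = [(0, 0, 0), (1, 1, 1)]
# The dual code: all words orthogonal (over GF(2)) to every word of C.
Cdual = [w for w in product([0, 1], repeat=n)
         if all(sum(a * b for a, b in zip(w, c)) % 2 == 0 for c in C)]

def weight_enum(code, y):
    """A_C(y) = sum over codewords of y^(number of ones)."""
    return sum(y ** sum(word) for word in code)

y = Fraction(1, 3)   # exact rational test point
lhs = weight_enum(Cdual, y)
rhs = Fraction(1, len(C)) * (1 + y) ** n * weight_enum(C, (1 - y) / (1 + y))
assert lhs == rhs
```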
We saw before that the set of edge-cuts and the set of even sets of edges form dual binary codes, hence MacWilliams' theorem applies. The theorem holds more generally for linear codes over a finite field GF[q]; hence it applies to the kernel and the image of the incidence matrix of a graph, viewed over GF[q]. This is related to the extensively studied field of nowhere-zero flows.

1.7. A game of dualities: critical temperature of the 2D Ising model

We will end this introductory part with an exhibition of a game of dualities. We will assume that our graph G = (V, E) is a planar square grid, and we denote by N its number of vertices. This is a rude specialisation for graph theorists, but not for statistical physicists, since planar square grids are of basic importance for the 2-dimensional Ising problem. Moreover, for simplicity all edges will have the same weight, i.e. Jij = J for each ij ∈ E. Hence

Z(G, β) = Z(N, γ) = ∑_σ e^{γ ∑_{ij∈E} σi σj},
where γ = J/T and T represents the temperature. We will take advantage of the interplay between the geometric duality and the enumeration duality (Theorem 3). Let G* denote the dual graph of G. A great property of planar square grids is that they are essentially self-dual; there are some differences on the boundary, but who cares, we are playing anyway. So we will cheat and assume that G = G*.

Low temperature expansion. Here we use the geometric duality. The states σ correspond to the assignments of + or − to the plaquettes of G*. An edge of G* will be called frontal for such an assignment if it borders two plaquettes with opposite signs. Now we observe that the set of frontal edges of an assignment is even, and each even set of edges of G* corresponds to exactly two states σ (which are opposite at each vertex). Summarising,

Z(N, γ) = 2 e^{|E|γ} ∑_H e^{−2|H|γ},
where the sum is over all even subsets H of edges of G* = G. If T goes to zero then γ goes to infinity, and hence small cycles should dominate this expression of the partition function. This is good news for computer simulations, and it explains the name of this formula.
High temperature expansion. Here we use Theorem 3. It honestly gives

Z(N, γ) = 2^N cosh(γ)^{|E|} ∑_H tanh(γ)^{|H|},

where the sum is again over all even subsets H of edges of G. If T goes to infinity then γ goes to zero, and hence small cycles again dominate this expression of the partition function.

Critical temperature of the 2D Ising model. Let F(γ) be the free energy per site, i.e.

−F(γ) = lim_{N→∞} N^{−1} ln Z(N, γ).
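Both expansions are exact identities for finite graphs and can be checked by brute force. The sketch below uses K4 rather than a square grid (a made-up test case: K4 is abstractly self-dual, so its even sets can stand in for the even sets of its dual), and compares the direct state sum against both expansions.

```python
import itertools
import math

# K4: a small planar graph that is (abstractly) self-dual, so even subsets
# of G* can be identified with even subsets of G itself. A toy instance,
# not the square grid of the text.
V = range(4)
E = [(i, j) for i in V for j in V if i < j]   # the 6 edges of K4

def is_even(H):
    deg = [0] * len(V)
    for i, j in H:
        deg[i] += 1
        deg[j] += 1
    return all(d % 2 == 0 for d in deg)

even_sets = [H for r in range(len(E) + 1)
             for H in itertools.combinations(E, r) if is_even(H)]

gamma = 0.55
# Direct state sum.
Z = sum(math.exp(gamma * sum(s[i] * s[j] for i, j in E))
        for s in itertools.product([1, -1], repeat=len(V)))
# Low temperature expansion: 2 e^{|E| gamma} sum_H e^{-2|H| gamma}.
low = 2 * math.exp(gamma * len(E)) * sum(math.exp(-2 * gamma * len(H)) for H in even_sets)
# High temperature expansion: 2^N cosh(gamma)^{|E|} sum_H tanh(gamma)^{|H|}.
high = (2 ** len(V)) * math.cosh(gamma) ** len(E) * \
       sum(math.tanh(gamma) ** len(H) for H in even_sets)
assert math.isclose(Z, low) and math.isclose(Z, high)
```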
At a critical point the free energy is non-analytic, so F will be a non-analytic function of γ. Moreover, we assume that there is only one critical point. Then the expansions above help us to locate it. Let

F(v) = lim_{N→∞} N^{−1} ln(∑_H v^{|H|}),

where the sum is over all even subsets H ⊂ E(G). Let v = tanh(γ). Then

−F(γ) = 2γ + F(e^{−2γ}) = ln(2 cosh(γ)) + F(v).

If we define γ* by tanh(γ*) = e^{−2γ}, we get

F(γ*) = 2γ + F(γ) − ln(2 cosh(γ)).

If γ is large then γ* is small, hence the last equation relates the free energy at a low temperature to the free energy at a high temperature. Hence, if there is only one critical value γc, then necessarily γc = γc* and this determines it.

1.8. The Δ − Y transformation

Let us try to apply the same trick to the honeycomb lattice H2N with 2N vertices. If we disregard the boundary irregularities, its geometric dual is the triangular lattice TN with N vertices. If we apply the high temperature expansion to H2N and the low temperature expansion to TN, we get an expression of Z(H2N, γ) in terms of Z(TN, γ). In order to extract the critical temperature we need one more relation, and we will get it from the Δ − Y transformation. This is one of those magic seminal simple local operations. It consists in exchanging a vertex l of degree 3 connected to independent vertices i, j, k (a Y) for three edges between the vertices i, j, k, which form a Δ (a triangle). We first note that H2N is bipartite, i.e. its vertices may be uniquely partitioned into two sets V1, V2 so that all edges go between them. The new trick is to apply the Δ − Y transformation to all the vertices of V1. The result is again the triangular lattice TN. Now, if we want to transform Z(H2N, γ) into the Ising partition function of this new triangular lattice TN, we get a system of equations for the coupling constants of TN, which has a solution, and this suffices to extract the critical temperature for Z(H2N, γ).
This system of equations, in operator form, is the famous Yang-Baxter equation. It defines the Temperley-Lieb algebra, which has been used to introduce and study quantum knot invariants like the Jones polynomial, with close connections to topological QFTs. This connection between statistical physics, knot theory, QFT and combinatorics has kept mathematicians and physicists busy for more than a decade. So, look where we arrived from Euler's formula. The next chapter starts with another principle, that of inclusion and exclusion.

2. Inclusion and Exclusion

Let us start with the introduction of a paper of Hassler Whitney, which appeared in the Annals of Mathematics in August 1932:

“Suppose we have a finite set of objects (for instance books on a table), each of which either has or has not a certain given property A (say of being red). Let n be the total number of objects, n(A) the number with the property A, and n(Ā) the number without the property A. Then obviously n(Ā) = n − n(A). Similarly, if n(AB) denotes the number with both properties A and B, and n(ĀB̄) the number with neither property, then n(ĀB̄) = n − n(A) − n(B) + n(AB), which is easily seen to be true. The extension of these formulas to the general case where any number of properties are considered is quite simple, and is well known to logicians. It should be better known to mathematicians also; we give in this paper several applications which show its usefulness.”

Indeed, we all know it, under the name ‘inclusion-exclusion principle’: if A1, ..., An are finite sets and we let ∩(Ai; i ∈ J) = AJ, then

|∪(Ai; i = 1, ..., n)| = ∑_{k=1}^{n} (−1)^{k−1} ∑_{J⊂{1,...,n}, |J|=k} |AJ|.
It can also be formulated as follows:

THEOREM 5 Let S be an n-element set and let V be a 2^n-dimensional vector space over some field K. We consider the vectors of V indexed by the subsets of S. Let l be the linear transformation on V defined by

l(vT) = ∑_{T⊂Y} vY

for all T ⊂ S. Then l^{−1} exists and is given by

l^{−1}(vT) = ∑_{T⊂Y} (−1)^{|Y−T|} vY

for all T ⊂ S.

The set of all subsets of S equipped with the relation ‘⊂’ forms a partially ordered set (poset) called the Boolean poset. The Möbius inversion formula extends Theorem 5 from the Boolean poset to an arbitrary ‘locally finite’ poset.
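The two transforms of Theorem 5 can be checked numerically on the Boolean poset of a 3-element set: applying the signed transform to the subset-sum transform must recover the original vector. A minimal sketch (the test vector below is an arbitrary choice):

```python
from itertools import chain, combinations

S = (0, 1, 2)
subsets = list(chain.from_iterable(combinations(S, r) for r in range(len(S) + 1)))
# An arbitrary test vector indexed by the subsets of S.
v = {T: 3 * sum(T) + len(T) ** 2 for T in subsets}

def issub(T, Y):
    return set(T) <= set(Y)

# l: the subset-sum ("zeta") transform of Theorem 5.
l = {T: sum(v[Y] for Y in subsets if issub(T, Y)) for T in subsets}
# l^{-1}: the signed ("Moebius") transform; applied to l it must give v back.
linv = {T: sum((-1) ** (len(Y) - len(T)) * l[Y] for Y in subsets if issub(T, Y))
        for T in subsets}
assert linv == v
```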
2.1. Zeta Function of a Graph

The theory of the Möbius function connects the Principle of Inclusion and Exclusion with the very useful concept of the zeta function of a graph. We will explain a seminal theorem of Bass. You will see in the last section of the paper that it is closely related to the (several decades older) combinatorial solution of the 2D Ising model proposed by Kac, Ward and Feynman.

Let G = (V, E) be a graph and let A = (V, A(G)) be an arbitrary orientation of G; an orientation of a graph is a prescription of one of the two possible directions to each edge. If e ∈ E then ae will denote the orientation of e in A(G) and ae^{−1} will be the orientation reversed to ae. A ‘circular sequence’ p = v1, a1, v2, a2, ..., an, (vn+1 = v1) is called a prime reduced cycle if the following conditions are satisfied: ai ∈ {ae, ae^{−1}; e ∈ E}, ai+1 ≠ ai^{−1}, and (a1, ..., an) ≠ Z^m for any sequence Z and m > 1.

DEFINITION 1 Let G = (V, E) be a graph. The Ihara-Selberg function of G is

I(u) = ∏_γ (1 − u^{|γ|})

and the zeta function of G is Z(u) = I(u)^{−1}, where the (infinite) product is over the set of the prime reduced cycles γ of G.

The theorem of Bass reads as follows:

THEOREM 6 I(u) = det(I − uT), where T is the matrix of the transitions between edges.

The above considerations are closely related to MacMahon's Master Theorem, also known as the boson-fermion correspondence in physics. Strong connections with quantum knot invariants have been discovered recently.

THEOREM 7 The coefficient of x1^{m1} · · · xn^{mn} in

∏_{i=1}^{n} (∑_{j=1}^{n} aij xj)^{mi}

is equal to the coefficient of z1^{m1} · · · zn^{mn} in the power series expansion of [det(δij − aij zi)]^{−1}.
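The Master Theorem can be checked by hand for n = 2 and m1 = m2 = 1, where the left-hand coefficient is the permanent a11·a22 + a12·a21. A small sketch with an arbitrary toy matrix, expanding 1/det(I − AZ) as a truncated geometric series:

```python
# MacMahon's Master Theorem for n = 2, m1 = m2 = 1; the matrix entries
# below are an arbitrary toy instance.
a11, a12, a21, a22 = 1, 2, 3, 4

# Left side: coefficient of x1*x2 in (a11*x1 + a12*x2)(a21*x1 + a22*x2),
# i.e. the permanent of the matrix.
lhs = a11 * a22 + a12 * a21

# Right side: coefficient of z1*z2 in 1/det(I - AZ), Z = diag(z1, z2).
# det(I - AZ) = 1 - a11*z1 - a22*z2 + det(A)*z1*z2 = 1 - u, and we expand
# 1/(1 - u) = 1 + u + u^2 + ..., keeping exponents <= 1 in each variable.
def mul(p, q):
    """Product of sparse polynomials {(e1, e2): coeff}, truncated to e_i <= 1."""
    r = {}
    for (e1, e2), c in p.items():
        for (f1, f2), d in q.items():
            if e1 + f1 <= 1 and e2 + f2 <= 1:
                key = (e1 + f1, e2 + f2)
                r[key] = r.get(key, 0) + c * d
    return r

u = {(1, 0): a11, (0, 1): a22, (1, 1): -(a11 * a22 - a12 * a21)}
geom = {(0, 0): 1}       # running power u^k
total = {(0, 0): 1}      # partial sum 1 + u + u^2 (u^3 vanishes at this truncation)
for _ in range(2):
    geom = mul(geom, u)
    for key, c in geom.items():
        total[key] = total.get(key, 0) + c

rhs = total.get((1, 1), 0)
assert lhs == rhs == 10
```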
3. The chromatic polynomial and the Tutte polynomial

In the aforementioned paper, Whitney mentions a formula for the number of ways of coloring a graph as one of the main applications of PIE. Let us again follow Whitney's article for a while:

Suppose we have a fixed number z of colors at our disposal. Any way of assigning one of these colors to each vertex of the graph in such a way that any two vertices which are joined by an arc are of different colors will be called an admissible coloring, using z or fewer colors. We wish to find the number M(z) of admissible colorings, using z or fewer colors. ... We shall deduce a formula for M(z) due to Birkhoff. If there are V vertices in the graph G, then there are n = z^V possible colorings, formed by giving each vertex in succession any one of the z colors. Let R be this set of colorings. Let Aab denote those colorings with the property that a and b are of the same color, etc. Then the number of admissible colorings is

M(z) = n − [n(Aab) + n(Abd) + ... + n(Acf)] + [n(Aab Abd) + ...] − ... + (−1)^E n(Aab Abd ... Acf).

With each property Aab is associated an arc ab of G. In the logical expansion, there is a term corresponding to every possible combination of the properties Apq; with this combination we associate the corresponding edges, forming a subgraph H of G. In particular, the first term corresponds to the subgraph containing no edges, and the last term corresponds to the whole of G. We let H contain all the vertices of G. Let us evaluate a typical term n(Aab Aad ... Ace). This is the number of ways of coloring G in z or fewer colors in such a way that a and b are of the same color, a and d are of the same color, ..., c and e are of the same color. In the corresponding subgraph H, any two vertices that are joined by an edge must be of the same color, and thus all the vertices in a single connected piece of H are of the same color. If there are p connected pieces in H, the value of this term is therefore z^p. If there are s edges in H, the sign of the term is (−1)^s. Thus (−1)^s n(Aab Abd ... Acf) = (−1)^s z^p. If there are (p, s) (this is Birkhoff's symbol) subgraphs of s edges in p connected pieces, the corresponding terms contribute to M(z) an amount (−1)^s (p, s) z^p. Therefore, summing over all values of p and s, we find the polynomial in z:
M(z) = ∑_{p,s} (−1)^s (p, s) z^p.
This function is the well-known chromatic polynomial. The proper colorings of graphs appeared perhaps first with the famous Four-Color-Conjecture, which is now a
theorem, even though proved only with the help of computers: is it true that each planar graph has an admissible coloring by four colors?

A graph G = (V, E) is connected if it has a path between any pair of vertices. If a graph is not connected then its maximal connected subgraphs are called connected components. If G = (V, E) is a graph and A ⊂ E then let C(A) denote the set of the connected components of the graph (V, A), and let c(A) = |C(A)| denote the number of connected components (pieces) of (V, A). Let G = (V, E) be a graph. For A ⊂ E let r(A) = |V| − c(A). Then we can write

M(z) = z^{c(E)} (−1)^{r(E)} ∑_{A⊂E} (−z)^{r(E)−r(A)} (−1)^{|A|−r(A)}.
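Birkhoff's expansion, grouped differently, is just M(z) = ∑_{A⊂E} (−1)^{|A|} z^{c(A)}, a finite subset sum that can be compared against a brute-force count of proper colorings. A minimal sketch on the triangle K3 (a toy instance, not from the text):

```python
import itertools

V, E = [0, 1, 2], [(0, 1), (1, 2), (0, 2)]   # the triangle K3, a toy instance

def c(A):
    """Number of connected components of (V, A), via a tiny union-find."""
    comp = list(V)
    def find(u):
        while comp[u] != u:
            u = comp[u]
        return u
    for i, j in A:
        comp[find(i)] = find(j)
    return len({find(u) for u in V})

def M_subset(z):
    """Birkhoff's expansion: grouping subsets A by (|A|, c(A)) recovers the
    (-1)^s (p, s) z^p form."""
    return sum((-1) ** len(A) * z ** c(A)
               for r in range(len(E) + 1) for A in itertools.combinations(E, r))

def M_direct(z):
    """Brute-force count of proper colorings with z colors."""
    return sum(all(col[i] != col[j] for i, j in E)
               for col in itertools.product(range(z), repeat=len(V)))

for z in range(1, 6):
    assert M_subset(z) == M_direct(z) == z * (z - 1) * (z - 2)
```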
This leads directly to the Whitney rank generating function R(G, u, v), defined by

R(G, u, v) = ∑_{A⊂E} u^{r(E)−r(A)} v^{|A|−r(A)}.

Next we consider the Tutte polynomial; it was defined by Tutte and may be expressed as a minor modification of the Whitney rank generating function:

T(G, x, y) = ∑_{A⊂E} (x − 1)^{r(E)−r(A)} (y − 1)^{|A|−r(A)}.
T(G, x, y) is called the Tutte polynomial of the graph G. Note that for any connected graph G, T(G, 1, 1) counts the number of spanning trees of G: indeed, the only terms that contribute are those with r(A) = r(E) = |A|, and these correspond exactly to the spanning trees of G. The Tutte polynomial is directly related to the partition function of another basic model of statistical physics, the Potts model. The Potts model specialises to the Ising model.

3.1. The dichromate and the Potts partition function

The following function, called the dichromate, is extensively studied in combinatorics. It is equivalent to the Tutte polynomial:
B(G, a, b) = ∑_{A⊂E} a^{|A|} b^{c(A)}.
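Both the Tutte polynomial and the dichromate are finite subset sums and easy to evaluate by brute force on a small graph. The sketch below checks the spanning-tree count T(G, 1, 1) on K4 (a toy instance; Cayley's formula gives 4^{4−2} = 16 spanning trees):

```python
import itertools

V = range(4)
E = [(i, j) for i in V for j in V if i < j]   # K4, a toy instance

def c(A):
    """Number of connected components of (V, A)."""
    comp = list(V)
    def find(u):
        while comp[u] != u:
            u = comp[u]
        return u
    for i, j in A:
        comp[find(i)] = find(j)
    return len({find(u) for u in V})

def r(A):
    return len(V) - c(A)

def tutte(x, y):
    rE = r(E)
    # Note 0**0 == 1 in Python, exactly the convention the subset sum needs.
    return sum((x - 1) ** (rE - r(A)) * (y - 1) ** (len(A) - r(A))
               for m in range(len(E) + 1) for A in itertools.combinations(E, m))

# Spanning trees of K4: 3-edge subsets that connect all four vertices.
spanning_trees = sum(1 for A in itertools.combinations(E, 3) if c(A) == 1)
assert tutte(1, 1) == spanning_trees == 16
```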
DEFINITION 2 Let G = (V, E) be a graph, k ≥ 1 an integer and Je a weight (coupling constant) associated with edge e ∈ E. The Potts model partition function is defined as

P^k(G, Je) = ∑_s e^{E(P^k)(s)},

where the sum is over all functions (states) s from V to {1, . . . , k} and

E(P^k)(s) = ∑_{{i,j}∈E} Jij δ(s(i), s(j)).
We may write

P^k(G, Je) = ∑_s ∏_{{i,j}∈E} (1 + vij δ(s(i), s(j))) = ∑_{A⊂E} k^{c(A)} ∏_{{i,j}∈A} vij,

where vij = e^{Jij} − 1. The RHS is sometimes called the multivariate Tutte polynomial. If all the Jij are equal we get an expression of the Potts partition function in the form of the dichromate:
P^k(G, x) = ∑_s ∏_{{i,j}∈E} e^{x δ(s(i),s(j))} = ∑_{A⊂E} k^{c(A)} (e^x − 1)^{|A|} = B(G, e^x − 1, k).
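This identity between the Potts state sum and the dichromate is exact for any finite graph, so it can be verified numerically. A minimal sketch on the triangle with k = 3 states (the graph, k and x are made-up toy choices):

```python
import itertools
import math

V, E = [0, 1, 2], [(0, 1), (1, 2), (0, 2)]   # the triangle, a toy instance
k, x = 3, 0.8

def c(A):
    """Number of connected components of (V, A)."""
    comp = list(V)
    def find(u):
        while comp[u] != u:
            u = comp[u]
        return u
    for i, j in A:
        comp[find(i)] = find(j)
    return len({find(u) for u in V})

# Left side: the Potts sum over all k^|V| states.
potts = sum(math.exp(x * sum(s[i] == s[j] for i, j in E))
            for s in itertools.product(range(k), repeat=len(V)))
# Right side: the dichromate B(G, e^x - 1, k) as a subset sum.
dichromate = sum(k ** c(A) * (math.exp(x) - 1) ** len(A)
                 for r in range(len(E) + 1) for A in itertools.combinations(E, r))
assert math.isclose(potts, dichromate)
```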
3.2. The q-chromatic function and the q-dichromate

Here we study the following q-chromatic function on graphs:

DEFINITION 3 Let G = (V, E) be a graph with V = {1, . . . , k} and let n be a positive integer. Let V(G, n) denote the set of all vectors (v1, . . . , vk) such that 0 ≤ vi ≤ n − 1 for each i ≤ k and vi ≠ vj whenever {i, j} is an edge of G. We define the q-chromatic function by

Mq(G, n) = ∑_{(v1,...,vk)∈V(G,n)} q^{∑_i vi}.
Note that Mq(G, n)|q=1 is the classical chromatic polynomial of G.

An example. We first recall some notation. For n > 0 let (n)1 = n, and for q ≠ 1 let (n)q = (q^n − 1)/(q − 1) denote a quantum integer. We let (n)!q = ∏_{i=1}^{n} (i)q, and for 0 ≤ k ≤ n we define the quantum binomial coefficient by

(n k)q = (n)!q / ((k)!q (n − k)!q).

A simple quantum binomial formula leads to a well-known formula for the summation of the products of distinct powers. This gives the q-chromatic function of the complete graph:

Observation 3 Mq(Kk, n) = k! q^{k(k−1)/2} (n k)q.

Let G = (V, E) be a graph and A ⊂ E, with C(A) denoting the set of connected components of the graph (V, A) and c(A) = |C(A)|. If W ∈ C(A) then let |W| denote the number of vertices of W. A standard PIE argument gives the following expression for the q-chromatic function, which enables one to extend it from non-negative integers n to the reals.
THEOREM 8

Mq(G, n) = ∑_{A⊂E} (−1)^{|A|} ∏_{W∈C(A)} (n)_{q^{|W|}}.
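Theorem 8 can be checked exactly on a small graph with integer q, where the quantum integers (n)_{q^{|W|}} = 1 + q^{|W|} + · · · + q^{|W|(n−1)} are ordinary integers. A minimal sketch on the triangle K3 with q = 2 and n = 3 (a made-up toy instance):

```python
import itertools
import math

k, E = 3, [(0, 1), (1, 2), (0, 2)]   # the triangle K3, a toy instance
q, n = 2, 3

def quantum(m, Q):
    """(m)_Q = 1 + Q + ... + Q^(m-1)."""
    return sum(Q ** i for i in range(m))

# Direct definition: sum of q^(v1+...+vk) over proper value-vectors.
direct = sum(q ** sum(v) for v in itertools.product(range(n), repeat=k)
             if all(v[i] != v[j] for i, j in E))

def component_sizes(A):
    """Vertex counts of the connected components of ({0..k-1}, A)."""
    comp = list(range(k))
    def find(u):
        while comp[u] != u:
            u = comp[u]
        return u
    for i, j in A:
        comp[find(i)] = find(j)
    roots = [find(u) for u in range(k)]
    return [roots.count(root) for root in set(roots)]

# The signed subset expansion of Theorem 8.
expansion = sum((-1) ** len(A) *
                math.prod(quantum(n, q ** w) for w in component_sizes(A))
                for r in range(len(E) + 1) for A in itertools.combinations(E, r))
assert direct == expansion == 48
```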
The formula of Theorem 8 leads naturally to the definition of the q-dichromate.

DEFINITION 4 We let

Bq(G, x, y) = ∑_{A⊂E} x^{|A|} ∏_{W∈C(A)} (y)_{q^{|W|}}.
Note that Bq=1(G, x, y) = B(G, x, y) and, by Theorem 8, Mq(G, n) = Bq(G, −1, n). What happens if we replace B(G, e^x − 1, k) by Bq(G, e^x − 1, k)? It turns out that this introduces an additional external field to the Potts model.

THEOREM 9

∑_{A⊂E} ∏_{W∈C(A)} (k)_{q^{|W|}} ∏_{{i,j}∈A} vij = ∑_s q^{∑_{v∈V} s(v)} e^{E(P^k)(s)},

where vij = e^{Jij} − 1.

3.3. Multivariate generalisations

Let x1, x2, . . . be commuting indeterminates and let G = (V, E) be a graph. The q-chromatic function restricted to non-negative integer y is the principal specialization of XG, the symmetric function generalisation of the chromatic polynomial. This has been defined by Stanley as follows:

DEFINITION 5

XG = ∑_f ∏_{v∈V} x_{f(v)},
where the sum ranges over all proper colorings f of G by {1, 2, . . . }. Therefore Mq(G, n) = XG(xi = q^i (0 ≤ i ≤ n − 1), xi = 0 (i ≥ n)). Further, Stanley defines a symmetric function generalisation of the bad colouring polynomial:

DEFINITION 6

XBG(t, x1, . . . ) = ∑_f (1 + t)^{b(f)} ∏_{v∈V} x_{f(v)},
where the sum ranges over ALL colorings of G by {1, 2, . . . } and b(f ) denotes the number of monochromatic edges of f .
Noble and Welsh defined the U-polynomial (see Definition 7) and showed that it is equivalent to XBG. Sarmiento proved that the polychromate defined by Brylawski is also equivalent to the U-polynomial.

DEFINITION 7

UG(z, x1, . . . ) = ∑_{S⊂E(G)} x(τS) (z − 1)^{|S|−r(S)},

where τS = (n1 ≥ n2 ≥ . . . ≥ nk) is the partition of |V| determined by the connected components of S, x(τS) = x_{n1} · · · x_{nk} and r(S) = |V| − c(S).

The motivation for the work of Noble and Welsh was a series of papers by Chmutov, Duzhin and Lando. It turns out that the U-polynomial evaluated at z = 0 and applied to the intersection graphs of chord diagrams satisfies the 4T-relation of weight systems. Hence the same is true for Mq(G, z) for each positive integer z, since it is an evaluation of UG(0, x1, . . . ):

Observation 4 Let z be a positive integer. Then Mq(G, z) = (−1)^{|V|} UG(0, x1, . . . )|_{xi := (−1)(q^{i(z−1)} + · · · + 1)}.

Weight systems form a basic building block in the combinatorial study of quantum knot invariants. On the other hand, it seems plausible that the q-dichromate determines the U-polynomial. If true, the q-dichromate provides a compact representation of the multivariate generalisations of the Tutte polynomial mentioned above.
4. Two combinatorial solutions of the 2D Ising model

In this section we describe two ways to calculate the partition function of the Ising model for any given planar graph G. We have seen in Theorem 3 that the Ising partition function of a graph G may be calculated from the generating function E(G, x) of the even subsets of edges of the same graph G.

4.1. The method of Pfaffian orientations

Let G = (V, E) be a graph. A subset of edges P ⊂ E is called a perfect matching or dimer arrangement if each vertex belongs to exactly one element of P. The dimer partition function of the graph G may be viewed as a polynomial P(G, α) which equals the sum of α^{w(P)} over all perfect matchings P of G. This polynomial is also called the generating function of perfect matchings. There is a simple local transformation of a graph G into a graph G′ so that E(G) = P(G′), and G′ is planar if G is. Hence in order to calculate E(G), it suffices to show how to calculate P(G) for planar graphs G. An orientation of a graph G = (V, E) is a digraph D = (V, A) obtained from G by assigning an orientation to each edge of G, i.e. by ordering the elements of each edge of G.
Let G = (V, E) be a graph with 2n vertices and D an orientation of G. Denote by A(D) the skew-symmetric matrix with rows and columns indexed by V, where auv = α^{w(u,v)} in case (u, v) is a directed edge of D, auv = −α^{w(u,v)} in case (v, u) is a directed edge of D, and auv = 0 otherwise.

DEFINITION 8 The Pfaffian is defined as

Pf_G(D, α) = ∑_P s*(P) a_{i1j1} · · · a_{injn},

where P = {{i1, j1}, . . . , {in, jn}} is a partition of the set {1, . . . , 2n} into pairs, ik < jk for k = 1, . . . , n, and s*(P) equals the sign of the permutation i1 j1 . . . in jn of 1 2 . . . (2n).

Each nonzero term of the expansion of the Pfaffian equals α^{w(P)} or −α^{w(P)}, where P is a perfect matching of G. If s(D, P) denotes the sign of the term α^{w(P)} in the expansion, we may write

Pf_G(D, α) = ∑_P s(D, P) α^{w(P)}.
Pfaffians behave very similarly to determinants; in particular, there is an efficient Gaussian-elimination-type algorithm to calculate them. Hence, if we can find, for a graph G, an orientation D such that the sign s(D, P) from Definition 8 is the same for each perfect matching P, then we can calculate the generating function of the perfect matchings of G efficiently. Such an orientation is called a Pfaffian orientation. The following seminal theorem of Kasteleyn thus provides a solution of the 2D Ising problem.

THEOREM 10 Each planar graph has a Pfaffian orientation.

We can draw graphs on more complicated 2-dimensional surfaces; let us consider those that can be represented as the sphere with a number of disjoint handles added (the torus is obtained from the sphere by adding one handle). The genus of a graph is the minimum number of handles needed for its proper representation. Kasteleyn noticed, and Galluccio and Loebl proved, the following generalisation of Theorem 10.

THEOREM 11 If G is a graph of genus g then it has 4^g orientations D1, . . . , D_{4^g} so that P(G, x) is a linear combination of Pf_G(Di, x), i = 1, . . . , 4^g.

As a consequence, the Ising partition function may be calculated in polynomial time for graphs on any fixed orientable surface. Hence the Max-Cut problem is also polynomially solvable on any fixed surface, by exhibiting the whole density function of the edge-cut weights. Curiously, no other method is known, even for the torus. This brings a curious restriction on the weights: in order to write down the whole density function, the weights must be integers with absolute values bounded by a fixed polynomial in the size of the graph. Perhaps the most interesting open problem in this area is to design a combinatorial polynomial algorithm for the toroidal Max-Cut problem.
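Definition 8 can be evaluated directly for a tiny graph. The sketch below uses the 4-cycle with an orientation in which the inner face has an odd number of clockwise edges, which is the standard Kasteleyn condition for a Pfaffian orientation of a planar graph; with all weights set to 1, |Pf| should equal the number of perfect matchings of the 4-cycle, namely 2. The instance and the 4-vertex specialisation are made-up for illustration.

```python
# The 4-cycle on vertices 0..3 with the orientation 0->1, 1->2, 2->3, 0->3;
# all edge weights are 1 (i.e. alpha^w = 1). A made-up toy instance.
n = 4
A = [[0] * n for _ in range(n)]
for u, v in [(0, 1), (1, 2), (2, 3), (0, 3)]:
    A[u][v], A[v][u] = 1, -1          # skew-symmetric entries

def sign(perm):
    """Sign of a permutation via its inversion count."""
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def pfaffian(A):
    """Definition 8 specialised to 4x4: sum over the 3 pair partitions."""
    total = 0
    for partner in range(1, n):
        rest = [x for x in range(n) if x not in (0, partner)]
        i2, j2 = min(rest), max(rest)
        perm = (0, partner, i2, j2)
        total += sign(perm) * A[0][partner] * A[i2][j2]
    return total

assert abs(pfaffian(A)) == 2   # the 4-cycle has exactly 2 perfect matchings
```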
4.2. Products over aperiodic closed walks

The following approach was developed by Kac, Ward and Feynman. It coincides with the notions of Section 2.1. Let G = (V, E) be a planar graph embedded in the plane, and for each edge e let xe be an associated variable. Let A = (V, A(G)) be an arbitrary orientation of G. If e ∈ E then ae will denote the orientation of e in A(G) and ae^{−1} will be the orientation reversed to ae. We let x_{ae} = x_{ae^{−1}} = xe. A circular sequence p = v1, a1, v2, a2, ..., an, (vn+1 = v1) is called a non-periodic closed walk if the following conditions are satisfied: ai ∈ {ae, ae^{−1}; e ∈ E}, ai+1 ≠ ai^{−1}, and (a1, ..., an) ≠ Z^m for any sequence Z and m > 1. We let X(p) = ∏_{i=1}^{n} x_{ai}. We further let sign(p) = (−1)^{n(p)}, where n(p) is the rotation number of p, i.e. the number of integral revolutions of the tangent vector. Finally, let W(p) = sign(p) X(p). There is a natural equivalence on non-periodic closed walks: p is equivalent to reversed p. Each equivalence class has two elements and will be denoted by [p]. We let W([p]) = W(p), and note that this definition is correct since equivalent walks have the same sign. We denote by ∏(1 − W([p])) the formal infinite product of (1 − W([p])) over all equivalence classes of non-periodic closed walks of G. The following theorem, proposed by Feynman and proved by Sherman, together with a straightforward graph-theory transformation, provides an expression of E(G, x)² for a planar graph G in terms of a reformulation of the Ihara-Selberg function of G by Foata and Zeilberger (see Definition 1). The theorem thus provides, along with Theorem 6, another solution of the 2D Ising problem. Again, there is a generalisation for graphs of genus g.

THEOREM 12 Let G be a planar graph with all degrees equal to two or four. Then

E(G, x) = ∏ (1 − W([p])).
Physics and Theoretical Computer Science J.-P. Gazeau et al. (Eds.) IOS Press, 2007 © 2007 IOS Press. All rights reserved.