Commun. Math. Phys. 219, 1 – 3 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Harry Lehmann

Harry Lehmann died in Hamburg on November 22, 1998. He left his wife Margot, whom he married in 1971, and two adult children. Lehmann was a founding father of modern Quantum Field Theory, the part of Quantum Mechanics that underlies elementary particle physics.

Harry Lehmann was born in 1924 at Güstrow, Mecklenburg. After graduation from school in Rostock, the German army drafted him in 1942 for service in North Africa, where he was taken prisoner of war by the American forces. He spent three years in a prison camp in the United States; there he had the opportunity to study on his own, and to prepare for the university. When he was released in 1946, he soon returned to his parents in Rostock and began to study physics, first at the University of Rostock, then at the Humboldt University in East Berlin, obtaining his diploma with a thesis on experimental physics. In 1949 he became assistant of Friedrich Hund at the University of Jena, where he wrote his doctoral dissertation on classical electrodynamics. When Hund moved to the University of Frankfurt, Lehmann served at Jena as acting professor, until Hund was replaced.

In the fall of 1952, Heisenberg offered Lehmann a position at the Max Planck Institute for Physics in Göttingen. There he joined an active group of young theorists from Germany and abroad who had come to collaborate with Heisenberg. After his initial stay, he requested permission to extend his visit; but the authorities of the former German
Democratic Republic never responded, and so Harry Lehmann remained in the west. As a result it was not until 1976 that he could once again visit his parents in Rostock. A main topic of discussion in Heisenberg’s institute was the method of renormalization, that had been developed in the United States and Japan right after the war. This technique made it possible to compute measurable quantities of quantum electrodynamics, and to compare them with experiments, even though divergent integrals entered intermediate stages of the calculations. Despite the enormous success of renormalization theory, made evident by the high-precision agreement between theory and experiment, many physicists of the older generation in Europe remained skeptical and were convinced that the infinities indicated a serious deficiency of quantum field theory. Dirac, for instance, called renormalization theory “a sin against theoretical physics”. On the other hand the younger theoretical physicists were quite enthusiastic. They considered it a challenge to reformulate the theory in such a way that renormalization infinities never occur, either in the formulation of the principles, or in the calculation of observable quantities. Harry Lehmann’s publication on the properties of propagators [1] was an early decisive step in this direction. From minimal assumptions he derived the main properties of propagators, and expressed the constants of renormalization by integrals over finite quantities, even though those quantities diverged in perturbation theory. He carried out a large part of his pioneering work in the 1950’s, in collaboration with Kurt Symanzik and Wolfhart Zimmermann. The shorthand designation LSZ is familiar to all elementary particle physicists up to this day. The LSZ-formalism and the Lehmann representation are among the most important basic tools of the theory of elementary particles [2]. 
An important application of this technique in scattering theory provides the relation between scattering amplitudes and time ordered correlation functions. These were derived as an immediate consequence of the asymptotic behavior of field operators in the distant past and future. In 1955 Harry Lehmann left Heisenberg’s institute to visit Copenhagen as a member of the CERN Study Group, and in 1956 he accepted a professorship at the University of Hamburg to become the successor to Wilhelm Lenz. He founded the theoretical elementary particle physics group, and for thirty years, his strong personality determined the character of the II. Institut für Theoretische Physik at Hamburg University. He became Professor Emeritus in 1986. He also advised the German Electron Synchrotron laboratory DESY and helped start its theory group by persuading Kurt Symanzik to return there from New York. Many young scientists were strongly influenced by his personality, by his style of discussion in seminars, by the conciseness of his insightful contributions to research, and by his views on physics in general. Harry Lehmann’s interest in the theory of dispersion relations led to the beginning of his close collaboration with Res Jost. In the case of equal mass scattering Jost and Lehmann found a representation for matrix elements of the commutator of two field operators between energy-momentum eigenstates [3]. This representation was extended by Dyson to the general case of unequal masses [4]. On the basis of the Dyson representation, Lehmann derived dispersion relations and other analytic properties of the scattering amplitudes as a consequence of locality, relativistic invariance and conditions on the particle spectrum [5]. These results are also valid for composite particles despite their internal structure, since only general properties are used in the derivation which are independent of the dynamics of the system. 
Dispersion relations, therefore, provide an experimental test for the principles of local quantum field theory.
Harry Lehmann remained active in research until the end of his life. He directed several NATO Advanced Study Institutes in Cargèse (France). With K. Pohlmeyer he worked on field theories with non-polynomial Lagrangians. During his last years he investigated symmetry breaking effects for the quark mass spectrum together with T. T. Wu.

Harry Lehmann’s scientific merits were recognized in many ways. He received the Max Planck medal of the German Physical Society in 1967 and he was made a Chevalier de la Légion d’Honneur on December 31, 1969. He was honoured in 1997 with the Dannie Heineman Prize of the American Physical Society and the American Institute of Physics.

Harry Lehmann was an excellent speaker with the remarkable ability to communicate involved and difficult subjects understandably. We remember him gratefully, in friendship, and with esteem for his scientific work.

References
1. Lehmann, H.: Nuovo Cimento 11, 342 (1954)
2. Lehmann, H., Symanzik, K., and Zimmermann, W.: Nuovo Cimento 1, 1425 (1955)
3. Jost, R., and Lehmann, H.: Nuovo Cimento 5, 1598 (1957)
4. Dyson, F. J.: Phys. Rev. 110, 1460 (1958)
5. Lehmann, H.: Nuovo Cimento 10, 579 (1958), and Suppl. to Nuovo Cimento 14, 153 (1950)
Arthur Jaffe Gerhard Mack Wolfhart Zimmermann
Commun. Math. Phys. 219, 5 – 30 (2001)
Algebraic Quantum Field Theory, Perturbation Theory, and the Loop Expansion

M. Dütsch, K. Fredenhagen

II. Institut für Theoretische Physik, Universität Hamburg, Luruper Chaussee 149, Hamburg, Germany.
E-mail: [email protected], [email protected]

Received: 9 February 2000 / Accepted: 21 March 2000
Dedicated to the memory of Harry Lehmann

Abstract: The perturbative treatment of quantum field theory is formulated within the framework of algebraic quantum field theory. We show that the algebra of interacting fields is additive, i.e. fully determined by its subalgebras associated to arbitrarily small subregions of Minkowski space. We also give an algebraic formulation of the loop expansion by introducing a projective system $\mathcal A^{(n)}$ of observables “up to n loops”, where $\mathcal A^{(0)}$ is the Poisson algebra of the classical field theory. Finally we give a local algebraic formulation for two cases of the quantum action principle and compare it with the usual formulation in terms of Green’s functions.

1. Introduction

Quantum field theory is a very successful framework for our present understanding of elementary particle physics. In the case of QED it led to fantastically precise predictions of experimentally measurable quantities; moreover the present standard model of elementary particle physics is of a similar structure and is also in good agreement with experiments. Unfortunately, it is not so clear what an interacting quantum field theory really is, expressed in meaningful mathematical terms. In particular, it is by no means evident how the local algebras of observables can be defined. A direct approach by methods of constructive field theory led to the paradoxical conjecture that QED does not exist; the situation seems to be better for Yang-Mills theories because of asymptotic freedom, but there the problem of big fields which can appear at large volumes poses at present insurmountable problems [1, 21].

In this paper we will take a pragmatic point of view: interacting quantum field theory certainly exists on the level of perturbation theory, and our confidence in quantum field theory relies mainly on the agreement of experimental data with results from low orders of perturbation theory.
Work supported by the Deutsche Forschungsgemeinschaft.

On the other hand, the general structure of algebraic quantum
field theory (or “local quantum physics”) coincides nicely with the qualitative features of elementary particle physics, therefore it seems to be worthwhile to revisit perturbation theory from the point of view of algebraic quantum field theory. This will, on the one hand, provide physically relevant examples for algebraic quantum field theory, and on the other hand, give new insight into the structure of perturbation theory. In particular, we will see that we can reach a complete separation of the infrared problem from the ultraviolet problem. This might be of relevance for Yang-Mills theory, and it is important for the construction of the theory on curved spacetimes [7].

The plan of the paper is as follows. We will start by describing the Stückelberg–Bogoliubov–Shirkov–Epstein–Glaser version of perturbation theory [6, 14, 28, 26, 7]. This construction yields the local S-matrices $S(g)$ ($g \in \mathcal D(\mathbb R^4)$) as formal power series in $g$ (Sect. 2). The most important requirement which is used in this construction is the condition of causality (15), which is a functional equation for $g \mapsto S(g)$. The results of Sects. 3 and 4 are to a large extent valid beyond perturbation theory. We only assume that we are given a family of unitary solutions of the condition of causality. In terms of these local S-matrices we will construct nets of local observable algebras for the interacting theory (Sect. 3). We will see that, as a consequence of causality, the interacting theory is completely determined if it is known for arbitrarily small spacetime volumes (Sect. 4). In Sect. 5 we algebraically quantize a free field by deforming the (classical) Poisson algebra. In a second step we generalize this quantization procedure to the perturbative interacting field. We end up with an algebraic formulation of the expansion in $\hbar$ of the interacting observables (“loop expansion”).
In the last section we investigate two examples for the quantum action principle: the field equation and the variation of a parameter in the interaction. Usually this principle is formulated in terms of Green’s functions [20, 18, 22], i.e. the vacuum expectation values of time ordered products of interacting fields. Here we give a local algebraic formulation, i.e. an operator identity for a localized interaction. In the case of the variation of a parameter in the interaction this requires the use of the retarded product of interacting fields, instead of only time ordered products (as in the formulation in terms of Green’s functions).

For a local construction of observables and physical states in gauge theories we refer to [10, 11, 5]. There, perturbative positivity (“unitarity”) is, by a local version of the Kugo-Ojima formalism [17], reduced to the validity of BRST symmetry [3].

2. Free Fields, Borchers’ Class and Local S-Matrices

An algebra of observables corresponding to the Klein–Gordon equation

$$(\Box + m^2)\varphi = 0 \tag{1}$$

can be defined as follows: Let $\Delta_{\mathrm{ret,av}}$ be the retarded, resp. advanced Green’s functions of $(\Box + m^2)$,

$$(\Box + m^2)\Delta_{\mathrm{ret,av}} = \delta, \qquad \operatorname{supp}\Delta_{\mathrm{ret,av}} \subset \bar V_\pm, \tag{2}$$

where $\bar V_\pm$ denotes the closed forward, resp. backward lightcone, and let $\Delta = \Delta_{\mathrm{ret}} - \Delta_{\mathrm{av}}$. The algebra of observables $\mathcal A$ is generated by smeared fields $\varphi(f)$, $f \in \mathcal D(\mathbb R^4)$, which
obey the following relations

$$f \mapsto \varphi(f) \ \text{is linear}, \tag{3}$$
$$\varphi((\Box + m^2)f) = 0, \tag{4}$$
$$\varphi(f)^* = \varphi(\bar f), \tag{5}$$
$$[\varphi(f), \varphi(g)] = i\langle f, \Delta * g\rangle, \tag{6}$$

where the star denotes convolution and $\langle f, g\rangle = \int d^4x\, f(x)g(x)$. As a matter of fact, $\mathcal A$ (as a ∗-algebra with unit) is uniquely determined by these relations.

The Fock space representation $\pi$ of the free field is induced via the GNS-construction from the vacuum state $\omega_0$. Namely, let $\omega_0 : \mathcal A \to \mathbb C$ be the quasifree state given by the two-point function

$$\omega_0(\varphi(f)\varphi(g)) = i\langle f, \Delta_+ * g\rangle, \tag{7}$$

where $\Delta_+$ is the positive frequency part of $\Delta$. Then the Fock space $\mathcal H$, the vector $\Omega$ representing the vacuum and the Fock representation $\pi$ are up to equivalence determined by the relation $(\Omega, \pi(A)\Omega) = \omega_0(A)$, $A \in \mathcal A$.

On $\mathcal H$, the field $\varphi$ (we will omit the representation symbol $\pi$) is an operator valued distribution, i.e. there is some dense subspace $\mathcal D \subset \mathcal H$ with

(i) $\varphi(f) \in \operatorname{End}(\mathcal D)$,
(ii) $f \mapsto \varphi(f)\Phi$ is continuous $\forall\, \Phi \in \mathcal D$.
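As a consistency check between the commutator (6) and the two-point function (7), one can evaluate the commutator in the state $\omega_0$. The sketch below assumes the usual convention $(\Delta * g)(x) = \int d^4y\, \Delta(x-y)g(y)$ for the convolution, which is not spelled out in the text:

```latex
\begin{align*}
\omega_0\bigl([\varphi(f),\varphi(g)]\bigr)
  &= i\langle f, \Delta_+ * g\rangle - i\langle g, \Delta_+ * f\rangle \\
  &= i\int d^4x\, d^4y\; f(x)\,\bigl[\Delta_+(x-y) - \Delta_+(y-x)\bigr]\, g(y).
\end{align*}
% By (6) the commutator is i<f, Delta * g> times the unit, so comparing
% integrands for arbitrary f and g gives
\[
\Delta(x) = \Delta_+(x) - \Delta_+(-x).
\]
```

Under that convention, (6) and (7) are compatible exactly when the commutator function is the antisymmetric part of $\Delta_+$ in this sense.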
There are other fields $A$ on $\mathcal H$, on the same domain, which are relatively local to $\varphi$,

$$[A(f), \varphi(g)] = 0 \quad\text{if}\quad (x-y)^2 < 0 \quad \forall (x,y) \in (\operatorname{supp} f \times \operatorname{supp} g). \tag{8}$$
They form the so called Borchers class $\mathcal B$. In the case of the free field in 4 dimensions, $\mathcal B$ consists of Wick polynomials and their derivatives [13]. Fields from the Borchers class can be used to define local interactions,

$$H_I(t) = -\int d^3x\, g(t,\mathbf x)\,A(t,\mathbf x), \qquad g \in \mathcal D(\mathbb R^4), \tag{9}$$

(where the minus sign comes from the interpretation of $A$ as an interaction term in the Lagrangian) provided they can be restricted to spacelike surfaces. The corresponding time evolution operator from $-\tau$ to $\tau$, where $\tau > 0$ is so large that $\operatorname{supp} g \subset (-\tau, \tau) \times \mathbb R^3$, (the S-matrix) is formally given by the Dyson series

$$S(g) = 1 + \sum_{n=1}^{\infty} \frac{i^n}{n!} \int dx_1 \cdots dx_n\; T\bigl(A(x_1)\cdots A(x_n)\bigr)\, g(x_1)\cdots g(x_n), \tag{10}$$

with the time ordered products (“T-products”) $T(\dots)$. It is difficult to derive (10) from (9) if the field $A$ cannot be restricted to spacelike surfaces. Unfortunately, this is almost always the case in four spacetime dimensions, the only exception being the field $\varphi$ itself and its derivatives. Therefore one defines the time ordered products of $n$ factors directly as multilinear (with respect to $C^\infty$-functions as coefficients) symmetric mappings from
$\mathcal B^n$ to operator valued distributions $T\bigl(A_1(x_1)\cdots A_n(x_n)\bigr)$ on $\mathcal D$ such that they satisfy the factorization condition¹

$$T\bigl(A(x_1)\cdots A(x_n)\bigr) = T\bigl(A(x_1)\cdots A(x_k)\bigr)\, T\bigl(A(x_{k+1})\cdots A(x_n)\bigr) \tag{11}$$

if $\{x_{k+1},\dots,x_n\} \cap (\{x_1,\dots,x_k\} + \bar V_+) = \emptyset$. The S-matrix $S(g)$ is then, as a formal power series, by definition given by (10). Since its zeroth order term is 1, it has an inverse in the sense of formal power series

$$S(g)^{-1} = 1 + \sum_{n=1}^{\infty} \frac{(-i)^n}{n!} \int dx_1 \cdots dx_n\; \bar T\bigl(A(x_1)\cdots A(x_n)\bigr)\, g(x_1)\cdots g(x_n), \tag{12}$$

where the “antichronological products” $\bar T(\dots)$ can be expressed in terms of the time ordered products

$$\bar T\bigl(A(x_1)\cdots A(x_n)\bigr) \overset{\mathrm{def}}{=} \sum_{P \in \mathcal P(\{1,\dots,n\})} (-1)^{|P|+n} \prod_{p \in P} T\bigl(A(x_i),\ i \in p\bigr). \tag{13}$$

(Here $\mathcal P(\{1,\dots,n\})$ is the set of all ordered partitions of $\{1,\dots,n\}$ and $|P|$ is the number of subsets in $P$.) The $\bar T$-products satisfy anticausal factorization

$$\bar T\bigl(A(x_1)\cdots A(x_n)\bigr) = \bar T\bigl(A(x_{k+1})\cdots A(x_n)\bigr)\, \bar T\bigl(A(x_1)\cdots A(x_k)\bigr) \tag{14}$$

if $\{x_{k+1},\dots,x_n\} \cap (\{x_1,\dots,x_k\} + \bar V_+) = \emptyset$.

The crucial observation now (cf. [16]) is that $S(g)$ satisfies the remarkable functional equation

$$S(f+g+h) = S(f+g)\,S(g)^{-1}\,S(g+h), \tag{15}$$

$f, g, h \in \mathcal D(\mathbb R^4)$, whenever $(\operatorname{supp} f + \bar V_+) \cap \operatorname{supp} h = \emptyset$ (independent of $g$). Equivalent forms of this equation play an important role in [6] and [14]. For $g = 0$ this is just the functional equation for the time evolution and may be interpreted as the requirement of causality [6]. Actually, for formal power series $S(\cdot)$ of operator valued distributions, the $g = 0$ equation is equivalent to the seemingly stronger relation (15), because both are equivalent to condition (11) for the time ordered products. We call (15) the “condition of causality”.

¹ Due to the symmetry and linearity of $T(\dots)$ it suffices to consider the case $A_1 = A_2 = \cdots = A_n$.

3. Interacting Local Nets

The arguments of this and the next section are to a large extent independent of perturbation theory. We start from the assumption that we are given a family of unitaries $S(f) \in \mathcal A$, $\forall f \in \mathcal D(\mathbb R^4, \mathcal V)$ (i.e. $f$ has the form $f = \sum_i f_i(x) A_i$, $f_i \in \mathcal D(\mathbb R^4, \mathbb R)$, $A_i \in \mathcal V$), where $\mathcal V$ is an abstract, finite dimensional, real vector space, interpreted as the space of possible interaction Lagrangians, and $\mathcal A$ is some unital ∗-algebra. In perturbation theory $\mathcal V$ is a real subspace of the Borchers’ class. The unitaries $S(f)$ are required to satisfy the causality condition (15). We first observe that we obtain new solutions of (15) by introducing the relative S-matrices

$$S_g(f) \overset{\mathrm{def}}{=} S(g)^{-1} S(g+f), \tag{16}$$
where now $g$ is kept fixed and $S_g(f)$ is considered as a functional of $f$. In particular, the relative S-matrices satisfy local commutation relations

$$[S_g(h), S_g(f)] = 0 \quad\text{if}\quad (x-y)^2 < 0 \quad \forall (x,y) \in \operatorname{supp} h \times \operatorname{supp} f. \tag{17}$$

Therefore their functional derivatives $A_g(x) = \frac{\delta}{\delta h(x)} S_g(hA)\big|_{h=0}$, $A \in \mathcal V$, $h \in \mathcal D(\mathbb R^4)$, provided they exist, are local fields (in the limit $g \to$ constant this is Bogoliubov’s definition of interacting fields) [6].

We now introduce local algebras of observables by assigning to a region $\mathcal O$ of Minkowski space the ∗-algebra $\mathcal A_g(\mathcal O)$ which is generated by $\{S_g(h),\ h \in \mathcal D(\mathcal O, \mathcal V)\}$. A remarkable consequence of relation (15) is that the structure of the algebra $\mathcal A_g(\mathcal O)$ depends only locally on $g$ [16, 7], namely, if $g \equiv g'$ in a neighbourhood of a causally closed region containing $\mathcal O$, then there exists a unitary $V \in \mathcal A$ such that

$$V S_g(h) V^{-1} = S_{g'}(h), \qquad \forall\, h \in \mathcal D(\mathcal O, \mathcal V). \tag{18}$$
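The two statements made about the relative S-matrices in this section, that they again solve (15) and that they commute at spacelike separation (17), follow from (15) and the definition $S_g(f) = S(g)^{-1}S(g+f)$ in a few lines. The following derivation is a sketch of that step; the auxiliary test function $k$ is introduced here only for illustration:

```latex
% Causality for S_g: let (supp f + \bar V_+) \cap supp h = \emptyset and
% apply (15) with the middle function g replaced by g + k.
\begin{align*}
S_g(f+k+h) &= S(g)^{-1}\, S(f + (g+k) + h) \\
           &= S(g)^{-1} S(g+k+f)\; S(g+k)^{-1} S(g)\; S(g)^{-1} S(g+k+h) \\
           &= S_g(f+k)\, S_g(k)^{-1}\, S_g(k+h).
\end{align*}
% Local commutativity: if supp f and supp h are spacelike separated, the
% support condition of (15) holds with f and h in either role, hence
\begin{align*}
S_g(f)\,S_g(h) &= S(g)^{-1} S(f+g)\, S(g)^{-1} S(g+h) = S(g)^{-1} S(f+g+h) \\
               &= S(g)^{-1} S(h+g)\, S(g)^{-1} S(g+f) = S_g(h)\,S_g(f).
\end{align*}
```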
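Hence the system of local algebras of observables (according to the principles of algebraic quantum field theory this system (“the local net”) contains the full physical content of a quantum field theory) is completely determined if one knows the relative S-matrices for test functions $g \in \mathcal D(\mathbb R^4, \mathcal V)$.

The construction of the global algebra of observables for an interaction Lagrangian $\mathcal L \in \mathcal V$ may be performed explicitly (cf. [7]). Let $\Theta(\mathcal O)$ be the set of all functions $\theta \in \mathcal D(\mathbb R^4)$ which are identically equal to 1 in a causally closed open neighbourhood of $\mathcal O$ and consider the bundle

$$\bigcup_{\theta \in \Theta(\mathcal O)} \{\theta\} \times \mathcal A_{\theta\mathcal L}(\mathcal O). \tag{19}$$

Let $\mathcal U(\theta, \theta')$ be the set of all unitaries $V \in \mathcal A$ with

$$V S_{\theta\mathcal L}(h) = S_{\theta'\mathcal L}(h) V, \qquad \forall\, h \in \mathcal D(\mathcal O, \mathcal V). \tag{20}$$

Then $\mathcal A_{\mathcal L}(\mathcal O)$ is defined as the algebra of covariantly constant sections, i.e.

$$\mathcal A_{\mathcal L}(\mathcal O) \ni A = (A_\theta)_{\theta \in \Theta(\mathcal O)}, \qquad (A_\theta \in \mathcal A_{\theta\mathcal L}(\mathcal O)), \tag{21}$$
$$V A_\theta = A_{\theta'} V, \qquad \forall\, V \in \mathcal U(\theta, \theta'). \tag{22}$$

$\mathcal A_{\mathcal L}(\mathcal O)$ contains in particular the elements $S_{\mathcal L}(h)$,

$$(S_{\mathcal L}(h))_\theta = S_{\theta\mathcal L}(h). \tag{23}$$

The construction of the local net is completed by fixing the embeddings $i_{21} : \mathcal A_{\mathcal L}(\mathcal O_1) \hookrightarrow \mathcal A_{\mathcal L}(\mathcal O_2)$ for $\mathcal O_1 \subset \mathcal O_2$. But these embeddings are inherited from the inclusions $\mathcal A_{\theta\mathcal L}(\mathcal O_1) \subset \mathcal A_{\theta\mathcal L}(\mathcal O_2)$ for $\theta \in \Theta(\mathcal O_2)$ by restricting the sections from $\Theta(\mathcal O_1)$ to $\Theta(\mathcal O_2)$. The embeddings evidently satisfy the compatibility relation $i_{12} \circ i_{23} = i_{13}$ for $\mathcal O_3 \subset \mathcal O_2 \subset \mathcal O_1$ and define thus an inductive system. Therefore, the global algebra can be defined as the inductive limit of local algebras

$$\mathcal A_{\mathcal L} \overset{\mathrm{def}}{=} \bigcup_{\mathcal O} \mathcal A_{\mathcal L}(\mathcal O). \tag{24}$$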
In perturbation theory, the unitaries $V \in \mathcal U(\theta, \theta')$ are themselves formal power series, therefore it makes no sense to say that two elements $A, B \in \mathcal A_{\mathcal L}(\mathcal O)$ agree in $n$th order, but only that they agree up to $n$th order (because $(A_{\theta'} - B_{\theta'}) = O(g^{n+1})$ implies $A_\theta - B_\theta = V^{-1}(A_{\theta'} - B_{\theta'})V = O(g^{n+1})$).

The time ordered products and hence the relative S-matrices $S_{\theta\mathcal L}(h)$ are chosen so as to satisfy Poincaré covariance (see the normalization condition N1 below), i.e. the unitary positive energy representation $U$ of the Poincaré group $\mathcal P_+^\uparrow$ under which the free field transforms satisfies

$$U(L)\, S_{\theta\mathcal L}(h)\, U(L)^{-1} = S_{\theta_L \mathcal L}(h_L), \qquad \theta_L(x) := \theta(L^{-1}x), \quad h_L(x) := D(L)\,h(L^{-1}x), \tag{25}$$

$\forall L \in \mathcal P_+^\uparrow$, provided $\mathcal L$ is a Lorentz scalar and $\mathcal V$ transforms under the finite dimensional representation $D$ of the Lorentz group. This enables us to define an automorphic action of the Poincaré group on the algebra of observables. Let for $A \in \mathcal A_{\mathcal L}(\mathcal O)$, $\theta \in \Theta(L\mathcal O)$

$$(\alpha_L(A))_\theta \overset{\mathrm{def}}{=} U(L)\, A_{\theta_{L^{-1}}}\, U(L)^{-1}. \tag{26}$$

By inserting the definitions one finds that $\alpha_L(A)$ is again a covariantly constant section (22). So $\alpha_L$ is an automorphism of the net which realizes the Poincaré symmetry

$$\alpha_L\, \mathcal A_{\mathcal L}(\mathcal O) = \mathcal A_{\mathcal L}(L\mathcal O), \qquad \alpha_{L_1 L_2} = \alpha_{L_1}\alpha_{L_2}. \tag{27}$$
For the purposes of perturbation theory, we have to enlarge the local algebras somewhat. In perturbation theory, the relative S-matrices are formal power series in two variables, and therefore the generators of the local algebras

$$S_{\mathcal L}(\lambda f) = \sum_{n=0}^{\infty} \frac{i^n \lambda^n}{n!}\, T_{\mathcal L}(f^{\otimes n}) \tag{28}$$

are formal power series with coefficients which are covariantly constant sections in the sense of (22). The first order terms in (28) are, according to Bogoliubov, the interacting local fields,

$$T_{\mathcal L}(hA) =: A_{\mathcal L}(h), \qquad A \in \mathcal V,\ h \in \mathcal D(\mathbb R^4), \tag{29}$$

the higher order terms satisfy the causality condition (11) and may therefore be interpreted as time ordered products of interacting fields (cf. [14], Sect. 8.1). Our enlarged local algebra $\mathcal A_{\mathcal L}(\mathcal O)$ (we use the same symbol as before) now consists of all formal power series with coefficients from the algebra generated by all time ordered products $T_{\mathcal L}(f^{\otimes n})$ with $f \in \mathcal D(\mathcal O, \mathcal V)$, $n \in \mathbb N_0$.

4. Consequences of Causality

Another consequence of the causality relation (15) is that the S-matrices $S(f)$ are uniquely fixed if they are known for test functions with arbitrarily small supports. Namely, by a repeated use of (15) we find that $S(\sum_{i=1}^n f_i)$ is a product of factors $S(\sum_{i \in K} f_i)^{\pm 1}$, where the sets $K \subset \{1,\dots,n\}$ have the property that for every pair $i, j \in K$ the causal closures of $\operatorname{supp} f_i$ and $\operatorname{supp} f_j$ overlap. Hence if the supports of all $f_i$ are contained in double cones of diameter $d$, the supports of $\sum_{i \in K} f_i$ fit into double cones of diameter
$2d$. As $d > 0$ can be chosen arbitrarily small and the relative S-matrices also satisfy (15), this implies additivity of the net,

$$\mathcal A_{\mathcal L}(\mathcal O) = \bigvee_{\alpha} \mathcal A_{\mathcal L}(\mathcal O_\alpha), \tag{30}$$

where $(\mathcal O_\alpha)$ is an arbitrary covering of $\mathcal O$ and where the symbol $\bigvee$ means the generated algebra.

One might also pose the existence question: Suppose we have a family of unitaries $S(f)$ for all $f$ with sufficiently small support which satisfy the causality condition (15) for $f, g, h \in \mathcal D(\mathcal O, \mathcal V)$, $\operatorname{diam}(\mathcal O)$ sufficiently small, and local commutativity for arbitrarily big separation

$$[S(f), S(g)] = 0 \quad\text{if}\quad \operatorname{supp} f \ \text{is spacelike to}\ \operatorname{supp} g.$$

By repeated use of the causality (15) we can then define S-matrices for test functions with larger support. It is, however, not evident that these S-matrices are independent of the way of construction and that they satisfy the causality condition. (We found a consistent construction only in the simple case of one dimension: $x$ = time.)

Fortunately, a general positive answer can be given in perturbation theory. Let $S(f)$ be given for $f \in \mathcal D(\mathcal O, \mathcal V)$ for all double cones with $\operatorname{diam}(\mathcal O) < r$. The time ordered product of $n$ factors is the $n$-fold functional derivative of $S$ at $f = 0$. It is an operator valued distribution² $T_n$ defined on test functions of $n$ variables with support contained in $\mathcal U_n \overset{\mathrm{def}}{=} \{(y_1,\dots,y_n) \in \mathbb R^{4n} \mid \max_{i<j} |y_i - y_j| < 2r\}$ and with values in $\mathcal V^{\otimes n}$. In particular we know $T_1(x)$ on $\mathbb R^4$. On this domain the time ordered products satisfy the factorization condition (11). In addition, local commutativity of the S-matrices implies

$$[T_n(x_1,\dots,x_n), T_m(y_1,\dots,y_m)] = 0 \tag{31}$$

for $(x_i - y_j)^2 < 0$ $\forall (i,j)$ and $(x_1,\dots,x_n) \in \mathcal U_n$, $(y_1,\dots,y_m) \in \mathcal U_m$. By construction $T_n|_{\mathcal U_n}$ is symmetric with respect to permutations of the factors.

We now show that this input suffices to construct $T_n(x_1,\dots,x_n)$ on the whole $\mathbb R^{4n}$ by induction on $n$. We assume that the $T_k$’s were constructed for $k \le n-1$, that they fulfil causality (11) and

$$[T_m(x_1,\dots,x_m), T_k(y_1,\dots,y_k)] = 0 \quad\text{for}\quad (x_1,\dots,x_m) \in \mathcal U_m,\ k \le n-1 \tag{32}$$

($m$ arbitrary) and

$$[T_l(x_1,\dots,x_l), T_k(y_1,\dots,y_k)] = 0 \quad\text{for}\quad l, k \le n-1, \tag{33}$$

if $(x_i - y_j)^2 < 0$ $\forall (i,j)$ in the latter two equations. We can now proceed as in Sect. 4 of [7].³

² Here we change the notation for the time ordered products: let $f = \sum_i f_i(x) A_i$, $f_i \in \mathcal D(\mathbb R^4)$, $A_i \in \mathcal V$. Instead of $\int dx_1 \cdots dx_n \sum_{i_1 \dots i_n} T\bigl(A_{i_1}(x_1)\cdots A_{i_n}(x_n)\bigr) f_{i_1}(x_1)\cdots f_{i_n}(x_n)$ (10) we write $\int dx_1 \cdots dx_n\, T_n(x_1,\dots,x_n) f(x_1)\cdots f(x_n) \equiv T_n(f^{\otimes n})$.

³ In contrast to the (inductive) Epstein–Glaser construction of $T_n(x_1,\dots,x_n)$ [14, 7] the present construction is unique, normalization conditions (e.g. N1–N4 in Sect. 5) are not needed, because the non-uniqueness of the Epstein–Glaser construction is located at the total diagonal $\Delta_n \equiv \{(x_1,\dots,x_n) \mid x_1 = \cdots = x_n\}$. But here the time ordered products are given in the neighbourhood $\mathcal U_n$ of $\Delta_n$.
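Let $\mathcal J$ denote the family of all non-empty proper subsets $I$ of the index set $\{1,\dots,n\}$ and define the sets $C_I \overset{\mathrm{def}}{=} \{(x_1,\dots,x_n) \in \mathbb R^{4n} \mid x_i \notin J^-(x_j),\ i \in I,\ j \in I^c\}$ for any $I \in \mathcal J$. Then

$$\bigcup_{I \in \mathcal J} C_I \cup \mathcal U_n = \mathbb R^{4n}. \tag{34}$$

We use the short-hand notations

$$T^I(x_I) = T\Bigl(\prod_{i \in I} A_i(x_i)\Bigr), \qquad x_I = (x_i,\ i \in I). \tag{35}$$

On $\mathcal D(C_I)$ we set

$$T_I(x) \overset{\mathrm{def}}{=} T^I(x_I)\, T^{I^c}(x_{I^c}) \tag{36}$$

for any $I \in \mathcal J$. For $I_1, I_2 \in \mathcal J$, $C_{I_1} \cap C_{I_2} \neq \emptyset$ one easily verifies⁴

$$T_{I_1}\big|_{C_{I_1} \cap C_{I_2}} = T_{I_2}\big|_{C_{I_1} \cap C_{I_2}}. \tag{37}$$

Let now $\{f_I\}_{I \in \mathcal J} \cup \{f_0\}$ be a finite smooth partition of unity of $\mathbb R^{4n}$ subordinate to $\{C_I\}_{I \in \mathcal J} \cup \mathcal U_n$: $\operatorname{supp} f_I \subset C_I$, $\operatorname{supp} f_0 \subset \mathcal U_n$. Then we define

$$T_n(h) \overset{\mathrm{def}}{=} T_n\big|_{\mathcal U_n}(f_0 h) + \sum_{I \in \mathcal J} T_I(f_I h), \qquad h \in \mathcal D(\mathbb R^{4n}, \mathcal V^{\otimes n}). \tag{38}$$

As in [7] one may prove that this definition is independent of the choice of $\{f_I\}_{I \in \mathcal J} \cup \{f_0\}$ and that $T_n$ is symmetric with respect to permutations of the factors and satisfies causality (11). Local commutativity (32) and (33) (with $n-1$ replaced by $n$) is verified by inserting the definition (38) and using the assumptions. By (10) we obtain from the T-products the corresponding S-matrix $S(g)$ for arbitrarily large support of $g \in \mathcal D(\mathbb R^4, \mathcal V)$, and $S(g)$ satisfies the functional equation (15).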
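The functional equation (15), on which the whole section rests, can be illustrated in the one-dimensional case ($x$ = time) with a deliberately crude toy model. The sketch below is not the construction of the paper: it assumes discrete time slices, takes a "test function" to be a tuple with one integer per slice, and uses a hypothetical slice evolution `step` built from 2×2 integer matrices so that all arithmetic is exact. `S` is the time ordered product of the slice evolutions.

```python
# Toy model (an assumption for illustration only): discrete time slices,
# test functions are integer tuples, evolution operators are 2x2 matrices.

IDENTITY = ((1, 0), (0, 1))


def matmul(a, b):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def step(t, v):
    """Hypothetical evolution across slice t with coupling value v.

    Even and odd slices use upper and lower triangular matrices, so
    neighbouring slices do not commute. step(t, -v) inverts step(t, v).
    """
    return ((1, v), (0, 1)) if t % 2 == 0 else ((1, 0), (v, 1))


def S(g):
    """Time ordered product: later slices stand to the left."""
    out = IDENTITY
    for t, v in enumerate(g):
        out = matmul(step(t, v), out)
    return out


def S_inverse(g):
    """Inverse of S(g): inverse slice factors in the opposite order."""
    out = IDENTITY
    for t, v in enumerate(g):
        out = matmul(out, step(t, -v))
    return out


def add(*fs):
    """Pointwise sum of test functions."""
    return tuple(sum(vals) for vals in zip(*fs))


# h lives on slices 0..2, f on the later slices 3..5, g is arbitrary,
# so the future of supp f does not meet supp h.
h = (1, -2, 4, 0, 0, 0)
f = (0, 0, 0, 2, -1, 3)
g = (5, 1, -3, 2, 7, -4)

lhs = S(add(f, g, h))
rhs = matmul(matmul(S(add(f, g)), S_inverse(g)), S(add(g, h)))
print(lhs == rhs)
```

In this toy model the identity holds because the factors of `S(g)` on the early slices cancel between `S(f + g)` and `S(g)^{-1}`, and those on the late slices cancel between `S(g)^{-1}` and `S(g + h)`.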
5. Perturbative Quantization and Loop Expansion

Causal perturbation theory was traditionally formulated in terms of operator valued distributions on Fock space. It is therefore well suited for describing the deformation of the free field into an interacting field by turning on the interaction $g \in \mathcal D(\mathbb R^4, \mathcal V)$. It is much less clear how an expansion in powers of $\hbar$ can be performed, describing the deformation of the classical field theory, mainly because the Fock space has no classical counterpart. Usually the expansion in powers of $\hbar$ is done in functional approaches to field theory by ordering Feynman graphs according to loop number. In this section we show that the algebraic description provides a natural formulation of the loop expansion, and we point out the connection to formal quantization theory.

⁴ In contrast to [7] the Wick expansion of the T-products is not used here, because local commutativity of the T-products is contained in the inductive assumption.
5.1. Quantization of a free field and Wick products. In quantization theory one associates to a given classical theory a quantum theory. One procedure is the deformation (or star-product) quantization [2]. This procedure starts from a Poisson algebra, i.e. a commutative and associative algebra together with a second product: a Poisson bracket, satisfying the Leibniz rule and the Jacobi identity. One then deforms the product as a function of $\hbar$, such that⁵ $a \times_\hbar b$ is a formal power series in $\hbar$, the associativity is maintained and

$$a \times_\hbar b \ \xrightarrow{\ \hbar \to 0\ }\ ab, \qquad \frac{1}{\hbar}\,(a \times_\hbar b - b \times_\hbar a) \ \xrightarrow{\ \hbar \to 0\ }\ \{a, b\}. \tag{39}$$

Actually this scheme can easily be realized in free field theory (cf. [9]). Basic functions are the evaluation functionals $\varphi_{\mathrm{class}}(x)$, $(\Box + m^2)\varphi_{\mathrm{class}} = 0$, with the Poisson bracket

$$\{\varphi_{\mathrm{class}}(x), \varphi_{\mathrm{class}}(y)\} = \Delta(x - y) \tag{40}$$

($\Delta$ is the commutator function (2)). Because of the singular character of $\Delta$ the fields should be smoothed out in order to belong to the Poisson algebra. Hence our fundamental classical observables are

$$\phi(t) = t_0 + \sum_{n=1}^{N} \int \varphi_{\mathrm{class}}(x_1)\cdots\varphi_{\mathrm{class}}(x_n)\, t_n(x_1,\dots,x_n)\, dx_1 \cdots dx_n, \tag{41}$$

$t \equiv (t_0, t_1, \dots)$, where $t_0 \in \mathbb C$ arbitrary, $N < \infty$, $t_n$ is a suitable test “function” (we will admit also certain distributions) with compact support. The Klein Gordon equation shows up in the property: $A(t) = 0$ if $t_0 = 0$ and $t_n = (\Box_i + m^2) g_n$ for all $n > 0$, some $i = i(n)$ and some $g_n$ with compact support.

In the quantization procedure we identify $\varphi_{\mathrm{class}}(x_1)\cdots\varphi_{\mathrm{class}}(x_n)$ with the normally ordered product (Wick product) $:\!\varphi(x_1)\cdots\varphi(x_n)\!:$ ($\varphi$ is the free quantum field (3)–(6)). Wick’s theorem may be interpreted as the definition of a $\hbar$-dependent associative product,

$$:\!\prod_{i \in I}\varphi(x_i)\!: \ \times_\hbar\ :\!\prod_{j \in J}\varphi(x_j)\!: \ =\ \sum_{K \subset I}\ \sum_{\substack{\alpha : K \to J \\ \text{injective}}}\ \prod_{j \in K} i\hbar\,\Delta_+(x_j - x_{\alpha(j)})\ :\!\prod_{l \in (I \setminus K) \cup (J \setminus \alpha(K))}\varphi(x_l)\!: \tag{42}$$

in the linear space spanned by Wick products (the “Wick quantization”).⁶ To be precise we have to fix a suitable test function space (or better: test distribution space) in (41) which is small enough such that the product is well defined for all $\hbar$ and which contains the interesting cases occurring in perturbation theory, e.g. products of translation invariant distributions (particularly δ-distributions of difference variables) with test functions of compact support should be allowed for $t_n$ as in Theorem 0 of Epstein and Glaser.

⁵ The deformed product is called a ∗-product in deformation theory. In order to avoid confusion with the ∗-operation we denote the product by $\times_\hbar$.

⁶ The observation that the Wick quantization is appropriate for the quantization of the free field goes back to Dito [9].
Let

$$\mathcal W_n \overset{\mathrm{def}}{=} \{t \in \mathcal D'(\mathbb R^{4n})_{\mathrm{symm}},\ \operatorname{supp} t \ \text{compact},\ \mathrm{WF}(t) \cap (\mathbb R^{4n} \times (V_+^n \cup V_-^n)) = \emptyset\} \tag{43}$$

(see the Appendix for a definition of the wave front set WF of a distribution). In [7] it was shown that Wick polynomials smeared with distributions $t \in \mathcal W_n$,

$$(\varphi^{\otimes n})(t) \overset{\mathrm{def}}{=} \int :\!\varphi(x_1)\cdots\varphi(x_n)\!:\ t(x_1,\dots,x_n)\, dx_1 \cdots dx_n, \qquad (\varphi^{\otimes 0}) \overset{\mathrm{def}}{=} 1, \tag{44}$$

are densely defined operators on an invariant domain in Fock space. This includes in particular the Wick powers

$$:\!\varphi^n(f)\!: \ =\ (\varphi^{\otimes n})(t), \qquad f \in \mathcal D(\mathbb R^4), \quad t(x_1,\dots,x_n) = f(x_1)\prod_{i=2}^{n}\delta(x_i - x_1). \tag{45}$$

The product of two such operators is given by

$$(\varphi^{\otimes n})(t) \times_\hbar (\varphi^{\otimes m})(s) = \sum_{k=0}^{\min\{n,m\}} \hbar^k\, (\varphi^{\otimes(n+m-2k)})(t \otimes_k s) \tag{46}$$

with the $k$-times contracted tensor product

$$(t \otimes_k s)(x_1,\dots,x_{n+m-2k}) = \mathcal S\, \frac{n!\,m!\,i^k}{k!\,(n-k)!\,(m-k)!} \int dy_1 \cdots dy_{2k}\, \Delta_+(y_1 - y_2)\cdots\Delta_+(y_{2k-1} - y_{2k})\; t(x_1,\dots,x_{n-k}, y_1, y_3, \dots, y_{2k-1})\; s(x_{n-k+1},\dots,x_{n+m-2k}, y_2, y_4, \dots, y_{2k}) \tag{47}$$

($\mathcal S$ means the symmetrization in $x_1,\dots,x_{n+m-2k}$). The conditions on the wave front sets of $t$ and $s$ imply that the product $(t \otimes_k s)$ exists (see the Appendix) and is an element of $\mathcal W_{n+m-2k}$. The ∗-operation reduces to complex conjugation of the smearing function.

Let $\mathcal W_0 \overset{\mathrm{def}}{=} \mathbb C$ and $\mathcal W \overset{\mathrm{def}}{=} \bigoplus_{n=0}^{\infty} \mathcal W_n$. For $t \in \mathcal W$ let $t_n$ denote the component of $t$ in $\mathcal W_n$. The ∗-operation is defined by $(t^*)_n \overset{\mathrm{def}}{=} (\bar t_n)$. Equation (46) can be thought of as the definition of an associative product on $\mathcal W$,

$$(t \times_\hbar s)_n = \sum_{m+l-2k=n} \hbar^k\, t_m \otimes_k s_l. \tag{48}$$
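The combinatorial factor $n!\,m!/(k!\,(n-k)!\,(m-k)!)$ in (47) and the associativity claimed for (48) can be checked in a toy model with a single degree of freedom. The sketch below is an illustration under simplifying assumptions, not the field theoretic product: all spacetime dependence is dropped, the contraction $i\hbar\Delta_+$ is replaced by one formal symbol `kappa`, and the names `phi_power` and `wick_product` are hypothetical. An element is a dictionary mapping `(n, j)` to the coefficient of `kappa**j` times the n-th Wick power.

```python
from collections import defaultdict
from math import comb, factorial


def phi_power(n):
    """The n-th Wick power with coefficient 1 and no factor of kappa."""
    return {(n, 0): 1}


def wick_product(t, s):
    """Toy analogue of (46): contract k pairs, each costing one kappa."""
    out = defaultdict(int)
    for (n, a), ct in t.items():
        for (m, b), cs in s.items():
            for k in range(min(n, m) + 1):
                # n! m! / (k! (n-k)! (m-k)!) = C(n,k) * C(m,k) * k!
                # counts the ways of pairing k factors of t with k of s.
                weight = comb(n, k) * comb(m, k) * factorial(k)
                out[(n + m - 2 * k, a + b + k)] += ct * cs * weight
    return {key: c for key, c in out.items() if c != 0}


# phi x phi = :phi^2: + kappa
print(wick_product(phi_power(1), phi_power(1)))

# Associativity on a few triples of Wick powers.
for a, b, c in [(1, 1, 1), (2, 1, 2), (3, 2, 4)]:
    left = wick_product(wick_product(phi_power(a), phi_power(b)), phi_power(c))
    right = wick_product(phi_power(a), wick_product(phi_power(b), phi_power(c)))
    assert left == right
```

Because the toy model has only one variable and a symmetric contraction, its product is commutative, so it illustrates the counting and the associativity but not the commutator.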
The Klein–Gordon equation defines an ideal $\mathcal{N}$ in $\mathcal{W}$ which is generated by $(\Box + m^2) f$, $f \in \mathcal{D}(\mathbb{R}^4)$. Actually this ideal is independent of $\hbar$ (because a contraction with $(\Box + m^2) f$ vanishes) and coincides with the kernel of $\phi$ defined in (41). Hence the product (48) is well defined on the quotient space $\bar{\mathcal{W}} = \mathcal{W}/\mathcal{N}$. For a given positive value of $\hbar$, $\bar{\mathcal{W}}$ is isomorphic to the algebra generated by Wick products $(\varphi^{\otimes n})(t)$, $t \in \mathcal{W}_n$ (44). In the limit $\hbar \to 0$ we find
\[ \lim_{\hbar \to 0} \phi(t) \times_\hbar \phi(s) = \lim_{\hbar \to 0} \phi\Bigl(\sum_n \hbar^n\, t \otimes_n s\Bigr) = \phi(t \otimes_0 s) = \phi(t) \cdot \phi(s) \tag{49} \]
(we set $(t \otimes_k s)_n \overset{\mathrm{def}}{=} \sum_{m+l=n} t_{m+k} \otimes_k s_{l+k}$, cf. (47)), with the classical product $\cdot$, and
\[ \lim_{\hbar \to 0} \frac{1}{i\hbar} [\phi(t), \phi(s)]_\hbar = \phi(t \otimes_1 s - s \otimes_1 t) = \{\phi(t), \phi(s)\} \tag{50} \]
with the classical Poisson bracket. Thus $(\bar{\mathcal{W}}, \times_\hbar)$ provides a quantization of the given Poisson algebra of the classical free field $\varphi_{\mathrm{class}}$ (40). We point out that we have formulated the algebraic structure of smeared Wick products without using the Fock space. The Fock representation is recovered, via the GNS construction, from the vacuum state $\omega_0(t) = t_0$. It is faithful for $\hbar \neq 0$ but is one dimensional in the classical limit $\hbar = 0$. This illustrates the superiority of the algebraic point of view for a discussion of the classical limit.
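The structure of (46), (49) and (50) can be illustrated in a one-mode toy model. The sketch below is an assumption made for illustration and is not the field-theoretic construction of this section: polynomials in two commuting symbols a and b (standing in for an annihilation and a creation variable) are multiplied by the rule $f \star g = \sum_r \frac{\hbar^r}{r!}\, \partial_a^r f\, \partial_b^r g$, so that $a \star b - b \star a = \hbar$ (without the factor $i$ used in the text). The checks confirm that the order-$\hbar^0$ term is the commutative product and that the commutator starts at order $\hbar$ with the bracket $\partial_a f\, \partial_b g - \partial_b f\, \partial_a g$. All names are hypothetical.

```python
from collections import defaultdict
from math import comb, factorial

# A key (i, j, p) stands for the monomial a**i * b**j * hbar**p;
# the dictionary values are integer coefficients.

def star(f, g):
    """Toy product: f * g = sum_r hbar**r / r! * (d/da)**r f * (d/db)**r g."""
    out = defaultdict(int)
    for (i, j, p), cf in f.items():
        for (k, l, q), cg in g.items():
            for r in range(min(i, l) + 1):
                # r contractions: choose r factors a in f, r factors b in g, pair them
                weight = comb(i, r) * comb(l, r) * factorial(r)
                out[(i + k - r, j + l - r, p + q + r)] += cf * cg * weight
    return {key: c for key, c in out.items() if c}

def subtract(f, g):
    """Difference f - g of two polynomials."""
    out = defaultdict(int, f)
    for key, c in g.items():
        out[key] -= c
    return {key: c for key, c in out.items() if c}

def hbar_order(f, p):
    """Coefficient of hbar**p, as a polynomial {(i, j): coeff} in a and b."""
    return {(i, j): c for (i, j, q), c in f.items() if q == p}

def poisson(f, g):
    """Bracket d_a f * d_b g - d_b f * d_a g for hbar-independent f and g."""
    out = defaultdict(int)
    for (i, j, _), cf in f.items():
        for (k, l, _), cg in g.items():
            if i and l:
                out[(i + k - 1, j + l - 1)] += cf * cg * i * l
            if j and k:
                out[(i + k - 1, j + l - 1)] -= cf * cg * j * k
    return {key: c for key, c in out.items() if c}

f = {(2, 1, 0): 1, (0, 1, 0): 3}   # a^2 b + 3 b
g = {(1, 2, 0): 1}                 # a b^2
h = {(1, 0, 0): 1, (0, 2, 0): 1}   # a + b^2

commutator = subtract(star(f, g), star(g, f))
assert hbar_order(star(f, g), 0) == {(3, 3): 1, (1, 3): 3}  # classical product
assert hbar_order(commutator, 0) == {}                       # no hbar^0 term
assert hbar_order(commutator, 1) == poisson(f, g)            # leading term is the bracket
assert star(star(f, g), h) == star(f, star(g, h))            # associative on this sample
```

The design choice of tracking the power of $\hbar$ as part of the monomial key keeps everything in exact integer arithmetic, so the limit $\hbar \to 0$ is read off by selecting an order rather than by taking a numerical limit.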
5.2. Normalization conditions and retarded products. To study the perturbative quantization of interacting fields we need some technical tools which are given in this subsection. The time ordered products are constructed by induction on the number n of factors (which is also the order of the perturbation series (10)). In contrast to the inductive construction of the T-products in Sect. 4, we do not know $T_n|_{U_n}$ here. So causality (11) and symmetry determine the time ordered products uniquely (in terms of time ordered products of fewer factors) up to the total diagonal $\Delta_n = \{(x_1,\dots,x_n) \in \mathbb{R}^{4n} \mid x_1 = x_2 = \dots = x_n\}$. There is some freedom in the extension to $\Delta_n$. To restrict it we introduce the following additional defining conditions (“normalization conditions”, formulated for a scalar field without derivative coupling, i.e. $\mathcal{L}$ is a Wick polynomial solely in $\phi$, it does not contain derivatives of $\phi$; for the generalization to derivative couplings see [5])

N1 covariance with respect to Poincaré transformations and possibly discrete symmetries, in particular

N2 unitarity: $T(A_1(x_1) \dots A_n(x_n))^* = \bar{T}(A_1^*(x_1) \dots A_n^*(x_n))$,

N3 $[T(A_1(x_1) \dots A_n(x_n)), \phi(x)] = i\hbar \sum_{k=1}^{n} T\bigl(A_1(x_1) \dots \frac{\partial A_k}{\partial \phi}(x_k) \dots A_n(x_n)\bigr)\, \Delta(x_k - x)$,

N4 $(\Box_x + m^2)\, T(A_1(x_1) \dots A_n(x_n)\phi(x)) = -i\hbar \sum_{k=1}^{n} T\bigl(A_1(x_1) \dots \frac{\partial A_k}{\partial \phi}(x_k) \dots A_n(x_n)\bigr)\, \delta(x_k - x)$,

where $[\phi(x), \phi(y)] = i\hbar\, \Delta(x - y)$. N1 implies covariance of the arising theory, and N2 provides a ∗-structure. N3 gives the relation to time ordered products of sub Wick polynomials. Once these are known (in an inductive procedure), only a scalar distribution has to be fixed. Due to translation invariance the latter depends only on the relative coordinates. Hence, the extension of the (operator valued) T-product to $\Delta_n$ is reduced to the extension of a C-number distribution $t_0 \in \mathcal{D}'(\mathbb{R}^{4(n-1)} \setminus \{0\})$ to $t \in \mathcal{D}'(\mathbb{R}^{4(n-1)})$. (We call $t$ an extension of $t_0$ if $t(f) = t_0(f)$, $\forall f \in \mathcal{D}(\mathbb{R}^{4(n-1)} \setminus \{0\})$.)
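The singularity of $t_0(y)$ and $t(y)$ at $y = 0$ is classified in terms of Steinmann’s scaling degree [27, 7]
\[ \mathrm{sd}(t) \overset{\mathrm{def}}{=} \inf\{\delta \in \mathbb{R},\ \lim_{\lambda \to 0} \lambda^{\delta}\, t(\lambda x) = 0\}. \tag{51} \]
By definition $\mathrm{sd}(t_0) \le \mathrm{sd}(t)$, and the possible extensions are restricted by requiring
\[ \mathrm{sd}(t_0) = \mathrm{sd}(t). \tag{52} \]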
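To make definition (51) concrete, the following Python sketch looks at a one-dimensional stand-in with a pure power-law singularity, $t_0(y) = |y|^{-3}$, for which $\lambda^\delta t_0(\lambda y) = \lambda^{\delta - 3} t_0(y)$ and hence the scaling degree is 3. This is only a heuristic for ordinary functions, under the assumption of power-law behaviour near the origin; in the text the limit is meant in the sense of distributions on $\mathbb{R}^{4(n-1)}$. The helper names are hypothetical.

```python
import math

def rescaled(t, delta, lam, x=1.0):
    """The quantity lam**delta * t(lam * x) entering definition (51)."""
    return lam ** delta * t(lam * x)

def scaling_exponent(t, x=1.0, lam_hi=1e-2, lam_lo=1e-4):
    """Estimate a with t(lam * x) ~ lam**(-a) as lam -> 0 from a log-log slope."""
    num = math.log(abs(t(lam_lo * x))) - math.log(abs(t(lam_hi * x)))
    den = math.log(lam_lo) - math.log(lam_hi)
    return -num / den

def t0(y):
    """Hypothetical test case: a pure power-law singularity at y = 0."""
    return abs(y) ** -3.0

# Above the scaling degree the rescaled values shrink as lam -> 0,
# below it they grow.
assert rescaled(t0, 3.5, 1e-8) < rescaled(t0, 3.5, 1e-4)
assert rescaled(t0, 2.5, 1e-8) > rescaled(t0, 2.5, 1e-4)
assert abs(scaling_exponent(t0) - 3.0) < 1e-9
```

By the same scaling argument, $\delta(\lambda y) = \lambda^{-d} \delta(y)$ in $d$ dimensions, so the δ-distribution has scaling degree $d$ and each derivative raises it by one.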
Then the extension is unique for $\mathrm{sd}(t_0) < 4(n-1)$, and in the general case there remains the freedom to add derivatives of the δ-distribution up to order $(\mathrm{sd}(t_0) - 4(n-1))$, i.e.
\[ t(y) + \sum_{|a| \le \mathrm{sd}(t_0) - 4(n-1)} C_a\, \partial^a \delta(y) \tag{53} \]
is the general solution, where $t$ is a special extension [7, 24, 14], and the constants $C_a$ are restricted by N1, N2, N4, permutation symmetries and possibly further normalization conditions, e.g. the Ward identities for QED [10, 5]. For an interaction with mass dimension $\dim(\mathcal{L}) \le 4$ the requirement (52) implies renormalizability by power counting, i.e. the number of indeterminate constants $C_a$ does not increase by going over to higher perturbative orders.

In [10] it is shown that the normalization condition N4 implies the field equation for the interacting field corresponding to the free field $\phi$ (see also (77) and Sect. 6.1 below).

We have defined the interacting fields as functional derivatives of relative S-matrices (29). Hence, to formulate the perturbation series of interacting fields we need the perturbative expansion of the relative S-matrices:
\[ S_g(f) = \sum_{n,m} \frac{i^{n+m}}{n!\,m!}\, R_{n,m}(g^{\otimes n}; f^{\otimes m}), \tag{54} \]
where $g, f \in \mathcal{D}(\mathbb{R}^4, \mathcal{V})$. The coefficients are the so-called retarded products (“R-products”). They can be expressed in terms of time ordered and anti-time ordered products by
\[ R_{n,m}(g^{\otimes n}; f^{\otimes m}) = \sum_{k=0}^{n} (-1)^k \frac{n!}{k!\,(n-k)!}\, \bar{T}_k(g^{\otimes k}) \times_\hbar T_{n-k+m}(g^{\otimes(n-k)} \otimes f^{\otimes m}). \tag{55} \]
They vanish if one of the first n arguments is not in the past light cone of some of the last m arguments ([14], Sect. 8.1),
\[ \operatorname{supp} R_{n,m}(\dots) \subset \{(y_1,\dots,y_n,x_1,\dots,x_m),\ \{y_1,\dots,y_n\} \subset (\{x_1,\dots,x_m\} + \bar{V}_-)\}. \tag{56} \]
In the remaining part of this subsection we show that the time ordered products can be defined in such a way that $R_{n,m}$ is of order $\hbar^n$. For this purpose we will introduce the connected part $(a_1 \times_\hbar \dots \times_\hbar a_n)^c$ of $(a_1 \times_\hbar \dots \times_\hbar a_n)$, where the $a_i$ are normally ordered products of free fields, and the connected part $T_n^c$ of the time ordered product $T_n$ (or “truncated time ordered product”). In both cases the connected part corresponds to the sum of connected diagrams, provided the vertices belonging to the same $a_i$ are identified. Besides the (deformed) product $\times_\hbar$ (42)
\[ a \times_\hbar b = \sum_{n \ge 0} \hbar^n M_n(a, b), \tag{57} \]
where $a, b$ are normally ordered products of free fields, we have the classical product $a \cdot b = M_0(a, b)$, which is just the Wick product
\[ :\prod_{i \in I} \varphi(x_i): \;\cdot\; :\prod_{j \in J} \varphi(x_j): \;=\; :\prod_{i \in I} \varphi(x_i) \prod_{j \in J} \varphi(x_j): \tag{58} \]
and which is also associative and in addition commutative. Then we define $(a_1 \times_\hbar \dots \times_\hbar a_n)^c$ recursively by
\[ (a_1 \times_\hbar \dots \times_\hbar a_n)^c \overset{\mathrm{def}}{=} (a_1 \times_\hbar \dots \times_\hbar a_n) - \sum_{|P| \ge 2}\ \prod_{J \in P} (a_{j_1} \times_\hbar \dots \times_\hbar a_{j_{|J|}})^c, \tag{59} \]
where $\{j_1,\dots,j_{|J|}\} = J$, $j_1 < \dots < j_{|J|}$, the sum runs over all partitions $P$ of $\{1,\dots,n\}$ in at least two subsets and $\prod$ means the classical product (58). $T_n^c$ is defined analogously
\[ T_n^c(f_1 \otimes \dots \otimes f_n) \overset{\mathrm{def}}{=} T_n(f_1 \otimes \dots \otimes f_n) - \sum_{|P| \ge 2}\ \prod_{p \in P} T^c_{|p|}(\otimes_{j \in p} f_j), \tag{60} \]
and similarly we introduce the connected antichronological product $\bar{T}_n^c \equiv (\bar{T}_n)^c$.

Proposition 1. Let the normally ordered products of free fields $a_1,\dots,a_n$ be of order $O(\hbar^0)$. Then
\[ (a_1 \times_\hbar \dots \times_\hbar a_n)^c = O(\hbar^{n-1}). \tag{61} \]
Proof. We identify the vertices belonging to the same $a_i$ and apply Wick’s theorem (42) to $a_1 \times_\hbar \dots \times_\hbar a_n$. Each “contraction” (i.e. each factor $\Delta_+$) is accompanied by a factor $\hbar$. In the terms $\sim \hbar^0$ (i.e. without any contraction) $a_1,\dots,a_n$ are completely disconnected, the number of connected components is n. By a contraction this number is reduced by 1 or 0. So to obtain a connected term we need at least $(n-1)$ contractions. Hence the connected terms are of order $O(\hbar^{n-1})$. □

Let $\mathcal{B} \ni A_1,\dots,A_n = O(\hbar^0)$ and $x_i \neq x_j$, $\forall\, 1 \le i < j \le n$. Then there exists a permutation $\pi \in S_n$ such that
\[ T^c(A_1(x_1) \dots A_n(x_n)) = (A_{\pi 1}(x_{\pi 1}) \times_\hbar \dots \times_\hbar A_{\pi n}(x_{\pi n}))^c = O(\hbar^{n-1}). \tag{62} \]
We want this estimate to hold true also for coinciding points
\[ T^c(A_1(x_1) \dots A_n(x_n)) = O(\hbar^{n-1}) \quad \text{on } \mathcal{D}(\mathbb{R}^{4n}). \tag{63} \]
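The counting step in the proof of Proposition 1 (a diagram on n vertices needs at least n − 1 contractions to be connected) can be verified by brute force for small n. The following Python sketch treats each $a_i$ as one vertex and each contraction as an edge; it is an illustration under that identification, with hypothetical function names. Only simple graphs are enumerated, which suffices because repeated lines between the same pair of vertices do not change connectivity.

```python
from itertools import combinations

def num_components(n, edges):
    """Number of connected components of a graph on vertices 0..n-1 (union-find)."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in range(n)})

def min_contractions_to_connect(n):
    """Smallest number of edges for which some graph on n vertices is connected."""
    pairs = list(combinations(range(n), 2))
    for k in range(len(pairs) + 1):
        if any(num_components(n, es) == 1 for es in combinations(pairs, k)):
            return k
    return None

# The minimal number of contractions is n - 1, matching the order hbar^(n-1) in (61).
for n in range(1, 6):
    assert min_contractions_to_connect(n) == n - 1
```

Each added edge merges at most two components, which is the statement "by a contraction this number is reduced by 1 or 0" used in the proof.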
By the following argument this can indeed be satisfied by appropriate normalization of the time ordered products, i.e. (63) is an additional normalization condition, which is compatible with N1–N4. We proceed by induction on the number n of factors. Let us assume that the $T^c$-products with less than n factors fulfil (63) and that we are away from the total diagonal $\Delta_n$. Using causal factorization, (60) and the shorthand notation $T(J) := T(\prod_{j \in J} A_j(x_j))$, $J \subset \{1,\dots,n\}$, we then know that there exists $I \subset \{1,\dots,n\}$, $I \neq \emptyset$, $I^c \neq \emptyset$, with
\[ T(A_1(x_1) \dots A_n(x_n)) = T(I) \times_\hbar T(I^c) = \sum_{r=1}^{|I|} \sum_{s=1}^{|I^c|}\ \sum_{I_1 \sqcup \dots \sqcup I_r = I}\ \sum_{J_1 \sqcup \dots \sqcup J_s = I^c}\ \sum_{k \ge 0} \hbar^k M_k\bigl(T^c(I_1) \cdot \ldots \cdot T^c(I_r),\ T^c(J_1) \cdot \ldots \cdot T^c(J_s)\bigr), \tag{64} \]
where $\sqcup$ means the disjoint union. We now pick out the connected diagrams. The term $k = 0$ on the r.h.s. has $(r + s)$ disconnected components. Analogously to Proposition 1
we conclude that $k \ge (r + s - 1)$ must hold for a connected diagram. Taking the validity of (63) for $T^c(I_l)$ and $T^c(J_m)$ into account, we obtain $\sum_{l=1}^{r} (|I_l| - 1) + \sum_{m=1}^{s} (|J_m| - 1) + (r + s - 1) = n - 1$ for the minimal order in $\hbar$ of a connected diagram. So the $\hbar$-power behaviour (62) holds true on $\mathcal{D}(\mathbb{R}^{4n} \setminus \Delta_n)$, and (63) is in fact a normalization condition.

Due to (60) $(T_n - T_n^c)$ is completely given by time ordered products of lower orders $< n$ and hence is known also on $\Delta_n$. The problem of extending $T_n$ to $\Delta_n$ concerns solely $T_n^c$. The normalization conditions N1–N4 are equivalent to the same conditions for $T_n^c$ and $\bar{T}_n^c$ (i.e. $T_n$ and $\bar{T}_n$ everywhere replaced by $T_n^c$ and $\bar{T}_n^c$). Due to N3–N4 there remains only the extension of $\langle \Omega, T^c(A_1 \dots A_n)\, \Omega \rangle$, where all $A_j$ are different from free fields and $\Omega$ is the vacuum. It is obvious that this can be done in a way which maintains (63) and is in accordance with N1–N2.

We emphasize that the (ordinary) time ordered product $T_n$ does not satisfy (63) because of the presence of disconnected diagrams. On the other hand the connected antichronological product $\bar{T}_n^c$ fulfills the estimate (63), as may be seen by unitarity N2. We now turn to the retarded products (55):

Proposition 2. Let $\mathcal{D}(\mathbb{R}^4, \mathcal{V}) \ni f_j, g_k = O(\hbar^0)$. Then the following statements hold true:
(i) All diagrams which contribute to $R_{n,m}(f_1 \otimes \dots \otimes f_n; g_1 \otimes \dots \otimes g_m)$ have the property that each $f_j$-vertex is connected with at least one $g_k$-vertex.
(ii) $R_{n,m}(f_1 \otimes \dots \otimes f_n; g_1 \otimes \dots \otimes g_m) = O(\hbar^n)$.

Proof. (i) We work with the notation $R_{n,m}(Y; X)$, $Y \equiv \{y_1,\dots,y_n\}$, $X \equiv \{x_1,\dots,x_m\}$ (cf. [14]), and consider a subdiagram with vertices $J \subset Y$ which is not connected with the other vertices $(Y \setminus J) \cup X$. Because disconnected diagrams factorize with respect to the classical product (58), the corresponding contribution to $R_{n,m}(Y; X)$ (55) reads
\[ \sum_{I \subset Y} (-1)^{|I|} \bigl(\bar{T}(I \cap J^c)\, \bar{T}(I \cap J)\bigr) \times_\hbar \bigl(T(I^c \cap J)\, T(I^c \cap J^c, X)\bigr). \tag{65} \]
However, this expression vanishes due to $\sum_{P \subset J} (-1)^{|P|}\, \bar{T}(P) \times_\hbar T(J \setminus P) = 0$ (the latter equation is equivalent to (13), it is the perturbative version of $S^{-1} S = 1$). Hence for non-vanishing diagrams $J$ must be the empty set.

(ii) We express the R-product in terms of the connected T- and $\bar{T}$-products
\[ R_{n,m}(f_1 \otimes \dots \otimes f_n; g_1 \otimes \dots \otimes g_m) = \sum_{I \subset \{1,\dots,n\}} (-1)^{|I|} \sum_{P \in \mathrm{Part}(I)}\ \sum_{Q \in \mathrm{Part}(I^c \sqcup \{1,\dots,m\})} \Bigl(\prod_{p \in P} \bar{T}^c_{|p|}(\otimes_{i \in p} f_i)\Bigr) \times_\hbar \Bigl(\prod_{q \in Q} T^c_{|q|}(\otimes_{i \in q} f_i \otimes \otimes_{j \in q} g_j)\Bigr), \tag{66} \]
where again $\prod$ means the classical product (58) and $\sqcup$ stands again for the disjoint union. From (63) we know
\[ \prod_{p \in P} \bar{T}^c_{|p|}(\otimes_{i \in p} f_i) = O(\hbar^{|I| - |P|}), \qquad \prod_{q \in Q} T^c_{|q|}(\otimes_{i \in q} f_i \otimes \otimes_{j \in q} g_j) = O(\hbar^{|I^c| + m - |Q|}). \tag{67} \]
Algebraic Quantum Field Theory, Perturbation Theory, Loop Expansion
From part (i) we conclude that the terms of lowest order (in $\hbar$) in
\[ \Bigl(\prod_{p \in P} \bar{T}^c_{|p|}(\dots)\Bigr) \times_\hbar \Bigl(\prod_{q \in Q} T^c_{|q|}(\dots)\Bigr) = \sum_{n \ge 0} \hbar^n M_n\Bigl(\prod_{p \in P} \bar{T}^c_{|p|}(\dots),\ \prod_{q \in Q} T^c_{|q|}(\dots)\Bigr) \tag{68} \]
do not contribute. For simplicity we first consider the special case $m = 1$. Then only connected diagrams contribute. Hence we obtain $n \ge |P| + |Q| - 1$ similarly to the reasoning after (64). For arbitrary $m \ge 1$ the terms with minimal power in $\hbar$ correspond to diagrams which are maximally disconnected. According to part (i) these diagrams have m disconnected components, each component containing precisely one vertex $g_j$. Applying the $m = 1$ argument to each of these components we get $n \ge |P| + |Q| - m$. Taking (67) into account we obtain the assertion: $(|I| - |P|) + (|I^c| + m - |Q|) + (|P| + |Q| - m) = n$. □

5.3. Interacting fields. We first describe the perturbative construction of the interacting classical field. Let $\mathcal{L}$ be a function of the field which serves as the interaction Lagrangian (for simplicity, we do not consider derivative couplings). We want to find a Poisson algebra generated by a solution of the field equation
\[ (\Box + m^2)\, \varphi_{\mathcal{L}}(x) = -\Bigl(\frac{\partial \mathcal{L}}{\partial \varphi}\Bigr)_{\mathcal{L}}(x), \tag{69} \]
with the initial conditions
\[ \{\varphi_{\mathcal{L}}(0, \mathbf{x}), \varphi_{\mathcal{L}}(0, \mathbf{y})\} = 0 = \{\dot\varphi_{\mathcal{L}}(0, \mathbf{x}), \dot\varphi_{\mathcal{L}}(0, \mathbf{y})\}, \qquad \{\varphi_{\mathcal{L}}(0, \mathbf{x}), \dot\varphi_{\mathcal{L}}(0, \mathbf{y})\} = \delta(\mathbf{x} - \mathbf{y}). \tag{70} \]
We proceed in analogy to the construction of the interacting quantum field in Sect. 3 and construct in a first step solutions with localized interactions $\theta\mathcal{L}$ with $\theta \in \mathcal{D}(\mathbb{R}^4)$ which coincide at early times with the free field (hence the initial conditions (70) are trivially satisfied for sufficiently early times). They are given by a formal power series in the Poisson algebra of the free field
\[ \varphi_{\theta\mathcal{L}}(x) = \sum_{n=0}^{\infty} \int_{y_1^0 \le y_2^0 \le \dots \le y_n^0 \le x^0} dy_1\, dy_2 \dots dy_n\; \theta(y_1) \dots \theta(y_n)\, \{\mathcal{L}(y_1), \{\mathcal{L}(y_2), \dots \{\mathcal{L}(y_n), \varphi(x)\} \dots \}\}. \tag{71} \]
Analogous to the quantum case, the structure of the Poisson algebra associated to a causally closed region $\mathcal{O}$ does not depend on the behaviour of the interaction Lagrangian outside of $\mathcal{O}$, i.e. there is, for $\theta, \theta' \in \Theta(\mathcal{O})$, a canonical transformation $v$ with $v(\varphi_{\theta\mathcal{L}}(x)) = \varphi_{\theta'\mathcal{L}}(x)$ for all $x \in \mathcal{O}$. The interacting field $\varphi_{\mathcal{L}}$ may then be defined as a covariantly constant section within a bundle of Poisson algebras.

Starting from the classical interacting field, one may try to define the quantized interacting field by replacing products of free classical fields by the normally ordered product of the corresponding free quantum fields (as in Sect. 5.1) and the Poisson brackets in (71) by commutators
\[ \{\cdot\,, \cdot\} \to \frac{1}{i\hbar} [\cdot\,, \cdot]_\hbar, \tag{72} \]
where the commutator refers to the quantized product $\times_\hbar$. Note that in general this replacement produces additional terms, e.g. the terms $k \ge 2$ in
\[ \frac{1}{i\hbar} [:\varphi^n(x):, :\varphi^m(y):]_\hbar = \sum_{k=1}^{\min\{n,m\}} (i\hbar)^{k-1} \frac{n!\,m!}{k!\,(n-k)!\,(m-k)!} \bigl(\Delta_+(x-y)^k - \Delta_+(y-x)^k\bigr) :\varphi^{(n-k)}(x)\, \varphi^{(m-k)}(y): \tag{73} \]
which correspond to loop diagrams. Due to the distributional character of the fields with respect to the quantized product the integral in (71), as it stands, is not well defined (there is an ambiguity for coinciding points due to the time ordering). But as we will see Bogoliubov’s formula (29) for the interacting quantum field as a functional derivative of the relative S-matrix may be interpreted as a precise version of this integral.

From the factorization property (11), (14) of time ordered and anti-time ordered products, one gets the following recursion formula for the retarded products ((54), (55)): if supp g is contained in the past and supp f, supp h in the future of some Cauchy surface, we find
\[ R_{n+1,m}(g \otimes h^{\otimes n}; f^{\otimes m}) = -[T_1(g), R_{n,m}(h^{\otimes n}; f^{\otimes m})]_\hbar, \tag{74} \]
where we used the fact that $\bar{T}_1 = T_1$. Hence, for $m = 1$ and $y_i \neq y_j$ $\forall i \neq j$ the retarded product $R_{n,1}(y_1,\dots,y_n; x)$ can be written in the form⁷
\[ R\bigl(\mathcal{L}(y_1) \dots \mathcal{L}(y_n); \varphi(x)\bigr) = (-1)^n \sum_{\pi \in S_n} \Theta(x^0 - y^0_{\pi n})\, \Theta(y^0_{\pi n} - y^0_{\pi(n-1)}) \dots \Theta(y^0_{\pi 2} - y^0_{\pi 1})\, [\mathcal{L}(y_{\pi 1}), [\mathcal{L}(y_{\pi 2}) \dots [\mathcal{L}(y_{\pi n}), \varphi(x)]_\hbar \dots ]_\hbar]_\hbar. \tag{75} \]
(Due to the locality of the interaction $\mathcal{L}$ this is a Poincaré covariant expression.) This formula confirms part (ii) of Proposition 2 for non-coinciding $y_i$.

Our main application of (75) is the study of the classical limit $\hbar \to 0$ of the quantized interacting field (29). Due to Proposition 2 (part (ii)) $R\bigl(\hbar^{-1}\mathcal{L}(y_1) \dots \hbar^{-1}\mathcal{L}(y_n); \varphi(x)\bigr)$ contains no terms with negative powers of $\hbar$ and thus has a well-defined classical limit. We conclude that the quantized interacting field (29), (54)
\[ \varphi_{\theta\mathcal{L}}(h) = \sum_{n=0}^{\infty} \frac{i^n}{n!\,\hbar^n}\, R_{n,1}((\theta\mathcal{L})^{\otimes n}; h\varphi), \qquad h \in \mathcal{D}(\mathbb{R}^4), \tag{76} \]
tends to the classical interacting field (71) in this limit. Note that the factor $\hbar^{-1}$ in the interaction Lagrangian is in accordance with the quantization rule (72), since in (75) there is for each factor $\mathcal{L}$ precisely one commutator. In $R_{n,1}((\theta\mathcal{L})^{\otimes n}; f\varphi)$ the above mentioned ambiguities for coinciding points in the iterated retarded commutators have been fixed by the definition of time ordered products as everywhere defined distributions.

The normalization condition N4 implies an analogous equation for the retarded product $R_{n,1}$ (cf. [10]). The latter means that $\varphi_{\mathcal{L}}$ (76) satisfies the same field equation as the classical interacting field (69)
\[ (\Box + m^2)\, \varphi_{\mathcal{L}}(x) = -\Bigl(\frac{\partial \mathcal{L}}{\partial \varphi}\Bigr)_{\mathcal{L}}(x). \tag{77} \]

⁷ The notation for the time ordered products introduced in Sect. 2 is used here for the retarded products.
Here $\bigl(\frac{\partial \mathcal{L}}{\partial \varphi}\bigr)_{\mathcal{L}}$ is not necessarily a polynomial in $\varphi_{\mathcal{L}}$ (the pointwise product of interacting fields is in general not defined).

We found that the relative S-matrices $S_{\hbar^{-1}\theta\mathcal{L}}(f)$ ($f \in \mathcal{D}(\mathbb{R}^4, \mathcal{V})$), and hence all elements of the algebra $\mathcal{A}_{\hbar^{-1}\theta\mathcal{L}}$, are power series in $\hbar$. For the global algebras of covariantly constant sections we recall from [7] that the unitaries $V \in \mathcal{U}(\theta, \theta')$ can be chosen as relative S-matrices
\[ V = S_{\hbar^{-1}\theta\mathcal{L}}(\hbar^{-1}\theta_-\mathcal{L})^{-1} \in \mathcal{U}(\theta, \theta'), \tag{78} \]
where $\theta_- \in \mathcal{D}(\mathbb{R}^4)$ depends in the following way on $(\theta' - \theta)$: we split $\theta' - \theta = \theta_+ + \theta_-$ with $\operatorname{supp} \theta_+ \cap (C(\mathcal{O}) + \bar{V}_-) = \emptyset$ and $\operatorname{supp} \theta_- \cap (C(\mathcal{O}) + \bar{V}_+) = \emptyset$ (where $C(\mathcal{O})$ means the causally closed region containing $\mathcal{O}$ in which $\theta$ and $\theta'$ agree, cf. (18)). So $V$ is a formal Laurent series in $\hbar$, and the sections are no longer well defined power series. Replacing $\mathcal{A}$ and $\mathcal{A}(\mathcal{O})$ by $\bigoplus_{n \in \mathbb{N}_0} \hbar^n \mathcal{A}$ and $\bigoplus_{n \in \mathbb{N}_0} \hbar^n \mathcal{A}(\mathcal{O})$ (for the new algebras the same symbol $\mathcal{A}$ will be used again) we obtain modules over the ring of formal power series in $\hbar$ with complex coefficients. For the further construction the validity of part (iii) of the following Proposition is crucial:

Proposition 3. (i) Let $R_{n,m}(\dots; \dots) = \sum_{a=1}^{m} R^{(a)}_{n,m}(\dots; \dots)$, where $R^{(a)}_{n,m}(\dots; \dots)$ is the sum of all diagrams with $a$ connected components. Then
\[ R^{(a)}_{n,m}\bigl((\hbar^{-1}\theta\mathcal{L})^{\otimes n}; (\hbar^{-1}\theta_-\mathcal{L})^{\otimes m}\bigr) = O(\hbar^{-a}). \tag{79} \]
(Note that the range of $a$ is restricted by part (i) of Proposition 2.) This estimate is of more general validity: instead of a retarded product we could have e.g. a multiple $\times_\hbar$-product, a time ordered or antichronological product and the factors may be quite arbitrary. It is only essential that each factor is of order $O(\hbar^{-1})$.
(ii) Let $A \in \mathcal{A}(\mathcal{O})$. Then all diagrams which contribute to $V \times_\hbar A \times_\hbar V^{-1}$ (where $V$ is given by (78)) have the property that each vertex of $V$ and of $V^{-1}$ is connected with at least one vertex of $A$. (It may happen that a connected component of $V$ is not directly connected with $A$, but that it is connected with a connected component of $V^{-1}$ and the latter is connected with $A$.)
(iii)
\[ \mathcal{A}(\mathcal{O}) \ni A = O(\hbar^n) \quad \Longrightarrow \quad V \times_\hbar A \times_\hbar V^{-1} = O(\hbar^n). \tag{80} \]
In particular if $A$ is the term of n-th order in $\hbar$ of an interacting field, then $V \times_\hbar A \times_\hbar V^{-1}$ is a power series in $\hbar$ in which the terms up to order $\hbar^{n-1}$ vanish.

Proof. Part (i) is obtained essentially in the same way as Proposition 1. Part (iii) is a consequence of parts (i) and (ii), and the following observation: let us consider a diagram which contributes to $V \times_\hbar A \times_\hbar V^{-1}$ according to part (ii). If the subdiagrams belonging to $V$ and $V^{-1}$ have r and s connected components, then the whole diagram has at least $(r + s)$ contractions, which yield a factor $\hbar^{(r+s)}$.

It remains to prove (ii): We use the same notations as in the proof of Proposition 2. Let $Y_1 \sqcup Y_2 = Y$, $X_1 \sqcup X_2 = X$. We now consider the sum of all diagrams contributing to $R(Y, X)$ in which the vertices $(Y_1, X_1)$ are not connected with the vertices $(Y_2, X_2)$.
Using (55) and the fact that disconnected diagrams factorize with respect to the classical product (58), this (partial) sum is equal to
\[ \sum_{I \subset Y} (-1)^{|I \cap Y_1|} [\bar{T}(I \cap Y_1) \times_\hbar T(I^c \cap Y_1, X_1)] \cdot (-1)^{|I \cap Y_2|} [\bar{T}(I \cap Y_2) \times_\hbar T(I^c \cap Y_2, X_2)] = R(Y_1, X_1) \cdot R(Y_2, X_2). \tag{81} \]
From $1 = V V^{-1} = V V^*$, (54) and (78) we know
\[ \sum_{Y_1 \sqcup Y_2 = Y,\ X_1 \sqcup X_2 = X} (-1)^{(|Y_1| + |X_1|)}\, R^*(Y_1, X_1) \times_\hbar R(Y_2, X_2) = 0 \tag{82} \]
for fixed $(Y, X)$, $Y \cup X \neq \emptyset$. Next we note
\[ V \times_\hbar A \times_\hbar V^{-1} = \sum_{n,m} \frac{1}{n!\,m!} \int dy_1 \dots dy_n\, dx_1 \dots dx_m\; \theta(y_1) \dots \theta(y_n)\, \theta_-(x_1) \dots \theta_-(x_m) \sum_{Y_1 \sqcup Y_2 = Y,\ X_1 \sqcup X_2 = X} (-i)^{(|Y_1| + |X_1|)}\, i^{(|Y_2| + |X_2|)}\, R^*(Y_1, X_1) \times_\hbar A \times_\hbar R(Y_2, X_2), \tag{83} \]
where we have used the notations $Y \equiv \{y_1,\dots,y_n\}$, $X \equiv \{x_1,\dots,x_m\}$. In the integrand of the latter expression we consider (for given $Y$ and $X$) fixed decompositions $Y = Y_3 \sqcup Y_4$ and $X = X_3 \sqcup X_4$, $Y_3 \cup X_3 \neq \emptyset$. Now we consider the (partial) sum of all diagrams in which the vertices $(Y_3, X_3)$ are not connected with $A$ and each of the vertices $(Y_4, X_4)$ is connected with $A$. Part (ii) is proved if we can show that this partial sum vanishes. This holds in fact true because $R^*$ and $R$ factorize according to (81), and due to the unitarity (82):
\[ \sum_{Y_1 \sqcup Y_2 = Y,\ X_1 \sqcup X_2 = X} (-1)^{(|Y_1 \cap Y_4| + |X_1 \cap X_4|)}\, R^*(Y_1 \cap Y_4, X_1 \cap X_4) \times_\hbar A \times_\hbar R(Y_2 \cap Y_4, X_2 \cap X_4)\; (-1)^{(|Y_1 \cap Y_3| + |X_1 \cap X_3|)}\, R^*(Y_1 \cap Y_3, X_1 \cap X_3) \times_\hbar R(Y_2 \cap Y_3, X_2 \cap X_3) = 0. \qquad \Box \]
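Now we are ready to give an algebraic formulation of the expansion in $\hbar$. Let $\mathcal{I}_n \overset{\mathrm{def}}{=} \hbar^n \mathcal{A}_{\mathcal{L}}$. $\mathcal{I}_n$ is an ideal in the global algebra $\mathcal{A}_{\mathcal{L}}$. We define
\[ \mathcal{A}_{\mathcal{L}}^{(n)} \overset{\mathrm{def}}{=} \frac{\mathcal{A}_{\mathcal{L}}}{\mathcal{I}_{n+1}}, \qquad \mathcal{A}_{\mathcal{L}}^{(n)}(\mathcal{O}) \overset{\mathrm{def}}{=} \frac{\mathcal{A}_{\mathcal{L}}(\mathcal{O})}{\mathcal{I}_{n+1} \cap \mathcal{A}_{\mathcal{L}}(\mathcal{O})}, \tag{84} \]
which means that we neglect all terms which are of order $O(\hbar^{n+1})$. The embeddings $i_{21}: \mathcal{A}_{\mathcal{L}}(\mathcal{O}_1) \hookrightarrow \mathcal{A}_{\mathcal{L}}(\mathcal{O}_2)$ for $\mathcal{O}_1 \subset \mathcal{O}_2$ induce embeddings $\mathcal{A}_{\mathcal{L}}^{(n)}(\mathcal{O}_1) \hookrightarrow \mathcal{A}_{\mathcal{L}}^{(n)}(\mathcal{O}_2)$. Thus we obtain a projective system of local nets $(\mathcal{A}_{\mathcal{L}}^{(n)}(\mathcal{O}))$ of algebras of quantum observables up to order $\hbar^{n+1}$.

Note that we may equip our algebras $\mathcal{A}_{\mathcal{L}}^{(n)}$ also with the Poisson bracket induced by $\frac{1}{i\hbar}[\cdot, \cdot]_\hbar$, because the ideals $\mathcal{I}_n$ are also Poisson ideals with respect to these brackets. Then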
$\mathcal{A}_{\mathcal{L}}^{(0)}$ becomes the local net of Poisson algebras of the classical field theory, whereas for $n \neq 0$ we obtain a net of noncommutative Poisson algebras.

The expansion in powers of $\hbar$ is usually called “loop expansion”. This is due to the fact that the order in $\hbar$ of a certain Feynman diagram belonging to $R_{n,m}((\hbar^{-1}\theta\mathcal{L})^{\otimes n}; f_1 \otimes \dots \otimes f_m)$, $\mathcal{D}(\mathbb{R}^4, \mathcal{V}) \ni f_j = O(\hbar^0)$, is equal to:

(number of propagators (i.e. inner lines)) − n = (number of loops) + m − (number of connected components).

In particular, using part (i) of Proposition 2, we find that for the interacting fields ($m = 1$) the order in $\hbar$ agrees with the number of loops.

6. Local Algebraic Formulation of the Quantum Action Principle

The method of algebraic renormalization (for an overview see [22]) relies on the so-called “quantum action principle” (QAP), which is due to Lowenstein [20] and Lam [18]. This principle is a formula for the variation of (possibly connected or one-particle-irreducible) Green’s functions (or of the corresponding generating functional) under
– a change of coordinates (e.g. one applies the differential operator of the field equation to the Green’s functions),
– a variation of the fields (e.g. the BRST-transformation),
– a variation of a parameter. This may be a parameter in the Lagrangian or in the normalization conditions for the Green’s functions.

These are different theorems with different proofs. The common statement is that the variation of the Green’s functions is equal to the insertion of a local or spacetime integrated composite field operator (for details see [22]). In this section we study two simple cases of the QAP: the field equation and the variation of a parameter which appears only in the interaction Lagrangian. The aim of this section is to formulate the QAP (in these two cases) for our local algebras of observables $\mathcal{A}_{\mathcal{L}}(\mathcal{O})$, i.e. we are looking for an operator identity which holds true independently of the adiabatic limit.
Such an identity does not depend on the choice of a state, as it is the case for the Green’s functions. In a second step we compare our formula with the usual formulation of the QAP in terms of Green’s functions. The latter are the vacuum expectation values in the adiabatic limit $g \to 1$.⁸ We specialize to models for which the adiabatic limit is known to exist. This is the case for pure massive theories [14] and certain theories with (some) massless particles such as QED and $\lambda{:}\varphi^{2n}{:}$-theories [4], provided the time ordered products are appropriately normalized.

⁸ This limit is taken by scaling the test function $g$: let $g_0 \in \mathcal{D}(\mathbb{R}^4)$, $g_0(0) = 1$; then one considers the limit $\epsilon \to 0$ ($\epsilon > 0$) of $g_\epsilon(x) \equiv g_0(\epsilon x)$. Uniqueness of the adiabatic limit means the independence of the particular choice of $g_0$.

Remarks. (1) From the usual QAP (in terms of Green’s functions) one obtains an operator identity by means of the Lehmann–Symanzik–Zimmermann reduction formalism [19]. Although the latter relies on the adiabatic limit, an analogous conclusion from the Fock vacuum expectation values to arbitrary matrix elements is possible in our local construction: let $\mathcal{O}$ be an open double cone and let $x_1,\dots,x_k \notin ((\bar{\mathcal{O}} \cup \{x_{k+l+1},\dots,x_n\}) + \bar{V}_-)$, $x_{k+1},\dots,x_{k+l} \in \mathcal{O}$ and $x_{k+l+1},\dots,x_n \notin (\bar{\mathcal{O}} + \bar{V}_+)$. Using the causal factorization of time ordered products of interacting fields (28) we obtain
\[ \bigl(\Omega,\ T_{\theta\mathcal{L}}(\varphi(x_1) \dots \varphi(x_n))\, \Omega\bigr) = \bigl(T_{\theta\mathcal{L}}(\varphi(x_1) \dots \varphi(x_k))^*\, \Omega,\ T_{\theta\mathcal{L}}(\varphi(x_{k+1}) \dots \varphi(x_{k+l}))\; T_{\theta\mathcal{L}}(\varphi(x_{k+l+1}) \dots \varphi(x_n))\, \Omega\bigr). \tag{85} \]
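Now we choose $\theta \in \Theta(\mathcal{O})$ such that $\{x_1,\dots,x_k\} \cap (\operatorname{supp} \theta + \bar{V}_-) = \emptyset$ and $\{x_{k+l+1},\dots,x_n\} \cap (\operatorname{supp} \theta + \bar{V}_+) = \emptyset$. Due to the retarded support (56) of the R-products we then know that $T_{\theta\mathcal{L}}(\varphi(x_{k+l+1}) \dots \varphi(x_n))$ agrees with the time ordered product $T_0(\varphi(x_{k+l+1}) \dots \varphi(x_n))$ of the corresponding free fields. By means of $S_{\theta\mathcal{L}}(f\varphi) = S(\theta\mathcal{L})^{-1} S(f\varphi) S(\theta\mathcal{L})$ for $\operatorname{supp} f \cap (\operatorname{supp} \theta + \bar{V}_-) = \emptyset$ we obtain
\[ T_{\theta\mathcal{L}}(\varphi(x_1) \dots \varphi(x_k))^* = S(\theta\mathcal{L})^{-1}\, T_0(\varphi(x_1) \dots \varphi(x_k))^*\, S(\theta\mathcal{L}). \tag{86} \]
Our assertion follows now from the fact that the states $T_0(\varphi(x_{k+l+1}) \dots \varphi(x_n))\, \Omega$ generate a dense subspace of the Fock space and the same for the states $S(\theta\mathcal{L})^{-1}\, T_0(\varphi(x_1) \dots \varphi(x_k))^*\, S(\theta\mathcal{L})\, \Omega$. (For the validity of the latter statement it is important that $x_1,\dots,x_k$ can be arbitrarily spread over a Cauchy surface which is later than $(\bar{\mathcal{O}} \cup \{x_{k+l+1},\dots,x_n\})$.)

(2) Recently Pinter [23] presented an alternative derivation of the QAP for the variation of a parameter in the Lagrangian also in the framework of causal perturbation theory. In contrast to our presentation Pinter’s QAP is formulated for the S-matrix.

6.1. Field equation. The normalization condition N4 implies
\[ (\Box_x + m^2)\, R\bigl(\mathcal{L}(y_1) \dots \mathcal{L}(y_n); \phi(x)\phi(x_1) \dots \phi(x_m)\bigr) = -i \sum_{l=1}^{n} \delta(x - y_l)\, R\Bigl(\mathcal{L}(y_1) \dots \hat{l} \dots \mathcal{L}(y_n); \frac{\partial \mathcal{L}}{\partial \phi}(x)\, \phi(x_1) \dots \phi(x_m)\Bigr) - i \sum_{j=1}^{m} \delta(x - x_j)\, R\bigl(\mathcal{L}(y_1) \dots \mathcal{L}(y_n); \phi(x_1) \dots \hat{j} \dots \phi(x_m)\bigr), \tag{87} \]
where $\hat{l}$ and $\hat{j}$ mean that the corresponding factor is omitted. This equation takes a simple form for the corresponding generating functionals (i.e. the relative S-matrices (16))
\[ f(x)\, S_{g\mathcal{L}}(f\phi) = (\Box_x + m^2) \frac{\delta}{i\,\delta f(x)} S_{g\mathcal{L}}(f\phi) - \frac{\delta}{i\,\delta\rho(x)}\Big|_{\rho=0} S_{g\mathcal{L}}\Bigl(f\phi + \rho g \frac{\partial \mathcal{L}}{\partial \phi}\Bigr). \tag{88} \]
To formulate this in terms of our local algebras of observables (cf. Sect. 3) we set $g \equiv \theta \in \Theta(\mathcal{O})$ and for $x \in \mathcal{O}$ we can choose $\rho$ such that $\operatorname{supp} \rho \subset \{y \mid \theta(y) = 1\}$. Then (88) turns into
\[ (\Box_x + m^2) \frac{\delta}{i\,\delta f(x)} S_{\mathcal{L}}(f\phi) = f(x)\, S_{\mathcal{L}}(f\phi) + \frac{\delta}{i\,\delta\rho(x)}\Big|_{\rho=0} S_{\mathcal{L}}\Bigl(f\phi + \rho \frac{\partial \mathcal{L}}{\partial \phi}\Bigr), \qquad x \in \mathcal{O}. \tag{89} \]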
This is the QAP (in the case of the field equation) for the local algebras of observables. To compare with the usual form of the QAP we consider the generating functional $Z(f)$ for the Green’s functions $\langle \Omega | T(\phi_{\mathcal{L}}(x_1) \dots \phi_{\mathcal{L}}(x_m)) | \Omega \rangle$ which is obtained from the relative S-matrices by
\[ Z(f) = \lim_{g \to 1} \bigl(\Omega, S_{g\mathcal{L}}(f\phi)\, \Omega\bigr), \tag{90} \]
where $\Omega$ is the Fock vacuum [14]. So by taking the vacuum expectation value and the adiabatic limit of (88) we get
\[ f(x)\, Z(f) = -\Delta(x) \cdot Z(f), \tag{91} \]
where $\Delta(x)$ is an insertion of UV-dimension⁹ 3, coinciding with the classical field polynomial $\frac{\delta S}{\delta \phi(x)}$ in the classical approximation (where $S = \int d^4x\, [\tfrac{1}{2}(\partial_\mu \phi(x)\, \partial^\mu \phi(x) - m^2 \phi^2(x)) + g(x)\mathcal{L}(x)]$ is the classical action). Equation (91) is the usual form of the QAP (cf. Eq. (3.20) in [22]). In the present case the local algebraic formulation (89) contains more information than the usual QAP (91).
6.2. Variation of a parameter in the interaction. In (54) we have defined retarded products of Wick polynomials, i.e. elements of the Borchers class. Analogously we now introduce retarded products $R_{\mathcal{L}}(g^{\otimes n}; f^{\otimes m})$ of interacting fields
\[ S_{\mathcal{L}+g}(f) = S_{\mathcal{L}}(g)^{-1}\, S_{\mathcal{L}}(g + f) \overset{\mathrm{def}}{=} \sum_{n,m=0}^{\infty} \frac{i^{n+m}}{n!\,m!}\, R_{\mathcal{L}}(g^{\otimes n}; f^{\otimes m}), \tag{92} \]
where $\mathcal{L}, g, f \in \mathcal{D}(\mathbb{R}^4, \mathcal{V})$. Obviously they can be expressed in terms of antichronological and time ordered products of interacting fields by exactly the same formula as in the case of Wick polynomials (55)
\[ R_{\mathcal{L}}(g^{\otimes n}; f^{\otimes m}) = \sum_{k=0}^{n} (-1)^k \frac{n!}{k!\,(n-k)!}\, \bar{T}_{\mathcal{L}}(g^{\otimes k})\, T_{\mathcal{L}}(g^{\otimes(n-k)} \otimes f^{\otimes m}). \tag{93} \]
Thereby the antichronological product of interacting fields is defined analogously to the time ordered product (28), namely by
\[ \bar{T}_{\mathcal{L}}(f^{\otimes m}) = \frac{d^m}{(-i)^m\, d\lambda^m}\Big|_{\lambda=0} S_{\mathcal{L}}(\lambda f)^{-1}, \tag{94} \]
and satisfies anticausal factorization (14) (which justifies the name). The support property (56) of the retarded products relies on the (anti)causal factorization of the T- and $\bar{T}$-products (11, 14), hence, the R-product of interacting fields ((92), (93)) has also retarded support (56).

Similarly to Lowenstein in [20], Sect. II.B, we consider an infinitesimal change of the interaction Lagrangian
\[ \mathcal{L}_0 \to \mathcal{L}_0 + \epsilon \mathcal{L}_1, \tag{95} \]

⁹ We assume that $\mathcal{L}$ has UV-dimension 4.
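where $\mathcal{L}_0, \mathcal{L}_1 \in \mathcal{V}$ or $\mathcal{D}(\mathbb{R}^4, \mathcal{V})$. For the m-fold variation of the time ordered product of the interacting fields (28) we obtain
\[ \frac{d^m}{d\epsilon^m}\Big|_{\epsilon=0} T_{\theta(\mathcal{L}_0 + \epsilon \mathcal{L}_1)}(f^{\otimes l}) = \frac{\partial^m}{\partial \epsilon^m}\Big|_{\epsilon=0} \frac{\partial^l}{i^l\, \partial\lambda^l}\Big|_{\lambda=0} S_{\theta(\mathcal{L}_0 + \epsilon \mathcal{L}_1)}(\lambda f) = i^m R_{\theta\mathcal{L}_0}((\theta\mathcal{L}_1)^{\otimes m}; f^{\otimes l}). \tag{96} \]
To formulate this identity for our local algebras of observables we assume that $\mathcal{L}_1$ has compact support, i.e. $\mathcal{L}_1 \in \mathcal{D}(\mathbb{R}^4, \mathcal{V})$. We set
\[ \Theta_0(\mathcal{O}) \overset{\mathrm{def}}{=} \{\theta \in \Theta(\mathcal{O}) \mid \theta|_{\operatorname{supp} \mathcal{L}_1} \equiv 1\}. \tag{97} \]
We consider the observables as covariantly constant sections in the bundle over $\Theta_0(\mathcal{O})$ (instead of $\Theta(\mathcal{O})$ as in Sect. 3). Then we obtain
\[ \frac{d^m}{d\epsilon^m}\Big|_{\epsilon=0} T_{\mathcal{L}_0 + \epsilon \mathcal{L}_1}(f^{\otimes l}) = i^m R_{\mathcal{L}_0}(\mathcal{L}_1^{\otimes m}; f^{\otimes l}). \tag{98} \]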
This is the local algebraic formulation of the QAP for the variation of a parameter in the interaction.

We are now going to investigate the usual QAP by using Epstein and Glaser’s definition of Green’s functions (90). In (96) the m-fold variation of the parameter $\epsilon$ results in a retarded insertion of $(\theta\mathcal{L}_1)^{\otimes m}$. In the usual QAP $(\theta\mathcal{L}_1)^{\otimes m}$ is inserted into the time ordered product, i.e. one considers
\[ i^m T_{\theta\mathcal{L}_0}((\theta\mathcal{L}_1)^{\otimes m} \otimes f^{\otimes l}) = \frac{\partial^m}{\partial \epsilon^m}\Big|_{\epsilon=0} \frac{\partial^l}{i^l\, \partial\lambda^l}\Big|_{\lambda=0} S_{\theta\mathcal{L}_0}(\theta\epsilon\mathcal{L}_1 + \lambda f). \tag{99} \]
Obviously (96) and (99) do not agree. However, let us assume that we are dealing with a purely massive theory and that $\mathcal{L}_0$ and $\mathcal{L}_1$ have UV-dimension $\dim(\mathcal{L}_j) = 4$. Or: if $\dim(\mathcal{L}_j) < 4$ we assume that $\mathcal{L}_j$ is treated in the extension to the total diagonal as if it would hold $\dim(\mathcal{L}_j) = 4$. Hence it may occur that the scaling degree increases in the extension to a certain amount: $\mathrm{sd}(t_0) \le \mathrm{sd}(t) \le 4n - b$ for a scalar theory without derivative couplings, where b is the number of external legs (cf. (51)–(53)). (In the BPHZ framework one says that $\mathcal{L}_j$ is “oversubtracted with degree 4”.) Then there exists a normalization of the time ordered products, which is compatible with the other normalization conditions N1–N4 and (63), such that the Green’s functions corresponding to (99) exist and agree, i.e. we assert
\[ \lim_{\theta \to 1} \frac{d^m}{d\epsilon^m}\Big|_{\epsilon=0} \bigl(\Omega, T_{\theta(\mathcal{L}_0 + \epsilon\mathcal{L}_1)}(f^{\otimes l})\, \Omega\bigr) = i^m \lim_{\theta \to 1} \bigl(\Omega, T_{\theta\mathcal{L}_0}((\theta\mathcal{L}_1)^{\otimes m} \otimes f^{\otimes l})\, \Omega\bigr) \tag{100} \]
for all $m, l \in \mathbb{N}_0$, which is equivalent to
\[ \lim_{\theta \to 1} \bigl(\Omega, S_{\theta(\mathcal{L}_0 + \epsilon\mathcal{L}_1)}(\lambda f)\, \Omega\bigr) = \lim_{\theta \to 1} \bigl(\Omega, S_{\theta\mathcal{L}_0}(\theta\epsilon\mathcal{L}_1 + \lambda f)\, \Omega\bigr). \tag{101} \]
(We assume that the derivatives $\frac{\partial^m}{\partial \epsilon^m}$ and $\frac{\partial^l}{\partial \lambda^l}$ commute with the adiabatic limit $\theta \to 1$. This seems to be satisfied for vacuum expectation values in pure massive theories as it is the case here [14].) This is the usual form of the QAP (in terms of Epstein and Glaser’s
Green’s functions) for the present case (cf. Eq. (2.6) of [20]10 ). In contrast to the field equation, the QAP (100) does not hold for the operators before the adiabatic limit. Proof of (100). For a better comparison with Lowenstein’s formulation, we present a proof which makes the detour over the corresponding Gell–Mann Low expressions. First we comment on the equality of Epstein and Glaser’s Green’s functions with the Gell–Mann Low series lim (, Sθ L (f )) = lim
θ→1
θ→1
(, S(θ L + f )) , (, S(θL))
(102)
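which is proved in the appendix of [12]. This can be understood in the following way: let $P$ be the projector on the Fock vacuum and $P^\perp \stackrel{\mathrm{def}}{=} 1-P$. Using $S(\theta L)^* = S(\theta L)^{-1}$ we obtain
\[
\bigl(\Omega,\,S_{\theta L}(f)\,\Omega\bigr)
= \bigl(S(\theta L)\Omega,\,(P+P^\perp)\,S(\theta L+f)\,\Omega\bigr)
= \frac{\bigl(\Omega,\,S(\theta L+f)\,\Omega\bigr)}{\bigl(\Omega,\,S(\theta L)\,\Omega\bigr)}\cdot\bigl|\bigl(\Omega,\,S(\theta L)\,\Omega\bigr)\bigr|^2
+ \bigl(\Omega,\,S(\theta L)^{-1}P^\perp S(\theta L+f)\,\Omega\bigr)
\tag{103}
\]
and
\[
1 = \bigl(\Omega,\,S(\theta L)^{-1}(P+P^\perp)\,S(\theta L)\,\Omega\bigr)
= \bigl|\bigl(\Omega,\,S(\theta L)\,\Omega\bigr)\bigr|^2 + \bigl(\Omega,\,S(\theta L)^{-1}P^\perp S(\theta L)\,\Omega\bigr).
\tag{104}
\]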
In $(\Omega, S(\theta L)^{-1}P^\perp S(\theta L+f)\,\Omega)$ there is at least one contraction between $S(\theta L)^{-1}$ and $S(\theta L+f)$ (or: the terms without contraction are precisely $(\Omega, S(\theta L)^{-1}\Omega)(\Omega, S(\theta L+f)\,\Omega)$). In the mentioned reference the support properties in momentum space of the contracted terms are analysed, and in this way it is proved that
\[
\lim_{\theta\to 1}\bigl(\Omega,\,S(\theta L)^{-1}P^\perp S(\theta L+f)\,\Omega\bigr) = 0.
\tag{105}
\]
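Inserting this into (103) and (with $f = 0$) into (104) yields (102). Because of (102) our assertion (101) is equivalent to
\[
\lim_{\theta\to 1}\frac{\bigl(\Omega,\,S(\theta(L_0+AL_1)+\lambda f)\,\Omega\bigr)}{\bigl(\Omega,\,S(\theta(L_0+AL_1))\,\Omega\bigr)}
= \lim_{\theta\to 1}\frac{\bigl(\Omega,\,S(\theta(L_0+AL_1)+\lambda f)\,\Omega\bigr)}{\bigl(\Omega,\,S(\theta L_0)\,\Omega\bigr)}.
\tag{106}
\]
This is the QAP in terms of the Gell–Mann Low series. Obviously the nontrivial statement is
\[
\lim_{\theta\to 1}\frac{\bigl(\Omega,\,S(\theta(L_0+AL_1))\,\Omega\bigr)}{\bigl(\Omega,\,S(\theta L_0)\,\Omega\bigr)} = 1.
\tag{107}
\]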
A possibility to ensure the validity of this equation is the above assumption (which has not been used so far) that $L_0$ and $L_1$ have mass dimension $\dim(L_j)\le 4$ and are treated as dimension 4 vertices in the renormalization procedure. Due to this additional assumption and the requirements that the adiabatic limit exists and is unique, the normalization of the vacuum diagrams is uniquely fixed, and with this normalization the vacuum diagrams vanish in the adiabatic limit:
\[
\lim_{\theta\to 1}\bigl(\Omega,\,S(\theta L_0)\,\Omega\bigr) = 1,
\qquad
\lim_{\theta\to 1}\bigl(\Omega,\,S(\theta(L_0+AL_1))\,\Omega\bigr) = 1.
\tag{108}
\]
(For a proof see also the appendix of [12].) □

Footnote 10: Lowenstein works with Zimmermann’s definition of normal products of interacting fields, $N_\delta\{\prod_{j=1}^{l}\varphi_{i_j L}(x)\}$, $\delta\ge d\equiv\sum_{j=1}^{l}d(\varphi_{i_j L})$ [29]. For $\delta = d$ (i.e. without oversubtraction) $N_\delta\{\prod_{j=1}^{l}\varphi_{i_j L}(x)\}$ agrees essentially with our $(:\prod_{j=1}^{l}\varphi_{i_j}(x):)_{gL}$ (29). The difference is due to the adiabatic limit and the different ways of defining Green’s functions (Zimmermann uses the Gell–Mann Low series, cf. (102), (106)).
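Remarks. (1) Without the assumption about $L_0$ and $L_1$ we find
\[
\lim_{\theta\to 1}\bigl(\Omega,\,S_{\theta(L_0+AL_1)}(\lambda f)\,\Omega\bigr)
= \lim_{\theta\to 1}\frac{\bigl(\Omega,\,S_{\theta L_0}(\theta A L_1+\lambda f)\,\Omega\bigr)}{\bigl(\Omega,\,S_{\theta L_0}(\theta A L_1)\,\Omega\bigr)}
\tag{109}
\]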
instead of (101), by using (102) only. This is a formulation of the QAP for general situations in which (107) does not hold.

(2) By means of the QAP (98) (or (100), or (109)) one can compute the change of the time ordered products of interacting fields (or of the Green’s functions) under the variation of parameters $\lambda_1,\dots,\lambda_s$ if the interaction Lagrangian has the form $L(x)=\sum_i a_i(\lambda_1,\dots,\lambda_s)\,L_i(x)$, $L_i\in V$ resp. $\mathcal{D}(\mathbb{R}^4, V)$ (cf. Eqs. (2.7), (2.8) of [20]). But only the interaction $L$ may depend on the parameters and not the time ordering operator (i.e. the normalization conditions for the time ordered products).

Appendix: Wavefront Sets and the Pointwise Product of Distributions

In this appendix we briefly recall the definition of the wavefront set of a distribution and mention a simple criterion for the existence of the pointwise product of distributions in terms of their wavefront sets. For a detailed treatment we refer to Hörmander [15]; the application to quantum field theory on curved spacetimes can be found in [25, 8, 7].

Let $t\in\mathcal{D}'(\mathbb{R}^d)$ be singular at the point $x$ and let $f\in\mathcal{D}(\mathbb{R}^d)$ with $f(x)\neq 0$. Then $ft\in\mathcal{D}'(\mathbb{R}^d)$ is also singular at $x$ and $ft$ has compact support. Hence the Fourier transform $\widehat{ft}$ is a $C^\infty$-function. In some directions $\widehat{ft}$ does not rapidly decay, because otherwise $ft$ would be infinitely differentiable at $x$. Here a function $g$ is called rapidly decaying in the direction $k\in\mathbb{R}^d\setminus\{0\}$ if there is an open cone $C$ with $k\in C$ and $\sup_{k'\in C}|k'|^N|g(k')|<\infty$ for all $N\in\mathbb{N}$.

Definition. The wavefront set $\mathrm{WF}(t)$ of a distribution $t\in\mathcal{D}'(\mathbb{R}^d)$ is the set of all pairs $(x,k)\in\mathbb{R}^d\times(\mathbb{R}^d\setminus\{0\})$ such that the Fourier transform $\widehat{ft}$ does not rapidly decay in the direction $k$ for all $f\in\mathcal{D}(\mathbb{R}^d)$ with $f(x)\neq 0$.

For example the delta distribution satisfies $\widehat{f\delta}(k)=f(0)$, hence $\mathrm{WF}(\delta)=\{0\}\times(\mathbb{R}^d\setminus\{0\})$. The wavefront set is a refinement of the singular support of $t$ (which is the complement of the largest open set where $t$ is smooth):
\[
t \text{ is singular at } x \iff \exists\, k\in\mathbb{R}^d\setminus\{0\} \text{ with } (x,k)\in\mathrm{WF}(t).
\]
For the wavefront set of the two-point function one finds
\[
\mathrm{WF}(\Delta_+)=\{(x,k)\mid x^2=0,\ k^2=0,\ x\parallel k,\ k_0>0\}.
\tag{110}
\]
Let $t$ and $s$ be two distributions which are singular at the same point $x$. We localize them by multiplying with $f\in\mathcal{D}(\mathbb{R}^d)$, where $f(x)\neq 0$. We assume that $(ft)$ and $(fs)$ have only one overlapping singularity, namely at $x$. In general the pointwise product $(ft)(y)(fs)(y)$ does not exist. Heuristically this can be seen from the divergence of the convolution integral $\int dk\,\widehat{(ft)}(p-k)\,\widehat{(fs)}(k)$. But this integral converges if $k_1+k_2\neq 0$ for all $k_1, k_2$ with $(x,k_1)\in\mathrm{WF}(t)$ and $(x,k_2)\in\mathrm{WF}(s)$. This makes plausible the following theorem:

Theorem. Let $t, s\in\mathcal{D}'(\mathbb{R}^d)$ with
\[
\{(x,k_1+k_2)\mid (x,k_1)\in\mathrm{WF}(t)\wedge(x,k_2)\in\mathrm{WF}(s)\}\cap(\mathbb{R}^d\times\{0\})=\emptyset.
\tag{111}
\]
Then the pointwise product $(ts)\in\mathcal{D}'(\mathbb{R}^d)$ exists.

By means of this theorem one verifies the existence of the distributional products $(\varphi^{\otimes n})_\hbar(t)$ (44) and $(t\otimes_{k,\hbar}s)$ (47).

Acknowledgements. We thank Gudrun Pinter for several discussions on the quantum action principle, and Volker Schomerus and Stefan Waldmann for discussions on deformation quantization. In particular we are grateful to Stefan Waldmann for drawing our attention to reference [9].
Note added in proof. Renormalization can also be done entirely on the level of retarded products [1, 2, 3]. This leads to a direct proof that the interacting fields are power series in $\hbar$.
[1] Steinmann, O.: Perturbation expansions in axiomatic field theory. Lecture Notes in Physics 11, Berlin–Heidelberg–New York: Springer-Verlag, 1971
[2] Dütsch, M. and Fredenhagen, K.: Perturbative Algebraic Field Theory, and Deformation Quantization. To appear in the proceedings of the Conference on Mathematical Physics in Mathematics and Physics, Siena, June 20–25, 2000
[3] Dütsch, M. and Fredenhagen, K.: Causal perturbation theory in terms of retarded products and perturbative algebraic field theory. Work in progress
References
1. Balaban, T.: Large Field Renormalization. II. Localization, Exponentiation, and Bounds for the R Operation. Commun. Math. Phys. 122, 355 (1989); and earlier works of Balaban cited therein
2. Bayen, F., Flato, M., Fronsdal, C., Lichnerowicz, A., Sternheimer, D.: Deformation Theory and Quantization. Ann. Phys. (N.Y.) 111, 61, 111 (1978)
3. Becchi, C., Rouet, A. and Stora, R.: Renormalization of the abelian Higgs–Kibble model. Commun. Math. Phys. 42, 127 (1975); Becchi, C., Rouet, A. and Stora, R.: Renormalization of gauge theories. Ann. Phys. (N.Y.) 98, 287 (1976)
4. Blanchard, P. and Sénéor, R.: Green’s functions for theories with massless particles (in perturbation theory). Ann. Inst. H. Poincaré A 23, 147 (1975)
5. Boas, F.M., Dütsch, M. and Fredenhagen, K.: A local (perturbative) construction of observables in gauge theories: Nonabelian gauge theories. Work in progress
6. Bogoliubov, N.N. and Shirkov, D.V.: Introduction to the Theory of Quantized Fields. New York, 1959
7. Brunetti, R. and Fredenhagen, K.: Microlocal analysis and interacting quantum field theories: Renormalization on physical backgrounds. math-ph/9903028, Commun. Math. Phys. 208, 623 (2000)
8. Brunetti, R., Fredenhagen, K. and Köhler, M.: The microlocal spectrum condition and Wick polynomials of free fields on curved space times. Commun. Math. Phys. 180, 312 (1996)
9. Dito, J.: Star-Product Approach to Quantum Field Theory: The Free Scalar Field. Lett. Math. Phys. 20, 125 (1990); Dito, J.: Star-products and nonstandard quantization for Klein–Gordon equation. J. Math. Phys. 33, 791 (1992)
10. Dütsch, M. and Fredenhagen, K.: A local (perturbative) construction of observables in gauge theories: The example of QED. Commun. Math. Phys. 203, 71 (1999)
11. Dütsch, M. and Fredenhagen, K.: Deformation stability of BRST-quantization. Preprint: hep-th/9807215, DESY 98-098. In: Proceedings of the conference ’Particles, Fields and Gravitation’, Lodz, Poland (1998)
12. Dütsch, M.: Slavnov–Taylor identities from the causal point of view. Int. J. Mod. Phys. A 12, 3205 (1997)
13. Epstein, H.: On the Borchers’ class of a free field. N. Cimento 27, 886 (1963); Schroer, B.: Unpublished preprint (1963)
14. Epstein, H. and Glaser, V.: The role of locality in perturbation theory. Ann. Inst. H. Poincaré A 19, 211 (1973)
15. Hörmander, L.: The Analysis of Linear Partial Differential Operators I. Berlin: Springer-Verlag, 1983
16. Il’in, V.A., Slavnov, D.A.: Algebras of observables in the S-matrix approach. Theor. Math. Phys. 36, 578 (1978)
17. Kugo, T. and Ojima, I.: Local covariant operator formalism of non-abelian gauge theories and quark confinement problem. Suppl. Progr. Theor. Phys. 66, 1 (1979)
18. Lam, Y.-M.P.: Perturbation Lagrangian Theory for Scalar Fields – Ward–Takahashi Identity and Current Algebra. Phys. Rev. D 6, 2145 (1972); Equivalence Theorem on Bogoliubov–Parasiuk–Hepp–Zimmermann – Renormalized Lagrangian Field Theories. Phys. Rev. D 7, 2943 (1973)
19. Lehmann, H., Symanzik, K., Zimmermann, W.: Zur Formulierung quantisierter Feldtheorien. Nuovo Cimento 1, 205 (1955)
20. Lowenstein, J.H.: Differential vertex operations in Lagrangian field theory. Commun. Math. Phys. 24, 1 (1971)
21. Magnen, J., Rivasseau, V. and Sénéor, R.: Construction of YM_4 with an infrared cutoff. Commun. Math. Phys. 155, 325 (1993)
22. Piguet, O. and Sorella, S.P.: Algebraic Renormalization. Berlin–Heidelberg–New York: Springer-Verlag, 1995
23. Pinter, G.: The Action Principle in Epstein Glaser Renormalization and Renormalization of the S-Matrix of Φ^4-Theory. hep-th/9911063
24. Prange, D.: Epstein–Glaser renormalization and differential renormalization. J. Phys. A 32, 2225 (1999)
25. Radzikowski, M.: Micro-local approach to the Hadamard condition in quantum field theory on curved space-time. Commun. Math. Phys. 179, 529 (1996)
26. Scharf, G.: Finite Quantum Electrodynamics. The causal approach. 2nd ed., Berlin–Heidelberg–New York: Springer-Verlag, 1995
27. Steinmann, O.: Perturbation expansions in axiomatic field theory. Lecture Notes in Physics 11, Berlin–Heidelberg–New York: Springer-Verlag, 1971
28. Stora, R.: Differential algebras in Lagrangean field theory. ETH-Zürich Lectures, January–February 1993; Popineau, G. and Stora, R.: A pedagogical remark on the main theorem of perturbative renormalization theory. Unpublished preprint (1982)
29. Zimmermann, W.: In: Lectures on Elementary Particles and Quantum Field Theory. Brandeis Summer Institute in Theoretical Physics (1970), S. Deser (ed.)

Communicated by G. Mack
Commun. Math. Phys. 219, 31 – 44 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Weak Transition Matrix Elements from Finite-Volume Correlation Functions

Laurent Lellouch^1, Martin Lüscher^2

^1 LAPTH, Chemin de Bellevue, B.P. 110, 74941 Annecy-Le-Vieux Cedex, France
^2 CERN, Theory Division, 1211 Geneva 23, Switzerland
Received: 29 March 2000 / Accepted: 10 April 2000
Dedicated to the memory of Harry Lehmann

Author footnotes: Work supported in part by TMR, EC-Contract No. ERBFMRX-CT980169. On leave from Centre de Physique Théorique, CNRS Luminy, 13288 Marseille Cedex 9, France. On leave from Deutsches Elektronen-Synchrotron DESY, 22603 Hamburg, Germany.

Abstract: The two-body decay rate of a weakly decaying particle (such as the kaon) is shown to be proportional to the square of a well-defined transition matrix element in finite volume. Contrary to the physical amplitude, the latter can be extracted from finite-volume correlation functions in euclidean space without analytic continuation. The K → ππ transitions and other non-leptonic decays thus become accessible to established numerical techniques in lattice QCD.

1. Introduction

The computation of the non-leptonic kaon decay rates from first principles, using lattice QCD and numerical simulations, meets a number of technical difficulties (see [1], for example). Apart from the operator renormalization, which must be controlled at the nonperturbative level, the central problem is that the computational framework is limited to correlation functions in euclidean space and that there is apparently no simple relation between the behaviour of these functions at large time separations and the desired transition matrix elements [2, 3]. This statement (which is often referred to as the Maiani–Testa no-go theorem) applies to very large or infinite lattices, where the spectrum of final states is continuous.

One might think that having a finite volume (as is unavoidable when numerical simulations are employed) makes it even more difficult to extract the transition amplitudes. In the present paper we wish to show that this is actually not so. The key observation is that the two-pion energy spectrum is far from being continuous when the lattice is only a few fermis wide. Under these conditions, a kaon at rest cannot decay into two pions unless one of these energy levels happens to be close to its mass. This is the case for certain
lattice sizes, and a simple formula then relates the square of the corresponding transition amplitude in finite volume to the physical decay rate in infinite volume.

The problem is thus reduced to calculating the required finite-volume transition amplitudes. Since the initial and final states are isolated energy eigenstates, these matrix elements can in principle be computed using established techniques, such as those commonly employed to determine form factors. An additional difficulty is that the relevant two-pion states are not the lowest ones in the specified sector. Two-particle states in finite volume have, however, previously been studied [4]–[14] and practical methods have been devised to calculate the higher levels.

To keep the presentation as transparent as possible, we shall consider a simplified generic theory with two kinds of spinless particles, referred to as the kaon and the pion. Details are given in the next section, and we then first discuss the form of the two-pion energy spectrum in finite volume. This is essentially a summary of the relevant results of refs. [15]–[17]. In Sect. 4 we define the transition amplitudes in finite volume and state the formula that relates them to the corresponding decay rates in infinite volume. The following sections contain the proof of this relation and a discussion of its application to the physical kaon decays.

2. Preliminaries

As announced above, we consider a generic situation where there are two particles, the “kaon” and the “pion”, with spin zero and masses such that
\[
2m_\pi < m_K < 4m_\pi.
\tag{2.1}
\]
We assume that the symmetries of the theory are such that the kaon is stable in the absence of the weak interactions and that the pions scatter purely elastically below the four-pion threshold. The weak interactions, described by a local effective lagrangian $\mathcal{L}_w(x)$, then allow the kaon to decay into two pions. The corresponding transition amplitude is
\[
T(K\to\pi\pi) = \langle \pi\,p_1,\ \pi\,p_2\ \mathrm{out}\,|\,\mathcal{L}_w(0)\,|\,K\,p\rangle,
\tag{2.2}
\]
with $p_1$, $p_2$ and $p$ the four-momenta of the pions and the kaon. We shall only be interested in the physical case where the total momentum $p = p_1 + p_2$ is conserved. Lorentz invariance and the kinematical constraints then imply that the transition amplitude is independent of the momentum configuration.

The meson states in Eq. (2.2) are normalized according to the standard relativistic conventions (Appendix A) and their phases are constrained by the LSZ formalism. In the case of the pions, for example, one assumes that there exists an interpolating hermitian field $\varphi(x)$ such that
\[
\langle 0|\varphi(x)|\pi\,p\rangle = \sqrt{Z_\pi}\,e^{-ipx}
\tag{2.3}
\]
for some positive constant $Z_\pi$. If the phase of the kaon states is chosen in the same way, the CPT symmetry implies
\[
T(K\to\pi\pi) = A\,e^{i\delta_0}
\tag{2.4}
\]
with $A$ real and $\delta_0$ the S-wave scattering phase shift of the outgoing pion state. The decay rate is then given by the usual expression
\[
\Gamma = \frac{k_\pi}{16\pi m_K^2}\,|A|^2,
\qquad
k_\pi \equiv \tfrac{1}{2}\sqrt{m_K^2-4m_\pi^2},
\tag{2.5}
\]
proportional to the pion momentum $k_\pi$ in the centre-of-mass frame.
3. Two-Pion States in Finite Volume

In a spatial box of size L × L × L with periodic boundary conditions, the eigenvalues of the total momentum operator are integer multiples of 2π/L. The energy spectrum is also discrete in this situation, with level spacings that can be appreciable. In the following we consider the subspace of states with zero total momentum and trivial transformation behaviour under cubic rotations and reflections. The energy spectrum of the two-pion states in this sector below the inelastic threshold $W = 4m_\pi$ has been studied in detail in refs. [15]–[18]. In particular, for the lowest energy value the expansion
\[
W = 2m_\pi - \frac{4\pi a_0}{m_\pi L^3}\left\{1 + c_1\frac{a_0}{L} + c_2\frac{a_0^2}{L^2}\right\} + O(L^{-6}),
\tag{3.1}
\]
\[
c_1 = -2.837297, \qquad c_2 = 6.375183,
\tag{3.2}
\]
has been obtained, where
\[
a_0 = \lim_{k\to 0}\frac{\delta_0(k)}{k}
\tag{3.3}
\]
is the S-wave scattering length (here and below the scattering phase is considered to be a function of the pion momentum $k$ in the centre-of-mass frame). The higher energy values in the elastic region are determined through
\[
W = 2\sqrt{m_\pi^2 + k^2},
\tag{3.4}
\]
\[
n\pi - \delta_0(k) = \phi(q), \qquad q \equiv \frac{kL}{2\pi},
\tag{3.5}
\]
where n = 1, 2, . . . labels the energy levels in increasing order and the angle $\phi(q)$ is a known kinematical function (Appendix B). Apart from the lowest level, the energy spectrum at any given value of L is thus obtained by inserting the solutions $k$ of Eq. (3.5) in Eq. (3.4) (see footnote 1).

All these results are valid up to terms that vanish exponentially at large L. Box sizes a few times larger than the diameter of the pion should be safe from these corrections. Equation (3.5) moreover assumes that the scattering phases $\delta_l$ for angular momenta $l \ge 4$ are small in the elastic region, which is usually the case since $\delta_l$ is proportional to $k^{2l+1}$ at low energies.

For illustration, let us consider QCD with three flavours of quarks, unbroken isospin symmetry and quark masses such that the masses of the charged pions and kaons coincide with their physical values. In the subspace with isospin 0, the two-pion energy spectrum is then given by Eqs. (3.1)–(3.5), with $\delta_0$ the appropriate pion scattering phase. If we insert the phase shift that is obtained at one-loop order of chiral perturbation theory [20]–[22], this yields the curves shown in Fig. 1. For any other reasonable choice of the scattering phase the plot would look essentially the same, because the interaction effects are proportional to $1/L^3$ and thus tend to be small. Note that the spacing between successive levels is quite large. One is clearly very far away from having a continuous spectrum when L ≤ 10 fm.

Footnote 1: Similar formulae have been derived for the spectrum in the subspaces of states with non-zero total momentum [19]. The extension of our results to these sectors could give further insight into the connection between finite and infinite volume matrix elements and may prove useful in practice.
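As a rough numerical illustration of Eqs. (3.1)–(3.4), the following Python sketch evaluates the lowest level from the large-L expansion and, for comparison, the non-interacting levels obtained by setting the relative momentum to k = 2π√n/L. This is only a sketch under stated assumptions: the function names are hypothetical, natural units (ħ = c = 1) are used so that L carries units of inverse energy, the O(L^{-6}) remainder is simply dropped, and no attempt is made to solve Eq. (3.5) with a realistic phase shift.

```python
from math import pi, sqrt

# Numerical constants quoted in Eq. (3.2).
C1 = -2.837297
C2 = 6.375183


def lowest_level(m_pi: float, a0: float, L: float) -> float:
    """Lowest two-pion energy from the expansion (3.1), dropping O(L^-6).

    m_pi : pion mass, a0 : S-wave scattering length as defined in Eq. (3.3),
    L : box size; all in natural units (hbar = c = 1).
    """
    x = a0 / L
    return 2.0 * m_pi - 4.0 * pi * a0 / (m_pi * L**3) * (1.0 + C1 * x + C2 * x * x)


def free_level(m_pi: float, n: int, L: float) -> float:
    """Non-interacting two-pion energy, Eq. (3.4) with k = 2*pi*sqrt(n)/L.

    This is an assumption-level shortcut: it ignores the phase shift
    delta_0 that enters Eq. (3.5).
    """
    k_squared = n * (2.0 * pi / L) ** 2
    return 2.0 * sqrt(m_pi**2 + k_squared)


if __name__ == "__main__":
    # Illustrative inputs in units of the pion mass (not fitted values).
    m_pi, a0 = 1.0, 0.2
    for L in (4.0, 6.0, 8.0):
        print(L, lowest_level(m_pi, a0, L), [free_level(m_pi, n, L) for n in (1, 2, 3)])
```

With a vanishing scattering length the lowest level reduces to 2m_π, and a positive a_0 (attraction in the convention of Eq. (3.3)) lowers it, as the sign of the leading term in Eq. (3.1) requires.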
Fig. 1. Two-pion energy spectrum in QCD below the inelastic threshold, in the sector with isospin 0, calculated from Eqs. (3.1)–(3.5) with the scattering phase shift given by next-to-leading order chiral perturbation theory. The levels shown in this plot are all non-degenerate
4. Kaon Decays in Finite and Infinite Volume

Let us imagine that a state $|K\rangle$ describing a kaon in finite volume with zero momentum has been prepared at time $x_0 = 0$. In the absence of the weak interactions, this is an energy eigenstate (and thus a stationary state) with energy $m_K$. However, through the interaction hamiltonian
\[
H_w = \int_{x_0=0} d^3x\,\mathcal{L}_w(x),
\tag{4.1}
\]
the time evolution of the state becomes non-trivial and it starts to mix with the other eigenstates of the unperturbed hamiltonian. It is straightforward to work this out using ordinary time-dependent perturbation theory. For the transition probability at time $x_0 = t$ to any finite-volume two-pion state $|\pi\pi\rangle$ with energy $W$, the result
\[
P(K\to\pi\pi) = 4\,|\langle\pi\pi|H_w|K\rangle|^2\,\frac{\sin^2\!\left(\tfrac{1}{2}\omega t\right)}{\omega^2},
\qquad
\omega \equiv W - m_K,
\tag{4.2}
\]
is then obtained (in this equation the states are assumed to be normalized to unity and higher-order weak-interaction effects have been neglected).

From Eq. (4.2) one infers that the transition probabilities tend to be very small unless the energy of one of the two-pion final states happens to be close to the kaon mass. Recalling Fig. 1, it is clear that this will be the case only for certain box sizes L. In the following we focus on these special values of L and introduce the associated transition matrix element
\[
M = \langle\pi\pi|H_w|K\rangle,
\tag{4.3}
\]
where both states are normalized to unity as before, while their phase will not matter and can be chosen arbitrarily. Since $W = m_K$ in this case, Eq. (4.2) becomes
\[
P(K\to\pi\pi) = |M|^2 t^2
\tag{4.4}
\]
and the kaon will thus have an appreciable probability to decay into the two-pion state if one waits long enough (the formula breaks down at very large times, because the higher-order terms are then no longer negligible).

The central result obtained in the present paper is that the finite-volume matrix element $M$ is related to the decay rate of the kaon in infinite volume through
\[
|A|^2 = 8\pi\left\{q\frac{\partial\phi}{\partial q} + k\frac{\partial\delta_0}{\partial k}\right\}_{k=k_\pi}\left(\frac{m_K}{k_\pi}\right)^3 |M|^2
\tag{4.5}
\]
[cf. Eqs. (2.5), (3.5)]. The relation holds under the same premises as Eq. (3.5) and the comments made in Sect. 3 thus apply here too. Another restriction is that the two-pion final state has to be non-degenerate in the specified sector of the unperturbed theory. This condition is satisfied for n < 8 [17], but degeneracies can occur at higher level numbers and the formula then ceases to be valid.

In principle Eq. (4.5) allows one to compute the kaon decay rate in infinite volume by studying the theory in finite volume. Note that in the course of such a calculation it should also be possible to determine the two-pion energy spectrum and thus the scattering phase $\delta_0$ in the elastic region.

The proportionality factor in Eq. (4.5) essentially accounts for the different normalizations of the particle states in finite and infinite volume. One can easily check this in the free theory, where the pion self-interactions are neglected. In this case and for n ≤ 6, the nth two-pion energy level passes through $m_K$ at
\[
L = \frac{2\pi}{k_\pi}\sqrt{n}.
\tag{4.6}
\]
Equation (4.5) then assumes the form
\[
|A|^2 = \frac{4}{\nu_n}\,(m_K L)^3\,|M|^2,
\tag{4.7}
\]
\[
\nu_n \equiv \text{number of integer vectors } z \text{ with } z^2 = n,
\tag{4.8}
\]
which is precisely what is derived from the relative normalizations of the plane waves in finite and infinite volume that describe the (non-interacting) kaon and pion states (Sect. 6).

5. Proof of Equation (4.5)

The interpretation of the proportionality factor in Eq. (4.5) given above also applies in the interacting case. This follows from the fact that the transition matrix elements probe the S-wave component of the two-pion wave function near the origin and that this component is the same in finite and infinite volume apart from its phase and normalization. The latter can be worked out explicitly in the framework of refs. [15]–[17], but the calculation is rather involved and will not be presented here.

Instead we shall go through a different argument, where one studies the influence of the weak interaction on the energy spectrum in finite volume. This can be done directly, using ordinary perturbation theory, or one may start from Eq. (3.5) and take the weak-interaction effects on the scattering phase into account. The combination of the results of these calculations then yields Eq. (4.5).
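The free-theory quantities in Eqs. (4.6)–(4.8) are simple enough to tabulate directly. The Python sketch below counts the integer vectors that define ν_n by brute force and evaluates the box size and the proportionality factor of Eq. (4.7). The function names are hypothetical and the numerical inputs are illustrative assumptions only; the sketch covers the non-interacting case and says nothing about the interacting factor in Eq. (4.5).

```python
from itertools import product
from math import isqrt, pi, sqrt


def nu(n: int) -> int:
    """Number of integer vectors z in Z^3 with z^2 = n, as in Eq. (4.8)."""
    r = isqrt(n)  # no component can exceed sqrt(n) in magnitude
    return sum(
        1
        for z in product(range(-r, r + 1), repeat=3)
        if z[0] ** 2 + z[1] ** 2 + z[2] ** 2 == n
    )


def pion_momentum(m_K: float, m_pi: float) -> float:
    """Centre-of-mass pion momentum k_pi defined in Eq. (2.5)."""
    return 0.5 * sqrt(m_K**2 - 4.0 * m_pi**2)


def box_size(n: int, k_pi: float) -> float:
    """Box size at which the n-th free level passes through m_K, Eq. (4.6)."""
    return 2.0 * pi * sqrt(n) / k_pi


def free_factor(n: int, m_K: float, k_pi: float) -> float:
    """Ratio |A|^2 / |M|^2 in the free theory, Eq. (4.7)."""
    return 4.0 / nu(n) * (m_K * box_size(n, k_pi)) ** 3


if __name__ == "__main__":
    # Illustrative masses in MeV (assumed, roughly the charged meson masses).
    m_pi, m_K = 139.6, 493.7
    k_pi = pion_momentum(m_K, m_pi)
    for n in range(1, 7):
        print(n, nu(n), box_size(n, k_pi), free_factor(n, m_K, k_pi))
```

The count also shows why the free-theory discussion stops at n = 6: there is no integer vector with z² = 7, so the labelling of levels by z² no longer coincides with consecutive level numbers beyond that point.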
Fig. 2. Kaon resonance contribution to the elastic pion scattering amplitude in the s-channel. The diagram appears at second order of the expansion in powers of the weak interaction, with the bubbles representing the first-order Kπ π vertex function
As already mentioned in Sect. 2, the kaon is assumed to carry a quantum number (alias strangeness) that forbids its decay into pions in the unperturbed theory. Since only the strangeness-changing part of the weak interaction lagrangian contributes to the kaon transition amplitudes, all other terms may be dropped without loss. The matrix elements of the weak hamiltonian $H_w$ between states with the same strangeness are then all equal to zero.

As a consequence most energy values in finite volume are affected by the weak interaction only to second order. First order energy shifts do occur, however, if there are degenerate states at lowest order that mix under the action of $H_w$. This is the case at the values of L where one of the two-pion energy values coincides with the kaon mass, i.e. at the special points considered in the preceding section. Degenerate perturbation theory then yields
\[
W = m_K \pm |M| + \dots
\tag{5.1}
\]
for the first order change of these energy values (here and below the ellipses denote higher-order terms that do not contribute to the final results).

The energy shifts (5.1) can also be calculated by including the weak corrections to the scattering phase on the left-hand side of Eq. (3.5). From the above one infers that the solutions of Eq. (3.5) we are interested in are given by
\[
k = k_\pi \pm \Delta k + \dots,
\qquad
\Delta k \equiv \frac{m_K |M|}{4k_\pi}.
\tag{5.2}
\]
Compared to the kaon resonance width (which is of second order in the weak interaction), these values of $k$ are far away from the kaon pole. The weak corrections to the pion scattering amplitude in the relevant range of energies are hence small and can be safely computed by working out the perturbation expansion in powers of the interaction lagrangian.

One might think that these corrections are all of second or higher order, because the interaction is strangeness-changing. The reason this is not so is that the kaon propagator in a diagram like the one shown in Fig. 2 evaluates to
\[
\frac{iZ_K}{p^2 - m_K^2} = \pm\frac{iZ_K}{2m_K|M|} + \dots
\tag{5.3}
\]
at the energies (5.1) and thus reduces the effective order of the term by 1. This diagram is in fact the only one that yields a first-order contribution to the scattering amplitude. It can be calculated by noting that the momenta flowing into the three-point vertices are all on shell up to higher-order corrections. The vertices are hence proportional to the kaon decay amplitude $A$. Together with Eq. (5.3) this leads to the result
\[
\bar\delta_0(k) = \delta_0(k) \mp \frac{k_\pi |A|^2}{32\pi m_K^2 |M|} + \dots \pmod{\pi}
\tag{5.4}
\]
for the scattering phase in the full theory at the point (5.2) (as in the previous section, $\delta_0$ stands for the phase shift in the unperturbed theory).

We now replace $\delta_0$ in Eq. (3.5) by $\bar\delta_0$ and expand all terms in powers of the weak interaction. The lowest-order terms cancel while, at first order, the equation implies
\[
-\Delta k\,\frac{\partial\delta_0(k)}{\partial k}\bigg|_{k=k_\pi} + \frac{k_\pi |A|^2}{32\pi m_K^2 |M|}
= \Delta k\,\frac{\partial\phi(q)}{\partial k}\bigg|_{k=k_\pi}.
\tag{5.5}
\]
This is easily seen to be equivalent to Eq. (4.5) after substituting the expression (5.2) for $\Delta k$ and we have thus proved this relation.

6. Verification of Equation (4.5) in Perturbation Theory

In a low-energy effective theory, such as the chiral non-linear σ-model, it is possible to obtain an independent check on Eq. (4.5) by working out the transition amplitudes in finite and infinite volume in perturbation theory. Since this calculation does not rely on any of the results presented above, it can provide additional confidence in the correctness of the equation. Perturbation theory may also prove helpful when considering more complicated situations, where one has several decay channels or particles with non-zero spin. In this section we describe how such a calculation proceeds, without giving too many details.

6.1. Specification of the model. The two-pion energy spectrum and the proportionality factor in Eq. (4.5) depend on the final-state interactions only through the phase shift $\delta_0$. All other properties of the pion interactions do not matter and to check the equation we may thus consider an arbitrary effective meson theory with the correct particle spectrum. For the pion interaction lagrangian the simplest choice is
\[
\mathcal{L}_{\rm int}(x) = \frac{1}{4!}\,\lambda\,\varphi(x)^4,
\tag{6.1}
\]
where $\varphi(x)$ denotes the pion field and $\lambda$ the bare coupling. To make the perturbation expansion completely well-defined, we introduce a Pauli–Villars cutoff $\Lambda$. At tree level the euclidean pion propagator is then given by
\[
\int d^4x\, e^{-ipx}\,\langle\varphi(x)\varphi(0)\rangle = \frac{1}{m^2+p^2} - \frac{1}{\Lambda^2+p^2},
\tag{6.2}
\]
with $m$ the bare mass of the pion (its physical mass is denoted by $m_\pi$ as before). The cutoff should be large enough so that ghost particles cannot be produced at energies below the four-pion threshold, but in view of the universality of Eq. (4.5) there is no need to take $\Lambda$ to infinity at the end of the calculation.

As far as the kaon is concerned, the least complicated possibility is to describe it by a hermitian free field $\theta(x)$ with mass $m_K$ and to take
\[
\mathcal{L}_w(x) = \frac{1}{2}\,g\,\theta(x)\,\varphi(x)^2
\tag{6.3}
\]
as the weak-interaction lagrangian. One then first has to expand the transition amplitude (2.2) in powers of $\lambda$, but we shall not discuss this here since the calculation is completely standard. The way to obtain the perturbation expansion of the finite-volume matrix element (4.3) may be less obvious, however, and we thus proceed to explain this in some detail.
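To make the role of the Pauli–Villars cutoff in Eq. (6.2) concrete, the short Python sketch below evaluates the regularized tree-level propagator and checks numerically that the subtraction improves the large-momentum behaviour from 1/p² to (Λ² − m²)/p⁴. The function name and the sample numbers are illustrative assumptions, not part of the calculation described in the text.

```python
def pv_propagator(p2: float, m: float, cutoff: float) -> float:
    """Tree-level euclidean pion propagator with a Pauli-Villars regulator.

    Implements the right-hand side of Eq. (6.2): p2 is the euclidean
    momentum squared, m the bare mass, cutoff the regulator mass Lambda.
    """
    return 1.0 / (m * m + p2) - 1.0 / (cutoff * cutoff + p2)


if __name__ == "__main__":
    m, cutoff = 1.0, 10.0  # illustrative values in arbitrary units
    for p2 in (1.0, 1e2, 1e4, 1e6):
        # p2**2 * propagator approaches cutoff**2 - m**2 as p2 grows.
        print(p2, pv_propagator(p2, m, cutoff), p2 * p2 * pv_propagator(p2, m, cutoff))
```

Combining the two fractions gives (Λ² − m²)/((m² + p²)(Λ² + p²)), which is the form the printed values can be compared against.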
Fig. 3. Feynman diagrams contributing to the correlation function (6.7). The lines represent the free pion propagator in the time-momentum representation (6.8) and the filled circles the self-interaction vertex. All external lines end at times x0 or y0
6.2. Two-pion states. In finite volume the low-lying two-pion energy eigenstates with zero total momentum and trivial transformation behaviour under the cubic group may be labelled by an integer n = 0, 1, 2, . . . such that the associated energies $W_n$ increase monotonically with n. We denote these states by $|\pi\pi\,n\rangle$ and assume that they have unit norm.

To lowest order in $\lambda$, the energy values are determined through the free energy-momentum relation and the relative momentum of the pions. Since we only consider cubically invariant states, any two momenta that are related to each other by a cubic transformation describe the same state. For n ≤ 6 the momenta in the set
\[
\Omega_n = \left\{k = 2\pi z/L \;\middle|\; z\in\mathbb{Z}^3,\ z^2 = n\right\}
\tag{6.4}
\]
are all equivalent in this sense. The corresponding state is thus non-degenerate and one concludes from this that
\[
W_n = 2\sqrt{m^2 + n(2\pi/L)^2} + O(\lambda), \qquad 0\le n\le 6.
\tag{6.5}
\]
In the following, our attention will be restricted to these levels. The corresponding energy eigenstates $|\pi\pi\,n\rangle$ can be created from the vacuum by applying the operators
\[
O_n(x_0) = \sum_{k\in\Omega_n}\int_0^L d^3x\,d^3y\; e^{ik(x-y)}\,\varphi(x_0,x)\,\varphi(x_0,y).
\tag{6.6}
\]
Note that $O_n(x_0)$ couples to all two-pion states in the given sector, since there are no quantum numbers that would forbid this. In euclidean space and at large time separations $x_0 - y_0$, its connected two-point function is thus given by
\[
\langle O_n(x_0)\,O_n(y_0)\rangle_{\rm con} = \sum_{l=0}^{6}|\langle 0|O_n(0)|\pi\pi\,l\rangle|^2\, e^{-W_l(x_0-y_0)} + \dots,
\tag{6.7}
\]
where the ellipses stand for more rapidly decaying terms.

The perturbation expansion of the two-pion energy $W_n$ and the associated matrix element $|\langle 0|O_n(0)|\pi\pi\,n\rangle|$ may now be obtained by expanding the left-hand side of Eq. (6.7) in Feynman diagrams in the standard way. If one uses the time-momentum representation
\[
\int_0^L d^3x\; e^{-ipx}\,\langle\varphi(x)\varphi(0)\rangle = \frac{e^{-\omega_p|x_0|}}{2\omega_p} - (m\leftrightarrow\Lambda),
\qquad
\omega_p \equiv \sqrt{m^2+p^2},
\tag{6.8}
\]
Fig. 4. Diagrams contributing to the correlation function (6.10). The double line represents the kaon propagator and the circled cross the weak interaction vertex at the origin. All other graphical elements are as in Fig. 3
for the tree-level pion propagator, the diagrams evaluate to a sum of exponentials. The desired expansions can then be read off from the coefficients of the exponential factor that corresponds to the nth level. To leading order the diagrams (a) and (b) in Fig. 3 yield the expected expression (6.5) for the two-pion energy and |0|On (0)|π π n| = 2νn L3 /Wn + O(λ) (6.9) for the matrix element [cf. Eq. (4.8)]. At the next order in the coupling, there are two types of diagrams. Diagram c and three further diagrams of this kind amount to an additive renormalization of the pion mass by a term that is independent of L up to exponentially small corrections [15]. Such contributions are neglected here and the renormalization is thus equivalent to replacing m by mπ in the tree-level expressions. One is then left with the diagram (d), which can be worked out analytically in a few lines.
6.3. Transition matrix element. The finite-volume transition matrix element (4.3) can be computed by studying the euclidean correlation function

$$\int_0^L d^3y\;\langle O_n(x_0)\,\mathcal{L}_w(0)\,\theta(y)\rangle_{\rm con} = \sum_{l=0}^{6} e^{-W_l x_0 + m_K y_0}\,\langle 0|O_n(0)|\pi\pi\,l\rangle\,\langle\pi\pi\,l|H_w|K\rangle\,\langle K|\theta(0)|0\rangle + \ldots \tag{6.10}$$

at large $x_0$ and large negative $y_0$. As in the case of the two-pion states, the terms we are interested in are found by looking for the appropriate exponential factor. To lowest order diagram (a) in Fig. 4 yields

$$|\langle\pi\pi\,n|H_w|K\rangle| = \frac{g\sqrt{\nu_n}}{2W_n\sqrt{m_K L^3}}\,\{1 + O(\lambda)\}. \tag{6.11}$$

The pion mass in this expression is renormalized by the tadpole insertions at the next order (diagram (b) and its mirror image). Diagram (c), the only other diagram at this order, may be evaluated by inserting the time-momentum representation for the external and also the internal lines. Apart from various simple terms, one then ends up with the momentum sum
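$$S_n = L^{-3}\sum_{\mathbf{p}\notin\Omega_n}\left\{\frac{1}{\omega_p(\mathbf{p}^2-\mathbf{k}^2)} - R_\Lambda(\mathbf{p}^2,\mathbf{k}^2)\right\}, \qquad \mathbf{k}\in\Omega_n, \tag{6.12}$$

where $R_\Lambda$ is an expression that arises from the Pauli-Villars regularization. A general summation formula proved in ref. [16] allows one to compute such sums up to terms that vanish more rapidly than any power of $1/L$. The precise form of $R_\Lambda$ is not important for this. One only needs to know that it is a smooth function of $\mathbf{p}$ and $\mathbf{k}$ and that it makes the sum absolutely convergent. The result

$$S_n = \int\frac{d^3p}{(2\pi)^3}\left\{\frac{1}{2\omega_p}\left[\frac{1}{\mathbf{p}^2-\mathbf{k}^2+i\epsilon} + \frac{1}{\mathbf{p}^2-\mathbf{k}^2-i\epsilon}\right] - R_\Lambda(\mathbf{p}^2,\mathbf{k}^2)\right\} + \frac{z_n}{4\pi^2\omega_k L} + \frac{\nu_n}{2(\omega_k L)^3} + \frac{\nu_n}{L^3}R_\Lambda(\mathbf{k}^2,\mathbf{k}^2) \tag{6.13}$$

is then obtained, with the constant $z_n$ given by

$$z_n = \lim_{q^2\to n}\left\{\sqrt{4\pi}\,Z_{00}(1;q^2) + \frac{\nu_n}{q^2-n}\right\} \tag{6.14}$$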
(the zeta function $Z_{00}(s;q^2)$ is defined in Appendix B).

6.4. Final steps. To check Eq. (4.5) one has to tune the box size so that $W_n = m_K$ for a specified level number $n$. This condition determines $L$ order by order in the coupling. The perturbation expansion of the right-hand side of Eq. (4.5) is then obtained by inserting this series in the proportionality factor and the perturbative expressions for the matrix element $|\langle\pi\pi\,n|H_w|K\rangle|$. To lowest order, the box size is given by Eq. (4.6) and the function $\phi(q)$ in the proportionality factor is thus to be expanded around $q = \sqrt{n}$. This generates a term proportional to $z_n$, which cancels the corresponding term in Eq. (6.13). The integral in this equation matches the contribution to the transition amplitude $A$ of the infinite-volume diagram with the topology of diagram (c). All other terms that occur at first order in the coupling cancel and one finds that Eq. (4.5) holds as expected.

7. Application to the Physical Kaon Decays

Compared to the generic theory considered so far, the situation in the case of the physical kaon decays is complicated by the fact that there are several decay channels. To a first approximation we may however assume that isospin is an exact symmetry in the absence of the weak interactions. The decay channels can then be separated from each other by passing to a basis of states with definite quantum numbers. As an example we discuss the CP-conserving decays of the neutral kaon into two-pion states with isospin 0 and 2. The corresponding decay amplitudes, $A_0$ and $A_2$, are related to the physical transition matrix elements through

$$T(K_S^0\to\pi^+\pi^-) = \frac{2}{\sqrt6}A_0\,e^{i\delta_0^0} + \frac{1}{\sqrt3}A_2\,e^{i\delta_0^2}, \tag{7.1}$$

$$T(K_S^0\to\pi^0\pi^0) = -\frac{2}{\sqrt6}A_0\,e^{i\delta_0^0} + \frac{2}{\sqrt3}A_2\,e^{i\delta_0^2}. \tag{7.2}$$
Table 7.1. Calculation of the proportionality factor in Eq. (7.4) at the first level crossing

I     L [fm]     q        q ∂φ/∂q     k ∂δ_0^I/∂k
0     5.34       0.89     4.70        1.12
2     6.09       1.02     6.93        −0.09
In these equations $\delta_0^I$ denotes the S-wave pion scattering phase in the channel with isospin $I$ and the normalization and phase conventions are as in Sect. 2. In the sector of two-pion states with isospin $I$, zero electric charge, zero total momentum and trivial transformation behaviour under cubic rotations and reflections, the energy spectrum in finite volume is determined by the equations that we have previously discussed, with $\delta_0$ replaced by $\delta_0^I$. At the points where one of these energy levels passes through $m_K$, we define the associated transition matrix element

$$M_I = \langle(\pi\pi)_I|H_w|K^0\rangle, \tag{7.3}$$

where it is understood that the states are normalized to unity and that $H_w$ is the CP-conserving part of the effective weak hamiltonian. With these conventions, the physical amplitudes are given by

$$|A_I|^2 = 8\pi\left\{q\frac{\partial\phi}{\partial q} + k\frac{\partial\delta_0^I}{\partial k}\right\}_{k=k_\pi}\left(\frac{m_K}{k_\pi}\right)^3|M_I|^2. \tag{7.4}$$
Note that $A_0$ and $A_2$ are real and only their relative sign is observable. Up to this sign, the complete information can thus be retrieved from the matrix elements and the energy spectrum in finite volume. For illustration, let us suppose that the scattering phases $\delta_0^I$ are accurately described by the one-loop formulae of chiral perturbation theory [20]–[22]. The two-pion energy spectrum in the subspaces with isospin $I$ and the box sizes $L$, where the next-to-lowest levels in these sectors (the ones with level number $n = 1$) coincide with the kaon mass, can then be calculated. After that the proportionality factor in Eq. (7.4) is easily evaluated (Table 7.1) and one ends up with

$$|A_0| = 44.9\times|M_0|, \tag{7.5}$$

$$|A_2| = 48.7\times|M_2|, \tag{7.6}$$

$$|A_0/A_2| = 0.92\times|M_0/M_2|. \tag{7.7}$$
As can be seen from these figures, the large difference between the scattering phases in the two isospin channels (about 45° at $k = k_\pi$) does not lead to a big variation in the proportionality factors. In fact, if we set the scattering phases to zero altogether, Eqs. (4.6)–(4.8) give $|A_I| = 47.7\times|M_I|$ for $n = 1$, which is not far from the results quoted above. The proportionality factor in Eq. (7.4) thus appears to be only weakly dependent on the final-state interactions. In particular, if the theory is to reproduce the $\Delta I = 1/2$ enhancement, the large factor has to come from the ratio of the finite-volume matrix elements $M_I$.
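The numbers in Eqs. (7.5)–(7.7) can be reproduced from Table 7.1 with a few lines of Python. The sketch below is an illustration only: the meson masses are assumed round values (they are not quoted in the text), and $k_\pi$ is assumed to be the pion momentum at which the two-pion energy $2\sqrt{m_\pi^2+k^2}$ equals $m_K$.

```python
import math

# Assumed inputs (not quoted in the text): approximate meson masses in MeV.
M_KAON = 498.0
M_PION = 138.0

# (q dphi/dq, k ddelta/dk) at the first level crossing, taken from Table 7.1.
TABLE_7_1 = {0: (4.70, 1.12), 2: (6.93, -0.09)}


def pion_momentum(m_kaon: float = M_KAON, m_pion: float = M_PION) -> float:
    """Momentum k_pi solving 2 sqrt(m_pi^2 + k^2) = m_K (assumed definition)."""
    return math.sqrt(0.25 * m_kaon ** 2 - m_pion ** 2)


def proportionality_factor(isospin: int) -> float:
    """|A_I| / |M_I| as given by Eq. (7.4)."""
    q_dphi, k_ddelta = TABLE_7_1[isospin]
    ratio = M_KAON / pion_momentum()
    return math.sqrt(8.0 * math.pi * (q_dphi + k_ddelta) * ratio ** 3)


if __name__ == "__main__":
    f0, f2 = proportionality_factor(0), proportionality_factor(2)
    print(f"|A0|/|M0| = {f0:.1f}, |A2|/|M2| = {f2:.1f}, ratio = {f0 / f2:.2f}")
```

With these assumed masses the two factors come out within about one percent of the quoted 44.9 and 48.7. Their ratio depends only on the table entries, since the kinematical factor $(m_K/k_\pi)^3$ cancels.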
8. Concluding Remarks

Finite-volume techniques have been used in lattice field theory for many years and have long proved to be a most effective tool. It may well be that weak transition matrix elements are also best approached in this way. For two-body decays a concrete proposition along this line has been made here, which is conceptually satisfactory and which we believe has a fair chance to work out in practice.

In the case of the physical kaon decays, the proportionality factor relating the transition matrix elements in finite and infinite volume turned out to be nearly the same in the two isospin channels. This may be surprising at first sight, since the interactions of the pions in the isospin 0 state are much stronger than in the isospin 2 state. One should, however, take into account the fact that the comparison is made at box sizes $L$ greater than 5 fm. It is hence quite plausible that the finite-volume matrix elements already include most of the final-state interaction effects (such as the ones recently discussed in refs. [23]–[25]). Apart from a purely kinematical factor, only a small correction is then required to pass to the matrix elements in infinite volume.

Since the unitarity of the underlying field theory has been essential for our argumentation, it is not obvious that Eq. (4.5) holds in quenched QCD. As usual, however, one expects to be safe from the deficits of the quenched approximation when the quark masses are not too small and our results should then be applicable. An investigation of the problem in quenched chiral perturbation theory, following refs. [26, 27], may be worthwhile at this point to find out where precisely the unphysical effects set in.

As a final comment we note that the ideas developed in this paper may also be applied to baryon decays, such as $\Lambda\to N\pi$, $\Sigma\to N\pi$ and $\Xi\to\Lambda\pi$, as well as to any other decay where the particles in the final state scatter only elastically. Depending on the kinematical details, the relation between the finite and infinite volume transition matrix elements may, however, assume a slightly different form.

Appendix A

The components of four-vectors in real and euclidean space are labelled by an index running from 0 to 3. Bold-face types denote the spatial parts of the corresponding four-vectors and scalar products are always taken with euclidean metric, except for Lorentz vectors in real space where $xy = x_0y_0 - \mathbf{x}\mathbf{y}$. States $|p\rangle$ in infinite volume describing a spinless particle, with mass $m$ and four-momentum

$$p = (p_0,\mathbf{p}), \qquad p_0 = \sqrt{m^2+\mathbf{p}^2} > 0, \tag{A.1}$$

are normalized in such a way that

$$\langle p'|p\rangle = 2p_0\,(2\pi)^3\,\delta(\mathbf{p}-\mathbf{p}'). \tag{A.2}$$
Particle states in finite volume are always normalized to unity. In the centre-of-mass frame, the elastic scattering amplitude of two spinless particles of mass $m$ may be expanded in partial waves according to

$$T = 16\pi W\sum_{l=0}^{\infty}(2l+1)\,P_l(\cos\theta)\,t_l(k), \qquad W = 2\sqrt{m^2+k^2}, \tag{A.3}$$

where $W$ denotes the total energy of the particles, $\theta$ the scattering angle and $P_l(z)$ the Legendre polynomials [28]. Below the inelastic threshold, unitarity implies

$$t_l = \frac{1}{2ik}\left(e^{2i\delta_l}-1\right), \tag{A.4}$$

with $\delta_l$ the (real) scattering phase for angular momentum $l$.

Appendix B

For all $q \ge 0$ the angle $\phi(q)$ is determined through

$$\tan\phi(q) = -\frac{\pi^{3/2}q}{Z_{00}(1;q^2)}, \qquad \phi(0) = 0, \tag{B.1}$$

and the requirement that it depends continuously on $q$. The zeta function in this equation is defined by

$$Z_{00}(s;q^2) = \frac{1}{\sqrt{4\pi}}\sum_{\mathbf{n}\in\mathbb{Z}^3}(\mathbf{n}^2-q^2)^{-s} \tag{B.2}$$

if ${\rm Re}\,s > \frac32$ and elsewhere through analytic continuation. Numerical methods to compute the zeta function are described in ref. [17] and a table of values of $\phi(q)$ is included in ref. [18]. The source code of a set of ANSI C programs for these functions can be obtained from the authors.
References

1. Dawson, C., Martinelli, G., Rossi, G.C., Sachrajda, C.T., Sharpe, S., Talevi, M., Testa, M.: Nucl. Phys. B 514, 313 (1998)
2. Maiani, L., Testa, M.: Phys. Lett. B 245, 585 (1990)
3. Ciuchini, M., Franco, E., Martinelli, G., Silvestrini, L.: Phys. Lett. B 380, 353 (1996)
4. Montvay, I., Weisz, P.: Nucl. Phys. B 290 [FS20], 327 (1987)
5. Frick, Ch., Jansen, K., Jersák, J., Montvay, I., Münster, G., Seuferling, P.: Nucl. Phys. B 331, 515 (1990)
6. Lüscher, M., Wolff, U.: Nucl. Phys. B 339, 222 (1990)
7. Guagnelli, M., Marinari, E., Parisi, G.: Phys. Lett. B 240, 188 (1990)
8. Gattringer, C.R., Lang, C.B.: Nucl. Phys. B 391, 463 (1993)
9. Gupta, R., Patel, A., Sharpe, S.: Phys. Rev. D 48, 388 (1993)
10. Fiebig, H.R., Dominguez, A., Woloshyn, R.M.: Nucl. Phys. B 418, 649 (1994)
11. Göckeler, M., Kastrup, H.A., Westphalen, J., Zimmermann, F.: Nucl. Phys. B 425, 413 (1994)
12. Fukugita, M., Kuramashi, Y., Okawa, M., Mino, H., Ukawa, A.: Phys. Rev. D 52, 3003 (1995)
13. Aoki, S. et al. (JLQCD Collab.): Phys. Rev. D 58, 054503 (1998)
14. Gutsfeld, C., Kastrup, H.A., Stergios, K.: Nucl. Phys. B 560, 431 (1999)
15. Lüscher, M.: Commun. Math. Phys. 104, 177 (1986)
16. Lüscher, M.: Commun. Math. Phys. 105, 153 (1986)
17. Lüscher, M.: Nucl. Phys. B 354, 531 (1991)
18. Lüscher, M.: Nucl. Phys. B 364, 237 (1991)
19. Rummukainen, K., Gottlieb, S.: Nucl. Phys. B 450, 397 (1995)
20. Gasser, J., Leutwyler, H.: Phys. Lett. B 125, 325 (1983); Ann. Phys. (NY) 158, 142 (1984); Nucl. Phys. B 250, 465 (1985)
21. Gasser, J., Meissner, U.-G.: Phys. Lett. B 258, 219 (1991)
22. Knecht, M., Moussallam, B., Stern, J., Fuchs, N.H.: Nucl. Phys. B 457, 513 (1995)
23. Pallante, E., Pich, A.: Phys. Rev. Lett. 84, 2568 (2000)
24. Paschos, E.A.: Rescattering effects for ε′/ε. hep-ph/9912230
25. Buras, A.J., Ciuchini, M., Franco, E., Isidori, G., Martinelli, G., Silvestrini, L.: Phys. Lett. B 480, 80 (2000)
26. Bernard, C.W., Golterman, M.F.L.: Phys. Rev. D 53, 476 (1996)
27. Golterman, M.F.L., Leung, K.C.: Phys. Rev. D 56, 2950 (1997); ibid. D 57, 5703 (1998); ibid. D 58, 097503 (1998)
28. Gradshteyn, I.S., Ryzhik, I.M.: Table of Integrals, Series and Products. New York: Academic Press, 1965

Communicated by G. Mack
Commun. Math. Phys. 219, 45 – 56 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Computations of $M/\Lambda_{\overline{\rm MS}}$ in the 2-d O(n) Non-Linear σ-Model

Peter Weisz
Max-Planck-Institut für Physik, Föhringer Ring 6, 80805 München, Germany

Received: 23 March 2000 / Accepted: 10 April 2000

Dedicated to the memory of Harry Lehmann

Abstract: We review the various computations of the ratio of the mass gap $M$ to the $\Lambda$-parameter entering the perturbative computations of amplitudes at high energies in the O(n) non-linear σ-model. In particular we reproduce from original notes of H. Lehmann his computation of this ratio in the (next–to) leading order of the $1/n$ expansion.

1. Introduction

Harry Lehmann was in his later career well known to have a strong interest in phenomenology. It may thus come as a surprise to many that he also had a secret love for soluble models in two dimensions! The reason was that he was always concerned about basic field theoretical principles and structures, and these can generally be investigated more easily in 2 dimensions rather than our real world. Historically studies of 2 dimensional models have played a significant role in the development of quantum field theory and statistical mechanics – ranging from studies of phase transitions, critical exponents, non-Euclidean geometry, solitons, duality etc.; moreover they lie at the heart of the presently popular string theories.

The main topic of this review is the non-linear O(n) sigma model. Classically it is simply described by the action

$$S = \frac{1}{2g_0^2}\int d^2x\,(\partial_\mu s^a(x))^2, \tag{1.1}$$

where $s^a(x)$ are fields satisfying the constraint $s^2 = 1$. There are many approaches to the quantization of this model, although the question as to which of these lead to the same theory has not yet been answered. What makes the model particularly interesting is that it shares many properties with Yang–Mills theory, the pure gluonic part of QCD, the candidate theory of the strong interactions. For example for the case $n = 3$ the model has instanton solutions. Moreover perturbatively it has been shown to be renormalizable [1, 2] and has the property of being asymptotically free, i.e. at high energies $q$ various physical amplitudes can be computed as power series in a running coupling, e.g. $g_{\overline{\rm MS}}(q)$, the coupling of the $\overline{\rm MS}$ scheme of dimensional regularization. The running coupling of any scheme falls logarithmically at high energies and the scale is set by the so-called $\Lambda$-parameter (of that scheme), for example in the $\overline{\rm MS}$ scheme

$$\frac{1}{g^2_{\overline{\rm MS}}(q)} = \frac{(n-2)}{2\pi}\left\{\ln\frac{q}{\Lambda_{\overline{\rm MS}}} + \frac{1}{n-2}\ln\ln\frac{q}{\Lambda_{\overline{\rm MS}}} + O\left(\frac{\ln\ln q/\Lambda_{\overline{\rm MS}}}{\ln q/\Lambda_{\overline{\rm MS}}}\right)\right\}. \tag{1.2}$$
Perturbation theory describes the running of amplitudes at high energy but one needs non-perturbative methods to relate the $\Lambda$-parameter to low energy mass parameters.

Table 1. Determinations of $M/\Lambda_{\overline{\rm MS}}$ in the σ-model

Date   Authors                              n = 3            general n                          Method
’82    Lüscher                              1.7 ± 0.4                                           small vol. L
’85    Floratos, Petcher                    2.1 ± 0.2                                           + extrapolation L → ∞
’82    Fox et al.                           ∼ 3.7                                               Lattice MC (SA)
’83    Berg et al.                          ∼ 1.3                                               Lattice MC (IA)
’89    Wolff                                2.75(25)                                            Lattice MC (SA)
’90    Hasenfratz, Niedermayer              3.4 ± 0.1
’85    Müller et al.                                         1 + (ln 8 + γ − 1)/n + ···         1/n-Expansion
’90    Biscari et al.; Lehmann
’85    Gliozzi                              2.71···          exp[1/(n − 2)]                     Hamilton Form. + OPE (wrong!)
’81    Iwasaki                              8/e = 2.94···                                       Instantons ?!
’90    Hasenfratz, Maggiore, Niedermayer    8/e              (8/e)^{1/(n−2)} / Γ[1 + 1/(n−2)]   TBetheA + S-Matrix
’91    Lüscher, Weisz, Wolff                2.9(3)                                              Finite size scaling + lattice
’97    Shin
It is widely assumed that the ground state $|0\rangle$ is O(n) invariant, and that the spectrum of stable particles consists only of an O(n) vector multiplet of particles of mass $M > 0$ (i.e. there are no stable bound states). The computation of the ratio of $M$ to the $\Lambda$-parameter of the $\overline{\rm MS}$ scheme in the σ-model has a long history which is summarized in Table 1 and will be further discussed in the next section. Lehmann’s name appears in connection with the computation of the leading orders in the $1/n$ expansion:

$$M/\Lambda_{\overline{\rm MS}} \equiv C(n) = 1 + c_1/n + c_2/n^2 + O(1/n^3). \tag{1.3}$$
He completed the analytic calculation of the coefficient $c_1$ around the same time as Biscari, Campostrini and Rossi [3]. But, as was typical for him, he did not publish his result immediately; he wanted to first compute the next coefficient $c_2$. He thought that with this information he could actually guess the exact result! In the meantime Hasenfratz, Maggiore and Niedermayer [4] showed that amazingly the exact result could be obtained analytically using the thermodynamic Bethe ansatz. The result on $c_1$ thus served as a useful consistency check.

In 1991 Forgács, Niedermayer and I were discussing with Lehmann the extension of the computation of $M/\Lambda$ to the Gross–Neveu model. He agreed to send his unpublished hand-written notes and I have taken this opportunity to convert these practically unchanged to print in Appendix A. We adopted his method, which is rather elegant since it makes use of his spectral representation and does not require a particular regularization, for our $1/n$ computation [5].

2. Methods of Computing the Ratio $C = M/\Lambda_{\overline{\rm MS}}$ in the O(n) σ-Models

The first computations of $C$ were performed using lattice regularization. The literature on the subject is vast and only a few representative references have been included in the table. The lattice $\Lambda$-parameter associated with a given lattice action is a function of the bare coupling $g_0$:

$$a\Lambda_{\rm latt} = (b_0g_0^2)^{-b_1/b_0^2}\exp[-1/(b_0g_0^2)]\left\{1 + O(g_0^2)\right\}, \tag{2.1}$$

where $a$ is the lattice spacing. One can compute the mass gap in lattice units $M(g_0)a$ directly through numerical measurements of the exponential decay of the spin 2–point function and compare the behavior with the rhs of (2.1). The lattice data is consistent with $M(g_0)/\Lambda_{\rm latt}(g_0)$ slowly varying for small $g_0$. Unfortunately perturbation theory in the bare coupling (for the standard action) seems badly convergent, which could explain the deviations of the values in the table from the exact result. Nevertheless we note that the ratio $\Lambda_{\overline{\rm MS}}/\Lambda_{\rm latt}$ which is needed to convert the result to the $\overline{\rm MS}$ scheme often introduces a significant factor, e.g. for the standard action and $n = 3$ it is ∼ 27, and hence any discrepancy could have been a priori much worse!

Wolff [8] obtained his result by comparing the spin 2–point function measured on the lattice at short physical distance with renormalized perturbation theory. His result, obtained prior to the paper by Hasenfratz et al, is in remarkably good agreement therewith.

Another approach was that of Lüscher [9]. His tactic was to consider the mass gap $m(L)$ in a finite periodic (1d) volume $L$. He pointed out that for very small volumes $z = m(L)L \ll 1$ the ratio $c(z) = m(L)/\Lambda_{\overline{\rm MS}}$ could be computed perturbatively. The ratio in leading order is a rapidly falling function for small $z$ (e.g. for $n = 3$, $c(z)\propto z^2\exp(2\pi/z)$) which flattens and attains a minimum at some finite $z$. In the limit $n\to\infty$ the value of $c(z)$ at the minimum is close to the known result at $z = \infty$. Encouraged by this and by the fact that he had previously shown that $m(L)$ approached $M$ exponentially for large $ML$ [10]¹

$$(m(L)-M)/M = f(n)\,\frac{1}{\sqrt{2\pi ML}}\,e^{-ML}\left[1 + O(1/(ML))\right], \tag{2.2}$$

Lüscher assumed that $c(z)$ was a monotonically falling function of $z$ and tried to estimate $C = c(\infty)$. Unfortunately it turned out that his systematic error was underestimated. Indeed the computation to next order, performed by Floratos and Petcher [11], yielded a central value just within his expected errors. Their result is indeed closer to the exact result but the error is still underestimated. It seems that straightforward 2-loop perturbation theory for $c(z)$ is not quantitatively accurate at intermediate values $z\sim1$.

¹ Here $f(n)$ is related to the forward scattering amplitude; one obtains e.g. $f(3) = 32/9$.

A refined version of the above method was put forward by Lüscher, Weisz and Wolff [12] and extended later by Shin [13]. Here the running of the “coupling” $m(L)L$ was measured on the lattice over a wide range of volumes. The continuum limit was taken assuming that the ratio of physical quantities reaches its limit rapidly as a power in the lattice cut–off (modulated by powers of logs) as proposed by Symanzik [14]. The range measured covers large volumes where $m(L)\sim M$, to small volumes where renormalized perturbative behavior seems to set in, and hence permits an approximate determination of $C$.

Actually, the first computation of the coefficient $c_1$ in (1.3) was done by Müller, Raddatz and Rühl [15] in 1985. They started from the $1/n$ expansion in the lattice regularization and evaluated $c_1$ numerically. Their result was in excellent agreement with the analytic result obtained much later by Biscari et al [3] and Lehmann working in the continuum². It remains to remark that although Lehmann invested a lot of effort to compute the coefficient $c_2$ he did not complete this calculation and to my knowledge it has also not yet been accomplished by anyone else.

² The correct result appears also in the paper of H. Flyvbjerg [16]; however his 1989 preprint on which this paper is based contained an error which was pointed out by the authors of ref. [3].

Iwasaki [17] produced the exact result for the case $n = 3$ already in 1981! He obtained this by assuming that classical multi-instanton configurations completely saturate the path integral and further argued that either instantons or anti-instantons should be included but not both. Since the approximations involved are rather questionable it is not clear whether this agreement is an accident or has a deeper significance.

In the table we also note the result of Gliozzi [18] which does not agree with the exact result. We include this because the story here is rather unfortunate; Gliozzi obtained his result crucially using a wrong equation in a paper of Lüscher [19]³. Fortunately this mistake did not affect the results in which Lüscher was actually interested in that publication. Neither author published an erratum but notes discussing the necessary corrections to [19] are available [20].

³ When Lüscher saw Gliozzi’s paper he realized that there must be an error (which he hardly ever makes!). He immediately informed the author of [18] but it seems that it was too late for the paper to be withdrawn.

We conclude this section by outlining the computation of Hasenfratz et al [4]. It is based on a remarkable property of the O(n) sigma models which we have not mentioned so far. That is, classically they have an infinite set of conserved local and non-local charges. It was shown by Polyakov [1] and Lüscher [19] that these survive quantization. In the quantum theory they have the consequence that there is no particle production and the N-particle S-matrix factorizes into a product of 2-particle S-matrices, which in turn is determined (up to CDD ambiguities) to be that proposed by Zamolodchikov and Zamolodchikov [21].

To be able to compute $C$ exactly one must find a quantity which is calculable in perturbation theory and in a non-perturbative approach. One such quantity is the free energy $f$ as a function of a chemical potential $h$ coupled to a Noether charge $J^{12}$. The euclidean action is:

$$S = \frac{1}{2g_0^2}\int d^Dx\left\{(\partial s)^2 + 2ih\,(s^1\partial_0s^2 - s^2\partial_0s^1) - h^2\left(1 - s_3^2 - \cdots - s_n^2\right)\right\}. \tag{2.3}$$
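A standard one loop perturbative calculation using dimensional regularization yields

$$f(h) - f(0) = -\frac{h^2}{2}\left\{\frac{1}{g^2_{\overline{\rm MS}}(h)} - \frac{(n-2)}{4\pi} + O\big(g_{\overline{\rm MS}}(h)^2\big)\right\}, \tag{2.4}$$

and hence, using (1.2)

$$f(h) - f(0) = -\frac{(n-2)}{2\pi}\,\frac{h^2}{2}\left\{\ln\frac{h}{\Lambda_{\overline{\rm MS}}\sqrt e} + \frac{1}{n-2}\ln\ln\frac{h}{\Lambda_{\overline{\rm MS}}} + O\left(\frac{\ln\ln h/\Lambda_{\overline{\rm MS}}}{\ln h/\Lambda_{\overline{\rm MS}}}\right)\right\}. \tag{2.5}$$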
On the other hand Polyakov and Wiegmann [22] derived an integral equation for $f(h)$ for cases $n = 3, 4$ by applying the Bethe ansatz technique to related fermionic models. Hasenfratz and Niedermayer [4] showed the integral equation can be derived for general $n$. The largest eigenvalue of $J^{12}$ on one particle states is 1. Thus as $h$ exceeds the threshold value of $h_c = M$ a finite density of such particles will be formed. The momenta $p_j = M\sinh\theta_j$ of a dilute gas of $N$ such particles in a periodic system of size $L$ satisfy the eigenvalue equation

$$\exp(-iLM\sinh\theta_j) = \prod_{r\ne j}S(\theta_j-\theta_r), \qquad j = 1, 2, \ldots, N, \tag{2.6}$$

where $S(\theta)$ is the invariant amplitude in the symmetric and traceless channel. If the gas is not dilute, multi–particle scattering processes will also enter, which in general leads to a more complicated problem. However because of factorization of the S–matrix one can argue that Eq. (2.6) remains true for arbitrary densities $\rho = N/L$. Due to the property $S(0) = -1$, the threshold behavior $h \ge M$ is described by a dilute non-relativistic non-interacting Fermi gas. The derivation of the integral equation starts from (2.6) and follows standard steps. Taking the thermodynamic limit $N\to\infty$, $L\to\infty$, $\rho$ fixed, the density $g(\theta)$ (normalized so that $dN = (L/2\pi)\,g(\theta)\,d\theta$) satisfies the integral equation

$$g(\theta) - \int_{-B}^{B}d\theta'\,K(\theta-\theta',n)\,g(\theta') = M\cosh\theta, \tag{2.7}$$

where the kernel $K$ is related to the S–matrix amplitude through

$$K(\theta,n) = \frac{1}{2\pi i}\frac{d}{d\theta}\ln S(\theta). \tag{2.8}$$

The energy $E$ of the ground state and the particle density $\rho$ are given by

$$E = \frac{1}{2\pi}\int_{-B}^{B}d\theta\,g(\theta)\,M\cosh\theta, \tag{2.9}$$

$$\rho = \frac{1}{2\pi}\int_{-B}^{B}d\theta\,g(\theta). \tag{2.10}$$

The free energy density can be obtained from the above equations by performing the Legendre transform

$$f(h) = \min_\rho\,[E(\rho) - h\rho]. \tag{2.11}$$
One obtains

$$f(h) - f(0) = -\frac{M}{2\pi}\int_{-B}^{B}d\theta\,\epsilon(\theta)\cosh\theta, \tag{2.12}$$

where $\epsilon(\theta)$ satisfies the integral equation

$$\epsilon(\theta) - \int_{-B}^{B}d\theta'\,K(\theta-\theta',n)\,\epsilon(\theta') = h - M\cosh\theta. \tag{2.13}$$

The parameter $B$ is determined through the boundary condition

$$\epsilon(\pm B) = 0. \tag{2.14}$$
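The analysis of the integral equation in the limit of $h \gg M$ uses the generalized Wiener–Hopf technique, and is rather involved [4]. The result for $h \gg M$ is

$$f(h) - f(0) = -\frac{(n-2)}{2\pi}\,\frac{h^2}{2}\left\{\ln\frac{h}{M\sqrt e} + \ln c(n) + \frac{1}{n-2}\ln\ln\frac{h}{M} + O\left(\frac{\ln\ln(h/M)}{\ln(h/M)}\right)\right\}, \tag{2.15}$$

where

$$c(n) = \left(\frac{8}{e}\right)^{\frac{1}{n-2}}\frac{1}{\Gamma\left(1+\frac{1}{n-2}\right)}. \tag{2.16}$$

Comparing (2.5) and (2.15) we see that

$$C(n) = c(n). \tag{2.17}$$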
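As a numerical illustration of Eqs. (2.16) and (2.17), the following Python sketch evaluates the exact ratio and compares it with the $1/n$ expansion (1.3), truncated after the term with $c_1 = 3\ln2+\gamma-1$ derived in the appendix. The function names are illustrative, and Euler's constant is hard-coded because Python's `math` module does not provide it.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def exact_ratio(n: int) -> float:
    """C(n) = (8/e)^(1/(n-2)) / Gamma(1 + 1/(n-2)), defined for n >= 3."""
    x = 1.0 / (n - 2)
    return (8.0 / math.e) ** x / math.gamma(1.0 + x)


def large_n_ratio(n: int) -> float:
    """Truncated expansion 1 + c1/n with c1 = 3 ln 2 + gamma - 1."""
    c1 = 3.0 * math.log(2.0) + EULER_GAMMA - 1.0
    return 1.0 + c1 / n


if __name__ == "__main__":
    for n in (3, 4, 10, 100, 1000):
        print(n, exact_ratio(n), large_n_ratio(n))
```

For $n = 3$ the exact expression reduces to $8/e \approx 2.94$, the value listed in Table 1, while the truncated expansion is only reliable at large $n$.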
3. Discussion

A glance at the table gives a quantitatively satisfactory impression; all methods for determining the ratio $C$ agree rather well (and as noted before for the lattice determinations it is not a priori obvious that this should be so). On the other hand the overall agreement could just be an accident! Obviously the task of computing $C(n)$ only makes sense if there really exists a non-perturbative definition of the theory which behaves according to renormalized perturbation theory at high energies. The problem is that there is no non-perturbative definition of the model where this has been rigorously proven; indeed, although the model is supposedly integrable, no completely satisfactory solution starting from first principles (e.g. à la Bethe Ansatz) has been given⁴.

⁴ A few physicists, e.g. L. Faddeev, might disagree with this statement!

As mentioned above, the thermodynamic Bethe Ansatz computation is fully consistent with perturbation theory, but there are some unproven assumptions which enter into the computation. In this connection one may argue that the fact that the leading $\ln(h)$ perturbative behavior is obtained from the thermodynamic Bethe ansatz is not surprising because the Zamolodchikov S-matrix satisfies (on-shell) asymptotic freedom in the sense that the phase shifts go to zero logarithmically at high energies. However what is highly non-trivial is that the coefficients of the non-leading terms involving $\ln\ln h$ (which in the perturbative approach are related to the 2-loop coefficient of the beta function) also match; without this of course one would have obtained an inconsistency!

There is one non-perturbative approach to quantization of the sigma-model, the form factor bootstrap [23, 24], which is nearly definitely asymptotically free. The best evidence for this has been produced by Balog and Niedermaier [25]. In this approach off-shell correlation functions are obtained starting from the on-shell data and using general field theoretic properties. Here the crucial question which remains is whether this approach really defines a quantum field theory (although no reason has yet been put forward that this should not be the case).

The situation with the construction of the theory starting from the lattice regularization is more subtle. Practically all present lattice investigations crucially rely on one or both of the following assumptions. The first is that the critical point of the (standard) lattice model is at $g_0 = 0$, and the second is that the continuum limit is reached à la Symanzik (mentioned in the previous section). Although much data is consistent with these, a proof of either is lacking. A failure of the Symanzik hypothesis would unfortunately be a big blow to the goal of obtaining accurate results in QCD from numerical simulations. In fact both assumptions have been questioned by Patrascioiu and Seiler [26]; they claim that, for the standard action, there is a critical point at some $g_c > 0$ and that the continuum limit of the lattice O(n) model is not asymptotically free for any $n \ge 2$! If Patrascioiu and Seiler are correct it would point to a serious gap in our understanding of universality and would imply that there are various classes of non-linear O(n) sigma models with differing high energy behavior.

It is known that the continuum limit of the lattice model is quantitatively consistent with the Zamolodchikov S-matrix at low energies [28] and correlation functions are consistent with perturbation theory up to very high energies of $O(q/M) = 50$ [25]. Thus although the question of whether the continuum limit of the standard lattice theory is asymptotically free is theoretically highly interesting, it is probably phenomenologically irrelevant in the sense that there is a wide range of energies effectively described by the lattice regularization and by renormalized perturbation theory. Infinite energies are an idealization, and if one extends similar thoughts to QCD then this theory (assuming it correctly describes hadronic phenomena) does not “stand alone” in Nature; in particular at high energies the other interactions which come into play must be taken into account.
Appendix. Lehmann’s Computation

The one-particle states

$$|p,a\rangle, \qquad a = 1,\ldots,n, \tag{A.1}$$

are labeled by a momentum

$$p = (p^0,p^1), \qquad p^0 = \sqrt{M^2+(p^1)^2} > 0, \tag{A.2}$$

and an isospin label $a$. The normalization of these states may be chosen such that

$$\langle p,a|q,b\rangle = \delta^{ab}\,2p^0\,2\pi\,\delta(p^1-q^1). \tag{A.3}$$

Let $s^a(x)$ be the renormalized spin field normalized such that

$$\langle 0|s^a(x)|q,b\rangle = \delta^{ab}\,e^{-iqx}. \tag{A.4}$$

We consider the 2-point function

$$\langle 0|s^a(x)\,s^b(0)|0\rangle = \delta^{ab}\,i\Delta^+(x). \tag{A.5}$$
The leading terms of $i\Delta^+(x)$ for euclidean distance $|x|\to0$ computed in renormalized perturbation theory are

$$i\Delta^+ = \frac{1}{2\pi}\,a(n)\,L^{\frac{n-1}{n-2}}\left\{1 + \frac{(n-1)}{(n-2)}\frac{1}{L}\left[\frac{1}{n-2}\ln L + \ln\frac{C(n)}{K(n)}\right]\right\}\times\left\{1 + O\left(\frac{\ln L}{L}\right)^2\right\}, \tag{A.6}$$

with

$$L = -\ln\xi, \qquad \xi = M|x|, \tag{A.7}$$

and

$$K(n) = \frac12\exp[\gamma - 1/(n-2)], \tag{A.8}$$

where $\gamma$ denotes Euler’s constant. To order $1/n$ we have

$$a(n) = 1 + a_1\frac1n + \ldots, \tag{A.9}$$

$$\ln\frac{C(n)}{K(n)} = \ln2 - \gamma + \frac1n(1+c_1) + \ldots, \tag{A.10}$$

and hence

$$i\Delta^+ = \frac{1}{2\pi}\Big\{L + \ln2 - \gamma + \frac1n\Big[L\ln L + a_1L + (1+\ln2-\gamma)\ln L + (\ln2-\gamma)(1+a_1) + 1 + c_1 + O(1/L)\Big] + O(1/n^2)\Big\}. \tag{A.11}$$

In the following we will show

$$a_1 = \ln\frac4\pi + \gamma - 3, \tag{A.12}$$

$$c_1 = 3\ln2 + \gamma - 1. \tag{A.13}$$

First we write

$$i\Delta^+ = \frac{1}{2\pi}K_0(\xi) + \frac1n\,i\Delta_1^+ + O(1/n^2). \tag{A.14}$$

Now $\Delta_1^+$ is given by the imaginary part of the diagram
Computations of M/MS in the 2-d O(n) Non-Linear σ -Model
53
where the solid line corresponds to the propagator 1/(k 2 − M 2 ), and the wavy line to the propagator D(q): ∞ dκ 2 D(q) = · E(κ 2 ), (A.15) 2 2 4M 2 q − κ √ 4π κ κ 2 − 4M 2 2 E(κ ) = . (A.16) √ ln2 (κ + κ 2 − 4M 2 )2 /4M 2 + π 2 Thus we have the spectral representation 1 + i1 (x) = d2 ke−ikx θ(k0 )ρ1 (k 2 ) 2π ∞ 1 dK 2 K0 (K|x|)ρ1 (K 2 ), = 2π 9M 2 with
(A.17)
d2 qθ(k0 − q0 )θ (q0 )δ (k − q)2 − M 2 E(q 2 ) √ ∞ 2 2 1 2 E(κ )θ ( k − κ − M) = dκ . (A.18) 2π 4M 2 (k 2 + κ 2 − M 2 )2 − 4k 2 κ 2 1 2π
(k 2 − M 2 )2 ρ1 (k 2 ) =
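Now we change the integration variables in (A.17) to $\eta, t$ by
$$K = Mu, \qquad u^2 = \frac{\mathrm{ch}\,3\eta - \mathrm{ch}\,t}{\mathrm{ch}\,\eta - \mathrm{ch}\,t}, \tag{A.19}$$
$$\kappa = 2M\,\mathrm{ch}\,\eta, \tag{A.20}$$
to obtain
$$i\Delta_1^+(\xi) = \frac{1}{2\pi} \int_0^{\infty} d\eta\, (\eta^2 + \pi^2/4)^{-1} \int_0^{\eta} dt\, K_0(\xi u)\, \frac{\mathrm{ch}\,\eta - \mathrm{ch}\,t}{\mathrm{sh}\,\eta}. \tag{A.21}$$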
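The substitution (A.19), (A.20) can be probed numerically. The sketch below assumes that (A.19) fixes $u^2$ (so that $K^2 = M^2 u^2$); with this reading the point $t = 0$ reproduces the threshold $K = \kappa + M$ of the step function in (A.18), and the square root in (A.18) takes a closed form in $\eta$ and $t$. The function names are hypothetical.

```python
import math


def u_squared(eta, t):
    """u^2 as a function of (eta, t); assumes (A.19) is a statement about u^2."""
    return (math.cosh(3 * eta) - math.cosh(t)) / (math.cosh(eta) - math.cosh(t))


def root_in_A18(eta, t, M=1.0):
    """sqrt((K^2 + kappa^2 - M^2)^2 - 4 K^2 kappa^2) evaluated on the new variables."""
    K2 = M * M * u_squared(eta, t)
    kappa2 = (2 * M * math.cosh(eta)) ** 2
    return math.sqrt((K2 + kappa2 - M * M) ** 2 - 4 * K2 * kappa2)


def root_closed_form(eta, t, M=1.0):
    """Closed form 4 M^2 ch(eta) sh(eta) sh(t) / (ch(eta) - ch(t)) of the same root."""
    return (4 * M * M * math.cosh(eta) * math.sinh(eta) * math.sinh(t)
            / (math.cosh(eta) - math.cosh(t)))


def threshold_u(eta):
    """u at K = kappa + M, i.e. (2 M ch(eta) + M) / M."""
    return 2 * math.cosh(eta) + 1
```

The two root functions should agree for $0 < t < \eta$, and `u_squared(eta, 0.0)` should equal `threshold_u(eta) ** 2`; at $\eta \to 0$ this gives $u^2 \to 9$, matching the lower limit $9M^2$ in (A.17).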
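We need the leading terms of $\Delta_1^+(\xi)$ for $\xi \to 0$ neglecting terms of $O(1/L)$. To deal with this we proceed in three steps.

Step 1. We show that the $\eta$-integration can be restricted to $\int_0^L$. Consider
$$I_0 = \int_L^{\infty} d\eta\, (\eta^2 + \pi^2/4)^{-1} \int_0^{\eta} dt\, K_0(\xi u)\, \frac{\mathrm{ch}\,\eta - \mathrm{ch}\,t}{\mathrm{sh}\,\eta}. \tag{A.22}$$
Since $K_0(z)$ decreases monotonically and $u > e^{\eta}$ we have
$$I_0 < \int_L^{\infty} d\eta\, K_0(\xi e^{\eta})\, \hat{I}_0(\eta), \tag{A.23}$$
with
$$\hat{I}_0(\eta) = \frac{1}{\mathrm{sh}\,\eta\, (\eta^2 + \pi^2/4)} \int_0^{\eta} dt\, (\mathrm{ch}\,\eta - \mathrm{ch}\,t). \tag{A.24}$$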
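Now
$$\hat{I}_0(\eta) = \frac{\eta\, \mathrm{coth}\,\eta - 1}{\eta^2 + \pi^2/4} < \frac{\eta}{\eta^2 + \pi^2/4} < \frac{1}{\eta}, \tag{A.25}$$
and thus
$$I_0 < \frac{1}{L} \int_L^{\infty} d\eta\, K_0(\xi e^{\eta}) = \frac{1}{L} \int_1^{\infty} \frac{ds}{s}\, K_0(s) = O(1/L). \tag{A.26}$$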
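The chain of inequalities in (A.25) is elementary, and a quick numerical spot check is easy to write. This is a minimal sketch with hypothetical function names; it samples a few values of $\eta$ and is not a proof.

```python
import math


def i0_hat(eta):
    """Closed form of (A.24): (eta * coth(eta) - 1) / (eta^2 + pi^2 / 4)."""
    return (eta / math.tanh(eta) - 1.0) / (eta * eta + math.pi ** 2 / 4)


def bounds_of_A25_hold(eta):
    """True if i0_hat(eta) < eta / (eta^2 + pi^2/4) < 1 / eta for this eta > 0."""
    middle = eta / (eta * eta + math.pi ** 2 / 4)
    return i0_hat(eta) < middle < 1.0 / eta
```

The first inequality is equivalent to $\eta(\mathrm{coth}\,\eta - 1) < 1$, which holds for all $\eta > 0$; the second is immediate.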
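Step 2. Now we define $\hat{K}_0$ through
$$K_0(z) = -\ln z + \ln 2 - \gamma + \hat{K}_0(z), \tag{A.27}$$
so that $\hat{K}_0(z) = O(z^2 \ln z)$ for small $z$. Then
$$I_1 \equiv \int_0^{L} d\eta\, (\eta^2 + \pi^2/4)^{-1} \int_0^{\eta} dt\, \hat{K}_0(\xi u)\, \frac{\mathrm{ch}\,\eta - \mathrm{ch}\,t}{\mathrm{sh}\,\eta} = O(1/L) \tag{A.28}$$
follows by elementary calculation if the limit $\xi \to 0$ can be taken in the integral$^{5}$.

Step 3. So we finally have
$$i\Delta_1^+(\xi) = \frac{1}{2\pi} (L + \ln 2 - \gamma) I_2 + I_3 + O(1/L), \tag{A.29}$$
where
$$I_2 = \int_0^{L} d\eta\, \hat{I}_0(\eta), \tag{A.30}$$
$$I_3 = -\frac{1}{4\pi} \int_0^{L} \frac{d\eta}{\mathrm{sh}\,\eta\, (\eta^2 + \pi^2/4)}\, \hat{I}_3(\eta), \tag{A.31}$$
$$\hat{I}_3(\eta) = \int_0^{\eta} dt\, (\mathrm{ch}\,\eta - \mathrm{ch}\,t) \ln u^2. \tag{A.32}$$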
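Now
$$I_2 = \int_0^{L} \frac{d\eta\, (\eta - 1)}{\eta^2 + \pi^2/4} + \int_0^{\infty} \frac{d\eta\, \eta e^{-\eta}}{\mathrm{sh}\,\eta\, (\eta^2 + \pi^2/4)} + O(\xi^{2}) \tag{A.33}$$
$$= \frac{1}{2} \ln\!\left( \eta^2 + \pi^2/4 \right) \Big|_0^{L} - \frac{2}{\pi} \arctan \frac{2L}{\pi} + \ln 2 + \gamma - 1 + O(\xi^{2}) \tag{A.34}$$
$$= \ln L + \ln \frac{4}{\pi} + \gamma - 2 + \frac{1}{L} + O(1/L^2). \tag{A.35}$$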
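The asymptotic form (A.35) can be compared with a direct numerical evaluation of (A.30). The sketch below uses a hand-written composite Simpson rule so that it needs only the standard library; the function names and the step count are illustrative choices, not part of the original computation.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def i0_hat(eta):
    """Integrand of (A.30): (eta * coth(eta) - 1) / (eta^2 + pi^2 / 4)."""
    if eta < 1e-4:
        numerator = eta * eta / 3.0  # small-eta limit of eta*coth(eta) - 1
    else:
        numerator = eta / math.tanh(eta) - 1.0
    return numerator / (eta * eta + math.pi ** 2 / 4)


def simpson(f, a, b, steps):
    """Composite Simpson rule on [a, b] with an even number of steps."""
    if steps % 2:
        steps += 1
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def I2_numeric(L, steps=20000):
    """Direct numerical value of I_2 = int_0^L i0_hat(eta) d eta."""
    return simpson(i0_hat, 0.0, L, steps)


def I2_asymptotic(L):
    """Right-hand side of (A.35) without the O(1/L^2) remainder."""
    return math.log(L) + math.log(4.0 / math.pi) + EULER_GAMMA - 2.0 + 1.0 / L
```

At $L = 50$ the two values should differ only at the level of the neglected $O(1/L^2)$ term, i.e. by a few times $10^{-4}$.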
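Next we note that $\hat{I}_3(\eta)$ can be calculated by elementary integration with the result
$$\hat{I}_3 = \hat{I}_3^{(1)} + \hat{I}_3^{(2)}, \tag{A.36}$$
with
$$\hat{I}_3^{(1)} = -4\, \mathrm{sh}\,\eta\, \mathrm{ch}^2\eta\, \ln(2\,\mathrm{ch}\,\eta) + 4\eta\, \mathrm{ch}\,\eta\, \mathrm{sh}^2\eta, \tag{A.37}$$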
5 Lehmann remarks in his notes that he didn’t show that this was actually possible.
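and
$$\hat{I}_3^{(2)} = 2\,\mathrm{ch}\,\eta \left[ \int_0^{2\eta} dx\, \ln \mathrm{sh}\,x - 2 \int_0^{\eta} dx\, \ln \mathrm{sh}\,x \right] \tag{A.38}$$
$$= 2\,\mathrm{ch}\,\eta \left[ \eta^2 + 2G(\eta) \right], \tag{A.39}$$
where
$$G(\eta) = \int_0^{\eta} dx\, \ln\!\left( 1 + e^{-2x} \right) \tag{A.40}$$
$$= \eta \ln\!\left( 1 + e^{-2\eta} \right) + \int_0^{\eta} \frac{dx\, x e^{-x}}{\mathrm{ch}\,x}. \tag{A.41}$$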
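The two expressions (A.40) and (A.41) for $G(\eta)$ are related by a partial integration, which can be confirmed numerically. This is a minimal sketch with hypothetical names, using a simple Simpson rule from the standard library only.

```python
import math


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b] with an even number of steps."""
    if steps % 2:
        steps += 1
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def G_direct(eta):
    """(A.40): integral of ln(1 + exp(-2x)) from 0 to eta."""
    return simpson(lambda x: math.log1p(math.exp(-2.0 * x)), 0.0, eta)


def G_by_parts(eta):
    """(A.41): boundary term plus the integral of x exp(-x) / ch(x)."""
    boundary = eta * math.log1p(math.exp(-2.0 * eta))
    return boundary + simpson(lambda x: x * math.exp(-x) / math.cosh(x), 0.0, eta)
```

Both functions should return the same value for any $\eta > 0$ up to quadrature error. The identity (A.38) = (A.39) follows similarly from $\ln \mathrm{sh}\,x = x - \ln 2 + \ln(1 - e^{-2x})$, but is less convenient to test directly because of the logarithmic endpoint singularity.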
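By contour integration (shifting to a path parallel to the real axis with imaginary part $i\pi$) we get
$$I_3^{(1)} = \frac{1}{2\pi} \left[ \ln L + \gamma + \ln 2 - 1 \right]. \tag{A.42}$$
Further
$$I_3^{(2)} = -\frac{1}{2\pi} \left[ I_3^{(2,I)} + I_3^{(2,II)} + I_3^{(2,III)} \right] + O(1/L), \tag{A.43}$$
where
$$I_3^{(2,I)} = \int_0^{L} \frac{d\eta\, \eta^2}{\eta^2 + \pi^2/4} = L - \frac{\pi^2}{4}, \tag{A.44}$$
$$I_3^{(2,II)} = \int_0^{\infty} \frac{d\eta\, \eta^2}{(\eta^2 + \pi^2/4)} \frac{e^{-\eta}}{\mathrm{sh}\,\eta} = \frac{\pi^2}{4} - \ln 2 - \int_0^{\pi/2} \frac{dy}{y} \left( \frac{\pi}{2} - y \right) \tan y, \tag{A.45}$$
$$I_3^{(2,III)} = 2 \int_0^{\infty} \frac{d\eta}{\eta^2 + \pi^2/4}\, \mathrm{coth}\,\eta\, G(\eta). \tag{A.46}$$
We introduce
$$H(z) = G(z + i\pi/2) = G(z - i\pi/2), \qquad (\mathrm{Re}\, z \ge 0) \tag{A.47}$$
$$= \frac{\pi^2}{8} - \int_0^{z} \frac{dx\, x e^{-x}}{\mathrm{sh}\,x} + z \ln\!\left( 1 - e^{-2z} \right), \tag{A.48}$$
which is analytic for $|\mathrm{Im}\, z| < \pi$. Now
$$I_3^{(2,III)} = \frac{2}{\pi} \int_0^{\pi/2} \frac{dy}{y} \tan y \left[ H(iy) + H(-iy) \right] \tag{A.49}$$
$$= \frac{2}{\pi} \int_0^{\pi/2} \frac{dy}{y} \tan y \left( \frac{\pi}{2} - y \right)^{2} \tag{A.50}$$
$$= \int_0^{\pi/2} \frac{dy}{y} \left( \frac{\pi}{2} - y \right) \tan y - \ln 2, \tag{A.51}$$
and so
$$I_3^{(2,II)} + I_3^{(2,III)} = \frac{\pi^2}{4} - 2 \ln 2. \tag{A.52}$$
Putting everything together we get the final result (A.13).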
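The final bookkeeping can be spelled out as plain arithmetic: inserting (A.35), (A.42), (A.43), (A.44) and (A.52) into (A.29) and comparing with the $1/n$ bracket of (A.11) should reproduce (A.12) and (A.13) up to terms of order $1/L$. The sketch below is an illustration of that comparison; the function names are hypothetical, and $I_3$ is assumed to be the sum $I_3^{(1)} + I_3^{(2)}$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant
LN2 = math.log(2.0)


def I2(L):
    """(A.35) without the O(1/L^2) remainder."""
    return math.log(L) + math.log(4.0 / math.pi) + EULER_GAMMA - 2.0 + 1.0 / L


def I3(L):
    """I_3^(1) + I_3^(2) from (A.42)-(A.44) and (A.52), dropping O(1/L)."""
    part_1 = math.log(L) + EULER_GAMMA + LN2 - 1.0
    part_2 = (L - math.pi ** 2 / 4) + (math.pi ** 2 / 4 - 2.0 * LN2)
    return (part_1 - part_2) / (2.0 * math.pi)


def assembled(L):
    """Right-hand side of (A.29) without the O(1/L) remainder."""
    return (L + LN2 - EULER_GAMMA) * I2(L) / (2.0 * math.pi) + I3(L)


def target(L):
    """The 1/n bracket of (A.11) divided by 2*pi, with a_1, c_1 from (A.12), (A.13)."""
    a1 = math.log(4.0 / math.pi) + EULER_GAMMA - 3.0
    c1 = 3.0 * LN2 + EULER_GAMMA - 1.0
    c = LN2 - EULER_GAMMA
    bracket = (L * math.log(L) + a1 * L + (1.0 + c) * math.log(L)
               + c * (1.0 + a1) + 1.0 + c1)
    return bracket / (2.0 * math.pi)
```

With these inputs the difference `assembled(L) - target(L)` equals $(\ln 2 - \gamma)/(2\pi L)$, which is of the neglected order $O(1/L)$. Note that the $1/L$ term of (A.35), multiplied by the leading $L$, supplies the constant needed for (A.13).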
References
1. Polyakov, A.M.: Phys. Lett. B 59, 79 (1975)
2. Brézin, E. and Zinn-Justin, J.: Phys. Rev. B 14, 3110 (1976); Brézin, E., Zinn-Justin, J. and Le Guillou, J.C.: Phys. Rev. D 14, 2615 (1976)
3. Biscari, P., Campostrini, M. and Rossi, P.: Phys. Lett. B 242, 225 (1990)
4. Hasenfratz, P., Maggiore, M. and Niedermayer, F.: Phys. Lett. B 245, 522 (1990); Hasenfratz, P. and Niedermayer, F.: Phys. Lett. B 245, 529 (1990)
5. Forgács, P., Niedermayer, F. and Weisz, P.: Nucl. Phys. B 367, 123 (1991); ibid. B 367, 144 (1991)
6. Fox, G., Gupta, R., Martin, O. and Otto, S.: Nucl. Phys. B 205 [FS5], 188 (1982)
7. Berg, B., Meyer, S. and Montvay, I.: Nucl. Phys. B 235 [FS11], 149 (1984)
8. Wolff, U.: Nucl. Phys. B 334, 581 (1990); Phys. Rev. Lett. 62, 361 (1989)
9. Lüscher, M.: Phys. Lett. B 118, 391 (1982)
10. Lüscher, M.: In: Progress in Gauge Field Theory, ed. G. 't Hooft et al., New York: Plenum, 1984
11. Floratos, E. and Petcher, D.: Nucl. Phys. B 252, 689 (1985)
12. Lüscher, M., Weisz, P. and Wolff, U.: Nucl. Phys. B 359, 221 (1991)
13. Shin, D.-S.: Nucl. Phys. B 496, 408 (1997)
14. Symanzik, K.: Nucl. Phys. B 226, 187 (1983). For a review see Lüscher, M.: Improved Lattice Gauge Theories, In: Les Houches 1984, Proceedings, Critical Phenomena, Random Systems, Gauge Theories, pp. 359–374; and “Advanced Lattice QCD”, Talk given at Les Houches Summer School in Theoretical Physics 1997, hep-lat/9802029
15. Müller, V.F., Raddatz, T. and Rühl, W.: Nucl. Phys. B 231 [FS13], 212 (1985)
16. Flyvbjerg, H.: Phys. Lett. B 245, 533 (1990)
17. Iwasaki, Y.: Phys. Lett. B 104, 458 (1981); Prog. Theor. Phys. 68, 448 (1982)
18. Gliozzi, F.: Phys. Lett. B 153, 403 (1985)
19. Lüscher, M.: Nucl. Phys. B 135, 1 (1978)
20. Lüscher, M.: Addendum to ref. [19]. Unpublished notes (1986)
21. Zamolodchikov, A.B. and Zamolodchikov, Al.B.: Ann. Phys. 120, 253 (1979); Nucl. Phys. B 133, 525 (1979)
22. Polyakov, A. and Wiegmann, P.B.: Phys. Lett. B 131, 121 (1983); Wiegmann, P.B.: Phys. Lett. B 152, 209 (1985); JETP Lett. 41, 95 (1985)
23. Karowski, M. and Weisz, P.: Nucl. Phys. B 139, 455 (1978)
24. Smirnov, F.A.: Form Factors in Completely Integrable Models of Quantum Field Theory. Singapore: World Scientific, 1992
25. Balog, J. and Niedermaier, M.: Nucl. Phys. B 500, 421 (1997); Phys. Rev. Lett. 78, 4151 (1997)
26. Patrascioiu, A. and Seiler, E.: Phys. Rev. Lett. 74, 1920 (1995); ibid. 1924
27. Patrascioiu, A. and Seiler, E.: hep-th/0002153
28. Lüscher, M. and Wolff, U.: Nucl. Phys. B 339, 222 (1990)

Communicated by W. Zimmermann
Commun. Math. Phys. 219, 57 – 76 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Facts and Fictions About Anti deSitter Spacetimes with Local Quantum Matter

Bert Schroer

Institut für Theoretische Physik, FU-Berlin, Arnimallee 14, 14195 Berlin, Germany

Received: 24 March 2000 / Accepted: 27 February 2001

This work received financial support from the CNPq.
Present address: CBPF, Rua Dr. Xavier Sigaud, 22290-180 Rio de Janeiro, Brazil.
E-mail: [email protected]

Dedicated to the memory of Harry Lehmann

Abstract: It is natural to analyse the AdS$_{d+1}$-CQFT$_d$ correspondence in the context of the conformal-compactification and covering formalism. In this way one obtains additional insight about Rehren’s rigorous algebraic holography in connection with the degree of freedom issue, which in turn allows one to illustrate the subtle but important differences between the original string theory-based Maldacena conjecture and Rehren’s theorem in the setting of an intrinsic field-coordinatization-free formulation of algebraic QFT. I also discuss another more generic type of holography related to light fronts which seems to be closer to ’t Hooft’s original ideas on holography. This in turn is naturally connected with the generic concept of “Localization Entropy”, a quantum pre-form of Bekenstein’s classical black-hole surface entropy.

1. Historical Background

There has been hardly any problem in particle physics which has attracted as much attention as the problem of whether and in what way quantum matter in the Anti deSitter spacetime and the one dimension lower conformal field theories are related, and whether this could possibly contain clues about the meaning of quantum gravity. In more specific quantum physical terms the question is about a conjectured [1–3] (and meanwhile in large part generically and rigorously understood [6]) correspondence between two quantum field theories in different spacetime dimensions; the lower-dimensional conformal one being the “holographic image” or projection of the AdS theory. Conjectures, different from mathematical proofs, allow of course almost always a certain margin in their precise mathematical formulation and in their physical interpretation. The field theoretic content of this conjecture has often been interpreted as
a correspondence between two Lagrangian field theories (e.g. between a conformally invariant 4-dimensional SYM and a higher dimensional spin = 2 gravitational-like theory). The exact theorem says that such a correspondence cannot exist; one side has to be non-Lagrangian. There is no exception to this proposition; not even the assumption of supersymmetry helps here. One of our goals is to spell this out in detail and to illustrate this interesting point with a simple model.

The community of string physicists has placed this correspondence problem in the center of their interest. Remembering the great conceptual and calculational achievements, such as the derivation of scattering theory and dispersion relations from field theory, with which the name of Harry Lehmann (to whose memory this article is dedicated) is inextricably linked, I will limit myself to analyzing the particle physics content of the so-called Anti deSitter conformal QFT-correspondence from the conservative point of view of a quantum field theorist who, although having no active ambitions outside QFT, still nourishes a certain curiosity about present activities in particle physics such as string theory or noncommutative geometry. In the times of Harry Lehmann the acceptance of a theoretical proposal in particle physics was primarily coupled to its experimental verifiability and/or its conceptual standing within physics.

The AdS model of a curved spacetime has a long history [4, 5] as a theoretical laboratory of what can happen with particle physics in a universe which is the extreme opposite of globally hyperbolic in that it possesses a self-closing time, whereas the proper de Sitter spacetime was once considered among the more realistic models of the universe. The recent surge of interest in AdS came from string theory and is different in motivation and more related to the hope (or dream) to attribute a meaning to “Quantum Gravity” from a string theory viewpoint.
Fortunately for a curious outsider (otherwise I would have to quit right here), this motivation has no bearing on the conceptual and mathematical problems posed by the would-be AdS-conformal QFT correspondence; the latter turned out to be one of those properties discovered in the setting of string theory which allows an interesting and rigorous formulation in QFT which confirms some, but not all, of the conjectured properties.

The rigorous treatment however requires a reformulation of (conformal) QFT within a more algebraic setting. The standard formalism based on pointlike “field coordinatizations” which underlies the Lagrangian (and the Wightman) formulations does not provide a natural setting for the study of isomorphisms between models in different spacetime dimensions, even though the underlying physical principles are the same. One would have to introduce many additional concepts and auxiliary tricks into the standard framework, to the extent that the formulation appears contrived, containing too many ad hoc prescriptions. The important aspects in this isomorphism are related to space- and timelike (Einstein, Huygens) causality, localization of corresponding objects and problems of degree of freedom counting. All these issues belong to real-time physics and in most cases their meaning in terms of Euclidean continuation (statistical mechanics) remains obscure; but this of course does not make them less physical.

This note is organized as follows. In the next section I elaborate the kinematical aspects of the AdS$_{d+1}$-CQFT$_d$ situation as a collateral result of the old (1974/75) compactification formalism for the “conformalization” of the d-dimensional Minkowski spacetime. For this reason the seemingly more demanding problem of studying QFT directly in AdS within a curved spacetime formalism can be bypassed.
The natural question whose answer would have led directly from CQFT$_4$ to AdS$_5$ in the particle physics setting (without string theory as a midwife) is: does there exist a quantum field theory which has the same SO(4, 2) symmetry and just reprocesses the CQFT$_4$ matter content in such a way that the “conformal Hamiltonian” (the timelike rotational generator through compactified $\bar{M}$) becomes the true Hamiltonian? This theory indeed exists; it is an AdS theory with a specific local matter content computable from the CQFT matter content. The answer is unique, but as a result of the different dimensionality one cannot describe this one-to-one relation between spacetime indexed matter contents in terms of pointlike fields. This will be treated in Sect. 3, where we will also compare the content of Rehren’s isomorphism [6, 8] with the Maldacena, Witten et al. [1–3] conjectures and notice some subtle but potentially serious differences in case one interprets the conjecture (as it was done in most of the subsequent literature) as a relation between two Lagrangian theories. Whoever is aware of the fact that subtle differences often have been the enigmatic motor of progress, will not dismiss such observations.

The last section presents some general results of AQFT on degrees-of-freedom counting and holography. Closely connected is the idea of “chiral scanning”, i.e. the encoding of the full content of a higher dimensional (massive) field QFT into a finite number of copies of one chiral theory in a carefully selected relative position within a common Hilbert space. In this case the price one has to pay for this more generic holography (light-front holography) is that some of the geometrically acting spacetime symmetry transformations become “fuzzy” in the holographic projection and some of the geometrically acting symmetries on the holographic image are not represented by diffeomorphisms if pulled back into the original QFT.
2. Conformal Compactification and AdS

The simplest type of conformal QFT is obtained by realizing a zero mass Wigner representation of the Poincaré group with positive energy (and discrete helicity) and allowing for a natural extension to the conformal symmetry group SO(4, 2)/Z$_2$ without any enlargement of the Hilbert space. Besides scale transformations, this larger symmetry also incorporates the fractional transformations (proper conformal transformations)
$$x' = \frac{x - b x^2}{1 - 2bx + b^2 x^2}. \tag{1}$$
It is often convenient to view this formula as the action of the translation group $T(b)$ conjugated with a (hyperbolic) inversion $I$
$$I: x \to \frac{-x}{x^2}, \tag{2}$$
$$x' = I\, T(b)\, I\, x. \tag{3}$$
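Relation (3) can be verified directly against formula (1). The sketch below assumes a 4-dimensional Minkowski product with signature $(+,-,-,-)$ and takes $T(b)$ to act as the shift $y \to y + b$; the function names are hypothetical.

```python
def minkowski_dot(x, y):
    """Minkowski product with signature (+, -, -, -)."""
    return x[0] * y[0] - sum(a * b for a, b in zip(x[1:], y[1:]))


def inversion(x):
    """The (hyperbolic) inversion (2): x -> -x / x^2."""
    x2 = minkowski_dot(x, x)
    return [-c / x2 for c in x]


def translate(x, b):
    """T(b), assumed here to act as y -> y + b."""
    return [xi + bi for xi, bi in zip(x, b)]


def special_conformal(x, b):
    """Formula (1): x' = (x - b x^2) / (1 - 2 b.x + b^2 x^2)."""
    x2 = minkowski_dot(x, x)
    b2 = minkowski_dot(b, b)
    bx = minkowski_dot(b, x)
    denominator = 1.0 - 2.0 * bx + b2 * x2
    return [(xi - bi * x2) / denominator for xi, bi in zip(x, b)]


def via_inversions(x, b):
    """Formula (3): x' = I T(b) I x."""
    return inversion(translate(inversion(x), b))
```

For generic $x$ and small $b$ (away from $x^2 = 0$ and from a vanishing denominator) the two functions should agree component by component.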
I does not belong to the above conformal group, although it is unitarily represented (and hence a Wigner symmetry) in these special Wigner representations. For fixed x and small b formula (1) is well defined, but globally it mixes finite spacetime points with infinity and hence requires a more precise definition (in particular in view of the positivity energy-momentum spectral properties) in its action on quantum fields. Hence as a preparatory step for the adequate formulation of quantum field theory concepts, one has to achieve a geometric compactification. This starts most conveniently from a linear representation of the conformal group SO(d, 2) in d+2-dimensional auxiliary space
$\mathbb{R}^{(d,2)}$ (i.e. without field theoretic significance) with two negative (time-like) signatures
$$G = \begin{pmatrix} g_{\mu\nu} & & \\ & -1 & \\ & & +1 \end{pmatrix} \tag{4}$$
and restricts this representation to the (d + 1)-dimensional forward light cone
$$LC^{(d,2)} = \{ \xi = (\boldsymbol{\xi}, \xi_4, \xi_5);\ \boldsymbol{\xi}^2 + \xi_d^2 - \xi_{d+1}^2 = 0 \}, \tag{5}$$
where $\boldsymbol{\xi}^2 = \xi_0^2 - \vec{\xi}^{\,2}$ denotes the d-dimensional Minkowski length square. The compactified Minkowski space $\bar{M}_d$ is obtained by adopting a projective point of view (stereographic projection)
$$\bar{M}_d = \left\{ x = \frac{\boldsymbol{\xi}}{\xi_d + \xi_{d+1}};\ \xi \in LC^{(d,2)} \right\}. \tag{6}$$
It is then easy to verify that the linear transformations which keep the last two components invariant consist of the Lorentz group, and that those transformations which only transform the last two coordinates yield the scaling formula
$$\xi_d \pm \xi_{d+1} \to e^{\pm s} (\xi_d \pm \xi_{d+1}) \tag{7}$$
leading to $x \to \lambda x$, $\lambda = e^{s}$. The remaining transformations, namely the translations and the fractional proper conformal transformations, are obtained by composing rotations in the $\xi_i$-$\xi_d$ planes and boosts in the $\xi_i$-$\xi_{d+1}$ planes.

A convenient description of Minkowski spacetime $M$ in terms of this d + 2 dimensional auxiliary formalism is obtained in terms of a “conformal time” $\tau$,
$$M_d = (\sin \tau, \mathbf{e}, \cos \tau), \qquad \mathbf{e} \in S^{d-1}, \tag{8}$$
$$t = \frac{\sin \tau}{e^d + \cos \tau}, \qquad \vec{x} = \frac{\vec{e}}{e^d + \cos \tau}, \qquad e^d + \cos \tau > 0, \quad -\pi < \tau < +\pi, \tag{9}$$
so that the Minkowski spacetime is a piece of the d-dimensional wall of a cylinder in d+1 dimensional spacetime which becomes tiled with the closure of infinitely many Minkowski worlds. If one cuts the wall on the backside appropriately, this carved out piece representing d-dimensional compactified Minkowski spacetime has the form of a d-dimensional double cone positioned symmetrically around $\tau = 0$, $\mathbf{e} = (0, e^d = 1)$ without its boundary$^{1}$. The above directional compactification leads to an identification of boundary points at “infinity” and gives e.g. for d = 1+1 the compactified manifold the topology of a torus. The points which have been added at infinity to $M$, namely $\bar{M} \setminus M$, are best described in terms of the d−1 dimensional submanifold of points which are lightlike with respect to the past infinity apex at $m_{-\infty} = (0, 0, 0, 0, 1, \tau = -\pi)$. The cylinder walls form the universal covering $\tilde{M}_d = S^{d-1} \times \mathbb{R}$ which is “tiled” in both $\tau$-directions by infinitely many Minkowski spacetimes (“heavens and hells”) [12].

$^{1}$ The graphical representations are, apart from the compactification (which involves identifications between past and future points at time/light-infinity), the famous Penrose pictures of $M$.

If the only interest is the description of the compactification $\bar{M}$, then one may as well stay with the original x-coordinates and, following Dirac and Weyl, write the d+2 $\xi$-coordinates as
$$\xi^{\mu} = x^{\mu},\ \mu = 0, 1, 2, 3, \qquad \xi^4 = \frac{1}{2}(1 + x^2), \qquad \xi^5 = \frac{1}{2}(1 - x^2), \qquad \text{i.e.}\ (\xi - \xi')^2 = (x - x')^2. \tag{10}$$
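The Dirac-Weyl parametrization (10) is easy to test numerically. The sketch below assumes d = 4 and the ambient metric of (4), read as $\mathrm{diag}(g_{\mu\nu}, -1, +1)$ with $g = \mathrm{diag}(+,-,-,-)$; the function names are hypothetical.

```python
def minkowski_dot(x, y):
    """4-dimensional Minkowski product with signature (+, -, -, -)."""
    return x[0] * y[0] - sum(a * b for a, b in zip(x[1:], y[1:]))


def embed(x):
    """Formula (10): xi = (x, (1 + x^2)/2, (1 - x^2)/2)."""
    x2 = minkowski_dot(x, x)
    return list(x) + [0.5 * (1.0 + x2), 0.5 * (1.0 - x2)]


def ambient_dot(u, v):
    """Ambient product, assuming the metric (4) is diag(g, -1, +1)."""
    return minkowski_dot(u[:4], v[:4]) - u[4] * v[4] + u[5] * v[5]


def ambient_interval(x, y):
    """(xi - xi')^2 computed in the 6-dimensional auxiliary space."""
    diff = [a - b for a, b in zip(embed(x), embed(y))]
    return ambient_dot(diff, diff)
```

Under these assumptions every `embed(x)` is lightlike in the ambient space, and `ambient_interval(x, y)` coincides with the Minkowski interval $(x - x')^2$. Since $\xi$ is only defined up to a scale factor, only the vanishing of this interval has projective meaning.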
Since $\xi$ is only defined up to a scale factor, we conclude that lightlike differences retain an objective meaning in $\bar{M}$ even though the space- and time-like separation does lose its meaning. An example of a physical theory on $\bar{M}$ are free photons. The impossibility of a distinction between space- and time-like finds its mathematical formulation in the Huygens principle which says that the lightlike separation is the only one where the physical fields do not commute and hence where an interaction can happen. In the terminology of local quantum physics this means that the commutant of an observable algebra localized in a double cone consists apparently of a (Einstein causal) connected spacelike – as well as two disconnected (Huygens causality) timelike – pieces. But taking the compactification into consideration one realizes that all three parts are connected and the space/time-like distinction is meaningless on $\bar{M}$. In terms of Wightman correlation functions this is equivalent to the rationality of the analytically continued Wightman functions of observable fields which includes an analytic extension into timelike Jost points [37, 18].

Therefore in order to make contact with particle physics aspects, the use of either the covering $\tilde{M}$ or of more general fields (see the next section) on $\bar{M}$ is very important since only in this way one can implement the pivotal property of causality together with the associated localization concepts. As first observed by I. Segal [11] and later elaborated and brought into the by now standard form in field theory by Lüscher and Mack [12], a global form of causality can be based on the sign of the invariant
$$\left( \xi(e, \tau) - \xi(e', \tau') \right)^2 \gtrless 0, \quad \text{hence} \quad |\tau - \tau'| \gtrless 2\, \mathrm{Arcsin} \left( \frac{(e - e')^2}{4} \right)^{\frac{1}{2}} = \mathrm{Arccos}\, (e \cdot e'), \tag{11}$$
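The equality of the two angular expressions in (11) is elementary trigonometry: for unit vectors $(e - e')^2 = 2 - 2\, e \cdot e'$. The sketch below illustrates it for unit vectors on the sphere, together with the resulting criterion for global spacelike separation; the function names are hypothetical.

```python
import math


def angle_via_arcsin(e1, e2):
    """2 Arcsin(sqrt((e - e')^2 / 4)) for unit vectors e, e'."""
    d2 = sum((a - b) ** 2 for a, b in zip(e1, e2))
    return 2.0 * math.asin(min(1.0, math.sqrt(d2 / 4.0)))


def angle_via_arccos(e1, e2):
    """Arccos(e . e') for unit vectors e, e'."""
    dot = sum(a * b for a, b in zip(e1, e2))
    return math.acos(max(-1.0, min(1.0, dot)))


def globally_spacelike(tau1, e1, tau2, e2):
    """The '<' case of (11): |tau - tau'| < Arccos(e . e')."""
    return abs(tau1 - tau2) < angle_via_arccos(e1, e2)
```

For example, with $e = (1, 0, 0)$ and $e' = (0.6, 0.8, 0)$ the angular distance is $\mathrm{Arccos}(0.6) \approx 0.927$, so two points with $|\tau - \tau'| = 0.5$ are globally spacelike while points with $|\tau - \tau'| = 1$ are not.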
where the < inequality characterizes global spacelike distances and > corresponds to positive and negative global timelike separations. Whereas the globally spacelike region of a point is compact, the timelike region is not. The concept of global causality solves the so-called Einstein causality paradox of CQFT [13]. In the next section we will meet a global decomposition method which also avoids this paradox without the necessity of using covering space. The central theme, namely the connection with QFT on AdS enters this section naturally if one asks the question whether one can use instead of the surface of the forward light cone a mass hyperboloid Hd+1 inside the forward light cone of the same
ambient d+2 dimensional space,
$$H_{d+1} = \{ \eta;\ \eta^2 = 1 \}, \qquad \eta_0 = \sqrt{1 + r^2}\, \sin \tau, \qquad \eta_i = r e_i,\ i = 1, \dots, d, \qquad \eta_{d+1} = \sqrt{1 + r^2}\, \cos \tau. \tag{12}$$
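The parametrization (12) can be checked in a few lines. The sketch assumes that the two time-like directions $\eta_0$ and $\eta_{d+1}$ enter the ambient square with a plus sign and the $d$ space-like ones with a minus sign, so that $\eta^2 = 1$; the function names are hypothetical.

```python
import math


def ads_point(r, tau, e):
    """Formula (12) for a unit vector e with d components."""
    root = math.sqrt(1.0 + r * r)
    return [root * math.sin(tau)] + [r * c for c in e] + [root * math.cos(tau)]


def ambient_square(eta):
    """eta^2 with the first and last components taken as time-like (assumed sign choice)."""
    return eta[0] ** 2 - sum(c * c for c in eta[1:-1]) + eta[-1] ** 2
```

Every point produced by `ads_point` satisfies `ambient_square(...) == 1` up to rounding. For large $r$ the rescaled vector $\eta / r$ has square $1/r^2 \to 0$, which illustrates that the hyperboloid approaches the light cone asymptotically.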
This space, which because of its formal relation to the analogous deSitter spacetime (which is defined by the spacelike hyperboloid) is called “Anti deSitter” spacetime, is noncompact. It is obvious from its construction that its asymptotic part is the same as $\bar{M}_d$. It was conjectured by Maldacena and others [1–3] that there is also a correspondence between quantum field theories. This conjecture implies the tacit assumption (not explicitly stated in these papers) that an AdS$_{d+1}$ QFT which coalesces asymptotically$^{2}$ with a CQFT$_d$ theory has a unique extension into the AdS bulk. Since there can be no mapping between pointlike fields on spacetimes of different dimensions, the question of the origin of this unique extension is non-trivial. The conjecture came from some speculations concerning possible relations of string theory with some supersymmetric gauge theories (SYM), i.e. from ideas far removed from the present particle physics setting which therefore will not be explained here.

In the 70s, at the time of the conformal compactifications, free fields on AdS$_4$ were studied from a particle physics viewpoint by Fronsdal [4]. The correspondence to CQFT$_3$ was overlooked, probably because of the fact that despite the obvious group theoretical connection through the common SO(3, 2), the multiplicities of the discrete AdS free Hamiltonian turned out too big for matching those of the rotational conformal Hamiltonian, a fact which will find its explanation in the next section.

Although the two spacetimes cannot be mapped into each other, their shared spacetime symmetry group SO(4, 2) suggests that there is at least a correspondence between certain subsets which may be obtained from projecting down wedge regions from the ambient space onto the two spacetime manifolds. Wedges have a natural relation to SO(4, 2); they may all be generated from the standard wedge in the ambient auxiliary space $W_{st} = \{ \xi^1 > |\xi^0| \}$.
The fixed point group of this transitive action on wedges consists of a boost and transversal translations and rotations$^{3}$. The projected wedges pW on AdS are by definition again wedges in AdS/CQFT and the SO(d, 2) symmetry group has the same transitive action, i.e. the system of wedges is described by SO(d, 2) modulo the fixed point subgroup. This geometric situation clearly suggests that one should consider algebras associated with these wedges instead of looking for a relation between pointlike fields. On the conformal side this includes all double cone algebras of arbitrary small size since the noncompact wedge regions are conformally equivalent to compact double cone regions. The logic of algebraic QFT requires to continue this algebraic correspondence to all intersections obtained from wedges. In this way one expects to arrive at an isomorphism which carries the full content of both theories and which includes the asymptotic relation (on the conformal surface of the aforementioned cylinder) in terms of field coordinatizations used by Maldacena et al. In order to obtain a rigorous proof, one must check some consistency conditions in the conversion of maps between spacetime regions and algebras indexed by those regions. This was achieved by Rehren [6] and his theorem will be briefly commented on (including its relation to the original conjecture) in the next section.

$^{2}$ Using the previous cylindric representation of the conformal covering, the covering of AdS corresponds to the full cylinder of which its mantle is the conformal covering.
$^{3}$ If one adds the two longitudinal lightlike translations which in one direction cause a compression into the wedge, one obtains an 8-dimensional Galilei group [21].

According to our previous remarks, interacting conformal local fields live on the covering space $\tilde{M}$. Fortunately the geometric isomorphism between wedge regions can be lifted to an $\widetilde{\mathrm{AdS}}_{d+1}$-$\tilde{M}$ correspondence. The conformal decomposition theory of the next section avoids the use of the rather complicated coverings by using an operator analog of fibre bundles on $\bar{M}$.

3. The Conformal Hamiltonian as the True Hamiltonian

There is another less geometric, but more particle physics type of argument, which leads to the AdS-CQFT correspondence. For this one should recall that in SO(d, 2) there are, besides the usual translations with infinity as a fixed point, also “conformal translations” which act without fixed points on the compactified $\bar{M}$ as some kind of “timelike rotations”. They are the analogs of the light-like chiral rotation $R^{(\pm)}$ ($L_0^{(\pm)}$ in standard Virasoro algebra notation), and their connection with the light ray translation $P^{(\pm)}$, with which they share the positivity of their spectrum, is
$$R^{(\pm)} = P^{(\pm)} + K^{(\pm)}, \qquad K^{(\pm)} = I^{(\pm)} P^{(\pm)} I^{(\pm)}, \tag{13}$$
where $I^{(\pm)}$ is the representer of the chiral conformal reflection $x \to -\frac{1}{x}$ (in linear lightray coordinates $x$) and $K$ is the generator of the fractional special conformal transformation (1). For free zero mass fields the discrete R-spectrum can be understood in terms of that of a Hamiltonian for a massless model in a spatial box. This is however not possible for the R-spectrum of chiral theories with anomalous scale dimension (the R-spectrum is known to be identical to that of scale dimensions). In that case the only theory for which the spectrum is that of its Hamiltonian is the QFT on AdS$_2$. So if one wants to read the SL(2, Z) modular characters of chiral conformal field theory in the spirit of a Hamiltonian Gibbs formula one should use the AdS side. An analogous statement holds in higher dimensions where the $\bar{M}$ rotation is described in terms of a Lorentz vector $R_\mu$, $R_\mu = P_\mu + I P_\mu I$, where the inversion $I$ was defined at the beginning of the previous section. It leads to a family of operators $e \cdot R$ with discrete spectrum which are dependent on a timelike vector $e_\mu$. Again the operator $R_0$ is the true Hamiltonian of only one theory with the same symmetry group and the same system of algebras (but with a different spacetime indexing): the associated d+1 dimensional AdS theory. Now it is time to quote (adapted to our purpose) Rehren’s theorem and comment on it.

Theorem 1. The geometric bijection between projected wedges pW on AdS$_{d+1}$ and the conformal double cones in $\bar{M}_d$ which constitute the asymptotic infinity of pW (as described in the previous section) extends to an isomorphism of the corresponding algebras. Both theories share the same Hilbert space and the same family of operator algebras, but their spacetime organization and with it their physical interpretation changes.
For the proof we refer to Rehren [6]. Some comments are in order. There are no additional restrictive assumptions (supersymmetry, vanishing β-functions) on either side. If the algebras of the AdS theory are generated by pointlike fields then the associated conformal algebra cannot be generated by a field which has an energy-momentum tensor or obeys a causal equation of motion. This is one of Rehren’s conclusions and it is very instructive to illustrate this with an example. Consider a free scalar AdS field [7]. A simple calculation which will not be repeated here reveals that it corresponds to a conformal generalized free field with homogeneous Kallen–Lehmann spectral function. Generalized free fields always have been physically suspect and if the spectral functions increase in the manner as the homogeneous degree demands in this case, one can prove that primitive causality [17] is violated since the algebra on a piece of time-slice (represented as a chain of small double cones which approximate the compact slice from the inside) is not equal to its causal completion (causal shadow) algebra. As one moves up in time inside the causal shadow from the time-slice more and more degrees of freedom coming from the inner parts of the bulk enter the causal shadow which were not in the time-slice. Rehren’s graphical representation [33] of the CQFT world on the wall of a full AdS cylinder makes this undesired sidewise propagation geometrically visible. This free field situation is generic in the sense that pointlike AdS fields always carry too many degrees of freedom which lead to a violation of causal propagation in the aforementioned sense4 . Such theories have to be abandoned for general physical reasons (not just because they do not fit into a Lagrangian picture which automatically implies causal propagation). 
Therefore the nice idea [34] to circumvent the scarcity in constructing Lagrangian conformal models (the β-function restrictions) by starting instead with AdS Lagrangians does not work; the resulting conformal theories all share the above defect. In passing we mention that the brane idea shares the same causality conflict with pointlike fields. Whereas from a mathematical viewpoint a manifold of interest may in certain cases be considered a brane of a larger dimensional space, the assignment of a physical reality to the ambient spacetime generates causality problems of the above kind for restrictions to the brane in the case of pointlike field theories in the larger ambient spacetime. Only if the ambient degrees of freedom are carefully tuned to the brane can such causality violations be avoided. Note that it is always the causal shadow property which may get lost in such constructions and not the Einstein causality. This is not visible if one restricts one’s attention to (semi)classical solutions concentrated on a brane and/or to euclidean formulations.

Whereas the principles of AQFT confirm in a very precise way that there exists an isomorphism, it is very interesting that there is a clash with certain concepts which have been used in string theory for the last two decades. This clash extends beyond the above remarks on the AdS-CQFT and the brane concept and casts doubt on the consistency of such quasiclassical pictures as the Kaluza–Klein dimensional reduction. As a matter of fact the quasiclassical Kaluza–Klein reduction idea has not been shown to be consistent with causal QFT. For this one would have to demonstrate that the idea works on the vacuum expectation of the ambient QFT and not just on the objects involved in the formal quantization approach which is used in the tentative construction of a QFT.
4 Contrary to a widespread belief, the number of degrees of freedom of causally propagating AdS theories is always larger than that of causally propagating conformal theories so that the isomorphism cannot be one among causally propagating theories. If the AdS theory is pointlike and causally propagating, the associated conformal theory has no causal propagation.
As far as the strict conceptual requirement of causality and Haag duality in AQFT are concerned, the K-K mechanism, to the extent that it is not just a mathematical trick (but an asymptotic property of a genuine inclusion of two local quantum physics worlds) has at best remained an enigmatic speculative idea (and at worst a tautology caused by not doing what one actually is claiming to do). The above degree of freedom discussion creates the suspicion that “good” causal conformal theories may have too few degrees of freedom in order to yield AdS pointlike fields as the other side of the coin of the above observation that pointlike AdS fields create causally bad conformal theories. This is indeed the case and can be seen by starting from the Wigner zero mass representation space of the Poincaré group
$$H_{Wig} = \left\{ \psi(p)\ \Big|\ \int \frac{d^{d-1}p}{2|p|}\, |\psi(p)|^2 < \infty \right\}, \tag{14}$$
which without extension provides an irreducible representation space of SO(d, 2). The subspace of modular wedge-localized Wigner wave functions consists of boundary values of wave functions $\psi$ which are analytic in the rapidity strip $0 < \mathrm{Im}\,\theta < \pi$, where for the standard wedge
$$p_x = \sqrt{p_y^2 + p_z^2}\, \sinh \theta, \qquad p_0 = \sqrt{p_y^2 + p_z^2}\, \cosh \theta. \tag{15}$$
This wave function space is in common regardless whether we are talking about the standard wedge on Md or the corresponding AdSd+1 wedge. Whereas this space in the Md interpretation is easily rewritten in terms of covariant x-space wave functions with the expected support properties, an analog η-spacetime covariantization for AdS does not work. The best one can do is to introduce (Olsen–Nielsen) string-like wave functions in ηspacetime which do not depend on w and which behave under SO(d, 2) transformations as objects which depend on an additional direction (the string direction). So instead of pointlike fields one obtains strand-like objects (with weaker covariance properties) which emanate from the points on the asymptotic boundary and extend through the bulk and which are linear in the same momentum space creation/annihilation operators as those which appear in the free conformal fields. This is the way in which the AdS formulation maintains the conformal degrees of freedom and the primitive causality. At this point an ardent string theorist might say: didn’t I tell you that starting from a conformal field theory you should expect to encounter AdS strings! However the strandlike objects of the Rehren theorem [6] are perfectly within causal localizable AQFT, and hence they are not objects of string theory proper. The main characteristics of the strings of string theory, namely the enhancement of the degrees of freedom due to the internal excitation structure, is missing in the case of our strands. In order to avoid misunderstandings we emphasize again that the degree of freedom issue is not related to Einstein causality which remains valid irrespective of whether the local algebras are generated by pointlike fields or not, but rather to the causal propagation property which requires Einstein causality as a prerequisite, but is not guaranteed by the latter. The Maldacena et al. conjecture [1–3] is that some high dimensional true (i.e. 
not the above kinematical strands) string theories are, in some effective and not precisely specified sense, equivalent to conformally invariant supersymmetric Yang–Mills (SYM) theories. To the extent that the argument supporting this conjecture uses a correspondence between pointlike (Lagrangian) AdS5 QFT and a conformal SYM, it is contradicted by the above theorem.
Antinomies and contradictions about important topics in earlier times were often the source of progress and of the removal of prejudices, and one would hope that they continue to receive their due attention. The issue is somewhat delicate because the Euclidean functional integral formalism in which the original conjectures were presented is at most a heuristic starting point, since Feynman–Kac representations in strictly renormalizable QFTs are simply not valid for the physical (renormalized) results. For this reason the chosen method poses an obstacle against converting the conjecture into a proof. This conceptual flaw of the functional action approach is one of the raisons d’être of algebraic QFT, which succeeds in balancing the starting calculational definitions with the properties of the constructed models. One could of course try to argue exclusively in terms of differential geometric concepts by abstracting from the action formulation purely geometric definitions of what constitutes SYMs and the associated string theories. But in doing this one will lose the relation to local quantum physics, and the obtained theorem may be void of any particle physics content.

If one admits that, as argued above, the Lagrangian perturbation method applied to the AdS side of the correspondence cannot be used for the construction of additional conformal QFTs, the question arises whether there are other construction methods. A closer investigation reveals that the spectrum of anomalous dimensions of interacting conformal field theories is determined in terms of a timelike braid group structure. A convenient way of presenting this structure is to work with nonlocal component fields [14] which result from the decomposition of the charge carrying globally local fields F on $\widetilde{M}$ under the reduction of the center of the conformal covering group $\widetilde{SO(D,2)}$,

$$F(x) = \sum_{\alpha,\beta} F_{\alpha,\beta}(x), \qquad F_{\alpha,\beta}(x) \equiv P_\alpha F(x) P_\beta, \qquad Z = \sum_{\alpha} e^{2\pi i\theta_\alpha} P_\alpha, \qquad (16)$$

in terms of projectors $P_\alpha$ which appear in the spectral decomposition of the generator Z of the center($\widetilde{SO(D,2)}$) = $\{Z^n, n \in \mathbb{Z}\}$. In a way the existence of this decomposition facilitates the use of the standard parametrization of Minkowski space augmented by the quasiperiodic central transformation

$$Z F_{\alpha,\beta}(x) Z^* = e^{2\pi i(\theta_\alpha - \theta_\beta)} F_{\alpha,\beta}(x), \qquad (17)$$

and hence one may to a large part avoid the use of the complicated covering parametrization and its $\widetilde{SO(D,2)}$ transformations which the unprojected fields F would require. For the latter fields on $\widetilde{M}$ the notation would be insufficient; one also has to give an equivalence class of paths (the number n ≷ 0 of the heaven/hell one is in) with respect to our copy of M embedded in $\widetilde{M}$. The projected fields on the other hand are analogous to sections in a trivialized vector bundle. With the help of conformal 3-point functions one shows that the θ-phases are related to the anomalous dimensions. The component fields $F_{\alpha,\beta}(x)$ are the suitable objects for the formulation of the timelike braid group commutation relations, which take the form of an exchange algebra

$$F_{\alpha,\beta}(x)\, G_{\beta,\gamma}(y) = \sum_{\beta'} R^{(\alpha,\gamma)}_{\beta,\beta'}\, G_{\alpha,\beta'}(y)\, F_{\beta',\gamma}(x), \qquad x > y, \qquad (18)$$
where the R-matrices are determined from admissible braid group representations. For more on the timelike braid group structure in higher dimensional conformal QFT we refer to [18]. Since the AdS-CQFT isomorphism implies a radical reprocessing on the physical side, it would be interesting to perform the timelike commutation relation analysis directly within the AdS setting. This has not been done yet.

4. Generalized Holography in Local Quantum Physics

The message we can learn from the AdS-conformal correspondence is two-fold. On the one hand there is the recognition that there are situations where it is necessary to avoid the use of “field coordinates” in favor of working directly with local algebras. In most concrete situations there were always convenient field coordinatizations available in terms of which the calculations simplified. The AdS-conformal correspondence is however a new type of problem for which the best way is to stay intrinsic, i.e. to use the net of algebras. The second message is that there may exist a holographic relation between QFTs and their lower dimensional boundaries. We have argued that the degrees of freedom of AdSd+1 are the same as in the corresponding CQFTd on the boundary even though the Hamiltonians and the associated thermal aspects are different⁵. This is the only known case of a bijection of nets of algebras associated with spacetimes of different dimensions but with the same maximal spacetime diffeomorphism group. Another more frequent kind of holography⁶ occurs for spacetimes with a causal horizon. In that case certain spacetime diffeomorphisms of the original spacetime act in a “fuzzy” nongeometric manner, thus accounting for the fact that the diffeomorphism group of the horizon does not admit all original diffeomorphisms. Let us consider a simple example: the holographic image of a two-dimensional massive theory in the vacuum representation restricted to the standard wedge, i.e. a Rindler–Unruh situation.
We want to restrict the d = 1 + 1 wedge algebra A(W) to its upper half-line horizon R+. In a massive theory we expect that both operator algebras are globally identical,

$$A(W) = A(R_+), \qquad (19)$$
although their local net structure is quite different. Classically this corresponds to the fact that characteristic data on either of the two horizons determine uniquely the function in the wedge⁷. It is very important to control the data on the entire upper horizon R+; in contradistinction to a spacelike interval, compact intervals on R+ do not cast two-dimensional causal shadows. The physical reason is of course that each point in a small neighborhood below that interval is in the backward influence cone of some points on R+ which are far removed to the right outside that interval. Only if we take all of R+ will we have W as its two-dimensional causal shadow.

5 Contrary to a widespread belief, the number of degrees of freedom of causally propagating pointlike AdSd+1 theories is always larger than that of causally propagating conformal theories CQFTd, so that the isomorphism cannot be one among causally propagating pointlike theories, i.e. if the AdS theory is pointlike and causally propagating, the associated conformal theory has no causal propagation and hence has to be discarded as unphysical.
6 Since our approach tries to relate the holographic aspects via modular localization ideas to the old principles of particle physics, we do not have to invoke a new “holographic principle”.
7 This is true in any dimension. The only exception is d = 1 + 1, mass = 0, in which case both horizons are needed to specify the two chiral components of conformal theories.
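The statement that compact pieces of the horizon cast no two-dimensional causal shadow can be made concrete in d = 1 + 1 with a few lines of Python. This is only a geometric sketch under stated assumptions: light-cone coordinates x± = x0 ± x1, the right wedge taken as x+ > 0, x− < 0, and a spacelike interval running from the apex to the point (x+, x−) = (L, −ε), whose causal completion is then the rectangle 0 ≤ x+ ≤ L, −ε ≤ x− ≤ 0. The function names are illustrative and do not come from the text.

```python
def diamond_area(length_plus, eps_minus):
    """Area, in (x0, x1) coordinates, of the causal completion of the
    spacelike interval from the origin to (x_plus, x_minus) =
    (length_plus, -eps_minus), with x_pm = x0 +/- x1.

    The completion is a rectangle in light-cone coordinates and
    dx0 dx1 = (1/2) dx_plus dx_minus.
    """
    if length_plus <= 0 or eps_minus <= 0:
        raise ValueError("both extents must be positive (spacelike interval)")
    return 0.5 * length_plus * eps_minus


def in_completion(x0, x1, length_plus, eps_minus):
    """True if the point (x0, x1) lies in the causal completion above."""
    x_plus, x_minus = x0 + x1, x0 - x1
    return 0.0 <= x_plus <= length_plus and -eps_minus <= x_minus <= 0.0


if __name__ == "__main__":
    # Fixed extent along the light ray: the shadow shrinks as eps -> 0.
    for eps in (1.0, 0.1, 0.01):
        print("L = 1, eps =", eps, "area =", diamond_area(1.0, eps))

    # A fixed wedge point drops out of the shadow of the tilted interval ...
    print(in_completion(0.0, 1.0, 1.0, 0.01))
    # ... but is recovered when the same tilt is extended far to the right.
    print(in_completion(0.0, 1.0, 100.0, 1.0))
```

For a fixed tilt ε/L the completion of the rescaled interval (sL, sε) eventually contains any given wedge point as s grows, which is the elementary geometric content of the remark that only all of R+ has W as its causal shadow.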
In the general approach to QFT the von Neumann algebra of a compact spacetime region is, according to the causal shadow property of AQFT (which is a local version of the time-slice property mentioned in the previous section [17]), identical to the algebra of its causally completed region. Each field theory with a causal propagation (in particular Lagrangian field theory) fulfills this requirement. If one takes a sequence of spacelike intervals which approximate a lightlike interval, the causal shadow region becomes gradually smaller and approaches an interval on the light ray in the limit. The only way to counteract this shrinking is to extend the spacelike interval gradually to the right in such a way that the larger lower causal shadow part becomes the full wedge in the limit. The correctness of this intuitive idea, which suggests the correctness of (19), can be checked against other rigorous results. One such result from Wigner representation theory (which therefore is limited to free field theories) together with the application of the Weyl- or CAR- (for half-integer spin) functor is the statement that the cyclicity spaces for an interval I on R+ agree with the total space [19],

$$\overline{A(I)\Omega} = \overline{A(R_+)\Omega} = \overline{A(W)\Omega} = H, \qquad (20)$$
i.e. the validity of the Reeh–Schlieder theorem on the light ray subalgebra. In fact this holds for all positive energy representations including zero mass, except zero mass in d = 1 + 1, in which case the decomposition into two chiral factors prevents its validity. Therefore one only needs to prove the spatial statement $\overline{A(R_+)\Omega} = \overline{A(W)\Omega} = H$ in order to derive (19). But this spatial completeness follows from the causal shadow property for spacelike half-lines L starting at the origin, since $\overline{A(L)\Omega} = H$ and this completeness property cannot get lost in the light ray limit L → R+. The step from spaces to (19) is done with the help of Takesaki’s theorem (mentioned later). We still have to rigorously define the holographic algebra A(R+) (which turns out to be chiral conformal) and its net structure A(I) from A(W). This is done by the modular inclusion technique, which is one of AQFT’s most recent mathematical achievements [20, 19]. The modular way of associating a chiral conformal theory with e.g. a d = 1 + 1 massive theory is the following. Start from the right wedge algebra A(W) with apex at the origin and let an upper lightlike translation a+ (which fulfills the energy positivity!) act on A(W) and produce an inclusion (all the algebras are von Neumann algebras)

$$A(W_{a_+}) \subset A(W). \qquad (21)$$
This inclusion is halfsided “modular”, i.e. the modular group⁸ $\Delta^{it}$ of (A(W), Ω), which is the Lorentz boost, acts on $A(W_{a_+})$ for t < 0 as a “compression”

$$\mathrm{Ad}\,\Delta^{it}\, A(W_{a_+}) \subset A(W_{a_+}), \qquad t < 0. \qquad (22)$$
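The geometric content of the compression (22) can be checked on points of the shifted wedge. The sketch below is a hedged illustration, not part of the original argument: it assumes light-cone coordinates x± = x0 ± x1, takes the shifted wedge as x+ > a, x− < 0, and fixes the sign convention so that the modular group acts as the boost x+ → exp(−2πt) x+, x− → exp(+2πt) x−, which is the choice that reproduces the t < 0 direction stated in (22). All function names are hypothetical.

```python
import math


def boost(x_plus, x_minus, t):
    """Wedge-preserving boost in light-cone coordinates.

    Assumed convention (chosen to match the sign in (22)): the modular
    parameter t acts as x_plus -> exp(-2*pi*t) * x_plus and
    x_minus -> exp(+2*pi*t) * x_minus.
    """
    s = -2.0 * math.pi * t
    return (math.exp(s) * x_plus, math.exp(-s) * x_minus)


def in_shifted_wedge(x_plus, x_minus, a):
    """Right wedge translated by a >= 0 along the upper light ray."""
    return x_plus > a and x_minus < 0.0


if __name__ == "__main__":
    a = 1.0
    point = (1.5, -2.0)  # a point of the shifted wedge close to its edge
    for t in (-0.5, -0.1, 0.1, 0.5):
        image = boost(point[0], point[1], t)
        print(t, in_shifted_wedge(image[0], image[1], a),
              in_shifted_wedge(image[0], image[1], 0.0))
```

For t < 0 the image stays inside the shifted wedge, while for t > 0 it may leave the shifted wedge but always remains in the original wedge (a = 0), in line with the halfsided nature of the inclusion.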
The assumed nontriviality of the net, i.e. of the intersections⁹ of wedge algebras, entails that the relative commutant (primes on algebras denote their commutant in B(H))

$$A(W_{a_+})' \cap A(W) \qquad (23)$$
8 For presentations of the Tomita–Takesaki modular theory which are close to the present concepts and notations see [10]. A more extensive presentation which pays due attention to the importance of modular theory for the new conceptual setting of QFT is that by Borchers [36]. 9 The nontriviality of the intersections is in some sense the algebraic counterpart of the renormalizable short distance behaviour in a quantization approach which is believed to be required by the mathematical existence of the Lagrangian theory.
is also nontrivial. Such inclusions are called “standard”. It is known that standard modular inclusions correspond to chiral conformal theories, i.e. the classification problem for the latter is identical to the classification of all standard modular inclusions [35]. In the case at hand the emergence of the chiral theory is intuitively clear since the only “living space” in agreement with Einstein causality (within the closure of W and spacelike with respect to the open $W_{a_+}$) which one can attribute to the relative commutant is the lightray interval of length a+ starting at the origin. From the abstract modular inclusion setting the Hilbert space which the relative commutant generates from the vacuum could be a subspace H+ ⊂ H, H+ = P H of the original one, but the already mentioned causal shadow property assures that H+ = H, i.e. P = 1. With the help of the L-boost (= modular group $\Delta^{it}$ of (A(W), Ω)) one then defines a net on the halfline R+ and a global algebra A(R+ ) = alg ∪t
The present modular inclusion approach to “lightray physics”, including the localization and degrees of freedom aspect is another illustration of the conceptual power of the field coordinate free approach and the modular inclusion method. In the standard setting there are several fake as well as genuine (requiring a change of field coordinates) problems with light cone-restrictions and quantizations. Standard approaches are usually entirely formal; they tend to overlook localization problems whose understanding is vital for the physical interpretation of the formalism and furthermore they often use field coordinatizations which become singular on the lightray. These problems continue in higher dimensions where the wedge horizons are lightfronts. A typical case which requires new concepts is d = 1 + 2. In that case the modular method, applied to one wedge, only transfers a small fraction of the geometric structure of the original theory into a chiral conformal theory obtained by modular inclusion, which localization-wise should really be associated with the upper light front horizon of the wedge. The lightfront quantization (or “infinite momentum frame” method) with respect to one lightfront only cannot account for the full locality information. Since its transversal localization remains completely unresolved, the so obtained theory only contains the longitudinal localization data of a chiral conformal net. Let me explain the way to get a transversal resolution. In that case one tilts the wedge by a L-boost which leaves the upper defining light ray invariant [9, 10]. One then convinces oneself that this newly positioned second wedge has a modular associated chiral conformal theory which, though being unitarily equivalent to the first one, carries the missing information (which is needed for the reconstruction of the original d = 1+2 theory) in form of its relative position in the common Hilbert space H . 
The tilted wedge together with the original one can be used to give a net structure to the original wedge in the transversal direction. Again the holographic projection of the original net into the horizon has, besides geometric actions, also fuzzy and partially local actions. But instead of using the transversal resolution of the 2-dim. horizon for a constructive approach based on the modular inclusion and intersection method, it would be somewhat more natural to describe the original theory in terms of the two chiral theories which the modular inclusion associates with the original wedge and the tilted wedge. In the general d-dimensional case one would encode the original theory in terms of d − 1 copies of one and the same chiral theory in different positions within one Hilbert space. The name “chiral scanning” would hence be more appropriate than holography for such a procedure. Adding nice names to structural relations is of course by itself not very constructive. The hope is that by more profound future studies one may develop criteria which allow a more universal intrinsic algebraic characterization of those relative positions and chiral theories which allow to construct a d-dimensional QFT. Chiral theories are the simplest and best understood QFTs and the study of d − 1 copies of them even in nontrivial relative positions seems to be simpler than to confront higher dimensional field theories directly. In fact ’t Hooft’s original holography [23] proposal and Susskind’s [22] subsequent work appear much more related to the light front encoding and/or the related scanning than to the AdS holography with its high geometric symmetry restriction. The present use of modular inclusions may be seen as an attempt to find a firmer conceptual and mathematical basis for those ideas. 
The importance of causal horizons in the above considerations suggests to look for a “localization entropy” of causally localized matter as a first step towards a quantum explanation of the universal Bekenstein area law in black hole physics. But there is a
hurdle right at the start: unlike QM where a quantization box defines an inside/outside division of the Hilbert space and the quantum mechanical algebra (type I∞ von Neumann factor) through a tensor product factorization, the nature of the double cone algebras (the relativistic causally closed analogs of boxes) in QFT is totally different, since as hyperfinite type III1 von Neumann factors they contain neither minimal projectors nor are there any pure states among their normal states [36]. This unusual state of affairs requires the introduction of the “split property” in order to construct the relativistic analogue of the QM box [16]. The physical mechanism behind this property is the strong vacuum fluctuations of partial charges at the surface of its localization volume V, one of the oldest and most characteristic phenomena which set apart QFT from QM. Let us first try to understand this phenomenon in a mathematically refined formulation of its original discovery by Heisenberg. Using a smooth spacetime smearing function consisting of a spatial part $g_{R,\delta}(\mathbf{x})$ with thickness δ and localization radius R multiplied by a compact support time-smearing f in the definition of the partial charge,

$$Q_R = \int j_0(x)\, f(x_0)\, g_{R,\delta}(\mathbf{x})\, d^{s}x, \qquad g_{R,\delta}(\mathbf{x}) = \begin{cases} 1, & |\mathbf{x}| < R \\ 0, & |\mathbf{x}| > R + \delta \end{cases}, \qquad f(x_0) \ge 0, \quad \int f(x_0)\, dx_0 = 1, \qquad (25)$$
one finds that the square norm of the partial charge applied to the vacuum, ⟨Q_R Q_R⟩, diverges with δ → 0 and increases for fixed δ in the limit R → ∞ as $R^{d-2}$, where d is the spacetime dimension [24]. But what, if anything, could be the relation of this area law to that of the would-be localization entropy? We first have to understand the algebraic analogue of the surface vacuum fluctuation of the partial charges. This turns out to be the split property, i.e. the necessity to work with fuzzy spacetime boxes in the form of double cones with a “collar” region of thickness δ separating the inside of the smaller box of radius R from the outside of the bigger with radius R + δ. In this split situation we do recover the quantum mechanical inside/outside tensor factorization which refers to a fuzzy box algebra N which extends beyond the smaller box into the collar without sharp geometric boundaries [16]. This sets the stage for defining the von Neumann entropy, which needs the type I tensor-factorization of boxes in QM. There remains however another important difference to Schrödinger quantum mechanics in that the vacuum state remains entangled, i.e. does not factorize into an inside/outside part but rather remains a highly correlated state with the Hawking–Unruh temperature. This has paradigmatic consequences for the conceptual framework of the measurement process in local quantum physics [25]. It is also the origin of the localization entropy which we have been looking for. One can show that the vacuum state restricted to the fuzzy QFT box leads to a nontrivial entropy which diverges with δ → 0 and increases with the size R of the box in agreement with the above analogy which intuitively pictures the box entropy of the vacuum as being related to a partial “Hamiltonian charge” via a Gibbs formula in the above sense.
As in that case, one would also expect the validity of an entropic area law, at least for large ratios of the diameter divided by the collar size, and that the matter dependence would show up, if not in the coefficient of the area law itself, at least in its correction terms. The “Hamiltonian charge” which
we intuitively relate with a Gibbs formula is not expected to be associated with a geometrical symmetry but rather with one of the infinitely many modular-generated fuzzy/hidden symmetries which any QFT possesses. In particular we find the use of the conformal rotational Hamiltonian which appeared in the recent literature [26] physically ad hoc and too restrictive, especially in view of the fact that Bekenstein’s area law does not require conformal invariance. Even if in very special conformal situations its spectrum happens to be similar to that of the logarithm of the modular operator of the splitting box algebra with respect to the vacuum and the resulting entropy complies with the Bekenstein area law, such an enigmatic observation will be helpful only if it leads to a general physical concept. The Minkowski analog (the Unruh problem) of black hole thermodynamics/statistical mechanics in our view is the understanding of thermal aspects resulting from (modular) localization rather than the application of the heat-bath Gibbs formalism. Various intuitively equivalent forms of localization entropy related to the split inclusion situation were introduced via the concept of relative entropy for a pair of states in the work of H. Narnhofer [27]. The most manageable version for the purpose of extracting a possible area law which refers directly to the states seems to be the relative entropy of the vacuum relative to the “split vacuum” on the restricted tensor product algebra A ⊗ B, A ⊂ N, B ⊂ N′, where A is the smaller double cone algebra and B the commutant of the bigger one. There exists [28] a nice variational formula in terms of states only for such a relative entropy of a von Neumann algebra M between two states ω_i, i = 1, 2,

$$S(\omega_1|\omega_2)_M = -\langle \log \Delta_{\omega_1,\omega_2} \rangle_{\omega_2} = \sup \int_0^\infty \left[ \frac{\omega(1)}{1+t} - \omega_1\big(y^*(t)y(t)\big) - \frac{1}{t}\,\omega_2\big(x^*(t)x(t)\big) \right] \frac{dt}{t}, \qquad x(t) = 1 - y(t), \quad x(t) \in M. \qquad (26)$$
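For orientation, the relative entropy of an entangled state with respect to its “split” product state can be evaluated in closed form in the finite-dimensional (type I) case. The following Python sketch is only an analogy for the quantity in (26) and not the type III situation of local algebras: it assumes a pure state on a tensor product specified by its Schmidt probabilities p_i, and it uses the density-matrix definition S(ρ|σ) = Tr ρ (log ρ − log σ) with σ the product of the two marginals. The function names are hypothetical.

```python
import math


def entanglement_entropy(schmidt_probs):
    """Von Neumann entropy of either marginal of the pure state
    psi = sum_i sqrt(p_i) |i>|i>."""
    return -sum(p * math.log(p) for p in schmidt_probs if p > 0.0)


def split_relative_entropy(schmidt_probs):
    """Relative entropy of the pure state rho = |psi><psi| with respect
    to the product of its marginals, rho_A (x) rho_B.

    rho is pure, so Tr rho log rho = 0. The product state is diagonal in
    the basis |i>|j> with eigenvalues p_i * p_j, hence
    S = -<psi| log(rho_A (x) rho_B) |psi> = -sum_i p_i * log(p_i * p_i).
    """
    if abs(sum(schmidt_probs) - 1.0) > 1e-12:
        raise ValueError("Schmidt probabilities must sum to 1")
    return -sum(p * math.log(p * p) for p in schmidt_probs if p > 0.0)


if __name__ == "__main__":
    for probs in ([1.0], [0.5, 0.5], [0.7, 0.2, 0.1]):
        print(probs, entanglement_entropy(probs), split_relative_entropy(probs))
```

In this toy setting the relative entropy vanishes exactly when the state factorizes and otherwise equals twice the entanglement entropy of one side, which gives a rough feeling for why the entangled vacuum and the split vacuum have nonzero relative entropy.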
In (26), $\Delta_{\omega_1,\omega_2}$ is the relative modular operator, and for the case at hand we have to identify ω1 = Ω, ω2 = Ω ⊗ Ω (the split vacuum) and M = A ⊗ B. Using some previous nuclearity estimates of Buchholz and Wichmann [16], Narnhofer carried out a rough estimate for this entropy and found that it increases less than the volume of the relativistic box of size R [27]. In the present setting her result may be interpreted as a first indication in favor of a Bekenstein area law for localized quantum matter. In order to obtain more structural insight into this fundamental and universal phenomenon I started to investigate this problem in the mathematically better controllable situation of two double cones separated by a collar in conformal theories [10]. By conformal invariance the large R behaviour becomes coupled to the short distance behaviour in the limit of vanishing collar size δ → 0. One expects to have an easier conceptual grasp on this ultraviolet behaviour as a consequence of the fact that it reflects truly intrinsic properties of the local algebras and not short distance divergences of particular field coordinates. Besides, conformal theories from an analytic viewpoint are the simplest theories after free fields. There are as yet no sufficiently concrete results worth reporting here. To avoid misunderstanding, we do not claim that the area law for black holes is a simple consequence of the area law for localized quantum matter. It would be a pity if it were, because then not much would be revealed by black hole physics about the still extremely speculative issue of quantum gravity. Rather I believe that it is the seemingly
very nontrivial conversion of localization entropy of local quantum physics into the more geometric Killing horizon entropy of black holes¹⁰ which will be the crucial step.

5. Epilogue

The present analysis of the AdS-CQFT correspondence has its roots in the LSZ setting of particle physics from which the conformally invariant QFT should result in the zero mass limit [30]. The step from the traditional use of pointlike (Lagrangian) fields to operator algebras indexed by spacetime regions was taken a long time ago with the intention to obtain a more profound understanding of the observed insensitivity of the S-matrix (obtained as the asymptotic limit in the setting of the Lehmann–Symanzik–Zimmermann formalism) to changes of field coordinates (“interpolating fields”) within the (Borchers) equivalence class. This led to a more intrinsic formulation of QFT called algebraic QFT (AQFT) which relegates the role of fields to a coordinatization of local algebras in terms of a selection of particular generators. If one wants to avoid field-coordinatizations altogether, as was needed in Rehren’s proof of the AdS-CQFT isomorphism, it is appropriate to avoid the word “field” and talk about Local Quantum Physics [16]. As the step from differential geometry with coordinates to the modern intrinsic coordinate-free formulation did not represent a change of the geometrical content, one does not change physical principles (but only some concepts for their implementation) by passing from QFT to LQP. Since certain problems, as e.g. the abominable short-distance problem in the pointlike formulation (∼coordinate singularities? 
of which field coordinate?)11 which always seemed to threaten the existence of Lagrangian QFT through its long journey through renormalization theory, become deemphasized in favor of apparently different aspects (ultraviolet divergencies→nontriviality of certain intersection of algebras) in the new formulation, this reprocessing of concepts represents a very healthy change. The conjecture about the AdS-CQFT correspondence comes from string theory. Although string theory has been the dominant way of thinking in particle physics publications for at least two decades, its main achievements seem to be that (with some training and coaching) it allows theoretical physicists to make contributions to mathematics. Its historical origin in the dual S-matrix model of Veneziano was very close to the framework of LSZ scattering theory; in fact it started as a proposal for a nonperturbative crossing symmetric S-Matrix which fulfilled a very strong form (not suggested by QFT) of crossing called duality (saturation of crossing on the level of reggeized one-particle states). This forced the S-matrix to live in a high-dimensional spacetime of at least 10 dimensions (by invoking another invented structure: supersymmetry). The next step in the LSZ logic would have been to ask for the understanding of this high dimensional QFT (i.e. the unique equivalence class of fields or the local net of algebras) which has this S-matrix as a bona fide physical S-matrix, i.e. as a large time LSZ limit. Unfortunately this never happened; instead the off-shell transition was performed at a completely different purely technical place. It was based on the auxiliary observation that the particle towers which appeared in the lowest order (or lowest genus of Riemann surfaces which is the analogue of Schwinger’s auxiliary eigentime in QFT) 10 The role of the double cone restricted vacuum in the black hole situation is played by the Hartle–Hawking state restricted to the outside of the black hole [29]. 
11 There are also intrinsic ultraviolet aspects of the local algebras. For example if one uses the “split property” for the definition of the vacuum entropy of a local algebra with a “collar” for controlling the vacuum fluctuations near the causal horizon of the localization region, the entropy diverges with shrinking collar size in a way which is characteristic for the model but not for one of its field coordinates [10].
can be reproduced by the mass spectrum of a string. In the original strong interaction representation of the model this tower was thought to lead to resonances (poles in the second Riemann sheet) resulting from higher order interactions destabilizing the higher-lying particles in the tower. It was this step (which occurred even before the decree of the use of string theory as a quantum theory of gravity) which is responsible for the lost (and never recovered since) relation to causality and localization which are the cornerstones of QFT. Whereas in earlier times quantum field theorists have thought (without success) about nonlocal alternatives in the form of an elementary length or a cutoff, recent developments in algebraic QFT have made it abundantly clear that Einstein causality and its strengthened form, Haag duality, are inexorably linked with the mathematics of the Tomita–Takesaki modular theory. This is an extremely deep theory which is able to convert abstract domain properties of operators and subspaces obtained by applying algebras of local quantum physics to distinguished state vectors into concrete spacetime localization geometry (without the necessity to impose any additional structure from the mathematics of noncommutative geometry). All structural insight obtained up to now, the charge superselection structure, TCP, braid group statistics¹², the V. Jones as well as the new modular inclusion theory mentioned in this paper, the universal nature of holography and the concept of localized entropy, all these properties depend on the causality aspects of QFT. So the reasons for giving them up must be very strong (theoretical or experimental) and amount to much more than the esthetics of differential geometric consistency observations. The biggest difference to a more scholarly and less marketing Zeitgeist of previous times becomes visible if one looks at the terminology. Whereas e.g. 
the quasiclassical Bohr–Sommerfeld theory was presented in a way that left no doubt about its transitory character and the step towards quantum mechanics was the de-mystification of the quasiclassical antinomies and loose ends, string theorists often praise their product as a theory of everything and invite their fellow physicists to read the big Latin letter M as “mystery” in a science whose main aim used to be de-mystification. Having enjoyed the good fortune of proximity to Harry Lehmann to whose memory I have dedicated this paper, the present crisis often reminds me of good and healthy times in particle physics when he made his lasting contributions to particle physics. It would seem to me that in the present absence of profound experimental discoveries it would be more reasonable and safer to develop local quantum physics according to its very strong intrinsic logic and guidance of its underlying physical principles instead of taking off into the blue yonder under the maxim “everything goes”. But apart from a few exceptions there is a lamentable dominance of ideas which despite their long age have not contributed anything tangible to particle physics. The danger emanating from this dominance, which seriously threatens the chance of our most gifted and original young minds to contribute to the progress of particle physics (and which may even wipe out the very successful scholarly traditions in the exact sciences altogether), was certainly realized by the late Harry Lehmann, who reacted to it with his characteristic mocking irony which his friends and collaborators will not forget easily, and which besides his scientific achievements probably explains Pauli’s sympathy and support extended to him. 
There are indications that members of the older generation (who have been keeping silence in the face of the mathematical brilliance and exclusiveness behind some of the present dominant fashion in particle physics) are slowly becoming aware of the potential danger [31, 32]. 12 Including the appearance of temporal plektonic structures in higher dimensional conformal field theories mentioned at the end of the third section.
Note added. Although the majority interest has recently shifted away from the field theoretic AdS-CQFT problem, we find that it serves as an ideal illustration of how the powerful concepts of AQFT can solve a problem which otherwise (despite a very large number of papers) would have remained unsolved.

Acknowledgement. I am indebted to Gerhard Mack for valuable suggestions and encouragements.
References

1. Maldacena, J.: Adv. Theor. Math. Phys. 2, 231 (1998)
2. Gubser, S.S., Klebanov, I.R., Polyakov, A.M.: Phys. Lett. B 428, 105 (1998)
3. Witten, E.: Adv. Theor. Math. Phys. 2, 253 (1998)
4. Fronsdal, C.: Phys. Rev. D 10, 589 (1974)
5. Avis, S.J., Isham, C.J. and Storey, D.: Phys. Rev. D 18, 3565 (1978)
6. Rehren, K.-H.: Ann. Henri Poincaré 1, 607 (2000)
7. Bertola, M., Bros, J., Moschella, U. and Schaeffer, R.: AdS/CFT correspondence for n-point functions, hep-th/9908140
8. Buchholz, D., Florig, M. and Summers, S.J.: Hawking–Unruh temperature and Einstein Causality in anti-de Sitter space-time. hep-th/9905178
9. Schroer, B.: Ann. Phys. (N.Y.) 275, 190 (1999), and references therein; Schroer, B.: Local Quantum Theory beyond Quantization. In: Quantum Theory and Symmetries, ed. H.-D. Doebner, V.K. Dobrev, J.-D. Henning and W. Luecke, Singapore: World Scientific, 2000, hep-th/9912008
10. Schroer, B.: J. Math. Phys. 41, 3801 (2000)
11. Segal, I.E.: Causality and Symmetry in Cosmology and the Conformal Group. Montreal 1976, Proceedings, Group Theoretical Methods In Physics, New York 1977, 433 and references therein to earlier work of the same author
12. Luescher, M. and Mack, G.: Commun. Math. Phys. 41, 203 (1975)
13. Hortacsu, M., Schroer, B. and Seiler, R.: Phys. Rev. D 5, 2519 (1972)
14. Schroer, B. and Swieca, J.A.: Phys. Rev. D 10, 480 (1974); Schroer, B., Swieca, J.A. and Voelkel, A.H.: Phys. Rev. D 11, 11 (1975)
15. Belavin, A.A., Polyakov, A.M. and Zamolodchikov, A.B.: Nucl. Phys. B 247, 83 (1984)
16. Haag, R.: Local Quantum Physics. Berlin: Springer Verlag, 1992
17. Haag, R. and Schroer, B.: J. Math. Phys. 3, 248 (1962)
18. Schroer, B.: Anomalous Scale Dimensions from Timelike Braiding. hep-th/0005134; Schroer, B.: Space- and Time-Like Superselection Rules in Conformal Quantum Field Theory. hep-th/0010290
19. Guido, D., Longo, R., Roberts, J.E. and Verch, R.: Charged Sectors, Spin and Statistics in Quantum Field Theory on Curved Spacetimes. math-ph/9906019
20. Schroer, B. and Wiesbrock, H.-W.: RMP Vol. 12 No. 2, 301–326 (2000)
21. Schroer, B. and Wiesbrock, H.-W.: RMP Vol. 12 No. 1, 139 (2000); Schroer, B. and Wiesbrock, H.-W.: Looking beyond the Thermal Horizon: Hidden Symmetries in Chiral Models. To appear in RMP 12 No. 3, (2000)
22. Susskind, L.: J. Math. Phys. 36, 6377 (1995)
23. ’t Hooft, G.: Dimensional reduction in quantum gravity. In: Salam-Festschrift, A. Ali et al. eds., Singapore: World Scientific, 1993, p. 284
24. Buchholz, D., Doplicher, S., Longo, R. and Roberts, J.H.: Rev. Math. Phys. Special Issue 49 (1992)
25. Clifton, R. and Halvorson, H.: Entanglement and open Systems in Algebraic Quantum Field Theory. University of Pittsburgh, preprint Jan. 2000
26. Verlinde, E.: On the Holographic Principle in a Radiation Dominated Universe. hep-th/0008140; see also critical remarks in Bin Wang, Elcio Abdalla and Ru-Keng Su: Relating Friedmann equations to Cardy formula in universes with cosmological constant, hep-th/0101073
27. Narnhofer, H.: In: The State of Matter, ed. by M. Aizenman and H. Araki, Singapore: World Scientific, 1994
76
B. Schroer
28. Kosaki, H.: J. Operator Theory 16, 335 (1986) 29. Wald, R.M.: Quantum Field Theory in Cuved Spacetime and Black Hole Thermodynamics. Chicago: University of Chicago Press, 1994 30. Schroer, B.: Phys. Lett. B 494, 124, (2000), hep-th/0005110 31. Todorov, I.T.: Two-dimensional conformal field theory and beyond. Lessons from a continuing fashion, math-phys/0011014 32. Penrose, R.: How to compute-help-and hurt scientific research. Convergence Winter 1999, p. 30 33. Rehren, K.-H.: Local Quantum Observables in the Anti de Sitter-Conformal QFT Correspondence. hepth/0003120 34. Kupsch, J., Ruehl, W. and Yunn, B.C.: Ann. Phys. (N.Y.) 89, 141 (1975) 35. Wiesbrock, H.-W.: Lett. Math. Phys. 31, 303 (1994); Guido, D., Longo, R. and Wiesbrock, H.-W.: Commun. Math. Phys. 192, 217 (1998) 36. Borchers, H.J.: J. Math. Phys. 41, 3604 (2000) 37. Nikolov, N.M. and Todorov, I.T.: Rationality of conformally invariant local correlation functions on compactified Minkowski space. hep-th/0009004 38. Buchholz, D., Mund, J. and Summers, S.J.: Transplantation of Local Nets and Geometric Modular Action on Robertson–Walker Space-Times. hep-th/0011237; Buchholz, D.: Algebraic Quantum Field Theory. A Status Report. hep-th/0011015 Communicated by G. Mack and W. Zimmermann
Commun. Math. Phys. 219, 77 – 88 (2001)
Mikheyev–Smirnov–Wolfenstein Effect for Linear Electron Density
Harry Lehmann (1), Per Osland (2), Tai Tsun Wu (3, 4)
(1) II. Institut für Theoretische Physik der Universität Hamburg, Hamburg, Germany
(2) Department of Physics, University of Bergen, Allégaten 55, 5007 Bergen, Norway
(3) Gordon McKay Laboratory, Harvard University, Cambridge, MA 02138, USA
(4) Theoretical Physics Division, CERN, 1211 Geneva 23, Switzerland
Received: 3 April 2000 / Accepted: 10 April 2000
Harry Lehmann: Deceased 22 November 1998.
Per Osland: This work was supported in part by the Research Council of Norway.
Tai Tsun Wu: Work supported in part by the United States Department of Energy under Grant No. DE-FG02-84ER40158.
Abstract: When the electron density is a linear function of distance, it is known that the MSW equations for two neutrino species can be solved in terms of known functions. It is shown here that more generally, for any number of neutrino species, these MSW equations can be solved exactly in terms of single integrals. While these integrals cannot be expressed in terms of known functions, some of their simple properties are obtained. Application to the solar neutrino problem is briefly discussed.
1. Introduction
In studying the Mikheyev–Smirnov–Wolfenstein (MSW) effect [1] due to the coherent forward scattering of neutrinos by electrons in matter, it is often instructive to consider first special cases where the electron density is taken to be a simple function of distance. It is the purpose of the present paper to investigate perhaps the simplest case: the case where the electron density is a linear function of distance. The problem of the linear electron density is formulated in Sect. 2. The case of two neutrino species has a long history [2], and the solution, as reviewed in Sect. 3, can be expressed in terms of parabolic cylinder functions (see, for example, Chapter VIII of [3]) or, equivalently, confluent hypergeometric functions. However, the solution in this form is specific to the case of two neutrino species, and is not convenient for generalizations to more neutrino species. Physically, this generalization is essential because there are at least three types of neutrinos. Therefore, Sect. 4 is devoted to treating in a different way the MSW differential equations for linear electron density and two neutrino species. On the one hand, this alternative method must lead to the same solutions as those in Sect. 3; on the other hand, this new treatment can be readily generalized to any number of neutrino species. This case of linear electron density but any number of neutrino species forms the main content of the present paper, and various aspects of this case are treated in Sects. 5 and 6.
2. Formulation of the Problem
Let there be N types of neutrinos, denoted by $\nu_1, \nu_2, \ldots, \nu_N$, where $\nu_1$ is the neutrino of the first generation, i.e., the one that forms the SU(2) doublet with the electron. It is assumed that $\nu_1$ is the only neutrino which interacts differently with the electron because of the exchange of the intermediate boson W, while the other neutrinos $\nu_2, \nu_3, \ldots, \nu_N$ all have the same interaction with the electron. Thus, the neutrino mass matrix M [4] is an $N \times N$ matrix. The eigenvalues of M give the N neutrino masses $\mu$. In analyzing the MSW effect, the neutrino masses are usually taken to be much smaller than the momentum p of the neutrino. Under this assumption, because
\[ (p^2 + \mu^2)^{1/2} \sim p + \frac{1}{2p}\,\mu^2, \tag{2.1} \]
it is $M^2$ that enters in the differential equation for the MSW effect. Let $\Psi(x)$ be the N-component neutrino wave function; then this differential equation is
\[ i\frac{d}{dx}\Psi(x) = \left[ W(x) + \frac{1}{2p}\,M^2 \right] \Psi(x), \tag{2.2} \]
where $W(x)$ is an $N \times N$ matrix whose only non-zero element is
\[ [W(x)]_{11} = \sqrt{2}\, G_F N_e(x), \tag{2.3} \]
with $G_F$ the Fermi weak-interaction constant and $N_e(x)$ the electron density at the point x. The terminology “linear electron density” is used to mean that $N_e(x)$ is a linear function of x. Since $N_e(x)$ is the density of electrons, it cannot be negative. Therefore, the MSW differential equation (2.2) is physically meaningful only for the half-line of x where $N_e(x) \ge 0$. On the other hand, when the neutrino or the electron, but not both, is replaced by its antiparticle, the quantity $[W(x)]_{11}$ of Eq. (2.3) changes sign. Therefore, the complementary half-line of x describes this slightly different physical situation. For this reason, Eq. (2.2) is to be studied for the entire range of x, from $-\infty$ to $+\infty$. For the present case of the linear electron density, Eq. (2.2) can be reduced, for a given value of p, to the dimensionless standard form
\[ i\frac{d}{dt}\psi(t) = A(t)\,\psi(t), \tag{2.4} \]
where
\[ \psi(t) = \begin{pmatrix} \psi_1(t) \\ \psi_2(t) \\ \psi_3(t) \\ \vdots \\ \psi_N(t) \end{pmatrix} \tag{2.5} \]
and
\[ A(t) = \begin{pmatrix} -t & a_2 & a_3 & \cdots & a_N \\ a_2 & b_2 & 0 & \cdots & 0 \\ a_3 & 0 & b_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_N & 0 & 0 & \cdots & b_N \end{pmatrix}. \tag{2.6} \]
This is accomplished as follows. (i) To change the independent variable from x to t, there is a shift in origin and a rescaling with possibly a reversal of sign. (ii) To change the dependent variable from $\Psi$ to $\psi$, there is a rotation in the second to Nth component and an introduction of exponential factors with possibly some minus signs. Furthermore, from (i) and (ii), the elements of the matrix A(t) can be chosen to satisfy the conditions
\[ \sum_{j=2}^{N} b_j = 0, \tag{2.7} \]
\[ b_2 \le b_3 \le b_4 \le \ldots \le b_{N-1} \le b_N, \tag{2.8} \]
and
\[ a_j \ge 0 \quad \text{for } j = 2, 3, 4, \ldots, N. \tag{2.9} \]
Consider the following special cases. (a) If, for some j, say $j_0$, $a_{j_0} = 0$, then it is seen from Eqs. (2.5) and (2.6) that $\psi_{j_0}$ is decoupled from the other $\psi_j$'s. Thus, this special case of N types of neutrinos is reduced to a problem with N − 1 types of neutrinos. (b) If, again for some j, say $j_0$, $b_{j_0} = b_{j_0+1}$, then a rotation can be carried out between $\psi_{j_0}$ and $\psi_{j_0+1}$ such that, after this additional rotation, the new $a_{j_0}$ is zero. Thus, this special case of $b_{j_0} = b_{j_0+1}$ can be reduced to the above case of $a_{j_0} = 0$, and hence again this second special case of N types of neutrinos is reduced to a problem with N − 1 types of neutrinos. It is therefore sufficient to study the ordinary differential equation (2.4) with Eqs. (2.5) and (2.6) under the condition (2.7) together with
\[ b_2 < b_3 < b_4 < \ldots < b_{N-1} < b_N \tag{2.10} \]
and
\[ a_j > 0 \quad \text{for } j = 2, 3, 4, \ldots, N. \tag{2.11} \]
In view of the inequality (2.10), it turns out to be convenient to define symbolically
\[ b_1 = -\infty \quad \text{and} \quad b_{N+1} = +\infty. \tag{2.12} \]
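The bookkeeping behind conditions (2.7)–(2.9) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `normalize_parameters` and `build_A` and the sample numbers are hypothetical and do not come from the paper, and real parameters $a_j$, $b_j$ are assumed to be given already. The sketch subtracts the mean of the $b_j$ (a constant that can be absorbed into a shift of t and an overall phase), replaces each $a_j$ by $|a_j|$ (a sign that can be absorbed into $\psi_j$), and orders the components by $b_j$.

```python
def normalize_parameters(a, b):
    """Bring (a_j, b_j), j = 2..N, to the form of Eqs. (2.7)-(2.9).

    Hypothetical helper, not from the paper. Returns (a_sorted, b_sorted, shift),
    where `shift` is the mean of the input b_j that is removed from the diagonal.
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty sequences of equal length")
    shift = sum(b) / len(b)
    # Sort by shifted b_j; keep each |a_j| attached to its b_j.
    pairs = sorted((bj - shift, abs(aj)) for aj, bj in zip(a, b))
    b_sorted = [p[0] for p in pairs]
    a_sorted = [p[1] for p in pairs]
    return a_sorted, b_sorted, shift


def build_A(t, a, b):
    """Matrix A(t) of Eq. (2.6) as a list of rows (real symmetric)."""
    n = len(a) + 1
    A = [[0.0] * n for _ in range(n)]
    A[0][0] = -t
    for j, (aj, bj) in enumerate(zip(a, b), start=1):
        A[0][j] = aj
        A[j][0] = aj
        A[j][j] = bj
    return A
```

For example, `normalize_parameters([0.3, -0.5, 0.2], [1.0, -2.0, 4.0])` removes the mean 1.0 from the $b_j$ and returns them in increasing order with zero sum. The strict conditions (2.10) and (2.11) would still have to be checked separately.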
3. Case N = 2
Let us review first the well-known case of the MSW effect for two types of neutrinos [1, 2]. By Eqs. (2.2)–(2.7), the MSW equations are
\[ i\frac{d}{dt}\begin{pmatrix} \psi_1(t) \\ \psi_2(t) \end{pmatrix} = \begin{pmatrix} -t & a_2 \\ a_2 & 0 \end{pmatrix} \begin{pmatrix} \psi_1(t) \\ \psi_2(t) \end{pmatrix}, \tag{3.1} \]
or more explicitly
\[ i\frac{d}{dt}\psi_1(t) = -t\,\psi_1(t) + a_2\,\psi_2(t), \tag{3.2} \]
\[ i\frac{d}{dt}\psi_2(t) = a_2\,\psi_1(t), \tag{3.3} \]
with $a_2 > 0$. A second-order ordinary differential equation for $\psi_1(t)$ is obtained by applying $d/dt$ to Eq. (3.2) and using Eq. (3.3):
\[ \frac{d^2\psi_1(t)}{dt^2} - it\,\frac{d\psi_1(t)}{dt} + (a_2^2 - i)\,\psi_1(t) = 0. \tag{3.4} \]
In order to remove the first-derivative term, let
\[ \psi_1(t) = e^{it^2/4}\,\phi_1(t). \tag{3.5} \]
Then the equation for $\phi_1(t)$ is
\[ \frac{d^2\phi_1(t)}{dt^2} + \left( \tfrac14 t^2 + a_2^2 - \tfrac12 i \right)\phi_1(t) = 0. \tag{3.6} \]
Two linearly independent solutions of Eq. (3.6) are the parabolic cylinder functions [3]
\[ D_\rho(\pm e^{i\pi/4}\,t), \tag{3.7} \]
where
\[ \rho = -ia_2^2 - 1. \tag{3.8} \]
Parabolic cylinder functions are special cases of the confluent hypergeometric function [5], the relation being
\[ D_\rho(z) = 2^{(\rho-1)/2}\, e^{-z^2/4}\, z\, \Psi\!\left( \tfrac12 - \tfrac12\rho, \tfrac32; \tfrac12 z^2 \right). \tag{3.9} \]
Since the confluent hypergeometric functions $\Phi$ and $\Psi$ satisfy the same second-order differential equation, the general solution of Eq. (3.6), expressed in terms of $\psi_1(t)$ through Eq. (3.5), is
\[ \psi_1(t) = t \left[ C\,\Phi\!\left( 1 + \tfrac12 ia_2^2, \tfrac32; \tfrac12 it^2 \right) + C'\,\Psi\!\left( 1 + \tfrac12 ia_2^2, \tfrac32; \tfrac12 it^2 \right) \right]. \tag{3.10} \]
This is one convenient form for the solution for N = 2.
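As a numerical cross-check of the two-species equations (3.2) and (3.3), the following minimal sketch integrates them with a fixed-step fourth-order Runge–Kutta scheme. It rests on several assumptions that are not part of the paper: the function names, the finite window $[-t_{\max}, t_{\max}]$ standing in for $t \to \pm\infty$, the step size, and the comparison value $\exp(-2\pi a_2^2)$, which is the standard Landau–Zener survival probability quoted from the literature of [2] rather than a formula stated in this section.

```python
import math


def rhs(t, psi1, psi2, a2):
    """Eqs. (3.2)-(3.3) solved for the derivatives dpsi1/dt and dpsi2/dt."""
    dpsi1 = -1j * (-t * psi1 + a2 * psi2)
    dpsi2 = -1j * (a2 * psi1)
    return dpsi1, dpsi2


def integrate(a2, t_max=50.0, dt=0.002):
    """RK4 integration from -t_max to +t_max, starting in the pure psi2 state."""
    steps = int(round(2 * t_max / dt))
    t = -t_max
    p1, p2 = 0j, 1 + 0j
    for _ in range(steps):
        k1 = rhs(t, p1, p2, a2)
        k2 = rhs(t + dt / 2, p1 + dt / 2 * k1[0], p2 + dt / 2 * k1[1], a2)
        k3 = rhs(t + dt / 2, p1 + dt / 2 * k2[0], p2 + dt / 2 * k2[1], a2)
        k4 = rhs(t + dt, p1 + dt * k3[0], p2 + dt * k3[1], a2)
        p1 += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p2 += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
    return p1, p2


def landau_zener_survival(a2):
    """Assumed reference value: probability of remaining in the psi2 state."""
    return math.exp(-2 * math.pi * a2 * a2)
```

With $a_2 = 0.5$, the total probability $|\psi_1|^2 + |\psi_2|^2$ should stay close to 1, since the matrix in Eq. (3.1) is real and symmetric, and $|\psi_2|^2$ at the end of the window should lie near the assumed Landau–Zener value, up to small oscillating corrections from the finite window.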
4. Case N = 2 – an Alternative Approach
In the existing treatment in the literature for linear electron density and two types of neutrinos [1, 2] as reviewed in Sect. 3, the crucial step is to recognize that the second-order differential equation (3.4) can be solved exactly in terms of known higher transcendental functions, either parabolic cylinder functions or confluent hypergeometric functions. More generally, for N types of neutrinos, the corresponding differential equation is of Nth order. Even for N = 3, the third-order differential equation is not one for any well-known transcendental function. Therefore, in order to be able to generalize the treatment of N = 2 to larger values of N, we must recast the solution of Sect. 3 so that parabolic cylinder functions and confluent hypergeometric functions do not play an essential role. A useful question to ask is the following: In what way is the linear electron density especially simple? The answer must be sought in Eq. (2.6), from which it is seen that the independent variable t appears only in one matrix element, and furthermore, it appears only linearly in that element. This implies that, if the Fourier transform is applied to the differential equation (2.4), the differentiation with respect to the Fourier-transform variable appears only once. Hence it is expected that an explicit expression can be obtained for the Fourier transform of $\psi$. Let
\[ F(\zeta) = \frac{1}{2\pi} \int_{-\infty}^{\infty} dt\, e^{i\zeta t}\, \psi_1(t), \tag{4.1} \]
then it follows from Eq. (3.4) that $F(\zeta)$ satisfies the first-order differential equation
\[ -\zeta^2 F(\zeta) - \frac{d}{d\zeta}\left[ -i\zeta F(\zeta) \right] + (a_2^2 - i)\,F(\zeta) = 0, \]
where we have omitted all terms from $t = \pm\infty$. This differential equation simplifies immediately to
\[ i\zeta\,\frac{dF(\zeta)}{d\zeta} - (\zeta^2 - a_2^2)\,F(\zeta) = 0, \]
or
\[ \frac{1}{F(\zeta)}\,\frac{dF(\zeta)}{d\zeta} = \frac{i}{\zeta}\,(a_2^2 - \zeta^2). \tag{4.2} \]
Integration over $\zeta$ gives
\[ F(\zeta) = \text{const.}\; e^{-i\zeta^2/2}\, \zeta^{ia_2^2}. \tag{4.3} \]
From the inequality (2.11), it is seen that the function on the right-hand side of Eq. (4.3) has a singularity at
\[ \zeta = 0 = b_2. \tag{4.4} \]
Therefore the constant in (4.3) can take on different values for $\zeta$ positive and for $\zeta$ negative. In other words, the differential equation (4.2) is really two differential equations, one for $\zeta > 0$ and the other for $\zeta < 0$, consistent with the fact that the right-hand side of Eq. (4.2) has a singularity at $\zeta = 0$. With this observation, it is natural to define
\[ F_1(\zeta) = \begin{cases} e^{-i\zeta^2/2}\,(-\zeta)^{ia_2^2} & \text{for } \zeta < 0, \\ 0 & \text{for } \zeta > 0, \end{cases} \tag{4.5a} \]
and
\[ F_2(\zeta) = \begin{cases} 0 & \text{for } \zeta < 0, \\ e^{-i\zeta^2/2}\,\zeta^{ia_2^2} & \text{for } \zeta > 0. \end{cases} \tag{4.5b} \]
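Inverting the Fourier transform (4.1), this choice leads to
\[ \begin{aligned} \psi_1^{(1)}(t) &= \int_{-\infty}^{0} d\zeta\, e^{-i\zeta t}\, e^{-i\zeta^2/2}\, (-\zeta)^{ia_2^2}, \\ \psi_1^{(2)}(t) &= \int_{0}^{\infty} d\zeta\, e^{-i\zeta t}\, e^{-i\zeta^2/2}\, \zeta^{ia_2^2}. \end{aligned} \tag{4.6} \]
With the notation (2.12), these two formulas (4.6) can be written as
\[ \psi_1^{(n)}(t) = \int_{b_n}^{b_{n+1}} d\zeta\, e^{-i\zeta t}\, e^{-i\zeta^2/2}\, |\zeta|^{ia_2^2}, \tag{4.7} \]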
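for n = 1, 2. It remains to show that both $\psi_1^{(1)}(t)$ and $\psi_1^{(2)}(t)$ are confluent hypergeometric functions of the correct parameters and argument. For this purpose, it is convenient to define
\[ \begin{aligned} \psi_c(t) &= \int_0^{\infty} d\zeta\, \cos(\zeta t)\, e^{-i\zeta^2/2}\, \zeta^{ia_2^2}, \\ \psi_s(t) &= \int_0^{\infty} d\zeta\, \sin(\zeta t)\, e^{-i\zeta^2/2}\, \zeta^{ia_2^2}, \end{aligned} \tag{4.8} \]
so that it follows from Eqs. (4.6) that
\[ \begin{aligned} \psi_1^{(1)}(t) &= \psi_c(t) + i\psi_s(t), \\ \psi_1^{(2)}(t) &= \psi_c(t) - i\psi_s(t). \end{aligned} \tag{4.9} \]
It is found that
\[ \psi_c(t) = e^{-i\pi/4}\, e^{\pi a_2^2/4}\, 2^{(-1+ia_2^2)/2}\, \Gamma\!\left( \tfrac12 + \tfrac12 ia_2^2 \right) \Phi\!\left( \tfrac12 + \tfrac12 ia_2^2, \tfrac12; \tfrac12 it^2 \right) \tag{4.10} \]
and
\[ \psi_s(t) = -i\, e^{\pi a_2^2/4}\, 2^{ia_2^2/2}\, \Gamma\!\left( 1 + \tfrac12 ia_2^2 \right) t\, \Phi\!\left( 1 + \tfrac12 ia_2^2, \tfrac32; \tfrac12 it^2 \right). \tag{4.11} \]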
There are various ways to verify Eqs. (4.10) and (4.11), including carrying out power series expansions in t for the left-hand and right-hand sides. Finally, we note from Eq. (7) on p. 257 of reference [5] that
\[ \Psi(a, c; x) = \frac{\Gamma(1-c)}{\Gamma(a-c+1)}\, \Phi(a, c; x) + \frac{\Gamma(c-1)}{\Gamma(a)}\, x^{1-c}\, \Phi(a-c+1, 2-c; x). \tag{4.12} \]
Therefore, the results of Sect. 3 and this section are the same.
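One quick way to gain confidence in the prefactor of Eq. (4.10) is to evaluate $\psi_c$ at $t = 0$, where Kummer's function satisfies $\Phi(a, c; 0) = 1$. The short derivation below is a sketch added for illustration, not part of the paper's argument. It assumes the standard integral $\int_0^\infty \zeta^{s-1} e^{-p\zeta^2}\, d\zeta = \tfrac12 p^{-s/2}\,\Gamma(s/2)$, continued from $\mathrm{Re}\, p > 0$ to $p = i/2$, which is permissible for $0 < \mathrm{Re}\, s < 2$.

```latex
% Consistency check of the prefactor in Eq. (4.10) at t = 0,
% with s = 1 + i a_2^2, so that Re s = 1.
\begin{align*}
\psi_c(0) &= \int_0^\infty d\zeta\, \zeta^{s-1}\, e^{-i\zeta^2/2}
           = \tfrac12 \left( \tfrac{i}{2} \right)^{-s/2} \Gamma\!\left( \tfrac{s}{2} \right) \\
          &= 2^{s/2-1}\, e^{-i\pi s/4}\, \Gamma\!\left( \tfrac{s}{2} \right)
           = e^{-i\pi/4}\, e^{\pi a_2^2/4}\, 2^{(-1+ia_2^2)/2}\,
             \Gamma\!\left( \tfrac12 + \tfrac12 i a_2^2 \right).
\end{align*}
```

The last expression agrees with the coefficient multiplying $\Phi$ in Eq. (4.10). The same steps with $s = 2 + ia_2^2$ applied to the term linear in t reproduce the coefficient $-i\, e^{\pi a_2^2/4}\, 2^{ia_2^2/2}\, \Gamma(1 + \tfrac12 ia_2^2)$ of Eq. (4.11), although there the integral converges only in the continued sense.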
5. General Values of N
The procedure of Sect. 4 for N = 2 can be generalized in a straightforward way to larger values of N. Indeed, this is the major advantage of this procedure over the previously known ones reviewed in Sect. 3. This generalization to arbitrary values of N is to be carried out in this section. Thus, the differential equations (2.4) need to be solved under the constraints (2.7), (2.10), and (2.11). By Eqs. (2.5) and (2.6), Eqs. (2.4) are more explicitly
\[ i\frac{d\psi_1(t)}{dt} = -t\,\psi_1(t) + \sum_{j=2}^{N} a_j\,\psi_j(t) \tag{5.1} \]
and, for $k = 2, 3, 4, \ldots, N$,
\[ \left( i\frac{d}{dt} - b_k \right) \psi_k(t) = a_k\,\psi_1(t). \tag{5.2} \]
In order to get a differential equation for $\psi_1(t)$, apply the operator
\[ \prod_{k=2}^{N} \left( i\frac{d}{dt} - b_k \right) \]
to Eq. (5.1). By Eq. (5.2), this gives
\[ \prod_{k=2}^{N} \left( i\frac{d}{dt} - b_k \right) \left( i\frac{d}{dt} + t \right) \psi_1(t) = \sum_{j=2}^{N} a_j^2 \prod_{\substack{k=2 \\ k \ne j}}^{N} \left( i\frac{d}{dt} - b_k \right) \psi_1(t). \tag{5.3} \]
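Equation (5.3) is an Nth-order ordinary differential equation for $\psi_1(t)$; it reduces to Eq. (3.4) when N = 2. Following Sect. 4, define the Fourier transform $F(\zeta)$ of $\psi_1(t)$ by Eq. (4.1); then the first-order differential equation for $F(\zeta)$ is
\[ \prod_{k=2}^{N} (\zeta - b_k) \left( \zeta - i\frac{d}{d\zeta} \right) F(\zeta) = \sum_{j=2}^{N} a_j^2 \prod_{\substack{k=2 \\ k \ne j}}^{N} (\zeta - b_k)\, F(\zeta), \tag{5.4} \]
or
\[ \left( \zeta - i\frac{d}{d\zeta} \right) F(\zeta) = \sum_{j=2}^{N} \frac{a_j^2}{\zeta - b_j}\, F(\zeta), \tag{5.5} \]
or
\[ \frac{1}{F(\zeta)}\,\frac{dF(\zeta)}{d\zeta} = i \left[ -\zeta + \sum_{j=2}^{N} \frac{a_j^2}{\zeta - b_j} \right]. \tag{5.6} \]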
Equation (5.6) is the generalization of the previous Eq. (4.2) for N = 2. Integration over $\zeta$ gives the generalization of Eq. (4.3):
\[ F(\zeta) = \text{const.}\; e^{-i\zeta^2/2} \prod_{j=2}^{N} (\zeta - b_j)^{ia_j^2}. \tag{5.7} \]
From (2.11), the function on the right-hand side of Eq. (5.7) has singularities at
\[ \zeta = b_j \tag{5.8} \]
for $j = 2, 3, \ldots, N$. Therefore, define for $n = 1, 2, 3, \ldots, N$,
\[ F_n(\zeta) = \begin{cases} e^{-i\zeta^2/2} \prod_{j=2}^{N} |\zeta - b_j|^{ia_j^2} & \text{for } b_n < \zeta < b_{n+1}, \\ 0 & \text{otherwise.} \end{cases} \tag{5.9} \]
Inverting the Fourier transform (4.1) then gives the desired N linearly independent solutions of the differential equation (5.3) as
\[ \psi_1^{(n)}(t) = \int_{b_n}^{b_{n+1}} d\zeta\, e^{-i\zeta t}\, e^{-i\zeta^2/2} \prod_{j=2}^{N} |\zeta - b_j|^{ia_j^2} \tag{5.10} \]
for $n = 1, 2, 3, \ldots, N$. In both Eq. (5.9) and Eq. (5.10), the notation of (2.12) has been used. The general solution of (5.3) is of course
\[ \psi_1(t) = \sum_{n=1}^{N} C_n\, \psi_1^{(n)}(t) \tag{5.11} \]
with arbitrary constants $C_n$. The other components of the $\psi(t)$ of (2.5) can be easily obtained also, and the result is
\[ \psi(t) = \sum_{n=1}^{N} C_n\, \psi^{(n)}(t), \tag{5.12} \]
where
\[ \psi^{(n)}(t) = \int_{b_n}^{b_{n+1}} d\zeta\, e^{-i\zeta t}\, e^{-i\zeta^2/2} \prod_{j=2}^{N} |\zeta - b_j|^{ia_j^2} \begin{pmatrix} 1 \\ \dfrac{a_2}{\zeta - b_2} \\ \dfrac{a_3}{\zeta - b_3} \\ \vdots \\ \dfrac{a_{N-1}}{\zeta - b_{N-1}} \\ \dfrac{a_N}{\zeta - b_N} \end{pmatrix}. \tag{5.13} \]
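The step from the logarithmic derivative (5.6) to the product form (5.7) and (5.9) can be checked numerically. The sketch below is illustrative only: the function names, the sample parameters, and the step size h are assumptions, not taken from the paper. It compares a central-difference estimate of $F'(\zeta)/F(\zeta)$ with the right-hand side of Eq. (5.6), using $|\zeta - b_j|^{ia_j^2} = \exp(i a_j^2 \ln|\zeta - b_j|)$.

```python
import cmath
import math


def F(zeta, a, b):
    """Eq. (5.9) inside an interval between singularities, overall constant set to 1."""
    value = cmath.exp(-0.5j * zeta * zeta)
    for aj, bj in zip(a, b):
        value *= cmath.exp(1j * aj * aj * math.log(abs(zeta - bj)))
    return value


def log_derivative_rhs(zeta, a, b):
    """Right-hand side of Eq. (5.6)."""
    return 1j * (-zeta + sum(aj * aj / (zeta - bj) for aj, bj in zip(a, b)))


def log_derivative_numeric(zeta, a, b, h=1e-6):
    """Central-difference estimate of F'(zeta) / F(zeta)."""
    return (F(zeta + h, a, b) - F(zeta - h, a, b)) / (2 * h * F(zeta, a, b))
```

For N = 3 with the sample values a = [0.7, 0.4] and b = [-1.0, 1.0], which satisfy (2.7), (2.10) and (2.11), the two expressions should agree to within the finite-difference error at points taken from each of the three intervals, for instance $\zeta = -3$, $0.3$ and $2.5$.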
6. Limiting Behaviors for Large Distances
The next task is to obtain the limiting behaviors of the various components of the wave function when the distance x is large, either positive or negative. In other words, the problem is to find the limiting behaviors of the $\psi_j^{(n)}(t)$, as given explicitly by Eq. (5.13), both for $t \to -\infty$ and $t \to \infty$, with all the a's and b's fixed. It is important to remember that these two limits correspond to different physical problems, as discussed after Eq. (2.3). The consideration here will be limited to the part of the asymptotic behavior that does not vanish as $t \to \pm\infty$. This is the physically interesting part. There are two possible types of contributions, from points of stationary phase and from end points of integration.
6.1. Points of stationary phase. From Eq. (5.13), the points of stationary phase are determined by
\[ \frac{\partial}{\partial\zeta} \left( -\zeta t - \tfrac12 \zeta^2 \right) = 0 \tag{6.1} \]
or
\[ \zeta = -t. \tag{6.2} \]
In Eq. (6.1), the additional phase due to the factor
\[ \prod_{j=2}^{N} |\zeta - b_j|^{ia_j^2} \]
is not included because the $a_j$ and $b_j$ are all fixed while $t \to \pm\infty$. Equation (6.2) implies that this point of stationary phase is relevant only to:
• $\psi_1^{(1)}(t)$ as $t \to \infty$, and
• $\psi_1^{(N)}(t)$ as $t \to -\infty$
in view of Eq. (5.13). For example, when j > 1, $\psi_j^{(1)}(t)$ as $t \to \infty$ and $\psi_j^{(N)}(t)$ as $t \to -\infty$ both behave as 1/t in absolute value so far as the contribution from this point of stationary phase (6.2) is concerned.
6.2. End points of integration. It is seen from Eq. (5.13) that, when $k \ge 2$, there is an extra factor of
\[ \frac{a_k}{\zeta - b_k} \tag{6.3} \]
associated with $\psi_k^{(n)}(t)$. But the range of integration for this $\psi_k^{(n)}(t)$ as given by Eq. (5.13) is from $b_n$ to $b_{n+1}$. Therefore, the contribution from these end points of integration can lead to a non-zero answer only when the index k appearing in the expression (6.3) agrees with either n or n + 1. In other words, there are non-zero contributions as $t \to \pm\infty$ only to $\psi_k^{(k-1)}(t)$ [i.e., n = k − 1] and $\psi_k^{(k)}(t)$ [i.e., n = k]. These particular components are given by
\[ \begin{aligned} \psi_k^{(k-1)}(t) &= \int_{b_{k-1}}^{b_k} d\zeta\, e^{-i\zeta t}\, e^{-i\zeta^2/2} \prod_{j=2}^{k-1} (\zeta - b_j)^{ia_j^2} \prod_{j=k}^{N} (b_j - \zeta)^{ia_j^2}\; \frac{-a_k}{b_k - \zeta}, \\ \psi_k^{(k)}(t) &= \int_{b_k}^{b_{k+1}} d\zeta\, e^{-i\zeta t}\, e^{-i\zeta^2/2} \prod_{j=2}^{k} (\zeta - b_j)^{ia_j^2} \prod_{j=k+1}^{N} (b_j - \zeta)^{ia_j^2}\; \frac{a_k}{\zeta - b_k}. \end{aligned} \tag{6.4} \]
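Equations (6.4) are exact. Since the important contributions come from the vicinity of $\zeta = b_k$, all the $\zeta$'s in Eq. (6.4) can be replaced approximately by $b_k$ except in the factors $e^{-i\zeta t}$, $b_k - \zeta$, and $\zeta - b_k$. Therefore
\[ \begin{aligned} \psi_k^{(k-1)}(t) &\sim e^{-ib_k^2/2} \prod_{\substack{j=2 \\ j \ne k}}^{N} |b_j - b_k|^{ia_j^2}\, (-a_k) \int^{b_k} d\zeta\, e^{-i\zeta t}\, (b_k - \zeta)^{-1+ia_k^2}, \\ \psi_k^{(k)}(t) &\sim e^{-ib_k^2/2} \prod_{\substack{j=2 \\ j \ne k}}^{N} |b_j - b_k|^{ia_j^2}\, a_k \int_{b_k} d\zeta\, e^{-i\zeta t}\, (\zeta - b_k)^{-1+ia_k^2}, \end{aligned} \tag{6.5} \]
or
\[ \begin{aligned} \psi_k^{(k-1)}(t) &\sim e^{-ib_k^2/2}\, e^{-ib_k t} \prod_{\substack{j=2 \\ j \ne k}}^{N} |b_j - b_k|^{ia_j^2}\, (-a_k) \int_0^{\infty} dx\, e^{ixt}\, x^{-1+ia_k^2}, \\ \psi_k^{(k)}(t) &\sim e^{-ib_k^2/2}\, e^{-ib_k t} \prod_{\substack{j=2 \\ j \ne k}}^{N} |b_j - b_k|^{ia_j^2}\, a_k \int_0^{\infty} dx\, e^{-ixt}\, x^{-1+ia_k^2}. \end{aligned} \tag{6.6} \]
These integrals can be evaluated exactly in terms of the gamma function.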
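The gamma-function evaluation of the integrals in Eq. (6.6) can be written out as follows. This is a sketch added for illustration rather than the authors' own derivation. It assumes the standard formula $\int_0^\infty dx\, e^{-px}\, x^{s-1} = \Gamma(s)\, p^{-s}$ with principal branches, and it treats the purely imaginary exponent $s = ia_k^2$ and purely imaginary $p$ as limits from $\mathrm{Re}\, s > 0$ and $\mathrm{Re}\, p > 0$, since the integrals do not converge absolutely at $x = 0$.

```latex
% For t > 0 and s = i a_k^2, understood as the continuation from Re s > 0:
\begin{align*}
\int_0^\infty dx\, e^{ixt}\, x^{s-1} &= \Gamma(s)\,(-it)^{-s}
  = e^{-\pi a_k^2/2}\, \Gamma(ia_k^2)\, t^{-ia_k^2}, \\
\int_0^\infty dx\, e^{-ixt}\, x^{s-1} &= \Gamma(s)\,(it)^{-s}
  = e^{\pi a_k^2/2}\, \Gamma(ia_k^2)\, t^{-ia_k^2}.
\end{align*}
```

For $t < 0$ the two results exchange roles, with t replaced by $|t|$, which is the origin of the interchanged factors $e^{\pm\pi a_k^2/2}$ between the large positive and large negative t asymptotics.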
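6.3. Results. Figure 1 shows which ones of the various $\psi_k^{(n)}(t)$ have non-vanishing behaviors for $t \to -\infty$ and $t \to \infty$. These non-vanishing behaviors are: For t positive and large,
\[ \psi_1^{(1)}(t) \sim \sqrt{2\pi}\, e^{-i\pi/4}\, t^{i\alpha}\, e^{it^2/2}, \tag{6.7} \]
\[ \psi_k^{(k-1)}(t) \sim e^{-ib_k^2/2}\, e^{-ib_k t} \prod_{\substack{j=2 \\ j \ne k}}^{N} |b_j - b_k|^{ia_j^2}\, (-a_k)\, e^{-\pi a_k^2/2}\, \Gamma(ia_k^2)\, t^{-ia_k^2}, \tag{6.8} \]
\[ \psi_k^{(k)}(t) \sim e^{-ib_k^2/2}\, e^{-ib_k t} \prod_{\substack{j=2 \\ j \ne k}}^{N} |b_j - b_k|^{ia_j^2}\, a_k\, e^{\pi a_k^2/2}\, \Gamma(ia_k^2)\, t^{-ia_k^2}; \tag{6.9} \]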
[Figure 1: table of the components $\psi_k^{(n)}(t)$, indexed by k and n, each running from 1 to N. The entry k = 1, n = 1 is marked ∞; the entry k = 1, n = N is marked −∞; crosses mark the entries n = k − 1 and n = k for 2 ≤ k ≤ N.]
Fig. 1. Table of non-vanishing components of $\psi_k^{(n)}(t)$ as $t \to \pm\infty$. A cross means that the component is non-vanishing both for $t \to -\infty$ and $t \to +\infty$; the symbol ∞ means for $t \to +\infty$ only, and −∞ means for $t \to -\infty$ only.
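while, for t negative and large,
\[ \psi_1^{(N)}(t) \sim \sqrt{2\pi}\, e^{-i\pi/4}\, |t|^{i\alpha}\, e^{it^2/2}, \tag{6.10} \]
\[ \psi_k^{(k-1)}(t) \sim e^{-ib_k^2/2}\, e^{-ib_k t} \prod_{\substack{j=2 \\ j \ne k}}^{N} |b_j - b_k|^{ia_j^2}\, (-a_k)\, e^{\pi a_k^2/2}\, \Gamma(ia_k^2)\, |t|^{-ia_k^2}, \tag{6.11} \]
\[ \psi_k^{(k)}(t) \sim e^{-ib_k^2/2}\, e^{-ib_k t} \prod_{\substack{j=2 \\ j \ne k}}^{N} |b_j - b_k|^{ia_j^2}\, a_k\, e^{-\pi a_k^2/2}\, \Gamma(ia_k^2)\, |t|^{-ia_k^2}. \tag{6.12} \]
All the other components approach zero as $t \to \infty$ and as $t \to -\infty$. In the asymptotic formulas (6.7) and (6.10), $\alpha$ is the quantity
\[ \alpha = \sum_{j=2}^{N} a_j^2. \tag{6.13} \]
In the formulas (6.8), (6.9), (6.11) and (6.12),
\[ 2 \le k \le N, \tag{6.14} \]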
where N as always is the number of neutrino species.
7. Discussion
When we started to investigate the MSW differential equations for three neutrino species in the case of the linear electron density, we were mostly interested in various possibilities of finding approximate solutions. Therefore, it was quite a surprise to us that these coupled differential equations can be solved exactly not only for three, but also for any number of neutrino species. In the work of Wolfenstein, Mikheyev, Smirnov [1] and others [2] on the sun taking into account two species of neutrinos, it has been found that most of the effect takes place in a fairly narrow region around a particular value of the electron density. Because of this, it is quite accurate to use a linear approximation to the electron density. For more than two species of neutrinos, it is no longer true in general that there is a narrow region for most of the activity. Nevertheless, there are a number of circumstances where this is true. However, the conditions for this to hold have not yet been studied systematically. This is one direction for future work. Under the assumption of the electron density being a linear function of distance, the exact, general solution of the MSW differential equation is given by Eqs. (5.12) and (5.13). This solution is in the form of a number of single integrals. When the number of neutrino species is more than 2, these integrals cannot be evaluated in terms of known functions, and therefore their properties need to be investigated. A small step in this direction has been taken in Sect. 6, where the asymptotic behaviors of these integrals have been evaluated for large distances but with all the a's and b's held fixed. It is believed that, in so far as this case of linear electron density is applicable to the physically interesting case of solar neutrinos, the asymptotic evaluation of Sect. 6 is far from being sufficient. It is more likely that not only the distance, but also some of the parameters, the a's and b's, are large. This is a second direction for future work.
Acknowledgements. We are greatly indebted to Dr. Conrad Newton for collaboration at the early stage of this work. One of us (TTW) thanks the Theory Division at CERN for its kind hospitality.
References
1. Wolfenstein, L.: Phys. Rev. D 17, 2369 (1978); ibid. 20, 2634 (1979); Mikheyev, S.P. and Smirnov, A.Yu.: Yad. Fiz. 42, 1441 (1985) [Sov. J. Nucl. Phys. 42, 913 (1985)]; Nuovo Cimento C 9, 17 (1986)
2. See, for example, Landau, L.D.: Phys. Z. Sowjetunion 2, 46 (1932); Zener, C.: Proc. Roy. Soc. (London) A137, 696 (1932); Haxton, W.C.: Phys. Rev. Lett. 57, 1271 (1986); Parke, S.J.: Phys. Rev. Lett. 57, 1275 (1986); Petcov, S.T.: Phys. Lett. B 191, 299 (1987)
3. Bateman Manuscript Project, Higher Transcendental Functions, Vol. II, A. Erdélyi, ed., New York: McGraw-Hill, 1953
4. Rosen, S.P.: In: Symmetries and Fundamental Interactions in Nuclei, edited by W. C. Haxton and E. M. Henley, Singapore: World Scientific, 1995, p. 251
5. Bateman Manuscript Project, Higher Transcendental Functions, Vol. I, A. Erdélyi, ed., New York: McGraw-Hill, 1953
Communicated by A. Jaffe, G. Mack and W. Zimmermann
Commun. Math. Phys. 219, 89 – 124 (2001)
The Elliptic Genus and Hidden Symmetry
Arthur Jaffe
Harvard University, Cambridge, MA 02138, USA
Received: 7 January 2000 / Accepted: 10 April 2000
Dedicated to the memory of Harry Lehmann
Work supported in part by the Department of Energy under Grant DE-FG02-94ER-25228. The author performed this research in part for the Clay Mathematics Institute.
Abstract: We study the elliptic genus (a partition function) in certain interacting, twist quantum field theories. Without twists, these theories have N = 2 supersymmetry. The twists provide a regularization, and also partially break the supersymmetry. In spite of the regularization, one can establish a homotopy of the elliptic genus in a coupling parameter. Our construction relies on a priori estimates and other methods from constructive quantum field theory; this mathematical underpinning allows us to justify evaluating the elliptic genus at one endpoint of the homotopy. We obtain a version of Witten’s proposed formula for the elliptic genus in terms of classical theta functions. As a consequence, the elliptic genus has a hidden SL(2, Z) symmetry characteristic of conformal theory, even though the underlying theory is not conformal.
1. Introduction
We study coupled complex bosonic and fermionic quantum fields on a two-dimensional space-time cylinder $S^1 \times \mathbb{R}$, where $S^1$ denotes a circle of length $\ell$. The equations are determined by a holomorphic polynomial in n variables called the superpotential,
\[ V : \mathbb{C}^n \to \mathbb{C}. \tag{1.1} \]
We denote the degree of this polynomial by $\tilde{n} = \text{degree}(V)$, and we assume
\[ \tilde{n} \ge 2. \tag{1.2} \]
The complex scalar fields $\varphi$ and the Dirac field $\psi$ have n and 2n components respectively,
\[ \varphi = \{\varphi_i\}, \ \text{where } 1 \le i \le n, \quad \text{and} \quad \psi = \{\psi_{\alpha,i}\}, \ \text{where } 1 \le \alpha \le 2, \ 1 \le i \le n. \tag{1.3} \]
In the literature one finds these equations called “Wess–Zumino equations” or sometimes “Landau–Ginzburg equations”. For cubic V, the equations reduce to the coupling of a non-linear boson field to the Dirac field by a Yukawa interaction. Hence one occasionally also refers to the equations arising from general V as “generalized Yukawa” equations. In [15, 12] we established the existence of solutions to the Wess–Zumino equations for massive fields. Recently we extended these results by proving the existence of solutions for the equations coupling massless, multicomponent, twist fields. The word “twist” refers to the fact that the fields are multi-valued; translation about the spatial circle results in each component of the field being multiplied by a phase. This phase is proportional to a real parameter $\phi$ that we choose in the interval $\phi \in (0, 2\pi]$, and the periodic case (no twist) corresponds to the limiting value $\phi = 0$. The operators in the field theory act on a Fock–Hilbert space $\mathcal{H}$ over the circle, with domains and other properties of the operators depending on $\phi$. For details of these definitions and results see [4, 10, 11]. We study a subset of polynomials V with properties detailed in Sect. 1.1. For these examples, the Hamiltonian H = H(V) is self-adjoint, it is bounded from below, and the heat kernel $e^{-\beta H}$ has a trace for all $\beta > 0$. This semigroup commutes with the translation group generated by the momentum operator P. There is also a U(1) group $U(\theta) = e^{i\theta J}$ of “twist” symmetries of H, where the generator J = J(V) depends on V; see Sect. 1.1. Denote the fermion number operator by $N^f$, and let $\Gamma = (-I)^{N^f}$ denote a $\mathbb{Z}_2$-grading. In our examples, all four operators H, P, J, and $\Gamma$ are self-adjoint and mutually commute. Hence the operator $A = e^{-i\theta J - i\sigma P}$ is unitary, and the operator $A e^{-\beta H} = e^{-i\theta J - i\sigma P - \beta H}$ has a trace for all $\beta > 0$. The elliptic genus is the partition function
\[ Z^V = \mathrm{Tr}_{\mathcal{H}} \left( \Gamma\, e^{-i\theta J - i\sigma P - \beta H} \right). \tag{1.4} \]
In a seminal paper [21], Witten suggested that one could calculate the elliptic genus of these examples in closed form. He gave a proposed formula (for $\phi = 0$) based on an argument that $Z^{\lambda V}$ should be independent of a parameter $\lambda$, and an “evaluation” of Z for V = 0. Kawai, Yamada, and Yang [17] elaborated on the algebraic aspects of Witten's work and made contact with related proposals of Vafa [19]. From a mathematical point of view, these insights are not definitive; the representation (1.4) is ill-defined if both V = 0 and $\phi = 0$, as $e^{-\beta H}$ does not have a trace, and the evaluation is only suggestive. Furthermore, establishing the existence and continuity of $Z^{\lambda V}$ requires extensive analysis, beyond the scope of earlier work. We introduce a regularized $Z^V$, with two regularizing parameters. The first regularization mollifies the zero-frequency modes, and enters through the non-zero twisting parameter $\phi$, as explained in Sect. 1.1. The second regularization mollifies the high-frequency modes. We denote the regularization parameter by $\Lambda$, and we discuss it in detail in Sect. 5 when we give an explicit expression for the supercharge as a densely defined sesqui-linear form on the Hilbert space $\mathcal{H}$. The regularized supercharges determine self-adjoint operators. The elliptic genus depends on the parameter $\phi$, and has a regular limit as $\phi \to 0$. (In fact, the genus continues holomorphically to all $\phi \in \mathbb{C}$.) The genus does not depend on the high-frequency mollifier $\Lambda$. Our goal in this paper is to find and exploit infra-red and ultra-violet regularizations that yield all the following:
• a self-adjoint Hamiltonian H that is bounded from below, with a trace class heat kernel,
• the two-parameter group of Lie symmetries of H generated by J and P, and
• a sufficient number of invariant supercharges to study and to compute the elliptic genus.
The method that we use in this paper has many advantages. We use twists to provide the infra-red regularization, and an ultra-violet regularization with the property of slow decrease at infinity to provide a non-local cutoff in the Hamiltonian. This regularized Hamiltonian has a form that allows us to establish stability and self-adjointness, as well as the existence of a trace for the heat kernel. This trace is uniform in the ultra-violet regularization parameter $\Lambda$, but diverges as the twist parameter $\phi \to 0$. This regularization leaves us with half the number of translation-invariant supercharges that one expects in a twist-free theory. These supercharges also commute with J. On the other hand, more straightforward regularizations cause difficulty in at least one of these areas, either producing a heat kernel with continuous spectrum, destroying the $e^{i\theta J}$ symmetry that one needs to study the elliptic genus, breaking all supersymmetries, making it impractical to establish stability, or producing error terms in the supersymmetry algebra that elude estimation. For example, introducing a bosonic mass, without a corresponding fermionic mass, provides an infra-red regularization compatible with a trace-class heat kernel and with J-symmetry; but all supersymmetries will be broken. We used this method in [9] to study the quantum-mechanics version of the present problem. As a result, the mathematical analysis became quite lengthy, even in the case of a finite number of degrees of freedom. On the other hand, introducing a mass in both the boson and the fermion destroys the J-symmetry of the Hamiltonian, as well as of all supercharges, requiring the analysis of other types of error estimates. Furthermore, a sharp upper momentum cutoff in the interaction produces non-localities that defy estimation.
One new ingredient in our program is to generalize the framework of constructive quantum field theory to cover twist fields. We carry this out in more detail in [10]. A second new ingredient involves identifying and studying cancellations that occur in the geometric invariants we study, and we give the details of these cancellations. We begin in Sect. 5 with operator estimates that justify representations of the invariants by invariants of a sequence of approximating problems. Related estimates show that we can exhibit cancellations in the difference quotients for the approximating problems. In order to estimate these cancellations, we pass from operator estimates to the study of traces in Sect. 6 and Sect. 7.

Twisting partially breaks supersymmetry, as explained in detail in [11]. Half the supercharges are translation and twist invariant, while the other half are not. The elliptic genus can be written as a function of the invariant charges. We restrict the $3n$ real twisting angles to lie on a line in $\mathbb{R}^{3n}$, parameterized by one angle $\phi$. Doing this yields one invariant supercharge that we denote $Q$, and which commutes both with translations and with the twist group. This supercharge satisfies $Q^2 = H + P$. A second supercharge (one that formally exists for $\phi = 0$) is neither translation nor twist invariant. But it is well-behaved in the sense that we can estimate the error terms in the supersymmetry algebra, and we use one of these estimates in this paper.$^1$

$^1$ Other estimates on the error terms in the supersymmetry algebra play a role if one wants to identify the limiting quantum field theory with full supersymmetry in the limit as the twists are removed. The elliptic genus turns out to be the boundary value of an entire function of $\phi \in \mathbb{C}$. In particular, the limit $\phi \to 0$ exists. Since the Hilbert space and operators we study depend on $\phi$, we define a limit of field theories as a limit of expectation values. With such a limit, as long as we keep a well-behaved, non-zero potential, we recover a standard quantum field theory as $\phi \to 0$.

In the end, we obtain the representation of the elliptic genus in terms of theta functions. The partition function then satisfies certain properties under transformations defined by
the modular group $SL(2, \mathbb{Z})$, acting on the complex space-time coordinate $\tau$ defined below. At first this seems surprising, as the theta functions and conformal symmetry are generally associated with zero mass fields or with conformal field theory. For this reason, we describe the $SL(2, \mathbb{Z})$ symmetry as a hidden aspect of these Wess–Zumino models.

Our results here build on work of Witten [21] and Connes [1], combining these ideas with results from our theory of twist quantum fields [4, 10] and our work in [6]. The elliptic genus is an index invariant, and as explained in Sect. IX of [6], it fits into the general framework of equivariant, non-commutative geometry (entire cyclic cohomology), characterized by the Dirac operator $Q$ on loop space. However, the elliptic genus is only one such invariant, from a whole family of invariants that result from the JLO-cocycle [13]. Therefore we suggest that it may be possible, within the framework of the Wess–Zumino examples that we study here, to find closed form expressions for some other invariants given in [6]. We formulated various representations for such invariants in [7, 9], and these might be useful in computation.

We prove here the representation for the elliptic genus $Z^V$. Our proof relies on a series of a priori estimates and other methods from constructive quantum field theory. In particular, we study $Z^{\lambda V}$, where $\lambda$ denotes a real parameter, and establish differentiability of $Z^{\lambda V}$ in $\lambda$ for $\lambda > 0$, and eventually that $Z^{\lambda V}$ is a constant function of $\lambda$. Another key estimate is to show that $Z^{\lambda V}$ does not jump at $\lambda = 0$. In fact, $Z^{\lambda V}$ is a priori Hölder continuous at $\lambda = 0$. We obtain any positive Hölder exponent $\alpha < 2/(\tilde n - 1)$, namely there is a constant $M$, depending on $\alpha$ and $V$ among other parameters, such that
\[ \left| Z^{\lambda V} - Z^{0} \right| \le M \lambda^{\alpha}, \quad \text{for } \lambda \in [0, 1]. \tag{1.5} \]
For potentials of large degree this exponent is small, but strictly positive. These two results combine with the vanishing of the derivative, to show that $Z^{\lambda V}$ is actually a constant function of $\lambda \in [0, 1]$.
We then compute $Z^V$ by evaluating $Z^0$.

1.1. Assumptions. Let us give more details. The real-time bosonic field $\varphi_{RT} = \{\varphi_{RT,i}\}$ has $n$ components designated $\varphi_{RT,i}$, where $1 \le i \le n$. The corresponding real-time fermionic field $\psi_{RT} = \{\psi_{RT,\alpha,i}\}$ has $2n$ components labeled by $\alpha, i$, with $i$ as before and $1 \le \alpha \le 2$. All these fields are complex, and so given $3n$ twist constants $\Omega = \{\Omega^b_i, \Omega^f_{\alpha,i}\}$, there is a one-parameter group $U(\theta)$ such that
\[ U(\theta)\,\varphi_{RT,i}\,U(\theta)^* = e^{i\Omega^b_i\theta}\,\varphi_{RT,i}, \quad \text{and} \quad U(\theta)\,\psi_{RT,\alpha,i}\,U(\theta)^* = e^{i\Omega^f_{\alpha,i}\theta}\,\psi_{RT,\alpha,i}. \tag{1.6} \]
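Also, the momentum operator implements spatial translations,
\[ e^{i\sigma P}\varphi_{RT,i}(x,t)\,e^{-i\sigma P} = \varphi_{RT,i}(x-\sigma,t), \quad \text{and} \quad e^{i\sigma P}\psi_{RT,\alpha,i}(x,t)\,e^{-i\sigma P} = \psi_{RT,\alpha,i}(x-\sigma,t). \tag{1.7} \]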
These properties uniquely determine each generator $J$ and $P$, up to an additive constant; we choose these constants in the normalization condition NC below. A twist field has the additional property that these two groups are related. Translation around the circle results in multiplying each component of the field by a phase. Thus there are $3n$ independent twisting angles $\chi = \{\chi^b_i, \chi^f_{\alpha,i}\}$ such that
\[ \varphi_{RT,i}(x+\ell,t) = e^{i\chi^b_i}\,\varphi_{RT,i}(x,t), \quad \text{and} \quad \psi_{RT,\alpha,i}(x+\ell,t) = e^{i\chi^f_{\alpha,i}}\,\psi_{RT,\alpha,i}(x,t). \tag{1.8} \]
Our superpotential $V$ is a holomorphic polynomial from $\mathbb{C}^n$ to $\mathbb{C}$, and it determines the coupling of $\varphi_{RT}$ with $\psi_{RT}$. Let $V_i$ denote the directional derivative of $V$, namely $V_i(z) = \partial V(z)/\partial z_i$. We study a holomorphic polynomial superpotential $V$ with two other basic properties: the potential is quasi-homogeneous (QH) and the potential satisfies certain elliptic bounds (EL). Furthermore, we assume that the twist constants and twisting angles satisfy certain twist relations (TR). Finally we assume certain normalization conditions (NC). We now briefly summarize these four hypotheses:

QH (Quasi-homogeneity) The superpotential function $V : \mathbb{C}^n \to \mathbb{C}$ is a holomorphic, quasi-homogeneous polynomial of degree $\tilde n$ at least two. This means that there are $n$ constants $\Omega_i$, called quasi-homogeneous weights, such that $0 < \Omega_i \le \frac{1}{2}$ and
\[ V(z) = \sum_{i=1}^{n} \Omega_i\, z_i\, \frac{\partial V(z)}{\partial z_i}. \tag{1.9} \]
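EL (Elliptic Property) Given $0 < \epsilon$, there exists $M < \infty$ such that the function $V$ satisfies
\[ |\partial^{\alpha} V| \le \epsilon\, |\partial V|^2 + M, \quad \text{and} \quad |z|^2 + |V| \le M \left( |\partial V|^2 + 1 \right). \tag{1.10} \]
Here $\partial^{\alpha} V$ denotes any multi-derivative of $V$, while $|z|$ denotes the magnitude of $z$, and $|\partial V|^2 = \sum_{i=1}^{n} |\partial V/\partial z_i|^2$ is the squared magnitude of the gradient of $V$.

TR (Twist Relations) Define the $3n$ twist constants $\Omega$ in $J$ as functions of the $n$ quasi-homogeneous weights $\Omega_i$,
\[ \Omega^b_i = \Omega_i, \quad \Omega^f_{1,i} = \Omega_i, \quad \text{and} \quad \Omega^f_{2,i} = 1 - \Omega_i. \tag{1.11} \]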
Choose the $3n$ twisting angles $\chi$ to be proportional to the twist constants $\Omega$, namely
\[ \chi^b_i = \Omega_i\,\phi, \quad \chi^f_{1,i} = \Omega_i\,\phi, \quad \text{and} \quad \chi^f_{2,i} = (1 - \Omega_i)\,\phi, \tag{1.12} \]
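where $\phi$ is a single twisting parameter that we take to lie in the interval $(0, \pi]$.

NC (Normalization Conditions) Choose the additive constants in the generators $J$ and $P$ so the Fock ground state $\Omega_{\rm vac}$ is an eigenvector with the following eigenvalues$^2$:
\[ P\,\Omega_{\rm vac} = 0, \quad \text{and} \quad J\,\Omega_{\rm vac} = -\frac{1}{2}\,\hat c\,\Omega_{\rm vac}, \quad \text{where} \quad \hat c = \sum_{i=1}^{n} \left( \Omega^f_{2,i} - \Omega^f_{1,i} \right) = \sum_{i=1}^{n} (1 - 2\Omega_i). \tag{1.13} \]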
This ensures that $J$ and $-J$ have the same spectrum. In [10] we establish

Proposition 1.1. Assume that $V$ is a holomorphic polynomial satisfying EL of Sect. 1.1.
(i) There exists a self-adjoint quantum field twist Hamiltonian $H(V)$ that is the norm-resolvent limit of a sequence of approximating Hamiltonians $H_\Lambda(V)$ defined in Sect. 4.
(ii) The self-adjoint semigroup $e^{-\beta H(V)}$ is trace class for $\beta > 0$.
(iii) Suppose in addition that $V$ is quasi-homogeneous, and that the twist constants $\Omega$ and the twisting angles $\chi$ satisfy TR. Then the Hamiltonians $H$ and $H_\Lambda$ both commute with the two-parameter unitary group $e^{i\theta J + i\sigma P}$ of space translations and twists, and they also commute with $\Gamma = (-I)^{N^f}$.

$^2$ The constant $\hat c$ recurs in these problems and is called the central charge. In fact $\hat c$ characterizes the weight of the elliptic genus as a modular function, as pointed out in [21].

We introduce some further notation. With $\Im(\tau)$ the imaginary part of $\tau$, let $\mathbb{H} = \{\tau : 0 < \Im(\tau)\}$ designate the upper half plane. We use the parameter $\sigma \in \mathbb{R}$, and the strictly positive parameters $\beta$, $\theta$, and $\phi$. We take
\[ \tau = \frac{\sigma + i\beta}{\ell} \in \mathbb{H}. \tag{1.14} \]
In terms of these parameters, define the variables
\[ q = e^{2\pi i\tau}, \ \text{so } |q| < 1, \quad y = e^{i\theta}, \ \text{so } |y| = 1, \quad \text{and} \quad z = e^{i\phi\tau}, \ \text{so } |z| < 1. \tag{1.15} \]
Consider partition functions as functions of $\tau$, $\theta$, and $\phi$, related to $q$, $y$, and $z$ as above. The Jacobi theta function of the first kind $\vartheta_1(\tau, \theta)$, defined for $\tau \in \mathbb{H}$, for $\theta \in \mathbb{C}$, with period 8 in $\tau$, and with period $4\pi$ in $\theta$, is given by
\[ \vartheta_1(\tau, \theta) = i\,q^{1/8} \left( y^{-1/2} - y^{1/2} \right) \prod_{n=1}^{\infty} (1 - q^n)(1 - q^n y)(1 - q^n y^{-1}). \tag{1.16} \]
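As a numerical sanity check, the product formula (1.16) can be evaluated directly. The sketch below is illustrative only: the function name `theta1`, the truncation at 200 factors, and the sample point are assumptions made for this example and do not come from the paper. It checks the two stated periods and the oddness in the second variable up to rounding error.

```python
import cmath
import math


def theta1(tau: complex, theta: complex, terms: int = 200) -> complex:
    """Evaluate the truncated product formula (1.16) with q = exp(2*pi*i*tau), y = exp(i*theta)."""
    if complex(tau).imag <= 0:
        raise ValueError("tau must lie in the upper half plane")
    q_eighth = cmath.exp(2j * math.pi * tau / 8)  # q^(1/8)
    y_half = cmath.exp(1j * theta / 2)            # y^(1/2)
    q = cmath.exp(2j * math.pi * tau)
    y = y_half * y_half
    product = 1 + 0j
    qn = 1 + 0j
    for _ in range(terms):
        qn *= q                                   # qn = q^n
        product *= (1 - qn) * (1 - qn * y) * (1 - qn / y)
    return 1j * q_eighth * (1 / y_half - y_half) * product


# Hypothetical sample point, chosen only for illustration.
tau, theta = 0.3 + 0.8j, 0.7
value = theta1(tau, theta)

odd_defect = abs(value + theta1(tau, -theta))                      # oddness in theta
tau_period_defect = abs(value - theta1(tau + 8, theta))            # period 8 in tau
theta_period_defect = abs(value - theta1(tau, theta + 4 * math.pi))  # period 4*pi in theta
print(odd_defect, tau_period_defect, theta_period_defect)
```

All three defects should be at the level of floating-point rounding, since each symmetry is exact for the infinite product and the truncation error is negligible for this value of $|q|$.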
This function is odd in the second variable, namely $\vartheta_1(\tau, \theta) = -\vartheta_1(\tau, -\theta)$. We follow the standard notation in Sect. 21.3 of Whittaker and Watson [20], with the exceptions noted above.

2. Main Results

We study the partition function
\[ Z^{\lambda V} = \mathrm{Tr}_{\mathcal H}\left( \Gamma\, e^{-i\theta J - i\sigma P - \beta H(\lambda V)} \right). \tag{2.1} \]
For $V = 0$, the heat kernel $e^{-\beta H_0}$ is also trace class, on account of the non-zero twisting parameter $\phi$. Given a non-zero potential $V$ satisfying QH and EL, we associate a family of potentials $\lambda V$, where $\lambda \in [0, 1]$, and also a generator $J$ of symmetry with parameters $\Omega$ specified by TR and normalization given by NC. The partition function $Z^0$ defined by $\lambda = 0$ has an implicit dependence on $V$, brought about through the choice of $J$. We devote the remainder of this paper to establishing the following theorem and its corollary.

Theorem 2.1. Assume the polynomial potential $V$ of degree $\tilde n \ge 2$ satisfies QH and EL of Sect. 1.1. Consider the self-adjoint Hamiltonian $H = H(\lambda V)$, as defined in Proposition 1.1 for $\lambda \ge 0$. Assume that the twist fields satisfy assumptions TR, and that $P$ and $J$ satisfy NC.
(i) The map
\[ \lambda \mapsto Z^{\lambda V}(\tau, \theta, \phi) \tag{2.2} \]
is differentiable in $\lambda$ for $\lambda > 0$.
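(ii) Choose $\alpha$ so that $0 \le \alpha < 2/(\tilde n - 1)$. There exists a constant $M = M(\alpha, V)$ such that for $\lambda \in [0, 1]$,
\[ \left| Z^{0} - Z^{\lambda V} \right| \le M \lambda^{\alpha}. \tag{2.3} \]

Corollary 2.2. The map (2.2) is constant for $0 \le \lambda \le 1$. The partition function $Z^V$ depends on $V$ only through its weights $\Omega$, and it equals
\[ Z^V(\tau, \theta, \phi) = z^{\hat c/2} \prod_{i=1}^{n} \frac{\vartheta_1(\tau, (1 - \Omega_i)(\theta - \phi\tau))}{\vartheta_1(\tau, \Omega_i(\theta - \phi\tau))}. \tag{2.4} \]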
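Remark. Corollary 2.2 shows that $Z^V(\tau, \theta, \phi)$ extends to a holomorphic function for $\tau \in \mathbb{H}$, $\theta \in \mathbb{C}$, and $\phi \in \mathbb{C}$. If $a, b, c, d \in \mathbb{Z}$, and $ad - bc = 1$, then $\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL(2, \mathbb{Z})$. Let
\[ \tau' = \frac{a\tau + b}{c\tau + d}, \quad \theta' = \frac{\theta}{c\tau + d}, \quad \text{and} \quad \phi' = \frac{\phi\tau}{a\tau + b}. \tag{2.5} \]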
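The analytic continuation of the partition function $Z^V(\tau, \theta, \phi)$ obeys the transformation law
\[ Z^V(\tau', \theta', \phi') = \exp\left( 2\pi i\, \frac{\hat c}{8}\, \frac{c\,(\theta - \phi\tau)^2}{c\tau + d} \right) Z^V(\tau, \theta, \phi). \tag{2.6} \]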
One obtains limiting values from the representation (2.4) as the parameters $\phi$, $\theta$, or $q$ vanish; these limits are not uniform and do not commute. Define the integer-valued index of the self-adjoint operator $Q$ with respect to the grading $\Gamma$ as the difference in the dimension of the kernel and the dimension of the cokernel of $Q$ as a map from the $+1$ eigenspace of $\Gamma$ to the $-1$ eigenspace of $\Gamma$. Denote this integer by $\mathrm{Index}(Q)$.

Corollary 2.3. We have the following limits.
(i) As $\phi$ tends to zero, the partition function converges to$^3$
\[ \lim_{\phi \to 0} Z^V = \prod_{i=1}^{n} \frac{\vartheta_1(\tau, (1 - \Omega_i)\theta)}{\vartheta_1(\tau, \Omega_i\theta)}. \tag{2.7} \]
As $\theta \to 0$, the partition function converges to
\[ \lim_{\theta \to 0} Z^V = z^{\hat c/2} \prod_{i=1}^{n} \frac{\vartheta_1(\tau, (1 - \Omega_i)\phi\tau)}{\vartheta_1(\tau, \Omega_i\phi\tau)}. \tag{2.8} \]

$^3$ The existence of a field theory for $\phi = 0$ requires special analysis. For $\lambda \ne 0$, this can be established as a consequence of the assumption EL for $V$. The field theory is the $\phi \to 0$ limit of the twist field theory, and the elliptic genus of the limiting theory is the limit (2.7). It agrees with the formula proposed in [17]. In the case $\lambda = 0$, the elliptic genus also has a $\phi \to 0$ limit as long as $0 < |\theta| < 2\pi$, but this limit is not the genus of a limiting theory.
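(ii) For $\theta \in (0, \pi)$, we may take the iterated limit as $\phi \to 0$ and then $q \to 0$ to obtain the equivariant, quantum-mechanical index studied in [9],
\[ \lim_{q \to 0} \lim_{\phi \to 0} Z^V = \prod_{i=1}^{n} \frac{\sin((1 - \Omega_i)\theta/2)}{\sin(\Omega_i\theta/2)}. \tag{2.9} \]
(iii) The integer-valued index $\mathrm{Index}(Q)$ can be obtained as
\[ \mathrm{Index}(Q) = \lim_{\theta \to 0} \lim_{q \to 0} \lim_{\phi \to 0} Z^V = \lim_{\theta \to 0} \lim_{\phi \to 0} Z^V = \lim_{\phi \to 0} \lim_{\theta \to 0} Z^V = \prod_{i=1}^{n} \left( \frac{1}{\Omega_i} - 1 \right). \tag{2.10} \]
(iv) On the other hand,
\[ \lim_{\theta \to 0} \lim_{q \to 0} Z^V = \lim_{q \to 0} \lim_{\theta \to 0} Z^V = 1. \tag{2.11} \]

Examples. For any $n$, if $V(z) = \sum_{i=1}^{n} z_i^{k_i}$, with $2 \le k_i \in \mathbb{Z}$, then $V$ satisfies QH and EL, and
\[ \Omega_i = \frac{1}{k_i}, \quad \hat c = \sum_{i=1}^{n} \frac{k_i - 2}{k_i}, \quad \text{and} \quad \mathrm{Index}(Q) = \prod_{i=1}^{n} (k_i - 1). \tag{2.12} \]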
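For $n = 2$, with $V(z) = z_1^{k_1} + z_1 z_2^{k_2}$, the potential also satisfies QH and EL. In this case,
\[ \Omega_1 = \frac{1}{k_1}, \quad \Omega_2 = \frac{k_1 - 1}{k_1 k_2}, \quad \hat c = 2\,\frac{(k_1 - 1)(k_2 - 1)}{k_1 k_2}, \quad \text{and} \quad \mathrm{Index}(Q) = k_1(k_2 - 1) + 1. \tag{2.13} \]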
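The values in (2.13) can be cross-checked against the general formulas (1.9), (1.13), and (2.10) with exact rational arithmetic. The following is a minimal sketch; the helper names, the choice $k_1 = 3$, $k_2 = 4$, and the test point are assumptions made for illustration and do not appear in the paper.

```python
from fractions import Fraction
from math import prod


def central_charge(weights):
    """c-hat = sum_i (1 - 2*Omega_i), as in (1.13)."""
    return sum(1 - 2 * w for w in weights)


def index_from_weights(weights):
    """Index(Q) = prod_i (1/Omega_i - 1), as in (2.10)."""
    return prod(1 / w - 1 for w in weights)


def euler_defect(V, grad_V, weights, z):
    """V(z) - sum_i Omega_i z_i dV/dz_i; this vanishes when (1.9) holds at z."""
    return V(z) - sum(w * zi * gi for w, zi, gi in zip(weights, z, grad_V(z)))


# Hypothetical exponents for the n = 2 example V(z) = z1^k1 + z1 * z2^k2.
k1, k2 = 3, 4
weights = [Fraction(1, k1), Fraction(k1 - 1, k1 * k2)]


def V(z):
    return z[0] ** k1 + z[0] * z[1] ** k2


def grad_V(z):
    return [k1 * z[0] ** (k1 - 1) + z[1] ** k2,
            k2 * z[0] * z[1] ** (k2 - 1)]


point = [Fraction(2, 3), Fraction(-5, 7)]  # arbitrary rational test point

qh_defect = euler_defect(V, grad_V, weights, point)
c_hat = central_charge(weights)
index = index_from_weights(weights)
print(qh_defect, c_hat, index)
```

With these exponents the quasi-homogeneity defect is exactly zero, and the computed $\hat c$ and index agree with the closed forms $2(k_1 - 1)(k_2 - 1)/(k_1 k_2)$ and $k_1(k_2 - 1) + 1$ of (2.13).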
Remark. The integer-valued index (2.10) is stable under a class of perturbations of $V$ that are not necessarily quasi-homogeneous. Briefly, we require that $V = V_1 + V_2$, where $V_1$ satisfies the hypotheses QH and EL above. While $V_2$ is a holomorphic polynomial, it is not necessarily quasi-homogeneous. In place of this, we assume that the perturbation $V_2$ is small with respect to $V_1$ in the following sense: given $0 < \epsilon$, there exists a constant $M_2 < \infty$ such that for any multi-derivative $\partial^{\alpha}$ of total degree $|\alpha| \ge 1$,
\[ |\partial^{\alpha} V_2| \le \epsilon\, |\partial V_1| + M_2. \tag{2.14} \]
3. Supercharge Forms

In this section, we define the supercharge $Q$ as a densely-defined, symmetric, sesquilinear form. In later sections, we consider a family of self-adjoint operators $Q_\Lambda$ that are mollifications of $Q$. The operators $Q_\Lambda$ have a norm resolvent limit, showing that the sesquilinear form $Q$ actually defines an unbounded operator. The definition of $Q$ does not require renormalization.
The Hilbert space of our example is a Fock space $\mathcal{H} = \mathcal{H}^b \otimes \mathcal{H}^f$. The bosonic Hilbert space $\mathcal{H}^b$ and the fermionic Hilbert space $\mathcal{H}^f$ are the symmetric and respectively the skew-symmetric tensor algebras over the one particle space $\mathcal{K}$. Here $\mathcal{K}$ is the direct sum of $2n$ copies of $L^2(S^1)$. The free Hamiltonian $H_0$, the momentum operator $P$, the total number operator $N = N^b + N^f$, and the twist generator $J = J(\Omega)$ are self-adjoint operators on $\mathcal{H}$. Here $N^b$ is the total bosonic number operator, and it acts on $\mathcal{H} = \mathcal{H}^b \otimes \mathcal{H}^f$ as $N^b \otimes I$, etc. The bosonic time-zero field $\varphi(x)$, its conjugate field $\pi(x)$, and the fermion time-zero fields $\psi(x)$ are operator valued distributions on $\mathcal{H}$. There is a dense linear subset $\mathcal{D} \subset \mathcal{H}$, obtained by replacing $L^2(S^1)$ by $C_0^\infty(S^1)$, and by taking vectors with a finite number of particles. The domain $\mathcal{D}$ provides a natural domain on which to define operators, and then to extend them by closure. Furthermore the operators $N$, $\Gamma$, $H_0$, $P$, and $e^{i\theta J}$ all map $\mathcal{D}$ into $\mathcal{D}$.

In addition to defining operators with the domain $\mathcal{D}$, we also define sesquilinear forms with domain $\mathcal{D} \times \mathcal{D}$. These are maps from pairs of vectors in $\mathcal{D}$ to $\mathbb{C}$ that are antilinear in the first vector and linear in the second vector. By polarization, each such form can be expressed as a sum of four diagonal elements, namely as a sum of four expectations in vectors in $\mathcal{D}$. On the domain $\mathcal{D} \times \mathcal{D}$, the components of the time-zero fields $\varphi_i(x)$, $\pi_i(x)$ and $\psi_{\alpha,i}(x)$, as well as normal-ordered polynomials in these components, are sesquilinear forms; see for example [2]. The values of these forms defined in this way are $C^\infty$ functions of $x$. We call them $C^\infty$ sesquilinear forms with the domain $\mathcal{D} \times \mathcal{D}$. Unless we specify otherwise, we use these domains and then extend the resulting operators or forms by closure. Ultimately our goal is to redefine operators and forms with domains determined by the range of a heat kernel of the Hamiltonian. Choose a potential function $V$ satisfying QH and EL.
This potential, as a function of the scalar complex boson field $\varphi(x)$, determines the energy density of our system as follows. Let $\psi(x)$ denote our Dirac field. Monomials in the components of the scalar field $\varphi_i(x)$ (or in the components of the adjoint field, but not simultaneously in the components of the field and of its adjoint) are normal ordered. Since the boson fields and the Dirac fields act on different factors in the tensor product, the product of a normal ordered boson field and a Dirac field is also normal ordered. Let $\lambda$ denote a real parameter lying in the interval $[0, 1]$. Define the normal ordered density $D(\lambda; x)$ as the $C^\infty$ sesquilinear form
\[ D(\lambda; x) = \sum_{j=1}^{n} \left[ i\psi_{1,j}(x) \left( \pi_j(x) - \partial_x \varphi_j(x)^* \right) + \lambda\, \psi_{2,j}(x)\, V_j(\varphi(x))^* \right], \tag{3.1} \]
with domain $\mathcal{D} \times \mathcal{D}$. The adjoint of a $C^\infty$ sesquilinear form is also a $C^\infty$ sesquilinear form. Define the sesquilinear form $D(\lambda; x)^*$ by polarization of the expectations $\langle f, D(\lambda; x)^* f \rangle = \langle f, D(\lambda; x) f \rangle^*$, for $f \in \mathcal{D}$. Define the supercharge density $Q(\lambda; x)$ as the sesquilinear form
\[ Q(\lambda; x) = D(\lambda; x) + D(\lambda; x)^*. \tag{3.2} \]
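The integrals of these densities over $S^1$ yield supercharges that are densely-defined, sesquilinear forms with the domain $\mathcal{D} \times \mathcal{D}$, namely
\[ D(\lambda) = \int_0^{\ell} D(\lambda; x)\, dx, \quad \text{and} \quad Q(\lambda) = \int_0^{\ell} Q(\lambda; x)\, dx = D(\lambda) + D(\lambda)^*, \tag{3.3} \]
where $D(\lambda)^* = \int_0^{\ell} D(\lambda; x)^*\, dx$. If we also assume the twist relations TR, then these forms have the properties, for all $\lambda \in [0, 1]$, all $\sigma$, and all $\theta$,
\[ \Gamma\, Q(\lambda) = -Q(\lambda)\, \Gamma, \quad e^{i\sigma P} Q(\lambda) = Q(\lambda)\, e^{i\sigma P}, \quad \text{and} \quad e^{i\theta J} Q(\lambda) = Q(\lambda)\, e^{i\theta J}. \tag{3.4} \]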
The supercharge that we denote $Q(\lambda)$, or sometimes $Q(\lambda V)$, is the one that we study most in this paper. Define $D_0 = D(0)$ and $Q_0 = Q(0)$, and define $D_I$ and $Q_I$ so that
\[ D(\lambda) = D_0 + \lambda D_I, \quad \text{and} \quad Q(\lambda) = Q_0 + \lambda Q_I. \tag{3.5} \]
The supercharge $Q_0$ extends by closure to a self-adjoint operator $Q_0$. (This means that the form obtained by closing $Q_0$ with the domain $\mathcal{D} \times \mathcal{D}$ uniquely determines a self-adjoint operator that we also name $Q_0$.) This operator is essentially self-adjoint on the domain $\mathcal{D}$, and also maps this domain into itself. Furthermore, $Q_0$ commutes with the operator $P$ and with the operator $J$ defined for any $\Omega$. The operator $Q_0$ anticommutes with $\Gamma = (-I)^{N^f}$. The square of the supercharge operator $Q_0$ has the property
\[ Q_0^2 = H_0 + P. \tag{3.6} \]
As $Q_0$ commutes with $P$, it follows from (3.6) that $Q_0$ commutes with $H_0$. Furthermore, as $P \le H_0$, we have the elementary inequality of forms,
\[ \pm Q_0 \le |Q_0| \le \sqrt{2}\, H_0^{1/2}. \tag{3.7} \]
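The step from (3.6) to (3.7) can be spelled out as follows. This is a sketch of the standard argument, assuming (as the text states) that $Q_0$, $H_0$, and $P$ commute, so that the inequalities can be read off on their joint spectrum.

```latex
\begin{align*}
Q_0^2 &= H_0 + P \;\le\; 2H_0
  && \text{by (3.6) and } P \le H_0, \\
|Q_0| &= \left(Q_0^2\right)^{1/2} \;\le\; (2H_0)^{1/2} = \sqrt{2}\,H_0^{1/2}
  && \text{since } t \mapsto t^{1/2} \text{ is operator monotone}, \\
\pm Q_0 &\le |Q_0|
  && \text{by the spectral theorem for the self-adjoint operator } Q_0 .
\end{align*}
```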
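We also require the second component of the supercharge. This is a sesquilinear form $Q_2(\lambda)$, defined as the integral $\int_0^{\ell} Q_2(\lambda; x)\, dx$ of the density $Q_2(\lambda; x) = D_2(\lambda; x) + D_2(\lambda; x)^*$. Here $D_2(\lambda; x)$ is the $C^\infty$ sesquilinear form
\[ D_2(\lambda; x) = \sum_{j=1}^{n} \left[ i\psi_{2,j}(x) \left( \pi_j(x)^* + \partial_x \varphi_j(x) \right) + \lambda\, \psi_{1,j}(x)\, V_j(\varphi(x)) \right] e^{-i\phi x/\ell}. \tag{3.8} \]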
As with the first component of the supercharge,
\[ \Gamma\, Q_2(\lambda) = -Q_2(\lambda)\, \Gamma, \tag{3.9} \]
and we have the decomposition
\[ Q_2(\lambda) = Q_{2,0} + \lambda Q_{2,I}, \tag{3.10} \]
where $Q_{2,0}$ and $Q_{2,I}$ are independent of $\lambda$. The form $Q_{2,0}$ uniquely determines a self-adjoint operator that we also denote as $Q_{2,0}$. However, unlike the case of the operator $Q_0$, the operator $Q_{2,0}$ is neither translation invariant nor twist invariant. Nevertheless, the square of $Q_{2,0}$ is invariant under both these groups. This square equals
\[ Q_{2,0}^2 = H_0 - P + \phi R, \tag{3.11} \]
where
\[ R = -\frac{2}{\ell} \sum_{i=1}^{n} \int_0^{\ell} {:}\psi_{2,i}(x)\, \psi_{2,i}(x)^*{:}\; dx. \tag{3.12} \]
Here $: \cdot :$ denotes normal ordering. An explicit representation of $\frac{\ell}{2} R$ can be given as a difference of two terms, each term being a sum of number operators for a subset of the fermionic modes, see [4]. This ensures, in particular, that
\[ \pm R \le \frac{2}{\ell}\, N, \tag{3.13} \]
where $N$ denotes the total number operator.

4. Approximating Supercharge Operators

In order to study the properties of the Hamiltonian, we introduce approximating families of supercharge forms $Q_\Lambda(\lambda)$ indexed by a parameter $\Lambda \in [0, \infty]$, and with the property $Q_0(\lambda) = Q_0$, and $Q_\infty(\lambda) = Q(\lambda)$. Let
\[ Q_\Lambda(\lambda) = Q_0 + \lambda Q_{I,\Lambda}, \quad \text{and} \quad Q_{2,\Lambda}(\lambda) = Q_{2,0} + \lambda Q_{2,I,\Lambda}. \tag{4.1} \]
In [4] we introduce a family of mollifier functions $\kappa^b_{i,\Lambda}$ and $\kappa^f_{\alpha,i,\Lambda}$ for the scalar and Dirac fields respectively. These mollifiers act by convolution, with a particular mollifier for each field component. The mollifiers have an index $\Lambda$ that specifies a momentum scale for the mollifier, and each mollifier converges to the Dirac measure $\delta$ as $\Lambda \to \infty$. We define mollified time-zero fields $\varphi_\Lambda(x)$ and $\psi_\Lambda(x)$ as sesquilinear forms with components
\[ \varphi_{i,\Lambda}(x) = \int_0^{\ell} \kappa^b_{i,\Lambda}(x - y)\, \varphi_i(y)\, dy, \quad \text{and} \quad \psi_{\alpha,i,\Lambda}(x) = \int_0^{\ell} \kappa^f_{\alpha,i,\Lambda}(x - y)\, \psi_{\alpha,i}(y)\, dy. \tag{4.2} \]
We apply the mollifiers only to the fields that occur in the terms $Q_I$ and $Q_{2,I}$. These terms are the interaction terms and are proportional to $\lambda$; in this way we mollify the boson and the fermion fields symmetrically. We construct the mollifiers from a single smooth, positive function $\tilde\kappa$ as follows. Let
\[ \tilde\kappa^{\rm sdi}(k) = \left( \frac{1}{1 + k^2} \right)^{\epsilon}, \tag{4.3} \]
where $0 < \epsilon \le \epsilon(V)$, and where we choose $\epsilon(V)$ sufficiently small. We choose for $\tilde\kappa(k)$ any smooth function such that
\[ \tilde\kappa^{\rm sdi}(k) \le \tilde\kappa(k) \le \tilde\kappa(0) = 1. \tag{4.4} \]
The lower bound on $\tilde\kappa(k)$ by the strictly positive function $\tilde\kappa^{\rm sdi}(k)$ is the property that we call slow decrease at infinity or sdi, and it ensures that $\tilde\kappa(k)$ is sufficiently close to being local, i.e. $\tilde\kappa(k) = 1$. We introduced this sdi property in [14] in order to establish stability for a purely-bosonic, bi-local interaction. In the supersymmetric case, the mollified Hamiltonian is bi-local and it is therefore natural to use an sdi mollifier. In [10] we establish stability based on these ideas. We represent the trace of the heat kernel of an
approximate Hamiltonian as a functional integral. The sdi property allows us to study a partition of unity of function space, and to show on each patch that the bi-local bosonic self-interaction can be bounded by a similar local self-interaction (with a coefficient that depends on the patch). The method is sufficiently robust that we can also estimate the non-local contributions from the fermionic determinant. We describe this phenomenon in more detail in Sect. 5.

We define the family of periodic mollifier functions indexed by $\Lambda$ by the Fourier series
\[ \kappa_\Lambda(x) = \frac{1}{\ell} \sum_{k \in \frac{2\pi}{\ell}\mathbb{Z}} \tilde\kappa(k/\Lambda)\, e^{-ikx}, \tag{4.5} \]
where the series for $\kappa_\Lambda$ converges in the sense of distributions. Each kernel $\kappa_\Lambda(x)$ satisfies
\[ \kappa_\Lambda(x + \ell) = \kappa_\Lambda(x). \tag{4.6} \]
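Denote by $\mathcal{S}$ the space of $C^\infty$, periodic functions on the circle. Let $\kappa_\Lambda$ denote the integral operator $\kappa_\Lambda : \mathcal{S} \to \mathcal{S}$ defined by the integral kernel $\kappa_\Lambda(x, y) = \kappa_\Lambda(x - y)$. In other words, $\kappa_\Lambda$ is the operator of convolution by $\kappa_\Lambda(x)$ on $\mathcal{S}$. Given the usual topology on these smooth functions, the adjoint $\kappa_\Lambda^+$ of the operator $\kappa_\Lambda$ acts on the dual space of distributions on the circle, defined by $(\kappa_\Lambda^+ \varphi)(f) = \varphi(\kappa_\Lambda f)$. This adjoint is an integral operator with the kernel $\kappa_\Lambda^+(x, y) = \kappa_\Lambda(y - x) = \kappa_\Lambda(x - y)$.

Consider the space $\mathcal{S}^b_i = e^{-i\Omega_i\phi x/\ell}\, \mathcal{S}$ of smooth functions on the circle satisfying the twist relation $f(x + \ell) = e^{-i\Omega_i\phi} f(x)$. These are the test functions for the components of the bosonic, time-zero, twist field. Likewise define the spaces $\mathcal{S}^f_{1,i} = \mathcal{S}^b_i$ and $\mathcal{S}^f_{2,i} = e^{-i(1 - \Omega_i)\phi x/\ell}\, \mathcal{S}$. For each $\Lambda$, define operators $\kappa^b_{i,\Lambda}$ acting on $\mathcal{S}^b_i$, operators $\kappa^f_{1,i,\Lambda}$ acting on $\mathcal{S}^f_{1,i}$, and operators $\kappa^f_{2,i,\Lambda}$ acting on $\mathcal{S}^f_{2,i}$. To simplify notation we designate the mollifiers acting on the dual space by $\kappa^b$, etc., without the adjoints, defining them as the convolution operators with kernels
\[ \kappa^b_{i,\Lambda}(x) = \kappa^f_{1,i,\Lambda}(x) = e^{i\Omega_i\phi x/\ell}\, \kappa_\Lambda(x), \quad \text{and} \quad \kappa^f_{2,i,\Lambda}(x) = e^{i(1 - \Omega_i)\phi x/\ell}\, \kappa_\Lambda(x). \tag{4.7} \]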
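The kernels satisfy $\kappa^b_{i,\Lambda}(x) = \overline{\kappa^b_{i,\Lambda}(-x)}$, and similarly $\kappa^f_{\alpha,i,\Lambda}(x) = \overline{\kappa^f_{\alpha,i,\Lambda}(-x)}$. They satisfy the twist relations
\[ \kappa^b_{i,\Lambda}(x + \ell) = \kappa^f_{1,i,\Lambda}(x + \ell) = e^{i\Omega_i\phi}\, \kappa^b_{i,\Lambda}(x), \quad \text{and} \quad \kappa^f_{2,i,\Lambda}(x + \ell) = e^{i(1 - \Omega_i)\phi}\, \kappa^f_{2,i,\Lambda}(x). \tag{4.8} \]
The operators $\kappa^b_{i,\Lambda}$ converge as $\Lambda \to \infty$ to the identity as operators on $\mathcal{S}^b_i$, and similarly for $\kappa^f_{\alpha,i,\Lambda}$ on $\mathcal{S}^f_{\alpha,i}$. Correspondingly, the kernels converge as distributions to a Dirac measure $\delta$,
\[ \lim_{\Lambda \to \infty} \kappa^b_{i,\Lambda}(x) = \lim_{\Lambda \to \infty} \kappa^f_{\alpha,i,\Lambda}(x) = \delta(x). \tag{4.9} \]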
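Also define $n$ families of spatially-dependent kernels $v_{i,\Lambda}(x)$ by the Fourier representations that converge in the sense of distributions,
\[ v_{i,\Lambda}(x) = e^{i(1 - \Omega_i)x\phi/\ell}\, \frac{1}{\ell} \sum_{k \in \frac{2\pi}{\ell}\mathbb{Z}} |\tilde\kappa(k/\Lambda)|^2\, e^{-ikx}. \tag{4.10} \]
In the sense of distributions,
\[ \lim_{\Lambda \to \infty} v_{i,\Lambda}(x) = \delta(x). \tag{4.11} \]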
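With these definitions, we establish in [10] that the forms $Q_\Lambda$ and $Q_{2,\Lambda}$ determine self-adjoint operators. The operators $Q_\Lambda$ have the properties
\[ \Gamma\, Q_\Lambda = -Q_\Lambda\, \Gamma, \quad \text{and} \quad Q_\Lambda\, e^{i\theta J + i\sigma P} = e^{i\theta J + i\sigma P}\, Q_\Lambda, \tag{4.12} \]
for all real $\theta$, $\sigma$. Furthermore the operators $Q_{2,\Lambda}$ satisfy
\[ \Gamma\, Q_{2,\Lambda} = -Q_{2,\Lambda}\, \Gamma. \tag{4.13} \]
But these operators $Q_{2,\Lambda}$ do not commute with the group $e^{i\theta J + i\sigma P}$. The operator $Q_\Lambda$ satisfies the normal relation of the first component of a supercharge and a Hamiltonian $H_\Lambda$,
\[ Q_\Lambda^2 = H_\Lambda + P. \tag{4.14} \]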
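Here the Hamiltonian $H_\Lambda$ is a perturbation of the free, twist-field Hamiltonian $H_0 = H_0^b + H_0^f$, and has the (non-local) form
\[ H_\Lambda = H_\Lambda(\lambda V) = H_0 + \sum_{i=1}^{n} \int_0^{\ell} dx \int_0^{\ell} dy\; V_i(\varphi_\Lambda(x))^*\, \lambda^2 v_{i,\Lambda}(x - y)\, V_i(\varphi_\Lambda(y)) + \lambda \left( Y_\Lambda + Y_\Lambda^* \right), \tag{4.15} \]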
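where the boson-fermion coupling $Y_\Lambda$ is the generalized Yukawa interaction
\[ Y_\Lambda = Y_\Lambda(V) = \sum_{i,i'=1}^{n} \int_0^{\ell} \psi_{1,i,\Lambda}(x)\, \psi_{2,i',\Lambda}(x)^*\, V_{ii'}(\varphi_\Lambda(x))\, dx. \tag{4.16} \]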
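On account of the positive definite nature of the kernel $\lambda^2 v_{i,\Lambda}(x)$, the bosonic part of $H_\Lambda$, namely
\[ H^b_\Lambda = H_0^b + \sum_{i=1}^{n} \int_0^{\ell} dx \int_0^{\ell} dy\; V_i(\varphi_\Lambda(x))^*\, \lambda^2 v_{i,\Lambda}(x - y)\, V_i(\varphi_\Lambda(y)), \tag{4.17} \]
is a sum of positive operators. In fact, the bosonic Hamiltonian can also be written
\[ H^b_\Lambda = H_0^b + \lambda^2 Q_{I,\Lambda}^2, \tag{4.18} \]
where we note the identity
\[ Q_{I,\Lambda}(V)^2 = \sum_{i=1}^{n} \int_0^{\ell} dx \int_0^{\ell} dy\; V_i(\varphi_\Lambda(x))^*\, v_{i,\Lambda}(x - y)\, V_i(\varphi_\Lambda(y)). \tag{4.19} \]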
The bosonic Hamiltonian $H^b_\Lambda$ is not normal ordered, and unlike $H_\Lambda(\lambda V)$, it has no limit as $\Lambda \to \infty$. The second family of approximate supercharges $Q_{2,\Lambda}$ is also related to $H_\Lambda$. However, their square has an error term in the standard supersymmetry algebra,
\[ Q_{2,\Lambda}^2 = H_\Lambda - P + \phi R, \tag{4.20} \]
where $R$ is the same operator that arose when analyzing the square of the free supercharge $Q_{2,0}$. The error term is given in (3.12). We use the following result from [10]:

Proposition 4.1. Assume the potential $V$ satisfies the assumptions QH, EL, assume the relations TR, and assume the definitions of $Q_\Lambda$, $Q_{2,\Lambda}$, $H_\Lambda$, and $P$ in Sect. 3 and Sect. 4. Then the forms $Q_\Lambda$, $Q_{2,\Lambda}$, $H_\Lambda$, and $P$ define self-adjoint operators on $\mathcal{H}$. The operators $H_\Lambda$ are bounded from below. The operators $Q_\Lambda$, $H_\Lambda$, and $P$ mutually commute, and they also commute with $J$.

5. Estimates on Operators

We consider here the basic properties of the Hamiltonian and the supercharges. This leads to consideration of estimates that involve implicit renormalization cancellations. These estimates depend only on the form of the underlying operators $H$, $P$, $N$, etc., and they lead to inequalities of operators or their norms. These estimates do not involve further cancellations of the sort that arise in the proof of estimates on partition functions, which we consider in the following section.
5.1. A Priori Estimates. The results here require certain a priori estimates involving the family of Hamiltonians $H_\Lambda = H_\Lambda(\lambda V)$, or the associated family of self-adjoint semigroups $e^{-\beta H_\Lambda(\lambda V)}$ that the $H_\Lambda(\lambda V)$ generate. The proofs of these estimates are lengthy, so we establish them as the central results in the companion paper [10]. These estimates are of utmost importance, so we give an overview by collecting together the necessary statements. Within the context of constructive quantum field theory, the estimates we assume are of a standard nature, though they have not been previously proved in the context of zero-mass (twist) fields that we use here. The operators occurring in this section have been introduced earlier in this paper. For more details about these definitions, see [4]; for analytic details, see [10].

• In case the following inequalities involve $\beta$, we take $\beta > 0$. We choose a given, fixed $\phi \in (0, \pi]$, and a given, fixed $V$ satisfying QH and EL of Sect. 1.1, and we define the Hamiltonian with the twist relations TR. The operators in question act on a Fock space $\mathcal{H} = \mathcal{H}(\Omega, \phi)$ depending on the parameters $\Omega$, $\phi$. We fix these parameters throughout the approximations in this paper. By convention, we generally do not note the dependence of constants on $\phi$, while we generally indicate the dependence on $V$.
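• We require certain estimates that are uniform in $\Lambda$, the parameter that designates the high-frequency mollifier. There exist positive, finite constants $M_1 = M_1(V)$, $M_2 = M_2(V)$, and $M = M(\beta, V)$ that are independent of $\Lambda$, and of $\lambda \in (0, 1]$, and such that
\[ N \le M_1 H_\Lambda(\lambda V) + M_2, \tag{5.1} \]
\[ H_0^{1/2} \le M_1 H_\Lambda(\lambda V) + M_2, \tag{5.2} \]
and
\[ \mathrm{Tr}_{\mathcal H}\left( e^{-\beta H_\Lambda(\lambda V)} \right) \le M(\beta). \tag{5.3} \]
There exists a self-adjoint $R(\beta) = R(\beta; \lambda)$, that is a semigroup in $\beta$ and that depends on the parameter $\lambda$, and such that
\[ \left\| e^{-\beta H_\Lambda(\lambda V)} - R(\beta) \right\| \to 0, \quad \text{as } \Lambda \to \infty, \tag{5.4} \]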
for each $\lambda \in (0, 1]$.

• We require the following estimate that is not uniform in $\Lambda$. Given $\Lambda$, there exist constants $M_1 = M_1(\Lambda, V)$ and $M_2 = M_2(\Lambda, V)$ such that for all $\lambda \in [0, 1]$,
\[ H_0 + \lambda^2 Q_{I,\Lambda}^2 \le M_1 H_\Lambda(\lambda V) + M_2. \tag{5.5} \]
Remarks. It is no loss of generality in (5.5) to increase the constants, if necessary, so in addition $1 \le M_1$ and $H_0 + \lambda^2 Q_{I,\Lambda}^2 + I \le M_1 H_\Lambda(\lambda V) + M_2$, so $H_0 + \lambda^2 Q_{I,\Lambda}^2 + I \le M_1 \left( H_\Lambda(\lambda V) + M_2 \right)$ as well. We make this assumption. From the norm convergence of semigroups (5.4), we infer that the limiting semigroup $R(\beta)$ has a self-adjoint generator $H = H(\lambda V)$. This defines the limiting Hamiltonian, and $R(\beta) = e^{-\beta H(\lambda V)}$. The uniform bound on the trace of $e^{-\beta H_\Lambda(\lambda V)}$ ensures that $H(\lambda V)$ is bounded from below,$^4$ and there exists a constant $M_3 = M_3(\lambda V)$ such that
\[ 0 \le H(\lambda V) + M_3. \tag{5.6} \]
For this limiting theory, there is a self-adjoint operator $Q = Q(\lambda V)$ that commutes with $P$ and that anticommutes with $\Gamma$, for which
\[ Q(\lambda V)^2 = H(\lambda V) + P. \tag{5.7} \]
We comment briefly on the mollifier $\tilde\kappa(k/\Lambda)$ that we employ, rather than a mollifier, for example, that completely eliminates Fourier modes with large $|k|$. In the latter case, the approximating Hamiltonians have not been proved to be bounded from below. We first studied the special advantages of a mollifier with slow decrease at infinity in [14], where we used this property, which expresses "almost-locality", to show that a class of bosonic Hamiltonians are bounded from below. We showed that the normal-ordered, purely-bosonic bi-local Hamiltonian $:H^b_\Lambda:$, with $H^b_\Lambda$ of the form (4.17), is bounded from below. Specifically, in [14] we treat the case with $n = 1$, with a massive (rather than a massless) unperturbed Hamiltonian $H^b_{0,m}$, and with no twist, $\phi = 0$.

$^4$ Without good control over convergence, such as the norm-convergence of semigroups that is the case here, a uniform bound like (5.1) or (5.2) on $H_\Lambda(\lambda V)$ is insufficient information to establish a lower bound on $H(\lambda V)$.

We outline the basic idea of our method in [10] to utilize the slowly decreasing property of the mollifier to prove the estimate (5.3). We begin by representing $\mathrm{Tr}_{\mathcal H}\left( e^{-\beta H_\Lambda} \right)$ as
a functional integral. This is the functional integral for the normal-ordered purely bosonic actions, multiplied by a regularized Fredholm determinant arising from the expectation in the fermionic modes. We insert an appropriate partition of unity $1 = \sum_{i=1}^{\infty} \chi_i$ into this integral, thus dividing the integration into a sum of integrals over patches. To obtain an effective bound, we need to replace the non-local bosonic part of the action by a related local term. We do this on each patch, using several things: the positive definite form of the interaction term, and the explicit form of the mollifier function $\tilde\kappa(k/\Lambda)$, in particular its monotonic property and its slowly decreasing character as a function of $|k|$. Using these properties, we bound the bi-local (bosonic) action from below on the patch $\chi_i$. We obtain a lower bound on the bi-local action with the non-local coupling constant $\lambda^2 v_{i,\Lambda}(x - y)$ by a similar local action, but with a local coupling constant of the form $\lambda^2 \tilde\kappa(i^d/\Lambda)^2\, \delta(x - y)$. Here $d + 1 = \tilde n$ denotes the degree of the polynomial $V$. The coefficient of $\lambda^2$ here is $\tilde\kappa(i^d/\Lambda)^2$, and this vanishes as $i \to \infty$ (namely at high momentum). In fact for constant $\Lambda$, we have the asymptotics $\lambda^2 \tilde\kappa(i^d/\Lambda)^2 \sim \lambda^2 i^{-2\epsilon}$. We use the local action to estimate further non-local perturbations of lower degree, as well as local perturbations of lower degree, on the patch $\chi_i$. This results in an additive, constant error term $r_i$ that has a magnitude $|r_i| \le o(1)\, \tilde\kappa(i^d/\Lambda)^{-2} \le o(1)\, i^{2\epsilon}$, which diverges as $i \to \infty$. The measure $|\chi_i|$ of the set $\chi_i$ satisfies $|\chi_i| \le e^{-i^{\epsilon'}}$, where $\epsilon' = \epsilon'(V) > 0$. This constant is small, and it depends only on the polynomial $V$. Therefore, fixing $V$, we can choose $\epsilon(V) \ge \epsilon > 0$ sufficiently small so that the product $e^{|r_i|}\, |\chi_i|$ is small for large $i$. When summed over $i$ it leads to a finite estimate on the integral. We also use the approximate local bosonic action to estimate the non-local terms arising from the regularized Fredholm determinants.
In this fashion we establish the uniform upper bound (5.3) on the trace of the family of approximating heat kernels. The method to establish the remaining bounds is similar.

5.2. Traces. In this section we collect a few general remarks that we use later. The Schatten $p$-norm of $T$ for operators on $\mathcal{H}$ is defined as
\[ \| T \|_p = \left( \mathrm{Tr}_{\mathcal H} \left( (T^* T)^{p/2} \right) \right)^{1/p}. \]
These norms satisfy Hölder's inequalities $\| T S \|_r \le \| T \|_p\, \| S \|_q$, where $r = pq/(p + q)$, and $1 \le r, p, q$. Furthermore, the trace norm $\| \cdot \|_1$ is also given by $\| T \|_1 = \sup_{\text{unitary } U} |\mathrm{Tr}_{\mathcal H}(U T)|$, see Sect. III of [18]. Thus
\[ |\mathrm{Tr}_{\mathcal H}(T)| \le \| T \|_1. \tag{5.8} \]
An operator $T$ with $\| T \|_1 < \infty$ is said to be trace class, and such operators have a basis-independent trace. A sufficient condition to ensure the cyclicity identity of the trace,
\[ \mathrm{Tr}_{\mathcal H}(AB) = \mathrm{Tr}_{\mathcal H}(BA), \tag{5.9} \]
is that $A$ is trace class and $B$ is bounded. One says that a self-adjoint semigroup $R(t)$ is $\Theta$-summable if there is a function $M(t) < \infty$ such that $\| R(t) \|_1 < M(t)$ for all $0 < t$. A family of semigroups $R_j(t)$ is uniformly $\Theta$-summable if
\[ \| R_j(t) \|_1 \le M(t), \tag{5.10} \]
for all $j$.
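The two inequalities just quoted, Hölder's inequality for Schatten norms and the bound (5.8), can be illustrated in finite dimensions. The sketch below is only a numerical illustration on random real 2×2 matrices, where singular values have a closed form; the helper names, the exponent pairs, and the random seed are assumptions chosen for this example.

```python
import math
import random


def matmul(a, b):
    """Product of two 2x2 real matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def singular_values(t):
    """Singular values of a 2x2 real matrix, from the eigenvalues of T^T T."""
    frob_sq = sum(x * x for row in t for x in row)       # Tr(T^T T)
    det = t[0][0] * t[1][1] - t[0][1] * t[1][0]          # det(T); det(T^T T) = det^2
    disc = math.sqrt(max(frob_sq * frob_sq - 4 * det * det, 0.0))
    return (math.sqrt((frob_sq + disc) / 2),
            math.sqrt(max((frob_sq - disc) / 2, 0.0)))


def schatten_norm(t, p):
    """Schatten p-norm: the l^p norm of the singular values."""
    return sum(s ** p for s in singular_values(t)) ** (1 / p)


rng = random.Random(0)


def random_matrix():
    return [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]


worst_holder = -math.inf  # largest ||TS||_r - ||T||_p ||S||_q observed
worst_trace = -math.inf   # largest |Tr T| - ||T||_1 observed
for _ in range(1000):
    t, s = random_matrix(), random_matrix()
    for p, q in [(2, 2), (3, 1.5), (4, 4)]:
        r = p * q / (p + q)
        gap = (schatten_norm(matmul(t, s), r)
               - schatten_norm(t, p) * schatten_norm(s, q))
        worst_holder = max(worst_holder, gap)
    worst_trace = max(worst_trace, abs(t[0][0] + t[1][1]) - schatten_norm(t, 1))

print(worst_holder, worst_trace)
```

Both recorded gaps should be non-positive up to rounding error, consistent with the inequalities holding for every sampled pair.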
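Proposition 5.1. Assume that $\{R_j(t)\}$ are a family of uniformly $\Theta$-summable semigroups on a Hilbert space $\mathcal{H}$, and assume that $\| R_j(t) - R(t) \| \to 0$ as $j \to \infty$. Then $R(t)$ is trace class, and $R_j(t)$ converges to $R(t)$ in trace norm,
\[ \lim_{j \to \infty} \| R_j(t) - R(t) \|_1 = 0, \quad \text{and} \quad \| R(t) \|_1 \le M(t), \quad \text{for all } 0 < t. \tag{5.11} \]
Furthermore, for any bounded operator $A$,
\[ \mathrm{Tr}_{\mathcal H}(A R(t)) = \lim_{j \to \infty} \mathrm{Tr}_{\mathcal H}\left( A R_j(t) \right), \quad \text{for all } 0 < t. \tag{5.12} \]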
Proof. Write
\[ R_j(t) - R_m(t) = R_j(t/2) \bigl( R_j(t/2) - R_m(t/2) \bigr) + \bigl( R_j(t/2) - R_m(t/2) \bigr) R_m(t/2). \tag{5.13} \]
Thus by Hölder's inequality,
\[ \|R_j(t) - R_m(t)\|_1 \le 2 M(t/2) \, \|R_j(t/2) - R_m(t/2)\| . \tag{5.14} \]
Hence $R_j(t)$ is a Cauchy sequence in the Schatten ideal of trace class operators. Thus there exists a trace-class limit $\tilde R(t)$, for which
\[ \|R_j(t) - \tilde R(t)\|_1 \to 0, \quad \text{and} \quad \|\tilde R(t)\|_1 \le M(t). \tag{5.15} \]
Since $\|R_j(t) - \tilde R(t)\| \le \|R_j(t) - \tilde R(t)\|_1$, we infer from (5.15) that $\tilde R(t) = R(t)$. Since $R(t)$ and $R_j(t)$ are trace class, if $A$ is bounded then $A R(t)$ and $A R_j(t)$ are also trace class. For a trace class operator $T$, we use (5.8) and Hölder's inequality to obtain
\[ \bigl| \mathrm{Tr}_{\mathcal H}\bigl( A R_j(t) - A R(t) \bigr) \bigr| \le \|A R_j(t) - A R(t)\|_1 \le \|A\| \, \|R_j(t) - R(t)\|_1 , \tag{5.16} \]
from which (5.12) follows. This completes the proof of the proposition. $\square$
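The step from (5.13) to (5.14) can be spelled out as follows. This is a sketch that assumes only Hölder's inequality with exponents $(1, \infty)$, the semigroup property $R(t) = R(t/2)^2$, and the uniform bound (5.10); the shorthand $\Delta$ is introduced here for illustration and is not notation from the original.

```latex
% Shorthand (not in the original): \Delta = R_j(t/2) - R_m(t/2).
\begin{align*}
  R_j(t) - R_m(t)
    &= R_j(t/2)^2 - R_m(t/2)^2
     = R_j(t/2)\,\Delta + \Delta\,R_m(t/2), \\
  \|R_j(t) - R_m(t)\|_1
    &\le \|R_j(t/2)\|_1\,\|\Delta\| + \|\Delta\|\,\|R_m(t/2)\|_1 \\
    &\le 2M(t/2)\,\|\Delta\| .
\end{align*}
% Since \|\Delta\| \le \|R_j(t/2) - R(t/2)\| + \|R(t/2) - R_m(t/2)\| \to 0,
% the sequence R_j(t) is Cauchy in trace norm.
```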
Lemma 5.2. Let $e^{-\beta H}$ be a self-adjoint, $\Theta$-summable semigroup, and let $A$ be a bounded operator on $\mathcal H$. Then the map
\[ (\sigma, \beta) \mapsto \mathrm{Tr}_{\mathcal H}\bigl( A \, e^{i\sigma P - \beta H} \bigr) \tag{5.17} \]
extends holomorphically in $\beta$ to all $i\beta \in \mathbb H$ (keeping $\sigma \in \mathbb R$ fixed). Suppose the unitary group $e^{i\sigma P}$ is a symmetry of $H$, and there exist constants $M_1, M_2 < \infty$ such that
\[ \pm P \le M_1 H + M_2 . \tag{5.18} \]
Then for $i\beta \in \mathbb H$, the map (5.17) extends analytically in $\sigma$ into a strip about the real axis of width proportional to $\Re(\beta)$, and otherwise only depending on $M_1$ and $M_2$.

Proof. $\Theta$-summability ensures that $H$ is bounded from below, so it is no loss of generality to add a constant to $H$ so that $H \ge I$. With this convention, we can replace (5.18) by the assumption that there exists a constant $M = M(M_1, M_2) < \infty$ such that
\[ \pm P \le M H . \tag{5.19} \]
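The passage from (5.18) to (5.19) is a one-line estimate once $H \ge I$. A minimal sketch, assuming that after the shift of $H$ the bound (5.18) holds with constants $M_1, M_2 \ge 0$ (which can be arranged by enlarging them):

```latex
% With H \ge I and M_2 \ge 0 we have M_2 I \le M_2 H, hence
\[
  \pm P \;\le\; M_1 H + M_2 I \;\le\; (M_1 + M_2)\, H ,
\]
% so (5.19) holds with, for instance, M = M_1 + M_2.
```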
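To prove analytic continuation in $\beta$, it is sufficient to establish a neighborhood of absolute convergence for the power series in $\epsilon$ of
\[ \mathrm{Tr}_{\mathcal H}\bigl( e^{-(\beta + \epsilon) H} \bigr) = \sum_{n=0}^{\infty} \frac{(-\epsilon)^n}{n!} \, \mathrm{Tr}_{\mathcal H}\bigl( H^n e^{-\beta H} \bigr), \]
starting initially with real $\beta$. Express $\beta$ in its real and imaginary parts $\beta = \Re(\beta) + i \Im(\beta)$. The operator $e^{-i \Im(\beta) H}$ is unitary, so for $0 < \Re(\beta)$, the operator $H^n e^{-\beta H/2}$ is bounded in norm by $(n/\Re(\beta))^n$. So using Hölder's inequality and (5.8), $|\mathrm{Tr}_{\mathcal H}( H^n e^{-\beta H} )| \le (n/\Re(\beta))^n \, \|e^{-\beta H/2}\|_1$. Then the exponential series converges absolutely for $|\epsilon| < \Re(\beta)/e$, yielding
\[ \sum_{n=0}^{\infty} \frac{|\epsilon|^n}{n!} \, \bigl| \mathrm{Tr}_{\mathcal H}\bigl( H^n e^{-\beta H} \bigr) \bigr| \le \bigl( 1 - |\epsilon| e/\Re(\beta) \bigr)^{-1} \, \bigl\| e^{-\Re(\beta) H/2} \bigr\|_1 < \infty, \tag{5.20} \]
as desired. We assume that $P$ and $H$ commute, so we simultaneously diagonalize these operators. We conclude from the spectral representation and (5.19) that $|P|^n \le M^n H^n$ for non-negative integers $n$. Proceeding as above, in the domain $|\epsilon| < \Re(\beta)/Me$ the power series in $\epsilon$ for $e^{i(\sigma + \epsilon) P} e^{-\beta H/2}$ converges absolutely in operator norm. Using Hölder’s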
inequality and (5.8), it then follows that Tr H eiσ P −βH is real analytic in σ for iβ ∈ H, and the proof is complete. " # Proposition 5.3. Assume quantum twist fields interact, with the nonlinearity determined by a polynomial V as specified above. Assume QH, EL, and TR of Sect. 1.1. Then there exist constants M1 and M2 , independent of , and such that ±P ≤ M1 H + M2 .
(5.21)
As a consequence, with a new constant M1 , Q2 ≤ M1 H + M2 .
(5.22)
Proof. The identity Q2 = H + P of (4.14) gives an upper bound on −P , −P ≤ H .
(5.23)
In order to obtain an upper bound on P , we take into account the details concerning the second component of the supercharge Q2, . From the relation (4.20) we infer that P ≤ H + φR.
(5.24)
Thus to establish an upper bound on P , it is sufficient to establish an upper bound on R in terms of H . We use the explicit form for R in (3.12), and the following comment; see [4] for details. It therefore follows that R satisfies the bound ±R ≤
2 N,
(5.25)
where N is the total number-of-particles operator. Using (5.1), we infer that P ≤ M1 H + M2 , with constants independent of . The bound (5.21) then follows, and from (4.14) we also infer (5.22). " #
5.3. Continuity of the Heat Kernel for $\lambda > 0$. We establish Lipschitz continuity, in the trace-norm topology, of the map
\[ \lambda \mapsto e^{-\beta H(\lambda V)} , \tag{5.26} \]
from the parameter $\lambda \in (0, 1]$ into the approximating heat kernels. Stated in detail, for each allowed $V$, each fixed $j < \infty$, and each fixed $\lambda \in (0, 1]$, and for $|\lambda - \lambda'|$ sufficiently small, there exists a constant $M$ such that
\[ \bigl\| e^{-\beta H(\lambda V)} - e^{-\beta H(\lambda' V)} \bigr\|_1 \le M \, |\lambda - \lambda'| . \tag{5.27} \]
Unfortunately, the estimates that we have proved for H (λV ) are insufficient to show that the map (5.26) is differentiable
in λ, and we do not know whether this is true. Also, we do not know whether Tr H e−H (λV ) is differentiable in λ. However, in the next
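subsection we show that the partition function $\mathrm{Tr}_{\mathcal H}\, e^{-H(\lambda V)}$ is differentiable in $\lambda$. We study the $\lambda$-derivative of the approximating family of heat kernels. For $\lambda, \lambda' \in (0, 1]$, and $\lambda \ne \lambda'$, define the difference quotient of $e^{-\beta H(\lambda V)}$ by
\[ D^\beta(\lambda, \lambda') = \frac{ e^{-\beta H(\lambda V)} - e^{-\beta H(\lambda' V)} }{ \lambda - \lambda' } , \quad \text{and let } \lambda_{\min} = \min\{\lambda, \lambda'\}. \tag{5.28} \]
In the following we let $R(\beta)$ denote the self-adjoint, trace-class semigroup generated by $H(\lambda V)$, and let $R'(\beta)$ denote the similar semigroup generated by $H(\lambda' V)$,
\[ R(\beta) = e^{-\beta H(\lambda V)} , \quad \text{and} \quad R'(\beta) = e^{-\beta H(\lambda' V)} . \tag{5.29} \]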
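Define the function $F^\beta(\lambda, \lambda', s)$ for $\lambda, \lambda', s \in (0, 1)$ for allowed potentials $V$ by
\[ F^\beta(\lambda, \lambda', s) = -\beta \, e^{-s\beta H(\lambda V)} \bigl( Q(\lambda V) \, Q_I(V) + Q_I(V) \, Q(\lambda' V) \bigr) \, e^{-(1-s)\beta H(\lambda' V)} . \tag{5.30} \]
We also write this as
\[ F^\beta(\lambda, \lambda', s) = -\beta \, R(s\beta) \bigl( Q(\lambda V) \, Q_I(V) + Q_I(V) \, Q(\lambda' V) \bigr) \, R'((1-s)\beta) . \tag{5.31} \]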
Note that the bound (5.3) ensures that $D^\beta(\lambda, \lambda')$ is trace class. By itself, this does not establish (5.27), as the trace norm may diverge as $\lambda' \to \lambda$. Also the bound (5.3), taken together with the bound (5.5), ensures that $F^\beta(\lambda, \lambda', s)$ is the sum of two trace-class operators. In order to verify that $F^\beta(\lambda, \lambda', s)$ is trace class, write each of the two heat kernels in (5.30) as the square of a heat kernel. The bound (5.3) shows that one of the heat kernel factors by itself is trace class. The second heat kernel multiplies $Q(\lambda V)$, $Q(\lambda' V)$, or $Q_I(V)$ (either on the left or on the right); the estimates (5.3) and (5.5) show that each such product is bounded. Since the product of a bounded operator with a trace-class operator is trace class, we infer that $F^\beta(\lambda, \lambda', s)$ is trace class. But we have no control over how the trace norm diverges (for fixed $\beta$) as $s$ approaches an endpoint of the interval. We now address these issues. Let us denote the degree of the polynomial $V$ by
\[ \tilde n = \mathrm{degree}(V), \quad \text{and note} \quad 2 \le \tilde n, \tag{5.32} \]
in order to satisfy the elliptic growth assumption EL of Sect. 1.1.
Theorem 5.4. Assume quantum twist fields interact, with the nonlinearity determined by a polynomial $V$ as specified above. Assume QH, EL, and TR of Sect. 1.1. Let $\beta > 0$. Let $j \in \mathbb Z_+$ be fixed. Then there exists a constant M = M(β, , V ) < ∞, such that the difference quotient $D^\beta(\lambda, \lambda')$ satisfies the trace-norm bound
\[ \bigl\| D^\beta(\lambda, \lambda') \bigr\|_1 \le M \, \lambda_{\min}^{-1 + 1/(\tilde n - 1)} , \tag{5.33} \]
for all $\lambda, \lambda' \in (0, 1]$. Lipschitz continuity (5.27) then follows.

Theorem 5.4 is contained in Proposition 5.5 and Corollary 5.7 that follow.

Proposition 5.5. Under the hypotheses of Theorem 5.4, there exists a constant M = M(β, , V ) < ∞ such that for $\lambda, \lambda' \in (0, 1]$, for $s \in (0, 1)$, and for $0 \le \alpha \le 1/(\tilde n - 1)$, the following holds:
(i) The operator $F^\beta(\lambda, \lambda', s)$ defined in (5.30) has a trace norm bounded by
\[ \bigl\| F^\beta(\lambda, \lambda', s) \bigr\|_1 \le M \, \lambda_{\min}^{-1+\alpha} \Bigl( s^{-1+\alpha/2} (1-s)^{-1/2} + s^{-1/2} (1-s)^{-1+\alpha/2} \Bigr) . \tag{5.34} \]
(ii) The map $s \mapsto F^\beta(\lambda, \lambda', s)$ is continuous in the trace-norm topology.

Lemma 5.6. There exists a constant $M_3 = M_3(j, V)$ such that the following bounds hold:
(i) For any $\alpha \in [0, 1]$, the interaction $Q_I(V)$ satisfies
\[ |Q_I(V)|^{2\alpha} \le M_3^{\alpha} (N + I)^{\alpha(\tilde n - 1)} \quad \text{and also} \quad |Q_I(V)|^{2\alpha} \le M_3^{\alpha} (H_0 + I)^{\alpha(\tilde n - 1)} . \tag{5.35} \]
Here $N$ is the total number operator and $\tilde n$ is the degree of $V$.
(ii) The generalized Yukawa interaction $Y + Y^* = \{Q_0, Q_I(V)\}$ satisfies
\[ \pm \{Q_0, Q_I(V)\} \le M_3 (H_0 + I)^{\tilde n - 1} . \tag{5.36} \]
(iii) For $0 \le \alpha \le (\tilde n - 1)^{-1}$, $0 < \lambda \le 1$, and $0 \le \lambda' \le 1$,
\[ \Bigl\| (H(\lambda V) + M_2)^{-(1-\alpha)/2} \, Q_I(V) \, \bigl( H(\lambda' V) + M_2 \bigr)^{-\alpha(\tilde n - 1)/2} \Bigr\| \le M_3 \, \lambda^{-1+\alpha} . \tag{5.37} \]
Proof. The estimates leading to this bound rely on the expansion of the bosonic field into its Fourier representation. The Fourier coefficients of the field are linear in creation and annihilation operators, multiplied by a kernel that is l 2 , by virtue of the mollifier κ, ˜ but with an l 2 norm depending on V and also on . These expansions and properties are given in detail in [4]. As a consequence, the operator Q2I, , that equals (4.19), has an expansion in terms of the fields that is a polynomial in creation and annihilation operators of degree 2(n˜ − 1). Each monomial in this expansion, expressed in terms of creation and annihilation operators, has an l 2 kernel. As a consequence, there is a constant M3 = M3 (j, V ), such that the purely bosonic interaction term QI, (V )2
˜ satisfies the upper bound, QI, (V )2 ≤ M3 (N + I )n−1 . This estimate is a standard property of monomials in creation and annihilation operators with l 2 -kernels; in the constructive quantum field theory literature this estimate is known as an Nτ -bound, and the contribution to the constant M3 from each monomial is the l 2 norm of the corresponding kernel, see [3]. Because the twisting angle is fixed and lies in the interval 0 < φ ≤ π, there is a constant M5 = M5 (φ) such that the commuting operators N and H0 satisfy N ≤ M5 H0 . Thus with a new choice of the constant M3 (and suppressing the dependence on φ, which is fixed) we obtain the bounds (5.35) with α = 1. The interpolation inequalities with 0 ≤ α ≤ 1 then follow from the Cauchy representation for the fractional powers of the resolvents, see Chapter V, Remark 3.50 of [16]. (ii) The bound
±{Q0 , QI, } ≤ Q20 + Q2I, = H0 + P + Q2I, ,
(5.38)
leads to the desired estimate with a new constant M3 . Use the elementary bound P ≤ H0 to estimate P , and use the bound (5.35) with α = 1 to estimate Q2I, . (iii) The bound (5.5) with 1 ≤ M1 and I ≤ H (λV ) + M2 ensures that λ2 QI, (V )2 ≤ M1 (H (λV ) + M2 ) .
(5.39)
As a consequence, the domain of the operator QI, (V ) = QI, (V ) contains all vectors in the domain of (H (λV ) + M2 )1/2 , for any λ > 0. It follows that we have an interpolation inequality: for any α ∈ [0, 1], (1−α)
λ2(1−α) |QI, (V )|2(1−α) ≤ M1
(H (λV ) + M2 )1−α .
(5.40)
The operator form of this inequality is (1−α)/2 −1+α λ . |QI, (V )|1−α (H (λV ) + M2 )−(1−α)/2 ≤ M1
(5.41)
Using part (i) of the lemma, we also have the operator interpolation inequality, α/2 ˜ (5.42) |QI, (V )|α (N + I )−α(n−1)/2 ≤ M1 . Note that the bound (5.42) does not involve λ. Combining (5.41) and (5.42), and the self-adjointness of QI, and H (λV ), we have
−α(n−1)/2 ˜ (H (λV ) + M2 )−(1−α)/2 QI, (V ) H (λ V ) + M2 ˜ ≤ (H (λV ) + M2 )−(1−α)/2 QI, (V ) (N + I )−α(n−1)/2 (5.43)
−α(n−1)/2 ˜ ˜ × (N + I )α(n−1)/2 H (λ V ) + M2 (1−α)/2 −1+α
≤ M1
λ
.
We obtain the interpolation bound on
−α(n−1)/2 ˜ α(n−1)/2 ˜ ˜ H (λ V ) + M2 , (N + I )α(n−1)/2 ≤ M1 using (5.1), as long as α(n˜ − 1) ≤ 1, which we assume. This completes the proof of the lemma. " #
Proof of Proposition 5.5. Expand $F^\beta(\lambda, \lambda', s)$ according to the definition (5.31). Write the first term $R(s\beta) \, Q(\lambda V) \, Q_I(V) \, R'((1-s)\beta)$ in $-F^\beta(\lambda, \lambda', s)$ as the following product of four bounded operators separated in braces,
\[ R(s\beta) \, Q(\lambda V) \, Q_I(V) \, R'((1-s)\beta) = \{ R(s\beta/4) \} \, \{ R(s\beta/4) \, Q(\lambda V) \, R(s\beta/4) \} \times \{ R(s\beta/4) \, Q_I(V) \, R'(3(1-s)\beta/4) \} \, \{ R'((1-s)\beta/4) \} . \tag{5.44} \]
Apply Hölder's inequality to bound the trace norm of this product of four terms, using the exponents $\frac{1}{s}$, $\infty$, $\infty$, $\frac{1}{1-s}$. Then
\[ \bigl\| R(s\beta) \, Q(\lambda V) \, Q_I(V) \, R'((1-s)\beta) \bigr\|_1 \le \| R(s\beta/4) \|_{1/s} \, \| R(s\beta/4) \, Q(\lambda V) \, R(s\beta/4) \| \times \| R(s\beta/4) \, Q_I(V) \, R'(3(1-s)\beta/4) \| \, \| R'((1-s)\beta/4) \|_{1/(1-s)} . \tag{5.45} \]
Bound the first and last factors on the right of (5.45) using the uniform estimate (5.3). Thus
\[ \| R(s\beta/4) \|_{1/s} \, \| R'((1-s)\beta/4) \|_{1/(1-s)} \le M(\beta/4) . \tag{5.46} \]
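The fractional Schatten exponents combine so cleanly because, for a positive self-adjoint semigroup, rescaling the time trades against the exponent. A sketch, assuming $R(t) = e^{-tH} \ge 0$ and that (5.3) bounds both $\|R(\beta/4)\|_1$ and $\|R'(\beta/4)\|_1$ by $M(\beta/4)$:

```latex
% For 0 < s < 1 and R(t) = e^{-tH} \ge 0, take p = 1/s:
\[
  \|R(s\beta/4)\|_{1/s}
    = \Bigl( \mathrm{Tr}\, R(s\beta/4)^{1/s} \Bigr)^{s}
    = \Bigl( \mathrm{Tr}\, R(\beta/4) \Bigr)^{s}
    = \|R(\beta/4)\|_1^{\,s} .
\]
% The same identity with s replaced by 1-s applies to R'. Therefore
\[
  \|R(s\beta/4)\|_{1/s}\,\|R'((1-s)\beta/4)\|_{1/(1-s)}
    \le M(\beta/4)^{s}\, M(\beta/4)^{1-s} = M(\beta/4) .
\]
% The Hölder exponents in (5.45) are admissible: s + 0 + 0 + (1-s) = 1.
```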
Use the spectral theorem to bound the second factor on the right of (5.45) uniformly in λ, by R(sβ/2) Q (λV ) R(sβ/4) ≤ O(1) s −1/2 , where the constant in O(1) depends on β and in (5.45) by
(5.47)
, but not on λ. Bound the third factor
R(sβ/4) QI, (V ) R ((1 − s)β/2) ≤ R(sβ/4) (H (λV ) + M2 )(1−α)/2
−α(n−1)/2 ˜ × (H (λV ) + M2 )−(1−α)/2 QI, (V ) H (λ V )
α(n−1)/2 ˜ × H (λ V ) R ((1 − s)β/2).
(5.48)
The first factor on the right of (5.48) is O(1) s −(1−α)/2 , again with the constant in O(1) depending on β and , but not on λ. From Lemma 5.6 we infer that the second factor in (5.48) is O(λ−1+α ), with the same proviso about O(1). Finally we estimate the third ˜ , with O(1) depending on and β. These three factor in (5.48) by O(1)(1 − s)−α(n−1)/2 bounds yield ˜ R(sβ/4) QI, (V ) R ((1 − s)β/2) ≤ O(1) λ−1+α s −(1−α)/2 (1 − s)−α(n−1)/2 . (5.49)
We combine the estimates (5.46), (5.47), and (5.49) to obtain ˜ R(sβ) Q (λV ) QI, (V ) R ((1 − s)β) ≤ O(1)λ−1+α s −1+α/2 (1 − s)−α(n−1)/2 , 1
(5.50) which is the first term in the bound (5.34). In order to bound the second term R(sβ) QI, (V ) Q (λ V ) R ((1 − s)β) in −F (λ, λ , s), repeat this procedure, but use the adjoint bounds. This yields the estimate R(sβ) QI, (V ) Q (λ V ) R ((1 − s)β) 1 ˜ ≤ O(1)(λ )−1+α s −α(n−1)/2 (1 − s)−1+α/2 .
(5.51)
Adding (5.50) and (5.51) completes the proof of the desired estimate (5.34). We now establish statement (ii). Let 0 < s < s < 1. Using the bound (5.1), we infer that there is a constant M suchthat, for λ ∈ [0, 1], the heat kernel e−sβH (λV ) is bounded in norm by M sβ . Therefore e−sβH (λV ) − e−s βH (λV ) ≤ 2M β . We can also bound the difference e−sβH (λV ) − e−s βH (λV ) = I − e−(s −s)βH (λV ) e−sβH (λV ) , (5.52) using the fundamental theorem of calculus, giving −sβH (λV ) − e−s βH (λV ) ≤ M β |s − s|/s. e Combining these two bounds on the difference, we infer that there is a new constant M > 1 such that for any 0 < - ≤ 1, -
−sβH (λV ) − e−s βH (λV ) ≤ M β |s − s|/s . (5.53) e The same bounds hold with H (λV ) replaced by H (λ V ). To simplify notation, let us denote H = H (λV ), H = H (λ V ), Q = Q (λV ), Q = Q (λ V ), and QI = QI, (V ). Now write the difference β
β
F (λ, λ , s) − F (λ, λ , s )
= βe−sβH Q QI + QI Q e−(1−s)βH − βe−s βH Q QI + QI Q e−(1−s )βH = β I − e−(s −s)βH e−sβH /2 F β/2 (λ, λ , s) e−(1−s)βH /2
+ βe−s βH /2 F β/2 (λ, λ , s ) e−(1−s )βH
/2
e−(s −s)βH − I .
(5.54)
From (5.53) and Hölder’s inequality, we obtain for any 0 < - ≤ 1, β β F (λ, λ , s) − F (λ, λ , s ) 1 ≤ βM β |s − s|- s −- F β/2 (λ, λ , s) + (1 − s )−- F β/2 (λ, λ , s ) . 1
1
(5.55)
Taking the bound (5.34) into account, we conclude in the case 0 < s < s < 1 that β the map s → F (λ, λ , s) is Hölder continuous in trace norm with an exponent - . A similar bound holds if 0 < s < s < 1, but with s and s interchanged, completing the proof of the proposition. " #
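Corollary 5.7. We have the following.
(i) Let $\eta > 0$. For any bounded operator $A$,
\[ \int_{\eta}^{1-\eta} \mathrm{Tr}_{\mathcal H}\bigl( A \, F^\beta(\lambda, \lambda', s) \bigr) \, ds = \mathrm{Tr}_{\mathcal H}\Bigl( A \int_{\eta}^{1-\eta} F^\beta(\lambda, \lambda', s) \, ds \Bigr) . \tag{5.56} \]
(ii) Let $\eta > 0$. The operators $\int_{\eta}^{1-\eta} F^\beta(\lambda, \lambda', s) \, ds$ converge in trace norm as $\eta \to 0$, defining $\int_0^1 F^\beta(\lambda, \lambda', s) \, ds$. Thus for any bounded $A$,
\[ \int_0^1 \mathrm{Tr}_{\mathcal H}\bigl( A \, F^\beta(\lambda, \lambda', s) \bigr) \, ds = \mathrm{Tr}_{\mathcal H}\Bigl( A \int_0^1 F^\beta(\lambda, \lambda', s) \, ds \Bigr) . \tag{5.57} \]
(iii) For $A = I$, this limit equals the difference quotient,
\[ \lim_{\eta \to 0} \Bigl\| D^\beta(\lambda, \lambda') - \int_{\eta}^{1-\eta} F^\beta(\lambda, \lambda', s) \, ds \Bigr\|_1 = 0, \tag{5.58} \]
and
\[ D^\beta(\lambda, \lambda') = \int_0^1 F^\beta(\lambda, \lambda', s) \, ds . \tag{5.59} \]
(iv) For any bounded operator $A$,
\[ \mathrm{Tr}_{\mathcal H}\bigl( A \, D^\beta(\lambda, \lambda') \bigr) = \int_0^1 \mathrm{Tr}_{\mathcal H}\bigl( A \, F^\beta(\lambda, \lambda', s) \bigr) \, ds, \tag{5.60} \]
yielding the estimate
\[ \bigl| \mathrm{Tr}_{\mathcal H}\bigl( A \, D^\beta(\lambda, \lambda') \bigr) \bigr| \le O\bigl( \lambda_{\min}^{-1+\alpha} \bigr) \, \|A\| . \tag{5.61} \]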
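The passage from the pointwise bound (5.34) to the integrated estimate (5.61) rests on an elementary Beta-function integral. A sketch, assuming $0 < \alpha \le 1/(\tilde n - 1)$ and writing $M$ for the constant of Proposition 5.5; the symbol $B$ denotes the Euler Beta function, which is not notation from the original.

```latex
% B(a,b) = \int_0^1 s^{a-1} (1-s)^{b-1}\, ds is finite exactly when a, b > 0.
\[
  \int_0^1 \Bigl( s^{-1+\alpha/2} (1-s)^{-1/2}
                 + s^{-1/2} (1-s)^{-1+\alpha/2} \Bigr)\, ds
  = 2\, B\!\left( \tfrac{\alpha}{2}, \tfrac12 \right) < \infty .
\]
% Combined with (5.34), (5.59), and |Tr(AT)| \le \|A\| \|T\|_1:
\[
  \bigl| \mathrm{Tr}_{\mathcal H}\bigl( A\, D^\beta(\lambda,\lambda') \bigr) \bigr|
  \le \|A\| \int_0^1 \| F^\beta(\lambda,\lambda',s) \|_1 \, ds
  \le 2\, M\, B\!\left( \tfrac{\alpha}{2}, \tfrac12 \right)
      \lambda_{\min}^{-1+\alpha}\, \|A\| .
\]
% Taking \alpha = 1/(\tilde n - 1) reproduces the exponent in (5.33).
```

Note that the endpoint $\alpha = 0$ is excluded here, since the integral diverges logarithmically in that case.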
Proof. Statement (i) of the corollary follows from the continuity of F (λ, λ , s) in s, namely Proposition 5.5.ii. Statement (ii) of the corollary is a consequence of the es timate of Proposition 5.5.i. We now verify (iii). Consider the domain Ds × D1−s = −(1−s)H (λ V ) −sH (λV ) H×e H. Both H (λV ) and H (λ V ) are sesqui-linear forms e on this domain. Furthermore, from Proposition 5.3 we infer that both H (λV ) = Q (λV )2 − P and H (λ V ) = Q (λ V )2 − P on this domain. Therefore, we have the identity of forms, H (λV ) − H (λ V ) = Q (λV )2 − Q (λ V )2
= Q (λV ) Q (λV ) − Q (λ V ) (5.62)
+ Q (λV ) − Q (λ V ) Q (λ V )
= λ − λ Q (λV )QI, (V ) + QI, (V )Q (λ V ) ) . Consequently, on H × H, on Ds × D1−s H (λV ) − H (λ V ) −(1−s)H e−sH (λV ) e λ − λ
(λ V )
β
= −F (λ, λ , s).
(5.63)
Part (ii) of the corollary asserts that this expression has an integral over s ∈ [η, 1 − η], that converges in trace norm as η → 0. Therefore
1
e−sH
(λV )
0
H (λ V ) − H (λV ) −(1−s)H e λ − λ
(λ V )
ds =
1
β
F (λ, λ , s)ds.
0
(5.64) But the left side of this identity is the difference quotient Dβ (λ, λ ), so we have identified the η → 0 limit. Finally, the same argument proves that
1−η β β lim A D (λ, λ ) − A F (λ, λ , s)ds (5.65) = 0, η→0
η
1
and the bounds of statement (iv) then follow from integrating the estimate of Proposition 5.5.i. " # 6. Estimates on Traces In this section, we estimate certain partition functions. Their proof involves further cancellations, that are not captured by the estimates studied in the previous section. The proofs here use the estimates on operators from the previous section, both to justify the existence of the objects studied here, as well as to estimate the quantities that arise after exhibiting cancellations in the trace that defines the partition functions. 6.1. Differentiability for λ > 0 . In this section we establish differentiability of ZλV as a function of λ. Choose the bounded operator A in Corollary 5.7.iv to be A = e−iθJ −iσ P . Then the corollary yields a representation for the difference quotient
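\[ \delta(\lambda, \lambda') = \frac{ Z^{\lambda V} - Z^{\lambda' V} }{ \lambda - \lambda' } = \lim_{\eta \to 0} \int_{\eta}^{1-\eta} \mathrm{Tr}_{\mathcal H}\bigl( A \, F^\beta(\lambda, \lambda', s) \bigr) \, ds = \int_0^1 \mathrm{Tr}_{\mathcal H}\bigl( A \, F^\beta(\lambda, \lambda', s) \bigr) \, ds . \tag{6.1} \]
Furthermore, the putative derivative of $Z^{\lambda V}$ also has an integral representation, namely
\[ \delta(\lambda, \lambda) = \lim_{\eta \to 0} \int_{\eta}^{1-\eta} \mathrm{Tr}_{\mathcal H}\bigl( A \, F^\beta(\lambda, \lambda, s) \bigr) \, ds = \int_0^1 \mathrm{Tr}_{\mathcal H}\bigl( A \, F^\beta(\lambda, \lambda, s) \bigr) \, ds . \tag{6.2} \]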
Although both representations (6.1) and (6.2) are well defined, we have not established that $\delta(\lambda, \lambda')$ has a limit as $\lambda' \to \lambda$, nor, if this limit exists, whether it equals $\delta(\lambda, \lambda)$. In this section we find the consequence of the smoothing provided by the specific operator $A$ in the partition function. This allows us to prove differentiability of the partition function, and actually the vanishing of its derivative.

Theorem 6.1. Under the conditions of Theorem 2.1, the map $\lambda \mapsto Z^{\lambda V}$ is a differentiable function of $\lambda$ for all $\lambda \in (0, 1]$. In fact, the derivative vanishes,
\[ \frac{\partial}{\partial \lambda} Z^{\lambda V} = \delta(\lambda, \lambda) = 0 . \]
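Proof. The bounds in the previous section show that $\delta(\lambda, \lambda')$ is bounded. To establish the theorem, we show that the difference quotient (6.1) actually converges to zero,
\[ \lim_{\lambda' \to \lambda} \delta(\lambda, \lambda') = 0, \tag{6.3} \]
for $\lambda > 0$. A similar argument shows that $\delta(\lambda, \lambda) = 0$. We claim that for each fixed $\lambda \in (0, 1]$, there exists a positive constant M = M(β, , λ, V ), not depending on $\lambda'$, such that
\[ \bigl| \mathrm{Tr}_{\mathcal H}\bigl( A \, F^\beta(\lambda, \lambda', s) \bigr) \bigr| \le M \, |\lambda - \lambda'| \, s^{-1/2} (1-s)^{-1/2} , \tag{6.4} \]
whenever $\lambda' \in (0, 1]$ lies in the neighborhood of $\lambda$ defined by $B_\lambda = \{ \lambda' : |\lambda' - \lambda| \le \tfrac12 \lambda \}$. Let us assume this bound. As a consequence,
\[ \Bigl| \int_0^1 \mathrm{Tr}_{\mathcal H}\bigl( A \, F^\beta(\lambda, \lambda', s) \bigr) \, ds \Bigr| = \Bigl| \lim_{\eta \to 0} \int_{\eta}^{1-\eta} \mathrm{Tr}_{\mathcal H}\bigl( A \, F^\beta(\lambda, \lambda', s) \bigr) \, ds \Bigr| \le \pi M \, |\lambda - \lambda'| , \tag{6.5} \]
for $\lambda, \lambda' \in (0, 1]$ and $\lambda' \in B_\lambda$. Thus according to the representation (6.1),
\[ |\delta(\lambda, \lambda')| \le \pi M \, |\lambda - \lambda'| , \tag{6.6} \]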
for λ, λ ∈ (0, 1] and λ ∈ Bλ , and the derivative of ZλV vanishes as claimed. Thus we have reduced the proof of the theorem to the proof of (6.4), which we now establish. We use the notation in the proof of Proposition 5.5. Write the density β Tr H A F (λ, λ , s) for the difference quotient as
β Tr H A F (λ, λ , s) = β Tr H A R(sβ) Q QI R ((1 − s)β) (6.7)
+ β Tr H A R(sβ) QI Q R ((1 − s)β) . The operator A commutes with R and with R and it anticommutes with Q, Q , and QI . Also, we have seen that R(sβ) QI and Q R ((1 − s)β) are both trace class. Therefore using cyclicity of the trace,
β Tr H A F (λ, λ , s) = − β Tr H A QI R ((1 − s)β) Q R(sβ) (6.8)
+ β Tr H A R(sβ) QI Q R ((1 − s)β) . The bound (5.5) assures that the range of R is in the domain of both Q0 and QI , and hence in the domain of both Q and Q . Thus in the first term, we can write Q = Q + Q − Q = Q + λ − λ QI , to yield
β Tr H A F (λ, λ , s) = − β Tr H A R(sβ) QI R ((1 − s)β) Q
+ β λ − λ Tr H A QI R ((1 − s)β) QI R(sβ)
+ β Tr H A R(sβ) QI Q R ((1 − s)β)
= − β Tr H A R(sβ) QI Q R ((1 − s)β)
+ β λ − λ Tr H A QI R ((1 − s)β) QI R(sβ) (6.9)
+ β Tr H A R(sβ) QI Q R ((1 − s)β)
= β λ − λ Tr H A QI R ((1 − s)β) QI R(sβ)
= β λ − λ Tr H A R(sβ/2) QI R ((1 − s)β) QI R(sβ/2) .
We estimate (6.9) using Hölder’s inequality, obtaining β Tr H A F (λ, λ , s) ≤ β |λ − λ | A R(sβ/4) s R(sβ/4) QI R ((1 − s)β/4) × R ((1 − s)β/4)1−s R ((1 − s)β/2) QI R(sβ/2) 2 ≤ β |λ − λ | M(β/4) R(sβ/4) QI R ((1 − s)β/4) .
(6.10)
The constant M(β/4) in the last term is the constant in (5.3), and the bound on QI involves the self- adjointness of QI , R, and R . From (5.5) we infer that with a new constant M4 = M4 (β, , V ), β (6.11) Tr H A F (λ, λ , s) ≤ β |λ − λ | M4 λ−1 λ −1 s −1/2 (1 − s)−1/2 . On the set Bλ , we have λ = λ + (λ − λ) ≥ 21 λ. Thus taking M(β, , λ, V ) = 2β M4 (β, , V )λ−2 , we establish (6.4), and complete the proof of the theorem. " # 6.2. Hölder Continuity at λ = 0. In Theorem 6.1, we found that the partition function ZλV is a constant function of λ for all λ ∈ (0, 1]. At the λ = 0 endpoint of the interval, H (λV ) = H0 . If both 0 < φ ≤ π and 0 < β, then the heat kernel e−βH0 is trace class, and the partition function Z0 = Z0 is well defined. However, ZλV might have a jump discontinuity at λ = 0, so it may not be the case that ZλV = Z0 . It is important to demonstrate the continuity of ZλV , and we do so by establishing Hölder continuity at λ = 0 with an exponent depending on the degree n˜ = n(V ˜ ) of the polynomial potential V. Theorem 6.2. Assume the hypotheses of Theorem 2.1. Let 0 ≤ α < 2/(n˜ − 1). Then there exists a constant M = M(α, β, , V ) such that the partition function ZλV satisfies λV Z − Z0 ≤ M λα , for all 0 < λ ≤ 1.
(6.12)
Corollary 6.3. Under the hypotheses of Theorem 2.1, the functions ZλV are independent of and of λ, and ZλV (τ, θ, φ) = Z0 (τ, θ, φ), for all λ ∈ C.
(6.13)
Proof. The corollary for λ ∈ [0, 1] is an immediate consequence of the theorem. Substituting γ V for V , with γ ∈ C, we also have an allowed potential, and also (λγ )V λ(γ V ) = Z = Z0 . So the identity ZλV (τ, θ, φ) = Z0 (τ, θ, φ) extends to all Z λ ∈ C. The first step in the proof of the theorem is to establish a representation for the difference Z0 − ZλV , that is similar to the representation in the previous section for the difference quotient (5.60), except that it is convergent at the λ = 0 endpoint of the interval.
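Lemma 6.4. There are constants $M_1 = M_1(j, V) < \infty$ and $M_2 = M_2(j, V) < \infty$ such that
\[ I \le H(\lambda V) + M_2 \le M_1 (H_0 + I)^{\tilde n - 1} , \tag{6.14} \]
and for all $0 \le \alpha \le 1$,
\[ \bigl\| (H + M_2)^{\alpha/2} (H_0 + I)^{-\alpha(\tilde n - 1)/2} \bigr\| \le M_1^{\alpha/2} . \tag{6.15} \]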
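The bound (6.15) follows from (6.14) by operator monotonicity of fractional powers. A sketch, assuming the Löwner–Heinz inequality ($0 \le A \le B$ implies $A^\alpha \le B^\alpha$ for $0 \le \alpha \le 1$); the shorthand $K$ is introduced here for illustration and is not notation from the original.

```latex
% Shorthand (not in the original): K = (H_0 + I)^{\tilde n - 1}.
% Löwner--Heinz applied to 0 \le H + M_2 \le M_1 K, for 0 \le \alpha \le 1:
\[
  (H + M_2)^{\alpha} \le M_1^{\alpha}\, K^{\alpha} .
\]
% Conjugating by K^{-\alpha/2} and using \|T\|^2 = \|T^* T\| with
% T = (H + M_2)^{\alpha/2} K^{-\alpha/2}:
\[
  \bigl\| (H + M_2)^{\alpha/2} K^{-\alpha/2} \bigr\|^2
  = \bigl\| K^{-\alpha/2} (H + M_2)^{\alpha} K^{-\alpha/2} \bigr\|
  \le M_1^{\alpha} ,
\]
% which is (6.15) after taking square roots.
```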
Proof. Write $H(\lambda V) = H_0 + \lambda^2 Q_I(V)^2 + \lambda (Y + Y^*)$, where $Y + Y^* = \{Q_0, Q_I\}$. Since $\tilde n \ge 2$, the upper bound (6.14) holds trivially for $\lambda = 0$. The bound of Lemma 5.6.i ensures that $Q_I^2 \le M_3 (H_0 + I)^{\tilde n - 1}$. Finally, as a consequence of Lemma 5.6.ii, the term $Y + Y^*$ is bounded from above by $M_3 (H_0 + I)^{\tilde n - 1}$. Taken together, these bounds establish (6.14). We choose $M_2$ sufficiently large so that $I \le H(\lambda V) + M_2$. The lemma then follows from the interpolation inequality $(H + M_2)^{\alpha} \le M_1^{\alpha} (H_0 + I)^{\alpha(\tilde n - 1)}$, valid for $0 \le \alpha \le 1$.

For $s \in (0, 1)$, define the operator-valued function
\[ f^\beta(\lambda, s) = e^{-s\beta H(\lambda V)} \bigl( H_0 - H(\lambda V) \bigr) e^{-(1-s)\beta H_0} , \quad \text{for } s \in (0, 1). \tag{6.16} \]
Lemma 6.5. Under the hypotheses of the theorem, and for s ∈ (0, 1), (i) Both e−sβH (λV ) H0 e−(1−s)βH0 and e−sβH (λV ) H (λV ) e−(1−s)βH0 are trace class. β (ii) There exists a constant M6 = M6 (β, , V ), such that the function f (λ, s) has a trace-norm bounded by β ˜ (1 − s)−1/2 . (6.17) f (λ, s) ≤ M6 s −1+1/2(n−1) 1
β
(iii) The map s → f (λ, s) is continuous in the trace-norm topology. β (iv) The integral of f exists, and for any bounded linear transformation A,
1 0
β Tr H A f (λ, s) ds = lim
1−η
η→0 η
= Tr H = Tr H
β Tr H A f (λ, s) ds
lim
η→0 η 1
1−η
β
A f (λ, s)ds
(6.18)
β A f (λ, s)ds .
0
(v) The difference ZλV − Z0 has the representation, ZλV − Z0 = β
1 0
where A = e−iθJ −iσ P .
β Tr H A f (λ, s) ds,
(6.19)
Proof. Write −sβH H0 e−(1−s)βH0 ≤ e−sβH /2 −1 e−sβH /2 H0 e−(1−s)βH0 /2 e 1 s × e−(1−s)βH0 /2 . (6.20) −1 (1−s)
Hence using (5.3) and also (5.5), we conclude that there is a constant M6 = M6 (β, , V ) such that −sβH H0 e−(1−s)βH0 ≤ M6 s −1/2 (1 − s)−1/2 , (6.21) e 1
so
e−sβH
e−(1−s)βH0
H0 is trace class. With a possibly larger constant M6 (β, , V ), we also have the bound −sβH ˜ H e−(1−s)βH0 ≤ e−sβH /2 −1 e−sβH /2 (H + M2 )1−1/2(n−1) e 1 s ˜ × (H + M2 )1/2(n−1) (H0 + I )−1/2 1−s × (H0 + I )1/2 e−(1−s)βH0 /2 e−βH0 /2 ˜ ≤ M6 s −1+1/2(n−1) (1 − s)−1/2 ,
(6.22)
where we use the bound of Lemma 6.4 to bound the third term of (6.22), as well as (5.3) to estimate the product of the first and last terms. This proves that e−sβH H e−(1−s)βH0 is trace class. As n˜ ≥ 2, the two bounds (6.21) and (6.22) taken together yield the proof of (i–ii). Let us use the notation R(s) = e−sβH (λV ) and R0 (s) = e−sβH0 . In order to establish (iii), take s < s and consider the difference β β f (λ, s) − f (λ, s ) 1
≤ R(s) − R(s ) (H0 − H ) R0 (1 − s)1
+ R(s ) (H0 − H ) R0 (1 − s) − R0 (1 − s ) 1
= I − R(s − s) R(s/2) R(s/2) (H0 − H ) R0 (1 − s)1 (6.23)
+ R(s ) (H0 − H ) R0 ((1 − s )/2) R0 ((1 − s )/2) R0 (s − s) − I 1 . We bound this using Hölder’s inequality by β β f (λ, s) − f (λ, s )
1 ≤ I − R(s − s) R(s/2) R(s/2) (H0 − H ) R0 ((1 − s)/2)
1
× R0 ((1 − s)/2) + R(s /2) R(s /2) (H0 − H ) R0 ((1 − s )/2)1
× R0 ((1 − s )/2) R0 (s − s) − I
β/2 ≤ I − R(s − s) R(s/2) R0 ((1 − s)/2) f (λ, s) 1
β/2 + I − R0 (s − s) R0 ((1 − s )/2) R(s /2) f (λ, s ) . 1
(6.24)
1 Use the bound (5.53), with 0 < - < 2(n−1) , as well as Lemma 6.5.ii, to obtain with a ˜ new constant M6 = M6 (β, , V ), - β β ˜ (1 − s)−1/2 f (λ, s) − f (λ, s ) ≤ M6 s − s s −1+1/2(n−1)−1 (6.25) - ˜ + M6 s − s (s )−1+1/2(n−1) (1 − s )−1/2−- .
This establishes continuity. The proof of (iv) follows the proof of Corollary 5.7, and we omit the details. Taking A = e−iθJ −iσ P , and observing that ∂ −sβH e ∂s
(λV ) −(1−s)βH0
e
yields (v). This completes the proof of the lemma.
β
= f (λ, s) # "
Lemma 6.6. Assume the hypotheses of Theorem 2.1, take A = e−iθJ −iσ P , and let s ∈ (0, 1). (i) We have the identity β Tr H A f (λ, s) = −λ2 Tr H A e−(1−s)βH0 /2 QI, (V ) e−sβH
(λV )
QI, (V ) e−(1−s)βH0 /2 . (6.26)
(ii) There exists a constant M7 = M7 (β, , V ) such that for all α ∈ [0, 1/(n˜ − 1)], β ˜ . (6.27) Tr H A f (λ, s) ≤ M7 λ2α s −1+α (1 − s)−α(n−1) Proof. Part (i) of the lemma is a consequence of the fact that both e−sβH (λV ) and e−(1−s)H0 are trace class. Furthermore, the bound ±P ≤ H0 along with Proposition 5.3 establishing a similar upper bound with H , shows that e−sβH (λV ) P e−(1−s)H0 is trace β class. We therefore rewrite H0 − H (λV ) in f (λ, s) as H0 − H (λV ) = H0 + P − H (λV ) − P = Q20 − Q (λV )2 = Q (λV ) (Q0 − Q (λV )) + (Q0 − Q (λV )) Q0 . Thus β
f λ, s) = e−sβH
(λV )
= −λ e−sβH
(Q (λV ) (Q0 − Q (λV )) + (Q0 − Q (λV )) Q0 ) e−(1−s)H0
(λV ) Q (λV ) QI, (V ) + QI, (V ) Q0 e−(1−s)H0 . (6.28)
Furthermore, in the first term Q (λV ) commutes with the heat kernel mollifier on the left, so the above methods show e−sβH (λV ) Q (λV ) QI, (V ) e−(1−s)H0 is trace class. Similarly, Q0 commutes with the mollifier on the right, so the second term is also trace class. Consider the first term. The operator e−sβH (λV )/2 Q (λV ) is bounded,
the operator e−sβH (λV )/2 QI, (V ) e−(1−s)H0 is trace class, and A anti-commutes with e−sβH (λV )/2 Q (λV ). Thus using cyclicity (5.9) one can write, − λ Tr H Ae−sβH (λV ) Q (λV ) QI, (V ) e−(1−s)H0 = − λ Tr H Ae−sβH (λV )/2 Q (λV ) e−sβH (λV )/2 QI, (V ) e−(1−s)H0 = λ Tr H A e−sβH (λV )/2 QI, (V ) e−(1−s)H0 e−sβH (λV )/2 Q (λV ) = λ Tr H A e−sβH (λV )/2 QI, (V ) e−(1−s)H0 Q (λV ) e−sβH (λV )/2
= λ Tr H A e−sβH (λV )/2 QI, (V ) e−(1−s)H0 Q0 + λQI, (V ) e−sβH (λV )/2 = λ Tr H A e−sβH (λV )/2 QI, (V ) Q0 e−(1−s)H0 e−sβH (λV )/2 + λ2 Tr H A e−sβH (λV )/2 QI, (V ) e−(1−s)H0 QI, (V ) e−sβH (λV )/2 = λ Tr H A e−sβH (λV ) QI, (V ) Q0 e−(1−s)H0 + λ2 Tr H A e−sβH (λV )/2 QI, (V ) e−(1−s)H0 QI, (V ) e−sβH (λV )/2 . (6.29) On the other hand, since each term in (6.28) is trace class, we have β Tr H A f (λ, s) = − λ Tr H A e−sβH − λ Tr H A e−sβH
(λV )
Q (λV ) QI, (V ) e−(1−s)H0 (λV ) QI, (V ) Q0 e−(1−s)H0 .
(6.30)
Substituting (6.29) into (6.30), we obtain β Tr H A f (λ, s) = −λ2 Tr H A e−sβH
(λV )/2
QI, (V ) e−(1−s)H0 QI, (V ) e−sβH
(λV )/2
,
(6.31)
which proves (i). In order to prove (ii), observe that a consequence of Lemma 5.6.iii, with α < (n˜ − 1)−1 , is the following bound. There is a constant M8 = M8 (β, , V ), such that −sβH e
(λV )/4
QI, (V ) e−(1−s)H0 /4
≤ e−sβH (λV )/4 (H (λV ) + M2 )(1−α)/2 ˜ × (H (λV ) + M2 )−(1−α)/2 QI, (V ) (H0 + I )−α(n−1)/2 ˜ × (H0 + I )α(n−1)/2 e−(1−s)H0 /4
˜ . ≤ M8 λ−1+α s −(1−α)/2 (1 − s)−α(n−1)/2
(6.32)
As a consequence of the representation (6.26), the fact that A is unitary, and using (5.8), we have β Tr H A f (λ, s) ≤ λ2 Tr H A e−(1−s)βH0 /2 QI, (V ) e−sβH (λV ) QI, (V ) e−(1−s)βH0 /2 ≤ λ2 e−(1−s)βH0 /2 QI, (V ) e−sβH (λV ) QI, (V ) e−(1−s)βH0 /2 1 −(1−s)βH0 /4 2 −(1−s)βH0 /4 −sβH (λV )/4 ≤ λ e QI, (V ) e e 1/(1−s) × e−sβH (λV )/2 e−sβH (λV )/4 QI, (V ) e−(1−s)βH0 /2 (6.33) 1/s 1−s 2 −βH (λV )/2 s −sβH (λV )/4 QI, (V ) e−(1−s)βH0 /4 . ≤ λ2 e−βH0 /4 e e 1
1
We have used Hölder’s inequality with the exponents (1 − s)−1 , ∞, s −1 , ∞, and the ∗ −(1−s)βH /4 0 ≤ 1. We use the bound (5.3), along fact that T = T , as well as e with (6.32), to complete the proof of (6.27). " # λV Proof of Theorem 6.2. Bound the difference Z − Z0 by using the representation of Lemma 6.5.v, and the bound of Lemma 6.6.ii. Integrating this bound, we obtain for any α ∈ (0, (n˜ − 1)−1 ),
1 λV β Z − Z0 ≤ β Tr H A f (λ, s) ds (6.34) 0 −1 2α ≤ βM7 (α) (1 − α(n˜ − 1)) (1 − α(n˜ − 2)) λ . The parameter 2α in the bound (6.34) becomes α in the bound (6.12). Thus we obtain Hölder continuity with any Hölder exponent strictly less than 2/(n˜ − 1), and the proof of the theorem is complete. " # Proof of Theorem 2.1. The bound (5.4), along with Proposition 5.1, ensures that the limit of partition functions limj →∞ ZλV actually equals ZλV . There is no question about the existence or the numerical value of the limit: Theorem 6.1 ensures that the function ZλV is constant in λ for λ > 0, and Theorem 6.2 ensures that ZλV equals the same function at λ = 0. Since Z0 is -independent, therefore ZλV is also -independent. As a result, not only do the differentiability and continutity of ZλV also hold for ZλV , but ZλV is also λ-independent for λ ∈ [0, 1]. So we have established Theorem 2.1 and the first statement in Corollary 2.2.
7. Analyticity In the previous section, we saw that ZV (τ, θ, φ) = ZV (τ, θ, φ) = Z0 (τ, θ, φ). In the next section we calculate Z0 (τ, θ, φ) and find that it is holomorphic for all τ ∈ H and all θ ∈ C. Furthermore, it actually extends to a holomorphic function of φ. (There is an independent way to verify that ZV (τ, θ, φ) is holomorphic using a priori estimates. This analyticity is in a smaller domain, but a - independent domain.)
Proposition 7.1. Assume QH, EL, and TR, with a fixed potential $V$. Then for fixed real $\theta$ and $\phi$, the partition function $Z^V(\tau, \theta, \phi)$ is holomorphic in $\tau$ for all $\tau \in \mathbb H$. Furthermore, for fixed $\tau \in \mathbb H$ and fixed $\phi \in \mathbb R$, the function $Z^V(\tau, \theta, \phi)$ extends analytically in $\theta$ to a strip $|\Im(\theta)| < R$, where $R = R(\tau)$.

Let $A = \Gamma e^{-i\theta J - i\sigma P}$. One can express the partition function $Z^V$ as
$$Z^V(\tau, \theta, \phi) = \mathrm{Tr}_{\mathcal H}\big(A e^{-\beta H}\big) = \mathrm{Tr}_{\mathcal H}\big(\Gamma e^{-i\theta J - i\tau (H - P)/2 + i\bar\tau (H + P)/2}\big) = \mathrm{Tr}_{\mathcal H}\big(\Gamma e^{-i\theta J - i\tau (Q^2/2 - P) + i\bar\tau Q^2/2}\big), \tag{7.1}$$
where $\bar\tau$ denotes the complex conjugate of $\tau$. We have a representation similar to (7.1) for the approximating family of partition functions,
$$Z_j^V(\tau, \theta, \phi) = \mathrm{Tr}_{\mathcal H}\big(A e^{-\beta H_j}\big) = \mathrm{Tr}_{\mathcal H}\big(\Gamma e^{-i\theta J - i\tau (Q_j^2/2 - P) + i\bar\tau Q_j^2/2}\big). \tag{7.2}$$

Lemma 7.2. The approximating partition functions $Z_j^V(\sigma, \beta, \theta, \phi)$ are holomorphic in the following senses:
(i) Fix $\sigma \in \mathbb R$, $\theta \in \mathbb R$, and $\phi \in (0, \pi]$. Then $Z_j^V(\sigma, \beta, \theta, \phi)$ defined for $\beta > 0$ is the boundary value of a holomorphic function of $\beta$ extending to $i\beta \in \mathbb H$.
(ii) Fix $i\beta \in \mathbb H$, $\theta \in \mathbb R$, and $\phi \in (0, \pi]$. Then $Z_j^V(\sigma, \beta, \theta, \phi)$ extends analytically in $\sigma$ into a strip around the real $\sigma$ axis whose width is independent of $j$.
(iii) Fix $\sigma \in \mathbb R$, $i\beta \in \mathbb H$, and $\phi \in (0, \pi]$. Then $Z_j^V(\sigma, \beta, \theta, \phi)$ extends holomorphically in $\theta$ to a strip around the real $\theta$ axis, whose width is independent of $j$.

Proof. Express the partition function $Z_j^V$ in terms of the real variables,
$$Z_j^V = Z_j^V(\sigma, \beta, \theta, \phi) = \mathrm{Tr}_{\mathcal H}\big(\Gamma e^{-i\theta J - i\sigma P - \beta H_j}\big).$$
The uniform trace bound (5.3) ensures that $Z_j^V$ extends to a holomorphic function of $\beta$ in the right half-plane. In order to establish part (ii), we use (5.21), combined with Lemma 5.2. Finally, to establish part (iii) of the lemma, we observe that $J$, $P$, and $H_j$ are mutually commuting. Furthermore we use a bound on $J$ in terms of $|P|$. In fact, using the explicit form of these operators, see [4], we conclude that for fixed $0 < \phi$ there is a constant $M_3 < \infty$ such that
$$\pm J \le M_3 |P|. \tag{7.3}$$
It then follows from (5.21) that for constants $M_1$ and $M_2$, independent of $j$,
$$\pm J \le M_3 \left(M_1 H_j + M_2\right). \tag{7.4}$$
We then apply Lemma 5.2 with $\theta$ replacing $\sigma$ and $J$ replacing $P$, to conclude that $Z_j^{\lambda V}(\tau, \theta, \phi)$ is real analytic in $\theta$. The constants $M_1$, $M_2$, and $M_3$ do not depend on $j$, so there is a strip of uniform width about the real $\theta$ axis for which $Z_j^{\lambda V}$ is uniformly bounded and holomorphic. □

Lemma 7.3. The approximate partition functions $Z_j^V(\sigma, \beta, \theta, \phi)$ satisfy the Cauchy–Riemann identity
$$\frac{\partial Z_j^V}{\partial \sigma} + i\,\frac{\partial Z_j^V}{\partial \beta} = 0, \tag{7.5}$$
for $\tau \in \mathbb H$. Therefore $Z_j^V$ is holomorphic for $\tau \in \mathbb H$.
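It may help to see why an identity of the form (7.5) is the same as holomorphy in $\tau$. The following is a minimal sketch under the assumption that $\tau$ is a positive real multiple of $\sigma + i\beta$; the normalization constant $c$ is hypothetical and not taken from the text.

```latex
% Assumption (not from the text): \tau = c\,(\sigma + i\beta) with a real constant c > 0,
% so that \sigma = (\tau + \bar\tau)/(2c) and \beta = (\tau - \bar\tau)/(2ic).
\[
  \frac{\partial}{\partial \bar\tau}
  = \frac{\partial \sigma}{\partial \bar\tau}\,\frac{\partial}{\partial \sigma}
  + \frac{\partial \beta}{\partial \bar\tau}\,\frac{\partial}{\partial \beta}
  = \frac{1}{2c}\left( \frac{\partial}{\partial \sigma} + i\,\frac{\partial}{\partial \beta} \right).
\]
% Hence, for a differentiable function Z of (\sigma, \beta):
\[
  \frac{\partial Z}{\partial \sigma} + i\,\frac{\partial Z}{\partial \beta} = 0
  \quad\Longleftrightarrow\quad
  \frac{\partial Z}{\partial \bar\tau} = 0 .
\]
```

Under this assumption the identity says precisely that the function has no dependence on the conjugate variable, which is the Cauchy–Riemann condition for holomorphy in $\tau$.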
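Proof. By Lemma 7.2, the derivatives of $Z_j^{\lambda V}(\sigma, \beta, \theta, \phi)$ with respect to $\beta$ and $\sigma$ exist. Differentiating the representation (7.2), and using the identity (4.14), yields
$$\frac{\partial Z_j^V}{\partial \sigma} + i\,\frac{\partial Z_j^V}{\partial \beta} = -i\,\mathrm{Tr}_{\mathcal H}\big(A\,(H_j + P)\, e^{-\beta H_j}\big) = -i\,\mathrm{Tr}_{\mathcal H}\big(A\, Q_j^2\, e^{-\beta H_j}\big). \tag{7.6}$$
Proposition 5.1 ensures that $Q_j (H_j + M_2)^{-1/2}$ is bounded, at least if we choose $M_2$ sufficiently large so $I \le H_j + M_2$. But (5.3) ensures that $Q_j e^{-\beta H_j/2} = e^{-\beta H_j/2} Q_j$ is also bounded and trace class. As a consequence, we use cyclicity of the trace and $A Q_j = -Q_j A$ to give
$$\begin{aligned}
\mathrm{Tr}_{\mathcal H}\big(A Q_j^2 e^{-\beta H_j}\big)
&= \mathrm{Tr}_{\mathcal H}\big(A e^{-\beta H_j/2} Q_j \, Q_j e^{-\beta H_j/2}\big)
 = -\mathrm{Tr}_{\mathcal H}\big(e^{-\beta H_j/2} Q_j \, A Q_j e^{-\beta H_j/2}\big) \\
&= -\mathrm{Tr}_{\mathcal H}\big(A Q_j e^{-\beta H_j/2} Q_j e^{-\beta H_j/2}\big)
 = -\mathrm{Tr}_{\mathcal H}\big(A Q_j^2 e^{-\beta H_j}\big) = 0,
\end{aligned} \tag{7.7}$$
completing the proof of analyticity in $\tau \in \mathbb H$. □

8. Evaluation

We verify the representation for the elliptic genus in the case that the potential $V$ is zero.

Proposition 8.1. Choose $\Omega_i \in (0, \tfrac12]$ for $1 \le i \le n$. Take $V = 0$ and assume TR and NC. Then the partition function $Z_0$ is given by (2.4).

Proof. Define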
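$$\Omega_i^f(k) = \begin{cases} \Omega_i, & \text{if } 0 < k, \\ 1 - \Omega_i, & \text{if } k < 0, \end{cases} \tag{8.1}$$
and the functions
$$\gamma^b_{\pm,i}(\pm k) = e^{\mp i\theta \Omega_i - \beta |k|}, \quad\text{and}\quad \gamma^f_{\pm,i}(\pm k) = e^{\mp i\theta \Omega_i^f(\pm k) - \beta |k|}. \tag{8.2}$$
The momenta range over the following lattices,
$$K_i^b = \{k : k \in 2\pi \mathbb Z - \Omega_i \phi\}, \quad\text{and}\quad K^f_{\pm,i} = \{k : k \in 2\pi \mathbb Z - \Omega_i^f(\pm k)\phi\}. \tag{8.3}$$
We require that $0 < \phi \le 2\pi$ and $0 < \Omega_i,\ 1 - \Omega_i < 1$, so zero is not an allowed momentum,
$$0 \notin K_i^b,\ K^f_{+,i},\ K^f_{-,i}. \tag{8.4}$$
In case $V = 0$, the partition function factors into a product of a fermionic free-field part and a bosonic free-field part. We calculated the free bosonic and fermionic partition functions in Theorems 2.2.1 and 5.4.1 of [4], yielding
$$Z_0 = y^{\hat c/2} \prod_{i=1}^{n} \frac{\prod_{k' \in K^f_{+,i}} \big(1 - \gamma^f_{+,i}(k')\big) \prod_{k'' \in K^f_{-,i}} \big(1 - \gamma^f_{-,i}(-k'')\big)}{\prod_{k \in K^b_i} \big|1 - \gamma^b_{+,i}(k)\big|^2}. \tag{8.5}$$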
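The overall factor $y^{\hat c/2}$ arises from the normalization constant $\hat c/2$ in (1.13). Split each product into terms indexed by $n \in \mathbb Z$, and separate the terms with positive, negative, and zero $n$. Note that $\gamma^b_{+,i}(k) = \gamma^b_{-,i}(-k)^*$. (The $\gamma^f_{\pm,i}$ satisfy such a relation only when $\Omega_i = 1/2$.) For $k = 2\pi n - \chi^{b,f}_{\pm,i}/\ell$, and $n \in \mathbb Z$, the functions $\gamma^{b,f}_{\pm,i}(\pm k)$ take the following values:

$$\begin{array}{c|cccc}
 & \gamma^b_{+,i}(k) & \gamma^b_{-,i}(-k) & \gamma^f_{+,i}(k) & \gamma^f_{-,i}(-k) \\ \hline
n = 0 & (z/y)^{\Omega_i} & (yz)^{\Omega_i} & (z/y)^{1-\Omega_i} & (yz)^{\Omega_i} \\
n > 0 & q^{n}(1/yz)^{\Omega_i} & q^{n}(y/z)^{\Omega_i} & q^{n}(1/yz)^{\Omega_i} & q^{n}(y/z)^{1-\Omega_i} \\
n < 0 & q^{|n|}(z/y)^{\Omega_i} & q^{|n|}(yz)^{\Omega_i} & q^{|n|}(z/y)^{1-\Omega_i} & q^{|n|}(yz)^{\Omega_i}
\end{array}$$

Therefore (8.5) equals the product of ratios made from these 12 terms, with factors $1 - \gamma^f$ in the numerator and factors $1 - \gamma^b$ in the denominator. Group the terms depending on $q$ near each other, to obtain
$$\begin{aligned}
Z_0 = y^{\hat c/2} &\prod_{i=1}^{n} \frac{\big(1 - (z/y)^{1-\Omega_i}\big)\big(1 - (yz)^{\Omega_i}\big)}{\big(1 - (z/y)^{\Omega_i}\big)\big(1 - (yz)^{\Omega_i}\big)} \\
&\times \prod_{n=1}^{\infty} \prod_{i=1}^{n} \frac{\big(1 - q^{n}(1/yz)^{\Omega_i}\big)\big(1 - q^{n}(yz)^{\Omega_i}\big)\big(1 - q^{n}(z/y)^{1-\Omega_i}\big)\big(1 - q^{n}(y/z)^{1-\Omega_i}\big)}{\big(1 - q^{n}(1/yz)^{\Omega_i}\big)\big(1 - q^{n}(yz)^{\Omega_i}\big)\big(1 - q^{n}(z/y)^{\Omega_i}\big)\big(1 - q^{n}(y/z)^{\Omega_i}\big)}.
\end{aligned} \tag{8.6}$$
Using the definition (1.16), the product (8.6) is
$$\begin{aligned}
Z_0 &= z^{\hat c/2} \prod_{i=1}^{n} \frac{\vartheta_1\big(-\bar\tau,\ \Omega_i(\phi\bar\tau + \theta)\big)\ \vartheta_1\big(\tau,\ (1 - \Omega_i)(\phi\tau - \theta)\big)}{\vartheta_1\big(-\bar\tau,\ \Omega_i(\phi\bar\tau + \theta)\big)\ \vartheta_1\big(\tau,\ \Omega_i(\phi\tau - \theta)\big)} \\
&= z^{\hat c/2} \prod_{i=1}^{n} \frac{\vartheta_1\big(\tau,\ (1 - \Omega_i)(\theta - \phi\tau)\big)}{\vartheta_1\big(\tau,\ \Omega_i(\theta - \phi\tau)\big)}.
\end{aligned} \tag{8.7}$$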
The theta functions depending on $\bar\tau$ occur in both the numerator and the denominator. We also use here the fact that the function $\vartheta_1$ is odd in the second variable. This completes the evaluation of $Z_0$, and it also completes the proof of Corollary 2.2. □

References

1. Connes, A.: Entire cyclic cohomology of Banach algebras and characters of θ-summable Fredholm modules. K-Theory 1, 519–548 (1988)
2. Glimm, J., and Jaffe, A.: Quantum Physics. Second Edition, Berlin–Heidelberg–New York: Springer Verlag, 1987
3. Glimm, J., and Jaffe, A.: Les Houches Lectures 1970. In: Selected Papers, Expositions, Boston, MA: Birkhäuser Boston, 1985
4. Grandjean, O., and Jaffe, A.: Twist fields and broken supersymmetry. J. Math. Phys. 41, 3698–3763 (2000)
5. Grandjean, O., Jaffe, A., and Tyson, J.: Twist Positivity for Lagrangian Symmetries. Adv. in Theor. and Math. Phys. 4 (2000)
6. Jaffe, A.: Quantum harmonic analysis and geometric invariants. Advances in Math. 143, 1–110 (1999)
7. Jaffe, A.: Quantum invariants. Commun. Math. Phys. 209, 1–12 (2000)
8. Jaffe, A.: Twist positivity. Ann. Phys. 278, 10–61 (1999)
9. Jaffe, A.: The holonomy expansion, index theory, and approximate supersymmetry. Ann. Phys. 279, 161–262 (2000)
10. Jaffe, A.: Twist fields and constructive quantum field theory. In preparation
11. Jaffe, A.: Twist fields, the elliptic genus, and hidden symmetry. Proc. Nat. Acad. Sci. 97, 1418–1422 (2000)
12. Jaffe, A., and Lesniewski, A.: A Priori Estimates for the N = 2, Wess–Zumino Model on a Cylinder. Commun. Math. Phys. 114, 553–575 (1988)
13. Jaffe, A., Lesniewski, A., and Osterwalder, K.: Quantum K-theory I. The Chern character. Commun. Math. Phys. 118, 1–14 (1988)
14. Jaffe, A., Lesniewski, A., and Osterwalder, K.: Stability for a class of bi-local Hamiltonians. Commun. Math. Phys. 155, 183–197 (1993)
15. Jaffe, A., Lesniewski, A., and Weitsman, J.: The two-dimensional, N = 2 Wess–Zumino model on a cylinder. Commun. Math. Phys. 114, 147–165 (1988)
16. Kato, T.: Perturbation Theory for Linear Operators. Berlin–Heidelberg–New York: Springer, 1966
17. Kawai, T., Yamada, Y., and Yang, S.-K.: Elliptic genera and N = 2 superconformal field theory. Nucl. Phys. B 414, 191–212 (1994)
18. Schatten, R.: Norm Ideals of Completely Continuous Operators. Second Edition, Berlin–Heidelberg–New York: Springer, 1970
19. Vafa, C.: String vacua and orbifoldized LG models. Modern Phys. Lett. A 4, 1169–1185 (1989)
20. Whittaker, E. T., and Watson, G. N.: A Course of Modern Analysis. Fourth Edition, Cambridge: Cambridge University Press, 1962
21. Witten, E.: On the Landau–Ginzburg description of N = 2 minimal models. Internat. J. Modern Phys. A 9, 4783–4800 (1994)

Communicated by G. Mack and W. Zimmermann
Commun. Math. Phys. 219, 125 – 140 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Polarization-Free Generators and the S-Matrix

Hans-Jürgen Borchers¹, Detlev Buchholz¹, Bert Schroer²,*

¹ Institut für Theoretische Physik, Universität Göttingen, Bunsenstraße 9, 37073 Göttingen, Germany
² Fachbereich Physik, Freie Universität Berlin, Arnimallee 14, 14195 Berlin, Germany
* Present address: CBPF, 22290-180 Rio de Janeiro, Brazil

Received: 11 April 2000 / Accepted: 20 April 2000

Dedicated to the memory of Harry Lehmann

Abstract: Polarization-free generators, i.e. “interacting” Heisenberg operators which are localized in wedge-shaped regions of Minkowski space and generate single particle states from the vacuum, are a novel tool in the analysis and synthesis of two-dimensional integrable quantum field theories. In the present article, the status of these generators is analyzed in a general setting. It is shown that such operators exist in any theory and in any number of spacetime dimensions. But in more than two dimensions they have rather delicate domain properties in the presence of interaction. If, for example, they are defined and temperate on a translation-invariant, dense domain, then the underlying theory yields only trivial scattering. In two-dimensional theories, these domain properties are consistent with non-trivial interaction, but they exclude particle production. Thus the range of applications of polarization-free generators seems to be limited to the realm of two-dimensional theories.

1. Introduction

Local quantum field theory provides the adequate setting for elementary particle physics. It allows one to express in mathematical terms the basic features of relativistic quantum physics, such as Einstein causality, Poincaré covariance and the Nahwirkungs-principle, which are encoded in the local field equations and commutation relations. The physical interpretation of the theory relies on asymptotic notions, however, based on the particle concept. The way from the local fields to the asymptotic particle interpretation was paved in the seminal work of Lehmann, Symanzik and Zimmermann [1] who invented a consistent collision theory and established reduction formulas for the computation of scattering matrix elements.

Little is known about the opposite road, however, i.e. the reconstruction of a local theory from a given scattering matrix. This problem, sometimes called the form-factor program [2], is for example of importance in the construction of integrable field-theoretic models, cf. [3] for some interesting progress in this respect. It was recently noticed [4] that certain interacting theories in two spacetime dimensions admit an important intermediate step in this program. Namely, there exist semi-local polarization-free generators, which are localized in wedge-shaped regions of Minkowski space and generate from the vacuum single particle states, similarly to free fields. Important features of the theory, such as the crossing symmetry of the scattering matrix, are encoded in simple analyticity properties of the correlation functions of these operators (KMS-condition). Moreover, their algebraic properties can directly be expressed in terms of the elastic scattering amplitudes.

This interesting observation warrants a more systematic investigation of polarization-free generators in the general setting of local quantum field theory. The present article is devoted to such a study. We will show that there exist polarization-free generators in any local theory with a non-trivial particle spectrum, irrespective of the number of spacetime dimensions. It turns out, however, that these operators are generically unbounded and their domains of definition exhibit some delicate features if there is interaction. If, for example, the polarization-free generators are defined on a translation invariant domain and the norms of their respective images stay polynomially bounded for large translations, then the elastic scattering amplitudes inevitably vanish in more than two spacetime dimensions. In two-dimensional theories, these domain properties are consistent with interaction, but they exclude particle production. Thus the only non-trivial theories in which such temperate families of polarization-free generators can be defined seem to be models of the type studied in [4].
The upshot of our investigation is the insight that polarization-free generators always exist. But in view of their subtle domain properties, they are not accessible to Fourier analysis in most cases of physical interest. They are therefore not suitable for a general analysis and synthesis of collision states and do not provide the desired universal link between the scattering matrix and the local interacting fields. Some further aspects of our results are mentioned in the conclusions.

2. Existence of Polarization-Free Generators

Explicit non-trivial examples of polarization-free generators were invented in some two-dimensional relativistic quantum field theories [4]. But their generic features can be stated more clearly in the general framework of local quantum physics [5]. For the convenience of the reader who is not familiar with this setting, we briefly recall in the following the relevant notions and explain our notation.

We assume that we are given a local, relativistic quantum field theory in $d$-dimensional Minkowski space $\mathbb R^d$. But instead of dealing with the unbounded field operators, we proceed to their local bounded functions [6]. Each of the resulting bounded operators is associated to some region $\mathcal O \subset \mathbb R^d$, fixed by the support of the test functions involved in the smearing of the field operators. We say that these operators are localized in $\mathcal O$, for short. The collection of all operators localized in a particular region $\mathcal O$ generates a *-algebra $\mathcal A(\mathcal O)$ on the underlying physical Hilbert space $\mathcal H$ which is closed in the weak operator topology. (It is thus a von Neumann algebra.) The family of these local algebras inherits from the underlying quantum field theory the following fundamental properties:

1. (Locality). The assignment
$$\mathcal O \mapsto \mathcal A(\mathcal O) \tag{2.1}$$
defines a net over Minkowski space, i.e. an inclusion preserving mapping. The net complies with the principle of locality, that is operators affiliated with spacelike separated regions commute.

Besides these “local algebras”, we also consider algebras $\mathcal A(\mathcal W)$ associated to wedge-shaped regions $\mathcal W$ of the form (given in proper coordinates)
$$\mathcal W_1 = \{x \in \mathbb R^d : x_1 \ge |x_0|,\ x_2, \dots, x_{d-1} \text{ arbitrary}\} \tag{2.2}$$
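as well as their Poincaré transforms. They are the smallest von Neumann algebras containing all local algebras $\mathcal A(\mathcal O)$ with $\mathcal O \subset \mathcal W$. Because of locality, the algebra $\mathcal A(\mathcal W')$ associated to the spacelike complement $\mathcal W'$ of $\mathcal W$ commutes with $\mathcal A(\mathcal W)$, $\mathcal A(\mathcal W') \subset \mathcal A(\mathcal W)'$.

2. (Covariance). The group of spacetime translations $\mathbb R^d$ acts on $\mathcal H$ by a continuous unitary representation $U$ which induces automorphisms of the net. Thus for any translation $x \in \mathbb R^d$ and region $\mathcal O \subset \mathbb R^d$
$$U(x)\,\mathcal A(\mathcal O)\,U(x)^{-1} = \mathcal A(\mathcal O + x) \tag{2.3}$$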
in an obvious notation.

3. (Spectrum). The joint spectrum of the generators $P$ of $U$ (the physical energy-momentum spectrum) is contained in the closed forward lightcone,
$$\mathrm{sp}\, P \subset \{p \in \mathbb R^d : p_0 \ge |\mathbf p|\}. \tag{2.4}$$
Moreover, there is a unit vector $\Omega \in \mathcal H$, unique up to a phase, which is invariant under the action of $U$ and cyclic for the local algebras $\mathcal A(\mathcal O)$ (Reeh–Schlieder property). This vector describes the vacuum state.

Because of this familiar form of the energy-momentum spectrum, the mass operator
$$M = (P_0^2 - \mathbf P^2)^{1/2} \tag{2.5}$$
is positive selfadjoint with spectral resolution $E(\,\cdot\,)$. If there are particles of mass $m$ in the theory, the spectral projection $E_m = E(\{m\})$ is different from zero.

It is our aim to show that there exist operators which are localized in wedge regions and generate from the vacuum single particle states with mass $m$. The formal characterisation of such operators is given in the subsequent definition. We recall in this context that a closed operator is said to be affiliated with a von Neumann algebra $\mathcal M$ if it commutes on its domain with all elements of the commutant $\mathcal M'$ of $\mathcal M$. Its adjoint is then also affiliated with $\mathcal M$.

Definition. A closed operator $G$ is called polarization-free generator of mass $m$ if (a) it is affiliated with a wedge algebra $\mathcal A(\mathcal W)$, (b) $\Omega$ is contained in the domains of $G$ and $G^*$, and (c) $G\Omega$, $G^*\Omega$ are elements of $E_m \mathcal H$.

For the proof that polarization-free generators exist in any theory, we make use of Tomita–Takesaki theory [7]. We begin by recalling some basic facts from this theory for the case at hand. Since $\Omega$ is cyclic and separating for the wedge algebras $\mathcal A(\mathcal W)$ by the Reeh–Schlieder property and locality, one can consistently define the Tomita conjugations $S_{\mathcal W}$, setting
$$S_{\mathcal W} A\Omega = A^*\Omega, \qquad A \in \mathcal A(\mathcal W). \tag{2.6}$$
These operators are closable anti-linear involutions. Their closures, which we denote by the same symbol, have the polar decomposition
$$S_{\mathcal W} = J_{\mathcal W}\, \Delta_{\mathcal W}^{1/2}. \tag{2.7}$$
Here $J_{\mathcal W}$ is an anti-unitary operator, the modular conjugation, and $\Delta_{\mathcal W}$, the modular operator, is strictly positive and selfadjoint. The following well-known fact is of fundamental importance in the present context. We therefore sketch its proof.

Lemma 2.1. Let $\Psi$ be any vector in the domain of $S_{\mathcal W}$. There exists a closed operator $F$ which (a) is affiliated with $\mathcal A(\mathcal W)$, (b) has, together with its adjoint $F^*$, the vector $\Omega$ in its domain and (c) satisfies
$$F\Omega = \Psi, \qquad F^*\Omega = S_{\mathcal W}\Psi. \tag{2.8}$$
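Proof. Since the set of vectors $\mathcal A(\mathcal W)\Omega$ is a core for $S_{\mathcal W}$ and $S_{\mathcal W}$ is closed, there is a sequence $F_n \in \mathcal A(\mathcal W)$ such that $F_n\Omega \to \Psi$ and $S_{\mathcal W}F_n\Omega \to S_{\mathcal W}\Psi$, strongly. Thus if $A' \in \mathcal A(\mathcal W)'$, one also has $F_nA'\Omega = A'F_n\Omega \to A'\Psi$ and $F_n^*A'\Omega = A'F_n^*\Omega \to A'S_{\mathcal W}\Psi$. So the operator $F$, given by
$$FA'\Omega = \lim_{n\to\infty} F_nA'\Omega = A'\Psi, \qquad A' \in \mathcal A(\mathcal W)', \tag{2.9}$$
is well defined. Its adjoint $F^*$ also has the dense set of vectors $\mathcal A(\mathcal W)'\Omega$ in its domain and
$$F^*A'\Omega = \lim_{n\to\infty} F_n^*A'\Omega = A'S_{\mathcal W}\Psi, \qquad A' \in \mathcal A(\mathcal W)'. \tag{2.10}$$
This shows that $F$ is closable (we use the symbol $F$ also for its closure), and it establishes parts (b) and (c) of the statement since $1 \in \mathcal A(\mathcal W)'$. For part (a) one makes use of the fact that for any vector $\Psi^*$ in the domain of $F^*$ and $A', B' \in \mathcal A(\mathcal W)'$,
$$(A'\Psi^*, FB'\Omega) = (\Psi^*, A'^*B'\Psi) = (\Psi^*, FA'^*B'\Omega) = (A'F^*\Psi^*, B'\Omega). \tag{2.11}$$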
Hence $|(A'\Psi^*, FB'\Omega)| \le \mathrm{const}\cdot\|B'\Omega\|$, $B' \in \mathcal A(\mathcal W)'$. So $A'\Psi^*$ lies in the domain of $F^*$ and $F^*A'\Psi^* = A'F^*\Psi^*$, $A' \in \mathcal A(\mathcal W)'$. An analogous statement holds for $F^{**} = F$, so the proof of the lemma is complete.

In view of this lemma, it suffices for the proof of the existence of polarization-free generators to exhibit non-zero vectors $\Psi_1 \in E_m\mathcal H$ in the domain of $S_{\mathcal W}$ such that also $S_{\mathcal W}\Psi_1 \in E_m\mathcal H$. To accomplish this, we have to take a closer look at the modular operators and conjugations. Fortunately, we have sufficiently concrete information about these objects in the present general setting.

Since $\Delta_{\mathcal W}$ is strictly positive, we can proceed to the corresponding unitary group $\Delta_{\mathcal W}^{is}$, $s \in \mathbb R$, called modular group. It is an important consequence of the spectral properties of the generators of $U$ and covariance [8] that for $x \in \mathbb R^d$,
$$\Delta_{\mathcal W}^{is}\, U(x)\, \Delta_{\mathcal W}^{-is} = U(\Lambda(s)x) \quad\text{and}\quad J_{\mathcal W}\, U(x)\, J_{\mathcal W}^{-1} = U(jx), \tag{2.12}$$
where $\Lambda(s)$, $s \in \mathbb R$, is (with some appropriate scaling of $s$) the one-parameter group of boosts leaving the wedge $\mathcal W$ invariant and $j$ is the reflection about the edge of $\mathcal W$. If, for example, $\mathcal W_1$ is the wedge given in (2.2), $x_\pm \in \mathbb R\cdot(\pm 1, 1, 0, \dots, 0)$ are any two lightlike translations in the characteristic planes forming the boundary of $\mathcal W_1$, and $x_\perp = (0, 0, x_2, \dots, x_{d-1})$ is any translation along the edge of $\mathcal W_1$, then
$$\Lambda_1(s)x_\pm = e^{\pm 2\pi s}x_\pm,\quad \Lambda_1(s)x_\perp = x_\perp \qquad\text{and}\qquad j_1x_\pm = -x_\pm,\quad j_1x_\perp = x_\perp. \tag{2.13}$$
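The action on lightlike translations stated in (2.13) can be checked in coordinates. The sketch below writes $\Lambda_1(s)$ for the boost and $j_1$ for the reflection (symbols chosen here for illustration) and assumes that the boost acts in the $x_0$-$x_1$ plane with rapidity $2\pi s$; the orientation of the boost is a convention adopted for this sketch.

```latex
% Assumption: boost in the (x_0, x_1)-plane with rapidity 2\pi s, all other coordinates fixed.
\[
  \Lambda_1(s)\,(x_0, x_1, x_2, \dots, x_{d-1})
  = \big(\cosh(2\pi s)\,x_0 + \sinh(2\pi s)\,x_1,\;
         \sinh(2\pi s)\,x_0 + \cosh(2\pi s)\,x_1,\;
         x_2, \dots, x_{d-1}\big).
\]
% Lightlike directions are eigenvectors:
\[
  \Lambda_1(s)\,(\pm 1, 1, 0, \dots, 0)
  = \big(\pm\cosh(2\pi s) + \sinh(2\pi s),\; \pm\sinh(2\pi s) + \cosh(2\pi s),\; 0, \dots, 0\big)
  = e^{\pm 2\pi s}\,(\pm 1, 1, 0, \dots, 0).
\]
% The wedge condition x_1 \ge |x_0| is preserved, since
\[
  x_1' \pm x_0' = e^{\pm 2\pi s}\,(x_1 \pm x_0) \ge 0 .
\]
% Reflection about the edge: flip x_0 and x_1, keep the remaining coordinates.
\[
  j_1\,(x_0, x_1, x_2, \dots, x_{d-1}) = (-x_0, -x_1, x_2, \dots, x_{d-1}).
\]
```

With these explicit forms, translations along the edge are left fixed by both maps, while the lightlike translations are rescaled by the boost and inverted by the reflection.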
Thus the modular groups and conjugations act on the translations $U$ like Lorentz transformations. As a matter of fact, these operators generate a representation of the proper Poincaré group in generic cases according to the Bisognano–Wichmann theorem [9]. But this more detailed information is not needed here.

Knowing that the modular groups and conjugations act on the generators $P$ of $U$ like Lorentz transformations, we conclude that the mass operator $M$, being invariant under Lorentz transformations of $P$, commutes both with $J_{\mathcal W}$ and $\Delta_{\mathcal W}^{is}$, $s \in \mathbb R$. The same is true for the spectral projections $E_m$ of $M$, hence $S_{\mathcal W} = J_{\mathcal W}\Delta_{\mathcal W}^{1/2}$ commutes on its domain $D(S_{\mathcal W})$ with $E_m$. This implies that $E_mD(S_{\mathcal W})$ is a dense subspace of $E_m\mathcal H$ which is stable under the action of $S_{\mathcal W}$ since $S_{\mathcal W}^2 = 1$, cf. relation (2.6). Applying the preceding lemma, we have thus established the existence of polarization-free generators.

Theorem 2.2. Given any $m$ in the discrete spectrum of the mass operator and any wedge $\mathcal W$, there exist polarization-free generators $G$ of mass $m$ which are affiliated with $\mathcal A(\mathcal W)$. In fact, for any vector $\Psi_1$ in the dense subspace $E_mD(S_{\mathcal W})$ of $E_m\mathcal H$, there is a $G$ such that $G\Omega = \Psi_1$ and $G^*\Omega = S_{\mathcal W}\Psi_1 \in E_mD(S_{\mathcal W})$.

The simplest example illustrating this existence theorem is free field theory. We briefly discuss it here in order to indicate a subtle point in applications of this abstract result. Let $\phi_0$ be the free massive scalar field acting on Fock space. It is well known [10] that the field operators $\phi_0(f)$, smeared with real test functions $f$ with compact support in some $\mathcal O \subset \mathbb R^d$, are essentially self-adjoint on the domain $D_0$ consisting of all vectors with a finite particle number. They generate, by their spectral resolutions, the local algebras $\mathcal A_0(\mathcal O)$ and are thus affiliated with the wedge algebras $\mathcal A_0(\mathcal W)$ whenever $\mathrm{supp}\, f \subset \mathcal W$. Since the operators $\phi_0(f)$ also generate single particle states from the vacuum, they are polarization-free generators in the sense defined above.
We mention as an aside that the dense set of vectors $\mathcal A(\mathcal W)'\Omega$ is a core for $\phi_0(f)$ for any wedge $\mathcal W \supset \mathrm{supp}\, f$. This implies that, by the preceding general construction, one would recover $\phi_0(f)$ from the single particle state $\phi_0(f)\Omega$ and the net.

The full domains of the locally smeared free fields $\phi_0(f)$ are not invariant under spacetime translations, but they contain the common core $D_0$ which has this property. As a matter of fact, $D_0$ is also invariant under Lorentz transformations and the vector-valued functions
$$(\Lambda, x) \mapsto \phi_0(f)\,U_0(\Lambda, x)\Psi, \tag{2.14}$$
where $U_0$ denotes the underlying unitary representation of the Poincaré group, are strongly continuous for each $\Psi \in D_0$. Moreover,
$$\|\phi_0(f)\,U_0(\Lambda, x)\Psi\| \le \mathrm{const}, \tag{2.15}$$
uniformly for all Poincaré transformations $(\Lambda, x)$.

We emphasize that the existence of such a domain $D_0$ on which polarization-free generators exhibit a “temperate behaviour” with respect to spacetime transformations does not follow from the general theorem. But it seems to be an indispensable requirement if one wants to use these operators in the analysis of collision states and of scattering amplitudes. For that analysis is based on Fourier transformation, which is only meaningful if the underlying functions do not increase too rapidly at infinity. We therefore take a closer look at such temperate generators in the subsequent section.
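The uniform bound (2.15) for the free field can be traced to the standard number-operator estimates on Fock space. The following is a sketch under assumed notation: $N$ is the particle number operator, $a(h)$, $a^*(h)$ are the annihilation and creation operators of a one-particle vector $h$, and $h_f$ is the one-particle vector determined by the real test function $f$ (its normalization depends on conventions not fixed here).

```latex
% Standard Fock space estimates for vectors \Psi with finite particle number:
\[
  \|a(h)\Psi\| \le \|h\|\,\|N^{1/2}\Psi\|, \qquad
  \|a^*(h)\Psi\| \le \|h\|\,\|(N+1)^{1/2}\Psi\| .
\]
% Assumption: \phi_0(f) = a^*(h_f) + a(h_f) for real f. Then
\[
  \|\phi_0(f)\Psi\| \le 2\,\|h_f\|\,\|(N+1)^{1/2}\Psi\| .
\]
% N commutes with the Poincare representation U_0, so the right-hand side is invariant:
\[
  \|\phi_0(f)\,U_0(\Lambda, x)\Psi\|
  \le 2\,\|h_f\|\,\|(N+1)^{1/2}\,U_0(\Lambda, x)\Psi\|
  = 2\,\|h_f\|\,\|(N+1)^{1/2}\Psi\| .
\]
```

The bound does not depend on the Poincaré transformation because the free dynamics conserves the particle number; nothing comparable is available in general for an interacting theory.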
3. Temperateness and Absence of Interaction

In view of the preceding considerations, we restrict attention now to those theories which admit polarization-free generators with a temperate behaviour with respect to translations.

Definition. A polarization-free generator $G$ is said to be temperate if there is a dense subspace $D$ of its domain which is stable under translations, such that for any $\Psi \in D$ the function $x \mapsto GU(x)\Psi$ is strongly continuous and polynomially bounded in norm for large $x$, and the same holds true also for its adjoint $G^*$. The respective subspaces are called domains of temperateness.

It turns out that this regularity requirement imposes severe constraints on the underlying theory and excludes interaction if the dimension of spacetime is larger than two. In the proof of this statement, we restrict attention to massive theories, describing a single scalar particle of mass $m$, so the spectrum of $U$ has the form
$$\mathrm{sp}\, U = \{0\} \cup \{p \in \mathbb R^d : p_0 = (\mathbf p^2 + m^2)^{1/2}\} \cup \{p \in \mathbb R^d : p_0 \ge (\mathbf p^2 + 4m^2)^{1/2}\}, \tag{3.1}$$
where $m > 0$, but our arguments also apply to theories with a more complex particle spectrum.

In a first step we show that temperate polarization-free generators lead to solutions of the Klein–Gordon equation and have in their domains single particle states with compact energy-momentum support about any given point on the “mass shell” $\{p \in \mathbb R^d : p_0 = (\mathbf p^2 + m^2)^{1/2}\}$.

Lemma 3.1. Let $G$ be a temperate polarization-free generator of mass $m$. Then
(a) $x \mapsto G(x) = U(x)GU(x)^{-1}$ is a weak solution of the Klein–Gordon equation of mass $m$ on the domain of temperateness $D$.
(b) The domain $D$ contains a dense set of vectors with compact spectral support. In particular, there exist single particle states of mass $m$ in $D$ with spectral support in any given neighborhood of any point on the mass shell.
Corresponding statements hold also for the adjoint $G^*$ of $G$.

Proof.
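(a) If $G$ is affiliated with the wedge algebra $\mathcal A(\mathcal W)$, say, the operators $G(x)$ are affiliated with $\mathcal A(\mathcal W + x)$ by covariance. Now for $x$ varying in some open, bounded region $\mathcal U \subset \mathbb R^d$, there is a wedge $\mathcal W_0 \supset \cup_{x \in \mathcal U}(\mathcal W + x)$, hence the operators $G(x)^*$, $x \in \mathcal U$, contain the common dense subspace $\mathcal A(\mathcal W_0)'\Omega$ in their domains. Thus for $\Psi \in D$,
$$(G(x)\Psi, A'\Omega) = (\Psi, G(x)^*A'\Omega) = (\Psi, A'G(x)^*\Omega), \qquad A' \in \mathcal A(\mathcal W_0)'. \tag{3.2}$$
Since $G^*\Omega \in E_m\mathcal H$, the function $x \mapsto G(x)^*\Omega = U(x)G^*\Omega$ is a weak solution of the Klein–Gordon equation, so one obtains from the preceding equation for any test function $f$ with support in $\mathcal U$,
$$\int dx\, \big((\Box + m^2)f^*\big)(x)\, (G(x)\Psi, A'\Omega) = 0, \qquad A' \in \mathcal A(\mathcal W_0)', \tag{3.3}$$
where $f^*$ is the complex conjugate of $f$. Making use of the temperateness assumption and the Reeh–Schlieder property of $\Omega$, this implies
$$\int dx\, \big((\Box + m^2)f\big)(x)\, G(x)\Psi = 0, \tag{3.4}$$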
where the integral is defined in the strong sense. Since $\mathcal U$ was arbitrary, the latter equation extends to all test functions $f \in \mathcal S(\mathbb R^d)$ by continuity.

(b) Each vector of the form $\Psi(f) = \int dx\, f(x)U(x)\Psi$, where $\Psi \in D$ and $f$ is any test function whose Fourier transform $\tilde f$ has compact support, has compact spectral support, and the set of these vectors is dense in $\mathcal H$ since $D$ is dense and $U$ is continuous. By choosing the support of $\tilde f$ properly, one obtains single particle states with spectral support in any given neighborhood of any point on the mass shell. It remains to be shown that these vectors belong to the domain of temperateness of $G$. There holds for any vector $\Psi^*$ in the domain of $G^*$,
$$\begin{aligned}
|(\Psi(f), G^*\Psi^*)| &= \Big|\int dx\, f^*(x)\,(U(x)\Psi, G^*\Psi^*)\Big| \le \int dx\, |f(x)|\,|(GU(x)\Psi, \Psi^*)| \\
&\le \int dx\, |f(x)|\,\|GU(x)\Psi\|\,\|\Psi^*\| \le \int dx\, |f(x)|\,Q(x)\,\|\Psi^*\|
\end{aligned} \tag{3.5}$$
for some polynomial $Q$, depending only on $\Psi$ by the temperateness assumption. Hence $\Psi(f)$ is an element of the domain of $G^{**} = G$, and the same holds true for $U(x)\Psi(f) = \Psi(f_x)$, where $f_x(y) = f(y - x)$, $y \in \mathbb R^d$, is the translated test function. The continuity of $x \mapsto GU(x)\Psi(f)$ and temperateness follow from the estimate $\|G(U(x)\Psi(f) - \Psi(f))\| \le \int dy\, |f_x(y) - f(y)|\,Q(y)$ with the same polynomial $Q$ as above. Hence $\Psi(f)$ is an element of the domain of temperateness $D$. The corresponding statements for $G^*$ are established in the same manner.

Picking any single particle state $\Psi_1 \in D$ with spectral support in a given compact region on the mass shell, let us turn next to the interpretation of the vectors $G(x)\Psi_1$. As $x \mapsto G(x)\Omega$ is a solution of the Klein–Gordon equation, one may expect, guided by the LSZ asymptotic condition, that these vectors describe asymptotic two-particle states. But in view of the weak localization properties of the generators $G$ and the domain problems involved, some care is needed in the analysis.
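The spectral-support claim used in part (b) of the proof above can be made explicit with the functional calculus of the energy-momentum operators. The sketch below assumes the conventions $U(x) = e^{iPx}$ with $Px = P_0x_0 - \mathbf P\mathbf x$ and $\tilde f(p) = (2\pi)^{-d/2}\int dx\, f(x)\,e^{ipx}$; these sign and normalization choices are assumptions made for illustration.

```latex
% Assumptions: U(x) = e^{iPx}, \tilde f(p) = (2\pi)^{-d/2} \int dx\, f(x)\, e^{ipx}.
\[
  \Psi(f) = \int dx\, f(x)\,U(x)\Psi
          = \int dx\, f(x)\,e^{iPx}\,\Psi
          = (2\pi)^{d/2}\,\tilde f(P)\,\Psi .
\]
% Consequently, with E(\cdot) the joint spectral resolution of P:
\[
  E(\Delta)\,\Psi(f) = 0
  \quad\text{whenever}\quad
  \Delta \cap \operatorname{supp}\tilde f = \emptyset ,
\]
% i.e. the spectral support of \Psi(f) is contained in
% \operatorname{supp}\tilde f \cap \operatorname{sp} U .
```

Choosing $\operatorname{supp}\tilde f$ inside a small neighborhood of a point on the mass shell, and away from the rest of the spectrum (3.1), therefore produces a single particle state localized near that point in momentum space.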
We rely in our argument on an approach to collision theory established by Hepp [11] for the proof of the LSZ reduction formulas in the general framework of local quantum field theory, cf. also [12]. The main ingredients are quasilocal operators $A(f_t)$ of the form
$$A(f_t) = \int dx\, f_t(x)\, A(x). \tag{3.6}$$
Here $A \in \mathcal A(\mathcal O)$ are local operators, where the localization region $\mathcal O$ is held fixed in the following, and the functions $f_t$, $t \in \mathbb R$, are given by
$$f_t(x) = (2\pi)^{-d/2} \int dp\, \tilde f(p)\, e^{i(p_0 - \omega_{\mathbf p})t}\, e^{-ipx}, \tag{3.7}$$
where $f \in \mathcal S(\mathbb R^d)$ and $\omega_{\mathbf p} = (\mathbf p^2 + m^2)^{1/2}$. If $\tilde f$ has support in a sufficiently small neighborhood of some point on the mass shell, $A(f_t)\Omega$ is an element of $E_m\mathcal H$ which does not depend on $t$. Moreover,
$$\lim_{t \to \mp\infty} A(f_t)\Psi = A(f)^{\mathrm{in/out}}\,\Psi, \qquad \lim_{t \to \mp\infty} A(f_t)^*\Psi = A(f)^{\mathrm{in/out}\,*}\,\Psi, \tag{3.8}$$
where $A(f)^{\mathrm{in}}$, $A(f)^{\mathrm{out}}$ are the creation operators of an incoming, respectively outgoing particle which is in the state $A(f)\Omega$, and their adjoints $A(f)^{\mathrm{in}\,*}$, $A(f)^{\mathrm{out}\,*}$ are the corresponding annihilation operators.
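The statement that the vector obtained by applying $A(f_t)$ to the vacuum does not depend on $t$ follows from a short computation. The sketch below writes $\Omega$ for the vacuum vector and assumes $A(x) = U(x)AU(x)^{-1}$ with $U(x) = e^{iPx}$, where $Px = P_0x_0 - \mathbf P\mathbf x$; the sign convention for $U$ is an assumption chosen to match the Fourier convention in (3.7).

```latex
% Assumptions: A(x) = U(x) A U(x)^{-1}, U(x) = e^{iPx}, U(x)\Omega = \Omega.
\[
  A(f_t)\Omega = \int dx\, f_t(x)\,U(x)A\Omega
  = (2\pi)^{-d/2} \int dp\, \tilde f(p)\, e^{i(p_0 - \omega_{\mathbf p})t}
    \int dx\, e^{i(P - p)x}\, A\Omega .
\]
% Using \int dx\, e^{i(P-p)x} = (2\pi)^d\, \delta(P - p) in the functional calculus of P:
\[
  A(f_t)\Omega = (2\pi)^{d/2}\, \tilde f(P)\, e^{i(P_0 - \omega_{\mathbf P})t}\, A\Omega .
\]
% If supp \tilde f meets the spectrum only on the mass shell, then \tilde f(P) = \tilde f(P) E_m,
% and P_0 = \omega_{\mathbf P} on E_m \mathcal H, so the time-dependent phase equals 1:
\[
  A(f_t)\Omega = (2\pi)^{d/2}\, \tilde f(P)\, E_m A\Omega \in E_m\mathcal H
  \quad\text{for all } t \in \mathbb R .
\]
```

The time dependence built into $f_t$ is thus invisible on single particle states and only affects the components of a vector off the mass shell.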
These asymptotic relations have been established in [11, 12] for some dense set of “decent” collision states $\Psi$. But, making use of the fact that the operator norms $\|A(f_t)E(\Delta)\|$ and $\|A(f_t)^*E(\Delta)\|$ are uniformly bounded in $t$ for any compact subset $\Delta$ of the spectrum of $U$ [13], they can be extended by continuity to all states with compact spectral support. Thus there holds in particular for the single particle states $\Psi_1$ considered above
$$\lim_{t \to \mp\infty} A(f_t)\Psi_1 = \big(A(f)\Omega \times \Psi_1\big)^{\mathrm{in/out}}, \qquad \lim_{t \to \mp\infty} A(f_t)^*\Psi_1 = (A(f)\Omega, \Psi_1)\,\Omega, \tag{3.9}$$
where we employ the standard notation for collision states.

In the subsequent discussion, we will make use of the support properties of the functions $f_t$ for asymptotic $t$ [11]. Let
$$\Gamma(f) = \{(1, \mathbf p/\omega_{\mathbf p}) : p \in \mathrm{supp}\,\tilde f\} \tag{3.10}$$
be the “velocity support” of $f$ and let $\chi$ be any smooth function which is equal to 1 on $\Gamma(f)$ and vanishes in the complement of some slightly larger region $\Gamma_\varepsilon(f)$. The asymptotically dominant part of $f_t$ is given by $x \mapsto \hat f_t(x) = \chi(x/t)f_t(x)$. It is thus a test function with support in $t\,\Gamma_\varepsilon(f)$, $t \ne 0$. The resulting remainder $x \mapsto \check f_t(x) = (1 - \chi(x/t))f_t(x)$ tends to zero in the topology of $\mathcal S(\mathbb R^d)$ as $t \to \pm\infty$. The decomposition $f_t = \hat f_t + \check f_t$ will be repeatedly used in the following arguments.

Assuming for the sake of concreteness that the given generator $G$ is affiliated with $\mathcal A(\mathcal W_1)$, where $\mathcal W_1$ is the wedge defined in (2.2), we introduce the following partial ordering of sets with reference to that wedge.

Definition. Let $\Gamma_a, \Gamma_b \subset \mathbb R^d$ be compact sets. $\Gamma_a$ is said to be a precursor of $\Gamma_b$, $\Gamma_a \prec \Gamma_b$ in formula form, if $\Gamma_b - \Gamma_a$ (the set of all difference vectors) is contained in $\mathcal W_1$.

Since $\Gamma_a$, $\Gamma_b$ are compact, the set $\Gamma_b - \Gamma_a$ is compact as well. Hence, as $\mathcal W_1$ is an open cone, it follows from $\Gamma_a \prec \Gamma_b$ that there is some $\delta > 0$ such that $t\,\Gamma_b - t\,\Gamma_a \subset \mathcal W_1 + (0, t\delta, 0, \dots, 0)$ for $t > 0$. In view of $(\mathcal W_1 + x)' = -\mathcal W_1 + x$, this implies $t\,\Gamma_a + (0, t\delta, 0, \dots, 0) \subset (\mathcal W_1 + t\,\Gamma_b)'$, $t > 0$, i.e. the sets $t\,\Gamma_a$ and $(\mathcal W_1 + t\,\Gamma_b)$ are spacelike separated and their spatial distance increases linearly with $t$.

We shall apply the above order relation to the velocity supports of test functions, defined in (3.10), as well as to the velocity supports of single particle states, which are defined in an analogous manner by
$$\Gamma(\Psi_1) = \{(1, \mathbf p/\omega_{\mathbf p}) : p \in \mathrm{supp}\,\Psi_1\}, \tag{3.11}$$
where $\mathrm{supp}\,\Psi_1$ is the spectral support of $\Psi_1$. After these preparations, we can clarify now the interpretation of the vectors $G(g)\Psi_1 = \int dx\, g(x)G(x)\Psi_1$ in terms of asymptotic two-particle states.

Lemma 3.2. Let $G$ be a temperate polarization-free generator of mass $m$ which is localized in the wedge $\mathcal W_1$, let $\Psi_1$ be a single particle state in its domain $D$ of temperateness with compact spectral support, and let $g \in \mathcal S(\mathbb R^d)$ be any test function whose Fourier transform has support in a sufficiently small neighborhood of some point on the mass shell. Then
(a) $G(g)\Psi_1 = (G(g)\Omega \times \Psi_1)^{\mathrm{in}}$ if $\Gamma(g) \prec \Gamma(\Psi_1)$,
(b) $G(g)\Psi_1 = (G(g)\Omega \times \Psi_1)^{\mathrm{out}}$ if $\Gamma(\Psi_1) \prec \Gamma(g)$.

Proof. Let $\Psi^*$ be any vector in the domain of temperateness of $G^*$ with compact spectral support. Then
$$(G(g)\Psi_1, \Psi^*) = \int dx\, g^*(x)\,(\Psi_1, G^*(x)\Psi^*) = (\Psi_1, G^*(g^*)\Psi^*), \tag{3.12}$$
where $G^*(g^*)\Psi^* = \int dx\, g^*(x)G^*(x)\Psi^*$ is defined as a strong integral. Now by the Reeh–Schlieder property of $\Omega$, there is for any $\delta > 0$ an $A \in \mathcal A(\mathcal O)$ and a test function $f$ whose Fourier transform has support in any given neighbourhood of $\mathrm{supp}\,\Psi_1$ such that $\|\Psi_1 - A(f)\Omega\| < \delta$ and $A(f)\Omega \in E_m\mathcal H$. In view of the latter fact, one may replace $f$ in $A(f)\Omega$ by any member of the corresponding family of test functions $f_t$, defined in (3.7). Proceeding to the decomposition $f_t = \hat f_t + \check f_t$ and taking into account that $\|A(\check f_t)\Omega\| \le \int dx\, |\check f_t(x)|\,\|A\| \to 0$ as $t \to \pm\infty$, it follows that
$$A(f)\Omega = A(f_t)\Omega = \lim_{t \to \pm\infty} A(\hat f_t)\Omega. \tag{3.13}$$
Similarly, since $x \mapsto G^*(x)\Psi^*$ is a weak solution of the Klein–Gordon equation according to Lemma 3.1, one may replace $g$ in $G^*(g^*)\Psi^*$ by $g_t$. For
$$(g_t - g)(x) = (\Box + m^2)\,(2\pi)^{-d/2} \int dp\, \frac{\tilde g(p)\,\big(1 - e^{i(p_0 - \omega_{\mathbf p})t}\big)}{(p_0 + \omega_{\mathbf p})(p_0 - \omega_{\mathbf p})}\, e^{-ipx}, \tag{3.14}$$
and the expression under the integral is a test function because of the support properties of $\tilde g$. Making use of the decomposition $g_t = \hat g_t + \check g_t$ and temperateness, which implies
$$\|G^*(\check g_t^*)\Psi^*\| \le \int dx\, |\check g_t(x)|\,\|G^*U(x)\Psi^*\| \le \int dx\, |\check g_t(x)|\,Q(x) \tag{3.15}$$
for some polynomial $Q$, one finds that $G^*(\check g_t^*)\Psi^* \to 0$ as $t \to \pm\infty$ and
$$G^*(g^*)\Psi^* = G^*(g_t^*)\Psi^* = \lim_{t \to \pm\infty} G^*(\hat g_t^*)\Psi^*, \tag{3.16}$$
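strongly. Combining these facts, one gets
$$\begin{aligned}
(A(f)\Omega, G^*(g^*)\Psi^*) &= \lim_{t \to \pm\infty} (A(\hat f_t)\Omega, G^*(\hat g_t^*)\Psi^*) \\
&= \lim_{t \to \pm\infty} \int dx\, \hat g_t^*(x)\,(A(\hat f_t)\Omega, G^*(x)\Psi^*).
\end{aligned} \tag{3.17}$$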
According to the choice of the test function f , its velocity support '(f ) is contained in a small neighborhood 'ε (1 ) of '(1 ), and consequently the operators A(fˆt ) are localized in O + t 'ε (1 ). On the other hand, the operators G∗ (x), appearing under the integral in (3.17), are affiliated with A(W1 + x), x ∈ t 'ε (g). Because of locality, they commute with A(fˆt ) on their respective domains if W1 + t 'ε (g) is spacelike separated from O + t 'ε (1 ). In case (a) of the statement, there holds −'(1 ) ≺ −'(g) and therefore also −'ε (1 ) ≺ −'ε (g) if the respective neighborhoods are suitably chosen. Hence, according to the above geometrical considerations, the regions −|t| 'ε (1 ) and W1 − |t| 'ε (g)
are spacelike separated, and their spatial distance increases linearly with $|t|$. Because of the latter fact and since $\mathcal{O}$ is bounded, the two regions $\mathcal{O} + t\,\Gamma_\varepsilon(\Psi_1)$ and $W_1 + t\,\Gamma_\varepsilon(g)$ are spacelike separated if $t < 0$ and $|t|$ is sufficiently large. One can then reexpress the integral in (3.17) according to
\[ \int dx\, \hat g_t^{\,*}(x)\, (A(\hat f_t)\Omega, G^*(x)\Psi^*) = \int dx\, \hat g_t^{\,*}(x)\, (\Omega, G^*(x)\, A(\hat f_t)^*\Psi^*) = (G(\hat g_t)\Omega, A(\hat f_t)^*\Psi^*). \tag{3.18} \]
In the latter expression, one can now reverse the passage from $f_t$, $g_t$ to their respective asymptotically dominant parts, taking into account that, in the limit of asymptotic $t$, $\|G(\check g_t)\Omega\| \to 0$, $\|A(\check f_t)\| \to 0$, and $\|A(f_t)^*\Psi^*\| \le \mathrm{const}$ since $\Psi^*$ has compact spectral support [13]. Hence, by a straightforward estimate, one finds that $(G(\hat g_t)\Omega, A(\hat f_t)^*\Psi^*)$ and $(G(g_t)\Omega, A(f_t)^*\Psi^*) = (A(f_t)\,G(g)\Omega, \Psi^*)$ have the same limit as $t \to -\infty$. Plugging this information into relation (3.17) and making use of the asymptotic formula (3.9), one arrives at
\[ (A(f)\Omega, G^*(g^*)\Psi^*) = \lim_{t \to -\infty} (A(f_t)\,G(g)\Omega, \Psi^*) = ((A(f)\Omega \times G(g)\Omega)^{\mathrm{in}}, \Psi^*). \tag{3.19} \]
In the resulting equation, one can replace $A(f)\Omega$ by $\Psi_1$ since $\|\Psi_1 - A(f)\Omega\| < \delta$, where $\delta > 0$ was arbitrary, and
\[ \|(\Psi_1 \times G(g)\Omega)^{\mathrm{in}} - (A(f)\Omega \times G(g)\Omega)^{\mathrm{in}}\| \le \sqrt{2}\, \|\Psi_1 - A(f)\Omega\|\, \|G(g)\Omega\|, \tag{3.20} \]
by the Fock structure of collision states. In view of relation (3.12), this completes the proof of part (a) of the statement. The proof of part (b) is similar, but now one has to take into account that the regions $\mathcal{O} + t\,\Gamma_\varepsilon(\Psi_1)$ and $t\,\Gamma_\varepsilon(g)$ are spacelike separated if $t > 0$ is sufficiently large. So in this case one arrives at an interpretation of the vectors $G(g)\Psi_1$ in terms of outgoing collision states.

In the next step, we establish a weak form of commutation relations between the operators $G(x)$ and the asymptotic creation and annihilation operators. The result will enable us to compute scattering amplitudes.

Lemma 3.3. Let $\Psi$, $\Psi^*$ be vectors with compact spectral support in the domains of temperateness of $G$ and $G^*$, respectively, and let $f$, $g$ be test functions whose Fourier transforms have support in small neighborhoods of points on the mass shell. Then
(a) $(G(g)\Psi, A(f)^{\mathrm{in}\,\bullet}\Psi^*) = (A(f)^{\mathrm{in}\,\bullet\,*}\Psi, G^*(g^*)\Psi^*)$ if $\Gamma(g) \prec \Gamma(f)$,
(b) $(G(g)\Psi, A(f)^{\mathrm{out}\,\bullet}\Psi^*) = (A(f)^{\mathrm{out}\,\bullet\,*}\Psi, G^*(g^*)\Psi^*)$ if $\Gamma(f) \prec \Gamma(g)$.
Here the symbol $X^\bullet$ stands for both the operator $X$ and its adjoint $X^*$.

Proof. The argument is very similar to the proof of the preceding lemma and it therefore suffices to indicate the main steps. In case (a) one has, in view of the fact that $G(g_t)\Psi = G(g)\Psi$, $t \in \mathbb{R}$, and the asymptotic relation (3.8),
\[ (G(g)\Psi, A(f)^{\mathrm{in}\,\bullet}\Psi^*) = (G(g_t)\Psi, A(f)^{\mathrm{in}\,\bullet}\Psi^*) = \lim_{t \to -\infty} (G(g_t)\Psi, A(f_t)^{\bullet}\Psi^*) = \lim_{t \to -\infty} (G(\hat g_t)\Psi, A(\hat f_t)^{\bullet}\Psi^*), \tag{3.21} \]
where, in the last step, $f_t$, $g_t$ have been replaced by their asymptotically dominant parts. Since $\Gamma(g) \prec \Gamma(f)$, the regions $W_1 + t\,\Gamma_\varepsilon(g)$ and $\mathcal{O} + t\,\Gamma_\varepsilon(f)$ are spacelike separated for $t < 0$ and $|t|$ sufficiently large. Hence, by locality,
\[ \lim_{t \to -\infty} (G(\hat g_t)\Psi, A(\hat f_t)^{\bullet}\Psi^*) = \lim_{t \to -\infty} (A(\hat f_t)^{\bullet\,*}\Psi, G^*(\hat g_t^{\,*})\Psi^*) = \lim_{t \to -\infty} (A(f_t)^{\bullet\,*}\Psi, G^*(g_t^{\,*})\Psi^*) = (A(f)^{\mathrm{in}\,\bullet\,*}\Psi, G^*(g^*)\Psi^*), \tag{3.22} \]
where, in the second equality, the transition from $f_t$, $g_t$ to the asymptotically dominant parts has been reversed and, in the last step, the asymptotic relation (3.8) has been used as well as the fact that $G^*(g_t^{\,*})\Psi^* = G^*(g^*)\Psi^*$, $t \in \mathbb{R}$. This establishes statement (a). The proof of (b) is analogous.

In a final step, we have to determine the spectral support of $G\Omega$ with respect to the spatial momentum operators $\mathbf{P}$ in order to see which single particle states can be generated by $G$ from the vacuum.

Lemma 3.4. Let $G$ be a polarization-free generator which is affiliated with the algebra $\mathcal{A}(W_1)$. The spectral support of $G\Omega$ with respect to the spatial momentum operators $\mathbf{P} = (P_1, P_2, \dots, P_{d-1})$ is equal to $\mathbb{R} \times C$, where $C \subset \mathbb{R}^{d-2}$ is a closed set with open interior.

Proof. Let $A$ be any local operator which is localized in $W_1$. Since $x \mapsto G(x)\Omega$ and $x \mapsto G^*(x)\Omega$ are solutions of the Klein–Gordon equation of mass $m$, the commutator function
\[ x \mapsto C(x) = (A\Omega, G(x)\Omega) - (G^*(x)\Omega, A^*\Omega) \tag{3.23} \]
can be represented in the form
\[ C(x) = \int \frac{d\mathbf{p}}{2\omega_{\mathbf p}}\, \bigl(K_+(\mathbf{p})\, e^{i\omega_{\mathbf p} x_0} - K_-(\mathbf{p})\, e^{-i\omega_{\mathbf p} x_0}\bigr)\, e^{-i\mathbf{p}\mathbf{x}}. \tag{3.24} \]
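Here the functions $K_\pm$ are given by
\[ K_+(\mathbf{p}) = (A\Omega)(\mathbf{p})^{*}\,(G\Omega)(\mathbf{p}), \qquad K_-(\mathbf{p}) = (G^*\Omega)(-\mathbf{p})^{*}\,(A^*\Omega)(-\mathbf{p}), \tag{3.25} \]
where $\mathbf{p} \mapsto (A\Omega)(\mathbf{p})$ is the momentum space wave function of the single particle vector $E_m A\Omega$, and similarly for the other terms. Because of the localization properties of the operators $A$ and $G$, the commutator function $x \mapsto C(x)$ and its time derivative vanish at time $x_0 = 0$ in the half space $\{\mathbf{x} \in \mathbb{R}^{d-1} : x_1 > 0\}$. In view of the representation (3.24), this implies that the functions
\[ \mathbf{p} \mapsto \frac{1}{\omega_{\mathbf p}}\bigl(K_+(\mathbf{p}) - K_-(\mathbf{p})\bigr) \quad\text{and}\quad \mathbf{p} \mapsto \bigl(K_+(\mathbf{p}) + K_-(\mathbf{p})\bigr) \tag{3.26} \]
can be analytically continued in $p_1$ into the lower half plane. As $\mathbf{p} \mapsto \omega_{\mathbf p}$ is analytic in $p_1$ in a strip about the origin, the functions $\mathbf{p} \mapsto K_\pm(\mathbf{p})$ can likewise be analytically continued in $p_1$ into some strip of the lower half plane.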
Now if $U \subset \mathbb{R}$ and $V \subset \mathbb{R}^{d-2}$ are open sets such that $\mathbf{p} \mapsto (G\Omega)(\mathbf{p})$ vanishes for almost all spatial momenta $\mathbf{p} = (p_1, \mathbf{p}_\perp)$ with $p_1 \in U$ and $\mathbf{p}_\perp \in V$, the function $\mathbf{p} \mapsto K_+(\mathbf{p})$ vanishes for these momenta as well. Being the boundary value of an analytic function with respect to $p_1$, $K_+(\mathbf{p})$ therefore vanishes for all $p_1 \in \mathbb{R}$ and $\mathbf{p}_\perp \in V$. Since $A \in \mathcal{A}(W_1)$ was arbitrary and the set of single particle states $E_m A\Omega$, $A \in \mathcal{A}(W_1)$, is dense in $E_m\mathcal{H}$, this implies $(G\Omega)(\mathbf{p}) = 0$ for $p_1 \in \mathbb{R}$ and $\mathbf{p}_\perp \in V$. Thus the complement of the support of $\mathbf{p} \mapsto (G\Omega)(\mathbf{p})$ in momentum space $\mathbb{R}^{d-1}$ has the form $\mathbb{R} \times V$, where $V \subset \mathbb{R}^{d-2}$ is open. Hence, disregarding sets of measure 0, the statement follows since $G\Omega$ is different from zero.

We have now accumulated sufficient information to proceed to the computation of the scattering amplitudes in the underlying theory, provided the dimension of spacetime is larger than two. Let $\mathbf{p} = (p_1, \mathbf{p}_\perp)$ be any vector in the spectral support of $G\Omega$ with respect to the spatial momentum operators and let $p_1 < 0$ and $\mathbf{p}_\perp \neq 0$. We pick a test function $g$ whose Fourier transform has support in a sufficiently small neighborhood of $(\omega_{\mathbf p}, \mathbf{p})$, and a single particle state $\Psi_1$ which is an element of the domain of temperateness of $G$ with spectral support in a small neighborhood of $(\omega_{\mathbf p}, -\mathbf{p})$. Hence $\Gamma(g) \prec \Gamma(\Psi_1)$ and consequently $G(g)\Psi_1 = (G(g)\Omega \times \Psi_1)^{\mathrm{in}}$ by Lemma 3.2. As $\mathbf{p}_\perp \neq 0$, there are spatial momenta $\mathbf{q}$ with $q_1 < p_1$ and $|\mathbf{q}| = |\mathbf{p}|$ (here the dimension of spacetime enters). For any such $\mathbf{q}$, we choose a test function $f$ whose Fourier transform has support in a small neighborhood of $(\omega_{\mathbf q}, \mathbf{q})$ such that $\Gamma(f) \prec \Gamma(g)$. Finally, we pick a single particle state $\Psi_1^*$ in the domain of temperateness of $G^*$ with spectral support about $(\omega_{\mathbf q}, -\mathbf{q})$. After these preparations, we can apply Lemma 3.3 and compute
\[ ((G(g)\Omega \times \Psi_1)^{\mathrm{in}}, (A(f)\Omega \times \Psi_1^*)^{\mathrm{out}}) = (G(g)\Psi_1, A(f)^{\mathrm{out}}\Psi_1^*) = (A(f)^{\mathrm{out}\,*}\Psi_1, G^*(g^*)\Psi_1^*) = 0, \tag{3.27} \]
where, in the last step, we used the fact that $A(f)^{\mathrm{out}\,*}\Psi_1 = (A(f)\Omega, \Psi_1)\,\Omega = 0$ since $\Gamma(f) \prec \Gamma(\Psi_1)$. Varying $f$, $g$ and $\Psi_1$, $\Psi_1^*$ within the above limitations, it follows that elastic scattering processes of two particles with initial momenta about $\mathbf{p}$, $-\mathbf{p}$ and final momenta about $\mathbf{q}$, $-\mathbf{q}$, although admitted by the energy–momentum conservation law, do not occur in the underlying theory.

This result implies that the elastic two-particle scattering amplitude $T$ vanishes identically. For the proof of this statement, we recall that in a relativistic theory $T = T(s, t)$ is a distribution with respect to the invariants $s$, $t$ (the squares of the energy in the center of mass system and the momentum transfer, respectively). Thus for $\mathbf{p}$, $\mathbf{q}$ as above, $s = 4(m^2 + \mathbf{p}^2)$ and $t = -2\mathbf{p}^2(1 - \cos\theta)$, where $\theta$ is the scattering angle. It is a well-known consequence of locality, relativistic covariance, and the form of the energy–momentum spectrum that $T(s, t)$ is, in the physical region $s \ge 4m^2 - t$ for fixed $t \le 0$, the boundary value from $\mathrm{Im}\, s > 0$ of an analytic function in the cut $s$-plane. On the other hand, for fixed $s$, it is analytic in the variable $\cos\theta$ in the Lehmann ellipse [14] with foci at $\pm 1$ and semi-minor axis of length $6m^2/\sqrt{s(s - 4m^2)}$, cf. [15] for an exposition of these basic facts.

By the preceding computations, we know that the scattering amplitude vanishes for $s$, $t$ in some open set. In fact, taking into account that all momenta $\mathbf{p} = (p_1, \mathbf{p}_\perp)$ with $p_1 \in \mathbb{R}_-$ belong to the spectral support of $G\Omega$ for some fixed $\mathbf{p}_\perp \neq 0$ (cf. Lemma 3.4) and varying $\mathbf{q}$ within the above limitations, we get
\[ T(s, t) = 0 \quad\text{for } s > 4(m^2 + |\mathbf{p}_\perp|^2), \quad 0 > t > -4|\mathbf{p}_\perp|^2. \tag{3.28} \]
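The kinematical relations $s = 4(m^2 + \mathbf{p}^2)$ and $t = -2\mathbf{p}^2(1 - \cos\theta)$ used above can be checked on a concrete configuration. The following is only an illustrative sketch: the function name `mandelstam_elastic` and the sample momenta are assumptions made for this example and are not part of the original argument.

```python
import math

def mandelstam_elastic(m, p, q):
    """Return (s, t) for the elastic process (p, -p) -> (q, -q) of particles of mass m.

    p and q are spatial momenta given as tuples of floats with |p| == |q|.
    Hypothetical helper, written only to illustrate the kinematics.
    """
    p2 = sum(c * c for c in p)
    q2 = sum(c * c for c in q)
    if not math.isclose(p2, q2, rel_tol=1e-12):
        raise ValueError("elastic kinematics requires |p| == |q|")
    energy = math.sqrt(m * m + p2)
    # s: squared total energy in the centre-of-mass frame (total momentum vanishes).
    s = (2.0 * energy) ** 2
    # t: squared momentum transfer; the energies agree, so only the spatial part enters.
    t = -sum((a - b) ** 2 for a, b in zip(p, q))
    return s, t

# Sample configuration in d = 3: p_1 < 0, p_perp != 0, and q_1 < p_1 with |q| = |p|.
m, p, q = 1.0, (-0.6, 0.8), (-0.8, 0.6)
s, t = mandelstam_elastic(m, p, q)
p2 = sum(c * c for c in p)
cos_theta = sum(a * b for a, b in zip(p, q)) / p2
print(s, 4.0 * (m * m + p2))           # both expressions for s
print(t, -2.0 * p2 * (1.0 - cos_theta))  # both expressions for t
```

For this sample the values of $(s, t)$ also lie inside the region stated in (3.28), since $|\mathbf{p}_\perp|^2 = 0.64$.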
By analyticity in $\cos\theta$, this equality extends to all scattering angles and hence to all $t$ for the given range of $s$. Analyticity of $T(s, t)$ in $s$ then implies that the scattering amplitude vanishes everywhere. It is a well-known consequence of this result that then there can be no non-trivial multi-particle scattering or particle production either [16]. These implications hold in any number $d > 2$ of spacetime dimensions [17]. So we arrive at the following statement.

Theorem 3.5. If in a local, relativistic quantum field theory of a scalar massive particle in $d > 2$ spacetime dimensions there exists a temperate polarization-free generator, then the scattering matrix is trivial.

For the sake of simplicity, we have restricted attention in the preceding analysis to theories describing a single scalar massive particle. But it should be clear from our discussion that similar results hold in theories with a more complex particle spectrum. There one finds that particles whose states can be generated from the vacuum by temperate polarization-free generators do not participate in collision processes. For the derivation of this result it is actually not necessary to assume that the collision states can be constructed by local operators. Cone-like localized "interpolating operators", whose existence has been established in all theories of massive particles [18], are completely sufficient for the proof. We therefore conclude that in the presence of interaction there is no room for temperate polarization-free generators in more than two spacetime dimensions.

4. Polarization-Free Generators in Two Dimensions

The analysis in the preceding section did not lead to any restrictions on the form of the elastic scattering amplitudes in two spacetime dimensions. The reason is that the configurations of asymptotic particle momenta which would allow one to show that the scattering amplitudes have to vanish in the presence of temperate polarization-free generators cannot occur in this case because of the energy-momentum conservation law. So non-trivial theories admitting polarization-free generators can and do exist in two spacetime dimensions [4]. It therefore seems worthwhile to have a closer look at the type of constraints imposed on such theories from the present general point of view.

In order to abbreviate this discussion, we assume in the following that the domains of temperateness of $G$ and $G^*$ contain incoming and outgoing collision states for arbitrary configurations of particle momenta. Lemma 3.3 then implies that on these states
\[ G(g)\, A(f)^{\mathrm{in}\,*} = A(f)^{\mathrm{in}\,*}\, G(g) \quad\text{if } \Gamma(g) \prec \Gamma(f), \tag{4.1} \]
\[ G(g)\, A(f)^{\mathrm{out}} = A(f)^{\mathrm{out}}\, G(g) \quad\text{if } \Gamma(f) \prec \Gamma(g), \tag{4.2} \]
and similarly for $G^*(g^*)$. For the lemma says that on the domain of temperateness $G^*(g^*)^*\, A(f)^{\mathrm{in}\,*} \supset A(f)^{\mathrm{in}\,*}\, G(g)$ if $\Gamma(g) \prec \Gamma(f)$, say, and with the above domain assumptions one can replace the triple-star expression $G^*(g^*)^*$ by $G(g)$. It turns out that these commutation relations imply that there can be no particle production in the underlying theory. In the proof of this statement, we make use of the following lemma.

Lemma 4.1. Let $f, g_1, \dots, g_n$ be test functions whose Fourier transforms have support about points on the mass shell such that $\Gamma(g_1) \prec \dots \prec \Gamma(g_n) \prec \Gamma(f)$ and let $A, A_1, \dots, A_n \in \mathcal{A}(\mathcal{O})$ be arbitrary local operators. Then
\[ A(f)^{\mathrm{in}\,*}\, (A_1(g_1)\Omega \times \dots \times A_n(g_n)\Omega)^{\mathrm{out}} = 0. \tag{4.3} \]
Proof. The proof is based on induction in $n$. For $n = 1$, one has
\[ A(f)^{\mathrm{in}\,*} A_1(g_1)\Omega = (A(f)\Omega, A_1(g_1)\Omega)\,\Omega = 0 \tag{4.4} \]
because of the support properties of $f$, $g_1$ in momentum space. Assuming that the statement holds for $n$, let $g$ be a test function whose Fourier transform has support about points on the mass shell such that $\Gamma(g_1) \prec \dots \prec \Gamma(g_n) \prec \Gamma(g) \prec \Gamma(f)$. It then follows from relation (4.2) that
\[ G(g)\, A_1(g_1)^{\mathrm{out}} \cdots A_n(g_n)^{\mathrm{out}}\,\Omega = A_1(g_1)^{\mathrm{out}} \cdots A_n(g_n)^{\mathrm{out}}\, G(g)\Omega = (A_1(g_1)\Omega \times \dots \times A_n(g_n)\Omega \times G(g)\Omega)^{\mathrm{out}}. \tag{4.5} \]
Hence, by relation (4.1) and the induction hypothesis, one obtains
\[ A(f)^{\mathrm{in}\,*}\, (A_1(g_1)\Omega \times \dots \times A_n(g_n)\Omega \times G(g)\Omega)^{\mathrm{out}} = A(f)^{\mathrm{in}\,*}\, G(g)\, A_1(g_1)^{\mathrm{out}} \cdots A_n(g_n)^{\mathrm{out}}\,\Omega = G(g)\, A(f)^{\mathrm{in}\,*}\, A_1(g_1)^{\mathrm{out}} \cdots A_n(g_n)^{\mathrm{out}}\,\Omega = 0. \tag{4.6} \]
Now let $A_{n+1} \in \mathcal{A}(\mathcal{O})$ be any local operator and $g_{n+1}$ any test function such that $\Gamma(g_1) \prec \dots \prec \Gamma(g_n) \prec \Gamma(g_{n+1}) \prec \Gamma(f)$. There exists for given $\delta > 0$ a test function $g$ as in the preceding step such that $\|A_{n+1}(g_{n+1})\Omega - G(g)\Omega\| < \delta$. For the spectral support of $G\Omega$ consists of the whole mass shell according to Lemma 3.4 and consequently the set of vectors $\{G(g)\Omega : \mathrm{supp}\,\tilde g \subset \Delta\}$ is, for any compact set $\Delta \subset \mathbb{R}^2$, dense in the corresponding spectral subspace $E(\Delta)E_m\mathcal{H}$ of single particle states. As the collision states are continuous with respect to their single particle components, one can thus replace in Eq. (4.6) the vector $G(g)\Omega$ by $A_{n+1}(g_{n+1})\Omega$, proving the statement.

Let us consider now an incoming collision state of two particles with momenta $p_1$, $p_2$ on the mass shell and an outgoing state of $n > 2$ particles with mutually different momenta $q_1, \dots, q_n$. Taking advantage of the fact that in $d = 2$ spacetime dimensions the momenta on the mass shell are linearly ordered, we may assume without loss of generality that $q_1 < \dots < q_n$. Now if the incoming state is to evolve with non-zero probability into this outgoing state, the energy–momentum conservation law requires that $p_1 + p_2 = q_1 + \dots + q_n$. Taking into account that the linear order of momenta is preserved under proper orthochronous Lorentz transformations, it is not difficult to see that at least one of the incoming particle momenta, say $p_1$, has to be strictly larger than any one of the outgoing momenta, i.e. $q_1 < \dots < q_n < p_1$. We pick now test functions $f_1$, $f_2$ and $g_1, \dots, g_n$ which, in momentum space, have support about $p_1$, $p_2$ and $q_1, \dots, q_n$, respectively, such that $\Gamma(g_1) \prec \dots \prec \Gamma(g_n) \prec \Gamma(f_1)$. Thus, for any choice of local operators $A_1, \dots, A_{n+2} \in \mathcal{A}(\mathcal{O})$, we obtain with the help of the preceding lemma
\[ \bigl((A_1(f_1)\Omega \times A_2(f_2)\Omega)^{\mathrm{in}}, (A_3(g_1)\Omega \times \dots \times A_{n+2}(g_n)\Omega)^{\mathrm{out}}\bigr) = \bigl(A_1(f_1)^{\mathrm{in}} A_2(f_2)\Omega, (A_3(g_1)\Omega \times \dots \times A_{n+2}(g_n)\Omega)^{\mathrm{out}}\bigr) = \bigl(A_2(f_2)\Omega, A_1(f_1)^{\mathrm{in}\,*} (A_3(g_1)\Omega \times \dots \times A_{n+2}(g_n)\Omega)^{\mathrm{out}}\bigr) = 0. \tag{4.7} \]
But the set of collision states with non-overlapping momenta is dense in the set of all collision states, so we arrive at the conclusion that an incoming collision state of two particles can never evolve into an outgoing collision state containing more than two particles, i.e. there is no particle production in the underlying theory.
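The kinematical step in the argument above, namely that in $d = 2$ one incoming momentum must exceed every outgoing momentum when two particles go into $n > 2$ particles of the same mass, can be illustrated numerically. The sketch below is not taken from the paper; it assumes the standard rapidity parametrization $p = m(\cosh\vartheta, \sinh\vartheta)$ of the mass shell, and the function name `incoming_rapidities` is hypothetical. Since $\sinh$ is monotone, the order of rapidities coincides with the order of momenta.

```python
import math
import random

def incoming_rapidities(m, outgoing):
    """Rapidities of two incoming particles of mass m with the same total
    energy-momentum as the given outgoing particles (d = 2).

    `outgoing` is a list of rapidities of at least two outgoing particles of mass m.
    """
    energy = sum(m * math.cosh(th) for th in outgoing)
    momentum = sum(m * math.sinh(th) for th in outgoing)
    inv_mass = math.sqrt(energy ** 2 - momentum ** 2)  # total invariant mass
    eta = math.atanh(momentum / energy)                # rapidity of the centre-of-mass frame
    chi = math.acosh(inv_mass / (2.0 * m))             # incoming rapidities are eta -/+ chi
    return eta - chi, eta + chi

random.seed(0)
for _ in range(1000):
    n = random.randint(3, 6)
    out = [random.uniform(-2.0, 2.0) for _ in range(n)]
    lo, hi = incoming_rapidities(1.0, out)
    # The larger incoming rapidity exceeds all outgoing ones (and, by reflection
    # symmetry, the smaller one lies below all of them).
    assert lo < min(out) and max(out) < hi
print("ordering confirmed on all samples")
```

The check reflects the centre-of-mass argument: there the outgoing momenta are all strictly smaller in modulus than the incoming ones, and boosts preserve the order.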
Theorem 4.2. If a local, relativistic quantum field theory in $d = 2$ spacetime dimensions admits temperate polarization-free generators, there is no particle production.

This result shows that in $d = 2$ dimensions temperate polarization-free generators can only exist in the presence of additional conservation laws, besides energy-momentum conservation. We have illustrated this fact with the example of the particle number. By a more refined analysis, one can show that the individual particle momenta also have to be preserved in multi-particle collisions. This brings us close to the structure of scattering amplitudes found in completely integrable models. In particular, the apparently general Ansatz for correlation functions of polarization-free generators, proposed by one of the authors in [4, 19], falls back to this special class of theories. Moreover, there are indications in the present general setting that temperate polarization-free generators necessarily have algebraic properties of the type found in these examples. Thus the notion of temperate polarization-free generator may not only be useful for the characterization of such integrable models, but it might also serve as a tool for their general analysis and classification. This interesting aspect of the present investigation will be discussed elsewhere.

5. Concluding Remarks

Harry Lehmann, one of the pioneers of the rigorous approach to relativistic quantum field theory, liked to mock the sometimes cumbersome subtleties appearing in this setting as "problems of inessential selfadjointness". But his scientific work provides ample evidence that he was willing to invest mathematical diligence and care where the physical context required it. In the present article, we have encountered a surprisingly subtle feature of relativistic quantum field theory: Mathematics tells us, on the one hand, that any such theory accommodates well-defined polarization-free generators.
Physics, on the other hand, implies that these generators necessarily have rather peculiar domain properties which do not allow one to apply methods of Fourier analysis. Their relation to the asymptotic particle interpretation is thereby obscured. Being sloppy with regard to these domain properties, one would be led to the unpleasant conclusion that the fundamental postulates of relativistic quantum field theory exclude interaction in more than two spacetime dimensions. Thus it is this subtle point which provides the loophole for theories with non-trivial interaction. Temperate polarization-free generators exist, however, in two-dimensional integrable models and the present results indicate that they are a distinctive feature of such theories. This fact may be attributed to the presence of large groups of conservation laws in such theories which help to restrain the polarization effects of local operations. We believe that a more detailed investigation of these temperate generators is warranted and will lead to a better understanding of the specific features of these models. There is another aspect of the present analysis which deserves mentioning, namely the problem of particle statistics in low dimensions. We have discussed here only the simple case of bosons, the case of fermions being similar. But it is well-known that particles in two and three spacetime dimensions can also have anyonic or plektonic statistics (cf. [20, 21] for a systematic analysis of this issue). There is also a general collision theory for such particles [22], but it is an open question whether there exists some kind of associated free fields. A negative result to that effect is due to Mund [23], who proved that there are no operator-valued solutions of the Klein–Gordon equation which generate such particles from the vacuum and are localized in salient (pointed) spacelike cones. This ad hoc
assumption about the localization is, however, crucial for the proof of this no-go theorem. In fact, as in theories of massive anyons and plektons there are still cone-like localized (vacuum polarizing) operators which generate the states of physical interest from the vacuum, one can establish the existence of wedge-localized polarization-free generators for these particles. Taking temperateness as an additional input, it may well be possible to construct from these generators in a systematic manner examples of anyonic or even plektonic theories which come close to the idea of a free field theory.

References

1. Lehmann, H., Symanzik, K. and Zimmermann, W.: Zur Formulierung quantisierter Feldtheorien. Nuovo Cimento 1, 205–225 (1955)
2. Karowski, M. and Weisz, P.: Exact form-factors in (1+1)-dimensional field theoretic models with soliton behavior. Nucl. Phys. B 139, 455–476 (1978)
3. Babujian, H., Fring, A., Karowski, M. and Zapletal, A.: Exact form-factors in integrable quantum field theories: The Sine–Gordon model. Nucl. Phys. B 538, 535–586 (1999)
4. Schroer, B.: Modular wedge localization and the d = (1 + 1) form-factor program. Ann. Phys. (N.Y.) 275, 190–223 (1999)
5. Haag, R.: Local Quantum Physics: Fields, Particles, Algebras. Berlin–Heidelberg–New York: Springer, 1996
6. Borchers, H.-J. and Yngvason, J.: From quantum fields to local von Neumann algebras. Rev. Math. Phys. Special issue, 15–47 (1992)
7. Takesaki, M.: Tomita's Theory of Modular Hilbert Algebras and its Applications. Lecture Notes in Mathematics, Vol. 128, Berlin–Heidelberg–New York: Springer, 1970
8. Borchers, H.-J.: On revolutionizing quantum field theory with Tomita's modular theory. J. Math. Phys. 41, 3604–3673 (2000)
9. Bisognano, J. and Wichmann, E.H.: On the duality condition for a hermitian scalar field. J. Math. Phys. 16, 985–1007 (1975); On the duality condition for quantum fields. J. Math. Phys. 17, 303–321 (1976)
10. Jost, R.: The General Theory of Quantized Fields. Providence, RI: American Math. Soc., 1965
11. Hepp, K.: On the connection between Wightman and LSZ quantum field theory. In: Brandeis University Summer Institute in Theoretical Physics 1965 "Axiomatic Field Theory" Vol. 1 (M. Chretien and S. Deser eds.), New York: Gordon and Breach, 1966, pp. 135–246
12. Araki, H.: Mathematical Theory of Quantum Fields. Int. Series of Monographs on Physics 101, Oxford: Oxford Univ. Press, 1999
13. Buchholz, D.: Harmonic analysis of local operators. Commun. Math. Phys. 129, 631–641 (1990)
14. Lehmann, H.: Analytic properties of scattering amplitudes as functions of momentum transfer. Nuovo Cimento 10, 579–589 (1958)
15. Martin, A.: Scattering Theory: Unitarity, Analyticity and Crossing. Lecture Notes in Physics, Vol. 3, Berlin–Heidelberg–New York: Springer, 1969
16. Åks, S.: Proof that scattering implies production in quantum field theory. J. Math. Phys. 6, 516–532 (1965)
17. Bros, J.: Private communication
18. Buchholz, D. and Fredenhagen, K.: Locality and the structure of particle states. Commun. Math. Phys. 84, 1–54 (1982)
19. Schroer, B.: Particle physics and QFT at the turn of the century: Old principles with new concepts. (An essay on local quantum physics). J. Math. Phys. 41, 3801–3831 (2000)
20. Fredenhagen, K., Rehren, K.H. and Schroer, B.: Superselection sectors with braid group statistics and exchange algebras. 1. General theory. Commun. Math. Phys. 125, 201–226 (1989)
21. Fröhlich, J. and Marchetti, P.A.: Quantum field theories of vortices and anyons. Commun. Math. Phys. 121, 177–223 (1989)
22. Fredenhagen, K. and Gaberdiel, M.R.: Scattering states of plektons (particles with braid group statistics) in (2+1)-dimensional quantum field theory. Commun. Math. Phys. 175, 319–336 (1996)
23. Mund, J.: No go theorem for "free" relativistic anyons in (2+1)–dimension. Lett. Math. Phys. 43, 319–328 (1998)

Communicated by A. Jaffe, G. Mack and W. Zimmermann
Commun. Math. Phys. 219, 141 – 178 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Universal Dynamics, a Unified Theory of Complex Systems. Emergence, Life and Death

G. Mack

II. Institut für Theoretische Physik, Universität Hamburg, Luruper Chaussee 149, 22741 Hamburg, Germany

Received: 20 April 2000 / Accepted: 20 April 2000

Dedicated to the memory of Harry Lehmann

Abstract: A universal framework is proposed, where all laws are regularities of relations between things or agents. Parts of the world at one or all times are modelled as networks called systems with a minimum of axiomatic properties. A notion of locality is introduced by declaring some relations direct (or links). Dynamics is composed of basic constituents called mechanisms. They are conditional actions of basic local structural transformations ("enzymes"): indirect relations become direct (friend of friend becomes friend), links are removed, objects copied. This defines a kind of universal chemistry. I show how to model basic life processes in a self-contained fashion as a kind of enzymatic computation. The framework also accommodates the gauge theories of fundamental physics. Emergence creates new functionality by cooperation: nonlocal phenomena arise out of local interactions. I explain how this can be understood in a reductionist way by multiscale analysis (e.g. renormalization group).

1. Introduction

When we speak about the world, we speak about models of parts of the world which are constructed by the human mind. I postulate that they reflect the structure of human thinking as formulated in the following:

Preaxiom. The human mind thinks about relations between things or agents.

Relations will be interpreted as directed binary relations from a source to a target. Their constitutive property is that they can be composed: think of friend of a friend, brother-in-law, next nearest neighbor, etc. Traditionally, emphasis in physics has been on objects, like atoms or elementary particles. But relations are equally important. They integrate the objects into a network. In adaptive systems, the relations change in time in such a way that the connectivity of the whole network may change. Some mistaken views concerning reductionism or emergence result from neglect of the basic role of relations.
One may regard geometry as an ancestor of relational theories. It knows a relation of parallelism between pairs of tangent vectors which serves to define straight lines and distances. The modern theories of fundamental physics are relational theories. This is true for the established theories, general relativity and the standard model of elementary particle physics. They are geometric theories (gauge theories). It is also true of string theory [14] and of the loop space approach to quantum gravity in Ashtekar variables [2]. Strings, whether open or closed, can be composed when they touch appropriately, and similarly for loops. A general plan of relational biology was put forward by Rashevsky [61] and Rosen [63] decades ago.

There is a mathematical theory of relations, category theory. Lawvere sought a purely categorical foundation of all mathematics, including set theory [42]. The mathematical biologists used category theory from the start. However, category theory lacks an essential ingredient of physical theories, locality. The fundamental physical theories are local in space time in the sense that the basic equations only relate quantities at (infinitesimally) close points in space, and at (infinitesimally) close instances of time. This is the celebrated Nahewirkungsprinzip which was discovered in the last century. Newton's theory does not obey it, but Einstein's general relativity, which supersedes it, does, and so does electrodynamics. The processing of chemically bound atoms and molecules in the living cell is mostly performed by biochemical enzymes, and their action is local: they act somewhere at a time.

Here I postulate a more general locality principle which does not refer to space. Certain relations are singled out as direct relations, called links, and all others are obtained from them by composition.
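To make the locality principle concrete, the following toy sketch represents a system as a set of directed links and recovers an indirect relation as a chain of links. It is only an illustration under simplifying assumptions: the class `System` and its methods are hypothetical names chosen here, and the sketch ignores any further axioms the framework imposes.

```python
from collections import deque

class System:
    """Toy network: objects joined by directed links (the direct relations)."""

    def __init__(self):
        self.links = set()  # pairs (source, target)

    def add_link(self, source, target):
        self.links.add((source, target))

    def relation(self, source, target):
        """Return a shortest chain of links composing to a relation
        source -> target, or None if no such relation exists.
        The empty chain stands for the identity relation of an object."""
        queue = deque([(source, [])])
        seen = {source}
        while queue:
            node, path = queue.popleft()
            if node == target:
                return path
            for (a, b) in sorted(self.links):
                if a == node and b not in seen:
                    seen.add(b)
                    queue.append((b, path + [(a, b)]))
        return None

friends = System()
friends.add_link("alice", "bob")
friends.add_link("bob", "carol")
# "Friend of a friend": an indirect relation obtained by composing two links.
print(friends.relation("alice", "carol"))
# Relations are directed, so nothing leads back from carol to alice.
print(friends.relation("carol", "alice"))
```

Only the links are stored; every other relation is derived from them by composition, which is the sense in which the links define locality in this sketch.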
The generalization is desirable for several reasons: systems not in space, investigations of properties of space time itself, quantum objects in quantum systems^1 whose parts are far apart in space, rapid communication over long distances (as in the Newtonian limit of general relativity), etc.

1 They are not considered in this paper.

The fathers of artificial intelligence did not adopt locality as a default option. This led to such problems as mentioned by Marvin Minsky [58] when he says that a robot needs to be told a lot of facts about its surroundings, for instance that the wall does not fall down when it paints the table in the middle of the room. Without a locality principle, complexity becomes unmanageable [59].

What is assumed is not explained. Therefore, a fundamental physical theory is the more fundamental the less a priori structure is assumed. And a theory of complex systems is the more general the less a priori structure is assumed. Here I propose a general framework which provides a minimum of a priori structure through the definition of a system. It is in the spirit of L. v. Bertalanffy [7], the pioneer of general systems theory, and of Wittgenstein's Tractatus [70]. Basically it defines in a precise way a notion of structure. Its axioms contain essentially no more than my preaxiom and locality.

In contrast with automata theory [71], the framework is supposed to be self-contained. In principle, there are no data in systems other than structure, no states of any part of a system other than structure, and no information exists other than specification of structure. The miracle is how much can be modelled with so little building material. I show models of life processes which exemplify this. The self-containedness of the present framework makes it a natural candidate for implementation on a computer. Software has been written and will be presented elsewhere [55]. It offers the convenience that one may
compose, record and run a program by mouse click, build models of complex systems in this way and simulate dynamical processes.

With locality, models become much more similar to those which physicists are used to. This results in a promising strategy to bring methods of theoretical physics to bear on very general complex systems, including biological and social systems. Actually I want to model not only material parts of the world, but also space time and immaterial constructs of the human mind like propositional logic.

Different kinds of systems are distinguished by structural features which generalize what is known in physics as constraints on initial states. For instance, Gauss' law and a gauge group isomorphic to R are constitutive constraints for an electromagnetic field. Space imbedded in space time is characterized by constraints which may be summarized by saying that there is geometry. And the constitutive property of matter is that it is in space time.

Conservation laws need not be imposed in addition. They follow from requirements of internal consistency. Einstein's equations for space time would be inconsistent without covariant conservation of the energy momentum tensor, and Maxwell's equations of electrodynamics would be inconsistent without conservation of electric charge. For reasons of space, I cannot expand on these aspects here, cp. ref. [50]. But note that Darwinian evolution has competition for scarce resources as a constitutive feature, and scarcity is a consequence of conservation of matter and energy.

There are no numbers in the framework to begin with, but a numerical description can sometimes be obtained by coordinatization. For instance, a gauge group is always defined as a structural property of a System with unitary links (at one time) and through its coordinatization one obtains a numerical description of the gauge fields in lattice gauge theory models of elementary particle physics.
In fact, the constitutive feature of pure gauge theories, differential geometry on principal or associated fibre bundles, can be recovered from the axioms plus one single extra structural assumption, unitarity of links, i.e. forth ◦ back = identity, cp. ref. [49] and Sect. 4.

Let us turn to dynamics. I consider local Markovian dynamics in discrete time. The dynamics is composed from special local structural transformations. They are atomic constituents of dynamics and will be called enzymes. They are the mathematical models not only of the aforementioned biochemical enzymes, but of any kind of agent which causes change locally. Besides, there are predicates which enquire about local structural properties and serve to formulate conditions to which the enzyme's action is subject. The conditional action of an enzyme, i.e. a pair (enzyme, predicate), is called a mechanism. Mechanisms are valuable tools in theoretical immunology [57].

It is not a trivial task to construct a well-defined deterministic dynamics from mechanisms because their actions here and at a neighbouring location may not commute. Since Petri [60] this is known to computer scientists working on parallel computing as the concurrency problem. I solve it with the help of a generalization of the well-known device of Jacobi sweeps [68]. The resulting theory may be viewed as a universal chemistry in which general objects and links substitute for atoms and their chemical bonds,^2 and dynamics can be interpreted as enzymatic computation [55]. The λ-calculus [31] can be implemented, therefore enzymatic computation can do anything a Turing machine can [62].
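The notion of a mechanism as a pair (enzyme, predicate) can be illustrated with a toy example for the transformation "friend of friend becomes friend". The sketch is hypothetical throughout: the function names are chosen here, and the synchronous update, in which every action is evaluated on the old state before any change is applied, is only a simple stand-in for the generalized Jacobi sweeps mentioned in the text, whose actual construction is not reproduced here.

```python
def friend_of_friend(links, a, b, c):
    """Enzyme: turn the indirect relation a -> b -> c into the direct link a -> c."""
    return {(a, c)}

def not_yet_linked(links, a, b, c):
    """Predicate: act only if a and c are distinct and not linked already."""
    return a != c and (a, c) not in links

def sweep(links, mechanism):
    """One synchronous sweep of a mechanism over all locations.

    All conditions are checked against the old set of links, and the resulting
    links are added only afterwards, so the outcome does not depend on the
    order in which locations are visited.
    """
    enzyme, predicate = mechanism
    new_links = set()
    for (a, b) in links:
        for (b2, c) in links:
            if b == b2 and predicate(links, a, b, c):
                new_links |= enzyme(links, a, b, c)
    return links | new_links

mechanism = (friend_of_friend, not_yet_linked)
state = {(1, 2), (2, 3), (3, 4)}
state = sweep(state, mechanism)   # adds (1, 3) and (2, 4), but not yet (1, 4)
print(sorted(state))
state = sweep(state, mechanism)   # now (1, 4) becomes a direct link as well
print(sorted(state))
```

Evaluating all predicates on the old state sidesteps the question of which of two neighbouring actions comes first, which is the concurrency issue raised above in its simplest form.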
2 Actually there is in chemistry another basic relation besides chemical bonds, spatial proximity.
Enzymes share with matter the property of being somewhere. In biochemistry they are tied to material bodies, while in fundamental physics they are imagined to be ubiquitous3 . We regard the presence of mechanisms as part of the specification of the initial state of a system. The universal dynamics says: All mechanisms operate. A System sub specie aeternitatis – i.e. its whole history – is again a system. It is called a drama. The dynamics manifests itself as structural properties of this system. The dynamics can be stochastic. In this case the drama is a random System, and its links, etc. are random variables. When the dynamics is sufficiently stochastic, the drama becomes a classical equilibrium statistical mechanical system, albeit “in one more dimension”, with possible initial conditions now figuring as boundary conditions. There may also be external fields which represent interactions with an environment. Some emergent phenomena in biological systems can be modelled that way and they then appear as instances of familiar phenomena in equilibrium statistical mechanics such as restoration of spontaneously broken symmetries. Elsewhere [48] I illustrate this with the example of a very much simplified model of schools of fish swimming in coherent array which abruptly turn together with no leader guiding the group. Emergence is generally understood as leading to new and often unexpected properties of a whole which are not shared by its isolated parts.4 We regard a system as genuinely complex, if its properties are not all shared by subsystems with few objects. Emergent phenomena which arise in this way are nonlocal phenomena. Yet we want to understand them as a consequence of local interactions. Life is an emergent phenomenon. It involves emergent functionality. 
According to Maturana and Varela [56], living organisms are autopoietic, i.e. characterized by being able to make their own elements.⁵ Typically this involves creation of structure by copying or translation from templates, and preparation of building blocks by digestion, i.e., degradation of structure. The action of the split Fork-enzyme of Sect. 3.1.1 is an example of emergent functionality. The whole enzyme can copy arbitrary systems, but the individual microenzymes from which it can be composed cannot. And when one of them is missing or carries the wrong predicate, the copy-functionality is also lost. Some emergent phenomena like wave propagation can be understood by exploitation of symmetry and linearity or other special methods. Otherwise, a multiscale analysis is called for. Although a genuinely complex system S cannot be understood as a whole by looking at it locally, a complexity reduction is often possible by local considerations. This was the central idea of Wilson’s renormalization group [69, 38]. One constructs an effective theory, i.e. a description in terms of a new system S1 with new objects which represent subsystems, but retain only as much information on their internal structure as is relevant for their cooperation. Links between them are also constructed. Enzymes may be attached which represent functionality of compound enzymes at the smaller scale. The resulting system is still complex, but may have far fewer degrees of freedom.
3 In the canonical approach to classical physics, the dynamics is determined by the Hamiltonian H according to ξ → ξ + {H, ξ}δt. In a field theory, the Hamiltonian is a sum of local pieces which act locally. Call them enzymes. They are composed from basic microenzymes – canonical variables q and p. We may imagine that these enzymes are everywhere, and they stay there. One might want to think of them as dynamical, capable of changing their location or composition (functional form) as a consequence of the dynamics. But that is impossible because whatever H may be, the Poisson bracket {H, H} = 0. In dissipative systems the situation is different.
4 An illuminating discussion of the relevance of concepts in complexity, including emergence, for immunology can be found in I. Cohen’s book [11].
5 Maturana and Varela thought of biological systems only, but Luhmann [46] generalized the notion of autopoiesis to social systems.
Then the procedure of complexity reduction may be iterated, leading to a multilevel description. Mathematically, the chief insight is the relation between block spin constructions in a renormalization group (RG) setup, and collections of dual colimits and factorizing cones in categories. Experience with the rigorous renormalization group approach to gauge theories [5] is valuable because general systems share many features of lattice gauge theories, cp. Sect. 4. In what way they are essentially more general is best seen in Sect. 4.1 on logic. Monte Carlo RG-studies of gauge theories have also been performed, cp. e.g. [29], and the Monte Carlo RG-method is still being improved [9].
2. Structure
This section and the next introduce the basic mathematical framework. Parts of the world at one time as well as their whole histories are modelled as systems with certain axiomatic properties. A more descriptive name would be “local category”. According to L. von Bertalanffy [7], a system is a set of units with relationships between them. I make this precise.
Definition 1 (System). A system is a model of a part of the world as a network of objects X, Y, . . . (which represent things or agents) with arrows f, g, . . . which represent directed relations between them. One writes f : X → Y for a relation from a source (domain) X to a target (codomain) Y. The arrows are characterized by axiomatic properties as follows:
1. Composition. Arrows can be composed. If f : X → Y and g : Y → Z are arrows, then the arrow g ◦ f : X → Z is defined. The composition is associative, i.e. (h ◦ g) ◦ f = h ◦ (g ◦ f).
2. Adjoint. To every arrow f : X → Y there is a unique arrow $f^* : Y \to X$ in the opposite direction, called the adjoint of f. It satisfies $f^{**} = f$ and $(g \circ f)^* = f^* \circ g^*$.
3. Identity. To every object X there is a unique arrow $1_X : X \to X$ which represents the identity of a thing or agent with itself. It satisfies $1_X = 1_X^*$ and $1_Y \circ f = f = f \circ 1_X$ for every arrow f : X → Y.
4. Locality. Some of the arrows are declared direct (or fundamental); they are called links. All arrows f can be made from links by composition and adjunction, $f = b_n \circ \cdots \circ b_1$ (n ≥ 0), where the $b_i$ are links or adjoints of links; the empty product (n = 0) represents the identity.
5. Composites. The objects X are either atomic or Systems. In the latter case, X is said to have internal structure, and the objects of the system X are called its constituents.
6. Non-selfinclusion. A system cannot be its own object or constituent of an object etc. Ultimately, constituents of . . . of constituents are atomic.
The links and objects of a system are called its elements. A system is called connected if there are arrows to all other objects from some (and therefore all) objects. A system is called unfrustrated if there is at most one arrow from X to Y for any objects X, Y.
Axioms 1 and 3 are those of a category. Ignoring specification of links and the *-operation, a system S becomes a category Cat(S). There is a long-standing controversy in philosophy concerning identity, see e.g. Wittgenstein [70], Satz 5.303 or Quine [72]. In systems theory, the identity arrows $1_X$ are as important as the number 0 in arithmetic. Later we shall have occasion to introduce also special arrows between two objects which are identical in the sense of indistinguishable (i.e. copies). By abuse of language they will also be called identity arrows. The idea of an adjoint (Axiom 2) is that a relation in the opposite direction should be specified in some way by any link. There can be different ways in which links can be adjoint. For instance, if objects X, Y are Systems, hence categories, and $f^* : Y \to X$ is left adjoint functor of a functor f (see later) then $f^{**} = f$ is right adjoint functor of $f^*$. Axiom 4 introduces locality as explained in the introduction. Axiom 5 makes the whole scheme self contained. And according to Jacob [35], “Tout objet que considère la biologie représente un système de systèmes” (every object that biology considers represents a system of systems). The totality of statements about systems which are meaningful as a consequence of the axioms will be called the language of thought. Assumption 1. Unless otherwise stated, it is assumed that constituents, constituents of constituents, etc. of objects of S are not objects of S. It must be emphasized that atomicity of an object is not a property of something in the world, but of a particular model which describes some of its aspects on some scale. Objects with internal structure are black boxes. Later on we shall “dissolve” such objects, making their interior structure visible by putting links from some of their constituents, and thereafter they may be treated as atomic although they still stand for the “same” (composite) object. Definition 2 (Types of links).
A link b : X → Y is said to be invertible if there exists an arrow, denoted $b^{-1}$, such that $b^{-1} \circ b = 1_X$ and $b \circ b^{-1} = 1_Y$. It is said to be unitary if $b^* = b^{-1}$. Links whose adjoints are links will be called bidirectional for short. Sequences $b_1, \ldots, b_n$ of links or adjoints of links which can be composed are said to make a path of length n ≥ 0. Composability requires that the target of $b_i$ is the source of $b_{i+1}$. The length of the shortest connecting path can be used to measure distance between objects. Definition 3 (Subsystems). A subsystem S1 of a system S is generated by a set of objects in S and a set of links in S between these objects. Its arrows are all arrows in S that can be composed from these links and their adjoints. The boundary of S1 consists of the links in S with target in S1 which are not links or adjoints of links in S1. The environment of S1 is the system generated by the objects of S not in S1 and the links between them. For n ≥ 1, the n-neighborhood of an object X is the subsystem which contains all objects connected to X by a path of length ≤ n, and is generated by all links between them. Identity links in the path are counted as contributing 0 to its length. A System is called locally unfrustrated if all its 1-neighborhoods are unfrustrated subsystems.
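The notions of path length, distance and n-neighborhood in Definitions 2 and 3 can be illustrated on the underlying graph of links. The following Python sketch is only an illustration under simplifying assumptions that are not part of the text: links are stored as (source, target) pairs, a path may traverse a link or its adjoint, there are no identity links (so every step counts 1), and the composition law is ignored. The function names `distances` and `neighborhood` are hypothetical.

```python
from collections import deque


def distances(links, start):
    """Shortest path length from `start` to every reachable object.

    A path may use a link or the adjoint of a link, so each
    (source, target) pair is traversable in both directions.
    """
    adjacent = {}
    for source, target in links:
        adjacent.setdefault(source, set()).add(target)
        adjacent.setdefault(target, set()).add(source)
    dist = {start: 0}
    queue = deque([start])
    while queue:  # breadth-first search
        x = queue.popleft()
        for y in adjacent.get(x, ()):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def neighborhood(links, x, n):
    """Objects and generating links of the n-neighborhood of x."""
    dist = distances(links, x)
    objects = {y for y, d in dist.items() if d <= n}
    generating = {(s, t) for s, t in links if s in objects and t in objects}
    return objects, generating


chain = {("A", "B"), ("B", "C"), ("C", "D")}
objects, generating = neighborhood(chain, "B", 1)
```

For the chain A → B → C → D, the 1-neighborhood of B contains A, B and C and is generated by the two links between them.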
[Figure 1: diagram not reproduced; its legend distinguishes an object and its links.]
Fig. 1. Brick wall. The links b are translations of a brick to a nearest neighbour’s position. The arrows are equivalence classes of paths subject to the equivalences b∗ = b−1 and b1 ◦ b2 ◦ b3 = 1X for any triangular path from X to X. The system is locally unfrustrated, but would become (globally) frustrated if the wall were closed to a round tower
Note that it is not required that adjoints of links in S1 which are links in S are also links in S1 . Following Luhmann [46], one may also want to consider the internal environment I1 of S1 . It is the system whose objects are the constituents of non-atomic objects X in S1 and whose links are the links between them. Arrows are equivalence classes of paths, equivalences being determined by the equivalences in the systems X. Given our standing assumption 1, I1 is not a subsystem of S. The brick wall shown in Fig. 1 is an example of a locally unfrustrated system, and so is any triangulated 2-manifold, with the 1-simplices as links.
2.1. Structure preserving maps. As always in physics, we shall not distinguish between isomorphic systems. To make this precise we need to consider structure preserving maps called local functors.⁶ In category theory, a functor F : C → C′ is a map of the objects of a category C to objects of a category C′ and of arrows f of C to arrows of C′ such that source and target of F(f) are the images of source and target of f, and
$$F(f \circ g) = F(f) \circ F(g), \tag{1}$$
$$F(1_X) = 1_{F(X)}. \tag{2}$$
In contravariant functors, Eq. (1) is replaced by
$$F(f \circ g) = F(g) \circ F(f). \tag{3}$$
Definition 4 (local functor). A local functor F : S → S′ is a map of a system S into another system S′ which obeys the above requirements on a functor of categories, maps links into links, and obeys
$$F(f^*) = F(f)^*. \tag{4}$$
In a contravariant local functor, Eq. (3) is substituted for Eq. (1).
6 This makes precise what Wittgenstein leaves undefined in his isomorphism theory in Tractatus 2.15 when he postulates that the images of two objects are related “in the same way” as the objects.
An isomorphism of systems is a local functor whose inverse exists as a local functor. An anti-isomorphism of systems is a contravariant local functor whose inverse exists as a contravariant local functor. No local functor between two systems need exist unless the identity arrows of the image are links. (Typically they are not.) Anti-isomorphisms relate complementary shapes. They are important in cognition. The surface (boundary) of a lock and a key have opposite orientation; this will be reflected in an anti-isomorphism. This is important for life. In biochemistry, the specificity of enzymes for particular substrates is due to a lock-key match of parts of their surfaces. And receptors on cell walls function according to the same lock-key principle [1]. It is important to note that the internal structure of black boxes (nonatomic objects) is declared irrelevant by not distinguishing isomorphic systems, and so is the distinction between atomic and nonatomic objects. The only usage of the internal structure is in constructing links of the system and their composition ◦. One does not look into black boxes anymore once they are in place. If isomorphism of corresponding nonatomic objects is demanded, we speak of a strong isomorphism. Categorical Remark 1. Local functors define the category of systems whose objects are systems and whose arrows are local functors. Among the isomorphisms F : S → S′, there is a particularly important class called gauge transformations. They are “inner” in the sense that they are generated by arrows of the system. Since gauge transformations are isomorphisms, statements about a system in the language of thought are necessarily gauge invariant. It follows that observables must be gauge invariant.
More precisely, one may define objective observables to be boolean functions on systems, well defined for all systems, while subjective observables require for their definition specification of a distinguished object X (the subject), and possibly some links b with target X (e.g. the direction in which a speaker points). Objective observables must be gauge invariant, while for subjective observables this need only be true for gauge transformations which are trivial at X and don’t change its specified links b. Given S and an object X of S, the group $G_X$ of local gauge transformations consists of all invertible arrows g : X → X. (It is also called the holonomy group.) Its identity element is $1_X$. Gauge transformations take S into a system S′ with Cat(S) = Cat(S′).
Definition 5 (gauge transformations). A gauge transformation F is defined by selecting an element $g_X \in G_X$ for every object X, and mapping arrows f : X → Y into
$$F(f) = g_Y^{-1} \circ f \circ g_X. \tag{5}$$
The links of S′ are the images of links of S, and the *-operation $^+$ in S′ is defined in terms of the *-operation in S by
$$f^+ = \sigma_X^{-1} \circ f^* \circ \sigma_Y \tag{6}$$
with $\sigma_X = g_X^* \circ g_X$. A gauge transformation is called unitary if $g_X^* = g_X^{-1}$ for all X.
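The compatibility of the new star operation with F can be checked by a short computation. The sketch below takes f : X → Y with $F(f) = g_Y^{-1} \circ f \circ g_X$, $f^+ = \sigma_X^{-1} \circ f^* \circ \sigma_Y$ and $\sigma_X = g_X^* \circ g_X$, and uses only $(g \circ h)^* = h^* \circ g^*$ and $f^{**} = f$ from Definition 1. It also uses $(g^{-1})^* = (g^*)^{-1}$ for invertible g and $\sigma_X^* = \sigma_X$, which follow from these two rules but are stated here as working assumptions.

```latex
\begin{align*}
F(f)^+ &= \sigma_X^{-1} \circ \bigl(g_Y^{-1} \circ f \circ g_X\bigr)^* \circ \sigma_Y \\
       &= (g_X^* \circ g_X)^{-1} \circ g_X^* \circ f^* \circ (g_Y^*)^{-1} \circ g_Y^* \circ g_Y \\
       &= g_X^{-1} \circ f^* \circ g_Y \;=\; F(f^*), \\[4pt]
f^{++} &= \sigma_Y^{-1} \circ \bigl(\sigma_X^{-1} \circ f^* \circ \sigma_Y\bigr)^* \circ \sigma_X \\
       &= \sigma_Y^{-1} \circ \sigma_Y^* \circ f \circ (\sigma_X^*)^{-1} \circ \sigma_X \;=\; f .
\end{align*}
```

For a unitary gauge transformation $\sigma_X = 1_X$ for all X, so that $f^+ = f^*$ and the star operation is unchanged.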
One verifies that the conditions for an isomorphism are satisfied. The composition law is preserved. The new star operation satisfies $f^{++} = f$ and $1_X^+ = 1_X$ as it should, and $F(f^*) = F(f)^+$.
Theorem 1 (Gauge group). In connected systems S where all links are unitary, all gauge transformations are unitary, and all groups $G_X$ of local gauge transformations are isomorphic. Their isomorphism class G is called the gauge group.
Proof. With all links, all arrows are unitary. Therefore all $g_X : X \to X$ are unitary. Given X, Y, there exists an arrow f : X → Y because S is connected. If $g_X \in G_X$ then $g_Y = f \circ g_X \circ f^* \in G_Y$, and conversely. Therefore $G_X$ and $G_Y$ are isomorphic.
Gauge transformations in electrodynamics are the standard example. Electrodynamics will be considered in Sect. 3.2 later on. Here is another example.
Example 1 (Fundamental group). Consider the system made from a triangulated manifold as follows. The objects are the 0-simplices and the links are the 1-simplices; adjunction is change of orientation, adjoints of links are links. The arrows are equivalence classes of paths $b_0, \ldots, b_n$, defined by the following equivalence relation:
– all links are unitary,
– $b_0 \circ b_1 \circ b_2 = 1_X$ if $b_0, b_1, b_2$ is a closed path from X to X (a triangle).
The gauge group is the fundamental group of the manifold.
Theorem 2 (Reconstruction of objects). A system is determined up to isomorphism if
– the arrows are enumerated or given as a set,
– it is specified which arrows can be composed and which arrow is the result,
– it is specified which arrows are links,
– the adjoints of arrows are specified.
Conversely, any such data determine a system if all the arrows can be obtained by composing links and their adjoints, and if the *-operation satisfies the consistency conditions imposed by the axioms. Proof. The corresponding result in category theory is standard [40]. One reconstructs the objects as equivalence classes X of links, two links b1 and b2 being equivalent if there exists an arrow f such that b1 ◦ f and b2 ◦ f are both defined; X is then the common target of all links equivalent with b1. An arrow f with target X is equal to the identity $1_X$ if g ◦ f = g whenever it is defined. Categorical Remark 2. Gauge transformations are invertible functors which preserve objects and admit a natural transformation to the identity. 2.2. Tautological character of the axioms. I wish to convince the reader that there is no more in the axioms than what is intended by the preaxiom, and locality. The composition law and adjunction need to be discussed. The axiomatic properties assure that a system is a directed (pseudomulti-)graph if the number of objects is finite. The edges of the graph are the links and its vertices are the objects. The graph may have edges which are loops (pseudographs), and it can have multiple edges between the same vertices (multigraph) [28]. Deviating from standard nomenclature, I will also speak of a graph when the number of vertices is countable.
Let us first turn to adjoints. One may consider the existence of the fundamental relation b : X → Y as a directed relation from Y to X. This amounts to introducing formal adjoint links in the graph in the opposite direction. Given a graph Γ, which represents basic relations between things or agents, we have no composition rule. But one can make from Γ a category S in a universal way. This yields a system. The arrows from X to Y are the paths from X to Y made from links and their formal adjoints. All the systems with given directed graph Γ are obtained from S by passing to equivalence classes of paths. So, the freedom of choosing the composition law merely introduces the option of waiving distinctions between relations. Similarly, there is a universal way of making a category U with unitary links from any given directed graph. It is obtained from S by imposing the relations b ◦ b* = 1 and b* ◦ b = 1, i.e. considering only nonbacktracking paths. Every system with unitary links is obtained from U by passing to equivalence classes of nonbacktracking paths. Categorical Remark 3. The passage to equivalence classes of arrows defines a unique local functor. Therefore we have Theorem 3. The forgetful functor F from the category of finite systems S [resp. finite systems with unitary links] to the category of directed (pseudomulti-)graphs has a left adjoint functor F* : Γ → S [resp. Γ → U].
3. Universal Dynamics
Dynamics shall be composed from local structural transformations. They are special graph transformations [64] obeying some strict locality requirements. They are universal in the sense that their action is defined for arbitrary systems; the same will therefore be true for dynamics composed from them.⁷ We shall distinguish four kinds of such transformations:
– motion,
– growth,
– death,
– cognition.
They are reversible except for death. I discuss them one by one. Motion promotes indirect relations to direct relations.
Either arrows composed of two (or more) links are promoted to the status of a link (e.g. friend of a friend becomes friend), or a non-fundamental adjoint becomes a link. The opposite, demotion of an adjoint link to non-link, is also subsumed under motion. Equations of motion in physics like the Maxwell equations (see below) determine motion in this sense. Catalysis of bonds in chemistry (and elsewhere, Fig. 2) is motion in this sense (supplemented with removal of some link(s)), and so is motion in space. Let material body A be “at” space point C and C be “neighbour_of” B. If the relation “at = neighbour_of ◦ at” becomes fundamental instead of “at”, it means that body A has moved from C to B. The category Cat($S_t$) does not change at all in this kind of time evolution. Therefore we have Theorem 4. Any quantity Q which is determined by the category Cat($S_t$) is a constant of motion.
7 Universal dynamics is intended to be universal also in the sense of universal constructions in category theory. But the appropriate theorems have not been proven yet.
[Figure 2: diagram not reproduced; node labels A, B, C and Anne, Bert, Carl.]
Fig. 2. Catalysis in chemistry and elsewhere. A catalyst C binds molecules A and B. First a substrate-enzyme complex is built, where A and B bind to C. Next the composite arrow from A to B becomes fundamental
Growth copies objects. Death removes links (together with their adjoints), or removes objects together with all links incident on them. Cognition creates links between objects with matching internal structure. The match is supposed to be established by enzymatic computation, ultimately with enzymes of the other kinds. Identity links (cp. after Definition 1) between atomic copies of objects are also admitted and serve as prototypical examples of cognitive links. The creation of links by cognition (cognitive links for short) is fundamentally different from creation of links from existing links by composition or adjunction because it creates new arrows. This will be important later on when we adapt Baas’ distinction between deductive and observational emergence [3]. In principle, operations of this kind would also be mathematically well defined if they are nonlocal, but we don’t want to admit that. If only objects within one 1-neighborhood can be linked, they must be connected before by a path of length at most 2. But the new arrow is not the arrow which this path defines. For instance, in chemistry spatial proximity links are a prerequisite for forming chemical bonds. Receptors on membranes of living cells operate in a cognitive way, using lock-key type matching. Before proceeding to formal developments, I will discuss some examples. Copy processes are most important for life. Autopoietic systems make their own elements. Ultimately they make them from constituents or constituents of constituents, etc. which are conserved material entities, not created. So the “making” is supply of structure, typically from templates. DNA is copied, it is also copied into RNA. RNA is copied and is also translated into sequences of amino acids, i.e. proteins. The material constituents must also be supplied. 
Plants make organic material from inorganic substances and light by enzymatic action, but animals need to get their building blocks from organic materials, breaking down their preexisting structure. Call this digestion. I will demonstrate how copy processes and digestion can be modelled using no more than what is provided by the axioms, Definition 1, and also how relaxation sweeps through extended systems can be modelled.
3.1. Examples. 3.1.1. Copying. The asymmetrical replication fork. There is an example of a local dynamics that can be used to produce within a finite time two copies of any finite system whose links are all bidirectional. It is a mathematical abstraction and generalization of the asymmetrical replication fork mechanism which copies DNA in the living cell, see standard textbooks [1, 43]. During the copy process, links without fundamental adjoints appear.
[Figure 3: diagram not reproduced.]
Fig. 3. Action of the splitfork-enzyme sX at X, for chains like DNA. The same mechanisms can operate on general systems
A fork at X shall be a pair of links without fundamental adjoints with source and target X, respectively. The split fork action $s_X$ shall be a local structural transformation. Figure 3 shows its action on chains (of pairs of directed links) like the DNA double helix. The generalization is as follows. A link is called “bidirectional” if its adjoint is a link, and “unidirectional” otherwise.
1. A copy X′ of X is made.
2. The links incident on X other than loops are distributed among X and its copy as follows:
– bidirectional links with target X get X′ as their target,
– unidirectional links with target X retain X as their target,
– bidirectional links with source X retain X as their source,
– unidirectional links with source X get X′ as their source.
The loops X → X remain in place and get a copy X′ → X′.
3. The adjoints of formerly unidirectional links are promoted to the status of links.
Theorem 5 (Universal copy constructor). Let $S_0$ be obtained from a finite connected system S whose links are all bidirectional by action of $s_{X_0}$ at some $X_0 \in S$. For t > 0, let $S_t$ be obtained from $S_{t-1}$ by action of $s_X$ for all objects X which have forks. $S_t$ is well defined for t ≥ 0. For sufficiently large t, it is independent of t and consists of two disconnected systems, both isomorphic to S.
“Once replication has started, it continues until the entire system has been duplicated”. Upon substituting “genome” for “system” this becomes a quote from a genetics textbook [43].
Remark 1. Copying may be initiated at several sites $X_0, \ldots, X_n$ which are not connected by links. The action of the split Fork-enzyme is quite robust against errors due to computer failures which mimic local mutations. But a third copy is made of part of the system when a fundamental adjoint gets lost (or added) “at the wrong moment”.⁸ The theorem was first demonstrated in [51]. The fact that $s_X$ is well defined requires a comment: what does it mean that “a copy X′ of object X is made”?
Theorem 2 can be invoked to describe $s_X S$ up to isomorphism. The isomorphism class does not retain the information about the internal structure of nonatomic objects. But this information can be retained by the copies, if desired. To do so, one uses the universal copy constructor to copy the objects which are themselves systems, to copy their nonatomic constituents, and so on. Because of the axiom of non-selfinclusion this does not lead to an infinite recursion. In conclusion, if $s_X$ is defined in this way, the stronger version of Theorem 5 holds true where the phrase “both isomorphic to S” is replaced by “both strongly isomorphic to S”.
8 In man, errors in copying the genome may result in Down’s syndrome, the presence of three copies of chromosome no. 21 instead of the usual two.
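The split fork rules of Sect. 3.1.1 can be tried out on the graph of links alone. The following Python sketch is an illustration under assumptions of my own choosing, not the construction of ref. [51]: a system is reduced to a set of (source, target) pairs with at most one link per ordered pair, the composition law and internal structure are ignored, the copy of an object named x is named x + "'", and the enzyme is applied one object after another within a time step. The names `has_fork`, `split_fork` and `replicate` are hypothetical.

```python
def has_fork(links, x):
    """True if x has a unidirectional incoming and a unidirectional outgoing link."""
    uni_in = any(t == x and s != x and (x, s) not in links for s, t in links)
    uni_out = any(s == x and t != x and (t, x) not in links for s, t in links)
    return uni_in and uni_out


def split_fork(links, x):
    """Apply the split fork action at x and return the new set of links."""
    copy = x + "'"
    kept, promoted = set(), set()
    for s, t in links:
        bidirectional = (t, s) in links  # status before any change
        if s == x and t == x:            # loop: stays and gets a copy
            kept.update({(x, x), (copy, copy)})
        elif t == x and bidirectional:   # target moves to the copy
            kept.add((s, copy))
        elif t == x:                     # unidirectional: target stays,
            kept.add((s, x))             # adjoint becomes a link
            promoted.add((x, s))
        elif s == x and bidirectional:   # source stays
            kept.add((x, t))
        elif s == x:                     # unidirectional: source moves,
            kept.add((copy, t))          # adjoint becomes a link
            promoted.add((t, copy))
        else:
            kept.add((s, t))
    return kept | promoted


def replicate(links, start):
    """Start copying at `start` and iterate until no object has a fork."""
    state = split_fork(set(links), start)
    while True:
        forks = sorted({o for link in state for o in link if has_fork(state, o)})
        if not forks:
            return state
        for x in forks:
            state = split_fork(state, x)


chain = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")}
final = replicate(chain, "b")
```

In this toy setting the final set of links consists of the original chain and a primed copy of it, with no links between the two, in line with the statement of Theorem 5.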
$s_X$ is an example of a local structural transformation (“enzyme”). On systems such as occur as $S_t$ in Theorem 5, actions $s_X$, $s_Y$ commute. But this is a lucky circumstance. The production of a copy is an “emergent phenomenon”, a nonlocal phenomenon which arises from local interactions. New functionality – copying – emerges. Categorical Remark 4. $s_X$ can be decomposed into several “microenzymes” which act in sequence. If all links are unitary, each of them specifies a functor of categories, albeit the trivial one, but the first of these functors is not surjective because an extra object is created. Such situations are admitted in the work of Ehresmann and Vanbremeersch [19] to which we turn later. It is not true that the functor is the dynamics in their work. The decomposition is shown in ref. [55]. The first microenzyme is an elementary copy process which creates a duplicate of X and links it by an identity link to the original. The other microenzymes compose links and remove the identity link again. When the copied object X is an indestructible material constituent, its equal may be imagined to be recognized in the environment (by a cognitive process) and absorbed by linking it to X by an identity arrow.
3.1.2. Digestion. Consider a finite connected system $S_0$ with bidirectional links and with a distinguished object X. I describe a local structural transformation $d_X$ which continues to act on a 1-neighborhood of X. Starting with an arbitrary $S_0$, the action becomes trivial after some time and there results a system with the same objects $X, Y_1, \ldots, Y_n$ as $S_0$ but whose structure is completely degraded in the sense that its only links are one link from $Y_i$ to X for each i, and their adjoints. (These links could be removed, but then “the food {$Y_i$} is lost” since there is no relation to it anymore.) $d_X$ consists of consecutive steps.
1. (Death) The far side of all triangles of 3 links with tip X is removed, together with its adjoint.
2. (Motion) If b is a link from Y ≠ X to X and b′ is a link from Y′ to Y then b ◦ b′ becomes a link and b′ ceases to exist as a link.
3. Fundamental loops X → X are removed.
Actually the 3rd step can be omitted when Step 2 operates also for Y = X. Figure 4 illustrates the procedure. Categorical Remark 5. $d_X$ specifies a functor of categories, albeit the trivial one, if the links are invertible, e.g. unitary.
[Figure 4: diagram not reproduced.]
Fig. 4. Digestion enzyme attacks at X
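The digestion steps of Sect. 3.1.2 can be imitated on a plain undirected graph. The Python sketch below is an illustration under assumptions that go beyond the text: each bidirectional pair of links is stored as one undirected edge (a two-element frozenset), there are no loops (so Step 3 is empty), parallel links produced by Step 2 are identified, and composition of arrows is not tracked. The names `digest_step` and `digest` are hypothetical.

```python
def digest_step(edges, x):
    """One application of Steps 1 and 2 around the distinguished object x."""
    edges = set(edges)
    near = {y for e in edges if x in e for y in e if y != x}
    # Step 1 (death): drop the far side of every triangle with tip x,
    # i.e. every edge whose two endpoints are both neighbours of x.
    edges = {e for e in edges if not e <= near}
    # Step 2 (motion): an edge far-y with y a neighbour of x is replaced
    # by a direct edge far-x.
    for y in near:
        for edge in [e for e in edges if y in e and x not in e]:
            (far,) = edge - {y}
            edges.discard(edge)
            edges.add(frozenset({far, x}))
    return edges


def digest(edges, x):
    """Repeat the step until the action becomes trivial."""
    current = set(edges)
    while True:
        following = digest_step(current, x)
        if following == current:
            return current
        current = following


food = {frozenset(p) for p in [("x", "a"), ("x", "b"), ("a", "b"),
                               ("b", "c"), ("c", "d")]}
degraded = digest(food, "x")
```

For this small example the result is a star: every object is linked to x and to nothing else.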
3.1.3. Sweeps through a system. The split Fork copy procedure of Sect. 3.1.1 relies on the propagation of a shock wave. The shock wave is the boundary between the part of the system which has already been copied, and the rest. It is made of links without fundamental adjoints.9 The objects X at the outside of the boundary are copied, and then the boundary passes past them. Instead of copying X, one may act on its neighbourhood with some (other) enzyme. In this way a sweep through the system is generated which invokes an updating of the neighbourhoods of all objects in the system. 3.2. Systems sub specie aeternitatis. There are two different ways of looking at dynamics. I will now examine the possibility of looking at a system sub specie aeternitatis, i.e. its whole history. This will also be viewed as a system. Following Sorin Solomon’s suggestion, I call it a drama. Definition 6 (Drama). A drama is a system S which is composed from subsystems St labelled by t = 0, ±1, ±2, . . . and links e in one or both directions between objects Y ∈ St+1 and X ∈ St . If there is such a link, Y is said to be descendent of X, and X is an ancestor of Y . It is required that every Y ∈ St+1 is descendent of at least one object in St . If there are several such objects, they must be connected by identity links. We will impose the additional condition that the time links are all unitary, and that any two paths between the same target X and source Y which are made exclusively of time links define the same arrow. It is a subtle question whether one might want to generalize this. If the condition is satisfied, the time links can be “gauged away”, so that the parallel transporters along time links are trivial. This is familiar in gauge theory under the name “A0 = 0 gauge” [32]. By definition, a drama is a system. Its consideration converts function into structure. Dynamical laws constrain the structure of the drama. Now we are ready to consider deterministic dynamics. 
Stochastic dynamics operates in the same way except that enzymes operate with certain probabilities. An initial state at time t of an N-th order dynamics shall be a subsystem $S_{[t,t+1-N]}$ of a drama which is generated by subsystems $S_t, \dots, S_{t+1-N}$ and the time links between them, and enzymes that are attached to the objects and links of this system. The enzymes code for the constraints on the local structure of the drama which determine $S_{t+1}$ in terms of $S_{[t,t+1-N]}$. The enzymes must
1. Determine the descendents in $S_{t+1}$, possible identity links between them and the existence of time links to and from their ancestors.
2. Determine noncognitive links in $S_{t+1}$. They are arrows in $S_{[t,t+1-N]}$ composed with time links between $S_t$ and $S_{t+1}$.
3. Determine possible cognitive links between objects in $S_{t+1}$ which are descendents of nonatomic objects in $S_t$.
4. Put enzymes on objects and links of $S_t$.
Enzymes of motion make exactly one descendent of every object, with a pair of time links between it and its ancestor. They only make noncognitive links. A growth enzyme makes two or more descendents of one object or, in case of fusion, makes one common descendent of several objects which are linked by identity links; death follows from the absence of enzymes that make appropriate objects and links.

Footnote 9: Boundaries of subsystems which are distinguished in this or other ways serve as mathematical models of membranes in cell biology [1].
Universal Dynamics; Unified Theory of Complex Systems
I pause to explain the notion of cognitive links, which link two nonatomic objects. By definition, nonatomic objects are systems $X_1$, $X_2$. A cognitive link is a local functor $f : X_1 \to X_2$. Links between isomorphic systems are the prototypical examples. If there is at most one link between two objects in $X_1$, then the functor is determined by the images $f(Y)$ of objects Y in $X_1$. We use special links, e.g. identity links, to connect Y and $f(Y)$. These identity links are supposed to be determined by enzymatic computation, i.e. by a dynamical process in a system which is generated by $X_1$, $X_2$ and an initial tentative identity link between some constituents of these. The making of cognitive links only makes sense if the dynamics determines future systems up to strong isomorphism, because the internal structure of nonatomic objects matters.

For illustration of the relation between drama and dynamics, consider the split Fork dynamics. The t + 1 piece of Fig. 6 shows a portion of a drama. Another example is the discretized Maxwell equations on a cubic lattice and in discrete time. This dynamics is of second order, so the dynamical laws will involve links in 3 layers $S_{t+1}$, $S_t$ and $S_{t-1}$ and time links. In addition there are constraints on the initial state; they involve links in $S_t$ and $S_{t-1}$ and the time links between them. All these constraints on the structure of the drama have the form
$$l = 1_X, \qquad (7)$$
where l are arrows $X \to X$ that are made from closed paths with the above links. They are shown in Fig. 5. We explain below why these are the Maxwell equations including the Gauss constraint. Let us show that the dynamics is well defined. The links are unitary and the dynamical laws involve closed paths l with exactly one link b in $S_{t+1}$. Therefore they have a unique solution
$$b = u^* \quad \text{if} \quad l = b \circ u, \qquad (8)$$
where u is composed of the remaining links in the path p. The time links in path u are gauged away and the remaining ones are determined by the initial state. We see that the time evolution merely amounts to promoting arrows $u^*$ of the system to the status of link, while the old links may lose that status but remain as arrows. In other words, the category Cat($S_t$) does not change at all. Recall that this is always the case in motion. The Gauss constraint is preserved in time. This will be shown in Sect. 6 using tools of noncommutative differential calculus. The gauge group R of electrodynamics is determined by the initial state; it is also counted as a constraint on the initial state. The coordinatization of links by real vector potentials (below) comes from the gauge group R similarly as in lattice gauge theory. This can be deduced from the representation theorems to be proven in Sect. 4. Therefore real values are attached to the links. If the lattice gauge field comes from a vector potential A in the continuum they are
$$(\bullet \to \bullet) = \int_{\bullet \to \bullet} A\,dx, \qquad (9)$$
whence
$$\Box = \oint_{\Box} A\,dx = \int_{F:\,\partial F = \Box} B\,df \qquad (10)$$
Fig. 5. Maxwell Drama. For the Gauss constraint, the direction to the back is the time direction. For the equations of motion, the upward direction is the time direction. The equations say that the parallel transporter along the path around all the displayed plaquettes equals the identity $1_X$. There is one equation for every pair of links. Formally, the Yang–Mills equations are of the same form. Only the gauge group is different.
for loops around squares. B is the magnetic field. Parallel squares are surrounded with opposite orientation. Therefore the total contribution of paths around all spacelike plaquettes going through → is $\approx \nabla \times B \cdot \text{area}$, while a timelike plaquette gives something proportional to the electric field, since $-E = \dot{A}$. Putting everything together we get Maxwell's equation $\dot{E} = \nabla \times B$, and Gauss' law $\nabla \cdot E = 0$. The other Maxwell equation follows from the existence of a 4-vector potential. Charged fields can be put in [50]. Note that charge conservation is required by the internal consistency of the Maxwell equations: indestructibility of charged matter is built into a structural description, it need not be postulated separately.

The Yang–Mills equations of elementary particle physics have the same form, at least formally. Only the gauge group is different. Higgs physics can also be put in, at a price [49]. The world is regarded as two sheeted, one sheet carries the left-handed matter and the other the right-handed matter. The two 4-dimensional sheets might be boundaries of a five (or higher)-dimensional world. The Higgs fields are possibly nonunitary parallel transporters between the sheets, as in the model of Connes and Lott [13], but with conventional locality requirements, cp. Sect. 6. The price is that each sheet should have its own gauge group, $G_L$ resp. $G_R$. The strong gauge group is therefore $SU(3) \times SU(3)$, broken spontaneously by a Higgs to $SU(3)$. But this breaking is expected to produce a massive vector meson, the axigluon [24], which has not been found so far.
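The solution (8) of a constraint of the form (7) can be illustrated for the gauge group R of electrodynamics. The following is a minimal sketch, not taken from the paper: it assumes that links carry real numbers, that composition along a path is addition, that the adjoint of a link is its negative, and that the identity $1_X$ corresponds to 0. The function name and the numerical values are hypothetical.

```python
def solve_new_link(known_values):
    """Return the value b of the single unknown link in a closed path.

    Assumption: gauge group R written additively, so composing links
    adds their values, the adjoint is the negative, and 1_X is 0.
    The constraint l = b o u = 1_X then reads b + sum(u) = 0.
    """
    u = sum(known_values)   # parallel transporter along the known part u
    return -u               # b = u*, the adjoint (inverse) of u


# Hypothetical values on the already known links of one closed path.
u_links = [0.25, -1.0, 0.5]
b = solve_new_link(u_links)
closed_path = b + sum(u_links)   # parallel transporter around the loop
```

With these values the new link is fixed uniquely and the closed path evaluates to the identity, which mirrors the statement that each dynamical law contains exactly one link in $S_{t+1}$.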
3.3. Concurrency. This section presents details on how a well defined dynamics can be composed from enzymes. There is a technical problem which is known to computer scientists as the concurrency problem. Enzymes specify local structural transformations, but the action of such transformations at neighboring locations may not commute. The drama point of view amounts to solving the concurrency problem by a generalization of what is known in applied mathematics as a Jacobi sweep (as opposed to Gauss Seidel sweeps) [68]. In a Jacobi sweep one updates variables attached to nodes and links of a grid by visiting them one at a time and determining the values of their particular
Fig. 6. Split fork dynamics, concurrent version. The time t + 3/4 step is not shown.
variables at time t + 1 in terms of time t values of variables attached to objects and links in the neighborhood. The result is independent of the order of visits.

For simplicity consider first order dynamics, and ignore the possibility of making cognitive links. Given $S_t$, the objects of $S_{t+1}$ need to be made as descendents, and the links in $S_{t+1}$ need to be made. Divide the time interval in four and let the production of descendents take place at time t + 1/4 and the initial production of new links at time t + 1/2.

Suppose that the production of descendents and their time links is governed by special enzymes called O-enzymes, which are attached to objects or to identity arrows i between indistinguishable objects. An O-enzyme may also attach specific enzymes to the descendents or to identity arrows between them. There are two types, O2-enzymes attached to objects, and O1-enzymes. The objects $X_i$ connected by identity arrows with attached O1-enzymes form clusters which are in one 1-neighborhood. Let us regard the O1-enzymes in this cluster as one O-enzyme. It makes a copy of some representative object $X_i$ in the cluster and links it to all the $X_j$ in the cluster by bidirectional time links. O2-enzymes at X make two descendents of X and a time link to one copy and another from the other copy. This produces a “fork in time direction”. The enzyme may put a bidirectional identity link between the two descendents at the ends of the fork prongs. Enzymes may be attached to the descendents' identity arrows in a manner determined by the O1-, O2-enzymes.

We note that the action of all the O-enzymes anywhere commutes, and so their action specifies a globally well defined transformation of $S_t$ into $S'_t$ which consists in the growth of additional elements. At this stage, $S'_t$ is only defined as a graph. We make it into a system by extending the composition rule. This is done by specifying the equivalence relations involving new links as follows:
1. time links are unitary,
2. triangles made from time links and identity links are 1.

Next we consider L-enzymes. They make links. They act at time t + 1/2. Like O-enzymes they are attached to links or objects in $S_t$. Their action also consists in the growth of new elements, “diagonal links”. Their action at time t + 1/2 makes collections of “diagonal” links l. They connect objects in $S_t$ with descendents of objects in $S_t$ in a manner which depends on the enzyme, on the link or object it is attached to, on the neighbourhood of this link or object in $S_t$, and on the descendents of objects in this neighbourhood. An L-enzyme may also attach enzymes to the newly made (diagonal) links. The new links l are made by composition of links in $S_t$ and time links. Furthermore, the L-enzymes may put marks on the new links which will serve as indicators of adjoint relationships to be specified later. We note that the action of all the L-enzymes anywhere commutes, since links are created depending only on what was there before. Therefore the action of all L-enzymes at time t + 1/2 specifies a globally well defined transformation of $S'_t$ into $S''_t$ which consists in the growth of new elements. $S''_t$ is a system because the new links are arrows of $S'_t$.

At time t + 3/4 we lift the ends (sources or targets) of diagonal links which are in $S_t$ by composing them with time links to or from descendents. The time links were made such that the way to do this is unique. There results a well defined system $S'''_t$ with Cat($S'''_t$) = Cat($S''_t$).

Finally, at time t + 1, one considers pairs $Y_1$, $Y_2$ of descendents of arbitrary objects such that there is some link between them. The local action at $(Y_1, Y_2)$ consists in an examination of the totality T of links between $Y_1$ and $Y_2$ and in declaring some of them adjoints of others in a manner which depends on T and the marks on the links. How this is done must be specified a priori by specifying the meaning of marks. We note that the local action for different pairs commutes.
Therefore there results a well defined system $S''''_t$ with Cat($S''''_t$) = Cat($S'''_t$). The objects which are descendents of objects in $S_t$ and the links between them generate $S_{t+1}$. This demonstrates the validity of the following

Theorem 6 (Deterministic first order enzymatic dynamics). Suppose that $S_t$ can be obtained from a system without attached enzymes by attaching O-enzymes and L-enzymes (as described above) to links and objects. Then the enzymes determine a unique map $S_t \to S_{t+1}$ to another system. If $S_t$ is the time t-layer of the part $S_{\le t}$ of a drama, then $S_{\le (t+1)}$ is defined.

The making of cognitive links requires a separate consideration in order to fix the extension of the composition law to them. Given a collection $f_{21}$ of identity links from objects of system $X_1$ to system $X_2$, and $f_{32}$ from $X_2$ to $X_3$, this defines a possibly empty collection $f_{31} = f_{32} \circ f_{21}$. If $f_{21}$ and $f_{32}$ define local functors, then so does $f_{31}$. By definition, cognitive links are functors. We define arrows as equivalence classes of paths. Equivalences are generated by equivalences of paths without cognitive links, and equivalence of two paths $(b_1, \dots, b_n)$ when all $b_i$ are cognitive links and their composition defines the same local functor for both paths.
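The difference between a Jacobi sweep and a Gauss–Seidel sweep, which motivates the concurrent construction of this section, can be seen in a small numerical sketch. It is an illustration only and not part of the enzymatic formalism: the ring graph, the averaging update rule and all names are assumptions chosen for brevity.

```python
def jacobi_sweep(values, neighbours, order):
    """One Jacobi sweep: every new value is computed from OLD values only."""
    new = dict(values)
    for node in order:
        nbrs = neighbours[node]
        new[node] = sum(values[n] for n in nbrs) / len(nbrs)
    return new


def gauss_seidel_sweep(values, neighbours, order):
    """One Gauss-Seidel sweep: updates are used as soon as they are made."""
    new = dict(values)
    for node in order:
        nbrs = neighbours[node]
        new[node] = sum(new[n] for n in nbrs) / len(nbrs)
    return new


# A ring of four nodes with hypothetical initial values.
neighbours = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
values = {0: 4.0, 1: 0.0, 2: 8.0, 3: 0.0}

forward, backward = [0, 1, 2, 3], [3, 2, 1, 0]
jacobi_same = (jacobi_sweep(values, neighbours, forward)
               == jacobi_sweep(values, neighbours, backward))
gs_same = (gauss_seidel_sweep(values, neighbours, forward)
           == gauss_seidel_sweep(values, neighbours, backward))
```

In this example the Jacobi result does not depend on the order of visits, whereas the Gauss–Seidel result does. This is the property that lets locally acting enzymes define a global transformation.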
4. Transformation Theory

This section will show in what way systems theory is a generalization of gauge theory. Some basic concepts and tools flow from this. In quantum mechanics, Dirac's transformation theory played an important role [17]. It rests on the fact that unitarily equivalent representations of the algebra A of observables in Hilbert spaces are not physically distinct. The spectral theorem asserts the existence of representations in which given commuting observables are simultaneously diagonal, i.e. act as multiplication operators on function spaces. Similarly, isomorphic systems are not considered distinct. Here I present representation theorems and point out some properties of special representations.

I explain the notion of a representation. In group theory, a representation of a group G is not just a homomorphism (structure preserving map) to another group G'; it is required that G' come with some predefined structure, and the group operations must be compatible with it. More particularly, G' must consist of linear maps of a vector space, and group multiplication must be composition of maps. Similarly, a portrait in oil is a structure preserving map of a person. The image is supposed to consist of oil paint on canvas. Generalizing this, representations of a system will be defined as local functors to instances of a class of systems which are equipped with some predefined structure, and operations like composition ◦ of arrows are supposed to be compatible with it. Sometimes there is additional structure (e.g. composition rules for objects) which is required to be preserved. A representation is called semi-faithful when no two objects or links are mapped into the same object or link, and faithful when the same is true of arrows. There are many kinds of representations.
The most important ones have the following classes of systems as images:
– communication networks: the arrows $f : X \to Y$ are maps $f : \Omega_X \to \Omega_Y$ of sets (or spaces with more structure) and composition is composition of maps;
– archetypes;
– unfrustrated systems.
Archetypes are special systems, often with few objects.

Theorem 7 (Representation as a communication network). Every system with finitely many objects and links admits a faithful representation as a communication network, i.e. there are sets $\Omega_X$ associated with objects, and links and arrows are maps $f : \Omega_X \to \Omega_Y$.

Given a path $C = (b_1, \dots, b_n)$ from X to Y, let $f = b_n \circ \dots \circ b_1$. Then $f : \Omega_X \to \Omega_Y$ is called the parallel transport along the path C. There is a more elaborate version of Theorem 7 wherein there are separate input spaces $A_X$ and output spaces $\Omega_X$, objects X define maps $1_X : A_X \to \Omega_Y$ and links and arrows are maps $\Omega_X \to A_Y$. The theorem was proven in [50]; it could be generalized to systems with more elements. The proof is constructive, but the construction of $\Omega_X$ involves elements of the whole system.

Theorem 8 (Principal fibre bundle representation). Let S be a system with countably many objects and links, and suppose that its links are all unitary. Then it admits a representation as in Theorem 7, where the $\Omega_X$ are copies of the gauge group G, and the maps commute with the right action of G on the $\Omega_X$ by group multiplication.
Maps which commute with the right action of G amount to left multiplication with group elements. The theorem recovers the structure of lattice gauge fields in pure lattice gauge theory under the single extra structural assumption of unitarity of links, i.e. forth ◦ back = identity.

Corollary 1 (Associated vector bundle representation). Under conditions as in Theorem 8, if the gauge group G admits a faithful representation in a vector space $\Omega$, there is a representation as in Theorem 7 where the $\Omega_X$ are copies of $\Omega$, and the maps f are linear. If the linear representation is a unitary representation in a Hilbert space, then the arrows are unitary maps of Hilbert spaces.

This is the standard construction of associated vector bundles from principal fibre bundles.

Proof of Theorem 8. The graph of a system with countably many objects and links admits a spanning tree which generates an unfrustrated system T. Let $X_0$ be its root and identify $G = G_{X_0}$. To every object X, there is a unique unitary arrow $h_X : X_0 \to X$ in T. Associate copies of the gauge group G with objects X; they are all identified with $\Omega_{X_0} = G$ via the maps $h_X$. Convert arrows $f : X \to Y$ into elements $f_G$ of the gauge group $G_{X_0}$ according to $f_G = h_Y^* \circ f \circ h_X$. $f_G$ acts on $\Omega_X = G$ by $f_G(g) = f_G\, g \in G = \Omega_Y$. In this way we construct a system which is isomorphic to the original one and has the desired properties.
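The construction in the proof can be sketched in a few lines of code. The sketch below is an illustration under simplifying assumptions that are not in the text: the gauge group is taken to be the integers mod 12 under addition (abelian, so the order of composition is immaterial), the spanning tree is found by breadth-first search, and the graph and link values are made up.

```python
from collections import deque

N = 12  # hypothetical gauge group: integers mod 12 under addition


def tree_gauge(links, root):
    """Gauge-fix unitary links along a breadth-first spanning tree.

    links maps (X, Y) to the group element of the link X -> Y; the
    adjoint link Y -> X carries the inverse element.  Returns (h, fixed)
    where h[X] is the transporter root -> X along the tree and
    fixed[(X, Y)] = h_Y^* o f o h_X.
    """
    step = {}
    for (x, y), g in links.items():
        step.setdefault(x, []).append((y, g))
        step.setdefault(y, []).append((x, -g % N))   # adjoint link
    h = {root: 0}
    queue = deque([root])
    while queue:
        x = queue.popleft()
        for y, g in step[x]:
            if y not in h:              # first visit: this is a tree link
                h[y] = (h[x] + g) % N
                queue.append(y)
    fixed = {(x, y): (-h[y] + g + h[x]) % N for (x, y), g in links.items()}
    return h, fixed


# A square with one diagonal; the values are made up for illustration.
links = {("a", "b"): 3, ("b", "c"): 5, ("c", "d"): 7,
         ("d", "a"): 2, ("a", "c"): 1}
h, fixed = tree_gauge(links, "a")
```

After the transformation the tree links carry the identity (here 0), and each remaining link carries the parallel transporter around the closed loop it completes, as seen from the root. For a nonabelian group the same scheme applies, but the order of the three factors in $h_Y^* \circ f \circ h_X$ must then be respected.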
4.1. Logic. Systems with unitary links are generalizations of gauge theories. There is a quite different class of systems (and of categories [42, 41]) where links and arrows are maps of sets which need be neither surjective nor injective. Logic belongs here. To give an example of what can be done with representations, I report some theorems on logic. They were proven by Schrattenholzer in his thesis [65]. I will not reproduce the proofs.

Definition 7 (Logical archetype). The logical archetype is the system with two objects denoted T (true) and F (false), and links $e : T \to F$, $e^* : F \to T$, $o : F \to F$. The composition law is defined by the following relations:
$$e \circ e^* = 1_F, \quad e^* \circ e = 1_T, \quad o \circ o = o = o^*, \quad o \circ e = e, \quad e^* \circ o = e^*.$$
In addition there is a rule for composing objects with the help of the Sheffer stroke | : T|T = F, T|F = F, F|T = F, F|F = T.

In the logical archetype and in all our logical systems, the links are interpreted as “excludes”, and the special case of unitary links as “not”. A pair of adjoint unitary links is graphically represented as ∼. The objects represent potential propositions, and the Sheffer stroke | is interpreted as “neither nor” (NOR). Hence A|A is interpreted as not A. Note that logically, (A excludes B) implies (B excludes A), since both are equivalent to the statement that A and B are not both true. This rule of logic says that adjoints of links should be links. In the following, we are relaxing Assumption 1 by admitting links from a composite object to its constituents and vice versa.
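The truth table of the stroke and the interpretations just given can be checked mechanically. The sketch below uses the convention of the text, in which the stroke is NOR; the helper names `neg`, `disj`, `conj` and `excludes` are hypothetical and merely express the usual connectives through the stroke.

```python
T, F = True, False


def stroke(a, b):
    """The stroke | as defined in the text: 'neither a nor b' (NOR)."""
    return not (a or b)


def neg(a):
    return stroke(a, a)               # A|A is "not A"


def disj(a, b):
    return neg(stroke(a, b))          # "A or B"


def conj(a, b):
    return stroke(neg(a), neg(b))     # "A and B"


def excludes(a, b):
    return neg(conj(a, b))            # A and B are not both true


table = {(a, b): stroke(a, b) for a in (T, F) for b in (T, F)}
```

The dictionary `table` reproduces the four values listed above, and `excludes` is symmetric in its arguments, in line with the remark that adjoints of links should be links.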
Fig. 7. Schrattenholzer moves to make logical deductions (panel 1: adjoint of a link; panels 2–5: compound arrows between A and B). Every one of the above compound arrows between A and B may be replaced by a link pair A − B. Lines represent adjoint pairs of links, and • . . . ✷ . . . • stands for a composite object C|D with links to its constituents C, D; ∼ is a unitary link pair, interpreted as “is not”.
Define a logical system as a system without equivalence relations between paths other than possible unitarity of links, in which some composite objects A|B may appear, subject to the following conditions:
1. With A|B also A and B are in the system, and there are pairs of adjoint links A ↔ A|B ↔ B.
2. For every object A, including composite objects A = B|C, there is an object A|A. Furthermore A and A|A are linked by adjoint unitary links ∼ in both directions.
A logical representation of a (logical) system is a local functor F into the logical archetype, subject to the additional requirement
$$F(A|B) = F(A)\,|\,F(B). \qquad (11)$$
Note that it maps every object of the system into T or F. In this way truth values are assigned.

Theorem 9 (Schrattenholzer 1999). In a logical representation of a logical system, truth values T, F are assigned in accordance with the axioms of propositional logic. Conversely, let S be any system (not necessarily a logical one), possibly with composite objects A|B. If there exists an assignment of truth values which is consistent with propositional logic (with the above interpretation of objects, links and Sheffer stroke |), then the system admits a logical representation.

The propositional content in a logical system is in its links. They are subject to being transformed by the moves shown in Fig. 7. It was proven by Schrattenholzer that these rules are complete – every deduction of propositional logic is possible with their help. Furthermore, he showed that there is also a local decision calculus. The diagrams in Fig. 7 may be regarded as compound arrows if two parallel links or arrows $f_1 : X \to Y$ and $f_2 : X \to Y$ can be regarded as a single arrow, cp. later (Definition 8).
4.2. Frustration. In the context of the representation Theorem 7, paths $C = (b_1, \dots, b_n)$ from X to Y define parallel transport $f_C : \Omega_X \to \Omega_Y$ of elements in $\Omega_X$, and frustration exists if the parallel transport along different paths does not agree. Local frustration occurs when this happens for paths which stay in a neighborhood. (If all links are unitary, frustration exists if and only if the gauge group is nontrivial.) This situation appears many times in physics under different names:
– Curvature in general relativity and in the Riemannian geometry of surfaces in 3-dimensional Euclidean space. Space time is curved in general relativity if the parallel transport of a tangent vector [e.g. a 4-velocity] from space time point X to X' along two different paths need not give the same result. Generically, the gauge group is the Lorentz group.
– Field strength in electromagnetism and in the gauge theories of elementary particle physics, where vectors in colour spaces are parallel transported.
– Arbitrage in financial markets, and
– Frustration in spin glasses. In a spin glass model, one has spins attached to the sites X of a lattice. They may either point up or down. There are links between some of them. They are assigned values +1 if it is energetically favorable for the spins at their ends to be parallel, and −1 if antiparallel is favored. There is frustration if the requirements for energetically favorable alignment of spins are in conflict (see footnote 10).
A gauge theory of financial markets was presented by Ilinski [34]. Escher's impossible pictures provide other examples of frustration: people move around and up and up, yet arrive back at their starting point. In a representation of a three dimensional scene, this is impossible, because change of height should be path independent. The pictures also illustrate the point that frustration prevents the assignment of global meaning (height) by synchronization. There is a lot of frustration in human communication.
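The spin glass example above lends itself to a short check. The following sketch is an illustration, not part of the paper: it treats a single closed loop of spins, uses the standard criterion that the loop is frustrated when the product of its link values is −1, and confirms by brute force how many links must remain unsatisfied. Function names and the two sample triangles are hypothetical.

```python
from itertools import product


def loop_is_frustrated(couplings):
    """A closed loop is frustrated when the product of its link values is -1."""
    sign = 1
    for j in couplings:
        sign *= j
    return sign == -1


def min_unsatisfied(couplings):
    """Brute force: fewest links of a loop that must stay unsatisfied.

    couplings[i] is the value (+1 or -1) of the link between spin i and
    spin i+1 (cyclically); the link is satisfied when the product of the
    two spins equals its value.
    """
    n = len(couplings)
    best = n
    for spins in product((+1, -1), repeat=n):
        bad = sum(spins[i] * spins[(i + 1) % n] != couplings[i]
                  for i in range(n))
        best = min(best, bad)
    return best


triangle_ok = [+1, -1, -1]    # product +1: all links can be satisfied
triangle_bad = [+1, +1, -1]   # product -1: frustrated
```

For the unfrustrated triangle every link can be satisfied at once, while for the frustrated one at least one link always remains in conflict, whatever the spin configuration.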
We do not exclusively communicate facts which have a globally defined meaning. If we think of the absence of a fundamental adjoint as a kind of frustration also (see footnote 11), then in essentially all our examples nontrivial changes in time are associated with frustration. Thus, change is caused by frustration.

Footnote 10: This explains the name. In real life there is frustration if one's different desires cannot all be fulfilled because they are mutually incompatible.

Footnote 11: forth ◦ back ≠ identity in some sense when forth is a one way street.

5. Managing Complexity

The main theme of this section is the management of complexity by construction of simplified models, also known as effective theories, which operate on coarser scales. Thermodynamics and electrodynamics of polarizable media are well known examples of effective theories in physics. Before coming to this I will discuss emergence as a manifestation of complexity, and some alternative strategies to deal with complexity.

5.1. What is complexity, emergence? Emergence is sometimes described in terms like these. “From several components which happen to get together something fundamentally new originates, often with totally unexpected properties. The classical example is
water whose properties are not predictable from those of hydrogen and oxygen.” It is characterized as surprising behaviour which cannot be anticipated from the behaviour of the isolated parts [10, 21, 3]. But theoretical chemists can predict the properties of water from those of hydrogen and oxygen atoms. The mystery comes from ignoring the links, here chemical bonds.

Here I am only interested in emergence in complex systems. I consider a system as genuinely complex if it shows behavior which cannot be understood by considering small subsystems in isolation. Such behavior I call emergent. Since they do not show in small subsystems, such emergent phenomena are nonlocal phenomena. We wish to understand them as a consequence of local interactions, including those from links between constituents of different subsystems of any kind. This is not a task which is hopeless by definition. In quantum field theory there are several known mechanisms which lead to emergent phenomena, and string theorists are exploring more [14]. In general, there are several strategies:
– large scale computer simulation,
– special mechanisms,
– exploitation of symmetry,
– multiscale analysis.
Although it is true that computers get faster much more quickly than scientists get smarter, it is not true that this will solve all problems in due time. Large scale computer simulations do not qualify as a universal brute force method because the computer does not tell what to look for.

5.2. Special mechanisms. Special mechanisms of particular interest for life include those discussed in Sect. 3.1, including the split Fork dynamics. They are based on propagating shock waves. Several mechanisms are known in gauge field theory which lead to nonlocal phenomena, besides propagating harmonic waves. Typically they imply protection of wave propagation against nonlinear perturbation, forbidding generation of masses.
1. Gauge invariance in gauge theories whose gauge group possesses a nontrivial center (see footnote 12). This includes theories with an abelian (= commutative) gauge group like Maxwell's theory. Gauss' law asserts that the presence of central charges causes flux which can be observed arbitrarily far away. Central charges come from matter fields which transform nontrivially under the center. Electrically charged fields in Electrodynamics and quark fields in Quantum Chromodynamics are examples. In the latter example, long lines of flux cost too much energy, therefore quarks are confined and physical states carry no central charge [53].
2. Chiral invariance. The deeper reason behind this is the Atiyah–Singer index theorem applied to the Dirac operator D in an external gauge field. It implies that D must have zero modes for certain boundary conditions, in numbers depending on them. This sensitivity to boundary conditions implies that the Green's function for D – the Dirac propagator – will have infinite correlation length when it exists at all, for arbitrary gauge field. Contrast the covariant Laplacian in an external gauge field with nonvanishing field strength. It is strictly positive and its Green's function decays exponentially [6]. The Atiyah–Singer index theorem applies in the continuum. The approximate zero modes have also been found in computations of the spectrum of the Kogut–Susskind discretization of the Dirac operator on a lattice [37].
3. Supersymmetry and additional mechanisms now under consideration in string theory cannot be discussed here.

Footnote 12: The center of a group G consists of those elements which commute with all elements.

5.3. Exploitation of symmetry. Symmetries are properties of a system as a whole which are often rather easy to detect. When detected, they can be exploited. A crucial problem in complex systems is often the determination of the long distance behaviour. In favorable circumstances this problem can be solved or reduced to manageable problems by exploitation of symmetries. Let us consider examples.

Propagation of waves is an emergent phenomenon because local equations of motion lead to nonlocal phenomena. Electromagnetic waves, sound waves, and matter waves are known examples. In homogeneous media, translation symmetry can be used to reduce the problem to one which is no longer complex in our sense. Consider for instance the wave function $\psi(x) e^{-i\omega t}$ of noninteracting electrons in a potential which is invariant under lattice translations $x \to x + n_i e_i$, $n_i \in \mathbb{Z}$. Herein, $e_i$ ($i = 1, 2, 3$) are some given vectors which define a lattice. One must solve the 1-particle Schrödinger equation. x could be points of a continuous space or of a discretization of it. One introduces Bloch waves whose periodic part $u_k$ is invariant under lattice translations,
$$\psi(x) = e^{ikx} u_k(x), \qquad (12)$$
$$u_k(x + e_i) = u_k(x), \qquad (13)$$
and one is left with Schrödinger equations for uk on a single lattice cell with periodic boundary conditions, i.e. on a compact space without “large distances”. A more subtle example is the treatment of statistical mechanical systems (in equilibrium) at a critical point. By definition, the correlation length is infinite at a critical point; therefore there are correlations between very distant regions of space, and so the system is complex in our sense to begin with. The problem is to find the long distance behavior. Under suitable conditions, the long distance behaviour can be described by a field theory which is invariant under all conformal transformations. It was shown in the seventies [47] that any such theory can be regarded as living on a compact space and there is a “conformal Hamiltonian” which has a purely discrete spectrum ≥ 0 with only a finite number of eigenstates below any finite value E. This is true in any dimension ≥ 2; in 2 dimensions one needs the extra assumption of half integral spin. The assertion about the spectrum assumes Wilson operator product expansions as asymptotic expansions (they are then automatically summable to convergent expansions [54]). On a compact space, there are no large distances any more. Another subtle class of “manageable” systems are integrable models [23]. Again, their treatment involves subtle transformations to systems which are in a sense “no longer complex”. The author is not prepared to enter into a discussion of these methods, although they deserve mention here because they exploit the property of certain equations of motion that can be expressed as a requirement of no frustration. Nearly all the standard methods of theoretical physics to deal with complex systems are based on the exploitation of symmetries. Often one regards the system of interest as obtained by perturbing a “free” system. 
The free system is solved by exploitation of symmetries, and the perturbation expansion involves calculations within the framework of the free theory. However, these methods are limited in their applicability.
5.4. Multiscale analysis. The general idea of multiscale analysis is that, although by definition complex systems cannot be understood by examining small subsystems in isolation, a complexity reduction can be achieved by doing so. It constructs objects and links of a new system whose objects represent subsystems of the old one, but which have much fewer degrees of freedom. The new system is still complex, but typically the procedure can be iterated. In practice, few repetitions suffice because the number of objects decreases exponentially. The axiomatic properties of systems are not quite suitable for the purpose of multiscale analysis, but there is a natural way to extend them without seriously violating the philosophical principle of minimal a priori structure. It is natural to admit the possibility that two (or more) parallel links or arrows b1 : X → Y and b2 : X → Y are regarded as a single arrow, denoted b1 ⊕ b2 . I emphasize that no assumption of linearity is involved at this stage. In the extreme case, ⊕ could be a direct sum. Definition 8 (Semiadditive System). A semiadditive system S+ satisfies the axioms of a system, except that arrows may be composed from links and their adjoints with the help of two operations, ◦ and ⊕. The ⊕-operation makes the set of all arrows with given source X and target Y into an additive semigroup. The distributive law holds (f1 ⊕ f2 ) ◦ (g1 ⊕ g2 ) = (f1 ◦ g1 ) ⊕ (f2 ◦ g1 ) ⊕ (f1 ◦ g2 ) ⊕ (f2 ◦ g2 ). A local functor F of a semiadditive system obeys F (g ⊕ h) = F (g) ⊕ F (h). An arrow o is a zero arrow if f ⊕ o = f for all f . It is understood that arrows are modulo zero arrows. If f ⊕ h = g ⊕ h implies f = g, whatever h, then the additive semigroup admits a unique extension to an additive group. But there are some important examples, so-called discrete event dynamical systems (DED’s) [8] where this property does not hold. 
They have real (or matrix) valued links (similarly as in Maxwell theory), with addition + as composition ◦, and f ⊕ g = max(f, g). In the case with one object, this is the time table or Max Plus-“algebra”. For matrices, the maximum is taken entry by entry. In the rest of this section, I work in the category of semiadditive systems, and the word system shall mean semiadditive system. 5.4.1. Deterministic case. Multigrid methods. For definiteness' sake, I consider iterative solution of optimization problems. This is a very general class of problems. For instance, finding a solution x of an equation f (x) = g is equivalent to finding a minimum of dist(f (x), g) if f (x) and g are in a metric space with distance dist. We seek S in a class S of systems such that a given local cost function H is minimized. H assigns a real number H(S) to every S ∈ S which is a sum of contributions from neighbourhoods. One seeks approximate solutions. Therefore a criterion for a tolerable error should also be specified. In the computation of an iterative solution one starts from a rough approximation S = Sinit and one has a collection of local structural transformations (enzymes) to act on S ∈ S. I assume that it contains enzymes 1. to solve the local problems, 2. to compose links with ◦ and ⊕.
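The Max Plus rules for discrete event systems quoted above (composition is addition, ⊕ is the maximum) can be tried out in a few lines. The following is only a sketch for the case with one object; the function names and the use of minus infinity as zero arrow are illustrative assumptions, not notation from the text.

```python
# Max-plus arrows for a system with one object: an arrow is a real number
# (a delay), composition adds delays, and (+) keeps the larger one.
NEG_INF = float("-inf")  # assumed zero arrow: oplus(f, NEG_INF) == f for every f


def compose(f: float, g: float) -> float:
    """f o g in the max-plus system: delays add along a path."""
    return f + g


def oplus(f: float, g: float) -> float:
    """f (+) g in the max-plus system: parallel arrows merge into the larger one."""
    return max(f, g)


def distributive_law_holds(f1: float, f2: float, g1: float, g2: float) -> bool:
    """Compare (f1 (+) f2) o (g1 (+) g2) with the four-term expansion of Definition 8."""
    lhs = compose(oplus(f1, f2), oplus(g1, g2))
    rhs = oplus(
        oplus(compose(f1, g1), compose(f2, g1)),
        oplus(compose(f1, g2), compose(f2, g2)),
    )
    return lhs == rhs


# Cancellation fails: 1 (+) 5 equals 2 (+) 5 although 1 differs from 2,
# so this additive semigroup cannot be extended to a group.
cancellation_fails = oplus(1.0, 5.0) == oplus(2.0, 5.0) and 1.0 != 2.0
```

The last line is the property that distinguishes this example from the case where the additive semigroup extends to a group.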
The precise local problems depend on the problem, but they are always optimization problems for subsystems of individual neighborhoods within S. Put another way, we wish to reduce a global optimization problem to a local one. Under the stated assumptions, there is a universal problem solving strategy, relaxation. One sweeps through the system and determines what is variable in individual links and objects in such a way that the local cost function is minimized subject to the constraint that everything else remains constant. Because of locality of the cost functional, this is a local problem. When relaxation is very slow to converge, one speaks of critical slowing down. It is a typical effect of genuine complexity, because relaxation is inefficient in dealing with nonlocal phenomena. Multiscale analysis comes in when there is critical slowing down. The problem that one may get stuck in local minima is something else again; it is not the issue under consideration here. In a multiscale analysis one introduces levels 0, 1, 2, . . . . The system S is level 0. One constructs further systems $S^1, S^2, \dots$ called level 1, 2, . . . and links between them. In a genuine multigrid (as opposed to unigrid) the only links between levels connect $S^j$ and $S^{j+1}$. An object $X^1 \in S^1$ represents a subsystem of S, and similarly for links. $S^1$ varies with S. It is to be constructed together with a local cost function $H^1(S^1)$ in such a way that
1) The conditional minimum $\tilde S(S^1)$ of H(S) under the constraint that $S^1$ is fixed can be found by fast converging relaxation in level 0. $\tilde S(S^1)$ will be called the optimal interpolation of $S^1$.
2) If $S^1$ is a minimum of $H^1$ then $\tilde S(S^1)$ is an approximate minimum of H which can be corrected by further relaxation sweeps on level 0.
In this way the problem is reduced to minimization on level 1. To solve this problem, one introduces level 2, and so on. The technical aspects of how to organize the whole iteration scheme (V-cycles, W-cycles . . .
) shall not interest us here [68]. The objects and links of $S^1$ may contain data which reflect the structure of the corresponding subsystems of S, but they do not determine it uniquely. In this sense there is complexity reduction. Typically, the number of elements (links and objects) in level 1 is only a fraction of those in level 0, and the number of internal degrees of freedom of individual elements is about the same. The idea is that one only retains structural information to the extent that it is relevant for the cooperation of subsystems as a whole that is responsible for nonlocal emergent phenomena. The problem with this method is the fulfillment of the above requirements 1) and 2). There are very different kinds of optimization problems, and only a part of them can be successfully treated with existing multigrid technology. Deterministic equations of motion are intractable except in favorable cases. Let me describe an example. Example 2 (discretized linear elliptic PDE's). We seek the solution of a system of linear equations for vector valued functions $u = \{u_z\}$,
$$\sum_w L_{zw} u_w + f_z = 0, \qquad (14)$$
or $Lu + f = 0$ for short. $L_{zw}$ are linear maps (matrices), and f is a given vector function (section, really . . . ), $u_z, f_z \in \Omega_z$.
We assume that L is a positive operator. The system theoretic interpretation is as follows. I work within the context of the associated vector bundle representation theorem, Corollary 1, with Hilbert spaces $\Omega_z$ with scalar product $\langle\,\cdot\,,\cdot\,\rangle$. Addition ⊕ of links shall be written as +, symbols ◦ are omitted, and 0 is the zero link. S shall consist of a constant system $\bar S$ with at most one link $L_{wz} : z \to w$ for every pair z, w of objects, plus one object ∞ with associated vector space $\Omega_\infty$. For simplicity, take $\Omega_\infty = \mathbf{R}$, possibly not isomorphic with $\Omega_z$, $z \in \bar S$, and identify $f_z$, $u_z$ with maps $\mathbf{R} \to \Omega_z$, viz. $r \mapsto u_z r$. In this way, $f_z$ and $u_z$ become links ∞ → z. For every object $z \in \bar S$, there shall be a constant link (linear map) $f_z : \Omega_\infty \to \Omega_z$ and one variable link (linear map) $u_z : \Omega_\infty \to \Omega_z$. The cost functional shall be quadratic,
$$H(S) \equiv E(u) = \frac{1}{2}\sum_{z,w} \langle u_w, L_{wz} u_z\rangle + \sum_z \langle f_z, u_z\rangle. \qquad (15)$$
The local problem is the solution of some equation $L_{zz}u_z + g_z = 0$ for individual z. Write its solution as $(-L_{zz}^{-1})g_z$, without implying that negative and inverse can be computed separately. $(-L_{zz})^{-1}$ are loops. Relaxation at z updates
$$u_z \to (-L_{zz}^{-1})\Big(f_z + \sum_{w \neq z} L_{zw} u_w\Big).$$
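A relaxation sweep for Lu + f = 0 replaces each $u_z$ in turn by the solution of the local problem with all other $u_w$ held fixed. The sketch below illustrates this for scalar $u_z$; the chain operator, the tolerance and all function names are hypothetical choices made for the illustration, not part of the text.

```python
from typing import List


def relaxation_sweep(L: List[List[float]], f: List[float], u: List[float]) -> None:
    """One in-place sweep: each u[z] is reset to solve the local problem at z."""
    n = len(u)
    for z in range(n):
        # g_z = f_z + sum over w != z of L_zw u_w, with all other u_w held fixed
        g = f[z] + sum(L[z][w] * u[w] for w in range(n) if w != z)
        u[z] = -g / L[z][z]  # solution of L_zz u_z + g_z = 0


def residual(L: List[List[float]], f: List[float], u: List[float]) -> float:
    """Largest component of L u + f, which vanishes at the solution."""
    n = len(u)
    return max(abs(sum(L[z][w] * u[w] for w in range(n)) + f[z]) for z in range(n))


def chain_operator(n: int, mass: float) -> List[List[float]]:
    """Hypothetical positive operator: open chain, L_zz = 2 + mass, L_zw = -1 for neighbours."""
    return [[(2.0 + mass) if z == w else (-1.0 if abs(z - w) == 1 else 0.0)
             for w in range(n)] for z in range(n)]


def sweeps_needed(n: int, mass: float, tol: float = 1e-10, max_sweeps: int = 100000) -> int:
    """Count sweeps until the residual of L u + f drops below tol, starting from u = 0."""
    L, f, u = chain_operator(n, mass), [1.0] * n, [0.0] * n
    for sweep in range(1, max_sweeps + 1):
        relaxation_sweep(L, f, u)
        if residual(L, f, u) < tol:
            return sweep
    return max_sweeps
```

With this chain operator, `sweeps_needed(16, 0.01)` comes out larger than `sweeps_needed(16, 1.0)`, because the smaller mass term gives L a smaller lowest eigenvalue.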
This only involves solution of the aforementioned local problem, and composition and addition of links. By assumption, there are enzymes to solve it. The iterative solution is
$$u_z = (-L_{zz})^{-1} \sum_w \sum_{p:\, w \to z} p\, f_w, \qquad (16)$$
where the sum is over all paths p with links $L_{zw}(-L_{ww}^{-1})$, and p is the arrow associated with the path p. This illustrates once again the principle of universality and minimal a priori structure. The axiomatic composition operations ⊕ and ◦ suffice to reduce a global to a local problem, and if the local problem is a global one on a finer scale, the procedure can be repeated. If there are important contributions in (16) from very long paths, the iteration is slow to converge, and there is critical slowing down. This happens when L has very small eigenvalues. The multigrid method deals with this. For reasons of space, I will not give details here but treat the stochastic case instead. 5.4.2. Stochastic case. Renormalization group. The setup in the stochastic case is as in the deterministic case except that now we do not want to minimize a cost function H, but study probability distributions for systems S ∈ S of the form $p(S) = Z^{-1} e^{-\beta H(S)} p_0(S)$, where the a priori distribution $p_0$ assigns equal probabilities in some sense. We are also interested in properties of the expectation values. Thus, S are random systems, and their elements are random variables. In a critical situation, we expect long range correlations. Long range is measured by path length. If S is a drama, the long range correlations could be in time.13
13 Haken’s slave principle [30] was invented to deal with this case.
Starting from level 0, one introduces systems $S^1$ of level 1 and a probability distribution $p^1(S^1)$ for them. The links and objects of level 1 represent subsystems of S in
the same way as before, and they vary with S. One considers the conditional probability distribution $p(S|S^1)$ for S, given $S^1$. One demands that
1. The conditional probability distribution $p(S|S^1)$ for S shows no long range correlations.
2. $p(S) = \sum_{S^1} p(S|S^1)\, p^1(S^1)$.
Typically, the objects x of $S^1$ contain data $\Phi_x$ which reflect the structure of the subsystems X of S to which x corresponds. They do so to the extent that it is relevant for cooperative effects, i.e. long range correlations. And similarly for links. The data do not fix the system X uniquely. The requirement 1 says that emergent phenomena disappear when $\Phi_x$ are frozen. The quantities $\Phi_x$ were introduced into statistical mechanics by Kadanoff [36] under the name of block spin. They are also called macros [59]. The main problem in the approach is to find the suitable subsystems X (blocks) and a good choice of block spins or macros. When one succeeds, the study of nonlocal phenomena (long range correlations) has been lifted to level 1, and a complexity reduction has been achieved. Now one can iterate the procedure. Given the blocks and a choice of block spin, how are the links in $S^1$ and the cost function on level 1 constructed and how can one find out whether the requirements 1 and 2 are satisfied? No general procedure is known which is always guaranteed to work. But the example below gives an idea how the task may be performed by enzymatic computation, at least in favorable cases. There remains the problem of how to choose the blocks and the block spins, much as in the deterministic case. In successful applications of the real space renormalization group to ferromagnets, lattice gauge theories [5], and other problems in physics, successful blocks and block spins could be guessed a priori. The big task for the future is to construct general and systematic procedures to find them. Neural nets [33] are a very difficult example of a prospective application.
For a ferromagnet in thermal equilibrium, suitable blocks are cubes in space of some extension, and a suitable block spin is the total magnetization in the cube. It fixes only the average value of the magnetic moment vectors of the elementary magnets in the cube. But this is all that matters for the purpose of determining long range correlations, i.e. the physics at coarse scales [36]. The construction of block spins is a cognitive procedure. It involves construction of new links which are not composed from existing links alone, although very nearly so. In the construction below, the irreducibly new link is the identification of x with a representative (“typical”) object in X. In biological or social organisms we imagine that they have limited cognitive capabilities which have been acquired by evolution. These determine what blocks and block spins are subject to being tried out. In the spirit of Ehresmann and Vanbremeersch [18], one may imagine that they carry templates of the index category J which is used in the construction of the block spin, cp. the example below and Subsect. 5.4.3. Example 3 (Block spin). The system S and the cost functional H(S) are the same as in the deterministic case, Example 2. $u_z \in \Omega_z$ are now random variables. Their a priori distribution is given by the uniform measure $du_z$ in $\Omega_z$ and we seek to examine the
probability measure
$$d\mu(u) = Z^{-1} e^{-\beta H(u)} \prod_z du_z, \qquad (17)$$
$$H(u) = \frac{1}{2}\sum_{z,w} \langle u_z, L_{zw} u_w\rangle + \sum_z \langle f_z, u_z\rangle. \qquad (18)$$
This would become (almost) a realistic Euclidean quantum field theory model of strongly interacting elementary particles if z formed a hypercubic 4-dimensional lattice and if the lattice gauge fields $L_{zw}$, $z \neq w$, were dynamical, with values in SU(3). dµ is a Gaussian measure.14 Let us discuss the example. Consider subsystems X of $\bar S$ (level 0), for instance a hypercube of some side length in the above mentioned 4-dimensional hypercubic lattice, together with all links $L_{zw}$ between its objects (lattice sites) z, w. We seek to represent X by one object x at the next level 1, and construct a block spin $U_x \in \Omega_x$ as some average of $u_z$ over objects z ∈ X,
$$U_x = \sum_{z \in X} C_{xz} u_z. \qquad (19)$$
We omit ◦-symbols again for the composition of maps; $\sum$ is here addition in the vector space $\Omega_x$. $C_{xz} : \Omega_z \to \Omega_x$ are linear maps. Elements $u_z \in \Omega_z$ for different z are in different spaces. To add them up they need to be parallel transported to some representative site $\hat x \in X$ first. This can be done with the help of a tree J with root $\hat x$ which specifies a unique arrow $z \to \hat x$, i.e. a linear map $t_{\hat x z} : \Omega_z \to \Omega_{\hat x}$. J could be a tree made of links of X or it could be a star of arrows which are sums of parallel transporters from z to $\hat x$ along some classes of paths. Given J we construct
$$C_{xz} = C_{x\hat x}\, t_{\hat x z}. \qquad (20)$$
This involves one new link $C_{x\hat x} : \hat x \to x$ which links the representative object $\hat x \in X$ in X to the representative x of X on the next level. Some such new link is inevitably needed since we have no links between levels to start with. We choose $C_{x\hat x}$ as identification map of $\Omega_{\hat x}$ and $\Omega_x$. This completes the block spin definition. Categorical Remark 6. J is an unfrustrated subcategory of Cat(X) of the type of a partial order [40], p. 11. At this stage we do not insist on making it into a system, so adjoints of its arrows need not be in J. But if the arrows $t_{\hat x z}$ are unitary, we could put their adjoints into J and make it into a system without introducing frustration. It retains the type of a preorder. Now we come to the construction of the links at level 1 and the examination of the locality requirements. The standard procedure [25] is to construct an interpolation operator A, i.e. a collection of links $A_{zx} : x \to z$, $z \in \bar S$, for all $x \in S^1$ in such a way that
$$(LA)_{zx} \equiv \sum_w L_{zw} A_{wx} = \sum_y C^*_{yz} L^1_{yx} \qquad (21)$$
14 The qualification “almost” refers to the fact that u should be Fermi fields rather than true random variables, and L should be the Dirac operator in a gauge field, which is not positive.
for some $L^1$. The links $A_{zx}$ are supposed to be composed from $C^*_{xw}$ and arrows in $\bar S$. The sum over objects y of level 1 has only one term if every z is in only one block y. A suitable A may be obtained by minimizing $\sum_{z,x} \langle A^*_{zx}, (LA)_{zx}\rangle$ subject to the constraint CA = 1, i.e. $\sum_z C_{xz} A_{zy} = \delta_{xy}$ for all x, y. This extra condition makes A unique, and if the block spin choice is “good”, $A_{zx}$ will be local in the sense that $A_{zx}$ is very nearly zero except for z in a reasonably small neighborhood of the subsystem X. $L^1_{xy}$ are the desired links at the next level. Under the stated condition they will also be local, i.e. only a few $L^1_{xy}$ will be not very nearly zero. This follows from $L^1 = CLA$ if $CC^* = 1$. Given A, any u can be uniquely split into a contribution from a blockspin $U = \{U_x\}$ and a fluctuation field $\zeta = \{\zeta_z\}$ which satisfies the constraint $C\zeta = 0$ so that it contributes nothing to the blockspin,
$$u_z = \zeta_z + \sum_x A_{zx} U_x. \qquad (22)$$
The cost function decomposes as
$$H(u) = \frac{1}{2}\langle \zeta, L\zeta\rangle + \langle f, L\zeta\rangle + \frac{1}{2}\langle U, L^1 U\rangle + \langle Cf, L^1 U\rangle. \qquad (23)$$
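The interpolation operator and the coarse links can be computed exactly on a very small example. The sketch below assumes real scalar links on an open chain of eight sites with two blocks of four, and it uses the Lagrange multiplier solution $A = L^{-1}C^*(CL^{-1}C^*)^{-1}$, $L^1 = (CL^{-1}C^*)^{-1}$ of the constrained minimization; this closed form, the chain operator and all names are assumptions made for the illustration and are not spelled out in the text.

```python
from fractions import Fraction
from typing import List, Tuple

Matrix = List[List[Fraction]]


def matmul(P: Matrix, Q: Matrix) -> Matrix:
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P: Matrix) -> Matrix:
    return [list(row) for row in zip(*P)]


def identity(n: int) -> Matrix:
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def inverse(P: Matrix) -> Matrix:
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(P)
    M = [list(P[i]) + identity(n)[i] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


n_sites, block = 8, 4
mass = Fraction(1, 2)
# Hypothetical positive operator on an open chain.
L = [[(2 + mass) if z == w else Fraction(-1 if abs(z - w) == 1 else 0)
      for w in range(n_sites)] for z in range(n_sites)]
# C averages over blocks of four sites with weight 1/2, so that C C* = 1.
C = [[Fraction(1, 2) if z // block == x else Fraction(0) for z in range(n_sites)]
     for x in range(n_sites // block)]

Ct = transpose(C)
L_inv = inverse(L)
L1 = inverse(matmul(matmul(C, L_inv), Ct))  # coarse links (assumed closed form)
A = matmul(matmul(L_inv, Ct), L1)           # interpolation operator with C A = 1


def split(u: List[Fraction]) -> Tuple[Matrix, Matrix]:
    """Split u into block spin U = C u and fluctuation zeta = u - A U, as in Eq. (22)."""
    col = [[v] for v in u]
    U = matmul(C, col)
    AU = matmul(A, U)
    zeta = [[a[0] - b[0]] for a, b in zip(col, AU)]
    return U, zeta
```

With exact fractions one can check that `matmul(L, A)` equals `matmul(Ct, L1)`, which is Eq. (21), that `matmul(C, A)` is the identity, and that the fluctuation part returned by `split` is annihilated by C.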
The constraint CA = 1 may be put into the measure dµ in the form of a δ-function which is the limit of a Gaussian. One finds that ζ are Gaussian random variables [27] with covariance $\Gamma = \lim_{\kappa \to \infty} \Gamma_\kappa$, $\Gamma_\kappa = (L + \kappa C^* C)^{-1}$. One may opt to keep κ finite, thereby relaxing CA = 1. In this case $A = \Gamma_\kappa \kappa C^*$. Then all locality requirements are fulfilled if $\Gamma^\kappa_{zw}$ decays fast with path distance between z and w. In particular, $A_{zx}$ and $L^1_{xy}$ will also be local. In conclusion, one needs to find a local “interpolation operator” $A = \{A_{zx}\}$ such that Eq. (21) holds for some $L^1$. This yields the links $L^1_{xy}$ of the system $S^1$ at the next scale. 5.4.3. (Co)Limits. Here I wish to establish the connection with the work in mathematical biology of Ehresmann and Vanbremeersch [19]. They propose to consider the objects X in level j + 1 which represent subsystems of the level j system as limits in a category. The same construction of composite objects as limits is also used in information science in what is called integration [20]. I will argue that (co)limits serve the same purpose as blockspins, and prove that the blockspin of Sect. 5.4.2 defines a colimit. I recall the notion of a limit [40]. Given a category C and a (small) category J, called the indexing category, a functor F : J → C is called a diagram in C of type J. To be intuitive, Vanbremeersch and Ehresmann call it a pattern of linked objects. By the map, some objects $F_j = F(j)$ of C are indexed by objects j of J. An object L of C together with a collection of arrows $\pi_j : L \to F_j$, one for each j ∈ J, is called a cone π : L → F on the diagram F of type J with vertex L if the following compatibility condition is satisfied. For any arrow u : j → k in J,
$$\pi_k = F(u) \circ \pi_j. \qquad (24)$$
In ref. [19], π = {πj } is called a collective link. It links L to the subsystem L which it is supposed to represent. In our applications, the subsystem has Fj as its objects, but in general it has more arrows than F (u), u ∈ J. The images of links u ∈ J are special links, which are “important for the collaboration”. Note that the collective link projects out any frustration in the image of J in the following sense. Given two arrows F (u1 ) : Fi → Fj
and $F(u_2) : F_i \to F_j$, Eq. (24) implies $F(u_1) \circ \pi_i = F(u_2) \circ \pi_i$. If F maps several j to the same object z = F(j), we need only one link $\pi_j \equiv \tilde\pi_z : L \to z$ for them, and we may regard $\tilde\pi_z$ as a (link-) field whose argument is z ∈ L. The condition (24) can be interpreted to say that this field is constant under parallel transport along paths which are images of paths in J. We say that it is “constant along J” for short. A cone π : L → F is called a limit of the diagram F if the following uniqueness property holds. Given any other cone f : Y → F, there exists a unique arrow g : Y → L such that $f_j = \pi_j \circ g$. By abuse of language, L is called the limit. The condition of a cone f means that $\tilde f_z$ is constant along J. Colimits are dual to limits, i.e. all arrows are reversed. In our applications, all categories except the index categories J will be systems. Therefore $\{\pi_j\}$ defines a limit if $\{\pi_j^*\}$ defines a colimit of a dual diagram of type $J^{op}$, where $J^{op}$ is J with arrows reversed [40]. Theorem 10 (Blockspins as colimits). The blockspin construction of Sect. 5.4.2 based on a tree J defines a colimit $x \in S^1$ in the category which contains Cat(S), Cat($S^1$) and the links $C_{x,z} : z \to x$, (z ∈ S), and in any category containing it. The interpretation of blockspins as (co)limits serves to translate from a quantitative description to a structural one. Before we proceed to the easy proof of Theorem 10, let us discuss how the interpolation operator is interpreted. Equation (21) requires that for every x,
$$\tilde f_z = (LA)_{zx} = \sum_w L_{zw} A_{wx} : x \to z \qquad (25)$$
is constant along J, because this is true of $\tilde\pi_z = C^*_{zy}$. In other words, $\{(LA)_{zx}\}$ is a collective link which defines a cone on the same diagram as for the collective link $\{C^*_{zx}\}$. If every object z is in only one block $\hat z$, then $(C^* L^1)_{zx} = C^*_{z\hat z} L^1_{\hat z x}$ (no sum). Therefore, if LA defines a cone, the existence of $L^1$ is assured by the property that the collective link $\{C^*_{zx}\}$ defines a limit. In conclusion, the existence of the interpolation operator requires the existence of another “factorizing” cone on the same diagram as the limit cone. The factorization into L and some A is expressed by Eq. (25); $L = \{L_{zw}\}$ is the collection of links of the system S at the fine scale. (Note that this is not of the form of standard factorization properties in category theory because of the sum. It involves the ⊕-operation of semiadditive systems.) In addition, the locality properties laid down in previous subsections should be satisfied. This means that it must be possible to approximate $A_{zx}$ by zero if z is not in a reasonably small neighborhood of the subsystem X represented by x. How to go about treating the error made in this way is a subtle issue [66] which I am not prepared to discuss here. Let us turn to the proof of the theorem. The definition of a limit is external in the sense that one needs to seek an arrow in the whole category and show its uniqueness. This is typical of the universal constructions of category theory. This feature is what makes category theory into “abstract nonsense”. But there are instances where limits are internal, i.e. require only examination of the arrows which are involved in their construction. Products in Ab-categories are examples. They are necessarily biproducts, and biproducts are easy to characterize internally, cp. Theorem 2 of Sect. VIII in [40]. Products are limits with index categories J which have no arrows other than the identity arrows. A similar situation holds when J is a tree or preorder.
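The block spin construction on a tree from Sect. 5.4.2, which Theorem 10 interprets as a colimit, can be illustrated with one-dimensional spaces and U(1) phases as parallel transporters. The block with four sites, the phases and the function names below are hypothetical; the sketch only checks the cocone compatibility of the collective link and the factorization of another cocone through it.

```python
import cmath
from typing import Callable, Dict

# Hypothetical block X with sites r, a, b, c; the tree J sends each site to its
# parent, and the root "r" plays the role of the representative site x-hat.
parent: Dict[str, str] = {"a": "r", "b": "r", "c": "a"}
# Unitary U(1) transporter attached to each tree link z -> parent[z].
link: Dict[str, complex] = {
    "a": cmath.exp(0.3j),
    "b": cmath.exp(-1.1j),
    "c": cmath.exp(0.7j),
}


def transport_to_root(z: str) -> complex:
    """t_{x-hat z}: compose the transporters along the unique tree path from z to the root."""
    t = 1 + 0j
    while z != "r":
        t = link[z] * t  # later links act on the left
        z = parent[z]
    return t


C_root = 1 + 0j  # C_{x x-hat}, chosen as the identification map


def C(z: str) -> complex:
    """Collective link C_{xz} = C_{x x-hat} t_{x-hat z}, as in Eq. (20)."""
    return C_root * transport_to_root(z)


def block_spin(u: Dict[str, complex]) -> complex:
    """U_x = sum over z in X of C_{xz} u_z, as in Eq. (19)."""
    return sum(C(z) * u[z] for z in u)


def is_cocone(f: Callable[[str], complex]) -> bool:
    """Compatibility: f(z) equals f(parent[z]) composed with the link z -> parent[z]."""
    return all(abs(f(z) - f(parent[z]) * link[z]) < 1e-12 for z in parent)


def factor_through_colimit(f_root: complex) -> complex:
    """g = f_r o i_r, with i_r the inverse of C_{x x-hat}: the arrow x -> Y."""
    return f_root / C_root
```

Any other compatible family f is fixed by its value at the root, so the arrow returned by `factor_through_colimit` reproduces f on every site after composition with C.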
Lemma 1 (Colimits on trees). Let J be a tree with root r, so that it is unfrustrated and there is a unique arrow $t_j : j \to r$ in J for every object j ∈ J. Then a cocone π : A(J) → L is a colimit if $\pi_r$ has an inverse. Conversely, the colimit property requires that $\pi_r$ has a left inverse $i_r$, viz. $i_r \circ \pi_r = 1_L$. The dual statement is true for limits. Proof of Theorem 10. J and F(J) are identified in Sect. 5.4.2. The collective links are $C_{xz} : z \to x$, (z ∈ A(J)), and the root is $\hat x$. The compatibility condition for a cone is satisfied by construction. The theorem is an immediate consequence of the lemma, since $C_{x\hat x}$, which substitutes for $\pi_r$, was chosen as an identification map, whose adjoint is its inverse. Proof of Lemma 1. Given another cocone f, $g = f_r \circ i_r$ is the required unique map. It is unique because $f_r = g \circ \pi_r = h \circ \pi_r$ implies g = h by invertibility. Conversely, $t = \{t_j\}$ is a cocone with vertex r and the required unique map g must be a left inverse of $\pi_r$. Definition 9 (Ehresmann, Vanbremeersch). A hierarchical system is a category H whose objects are divided into levels, numbered 0, 1, . . . , p, such that each object of level n + 1 (where n < p) is the limit in H of a pattern A of linked objects [= diagram] of level n (i.e. each $A_i$ has level n). I propose to substitute “system” H for “category H”, and count the arrows $\pi_i$ in the collective links as links. I would also prefer to speak of colimits in place of limits. If indeed blockspins and colimits are basically the same, as is suggested by the above Theorem 10, this setup corresponds with the multiscale analysis of Sect. 5.4. It is always possible to extend categories by adding objects which represent limits of certain diagrams [18]. Mathematicians often speak of categories which have all finite limits (i.e. limits for all finite diagrams). But to do so would be contrary to the intended complexity reduction. 5.5. Dynamics on coarser levels.
The somewhat abstract considerations of Sect. 5.4.3 serve to convert blockspin constructions from the quantitative description that is used in quantum field theory and statistical mechanics to a structural description. The problem is now how to extend the dynamics from level 0 to the higher levels. We think of a stochastic dynamics. In autopoietic systems, the objects in the higher levels will typically represent functional units of objects of lower levels which should be capable of making their elements. They may disappear, i.e. die. They may live on, possibly adapting or differentiating. And new ones may form, for instance as newly made copies of already existing units, building blocks being absorbed from the environment. How and when does this happen? Diagrams of type J were identified with certain types of block spins, and the object x to which the block spin is attached was identified with a limit of the diagram. The objects x represent subsystems X of S whose constituents cooperate through links (channels of communication) which are determined by the diagram. x may stand for organs in a biological organism, for institutions of a society, for extended domains in a ferromagnet, or for any kind of a “thing”. We only want to keep or acquire them in our model when they achieve something or are needed to achieve something, namely the cure of locality problems as discussed in Sects. 5.4.1 and 5.4.2. This is the criterion by which objects representing subsystems will appear or disappear.
An object x of this kind in level n + 1 may disappear if the limit of a diagram in level n ceases to exist. This mechanism was suggested by Ehresmann and Vanbremeersch. It can happen when the diagram is disrupted because the links involved in the collaboration disappear. I give an example in a moment. It may also cease to exist because frustration appears in the diagram, so that the cone ceases to exist. Informally speaking, confusion arises because communication of the collaborators through different channels produces different messages. (If J is a tree, this cannot happen.) It may occur that there are short range correlations only on level n in the vicinity of some subsystem X, even without any blockspin constraint. No block x on level n + 1 is needed in that case. If it is present anyway, it will have no important links $L^{n+1}_{xy}$ to other objects y on level n + 1, hence no relations to anything in the “rest of the world $S^{n+1}$”. Such an object does not exist for the “rest of the world” and should be discarded. Conversely, if locality properties in S or in time start to be violated, they need to be salvaged by introducing a blockspin constraint as discussed in Sects. 5.4.1 and 5.4.2, and with the block spin comes an object to which it is attached. Suppose that the correlations are not short ranged at level n without block spin constraint. This can be decided on the basis of relaxation sweeps, i.e. by enzymatic computation. They will produce correlations growing beyond the allowed range. In this case, suitable blocks and block spin constraints will have to be introduced until the residual correlations under the block spin constraint are short ranged. If the multiscale analysis were done on the drama, short ranged would mean in particular short ranged in time. Thus, the effect of particular properties (initial conditions) which are independent of the value of the blockspins will die out quickly.
Only the cooperative effects which are well described by the block spins will survive. Here is the example for the loss of a diagram. It is a frequent cause of death that an organism or a functional part of it gets digested by a predator or parasite. Digestion is performed by special enzymes. For instance, T4 bacteriophage’s nuclease enzymes degrade its E.coli host’s chromosome (but not its own genome). Consider one organism P being digested by another one who attacks it by acting on its object X with its digestion enzyme according to the model mechanism of Sect. 3.1.2. Take any subsystem Q of P which does not contain X. It will be totally disconnected after the digestion process. Therefore, the supposed images of links in the diagram J – the links which are essential for the collaboration – are missing. So the loss of structure at some level, e.g. by digestion, may lead to loss of the limit object in the next level. The choice of block spin will typically not be unique. But if suitable block spins can be found, the long range correlations are under control, and therefore all emergent phenomena. They are merely described in a different language when different block spins are chosen. What has been proposed here is a reductionist scenario – the dynamics on the lower level determines what happens at higher levels, modulo switch of language. Locality is the crucial ingredient of the construction. Earlier multi-level analyses of biological systems [19, 3, 21] had no locality principle and they relaxed on reductionism. Extreme views were expressed by Laughlin and Pines. They claimed that neither life nor high temperature superconductivity can be understood from basic principles [39]. Autopoietic systems make their own elements. This appears to require a top down action of objects x at level n + 1 to make elements which are objects or links at level n inside the subsystem X to which x corresponds. We understand this now. 
There is a dynamics at level n which gives rise to nonlocal – therefore emergent – phenomena. And these are effectively described with the help of objects x at level n + 1.
I add a few words on functionality. Technically most convenient would be a multiscale analysis of the drama. In this way, the higher levels would also acquire longer time scales, and function would appear as structure. For intuition’s sake, I spoke here of changes in time instead. Therefore we needed to speak of functionality separately. In the present approach, functionality depends on the presence of a suitable complement of enzymes. The integrity of boundaries of subsystem may also be important in order to confine the domain where enzymes act, and also the presence of channels of communication which transfer quantities of material constituents. All this should be reflected at the coarser scale when the functionality is important for cooperation at that scale. Production of copies is an emergent phenomenon, as we saw in Sect. 3.1.1. Typically the copy process absorbs objects which involve, at a still lower level, materials (including energy) which are conserved or supplied by the environment in limited quantities. In this way, a competition for scarce resources results which drives evolution. Imagine a subsystem X is copied by the split Fork dynamics because a sufficient collection of microenzymes is present in X. Then the split Fork-enzyme, now considered as one entity, should be attached to the object x which corresponds to X at the next scale. Let me emphasize that a general block spin procedure can be much more complicated than for a ferromagnet, where the Kadanoff construction furnishes one block spin definition for all purposes and all scales. In general, the appropriate kind of blockspin on level n + 1 may depend on the values of variables or block spins ϕ on level n. Since ϕ are random variables, there may be nonvanishing probabilities for the appropriateness of several kinds of block spin descriptions. This can lead to bifurcations. The situation on the next scale may be the same again, and a proliferation of possibilities may result. 
One cannot expect to find giraffes by enumeration of all possibilities of living organisms. Moreover, scaling laws (power laws) are expected to emerge in special circumstances only. Ferromagnets, the Bak Sneppen model of evolution [4] and Lotka–Volterra-models [67] are examples, but power laws are not a general indicator of criticality.
5.6. Deductive vs. observational emergence. Baas [3] makes a distinction between deducible and observational emergence. He interprets Gödel’s incompleteness theorem in logic as a case of observational emergence. In the present framework, we may classify as deduction in this sense anything that involves composition of existing links (with ◦ and ⊕). This does not change the category. Deductions in propositional logic are of this kind, cp. Sect. 4.1, Fig. 7. Observational emergence would then involve the making of cognitive links. They are new links added to the category. The blockspin constructions involve new links.
6. Semicommutative Differential Calculus and Geometry on systems In this section I want to bring systems theory closer to the traditional approach in physics which is based on differential calculus. Given a system S, there is a unique local functor to an unfrustrated system B which shares the objects with S and inherits equivalence classes of its parallel links. In B, different links of S with the same source and target are identified, and equally the arrows. This is an example of the possible identifications mentioned in Sect. 2.2. B is determined
by the graph of S and shall be called the base system of S, or base for short. I assume for simplicity that it has at most countably many elements.15 Differential calculus and geometry puts more structure on a given base, thereby creating systems S with given base, and it constructs algebras and modules to go with them. Changes in connectivity of the base are outside the scope of calculus. The appropriate version of differential calculus to fit onto arbitrary unfrustrated systems B is a special case of noncommutative differential calculus and geometry [12] which was developed by Dimakis and Müller-Hoissen [15]. I call it semicommutative because the “algebra of functions” A remains commutative. In this framework, conventional notions of locality and the notion of a point, which are given up in fully noncommutative differential calculus, retain their meaning. One usage is in lattice gauge theory on a hypercubic lattice. All the familiar formulae from gauge theory in the continuum remain literally true, except for the commutation relations of differentials with functions [16], cp. Eq. (31). The use of this device is in the spirit of the strategy to bring proven methods of theoretical physics to bear on very general complex systems. The discrete calculus substitutes for and is in many ways like calculus on manifolds. Given a base B, let A be the algebra with unit element consisting of real or complex functions on B with pointwise multiplication. It has a basis $\{e^X\}$ labelled by objects X of B such that
$$e^X e^Y = \delta^{XY} e^X, \qquad (26)$$
$\delta_{XY}$ being the Kronecker δ-function. From now on, the symbol X → Y shall mean that there exists a link from X to Y. Since there is at most one such link b we may write b = (XY). The differential algebra Ω is generated by the algebra A of functions and differentials $e^{XY}$ attached to the links of B. $\Omega^1 = \mathrm{span}_{\mathbb{C}}\{e^{XY} : X \to Y\}$ is made into an A-bimodule via
$$e^Z e^{XY} = \delta_{ZX}\, e^{XY}, \qquad e^{XY} e^Z = \delta_{YZ}\, e^{XY}. \qquad (27)$$
The quantity $\rho = \sum e^{XY}$ (sum over all links) is introduced, and exterior differentiation d is defined by
$$de^X = \rho\, e^X - e^X \rho, \qquad (28)$$
$$de^{XY} = \rho\, e^X \rho\, e^Y - e^X \rho^2 e^Y + e^X \rho\, e^Y \rho, \qquad (29)$$
and the standard graded Leibniz rule. There are relations between the generators in the algebra. Whenever the link from X to Y is missing, $e^X \rho\, e^Y = 0$. Applying d implies the constraint $e^X \rho^2 e^Y = 0$, i.e.
$$\sum_Z e^{XZ} e^{ZY} = 0. \qquad (30)$$
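As a consistency check, one can verify from (26)–(28) and the definition of ρ that d acts on functions as a finite difference along links and obeys the Leibniz rule. The following is only a sketch; the expansion $f = \sum_X f(X)\, e^X$ of a function in the basis is notation introduced here for illustration, and (28) is extended to general f by linearity.

```latex
% Assumption: f = \sum_X f(X) e^X and g = \sum_X g(X) e^X are elements of A,
% and \rho = \sum_{X \to Y} e^{XY} as in the text.
\begin{align*}
\rho f &= \sum_{X\to Y}\sum_Z f(Z)\, e^{XY} e^Z = \sum_{X\to Y} f(Y)\, e^{XY}, \\
f \rho &= \sum_{X\to Y}\sum_Z f(Z)\, e^Z e^{XY} = \sum_{X\to Y} f(X)\, e^{XY}, \\
df &= \rho f - f \rho = \sum_{X\to Y} \bigl[f(Y) - f(X)\bigr]\, e^{XY}, \\
d(fg) &= \rho f g - f g \rho = (\rho f - f \rho)\, g + f\, (\rho g - g \rho) = (df)\, g + f\, dg .
\end{align*}
```

The order of the factors in the last line matters, because functions and differentials do not commute.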
It was shown by Dimakis and Müller-Hoissen that this calculus reduces to something looking familiar on an “oriented” d-dimensional hypercubic lattice of lattice spacing a. Of the two directions ±µ, one is distinguished as positive, say +µ, µ = 1, ..., d, and links are put between nearest neighbours denoted x and x + µ in positive direction only.

$^{15}$ B need not be like a grid. For instance, it might contain grid-like subsystems together with arbitrary refinements of these. The floating lattices in the Ashtekar approach to gravity are like this [2].
If $x^\mu$ are the standard coordinate functions, one computes $dx^\mu = a \sum_x e^{x,x+\mu}$ and therefore $df(x) = a^{-1}[f(x+\mu) - f(x)]\,dx^\mu$. The constraints among differentials reproduce the standard relations $dx^\mu dx^\nu + dx^\nu dx^\mu = 0$. There is a Hodge ∗-operator, and the calculus shares all the properties of the continuum calculus, except that
$$f(x)\,dx^\mu = dx^\mu f(x+\mu). \qquad (31)$$
This rectifies the Leibniz rule. To understand the usefulness of all this, consider

Theorem 11 (Gauss constraint). The Maxwell dynamics of a free electromagnetic field in discrete space and time preserves the Gauss constraint.

Proof. One transcribes the standard proof. Let $d_s$ be the exterior derivative in space, ∗ the Hodge star operator in space and $d_s^* = *\,d_s\,*$. The electric field defines a 1-form $E = E_i\,dx^i$. The Gauss law says that $d_s^* E = 0$. The magnetic field defines a 2-form B and the equations of motion say that $\dot E = d_s^* B$. Since $(d_s^*)^2 = 0$ it follows that $d_s^* \dot E = 0$, and so the Gauss constraint is preserved. All the quantities have their analog in the semicommutative calculus on an oriented cubic lattice, $\dot E$ becomes the finite difference derivative, and the equations of motion and Gauss constraint retain their form. So the proof carries over.

Acknowledgement. I would like to thank A. Brandt, I. Cohen, M. Meier-Schellersheim, S. Solomon, J. Wuerthner and Y. Xylander for many helpful and stimulating discussions.
References 1. Alberts, B., Bray, D., Lewis, J., Raff, M., Roberts, K., Watson, J.: Molecular Biology of the Cell. New York, London: Garland Publishing, 1994 2. Ashtekar, A. and Lewandowski, J.: Differential geometry on the space of connections via graphs and projective limits. J. Geom. Phys. 17, 191–230 (1995) Ashtekar, A., Lewandowski, J., Marolf, D., Mourao, J., Thiemann, T.: Quantization of diffeomorphism invariant theories of connections with local degrees of freedom. J. Math. Phys. 36, 6456–6493 (1995) 3. Baas, N.A.: Emergence, Hierarchies and Hyperstructures. In: Artificial Life III, G. Langton (ed), SFI Studies in the Sciences of Complexity, Proc. Vol XVII, Reading, MA: Addison Wesley, 1994 Baas, N.A. and Emmeche, C.: On emergence and explanation. Intellectica 1997/2, 67–83 (1997) 4. Bak, P., Sneppen, K.: Punctuated equilibrium and criticality in a simple model of evolution. Phys. Rev. Letters 71, 4083 (1993) 5. Balaban, T.: Renormalization group approach to lattice gauge field theories. Commun. Math. Phys. 109, 249 (1987) Balaban, T., Imbrie, J., Jaffe, A.: Renormalization of Higgs Models: Minimizers, Propagators and the stability of mean field theory. Commun. Math. Phys. 97, 299 (1985) 6. Balaban, T.: Regularity and decay properties of lattice Green’s functions. Commun. Math. Phys. 89, 571 (1983) 7. von Bertalanffy, L.: Les problèmes de la vie. Paris: Gallimard, 1956 General systems theory. New York: George Braziller, 1968 8. Braker, H.: Algorithms and applications in timed discrete event systems. Dissertation Delft, 1993 Pöppe, C.: Fahrplanalgebra. Spektrum der Wissenschaft, Digest Wissenschaftliches Rechnen, 1999 9. Brandt, A., Ron, D.: Recovery of renormalized Hamiltonians and Coarse-to-Fine Monte-Carlo Acceleration. Weizmann Institute preprint, 1999 10. Casti, J.L.: The simply complex: trendy buzzwords or emerging new science? Bull. Santa Fé Institute 7, 10–13 (1992) 11. Cohen, I.: Tending Adam’s garden: Evolving the Cognitive Immune Self. 
New York: Academic Press, 2000 12. Connes, A.: Noncommutative Geometry. New York: Academic Press, 1994
13. Connes, A., Lott, J.: Particle physics and noncommutative geometry. Nucl. Phys. B (Proc. Suppl.) 18, 29–47 (1991) 14. Deligne, P. et al (eds): Quantum fields and Strings: A course for mathematicians. Vol I,II. Providence, RI: American Math. Soc., 1999 15. Dimakis, A., Müller-Hoissen, F.: Differential calculus and gauge theory on finite sets. J. Phys. A 27, 3159 (1994) Discrete differential calculus, graphs, topologies and gauge theories. J. Math. Phys. 35, 6703 (1994) Dimakis, A., Müller-Hoissen, F., Vandersypen, F.: Discrete differential manifolds and dynamics on networks. J. Math. Phys. 36, 3771 (1995) 16. Dimakis, A., Müller-Hoissen, F. and Striker, T.: Noncommutative differential calculus and lattice gauge theory. J. Phys. A 26, 1927 (1993) From continuum to lattice theory via deformation of the differential calculus. Phys. Lett. B 300, 141 (1993) 17. Dirac, P.A.M.: The principles of quantum mechanics, Oxford: Clarendon, 1958 18. Ehresmann, A.C. and Ehresmann, C.: Categories of sketched structures. Cah. Top. Géom. Diff. XIII 2, 105–214 (1972) 19. Ehresmann, A.C. and Vanbremeersch, J.-P.: Hierarchical Evolutive Systems: A Mathematical Model for Complex Systems. Bull. Math Biology 49, 13–50 (1987) 20. Ehrig, H. and Orejas, F.: Integration and Classification of Data Types and Process Specification Techniques. Bulletin EATCS: Formal Specification Column 65 (1998) 21. Emmeche, C., Koppe, S., Stjernfeld, F.: Explaining emergence – towards an ontology of levels. J. General Philosophy of Sciences 28, 83–119 (1997) 22. Faddeev, L.D., Slavnov, A.A.: Gauge Fields. Introduction to Quantum Theory. Reading, MA: Benjamin/Cummings, 1980 23. Faddeev, L.D. and Takhtajan, L.A.: Hamiltonian methods in the theory of solitons. Heidelberg: Springer, 1987 24. Frampton, P.H. and Glashow, S.L.: Unifiable Chiral Color with Natural Glashow–Iliopoulos–Maiani Mechanism. Phys. Rev. Lett. 58, 2168–2170 (1987) 25. 
Gawedzki, K., Kupiainen, A.: A rigorous blockspin approach to massless lattice theories. Commun. Math. Phys. 77, 31–64 (1980) 26. Gell-Mann, M.: The quark and the jaguar: Adventures in the simple and the complex. San Francisco: Freeman, 1994 27. Glimm, J., Jaffe, A.: Quantum Physics. A functional Integral Point of View. Heidelberg: Springer, 1981 28. Gould, R.: Graph theory. Reading, MA: Benjamin/Cummings, 1988 29. Gupta, R. et al.: Monte Carlo renormalization group for SU(3) lattice gauge theory. Phys. Rev. Lett. 53, 1721 (1984) Gupta, R., Wilson, K.G., Umrigar, C.: Improved Monte Carlo renormalization group method. In: Proc. Frontiers in quantum Monte Carlo, J. Stat. Phys. 43, 1095–1099 (1986) 30. Haken, H.: Synergetics. Heidelberg: Springer, 1972 31. Hankin, C.: Lambda Calculi. Oxford: Clarendon Press, 1994 32. Henneaux, M. and Teitelboim, C.: Quantization of gauge systems. Princeton, NJ: Princeton University Press, 1992 33. Hertz, J., Krogh, A., Palmer, R.G.: Introduction to the theory of neural computation. Redwood City: Addison Wesley, 1991 34. Ilinski, K.: Physics of Finance. Preprint; hep-th/9710148 (1997) 35. Jacob, F.: La logique du vivant. Paris: Gallimard, 1970 36. Kadanoff, L.P.: “Scaling laws for Ising models near $T_c$”. Physics 2, 263 (1965), reviewed in Sect. 2.2.2 of ref. [38] 37. Kalkreuter, T.: Spectrum of the Dirac operator and multigrid algorithm with dynamical staggered fermions. Phys. Rev. D 51, 1305 (1995) 38. Kogut, J., Wilson, K.: The renormalization group and the ε-expansion. Phys. Reports C 12, 75 (1974) 39. Laughlin, R.B., Pines, D.: The theory of everything. Proc. Natl. Acad. Sciences 97, 28–31 (2000) 40. Mac Lane, S.: Categories for the working mathematician, New York: Springer Verlag, 1971 41. Mac Lane, S. and Moerdijk, I.: Sheaves in Geometry and Logic. A first introduction to topos theory. Heidelberg: Springer, 1992 42. Lawvere, F.W.: An elementary theory of the category of sets. Proc. Natl. Acad. Sci. 52, 1506–1511 (1964) 43.
Lewin, B.: Genes VII. Oxford: Oxford University Press, 2000, p. 349 44. Lindenmayer, A.H.: Mathematical models for Cellular Interactions in Development I,II. J. Theor. Biol. 18, 280–315 (1968) 45. Louie, A.H.: Categorical System Theory and the Phenomenological Calculus. Bull. Math. Biol. 45, 1029– 1045 (1983) Louie, A.H.: Categorical System Theory. Bull. Math. Biol. 45, 1047–1072 (1983)
46. Luhmann, N.: The autopoiesis of social systems. In: Sociocybernetic paradoxes: Observation, Control and Self-Steering Systems, R.F. Geyer and J. van der Zouwen (eds), London: Sage 1986, pp. 172–192 Luhmann, N.: Soziale Systeme: Grundriss einer allgemeinen Theorie, Frankfurt: Suhrkamp, 1988 47. Lüscher, M., Mack, G.: Global conformal invariance. Commun. Math. Phys. 41, 203 (1975) 48. Mack, G.: Emergence, a case of order vs. disorder. Physica D, submitted 49. Mack, G.: To Gauge Theory from a Minimum of a priori structure. Proc. Steklov Inst. Mathematics 226, 208–216 (1999) 50. Mack, G.: Pushing Einsteins principles to the extreme. In: G. ’t Hooft et al, Quantum Fields and Quantum Space Time, NATO-ASI series B: Physics Vol. 364, New York: Plenum Press, 1997 51. Mack, G.: Gauge theory of things alive: Universal dynamics as a tool in parallel computing. Progress Theor. Phys. (Kyoto) Suppl. 122, 201–212 (1996) 52. Mack, G.: Gauge theory of things alive. Nucl. Phys. B (Proc Suppl.) 42, 923–925 (1995); extended version: Gauge theory of things alive and Universal dynamics. DESY 94-184, hep-lat/9411059 53. Mack, G.: Colour screening and Quark confinement. Phys. Lett. B 79, 263 (1978) 54. Mack, G.: Convergence of operator product expansions on the vacuum in conformal invariant quantum field theory. Commun. Math. Phys. 53, 155 (1977) 55. Mack, G., Wuerthner, J.: Simulation of Complex Systems by Enzymatic Computation. DESY 00-147 (2000), physics/0011020 (2000) Wuerthner, J.: Enzymatic Simulation of Complex Processes. PhD-thesis, Hamburg, Feb. 2000 56. Maturana, H.W. and Varela, F.G.: Autopoiesis and cognition: The realization of the Living. Amsterdam: Dordrecht, 1980 57. Meier-Schellersheim, M., Mack, G.: SIMMUNE, a tool for simulating and analyzing Immune System behaviour. Submitted to Bull. Math. Biology, cs. MA/9903017 Meier-Schellersheim, M.: The Immune System as a Complex System: Description and Simulation of the Interactions of its Constituents. 
PhD thesis, Hamburg 2001 58. Minsky, M.: The society of mind. New York: Simon and Schuster, 1986 59. Persky, N. and Solomon, S.: Macros and multiscale dynamics in spin glasses. cond-mat/9603056 Solomon, S.: The microscopic representation of complex macroscopic phenomena. Ann. Rev. Comput. Physics II (D. Stauffer, ed.) 243–294 (1995) 60. Petri, C.A.: Kommunikation mit Automaten. PhD thesis Bonn, Schriften des Instituts für Instrumentelle Mathematik, 1962 61. Rashevsky, N.: Organismic sets. Outline of a general Theory of Biological and Sociological Organisms. Bull. Math. Biophys 29, 139–152 (1967) Organismic sets II. Some general considerations. Bull. Math. Biophys. 30, 163–174 62. Rathje, D.: Die Theorie kategorischer Systeme. Emergenz, Kognition, Überblick und Berechenbarkeit. Diplomarbeit, Hamburg, Feb. 2000 63. Rosen, R.: The representation of Biological Systems from the Standpoint of the Theory of Categories. Bull. Math. Biophys. 20, 245–260 (1958) A Relational theory of Biological Systems. Bull. Math. Biophys. 20, 217–341 (1958) A Relational theory of Biological Systems II. Bull. Math. Biophys. 21, 109–128 (1959) 64. Rozenberg, G.: Handbook of Graph Grammars and Computing by Graph Transformation, Vol. 1: Foundations. Singapore: World Scientific, 1997 65. Schrattenholzer, M.: Logik lokal. Diplomarbeit, Hamburg, 1999 66. Shankar, R., Gupta, R., Murphy, G.: Dealing with truncation in Monte Carlo renormalization group calculations. Phys. Rev. Lett. 55, 1812 (1985) 67. Solomon, S.: Generalized Lotka–Volterra (GLV) models and generic emergence of scaling laws in stock markets. cond-mat/9901250, Proc. Econophysics Budapest 1997, to appear in: I. Kontor and J. Kertes (eds), Kluver Academic Press 68. Trottenberg, U., Oosterlee, C., Schüller, A.: Multigrid. New York: Academic Press, 1999 Brandt, A.: Multi-level adaptive solutions to boundary value problems. Math. Comp. 31, 333–390 (1977) 69. Wilson, K.: Renormalization group and strong interactions. Phys. Rev. 
D 3, 1818 (1971) 70. Wittgenstein, L.: Tractatus logico-philosophicus. London, 1922 71. Wolfram, S.: Cellular automata as models of complexity. Nature 311, 419 (1984) 72. Quine, W.v.O.: Word and object. Cambridge, MA: MIT Press, 1962

Communicated by A. Jaffe
Commun. Math. Phys. 219, 179 – 190 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Gauge and Mass Parameter Dependence of Renormalized Green’s Functions

Peter Breitenlohner, Dieter Maison

Max-Planck-Institut für Physik, Föhringer Ring 6, 80805 München, Germany

Received: 18 May 2000 / Accepted: 28 May 2000

Dedicated to the memory of Harry Lehmann

Abstract: Using the framework of algebraic renormalization we discuss the dependence of the renormalization group flow on gauge-fixing and mass parameters. We demonstrate that the freedom of finite renormalizations can be used to remove this dependence from the coefficients of the renormalization group equation.
1. Introduction

Historically Quantum Field Theory arose from the attempt to quantize charged particles coupled to the electromagnetic radiation field. Already the first calculations by Heisenberg, Pauli, and Weisskopf, treating the interaction between the particles and the radiation field as a small perturbation, were plagued by infinities. Although it was observed that the infinities could be absorbed into the parameters describing unobservable “bare” particles, the resulting finite (renormalized) answers contained a certain amount of arbitrariness (apart from the fact that it is not very comforting to base the theory on unobservable objects). “Hence there was a definite need for a structural investigation of the divergencies of QED and their consistent removal”, as F. Dyson expressed it. Such structural analysis eventually led both to the development of an axiomatic approach to the renormalized perturbation expansion [1] and to general axioms for relativistic quantum field theory. In particular the LSZ axiomatics [2], although intrinsically nonperturbative, also allows for a systematic construction of the renormalized perturbation expansion as demonstrated by Steinmann [3]. In a similar spirit Bogolyubov showed that the renormalized perturbation series for the time-ordered functions can be systematically constructed using only the general principles of locality, (off-shell) unitarity and Poincaré invariance [4]. One of the main results of this axiomatic approach to the renormalization program developed by Bogolyubov, Hepp, Zimmermann and others is the precise characterization of the amount of arbitrariness involved in the construction of the expansion in terms of “finite renormalizations” [5].
However, physically interesting theories, e.g. the Standard Model, are restricted by further constraints expressed by symmetry properties referring to transformation groups acting on the fields. These can be space-time symmetries like conformal invariance or global and local internal invariances. The latter may usually be attributed to the use of unphysical “coordinate fields” required for a local formulation of the interactions. Gauge invariance then provides the guarantee that the unphysical degrees of freedom decouple from the physical states of the system. On the other hand this local invariance leads to zero-modes of the wave operator enforcing the breaking of gauge invariance by a gauge fixing in order to have a well-defined propagator. Nevertheless it turns out to be possible to decouple the unphysical modes by introducing so-called Faddeev–Popov ghosts and enforcing Slavnov–Taylor identities [6] expressing the response of the system to a gauge change. An important formal achievement was provided by Becchi, Rouet and Stora [7], who showed that the Slavnov–Taylor identities may be interpreted as an invariance condition of the quantum field theory under a new type of transformations with Grassmann type parameters, the BRS transformations. These transformations not only have an interesting algebraic geometric interpretation, but also allow for an abstract characterization of the renormalized perturbation expansion reducing the arbitrariness in its construction to physically relevant parameters. A BRS invariant renormalized perturbation expansion can be obtained essentially in two ways, either using an explicitly invariant subtraction procedure (e.g. employing an invariant regularization scheme) or using what is now called the “Algebraic Renormalization” [8]. The latter is based on the idea to turn the arbitrariness of the renormalization procedure into a virtue and enforce BRS invariance by exploiting the freedom to make finite renormalizations that eliminate violations of the invariance due to the use of an arbitrary “intermediate” renormalization. Although this procedure is in general more complicated than the use of an explicitly invariant one, it does not require the invention of an invariant regularization, which is often accompanied by delicate questions of consistency (e.g. the popular “Dimensional Regularization” [9]). The algebraic renormalization also allows one to distinguish apparent violations of invariance from genuine ones, usually called “Anomalies”. The mathematical apparatus appropriate for this distinction belongs to a well-developed branch of algebraic geometry, cohomology theory.

In the present paper (based on unpublished work done long ago [10]) we want to demonstrate the use of the algebraic renormalization method in two simple applications, namely the question of the dependence of the renormalization group flow on gauge-fixing and mass parameters. While the gauge parameter dependence has been extensively treated in the literature [11–13], some aspects of [10] seem still to be unpublished. Although the mass independence of the renormalization group flow is known to hold for dimensional renormalization with minimal subtraction [14], we prefer a scheme independent treatment.
2. Gauge Parameter Dependence of Green’s Functions

In order to analyze the gauge parameter dependence of Green’s functions we consider the case of a simple non-abelian gauge group G (with generators $t_a$, $[t_a, t_b] = i f_{abc} t_c$) with one coupling constant g and one gauge parameter α. The generalization to the case with several g’s and α’s is straightforward and analogous to the discussion of mass dependence later on. In addition to the gauge fields $A_\mu = A^a_\mu t_a$ and matter fields $\varphi_i$ (transforming with some matrix representation ρ of G) we have to introduce Faddeev–Popov ghosts $c = c^a t_a$ with the nilpotent BRS-variations
$$sA_\mu = D_\mu c \equiv \partial_\mu c + i[c, A_\mu], \qquad s\varphi = i c^a \rho_a \varphi, \qquad sc = i c^2, \qquad (1)$$
as well as anti-ghosts $\bar c = \bar c^a t_a$ and auxiliary ghosts $B = B^a t_a$ with
$$s\bar c = B, \qquad sB = 0. \qquad (2)$$
Note that we have chosen gauge- and BRS-transformations that depend on the group structure but not on the coupling constant which has been absorbed into the fields.
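The nilpotency claimed for (1) and (2) can be checked directly on the ghost and gauge fields. The following is a sketch under an assumption that the text does not spell out: s acts as a graded derivation, $s(XY) = (sX)Y + (-1)^{|X|} X\,(sY)$, with c odd and $A_\mu$ even, and s commutes with $\partial_\mu$.

```latex
% Assumption: s is a graded derivation; c and sA_\mu are odd, A_\mu is even.
\begin{align*}
s^2 c &= i\, s(c^2) = i\bigl[(sc)\,c - c\,(sc)\bigr] = i\,(i c^3 - i c^3) = 0, \\
s^2 A_\mu &= \partial_\mu (sc) + i\,[sc, A_\mu] - i\,\{c, sA_\mu\} \\
          &= i\{c, \partial_\mu c\} - [c^2, A_\mu] - i\{c, \partial_\mu c\} + \{c, [c, A_\mu]\} = 0, \\
s^2 \bar c &= sB = 0 .
\end{align*}
```

The last step for $A_\mu$ uses $\{c, [c, A_\mu]\} = [c^2, A_\mu]$, which follows by expanding both sides.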
2.1. Review of Algebraic Renormalization. Whereas in [10] we have used dimensional renormalization in order to obtain BRS-invariant renormalized Green’s functions, we now use algebraic renormalization methods that do not refer to any particular renormalization scheme. In the following we assume that the theory is free of anomalies, i.e. that BRS-invariance can be established by suitable finite renormalizations (which is proven, e.g., for semi-simple non-abelian gauge groups). Let us briefly review the key concepts of algebraic renormalization, closely following the treatment by Piguet and Sorella [8]. We will need the renormalized Green’s functions of the elementary fields $\phi = (\bar c, B, c, A_\mu, \varphi)$ and composite fields (composite operators, operator insertions) $O_i$, collectively denoted by $\Phi = (\phi, O)$,
$$G^n_{i_1,\dots,i_n}(x_1,\dots,x_n) = \bigl\langle T\,\Phi_{i_1}(x_1)\cdots\Phi_{i_n}(x_n)\bigr\rangle. \qquad (3)$$
They can be derived from a generating functional $Z(\mathcal{J})$ depending on sources $\mathcal{J} = (j, J)$,
$$G^n_{i_1,\dots,i_n}(x_1,\dots,x_n) = \left.\frac{\hbar}{i}\frac{\delta}{\delta\mathcal{J}^{i_1}(x_1)}\cdots\frac{\hbar}{i}\frac{\delta}{\delta\mathcal{J}^{i_n}(x_n)}\,Z(\mathcal{J})\right|_{\mathcal{J}=0}, \qquad (4)$$
$$Z(\mathcal{J}) = \Bigl\langle T\exp\Bigl(\frac{i}{\hbar}\sum_i\int dx\,\mathcal{J}^i(x)\,\Phi_i(x)\Bigr)\Bigr\rangle. \qquad (5)$$

As composite fields we first of all need the BRS-variations of the elementary fields $sc$, $sA_\mu$, and $s\varphi$ (although some of the “matter fields” ϕ might actually be gauge invariant). In addition we may add gauge covariant composite fields built from $F_{\mu\nu}$, ϕ, and covariant derivatives; if they are not gauge invariant we need their BRS-variations as well. They must form a closed set under operator mixing, i.e., we have to add all composite fields with the same covariance and power counting dimension (and, unless the theory is massless, of lower dimension). Finally, since the classical action (to be specified later on) is only BRS- and not gauge invariant, the renormalized operators corresponding to gauge invariant composite fields may mix with BRS-invariant ones (but not with those that are BRS-variations [11]); such composite fields must also be included. Due to the BRS-invariance of the classical action, the generating functional satisfies in the tree approximation the Slavnov–Taylor identity (integrated Ward identity) $\mathcal{S}Z(\mathcal{J}) = 0$, where $\mathcal{S}$ is the linear functional differential (Slavnov–Taylor) operator
$$\mathcal{S} \equiv -\sum_{ij}\int dx\,\mathcal{J}^i(x)\,\Theta_i{}^j\,\frac{\delta}{\delta\mathcal{J}^j(x)}, \qquad (6)$$
determined by the BRS-transformations of the fields $s\Phi_i = \sum_j \Theta_i{}^j \Phi_j$. The finite renormalizations are to be chosen such that the Slavnov–Taylor identity is satisfied to all orders of perturbation theory. In order to do so one needs the vertex functional Γ corresponding in a diagrammatic approach to 1PI Feynman graphs. Introducing first the generating functional for connected Green’s functions $Z_c(\mathcal{J}) = \frac{\hbar}{i}\ln Z(\mathcal{J})$, one has to perform a functional Legendre transformation with respect to all sources j for elementary fields. Defining the classical fields $\phi_i(x; j, J) = \delta Z_c(j, J)/\delta j^i(x)$ and solving for $j(x; \phi, J)$ the vertex functional is given by
$$\Gamma(\phi, J) = Z_c(j(\phi, J), J) - \sum_i\int dx\, j^i(x; \phi, J)\,\phi_i(x), \qquad (7)$$
with $j^i(x; \phi, J) = -\delta\Gamma(\phi, J)/\delta\phi_i(x)$. In the tree approximation we obtain
$$\Gamma_{cl} = S_{cl} + \sum_i\int dx\, J^i(x)\,O_i(x). \qquad (8)$$

The classical action $S_{cl}$ consists of the gauge invariant action $S_{inv} = \int dx\,\mathcal{L}_{inv}(x)$ and a gauge fixing term $S_{gf}$ which is a suitable BRS-variation. The classical Lagrangian contains all gauge invariant terms compatible with power counting; we choose in particular
$$\mathcal{L}_{inv} = -\frac{1}{4g^2}\,F_{\mu\nu}\cdot F^{\mu\nu} + \mathcal{L}_{matter}, \qquad (9)$$
$$\mathcal{L}_{gf} = \frac{1}{g^2}\, s\Bigl[\bar c\cdot\Bigl(\frac{1}{2\alpha}B + \partial A\Bigr)\Bigr] \qquad (10)$$
$$\phantom{\mathcal{L}_{gf}} = \frac{1}{g^2}\Bigl(\frac{1}{2\alpha}B^2 + B\cdot\partial A - \bar c\cdot\partial Dc\Bigr). \qquad (11)$$
The matter Lagrangian could be, e.g. for one massless spinor field used as example below,
$$\mathcal{L}_{matter} = i\,\bar\psi\gamma^\mu D_\mu\psi. \qquad (12)$$
In the tree approximation we find
$$\frac{\delta\Gamma}{\delta B(x)} = \frac{1}{g^2}\Bigl(\frac{1}{\alpha}B + \partial A\Bigr), \qquad (13)$$
and this B-field equation holds true to all orders since there are no (non-trivial) 1PI diagrams with external B-lines. Later on we will use this field equation to eliminate B and obtain a somewhat more conventional form for the gauge fixing term. For the moment we have to keep the field B in order that the BRS-variations and Slavnov–Taylor operator are nilpotent. Adding a term $\Delta = \int dx\,\Delta(x)$ to the classical action yields a corresponding change in the vertex functional
$$S_{cl} \to S_{cl} + \Delta \quad\Rightarrow\quad \Gamma \to \Gamma + \Delta\cdot\Gamma + O(\hbar\Delta^2), \qquad (14)$$
where Δ is an integrated operator insertion, i.e. $\Delta(x)\cdot\Gamma$ is the generating functional for vertex functions with one additional vertex Δ(x), and
$$\Delta\cdot\Gamma = \Delta + O(\hbar\Delta). \qquad (15)$$
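Inserting a local polynomial P with sources J and without fields or with one field reproduces just this polynomial
$$P(J(x))\cdot\Gamma = P(J(x)), \qquad \bigl(P(J(x))\,\phi_i(x)\bigr)\cdot\Gamma = P(J(x))\,\phi_i(x). \qquad (16)$$
Next we have to formulate the Slavnov–Taylor identity for the vertex functional Γ. Since the BRS-variations of elementary fields are either elementary or composite we decompose the matrix Θ accordingly,
$$s\begin{pmatrix}\phi_i\\ O_i\end{pmatrix} = \sum_j\Theta_i{}^j\begin{pmatrix}\phi_j\\ O_j\end{pmatrix} = \sum_j\begin{pmatrix}\tilde\theta_i{}^j & \bar\theta_i{}^j\\ 0 & \theta_i{}^j\end{pmatrix}\begin{pmatrix}\phi_j\\ O_j\end{pmatrix}, \qquad (17)$$
and obtain the Slavnov–Taylor identity $\mathcal{S}(\Gamma) = 0$ with the non-linear operator
$$\mathcal{S}(\Gamma) \equiv \int dx\,\Bigl[B(x)\,\frac{\delta\Gamma}{\delta\bar c(x)} + \sum_{ij}\Bigl(\bar\theta_i{}^j\,\frac{\delta\Gamma}{\delta\phi_i(x)} - J^i(x)\,\theta_i{}^j\Bigr)\frac{\delta\Gamma}{\delta J^j(x)}\Bigr]. \qquad (18)$$
There is a corresponding Γ-dependent linear functional differential operator $\mathcal{S}_\Gamma$ defined by
$$\mathcal{S}_\Gamma F = \lim_{\epsilon\to 0}\frac{1}{\epsilon}\bigl(\mathcal{S}(\Gamma + \epsilon F) - \mathcal{S}(\Gamma)\bigr), \qquad (19)$$
satisfying $\mathcal{S}_\Gamma\mathcal{S}(\Gamma) \equiv 0$; moreover $\mathcal{S}(\Gamma) = 0$ implies $\mathcal{S}_\Gamma\mathcal{S}_\Gamma \equiv 0$.

According to the quantum action principle $\mathcal{S}(\Gamma) = \tilde\Delta\cdot\Gamma$ and the (integrated) operator insertion $\tilde\Delta$ satisfies $\mathcal{S}_\Gamma(\tilde\Delta\cdot\Gamma) = 0$. Assume the lowest contribution to the BRS-breaking $\tilde\Delta$ arises from graphs with n > 0 loops, i.e., $\tilde\Delta = \hbar^n\tilde\Delta_n + O(\hbar^{n+1})$ (in the tree approximation $\mathcal{S}(\Gamma_{cl}) = 0$ by construction), and consequently
$$\mathcal{S}_{\Gamma_{cl}}\tilde\Delta_n = 0, \qquad (20)$$
where
$$\mathcal{S}_{\Gamma_{cl}} = \int dx\,\Bigl[\sum_i s\phi_i(x)\,\frac{\delta}{\delta\phi_i(x)} + \sum_{ij}\Bigl(\bar\theta_i{}^j\,\frac{\delta\Gamma_{cl}}{\delta\phi_i(x)} - J^i(x)\,\theta_i{}^j\Bigr)\frac{\delta}{\delta J^j(x)}\Bigr]. \qquad (21)$$
Assuming there is no BRS-anomaly, there exists an integrated local composite field $\Delta_n$ such that $\mathcal{S}_{\Gamma_{cl}}\Delta_n = \tilde\Delta_n$ and we can subtract $\hbar^n\Delta_n$ from the classical action and thereby cancel $\tilde\Delta_n$. The field equation (13) for B together with $\mathcal{S}(\Gamma) = 0$ implies the ghost field equation
$$\Bigl(\frac{\delta}{\delta\bar c} + \frac{1}{g^2}\,\partial^\mu\frac{\delta}{\delta J_{sA_\mu}}\Bigr)\Gamma = 0, \qquad (22)$$
i.e., Γ depends on $\bar c$ only through the combination $J_{sA_\mu} + \frac{1}{g^2}\,\partial^\mu\bar c$.

Let $J^i$ and $J^{i'}$ be the sources coupled to a gauge covariant (but not invariant) composite field $O_i$ and to $sO_i$. Whenever $\mathcal{S}(\Gamma) = 0$ can be satisfied in the absence of $J^i$ and $J^{i'}$, it can also be satisfied with them; this just amounts to a suitable redefinition of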
the composite operator $sO_i$. This need not be so when adding the source for a gauge invariant composite field; in this case there might be a global anomaly.

The procedure outlined above fixes some, but not all of the freedom allowed by renormalization theory; one can still add terms $\hbar^n\Delta_n$ to the classical action provided $\mathcal{S}_{\Gamma_{cl}}\Delta_n = 0$. These “invariant insertions” are terms that are either already present in $\Gamma_{cl}$ or additional source terms that take care of operator mixing (and multiple operator insertions), i.e. finite renormalizations of coupling constants (including gauge parameters), masses, wave functions, and composite operators; all of them have to be fixed by a complete set of suitable normalization conditions. For the gauge field propagator (after eliminating the auxiliary field through its field equation $B = \alpha\,\partial A$)
$$\bigl\langle T\,\tilde A^a_\mu(k)\,\tilde A^b_\nu(-k)\bigr\rangle = \frac{-i\hbar\,\delta_{ab}}{(k^2 + i0)^2}\Bigl[\frac{g_{\mu\nu}k^2 - k_\mu k_\nu}{\Pi_1(k^2)} + \frac{k_\mu k_\nu}{\Pi_2(k^2)}\Bigr], \qquad (23)$$
we choose in particular
$$\Pi_1(\mu^2) = \frac{1}{g^2}, \qquad \Pi_2(\mu^2) = \frac{1}{\alpha g^2}, \qquad (24)$$
for some $\mu^2 < 0$. Note that the infrared divergence of $\Pi_1(0)$ prevents us from using on-shell normalization conditions.
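As a quick plausibility check of (23) and (24), one can insert the normalization values for $\Pi_1$ and $\Pi_2$ as if they were momentum independent. This is only a sketch under that assumption; the subscript on the left-hand side is a label introduced here. The propagator then takes the familiar covariant-gauge form, with $g^2$ appearing as an overall factor because the coupling constant has been absorbed into the fields.

```latex
% Assumption: \Pi_1 and \Pi_2 are replaced by their values (24) at all k^2.
\begin{align*}
\bigl\langle T\,\tilde A^a_\mu(k)\,\tilde A^b_\nu(-k)\bigr\rangle_{\Pi_i = \mathrm{const}}
  &= \frac{-i\hbar\,\delta_{ab}}{(k^2 + i0)^2}
     \Bigl[g^2\,(g_{\mu\nu}k^2 - k_\mu k_\nu) + \alpha g^2\,k_\mu k_\nu\Bigr] \\
  &= \frac{-i\hbar\,g^2\,\delta_{ab}}{k^2 + i0}
     \Bigl[g_{\mu\nu} - (1 - \alpha)\,\frac{k_\mu k_\nu}{k^2 + i0}\Bigr] .
\end{align*}
```

In this form α = 1 gives a propagator proportional to $g_{\mu\nu}$, and α → 0 gives a purely transverse one.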
2.2. Gauge Parameter Dependence. Let us now turn to the gauge parameter dependence of Green’s functions. Due to the quantum action principle, $\partial\Gamma/\partial\alpha = \Delta\cdot\Gamma$, where Δ is an integrated operator insertion with power counting dimension four and vanishing ghost charge. Since the Slavnov–Taylor operator does not depend explicitly on α, $\mathcal{S}(\Gamma) = 0$ implies $\mathcal{S}_\Gamma(\Delta\cdot\Gamma) = 0$. Arguing again recursively (in powers of ħ), Δ is a linear combination of invariant insertions: gauge invariant composite fields, source terms (including those describing operator mixing), and BRS-variations $\int dx\, s\hat O(x)$ (the two pieces of $\mathcal{L}_{gf}$). Similarly we may differentiate Γ with respect to one of the (explicit or implicit) parameters of the action, i.e. apply to Γ one of the differential operators $\partial/\partial g^2$, $m_i\,\partial/\partial m_i$ or counting operators $N_i = -\int dx\,\varphi_i(x)\,\delta/\delta\varphi_i(x)$ and $N_i = \int dx\,J^i(x)\,\delta/\delta J^i(x)$ or, in the case of operator mixing, $N_{ij} = \int dx\,J^i(x)\,\delta/\delta J^j(x)$ (suitably summed over components in order to preserve global symmetries). This is again an operator insertion of the kind described above. Moreover the invariant composite fields and source terms can be uniquely expressed in terms of a basis of such differential and counting operators (and vice versa) up to operator insertions that are BRS-variations. This is so because the normalization conditions have to be chosen such that they uniquely determine all coefficients of these insertions. Note, however, that only certain combinations of $\partial/\partial\alpha$ and counting operators respect the field Eqs. (13) and (22) for B and $\bar c$ (gauge conditions). Combining all this, we obtain
$$\frac{\partial\Gamma}{\partial\alpha} = \Bigl(\hat\beta\,\frac{\partial}{\partial g^2} + \sum_i\hat\delta_i\, m_i\frac{\partial}{\partial m_i} + \sum_i\hat\gamma_i N_i + s\hat\Delta\Bigr)\Gamma. \qquad (25)$$
Although it is not needed for our purposes, one might make the term $\hat\Delta = \int dx\,\hat O(x)$ somewhat more explicit by adding to the classical action either sources for (the two
pieces of) $\hat O(x)$ and $s\hat O(x)$ as in [10] or parameters for the corresponding integrated terms as in [12].

The coefficients $\hat\beta$, $\hat\gamma_i$, and $\hat\delta_i$ in Eq. (25) are functions of α and $g^2$; they reflect the inevitable lack of gauge invariance of the normalization conditions and can be determined by applying Eq. (25) to them. The functions $\hat\delta_i$ vanish if all mass parameters are fixed by on-shell normalization conditions. For simplicity assume the matter fields ϕ consist of just one massless spinor ψ, e.g. in the fundamental representation of the gauge group G, and its adjoint $\bar\psi$ with propagator
$$\bigl\langle T\,\tilde\psi^\alpha(k)\,\tilde{\bar\psi}^\beta(-k)\bigr\rangle = \frac{i\hbar\,\delta_{\beta\alpha}\,\slashed{k}}{(k^2 + i0)\,\Sigma(k^2)}, \qquad (26)$$
and normalization condition
$$\Sigma(\mu^2) = h(\alpha, g^2), \qquad (27)$$
where h is some function to be determined later. The invariant counting operators are all of the form
$$N_I\Gamma = \int dx\,\mathcal{S}_\Gamma\bigl(P_I(x)\cdot\Gamma\bigr), \qquad (28)$$
with $P_{\bar c} = -\bar c\,\delta/\delta B$, $P_c = c\,J_{sc}$, $P_A = A_\mu J_{sA_\mu}$, and $P_\psi = \psi^\alpha J_{\psi^\alpha} + \bar\psi^\beta J_{\bar\psi^\beta}$. In order to preserve the gauge conditions (13) and (22), the operators $\partial/\partial\alpha$, $N_{\bar c}$, and $N_A$ must occur as the combination$^{1}$
$$\frac{\tilde\partial}{\partial\alpha} \equiv \frac{\partial}{\partial\alpha} - \frac{1}{2\alpha}\,(N_A - N_{\bar c}). \qquad (29)$$

$^{1}$ This modification of $\partial/\partial\alpha$ was missing in [10]. We are grateful to W. Zimmermann for pointing out this inconsistency.

For $c = \bar c = B = J = 0$, i.e. omitting all ghost and composite fields, expression (29) simplifies to
$$\frac{\tilde\partial}{\partial\alpha} = \frac{\partial}{\partial\alpha} - \frac{N_A}{2\alpha}, \qquad (30)$$
with $N_A = -\int dx\,A_\mu(x)\,\delta/\delta A_\mu(x)$, and Eq. (25) simplifies to
$$\frac{\tilde\partial\Gamma}{\partial\alpha} = \Bigl(\hat\beta\,\frac{\partial}{\partial g^2} + \hat\gamma N_\psi + s\hat\Delta\Bigr)\Gamma, \qquad (31)$$
with $N_\psi = -\int dx\,\bigl(\psi^\alpha(x)\,\delta/\delta\psi^\alpha(x) + \bar\psi^\alpha(x)\,\delta/\delta\bar\psi^\alpha(x)\bigr)$. The functions $\hat\beta$ and $\hat\gamma$ can be determined by applying Eq. (31) to the normalization conditions (24) and (27). Whereas the last term in Eq. (31) with $s\hat\Delta$ is already present in the tree approximation, one would like to eliminate the other two terms. We can eliminate $\hat\beta$ by replacing $g^2$ by a function $g^2(\alpha, \bar g^2)$ satisfying
$$g^2(\alpha, \bar g^2) = \bar g^2 - \int_{\alpha_0}^{\alpha}\hat\beta\bigl(\tau, g^2(\tau, \bar g^2)\bigr)\,d\tau, \qquad (32)$$
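such that $\left.\frac{\tilde\partial}{\partial\alpha}\right|_{\bar g} \equiv \frac{\tilde\partial}{\partial\alpha} - \hat\beta\,\frac{\partial}{\partial g^2}$ differentiates along “gauge orbits” in the $(\alpha, g^2)$ plane.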
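To see why this combination differentiates at fixed $\bar g^2$, one can differentiate (32) with respect to α and apply the chain rule. The following is a sketch, assuming that F is any differentiable function of $(\alpha, g^2)$ and that the integrand of (32) is evaluated along the orbit, i.e. read as $\hat\beta(\tau, g^2(\tau, \bar g^2))$:

```latex
% Assumption: F(\alpha, g^2) is an arbitrary differentiable test function.
\begin{align*}
\left.\frac{\partial g^2(\alpha, \bar g^2)}{\partial\alpha}\right|_{\bar g}
  &= -\hat\beta\bigl(\alpha, g^2(\alpha, \bar g^2)\bigr), \\
\left.\frac{\partial}{\partial\alpha}\right|_{\bar g} F\bigl(\alpha, g^2(\alpha, \bar g^2)\bigr)
  &= \frac{\partial F}{\partial\alpha}
   + \left.\frac{\partial g^2}{\partial\alpha}\right|_{\bar g}\frac{\partial F}{\partial g^2}
   = \Bigl(\frac{\partial}{\partial\alpha} - \hat\beta\,\frac{\partial}{\partial g^2}\Bigr) F .
\end{align*}
```

The counting-operator part of $\tilde\partial/\partial\alpha$ contains no derivative with respect to $g^2$ and is carried along unchanged.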
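In order to compute \(\hat\gamma\) we have to apply Eq. (31) to the normalization condition (27). First we introduce two new functions \(\bar h(\alpha,\bar g^2) \equiv h(\alpha, g^2(\alpha,\bar g^2))\) and \(\hat{\bar\gamma}(\alpha,\bar g^2) \equiv \hat\gamma(\alpha, g^2(\alpha,\bar g^2))\). Next we compute
\[ \not k\,\langle T\,(s\hat\Delta)(0)\,\tilde\psi^{\alpha}(k)\,\tilde{\bar\psi}_{\beta}(-k)\rangle\Big|_{k^2=\mu^2} = \frac{i\,\delta^{\alpha}_{\beta}\,\hat{\bar\Delta}_\psi(\alpha,\bar g^2)}{\bar h(\alpha,\bar g^2)}, \tag{33} \]
with some function \(\hat{\bar\Delta}_\psi(\alpha,\bar g^2)\) that is independent of the choice of \(\bar h\) due to the counting identity
\[ \bar h\,\frac{\partial\Gamma}{\partial\bar h} = 2N_\psi\Gamma, \tag{34} \]
and thus obtain
\[ \hat{\bar\gamma} = \frac12\Bigl(\frac{\partial\ln\bar h}{\partial\alpha}\bigg|_{\bar g} - \hat{\bar\Delta}_\psi\Bigr). \tag{35} \]
We can now choose
\[ \bar h(\alpha,\bar g^2) = \bar h(\alpha_0,\bar g^2)\,\exp\int_{\alpha_0}^{\alpha}\hat{\bar\Delta}_\psi(\tau,\bar g^2)\,d\tau, \tag{36} \]
and obtain \(\hat{\bar\gamma} = 0\) as desired. With these modifications we find
\[ \frac{\tilde\partial\Gamma}{\partial\alpha}\bigg|_{\bar g} = (s\hat\Delta)\Gamma \equiv S_\Gamma(\hat\Delta\Gamma). \tag{37} \]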
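Could we go on mass shell, this would guarantee the \(\alpha\)-independence of the physical S-matrix, with \(\bar g^2\) playing the role of a physical coupling constant.
2.3. The Renormalization Group Equation. Due to the choice of normalization conditions (24) and (27) the Green’s functions depend on \(\mu\). Using once again the action principle one derives in a standard fashion [15] the Renormalization Group (R.G.) equation, again for \(c = \bar c = B = J = 0\),
\[ D\Gamma \equiv \Bigl(\mu\frac{\partial}{\partial\mu} + \beta(\alpha,g^2)\frac{\partial}{\partial g^2} + \eta(\alpha,g^2)\frac{\tilde\partial}{\partial\alpha} + \gamma(\alpha,g^2)N_\psi\Bigr)\Gamma = 0. \tag{38} \]
Expressing \(g^2\) in terms of \(\bar g^2\) we can rewrite the R.G. equation as
\[ \bar D\Gamma \equiv \Bigl(\mu\frac{\partial}{\partial\mu} + \bar\beta(\alpha,\bar g^2)\frac{\partial}{\partial\bar g^2} + \bar\eta(\alpha,\bar g^2)\frac{\tilde\partial}{\partial\alpha}\bigg|_{\bar g} + \bar\gamma(\alpha,\bar g^2)N_\psi\Bigr)\Gamma = 0. \tag{39} \]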
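We shall now prove that \(\bar\beta\) and \(\bar\gamma\) are actually independent of \(\alpha\). The \(\alpha\)-independence of \(\bar\beta\) plays an important role in the application of the R.G. equation for Green’s functions of gauge invariant operators. Using an idea originally introduced by Caswell and Wilczek [13] we commute the operator \(\frac{\tilde\partial}{\partial\alpha}\big|_{\bar g} - s\hat\Delta\) with the R.G. operator \(\bar D\), to get
\[ 0 = \Bigl[\frac{\tilde\partial}{\partial\alpha}\bigg|_{\bar g} - s\hat\Delta,\ \bar D\Bigr]\Gamma = \Bigl(\frac{\partial\bar\beta}{\partial\alpha}\bigg|_{\bar g}\frac{\partial}{\partial\bar g^2} + \frac{\partial\bar\gamma}{\partial\alpha}\bigg|_{\bar g}N_\psi + s\hat\Delta'\Bigr)\Gamma, \tag{40} \]
\[ \hat\Delta' = \bar\beta\,\frac{\partial\hat\Delta}{\partial\bar g^2} + \frac{\partial(\bar\eta\hat\Delta)}{\partial\alpha}\bigg|_{\bar g}. \tag{41} \]
Since \(\frac{\partial}{\partial\bar g^2}\), \(N_\psi\), and the insertions \(s\hat\Delta'\) are linearly independent operators, we deduce
\[ \frac{\partial\bar\beta}{\partial\alpha}\bigg|_{\bar g} = \frac{\partial\bar\gamma}{\partial\alpha}\bigg|_{\bar g} = \hat\Delta' = 0. \tag{42} \]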
2.4. Gauge Invariant Composite Operators. Let us discuss as a simple, yet interesting example the gauge invariant composite field \(O_a^\mu = \bar\psi\gamma^\mu\psi\) coupled to the source \(a_\mu(x)\). In contrast to operators of higher canonical dimension the Green’s functions of this operator are completely specified by a finite number of normalization conditions. Specifically we choose (with the same \(\mu^2 < 0\) as before)
\[ \frac{\delta^3\Gamma}{\delta\tilde a_\mu(k_1)\,\delta\tilde\psi^{\alpha}(k_2)\,\delta\tilde{\bar\psi}_{\beta}(k_3)}\bigg|_{\varphi=J=0} = \delta_{\alpha\beta}\,\gamma^\mu\,\bar h_a(\alpha,\bar g^2) \quad\text{for}\quad \sum_i k_i = 0,\ k_i^2 = \mu^2, \tag{43} \]
(with \(\bar h_a\) determined later), and
\[ \frac{\delta^2\Gamma}{\delta\tilde a_\mu(k)\,\delta\tilde a_\nu(-k)}\bigg|_{\varphi=J=0} = (g^{\mu\nu}k^2 - k^\mu k^\nu)\,\Pi(k^2), \tag{44} \]
with \(\Pi(\mu^2) = 0\). These normalization conditions determine the coefficient of \(a_\mu O_a^\mu\) and of the induced source term \(f^2\) with \(f_{\mu\nu} = \partial_\mu a_\nu - \partial_\nu a_\mu\). There is the BRS-invariant composite field \(B\cdot A^\mu - \bar c\cdot D^\mu c\) with the same covariance as \(O_a^\mu\) which is, however, the BRS-variation of \(\bar c\cdot A^\mu\) and hence the corresponding operators will not mix [11].
Adding the source \(a_\mu\) for the composite field \(\bar\psi\gamma^\mu\psi\) yields two additional terms in Eq. (37), corresponding to the two additional invariant insertions
\[ \frac{\tilde\partial\Gamma}{\partial\alpha}\bigg|_{\bar g} = \bigl(\hat{\bar\gamma}_a(\alpha,\bar g^2)\,N_a + s\hat\Delta\bigr)\Gamma + \hat{\bar\zeta}(\alpha,\bar g^2)\,f^2 \tag{45} \]
with the counting operator \(N_a = \int dx\, a_\mu(x)\,\frac{\delta}{\delta a_\mu(x)}\). The coefficients \(\hat{\bar\gamma}_a\) and \(\hat{\bar\zeta}\) are again determined by applying Eq. (45) to the normalization conditions (43) and (44). Equation (44) immediately implies \(\hat{\bar\zeta} = 0\). Using the same argument as before, \(\hat{\bar\gamma}_a\) can be
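removed by a suitable choice of \(\bar h_a(\alpha,\bar g^2)\). With this modification the \(\alpha\) dependence is again given by Eq. (37). The renormalization group equation reads now (again with \(c = \bar c = B = 0\), and restricting \(J\) to \(a\), i.e., omitting all ghosts and all composite fields except \(\bar\psi\gamma^\mu\psi\))
\[ \Bigl(\mu\frac{\partial}{\partial\mu} + \bar\beta(\bar g^2)\frac{\partial}{\partial\bar g^2} + \bar\eta(\alpha,\bar g^2)\frac{\tilde\partial}{\partial\alpha}\bigg|_{\bar g} + \bar\gamma(\bar g^2)N_\psi + \bar\gamma_a(\bar g^2)N_a\Bigr)\Gamma = \bar\zeta(\bar g^2)\,f^2. \tag{46} \]
Arguing as before, one finds
\[ \frac{\partial\bar\beta}{\partial\alpha}\bigg|_{\bar g} = \frac{\partial\bar\gamma}{\partial\alpha}\bigg|_{\bar g} = \frac{\partial\bar\gamma_a}{\partial\alpha}\bigg|_{\bar g} = \frac{\partial\bar\zeta}{\partial\alpha}\bigg|_{\bar g} = 0. \tag{47} \]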
Let us finally consider the generating functional for Green’s functions for \(\bar\psi\gamma^\mu\psi\) (without any other elementary or composite fields)
\[ \Gamma_{\rm inv}(a) \equiv \Gamma(\varphi, J)\big|_{\varphi=0,\ J=a}, \tag{48} \]
satisfying
\[ \frac{\partial\Gamma_{\rm inv}}{\partial\alpha}\bigg|_{\bar g} = 0, \tag{49} \]
since the term with \(S_\Gamma\) in Eq. (37) does not contribute. Therefore the Green’s functions for the gauge invariant composite operator \(\bar\psi\gamma^\mu\psi\) do not depend on \(\alpha\) for fixed “physical” coupling constant \(\bar g\). This has been achieved by the explicit \(\alpha\) dependence of the normalization conditions (27), (36), and (43).
3. Mass Dependence of Green’s Functions
As a second application of the algebraic renormalization method we want to show that it is always possible to use the freedom to do finite renormalizations in order to make the \(\beta\) function(s) and anomalous dimension(s) mass independent. The argument will essentially follow the previous one used for the gauge-parameter dependence. So we start again with the renormalization group equation in the general form
\[ \Bigl(\mu\frac{\partial}{\partial\mu} + \sum_k\beta_k\frac{\partial}{\partial g_k} + \sum_i\gamma_i N_i\Bigr)\Gamma = 0. \tag{50} \]
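For a gauge theory with gauge parameter(s) \(\alpha_k\) there would be an additional term \(\sum_k\eta_k\frac{\tilde\partial}{\partial\alpha_k}\) and the \(g\)’s would include the gauge coupling \(\bar g^2\) from the previous chapter. Besides we have parametric equations describing the response to an infinitesimal change of the (on-shell) masses,
\[ \Bigl(m_i\frac{\partial}{\partial m_i} + \sum_k\hat\beta_{ik}\frac{\partial}{\partial g_k} + \sum_j\hat\gamma_{ij}N_j\Bigr)\Gamma = 2\sum_j a_{ij}\,m_j^2\,\Delta_j\Gamma, \tag{51} \]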
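where the \(\Delta_j\) represent the minimally subtracted (“soft”) mass insertions. As before we replace the coupling parameters \(g_k\) by \(\bar g\)’s solving the differential equations
\[ m_i\frac{\partial g_k}{\partial m_i}\bigg|_{\bar g} = \hat\beta_{ik} \tag{52} \]
with the initial conditions \(g_k(\bar g, m_i = m_{i,0}) = \bar g_k\). The solvability of these equations requires the integrability conditions
\[ m_i\frac{\partial\hat\beta_{jk}}{\partial m_i} = m_j\frac{\partial\hat\beta_{ik}}{\partial m_j}. \tag{53} \]
In order to show that they are fulfilled, we act on Eq. (51) with \(m_j\frac{\partial}{\partial m_j}\) and antisymmetrize in \(i\) and \(j\). Thus we arrive at
\[ \Bigl[\sum_k\Bigl(m_j\frac{\partial\hat\beta_{ik}}{\partial m_j} - m_i\frac{\partial\hat\beta_{jk}}{\partial m_i}\Bigr)\frac{\partial}{\partial g_k} + \sum_n\Bigl(m_j\frac{\partial\hat\gamma_{in}}{\partial m_j} - m_i\frac{\partial\hat\gamma_{jn}}{\partial m_i}\Bigr)N_n\Bigr]\Gamma = 2\sum_n\Bigl(m_j\frac{\partial}{\partial m_j}\,a_{in}m_n^2 - m_i\frac{\partial}{\partial m_i}\,a_{jn}m_n^2\Bigr)\Delta_n\Gamma. \tag{54} \]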
Since the operators \(\frac{\partial}{\partial g_k}\), \(N_n\), and \(\Delta_n\) are independent, their coefficients have to vanish, leading to Eq. (53). Similarly we perform a finite wave-function renormalization replacing
\[ \Gamma \to \bar\Gamma = e^{\sum_i h_i N_i}\,\Gamma, \tag{55} \]
with functions \(h_i\) solving
\[ m_i\frac{\partial h_j}{\partial m_i}\bigg|_{\bar g} + \hat\gamma_{ij} = 0 \tag{56} \]
whose integrability conditions are also a consequence of Eq. (54). A distinguished choice for the \(m_{i,0}\) is obviously \(m_{i,0} = 0\), but for that we have to make sure that the \(m = 0\) limit of the Green functions exists, e.g. by choosing suitable off-shell normalization conditions. After these changes we obtain new parametric equations
\[ \Bigl(\mu\frac{\partial}{\partial\mu} + \sum_k\bar\beta_k\frac{\partial}{\partial\bar g_k} + \sum_i\bar\gamma_i N_i\Bigr)\bar\Gamma = 0, \tag{57} \]
\[ m_i\frac{\partial}{\partial m_i}\bar\Gamma = 2\sum_j\bar a_{ij}\,m_j^2\,\Delta_j\bar\Gamma. \tag{58} \]
As before commuting these differential operators we find that the β¯k and anomalous dimensions γ¯i are mass independent. The same result has been obtained by Zimmermann [16] without the use of Eq. (51).
References
1. Hepp, K.: Renormalization theory. In: Statistical Mechanics and Quantum Field Theory, Proc. Les Houches 1970, C. de Witt and R. Stora (Eds.), New York: Gordon and Breach, 1971, pp. 429–500
2. Lehmann, H., Symanzik, K. and Zimmermann, W.: Nuovo Cim. 1, 205–225 (1955); 6, 319–333 (1957)
3. Steinmann, O.: Perturbative expansions in axiomatic field theory. Berlin: Springer, 1971
4. Bogoliubov, N.N. and Shirkov, D.V.: Introduction to the theory of quantized fields. Interscience, London, 1958
5. Epstein, H. and Glaser, V.: Ann. Inst. Henri Poincaré 19, 211–295 (1973)
6. Faddeev, L.D. and Slavnov, A.A.: Gauge Fields, Introduction to Quantum theory. Reading, MA.: Benjamin, 1980
7. Becchi, C., Rouet, A. and Stora, R.: Ann. Phys. (N.Y.) 98, 287–321 (1976)
8. Piguet, O. and Sorella, S.P.: Algebraic Renormalization. Lecture Notes in Physics, Springer, Berlin–Heidelberg–New York: 1995
9. ’t Hooft, G. and Veltman, M.: Nucl. Phys. B 50, 318–353 (1972)
10. Breitenlohner, P. and Maison, D.: Dimensional Renormalization of Massless Yang-Mills Theories. Preprint MPI-PAE/PTh 26/75, August 1975, unpublished
11. Kluberg-Stern, H. and Zuber, J.B.: Phys. Rev. D 12, 3159–3180 (1975)
12. Piguet, O. and Sibold, K.: Nucl. Phys. B 253, 517–540 (1985)
13. Caswell, W.E. and Wilczek, F.: Phys. Lett. B 49, 291–292 (1974)
14. Collins, J. and Macfarlane, A.: Phys. Rev. D 10, 1201–1212 (1974)
15. Lowenstein, J.: Commun. Math. Phys. 24, 1–21 (1971)
16. Zimmermann, W.: Reduction of Couplings in Massive Models of Quantum Field Theory. In: Theoretical Physics, Fin de Siècle, Proc. Wrocław 1998, A. Borowiec et al. (Eds.), Lecture Notes in Physics, Berlin–Heidelberg–New York: Springer, 2000, pp. 304–314
Communicated by W. Zimmermann
Commun. Math. Phys. 219, 191 – 198 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Harry Lehmann and the Analyticity Unitarity Programme
André Martin1,2
1 Theoretical Physics Division, CERN, 1211 Geneva 23, Switzerland
2 LAPP, 74941 Annecy le Vieux Cedex, France
Received: 22 May 2000 / Accepted: 2 June 2000
I dedicate this paper to Marie-Noëlle Fontaine, the last of the many papers she typed so skillfully for me, wishing her a happy retirement.
Abstract: I try to describe the extremely fruitful interaction I had with Harry Lehmann and the results which came out of the analyticity unitarity programme, especially the proof of the Froissart bound, which, with recent and future measurements of total cross-sections and real parts, remains topical.
My first meeting with Harry Lehmann was not with his person but with the famous paper of the trio Lehmann–Symanzik–Zimmermann, LSZ [1], the importance of which everybody in the Theory group of Maurice Lévy at Ecole Normale realized immediately. In spite of the fact that I did not know German (I still don’t) I read it, Nuovo Cimento in one hand, dictionary in the other hand (I am a “corrected” left hander). Later Harry visited the Ecole Normale in person and I was immediately impressed. That was the time when there was a wave of interest in the question of what an unstable particle is, and Lehmann and Lévy were among the people involved. I remember also quite vividly our meeting at the La Jolla Conference in 1961 which I attended, coming from CERN. It was, as I realized a posteriori, a very important conference, for physicists and for people (some of the people I met there became my very best friends). I remember that Marcel Froissart gave a talk on his famous Froissart bound [2] on the total cross-section, \(\sigma_t < c\,(\log s)^2\), \(s\) being the square of the centre-of-mass energy, and Harry with his very meticulous mind found out that some of the estimates of Froissart were not quite correct, though this did not affect the result (some years later, I published a sum rule on pion-nucleon scattering and Harry discovered a very well hidden mistake. I was very impressed). Anyway we were both full of admiration for the achievement of Froissart and for me it was a decisive turning point, since I left almost completely for many years potentials and the Schrödinger equation for the study of high-energy scattering and high-energy bounds. The Froissart bound was derived from a combination of the Mandelstam representation [3] where the scattering amplitude is the boundary value of an analytic function of two variables, which implies automatically dispersion relations proved from field theory [4] in one variable as well as the Lehmann ellipse [5] which is probably the most celebrated result of Harry, a fundamental result presented in 10 small pages of Nuovo Cimento (compare with the incredibly lengthy papers on what I would call “rigorous atomic physics” which appeared during the last 15 years!). The trouble with the Mandelstam representation is that nobody was ever able to prove it even in perturbation theory (though some wrong proofs were published!). Both Harry and I were anxious to obtain high-energy bounds with minimal assumptions. A step in this direction was made by Greenberg and Low [6] who used the Lehmann ellipse to derive a bound on the total cross-section where \((\log s)^2\) was replaced by \(s(\log s)^2\). Myself, I realized that the whole Mandelstam representation was not needed to get the Froissart bound and that it was sufficient to replace the Lehmann ellipse by a larger one [7]. Later, in Princeton, Y. S. Jin (a former student of Harry) and I found a way to control the growth of the scattering amplitude for unphysical momentum transfer using positivity [8] but at the time we made no progress on the derivation of the Froissart bound. In the autumn of 1965 I was visiting IHES (Institut des Hautes Etudes Scientifiques) and Harry was there. He drew my attention to a paper by Nakanishi which contained the claim that the Lehmann ellipse could be enlarged by using results from perturbative field theory, leading to the Froissart bound. As I shall explain later, we tried to make sense of the paper of Nakanishi [9] but in the end could not.
URA 1436 du CNRS, associée à l’Université de Savoie
Nevertheless it rekindled my interest in the subject, and after a visit to Cambridge where I learnt that the Nakanishi perturbative domain of analyticity [10] had been obtained independently and in a simpler way by T. T. Wu [11], I came back to CERN and finally succeeded, using positivity properties not terribly different from those I had used with Jin, to enlarge the ellipse without using perturbation theory and prove the Froissart bound from first principles. Something rather rare happened: Harry sent a postcard to congratulate me, but while moving from one apartment to another one or maybe from one office to another, I lost it! I had many occasions to meet Harry later, but the last one was in the spring of 1998 at CERN where he came to work with T. T. Wu after an operation which seemed successful. At the Ringberg Castle meeting, in honor of Wolfhart Zimmermann, he was supposed to be the first speaker and could not come because he was ill. I became the first speaker. Then I knew I would never meet him again.
Now I believe that it is necessary to give some technical details. In 3+1 dimensions (3 space, 1 time) the scattering amplitude depends on two variables, energy and angle. For a reaction \(A + B \to A + B\),
\[ E_{\rm c.m.} = \sqrt{M_A^2 + k^2} + \sqrt{M_B^2 + k^2}, \tag{1} \]
\(k\) being the centre-of-mass momentum. The angle is designated by \(\theta\). There are alternative variables:
\[ s = (E_{\rm c.m.})^2, \qquad t = 2k^2(\cos\theta - 1). \tag{2} \]
(Notice that physical \(t\) is NEGATIVE). We shall need later an auxiliary variable \(u\), defined by
\[ s + t + u = 2M_A^2 + 2M_B^2. \tag{3} \]
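As a small numerical illustration of definitions (1) to (3), the sketch below computes \(s\), \(t\) and \(u\) from the centre-of-mass momentum and angle. The function names are illustrative only. The inverse relation used in `k_squared_from_s`, \(k^2 = [s-(M_A+M_B)^2][s-(M_A-M_B)^2]/(4s)\), is the standard two-body kinematic formula and is not written out in the text; it follows from solving Eq. (1) for \(k^2\).

```python
import math


def mandelstam(k, cos_theta, m_a, m_b):
    """Return (s, t, u) for elastic A + B -> A + B, following Eqs. (1)-(3)."""
    e_cm = math.sqrt(m_a**2 + k**2) + math.sqrt(m_b**2 + k**2)  # Eq. (1)
    s = e_cm**2                                                 # Eq. (2)
    t = 2.0 * k**2 * (cos_theta - 1.0)                          # Eq. (2), never positive
    u = 2.0 * m_a**2 + 2.0 * m_b**2 - s - t                     # Eq. (3)
    return s, t, u


def k_squared_from_s(s, m_a, m_b):
    """Invert Eq. (1): squared centre-of-mass momentum as a function of s."""
    return (s - (m_a + m_b)**2) * (s - (m_a - m_b)**2) / (4.0 * s)
```

For equal masses \(M_A = M_B = m\) the inverse relation reduces to \(k^2 = s/4 - m^2\), the form relevant for the pion-pion case discussed later.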
The scattering amplitude (scalar case) can be written as a partial wave expansion, the convergence of which will be justified in a moment:
\[ F(s,\cos\theta) = \frac{\sqrt s}{k}\sum_\ell (2\ell+1)\,f_\ell(s)\,P_\ell(\cos\theta). \tag{4} \]
\(f_\ell(s)\) is a partial wave amplitude. The absorptive part, which coincides for \(\cos\theta\) real (i.e., physical) with the imaginary part of \(F\), is defined as
\[ A_s(s,\cos\theta) = \frac{\sqrt s}{k}\sum_\ell (2\ell+1)\,{\rm Im}\,f_\ell(s)\,P_\ell(\cos\theta). \tag{5} \]
The unitarity condition implies, with the normalization we have chosen,
\[ {\rm Im}\,f_\ell(s) \ge |f_\ell(s)|^2, \tag{6} \]
which has, as a consequence,
\[ {\rm Im}\,f_\ell(s) > 0, \qquad |f_\ell| < 1. \tag{7} \]
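Condition (6) says that each partial wave lies in the closed disk of radius 1/2 centred at \(i/2\), which is where the bounds on \({\rm Im}\,f_\ell\) and \(|f_\ell|\) come from. The sketch below checks this on a grid, using the usual phase-shift parametrization \(f = (\eta e^{2i\delta} - 1)/(2i)\) with inelasticity \(0 \le \eta \le 1\). That parametrization is a standard one introduced here for illustration; it does not appear in the text, and the function names are hypothetical.

```python
import cmath
import math


def partial_wave(eta, delta):
    """f = (eta * exp(2i*delta) - 1) / (2i); eta = 1 is the purely elastic case."""
    return (eta * cmath.exp(2j * delta) - 1.0) / 2j


def check_unitarity_grid(n=50):
    """Scan eta in [0, 1] and delta in [0, pi).

    Returns the smallest value of Im f - |f|^2 and the largest |f| found.
    Analytically Im f - |f|^2 = (1 - eta^2) / 4, so the first is never
    negative beyond rounding, and |f| never exceeds 1.
    """
    worst_gap = float("inf")
    max_mod = 0.0
    for i in range(n + 1):
        eta = i / n
        for j in range(n):
            delta = math.pi * j / n
            f = partial_wave(eta, delta)
            worst_gap = min(worst_gap, f.imag - abs(f) ** 2)
            max_mod = max(max_mod, abs(f))
    return worst_gap, max_mod
```

The extreme value \(|f| = 1\) is reached only at \(\eta = 1\), \(\delta = \pi/2\), where \(f = i\).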
The differential cross-section is given by
\[ \frac{d\sigma}{d\Omega} = \frac{1}{s}\,|F|^2, \]
and the total cross-section is given by the “optical theorem”
\[ \sigma_{\rm total} = \frac{4\pi}{k\sqrt s}\,A_s(s,\cos\theta = 1). \tag{8} \]
With these definitions, a dispersion relation can be written as:
\[ F(s,t,u) = \frac{1}{\pi}\int\frac{A_s(s',t)\,ds'}{s'-s} + \frac{1}{\pi}\int\frac{A_u(u',t)\,du'}{u'-u} \tag{9} \]
with possible subtractions, i.e., for instance the replacement of \(1/(s'-s)\) by \(s^N/\bigl(s'^N(s'-s)\bigr)\) and the addition of a polynomial in \(s\), with coefficients depending on \(t\). The scattering amplitude in the \(s\) channel \(A + B \to A + B\) is the boundary value of \(F\) for \(s + i\epsilon\), \(\epsilon > 0 \to 0\), \(s > (M_A + M_B)^2\). In the same way the amplitude for \(A + \bar B \to A + \bar B\), \(\bar B\) being the antiparticle of \(B\), is given by the boundary value of \(F\) for \(u + i\epsilon\), \(\epsilon \to 0\), \(u > (M_A + M_B)^2\). Here we understand the need for the auxiliary variable \(u\). The dispersion relation implies that, for fixed \(t\), the scattering amplitude can be continued in the \(s\) complex plane with two cuts. The scattering amplitude possesses the reality property, i.e., for \(t\) real it is real between the cuts and takes complex conjugate values above and below the cuts. In the most favourable cases, like \(\pi\pi \to \pi\pi\) or \(\pi N \to \pi N\) scattering, dispersion relations have been established for \(-T < t \le 0\), \(T > 0\) [4]. In the general case, even if dispersion relations are not proved, the crossing property of Bros, Epstein and Glaser states that the scattering amplitude is analytic in a twice cut plane, minus a finite region, for any negative \(t\) [12]. So it is possible to continue the amplitude directly from \(A + B \to A + B\) to the complex conjugate of \(A + \bar B \to A + \bar B\).
By a more subtle argument, using a path with fixed \(u\) and fixed \(s\) it is possible to continue directly from \(A + B \to A + B\) to \(A + \bar B \to A + \bar B\). At this point, we see already that one cannot dissociate analyticity, i.e., dispersion relations, and unitarity, since the discontinuity in the dispersion relations is given by the absorptive part. In the simple case of \(t = 0\), the absorptive part is given by the total cross-section and the forward amplitude is given, as we said already for the case of Compton scattering, by an integral over physical quantities. It was recognized very early that the combination of analyticity and unitarity might lead to very interesting consequences and might give some hope of fulfilling, at least partially, the Heisenberg S-matrix program. This was very clearly stated already in 1956 by Murray Gell-Mann [13] at the Rochester conference. Later this idea was taken over by many people, in particular by Geoff Chew. To make this program as successful as possible it seemed necessary to have an analyticity domain as large as possible. Dispersion relations are fixed-\(t\) analyticity properties, in the other variable \(s\), or \(u\) as one likes. Another property derived from local field theory was the existence of the Lehmann ellipse [5], which states that for fixed \(s\), physical, the scattering amplitude is analytic in \(\cos\theta\) in an ellipse with foci at \(\cos\theta = \pm 1\). \(\cos\theta = 1\) corresponds to \(t = 0\); the ellipse therefore contains a circle
\[ |t| < T_1(s). \tag{10} \]
\(T_1(s)\) is given by
\[ x_0 = 1 + \frac{T_1(s)}{2k^2}, \qquad x_0 = \Bigl[1 + \frac{(M_1^2 - M_A^2)(M_2^2 - M_B^2)}{k^2\,\bigl(s - (M_1 - M_2)^2\bigr)}\Bigr]^{1/2}, \tag{11} \]
where \(M_A\) and \(M_B\) are the masses of the particles, \(M_1\) and \(M_2\) are the lowest intermediate states in the currents associated to the fields of the incoming particles. Hence \(T_1(s) \to 0\) for \(s \to (M_A + M_B)^2\) and \(s \to \infty\). The absorptive part is analytic in the larger ellipse, the “large” Lehmann ellipse, containing the circle \(|t| < T_2(s)\); \(T_2\) is given by
\[ 2x_0^2 - 1 = 1 + \frac{T_2(s)}{2k^2}. \]
So \(T_2(s) \to c > 0\) for \(s \to (M_A + M_B)^2\), \(T_2(s) \to 0\) for \(s \to \infty\). It was thought by Mandelstam that these two analyticity properties, dispersion relations and Lehmann ellipses, were insufficient to carry very far the analyticity-unitarity program. He proposed the Mandelstam representation [3] which can be written schematically as
\[ F = \frac{1}{\pi^2}\int\!\!\int\frac{\rho(s',t')\,ds'\,dt'}{(s'-s)(t'-t)} + \text{circular permutations in } s, t, u + \text{one dimensional dispersion integrals} + \text{subtractions}. \tag{12} \]
This representation is nice. It gives back the ordinary dispersion relations and the Lehmann ellipse when one variable is fixed, but it was never proved nor disproved for all mass cases, even in perturbation theory. One contributor, Jean Lascoux, refused to co-sign a “proof”, which, in the end, turned out to be imperfect. One very impressive consequence of the Mandelstam representation was the proof, by Marcel Froissart, that the total cross-section cannot increase faster than \((\log s)^2\), the so-called “Froissart Bound” [2]. My own way to obtain the Froissart bound [7] was to use the fact that the Mandelstam representation implies the existence of an ellipse of analyticity in \(\cos\theta\) qualitatively larger than the Lehmann ellipse, i.e., such that it contains a circle \(|t| < R\), \(R\) fixed, independent of the energy. This has as a consequence that \({\rm Im}\,f_\ell(s)\) decreases with \(\ell\) at a certain exponential rate because of the convergence of the Legendre polynomial expansion and of the polynomial boundedness, but on the other hand the \({\rm Im}\,f_\ell(s)\)’s are bounded by unity because of unitarity [Eq. (7)]. Taking the best bound for each \(\ell\) gives the Froissart bound.
Let me now try to recall the exchange Harry Lehmann and I had in the autumn of 1965 in Bures sur Yvette. We had in common the same desire to find a proof of the Froissart bound without using the Mandelstam representation and to find a way to enlarge the Lehmann ellipse. Harry pointed out to me a paper published by N. Nakanishi [9] a few months earlier where he claimed that he had a proof of the Froissart bound. Let me remind you that the largest possible ellipse of convergence of the Legendre polynomial series for the absorptive part has necessarily a singularity at its right extremity. This is the analogue of a classical theorem on power series with positive coefficients. This means that if you succeed (take the \(\pi\pi\) case, \(m_\pi = 1\)) in proving that the absorptive part is analytic in the neighbourhood of the segment \(t = 0\), \(t = 4\), then it is automatically analytic in the ellipse with foci \(t = 4 - s\), \(t = 0\), and right extremity \(t = 4\), and a fortiori it is analytic in the circle \(|t| < 4\), entirely contained in the ellipse. Nakanishi had obtained a representation valid for any Feynman diagram [10]
\[ T_N(s,t) = \int\frac{d^n\alpha}{[f(\alpha) + s\,g(\alpha) + t\,h(\alpha)]^p}. \]
Later on I learnt from P. Landshoff that this representation had also been obtained, independently and in a simpler way, by T. T. Wu [11]. A minimal analyticity domain for \(T_N(s,t)\) is obtained when the denominator in the integral representation does not vanish. This domain, for the \(\pi\pi\) case for fixed complex \(s\), is a kind of strip containing the straight line going through \(t = 0\) and \(t = 4 - s\) (which corresponds to \(\cos\theta_s\) real), and the segments \(-4 < t < +4\) and \(-s < t < 8 - s\). When \(s\) tends to a real value the domain shrinks to zero for \(s > 4\) and for \(s < -t\) (for \(t\) real). This means that for \(t\) fixed, real, \(-4 < t < +4\), dispersion relations hold. The Nakanishi–Wu representation also implies the validity of partial wave dispersion relations but this is irrelevant for our problem.
However, there is nothing like a small or a large Lehmann ellipse in this domain. The absorptive part in perturbation theory, which is defined only in the limit \(s \to s_R + i\epsilon\), \(\epsilon \to 0\), has a priori no analyticity in \(t\). A priori, it is just a distribution. In fact in perturbation theory, unitarity connects amplitudes of different orders and positivity properties of the absorptive part are completely hidden. In three space dimensions, nobody knows if the perturbation series can be resummed (probably not!) and it is not “legal” to combine the results of axiomatic field theory and perturbation theory. Of course one can always try it as a game, which is what Harry and I tried to do, but we got nowhere. It was only in December 1965, after a visit to Cambridge, that I found a way to enlarge the Lehmann ellipse in the framework of axiomatic field theory [14], without using at all the results of perturbation theory. I was maybe a bit unfair not to quote the Nakanishi–Wu representation because the “wrong” paper of Nakanishi was undoubtedly a source of stimulation but, on the other hand, I did not use it at all. My method was the following. The positivity of \({\rm Im}\,f_\ell\) implies, by using expansion (5),
\[ \Bigl|\Bigl(\frac{d}{dt}\Bigr)^{n}A_s(s,t)\Bigr|_{-4k^2\le t\le 0} \le \Bigl(\frac{d}{dt}\Bigr)^{n}A_s(s,t)\Big|_{t=0}. \tag{13} \]
To calculate
\[ F(s,t) = \frac{1}{\pi}\int_{s_0}^{\infty}\frac{A_s(s',t)\,ds'}{s'-s} \]
(forget the left-hand cut and subtractions!), for \(s\) real \(< s_0\) one can expand \(F(s,t)\) around \(t = 0\). From the property (13) one can prove that the successive derivatives can be obtained by differentiating under the integral. When one resums the series one discovers that this can be done not only for \(s\) real \(< s_0\), but for any \(s\) and that the expansion has a domain of convergence in \(t\) independent of \(s\). This means that the large Lehmann ellipse must contain a circle \(|t| < R\). This is exactly what is needed to get the Froissart bound. In fact, in favourable cases, \(R = 4m_\pi^2\), \(m_\pi\) being the pion mass. A recipe to get a lower bound for \(R\) was found by Sommer [15]
\[ R \ge \sup_{s_0 < s < \infty} T_1(s). \tag{14} \]
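To get a feeling for recipe (14), the sketch below evaluates \(T_1(s)\) from the Lehmann-ellipse formula (11) and scans it over \(s\). The mass values are assumptions made only for illustration: units with \(m_\pi = 1\), \(M_A = M_B = 1\), and \(M_1 = M_2 = 3\), i.e. taking the three-pion threshold as the lowest intermediate state in the pion current. None of these numbers is stated in the text, and the function names are hypothetical.

```python
import math


def t1(s, m_a=1.0, m_b=1.0, m1=3.0, m2=3.0):
    """T1(s) from Eq. (11): T1 = 2 k^2 (x0 - 1). Default masses are illustrative assumptions."""
    k2 = (s - (m_a + m_b) ** 2) * (s - (m_a - m_b) ** 2) / (4.0 * s)
    x0 = math.sqrt(
        1.0 + (m1**2 - m_a**2) * (m2**2 - m_b**2) / (k2 * (s - (m1 - m2) ** 2))
    )
    return 2.0 * k2 * (x0 - 1.0)


def sup_t1(s_min=4.01, s_max=200.0, step=0.01):
    """Crude grid search for the supremum of T1(s) above threshold (s0 = 4 here)."""
    best_s, best = s_min, t1(s_min)
    n = int(round((s_max - s_min) / step))
    for i in range(1, n + 1):
        s = s_min + i * step
        value = t1(s)
        if value > best:
            best_s, best = s, value
    return best_s, best
```

With these assumed masses \(T_1\) vanishes at threshold and at infinity, as stated after Eq. (11), and the scan peaks near \(s = 8\) with \(T_1 \approx 4\), i.e. \(4m_\pi^2\) in these units.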
It was already known that for \(|t| < 4m_\pi^2\) the number of subtractions in the dispersion relations was at most two [8], and it led to the more accurate bound [16]
\[ \sigma_T < \frac{\pi}{m_\pi^2}\,(\log s)^2. \tag{15} \]
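The mechanism behind the \((\log s)^2\) growth, as described earlier (each \({\rm Im}\,f_\ell\) is bounded both by unity and by an exponentially decreasing function of \(\ell\)), can be illustrated with a crude numerical model. This is a sketch under stated assumptions and not the proof: the bound \({\rm Im}\,f_\ell \le \min\bigl(1,\ s^N (x_0 + \sqrt{x_0^2-1})^{-\ell}\bigr)\) with \(x_0 = 1 + R/(2k^2)\) is a schematic stand-in for the Legendre-series estimate, the exponent `N` of the polynomial bound is a free illustrative parameter, and equal masses \(m = 1\) with \(R = 4\) are assumed. The sum uses \(\sigma = (4\pi/k^2)\sum_\ell(2\ell+1)\,{\rm Im}\,f_\ell\), which follows from Eqs. (5) and the optical theorem.

```python
import math


def sigma_upper(s, R=4.0, N=1.0, m=1.0):
    """Sum the best available bound on each partial wave (schematic model).

    Each Im f_l is capped by min(1, s**N * exp(-l * decay)), where decay
    is the logarithm of the semi-axis sum of an ellipse reaching |t| = R.
    """
    k2 = s / 4.0 - m * m
    x0 = 1.0 + R / (2.0 * k2)
    decay = math.log(x0 + math.sqrt(x0 * x0 - 1.0))
    log_cap = N * math.log(s)
    total = 0.0
    ell = 0
    while True:
        log_bound = min(0.0, log_cap - ell * decay)  # log of min(1, s^N e^{-l*decay})
        term = (2 * ell + 1) * math.exp(log_bound)
        total += term
        if log_bound < 0.0 and term < 1e-12:
            break
        ell += 1
    return 4.0 * math.pi / k2 * total


def ratio_to_log_squared(s, R=4.0, N=1.0):
    """sigma_upper divided by the leading estimate (4*pi/R) * (N * log s)**2."""
    return sigma_upper(s, R, N) / (4.0 * math.pi / R * (N * math.log(s)) ** 2)
```

In this model the partial waves saturate unitarity up to \(\ell \approx (k/\sqrt R)\,N\log s\) and are negligible beyond, so the sum grows like \((4\pi N^2/R)(\log s)^2\); the ratio returned by `ratio_to_log_squared` decreases slowly toward 1 as \(s\) increases. For the illustrative values \(R = 4m_\pi^2\), \(N = 1\) the leading coefficient is \(\pi/m_\pi^2\), the same form as in (15).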
Notice that this is only a bound, not an asymptotic estimate. In spite of many efforts the Froissart bound was never qualitatively improved, and it was shown by Kupsch [17] that if one uses only \({\rm Im}\,f_\ell \ge |f_\ell|^2\) and full crossing symmetry one cannot do better than Froissart. On the more theoretical side one might wonder if using crossing symmetry and analytic completion one could not prove the Mandelstam representation at least for the pion-pion case using only axiomatic results. This is not the case, as I showed in 1967
at a meeting organized by Bob Marshak in Rochester when Harry was present [18]. One can write a representation of the scattering amplitude
\[ F_\nu = \int_0^1 dx\int_{p_0}^{\infty}dp\int_{q_0}^{\infty}dq\ \frac{w(x,p,q)}{x\,(p-s)^{\nu} + (1-x)\,(q-t)^{\nu}} + \text{circular permutations}. \]
For \(\nu = 1/2\) this is just a funny way to write the Mandelstam representation. For \(\nu = 1\), you get back to the Nakanishi–Wu representation. For \(\nu = 2/3\) you get a natural domain bigger than all you can get from axiomatic field theory and positivity.
Before 1972, rising cross-sections were a pure curiosity. Almost everybody believed that the proton-proton cross-section was approaching 40 millibarns at infinite energy. Yet, Khuri and Kinoshita [19] took seriously very early the possibility that cross-sections rise and proved, in particular, that if the scattering amplitude is dominantly crossing even, and if \(\sigma_t \sim (\log s)^2\) then
\[ \rho = \frac{{\rm Re}\,F}{{\rm Im}\,F} \sim \frac{\pi}{\log s}, \]
where \({\rm Re}\,F\) and \({\rm Im}\,F\) are the real and imaginary parts of the forward scattering amplitude. As early as 1970, Cheng and Wu proposed a model in which cross-sections were rising [20] and eventually saturating the Froissart bound. However, at that time there was no experimental indication of this. It was only in 1972 that it was discovered at the ISR, at CERN, that the p–p cross-section was rising by 3 millibarns from 30 GeV c.m. energy to 60 GeV c.m. energy [21]. I suggested to the experimentalists that they should measure \(\rho\) and test the Khuri–Kinoshita predictions. They did it [22] and this kind of combined measurement of \(\sigma_T\) and \({\rm Re}\,F\) is still going on. In \(\sigma_T\) we have now more than a 50 % increase with respect to low energy values. For an up to date review I refer to the article of Matthiae [23]. It is my strong conviction that this activity should be continued with the future LHC. A breakdown of dispersion relations might be a sign of new physics due to the presence of extra compact dimensions of space according to N. N. Khuri [24].
Future experiments, especially for \(\rho\), will be difficult because of the necessity to go to very small angles, but not impossible [25].
References
1. Lehmann, H., Symanzik, K. and Zimmermann, W.: Nuovo Cimento (Serie 10) 1, 205 (1955)
2. Froissart, M.: Phys. Rev. 123, 1053 (1961)
3. Mandelstam, S.: Phys. Rev. 112, 1344 (1958)
4. Goldberger, M.L.: Phys. Rev. 99, 979 (1955); Bogoliubov, N.N., Medvedev, B.V. and Polivanov, M.K.: Voprossy Teorii Dispersionnyk Sootnoshenii. V. Shirkov et al. Eds., Moscow, 1958; Symanzik, K.: Phys. Rev. 105, 743 (1957); Lehmann, H.: Suppl. Nuovo Cimento 14, 153 (1959)
5. Lehmann, H.: Nuovo Cimento 10, 579 (1958)
6. Greenberg, O.W. and Low, F.E.: Phys. Rev. 124, 2047 (1961)
7. Martin, A.: Phys. Rev. 129, 1432 (1963), and in: Proceedings of the 1962 Conference on High Energy Physics at CERN, J. Prentki ed., CERN Scientific Information Service, 1962, p. 567
8. Jin, Y.S. and Martin, A.: Phys. Rev. B 135, 1375 (1964)
9. Nakanishi, N.: Phys. Rev. Lett. 13, 677 (1964)
10. Nakanishi, N.: Progr. Theor. Phys. 26, 337 (1961)
11. Wu, T.T.: Phys. Rev. 123, 678 (1961)
12. Bros, J., Epstein, H. and Glaser, V.: Commun. Math. Phys. 1, 240 (1965)
13. Gell-Mann, M.: In: Proceedings of the 6th Annual Rochester Conference. J. Ballam, V.L. Fitch, T. Fulton, K. Huang, R.R. Rau and S.B. Treiman, eds., New York: Interscience Publishers, 1956, p. 30
14. Martin, A.: Nuovo Cimento 42, 901 (1966)
15. Sommer, G.: Nuovo Cimento A 48, 92 (1967). In the special case of pion–nucleon scattering a special argument gives \(R = 4m_\pi^2\). See Bessis, D. and Glaser, V.: Nuovo Cimento (Serie X) 50, 568 (1967)
16. Lukaszuk, L. and Martin, A.: Nuovo Cimento 52, 122 (1967), Appendix E
17. Kupsch, J.: Nuovo Cimento B 70, 85 (1982)
18. Martin, A.: In: Proceedings of the 1967 International Conference on Particles and Fields, Rochester, C. Hagen, G. Guralnik and V.A. Mathur, eds., New York: John Wiley and Sons, 1967, p. 255
19. Khuri, N.N. and Kinoshita, T.: Phys. Rev. B 137, 720 (1965)
20. Cheng, H. and Wu, T.T.: Phys. Rev. Lett. 24, 1456 (1970)
21. Amaldi, U. et al.: Phys. Lett. B 44, 112 (1973); Amendolia, S.R. et al.: Phys. Lett. B 44, 119 (1973)
22. Bartenev, V. et al.: Phys. Rev. Lett. 31, 1367 (1973); Amaldi, U. et al.: Phys. Lett. B 66, 390 (1977)
23. Matthiae, G.: Rep. Progr. Phys. 57, 743 (1994)
24. Khuri, N.N.: Rencontres de Physique de la vallée d’Aoste, 1994, M. Greco, ed., Editions Frontières 1994, p. 771; See also: Khuri, N.N. and Wu, T.T.: Phys. Rev. D 56, 6779 and 6785 (1997)
25. Faus-Golfe, A.: Private communication
Communicated by W. Zimmermann
Commun. Math. Phys. 219, 199 – 219 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Strongly Coupled Quantum Discrete Liouville Theory. I: Algebraic Approach and Duality
L. D. Faddeev1,2, R. M. Kashaev1,2, A. Yu. Volkov1,3
1 Steklov Mathematical Institute at St. Petersburg, Fontanka 27, St. Petersburg 191011, Russia. E-mail: [email protected]; [email protected]
2 Helsinki Institute of Physics, University of Helsinki, P.O. Box 9, 00014 Helsinki, Finland. E-mail: [email protected]; [email protected]
3 Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussel, Belgium. E-mail: [email protected]
Received: 26 May 2000 / Accepted: 28 May 2000
Dedicated to the memory of Harry Lehmann
Abstract: The quantum discrete Liouville model in the strongly coupled regime, 1 < c < 25, is formulated as a well defined quantum mechanical problem with unitary evolution operator. The theory is self-dual: there are two exponential fields related by Hermitian conjugation, satisfying two discrete quantum Liouville equations, and living in mutually commuting subalgebras of the quantum algebra of observables.
Introduction
The Liouville equation [33]
\[ \varphi_{tt} - \varphi_{xx} - 4e^{-2\varphi} = 0 \tag{1} \]
has plenty of important applications in Mathematics and Physics. In particular, it describes the surfaces of constant negative curvature and plays an indispensable role in the uniformization theory of Riemann surfaces [35] (for some recent approaches see [40–42]). In modern physics the Liouville equation defines a one-parameter family of models in conformal field theory (CFT), which usually is identified with 2-dimensional gravity [30]. It plays an even more important role in Polyakov’s theory of the noncritical Bosonic string in dimensions d < 26 [36]. For these reasons the Liouville model, especially in its quantum version, attracted wide attention during the last 25 years [10, 11, 25, 26]. The parameter c, labelling the quantum Liouville theory as a CFT model, is the central charge in a representation of the Virasoro algebra. The quasiclassical (or weak coupling) region, corresponding to large positive c, is well understood. The domain c < 1, containing the minimal models of CFT for some preferred discrete values of c, is also well described [9, 22, 18]. It is the region 1 < c < 25, corresponding to strong coupling, about which there existed very limited knowledge until now. In this paper we begin to describe one more method to treat the Liouville model, which is applicable for studying the strong coupling region. The method is based on the
apparatus of quantum integrable models (see e.g. [14] for a recent survey). Some parts of this machinery were already used in CFT and, in particular, for the Liouville model. However, we feel that we have added several new things to this development. First, we use a lattice regularization of the model which exactly retains the integrability. Here we follow the previous papers [16, 21]. Second, we show that it is indispensable (especially in the strong coupling region) to use simultaneously two mutually dual discrete models, corresponding to the two exponents $\tau^{\pm1}$ of the coupling constant, which enter symmetrically the expression for the central charge $c = 1 + 6(\tau + \tau^{-1} + 2)$. Positive $\tau$ corresponds to $c > 25$, or weak coupling, while strong coupling leads to complex $\tau$ on the unit circle,
$$\tau = e^{i\theta}, \qquad \tau^{-1} = e^{-i\theta} = \bar\tau,$$
and only unification of the models for $\tau$ and $1/\tau$ can restore unitarity. Similar considerations on constructing “modular doubles” were used earlier in the simpler examples of the Weyl–Heisenberg algebra [13] and of a quantum group [15]. This paper, while technically more involved, ideologically belongs to the same line of thought. We must stress that one can see elements of the dualization idea in papers [1, 24, 10, 43] devoted to the Liouville model itself, and in [5–7, 9] within the framework of conformal field theory. Our main result consists in constructing the unitary evolution operator for the chiral shift serving both dual models.
In the first part of the paper we recall various facts about the Liouville equation to be used in what follows. Many of these facts are known in one form or another; however, we present them in the form most suitable for our goal. Then follows a description of the discretized Liouville equation and its formal quantization in a purely algebraic manner. The appropriate involution together with the proper quantization are introduced in Sect. 4. The self-dual structure is indispensable for that.

1. Recollecting the Facts

In this section we recall in appropriate form some facts about the Liouville model and the technique of its discretization.

1.1. Liouville formula and Möbius-invariance. The Liouville formula makes solutions, all of them in fact, from pairs of arbitrary functions of one variable “moving” in light-cone directions:
$$-e^{-2\varphi(x,t)} = \frac{\alpha'(x-t)\,\beta'(x+t)}{(\alpha(x-t)-\beta(x+t))^2}. \tag{2}$$
The right-hand side of Eq. (2) is invariant under simultaneous point-wise Möbius transformations
$$\alpha \mapsto \alpha M = \frac{m_{21} + \alpha m_{11}}{m_{22} + \alpha m_{12}}, \qquad \beta \mapsto \beta M,$$
of the “chiral halves” $\alpha$ and $\beta$. Accordingly, if those are “Möbius-periodic”,
$$\alpha(x+2\pi) = \alpha(x)\,T, \qquad \beta(x+2\pi) = \beta(x)\,T,$$
Strongly Coupled Quantum Discrete Liouville Theory
with the same monodromy matrix $T$ in both chiralities of course, the solution comes out periodic in the spatial direction, $\varphi(x+2\pi,t) = \varphi(x,t)$. From now on, this will be the only boundary condition in use. Because of this hidden Möbius symmetry, the chiral halves cannot be uniquely restored from a given solution, but their Schwarz derivatives can, for they are Möbius-invariant as well. In particular, that of $\alpha$ (times minus one half),
$$u = -\frac12\left(\frac{\alpha'''}{\alpha'} - \frac32\left(\frac{\alpha''}{\alpha'}\right)^{2}\right),$$
doubles as the (chiral component of the) stress-energy tensor
$$u(x-t) = \frac14(\varphi_x-\varphi_t)^2 + \frac12(\varphi_{xx}-\varphi_{tx}) + e^{-2\varphi}. \tag{3}$$
We will now pay less attention to the parallel, or rather perpendicular, $\beta$-chirality.

1.2. Magri bracket. The canonical Poisson bracket
$$\{\varpi(x),\phi(y)\} = \delta(x-y)$$
on the laboratory Cauchy data
$$\phi = \varphi|_{t=0}, \qquad \varpi = \varphi_t|_{t=0},$$
together with the canonical total momentum and energy
$$P = \int\varpi\,\phi'\,dx, \qquad H = \frac12\int\big(\varpi^2 + \phi'^2 + 4e^{-2\phi}\big)\,dx,$$
gives the equations of motion in the Hamiltonian formalism:
$$\phi' = \{P,\phi\}, \qquad \dot\phi = \varpi = \{H,\phi\}, \qquad \varpi' = \{P,\varpi\}, \qquad \dot\varpi = \phi'' + 4e^{-2\phi} = \{H,\varpi\}.$$
These equations lead to a single free-motion equation
$$\dot u = -u' = \{L_0,u\}, \tag{4}$$
with $L_0$ denoting the (slightly shifted) zeroth Fourier coefficient
$$L_0 = \int\big(u+\tfrac14\big)\,dx = \tfrac12(H-P+\pi).$$
The bracket involved is the same canonical bracket of course, but now we are prompted to express it, using formula (3) at $t=0$, in terms of either $u$ itself or its Fourier coefficients
$$L_a = \int\big(u+\tfrac14\big)e^{-iax}\,dx.$$
The Magri bracket [34]
$$\{u(x),u(y)\} = (u(x)+u(y))\,\delta'(x-y) - \tfrac12\,\delta'''(x-y) \tag{5}$$
and the yet more celebrated (Poisson bracket realization of the) Virasoro algebra
$$-i\{L_a,L_b\} = (a-b)L_{a+b} + \pi(a^3-a)\,\delta_{a,-b}$$
emerge. They will attract our attention in the next couple of (sub)sections, where we shall try to treat the Magri bracket and the “chiral Hamiltonian” $L_0$ as a stand-alone dynamical system, not for the first time of course. We shall review some earlier attempts to get it properly discretized and quantized. This will give an idea of how far we can go on bare common sense and formal algebra, without actually touching the Liouville equation.

1.3. Volterra model. The first such attempt [16] produced a viable lattice counterpart of the Magri bracket,
$$\{u_m,u_n\} = \tfrac14u_mu_n\Big((4-u_m-u_n)(\delta_{m+1,n}-\delta_{m-1,n}) - u_{\frac12(m+n)}(\delta_{m+2,n}-\delta_{m-2,n})\Big), \tag{5'}$$
with $u_n \sim 1-\Delta^2u(n\Delta)$. In the traditional difference-differential scheme of things it has to do with Volterra’s venerable preys and predators,
$$\dot u_n = u_n(u_{n-1}-u_{n+1}) = \Big\{\sum_m\log u_m,\;u_n\Big\}, \tag{4'}$$
otherwise known as the lattice KdV equation. It should be noted, however, that the latter only emerges in a rather tricky continuous limit, while the simplest limit leads to the free equation (4). This is a little unfortunate, for however essential a part the Volterra system has been playing in soliton theory, there must be something wrong about a nonlinear equation emulating a linear one. Apparently, discrete space and continuous time do not get along; still, the KdV connection may teach us a thing or two.

1.4. Bi-hamiltonity. Historically, the Magri bracket brought about the notion of “raising and lowering”, which soon became paramount for the whole Hamiltonian theory of soliton equations. In our case, one lifts $L_0$ to
$$L_0^{\uparrow} = \tfrac14\int u^2\,dx$$
to turn the free motion (4) into the KdV equation
$$\dot u = \tfrac12u''' - 3uu' = \{L_0^{\uparrow},u\},$$
then drops the Magri bracket to that of Zakharov–Faddeev,
$$\{u(x),u(y)\}^{\downarrow} = 2\delta'(x-y),$$
to restore the free motion in a different guise:
$$\dot u = -u' = \{L_0^{\uparrow},u\}^{\downarrow}.$$
Although not quite relevant to the Liouville equation, this makeshift step may be useful anyway, for the corresponding lower guise of the Volterra system,
$$\dot u_n = u_n(u_{n-1}-u_{n+1}) = \Big\{\sum_mu_m,\;u_n\Big\}^{\downarrow},$$
features a more quantization-friendly bracket
$$\{u_m,u_n\}^{\downarrow} = u_mu_n(\delta_{m+1,n}-\delta_{m-1,n}).$$
We happen to know already [19] how to turn this one into a noncommutative algebra and to arrange there a “free motion in discrete time”. So, in the rest of this section we shall be looking at that motion; then we shall try to guess how it could be lifted back to the Magri–Virasoro case.

1.5. Quantization. Going quantum, the bracket $\downarrow$ ought to become Weyl-style exchange relations
$$u_mu_n = q^{2(\delta_{m+1,n}-\delta_{m-1,n})}\,u_nu_m.$$
We do not want to discuss prematurely the nature of the quantization constant $q$ or the exact degree of formality implied in what follows, but we want to be very clear about boundary conditions: the $u$’s are strictly periodic and so is the Kronecker symbol,
$$u_{n+N} = u_n, \qquad \delta_{m,n} = \begin{cases}1 & \text{if } m\equiv n \pmod N,\\ 0 & \text{otherwise.}\end{cases}$$
For a reason which will surface shortly, we want the period $N$ to be even and an additional condition
$$u_1u_3\cdots u_{N-1} = u_2u_4\cdots u_N \tag{6}$$
to be met. It makes sense because the elements on both sides are central. With all this in place, the relation
$$\hat u_n = u_{n-1} = Q^{-1}u_nQ, \qquad\text{with}\quad Q = b_1b_2\cdots b_{N-1}, \quad b_n = \sum_{a=-\infty}^{\infty}q^{a^2}u_n^a,$$
proves to hold for all $n\in\mathbb Z$. So, with the automorphism $\hat{\ }\colon u_n\mapsto u_{n-1}$ interpreted as a jump in time, the above relation replaces the (lower) Volterra system with a free and fully discrete Heisenberg quantum motion, employing the element $Q$ as an “evolution operator”.

1.6. Proof. We just honestly start from $n=1$:
$$\begin{aligned}
Q^{-1}u_1Q &= Q^{-1}u_1b_1b_2b_3\cdots b_{N-1} = Q^{-1}b_1\sum_{a=-\infty}^{\infty}q^{a^2+2a}u_2^a\,u_1\,b_3\cdots b_{N-1}\\
&= Q^{-1}b_1b_2\,q^{-1}u_2^{-1}u_1\,b_3\cdots b_{N-1} = Q^{-1}b_1b_2b_3\,u_2^{-1}u_1u_3\,b_4\cdots b_{N-1} = \cdots
\end{aligned}$$
(every step adds another factor; in the end the parity of $N$ and the center-reducing condition (6) make all the difference)
$$\cdots = Q^{-1}Q\,(u_2u_4\cdots u_{N-2})^{-1}u_1u_3\cdots u_{N-1} = u_N.$$
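Each step of this chain is a completion of the square in the exponent of $q$. As a sketch of the first two steps (using only the exchange relations of Subsect. 1.5 and the definition of $b_n$, and assuming $N\ge4$ so that $u_1$ and $u_3$ commute):

```latex
\begin{aligned}
u_1 b_2 &= \sum_a q^{a^2}\,u_1u_2^{a} = \sum_a q^{a^2+2a}\,u_2^{a}u_1
   && (u_1u_2 = q^2u_2u_1)\\
        &= q^{-1}\sum_a q^{(a+1)^2}u_2^{a+1}\;u_2^{-1}u_1 = q^{-1}\,b_2\,u_2^{-1}u_1 ,\\[4pt]
\big(q^{-1}u_2^{-1}u_1\big)\,b_3 &= q^{-1}\sum_a q^{a^2-2a}\,u_3^{a}\,u_2^{-1}u_1
   && (u_2^{-1}u_3 = q^{-2}u_3u_2^{-1})\\
        &= q^{-2}\,b_3\,u_3u_2^{-1}u_1 = b_3\,u_2^{-1}u_1u_3
   && (u_3u_2^{-1} = q^{2}u_2^{-1}u_3).
\end{aligned}
```

The powers of $q$ cancel after every second step, which is why no $q$-factor survives in the final line of the proof.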
We may seem to have done only one very particular case, but, luckily, the rest is little more than tautology. It immediately follows that $Q^{-1}b_1Q = b_N$, which can be written as
$$Q = b_1^{-1}Qb_N = b_2b_3\cdots b_N,$$
which in turn just means that $Q = \hat Q$, and therefore anything good for $n=1$ at once becomes just as good for all $n$. That is it. In the process, a seeming contradiction between the “open-ended” appearance of the evolution operator and the cyclic nature of its action has resolved itself. Since
$$Q = b_1b_2\cdots b_{N-1} = \hat Q = b_2b_3\cdots b_N = \hat{\hat Q} = \cdots = b_Nb_{N+1}\cdots b_{N-2},$$
whichever of these forms one chooses to express $Q$ in, it always remains an ordered product, but each time it starts from another point. In this sense it does not depend on the starting point, or rather does not have one. The next subsection offers an explanation of this miracle.

1.7. It has to do with braids. Let us compile a list of what we know about the $b$’s. They are periodic, $b_{n+N} = b_n$, they are “local”,
$$b_mb_n = b_nb_m \quad\text{if } |m-n|\not\equiv1 \pmod N,$$
and they satisfy the “global” relations which closed the previous subsection,
$$b_1b_2\cdots b_{N-1} = b_2b_3\cdots b_N = \cdots = b_Nb_{N+1}\cdots b_{N-2}.$$
Let us now consider these as defining relations and identify the emerging group. It is $B_N$, the group of braids of $N$ strings in 3 dimensions, alternatively defined by an exhaustive list of (Artin’s) relations
$$b_nb_{n-1}b_n = b_{n-1}b_nb_{n-1}, \qquad b_nb_m = b_mb_n \quad\text{if } |n-m|\ne1,$$
imposed on $N-1$ generators $b_1, b_2, \ldots, b_{N-1}$. We leave it as an (instructive) exercise to check that our list is equivalent to Artin’s, provided the leftmost of our “global” relations is read as
$$b_N = (b_2\cdots b_{N-1})^{-1}\,b_1b_2\cdots b_{N-1}$$
and understood as the definition of $b_N$. Just in case, let us warn against mistaking $b_N$ for the $N$th generator of $B_{N+1}$; ours is just an element of $B_N$ explicitly defined above. Those familiar with the braid group must have already recognized in it the first string crossing the very last one behind all the others. Let us picture it for $N=4$:
[Figure: braid diagrams of $b_1$, $b_2$, $b_3$ and $b_4$.]
A picture of the evolution operator,
[Figure: braid diagram of $Q = b_1b_2b_3 = b_2b_3b_4 = b_3b_4b_1 = b_4b_1b_2$.]
now explains better than words how it manages to be ordered and cyclic at the same time.
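The braid-group statements of this subsection can be probed numerically for $N=4$. The following is a minimal sketch that is not taken from the text: it assumes the unreduced Burau representation of $B_4$, with an arbitrarily chosen rational parameter, as a concrete stand-in for the abstract generators, so it can confirm the relations in one representation but cannot prove them. All function names are hypothetical.

```python
from fractions import Fraction

N = 4                   # number of strings, as in the pictures for N = 4
T = Fraction(2, 3)      # arbitrary nonzero Burau parameter (an assumption)

def identity():
    return [[Fraction(int(r == c)) for c in range(N)] for r in range(N)]

def product(*factors):
    """Ordered matrix product of the given N x N matrices."""
    result = identity()
    for f in factors:
        result = [[sum(result[r][k] * f[k][c] for k in range(N))
                   for c in range(N)] for r in range(N)]
    return result

def burau(i, inverse=False):
    """Unreduced Burau matrix of the Artin generator b_i (1 <= i <= N-1)."""
    m = identity()
    r = i - 1
    if inverse:
        m[r][r], m[r][r + 1] = Fraction(0), Fraction(1)
        m[r + 1][r], m[r + 1][r + 1] = 1 / T, 1 - 1 / T
    else:
        m[r][r], m[r][r + 1] = 1 - T, T
        m[r + 1][r], m[r + 1][r + 1] = Fraction(1), Fraction(0)
    return m

b1, b2, b3 = burau(1), burau(2), burau(3)
# b_N is *defined* by the leftmost "global" relation:
# b4 = (b2 b3)^(-1) b1 b2 b3
b4 = product(burau(3, inverse=True), burau(2, inverse=True), b1, b2, b3)
Q = product(b1, b2, b3)

def cyclic_forms_agree():
    """Q = b1 b2 b3 = b2 b3 b4 = b3 b4 b1 = b4 b1 b2 in this representation."""
    words = [(b2, b3, b4), (b3, b4, b1), (b4, b1, b2)]
    return all(product(*w) == Q for w in words)

def b4_behaves_like_a_neighbour():
    """b4 commutes with b2 and satisfies Artin's cubic relation with b3 and b1."""
    return (product(b4, b2) == product(b2, b4)
            and product(b4, b3, b4) == product(b3, b4, b3)
            and product(b4, b1, b4) == product(b1, b4, b1))
```

Exact rational arithmetic is used so that the comparisons are equalities rather than floating-point approximations; any other nonzero value of `T` should serve equally well.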
1.8. Lattice Virasoro algebra. Now we look for a generalization of this construction by a natural interpretation of the “raising”. So, we want the lattice Virasoro algebra actually to be a group. If that is to be, the “global” relations ought to remain the same as in the braid group,
$$d_1d_2\cdots d_{N-1} = d_2d_3\cdots d_N = \cdots = d_Nd_{N+1}\cdots d_{2N-2},$$
or equivalently
$$\hat d_n = d_{n-1} = Q^{-1}d_nQ, \qquad Q = d_1d_2\cdots d_{N-1}. \tag{4''}$$
But this time each generator should interfere not only with its nearest neighbours but also with the second-nearest ones,
$$d_md_n = d_nd_m \quad\text{if } |m-n|\not\equiv1 \text{ and } 2 \pmod N,$$
just like it was in bracket (5′) of course. This by itself implies
$$d_nd_{n-2}d_{n-1}d_nd_{n+1} = d_{n-2}d_{n-1}d_nd_{n+1}d_{n-1}$$
instead of Artin’s relations, but we feel that these are not tight enough, so we voluntarily split each of them in two,
$$d_nd_{n-2}d_{n-1}d_nd_{n+1} = d_{n-2}d_nd_{n-1}d_{n+1} = d_{n-2}d_{n-1}d_nd_{n+1}d_{n-1},$$
to end up with the weird relations from [47]:
$$d_{n+1}d_{n-1}d_nd_{n+1} = d_{n-1}d_{n+1}d_n, \qquad d_nd_{n-1}d_{n+1} = d_{n-1}d_nd_{n+1}d_{n-1}. \tag{5''}$$
This coup de force will find its justification in the treatment of the Liouville model below.

2. Difference-Difference Liouville Equation

2.1. Liouville formula. For an aesthetic reason alone, its difference approximation can only be
$$-\Delta^2e^{-2\varphi} \sim \frac{(\alpha'-\alpha)(\beta'-\beta)}{(\alpha'-\beta')(\alpha-\beta)},$$
[Figure: an elementary lattice cell in the $(x,t)$ plane with $\varphi$ at its centre and $\alpha$, $\alpha'$, $\beta$, $\beta'$ at its corners.]
with primes now denoting finite shifts of arguments by $\Delta$. Appearance aside, we have already mentioned that the Liouville formula manifests invariance of $\varphi$ under simultaneous point-wise Möbius transformations of the chiral halves $\alpha$ and $\beta$. The so-called cross-ratio employed in our difference scheme has just the same property. To make of this a lattice formula we draw a square $j,k$ lattice and put
$$\chi_{jk} = -\frac{\big(\alpha_{\frac12(j-k+1)}-\alpha_{\frac12(j-k-1)}\big)\big(\beta_{\frac12(j+k+1)}-\beta_{\frac12(j+k-1)}\big)}{\big(\alpha_{\frac12(j-k+1)}-\beta_{\frac12(j+k+1)}\big)\big(\alpha_{\frac12(j-k-1)}-\beta_{\frac12(j+k-1)}\big)}$$
in the vertices with $j+k\equiv1 \pmod 2$, like this:
[Figure: the $j,k$ lattice with the $k$-axis drawn twice; bullets mark the vertices carrying the $\chi$’s (one of them labelled $\chi_{52}$), circlets mark the empty vertices, and $\alpha_1$, $\alpha_2$, $\beta_3$, $\beta_4$ label light-cone lines.]
So, the $\chi$’s occupy the sites marked by bullets; the empty circlets will this time remain empty. The second $k$-axis at $j=8$ is a reminder that the lattice covers a cylinder rather than a plane, $\chi_{j+2N,k} = \chi_{jk}$, with $N$ not necessarily equal to four of course.

2.2. Discrete Liouville equation. It not only ideologically justifies the chosen discretization but also helps make the following action more entertaining. In the continuum, the Liouville equation may be considered as a compatibility condition for the Liouville formula overloaded with Möbius-invariance. It ought to be just the same on the lattice, so let us find out what “equation of motion” the $\chi$’s might solve. We pick four of them next to each other to figure out that they are four cross-ratios made from a total of six $\alpha$’s and $\beta$’s. That is one too many: 4 $\chi$’s + 3 symmetry parameters − 3 $\alpha$’s − 3 $\beta$’s = 1. The missing relation turns out to be
$$\chi_{j,k+1}\chi_{j,k-1} = \frac{\chi_{j-1,k}\,\chi_{j+1,k}}{(1+\chi_{j-1,k})(1+\chi_{j+1,k})}, \qquad j+k\equiv0 \pmod 2, \tag{7}$$
which thus becomes our favourite lattice Liouville equation, replacing Hirota’s original
$$h_{j,k+1}h_{j,k-1} = \frac{h_{j-1,k}\,h_{j+1,k}}{1+h_{j-1,k}h_{j+1,k}}.$$
The two are connected by a simple change of variables, $\chi_{jk} = h_{j-1,k}h_{j+1,k}$, which quarters the $h$’s in the empty sites of our lattice of course. However, that change would also split cross-ratios, which we want to avoid, at least in this paper. The stress-energy connection (3) also lends itself to a cross-ratio treatment. The Schwarz derivative inevitably becomes another cross-ratio [17],
$$u_n = 4\,\frac{(\alpha_{n+2}-\alpha_{n+1})(\alpha_n-\alpha_{n-1})}{(\alpha_{n+2}-\alpha_n)(\alpha_{n+1}-\alpha_{n-1})},$$
then another counting argument leads to
$$\frac{4}{u_{\frac12(j-k)}} = \Big(1+\chi_{j,k-1}+\frac{\chi_{j,k-1}}{\chi_{j-1,k}}\Big)\Big(1+\chi_{j+1,k}+\frac{\chi_{j+1,k}}{\chi_{j+2,k-1}}\Big). \tag{8}$$
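Both the Möbius invariance of the cross-ratio and the lattice Liouville equation (7) can be checked in exact arithmetic. The sketch below is illustrative only: the chiral halves are filled with random, pairwise distinct rationals (an assumption made so that no denominator vanishes), the index conventions follow the cross-ratio formula for $\chi_{jk}$ of Subsect. 2.1, and all function names are hypothetical.

```python
from fractions import Fraction
import random

def make_chiral_halves(size, seed=0):
    """Random, pairwise distinct rational values alpha_n, beta_n for |n| <= size."""
    rng = random.Random(seed)
    count = 2 * size + 1
    values = rng.sample(range(-10**6, 10**6), 2 * count)
    indices = range(-size, size + 1)
    alpha = {n: Fraction(values[i]) for i, n in enumerate(indices)}
    beta = {n: Fraction(values[count + i]) for i, n in enumerate(indices)}
    return alpha, beta

def chi(alpha, beta, j, k):
    """Cross-ratio chi_{jk} at a vertex with j + k odd."""
    assert (j + k) % 2 == 1
    a_hi, a_lo = alpha[(j - k + 1) // 2], alpha[(j - k - 1) // 2]
    b_hi, b_lo = beta[(j + k + 1) // 2], beta[(j + k - 1) // 2]
    return -(a_hi - a_lo) * (b_hi - b_lo) / ((a_hi - b_hi) * (a_lo - b_lo))

def liouville_defect(alpha, beta, j, k):
    """l.h.s. minus r.h.s. of Eq. (7) at a vertex with j + k even."""
    left, right = chi(alpha, beta, j - 1, k), chi(alpha, beta, j + 1, k)
    lhs = chi(alpha, beta, j, k + 1) * chi(alpha, beta, j, k - 1)
    return lhs - left * right / ((1 + left) * (1 + right))

def mobius(values, m11, m12, m21, m22):
    """Apply x -> (m21 + x*m11) / (m22 + x*m12) to every entry."""
    return {n: (m21 + v * m11) / (m22 + v * m12) for n, v in values.items()}
```

A typical use is to call `make_chiral_halves(8)`, evaluate `liouville_defect` at all vertices with $j+k$ even in a small window, and compare `chi` before and after transforming both halves with the same `mobius` matrix; with exact fractions the defect should come out as zero, not merely small.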
2.3. Poisson bracket. Two rows of $\chi$’s make perfect Cauchy data,
$$\chi_{2n} = \chi_{2n,-1}, \qquad \chi_{2n+1} = \chi_{2n+1,0}.$$
Just in case, one may directly check that the lattice canonical bracket of [21],
$$\{\chi_{2n\pm1},\chi_{2n}\} = \chi_{2n\pm1}\chi_{2n}, \qquad \{\chi_j,\chi_i\} = 0 \quad\text{if } |j-i|\not\equiv1 \pmod{2N},$$
both a) reproduces itself as the $\chi$’s evolve according to the lattice Liouville equation (7) and b) translates, by means of formula (8) at $k=0$, into the lattice Magri bracket (5′).

3. Formal Quantization

3.1. Algebra of observables. In a virtual replay of Subsect. 1.5, we introduce a formal quantization constant $q$, replace the above lattice canonical bracket by the Weyl-style exchange relations
$$\chi_{2n\pm1}\chi_{2n} = q^2\chi_{2n}\chi_{2n\pm1}, \qquad \chi_j\chi_i = \chi_i\chi_j \quad\text{if } |j-i|\not\equiv1 \pmod{2N},$$
opt for even $N$ and impose an additional condition on central elements:
$$\chi_1\chi_3^{-1}\chi_5\chi_7^{-1}\cdots\chi_{2N-3}\chi_{2N-1}^{-1} = \chi_2\chi_4^{-1}\chi_6\chi_8^{-1}\cdots\chi_{2N-2}\chi_{2N}^{-1}. \tag{9}$$

3.2. Quantum lattice Liouville equation. We can only afford a Heisenberg evolution of observables, so let us again consider the $\chi_j$ as initial data ($\chi_{2n,-1} = \chi_{2n}$, $\chi_{2n+1,0} = \chi_{2n+1}$) and determine the elements $\chi_{j,k+1}$, $j+k\equiv0 \pmod 2$, step by step using a slightly “quantized” lattice Liouville equation
$$\chi_{j,k+1}\chi_{j,k-1} = \frac{q^2\chi_{j-1,k}\,\chi_{j+1,k}}{(1+q\chi_{j-1,k})(1+q\chi_{j+1,k})}. \tag{10}$$
We deliberately present the r.h.s. as a ratio to stress that all the factors there commute with each other. On the contrary, those in the l.h.s. commute neither with each other nor with the r.h.s.
3.3. Shift and evolution operators. If we guessed the equation right, there must exist an “evolution operator”, that is, an element $K$ such that
$$\chi_{j,k+1} = K^{-1}\chi_{j,k-1}K.$$
Needless to say, these relations will hold everywhere if and only if they do so for $k$ equal to 0 and −1, where they become
$$\chi_{2n-1}K\chi_{2n-1}K^{-1} = \frac{q^2\chi_{2n-2}\chi_{2n}}{(1+q\chi_{2n-2})(1+q\chi_{2n})}, \qquad K^{-1}\chi_{2n}K\chi_{2n} = \frac{q^2\chi_{2n-1}\chi_{2n+1}}{(1+q\chi_{2n-1})(1+q\chi_{2n+1})}.$$
Following [19, 21] we shall produce that element, with an explicitly singled out d’Alembert part $K_\infty$ good for
$$\chi_{2n-1}K_\infty\chi_{2n-1}K_\infty^{-1} = q^2\chi_{2n-2}\chi_{2n}, \qquad K_\infty^{-1}\chi_{2n}K_\infty\chi_{2n} = q^2\chi_{2n-1}\chi_{2n+1},$$
and complete with the “shift operator” $J$ moving the $\chi$’s in the spatial direction,
$$\chi_{j+1,k} = J^{-1}\chi_{j-1,k}J.$$
Here follow explicit formulas for these shift and evolution operators:
$$K_\infty = VU, \qquad K = E_2K_\infty E_1, \qquad J = VU^{-1},$$
where
$$E_1 = \prod_{n=1}^{N}\epsilon(\chi_{2n-1}), \qquad E_2 = \prod_{n=1}^{N}\epsilon(\chi_{2n}),$$
$$U = \theta(q\chi_1^{-1}\chi_2)\,\theta(q\chi_3^{-1}\chi_4)\cdots\theta(q\chi_{2N-3}^{-1}\chi_{2N-2}),$$
$$V = \theta(q\chi_{2N-1}\chi_{2N-2}^{-1})\,\theta(q\chi_{2N-3}\chi_{2N-4}^{-1})\cdots\theta(q\chi_3\chi_2^{-1}).$$
Of course, $U$ and $V$ have everything to do with braids but we will not go into that. The two special functions involved,
$$\epsilon(z) = \prod_{p=0}^{\infty}(1+q^{2p+1}z) = (-qz;q^2)_\infty, \qquad \theta(z) = \epsilon(z)\,\epsilon(z^{-1}) = \mathrm{const}\cdot\sum_{p=-\infty}^{\infty}q^{p^2}z^p,$$
are quite special indeed, but all we want to know about them, for now at least, are the beautiful functional equation fulfilled by the former and the ensuing equation on the latter,
$$\frac{\epsilon(qz)}{\epsilon(q^{-1}z)} = \frac{1}{1+z}, \qquad \frac{\theta(qz)}{\theta(q^{-1}z)} = \frac1z,$$
the second of which has already been used, albeit implicitly, in Subsect. 1.6. This time those functional equations and condition (9) gradually translate into the relations
$$E_1^{-1}\chi_{2n-1}E_1 = \chi_{2n-1}, \qquad E_1^{-1}\chi_{2n}E_1 = \chi_{2n}(1+q\chi_{2n-1})(1+q\chi_{2n+1}),$$
$$E_2^{-1}\chi_{2n}E_2 = \chi_{2n}, \qquad E_2^{-1}\chi_{2n-1}E_2 = \big((1+q\chi_{2n-2})(1+q\chi_{2n})\big)^{-1}\chi_{2n-1},$$
$$U^{-1}\chi_{2n}U = \chi_{2n-1}, \qquad U^{-1}\chi_{2n+1}U = q^2\chi_{2n-1}\chi_{2n+1}\chi_{2n}^{-1},$$
$$V^{-1}\chi_{2n}V = \chi_{2n+1}, \qquad V^{-1}\chi_{2n-1}V = q^2\chi_{2n-1}\chi_{2n+1}\chi_{2n}^{-1},$$
which combined make $J$ and $K$ satisfy what they have to. We omit the calculation, but a remark is in order.
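For a numerical value of $q$ with $|q|<1$ the two functional equations of this subsection are easy to test. The sketch below is only an illustration under stated assumptions: the infinite product is truncated, the function names (`epsilon` for the infinite-product function, `theta` for its symmetrized version) are hypothetical, and the constant in front of the series for $\theta$ is fixed by the Jacobi triple product identity, which is our addition rather than something stated in the text.

```python
def epsilon(z, q, terms=200):
    """Truncated product prod_{p>=0} (1 + q^(2p+1) z) = (-q z; q^2)_infinity."""
    result = 1 + 0j
    q_odd = q                      # q^(2p+1), starting from p = 0
    for _ in range(terms):
        result *= 1 + q_odd * z
        q_odd *= q * q
    return result

def theta(z, q, terms=200):
    """theta(z) = epsilon(z) * epsilon(1/z)."""
    return epsilon(z, q, terms) * epsilon(1 / z, q, terms)

def theta_series(z, q, cutoff=12):
    """sum_p q^(p^2) z^p, truncated to |p| <= cutoff."""
    return sum(q ** (p * p) * z ** p for p in range(-cutoff, cutoff + 1))

def q2_pochhammer(q, terms=200):
    """(q^2; q^2)_infinity; by the Jacobi triple product,
    theta(z) * (q^2; q^2)_infinity = sum_p q^(p^2) z^p."""
    result = 1 + 0j
    q_even = q * q
    for _ in range(terms):
        result *= 1 - q_even
        q_even *= q * q
    return result

def epsilon_defect(z, q):
    """|epsilon(qz)/epsilon(z/q) - 1/(1+z)|."""
    return abs(epsilon(q * z, q) / epsilon(z / q, q) - 1 / (1 + z))

def theta_defect(z, q):
    """|theta(qz)/theta(z/q) - 1/z|."""
    return abs(theta(q * z, q) / theta(z / q, q) - 1 / z)
```

For instance, `epsilon_defect(0.7 - 0.4j, 0.3 + 0.2j)` and `theta_defect(0.7 - 0.4j, 0.3 + 0.2j)` should both be at the level of rounding errors. The formal algebra of the text does not require $|q|<1$; that restriction is needed only for the products to converge numerically.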
3.4. Odd N or no condition (9). In these cases it all fails, and Subsect. 1.6 gives an idea why. In fact, they are better served by those cross-ratio-splitting variables h and Hirota’s lattice Liouville equation which we were so quick to discard. We are planning to return to this issue elsewhere.
3.5. Chiral evolution operator. We define it as $Q = UE_1$, and it indeed moves the $\chi$’s in the right direction,
$$\chi_{j,k+1} = Q^{-1}\chi_{j+1,k}Q,$$
and equals $\sqrt{J^{-1}K}$, in the sense that $Q^2 = J^{-1}K$. So, the total momentum $P$, the Hamiltonian $H$ and the chiral Hamiltonian $\frac12(H-P)$ of the original equation have finally become the shift-evolution operators $J$, $K$ and $Q$ of the quantum lattice equation, but we have yet to find out in what sense, if at all, this $Q$ coincides with that hypothetical quantum lattice group-like counterpart of $L_0$ which we called $Q$ in Subsect. 1.8. The answer does not come easy, but in the end a few carefully placed $q$’s do it again, and so does the magic function $\epsilon$ introduced in Subsect. 3.3. It turns out that
$$Q = d_1d_2\cdots d_{N-1}, \tag{11}$$
where $Q$ is $UE_1$ and the $d$’s are
$$d_n = \epsilon\Big(q^{-1}\big(1+q\chi_{2n}+\chi_{2n}\chi_{2n-1}^{-1}\big)\big(1+q\chi_{2n+1}+\chi_{2n+2}^{-1}\chi_{2n+1}\big)-q^{-1}\Big).$$
As the notation suggests, these also satisfy relations (4″) and (5″). It is instructive now to see that the argument of the function $\epsilon$ in this formula is a natural quantization of expression (8).
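3.6. Proof of Eq. (11). First, we recall the two identities satisfied by the $\epsilon$-function:
$$\epsilon(u)\epsilon(v) = \epsilon(u+v), \qquad \epsilon(v)\epsilon(u) = \epsilon(v+u+qvu), \qquad uv = q^2vu. \tag{12}$$
Next, we have the identity
$$\epsilon\big(\chi_1(1+q\chi_2^{-1})\big)\,Q = Q\,\epsilon\big(\chi_{2N}^{-1}(1+q\chi_{2N-1}^{-1})^{-1}\big),$$
which is a consequence of the relations in Subsect. 3.3 satisfied by the operators $U$ and $E_1$. It is also easily checked that for any $j$,
$$\theta(q\chi_{2j-1}^{-1}\chi_{2j})\,\epsilon(\chi_{2j-1}) = \epsilon\big(\chi_{2j-1}(1+q\chi_{2j}^{-1})\big)\,\epsilon\big((1+q\chi_{2j-1}^{-1})\chi_{2j}\big).$$
Now, using these relations, we have
$$\begin{aligned}
Q &= \big(\epsilon(\chi_1(1+q\chi_2^{-1}))\big)^{-1}\,UE_1\,\epsilon\big(\chi_{2N}^{-1}(1+q\chi_{2N-1}^{-1})^{-1}\big)\\
&= \big(\epsilon(\chi_1(1+q\chi_2^{-1}))\big)^{-1}\prod_{1\le j<N}\theta(q\chi_{2j-1}^{-1}\chi_{2j})\,\epsilon(\chi_{2j-1})\;\times\;\epsilon(\chi_{2N-1})\,\epsilon\big(\chi_{2N}^{-1}(1+q\chi_{2N-1}^{-1})^{-1}\big)\\
&= \big(\epsilon(\chi_1(1+q\chi_2^{-1}))\big)^{-1}\prod_{1\le j<N}\epsilon\big(\chi_{2j-1}(1+q\chi_{2j}^{-1})\big)\,\epsilon\big((1+q\chi_{2j-1}^{-1})\chi_{2j}\big)\;\times\;\epsilon(q\chi_{2N-1}\chi_{2N}^{-1})\,\epsilon(\chi_{2N-1})\\
&= \prod_{1\le j<N}\epsilon\big((1+q\chi_{2j-1}^{-1})\chi_{2j}\big)\,\epsilon\big(\chi_{2j+1}(1+q\chi_{2j+2}^{-1})\big)\\
&= \prod_{1\le j<N}d_j,
\end{aligned}$$
thus obtaining Eq. (11). The proof of relations (5″) is done in [47] and is also based on identities (12).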
4. Dualization

4.1. Involution. The formal quantization of Sect. 3 can be put in the usual framework of quantum mechanics if we assume that both the formal parameter $q$ and the generators $\chi$ are exponentials,
$$q = e^{\pi i\tau}, \qquad \chi_j = e^{-2\pi\sqrt\tau\,\phi_j},$$
and the new generators $\phi_j$ have the commutation relations
$$[\phi_{2n\pm1},\phi_{2n}] = -\frac{I}{2\pi i}, \qquad [\phi_j,\phi_i] = 0 \quad\text{if } |j-i|\not\equiv1 \pmod{2N},$$
independent of $\tau$, and so can be taken to be selfadjoint operators, $\phi_j^\dagger = \phi_j$. The rest depends on $\tau$ of course. If the established formula
$$c = 1+6\big(\tau+\tfrac1\tau+2\big)$$
(relating the coupling constant of the continuous Liouville field theory to the central charge of the corresponding representation of the Virasoro algebra) has something to do with our lattice theory, then three cases have to be considered: a) $c\le1 \leftrightarrow \tau<0$; b) $c\ge25 \leftrightarrow \tau>0$; and c) $1\le c\le25 \leftrightarrow |\tau|=1$. The first two, although they lead to pretty normal reality conditions, $\chi_j^\dagger = \chi_j^{-1}$ and $\chi_j^\dagger = \chi_j$ respectively, also put $q$ on the unit circle, which is the last thing we want right now. That leaves c).
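The three cases can be made concrete with a few lines of code. This is a minimal sketch; the function names and the tolerance are our own choices, and the only input taken from the text is the formula for $c$. For $\tau = e^{i\theta}$ it reduces to $c = 13 + 12\cos\theta$, which sweeps the interval $1\le c\le25$.

```python
import cmath

def central_charge(tau):
    """c = 1 + 6 (tau + 1/tau + 2) for a nonzero coupling tau."""
    return 1 + 6 * (tau + 1 / tau + 2)

def regime(tau, tol=1e-12):
    """Classify tau into the cases a), b), c) of the text.

    Returns None when tau is neither real nor on the unit circle,
    in which case the central charge is not real.
    """
    tau = complex(tau)
    if abs(tau.imag) <= tol:
        return "a" if tau.real < 0 else "b"
    if abs(abs(tau) - 1) <= tol:
        return "c"
    return None

def strong_coupling_charge(theta):
    """Central charge for tau = exp(i*theta); equals 13 + 12*cos(theta)."""
    return central_charge(cmath.exp(1j * theta)).real
```

For example, `strong_coupling_charge(theta)` with $\theta = \pi/2$ gives $c = 13$, the midpoint of the strong coupling interval, while `central_charge(1)` and `central_charge(-1)` reproduce the endpoints $c = 25$ and $c = 1$.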
4.2. Change of function $\epsilon$. So, $|\tau|=1$; also, since $c$ does not distinguish between $\tau$ and $1/\tau$, let for definiteness $\Im\tau>0$. Following [13], consider the function
$$f(z) = e_{\sqrt\tau}(iz/\sqrt\tau) = \frac{\big(-e^{2\pi i(z+\frac\tau2)};\,e^{2\pi i\tau}\big)_\infty}{\big(-e^{\frac{2\pi i}{\tau}(z-\frac12)};\,e^{-\frac{2\pi i}{\tau}}\big)_\infty}, \tag{13}$$
which uses our former favourite $\epsilon$ as the numerator but divides it by itself with suitably altered arguments. In the Appendix we collect some of the properties of this function. Here we remark in particular that $f(z)$ satisfies the same functional equation
$$\frac{f(z+\frac\tau2)}{f(z-\frac\tau2)} = \frac{1}{1+e^{2\pi iz}} \tag{14}$$
as $\epsilon(e^{2\pi iz})$ did, but this time we also have $|f(z)|=1$ on the line $z=-\tau\bar z$, simply because the numerator and denominator of $f$ are complex conjugates of each other on that line. There is an easy profit to be had from that.

4.3. Unitarity. Replace $\epsilon$ by $f$ in every factor of every shift-evolution operator. All those little factors and big operators will at once become unitary, but all the relations we have found them to satisfy will remain intact, for they rely only on that one and only functional equation. For instance, let us write out the so upgraded chiral evolution operator:
$$Q = \kappa^{2(N-1)}\,e^{\pi i(\phi_1-\phi_2)^2}e^{\pi i(\phi_3-\phi_4)^2}\cdots e^{\pi i(\phi_{2N-3}-\phi_{2N-2})^2}\prod_{n=1}^{N}f(i\sqrt\tau\,\phi_{2n-1}),$$
where it is already taken into account that
$$f(z)f(-z) = \kappa^2e^{-\frac{\pi i}{\tau}z^2},$$
with $\kappa = f(0)$ of course. Mission accomplished: our lattice Liouville model has finally turned from an algebraic fantasy into a quite material unitary theory. Moreover, we shall momentarily see that we get a new feature.

4.4. Dual Liouville equation. Let us permute the factors in the l.h.s. of the lattice Liouville equation (10),
$$\begin{aligned}
\chi_{j,k-1}\chi_{j,k+1} &= \chi_{j,k-1}\big(\chi_{j,k+1}\chi_{j,k-1}\big)\chi_{j,k-1}^{-1}
= \chi_{j,k-1}\,\frac{q^2\chi_{j-1,k}\,\chi_{j+1,k}}{(1+q\chi_{j-1,k})(1+q\chi_{j+1,k})}\,\chi_{j,k-1}^{-1}\\
&= \frac{q^{-2}\chi_{j-1,k}\,\chi_{j+1,k}}{(1+q^{-1}\chi_{j-1,k})(1+q^{-1}\chi_{j+1,k})},
\end{aligned}$$
and treat both sides with Hermitian conjugation, remembering that $Q^\dagger = Q^{-1}$ of course. Since
$$\tilde q \equiv \bar q^{-1} = e^{\frac{\pi i}{\tau}} = q^{(\frac1\tau)^2}, \qquad \tilde\chi_j \equiv \chi_j^\dagger = e^{-\frac{2\pi}{\sqrt\tau}\phi_j} = \chi_j^{\frac1\tau},$$
the resulting equation reads
$$\tilde\chi_{j,k+1}\tilde\chi_{j,k-1} = \frac{\tilde q^2\tilde\chi_{j-1,k}\,\tilde\chi_{j+1,k}}{(1+\tilde q\tilde\chi_{j-1,k})(1+\tilde q\tilde\chi_{j+1,k})}. \tag{10*}$$
It is plain to see that it is the same equation but with $q$ and the $\chi$’s replaced by $\tilde q$ and the $\tilde\chi$’s; this is called duality. Indeed, instead of conjugating things, we might explore the fact that those $\tilde\chi$’s satisfy the same relations
$$\tilde\chi_{2n\pm1}\tilde\chi_{2n} = \tilde q^2\tilde\chi_{2n}\tilde\chi_{2n\pm1}$$
as the original $\chi$’s do, except with $\tilde q$ instead of $q$. Moreover, the two sets commute with each other,
$$\chi_j\tilde\chi_i = \tilde\chi_i\chi_j.$$
The algebras they generate (call them $A_\tau$ and $A_{1/\tau}$) form two factors in the algebra $B$ generated by the $\phi$’s, and leave no free space, in the sense that $B = A_\tau\otimes A_{1/\tau}$, sort of. Either way, such a bisection is well served by the function $f$, which fittingly satisfies the dual functional equation
$$\frac{f(z+\frac12)}{f(z-\frac12)} = \frac{1}{1+e^{\frac{2\pi i}{\tau}z}} \tag{14*}$$
to complement the original one (14). Equation (10∗) can now be derived in exactly the same way as (10) was, and regarded as not just a conjugate clone of the latter but as an equal dual equation.

4.5. Baxter equation. It reads
$$t_\tau(\lambda)Q(\lambda) = Q(\lambda+\tfrac12) + (e^{4\pi i\tau\lambda}+1)^NQ(\lambda-\tfrac12),$$
where $t_\tau(\lambda)$ and $Q(\lambda)$ are two families of elements (of $A_\tau$ and $B$ respectively), which all commute with each other,
$$[t_\tau(\lambda),t_\tau(\mu)] = [Q(\lambda),t_\tau(\mu)] = [Q(\lambda),Q(\mu)] = 0.$$
In this paper we do not define those families explicitly and do not derive the equation. Let us only mention that the chiral evolution operator $Q$ is in fact $Q(\lambda)$ evaluated at a particular value of $\lambda$. The duality symmetry of our construction implies that there is another, dual equation of the form
$$t_{\frac1\tau}(\lambda)Q(\lambda) = Q(\lambda+\tfrac\tau2) + (e^{4\pi i\lambda}+1)^NQ(\lambda-\tfrac\tau2).$$
Thus, the Baxter equation gets associated with the modular lattice. We stop here and postpone the study of Baxter equations to the next paper of this series.
5. Conclusions and Comments

We have shown that an appropriate lattice regularization of the quantum Liouville model allows one to enter the “forbidden region” of the Virasoro central charge, $1\le c\le25$. The key to this is the use of a double family of dual quantum fields, which are not selfadjoint, but normal and adjoint to each other. The results are still rather modest, but they prove the point quite persuasively. The real problem is that of finding the spectrum of the model. One viable approach to this can be based on the Baxter equation. In the next paper of this series we plan to deal with this equation. The idea of its derivation originally goes back to Baxter himself [2]. In [8, 4] it was realized in the context of the chiral Potts model, and in [5–7] in the context of quantum conformal field theory. The dual Baxter equations in a different context were also considered recently by Smirnov [39].
In a parallel construction for the WZNW model [12] it was shown how to separate the zero mode problem, which reduces the problem of finding the spectrum of the primary states to a problem with a finite number of degrees of freedom. It is possible that a similar construction can be found here. If so, one will be able to bypass the Baxter equations in the investigation of the spectrum.
The Liouville model is known to be a contraction of the massive Sine–(Sinh–)Gordon model, to which most of our considerations are also applicable [20].
In a series of papers [5–7] the quantum KdV equation was considered without recourse to the lattice. As the Liouville and KdV models are close relatives, it is no wonder that one can see many similarities between those papers and our text. Moreover, the discrete variant of quantum KdV was already considered in [46]. We believe that using the lattice allows one to use the duality in full strength and, in particular, to go to the strong coupling region.
In papers [27, 28] arguments were raised for the exceptional values of the central charge $c = 7, 13, 19$. From our point of view, these values are distinguished only by giving our modular lattice supplementary symmetries ($\tau$ corresponds to elliptic fixed points of the modular figure: $e^{i\pi/3}$, $e^{i\pi/2}$, $e^{i2\pi/3}$). There is no reason for us to exclude other values of $\tau$ on the unit half-circle.
Finally, we cannot help stating that for us the function $f(z)$ defined in Eq. (13) seems to be the real cornerstone of the theory of quantum integrable models. It appears in basic objects of the dynamical theory: the evolution and the Baxter operators. Its close relative as a function of rapidity defines Zamolodchikov’s factorized S-matrices. It is also indispensable in the theory of form-factors [38]. We believe that the full content of duality, hidden in this function, is still not explored to the full extent. In the Appendix below we describe some of its remarkable properties.

6. Appendix: The Non-Compact Quantum Dilogarithm

Let complex $b$ have a nonzero real part, $\Re b\ne0$. The non-compact quantum dilogarithm (QDL) $e_b(z)$, $z\in\mathbb C$, $|\Im z|<|\Im c_b|$, $c_b\equiv i(b+b^{-1})/2$, is defined by the formula
$$e_b(z) \equiv \exp\left(\frac14\int_{-\infty}^{+\infty}\frac{e^{-2izx}}{\sinh(xb)\sinh(x/b)}\,\frac{dx}{x}\right), \tag{15}$$
where the singularity at $x=0$ is put below the contour of integration. This definition implies that $e_b(z)$ is unchanged under the substitutions $b\to b^{-1}$, $b\to-b$. Using this symmetry, we choose $b$ to lie in the first quadrant of the complex plane, namely
$$\Re b>0, \qquad \Im b\ge0,$$
which implies that
$$\Im\tau\ge0, \qquad \tau\equiv b^2.$$
This function has the following properties.

6.1. Functional relations. Function (15) satisfies the “inversion” relation
$$e_b(z)\,e_b(-z) = e^{i\pi z^2-i\pi(1+2c_b^2)/6}, \tag{16}$$
and a pair of functional equations
$$e_b(z-ib^{\pm1}/2) = \big(1+e^{2\pi zb^{\pm1}}\big)\,e_b(z+ib^{\pm1}/2). \tag{17}$$
The latter equations enable us to extend the definition of the QDL to the entire complex plane. When $b$ is real or a pure phase, the function $e_b(z)$ is unitary in the sense that
$$\overline{e_b(z)} = 1/e_b(\bar z). \tag{18}$$
If selfadjoint operators $\mathsf P$ and $\mathsf X$ in $L^2(\mathbb R)$ satisfy the Heisenberg commutation relation
$$[\mathsf P,\mathsf X] = \frac{1}{2\pi i}, \tag{19}$$
the following operator five-term identity holds:
$$e_b(\mathsf P)\,e_b(\mathsf X) = e_b(\mathsf X)\,e_b(\mathsf P+\mathsf X)\,e_b(\mathsf P). \tag{20}$$
For real $b$ this can be proved in the $C^*$-algebraic framework¹. We will prove it for complex $b$. The case of real $b$ then will follow by continuity.
¹ S.L. Woronowicz: private communication, 1998.

6.2. Analytic properties. We can perform the integration in (15) by the residue method. The result can be written as a ratio of two $q$-exponentials,
$$e_b(z) = \big(e^{2\pi(z+c_b)b};\,q^2\big)_\infty\big/\big(e^{2\pi(z-c_b)b^{-1}};\,\bar q^2\big)_\infty, \tag{21}$$
where
$$q = e^{i\pi b^2}, \qquad \bar q = e^{-i\pi b^{-2}},$$
thus reproducing definition (13) in the text. Formula (21) defines a meromorphic function on the entire complex plane, satisfying functional equations (16) and (17), with an essential singularity at infinity. So, it is the analytic continuation of definition (15) to the entire complex plane. It is easy to read off the location of its poles and zeroes:
$$\text{zeroes of } (e_b(z))^{\pm1} = \{\mp(c_b+mib+nib^{-1}) : m,n\in\mathbb Z_{\ge0}\}.$$
The behavior at infinity depends on the direction along which the limit is taken:
$$e_b(z)\Big|_{|z|\to\infty} \approx \begin{cases}
1 & |\arg z| > \frac\pi2+\arg b;\\[2pt]
e^{i\pi z^2-i\pi(1+2c_b^2)/6} & |\arg z| < \frac\pi2-\arg b;\\[2pt]
(\bar q^2;\bar q^2)_\infty/\Theta(ib^{-1}z;-b^{-2}) & |\arg z-\pi/2| < \arg b;\\[2pt]
\Theta(ibz;b^2)/(q^2;q^2)_\infty & |\arg z+\pi/2| < \arg b,
\end{cases} \tag{22}$$
where
$$\Theta(z;\tau) \equiv \sum_{n\in\mathbb Z}e^{i\pi\tau n^2+2\pi inz}, \qquad \Im\tau>0.$$
Thus, for complex $b$, double quasi-periodic $\theta$-functions, generators of the field of meromorphic functions on complex tori, describe the asymptotic behavior of the non-compact QDL.
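The product formula (21) lends itself to a direct numerical test of the functional equations (17) and of the unitarity property (18). The following is a rough sketch under stated assumptions: the infinite products are truncated, $b$ is taken with $\Im b^2>0$ so that both products converge, and the function names are hypothetical.

```python
import cmath

def qpochhammer(x, q, terms=80):
    """Truncated (x; q)_infinity = prod_{p>=0} (1 - x q^p), for |q| < 1."""
    result = 1 + 0j
    q_power = 1 + 0j
    for _ in range(terms):
        result *= 1 - x * q_power
        q_power *= q
    return result

def c_b(b):
    return 0.5j * (b + 1 / b)

def e_b(z, b, terms=80):
    """Product formula (21); needs Im(b^2) > 0 so that |q| < 1 and |q-bar| < 1."""
    q_sq = cmath.exp(2j * cmath.pi * b * b)           # q^2
    qbar_sq = cmath.exp(-2j * cmath.pi / (b * b))     # q-bar^2
    numerator = qpochhammer(cmath.exp(2 * cmath.pi * (z + c_b(b)) * b), q_sq, terms)
    denominator = qpochhammer(cmath.exp(2 * cmath.pi * (z - c_b(b)) / b), qbar_sq, terms)
    return numerator / denominator

def functional_equation_defect(z, b, sign):
    """Relative mismatch in Eq. (17) for the exponent sign = +1 or -1."""
    b_pm = b if sign == 1 else 1 / b
    lhs = e_b(z - 0.5j * b_pm, b)
    rhs = (1 + cmath.exp(2 * cmath.pi * z * b_pm)) * e_b(z + 0.5j * b_pm, b)
    return abs(lhs - rhs) / abs(rhs)

def unitarity_defect(z, b):
    """|e_b(z) * conj(e_b(conj z)) - 1|, which Eq. (18) says vanishes for |b| = 1."""
    z = complex(z)
    return abs(e_b(z, b) * e_b(z.conjugate(), b).conjugate() - 1)
```

With, say, `b = cmath.exp(0.2j * cmath.pi)` (a pure phase, so $\tau = b^2$ lies on the unit circle) and `z = 0.3 + 0.1j`, all three defects should be at the level of rounding errors. The case of real $b$ is outside the reach of this sketch, since then $|q| = 1$ and the products do not converge.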
6.3. Integral Ramanujan identity. Consider the following Fourier integral:
$$\Psi(u,v,w) \equiv \int_{\mathbb R}\frac{e_b(x+u)}{e_b(x+v)}\,e^{2\pi iwx}\,dx, \tag{23}$$
where
$$\Im(v+c_b)>0, \qquad \Im(-u+c_b)>0, \qquad \Im(v-u)<\Im w<0. \tag{24}$$
Restrictions (24) actually can be considerably relaxed by deforming the integration path in the complex $x$ plane, keeping the asymptotic directions of the two ends within the sectors $\pm(|\arg x|-\pi/2)>\arg b$. So, the domain enlarged in this way for the variables $u$, $v$, $w$ has the form
$$|\arg(iz)| < \pi-\arg b, \qquad z\in\{w,\;v-u-w,\;u-v-2c_b\}. \tag{25}$$
Regarding $e_b(z)$ as a “non-compact” analogue of the $q$-exponent $(x;q)_\infty$, definition (23) can be interpreted as the corresponding integral counterpart of the Ramanujan sum
$${}_1\psi_1(x,y,z) \equiv \sum_{n\in\mathbb Z}\frac{(x;q)_n}{(y;q)_n}\,z^n.$$
The latter is known to be evaluated explicitly, the result being the famous Ramanujan summation formula:
$${}_1\psi_1(x,y,z) = \frac{(q;q)_\infty\,(y/x;q)_\infty\,(xz;q)_\infty\,(q/xz;q)_\infty}{(y;q)_\infty\,(q/x;q)_\infty\,(z;q)_\infty\,(y/xz;q)_\infty}.$$
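The Ramanujan summation formula is easily confirmed numerically inside its region of convergence $|y/x|<|z|<1$. The sketch below is illustrative only: the parameter values in the usage note are arbitrary, the series is summed term by term through the ratio of consecutive terms (which also covers negative $n$), and the function names are hypothetical.

```python
def qpochhammer(x, q, terms=200):
    """Truncated (x; q)_infinity = prod_{p>=0} (1 - x q^p), for |q| < 1."""
    result = 1.0
    q_power = 1.0
    for _ in range(terms):
        result *= 1 - x * q_power
        q_power *= q
    return result

def ramanujan_sum(x, y, z, q, cutoff=200):
    """Bilateral series sum_n (x;q)_n / (y;q)_n z^n, truncated to |n| <= cutoff."""
    total = 1.0                      # the n = 0 term
    term = 1.0
    for n in range(cutoff):          # step from n to n + 1
        term *= z * (1 - x * q**n) / (1 - y * q**n)
        total += term
    term = 1.0
    for n in range(0, -cutoff, -1):  # step from n to n - 1
        term *= (1 - y * q**(n - 1)) / (z * (1 - x * q**(n - 1)))
        total += term
    return total

def ramanujan_product(x, y, z, q):
    """Right-hand side of the Ramanujan summation formula."""
    numerator = (qpochhammer(q, q) * qpochhammer(y / x, q)
                 * qpochhammer(x * z, q) * qpochhammer(q / (x * z), q))
    denominator = (qpochhammer(y, q) * qpochhammer(q / x, q)
                   * qpochhammer(z, q) * qpochhammer(y / (x * z), q))
    return numerator / denominator
```

For example, with $x=0.5$, $y=0.2$, $z=0.7$ and $q=0.3$ the two functions should agree to rounding accuracy. The terms with negative $n$ decay like $(y/xz)^{|n|}$, which is where the lower bound on $|z|$ comes from.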
Remarkably, integral (23) can be evaluated explicitly as well. Indeed, using the residue method, we easily come to the following result:
$$\Psi(u,v,w) = \frac{e_b(u-v-c_b)\,e_b(w+c_b)}{e_b(u-v+w-c_b)}\,e^{-2\pi iw(v+c_b)+i\pi(1-4c_b^2)/12} \tag{26}$$
$$\phantom{\Psi(u,v,w)} = \frac{e_b(v-u-w+c_b)}{e_b(v-u+c_b)\,e_b(-w-c_b)}\,e^{-2\pi iw(u-c_b)-i\pi(1-4c_b^2)/12}, \tag{27}$$
where the two expressions on the right-hand side are related to each other through the inversion relation (16). The similarity of this result with the Ramanujan sum becomes very transparent if we rewrite the latter in the form
$$\sum_{n\in\mathbb Z}\frac{(yq^n;q)_\infty}{(xq^n;q)_\infty}\,z^n = (q;q)_\infty^2\,\frac{(y/x;q)_\infty\,(q/z;q)_\infty\,\theta_q(xz)}{(y/xz;q)_\infty\,\theta_q(x)\,\theta_q(z)}, \tag{28}$$
where the $\theta_q$-function is defined by
$$\theta_q(x) \equiv (q;q)_\infty\,(x;q)_\infty\,(q/x;q)_\infty = \sum_{n\in\mathbb Z}q^{n(n-1)/2}(-x)^n. \tag{29}$$
Comparing the inversion relation (16) with Eq. (29), we conclude that the non-compact analogue of the $\theta$-function is the Gaussian exponent, and the structures of Eqs. (28) and (26) are now quite similar.

6.4. Fourier transformation of the QDL. Certain specializations of $\Psi(u,v,w)$ lead to the following Fourier transformation formulas for the QDL:
$$\begin{aligned}
\phi_+(w) &\equiv \int_{\mathbb R}e_b(x)\,e^{2\pi iwx}\,dx = \Psi(0,v,w)\big|_{v\to-\infty}\\
&= e^{2\pi iwc_b-i\pi(1-4c_b^2)/12}\big/e_b(-w-c_b) = e^{-i\pi w^2+i\pi(1-4c_b^2)/12}\,e_b(w+c_b),
\end{aligned} \tag{30}$$
and
$$\begin{aligned}
\phi_-(w) &\equiv \int_{\mathbb R}(e_b(x))^{-1}e^{2\pi iwx}\,dx = \Psi(u,0,w)\big|_{u\to-\infty}\\
&= e^{-2\pi iwc_b+i\pi(1-4c_b^2)/12}\,e_b(w+c_b) = e^{i\pi w^2-i\pi(1-4c_b^2)/12}\big/e_b(-w-c_b).
\end{aligned} \tag{31}$$
The corresponding inverse transformations read
$$(e_b(x))^{\pm1} = \int_{\mathbb R}\phi_\pm(y)\,e^{-2\pi ixy}\,dy, \tag{32}$$
where the pole at $y=0$ is surrounded from below.
6.5. Proof of the Pentagon identity. Using formula (32) and commutation relation (19), we equate the coefficients of the operator terms $e^{-2\pi i x X} e^{-2\pi i y P}$ in the pentagon relation (20), the result being an integral identity:
$$ \phi_+(x)\,\phi_+(y)\,e^{2\pi i x y} = \int_{\mathbb{R}} \phi_+(z)\,\phi_+(x-z)\,\phi_+(y-z)\,e^{i\pi z^2}\,dz, $$
where the singularities at $z = x$, $z = y$ are put below, and the one at $z = 0$ above, the integration path. Now, multiplying both sides of this identity by $\exp(-2\pi i y u)$, integrating over $y$, and using (32), we obtain
$$ \phi_+(x)\,e_b(u-x) = e_b(u) \int_{\mathbb{R}} \phi_+(z)\,\phi_+(x-z)\,e^{i\pi z^2 - 2\pi i u z}\,dz. $$
Using (30), we rewrite it equivalently as
$$ \frac{e_b(u-x)}{e_b(-x-c_b)\,e_b(u)}\; e^{-i\pi(1-4c_b^2)/12} = \int_{\mathbb{R}} \frac{e_b(z+c_b)}{e_b(z-x-c_b)}\; e^{-2\pi i z(u+c_b)}\,dz = \Psi(c_b,\,-x-c_b,\,-u-c_b), \tag{33} $$
which is a particular case of (27).

Acknowledgement. This paper is partly supported by RFBR grant 99-01-00101, grant INTAS 99-01705, and the Finnish Academy. A.V. wishes to acknowledge the financial support extended by the DWTC office of the Belgian government through the IUAP project P4/08. A.V. and L.D.F. are also grateful to ESI, Vienna for hospitality during October 1999.
References 1. Babelon, O.: Exchange formula and lattice deformation of the Virasoro algebra. Phys. Lett. B238, 234–238 (1990) 2. Baxter, R.J.: Partition function of the eight-vertex model. Ann. Phys. 70, 193–228 (1972) 3. Baxter, R.J.: Exacly solved models in statistical mechanics. NY: Academic Press, 1982 4. Baxter, R.J., Bazhanov, V.V., Perk, J.H.H.: Int. J. Mod. Phys. B 4, 803–870 (1990) 5. Bazhanov, V.V., Lukyanov, S.L., Zamolodchikov, A.B.: Integrable structure of conformal field theory, quantum KdV theory and thermodynamic Bethe ansatz. Commun. Math. Phys. 177, 381 (1996), hepth/9412229 6. Bazhanov, V.V., Lukyanov, S.L., Zamolodchikov, A.B.: Integrable Structure of Conformal Field Theory II: Q-operator and DDV equation. Commun. Math. Phys. 190, 247 (1997), hep-th/9604044 7. Bazhanov, V.V., Lukyanov, S.L., Zamolodchikov, A.B.: Integrable structure of conformal field theory. III: The Yang-Baxter relation, Commun. Math. Phys. 200, 297 (1999), hep-th/9805008 8. Bazhanov, V.V., Stroganov, Yu.G.: Chiral Potts model as a descendant of the six-vertex model. J. Stat. Phys. 59, (1990) 799–817 9. Belavin, A.A., Polyakov, A.M., Zamolodchikov, A.B.: Infinite conformal symmetry in two-dimensional quantum field theory. Nucl. Phys. B 241, 333–380 (1984) 10. Curtright, T.L., Thorn, C.B.: Conformally invariant quantization of the Liouville theory. Phys. Rev. Lett. 48, 1309–1313 (1982) 11. D’Hoker, E., Jackiw, R.: Liouville field theory. Phys. Rev. D 26, 3517 (1982) 12. Faddeev, L.D.: Quantum symmetry in conformal field theory by Hamiltonian methods. Cargèse 1991, In: New Symmetry principles in quantum field theory, J. Fröhlich et al. eds, NY: Press Plenum 1992, pp. 159–175
218
L. D. Faddeev, R. M. Kashaev, A. Yu. Volkov
13. Faddeev, L.D.: Discrete Heisenberg–Weyl group and modular group. Lett. Math. Phys. 34, 249–254 (1995), hep-th/9504111 14. Faddeev, L.D.: How algebraic Bethe Ansatz works for integrable models. Proc. of Les Houches Summer School, Session LXIV, NATO ASI, Amstedam: Elsevier 1998, pp. 149–220 15. Faddeev, L.D.: Modular double of quantum group. Preprint math.QA/9912078 16. Faddeev, L.D., Takhtajan, L.A.: Liouville model on the lattice. In: Field Theory, Quantum Gravity and Strings, Proceedings, Meudon and Paris VI, France 1984/85, Lecture Notes in Physics 246, eds. H.J. de Vega and N. Sanchez, Berlin: Springer, 1986 17. Faddeev, L.D., Takhtajan, L.A.: Hamiltonian methods in the theory of solitons. Moscow: Nauka 1986 (in Russian). English transl.: Berlin–Heidelberg, Springer–Verlag, 1987 18. Faddeev, L.D., Tirkkonen, O.: Connections of the Liouville model and an XXZ spin chain. Nucl. Phys. B 453 [FS], 647–669 (1995), hep-th/9506023 19. Faddeev, L. D., Volkov, A. Yu.: Abelian current algebra and the Virasoro algebra on the lattice. Phys. Lett. B 315, 311–318 (1993) 20. Faddeev, L. D., Volkov, A. Yu.: Hirota equation as an example of integrable symplectic map. Lett. Math. Phys 32, 125–136 (1994) 21. Faddeev, L.D., Volkov, A.Yu.: Algebraic Quantization of Integrable Models in Discrete Space-Time, In: Discrete Integrable Geometry and Physics, Oxford Lecture Series in Mathematics and its Applications 16, Oxford: Clarendon Press, 1999, pp. 301–320, hep-th/97010039 22. Friedan, D., Qiu, Z., Shenker, S.: Conformal invariance, unitarity, and critical exponents in two dimensions. Phys. Rev. Lett. 52, 1575–1578 (1984) 23. Gervais, J.-L.: Infinite family of polynomial functions of the Virasoro generators with vanishing Poisson brackets. Phys. Lett. B 160, 277–278 (1985) 24. Gervais, J.-L.: Solving the strongly coupled 2d gravity: 1. Unitary truncation and quantum group structure. Commun. Math. Phys. 138, 301–338 (1991) 25. 
Gervais, J.-L., Neveu, A.: The dual string spectrum in Polyakov’s quantization (I). Nucl. Phys. B 199, 59–76 (1982) 26. Gervais, J.-L., Neveu, A.: Non-standard 2d critical statistical models from Liouville theory. Nucl. Phys. B257 [FS14], 59–76 (1985) 27. Gervais, J.-L., Neveu, A.: Locality in strong coupling Liouville field theory and string models for dimensions 7, 13 and 19. Phys. Lett. B151, 271–274 (1985) 28. Gervais, J.-L., Roussel, J.-F.: Solving the strongly coupled 2d gravity (II). Fractional-spin operators and topological three-point functions. Nucl. Phys. B 426, 140–186 (1994) 29. Izergin, A.G., Korepin, V.E.: Lattice versions of quantum field theory models in two dimensions. Nucl. Phys. B 205, 410–413 (1982) 30. Jackiw, R.: Liouville field theory: A two dimensional model for gravity? In: Quantum theory of gravity, Christensen, S. (ed.), Bristol: Adam Hilger, 1983, pp. 403–420 31. Jorjadze, G.P., Pogrebkov, A.K., Polivanov, M.C., Talalov S.V.: Liouville field theory: IST and Poisson bracket structure. J. Phys. A, 19, 121–139 (1986) 32. Kashaev, R.M.: Liouville central charge in quantum Teichmüller theory. Proc. Steklov Inst. Math. 226, 63–71 (1999), hep-th/9811203 d 2 log λ
33. Liouville, J.: Sur l’ équation aux différences partielles dudv ± λ2 = 0. Comptes rendus de l’Académie 2a des Sciences, Tome XXXVI. Séance du 28 février 1853, Paris 34. Magri, F.: A simple model of the integrable Hamiltonian equation. J. Math. Phys. 19, 1156–1162 (1978) 35. Poincaré, A.: Les fonctions fuchsiennes et l’équation "u = eu . J. Math. Pures et Appl. 4, 137–230 (1898) 36. Polyakov, A.M.: Quantum geometry of Bosonic strings. Phys. Lett. B 103, 207–210 (1981) 37. Schützenberger, M.P.: Une interprétation de certaines solution de l’equation functionelle. C. R. Acad. Sci. Paris 236, 352–353 (1953) 38. Smirnov, F.A.: Form factors in completely integrable models of quantum field theory. Singapore: World Scientific, 1992 39. Smirnov, F.A.: Dual Baxter equations and quantization of Affine Jacobian. Preprint LPTHE-00-04, mathph/0001032 40. Takhtajan, L.: Liouville theory: Quantum geometry of Riemann surfaces. Mod. Phys. Lett.A 8, 3529–3535 (1993) 41. Takhtajan, L.: Topics in the quantum geometry of Riemann surfaces: Two-dimensional quantum gravity. In: Int. School of Physics “Enrico Fermi”, Course CXXVII, L. Castellani and J. Wess (Eds.) IOS Press, 1996, pp. 541–579 42. Takhtajan, L.: Equivalence of geometric h < 1/2 and standard c > 25 approaches to two-dimensional qunatum gravity. Modern Phys. Lett. A 11, 93–101 (1996) 43. Teschner, J., Ponsot, B.: Liouville bootstrap via harmonic analysis an a noncompact quantum group. Preprint hep-th/9911110 44. Volkov, A. Yu.: Quantum Volterra Model. Phys. Lett. A 167, 345 (1992)
Strongly Coupled Quantum Discrete Liouville Theory
45. 46. 47. 48.
219
Volkov, A. Yu.: q-combinatorics and quantum integrability. Preprint q-alg/9702007 Volkov, A. Yu.: Quantum lattice KdV equation. Lett. Math. Phys. 39, 313–329 (1997) Volkov, A. Yu.: Beyond the “Pentagon Identity”. Lett. Math. Phys. 39, 393–397 (1997) Zacharov, V.E., Manakov, S.V., Novikov, S.P., Pitayevsky, L.P.: Theory of solitons. The inverse problem method. Moscow: Nauka, 1980 (in Russian). English transl.: New York: Plenum, 1984
Communicated by A. Jaffe, G. Mack and W. Zimmermann
Commun. Math. Phys. 219, 221 – 245 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Scheme Independence of the Reduction Principle and Asymptotic Freedom in Several Couplings

Wolfhart Zimmermann

Max-Planck-Institut für Physik, Föhringer Ring 6, 80805 München, Germany

Received: 14 June 2000 / Accepted: 28 June 2000
Dedicated to the memory of Harry Lehmann

Abstract: It is proved that reduction in the number of coupling and mass parameters is a scheme-independent concept. This result justifies the use of special renormalization schemes suitable for applications of the reduction method. Scheme-changing transformations are discussed with the aim of removing gauge and mass parameters from the reduction equations. Necessary and sufficient conditions for asymptotic freedom in models with several couplings are stated.

1. Introduction

The method of reducing the number of couplings was originally proposed for renormalizable models of quantum field theory with dimensionless couplings $\lambda_0, \lambda_1, \dots, \lambda_n$ and a normalization mass $\kappa$ as the only parameters [1]. Since the reduction method is exclusively based on the form of the β functions, it may as well be applied to other models in formulations for which the β functions are massless and independent of gauge parameters. To this end the Landau gauge is used for gauge theories, together with a scheme of renormalization, like dimensional renormalization, in which the β functions are mass independent [2, 3]. Then the β functions depend on the dimensionless couplings only,
$$ \beta_j = \beta_j(\lambda_0, \lambda_1, \dots, \lambda_n), \qquad j = 0, 1, \dots, n. \tag{1.1} $$
By the principle of reduction all couplings $\lambda_j$ are required to be functions of a single one, denoted by $\lambda_0$,
$$ \lambda_j = \lambda_j(\lambda_0) \qquad (j = 1, \dots, n), \tag{1.2} $$
in a way which is compatible with invariance under the renormalization group [4–6]. Substituting the functions (1.2) for the couplings $\lambda_j$ of the original model one obtains a formulation of a reduced model involving a single coupling parameter $\lambda_0$ only. As a consequence of the renormalization group invariance of the original and of the reduced model as well, one finds a system of ordinary differential equations
$$ \beta_0\, \frac{d\lambda_j}{d\lambda_0} = \beta_j \tag{1.3} $$
to be satisfied by the functions (1.2). For the solutions to be meaningful it is required that all couplings simultaneously vanish in the weak coupling limit,
$$ \lambda_j \to 0 \quad \text{for} \quad \lambda_0 \to 0. \tag{1.4} $$
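To make the reduction equations (1.3) with condition (1.4) concrete, here is a minimal sketch for a toy model with two couplings. The quadratic β functions and all coefficient values are assumptions invented for this illustration and are not taken from any model in the paper. With the linear ansatz $\lambda_1 = \rho\,\lambda_0$, Eq. (1.3) turns into a quadratic equation for the slope $\rho$.

```python
import math

# Toy lowest-order beta functions (illustrative assumptions only):
#   beta_0 = -b0 * lam0**2
#   beta_1 = c11*lam1**2 + c10*lam0*lam1 + c00*lam0**2
b0, c11, c10, c00 = 1.0, 1.0, -3.0, 0.75

def beta0(lam0, lam1):
    return -b0 * lam0 ** 2

def beta1(lam0, lam1):
    return c11 * lam1 ** 2 + c10 * lam0 * lam1 + c00 * lam0 ** 2

def reduction_slopes():
    """Real solutions rho of beta_0 * rho = beta_1 for lam1 = rho * lam0,
    i.e. of c11*rho**2 + (c10 + b0)*rho + c00 = 0."""
    A, B, C = c11, c10 + b0, c00
    disc = B * B - 4.0 * A * C
    if disc < 0:
        return []                      # no real reduction solution of this form
    root = math.sqrt(disc)
    return sorted([(-B - root) / (2.0 * A), (-B + root) / (2.0 * A)])

def residual(rho, lam0):
    """beta_0 * d(lam1)/d(lam0) - beta_1 evaluated on lam1 = rho * lam0."""
    lam1 = rho * lam0
    return beta0(lam0, lam1) * rho - beta1(lam0, lam1)

slopes = reduction_slopes()
print(slopes)
print([residual(rho, 0.1) for rho in slopes])
```

Each real root gives a line $\lambda_1 = \rho\,\lambda_0$ through the origin, so condition (1.4) holds automatically in this toy setting.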
In many cases it is natural to impose further that all couplings allow for power series expansions with respect to a suitably selected primary coupling $\lambda_0$,
$$ \lambda_j = \sum_{l} c_{jl}\, \lambda_0^{\,l}. \tag{1.5} $$
In this case the correlation functions of the reduced model have formal expansions with respect to powers of $\lambda_0$, thus resembling a renormalizable theory with a single coupling $\lambda_0$. For some applications it is useful to consider partial reductions, where several parameters remain independent. It may also be of interest to require – instead of (1.4) – that all couplings simultaneously approach a non-trivial zero of the β functions. Coupling relations (1.2) which follow from the invariance of a model under a symmetry group satisfy the conditions (1.3)–(1.5) provided the symmetry can be implemented to all orders of perturbation theory. The reduction method may thus be considered as a generalization of this particular aspect of symmetry.¹

¹ For reviews see, for instance, refs. [7–14]. Refs. [15–19] contain earlier work related to the reduction of couplings.

The reduction method was extended by Piguet and Sibold to formulations of models with β functions depending on mass and gauge parameters [20]. In that case the reduction equations become a system of partial differential equations including derivatives with respect to the normalization mass and gauge parameters. Due to these partial derivatives it is difficult to study the solutions of the reduction equations in the general case. However, Piguet and Sibold found the remarkable result that on the basis of the Callan–Symanzik equations [21, 22] the reduction equations have the form of ordinary differential equations with parametric dependence on the mass and gauge parameters. Since in general the renormalization group equations [23] and the Callan–Symanzik equations are independent, the question of consistency between the two types of reduction equations comes up. For solutions which are uniquely determined power series in the primary coupling Piguet and Sibold proved the consistency. For general solutions the issue is more involved. But transforming to a scheme with massless β functions, for which renormalization group equations and Callan–Symanzik equations coincide, should furnish a resolution of this problem in general.

Another important development concerns the combined reduction of couplings and masses in supersymmetric grand unified theories [24]. In this work Kubo, Mondragón and Zoupanos reduced the coefficients of the soft supersymmetry breaking terms in order to minimize the number of independent parameters. The scheme of dimensional renormalization was used with mass parameters introduced similarly to couplings. Then the differential equations of the renormalization group also involve derivatives with respect to the masses. It is characteristic for dimensional renormalization that those β functions which carry a dimension are linear or quadratic forms in the dimensional
couplings and masses, while the coefficients of these polynomials depend on the dimensionless couplings only. Since in this approach the mass parameters enter similarly to the couplings, masses are included with the couplings in the reduction process. In this way Kubo, Mondragón and Zoupanos obtained non-trivial constraints on the soft supersymmetry breaking terms which are compatible with renormalization and lead to surprisingly simple sum rules [25].

In the present paper it will be proved that the principle of reduction is invariant under transformations of couplings and masses which change the scheme of renormalization.² This scheme independence justifies the use of special schemes of renormalization chosen such that the β functions take a particularly simple form. The proof includes the case of couplings with the dimension of mass and variable masses treated similarly to couplings (Sect. 2).

In Sect. 3 methods of eliminating gauge and mass parameters are discussed. For a comprehensive treatment we refer to the work of Breitenlohner and Maison [27]. For the purpose of the reduction method an alternative approach is proposed which is exclusively based on the differential equations of the renormalization group. In models with dimensionless couplings and pole masses, transformations are constructed which lead to a scheme of renormalization with massless β functions. The proof is based on formal expansions with respect to powers of the coupling and uses the assumption that the massless limit of the β functions exists and is approached smoothly. The formulation obtained should be equivalent to the scheme of dimensional renormalization with appropriate normalization conditions. The generalization to models which also involve dimensional couplings and variable mass parameters is only sketched. In this case mass parameters cannot be eliminated completely from the β functions. Instead a polynomial dependence on dimensional couplings and masses remains.
The final form of the reduction equations is in agreement with ref. [24].

A different interpretation of the reduction method is provided by the evolution equations [28]. A systematic discussion of the effective couplings in this respect is given in Sect. 4 for models with dimensionless couplings and pole masses. It is shown how in the reduced model the effective couplings are expressed as functionals of the primary coupling. An evolution equation for the primary coupling alone is derived. Again, particularly simple results are obtained if a scheme of renormalization is used with massless β functions, as is justified by scheme independence. Then the reduction equations follow in the form
$$ \bar\beta_0\, \frac{d\bar\lambda_j}{d\bar\lambda_0} = \bar\beta_j \quad (j = 1, \dots, n), \qquad \bar\beta_j = \beta_j(\bar\lambda_0, \bar\lambda_1, \dots, \bar\lambda_n), \tag{1.6} $$
for the effective couplings $\bar\lambda_j$ by eliminating the momentum variable $|k|$ in the evolution equations. Corresponding to (1.4) the condition
$$ \bar\lambda_j \to 0 \quad \text{for} \quad \bar\lambda_0 \to 0 \tag{1.7} $$
is imposed. In the case of
$$ \bar\lambda_0 \to 0 \quad \text{for} \quad |k| \to \infty \tag{1.8} $$
the property of asymptotic freedom holds [29, 30]: all couplings vanish simultaneously in the high momentum limit,
$$ \bar\lambda_0 \to 0, \; \dots, \; \bar\lambda_n \to 0 \quad \text{for} \quad |k| \to \infty. \tag{1.9} $$
Equation (1.8) is implied by the evolution equation for $\bar\lambda_0$, if $\bar\beta_0$ has the appropriate sign for $\bar\lambda_0 \to 0$. For example, $\bar\beta_0$ should be negative in the case of $\lambda_0 > 0$ in the model considered. In this way necessary and sufficient conditions for asymptotic freedom in several couplings follow.³

² A preliminary report on this work was given in ref. [26].
2. Scheme Independence

We consider models of local quantum field theory with renormalizable interactions involving several coupling and mass parameters. Apart from dimensionless coupling parameters and a normalization mass we allow for the possibility of intrinsic masses, coupling parameters of dimension mass and gauge parameters, should gauge fields be present. For the intrinsic masses either pole masses are used, defined by the lowest propagator singularities, or variable masses suitably defined by propagators at the normalization point. For implementing the concept of reduction some of the parameters are selected as independent variables, with the other parameters depending on them. Usually one single parameter is chosen as independent variable. There are interesting applications, however, where a partial reduction with several independent parameters is useful, see ref. [24], for instance. For this reason the case of partial reduction is included. The following is a list of all parameters involved:

– dimensionless couplings $g^0_{01}, \dots, g^0_{0A}$, $g^1_{01}, \dots, g^1_{0E}$;
– couplings of dimension mass $g^0_{11}, \dots, g^0_{1B}$, $g^1_{11}, \dots, g^1_{1F}$;
– variable masses $g^0_{21}, \dots, g^0_{2C}$, $g^1_{21}, \dots, g^1_{2G}$;
– variable mass squares $g^0_{31}, \dots, g^0_{3D}$, $g^1_{31}, \dots, g^1_{3H}$;
– pole masses $m_1, \dots, m_I$;
– gauge parameters $\alpha_1, \dots, \alpha_J$;
– normalization mass $\kappa$.
(2.1)
g 1 = r(g 0 , m, α, κ 2 )
(2.2)
or
in vector notation 0 0 1 1 , . . . , g3D ), g 1 = (g01 , . . . , g3H ), r = (r01 , . . . , r3H ), g 0 = (g01 m = (m1 , . . . , mI ), α = (α1 , . . . , αJ ).
(2.3)
The distinction between linear and quadratic mass parameters is a matter of convenience relevant for the massless limit. For the time ordered correlation functions
$$ \tau = \tau(k, g^0, g^1, m, \alpha, \kappa^2) \tag{2.4} $$
($k$ denotes the vector of momentum variables) the partial differential equations of the renormalization group are
$$ \kappa^2 \frac{\partial \tau}{\partial \kappa^2} + \beta^l_{ij} \frac{\partial \tau}{\partial g^l_{ij}} + \delta_j \frac{\partial \tau}{\partial \alpha_j} + \gamma_j \tau = 0. \tag{2.5} $$

³ For reduced models with asymptotic freedom see refs. [15–18, 31–34]; reviews are given in refs. [8, 10].
In the original model all variables $g^l_{ij}$ of the correlation functions are independent. By substituting the functions (2.1) for the variables $g^1_{ij}$ in (2.4) the number of independent parameters is decreased. The correlation functions thus obtained,
$$ \tau' = \tau'(k, g^0, m, \alpha, \kappa^2) = \tau(k, g^0, r(g^0, m, \alpha, \kappa^2), m, \alpha, \kappa^2), \tag{2.6} $$
define a new model which is called a reduced model with the reducing functions (2.1). By the reduction principle the reduced model is again invariant under the renormalization group. This means that the correlation functions (2.6) should also satisfy partial differential equations of the form
$$ \kappa^2 \frac{\partial \tau'}{\partial \kappa^2} + \beta'^{\,0}_{ij} \frac{\partial \tau'}{\partial g^0_{ij}} + \delta'_j \frac{\partial \tau'}{\partial \alpha_j} + \gamma'_j \tau' = 0. \tag{2.7} $$
Comparing (2.5) with (2.7) we obtain
$$ \beta'^{\,0}_{ij} = \beta^0_{ij}\big|_{g^1 = r}, \qquad \delta'_j = \delta_j\big|_{g^1 = r}, \qquad \gamma'_j = \gamma_j\big|_{g^1 = r}, \tag{2.8} $$
with the prime indicating that the functions (2.1) should be inserted for the variables $g^1_{ij}$. For the reducing functions (2.1) the partial differential equations
$$ \kappa^2 \frac{\partial r_{st}}{\partial \kappa^2} + \beta'^{\,0}_{ij} \frac{\partial r_{st}}{\partial g^0_{ij}} + \delta'_j \frac{\partial r_{st}}{\partial \alpha_j} = \beta'^{\,1}_{st} \tag{2.9} $$
follow. The reduction principle requires further that the couplings vanish simultaneously in the weak coupling limit $g^0 \to 0$,
$$ r_{0t} = 0, \quad r_{1u} = 0 \quad \text{at} \quad g^0_{0j} = 0, \quad g^0_{1l} = 0. \tag{2.10} $$
A considerably stronger restriction may be imposed on the reducing functions by demanding that – in addition to (2.10) – formal expansions of the dependent couplings $r_{0t}$, $r_{1u}$, and of the masses $r_{2u}$, $r_{3w}$ as well, exist with respect to the independent couplings $g^0_{0j}$, $g^0_{1l}$. In that case the correlation functions can also be expanded with respect to the independent couplings so that the reduced system resembles a renormalizable model.

If the scheme of renormalization is changed, the couplings and variable masses are transformed like
$$ G^l_{ij} = \Phi^l_{ij}(g^0_{01}, \dots, g^1_{3H}, m, \alpha, \kappa^2) \tag{2.11} $$
or $G^l = \Phi^l(g^0, g^1, m, \alpha, \kappa^2)$ in vector form. Here $G^l$ and $\Phi^l$ denote the vectors
$$ G^l = (G^0_{01}, \dots, G^1_{2F}), \qquad \Phi^l = (\Phi^0_{01}, \dots, \Phi^1_{2F}). \tag{2.12} $$
These transformations can be expanded with respect to powers of the couplings $g^u_{0t}$, $g^w_{1v}$. In lowest order we have
$$ \Phi^l_{ij} = g^l_{ij} + \text{higher orders in } g^u_{0t},\, g^w_{1v}. \tag{2.13} $$
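The correlation functions $\hat\tau$ in the new scheme are given by
$$ \tau(k, g^0, g^1, m, \alpha, \kappa^2) = \hat\tau(k, G^0, G^1, m, \alpha, \kappa^2) \tag{2.14} $$
with the transformation (2.11) to be substituted for $G^0$, $G^1$. In the new scheme the renormalization group equations are
$$ \kappa^2 \frac{\partial \hat\tau}{\partial \kappa^2} + \hat\beta^u_{st} \frac{\partial \hat\tau}{\partial G^u_{st}} + \delta_j \frac{\partial \hat\tau}{\partial \alpha_j} + \gamma_j \hat\tau = 0 \tag{2.15} $$
with the coefficients
$$ \hat\beta^u_{st} = \kappa^2 \frac{\partial \Phi^u_{st}}{\partial \kappa^2} + \beta^l_{ij} \frac{\partial \Phi^u_{st}}{\partial g^l_{ij}} + \delta_j \frac{\partial \Phi^u_{st}}{\partial \alpha_j}. \tag{2.16} $$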
The functions (2.1) represent a surface $S$ in the space of coordinates $g^l_{ij}$. By the transformation (2.11) the surface $S$ will be mapped into a surface $\hat S$ in the space of coordinates $G^l_{ij}$, which will be described by functions
$$ G^1 = R(G^0, m, \alpha, \kappa^2), \qquad R = (R_{01}, \dots, R_{2F}). \tag{2.17} $$
Inserting these functions into the transformed correlation functions we obtain a reduced system with the correlation functions
$$ \hat\tau'(k, G^0, m, \alpha, \kappa^2) = \hat\tau(k, G^0, R(G^0, m, \alpha, \kappa^2), m, \alpha, \kappa^2). \tag{2.18} $$
In order to prove the scheme independence of the reduction principle we have to show that $\hat\tau'$ satisfies a renormalization group equation. We begin with the construction of the functions (2.17). The surface $S$ is mapped into the surface $\hat S$ by
$$ G^0 = \Phi^0(g^0, r(g^0, m, \alpha, \kappa^2), m, \alpha, \kappa^2) = L^0(g^0, m, \alpha, \kappa^2), \tag{2.19} $$
$$ G^1 = \Phi^1(g^0, r(g^0, m, \alpha, \kappa^2), m, \alpha, \kappa^2) = L^1(g^0, m, \alpha, \kappa^2) \tag{2.20} $$
(see Eqs. (2.1) and (2.11)). At given $m$, $\alpha$ and $\kappa^2$ the coordinates $G^l$ of $\hat S$ are thus expressed as functions of $g^0$ which we denote by $L^l$. For constructing the parametrization (2.17) we have to replace $g^0$ by $G^0$. To this end we invert (2.19) with respect to $g^0$,
$$ g^0 = f(G^0, m, \alpha, \kappa^2) \qquad (\text{inversion of } G^0 = L^0(g^0, m, \alpha, \kappa^2)). \tag{2.21} $$
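The inversion is possible for values of $g^0$ not too large, since
$$ \frac{\partial L^0_{ij}}{\partial g^0_{st}} = \frac{\partial \Phi^0_{ij}}{\partial g^0_{st}} + \frac{\partial \Phi^0_{ij}}{\partial g^1_{vw}} \frac{\partial g^1_{vw}}{\partial g^0_{st}} = \delta_{is}\,\delta_{jt} \quad \text{at} \quad g^0_{0p} = 0, \quad g^0_{1q} = 0 \tag{2.22} $$
(see Eq. (2.13)). Substituting (2.21) for $g^0$ into (2.20) we obtain
$$ G^1 = L^1(f(G^0, m, \alpha, \kappa^2), m, \alpha, \kappa^2) = R(G^0, m, \alpha, \kappa^2). \tag{2.23} $$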
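By this we have constructed the parametrization (2.17) of the surface $\hat S$.

After this preparation we turn to the proof of the renormalization group equations for the functions $\hat\tau'$ defined by (2.18). Into the transformation law (2.14) of the correlation functions we substitute the reducing functions (2.1) and their image (2.17) for the variables $g^1$ or $G^1$, respectively,
$$ \tau(k, g^0, r(g^0, m, \alpha, \kappa^2), m, \alpha, \kappa^2) = \hat\tau(k, G^0, R(G^0, m, \alpha, \kappa^2), m, \alpha, \kappa^2). \tag{2.24} $$
By the definitions (2.6) and (2.18) of $\tau'$ and $\hat\tau'$ this represents the transformation law for the correlation functions of the reduced system,
$$ \tau'(k, g^0, m, \alpha, \kappa^2) = \hat\tau'(k, G^0, m, \alpha, \kappa^2), \tag{2.25} $$
with (2.19) expressing the dependence of $G^0$ on $g^0$. Differentiating (2.25) with respect to $\kappa^2$, $g^0_{ij}$ and $\alpha_j$ we get
$$ \frac{\partial \tau'}{\partial \kappa^2} = \frac{\partial \hat\tau'}{\partial \kappa^2} + \frac{\partial \hat\tau'}{\partial G^0_{st}} \frac{\partial L^0_{st}}{\partial \kappa^2} = \frac{\partial \hat\tau'}{\partial \kappa^2} + \frac{\partial \hat\tau'}{\partial G^0_{st}} \frac{\partial \Phi^0_{st}}{\partial \kappa^2} + \frac{\partial \hat\tau'}{\partial G^0_{st}} \frac{\partial \Phi^0_{st}}{\partial g^1_{vw}} \frac{\partial r_{vw}}{\partial \kappa^2}, \tag{2.26} $$
$$ \frac{\partial \tau'}{\partial g^0_{ij}} = \frac{\partial \hat\tau'}{\partial G^0_{st}} \frac{\partial L^0_{st}}{\partial g^0_{ij}} = \frac{\partial \hat\tau'}{\partial G^0_{st}} \frac{\partial \Phi^0_{st}}{\partial g^0_{ij}} + \frac{\partial \hat\tau'}{\partial G^0_{st}} \frac{\partial \Phi^0_{st}}{\partial g^1_{vw}} \frac{\partial r_{vw}}{\partial g^0_{ij}}, \tag{2.27} $$
$$ \frac{\partial \tau'}{\partial \alpha_j} = \frac{\partial \hat\tau'}{\partial \alpha_j} + \frac{\partial \hat\tau'}{\partial G^0_{st}} \frac{\partial L^0_{st}}{\partial \alpha_j} = \frac{\partial \hat\tau'}{\partial \alpha_j} + \frac{\partial \hat\tau'}{\partial G^0_{st}} \frac{\partial \Phi^0_{st}}{\partial \alpha_j} + \frac{\partial \hat\tau'}{\partial G^0_{st}} \frac{\partial \Phi^0_{st}}{\partial g^1_{vw}} \frac{\partial r_{vw}}{\partial \alpha_j}. \tag{2.28} $$
Inserting these expressions into (2.7), (2.8) and using (2.9) first, then (2.16) (for $u = 0$), we obtain
$$ \kappa^2 \frac{\partial \hat\tau'}{\partial \kappa^2} + \hat\beta'^{\,0}_{ij} \frac{\partial \hat\tau'}{\partial G^0_{ij}} + \delta'_j \frac{\partial \hat\tau'}{\partial \alpha_j} + \gamma'_j \hat\tau' = 0. \tag{2.29} $$
These are the renormalization group equations of the reduced system in the new scheme. Combining this result with the renormalization group equations (2.5) of the original system in the new scheme we find the differential equations
$$ \kappa^2 \frac{\partial R_{st}}{\partial \kappa^2} + \hat\beta'^{\,0}_{ij} \frac{\partial R_{st}}{\partial G^0_{ij}} + \delta'_j \frac{\partial R_{st}}{\partial \alpha_j} = \hat\beta'^{\,1}_{st} \tag{2.30} $$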
for the reducing functions (2.17). This completes the proof of the scheme independence of the reduction principle.

It is easy to check that condition (2.10) – and the power series requirement as well – are scheme independent. We begin with transforming (2.10). By (2.13)
$$ \Phi^0_{0s} = 0, \quad \Phi^0_{1t} = 0, \quad \Phi^1_{0u} = 0, \quad \Phi^1_{1v} = 0 \quad \text{at} \quad g^0_{0a} = 0, \quad g^0_{1b} = 0, \quad g^1_{0c} = 0, \quad g^1_{1d} = 0. \tag{2.31} $$
Setting $g^0_{0a} = 0$, $g^0_{1b} = 0$, it follows from (2.10) that $r_{0c} = 0$ and $r_{1d} = 0$, so that in (2.19), (2.20)
$$ L^0_{0s} = 0, \quad L^0_{1t} = 0 \quad \text{at} \quad g^0_{0a} = 0, \quad g^0_{1b} = 0 \tag{2.32} $$
and
$$ L^1_{0u} = 0, \quad L^1_{1v} = 0 \quad \text{at} \quad g^0_{0a} = 0, \quad g^0_{1b} = 0 \tag{2.33} $$
using (2.31). Since (2.19) is inverted uniquely by (2.21), (2.32) implies
$$ f_{0a} = 0, \quad f_{1b} = 0 \quad \text{at} \quad G^0_{0s} = 0, \quad G^0_{1t} = 0. \tag{2.34} $$
Inserting (2.34) followed by (2.33) into (2.23) the final result
$$ R_{0u} = 0, \quad R_{1v} = 0 \quad \text{at} \quad G^0_{0s} = 0, \quad G^0_{1t} = 0 \tag{2.35} $$
is obtained. This is the transformed version of (2.10) in the new scheme.

Similarly the power series requirement can be checked. An expansion of $r$ and the expansion (2.13) imply that $L^0$ and $L^1$, as defined by (2.19) and (2.20) respectively, can be expanded with respect to powers of $g^0_{0a}$, $g^0_{1b}$. The power series of $L^0$ may be inverted to a power series of $f$ (see Eq. (2.21)) because of (2.22). Inserting the power series of $f$ into (2.23), followed by the expansion of $L^1$, we find that the reducing functions $R$ in the new scheme can be expanded with respect to powers of $G^0_{0s}$ and $G^0_{1t}$. This completes the proof of the scheme independence for the condition that all couplings simultaneously approach zero and for the additional requirement that the reducing functions can be expanded in the independent couplings.

3. Elimination of Parameters

A comprehensive treatment of the elimination of gauge and mass parameters is given in the work of Breitenlohner and Maison published in this volume [27]. In this section we discuss possibilities of eliminating parameters which are based on the renormalization group alone and should be sufficient for applications to the reduction method. Only minimal assumptions on the dynamics of the system will be needed for that purpose.
The aim is to find parameter transformations which lead to schemes with particularly simple β functions. In the last section the relations (2.16) served to determine the β functions $\hat\beta^u_{st}$ in a new scheme after applying a given transformation (2.11) to the parameters. A different point of view will be taken now: we consider the β functions $\hat\beta^u_{st}$ as given in a suitable form and determine transformations (2.11) as solutions of Eqs. (2.16).

Postponing the removal of masses to a second step, we discuss the elimination of gauge parameters first. For this purpose we consider (2.16) with $\hat\beta^u_{st}$ taken to be the values of the β functions in the Landau gauge. In this case solutions of (2.16) can be found, but in general they involve additional parameters carrying a dimension or require a positive lower bound for the masses. Thus the correlation functions will either depend on new mass parameters or a final elimination of masses is impossible. But using a few simple consequences of gauge invariance, parameter transformations can be constructed as solutions of (2.16) which do not introduce new parameters and apply to a range of mass values including the massless limit. A detailed treatment of this possibility for eliminating the gauge parameters will be given in another publication. For the remainder of this section it will be assumed that the gauge parameters have been removed.

We next turn to the problem of eliminating masses. First we consider models with parameters
$$ \lambda_0, \lambda_1, \dots, \lambda_n; \quad m_1, \dots, m_I; \quad \zeta. \tag{3.1} $$
The couplings $\lambda_i$ are all dimensionless. The mass parameters $m_j$ denote pole masses defined by the location of the lowest propagator singularities. The normalization mass $\kappa$ is replaced by its inverse
$$ \zeta = \frac{1}{|\kappa|}, \tag{3.2} $$
which is more convenient for the discussion of the massless limit. Opposite signs of the same coupling parameter are interpreted as belonging to different models, unless the square may be used instead of the original coupling parameter in the renormalization group analysis. For a specific model each coupling parameter is defined such that
$$ \lambda_j \ge 0 \tag{3.3} $$
by changing sign, if necessary. The renormalization group equations (2.5) simplify to
$$ \beta_s \frac{\partial \tau}{\partial \lambda_s} + \gamma_s \tau - \frac{1}{2}\, \zeta\, \frac{\partial \tau}{\partial \zeta} = 0 \tag{3.4} $$
with
$$ \beta_s = \beta_s(\lambda_0, \lambda_1, \dots, \lambda_n, m_1\zeta, \dots, m_I\zeta) \tag{3.5} $$
(similarly for $\gamma_s$). In this and the following section it is assumed that the β functions are differentiable and do not vanish in⁴
$$ (\lambda_0, \dots, \lambda_n) \in D, \qquad 0 \le m_j \zeta < \pi_j, \qquad \zeta > 0, \tag{3.6} $$
where $D$ is a bounded domain in the sector $\lambda_j > 0$ $(j = 1, \dots, n)$ with the origin on the boundary of $D$. In the simplest case a cube
$$ 0 < \lambda_j < \omega_j \quad (j = 0, \dots, n), \qquad \omega_j > 0, $$
may be chosen for $D$. The interior of a cone section in $\lambda_j > 0$ $(j = 1, \dots, n)$ with tip at the origin should be sufficiently general. This assumption excludes the case that the β functions vanish identically and restricts (3.6) by an appropriate boundary such that non-trivial zeroes of the β functions remain outside. Moreover, by (3.5) and (3.6) the massless limit
$$ \hat\beta_j(\lambda_0, \dots, \lambda_n) = \beta_j(\lambda_0, \dots, \lambda_n, 0, \dots, 0) \tag{3.7} $$
exists independently of the way the limit $m_j \to 0$ is taken.

⁴ Instead of differentiability, Lipschitz conditions would be sufficient for the existence theorems applied in this paper.

We want to change the scheme by constructing a transformation (2.11),
$$ \Lambda_j = \Phi_j(\lambda_0, \lambda_1, \dots, \lambda_n, m, \zeta), \qquad m = (m_1, \dots, m_I), \tag{3.8} $$
which leads to renormalization group equations
$$ \hat\beta_s \frac{\partial \hat\tau}{\partial \Lambda_s} + \gamma_s \hat\tau - \frac{1}{2}\, \zeta\, \frac{\partial \hat\tau}{\partial \zeta} = 0 \tag{3.9} $$
with the massless β functions (3.7),
$$ \hat\beta_s = \hat\beta_s(\Lambda_0, \dots, \Lambda_n). \tag{3.10} $$
The transformations (3.8) are solutions of the partial differential equations (2.16),
$$ \beta_s \frac{\partial \Phi_j}{\partial \lambda_s} - \frac{1}{2}\, \zeta\, \frac{\partial \Phi_j}{\partial \zeta} = \hat\beta_j. \tag{3.11} $$
There are many solutions of (3.11). A unique solution can be constructed, for instance, by adjusting the new couplings to the old ones at a normalization mass $\kappa = \kappa_0$, i.e.
$$ \Lambda_j = \lambda_j \quad \text{at} \quad \zeta = \zeta_0 = 1/|\kappa_0| > 0. \tag{3.12} $$
The existence of such a solution will be proved in a region (3.6). For given mass values the functions (3.8) represent an $(n+2)$-dimensional surface $S$ in the $(2n+3)$-dimensional space of coordinates $\lambda_i$, $\Lambda_j$, $\zeta$. A solution of (3.11) must be found for which $S$ contains the $(n+1)$-dimensional surface $S_0$ given by (3.12). The characteristic determinants of the $n+1$ equations (3.11) are identical and have the value $-\tfrac{1}{2}\zeta_0$ on the surface $S_0$. Thus the characteristic determinants do not vanish at $\zeta = \zeta_0 > 0$. Therefore, a unique solution of (3.11) exists which satisfies the initial conditions (3.12).⁵ In this way a new scheme of renormalization is defined for which the β functions are those of the massless model. By this construction, however, a new dimensional parameter $\kappa_0^2$ is introduced. The β functions of the new scheme do not depend on it, but the transformation (3.8) as well as the correlation functions $\hat\tau$ in the new scheme involve this parameter $\kappa_0$. Moreover, the dependence on $\kappa_0$ is not controlled by the renormalization group equation. Instead, a satisfactory method of eliminating masses is provided by adjusting the couplings
$$ \Lambda_j = \lambda_j \quad \text{at} \quad \zeta = 0. \tag{3.13} $$

⁵ See ref. [35], Chapter 2 and ref. [36], Chapter 2.2.
Reduction and Asymptotic Freedom
231
This condition may be interpreted as adjusting the couplings of the old and the new scheme for $|\kappa_0^2| \to \infty$. The procedure should not be confused with trying to normalize coupling parameters at infinite momentum. Even in the case of asymptotic freedom such normalization is not easily possible, since then all effective couplings vanish in the high momentum limit. In contradistinction, the issue here is to find solutions of the partial differential equations (3.11) satisfying the initial conditions (3.13) with the β functions (3.5) and (3.10). The choice of boundary conditions (3.13) seems to be particularly natural, since the new β functions $\hat\beta_s$ are the limits of the original β functions for vanishing $\zeta$,
\[ \hat\beta_s = \lim_{\zeta\to 0} \beta_s(\lambda_0, \dots, \lambda_n, m_1\zeta, \dots, m_I\zeta). \tag{3.14} \]
For the method to work this limit should exist, of course. But it should be stressed that the massless limit of the correlation functions is not required here. It will be shown that indeed a power series solution of (3.11) can be constructed uniquely by imposing condition (3.13). An existence and uniqueness proof which is not based on expansions is also possible, but requires the use of Callan–Symanzik equations in addition, as in the work of Breitenlohner and Maison [27]. For the construction of the power series expansions a few assumptions concerning the limit (3.14) will be made. In the formal expansions
\[ \beta_j = \sum_{\mu} \beta_{j\mu}\,\lambda_0^{\mu_0}\cdots\lambda_n^{\mu_n}, \qquad \mu = (\mu_0, \dots, \mu_n), \qquad M = \sum_j \mu_j \ge 2, \tag{3.15} \]
the coefficients
\[ \beta_{j\mu} = \beta_{j\mu}(m_1, \dots, m_I; \zeta) = \beta_{j\mu}(\nu_1, \dots, \nu_I), \qquad \nu_j = m_j\zeta, \tag{3.16} \]
are assumed to exist in a region including the massless case $\zeta = 0$. The expansions of the β functions in the new scheme are then
\[ \hat\beta_j = \sum_{\mu} \hat\beta_{j\mu}\,\lambda_0^{\mu_0}\cdots\lambda_n^{\mu_n} \tag{3.17} \]
with the constants $\hat\beta_{j\mu}$ given as the values of $\beta_{j\mu}$ at $\zeta = 0$,
\[ \hat\beta_{j\mu} = \beta_{j\mu}(0, \dots, 0). \tag{3.18} \]
It is further assumed that the value $\hat\beta_{j\mu}$ is approached smoothly by $\beta_{j\mu}$ in the limit $\zeta \to 0$. The condition that
\[ |\Delta\beta_{j\mu}(m_1\zeta, \dots, m_I\zeta)| \le a_{j\mu}\,\zeta^{\varepsilon_{j\mu}} \quad\text{if}\quad 0 < \zeta < z \tag{3.19} \]
will be sufficient for the deviations
\[ \Delta\beta_{j\mu} = \beta_{j\mu} - \hat\beta_{j\mu} \tag{3.20} \]
from the zero mass values. The numbers $a_{j\mu}$, $\varepsilon_{j\mu}$ and $z$ are suitably chosen with
\[ a_{j\mu} > 0, \qquad 0 < \varepsilon_{j\mu} < 1, \qquad z > 0. \]
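The aim is to solve (3.11) by a formal expansion
\[ \Lambda_s = \Phi_s(\lambda_0, \lambda_1, \dots, \lambda_n, m_1, \dots, m_I, \zeta) = \sum_{\mu} \Lambda_{s\mu}(m_1, \dots, m_I; \zeta)\,\lambda_0^{\mu_0}\cdots\lambda_n^{\mu_n} \tag{3.21} \]
with the initial condition (3.13) imposed. This implies
\[ \Lambda_{s\mu}(m_1, \dots, m_I; 0) = 0 \tag{3.22} \]
for all coefficients except
\[ \Lambda_{s(s)}(m_1, \dots, m_I; 0) = 1 \tag{3.23} \]
for the coefficient of $\lambda_s$. For the low order terms of the β functions (3.15), (3.17) and the transformation (3.21) we use the simplified notation
\[ \beta_j = \frac{1}{2}\, b_j^{kl}\lambda_k\lambda_l + \cdots, \tag{3.24} \]
\[ \hat\beta_j = \frac{1}{2}\, \hat b_j^{kl}\Lambda_k\Lambda_l + \cdots, \tag{3.25} \]
\[ \Delta b_j^{kl} = b_j^{kl} - \hat b_j^{kl}, \tag{3.26} \]
\[ \Lambda_s = L_s + L_s^k\lambda_k + \frac{1}{2}\, L_s^{kl}\lambda_k\lambda_l. \tag{3.27} \]
The differential equations (3.11) imply
\[ \frac{\partial L_s}{\partial\zeta} = 0, \qquad \frac{\partial L_s^k}{\partial\zeta} = 0, \qquad L_s = 0, \qquad L_s^k = \delta_{sk} \]
by the conditions (3.22), (3.23). With this the expansion (3.21) takes the form
\[ \Lambda_s = \lambda_s + \sum_{M\ge 2} \Lambda_{s\mu}(m_1, \dots, m_I; \zeta)\,\lambda_0^{\mu_0}\cdots\lambda_n^{\mu_n}. \tag{3.28} \]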
In the notation of (3.24)–(3.27) we obtain the differential equations
\[ \frac{1}{2}\,\zeta\,\frac{\partial L_s^{kl}}{\partial\zeta} = \Delta b_s^{kl} \tag{3.29} \]
for the coefficients of the quadratic terms. The solutions are
\[ L_s^{kl} = 2\int_0^{\zeta} \Delta b_s^{kl}(m_1 x, \dots, m_I x)\,\frac{dx}{x}. \tag{3.30} \]
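The role of the bound (3.19) in (3.29) and (3.30) can be checked numerically. The following is a minimal sketch, assuming a single mass and a purely hypothetical deviation $\Delta b(\nu) = a\,\nu^{\varepsilon}/(1+\nu)$; the constants `A`, `EPS`, `MASS` and both function names are illustrative assumptions, not quantities taken from any model. The substitution $x = u^{1/\varepsilon}$ removes the integrable singularity at $x = 0$.

```python
import math

# Hypothetical illustration values (assumptions, not from the paper).
A, EPS, MASS = 0.7, 0.5, 1.3


def delta_b(nu):
    """Hypothetical deviation from the massless value; |delta_b| <= A * nu**EPS."""
    return A * nu**EPS / (1.0 + nu)


def L_quadratic(zeta, steps=20000):
    """Evaluate 2 * int_0^zeta delta_b(MASS * x) dx / x, cf. (3.30).

    With x = u**(1/EPS) one has dx/x = du/(EPS*u), so the integrand in u
    stays finite at u = 0; the midpoint rule never evaluates u = 0 itself.
    """
    upper = zeta**EPS
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        x = u ** (1.0 / EPS)
        total += delta_b(MASS * x) / (EPS * u)
    return 2.0 * total * h
```

In this sketch the integral converges because the integrand behaves like $x^{\varepsilon-1}$ near $x = 0$, and the result inherits a bound of the form $c\,\zeta^{\varepsilon}$, which is the type of estimate used in the induction that follows.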
By (3.19) the integrals converge; additional constants of integration vanish due to the initial condition (3.13). For treating higher orders we proceed by induction. The hypothesis of induction is: On the basis of the differential equations (3.11) with the initial conditions (3.13) all coefficients
\[ \Lambda_{s\mu} = \Lambda_{s\mu}(m_1, \dots, m_I; \zeta) \tag{3.31} \]
of the expansion (3.28) with
\[ 2 \le M = \sum_j \mu_j < N \tag{3.32} \]
have been constructed. This construction is unique and it has been shown that the coefficients (3.31) are bounded by
\[ |\Lambda_{s\mu}(m_1, \dots, m_I; \zeta)| \le c_{s\mu}\,\zeta^{\eta_{s\mu}} \quad\text{if}\quad 0 < \zeta < u_{s\mu}, \tag{3.33} \]
for suitable numbers $c_{s\mu}$, $\eta_{s\mu}$, $u_{s\mu}$ with
\[ c_{s\mu} > 0, \qquad 0 < \eta_{s\mu} < 1, \qquad u_{s\mu} > 0. \]
We remark that (3.33) holds for the integral (3.30) as a consequence of (3.19). It will now be shown that each coefficient
\[ \Lambda_{t\nu} = \Lambda_{t\nu}(m_1, \dots, m_I; \zeta), \qquad \nu = (\nu_0, \dots, \nu_n), \quad\text{with}\quad \sum_j \nu_j = N \tag{3.34} \]
is also determined uniquely by (3.11), (3.13) and bounded similarly to (3.33). Equation (3.11) implies the differential equation
\[ \beta_{t\nu} - \frac{1}{2}\,\zeta\,\frac{\partial\Lambda_{t\nu}}{\partial\zeta} + \sum_l E^l_{t\nu} = \hat\beta_{t\nu} \tag{3.35} \]
for (3.34). The terms $E^l_{t\nu}$ are determined by lower orders only with $M < N$. They are monomials in the coefficients (3.31) with (3.32) and involve coefficients of the β functions. Therefore, they are bounded similarly to (3.33). Equation (3.35) is solved by
\[ \Lambda_{t\nu} = 2\int_0^{\zeta} \Delta\beta_{t\nu}(m_1 x, \dots, m_I x)\,\frac{dx}{x} + 2\sum_l \int_0^{\zeta} E^l_{t\nu}(m_1, \dots, m_I; x)\,\frac{dx}{x}. \tag{3.36} \]
Due to (3.19) and similar bounds for $E^l_{t\nu}$ all integrals converge and are again bounded like (3.33). Therefore, (3.33) also holds for $\Lambda_{t\nu}$. This completes the proof of induction. On the basis of formal expansions it is thus possible to construct a scheme of renormalization in which the β functions do not depend on the pole masses $m_j$ nor on the normalization mass $\kappa$. This result will now be applied to the reduction of a model involving the parameters (3.1) with $\lambda_0$ chosen as primary coupling. For a set of reducing functions
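\[ \lambda_j = r_j(\lambda_0, m_1\zeta, \dots, m_I\zeta), \tag{3.37} \]
the reduction equations (2.9) take the form
\[ \beta_0\,\frac{\partial r_j}{\partial\lambda_0} - \frac{1}{2}\,\zeta\,\frac{\partial r_j}{\partial\zeta} = \beta_j \qquad (j = 1, \dots, n), \tag{3.38} \]
\[ \beta_j = \beta_j(\lambda_0, r_1, \dots, r_n, m_1\zeta, \dots, m_I\zeta). \tag{3.39} \]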
The reducing functions are supposed to satisfy the condition
\[ \lim_{\lambda_0\to 0} r_j = 0 \tag{3.40} \]
or the stronger power series requirement
\[ r_j = \sum_{l=1}^{\infty} c_{jl}\,\lambda_0^l, \qquad c_{jl} = c_{jl}(m_1\zeta, \dots, m_I\zeta). \tag{3.41} \]
After transforming to massless β functions (3.37) is mapped into $\Lambda_j = R_j(\Lambda_0, m_1\zeta, \dots, m_I\zeta)$ satisfying
\[ \hat\beta_0\,\frac{\partial R_j}{\partial\Lambda_0} - \frac{1}{2}\,\zeta\,\frac{\partial R_j}{\partial\zeta} = \hat\beta_j \qquad (j = 1, \dots, n), \tag{3.42} \]
\[ \hat\beta_j = \hat\beta_j(\Lambda_0, R_1, \dots, R_n) = \beta_j(\Lambda_0, R_1, \dots, R_n, 0, \dots, 0). \]
Although the β functions do not explicitly depend on $m_j$ or $\zeta$, such dependence cannot be excluded for the solutions $R_j$. But it will be shown in the following section that any $\zeta$-dependent solution of (3.42) may be replaced by an equivalent solution of the same equations which is independent of $\zeta$. Therefore, we may set $\partial R_j/\partial\zeta = 0$ in (3.42) and solve the ordinary differential equations
\[ \hat\beta_0\,\frac{dR_j}{d\Lambda_0} = \hat\beta_j \qquad (j = 1, \dots, n) \tag{3.43} \]
by functions $\Lambda_j = R_j(\Lambda_0)$ with the requirements
\[ \lim_{\Lambda_0\to 0} R_j = 0 \tag{3.44} \]
or the stronger power series condition
\[ R_j = \sum_{l=1}^{\infty} C_{jl}\,\Lambda_0^l. \tag{3.45} \]
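To illustrate how (3.43) with the power series condition (3.45) fixes the leading coefficient, the sketch below assumes a hypothetical two-coupling model with one-loop β functions $\hat\beta_0 = b_0\Lambda_0^2$ and $\hat\beta_1 = a\Lambda_1^2 + c\,\Lambda_0\Lambda_1 + d\,\Lambda_0^2$. The coefficients `b0`, `a`, `c`, `d` and the function names are assumptions made for illustration only. Inserting $R_1 = \rho\,\Lambda_0$ into (3.43) gives the quadratic equation $a\rho^2 + (c - b_0)\rho + d = 0$.

```python
import math


def leading_reduction_coefficients(b0, a, c, d):
    """Real roots rho of a*rho**2 + (c - b0)*rho + d = 0 (requires a != 0).

    Each root gives a candidate leading term R_1 = rho * Lambda_0 of (3.45)
    for the hypothetical one-loop beta functions described above.
    """
    disc = (c - b0) ** 2 - 4.0 * a * d
    if disc < 0.0:
        return []  # no real power series solution at leading order
    root = math.sqrt(disc)
    return sorted({(-(c - b0) - root) / (2.0 * a), (-(c - b0) + root) / (2.0 * a)})


def reduction_residual(rho, lam0, b0, a, c, d):
    """Left side minus right side of (3.43) for R_1 = rho * lam0."""
    beta0 = b0 * lam0**2
    r1 = rho * lam0
    beta1 = a * r1**2 + c * lam0 * r1 + d * lam0**2
    return beta0 * rho - beta1  # dR_1/dLambda_0 equals rho
```

For instance, `leading_reduction_coefficients(-1.0, 1.0, -3.0, 0.75)` yields two admissible ratios, and `reduction_residual` vanishes for each of them at any value of the primary coupling. In a realistic model higher orders of (3.45) would then be determined recursively.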
We conclude this section by making some brief remarks on the elimination of the normalization mass and the reduction method for models involving dimensional couplings and variable mass squares as in ref. [24]. The parameters are denoted by
– dimensionless couplings $\lambda_0, \lambda_1, \dots, \lambda_n$,
– couplings of dimension mass $\xi^0_1, \dots, \xi^0_B$, $\xi^1_1, \dots, \xi^1_F$,
– variable mass squares $\omega^0_1, \dots, \omega^0_C$, $\omega^1_1, \dots, \omega^1_G$,
– inverse normalization mass $\zeta = 1/|\kappa|$.
The independent parameters are
\[ \lambda_0, \quad \xi^0_1, \dots, \xi^0_B, \quad \omega^0_1, \dots, \omega^0_C, \tag{3.46} \]
while the parameters
\[ \lambda_1, \dots, \lambda_n, \quad \xi^1_1, \dots, \xi^1_F, \quad \omega^1_1, \dots, \omega^1_G \tag{3.47} \]
are treated as functions depending on (3.46),
\[ \lambda_t = r_t(\lambda_0, \xi^0, \omega^0, \zeta) \quad (t = 1, \dots, n), \]
\[ \xi^1_t = r_{1t}(\lambda_0, \xi^0, \omega^0, \zeta) \quad (t = 1, \dots, F), \]
\[ \omega^1_t = r_{2t}(\lambda_0, \xi^0, \omega^0, \zeta) \quad (t = 1, \dots, G) \tag{3.48} \]
with the vector notation
\[ \xi^0 = (\xi^0_1, \dots, \xi^0_B), \qquad \omega^0 = (\omega^0_1, \dots, \omega^0_C). \tag{3.49} \]
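The renormalization group equations (2.5) are
\[ \sum_j \beta_j\,\frac{\partial\tau}{\partial\lambda_j} + \sum_{j,l} \beta^l_{1j}\,\frac{\partial\tau}{\partial\xi^l_j} + \sum_{j,l} \beta^l_{2j}\,\frac{\partial\tau}{\partial\omega^l_j} + \gamma_j\tau - \frac{1}{2}\,\zeta\,\frac{\partial\tau}{\partial\zeta} = 0. \tag{3.50} \]
Taking into account the dimensionality of the β functions we write the representations
\[ \beta_t = \beta_t, \qquad \beta^u_{1t} = \sum_{i,k} \beta^{ui}_{1tk}\,\xi^i_k, \qquad \beta^u_{2t} = \sum_{i,k} \beta^{ui}_{2tk}\,\omega^i_k + \sum_{i,j,k,l} \beta^{uij}_{2tkl}\,\xi^i_k\,\xi^j_l \tag{3.51} \]
with coefficients
\[ F = \beta_t,\ \beta^{ui}_{1tk},\ \beta^{ui}_{2tk},\ \beta^{uij}_{2tkl} \]
depending on dimensionless ratios only,
\[ F = F(\lambda_0, \lambda_1, \dots, \lambda_n, \zeta\xi^0, \zeta\xi^1, \zeta^2\omega^0, \zeta^2\omega^1). \tag{3.52} \]
Terms involving $\zeta^{-1}$ or $\zeta^{-2}$ with non-vanishing coefficients for $\zeta \to 0$ should not be expected in realistic models. It is assumed that the limits
\[ \hat\beta_t = \lim_{\zeta\to 0}\beta_t, \qquad \hat\beta^u_{st} = \lim_{\zeta\to 0}\beta^u_{st} \tag{3.53} \]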
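exist. By (3.51) and (3.52) these limits yield quadratic forms in the dimensional couplings and masses with coefficients depending on the dimensionless couplings. The reduction equations (2.9) take the form
\[ \beta'_0\,\frac{\partial r_t}{\partial\lambda_0} + \sum_j \beta'^{\,0}_{1j}\,\frac{\partial r_t}{\partial\xi^0_j} + \sum_j \beta'^{\,0}_{2j}\,\frac{\partial r_t}{\partial\omega^0_j} - \frac{1}{2}\,\zeta\,\frac{\partial r_t}{\partial\zeta} = \beta'_t, \tag{3.54} \]
\[ \beta'_0\,\frac{\partial r_{st}}{\partial\lambda_0} + \sum_j \beta'^{\,0}_{1j}\,\frac{\partial r_{st}}{\partial\xi^0_j} + \sum_j \beta'^{\,0}_{2j}\,\frac{\partial r_{st}}{\partial\omega^0_j} - \frac{1}{2}\,\zeta\,\frac{\partial r_{st}}{\partial\zeta} = \beta'^{\,1}_{st} \tag{3.55} \]
with primes indicating the insertion of the reducing functions. On the basis of formal power series expansions a transformation to a scheme can be constructed for which the β functions assume their value at $\zeta = 0$. Details will not be given in this paper. The transformed coupling and mass parameters are denoted by
\[ \Lambda_0, \Lambda_1, \dots, \Lambda_n, \quad \Xi^0_1, \dots, \Xi^0_B, \quad \Xi^1_1, \dots, \Xi^1_F, \quad \Omega^0_1, \dots, \Omega^0_C, \quad \Omega^1_1, \dots, \Omega^1_G. \tag{3.56} \]
For the transformed reducing functions we write the representations
\[ \Lambda_t = R_t(\Lambda_0, \Xi^0, \Omega^0, \zeta) = R_t(\Lambda_0, \zeta\,\Xi^0, \zeta^2\Omega^0), \tag{3.57} \]
\[ \Xi^1_t = R_{1t}(\Lambda_0, \Xi^0, \Omega^0, \zeta) = \sum_k S_{tk}\,\Xi^0_k + S_{t0}\,\zeta^{-1}, \tag{3.58} \]
\[ \Omega^1_t = R_{2t}(\Lambda_0, \Xi^0, \Omega^0, \zeta) = \sum_k T_{tk}\,\Omega^0_k + T_{t0}\,\zeta^{-2} + \sum_{k,l} T_{tkl}\,\Xi^0_k\,\Xi^0_l + \sum_k T_{tk0}\,\Xi^0_k\,\zeta^{-1}. \tag{3.59} \]
Here the coefficients $F = S_{tk}, S_{t0}, T_{tk}, T_{t0}, T_{tkl}, T_{tk0}$ depend on dimensionless ratios
\[ F = F(\Lambda_0, \zeta\,\Xi^0, \zeta^2\Omega^0). \tag{3.60} \]
In the transformed version of the reduction equations
\[ \hat\beta_0\,\frac{\partial R_t}{\partial\Lambda_0} + \sum_j \hat\beta^0_{1j}\,\frac{\partial R_t}{\partial\Xi^0_j} + \sum_j \hat\beta^0_{2j}\,\frac{\partial R_t}{\partial\Omega^0_j} - \frac{1}{2}\,\zeta\,\frac{\partial R_t}{\partial\zeta} = \hat\beta_t, \tag{3.61} \]
\[ \hat\beta_0\,\frac{\partial R_{st}}{\partial\Lambda_0} + \sum_j \hat\beta^0_{1j}\,\frac{\partial R_{st}}{\partial\Xi^0_j} + \sum_j \hat\beta^0_{2j}\,\frac{\partial R_{st}}{\partial\Omega^0_j} - \frac{1}{2}\,\zeta\,\frac{\partial R_{st}}{\partial\zeta} = \hat\beta^1_{st}, \tag{3.62} \]
the β functions are $\zeta$-independent. Therefore, it is consistent (and can be justified by an equivalence argument) that the reducing functions (3.57)–(3.59) do not depend on $\zeta$. This excludes terms involving $\zeta^{-1}$ or $\zeta^{-2}$. In the remaining terms $\zeta$ may be set equal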
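to zero so that the coefficients (3.60) become independent of masses and dimensional couplings. Thus
\[ \Lambda_t = R_t(\Lambda_0), \tag{3.63} \]
\[ \Xi^1_t = R_{1t}(\Lambda_0, \Xi^0) = \sum_k S_{tk}\,\Xi^0_k, \tag{3.64} \]
\[ \Omega^1_t = R_{2t}(\Lambda_0, \Xi^0, \Omega^0) = \sum_k T_{tk}\,\Omega^0_k + \sum_{k,l} T_{tkl}\,\Xi^0_k\,\Xi^0_l. \tag{3.65} \]
After insertion of (3.63) the β functions take the form
\[ \hat\beta_t = \varphi_t(\Lambda_0), \qquad \hat\beta^u_{1t} = \sum_{i,k} \chi^{ui}_{tk}(\Lambda_0)\,\Xi^i_k, \qquad \hat\beta^u_{2t} = \sum_{i,k} \psi^{ui}_{tk}(\Lambda_0)\,\Omega^i_k + \sum_{i,j,k,l} \psi^{uij}_{tkl}(\Lambda_0)\,\Xi^i_k\,\Xi^j_l. \tag{3.66} \]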
Here (3.64) and (3.65) should be substituted for the variables $\Xi^1_k$ and $\Omega^1_l$. Eventually the β functions and the reducing functions become expressed as quadratic forms of the independent variables $\Xi^0_k$ and $\Omega^0_l$. Using
\[ \frac{\partial R_t}{\partial\Xi^0_j} = 0, \quad \frac{\partial R_t}{\partial\Omega^0_j} = 0, \quad \frac{\partial R_t}{\partial\zeta} = 0, \quad \frac{\partial R_{1t}}{\partial\zeta} = 0, \quad \frac{\partial R_{2t}}{\partial\zeta} = 0, \quad \frac{\partial R_{1t}}{\partial\Omega^0_j} = 0 \]
the reduction equations (3.61), (3.62) simplify considerably. With the representations (3.63)–(3.66) a first order system of ordinary differential equations is found for the coefficients $R_t$, $S_{tk}$, $T_{tk}$, $T_{tkl}$ of the reducing functions (3.63)–(3.65). The final results are Eqs. (2)–(11) of ref. [24].

4. Evolution Equations and Asymptotic Freedom

In this section evolution equations will be studied in connection with asymptotic freedom and reduction for models involving dimensionless couplings and masses defined by propagator singularities. For the notation see (3.1). Effective couplings
\[ \bar\lambda_j = \bar\lambda_j(z, m; \lambda_0, \lambda_1, \dots, \lambda_n, \zeta) \quad (j = 0, \dots, n), \qquad z = \frac{1}{|k|}, \quad \zeta = \frac{1}{|\kappa|}, \quad m = (m_1, \dots, m_I), \tag{4.1} \]
depending on a momentum square $k^2$ are introduced by suitable vertex functions with initial values at the normalization point,
\[ \bar\lambda_j = \lambda_j > 0 \quad\text{at}\quad z = \zeta > 0. \tag{4.2} \]
For the effective couplings evolution equations hold in the form
\[ -\frac{1}{2}\,z\,\frac{d\bar\lambda_j}{dz} = \beta_j(\bar\lambda_0, \dots, \bar\lambda_n, m_1 z, \dots, m_I z) \tag{4.3} \]
with the initial values (4.2). The masses and initial values we restrict by (3.6), likewise $z$ and the values $\bar\lambda_j$ assumed by the solutions of (4.3). Then by the Cauchy–Picard theorem a unique solution (4.1) of (4.3) exists with initial values (4.2). Unless the dependence on the initial values $\lambda_j$, $\zeta$ is relevant, the simplified notation
\[ \bar\lambda_j = \bar\lambda_j(z, m) \tag{4.4} \]
will be used instead of (4.1). Asymptotic freedom means that all effective couplings vanish in the high momentum limit, which by $z = 1/|k|$ is the limit $z \to 0$:
\[ \lim_{z\to 0} \bar\lambda_j(z, m) = 0 \qquad (j = 0, 1, \dots, n). \tag{4.5} \]
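A single-coupling example makes the evolution equation (4.3) and the condition (4.5) concrete. The sketch below assumes a hypothetical massless β function $\beta(\lambda) = -b\lambda^2$ with $b > 0$; the coefficient `b` and the function names are assumptions for illustration. In the variable $t = \ln z$ the evolution equation becomes $d\bar\lambda/dt = -2\beta(\bar\lambda)$, which is integrated with a classical fourth-order Runge–Kutta step and compared with the closed form $\bar\lambda = \lambda_0/(1 - 2b\lambda_0\ln(z/\zeta))$.

```python
import math


def beta(lam, b=1.0):
    """Hypothetical one-coupling beta function, negative for b > 0."""
    return -b * lam * lam


def evolve(lam0, zeta, z, steps=2000, b=1.0):
    """Integrate -(1/2) z dlam/dz = beta(lam) from z = zeta to z (RK4 in t = ln z)."""
    def f(lam):
        return -2.0 * beta(lam, b)  # dlam/dt

    h = (math.log(z) - math.log(zeta)) / steps
    lam = lam0
    for _ in range(steps):
        k1 = f(lam)
        k2 = f(lam + 0.5 * h * k1)
        k3 = f(lam + 0.5 * h * k2)
        k4 = f(lam + h * k3)
        lam += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return lam


def exact(lam0, zeta, z, b=1.0):
    """Closed-form solution for the hypothetical beta function above."""
    return lam0 / (1.0 - 2.0 * b * lam0 * math.log(z / zeta))
```

With these assumptions the effective coupling decreases monotonically as $z$ decreases and tends to zero for $z \to 0$, in line with (4.5).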
In the case of several couplings this is not a property of the model as such, but selects, if at all possible, particular solutions of the evolution equations, while other solutions are not asymptotically free. By imposing (4.5) the couplings are no longer independent. In fact, it will be seen that (4.5) induces a reduction of couplings. Since zeroes are absent in the domain (3.6), the evolution equations (4.3) imply that each effective coupling is either monotonically increasing or decreasing. Therefore, condition (4.5) combined with convention (3.3) implies
\[ \frac{d\bar\lambda_j}{dz} > 0 \tag{4.6} \]
and
\[ \beta_j(\bar\lambda_0, \dots, \bar\lambda_n, m_1 z, \dots, m_I z) < 0 \tag{4.7} \]
for asymptotically free couplings $\bar\lambda_j$ on the domain (3.6). Thus a negative sign for the β functions is a necessary condition for asymptotic freedom. It is, however, unlike the case of a single coupling, not sufficient in general. Sufficient conditions will be stated later after elimination of mass parameters in the β functions. In preparation for this, we discuss how evolution equations transform under a change of the renormalization scheme. After a scheme changing transformation (3.8) new effective couplings may be defined by
\[ \bar\Lambda_j = \Phi_j(\bar\lambda_0, \dots, \bar\lambda_n, m, z). \tag{4.8} \]
Through the dependence (4.4) the transformed couplings (4.8) also become functions of $z$ and $m$ with initial values
\[ \bar\Lambda_j = \Lambda_j \quad\text{at}\quad z = \zeta. \tag{4.9} \]
For these functions the notation
\[ \bar\Lambda_j = \bar\Lambda_j(z, m; \Lambda_0, \dots, \Lambda_n, \zeta), \tag{4.10} \]
or simpler,
\[ \bar\Lambda_j = \bar\Lambda_j(z, m) \tag{4.11} \]
will be used. With Eq. (3.11) it is easy to check that the new effective couplings again satisfy evolution equations in the form
\[ -\frac{1}{2}\,z\,\frac{d\bar\Lambda_j}{dz} = \hat\beta_j(\bar\Lambda_0, \dots, \bar\Lambda_n, m_1 z, \dots, m_I z). \tag{4.12} \]
The condition (4.5) of asymptotic freedom is scheme independent. This follows since a Taylor formula
\[ \bar\Lambda_j = \bar\lambda_j + \sum_{s,t} \bar\lambda_s\,\bar\lambda_t\,R^j_{st} \tag{4.13} \]
with appropriate remainders $R^j_{st}$ holds according to the properties of transformations (2.11) stated in the last section. Thus (4.5) implies the corresponding condition
\[ \lim_{z\to 0} \bar\Lambda_j(z, m) = 0 \qquad (j = 1, \dots, n) \tag{4.14} \]
in the new scheme. The scheme independence justifies studying asymptotic freedom in a special scheme, where the β functions are massless. The evolution equations then take the simplified form
\[ -\frac{1}{2}\,z\,\frac{d\bar\Lambda_j}{dz} = \hat\beta_j(\bar\Lambda_0, \dots, \bar\Lambda_n) \qquad (j = 0, \dots, n) \tag{4.15} \]
with $\hat\beta_j$ denoting the massless limit (3.7). For asymptotically free solutions we write (4.6) and (4.7) in transformed form
\[ \frac{d\bar\Lambda_j}{dz} > 0, \tag{4.16} \]
\[ \hat\beta_j(\bar\Lambda_0, \dots, \bar\Lambda_n) < 0. \tag{4.17} \]
With massless β functions it is possible to treat asymptotic freedom in two separate steps: First, all couplings are reduced to functions of a primary coupling, then the high momentum behavior is determined by a single evolution equation involving the primary coupling only. In order to show this we select $\bar\Lambda_0$ as a primary coupling and introduce it in (4.15) as an independent variable instead of $z$. Because of (4.16) the function
\[ \bar\Lambda_0 = \bar\Lambda_0(z, m) \tag{4.18} \]
may be inverted to
\[ z = \bar\zeta(\bar\Lambda_0, m). \tag{4.19} \]
By this all $\bar\Lambda_j$ may be expressed as functionals of $\bar\Lambda_0$,
\[ \bar\Lambda_j = \bar\Lambda_j(z, m) = \bar\Lambda_j(\bar\zeta(\bar\Lambda_0, m), m), \tag{4.20} \]
which we denote by
\[ \bar\Lambda_j = \bar s_j(\bar\Lambda_0, m), \qquad j = 1, \dots, n. \tag{4.21} \]
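Introducing $\bar\Lambda_0$ as an independent variable the system (4.15) takes the equivalent form
\[ \hat\beta_0\,\frac{d\bar\zeta}{d\bar\Lambda_0} = -\frac{1}{2}\,\bar\zeta, \tag{4.22} \]
\[ \hat\beta_0\,\frac{d\bar s_j}{d\bar\Lambda_0} = \hat\beta_j \tag{4.23} \]
with the notation
\[ \hat\beta_j = \hat\beta_j(\bar\Lambda_0, \bar s_1(\bar\Lambda_0, m), \dots, \bar s_n(\bar\Lambda_0, m)). \tag{4.24} \]
Equation (4.22) is integrated by
\[ \ln\bar\zeta = \frac{1}{2}\int_{\bar\Lambda_0}^{c} \frac{dx}{\tilde\beta_0} + d, \qquad c > \bar\Lambda_0, \tag{4.25} \]
\[ \tilde\beta_0 = \hat\beta_0(x, \bar s_1(x, m), \dots, \bar s_n(x, m)). \]
Equation (4.14) may be written equivalently as
\[ \lim_{\bar\Lambda_0\to 0} \bar\zeta(\bar\Lambda_0, m) = 0, \tag{4.26} \]
\[ \lim_{\bar\Lambda_0\to 0} \bar s_j(\bar\Lambda_0, m) = 0. \tag{4.27} \]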
Equations (4.23) constitute reduction equations for the reducing functions (4.21) of the primary coupling $\bar\Lambda_0$ with the condition (4.27) to be imposed. With the solution of the reduction equations (4.23) the evolution of the system becomes a problem in one variable only: Eq. (4.22) or (4.25) controls the momentum dependence of the primary coupling $\bar\Lambda_0$ in the high momentum limit. Depending on the sign of $\tilde\beta_0$ for small $x$ the divergence of the integral for small couplings implies either $\bar\zeta \to 0$ or $\bar\zeta \to \infty$ for $\bar\Lambda_0 \to 0$. The results of this analysis are summarized by the following necessary and sufficient conditions for asymptotic freedom: Among the effective couplings a primary coupling $\bar\Lambda_0$ is chosen so that the other couplings $\bar\Lambda_j$ become functions of $\bar\Lambda_0$. These functions should satisfy the reduction equations (4.23) with the requirement (4.27) that the couplings vanish together with $\bar\Lambda_0$. The β function of $\bar\Lambda_0$ should be negative for sufficiently small couplings after inserting the solution of (4.23). As a corollary we note that for asymptotically free couplings all β functions simultaneously become negative for small couplings. More generally, as a consequence of (4.27) reduction solutions of (4.23) satisfy
\[ \frac{d\bar s_j}{d\bar\Lambda_0} > 0 \tag{4.28} \]
in (3.6) due to the absence of zeroes of the β functions and the convention (3.3). This means that all β functions have the same sign for small couplings. Negative sign corresponds to asymptotic freedom in the original sense. Positive sign of the β functions can be interpreted as asymptotic freedom in the infrared region. This is relevant for models without intrinsic masses. Not discussed in this paper is the case that β functions vanish identically for some solutions of the evolution equations.
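The two-step picture (reduction first, then a single evolution equation) can be illustrated with a small numerical experiment. The sketch below assumes hypothetical massless β functions $\hat\beta_0 = -\bar\Lambda_0^2$ and $\hat\beta_1 = \bar\Lambda_1^2 - 3\bar\Lambda_0\bar\Lambda_1 + 0.75\,\bar\Lambda_0^2$, for which $\bar\Lambda_1 = 0.5\,\bar\Lambda_0$ solves the reduction equation (4.23); these coefficients and all names are assumptions chosen only for illustration. The system (4.15) is integrated in $t = \ln z$ with a fourth-order Runge–Kutta step.

```python
import math


def beta_hat(l0, l1):
    """Hypothetical massless beta functions for two couplings."""
    return (-l0 * l0, l1 * l1 - 3.0 * l0 * l1 + 0.75 * l0 * l0)


def evolve_pair(l0, l1, zeta, z, steps=4000):
    """Integrate -(1/2) z dL_j/dz = beta_hat_j from z = zeta to z (RK4 in ln z)."""
    def f(a, b):
        b0, b1 = beta_hat(a, b)
        return (-2.0 * b0, -2.0 * b1)  # dL_j/dt with t = ln z

    h = (math.log(z) - math.log(zeta)) / steps
    for _ in range(steps):
        k1 = f(l0, l1)
        k2 = f(l0 + 0.5 * h * k1[0], l1 + 0.5 * h * k1[1])
        k3 = f(l0 + 0.5 * h * k2[0], l1 + 0.5 * h * k2[1])
        k4 = f(l0 + h * k3[0], l1 + h * k3[1])
        l0 += h * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        l1 += h * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return l0, l1
```

Starting on the reduction solution, for example `evolve_pair(0.2, 0.1, 1.0, 1e-4)`, the ratio of the two couplings stays at 0.5 while both decrease toward zero, and both β functions are negative along the trajectory, as stated in the corollary above.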
We return to the theory of reduction in general schemes of renormalization. In the last section it was found that the reduction equations still involve the normalization mass after transforming to massless β functions. The resulting reduction equations in the case of asymptotic freedom seem to indicate that such a dependence should not be expected. It will be shown that indeed the normalization mass can be eliminated independently of the scheme by making use of the evolution equations. We begin by setting up the evolution equations of the reduced model. To this end we combine the reduction equations (3.38) with the original form (4.3) of the evolution equations. As initial values (4.2) for the solutions (4.1) of (4.3) reducing functions $r_j$ will be taken:
\[ \bar\lambda_0 = \lambda_0, \qquad \bar\lambda_j = r_j(\lambda_0, m, \zeta) \quad\text{at}\quad z = \zeta \qquad (j = 1, \dots, n). \tag{4.29} \]
The functions $r_j$ are supposed to obey the reduction equations (3.38) with the condition (3.40) or the stronger power series requirement (3.41). By the assumptions stated on the β functions for the domain (3.6) existence and uniqueness of the effective couplings (4.1) is implied. Corresponding to the primary coupling $\lambda_0$ we define an effective coupling $\bar\lambda_0$ by (4.1),
\[ \bar\lambda_0 = \bar\lambda_0(z, m), \tag{4.30} \]
using the simplified notation (4.4). For the reduced model an evolution equation for the primary coupling alone is expected. As such we propose
\[ -\frac{1}{2}\,z\,\frac{d\bar{\bar\lambda}_0}{dz} = \bar\beta_0 \tag{4.31} \]
with the notation
\[ \bar\beta_j = \beta_j(\bar{\bar\lambda}_0, r_1(\bar{\bar\lambda}_0, m, z), \dots, r_n(\bar{\bar\lambda}_0, m, z)), \tag{4.32} \]
and the initial conditions
\[ \bar{\bar\lambda}_0 = \lambda_0 \quad\text{at}\quad z = \zeta \tag{4.33} \]
to be imposed. We have chosen another notation $\bar{\bar\lambda}_0$ for the effective coupling, since it has yet to be shown that (4.30) indeed solves (4.31). In the domain (3.6),
\[ \bar{\bar\lambda}_0 = \bar{\bar\lambda}_0(z, m) \tag{4.34} \]
exists as a unique solution of (4.31) with the initial condition (4.33). The other effective couplings $\bar{\bar\lambda}_j$ are introduced by
\[ \bar{\bar\lambda}_j = \bar{\bar\lambda}_j(z, m) = r_j(\bar{\bar\lambda}_0(z, m), m, z) \qquad (j = 1, \dots, n) \tag{4.35} \]
as functionals of $\bar{\bar\lambda}_0$. It will be seen that the functions (4.34) solving (4.31)–(4.33) combined with the functions (4.35) on the one hand and the functions (4.30) solving (4.3), (4.29) on the other hand are identical,
\[ \bar{\bar\lambda}_j \equiv \bar\lambda_j \qquad (j = 0, \dots, n). \tag{4.36} \]
For the proof we need only check that the functions (4.34), (4.35) likewise solve the evolution equations (4.3) with the initial conditions (4.2). Identity (4.36) follows by the uniqueness property of these differential equations. For $j = 0$ Eq. (4.3) is satisfied according to the defining equation (4.31) of $\bar{\bar\lambda}_0$. In order to verify the remaining equations we differentiate (4.35) with respect to $z$,
\[ -\frac{1}{2}\,z\,\frac{d\bar{\bar\lambda}_j}{dz} = -\frac{1}{2}\,z\,\frac{\partial r_j}{\partial\zeta} - \frac{1}{2}\,z\,\frac{\partial r_j}{\partial\lambda_0}\,\frac{d\bar{\bar\lambda}_0}{dz} = -\frac{1}{2}\,z\,\frac{\partial r_j}{\partial\zeta} + \bar\beta_0\,\frac{\partial r_j}{\partial\lambda_0} = \bar\beta_j. \tag{4.37} \]
Here $\bar{\bar\lambda}_0(z, m)$ and $z$ should be substituted for the arguments $\lambda_0$ and $\zeta$, respectively, in the partial derivatives of $r_j$, similar to (4.35); the last step uses the reduction equations (3.38), and for the notation $\bar\beta_j$ see Eq. (4.32). Thus we have shown that the functions $\bar{\bar\lambda}_j$ indeed satisfy the evolution equations (4.3). Since the initial conditions (4.2) are also fulfilled, the proof of (4.36) is completed. The results may be summarized as follows. The effective coupling (4.30) of the reduced model solves the evolution equation (4.31),
\[ -\frac{1}{2}\,z\,\frac{d\bar\lambda_0}{dz} = \bar\beta_0, \qquad \bar\lambda_0 = \lambda_0 \quad\text{at}\quad z = \zeta, \tag{4.38} \]
with $\bar\beta_0$ given by (4.32). Defining the other couplings by
\[ \bar\lambda_j = \bar\lambda_j(z, m) = r_j(\bar\lambda_0(z, m), m, z), \tag{4.39} \]
a solution of the original evolution equations (4.3) in the form
\[ -\frac{1}{2}\,z\,\frac{d\bar\lambda_j}{dz} = \bar\beta_j \tag{4.40} \]
is obtained with the initial conditions (4.2). We next turn to the question to what extent the reduction equations (3.38) contain redundant information and how it can be eliminated. On the basis of the evolution equations a natural constraint on the reducing functions will be found. Obviously, only the final functional dependence of the effective couplings $\bar\lambda_j$ on the primary coupling $\bar\lambda_0$ can be relevant for the interpretation of the reduction method. Accordingly, we call two sets of reducing functions equivalent,
\[ r_j^{(1)} \sim r_j^{(2)}, \tag{4.41} \]
if the resulting functional dependence
\[ \bar\lambda_j(z, m) = \bar s_j(\bar\lambda_0(z, m), m) \qquad (j = 1, \dots, n) \tag{4.42} \]
is the same. In order to find an appropriate formulation we take the reduced form (4.38) of the evolution equations and invert its solution (4.30) with respect to $z$,
\[ z = \bar\zeta(\bar\lambda_0, m), \tag{4.43} \]
using that $\bar\beta_0$ does not vanish in the domain considered. Then the effective couplings $\bar\lambda_j$ may be expressed as functions of $\bar\lambda_0$,
\[ \bar\lambda_j = r_j(\bar\lambda_0, m, \bar\zeta(\bar\lambda_0, m)) = \bar s_j(\bar\lambda_0, m). \tag{4.44} \]
Hence $r_j^{(1)}$ and $r_j^{(2)}$ are equivalent, if
\[ r_j^{(1)}(\bar\lambda_0, m, \bar\zeta^{(1)}(\bar\lambda_0, m)) = r_j^{(2)}(\bar\lambda_0, m, \bar\zeta^{(2)}(\bar\lambda_0, m)) \]
or
\[ \bar s_j^{(1)}(\bar\lambda_0, m) = \bar s_j^{(2)}(\bar\lambda_0, m). \tag{4.45} \]
The $\bar s_j$ are not reducing functions per se, but may be viewed as such by admitting a sliding normalization mass. To see this we replace $z$ by $\bar\lambda_0$ as an independent variable in (4.40). Similar to the discussion of asymptotic freedom the equivalent set of differential equations
\[ \bar\beta_0\,\frac{d\bar\zeta}{d\bar\lambda_0} = -\frac{1}{2}\,\bar\zeta, \tag{4.46} \]
\[ \bar\beta_0\,\frac{d\bar s_j}{d\bar\lambda_0} = \bar\beta_j, \tag{4.47} \]
\[ \bar\beta_j = \beta_j(\bar\lambda_0, \bar s_1(\bar\lambda_0, m), \dots, \bar s_n(\bar\lambda_0, m), m\,\bar\zeta(\bar\lambda_0, m)) \tag{4.48} \]
is obtained. In Eqs. (4.46)–(4.48) we replace $\bar\lambda_0$ by its value $\lambda_0$ at the normalization point and change the notation $\bar s_j$, $\bar\zeta$, $\bar\beta_j$ to $s_j$, $\zeta$, $\beta_j$ accordingly. Then we have a set of n + 1 ordinary differential equations
\[ \beta_0\,\frac{d\zeta}{d\lambda_0} = -\frac{1}{2}\,\zeta, \tag{4.49} \]
\[ \beta_0\,\frac{ds_j}{d\lambda_0} = \beta_j \qquad (j = 1, \dots, n), \tag{4.50} \]
\[ \beta_j = \beta_j(\lambda_0, s_1(\lambda_0, m), \dots, s_n(\lambda_0, m), m\,\zeta(\lambda_0, m)) \qquad (j = 0, \dots, n) \tag{4.51} \]
for the functions
\[ \zeta = \zeta(\lambda_0, m), \qquad \lambda_j = s_j(\lambda_0, m) \qquad (j = 1, \dots, n). \tag{4.52} \]
The functions $s_j$ are related to reducing functions by (4.44):
\[ s_j(\lambda_0, m) = r_j(\lambda_0, m, \zeta(\lambda_0, m)). \tag{4.53} \]
Equations (4.50) may thus be interpreted as reduction equations modified by a sliding normalization mass
\[ |\kappa| = \frac{1}{\zeta(\lambda_0, m)} \tag{4.54} \]
which satisfies the differential equation (4.49). In general Eqs. (4.49) and (4.50) are coupled by the dependence of the β functions on the normalization mass. But in a scheme with massless β functions the system (4.50) can be solved independently of (4.49). Any set $s_j$ of solutions for (4.50) then also satisfies (3.38) with
\[ \frac{\partial s_j}{\partial\zeta} = 0. \tag{4.55} \]
Therefore, in a scheme with massless β functions the functions $s_j$ coincide with reducing functions $R_j$ independent of $\zeta$, thus representing an equivalence class. Hence without loss of information the dependence on the normalization mass may be disregarded so that the reduction equations (3.38) with massless β functions become a set of ordinary differential equations
\[ \hat\beta_0\,\frac{dR_j}{d\Lambda_0} = \hat\beta_j \qquad (j = 1, \dots, n) \tag{4.56} \]
for functions $\Lambda_j = R_j(\Lambda_0)$. Equation (4.49) may be integrated to
\[ \zeta = c\,\exp\Bigl[-\frac{1}{2}\int_{a}^{\lambda_0}\frac{dx}{\tilde\beta_0}\Bigr], \tag{4.57} \]
\[ \tilde\beta_0 = \hat\beta_0(x, R_1(x), \dots, R_n(x)), \]
so that identity (4.53) becomes
\[ \Lambda_j = R_j(\Lambda_0) = R_j\Bigl(\Lambda_0, m, c\,\exp\Bigl[-\frac{1}{2}\int_{a}^{\Lambda_0}\frac{dx}{\tilde\beta_0}\Bigr]\Bigr). \tag{4.58} \]
Since the constants $m_1, \dots, m_I$, $a$ and $c$ (correlated to $a$) do not occur otherwise in (4.56), they may be absorbed by the arbitrary integration constants of the general solution for (4.56). Thus a set of reducing functions $R_j$ is selected in each equivalence class by the solutions of (4.56). In the original formulation of the model on the basis of mass dependent β functions a corresponding set $r_j$ may be constructed by applying the inverse of (3.8) with (3.13) to $R_j$. On the reducing functions thus selected the condition (3.40) or the power series requirement (3.41) is imposed.

Acknowledgement. With great pleasure I thank my colleagues P. Breitenlohner, J. Kubo, D. Maison, R. Oehme and K. Sibold for helpful discussions.
References
1. Zimmermann, W.: Commun. Math. Phys. 97, 211 (1985)
2. Weinberg, S.: Phys. Rev. D 8, 3497 (1973)
3. Collins, J.C. and Macfarlane, A.J.: Phys. Rev. D 10, 1201 (1974)
4. Stueckelberg, E. and Petermann, A.: Helv. Phys. Acta 26, 499 (1953)
5. Gell-Mann, M. and Low, F.: Phys. Rev. 95, 1300 (1954)
6. Bogoliubov, N.N. and Shirkov, D.V.: Dokl. Akad. Nauk SSSR 102, 391 (1955)
7. Zimmermann, W.: In: XIV. Intern. Coll. on Group Theor. Methods in Physics, Seoul, Korea, ed. Y.M. Cho, Singapore: World Scientific, 1985, p. 145
8. Sibold, K.: In: Proc. of the Intern. Europhysics Conf. on High Energy Physics, Bari, Italy 1985
9. Oehme, R.: Progr. Theor. Phys. Suppl. 86, 215 (1986)
10. Zimmermann, W.: In: Renormalization Group 1986, Dubna, USSR, eds. D.V. Shirkov, D.I. Kazakov and A.A. Vladimirov, Singapore: World Scientific, 1986, p. 51
11. Kubo, J.: In: Proc. of the 1989 Workshop on Dynamical Symmetry Breaking, eds. T. Muta and K. Yamawaki, Nagoya 1989, p. 48
12. Sibold, K.: Acta Physica Polonica 19, 295 (1989)
13. Kubo, J.: In: Recent Developments in Quantum Field Theory, eds. P. Breitenlohner, D. Maison and J. Wess, Heidelberg: Springer-Verlag, 2000
14. Oehme, R.: In: Recent Developments in Quantum Field Theory, eds. P. Breitenlohner, D. Maison and J. Wess, Heidelberg: Springer-Verlag, 2000
15. Chang, N.-P.: Phys. Rev. D 10, 2706 (1974)
16. Fradkin, E.S. and Kalashnikov, O.K.: J. Phys. A 8, 1814 (1975); Phys. Lett. B 64, 177 (1976)
17. Ma, E.: Phys. Rev. D 11, 322 (1975); D 17, 623 (1978)
18. Chang, N.-P., Das, A. and Perez-Mercader, J.: Phys. Rev. D 22, 1829 (1980)
19. Kazakov, D.I. and Shirkov, D.V.: In: Proc. of the 1975 Smolenice Conf. on High Energy Particle Interactions, eds. D. Krupa and J. Pišút, Bratislava: VEDA, Publishing House of the Slovak Academy of Sciences, 1976
20. Piguet, O. and Sibold, K.: Phys. Lett. B 229, 83 (1989)
21. Callan, C.: Phys. Rev. D 2, 1541 (1970)
22. Symanzik, K.: Commun. Math. Phys. 18, 227 (1970)
23. Ovsiannikov, L.V.: Dokl. Akad. Nauk SSSR 109, 1112 (1956)
24. Kubo, J., Mondragón, M. and Zoupanos, G.: Phys. Lett. B 389, 523 (1996)
25. Kawamura, Y., Kobayashi, T. and Kubo, J.: Phys. Lett. B 405, 64 (1997)
26. Zimmermann, W.: In: Proc. of the 12th Max Born Symposium, eds. A. Borowiec, W. Cegła, B. Jancewicz and W. Karwowski, Heidelberg: Springer-Verlag, 1998
27. Breitenlohner, P. and Maison, D.: published in this volume
28. Wilson, K.: Phys. Rev. D 3, 1818 (1971)
29. Gross, D.J. and Wilczek, F.: Phys. Rev. Lett. 30, 1343 (1973); Phys. Rev. D 8, 3633 (1973)
30. Politzer, H.P.: Phys. Rev. Lett. 30, 1346 (1973)
31. Oehme, R. and Zimmermann, W.: Commun. Math. Phys. 97, 569 (1985)
32. Oehme, R., Sibold, K. and Zimmermann, W.: Phys. Lett. B 153, 147 (1985)
33. Zimmermann, W.: Lett. Math. Phys. 30, 61 (1994)
34. Kubo, J., Mondragón, M. and Zoupanos, G.: Nucl. Phys. B 424, 29 (1994)
35. Courant, R. and Hilbert, D.: Methoden der mathematischen Physik. Vol. II, Berlin: Springer-Verlag, 1931 and 1937
36. Epstein, B.: Partial Differential Equations. New York: McGraw-Hill Book Company, 1962

Communicated by A. Jaffe
Commun. Math. Phys. 219, 247 – 257 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Non-Abelian Gauge Theories on Non-Commutative Spaces
J. Wess
Sektion Physik der Ludwig-Maximilians-Universität, Theresienstr. 37, 80333 München and Max-Planck-Institut für Physik (Werner-Heisenberg-Institut), Föhringer Ring 6, 80805 München, Germany
Received: 9 August 2000 / Accepted: 12 August 2000
Dedicated to the memory of Harry Lehmann
Abstract: The construction of a non-abelian gauge theory on non-commutative spaces is based on enveloping algebra-valued gauge fields. The number of independent field components is reduced to the number of gauge fields in a usual gauge theory. This is done with the help of the Seiberg–Witten map. The dynamics is formulated with a Lagrangian where additional couplings appear.
In a recent development gauge theories on non-commutative space-time structures [1– 3] have attracted some interest. This approach can certainly lead to a regularization scheme for gauge theories. Non-commutative space-time coordinates will in general have a discrete eigenvalue spectrum when considered as self-adjoint linear operators on a Hilbert space. The gauge theory will resemble a gauge theory on a lattice. Particular non-commutative spaces have a co-module structure under the action of a quantum group [4, 16]. These spaces allow us to maintain a symmetry structure, usually referred to as quantum groups. Quantum groups can be considered as deformations of groups in the category of Hopf algebras. They depend on a parameter (parameters), we shall call it q, such that for a special value of this parameter, say q = 1, the quantum group coincides with a given group. For different values of the parameter quantum groups are not groups, they are, however, still Hopf algebras. Among the groups that can be deformed that way is the Lorentz group. Thus it seems natural to consider non-commutative spaces that are comodules of the q-deformed Lorentz group. For the regularized gauge theory a symmetry structure will be maintained which in the limit q → 1 becomes the Lorentz group. The hope, however, is that nature could accept the regularization as a natural cutoff for gauge theories. At large distances, however, space-time does not show a lattice structure that is related to a q-deformed Lorentz group. We therefore think along the line that the dynamics of a gauge theory forces a phase transition on space-time at high
energy densities or equivalently at very short distances. The phase at large distances is the well-known commutative structure of space-time, the other phase at short distances is governed by a non-commutative structure. The energy density or distance where the phase transition is supposed to take place is characterized by the gauge coupling; the Planck scale is not a natural scale in this scenario. To probe such a scenario we have to formulate a gauge theory on non-commutative spaces. We shall outline a quite general method of how this can be done. The associative algebra we have in mind will be defined by elements that generate the algebra and by relations between these elements. The generating elements we call coordinates $\{\hat x^1, \dots, \hat x^N\}$, the relations then generate ideals and the algebra freely generated by the coordinates and divided by the ideals is the respective algebra:
\[ \mathcal{A}_x = \frac{\mathbb{C}[[\hat x^1, \dots, \hat x^N]]}{\mathcal{R}}. \]
Formal power series are accepted. Three examples should illustrate this.

Canonical structure: The relations are of the type
\[ [\hat x^i, \hat x^j] = i\theta^{ij}, \qquad \theta^{ij} \in \mathbb{C}. \]
In phase space this is the algebraic structure of quantum mechanics and has quite nontrivial consequences. In our approach the $\hat x^i$ are elements of the configuration space. In this context the canonical structure has been considered in non-commutative Yang–Mills theory in ref. [1].

Lie structure:
\[ [\hat x^i, \hat x^j] = i\theta^{ij}_k\,\hat x^k, \qquad \theta^{ij}_k \in \mathbb{C}. \]
This is the algebraic structure of Lie algebras. It has also been used for non-commutative coordinates in matrix models. The rich structure of Lie algebras is well known.

Quantum plane structure:
\[ [\hat x^i, \hat x^j] = i\theta^{ij}_{kl}\,\hat x^k\hat x^l. \]
This is the algebraic structure of quantum planes which are comodules of quantum groups. Based on such relations a very rich mathematical structure has been revealed within the last twenty years. The simplest example is the Manin plane with N = 2:
\[ \hat x\hat y = q\,\hat y\hat x, \qquad q \in \mathbb{C}. \]
In general the constants $\theta^{ij}_k$ and $\theta^{ij}_{kl}$ will be subject to consistency relations. For the structure constants of a Lie algebra this will be the Jacobi identity, for the quantum structure this will lead to the quantum Yang–Baxter equation.
Gauge transformations

Fields will be elements of the algebra $\mathcal{A}_x$: $\hat\psi(\hat x) \in \mathcal{A}_x$. Under an infinitesimal gauge transformation a field transforms, as usual:

$$\delta\hat\psi(\hat x) = i\hat\alpha(\hat x)\hat\psi(\hat x), \qquad \hat\alpha \in \mathcal{A}_x.$$

$\hat\alpha(\hat x)$ is also an element of the enveloping algebra of a Lie group, the gauge group. The fields are representations of this Lie group, and $\hat\alpha$ acts on these representations. Coordinates are not a covariant concept under gauge transformations:

$$\delta(\hat x^i \hat\psi) = i\hat x^i \hat\alpha\hat\psi \neq i\hat\alpha(\hat x^i \hat\psi).$$

We can try to define covariant coordinates the same way as we are used to define covariant derivatives:

$$\hat X^i = \hat x^i + \hat A^i(\hat x), \qquad \hat A^i(\hat x) \in \mathcal{A}_x.$$

At the same time $\hat A$ is an element of the enveloping algebra, we call it enveloping algebra-valued. We demand

$$\delta \hat X^i \hat\psi = i\hat\alpha \hat X^i \hat\psi$$

and find

$$\delta \hat A^i = i[\hat\alpha, \hat A^i] - i[\hat x^i, \hat\alpha].$$

Tensors can be formed:

$$\hat T^{ij} = [\hat X^i, \hat X^j] - i\hat\theta^{ij}(\hat X),$$

where $\hat\theta^{ij}(\hat X)$ is the right hand side of the commutator of the coordinates with the coordinates replaced by covariant coordinates. The tensors are enveloping algebra-valued as well. The transformation law of these tensors is:

$$\delta \hat T^{ij} = i[\hat\alpha, \hat T^{ij}].$$

This now allows us to proceed formally but it does not tell us how to make contact with real or complex numbers through which we communicate with nature. Finally we will have to study the representations of the algebraic elements in a Hilbert space to relate the formalism to measurements. To formulate the gauge theory in terms of fields that depend on elements of $\mathbb{R}^n$ we can, however, use the formalism of Weyl quantization, without studying the representation theory first. The elements of $\mathbb{R}^n$ we shall denote by $(x^1, \dots, x^n)$.
Weyl quantization

With a function $f(x^1, \dots, x^n)$ defined on $\mathbb{R}^n$ we can associate an element of the algebra. First Fourier transform the function $f$:

$$\tilde f(k_1, \dots, k_n) = \frac{1}{(2\pi)^{n/2}} \int d^n x\, e^{-i\sum_j k_j x^j} f(x^1, \dots, x^n)$$

and define the corresponding element of the algebra by:

$$W(f) = \frac{1}{(2\pi)^{n/2}} \int d^n k\, e^{i\sum_j k_j \hat x^j} \tilde f(k_1, \dots, k_n).$$

The element $\hat x$ of the algebra replaces the variable $x$ in $f$ in the most symmetric way – this follows from the power series expansion of the exponential. The algebra elements obtained that way can be multiplied. If these products can be associated again with a classical function we write the product as follows:

$$W(f)W(g) = W(f \diamond g).$$

In this case the symbol $f \diamond g$ is defined by the left hand side of this equation. It will be linear in $f$ and $g$. We can write the left-hand side more explicitly:

$$W(f)W(g) = \frac{1}{(2\pi)^n} \int d^n k\, d^n p\, e^{ik\cdot\hat x} e^{ip\cdot\hat x} \tilde f(k)\tilde g(p).$$

For the canonical structure and the Lie structure this product exists and can be calculated using the Baker–Campbell–Hausdorff formula. The function $f \diamond g$ exists.

Canonical case:

$$e^{ik\cdot\hat x} e^{ip\cdot\hat x} = e^{i(k+p)\cdot\hat x - \frac{i}{2} k\cdot\theta\cdot p},$$

$$f \diamond g = e^{\frac{i}{2}\frac{\partial}{\partial x^i}\theta^{ij}\frac{\partial}{\partial y^j}} f(x)g(y)\Big|_{y\to x}.$$

We have obtained the Moyal–Weyl ∗ product $f * g$ [9–14]. For the Lie structure the product of the two exponentials corresponds to group multiplication

$$e^{ik\cdot\hat x} e^{ip\cdot\hat x} = e^{i(k+p+\frac{1}{2}g(k,p))\cdot\hat x},$$

$$f \diamond g = e^{\frac{i}{2} x^i g_i\left(i\frac{\partial}{\partial y},\, i\frac{\partial}{\partial z}\right)} f(y)g(z)\Big|_{y\to x,\ z\to x}.$$
For the quantum space structure the Baker–Campbell–Hausdorff formula is not manageable. Moreover, Weyl quantization using the Fourier transformation works well for functions that have a well-behaved Fourier transform. Polynomials are not among these functions. In our algebraic setting fields have been defined as elements of Ax , thus essentially are polynomials and power series. In this setting it is more natural to start from a more algebraic interpretation of Weyl quantization.
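The canonical-case formula above can be tried out on polynomials, where the exponential series terminates. The following is only a minimal sketch under stated assumptions: it covers two variables with $[\hat x, \hat y] = i\theta$ (so $\theta^{xy} = -\theta^{yx} = \theta$), stores a polynomial as a dictionary `{(i, j): coeff}` meaning $\sum \mathrm{coeff}\cdot x^i y^j$, and the helper names `deriv`, `degree` and `moyal` are illustrative, not taken from the paper.

```python
from math import comb, factorial

def deriv(poly, var, times):
    """Differentiate `times` times w.r.t. x (var=0) or y (var=1)."""
    for _ in range(times):
        nxt = {}
        for powers, c in poly.items():
            e = powers[var]
            if e > 0:
                lowered = list(powers)
                lowered[var] -= 1
                key = tuple(lowered)
                nxt[key] = nxt.get(key, 0) + e * c
        poly = nxt
    return poly

def degree(poly):
    """Total degree of a polynomial; -1 for the zero polynomial."""
    return max((i + j for (i, j) in poly), default=-1)

def moyal(f, g, theta):
    """Moyal-Weyl product of two polynomials for [x^, y^] = i*theta.

    Expands exp((i*theta/2)(d_x (x) d_y - d_y (x) d_x)); the n-th term needs
    n derivatives on each factor, so the series stops at min(deg f, deg g).
    """
    out = {}
    for n in range(min(degree(f), degree(g)) + 1):
        pref = (0.5j * theta) ** n / factorial(n)
        for k in range(n + 1):
            left = deriv(deriv(f, 0, n - k), 1, k)    # d_x^(n-k) d_y^k f
            right = deriv(deriv(g, 1, n - k), 0, k)   # d_y^(n-k) d_x^k g
            weight = pref * (-1) ** k * comb(n, k)
            for (a, b), cl in left.items():
                for (c, d), cr in right.items():
                    key = (a + c, b + d)
                    out[key] = out.get(key, 0) + weight * cl * cr
    return {key: val for key, val in out.items() if val != 0}
```

With `x = {(1, 0): 1}` and `y = {(0, 1): 1}`, the difference between `moyal(x, y, theta)` and `moyal(y, x, theta)` is the constant $i\theta$, mirroring the defining commutator.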
We study algebras which have a basis. We denote the elements forming that basis by $\beta^\nu$. Any element of the algebra can be expressed in this basis with certain coefficient functions:

$$f(\hat x^1, \dots, \hat x^n) = \sum_\nu f_\nu \beta^\nu, \qquad g(\hat x^1, \dots, \hat x^n) = \sum_\nu g_\nu \beta^\nu,$$

and so can the product:

$$f \cdot g = \sum_\nu d_\nu \beta^\nu.$$

If we symbolically denote the elements by $\{f_\nu\}$, $\{g_\nu\}$ and $\{d_\nu\}$ respectively, i.e., by their coefficient functions, we write

$$\{d_\nu\} = \{f_\nu\} \diamond \{g_\nu\}.$$

The algebraic properties are mapped to the coefficient functions and the non-commutative structure is reflected in the diamond product $\diamond$.

A special role is played by algebras that have the Poincaré–Birkhoff–Witt property. Poincaré–Birkhoff–Witt tells us that, considered as a graded algebra, the subspace of polynomials of fixed degree has the same dimension as the corresponding subspace of polynomials in commuting variables. We then choose a particular basis in this grading and characterize an arbitrary element of the algebra by the coefficient function of this element when expanded in this basis. Then we construct the function of commuting variables by replacing the basis of non-commuting variables by the corresponding basis elements of the commuting variables. This way we obtain a unique map of elements of the algebra into the set of real analytic functions and vice versa.

Let me illustrate this by the simplest example of the canonical case. We consider two variables: $x$, $y$ and $\hat x\hat y - \hat y\hat x = i\theta$. To the constant $c$ and the elements $\hat x$, $\hat y$ of $\mathcal{A}_x$ correspond the constant $c$ and the monomials $x$, $y$. For the basis in $\mathcal{A}_x$ we choose the symmetric monomials. The basis elements for the polynomials of second degree are: $\hat x\hat x$, $\frac{1}{2}(\hat x\hat y + \hat y\hat x)$, $\hat y\hat y$. We obtain the map

$$\hat x\hat x \longleftrightarrow xx, \qquad \tfrac{1}{2}(\hat x\hat y + \hat y\hat x) \longleftrightarrow xy, \qquad \hat y\hat y \longleftrightarrow yy.$$

For the elements $\hat x\hat y$ and $\hat y\hat x$ of $\mathcal{A}_x$ this implies:

$$\hat x\hat y = \tfrac{1}{2}(\hat x\hat y + \hat y\hat x) + \tfrac{1}{2}i\theta \longleftrightarrow xy + \tfrac{1}{2}i\theta,$$

$$\hat y\hat x = \tfrac{1}{2}(\hat y\hat x + \hat x\hat y) - \tfrac{1}{2}i\theta \longleftrightarrow xy - \tfrac{1}{2}i\theta.$$
In the language of the Weyl quantization that maps the functions $f(x)$ to elements of $\mathcal{A}_x$ this is:

$$W(c) = c, \quad W(x) = \hat x, \quad W(y) = \hat y, \quad W(xy) = \tfrac{1}{2}(\hat x\hat y + \hat y\hat x), \quad W(x^2) = \hat x\hat x, \quad W(y^2) = \hat y\hat y.$$

The multiplication law is easily obtained:

$$W(x)W(y) = \hat x\hat y = W\!\left(xy + \tfrac{1}{2}i\theta\right), \qquad x \diamond y = xy + \tfrac{i}{2}\theta.$$

This is exactly the result we obtain from the Moyal–Weyl ∗ product. With some combinatorics we can show that the algebraic approach using the completely symmetrized spaces exactly reproduces the diamond product of the Weyl quantization for the canonical and Lie structures.

Let me now treat the Manin plane $\hat x\hat y = q\,\hat y\hat x$, $q \in \mathbb{R}$, $q > 1$. The Poincaré–Birkhoff–Witt property holds, and what is more, we can choose any ordering for a basis. We choose the $\hat x\hat y$ ordering for simplicity.

$$W(c) = c, \quad W(x) = \hat x, \quad W(y) = \hat y, \quad W(f(x, y)) = {:}f(\hat x, \hat y){:}.$$

The dots :: mean that in the power series expansion of the real analytic function $f$ each monomial in $x$, $y$ has to be replaced by the ordered monomial in $\hat x\hat y$. We derive the multiplication law:

$$W(x^r y^s)\,W(x^n y^m) = W(q^{-sn} x^{r+n} y^{s+m}).$$

This immediately leads to the diamond product

$$f \diamond g\,(x, y) = q^{-x'\frac{\partial}{\partial x'}\, y\frac{\partial}{\partial y}} f(x, y)\, g(x', y')\Big|_{x'\to x,\ y'\to y}.$$

This should serve as an example for the general case. The Poincaré–Birkhoff–Witt property holds for all the quantum spaces mentioned above. Therefore we have a unique and invertible map of the fields as elements of $\mathcal{A}_x$ to the fields defined as functions of the commutative variables $x^1 \cdots x^N$. Actually, monomials in any well defined ordering form a basis. Among them are the completely symmetrized polynomials.

$$f(x^1, \dots, x^N) \longleftrightarrow \hat W(f).$$

In the case of algebras with the Poincaré–Birkhoff–Witt property when an arbitrary ordering is allowed we shall denote the (diamond) product as ∗ (star) product.
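The Manin-plane multiplication law can be checked mechanically by reordering words. The sketch below is an illustration only: it assumes the single relation $\hat y\hat x = q^{-1}\hat x\hat y$ (equivalent to $\hat x\hat y = q\,\hat y\hat x$), takes $q$ as an exact `Fraction`, and uses the hypothetical helper names `normal_order` and `multiply_ordered`.

```python
from fractions import Fraction

def normal_order(word, q):
    """Rewrite a word in the letters 'x', 'y' as coeff * x^a y^b.

    Every adjacent swap 'yx' -> 'xy' costs one factor 1/q, because
    x y = q y x is the same statement as y x = (1/q) x y.
    """
    coeff = Fraction(1)
    letters = list(word)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(letters) - 1):
            if letters[i] == "y" and letters[i + 1] == "x":
                letters[i], letters[i + 1] = "x", "y"
                coeff /= q
                swapped = True
    a = letters.count("x")
    return coeff, a, len(letters) - a

def multiply_ordered(r, s, n, m, q):
    """Ordered form of the product (x^r y^s)(x^n y^m)."""
    return normal_order("x" * r + "y" * s + "x" * n + "y" * m, q)
```

Moving $s$ letters $y$ past $n$ letters $x$ takes $s\cdot n$ swaps, which is where the factor $q^{-sn}$ in the multiplication law comes from.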
Our aim is to formulate a dynamics for a physical system with non-commuting coordinates in terms of fields that depend on commuting variables. These fields we shall call physical fields. The non-commuting structure is encoded in the ∗ product as the multiplication law for the physical fields.

To formulate a dynamics the algebra has to be enlarged by derivatives. This can be done on purely algebraic grounds, derivatives being introduced as elements of an algebra and the Leibniz rule taking the role of relations. The Leibniz rule has to be defined in such a way that it does not lead to new relations for the coordinates. This is what we mean when we say that the derivatives have to be consistent with the non-commutative structure of the coordinates. For the quantum plane such derivatives have been introduced in ref. [15]. Following the same strategy derivatives can be defined for the canonical structure as well. For the canonical case the usual Leibniz rule will be consistent:

$$\hat\partial_\rho \hat x^\mu = \delta^\mu_\rho + \hat x^\mu \hat\partial_\rho.$$

For the canonical structure the expression $\hat x^\alpha - i\theta^{\alpha\rho}\hat\partial_\rho$ commutes with all coordinates. It can be used to define a relation on the $\hat x$, $\hat\partial$ algebra. For invertible $\theta$ this amounts to defining derivatives as follows:

$$\hat\partial_\rho = -i\theta^{-1}_{\rho\mu}\hat x^\mu.$$
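The two claims in this paragraph can be verified in a few lines. The derivation below is a sketch that uses only the canonical relation and the Leibniz rule as stated in the text, with $\theta^{-1}_{\rho\alpha}\theta^{\alpha\mu} = \delta^\mu_\rho$ for invertible $\theta$.

```latex
% Assumes [\hat x^\mu, \hat x^\nu] = i\theta^{\mu\nu} and
% [\hat\partial_\rho, \hat x^\mu] = \delta^\mu_\rho (the Leibniz rule above).
\begin{align*}
[\hat x^\alpha - i\theta^{\alpha\rho}\hat\partial_\rho,\ \hat x^\mu]
  &= i\theta^{\alpha\mu} - i\theta^{\alpha\rho}\delta^\mu_\rho = 0, \\
\hat x^\alpha - i\theta^{\alpha\rho}\hat\partial_\rho = 0
  \quad&\Longrightarrow\quad
  \hat\partial_\rho = -i\,\theta^{-1}_{\rho\alpha}\,\hat x^\alpha, \\
[\hat\partial_\rho,\ \hat x^\mu]
  &= -i\,\theta^{-1}_{\rho\alpha}\,[\hat x^\alpha, \hat x^\mu]
   = -i\,\theta^{-1}_{\rho\alpha}\, i\theta^{\alpha\mu}
   = \delta^\mu_\rho .
\end{align*}
```

The last line shows that the derivative defined through the coordinates reproduces the Leibniz rule, so no new relation among the coordinates is introduced.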
As a consequence we do not have to enlarge the algebra by derivatives. In the following we shall restrict the discussion to the canonical structure.

Gauge theories

As an example we treat non-abelian gauge theories on non-commutative spaces. Such gauge theories cannot be formulated with Lie algebra-valued infinitesimal transformations and consequently not with Lie algebra-valued gauge fields. The reason is that in the composition of infinitesimal transformations commutators and anticommutators of the generators of the group appear. The enveloping algebra of the Lie algebra is the proper setting for such gauge theories. An enveloping algebra-valued infinitesimal transformation will in general depend on infinitely many parameters, an enveloping algebra-valued gauge field on infinitely many component fields. A dynamics based on infinitely many fields is not very attractive; it is, however, possible to define enveloping algebra-valued infinitesimal transformations and gauge fields that depend on a finite number of parameters and component fields only. The construction of such transformations is based on the Seiberg–Witten map [3]. This map allows us to construct enveloping algebra-valued gauge fields $A^\nu$ and gauge parameters $\alpha$ parametrized by a finite number of gauge fields $a^a_\nu(x)$ and parameters $\alpha^1_a(x)$ and their derivatives [8, 17]. The gauge fields $a^a_\nu(x)$ will transform like the usual gauge fields. The gauge field $A^\nu$ will transform as outlined above. The transformation law of the tensor $T^{ij}$ will be unchanged. These tensors can then be used to build invariant Lagrangians for the gauge fields $a^a_\nu(x)$. These Lagrangians will be different from the usual gauge-invariant Lagrangians, so will be the coupling of matter fields to the gauge fields because $A^\nu$ has to be used in the gauge-covariant coupling.
We are now going to outline the construction of such a theory.

Non-abelian: $[T^a, T^b] = i f^{ab}{}_{c}\, T^c$

Non-commutative: $[\hat x^\mu, \hat x^\nu] = i\theta^{\mu\nu}$

It is natural to consider both algebras simultaneously:

$$\hat z^i = \{\hat x^1, \dots, \hat x^N, T^1, \dots, T^M\}, \qquad \mathcal{A}_z = \frac{\mathbb{C}[[\hat z^1, \dots, \hat z^{N+M}]]}{\mathcal{R}}.$$

The ∗ product formalisms developed with the help of the Baker–Campbell–Hausdorff formula can be applied. We introduce the commuting variables $z^i = \{x^1, \dots, x^N, t^1, \dots, t^M\}$ and find the ∗ product:

$$(F * G)(z) = e^{\frac{i}{2}\left(\theta^{\mu\nu}\frac{\partial}{\partial x'^\mu}\frac{\partial}{\partial x''^\nu} + t^a g_a\left(i\frac{\partial}{\partial t'},\, i\frac{\partial}{\partial t''}\right)\right)} F(x', t')\, G(x'', t'')\Big|_{x'\to x,\ x''\to x,\ t'\to t,\ t''\to t}.$$

We define the “physical” fields $\psi(x)$ by the correspondence $\hat\psi(\hat x) \longleftrightarrow \psi(x)$. They span a representation of the gauge group and depend on the commuting coordinates $x^i$. The transformation parameters and the gauge field remain enveloping algebra-valued

$$\hat A^\nu(\hat z) \longleftrightarrow A^\nu(z), \qquad \hat\alpha(\hat z) \longleftrightarrow \alpha(z).$$

It is only via the Seiberg–Witten map that the physical fields $a^a_\nu$ enter. In the ∗ product formulation the transformation of the gauge field is:

$$\delta A^\nu = -i[x^\nu \stackrel{*}{,} \alpha] + i[\alpha \stackrel{*}{,} A^\nu] = \theta^{\nu\mu}\partial_\mu \alpha + i[\alpha \stackrel{*}{,} A^\nu].$$

This, and the definition of the derivative $\hat\partial_\mu$, justifies the ansatz $A^\nu = \theta^{\nu\rho} V_\rho$, which leads to

$$\delta V_\rho = \partial_\rho \alpha + i[\alpha \stackrel{*}{,} V_\rho].$$

We expand this equation in powers of $\theta$ and find a solution to zeroth order that is linear in $t$:
Zeroth order:

$$\alpha = \alpha^1_a t^a, \qquad V_\rho = a^1_{\rho,a} t^a.$$

The gauge field $a^1_{\rho,a}$ transforms in the usual way:

$$\delta a^1_{\rho,a} = \partial_\rho \alpha^1_a - f^{bc}{}_{a}\, \alpha^1_b\, a^1_{\rho,c}.$$

In first order in $\theta$ we find a solution that is of second order in $t$:

First order:

$$\alpha = \alpha^1_a t^a + \alpha^2_{ab} t^a t^b + \dots, \qquad V_\rho = a^1_{\rho,a} t^a + a^2_{\rho,ab} t^a t^b + \dots.$$

To this order the transformation law of $V_\rho$ will be satisfied with the following choice for $\alpha^{(2)}$ and $a^{(2)}$:

$$\alpha^2_{ab} t^a t^b = \tfrac{1}{2}\theta^{\nu\mu}\, \partial_\nu \alpha^1_a\, a^1_{\mu,b}\, t^a t^b,$$

$$a^2_{\rho,ab} t^a t^b = -\tfrac{1}{2}\theta^{\nu\mu}\, a^1_{\nu,a}\left(\partial_\mu a^1_{\rho,b} + F^1_{\mu\rho,b}\right) t^a t^b,$$

$$F^1_{\nu\mu,b} = \partial_\nu a^1_{\mu,b} - \partial_\mu a^1_{\nu,b} + f^{cd}{}_{b}\, a^1_{\nu,c}\, a^1_{\mu,d}.$$

This procedure can be generalized to all the higher powers in $\theta$. We will find solutions where the coefficients of $\theta^{n-1}$, $\alpha^n$ and $a^n$, are polynomials of degree $n$ in $t$. They will be functions of $\alpha^1_a$ and $a^1_{\rho,a}$ and their derivatives. This will be true to all orders in $\theta$ as a consequence of the existence of the Seiberg–Witten map [5–7].

It is natural to introduce the field strength $F_{\nu\rho}$ in analogy to the tensor $T_{\nu\rho}$:

$$F_{\kappa\lambda} = \partial_\kappa V_\lambda - \partial_\lambda V_\kappa - iV_\kappa * V_\lambda + iV_\lambda * V_\kappa.$$

It will transform under the restricted transformations like a tensor as well:

$$\delta_{\alpha^1} F_{\kappa\lambda} = i[\alpha \stackrel{*}{,} F_{\kappa\lambda}].$$

To formulate the dynamics it is desirable to do it in the Lagrangian formalism. An integral has to be defined for this purpose. For the canonical structure an integral can be defined:

$$\int \hat\phi = \int d^N x\, \phi(x).$$

We have restricted the integral to the $x$-coordinates and forget about the $t$-coordinates for a moment. For the integral over the product of fields the ∗ product has to be used:

$$\int \hat\psi\hat\phi = \int d^N x\, \psi(x) * \phi(x).$$

From the definition of the ∗ product follows

$$\int \hat\psi\hat\phi = \int \hat\phi\hat\psi$$
and Stokes’ theorem

$$\int [\hat\partial_l, \hat\psi] = \int d^N x\, \frac{\partial}{\partial x^l}\psi = 0.$$

A natural choice for the Lagrangian of a gauge theory is

$$L = \tfrac{1}{4}\, \mathrm{Tr}\, F_{\alpha\beta} * F^{\alpha\beta}.$$

The trace is to be taken over the $T$-matrices which replace the coordinate $t$. This action will be invariant under gauge transformations

$$\delta L = i[\alpha \stackrel{*}{,} L],$$

as a consequence of the properties of the trace and the integration discussed above. To first order in $\theta$ the following terms appear in the field strength:

$$F_{\kappa\lambda} = F^1_{\kappa\lambda,a} T^a + \tfrac{1}{2}\theta^{\mu\nu}\,(T^a T^b + T^b T^a)\left[F^1_{\kappa\mu,a} F^1_{\lambda\nu,b} - \tfrac{1}{2}\, a^1_{\mu,a}\left(2\partial_\nu F^1_{\kappa\lambda,b} + a^1_{\nu,c} F^1_{\kappa\lambda,d}\, f^{cd}{}_{b}\right)\right].$$

This should serve as an example of how a non-commutative space-time structure manifests itself in the couplings of a gauge theory, formulated on ordinary space in terms of ordinary fields. The consequences of such couplings for the field theoretical properties of the model as well as for the phenomenological properties remain to be investigated.

References

1. Madore, J., Schraml, S., Schupp, P. and Wess, J.: Gauge theory on noncommutative spaces. Eur. Phys. J. C 16, 161 (2000), hep-th/0001203
2. Connes, A., Douglas, M.R., Schwarz, A.: Noncommutative Geometry and Matrix Theory: Compactification on Tori. JHEP 9802, 003 (1998), hep-th/9711162
3. Seiberg, N. and Witten, E.: String theory and noncommutative geometry. JHEP 9909, 032 (1999), hep-th/9908142
4. Faddeev, L., Reshetikhin, N. and Takhtajan, L.: Quantization of Lie groups and Lie algebras. Leningrad Math. J. 1, 193 (1990)
5. Jurčo, B., Schupp, P.: Noncommutative Yang-Mills from equivalence of star products. Eur. Phys. J. C 14, 367 (2000), hep-th/0001032
6. Jurčo, B., Schupp, P. and Wess, J.: Noncommutative gauge theory for Poisson manifolds. Nucl. Phys. B 584, 784 (2000), hep-th/0005005
7. Jurčo, B., Schupp, P. and Wess, J.: Nonabelian noncommutative gauge theory and Seiberg–Witten map. In preparation, hep-th/0102129
8. Jurčo, B., Schraml, S., Schupp, P. and Wess, J.: Enveloping algebra-valued gauge transformations for non-abelian gauge groups on non-commutative spaces. Eur. Phys. J. C 17, 521 (2000), hep-th/0006246
9. Weyl, H.: Quantenmechanik und Gruppentheorie. Z. Physik 46, 1 (1927); The theory of groups and quantum mechanics. New York: Dover, 1931, translated from Gruppentheorie und Quantenmechanik, Leipzig: Hirzel Verlag, 1928
10. Wigner, E.P.: Quantum corrections for thermodynamic equilibrium. Phys. Rev. 40, 749 (1932)
11. Moyal, J.E.: Quantum mechanics as a statistical theory. Proc. Cambridge Phil. Soc. 45, 99 (1949)
12. Bayen, F., Flato, M., Fronsdal, C., Lichnerowicz, A., Sternheimer, D.: Deformation theory and quantization. I. Deformations of symplectic structures. Ann. Physics 111, 61 (1978)
13. Kontsevitch, M.: Deformation quantization of Poisson manifolds, I. q-alg/9709040
14. Sternheimer, D.: Deformation Quantization: Twenty Years After. math/9809056
15. Wess, J. and Zumino, B.: Covariant differential calculus on the quantum hyperplane. Nucl. Phys. Proc. Suppl. B 18, 302 (1991)
16. Wess, J.: q-deformed Heisenberg Algebras. In: H. Gausterer, H. Grosse and L. Pittner, eds., Proceedings of the 38. Internationale Universitätswochen für Kern- und Teilchenphysik, Lect. Notes in Phys. 543, Berlin–Heidelberg–New York: Springer-Verlag, 2000, Schladming, January 1999, math-ph/9910013
17. Bonora, L., Schnabl, M., Sheikh-Jabbari, M.M. and Tomasiello, A.: Noncommutative SO(n) and Sp(n) gauge theories. hep-th/0006091

Communicated by W. Zimmermann
Commun. Math. Phys. 219, 259 – 270 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Classical Versions of q-Gaussian Processes: Conditional Moments and Bell’s Inequality

Wlodzimierz Bryc

Department of Mathematics, University of Cincinnati, PO Box 210025, Cincinnati, OH 45221-0025, USA. E-mail: [email protected]

Received: 18 September 2000 / Accepted: 17 November 2000
Abstract: We show that classical processes corresponding to operators which satisfy a q-commutative relation have linear regressions and quadratic conditional variances. From this we deduce that Bell’s inequality for their covariances can be extended from $q = -1$ to the entire range $-1 \le q < 1$.

1. Introduction

In this paper we consider a linear mapping $H \ni f \mapsto a_f \in \mathcal{B}$ from the real Hilbert space $H$ into the algebra $\mathcal{B}$ of bounded operators acting on a complex Hilbert space which satisfies the q-commutation relations

$$a_f a_g^* - q\, a_g^* a_f = \langle f, g\rangle I, \tag{1}$$

and $a_f \Omega = 0$ for a vacuum vector $\Omega$. This defines a non-commutative stochastic process $X_f = a_f + a_f^*$, first studied in [5], which following [2] we call the q-Gaussian process. For different values of $q$, these processes interpolate between the bosonic ($q = 1$) and fermionic ($q = -1$) processes, and include free processes of Voiculescu [7] ($q = 0$).

One of the basic problems arising in this context is the existence of the classical versions of q-Gaussian processes, see Definition 2. For $q = 1$, these are the classical Gaussian processes with the covariances $\{\langle f, g\rangle\}_{f,g\in H}$. For $q = -1$, the classical versions are two-valued, so Bell’s inequality [1] shows that only some covariances may have the classical versions. In [5] classical versions were constructed for covariances corresponding to stationary two-valued Markov processes ($q = -1$). In [2, Prop. 3.9], the existence of such classical versions was proved for all $-1 < q < 1$ in the case where the q-Gaussian process is Markovian (which can be characterized in terms of the covariance function).

The situation for other covariances remained open in [2] and it was unclear which q-Gaussian processes have no classical realizations. This issue is addressed in the present
paper. Using a formula for conditional variances of classical versions we derive a constraint on the covariance which extends one of the Bell’s inequalities from $q = -1$ to general $-1 \le q < 1$. The inequality implies that there are covariances such that the corresponding non-commutative q-Gaussian processes cannot have classical versions over the entire range $-1 \le q < 1$. Since $q$ interpolates between the values $q = -1$, where classical versions may fail to exist and $q = 1$, where the classical versions always exist, it is interesting that there is a version of Bell’s inequality which does not depend on $q$. The proof relies on formulas for conditional moments of the first two orders, which are of independent interest. Computations to derive them were possible thanks to recent advances in the Fock space representation of q-commutation relations (1), see [2, 3].

2. Preliminaries

This section introduces the Fock space representation of q-Gaussian processes, and states known results in the form convenient for us. It is based on [2].

2.1. Notation. Throughout the paper, $q$ is a fixed parameter and $-1 < q < 1$. For $n = 0, 1, 2, \dots$ we define q-integers $[n]_q := \frac{1-q^n}{1-q}$. The q-factorials are $[n]_q! := [1]_q [2]_q \dots [n]_q$, with the convention $[0]_q! := 1$. The q-Hermite polynomials are defined by the recurrence

$$x H_n(x) = H_{n+1}(x) + [n]_q H_{n-1}(x), \quad n \ge 0 \tag{2}$$

with $H_{-1}(x) := 0$, $H_0(x) := 1$. These polynomials are orthogonal with respect to the unique absolutely continuous probability measure $\nu_q(dx) = f_q(x)\,dx$ supported on $[-2/\sqrt{1-q},\ 2/\sqrt{1-q}]$, where density $f_q(x)$ has explicit product expansion, see [2, Theorem 1.10] or [6]; the second moments of q-Hermite polynomials are

$$\int_{-2/\sqrt{1-q}}^{2/\sqrt{1-q}} (H_n(x))^2\, \nu_q(dx) = [n]_q!.$$
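The q-integers, q-factorials and recurrence (2) translate directly into code. The sketch below is illustrative only: polynomials are stored as coefficient lists (index = power of $x$), $q$ is passed as an exact `Fraction`, and the function names `q_int`, `q_factorial` and `q_hermite` are not from the paper.

```python
from fractions import Fraction

def q_int(n, q):
    """[n]_q = 1 + q + ... + q^(n-1); equals (1 - q^n)/(1 - q) for q != 1."""
    return sum(q ** k for k in range(n))

def q_factorial(n, q):
    """[n]_q! = [1]_q [2]_q ... [n]_q, with [0]_q! = 1."""
    out = 1
    for k in range(1, n + 1):
        out *= q_int(k, q)
    return out

def q_hermite(n, q):
    """Coefficients of H_n obtained from x H_k = H_{k+1} + [k]_q H_{k-1}."""
    prev, cur = [0], [1]                  # H_{-1} = 0, H_0 = 1
    for k in range(n):
        shifted = [0] + cur               # coefficients of x * H_k
        padded = prev + [0] * (len(shifted) - len(prev))
        nxt = [s - q_int(k, q) * p for s, p in zip(shifted, padded)]
        prev, cur = cur, nxt
    return cur
```

For example, the recurrence gives $H_2(x) = x^2 - 1$ and $H_3(x) = x^3 - (2+q)x$ for every $q$.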
In our notation we are suppressing the dependence of $H_n(x)$ on $q$.

2.2. q-Fock space. For a real Hilbert space $H$ with complexification $H_c = H \oplus iH$ we define its q-Fock space $\Gamma_q(H)$ as the closure of $\mathbb{C}\Omega \oplus \bigoplus_n H_c^{\otimes n}$, the linear span of vectors $f_1 \otimes \dots \otimes f_n$, in the scalar product

$$\langle f_1 \otimes \dots \otimes f_n \,|\, g_1 \otimes \dots \otimes g_m \rangle_q = \begin{cases} \sum_{\sigma\in S_n} q^{|\sigma|} \prod_{j=1}^n \langle f_j, g_{\sigma(j)}\rangle & \text{if } m = n \\ 0 & \text{if } m \ne n \end{cases}. \tag{3}$$

Here $\Omega$ is the vacuum vector, $S_n$ are permutations of $\{1, \dots, n\}$ and $|\sigma| = \#\{(i, j) : i < j,\ \sigma(i) > \sigma(j)\}$. For the proof that (3) indeed is non-negative definite, see [3].

Given the q-Fock space $\Gamma_q(H)$ and $f \in H$ we define the annihilation operator $a_f : \Gamma_q(H) \to \Gamma_q(H)$ and its $\langle\cdot|\cdot\rangle_q$-adjoint, the creation operator $a_f^* : \Gamma_q(H) \to \Gamma_q(H)$ as follows:

$$a_f \Omega := 0,$$

$$a_f\, f_1 \otimes \dots \otimes f_n := \sum_{j=1}^n q^{j-1} \langle f, f_j\rangle\, f_1 \otimes \dots \otimes f_{j-1} \otimes f_{j+1} \otimes \dots \otimes f_n, \tag{4}$$

and

$$a_f^* \Omega = f, \qquad a_f^*\, f_1 \otimes \dots \otimes f_n := f \otimes f_1 \otimes \dots \otimes f_n. \tag{5}$$
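Definitions (4) and (5) can be exercised on tensor words built from an orthonormal family $e_0, e_1, \dots$, where $\langle e_i, e_j\rangle = \delta_{ij}$. The following is a minimal sketch under that assumption: a vector is a dictionary `{word: coeff}`, a word is a tuple of basis indices with `()` standing for $\Omega$, and the function names are hypothetical. It checks the q-commutation relation (1) on sample vectors.

```python
from fractions import Fraction

def annihilate(i, vec, q):
    """Operator a_f of (4) for f = e_i: drop the j-th factor with weight q^(j-1)."""
    out = {}
    for word, c in vec.items():
        for j, letter in enumerate(word):   # enumerate starts at 0, so q**j = q^(j-1) in (4)
            if letter == i:
                rest = word[:j] + word[j + 1:]
                out[rest] = out.get(rest, 0) + c * q ** j
    return out

def create(i, vec):
    """Operator a_f^* of (5) for f = e_i: prepend e_i to every word."""
    return {(i,) + word: c for word, c in vec.items()}

def q_commutator(i, j, vec, q):
    """Apply a_{e_i} a_{e_j}^* - q a_{e_j}^* a_{e_i} to vec."""
    left = annihilate(i, create(j, vec), q)
    right = create(j, annihilate(i, vec, q))
    out = dict(left)
    for word, c in right.items():
        out[word] = out.get(word, 0) - q * c
    return {w: c for w, c in out.items() if c != 0}
```

By (1), `q_commutator(i, j, vec, q)` should return `vec` itself when `i == j` and the zero vector (an empty dictionary) otherwise.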
These operators are bounded, satisfy commutation relation (1), and $a_{f+g} = a_f + a_g$, see [3].

2.3. q-Gaussian processes. We now consider (non-commutative) random variables as the elements of the algebra $\mathcal{A}$ generated by the self-adjoint operators $X_f := a_f + a_f^*$, with vacuum expectation state $E : \mathcal{A} \to \mathbb{R}$ given by $E(X) = \langle \Omega | X\Omega\rangle_q$.

Definition 1. We will call $\{X(t) : t \in T\}$ a q-Gaussian (non-commutative) process indexed by $T$ if there are vectors $h(t) \in H$ such that $X(t) = X_{h(t)}$.

For a q-Gaussian process the covariance function $c_{t,s} := E(X_t X_s)$ becomes $c_{t,s} = \langle h(t), h(s)\rangle$. The Wick products $\psi(f_1 \otimes \dots \otimes f_n) \in \mathcal{A}$ are defined recurrently by $\psi(\Omega) := I$, and

$$\psi(f \otimes f_1 \otimes \dots \otimes f_n) := X_f\, \psi(f_1 \otimes \dots \otimes f_n) - \sum_{j=1}^n q^{j-1} \langle f, f_j\rangle\, \psi(f_1 \otimes \dots \otimes f_{j-1} \otimes f_{j+1} \otimes \dots \otimes f_n). \tag{6}$$

An important property of Wick products is that if $X = \psi(f_1 \otimes \dots \otimes f_n)$ then

$$X\Omega = f_1 \otimes \dots \otimes f_n. \tag{7}$$

We will also use the connection with q-Hermite polynomials. If $\|f\| = 1$ then

$$\psi(f^{\otimes n}) = H_n(X_f), \tag{8}$$

see [2, Prop. 2.9]. Formulas (3), (7), and (8) show that for a unit vector $f \in H$ we have

$$E\left((H_n(X_f))^2\right) = \sum_{\sigma\in S_n} q^{|\sigma|} = [n]_q!. \tag{9}$$

Thus $\nu_q$ is indeed the distribution of $X_f$. Our main use of the Wick product is to compute certain conditional expectations.
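The combinatorial identity behind (9), $\sum_{\sigma\in S_n} q^{|\sigma|} = [n]_q!$, can be confirmed by brute force for small $n$. This is only a numerical sanity check with hypothetical helper names, using exact `Fraction` arithmetic.

```python
from fractions import Fraction
from itertools import permutations

def inversions(sigma):
    """|sigma|: number of pairs i < j with sigma(i) > sigma(j)."""
    return sum(
        1
        for i in range(len(sigma))
        for j in range(i + 1, len(sigma))
        if sigma[i] > sigma[j]
    )

def inversion_sum(n, q):
    """Sum of q^|sigma| over all permutations of n elements."""
    return sum(q ** inversions(s) for s in permutations(range(n)))

def q_factorial(n, q):
    """[n]_q! built from [k]_q = 1 + q + ... + q^(k-1)."""
    out = Fraction(1)
    for k in range(1, n + 1):
        out *= sum(q ** i for i in range(k))
    return out
```

Comparing `inversion_sum(n, q)` with `q_factorial(n, q)` for a few values of `n` and a rational `q` in $(-1, 1)$ reproduces the equality in (9).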
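2.4. Conditional expectations. Recall that a (non-commutative) conditional expectation on the probability space $(\mathcal{A}, E)$ with respect to the subalgebra $\mathcal{B} \subset \mathcal{A}$ is a mapping $\mathcal{E} : \mathcal{A} \to \mathcal{B}$ such that

$$E(Y_1 X Y_2) = E(Y_1\, \mathcal{E}(X)\, Y_2) \quad \text{for all } X \in \mathcal{A},\ Y_1, Y_2 \in \mathcal{B}. \tag{10}$$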
We will study only algebras $\mathcal{B}$ generated by the identity and the finite number of random variables $X_{f_1}, \dots, X_{f_n}$. In this situation, we will use a more probabilistic notation: $E(X | X_{f_1}, \dots, X_{f_n}) := \mathcal{E}(X)$, $X \in \mathcal{A}$. In this setting conditional expectations are easily computed for $X$ given by Wick products. This important result comes from [2, Theorem 2.13].

Theorem 1. If $Y = \psi(g_1 \otimes \dots \otimes g_m)$, $X_1 = X_{f_1}, \dots, X_k = X_{f_k}$ for some $f_i, g_j \in H$ and $P : H \to H$ denotes orthogonal projection onto the span of $f_1, \dots, f_k$ then

$$E(Y | X_1, \dots, X_k) = \psi(P g_1 \otimes \dots \otimes P g_m).$$

The following formula is an immediate consequence of Theorem 1 and (8), and is implicit in [2, Proof of Theorem 4.6].

Corollary 1. If $X = X_f$, $Y = X_g$ with unit vectors $\|f\| = \|g\| = 1$ and $H_n$ is the $n$th q-Hermite polynomial, see (2), then

$$E(H_n(Y) | X) = \langle f, g\rangle^n H_n(X). \tag{11}$$
For a finite number of vectors $f_0, f_1, \dots, f_k \in H$, let $X_k := X_{f_k}$. These (non-commutative) random variables have linear regressions and constant conditional variances like the classical (commutative) Gaussian random variables.

Proposition 1.

$$E(X_0 | X_1, \dots, X_k) = \sum_{j=1}^k a_j X_j \tag{12}$$

and

$$E(X_0^2 | X_1, \dots, X_k) = \Big(\sum_{j=1}^k a_j X_j\Big)^2 + cI. \tag{13}$$
If $f_1, \dots, f_k \in H$ are linearly independent then the coefficients $a_j$, $c$ are uniquely determined by the covariance matrix $C = [c_{i,j}] := [\langle f_i, f_j\rangle]$. Notice that Eq. (13) can indeed be rewritten as the statement that conditional variance is constant,

$$\mathrm{Var}(X_0 | X_1, \dots, X_k) := E\big((X_0 - E(X_0 | X_1, \dots, X_k))^2 \,\big|\, X_1, \dots, X_k\big) = cI.$$

Proof. This follows from Theorem 1 and (6). Write the orthogonal projection of $f_0$ onto the span of $f_1, \dots, f_k$ as the linear combination $g = \sum_j a_j f_j$. Then

$$E(X_0 | X_1, \dots, X_k) = E(\psi(f_0) | X_1, \dots, X_k) = \psi(g) = \sum_j a_j \psi(f_j),$$

which proves (12). Similarly,

$$E(X_0^2 - \|f_0\|^2 I \,|\, X_1, \dots, X_k) = E(\psi(f_0 \otimes f_0) | X_1, \dots, X_k) = \psi(g \otimes g) = \Big(\sum_j a_j X_j\Big)^2 - \|g\|^2 I.$$

This proves (13) with $c = \|f_0\|^2 - \|g\|^2$. If $f_1, \dots, f_k \in H$ are linearly independent then the representation $g = \sum_j a_j f_j$ is unique.

To analyze standardized triplets in more detail we need the explicit form of the coefficients. (We omit the straightforward calculation.)

Corollary 2. If $X := X_f$, $Y := X_g$, $Z := X_h$ and $f, h \in H$ are linearly independent unit vectors, then

$$E(Y | X, Z) = aX + bZ, \tag{14}$$

$$E(Y^2 | X, Z) = (aX + bZ)^2 + cI, \tag{15}$$

where

$$a = \frac{\langle f, g\rangle - \langle g, h\rangle\langle f, h\rangle}{1 - \langle f, h\rangle^2}, \tag{16}$$

$$b = \frac{\langle g, h\rangle - \langle f, g\rangle\langle f, h\rangle}{1 - \langle f, h\rangle^2}. \tag{17}$$
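Formulas (16) and (17) are the solution of the normal equations for projecting $g$ onto the span of the unit vectors $f$ and $h$, and $c = \|g\|^2 - \|af + bh\|^2$ follows from the proof of Proposition 1. The sketch below evaluates them exactly; it is illustrative only, the function names are hypothetical, and the inputs are the three inner products (plus $\langle g, g\rangle$, defaulting to 1).

```python
from fractions import Fraction

def regression_coefficients(fg, gh, fh, gg=Fraction(1)):
    """Return (a, b, c): a, b from (16), (17) and c = <g,g> - ||a f + b h||^2.

    fg, gh, fh stand for <f,g>, <g,h>, <f,h>; f and h are unit vectors.
    """
    denom = 1 - fh ** 2
    a = (fg - gh * fh) / denom
    b = (gh - fg * fh) / denom
    c = gg - (a * a + 2 * a * b * fh + b * b)
    return a, b, c

def gram_det(fg, gh, fh, gg=Fraction(1)):
    """Determinant of the covariance matrix of (f, g, h), expanded along row 1.

    The matrix is [[1, fg, fh], [fg, gg, gh], [fh, gh, 1]].
    """
    return (gg - gh * gh) - fg * (fg - gh * fh) + fh * (fg * gh - gg * fh)
```

For any admissible inner products, `c` computed this way should coincide with `gram_det(...) / (1 - fh**2)`, the determinant expression for $c$ quoted in the text.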
Another calculation shows that $c = \det(C)/(1 - \langle f, h\rangle^2)$, where $C$ is the covariance matrix; in particular $c \ge 0$.

3. Conditional Moments of Classical Versions

We give the definition of a classical version which is convenient for bounded processes; for a more general definition, see [2, Def. 3.1].

Definition 2. A classical version of the process $X(t)$ indexed by $t \in T \subset \mathbb{R}$ is a stochastic process $\tilde X(t)$ defined on some classical probability space such that for any finite number of indexes $t_1 < t_2 < \dots < t_k$ and any polynomials $P_1, \dots, P_k$,

$$E\big(P_1(X(t_1)) P_2(X(t_2)) \dots P_k(X(t_k))\big) = E\big(P_1(\tilde X(t_1)) P_2(\tilde X(t_2)) \dots P_k(\tilde X(t_k))\big). \tag{18}$$

Here $E(\cdot)$ denotes the classical expected value given by Lebesgue integral with respect to the classical probability measure.

Our main interest is in finite index set $T = \{t_1, t_2, t_3\}$, where $t_1 < t_2 < t_3$. In this case we write $X := X(t_1)$, $Y := X(t_2)$, $Z := X(t_3)$. We say that an ordered triplet $(X, Y, Z)$ has a classical version $(\tilde X, \tilde Y, \tilde Z)$, if $E(P_1(X) P_2(Y) P_3(Z)) = E(P_1(\tilde X) P_2(\tilde Y) P_3(\tilde Z))$ for all polynomials $P_1, P_2, P_3$. The classical version of a non-commutative process is order-dependent, since the left-hand side of (18) may depend on the ordering of the variables, while the right-hand side does not. For a specific example in the context of q-Gaussian random variables, see [5, formulas (2.64) and (2.65)].
3.1. Triplets. All pairs $(X_f, X_g)$ of q-Gaussian random variables have classical versions because $E(X_f^m X_g^n) = E(X_f^n X_g^m)$ for all integer $m$, $n$; however, the classical version of a triplet may fail to exist. With this in mind we consider q-Gaussian triplets

$$X := X_f, \quad Y := X_g, \quad Z := X_h. \tag{19}$$

To simplify the notation we take unit vectors $\|f\| = \|g\| = \|h\| = 1$. We assume that there is a classical version $(\tilde X, \tilde Y, \tilde Z)$ of $(X, Y, Z)$, in this order.

From Corollary 2 we know that non-commutative random variables $X$, $Y$, $Z$ have linear regression and constant conditional variance. It turns out that the corresponding classical random variables $\tilde X$, $\tilde Y$, $\tilde Z$ also have linear regressions, while their conditional variances get perturbed into quadratic polynomials.

Theorem 2. If $(\tilde X, \tilde Y, \tilde Z)$ is a classical version of the q-Gaussian triplet (19) then

$$E(\tilde Y | \tilde X, \tilde Z) = a\tilde X + b\tilde Z, \tag{20}$$

$$E(\tilde Y^2 | \tilde X, \tilde Z) = A\tilde X^2 + B\tilde X\tilde Z + C\tilde Z^2 + D, \tag{21}$$

where $a$, $b$ are given by (16), (17),

$$A = \frac{ab(1-q)\langle f, h\rangle + a^2(1 - q\langle f, h\rangle^2)}{1 - q\langle f, h\rangle^2}, \tag{22}$$

$$B = \frac{ab(1+q)(1 - \langle f, h\rangle^2)}{1 - q\langle f, h\rangle^2}, \tag{23}$$

$$C = \frac{ab(1-q)\langle f, h\rangle + b^2(1 - q\langle f, h\rangle^2)}{1 - q\langle f, h\rangle^2}, \tag{24}$$

and

$$D = 1 - A - B\langle f, h\rangle - C. \tag{25}$$
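The coefficients (22)–(25) can be sanity-checked numerically. The sketch below is an illustration with hypothetical function names; `rho` stands for $\langle f, h\rangle$ and all inputs are exact `Fraction` values. It evaluates $A, B, C, D$ and the residuals of the three linear constraints that appear as (31), (32) and (33) in the proof. Formally setting $q = 1$ in the formulas (outside the range $-1 < q < 1$ assumed in the paper) gives $A = a^2$, $B = 2ab$, $C = b^2$, i.e. the constant conditional variance of the Gaussian case.

```python
from fractions import Fraction

def theorem2_coefficients(a, b, rho, q):
    """A, B, C, D as in (22)-(25); rho stands for <f, h>."""
    den = 1 - q * rho ** 2
    A = (a * b * (1 - q) * rho + a * a * den) / den
    B = a * b * (1 + q) * (1 - rho ** 2) / den
    C = (a * b * (1 - q) * rho + b * b * den) / den
    D = 1 - A - B * rho - C
    return A, B, C, D

def residuals(a, b, rho, q):
    """Left minus right side of the constraints (31), (32), (33)."""
    A, B, C, _ = theorem2_coefficients(a, b, rho, q)
    r31 = A * rho ** 2 + B * rho + C - (a * a * rho ** 2 + 2 * a * b * rho + b * b)
    r32 = A + B * rho + C * rho ** 2 - (a * a + 2 * a * b * rho + b * b * rho ** 2)
    r33 = ((1 + q) * rho * (A + C) + B * (q * rho ** 2 + 1)
           - (1 + q) * ((a * a + b * b) * rho + a * b * (rho ** 2 + 1)))
    return r31, r32, r33
```

All three residuals vanish identically in $a$, $b$, $\langle f,h\rangle$ and $q$, which is the content of the last step of the proof of Theorem 2.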
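The proof relies on the following technical result.

Lemma 1. If $H_n$, $H_m$ are q-Hermite polynomials given by (2), then

$$E(H_n(X) Z X H_m(Z)) = \begin{cases} \langle f, h\rangle^{n+1} [n+2]_q! & \text{if } m = n+2 \\ \langle f, h\rangle^{n-1} [n]_q! & \text{if } m = n-2 \\ \langle f, h\rangle^{n-1} \big( ([n]_q + 1)\langle f, h\rangle^2 + q[n]_q \big) [n]_q! & \text{if } m = n \\ 0 & \text{otherwise} \end{cases}, \tag{26}$$

$$E(H_n(X) X Z H_m(Z)) = \begin{cases} \langle f, h\rangle^{n+1} [n+2]_q! & \text{if } m = n+2 \\ \langle f, h\rangle^{n-1} [n]_q! & \text{if } m = n-2 \\ \langle f, h\rangle^{n-1} \big( [n+1]_q \langle f, h\rangle^2 + [n]_q \big) [n]_q! & \text{if } m = n \\ 0 & \text{otherwise} \end{cases}, \tag{27}$$

$$E(H_n(X) X^2 H_m(Z)) = E(H_m(X) Z^2 H_n(Z)) = \begin{cases} \langle f, h\rangle^{n+2} [n+2]_q! & \text{if } m = n+2 \\ \langle f, h\rangle^{n-2} [n]_q! & \text{if } m = n-2 \\ \langle f, h\rangle^{n} \big( [n+1]_q + [n]_q \big) [n]_q! & \text{if } m = n \\ 0 & \text{otherwise} \end{cases}, \tag{28}$$

$$E(H_n(X) H_m(Z)) = \begin{cases} \langle f, h\rangle^{n} [n]_q! & \text{if } m = n \\ 0 & \text{otherwise} \end{cases}. \tag{29}$$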
Proof of Theorem 2. Since $\tilde X$, $\tilde Y$, $\tilde Z$ are bounded random variables, to prove (20) we need only to verify that for arbitrary polynomials $P$, $Q$,

$$E\big(P(\tilde X)\, \tilde Y\, Q(\tilde Z)\big) = E\big(P(\tilde X)(a\tilde X + b\tilde Z) Q(\tilde Z)\big).$$

This is equivalent to $E(P(X)\, Y\, Q(Z)) = E(P(X)(aX + bZ) Q(Z))$, see (18). The latter follows from (14) and (10), proving (20).

To prove (21), we verify that for arbitrary polynomials $P$, $Q$ we have

$$E\big(P(\tilde X)\, \tilde Y^2\, Q(\tilde Z)\big) = E\big(P(\tilde X)(A\tilde X^2 + B\tilde X\tilde Z + C\tilde Z^2 + D) Q(\tilde Z)\big).$$

By definition (18), this is equivalent to

$$E\big(P(X)\, Y^2\, Q(Z)\big) = E\big(P(X)(AX^2 + BXZ + CZ^2 + D) Q(Z)\big). \tag{30}$$

It suffices to show that (30) holds true when $P = H_n$ and $Q = H_m$ are the q-Hermite polynomials defined by (2). Formula (15) implies that the left-hand side of (30) is given by

$$cE(H_n(X) H_m(Z)) + a^2 E(H_n(X) X^2 H_m(Z)) + b^2 E(H_n(X) Z^2 H_m(Z)) + ab\,E(H_n(X) XZ H_m(Z)) + ab\,E(H_n(X) ZX H_m(Z)),$$

and the right-hand side becomes

$$A\,E(H_n(X) X^2 H_m(Z)) + C\,E(H_n(X) Z^2 H_m(Z)) + B\,E(H_n(X) XZ H_m(Z)) + D\,E(H_n(X) H_m(Z)).$$

Using formulas from Lemma 1 we can see that both sides are zero, except when $m = n$ or $m = n \pm 2$. We now consider these three cases separately.

Case $m = n + 2$. Using Lemma 1, (30) simplifies to

$$(a^2\langle f, h\rangle^2 + 2ab\langle f, h\rangle + b^2)\langle f, h\rangle^n [n+2]_q! = (A\langle f, h\rangle^2 + B\langle f, h\rangle + C)\langle f, h\rangle^n [n+2]_q!.$$

This equation is satisfied when coefficients $A$, $B$, $C$ satisfy the equation

$$A\langle f, h\rangle^2 + B\langle f, h\rangle + C = a^2\langle f, h\rangle^2 + 2ab\langle f, h\rangle + b^2. \tag{31}$$
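Case $m = n - 2$. Using Lemma 1, (30) simplifies to

$$(a^2 + 2ab\langle f, h\rangle + b^2\langle f, h\rangle^2)\langle f, h\rangle^{n-2} [n]_q! = (A + B\langle f, h\rangle + C\langle f, h\rangle^2)\langle f, h\rangle^{n-2} [n]_q!.$$

This equation is satisfied whenever

$$A + B\langle f, h\rangle + C\langle f, h\rangle^2 = a^2 + 2ab\langle f, h\rangle + b^2\langle f, h\rangle^2. \tag{32}$$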
Case m = n. We use again Lemma 1. On both sides of Eq. (30) we factor out f, h n−1 [n]q !, and equate the remaining coefficients. (This is allowed since we are after sufficient conditions only!) We get
f, h (a 2 + b2 )([n + 1]q + [n]q )
+ ab f, h 2 [n + 1]q + (1 + q)[n]q + c f, h
= (1 + q)(A + C) + B(q f, h 2 + 1)[n]q + D f, h .
Now we use [n + 1]q = 1 + q[n]q . Suppressing the correction to the constant term (i.e., the term free of n), we get
$$(1+q)\big(\langle f,h\rangle(a^2 + b^2) + ab(1 + \langle f,h\rangle^2)\big)[n]_q + c\langle f,h\rangle + \dots = \big((1+q)(A + C)\langle f,h\rangle + B(q\langle f,h\rangle^2 + 1)\big)[n]_q + D\langle f,h\rangle,$$
where $c + \dots$ denotes the suppressed constant term corrections. This equation holds true when the coefficients at $[n]_q$ match, which gives
$$(1+q)\langle f,h\rangle(A + C) + B(q\langle f,h\rangle^2 + 1) = (1+q)\big((a^2 + b^2)\langle f,h\rangle + ab(\langle f,h\rangle^2 + 1)\big), \tag{33}$$
and the constant terms match: $c + \dots = D$. The latter holds true when the expectations are equal ($n = m = 0$), and hence this condition is equivalent to (25). The remaining three equations (31), (32), and (33) have a unique solution given by the expressions (22), (23), (24).
Proof of Lemma 1. Using the definition of vacuum expectation state, (7) and (8) we get
$$E(H_n(X)ZXH_m(Z)) = \langle ZH_n(X) \,|\, XH_m(Z)\rangle_q = \langle X_h f^{\otimes n} \,|\, X_f h^{\otimes m}\rangle_q.$$
Therefore (4) and (5) imply
$$E(H_n(X)ZXH_m(Z)) = \big\langle [n]_q\langle f,h\rangle f^{\otimes n-1} + h\otimes f^{\otimes n} \,\big|\, [m]_q\langle f,h\rangle h^{\otimes m-1} + f\otimes h^{\otimes m}\big\rangle_q. \tag{34}$$
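The latter is zero, except when $m = n$ or $m = n \pm 2$. We will consider these two cases separately. If $m = n$, by orthogonality we have
$$E(H_n(X)ZXH_m(Z)) = [n]_q^2\,\langle f,h\rangle^2\,\langle f^{\otimes n-1} | h^{\otimes n-1}\rangle_q + \langle h\otimes f^{\otimes n} \,|\, f\otimes h^{\otimes n}\rangle_q.$$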
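Clearly, $\langle f^{\otimes n-1} | h^{\otimes n-1}\rangle_q = \langle f,h\rangle^{n-1}[n-1]_q!$; this can be seen either from (9) and (11), or directly from the definition (3). By (3) the second term splits into the sum over permutations $\sigma \in S_{n+1}$ such that $\sigma(1) = 1$ and the sum over the permutations such that $\sigma(1) = k > 1$. This gives
$$\langle h\otimes f^{\otimes n} \,|\, f\otimes h^{\otimes n}\rangle_q = \langle f,h\rangle \sum_{\sigma\in S_n} q^{|\sigma|}\langle f,h\rangle^{n} + \sum_{k=2}^{n+1}\sum_{\sigma\in S_n} q^{k-1+|\sigma|}\langle f,h\rangle^{n-1} = \langle f,h\rangle^{n+1}[n]_q! + \langle f,h\rangle^{n-1}\,q\,[n]_q\,[n]_q!.$$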
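The combinatorial identities used in this step can be checked by brute force for small $n$. The sketch below is only an illustration: it assumes that definition (3) has the usual form $\langle x_1\otimes\dots\otimes x_n \,|\, y_1\otimes\dots\otimes y_n\rangle_q = \sum_{\sigma\in S_n} q^{|\sigma|}\prod_j \langle x_j, y_{\sigma(j)}\rangle$ with $|\sigma|$ the number of inversions, that $f$, $h$ are unit vectors with $\langle f,h\rangle = \rho$, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def q_number(n, q):
    """[n]_q = 1 + q + ... + q^(n-1)."""
    return sum(q**k for k in range(n))


def q_factorial(n, q):
    """[n]_q! = [1]_q [2]_q ... [n]_q, with [0]_q! = 1."""
    result = Fraction(1)
    for k in range(1, n + 1):
        result *= q_number(k, q)
    return result


def inversions(sigma):
    """Number of inversions |sigma| of a permutation given as a tuple."""
    size = len(sigma)
    return sum(1 for i in range(size) for j in range(i + 1, size) if sigma[i] > sigma[j])


def inversion_sum(n, q):
    """Sum of q^|sigma| over the symmetric group S_n."""
    return sum(q**inversions(sigma) for sigma in permutations(range(n)))


def q_inner_product(left, right, q, rho):
    """Assumed form of (3) for simple tensors of the unit vectors 'f' and 'h'."""
    total = Fraction(0)
    for sigma in permutations(range(len(left))):
        term = q**inversions(sigma)
        for i, j in enumerate(sigma):
            # <f,f> = <h,h> = 1 and <f,h> = rho.
            term *= Fraction(1) if left[i] == right[j] else rho
        total += term
    return total


def second_term_brute_force(n, q, rho):
    """<h (x) f^n | f (x) h^n>_q computed directly from the assumed definition."""
    return q_inner_product(("h",) + ("f",) * n, ("f",) + ("h",) * n, q, rho)


def second_term_closed_form(n, q, rho):
    """rho^(n+1) [n]_q! + rho^(n-1) q [n]_q [n]_q!, as in the display above."""
    return rho**(n + 1) * q_factorial(n, q) + rho**(n - 1) * q * q_number(n, q) * q_factorial(n, q)
```

Exact rational arithmetic (`Fraction`) is used so that the two sides can be compared with `==` rather than up to rounding; the same helpers also confirm $[n+1]_q = 1 + q[n]_q$ and $\sum_{\sigma\in S_n} q^{|\sigma|} = [n]_q!$ for small $n$.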
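Elementary algebra now yields (26) for $m = n$. If $m = n + 2$, then the right-hand side of (34) consists of only one term, and we get
$$E(H_n(X)ZXH_{n+2}(Z)) = [n+2]_q\,\langle f,h\rangle\,\langle h\otimes f^{\otimes n} \,|\, h^{\otimes n+1}\rangle_q = [n+2]_q\,\langle f,h\rangle \sum_{\sigma\in S_{n+1}} q^{|\sigma|}\langle f,h\rangle^{n} = \langle f,h\rangle^{n+1}[n+2]_q!.$$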
Since $m = n - 2$ is given by the same expression with the roles of $m$, $n$ switched around, this ends the proof of (26). The remaining expectations match the corresponding commutative values, and can also be evaluated using recurrence (2) and formulas (11) and (9). To prove (27) notice that since $X$ and $H_n(X)$ commute, using (2) and (11) we get
$$E(H_n(X)XZH_m(Z)) = E\big(XH_n(X)(H_{m+1}(Z) + [m]_qH_{m-1}(Z))\big) = \langle f,h\rangle^{m+1}E(XH_n(X)H_{m+1}(X)) + [m]_q\langle f,h\rangle^{m-1}E(XH_n(X)H_{m-1}(X)).$$
The only non-zero values are when $m = n$, or $m = n \pm 2$. Using (2) again, and then (9) we get (27). Since by (11) we have $E(H_n(X)X^2H_m(Z)) = \langle f,h\rangle^m E(X^2H_n(X)H_m(X))$, recurrence (2) used twice proves (28). Formula (29) is an immediate consequence of (11) and (9).
3.2. Relation to processes with independent increments. In [2, Definition 3.5] the authors define the non-commutative q-Brownian motion and show that it has a classical version, see [2, Cor. 4.5]. Since the classical version of the q-Brownian motion is Markov, Theorem 2 implies that all regressions are linear, and all conditional variances are quadratic. A computation gives the following expression for the conditional variances.
Proposition 2. Let $\tilde X_t$ be the classical version of the q-Brownian motion, i.e., $\langle f_t, f_s\rangle = \min\{s, t\}$. Then for $t_1 < t_2 < \dots < t_n < s < t$ we have
$$\mathrm{Var}(\tilde X_s \,|\, \tilde X_{t_1}, \dots, \tilde X_{t_n}, \tilde X_t) = \frac{(t - s)(s - t_n)}{t - qt_n}\left(1 + (1 - q)\,\frac{(\tilde X_t - \tilde X_{t_n})(t_n\tilde X_t - t\tilde X_{t_n})}{(t - t_n)^2}\right).$$
In [8], classical processes with independent increments, linear regressions, and quadratic conditional variances are analyzed. These processes have the same covariances as q-Brownian motion, but the conditional variances are quadratic functions of the increment $\tilde X_t - \tilde X_{t_n}$ only. Proposition 2 shows that the classical realizations of q-Brownian motion are not among the processes in [8] and thus have dependent increments.
4. Bell’s Inequality
It is well known that all q-Gaussian n-tuples with $q = 1$ have classical versions: these are given by the classical Gaussian distribution with the same covariance matrix $[\langle f_i, f_j\rangle]$. For $q = -1$ the classical version of the standardized q-Gaussian triplet $(X, Y, Z)$ consists of the $\pm 1$-valued symmetric random variables. The celebrated Bell’s inequality [1] therefore restricts their covariances:
$$1 - \langle f,h\rangle \ge |\langle f,g\rangle - \langle g,h\rangle|. \tag{35}$$
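The reason (35) holds for $\pm 1$-valued variables is elementary: for every sign triple $(x, y, z)$ one has $1 - xz \ge |xy - yz|$, and averaging this pointwise bound over any joint law gives the inequality for the covariances. The following sketch checks this numerically; the function names and the randomly generated distributions are illustrative assumptions, not part of the paper.

```python
import random
from itertools import product

# All eight sign patterns (x, y, z) of three +/-1-valued random variables.
SIGNS = list(product((-1, 1), repeat=3))


def pointwise_bell_holds():
    """Check 1 - x*z >= |x*y - y*z| for every sign triple."""
    return all(1 - x * z >= abs(x * y - y * z) for x, y, z in SIGNS)


def bell_gap(probabilities):
    """Return (1 - E[XZ]) - |E[XY] - E[YZ]| for a law on the eight sign triples."""
    exz = sum(p * x * z for p, (x, y, z) in zip(probabilities, SIGNS))
    exy = sum(p * x * y for p, (x, y, z) in zip(probabilities, SIGNS))
    eyz = sum(p * y * z for p, (x, y, z) in zip(probabilities, SIGNS))
    return (1 - exz) - abs(exy - eyz)


def random_distribution(rng):
    """A random probability vector on the eight sign triples (illustrative only)."""
    weights = [rng.random() for _ in SIGNS]
    total = sum(weights)
    return [w / total for w in weights]
```

A non-negative `bell_gap` corresponds to (35) with $\langle f,h\rangle = E[XZ]$, $\langle f,g\rangle = E[XY]$ and $\langle g,h\rangle = E[YZ]$.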
In particular, there are triplets of q-Gaussian random variables with $q = -1$ which do not have a classical version. The following shows that restriction (35) is in force for sub-Markov covariances over the entire range $-1 \le q < 1$.
Theorem 3. Suppose that $(\tilde X, \tilde Y, \tilde Z)$ is a classical version of q-Gaussian $(X, Y, Z) := (X_f, X_g, X_h)$, where $f, h \in H$ are linearly independent, and $-1 \le q < 1$. If either
$$\langle f,g\rangle\langle g,h\rangle \le \langle f,h\rangle \ \text{ and } \ 0 < \langle f,h\rangle < 1, \tag{36}$$
or $\langle f,h\rangle = 0$, or $q = -1$, then inequality (35) holds true.
Proof. Since the case $q = -1$ is well known, we restrict our attention to the case $-1 < q < 1$. Our starting point is expression (21). A computation shows that the conditional variance $\mathrm{Var}(\tilde Y|\tilde X, \tilde Z) := E(\tilde Y^2|\tilde X, \tilde Z) - \big(E(\tilde Y|\tilde X, \tilde Z)\big)^2$ is as follows.
$$\mathrm{Var}(\tilde Y|\tilde X, \tilde Z) = 1 - a^2 - b^2 - ab\langle f,h\rangle\,\frac{(1+q)(1 - \langle f,h\rangle^2) + 2(1-q)}{1 - q\langle f,h\rangle^2} - \frac{ab(1-q)}{1 - q\langle f,h\rangle^2}\big(\tilde Z\langle f,h\rangle - \tilde X\big)\big(\langle f,h\rangle\tilde X - \tilde Z\big). \tag{37}$$
The right-hand side of this expression must be non-negative over the support of $\tilde X, \tilde Z$. It is known, see [4, Lemma 8.1] or [2, Theorem 1.10], that $\tilde X, \tilde Z$ have the joint probability density function $f(x, z)$ with respect to the product of marginals $\nu_q$. Moreover, $f$ is defined for all $-2/\sqrt{1-q} \le x, z \le 2/\sqrt{1-q}$ and from its explicit product expansion we can see that
$$f(x, z) \ge \prod_{k=0}^{\infty} \frac{1 - \langle f,h\rangle^2 q^k}{(1 + \langle f,h\rangle q^k)^4}$$
is strictly positive. In particular, the right-hand side of (37) must be non-negative when evaluated at $\tilde X = \sqrt{2}/\sqrt{1-q}$, $\tilde Z = -\sqrt{2}/\sqrt{1-q}$.
Using formulas (16), (17) with the above values of $\tilde X, \tilde Z$ we get the rational expression for the conditional variance which can be written as follows.
$$\big(1 - q\langle f,h\rangle^2\big)(1 - \langle f,h\rangle)^2\,\mathrm{Var}(\tilde Y|\tilde X, \tilde Z) = (1 - \langle f,h\rangle)^2\big(1 - q\langle f,h\rangle^2 + (1+q)\langle f,h\rangle\langle f,g\rangle\langle g,h\rangle\big) - (\langle f,g\rangle - \langle g,h\rangle)^2(1 + \langle f,h\rangle^2). \tag{38}$$
Therefore
$$(1 - \langle f,h\rangle)^2\big(1 - q\langle f,h\rangle^2 + (1+q)\langle f,h\rangle\langle f,g\rangle\langle g,h\rangle\big) \ge (\langle f,g\rangle - \langle g,h\rangle)^2(1 + \langle f,h\rangle^2).$$
Since the assumptions imply that $1 - q\langle f,h\rangle^2 + (1+q)\langle f,h\rangle\langle f,g\rangle\langle g,h\rangle \le 1 + \langle f,h\rangle^2$, this implies $(1 - \langle f,h\rangle)^2 \ge (\langle f,g\rangle - \langle g,h\rangle)^2$, proving (35).
4.1. Examples. The first example shows that there are covariances such that q-Gaussian random variables have no classical version for all $-1 \le q < 1$.
Example 1. Consider the case $\langle f,h\rangle = \langle g,h\rangle > 0$. This can be realized when the covariance matrix is non-negative definite; a computation shows that this is equivalent to the condition $2\langle f,h\rangle^2 \le 1 + \langle f,g\rangle$. Since (36) is satisfied, Bell’s inequality (35) implies $1 + \langle f,g\rangle \ge 2\langle f,h\rangle$. Therefore, all choices of vectors $f, g, h \in H$ such that $\langle f,h\rangle = \langle g,h\rangle$, $0 < \langle f,h\rangle < 1$, and $2\langle f,h\rangle^2 - 1 < \langle f,g\rangle < 2\langle f,h\rangle - 1$ lead to q-Gaussian triplets with no classical version for $-1 \le q < 1$.
A nice feature of Theorem 3 is that its statement does not depend on $q$, as long as $q < 1$. But such a result cannot be sharp. A less transparent statement that the conditional variance is non-negative is a stronger restriction on the covariances, and it depends on $q$. This is illustrated by the next example.
Example 2. Suppose $\langle f,h\rangle = \langle g,h\rangle = 1/2$. Inequality (35) used in Example 1 implies that if a classical version of a q-Gaussian process exists then $\langle f,g\rangle \ge 0$. Evaluating the conditional variance $\mathrm{Var}(\tilde Y|\tilde X, \tilde Z)$ at $\tilde X = 2/\sqrt{1-q}$, $\tilde Z = -\tilde X$ we get a more restrictive constraint $\langle f,g\rangle \ge \frac{q+5}{36}$.
Acknowledgements. I would like to thank the referee, A. Dembo, and T. Hodges for suggestions which improved the presentation, and V. Kaftal for several helpful discussions.
References
1. Bell, J.S.: On the Einstein–Podolski–Rosen paradox. Physics 1, 195–200 (1964)
2. Bożejko, M., Kümmerer, B., and Speicher, R.: q-Gaussian processes: Non-commutative and classical aspects. Commun. Math. Phys. 185, 129–154 (1997)
3. Bożejko, M. and Speicher, R.: An example of a generalized Brownian motion. Commun. Math. Phys. 137, (3), 519–531 (1991)
4. Bryc, W.: Stationary random fields with linear regressions. Ann. Probab. in print, 2001
5. Frisch, U. and Bourret, R.: Parastochastics. J. Math. Phys. 11, (2), 364–390 (1970)
6. Koekoek, R. and Swarttouw, R.F.: The Askey-scheme of hypergeometric orthogonal polynomials and its q-analogue. Report no. 98–17, Delft University of Technology, 1998, http://aw.twi.tudelft.nl/~koekoek/askey.html
7. Voiculescu, D.V., Dykema, K.J., and Nica, A.: Free random variables. Providence, RI: American Mathematical Society, 1992
8. Wesołowski, J.: Stochastic processes with linear conditional expectation and quadratic conditional variance. Probab. Math. Statist. (Wrocław) 14, 33–44 (1993)
Communicated by H. Araki
Commun. Math. Phys. 219, 271 – 322 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Multiplicity of Phase Transitions and Mean-Field Criticality on Highly Non-Amenable Graphs
Roberto H. Schonmann
Department of Mathematics, UCLA, Los Angeles, CA 90095, USA. E-mail: [email protected]
Received: 27 March 2000 / Accepted: 7 December 2000
Abstract: We consider independent percolation, Ising and Potts models, and the contact process, on infinite, locally finite, connected graphs. It is shown that on graphs with edge-isoperimetric Cheeger constant sufficiently large, in terms of the degrees of the vertices of the graph, each of the models exhibits more than one critical point, separating qualitatively distinct regimes. For unimodular transitive graphs of this type, the critical behaviour in independent percolation, the Ising model and the contact process are shown to be mean-field type. For Potts models on unimodular transitive graphs, we prove the monotonicity in the temperature of the property that the free Gibbs measure is extremal in the set of automorphism invariant Gibbs measures, and show that the corresponding critical temperature is positive if and only if the threshold for uniqueness of the infinite cluster in independent bond percolation on the graph is less than 1. We establish conditions which imply the finite-island property for independent percolation at large densities, and use those to show that for a large class of graphs the q-state Potts model has a low temperature regime in which the free Gibbs measure decomposes as the uniform mixture of the q ordered phases. In the case of non-amenable transitive planar graphs with one end, we show that the q-state Potts model has a critical point separating a regime of high temperatures in which the free Gibbs measure is extremal in the set of automorphism-invariant Gibbs measures from a regime of low temperatures in which the free Gibbs measure decomposes as the uniform mixture of the q ordered phases.
Contents
1. Introduction
1.1 Preliminaries
1.2 Percolation
1.3 Potts and Ising models
1.4 Contact process
1.5 Organization of the paper
2. Background on Infinite Graphs
2.1 Basics
2.2 Transitive graphs
2.3 Examples
2.4 Isoperimetric constants
2.5 Spectral radius
2.6 Number of ends
2.7 Cut sets
2.8 Planar graphs
3. Independent Bond Percolation
3.1 Background on independent bond percolation on graphs
3.2 Mean field criticality for independent percolation
3.3 Consequences for independent percolation of “high non-amenability”
4. Potts and Ising Models I
4.1 Applications of the Fortuin–Kasteleyn random cluster model
4.2 Mean field criticality for the Ising model
4.3 Consequences for FK and Potts models of “high non-amenability”
5. Potts and Ising Models II. The Finite Island Property
5.1 Sufficient condition for the Potts free Gibbs measure to decompose as the uniform mixture of the q ordered phases
5.2 Finite island property for graphs with positive anchored vertex isoperimetric constant
5.3 Finite island property for transitive graphs with one end which satisfy the quasi-connected minimal cut sets property
5.4 Finite island property for planar graphs
6. Related Results
6.1 Multiplicity of phase transitions in the FK model
6.2 Site percolation
7. The Contact Process
7.1 Separation of critical points for the contact process
7.2 Mean field criticality for the contact process
Work partially supported by the N.S.F. through grants DMS-9703814 and DMS-0071766 and by a Guggenheim Foundation fellowship
1. Introduction
1.1. Preliminaries. In this paper we address issues in the fast growing area of research that can be called “statistical-mechanics type processes on graphs”. In this area one is interested in studying the basic lattice models from statistical mechanics and related areas, including percolation and interacting particle systems, but with the typical Euclidean lattices ($\mathbb{Z}^d$ and its relatives) replaced by a fairly general graph. For a recent review, see [Lyo5]. It has always been clear that several results in statistical mechanics are best formulated on arbitrary graphs, and do not depend on the particular structure of the graph. For instance, this is typically the case with correlation inequalities. It has also long been accepted that the study of the systems on homogeneous trees is worthwhile, in that it
is typically simpler than on Euclidean lattices, and it sheds light on the behavior of the systems in high dimensional Euclidean lattices. It is important, therefore, to stress that the current surge of interest in statistical mechanics type processes on graphs and in particular the motivation for the current paper derive from a much broader perspective. Some of the main reasons for this interest are listed next.
(1) Certain elegant and enlightening results are not present when one only considers Euclidean lattices. In other words, relevant mathematical structure is hidden when the systems are studied only on Euclidean lattices.
(2) The interplay between the geometry of the graph and the behavior of the statistical mechanics system is often very interesting. The study of the processes sheds light on the geometry of the graph and conversely.
(3) The study of systems on general graphs is important if one is interested in systems in random environment. For instance, the study of a diluted system on a cubic lattice may be seen as the study of the system on percolation clusters.
(4) Non-Euclidean geometries are important in Physics, so that the study of physical systems on non-Euclidean lattices may be of relevance also to Physics. In Chemistry, especially after the discovery of fullerenes, it became clear that even in a three dimensional Euclidean universe atoms and molecules can arrange themselves in fashions which locally display the geometry of non-Euclidean surfaces (see, e.g., [VT] and [TM]). Quasi-crystals are still another source of motivation.
(5) For applications to other areas of Science, including Biology, Economics, etc., it is natural to consider the agents as being modeled by the vertices of a graph, with the edges of the graph indicating the presence of interaction among agents (infection, mating, trade, etc.). Such graphs do not need to be regular in any sense.
(6) Problems which remain open on Euclidean lattices may be better understood, and perhaps even eventually solved, by placing them in this broader context. Moreover, the added breadth of the enterprise attracts to it mathematicians with various backgrounds and in this way brings in new ideas to the field.
Here we will consider three of the most basic statistical-mechanics type processes: independent percolation, Potts models (especially Ising models) and the contact process. We will obtain certain results which are similar for all these processes, and other results that relate some of these processes to each other.
Papers in which percolation has been studied with emphasis on graphs which are not Euclidean lattices include: [Lyo2, GN, Lyo3, Wu1, Lyo4, BS1, Häg2, Lal1, BB, BLPS1, BLPS2, GS, HP, Sch2, Sch3, HPS, BLS, LyS, Häg3, CP, HSS, Per, MP, CS, PS-N, Lal3, BS3]. See also [BS2] for an on-line, continuously updated, account of progress in this area, as well as for links to various related Web sites.
For an introduction to the Ising model and other statistical mechanics systems on homogeneous trees, see, e.g., Chapter 12 of [Geo] and the references provided there, as well as the papers [BRZ] and [Iof1] that appeared afterwards. The Ising model on more general graphs (with and without an external field) has been studied in a number of recent papers, including: [Lyo1, SeS, RNO, NW, Wu1, Wu3, Iof2, ST, JS, Häg3, EKPS, HSS] and [Wu4].
The contact process on homogeneous trees has been studied in [Pem, MS, MSZ, DS, Wu2, Zha, Lig2, Sta, Lig3, LS, SaS2, Lal2], and [Sch1]. For a comprehensive presentation on the subject see [Lig4]. The contact process on more general graphs has been studied primarily in [SaS1, SaS3, Sal] and [PS].
This introductory section is written having in mind a reader who is familiar with mathematical statistical mechanics and related subjects, but may not be so familiar with the study of such processes on graphs. Still, to keep this introduction focused, we will postpone the presentation of a detailed background on graphs, including various technical definitions, to Sect. 2. Also, to enhance readability, we state the results in the introduction not in their strongest possible form, but rather in some form which seems particularly transparent and appealing to us. Stronger, but more cumbersome results will appear in later sections of the paper. One of the main concerns in this paper is with the critical behavior. We will provide results that show that in certain cases it is of “mean-field nature”. We suppose that the reader is familiar with the general meaning of the above expression in quotation marks, and we will use it freely in this introduction, even when stating conjectures. Precise meanings will appear in the following sections. For the moment it is worth saying that in the cases in this paper in which we proved the “mean-field nature” of a critical point, what we proved is that a certain diagrammatic condition (the open triangle condition for percolation and the contact process and the finiteness of the bubble diagram for the Ising model) holds. The precise definitions of the diagrammatic conditions and of the critical exponents which are implied to take mean-field value are postponed to later sections, to keep this introduction from becoming too long. A very large class of graphs on which it is natural to study statistical mechanics type systems is that of the infinite, locally finite, connected graphs. But for most of the purposes in this paper the smaller class of transitive (same as vertex-transitive or homogeneous) graphs is the appropriate playground. Informally, a graph is transitive if all its vertices play exactly the same role. 
Therefore transitive graphs can be seen as “discrete models of a homogeneous space”. Basic examples of transitive graphs are the cubic lattices, $\mathbb{Z}^d$, $d = 1, 2, \dots$, other regular tilings of Euclidean space, like the triangular and hexagonal lattices, and the homogeneous trees. Other examples of transitive graphs include regular tilings of hyperbolic spaces, which can be thought of as representing “non-Euclidean crystals”. Moreover new transitive graphs can be obtained from old ones, by taking for instance Cartesian products. A very important numerical feature of a graph $G = (V, E)$ is its Cheeger constant (same as edge-isoperimetric constant), defined as
$$i_E(G) = \inf\left\{\frac{|\partial_E S|}{|S|} : S \subset V,\ 0 < |S| < \infty\right\},$$
where $\partial_E S$ is the edge boundary of $S$. Note that $i_E(G)$ lies always in the interval $[0, D(G))$, where $D(G)$ is the maximal degree (same as coordination number) of the vertices of $G$. Graphs which have $i_E(G) = 0$ (e.g., Euclidean lattices, including $\mathbb{Z}^d$, $d = 1, 2, \dots$) are called amenable graphs, while graphs which have $i_E(G) > 0$ (e.g., homogeneous trees with degree at least three and regular tilings of hyperbolic spaces) are called non-amenable graphs. We state now in succession results on percolation, Potts and Ising models and the contact process. We will suppose that the reader is either familiar with the basics about these models, or will learn it from the references that are provided.
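As a small illustration of the definition, the ratio $|\partial_E S|/|S|$ can be computed for concrete finite sets $S$. The sketch below does this for balls in a homogeneous tree and for square boxes in $\mathbb{Z}^2$; the vertex encodings and function names are hypothetical choices made for the example. Any such ratio is only an upper bound on $i_E(G)$, since the constant is an infimum over all finite sets.

```python
from fractions import Fraction


def edge_boundary_ratio(vertex_set, neighbors):
    """|edge boundary of S| / |S|, counting edges with exactly one endpoint in S."""
    boundary_edges = sum(1 for v in vertex_set for w in neighbors(v) if w not in vertex_set)
    return Fraction(boundary_edges, len(vertex_set))


def tree_neighbors(degree):
    """Neighbor function of the homogeneous tree; vertices are paths from the root ()."""
    def neighbors(v):
        if not v:  # the root has `degree` children
            return [(i,) for i in range(degree)]
        # every other vertex has one parent and degree - 1 children
        return [v[:-1]] + [v + (i,) for i in range(degree - 1)]
    return neighbors


def tree_ball(degree, radius):
    """Set of vertices within graph distance `radius` of the root."""
    neighbors = tree_neighbors(degree)
    ball, frontier = {()}, [()]
    for _ in range(radius):
        next_frontier = []
        for v in frontier:
            for w in neighbors(v):
                if w not in ball:
                    ball.add(w)
                    next_frontier.append(w)
        frontier = next_frontier
    return ball


def grid_neighbors(v):
    """Neighbor function of the square lattice Z^2."""
    x, y = v
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]


def square_box(side):
    """The side x side box {0, ..., side-1}^2 in Z^2."""
    return {(x, y) for x in range(side) for y in range(side)}
```

For instance, `edge_boundary_ratio(square_box(n), grid_neighbors)` equals $4/n$, which tends to 0 in line with the amenability of $\mathbb{Z}^2$, while for balls in the tree of degree 3 the ratio stays above 1 for every radius, consistent with the non-amenability of homogeneous trees of degree at least three.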
1.2. Percolation. For a general introduction to percolation see for instance [Gri2]. We will mostly consider independent bond percolation in this paper, adding the usual remark that results similar to the ones here hold also for independent site percolation, with
similar proofs. For the reader’s convenience, Sect. 6 contains the statements of some of the corresponding results for independent site percolation, with indications on how to adapt the proofs to this setting. In independent bond percolation, each edge (same as bond) of a graph $G = (V, E)$ is occupied with probability $p$ and vacant otherwise, these decisions being independent for distinct edges. Clusters are the connected components of the graph obtained by deleting the vacant edges from $G$. A basic result about percolation on a transitive graph $G$ is that there are two critical points, $0 < p_c(G) \le p_u(G) \le 1$, which define the boundaries of three distinct phases described as follows:
(1) For $p < p_c(G)$ there is no infinite cluster a.s.
(2) For $p_c(G) < p < p_u(G)$ there are infinitely many infinite clusters a.s.
(3) For $p > p_u(G)$ there is exactly one infinite cluster a.s.
The monotonicity in $p$ of uniqueness of the infinite cluster, contained in this statement, was proved in the case of unimodular (see definition in the next section) graphs in [HP], and generalized to all transitive graphs in [Sch2]. On amenable transitive graphs it follows from the arguments of [BK] that $p_c(G) = p_u(G)$. On the other hand, there are a number of examples of non-amenable transitive graphs for which $p_c(G) < p_u(G)$ (see [GN, BS1, Lal1, PS-N, Lal3] and [BS3]). One of the most interesting conjectures about percolation on transitive graphs is therefore the following one, originally stated in [BS1].
Conjecture 1.1 (Benjamini and Schramm). Suppose that $G$ is an infinite, locally finite, connected transitive graph. Then $p_c(G) < p_u(G)$ iff $G$ is non-amenable.
As for the critical behavior near and at $p_c(G)$, the arguments which lead to the conclusion that on $\mathbb{Z}^d$, for large $d$, the behavior is of mean-field nature (see, e.g. [Aiz2, AN, BA]) suggest the following conjecture.
Conjecture 1.2. Suppose that $G$ is an infinite, locally finite, connected transitive graph. If $p_c(G) < p_u(G)$ then the behavior near and at $p_c$ is of mean-field nature.
The following theorem is motivated by these two conjectures. The term “unimodular”, which appears there, is very technical and will only be defined in Subsect. 2.2. For the moment, we simply point out that most transitive graphs of interest are known to be unimodular. For instance, this is the case for all Cayley graphs of finitely generated infinite groups (see definition also in Subsect. 2.2). Various conjectures for transitive graphs have only been proved to hold for the unimodular ones. This is due mostly to the availability in this case of a technique called “mass transport” (see, e.g., [BLPS1, BLPS2, HPS, BLS, LyS, Per, Lyo5] and Subsect. 3.2 of the current paper).
Theorem 1.1. Suppose that $G$ is an infinite, locally finite, connected transitive graph. If $i_E(G)/D(G) > 1/\sqrt{2}$, then $p_c(G) < p_u(G)$ and the open triangle condition holds, implying that, if $G$ is unimodular, a.s. there is no infinite cluster at $p_c(G)$ and the critical exponents $\gamma$, $\beta$, $\delta$ and $\Delta$ exist and take their mean-field values.
Note. The conclusion that $p_c(G) < p_u(G)$ in Theorem 1.1 is not new. It is morally contained in [BS1], where an upper bound on $p_c(G)$ ((3.1) here) and a lower bound on $p_u(G)$ ((3.6) here) were obtained. When combined with an inequality available in [Moh1] ((2.4) here), these bounds imply $p_c(G) < p_u(G)$. This fact was realized independently
by I. Pak and T. Smirnova-Nagnibeda, who stated it (in a somewhat more restricted setting) as Proposition 1(1) in [PS-N]. Note that if Conjecture 1.1 turns out to be false, then the numerical constant defined by $\varphi = \inf\{\phi : i_E(G)/D(G) > \phi \Rightarrow p_c(G) < p_u(G)\}$ would be non-trivial, i.e., different from 0 (or 1). In this case, Conjecture 1.1 would be replaced with the question of obtaining the value of this constant. The triangle condition (to be defined in Subsect. 3.2) was proved to hold on certain Euclidean lattices in fundamental work by Hara and Slade (see, e.g., [HS]). Their work includes the cubic lattices $\mathbb{Z}^d$ with $d \ge 19$. It also includes modifications of $\mathbb{Z}^d$ with $d > 6$, obtained by adding edges between any two vertices which in the original graph are within a large fixed distance $L$ of each other (spread-out models). The only published result on the triangle condition being valid for a non-amenable graph that we are aware of is in [Wu1]. In that paper this is proved for a Cartesian product of $\mathbb{Z}$ and a homogeneous tree with large degree (the same graphs for which [GN] first showed the possibility of having $0 < p_c < p_u < 1$). Theorem 1.1 extends this result. Absence of percolation at $p_c$ has been proved for all the unimodular non-amenable transitive graphs in [BLPS1] and [BLPS2], but it remains unproved for general transitive graphs. It is natural to ask also when is $p_u(G) < 1$. This issue will appear, for instance, in connection to our results on the Potts model, presented in the next subsection. A fundamental topological notion related to the conjectured answer to this question is that of the number of ends of a graph. This number is the supremum of the number of infinite connected components of subgraphs produced by removing finitely many edges from the original graph. It is known (see, e.g., Sect. 6 of [Moh2]) that infinite, locally finite, connected, transitive graphs can only have 1, 2 or infinitely many ends. It is easy to prove (see Sect.
8 of [HPS] and Subsect. 3.1 of the current paper) that if the number of ends of the transitive graph $G$ is 2, then $p_c(G) = p_u(G) = 1$, and that if the number of ends is infinity, then $p_c(G) < p_u(G) = 1$. In [BS1] it was asked whether transitive graphs with 1 end would have $p_u(G) < 1$. By now this is widely believed to be the case, and it was explicitly stated as a conjecture (in the non-amenable case) in [Lyo5].
Conjecture 1.3. Suppose that $G$ is an infinite, locally finite, connected transitive graph. Then $p_u(G) < 1$ iff $G$ has 1 end.
The following classes of transitive graphs with one end have been proved to have $p_u(G) < 1$. In [BB] this was done for graphs which, in the terminology being introduced in the current paper, have the quasi-connected minimal cut sets property. (Roughly speaking, a graph has this property if any minimal set of edges which separates two arbitrarily chosen vertices cannot be split into two sets which are arbitrarily far apart. For a precise definition, see Subsect. 2.7.) In [HPS] this was done for graphs which are Cartesian products of infinite transitive graphs. (For the definition of a Cartesian product of graphs, see Subsect. 2.3.) In [Lal1, Lal3] and [BS3] this was done for planar graphs with one end. (Roughly speaking, a graph is planar if it can be embedded in $\mathbb{R}^2$ with vertices being represented by points and edges being represented by lines which connect the corresponding vertices and can only intersect at their end-points. For a precise definition, see Subsect. 2.8.) In [LyS] this was done for Cayley graphs of Kazhdan groups (see the definition in that paper).
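To get a feeling for how demanding the hypothesis $i_E(G)/D(G) > 1/\sqrt{2}$ of Theorem 1.1 is, one can test it on homogeneous trees. The sketch below assumes the standard value $i_E(T_d) = d - 2$ for the homogeneous tree $T_d$ of degree $d$ (a fact not stated in this section, used here only for illustration), and the function names are hypothetical.

```python
def satisfies_theorem_hypothesis(cheeger, max_degree):
    """Test cheeger / max_degree > 1/sqrt(2) exactly, as 2 * cheeger^2 > max_degree^2.

    Squaring is valid because cheeger >= 0 and max_degree > 0.
    """
    return 2 * cheeger**2 > max_degree**2


def smallest_tree_degree():
    """Smallest degree d >= 3 for which the tree T_d meets the hypothesis.

    Assumes i_E(T_d) = d - 2 (assumption labeled in the lead-in).
    """
    degree = 3
    while not satisfies_theorem_hypothesis(degree - 2, degree):
        degree += 1
    return degree
```

Under this assumption the condition reads $2(d-2)^2 > d^2$, which fails for $d = 6$ ($32 < 36$) and first holds for $d = 7$ ($50 > 49$), so the hypothesis is met only by trees of fairly large degree.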
1.3. Potts and Ising models. For a general introduction to the statistical mechanics of lattice systems see for instance [Geo]. For an introduction to the Potts model see for instance [GHM]. For $q \in \{2, 3, \dots\}$, the (ferromagnetic) q-Potts model on an infinite, locally finite, connected graph $G = (V, E)$ is the statistical mechanics model in which at each site $x$ there is a spin $\sigma_x$ which can take values in $\{1, \dots, q\}$ and for which the formal Hamiltonian is given by
$$H(\sigma) = -\sum_{\{x,y\}\in E} \delta_{\sigma_x,\sigma_y},$$
where $\delta_{a,b} = 1$ if $a = b$ and $\delta_{a,b} = 0$ if $a \ne b$. The Ising model corresponds to the case $q = 2$. The inverse temperature parameter $\beta > 0$ is included in the definition of the Gibbs distributions, formally given by
$$\mu_\beta(\sigma) = \frac{\exp(-\beta H(\sigma))}{\text{Normalization}}.$$
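On an infinite graph these formulas are only formal, but on a finite graph they define an honest probability distribution, which can be computed by brute force. The following sketch does this for a small graph with free boundary conditions; it is a finite-volume illustration with hypothetical function names, not the infinite-volume (DLR) construction used in the paper.

```python
import math
from itertools import product


def potts_hamiltonian(sigma, edges):
    """H(sigma) = -sum over edges {x, y} of delta(sigma_x, sigma_y)."""
    return -sum(1 for x, y in edges if sigma[x] == sigma[y])


def gibbs_distribution(num_vertices, edges, q, beta):
    """Finite-volume Gibbs distribution mu_beta(sigma) = exp(-beta * H(sigma)) / Z.

    Vertices are 0, ..., num_vertices - 1 and spins take values in {1, ..., q}.
    """
    configurations = list(product(range(1, q + 1), repeat=num_vertices))
    weights = [math.exp(-beta * potts_hamiltonian(sigma, edges)) for sigma in configurations]
    normalization = sum(weights)  # the partition function Z
    return {sigma: w / normalization for sigma, w in zip(configurations, weights)}
```

For example, on the 4-cycle with `edges = [(0, 1), (1, 2), (2, 3), (3, 0)]` the $q$ constant configurations minimize $H$ (each has $H = -4$) and therefore carry the largest probability for every $\beta > 0$.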
The set of all the Gibbs distributions (DLR states) for the q-Potts model on the infinite graph $G$ at inverse temperature $\beta$ will be denoted by $\mathcal{G}^{G,q}_\beta$. The set of its extremal elements will be denoted by $(\mathcal{G}^{G,q}_\beta)_{ext}$. The set of automorphism invariant Gibbs distributions will be denoted by $\mathcal{G}^{G,q}_{\beta,A}$, and the set of its extremal elements will be denoted by $(\mathcal{G}^{G,q}_{\beta,A})_{ext}$.
It is well known that $q$ Gibbs distributions $\mu^{G,q}_{1,\beta}, \dots, \mu^{G,q}_{q,\beta}$ can be obtained by taking infinite volume limits using as boundary conditions the configurations in which all the spins take, respectively, the same value $1, \dots, q$. These distributions, called the ordered phases, belong to $(\mathcal{G}^{G,q}_\beta)_{ext} \cap \mathcal{G}^{G,q}_{\beta,A}$ (and hence to $(\mathcal{G}^{G,q}_{\beta,A})_{ext}$), as can be seen by constructing them using the Fortuin–Kasteleyn (FK) random cluster model, in the coupled fashion introduced in [ES], and using the FKG inequalities for the FK model.
It is also well known that $|\mathcal{G}^{G,q}_\beta| = 1$ iff $\mu^{G,q}_{1,\beta} = \dots = \mu^{G,q}_{q,\beta}$. Moreover, there is $\beta_c(G, q) \in [0, \infty]$ such that
(1) For $\beta < \beta_c(G, q)$, $|\mathcal{G}^{G,q}_\beta| = 1$.
(2) For $\beta > \beta_c(G, q)$, $\mu^{G,q}_{i,\beta}$, $i = 1, \dots, q$ are distinct (hence $|(\mathcal{G}^{G,q}_\beta)_{ext}| \ge q$).
A further Gibbs distribution, $\mu^{G,q}_{f,\beta}$, can be obtained taking infinite-volume limits using free boundary conditions. In this case construction using the FK model and use of the FKG inequalities for the FK model, give that $\mu^{G,q}_{f,\beta} \in \mathcal{G}^{G,q}_{\beta,A}$, but not necessarily that it is an extremal element of this set of distributions. On $\mathbb{Z}^2$ it is known that for each $q$, for all $\beta > \beta_c(G, q)$,
$$(\mathcal{G}^{G,q}_{\beta,A})_{ext} = \{\mu^{G,q}_{1,\beta}, \dots, \mu^{G,q}_{q,\beta}\}, \tag{1.1}$$
so that in particular $\mu^{G,q}_{f,\beta} \notin (\mathcal{G}^{G,q}_{\beta,A})_{ext}$. (This was proved for the Ising model in [MM-S] and extended to the other Potts models in [BCK].) In contrast, on homogeneous trees, $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta,A})_{ext}$ for all $\beta > 0$. (This was proved in Theorem 12.31 of [Geo].) In [NW] the Potts model was studied on the Cartesian product of $\mathbb{Z}$ and a homogeneous tree with large degree. The following contrasting results were proved, for each value of
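$q$. For large $\beta$,
$$\mu^{G,q}_{f,\beta} = \sum_{i=1,\dots,q} \frac{1}{q}\,\mu^{G,q}_{i,\beta}, \tag{1.2}$$
so that in particular $\mu^{G,q}_{f,\beta} \notin (\mathcal{G}^{G,q}_{\beta,A})_{ext}$. But for an interval of values of $\beta$ above $\beta_c(G, q)$,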
µf,β has decaying correlations, so that (1.2) must fail. Similar results were obtained in [Wu3] for other non-amenable (but in this case non-transitive) graphs. In [Jon] (Theorem 4.4(a)) it was shown that if G is transitive and non-amenable, then for large q, (1.2) fails for values of β in a non-degenerate interval. In contrast, the following can be proved, for instance by combining Lemma 4.3 of [Jon] with Theorem 4.2 of [Gri1] (adapted to transitive amenable graphs in Sect. 3.2 of [Jon]) and Corollary 4.5 of [Pfi]. (This theorem strengthens Theorem 4.4(b) in [Jon] in the transitive case.) Theorem 1.2. Suppose that G is an infinite, locally finite, connected transitive graph. If G is amenable then for each q and each value of β, (1.1) and (1.2) are equivalent. Moreover, these statements can fail for at most countably many values of β. The following detailed conjecture contains natural extensions of the results above. It includes an analogue of Corollary 1.1 for Potts models, and also proposes relations between the behavior of percolation and Potts models. (Part (a.1) is (1) above, included for comparison with the other statements. The fact that for transitive graphs, or more generally for bounded degree graphs, βc (G, q) > 0 is also well known; it can be proved, e.g., using the techniques in Chapter 8 of [Geo]. Compare also (b.2) in the conjecture with the known fact that βc (G, q) < ∞ iff pc (G) < 1; for this see [Häg3].) Conjecture 1.4. Suppose that G is an infinite, locally finite, connected transitive graph. ¯ q) such that: Then for each q there exist 0 < βc (G, q) ≤ β(G, G,q
(a.1) For $\beta < \beta_c(G,q)$, $|\mathcal{G}^{G,q}_{\beta}| = 1$.
(a.2) For $\beta_c(G,q) < \beta < \bar\beta(G,q)$, $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}} = \{\mu^{G,q}_{1,\beta}, \dots, \mu^{G,q}_{q,\beta}, \mu^{G,q}_{f,\beta}\}$ ($|(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}| = q + 1$).
(a.3) For $\beta > \bar\beta(G,q)$, $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}} = \{\mu^{G,q}_{1,\beta}, \dots, \mu^{G,q}_{q,\beta}\}$ ($|(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}| = q$).
Moreover
(b.1) $\beta_c(G,q) < \bar\beta(G,q)$ iff $G$ is non-amenable.
(b.2) $\bar\beta(G,q) < \infty$ iff $p_u(G) < 1$.
To provide an idea of how open the conjecture above is, we make two observations. First, even in the widely studied case of the Ising model on $\mathbb{Z}^d$, only in dimension $d = 2$ is it known that $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}} = \{\mu^{G,q}_{1,\beta}, \mu^{G,q}_{2,\beta}\}$ for all $\beta > \beta_c$. In higher dimensions this is known only up to the possible existence of a countable set of exceptional values of $\beta > \beta_c$, where it would fail (this was proved in [Leb]; it was extended to Potts models in [Pfi]). Second, in the case of the Ising model on a homogeneous tree it is easy to see
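that $\bar\beta = \infty$ (since $\mu^{G,q}_{f,\beta}$ has decaying correlations), but while it is known that $\mu^{G,q}_{1,\beta}$, $\mu^{G,q}_{2,\beta}$ and $\mu^{G,q}_{f,\beta}$ are three distinct elements of $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$ for $\beta > \beta_c$, it is not known that these distributions exhaust this set. We state now some results which can be seen as further contributions in support of Conjecture 1.4. We will use the following definitions:
$$\bar\beta_1(G,q) = \sup\Big\{\beta \ge 0 : \mu^{G,q}_{f,\beta'} \in (\mathcal{G}^{G,q}_{\beta',A})_{\mathrm{ext}} \text{ for all } \beta' < \beta\Big\},$$
$$\bar\beta_2(G,q) = \inf\Big\{\beta \ge \beta_c(G,q) : \mu^{G,q}_{f,\beta'} = \frac{1}{q} \sum_{i=1,\dots,q} \mu^{G,q}_{i,\beta'} \text{ for all } \beta' > \beta\Big\}.$$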
Note that clearly $\beta_c(G,q) \le \bar\beta_1(G,q) \le \bar\beta_2(G,q)$, and that if Conjecture 1.4 is true, then $\bar\beta_1(G,q) = \bar\beta_2(G,q)$ $(= \bar\beta(G,q))$.
Theorem 1.3. Suppose that $G$ is an infinite, locally finite, connected transitive graph. If $i_E(G)/D(G) > q/\sqrt{1 + q^2}$, then $\beta_c(G,q) < \bar\beta_1(G,q)$, so that for the $q$-Potts model on $G$ there exists a non-degenerate interval of values of $\beta$ on which $\mu^{G,q}_{1,\beta}, \dots, \mu^{G,q}_{q,\beta}$ and $\mu^{G,q}_{f,\beta}$ are $q + 1$ distinct elements of $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$.
Recall that transitive graphs with more than one end have $p_u(G) = 1$.
Theorem 1.4. Suppose that $G$ is an infinite, locally finite, connected transitive graph. If $G$ has more than one end, then we have for each $q$ and every $\beta > 0$ that $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$, so that $\bar\beta_1(G,q) = \infty$.
The next theorem shows, in particular, that under the extra assumption of unimodularity, the property that $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$ is monotone decreasing in $\beta$, meaning that once it holds for some value of $\beta$, it holds also for smaller values of $\beta$.
Theorem 1.5. Suppose that $G$ is an infinite, locally finite, connected transitive unimodular graph. Then for each $q$:
(a) For each $\beta > 0$ either $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$, or else $\mu^{G,q}_{f,\beta}$ is a mixture of exactly $q$ distinct elements of $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$.
(b) For all $\beta > \bar\beta_1(G,q)$, $\mu^{G,q}_{f,\beta} \notin (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$.
(c) $\bar\beta_1(G,q) < \infty$ iff $p_u(G) < 1$.
Amenable transitive graphs are unimodular ([SW], Corollary 1), so that Theorem 1.5 applies to them. In this case Theorem 1.2 implies that $\beta_c(G,q) = \bar\beta_1(G,q)$. In contrast, Theorem 4.4(a) of [Jon] and Theorem 1.3 above provide sufficient conditions for $\beta_c(G,q) < \bar\beta_1(G,q)$. Theorem 1.3 and Theorem 1.6 below combined and their extensions in Sects. 4 and 5 (see Theorem 4.5 and Theorem 5.4) generalize the work of [NW] and [Wu3], and show that for a class of non-amenable graphs, for all $q$, $0 < \beta_c(G,q) < \bar\beta_1(G,q) \le \bar\beta_2(G,q) < \infty$.
Theorem 1.6. Suppose that $G$ is an infinite, locally finite, connected transitive non-amenable graph. Then if $p_u(G) < 1$, we have $\bar\beta_2(G,q) < \infty$ for each $q$.
The statement that $p_u(G) < 1$ in the next theorem is due to [BB] (see the proof of Corollary 10 there), where it is also proved that a large class of transitive graphs with one end have the quasi-connected minimal cut sets property.
Theorem 1.7. Suppose that $G$ is an infinite, locally finite, connected transitive graph with one end which satisfies the quasi-connected minimal cut sets property. Then $p_u(G) < 1$ and $\bar\beta_2(G,q) < \infty$ for each $q$.
Often the extra tool of duality available in the planar case allows one to make faster progress in this case. This is the case with Conjecture 1.4. Our results in this case are summarized in the next theorem. Note that in this case, in addition to knowing that the property $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$ is monotone decreasing in $\beta$, we know that (1.2) is monotone increasing in $\beta$ (meaning that once it holds for some value of $\beta$, it holds also for larger values of $\beta$). Moreover, we know in this case that at each $\beta$ one of these two properties must hold; informally, “the critical point $\bar\beta(G,q)$ $(= \bar\beta_1(G,q) = \bar\beta_2(G,q))$ is sharp”. The statement that $p_u(G) < 1$ in the next theorem is due to [BS3].
Theorem 1.8. Suppose that $G$ is an infinite, locally finite, connected transitive non-amenable planar graph with one end. Then $p_u(G) < 1$ and for each $q$:
(a) For each $\beta > 0$ either $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$, or else (1.2) holds.
(b) $\bar\beta_1(G,q) = \bar\beta_2(G,q) < \infty$.
Note. Theorem 1.8 extends results obtained independently in [Wu4], a paper that was submitted before the current paper was completed, but of which we learned only after the current paper was submitted.
Theorems 1.4–1.8 relate the behavior of percolation and Potts models, and are similar in this respect to results in [Häg3]. But while [Häg3] is concerned with relations between the critical points $p_c(G)$ and $\beta_c(G,q)$, the theorems above relate $p_u(G)$ to the conjectured $\bar\beta(G,q)$, through its technical surrogates $\bar\beta_1(G,q)$ and $\bar\beta_2(G,q)$, and address item (b.2) of Conjecture 1.4.
It is important to stress that in our discussion above of Potts models on transitive graphs, we are concerned with the set $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$, rather than with the set $(\mathcal{G}^{G,q}_{\beta})_{\mathrm{ext}}$. In the case of the Ising model on a homogeneous tree, Conjecture 1.4 predicts a single critical point, $\beta_c$, separating two regimes in which $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$ has, respectively, cardinality 1 and 3. It is nevertheless known (see [Hig, MoS, BRZ] and [Iof1]; for similar results on more general trees see [Iof2] and [EKPS]) that there is a second critical point $\tilde\beta > \beta_c$, at which the structure of the set $(\mathcal{G}^{G,q}_{\beta})_{\mathrm{ext}}$ changes: $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta})_{\mathrm{ext}}$ for $\beta < \tilde\beta$ and $\mu^{G,q}_{f,\beta} \notin (\mathcal{G}^{G,q}_{\beta})_{\mathrm{ext}}$ for $\beta > \tilde\beta$. Also for the Ising model on $\mathbb{Z}^3$ a second critical point is believed to exist at which the structure of the set $(\mathcal{G}^{G,q}_{\beta})_{\mathrm{ext}}$ changes (roughening transition). Related results can also be found in [SeS].
Conjecture 1.4 does not state anything about the behavior at the critical points. These are much more delicate questions, since even in the case of $\mathbb{Z}^2$ the behavior at $\beta_c$ depends on $q$. These questions may be very challenging in the case of amenable graphs, even at the non-rigorous level. Also for non-amenable graphs the behavior at $\bar\beta(G,q)$ may depend on $G$, since this is the case for the analogous problem (the behavior at $p_u(G)$) for percolation (see [Sch3, Per, Lal3, BS3]). Nevertheless, as for percolation, it seems reasonable to make the following conjecture.
Conjecture 1.5. Suppose that $G$ is an infinite, locally finite, connected transitive graph. If $G$ is non-amenable, then for each $q$ the behavior near and at $\beta_c(G,q)$ is of mean-field nature. In particular, for $q = 2$ (Ising model) the transition is continuous, with $|\mathcal{G}^{G,q}_{\beta_c(G,q)}| = 1$, and the critical exponents take their mean-field values, while for $q = 3, 4, \dots$ the transition is discontinuous, with $|(\mathcal{G}^{G,q}_{\beta_c(G,q),A})_{\mathrm{ext}}| = q + 1$.
For the mean-field behavior of the Potts model see, e.g., Sect. 1.C of [Wu5]. Most aspects of Conjecture 1.5 are known to hold for homogeneous trees; see, e.g., Sect. 4.8 of [Bax] for the computation of some critical exponents for the Ising model and [PLM] for the discontinuity of the transition for the Potts model. In [RNO] evidence from series expansions was found supporting mean-field values for the exponents $\beta$ and $\gamma$ for the Ising model on some non-amenable planar transitive graphs with one end. We have no contribution to the $q > 2$ case of this conjecture. But for $q = 2$ we have the following result.
Theorem 1.9. Suppose that $G$ is an infinite, locally finite, connected transitive graph. If $i_E(G)/D(G) > 2/\sqrt{5}$, then the Ising model bubble diagram is finite, implying that if $G$ is unimodular the Ising model on $G$ has a continuous phase transition and the critical exponents $\gamma$, $\beta$ and $\delta$ exist and take their mean-field values and if the critical exponent $\alpha$ exists it is non-positive.
The bubble diagram (to be defined in Subsect. 4.2) is known to be finite on $\mathbb{Z}^d$ for $d > 4$ (see [Aiz1, Aiz2, AF]). The only published result on the bubble diagram being finite for a non-amenable graph that we are aware of is in [Wu1]. As for percolation, in that paper this is proved for a Cartesian product of $\mathbb{Z}$ and a homogeneous tree with large degree.
1.4. Contact process. For a general introduction to interacting particle systems and to the contact process see for instance [Lig1] and [Lig4]. On an infinite connected graph of bounded degree $G$, one defines the contact process with infection parameter $\lambda > 0$, as a continuous time Markov process in which sites can be in state 0 (healthy) or 1 (infected), with infected sites recovering at rate 1 and healthy sites being infected by each one of their infected neighbors, independently, at rate $\lambda$.
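The transition rates just described can be turned into a small simulation. The following is only an illustrative sketch on a finite cycle (a finite graph stands in for the infinite graph $G$ of the text, so it cannot exhibit the phases discussed below); the names `cycle_graph` and `simulate_contact_process` are hypothetical and not taken from the paper.

```python
import random


def cycle_graph(n):
    """Adjacency lists of the cycle with n sites (assumed finite stand-in for Z)."""
    return {x: [(x - 1) % n, (x + 1) % n] for x in range(n)}


def simulate_contact_process(neighbors, initially_infected, lam, t_max, rng):
    """Run the contact process until extinction or time t_max.

    Infected sites recover at rate 1; each (infected site, healthy neighbor)
    pair transmits the infection at rate lam. Returns the final infected set
    and the list of (time, number of infected sites) after each jump.
    """
    infected = set(initially_infected)
    t = 0.0
    history = [(t, len(infected))]
    while infected:
        pairs = [(x, y) for x in sorted(infected)
                 for y in neighbors[x] if y not in infected]
        total_rate = len(infected) + lam * len(pairs)
        t += rng.expovariate(total_rate)  # waiting time until the next jump
        if t > t_max:
            break
        # Choose a recovery with probability (number infected) / total_rate.
        if not pairs or rng.random() * total_rate < len(infected):
            infected.remove(rng.choice(sorted(infected)))
        else:
            _, y = rng.choice(pairs)
            infected.add(y)
        history.append((t, len(infected)))
    return infected, history
```

For instance, `simulate_contact_process(cycle_graph(20), {0}, 2.0, 5.0, random.Random(1))` returns one sample path; with `lam = 0.0` only recoveries occur, so the infection dies out after as many jumps as there were infected sites.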
(The assumption that G is of bounded degree assures that the process is well defined and does not explode.) It is natural to define the following two critical points, λs (G) ≤ λr (G), which provide the boundaries of three distinct phases described as follows. In the description below, in each case, we suppose that the process starts with only finitely many infected sites. (1) For λ < λs (G), the infection eventually disappears a.s. (One says that the contact process dies out.) (2) For λs (G) < λ < λr (G), the infection persists forever with positive probability, but for every finite set of vertices the infection eventually disappears from them a.s. (One says that the contact process survives globally but not locally. “s” stands for survival.) (3) For λ > λr (G), with positive probability the infection recurs to every vertex infinitely often. (One says that the contact process survives locally or recurs. “r” stands for recurrence.) It is well known that under the assumption introduced above, that G is of bounded degree, λs (G) > 0. Also λr (G) ≤ λr (Z+ ) < ∞. Much deeper results include the facts that on
cubic lattices $\lambda_s(G) = \lambda_r(G)$ (this was proved in [BG]), and that on homogeneous trees of degree at least three $\lambda_s(G) < \lambda_r(G)$ (this was proved in [Pem] and [Lig2] and the proof was greatly simplified in [Sta]). It is natural to conjecture the following.
Conjecture 1.6. Suppose that $G$ is an infinite, locally finite, connected transitive graph. Then $\lambda_s(G) < \lambda_r(G)$ iff $G$ is non-amenable.
Both directions of this conjecture are open problems. The assumption that $G$ is transitive is crucial in Conjecture 1.6. For instance, a homogeneous tree with an infinite linear chain glued to it by one edge is amenable, but has $\lambda_s(G) < \lambda_r(G)$. It is much harder to find examples of non-amenable graphs which have $\lambda_s(G) = \lambda_r(G)$, but this was done in [PS], where a spherically symmetric tree with this property was presented. Nevertheless, in Sect. 7 we will show that even without assuming transitivity, a large value for $i_E(G)$ compared to an appropriate function of the maximum and the minimum degrees of $G$ implies a separation between $\lambda_s(G)$ and $\lambda_r(G)$. In the transitive case the result is contained in Theorem 1.10 below. Regarding criticality, we state next a conjecture analogous to the ones for percolation and for the Potts models.
Conjecture 1.7. Suppose that $G$ is an infinite, locally finite, connected transitive graph. If $G$ is non-amenable, then the behavior of the contact process near and at $\lambda_s$ is of mean-field nature.
The following theorem is motivated by the two conjectures above.
Theorem 1.10. Suppose that $G$ is an infinite, locally finite, connected transitive graph. If $i_E(G)/D(G) > 1/\sqrt{2}$, then $\lambda_s(G) < \lambda_r(G)$ and the contact process open triangle condition holds, implying that if $G$ is unimodular the contact process on $G$ dies out at $\lambda_s(G)$ and the critical exponents $\gamma$, $\beta$ and $\delta$ exist and take their mean-field values.
As far as we know, the contact process open triangle condition (to be defined in Subsect.
7.2) has not been proved to hold for any amenable graph (but for the related oriented percolation process, this was done in the case of Zd with large d in [NY]). The contact process open triangle condition was proved to hold for homogeneous trees of large degree in [Wu2] and this result was extended to all the homogeneous trees with degree at least 3 in [Sch1].
1.5. Organization of the paper. In Sect. 2 basic facts about graphs are introduced and reviewed, along with examples. In Sect. 3 independent bond percolation is discussed in more detail, and an extension of Theorem 1.1 (Theorem 3.2) is proved. In Sect. 4 Potts and Ising models are discussed in more detail, their relations to Fortuin–Kasteleyn random cluster models are reviewed and extended and Theorem 1.4, an extension of Theorem 1.5 (Theorem 4.3), Theorem 1.3 and Theorem 1.9 are proved, along with related results. In Sect. 5 the finite island property of [NW] is introduced and used to prove an extension of Theorem 1.6 (Theorem 5.4), Theorem 1.7 and Theorem 1.8. The methods and the discussion in Subsect. 5.2 are of independent interest in connection to the study of independent bond percolation on non-amenable, not necessarily transitive, graphs.
Section 6 contains further results, related to those in Sections 3, 4 and 5. First some results on the Fortuin–Kasteleyn random cluster models, which are by-products of our study of the Potts model, are presented. Second the analogues for independent site percolation of various results on independent bond percolation are stated with explanation on how to adapt some of the proofs to this case. In Sect. 7 the contact process is discussed in more detail and Theorem 1.10 and related results are proved. 2. Background on Infinite Graphs 2.1. Basics. A graph is a pair G = (V , E), where V is an arbitrary set and E is a subset of the set {{x, y} : x, y ∈ V }. V is called the set of vertices (or sites) of G, and E is called the set of edges (or bonds) of G. Given an edge e = {x, y}, the vertices x and y are called its end-points; e is said to be incident to x and y and x and y are said to be incident to e. Vertices will be said to be neighbors or adjacent if they belong to a common edge. Edges will be said to be neighbors or adjacent if they share a common vertex. The degree of a vertex x ∈ V is the number, dx , of edges incident to it. All the graphs considered in this paper will be locally finite, i.e., the degree of each vertex will be finite. Set D(G) = sup{dx : x ∈ V }. If D(G) < ∞, the graph G is said to be of bounded degree, and D(G) is its maximal degree. A finite chain is a sequence x0 , x1 , . . . , xn , of distinct sites in which for each i, xi is neighbor to xi+1 . The length of this chain is n. The sites x0 and xn are its end-points. We also express this by saying that this chain connects x0 to xn . An infinite chain is a sequence x0 , x1 , . . . , of distinct sites in which for each i, xi is neighbor to xi+1 . The head of the chain is the vertex x0 . (In an abuse of terminology we will sometimes think of chains as unordered sets of vertices.) A finite path is a sequence e1 , e2 , . . . , en of edges in which for each i, ei is neighbor to ei+1 . 
An infinite path is a sequence $e_1, e_2, \dots$ of distinct edges in which for each $i$, $e_i$ is neighbor to $e_{i+1}$. Such a path is said to be edge-self-avoiding if $e_i \ne e_j$ for $i \ne j$. It is said to be vertex-self-avoiding if it is edge-self-avoiding and $e_i \cap e_j = \emptyset$ when $|i - j| > 1$. Two vertices of a graph are said to belong to the same connected component of the graph if there is a finite chain which has them as end-points. Note that this notion partitions a graph into equivalence classes of connected components. A graph is said to be connected if it has a single connected component. A tree is a graph such that for each pair of vertices $x, y \in V$, there is a unique chain which connects them. The distance $\mathrm{dist}(y, z)$ between sites $y$ and $z$ is the minimal length of the chains which have $y$ and $z$ as end-points. The distance between a set of sites $R$ and a set of sites $S$ is the minimal distance between sites in $R$ and sites in $S$; in an abuse of notation it will be indicated by $\mathrm{dist}(R, S)$. A site of $G$ will be singled out and called its root, denoted by $r$. The ball of center $x \in V$ and radius $N$ is the set $B(x, N) = \{y \in V : \mathrm{dist}(x, y) \le N\}$.
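The distance and ball just defined can be computed by breadth-first search. The sketch below is an illustration only; the function names `ball` and `z2_neighbors` are hypothetical, and the lattice $\mathbb{Z}^2$ is used as the example graph because the size of its balls, $2N^2 + 2N + 1$, is easy to check by hand.

```python
from collections import deque


def ball(neighbors, center, radius):
    """Return {y: dist(center, y)} for all y in the ball B(center, radius).

    `neighbors` is a function mapping a vertex to an iterable of its
    neighbors, so the graph itself may be infinite.
    """
    dist = {center: 0}
    queue = deque([center])
    while queue:
        x = queue.popleft()
        if dist[x] == radius:
            continue  # do not explore beyond the requested radius
        for y in neighbors(x):
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def z2_neighbors(v):
    """Neighbors of a site of the square lattice Z^2."""
    x, y = v
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
```

Here `len(ball(z2_neighbors, (0, 0), N))` gives $|B(N)|$ when the root is taken to be the origin.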
We will use the abbreviation $B(r, N) = B(N)$. The edge boundary of a set $S \subset V$ is $\partial_E S = \{\{x, y\} \in E : x \in S, y \in S^c\}$, and its vertex boundary is $\partial_V S = \{y \in S^c : \{x, y\} \in \partial_E S \text{ for some } x \in S\}$. It is easy to see that for finite $S$,
$$\frac{|\partial_E S|}{D(G)} \le |\partial_V S| \le |\partial_E S|, \tag{2.1}$$
where the left-hand-side expression is 0 if $D(G) = \infty$. If $G$ is a tree, then $|\partial_E S| = |\partial_V S|$. The expression $S \subset\subset V$ will mean that $S$ is a finite subset of $V$. In a common abuse of notation we will sometimes denote a subset of $V$ which contains a single element by the name of this element.
2.2. Transitive graphs. An automorphism of $G$ is a one-to-one map $\phi$ from $V$ onto $V$ which preserves the graph structure, i.e., such that $E = \{\{\phi(x), \phi(y)\} : \{x, y\} \in E\}$. The set of all the automorphisms of $G$ will be denoted by $\mathrm{Aut}(G)$. A graph is said to be transitive (or vertex-transitive or homogeneous) if for each pair $x$ and $y$ of its vertices there is an automorphism of the graph which maps $x$ to $y$. Intuitively, a graph is transitive if all its vertices play the same role. A graph is said to be quasi-transitive if there is a finite set of vertices, $V_0$, with the property that each vertex of the graph can be mapped into one of the vertices of $V_0$ by an automorphism. Intuitively, a graph is quasi-transitive if there is a finite number of types of vertices, and vertices of the same type play the same role. Typically results on transitive graphs can be extended to quasi-transitive graphs with minor modifications. This is the case of the results in this paper. An important class of transitive graphs is that of the Cayley graphs of finitely generated groups. (Here and in Subsect. 2.3 we will use a number of terms which come from the basic theory of infinite groups. Readers not familiar with them may consult any standard text, e.g., [Hun]. No results from group theory will be used in the proofs contained in this paper.) Suppose that $V$ is such a group and that $S$ is a finite symmetric set of generators for it. Then the (right) Cayley graph of $V$ for this set of generators is the graph $G = (V, E)$ which has $E = \{\{x, y\} : x, y \in V, y = xz \text{ for some } z \in S\}$. As we will see in Subsect.
2.3 below, the class of Cayley graphs of finitely generated groups includes a large number of examples of transitive graphs and most of those which are of greater interest. Nevertheless this class of graphs is known not to exhaust the class of transitive graphs (see [CK, BLPS1] or [Lyo5]). Moreover there is a class of graphs, known as transitive unimodular graphs which is a proper subclass of that of the transitive graphs and that properly contains the class of Cayley graphs of finitely generated groups. To define this class of graphs, we first define the stabilizer of a site x ∈ V as S(x) = {γ ∈ Aut(G) : γ (x) = x}. A transitive graph G is unimodular if for each x, y ∈ V , |{γ (y) : γ ∈ S(x)}| = |{γ (x) : γ ∈ S(y)}|.
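The Cayley graph construction of Subsect. 2.2, and the reason such graphs are transitive, can be illustrated on a small example. The sketch below uses the finite symmetric group on three letters purely for illustration (the paper is concerned with infinite groups); the names `compose`, `cayley_edges` and `left_translate` are hypothetical. It builds the right Cayley graph and checks that every left multiplication $x \mapsto gx$ maps edges to edges, since $g(xz) = (gx)z$.

```python
from itertools import permutations


def compose(p, q):
    """Product p*q of permutations stored as tuples: (p*q)(i) = p(q(i))."""
    return tuple(p[i] for i in q)


def cayley_edges(elements, generators, mul):
    """Edge set {{x, x*z} : x in V, z in S} of the right Cayley graph."""
    return {frozenset((x, mul(x, z))) for x in elements for z in generators}


def left_translate(g, edges, mul):
    """Image of an edge set under the map x -> g*x."""
    return {frozenset(mul(g, x) for x in e) for e in edges}


# Illustrative choice: S_3 with two transpositions as generators.
# Each transposition is its own inverse, so the generating set is symmetric.
S3 = list(permutations(range(3)))
GENERATORS = [(1, 0, 2), (0, 2, 1)]
EDGES = cayley_edges(S3, GENERATORS, compose)
```

Given vertices $x$ and $y$, the left translation by $g = y x^{-1}$ is an automorphism mapping $x$ to $y$, which is the transitivity property defined above.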
2.3. Examples. The basic examples of graphs of interest in statistical mechanics and related areas are the cubic lattices Zd , d ≥ 1, and other graphs which can be embedded in a periodic fashion in Rd , for some d ≥ 1 (see [Kes] for precise definitions). The cubic lattice Zd is a Cayley graph of the free Abelian group of rank d. The next most important class of examples is that of the homogeneous trees, i.e., trees which are transitive. Clearly there is exactly one homogeneous tree with degree D, for each D = 0, 1, 2, . . . , but only in case D ≥ 2 they are infinite. We will denote by Tb the homogeneous tree of degree D = b + 1. (This somewhat standard notation is motivated by the fact that the index b is the branching number of the tree, as defined for instance in [Lyo1]). Note that T1 = Z. When D = b + 1 is even, Tb is a Cayley graph of the free group with D/2 free generators. When D = b + 1 is odd, Tb is a Cayley graph of the group with b/2 free generators and one generator which is identical to its inverse. Discrete groups of isometries of the hyperbolic spaces Hd , d ≥ 2 (Fuchsian groups in the d = 2 case) define an important class of Cayley graphs (see [SeS, Lal1, Lal2] and [BS3]). The vertices of such graphs can be thought of as the tiles of a tesselation of Hd , with edges connecting vertices which correspond to tiles whose boundaries intersect in a d − 1-dimensional surface. Such examples may be seen as “crystals in a non-Euclidean space” (their Euclidean counterparts are Cayley graphs of discrete groups of isometries of Euclidean space, including Zd and the triangular and hexagonal lattices). New graphs can be obtained from old graphs by means of various operations. One important example of such an operation is the Cartesian product. 
Given a pair of graphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$, we will let $G_1 \times G_2$ denote the graph which has as vertices the elements of the Cartesian product $V_1 \times V_2$ and in which an edge connects $(x, u)$ to $(y, v)$ iff either $x = y$ and $\{u, v\} \in E_2$ or else $\{x, y\} \in E_1$ and $u = v$. In the special case in which $G_1$ and $G_2$ are Cayley graphs, the operation of taking their Cartesian product corresponds to the group-theoretic notion of taking a direct product of the groups. So the resulting graph is again a Cayley graph. Given two finitely generated groups $G_1$ and $G_2$, one can also take their free product, $G_1 * G_2$. Informally, this free product corresponds to the group generated by the collection of all the generators of $G_1$ and $G_2$, keeping all the relations among them as in $G_1$ and $G_2$, but not adding any new relation. For a formal definition, see [Hun]. Cayley graphs of free products of finitely generated groups are an important source of counterexamples for statements about transitive graphs.
Next we introduce two somewhat special examples of non-transitive graphs, which will be used to clarify several issues in the remarks in this paper. $T_!$ is the tree in which the vertices at distance $n$ from the root have $n + 1$ neighbors. Note that the number of vertices at distance $n \ge 1$ from the root is $(n - 1)!$. Given a number $a \in \{2, 3, \dots\}$, the graph Br($a$) (for “bridges”) will be defined as follows. For each odd positive integer $n$, take a disjoint copy, $G_n = (V_n, E_n)$, of the complete graph with $a^{(n+1)/2}$ vertices (i.e., $E_n$ contains all pairs of distinct vertices in $V_n$). For each even non-negative integer $n$ take a distinct vertex $x_n$, distinct also from all the vertices in $V_1, V_3, \dots$. The set of vertices of Br($a$) is $V = \{x_0, x_2, \dots\} \cup V_1 \cup V_3 \cup \dots$ and its set of edges $E$ is obtained by taking all the edges in each set $E_n$, $n = 1, 3, \dots$, and adding an edge between $x_n$ and each vertex in $V_{n-1}$ and $V_{n+1}$, for $n = 2, 4, \dots$, and an edge between $x_0$ and each vertex in $V_1$. $x_0$ will be taken as the root of Br($a$). Note that the set of vertices at distance $n$ from the root is $\{x_n\}$ for $n$ even and is $V_n$ for $n$ odd.
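The factorial count for the first of these two examples, the tree in which vertices at distance $n$ from the root have $n + 1$ neighbors, can be checked by building a truncation of the tree explicitly. This is a sketch under the assumption that vertices are labeled by consecutive integers; the function name `build_factorial_tree` is hypothetical.

```python
from math import factorial


def build_factorial_tree(depth):
    """Build the tree truncated at the given depth.

    A vertex at distance n from the root has n + 1 neighbors: the root has a
    single child, and a vertex at distance n >= 1 has one parent and n children.
    Returns (adjacency lists, list of layers by distance from the root).
    """
    adj = {0: []}
    layers = [[0]]
    next_id = 1
    for n in range(depth):
        children_per_vertex = 1 if n == 0 else n
        new_layer = []
        for v in layers[n]:
            for _ in range(children_per_vertex):
                adj[v].append(next_id)
                adj[next_id] = [v]
                new_layer.append(next_id)
                next_id += 1
        layers.append(new_layer)
    return adj, layers
```

The layer sizes produced are 1, 1, 1, 2, 6, 24, ..., that is, $(n - 1)!$ at distance $n \ge 1$; vertices in the last layer have only their parent as neighbor because of the truncation.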
2.4. Isoperimetric constants. Given an infinite, locally finite, connected graph $G = (V, E)$ its edge-isoperimetric constant is defined as
$$i_E(G) = \inf\left\{\frac{|\partial_E S|}{|S|} : \emptyset \ne S \subset\subset V\right\}.$$
Its vertex-isoperimetric constant is defined as
$$i_V(G) = \inf\left\{\frac{|\partial_V S|}{|S|} : \emptyset \ne S \subset\subset V\right\}.$$
Its anchored edge-isoperimetric constant (introduced in [BLS]) is defined as
$$i_E^*(G) = \lim_{n\to\infty} \inf\left\{\frac{|\partial_E S|}{|S|} : r \in S \subset\subset V,\ S \text{ connected},\ |S| \ge n\right\}.$$
Its anchored vertex-isoperimetric constant is defined as
$$i_V^*(G) = \lim_{n\to\infty} \inf\left\{\frac{|\partial_V S|}{|S|} : r \in S \subset\subset V,\ S \text{ connected},\ |S| \ge n\right\}.$$
Note that none of the isoperimetric constants defined above depends on the choice of the root $r$. Clearly $i_E(G) \le i_E^*(G)$ and $i_V(G) \le i_V^*(G)$. And from (2.1) we have
$$\frac{i_E(G)}{D(G)} \le i_V(G) \le i_E(G), \qquad \frac{i_E^*(G)}{D(G)} \le i_V^*(G) \le i_E^*(G),$$
so that for graphs of bounded degree,
$$i_E(G) > 0 \text{ iff } i_V(G) > 0, \qquad i_E^*(G) > 0 \text{ iff } i_V^*(G) > 0.$$
The same is true for trees, since for them $i_E(G) = i_V(G)$ and $i_E^*(G) = i_V^*(G)$. While even for trees of bounded degree one can have $i_E(G) = 0$ and $i_E^*(G) > 0$, or $i_V(G) = 0$ and $i_V^*(G) > 0$ (see [BLS]), these are not possible if $G$ is transitive (see Proposition 7.4 of [HSS]). The graphs Br($a$), $a \ge 2$, can be used to show that $i_E(G) > 0$ does not imply $i_V(G) > 0$ (nor even $i_V^*(G) > 0$). It is easy to see that $i_V^*(\mathrm{Br}(a)) = 0$, by considering the sets $S_k = \{x \in V : \mathrm{dist}(r, x) \le 2k + 1\}$, $k = 0, 1, 2, \dots$. For these sets $|S_k| \to \infty$ as $k \to \infty$, while $|\partial_V S_k| = 1$ for all $k$. Now we argue why for each $a \in \{2, 3, \dots\}$, $i_E(\mathrm{Br}(a)) > 0$. Given a finite nonempty set $S$ of vertices of Br($a$), let depth($S$) be the maximal distance to the root of the vertices in $S$. There are two cases to consider. First, if depth($S$) $= 2k$, for some integer $k$, then $\partial_E S$ contains all the edges between vertex $x_{2k}$ and vertices in $V_{2k+1}$, so that $|\partial_E S| \ge |V_{2k+1}| = a^{k+1}$. On the other hand it is clear that $|S| \le |\{x \in V : \mathrm{dist}(r, x) \le 2k\}| = k + 1 + a + a^2 + \cdots + a^k = k + (a^{k+1} - 1)/(a - 1) \le k + a^{k+1}$. Second, if depth($S$) $= 2k - 1$, for some integer $k$, then $\partial_E S$ contains for each $x \in V_{2k-1}$ either the edge between $x$ and the vertex $x_{2k-2}$ (in case $x \notin S$), or the edge between $x$ and the vertex $x_{2k}$ (in case $x \in S$), so that $|\partial_E S| \ge |V_{2k-1}| \ge a^k$. On the other hand, it is clear that $|S| \le |\{x \in V : \mathrm{dist}(r, x) \le 2k - 1\}| \le |\{x \in V : \mathrm{dist}(r, x) \le 2k\}| \le k + a^{k+1}$, as above. So
$$i_E(\mathrm{Br}(a)) \ge \inf_{k \ge 1} \frac{a^k}{2k + a^{k+1}} > 0.$$
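The claims about the sets $S_k$ in Br($a$) can be checked numerically on a finite truncation of the graph. The following sketch is only an illustration: the vertex labels `("x", n)` and `("v", n, j)` and all function names are assumptions made here, and the truncation level must be large enough that $x_{2k+2}$ is present.

```python
from collections import deque


def build_br(a, max_level):
    """Truncation of Br(a) to levels 0..max_level (max_level assumed even)."""
    adj = {("x", 0): set()}

    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for n in range(1, max_level, 2):
        block = [("v", n, j) for j in range(a ** ((n + 1) // 2))]
        for i, u in enumerate(block):
            for v in block[i + 1:]:
                add_edge(u, v)          # V_n spans a complete graph
            add_edge(("x", n - 1), u)   # bridge vertex below
            add_edge(("x", n + 1), u)   # bridge vertex above
    return adj


def distances(adj, root):
    """Breadth-first distances from the root."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def edge_boundary(adj, S):
    return {frozenset((x, y)) for x in S for y in adj[x] if y not in S}


def vertex_boundary(adj, S):
    return {y for x in S for y in adj[x] if y not in S}


def s_k(adj, k):
    """The set S_k of vertices within distance 2k + 1 of the root x_0."""
    dist = distances(adj, ("x", 0))
    return {v for v, d in dist.items() if d <= 2 * k + 1}
```

In this truncation the vertex boundary of $S_k$ is the single bridge vertex $x_{2k+2}$, while the edge boundary consists of the $a^{k+1}$ edges joining it to $V_{2k+1}$, which is the mechanism behind $i_V^*(\mathrm{Br}(a)) = 0$ and $i_E(\mathrm{Br}(a)) > 0$.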
2.5. Spectral radius. For more on the topic in this subsection, see, e.g., [MW] and references therein. Given a locally finite connected graph $G = (V, E)$, let $A_G$ stand for its adjacency matrix, i.e., $A_G(x, y)$ takes the values 1 or 0, according to the vertices $x$ and $y$ being neighbors or not in $G$. Let $A_G^n$ denote the $n$th power of $A_G$, and note that $A_G^n(x, y)$ is the number of paths of length $n$ which connect $x$ to $y$. It is easy to see that
$$R(G) = \limsup_{n\to\infty} \left(A_G^n(x, y)\right)^{1/n}$$
does not depend on $x$ and $y$. The quantity $R(G)$ is the spectral radius associated to the matrix $A_G$. A graph $G = (V, E)$ is said to be bipartite if $V$ is the disjoint union of two sets, $V_1$ and $V_2$, and each edge in $E$ contains one element of $V_1$ and one element of $V_2$ (e.g., the graphs $\mathbb{Z}^d$ are bipartite). Set $\mathrm{bip}(G) = 2$ if $G$ is bipartite and $\mathrm{bip}(G) = 1$ otherwise. For $x, y \in V$, set $\mathrm{odd}_G(x, y) = 1$ if $\mathrm{dist}(x, y)$ is odd and $\mathrm{odd}_G(x, y) = 0$ if $\mathrm{dist}(x, y)$ is even. From Theorem 4.6 in [MW], we have
$$A_G^n(x, y) \le (R(G))^n \quad \text{for all } x, y \in V,\ n = 1, 2, \dots. \tag{2.2}$$
For $n(k) = \mathrm{bip}(G)\,k + \mathrm{odd}_G(x, y)$,
$$\lim_{k\to\infty} \left(A_G^{n(k)}(x, y)\right)^{1/n(k)} = R(G) \quad \text{for all } x, y \in V. \tag{2.3}$$
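As an illustration of (2.2) and (2.3), one can count walks on $G = \mathbb{Z}$, for which the spectral radius is $R(\mathbb{Z}) = 2$ (a standard fact, used here as an assumption rather than something derived in this section). The function names below are hypothetical. Since $\mathbb{Z}$ is bipartite, $n(k) = 2k$ when $x = y$.

```python
from math import comb


def walk_counts(n):
    """Return {y: A^n(0, y)} for the nearest-neighbor graph Z.

    A^n(0, y) is computed by repeatedly applying the adjacency matrix to the
    indicator of the origin, so only finitely many sites are ever touched.
    """
    counts = {0: 1}
    for _ in range(n):
        nxt = {}
        for v, c in counts.items():
            for w in (v - 1, v + 1):
                nxt[w] = nxt.get(w, 0) + c
        counts = nxt
    return counts


def root_estimate(k):
    """The quantity (A^{2k}(0, 0))^{1/(2k)} appearing in (2.3)."""
    return walk_counts(2 * k)[0] ** (1.0 / (2 * k))
```

The values `root_estimate(k)` increase slowly toward 2 as `k` grows, while every entry of `walk_counts(n)` stays below `2 ** n`, in agreement with (2.2).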
It is clear that $R(G) \le D(G)$, and it is important to know when this inequality is strict. For transitive graphs it is known that $R(G) < D(G)$ is equivalent to $i_E(G) > 0$. Quantitative versions of this statement are available, and the following one, valid for all infinite, connected, bounded degree graphs, will play a central role in this paper:
$$(i_E(G))^2 + (R(G))^2 \le (D(G))^2. \tag{2.4}$$
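As a quick check of (2.4), consider the homogeneous tree $T_b$ of degree $D = b + 1$. The values $i_E(T_b) = b - 1$ and $R(T_b) = 2\sqrt{b}$ used below are standard facts that are assumed here, not derived in this section.

```latex
\begin{align*}
(i_E(T_b))^2 + (R(T_b))^2
  &= (b-1)^2 + \bigl(2\sqrt{b}\bigr)^2 \\
  &= b^2 - 2b + 1 + 4b \\
  &= (b+1)^2 = (D(T_b))^2 .
\end{align*}
```

So, under these assumed values, both sides of (2.4) coincide for every $b \ge 1$.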
This inequality (which saturates for homogeneous trees) was derived originally in [Moh1], Theorem 2.1(a). It is sometimes referred to as a “Cheeger-type inequality”.
2.6. Number of ends. Given $S \subset\subset V$, the graph $G \setminus S$ is the graph obtained from the graph $G$ by removing the vertices which belong to $S$ and the edges incident to these vertices. The number of ends of the graph $G$ is
$$\mathcal{E}(G) = \sup_{S \subset\subset V} \{\text{number of infinite connected components of } G \setminus S\}.$$
Any Cartesian product of infinite graphs can easily be seen to have a single end. Any Cayley graph of a free product of non-trivial finitely generated groups can easily be seen to have infinitely many ends. It is known (see Sect. 6 of [Moh2]) that for transitive graphs the number of ends can only be: 1 (e.g., Zd , d ≥ 2), 2 (e.g., Z), or ∞ (e.g., Tb , b ≥ 2). Moreover, when the number of ends is 2, the graph is amenable and when the number of ends is infinity the graph is non-amenable. The following proposition will be used in this paper; it can be easily proved with the arguments in the proofs of Propositions 6.1 and 6.2 in [Moh2]. Proposition 2.1. Suppose that G is an infinite, locally finite, connected transitive graph. If G has more than one end, then there is a positive integer n and vertices . . . , x−1 , x0 , x1 , . . . such that the balls B(xk , n), k ∈ Z have the following property. For each i < j , any chain from B(xi , n) to B(xj , n) intersects each B(xk , n), k = i + 1, . . . , j − 1. If G has two ends, then the choices above can be made so that there is also l < ∞ such that any x ∈ V is within distance l of one of the balls B(xk , n), k ∈ Z.
2.7. Cut sets. Given a graph $G = (V, E)$, we define $G^l$ as the graph with vertex set $V$ and edge set $E^l = \{\{x, y\} : x, y \in V, \mathrm{dist}(x, y) \le l\}$. The line graph (or cover graph) of $G$, denoted $G_E$, is the graph which has vertex set $V_E = E$ and edge set $E_E = \{\{e, f\} : \text{as elements of } E,\ e \text{ and } f \text{ share an endpoint}\} = \{\{e, f\} : e, f \in E, e \cap f \ne \emptyset\}$. We will denote by $\mathrm{dist}_E(\cdot, \cdot)$ the distance function on $V_E \times V_E = E \times E$ for the graph $G_E$. Given two vertices $x, y \in V$, a $(x, y)$ cut set $\Pi \subset E$ is a set of edges that has nonempty intersection with every path from $x$ to $y$. A $(x, y)$ cut set is minimal (an mcs) if it contains no proper subset which is also a $(x, y)$ cut set. Given $\Sigma \subset E$, set
$$C(\Sigma) = \sup_{\Sigma_1 \cup \Sigma_2 = \Sigma} \mathrm{dist}_E(\Sigma_1, \Sigma_2).$$
If $C(\Sigma) \le l$, then we say that $\Sigma$ is $l$-close. We will say that an infinite, locally finite, connected graph has the quasi-connected minimal cut sets property if for some $l < \infty$ for each pair of vertices $x, y$ any $(x, y)$ mcs is $l$-close. It is easy to see that in this case any infinite mcs, when considered as a set of vertices in $(G_E)^l$, must contain an infinite chain in this graph. The following result from [BB] shows that the most commonly considered transitive graphs with one end satisfy the quasi-connected minimal cut sets property.
Theorem 2.1 (Babson, Benjamini). Suppose that $G$ is the Cayley graph of a finitely generated, finitely presented group, with respect to any symmetric finite set of generators. Then $G$ has the quasi-connected minimal cut sets property.
See Theorem 1 and Example 3 in [BB] for the proof. It is natural to ask whether all transitive graphs with one end satisfy the quasi-connected minimal cut sets property. Unfortunately this seems not to be the case (I thank Russ Lyons for having told me that R. Kenyon may have a counterexample).
2.8. Planar graphs. Let $\Pi$ be the Euclidean (or the hyperbolic) plane. A graph $G = (V, E)$ is planar if it can be embedded in $\Pi$ satisfying the following restrictions. Each vertex $x \in V$ is mapped into a point $v_x \in \Pi$. Each edge $e = \{x, y\} \in E$ is mapped into the image $\Sigma_e = \gamma_e([0, 1])$ of a curve $\gamma_e : [0, 1] \to \Pi$, with $\gamma_e(0) = v_x$ and $\gamma_e(1) = v_y$. If $e_1 \ne e_2$, then $\gamma_{e_1}((0, 1)) \cap \gamma_{e_2}((0, 1)) = \emptyset$. The connected components of $\Pi \setminus (\cup_{e \in E} \Sigma_e)$ are called the faces of the embedding. The dual multigraph (meaning that more than one edge can connect two vertices and edges may have two identical endpoints) $G^\dagger = (V^\dagger, E^\dagger)$ is defined as follows. $V^\dagger$ is the set of faces and to each edge $e \in E$ one associates a dual edge $e^\dagger$ connecting the two faces which have $\Sigma_e$ in their boundary. When $G$ is planar, transitive and has one end, it is not hard to show that each face is bounded. Proposition 2.1 of [BS3] gives much more detailed information in this case. In particular the embedding can be done so that each bounded set in $\Pi$ contains only finitely many points $v_x$, $x \in V$ (embedded vertices) and intersects only finitely many curve images $\Sigma_e$, $e \in E$ (embedded edges).
3. Independent Bond Percolation
3.1. Background on independent bond percolation on graphs. Given an infinite, locally finite, connected graph $G = (V, E)$, the probability measure which corresponds to
making each bond of $G$ occupied independently with probability $p$ and vacant otherwise will be denoted by $P^G_p$, with corresponding expectation $E^G_p$. Clusters are the connected components of the graph obtained from $G$ by removing all the vacant bonds. In a common abuse of terminology, we sometimes think of the clusters as being graphs and sometimes as being the corresponding vertex sets. Given $A, B \subset V$, we will write $\{A \leftrightarrow B\}$ for the event that $A$ and $B$ intersect a common cluster. We will write $\{A \to \infty\}$ for the event that $A$ intersects an infinite cluster. The union of the clusters that intersect $A$ is denoted by $C(A)$. For $x \in V$ set $\theta^G_x(p) = P^G_p(x \to \infty)$. Obviously, in case $G$ is transitive, $\theta^G_x(p) = \theta^G(p)$ does not depend on $x \in V$. The number of infinite clusters will be denoted by $N$. The threshold for percolation is
\[ p_c(G) = \inf\{p : \theta^G_x(p) > 0 \text{ for some } x \in V\} = \inf\{p : \theta^G_x(p) > 0 \text{ for all } x \in V\} = \inf\{p : P^G_p(N > 0) > 0\} = \inf\{p : P^G_p(N > 0) = 1\}, \]
where the last equality is a consequence of Kolmogorov’s 0-1 law. The analogue of Theorem 2 in [BS1] for bond percolation states that
\[ p_c(G) \le \frac{1}{1 + i_E(G)}. \tag{3.1} \]
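This result will be strengthened in Sect. 5, when we state and prove Theorem 5.3. It is very natural to define several other critical points based on the connectivity properties under $P^G_p$:
\[ p_{\mathrm{exp}}(G) = \sup\{p : \text{for some } C, \gamma \in (0, \infty),\ P^G_p(x \leftrightarrow y) \le C e^{-\gamma\, \mathrm{dist}(x,y)} \text{ for all } x, y \in V\}, \]
\[ p_{\mathrm{conn}}(G) = \sup\Big\{p : \lim_{n \to \infty}\ \sup_{x, y \in V,\ \mathrm{dist}(x,y) \ge n} P^G_p(x \leftrightarrow y) = 0\Big\}, \]
\[ \bar p_{\mathrm{conn}}(G) = \sup\Big\{p : \inf_{x, y \in V} P^G_p(x \leftrightarrow y) = 0\Big\}. \]
Obviously,
\[ p_{\mathrm{exp}}(G) \le p_{\mathrm{conn}}(G) \le \bar p_{\mathrm{conn}}(G). \tag{3.2} \]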
For an infinite, locally finite, connected transitive graph, one defines $p_u(G) = \inf\{p : P^G_p(N = 1) = 1\}$. It is known from [HP] and [Sch2] that for all $p > p_u(G)$, $P^G_p(N = 1) = 1$. A simple application of Harris’ inequality (see, e.g., (26) in [HPS]) then gives that for $p > p_u(G)$, $P^G_p(x \leftrightarrow y) \ge (\theta^G(p))^2$. Therefore, when $G$ is transitive, (3.2) can be extended to
\[ p_{\mathrm{exp}}(G) \le p_{\mathrm{conn}}(G) \le \bar p_{\mathrm{conn}}(G) \le p_u(G). \tag{3.3} \]
For transitive $G$, Theorem 4 in [BS1] provided a lower bound for $p_u(G)$ in terms of the spectral radius of the simple symmetric random walk on $G$. This result was strengthened and extended in Proposition 8.3 in [HPS], which states that for an infinite, locally finite, connected graph, for $p < 1/R(G)$, for any $x, y \in V$,
\[ P^G_p(x \leftrightarrow y) \le \frac{(pR(G))^{\mathrm{dist}(x,y)}}{1 - pR(G)}. \tag{3.4} \]
In particular
\[ p_{\mathrm{exp}}(G) \ge \frac{1}{R(G)}. \tag{3.5} \]
In the transitive case, the bound on $p_u(G)$ obtained by combining (3.3) with (3.5) is identical to the bound implicit in Theorem 4 in [BS1]:
\[ p_u(G) \ge \frac{1}{R(G)}. \tag{3.6} \]
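As a rough numerical illustration of how (3.1) and (3.5) can separate $p_c(G)$ from $p_{\mathrm{exp}}(G)$, the following sketch compares the two bounds. It assumes that (2.4) supplies the estimate $R(G) \le \sqrt{D(G)^2 - i_E(G)^2}$, which is how (2.4) is used later in the proof of Theorem 4.4. The function names and the sample inputs are hypothetical; the pairs (degree, i_e) are simply chosen of the form (D, D - 2).

```python
import math

def pc_upper_bound(i_e):
    """Right-hand side of (3.1): p_c(G) <= 1 / (1 + i_E(G))."""
    return 1.0 / (1.0 + i_e)

def spectral_radius_upper_bound(degree, i_e):
    """Assumed form of (2.4): R(G) <= sqrt(D(G)^2 - i_E(G)^2)."""
    return math.sqrt(degree ** 2 - i_e ** 2)

def pexp_lower_bound(degree, i_e):
    """(3.5) combined with the assumed bound on R(G)."""
    return 1.0 / spectral_radius_upper_bound(degree, i_e)

def bounds_separate(degree, i_e):
    """True when the two bounds force p_c(G) < p_exp(G)."""
    return pc_upper_bound(i_e) < pexp_lower_bound(degree, i_e)

# Hypothetical inputs of the form (D, D - 2).
for degree in (3, 5, 6):
    print(degree, bounds_separate(degree, degree - 2))
```

With these inputs the bounds separate only for the largest degree tried, which matches the idea that the strict inequality needs $i_E(G)$ large relative to $D(G)$.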
In Subsect. 1.2, in connection to Conjecture 1.3, we mentioned some known facts on the issue of when $p_u(G) < 1$. The following is a strengthening of one of these results. From the fact that transitive graphs with infinitely many ends are non-amenable, from (3.1) and from Proposition 2.1, we obtain for such graphs
\[ p_c(G) < \bar p_{\mathrm{conn}}(G) = p_u(G) = 1. \tag{3.7} \]
But the following example, from [LyS], shows that there are transitive graphs with infinitely many ends and $p_{\mathrm{conn}}(G) < p_u(G)$. For instance, this is the case for a Cayley graph $G$ of the free product $\mathbb{Z}^2 * \mathbb{Z}$. Since $G$ has infinitely many ends, $p_u(G) = 1$, but since copies of $\mathbb{Z}^2$ are embedded in $G$, $p_{\mathrm{conn}}(G) \le p_c(\mathbb{Z}^2) < 1$. Observe that also from Proposition 2.1, we have that if $G$ is transitive and has 2 ends, then
\[ p_c(G) = p_{\mathrm{exp}}(G) = p_{\mathrm{conn}}(G) = \bar p_{\mathrm{conn}}(G) = p_u(G) = 1. \tag{3.8} \]
The following is a natural conjecture.

Conjecture 3.1. For any infinite, locally finite, connected transitive graph $G$ and for any $p \in [0, 1]$,
\[ \inf_{x, y \in V} P^G_p(x \leftrightarrow y) > 0 \iff P^G_p(N = 1) = 1. \]
In particular
\[ \bar p_{\mathrm{conn}}(G) = p_u(G). \tag{3.9} \]
In [LyS], this was proved to be the case if $G$ is also unimodular. Note then that from (3.7) and (3.8), the only cases in which (3.9) has not been proved are non-unimodular transitive graphs with one end.

Open Problem 3.1. For which transitive graphs $G$ is $p_{\mathrm{conn}}(G) = p_u(G)$? The counterexample $\mathbb{Z}^2 * \mathbb{Z}$ of [LyS] has infinitely many ends; is there any counterexample with one end?
Open Problem 3.2. Is $p_{\mathrm{exp}}(G) = p_{\mathrm{conn}}(G)$ for every transitive graph $G$?

The methods of [AB] show that for every transitive graph $G$, for all $p < p_c(G)$, $E^G_p(|C(r)|) < \infty$. Combining this with the methods of [Ham] gives under the same conditions that $P^G_p(x \leftrightarrow y) \le C \exp(-\gamma\, \mathrm{dist}(x, y))$, for some $C, \gamma \in (0, \infty)$. Therefore, for transitive graphs, (3.3) can be extended to
\[ p_c(G) \le p_{\mathrm{exp}}(G) \le p_{\mathrm{conn}}(G) \le \bar p_{\mathrm{conn}}(G) \le p_u(G). \tag{3.10} \]
If $G$ is transitive and amenable, then we know that also $p_c(G) = p_u(G)$, so that from (3.10) we learn that these graphs satisfy the equalities in Conjecture 3.1 and Open Problem 3.2. In [Lal3], Proposition 5.1, and in [BS3], Corollary 4.5, the equality $p_{\mathrm{exp}}(G) = p_{\mathrm{conn}}(G) = p_u(G)$ was also proved for a class of transitive non-amenable planar graphs with one end, which were also shown to satisfy the expected inequalities $p_c(G) < p_u(G) < 1$.

3.2. Mean field criticality for independent percolation. Next we introduce diagrams which are commonly employed to prove mean field critical behavior near and at $p_c$. Given an infinite, locally finite, connected graph $G = (V, E)$ and $p \in [0, 1]$, for $k = 0, 1, \ldots$ and $x, y \in V$ set
\[ \mathrm{Diag}^G_{p,k}(x, y) = \sum_{z_1, \ldots, z_k \in V} P^G_p(x \leftrightarrow z_1)\, P^G_p(z_1 \leftrightarrow z_2) \cdots P^G_p(z_{k-1} \leftrightarrow z_k)\, P^G_p(z_k \leftrightarrow y). \tag{3.11} \]
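The structure of (3.11) may be easier to see on a finite toy example: if the two-point function is stored as a matrix, the diagram with $k$ internal vertices is the $(k+1)$-th matrix power. The sketch below is only an illustration under that finite-size assumption; the matrix `TAU` is a hypothetical stand-in for $P^G_p(x \leftrightarrow y)$ on three sites and does not come from any actual percolation model.

```python
def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][z] * b[z][j] for z in range(n)) for j in range(n)]
            for i in range(n)]

def diagram(tau, k):
    """Finite analogue of (3.11): Diag_k is the (k+1)-th power of tau."""
    result = tau
    for _ in range(k):
        result = mat_mul(result, tau)
    return result

def triangle_by_hand(tau, x, y):
    """Direct double sum over z1, z2, i.e. (3.11) with k = 2."""
    n = len(tau)
    return sum(tau[x][z1] * tau[z1][z2] * tau[z2][y]
               for z1 in range(n) for z2 in range(n))

# Hypothetical symmetric two-point function with tau(x, x) = 1.
TAU = [[1.0, 0.5, 0.25],
       [0.5, 1.0, 0.5],
       [0.25, 0.5, 1.0]]

print(diagram(TAU, 2)[0][2], triangle_by_hand(TAU, 0, 2))
```

Because the diagonal entries equal 1 and all entries are non-negative, each extra internal vertex can only increase the diagram in this toy setting.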
$\mathrm{Diag}^G_{p,1}(x, y)$ is known as the bubble diagram, and is of relevance in the study of the critical behavior of the Ising model, as will be reviewed in the next section. $\mathrm{Diag}^G_{p,2}(x, y)$ is known as the triangle diagram, and is of relevance in the study of the critical behavior of the independent percolation model. Before we can state the known results, we need one more definition:
\[ M^G(p, h) = 1 - \sum_{n=1}^{\infty} e^{-hn}\, P^G_p(|C(r)| = n), \]
where $h \ge 0$. We will sometimes below abbreviate $p_c = p_c(G)$, since there is no risk of confusion. The results below on mean field critical behavior were proved in the union of the papers [AN, BA] and [Ngu]. Suppose that $G$ is an infinite, locally finite, connected transitive unimodular graph. (These papers considered smaller classes of graphs, usually only $\mathbb{Z}^d$, but their arguments work with this generality. The role of unimodularity will be explained below.) Under the open triangle condition, which reads
\[ \lim_{n \to \infty}\ \sup_{x, y \in V,\ \mathrm{dist}(x,y) \ge n} \mathrm{Diag}^G_{p_c,2}(x, y) = 0, \tag{3.12} \]
one has the following (the labels on the left indicate the way one usually refers to each result, in terms of a corresponding critical exponent):

[γ = 1]  $C_1 (p_c - p)^{-1} \le E^G_p(|C(r)|) \le C_2 (p_c - p)^{-1}$, for $p < p_c$,
[β = 1]  $C_1 (p - p_c)^1 \le \theta^G(p) \le C_2 (p - p_c)^1$, for $p > p_c$,

[δ = 2]  $C_1 h^{1/2} \le M^G(p_c, h) \le C_2 h^{1/2}$, for $h > 0$,

[Δ = 2]  For $m = 1, 2, \ldots$, $C_1 (p_c - p)^{-2} \le E^G_p(|C(r)|^{m+1}) / E^G_p(|C(r)|^m) \le C_2 (p_c - p)^{-2}$, for $p < p_c$,

where in each case $C_1, C_2 \in (0, \infty)$.

Since by Harris’ inequality $P^G_p(z_2 \leftrightarrow x) \le P^G_p(z_2 \leftrightarrow y) / P^G_p(x \leftrightarrow y)$, it is clear that the open triangle condition (3.12) implies the closed triangle condition:
\[ \mathrm{Diag}^G_{p_c,2}(r, r) < \infty. \tag{3.13} \]
Equations (3.12) and (3.13) are actually equivalent (C. Newman and C. Wu, private communication), but this fact (which seems to only have appeared in print with a proof that is restricted to the special case of $\mathbb{Z}^d$) will not be needed in this paper.

It is beyond the scope of this paper to review the long and technical proofs that diagrammatic conditions imply certain mean-field features of criticality. Nevertheless, it is important to point out the sort of step in such proofs which may not be true for a general transitive graph, but that can be justified under the extra assumption of unimodularity. For concreteness, we consider one example: the derivation in [AN] of their equation (6.4) from Russo’s formula. The following statement follows that equation: “Where we made a simple use of translation invariance (in effect to simplify later notation we replaced 0 with x in the more natural expression)”. In our notation, they performed the transformation
\[ \sum_{x, u, y:\ \mathrm{dist}(x,u) = 1} P(r \leftrightarrow x \leftrightarrow u \leftrightarrow y) = \sum_{x, u, y:\ \mathrm{dist}(r,u) = 1} P(x \leftrightarrow r \leftrightarrow u \leftrightarrow y). \tag{3.14} \]
This transformation is justified on $\mathbb{Z}^d$, using properties of the group of translations (as the authors indicated). The arguments can also easily be extended to Cayley graphs, with the group of automorphisms $\{\varphi_v : v \in V\}$, where $\varphi_v(u) = vu$ for each $u \in V$ ($\varphi_v$ is left group multiplication by $v$), taking the role of the group of translations in the case of $\mathbb{Z}^d$. On the other hand, (3.14) may not hold in general for an arbitrary transitive graph.

The validity of (3.14) in the case of unimodular transitive graphs is directly related to the “mass transport technique”, which works as follows. Let $M(\cdot, \cdot)$ be a function from $V \times V$ to $[0, \infty)$ which is invariant under diagonal actions of the automorphisms of $G$, i.e., $M(a, b) = M(\varphi(a), \varphi(b))$ for all $a, b \in V$, $\varphi \in \mathrm{Aut}(G)$. Corollary 3.5 in [BLPS1] states that if $G = (V, E)$ is an infinite, locally finite, connected transitive unimodular graph, then for each $a \in V$,
\[ \sum_{b} M(a, b) = \sum_{b} M(b, a). \tag{3.15} \]
Equation (3.14) follows immediately, by taking
\[ M(a, b) = \sum_{u, y:\ \mathrm{dist}(b,u) = 1} P(a \leftrightarrow b \leftrightarrow u \leftrightarrow y) \]
and using (3.15) with $a = r$.

Open Problem 3.3. Can one avoid the use of unimodularity in the derivation of mean-field criticality from diagrammatic conditions?
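The identity (3.15) can be checked by hand in the simplest Cayley-graph setting. The sketch below is a toy illustration only: it assumes a finite cyclic group of order `N` in place of an infinite graph, and a transport function that is invariant under the translations $b \mapsto b + v$ (not under every automorphism). The weights are hypothetical and deliberately asymmetric, so that $M(a, b) \neq M(b, a)$ in general while the total mass sent out of a vertex still equals the total mass it receives.

```python
N = 7
WEIGHTS = [0, 3, 1, 4, 1, 5, 9]  # hypothetical, deliberately asymmetric

def transport(a, b):
    """M(a, b) depends only on (b - a) mod N, so it is translation invariant."""
    return WEIGHTS[(b - a) % N]

def mass_out(a):
    """Total mass sent from a: the left-hand side of (3.15)."""
    return sum(transport(a, b) for b in range(N))

def mass_in(a):
    """Total mass received by a: the right-hand side of (3.15)."""
    return sum(transport(b, a) for b in range(N))

for a in range(N):
    print(a, mass_out(a), mass_in(a))
```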
3.3. Consequences for independent percolation of “high non-amenability”. Note that $\mathrm{Diag}^G_{p,k}(x, y)$ is non-decreasing in $k$ and
\[ \mathrm{Diag}^G_{p,0}(x, y) = P^G_p(x \leftrightarrow y). \]
Therefore, the following result extends Proposition 8.3 from [HPS] ((3.4), (3.5) and (3.6) above).

Theorem 3.1. Suppose that $G = (V, E)$ is an infinite, locally finite, connected graph. For each $p \in [0, 1]$, each $k = 0, 1, \ldots$, and each $x, y \in V$,
\[ \mathrm{Diag}^G_{p,k}(x, y) \le \sum_{l = \mathrm{dist}(x,y)}^{\infty} l^k\, (pR(G))^l. \]
Therefore, if $p < 1/R(G)$,
\[ \sup_{x, y \in V,\ \mathrm{dist}(x,y) \ge n} \mathrm{Diag}^G_{p,k}(x, y) \to 0 \quad \text{exponentially fast as } n \to \infty, \]
and for each $x \in V$,
\[ \mathrm{Diag}^G_{p,k}(x, x) < \infty. \]
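To get a feeling for the size of the bound in Theorem 3.1, one can evaluate the series numerically. The sketch below truncates the sum after a fixed number of terms, which is an approximation introduced here and not part of the theorem; the function name and the sample values of $p$ and $R(G)$ are hypothetical. For $k = 0$ the series is geometric and should agree with the closed form $(pR(G))^d / (1 - pR(G))$, in line with the theorem reducing to (3.4) in that case.

```python
def diagram_bound(p, r, k, d, terms=2000):
    """Truncated value of sum_{l >= d} l^k (p r)^l, the bound in Theorem 3.1."""
    x = p * r
    if not 0 <= x < 1:
        raise ValueError("the series converges only when 0 <= p * R(G) < 1")
    return sum((l ** k) * x ** l for l in range(d, d + terms))

# Hypothetical values: p = 0.1, R(G) = 4, so p * R(G) = 0.4.
for d in (3, 10, 20):
    print(d, diagram_bound(0.1, 4.0, 0, d), diagram_bound(0.1, 4.0, 2, d))
```

The printed values shrink quickly as the distance `d` grows, which is the exponential decay asserted in the theorem.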
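Proof. For $u, v \in V$, let $N_l(u, v)$ be the number of edge-self-avoiding paths from $u$ to $v$ with length $l$. Set $x = z_0$ and $y = z_{k+1}$. Using (2.2) we obtain:
\begin{align*}
\mathrm{Diag}^G_{p,k}(x, y)
&= \sum_{z_1, \ldots, z_k \in V}\ \prod_{i=0}^{k} P^G_p(z_i \leftrightarrow z_{i+1}) \\
&\le \sum_{z_1, \ldots, z_k \in V}\ \prod_{i=0}^{k}\ \sum_{l_i \ge 0} N_{l_i}(z_i, z_{i+1})\, p^{l_i} \\
&\le \sum_{z_1, \ldots, z_k \in V}\ \prod_{i=0}^{k}\ \sum_{l_i \ge 0} A_G^{l_i}(z_i, z_{i+1})\, p^{l_i} \\
&= \sum_{l_0, l_1, \ldots, l_k \ge 0} p^{l_0 + \cdots + l_k} \sum_{z_1, \ldots, z_k \in V}\ \prod_{i=0}^{k} A_G^{l_i}(z_i, z_{i+1}) \\
&= \sum_{l_0, l_1, \ldots, l_k \ge 0} p^{l_0 + \cdots + l_k}\, A_G^{l_0 + \cdots + l_k}(x, y) \\
&= \sum_{\substack{l_0, l_1, \ldots, l_k \ge 0 \\ l_0 + \cdots + l_k \ge \mathrm{dist}(x,y)}} p^{l_0 + \cdots + l_k}\, A_G^{l_0 + \cdots + l_k}(x, y) \\
&\le \sum_{\substack{l_0, l_1, \ldots, l_k \ge 0 \\ l_0 + \cdots + l_k \ge \mathrm{dist}(x,y)}} p^{l_0 + \cdots + l_k}\, (R(G))^{l_0 + \cdots + l_k} \\
&= \sum_{l = \mathrm{dist}(x,y)}^{\infty}\ \sum_{\substack{l_0, l_1, \ldots, l_k \ge 0 \\ l_0 + \cdots + l_k = l}} (pR(G))^l
\ \le\ \sum_{l = \mathrm{dist}(x,y)}^{\infty} l^k\, (pR(G))^l. \qquad \square
\end{align*}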
The following theorem, which thanks to (3.3) extends Theorem 1.1, is an immediate consequence of (3.1), (3.5), Theorem 3.1 and (2.4). The reader can either do the elementary computation to check this claim, or wait until we state and prove the stronger Theorem 4.4 in the next section.

Theorem 3.2. Suppose that $G$ is an infinite, locally finite, connected graph. If $i_E(G) > (\sqrt{2D(G)^2 - 1} - 1)/2$ (in particular if $i_E(G)/D(G) > 1/\sqrt{2}$), then $p_c(G) < p_{\mathrm{exp}}(G)$ and the open triangle condition (3.12) holds.

The following question is motivated by this theorem.

Open Problem 3.4. Is $p_c(G) < p_{\mathrm{exp}}(G)$ for every infinite, bounded degree, connected, non-amenable graph $G$?

An affirmative answer would confirm Corollary 1.1. Note that for $a = 2, 3, \ldots$, Br(a) is non-amenable and has $p_c = p_{\mathrm{exp}} = 0$ (and $P^G_p(N = 1) = 1$, for $p > 0$), but these graphs do not have bounded degree.

4. Potts and Ising Models I

4.1. Applications of the Fortuin–Kasteleyn random cluster model. We will use the well known construction of the Gibbs measures of the Potts models by means of the dependent bond percolation process known as the Fortuin–Kasteleyn random cluster model, denoted by FK model in the sequel. We will summarize next the needed facts about the FK model and its relation to the Potts models. Readers who want to learn more about it are referred to the review [GHM].

On an infinite, locally finite, connected graph $G$, for each $q \in [1, \infty)$ and each $p \in [0, 1]$ there are two $q$-FK measures which will be of relevance to us in this paper. The wired one will be denoted by $P^{G,q}_{w,p}$, while the free one will be denoted by $P^{G,q}_{f,p}$. In the case $q = 1$, we have $P^{G,1}_{w,p} = P^{G,1}_{f,p} = P^G_p$.

The Gibbs distributions $\mu^{G,q}_{i,\beta}$, $i = 1, \ldots, q$, for the $q$-Potts model on $G$ at inverse temperature $\beta$ can be obtained by considering a random configuration of occupied and vacant bonds according to the law $P^{G,q}_{w,p}$, $p = 1 - e^{-\beta}$, and assigning to all the sites in each infinite cluster the state $i$, and to all the sites in each finite cluster a state from $\{1, \ldots, q\}$, chosen uniformly at random for each cluster and independently for different clusters. The law of this coupled process with marginals $P^{G,q}_{w,p}$ and $\mu^{G,q}_{i,\beta}$ will be denoted by $\mathbb{P}^{G,q}_{w,i,p}$.

The Gibbs distribution $\mu^{G,q}_{f,\beta}$ for the $q$-Potts model on $G$ at inverse temperature $\beta$ can be obtained by considering a random configuration of occupied and vacant bonds according to the law $P^{G,q}_{f,p}$, $p = 1 - e^{-\beta}$, and assigning to all the sites in each cluster a state from $\{1, \ldots, q\}$, chosen uniformly at random for each cluster and independently for different clusters. The law of this coupled process with marginals $P^{G,q}_{f,p}$ and $\mu^{G,q}_{f,\beta}$ will be denoted by $\mathbb{P}^{G,q}_{f,p}$.

In the remainder of this paper the symbol $*$ will represent $w$ or $f$ (when it appears more than once in an expression or statement it is understood that it takes the same meaning in each appearance). Unless otherwise stated, facts stated for the FK models
are valid for all $q \in [1, \infty)$ and relations between the FK and the Potts model are valid for all $q \in \{2, 3, \ldots\}$.

The following stochastic inequalities will be used to relate the FK model to the independent bond percolation model:
\[ P^{G,q}_{*,p} \le P^G_p, \tag{4.1} \]
and
\[ P^{G,q}_{*,p} \ge P^G_{p/(p + (1-p)q)}. \tag{4.2} \]
The measures $P^{G,q}_{*,p}$ have positive correlations,
\[ P^{G,q}_{f,p} \le P^{G,q}_{w,p}, \tag{4.3} \]
and if $p_1 \le p_2$, then
\[ P^{G,q}_{*,p_1} \le P^{G,q}_{*,p_2}. \tag{4.4} \]
The measures $P^{G,q}_{*,p}$ have a trivial tail σ-field. This can be proved by the argument in the proof of Theorem 3.1(c) in [Gri1], where the case of $\mathbb{Z}^d$ was considered. To adapt that argument to the setting of an arbitrary infinite, locally finite, connected graph, one can use Theorem 6.17 of [GHM] and the last display in that paper before that theorem. In case $G$ is transitive, the measures $P^{G,q}_{*,p}$ are automorphism invariant. Since they have trivial tail σ-fields, they are extremal among those.

The definitions of critical points for independent percolation are naturally extended to percolation under the measure $P^{G,q}_{*,p}$. In particular we define
\[ p_{c,*}(G, q) = \inf\{p : P^{G,q}_{*,p}(x \to \infty) > 0 \text{ for some } x \in V\} = \inf\{p : P^{G,q}_{*,p}(x \to \infty) > 0 \text{ for all } x \in V\}, \]
\[ p_{\mathrm{exp},*}(G, q) = \sup\{p : \text{for some } C, \gamma \in (0, \infty),\ P^{G,q}_{*,p}(x \leftrightarrow y) \le C e^{-\gamma\, \mathrm{dist}(x,y)} \text{ for all } x, y \in V\}, \]
\[ \bar p_{\mathrm{conn},*}(G, q) = \sup\Big\{p : \inf_{x, y \in V} P^{G,q}_{*,p}(x \leftrightarrow y) = 0\Big\}. \]
We denote by $\langle \cdot \rangle^{G,q}_{i,\beta}$ (resp. $\langle \cdot \rangle^{G,q}_{f,\beta}$) the expectation with respect to the Gibbs measure $\mu^{G,q}_{i,\beta}$ (resp. $\mu^{G,q}_{f,\beta}$). It is well known that for $i = 1, \ldots, q$,
\[ \frac{q}{q-1} \Big\langle \delta_{\sigma_x, i} - \frac{1}{q} \Big\rangle^{G,q}_{i,\beta} = P^{G,q}_{w,p}(x \to \infty), \tag{4.5} \]
which leads to
\[ p_{c,w}(G, q) = 1 - e^{-\beta_c(G,q)}. \tag{4.6} \]
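It is also well known that for $i = 1, \ldots, q$,
\[ \frac{q}{q-1} \Big\langle \delta_{\sigma_x, \sigma_y} - \frac{1}{q} \Big\rangle^{G,q}_{i,\beta} = P^{G,q}_{w,p}(x \leftrightarrow y \not\to \infty) + P^{G,q}_{w,p}(x \to \infty,\ y \to \infty), \tag{4.7} \]
and
\[ \frac{q}{q-1} \Big\langle \delta_{\sigma_x, \sigma_y} - \frac{1}{q} \Big\rangle^{G,q}_{f,\beta} = P^{G,q}_{f,p}(x \leftrightarrow y). \tag{4.8} \]
Equations (4.5) and (4.7) can be derived from the coupling $\mathbb{P}^{G,q}_{w,i,p}$, and similarly (4.8) can be obtained from the coupling $\mathbb{P}^{G,q}_{f,p}$. It is well known that
\[ P^{G,q}_{w,p}(N = 0) = 1 \implies P^{G,q}_{f,p} = P^{G,q}_{w,p}. \tag{4.9} \]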
Lemma 4.3 of [Jon] and its proof (presented in compact form in the proof of Proposition 5.1 in [Lyo5]) provide
\[ \mu^{G,q}_{f,\beta} = \sum_{i = 1, \ldots, q} \frac{1}{q}\, \mu^{G,q}_{i,\beta} \iff P^{G,q}_{f,p} = P^{G,q}_{w,p} \ \overset{q > 1}{\implies}\ P^{G,q}_{f,p}(N \le 1) = 1. \tag{4.10} \]
Note the restriction on $q$ in the last implication in (4.10); if $q = 1$ and $p_c(G) < p_u(G)$, this implication is clearly false. In this paper we will not be able to use the equivalence in (4.10) to prove results about the Potts model, but rather we will use this relation in Subsect. 6.1 in the opposite direction.

The following theorem provides another relation between properties of the Gibbs measures of the Potts model and connectivity properties of the related FK model. Note that using (4.8) this theorem can be restated with no mention of the FK model. Given $\varphi \in \mathrm{Aut}(G)$ and a spin configuration $\sigma \in \{1, \ldots, q\}^V$, define the action of $\varphi$ on $\sigma$ by $(\varphi(\sigma))_x = \sigma_{\varphi^{-1}(x)}$, for each $x \in V$.

Theorem 4.1. Suppose that $G = (V, E)$ is an infinite, locally finite, connected transitive graph and set $p = 1 - e^{-\beta}$. For each $q$, if
\[ \inf_{x, y \in V} P^{G,q}_{f,p}(x \leftrightarrow y) = 0, \]
then there exists a sequence of automorphisms of $G$, $(\varphi_n)_{n \ge 1}$, such that for any pair of spin events $A, B \subset \{1, \ldots, q\}^V$,
\[ \lim_{n \to \infty} \mu^{G,q}_{f,\beta}(A \cap \varphi_n^{-1}(B)) = \mu^{G,q}_{f,\beta}(A)\, \mu^{G,q}_{f,\beta}(B). \tag{4.11} \]
In particular $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$.

Note. The conclusion that $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$ in the theorem above is not new. It is contained in Lemma 6.4 of [LyS].
Proof. From the hypothesis of the theorem we know that there is a sequence of sites $x_n \in V$ such that
\[ \lim_{n \to \infty} P^{G,q}_{f,p}(r \leftrightarrow x_n) = 0. \tag{4.12} \]
Because $G$ is transitive, there are corresponding $\varphi_n \in \mathrm{Aut}(G)$ such that $\varphi_n(r) = x_n$.

In order to prove that (4.12) implies (4.11), we use the coupling $\mathbb{P}^{G,q}_{f,p}$ between $P^{G,q}_{f,p}$ and $\mu^{G,q}_{f,\beta}$. $\mathbb{P}^{G,q}_{f,p}$ acts on events contained in $\Omega = \{1, \ldots, q\}^V \times \{0, 1\}^E$. To avoid the introduction of cumbersome notation, we will identify below any spin event $D \subset \{1, \ldots, q\}^V$ with the event $D \times \{0, 1\}^E \subset \Omega$, and any percolation event $D \subset \{0, 1\}^E$ with the event $\{1, \ldots, q\}^V \times D \subset \Omega$.

It is a standard fact that to prove (4.11) it is sufficient to establish it in case the events $A$ and $B$ are cylinder events, i.e., depend only on the state of finitely many spins. We suppose therefore that the events $A$ and $B$ only depend on the states of the spins in a finite set $\Sigma \subset V$. Consider the following percolation event:
\[ F_n = \{\Sigma \leftrightarrow \varphi_n(\Sigma)\} \subset \{0, 1\}^E. \]
Since $P^{G,q}_{f,p}$ has positive correlations and is automorphism invariant, we have
\[ \mathbb{P}^{G,q}_{f,p}(F_n) = P^{G,q}_{f,p}(F_n) \le C\, P^{G,q}_{f,p}(r \leftrightarrow x_n), \]
where $C = \big(P^{G,q}_{f,p}(r \leftrightarrow \Sigma)\, P^{G,q}_{f,p}(\text{all edges in } \Sigma \text{ are occupied})\big)^{-2}$ does not depend on $n$. Therefore (4.12) implies
\[ \lim_{n \to \infty} \mathbb{P}^{G,q}_{f,p}(F_n) = 0. \tag{4.13} \]
For each $S \subset V$, let $\mathcal{P}_S$ be the set of all possible partitions of $S$, and let $\Pi_S$ be the random partition of $S$ according to the relation of being in the same percolation cluster. Let $\mathcal{F}_n$ be the set of partitions of $\Sigma \cup \varphi_n(\Sigma)$ in which some site in $\Sigma$ and some site in $\varphi_n(\Sigma)$ are in the same piece of the partition. This means
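\[ \{\Pi_{\Sigma \cup \varphi_n(\Sigma)} \in \mathcal{F}_n\} = F_n. \tag{4.14} \]
Suppose that $\pi \in (\mathcal{F}_n)^c$. Then, conditioned on $\Pi_{\Sigma \cup \varphi_n(\Sigma)} = \pi$, the choices of the states of the spins in the sets $\Sigma$ and $\varphi_n(\Sigma)$ are performed independently, i.e.,
\[ \mathbb{P}^{G,q}_{f,p}(A \cap \varphi_n^{-1}(B) \mid \Pi_{\Sigma \cup \varphi_n(\Sigma)} = \pi) = \mathbb{P}^{G,q}_{f,p}(A \mid \Pi_{\Sigma \cup \varphi_n(\Sigma)} = \pi)\ \mathbb{P}^{G,q}_{f,p}(\varphi_n^{-1}(B) \mid \Pi_{\Sigma \cup \varphi_n(\Sigma)} = \pi). \]
Therefore,
\begin{align*}
\mu^{G,q}_{f,\beta}(A \cap \varphi_n^{-1}(B))
&= \mathbb{P}^{G,q}_{f,p}(A \cap \varphi_n^{-1}(B)) \\
&= \sum_{\pi \in (\mathcal{F}_n)^c} \mathbb{P}^{G,q}_{f,p}(A \mid \Pi_{\Sigma \cup \varphi_n(\Sigma)} = \pi)\ \mathbb{P}^{G,q}_{f,p}(\varphi_n^{-1}(B) \mid \Pi_{\Sigma \cup \varphi_n(\Sigma)} = \pi)\ \mathbb{P}^{G,q}_{f,p}(\Pi_{\Sigma \cup \varphi_n(\Sigma)} = \pi) \\
&\quad + \mathbb{P}^{G,q}_{f,p}(A \cap \varphi_n^{-1}(B) \cap F_n) \\
&= \sum_{\pi \in \mathcal{P}_{\Sigma \cup \varphi_n(\Sigma)}} \mathbb{P}^{G,q}_{f,p}(A \mid \Pi_{\Sigma \cup \varphi_n(\Sigma)} = \pi)\ \mathbb{P}^{G,q}_{f,p}(\varphi_n^{-1}(B) \mid \Pi_{\Sigma \cup \varphi_n(\Sigma)} = \pi)\ \mathbb{P}^{G,q}_{f,p}(\Pi_{\Sigma \cup \varphi_n(\Sigma)} = \pi) \\
&\quad - \sum_{\pi \in \mathcal{F}_n} \mathbb{P}^{G,q}_{f,p}(A \mid \Pi_{\Sigma \cup \varphi_n(\Sigma)} = \pi)\ \mathbb{P}^{G,q}_{f,p}(\varphi_n^{-1}(B) \mid \Pi_{\Sigma \cup \varphi_n(\Sigma)} = \pi)\ \mathbb{P}^{G,q}_{f,p}(\Pi_{\Sigma \cup \varphi_n(\Sigma)} = \pi) \\
&\quad + \mathbb{P}^{G,q}_{f,p}(A \cap \varphi_n^{-1}(B) \cap F_n).
\end{align*}
From (4.13) and (4.14) we know that the last two terms above vanish, as $n \to \infty$. As for the other term, we can write the following. (Below, $\varphi_n(\pi_2)$ is the partition of $\varphi_n(\Sigma)$ obtained from the partition $\pi_2$ of $\Sigma$ when we relabel the sites of $\varphi_n(\Sigma)$ according to their $\varphi_n$-preimages in $\Sigma$. We will write “$\pi$ comp $\pi_1, \varphi_n(\pi_2)$” for the statement that the partition $\pi$ of $\Sigma \cup \varphi_n(\Sigma)$ is compatible with the partitions $\pi_1$ and $\varphi_n(\pi_2)$ of $\Sigma$ and $\varphi_n(\Sigma)$.)
\begin{align*}
&\sum_{\pi \in \mathcal{P}_{\Sigma \cup \varphi_n(\Sigma)}} \mathbb{P}^{G,q}_{f,p}(A \mid \Pi_{\Sigma \cup \varphi_n(\Sigma)} = \pi)\ \mathbb{P}^{G,q}_{f,p}(\varphi_n^{-1}(B) \mid \Pi_{\Sigma \cup \varphi_n(\Sigma)} = \pi)\ \mathbb{P}^{G,q}_{f,p}(\Pi_{\Sigma \cup \varphi_n(\Sigma)} = \pi) \\
&\quad = \sum_{\pi_1, \pi_2 \in \mathcal{P}_\Sigma}\ \sum_{\substack{\pi \in \mathcal{P}_{\Sigma \cup \varphi_n(\Sigma)} \\ \pi \text{ comp } \pi_1, \varphi_n(\pi_2)}} \mathbb{P}^{G,q}_{f,p}(A \mid \Pi_\Sigma = \pi_1)\ \mathbb{P}^{G,q}_{f,p}(\varphi_n^{-1}(B) \mid \Pi_{\varphi_n(\Sigma)} = \varphi_n(\pi_2))\ \mathbb{P}^{G,q}_{f,p}(\Pi_{\Sigma \cup \varphi_n(\Sigma)} = \pi) \\
&\quad = \sum_{\pi_1, \pi_2 \in \mathcal{P}_\Sigma} \mathbb{P}^{G,q}_{f,p}(A \mid \Pi_\Sigma = \pi_1)\ \mathbb{P}^{G,q}_{f,p}(B \mid \Pi_\Sigma = \pi_2) \times \sum_{\substack{\pi \in \mathcal{P}_{\Sigma \cup \varphi_n(\Sigma)} \\ \pi \text{ comp } \pi_1, \varphi_n(\pi_2)}} \mathbb{P}^{G,q}_{f,p}(\Pi_{\Sigma \cup \varphi_n(\Sigma)} = \pi) \\
&\quad = \sum_{\pi_1, \pi_2 \in \mathcal{P}_\Sigma} \mathbb{P}^{G,q}_{f,p}(A \mid \Pi_\Sigma = \pi_1)\ \mathbb{P}^{G,q}_{f,p}(B \mid \Pi_\Sigma = \pi_2)\ \mathbb{P}^{G,q}_{f,p}(\{\Pi_\Sigma = \pi_1\} \cap \{\Pi_{\varphi_n(\Sigma)} = \varphi_n(\pi_2)\}).
\end{align*}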
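Since $P^{G,q}_{f,p}$ has a trivial tail σ-field, it follows from, e.g., Proposition 7.9 of [Geo] that, as $n \to \infty$, the last expression converges to
\begin{align*}
&\sum_{\pi_1, \pi_2 \in \mathcal{P}_\Sigma} \mathbb{P}^{G,q}_{f,p}(A \mid \Pi_\Sigma = \pi_1)\ \mathbb{P}^{G,q}_{f,p}(B \mid \Pi_\Sigma = \pi_2)\ \mathbb{P}^{G,q}_{f,p}(\Pi_\Sigma = \pi_1)\ \mathbb{P}^{G,q}_{f,p}(\Pi_\Sigma = \pi_2) \\
&\quad = \Big( \sum_{\pi_1 \in \mathcal{P}_\Sigma} \mathbb{P}^{G,q}_{f,p}(A \mid \Pi_\Sigma = \pi_1)\ \mathbb{P}^{G,q}_{f,p}(\Pi_\Sigma = \pi_1) \Big) \times \Big( \sum_{\pi_2 \in \mathcal{P}_\Sigma} \mathbb{P}^{G,q}_{f,p}(B \mid \Pi_\Sigma = \pi_2)\ \mathbb{P}^{G,q}_{f,p}(\Pi_\Sigma = \pi_2) \Big) \\
&\quad = \mathbb{P}^{G,q}_{f,p}(A)\ \mathbb{P}^{G,q}_{f,p}(B) = \mu^{G,q}_{f,\beta}(A)\ \mu^{G,q}_{f,\beta}(B).
\end{align*}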
This completes the proof of (4.11).

Applying (4.11) in the case in which $A = B$ is automorphism invariant gives $\mu^{G,q}_{f,\beta}(A) = (\mu^{G,q}_{f,\beta}(A))^2$, which is equivalent to
\[ \mu^{G,q}_{f,\beta}(A) \in \{0, 1\}. \tag{4.15} \]
In other words, the σ-algebra of automorphism invariant spin events is trivial under the measure $\mu^{G,q}_{f,\beta}$. Using now Corollary 7.4 of [Geo], as in Step 5 of the proof of Theorem 12.31 in the same book, we conclude that $\mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$, as claimed. $\square$
Proof of Theorem 1.4. This claim is an immediate consequence of Theorem 4.1, (4.1), (3.7) and (4.1). $\square$

The remainder of this subsection is dedicated to the proof of an extension of Theorem 1.5. For later reference, we state next the extension of Conjecture 3.1 to the FK models.

Conjecture 4.1. For any infinite, locally finite, connected transitive graph $G$, any $q \ge 1$ and any $p \in [0, 1]$,
\[ \inf_{x, y \in V} P^{G,q}_{*,p}(x \leftrightarrow y) > 0 \iff P^{G,q}_{*,p}(N = 1) = 1. \]
In particular for all $p > \bar p_{\mathrm{conn},*}(G, q)$, $P^{G,q}_{*,p}(N = 1) = 1$.

The proof in [LyS] that for unimodular transitive graphs Conjecture 3.1 holds also applies to the FK models, as noted in the proof of Proposition 5.2 in [Lyo5], proving Conjecture 4.1 in this case.

The “⇐” part of Conjecture 4.1 is easy to prove, using the fact that $P^{G,q}_{*,p}$ has positive correlations to conclude that
\[ P^{G,q}_{*,p}(N \le 1) = 1 \implies P^{G,q}_{*,p}(x \leftrightarrow y) \ge (P^{G,q}_{*,p}(r \to \infty))^2. \tag{4.16} \]
Inequality (4.3) shows that if Conjecture 4.1 holds, then
\[ P^{G,q}_{f,p}(N = 1) = 1 \implies P^{G,q}_{w,p}(N = 1) = 1. \tag{4.17} \]
Given an infinite, locally finite, connected graph $G = (V, E)$, $q \in \{2, 3, \ldots\}$, and $\beta > 0$, we define for $i = 1, \ldots, q$, $\tilde\mu^{G,q}_{i,\beta}$ as the probability measure on $\{1, \ldots, q\}^V$ obtained as follows. Consider a random configuration of occupied and vacant bonds according to the law $P^{G,q}_{f,p}$, $p = 1 - e^{-\beta}$, and assign to all the sites in the infinite clusters the state $i$, and to all the sites in each finite cluster a state from $\{1, \ldots, q\}$, chosen uniformly at random for each cluster and independently for different clusters.

It is clear that if $G$ is transitive, the measures $\tilde\mu^{G,q}_{i,\beta}$ are automorphism invariant. Furthermore, these measures then have a trivial invariant σ-field and are therefore extremal in the set of automorphism invariant probability measures on $\{1, \ldots, q\}^V$. This can be proved by adapting the proof of Theorem 4.1. In the current case, the following definitions replace the ones used in that proof. Let $F_n$ be the event that some finite cluster intersects $\Sigma$ and $\varphi_n(\Sigma)$; (4.13) is then clearly satisfied. Let $\mathcal{P}_S$ be the set of all possible marked partitions of $S \subset V$, meaning that each piece of the partition may be marked or not.
Let $\Pi_S$ be the random marked partition of $S$ according to the relation of being in the same percolation cluster, with marks identifying pieces of the partition which belong to infinite clusters. Let $\mathcal{F}_n$ be the set of partitions of $\Sigma \cup \varphi_n(\Sigma)$ in which some site in $\Sigma$ and some site in $\varphi_n(\Sigma)$ are in the same unmarked piece of the partition. This means that (4.14) also holds in this case. The proof then proceeds as before.

If $P^{G,q}_{f,p}(r \to \infty) > 0$, then $\tilde\mu^{G,q}_{i,\beta}$, $i = 1, \ldots, q$, are distinct from each other. Indeed, with a self-explanatory notation,
\[ \langle \delta_{\sigma_r, i} \rangle^{\sim\, G,q}_{i,\beta} = P^{G,q}_{f,p}(r \to \infty) + \frac{1}{q}\big(1 - P^{G,q}_{f,p}(r \to \infty)\big) = \frac{q-1}{q}\, P^{G,q}_{f,p}(r \to \infty) + \frac{1}{q} > \frac{1}{q}, \]
while, for $j \neq i$,
\[ \langle \delta_{\sigma_r, j} \rangle^{\sim\, G,q}_{i,\beta} = \frac{1}{q}\big(1 - P^{G,q}_{f,p}(r \to \infty)\big) < \frac{1}{q}. \]
From (4.3) and the construction of $\mu^{G,q}_{i,\beta}$ and of $\tilde\mu^{G,q}_{i,\beta}$, $i = 1, \ldots, q$, by assigning spins in the same fashion to the clusters of a percolation process with law $P^{G,q}_{w,p}$ or $P^{G,q}_{f,p}$, respectively, we have the equivalence
\[ \tilde\mu^{G,q}_{i,\beta} = \mu^{G,q}_{i,\beta} \iff P^{G,q}_{f,p} = P^{G,q}_{w,p}. \tag{4.18} \]
(Compare with (4.10).) On the other hand, it is not clear in general whether the measures $\tilde\mu^{G,q}_{i,\beta}$ are in $\mathcal{G}^{G,q}_\beta$. The next theorem provides a sufficient condition for this to happen. This theorem is complementary to Theorem 4.1 in that it provides a sufficient condition for $\mu^{G,q}_{f,\beta} \notin (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$ in terms of percolation properties of $P^{G,q}_{f,p}$.

Theorem 4.2. Suppose that $G = (V, E)$ is an infinite, locally finite, connected transitive graph and set $p = 1 - e^{-\beta}$. If $P^{G,q}_{f,p}(N = 1) = 1$, then $\tilde\mu^{G,q}_{i,\beta}$, $i = 1, \ldots, q$, are distinct elements of $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$ and
\[ \mu^{G,q}_{f,\beta} = \sum_{i = 1, \ldots, q} \frac{1}{q}\, \tilde\mu^{G,q}_{i,\beta}. \tag{4.19} \]
In particular $\mu^{G,q}_{f,\beta} \notin (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$.

Proof. $P^{G,q}_{f,p}(N = 1) = 1$ implies $P^{G,q}_{f,p}(r \to \infty) > 0$, which implies that the measures $\tilde\mu^{G,q}_{i,\beta}$, $i = 1, \ldots, q$, are distinct from each other, as we saw above.

From the construction of $\mu^{G,q}_{f,\beta}$ and of $\tilde\mu^{G,q}_{i,\beta}$, $i = 1, \ldots, q$, by assigning spins to the clusters of a percolation process with law $P^{G,q}_{f,p}$, we obtain (4.19) from the uniqueness of the infinite cluster in this percolation process.

From Theorem 14.15(c) of [Geo], we learn from (4.19) that for each $i$, $\tilde\mu^{G,q}_{i,\beta} \in \mathcal{G}^{G,q}_{\beta,A}$. (Note that while this theorem is stated in [Geo] in the setting of translation invariant Gibbs measures on $\mathbb{Z}^d$, its proof applies also to the more general setting of automorphism invariant Gibbs measures on a transitive graph.) Being automorphism invariant Gibbs measures with trivial invariant σ-field, the measures $\tilde\mu^{G,q}_{i,\beta}$ are in $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$ by, e.g., Corollary 7.4 of [Geo]. $\square$
The following is an extended version of Theorem 1.5.

Theorem 4.3. Suppose that $G$ is an infinite, locally finite, connected transitive unimodular graph. Then for each $q$:

(a) We have the following dichotomy for each $\beta > 0$ and $p = 1 - e^{-\beta}$. Either we have
\[ \inf_{x, y \in V} P^{G,q}_{f,p}(x \leftrightarrow y) = 0, \qquad P^{G,q}_{f,p}(N = 1) = 0, \qquad \mu^{G,q}_{f,\beta} \in (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}. \]
Or else we have
\[ \inf_{x, y \in V} P^{G,q}_{f,p}(x \leftrightarrow y) > 0, \qquad P^{G,q}_{f,p}(N = 1) = 1, \]
and $\mu^{G,q}_{f,\beta}$ is a mixture of exactly $q$ measures in $(\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$, given by (4.19).

(b) The first alternative above holds for all $\beta < \bar\beta_1(G, q)$ and the second alternative holds for all $\beta > \bar\beta_1(G, q)$, with
\[ \bar\beta_1(G, q) = \begin{cases} \log \dfrac{1}{1 - \bar p_{\mathrm{conn},f}(G, q)} & \text{if } \bar p_{\mathrm{conn},f}(G, q) < 1, \\[2mm] \infty & \text{if } \bar p_{\mathrm{conn},f}(G, q) = 1. \end{cases} \tag{4.20} \]

(c) $\bar\beta_1(G, q) < \infty$ iff $p_u(G) < 1$.

Proof. Claim (a) is an immediate consequence of Theorem 4.1, Theorem 4.2 and the fact that Conjecture 4.1 has been proved in the unimodular case (see the remark after that conjecture). Claim (b) is now immediate from (a) and the monotonicity (4.4). Claim (c) is immediate from (4.20), the fact that Conjecture 3.1 holds in the unimodular case (see the remark after that conjecture) and the inequalities (4.1) and (4.2). $\square$

Note from the proof above that the assumption in Theorem 1.5 and Theorem 4.3 that $G$ is unimodular would not be necessary if Conjecture 4.1 were vindicated (this would include the vindication of Conjecture 3.1).

The following conjecture is unfortunately open even in the unimodular case.

Conjecture 4.2. Suppose that $G = (V, E)$ is an infinite, locally finite, connected transitive graph. Then for each $q \ge 1$,
\[ P^{G,q}_{f,p}(N = 1) = 1 \implies P^{G,q}_{f,p} = P^{G,q}_{w,p}. \]
(Compare with (4.9), (4.10) and (4.17).)

Because of (4.18) and (4.19) (or alternatively, because of (4.10)), if Conjecture 4.2 were proved, we would obtain in Theorem 4.3 the stronger conclusion that when $\mu^{G,q}_{f,\beta} \notin (\mathcal{G}^{G,q}_{\beta,A})_{\mathrm{ext}}$, then (1.2) holds, so that in particular $\bar\beta_1(G, q) = \bar\beta_2(G, q)$. Note that even in the case of the lattices $\mathbb{Z}^d$, $d \ge 3$, these conclusions are not currently available. To the best of our knowledge, even the weaker statement (a) in Theorem 1.5 is new in this case.

Note. The proof of Proposition 6.1(iv) of [HJL], which was written independently of this paper, shows that Conjecture 4.2 is true in the case of planar non-amenable transitive graphs with one end. Observe that this result can also be obtained from the arguments in the proof of Theorem 5.6 in the current paper.
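The change of variables $p = 1 - e^{-\beta}$ between the FK parameter and the inverse temperature, and its inverse as it appears in (4.20), can be written down directly. The sketch below is a small convenience illustration; the function names are hypothetical, and the case $p = 1$ is mapped to infinity to mirror the second line of (4.20).

```python
import math

def p_from_beta(beta):
    """FK parameter corresponding to inverse temperature beta: p = 1 - exp(-beta)."""
    return 1.0 - math.exp(-beta)

def beta_from_p(p):
    """Inverse map used in (4.20): beta = log(1 / (1 - p)), infinite when p = 1."""
    if p >= 1.0:
        return math.inf
    return math.log(1.0 / (1.0 - p))

# Hypothetical threshold value, used only to exercise the two maps.
p_bar = 0.3
beta_bar = beta_from_p(p_bar)
print(beta_bar, p_from_beta(beta_bar))
```

Both maps are increasing, so thresholds expressed in $p$ and in $\beta$ order the two alternatives of Theorem 4.3 in the same way.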
4.2. Mean field criticality for the Ising model. One extends the definition of the diagrams (3.11) in the obvious way:
\[ \mathrm{Diag}^{G,q}_{p,*,k}(x, y) = \sum_{z_1, \ldots, z_k \in V} P^{G,q}_{p,*}(x \leftrightarrow z_1)\, P^{G,q}_{p,*}(z_1 \leftrightarrow z_2) \cdots P^{G,q}_{p,*}(z_{k-1} \leftrightarrow z_k)\, P^{G,q}_{p,*}(z_k \leftrightarrow y). \]
The diagram corresponding to the Ising model ($q = 2$), with $k = 1$, is of fundamental relevance in studying the critical behavior of this model. First we introduce the usual Ising spins,
\[ s_x = \begin{cases} -1 & \text{if } \sigma_x = 1, \\ +1 & \text{if } \sigma_x = 2, \end{cases} \]
for $x \in V$. Also, we use below the more standard notation for the (+)-phase and for the (−)-phase of the Ising model:
\[ \mu^{G,\mathrm{Ising}}_{-,\beta} = \mu^{G,2}_{1,\beta}, \qquad \mu^{G,\mathrm{Ising}}_{+,\beta} = \mu^{G,2}_{2,\beta}, \]
as well as further similar self-explanatory notation. Note that (4.7) can now be rewritten in the following way, where $q = 2$:
\[ \langle s_x s_y \rangle^{G,\mathrm{Ising}}_{+,\beta} = \frac{q}{q-1} \Big\langle \delta_{\sigma_x, \sigma_y} - \frac{1}{q} \Big\rangle^{G,q}_{2,\beta} = P^{G,q}_{w,p}(x \leftrightarrow y \not\to \infty) + P^{G,q}_{w,p}(x \to \infty,\ y \to \infty). \]
So we have, for $q = 2$ and $k = 1$:
\[ \mathrm{Diag}^{G,q}_{p,w,k}(x, y) = \sum_{z \in V} \langle s_x s_z \rangle^{G,\mathrm{Ising}}_{+,\beta}\, \langle s_z s_y \rangle^{G,\mathrm{Ising}}_{+,\beta} = \mathrm{Bubble}^{G,\mathrm{Ising}}_\beta(x, y), \tag{4.21} \]
provided $P^{G,q}_{w,p}(N = 0) = 1$.

Before we can state the known results on mean field critical behavior, we need a few more definitions. We will sometimes below abbreviate $\beta_c = \beta_c(G, \mathrm{Ising})$, since there is no risk of confusion. Since there is a unique Gibbs measure when $\beta < \beta_c$, we will drop the indication of + or − when referring to this unique measure. We will also have to consider the Ising model with an external field $h$. We denote by $\mu^{G,\mathrm{Ising}}_{+,\beta,h}$ the (automorphism invariant when $G$ is transitive) Gibbs measure for this system obtained by taking the infinite volume limit with + boundary conditions. (Note that on non-amenable graphs there may be more than one Gibbs distribution even when $h \neq 0$. This is well known for homogeneous trees, and was extended to other transitive non-amenable graphs in [JS].)

The results below on mean field critical behavior were proved in the union of the papers [Sok, Aiz, AG] and [AF] (see also [ABF]). Suppose that $G$ is an infinite, locally finite, connected transitive unimodular graph. Under the bubble diagram condition, which reads
\[ \mathrm{Bubble}^{G,\mathrm{Ising}}_{\beta_c(G,\mathrm{Ising})}(r, r) < \infty, \]
one has the following (the labels on the left indicate the way one usually refers to each result, in terms of a corresponding critical exponent; note that in this standard notation, the critical exponent β is not the same as the inverse temperature β):
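[α ≤ 0]  $0 \le \dfrac{d}{d\beta} \sum_{x:\ \mathrm{dist}(x,r) = 1} \langle s_r s_x \rangle^{G,\mathrm{Ising}}_{\beta} < C$, for $\beta < \beta_c$,

[γ = 1]  $C_1 (\beta_c - \beta)^{-1} \le \sum_{x \in V} \langle s_r s_x \rangle^{G,\mathrm{Ising}}_{\beta} \le C_2 (\beta_c - \beta)^{-1}$, for $\beta < \beta_c$,

[β = 1/2]  $C_1 (\beta - \beta_c)^{1/2} \le \langle s_r \rangle^{G,\mathrm{Ising}}_{+,\beta} \le C_2 (\beta - \beta_c)^{1/2}$, for $\beta > \beta_c$,

[δ = 3]  $C_1 h^{1/3} \le \langle s_r \rangle^{G,\mathrm{Ising}}_{+,\beta_c,h} \le C_2 h^{1/3}$, for $h > 0$,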
where in each case $C, C_1, C_2 \in (0, \infty)$.

4.3. Consequences for FK and Potts models of “high non-amenability”. The next theorem is an extension to the FK models of Theorem 3.2, and hence of Theorem 1.1. It implies Theorem 1.3 and Theorem 1.9, as will be explained after we prove it.

Theorem 4.4. Suppose that $G$ is an infinite, locally finite, connected graph and $q \ge 1$. If
\[ i_E(G) > \frac{q}{1 + q^2} \Big( -1 + \sqrt{D(G)^2 + q^2 D(G)^2 - q^2} \Big), \tag{4.22} \]
then for the $q$-FK model on $G$,
\[ p_{c,w}(G, q) \le p_{c,f}(G, q) < p_{\mathrm{exp},w}(G, q) \le p_{\mathrm{exp},f}(G, q). \tag{4.23} \]
Moreover, for $k = 0, 1, \ldots$,
\[ \sup_{x, y \in V,\ \mathrm{dist}(x,y) \ge n} \mathrm{Diag}^{G,q}_{p_{c,w}(G,q),k}(x, y) \to 0 \quad \text{exponentially fast as } n \to \infty, \]
and for each $x \in V$,
\[ \mathrm{Diag}^{G,q}_{p_{c,w}(G,q),k}(x, x) < \infty. \]

Proof. From (4.2) and (3.1) we know that
\[ \frac{p_{c,*}(G, q)}{p_{c,*}(G, q) + (1 - p_{c,*}(G, q))\, q} \le \frac{1}{1 + i_E(G)}. \]
This is equivalent to
\[ p_{c,*}(G, q) \le \frac{q}{q + i_E(G)}. \tag{4.24} \]
From (4.1) and (3.5) we know that pexp,∗ (G, q) ≥ 1/R(G). Combining this with (2.4) we have pexp,∗ (G, q) ≥
1 D(G)2
− iE (G)2
.
The strict inequality in (4.23) is implied by (4.24) and (4.25), provided that q 1 < , 2 q + iE (G) D(G) − iE (G)2
(4.25)
304
R. H. Schonmann
which is equivalent to (1 + q 2 )iE (G)2 + 2qiE (G) − q 2 (D(G)2 − 1) > 0. Since iE (G) ≥ 0, this is equivalent to the inequality (4.22). This completes the proof of (4.23), since the non-strict inequalities there are clear G,q G,q from Pf,p ≤ Pw,p . The claims about the diagrams follow by the same argument, using Theorem 3.1 in the place of (3.5). ! Note that the condition iE (G) q > , D(G) 1 + q2
(4.26)
which appears in Theorem 1.3 and Theorem 1.9 is stronger than (4.22), and that therefore, thanks to (4.6), Theorem 4.1 and (4.21), Theorem 1.3 and Theorem 1.9 follow from Theorem 4.4. We chose to state Theorem 1.1, Theorem 1.3 and Theorem 1.9 in the introduction with the less cumbersome (4.26) rather than with (4.22) to make these theorems look more transparent.

Theorem 1.3 has the following counterpart for not necessarily transitive graphs.

Theorem 4.5. Suppose that $G$ is an infinite, locally finite, connected graph. If (4.22) holds, then there is a non-degenerate interval of values of β on which (1.2) fails and $\mu^{G,q}_{f,\beta}$ has exponentially fast decaying correlations, in the sense that
\[
\sup_{x,y \in V,\ \mathrm{dist}(x,y) \ge l} \left( \langle \delta_{\sigma_x,\sigma_y} \rangle^{G,q}_{f,\beta} - \frac{1}{q} \right) \to 0 \quad \text{exponentially fast, as } l \to \infty. \tag{4.27}
\]
In particular $\bar\beta_2(G, q) > \beta_c(G, q)$.

Proof. Set $\hat p(G, q) = q/(q + i_E(G))$, which is equivalent to
\[
\frac{\hat p(G, q)}{\hat p(G, q) + (1 - \hat p(G, q))q} = \frac{1}{1 + i_E(G)}. \tag{4.28}
\]
From the proof of Theorem 4.4 we know that $\hat p(G, q) < p_{\exp,f}(G, q)$, so that the interval
\[
I = \left( \log \frac{1}{1 - \hat p(G, q)},\ \log \frac{1}{1 - p_{\exp,f}(G, q)} \right)
\]
is well defined and not empty. For $\beta \in I$, (4.27) follows from (4.8), since $p = 1 - e^{-\beta} < p_{\exp,f}(G, q)$.

On the other hand, the proof of Theorem 2 in [BS1] ((3.1) in the current paper, when restated for bond percolation) shows that for $p > 1/(1 + i_E(G))$,
\[
\inf_{x \in V} P^{G}_{p}(x \to \infty) > 0.
\]
(For details, see the proof of Theorem 5.3 in Subsect. 5.2, which extends (3.1).) For $\beta \in I$, $p = 1 - e^{-\beta} > \hat p(G, q)$ and therefore $p/(p + (1 - p)q) > 1/(1 + i_E(G))$, by (4.28). Using (4.2), we conclude that
\[
\inf_{x \in V} P^{G,q}_{w,p}(x \to \infty) \ge \inf_{x \in V} P^{G}_{p/(p+(1-p)q)}(x \to \infty) > 0.
\]
From (4.7) and the fact that $P^{G,q}_{w,p}$ has positive correlations, we obtain now for $\beta \in I$ and $i = 1, \ldots, q$,
\[
\inf_{x,y \in V} \langle \delta_{\sigma_x,\sigma_y} \rangle^{G,q}_{i,\beta} \ge \frac{1}{q} + \frac{q - 1}{q} \left( \inf_{x \in V} P^{G,q}_{w,p}(x \to \infty) \right)^2 > \frac{1}{q}.
\]
But if (1.2) held, this would contradict (4.27). □
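The algebra in the proof of Theorem 4.4 can be sanity-checked numerically. The following is a minimal sketch, not part of the original argument: the function names are hypothetical, and the inputs are treated as plain numbers with 0 < i < D rather than as constants of an actual graph. It checks that (4.22), the quadratic inequality, and the strict gap between the bounds (4.24) and (4.25) all agree, and that (4.26) is the stronger condition.

```python
import math

def threshold_4_22(D, q):
    # Right-hand side of (4.22).
    return q / (1 + q * q) * (-1 + math.sqrt(D * D + q * q * D * D - q * q))

def quadratic(i, D, q):
    # Left-hand side of the quadratic inequality in the proof of Theorem 4.4.
    return (1 + q * q) * i * i + 2 * q * i - q * q * (D * D - 1)

def gap(i, D, q):
    # Lower bound (4.25) on p_exp minus upper bound (4.24) on p_c; needs 0 < i < D.
    return 1 / math.sqrt(D * D - i * i) - q / (q + i)

def threshold_4_26(D, q):
    # Value of i_E(G) at which (4.26) becomes an equality.
    return q * D / math.sqrt(1 + q * q)
```

For instance, with D = 3 and q = 1 the threshold in (4.22) is (−1 + √17)/2 ≈ 1.56, while (4.26) requires the isoperimetric constant to exceed 3/√2 ≈ 2.12.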
5. Potts and Ising Models II. The Finite Island Property

5.1. Sufficient condition for the Potts free Gibbs measure to decompose as the uniform mixture of the q ordered phases. Our approach in this section to Theorem 1.6 and to Theorem 1.7 is motivated by the proof of Proposition 2.1.5 in [NW], where the same conclusion was obtained in the special case in which the graph is $T_b \times \mathbb{Z}$, with large $b$. This approach, combined with results in [BS3], will also lead to the proof of Theorem 1.8.

First we need to review a notion which is stronger than the requirement that percolation occurs. The setting is that of bond percolation on an infinite, locally finite, connected graph $G$. We say that the finite island property holds if any infinite chain in the graph intersects some infinite cluster (in the sense of containing some site which belongs to some infinite cluster). In other words, if we remove all the sites which belong to infinite clusters (along with the edges incident to them), then the remaining graph contains no infinite connected component.

Theorem 5.1 (Newman and Wu). Suppose that $G$ is an infinite, locally finite, connected graph and set $p = 1 - e^{-\beta}$. If for bond percolation on $G$ under the law $P^{G,q}_{f,p}$ of the FK model with parameters $q$ and $p$ and free boundary conditions a.s. there is a unique infinite cluster and the finite island property holds, then
\[
\mu^{G,q}_{f,\beta} = \sum_{i=1}^{q} \frac{1}{q}\, \mu^{G,q}_{i,\beta}. \tag{5.1}
\]
The hypothesis, and hence the conclusion above, holds if for independent bond percolation on $G$ with density $p/(p + (1 - p)q)$ a.s. there is a unique infinite cluster and the finite island property holds.

For a complete proof of this theorem, we refer the reader to the proof of Proposition 2.1.5 in [NW]. But for convenience we give next an idea of the proof.

Sketch of Proof. Suppose that we construct a random spin configuration with Gibbs distribution $\mu^{G,q}_{f,\beta}$ on the same probability space on which a bond percolation process with law $P^{G,q}_{f,p}$ has been constructed, in the fashion reviewed in the last section, i.e., according to the coupling $P^{G,q}_{f,p}$. Suppose that in this bond percolation process there is a unique infinite cluster and the finite island property holds. The spins in the unique infinite cluster will all take the same value, say $S$, chosen uniformly from $\{1, \ldots, q\}$. Therefore any infinite chain must contain spins which take value $S$. This implies that the spin at any site is “shielded from infinity by spins $S$”. It is a fairly standard matter to use the Markov property of $\mu^{G,q}_{f,\beta}$ now (with no further reference to the FK model and the coupling $P^{G,q}_{f,p}$) to derive (5.1) in a rigorous fashion.
As for the last statement in the theorem, it is enough to observe that the event that there is a unique infinite cluster and that the finite island property holds is an increasing event, and to use then (4.2). □

Theorem 5.1 motivates the following problem, which is also interesting for its own sake as a natural extension of the problem of determining for which graphs $p_c < 1$.

Open Problem 5.1. For which graphs does the finite island property hold for independent bond percolation with large $p$? In particular, is the condition $i_E^*(G) > 0$ (or the stronger condition $i_E(G) > 0$) sufficient for this?

Theorem 5.2 and Theorem 5.5 contain partial answers to this question. In combination with Theorem 5.1 they imply Theorem 1.6 and Theorem 1.7. Theorem 5.2 and Theorem 5.5 extend results in [NW] and [Wu3] and also a result in [Jon] which states that the finite island property holds for bond percolation on homogeneous trees with degree at least 3, for large $p$ (see the proof of Theorem 1.3 in that paper). Note that Theorem 5.1 cannot be applied to the homogeneous trees, since uniqueness of the infinite cluster always fails there.
5.2. Finite island property for graphs with positive anchored vertex isoperimetric constant.

Theorem 5.2. Suppose that $G$ is an infinite, locally finite, connected graph. If $i_V^*(G) > 0$, then for independent bond percolation with large $p$ the finite island property holds.

Our proof of Theorem 5.2 was motivated by the approach used in [NW]. The first step in its proof is Theorem 5.3, which extends Theorem 2 in [BS1] (inequality (3.1) here), by providing exponential estimates. Its proof is adapted from [BS1], with a large deviation estimate used to obtain the desired exponential bound. (Note that [BS1] considered site percolation; the adaptation to bond percolation as done here or in Lemma 4 in [PSN] is straightforward.) It is relevant to stress that the precise form of the exponential estimates obtained in Theorem 5.3 is not a priori obvious. In a sequence of remarks which appear later in this section we will discuss this point and related aspects of the proof of Theorem 5.2.

Theorem 5.3. Suppose that $G$ is an infinite, locally finite, connected graph.
(a) If $i_E^*(G) > 0$, then for $p > 1/(1 + i_E^*(G))$ there is $\gamma \in (0, \infty)$ and for each $x \in V$ there is $C_x \in (0, \infty)$ such that for any finite connected $A \subset V$ with $x \in A$,
\[
P^{G}_{p}(A \not\to \infty) \le C_x\, e^{-\gamma (|A| + |\partial_V A|)}.
\]
Moreover γ can be taken as large as desired, provided $p$ is close enough to 1.
(b) If $i_E(G) > 0$, then for $p > 1/(1 + i_E(G))$ there are $C, \gamma \in (0, \infty)$ such that for any finite $A \subset V$,
\[
P^{G}_{p}(A \not\to \infty) \le C\, e^{-\gamma (|A| + |\partial_V A|)}.
\]
Moreover γ can be taken as large as desired, provided $p$ is close enough to 1.
Proof. We will prove (a); the proof of (b) is analogous but simpler. Since $p > 1/(1 + i_E^*(G))$ we can take $0 < h < i_E^*(G)$ such that
\[
p > \frac{1}{1 + h}. \tag{5.2}
\]
Given $x \in V$, with no loss we can suppose that $|A|$ is large enough that for all finite connected $S \subset V$ such that $x \in S$ and $|S| \ge |A|$ we have
\[
|\partial_E S| \ge h|S|. \tag{5.3}
\]
As in the proofs of Theorem 2 in [BS1] and Lemma 4 in [PSN], we will use a technique known as “growing the cluster of the set $A$”, and we review now what this means. A random sequence of finite subsets of $V$, $V_i$, $i = 0, 1, \ldots$, and a random sequence of finite subsets of $E$, $E_i$, $i = 0, 1, \ldots$, will be defined, so that if $|C(A)| < \infty$, $V_i$ will grow towards $C(A)$ and $E_i$ will grow towards a set which contains $\partial_E C(A)$. To describe these sequences we order the elements of $E$ in an arbitrary way. Set $V_0 = A$, $E_0 = \emptyset$. We suppose now that $V_i$ and $E_i$ are known and will explain how $V_{i+1}$ and $E_{i+1}$ are obtained. In case $\partial_E V_i \subset E_i$ set $V_{i+1} = V_i$ and $E_{i+1} = E_i$. Otherwise the set $\partial_E V_i \setminus E_i$ is not empty and we let $e_i$ be its first element. Say that $x_i$ is the endpoint of $e_i$ contained in $(V_i)^c$. We check whether the edge $e_i$ is occupied or not. If $e_i$ is occupied, set $V_{i+1} = V_i \cup \{x_i\}$ and $E_{i+1} = E_i$. If $e_i$ is vacant set $V_{i+1} = V_i$ and $E_{i+1} = E_i \cup \{e_i\}$. This completes the description of the construction.

On the event $\{A \not\to \infty\}$ there exists $n$ such that $C(A) = V_n$, $\partial_E C(A) \subset E_n$. In what follows we take the minimal such $n$, so that we have
\[
|V_n| + |E_n| = |A| + n. \tag{5.4}
\]
Since $\partial_E V_n \subset E_n$ and $x \in A \subset V_n$, from (5.3) we obtain $|E_n| \ge |\partial_E V_n| \ge h|V_n|$. Using (5.4) then gives
\[
|E_n| \ge \frac{h}{1 + h} (|A| + n). \tag{5.5}
\]
Note that we must have $n \ge |\partial_V A|$, since for each vertex in $\partial_V A$ at least one of the edges incident to it must have its occupancy status checked in the construction before we can conclude whether this vertex belongs to $C(A)$ or not. Therefore from (5.5) and the fact that $E_i$ is non-decreasing with $i$, we obtain
\[
\begin{aligned}
P^{G}_{p}(A \not\to \infty) &\le P^{G}_{p}\left( |E_n| \ge \frac{h}{1+h} (|A| + n) \text{ for some } n \ge |\partial_V A| \right) \\
&\le P^{G}_{p}\left( |E_{|A|+n}| \ge \frac{h}{1+h} (|A| + n) \text{ for some } n \ge |\partial_V A| \right) \\
&\le \sum_{n = |\partial_V A|}^{\infty} P^{G}_{p}\left( |E_{|A|+n}| \ge \frac{h}{1+h} (|A| + n) \right) \\
&= \sum_{i = |A| + |\partial_V A|}^{\infty} P^{G}_{p}\left( |E_i| \ge \frac{h}{1+h}\, i \right).
\end{aligned}
\]
For each $i$, the distribution of $|E_i|$ is stochastically dominated by a binomial distribution corresponding to $i$ independent attempts, each one with probability $1 - p$ of success.
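The "growing the cluster" construction described above can be sketched in code. This is only an illustration on a finite graph, where the exploration always stops; the names `grow_cluster` and `grid` are hypothetical, and the fixed order on edges is simply the sorted order of endpoint pairs. Each examined edge is sampled once, independently, and the sketch makes the counting identity (5.4) easy to check.

```python
import random

def grow_cluster(adj, A, p, rng):
    """Explore the cluster of A edge by edge on a finite graph.

    adj: dict mapping each vertex to a list of neighbours.
    Returns (V, E, n): explored vertices, vacant edges found, and the
    number n of edges whose occupancy status was checked.
    """
    V, E, n = set(A), set(), 0
    while True:
        # Boundary edges of V that have not yet been found vacant.
        frontier = sorted(
            {tuple(sorted((u, v))) for u in V for v in adj[u] if v not in V} - E
        )
        if not frontier:
            return V, E, n
        e = frontier[0]                      # first element in the fixed order
        x = e[0] if e[0] not in V else e[1]  # endpoint lying outside V
        n += 1
        if rng.random() < p:                 # edge occupied: absorb the endpoint
            V.add(x)
        else:                                # edge vacant: record it
            E.add(e)

def grid(m):
    """Adjacency lists of the m-by-m square grid (a finite test graph)."""
    adj = {(i, j): [] for i in range(m) for j in range(m)}
    for (i, j) in adj:
        for nb in ((i + 1, j), (i, j + 1)):
            if nb in adj:
                adj[(i, j)].append(nb)
                adj[nb].append((i, j))
    return adj
```

Every checked edge either adds one vertex to V or one edge to E, which is exactly why |V_n| + |E_n| = |A| + n at the stopping step.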
Therefore from (5.2) and standard facts about large deviations for the binomial distribution (see, e.g., Exercise 2.2.23(b) on p. 35 of [OZ]) we obtain the desired conclusion, including the statement that γ can be made as large as desired by taking $p$ close to 1. □

Remark 5.1. In the exponential estimates in Theorem 5.3, each one of the terms $|A|$ or $|\partial_V A|$ may dominate the other. For instance, if G = T! and $A$ is a long chain, then $|\partial_V A|$ dominates $|A|$. If $G = \mathrm{Br}(a)$, for some $a = 2, 3, \ldots$ and $A = \{x \in V : \mathrm{dist}(r, x) \le 2k - 1\}$ with large $k$, then $|A|$ dominates $|\partial_V A|$.

Remark 5.2. Even under the condition $i_V(G) > 0$ (stronger than $i_V^*(G) > 0$, $i_E(G) > 0$ and $i_E^*(G) > 0$), $P^{G}_{p}(A \not\to \infty)$ in Theorem 5.3 may not decay as fast as an exponential of $|\partial_E A|$ for any $p < 1$. And this is so even if $A$ is supposed to be a chain starting at a given fixed site. An example proving this assertion is as follows. Consider the graph $\mathrm{Br}(a)$, with some $a = 2, 4, 6, \ldots$ and take for $A = A_k$ a finite chain with exactly one vertex at distance $i$ from the root, for $i = 0, \ldots, 2k$, and $a^{k+1}/2$ vertices at distance $2k+1$ from the root (this is half of the existing ones). Then $P^{G}_{p}(A_k \not\to \infty) \ge (1-p)^{a^{k+1}}$, while $|\partial_E A_k| \ge (a^{k+1}/2)^2 = a^{2k+2}/4$.

Proposition 5.1 (Chen, Peres). Suppose that $G$ is an infinite, locally finite, connected graph.
(a) If $i_E^*(G) > 0$, then there is $\gamma \in (0, \infty)$ and for each $x \in V$ there is $C_x \in (0, \infty)$ such that
\[
|\{S : x \in S \subset V,\ S \text{ connected},\ |\partial_E S| = n\}| \le C_x\, e^{\gamma n}.
\]
(b) If $i_V^*(G) > 0$, then there is $\gamma \in (0, \infty)$ and for each $x \in V$ there is $C_x \in (0, \infty)$ such that
\[
|\{S : x \in S \subset V,\ S \text{ connected},\ |\partial_V S| = n\}| \le C_x\, e^{\gamma n}.
\]
Part (a) is Lemma 2.1 in [CP], and part (b) can be proved in an analogous manner. (These proofs are applications of probability to combinatorics; for part (a) one uses bond percolation, and for part (b) one replaces it with site percolation.)

Remark 5.3.
The condition $i_E(G) > 0$ (stronger than $i_E^*(G) > 0$) imposes no restriction on the way $|\{S : r \in S \subset V,\ S \text{ is a chain},\ |\partial_V S| = n\}|$ ($\le |\{S : r \in S \subset V,\ S \text{ connected},\ |\partial_V S| = n\}|$) grows with $n$. For an extreme example consider the graph $\mathrm{Br}(a)$ and for $k > \log(n - 1)/\log(a)$ let $S_k$ be a finite chain which contains all vertices at distance $i$ from the root, for $i = 0, \ldots, 2k$, and $a^{k+1} - n + 1$ vertices at distance $2k + 1$ from the root (this is all but $n - 1$ of the existing ones). Then for each admissible $k$, $|\partial_V S_k| = n$. This shows that for each $n$ we have $|\{S : r \in S \subset V,\ S \text{ is a chain},\ |\partial_V S| = n\}| = \infty$.

Remark 5.4. Note that even under the condition $i_V(G) > 0$ (stronger than $i_V^*(G) > 0$, $i_E(G) > 0$ and $i_E^*(G) > 0$), $|\{S : r \in S \subset V,\ S \text{ is a chain},\ |S| = n\}|$ may grow faster than exponentially with $n$. For instance, observe that for T! this last expression takes the value $(n - 2)!$.

Proof of Theorem 5.2. For $x \in V$ let $F_x$ denote the event that there is an infinite chain starting at $x$ which does not contain any site which belongs to an infinite cluster. Our task is to show that for each $x$, $P^{G}_{p}(F_x) = 0$. Set
\[
i_V^x(G) = \inf \left\{ \frac{|\partial_V S|}{|S|} : x \in S \subset V,\ S \text{ finite and connected} \right\}.
\]
Since $i_V^*(G) > 0$, we also have $i_V^x(G) > 0$. If $F_x$ happens, then for each $n$ there is a finite chain $A \subset V$, $x \in A$, $|A| \ge n / i_V^x(G)$, such that $A \not\to \infty$. ($A$ can be chosen as a finite chain contained in the chain whose existence is assumed in the definition of $F_x$.) In particular $|\partial_V A| \ge n$. When $p$ is large enough so that the γ in Theorem 5.3(a) is larger than the γ in Proposition 5.1(b), we obtain now
\[
P^{G}_{p}(F_x) \le \lim_{n \to \infty} \sum_{i \ge n}\ \sum_{\substack{A \subset V \text{ finite},\ A \text{ is a chain} \\ |\partial_V A| = i,\ x \in A}} P^{G}_{p}(A \not\to \infty) = 0. \qquad \Box
\]
Remark 5.5. The presence of $|A|$ in the exponential estimate in Theorem 5.3(a) was not needed in the proof above of Theorem 5.2. The presence of $|\partial_V A|$ there was the crucial element in this proof, and Remark 5.4 shows that if we only had $|A|$ there, our approach would not work. Remark 5.3 explains why our approach does not solve the question raised in Open Problem 5.1, whether $i_E^*(G) > 0$ (or $i_E(G) > 0$) could replace the condition $i_V^*(G) > 0$ in Theorem 5.2.

The following theorem is an extension of Theorem 1.6.

Theorem 5.4. Suppose that $G$ is an infinite, locally finite, connected graph. If $i_V^*(G) > 0$ and for all large $p < 1$, $P^{G}_{p}$-a.s. there is a unique infinite cluster, then we have $\bar\beta_2(G, q) < \infty$ for each $q$. In particular, for large β, $\mu^{G,q}_{f,\beta} \notin (\mathcal{G}^{G,q}_{\beta})_{\mathrm{ext}}$.

Proof. Combine Theorem 5.1 with Theorem 5.2. □
A class of graphs which satisfy the hypothesis of Theorem 5.4 is that of Cartesian products of two infinite connected bounded degree graphs, at least one of which is non-amenable. It is easy to see that then the product is also non-amenable and from Theorem 1.9 of [HPS] we have on it a.s. uniqueness of the infinite cluster for $p > p_c(\mathbb{Z}^2)$.

5.3. Finite island property for transitive graphs with one end which satisfy the quasi-connected minimal cut sets property.

Theorem 5.5. Suppose that $G = (V, E)$ is an infinite, locally finite, connected transitive graph. If $G$ has the quasi-connected minimal cut sets property and has a single end, then $p_u(G) < 1$ and for independent bond percolation with large $p$ the finite island property holds.

Proof. We know from the proof of Corollary 10 of [BB] that $p_c(G) \le p_u(G) < 1$. Take $p \in (p_c(G), 1)$, and suppose that the finite island property does not hold for independent bond percolation with parameter $p$. Then in this percolation process we have a.s. the coexistence of an infinite cluster (call it $C$) and an infinite chain which does not intersect any infinite cluster (call it $I$). Choose $x \in C$ and $y \in I$. Let Π be the set of edges $\{u, v\} \in E$ with the property that there is a path from $u$ to $x$ inside $C$ (i.e., all the vertices it crosses belong to $C$) and there is a path from $v$ to $y$ inside $C^c$ (i.e., all the vertices it crosses belong to $C^c$). It is clear that Π is an $(x, y)$ mcs.

We want to argue that Π is infinite. Since $G$ has a single end, given any $n$, the removal of the ball $B(n)$ from the graph leaves exactly one infinite connected component in the remaining graph. This infinite component must contain a subset $C_n$ of $C$ and a subset $I_n$ of $I$. Therefore there must be a path outside $B(n)$ from $C_n$ to $I_n$. Obviously, this path can be chosen so that one of its endpoints, $u$, belongs to $C_n$, but it does not cross any other site of $C_n$. Say that $\{u, v\}$ is the edge of this path incident to $u$, and that $w$ is the endpoint of this path distinct from $u$. By pasting this path to a path from $u$ to $x$ inside of $C$ and to a path from $w$ to $y$ inside of $I$, we obtain a path which shows that $\{u, v\} \in \Pi$. Since $n$ is arbitrary one can obtain infinitely many elements of Π in this manner.

From the definition of the quasi-connected mcs property, we know that there exists $l < \infty$ so that any infinite mcs, when considered as a set of vertices in $(G_E)^l$, must contain an infinite chain in this graph. So we learn that Π must contain an infinite chain in $(G_E)^l$. But all edges in Π are vacant (since each has one endpoint in $C_n$ and one in $(C_n)^c$). Consider independent site percolation on $(G_E)^l$ with parameter $1 - p$. From what we just learned, there is an infinite cluster in this process. But since $G$ is a transitive graph, $(G_E)^l$ must be a quasi-transitive graph and, in particular, it must have bounded degree. Therefore $(G_E)^l$ does not support infinite clusters in site percolation with parameter $1 - p$ when $p$ is close to 1. To avoid a contradiction we must conclude that for independent bond percolation on our original graph $G$ the finite island property holds when $p$ is large enough. □

Proof of Theorem 1.7. Combine Theorem 5.1 with Theorem 5.5. □
5.4. Finite island property for planar graphs. Suppose that $G = (V, E)$ is a planar graph, as defined in Subsect. 2.8, and let $G^\dagger = (V^\dagger, E^\dagger)$ be its dual, so that there is a one-to-one correspondence between the edges of $E$ and those of $E^\dagger$. Given a random configuration of occupied and vacant edges in $\{0, 1\}^E$, say that $e^\dagger \in E^\dagger$ is occupied (resp. vacant) if the corresponding edge $e \in E$ is vacant (resp. occupied). Let, as before, $N$ be the number of infinite clusters in $G$ according to a random configuration in $\{0, 1\}^E$ and let $N^\dagger$ be the number of infinite clusters in $G^\dagger$ according to the corresponding random configuration in $\{0, 1\}^{E^\dagger}$.

From the proof of Theorem 3.4 in [BS3], we know that if $G = (V, E)$ is an infinite, locally finite, connected transitive non-amenable planar graph with one end, and $P$ is a probability distribution on $\{0, 1\}^E$ which is automorphism invariant, has a trivial invariant σ-field and satisfies the finite energy condition (which means that on any finite set of edges each configuration of occupied and vacant edges has positive conditional probability given any configuration outside of this set), then
\[
P\big( (N, N^\dagger) \in \{(1, 0), (0, 1), (\infty, \infty)\} \big) = 1. \tag{5.6}
\]
The measures $P^{G,q}_{*,p}$ are known to satisfy the conditions above on $P$, which are the same in the next theorem.

Theorem 5.6. Suppose that $G = (V, E)$ is an infinite, locally finite, connected transitive non-amenable planar graph with one end. Suppose that $P$ is a probability distribution on $\{0, 1\}^E$ which is automorphism invariant, has a trivial invariant σ-field and satisfies the finite energy condition. If $P(N = 1) = 1$, then $P(\text{the finite island property holds}) = 1$.

Proof. We will use the notation from Subsect. 2.8. Given a bounded set $A \subset \Pi$, we say that a set $S \subset \Pi$ surrounds $A$ if there exists a simple closed curve whose image is contained in $S$ which separates Π into two components, one of which is bounded and contains $A$. Let $O$ (resp. $C$) be the set of embedded edges of $G$ which are occupied (resp. belong to infinite clusters). From (5.6) we learn that $P$-a.s. there are no dual infinite clusters. Any open ball $B \subset \Pi$ intersects only finitely many faces of the embedding of $G$ in Π. The random set of faces which belong to the same dual cluster as some face which intersects $B$ is therefore $P$-a.s. finite. Let $\hat B$ be the closure of the union of these faces. $\hat B$ must be connected and bounded. Hence $\Pi \setminus \hat B$ contains exactly one unbounded component $\check B$. The boundary of $\check B$ is a subset of $O$ which has the property needed to show that $O$ surrounds $B$. Since $P$-a.s. there is an infinite cluster, large balls must intersect this infinite cluster, and since they are surrounded by $O$, they must also be surrounded by $C$. This implies that $P$-a.s. $C$ surrounds each bounded $A \subset \Pi$. But this implies the finite island property. □

Proof of Theorem 1.8. The statement that $p_u(G) < 1$ is contained in Theorem 1.1 of [BS3]. From Proposition 2.1 in [BS3], we know that $G$ is unimodular, and we can therefore use Theorem 4.3. From the dichotomy in part (a) of that theorem, Theorem 5.6 and Theorem 5.1, we obtain claim (a). Using (a) and the fact that $p_u(G) < 1$, parts (b) and (c) of Theorem 4.3 imply now
\[
\bar\beta_1(G, q) = \bar\beta_2(G, q) = \log \frac{1}{1 - \bar p_{\mathrm{conn},f}(G, q)}. \qquad \Box
\]
6. Related Results

6.1. Multiplicity of phase transitions in the FK model. The FK model is worthy of study for its own sake (see, e.g., [Gri1, Häg1, Jon, GHM] and [Lyo5]). To keep the current paper from becoming too long, we will limit ourselves to stating here a conjecture similar to Conjecture 1.4 and then stating two results which are immediate consequences of our results on the Potts model and of (4.10). This will restrict our results to the cases $q \in \{2, 3, \ldots\}$. (Part (a.1) of the next conjecture is a consequence of the well known (4.9) and is included in the statement of the conjecture for comparison with the other statements there. Also the fact that for transitive graphs, or more generally for bounded degree graphs, $p_{c,w}(G, q) > 0$ is well known.)

Conjecture 6.1. Suppose that $G$ is an infinite, locally finite, connected transitive graph. Then for each $q > 1$ there exist $0 < p_{c,w}(G, q) \le \bar p(G, q)$ such that:
(a.1) For $p < p_{c,w}(G, q)$, $P^{G,q}_{f,p} = P^{G,q}_{w,p}$.
(a.2) For $p_{c,w}(G, q) < p < \bar p(G, q)$, $P^{G,q}_{f,p} \neq P^{G,q}_{w,p}$.
(a.3) For $p > \bar p(G, q)$, $P^{G,q}_{f,p} = P^{G,q}_{w,p}$.
Moreover
(b.1) $\bar p(G, q) > p_{c,w}(G, q)$ iff $G$ is non-amenable.
(b.2) $\bar p(G, q) < 1$ iff $p_u(G) < 1$.

As with the last implication in (4.10), the conjecture above cannot be extended to $q = 1$. Note that if Conjecture 4.1 (which includes Conjecture 3.1) and Conjecture 4.2 were both proved, then using (4.10) the statements in Conjecture 6.1, with the exception of (b.1), would follow, with $\bar p(G, q) = \bar p_{\mathrm{conn},f}(G, q)$.

From Theorem 1.3, Theorem 1.6 and (4.10) we obtain

Theorem 6.1. Suppose that $G$ is an infinite, locally finite, connected transitive graph with $p_u(G) < 1$ and that $q \in \{2, 3, \ldots\}$. If $i_E(G)/D(G) > q/\sqrt{1 + q^2}$, then there are $p_{c,w}(G, q) < \bar p_1(G, q) \le \bar p_2(G, q) < 1$ such that for the q-FK model on $G$:
(a) For $p_{c,w}(G, q) < p < \bar p_1(G, q)$, $P^{G,q}_{f,p} \neq P^{G,q}_{w,p}$.
(b) For $p > \bar p_2(G, q)$, $P^{G,q}_{f,p} = P^{G,q}_{w,p}$.

(Only the use of Theorem 1.6 restricts Theorem 6.1 to integer values of $q$. The use of Theorem 1.3 can be replaced with Theorem 4.4 and (4.16), which hold for $q \ge 1$.)

From Theorem 1.8 and (4.10) we obtain

Theorem 6.2. Suppose that $G$ is an infinite, locally finite, connected transitive non-amenable planar graph with one end. Then for each $q \in \{2, 3, \ldots\}$ there is $p_{c,w}(G, q) \le \bar p(G, q) < 1$ such that:
(a) For $p_{c,w}(G, q) < p < \bar p(G, q)$, $P^{G,q}_{f,p} \neq P^{G,q}_{w,p}$.
(b) For $p > \bar p(G, q)$, $P^{G,q}_{f,p} = P^{G,q}_{w,p}$.

Note that combining Theorem 6.2 with Theorem 4.4(a) in [Jon] and (4.10) again, we learn that under the conditions of Theorem 6.2, if $q$ is a large integer (depending on $G$), then $p_{c,w}(G, q) < \bar p(G, q)$. This further supports Conjecture 6.1.

Note. Proposition 6.11 (iii) and (iv) of [HJL], which was written independently of this paper, contain further partial results which support Conjecture 6.1.
6.2. Site percolation. In this subsection we summarize the independent site percolation analogues of some of the independent bond percolation results in this paper. In independent site percolation, each site of a graph $G = (V, E)$ is occupied with probability $p$ and vacant otherwise, these decisions being independent for distinct sites. Clusters are the infinite connected components of the graph obtained from $G$ by deleting the vacant sites, along with the edges incident to them. The notation below should be self-explanatory.

First we note that $\mathrm{Br}(a)$, $a \ge 2$ can be used to illustrate the fact that even under $i_E(G) > 0$ we can have $p_c^{\mathrm{site}}(G) = 1$. On the other hand, Theorem 2 in [BS1] states that, similarly to (3.1),
\[
p_c^{\mathrm{site}}(G) \le \frac{1}{1 + i_V(G)}. \tag{6.1}
\]
Theorem 1.1 has an analogue with the condition $i_V(G)/D(G) > 1/\sqrt{2}$ replacing the condition $i_E(G)/D(G) > 1/\sqrt{2}$. And Theorem 4.4 has an analogue with the condition $i_V(G) > (-1 + \sqrt{2D(G)^2 - 1})/2$ replacing (4.22). These claims are consequences of the following. First, (3.1) is replaced with (6.1). Second, (3.4) and Theorem 3.1 are essentially unchanged for site percolation. Third, $i_V(G) \le i_E(G)$, so that (2.4) yields:
\[
(i_V(G))^2 + (R(G))^2 \le (D(G))^2. \tag{6.2}
\]
Theorem 5.3 has the following analogue, which improves on (6.1).

Theorem 6.3. Suppose that $G$ is an infinite, locally finite, connected graph.
(a) If $i_V^*(G) > 0$, then for $p > 1/(1 + i_V^*(G))$ there is $\gamma \in (0, \infty)$ and for each $x \in V$ there is $C_x \in (0, \infty)$ such that for any finite connected $A \subset V$ with $x \in A$,
\[
P^{G,\mathrm{site}}_{p}(A \not\to \infty) \le C_x\, e^{-\gamma |\partial_V A|}.
\]
Moreover γ can be taken as large as desired, provided $p$ is close enough to 1.
(b) If $i_V(G) > 0$, then for $p > 1/(1 + i_V(G))$ there are $C, \gamma \in (0, \infty)$ such that for any finite $A \subset V$,
\[
P^{G,\mathrm{site}}_{p}(A \not\to \infty) \le C\, e^{-\gamma |\partial_V A|}.
\]
Moreover γ can be taken as large as desired, provided $p$ is close enough to 1.

A term $|A|$ can also be included in the exponent in the upper bounds in Theorem 6.3, but unless one is interested in good estimates for γ, this is irrelevant, since the term $|\partial_V A|$ will dominate the term $|A|$ up to a constant factor. (Compare with Remark 5.1.)

While the finite island property for site percolation cannot be motivated by a statistical mechanics Gibbs measure problem as was the case for bond percolation (see Subsect. 5.1), it is still a natural property to investigate. As for bond percolation, also for site percolation we say that the finite island property holds if any infinite chain in the graph intersects some infinite cluster (in the sense of containing some site which belongs to some infinite cluster). Using Theorem 6.3(a) and Proposition 5.1(b), we obtain the analogue of Theorem 5.2.

Theorem 6.4. Suppose that $G$ is an infinite, locally finite, connected graph. If $i_V^*(G) > 0$, then for independent site percolation with large $p$ the finite island property holds.

7. The Contact Process

7.1. Separation of critical points for the contact process. As usual we will identify the state of the contact process at time $t$ with the set of infected sites at this time. In doing so, we will also, as in previous sections, denote sets with a single element by the name of this element.

Denote by $(\xi^{A}_{G,\lambda;t})_{t \ge 0}$ the contact process with infection parameter λ started from $A \subset V$. Also let $\xi^{A}_{G,\lambda;t}(x)$ be the indicator of the event that in this process the site $x \in V$ is infected at time $t$. The following is the contact process analogue of Theorem 5.3.
Theorem 7.1. Suppose that $G$ is an infinite, connected, bounded degree graph. If $i_E(G) > 0$, then for $\lambda > 1/i_E(G)$ there are $C, \gamma \in (0, \infty)$ such that for any finite $A \subset V$,
\[
P\big( \xi^{A}_{G,\lambda;t} = \emptyset \text{ for some } t > 0 \big) \le C\, e^{-\gamma |A|}.
\]
Moreover γ can be taken as large as desired, provided λ is large enough. In particular
\[
\lambda_s(G) \le \frac{1}{i_E(G)}. \tag{7.1}
\]
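For orientation, the proof that follows compares the number of infected sites with a biased random walk, and for such a walk the probability of ever reaching 0 is given by the classical gambler's ruin formula. The sketch below evaluates that formula; the function names are hypothetical, and the identification of the resulting bound with C = 1 and γ = log(λ i_E(G)) is an illustration of the comparison argument rather than a statement taken from the paper.

```python
import math

def ruin_probability(k, rate):
    """Probability that a walk which steps up at rate `rate` and down at
    rate 1, started from k >= 0, ever reaches 0 (gambler's ruin formula)."""
    if rate <= 1:
        return 1.0          # unbiased or downward-biased walk: ruin is certain
    return rate ** (-k)     # upward-biased walk: geometric decay in k

def extinction_bound(size_A, lam, i_E):
    """Upper bound on the extinction probability from the comparison walk,
    whose upward rate is lam * i_E."""
    return ruin_probability(size_A, lam * i_E)

def decay_rate(lam, i_E):
    """Exponential rate gamma = log(lam * i_E); requires lam * i_E > 1."""
    return math.log(lam * i_E)
```

The rate log(λ i_E(G)) grows without bound as λ increases, in line with the claim that γ can be taken as large as desired.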
Proof. When the state of the contact process is a finite set $S \subset V$, the process jumps to states with one less infected site at rate $|S|$, and to states with one more infected site at rate $\lambda |\partial_E S| \ge \lambda\, i_E(G) |S|$. Therefore the process $(|\xi^{A}_{G,\lambda;t}|)_{t \ge 0}$ is stochastically larger than a process $(X_{\lambda;t})_{t \ge 0}$ which is a birth and death process, with $X_0 = |A|$, in which a death occurs at rate $X_{\lambda;t}$ and a birth occurs at rate $\lambda\, i_E(G) X_{\lambda;t}$. A time change argument shows that the probability that this birth and death process will ever reach the state 0 is the same as the corresponding probability for a birth and death process started from the same state, in which a death occurs at rate 1 and a birth occurs at rate $\lambda\, i_E(G)$. But if $\lambda > 1/i_E(G)$, the probability that this process (which is just a biased random walk) will ever reach the state 0 decays exponentially with $|A|$, with a rate of exponential decay which can be made as large as desired, provided λ is large enough. □

Theorem 7.2 below is the analogue of Proposition 8.3 of [HPS] ((3.4) in the current paper). Before we can state it and prove it, we need to introduce some notation and review some facts.

We will compare the contact process to a branching random walk (BRW) process. This BRW process on an infinite, connected, bounded degree graph $G$ is described as follows. At each time $t \ge 0$ the state of the BRW is described by associating a number of particles to each site, with only finitely many sites having more than 0 particles. The evolution is then described by saying that at each time each particle dies at rate 1 and gives birth to a new particle at rate α, with the new particle being placed at a uniformly chosen site from among the sites which are neighbors to its parent’s site.

Denote by $(\zeta^{A}_{G,\alpha;t})_{t \ge 0}$ the BRW on $G$ started from the configuration with one particle at each site in a finite set $A \subset V$ and no other particle. Also let $\zeta^{A}_{G,\alpha;t}(x)$ be the number of particles in this process at the site $x \in V$ at time $t$.

It is clear that $(\zeta^{A}_{G,\lambda D(G);t}(\cdot))_{t \ge 0}$ is stochastically larger than $(\xi^{A}_{G,\lambda;t}(\cdot))_{t \ge 0}$, i.e., the two processes can be coupled in such a way that whenever the contact process has site $x$ infected, the BRW has at least one particle at this site. (Note that the BRW that we consider is different from the one obtained from the contact process by simply allowing births of new particles on already occupied sites. That process lies stochastically between the contact process and the BRW that we are considering.)

In order to analyse the BRW in a manner done in [MS] and [Schi], we need to first consider an individual random walk on $G$. Recall that $(A_G(\cdot, \cdot))$ denotes the adjacency matrix of $G$ and $R(G)$ denotes its spectral radius. For $x, y \in V$ set
\[
p_G(x, y) = \frac{A_G(x, y)}{d_x}.
\]
$p_G(\cdot, \cdot)$ is the transition kernel for the simple random walk on $G$. We will denote by $p^n_G(x, y)$ the probability that this random walk started at $x$ is at $y$ at time $n$. The corresponding spectral radius is defined by
\[
R_{rw}(G) = \limsup_{n \to \infty} \big( p^n_G(x, y) \big)^{1/n},
\]
which does not depend on $x, y \in V$. Similarly to (2.2) and (2.3) we have
\[
p^n_G(x, y) \le (R_{rw}(G))^n \quad \text{for all } x, y \in V,\ n = 1, 2, \ldots. \tag{7.2}
\]
For $n(k) = \mathrm{bip}(G)\,k + \mathrm{odd}_G(x, y)$,
\[
\lim_{k \to \infty} \big( p^{n(k)}_G(x, y) \big)^{1/n(k)} = R_{rw}(G) \quad \text{for all } x, y \in V. \tag{7.3}
\]
Clearly
\[
\frac{R(G)}{D(G)} \le R_{rw}(G) \le \frac{R(G)}{d(G)}, \tag{7.4}
\]
where $d(G)$ is the minimal degree of $G$ and, as before, $D(G)$ is its maximal degree.

Consider also a continuous time simple random walk on $G$ which jumps at rate α. We will denote by $P_{G,\alpha;t}(x, y)$ the probability that this continuous time random walk started at $x$ is at $y$ at time $t$. The following lemma must be well known, but we could not find a reference.

Lemma 7.1.
\[
\lim_{t \to \infty} \frac{1}{t} \log P_{G,\alpha;t}(x, y) = \alpha (R_{rw}(G) - 1), \tag{7.5}
\]
and for all $t \ge 0$,
\[
P_{G,\alpha;t}(x, y) \le e^{\alpha (R_{rw}(G) - 1) t}. \tag{7.6}
\]

Proof. By definition,
\[
P_{G,\alpha;t}(x, y) = \sum_{n=0}^{\infty} e^{-\alpha t} \frac{(\alpha t)^n}{n!}\, p^n_G(x, y). \tag{7.7}
\]
So (7.6) follows from (7.2). We will prove (7.5) first when $\mathrm{odd}_G(x, y) = 0$. Note that from (7.3) we have then
\[
\lim_{k \to \infty} \big( p^{2k}_G(x, y) \big)^{1/(2k)} = R_{rw}(G).
\]
So, given a small $\varepsilon > 0$, there exists $K < \infty$ such that for $k \ge K$, $p^{2k}_G(x, y) \ge (R_{rw}(G) - \varepsilon)^{2k}$. Therefore, from (7.7),
\[
\begin{aligned}
P_{G,\alpha;t}(x, y) &\ge \sum_{k=K}^{\infty} e^{-\alpha t} \frac{(\alpha t)^{2k}}{(2k)!} (R_{rw}(G) - \varepsilon)^{2k} \\
&\ge e^{-\alpha t} \left[ \cosh\big( \alpha (R_{rw}(G) - \varepsilon) t \big) - \sum_{k=0}^{K-1} \big( \alpha (R_{rw}(G) - \varepsilon) t \big)^{2k} \right].
\end{aligned}
\]
Hence
$$\liminf_{t\to\infty} \frac{1}{t} \log P_{G,\alpha;t}(x,y) \ge \alpha(R_{rw}(G) - \varepsilon - 1). \tag{7.8}$$
Since $\varepsilon > 0$ can be taken arbitrarily small, (7.5) for $\mathrm{odd}_G(x,y) = 0$ follows from (7.8) and (7.6). One can derive (7.5) for $\mathrm{odd}_G(x,y) = 1$ in a similar fashion, or also easily derive it from (7.5) for $\mathrm{odd}_G(x,y) = 0$. □
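As an illustration only, the bounds (7.2) and (7.6) can be checked numerically on the infinite 3-regular tree, where the simple random walk has spectral radius $2\sqrt{2}/3$ (a standard fact, assumed here rather than taken from this paper). The sketch below tracks the distance of the walk from its starting vertex to obtain the return probabilities $p^n_G(x,x)$, and then evaluates a truncated version of the Poisson sum (7.7). The function names and the truncation level are hypothetical choices made for this example.

```python
import math


def return_probs(degree, n_max):
    """Return [p^n(x, x) for n = 0..n_max] for simple random walk on the
    infinite regular tree in which every vertex has `degree` neighbours."""
    # dist[k] = probability that the walk is at graph distance k from its start.
    dist = [0.0] * (n_max + 2)
    dist[0] = 1.0
    probs = [1.0]
    up = (degree - 1) / degree    # step away from the starting vertex
    down = 1 / degree             # step towards the starting vertex
    for _ in range(n_max):
        new = [0.0] * (n_max + 2)
        new[1] += dist[0]         # from the start, every step increases the distance
        for k in range(1, n_max + 1):
            if dist[k]:
                new[k + 1] += up * dist[k]
                new[k - 1] += down * dist[k]
        dist = new
        probs.append(dist[0])
    return probs


def continuous_time_kernel(probs, alpha, t):
    """Truncated Poisson sum (7.7): sum_n e^{-alpha t} (alpha t)^n / n! * p^n(x, x)."""
    weight = math.exp(-alpha * t)  # Poisson weight for n = 0
    total = 0.0
    for n, p in enumerate(probs):
        total += weight * p
        weight *= alpha * t / (n + 1)
    return total


if __name__ == "__main__":
    r_rw = 2 * math.sqrt(2) / 3          # assumed spectral radius for degree 3
    probs = return_probs(3, 200)
    for t in (1.0, 5.0, 20.0):
        lhs = continuous_time_kernel(probs, 1.0, t)
        rhs = math.exp((r_rw - 1) * t)   # right-hand side of (7.6) with alpha = 1
        print(t, lhs, rhs, lhs <= rhs)
```

Because the sum is truncated, the computed kernel is a lower approximation of $P_{G,\alpha;t}(x,x)$, so the comparison with (7.6) is only a sanity check and not a proof of anything.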
Set
$$\lambda_{\exp}(G) = \sup\left\{ \lambda : P\big(\xi^{\{r\}}_{G,\lambda;t}(r) = 1\big) \text{ decays exponentially with } t \right\}.$$
Clearly
$$\lambda_{\exp}(G) \le \lambda_r(G). \tag{7.9}$$
To see this, note that for $\lambda < \lambda_{\exp}$, by a Borel–Cantelli argument, $\xi^{r}_{G,\lambda;t}(r) = 1$ for only finitely many integer values of $t > 0$, a.s. But using the Markov property of the contact process, it is clear that if the root were infected at arbitrarily large times, then it would a.s. be infected at arbitrarily large integer times (since starting from any configuration that has the root infected at time 0, one has probability at least $e^{-1}$ of having the root infected throughout the next unit interval of time).
Theorem 7.2. Suppose that $G = (V, E)$ is an infinite, connected, bounded degree graph. For each $\lambda > 0$, $x, y \in V$,
$$P(\xi^{x}_{G,\lambda;t}(y) = 1) \le e^{(\lambda D(G) R_{rw}(G) - 1)t}. \tag{7.10}$$
In particular
$$\lambda_r(G) \ge \lambda_{\exp}(G) \ge \frac{1}{D(G) R_{rw}(G)} \ge \frac{d(G)}{D(G) R(G)}. \tag{7.11}$$
Proof. The BRW $(\zeta^{A}_{G,\lambda D(G);t}(\cdot))_{t\ge 0}$ is the same BRW studied in [Schi] (with the quantity $\lambda$ there being our $\alpha = \lambda D(G)$). Using (3) from that paper and (7.6), we obtain
$$\begin{aligned}
P(\xi^{x}_{G,\lambda;t}(y) = 1) &\le E(\zeta^{x}_{G,\lambda D(G);t}(y)) = e^{(\lambda D(G) - 1)t}\, P_{G,\lambda D(G);t}(x,y) \\
&\le e^{(\lambda D(G) - 1)t + \lambda D(G)(R_{rw}(G) - 1)t} = e^{(\lambda D(G) R_{rw}(G) - 1)t},
\end{aligned}$$
proving (7.10). The first inequality in (7.11) is (7.9). The second inequality in (7.11) is a direct consequence of (7.10). The third inequality in (7.11) follows from (7.4). □

Theorem 7.3. Suppose that $G$ is an infinite, connected, bounded degree graph. If
$$i_E(G) > \frac{D(G)^2}{\sqrt{D(G)^2 + d(G)^2}}, \tag{7.12}$$
then for the contact process on $G$, $\lambda_s(G) < \lambda_{\exp}(G) \le \lambda_r(G)$.

Proof. From Theorem 7.1 and Theorem 7.2, it is enough to verify that $1/i_E(G) < d(G)/(D(G)R(G))$. From (2.4) we see that it is therefore enough to verify that $1/i_E(G)^2 < d(G)^2/(D(G)^2 (D(G)^2 - i_E(G)^2))$. But this is equivalent to (7.12). □
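To make condition (7.12) concrete, the following sketch evaluates it on regular trees. It assumes, as a standard fact not stated in this section, that the tree in which every vertex has $D$ neighbours has edge-isoperimetric constant $i_E = D - 2$ (with $i_E$ normalized as boundary edges per vertex), and it checks numerically that (7.12), read as $i_E(G) > D(G)^2/\sqrt{D(G)^2 + d(G)^2}$, agrees with the inequality $1/i_E^2 < d^2/(D^2(D^2 - i_E^2))$ used in the proof. All function names are hypothetical.

```python
import math


def satisfies_7_12(i_e, big_d, small_d):
    """Condition (7.12): i_E > D^2 / sqrt(D^2 + d^2)."""
    return i_e > big_d ** 2 / math.sqrt(big_d ** 2 + small_d ** 2)


def reduced_inequality(i_e, big_d, small_d):
    """Inequality from the proof: 1/i_E^2 < d^2 / (D^2 (D^2 - i_E^2)).
    Only meaningful when 0 < i_E < D."""
    return 1 / i_e ** 2 < small_d ** 2 / (big_d ** 2 * (big_d ** 2 - i_e ** 2))


def smallest_regular_tree_degree():
    """Smallest vertex degree for which a regular tree satisfies (7.12),
    under the assumption i_E = degree - 2."""
    degree = 3
    while not satisfies_7_12(degree - 2, degree, degree):
        degree += 1
    return degree


if __name__ == "__main__":
    for degree in range(3, 10):
        i_e = degree - 2
        print(degree,
              satisfies_7_12(i_e, degree, degree),
              reduced_inequality(i_e, degree, degree))
    print("smallest degree:", smallest_regular_tree_degree())
```

For a transitive graph $d(G) = D(G)$, so the check reduces to $i_E(G)/D(G) > 1/\sqrt{2}$; under the stated assumption, regular trees need a fairly large degree before the criterion applies.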
7.2. Mean field criticality for the contact process. We suppose in this subsection that $G = (V, E)$ is an infinite, locally finite, connected transitive graph. In this case the transition kernel $p_G(\cdot,\cdot)$ is symmetric, since $d_x = D(G)$ does not depend on $x \in V$. As a consequence, also $P_{G,\alpha;t}(\cdot,\cdot)$ is symmetric. Furthermore, (7.4) becomes then
$$D(G) R_{rw}(G) = R(G). \tag{7.13}$$
We will have to consider the contact process being started at time $s \ge 0$ from a configuration $A \subset V$. This process will be denoted by $(\xi^{A,s}_{G,\lambda;t})_{t\ge s}$. Similarly, $(\zeta^{A,s}_{G,\alpha;t})_{t\ge s}$ will denote the same BRW that we considered before, but started at time $s \ge 0$ from the configuration with one particle at each site in $A \subset V$ and no other particle. The contact process triangle diagram is defined as follows, where $z \in V$ and $s \ge 0$,
$$\nabla^G_\lambda(z,s) = \sum_{u,v\in V} \int_s^\infty dt_1 \int_{t_1}^\infty dt_2\; P(\xi^{r,0}_{G,\lambda;t_2}(v) = 1)\, P(\xi^{z,s}_{G,\lambda;t_1}(u) = 1)\, P(\xi^{u,t_1}_{G,\lambda;t_2}(v) = 1).$$
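Before we can state some known results on the mean field critical behavior of the contact process, we need some more definitions:
$$\chi^G(\lambda) = E\left( \int_0^\infty dt \sum_{x\in V} \xi^{r}_{G,\lambda;t}(x) \right),$$
$$\rho^G(\lambda) = P(\xi^{r}_{G,\lambda;t} \ne \emptyset \text{ for all } t > 0),$$
$$\rho^G(\lambda, h) = 1 - E\left( \exp\left( -h \int_0^\infty dt \sum_{x\in V} \xi^{r}_{G,\lambda;t}(x) \right) \right),$$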
where $h \ge 0$. We will sometimes below abbreviate $\lambda_s = \lambda_s(G)$, since there is no risk of confusion. The results below on mean field critical behavior were proved in [BW]. Suppose that $G$ is an infinite, locally finite, connected transitive unimodular graph. Under the contact process open triangle condition, which reads,
$$\lim_{l\to\infty}\ \sup_{\substack{z:\,\mathrm{dist}(r,z)\ge l \\ s:\, s\ge 0}} \nabla^G_{\lambda_s}(z,s) = 0, \tag{7.14}$$
one has the following (the labels on the left indicate the way one usually refers to each result, in terms of a corresponding critical exponent):
[γ = 1] $C_1 (\lambda_s - \lambda)^{-1} \le \chi^G(\lambda) \le C_2 (\lambda_s - \lambda)^{-1}$, for $\lambda < \lambda_s$,
[β = 1] $C_1 (\lambda - \lambda_s)^{1} \le \rho^G(\lambda) \le C_2 (\lambda - \lambda_s)^{1}$, for $\lambda > \lambda_s$,
[δ = 2] $C_1 h^{1/2} \le \rho^G(\lambda_s, h) \le C_2 h^{1/2}$, for $h > 0$,
where in each case $C_1, C_2 \in (0, \infty)$. The following result is the contact process counterpart to Theorem 3.1. It is related to Theorem 7.2 in the same way that Theorem 3.1 is related to Proposition 8.3 of [HPS] ((3.4) in this paper).

Theorem 7.4. Suppose that $G = (V, E)$ is an infinite, connected, bounded degree graph. For each $\lambda < 1/R(G)$, (7.14) holds.
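Proof. For arbitrary $z \in V$ and $s \ge 0$,
$$\begin{aligned}
\nabla^G_\lambda(z,s) &\le \sum_{u,v\in V} \int_s^\infty dt_1 \int_{t_1}^\infty dt_2\; E(\zeta^{r,0}_{G,\lambda D(G);t_2}(v))\, E(\zeta^{z,s}_{G,\lambda D(G);t_1}(u))\, E(\zeta^{u,t_1}_{G,\lambda D(G);t_2}(v)) \\
&= \sum_{v\in V} \int_s^\infty dt_2\; E(\zeta^{r,0}_{G,\lambda D(G);t_2}(v)) \int_s^{t_2} dt_1 \sum_{u\in V} E(\zeta^{z,s}_{G,\lambda D(G);t_1}(u))\, E(\zeta^{u,t_1}_{G,\lambda D(G);t_2}(v)) \\
&= \sum_{v\in V} \int_s^\infty dt_2\; E(\zeta^{r,0}_{G,\lambda D(G);t_2}(v)) \int_s^{t_2} dt_1\; E(\zeta^{z,s}_{G,\lambda D(G);t_2}(v)) \\
&= \sum_{v\in V} \int_s^\infty dt_2\; e^{(\lambda D(G)-1)t_2}\, P_{G,\lambda D(G);t_2}(r,v)\; e^{(\lambda D(G)-1)(t_2-s)}\, P_{G,\lambda D(G);(t_2-s)}(z,v)\, (t_2-s) \\
&= \int_s^\infty dt_2\; e^{(\lambda D(G)-1)(2t_2-s)}\, (t_2-s) \sum_{v\in V} P_{G,\lambda D(G);t_2}(r,v)\, P_{G,\lambda D(G);(t_2-s)}(v,z) \\
&= \int_s^\infty dt_2\; e^{(\lambda D(G)-1)(2t_2-s)}\, (t_2-s)\, P_{G,\lambda D(G);2t_2-s}(r,z) \\
&= \int_s^\infty dw\; e^{(\lambda D(G)-1)w}\, \frac{(w-s)}{4}\, P_{G,\lambda D(G);w}(r,z),
\end{aligned}$$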
where in the first step we used the stochastic domination of the contact process by the BRW, in the third step we used the Markov property of the BRW and the fact that in this process particles do not interact after having been created, in the fourth step we used (3) from [Schi], in the fifth step we used the symmetry of $P_{G,\lambda D(G);t}(\cdot,\cdot)$, in the sixth step we used the Markov property of the continuous time simple random walk on $G$ which jumps at rate $\lambda D(G)$, in the seventh step we changed variables: $w = 2t_2 - s$. Therefore,
$$\sup_{\substack{z:\,\mathrm{dist}(r,z)\ge l \\ s:\, s\ge 0}} \nabla^G_\lambda(z,s) \le \int_0^\infty dw\; f_l(w),$$
where
$$f_l(w) = e^{(\lambda D(G)-1)w}\, w \sup_{z:\,\mathrm{dist}(r,z)\ge l} P_{G,\lambda D(G);w}(r,z).$$
Thanks to (7.6) and (7.13), we have $0 \le f_l(w) \le w\, e^{(\lambda D(G)-1)w}\, e^{\lambda D(G)(R_{rw}(G)-1)w} = w\, e^{(\lambda R(G)-1)w}$. And clearly, for each $w$,
$$\lim_{l\to\infty} f_l(w) = 0.$$
Therefore (7.14) follows from the dominated convergence theorem, since $\lambda R(G) - 1 < 0$. □
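For completeness, here is a short worked computation (not part of the original proof) showing why the dominating function in the last step is integrable. Writing $c = 1 - \lambda R(G)$, which is positive by the assumption $\lambda < 1/R(G)$, integration by parts gives:

```latex
\int_0^\infty w\, e^{(\lambda R(G) - 1) w}\, dw
  = \int_0^\infty w\, e^{-c w}\, dw
  = \left[ -\frac{w}{c}\, e^{-c w} \right]_0^\infty
    + \frac{1}{c} \int_0^\infty e^{-c w}\, dw
  = \frac{1}{c^2}
  = \frac{1}{(1 - \lambda R(G))^2} < \infty .
```

Since this bound does not depend on $l$, the pointwise convergence $f_l(w) \to 0$ is enough for the dominated convergence theorem to apply.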
Proof of Theorem 1.10. The claim that $\lambda_s(G) < \lambda_r(G)$ is contained in Theorem 7.3, since in the case of transitive $G$, (7.12) is the same as the condition $i_E(G)/D(G) > 1/\sqrt{2}$. The claim that the contact process open triangle condition, (7.14), holds follows from the same arguments in the proof of Theorem 7.3, using Theorem 7.4 in the place of Theorem 7.2. □

Acknowledgements. It is a pleasure to thank Marcia Salzano and Ander Holroyd for discussions on topics in this paper, and Oded Schramm for conversations on [BS3]. I thank also Russ Lyons for having made several important comments on the first version of this paper and for having sent me a copy of [HJL] shortly after the current paper was submitted. The current paper and [HJL] were written independently, but turned out to have some interesting interrelations. For instance, the explicit expression in [HJL] for the edge-isoperimetric constant of transitive planar graphs with one end shows that when such a graph has a large degree (or has a dual with large degree) it is highly non-amenable in the sense of the current paper. See also the notes at the end of Subsects. 4.1 and 6.1. Thanks go also to Chris Wu for having sent me a copy of [Wu4] shortly after the current paper was submitted, and for having pointed out to me the contribution in [RNO]. For the relation between the current paper and [Wu4], see the note after the statement of Theorem 1.8.
References
[Aiz1] Aizenman, M.: Geometric analysis of ϕ⁴ fields and Ising models. I, II. Commun. Math. Phys. 86, 1–48 (1982)
[Aiz2] Aizenman, M.: Stochastic geometry in statistical mechanics and quantum field theory. In: Proceedings of the International Congress of Mathematicians, Vol. 1, 2, Warsaw: PWN, 1984, pp. 1297–1307
[AB] Aizenman, M. and Barsky, D.: Sharpness of the phase transition in percolation models. Commun. Math. Phys. 108, 489–526 (1987)
[AF] Aizenman, M. and Fernández, R.: On the behavior of the magnetization in high-dimensional Ising models. J. Stat. Phys. 44, 393–454 (1986)
[ABF] Aizenman, M., Barsky, D. and Fernández, R.: The phase transition in a general class of Ising-type models is sharp. J. Stat. Phys. 47, 343–374 (1987)
[AN] Aizenman, M. and Newman, C.M.: Tree graph inequalities and critical behavior in percolation models. J. Stat. Phys. 16, 811–828 (1983)
[BB] Babson, E. and Benjamini, I.: Cut sets and normed cohomology with application to percolation. Preprint (1997) Proc. Am. Math. Soc. 127, 589–597 (1999)
[BA] Barsky, D.J. and Aizenman, M.: Percolation critical exponents under the triangle condition. Commun. Math. Phys. 19, 1520–1536 (1991)
[BW] Barsky, D.J. and Wu, C.C.: Critical exponents for the contact process under the triangle condition. J. Stat. Phys. 91, 95–124 (1998)
[Bax] Baxter, R.J.: Exactly Solved Models in Statistical Mechanics. London: Academic Press, 1982
[BLPS1] Benjamini, I., Lyons, R., Peres, Y. and Schramm, O.: Group-invariant percolation on graphs. Geom. and Funct. Anal. 9, 29–66 (1999)
[BLPS2] Benjamini, I., Lyons, R., Peres, Y. and Schramm, O.: Critical percolation on any non-amenable group has no infinite clusters. Ann. Probab. 27, 1347–1356 (1999)
[BLS] Benjamini, I., Lyons, R. and Schramm, O.: Percolation perturbations in potential theory and random walks. In: Random Walks and Discrete Potential Theory (Cortona, 1997), M. Picardello and W. Woess, editors. Cambridge: Cambridge Univ. Press, 1999, pp. 56–84
[BS1] Benjamini, I. and Schramm, O.: Percolation beyond $\mathbb{Z}^d$, many questions and a few answers. Electronic Communications in Probability 1, 71–82 (1996)
[BS2] Benjamini, I. and Schramm, O.: Recent progress on percolation beyond $\mathbb{Z}^d$. Available electronically from the web page www.wisdom.weizmann.ac.il/users/~schramm
[BS3] Benjamini, I. and Schramm, O.: Percolation in the hyperbolic plane. Preprint 2000
[BG] Bezuidenhout, C. and Grimmett, G.: The critical contact process dies out. Ann. Probab. 18, 1462–1482 (1990)
[BCK] Biskup, M., Chayes, L., Kotecký, R.: On the continuity of the magnetization and the energy density for Potts models on two-dimensional graphs. Preprint 1998
[BRZ] Bleher, P.M., Ruiz, P.M. and Zagrebnov, V.A.: On the purity of the limiting Gibbs state for the Ising model on the Bethe lattice. J. Stat. Phys. 79, 473–482 (1995)
[BK] Burton, R.M. and Keane, M.: Density and uniqueness in percolation. Commun. Math. Phys. 121, 501–505 (1989)
[CK] Chaboud, T. and Kenyon, C.: Planar Cayley graphs with regular dual. Int. J. Alg. and Comp. 6, 553–561 (1996)
[CS] Chayes, L. and Schonmann, R.H.: Mixed percolation as a bridge between site and bond percolation. Ann. Appl. Prob. (To appear)
[CP] Chen, D., Peres, Y.: Anchored expansion, percolation and speed. Preprint 1999
[DS] Durrett, R. and Schinazi, R.B.: Intermediate phase for the contact process on a tree. Ann. Probab. 23, 668–673 (1995)
[ES] Edwards, R.G. and Sokal, A.D.: Generalization of the Fortuin–Kasteleyn–Swendsen–Wang representation and Monte Carlo algorithm. Phys. Rev. D 38, 2009–2112 (1988)
[EKPS] Evans, W., Kenyon, C., Peres, Y. and Schulman, L.: Broadcasting on trees and the Ising model. Ann. Appl. Prob. 10, 410–433 (2000)
[Geo] Georgii, H.-O.: Gibbs Measures and Phase Transitions. Berlin, New York: Walter de Gruyter, 1988
[GHM] Georgii, H.-O., Häggström, O. and Maes, C.: The Random geometry of equilibrium phases. In: Phase Transitions and Critical Phenomena, Vol. 18, C. Domb and J.L. Lebowitz, editors. London: Academic Press, 2000, pp. 1–142
[Gri1] Grimmett, G.R.: The stochastic random-cluster process and the uniqueness of random-cluster measures. Ann. Probab. 23, 1461–1510 (1995)
[Gri2] Grimmett, G.R.: Percolation. (2nd edition) New York–Berlin: Springer-Verlag, 1999
[GN] Grimmett, G.R. and Newman, C.M.: Percolation in ∞ + 1 dimensions. In: Disorder in physical systems, G. R. Grimmett and D. J. A. Welsh, editors, Oxford: Clarendon Press, 1990, pp. 219–240
[GS] Grimmett, G.R. and Stacey, A.M.: Critical probabilities for site and bond percolation models. Ann. Probab. 26, 1788–1812 (1998)
[Häg1] Häggström, O.: The random-cluster model on a homogeneous tree. Probab. Theory and Related Fields 104, 231–253 (1996)
[Häg2] Häggström, O.: Infinite clusters in dependent automorphism invariant percolation on trees. Ann. Probab. 25, 1423–1436 (1997)
[Häg3] Häggström, O.: Markov random fields and percolation on general graphs. Adv. App. Probab. 32, 39–66 (2000)
[HJL] Häggström, O., Jonasson, J. and Lyons, R.: Explicit isoperimetric constants, phase transitions in the random-cluster and Potts models, and Bernoullicity. Preprint (2000)
[HP] Häggström, O. and Peres, Y.: Monotonicity of uniqueness for percolation on Cayley graphs: All infinite clusters are born simultaneously. Probab. Theory and Related Fields 113, 273–285 (1999)
[HPS] Häggström, O., Peres, Y. and Schonmann, R.H.: Percolation on transitive graphs as a coalescent process: Relentless merging followed by simultaneous uniqueness. In: Perplexing Problems in Probability. Festschrift in honor of Harry Kesten, M. Bramson, R. Durrett, editors. Basel–Boston: Birkhäuser, 1999, pp. 69–90
[HSS] Häggström, O., Schonmann, R.H. and Steif, J.: The Ising model on diluted graphs and strong amenability. Ann. Probab. (to appear)
[Ham] Hammersley, J.M.: Percolation processes. Lower bounds for the critical probability. Ann. Math. Stat. 28, 790–795 (1957)
[HS] Hara, T. and Slade, G.: Mean-field behaviour and the lace expansion. In: Probability theory of spatial disorder and phase transition, G. Grimmett, ed., Dordrecht, Boston, London: Kluwer Publ. Co, 1994, pp. 87–122
[Hig] Higuchi, Y.: Remarks on the limiting Gibbs states on a (d + 1)-tree. Publ. RIMS, Kyoto Univ. 13, 335–348 (1977)
[Hun] Hungerford, T.W.: Algebra. Berlin–Heidelberg–New York: Springer Verlag, 1974
[Iof1] Ioffe, D.: On the extremality of the disordered state for the Ising model on the Bethe lattice. Lett. Math. Phys. 37, 137–143 (1996)
[Iof2] Ioffe, D.: Extremality of the disordered state for the Ising model on general trees. In: Trees (Versailles, 1995), B. Chauvin, S. Cohen and A. Roualt, editors, Basel–Boston: Birkhäuser, 1996, pp. 3–14
[Jon] Jonasson, J.: The random cluster model on a general graph and a phase transition characterization of nonamenability. Stochastic Processes and their Appl. 79, 335–354 (1999)
[JS] Jonasson, J. and Steif, J.E.: Amenability and phase transition in the Ising model. J. Theor. Probab. 12, 549–559 (1999)
[Kes] Kesten, H.: Percolation Theory for Mathematicians. Boston–Basel–Stuttgart: Birkhäuser, 1982
[Lal1] Lalley, S.P.: Percolation on Fuchsian groups. Ann. L’Institut Henri Poincaré (Probability and Statistics) 34, 151–177 (1998)
[Lal2] Lalley, S.: Growth profile and invariant measures for the weakly supercritical contact process on a homogeneous tree. Ann. Probab. 27, 206–225 (1999)
[Lal3] Lalley, S.: Percolation clusters in hyperbolic tesselations. Preprint, 1999
[LS] Lalley, S. and Sellke, T.: Limit set of a weakly supercritical contact process on a homogeneous tree. Ann. Probab. 26, 644–657 (1998)
[Leb] Lebowitz, J.: Coexistence of phases in Ising ferromagnets. J. Stat. Phys. 16, 463–476 (1977)
[Lig1] Liggett, T.M.: Interacting Particle Systems. Berlin–Heidelberg–New York: Springer Verlag, 1985
[Lig2] Liggett, T.M.: Multiple transition points for the contact process on the binary tree. Ann. Probab. 24, 1675–1710 (1996)
[Lig3] Liggett, T.M.: Branching random walks and contact processes on homogeneous trees. Probab. Theory and Related Fields 106, 495–519 (1996)
[Lig4] Liggett, T.M.: Stochastic Interacting Systems: Contact, Voter and Exclusion Processes. Berlin–Heidelberg–New York: Springer Verlag, 1999
[Lyo1] Lyons, R.: The Ising model and percolation on trees and tree-like graphs. Commun. Math. Phys. 125, 337–353 (1989)
[Lyo2] Lyons, R.: Random walk and percolation on trees. Ann. Probab. 18, 931–958 (1990)
[Lyo3] Lyons, R.: Random walk, capacity and percolation on trees. Ann. Probab. 20, 2043–2088 (1992)
[Lyo4] Lyons, R.: Random walks and the growth of groups. Comptes Rendus de l’Academie de Sciences, Series I – Mathematique 320, 1361–1366 (1995)
[Lyo5] Lyons, R.: Phase transition on non-amenable graphs. J. Math. Phys. 41, 1099–1126 (2000)
[LyS] Lyons, R. and Schramm, O.: Indistinguishability of Percolation Clusters. Ann. Probab. 27, 1809–1836 (1999)
[MS] Madras, N. and Schinazi, R.B.: Branching Random Walk on trees. Stochastic Processes and Their Appl. 42, 255–267 (1992)
[MM-S] Messager, A. and Miracle-Sole, S.: Equilibrium states of the two dimensional Ising model in the two-phase region. Commun. Math. Phys. 40, 187–196 (1975)
[Moh1] Mohar, B.: Isoperimetric inequalities, growth, and the spectrum of graphs. Linear Alg. and its Appl. 103, 119–131 (1988)
[Moh2] Mohar, B.: Some relations between analytic and geometric properties of infinite graphs. Discrete Math. 95, 193–219 (1991)
[MW] Mohar, B. and Woess, W.: A survey on spectra of infinite graphs. Bull. London Math. Soc. 21, 209–234 (1989)
[MoS] Moore, T. and Snell, E.J.: A branching process showing a phase transition. J. Appl. Probab. 16, 252–260 (1979)
[MSZ] Morrow, G.J., Schinazi, R.B. and Zhang, Y.: The critical contact process on a homogeneous tree. J. Appl. Probab. 31, 250–255 (1994)
[MP] Muchnik, R. and Pak, I.: Percolation on Grigorchuk groups. Commun. Alg. (To appear)
[NW] Newman, C.M. and Wu, C.C.: Markov fields on branching planes. Probab. Theory and Related Fields 85, 539–552 (1990)
[Ngu] Nguyen, B.: Gap exponent for percolation processes with triangle condition. J. Stat. Phys. 49, 235–243 (1987)
[NY] Nguyen, B. and Yang, W.-S.: Triangle condition for oriented percolation in high dimensions. Ann. Probab. 21, 1809–1844 (1993)
[PS-N] Pak, I. and Smirnova-Nagnibeda, T.: On non-uniqueness of percolation on nonamenable Cayley graphs. Comptes Rendus De L’Academie Des Sciences, Serie I – Mathematique, 330, 495–500 (2000)
[Pem] Pemantle, R.: The contact process on trees. Ann. Probab. 20, 2089–2116 (1992)
[PS] Pemantle, R. and Stacey, A.: The branching random walk and contact process on Galton–Watson and non-homogeneous trees. Preprint 1999
[Per] Peres, Y.: Percolation on nonamenable products at the uniqueness threshold. Ann. L’Institut Henri Poincaré (Probability and Statistics) 36, 395–406 (2000)
[PLM] Peruggi, F., di Liberto, F. and Monroy, G.: The Potts model on Bethe lattices: I. General results. J. Phys. A 16, 811–828 (1983)
[Pfi] Pfister, C.: Translation invariant equilibrium states of ferromagnetic Abelian lattice systems. Commun. Math. Phys. 86, 375–390 (1982)
[RNO] Rietman, R., Nienhuis, B. and Oitmaa, J.: The Ising model on hyperlattices. J. Phys. A 25, 6577–6592 (1992)
[Sal] Salzano, M.: Infinitely many contact process transitions on a tree. J. Stat. Phys. 97, 817–826 (1999)
[SaS1] Salzano, M. and Schonmann, R.H.: The second lowest extremal invariant measure of the contact process. Ann. Probab. 25, 1846–1871 (1997)
[SaS2] Salzano, M. and Schonmann, R.H.: A new proof that for the contact process on homogeneous trees local survival implies complete convergence. Ann. Probab. 26, 1251–1258 (1998)
[SaS3] Salzano, M. and Schonmann, R.H.: The second lowest extremal invariant measure of the contact process II. Ann. Probab. 27, 845–875 (1999)
[Schi] Schinazi, R.B.: On multiple phase transitions for branching Markov chains. J. Stat. Phys. 71, 507–511 (1993)
[Sch1] Schonmann, R.H.: The triangle condition for contact processes on homogeneous tree. J. Stat. Phys. 90, 1429–1440 (1998)
[Sch2] Schonmann, R.H.: Stability of infinite clusters in supercritical percolation. Preprint (1997), Probability Theory and Related Fields 113, 287–300 (1999)
[Sch3] Schonmann, R.H.: Percolation in ∞ + 1 dimensions at the uniqueness threshold. In: Perplexing Problems in Probability. Festschrift in honor of Harry Kesten, M. Bramson, R. Durrett, editors. Basel–Boston: Birkhäuser 1999, pp. 53–67
[ST] Schonmann, R.H. and Tanaka, N.I.: Lack of monotonicity in ferromagnetic Ising model phase diagrams. The Ann. Appl. Probab. 8, 234–245 (1998)
[SeS] Series, C.M. and Sinai, Ya.G.: Ising models on the Lobachevsky Plane. Commun. Math. Phys. 128, 63–76 (1990)
[SW] Soardi, P.M. and Woess, W.: Amenability, unimodularity, and the spectral radius of random walks on infinite graphs. Math. Z. 205, 471–486 (1990)
[Sok] Sokal, A.: A rigorous inequality for the specific heat of an Ising or ϕ⁴ ferromagnet. Phys. Lett. A 71, 451–453 (1979)
[Sta] Stacey, A.M.: The existence of an intermediate phase for the contact process on trees. Ann. Probab. 24, 1711–1726 (1996)
[TM] Terrones, H. and Mackay, A.L.: The geometry of hypothetical curved graphite structure. Carbon 30, 1251–1260 (1992)
[VT] Vanderbilt, D. and Tersoff, J.: Negative-curvature fullerene Analog of C$_{60}$. Phys. Rev. Lett. 68, 511–513 (1992)
[Wu1] Wu, C.C.: Critical behavior of percolation and Markov fields on branching planes. J. Appl. Probab. 30, 538–547 (1993)
[Wu2] Wu, C.C.: The contact process on a tree – behavior near the first transition. Stochastic Processes and their Appl. 57, 99–112 (1995)
[Wu3] Wu, C.C.: Ising models on hyperbolic graphs. J. Stat. Phys. 85, 251–259 (1996)
[Wu4] Wu, C.C.: Ising models on hyperbolic graphs II. J. Stat. Phys. (to appear)
[Wu5] Wu, F.Y.: The Potts model. Rev. Mod. Phys. 54, 235–268 (1982)
[Zha] Zhang, Y.: The complete convergence theorem of the contact process on trees. Ann. Probab. 24, 1408–1443 (1996)
Communicated by J. L. Lebowitz
Commun. Math. Phys. 219, 323 – 355 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Mixing Properties for Mechanical Motion of a Charged Particle in a Random Medium
V. Sidoravicius¹, L. Triolo², M. E. Vares³
¹ IMPA, Rio de Janeiro, Brazil and Institute of Mathematics and Informatics, Vilnius, Lithuania
² Dipartimento de Matematica, Università di Roma Tor Vergata, Roma, Italy
³ IMPA, Rio de Janeiro, Brazil
Received: 22 March 2000 / Accepted: 8 December 2000
Abstract: We study a one-dimensional semi-infinite system of particles driven by a constant positive force $F$ which acts only on the leftmost particle of mass $M$, called the heavy particle (the h.p.), and all other particles are mechanically identical and have the same mass $m < M$. Particles interact through elastic collisions. At initial time all neutral particles are at rest, and the initial measure is such that the interparticle distances $\xi_i$ are i.i.d. r.v. Under conditions on the distribution of $\xi$ which imply that the minimal velocity obtained by each neutral particle after the first interaction with the h.p. is bigger than the drift of an associated Markovian dynamics (in which each neutral particle is annihilated after the first collision) we prove that the dynamics has a strong cluster property, and as a consequence, we prove existence of the discrete time limit distribution for the system as seen from the first particle, a ψ-mixing property, a drift velocity, as well as the central limit theorem for the tracer particle.

1. Introduction

In the present work we are concerned with the question of long time behavior for a mechanical system with infinitely many point particles. The system consists of one charged particle, of mass $M$ (hereafter called the heavy particle and sometimes the h.p.), initially located at the origin and subject to a constant positive force $F$, and infinitely many neutral particles of mass $m < M$ (sometimes called the n.p.), located on the positive half-line. The h.p. will play the role of the tracer particle for the system. Initially all neutral particles are at rest. During the evolution they keep their velocities constant between the collisions, which are assumed to be elastic. Due to the external constant force, between two consecutive collisions the heavy particle is steadily accelerated.
We will focus on a set of typical questions such as the existence of an asymptotic drift and diffusivity for the motion of the heavy particle, and the ergodic behaviour of the medium as seen from the heavy particle, and its mixing properties. We refer to Presutti et al. [12], for various motivations which led to a study of analogous semi-infinite systems (see also
[2–4]). For a more general account on this subject we refer to [15–17], and some recent developments are discussed in [6] and [13]. At variance with respect to finite systems, for which generally it is believed that the main mechanism of “chaos” relies on the existence of foliations of the phase space into stable and unstable manifolds (see [1]), for infinite systems it is not completely clear which features are responsible for good ergodic properties. Besides an attempt to bring some analogue of the notion of hyperbolicity, which still remains quite unclear in the context of infinite mechanical systems, another rather natural mechanism was proposed by Landau and Lifschitz: “ . . . A subsystem undergoes all kinds of interactions with other parts of the system [due to infinite size of the system] . . . [and] owing to the complexity of [such] interactions it will pass sufficiently often through all its possible states . . . ” [9]. Still another possible way comes from the fact that the dynamics is expected to be asymptotically free (far from the h.p.), i.e. the neutral particles which interact with the h.p. eventually move away towards +∞ becoming asymptotically free. This naturally reminds us of the Möller wave operators of classical scattering theory as it was pointed out by Sinai more than fifteen years ago, but still very little is known in this direction and questions remain quite hard for a rigorous analysis. To our knowledge the only known result for the infinite mechanical system was obtained by Presutti et al. [12], where existence of the Möller wave operator was proven, as well as asymptotic completeness and existence of a non-trivial scattering matrix. Even though we do not address in this work the same questions, we explore some ideas which appeared in [12], combining them with coupling methods of probability theory, to prove convergence and mixing properties of the underlying discrete dynamics, at the instants the h.p. 
collides with the standing neutral particles. Namely, from one side the analysis relies on three characteristic properties of the evolution: a) the cluster structure of the dynamics, i.e. the h.p. interacts with a finite set of neutral particles and when such interaction is over, a new finite cluster of particles “comes in for the interaction”; when this is finished, the same phenomenon starts again, and keeps continuing on and on; b) the occurrence of rare events in the flow of neutral particles, that causes a (locally) significant change of the behaviour of the h.p. and which determines an almost complete loss of the memory of the past history of the h.p. (the meaning of “almost complete” will be specified below); c) the (almost everywhere) existence of a contractive manifold [12]. On the other hand, as we have already mentioned, we make strong use of couplings and comparisons of two dynamics associated with the same system. In spirit, this is again similar to scattering, which normally involves a comparison of two different dynamics for the same system – the given dynamics and a simplified “free” dynamics. For our concrete dynamical system the role of “simplified evolution” is played by what we call the Markovian dynamics, in which each neutral particle collides only once with the h.p.; moreover, since all n.p. are indistinguishable, we can think of them as “pulses” which just interact with the h.p., and in this (Markovian) case each one moves freely after the unique collision. Under Assumptions 2.5 given below, we will prove the existence of an invariant measure describing the medium as seen from the heavy particle at moments of collisions with the standing neutral particles (discrete dynamics), ψ-mixing property for the sequence of flight times, and as a consequence of it, we shall show the existence of an asymptotic drift and diffusive behaviour of the h.p. The main content of our assumptions is the existence of a minimal velocity for the moving n.p. 
which is larger than the asymptotic drift velocity of the h.p. in the Markovian dynamics. Informally the scheme of the proof can be described as follows. We allow the interacting dynamics to act for a “long time” and then look at what happened to the system.
Roughly, the “long time” intervals are characterized by the “typical” time needed in the Markovian dynamics for the h.p. to get close to its asymptotic behaviour, provided that its initial velocity is not too large (later specified by a suitable constant V∗ ). At these random instants we check if there is some large enough cluster of tightly placed neutral particles right in front of the h.p. (the event described in property b) above), and if so, we start to follow the evolution of the velocity process of the h.p. while it is moving “through” this cluster. Now if one would compare two dynamics associated with the same initially standing particles, and such that one of them initially has no moving neutral particles and the other has them distributed according to some limiting measure, due to the property c), one would observe that the two velocity processes are becoming close to each other in the variational distance. This fact and particular properties of the distribution of initial interparticle distances enables us to couple these velocity processes with a positive probability, and by this, in the case of successful coupling, achieving a temporary loss of “inertial memory” of the past. This is one of the most delicate parts of the proof. The next step is to exploit property a) of the cluster structure of the dynamics in order to eliminate possible “returns” of the memory via recollisions with the particles of the distant past. We obtain control over it by following Sinai’s argument of “continuous loss of memory”, and by proving that interdistances between the h.p. and neutral particles with which the h.p. had collided in the past are stochastically bounded from below by a certain Markov chain with a positive drift, and thus, occasionally those neutral particles will stop to interact with the h.p. in the future, producing the so-called cluster indices. 
These are the principal components in the proof of the existence of an invariant measure describing the medium as seen from the heavy particle at the moments of collisions with the standing neutral particles (discrete dynamics) and the ψ-mixing property for the sequence of flight times. The existence of an asymptotic drift follows quite easily from the mixing property. The verification of the diffusive behaviour of the h.p. is of a more technical nature. We also should notice that in the equal masses case ($m = M$) the model was treated in [5].

The paper is organized as follows. In Sect. 2 we give a formal description of the model and state our main results. Section 3 is dedicated to the study of long time behaviour of the Markovian dynamics associated with the original one. In Sect. 4, we prove the tightness of the family of measures which describe the medium as seen from the h.p., for the discrete time dynamics. In Sect. 5 we prove the strong cluster property, i.e. existence of cluster indices, and as a consequence we obtain the existence and uniqueness of the invariant measure for the discrete dynamics, as well as mixing properties, and the existence of asymptotic drift for the h.p. The diffusivity of the motion of the h.p. is proven in Sect. 6, using a technical lemma the proof of which is postponed to the Appendix.

2. The Model and Results

We consider a system of particles on the half-line $\mathbb{R}_+ = [0, +\infty)$, with nonnegative velocities. No two particles are allowed to be at the same space point and to have the same velocity, so that the state of the system at any given time will be represented by a countable subset of $\mathbb{R}_+ \times \mathbb{R}_+$, where the first coordinate will represent position, and the second will represent velocity. Thus the state space of our system will be taken as
$$X = \{x \subseteq \mathbb{R}_+ \times \mathbb{R}_+ : \text{for each } b < +\infty,\ x \cap ([0, b] \times \mathbb{R}_+) \text{ is finite}\},$$
i.e., the configurations of particles will be locally (in space) finite.
On X we consider the topology for which a fundamental system of neighbourhoods of x ∈ X is given by the sets

$$G_{A,V} = \{x' \in X : \#(x' \cap (A \times V)) = \#(x \cap (A \times V))\},$$

where A and V are bounded open sets of R+ such that x ∩ ∂(A × V) = ∅, and where #B denotes the cardinality of a finite set B. That is, elements of X are thought of as discrete measures and X is endowed with the vague topology. It is well known that with this topology X is a Polish space (see [5] and references therein). We then let B denote its Borel σ-field. Since the system is one dimensional it may be convenient to represent a configuration x as a sequence $(q_n(x), v_n(x))_{n\ge 0}$, where $q_n$ and $v_n$ denote the position and the velocity, resp., of the (n + 1)th particle, ordered according to position, i.e., $q_n \ge q_{n-1}$ and if $q_n = q_{n-1}$ then $v_n > v_{n-1}$.

Particles are assumed to be pointlike, with the same mass m, except for the leftmost particle, which will have mass M > m, and which will be called “the heavy particle” (the h.p.). The h.p. is subject to a constant force F > 0. Other particles do not feel the field, and are called neutral particles (n.p.). The interaction among particles is elastic. Neutral particles are indistinguishable, thus we may think of them as “pulses” (without interaction among themselves). In this setup all neutral particles keep their own velocities until they collide with the h.p. At such a collision, velocities change according to the following rule, in which v and V are the incoming velocities of the neutral particle and the h.p., respectively, and v′ and V′ are the corresponding outgoing velocities:

$$V' = \alpha V + (1-\alpha)\,v, \qquad v' = (1+\alpha)\,V - \alpha v, \tag{2.1}$$

where $\alpha \stackrel{\mathrm{def}}{=} (M-m)/(M+m)$, so that 0 < α < 1. The h.p. is under the action of the constant force F > 0, so that between collisions it moves with constant acceleration $f \stackrel{\mathrm{def}}{=} F/M$.

Assumptions 2.1. Initially all n.p. are at rest, i.e. $v_i = 0$ for i ≥ 1. The h.p. is located at the origin ($q_0 = 0$) and has velocity $v_0 \in R_+$. We then completely describe an initial probability measure µ0 on X by requiring that the interparticle distances $\xi_n \stackrel{\mathrm{def}}{=} q_n - q_{n-1}$, n ≥ 1, are i.i.d. positive random variables, with an absolutely continuous distribution.

Under the previous assumptions we know that the dynamics just described is globally well defined µ0 a.s. on X.

Proposition 2.2 (Existence of dynamics). Under Assumptions 2.1 there exists a measurable flow $(T_t x)_{t\ge 0}$ on (X, B, µ0), corresponding to the dynamical rules prescribed above.

The proof of such a statement, as given in [14], is a combination of the following three facts which hold µ0 almost surely:

i) only finitely many neutral particles enter any given bounded neighbourhood of the h.p. in finite time;
ii) all collisions are transversal, i.e. the h.p. never collides simultaneously with several neutral particles, or with a neutral particle which has velocity equal to the velocity of the h.p.;
iii) only finitely many collisions occur in a finite interval of time.

Remark. Assumptions 2.1 require absolute continuity of the distribution of the interdistances $\xi_i$, though the heuristics seem to indicate that the absence of non-transversal collisions might be obtained by a co-dimension type argument, thus requiring only the continuity of such a distribution. Nevertheless, for degenerate initial velocities a rigorous proof along these lines is not known to the authors.

As follows from the proof, we may assume $(T_t x)$ to be right continuous, in the sense that if a collision happens at time t, then $v_n(T_t x)$ denote the outgoing velocities (n ≥ 0).

Definition 2.3. The position of the h.p. at time t is denoted by $q_0(t, x) = q_0(T_t x)$, where x is the initial configuration. The law of the system at time t, as seen from the position of the h.p., is described by the measure $\mu_t(B) = \mu_0((T^0_t)^{-1}(B))$, where

$$T^0_t x = S_{q_0(t,x)}\, T_t x \tag{2.2}$$

and

$$S_r x = \{(q - r, v) : (q, v) \in x\}. \tag{2.3}$$

From Proposition 2.2, $T^0_t$ is well defined µ0 a.s.

A useful discrete time dynamics is obtained by observing the system from the position of the h.p. at the moments of “fresh collisions”, i.e., the first collision with each given n.p. or, equivalently, at the moments of collisions with standing particles. That is, if x ∈ X is such that the dynamics is well defined and n ≥ 1, we define $t_n(x)$ via $q_0(t_n(x), x) = q_n(x^0)$, where $x^0 = \{(q, v) \in x : v = 0\}$ is the configuration of standing particles in x. Clearly $t_n(x) < +\infty$ µ0 a.s. Considering the rule (2.1) of elastic collisions, $t_n(x)$, n ≥ 1, coincide with the times of successive visits of $(T^0_t x)$ to the set

$$X_0 = \Big\{x \in X : q_0(x) = q_1(x) = 0,\ v_0(x) = \frac{\alpha}{1+\alpha}\, v_1(x)\Big\}.$$

In particular, defining $T^1 x \stackrel{\mathrm{def}}{=} T^0_{t_1(x)} x$, we have

$$T^1 x \in X_0, \qquad t_2(x) = t_1(x) + t_1(T^1 x), \qquad \text{and} \qquad T^1(T^1 x) = T^0_{t_2(x)}(x).$$

In fact for each n,

$$(T^1)^n(x) = T^0_{t_n(x)}(x),$$

which is well defined µ0 almost surely for all n ≥ 1, and which we denote by $T^n x$. We also let $\tau_n(x) \stackrel{\mathrm{def}}{=} t_n(x) - t_{n-1}(x)$, for n ≥ 1, with $t_0(x) = 0$, so that $\tau_n(x) = \tau_1(T^{n-1} x)$, µ0 a.s.
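As a quick sanity check of the collision rule (2.1), the following minimal Python sketch applies the rule with exact rational arithmetic and confirms that it conserves momentum and kinetic energy, and that a fresh collision (incoming n.p. velocity 0) lands on the relation $v_0 = \frac{\alpha}{1+\alpha} v_1$ defining $X_0$. The function name `collide` and the numerical values are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction


def collide(V, v, M, m):
    """Apply rule (2.1): return outgoing velocities (V', v') and alpha.

    V, v are the incoming velocities of the heavy and the neutral particle;
    M > m are integer masses (hypothetical illustration values).
    """
    alpha = Fraction(M - m, M + m)
    V_out = alpha * V + (1 - alpha) * v
    v_out = (1 + alpha) * V - alpha * v
    return V_out, v_out, alpha


# Illustrative values only.
M, m = 3, 1
V, v = Fraction(5), Fraction(2)
V_out, v_out, alpha = collide(V, v, M, m)

# Elastic collision: momentum and (twice the) kinetic energy are conserved.
momentum_ok = M * V + m * v == M * V_out + m * v_out
energy_ok = M * V**2 + m * v**2 == M * V_out**2 + m * v_out**2

# Fresh collision with a standing particle (v = 0).
V_fresh, v_fresh, _ = collide(V, Fraction(0), M, m)
on_X0 = V_fresh == alpha / (1 + alpha) * v_fresh

print(momentum_ok, energy_ok, on_X0)
```

Exact fractions are used so that the conservation laws can be compared with `==` rather than with a floating-point tolerance.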
Definition 2.4. For each n ≥ 1, $\mu^{(n)}$ will denote the law of the system at time $t_n$, as seen from the random position of the h.p., i.e. $\mu^{(n)}(B) = \mu_0(((T^1)^n)^{-1}(B))$, for B ∈ B.

Assumptions 2.5. In the setup of Assumptions 2.1 we allow $v_0$, the initial non-negative velocity of the h.p., to be random, independent of the interdistances $(\xi_n)_{n\ge 1}$. Moreover we require:

(a) There exists a > 0 with $E_{\mu_0}(e^{a\xi_1}) < +\infty$, and

$$d \stackrel{\mathrm{def}}{=} \inf\{r : \mu_0(\xi_1 \le r) > 0\} > 0.$$

(b1) $\alpha\sqrt{\dfrac{2df}{1-\alpha^2}} \le v_0$, µ0 a.s.

(b2) $E_{\mu_0}(v_0^h) < \infty$ for some h > 0.

(c) $2\sqrt{(1-\alpha^2)\,d} > \dfrac{E_{\mu_0}(\xi_1)}{E_{\mu_0}\sqrt{\sum_{i=1}^{\infty}\alpha^{2(i-1)}\xi_i}}$.
For α = 0 condition (c) coincides with the condition (ii) in [5], and (b1) becomes trivial. An example satisfying Assumptions 2.5 (a), (c) is $\{\xi_i = d + Z_i\}_i$, where $Z_i$ are i.i.d. exponential random variables with rate greater than 1/d. We may now state our main results.

Theorem 2.6. Under Assumptions 2.5 there exists a probability measure µ on X, concentrated on $X_0$, stationary for the discrete time dynamics, and such that $\mu^{(n)} \to \mu$ weakly, as n → +∞.

The proof of the above theorem is based on the construction of the so-called cluster index (see Definition 5.1), and using this construction we will prove the following:

Theorem 2.7. Under Assumptions 2.5 we have:

I) There exist positive constants c, c′ such that

$$\sup_{A \in \mathcal{M}_0^{k},\ B \in \mathcal{M}_{k+n}^{+\infty}} |\mu(B\,|\,A) - \mu(B)| \le c\,e^{-c' n}$$

for all k, n ≥ 1, where $\mathcal{M}_m^n$ denotes the σ-field generated by the variables $\tau_i$ : m ≤ i ≤ n.

II) There exists a positive constant $v_D$ (the drift velocity) so that as t → +∞,

$$\frac{q_0(t)}{t} \to v_D, \qquad \mu_0 \text{ a.s.}$$
III) There exists a positive constant $\tilde\sigma$ so that

$$\left(\frac{q_0(ut) - v_D\,ut}{\tilde\sigma\sqrt{t}}\right)_{0\le u\le 1} \to (W_u)_{0\le u\le 1}$$

in law in the Skorohod space D([0, 1], R), as t → +∞, where $(W_u)$ represents the standard Wiener process.

Notation. According to the interpretation of n.p. as pulses crossing each other, for an initial configuration $x = (q_i, v_i)_{i\ge 0}$, with $q_0 = 0$, $v_i = 0$ for all i ≥ 1, for which the evolution is well defined, we use the symbol $p_i$, i ≥ 1, to denote the pulse initially located at $q_i \equiv q_i(x)$, and let $q_i(t, x)$ and $v_i(t, x)$ be its position and velocity at time t. The initial condition x will be usually omitted from the notation, and we then write $q_i(t)$, $v_i(t)$. Thus, in general $q_i(t) \ne q_i(T_t x)$ if i ≥ 1, due to different labelling. Further we shall use the symbols c, c′ to denote positive constants. These constants may vary from one line to another, though we use the same symbol.

Remark. Though the limit in Theorem 2.6 doesn’t depend on the initial velocity $v_0$, the speed of the convergence clearly depends on its distribution.

3. Markovian Approximation

In this section we focus on an auxiliary dynamics which is obtained by introducing the so-called “annihilation hypothesis”, i.e., each n.p. disappears immediately after the first collision with the h.p. This quite simple evolution will be called Markovian approximation dynamics due to the Markovian property of the velocity of the h.p. The Markovian approximation dynamics will be used here as an auxiliary tool to provide useful bounds. For the case M = m Piasecki studied particular cases for which the test particle collides only once with each pulse, so that the Markovian dynamics coincides with the real one (see [11]).

Remark. All quantities like velocities, hitting times and positions related with the Markovian approximation dynamics will carry a “bar” in their notations, e.g. $\bar v_0$, $\bar q_0$, etc. As in the previous section $\bar t_n$ is defined by $\bar q_0(\bar t_n) = \sum_{i=1}^n \xi_i$, and $\bar\tau_n = \bar t_n - \bar t_{n-1}$, n ≥ 1 (with $\bar t_0 = 0$).

Proposition 3.1.

$$\lim_{n\to+\infty} \frac{\bar q_0(\bar t_n)}{\bar t_n} = \frac{1}{1-\alpha}\sqrt{\frac{f}{2}}\;\frac{E_{\mu_0}(\xi_1)}{E_{\mu_0}\sqrt{\sum_{j\ge 1}\alpha^{2(j-1)}\xi_j}} \stackrel{\mathrm{def}}{=} \bar v_D \qquad \mu_0 \text{ a.s.} \tag{3.1}$$
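The limit in Proposition 3.1 can be illustrated numerically. The sketch below simulates the Markovian approximation dynamics (each neutral particle is annihilated after its first collision) and compares the empirical ratio $\bar q_0(\bar t_n)/\bar t_n$ with a Monte Carlo evaluation of the right-hand side of (3.1). The interdistance law $\xi = d + \mathrm{Exp}(\text{rate})$, the parameter values, the truncation of the infinite sum at `terms` summands and all function names are assumptions made only for this illustration.

```python
import math
import random


def simulate_drift(alpha, f, d, rate, n, v0=0.0, seed=0):
    """Empirical q0(t_n)/t_n for the Markovian approximation dynamics."""
    rng = random.Random(seed)
    u, position, elapsed = v0, 0.0, 0.0
    for _ in range(n):
        xi = d + rng.expovariate(rate)            # next interdistance
        incoming = math.sqrt(u * u + 2 * f * xi)  # speed just before the collision
        elapsed += (incoming - u) / f             # flight time under acceleration f
        position += xi
        u = alpha * incoming                      # outgoing speed, rule (2.1) with v = 0
    return position / elapsed


def drift_formula(alpha, f, d, rate, samples=4000, terms=60, seed=1):
    """Monte Carlo estimate of the right-hand side of (3.1).

    The infinite sum is truncated after `terms` summands (an approximation).
    """
    rng = random.Random(seed)
    mean_xi = d + 1.0 / rate
    acc = 0.0
    for _ in range(samples):
        s = sum(alpha ** (2 * (j - 1)) * (d + rng.expovariate(rate))
                for j in range(1, terms + 1))
        acc += math.sqrt(s)
    return math.sqrt(f / 2) / (1 - alpha) * mean_xi / (acc / samples)


# Illustrative parameters only.
sim = simulate_drift(alpha=0.5, f=1.0, d=1.0, rate=2.0, n=20000)
formula = drift_formula(alpha=0.5, f=1.0, d=1.0, rate=2.0)
print(sim, formula)
```

Both numbers are random estimates, so they should only be expected to agree up to sampling error.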
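Proof. Let $\bar u_n = \bar v_0(\bar t_n) = \bar u_n(\bar v_0(x), \xi_1, \xi_2, \dots, \xi_n)$ be the outgoing velocity of the h.p. at the n-th collision. The velocity change during $(\bar t_{n-1}, \bar t_n)$ is only due to the constant field F and, since there are no recollisions in this dynamics, one may explicitly compute:

$$f\cdot\bar\tau_n = \frac{1}{\alpha}\,\bar u_n - \bar u_{n-1}, \tag{3.2}$$

$$\bar\tau_n = \frac{1}{f}\Big(-\bar u_{n-1} + \sqrt{\bar u_{n-1}^2 + 2f\xi_n}\Big), \tag{3.3}$$

so that we immediately get

$$\bar u_n = \alpha\cdot\sqrt{\bar u_{n-1}^2 + 2f\xi_n}; \tag{3.4}$$

and iterating we get:

$$\bar u_n^2 = \alpha^{2n}\,\bar v_0^2(x) + 2f\sum_{i=1}^{n}\alpha^{2(n-i+1)}\xi_i \tag{3.5}$$

for any initial velocity $\bar v_0(x) \ge 0$ ($\bar u_0 = \bar v_0(x)$). From Eq. (3.2) we have:

$$\frac{\bar t_n}{n} = \frac{1}{n}\sum_{i=1}^{n}\bar\tau_i = \frac{1}{\alpha f}\,\frac{\bar u_n - \bar u_0}{n} + \frac{(1-\alpha)}{\alpha f}\,\frac{1}{n}\sum_{i=0}^{n-1}\bar u_i, \tag{3.6}$$

so that

$$\frac{\bar q_0(\bar t_n)}{\bar t_n} = \frac{\frac{1}{n}\sum_{i=1}^{n}\xi_i}{\frac{1}{n}\sum_{i=1}^{n}\bar\tau_i} = \frac{\frac{1}{n}\sum_{i=1}^{n}\xi_i}{\frac{1}{\alpha f}\,\frac{\bar u_n - \bar u_0}{n} + \frac{(1-\alpha)}{\alpha f}\,\frac{1}{n}\sum_{i=0}^{n-1}\bar u_i}. \tag{3.7}$$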
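The identities (3.2) to (3.6) are purely algebraic and can be checked on any finite list of interdistances. The following sketch, with hypothetical names and arbitrary illustrative numbers, iterates (3.3) and (3.4) and compares the result with the closed form (3.5) and with the total time given by n times the right-hand side of (3.6).

```python
import math


def markov_steps(v0, xis, alpha, f):
    """Iterate (3.3)-(3.4): return ([u_0, ..., u_n], [tau_1, ..., tau_n])."""
    us, taus = [v0], []
    for xi in xis:
        u_prev = us[-1]
        root = math.sqrt(u_prev ** 2 + 2 * f * xi)  # speed just before the collision
        taus.append((root - u_prev) / f)            # (3.3)
        us.append(alpha * root)                     # (3.4)
    return us, taus


# Illustrative values only.
alpha, f, v0 = 0.6, 2.0, 1.5
xis = [1.0, 0.7, 2.3, 1.1, 0.9]
n = len(xis)
us, taus = markov_steps(v0, xis, alpha, f)

# Closed form (3.5) for the squared outgoing velocity after n collisions.
closed_form = alpha ** (2 * n) * v0 ** 2 + 2 * f * sum(
    alpha ** (2 * (n - i + 1)) * xis[i - 1] for i in range(1, n + 1))

# n times the right-hand side of (3.6), i.e. the total elapsed time.
total_time = (us[n] - us[0]) / (alpha * f) + (1 - alpha) / (alpha * f) * sum(us[:n])

print(us[n] ** 2, closed_form)
print(sum(taus), total_time)
```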
Since $\xi_1, \xi_2, \dots$ are i.i.d. random variables, the numerator in Eq. (3.7) converges a.s. to $E_{\mu_0}(\xi_1)$ as n → ∞. On the other hand, $(\bar u_n^2, n \ge 1)$ form a Markov chain, and for each n ≥ 1:

$$\bar u_n^2 = \alpha^{2n}\,\bar v_0^2 + Z_n(\xi_1, \xi_2, \dots, \xi_n) = Z_n\Big(\frac{\bar v_0^2}{2f} + \xi_1, \xi_2, \dots, \xi_n\Big), \tag{3.8}$$

where

$$Z_n(\xi_1, \xi_2, \dots, \xi_n) = 2f\,(\alpha^2\xi_n + \dots + \alpha^{2n}\xi_1). \tag{3.9}$$

Obviously, for each n the law of $Z_n(\xi_1, \xi_2, \dots, \xi_n)$ is the same as that of $\tilde Z_n(\xi_1, \xi_2, \dots, \xi_n)$ defined as

$$\tilde Z_n(\xi_1, \xi_2, \dots, \xi_n) = Z_n(\xi_n, \xi_{n-1}, \dots, \xi_1) = 2f\sum_{i=1}^{n}\alpha^{2i}\xi_i, \tag{3.10}$$

and moreover $0 \le \tilde Z_n \uparrow \tilde Z_\infty = 2f\sum_{i=1}^{\infty}\alpha^{2i}\xi_i < +\infty$, a.s., and

$$E_{\mu_0}\tilde Z_\infty = \frac{2f\alpha^2}{(1-\alpha^2)}\,E_{\mu_0}(\xi_1) < \infty. \tag{3.11}$$

In particular we see that $\bar u_n^2$ converges in distribution to ν, the law of $\tilde Z_\infty$. By enlarging the probability space, if needed, we take ζ distributed according to ν, independent of $\xi_1, \xi_2, \dots$, and consider the stationary Markov chain

$$\widehat Z_n = Z_n\Big(\frac{\zeta}{2f} + \xi_1, \xi_2, \dots, \xi_n\Big). \tag{3.12}$$

Thus

$$\bar u_n^2 - \widehat Z_n = \alpha^{2n}\,(\bar v_0^2 - \zeta), \tag{3.13}$$
from where we see at once that the Markov chain $\bar u_n^2$ converges exponentially fast (in law) to its unique stationary measure, ν, and by the ergodic theorem we see that

$$\frac{1}{n}\sum_{i=1}^{n}\bar u_i \xrightarrow[n\to\infty]{} \int_{R_+}\sqrt{u}\,d\nu(u) = E_{\mu_0}\sqrt{\tilde Z_\infty}, \tag{3.14}$$

µ0 a.s. On the other side a trivial application of the Borel–Cantelli Lemma gives us that $\lim_{n\to\infty}\frac{\bar u_n - \bar u_0}{n} = 0$ µ0 a.s. Recalling Eq. (3.7) we get (3.1).

The limit in (3.1), $0 < \bar v_D < +\infty$, will be called Markovian (asymptotic) drift.

Remark 3.2. The condition $\mu_0(\xi_1 \ge d) = 1$ ensures that at any collision the outgoing velocity of the n.p. is at least $v_{\min} \stackrel{\mathrm{def}}{=} (1+\alpha)\sqrt{\frac{2df}{1-\alpha^2}}$, provided $v_0(0) \ge \alpha\sqrt{\frac{2df}{1-\alpha^2}}$, as follows through a direct calculation using (2.1). Thus, condition (c) in Assumptions 2.5 can be read as

$$v_{\min} > \bar v_D, \tag{3.15}$$

which is a crucial property in the analysis that follows. When α is not too close to one, condition (b1) in Assumptions 2.5 might be omitted, and we would then get $(1+\alpha)\sqrt{2df}$ as the new value for the minimal velocity; in such case, the condition (c) of Assumptions 2.5 would have to be replaced by

$$2(1-\alpha^2)\sqrt{d} > \frac{E_{\mu_0}(\xi_1)}{E_{\mu_0}\sqrt{\sum_{j\ge 1}\alpha^{2(j-1)}\xi_j}},$$
which, however, makes no sense if $1 - \alpha^2 \le 1/4$. We end this section by stating and proving some elementary facts which will be useful later on.

Proposition 3.3. Let x, $\bar x$ be two initial configurations such that $v_0 \le \bar v_0$, $v_i = \bar v_i = 0$ if i ≥ 1, and $q_i = \bar q_i$, i ≥ 1, and such that the dynamics is well defined when starting from x. Let $s_z \equiv s_z(x)$ ($\bar s_z$ resp.) denote the time at which the h.p. reaches the point $z \in R_+$ in the original (Markovian, resp.) dynamics which starts at configuration x ($\bar x$ resp.). Then

$$v_0(s_z, x) \le \bar v_0(\bar s_z, \bar x) \tag{3.16}$$

for all $z \in R_+$.

Proof. Let x ∈ X be such that the dynamics $(T_t x)_{t>0}$ is well defined, as in Proposition 2.2. Then $s_z(x) < +\infty$ and the number of collisions up to $s_z(x)$ is finite. We shall prove (3.16) by induction on K, the number of collisions of the h.p. in the original dynamics before reaching the point z. The statement is obviously true for K = 0 and K = 1. Let us assume it is true for K ≤ n and we will prove it for K = n + 1. Let then K = n + 1, and let us denote by $z_n(x) < z_{n+1}(x) < z$ the points where the nth and (n + 1)st collision of the h.p. happened. By the induction assumption, for any $\tilde z < z_{n+1}$ we have $v_0(s_{\tilde z}, x) \le \bar v_0(\bar s_{\tilde z}, \bar x)$. The same immediately holds for the left limits

$$v_0(s^-_{z_{n+1}(x)}, x) \le \bar v_0(\bar s^-_{z_{n+1}(x)}, \bar x), \tag{3.17}$$
and independently of the fact that at the point $z_{n+1}(x)$ we have a recollision or a collision with a standing particle, inequality (3.17) implies that $v_0(s_{z_{n+1}(x)}, x) \le \bar v_0(\bar s_{z_{n+1}(x)}, \bar x)$ as well. Now, by assumption, in between $z_{n+1}(x)$ and z, the h.p. moves without interaction for both dynamics (with constant acceleration f). Thus we get $v_0(s_z, x) \le \bar v_0(\bar s_z, \bar x)$.
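The minimal velocity $v_{\min}$ of Remark 3.2 can be illustrated in the Markovian approximation dynamics, where each neutral particle is hit once while at rest and leaves with velocity $(1+\alpha)$ times the incoming velocity of the h.p. The sketch below, with hypothetical names and illustrative values, checks that when $\xi_n \ge d$ and $v_0 \ge \alpha\sqrt{2df/(1-\alpha^2)}$ all outgoing n.p. velocities are at least $v_{\min}$, and that starting from $v_0 = 0$ the first one is only $(1+\alpha)\sqrt{2df}$. It does not simulate recollisions of the original dynamics.

```python
import math


def v_min(alpha, d, f):
    """Minimal outgoing velocity of a neutral particle, as in Remark 3.2."""
    return (1 + alpha) * math.sqrt(2 * d * f / (1 - alpha ** 2))


def np_outgoing_velocities(v0, xis, alpha, f):
    """Outgoing n.p. velocities in the Markovian approximation dynamics."""
    u, out = v0, []
    for xi in xis:
        incoming = math.sqrt(u * u + 2 * f * xi)  # h.p. speed just before the collision
        out.append((1 + alpha) * incoming)        # rule (2.1) with incoming n.p. velocity 0
        u = alpha * incoming                      # h.p. speed just after the collision
    return out


# Illustrative values only; every interdistance is at least d.
alpha, d, f = 0.5, 1.0, 1.0
xis = [1.0, 1.2, 1.0, 3.0, 1.0]

v0_b1 = alpha * math.sqrt(2 * d * f / (1 - alpha ** 2))  # smallest v0 allowed by (b1)
with_b1 = np_outgoing_velocities(v0_b1, xis, alpha, f)
without_b1 = np_outgoing_velocities(0.0, xis, alpha, f)

print(v_min(alpha, d, f), min(with_b1), without_b1[0])
```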
Corollary 3.4. Under Assumptions 2.1 we have τn (x) ≥ τ¯n (x) for each n ∈ N µ0 a.s. Proof. Immediate from Proposition 3.3. 4. Tightness of {µ(n) }n≥1 Our goal in this section is to prove the following: Theorem 4.1. The family {µ(n) }n≥1 is tight. For the proof of the above theorem we shall consider a suitable sequence of random indices which we define below. For this, let us start by observing that v¯0 (t¯n ) = u¯ n−m (v¯0 (t¯m ), ξm+1 , . . . , ξn ), τ¯n = τ¯n−m (v¯0 (t¯m ), ξm+1 , . . . , ξn ), for any 1 ≤ m < n.
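In what follows we fix a constant $V_* > \sqrt{E_{\mu_0}\tilde Z_\infty}$ and we write $\bar v_{m,n}$ and $\bar\tau_{m,n}$ for the restrictions of the previously defined functions to the set $\bar v_0(\bar t_m) = V_*$, i.e.,

$$\bar v_{m,n}(\xi_{m+1}, \dots, \xi_n) \stackrel{\mathrm{def}}{=} \bar u_{n-m}(V_*, \xi_{m+1}, \dots, \xi_n),$$

$$\bar\tau_{m,n}(\xi_{m+1}, \dots, \xi_n) \stackrel{\mathrm{def}}{=} \bar\tau_{n-m}(V_*, \xi_{m+1}, \dots, \xi_n).$$

Definition 4.2. Given x ∈ X we say that index n > k is k-admissible if:

(i) $\bar v_{k,n}(x) \le V_*$,

(ii) $\Big|\dfrac{1}{n-k}\sum_{j=k+1}^{n}\xi_j - E_{\mu_0}(\xi_1)\Big| \le \delta$,

(iii) $\dfrac{\sum_{j=k+1}^{n}\xi_j}{\sum_{l=k+1}^{n}\bar\tau_{k,l}} \le \bar v_D + \delta$,

where δ is defined as

$$\delta = \frac{v_{\min} - \bar v_D}{4}. \tag{4.1}$$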
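Conditions (i) to (iii) of Definition 4.2 only involve the interdistances $\xi_{k+1}, \dots, \xi_n$ and the Markovian recursion restarted from velocity $V_*$, so they are straightforward to evaluate. The sketch below is one possible reading of the definition; the function name, the argument layout and the numerical values (in particular the constant interdistances and the value of δ, which is not computed from (4.1) here) are assumptions chosen only to make the three conditions visible.

```python
import math


def is_admissible(xis, k, n, V_star, alpha, f, mean_xi, v_D_bar, delta):
    """Check (i)-(iii) of Definition 4.2 for n > k, with xis[j-1] = xi_j."""
    block = xis[k:n]                       # xi_{k+1}, ..., xi_n
    u, total_tau = V_star, 0.0             # restart the recursion from V_*
    for xi in block:
        root = math.sqrt(u * u + 2 * f * xi)
        total_tau += (root - u) / f        # accumulates tau-bar_{k,l}
        u = alpha * root                   # ends as v-bar_{k,n}
    cond_i = u <= V_star
    cond_ii = abs(sum(block) / (n - k) - mean_xi) <= delta
    cond_iii = sum(block) / total_tau <= v_D_bar + delta
    return cond_i and cond_ii and cond_iii


# Illustrative, degenerate example: constant interdistances xi_j = 1.
alpha, f, V_star, delta = 0.5, 1.0, 1.0, 0.1
xis = [1.0] * 50
v_D_bar = math.sqrt(1.5)   # value of (3.1) when xi_j = 1, alpha = 0.5, f = 1

first = is_admissible(xis, 0, 1, V_star, alpha, f, 1.0, v_D_bar, delta)
second = is_admissible(xis, 0, 2, V_star, alpha, f, 1.0, v_D_bar, delta)
print(first, second)
```

In this toy example the index 1 fails condition (iii), because the h.p. restarted at $V_*$ is initially faster than the asymptotic drift, while the index 2 is already admissible.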
We then consider the sequence of random indices $\{R_n(x)\}_{n\ge 0}$ defined as follows:

$$R_0(x) = \min\{j : \bar v_0(\bar t_j) \le V_*\}, \qquad R_{n+1}(x) = \min\{j > R_n(x) : j \text{ is } R_n(x)\text{-admissible}\}. \tag{4.2}$$
Notice that if n ≥ 1 then $R_n - R_0$ depends only on $(\xi_1, \xi_2, \dots)$. From the properties of the initial distribution and the expression for $\bar u_n$ we can check that $\mu_0(R_n < +\infty) = 1$ for each n ≥ 0. In fact, the following holds:

Proposition 4.3. Provided Assumptions 2.5 are satisfied, under µ0 the random variables $Y_r \stackrel{\mathrm{def}}{=} R_r - R_{r-1}$, r ≥ 1, are i.i.d. and have exponentially decaying tail, i.e.:

$$\mu_0(Y_r > n) \le c\,e^{-c' n}, \tag{4.3}$$

for suitable c, c′ > 0 (which depend on $V_*$ and δ) and all n ≥ 1. Moreover, $\mu_0(R_0 > n) \le c_1 e^{-c_1' n}$ with $c_1'$ depending on $V_*$, δ and h > 0 as in (b2), and $c_1$ depending also on $E_{\mu_0}(v_0^h)$.

Proof. The first statement is obvious. Let us now prove Eq. (4.3), which will follow if we prove that each of the following probabilities:

(i’) $\mu_0\Big(\dfrac{1}{n}\sum_{j=1}^{n}\bar v_{0,j} > V_*\Big)$,

(ii’) $\mu_0\Big(\Big|\dfrac{1}{n}\sum_{j=1}^{n}\xi_j - E_{\mu_0}(\xi_1)\Big| > \delta\Big)$,

(iii’) $\mu_0\Big(\dfrac{\sum_{j=1}^{n}\xi_j}{\sum_{j=1}^{n}\bar\tau_{0,j}} > \bar v_D + \delta\Big)$,

tends to zero exponentially fast in n, as n → +∞. For (ii’) the estimate follows at once from the Cramér theorem applied to $\frac{1}{n}\sum_{j=1}^{n}\xi_j$, due to our assumption on the tail of the distribution of $\xi_1$. As for (iii’) the main point consists in checking that if η > 0,

$$\mu_0\Big(\frac{1}{n}\sum_{j=1}^{n}\bar v_{0,j} \le E_{\mu_0}\sqrt{\tilde Z_\infty} - \eta\Big) \le c\,e^{-c' n}, \tag{4.4}$$

for suitable c, c′ > 0 (depending on η). Indeed, having (4.4) we use Eq. (3.7) with $\bar u_0 = V_*$; since $\bar u_n \ge 0$, performing straightforward algebraic manipulations we conclude (iii’) by properly choosing η and also using (ii’) with a suitable η′ replacing the fixed δ. Thus, it remains to check Eq. (4.4).

Remark. Equation (4.4) is a one sided large deviation estimate for the sample average of the Markov chain $\bar v_{0,i}$. As we shall see below, there is no problem in applying the Gärtner–Ellis condition to get the usual large deviation estimates for the sample average of $Z_i$ (or $\bar v_{0,i}^2$). With our choice of $V_*^2 > E_{\mu_0}\tilde Z_\infty$, instead of $V_* > E_{\mu_0}\sqrt{\tilde Z_\infty}$, we avoid the consideration of the analogous result for $\bar v_{0,j}$. In this way the estimate (i’) is rougher, but this is irrelevant for our problem.
Proof of (4.4). Clearly, to prove (4.4) we may replace the initial velocity $V_*$ by 0, i.e. it suffices to prove that if η > 0, then

$$\mu_0\Big(\frac{1}{n}\sum_{j=1}^{n}\sqrt{Z_j(\xi_1, \dots, \xi_j)} \le E_{\mu_0}\sqrt{\tilde Z_\infty} - \eta\Big) \le c\,e^{-c' n}.$$

Since the $\xi_i$ are non-negative, if we write

$$A_n = \frac{1}{n}\sum_{j=1}^{n}\sqrt{Z_j(\xi_1, \dots, \xi_j)} = \frac{1}{n}\sum_{j=1}^{n}\sqrt{2f\sum_{i=1}^{j}\alpha^{2(j-i+1)}\xi_i},$$

then

$$A_{kn} \ge \frac{1}{n}\sum_{r=1}^{n}B_r^{(k)},$$

where

$$B_r^{(k)} = \frac{1}{k}\sum_{j=(r-1)k+1}^{rk}\sqrt{2f\sum_{i=(r-1)k+1}^{j}\alpha^{2(j-i+1)}\xi_i}.$$

Clearly, for each k the random variables $B_r^{(k)}$, r = 1, 2, …, are i.i.d. and distributed as $A_k$. In particular, for k large enough, we have $E_{\mu_0}B_1^{(k)} > E_{\mu_0}\sqrt{\tilde Z_\infty} - \eta/2$. Fix one such k; since the moment generating function of $A_k$ is finite in a neighborhood of the origin (due to our assumption on the tails of $\xi_i$) we again apply the Cramér theorem for i.i.d. random variables and get (4.4).

As for (i’) we write

$$\mu_0\Big(\frac{1}{n}\sum_{l=1}^{n}\bar v_{0,l} > V_*\Big) \le \mu_0\Big(\frac{1}{n}\sum_{l=1}^{n}\bar v_{0,l}^2 > V_*^2\Big)$$

and now observe that the Markov chain $\bar v_{0,l}^2$ (or equivalently $\bar u_l^2$) satisfies the Gärtner–Ellis condition (see e.g. Assumption 2.3.2, Ch. 2 of [7]). In fact, if $\phi(a) = E_{\mu_0}(e^{a\xi_1})$, Eq. (3.5) gives:

$$\begin{aligned}
\lim_{n\to\infty}\frac{1}{n}\log E_{\mu_0}\,e^{a\sum_{l=1}^{n}\bar v_{0,l}^2}
&= \lim_{n\to\infty}\frac{1}{n}\log E_{\mu_0}\exp\Big(a\alpha^2\,\frac{1-\alpha^{2n}}{1-\alpha^2}\,V_*^2 + \sum_{j=1}^{n}\frac{2af\alpha^2}{1-\alpha^2}\,(1-\alpha^{2(n-j+1)})\,\xi_j\Big)\\
&= \lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n}\log\phi\Big(\frac{2af\alpha^2}{1-\alpha^2}\,(1-\alpha^{2(n-j+1)})\Big)\\
&= \log\phi\Big(\frac{2af\alpha^2}{1-\alpha^2}\Big),
\end{aligned} \tag{4.5}$$
which is finite and differentiable in a in some open interval around the origin. Using then Theorem 2.3.6 of [7] we get (i’), since $V_*^2 > E_{\mu_0}(\tilde Z_\infty) = \frac{2f\alpha^2}{(1-\alpha^2)}E_{\mu_0}(\xi_1)$, cf. (3.11).

It remains to check that $\mu_0(R_0 > n)$ decays exponentially fast. This follows from the same argument as in (i’). Clearly the initial condition $\bar v_0^2$ does not affect the limit, its contribution to the above sample average being $\frac{1}{n}\sum_{j=1}^{n}\alpha^{2j}\bar v_0^2$. Nevertheless, since we allowed $\bar v_0^2$ to be random and not necessarily with exponentially decaying tails, we proceed in a more careful way to control the decay rate. More precisely, take β > 0 sufficiently small so that $V_*^2 - \beta > E_{\mu_0}(\tilde Z_\infty)$, and observe that

$$\begin{aligned}
\mu_0(R_0 > n) &\le \mu_0\Big(\frac{2}{n}\sum_{j=[n/2]+1}^{n}\bar u_j^2 > V_*^2\Big)\\
&\le \mu_0\Big(\frac{2}{n}\sum_{j=[n/2]+1}^{n}\alpha^{2j}\bar v_0^2 > \beta\Big) + \mu_0\Big(\frac{2}{n}\sum_{j=[n/2]+1}^{n}Z_j > V_*^2 - \beta\Big).
\end{aligned}$$

The statement follows at once from condition (b2) in Assumptions 2.5, and the previous large deviation estimate.

The basic ingredients for the proof of the tightness are contained in the following lemma.

Lemma 4.4. a) For any b > 0 there exists a constant $C_b$ so that

$$\sup_n E_{\mu^{(n)}}\big(\#(x|_{[0,b]})\big) \le C_b, \tag{4.6}$$

where $x|_I = x \cap (I \times R_+)$ for any bounded interval $I \subset R_+$.
b) The family of random variables $(v_0(t_n))_{n\ge 1}$ is tight under the measure µ0.

Proof. Statement b) follows at once from Proposition 3.3, cf. Eq. (3.16), and the expression for $\bar v_0^2(\bar t_n)$, according to Eq. (3.5). Since

$$\mu^{(n)}\Big(x : \#(x \cap ([0, b] \times \{0\})) > \frac{b}{d}\Big) = 0,$$

for the proof of a) it suffices to consider only the moving particles, i.e. it is enough to prove that

$$\sup_n E_{\mu^{(n)}}\,\#\big(x \cap ([0, b] \times (0, +\infty))\big) \le C_b. \tag{4.7}$$

Of course, having arbitrarily fixed k ≥ 1, it is enough to restrict the supremum in (4.7) to n ≥ k. From condition (iii) of Definition 4.2 and Corollary 3.4 it follows that

$$\sum_{l=R_n+1}^{R_{n+1}}\tau_l \ge \sum_{l=R_n+1}^{R_{n+1}}\bar\tau_l \ge \sum_{l=R_n+1}^{R_{n+1}}\bar\tau_{R_n,l} \ge \frac{\sum_{l=R_n+1}^{R_{n+1}}\xi_l}{\bar v_D + \delta}, \tag{4.8}$$

for all n ≥ 0, where in the second inequality we used that $\bar v_0(\bar t_{R_n}) \le V_*$. From Eq. (4.8) and the choice of $\delta \le \frac{v_{\min} - \bar v_D}{4}$, cf. (4.1), we have

$$v_{\min}\sum_{l=R_n+1}^{R_{n+1}}\tau_l \ge \frac{v_{\min}}{\bar v_D + \delta}\sum_{l=R_n+1}^{R_{n+1}}\xi_l \ge \frac{\bar v_D + 4\delta}{\bar v_D + \delta}\sum_{l=R_n+1}^{R_{n+1}}\xi_l. \tag{4.9}$$
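Equation (4.9) implies that during the time interval $(t_{R_n}, t_{R_{n+1}}]$, when the h.p. travels a distance $\sum_{l=R_n+1}^{R_{n+1}}\xi_l$, all n.p. which were moving at $t_{R_n}$ travel a distance at least

$$\frac{\bar v_D + 4\delta}{\bar v_D + \delta}\sum_{l=R_n+1}^{R_{n+1}}\xi_l.$$

Thus $\delta^* \stackrel{\mathrm{def}}{=} \frac{3\delta}{\bar v_D + \delta}\,d$ is a lower bound for their relative displacement with respect to the h.p. during the time interval $(t_{R_n}, t_{R_{n+1}}]$. For any n ≥ 1, let $\iota_n$ be defined through the relation

$$R_{\iota_n} \le n < R_{\iota_n+1}, \tag{4.10}$$

provided $R_0 \le n$, and $\iota_n = -1$ if $R_0 > n$. For notational completeness we set $Y_0 = R_0$, $Y_{-1} = 0$. From the above considerations and since, due to condition (ii) of Definition 4.2, for any j ≥ 0, $[R_{j+1} < +\infty] \subseteq [q_{R_{j+1}} - q_{R_j} \le (E_{\mu_0}\xi_1 + \delta)\,Y_{j+1}]$, we see that

$$\begin{aligned}
E_{\mu^{(n)}}\,\#\big(x \cap [0, b] \times (0, +\infty)\big)
&\le E_{\mu_0}(R_1 1_{(R_1>n)}) + E_{\mu_0}(R_0) + E_{\mu_0}\Big(Y_{\iota_n+1} + \sum_{i=0}^{\iota_n-1}Y_{\iota_n-i}\,1_{(i\delta^* \le Y_{\iota_n+1}(E_{\mu_0}\xi_1+\delta)+b)}\Big)\\
&\le E_{\mu_0}(Y_{\iota_n+1}) + \sum_{j=1}^{n}\sum_{i=0}^{j-1}E_{\mu_0}\Big(\frac{R_j}{j}\,1_{(i\delta^* \le Y_{j+1}(E_{\mu_0}\xi_1+\delta)+b)}\,1_{(\iota_n=j)}\Big) + E_{\mu_0}(R_0 + R_1),
\end{aligned} \tag{4.11}$$

where we have used that $[\iota_n = j] \in \sigma(R_j, Y_{j+1})$ and that, since $Y_1, Y_2, \dots$ are i.i.d. and integrable,

$$E_{\mu_0}(Y_k\,|\,R_j, Y_{j+1}) = E_{\mu_0}\Big(\frac{R_j - R_0}{j}\,\Big|\,R_j, Y_{j+1}\Big) \le \frac{R_j}{j}, \qquad \mu_0 \text{ a.s., if } 1 \le k \le j.$$

Thus, by Eq. (4.10), and fixing k ≥ 1, we bound the second term on the r.h.s. of Eq. (4.11) as follows:

$$\begin{aligned}
&\sum_{j=1}^{n}\sum_{i=0}^{j-1}\frac{n}{j}\,\mu_0\big(\iota_n = j,\ Y_{j+1}(E_{\mu_0}\xi_1 + \delta) + b \ge i\delta^*\big)\\
&\quad\le n\,\mu_0\Big(\iota_n \le \frac{n}{k}\Big) + k\sum_{j=[\frac{n}{k}]+1}^{n}\sum_{i=0}^{j-1}\mu_0\big(\iota_n = j,\ Y_{j+1}(E_{\mu_0}\xi_1 + \delta) + b \ge i\delta^*\big)\\
&\quad\le n\,\mu_0\big(R_{[\frac{n}{k}]+1} > n\big) + k\sum_{i=0}^{+\infty}\mu_0\big(Y_{\iota_n+1}(E_{\mu_0}\xi_1 + \delta) + b \ge i\delta^*\big)\\
&\quad\le n\,\mu_0\big(R_{[\frac{n}{k}]+1} > n\big) + k\Big(\frac{b+1}{\delta^*} + 1\Big) + k\Big(\frac{E_{\mu_0}\xi_1 + \delta}{\delta^*} + 1\Big)E_{\mu_0}(Y_{\iota_n+1})\\
&\quad\le n\,\mu_0\Big(R_{[\frac{n}{k}]+1} - R_0 > \frac{n}{2}\Big) + n\,\mu_0\Big(R_0 > \frac{n}{2}\Big) + k\Big(\frac{b+1}{\delta^*} + 1\Big) + k\Big(\frac{E_{\mu_0}\xi_1 + \delta}{\delta^*} + 1\Big)E_{\mu_0}(Y_{\iota_n+1}),
\end{aligned} \tag{4.12}$$

where [y] denotes the integer part of y. Fix $k > 4E_{\mu_0}Y_1$; since k/2 < n/([n/k] + 1) for n > k we have:

$$n\,\mu_0\Big(R_{[\frac{n}{k}]+1} - R_0 > \frac{n}{2}\Big) \le n\,\mu_0\Big(\frac{1}{[\frac{n}{k}]+1}\sum_{i=1}^{[\frac{n}{k}]+1}Y_i > \frac{k}{4}\Big) \le c\,n\,e^{-c'\frac{n}{k}} \tag{4.13}$$

for suitable positive constants c, c′, and all n > k. Here we are using Proposition 4.3, the choice $k/4 > E_{\mu_0}(Y_1)$, and again the upper bound in the Cramér theorem. Thus, the r.h.s. of (4.13) remains bounded in n and the same happens to $n\,\mu_0(R_0 > \frac{n}{2})$, due to Proposition 4.3. From this, (4.11), and (4.12), the proof is reduced to showing that

$$\sup_n E_{\mu_0}(Y_{\iota_n+1}) < +\infty.$$

This follows from direct computation, using the independence of $R_0, Y_1, Y_2, \dots$:

$$\begin{aligned}
E_{\mu_0}(Y_{\iota_n+1}) &= E_{\mu_0}(R_0 1_{(R_0>n)}) + \sum_{j=1}^{\infty}\sum_{i=0}^{n}j\,\mu_0(Y_{i+1} = j,\ R_i \le n < R_{i+1})\\
&\le E_{\mu_0}R_0 + \sum_{j=1}^{\infty}\sum_{i=0}^{n}j\,\mu_0(Y_{i+1} = j,\ n - j < R_i \le n)\\
&= E_{\mu_0}R_0 + \sum_{j=1}^{\infty}j\sum_{i=0}^{n}\mu_0(Y_{i+1} = j)\sum_{s=(n-j+1)\vee 0}^{n}\mu_0(R_i = s)\\
&= E_{\mu_0}R_0 + \sum_{j=1}^{\infty}j\,\mu_0(Y_1 = j)\sum_{s=(n-j+1)\vee 0}^{n}\ \sum_{i=0}^{n}\mu_0(R_i = s)\\
&\le E_{\mu_0}R_0 + \sum_{j=1}^{\infty}j^2\,\mu_0(Y_1 = j) = E_{\mu_0}R_0 + E_{\mu_0}Y_1^2.
\end{aligned}$$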
Theorem 4.1 follows immediately from the previous lemma.
5. Strong Cluster Property and Invariant Measure for the Discrete Dynamics Definition 5.1. Given x ∈ X for which the dynamics is well defined, we say that k is a cluster index for x if the h.p. will never collide after tk with those particles which are moving at time tk , including the k th standing neutral particle.
Theorem 5.2. There exists K : X → {0, 1, 2, …} verifying

$$\mu_0\{x : K(x) \le n\} \ge 1 - c\,e^{-c' n} \tag{5.1}$$

for suitable positive constants c, c′, and all n ≥ 1, such that K(x) is a cluster index for x, µ0 a.s.

Before starting the proof we need to introduce several definitions and specify some auxiliary parameters.

Definition 5.3. Given x ∈ X, integers r′ > 1, r > 1, and 0 < ε < 1 (r′, r and ε will be specified below) we recurrently define the sequence of pairs $\{(R'_n(x), R''_n(x))\}_{n\ge 0}$ as follows: $R'_0(x) = R''_0(x) = \min\{j : \bar v_0(\bar t_j) \le V_*\}$,

$$R'_{n+1}(x) = \min\{j : j \ge R''_n(x) + r' \text{ and } j \text{ is } R''_n(x)\text{-admissible}\}; \tag{5.2}$$

and

$$R''_{n+1}(x) = R'_{n+1}(x) + \max\{1 \le k \le r : \max_{1\le j\le k}\xi_{R'_{n+1}(x)+j} \le d + \varepsilon\} \tag{5.3}$$

with the understanding that $R''_{n+1} = R'_{n+1}$ if $\xi_{R'_{n+1}(x)+1} > d + \varepsilon$, n ≥ 0.

Clearly the random indices $R'_n$, $R''_n$ are µ0 a.s. finite. ($R'_0 = R''_0 = R_0$, as defined in Sect. 4.)

Definition 5.4. For n ≥ 1, the pair $(R'_n(x), R''_n(x))$ is called separating if $R''_n(x) - R'_n(x) = r$.

Before specifying the constants ε, r′ and r we will briefly justify our choice for the notion of separating pair. The heuristics is as follows: the pair $(R'_n(x), R''_n(x))$ being separating means that

(i) there is “a quite dense block” of r standing neutral particles,
(ii) this will affect the evolution of our system in such a way that, interacting with the first $r_1 < r$ neutral particles of the block, the h.p. slows down its velocity up to a certain level,
(iii) as a consequence of (ii), in the further motion within the block, the h.p. not only is unable to interact with the n.p. which were moving at the time $t_{R'_n+r_1}$, but also when it arrives at the end of the block, all neutral particles with the indices less than or equal to $R'_n + r_1$ are relatively far.

On the other hand the constant r′ which appears in (5.2) will be taken big enough so that any two subsequent pairs $(R'_n(x), R''_n(x))$ and $(R'_{n+1}(x), R''_{n+1}(x))$ are not too close to each other. This will allow us to get a lower bound for the increment of the interdistances between the h.p. and neutral particles with the indices less than or equal to $R''_n$, over the time interval $[t_{R''_n}, t_{R''_{n+1}})$. This bound still doesn’t exclude interactions between the h.p. and “old” neutral particles during the time interval $[t_{R''_n}, t_{R''_{n+1}})$ but it will be a fundamental ingredient for the construction of cluster indices.

Now we specify the constants ε, r′ and r. We first fix 0 < ε such that

$$\varepsilon + \frac{2(d + \varepsilon)}{1-\alpha^2} < (1+\alpha)^2\,\frac{2d}{1-\alpha^2}, \tag{5.4}$$

i.e. $0 < \varepsilon < \frac{2\alpha d(2+\alpha)}{3-\alpha^2}$. Given ε as in (5.4) we define

$$r_1 \stackrel{\mathrm{def}}{=} \min\{k \in N : \alpha^{2k}V_*^2 \le f\varepsilon\}, \tag{5.5}$$

$$\bar w_\varepsilon^2 \stackrel{\mathrm{def}}{=} f\varepsilon + \frac{\alpha^2\,2f(d + \varepsilon)}{1-\alpha^2}, \tag{5.6}$$

and

$$B \stackrel{\mathrm{def}}{=} r_1(d + \varepsilon). \tag{5.7}$$

As for r′, we require it to be large enough so that (5.12) below holds, and that

$$r' > \frac{(\bar v_D + \delta)(B + E_{\mu_0}\xi_1 + \delta)}{3\delta d}, \tag{5.8}$$

where δ is given by (4.1). Finally we set:

$$r_2 \stackrel{\mathrm{def}}{=} \Big[\frac{\bar w_\varepsilon\,(r' + 1)(E_{\mu_0}\xi_1 + \delta)}{d\,(v_{\min} - \bar w_\varepsilon)}\Big] + 1, \tag{5.9}$$

where [y] denotes the integer part of y, and

$$r \stackrel{\mathrm{def}}{=} r_1 + r_2. \tag{5.10}$$

Remark 5.5. (i) With ε and $r_1$ as in (5.4) and (5.5) we see that if the interdistances $\xi_{R'_n+1}, \dots, \xi_{R'_n+r_1}$ happen to satisfy $d < \xi_{R'_n+j} \le d + \varepsilon$, 1 ≤ j ≤ $r_1$, then at the time $\bar t_{R'_n+r_1}$ the (outgoing) velocity of the h.p. in the corresponding Markovian approximation dynamics (and therefore in the original one, at the time $t_{R'_n+r_1}$) will be less than $\bar w_\varepsilon$. In this case B is an upper bound for $\sum_{j=1}^{r_1}\xi_{R'_n+j}$.

(ii) Inequality (5.4) says that $\bar w_\varepsilon^2 + 2f(d + \varepsilon) < v_{\min}^2$. Consequently, if at some time $t_k$ the h.p. gets (outgoing) velocity less than or equal to $\bar w_\varepsilon$ and the next interdistance is at most d + ε, then it cannot reach any moving n.p. during the time interval $[t_k, t_{k+1}]$. In this case, at time $t_{k+1}$ its (outgoing) velocity will be smaller than $\bar w_\varepsilon$.

(iii) Similarly to the proof of Proposition 4.3, we see that under µ0 the random variables $R_0$ and

$$Y'_n \stackrel{\mathrm{def}}{=} R'_n - R''_{n-1}, \qquad n \ge 1,$$

are independent. $Y'_n$, n ≥ 2, are also identically distributed. With minor modifications of that proof we see that the distribution of $Y'_1$, and the common distribution of $Y'_n$, n ≥ 2, have exponentially decaying tails, i.e. there exist c, c′ > 0 so that

$$\mu_0(Y'_n > r' + k) \le c\,e^{-c' k}, \tag{5.11}$$

for all n. Taking r′ large enough the proof shows we can always assume that

$$\mu_0(Y'_n > r' + 1) < 1 \tag{5.12}$$

for all n. Recall that from Proposition 4.3 the variable $R_0$ also has exponentially decaying tail.
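The constants introduced in (5.4) to (5.7) are explicit, and the two claims of Remark 5.5 (i) and (ii) can be checked numerically for given parameters. The sketch below assumes that N in (5.5) starts at 1, uses hypothetical function names, and takes illustrative values of α, d, f, $V_*$ and ε that are not taken from the paper.

```python
import math


def cluster_constants(alpha, d, f, V_star, eps):
    """Return (r1, w_bar_squared, B) from (5.5)-(5.7); eps must satisfy (5.4)."""
    eps_max = 2 * alpha * d * (2 + alpha) / (3 - alpha ** 2)  # equivalent form of (5.4)
    if not 0 < eps < eps_max:
        raise ValueError("eps violates condition (5.4)")
    r1 = 1  # assumption: N = {1, 2, ...}
    while alpha ** (2 * r1) * V_star ** 2 > f * eps:
        r1 += 1
    w_sq = f * eps + alpha ** 2 * 2 * f * (d + eps) / (1 - alpha ** 2)
    B = r1 * (d + eps)
    return r1, w_sq, B


# Illustrative values only.
alpha, d, f, V_star, eps = 0.5, 1.0, 1.0, 2.0, 0.3
r1, w_sq, B = cluster_constants(alpha, d, f, V_star, eps)
v_min_sq = (1 + alpha) ** 2 * 2 * d * f / (1 - alpha ** 2)

# Remark 5.5 (i): after r1 collisions with interdistances d + eps, starting
# from V_star, the squared outgoing velocity is below w_bar^2.
u_sq = V_star ** 2
for _ in range(r1):
    u_sq = alpha ** 2 * (u_sq + 2 * f * (d + eps))  # squared form of (3.4)

print(r1, w_sq, B, u_sq, v_min_sq)
```

With these values the slowed-down h.p. satisfies $\bar w_\varepsilon^2 + 2f(d+\varepsilon) < v_{\min}^2$, which is the inequality used in Remark 5.5 (ii).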
Lemma 5.6. The choice of r′ in (5.8) ensures that

$$\min_{1\le i\le R''_n}\big\{[q_i(t_{R''_{n+1}}) - q_{R''_{n+1}}] - [q_i(t_{R''_n}) - q_{R''_n}]\big\} \ge E_{\mu_0}\xi_1 + \delta, \tag{5.13}$$

for any n ≥ 0, i.e. the increment of distance over the time interval $(t_{R''_n}, t_{R''_{n+1}}]$ between the h.p. and each of the n.p. which was already moving at time $t_{R''_n}$ is at least $E_{\mu_0}\xi_1 + \delta$.

Proof. By (iii) of Definition 4.2 and Corollary 3.4 (as in (4.8)) we have

$$t_{R'_{n+1}} - t_{R''_n} \ge \frac{\sum_{j=R''_n+1}^{R'_{n+1}}\xi_j}{\bar v_D + \delta}, \tag{5.14}$$

and due to the existence of a minimal velocity, cf. Remark 3.2,

$$q_i(t_{R'_{n+1}}) \ge q_i(t_{R''_n}) + v_{\min}\,(t_{R'_{n+1}} - t_{R''_n}), \tag{5.15}$$

for 1 ≤ i ≤ $R''_n$. Together with (5.14) this implies that

$$\begin{aligned}
q_i(t_{R'_{n+1}}) - q_{R'_{n+1}} &\ge q_i(t_{R''_n}) - q_{R''_n} + \Big(\frac{v_{\min}}{\bar v_D + \delta} - 1\Big)\sum_{j=R''_n+1}^{R'_{n+1}}\xi_j\\
&\ge q_i(t_{R''_n}) - q_{R''_n} + r' d\,\Big(\frac{v_{\min}}{\bar v_D + \delta} - 1\Big)\\
&\ge q_i(t_{R''_n}) - q_{R''_n} + B + E_{\mu_0}\xi_1 + \delta,
\end{aligned} \tag{5.16}$$

where, in the last inequality, we have used (5.8). We next prove that

$$\min_{t_{R'_{n+1}} \le t \le t_{R''_{n+1}}}\big\{[q_i(t) - q_0(t)] - [q_i(t_{R'_{n+1}}) - q_{R'_{n+1}}]\big\} \ge -B, \tag{5.17}$$

which together with (5.16) implies (5.13). Consider two cases:

Case I. $R''_{n+1} - R'_{n+1} \le r_1$. By (5.3) we have that for all $t_{R'_{n+1}} \le t \le t_{R''_{n+1}}$

$$q_0(t) - q_{R'_{n+1}} \le r_1(d + \varepsilon) = B, \tag{5.18}$$

and since $q_i(t) \ge q_i(t_{R'_{n+1}})$ for all i, (5.17) follows trivially.

Case II. $R''_{n+1} - R'_{n+1} > r_1$. As in Case I, one has the following estimate at the time $t_{R'_{n+1}+r_1}$:

$$\min_{t_{R'_{n+1}} \le t \le t_{R'_{n+1}+r_1}}\big\{[q_i(t) - q_0(t)] - [q_i(t_{R'_{n+1}}) - q_{R'_{n+1}}]\big\} \ge -B.$$

As noticed in Remark 5.5, during the time interval $(t_{R'_{n+1}+r_1}, t_{R''_{n+1}}]$ the h.p. moves with velocity at most $\bar w_\varepsilon$, while all moving n.p. have velocity at least $v_{\min} > \bar w_\varepsilon$. Thus, over this time interval the interdistance between the h.p. and each n.p. with label $i \le R''_n$ (indeed for each $i \le R'_{n+1} + r_1$) is increasing, and therefore (5.17) is verified, concluding the proof.
The choice of $r_2$, and consequently that of r, implies suitable bounds on the increase of the minimal distance between the moving n.p. and the h.p. during the time interval $[t_{R'_n+r_1}, t_{R''_n}]$, provided the pair $(R'_n, R''_n)$ is separating. This will be used in the proof below.

Proof of Theorem 5.2. We begin with a couple of simple observations:

A) Each $R'_j$ is a stopping time with respect to the filtration $(\sigma(v_0, \xi_1, \dots, \xi_n))_{n\ge 1}$, i.e., for each n ≥ 1 the event $R'_j = n$ belongs to $\sigma(v_0, \xi_1, \dots, \xi_n)$, the σ-field generated by the random variables $v_0, \xi_1, \dots, \xi_n$. From this and (5.3) it follows that

$$\begin{aligned}
\mu_0[(R'_j, R''_j) \text{ is separating}] &= \sum_{n\ge j}\mu_0[(R'_j, R''_j) \text{ is separating},\ R'_j = n]\\
&= \sum_{n\ge j}\ \prod_{i=n+1}^{n+r}\mu_0[\xi_i \le d + \varepsilon]\ \mu_0[R'_j = n]\\
&= (\mu_0[\xi_1 \le d + \varepsilon])^r > 0.
\end{aligned} \tag{5.19}$$

B) On the other hand, if the pair $(R'_j, R''_j)$ is separating, from the choice of $r_2$ in (5.9), at time $t_{R''_j}$ all n.p. with labels less than or equal to $R'_j + r_1$ will be at distance at least $(r' + 1)(E_{\mu_0}\xi_1 + \delta)$ from the h.p., i.e.,

$$\min_{1\le i\le R'_j+r_1} q_i(t_{R''_j}) - q_{R''_j} \ge (r' + 1)(E_{\mu_0}\xi_1 + \delta). \tag{5.20}$$

In fact, to check (5.20) we first recall the observation (ii) of Remark 5.5 from which we see that

$$q_i(t_{R''_j}) - q_{R''_j} \ge q_i(t_{R'_j+r_1}) - q_{R'_j+r_1} + (v_{\min} - \bar w_\varepsilon)(t_{R''_j} - t_{R'_j+r_1}).$$

Recalling the choice of $r_2$ (cf. (5.9)) and that $t_{R''_j} - t_{R'_j+r_1} \ge r_2 d/\bar w_\varepsilon$ for a separating pair $(R'_j, R''_j)$, we get (5.20).

We now check that property (5.20), together with observation (ii) in Remark 5.5 and Lemma 5.6, implies the following inclusion:

$$[(R'_j, R''_j) \text{ is separating}] \cap \bigcap_{i=1}^{\infty}[R'_{j+i} - R''_{j+i-1} \le r' + i] \subseteq [R'_j + r_1 \text{ is a cluster index}]. \tag{5.21}$$

Indeed, as observed in (ii) of Remark 5.5, the occurrence of the event $[(R'_j, R''_j) \text{ is separating}]$ guarantees that during the time interval $[t_{R'_j+r_1}, t_{R''_j}]$ the h.p. cannot interact with those n.p. with labels less than or equal to $R'_j + r_1$ and moreover, according to (5.20), at time $t_{R''_j}$ all these n.p. are to the right of the point $q_{R''_j} + (r' + 1)(E_{\mu_0}\xi_1 + \delta)$. On the other hand, the occurrence of the event $[R'_{j+1} - R''_j \le r' + 1]$ implies that

$$q_{R'_{j+1}} \le q_{R''_j} + (r' + 1)(E_{\mu_0}\xi_1 + \delta),$$
and we see that, in this case, at time tRj all moving n.p. with labels less than or equal to Rj + r1 are to the right of the point qRj +1 . In particular, the h.p. cannot interact with any of them during the time interval [tRj +r1 , tRj +1 ]. Moreover, from (5.16) we have min
1≤i≤Rj +r1
qi (tRj +1 ) − qRj +1 ≥ (r + 2)(Eµ0 ξ1 + δ) + B,
thus, using (5.17) we conclude that the absence of interaction extends up to time tRj+1 and that min qi (tRj+1 ) ≥ qRj+1 + (r + 2)(Eµ0 ξ1 + δ). (5.22) 1≤i≤Rj +r1
Therefore, if the event [Rj +2 − Rj+1 ≤ r + 2] also occurs, it implies that at time tRj+1 all n.p. with the labels less than or equal to Rj + r1 will be not only to the right of the point qRj +2 , but they will be unreachable for the h.p. during the time interval (tRj , tRj+2 ], and we have: min
1≤i≤Rj +r1
qi (tRj+2 ) ≥ qRj+2 + (r + 3)(Eµ0 ξ1 + δ).
(5.23)
Repeating the above argument and using (5.13), we see that the occurrence of [(Rj , Rj ) separating ] and of each of the successive events [Rj +1 − Rj ≤ r + 1], . . . , [Rj +k+1 − Rj+k ≤ r + k + 1] implies that (by (5.13)) min
1≤i≤Rj +r1
qi (tRj+k+1 ) ≥ qRj+k+1 + (r + k + 2)(Eµ0 ξ1 + δ).
(5.24)
In these circumstances, the further occurrence of the event [Rj +k+2 − Rj+k+1 ≤ r + k + 2] guarantees that during the time interval (tRj+k+1 , tRj+k+2 ] the h.p. cannot interact with any n.p. with labels less than or equal to Rj + r1 , and moreover the interparticle distance will again increase at least by Eµ0 ξ1 + δ. Repeatedly using the same argument we obtain (5.21). Let us now use (5.21) in order to estimate the probability on the statement of Theorem 5.2. For this, we first prove the existence of positive constants c, c so that:
$$
\mu_0[\exists\, j \le n : R_j + r_1 \text{ is a cluster index}] \ge 1 - c\,e^{-c' n}.
\tag{5.25}
$$
For the proof of (5.25) let us start by observing that, due to (5.21), the event of interest can be controlled in terms of the time of the last visit to 0 for a renewal Markov chain, ζ , on N, starting at 0 and with transition probabilities satisfying:
$$
\mu_0[\zeta_{k+1} = i+1 \mid \zeta_k = i] \ge 1 - c\,e^{-c' i}, \qquad
\mu_0[\zeta_{k+1} = 0 \mid \zeta_k = i] = 1 - \mu_0[\zeta_{k+1} = i+1 \mid \zeta_k = i]
\tag{5.26}
$$
for each i ∈ N and where c, c are positive, with c < 1 (which of course might differ from those in (5.25)). A possible way to define such a chain is indicated in (5.21) and (5.11). Namely, ζ0 = 0, ) is separating, and and for any k ≥ 1 if ζk = 0 we set ζk+1 = 1 provided (Rk+1 , Rk+1 0 otherwise. If ζk = i, (i ≥ 1), then ζk+1 = i + 1 provided Rk+1 − Rk ≤ r + i,
and ζk+1 = 0 otherwise. That ζ is a Markov chain with the transition probabilities as described above follows at once, using (5.11), (5.12) and (5.19). From (5.21) we get: µ0 [∃ j ≤ n : Rj + r1 is a cluster index] ≥ µ0 [ζk > 0 for all k > n]. After the above observation, (5.26), and standard facts on the above simple Markov chain ([8], Ch. 2), we immediately get (5.25). To conclude Theorem 5.2 it now suffices to observe that there exist b > 0 so that µ0 {Rn > bn} decays exponentially in n. This follows from (5.11), the exponential decaying tail of the distribution of R0 = R0 , and standard arguments, as already used in Sect. 4 (upper estimate in the Cramér Theorem). Now we turn to the convergence of the measures µ(n) . The existence of weak limit points for the sequence {µ(n) } was established by proving tightness in Sect. 4. We prove the convergence by suitably coupling any Cesaro limit of {µ(n) }, with the initial measure µ0 . In particular, Theorem 2.6 will follow at once from the next lemma. Lemma 5.7. Let µ be a stationary measure for T1 , obtained as some weak limit point of n1 ni=1 µ(i) . Then we can define a coupling Q of µ and µ0 , in such a way that for any b > 0 there exist positive constants c, c so that
$$
Q\{(x, x') : T_n x|_{[0,b]} \ne T_n x'|_{[0,b]}\} \le c\,e^{-c' n}
\tag{5.27}
$$
for all $n \ge 1$.

Proof. Let µ be as in the statement of the lemma. As in [5], under µ the moving particles are independent of the standing ones, which are distributed as under µ0. Let $\mu_m$ be the marginal distribution corresponding to the moving particles in µ. Due to Assumptions 2.5 we know that under $\mu_m$ all n.p. have velocity at least $v_{\min}$, except possibly for a set of measure zero (see Remark 3.2). Let us now make a joint construction of µ and µ0 which verifies (5.27). Let $x = (x_m, \xi_1(x), \xi_2(x), \dots)$ be a random configuration distributed according to µ, where $x_m$ represents the moving particles, $\xi_1$ denotes the position of the leftmost standing n.p., and $\xi_2, \xi_3, \dots$ denote the interparticle distances between successive standing neutral particles. (Recall that we look at the system as seen from the position of the h.p., which is then located at the origin.) We start the construction of the two initial configurations $x', x''$, distributed as µ and µ0 respectively, by first setting their moving particles: $x'_m = x_m$, and $x''_m$ contains only one particle, namely the h.p., located at the origin, with velocity $v_0(x'')$ being distributed according to the initial measure µ0 and independent of $x$. For this we might need to enlarge the original probability space where $x$ is defined. We shall enlarge it even more, if needed, so as to accommodate the configurations $x', x''$, cf. the remark at the end of this proof.

Notation. To make notation lighter we shall denote by $\xi'_i, \xi''_i$, $i \ge 1$, the successive interdistances between standing n.p. in the configurations $x'$ and $x''$ respectively, and by $\xi_i = \xi_i(x)$ the interdistances in the starting configuration $x$.

Let us now slightly modify the definition of $R_0$ given in Sect. 4, by setting it as
$$
R_0(x, v_0(x'')) = \min\Bigl\{ n : \alpha^{2n}\bigl(v_0^2(x) \vee v_0^2(x'')\bigr) + 2f \sum_{i=1}^{n} \alpha^{2(n-i+1)} \xi_i(x) \le V_*^2 \Bigr\};
$$
and set ξi = ξi = ξi for 1 ≤ i ≤ R0 (x, v0 (x )). This automatically implies that in the Markovian evolution for both configurations x and x , at time t¯R0 (x,v0 (x )) (i.e, at times t¯R0 (x,v0 (x )) (x ) (= t¯R0 (x,v0 (x )) (x)) and t¯R0 (x,v0 (x )) (x ) resp.) the velocity of the h.p. will be bounded from above by V∗ . By Proposition 3.3, the same will be true for the original dynamics, at the corresponding discrete times tR0 (x,v0 (x )) (x ) and tR0 (x,v0 (x )) (x ). The further indices Rj , Rj , are also modified, and defined as in Eqs. (5.2) and (5.3), provided r = r1 + r2 is changed to r˜ = r˜1 + r2 , with r˜1 to be chosen below. Before giving the prescription for r˜ , it is important to observe that with such a rule, the random variables Rj − R0 , Rj − R0 will depend only on the configuration x, being described exactly by (5.2), and (5.3), with the admissibility condition being settled in terms of the interdistances ξi , according to Definition 4.2. Letting @ be defined according to (5.4) and ρ > 0 we set, def r(ρ) = min j : α 2j w¯ @2 ≤ ρ and r˜1 will be taken as r˜1 = r1 + r(ρ) for a suitable ρ > 0 to be described below. r2 is the same as in (5.9). The definition of a separating pair is changed accordingly, replacing r by r˜ . Let then both configurations have the same interdistances between successive standing particles as the configuration x up to the index Rj 1 + r˜1 , where j1 = min{k : Rk = Rk + r˜ }, i.e. j1 is the index of the first separating pair. Observe that, [Rj 1 = n] ∈ An , where An = σ (v0 (x ), xm , ξ1 , . . . , ξn+˜r , ξ1 , . . . , ξn , ξ1 , . . . , ξn ), for each n ≥ 1. The choice of the parameters was done in such a way that at tRj +˜r1 the difference 1
of the squared velocities for the h.p. in the two evolutions (from x , x ) is at most ρ. Indeed, recalling the definition of r(ρ) and the choice of the same interdistances, this is a consequence of the fact that for both evolutions, at time tRj +r1 , the h.p. has velocity 1 at most w¯ @ , and that it does not interact with moving particles during time intervals [tRj +r1 , tRj +˜r1 ] (for both configurations), as observed in (ii) of Remark 5.5. 1
1
The distance to the next standing particle in both configurations x , x shall be properly chosen, so as to guarantee that with positive probability the next collision will happen with a standing particle and for both configurations the h.p. will get the same velocity. In order to do that we select these interdistances, (ξR +˜r +1 , ξR +˜r +1 ) in a way that their j1
j1
1
1
regular conditional distribution given the σ −field ARj , is the measure νv ,v , described 1
in Remark 5.8 below, and where v = v0 (TRj +˜r1 (x )), v = v0 (TRj +˜r1 (x )); each of 1 1 the marginals of νv ,v coincides with the distribution of ξ1 conditioned to ξ1 ≤ d + @ and (5.28) νv ,v {(ξ , ξ ) : (v )2 + 2f ξ = (v )2 + 2f ξ } ≥ a(@, ρ) > 0 for a proper choice of ρ. More precisely, given Rj 1 = n,
v0 (TRj
1
+˜r1 (x
)) = v ,
v0 (TRj
1
+˜r1 (x
)) = v ,
we take (ξ , ξ ) (conditionally) independent of An , and distributed according to the measure νv ,v and these will be the values of the interdistances (ξR +˜r +1 , ξR +˜r +1 ). j1
1
j1
1
(Notice that the marginals of νv ,v do not depend on v , v , as needed to ensure the correctness of the marginal distributions of x and x .) We are here using the standard
definition for ARj , i.e., A ∈ ARj if and only if A ∈ A∞ and A ∩ [Rj 1 = n] ∈ An , for 1 1 each n. Remark 5.8. Due to the absolute continuity of the distribution of ξ1 and definition of d cf. Assumptions 2.5, we see that for any given @ > 0 if Z denotes a random variable distributed according to the conditional distribution of ξ1 given d < ξ1 ≤ d + @, then the variational distance between the distributions of u + 2f Z and that of u + 2f Z tends to zero as |u − u | → 0. On the other side, given two probability measures P , P on R there exists a probability measure Q on R × R with marginals P and P , and such that the mass off the diagonal is equal to half of the variational distance between P and P . This measure is called a “maximal coupling” of P and P , since it exactly maximizes the mass on the diagonal, among all probability measures with the given marginals. (For a proof of this classical result on coupling see e.g. p.18 of [10].) As a consequence of this observation we can find a probability measure νv ,v satisfying (5.28), provided v and v are close enough. In order to fix notations, for @ as fixed before we can take ρ(@) so that a(@, ρ) > 0 for 0 < ρ < ρ(@) since the choice of r˜1 guarantees that |(v )2 − (v )2 | ≤ ρ, as previously seen. We now set ξRj +˜r1 +i (x ) = ξRj +˜r1 +i (x ) = ξRj +˜r1 +i (x) for 2 ≤ i ≤ r2 . By the 1 1 1 choice of j1 we already know that all these variables are less than d + @ (since the pair (Rj 1 , Rj1 ) is separating). Two cases have to be considered: (a) If (v )2 +2f ξ = (v )2 +2f ξ , we then continue by the same interdistances between standing particles as from the configuration x, and look at the Markov chain ζ used in the proof of Theorem 5.2 and now defined in terms of the random variables Rj 1 +1 −Rj1 , . . . , starting with ζj1 = 1. The coupling with the same interdistances is done up to the first label k1 > j1 such that ζk1 = 0, if any. 
If no such k1 exists, it follows from the construction that Rj 1 + r˜1 + 1 will be a cluster index for both configurations x and x and that at this step of discrete dynamics both evolutions will have the same velocity. If k1 is finite, notice that at the corresponding Rk 1 we need to restart the previous construction. That is we keep using the same interdistances between standing particles as prescribed by the configuration {ξn }n up to the index Rj 2 + r˜1 , where j2 = min{k > k1 : Rk = Rk + r˜ }, and then we repeat the proper coupling of the next interdistances as done above. More precisely, we take (ξR +˜r +1 , ξR +˜r +1 ) in a way that their regular conditional distrij2
j2
1
1
bution given the σ −field ARj is νv0 (TR 2
j2 +˜r1
(x )),v0 (TR
j2 +˜r1
(x )) ,
and repeat the previous
argument. (b) If (v )2 + 2f ξ = (v )2 + 2f ξ then we use the same procedure but taking j2 = min{k > j1 : Rk = Rk + r˜ }. In this way two infinite configurations x , x can be constructed, and it is not hard (though tedious to write down all details) to see that each of them has the proper distribution. Using the same arguments as in the proof of Theorem 5.2, we conclude that if Q denotes their joint distribution, then it has the correct marginals, µ and µ0 , and verifies: Q{(x , x ) : there exists j ≤ n, such that Rj + r˜1 + 1 is a cluster index for
both configurations and v0 (TRj +˜r1 +1 (x )) = v0 (TRj +˜r1 +1 (x ))} ≥ 1 − ce−c n (5.29) from which the lemma follows.
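The maximal coupling invoked in Remark 5.8 can be illustrated on finite distributions. The Python sketch below is not taken from the paper: the function name `maximal_coupling`, the dictionary representation of a distribution and the use of exact fractions are illustrative assumptions, and the paper applies the result to absolutely continuous laws rather than to discrete ones. The sketch keeps the common mass min(P, P') on the diagonal and distributes the two residual parts as a product measure, so that the mass off the diagonal equals half of the variational distance, understood here as the sum of |P − P'| over all points.

```python
from fractions import Fraction


def maximal_coupling(p, q):
    """Joint pmf {(i, j): mass} with marginals p and q (dicts value -> mass).

    Illustrative construction for finite supports only (an assumption of
    this sketch): the overlap min(p, q) sits on the diagonal, the residuals
    are coupled independently.
    """
    support = sorted(set(p) | set(q))
    zero = Fraction(0)
    overlap = {s: min(p.get(s, zero), q.get(s, zero)) for s in support}
    res_p = {s: p.get(s, zero) - overlap[s] for s in support}
    res_q = {s: q.get(s, zero) - overlap[s] for s in support}
    off_mass = sum(res_p.values())  # equals sum(res_q.values())

    joint = {(s, s): m for s, m in overlap.items() if m > 0}
    if off_mass > 0:
        # The residuals have disjoint supports, so no mass lands on the diagonal.
        for i, a in res_p.items():
            for j, b in res_q.items():
                if a > 0 and b > 0:
                    joint[(i, j)] = a * b / off_mass
    return joint


def off_diagonal_mass(joint):
    """Total mass of the set {(i, j): i != j}."""
    return sum(m for (i, j), m in joint.items() if i != j)


P = {0: Fraction(1, 2), 1: Fraction(1, 2)}
P2 = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}
Q = maximal_coupling(P, P2)
```

In the lemma the same idea is applied to the laws of $u' + 2fZ$ and $u'' + 2fZ$, whose variational distance is small when $u'$ and $u''$ are close; the diagonal mass then plays the role of the lower bound $a(\cdot, \rho)$ in (5.28).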
Remarks. (1) There are two crucial ingredients for the correctness of the above construction: (a) given Rj i the future of the Markov chain ζji +1 , . . . ζji +k is a function only of the interdistances in the x configuration: ξRj +1 , . . . , ξRj +k ; (b) the marginals of the i
i
measure νv ,v do not depend on the values of v , v and are the conditional distribution of ξ1 given d < ξ1 ≤ d + @. (2) A “constructive” approach to perform such a coupling would be to enlarge the original space, where x is defined by taking a product of independent random variables U0 , U1 , U1 , U2 , U2 , . . . , all of them uniformly distributed on the unit interval (0, 1), and completely independent of x. Then U0 is used to generate v0 (x ) and each pair (Ui , Ui ) is used to generate the observation of νv ,v needed at the index Rj i + r˜1 + 1, i.e. to generate (ξR +˜r +1 , ξR +˜r +1 ). ji
1
ji
1
As a consequence of Theorem 2.6 we may now consider the stationary process (τn )n with n ≥ 1, defined on the space (X0 , B(X0 ), µ), as: τ1 (x) = t1 (x); and for (n−1) (x)). n ≥ 2, τn (x) = t1 (T1
n ≤ Notice that this is well defined, i.e., τn (x) < +∞ µ a.s., since τn ≤ 2ξfn + vξmin cξn for some positive constant c. Moreover, the same argument used in Lemma 5.7 yields the mixing property for this process, as stated in Theorem 2.7 and proven below.
6. Proof of Theorem 2.7

Proof of Parts (I) and (II). For any set $A \in \mathcal{M}_0^k$ with $\mu(A) > 0$ we consider the measures $\mu(\cdot|A)$ and µ restricted to $\mathcal{M}_{k+1}^{+\infty}$. Proceeding as in the proof of Lemma 5.7 but applied to the evolution starting with $T_k x$, we see that a coupling $Q_{k,A}$ of two configurations $x'$ and $x''$ distributed according to $\mu(\cdot)$ and $\mu(\cdot|A)$, respectively, may be constructed in such a way that with probability at least $1 - c\,e^{-c' m}$ a joint cluster index less than $m$ (which would correspond to the label $k + m$ in the original system) will be found and such that at this epoch of the discrete dynamics, both configurations will exhibit the same velocity for the heavy particle. This automatically will ensure that on this set $\tau_j(x') = \tau_j(x'')$, for all $j \ge m$. In order to perform this coupling we need first to observe that under each of these measures the law of the standing neutral particles is the same as under µ0, and that they are independent of the moving particles. All moving n.p. have velocity at least $v_{\min}$ and moreover, under both measures the velocity of h.p. satisfies condition (b1) of Assumption 2.5. The construction done in the proof of Lemma 5.7 can be properly repeated here. With that same construction we have that
$$
\sup_{B \in \mathcal{M}_{k+m}^{+\infty}} |\mu(B|A) - \mu(B)|
$$
will be bounded by the complementary probability of the event in the l.h.s. of Eq. (5.29), under the measure $Q_{k,A}$. Since the constants $c, c'$ in this analogue of Eq. (5.29) can be chosen independently of $k$ and $A$, as the construction shows, we conclude the proof of the first part of the theorem.

Recall that under our conditions $\tilde\tau := E_\mu \tau_1 < +\infty$. Thus, the ψ-mixing property of the process $(\tau_n)$ yields at once:
$$
\frac{1}{n} \sum_{i=1}^{n} \tau_i \to \tilde\tau \qquad \mu\text{-a.s.}
$$
(the same holds also for the measure µ0 a.s.). Since the random variables ξi satisfy the law of large numbers, under both µ and µ0 , if ϑt (x) denotes the number of collisions of the
h.p. with standing particles up to time $t$, i.e., $\vartheta_t(x) = k$ if $t_k(x) \le t < t_{k+1}(x)$, for $k \ge 0$, we get at once that
$$
\lim_{s\to+\infty} \frac{q_{\vartheta_s}(x)}{t_{\vartheta_s}} = \frac{E_{\mu_0}(\xi_1)}{\tilde\tau} =: v_D \qquad \mu_0\text{-a.s.}
$$
From completely standard arguments we conclude that
$$
\lim_{t\to+\infty} \frac{q_0(T_t(x))}{t} = \lim_{t\to+\infty} \frac{q_0(t, x)}{t} = v_D \qquad \mu_0\text{-a.s.},
$$
completing the proof of Part II.

Proof of Part III. An important step for the proof of III of Theorem 2.7 is the diffusivity of $\sum_{i=1}^{n} (\tau_i - E_\mu \tau_1)/\sqrt{n}$ and of $\sum_{i=1}^{n} (\xi_i - v_D \tau_i)/\sqrt{n}$ under the measure µ0. We first use the coupling constructed in the proof of Lemma 5.7 to reduce the proof of the invariance principle to the analysis of the stationary system, under the measure µ. For this, analogously to [5], we use Eq. (5.29) with $n$ replaced by $n^a$ for some $0 < a < 1/2$, which guarantees that with probability tending to one as $n \to +\infty$ we have $\tau_i(x') = \tau_i(x'')$, for $i \ge n^a$. To treat the stationary system, the main technical point is the following lemma, the proof of which is postponed to the Appendix.

Lemma 6.1. The quantities
$$
\sigma^2 := \lim_{n\to+\infty} \frac{E_\mu (t_n - E_\mu t_n)^2}{n}
= E_\mu (\tau_1 - E_\mu \tau_1)^2 + 2 \sum_{i=2}^{+\infty} E_\mu (\tau_1 - E_\mu \tau_1)(\tau_i - E_\mu \tau_1),
\tag{6.1}
$$
$$
\sigma'^2 := \lim_{n\to+\infty} \frac{E_\mu (q_n - v_D t_n)^2}{n}
= E_\mu (\xi_1 - v_D \tau_1)^2 + 2 \sum_{i=2}^{+\infty} E_\mu (\xi_1 - v_D \tau_1)(\xi_i - v_D \tau_i)
\tag{6.2}
$$
with vD = Eµ (ξ1 )/Eµ (τ1 ), are finite and strictly positive. (Recall that Eµ (ξ1 ) = Eµ0 (ξ1 ).) Once Lemma 6.1 has been proven, the proof of part III of Theorem 2.7 follows exactly the same pattern as that of part (ii) of Theorem 2 in [5], yielding the invariance principle with σ˜ = σ / Eµ (τ1 ). Details are omitted at this point.
7. Appendix: Proof of Lemma 6.1 Let us first notice that the finiteness of σ and σ as well as the last equality in each of Eqs. (6.1) and (6.2) follow at once from Part I of Theorem 2.7 and the stationarity of µ. The important point which remains to be proven is their positivity. The proof of Lemma 6.1 is analogous for σ and σ , and so we consider only the first, for shortness of writing. Using a construction, similar but more involved than that of Sect. 5, we will show the existence of weakly dependent regions for which the corresponding
flight times have variances that can be explicitly estimated from below by a positive constant. Together with a suitable control of the covariances with the remaining flight times, this would give us a lower bound for the variances of $t_n$ which grows linearly in $n$. We begin with a “technical lemma”, which provides some needed estimates.

Lemma 7.1. Given $\epsilon > 0$, there exist positive constants $c_\epsilon > 0$, $0 < \varepsilon_1 \le \varepsilon_2 \le \varepsilon_3 \le \varepsilon_4 \le \varepsilon_5$ and integers $k_1 > 0$, $k_2 > 0$ such that, if we define a finite sequence $H = \{H_i\}_{i=1}^{k_1+k_2+r_2+4}$, where the terms $H_i$ are given by the following equation:
$$
H_i =
\begin{cases}
\varepsilon_1 & \text{if } i \in \{1, \dots, k_1\}, \\
\varepsilon_2 & \text{if } i = k_1 + 1, \\
\epsilon & \text{if } i \in \{k_1 + 2\} \cup \{k_1 + k_2 + 5, \dots, k_1 + k_2 + r_2 + 4\}, \\
\varepsilon_3 & \text{if } i \in \{k_1 + 3, \dots, k_1 + k_2 + 2\}, \\
\varepsilon_4 & \text{if } i = k_1 + k_2 + 3, \\
\varepsilon_5 & \text{if } i = k_1 + k_2 + 4,
\end{cases}
\tag{7.1}
$$
and
$$
E := \{\xi_i \le d + H_i,\ i = 1, \dots, k_1 + k_2 + r_2 + 4\},
$$
then we have:

a)
$$
\mathrm{Var}_\mu\bigl(t_{k_1+k_2+3} - t_{k_1+1} \,\big|\, E, v_0(0)\bigr) \ge c_\epsilon > 0,
\tag{7.2}
$$
$$
\mathrm{Cov}_\mu\bigl((t_{k_1+k_2+3} - t_{k_1+1}),\ \tau_{k_1+1} \,\big|\, E, v_0(0)\bigr) \ge -\frac{c_\epsilon}{4},
\tag{7.3}
$$
$$
\mathrm{Cov}_\mu\bigl(\tau_{k_1+k_2+3},\ \tau_{k_1+k_2+4} \,\big|\, E, v_0(0)\bigr) \ge -\frac{c_\epsilon}{4},
\tag{7.4}
$$
µ-almost surely on the set $[v_0(0) \le V_*]$;

b) on the event $E \cap [v_0(0) \le V_*]$,
$$
v_0(t_{j-1}) \le \alpha \sqrt{\frac{2df}{1-\alpha^2}} + \rho(H_j),
$$
for $j = k_1 + 1,\ k_1 + k_2 + 3,\ k_1 + k_2 + 4$, where $\rho(\cdot)$ is defined in Remark 5.8.

We should notice that the constants $\varepsilon_i$, $k_1$, $k_2$ appearing above are “of technical character”, do not have any direct relation to the nature of the model, but are sufficient to perform some explicit computations below. The proof of Lemma 7.1 is postponed to the end of the appendix, and we now check that Lemma 6.1 follows once this is proven. For this we shall again use coupling methods, the basic idea being to combine the “trap”, defined by the finite set of restrictions on the interdistances, cf. Lemma 7.1, with the notion of cluster index and a coupling as in Lemma 5.7. This will replace the notion of “good cluster index” in [5]; the situation here is more involved due to the dependence on the velocities for the discrete time dynamics. We start by a convenient modification of Definition 5.3.
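The finite “trap” of Lemma 7.1 is a piecewise constant sequence of thresholds, and the event E asks that the first $k_1 + k_2 + r_2 + 4$ interdistances stay below them. A minimal Python sketch of this bookkeeping follows, as read from (7.1); the names `trap_sequence` and `in_trap`, the argument `eps` for the parameter fixed in (5.4), the tuple `eps_i` for the five smaller constants and all numerical values are illustrative assumptions, not the constants chosen in the paper.

```python
def trap_sequence(k1, k2, r2, eps, eps_i):
    """Return {i: H_i} for i = 1, ..., k1 + k2 + r2 + 4, following (7.1).

    eps   : the parameter fixed in (5.4)
    eps_i : the five constants of Lemma 7.1, in increasing order
    """
    e1, e2, e3, e4, e5 = eps_i
    H = {}
    for i in range(1, k1 + k2 + r2 + 5):
        if i <= k1:
            H[i] = e1
        elif i == k1 + 1:
            H[i] = e2
        elif i == k1 + 2:
            H[i] = eps
        elif i <= k1 + k2 + 2:
            H[i] = e3
        elif i == k1 + k2 + 3:
            H[i] = e4
        elif i == k1 + k2 + 4:
            H[i] = e5
        else:  # k1 + k2 + 5 <= i <= k1 + k2 + r2 + 4
            H[i] = eps
    return H


def in_trap(xi, d, H):
    """Check the event E: xi[i - 1] <= d + H[i] for every index i of the trap."""
    return all(xi[i - 1] <= d + H[i] for i in H)


# Hypothetical small instance, for illustration only.
H = trap_sequence(k1=2, k2=3, r2=2, eps=1.0, eps_i=(0.1, 0.2, 0.3, 0.4, 0.5))
```

Only the indices $k_1 + 1$, $k_1 + k_2 + 3$ and $k_1 + k_2 + 4$ carry the thresholds $\varepsilon_2$, $\varepsilon_4$, $\varepsilon_5$ that enter part (b) of the lemma.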
Definition 7.2. Given x ∈ X, we recurrently define the sequence of pairs $n (x), R $n (x))}n≥0 as follows: {(R $0 (x) = R $0 (x) = min{j : v¯0 (t¯j ) ≤ V∗ }, R $n (x) + r and j is R $n (x) − admissible}; $n+1 (x) = min{j : j ≥ R R where r satisfies (5.8) and (5.12), and $n+1 $n+1 R (x) = R (x) + max{1 ≤ k ≤ k1 + k2 + r2 + 4 : ξRn+1 (x)+j ≤ d + Hj , ∀1 ≤ j ≤ k}. (7.5)
$n , R $n are µ and µ0 a.s. finite, and each R $n is a stopping As before, the random indices R $ $ time for the filtration (σ (xm , ξ1 , . . . , ξm ))m≥1 . (Notice that R0 = R0 = R0 , cf. Sect. 4.) $n (x), R $n (x))}n≥1 defined above, we will say that Definition 7.3. Given the sequence {(R $n (x), R $n (x)) is strongly separating if R $n (x) − R $n (x) = k1 + k2 + r2 + 4. the pair (R Proof of Lemma 6.1. Due to (b) of Lemma 7.1, we may take the measure νv,v as in Remark
2df 5.8, with v = α 1−α 2 , and v ≤ v + ρ(Hi ), for i = k1 + 1, k1 + k2 + 3, k1 + k2 + 4. Let x be distributed according to µ; if needed, we shall enlarge the probability space, as in the proof of Lemma 5.7, and construct two random configurations x , x , both distributed according to µ. We shall use this enlarged space to estimate the conditional variance of tn (x ) given a suitable σ −field M; upon integration, this gives the desired estimate on the variance of tn under µ. We start by first fixing the moving particles: = x = x , i.e., both configurations have exactly the same moving particles as xm m m the configuration x. As before we shall use ξi to denote the successive interparticle distances between standing particles in the configuration x, and ξi , ξi will denote the corresponding interdistances in the configurations x , x , respectively. The random $n , R $n are taken as functions of the configuration x only, according to the indices R previous definition. Moreover, we set, for s ≥ 1,
$j , R $j ) is strongly separating}. js = min{j ≥ js−1 + 1 : (R with j0 = 0. As before, calling $n = σ (xm , ξ1 , . . . , ξn+k1 +k2 +r2 +4 , ξ1 , . . . , ξn , ξ1 , . . . , ξn ), A $n , for each n ≥ 1, s ≥ 1. Moreover we define ξ = ξ = $ = n] ∈ A we see that [R js i i $ + k1 + 1, R $ + k1 + k2 + 3, R $ + k1 + k2 + 4, and for all s ≥ 1. ξi for i = R js js js $$ is νv ,v , as deThe conditional distribution of (ξR$ +k+1 , ξR $ +k+1 ) given AR js +k js js 2df scribed by Remark 5.8, where v = α 1−α $ +k (x )) for k = 2 and v = v0 (TR js k1 , k1 + k2 + 2, k1 + k2 + 3 and for each s ≥ 1. The procedure is well defined and in this way we construct two configurations (x , x ), each of them being distributed according to µ, analogously to the coupling in the proof of Lemma 5.7. Besides the change in the finite “trap” involved in the definition of “strongly separating” with respect to what we have called “separating” in Sect. 5, our coupling procedure has now been modified to fit into the present purposes.
˜ the distribution of the configurations (x , x ) constructed by this We denote by Q $ , R $ take the prescription; it follows from the construction that the random variables R j j same value in x , x and x. For s ≥ 1 we define: a) v02 (TR$ +k (x )) + 2f ξR$ js js (x)+k+1 2df 1 if = α2 + 2f ξR$ 2 js (x)+k+1 1 − α (7.6) Zs (x , x ) = for k = k1 , k1 + k2 + 2, k1 + k2 + 3, c) R $j +i−1 ≤ r + i, ∀i ≥ 1, $j +i − R s s 0 otherwise. $ , R $ , . . . ) - the sigma-algebra generated by variables Call M = σ (Z1 , Z2 , Z3 , . . . , R j1 j2 $ , s ≥ 1. In particular, from (5.28) and the properties of the Markov chain ζ Zs , R js involved in the proof of Theorem 5.2, we conclude that the pair correlations of the variables Zs decay exponentially fast and that there exists θ > 0 so that ˜ s = 1) ≥ θ. inf Q(Z
(7.7)
s
The next step is to prove the following c@ Zs 1[R$ +k1 +k2 +3≤n] , VarQ˜ (tn (x )|M) ≥ js 2 s
˜ − a.s. Q
(7.8)
Before proving Eq. (7.8) let us observe that the validity of Eq. (6.1) follows quite easily from it. For this, let us first observe that we have $1 + $m ≤ (k1 + k2 + r2 + 4)(m − 1) + R R
m j =2
$j , Y
$ def $ $ where Y j = Rj − Rj −1 , j ≥ 2 are i.i.d. integrable random variables under µ (as well as µ0 ), and their common distribution has exponentially decaying tails, analogously to (5.11). (Observe that these variables depend only on the configuration x.) Thus, if M is large enough, we have: $[n/M] + k1 + k2 + 3 > n] = 0. lim µ[R
n→+∞
(7.9)
(Use [y] to denote the integer part, if y is a non-negative real number.) Also the indices js are partial sums of i.i.d. random variables, with exponentially decaying tails: µ[j1 = k] ≤ cηk for suitable c and 0 < η < 1. Thus, for M large enough: lim µ[j[n/M ] > [n/M]] = 0.
n→+∞
(7.10)
On the other side, from (7.8) we have:
[n/M ] 1 c@ VarQ˜ (tn ) ≥ EQ˜ Zs 1[R$ j[n/M ] +k1 +k2 +3≤n] n 2n s=1
c@ [n/M ] $j ≥ + k1 + k2 + 3 > n] , θ − µ[R ] [n/M 2n
(7.11)
and (6.1) follows at once from (7.9), (7.10), and (7.11). The proof of Eq. (7.8) is based on Lemma 7.1 by decomposing tn into a sum of time intervals determined by Zs = 1. Namely, let s1 , s2 , . . . be the successive indices such that Zs = 1, (recalling Eq (7.7) and the comment just before it), and to avoid too messy $ , and set indexation, write Ri∗ = R js i
% τi = tRi∗ +k1 +k2 +3 − tRi∗ +k1 +1 ,
i ≥ 1,
∗ +k +k +3 , τ%i = tRi∗ +k1 +1 − tRi−1 1 2
i ≥ 2,
% τ%1 = tR1∗ +k1 +1 .
∗ Letting L be determined by RL∗ + k1 + k2 + 3 ≤ n < RL+1 + k1 + k2 + 3, we can write
VarQ˜ (tn (x )|M) & L '
= VarQ˜ τ%i + % τi + tn − tRL∗ +k1 +k2 +3 M i=1
=
L i=1
L
VarQ˜ τ%i M + VarQ˜ (% τi M) + VarQ˜ (tn − tRL∗ +k1 +k2 +3 M
+2
i=1
1≤i
+
1≤i,i ≤L
+
τi , % CovQ˜ % τi M + 2
1≤i
CovQ˜ τ%i , % τ i M
τi , % CovQ˜ % τ i M + CovQ˜ τ%i , tn − tRL∗ +k1 +k2 +3 M 1≤i≤L
τi , tn − tRL∗ +k1 +k2 +3 M CovQ˜ %
1≤i≤L
= I1 + I2 + I3 + I4 + I5 + I6 + I7 + I8 . For I1 and I3 we use the trivial bound: I1 ≥ 0 and I3 ≥ 0. On the other hand, if 1 ≤ i ≤ L, it follows from the construction that at each of the random indices Ri∗ +k1 +1 and Ri∗ + k1 + k2 + 3 there is a successful coupling of velocity processes (in x and x ) and also a cluster index, which immediately gives us the conditional independence of % τi and % τi for i = i as well as % τi and % τ i for i = i , implying that I4 = I5 = I7 = 0. Due to successful coupling of velocity processes at the index Ri∗ + k1 + k2 + 4, together with the fact that it is a cluster index, we get that for i > i,
˜ − a.s., τi , % τ i M = 0, if i ≥ i + 2, Q CovQ˜ % and since at Ri∗ + k1 + k2 + 3 we also have successful coupling of velocity processes and a cluster index, using (7.4) we get
CovQ˜ % τi , % τ i+1 M = CovQ˜ τRi∗ +k1 +k2 +3 , τRi∗ +k1 +k2 +4 M c@ ˜ − a.s. ≥− , Q 4 By the same reasoning applied to the index Ri∗ + k1 + 1, we get that for i ≤ i,
τ i M = 0, if i ≤ i − 1, τi , % CovQ˜ %
and using (7.3)
c@ τi , % τi , τRi∗ +k1 +1 M ≥ − , τ i M = CovQ˜ % CovQ˜ % 4
˜ − a.s. Q
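Summarizing we get $I_6 \ge -2L\,\frac{c_\epsilon}{4}$; a similar argument gives $I_8 \ge \frac{c_\epsilon}{4}$ $\tilde Q$-a.s. Finally, from (7.2) it easily follows that $I_2 \ge L c_\epsilon$ $\tilde Q$-a.s., which completes the proof of inequality (7.8).

Proof of Lemma 7.1. To shorten notations we introduce the functions:
$$
v(u, z) = \alpha \sqrt{\Bigl(\alpha \sqrt{\frac{2df}{1-\alpha^2}} + u\Bigr)^2 + 2(d + z) f},
\tag{7.12}
$$
and
$$
\tau(u, z) = \frac{1}{f} \Bigl( -\alpha \sqrt{\frac{2df}{1-\alpha^2}} - u + \alpha^{-1} v(u, z) \Bigr).
\tag{7.13}
$$
Both functions are continuous, $\tau(u, z)$ is strictly decreasing in the first variable $u$ and strictly increasing in $z$, while $v(u, z)$ is strictly increasing in both parameters $u$ and $z$, and they have fairly transparent meaning: $\tau(u, z)$ represents the time which is needed for the h.p. to cross a segment of the length $d + z$ starting with the initial velocity $\alpha \sqrt{\frac{2df}{1-\alpha^2}} + u$, provided no collisions occur (“free flight time”), and $\alpha^{-1} v(u, z)$ is the corresponding velocity of the h.p. when it would reach the end point of this segment. The constants $\varepsilon_i$, $k_i$ appearing in (7.1) will be defined in the following order of their dependence: $\epsilon \to c_\epsilon \to \varepsilon_5 \to \varepsilon_4 \to k_2 \to \varepsilon_3 \to \varepsilon_2 \to \varepsilon_1 \to k_1$. We fix $\epsilon > 0$, the same as in (5.4), and can take a positive integer $b$ so that $\mu_0\{d < \xi_1 < d + \frac{\epsilon}{b}\}$ and $\mu_0\{d + \frac{2\epsilon}{b} < \xi_1 < d + \epsilon\}$ are both positive. With $\phi > 0$ being defined through
$$
\tau\Bigl(0, \frac{2\epsilon}{b}\Bigr) - \tau\Bigl(0, \frac{\epsilon}{b}\Bigr) = 2\phi,
\tag{7.14}
$$
we take $u_2$ such that
$$
0 < u_2 \le \frac{(1-\alpha) f}{4\alpha}\, \phi,
\tag{7.15}
$$
$$
\tau\Bigl(u_2, \frac{2\epsilon}{b}\Bigr) - \tau\Bigl(0, \frac{\epsilon}{b}\Bigr) \ge \phi,
\tag{7.16}
$$
and
$$
v\Bigl(0, \frac{2\epsilon}{b}\Bigr) > v\Bigl(u_2, \frac{\epsilon}{b}\Bigr).
\tag{7.17}
$$
Take now
$$
p_1 := \frac{\mu_0\{d < \xi_1 < d + \frac{\epsilon}{b}\}}{\mu_0\{d < \xi_1 < d + \epsilon\}} > 0,
\tag{7.18}
$$
$$
p_2 := \frac{\mu_0\{d + \frac{2\epsilon}{b} < \xi_1 < d + \epsilon\}}{\mu_0\{d < \xi_1 < d + \epsilon\}} > 0,
\tag{7.19}
$$
and define
$$
c_\epsilon := \frac{1}{2} \Bigl(\frac{(1-\alpha)\phi}{2}\Bigr)^2 p_1 p_2.
\tag{7.20}
$$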
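The two auxiliary functions of (7.12) and (7.13) are easy to evaluate numerically, which gives a quick sanity check of the properties stated for them. The Python sketch below is only an illustration: the function names, the helper `v_star` for the reference velocity $\alpha\sqrt{2df/(1-\alpha^2)}$ and the sample parameter values are assumptions of this sketch, and α is treated simply as a number in (0, 1), with f the constant acceleration of the h.p. between collisions.

```python
import math


def v_star(alpha, d, f):
    """Reference velocity alpha * sqrt(2 d f / (1 - alpha^2)) appearing in (7.12)."""
    return alpha * math.sqrt(2.0 * d * f / (1.0 - alpha ** 2))


def v(u, z, alpha, d, f):
    """Function v(u, z) of (7.12)."""
    start = v_star(alpha, d, f) + u
    return alpha * math.sqrt(start ** 2 + 2.0 * (d + z) * f)


def tau(u, z, alpha, d, f):
    """Function tau(u, z) of (7.13): free flight time over a segment of length d + z."""
    return (v(u, z, alpha, d, f) / alpha - v_star(alpha, d, f) - u) / f


# Hypothetical parameter values, for illustration only.
alpha, d, f = 0.5, 1.0, 2.0
start_speed = v_star(alpha, d, f) + 0.3
flight = tau(0.3, 0.1, alpha, d, f)
# Distance covered at constant acceleration f, starting from start_speed.
covered = start_speed * flight + 0.5 * f * flight ** 2
```

With u = 0 and z = 0 the reference velocity is reproduced, v(0, 0) = $\alpha\sqrt{2df/(1-\alpha^2)}$, and the distance covered during the time τ(u, z) at constant acceleration f is d + z, in agreement with the interpretation of τ as a free flight time.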
Now we fix 0 < "5 < @ such that
c@ τ (0, d) τ (0, "5 ) − τ (0, 0) ≤ . 8
(7.21)
In what follows we take ρ(") sufficiently small, according to Remark 5.8. By continuity of τ (., .) we can fix 0 < u5 < ρ("5 ) such that
c@ (7.22) τ (0, d) τ (0, "5 ) − τ (u5 , 0) ≤ , 4 and due to the above mentioned properties of the functions τ (., .), v(., .) and ρ(.) we may take 0 < "4 < @ and u4 > 0 such that: 0 < u4 < ρ("4 ), 2df + u5 , (7.23) v(u4 , "4 ) ≤ α 1 − α2 and τ (0, "4 ) − τ (u4 , 0) ≤
1−α φ. 4
(7.24)
Now we fix k2 as the first integer such that α k2 v(u2 , @) ≤
u4 , 2
(7.25)
and pick 0 < "3 < @ such that 2f (d + "3 )
k2
α
2i
i=1
as well as
2df u4 2 ≤ α + , 1 − α2 2
k2 @ 2@ α 2i . α 2k2 v(0, ) − v(u2 , ) > 2f "3 b b
(7.26)
(7.27)
i=1
We set "2 > 0 such that
c@ τ (0, "2 ) − τ (0, 0) (k2 + 2)τ (0, @) ≤ , 8 and take u1 such that 0 < u1 < ρ("2 ), v(u1 , "2 ) ≤ α and
2df + u2 , 1 − α2
c@ τ (0, "2 ) − τ (u1 , 0) (k2 + 2)τ (0, @) ≤ . 4
(7.28)
(7.29) (7.30)
Now we set
u 2 1 k1 = min k ≥ r1 : α 2k V∗2 ≤ 2 and finally take "1 > 0 such that k1 2df u 1 2 2(k1 −i+1) 2f (d + "1 ) α ≤ α + . 2 1−α 2 i=1
(7.31)
(7.32)
Next we verify that with the above choices of ui , "i , i = 1, . . . , 4 we have inf (tk1 +k2 +3 − tk1 +1 ) − sup(tk1 +k2 +3 − tk1 +1 ) ≥ U
L
(1 − α) φ, 2
(7.33)
where U = [v0 (0) ≤ V∗ , ξi ≤ κi , 1 ≤ i ≤ k1 + k2 + 3, ξk1 +2 ≥ d + 2@ b ] and L = [v0 (0) ≤ V∗ , ξi ≤ κi , 1 ≤ i ≤ k1 + k2 + 3, ξk1 +2 ≤ d + b@ ]. Indeed, let us observe that inf (tk1 +k2 +3 − tk1 +1 ) − sup(tk1 +k2 +3 − tk1 +1 ) U
L
≥ inf (τk1 +2 + [tk1 +k2 +2 − tk1 +2 ] + τ (u4 , 0)) U
− sup(τk1 +2 + [tk1 +k2 +2 − tk1 +2 ] + τ (0, "4 )) L
k2 −1
1 1−α 1 v0 (tk1 +2+i ) + v0 (tk1 +k2 +2 ) ≥ inf τk1 +2 − v0 (tk1 +2 ) + f αf αf U
(7.34)
i=1
k2 −1 1 1−α + τ (u4 , 0) − sup τk1 +2 − v0 (tk1 +2 ) + v0 (tk1 +2+i ) f αf L
1 v0 (tk1 +k2 +2 ) + τ (0, "4 )), + αf
i=1
where we have used Eq. (3.6), remarks (i) and (ii) just preceding Lemma 5.6, implying that under the given conditions there are no collisions with moving particles in thetime interval [tk1 , tk1 +k2 +3 ], and the fact that on L or U we have v0 (tk1 +k2 +2 ) ∈
(α
2df ,α 1−α 2
2df 1−α 2
+ u4 ], as it follows from Eqs. (7.25) and (7.26). Moreover, from
Eq. (7.24) we have τ (u4 , 0) − τ (0, "4 ) ≥ − 1−α 4 φ, and Eqs. (3.5) and (7.27) imply that inf U v0 (ti ) ≥ supL v0 (ti ) for k1 + 2 ≤ i ≤ k1 + k2 + 2. On the other side, due to the absence of recollisions during the time interval under consideration, we have:
1 1 inf τk1 +2 − v0 (tk1 +2 ) − sup τk1 +2 − v0 (tk1 +2 ) f f U L
α = inf (1 − α)τk1 +2 − v0 (tk1 +1 ) f U
α − sup (1 − α)τk1 +2 − v0 (tk1 +1 ) f L 2@ @ (1 − α) φ ≥ (1 − α)(τ (u2 , ) − τ (0, )) − b b 4 3 ≥ (1 − α)φ 4
(7.35)
by (7.15) and (7.16). From (7.35) and the previous considerations, (7.33) follows. We now use the following trivial fact: if $Y$ is a real random variable for which there are constants $a, a'$, with $a' - a \ge \gamma > 0$ and $P(Y \le a) \ge p_1$ and $P(Y \ge a') \ge p_2$, then
$$
\mathrm{Var}(Y) \ge \frac{1}{2} \gamma^2 p_1 p_2.
\tag{7.36}
$$
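One possible way to verify the variance bound (7.36) is sketched below; it is not taken from the paper. Here $a$ denotes the lower and $a'$ the upper of the two constants, and $Y'$ is an independent copy of $Y$ introduced only for this argument.

```latex
% Sketch of a proof of (7.36); Y' is an auxiliary independent copy of Y.
\begin{align*}
\mathrm{Var}(Y) &= \tfrac12\, E\,(Y - Y')^2 \\
 &\ge \tfrac12\, \gamma^2 \Bigl( P(Y \le a,\ Y' \ge a') + P(Y \ge a',\ Y' \le a) \Bigr)
   && \text{since } |Y - Y'| \ge a' - a \ge \gamma \text{ on both (disjoint) events} \\
 &= \gamma^2\, P(Y \le a)\, P(Y \ge a')
   && \text{by independence and equality in law of } Y, Y' \\
 &\ge \gamma^2 p_1 p_2 \;\ge\; \tfrac12\, \gamma^2 p_1 p_2 .
\end{align*}
```

The argument in fact gives the constant 1 in place of 1/2, so (7.36) holds with room to spare.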
From this, (7.33), (7.18) and (7.19), statement (7.2) of Lemma 7.1 follows at once. To prove (7.3) we first notice that, since each $\varepsilon_i \le \epsilon$, on $E \cap [v_0(0) \le V_*]$ we have:
$$
t_{k_1+k_2+3} - t_{k_1+1} \le (k_2 + 2)\, \tau(0, \epsilon); \quad \text{and} \quad \tau(u_1, 0) \le \tau_{k_1+1} \le \tau(0, \varepsilon_2).
\tag{7.37}
$$
Due to the choice of $\varepsilon_2$ and $u_1$, cf. Eqs. (7.28) and (7.30), we immediately get (7.3). Using (7.22) and analogous estimates we get (7.4). Statement b) of the lemma follows at once from the choice of $u_2$, $u_4$ and $u_5$ (according to (7.29), (7.25), (7.26), and (7.23)).

Acknowledgements. The authors would like to thank E. Presutti for many helpful discussions. Partial support by CNPq-CNR agreement and FINEP (Pronex) is gratefully acknowledged. The project has been partially supported by FAPERJ grant E-26/150.940/99 and by CNPq (Brazil).
References
1. Arnold, V. and Avez, A. (1967): Problèmes ergodiques de la mécanique classique. Paris: Gauthier-Villars
2. Boldrighini, C. (1986): Bernoulli property for a one-dimensional system with localized interaction. Commun. Math. Phys. 103, 499–514
3. Boldrighini, C., De Masi, A., Nogueira, A. and Presutti, E. (1985): The dynamics of a particle interacting with a semi-infinite ideal gas is a Bernoulli flow. In: Statistical physics and dynamical systems: Rigorous results. Fritz, J., Jaffe, A., Szasz, D. (eds). Progress in Physics, 10, Boston–Basel: Birkhäuser, pp. 153–189
4. Boldrighini, C., Pellegrinotti, A., Presutti, E., Sinai, Ya. and Soloveitchik, M. (1985): Ergodic properties of a semi-infinite one-dimensional system of statistical mechanics. Commun. Math. Phys. 101, 363–382
5. Boldrighini, C., Cosimi, G., Frigio, S. and Nogueira, A. (1989): Convergence to a stationary state and diffusion for a charged particle in a standing medium. Probab. Theory Relat. Fields 80, 481–500
6. Boldrighini, C., Soloveitchik, M. (1995): Drift and diffusion for a mechanical system. Prob. Theory Rel. Fields 103, 349–379
7. Dembo, A. and Zeitouni, O. (1993): Large deviations techniques and applications. Boston: Jones and Bartlett Publishers, Inc.
8. Karlin, S., Taylor, H. (1975): A First Course in Stochastic Processes. 2nd edition, New York: Academic Press
9. Landau, L. and Lifschitz, E. (1959): Statistical physics. London–Paris: Pergamon Press
10. Lindvall, T. (1992): Lectures on the Coupling Method. New York: Wiley
11. Piasecki, J. (1983): Approach to field-induced stationary state in a gas of hard rods. J. Stat. Phys. 30, 201–209
12. Presutti, E., Sinai, Ya. and Soloveitchik, M. (1985): Hyperbolicity and Møller morphism for a model of classical statistical mechanics. In: Statistical physics and dynamical systems: Rigorous results. J. Fritz, A. Jaffe and D. Szasz (eds). Progress in Physics, 10, Basel–Boston: Birkhäuser, pp. 253–284
13. Pellegrinotti, A., Sidoravicius, V. and Vares, M.E. (1999): Stationary state and diffusion for a charged particle in a one dimensional medium with lifetimes. SIAM Probab. Theory Appl. 44 4, 796–825
14. Sidoravicius, V., Triolo, L. and Vares, M.E. (1998): On the forced motion of a heavy particle in a random medium I. Existence of dynamics. Markov Proc. Rel. Fields 4 4, 629–648
15. Sinai, Ya.G. (1970): Dynamical systems with elastic reflections. Russ. Math. Surv. 25, 137–189
16. Sinai, Ya.G. and Soloveitchik, M. (1986): One dimensional Classical Massive Particle in the ideal Gas. Commun. Math. Phys. 104, 423–443
17. Spohn, H. (1991): Large scale Dynamics of Interacting Particles. Text and Monographs in Physics, Berlin: Springer

Communicated by Ya. G. Sinai
Commun. Math. Phys. 219, 357 – 398 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Application of the τ-Function Theory of Painlevé Equations to Random Matrices: PIV, PII and the GUE

P. J. Forrester¹, N. S. Witte¹,²

¹ Department of Mathematics and Statistics, University of Melbourne, Victoria 3010, Australia.
E-mail:
[email protected];
[email protected]
2 School of Physics, University of Melbourne, Victoria 3010, Australia
Received: 27 June 2000 / Accepted: 8 December 2000
Abstract: Tracy and Widom have evaluated the cumulative distribution of the largest eigenvalue for the finite and scaled infinite GUE in terms of a PIV and PII transcendent respectively. We generalise these results to the evaluation of $\tilde E_N(\lambda; a) := \bigl\langle \prod_{l=1}^N \chi^{(l)}_{(-\infty,\lambda]} (\lambda - \lambda_l)^a \bigr\rangle$, where $\chi^{(l)}_{(-\infty,\lambda]} = 1$ for $\lambda_l \in (-\infty, \lambda]$ and $\chi^{(l)}_{(-\infty,\lambda]} = 0$ otherwise, and the average is with respect to the joint eigenvalue distribution of the GUE, as well as to the evaluation of $F_N(\lambda; a) := \bigl\langle \prod_{l=1}^N (\lambda - \lambda_l)^a \bigr\rangle$. Of particular interest are $\tilde E_N(\lambda; 2)$ and $F_N(\lambda; 2)$, and their scaled limits, which give the distribution of the largest eigenvalue and the density respectively. Our results are obtained by applying the Okamoto τ-function theory of PIV and PII, for which we give a self-contained presentation based on the recent work of Noumi and Yamada. We point out that the same approach can be used to study the quantities $\tilde E_N(\lambda; a)$ and $F_N(\lambda; a)$ for the other classical matrix ensembles.

Contents
1. Introduction and Summary
2. τ-Function Theory for PIV
   2.1 Affine Weyl group symmetry
   2.2 Toda lattice equation
   2.3 Classical solutions
   2.4 Bäcklund transformations and discrete Painlevé systems
3. τ-Function Theory for PII
   3.1 Affine Weyl group symmetry
   3.2 Toda lattice equation
   3.3 Classical solutions
   3.4 Bäcklund transformations and discrete dPI
   3.5 Coalescence from PIV
4. Application to Finite GUE Matrices
   4.1 Calculation of $E_N(0; (s, \infty))$ and $\tilde E_N(s; a)$
   4.2 Calculation of $F_N(s; a)$
   4.3 $U_N(t; a)$ and $V_N(t; a)$ as Painlevé transcendents
5. Edge Scaling in the GUE
   5.1 Calculation of $E^{\rm soft}(s)$ and $\tilde E^{\rm soft}(s; a)$
   5.2 Calculation of $F^{\rm soft}(\lambda; a)$
6. Conclusions – A Programme
References
1. Introduction and Summary

Hermitian random matrices $X$ with a unitary symmetry are defined so that the joint distribution of the independent elements $P(X)$ is unchanged by the similarity transformation $X \mapsto U^\dagger X U$ for $U$ unitary. For example, an ensemble of matrices with $P(X) := \exp\bigl(\sum_{j=0}^{\infty} \alpha_j \mathrm{Tr}(X^j)\bigr) =: \prod_{j=1}^{N} g(\lambda_j)$ for general $g(x) \ge 0$ possesses a unitary symmetry. Such ensembles have the property that the corresponding eigenvalue probability density function $p(\lambda_1, \dots, \lambda_N)$ is given by the explicit functional form
$$p(\lambda_1, \dots, \lambda_N) = \frac{1}{C} \prod_{l=1}^{N} g(\lambda_l) \prod_{1 \le j < k \le N} |\lambda_k - \lambda_j|^2, \tag{1.1}$$
$C$ denoting the normalization. (Throughout the symbol $C$ will be used to denote some constant, i.e. a quantity independent of the primary variable(s) of the equation.) The choice $g(x) = e^{-x^2}$, which is realized by choosing each diagonal element of $X$ independently from the normal distribution $N[0, 1/\sqrt{2}]$, and each off-diagonal element independently with distribution $N[0, 1/2] + iN[0, 1/2]$, is referred to as the Gaussian Unitary Ensemble (GUE) and is the main focus of the present article. Specifically our interest is in the distribution of the largest eigenvalue, and the average values of powers (integer and fractional) of the characteristic polynomial $\prod_{l=1}^{N} (\lambda - \lambda_l)$ for such matrices.

Let $E_N(0; (s, \infty))$ denote the probability that there are no eigenvalues in the interval $(s, \infty)$ for $N \times N$ GUE matrices. The distribution of the largest eigenvalue $p_{\max}(s)$ is given in terms of $E_N(0; (s, \infty))$ by
$$p_{\max}(s) = \frac{d}{ds} E_N(0; (s, \infty)). \tag{1.2}$$
With $R_N(s)$ specified by the solution of the nonlinear equation
$$(R_N'')^2 + 4(R_N')^2 (R_N' + 2N) - 4(sR_N' - R_N)^2 = 0, \tag{1.3}$$
(an example of the Jimbo–Miwa–Okamoto σ-form of the Painlevé IV differential equation; see Eq. (2.18) below) subject to the boundary condition
$$R_N(s) \underset{s \to \infty}{\sim} \frac{2^{N-1} s^{2N-2} e^{-s^2}}{\pi^{1/2} (N-1)!}, \tag{1.4}$$
it has been shown by Tracy and Widom [35] that
$$E_N(0; (s, \infty)) = \exp\Bigl(-\int_s^{\infty} R_N(t)\, dt\Bigr). \tag{1.5}$$
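The simplest case $N = 1$ gives a quick sanity check of (1.3)–(1.5). The following is a sketch under the assumption that the single eigenvalue has density $e^{-x^2}/\sqrt{\pi}$, so that $E_1(0;(s,\infty)) = \tfrac12(1+\mathrm{erf}\,s)$ and $R_1 = \frac{d}{ds}\log E_1$ obeys the Riccati equation $R_1' = -2sR_1 - R_1^2$. The function names and the trapezoid rule with its cut-off are illustrative choices.

```python
import math


def r1(s):
    """R_1(s) = d/ds log E_1(0;(s,inf)) for the 1x1 GUE with weight exp(-x^2)."""
    phi = math.exp(-s * s)
    big_phi = 0.5 * math.sqrt(math.pi) * (1.0 + math.erf(s))
    return phi / big_phi


def sigma_residual(s, n=1):
    """Left-hand side of (1.3) for N = n = 1, using R' = -2 s R - R^2."""
    r = r1(s)
    dr = -2.0 * s * r - r * r
    d2r = -2.0 * r - 2.0 * s * dr - 2.0 * r * dr
    return d2r ** 2 + 4.0 * dr ** 2 * (dr + 2.0 * n) - 4.0 * (s * dr - r) ** 2


def gap_probability(s, upper=8.0, steps=40_000):
    """exp(-int_s^upper R_1(t) dt) by the trapezoid rule, cf. (1.5)."""
    h = (upper - s) / steps
    total = 0.5 * (r1(s) + r1(upper))
    total += sum(r1(s + m * h) for m in range(1, steps))
    return math.exp(-h * total)


if __name__ == "__main__":
    for s in (-1.0, 0.0, 1.0):
        print(s, sigma_residual(s), gap_probability(s), 0.5 * (1.0 + math.erf(s)))
```

The boundary condition (1.4) for $N = 1$ reads $R_1(s) \sim e^{-s^2}/\sqrt{\pi}$, which is what `r1` approaches for large positive `s`.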
Table 1. Correspondence between random matrix theory and Painlevé theory

Painlevé theory                            | Random matrix ensembles
τ-function τ[N](s; a)                      | Gap probability E_N(s); averages Ẽ_N(s; a), F_N(s; a)
Hamiltonian H[N](s; a)                     | Resolvent kernel R_N(s; a); logarithmic derivative of averages – U_N(s; a), V_N(s; a)
Classical solutions – Weyl chamber walls   | Classical weights – Determinant structure
The derivation in [35] uses functional properties of Fredholm determinants (a subsequent derivation using the KP equations and Virasoro algebras has been given by Adler et al. [1]). In this work we will give a derivation of (1.5) based on the τ-function theory of the Painlevé IV equation due to Okamoto [26], and refined by Noumi and Yamada [21]. Our principal observation in employing this body of theory to problems in random matrix theory is that there is a deep and fundamental relationship between τ-functions relating to the Hamiltonian formalism of the Painlevé theory, and particular multiple integrals specifying averages with respect to the probability density function (1.1) in the case that $g(x)$ takes the form of a classical weight function. From the random matrix perspective the classical weights are [9]
$$g(x) = \begin{cases} e^{-x^2}, & \text{Gaussian} \\ x^a e^{-x} \ (x > 0), & \text{Laguerre} \\ (1-x)^a (1+x)^b \ (-1 < x < 1), & \text{Jacobi} \\ (1+x^2)^{-\alpha}, & \text{Cauchy.} \end{cases} \tag{1.6}$$
To summarise the correspondence, which applies to any of the cases, we set out Table 1. The quantity $\tilde E_N(s; a)$ in Table 1 is specified by
$$\tilde E_N(\lambda; a) := \Bigl\langle \prod_{l=1}^{N} \chi^{(l)}_{(-\infty,\lambda]} (\lambda - \lambda_l)^a \Bigr\rangle, \tag{1.7}$$
where $\chi^{(l)}_{(-\infty,\lambda]} = 1$ for $\lambda_l \in (-\infty, \lambda]$ and $\chi^{(l)}_{(-\infty,\lambda]} = 0$ otherwise, and the average is with respect to the eigenvalue probability density function (1.1). For general $a$ we obtain the evaluation
$$\tilde E_N(s; a) = \tilde E_N(s_0; a) \exp\Bigl(\int_{s_0}^{s} U_N(t; a)\, dt\Bigr) \tag{1.8}$$
(Eq. (4.14) with the substitution (4.10)), where $U_N(t; a)$ satisfies the nonlinear equation
$$(U_N'')^2 - 4(tU_N' - U_N)^2 + 4U_N'(U_N' - 2a)(U_N' + 2N) = 0, \tag{1.9}$$
(Eq. (4.15)) subject to the boundary condition
$$U_N(t; a) \underset{t \to -\infty}{\sim} -2Nt - \frac{N(a+N)}{t} + O\Bigl(\frac{1}{t^3}\Bigr), \tag{1.10}$$
(Eq. (4.18)). For $(N+1) \times (N+1)$ dimensional GUE matrices $p_{\max}(s)$ is proportional to $e^{-s^2} \tilde E_N(s; 2)$. We therefore have
$$p_{\max}(s)\Big|_{N \mapsto N+1} = p_{\max}(s_0)\Big|_{N \mapsto N+1} \exp\Bigl(\int_{s_0}^{s} [-2t + U_N(t; 2)]\, dt\Bigr), \tag{1.11}$$
(Eq. (4.20)). The quantity $F_N(s; a)$ in Table 1 is specified by
$$F_N(\lambda; a) := \Bigl\langle \prod_{l=1}^{N} (\lambda - \lambda_l)^a \Bigr\rangle, \tag{1.12}$$
(cf. Eq. (1.7)). For general positive integers $a$, (1.12) has been computed by Brézin and Hikami in terms of the determinant of an $a \times a$ matrix involving Hermite polynomials. Note that for $a$ not equal to a positive integer, (1.12) is well defined provided $\lambda$ has a non-zero imaginary part. For general $a$ we obtain the evaluation
$$F_N(\lambda; a) = F_N(\lambda_0; a) \exp\Bigl(\int_{\lambda_0}^{\lambda} V_N(t; a)\, dt\Bigr), \tag{1.13}$$
(Eq. (4.33)) where $V_N(t; a)$ also satisfies the nonlinear equation (4.15), but now with the boundary conditions
$$V_N(t; a) \underset{t \to \pm\infty}{\sim} \chi\, \frac{Na}{t} \bigl(1 + O(1/t)\bigr) \tag{1.14}$$
(Eq. (4.35)) where $\chi = 1$ for $t \to \infty$ and $|\chi| = 1$ for $t \to -\infty$. In the case $a = 2$ this average is proportional to the polynomial part of the eigenvalue density for $(N+1) \times (N+1)$ dimensional GUE matrices, which in terms of the Hermite polynomial $H_N(\lambda)$ is proportional to each of the $2 \times 2$ determinants termed Turánians [17]
$$\begin{vmatrix} H_N(\lambda) & H_{N+1}(\lambda) \\ H_N'(\lambda) & H_{N+1}'(\lambda) \end{vmatrix}, \qquad \begin{vmatrix} H_{N+1}(\lambda) & H_{N+1}'(\lambda) \\ H_{N+1}'(\lambda) & H_{N+1}''(\lambda) \end{vmatrix}, \qquad \begin{vmatrix} H_N(\lambda) & H_{N+1}(\lambda) \\ H_{N+1}(\lambda) & H_{N+2}(\lambda) \end{vmatrix}, \tag{1.15}$$
(which are of course proportional to each other). The result (4.33) with $a = 2$ implies
$$\rho(\lambda)\Big|_{N \mapsto N+1} = \rho(\lambda_0)\Big|_{N \mapsto N+1} \exp\Bigl(\int_{\lambda_0}^{\lambda} [-2t + V_N(t; 2)]\, dt\Bigr), \tag{1.16}$$
(Eq. (4.36)). In Sect. 2 we review the τ -function theory of the Painlevé IV equation, revising relevant aspects of the work of Okamoto [24, 26], Noumi and Yamada [21, 22] and Kajiwara et al. [16]. The culmination of this theory from our perspective is the derivation of determinant formula expressions for the τ -function corresponding to special values of the parameters in the Painlevé IV equation. On the other hand, it follows easily from the definitions that E˜ N and FN can be written as determinants. These are presented in Sect. 4. The determinant formulas in fact precisely coincide with those occurring in Sect. 2, so consequently we can characterise both E˜ N and FN in terms of solutions of the nonlinear equation (4.15). The theory presented in Sect. 2 also allows E˜ N , FN to be
characterised as solutions of a certain fourth order difference equation (Eq. (2.82)), and $U_N$, $V_N$ as solutions of a particular third order difference equation (4.19).

Also of interest is the scaling limit of (1.7) and (1.12) with $\lambda \mapsto \sqrt{2N} + \lambda/(\sqrt{2} N^{1/6})$. This choice of coordinate corresponds to shifting the origin to the edge of the leading order support of the eigenvalue density, then scaling the coordinate so as to make the spacings of order unity as $N \to \infty$. We find the scaled quantities can be expressed in terms of particular solutions of the general Jimbo–Miwa–Okamoto σ form of the Painlevé II equation
$$(u'')^2 + 4u'\bigl((u')^2 - su' + u\bigr) - a^2 = 0, \tag{1.17}$$
(Eq. (5.10)). Specifically, as already known from [36],
$$E^{\rm soft}(s) := \lim_{N \to \infty} E_N\Bigl(0; \Bigl(\sqrt{2N} + \frac{s}{\sqrt{2} N^{1/6}}, \infty\Bigr)\Bigr) = \exp\Bigl(-\int_s^{\infty} r(t)\, dt\Bigr), \tag{1.18}$$
where $r(s)$ satisfies (5.10) with $a = 0$. Also
$$\tilde E^{\rm soft}(s; a) := \lim_{N \to \infty} \Bigl[ C e^{-as^2/2} \tilde E_N(s; a) \Bigr]_{s \mapsto \sqrt{2N} + s/(\sqrt{2} N^{1/6})} = \tilde E^{\rm soft}(s_0; a) \exp\Bigl(\int_{s_0}^{s} u(t; a)\, dt\Bigr), \tag{1.19}$$
where $u(s; a)$ satisfies (5.10) subject to the boundary condition
$$u(s; a) \underset{s \to -\infty}{\sim} \tfrac{1}{4} s^2 + \frac{4a^2 - 1}{8s} + \frac{(4a^2 - 1)(4a^2 - 9)}{64 s^4} + \dots, \tag{1.20}$$
(Eq. (5.11)). In the case $a = 2$, Eq. (5.7) gives the formula
$$p^{\rm soft}_{\max}(s) = p^{\rm soft}_{\max}(s_0) \exp\Bigl(\int_{s_0}^{s} u(t; 2)\, dt\Bigr), \tag{1.21}$$
(Eq. (5.20)) for the scaled distribution of the largest eigenvalue in the GUE. Analogous to the formula (5.7), for the scaled limit of $F_N(\lambda; a)$ we have
$$F^{\rm soft}(\lambda; a) := \lim_{N \to \infty} \Bigl[ C e^{-a\lambda^2/2} F_N(\lambda; a) \Bigr]_{\lambda \mapsto \sqrt{2N} + \lambda/(\sqrt{2} N^{1/6})} = F^{\rm soft}(\lambda_0; a) \exp\Bigl(\int_{\lambda_0}^{\lambda} v(t; a)\, dt\Bigr), \tag{1.22}$$
(Eq. (5.31)) where $v(s; a)$, like $u(s; a)$, satisfies (5.10). The difference between $u$ and $v$ is in the boundary condition; for the latter we require
$$v(t; a) \underset{t \to \infty}{\sim} -a t^{1/2} - \frac{a^2}{4t} + \frac{a(4a^2 + 1)}{32 t^{5/2}} \tag{1.23}$$
(Eq. (5.35)). The case $a = 2$ corresponds to the scaled eigenvalue density at the spectrum edge, which has the known evaluation [10]
$$\rho^{\rm soft}(s) = - \begin{vmatrix} \mathrm{Ai}(s) & \mathrm{Ai}'(s) \\ \mathrm{Ai}'(s) & \mathrm{Ai}''(s) \end{vmatrix}, \tag{1.24}$$
where $\mathrm{Ai}(s)$ denotes the Airy function. In fact for all $a \in \mathbb{Z}_{\ge 0}$ we have the determinantal form
$$F^{\rm soft}(\lambda; a) = (-1)^{a(a-1)/2} \det\Bigl[ \frac{d^{j+k}}{d\lambda^{j+k}} \mathrm{Ai}(\lambda) \Bigr]_{j,k=0,\dots,a-1}, \tag{1.25}$$
(Eq. (5.33)).

In Sect. 3 we present the τ-function theory of the Painlevé II equation in an analogous fashion to the theory presented in Sect. 2 for the Painlevé IV equation. In particular we derive the second order second degree equation satisfied by the Hamiltonian (which is known from [14] and [26]) as well as a fourth order difference equation satisfied by the τ-functions. Also derived is the fact that the right-hand side of (5.33) corresponds to a τ-function sequence in the PII theory, which is a result of Okamoto [26]. In Sect. 5 the results (5.3), (5.7), (5.31) and (5.33) are derived from a limiting process applied to the corresponding finite $N$ results. A programme for further study is outlined in Sect. 6.

2. τ-Function Theory for PIV

2.1. Affine Weyl group symmetry. It has been demonstrated in the works of Okamoto [26] (in a series of papers treating all the Painlevé equations), Noumi and Yamada [21] (see also their works [20, 23, 22]) and the earlier work of Adler [2] that the fourth Painlevé equation
$$y'' = \frac{1}{2y}(y')^2 + \frac{3}{2} y^3 + 4ty^2 + 2(t^2 - \alpha)y + \frac{\beta}{y}, \tag{2.1}$$
can be recast in a way which reveals its symmetries in a particularly manifest and transparent form.

Proposition 1 ([21, 22]). The fourth Painlevé equation is equivalent to the coupled set of autonomous differential equations (where $' = d/dt$)
$$\begin{aligned} f_0' &= f_0(f_1 - f_2) + 2\alpha_0, \\ f_1' &= f_1(f_2 - f_0) + 2\alpha_1, \\ f_2' &= f_2(f_0 - f_1) + 2\alpha_2, \end{aligned} \tag{2.2}$$
with $y = -f_1$ and where the parameters $\alpha_j \in \mathbb{R}$ with $\alpha_0 + \alpha_1 + \alpha_2 = 1$ are related by
$$\alpha = \alpha_0 - \alpha_2, \qquad \beta = -2\alpha_1^2, \tag{2.3}$$
and the constraint taken conventionally as
$$f_0 + f_1 + f_2 = 2t. \tag{2.4}$$

Proof. Equation (2.4) reduces the three first order equations of (2.2) down to two. Eliminating a further variable by introducing a second derivative shows that $y = -f_1$ satisfies the PIV equation. The form of these equations implies
$$(f_0 + f_1 + f_2)' = 2\alpha_0 + 2\alpha_1 + 2\alpha_2 = k, \qquad k \ne 0 \text{ constant}, \tag{2.5}$$
thus permitting the normalization given above.
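The elimination step in the proof of Proposition 1 can be checked pointwise. The sketch below assumes arbitrary sample values for $(t, f_0, f_1)$ and the parameters, fixes $f_2$ from the constraint (2.4), differentiates $f_1' = f_1(f_2 - f_0) + 2\alpha_1$ once more with the product rule, and evaluates the residual of (2.1) for $y = -f_1$. The function names are illustrative.

```python
def symmetric_rhs(f, alpha):
    """Right-hand sides of the symmetric system (2.2)."""
    f0, f1, f2 = f
    a0, a1, a2 = alpha
    return (f0 * (f1 - f2) + 2 * a0,
            f1 * (f2 - f0) + 2 * a1,
            f2 * (f0 - f1) + 2 * a2)


def piv_residual(t, f0, f1, alpha):
    """y'' minus the right-hand side of (2.1) for y = -f1, with f2 from (2.4)."""
    a0, a1, a2 = alpha
    f2 = 2 * t - f0 - f1                      # constraint (2.4)
    df0, df1, df2 = symmetric_rhs((f0, f1, f2), alpha)
    d2f1 = df1 * (f2 - f0) + f1 * (df2 - df0)  # derivative of f1' = f1 (f2 - f0) + 2 a1
    y, dy, d2y = -f1, -df1, -d2f1
    alpha_piv = a0 - a2                       # (2.3)
    beta_piv = -2 * a1 ** 2                   # (2.3)
    rhs = (dy ** 2 / (2 * y) + 1.5 * y ** 3 + 4 * t * y ** 2
           + 2 * (t * t - alpha_piv) * y + beta_piv / y)
    return d2y - rhs


if __name__ == "__main__":
    print(piv_residual(0.7, 0.4, -1.1, (0.2, 0.5, 0.3)))
```

The residual vanishes up to rounding for any admissible sample point, and the three right-hand sides of (2.2) always sum to $2(\alpha_0 + \alpha_1 + \alpha_2)$, as in (2.5).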
[Figure: triangular lattice in the parameter plane, with the lines $\alpha_0, \alpha_1, \alpha_2 = $ integer constants labelled and the translations $T_1$, $T_2$, $T_3$ indicated.]

Fig. 1. Parameter space for $(\alpha_0, \alpha_1, \alpha_2)$ associated with the simple roots of the root system $A_2^{(1)}$
Note. Many differing conventions are in use for such a description of the PIV system and for example we have written $2\alpha_j$ ($j = 0, 1, 2$) in place of the $\alpha_j$ used in [21, 23, 22] in order to eliminate unnecessary factors of two appearing in the ensuing theory.

The hyperplane $\alpha_0 + \alpha_1 + \alpha_2 = 1$ in parameter space $(\alpha_0, \alpha_1, \alpha_2) \in \mathbb{R}^3$ is associated with the simple roots $\alpha_0, \alpha_1, \alpha_2$ spanning the root system of type $A_2^{(1)}$. From this perspective the parameters $\alpha_0$, $\alpha_1$ and $\alpha_2$ define a triangular lattice in the plane (see Fig. 1). Let the fundamental reflections $s_i$ ($i = 0, 1, 2$) represent the automorphism of the lattice specified by a reflection with respect to the line $\alpha_i = 0$. Their action on the simple roots is given by
$$s_i(\alpha_j) = \alpha_j - \alpha_i a_{ij}, \tag{2.6}$$
where $a_{ij}$ are the elements of the Cartan matrix
$$A = \begin{pmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix}. \tag{2.7}$$
Let $\pi$ represent the lattice automorphism corresponding to a rotation by $120^\circ$ around the barycentre of the fundamental alcove $C$ defined by $\alpha_i > 0$ ($i = 0, 1, 2$). Then
$$\pi(\alpha_j) = \alpha_{j+1}, \tag{2.8}$$
$j \in \mathbb{Z}/3\mathbb{Z}$. The operators $\pi$, $s_i$ obey the algebra
$$s_j^2 = 1, \qquad (s_j s_{j+1})^3 = 1, \qquad s_j s_{j\pm1} s_j = s_{j\pm1} s_j s_{j\pm1}, \qquad \pi^3 = 1, \qquad \pi s_j = s_{j+1} \pi, \tag{2.9}$$
and generate $\widetilde W = \langle \pi, s_0, s_1, s_2 \rangle$, defining an extension of the affine Weyl group associated with the $A_2^{(1)}$ root system.

Proposition 2 ([20, 23]). The Bäcklund transformations of the PIV system are given by the actions of the extended affine Weyl group $\widetilde W$ on the parameters as specified by (2.6) and (2.8), and on the functions as specified by
$$s_i(f_j) = f_j + \frac{2\alpha_i}{f_i} u_{ij}, \qquad \pi(f_j) = f_{j+1} \quad (i, j = 0, 1, 2), \tag{2.10}$$
where the $u_{ij}$ are the elements of the orientation matrix
$$U = \begin{pmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix}, \tag{2.11}$$
associated with the boundary of the fundamental alcove [21].

Proof. Let $V$ denote one of $\pi, s_0, s_1, s_2$ and let $\beta_i := V(\alpha_i)$. Using (2.6) and (2.8) it is a simple exercise to verify explicitly that $g_i := V(f_i)$ of the form (2.10) satisfy the structurally identical equations
$$\begin{aligned} g_0' &= g_0(g_1 - g_2) + 2\beta_0, \\ g_1' &= g_1(g_2 - g_0) + 2\beta_1, \\ g_2' &= g_2(g_0 - g_1) + 2\beta_2, \end{aligned} \tag{2.12}$$
thus giving rise to the stated Bäcklund transformation.
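The "simple exercise" in the proof of Proposition 2 can be delegated to a few lines of code. This is a minimal sketch for the reflections $s_0, s_1, s_2$, assuming arbitrary sample values of $f$ and $\alpha$: it forms $g = s_i(f)$ and $\beta = s_i(\alpha)$ from (2.10) and (2.6), differentiates $g$ by the chain rule using (2.2), and compares with the right-hand side of (2.12). The helper names are illustrative.

```python
U = ((0, 1, -1), (-1, 0, 1), (1, -1, 0))          # orientation matrix (2.11)
CARTAN = ((2, -1, -1), (-1, 2, -1), (-1, -1, 2))  # Cartan matrix (2.7)


def rhs(f, alpha):
    """Right-hand sides of (2.2) (and of (2.12) when called with g, beta)."""
    return tuple(f[j] * (f[(j + 1) % 3] - f[(j + 2) % 3]) + 2 * alpha[j]
                 for j in range(3))


def reflect(i, f, alpha):
    """Images g = s_i(f) and beta = s_i(alpha) from (2.10) and (2.6)."""
    g = tuple(f[j] + 2 * alpha[i] * U[i][j] / f[i] for j in range(3))
    beta = tuple(alpha[j] - alpha[i] * CARTAN[i][j] for j in range(3))
    return g, beta


def backlund_residual(i, f, alpha):
    """max_j |g_j' - (right-hand side of (2.12))_j| at one sample point."""
    df = rhs(f, alpha)
    g, beta = reflect(i, f, alpha)
    # chain rule applied to g_j = f_j + 2 alpha_i u_ij / f_i
    dg = tuple(df[j] - 2 * alpha[i] * U[i][j] * df[i] / f[i] ** 2
               for j in range(3))
    return max(abs(x - y) for x, y in zip(dg, rhs(g, beta)))


if __name__ == "__main__":
    sample_f, sample_alpha = (0.4, -1.1, 2.3), (0.2, 0.5, 0.3)
    print([backlund_residual(i, sample_f, sample_alpha) for i in range(3)])
```

The rotation $\pi$ needs no computation: relabelling $f_j \mapsto f_{j+1}$, $\alpha_j \mapsto \alpha_{j+1}$ maps the cyclically symmetric system (2.2) to itself.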
Following [22, 16], in Tables 2, 3 the actions (2.6), (2.8) and (2.10) are listed in tabular format.

Table 2. Action of the generators of the extended affine Weyl group associated with the root system $A_2^{(1)}$ on the simple roots

       α0         α1         α2
s0     −α0        α0 + α1    α0 + α2
s1     α1 + α0    −α1        α1 + α2
s2     α2 + α0    α2 + α1    −α2
π      α1         α2         α0
T1     α0 + 1     α1 − 1     α2
T2     α0         α1 + 1     α2 − 1
T3     α0 − 1     α1         α2 + 1

From the earlier work of Okamoto it has been known that the PIV system, as for all the Painlevé transcendents, admits a Hamiltonian formulation and that from this viewpoint the Bäcklund transformations are birational canonical transformations $\{q, p; H\} \mapsto \{\tilde q, \tilde p; \tilde H\}$.
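The rows $T_1$, $T_2$, $T_3$ of Table 2 follow from the rows for $s_0, s_1, s_2, \pi$ once the shift operators are written as $T_1 = \pi s_2 s_1$, $T_2 = s_1 \pi s_2$, $T_3 = s_2 s_1 \pi$ (their definition in Sect. 2.2). The sketch below stores each linear form $c_0\alpha_0 + c_1\alpha_1 + c_2\alpha_2$ as a coefficient tuple and assumes the composition convention $(w_1 w_2)(\alpha_j) = w_1(w_2(\alpha_j))$, which is the convention that reproduces the table; all names are illustrative.

```python
from functools import reduce

# Images of (alpha_0, alpha_1, alpha_2) under each generator, read off from
# Table 2; a linear form c0*a0 + c1*a1 + c2*a2 is stored as (c0, c1, c2).
S0 = ((-1, 0, 0), (1, 1, 0), (1, 0, 1))
S1 = ((1, 1, 0), (0, -1, 0), (0, 1, 1))
S2 = ((1, 0, 1), (0, 1, 1), (0, 0, -1))
PI = ((0, 1, 0), (0, 0, 1), (1, 0, 0))
IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))


def apply(op, form):
    """Apply the automorphism `op` to a linear form, extending linearly."""
    return tuple(sum(form[j] * op[j][k] for j in range(3)) for k in range(3))


def compose(*ops):
    """Product w1 w2 ... wm with (w1 w2)(alpha_j) = w1(w2(alpha_j))."""
    return reduce(lambda w1, w2: tuple(apply(w1, w2[j]) for j in range(3)), ops)


def shifts(op):
    """Integers m_j with op(alpha_j) = alpha_j + m_j * (alpha_0 + alpha_1 + alpha_2)."""
    result = []
    for j in range(3):
        diff = tuple(op[j][k] - IDENTITY[j][k] for k in range(3))
        if not diff[0] == diff[1] == diff[2]:
            raise ValueError("operator is not a translation")
        result.append(diff[0])
    return tuple(result)


T1 = compose(PI, S2, S1)
T2 = compose(S1, PI, S2)
T3 = compose(S2, S1, PI)

if __name__ == "__main__":
    print(shifts(T1), shifts(T2), shifts(T3))
```

Since $\alpha_0 + \alpha_1 + \alpha_2 = 1$ on the parameter hyperplane, the integers returned by `shifts` are exactly the unit shifts listed in the last three rows of Table 2.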
Table 3. Bäcklund transformations for the PIV system

       f0               f1               f2
s0     f0               f1 + 2α0/f0      f2 − 2α0/f0
s1     f0 − 2α1/f1      f1               f2 + 2α1/f1
s2     f0 + 2α2/f2      f1 − 2α2/f2      f2
π      f1               f2               f0
Proposition 3 ([26, 16]). The PIV dynamical system is a Hamiltonian system $\{q, p; H\}$ with the Hamiltonian
$$H = (2p - q - 2t)pq - 2\alpha_1 p - \alpha_2 q = \tfrac{1}{2} f_0 f_1 f_2 + \alpha_2 f_1 - \alpha_1 f_2, \tag{2.13}$$
and canonical variables $q$, $p$
$$-f_1 = q, \qquad f_2 = 2p. \tag{2.14}$$

Proof. With $H$ specified by (2.13), Hamilton's equations of motion read
$$q' = \frac{\partial H}{\partial p} = q(4p - q - 2t) - 2\alpha_1, \qquad p' = -\frac{\partial H}{\partial q} = p(2q - 2p + 2t) + \alpha_2. \tag{2.15}$$
Substituting for $p$ and $q$ according to (2.14) shows that these equations are identical to the final two equations in (2.2).

Note. Because $-f_1$ satisfies the PIV equation (2.1), it follows immediately from the first equation in (2.14) that $q$ satisfies the PIV equation (2.1). Furthermore, use of the first equation in (2.15) shows
$$p = \frac{1}{4q}\bigl(q' + q^2 + 2tq + 2\alpha_1\bigr), \tag{2.16}$$
so $H$ is completely specified in terms of the Painlevé IV transcendent (2.1) with parameters (2.3). There is a degree of ambiguity in constructing a Hamiltonian in that arbitrary functions of time can be added, and in fact there is a more symmetrical form
$$H_S = \tfrac{1}{2} f_0 f_1 f_2 + \tfrac{1}{3}(\alpha_1 - \alpha_2) f_0 + \tfrac{1}{3}(\alpha_1 + 2\alpha_2) f_1 - \tfrac{1}{3}(2\alpha_1 + \alpha_2) f_2, \tag{2.17}$$
which is central to the Okamoto theory (termed the auxiliary Hamiltonian). However for our purposes this complicates some later results so we prefer the unsymmetrical form. Furthermore, in the full theory of PIV [21, 22] the Hamiltonian $H_0 \equiv H$ is associated with two additional Hamiltonians $H_1 = \pi(H_0)$, $H_2 = \pi^2(H_0)$ but these are not required in the random matrix context. It is also true that $H(t)$ can be specified as the solution of a certain second order second degree equation.
Proposition 4 ([26, 14]). The Hamiltonian (2.13) satisfies the second order second degree differential equation of the Jimbo–Miwa–Okamoto σ form for PIV,
$$(H'')^2 - 4(tH' - H)^2 + 4H'(H' + 2\alpha_1)(H' - 2\alpha_2) = 0. \tag{2.18}$$

Proof. Making use of Hamilton's equations (2.15), we have for $H(t) = H(t; q(t), p(t))$,
$$H' = f_1 f_2, \tag{2.19}$$
$$H'' = f_1 f_2 (f_2 - f_1) + 2\alpha_2 f_1 + 2\alpha_1 f_2. \tag{2.20}$$
Use of (2.13) and (2.19) in (2.20) shows
$$f_1 = \frac{-\tfrac{1}{2} H'' + tH' - H}{H' - 2\alpha_2}, \qquad f_2 = \frac{\tfrac{1}{2} H'' + tH' - H}{H' + 2\alpha_1}. \tag{2.21}$$
Substituting (2.21) in (2.13) gives the desired equation (2.18).
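The chain (2.13), (2.19), (2.20), (2.21), (2.18) is purely algebraic once the constraint (2.4) is imposed, so it can be tested at a single point. A minimal sketch, assuming arbitrary sample values for $(t, f_0, f_1)$ and for $\alpha_1, \alpha_2$ (the function name is illustrative):

```python
def hamiltonian_data(t, f0, f1, alpha1, alpha2):
    """Return H, H', H'' from (2.13), (2.19), (2.20) with f2 fixed by (2.4)."""
    f2 = 2 * t - f0 - f1
    h = 0.5 * f0 * f1 * f2 + alpha2 * f1 - alpha1 * f2
    dh = f1 * f2
    d2h = f1 * f2 * (f2 - f1) + 2 * alpha2 * f1 + 2 * alpha1 * f2
    return h, dh, d2h, f2


if __name__ == "__main__":
    t, f0, f1, a1, a2 = 0.7, 0.4, -1.1, 0.5, 0.3
    h, dh, d2h, f2 = hamiltonian_data(t, f0, f1, a1, a2)
    # residual of the sigma form (2.18)
    print(d2h ** 2 - 4 * (t * dh - h) ** 2 + 4 * dh * (dh + 2 * a1) * (dh - 2 * a2))
    # f1 and f2 recovered from (2.21)
    print((-0.5 * d2h + t * dh - h) / (dh - 2 * a2), f1)
    print((0.5 * d2h + t * dh - h) / (dh + 2 * a1), f2)
```

Multiplying the two expressions in (2.21) and using $H' = f_1 f_2$ from (2.19) is what produces (2.18).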
For future reference we note that use of Tables 2, 3 shows that under the action of the generators of $\widetilde W$, $H$ transforms according to
$$s_0(H) = H + \frac{2\alpha_0}{f_0}, \qquad s_1(H) = H + 2\alpha_1 t, \qquad s_2(H) = H - 2\alpha_2 t, \qquad \pi(H) = H + f_2 - 2\alpha_2 t. \tag{2.22}$$
2.2. Toda lattice equation. The τ-function $\tau = \tau(t)$ is defined in terms of the Hamiltonian $H(t)$ by
$$H(t) = \frac{d}{dt} \log \tau(t). \tag{2.23}$$
It is possible to derive a Toda lattice equation for the sequences of τ-functions $\{\tau_k[n]\}_{n=0,1,\dots}$ ($k = 1, 3$) associated with the Hamiltonians
$$H\Big|_{\substack{\alpha_0 \mapsto \alpha_0 + n \\ \alpha_1 \mapsto \alpha_1 - n}}, \qquad H\Big|_{\substack{\alpha_0 \mapsto \alpha_0 + n \\ \alpha_2 \mapsto \alpha_2 - n}}, \tag{2.24}$$
respectively (the reason for the subscripts 1 and 3 on τ will become apparent subsequently). An essential point is that there exist shift operators from the algebra $\widetilde W$ which after $n$ applications on $H$ generate the shifts required by (2.24). There are in fact three fundamental shift operators [21] $T_1 := \pi s_2 s_1$, $T_2 := s_1 \pi s_2$, $T_3 := s_2 s_1 \pi$ corresponding to translations on the root lattice by the fundamental weights $\tilde\omega_j$, $j = 1, 2, 3$ of the root system $A_2^{(1)}$. As can be checked from Tables 2, 3 and (2.22) these operators have the property that
$$T_1 H = H\Big|_{\substack{\alpha_0 \mapsto \alpha_0 + 1 \\ \alpha_1 \mapsto \alpha_1 - 1}}, \qquad T_3^{-1} H = H\Big|_{\substack{\alpha_0 \mapsto \alpha_0 + 1 \\ \alpha_2 \mapsto \alpha_2 - 1}}. \tag{2.25}$$
Table 2 also shows that when acting on the parameters themselves, the same shifts occurring in the transformed Hamiltonian result, and thus
$$T_1(\alpha_0, \alpha_1, \alpha_2) = (\alpha_0 + 1, \alpha_1 - 1, \alpha_2), \qquad T_3^{-1}(\alpha_0, \alpha_1, \alpha_2) = (\alpha_0 + 1, \alpha_1, \alpha_2 - 1). \tag{2.26}$$
After a further $n$ iterations the equations (2.25) can be written in the form
$$T_1^{n+1} H - T_1^n H = f_{(1)2}[n], \qquad T_3^{-(n+1)} H - T_3^{-n} H = -f_{(3)1}[n], \tag{2.27}$$
where the subscripts (1) ((3)) refer to the system of Eqs. (2.2) with the parameters replaced as in the first (second) Hamiltonian (2.24) and use has been made of (2.13). We remark that the two results of (2.25) are inter-related. Thus consider the mapping $\omega$ defined by multiplication by $-1$ together with the replacements
$$(\alpha_0, \alpha_1, \alpha_2) \mapsto (-\alpha_0, -\alpha_2, -\alpha_1), \qquad (f_0, f_1, f_2) \mapsto (-f_0, -f_2, -f_1). \tag{2.28}$$
We see immediately that the system (2.2) is unchanged by $\omega$, as is the Hamiltonian (2.13), while we can check from Table 2 that
$$\omega T_1 \omega = T_3^{-1}. \tag{2.29}$$
Applying $\omega$ to the first equation of (2.25) using (2.28) and (2.29) gives the second equation. With the τ-functions $\tau_1[n]$ and $\tau_3[n]$ defined by
$$T_1^n H = \frac{d}{dt} \log \tau_1[n], \qquad T_3^{-n} H = \frac{d}{dt} \log \tau_3[n], \tag{2.30}$$
application of (2.29) shows
$$\omega \tau_1[n] = C \tau_3[n]. \tag{2.31}$$
In light of the relation (2.31), let us focus attention on the first equation of (2.25) only.

Proposition 5 ([26, 16]). The τ-function sequence $\tau_1[n]$ corresponding to the parameter sequence $(\alpha_0 + n, \alpha_1 - n, \alpha_2)$ obeys the Toda lattice equation
$$\frac{d^2}{dt^2} \log \sigma_1[n] = \frac{\sigma_1[n+1]\, \sigma_1[n-1]}{\sigma_1^2[n]}, \tag{2.32}$$
where
$$\sigma_1[n] := C e^{t^2(\alpha_1 - n)} \tau_1[n]. \tag{2.33}$$

Proof. Following [26, 16] we make use of the first equation in (2.27) and consider the difference
$$\bigl(T_1^{n+1} H - T_1^n H\bigr) - \bigl(T_1^n H - T_1^{n-1} H\bigr) = f_{(1)2}[n] - T_1^{-1} f_{(1)2}[n]. \tag{2.34}$$
A crucial fact, which follows from Table 3 and (2.15), is that this difference is a total derivative
$$f_{(1)2}[n] - T_1^{-1} f_{(1)2}[n] = \frac{d}{dt} \log\bigl( f_{(1)1}[n] f_{(1)2}[n] + 2(\alpha_1 - n) \bigr). \tag{2.35}$$
But it follows from (2.19) and (2.13) that
$$f_{(1)1}[n] f_{(1)2}[n] + 2(\alpha_1 - n) = \frac{d}{dt}\bigl( T_1^n H + 2t(\alpha_1 - n) \bigr) = \frac{d^2}{dt^2} \log\bigl( e^{t^2(\alpha_1 - n)} \tau_1[n] \bigr), \tag{2.36}$$
and hence the right-hand side of (2.35) is equal to
$$\frac{d}{dt} \log\Bigl( \frac{d^2}{dt^2} \log\bigl( e^{t^2(\alpha_1 - n)} \tau_1[n] \bigr) \Bigr). \tag{2.37}$$
On the other hand (2.34) and (2.30) show that the left-hand side of (2.35) is equal to
$$\frac{d}{dt} \log \frac{\tau_1[n+1]\, \tau_1[n-1]}{\tau_1^2[n]}. \tag{2.38}$$
Equating (2.37) and (2.38), and integrating shows that
$$\frac{d^2}{dt^2} \log\bigl( e^{t^2(\alpha_1 - n)} \tau_1[n] \bigr) = C\, \frac{\tau_1[n+1]\, \tau_1[n-1]}{\tau_1^2[n]}, \tag{2.39}$$
and the stated result (2.32) follows. There is the ambiguity of a multiplicative constant $C$, possibly dependent on $n$ but not on $t$, and this can be chosen freely, for example to render the Toda lattice equation in a simple form.

The Toda lattice equation obeyed by the $\tau_3[n]$ with parameters $(\alpha_0 + n, \alpha_1, \alpha_2 - n)$ is obtained by applying the mapping $\omega$ to both sides of (2.39) and making use of (2.31). This shows
$$\frac{d^2}{dt^2} \log\bigl( e^{-t^2(\alpha_2 - n)} \tau_3[n] \bigr) = C\, \frac{\tau_3[n+1]\, \tau_3[n-1]}{\tau_3^2[n]}, \tag{2.40}$$
which with
$$\sigma_3[n] := C e^{-t^2(\alpha_2 - n)} \tau_3[n], \tag{2.41}$$
gives the Toda lattice equation
$$\frac{d^2}{dt^2} \log \sigma_3[n] = \frac{\sigma_3[n+1]\, \sigma_3[n-1]}{\sigma_3^2[n]}. \tag{2.42}$$
Another way to deduce (2.40) is via (2.39) and the differential equation (2.18). Now, the second Hamiltonian in (2.24) is obtained from the first by simply interchanging $\alpha_1$ and $\alpha_2$. On the other hand $\alpha_1$ and $\alpha_2$ are interchanged in (2.18) if we replace $t$ by $it$ then replace $H(it)$ by $-iH(t)$. This tells us that in (2.39) we can make the replacements
$$t \mapsto it, \qquad \tau_1(it) \mapsto \tau_3(t), \qquad \alpha_1 \mapsto \alpha_2, \tag{2.43}$$
which indeed gives (2.40). Furthermore, since (2.43) shows $\tau_1$ and $\tau_3$ are simply related, it suffices to consider one sequence only, $\tau_3[n]$ say.
2.3. Classical solutions. For a special initial choice of the parameters it is possible to choose $\tau_3[0] = 1$, and then to determine $\tau_3[1]$ in terms of a classical function, that is to say the solution of a second order linear differential equation. What is essential here is the condition for the decoupling of the two independent first order differential equations so that what remains is a Riccati equation.

Proposition 6 ([26]). For the special initial choice of the parameters such that $\alpha_2 = 0$, i.e. for parameters $(1 - \alpha_1, \alpha_1, 0)$, the first nontrivial member of the τ-function sequence $\tau_3[1]$ satisfies the Hermite–Weber equation,
$$\tau_3''[1] = -2t\,\tau_3'[1] - 2\alpha_1 \tau_3[1]. \tag{2.44}$$

Proof. With $n = 0$, $(1 - \alpha_1 + n, \alpha_1, -n)$ implies $\alpha_2 = 0$. Now we see from (2.13) that
$$H\Big|_{\alpha_2 = 0} = pq(2p - q - 2t) - 2\alpha_1 p, \tag{2.45}$$
which allows us to take
$$H\Big|_{\alpha_2 = 0} = 0, \tag{2.46}$$
provided we set $p = 0$. Recalling (2.30) this implies
$$\tau_3[0] = 1. \tag{2.47}$$
We read off from the $n = 1$ case of the second equation in (2.27), together with (2.14), that
$$T_3^{-1} H\Big|_{\alpha_2 = 0} - H\Big|_{\alpha_2 = 0} = q, \tag{2.48}$$
and thus, after recalling the second equation in (2.30) and (2.46),
$$\frac{d}{dt} \log \tau_3[1] = q. \tag{2.49}$$
The first of Hamilton's equations (2.15) gives, with $\alpha_2 = 0$ and $p = 0$ (after the differentiation), the Riccati equation
$$q' = -q^2 - 2tq - 2\alpha_1. \tag{2.50}$$
Substituting (2.49) this reduces to the linear equation (2.44) first obtained in the present context by Okamoto [26].

Proposition 7. Two linearly independent solutions to the Toda lattice equation (2.40) for sequences of τ-functions with parameters $(\alpha_0 + n, \alpha_1, -n)$, $n \ge 0$, starting from the Weyl chamber wall $\alpha_2 = 0$ are given by the determinant forms
$$\tau_3[n](t; \alpha_1) = C \det\Bigl[ \int_{-\infty}^{t} (t - x)^{-\alpha_1 + i + j} e^{-x^2}\, dx \Bigr]_{i,j=0,\dots,n-1}, \tag{2.51}$$
and
$$\bar\tau_3[n](t; \alpha_1) = C \det\Bigl[ \int_{-\infty}^{\infty} (t - x)^{-\alpha_1 + i + j} e^{-x^2}\, dx \Bigr]_{i,j=0,\dots,n-1}. \tag{2.52}$$
Proof. In the special case $\alpha_1 = 0$, we observe that a solution of (2.44) is
$$\tau_3[1] = C \int_{-\infty}^{t} e^{-x^2}\, dx. \tag{2.53}$$
In fact it is possible to solve (2.44) in a form analogous to (2.53) for general $\alpha_1$. Thus consider the integral
$$I_a(t) := \int_{-\infty}^{t} (t - x)^a e^{-x^2}\, dx, \tag{2.54}$$
and suppose temporarily that $\mathrm{Re}(a) > -1$. Simple manipulation gives
$$I_a(t) = t I_{a-1}(t) + \tfrac{1}{2} \int_{-\infty}^{t} (t - x)^{a-1} \frac{d}{dx} e^{-x^2}\, dx = t I_{a-1}(t) + \frac{a - 1}{2} I_{a-2}(t). \tag{2.55}$$
But
$$(a - 1) I_{a-2}(t) = \frac{1}{a} \frac{d^2}{dt^2} I_a(t), \qquad I_{a-1}(t) = \frac{1}{a} \frac{d}{dt} I_a(t). \tag{2.56}$$
Thus we see that $I_{-\alpha_1}(t)$ satisfies (2.44) and this implies
$$\tau_3[1] = C \int_{-\infty}^{t} (t - x)^{-\alpha_1} e^{-x^2}\, dx, \tag{2.57}$$
where we require $\mathrm{Re}(\alpha_1) < 1$. Starting with (2.47) and (2.57), up to a multiplicative constant the τ-functions $\tau_3[n]$ ($n = 2, 3, \dots$) are uniquely specified by the Toda lattice equation (2.42). In fact it was known to Sylvester ([19, pp. 115–117]) that the solution of (2.42) with initial condition $\sigma_3[0] = 1$ is the double Wronskian or Hankel determinant
$$\sigma_3[n] = \det\Bigl[ \frac{d^{i+j}}{dt^{i+j}} \sigma_3[1] \Bigr]_{i,j=0,\dots,n-1}. \tag{2.58}$$
Recalling (2.41) and (2.57) we therefore have
$$\tau_3[n] := \tau_3[n](t; \alpha_1) = C \det\Bigl[ e^{-t^2} \frac{d^{i+j}}{dt^{i+j}} \Bigl( e^{t^2} \int_{-\infty}^{t} (t - x)^{-\alpha_1} e^{-x^2}\, dx \Bigr) \Bigr]_{i,j=0,\dots,n-1}. \tag{2.59}$$
Making use of (2.55) we can check that
$$\frac{d^p}{dt^p} \Bigl( e^{t^2} \int_{-\infty}^{t} (t - x)^{-\alpha_1} e^{-x^2}\, dx \Bigr) = 2^p e^{t^2} \int_{-\infty}^{t} (t - x)^{-\alpha_1 + p} e^{-x^2}\, dx, \tag{2.60}$$
so (2.59) can also be written in the final form of (2.51).
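Sylvester's statement (2.58) can be illustrated with a seed whose derivatives are available in closed form. The sketch below assumes the seed $\sigma_3[1] = e^{t^2}$, which by (2.41) with $\alpha_2 = 0$ is, up to a constant, $e^{t^2}$ times a τ-function that is constant in $t$; it builds the Hankel determinants and compares a central-difference estimate of $\frac{d^2}{dt^2}\log\sigma_3[n]$ with the right-hand side of (2.42). The step size `h` and all function names are illustrative.

```python
import math


def derivatives(t, order):
    """f^(k)(t), k = 0..order, for f = exp(t^2), via f^(k+1) = 2 t f^(k) + 2 k f^(k-1)."""
    d = [math.exp(t * t)]
    for k in range(order):
        previous = d[k - 1] if k >= 1 else 0.0
        d.append(2 * t * d[k] + 2 * k * previous)
    return d


def det(m):
    """Determinant by cofactor expansion (adequate for the small sizes used here)."""
    if not m:
        return 1.0
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for col, entry in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * entry * det(minor)
    return total


def hankel(t, n):
    """sigma[n] = det[ d^(i+j)/dt^(i+j) sigma[1] ], i, j = 0..n-1, as in (2.58)."""
    d = derivatives(t, max(2 * n - 2, 0))
    return det([[d[i + j] for j in range(n)] for i in range(n)])


def toda_residual(n, t, h=1e-3):
    """(log sigma[n])'' - sigma[n+1] sigma[n-1] / sigma[n]^2, cf. (2.42)."""
    log_sigma = lambda x: math.log(hankel(x, n))
    lhs = (log_sigma(t + h) - 2 * log_sigma(t) + log_sigma(t - h)) / h ** 2
    rhs = hankel(t, n + 1) * hankel(t, n - 1) / hankel(t, n) ** 2
    return lhs - rhs


if __name__ == "__main__":
    print([toda_residual(n, 0.5) for n in (1, 2)])
```

For this seed the determinants are $\sigma_3[2] = 2e^{2t^2}$ and $\sigma_3[3] = 16e^{3t^2}$, so both sides of (2.42) equal $2n$ for $n = 1, 2$.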
The second linearly independent solution of (2.44) can also be written in an integral form similar to (2.54). Thus we see the integral
$$\bar I_a(t) := \int_{-\infty}^{\infty} (t - x)^a e^{-x^2}\, dx, \tag{2.61}$$
satisfies the formulas (2.55) and (2.56), and thus satisfies (2.44) with $a = -\alpha_1$. Hence in addition to (2.57) we have the solution
$$\bar\tau_3[1] = C \int_{-\infty}^{\infty} (t - x)^{-\alpha_1} e^{-x^2}\, dx, \tag{2.62}$$
(note that for $\alpha_1$ not equal to a non-positive integer, this is well defined only if $t \notin \mathbb{R}$). Proceeding as in the derivation of (2.51), we deduce from the Toda lattice equation (2.42), and the initial values (2.47), (2.62), the sequence of τ-functions given by (2.52).

Let us now consider the sequence of τ-functions $\bar\tau_1[n]$.

Proposition 8 ([26]). The sequence of τ-function solutions to the Toda lattice equation (2.39) $\bar\tau_1[n]$, $n \ge 0$, corresponding to the parameter sequence $(\alpha_0 + n, -n, \alpha_2)$ starting from the line $\alpha_1 = 0$ has the determinantal form
$$\bar\tau_1[n](t; -p) = C \det\bigl[ H_{p+i+j}(t) \bigr]_{i,j=0,\dots,n-1} \tag{2.63}$$
for $-\alpha_2 = p \in \mathbb{Z}_{\ge 0}$.

Proof. This sequence can be obtained from $\bar\tau_3[n]$ by the (inverse of) the mappings (2.43). Replacing $t$ by $-it$ in (2.51) does not lead to an integral of interest in random matrix applications, but doing the same in (2.52) gives
$$\bar\tau_1[n](t; \alpha_2) = C \det\Bigl[ \int_{-\infty}^{\infty} (t - ix)^{-\alpha_2 + i + j} e^{-x^2}\, dx \Bigr]_{i,j=0,\dots,n-1}, \tag{2.64}$$
which is of interest. We recall that for $\bar\tau_1[n]$ the parameters $(\alpha_0, \alpha_1, \alpha_2)$ in the corresponding Hamiltonian are given by
$$(1 - \alpha_2 + n, -n, \alpha_2). \tag{2.65}$$
For $p \in \mathbb{Z}_{\ge 0}$ we know
$$\int_{-\infty}^{\infty} (t - ix)^p e^{-x^2}\, dx = \sqrt{\pi}\, 2^{-p} H_p(t), \tag{2.66}$$
and thus setting $\alpha_2 = -p$ equation (2.64) yields (2.63).

Note that with $p = N$, $n = 2$, this is precisely the final determinant in (1.15).
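Both the integral representation (2.66) and the mutual proportionality of the three determinants in (1.15) are easy to test numerically. The sketch below assumes the physicists' normalisation $H_{k+1}(t) = 2tH_k(t) - 2kH_{k-1}(t)$ with $H_k' = 2kH_{k-1}$, and uses a plain trapezoid rule on a truncated interval; the cut-off, step count and function names are illustrative choices.

```python
import math


def hermite(n, t):
    """Physicists' Hermite polynomial H_n(t) via H_{k+1} = 2 t H_k - 2 k H_{k-1}."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, 2.0 * t
    for k in range(1, n):
        h_prev, h = h, 2.0 * t * h - 2.0 * k * h_prev
    return h


def gaussian_integral(p, t, half_width=10.0, steps=20_000):
    """Trapezoid approximation of the left-hand side of (2.66)."""
    dx = 2.0 * half_width / steps
    total = 0j
    for m in range(steps + 1):
        x = -half_width + m * dx
        weight = 0.5 if m in (0, steps) else 1.0
        total += weight * (t - 1j * x) ** p * math.exp(-x * x)
    return total * dx


def turanians(n, t):
    """The three 2x2 determinants of (1.15), in the order printed there."""
    h_n, h_n1, h_n2 = hermite(n, t), hermite(n + 1, t), hermite(n + 2, t)
    dh_n = 2 * n * hermite(n - 1, t) if n > 0 else 0.0  # H_n' = 2 n H_{n-1}
    dh_n1 = 2 * (n + 1) * h_n
    d2h_n1 = 2 * (n + 1) * dh_n
    first = h_n * dh_n1 - h_n1 * dh_n
    second = h_n1 * d2h_n1 - dh_n1 ** 2
    third = h_n * h_n2 - h_n1 ** 2
    return first, second, third


if __name__ == "__main__":
    print(gaussian_integral(3, 0.7), math.sqrt(math.pi) * 2.0 ** -3 * hermite(3, 0.7))
    print(turanians(3, 0.7))
```

With these conventions the second determinant is $-2(N+1)$ times the first and the third is minus the first, which is the proportionality asserted after (1.15).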
2.4. Bäcklund transformations and discrete Painlevé systems. It has been known that some of the Bäcklund transformations of the PIV transcendent can be identified with discrete Painlevé equations [6, 11], although no systematic study has been undertaken for this class. We will find that the difference equations for fj[n], H[n], τ[n] which are generated by the Bäcklund transformations for the two shift operations $T_3^{-1}$, $T_1$ are in fact manifestations of discrete Painlevé equations.

Proposition 9 ([11]). The Bäcklund transformations of the PIV system corresponding to the shift operator $T_3^{-1}$ generating the parameter sequence (α0 + n, α1, −n) with n ∈ Z, 0 < α0, α1 < 1 and α0 + α1 = 1 are second order difference equations of the first discrete Painlevé equation dPI, namely
\[ \chi_{k+1} + \chi_k + \chi_{k-1} = 2t + \frac{k - (1/2+\alpha_1) + (-1)^k (1/2-\alpha_1)}{\chi_k}, \qquad k \ge 1, \tag{2.67} \]
where
\[ \chi_{2n+1} = f_{(3)2}[n], \qquad \chi_{2n+2} = f_{(3)0}[n], \qquad n \ge 0. \tag{2.68} \]
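The single formula (2.67) encodes two staggered equations, one for odd k and one for even k. A small sketch (the function name and the sample value of α1 are illustrative assumptions) shows how the parity factor splits the numerator into 2n for k = 2n + 1 and 2(n + α0) for k = 2n + 2, using α0 + α1 = 1.

```python
from fractions import Fraction

def dpi_coefficient(k, alpha1):
    """Numerator k - (1/2 + alpha1) + (-1)^k (1/2 - alpha1) of the 1/chi_k term in (2.67)."""
    half = Fraction(1, 2)
    return k - (half + alpha1) + (-1) ** k * (half - alpha1)

alpha1 = Fraction(1, 3)          # sample value in (0, 1), chosen for illustration
alpha0 = 1 - alpha1              # constraint alpha0 + alpha1 = 1 from Proposition 9
odd = [dpi_coefficient(2 * n + 1, alpha1) for n in range(4)]
even = [dpi_coefficient(2 * n + 2, alpha1) for n in range(4)]
```

Here `odd` reproduces the coefficients 2n and `even` the coefficients 2(n + α0) that appear in the two-component system derived in the proof.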
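Proof. The action of the shift operators on the fj is expressible in a terminating continued fraction, which for T3 and its inverse takes the form
\[ T_3(f_0) = f_1 - \frac{2\alpha_2}{f_2}, \tag{2.69} \]
\[ T_3(f_1) = f_2 + \cfrac{2(\alpha_1+\alpha_2)}{f_1 - \cfrac{2\alpha_2}{f_2}}, \tag{2.70} \]
\[ T_3(f_2) = f_0 + \frac{2\alpha_2}{f_2} - \cfrac{2(\alpha_1+\alpha_2)}{f_1 - \cfrac{2\alpha_2}{f_2}}, \tag{2.71} \]
\[ T_3^{-1}(f_0) = f_2 - \frac{2\alpha_0}{f_0} + \cfrac{2(\alpha_0+\alpha_1)}{f_1 + \cfrac{2\alpha_0}{f_0}}, \tag{2.72} \]
\[ T_3^{-1}(f_1) = f_0 - \cfrac{2(\alpha_0+\alpha_1)}{f_1 + \cfrac{2\alpha_0}{f_0}}, \tag{2.73} \]
\[ T_3^{-1}(f_2) = f_1 + \frac{2\alpha_0}{f_0}, \tag{2.74} \]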
as one can verify using the action of the affine Weyl group reflections and diagram rotations as given in Tables 2, 3. For simplicity of notation we suppress the subscript (3) labelling the sequence (α0 + n, α1, −n) during the discussion of our proofs as there is no risk of confusion. Taking the first and last members of this set, now at the nth rung of the T3 ladder, and adding their unshifted f-variable we have
\[ \begin{aligned} f_0[n] + f_0[n-1] &= 2t - f_2[n] + \frac{2n}{f_2[n]}, \\ f_2[n+1] + f_2[n] &= 2t - f_0[n] + \frac{2(n+\alpha_0)}{f_0[n]}, \end{aligned} \tag{2.75} \]
so that one has a closed system. This can be recognised as the two components of a staggered system of difference equations and employing the definitions of χk above we arrive at the discrete Painlevé equation dPI. In terms of the coordinate and momenta of the Hamiltonian system this difference system was found by Okamoto [26, 28] and can be expressed as
\[ q[n+1] = (2t+q[n]-2p[n])\, \frac{q[n](2t+q[n]-2p[n]) + 2\alpha_1}{2(n+\alpha_0) - q[n](2t+q[n]-2p[n])}, \tag{2.76} \]
\[ q[n-1] = -2p[n]\, \frac{q[n]p[n]-\alpha_1}{q[n]p[n]-n}, \tag{2.77} \]
\[ p[n+1] = -\tfrac12 q[n] + \frac{\alpha_0+n}{2t+q[n]-2p[n]}, \tag{2.78} \]
\[ p[n-1] = t + \frac{q[n]p[n]-n}{2p[n]} - p[n]\, \frac{q[n]p[n]-\alpha_1}{q[n]p[n]-n}. \tag{2.79} \]
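Since (2.76), (2.78) step up the ladder and (2.77), (2.79) step down, a step up followed by a step down should return the starting point. The sketch below checks this in exact rational arithmetic. The function names and the sample values are illustrative assumptions; the constraint α0 + α1 = 1 of Proposition 9 is imposed on the sample parameters.

```python
from fractions import Fraction

def step_up(q, p, n, t, alpha0, alpha1):
    """(q[n], p[n]) -> (q[n+1], p[n+1]) following (2.76) and (2.78)."""
    f0 = 2 * t + q - 2 * p
    q_next = f0 * (q * f0 + 2 * alpha1) / (2 * (n + alpha0) - q * f0)
    p_next = -q / 2 + (alpha0 + n) / f0
    return q_next, p_next

def step_down(q, p, n, t, alpha1):
    """(q[n], p[n]) -> (q[n-1], p[n-1]) following (2.77) and (2.79)."""
    ratio = (q * p - alpha1) / (q * p - n)
    q_prev = -2 * p * ratio
    p_prev = t + (q * p - n) / (2 * p) - p * ratio
    return q_prev, p_prev

# Sample data (hypothetical values, with alpha0 + alpha1 = 1).
t, alpha0, alpha1 = Fraction(1), Fraction(1, 3), Fraction(2, 3)
start = (Fraction(1), Fraction(1))            # (q[1], p[1])
up = step_up(*start, 1, t, alpha0, alpha1)    # (q[2], p[2])
back = step_down(*up, 2, t, alpha1)           # should reproduce (q[1], p[1])
```

Exact fractions are used so that the round trip is an equality rather than an approximate match.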
Consequently a third order difference equation exists for the Hamiltonian through the relation H [n + 1] − H [n] = −f1 [n].
(2.80)
Eliminating p[n] between (2.76), (2.77) we find a second order difference equation for q[n],
\[ \begin{aligned} &n\,q[n]\,q[n-1]\,\big[4t + 2q[n] + q[n+1] + q[n-1]\big]^2 \\ &\quad = 2\Big[(n+1)q[n+1] - n\,q[n-1] - (2t+q[n]+q[n+1])\big(\alpha_1 + \tfrac12 q[n](2t+q[n])\big)\Big] \\ &\qquad \times \Big[(n+1)q[n+1] - n\,q[n-1] - (2t+q[n]+q[n-1])\big(-\alpha_1 + \tfrac12 q[n](2t+q[n]+q[n+1]+q[n-1])\big)\Big]. \end{aligned} \tag{2.81} \]
Use of H[n+1] − H[n] = q[n] leads to the third order equation in H[n]. In addition we have a higher order difference equation for the τ-function.

Proposition 10. The τ-function sequence, appropriately normalised, associated with the shift operator $T_3^{-1}$ with parameter values (α0 + n, α1, −n), n ∈ Z≥0, 0 < α0, α1 < 1
and α0 + α1 = 1 satisfies the fourth order difference equation
\[ \begin{aligned} &4t^2\,\big[2n\tau^2[n] - \tau[n+1]\tau[n-1]\big]\,\big[2(n-\alpha_1)\tau^2[n] - \tau[n+1]\tau[n-1]\big] \\ &\quad\times \big[\tau[n-2]\tau[n+1]\tau[n] + 2\tau^2[n-1]\tau[n+1] + 4n(\alpha_1-n)\tau^2[n]\tau[n-1]\big] \\ &\quad\times \big[\tau[n+2]\tau[n-1]\tau[n] - 2\tau^2[n+1]\tau[n-1] + 4n(\alpha_1-n)\tau^2[n]\tau[n+1]\big] \\ &= \Big\{ \tau[n+2]\tau[n-2]\tau^3[n] - 16n^2(\alpha_1-n)^2\tau^5[n] + 16n(\alpha_1-n)(\alpha_1-2n)\tau^3[n]\tau[n+1]\tau[n-1] \\ &\qquad - 4(2n^2-2\alpha_1 n+1)\tau[n]\tau^2[n+1]\tau^2[n-1] + \tau[n+2]\tau^2[n-1]\big(\tau[n+1]\tau[n-1] + 2(\alpha_1+1-2n)\tau^2[n]\big) \\ &\qquad + \tau[n-2]\tau^2[n+1]\big(\tau[n+1]\tau[n-1] + 2(\alpha_1-1-2n)\tau^2[n]\big) \Big\}^2. \end{aligned} \tag{2.82} \]

Proof. We first seek to express all the fundamental quantities in terms of the product f1[n]f2[n]. By multiplying the two transformations (2.69) and (2.70) we find a quadratic relation for f1[n] (and f1[n−1]),
\[ f_1[n]\,(2t - f_1[n]) = f_1[n+1]f_2[n+1] + f_1[n]f_2[n] + 2\alpha_1. \tag{2.83} \]
Next we multiply (2.70) by f1[n] which yields a relation for the product
\[ f_1[n]\,f_1[n-1] = f_1[n]f_2[n]\, \frac{f_1[n]f_2[n] + 2\alpha_1}{f_1[n]f_2[n] + 2n}. \tag{2.84} \]
One can verify then that a linear proportionality exists between f1[n] and f1[n−1] via the product f1[n]f2[n],
\[ \begin{aligned} &f_1[n]\Big\{ f_1[n]f_2[n] + f_1[n-1]f_2[n-1] + 2\alpha_1 - f_1[n]f_2[n]\, \frac{f_1[n]f_2[n] + 2\alpha_1}{f_1[n]f_2[n] + 2n} \Big\} \\ &\quad = f_1[n-1]\Big\{ f_1[n+1]f_2[n+1] + f_1[n]f_2[n] + 2\alpha_1 - f_1[n]f_2[n]\, \frac{f_1[n]f_2[n] + 2\alpha_1}{f_1[n]f_2[n] + 2n} \Big\}, \end{aligned} \tag{2.85} \]
so that f1[n] and f1[n−1] may now be linearly related to f1[n]f2[n]. Multiplying these two latter relations and using
\[ C\, \frac{\tau_3[n+1]\,\tau_3[n-1]}{\tau_3^2[n]} = 2n + f_1[n]f_2[n], \tag{2.86} \]
with C = 1 to introduce the τ-functions we arrive at (2.82).
Note. The difference equations (2.81) and (2.82) have the advantage of being of the lowest order we have found possible, but the disadvantage of not being linear in the highest order terms (q[n + 1] and τ [n + 2] respectively). In fact difference equations linear in the highest order terms can be given by increasing by one the order of the equations in each case [28].
Applying the operator ω (recall (2.28)) we obtain analogous results for the sequence generated by T1.

Proposition 11 ([11]). The Bäcklund transformations generated by the shift operator T1 corresponding to the parameter sequence (α0 + n, −n, α2) with n ∈ Z, 0 < α0, α2 < 1, and α0 + α2 = 1, are second order difference equations of the first discrete Painlevé equation dPI, that is
\[ \eta_{k+1} + \eta_k + \eta_{k-1} = 2t - \frac{k - [1+(-1)^k]\alpha_2 - \tfrac12[1-(-1)^k]}{\eta_k}, \qquad k \ge 1, \tag{2.87} \]
where
\[ \eta_{2n+1} = f_{(1)1}[n], \qquad \eta_{2n+2} = f_{(1)0}[n], \qquad n \ge 0. \tag{2.88} \]

Proof. This follows immediately upon applying ω to both sides of (2.67).
The analogue of (2.81) for the parameter sequence generated by the shift T1 can be found by applying the ω map to this relation,
\[ \begin{aligned} &-2n\,p[n]\,p[n-1]\,\big[2t - 2p[n] - p[n+1] - p[n-1]\big]^2 \\ &\quad = \Big[(n+1)p[n+1] - n\,p[n-1] + (t-p[n]-p[n+1])\big(\alpha_2 + 2p[n](t-p[n])\big)\Big] \\ &\qquad \times \Big[(n+1)p[n+1] - n\,p[n-1] + (t-p[n]-p[n-1])\big(-\alpha_2 + 2p[n](t-p[n]-p[n+1]-p[n-1])\big)\Big], \end{aligned} \tag{2.89} \]
and this implies a third order difference equation for the Hamiltonian via
\[ H[n+1] - H[n] = f_2[n] = 2p[n]. \tag{2.90} \]
There is also a higher order difference equation for the τ-function which can be derived using the relation
\[ \frac{\tau_1[n+1]\,\tau_1[n-1]}{\tau_1^2[n]} = f_1[n]f_2[n] - 2n, \tag{2.91} \]
although we do not reproduce this here.

3. τ-Function Theory for PII

3.1. Affine Weyl group symmetry. In the general Painlevé theory the second Painlevé equation naturally appears as a coalescence limit of PIV. From the work of [36] it is known that in random matrix theory PII occurs in the edge scaling limit of the GUE. This suggests that before studying this limit we should develop a theory of PII analogous to that developed for PIV in the previous section. We take the PII equation to be defined in the standard manner
\[ y'' = 2y^3 + ty + \alpha. \tag{3.1} \]
Proposition 12. The second Painlevé equation with the transcendent y = q(t) and parameter α is equivalent to the system of first order differential equations
\[ f_0' = -2q f_0 + \alpha_0, \tag{3.2} \]
\[ f_1' = 2q f_1 + \alpha_1, \tag{3.3} \]
where f0 + f1 = 2q² + t and α0 + α1 = 1 with α = α1 − 1/2 = 1/2 − α0.

Proof. This is established by eliminating p through the substitutions f0 = 2q² − p + t and f1 = p.

Proposition 13. Let $W = \langle s_0, s_1, \pi \rangle$ be the extended affine Weyl group of the root system of type $A_1^{(1)}$ generated by the reflections s0, s1 and the diagram rotation π, with action on the roots α0, α1 as given in Table 4. The coupled system (3.2), (3.3) is symmetric under the Bäcklund transformations induced by the elements of the above affine Weyl group as specified in Table 5.

Table 4. Action of the generators of the extended affine Weyl group associated with the root system $A_1^{(1)}$ on the simple roots

        α0            α1
  s0    −α0           α1 + 2α0
  s1    α0 + 2α1      −α1
  π     α1            α0

Table 5. Bäcklund transformations for the PII system

        f0                             f1                             q
  s0    f0                             f1 − 4α0 q/f0 + 2α0²/f0²       q − α0/f0
  s1    f0 + 4α1 q/f1 + 2α1²/f1²       f1                             q + α1/f1
  π     f1                             f0                             −q

Proof. This can be directly verified using the equations of motion (3.2), (3.3).
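Two algebraic consequences of Tables 4 and 5 can be checked mechanically: each generator preserves the constraint f0 + f1 = 2q² + t, and each is an involution once the parameters are transformed along with the variables. The sketch below does this with exact fractions. The tuple layout (f0, f1, q, α0, α1), the function names and the sample point are assumptions made for illustration.

```python
from fractions import Fraction

def s0(state):
    """Reflection s0 acting on (f0, f1, q, alpha0, alpha1), per Tables 4 and 5."""
    f0, f1, q, a0, a1 = state
    return (f0, f1 - 4 * a0 * q / f0 + 2 * a0**2 / f0**2, q - a0 / f0, -a0, a1 + 2 * a0)

def s1(state):
    """Reflection s1 acting on (f0, f1, q, alpha0, alpha1), per Tables 4 and 5."""
    f0, f1, q, a0, a1 = state
    return (f0 + 4 * a1 * q / f1 + 2 * a1**2 / f1**2, f1, q + a1 / f1, a0 + 2 * a1, -a1)

def rotation(state):
    """Diagram rotation pi: swaps f0 <-> f1 and alpha0 <-> alpha1, and sends q -> -q."""
    f0, f1, q, a0, a1 = state
    return (f1, f0, -q, a1, a0)

def constraint(state, t):
    """f0 + f1 - 2 q^2 - t, which vanishes on the PII system."""
    f0, f1, q, _, _ = state
    return f0 + f1 - 2 * q**2 - t

# Sample point (hypothetical values) with alpha0 + alpha1 = 1 and t fixed by the constraint.
state = (Fraction(2), Fraction(3), Fraction(1), Fraction(1, 3), Fraction(2, 3))
t = Fraction(3)
```

Applying any of `s0`, `s1`, `rotation` twice to `state` returns `state`, and `constraint` stays zero after each single application.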
Underlying the dynamics of the PII system is a Hamiltonian structure. Proposition 14. The PII dynamical system is equivalent to a Hamiltonian system {q, p; H } with Hamiltonian H = −1/2f0 f1 − α1 q,
(3.4)
and canonical coordinates and momenta q, p defined by p = f1 ,
2q 2 = f0 + f1 − t.
(3.5)
Proof. Using the symmetrised differential equations (3.2), (3.3) the Hamilton equations of motion
\[ q' = p - q^2 - \tfrac12 t, \qquad p' = 2qp + \alpha_1, \tag{3.6} \]
can be verified.

Remark. The fundamental domain or Weyl chamber for PII can be taken as the interval α ∈ (−1/2, 0] or α ∈ [0, 1/2), and so there exist identities relating the transcendents and related quantities at the endpoints of these intervals. In particular, denoting the transcendent by q(t, α) and with ε² = 1, t = −2^{1/3} s we have [12]
\[ \begin{aligned} -\epsilon\, 2^{1/3}\, q^2(s,0) &= \frac{d}{dt}\, q(t,\tfrac12\epsilon) - \epsilon\, q^2(t,\tfrac12\epsilon) - \tfrac12 \epsilon\, t, \\ q(t,\tfrac12\epsilon) &= \epsilon\, 2^{-1/3}\, \frac{1}{q(s,0)}\, \frac{d}{ds}\, q(s,0). \end{aligned} \tag{3.7} \]
The action of the affine Weyl group on the Hamiltonian is given in Table 6. We define two shift operators corresponding to translations by the fundamental weights of the affine Weyl group $A_1^{(1)}$,
\[ T_1 = \pi s_1, \qquad T_2 = s_1 \pi, \tag{3.8} \]
although T1T2 = 1 so only one is independent. Their action on the parameter space is given in Table 7.

Table 6. Bäcklund transformations of the Hamiltonian, where H1 = π(H0) = H0 + q

        H0              H1
  s0    H0 + α0/f0      H1
  s1    H0              H1 + α1/f1
  π     H1              H0

Table 7. Action of the shift operators on the simple roots of the root system $A_1^{(1)}$

        α0          α1
  T1    α0 + 1      α1 − 1
  T2    α0 − 1      α1 + 1
Proposition 15. The Bäcklund transformations corresponding to the shifts are given by
\[ T_1(f_0) = f_1 - \frac{4\alpha_0 q}{f_0} + \frac{2\alpha_0^2}{f_0^2}, \tag{3.9} \]
\[ T_1(f_1) = f_0, \tag{3.10} \]
\[ T_2(f_0) = f_1, \tag{3.11} \]
\[ T_2(f_1) = f_0 + \frac{4\alpha_1 q}{f_1} + \frac{2\alpha_1^2}{f_1^2}, \tag{3.12} \]
\[ T_1(H_0) = H_0 + q, \tag{3.13} \]
\[ T_1(H_1) = H_1 - q + \frac{\alpha_0}{f_0}, \tag{3.14} \]
\[ T_2(H_0) = H_0 + q + \frac{\alpha_1}{f_1}, \tag{3.15} \]
\[ T_2(H_1) = H_1 - q. \tag{3.16} \]
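Proposition 15 can be exercised directly: T1 and T2 should undo each other, and the Hamiltonian H0 of (3.4) should shift as in (3.13) and (3.15). The following sketch checks this exactly. The state layout (f0, f1, q, α0, α1) and the sample point are illustrative assumptions; the action on q is taken from (3.31), (3.32) rewritten with p = f1 and f0 = 2q² + t − p, and the parameter shifts are those of Table 7.

```python
from fractions import Fraction

def hamiltonian(state):
    """H0 = -1/2 f0 f1 - alpha1 q, Eq. (3.4)."""
    f0, f1, q, a0, a1 = state
    return -f0 * f1 / 2 - a1 * q

def T1(state):
    """Shift T1: (3.9), (3.10), q -> -q + alpha0/f0, parameters (alpha0 + 1, alpha1 - 1)."""
    f0, f1, q, a0, a1 = state
    return (f1 - 4 * a0 * q / f0 + 2 * a0**2 / f0**2, f0, -q + a0 / f0, a0 + 1, a1 - 1)

def T2(state):
    """Shift T2: (3.11), (3.12), q -> -q - alpha1/f1, parameters (alpha0 - 1, alpha1 + 1)."""
    f0, f1, q, a0, a1 = state
    return (f1, f0 + 4 * a1 * q / f1 + 2 * a1**2 / f1**2, -q - a1 / f1, a0 - 1, a1 + 1)

# Sample point (hypothetical values) with alpha0 + alpha1 = 1.
state = (Fraction(2), Fraction(3), Fraction(1), Fraction(1, 3), Fraction(2, 3))
```

With this data `T2(T1(state))` and `T1(T2(state))` both return `state`, and the two Hamiltonian shifts come out as H0 + q and H0 + q + α1/f1.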
Proposition 16. The Hamiltonian H(t) satisfies the second order second degree differential equation of Jimbo–Miwa–Okamoto σ form for PII,
\[ (H'')^2 + 4(H')^3 + 2H'\,[tH' - H] - \tfrac14 \alpha_1^2 = 0. \tag{3.17} \]

Proof. Using the first two derivatives of H,
\[ H' = -\tfrac12 f_1, \qquad H'' = -q f_1 - \tfrac12 \alpha_1, \tag{3.18} \]
one can solve for q, f0, f1 and then substitute these back into the expression for H. The result (3.17) then follows after simplification.

With H given by (3.4), and p and q specified by (3.5), Hamilton's equation for q implies
\[ p = q' + q^2 + \tfrac12 t. \tag{3.19} \]
Thus H can be expressed in terms of the Painlevé II transcendent q according to
\[ H = \tfrac12 (q')^2 - \tfrac12 \big(q^2 + \tfrac12 t\big)^2 - (\alpha + \tfrac12)\, q. \tag{3.20} \]
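Both (3.17) and (3.20) are pointwise algebraic identities in (q, p, t, α1) once H, H′, H″ are written through (3.4), (3.5), (3.18), so they can be spot-checked at an arbitrary rational point. The sketch below does so; the function names and the sample values are illustrative assumptions.

```python
from fractions import Fraction

def pii_hamiltonian_data(q, p, t, alpha1):
    """Return (H, H', H'') for the PII Hamiltonian, using (3.4), (3.5) and (3.18)."""
    f1 = p
    f0 = 2 * q**2 + t - p
    H = -f0 * f1 / 2 - alpha1 * q
    dH = -f1 / 2
    d2H = -q * f1 - alpha1 / 2
    return H, dH, d2H

def sigma_form_residual(q, p, t, alpha1):
    """Left-hand side of (3.17); it should vanish for every choice of arguments."""
    H, dH, d2H = pii_hamiltonian_data(q, p, t, alpha1)
    return d2H**2 + 4 * dH**3 + 2 * dH * (t * dH - H) - alpha1**2 / 4

def hamiltonian_from_q(q, dq, t, alpha):
    """Right-hand side of (3.20), with dq standing for q'."""
    return dq**2 / 2 - (q**2 + t / 2) ** 2 / 2 - (alpha + Fraction(1, 2)) * q

# Arbitrary rational sample point (hypothetical values).
q, p, t, alpha1 = Fraction(3, 7), Fraction(-5, 4), Fraction(2, 3), Fraction(4, 5)
dq = p - q**2 - t / 2                      # Hamilton's equation (3.6) for q'
H = pii_hamiltonian_data(q, p, t, alpha1)[0]
```

The residual of (3.17) is exactly zero at the sample point, and `H` agrees with `hamiltonian_from_q(q, dq, t, alpha1 - Fraction(1, 2))`, reflecting α = α1 − 1/2.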
3.2. Toda lattice equation. The τ-functions are defined as before (2.23) and corresponding to each sequence generated by the shift operators is a Toda lattice equation.

Proposition 17. The τ-function sequence generated by the shift operator T1 with the parameter sequence (α0 + n, α1 − n) for n ≥ 0,
\[ T_1^n(H) = H[n] = \frac{d}{dt} \ln \tau[n], \tag{3.21} \]
obeys the Toda lattice equation
\[ C\, \frac{\tau[n+1]\,\tau[n-1]}{\tau^2[n]} = \frac{d^2}{dt^2} \ln \tau[n]. \tag{3.22} \]

Proof. This parallels the argument employed for the PIV case, by utilising the relations
\[ H[n+1] - H[n] = q[n], \tag{3.23} \]
and
\[ q[n] - T_1^{-1}(q[n]) = \frac{d}{dt} \ln f_1[n], \tag{3.24} \]
along with
\[ \frac{d}{dt} H = -\tfrac12 f_1. \tag{3.25} \]
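The chain of identities behind this proof can be written out explicitly. The following is a sketch assembled only from (3.21) and (3.23)–(3.25), together with the observation that H[n] − H[n−1] = q[n−1] = $T_1^{-1}(q[n])$; the constant of integration is absorbed into C.

```latex
\begin{aligned}
\frac{d}{dt}\ln\frac{\tau[n+1]\,\tau[n-1]}{\tau^2[n]}
  &= \big(H[n+1]-H[n]\big) - \big(H[n]-H[n-1]\big) && \text{by (3.21)} \\
  &= q[n] - T_1^{-1}(q[n])                          && \text{by (3.23)} \\
  &= \frac{d}{dt}\ln f_1[n]                         && \text{by (3.24)}, \\[4pt]
\text{hence}\quad
\frac{\tau[n+1]\,\tau[n-1]}{\tau^2[n]}
  &\propto f_1[n] = -2\,\frac{d}{dt}H[n] = -2\,\frac{d^2}{dt^2}\ln\tau[n]
  && \text{by (3.25), (3.21)}.
\end{aligned}
```

Integrating the first three lines gives the proportionality in the last line, which is (3.22) up to the value of C.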
3.3. Classical solutions. When the parameter values are those on a chamber wall (a point) α1 = n ∈ Z then the τ-functions are known to be expressible in terms of Airy functions [26].

Proposition 18. The solution for the first non-trivial member of the τ-function sequence τ[1](t) generated by the shift operator T1 with initial parameters (α0, α1) = (1, 0) that is bounded as t → −∞ is
\[ \tau[1](t) = C\, \mathrm{Ai}(-2^{-1/3} t). \tag{3.26} \]
The nth member of this sequence is
\[ \tau[n](t) = C \det\Big[ \frac{d^{\,i+j}}{dt^{\,i+j}}\, \mathrm{Ai}(-2^{-1/3} t) \Big]_{i,j=0,\dots,n-1}. \tag{3.27} \]

Proof. Starting from α1 = 0 at n = 0 one can take p[0] = 0 so that H[0] = 0 and conventionally τ[0] = 1. Using (3.13) we find that H[1] = q[0] and so the equation of motion (3.6) gives the second order linear differential equation
\[ \tau[1]'' + \tfrac12 t\, \tau[1] = 0, \tag{3.28} \]
and thus (3.26). The determinant formula (3.27) follows from (2.58).
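That (3.26) solves (3.28) follows from the Airy equation Ai″(z) = z Ai(z) and the chain rule. A short worked check, with the abbreviation c = −2^{−1/3} introduced here only for convenience:

```latex
\tau[1](t) = \mathrm{Ai}(ct), \quad c = -2^{-1/3}
\;\Longrightarrow\;
\tau[1]''(t) = c^2\,\mathrm{Ai}''(ct) = c^2\,(ct)\,\mathrm{Ai}(ct)
             = c^3\, t\,\tau[1](t) = -\tfrac12\, t\,\tau[1](t).
```

The other independent solution, Bi(ct), grows as t → −∞ (where ct → +∞), which is why the boundedness condition in Proposition 18 selects Ai.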
Another special parameter value of the Hamiltonian system (3.4) is α1 = −1/2 [37], when Hamilton's equations (3.6) permit the solution
\[ (q, p) = (t^{-1}, \tfrac12 t). \tag{3.29} \]
However the corresponding value of H is not zero so in this case we do not have τ[0] = 1 (rather H[0] and thus log τ[0] is a rational function of t), and thus the sequence of τ-functions generated by T1 is not given by a determinant [15]. Nonetheless the Bäcklund transformations of Prop. 15 show that H[n] and thus log τ[n] remain rational functions of t for all n = 1, 2, 3, . . . .

3.4. Bäcklund transformations and discrete dPI. The discrete dynamical system generated by the Bäcklund transformations is also integrable and can be identified with a discrete Painlevé system.

Proposition 19. The members of the sequence {q[n]}, n ≥ 0 generated by the shift operator T1 with the parameters (α0 + n, α1 − n) are related by a second order difference equation which is the alternate form of the first discrete Painlevé equation, a-dPI,
\[ \frac{\alpha + 1/2 - n}{q[n] + q[n-1]} + \frac{\alpha - 1/2 - n}{q[n+1] + q[n]} = -2q^2[n] - t. \tag{3.30} \]

Proof. We deduce from (3.8) and Table 5 that
\[ T_1(q) = -q + \frac{\alpha - 1/2}{p - 2q^2 - t}, \tag{3.31} \]
\[ T_2(q) = -q - \frac{\alpha + 1/2}{p}, \tag{3.32} \]
so eliminating p through the combination of these two we arrive at the stated result.
The full set of forward and backward difference equations are [26]
\[ q[n+1] = -q[n] + \frac{\alpha - 1/2 - n}{p[n] - 2q[n]^2 - t}, \tag{3.33} \]
\[ q[n-1] = -q[n] - \frac{\alpha + 1/2 - n}{p[n]}, \tag{3.34} \]
\[ p[n+1] = -p[n] + 2q[n]^2 + t, \tag{3.35} \]
\[ p[n-1] = t - p[n] + 2\Big( q[n] + \frac{\alpha + 1/2 - n}{p[n]} \Big)^2. \tag{3.36} \]
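The a-dPI equation (3.30) is what remains of (3.33), (3.34) after p[n] is eliminated, so neighbours computed from (3.33), (3.34) must satisfy (3.30) identically. The sketch below confirms this in exact arithmetic; the function names and sample values are illustrative assumptions.

```python
from fractions import Fraction

def neighbours(q, p, n, t, alpha):
    """q[n+1] and q[n-1] computed from (q[n], p[n]) via (3.33) and (3.34)."""
    half = Fraction(1, 2)
    q_next = -q + (alpha - half - n) / (p - 2 * q**2 - t)
    q_prev = -q - (alpha + half - n) / p
    return q_next, q_prev

def adpi_residual(q_prev, q, q_next, n, t, alpha):
    """Left-hand side minus right-hand side of the a-dPI equation (3.30)."""
    half = Fraction(1, 2)
    lhs = (alpha + half - n) / (q + q_prev) + (alpha - half - n) / (q_next + q)
    return lhs - (-2 * q**2 - t)

# Arbitrary rational sample point (hypothetical values).
q, p, n, t, alpha = Fraction(1, 3), Fraction(2), 2, Fraction(5, 7), Fraction(1, 4)
q_next, q_prev = neighbours(q, p, n, t, alpha)
residual = adpi_residual(q_prev, q, q_next, n, t, alpha)
```

The residual vanishes exactly, independently of the value chosen for p[n], which is the sense in which p has been eliminated.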
The discrete Painlevé equation (3.30) implies a third order difference equation for the Hamiltonian
\[ \frac{\alpha + 1/2 - n}{H[n+1] - H[n-1]} + \frac{\alpha - 1/2 - n}{H[n+2] - H[n]} = -2\big(H[n+1] - H[n]\big)^2 - t, \tag{3.37} \]
because q[n] = H[n+1] − H[n]. Equations (3.35) and (3.36) also imply
\[ q[n] = \frac{1}{4\alpha_n}\, p[n]\,\big(p[n-1] - p[n+1]\big) - \frac{\alpha_n}{2p[n]}, \qquad \alpha_n := \alpha + 1/2 - n. \tag{3.38} \]
Using this to eliminate q[n] (in say (3.35)) yields a second order difference equation for the p[n],
\[ \frac{1}{4\alpha_n^2}\, p^2[n]\,\big(p[n+1] - p[n-1]\big)^2 + \frac{\alpha_n^2}{p^2[n]} - 2p[n] - p[n+1] - p[n-1] + 2t = 0. \tag{3.39} \]
Furthermore, Eqs. (3.21), (3.22) and (3.25) give
\[ C\, \frac{\tau[n+1]\,\tau[n-1]}{\tau^2[n]} = p[n]. \tag{3.40} \]
So substituting in (3.39) (with say C = −2) implies a fourth order difference equation for τ[n],
\[ \begin{aligned} &\frac{2}{\alpha_n^2}\big(\tau[n-2]\tau^2[n+1] - \tau[n+2]\tau^2[n-1]\big)^2 + \tfrac18 \alpha_n^2\, \tau^6[n] + \tau[n-2]\tau^3[n]\tau^2[n+1] + \tau[n+2]\tau^3[n]\tau^2[n-1] \\ &\quad + 2\tau^3[n-1]\tau^3[n+1] + t\, \tau^2[n]\tau^2[n-1]\tau^2[n+1] = 0. \end{aligned} \tag{3.41} \]
While (3.39) and the corresponding equation for τ[n] provide a polynomial relation between the smallest set of consecutive sequence members {p[n]} and {τ[n]} we have found possible, they have the disadvantage of not being linear in the highest order term (p[n+1] and τ[n+2] respectively). This disadvantage can be remedied by increasing by one the order of the equation in each case. Thus replacing n by n − 1 in (3.38), then adding the result to the original equation and using (3.34), implies a third order difference equation for the p[n] [29],
\[ 0 = \frac{1}{4\alpha_n}\, p[n]\,\big(p[n-1] - p[n+1]\big) + \frac{\alpha_n}{2p[n]} + \frac{1}{4\alpha_{n-1}}\, p[n-1]\,\big(p[n-2] - p[n]\big) - \frac{\alpha_{n-1}}{2p[n-1]}, \tag{3.42} \]
which is indeed linear in p[n+1]. Substituting (3.40) gives a fifth order equation for τ[n], linear in the highest order term τ[n+2].

3.5. Coalescence from PIV. Since the earliest works on the Painlevé transcendents [31] it has been known how to obtain the PII system from a limiting procedure or coalescence applied to the PIV; however, there is in fact more than one such coalescence path. Our analysis of the scaled GUE requires the application of a second coalescence path rather than the one commonly employed. In the first limit the parameters (α0, α1, α2) and variables tIV, qIV, pIV, HIV in the PIV system scale in a way that α2 is fixed so that
\[ \alpha_0 = \tfrac12 - \alpha_{II} - \tfrac12 \epsilon^{-6}, \tag{3.43} \]
\[ \alpha_1 = \tfrac12 \epsilon^{-6}, \tag{3.44} \]
\[ \alpha_2 = \alpha_{II} + \tfrac12, \tag{3.45} \]
\[ t_{IV} = -\epsilon^{-3} + 2^{-2/3}\, \epsilon\, t_{II}, \tag{3.46} \]
\[ q_{IV} = \epsilon^{-3} + 2^{2/3}\, \epsilon^{-1} q_{II}, \tag{3.47} \]
\[ p_{IV} = 2^{-2/3}\, \epsilon\, p_{II}, \tag{3.48} \]
\[ H_{IV} = -\epsilon^{-3} (\alpha_{II} + \tfrac12) + 2^{2/3}\, \epsilon^{-1} H_{II}, \tag{3.49} \]
as ε → 0 then the function qII(tII) satisfies the PII differential equation with parameter αII [13]. The second limiting procedure is obtained from the first by the mapping ω introduced in (2.28).

Proposition 20. If α1 is fixed so that the variables scale like
\[ \alpha_0 = \alpha_{II} + \tfrac32 + \tfrac12 \epsilon^{-6}, \tag{3.50} \]
\[ \alpha_1 = -\alpha_{II} - \tfrac12, \tag{3.51} \]
\[ \alpha_2 = -\tfrac12 \epsilon^{-6}, \tag{3.52} \]
\[ t_{IV} = \epsilon^{-3} - 2^{-2/3}\, \epsilon\, t_{II}, \tag{3.53} \]
\[ q_{IV} = 2^{1/3}\, \epsilon\, p_{II}, \tag{3.54} \]
\[ 2p_{IV} = \epsilon^{-3} + 2^{2/3}\, \epsilon^{-1} q_{II}, \tag{3.55} \]
\[ H_{IV} + \alpha_1 t_{IV} = -2^{2/3}\, \epsilon^{-1} H_{II}, \tag{3.56} \]
as ε → 0 then the function qII(tII) satisfies the PII differential equation with parameter αII. Furthermore the third order difference equation for HIV (2.89), (2.90), related to the discrete Painlevé equation dPI, corresponding to the Bäcklund transformation under the shift operator T1 transforms into the third order difference equation for HII (3.37), related to the alternate discrete Painlevé equation a-dPI, under this scaling.

Proof. Under the mapping ω, HIV → −HIV, 2pIV ↔ qIV, α1 ↔ −α2 and tIV ↔ −tIV. The only equation which isn't immediate from the mapping is (3.56). Substituting (3.53) for tIV and ignoring the term proportional to ε tII shows this equation is equivalent to
\[ H_{IV} = \epsilon^{-3} (\alpha_{II} + \tfrac12) - 2^{2/3}\, \epsilon^{-1} H_{II}, \tag{3.57} \]
which is precisely what results from applying the mapping ω to (3.49). The scaling of the third order difference equation for HIV associated with the discrete Painlevé equation dPI, (2.89), to (3.37) can be verified directly.
4. Application to Finite GUE Matrices

In this section we will show that the determinants in (2.51) and (2.52) occur in the calculation of the quantities $\tilde E_N$ and $F_N$, introduced in the Introduction, relating to GUE random matrices.

4.1. Calculation of $E_N(0;(s,\infty))$ and $\tilde E_N(s;a)$. Consider first the probability $E_N(0;(s,\infty))$.

Proposition 21. The gap probability $E_N(0;(s,\infty))$ is identical to the Nth τ-function of the sequence generated by $T_3^{-1}$ from the corner of the Weyl chamber (α0, α1, α2) = (1, 0, 0),
\[ E_N(0;(s,\infty)) = \tau_3[N](s;0), \tag{4.1} \]
where the normalization of τ3[N] must be such that
\[ \lim_{s\to\infty} \tau_3[N](s;0) = 1. \tag{4.2} \]
The resolvent kernel function R = RN(t) occurring in (1.5) is the Nth Hamiltonian associated with this sequence,
\[ R_N(t) = H(t)\Big|_{(\alpha_0,\alpha_1,\alpha_2)=(1+N,0,-N)}. \tag{4.3} \]

Proof. From the meaning of EN we see from (1.1) with $g(x) = e^{-x^2}$ that
\[ E_N(0;(s,\infty)) = \frac{1}{C} \int_{-\infty}^{s} dx_1 \cdots \int_{-\infty}^{s} dx_N \prod_{j=1}^{N} e^{-x_j^2} \prod_{1\le j<k\le N} (x_k - x_j)^2. \tag{4.4} \]
Introducing the Vandermonde determinant
\[ \prod_{1\le j<k\le N} (x_k - x_j) = \det\big[ x_{j+1}^{k} \big]_{j,k=0,\dots,N-1}, \tag{4.5} \]
standard manipulations following Heine (see [33, p. 27]) allow this to be rewritten as the N × N determinant
\[ \begin{aligned} E_N(0;(s,\infty)) &= \frac{1}{C} \det\Big[ \int_{-\infty}^{s} x^{j+k} e^{-x^2}\, dx \Big]_{j,k=0,\dots,N-1} \\ &= \frac{1}{C} \det\Big[ \int_{-\infty}^{s} (s-x)^{j+k} e^{-x^2}\, dx \Big]_{j,k=0,\dots,N-1}, \end{aligned} \tag{4.6} \]
where the second equality follows by repeating the procedure which led to the first equality but starting from the Vandermonde determinant formula with xj → xj − s (j = 1, . . . , N). Comparing with (2.51) gives (4.1). Recalling (2.30) it follows from (4.1) that
\[ E_N(0;(s,\infty)) = \exp\Big( -\int_{s}^{\infty} H(t)\Big|_{(\alpha_0,\alpha_1,\alpha_2)=(1+N,0,-N)} dt \Big), \tag{4.7} \]
and comparing with (1.5) gives (4.3).
According to (2.18) R satisfies Eq. (1.3), thus rederiving the result of Tracy and Widom (1.5). We remark that the boundary condition follows from the fact that, with ρ(t) denoting the eigenvalue density,
\[ R(t) \mathop{\sim}_{t\to\infty} \rho(t) = \frac{2^{-N}}{\pi^{1/2}(N-1)!}\, e^{-t^2} \big[ H_N^2(t) - H_{N+1}(t) H_{N-1}(t) \big]. \tag{4.8} \]
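The closed form in (4.8) can be compared with the familiar sum representation of the GUE density for the weight $e^{-x^2}$, ρ(t) = π^{−1/2} e^{−t²} Σ_{k<N} H_k(t)²/(2^k k!), which is a standard fact quoted here rather than taken from the text above. The sketch below checks that the two polynomial prefactors agree exactly at rational points; the function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def hermite_table(n_max, t):
    """[H_0(t), ..., H_{n_max}(t)] via H_{k+1} = 2 t H_k - 2 k H_{k-1}."""
    values = [Fraction(1), 2 * t]
    for k in range(1, n_max):
        values.append(2 * t * values[k] - 2 * k * values[k - 1])
    return values[: n_max + 1]

def density_sum(N, t):
    """sum_{k<N} H_k(t)^2 / (2^k k!), the density without the factor exp(-t^2)/sqrt(pi)."""
    H = hermite_table(N, t)
    return sum(H[k] ** 2 / (2**k * factorial(k)) for k in range(N))

def density_closed_form(N, t):
    """2^{-N}/(N-1)! [H_N^2 - H_{N+1} H_{N-1}], the prefactor of (4.8) without exp(-t^2)/sqrt(pi)."""
    H = hermite_table(N + 1, t)
    return (H[N] ** 2 - H[N + 1] * H[N - 1]) / (2**N * factorial(N - 1))
```

For example, with N = 3 both functions reduce to 2t⁴ + 3/2, so `density_sum(3, Fraction(1))` and `density_closed_form(3, Fraction(1))` are both 7/2.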
Considering next $\tilde E_N(s;a)$ we have a generalisation of the previous case.

Proposition 22. The average $\tilde E_N(s;a)$ is the Nth member of the τ-function sequence {τ3[N](s; −a)} generated by $T_3^{-1}$ from a general position on the α2 = 0 wall of the Weyl chamber, namely (α0, α1, α2) = (1 + a, −a, 0), and so
\[ \tilde E_N(s;a) = \tau_3[N](s;-a). \tag{4.9} \]
The logarithmic derivative UN(t; a) is identical to the Hamiltonian associated with this sequence,
\[ U_N(t;a) = H(t)\Big|_{(\alpha_0,\alpha_1,\alpha_2)=(1+a+N,-a,-N)}. \tag{4.10} \]

Proof. According to (1.7) $\tilde E_N(s;a)$ is given in terms of a multiple integral by
\[ \tilde E_N(s;a) = \frac{1}{C} \int_{-\infty}^{s} dx_1 \cdots \int_{-\infty}^{s} dx_N \prod_{j=1}^{N} e^{-x_j^2} (s - x_j)^a \prod_{1\le j<k\le N} (x_k - x_j)^2, \tag{4.11} \]
where the normalization C is such that
\[ \tilde E_N(s;a) \sim s^{Na} \big( 1 + O(1/s) \big)\, \tilde E_N(s;0) \quad \text{as } s \to \infty. \tag{4.12} \]
The method of derivation of (4.6) shows that
\[ \tilde E_N(s;a) = \frac{1}{C} \det\Big[ \int_{-\infty}^{s} (s-x)^{a+j+k} e^{-x^2}\, dx \Big]_{j,k=0,\dots,N-1}. \tag{4.13} \]
Recalling (2.51) we thus have (4.9). It follows from this that
\[ \tilde E_N(s;a) = \tilde E_N(s_0;a) \exp\Big( \int_{s_0}^{s} H(t)\Big|_{(\alpha_0,\alpha_1,\alpha_2)=(1+a+N,-a,-N)} dt \Big), \tag{4.14} \]
and consequently (4.10). According to (2.18) UN(t; a) satisfies the nonlinear equation
\[ (U_N'')^2 - 4(tU_N' - U_N)^2 + 4U_N'(U_N' - 2a)(U_N' + 2N) = 0, \tag{4.15} \]
which considering (4.12) and (1.5) is to be solved subject to the boundary condition
\[ U_N(t;a) \mathop{\sim}_{t\to\infty} \frac{Na}{t} \big( 1 + O(1/t) \big) + R(t). \tag{4.16} \]
In light of (4.8) the term R(t) decays as a Gaussian and so is negligible with respect to the inverse powers in (4.16) (here we assume a ≠ 0). This means the boundary condition (4.16) cannot be distinguished from the boundary condition with R(t) removed; however we will see below that this latter boundary condition relates to a solution of (4.15) distinct from the one required in (4.14). To overcome this ambiguity we specify the t → −∞ behaviour of U(t) rather than the t → ∞ behaviour. Now, replacing s by −s in (4.11) and changing variables x → x + s shows
\[ \begin{aligned} \tilde E_N(-s;a) &= \frac{1}{C}\, e^{-Ns^2} s^{-Na-N^2} \int_{0}^{\infty} dx_1 \cdots \int_{0}^{\infty} dx_N \prod_{j=1}^{N} e^{-(x_j/s)^2} e^{-2x_j} x_j^a \prod_{1\le j<k\le N} (x_k - x_j)^2 \\ &\sim \frac{1}{C}\, e^{-Ns^2} s^{-Na-N^2} \big( 1 + O(1/s^2) \big). \end{aligned} \tag{4.17} \]
In combination with (4.14) this implies
\[ U_N(t;a) \mathop{\sim}_{t\to-\infty} -2Nt - \frac{N(a+N)}{t} + O\Big( \frac{1}{t^3} \Big). \tag{4.18} \]
Under the action of the Bäcklund transformations UN(t; a) will satisfy two recurrence relations corresponding to the shift operators $T_3^{-1}$ and T1. For the $T_3^{-1}$ sequence we relate the Painlevé and GUE parameters by n = N and α1 = −a, and with (2.81) and (2.80) we have a third order recurrence relation in N with a fixed (suppressing the additional variable dependence in U)
\[ \begin{aligned} &-N (U_{N+1} - U_N)(U_N - U_{N-1}) \big[ 4t + U_{N+2} + U_{N+1} - U_N - U_{N-1} \big]^2 \\ &+ 2 \Big[ (N+1)(U_{N+2} - U_{N+1}) - N(U_N - U_{N-1}) - \tfrac12 (2t + U_{N+2} - U_N)\big( -2a + (U_{N+1} - U_N)(2t + U_{N+1} - U_N) \big) \Big] \\ &\times \Big[ (N+1)(U_{N+2} - U_{N+1}) - N(U_N - U_{N-1}) - \tfrac12 (2t + U_{N+1} - U_{N-1})\big( 2a + (U_{N+1} - U_N)(2t + U_{N+2} - U_{N-1}) \big) \Big] = 0. \end{aligned} \tag{4.19} \]
In contrast the T1 sequence has the parameter correspondence n = a and α2 = −N which is an interchange of α1 and α2 with respect to the T3 sequence. A third order recurrence relation in a, with fixed N, can be derived from (4.19) using the mapping (2.43) in which N ↔ a, t → it and then UN(it; a) → −iUN(t; a). Alternatively this difference equation can be found directly from (2.89) and (2.90), with the identification (4.10).

As noted in the Introduction, for (N+1) × (N+1) dimensional GUE matrices, the distribution of the largest eigenvalue pmax(s) is proportional to $e^{-s^2} \tilde E_N(s;2)$. Hence, using (4.14),
\[ p_{\max}(s)\Big|_{N\to N+1} = p_{\max}(s_0)\Big|_{N\to N+1} \exp\Big( \int_{s_0}^{s} [-2t + U_N(t;2)]\, dt \Big). \tag{4.20} \]
Comparison with (1.2), after putting N → N + 1 therein and substituting (1.5), gives an identity between the Hamiltonians $R(t)|_{N\to N+1}$ and U(t; 2). Using the theory developed here this relation can be independently verified.

Proposition 23. The logarithmic derivative of the average $\tilde E_N(t;2)$ is related to the resolvent kernel RN(t) by the identity
\[ U_N(t;2) = 2t + R_{N+1}(t) + \frac{R_{N+1}'(t)}{R_{N+1}(t)}. \tag{4.21} \]

Proof. From the identifications made above we have
\[ U_N(t;2) = H(t)\Big|_{(N+3,-2,-N)}, \tag{4.22} \]
\[ R_{N+1}(t) = H(t)\Big|_{(N+2,0,-N-1)}, \tag{4.23} \]
and we seek to express the first in terms of the second. We note that H (t)
(N+3,−2,−N)
= T1 T2−1 H (t) = 2t + H +
(N+2,0,−N−1)
2(N + 2) − f0
,
(4.24)
2(N + 1) , 2 f0 + 2(N + 2) f2 − f0
(4.25)
where the right-hand side is evaluated at (N + 2, 0, −N − 1). We recognise in this expression the factors of $H|_{(N+2,0,-N-1)} = f_1(\tfrac12 f_0 f_2 - N - 1)$ and $H'|_{(N+2,0,-N-1)} = f_1 f_2$ so that it can be simplified to
\[ H(t)\Big|_{(N+3,-2,-N)} = \Big[ 2t + H + \frac{H'}{H} \Big]_{(N+2,0,-N-1)}. \tag{4.26} \]
The result then follows upon making the appropriate identifications.
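As a consistency check of the identity (4.21), one can look at the smallest case N = 0. This is a sketch under the assumption that the empty average is normalised as $\tilde E_0(s;a) = 1$, so that $U_0(t;2) = 0$; the 1 × 1 gap probability and its logarithmic derivative follow from (4.4) and (4.7).

```latex
\begin{aligned}
E_1(0;(t,\infty)) &= \frac{1}{\sqrt{\pi}}\int_{-\infty}^{t} e^{-x^2}\,dx,
\qquad
R_1(t) = \frac{d}{dt}\ln E_1(0;(t,\infty)) = \frac{e^{-t^2}}{\sqrt{\pi}\,E_1(0;(t,\infty))}, \\[4pt]
\frac{R_1'(t)}{R_1(t)} &= -2t - \frac{d}{dt}\ln E_1(0;(t,\infty)) = -2t - R_1(t), \\[4pt]
2t + R_1(t) + \frac{R_1'(t)}{R_1(t)} &= 0 = U_0(t;2).
\end{aligned}
```

The same cancellation reflects the general mechanism: the largest-eigenvalue density is the derivative of the gap probability, so its logarithmic derivative is R + R′/R.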
One can verify that this transformation $T_1 T_2^{-1}$ (and its inverse) is the only nontrivial one which can map the Hamiltonian H into a rational function of H and H′. We have
\[ H\Big|_{(\alpha_0+1,\alpha_1-2,\alpha_2+1)} = T_1 T_2^{-1} H\Big|_{(\alpha_0,\alpha_1,\alpha_2)} = 2t + H + \frac{(1-\alpha_1)H'}{H + \alpha_1 f_2}, \tag{4.27} \]
and this is rational if α1 = 1 (trivial case) or if α1 = 0 (this case). All other transformations are algebraic functions of H and H′.
4.2. Calculation of FN(s; a). Turning our attention to FN, we can make the following identifications.

Proposition 24. The average of the power of the characteristic polynomial is given by the Nth member of the τ-function sequence generated by the shift operator $T_3^{-1}$ from the initial parameters (1 + a, −a, 0),
\[ F_N(\lambda;a) = \bar\tau_3[N](\lambda;-a), \tag{4.28} \]
with the normalization of $\bar\tau_3^{(N)}$ chosen so that
\[ F_N(\lambda;a) \sim \lambda^{Na} \quad \text{as } \lambda \to \infty \tag{4.29} \]
(cf. Eq. (4.12)). The logarithmic derivative of the average is related to the Hamiltonian by
\[ V_N(t;a) = H\Big|_{(1+a+N,-a,-N)} \tag{4.30} \]
(cf. Eq. (4.10)).

Proof. We see from (1.12) that
\[ F_N(\lambda;a) = \frac{1}{C} \int_{-\infty}^{\infty} dx_1 \cdots \int_{-\infty}^{\infty} dx_N \prod_{j=1}^{N} e^{-x_j^2} (\lambda - x_j)^a \prod_{1\le j<k\le N} (x_k - x_j)^2, \tag{4.31} \]
where the normalization is such that (4.29) is satisfied. Proceeding as in the derivation of (4.6) and (4.13) we see that this can be written in terms of determinants according to
\[ F_N(\lambda;a) = \frac{1}{C} \det\Big[ \int_{-\infty}^{\infty} (\lambda - x)^{a+j+k} e^{-x^2}\, dx \Big]_{j,k=0,\dots,N-1}. \tag{4.32} \]
This is precisely the determinant occurring in (2.52) so we have (4.28). Recalling (2.30), we thus have
\[ F_N(\lambda;a) = F_N(\lambda_0;a) \exp\Big( \int_{\lambda_0}^{\lambda} V_N(t;a)\, dt \Big), \tag{4.33} \]
where VN(t; a) is given in terms of H with (α0, α1, α2) equal to (1 + N + a, −a, −N) as in the formula (4.10) relating UN(t; a) to H. Thus VN(t; a), like UN(t; a) (recall (4.15)), satisfies the nonlinear equation
\[ (V_N'')^2 - 4(tV_N' - V_N)^2 + 4V_N'(V_N' - 2a)(V_N' + 2N) = 0. \tag{4.34} \]
The asymptotic behaviour (4.29) together with (4.33) implies (4.34) is to be solved subject to the boundary condition
\[ V_N(t;a) \sim \frac{Na}{t} \big( 1 + O(1/t) \big) \quad \text{as } t \to \infty. \tag{4.35} \]
Apart from the quantity R(t), which we know decays as a Gaussian, this boundary condition is the same as (4.16). Thus UN(t; a) and VN(t; a) satisfy the same differential equation, and up to a term which decays as a Gaussian, the same boundary condition as
t → ∞. However the t → −∞ behaviours are very different: for UN(t; a) it is given by (4.18), while for VN(t; a) it is (up to a possible phase) again given by (4.35). In addition VN(t; a) satisfies the N-difference equation (4.19) and its a-difference analogue but with the appropriate boundary conditions.

It was noted in the Introduction that for (N+1) × (N+1) dimensional GUE matrices the density, ρ(λ) say, is proportional to $e^{-\lambda^2} F_N(\lambda;2)$. Hence, analogous to (4.20), we see from (4.33) that
\[ \rho(\lambda)\Big|_{N\to N+1} = \rho(\lambda_0)\Big|_{N\to N+1} \exp\Big( \int_{\lambda_0}^{\lambda} [-2t + V_N(t;2)]\, dt \Big). \tag{4.36} \]
On the other hand, we know $\rho(\lambda)|_{N\to N+1}$ is proportional to the 2 × 2 determinants of (1.15). In fact FN(λ; a), for general a ∈ Z>0, can be written as an a × a determinant, a fact which can be understood in the present setting by considering the τ-function sequence (2.64).

Proposition 25. The average of the powers of the characteristic polynomial FN(λ; a) obeys the duality relation
\[ \frac{F_N(\lambda;a)}{F_N(\lambda_0;a)} = \frac{F_a(i\lambda;N)}{F_a(i\lambda_0;N)}, \tag{4.37} \]
for all a, N ∈ Z.

Proof. First, note that by reversing the steps which led to (4.6) and recalling (4.31) the determinant of (2.64) which specifies $\bar\tau_1[n]$ can be written as a multiple integral to give
\[ \bar\tau_1[n](\lambda;\alpha_2) = C F_n(i\lambda;-\alpha_2), \tag{4.38} \]
where |C| = 1. On the other hand, we see from (2.13) that the Hamiltonian corresponding to $\bar\tau_1[n]$,
\[ H\Big|_{(\alpha_0,\alpha_1,\alpha_2)=(1-\alpha_2+n,-n,\alpha_2)}, \tag{4.39} \]
satisfies the differential equation (4.15) with
\[ N = -\alpha_2, \qquad a = n. \tag{4.40} \]
(Note that for the latter identification to be possible, we require a ∈ Z≥0.) Furthermore, from (4.38) and (4.29), we have that
\[ \frac{d}{dt} \log \bar\tau_1[n](t;\alpha_2) \mathop{\sim}_{t\to\infty} -\frac{n\alpha_2}{t}, \tag{4.41} \]
which with the substitutions (4.40) is identical to the boundary condition of (4.35). It follows from these facts that
\[ F_a(i\lambda;N) = F_a(i\lambda_0;N) \exp\Big( \int_{\lambda_0}^{\lambda} V_N(t;a)\, dt \Big). \tag{4.42} \]
Comparison with (4.33) then yields (4.37).
From the definition (4.31) this identity implies the integral identity
\[ \begin{aligned} &\int_{-\infty}^{\infty} dx_1 \cdots \int_{-\infty}^{\infty} dx_N \prod_{j=1}^{N} e^{-x_j^2} (\lambda - x_j)^a \prod_{1\le j<k\le N} (x_k - x_j)^2 \\ &\quad = C \int_{-\infty}^{\infty} dx_1 \cdots \int_{-\infty}^{\infty} dx_a \prod_{j=1}^{a} e^{-x_j^2} (\lambda - ix_j)^N \prod_{1\le j<k\le a} (x_k - x_j)^2, \end{aligned} \tag{4.43} \]
and from (4.32) gives the determinant identity
\[ \det\Big[ \int_{-\infty}^{\infty} (\lambda - x)^a x^{j+k} e^{-x^2}\, dx \Big]_{j,k=0,\dots,N-1} = C \det\Big[ \int_{-\infty}^{\infty} (\lambda - ix)^N x^{j+k} e^{-x^2}\, dx \Big]_{j,k=0,\dots,a-1}. \tag{4.44} \]
The integral identity (4.43) has been derived earlier in the context of a theory of generalized Hermite polynomials based on symmetric Jack polynomials [4], and in fact can be generalized so that on the left-hand side the exponent 2 in the product of differences is replaced by 2c and $x_j^2 \to c x_j^2$ in the Gaussian, while on the right-hand side this same exponent is replaced by 2/c. Regarding the determinant identity, noting that the right-hand side is proportional to
\[ \det\Big[ \int_{-\infty}^{\infty} (\lambda - ix)^{N+j+k} e^{-x^2}\, dx \Big]_{j,k=0,\dots,a-1} = C \det\big[ H_{N+j+k}(\lambda) \big]_{j,k=0,\dots,a-1}, \tag{4.45} \]
this gives a determinant formula for FN(λ; a), equivalent to that given by Brézin and Hikami [7]. Determinants with a Hankel structure constructed with orthogonal polynomial elements are termed Turánians and their positivity and other properties such as relations with novel Wronskians have been investigated extensively, and reviewed in Karlin and Szegö [17]. Explicit evaluations of Turánians of the Hermite polynomials in terms of the Barnes G-function where the initial degree of the polynomial is zero (N = 0) have been given by Radoux [32].

4.3. UN(t; a) and VN(t; a) as Painlevé transcendents. The formula (4.10) relates UN(t; a) to the Hamiltonian (2.13). Substituting the appropriate values of α1 and α2 in the first equality of (2.13) and recalling (2.16) shows
\[ U_N(t;a) = \frac{1}{8q} (q')^2 - \frac18 (q + 2t)(q^2 + 2tq - 4a) - \frac{a^2}{2q} + Nq, \tag{4.46} \]
where q satisfies the PIV equation (2.1) with
\[ \alpha = 2N + 1 + a, \qquad \beta = -2a^2. \tag{4.47} \]
In the case a = 0 the functional expression (4.46) agrees with that presented earlier [35, 38], although the transcendent q in the earlier works is the PIV transcendent with α = 2N − 1, β = 0 rather than α = 2N + 1, β = 0 as given by (4.47). In fact it follows from the work [8] that in general Eq. (2.18) has more than one expression in terms of
Painlevé transcendents. In the case α1 = N, α2 = 0 of this equation the results of [8] give the functional expression implied by (4.46) with q the PIV transcendent specified with the parameters β = 0 and either α = 2N + 1 or α = 2N − 1, thus reconciling (4.46) in the case a = 0 with the results of [35, 38]. We remark that the theory of [8] gives distinct functional forms for the derivative,
\[ U_N'(t;0) = -\tfrac12 \epsilon\, q' - \tfrac12 q^2 - tq, \tag{4.48} \]
in the two cases α = 2N + ε (ε = ±1).

Comparison of the formulas (4.30) and (4.10) shows VN(t; a) is given by the same Hamiltonian as UN(t; a). Thus (4.46) remains true with the function UN(t; a) replaced by VN(t; a) on the left-hand side.

5. Edge Scaling in the GUE

5.1. Calculation of $E^{\rm soft}(s)$ and $\tilde E^{\rm soft}(s;a)$. To leading order the support of the eigenvalue density for N × N GUE matrices is the interval $(-\sqrt{2N}, \sqrt{2N})$. To study distributions in the neighbourhood of the largest eigenvalue one shifts the origin to the edge at $\sqrt{2N}$ and then scales the coordinate so as to make the spacings of order unity in the N → ∞ limit. This is achieved by the mapping [10]
\[ \lambda \mapsto \sqrt{2N} + \frac{\lambda}{\sqrt{2}\, N^{1/6}}. \tag{5.1} \]
Suppose we make this replacement (in the s-variable) in the probability EN(0; (s, ∞)) as specified by (4.7). Then with
\[ E^{\rm soft}(s) := \lim_{N\to\infty} E_N\Big( 0; \Big( \sqrt{2N} + \frac{s}{\sqrt{2}\, N^{1/6}}, \infty \Big) \Big), \tag{5.2} \]
(because the eigenvalue density is not strictly zero outside the leading order of its support the edge is referred to as a soft edge) we see that
\[ E^{\rm soft}(s) = \exp\Big( -\int_{s}^{\infty} r(t)\, dt \Big), \tag{5.3} \]
where
\[ r(t) = \lim_{N\to\infty} \frac{1}{\sqrt{2}\, N^{1/6}}\, R\Big( \sqrt{2N} + \frac{t}{\sqrt{2}\, N^{1/6}} \Big). \tag{5.4} \]
Furthermore, it follows from changing variables $s \mapsto \sqrt{2N} + s/(\sqrt{2} N^{1/6})$ in (1.3), replacing $R(\sqrt{2N} + s/(\sqrt{2} N^{1/6}))$ by $\sqrt{2} N^{1/6} r(s)$ and taking the limit N → ∞ that r(s) satisfies the differential equation
\[ (r'')^2 + 4r'\big( (r')^2 - sr' + r \big) = 0, \tag{5.5} \]
a result first obtained by Tracy and Widom [36]. Equation (5.5) is a particular case of the Jimbo–Miwa–Okamoto σ form of the Painlevé II equation. We will find that the edge scaling is essentially the second coalescence limit of the PIV system to the PII as discussed in Subsect. 3.5.
Proposition 26. Define the scaling limit of the quantity $\tilde E_N(s;a)$ by
\[ \tilde E^{\rm soft}(s;a) := \lim_{N\to\infty} \Big( C e^{-as^2/2} \tilde E_N(s;a) \Big)\Big|_{s \mapsto \sqrt{2N} + s/\sqrt{2}N^{1/6}}. \tag{5.6} \]
Then
\[ \tilde E^{\rm soft}(s;a) = \tilde E^{\rm soft}(s_0;a) \exp\Big(\int_{s_0}^{s} u(t;a)\,dt\Big), \tag{5.7} \]
where
\[ u(t;a) = \lim_{N\to\infty} \frac{1}{\sqrt{2}N^{1/6}} \Big(-at + U_N(t;a)\Big)\Big|_{t \mapsto \sqrt{2N} + t/\sqrt{2}N^{1/6}} \tag{5.8} \]
\[ \phantom{u(t;a)} = -2^{1/3} H(-2^{1/3}t)\Big|_{(\alpha_0,\alpha_1)=(1-a,a)}, \tag{5.9} \]
with $H(t)$ given by (3.4). The function $u(s;a)$ satisfies a second order second degree differential equation of the general Jimbo–Miwa–Okamoto σ form of the Painlevé II equation
\[ (u'')^2 + 4u'\big((u')^2 - su' + u\big) - a^2 = 0, \tag{5.10} \]
subject to the boundary condition
\[ u(s;a) \mathop{\sim}_{s\to-\infty} \frac{1}{4}s^2 + \frac{4a^2-1}{8s} + \frac{(4a^2-1)(4a^2-9)}{64s^4} + \cdots. \tag{5.11} \]
The function $u(s;a)$ also satisfies the third order difference equation, related to the alternate discrete Painlevé a-dPI equation,
\[ \frac{a}{u(s;a+1) - u(s;a-1)} + \frac{a+1}{u(s;a+2) - u(s;a)} = s - [u(s;a+1) - u(s;a)]^2. \tag{5.12} \]

Proof. Unlike the probability $E_N(0;(s,\infty))$, we do not expect the soft edge scaling limit of the quantities $\tilde E_N(s;a)$ as specified by (4.11) to be well defined. For example, in the case $a = 2$, it is the combination $e^{-s^2}\tilde E_N(s;2)$ which is proportional to $p_{\max}(s)$, and thus which should have a well defined scaling limit. This suggests that for general $a$ we consider the scaling limit of
\[ C e^{-as^2/2} \tilde E_N(s;a). \tag{5.13} \]
According to (4.14) we have
\[ e^{-as^2/2} \tilde E_N(s;a) = e^{-as_0^2/2} \tilde E_N(s_0;a) \exp\Big(\int_{s_0}^{s} [-at + U_N(t;a)]\,dt\Big). \tag{5.14} \]
Relation (5.8) follows from the definitions of $\tilde E^{\rm soft}(s;a)$ and $u(t;a)$ and (5.14). The scaling in (5.8) is identical to the coalescence of the PIV system to the PII defined in Prop. 20, with the identifications $\epsilon = (2N)^{-1/6}$, $\alpha_1^{(IV)} = -a$ and (4.10) for the relationship of the PIV Hamiltonian and $U_N(t;a)$, and the scale changes $t_{II} = -2^{1/3}t$ and $H_{II}(t_{II})\big|_{(\alpha_0,\alpha_1)=(1-a,a)} = -2^{-1/3}u(t;a)$. Proceeding as in the derivation of (5.5) we find from (4.15), or from (3.17) using the change of scale (5.9), that $u$ satisfies (5.10). To formulate the boundary condition for $u(t;a)$, we first recall [36] that the $s \to -\infty$ boundary condition of $r(s)$ in (5.5) is given by
\[ r(s) \mathop{\sim}_{s\to-\infty} \frac{1}{4}s^2 - \frac{1}{8s} + \frac{9}{64s^4} + O\Big(\frac{1}{s^7}\Big), \tag{5.15} \]
and this corresponds to the asymptotic behaviour [36]
\[ E^{\rm soft}(s) \mathop{\sim}_{s\to-\infty} \exp\Big(\frac{s^3}{12} - \frac{1}{8}\log(-s)\Big), \tag{5.16} \]
where the sign of the logarithmic term is fixed by integrating the $-1/(8s)$ term of (5.15). Also, we know that $\tilde E^{\rm soft}(s;2)$ is proportional to the derivative of $E^{\rm soft}(s)$, which implies
\[ \tilde E^{\rm soft}(s;2) \mathop{\sim}_{s\to-\infty} \exp\Big(\frac{s^3}{12} + C\log(-s)\Big). \tag{5.17} \]
This suggests that for general $a$ we seek a solution of (5.10) with the $s \to -\infty$ boundary condition
\[ u(s;a) \mathop{\sim}_{s\to-\infty} \frac{1}{4}s^2 + \sum_{j=1}^{\infty} \frac{c_j}{s^j}. \tag{5.18} \]
Substitution shows that in fact (5.10) has a unique solution of this form, and furthermore
\[ c_1 = \frac{4a^2-1}{8}, \qquad c_2 = c_3 = 0, \qquad c_4 = \frac{(4a^2-1)(4a^2-9)}{64}, \qquad \dots. \tag{5.19} \]
The third order difference equation is just (3.37) with the scale change (5.9).
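The substitution step can be reproduced mechanically. The following is a minimal sketch, not the authors' computation: it stores a truncated Laurent series in $s$ as a Python dictionary, inserts the terms of (5.11) into the left-hand side of (5.10), and returns what is left over. The function names are illustrative choices. With only $c_1$ and $c_4$ retained, every surviving term should be of order $s^{-6}$ or smaller. A second helper evaluates both sides of (5.12) with the same truncated $u$, in exact rational arithmetic, for $a \ge 1$.

```python
from fractions import Fraction

def mul(p, q):
    """Multiply two Laurent polynomials stored as {exponent: coefficient}."""
    out = {}
    for i, x in p.items():
        for j, y in q.items():
            out[i + j] = out.get(i + j, 0) + x * y
    return out

def add(*ps):
    """Add Laurent polynomials, dropping zero coefficients."""
    out = {}
    for p in ps:
        for i, x in p.items():
            out[i] = out.get(i, 0) + x
    return {i: x for i, x in out.items() if x != 0}

def diff(p):
    """Differentiate term by term with respect to s."""
    return {i - 1: i * x for i, x in p.items() if i != 0}

def scale(c, p):
    return {i: c * x for i, x in p.items()}

def coefficients(a):
    """c_1 and c_4 as written in (5.19)."""
    a = Fraction(a)
    return (4 * a**2 - 1) / 8, (4 * a**2 - 1) * (4 * a**2 - 9) / 64

def ode_residual(a):
    """Left-hand side of (5.10) for u = s^2/4 + c_1/s + c_4/s^4."""
    a = Fraction(a)
    c1, c4 = coefficients(a)
    u = {2: Fraction(1, 4), -1: c1, -4: c4}
    du = diff(u)
    ddu = diff(du)
    s = {1: Fraction(1)}
    inner = add(mul(du, du), scale(-1, mul(s, du)), u)   # (u')^2 - s u' + u
    return add(mul(ddu, ddu), scale(4, mul(du, inner)), {0: -a**2})

def u_truncated(s, a):
    """The three displayed terms of (5.11), evaluated exactly."""
    s = Fraction(s)
    c1, c4 = coefficients(a)
    return s**2 / 4 + c1 / s + c4 / s**4

def difference_residual(a, s):
    """Left side minus right side of (5.12); intended for a >= 1."""
    u = lambda b: u_truncated(s, b)
    lhs = Fraction(a) / (u(a + 1) - u(a - 1)) + Fraction(a + 1) / (u(a + 2) - u(a))
    rhs = Fraction(s) - (u(a + 1) - u(a)) ** 2
    return lhs - rhs

if __name__ == "__main__":
    print(sorted(ode_residual(1), reverse=True))   # exponents left over for a = 1
    print(float(difference_residual(1, -100)))     # small compared with 1/s^2
```

For $a = 1/2$ and $a = 3/2$ the coefficient $c_4$ vanishes and `ode_residual` returns an empty dictionary, so in those two cases the truncated series is an exact solution of (5.10).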
We see from (5.11) that the asymptotic expansion of $u(s;a)$ terminates for $a$ equal to half an odd integer. Recalling (5.9) this is the case of $\alpha_1$ half an odd integer of the PII Hamiltonian (3.4). From the text about (3.29) we know that this is precisely the parameter value for which the PII Hamiltonian can be expressed as a rational function of $t$.

The fact that $e^{-s^2}\tilde E_N(s;2)$ is proportional to $p_{\max}(s)$ implies that the corresponding quantities in the scaled limit, $\tilde E^{\rm soft}(s;2)$ and $p^{\rm soft}_{\max}(s)$, are proportional. Hence
\[ p^{\rm soft}_{\max}(s) = p^{\rm soft}_{\max}(s_0) \exp\Big(\int_{s_0}^{s} u(t;2)\,dt\Big). \tag{5.20} \]
The relation
\[ p^{\rm soft}_{\max}(s) = \frac{d}{ds} E^{\rm soft}(s) \tag{5.21} \]
(Eq. (1.2)) with $E^{\rm soft}(s)$ specified by (5.3) then implies an identity between transcendents analogous to that of Prop. 23.
Proposition 27. The quantity $u(t;2)$ of (5.20) and the quantity $r(t)$ of (5.3) are related by
\[ u(t;2) = \frac{d}{dt}\log r(t) + r(t). \tag{5.22} \]

Proof. With
\[ H(t)\big|_{\alpha_1 = n} := H[n], \tag{5.23} \]
it follows from (5.9) and the substitution $r(t) = u(t;0)$ that (5.22) is equivalent to
\[ H[2] = \frac{d}{dt}\log H[0] + H[0]. \tag{5.24} \]
But from the analogue of the first equality in (3.21), together with (3.15) and Table 5,
\[ H[2] = T_2^2 H[0] = \Big( H[0] + q[0] + \frac{\alpha_1}{f_1[0]} + T_2\Big(q[0] + \frac{\alpha_1}{f_1[0]}\Big) \Big)\Big|_{\alpha_1=0} = H[0] + \frac{1}{2q^2[0] + t - p[0]}. \]
Furthermore the first equality in (3.18) together with (3.4) in the case $\alpha_1 = 0$ give
\[ \frac{1}{2q^2[0] + t - p[0]} = \frac{d}{dt}\log H[0]. \tag{5.25} \]
Note. According to (5.9) we therefore have
\[ -2^{-1/3} u(-2^{-1/3}t; a) = \tfrac{1}{2}\big(q'(t, a-\tfrac12)\big)^2 - \tfrac{1}{2}\Big(q^2(t, a-\tfrac12) + \frac{t}{2}\Big)^2 - a\,q(t, a-\tfrac12), \tag{5.26} \]
where $q = q(t,\alpha)$ satisfies the PII equation (3.1). Also the first member of (3.18), along with (3.5) and (3.6), gives
\[ \frac{d}{dt}\Big(-2^{-1/3} u(-2^{-1/3}t; a)\Big) = -\tfrac{1}{2}\Big(q'(t, a-\tfrac12) + q^2(t, a-\tfrac12) + \frac{t}{2}\Big). \tag{5.27} \]
In the special case $a = 0$ we see from (3.7) that (5.27) simplifies to read
\[ u'(t;0) = -q^2(t,0) \tag{5.28} \]
while (5.26) simplifies to read
\[ u(t;0) = \big(q'(t,0)\big)^2 - t q^2(t,0) - q^4(t,0). \tag{5.29} \]
The results (5.28) and (5.29), deduced in a different way, can be found in [36].
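5.2. Calculation of $F^{\rm soft}(\lambda;a)$. Let us next consider the scaled quantity
\[ F^{\rm soft}(\lambda;a) := \lim_{N\to\infty} \Big( C e^{-a\lambda^2/2} F_N(\lambda;a) \Big)\Big|_{\lambda \mapsto \sqrt{2N} + \lambda/\sqrt{2}N^{1/6}}, \tag{5.30} \]
where $F_N(\lambda;a)$ is specified by (4.31). Because of the analogy with $\tilde E^{\rm soft}(s;a)$, which follows from the identical structure of (4.14), (4.15) and (4.33), (4.34), analogous to (5.7) we have
\[ F^{\rm soft}(\lambda;a) := F^{\rm soft}(\lambda_0;a) \exp\Big(\int_{\lambda_0}^{\lambda} v(t;a)\,dt\Big), \tag{5.31} \]
where
\[ v(t;a) = \lim_{N\to\infty} \frac{1}{\sqrt{2}N^{1/6}} \Big(-at + V_N(t;a)\Big)\Big|_{t \mapsto \sqrt{2N} + t/\sqrt{2}N^{1/6}} \tag{5.32} \]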
satisfies the differential equation (5.10) and the difference equation (5.12). The only difference between the logarithmic derivatives $v(t;a)$ and $u(t;a)$ is the boundary condition.

Proposition 28. The scaled averages of the powers of the characteristic polynomial $F^{\rm soft}(\lambda;a)$ for $a \in \mathbb{Z}_{\ge 0}$ have the determinantal form
\[ F^{\rm soft}(\lambda;a) = (-1)^{a(a-1)/2} \det\Big[ \frac{d^{j+k}}{d\lambda^{j+k}} {\rm Ai}(\lambda) \Big]_{j,k=0,\dots,a-1}, \tag{5.33} \]
which was shown by Okamoto [26] to be a τ-function sequence for Painlevé II (recall Prop. 18). Furthermore this scaled average has a multiple integral representation of the Kontsevich form [18],
\[ F^{\rm soft}(\lambda;a) = \frac{(-1)^{a(a-1)/2}}{(-2\pi i)^a} \int_{-i\infty}^{i\infty} dv_1 \cdots \int_{-i\infty}^{i\infty} dv_a \prod_{j=1}^{a} e^{v_j^3/3 - \lambda v_j} \prod_{1 \le j < k \le a} (v_k - v_j)^2. \tag{5.34} \]
The logarithmic derivative $v(t;a)$ has the asymptotic expansion
\[ v(t;a) \mathop{\sim}_{t\to\infty} -a t^{1/2} - \frac{a^2}{4t} + \frac{a(4a^2+1)}{32 t^{5/2}}. \tag{5.35} \]
Proof. For positive integer $a$ we can determine the $\lambda \to \infty$ behaviour of $F^{\rm soft}(\lambda;a)$, and thus the corresponding behaviour of $v(t;a)$, by making use of the scaled form of the right-hand side of (4.31). To determine this scaled form, we first require the explicit values of the constants in (4.31), (4.44) and (5.30). Let us denote these constants by $C_1$, $C_2$, $C_3$ respectively. Then from (4.31) and (4.29) we read off that
\[ C_1 := c(N) = \int_{-\infty}^{\infty} dx_1 \cdots \int_{-\infty}^{\infty} dx_N \prod_{j=1}^{N} e^{-x_j^2} \prod_{1 \le j < k \le N} (x_k - x_j)^2 = 2^{-N^2/2} (2\pi)^{N/2} G(N+2), \tag{5.36} \]
where $G(x)$ denotes the Barnes G-function, characterised for $x$ a positive integer by the functional property $G(x+1) = \Gamma(x)G(x)$ and the initial value $G(1) = 1$. The integral evaluation in (5.36) can be derived by making use of the Vandermonde determinant identity (4.5) written in terms of Hermite polynomials. The proportionality constant $C$ in (4.44) is the same as that in (4.43) and thus from (5.36) given by
\[ C_2 = \frac{c(N)}{c(a)}. \tag{5.37} \]
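The closed form in (5.36) can be cross-checked numerically for small $N$. The sketch below is an illustration under stated assumptions rather than the derivation used in the text: it assumes the standard orthogonal-polynomial evaluation of the $N$-fold integral as $N! \prod_{j=0}^{N-1} h_j$, with $h_j = \sqrt{\pi}\, j!/2^j$ the squared norm of the monic Hermite polynomial of degree $j$ for the weight $e^{-x^2}$. The function names are illustrative.

```python
import math

def barnes_g(n):
    """Barnes G at a positive integer, from G(1) = 1 and G(x+1) = Gamma(x) G(x)."""
    g = 1.0
    for x in range(1, n):
        g *= math.gamma(x)   # one step of the recurrence G(x+1) = Gamma(x) * G(x)
    return g

def c_closed_form(n):
    """Right-hand side of (5.36): 2^(-N^2/2) (2 pi)^(N/2) G(N+2)."""
    return 2.0 ** (-n * n / 2) * (2 * math.pi) ** (n / 2) * barnes_g(n + 2)

def c_from_hermite_norms(n):
    """N! times the product of monic Hermite squared norms (assumed standard fact)."""
    prod = float(math.factorial(n))
    for j in range(n):
        prod *= math.sqrt(math.pi) * math.factorial(j) / 2 ** j
    return prod

if __name__ == "__main__":
    for n in range(1, 6):
        print(n, c_closed_form(n), c_from_hermite_norms(n))
```

For $N = 1$ both expressions reduce to $\sqrt{\pi}$, and for $N = 2$ to $\pi$, which can also be confirmed by evaluating the Gaussian integrals by hand.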
Finally, we seek the value of the constant in (5.30). We know that in the case $a = 2$, $e^{-a\lambda^2/2} F_N(\lambda;a)$ is proportional to the eigenvalue density for $(N+1)\times(N+1)$ dimensional GUE matrices. Specifically
\[ \rho(\lambda)\big|_{N \mapsto N+1} = (N+1) \frac{c(N)}{c(N+1)} e^{-\lambda^2} F_N(\lambda;2). \tag{5.38} \]
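Since it is the combination $\rho(\lambda)\,d\lambda$ which has a scaled limit, it follows that in the case $a = 2$, $C_3 = (N+1)c(N)/(c(N+1)\sqrt{2}N^{1/6})$. This suggests that for general $a \in \mathbb{Z}_{\ge 0}$ we should choose
\[ C_3 = c_a \Big(\frac{\Gamma(N+1+a/2)}{\Gamma(N+1)}\Big)^{p} \frac{c(N)}{c(N+a/2)} \frac{1}{\sqrt{2}N^{1/6}}, \tag{5.39} \]
where $p$ is a power to be determined and $c_a$ depends only on $a$ ($p = 1$, $c_a = 1$ for $a = 2$). Substituting (5.36), (5.37), (5.39) in (4.31), (4.44), (5.30) respectively we see that for $a \in \mathbb{Z}_{>0}$,
\[ F^{\rm soft}(\lambda;a) = c_a \lim_{N\to\infty} \Big( \Big(\frac{\Gamma(N+1+a/2)}{\Gamma(N+1)}\Big)^{p} \frac{c(N)}{c(N+a/2)c(a)} \frac{1}{\sqrt{2}N^{1/6}} e^{-a\lambda^2/2} \det\Big[ \int_{-\infty}^{\infty} (\lambda - ix)^N x^{j+k} e^{-x^2} dx \Big]_{j,k=0,\dots,a-1} \Big)\Big|_{\lambda \mapsto \sqrt{2N} + \lambda/\sqrt{2}N^{1/6}}. \tag{5.40} \]
But analogous to the equality in (4.6) we have
\[ \det\Big[ \int_{-\infty}^{\infty} (\lambda - ix)^N x^{j+k} e^{-x^2} dx \Big]_{j,k=0,\dots,a-1} = (-1)^{a(a-1)/2} \det\Big[ \int_{-\infty}^{\infty} (\lambda - ix)^{N+j+k} e^{-x^2} dx \Big]_{j,k=0,\dots,a-1}. \tag{5.41} \]
This can be further rewritten by noting that analogous to (2.60),
\[ \int_{-\infty}^{\infty} (\lambda - ix)^{N+j+k} e^{-x^2} dx = (-2)^{-(j+k)} e^{\lambda^2} \frac{d^{j+k}}{d\lambda^{j+k}} \Big( e^{-\lambda^2} \int_{-\infty}^{\infty} (\lambda - ix)^N e^{-x^2} dx \Big) = (-2)^{-(j+k)} e^{\lambda^2} 2^{-N} \sqrt{\pi}\, \frac{d^{j+k}}{d\lambda^{j+k}} \Big( e^{-\lambda^2} H_N(\lambda) \Big). \tag{5.42} \]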
Making use of the asymptotic expansion for the Barnes G-function [5]
\[ \log G(x+1) \mathop{\sim}_{x\to\infty} \frac{x^2}{2}\log x - \frac{3}{4}x^2 + \frac{x}{2}\log 2\pi - \frac{1}{12}\log x + O(1), \tag{5.43} \]
and the Plancherel–Rotach asymptotic expansion of the Hermite polynomials [33]
\[ \exp(-x^2/2) H_N(x) = \pi^{-3/4} 2^{N/2+1/4} (N!)^{1/2} N^{-1/12} \{ \pi {\rm Ai}(-t/3^{1/3}) + O(N^{-2/3}) \}, \tag{5.44} \]
where $x = (2N)^{1/2} - 2^{-1/2} 3^{-1/3} N^{-1/6} t$ and with ${\rm Ai}(x)$ denoting the Airy function, we see from Eqs. (5.40), (5.41) and (5.42) that with $p = a/2$ in (5.39) and appropriate $c_a$, the determinantal representation (5.33) holds. Furthermore, in the case $a = 2$ we read off the functional form
\[ \big({\rm Ai}'(x)\big)^2 - {\rm Ai}(x){\rm Ai}''(x), \tag{5.45} \]
which is the known expression [10] for the scaled soft edge density in the GUE. Another point of interest, which follows from the integral formula
\[ {\rm Ai}(x) = \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} \exp\Big(\frac{1}{3}v^3 - xv\Big)\,dv, \tag{5.46} \]
is that (5.33) can be written
\[ F^{\rm soft}(\lambda;a) = \frac{(-1)^{a(a-1)/2}}{(-2\pi i)^a} \det\Big[ \int_{-i\infty}^{i\infty} \exp\Big(\frac{1}{3}v^3 - \lambda v\Big) v^{j+k}\,dv \Big]_{j,k=0,\dots,a-1}. \tag{5.47} \]
Thus, reversing the reasoning leading from (4.4) to (4.6) we have the multiple integral representation (5.34) for $F^{\rm soft}(\lambda;a)$, which is an example of the class of integrals studied by Kontsevich [18]. Consider now the asymptotic form of (5.33). In the case $a = 1$ this is just the Airy function, which has the known $x \to \infty$ asymptotic form (see e.g. [30, p. 116])
\[ {\rm Ai}(x) \mathop{\sim}_{x\to\infty} \frac{e^{-\xi}}{2\pi^{1/2} x^{1/4}} \sum_{k=0}^{\infty} (-1)^k \frac{u_k}{\xi^k}, \tag{5.48} \]
where $\xi := \frac{2}{3}x^{3/2}$, $u_0 = 1$ and
\[ u_k = \frac{(2k+1)(2k+3)\cdots(6k-1)}{(216)^k k!}, \qquad k \ge 1. \tag{5.49} \]
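It follows from this and (5.33) that for general $a \in \mathbb{Z}_{>0}$,
\[ \log F^{\rm soft}(\lambda;a) \mathop{\sim}_{\lambda\to\infty} -\frac{2a}{3}\lambda^{3/2} + C\log\lambda + c_0 + \sum_{j=1}^{\infty} \frac{\tilde c_j}{\lambda^{3j/2}}, \tag{5.50} \]
which in combination with (5.31) implies that we must seek a solution of (5.10) (with $u$ replaced by $v$) subject to the boundary condition
\[ v(t;a) \mathop{\sim}_{t\to\infty} -a t^{1/2} + \frac{C}{t} + \sum_{j=1}^{\infty} \frac{c_j}{t^{3j/2+1}}. \tag{5.51} \]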
Substitution of (5.51) in (5.10) shows there is a unique solution of this form, with
\[ C = -\frac{a^2}{4}, \qquad c_1 = \frac{a(1+4a^2)}{32}, \qquad \dots \tag{5.52} \]
in agreement with (5.35).

6. Conclusions – A Programme

We have applied the Okamoto τ-function theory of PIV and PII to the computation of $\tilde E_N(s;a)$ and $F_N(s;a)$ for the GUE and its scaled soft edge limit. As noted in the Introduction, the Okamoto τ-function theory applies equally well to the computation of $\tilde E_N(s;a)$ and $F_N(s;a)$ for all matrix ensembles with a unitary symmetry and classical weight functions (1.6). Thus we expect to be able to compute $\tilde E_N(s;a)$ and $F_N(s;a)$ in the cases of the Laguerre, Jacobi and Cauchy ensembles (special cases of $F_N(s;a)$ have been evaluated in terms of Painlevé transcendents for the Laguerre ensemble [34], and for the Jacobi ensemble [3]). In future studies we will undertake this task by following the programme used here for the GUE, the main steps of which can be itemised as follows:

– Starting from the definitions of the gap probability $E_N(0;I)$, with $I$ a single interval including the boundary of the eigenvalue support, and of $\tilde E_N(s;a)$, $F_N(s;a)$ as $N$-dimensional integrals, these quantities can be converted into $N \times N$ determinants analogous to (4.6), (4.13) and (4.32) respectively.
– Using an identity analogous to (2.60), the determinants can be put into the double Wronskian form (2.58), with $d/dt$ replaced by
\[ t\frac{d}{dt}, \qquad t(1-t)\frac{d}{dt} \tag{6.1} \]
in the Laguerre and Jacobi ensembles respectively.
– The Okamoto τ-function theory of PV and PVI [25, 27] gives these same determinants as τ-function sequences, in which the initial members are $\tau[0] = 1$, and $\tau[1]$ the solution of the particular classical equation associated with the relevant Painlevé transcendent when the parameter sequences begin on a wall of the Weyl chamber in the affine space of parameters. The classical solutions, and their polynomial specialisations, are noted for each of the Painlevé transcendents in Table 8.

Table 8. Classical solutions of the Painlevé transcendents

PJ      Classical Solution          Classical Orthogonal Polynomial
PI      –                           –
PII     Airy                        –
PIII    Bessel                      –
PIV     Hermite-Weber               Hermite
PV      Confluent Hypergeometric    Laguerre
PVI     Gauß Hypergeometric         Jacobi
– The logarithmic derivatives (with $d/dt$ replaced by (6.1) as appropriate) $R_N(s)$, $U_N(s)$, $V_N(s)$ coincide with the Hamiltonians in the Painlevé theory and as such satisfy certain second order second degree ODEs of the Painlevé type.
– The τ-function sequence $\{\tau_0[N](t;a)\}_{N\ge 0}$, say corresponding to $F_N(s;a)$, is simply related to another τ-function sequence $\{\tau_1[a](t;N)\}_{a\ge 0}$. Both τ-functions relate to the same Hamiltonian but result from the action of different shift operators. Because the shifts are commutative one has
\[ \frac{\tau_0[N](t;a)}{\tau_0[N](t_0;a)} = \frac{\tau_1[a](t;N)}{\tau_1[a](t_0;N)}. \tag{6.2} \]
Identities of this type for the Laguerre and Jacobi ensembles, written as multiple integrals, are already known from [4].
– For all the independent shift operators and sequences of $q[n]$, $p[n]$, $H[n]$, $\tau[n]$ there exist difference equations generated by the Bäcklund transformations of these shifts. It has been conjectured that all the difference equations arising in this way are discrete Painlevé equations satisfying integrability criteria such as singularity confinement analogous to the Painlevé criteria.
– In the appropriate edge scaling limit, the analogues of $r(s)$, $u(s)$, $v(s)$ are Hamiltonian functions for PII or PIII, and satisfy the corresponding second order second degree equation.

Acknowledgement. This research has been supported by the Australian Research Council. PJF thanks M. Noumi for explaining aspects of his work with Y. Yamada, and thanks K. Aomoto for obtaining funds for his visit to Japan in June 2000 which made that possible.
References

1. Adler, M., Shiota, T. and van Moerbeke, P.: Random Matrices, Vertex Operators and the Virasoro Algebra. Phys. Lett. A 208, 67–78 (1995)
2. Adler, V.E.: Nonlinear chains and Painlevé equations. Physica D 73, 335–351 (1994)
3. Adler, M. and van Moerbeke, P.: Integrals over classical Groups, Random permutations, Toda and Toeplitz lattices. math.CO/9912143
4. Baker, T.H. and Forrester, P.J.: The Calogero–Sutherland model and generalized classical polynomials. Commun. Math. Phys. 188, 175–216 (1997)
5. Barnes, E.W.: The theory of the G-function. Quart. J. Pure Appl. Math. 31, 264–313 (1900)
6. Bassom, A.P., Clarkson, P.A. and Hicks, A.C.: Bäcklund Transformations and Solution Hierarchies for the Fourth Painlevé Equation. Studies Appl. Math. 95, 1–71 (1995)
7. Brézin, E. and Hikami, S.: Characteristic polynomials of random matrices. math-ph/9910005
8. Cosgrove, C.M. and Scoufis, G.: Painlevé classification of a class of differential equations of the second order and second degree. Stud. Appl. Math. 88, 25–87 (1993)
9. Forrester, P.J.: Random Matrices and Log Gases. Book in preparation
10. Forrester, P.J.: The spectrum edge of random matrix ensembles. Nucl. Phys. B 402, 709–728 (1993)
11. Grammaticos, B. and Ramani, A.: From continuous Painlevé IV to the asymmetric discrete Painlevé I. J. Phys. A: Math. Gen. 31, 5787–5798 (1998)
12. Gromak, V.I.: Bäcklund Transformations of Painlevé Equations and their Applications. In: Conte, R. (ed.) The Painlevé Property: One Century later. CRM Series in Mathematical Physics, New York: Springer Verlag, 1999, pp. 687–734
13. Iwasaki, K., Kimura, H., Shimomura, S. and Yoshida, M.: From Gauss to Painlevé. A modern theory of special functions. Braunschweig: Vieweg Verlag, 1991
14. Jimbo, M. and Miwa, T.: Monodromy preserving deformation of linear ordinary differential equations with rational coefficients II. Physica D 2, 407–448 (1981)
15. Kajiwara, K. and Masuda, T.: A generalization of determinant formulae for the solutions of Painlevé II and XXXIV equations. J. Phys. A: Math. Gen. 32, 3763–3778 (1999)
16. Kajiwara, K., Masuda, T., Noumi, M., Ohta, Y. and Yamada, Y.: Determinant Formulas for the Toda and Discrete Toda Equations. solv-int/9908007
17. Karlin, S. and Szegö, G.: On certain determinants whose elements are orthogonal polynomials. In: Askey, R. (ed.) Gabor Szegö: Collected Papers. Volume 3, Boston, MA: Birkhäuser, 1982, pp. 603–762
18. Kontsevich, M.: Intersection Theory on the Moduli Space of Curves and the Matrix Airy Function. Commun. Math. Phys. 147, 1–23 (1992)
19. Mehta, M.L.: Matrix Theory. Selected Topics and Useful Results. Delhi: Hindustan Publishing Corporation, 1989
20. Noumi, M. and Yamada, Y.: Higher order Painlevé equations of type $A^{(1)}_l$. Funkcial. Ekvac. 41, 483–503 (1998)
21. Noumi, M. and Yamada, Y.: Symmetries in the fourth Painlevé equation and Okamoto polynomials. Nagoya Math. J. 153, 53–86 (1999)
22. Noumi, M. and Yamada, Y.: Affine Weyl group symmetries in Painlevé type equations. In: Howls, C. J., Kawai, T. and Takei, Y. (eds.) Toward the exact WKB analysis of Differential Equations, Linear or Non-Linear. Kyoto: Kyoto University Press, 2000, pp. 245–259
23. Noumi, M. and Yamada, Y.: Affine Weyl groups, discrete dynamical systems and Painlevé equations. Commun. Math. Phys. 199, 281–295 (1998)
24. Okamoto, K.: On the τ-function of the Painlevé equations. Physica D 2, 525–535 (1981)
25. Okamoto, K.: Studies on the Painlevé Equations. II. Fifth Painlevé Equation PV. Jap. J. Math. 13, 47–76 (1985)
26. Okamoto, K.: Studies on the Painlevé Equations. III. Second and Fourth Painlevé Equations, PII and PIV. Math. Ann. 275, 221–255 (1986)
27. Okamoto, K.: Studies on the Painlevé Equations. I. Sixth Painlevé Equation PVI. Ann. Mat. Pura Appl. 146, 337–381 (1987)
28. Okamoto, K.: Algebraic relations among 6 adjacent τ-functions related to the fourth Painlevé system. Kyushu J. Math. 50, 513–532 (1996)
29. Okamoto, K.: The Hamiltonians associated to the Painlevé Equations. In: Conte, R. (ed.) The Painlevé Property: One Century later. CRM Series in Mathematical Physics, New York: Springer Verlag, 1999, pp. 735–787
30. Olver, F.W.J.: Asymptotics and Special Functions. New York: Academic Press, 1974
31. Painlevé, P.: Sur les équations différentielles du second ordre à points critiques fixes. Académie des Sciences Comptes Rendus 143, 1111–1117 (1906)
32. Radoux, C.: Déterminant de Hankel construit sur les polynômes de Hermite. Ann. Soc. Sci. Bruxelles Sér. I 104, 59–61 (1990)
33. Szegö, G.: Orthogonal Polynomials. Colloquium Publications 23, Providence, RI: American Mathematical Society, Third edition, 1967
34. Tracy, C.A. and Widom, H.: On the distributions of the lengths of the longest monotone subsequences in random words. math.CO/9904042
35. Tracy, C.A. and Widom, H.: Fredholm Determinants, Differential Equations and Matrix Models. Commun. Math. Phys. 163, 33–72 (1994)
36. Tracy, C.A. and Widom, H.: Level-spacing distributions and the Airy kernel. Commun. Math. Phys. 159(1), 151–174 (1994)
37. Vorob’ev, A.P.: On the rational solutions of the second Painlevé equation. Differencial’nye Uravnenija 1, 79–81 (1965)
38. Witte, N.S., Forrester, P.J. and Cosgrove, C.M.: Integrability, Random Matrices and Painlevé Transcendents. Anziam J. To appear

Communicated by P. Sarnak
Commun. Math. Phys. 219, 399 – 442 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Correlation Functions for $M^N/S_N$ Orbifolds

Oleg Lunin, Samir D. Mathur

Department of Physics, The Ohio State University, Columbus, OH 43210, USA

Received: 20 July 2000 / Accepted: 17 December 2000
Abstract: We develop a method for computing correlation functions of twist operators in the bosonic 2-d CFT arising from orbifolds $M^N/S_N$, where $M$ is an arbitrary manifold. The path integral with twist operators is replaced by a path integral on a covering space with no operator insertions. Thus, even though the CFT is defined on the sphere, the correlators are expressed in terms of partition functions on Riemann surfaces with a finite range of genus $g$. For large $N$, this genus expansion coincides with a $1/N$ expansion. The contribution from the covering space of genus zero is “universal” in the sense that it depends only on the central charge of the CFT. For 3-point functions we give an explicit form for the contribution from the sphere, and for the 4-point function we do an example which has genus zero and genus one contributions. The condition for the genus zero contribution to the 3-point functions to be non-vanishing is similar to the fusion rules for an $SU(2)$ WZW model. We observe that the 3-point coupling becomes small compared to its large $N$ limit when the orders of the twist operators become comparable to the square root of $N$ – this is a manifestation of the stringy exclusion principle.

1. Introduction

The AdS/CFT correspondence gives a remarkable relation between string theory on a spacetime and a certain conformal field theory (CFT) on the boundary of this spacetime [1]. In particular the near horizon geometry of D3 branes gives the space $AdS_5 \times S^5$, and string theory on this space is conjectured to be dual to $\mathcal{N} = 4$ supersymmetric Yang–Mills on the boundary of the $AdS_5$. When the string theory is weakly coupled, tree level supergravity is a valid low energy approximation. The dual CFT at this point is a strongly coupled Yang–Mills theory, which cannot therefore be studied perturbatively.
On the other hand weakly coupled Yang–Mills theory is dual to string theory in a domain of parameters where the latter cannot be approximated by supergravity on a gently curved spacetime. In spite of this fact, it turns out that certain quantities computed in the supergravity limit of string theory agree with their corresponding dual quantities in the
Yang–Mills theory, where the latter computation is done at weak coupling. One believes that such an agreement is due to the supersymmetry which is present in the theory; this supersymmetry would for example protect the dimensions of chiral operators from changing when the coupling is varied. Interestingly, the values of 3-point correlation functions of chiral operators are also found to agree, when we compare the tree level supergravity calculation on AdS space with the computation in the free Yang–Mills theory [2]; the latter is just the result obtained by Wick contractions among the fields in the chiral operators. It is not clear if the 3-point function of chiral operators is protected against change of coupling at all values of $N$; the above result just tells us that the large $N$ results agree between the weak and strong coupling limits.¹

One is thus led to ask: are the 3-point functions of chiral operators protected in the other cases of the AdS/CFT correspondence? In particular we will be interested in the case of the D1–D5 system [1, 4], which gives a near-horizon geometry $AdS_3 \times S^3 \times M$, where $M$ is a torus $T^4$ or a K3 space. This system is of great interest for the issues related to black holes, since it yields, upon addition of momentum excitations, a supersymmetric configuration which has a classical (i.e. not Planck size) horizon. In particular, the Bekenstein entropy computed from the classical horizon area agrees with the count of microstates for the extremal and near extremal black holes [5]. Further, the low energy Hawking radiation from the hole can be understood in terms of a unitary microscopic process, not only qualitatively but also quantitatively, since one finds an agreement of spin dependence and radiation rates between the semiclassically computed radiation and the microscopic calculation [6].
While it is possible to use simple models for the low energy dynamics of the D1–D5 system when one is computing the coupling to massless modes of the supergravity theory, it is believed that the exact description of this CFT must be in terms of a sigma model with target space being a deformation of the orbifold $M^N/S_N$, which is the symmetric orbifold of $N$ copies of $M$. (Here $N = n_1 n_5$, with $n_1$ being the number of D1 branes and $n_5$ being the number of D5 branes, and we must take the low energy limit of the sigma model to obtain the desired CFT.) In particular we may consider the “orbifold point” where the target space is exactly the orbifold $M^N/S_N$ with no deformation. It was argued in [7] that this CFT does correspond to a certain point in the moduli space of string theories on $AdS_3 \times S^3 \times M$, but at this point the string theory is in a strongly coupled domain where it cannot be approximated by tree level supergravity on a smooth background. The orbifold point is the closest we can get to a “free” theory on the CFT side, and thus this point is the analogue of free $\mathcal{N} = 4$ supersymmetric Yang–Mills in the D3 brane example. Thus one would like to compare the three point functions of chiral operators in the supergravity limit with the 3-point functions at the orbifold point, to see if we have an analogue of the surprising agreement that was found in the case of the duality between $AdS_5$ and 4-d Yang–Mills. The orbifold group in our case is $S_N$, the permutation group of $N$ elements. This group is nonabelian, in contrast to the cyclic group $Z_N$ which has been studied more extensively in the past for computation of correlation functions in orbifold theories [9]. Though there are some results in the literature for general orbifolds [10], the study of nonabelian orbifolds is much less developed than for abelian orbifolds.
¹ See however [3] for an analysis of the finite $N$ case.

It turns out however that the case of the $S_N$ orbifolds has its own set of simplifications which make it possible to develop a technique for computation of correlation functions for these theories. The essential quantities that we wish to compute are the correlation functions of “twist operators”, in the CFT that arises from the infra-red limit of the 2-d sigma model with target space $M^N/S_N$. If we circle the insertion of a twist operator
of the permutation group $S_N$, different copies of the target space $M$ permute into each other. We pass to the covering space of the 2-d base space, such that on this covering space the fields of the CFT are single-valued. For the special case where the orbifold group is $S_N$, the path integral on the base space with twist operators inserted becomes a path integral over the covering space for the CFT with only one copy of $M$, with no operator insertions. Thus the correlation functions of twist operators can be rewritten as partition functions on Riemann surfaces of different genus for the CFT arising from one copy of $M$. In the simplest case, which is also the case giving the leading contribution at large $N$, the genus of the covering surface is zero, and we just get the partition function on the sphere. But the metric on this sphere is determined by the orders and locations of the twist operators. We can write this metric as $g = e^{\phi}\hat g$, for some fiducial metric $\hat g$, provided we take into account the conformal anomaly given by the Liouville action for $\phi$. It turns out that $\phi$ is harmonic outside a finite number of isolated points, so the Liouville action can be computed by observing the local behavior of the covering surface at these points. In this manner we can compute any correlation function of twist operators. For given operators we get contributions from only a finite range of genera for the covering surface. We compute the 2-point function of twist operators, from which we recover their well known scaling dimensions. We then compute the contribution to 3-point functions which comes from covering surfaces of genus zero. This gives the complete result for the fusion coefficients of twist operators for a subset of cases (the “single overlap” cases) and the leading result at large $N$ for all cases. We then compute the 4-point function for twist operators of order 2; this correlator has contributions from genus 0 and genus 1, and we compute both contributions.
We find a certain “universality” in the correlation functions, since the Liouville action depends only on the central charge c of the CFT with target space M. The contribution from higher genus covering surfaces will involve the partition functions at those genera, while the leading order result coming from the covering surface of genus zero will depend only on c. These observations generalize the fact that the dimensions of the twist operators depend only on c. We address only bosonic operators in this paper, though we expect the extension to the supersymmetric case to be relatively straightforward. Thus we also do not address the comparison to supergravity; this comparison should be carried out only for the supersymmetric case. But we do observe some features of our correlation functions that accord with some of the patterns that emerge in the supergravity computation. In particular we find a similarity between the condition for the 3-point correlator to have a contribution from genus zero covering surfaces, and the condition for primaries to fuse together in the SU (2) Wess–Zumino–Witten model. There are several earlier works that relate to the problem we are studying, in particular [8, 11, 12, 14–16]. We will mention these in more detail in the context where they appear. The plan of this paper is the following. Section 2 describes our method of computing correlation functions. In Sect. 3 we compute the 2-point functions, and thus recover the scaling dimensions of the twist operators. In Sect. 4 we construct the map to the covering space for 3-point functions, for the case where the covering space is a sphere. In Sect. 5 we find the Liouville action associated to this map. In Sect. 6 we use the above results to obtain the contribution to 3-point functions from covering surfaces of genus zero. Section 7 discusses an example of a 4-point function. Section 8 is a discussion.
2. Computing Correlation Functions Through the Liouville Action 2.1. The twist operators. Let us consider for simplicity the case M = R, i.e. the noncompact real line. Then the sigma model target space M N /SN can be described through a collection of N free bosons X1 , X2 , . . . XN , living on a plane parameterized by the complex coordinate z. We will see later that we can extend the analysis directly to other CFTs. Let the z plane have the flat metric ds 2 = dzd z¯ .
(2.1)
The CFT is defined through a path integral over the values of the X i , with action S = d 2 z2∂z X i ∂z¯ X i . (2.2) We now make the definition of this partition function more precise. We cut off the z plane at a large radius |z| =
1 , δ
δ small.
(2.3)
We want boundary conditions at this large circle to represent the fact that the identity operator has been inserted at infinity. We will explain the norm of this boundary state later on. Thus we imagine that our CFT is defined on a “sample” of size $1/\delta$: correlation functions are to be computed by putting the operators at $|z_i| \ll 1/\delta$, and we take $\delta$ to zero at the end of the calculation. (An exception will be the insertion of an operator at infinity, which we will have to define separately.) We write the path integral for a single boson on the $z$ plane as
$$Z_\delta = \int DX\, e^{-S}. \tag{2.4}$$
The path integral for $N$ bosons, with no twist operator insertions, is $(Z_\delta)^N$. The twist operator $\sigma_{12}(z_1)$ can be described through the following. Cut a circular hole of radius $\epsilon$ in the $z$ plane about the point $z_1$. While the path integral over $X^i$, $i = 3, \ldots, N$ is left unchanged, we modify the boundary conditions on $X^1, X^2$ such that as we go around the hole at $z_1$ we get
$$X^1 \to X^2, \qquad X^2 \to X^1. \tag{2.5}$$
We call this operator $\sigma^\epsilon_{12}(z_1)$, where the $\epsilon$ in the superscript reminds us of the regulation used to define the twist. Note that we still have to define more precisely the state that is inserted at the edge of the hole of radius $\epsilon$; we will do this below. If we want to maintain the boundary condition at the circle $|z| = 1/\delta$ (and introduce no twist there) we must insert at some point $z_2$ another such twist operator $\sigma^\epsilon_{12}$; the size of the circular hole around $z_2$ is also $\epsilon$. Let us compute the path integral with these boundary conditions, and call it $Z_{\epsilon,\delta}[\sigma_2(z_1), \sigma_2(z_2)]$. Then we define the correlation function
$$\langle \sigma^\epsilon_2(z_1)\,\sigma^\epsilon_2(z_2)\rangle_\delta \equiv \frac{Z_{\epsilon,\delta}[\sigma_2(z_1), \sigma_2(z_2)]}{(Z_\delta)^N}. \tag{2.6}$$
We will later define rescaled twist operators, and the cutoffs $\epsilon$, $\delta$ will disappear from all final answers.
Correlation Functions for M N /SN Orbifolds
2.2. Path integral on the covering space. The functions $X^1, X^2$ above were not single valued in the $z$ plane due to the insertion of the twist operators. We wish to pass to a covering space where these functions would become one single valued function. Since the fields $X^3, \ldots, X^N$ are not involved in the twist, they cancel out in the RHS of (2.6). We thus consider only $X^1, X^2$ in the following. Consider a configuration of the fields $X^1, X^2$ which contributes to the path integral. Consider a simply connected patch of the $z$ plane, which excludes the holes around $z_1, z_2$. Over this patch there are two functions defined: one for each field, though due to the twists there is no global way to label the functions uniquely as being $X^1$ or $X^2$. Take any one of the functions over this patch and call it $X(z)$. To construct the covering surface $\Sigma$, let this open set in $z$ be a patch on $\Sigma$, with the complex structure given by $z$, and the metric also equal to the metric (2.1) of the $z$ plane. There is one field $X$ that will be defined over $\Sigma$, and over this patch let it be the above mentioned function $X(z)$. Now consider another such simply connected open patch, partly overlapping with the first, and use it to define another patch on $\Sigma$. Clearly, as we go around the point $z_1$, following these overlapping patches, the surface $\Sigma$ will look locally like the Riemann surface of a function
$$t = (z - z_1)^{1/2}. \tag{2.7}$$
Further, the functions $X^1, X^2$ will be both encoded in the function $X$ which will be single valued on $\Sigma$, and the action of the configuration of $X^1, X^2$ will be reproduced if in each patch we use for $X$ the action
$$S = \int_{\rm patch} d^2z\; 2\,\partial_z X\,\partial_{\bar z} X. \tag{2.8}$$
The coordinate $z$ cannot be a globally well defined coordinate for $\Sigma$, since a generic value of $z$ corresponds to two points on $\Sigma$. We will call our choice of coordinate on $\Sigma$ as $t$, which will be locally holomorphically related to $z$. In (2.8) we must evaluate the path integral on the patch using the metric induced from the $z$ plane on the patch; the path integral depends in general on the metric and the physical problem is defined through the metric chosen on the $z$ plane. Thus in terms of the coordinate $t$ used to describe $\Sigma$ we will have
$$ds^2 = dz\,d\bar z = \left|\frac{dz}{dt}\right|^2 dt\,d\bar t. \tag{2.9}$$
For the example above we can take
$$z = a\,\frac{t^2}{2t - 1}. \tag{2.10}$$
Near the location $z = 0$, $t = 0$, we evidently have $z \sim t^2$, so that $t$ parameterizes the covering surface near $z = 0$. But the map between $z$ and $t$ is singular also at $z = a$, $t = 1$, since near $t = 1$,
$$\frac{dz}{dt} \approx 2a(t - 1), \qquad z - a \approx a(t - 1)^2. \tag{2.11}$$
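The properties of the map (2.10) quoted here can be checked numerically. The following is a minimal sketch in Python (standard library only); the function names `z_of_t`, `dz_dt` and `preimages` are illustrative choices rather than notation from the paper, and the sample values of $a$ and $z$ are arbitrary.

```python
import cmath

def z_of_t(t, a):
    """The covering map (2.10): z = a t^2 / (2t - 1)."""
    return a * t**2 / (2 * t - 1)

def dz_dt(t, a):
    """Derivative of (2.10): dz/dt = 2 a t (t - 1) / (2t - 1)^2."""
    return 2 * a * t * (t - 1) / (2 * t - 1) ** 2

def preimages(z, a):
    """Both roots t of a t^2 - 2 z t + z = 0, i.e. the two sheets over z."""
    root = cmath.sqrt(z * z - a * z)
    return (z + root) / a, (z - root) / a

a = 1.5 + 0.5j   # arbitrary separation of the two twist operators
z = 0.3 - 0.7j   # arbitrary generic point of the z plane
t_plus, t_minus = preimages(z, a)

print(dz_dt(0, a), dz_dt(1, a))      # dz/dt vanishes at the branch points t = 0, 1
print(z_of_t(0, a), z_of_t(1, a))    # their images are z = 0 and z = a
print(abs(z_of_t(t_plus, a) - z), abs(z_of_t(t_minus, a) - z))
```

A generic $z$ has two distinct preimages, while the two preimages merge at $z = 0$ and $z = a$, which is the double-cover structure described in the text.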
Thus the twist operators are located at z1 = 0 and z2 = a. The covering space (parameterized by t) is a double cover of the original sphere parameterized by z. We had cut the z plane at |z| = 1/δ, and inserted the identity there.
This boundary in the $z$ space corresponds to two regions in the $t$ space: $z \to \infty$ maps to $t \to \infty$ as well as $t \to 1/2$. Thus in the $t$ plane we will have a boundary near infinity, as well as in a small disc cut out around $t = 1/2$. We will have the identity inserted at each of these boundaries; the precise state with norm will be defined in the subsection below. We now note that we have simply a path integral over a single free boson $X$ on $\Sigma$: there is no twist operator left in the problem, and any boundaries present on $\Sigma$ carry only the identity state. We will show below how to close these holes in $\Sigma$, and then we will have just a path integral over a closed surface to compute. The method of passing to a covering space to analyze orbifold correlation functions has been studied by many authors, for example [8, 9]. The observation that for symmetric orbifolds one gets a single copy of the target space with no nontrivial operator insertions on the covering space is implicit in [11]. The notion of passing to the covering space to take into account the twist operators for symmetric orbifolds is also used in the computation of partition functions in [12]. The $Z_2$ orbifold is the same as the $S_2$ orbifold, and the map to the covering space was used in [8, 9] to find the 4-point correlation of twist operators for the $Z_2$ orbifold of a complex boson. We will depart from the usual way of computing the correlation functions of twist operators, and use a different way which we describe below. The usual computation for
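$\langle\sigma\sigma\rangle$ (and the one adopted in [9, 11]) proceeds by first finding
$$f \equiv \frac{\langle \partial X^i(z)\,\partial X^i(w)\,\sigma(z_1)\sigma(z_2)\rangle}{\langle\sigma(z_1)\sigma(z_2)\rangle} \tag{2.12}$$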
by looking at the singularities of $f$ as a function of $z$, $w$, and constructing a function with these singularities. One then takes the limit $z \to w$, subtracts the singularity and constructs the stress tensor $T = \frac{1}{2}\partial X^i \partial X^i(w)$. Next, one uses the conformal Ward identity to relate $\langle T\sigma\sigma\rangle$ to $\langle\partial\sigma\,\sigma\rangle$, thus obtaining an expression for $\partial_z \log\langle\sigma(z)\sigma(0)\rangle$. Solving this equation gives the functional form of the 2-point function, and the dimension of $\sigma$ can be read off from the solution. A similar analysis can be done for the 3-point function, but the functional form of the 3-point function of primary fields is determined by their dimensions, and would tell us nothing new. One cannot find the fusion coefficients $C_{ijk}$ between the twist operators from the 3-point analysis because the method does not determine the overall normalization of the correlator. Thus to find the $C_{ijk}$ one applies the method to the 4-point function $\langle\sigma\sigma\sigma\sigma\rangle$, finds the functional form of this correlator, and then uses factorization to extract the $C_{ijk}$.

To be able to use such a method one must have a simple stress tensor which can be written as a product of fields, each of which has a simple known behavior near the twist operators. One must also use inspection to construct the correlators like (2.12). Further, to find the $C_{ijk}$ we need to go up to the 4-point function. The method we use will apply to $S_N$ orbifolds, but not for example to $Z_N$ orbifolds with $N > 2$. On the other hand we will not need the stress tensor to have a simple form (in fact we will not use the stress tensor at all). Further, we can compute the $C_{ijk}$ using only the 2 and 3-point functions. Thus the method is suited to the computation of correlation functions for CFTs arising from sigma models with target space $M^N/S_N$, which arise in D-brane physics. The method also brings out the fact that many quantities for symmetric orbifolds are “universal” in the sense that they do not depend on the details of the manifold $M$.
2.3. Closing the punctures. For the discussion below the order and number of twist operators can be arbitrary, but for explicitness we assume that the covering space $\Sigma$ has the topology of a sphere, and we use the correlator $\langle\sigma_2\sigma_2\rangle$ as an illustrative example. It will be evident that no new issues arise for other correlators or when $\Sigma$ has higher genus; we will mention the changes for higher genus where relevant. As it stands the covering surface $\Sigma$ that we have constructed has several “holes” in it. We will now give a prescription for closing these holes, thus making a closed surface which we also call $\Sigma$. The prescription for closing the holes will amount to defining precisely the states to be inserted at various boundaries. If the surface is closed then we can use the Liouville action to find the path integral after change of metric; on an open surface the boundary states can change as well. The holes are of the following kinds:

(i) The holes in the finite $z$ plane at the insertion of the twist operators. These holes are circles with radius $\epsilon$ in the $z$ plane, and lift to holes in $\Sigma$ under the map $t(z)$.

(ii) The holes in $\Sigma$ at finite values of $t$, arising from the fact that we have cut the $z$ space at $|z| = 1/\delta$. These holes are located at points $t_0$ where the map behaves as $z \sim \frac{1}{t - t_0}$.

(iii) The hole in $\Sigma$ at $t = \infty$, which also arises from the fact that the $z$ space is cut at $|z| = 1/\delta$. If there is no twist operator at $z = \infty$, then we have $z \sim t$ for large $t$, and the hole in $\Sigma$ is the image of $|z| = 1/\delta$ under this map.
We first complete our definition of the twist operators by defining the state inserted at the edge of the cut out hole; this addresses holes of type (i) above. We had said above that the twist operator $\sigma_{12}$ imposes the boundary condition (2.5), but this does not specify the operator completely. In fact there are an infinite number of operators, with increasing dimensions, which all create the same twist and in general create some further excitation of the fields. We define $\sigma_{12}$ to be the operator from this family with lowest dimension. After mapping the problem to the $t$ space, we need to ask what operator insertions at the points $t = 0$, $t = 1$ give the slowest power law fall off for the partition function when the distance $a$ between the twist operators is increased. The answer is of course that we must insert a multiple of the identity operator at the punctures in the $t$ space. But we will also need to know the norm of this state, and would thus like to construct it through a path integral. Thus let the covering surface be locally defined through (2.7). For $|t| > \epsilon^{1/2}$ the metric on the covering surface is the one induced from the $z$ plane
$$ds^2 = dz\,d\bar z = dt\,d\bar t\left|\frac{dz}{dt}\right|^2 = 4|t|^2\,dt\,d\bar t, \qquad |t| > \epsilon^{1/2}. \tag{2.13}$$
We “close the hole” in the $t$ space by choosing, for $|t| < \epsilon^{1/2}$, the metric
$$ds^2 = 4\epsilon\,dt\,d\bar t, \qquad |t| < \epsilon^{1/2}. \tag{2.14}$$
Thus we have glued in a “flat patch” in the $t$ space to close the hole created in the definition of the twist operator. The metric is continuous across the boundary $|t| = \epsilon^{1/2}$, but there is curvature concentrated along this boundary. The path integral of $X$ over the disc $|t| < \epsilon^{1/2}$ creates the required state along the edge of the hole. The map (2.7) is only the leading order approximation to the actual map in general, but our prescription is to “close with a flat patch” the hole in the $t$ space, where the hole is the image on $\Sigma$ of a circular hole in the $z$ plane. As $\epsilon \to 0$, the small departure of the map from the form (2.7) will cease to matter.
Note that we could have chosen a different metric to replace the choice (2.14) inside the hole, but this would just correspond to a different overall normalization of the twist operator. (Thus it would be like taking a different choice of $\epsilon$.) Once we make the choice (2.14) then we must use the same construction of the twist operator in all correlators, and then the non-universal choices in the definitions will cancel out. The other holes, of types (ii) and (iii), arise from the hole at infinity in the $z$ plane, and we proceed by first replacing the $z$ plane by a closed surface. We take another disc with radius $1/\delta$ (parameterized by a coordinate $\tilde z$) and glue it to the boundary of the $z$ plane. Thus we get a sphere with metric given by
$$ds^2 = \begin{cases} dz\,d\bar z, & |z| < \dfrac{1}{\delta},\\[6pt] d\tilde z\,d\bar{\tilde z}, & |\tilde z| < \dfrac{1}{\delta},\end{cases} \qquad \tilde z = \frac{1}{\delta^2}\,\frac{1}{z}. \tag{2.15}$$
The path integral over the second disc defines a state at the boundary of the first disc. This state is proportional to the identity. But further, our explicit construction gives the state a known norm, which is something we needed to completely define the path integrals like (2.4) and (2.6). Since the $z$ space is closed at infinity, we find that the holes of type (ii) and (iii) are now automatically closed in $\Sigma$, since we make $\Sigma$ as a cover of the $z$ sphere with metric on every patch induced from the metric on this $z$ sphere. The space $\Sigma$ is now a closed surface with a certain metric, and the path integral giving $Z_{\epsilon,\delta}[\sigma_2(z_1), \sigma_2(z_2)]$ in (2.6) is to be carried out on this closed surface.

2.4. The method of calculation. We had found above that if we have twist operators $\sigma_{12}$ at $z = 0$, $z = a$, then the partition function with the twist operators inserted equals that for a single field $X$ on the double cover $\Sigma$ of the $z$ plane given by the map (2.10). $\Sigma$ is also a sphere with infinitesimal holes cut out, but since only the identity operator is inserted at these punctures, we can close the holes and just get the partition function of $X$ on a closed surface $\Sigma$. At this point one might wonder: since this partition function is some given number, how does it depend on the parameter $a$, which gave the separation between the twist operators in the $z$ plane? The point is that even though $\Sigma$ is a sphere for all $a$, the metric on this sphere depends on $a$; this is evident from (2.9). We can compute the partition function of $X$ on $\Sigma$ using some fixed fiducial metric $\hat g$ on the $t$ space, but we must then take into account the conformal anomaly, which says that if $ds^2 = e^{\phi}\,d\hat s^2$, then the partition function $Z^{(s)}$ computed with the metric $ds^2$ is related to the partition function $Z^{(\hat s)}$ computed with $d\hat s^2$ through
$$Z^{(s)} = e^{S_L}\, Z^{(\hat s)}, \tag{2.16}$$
where
$$S_L = \frac{c}{96\pi}\int d^2t\,\sqrt{-g^{(\hat s)}}\left[\partial_\mu\phi\,\partial_\nu\phi\; g^{(\hat s)\mu\nu} + 2R^{(\hat s)}\phi\right] \tag{2.17}$$
is the Liouville action [13]. Here $c$ is the central charge of the CFT. Since we are considering the theory of a single free field $X$ on $\Sigma$, we have $c = 1$.
Let us choose the fiducial metric $\hat g$ on $\Sigma$ to be (in the case where $\Sigma$ is a sphere)
$$d\hat s^2 = \begin{cases} dt\,d\bar t, & |t| < \dfrac{1}{\delta'},\\[6pt] d\tilde t\,d\bar{\tilde t}, & |\tilde t| < \dfrac{1}{\delta'},\end{cases} \qquad \tilde t = \frac{1}{\delta'^2}\,\frac{1}{t}. \tag{2.18}$$
We will let $\delta' \to 0$ at the end. Thus we have chosen the fiducial metric on the $t$ space to be the flat metric of a plane up to a large radius $1/\delta'$, after which we glue an identical disc at the boundary to obtain the topology of a sphere, just as we did for the $z$ space. From (2.16), (2.17) we see that if we increase $\phi$ by a constant, then $Z$ changes by a known factor. Using the fact that for a sphere $\int\sqrt{g}\,R = 8\pi$, we find that the partition function of $X$ on the sphere with metric (2.18) is
$$Z_{\delta'} = Q\,(\delta')^{-\frac{c}{3}} = Q\,(\delta')^{-\frac{1}{3}}. \tag{2.19}$$
Thus we will have $Z^{(\hat s)} = Z_{\delta'}$. Here $Q$ is a constant that is regularization dependent and cannot be determined by anything that we have chosen so far. ($Q$ determines the size of the sphere for which the partition function will attain the value unity; since the CFT has no built-in scale we cannot find the value of this size in any absolute way.) $Q$ will cancel out in all final calculations. The partition function of one boson on the $z$ sphere with the metric (2.15) is $Z_\delta$ (cf. Eq. (2.4)), and we have
$$Z_\delta = Q\,\delta^{-\frac{c}{3}} = Q\,\delta^{-\frac{1}{3}}. \tag{2.20}$$
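The power $-c/3$ in (2.19) and (2.20) follows from evaluating (2.17) on a constant $\phi$. The sketch below does that arithmetic; the function names are illustrative, and the only input beyond (2.17) is the total curvature $8\pi$ of the sphere quoted in the text.

```python
import math

def liouville_action_constant_shift(c, phi0, total_curvature=8 * math.pi):
    """S_L of (2.17) for constant phi = phi0: only the 2 R phi term survives."""
    return c / (96 * math.pi) * 2 * total_curvature * phi0

def partition_function_ratio(c, scale):
    """Z(rescaled sphere) / Z(original sphere) when all lengths are multiplied by `scale`."""
    phi0 = 2 * math.log(scale)  # ds^2 = e^{phi0} d s_hat^2 rescales lengths by e^{phi0/2}
    return math.exp(liouville_action_constant_shift(c, phi0))

# Rescaling lengths by lambda multiplies Z by lambda^(c/3),
# so a sphere of size 1/delta has Z proportional to delta^(-c/3).
print(partition_function_ratio(c=1, scale=8.0))
```

With $c = 1$ and a rescaling by $8$ the ratio is $8^{1/3} = 2$ up to rounding, in line with $Z \propto (\text{size})^{c/3}$.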
2.5. Contributions to $S_L$. The partition function with twist operators inserted can be written as
$$Z_{\epsilon,\delta}[\sigma_{n_1}(z_1), \ldots, \sigma_{n_k}(z_k)] = e^{S_L}\, Z^{(\hat s)}. \tag{2.21}$$
Thus the computation of the correlation function boils down to computing $S_L$. There are three types of contributions to $S_L$, which we will analyze separately:
$$S_L = S_L^{(1)} + S_L^{(2)} + S_L^{(3)}. \tag{2.22}$$
$S_L^{(1)}$ will give the essential numerical contributions to the correlation functions (as well as regulation dependent quantities), while $S_L^{(2)}$ and $S_L^{(3)}$ give only regulation dependent quantities; the regulation parameters cancel out at the end.

(a) We have cut out various discs from the $z$ plane where the physical theory is defined: we have removed infinity by taking $|z| < 1/\delta$ and have also cut out circles of radius $\epsilon$ around the twist operator insertions. Let us call this region of $z$ the “regular region”. This “regular” region of the $z$ space has an image in the $t$ space, which we call the “regular region” on $\Sigma$. On $\Sigma$ we will find, apart from the obvious cuts around the images of the twist operators and a cut near $|t| = \infty$, further possible cuts around images of $z = \infty$ as discussed in Subsect. 2.3. Let the contribution to $S_L$ from this “regular region” of $\Sigma$ be called $S_L^{(1)}$.
To evaluate (2.21) we need to choose a fiducial metric on the $t$ space. Suppose that the map $z(t)$ has the form $z \sim bt$ as $t \to \infty$. (When there is no twist operator at infinity the map can be taken to have this form.) Let this fiducial metric $d\hat s^2$ be of the form (2.18) with
$$\frac{1}{\delta} < \frac{|b|}{\delta'}. \tag{2.23}$$
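With this choice the boundary $|z| = 1/\delta$ gets mapped to a curve inside the disc $|t| < 1/\delta'$ (i.e. into the “first half” of the $t$ sphere). In this “regular region” of $\Sigma$, the fiducial metric (2.18) is flat, and so there is no contribution from the $R\phi$ term in (2.17). Thus we have
$$S_L^{(1)} = \frac{1}{96\pi}\int d^2t\,[\partial_\mu\phi\,\partial^\mu\phi], \tag{2.24}$$
where the integral extends over the region described above. We rewrite (2.24) as
$$S_L^{(1)} = -\frac{1}{96\pi}\int d^2t\,[\phi\,\partial_\mu\partial^\mu\phi] + \frac{1}{96\pi}\int_{\partial}\phi\,\partial_n\phi. \tag{2.25}$$
Here $\partial$ is the boundary of the “regular region” of $\Sigma$, and $\partial_n$ is the normal derivative at the boundary. From (2.9) we find that
$$\phi = \log\left[\frac{dz}{dt}\right] + \log\left[\frac{d\bar z}{d\bar t}\right], \tag{2.26}$$
so that
$$\partial_\mu\partial^\mu\phi = 4\,\partial_t\partial_{\bar t}\,\phi = 0, \tag{2.27}$$
and we get
$$S_L^{(1)} = \frac{1}{96\pi}\int_{\partial}\phi\,\partial_n\phi. \tag{2.28}$$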
The boundaries of the “regular region” are of two kinds: those arising from the holes of size $|z - z_i| = \epsilon$ cut around the twist operators, and those arising from the cutoff at infinity ($|z| = 1/\delta$). Consider the boundary of the hole arising from some twist operator $\sigma_n(z_i)$. We regulated the twist operator by choosing this hole to be a circle in the $z$ plane, so we start by looking at a segment of the boundary using the coordinate $z$. We have
$$\partial_n = -\frac{1}{|z|}\,(z\,\partial_z + \bar z\,\partial_{\bar z}). \tag{2.29}$$
Writing $z = |z|e^{i\theta}$, one finds
$$ds = |z|\,d\theta = \frac{|z|\,dz}{iz} = \frac{|z|\,d\bar z}{-i\bar z}. \tag{2.30}$$
Thus we get
$$\int_{\partial} ds\,\phi\,\partial_n\phi = i\int dz\,\phi\,\partial_z\phi + {\rm c.c.} \tag{2.31}$$
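Since $z$ is holomorphically dependent on $t$, we can write
$$dz\,\partial_z\phi = dt\,\partial_t\phi. \tag{2.32}$$
We can thus write for the contribution to $S_L$ from any hole
$$\frac{1}{96\pi}\int_{\partial} ds\,\phi\,\partial_n\phi = \frac{1}{96\pi}\left[\,i\int dt\,\phi\,\partial_t\phi + {\rm c.c.}\right], \tag{2.33}$$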
where $\phi$ is given through (2.26). A similar analysis applies to all the other boundaries of the “regular region” on $\Sigma$, and we compute (2.33) for each such boundary. Since the “holes” on $\Sigma$ are infinitesimal size punctures, computing (2.33) needs only the leading order behavior of $\phi$ at the punctures.

(b) We had cut out holes of radius $\epsilon$ in the $z$ plane around the insertions of the twist operators, and these gave corresponding holes in the “regular region” of $\Sigma$. We now compute the contribution to $S_L$ from the part $H$ of $\Sigma$ that is used to close such a hole. Since we had closed these holes with the flat metric (2.14), and since the fiducial metric we use on $\Sigma$ is also flat in $H$ ($d\hat s^2 = dt\,d\bar t$), we get $\phi = $ constant, and so there is no contribution from the kinetic term in (2.17). Note that at the boundary of $H$ we have $\partial_t\phi$ nonzero but bounded; since the area of this boundary is zero (the boundary is one-dimensional) we get no contribution to the kinetic term from the boundary either. The curvature term in (2.17) is zero, since the curvature of the fiducial metric is zero throughout the region where the twist operators are inserted. Thus we get no contribution to $S_L$ from these regions $H$ of $\Sigma$.

(c) Now consider the contributions from the points that have finite $t$, but $z \to \infty$. The “regular region” on $\Sigma$ had excluded the image of $|z| > 1/\delta$. This image will have a small disc $D$ around some finite $t_0$, if we have
$$z \approx \frac{\alpha}{t - t_0} + \beta + \cdots. \tag{2.34}$$
The fiducial metric we are using on $\Sigma$ is flat here, so there is no contribution from the curvature term in (2.17). The region inside the disc $D$ has a metric induced from the “second half” of the $z$ sphere (i.e. the part parameterized by $\tilde z$ in (2.15)) so that the metric is $ds^2 = d\tilde z\,d\bar{\tilde z}$. Thus
$$\begin{aligned}\tilde z &= \frac{1}{\delta^2}\frac{1}{z} \approx \frac{1}{\delta^2\alpha}(t - t_0) - \frac{\beta}{\delta^2\alpha^2}(t - t_0)^2,\\ \phi &= \log\frac{d\tilde z}{dt} + {\rm c.c.} \approx \log\frac{1}{\delta^2\alpha} - \frac{2\beta}{\alpha}(t - t_0) + {\rm c.c.},\\ \partial_t\phi &\approx -\frac{2\beta}{\alpha}.\end{aligned} \tag{2.35}$$
The area of the disc $D$ in the fiducial metric is $\pi|t - t_0|^2 \approx \pi(|\alpha|\delta)^2$. As $\delta \to 0$, we find that $\int d^2t\,\partial_t\phi\,\partial_{\bar t}\phi \to 0$. Thus we get no contribution to $S_L$ from these images of the cut at infinity.

(d) Now we look at the region of $\Sigma$ near $t = \infty$. Let $z \approx bt$ for large $t$. Let the image of $|z| = 1/\delta$ be the contour $C$ on $\Sigma$. By the choice (2.23) and the fact that $C$ satisfies $|t| \approx \frac{1}{|b|\delta}$, we find that $C$ is inside the curve $|t| = 1/\delta'$. Let the contribution to $S_L$ from the region between $C$ and $|t| = 1/\delta'$ be called $S_L^{(2)}$.
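Since the fiducial metric (2.18) is flat in this region, there is no contribution from the curvature term in (2.17). For the kinetic term we have
$$\begin{aligned}\tilde z &= \frac{1}{\delta^2}\frac{1}{z} \approx \frac{1}{\delta^2 bt}, \qquad \frac{d\tilde z}{dt} \approx -\frac{1}{\delta^2 bt^2},\\ \partial_t\phi &\approx -\frac{2}{t}, \qquad \int d^2t\,\partial_\mu\phi\,\partial^\mu\phi \approx -32\pi\log\frac{\delta'}{|b|\delta},\end{aligned} \tag{2.36}$$
where the integral runs over the annulus $\frac{1}{|b|\delta} < |t| < \frac{1}{\delta'}$, and we get
$$S_L^{(2)} = -\frac{1}{3}\log\frac{\delta'}{|b|\delta}. \tag{2.37}$$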
(e) Moving further outwards in the $t$ plane, we find a “ring of curvature” at $|t| = 1/\delta'$ (cf. Eq. (2.18)). At this ring we have
$$\phi \approx \log\frac{(\delta')^2}{\delta^2 b} + {\rm c.c.}, \qquad \int d^2t\,R\,\phi = 8\pi\phi, \tag{2.38}$$
which gives a contribution to $S_L$ equal to
$$S_L^{(3)} = \frac{1}{3}\log\frac{(\delta')^2}{\delta^2|b|}. \tag{2.39}$$
The kinetic term in $\phi$ has no contribution at this ring. Further, the region $|t| > 1/\delta'$ gives no contribution to $S_L$, since the curvature of the fiducial metric is zero, and the map gives $\phi = $ constant.

2.6. The correlator in terms of the Liouville field. Let us collect all the above contributions together. Note that
$$S_L^{(2)} + S_L^{(3)} = \frac{1}{3}\log\frac{\delta'}{\delta}, \tag{2.40}$$
so that the variable $b$ drops out of this combination. Now let us go back to the expression (2.6) that we want to evaluate:
$$\langle\sigma^\epsilon_n(0)\,\sigma^\epsilon_n(a)\rangle_\delta = \frac{Z_{\epsilon,\delta}[\sigma_n(z_1), \sigma_n(z_2)]}{(Z_\delta)^N} = e^{S_L}\,\frac{Z^{(\hat s)}}{(Z_\delta)^n}; \tag{2.41}$$
here we used Eq. (2.16). Taking into account the relations (2.19) and (2.20) we finally get:
$$\langle\sigma^\epsilon_n(0)\,\sigma^\epsilon_n(a)\rangle_\delta = e^{S_L}\,Q^{1-n}\left(\frac{\delta^n}{\delta'}\right)^{1/3}. \tag{2.42}$$
Substituting the expression for the Liouville action, $S_L$, we conclude that
$$\langle\sigma^\epsilon_n(0)\,\sigma^\epsilon_n(a)\rangle_\delta = e^{S_L^{(1)}}\,e^{S_L^{(2)} + S_L^{(3)}}\,Q^{1-n}\left(\frac{\delta^n}{\delta'}\right)^{1/3} = e^{S_L^{(1)}}\,\delta^{\frac{n-1}{3}}\,Q^{1-n}. \tag{2.43}$$
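The bookkeeping behind (2.40) and (2.43) is easy to confirm numerically. The sketch below simply evaluates (2.37) and (2.39) for several values of $b$ and of the fiducial cutoff (called `delta_p` here, standing for $\delta'$) and checks that the regulator-dependent factor in (2.43) depends only on $\delta$ and $n$; the function names and sample numbers are illustrative.

```python
import math

def s2(b, delta, delta_p):
    """(2.37): annulus between the contour C and |t| = 1/delta'."""
    return -math.log(delta_p / (abs(b) * delta)) / 3

def s3(b, delta, delta_p):
    """(2.39): ring of curvature at |t| = 1/delta'."""
    return math.log(delta_p**2 / (delta**2 * abs(b))) / 3

def regulator_factor(n, b, delta, delta_p):
    """exp(S2 + S3) * (delta^n / delta')^(1/3), as it appears in (2.43)."""
    return math.exp(s2(b, delta, delta_p) + s3(b, delta, delta_p)) * (delta**n / delta_p) ** (1 / 3)

n, delta = 3, 1e-3
for b in (0.5, 2 - 3j, -7.0):
    for delta_p in (1e-5, 1e-4):
        # Every entry should agree with delta^((n-1)/3), independent of b and delta'.
        print(b, delta_p, regulator_factor(n, b, delta, delta_p), delta ** ((n - 1) / 3))
```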
Thus we observe a cancellation of $\delta'$, which served only to choose a fiducial metric on $\Sigma$ and thus should not appear in any final result. The only quantity that needs computation is $S_L^{(1)}$ using (2.33). Let us mention that formula (2.41) has a simple extension to the case of a general correlation function:
$$\langle\sigma^\epsilon_{n_1}(z_1)\cdots\sigma^\epsilon_{n_k}(z_k)\rangle_\delta = e^{S_L}\,\frac{Z^{(\hat s)}}{(Z_\delta)^s}, \tag{2.44}$$
where $Z^{(\hat s)}$ is a partition function of the covering Riemann surface $\Sigma$ with the fiducial metric $d\hat s^2$ ($\Sigma$ may have any genus), and $s$ is the number of fields involved in nontrivial permutations ($s = n$ in the case of the two point function (2.41)). The partition function $Z^{(\hat s)}$ may depend on the moduli of the surface $\Sigma$ and its size (there are no moduli in the case of the sphere and the size is parameterized by $\delta'$).

3. The 2-Point Function

3.1. The calculation. Let us apply the above scheme to evaluate the 2-point function of twist operators. If one of the twist operators corresponds to the permutation
$$(1 \ldots n), \tag{3.1}$$
then the other one should correspond to the permutation
$$(n \ldots 1), \tag{3.2}$$
since otherwise the correlation function vanishes. Thus we can write
$$\langle\sigma_n(0)\,\sigma_n(a)\rangle \tag{3.3}$$
instead of $\langle\sigma_{(1\ldots n)}(0)\,\sigma_{(n\ldots 1)}(a)\rangle$ without causing confusion. The generalization of the map (2.10) to the case of $\sigma_n$ is
$$z = a\,\frac{t^n}{t^n - (t - 1)^n}. \tag{3.4}$$
For this map we have
$$\begin{aligned}\phi &= \log\left|\frac{dz}{dt}\right|^2 = \log\left[an\,\frac{t^{n-1}(t - 1)^{n-1}}{(t^n - (t - 1)^n)^2}\right] + {\rm c.c.},\\ \frac{d\phi}{dt} &= -\frac{(2t + n - 1)(t - 1)^n - (2t - n - 1)\,t^n}{t\,(t - 1)\,((t - 1)^n - t^n)}.\end{aligned} \tag{3.5}$$
This map has the branch points located at
$$t = 0 \to z = 0 \qquad \text{and} \qquad t = 1 \to z = a. \tag{3.6}$$
There are $n$ images of the point $z = \infty$ in the $t$ plane:
$$t_k = \frac{1}{1 - \alpha_k}, \qquad \alpha_k = e^{\frac{2\pi i k}{n}}, \qquad k = 0, 1, \ldots, n - 1. \tag{3.7}$$
We note that $\alpha_0 = 1$ gives $t = \infty$.
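The statements (3.5) to (3.7) about the map (3.4) can be spot-checked numerically. The sketch below is a sanity check under arbitrary sample values of $n$, $a$ and $t$; the helper names are hypothetical. It compares the closed form of $dz/dt$ with a finite difference and verifies that the denominator of (3.4) vanishes at the finite points $t_k$.

```python
import cmath

def denominator(t, n):
    """Denominator of (3.4): t^n - (t - 1)^n."""
    return t**n - (t - 1) ** n

def z_of_t(t, a, n):
    """The map (3.4)."""
    return a * t**n / denominator(t, n)

def dz_dt(t, a, n):
    """Closed form of dz/dt appearing inside the logarithm in (3.5)."""
    return a * n * t ** (n - 1) * (t - 1) ** (n - 1) / denominator(t, n) ** 2

def finite_images_of_infinity(n):
    """The points t_k = 1 / (1 - alpha_k) of (3.7) for k = 1, ..., n - 1."""
    return [1 / (1 - cmath.exp(2j * cmath.pi * k / n)) for k in range(1, n)]

n, a = 4, 2 - 1j
t, h = 0.3 + 0.4j, 1e-6
finite_difference = (z_of_t(t + h, a, n) - z_of_t(t - h, a, n)) / (2 * h)
print(abs(finite_difference - dz_dt(t, a, n)))            # small
print([abs(denominator(tk, n)) for tk in finite_images_of_infinity(n)])  # all close to 0
print(z_of_t(0, a, n), z_of_t(1, a, n))                   # 0 and a
```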
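Let us compute the contribution (2.33) for the point $z = 0$. Near this point we have:
$$z \approx (-1)^{n+1}\,a\,t^n, \qquad |t| \approx \frac{|z|^{1/n}}{|a|^{1/n}}, \qquad \phi \approx \log[an\,t^{n-1}] + {\rm c.c.}, \qquad \partial_t\phi \approx \frac{n - 1}{t}. \tag{3.8}$$
Then we get the contribution to the Liouville action (2.25):
$$S_L(t = 0) = -\frac{n - 1}{12}\left[\frac{1}{n}\log|a| + \log\left(n\,\epsilon^{\frac{n-1}{n}}\right)\right]. \tag{3.9}$$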
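As a cross-check of (3.9), one can evaluate the boundary term (2.33) numerically on the image of the circle $|z| = \epsilon$, using the exact $\phi$ and $d\phi/dt$ of (3.5) rather than their leading behavior. The sketch below assumes the circle is traversed counterclockwise in $t$ (the orientation implied by (2.29) to (2.31)); the function names, the step count and the sample values are illustrative.

```python
import cmath
import math

def phi(t, a, n):
    """phi = log|dz/dt|^2 for the map (3.4), first line of (3.5)."""
    dz = a * n * t ** (n - 1) * (t - 1) ** (n - 1) / (t**n - (t - 1) ** n) ** 2
    return 2 * math.log(abs(dz))

def dphi_dt(t, n):
    """Holomorphic derivative of phi, second line of (3.5)."""
    num = (2 * t + n - 1) * (t - 1) ** n - (2 * t - n - 1) * t**n
    den = t * (t - 1) * ((t - 1) ** n - t**n)
    return -num / den

def hole_contribution(a, n, eps, steps=4000):
    """(2.33) on the circle |t| = (eps/|a|)^(1/n) around t = 0."""
    r = (eps / abs(a)) ** (1 / n)
    total = 0j
    for j in range(steps):
        t = r * cmath.exp(2j * math.pi * j / steps)
        dt = 1j * t * (2 * math.pi / steps)      # counterclockwise line element
        total += phi(t, a, n) * dphi_dt(t, n) * dt
    # i * integral + c.c. equals twice the real part of i * integral
    return 2 * (1j * total).real / (96 * math.pi)

def leading_order(a, n, eps):
    """Right-hand side of (3.9)."""
    return -(n - 1) / 12 * (math.log(abs(a)) / n + math.log(n * eps ** ((n - 1) / n)))

print(hole_contribution(2.0, 3, 1e-9), leading_order(2.0, 3, 1e-9))
```

The two numbers should agree up to corrections that vanish as $\epsilon \to 0$, which is the sense in which only the leading behavior of $\phi$ at the puncture matters.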
By a reflection symmetry $t \to 1 - t$, $z \to a - z$, we get the same contribution from the other branch point:
$$S_L(t = 1) = -\frac{n - 1}{12}\left[\frac{1}{n}\log|a| + \log\left(n\,\epsilon^{\frac{n-1}{n}}\right)\right]. \tag{3.10}$$
Now we look at the images of infinity. First we note that the integral over the boundary located near $t = \infty$ will give zero, since $d\phi/dt$ goes like $1/t^2$, the length of the circle goes like $t$ and the value of $\phi$ is at best logarithmic in $t$. But we do get a contribution from the images of $z = \infty$ located at finite points in the $t$ plane. Note that if
$$t = \frac{1}{1 - \alpha_k} + x, \qquad \text{then} \qquad (t - 1)^n - t^n \approx \frac{x\,n}{\alpha_k(1 - \alpha_k)^{n-2}}. \tag{3.11}$$
This leads to
$$z \approx -\frac{a\alpha_k}{n(1 - \alpha_k)^2}\,\frac{1}{x}, \qquad x \approx -\frac{a\alpha_k}{n(1 - \alpha_k)^2}\,\frac{1}{z}, \tag{3.12}$$
$$\phi = \log\left[a^{-1}n(1 - \alpha_k)^2\alpha_k^{n-1}z^2\right] + {\rm c.c.}, \qquad \partial_t\phi \approx -\frac{2}{t - (1 - \alpha_k)^{-1}}. \tag{3.13}$$
The point $t = t_k$ we are considering gives the following contribution to the Liouville action:
$$S_L(t = t_k) = \frac{1}{6}\log\left|a^{-1}n(1 - \alpha_k)^2\alpha_k^{n-1}\delta^{-2}\right|. \tag{3.14}$$
Thus the total contribution from the images of infinity is:
$$S_L(z = \infty) = \sum_{k=1}^{n-1}\frac{1}{6}\log\left|\frac{n(1 - \alpha_k)^2\alpha_k^{n-1}}{a\,\delta^{2}}\right| = -\frac{n - 1}{6}\log[|a|\delta^2] + \frac{n + 1}{6}\log[n]. \tag{3.15}$$
We have used the following properties of $\alpha_k$:
$$\prod_{k=1}^{n-1}\alpha_k = (-1)^{n-1}; \qquad \prod_{k=1}^{n-1}(q - \alpha_k) = \frac{q^n - 1}{q - 1} \to n, \quad \text{if} \quad q \to 1, \tag{3.16}$$
which follow from the fact that $\{\alpha_k\}$ is the set of different solutions of the equation $\alpha^n - 1 = 0$ and $\alpha_0 = 1$.
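The root-of-unity identities and the sum over images of infinity can be verified directly. This is a minimal numerical sketch with hypothetical helper names and arbitrary sample values of $a$ and $\delta$; it evaluates the sum of (3.14) term by term and compares it with the closed form on the right of (3.15).

```python
import cmath
import math

def nontrivial_roots(n):
    """alpha_k = exp(2 pi i k / n) for k = 1, ..., n - 1."""
    return [cmath.exp(2j * math.pi * k / n) for k in range(1, n)]

def product_q_minus_alpha(q, n):
    """Left-hand side of the second identity in (3.16)."""
    result = 1
    for alpha in nontrivial_roots(n):
        result *= q - alpha
    return result

def s_infinity_sum(n, a, delta):
    """Sum of the contributions (3.14) over k = 1, ..., n - 1."""
    return sum(
        math.log(abs(n * (1 - alpha) ** 2 * alpha ** (n - 1) / (a * delta**2)))
        for alpha in nontrivial_roots(n)
    ) / 6

def s_infinity_closed(n, a, delta):
    """Closed form on the right of (3.15)."""
    return -(n - 1) / 6 * math.log(abs(a) * delta**2) + (n + 1) / 6 * math.log(n)

for n in range(2, 8):
    print(n, product_q_minus_alpha(1, n), s_infinity_sum(n, 0.7 + 0.2j, 1e-3),
          s_infinity_closed(n, 0.7 + 0.2j, 1e-3))
```

Only the modulus of the product of the $\alpha_k$ enters, since the logarithms in (3.14) and (3.15) are real.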
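Adding all the contributions together, we get an expression for the interesting part of the Liouville action:
$$S_L^{(1)} = -\frac{1}{6}\left[\left(n - \frac{1}{n}\right)\log|a| + \frac{(n - 1)^2}{n}\log\epsilon + 2(n - 1)\log\delta - 2\log n\right]. \tag{3.17}$$
This leads to the final expression for the correlation function (see (2.43)):
$$\langle\sigma^\epsilon_n(0)\,\sigma^\epsilon_n(a)\rangle_\delta = e^{S_L^{(1)}}\,\delta^{\frac{n-1}{3}}\,Q^{1-n} = a^{-\frac{1}{6}\left(n - \frac{1}{n}\right)}\,C_n\,\epsilon^{A_n}\,Q^{B_n}, \tag{3.18}$$
$$A_n = -\frac{(n - 1)^2}{6n}, \qquad B_n = 1 - n, \qquad C_n = n^{1/3}. \tag{3.19}$$
Thus we read off the dimension $\Delta_n$ of $\sigma_n$,
$$\Delta_n = \frac{1}{24}\left(n - \frac{1}{n}\right). \tag{3.20}$$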
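The step from (3.9), (3.10) and (3.15) to (3.17) is pure coefficient bookkeeping, which can be done with exact rational arithmetic. In the sketch below each contribution is stored as a tuple of coefficients of $(\log|a|, \log\epsilon, \log\delta, \log n)$; this representation and the function names are illustrative conventions, not the paper's.

```python
from fractions import Fraction

def branch_point_coeffs(n):
    """Coefficients of (log|a|, log eps, log delta, log n) in (3.9)."""
    pref = Fraction(-(n - 1), 12)
    return (pref * Fraction(1, n), pref * Fraction(n - 1, n), Fraction(0), pref)

def infinity_coeffs(n):
    """The same coefficients for the closed form in (3.15)."""
    return (Fraction(-(n - 1), 6), Fraction(0), Fraction(-(n - 1), 3), Fraction(n + 1, 6))

def total_coeffs(n):
    """Two branch points, (3.9) and (3.10), plus the images of infinity."""
    return tuple(2 * x + y for x, y in zip(branch_point_coeffs(n), infinity_coeffs(n)))

def expected_coeffs(n):
    """Coefficients read off from (3.17)."""
    return (
        -(n - Fraction(1, n)) / 6,
        -Fraction((n - 1) ** 2, 6 * n),
        Fraction(-(n - 1), 3),
        Fraction(1, 3),
    )

for n in range(2, 6):
    print(n, total_coeffs(n), total_coeffs(n) == expected_coeffs(n))
```

The $\log\delta$ coefficient $-(n-1)/3$ is exactly what cancels the explicit factor $\delta^{(n-1)/3}$ in (3.18), and the $\log n$ coefficient $1/3$ gives $C_n = n^{1/3}$.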
The other constants in (3.18) are to be absorbed into the normalization of $\sigma_n$. We will discuss this renormalization after computing the 3-point functions.

3.2. “Universality” of the 2-point function. The theory we have considered above is that of the orbifold $M^N/S_N$, where the manifold $M$ is just $R$, the real line. If $M$ was $R^d$ instead, we could treat the $d$ different species of fields independently, and obtain
$$\Delta_n = \frac{c}{24}\left(n - \frac{1}{n}\right), \tag{3.21}$$
where $c = d$ is the central charge of the CFT for one copy of $M = R^d$. But we see that we would obtain the result (3.21) for the symmetric orbifold with any choice of $M$; we just use the value of $c$ for the CFT on $M$. Around the insertion of the twist operator we permute the copies of $M$, but the definition of the twist operator does not involve directly the structure of $M$ itself. The Liouville action (2.17) determines the correlation function using only the value $c$ of the CFT. Thus we recover the result (3.21) for any $M$. This “universality” of $\Delta_n$ is well known, and the value of $\Delta_n$ can be deduced from the following standard argument. Consider the CFT on a cylinder parameterized by $w = x + iy$, $0 < y < 2\pi$. At $x \to -\infty$ let the state be the vacuum of the orbifold CFT $M^N/S_N$. Since there is no twist, each copy of $M$ gives its own contribution to the vacuum energy, which thus equals $-\frac{c}{24}N$. Now insert the twist operator $\sigma_n$ at $w = 0$, and look at the state for $x \to \infty$. The copies of $M$ not involved in the twist contribute $-\frac{c}{24}$ each as before, but those that are twisted by $\sigma_n$ turn into effectively one copy of $M$ defined on a circle of length $2\pi n$. Thus the latter set contribute $-\frac{c}{24n}$ to the vacuum energy. The change in the energy between $x = \infty$ and $x = -\infty$ gives the dimension of $\sigma_n$ (since the state at $x \to -\infty$ is the vacuum)
$$-\frac{c}{24n} - \left[-\frac{cn}{24}\right] = \frac{c}{24}\left(n - \frac{1}{n}\right) = \Delta_n. \tag{3.22}$$
Thus while our calculation of the 2-point function has not taught us anything new, we have obtained a scheme that will yield the higher point functions for symmetric orbifolds using an extension of the same universal features that gave the value of $\Delta_n$ in the above argument.
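The cylinder argument behind (3.22) amounts to a two-line computation. A small sketch with exact fractions follows; the names `vacuum_energy` and `twist_dimension` are hypothetical labels for the quantities described in the text.

```python
from fractions import Fraction

def vacuum_energy(c, length_factor=1):
    """Vacuum energy -c/(24 L) of one copy of M on a circle of length 2 pi L."""
    return -Fraction(c) / (24 * length_factor)

def twist_dimension(n, c=1):
    """Energy after the twist minus energy before, for the n copies involved, as in (3.22)."""
    return vacuum_energy(c, n) - n * vacuum_energy(c)

print(twist_dimension(2, 1))   # c = 1, n = 2
print(twist_dimension(3, 6))   # c = 6, n = 3
```

For $c = 1$, $n = 2$ this gives $1/16$, the familiar dimension of the $Z_2$ twist field of a single real boson.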
4. The Map for the 3-Point Function

4.1. Genus of the covering surface. Let us first discuss the nature of the covering surface $\Sigma$ for the case where we have an arbitrary number of twist operators in the correlation function $\langle\sigma_{n_1}\sigma_{n_2}\ldots\sigma_{n_k}\rangle$. The CFT is still defined on the plane $z$, which we will for the moment regard as a sphere by including the point at infinity. At the insertion of the operator $\sigma_{n_j}(z_j)$ the covering surface $\Sigma$ has a branch point of order $n_j$, which means that $n_j$ sheets of $\Sigma$ meet at $z_j$. One says that the ramification order at $z_j$ is $r_j = n_j - 1$. Suppose further that over a generic point $z$ there are $s$ sheets of the covering surface $\Sigma$. Then the genus $g$ of $\Sigma$ is given by the Riemann–Hurwitz formula:
$$g = \frac{1}{2}\sum_j r_j - s + 1. \tag{4.1}$$
Let us now consider the 3-point function. We require each twist operator to correspond to a single cycle of the permutation group, and regard the product of two cycles to represent the product of two different twist operators. Let the cycles have lengths $n$, $m$, $q$ respectively. It is easy to see that we can obtain covering surfaces of various genera. For example, if we have
$$\sigma_{12}\,\sigma_{13}\,\sigma_{123} \tag{4.2}$$
as the three permutations, then we have $r_1 = 1$, $r_2 = 1$, $r_3 = 2$, $s = 3$, and we get $g = 0$. On the other hand with
$$\sigma_{123}\,\sigma_{123}\,\sigma_{123} \tag{4.3}$$
we get $r_1 = r_2 = r_3 = 2$, $s = 3$ and we get $g = 1$. (This genus 1 surface is a singular limit of the torus, however.) Let us concentrate on the case where we get $g = 0$. Without loss of generality we can take the first permutation $\sigma_n$ to be the cycle
$$(1, 2, \ldots, k, k + 1, \ldots, n). \tag{4.4}$$
The second permutation is restricted by the requirement that when composed with (4.4) it yields a single cycle (which would be the conjugate permutation of the third twist operator). In addition we must have a sufficiently small number of indices in the result of the first two permutations so that we do get $g = 0$. A little inspection shows that $\sigma_m$ must have the form
$$(k, k - 1, \ldots, 1, n + 1, n + 2, \ldots, n + m - k). \tag{4.5}$$
Thus the elements $1, 2, \ldots, k$ of the first permutation occur in the second permutation in the reverse order, and then we have a new set of elements $n + 1, \ldots, n + m - k$. These two permutations compose to give the cycle $\sigma_m\sigma_n$ equal to
$$(k + 1, k + 2, \ldots, n, 1, n + 1, n + 2, \ldots, n + m - k). \tag{4.6}$$
Thus $\sigma_q$ must be the inverse of the cycle (4.6), and we have
$$q = n + m - 2k + 1. \tag{4.7}$$
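The composition of the cycles (4.4) and (4.5) can be carried out explicitly. The sketch below assumes that the product $\sigma_m\sigma_n$ is read as "apply $\sigma_m$ first, then $\sigma_n$", where a cycle $(a_1, a_2, \ldots)$ sends $a_1 \to a_2$ and so on; this reading is an assumption, chosen because it reproduces the cyclic order written in (4.6). All function names are illustrative.

```python
def cycle_to_map(cycle):
    """Permutation sending each element of the cycle to the next one, as a dict."""
    return {x: cycle[(i + 1) % len(cycle)] for i, x in enumerate(cycle)}

def compose(first, second, universe):
    """Apply `first`, then `second`, to every element of `universe`."""
    return {x: second.get(first.get(x, x), first.get(x, x)) for x in universe}

def nontrivial_cycles(perm):
    """Cycles of length greater than one of a permutation given as a dict."""
    seen, cycles = set(), []
    for start in sorted(perm):
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        if len(cycle) > 1:
            cycles.append(cycle)
    return cycles

def three_point_cycles(n, m, k):
    """Nontrivial cycles of the product of (4.4) and (4.5) with overlap k."""
    sigma_n = list(range(1, n + 1))                                       # (4.4)
    sigma_m = list(range(k, 0, -1)) + list(range(n + 1, n + m - k + 1))   # (4.5)
    universe = range(1, n + m - k + 1)
    return nontrivial_cycles(compose(cycle_to_map(sigma_m), cycle_to_map(sigma_n), universe))

cycles = three_point_cycles(n=5, m=4, k=2)
print(cycles)   # one cycle, a rotation of (3, 4, 5, 1, 6, 7), of length 5 + 4 - 2*2 + 1 = 6
```

The elements $2, \ldots, k$ are fixed by the product, which is why the resulting cycle has length $n + m - 2k + 1$ rather than $n + m - k$.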
Note that the number of “overlaps” (i.e., common indices) between $\sigma_n$ and $\sigma_m$ is $k$. Note that we must have $k \ge 1$ in order that the product $\sigma_m\sigma_n$ be a single cycle rather than just a product of two cycles. Also note that if we have $q = n + m - 1$, then since $s \ge q$, (4.1) gives that $\Sigma$ must have genus zero (this will be a “single overlap” correlator). Let $\Sigma$ be the covering surface that corresponds to the insertions $\sigma_n(z_1)\,\sigma_m(z_2)\,\sigma_q(z_3)$. Then the number of sheets of $\Sigma$ over a generic point $z$ is just the total number of indices used in the permutations
$$s = n + m - k = \frac{1}{2}(n + m + q - 1). \tag{4.8}$$
Thus the genus of $\Sigma$ is
$$\frac{n - 1}{2} + \frac{m - 1}{2} + \frac{q - 1}{2} - s + 1 = 0. \tag{4.9}$$
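The Riemann–Hurwitz count (4.1) and the genus-zero statement (4.9) can be tabulated in a few lines. This is a minimal sketch; the function name `genus` and its argument names are illustrative, and it returns an exact fraction without checking that a covering surface with the given data exists.

```python
from fractions import Fraction

def genus(cycle_lengths, sheets):
    """Riemann-Hurwitz formula (4.1): g = (1/2) * sum_j (n_j - 1) - s + 1."""
    return Fraction(sum(n - 1 for n in cycle_lengths), 2) - sheets + 1

print(genus([2, 2, 3], sheets=3))   # example (4.2)
print(genus([3, 3, 3], sheets=3))   # example (4.3)

# The family (4.4)-(4.7): q = n + m - 2k + 1 and s = n + m - k always give g = 0.
for n, m, k in [(3, 3, 1), (5, 4, 2), (6, 6, 4)]:
    q, s = n + m - 2 * k + 1, n + m - k
    print(n, m, k, genus([n, m, q], s))
```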
4.2. The map for the case $g = 0$. We are looking for a covering surface of the sphere that is ramified at three points on the sphere, with a finite order of ramification at each point. We look for the map from $z$ to $\Sigma$ as a ratio of two polynomials
$$z = \frac{f_1(t)}{f_2(t)}; \tag{4.10}$$
the existence of such a map will be evident from its explicit construction. By using the $SL(2, C)$ symmetry group of the $z$ sphere, we will place the twist operators $\sigma_n$, $\sigma_m$, $\sigma_q$ at $z = 0$, $z = a$, $z = \infty$ respectively. We can assume without loss of generality that
$$n \le q, \qquad m \le q. \tag{4.11}$$
Note that we had placed a cutoff in the $z$ plane to remove the region at infinity, and it will not be immediately clear how to normalize a twist that occurs around the circle at infinity. We will discuss this issue of normalization later. By making an $SL(2, C)$ transformation $t \to \frac{at + b}{ct + d}$ of the surface $\Sigma$, which we assume is parameterized by the coordinate $t$, we can take
$$z(t = 0) = 0, \qquad z(t = \infty) = \infty, \qquad z(t = 1) = a. \tag{4.12}$$
Note that this $SL(2, C)$ transformation maintains the form (4.10) of $z$ to be a ratio of two polynomials, and we will use the symbols $f_1$, $f_2$ to denote the polynomials after the choice (4.12) has been made. Since we need $s$ values of $t$ for a generic value of $z$, with $s$ given by (4.8), the relation (4.10) should give a polynomial equation of order $s$ for $t$. Thus the degrees $d_1$, $d_2$ of the polynomials $f_1$, $f_2$ should satisfy:
$$\max(d_1, d_2) = s = \frac{1}{2}(n + m + q - 1).$$
Since we have chosen $t = \infty$ for $z = \infty$, we get $d_1 > d_2$, and we have
$$d_1 = \frac{1}{2}(n + m + q - 1). \tag{4.13}$$
The requirement of the proper behavior at infinity ($z \sim t^q$) then gives:
\[
d_2 = d_1 - q = \frac{1}{2}(n + m - q - 1).
\tag{4.14}
\]
Finally, the number of indices common between the permutations $\sigma_n(0)$ and $\sigma_m(a)$ (the overlap) is
\[
\frac{1}{2}\bigl(n + m - (q - 1)\bigr) = d_2 + 1.
\tag{4.15}
\]
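The bookkeeping in (4.8), (4.9) and (4.13) to (4.15) is easy to get wrong by a unit, so a small numerical sketch may help. The function name `covering_data` and its error handling are illustrative assumptions, not part of the original text; only the formulas themselves are taken from the equations above, and the genus-zero case is assumed throughout.

```python
from fractions import Fraction


def covering_data(n, m, q):
    """Sheet count, polynomial degrees, genus and overlap for twists of order n, m, q.

    Hypothetical helper: it only evaluates (4.8), (4.9) and (4.13)-(4.15)
    for the genus-zero three-point map.
    """
    total = n + m + q - 1
    if total % 2 != 0:
        raise ValueError("n + m + q - 1 must be even")
    s = total // 2                      # number of sheets, eq. (4.8)
    d1 = s                              # degree of f1, eq. (4.13)
    d2 = d1 - q                         # degree of f2, eq. (4.14)
    if d2 < 0:
        raise ValueError("need q <= n + m - 1")
    # genus from eq. (4.9); kept as an exact fraction
    genus = Fraction(n - 1, 2) + Fraction(m - 1, 2) + Fraction(q - 1, 2) - s + 1
    overlap = d2 + 1                    # number of common indices, eq. (4.15)
    return s, d1, d2, genus, overlap


if __name__ == "__main__":
    print(covering_data(2, 2, 3))   # single overlap: d2 = 0
    print(covering_data(3, 3, 3))   # two overlaps: d2 = 1
```

For instance, $(n,m,q) = (2,2,3)$ gives three sheets with a single overlap, while $(3,3,3)$ gives four sheets with two overlaps.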
Let us now look at the structure required of the map (4.10). For $z \to 0$ we need
\[
z = t^n\,(C_0 + O(t)).
\tag{4.16}
\]
Similarly for $t \to 1$ we need
\[
z = a + (t - 1)^m\,(C_1 + O(t - 1)).
\tag{4.17}
\]
Then we find
\[
f_2^2\,\frac{dz}{dt} = f_1' f_2 - f_2' f_1 = C\,t^{n-1}(1 - t)^{m-1}
\tag{4.18}
\]
($C$ is a constant). The last step follows on noting that the expression $f_1' f_2 - f_2' f_1$ is a polynomial of degree $d_1 + d_2 - 1 = n + m - 2$, and the behavior of $z$ near $z = 0$, $z = a$ already provides all the possible zeros of this polynomial $f_2^2\,\frac{dz}{dt}$. The expression in (4.18) is just the Wronskian of $f_1$, $f_2$, and our knowledge of this Wronskian gives an easy way to find these polynomials. We seek a second order linear differential equation whose solutions are the linear span $f = \alpha f_1 + \beta f_2$. Such an equation is found by observing that
\[
\begin{vmatrix}
f & f' & f'' \\
f_1 & f_1' & f_1'' \\
f_2 & f_2' & f_2''
\end{vmatrix} = 0
\tag{4.19}
\]
so that we get the equation
\[
W f'' - W' f' + c(t) f = 0,
\tag{4.20}
\]
where
\[
W = f_2 f_1' - f_1 f_2', \qquad c(t) = f_2' f_1'' - f_1' f_2''.
\tag{4.21}
\]
Here $W$ is given by (4.18). The coefficient $-W'$ of $f'$ is
\[
-W' = -C\,t^{n-2}(1 - t)^{m-2}\,[(n - 1) - (n + m - 2)t].
\tag{4.22}
\]
The coefficient $c(t)$ must be a polynomial of degree $n + m - 4$ but in fact we can argue further that it must have the form
\[
\gamma\, t^{n-2}(1 - t)^{m-2}, \qquad \gamma = \text{constant}.
\tag{4.23}
\]
To see this look at the equation near $t = 0$. Let $c(t) \sim \alpha t^k$ with $k < n - 2$. Then the equation reads
\[
t^{n-1-k} f'' - (n - 1)\,t^{n-2-k} f' + \frac{\alpha}{C} f = 0.
\tag{4.24}
\]
Note that the two polynomials $f_1$, $f_2$ which solve the equation must not have a common root $t = 0$, since we assume that (4.10) is already expressed in reduced form. Thus at least one of the solutions must go like $f \sim$ constant at $t = 0$, which is in contradiction with (4.24) since the first two terms on the LHS vanish while the last does not ($\alpha \neq 0$ by definition). Thus $c(t)$ has a zero of order at least $n - 2$ at $t = 0$, and by a similar argument, a zero of order at least $m - 2$ at $t = 1$. Thus the result (4.23) follows. Dividing through by $C\,t^{n-2}(1 - t)^{m-2}$ we can write Eq. (4.20) as
\[
t(1 - t) f'' + [-(n - 1) + (n + m - 2)t] f' + \tilde\gamma f = 0, \qquad \tilde\gamma = \gamma / C.
\tag{4.25}
\]
Let us now look at $t \to \infty$, and let the solutions to the above equation go like $t^p$. Then we get
\[
-p(p - 1) + p(m + n - 2) + \tilde\gamma = 0,
\tag{4.26}
\]
which has the solutions
\[
p_\pm = \frac{1}{2}\left[m + n - 1 \pm \sqrt{(m + n - 1)^2 + 4\tilde\gamma}\,\right].
\tag{4.27}
\]
But since we have a twist operator of order $q$ at infinity, we must have
\[
p_+ - p_- = q.
\tag{4.28}
\]
This gives
\[
\tilde\gamma = \frac{1}{4}(q - m - n + 1)(q + m + n - 1) = -d_1 d_2.
\tag{4.29}
\]
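As a quick consistency check of (4.26) to (4.29), one can solve the indicial equation with $\tilde\gamma = -d_1 d_2$ and confirm that the two exponents at infinity are exactly the polynomial degrees $d_1$ and $d_2$, with difference $q$. The sketch below does this in integer arithmetic; the function name `exponents_at_infinity` is a hypothetical helper introduced only for illustration.

```python
import math


def exponents_at_infinity(n, m, q):
    """Solve the indicial equation (4.26) with gamma-tilde taken from (4.29).

    Returns (p_plus, p_minus, d1, d2). Illustrative helper, not from the paper.
    """
    d1 = (n + m + q - 1) // 2           # eq. (4.13)
    d2 = d1 - q                         # eq. (4.14)
    gamma_tilde = -d1 * d2              # eq. (4.29)
    disc = (m + n - 1) ** 2 + 4 * gamma_tilde
    root = math.isqrt(disc)
    if root * root != disc:
        raise ValueError("discriminant is not a perfect square")
    p_plus = (m + n - 1 + root) // 2    # eq. (4.27), upper sign
    p_minus = (m + n - 1 - root) // 2   # eq. (4.27), lower sign
    return p_plus, p_minus, d1, d2


if __name__ == "__main__":
    for orders in [(2, 2, 3), (3, 3, 3), (3, 4, 4)]:
        print(orders, exponents_at_infinity(*orders))
```

In each admissible case the square root evaluates to $q$, so $p_+ = d_1$ and $p_- = d_2$, which is the growth expected of the two polynomial solutions $f_1$ and $f_2$.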
Thus we have found the equation which is satisfied by both $f_1$ and $f_2$:
\[
t(1 - t)\,y'' + \bigl(-n + 1 - (-d_1 - d_2 + 1)t\bigr)\,y' - d_1 d_2\, y = 0,
\tag{4.30}
\]
which is the hypergeometric equation. Its general solution is given by
\[
y = A\,F(-d_1, -d_2; -n + 1; t) + B\,t^n F(-d_1 + n, -d_2 + n; n + 1; t).
\tag{4.31}
\]
The map we are looking for can be written as
\[
z = a\,\frac{d_2!\,d_1!}{n!\,(d_1 - n)!}\,\frac{\Gamma(1 - n)}{\Gamma(1 - n + d_2)}\;
t^n\,\frac{F(-d_1 + n, -d_2 + n; n + 1; t)}{F(-d_1, -d_2; -n + 1; t)},
\tag{4.32}
\]
where we have chosen the normalizations of $f_1$, $f_2$ such that $t = 1$ maps to $z = a$. In our case $d_1$, $d_2$ and $n$ are integers. Some of the individual terms in the above expression are undefined for integer $d_1$, $d_2$, $n$ and a limit should be taken from non-integer values of $n$ (while keeping $d_1$, $d_2$ fixed at their integer values). We can write the result in a well defined way by using Jacobi polynomials, which are a set of orthogonal polynomials defined through the hypergeometric function
\[
\begin{aligned}
P_n^{(\alpha,\beta)}(x) &\equiv \binom{n+\alpha}{n}\, F\!\left(-n,\, n + \alpha + \beta + 1;\, \alpha + 1;\, \frac{1 - x}{2}\right) \\
&= \frac{1}{n!}\sum_{\nu=0}^{n}\binom{n}{\nu}\,(n + \alpha + \beta + 1)\cdots(n + \alpha + \beta + \nu)\cdot(\alpha + \nu + 1)\cdots(\alpha + n)\left(\frac{x - 1}{2}\right)^{\nu}.
\end{aligned}
\tag{4.33}
\]
Then (4.32) becomes
\[
z = a\,t^n\,P_{d_1-n}^{(n,\,-d_1-d_2+n-1)}(1 - 2t)\,\left[P_{d_2}^{(-n,\,-d_1-d_2+n-1)}(1 - 2t)\right]^{-1}.
\tag{4.34}
\]
We will have occasion to use the Wronskian of the polynomials later, and we define $\tilde W$ to be normalized as follows
\[
\begin{aligned}
\tilde W(t) &= \frac{d}{dt}\!\left[t^n P_{d_1-n}^{(n,\,-d_1-d_2+n-1)}(1 - 2t)\right] P_{d_2}^{(-n,\,-d_1-d_2+n-1)}(1 - 2t) \\
&\quad - t^n P_{d_1-n}^{(n,\,-d_1-d_2+n-1)}(1 - 2t)\,\frac{d}{dt} P_{d_2}^{(-n,\,-d_1-d_2+n-1)}(1 - 2t) \\
&= \frac{n\,d_1!}{n!\,d_2!\,(d_1 - n)!}\,\frac{\Gamma(d_2 - n + 1)}{\Gamma(1 - n)}\; t^{n-1}(1 - t)^{d_1+d_2-n}.
\end{aligned}
\tag{4.35}
\]
We will also have occasion to use the relation (4.32) containing hypergeometric functions, and we define
\[
\begin{aligned}
W(t) &= \bigl[t^n F(-d_1 + n, -d_2 + n; n + 1; t)\bigr]'\, F(-d_1, -d_2; -n + 1; t) \\
&\quad - t^n F(-d_1 + n, -d_2 + n; n + 1; t)\, F'(-d_1, -d_2; -n + 1; t) \\
&= n\,t^{n-1}(1 - t)^{d_1+d_2-n}.
\end{aligned}
\tag{4.36}
\]
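The structure of the map (4.34) and of the Wronskian (4.35) can be checked with exact rational arithmetic. The sketch below is an illustration under stated assumptions: it builds the Jacobi polynomials from the standard finite-sum representation (the one quoted later in the text as (5.14)), with generalized binomial coefficients so that negative integer parameters are allowed. All function names (`gbinom`, `jacobi_in_t`, `map_polynomials`, `wronskian_constant`) are hypothetical helpers, and polynomials are stored as coefficient lists, lowest power first.

```python
from fractions import Fraction


def gbinom(x, j):
    """Generalized binomial coefficient C(x, j), integer j >= 0, integer x of either sign."""
    out = Fraction(1)
    for i in range(j):
        out *= Fraction(x - i, i + 1)
    return out


def pmul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def padd(p, q):
    size = max(len(p), len(q))
    p = p + [Fraction(0)] * (size - len(p))
    q = q + [Fraction(0)] * (size - len(q))
    return [a + b for a, b in zip(p, q)]


def ppow(p, e):
    out = [Fraction(1)]
    for _ in range(e):
        out = pmul(out, p)
    return out


def pderiv(p):
    return [i * c for i, c in enumerate(p)][1:] or [Fraction(0)]


def trim(p):
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def jacobi_in_t(k, alpha, beta):
    """Coefficients of P_k^{(alpha,beta)}(1 - 2t) as a polynomial in t."""
    total = [Fraction(0)]
    for j in range(k + 1):
        c = gbinom(k + alpha, j) * gbinom(k + beta, k - j)
        # with x = 1 - 2t: 2^{-k} (x-1)^{k-j} (x+1)^j = (-t)^{k-j} (1-t)^j
        term = pmul(ppow([Fraction(0), Fraction(-1)], k - j),
                    ppow([Fraction(1), Fraction(-1)], j))
        total = padd(total, [c * x for x in term])
    return trim(total)


def map_polynomials(n, m, q):
    """Numerator t^n P_{d1-n} and denominator P_{d2} of z/a in (4.34)."""
    d1 = (n + m + q - 1) // 2
    d2 = d1 - q
    beta = -d1 - d2 + n - 1
    num = pmul(ppow([Fraction(0), Fraction(1)], n), jacobi_in_t(d1 - n, n, beta))
    den = jacobi_in_t(d2, -n, beta)
    return trim(num), trim(den)


def wronskian_constant(n, m, q):
    """Return K if num' den - num den' equals K t^{n-1} (1-t)^{m-1}, else None."""
    num, den = map_polynomials(n, m, q)
    wr = trim(padd(pmul(pderiv(num), den), [-c for c in pmul(num, pderiv(den))]))
    shape = pmul(ppow([Fraction(0), Fraction(1)], n - 1),
                 ppow([Fraction(1), Fraction(-1)], m - 1))
    if len(wr) != len(shape):
        return None
    K = wr[n - 1]
    return K if wr == [K * c for c in shape] else None


if __name__ == "__main__":
    for orders in [(2, 2, 3), (2, 3, 4), (3, 3, 3)]:
        print(orders, map_polynomials(*orders), wronskian_constant(*orders))
```

For $(n,m,q) = (3,3,3)$ this gives $z/a = t^3(4 - 2t)/(4t - 2)$, whose Wronskian is $-24\,t^2(1-t)^2$, in agreement with the prefactor in (4.35) once the ratio of $\Gamma$ functions is taken as a limit. Since $d_1 + d_2 - n = m - 1$, the exponent of $(1 - t)$ is the same as in (4.18).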
We will calculate the three point function using the map (4.32), (4.34) in the next section.

5. The Liouville Action for the 3-Point Function

Let us evaluate the three point function
\[
\langle\sigma_n(0)\sigma_m(a)\sigma_q(\infty)\rangle
\tag{5.1}
\]
using the map (4.32), (4.34). Recall that we cut circles of radius $\epsilon$ in the $z$ plane around the twist operators at $z = 0$ and $z = a$ to regularize these twist operators. But unlike the case of the 2-point function discussed in Sect. 3, now we have the twist operator $\sigma_q$ inserted at infinity. This means that the fields $X^I$ have boundary conditions around $z = \infty$ such that $q$ of the $X^I$ form a cycle under rotation around the circle, $X^{i_1} \to X^{i_2} \to \dots \to X^{i_q} \to X^{i_1}$, while the remaining fields $X^I$ are single valued around this circle. Note that if the covering surface $\Sigma$ has $s$ sheets over a generic $z$ then there will be $s - q$ such single valued fields $X^I$.
The covering surface $\Sigma$ will have punctures at $t = 0$ and $t = 1$ corresponding to $z = 0$ and $z = a$ respectively. In addition it will have punctures corresponding to the “puncture at infinity” in the $z$ plane. These latter punctures are of two kinds. The first kind of puncture in the $t$ plane will correspond to the place where $q$ sheets meet in the $z$ plane, i.e., the lift of the point where the twist operator was inserted. But we will also have $s - q$ other punctures in the $t$ plane that correspond to the cut at $|z| = 1/\delta$ for the $X^I$ that are single valued around $z = \infty$. We will choose (when defining the “regular region”) a cutoff at value $|z| = 1/\tilde\delta$ for the first kind of puncture (i.e. the puncture arising from fields $X^I$ that are twisted at $z = \infty$) and a value $|z| = 1/\delta$ for the second kind of puncture (i.e. punctures for fields $X^I$ which are not twisted at infinity). We will see that both $\delta$ and $\tilde\delta$ cancel from all final results.

5.1. The contribution from $z = 0$, $t = 0$. Let us first consider the point $z = 0$ which gives $t = 0$. Near this point the map (4.32) gives:
\[
z \approx a\,\frac{d_2!\,d_1!}{n!\,(d_1 - n)!}\,\frac{\Gamma(1 - n)}{\Gamma(1 - n + d_2)}\,t^n, \qquad
t \approx \left(\frac{z\,n!\,(d_1 - n)!\,\Gamma(1 - n + d_2)}{a\,d_2!\,d_1!\,\Gamma(1 - n)}\right)^{1/n}.
\tag{5.2}
\]
Note that by using the relation
\[
\Gamma(x)\Gamma(1 - x) = \frac{\pi}{\sin(\pi x)}
\tag{5.3}
\]
we can write
\[
\frac{\Gamma(1 - n)}{\Gamma(1 - n + d_2)} = \frac{\Gamma(n - d_2)}{\Gamma(n)}\,\frac{\sin(\pi(n - d_2))}{\sin(\pi n)} = (-1)^{d_2}\,\frac{(n - d_2 - 1)!}{(n - 1)!},
\tag{5.4}
\]
so that the $\Gamma$ functions in the above expressions are in reality well defined. The Liouville field and its derivative are given by:
\[
\phi \approx \log\left[\frac{n\,a\,d_2!\,d_1!}{n!\,(d_1 - n)!}\,\frac{(n - d_2 - 1)!}{(n - 1)!}\,t^{n-1}\right] + \text{c.c.}, \qquad
\partial_t\phi \approx \frac{n - 1}{t},
\tag{5.5}
\]
where we have dropped the factor $(-1)^{d_2}$ in (5.4) since $\phi$ is the real part of the logarithm. Substituting these values into the expression for the Liouville action:
\[
S_L = \frac{i}{96\pi}\int dt\,\phi\,\partial_t\phi,
\tag{5.6}
\]
we get a contribution from the point $t = 0$:
\[
S_L(t = 0) = -\frac{n - 1}{12}\log\left(n\,\epsilon^{\frac{n-1}{n}}\right) - \frac{n - 1}{12n}\log\left(a\,\frac{d_2!\,d_1!}{n!\,(d_1 - n)!}\,\frac{(n - d_2 - 1)!}{(n - 1)!}\right),
\tag{5.7}
\]
where we note that the integration in (5.6) is performed along the circle
\[
|t| = \left(\frac{\epsilon\,n!\,(d_1 - n)!\,(n - 1)!}{a\,d_2!\,d_1!\,(n - d_2 - 1)!}\right)^{1/n}.
\tag{5.8}
\]
A simplification analogous to (5.4) will occur in many relations below, but for simplicity we leave the $\Gamma$ functions in the form where they have negative arguments; we replace them with factorials of positive numbers only in the final expressions.
5.2. The contribution from $z = a$, $t = 1$. Let us look at the point $t = 1$. Using the expression for the Wronskian (4.36), we find the derivative of the map (4.32):
\[
\frac{dz}{dt} = a\,\frac{d_2!\,d_1!}{n!\,(d_1 - n)!}\,\frac{\Gamma(1 - n)}{\Gamma(1 - n + d_2)}\,\frac{n\,t^{n-1}(1 - t)^{d_1+d_2-n}}{[F(-d_1, -d_2; -n + 1; t)]^2},
\tag{5.9}
\]
which can be combined with the known property of the hypergeometric function:
\[
F(a, b; c; 1) = \frac{\Gamma(c)\,\Gamma(c - a - b)}{\Gamma(c - a)\,\Gamma(c - b)}
\tag{5.10}
\]
to give the result:
\[
z \approx a - \beta a\,(1 - t)^{d_1+d_2-n+1},
\tag{5.11}
\]
\[
\beta = \frac{n}{d_1 + d_2 - n + 1}\,\frac{d_1!\,d_2!\,(d_1 - n)!}{n!\,[(d_1 + d_2 - n)!]^2}\,\frac{\Gamma(1 - n + d_2)}{\Gamma(1 - n)}.
\tag{5.12}
\]
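Our usual analysis gives:
\[
\begin{aligned}
& z \approx a - \beta a\,(1 - t)^{d_1+d_2-n+1}, \qquad
1 - t \approx \left(-\frac{z - a}{a\beta}\right)^{\frac{1}{d_1+d_2-n+1}}, \\
& \phi \approx \log\left[a\beta\,(d_1 + d_2 - n + 1)(1 - t)^{d_1+d_2-n}\right] + \text{c.c.}, \qquad
\partial_t\phi \approx -\frac{d_1 + d_2 - n}{1 - t}, \\
& S_L(t = 1) = -\frac{d_1 + d_2 - n}{12}\log(d_1 + d_2 - n + 1)
- \frac{(d_1 + d_2 - n)^2}{12(d_1 + d_2 - n + 1)}\log\epsilon
- \frac{d_1 + d_2 - n}{12(d_1 + d_2 - n + 1)}\log(a|\beta|).
\end{aligned}
\tag{5.13}
\]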
Note that $d_1 + d_2 - n + 1 = m$, so that we can rewrite the contribution from $t = 1$ in a way which makes it look more symmetrical with the contribution from $t = 0$. But we will defer all such simplifications to the final expressions for the fusion coefficients.

5.3. The contribution from $z = \infty$. To analyze the contribution from the point $t = \infty$ it is convenient to look at the map written in terms of Jacobi polynomials (4.34). Then one can use Rodrigues' formula to represent the Jacobi polynomials in the form:
\[
P_k^{(\alpha,\beta)}(x) = 2^{-k}\sum_{j=0}^{k}\binom{k+\alpha}{j}\binom{k+\beta}{k-j}(x - 1)^{k-j}(x + 1)^j.
\tag{5.14}
\]
The limit $x \to \infty$ gives:
\[
P_k^{(\alpha,\beta)}(x) \to x^k\,2^{-k}\sum_{j=0}^{k}\binom{k+\alpha}{j}\binom{k+\beta}{k-j} = x^k\,2^{-k}\binom{2k+\alpha+\beta}{k}.
\tag{5.15}
\]
Substitution of this limit into the expression (4.34) gives the behavior near $t = \infty$:
\[
\begin{aligned}
& z \approx a\gamma\,(-1)^{d_1-d_2-n}\,t^{d_1-d_2}, \qquad
t \approx \left((-1)^{d_1-d_2-n}\,\frac{z}{a\gamma}\right)^{\frac{1}{d_1-d_2}}, \\
& \gamma = \frac{d_2!\,(d_1 - d_2 - 1)!}{(d_1 - n)!\,(n - d_2 - 1)!}\,\frac{\Gamma(-d_1)}{\Gamma(d_2 - d_1)}, \\
& \phi \approx \log\left[a\gamma\,(d_1 - d_2)\,t^{d_1-d_2-1}\right] + \text{c.c.}, \qquad
\partial_t\phi \approx \frac{d_1 - d_2 - 1}{t}.
\end{aligned}
\tag{5.16}
\]
Consider first the point $z = \infty$, $t = \infty$. Recall that we have taken the “regular region” on $\Sigma$ to be bounded by the image of $1/\tilde\delta$ (rather than $1/\delta$) when a twist operator is inserted. The contour around the puncture at infinity in the $t$ plane should be taken to go clockwise rather than anti-clockwise, so that it looks like a normal anti-clockwise contour in the local coordinate $t' = 1/t$ around the puncture. Thus to compute the contribution from this puncture we should follow our usual procedure but reverse the overall sign. The result reads:
\[
S_L(t = \infty) = (-1)\left[-\frac{d_1 - d_2 - 1}{12}\log(d_1 - d_2) + \frac{(d_1 - d_2 - 1)^2}{12(d_1 - d_2)}\log\tilde\delta - \frac{d_1 - d_2 - 1}{12(d_1 - d_2)}\log(a|\gamma|)\right].
\tag{5.17}
\]
Finally let us analyze the images of $z = \infty$ that give finite values $t_i$ of $t$. At each of these points the map $t \to z$ is one-to-one, in contrast to the above case $z = \infty$, $t = \infty$, where $q$ values of $t$ correspond to each value of $z$ in a neighborhood of the puncture. Further, there is no sign reversal for the contour of integration around these punctures when we use the coordinate $t$ to describe the contour. Looking at the structure of the map (4.34) one can easily identify the locations of the $t_i$: they coincide with zeroes of the denominator. So to evaluate the contribution to the Liouville action from the $t_i$ we will need some information about zeroes of Jacobi polynomials. Using the fact that Jacobi polynomials have only simple zeroes we can expand the map (4.34) around any of the $t_i$:
\[
z \approx \frac{a\,t_i^n\,P_{d_1-n}^{(n,\,-d_1-d_2+n-1)}(1 - 2t_i)}{\left.\frac{d}{dt}P_{d_2}^{(-n,\,-d_1-d_2+n-1)}(1 - 2t)\right|_{t=t_i}}\;\frac{1}{t - t_i} \equiv \frac{a\,\xi_i}{t - t_i},
\tag{5.18}
\]
\[
\xi_i = t_i^n\,\frac{P_{d_1-n}^{(n,\,-d_1-d_2+n-1)}(1 - 2t_i)}{\left.\frac{d}{dt}P_{d_2}^{(-n,\,-d_1-d_2+n-1)}(1 - 2t)\right|_{t=t_i}}.
\tag{5.19}
\]
Then everything can be evaluated in terms of $\xi_i$:
\[
\begin{aligned}
& t - t_i \approx \frac{a\,\xi_i}{z}, \qquad
\phi \approx \log\left[\frac{-a\,\xi_i}{(t - t_i)^2}\right] + \text{c.c.}, \qquad
\partial_t\phi \approx -\frac{2}{t - t_i}, \\
& S_L(t = t_i) = -\frac{1}{6}\log(\delta^2 a\,\xi_i).
\end{aligned}
\tag{5.20}
\]
Collecting the contributions from all the $t_i$ we get:
\[
S_L(\text{all } t_i) = -\frac{d_2}{6}\log(\delta^2 a) - \frac{1}{6}\log\prod_{i=1}^{d_2}\xi_i,
\tag{5.21}
\]
and we only need to evaluate the product of the $\xi_i$. Note that the regularization parameter $\delta$ we use here has the same meaning as the one considered in Sect. 3. This product can be written in terms of the Wronskian (4.35) and the discriminant of Jacobi polynomials. To see this we first rewrite (5.19) in terms of zeroes of Jacobi polynomials. If $z = P/Q$, then the Wronskian (4.35) is $\tilde W = P'Q - PQ' = -PQ'$ at a zero of $Q$. Writing any of the $\xi_i$ as $\xi = P/Q' = PQ'/Q'^2$ we find
\[
\xi_i = -\tilde W(t_i)\left[-2a_0\prod_{j\neq i}(x_i - x_j)\right]^{-2},
\tag{5.22}
\]
where $a_0$ is the coefficient in front of the highest power in the polynomial $P_{d_2}^{(-n,\,-d_1-d_2+n-1)}$; it can be evaluated using (5.15). The $x_i$ are the zeros of the polynomial $Q(x)$ in the denominator. Applying the general definition of the discriminant to Jacobi polynomials
\[
D_{d_2}^{(-n,\,-d_1-d_2+n-1)} \equiv a_0^{2d_2-2}\prod_{i<j}(x_i - x_j)^2,
\tag{5.23}
\]
we get
\[
\prod_{i=1}^{d_2}\xi_i = (-1)^{d_2}\,2^{-2d_2}\,a_0^{2(d_2-2)}\left[D_{d_2}^{(-n,\,-d_1-d_2+n-1)}\right]^{-2}\prod_{i=1}^{d_2}\tilde W(t_i).
\tag{5.24}
\]
The discriminant of Jacobi polynomials can be evaluated [17]:
\[
D \equiv D_{d_2}^{(-n,\,-d_1-d_2+n-1)} = 2^{-d_2(d_2-1)}\prod_{j=1}^{d_2} j^{\,j+2-2d_2}\,(j - n)^{j-1}\,(j - d_1 - d_2 + n - 1)^{j-1}\,(j - d_1 - 1)^{d_2-j}.
\tag{5.25}
\]
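To evaluate the right-hand side of (5.24) we only need the expressions for
\[
\prod_{i=1}^{d_2} t_i \qquad\text{and}\qquad \prod_{i=1}^{d_2}(1 - t_i).
\tag{5.26}
\]
Let us consider the general Jacobi polynomial:
\[
P_k^{(\alpha,\beta)}(1 - 2t) = (-2)^k a_0\,t^k + \dots + a_{k+1} = (-2)^k b_0\,(t - 1)^k + \dots + b_{k+1}.
\tag{5.27}
\]
Obviously $b_0 = a_0$. By taking the limits $t \to \infty$, $t \to 0$ and $t \to 1$ in the above expression we find:
\[
\prod_{i=1}^{d_2} t_i = \frac{a_{k+1}}{2^k a_0} = \frac{\Gamma(k + \alpha + 1)\,\Gamma(k + \alpha + \beta + 1)}{\Gamma(\alpha + 1)\,\Gamma(2k + \alpha + \beta + 1)},
\tag{5.28}
\]
\[
\prod_{i=1}^{d_2}(1 - t_i) = \frac{b_{k+1}}{2^k b_0} = \frac{\Gamma(k + \beta + 1)\,\Gamma(k + \alpha + \beta + 1)}{\Gamma(\beta + 1)\,\Gamma(2k + \alpha + \beta + 1)}.
\tag{5.29}
\]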
Collecting all contributions together, we get
\[
\begin{aligned}
\log\prod_{i=1}^{d_2}\xi_i = {} & -2d_2(d_2 - 1)\log 2 + d_2\log n - 2\log D - (3d_2 - 4)\log d_2! \\
& + d_2\log\frac{d_1!}{n!\,(d_1 - n)!} + (n + d_2 - 1)\log\frac{(n - 1)!}{(n - d_2 - 1)!} \\
& + (d_1 - d_2 + 3)\log\frac{(d_1 - d_2)!}{d_1!} + (d_1 + d_2 - n)\log\frac{(d_1 + d_2 - n)!}{(d_1 - n)!}.
\end{aligned}
\tag{5.30}
\]
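5.4. The total Liouville action. Collecting the contributions from the different branching points we obtain the final expression for the Liouville action
\[
\begin{aligned}
S_L^{(1)} = {} & -\left[\frac{(n - 1)^2}{12n} + \frac{(d_1 + d_2 - n)^2}{12(d_1 + d_2 - n + 1)}\right]\log\epsilon - \frac{(d_1 - d_2 - 1)^2}{12(d_1 - d_2)}\log\tilde\delta - \frac{d_2}{3}\log\delta \\
& - \left[\frac{n - 1}{12n} + \frac{d_1 + d_2 - n}{12(d_1 + d_2 - n + 1)} - \frac{d_1 - d_2 - 1}{12(d_1 - d_2)} + \frac{d_2}{6}\right]\log a \\
& - \frac{n - 1}{12}\log n - \frac{d_1 + d_2 - n}{12}\log(d_1 + d_2 - n + 1) + \frac{d_1 - d_2 - 1}{12}\log(d_1 - d_2) \\
& - \frac{n - 1}{12n}\log\left[\frac{d_1!\,d_2!}{n!\,(d_1 - n)!}\,\frac{\Gamma(1 - n)}{\Gamma(1 - n + d_2)}\right] \\
& - \frac{d_1 + d_2 - n}{12(d_1 + d_2 - n + 1)}\log|\beta| + \frac{d_1 - d_2 - 1}{12(d_1 - d_2)}\log|\gamma| - \frac{1}{6}\log\prod_{i=1}^{d_2}\xi_i.
\end{aligned}
\tag{5.31}
\]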
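The values of $\beta$ and $\gamma$ are given by (5.12) and (5.16), and the last term is given through (5.30). According to (2.44), the three point function is given by:
\[
\langle\sigma_n^\epsilon(0)\sigma_m^\epsilon(a)\sigma_q^{\tilde\delta}(\infty)\rangle_\delta = e^{S_L^{(1)}}\,e^{S_L^{(2)} + S_L^{(3)}}\,\frac{Z^{(\hat s)}}{(Z_\delta)^s},
\tag{5.32}
\]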
where $s$ is the number of fields involved in the permutation; it is defined by (4.8). Note that we have not determined the values of $S_L^{(2)}$ and $S_L^{(3)}$ for the case under consideration; as we will see, these quantities will cancel in the final answer.

6. Normalizing the Twist Operators

This Liouville action (5.31) yields the correlation function for twist operators with the regularization parameters $\epsilon$, $\delta$, $\tilde\delta$. We immediately see that the power of $a$ in the correlator is
\[
a:\quad -\left[\frac{n - 1}{12n} + \frac{d_1 + d_2 - n}{12(d_1 + d_2 - n + 1)} - \frac{d_1 - d_2 - 1}{12(d_1 - d_2)} + \frac{d_2}{6}\right]
= -\frac{1}{12}\left(n - \frac{1}{n} + m - \frac{1}{m} - q + \frac{1}{q}\right),
\tag{6.1}
\]
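which agrees with the expected $a$ dependence of the 3-point function
\[
\langle\sigma_n^\epsilon(0)\sigma_m^\epsilon(a)\sigma_q^{\tilde\delta}(\infty)\rangle \sim |a|^{-2(\Delta_m + \Delta_n - \Delta_q)}.
\tag{6.2}
\]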
To obtain the final correlation functions and fusion coefficients we have two sources of renormalization coefficients that need to be considered:

(a) We have to normalize the operators $\sigma_n^\epsilon$ such that their 2-point functions are set to unity at unit $z$ separation; at this point we should find that the parameters $\epsilon$, $\delta$, $\tilde\delta$ disappear from the 3-point (and higher point) functions as well. After we normalize the twist operators $\sigma_n^\epsilon$ in this way we will call them $\sigma_n$.

(b) The CFT had $N$ fields $X^I$, though only $n$ of them are affected by the twist operator $\sigma_n$. However at the end of the calculation of any correlation function of the operators $\sigma_{n_i}$ we must sum over all the possible ways that the $n_i$ fields that are twisted can be chosen from the total set of $N$ fields. Thus we will have to define operators $O_n$ that are sums over conjugacy classes of the permutation group, and these operators $O_n$ are the only ones that will finally be well defined operators in the CFT [10]. The correctly normalized $O_n$ will thus have combinatoric factors multiplying the normalized operators $\sigma_n$.

We choose to arrive at the final normalized operators $O_n$ in these two steps since the calculations involved in steps (a) and (b) are quite different; further, when $n_i \ll N$ the factor coming from (b) is just a power of $N$ which is easily found.

6.1. Normalizing the $\sigma_n$. Let us define the normalized twist operators $\sigma_n$ by requiring
\[
\langle\sigma_n(0)\sigma_n(a)\rangle = \frac{1}{|a|^{4\Delta_n}}.
\tag{6.3}
\]
From (3.18) we see that
\[
\sigma_n = D_n\,\sigma_n^\epsilon, \qquad D_n = \left[C_n\,\epsilon^{A_n}\,Q^{B_n}\right]^{-1/2}.
\tag{6.4}
\]
Let the Operator Product Expansion (OPE) have the form
\[
\sigma_m(a)\,\sigma_n(0) \sim \frac{|C^\sigma_{nmq}|^2}{|a|^{2(\Delta_n + \Delta_m - \Delta_q)}}\,\sigma_q(0) + \dots,
\tag{6.5}
\]
where we have written the OPE for holomorphic and anti-holomorphic blocks combined, and we have put a superscript $\sigma$ on the fusion coefficients $C^\sigma_{nmq}$ to remind ourselves that these are not the final fusion coefficients of the physical operators $O_n$. With the normalization (6.3) we will get
\[
\langle\sigma_n(z_1)\sigma_m(z_2)\sigma_q(z_3)\rangle = \frac{|C^\sigma_{nmq}|^2}{|z_1 - z_2|^{2(\Delta_n + \Delta_m - \Delta_q)}\,|z_2 - z_3|^{2(\Delta_m + \Delta_q - \Delta_n)}\,|z_3 - z_1|^{2(\Delta_q + \Delta_n - \Delta_m)}}.
\tag{6.6}
\]
We have computed the 3-point functions and should thus be able to get the fusion coefficients $C^\sigma_{nmq}$ from (6.6). However while two of our twist operators were inserted at finite points in the $z$ plane, the last one was inserted at infinity. Putting one of the points at infinity simplified the calculation, but it also creates the following problem: unlike the twist operators at $z = 0$, $z = a$ which are normalized through (6.4) it is not clear what
is the normalization of the twist operator that is inserted at infinity. (We could think of this operator as inserted at a puncture on the sphere at infinity, and therefore no different from the other insertions, but we have chosen the flat metric on the $z$ plane and thus made infinity a special region carrying curvature.) To get around this problem we adopt the following scheme. If we have the OPE (6.5) then we will get
\[
\frac{\langle\sigma_n(0)\sigma_m(a)\sigma_q^{\tilde\delta}(\infty)\rangle}{\langle\sigma_q(0)\sigma_q^{\tilde\delta}(\infty)\rangle} = \frac{|C^\sigma_{nmq}|^2}{|a|^{2(\Delta_n + \Delta_m - \Delta_q)}},
\tag{6.7}
\]
and we will thus not need to know the normalization of the operator at infinity. The term $\langle\sigma_n(0)\sigma_m(a)\sigma_q^{\tilde\delta}(\infty)\rangle$ can be found from our 3-point function calculation together with the normalization factors for $\sigma_n$, $\sigma_m$ from (6.3). To compute the denominator we must find the 2-point function with one operator at infinity. (We had earlier computed the 2-point function with both operators in the finite $z$ plane since if we put one operator at infinity then we lose any position dependence in the correlator and cannot extract the scaling dimensions.) To evaluate the two point function
\[
\langle\sigma_n^\epsilon(0)\sigma_n^{\tilde\delta}(\infty)\rangle
\tag{6.8}
\]
we consider the map:
\[
z = b\,t^n.
\tag{6.9}
\]
This map has order $n$ ramification points at $t = 0$ and $t = \infty$ and the usual calculations give:
\[
\phi = \log[n\,b\,t^{n-1}] + \text{c.c.}, \qquad \partial_t\phi = \frac{n - 1}{t},
\]
\[
S_L(t = 0) = -\frac{n - 1}{12}\log\left[n\,b^{1/n}\,\epsilon^{(n-1)/n}\right],
\tag{6.10}
\]
\[
S_L(t = \infty) = -(-1)\,\frac{n - 1}{12}\log\left[n\,b^{1/n}\,\tilde\delta^{(1-n)/n}\right].
\tag{6.11}
\]
As before we cut a hole of size $\epsilon$ around the origin and put the twist at infinity on a boundary at $|z| = 1/\tilde\delta$. We also have an extra negative sign for the cut at infinity since the contour that goes anti-clockwise in the local coordinate $1/t$ near $t = \infty$ goes clockwise in the coordinate $t$. Collecting both contributions we get:
\[
\langle\sigma_n^\epsilon(0)\sigma_n^{\tilde\delta}(\infty)\rangle_\delta = \epsilon^{F_n}\,\tilde\delta^{F_n}\,e^{S_L^{(2)} + S_L^{(3)}}\,\frac{Z^{(\hat s)}}{(Z_\delta)^n},
\tag{6.12}
\]
\[
F_n = -\frac{(n - 1)^2}{12n}.
\tag{6.13}
\]
The correlator is $b$-independent as expected. The values of $S_L^{(2)}$, $S_L^{(3)}$ and the partition function $Z^{(\hat s)}$ depend on $\delta$, $\delta'$, $\tilde\delta$, but these expressions are the same as in (5.32) and so they cancel in the final answer.
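The $b$-independence and the exponent $F_n$ of (6.13) follow from adding (6.10) and (6.11), and this can be confirmed numerically. The following is a minimal sketch; the function names `F` and `two_point_liouville` and the sample parameter values are illustrative assumptions.

```python
import math


def F(n):
    """Exponent F_n of eq. (6.13)."""
    return -(n - 1) ** 2 / (12 * n)


def two_point_liouville(n, b, eps, delta_tilde):
    """Sum of the boundary contributions (6.10) and (6.11) for the map z = b t^n."""
    s_zero = -(n - 1) / 12 * math.log(n * b ** (1 / n) * eps ** ((n - 1) / n))
    # the factor -(-1) in (6.11) gives an overall plus sign
    s_inf = (n - 1) / 12 * math.log(n * b ** (1 / n) * delta_tilde ** ((1 - n) / n))
    return s_zero + s_inf


if __name__ == "__main__":
    n, eps, delta_tilde = 3, 1e-2, 1e-3
    for b in (0.7, 5.0):
        total = two_point_liouville(n, b, eps, delta_tilde)
        expected = F(n) * (math.log(eps) + math.log(delta_tilde))
        print(b, total, expected)
```

The $\log n$ and $\log b$ terms cancel between the two punctures, leaving $F_n(\log\epsilon + \log\tilde\delta)$, which is the prefactor $\epsilon^{F_n}\tilde\delta^{F_n}$ in (6.12).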
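The $C^\sigma_{n,m,q}$ are then given by
\[
|C^\sigma_{n,m,q}|^2 = \frac{\langle\sigma_n(0)\sigma_m(a)\sigma_q^{\tilde\delta}(\infty)\rangle}{\langle\sigma_q(0)\sigma_q^{\tilde\delta}(\infty)\rangle}
= \frac{\langle\sigma_n^\epsilon(0)\sigma_m^\epsilon(a)\sigma_q^{\tilde\delta}(\infty)\rangle}{\langle\sigma_q^\epsilon(0)\sigma_q^{\tilde\delta}(\infty)\rangle}\,
\sqrt{\frac{\langle\sigma_q^\epsilon(0)\sigma_q^\epsilon(1)\rangle}{\langle\sigma_n^\epsilon(0)\sigma_n^\epsilon(1)\rangle\,\langle\sigma_m^\epsilon(0)\sigma_m^\epsilon(1)\rangle}}.
\tag{6.14}
\]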
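As a consistency check of our procedure we look at powers of various regularization parameters: they should cancel in any physical quantity. For $|C^\sigma_{n,m,q}|^2$ we have the following powers for $\epsilon$, $\tilde\delta$, $\delta$, $Q$:
\[
\epsilon:\quad -\left[\frac{(n - 1)^2}{12n} + \frac{(m - 1)^2}{12m}\right] - F_q - \frac{1}{2}\left(A_n + A_m - A_q\right) = 0,
\tag{6.15}
\]
\[
\tilde\delta:\quad -\frac{(q - 1)^2}{12q} - F_q = 0,
\tag{6.16}
\]
\[
\delta:\quad -\frac{d_2}{3} + \frac{s - q}{3} = 0,
\tag{6.17}
\]
\[
Q:\quad 1 - s - \frac{1}{2}\left(B_n + B_m + B_q\right) = 0.
\tag{6.18}
\]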
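We used the expressions (4.13) and (4.14) for $d_1$ and $d_2$, the values of $A_n$, $B_n$, $F_n$ from (3.19) and (6.13) and the genus relation (4.9). We finally get (for the contribution from $\Sigma$ of genus zero) the logarithm of the fusion coefficient
\[
\begin{aligned}
\log|C^\sigma_{n,m,q}|^2 = {} & \frac{1}{6}\log\frac{q}{mn} - \frac{n - 1}{12}\log n - \frac{m - 1}{12}\log m + \frac{q - 1}{12}\log q \\
& - \frac{n - 1}{12n}\log\left[\frac{d_1!\,d_2!}{n!\,(n - 1)!}\,\frac{(d_1 - m)!}{(d_1 - n)!}\right] \\
& - \frac{m - 1}{12m}\log\left[\frac{d_1!\,d_2!}{m!\,(m - 1)!}\,\frac{(d_1 - n)!}{(d_1 - m)!}\right] \\
& + \frac{q - 1}{12q}\log\left[\frac{(q - 1)!\,d_2!}{(d_1 - n)!\,(d_1 - m)!}\,\frac{(d_1 - d_2)!}{d_1!}\right] - \frac{1}{6}\log\prod_{i=1}^{d_2}\xi_i.
\end{aligned}
\tag{6.19}
\]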
The expression for the product of the $\xi_i$ is given by (5.30). The coefficients $C^\sigma_{n,m,q}$ must be symmetric in the indices $m$, $n$, $q$. We have written (6.19) in such a way that all the terms except the last one show a manifest symmetry between $m$ and $n$. It can be shown without much difficulty that the last term (given through (5.30) and (5.25)) is also symmetric in $m$ and $n$. In particular $d_1$ and $d_2$ are symmetric in $m$ and $n$, and $-d_1 - d_2 + n - 1 = -m$, so that the Jacobi polynomial whose discriminant is calculated in (5.25) is $P_{d_2}^{(-n,-m)}$. On the other hand it is not at all obvious that the expression (5.30) is symmetric under the interchange of $q$ with either $n$ or $m$. Note that $d_2 + 1$ is the number of elements that overlap between the permutations $\sigma_n$ and $\sigma_m$, and the product in (5.25) runs over the range $j = 1 \dots d_2$. This number $d_2 + 1$ is in general different from the number of overlapping elements between the permutations $\sigma_q$ and $\sigma_n$ or between $\sigma_q$ and $\sigma_m$, and thus there is no
simple way to write (5.30) in a form that makes its total symmetry manifest. Nevertheless, this expression is indeed symmetric in all three arguments $n$, $m$, $q$, as can be checked by evaluating the expression through a symbolic manipulation program. Verifying this symmetry provides a useful check of all our calculations for the 3-point function.

6.2. Two special cases. Due to the structure of the discriminant (5.25), the general expression for the fusion coefficient looks complicated for an arbitrary value of $d_2$. However there are two important cases where significant simplifications occur. These are the cases of one and two overlaps (we recall that the number of common indices in $\sigma_n$ and $\sigma_m$ is $d_2 + 1$). One can see that for $d_2 = 0$ and $d_2 = 1$ the discriminant $D = 1$. Let us analyze both these cases. For one overlap we have:
\[
d_2 = 0, \qquad d_1 = q = m + n - 1,
\tag{6.20}
\]
and the logarithm of the fusion coefficient is given by:
\[
\begin{aligned}
\log|C^\sigma_{n,m,m+n-1}|^2 = {} & -\frac{1}{12}\left(n + \frac{1}{n}\right)\log n - \frac{1}{12}\left(m + \frac{1}{m}\right)\log m \\
& + \frac{1}{12}\left(q + \frac{1}{n} + \frac{1}{m} - 1\right)\log q - \frac{1}{12}\left(1 + \frac{1}{q} - \frac{1}{n} - \frac{1}{m}\right)\log\frac{(q - 1)!}{(m - 1)!\,(n - 1)!}.
\end{aligned}
\tag{6.21}
\]
In particular we get:
\[
|C^\sigma_{223}|^2 = 2^{-\frac{4}{9}}\,3^{\frac{1}{4}}.
\tag{6.22}
\]
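The value in (6.22) can be reproduced by evaluating (6.21) numerically. The sketch below is only a check of that formula; the function name `log_c2_single_overlap` is a hypothetical helper, and `math.lgamma` is used for the logarithms of the factorials.

```python
import math


def log_c2_single_overlap(n, m):
    """Evaluate log|C^sigma_{n,m,m+n-1}|^2 from eq. (6.21)."""
    q = n + m - 1
    # log[(q-1)! / ((m-1)! (n-1)!)], using lgamma(k) = log((k-1)!)
    log_ratio = math.lgamma(q) - math.lgamma(m) - math.lgamma(n)
    return (
        -(n + 1 / n) * math.log(n)
        - (m + 1 / m) * math.log(m)
        + (q + 1 / n + 1 / m - 1) * math.log(q)
        - (1 + 1 / q - 1 / n - 1 / m) * log_ratio
    ) / 12


if __name__ == "__main__":
    value = math.exp(log_c2_single_overlap(2, 2))
    print(value, 2 ** (-4 / 9) * 3 ** (1 / 4))
```

For $n = m = 2$ the $\log 2$ terms add up to $-\frac{5}{12} - \frac{1}{36} = -\frac{4}{9}$ and the $\log 3$ term has coefficient $\frac{1}{4}$, which is (6.22). The formula is also manifestly symmetric under $n \leftrightarrow m$.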
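The case of two overlaps corresponds to
\[
d_2 = 1, \qquad q = m + n - 3, \qquad d_1 = q + 1,
\tag{6.23}
\]
and the result reads:
\[
\begin{aligned}
\log|C^\sigma_{n,m,m+n-3}|^2 = {} & \frac{1}{12}\left(\frac{1}{n} + \frac{1}{m} - \frac{1}{m + n - 3} - 3\right)\log\frac{(m + n - 3)!}{(m - 1)!\,(n - 1)!} \\
& - \frac{n^2 + 1}{12n}\log n - \frac{m^2 + 1}{12m}\log m + \frac{1}{12}\left(2 + \frac{(m + n - 4)^2}{m + n - 3}\right)\log(m + n - 3) \\
& + \frac{1}{12}\left(-2n + \frac{n - m}{mn} + \frac{m + n - 4}{m + n - 3}\right)\log(n - 1) \\
& + \frac{1}{12}\left(-2m + \frac{m - n}{mn} + \frac{m + n - 4}{m + n - 3}\right)\log(m - 1) \\
& + \frac{1}{12}\left(2(m + n) - 5 + \frac{1}{n} + \frac{1}{m} + \frac{1}{m + n - 3}\right)\log(m + n - 2).
\end{aligned}
\tag{6.24}
\]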
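In particular for $m = 3$ we get:
\[
\begin{aligned}
\log|C^\sigma_{n3n}|^2 = {} & -\frac{2}{9}\log n - \frac{1}{6}\left(n + \frac{1}{n} - \frac{2}{3}\right)\log(n - 1) \\
& + \frac{1}{6}\left(n + \frac{1}{n} + \frac{2}{3}\right)\log(n + 1) - \frac{2}{9}\log 2 - \frac{5}{18}\log 3.
\end{aligned}
\tag{6.25}
\]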
From this expression we can extract the value of $C^\sigma_{232}$ and check that it equals the value of $C^\sigma_{223}$ given by (6.22).
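This cross-check between the one-overlap and two-overlap formulas is straightforward to carry out numerically. The sketch below evaluates (6.24) and its specialization (6.25) and compares $|C^\sigma_{232}|^2$ with the value $2^{-4/9}\,3^{1/4}$ of (6.22). The function names are hypothetical helpers; the formulas are transcribed from (6.24) and (6.25), assuming $n, m \geq 2$.

```python
import math


def log_c2_double_overlap(n, m):
    """Evaluate log|C^sigma_{n,m,m+n-3}|^2 from eq. (6.24); requires n, m >= 2."""
    q = n + m - 3
    # log[(m+n-3)! / ((m-1)! (n-1)!)]
    log_ratio = math.lgamma(q + 1) - math.lgamma(m) - math.lgamma(n)
    return (
        (1 / n + 1 / m - 1 / q - 3) * log_ratio
        - (n * n + 1) / n * math.log(n)
        - (m * m + 1) / m * math.log(m)
        + (2 + (q - 1) ** 2 / q) * math.log(q)
        + (-2 * n + (n - m) / (m * n) + (q - 1) / q) * math.log(n - 1)
        + (-2 * m + (m - n) / (m * n) + (q - 1) / q) * math.log(m - 1)
        + (2 * (m + n) - 5 + 1 / n + 1 / m + 1 / q) * math.log(m + n - 2)
    ) / 12


def log_c2_n3n(n):
    """Evaluate log|C^sigma_{n3n}|^2 from eq. (6.25); requires n >= 2."""
    return (
        -2 / 9 * math.log(n)
        - (n + 1 / n - 2 / 3) / 6 * math.log(n - 1)
        + (n + 1 / n + 2 / 3) / 6 * math.log(n + 1)
        - 2 / 9 * math.log(2)
        - 5 / 18 * math.log(3)
    )


if __name__ == "__main__":
    c232 = math.exp(log_c2_double_overlap(2, 3))
    c223 = 2 ** (-4 / 9) * 3 ** (1 / 4)      # eq. (6.22)
    print(c232, c223)
    for n in range(2, 7):
        print(n, log_c2_double_overlap(n, 3), log_c2_n3n(n))
```

At $n = 2$ the $\log(n-1)$ term drops out and the remaining terms of (6.25) combine to $-\frac{4}{9}\log 2 + \frac{1}{4}\log 3$, so the two routes agree, as the symmetry of the fusion coefficients requires.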
6.3. Combinatoric factors and large $N$ limit. The twist operators we have considered so far do not represent proper fields in the conformal field theory. In the orbifold CFT there is one twist field for each conjugacy class of the permutation group, not for each element of the group [10]. The true CFT operators that represent the twist fields can be constructed by summing over the group orbit:
\[
O_n = \frac{\lambda_n}{N!}\sum_{h \in G}\sigma_{h(1 \dots n)h^{-1}}.
\tag{6.26}
\]
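Here $G$ is the permutation group $S_N$ and the normalization constant $\lambda_n$ will be determined below. Using the normalization condition for the $\sigma$ operators:
\[
\langle\sigma_n(0)\sigma_n(1)\rangle = 1
\tag{6.27}
\]
we find:
\[
\langle O_n(0)O_n(1)\rangle = \lambda_n^2\,\frac{1}{N!}\sum_{h}\langle\sigma_{h(1 \dots n)h^{-1}}\,\sigma_{(1 \dots n)}\rangle = \lambda_n^2\,n\,\frac{(N - n)!}{N!}\,\langle\sigma_n(0)\sigma_n(1)\rangle.
\tag{6.28}
\]
Requiring the normalisation $\langle O_n(0)O_n(1)\rangle = 1$ we find the value of $\lambda_n$:
\[
\lambda_n = \left[\frac{n\,(N - n)!}{N!}\right]^{-1/2}.
\tag{6.29}
\]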
Let us now look at the three point function. First we consider the combinatorics for the $g = 0$ cases that we worked with above; the permutation structure was described in (4.4), (4.5), (4.6). Simple combinatorics yields
\[
\langle O_n(0)O_m(1)O_q(z)\rangle = \left(\frac{1}{N!}\right)^3\lambda_n\lambda_m\lambda_q \times nmq\,\frac{N!}{(N - s)!}\,(N - n)!\,(N - m)!\,(N - q)!\;\langle\sigma_n(0)\sigma_m(1)\sigma_q(z)\rangle.
\tag{6.30}
\]
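To get a feeling for the size of the combinatoric prefactor in (6.30), one can insert the $\lambda$'s from (6.29) and evaluate the product numerically. The sketch below works with logarithms of factorials (`math.lgamma`) so that large $N$ does not overflow; the function name `combinatoric_factor` is a hypothetical helper, and the genus-zero value of $s$ from (4.8) is assumed.

```python
import math


def combinatoric_factor(N, n, m, q):
    """Prefactor multiplying <sigma_n sigma_m sigma_q> in eq. (6.30), genus-zero case."""
    s = (n + m + q - 1) // 2                       # eq. (4.8)
    if N < s:
        raise ValueError("need N >= s")
    log_fact = lambda k: math.lgamma(k + 1)        # log(k!)
    # log(lambda_k) from eq. (6.29)
    log_lambda = lambda k: -0.5 * (math.log(k) + log_fact(N - k) - log_fact(N))
    log_f = (
        -3 * log_fact(N)
        + log_lambda(n) + log_lambda(m) + log_lambda(q)
        + math.log(n * m * q)
        + log_fact(N) - log_fact(N - s)
        + log_fact(N - n) + log_fact(N - m) + log_fact(N - q)
    )
    return math.exp(log_f)


if __name__ == "__main__":
    print(combinatoric_factor(3, 2, 2, 3))
    for N in (10**2, 10**4, 10**6):
        # rescaled by sqrt(N) to expose the leading large-N behavior
        print(N, combinatoric_factor(N, 2, 2, 3) * math.sqrt(N))
```

For $N = 3$ and $(n,m,q) = (2,2,3)$ the prefactor is $\sqrt{2}$. For large $N$ the rescaled values approach $\sqrt{nmq}$, consistent with the $N^{-1/2}$ suppression of the genus-zero contribution discussed in this subsection.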
One way of getting this expression is to note that $s$ different indices are involved in the permutation, and we can select these indices, in the order in which they appear when the permutations are written out, in $N!/(N - s)!$ ways. Having obtained the indices for any given permutation, we ask how many elements out of the sum over group elements yield this set of indices in the permutation; the answer for $\sigma_n$ for example is $(N - n)!$, since only the permutations of the remaining $N - n$ elements leave the indices in $\sigma_n$ untouched. Finally, we note that any permutation $\sigma_k$ can be written in $k$ equivalent ways since we can begin the set of indices with any index that we choose from the set; this leads to the factors $nmq$. Substituting the values of $\lambda_i$ we get the final result:
\[
\langle O_n(0)O_m(1)O_q(z)\rangle = \frac{\sqrt{mnq\,(N - n)!\,(N - m)!\,(N - q)!}}{(N - s)!\,\sqrt{N!}}\;\langle\sigma_n(0)\sigma_m(1)\sigma_q(z)\rangle
\tag{6.31}
\]
with $s = \frac{1}{2}(n + m + q - 1)$. Now we analyse the behavior of the combinatoric factors for arbitrary genus $g$ but in the limit where $N$ is taken to be large while the orders of twist operators ($m$, $n$ and $q$)
as well as the parameter $g$ are kept fixed. There are $s$ different fields $X^i$ involved in the 3-point function, and these fields can be selected in $\sim N^s$ ways. Similarly the 2-point function of $\sigma_n$ will go as $N^n$ since $n$ different fields are to be selected. Thus the 3-point function of normalised twist operators will behave as
\[
N^{\,s - \frac{n+m+q}{2}} = N^{-(g + \frac{1}{2})}
\tag{6.32}
\]
(which can also be obtained from (6.31)). Thus in the large $N$ limit the contributions from surfaces with high genus will be suppressed, and for the leading order the answer can be obtained by considering only contributions from the sphere ($g = 0$). This is precisely the case that we have analysed in detail, and knowing the amplitude $\langle\sigma_n(0)\sigma_m(1)\sigma_q(z)\rangle$ one can easily extract the leading order of the CFT correlation function:
\[
\langle O_n(0)O_m(1)O_q(z)\rangle = \sqrt{\frac{1}{N}}\,\sqrt{mnq}\;\langle\sigma_n(0)\sigma_m(1)\sigma_q(z)\rangle_{\text{sphere}} + O\!\left(\frac{1}{N^{3/2}}\right).
\tag{6.33}
\]

7. Four Point Function

In this section we compute specific examples of 4-point functions, without attempting to analyze the most general case. The computations illustrate interesting features which arise in our approach for four and higher point functions. In particular we will also need to compute a genus one correlation function. We will also be able to verify specific examples of the fusion coefficients computed in the last section as they will be recovered through factorization of the 4-point functions.
7.1. An example of a 4-point function on a sphere. Let us start with a map that has branch points appropriate for a 4-point correlation function of the form
\[
\langle\sigma_n(0)\sigma_2(1)\sigma_2(w)\sigma_n(\infty)\rangle.
\tag{7.1}
\]
Consider the map
\[
z = C\,t^n\,\frac{t - a}{t - 1},
\tag{7.2}
\]
where the parameter $a$ will be related to the coordinate $w$ and the value of the coefficient $C$ will be determined below. The map (7.2) has two obvious ramification points: $z = 0$ and $z = \infty$; both of them give an $n$th order branch point for nonzero values of $a$. For a general value of $a$ the map (7.2) has two more ramification points; to find them we should look at the equation
\[
\frac{dz}{dt} = 0.
\tag{7.3}
\]
For the general value of $a$ this equation reads:
\[
t^{n-1}\left(n t^2 - t\,\bigl((n - 1)a + (n + 1)\bigr) + an\right) = 0.
\tag{7.4}
\]
The first factor corresponds to the obvious fact that at the point $t = 0$ we have a ramification point of $n$th order, while the positions of the two “implicit” points of second order are given by:
\[
t_\pm = \frac{1}{2n}\left[(n - 1)a + n + 1 \pm \sqrt{(a - 1)\bigl((n - 1)^2 a - (n + 1)^2\bigr)}\,\right].
\tag{7.5}
\]
One of these points should correspond to $z = 1$; we let this be the point $t_+$. The other must correspond to $z = w$. By requiring $z(t_+) = 1$ we determine the value of the coefficient $C$:
\[
C = t_+^{-n}\,\frac{t_+ - 1}{t_+ - a},
\tag{7.6}
\]
and we note that in what follows we will have
\[
w = z(t_-).
\tag{7.7}
\]
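The relations (7.4) to (7.7) can be explored numerically for a given $n$ and $a$. The sketch below is illustrative only: the function name `four_point_map` and the sample values are assumptions, and complex arithmetic is used so that values of $a$ with a negative discriminant in (7.5) are also handled. The last printed comparison, $w = C^2 a^{n+1}$, is not stated in the text; it follows from applying Vieta's formulas to the quadratic factor of (7.4) together with $z(t_+) = 1$, and is included here as an extra consistency check.

```python
import cmath


def four_point_map(n, a):
    """Branch points t_+, t_- of (7.5), the constant C of (7.6), and the map z(t) of (7.2)."""
    root = cmath.sqrt((a - 1) * ((n - 1) ** 2 * a - (n + 1) ** 2))
    base = (n - 1) * a + n + 1
    t_plus = (base + root) / (2 * n)
    t_minus = (base - root) / (2 * n)
    C = t_plus ** (-n) * (t_plus - 1) / (t_plus - a)   # fixes z(t_+) = 1

    def z(t):
        return C * t ** n * (t - a) / (t - 1)

    return t_plus, t_minus, C, z


def quadratic_factor(n, a, t):
    """Second factor of eq. (7.4); it vanishes at the two implicit branch points."""
    return n * t ** 2 - t * ((n - 1) * a + (n + 1)) + a * n


if __name__ == "__main__":
    n, a = 2, 0.5
    t_plus, t_minus, C, z = four_point_map(n, a)
    w = z(t_minus)                                     # eq. (7.7)
    print("t+ =", t_plus, " t- =", t_minus)
    print("z(t+) =", z(t_plus))
    print("w =", w, " C^2 a^(n+1) =", C ** 2 * a ** (n + 1))
```

Varying $a$ moves $t_-$ and therefore $w$, which is how the parameter $a$ of the map encodes the position of the second $\sigma_2$ insertion.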
Now we will analyze contributions to the Liouville action coming from the different ramification points. Let us start from the point $t = 0$. If $a \neq 0$ the map (7.2) near this point has the form:
\[
z \approx C a\,t^n
\tag{7.8}
\]
and the inverse map is
\[
t \approx \left(\frac{z}{aC}\right)^{1/n}.
\tag{7.9}
\]
The Liouville field and its derivative are given by:
\[
\phi = \log\frac{dz}{dt} + \text{c.c.} \approx \log(nCa\,t^{n-1}) + \text{c.c.}, \qquad \partial_t\phi \approx \frac{n - 1}{t}.
\tag{7.10}
\]
As usual we will cut a hole of radius $\epsilon$ around the point $z = 0$. Then the contribution to the Liouville action coming from the integration over the boundary of this hole is
\[
S_L(t = 0) = -\frac{n - 1}{12}\log\left[n\,(aC)^{1/n}\,\epsilon^{\frac{n-1}{n}}\right].
\tag{7.11}
\]
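The same analysis near $t = \infty$ gives:
\[
\begin{aligned}
& z \approx C\,t^n, \qquad t \approx \left(\frac{z}{C}\right)^{1/n}, \\
& \phi \approx \log(nC\,t^{n-1}) + \text{c.c.}, \qquad \partial_t\phi \approx \frac{n - 1}{t}, \\
& S_L(t = \infty) = -(-1)\,\frac{n - 1}{12}\log\left[n\,C^{1/n}\,\tilde\delta^{\frac{1-n}{n}}\right].
\end{aligned}
\tag{7.12}
\]
Here we have cut a large circle of radius $1/\tilde\delta$ in the $z$ plane and the factor of $(-1)$ in the last equation comes from the fact that we go around the point $t = \infty$ clockwise.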
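Near the point $t = t_-$ we get:
\[
\begin{aligned}
& z \approx z_- + \xi_-\,(t - t_-)^2, \qquad t - t_- \approx \left(\frac{z - z_-}{\xi_-}\right)^{1/2}, \\
& \xi_- = \frac{1}{2}\left.\frac{d^2 z}{dt^2}\right|_{t = t_-} = \frac{n\,z_-\,(t_- - t_+)}{2\,t_-\,(t_- - a)(t_- - 1)}, \\
& \phi \approx \frac{1}{2}\log\bigl(4(z - z_-)\,\xi_-\bigr) + \text{c.c.}, \qquad \partial_t\phi \approx \frac{1}{t - t_-}, \\
& S_L(t = t_-) = -\frac{1}{24}\log(4\,\xi_-\,\epsilon).
\end{aligned}
\tag{7.13}
\]
To get a contribution for $t = t_+$ one should make a replacement $+ \leftrightarrow -$ in the last expression; we also note that $z_+ = 1$. Thus we get:
\[
S_L(t = t_+) = -\frac{1}{24}\log(4\,\xi_+\,\epsilon), \qquad \xi_+ = \frac{n\,(t_+ - t_-)}{2\,t_+\,(t_+ - a)(t_+ - 1)}.
\tag{7.14}
\]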
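Finally we should consider the images of $z = \infty$ that give finite values for $t$; there will be a puncture here corresponding to the boundary of the $z$ plane. As before we let this circle in the $z$ plane have a radius $1/\delta$. The map (7.2) has only one such image: $t = 1$. The result is
\[
\begin{aligned}
& z \approx C\,\frac{1 - a}{t - 1}, \qquad t - 1 \approx \frac{C(1 - a)}{z}, \\
& \phi \approx \log\left[-\frac{z^2}{C(1 - a)}\right] + \text{c.c.}, \qquad \partial_t\phi \approx -\frac{2}{t - 1}, \\
& S_L(t = 1) = -\frac{1}{6}\log\left[C(1 - a)\,\delta^2\right].
\end{aligned}
\tag{7.15}
\]
Collecting all this information together and making obvious simplifications we finally get the expression for the Liouville action corresponding to our four point function:
\[
\begin{aligned}
S_L^{(1)} = {} & \left(-\frac{(n - 1)^2}{12n} - \frac{1}{12}\right)\log\epsilon - \frac{(n - 1)^2}{12n}\log\tilde\delta - \frac{1}{3}\log\delta \\
& - \frac{1}{4}\log C - \frac{1}{12}\log 2 - \frac{1}{12}\left(\frac{n + 1}{2} - \frac{1}{n}\right)\log a - \frac{1}{8}\log(1 - a) \\
& - \frac{1}{24}\log\left[(n - 1)^2 a - (n + 1)^2\right] - \frac{1}{12}\log n.
\end{aligned}
\tag{7.16}
\]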
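The above expression gives the correlator $\langle\sigma_n^\epsilon(0)\sigma_2^\epsilon(1)\sigma_2^\epsilon(w)\sigma_n^{\tilde\delta}(\infty)\rangle$. To compute the correlation function of normalised twist operators, with one point at infinity, defined as
\[
\langle\sigma_n(0)\sigma_2(1)\sigma_2(w)\sigma_n(\infty)\rangle \equiv \lim_{|z| \to \infty}|z|^{4\Delta_n}\,\langle\sigma_n(0)\sigma_2(1)\sigma_2(w)\sigma_n(z)\rangle
= \frac{\langle\sigma_n(0)\sigma_2(1)\sigma_2(w)\sigma_n^{\tilde\delta}(\infty)\rangle}{\langle\sigma_n(0)\sigma_n^{\tilde\delta}(\infty)\rangle}
\tag{7.17}
\]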
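we use arguments similar to those in Subsect. 6.1. Then we find
\[
F_4 \equiv \langle\sigma_n(0)\sigma_2(1)\sigma_2(w)\sigma_n(\infty)\rangle
= \frac{\langle\sigma_n^\epsilon(0)\sigma_2^\epsilon(1)\sigma_2^\epsilon(w)\sigma_n^{\tilde\delta}(\infty)\rangle}{\langle\sigma_n^\epsilon(0)\sigma_n^{\tilde\delta}(\infty)\rangle}\,\bigl[\langle\sigma_2^\epsilon(0)\sigma_2^\epsilon(1)\rangle\bigr]^{-1}.
\tag{7.18}
\]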
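This leads to
\[
\log F_4 = S_L^{(1)} - \log\left(\epsilon^{F_n}\,\tilde\delta^{F_n}\right) - \log\left(C_2\,\epsilon^{A_2}\,Q^{B_2}\right) + \frac{1}{3}(n + 1 - n)\log\delta + \bigl((n - 1) - n\bigr)\log Q,
\tag{7.19}
\]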
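where $A_n$, $C_n$, $F_n$ are given in (3.19) and (6.13). The source of the last two terms is the fact that the numerator has $n + 1$ fields transforming nontrivially, while the denominator has only $n$:
\[
\langle\sigma_n^\epsilon(0)\sigma_n^{\tilde\delta}(\infty)\rangle_\delta = \epsilon^{F_n}\,\tilde\delta^{F_n}\,e^{S_L^{(2)} + S_L^{(3)}}\,\frac{Z^{(\hat s)}}{(Z_\delta)^n},
\tag{7.20}
\]
\[
\langle\sigma_n^\epsilon(0)\sigma_2^\epsilon(1)\sigma_2^\epsilon(w)\sigma_n^{\tilde\delta}(\infty)\rangle_\delta = e^{S_L^{(1)}}\,e^{S_L^{(2)} + S_L^{(3)}}\,\frac{Z^{(\hat s)}}{(Z_\delta)^{n+1}}.
\tag{7.21}
\]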
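We observe that the powers of regularization parameters cancel:
\[
\begin{aligned}
& \log\epsilon:\quad -\frac{(n - 1)^2}{12n} - \frac{1}{12} - F_n - A_2 = 0, \\
& \log\tilde\delta:\quad -\frac{(n - 1)^2}{12n} - F_n = 0, \qquad \log\delta:\quad -\frac{1}{3} + \frac{1}{3} = 0, \\
& \log Q:\quad -n + (n - 1) + (2 - 1) = 0.
\end{aligned}
\tag{7.22}
\]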
This cancellation gives a consistency check on the calculations. The expression for the logarithm of the normalized four point function is given by:
\[
\begin{aligned}
\log F_4 = {} & -\frac{1}{4}\log C - \frac{1}{12}\Bigl(\frac{n+1}{2} - \frac{1}{n}\Bigr)\log a - \frac{1}{8}\log(1-a) \\
& - \frac{1}{24}\log\bigl((n-1)^2 a - (n+1)^2\bigr) - \frac{1}{12}\log n - \frac{5}{12}\log 2.
\end{aligned} \tag{7.23}
\]
The value of $C$ is given by (7.6).

7.2. Analysis of the 4-point function. Let us step back from the above calculation and think about the structure of a 4-point function $\langle\sigma_n(0)\sigma_2(1)\sigma_2(w)\sigma_n(\infty)\rangle$. Consider the limit $w \to 0$, and ask what operators are produced in the OPE of $\sigma_n$ and $\sigma_2$. There are three possibilities, which must all be considered when we make the CFT operators $O_j$ out of the sum over indices in the $\sigma_j$:

(a) The indices of $\sigma_2$ and the indices of $\sigma_n$ have no overlap, i.e., the operators are of the form $\sigma_{12}\,\sigma_{34\ldots n+2}$. In this case the other two operators must have no overlapping indices either, and the entire 4-point function factors into two different parts $\langle\sigma_2\sigma_2\rangle\langle\sigma_n\sigma_n\rangle$. The covering surfaces are separate for the two parts, and we just multiply together the correlation functions obtained from the covering surfaces for the 2-point functions.
(b) The indices of $\sigma_2$ and the indices of $\sigma_n$ have one overlap, i.e., the operators are of the form $\sigma_{12}\,\sigma_{23\ldots n+1}$. The OPE then produces the operator $\sigma_{123\ldots n+1} = \sigma_{n+1}$. The other two operators in the correlator must also have a single overlap so that they can produce $\sigma_{n+1}$. The genus of the surface thus produced is seen to be
\[
g = \frac{1 + 1 + (n-1) + (n-1)}{2} - (n+1) + 1 = 0. \tag{7.24}
\]
This case in fact corresponds to the surface that was constructed in the subsection above. Note that if we take $\sigma_{12}$ around $\sigma_{23\ldots n+1}$ then it becomes $\sigma_{13}$. The OPE of $\sigma_{13}$ with $\sigma_{23\ldots n+1}$ is still an operator of the form $\sigma_{n+1}$. On the other hand if we take the two $\sigma_2$ operators near each other, then we get the identity if we have $\sigma_{12}\sigma_{12}$, but we get $\sigma_{123}$ if we move the operators through a path such that they become $\sigma_{12}$ and $\sigma_{13}$. In fact by moving the various operators around each other on the $z$ plane, we can also get from the same correlator OPEs of the form $\sigma_{12}\sigma_{34}$ (which is nonsingular) and $\sigma_{12}\,\sigma_{n,n-1,\ldots,2,1}$, which produces an operator $\sigma_{n-1}$. Thus we should find singularities in the 4-point function arising from this surface to correspond to all these possibilities.

(c) The indices of $\sigma_2$ and the indices of $\sigma_n$ have two overlaps, and the total number of indices involved in the correlator is $s = n$. (Note that case (b) above also could be brought to a form where $\sigma_2$ and $\sigma_n$ have two overlaps, but the number of indices involved there overall was $n + 1$.) The other two operators in the correlator must have a similar overlap of indices, since otherwise they cannot produce an operator that has only $n$ distinct indices. In this case the genus of the covering surface is
\[
g = \frac{1 + 1 + (n-1) + (n-1)}{2} - n + 1 = 1. \tag{7.25}
\]
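The genus counting in (7.24) and (7.25) can be sketched as a small helper. This is only an illustration, assuming the relation $\sum_j (n_j - 1)/2 = g - 1 + s$ between the twist orders $n_j$, the number of sheets $s$ and the genus $g$ (the form in which it is used in (8.2)); the function name `covering_genus` is hypothetical.

```python
from fractions import Fraction


def covering_genus(twist_orders, sheets):
    """Genus g solving sum_j (n_j - 1)/2 = g - 1 + s (assumed relation)."""
    total_ramification = sum(n - 1 for n in twist_orders)
    g = Fraction(total_ramification, 2) - sheets + 1
    # A half-integer or negative result means the data are inconsistent.
    if g.denominator != 1 or g < 0:
        raise ValueError("no covering surface with these twist orders and sheets")
    return int(g)


n = 5
print(covering_genus([2, 2, n, n], sheets=n + 1))  # case (b): n + 1 indices
print(covering_genus([2, 2, n, n], sheets=n))      # case (c): n indices
```

With four $\sigma_2$ insertions and two sheets the same helper gives genus 1, which is the situation of the torus computation in Sect. 7.4.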
We see that the correlator $\langle O_2 O_2 O_2 O_2\rangle$ will have contributions from correlators $\langle\sigma_2\sigma_2\sigma_2\sigma_2\rangle$ that give genus 0 and genus 1 surfaces, but no other surfaces. The genus 0 case is contained in the analysis in Subsect. 7.1, and we will study the genus 1 case in Sect. 7.4 below. Note that for the genus 0 case we have many combinations of indices for the $\sigma_2$ operators as discussed in (b) above, but these all arise from different branches of the same function (7.23). We must thus add the results from these branches (as well as the disconnected part (case (a) above) and the genus 1 contribution) to obtain the complete 4-point function of the $O_2$ operators. We will not carry out the explicit addition since we expect the result to be simpler in the supersymmetric case, which we hope to present elsewhere.

7.3. Analysis of the g = 0 contribution. In this subsection, we analyse the correlator computed in (7.23), which corresponds to case (b) above, to check if it reproduces the expected short distance limits.

7.3.1. The limit w → 0. First let us consider the limit $w \to 0$, which corresponds to $t_- \to 0$ and $a \to 0$. For small values of $a$ we have:
\[
t_+ \approx \frac{n+1}{n}, \qquad t_- \approx \frac{an}{n+1}, \qquad
z_- \approx n^{2n}(n+1)^{-2-2n}a^{n+1}, \qquad
C \approx \frac{n^n}{(n+1)^{n+1}}
\]
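As a numerical sanity check of the small-$a$ expressions for $t_\pm$, $z_-$ and $C$ quoted just above, one can evaluate the exact quantities for a small value of $a$. The sketch below rests on assumptions that are not quoted from the text: the covering map is taken to be $z(t) = C t^n (t-a)/(t-1)$ (consistent with the asymptotics in (7.15)), $t_\pm$ are the zeros of $dz/dt$, and $C$ is fixed by $z(t_+) = 1$. The function name is hypothetical.

```python
import math


def covering_map_data(n, a):
    """Return (t_plus, t_minus, C, z_minus) for z(t) = C t^n (t - a)/(t - 1).

    Assumptions: t_pm are the roots of
        n t^2 - ((n + 1) + a (n - 1)) t + n a = 0
    (the zeros of dz/dt), and C is normalised so that z(t_plus) = 1.
    """
    b = (n + 1) + a * (n - 1)
    t_plus = (b + math.sqrt(b * b - 4 * n * n * a)) / (2 * n)
    t_minus = a / t_plus  # product of the roots is a; avoids cancellation
    C = (t_plus - 1) / (t_plus**n * (t_plus - a))
    z_minus = C * t_minus**n * (t_minus - a) / (t_minus - 1)
    return t_plus, t_minus, C, z_minus


n, a = 3, 1e-6
t_plus, t_minus, C, z_minus = covering_map_data(n, a)
# Ratios of exact values to the leading small-a expressions; each should be close to 1.
print(t_plus / ((n + 1) / n))
print(t_minus / (a * n / (n + 1)))
print(C / (n**n / (n + 1) ** (n + 1)))
print(z_minus / (n ** (2 * n) * (n + 1) ** (-2 - 2 * n) * a ** (n + 1)))
```

The corrections to each ratio are of relative order $a$, so the agreement improves as $a$ is decreased.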
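\[
\log F_4 \approx \frac{1}{12}\Bigl(\frac{1}{n(n+1)} - \frac{1}{2}\Bigr)\log z_- - \frac{5}{12}\log 2
- \Bigl(\frac{n}{6} + \frac{1}{6(n+1)} + \frac{1}{12}\Bigr)\log n
+ \Bigl(\frac{n}{6} + \frac{1}{6n} + \frac{1}{12}\Bigr)\log(n+1). \tag{7.26}
\]
One can see that the correct singularity $(z_-)^{-2(\Delta_n + \Delta_2 - \Delta_{n+1})}$ is reproduced. Using the expressions for three point functions we derived before one can check that in the limit $w \to 0$:
\[
\log F_4 \approx -2(\Delta_n + \Delta_2 - \Delta_{n+1})\log w + 2\log|C^{\sigma}_{n,2,n+1}|^2, \tag{7.27}
\]
which agrees with the anticipated factorization.

7.3.2. The limit w → 1. Let us now consider the limit $a \to 1$, which corresponds to one of two possible ways for the point $w$ to approach 1. After introducing $b = a - 1$ we get:
\[
\begin{aligned}
& t_\pm \approx 1 + \frac{b(n-1)}{2n} \pm \frac{1}{2n}\sqrt{-4nb}, \qquad C \approx 1, \qquad
t_\pm - a \approx \pm\sqrt{-\frac{b}{n}} - \frac{b}{n}\,\frac{n+1}{2}, \\
& z_- - z_+ = \frac{z_-}{z_+} - 1 \approx \Bigl(1 - 2\sqrt{-\frac{b}{n}}\Bigr)^{n}\Bigl(1 - (n-1)\sqrt{-\frac{b}{n}}\Bigr)\Bigl(1 - (n+1)\sqrt{-\frac{b}{n}}\Bigr) - 1 \approx -4i\sqrt{nb}, \\
& \log F_4 \approx -\frac{1}{4}\log(w-1) = -4\Delta_2\log(w-1).
\end{aligned} \tag{7.28}
\]
This singularity corresponds to $\sigma_2$ and $\sigma_2$ fusing to the identity.

There is another limit, $a \to (n+1)^2/(n-1)^2$, which also corresponds to $w \to 1$. Introducing $b = a - (n+1)^2/(n-1)^2$, we get:
\[
\begin{aligned}
& t_\pm \approx \frac{n+1}{n-1} \pm \sqrt{\frac{b}{n}}, \qquad C \approx -\Bigl(\frac{n-1}{n+1}\Bigr)^{n+1}, \\
& \frac{dz}{dt} = \frac{C n t^{n-1}(t - t_+)(t - t_-)}{(t-1)^2} \approx -\frac{n(n-1)^4}{4(n+1)^2}(t - t_+)(t - t_-), \\
& z_- - z_+ \approx -\frac{n(n-1)^4}{4(n+1)^2}\int_{t_+}^{t_-} dt\,(t - t_+)(t - t_-) = -\frac{(n-1)^4 b^{3/2}}{3\sqrt{n}\,(n+1)^2}, \\
& \log F_4 \approx -\frac{1}{36}\log(w-1) + \Bigl[\frac{1}{9} - \frac{1}{6}\Bigl(n + \frac{1}{n}\Bigr)\Bigr]\log(n-1) - \frac{2}{9}\log n \\
& \qquad\qquad + \Bigl[\frac{1}{6}\Bigl(n + 1 + \frac{1}{n}\Bigr) - \frac{1}{18}\Bigr]\log(n+1) - \frac{2}{3}\log 2 - \frac{1}{36}\log 3.
\end{aligned} \tag{7.29}
\]
Using Eqs. (6.25) and (6.22) one can see that
\[
\log F_4 \approx -2(\Delta_2 + \Delta_2 - \Delta_3)\log(w-1) + \log|C^{\sigma}_{2,2,3}|^2 + \log|C^{\sigma}_{n,3,n}|^2. \tag{7.30}
\]
This corresponds to merging $\sigma_2$ and $\sigma_2$ to $\sigma_3$.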
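The exponents quoted in these limits can be cross-checked with exact rational arithmetic. The sketch below assumes the twist-operator weight $\Delta_n = \frac{c}{24}\bigl(n - \frac{1}{n}\bigr)$ with $c = 1$; this is the standard value, taken here as an assumption about what (3.21) states, and the function names are hypothetical.

```python
from fractions import Fraction


def twist_dimension(n, c=1):
    """Assumed twist-operator weight Delta_n = (c/24) (n - 1/n)."""
    return Fraction(c, 24) * (n - Fraction(1, n))


def ope_exponent(n1, n2, n3):
    """Coefficient -2 (Delta_n1 + Delta_n2 - Delta_n3) of the logarithm."""
    return -2 * (twist_dimension(n1) + twist_dimension(n2) - twist_dimension(n3))


print(ope_exponent(2, 2, 1))  # sigma_2 sigma_2 -> identity, cf. (7.28)
print(ope_exponent(2, 2, 3))  # sigma_2 sigma_2 -> sigma_3, cf. (7.29)
n = 4
# sigma_n sigma_2 -> sigma_{n+1}: compare with the log z_- coefficient in (7.26)
print(ope_exponent(n, 2, n + 1), Fraction(1, 12) * (Fraction(1, n * (n + 1)) - Fraction(1, 2)))
```

Under this assumption the three printed exponents reproduce the coefficients $-\frac{1}{4}$, $-\frac{1}{36}$ and $\frac{1}{12}\bigl(\frac{1}{n(n+1)} - \frac{1}{2}\bigr)$ appearing in the expansions above.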
7.3.3. The limit w → ∞. We now look at the remaining limits of the expression (7.23) at which the four point function becomes singular. They emerge at the points where the coefficient $C$ goes either to 0 or to infinity, i.e. if the value of $t_+$ approaches one of the points $0, 1, a, \infty$. Substituting these into the quadratic equation (7.4), we get the candidates for the critical values of $a$: $0, 1, \infty$. Two of these limits we have already considered; now we analyze the last possibility, $a \to \infty$. In this limit we have:
\[
t_+ \approx a\,\frac{n-1}{n} - \frac{1}{n(n-1)}, \qquad t_- \approx \frac{n}{n-1}. \tag{7.31}
\]
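So it is convenient to keep the point $z_-$ fixed and vary the value of $z_+$ instead (see footnote 2). Thus we get:
\[
\begin{aligned}
& C = t_-^{-n}\,\frac{t_- - 1}{t_- - a} \approx -\Bigl(\frac{n-1}{n}\Bigr)^{n}\frac{1}{a(n-1)}, \qquad
z_+ \approx a^{n-1}\Bigl(\frac{n-1}{n}\Bigr)^{2n}\frac{1}{(n-1)^2}, \\
& \log F_4 \approx \Bigl(-\frac{1}{24} + \frac{1}{12n(n-1)}\Bigr)\log z_+ + \Bigl[\frac{1}{12} - \frac{1}{6}\Bigl(n + \frac{1}{n}\Bigr)\Bigr]\log(n-1) \\
& \qquad\qquad + \Bigl[\frac{1}{6}\Bigl(n - 1 + \frac{1}{n-1}\Bigr) + \frac{1}{12}\Bigr]\log n - \frac{5}{12}\log 2.
\end{aligned} \tag{7.32}
\]
This expression can be rewritten in terms of three point functions:
\[
\log F_4 \approx 2(\Delta_n - \Delta_2 - \Delta_{n-1})\log z_+ + 2\log|C^{\sigma}_{n-1,2,n}|^2, \tag{7.33}
\]
thus it corresponds to the factorization of the following type:
\[
\langle\sigma_{(1\ldots n)}\,\sigma_{(12)}\,\sigma_{(12)}\,\sigma_{(1\ldots n)}\rangle. \tag{7.34}
\]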
Thus the four point function reproduces the anticipated factorizations.

7.4. The g = 1 correlator $\langle\sigma_{12}\sigma_{12}\sigma_{12}\sigma_{12}\rangle$. Let us consider the case $n = 2$ in (7.1), so that we have the correlator
\[
\langle\sigma_2(0)\sigma_2(1)\sigma_2(w)\sigma_2(\infty)\rangle. \tag{7.35}
\]
We wish to have the number of sheets over a generic point in the $z$ plane to be 2; this gives $g = 1$ for the covering surface $\Sigma$. Each branch point is of order 2, so we seek a map of the form
\[
\frac{dz}{dt} = \alpha\,[z(z-1)(z-w)(z-z_\infty)]^{1/2}. \tag{7.36}
\]
We choose not to put any branch point at infinity explicitly; the limit $z_\infty \to \infty$ will be taken at the end of the calculation. This equation may be solved using the Weierstrass function $\wp$, and the solution in the $z_\infty \to \infty$ limit is given by:
\[
z(t) = \frac{\wp(t) - e_1}{e_2 - e_1}, \tag{7.37}
\]
Footnote 2: Note that the definitions of $t_+$ and $t_-$ depend on the choice of branch for a multivalued function; in particular $t_+$ and $t_-$ interchange if one goes along a small circle around the point $a = 1$.
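where
\[
e_1 = \wp\Bigl(\frac{1}{2}\Bigr), \qquad e_2 = \wp\Bigl(\frac{\tau}{2}\Bigr), \qquad e_3 = \wp\Bigl(\frac{1}{2} + \frac{\tau}{2}\Bigr), \qquad
w = \frac{e_3 - e_1}{e_2 - e_1} = \Bigl(\frac{\theta_3(\tau)}{\theta_4(\tau)}\Bigr)^4. \tag{7.38}
\]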
The coordinate $t$ describes a torus given by modding out the complex plane by translations by 1 and $\tau$. We choose the fiducial metric on the torus to be the flat metric $d\hat s^2 = dt\,d\bar t$. Then we calculate the contribution to the Liouville action from the point $z = 0$. Near this point
\[
\frac{dz}{dt} \approx \alpha z^{1/2}\sqrt{-wz_\infty}, \qquad
\phi = \log\frac{dz}{dt} + \text{c.c.} \approx \log\bigl(\alpha z^{1/2}\sqrt{-wz_\infty}\bigr) + \text{c.c.}, \qquad
\partial_t\phi = \frac{dz}{dt}\,\partial_z\phi, \qquad \partial_z\phi \approx \frac{1}{2z}. \tag{7.39}
\]
We can write $dt\,\partial_t\phi = dz\,\partial_z\phi$ for any infinitesimal segment of the contour of integration around the puncture, but we must circle the $z$ plane puncture twice to circle the $t$ space puncture once. We find it easier to work in the $z$ plane instead of the $t$ space to evaluate the integral around the puncture. We recall that the puncture has radius $\epsilon$ in the $z$ plane, and we put in a factor of 2 at the end to account for the relation between the $z$ and $t$ contours. Then we get for the contribution to $S_L$:
\[
S_L(z = 0) = -\frac{1}{24}\log\bigl(\alpha^2|wz_\infty|\,\epsilon\bigr). \tag{7.40}
\]
Similarly from the points $z = 1$, $z = w$ and $z = z_\infty$ we get the contributions
\[
S_L(z = 1) = -\frac{1}{24}\log\bigl(\alpha^2|1-w||1-z_\infty|\,\epsilon\bigr), \tag{7.41}
\]
\[
S_L(z = w) = -\frac{1}{24}\log\bigl(\alpha^2|w||1-w||w-z_\infty|\,\epsilon\bigr), \tag{7.42}
\]
\[
S_L(z = z_\infty) = -\frac{1}{24}\log\bigl(\alpha^2|z_\infty||z_\infty-1||w-z_\infty|\,\epsilon\bigr). \tag{7.43}
\]
Now we look at the image of infinity, where we have cut a circle $|z| = 1/\delta$. We have
\[
\frac{dz}{dt} \approx \alpha z^2, \qquad \phi \approx \log[\alpha z^2] + \text{c.c.}, \qquad \partial_z\phi \approx \frac{2}{z}. \tag{7.44}
\]
Noting that we must take the $z$ plane contour clockwise, and putting in the factor of 2 to relate the contour to the $t$ space contour, we get the contribution
\[
S_L(z = \infty) = \frac{1}{3}\log\frac{\alpha}{\delta^2}. \tag{7.45}
\]
Adding up all contributions we get
\[
\begin{aligned}
S_L^{(1)} &= -\frac{1}{6}\log\epsilon - \frac{2}{3}\log\delta - \frac{1}{12}\log|w(1-w)z_\infty(z_\infty-1)(z_\infty-w)| \\
&\approx -\frac{1}{6}\log\epsilon - \frac{2}{3}\log\delta - \frac{1}{12}\log|w(1-w)| - \frac{1}{4}\log|z_\infty|.
\end{aligned} \tag{7.46}
\]
Note that $\alpha$ does not appear in the final result, as one could anticipate from the fact that this constant can be absorbed in a rescaling of the $t$ plane. In the case under consideration there is no contribution to the Liouville action coming from the $|z| > 1/\delta$ region: there is no curvature on the torus $t$ and
\[
\frac{d\tilde z}{dt} = -\frac{1}{\delta^2 z^2}\frac{dz}{dt} \approx -\frac{\alpha}{\delta^2}, \tag{7.47}
\]
giving a constant $\phi$ to leading order and thus a vanishing kinetic term for $\phi$. Thus $S_L = S_L^{(1)}$ and the general expression (2.44) gives:
\[
\langle\sigma_2^\epsilon(0)\sigma_2^\epsilon(1)\sigma_2^\epsilon(w)\sigma_2^\epsilon(z_\infty)\rangle_\delta = e^{S_L^{(1)}}\,\frac{Z^{(\hat s)}}{(Z_\delta)^2}. \tag{7.48}
\]
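To obtain the normalized 4-point function we write
\[
\begin{aligned}
\langle\sigma_2(0)\sigma_2(1)\sigma_2(w)\sigma_2(\infty)\rangle
&\equiv \lim_{|z_\infty|\to\infty}|z_\infty|^{4\Delta_2}\langle\sigma_2(0)\sigma_2(1)\sigma_2(w)\sigma_2(z_\infty)\rangle \\
&= \lim_{|z_\infty|\to\infty}|z_\infty|^{4\Delta_2}\,\frac{\langle\sigma_2^\epsilon(0)\sigma_2^\epsilon(1)\sigma_2^\epsilon(w)\sigma_2^\epsilon(z_\infty)\rangle_\delta}{\langle\sigma_2^\epsilon(0)\sigma_2^\epsilon(1)\rangle_\delta^{2}} \\
&= 2^{-2/3}\,|w(1-w)|^{-\frac{1}{12}}\,Z_\tau.
\end{aligned} \tag{7.49}
\]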
Here we have used the fact that in this case the partition function $Z^{(\hat s)}$ in (2.16) is that on the flat torus with modular parameter $\tau$ given through (7.38). Since the group $S_2$ equals the group $Z_2$, we can compare (7.49) with the 4-point function obtained for $\sigma_2$ operators for the $Z_2$ orbifold in [9]. Equation (7.49) agrees with (4.13) of [9] for the case of a noncompact boson field and with (4.16) for the compact boson field.

One observes that if the fields $X_i$ are noncompact bosons, then as $w \to 1$ we find a factor $\log(w-1)$ in the OPE in addition to the expected power $(w-1)^{1/4}$. We suggest the following interpretation of this logarithm. There is a continuous family of momentum modes for the noncompact boson, with energy going to zero. If we do not orbifold the target space, then momentum conservation allows only a definite momentum mode to appear in the OPE of two fields. But the orbifolding destroys the translation invariance in $X_1 - X_2$, and nonzero momentum modes can be exchanged between sets of operators where each set does not carry any net momentum charge. The exchange of such modes (with dimensions accumulating to zero) between the pair $\sigma_2(w)\sigma_2(1)$ and the pair $\sigma_2(0)\sigma_2(\infty)$ gives rise to the logarithm. Of course when the boson is compact, this logarithm disappears, as can be verified from (7.49) or the equivalent results in [9].

8. Discussion

The motivation for our study of correlation functions of symmetric orbifolds was the fact that the dual of the $AdS_3 \times S^3 \times M$ spacetime (which arises in black hole studies) is the CFT arising from the low energy limit of the D1–D5 system, and the D1–D5 system is believed to be a deformation of an orbifold CFT (with the undeformed orbifold as a special point in moduli space). To study this duality we must really study the supersymmetric orbifold theory, while in this paper we have just studied the bosonic theory. It turns out however that the supersymmetric orbifold can be studied with only
a small extension of what we have done here. Following [14] we can bosonize the fermions. Then if we go to the covering space near the insertion of a twist operator, we find only the following difference from the bosonic case: at the location of the twist operator we do not have the identity operator, but instead a “charge operator” of the form $P(\partial X_a, \partial^2 X_a, \ldots)\,e^{i k_a X_a}$. Here $X_a$ are the bosons that arise from bosonizing the complex fermions, and $P$ is a polynomial expression in its arguments. It is easy to compute the correlation function of these charge operators on the covering space $\Sigma$, and then we have the twist correlation functions for the supersymmetric theory. We will present this calculation elsewhere, but here we note that many properties of interest for the supersymmetric correlation functions can already be seen from the bosonic analysis that we have done here. In this section we recall the features of the AdS/CFT duality map and analyse some properties of the 3-point functions in the CFT.
8.1. “Universality” of the correlation functions. We have mentioned before that while we have discussed the orbifold theory $R^N/S_N$ (where the coordinate of $R$ gave $X$, a real scalar field), we could replace the CFT of $X$ by any other CFT of our choice, and the calculations performed here would remain essentially the same. When the covering surface had genus zero, the results depend only on the value of $c$, and thus if we had a $(T^4)^N/S_N$ theory or a $(K3)^N/S_N$ theory, then we would simply choose $c = 4$ in the Liouville action (2.17) (instead of $c = 1$). If $\Sigma$ had $g = 1$, then we would need to put in the partition function (of a single copy) of $T^4$ or $K3$ for the value of $Z^{(\hat s)}$ in (2.16). But apart from the value of $c$ and the value of partition functions on $\Sigma$ there is no change in the calculation. Thus in particular the 3-point functions that we have computed at genus zero are universal in the sense that if we take them to the power $c$ then we get the 3-point functions for any CFT of the form $M^N/S_N$ with the CFT on $M$ having central charge $c$.

There is a small change in the calculation when we consider the supersymmetric case. The fermions from different copies of $M$ anticommute, and the twist operators carry a representation of the R symmetry. As a consequence the dimensions of the twist operators are not given by (3.21), but for the supersymmetric theory based on $M = T^4$ are given by $\frac{1}{2}(n-1)$. However as mentioned above, our analysis can be extended with small modifications to such theories as well.

Note that our method does not work if we have an orbifold group other than $S_N$. Thus for example if we had a $Z_N$ orbifold of a complex boson [9], then we could go to the covering space over a twist operator $\sigma_n$, but not write the CFT in terms of an unconstrained field on this covering space.
The reason is that we have n sheets or more of the cover over any point in the base space, but the central charge of the theory is just 2, and so we cannot attribute one scalar field to each sheet of the cover. Thus our method, and its associated universalities, are special to SN orbifolds, where a twist operator just permutes copies of a given CFT but does not exploit any special symmetry of the CFT itself.
8.2. The genus expansion and the fusion rules of WZW models. We have studied the orbifold CFT on the plane, but found that the correlation functions can be organized in a genus expansion, arising from the genus of the covering surface $\Sigma$. In the large $N$ limit the contribution of a higher genus surface goes like $1/N^{g+\frac{1}{2}}$. This situation is similar to that in the Yang–Mills theory that is dual to $AdS_5 \times S^5$. The Yang–Mills theory has correlation functions that can be expanded in a genus expansion, with higher genus surfaces suppressed by $1/N^{g}$. In the Yang–Mills theory the genus expansion has its origins in the structure of Feynman diagrams for fields carrying two indices (the “double line representation” of gauge bosons). In our case we have quite a different origin for the genus expansion. In the case of $AdS_5 \times S^5$ it is believed that the genus expansion of the dual Yang–Mills theory is related to the genus expansion of string theory on this spacetime, though the precise relationship is not clear. It would be interesting if the genus expansion we have for the D1–D5 CFT were related to the genus expansion of the string theory on $AdS_3 \times S^3 \times M$.

In this context we observe the following relation. It was argued in [7] that the orbifold CFT $M^N/S_N$ indeed corresponds to a point in the D1–D5 system moduli space. Further, at this point we have the number of 1-branes ($n_1$) and of 5-branes ($n_5$) given by $n_5 = 1$, $N = n_1 n_5 = n_1$. The dual string theory is in general an $SU(2)$ Wess–Zumino–Witten (WZW) model [18], though at the orbifold point of the CFT this string theory is complicated to analyze. The twist operators $\sigma_n$, $n = 1 \ldots N$ of the CFT ($\sigma_1$ = Identity) correspond to WZW primaries with $j = (n-1)/2$, $0 \le j \le \frac{N-1}{2}$. Since in a usual WZW model we have $0 \le j \le k/2$, we set $k = N - 1$. The fusion rules for the WZW model, which give the 3-point functions of the string theory on the sphere (tree level), are as follows. The spins $j$ follow the rules for spin addition in $SU(2)$, except that there is also a “truncation from above”:
\[
(j_1, j_2) \to j_3, \qquad |j_1 - j_2| \le j_3 \le |j_1 + j_2|, \qquad j_1 + j_2 + j_3 \le k. \tag{8.1}
\]
Now consider the 3-point function in the orbifold CFT, for the case where the genus of the covering surface $\Sigma$ is $g = 0$. The ramification order of $\Sigma$ at the insertion of $\sigma_{n_i}$ is $r_i = (n_i - 1) = 2j_i$. The rules in (4.4), (4.5), (4.6) translate to $|j_1 - j_2| \le j_3 \le |j_1 + j_2|$. Further, the number of sheets $s$ is bounded as $s \le N$. Then the relation (4.1) gives
\[
\sum_i \frac{r_i}{2} = g - 1 + s \le -1 + N \quad\to\quad j_1 + j_2 + j_3 \le k. \tag{8.2}
\]
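The correspondence between (8.1) and (8.2) can be illustrated with a short sketch. It implements only the inequalities as stated above, with $j = (n-1)/2$ and $k = N - 1$; spins are passed as the integers $2j$ to avoid half-integers, the function names are hypothetical, and the requirement that the number of sheets be an integer is added here as an explicit assumption.

```python
def wzw_fusion_allowed(two_j1, two_j2, two_j3, k):
    """Rule (8.1) with spins given as integers 2j."""
    triangle = abs(two_j1 - two_j2) <= two_j3 <= two_j1 + two_j2
    truncation = two_j1 + two_j2 + two_j3 <= 2 * k
    return triangle and truncation


def orbifold_genus0_allowed(n1, n2, n3, N):
    """Genus-zero rule behind (8.2): s = sum(r_i)/2 + 1 sheets, with s <= N."""
    r = [n1 - 1, n2 - 1, n3 - 1]
    if sum(r) % 2:  # assumption: the sheet number must be an integer
        return False
    sheets = sum(r) // 2 + 1
    triangle = abs(r[0] - r[1]) <= r[2] <= r[0] + r[1]
    return triangle and sheets <= N


N = 4
k = N - 1
for n1, n2, n3 in [(2, 2, 3), (3, 3, 3), (4, 4, 3), (2, 2, 1)]:
    print((n1, n2, n3),
          wzw_fusion_allowed(n1 - 1, n2 - 1, n3 - 1, k),
          orbifold_genus0_allowed(n1, n2, n3, N))
```

Whenever $\sum_i r_i$ is even the two functions return the same answer, which is the similarity between the two sets of rules noted in the text.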
While (8.2) is a relation for the bosonic orbifold theory, we expect an essentially similar relation for the supersymmetric case. Thus we observe a similarity between the $g = 0$ 3-point functions of the WZW model (8.1) and of the CFT (8.2). At genus $g = 1$ however, we find that any three spins $j_1, j_2, j_3$ can give a nonzero 3-point function in the string theory. In the orbifold CFT, however, we get only a slight relaxation of the rule (8.2): we get $j_1 + j_2 + j_3 \le k + 1$. Roughly speaking we can reproduce this rule in the string theory if we require that in the string theory one loop diagram there be a way to draw the lines such that only spins $j \le 1/2$ be allowed to circulate in the loop. Of course we are outside the domain of any good perturbation expansion at this point, since if the spins are of order $k$ then there is no small parameter in the theory to expand in, and thus there is no requirement that there be an exact relation between the rules in a WZW string theory and the rules in the orbifold CFT.

We note that in [14] the 3-point functions of chiral primaries that were studied had “one overlap” in their indices. This corresponds to $j_1 + j_2 = j_3$ in the above fusion rules, and since for the supersymmetric case the dimension is linear in the charge, we also have $\Delta_1 + \Delta_2 = \Delta_3$. This corresponds to the case of “extremal” correlation functions in the language of [20]. In [14] the 3-point correlators for this special case were found by an elegant recursion relation, which arises from the fact that there is no singularity in the OPE, and thus the duality relation of conformal blocks becomes a “chiral ring” type of associativity law among the fusion coefficients. It is not clear however how to extend this method to the non-extremal case, and one motivation for the present work was to develop a scheme to compute the correlators for $j_3 < j_1 + j_2$, which corresponds to more than one overlap. In the case of one overlap we have extended our calculations to the supersymmetric case, and found results in agreement with [14].

8.3. 3-point couplings and the stringy exclusion principle. In the $AdS_5 \times S^5$ case the 3-point couplings of supergravity agree with the large $N$ limit of the 3-point functions in the free Yang–Mills theory; thus there is a nonrenormalization of this correlator as the coupling $g$ is varied. It is not clear if a similar result holds for the $AdS_3 \times S^3 \times M$ case, and even less clear what nonrenormalization theorems hold at finite $N$. But it is nevertheless interesting to ask how the correlators in the orbifold CFT behave as we go from infinite $N$ to finite $N$, and in particular what happens as we approach the limits of the stringy exclusion principle. Thus we examine the ratio
\[
R(m, n, q; \bar N) \equiv \frac{\sqrt{\bar N}\,\langle O_n O_m O_q\rangle_{\bar N}}{\lim_{N\to\infty}\sqrt{N}\,\langle O_n O_m O_q\rangle_{N}}, \tag{8.3}
\]
where the subscripts on the correlator give the value of $N$. We have rescaled the correlators by $\sqrt{N}$ to obtain the effective coupling of the 3-point function; the correlator itself goes as $1/\sqrt{N}$. For $n, m, q \ll \bar N$ we expect $R \approx 1$, while as $n, m, q$ become of order $\bar N$ we expect that $R$ will fall to zero. We take the case of the 3-point function with single overlap, and further set $m = n$. Then we have $q = 2n - 1$, and we write
\[
R(n, n, 2n-1; \bar N) \equiv R(n; \bar N). \tag{8.4}
\]
It is easy to see that for the case of single overlap the correlators $\langle\sigma_n\sigma_m\sigma_{m+n-1}\rangle$ can get a contribution only from surfaces with $g = 0$, for which case we have done a complete calculation of the correlator and its combinatorics. Note further that in the ratio (8.3) the actual value of $\langle\sigma_n\sigma_m\sigma_{m+n-1}\rangle$ will cancel, and the value of $R$ will be determined by combinatorial factors. These factors are expected to be the same for the bosonic and supersymmetric cases. In Fig. 1 we plot $R(n; \bar N)$ versus $n$ (for $\bar N = 1000$). We see that $R$ drops significantly after $n$ exceeds $\sim\sqrt{\bar N}$. This effect can be traced to the fact that the number of ways to select $s$ ordered indices from $\bar N$ indices is
\[
\bar N(\bar N - 1)\cdots(\bar N - s + 1) = \bar N^{s}\Bigl(1 - \frac{1}{\bar N}\Bigr)\Bigl(1 - \frac{2}{\bar N}\Bigr)\cdots\Bigl(1 - \frac{s-1}{\bar N}\Bigr)
\approx \bar N^{s}\Bigl(1 - \frac{1}{\bar N}\sum_{j=1}^{s-1} j\Bigr) = \bar N^{s}\Bigl(1 - \frac{s(s-1)}{2\bar N}\Bigr). \tag{8.5}
\]
If the CFT 3-point function is indeed not renormalized for finite $N$, then the above result has interesting implications. The coupling between three gravitons would then be a constant for low energies ($n \ll \bar N$) but would drop rapidly for high energies. Thus the behavior of high frequency modes would not follow a naive “equivalence principle”. This issue may be relevant to Hawking’s derivation of black hole radiation, where we need to make a change of coordinates to study the high frequency modes near the horizon.
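The counting estimate in (8.5) is easy to probe numerically. The sketch below compares the exact ratio $\bar N(\bar N-1)\cdots(\bar N-s+1)/\bar N^{s}$ with the first-order estimate $1 - s(s-1)/(2\bar N)$; it does not compute $R(n;\bar N)$ itself, whose full combinatorics are not reproduced here, and the function names are hypothetical.

```python
def ordered_selection_ratio(N, s):
    """Exact N (N-1) ... (N-s+1) / N**s, i.e. the product of (1 - j/N) for j < s."""
    ratio = 1.0
    for j in range(s):
        ratio *= 1.0 - j / N
    return ratio


def leading_order_ratio(N, s):
    """First-order estimate 1 - s (s-1) / (2 N) from (8.5)."""
    return 1.0 - s * (s - 1) / (2.0 * N)


N = 1000
for s in (5, 10, 30, 60):
    print(s, ordered_selection_ratio(N, s), leading_order_ratio(N, s))
```

For $s$ well below $\sqrt{\bar N}$ the two values agree closely, while for $s$ of order $\sqrt{\bar N}$ or larger the correction is of order one and the first-order estimate is no longer reliable, in line with the drop described above.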
Fig. 1. The factor (8.4) as a function of $n$ for $\bar N = 1000$
For these modes to evolve as used in the derivation, we use implicitly the naive value of the following cubic coupling: that of a low energy graviton (representing the attraction of the hole) and two high energy quanta (representing the high energy mode emerging from the horizon, getting redshifted by the attraction of the hole). If this coupling differs from the one expected from naive gravitational physics, then the semiclassical derivation of Hawking radiation may require modification, with corresponding implications for the information paradox.
8.4. Conclusion. It would be important to pursue further the study of the supersymmetric case, and to compare with the dual superstring theory. The subset of correlators computed in [14] was compared to supergravity in [15], but it was a little unclear how closely the two calculations agreed. A better picture may emerge when we look at the complete set of correlators of the supersymmetric side, which is possible to do by extending our computation here to include the R-charges carried by the twist operators in the supersymmetric case. It was argued recently in [21] that the CFT of the D1–D5 system exhibits a duality to a set of spacetimes, of which the AdS space is only one member. If the 3-point functions are protected against coupling changes then we should see a reflection of this fact in correlators at the orbifold point. Acknowledgements. We are grateful to A. Jevicki, M. Mihailescu, S. Ramgoolam, and S. Frolov for patiently explaining their results to us, and to L. Rastelli for extensive discussions in the early phase of this work. We also benefited greatly from discussions with S. Das, E. D’Hoker, C. Imbimbo, F. Larsen, E. Martinec, S. Mukhi, S. Sethi, and S.T. Yau.
References
1. Maldacena, J.: Adv. Theor. Math. Phys. 2, 231 (1998), hep-th/9711200; Gubser, S., Klebanov, I. and Polyakov, A.: Phys. Lett. B 428, 105 (1998), hep-th/9802109; Witten, E.: Adv. Theor. Math. Phys. 2, 253 (1998), hep-th/9802150; Aharony, O., Gubser, S.S., Maldacena, J., Ooguri, H. and Oz, Y.: Phys. Rept. 323, 183 (2000), hep-th/9905111
2. Freedman, D., Mathur, S.D., Rastelli, L. and Matusis, A.: Nucl. Phys. B 546, 96 (1999), hep-th/9804058; Lee, S., Minwalla, S., Rangamani, M. and Seiberg, N.: Adv. Theor. Math. Phys. 2, 697 (1999), hep-th/9806074
3. Howe, P.S., Sokatchev, E. and West, P.C.: Phys. Lett. B 444, 341 (1998), hep-th/9808162
4. de Boer, J.: Nucl. Phys. B 548, 139 (1999), hep-th/9806104; Dijkgraaf, R.: Nucl. Phys. B 543, 545 (1999), hep-th/9810210; Seiberg, N. and Witten, E.: JHEP 9904, 017 (1999), hep-th/9903224
5. Strominger, A. and Vafa, C.: Phys. Lett. B 379, 99 (1996), hep-th/9601029
6. Callan, C. and Maldacena, J.: Nucl. Phys. B 472, 591 (1996), hep-th/9602043; Das, S.R. and Mathur, S.D.: Nucl. Phys. B 478, 561 (1996), hep-th/9606185; Maldacena, J. and Strominger, A.: Phys. Rev. D 55, 861 (1997), hep-th/9609026
7. Larsen, F. and Martinec, E.: JHEP 9906, 019 (1999), hep-th/9905064
8. Hamidi, S. and Vafa, C.: Nucl. Phys. B 279, 465 (1987)
9. Dixon, L., Friedan, D., Martinec, E. and Shenker, S.: Nucl. Phys. B 282, 13 (1987)
10. Dijkgraaf, R., Vafa, C., Verlinde, E. and Verlinde, H.: Commun. Math. Phys. 123, 485 (1989)
11. Arutyunov, G.E. and Frolov, S.A.: Theor. Math. Phys. 114, 43 (1998), hep-th/9708129; Nucl. Phys. B 524, 159 (1998), hep-th/9712061
12. Bantay, P.: Phys. Lett. B 419, 175 (1998), hep-th/9708120
13. Friedan, D.: Introduction To Polyakov’s String Theory. In: Recent Advances in Field Theory and Statistical Mechanics, ed. by J. B. Zuber and R. Stora. Amsterdam: North–Holland, 1984
14. Jevicki, A., Mihailescu, M. and Ramgoolam, S.: Nucl. Phys. B 577, 47 (2000), hep-th/9907144
15. Mihailescu, M.: JHEP 0002, 007 (2000), hep-th/9910111
16. Evslin, J., Halpern, M.B. and Wang, J.E.: Int. J. Mod. Phys. A 14, 4985 (1999), hep-th/9904105; de Boer, J., Evslin, J., Halpern, M.B. and Wang, J.E.: Int. J. Mod. Phys. A 15, 1297 (2000), hep-th/9908187; Halpern, M.B. and Wang, J.E.: hep-th/0005187
17. Szego, G.: Orthogonal Polynomials. Providence, RI: American Mathematical Society, 1959
18. Giveon, A., Kutasov, D. and Seiberg, N.: Adv. Theor. Math. Phys. 2, 733 (1998), hep-th/9806194
19. Maldacena, J. and Strominger, A.: JHEP 9812, 005 (1998), hep-th/9804085
20. D’Hoker, E., Freedman, D.Z., Mathur, S.D., Matusis, A. and Rastelli, L.: Extremal correlators in the AdS/CFT correspondence. hep-th/9908160
21. Dijkgraaf, R., Maldacena, J., Moore, G. and Verlinde, E.: A black hole farey tail. hep-th/0005003

Communicated by R. H. Dijkgraaf
Commun. Math. Phys. 219, 443 – 463 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Hausdorff Dimension of Measures via Poincaré Recurrence
L. Barreira (1), B. Saussol (2)
(1) Departamento de Matemática, Instituto Superior Técnico, 1049-001 Lisboa, Portugal. E-mail: [email protected]
(2) LAMFA-CNRS FRE 2270, Université de Picardie Jules Verne, 33 rue Saint Leu, 80039 Amiens, France. E-mail: [email protected]
Received: 17 July 2000 / Accepted: 20 December 2000
Abstract: We study the quantitative behavior of Poincaré recurrence. In particular, for an equilibrium measure on a locally maximal hyperbolic set of a $C^{1+\alpha}$ diffeomorphism $f$, we show that the recurrence rate to each point coincides almost everywhere with the Hausdorff dimension $d$ of the measure, that is, $\inf\{k > 0 : f^k x \in B(x, r)\} \sim r^{-d}$. This result is a non-trivial generalization of work of Boshernitzan concerning the quantitative behavior of recurrence, and is a dimensional version of work of Ornstein and Weiss for the entropy. We stress that our approach uses different techniques. Furthermore, our results motivate the introduction of a new method to compute the Hausdorff dimension of measures.

1. Introduction

One of the basic but fundamental results of the theory of dynamical systems is the Poincaré Recurrence Theorem. Essentially it states that any dynamical system preserving a finite invariant measure exhibits a non-trivial recurrence to each set of positive measure. More precisely, let $T : X \to X$ be a measurable transformation, and $\mu$ a $T$-invariant probability measure in $X$. The Poincaré Recurrence Theorem says that if $A \subset X$ is a measurable set of positive measure, then $\operatorname{card}\{n > 0 : T^n x \in A\} = \infty$ for $\mu$-almost every point $x \in A$. Unfortunately this information is only of qualitative nature. In particular it does not address the following natural problems:
1. with which frequency an orbit visits a given set of positive measure;
2. with which rate a given point returns to an arbitrarily small neighborhood of itself.

L. B. was partially supported by FCT’s Funding Program, and grants PRAXIS XXI 2/2.1/MAT/199/94 and NATO CRG970161. B. S. was supported by FCT’s Funding Program and by the Center for Mathematical Analysis, Geometry, and Dynamical Systems.
The Birkhoff Ergodic Theorem provides a comprehensive answer to the first problem. The second problem has attracted considerable and growing interest during the last decade, also in connection with other fields, including compression algorithms, numerical study of dynamical systems, and applications in linguistics. In particular, there exist several results towards a partial answer to this problem, including the noteworthy work of Boshernitzan [3] and Ornstein and Weiss [8] (see Sects. 2 and 4 for details).

The purpose of this paper is to provide a comprehensive answer to the above-mentioned problem 2, concerning the quantitative behavior of recurrence. In particular, our results are non-trivial generalizations of the above-mentioned results of Boshernitzan, and provide a dimensional version of the work of Ornstein and Weiss for the entropy (see Sects. 2 and 4 for explanations and examples). We emphasize that our approach uses different techniques. In particular we obtain a new proof of one of the main results of Boshernitzan in [3].

We now illustrate our results with a rigorous statement; see Sect. 4 for details. We shall prove that if $\mu$ is an ergodic Gibbs measure of a $C^{1+\alpha}$ diffeomorphism $f$ on a locally maximal hyperbolic set, then
\[
\lim_{r\to 0}\frac{\log\inf\{k > 0 : f^k x \in B(x, r)\}}{-\log r} = \lim_{r\to 0}\frac{\log\mu(B(x, r))}{\log r} \tag{1}
\]
for $\mu$-almost every point $x$, where $B(x, r)$ is the ball of radius $r$ centered at $x$. Note that the identity (1) relates two quantities of very different nature, called respectively recurrence rate and pointwise dimension. In particular, only the first quantity depends on the diffeomorphism, while only the second quantity depends on the measure. Furthermore, our results motivate the introduction of a new method to compute the Hausdorff dimension of a measure.

The structure of the paper is as follows. The main statements and inequalities relating the lower and upper pointwise dimensions, and the lower and upper recurrence rates, are formulated and discussed in Sects. 2 and 3. We also present examples which indicate that the hypotheses in our results are optimal. In Sect. 4 we apply those results to the case of equilibrium measures supported on locally maximal hyperbolic sets, and establish the identity (1) for $\mu$-almost every point. Section 5 contains an application to suspension flows. The proofs are collected in Sect. 6.
2. Lower Bounds for the Pointwise Dimension

Let T : X → X be a Borel measurable transformation on the separable metric space (X, d). Note that T is not necessarily invertible. We define the return time of a point x ∈ X into the open ball B(x, r) by
\[ \tau_r(x) \stackrel{\text{def}}{=} \inf\{k \in \mathbb{N} : T^k x \in B(x,r)\} = \inf\{k \in \mathbb{N} : d(T^k x, x) < r\}, \]
where N denotes the set of positive integers. We also define the lower and upper recurrence rates of x by
\[ \underline{R}(x) = \liminf_{r\to 0} \frac{\log \tau_r(x)}{-\log r} \quad\text{and}\quad \overline{R}(x) = \limsup_{r\to 0} \frac{\log \tau_r(x)}{-\log r}. \]
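To make the definition of the return time concrete, the following minimal Python sketch computes τ_r(x) by brute-force iteration. It is an illustration only and not part of the paper: the names `return_time`, `circle_dist` and `rotation`, the iteration cap `max_iter`, and the toy system (a circle rotation by the golden mean with the arc-length distance) are assumptions introduced here.

```python
import math

def return_time(step, dist, x, r, max_iter=10**6):
    """Smallest k >= 1 with dist(T^k x, x) < r, or None if none occurs
    within max_iter iterations (max_iter is a practical cap, not part
    of the mathematical definition)."""
    y = x
    for k in range(1, max_iter + 1):
        y = step(y)
        if dist(y, x) < r:
            return k
    return None

def circle_dist(a, b):
    """Arc-length distance on the circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

# Hypothetical toy system: rotation of the circle by the golden mean.
OMEGA = (math.sqrt(5.0) - 1.0) / 2.0

def rotation(x):
    return (x + OMEGA) % 1.0

if __name__ == "__main__":
    for r in (0.3, 0.1, 0.05):
        k = return_time(rotation, circle_dist, 0.0, r)
        # log(tau_r) / (-log r) is the quotient whose liminf and limsup
        # as r -> 0 define the lower and upper recurrence rates.
        print(r, k, math.log(k) / -math.log(r))
```

The quotient printed in the last column is only a finite-scale approximation; the recurrence rates themselves are defined through the limit r → 0.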
Hausdorff Dimension of Measures via Poincaré Recurrence
445
Furthermore, the lower and upper pointwise dimensions of µ at a point x ∈ X are given by
\[ \underline{d}_\mu(x) = \liminf_{r\to 0} \frac{\log \mu(B(x,r))}{\log r} \quad\text{and}\quad \overline{d}_\mu(x) = \limsup_{r\to 0} \frac{\log \mu(B(x,r))}{\log r}. \]
The following statement provides upper bounds for the lower and upper recurrence rates respectively in terms of the lower and upper pointwise dimensions.

Theorem 1. If T : X → X is a Borel measurable transformation on a measurable set $X \subset \mathbb{R}^d$ for some d ∈ N, and µ is a T-invariant probability measure on X, then
\[ \underline{R}(x) \le \underline{d}_\mu(x) \quad\text{and}\quad \overline{R}(x) \le \overline{d}_\mu(x) \tag{2} \]
for µ-almost every x ∈ X. It follows from Whitney's embedding theorem that if X is an arbitrary subset of a finite-dimensional smooth manifold, then it can be smoothly embedded into $\mathbb{R}^d$ for some d ∈ N, and thus Theorem 1 applies. Example 3 in Sect. 4 illustrates that the inequalities in (2) may be strict on a set of positive measure.

Boshernitzan proved in [3] that if the α-dimensional Hausdorff measure $m_\alpha$ is σ-finite on X (that is, if X can be written as a countable union of sets $X_i$ for i = 1, 2, . . . such that $m_\alpha(X_i) < \infty$ for all i), and µ is an invariant probability measure on X, then
\[ \liminf_{n\to\infty}\, [n^{1/\alpha} \cdot d(T^n x, x)] < \infty \]
for µ-almost every x ∈ X. He also showed that if, in addition, $m_\alpha(X) = 0$, then
\[ \liminf_{n\to\infty}\, [n^{1/\alpha} \cdot d(T^n x, x)] = 0 \tag{3} \]
for µ-almost every x ∈ X. Recall that the Hausdorff dimension of a probability measure µ on X is given by
\[ \dim_H \mu = \inf\{\dim_H Z : \mu(Z) = 1\}, \]
where $\dim_H Z$ denotes the Hausdorff dimension of the set Z. The measure µ is called exact dimensional if there exists a constant d such that
\[ \underline{d}_\mu(x) = \overline{d}_\mu(x) = d \quad\text{for µ-almost every } x \in X. \tag{4} \]
It follows from Young's criteria (see [13] for details) that if (4) holds, then $\dim_H \mu = d$. In our setting Boshernitzan's result can be reformulated in the following manner (for details, see Sect. 6 and in particular Lemma 4).

Theorem 2 ([3]). If T is a Borel measurable transformation on the separable metric space X, and µ is a T-invariant probability measure on X, then $\underline{R}(x) \le \dim_H \mu$ for µ-almost every x ∈ X.

We can also rephrase the first inequality in (2) in a form similar to (3).

Theorem 3. If T : X → X is a Borel measurable transformation on a measurable set $X \subset \mathbb{R}^d$ for some d ∈ N, and µ is a T-invariant probability measure on X, then (3) holds for µ-almost every x ∈ X such that $\underline{d}_\mu(x) < \alpha$.
446
L. Barreira, B. Saussol
Using Young's criteria (see [13]), one can show that $\dim_H \mu \ge \underline{d}_\mu(x)$ for µ-almost every x ∈ X. Therefore, Theorem 3 may in general provide a stronger statement than that in (3), and the first inequality in (2) may be sharper than that in Theorem 2. This possibility indeed occurs in the following example.

Example 1. In [10], Pesin and Weiss presented an example of a Hölder homeomorphism in a closed subset X of [0, 1], whose unique (and thus ergodic) measure of maximal entropy µ is not exact dimensional. More precisely, there exist disjoint sets $A_1, A_2 \subset [0,1]$ with positive µ-measure whose union is equal to X, and there exist positive constants $c_1$ and $c_2$ with $c_1 \ne c_2$ such that $\mu|A_i$ is exact dimensional and $\underline{d}_\mu(x) = \overline{d}_\mu(x) = c_i$ for µ-almost every $x \in A_i$ and i = 1, 2. Clearly $\dim_H \mu = \max\{c_1, c_2\}$ and thus $\underline{d}_\mu(x) < \dim_H \mu$ on a set of positive µ-measure (on the set $A_i$ with i such that $c_i = \min\{c_1, c_2\}$).

This example illustrates that in general Theorem 3 provides a stronger statement than that in (3). Therefore, one can see the first inequality in Theorem 1 as a non-trivial generalization of one of Boshernitzan's main results in [3]. Furthermore, we are able to give an estimate for the upper recurrence rate, and we shall see (in Sects. 3 and 4) that for several classes of maps and measures the inequalities in (2) are in fact identities on a full measure set. Therefore, the inequalities in Theorems 1 and 3 are optimal. Example 1 also illustrates that for an arbitrary transformation the functions $\underline{d}_\mu$ and $\overline{d}_\mu$ need not be invariant µ-almost everywhere.

Assume now that T is a Lipschitz map with Lipschitz inverse, and let c > 1 be a Lipschitz constant for T and $T^{-1}$. It is easy to verify that $\tau_{cr}(Tx) \le \tau_r(x) \le \tau_{r/c}(Tx)$ for every x ∈ X and every r > 0. Thus
\[ \underline{R}(Tx) = \underline{R}(x) \quad\text{and}\quad \overline{R}(Tx) = \overline{R}(x) \]
for every x ∈ X. Under the same assumption on T, one can also verify that for every x ∈ X we have
\[ \underline{d}_\mu(Tx) = \underline{d}_\mu(x) \quad\text{and}\quad \overline{d}_\mu(Tx) = \overline{d}_\mu(x). \]
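The invariance of the recurrence rates under a bi-Lipschitz map follows from the return-time inequalities $\tau_{cr}(Tx) \le \tau_r(x) \le \tau_{r/c}(Tx)$ by a short computation. The following is a sketch of that step for the lower rate, written out here for the reader's convenience rather than taken from the paper. For all sufficiently small r > 0, the first inequality gives
\[ \frac{\log \tau_{cr}(Tx)}{-\log (cr)} \cdot \frac{-\log (cr)}{-\log r} \;\le\; \frac{\log \tau_r(x)}{-\log r}, \qquad \frac{-\log (cr)}{-\log r} = 1 - \frac{\log c}{-\log r} \longrightarrow 1 \quad (r \to 0). \]
Taking the lower limit as r → 0, and noting that cr → 0 exactly when r → 0, one obtains $\underline{R}(Tx) \le \underline{R}(x)$. Applying the same argument to $\tau_r(x) \le \tau_{r/c}(Tx)$, with r/c in place of cr, gives $\underline{R}(x) \le \underline{R}(Tx)$. The case of the upper rate is identical with the upper limit in place of the lower limit.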
3. Coincidence of Recurrence Rate and Pointwise Dimension

3.1. Formulation of the main result. In this section we investigate conditions under which the inequalities in (2) become identities (see also Sect. 4). These conditions will be shown to hold for a large class of invariant measures. The return time of the point y ∈ B(x, r) into B(x, r) is defined by
\[ \tau_r(y, x) \stackrel{\text{def}}{=} \inf\{k > 0 : d(T^k y, x) < r\}. \tag{5} \]
One can easily verify that if d(x, y) < r then
\[ \tau_{4r}(y) \le \tau_{2r}(y, x) \le \tau_r(y). \tag{6} \]
For each x ∈ X and r, ε > 0, we consider the set
\[ A_\varepsilon(x, r) = \{y \in B(x,r) : \tau_r(y, x) \le \mu(B(x,r))^{-1+\varepsilon}\}. \]
We shall say that the measure µ has long return time (with respect to T) if
\[ \liminf_{r\to 0} \frac{\log \mu(A_\varepsilon(x,r))}{\log \mu(B(x,r))} > 1 \tag{7} \]
for µ-almost every x ∈ X and every sufficiently small ε > 0. The class of measures µ with long return time includes equilibrium measures supported on locally maximal hyperbolic sets (see Theorem 5 below). On the other hand, Example 2 below illustrates that a T-invariant measure µ may have long return time even if T is not uniformly hyperbolic. See also Sect. 3.2 for a discussion of the relation of the notion of long return time with return time statistics. The following is a considerably strengthened version of Theorem 1 for measures with long return time.

Theorem 4. Let T : X → X be a Borel measurable transformation on a measurable set $X \subset \mathbb{R}^d$ for some d ∈ N, and µ a T-invariant probability measure on X. If µ has long return time, and $\underline{d}_\mu(x) > 0$ for µ-almost every x ∈ X, then
\[ \underline{R}(x) = \underline{d}_\mu(x) \quad\text{and}\quad \overline{R}(x) = \overline{d}_\mu(x) \tag{8} \]
for µ-almost every x ∈ X. See Sects. 3.2 and 4 for applications of Theorem 4. Note that the recurrence rates $\underline{R}(x)$ and $\overline{R}(x)$ are essentially quantities of topological nature, which are defined independently of any measure. Therefore, the identities in (8) provide non-trivial relations between topological and measure-theoretic quantities.
3.2. Relation with return time statistics. We define the distribution of return time of T (with respect to µ) on the ball B(x, r) by
\[ F_{x,r}(t) \stackrel{\text{def}}{=} \frac{\mu(\{y \in B(x,r) : \tau_r(y,x) > t/\mu(B(x,r))\})}{\mu(B(x,r))} \]
for each t ≥ 0. In a variety of systems with some kind of hyperbolicity it has been established that $F_{x,r}(t) \to e^{-t}$ as r → 0, for µ-almost every x ∈ X. This behavior is known as the exponential statistic of return time, and is becoming an important ingredient in the analysis of recurrence in dynamical systems. This study is closely related to the above-introduced notion of long return time. This can be readily seen from the equation
\[ \mu(A_\varepsilon(x,r)) = [1 - F_{x,r}(\mu(B(x,r))^{\varepsilon})]\,\mu(B(x,r)) \le \mu(B(x,r))^{1+\varepsilon} + \mu(B(x,r)) \sup_{t \ge 0} |F_{x,r}(t) - e^{-t}|, \tag{9} \]
which holds for all sufficiently small r > 0. This implies the following criterion for long return time.

Proposition 1. Let T be a Borel measurable transformation on the separable metric space X, and µ a probability measure on X. Assume that for µ-almost every x ∈ X there exists γ = γ(x) > 0 such that $\sup\{|F_{x,r}(t) - e^{-t}| : t \ge 0\} \le \mu(B(x,r))^{\gamma}$ for all sufficiently small r > 0. Then µ has long return time.
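The inequality in (9), and the way it yields Proposition 1, rest on an elementary estimate. The following sketch spells it out; the shorthand $b = \mu(B(x,r))$, $s = \sup_{t \ge 0} |F_{x,r}(t) - e^{-t}|$ and $\kappa = \min\{\varepsilon, \gamma\}$ is introduced here for brevity and is not notation from the paper. Since $1 - e^{-t} \le t$ for t ≥ 0,
\[ 1 - F_{x,r}(t) = (1 - e^{-t}) + (e^{-t} - F_{x,r}(t)) \le t + s, \]
and taking $t = b^{\varepsilon}$ and multiplying by b gives $\mu(A_\varepsilon(x,r)) \le b^{1+\varepsilon} + b\,s$. Under the hypothesis of Proposition 1 we have $s \le b^{\gamma}$, hence for b < 1
\[ \mu(A_\varepsilon(x,r)) \le b^{1+\varepsilon} + b^{1+\gamma} \le 2\, b^{1+\kappa}, \qquad\text{so that}\qquad \frac{\log \mu(A_\varepsilon(x,r))}{\log b} \ge 1 + \kappa + \frac{\log 2}{\log b}, \]
where the direction of the last inequality uses log b < 0. Provided that $\mu(B(x,r)) \to 0$ as r → 0, the last term tends to zero and the lower limit of the left-hand side is at least 1 + κ > 1, which is condition (7).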
On the other hand, it is not necessary to have an exponential statistic of return time. For example, assume that for µ-almost every x ∈ X there exist γ = γ(x) > 0 and a function $F_x : [0, +\infty) \to [0, 1]$ which is Hölder continuous in an open neighborhood of 0 such that $\sup\{|F_{x,r}(t) - F_x(t)| : t \ge 0\} \le \mu(B(x,r))^{\gamma}$ for all sufficiently small r > 0. Note that one must have $F_x(0) = 1$. Then it follows from (9) that the measure µ has long return time. The following example illustrates that an invariant measure may have long return time even if the map is not hyperbolic.

Example 2. Let α ∈ (0, 1) and consider the map T : [0, 1] → [0, 1] defined by
\[ T(x) = \begin{cases} x(1 + 2^{\alpha} x^{\alpha}) & \text{if } x \in [0, 1/2], \\ 2x - 1 & \text{otherwise.} \end{cases} \]
Note that due to the presence of the neutral fixed point 0 (since $T'(0^+) = 1$), the map is not uniformly hyperbolic. We recall that there exists a unique invariant probability measure µ which is ergodic and absolutely continuous with respect to Lebesgue measure (see, for example, [6] for references). Since the set of points x ∈ [0, 1] such that $|T'(x)| > 1$ has full µ-measure, the measure µ is hyperbolic (see Sect. 4 for the definition). Denote by $a_n$ the left preimages of $a_0 = 1$. Let ξ be the countable partition of [0, 1] defined by $\xi = \{(a_{n+1}, a_n] : n \ge 0\}$. It is proved in [6] that T has exponential statistic of return time for cylinders of the partition ξ. More precisely, the following sharp estimate is established: there exists γ > 0 such that for µ-almost every x ∈ [0, 1] and all sufficiently large m ∈ N, we have
\[ \sup_{t \ge 0} \left| \frac{\mu(\{y \in \xi_m(x) : \tau_{\xi_m(x)}(y) > t/\mu(\xi_m(x))\})}{\mu(\xi_m(x))} - e^{-t} \right| \le \mu(\xi_m(x))^{\gamma}, \]
where $\xi_m = \bigvee_{k=0}^{m-1} T^{-k}\xi$, and $\tau_A(y) = \inf\{k \in \mathbb{N} : T^k y \in A\}$.
Proceeding as in (9) we conclude that for µ-almost every x ∈ [0, 1] we have
\[ \mu(\{y \in \xi_m(x) : \tau_{\xi_m(x)}(y) \le \mu(\xi_m(x))^{-1+\varepsilon}\}) \le 2\mu(\xi_m(x))^{1+\varepsilon} \]
for every ε ≤ γ and every sufficiently large m. We now present a simple argument showing that one can replace cylinders by balls in the last inequality. For each sufficiently small r > 0, it is possible to choose integers $m_r \ge n_r$ such that $\xi_{m_r}(x) \subset B(x,r) \subset \xi_{n_r}(x)$ with $m_r/n_r \to 1$ as r → 0. Since
\[ \tau_{\xi_{n_r}(x)}(y) \le \tau_r(y) \quad\text{and}\quad \mu(B(x,r)) \ge \mu(\xi_{m_r}(x)), \]
we obtain
\[ \mu(A_\varepsilon(x,r)) \le \mu(\{y \in \xi_{n_r}(x) : \tau_{\xi_{n_r}(x)}(y) \le \mu(\xi_{m_r}(x))^{-1+\varepsilon}\}). \]
In view of the inequalities
\[ \mu(\xi_{m_r}(x))^{-1+\varepsilon} \le \mu(\xi_{n_r}(x))^{-1+\varepsilon/2} \quad\text{and}\quad \mu(\xi_{n_r}(x)) \le \mu(B(x,r))^{1-\varepsilon/4}, \]
which are valid for all sufficiently small r, we conclude that for µ-almost every x ∈ [0, 1] we have
\[ \mu(A_\varepsilon(x,r)) \le 2\mu(\xi_{n_r}(x))^{1+\varepsilon/2} \le 2\mu(B(x,r))^{(1+\varepsilon/2)(1-\varepsilon/4)} \]
for all sufficiently small r. By taking ε > 0 sufficiently small, this inequality implies that the measure µ has long return time. It follows from Theorem 4 that the identities in (8) hold for Lebesgue almost every point. We remark that it is an open problem to decide whether all hyperbolic measures (not necessarily supported on uniformly hyperbolic sets; see Sect. 4) have long return time.
4. Hyperbolic Gibbs Measures

We now consider equilibrium measures supported on a locally maximal hyperbolic set of a $C^{1+\alpha}$ diffeomorphism. The following result shows that in this situation the identities in (8) hold on a set of full measure.

Theorem 5. Let X be a locally maximal hyperbolic set of a $C^{1+\alpha}$ diffeomorphism on a compact smooth manifold, for some α > 0. If µ is an ergodic equilibrium measure of a Hölder continuous potential on X then:
1. µ has long return time;
2. $\underline{R}(x) = \overline{R}(x) = \dim_H \mu$ for µ-almost every x ∈ X.

When µ is not ergodic one can consider the finite ergodic decomposition of µ on (relatively) open subsets $X_i \subset X$ such that $(X_i, T, \mu|X_i)$ is ergodic. Since each $X_i$ is (relatively) open it follows immediately from Theorem 5 that $\underline{R}(x) = \overline{R}(x) = \dim_H \mu|X_i$ for µ-almost every $x \in X_i$. The following example illustrates that for invariant measures which are not hyperbolic the second statement in Theorem 5 may not hold.

Example 3. Consider a rotation of the circle by an irrational number ω which is well approximable by rational numbers. We recall that ω is said to be well approximable by rational numbers if ν(ω) > 1, where ν(ω) is the supremum of all ν > 0 such that $|\omega - p/q| < 1/q^{\nu+1}$ for infinitely many relatively prime integers p and q. The unique invariant measure of the rotation is the Lebesgue measure m, which is clearly exact dimensional. Furthermore, it is easy to verify that if $0 < q_1 < q_2 < \cdots$ is a sequence of positive integers such that $|q_n \omega - p_n| < 1/q_n^{\nu}$ for some integer $p_n$, then
\[ \tau_{1/q_n^{\nu}}(x) = \inf\{k \in \mathbb{N} : k\omega \ (\mathrm{mod}\ 1) < 1/q_n^{\nu}\} \le q_n \]
for every x in the circle, and thus
\[ \underline{R}(x) \le \liminf_{n\to\infty} \frac{\log \tau_{1/q_n^{\nu}}(x)}{-\log(1/q_n^{\nu})} \le \frac{1}{\nu(\omega)} < 1 = \dim_H m. \]
Note that the Lebesgue measure is not hyperbolic in this example.
In view of Theorem 4, Example 3 also illustrates that an exact dimensional measure may not have long return time.

We now show that the above Theorems 4 and 5 can be seen as generalizations of work of Ornstein and Weiss for the measure-theoretic entropy. Let T : X → X be a measurable transformation (note that X need not be a metric space), and Z a measurable partition of X. Consider the partitions $Z_n = \bigvee_{k=0}^{n-1} T^{-k} Z$ for each n. We shall denote by $h_\mu(T, Z)$ the µ-entropy of T with respect to Z. Then Theorem 1 in [8] can be reformulated in the following manner.

Proposition 2 ([8]). Let T : X → X be a measurable transformation, Z a measurable partition of X, and µ an ergodic T-invariant probability measure on X. If we endow X with the (pseudo) metric $d_Z(x, y) = e^{-n}$, where n is the smallest positive integer such that $Z_n(x) \ne Z_n(y)$, then $\underline{R}(x) = \overline{R}(x) = h_\mu(T, Z)$ for µ-almost every x ∈ X.

We stress that with the special metric $d_Z$, the measure-theoretic entropy $h_\mu(T, Z)$ coincides with the Hausdorff dimension $\dim_H \mu$ of the measure µ. Theorems 4 and 5 provide versions of Proposition 2 in metric spaces which may have "non-homogeneous" distances.

Let now f : M → M be a diffeomorphism of a compact smooth manifold. Given x ∈ M and $v \in T_x M$ the Lyapunov exponent of v at x is defined by
\[ \lambda(x, v) = \limsup_{n\to\infty} \frac{1}{n} \log \| d_x f^n v \|. \]
The measure µ is said to be hyperbolic if there exists a set Y ⊂ M of full µ-measure such that λ(x, v) ≠ 0 for every x ∈ Y and every $v \in T_x M$. One should notice that the relation between Proposition 2 and Theorem 5 is similar to the relation between the Shannon–McMillan–Breiman theorem and the following statement established by Barreira, Pesin, and Schmeling.

Proposition 3 ([1]). If f is a $C^{1+\alpha}$ diffeomorphism on a compact smooth manifold M, for some α > 0, and µ is a hyperbolic f-invariant probability measure on M, then $\underline{d}_\mu(x) = \overline{d}_\mu(x)$ for µ-almost every x ∈ M.

The role of Proposition 3 in the dimension theory of dynamical systems is similar to the role of the Shannon–McMillan–Breiman theorem in the entropy theory. While the first ensures the coincidence of many characteristics of dimension type of the measure (such as the Hausdorff dimension, lower and upper box dimensions, and lower and upper information dimensions), the latter ensures the coincidence of various definitions of the entropy (such as those due to Kolmogorov and Sinai, Katok, Brin and Katok, and Pesin). See [1] for details. In a similar fashion, while Proposition 2 relates the measure-theoretic entropy with recurrence, Theorem 5 relates dimension-like characteristics with recurrence. Both results provide a non-trivial insight concerning the quantitative behavior of recurrence. A similar observation can be made about the hypotheses under which the results are established. Namely, the assumptions in Proposition 3 are known to be optimal (see [1, 9] for details), while the Shannon–McMillan–Breiman theorem only assumes the invariance of the probability measure. Similarly, while Proposition 2 only requires the measure to be ergodic, Theorem 5 requires more from the dynamical system and the invariant measure. Example 3 illustrates that the assumption that the measure is hyperbolic is essential in Theorem 5.
We believe that the other assumption in Theorem 5, concerning the regularity of the map, is also essential, although to the best of our knowledge no counterexample is known with weaker regularity. Moreover we would like to formulate the following plausible conjecture.

Conjecture. Let f be a $C^{1+\alpha}$ diffeomorphism on a compact smooth manifold M, for some α > 0. If µ is an ergodic hyperbolic f-invariant probability measure on M, then $\underline{R}(x) = \overline{R}(x) = \dim_H \mu$ for µ-almost every x ∈ M.

Theorem 5 establishes this statement when µ is supported on a uniformly hyperbolic set. By Theorem 1 and Proposition 3, observe that in order to establish the conjecture in the affirmative, one must show that $\underline{R}(x) \ge \dim_H \mu$ for µ-almost every x ∈ X. Our results motivate the introduction of a new method to compute the Hausdorff dimension of measures. More precisely, one can use Statement 2 in Theorem 5 to compute the Hausdorff dimension of an equilibrium measure µ supported on a locally maximal hyperbolic set of a $C^{1+\alpha}$ diffeomorphism. Namely, for µ-almost every point we have
\[ \lim_{r\to 0} \frac{\log \tau_r(x)}{-\log r} = \dim_H \mu. \]
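A numerical illustration of this identity may be helpful. The Python sketch below records the successive closest returns of an orbit to its starting point and fits a line through the points (log m, −log d), where m is a closest-return time and d the corresponding distance. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the least-squares fit, and the test system are all choices made here. Since log τ_r / (−log r) is the recurrence rate, the dimension corresponds to the reciprocal of the fitted slope in these coordinates. The test system is a circle rotation by the golden mean, chosen only because its closest-return times are the Fibonacci numbers and the expected value 1 is easy to check; a rotation is not hyperbolic, so Theorem 5 does not apply to it.

```python
import math

def best_returns(step, dist, x, n_iter):
    """Successive closest-return times m_1 < m_2 < ... of the orbit of x
    within the first n_iter iterates, with distances d(T^{m_k} x, x)."""
    times, dists = [], []
    y, best = x, math.inf
    for n in range(1, n_iter + 1):
        y = step(y)
        d = dist(y, x)
        if d < best:  # strictly closer than every earlier return
            best = d
            times.append(n)
            dists.append(d)
    return times, dists

def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((u - mx) ** 2 for u in xs)
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    return sxy / sxx

def dimension_estimate(times, dists):
    """Reciprocal of the fitted slope of the points (log m_k, -log d_k).
    Needs at least two returns with distinct times and nonzero distance."""
    pts = [(math.log(m), -math.log(d)) for m, d in zip(times, dists) if d > 0.0]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return 1.0 / least_squares_slope(xs, ys)

def circle_dist(a, b):
    """Arc-length distance on the circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

# Hypothetical toy system used only as a sanity check.
OMEGA = (math.sqrt(5.0) - 1.0) / 2.0

def rotation(x):
    return (x + OMEGA) % 1.0

if __name__ == "__main__":
    times, dists = best_returns(rotation, circle_dist, 0.0, 100)
    print(times)
    print(dimension_estimate(times, dists))
```

For a genuinely chaotic system, a finite-precision orbit and a finite number of iterates give only a rough estimate, and the fit should be restricted to the range of scales where the points are visibly aligned.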
Therefore, one can use the following algorithm:
1. choose "µ-randomly" a point x ∈ X;
2. iterate the point x and determine the successive "best" return times to a neighborhood of x, i.e., the smallest possible positive integers $m_1 < m_2 < \cdots$ such that $d(T^{m_1} x, x) > d(T^{m_2} x, x) > \cdots$;
3. plot the points $(\log m_n, -\log d(T^{m_n} x, x))$ in a plane;
4. estimate $\dim_H \mu$ from the asymptotic slope defined by these points.

5. Application to Suspension Flows

We assume that T : X → X is a bi-Lipschitz transformation on the separable metric space (X, d). Let ϕ : X → (0, ∞) be a Lipschitz function. Consider the space
\[ Y = \{(x, s) \in X \times \mathbb{R} : 0 \le s \le \varphi(x)\}, \]
with the points (x, ϕ(x)) and (Tx, 0) identified for each x ∈ X. The suspension flow over T with height function ϕ is the flow $\Psi = \{\psi_t\}_t$ on Y, where each transformation $\psi_t : Y \to Y$ is defined by $\psi_t(x, s) = (x, s + t)$. We equip the space Y with the Bowen–Walters distance $d_Y$ introduced in [4], and define the return time of a point y ∈ Y (with respect to the flow Ψ) into the open ball $B_Y(y, r)$ by
\[ \tau_r^{\Psi}(y) \stackrel{\text{def}}{=} \inf\{t > \rho_r(y) : \psi_t y \in B_Y(y, r)\} = \inf\{t > \rho_r(y) : d_Y(\psi_t y, y) < r\}, \]
where $\rho_r(y) = \inf\{t > 0 : \psi_t y \notin B_Y(y, r)\}$ is the escape time of y from the ball $B_Y(y, r)$. Observe that $\psi_t y \in B_Y(y, r)$ for all sufficiently small t, and thus we need to ensure that the orbit $\psi_t y$ has escaped from $B_Y(y, r)$ when defining $\tau_r^{\Psi}(y)$.
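We also define the lower and upper recurrence rates of y by
\[ \underline{R}^{\Psi}(y) = \liminf_{r\to 0} \frac{\log \tau_r^{\Psi}(y)}{-\log r} \quad\text{and}\quad \overline{R}^{\Psi}(y) = \limsup_{r\to 0} \frac{\log \tau_r^{\Psi}(y)}{-\log r}. \]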
Let µ be a T-invariant Borel probability measure in X. It is well known that µ induces a Ψ-invariant probability measure ν in Y such that
\[ \int_Y g \, d\nu = \frac{\int_X \int_0^{\varphi(x)} g(x, s) \, ds \, d\mu(x)}{\int_X \varphi \, d\mu} \]
for every continuous function g : Y → R (where ds refers to Lebesgue measure in the line), and that any Ψ-invariant measure ν in Y is of this form for some T-invariant Borel probability measure µ in X. Theorem 5 can be used to establish the following result for suspension flows.

Theorem 6. Let X be a locally maximal hyperbolic set of a $C^{1+\alpha}$ diffeomorphism T on a compact smooth manifold, for some α > 0, and µ an equilibrium measure of a Hölder continuous potential on X. If Ψ is a suspension flow over T|X then
\[ \underline{R}^{\Psi}(y) = \overline{R}^{\Psi}(y) = \dim_H \nu - 1 \]
for ν-almost every y ∈ Y.

6. Proofs

Following Federer [5], a measure µ is called diametrically regular if there exist constants η > 1 and c > 0 such that µ(B(x, ηr)) ≤ cµ(B(x, r)) for every x ∈ X and r > 0. Examples include equilibrium measures with a Hölder continuous potential for several classes of topologically mixing hyperbolic systems, namely subshifts of finite type, conformal expanding maps, surface axiom A diffeomorphisms, and, more generally, conformal axiom A diffeomorphisms. See [9] for full details. We shall say that a measure µ is weakly diametrically regular on a set Z ⊂ X if there is a constant η > 1 such that for µ-almost every x ∈ Z and every ε > 0, there exists δ > 0 such that if r < δ then
\[ \mu(B(x, \eta r)) \le \mu(B(x, r))\, r^{-\varepsilon}. \tag{10} \]
It is easy to verify that if µ is a weakly diametrically regular measure on a set Z, then for each fixed constant η > 1, there exists δ = δ(x, ε) > 0 for µ-almost every x ∈ Z and every ε > 0, such that (10) holds for every r < δ. Clearly, diametrically regular measures are weakly diametrically regular on X.

Lemma 1. Any Borel probability measure on $\mathbb{R}^d$ is weakly diametrically regular.

Proof of Lemma 1. Let µ be a Borel probability measure on $\mathbb{R}^d$. Clearly, it is sufficient to show that for µ-almost every $x \in \mathbb{R}^d$ we have
\[ \mu(B(x, 2^{-n})) \le n^2 \mu(B(x, 2^{-n-1})) \tag{11} \]
for all sufficiently large n ∈ N. For each n ∈ N and δ > 0 let
\[ K_n(\delta) \stackrel{\text{def}}{=} \{x \in \operatorname{supp} \mu : \mu(B(x, 2^{-n-1})) < \delta \mu(B(x, 2^{-n}))\}. \]
Taking a maximal $2^{-n-2}$-separated set $E \subset K_n(\delta)$ we obtain
\[ \mu(K_n(\delta)) \le \sum_{x \in E} \mu(B(x, 2^{-n-1})) \le \sum_{x \in E} \delta \mu(B(x, 2^{-n})). \]
Since E is $2^{-n-2}$-separated, there exists a constant M (depending only on d) such that E can be decomposed into the union $E = \bigcup_{i=1}^{M} E_i$ where each set $E_i$ is $2^{-n}$-separated. Thus for each i = 1, . . . , M the union $\bigcup_{x \in E_i} B(x, 2^{-n})$ is disjoint. Therefore
\[ \mu(K_n(\delta)) \le \sum_{i=1}^{M} \sum_{x \in E_i} \delta \mu(B(x, 2^{-n})) \le M\delta. \]
Since
\[ \sum_{n > 0} \mu(K_n(n^{-2})) \le M \sum_{n > 0} n^{-2} < \infty, \]
we conclude from the Borel–Cantelli lemma that (11) holds for µ-almost every x ∈ X and all sufficiently large n ∈ N. This completes the proof.

This shows that the class of weakly diametrically regular measures is very broad. In particular, due to Whitney's embedding theorem, this class contains all probability measures supported in a finite-dimensional smooth manifold. Further weakly diametrically regular measures on arbitrary metric spaces include any measure µ on a separable metric space X restricted to the set $\{x \in X : \underline{d}_\mu(x) = \overline{d}_\mu(x)\}$. This readily follows from the definition of pointwise dimension. Note that the property of $\mathbb{R}^d$ that we used in the proof of Lemma 1 is that the maximal cardinality, say M(r), of a $\frac{1}{4}r$-separated subset of balls of radius r is bounded by some constant M. Our proof readily extends to separable metric spaces X with the property that $M(r) = o(r^{-\varepsilon})$ for any ε > 0. In this case, instead of (11) one can show that for each δ > 0 we have $\mu(B(x, 2^{-n})) \le 2^{n\delta} \mu(B(x, 2^{-n-1}))$ for µ-almost every x ∈ X and all sufficiently large n ∈ N. This readily implies the weak regularity of µ. We now provide an example of very different nature.

Example 4. Let $\alpha \in (\frac{1}{2}, 1)$ and define the sequence $\beta_n = n^{\alpha}$. Consider the space of sequences $X = \{0, 1\}^{\mathbb{N}}$ and define a metric on X by requiring that $\operatorname{diam} C_n(x) = e^{-\beta_n}$, where $C_n(x)$ is any cylinder of length n. Consider also the Bernoulli measure µ on X such that $\mu(C_n(x)) = 2^{-n}$. One can easily verify that µ is weakly diametrically regular (by checking that any ball of radius r contains at most $r^{-\varepsilon(r)}$ balls of radius $\frac{1}{4}r$, where ε(r) → 0 as r → 0), and also that $\dim_H \mu = +\infty$. In particular X = supp µ cannot be smoothly embedded into $\mathbb{R}^d$ for any d ∈ N. The same measure µ may not be weakly diametrically regular if the sequence $\beta_n$ increases in a slower fashion. This is the case for example when $\beta_n = \log n$.

We continue with an auxiliary statement.
Lemma 2. Let µ be a finite Borel measure on the separable metric space X, and G ⊂ supp µ a measurable set. Given r > 0, there exists a countable set E ⊂ G such that:
1. B(x, r) ∩ B(y, r) = ∅ for any two distinct points x, y ∈ E;
2. $\mu(G \setminus \bigcup_{x \in E} B(x, 2r)) = 0$.

Proof of Lemma 2. The existence of the set E can be obtained using Zorn's lemma on the non-empty family of subsets of G which satisfy the first property, ordered by inclusion. Then the second property is satisfied for any maximal element. Since µ(B(x, r)) > 0 for each x ∈ E ⊂ supp µ, the set E is at most countable.

We shall call the set E in Lemma 2 a maximal r-separated set for G. We recall that for any a > 0 the following identities hold:
\[ \underline{d}_\mu(x) = \liminf_{n\to\infty} \frac{\log \mu(B(x, ae^{-n}))}{-n}, \qquad \overline{d}_\mu(x) = \limsup_{n\to\infty} \frac{\log \mu(B(x, ae^{-n}))}{-n}, \tag{12} \]
\[ \underline{R}(x) = \liminf_{n\to\infty} \frac{\log \tau_{ae^{-n}}(x)}{n}, \qquad \overline{R}(x) = \limsup_{n\to\infty} \frac{\log \tau_{ae^{-n}}(x)}{n}. \tag{13} \]
Lemma 3. Let T be a Borel measurable transformation on the separable metric space X, and µ a T-invariant probability measure on X. If µ is weakly diametrically regular on a measurable set Z ⊂ X with µ(Z) > 0, then (2) holds for µ-almost every x ∈ Z.

Proof of Lemma 3. Observe that the function δ(·, ε) in the definition of a weakly diametrically regular measure (see (10)) can be made measurable for each fixed ε. Fix ε > 0, and choose δ > 0 sufficiently small such that the set G = {x ∈ Z : δ(x, ε) > δ} has measure µ(G) > µ(Z) − ε. For any r, λ > 0 and x ∈ X consider the set
\[ A_{r,x} = \{y \in B(x, 4r) : \tau_{4r}(y, x) \ge \lambda^{-1} \mu(B(x, 4r))^{-1}\}, \]
where $\tau_{4r}(y, x)$ is defined in (5). Chebychev's inequality implies that
\[ \mu(A_{r,x}) \le \lambda \mu(B(x, 4r)) \int_{B(x,4r)} \tau_{4r}(y, x) \, d\mu(y). \]
Since µ is invariant, Kac's lemma tells us that
\[ \int_{B(x,4r)} \tau_{4r}(y, x) \, d\mu(y) = \mu(\{y \in X : \tau_{4r}(y, x) < \infty\}) \le 1. \]
Since B(x, 2r) ⊂ B(x, 4r), we obtain
\[ \mu(\{y \in B(x, 2r) : \tau_{4r}(y, x)\mu(B(x, 4r)) \ge \lambda^{-1}\}) \le \lambda \mu(B(x, 4r)). \]
Furthermore $\tau_{4r}(y, x)\mu(B(x, 4r)) \ge \tau_{8r}(y)\mu(B(y, 2r))$ whenever d(x, y) < 2r (see (6)), and thus
\[ \mu(\{y \in B(x, 2r) : \tau_{8r}(y)\mu(B(y, 2r)) \ge \lambda^{-1}\}) \le \lambda \mu(B(x, 4r)). \tag{14} \]
By Lemma 2 we can find an at most countable maximal r-separated set E ⊂ G. Using (14) with $\lambda = r^{2\varepsilon}$ and (10) with η = 4 (see also the discussion after (10)), we obtain
\[ D_\varepsilon(r) \stackrel{\text{def}}{=} \mu(\{y \in G : \tau_{8r}(y)\mu(B(y, 2r)) \ge r^{-2\varepsilon}\}) \le \sum_{x \in E} \mu(\{y \in B(x, 2r) : \tau_{8r}(y)\mu(B(y, 2r)) \ge r^{-2\varepsilon}\}) \le r^{2\varepsilon} \sum_{x \in E} \mu(B(x, 4r)) \le r^{\varepsilon} \sum_{x \in E} \mu(B(x, r)) \le r^{\varepsilon}. \]
We conclude that
\[ \sum_{n > -\log \delta} D_\varepsilon(e^{-n}) \le \sum_{n > -\log \delta} e^{-\varepsilon n} < \infty. \]
By the Borel–Cantelli lemma we find that for µ-almost any x ∈ G, we have
\[ \frac{\log \tau_{8e^{-n}}(x)}{n} \le 2\varepsilon + \frac{\log \mu(B(x, 2e^{-n}))}{-n} \]
for all sufficiently large n. The desired result now follows from the identities in (12) and (13), and the arbitrariness of ε.

In particular, if µ is an exact dimensional probability measure on a separable metric space X, then by Lemma 3 we have $\overline{R}(x) \le \dim_H \mu$ for µ-almost every x ∈ X. On the other hand, if µ is not exact dimensional then in general one can only show that $\underline{R}(x) \le \dim_H \mu$ for µ-almost every x ∈ X (see Theorem 2 and Example 1). We notice that as in Lemma 3, the condition that $X \subset \mathbb{R}^d$ in Theorem 4 may also be replaced by the hypothesis that µ is weakly diametrically regular on X. We now start establishing the results in the previous sections.

Proof of Theorem 1. The desired statement follows from Lemmas 1 and 3.
We continue with an auxiliary statement.

Lemma 4. Given x ∈ X, we have $\underline{R}(x) \le d$ if and only if for every ε > 0, we have
\[ \liminf_{n\to\infty}\, [n^{1/(d+\varepsilon)} d(T^n x, x)] = 0. \tag{15} \]

Proof of Lemma 4. Assume first that $\underline{R}(x) \le d$. Given ε > 0 there exists a sequence of numbers $r_n$ such that $r_n \to 0$, and $\tau_{r_n}(x) < r_n^{-(d+\varepsilon)}$ for all n. Let $m_n = \tau_{r_n}(x)$. If the sequence $m_n$ is bounded, then x is periodic and (15) holds. Assume now that $m_n$ is unbounded. Note that $d(T^{m_n} x, x) < r_n$ and
\[ m_n^{1/(d+2\varepsilon)} d(T^{m_n} x, x) < \tau_{r_n}(x)^{1/(d+2\varepsilon)} r_n < r_n^{-(d+\varepsilon)/(d+2\varepsilon)} r_n = r_n^{\varepsilon/(d+2\varepsilon)}. \]
Therefore
\[ \liminf_{n\to\infty}\, [n^{1/(d+2\varepsilon)} d(T^n x, x)] \le \lim_{n\to\infty}\, [m_n^{1/(d+2\varepsilon)} d(T^{m_n} x, x)] = 0. \]
This establishes (15) for each ε > 0. Assume now that (15) holds for every ε > 0. Setting $r_n = 2d(T^n x, x)$, we conclude that $\tau_{r_n}(x) \le n$, and it follows from (15) that
\[ \liminf_{n\to\infty}\, [\tau_{r_n}(x)^{1/(d+\varepsilon)} r_n] = 0. \]
Thus there exists a diverging sequence of positive integers $k_n$ such that $\tau_{r_{k_n}}(x)^{1/(d+\varepsilon)} r_{k_n} < 1$ for each n. Therefore
\[ \underline{R}(x) \le \liminf_{n\to\infty} \frac{\log \tau_{r_n}(x)}{-\log r_n} \le \lim_{n\to\infty} \frac{\log (r_{k_n}^{-(d+\varepsilon)})}{-\log r_{k_n}} = d + \varepsilon. \]
The arbitrariness of ε implies the desired result.
Proof of Theorem 2. We recall that the statement in Theorem 2 is a reformulation of a result of Boshernitzan. By the definition of the Hausdorff dimension of a measure, for any $\alpha > \dim_H \mu$ and all sufficiently small δ > 0 there exists a set Z ⊂ X of full µ-measure such that $\alpha > \dim_H Z > \dim_H \mu - \delta$, and hence $m_\alpha(Z) = 0$. It follows from (3) and Lemma 4 that $\underline{R}(x) \le \alpha$ for µ-almost every x ∈ Z. Letting $\alpha \to \dim_H Z$ and δ → 0 we obtain $\underline{R}(x) \le \dim_H \mu$ for µ-almost every x ∈ X.

Proof of Theorem 3. The desired statement follows from Theorem 1 and Lemma 4.

Proof of Theorem 4. By Theorem 1 we have $\underline{R}(x) \le \underline{d}_\mu(x)$ and $\overline{R}(x) \le \overline{d}_\mu(x)$ for µ-almost every x ∈ X. We shall now establish the reverse inequalities. By Lemma 1 the measure µ is weakly diametrically regular on X. Since µ has long return time (see (7)), µ is weakly diametrically regular on X (see (10)), and $\underline{d}_\mu(x) > 0$ for µ-almost every x ∈ X, if ε > 0 is sufficiently small we conclude that there exist numbers a, γ, ρ > 0 and a set G ⊂ X with µ(G) > 1 − ε such that if x ∈ G and r ∈ (0, ρ) then
\[ \mu(A_\varepsilon(x, 2r)) \le \mu(B(x, 2r))^{1+\gamma}, \tag{16} \]
\[ \mu(B(x, 2r)) \le \mu(B(x, r/2))\, r^{-a\gamma/2}, \tag{17} \]
\[ \mu(B(x, r)) \le r^{a}. \tag{18} \]
Consider the set
\[ A_\varepsilon(r) \stackrel{\text{def}}{=} \{y \in G : \tau_r(y) \le \mu(B(y, 3r))^{-1+\varepsilon}\}. \]
Whenever d(x, y) < r we have $\tau_r(y) \ge \tau_{2r}(y, x)$ (see (6)). Since B(x, 2r) ⊂ B(y, 3r), if x ∈ G then using (16), (17), and (18), we obtain
\[ \mu(B(x, r) \cap A_\varepsilon(r)) \le \mu(\{y \in B(x, r) : \tau_{2r}(y, x) \le \mu(B(x, 2r))^{-1+\varepsilon}\}) \le \mu(A_\varepsilon(x, 2r)) \le \mu(B(x, 2r))^{1+\gamma} \le \mu(B(x, r/2))\, r^{-a\gamma/2} (2r)^{a\gamma}. \]
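If E ⊂ G is a maximal $\frac{r}{2}$-separated set given by Lemma 2, then
\[ \mu(A_\varepsilon(r)) \le \sum_{x \in E} \mu(B(x, r) \cap A_\varepsilon(r)) \le \sum_{x \in E} \mu(B(x, r/2))\, r^{-a\gamma/2} (2r)^{a\gamma} \le 2^{a\gamma} r^{a\gamma/2}. \]
We conclude that
\[ \sum_{n=1}^{\infty} \mu(A_\varepsilon(e^{-n})) < \infty. \]
The Borel–Cantelli lemma implies that for µ-almost every x ∈ G we have $\tau_{e^{-n}}(x) > \mu(B(x, 3e^{-n}))^{-1+\varepsilon}$ for all sufficiently large n. The identities in (12) and (13) imply that
\[ \underline{R}(x) \ge (1 - \varepsilon)\underline{d}_\mu(x) \quad\text{and}\quad \overline{R}(x) \ge (1 - \varepsilon)\overline{d}_\mu(x) \]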
for µ-almost every x ∈ G. The desired statement follows from the arbitrariness of ε.

For a non-uniformly hyperbolic set X in the manifold M, and a Lyapunov regular point x ∈ X, let $\Phi_x : U_x \to M$ be the Lyapunov chart at x for some sufficiently small open neighborhood $U_x \subset \mathbb{R}^{\dim M}$ of 0, which satisfies $\Phi_x 0 = x$, and
\[ d_0 \Phi_x(\mathbb{R}^{\dim E^s(x)} \times \{0\}) = E^s(x) \quad\text{and}\quad d_0 \Phi_x(\{0\} \times \mathbb{R}^{\dim E^u(x)}) = E^u(x). \]
See the appendix to [1] for details. We denote by $D^s(x, r) \subset \mathbb{R}^{\dim E^s(x)}$ and $D^u(x, r) \subset \mathbb{R}^{\dim E^u(x)}$ the balls of radius r at the origin. We also denote by $B^s(x, r) \subset V^s(x)$ and $B^u(x, r) \subset V^u(x)$ the balls of radius r centered at the point x, with respect to the distances induced in the local stable and unstable manifolds $V^s(x)$ and $V^u(x)$. In [7], Ledrappier and Young constructed two measurable partitions $\xi^s$ and $\xi^u$ of M such that for µ-almost every x ∈ M we have:
1. $\xi^s(x) \subset V^s(x)$ and $\xi^u(x) \subset V^u(x)$;
2. $\xi^s(x) \supset V^s(x) \cap B(x, \gamma)$ and $\xi^u(x) \supset V^u(x) \cap B(x, \gamma)$ for some γ = γ(x) > 0.

We denote by $\mu^s_x$ and $\mu^u_x$ the conditional measures associated respectively to the partitions $\xi^s$ and $\xi^u$. Recall that any measurable partition ξ of M has associated a family of conditional measures: for µ-almost every x ∈ M there exists a probability measure $\mu_x$ defined on the element ξ(x) of ξ containing x. The conditional measures are characterized completely by the following property: if $\mathcal{B}_\xi$ is the σ-subalgebra of the Borel σ-algebra generated by unions of elements of ξ then for each Borel set A ⊂ M, the function $x \mapsto \mu_x(A \cap \xi(x))$ is $\mathcal{B}_\xi$-measurable and
\[ \mu(A) = \int_A \mu_x(A \cap \xi(x)) \, d\mu. \]
We will later need the following result concerning the product structure of hyperbolic measures. Instead of formulating the result in all its generality we state it in a form adapted to our purposes.
Proposition 4 ([12], after [1]). Let X be a locally maximal hyperbolic set of a $C^{1+\alpha}$ diffeomorphism f on a compact smooth manifold, for some α > 0. If µ is an equilibrium measure of a Hölder continuous potential on X and a, b, c > 0, then for µ-almost every x ∈ X, there exists ε(r) > 0 for each r > 0 such that ε(r) → 0 as r → 0 and
\[ r^{\varepsilon(r)} \le \frac{\mu^s_x(B^s(x, r^a))\, \mu^u_x(B^u(x, r^b))}{\mu(\Phi_x(D^s(x, cr^a) \times D^u(x, cr^b)))} \le r^{-\varepsilon(r)} \]
for all sufficiently small r > 0.

Proof of Proposition 4. We use the notations of [12]. Take a Lyapunov regular point x, and let $\chi_1, \ldots, \chi_d$ be the values of the Lyapunov exponent at x. Setting $a_i = -a$ for $\chi_i < 0$, and $a_i = -b$ for $\chi_i > 0$ we conclude that for any r sufficiently small there exist n(r), m(r) ∈ N such that
\[ R_{m(r)}(x) \subset \Phi_x(D^s(x, r^a) \times D^u(x, r^b)) \subset R_{n(r)}(x) \]
and m(r)/n(r) → 1 as r → 0. The desired result now follows immediately from Theorem 3.9 in [12].

We define the return time of a set A into itself by
\[ \tau(A) = \inf\{n \in \mathbb{N} : T^n A \cap A \ne \emptyset\}. \]
Given a partition ξ of X we consider a new partition $\xi_n = \bigvee_{k=0}^{n-1} T^{-k}\xi$ for each n. Saussol, Troubetzkoy, and Vaienti show in [11] that the first return time of an element of $\xi_n$ is typically large.

Proposition 5 ([11]). Let T : X → X be a measurable transformation preserving an ergodic probability measure µ. If ξ is a finite or countable measurable partition with entropy $h_\mu(T, \xi) > 0$ then
\[ \liminf_{n\to\infty} \frac{\tau(\xi_n(x))}{n} \ge 1 \]
for µ-almost every x ∈ X. Using this result, it can be shown that the first return time of a ball is also typically large. Lemma 5. Let T : X → X be a Lipschitz map with Lipschitz constant L > 1 on a compact metric space X. If µ is an ergodic T -invariant Borel probability measure with entropy hµ (T ) > 0, then 1 τ (B(x, r)) ≥ − log r log L r→0 lim
for µ-almost every x ∈ X.
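The return time $\tau(A)$ used above can be illustrated on a toy system. The following is only a sketch: the finite rotation `shift` on $\mathbb{Z}/12\mathbb{Z}$ and the helper name `return_time` are hypothetical choices made for illustration and do not come from the paper.

```python
def return_time(T, A, max_iter=10_000):
    """Smallest n >= 1 such that T^n(A) meets A, or None if none is found.

    T is a map on a finite state space, A is an iterable of states.
    """
    A = frozenset(A)
    image = set(A)
    for n in range(1, max_iter + 1):
        # Push the current image one step forward under T.
        image = {T(x) for x in image}
        if image & A:
            return n
    return None


# Hypothetical toy map: rotation by 5 on the cyclic group Z/12Z.
N = 12
shift = lambda x: (x + 5) % N

tau_point = return_time(shift, {0})      # return time of a single point
tau_pair = return_time(shift, {0, 1})    # return time of a larger set
```

Enlarging the set can only shorten the return time, which is the monotonicity used when passing from partition elements to balls.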
Proof of Lemma 5. We claim that for each $n > 0$ there exists a partition $\zeta_n$ of $X$ with diameter $\operatorname{diam} \zeta_n \le 2^{-n}$, such that if $r > 0$, then

$$\mu(\{x \in X : d(x, X \setminus \zeta_n(x)) < r\}) < c_n r, \tag{19}$$

where $\zeta_n(x)$ is the atom of $\zeta_n$ containing $x$, and $c_n$ is some positive constant depending only on $n$. Take a finite set $E_n \subset X$ such that $\bigcup_{x \in E_n} B(x, 2^{-n-1}) = X$. For each $x \in E_n$, we can find a sequence of real numbers $r_k = r_k(x) \in (2^{-n-1}, 2^{-n})$ satisfying $|r_{k+1} - r_k| \le 2^{-k-1}$ and

$$\mu(B(x, r_k + 2^{-k-1}) \setminus B(x, r_k - 2^{-k-1})) \le 2^{-k}.$$

Take $r(x) = \lim_{k \to \infty} r_k$ and consider the set

$$A_k = \{y \in X : r(x) - 2^{-k} \le d(x, y) \le r(x) + 2^{-k} \text{ for some } x \in E_n\}$$

for each $k \in \mathbb{N}$. Note that we obtain a cover of $X$ satisfying $\mu(A_k) \le 2^{-k+1} \operatorname{card} E_n$. Writing $E_n = \{x_1, \ldots, x_p\}$ we set

$$B_1 = B(x_1, r(x_1)) \quad \text{and} \quad B_\ell = B(x_\ell, r(x_\ell)) \setminus \bigcup_{j=1}^{\ell-1} B_j$$

for $\ell = 2, \ldots, p$. Then the partition $\zeta_n = \{B_1, B_2, \ldots, B_p\}$ has the desired properties, with $c_n = \operatorname{card} E_n$.

Since $\operatorname{diam} \zeta_n \to 0$ as $n \to \infty$, there exists $n_* \in \mathbb{N}$ such that $h_\mu(T, \zeta_{n_*}) > h_\mu(T)/2 > 0$. Let $Z = \zeta_{n_*}$, $C = c_{n_*}$ and denote by $Z_m(x)$ the unique element of $\bigvee_{k=0}^{m-1} T^{-k} Z$ which contains $x \in X$.

Fix $\sigma < 1/L < 1$. Clearly, if $d(x, X \setminus Z_m(x)) < \sigma^m$, then $d(T^k x, X \setminus Z_1(T^k x)) < \sigma^m L^k$ for some $k < m$. It follows from (19) that

$$\mu(\{x \in X : d(x, X \setminus Z_m(x)) < \sigma^m\}) \le m \max\{\mu(\{x : d(T^k x, X \setminus Z_1(T^k x)) < \sigma^m L^k\}) : k = 0, \ldots, m-1\} \le C m (\sigma L)^m,$$

using the invariance of the measure. By the Borel–Cantelli lemma we conclude that for $\mu$-almost every $x \in X$ we have $B(x, \sigma^m) \subset Z_m(x)$ for all sufficiently large $m$. By Proposition 5, for $\mu$-almost every $x \in X$ we have

$$1 \le \liminf_{m \to \infty} \frac{\tau(Z_m(x))}{m} \le -\log \sigma \liminf_{m \to \infty} \frac{\tau(B(x, \sigma^m))}{-m \log \sigma} = -\log \sigma \liminf_{r \to 0} \frac{\tau(B(x, r))}{-\log r}.$$

Since $\sigma$ can be made arbitrarily close to $1/L$, we obtain the desired statement.
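Proof of Theorem 5. Let $\mathcal{Z}$ be a Markov partition, and consider the new partitions $\mathcal{Z}_m^n = \bigvee_{k=m}^{n} f^{-k} \mathcal{Z}$ whenever $m \le n$. There exist constants $c > 0$ and $\lambda > 0$ such that, for any cylinder $Z_m^n \in \mathcal{Z}_m^n$,

$$\operatorname{diam}_s Z_m^n \le c e^{+\lambda m} \quad \text{and} \quad \operatorname{diam}_u Z_m^n \le c e^{-\lambda n}, \tag{20}$$

where $\operatorname{diam}_s$ and $\operatorname{diam}_u$ denote the diameters along the stable and unstable manifolds. For $\mu$-almost every $x \in X$, we have $\underline{d}_\mu(x) = \overline{d}_\mu(x) = d > 0$ (see Proposition 3) and, by Lemma 5, there exist $\delta > 0$ and $\rho > 0$ such that

$$B(x, r) \cap f^{-k} B(x, r) = \emptyset$$

for every $r < \rho$ and $k \le t_r \overset{\text{def}}{=} -\delta \log r$.

Let $r < \rho$ and write $B = B(x, r)$. There exists a countable collection of points $\{x_a\}_{a \in A} \subset B$ and integers $m_a < 0 < n_a$ for each $a \in A$ such that if $Z_{m_a}^{n_a}(x_a) \in \mathcal{Z}_{m_a}^{n_a}$ denotes the unique element containing $x_a$, then

$$B = \bigcup_{a \in A} Z_{m_a}^{n_a}(x_a) \pmod 0,$$

in such a way that the union is disjoint (mod 0). We may also assume, without loss of generality, that $\min\{-m_a, n_a\} \ge t_r$. Let $k \ge t_r$ and write $p = [k/2] > 0$ and $q = p + 1 - k < 0$. We have

$$B \subset \bigcup_{a \in A} Z_{m_a}^{p}(x_a) \overset{\text{def}}{=} Z_{-\infty}^{p}(x, r) \quad \text{and} \quad B \subset \bigcup_{b \in A} Z_{q}^{n_b}(x_b) \overset{\text{def}}{=} Z_{q}^{+\infty}(x, r).$$

One can find sets $U \subset A$ and $V \subset A$ such that

$$Z_{-\infty}^{p}(x, r) = \bigcup_{a \in U} Z_{m_a}^{p}(x_a) \quad \text{and} \quad Z_{q}^{+\infty}(x, r) = \bigcup_{b \in V} Z_{q}^{n_b}(x_b),$$

with the two unions being disjoint (mod 0). Since

$$f^{-k} Z_{q}^{n_b}(x_b) = Z_{p+1}^{n_b + k}(f^{-k} x_b)$$

we obtain

$$B \cap f^{-k} B \subset \bigcup_{a \in U,\, b \in V} \left[ Z_{m_a}^{p}(x_a) \cap Z_{p+1}^{n_b + k}(f^{-k} x_b) \right].$$

The Gibbs property of the measure implies that there exists a constant $\kappa > 0$ such that

$$\begin{aligned}
\mu(B \cap f^{-k} B) &\le \sum_{a \in U,\, b \in V} \mu\big(Z_{m_a}^{p}(x_a) \cap Z_{p+1}^{n_b + k}(f^{-k} x_b)\big) \\
&\le \kappa \sum_{a \in U,\, b \in V} \mu\big(Z_{m_a}^{p}(x_a)\big)\, \mu\big(Z_{p+1}^{n_b + k}(f^{-k} x_b)\big) \\
&\le \kappa \sum_{a \in U,\, b \in V} \mu\big(Z_{m_a}^{p}(x_a)\big)\, \mu\big(Z_{q}^{n_b}(x_b)\big) \\
&\le \kappa\, \mu\big(Z_{-\infty}^{p}(x, r)\big)\, \mu\big(Z_{q}^{+\infty}(x, r)\big).
\end{aligned} \tag{21}$$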
Setting

$$r_k = \max\{\operatorname{diam}_s Z_{q}^{+\infty}(x, r),\ \operatorname{diam}_u Z_{-\infty}^{p}(x, r)\}$$

it follows from (20) that $r_k \le r + c e^{-\lambda k/2}$. Furthermore, for $\mu$-almost every $x \in X$ (in fact for all Lyapunov regular points) we have

$$Z_{-\infty}^{p}(x, r) \subset\ <_x(D^s(x, 2r) \times D^u(x, 2r_k)) \tag{22}$$

and

$$Z_{q}^{+\infty}(x, r) \subset\ <_x(D^s(x, 2r_k) \times D^u(x, 2r)) \tag{23}$$

for all sufficiently small $r > 0$. Since $k \ge t_r$, we have $r_k \le r + c r^{\lambda\delta/2} \le r^g$, where $g = \lambda\delta/3 > 0$ (provided that $r$ is sufficiently small). Proposition 4 together with (21), (22), and (23), and the main theorem in [1] yield

$$\begin{aligned}
\mu(B \cap f^{-k} B) &\le \kappa\, \mu(<_x(D^s(x, 2r) \times D^u(x, 2r^g))) \times \mu(<_x(D^s(x, 2r^g) \times D^u(x, 2r))) \\
&\le \kappa r^{-2\varepsilon(r)} \mu^s_x(B^s(x, r))\, \mu^u_x(B^u(x, r^g)) \times \mu^s_x(B^s(x, r^g))\, \mu^u_x(B^u(x, r)) \\
&\le \kappa r^{-4\varepsilon(r)} \mu(B(x, r))\, \mu(B(x, r^g)) \\
&\le \kappa\, \mu(B)\, r^{-5\varepsilon(r)} r^{gd},
\end{aligned}$$

where $\varepsilon(r) > 0$ and $\varepsilon(r) \to 0$ as $r \to 0$. Set now

$$k_r = \frac{2}{\lambda}(\log c - \log r) + 1.$$

If $k \ge k_r$ then $r_k \le 2r$, and a similar argument establishes that

$$\mu(B \cap f^{-k} B) \le \kappa\, \mu(B)\, r^{-5\varepsilon(r)} r^{d}.$$

Then for any $s_r > 0$, summing the estimate above as $k$ runs from $t_r$ through $s_r$ we obtain

$$\frac{\mu(\{y \in B(x, r) : \tau_r(y, x) \le s_r\})}{\mu(B(x, r))} \le \sum_{k = t_r}^{k_r - 1} \kappa r^{-5\varepsilon(r)} r^{gd} + \sum_{k = k_r}^{s_r} \kappa r^{-5\varepsilon(r)} r^{d} \le \kappa r^{-5\varepsilon(r)} (k_r r^{gd} + s_r r^{d}).$$

By rechoosing $\varepsilon(r)$ if necessary, we may assume that $\mu(B(x, r)) \ge r^{d + \varepsilon(r)}$ for all sufficiently small $r$. Setting $s_r = \mu(B(x, r))^{-1+\varepsilon}$ we obtain

$$\frac{\mu(\{y \in B(x, r) : \tau_r(y, x) \le s_r\})}{\mu(B(x, r))} \le \kappa r^{-5\varepsilon(r)} \left[ (-\delta \log r)\, r^{gd} + r^{(d + \varepsilon(r))(-1+\varepsilon)} r^{d} \right],$$

provided that $\varepsilon$ is sufficiently small. Since $gd > 0$, $\varepsilon d > 0$ and $\varepsilon(r) \to 0$ as $r \to 0$ we conclude that (7) holds. Since $\varepsilon$ is arbitrarily small, the measure $\mu$ has long return time. The second statement is now an immediate consequence of Theorem 4 and Proposition 3, together with Young's criteria (see [13]).
Proof of Theorem 6. It follows immediately from Proposition 17 in [2] that there exists a constant $c > 1$ such that if $y = (x, s) \in Y \setminus (X \times \{0\})$ and $r > 0$ is sufficiently small (possibly depending on $x$) then

$$B(x, r/c) \times I(s, r/c) \subset B_Y(y, r) \subset B(x, cr) \times I(s, cr), \tag{24}$$

where $I(s, r) = (s - r, s + r) \subset \mathbb{R}$. Therefore

$$\mu(B(x, r/c))\, 2r/c \le \nu(B_Y(y, r)) \le \mu(B(x, cr))\, 2cr$$

for all sufficiently small $r$, and thus

$$\underline{d}_\nu(y) = \underline{d}_\mu(x) + 1 \quad \text{and} \quad \overline{d}_\nu(y) = \overline{d}_\mu(x) + 1.$$

It follows from Proposition 3 that $\underline{d}_\nu(y) = \overline{d}_\nu(y) = \dim_H \mu + 1$ for $\nu$-almost every $y \in Y$, and applying Young's criteria (see [13]) we obtain $\dim_H \nu = \dim_H \mu + 1$. By (24) we obtain

$$\tau_r^{\Psi}(y) \le \inf\{t > \rho_r(y) : \psi_t y \in B(x, r/c) \times \{s\}\} + r/c,$$
$$\tau_r^{\Psi}(y) \ge \inf\{t > \rho_r(y) : \psi_t y \in B(x, cr) \times \{s\}\} - cr,$$

and hence

$$\sum_{k=0}^{\tau_{cr}(x) - 1} \varphi(T^k x) - cr \le \tau_r^{\Psi}(y) \le \sum_{k=0}^{\tau_{r/c}(x) - 1} \varphi(T^k x) + r/c \tag{25}$$

for all sufficiently small $r > 0$ (possibly depending on $x$). By Theorem 5, given $\varepsilon > 0$ we have $r^{-d+\varepsilon} < \tau_r(x) < r^{-d-\varepsilon}$ for all sufficiently small $r$, where $d = \dim_H \mu$. By (25) and the ergodicity of $\mu$, we obtain

$$(cr)^{-d+\varepsilon} \left( \int_X \varphi \, d\mu - \varepsilon \right) - cr \le \tau_r^{\Psi}(y) \le (r/c)^{-d-\varepsilon} \left( \int_X \varphi \, d\mu + \varepsilon \right) + r/c$$

for all sufficiently small $r$. Therefore

$$\underline{R}^{\Psi}(y) = \overline{R}^{\Psi}(y) = \lim_{r \to 0} \frac{\log \tau_r^{\Psi}(y)}{-\log r} = d = \dim_H \mu$$

for $\nu$-almost every $y \in Y$. This completes the proof of the theorem.
Acknowledgement. We gratefully thank Joerg Schmeling for several enlightening discussions. We also thank the referee for the detailed and helpful comments.
References
1. Barreira, L., Pesin, Ya. and Schmeling, J.: Dimension and product structure of hyperbolic measures. Ann. of Math. 149 (2), 755–783 (1999)
2. Barreira, L. and Saussol, B.: Multifractal analysis of hyperbolic flows. Commun. Math. Phys. 214, 339–371 (2000)
3. Boshernitzan, M.: Quantitative recurrence results. Invent. Math. 113, 617–631 (1993)
4. Bowen, R. and Walters, P.: Expansive one-parameter flows. J. Differential Equations 12, 180–193 (1972)
5. Federer, H.: Geometric measure theory. Berlin–Heidelberg–New York: Springer, 1969
6. Hirata, M., Saussol, B. and Vaienti, S.: Statistics of return times: A general framework and new applications. Commun. Math. Phys. 206, 33–55 (1999)
7. Ledrappier, F. and Young, L.-S.: The metric entropy of diffeomorphisms. Part II: Relations between entropy, exponents and dimension. Ann. of Math. 122 (2), 540–574 (1985)
8. Ornstein, D. and Weiss, B.: Entropy and data compression schemes. IEEE Trans. Inform. Theory 39, 78–83 (1993)
9. Pesin, Ya.: Dimension theory in dynamical systems: Contemporary views and applications. Chicago Lectures in Mathematics, Chicago: Chicago University Press, 1997
10. Pesin, Ya. and Weiss, H.: On the dimension of deterministic and random Cantor-like sets, symbolic dynamics, and the Eckmann–Ruelle conjecture. Commun. Math. Phys. 182, 105–153 (1996)
11. Saussol, B., Troubetzkoy, S. and Vaienti, S.: Recurrence, dimensions and Lyapunov exponents. In preparation
12. Schmeling, J. and Troubetzkoy, S.: Scaling properties of hyperbolic measures. DANSE Preprint 50/98, 1998
13. Young, L.-S.: Dimension, entropy and Lyapunov exponents. Ergodic Theory Dynam. Systems 2, 109–124 (1982)

Communicated by J. L. Lebowitz
Commun. Math. Phys. 219, 465 – 480 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Time Quasi-Periodic Unbounded Perturbations of Schrödinger Operators and KAM Methods

Dario Bambusi 1, Sandro Graffi 2

1 Dipartimento di Matematica “F. Enriques”, Università di Milano, Via Saldini 50, 20133 Milano, Italy. E-mail: [email protected]
2 Dipartimento di Matematica, Università di Bologna, Piazza di Porta S Donato 5, 40127 Bologna, Italy. E-mail: [email protected]

Received: 3 October 2000 / Accepted: 20 December 2000
Abstract: We eliminate by KAM methods the time dependence in a class of linear differential equations in $\ell^2$ subject to an unbounded, quasi-periodic forcing. This entails the pure-point nature of the Floquet spectrum of the operator $H_0 + \varepsilon P(\omega t)$ for $\varepsilon$ small. Here $H_0$ is the one-dimensional Schrödinger operator $p^2 + V$, $V(x) \sim |x|^\alpha$, $\alpha > 2$ for $|x| \to \infty$, the time quasi-periodic perturbation $P$ may grow as $|x|^\beta$, $\beta < (\alpha - 2)/2$, and the frequency vector $\omega$ is non resonant. The proof extends to infinite dimensional spaces the result valid for quasiperiodically forced linear differential equations and is based on Kuksin’s estimate of solutions of homological equations with non-constant coefficients.

1. Introduction and Statement of the Results

Consider the non-autonomous, linear differential equation in a separable Hilbert space $\mathcal{H}$

$$i\dot\psi(t) = (A + \varepsilon P(\omega_1 t, \omega_2 t, \ldots, \omega_n t))\psi(t), \qquad \psi(t) \in \mathcal{H},\ \varepsilon \in \mathbb{R} \tag{1.1}$$

under the following conditions:

A1 The operator $A$ is positive self-adjoint. $\operatorname{Spec}(A)$ is discrete, and all eigenvalues $0 < \lambda_1 < \lambda_2 < \lambda_3 < \ldots$ are simple. There is $d > 1$ such that

$$\lambda_i \sim i^d, \qquad i \to \infty. \tag{1.2}$$

A2 $P(\phi_1, \ldots, \phi_n) \equiv P(\phi)$ is a function from the $n$-dimensional torus $\mathbb{T}^n \equiv \mathbb{R}^n / 2\pi\mathbb{Z}^n$ into the symmetric operators in $\mathcal{H}$; $\omega := (\omega_1, \ldots, \omega_n) \in [0, 1]^n$ is a frequency vector.

A3 For $\delta \ge 0$, denote by $\mathcal{B}^\delta$ the Banach space of all closed operators $T$ in $\mathcal{H}$ such that $A^{-\delta/d} T$ is bounded (remark that $\mathcal{B}^0 = L(\mathcal{H})$), with norm

$$\|T\|_\delta := \sup_{\|x\|_{\mathcal{H}} = 1} \|A^{-\delta/d} T x\|_{\mathcal{H}}. \tag{1.3}$$
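Then the map $\mathbb{T}^n \ni \phi \mapsto P(\phi) \in \mathcal{B}^\delta$ is analytic for some $\delta < d - 1$.

Our purpose is to prove the following

Theorem 1.1. There exist $\varepsilon_* > 0$, a subset $\Pi_* \subset \Pi := [0, 1]^n$ and, if $|\varepsilon| < \varepsilon_*$ and $\omega \in \Pi_*$, a unitary operator $U(\omega t) \equiv U(\omega_1 t, \omega_2 t, \ldots, \omega_n t)$ in $\mathcal{H}$ with the following properties:

T1 $U(\omega t)$ is analytic in $t$ and quasiperiodic with frequencies $\omega$;

T2 $U(\omega t)$ transforms Eq. (1.1) into a system of the form

$$i\dot\chi(t) = A_\infty(\omega t)\chi(t), \tag{1.4}$$
$$A_\infty := \operatorname{diag}(\lambda_1^\infty + \mu_1^\infty(\omega t),\ \lambda_2^\infty + \mu_2^\infty(\omega t),\ \lambda_3^\infty + \mu_3^\infty(\omega t), \ldots). \tag{1.5}$$

Here $\{\lambda_i^\infty\}_{i=1}^\infty \subset \mathbb{R}$ and any function $\mu_i^\infty(\phi) : \mathbb{T}^n \to \mathbb{R}$ is analytic with zero average;

T3 There exists $C > 0$ such that:

$$\|1 - U(\omega t)\|_0 \le C\varepsilon, \qquad |\lambda_i^\infty - \lambda_i| \le C\varepsilon i^\delta, \qquad |\mu_i^\infty(\omega t)| \le C\varepsilon i^\delta, \qquad |\Pi - \Pi_*| \to 0 \ \text{as}\ \varepsilon \to 0.$$

Straightforward integration of (1.4) reduces (1.1) to an autonomous system which makes the almost-periodic nature of all its solutions evident.

Corollary 1.1.
1. If $|\varepsilon| < \varepsilon_*$, $\omega \in \Pi_*$ there exists a unitary transformation $U_F(\omega t)$, quasiperiodic with frequency $\omega$ and such that $\|1 - U_F(\omega t)\|_\delta \le C\varepsilon$, which transforms (1.1) into the system

$$i\dot x = A_F x, \qquad A_F := \operatorname{diag}(\lambda_1^\infty, \lambda_2^\infty, \lambda_3^\infty, \ldots); \tag{1.6}$$

2. For any initial datum $\psi_0$ the solution $\psi(t)$ of (1.1) is almost-periodic with frequencies $2\pi/\lambda_1^\infty, 2\pi/\lambda_2^\infty, \ldots; \omega_1, \ldots, \omega_n$, i.e. has the form

$$\psi(t) = \sum_{i=0}^{\infty} \phi_i^0(\omega t)\, e^{i\lambda_i^\infty t}, \tag{1.7}$$

where $\{\phi_i^0(\omega t)\}_{i=1}^\infty$ are the components of $U(\omega t)\psi_0$ along the eigenvector basis of $A$.

The above result can be equivalently formulated in terms of Floquet spectrum ([21], and [12] for the quasi-periodic case). Consider indeed on $\mathcal{K} := \mathcal{H} \otimes L^2(\mathbb{T}^n)$ the Floquet Hamiltonian operator

$$K_F := -i \sum_{l=1}^{n} \omega_l \frac{\partial}{\partial \phi_l} + A + \varepsilon P(\phi). \tag{1.8}$$

The maximal operator in $\mathcal{K}$ generated by the differential expression (1.8), still denoted $K_F$, is self-adjoint by A3, which makes $A + \varepsilon P(\omega t)$ self-adjoint on $D(A)$ for all $t$. Then:

Corollary 1.2. For $|\varepsilon| \le \varepsilon_*$ and $\omega \in \Pi_*$ the spectrum of $K_F$ is pure point; its eigenvalues are $\nu_{j,k} := \lambda_j^\infty + k \cdot \omega$, $j = 0, 1, 2, \ldots$, $k \in \mathbb{Z}^n$.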
Remark 1.
1. This corollary extends to unbounded and quasiperiodic perturbations the analogous result valid for operators $K_F$ with $P(\phi)$ periodic and differentiable in $\phi$ as a bounded operator in $\mathcal{H}$ [5, 6]. The gap condition is the same by condition A1, but here the analyticity of the perturbation is required.
2. The KAM methods of [5, 6], first implemented in [2] (see also [3]), made it possible to strengthen for small coupling the original result of [10] (see also [14, 17]) from absence of absolutely continuous spectrum to absence of continuous spectrum. Here too the set $\Pi_*$ is the set of all frequencies fulfilling a diophantine condition with respect to the differences $\lambda_i - \lambda_j$. Moreover, a result of the type of Corollary 1.1 up to an error of order $\exp 1/\varepsilon_*$ has been proved in [11] for a class of bounded perturbations via the Nekhoroshev technique.
3. Our proof extends to infinite dimensional spaces the KAM technique to eliminate the time dependence of quasiperiodically forced ordinary linear differential equations [1, 13, 20]. The main technical point is that the relevant homological equation has variable coefficients but can be solved by a technique developed by Kuksin [16] in the context of his analysis of the KdV equation by KAM theory.

As in [3, 5, 6, 10, 14, 17, 11] the main motivation for this corollary is the (Floquet) spectral analysis for the time dependent Schrödinger equation in dimension one, namely:

Theorem 1.2. Consider the time dependent Schrödinger equation

$$H(t)\psi(x, t) = i\partial_t \psi(x, t), \quad x \in \mathbb{R}; \qquad H(t) := -\frac{d^2}{dx^2} + Q(x) + \varepsilon V(x, \omega t), \quad \varepsilon \in \mathbb{R} \tag{1.9}$$

and the corresponding Floquet Hamiltonian (1.8) under the following conditions:
1. $Q(x) \in C^\infty(\mathbb{R}; \mathbb{R})$, $Q(x) \sim |x|^\alpha$ for some $\alpha > 2$ as $|x| \to \infty$;
2. $V(x, \phi)$ is a $C^\infty(\mathbb{R}; \mathbb{R})$-valued holomorphic function of $\phi \in \mathbb{T}^n$, with $|V(x, \phi)|\,|x|^{-\beta}$ bounded as $|x| \to \infty$ for some $\beta < \frac{\alpha - 2}{2}$.

Then there is $\varepsilon_* > 0$ such that the spectrum of $K_F$ is pure point for all $|\varepsilon| < \varepsilon_*$, $\omega \in \Pi_*$.

Remark 2.
1. We prove the result in the more general case where $V$ is a $C^\infty(\mathbb{R}^2; \mathbb{R})$-valued holomorphic function $V(x, \xi; \phi)$ of $\phi \in \mathbb{T}^n$ with $|V(x, \xi; \phi)|\,(|\xi|^2 + |x|^\alpha)^{-\delta/d}$ bounded as $|\xi| + |x| \to \infty$. Here $V(\phi)$ is realized as a pseudodifferential operator family in $L^2(\mathbb{R})$ of class $G^\beta_\rho$ (see e.g. [19], Chapter 8) of Weyl symbol $V$.
2. For $\alpha = 4$ we get $\beta < 1$. Hence the quantum version of the original Duffing oscillator $H(t) = -\frac{d^2}{dx^2} + x^4 + \varepsilon x \sin(\omega t)$ lies just outside the validity range of this corollary.
3. In the periodic case ($n = 1$) we see that, as in classical mechanics (see e.g. [7, Chap. 5.13]), not even an unbounded perturbation delocalizes the system if its strength is too small and its frequency is not too close to a resonant one. There is no diffusion (for $\varepsilon$ small enough) in the classical counterpart of (1.9) even for resonant values of $\omega$, but there are chaotic regions in phase space localized around the resonant actions. In this case it is still unknown whether or not the quantum Floquet spectrum is pure point even for bounded perturbations. On the other hand for $0 < \alpha \le 2$, when condition (1.2) is not satisfied, the nature of the Floquet spectrum is still unknown apart from the globally resonant case [8, 9].
4. In the quasiperiodic case ($n \ge 2$) the quantized system behaves as in the periodic one even though in the classical counterpart of (1.9) there are no topological obstructions to the growth of energy.

2. The Formal Construction

Without loss of generality equation (1.1) can be written as a first-order system in $\ell^2$:

$$i\dot x = (A + \varepsilon P(\omega t))x, \qquad x \in \ell^2, \tag{2.1}$$
$$A = \operatorname{diag}(\lambda_1, \lambda_2, \lambda_3, \ldots), \qquad \lambda_i \in \mathbb{R},\ \lambda_i > 0, \tag{2.2}$$

where $\lambda_i$ and $P(\omega t) \equiv P(\omega_1 t, \omega_2 t, \ldots, \omega_n t)$ fulfill conditions A1–A3. The key point of any KAM method is the construction of a coordinate transformation mapping the original problem into a new one of the same form with a much smaller size of the perturbation, typically the square of the original one. Here we construct and estimate, by an algorithm very close to that of [11], a unitary operator which maps (2.1) into an equation of the same form but with a perturbation of order $\varepsilon^2$. In this section we describe the procedure; in Sect. 3 we work out the estimates, and in Sect. 4 we set up the iterative scheme and prove its convergence.

Let $B(\phi_1, \ldots, \phi_n) \in \mathcal{B}^0$ be anti-selfadjoint $\forall\, \phi \in \mathbb{T}^n$. Given the unitary operator $e^{\varepsilon B(\phi)}$, for fixed $\omega \in \Pi$ perform the change of basis $x = e^{\varepsilon B(\omega t)} y$. Substitution in (2.1) yields

$$i\dot y = (A + \tilde P^1(\omega t))y. \tag{2.3}$$

The new perturbation $\tilde P^1$ is (the explicit dependence of $B$ on $t$ is omitted):

$$\begin{aligned}
\tilde P^1 := {} & \varepsilon\left\{ [A, B] - i\dot B + P \right\} \\
& + \left( e^{-\varepsilon B} A e^{\varepsilon B} - A - \varepsilon [A, B] \right) + \varepsilon\left( e^{-\varepsilon B} P e^{\varepsilon B} - P \right) - i\varepsilon\left( e^{-\varepsilon B} \dot B e^{\varepsilon B} - \dot B \right).
\end{aligned} \tag{2.4}$$

If $B$ makes the curly bracket vanish $\tilde P^1$ becomes of order $\varepsilon^2$. Hence we study the equation

$$[A, B] - i\dot B + P = 0. \tag{2.5}$$

Taking its matrix elements between the eigenvectors of $A$ this equation becomes

$$-i \sum_{l=1}^{n} \omega_l \frac{\partial}{\partial \phi_l} B_{ij} + (\lambda_i - \lambda_j) B_{ij} = P_{ij}. \tag{2.6}$$

Expand both sides in Fourier series, i.e. write

$$B_{ij} = \sum_{k \in \mathbb{Z}^n} \hat B_{ijk}\, e^{ik \cdot \phi}, \qquad P_{ij} = \sum_{k \in \mathbb{Z}^n} \hat P_{ijk}\, e^{ik \cdot \phi}.$$

Equating the Fourier coefficients of both sides, (2.6) becomes

$$(\omega \cdot k + \lambda_i - \lambda_j) \hat B_{ijk} = \hat P_{ijk}.$$
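The coefficient-wise relation obtained from (2.6) can be tried out numerically on a finite truncation. The sketch below is illustrative only: it assumes a single frequency ($n = 1$), a finite number of modes and matrix indices, and the off-diagonal prescription with vanishing diagonal entries of $B$; the function name `solve_homological` and the sample data are hypothetical.

```python
import numpy as np


def solve_homological(P_hat, lam, omega, ks, tol=1e-12):
    """Solve (omega*k + lam_i - lam_j) * B_hat[i, j, k] = P_hat[i, j, k], i != j.

    P_hat has shape (N, N, M): matrix indices i, j and M Fourier modes ks.
    Diagonal entries of B_hat are left equal to zero.
    """
    P_hat = np.asarray(P_hat, dtype=complex)
    lam = np.asarray(lam, dtype=float)
    ks = np.asarray(ks, dtype=float)
    # denom[i, j, m] = omega * k_m + lam_i - lam_j
    denom = omega * ks[None, None, :] + lam[:, None, None] - lam[None, :, None]
    off_diag = np.broadcast_to(
        ~np.eye(len(lam), dtype=bool)[:, :, None], denom.shape
    )
    if np.any(np.abs(denom[off_diag]) < tol):
        raise ValueError("small divisor: omega is resonant at this truncation")
    B_hat = np.zeros_like(P_hat, dtype=complex)
    B_hat[off_diag] = P_hat[off_diag] / denom[off_diag]
    return B_hat


# Hypothetical sample data: lambda_i = i^2 (so d = 2), five Fourier modes.
rng = np.random.default_rng(0)
lam = np.array([1.0, 4.0, 9.0])
ks = np.arange(-2, 3)
omega = np.sqrt(2.0) - 1.0
P_hat = rng.normal(size=(3, 3, 5)) + 1j * rng.normal(size=(3, 3, 5))
B_hat = solve_homological(P_hat, lam, omega, ks)
```

Because the divisors grow with $|\lambda_i - \lambda_j|$, the entries of `B_hat` far from the diagonal are much smaller than those of `P_hat`, which is the mechanism the text relies on to obtain a bounded $B$ from an unbounded $P$.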
Clearly this equation cannot be solved when $i = j$ and $k = 0$. Assuming now $\omega$ such that $\omega \cdot k + \lambda_i - \lambda_j \ne 0$ when $i \ne j$ or $k \ne 0$, the natural definition of $B$ would be the operator with matrix elements defined as

$$B_{ij} := \sum_{k \in \mathbb{Z}^n} \frac{\hat P_{ijk}}{\omega \cdot k + \lambda_i - \lambda_j}\, e^{ik \cdot \phi}, \quad i \ne j; \qquad B_{ii} := \sum_{k \in \mathbb{Z}^n - \{0\}} \frac{\hat P_{iik}}{\omega \cdot k}\, e^{ik \cdot \phi}. \tag{2.7}$$

The second line in (2.4) is of order $\varepsilon^2$ only if the operator $B$ is bounded. However $P$ is not bounded; as a consequence the operator $\operatorname{diag}(B_{ii})$ is in general unbounded, and the above definition cannot yield the desired result. The idea is therefore to define $B$ by the first of (2.7) with $B_{ii} = 0$; one can guess that, since the denominators $\omega \cdot k + \lambda_i - \lambda_j$ tend to infinity as $i$ or $j$ diverge, it should be possible to generate a bounded $B$ even if $P$ is unbounded. In the next section we will prove that this is actually the case. With the above definition of $B$ the curly bracket in (2.4) turns out to be the operator $\operatorname{diag}(P_{ii})$, and hence in terms of the variables $y$ the equation takes the form

$$i\dot y = (A_1 + \varepsilon^2 P^1(\omega t))y, \qquad \text{with } A_1 = A + \varepsilon \operatorname{diag}(P_{ii}(\omega t)).$$

This system is defined only for $\omega$ in the subset of $\Pi$ where the denominators in (2.7) do not vanish. In the next section we will assume a diophantine type condition also for such denominators, to be valid on a Cantor subset of $\Pi$. Then it will turn out that $P^1$ depends in a Lipschitz way on $\omega$ in such a subset. Iterating the construction, we see that the operator $A$ is replaced by the operator $A_1$ which depends also on the angles $\phi$. As we shall see, this is precisely the point where Kuksin’s result [16] enters in a critical way.

3. Squaring the Order of the Perturbation

Keeping in mind the discussion of the preceding section we first set some notation, and then construct and estimate the transformation squaring the order of the perturbation. Let $\mathbb{T}^n_s$ be the complexified torus with $|\operatorname{Im} \phi_i| \le s$. If $f$ is an analytic function from $\mathbb{T}^n_s$ to a Banach space (in what follows $\mathbb{C}$ or the complexification of $\mathcal{B}^\delta$), we denote

$$\|f\|_s = \sup_{\phi \in \mathbb{T}^n_s} \|f(\phi)\|.$$

For $\mathcal{B}^\delta$-valued functions we use the particular symbol

$$\|f\|_{\delta, s} := \sup_{\phi \in \mathbb{T}^n_s} \|f(\phi)\|_\delta.$$

Let $\Pi_-$ be a closed nonempty subset of $\Pi$ of positive measure. If $f$ has an additional (Lipschitz continuous) dependence on $\omega \in \Pi_-$ we define the norm

$$\|f\|^L_s := \|f\|_s + \sup_{\phi \in \mathbb{T}^n_s}\ \sup_{\omega, \omega' \in \Pi_-} \frac{\|f(\phi, \omega) - f(\phi, \omega')\|}{|\omega - \omega'|}.$$

In particular for $\mathcal{B}^\delta$-valued functions we use the notation $\|\cdot\|^L_{\delta, s}$.
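Let us now include our system into a more general framework, which, by the above discussion, is convenient for the iteration scheme. Consider in $\ell^2$ the equation

$$i\dot x = (A^- + P^-(\omega t))x \tag{3.1}$$

under the following conditions:

H1)
$$A^- = \operatorname{diag}(\lambda_1^-(\omega) + \mu_1^-(\omega t, \omega),\ \lambda_2^-(\omega) + \mu_2^-(\omega t, \omega),\ \lambda_3^-(\omega) + \mu_3^-(\omega t, \omega), \ldots). \tag{3.2}$$

Here:

H1.a) $\forall\, i = 1, \ldots$ $\lambda_i^-(\omega)$ is positive and Lipschitz continuous w.r.t. $\omega \in \Pi_-$; moreover $\lambda_i^- \sim i^d$, uniformly in $\omega \in \Pi_-$. Hence there is $C_\lambda^- > 0$ independent of $\omega$ such that

$$|\lambda_i^- - \lambda_j^-| \ge C_\lambda^- |i^d - j^d|. \tag{3.3}$$

H1.b) There is $C_\omega^- > 0$ suitably small and $\delta < d - 1$ such that

$$\sup_{\omega, \omega' \in \Pi_-} \frac{|\lambda_i^-(\omega) - \lambda_i^-(\omega')|}{|\omega - \omega'|} \le C_\omega^- i^\delta. \tag{3.4}$$

H1.c) $\forall\, i = 1, \ldots$ $\mu_i^- : \mathbb{T}^n_s \times \Pi_- \to \mathbb{R}$ is analytic w.r.t. $\phi$, Lipschitz continuous w.r.t. $\omega$, and has zero average, i.e.

$$\int_{\mathbb{T}^n} \mu_i^-(\phi, \omega)\, d\phi = 0.$$

Moreover it fulfills the estimates

$$\|\mu_i^-\|_s \le C_\mu^- i^\delta, \tag{3.5}$$
$$\sup_{\phi \in \mathbb{T}^n_s}\ \sup_{\omega, \omega' \in \Pi_-} \frac{|\mu_i^-(\omega, \phi) - \mu_i^-(\omega', \phi)|}{|\omega - \omega'|} \le C_\omega^- i^\delta. \tag{3.6}$$

H2) The operator valued function $P^- : \mathbb{T}^n_s \times \Pi_- \to \mathcal{B}^\delta$ is analytic with respect to $\phi \in \mathbb{T}^n_s$ and Lipschitz continuous w.r.t. $\omega \in \Pi_-$.

H3) There exist $\gamma^- > 0$ and $\tau > n + 1 + 2/(d - 1)$ such that, for any $\omega \in \Pi_-$, one has

$$|\omega \cdot k| \ge \frac{\gamma^-}{|k|^\tau}, \qquad \forall\, k \in \mathbb{Z}^n - \{0\}, \tag{3.7}$$
$$|\lambda_i^- - \lambda_j^- + \omega \cdot k| \ge \frac{\gamma^- |i^d - j^d|}{1 + |k|^\tau}, \qquad \forall\, k \in \mathbb{Z}^n,\ i \ne j. \tag{3.8}$$

Remark 1. In the next section we will prove that it is possible to construct a set $\Pi_-$ of positive measure such that also the original system (1.1) fulfills the above assumption.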
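For a concrete feel of the non-resonance conditions (3.7) and (3.8), one can test them on a finite range of indices. The sketch below is only illustrative and makes several assumptions that are not in the text: $|k|$ is taken to be the $\ell^1$ norm, only modes with $|k| \le K$ and the first few eigenvalues are checked, and the function name `check_nonresonance` as well as the sample frequencies are hypothetical.

```python
import itertools
import math


def check_nonresonance(omega, lam, gamma, tau, d, K):
    """Check (3.7) and (3.8) for all integer vectors k with l1-norm <= K.

    omega: frequency vector, lam: eigenvalues lam[0] = lambda_1, ...
    Returns True when no violation is found on this finite range.
    """
    n = len(omega)
    N = len(lam)
    for k in itertools.product(range(-K, K + 1), repeat=n):
        norm_k = sum(abs(c) for c in k)
        if norm_k > K:
            continue
        wk = sum(w * c for w, c in zip(omega, k))
        # Condition (3.7): |omega.k| >= gamma / |k|^tau for k != 0.
        if norm_k > 0 and abs(wk) < gamma / norm_k**tau:
            return False
        # Condition (3.8): lower bound on |lambda_i - lambda_j + omega.k|, i != j.
        for i, j in itertools.permutations(range(1, N + 1), 2):
            gap = abs(i**d - j**d)
            if abs(lam[i - 1] - lam[j - 1] + wk) < gamma * gap / (1 + norm_k**tau):
                return False
    return True


# Hypothetical data: lambda_i = i^2 (d = 2), two frequencies, tau > n + 1 + 2/(d - 1) = 5.
lam = [float(i**2) for i in range(1, 5)]
good = ((math.sqrt(5) - 1) / 2, math.sqrt(2) - 1)
resonant = (0.5, 0.25)  # omega . (1, -2) = 0 violates (3.7)

ok_good = check_nonresonance(good, lam, gamma=0.1, tau=6, d=2, K=5)
ok_resonant = check_nonresonance(resonant, lam, gamma=0.1, tau=6, d=2, K=5)
```

A finite check of this kind can only rule frequencies out; membership in the actual set $\Pi_-$ requires the inequalities for all $k$ and all $i \ne j$.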
Let now

$$B : \mathbb{T}^n_s \ni (\phi_1, \ldots, \phi_n) \mapsto B(\phi_1, \ldots, \phi_n) \in \mathcal{B}^0 \tag{3.9}$$

be an analytic map with $B(\phi_1, \ldots, \phi_n)$ anti-selfadjoint for each real value of $(\phi_1, \ldots, \phi_n)$. Consider the corresponding unitary operator $e^{B(\phi_1, \ldots, \phi_n)}$, and (as above) for any $\omega \in \Pi_-$ consider the unitary change of basis $x = e^{B(\omega t)} y$. Substitution in Eq. (3.1) yields

$$i\dot y = (A^+ + P^+(\omega t))y, \tag{3.10}$$
$$A^+ := A^- + \operatorname{diag}(P^-). \tag{3.11}$$

Here $\operatorname{diag}(P^-)$ is the diagonal matrix formed by the diagonal elements of $P^-$, that is

$$\operatorname{diag}(P^-) := \operatorname{diag}(P^-_{11}(\omega t), P^-_{22}(\omega t), P^-_{33}(\omega t), \ldots).$$

The new perturbation $P^+$ is given by (the explicit dependence of $B$ on $t$ is omitted):

$$\begin{aligned}
P^+ := {} & \left\{ [A^-, B] - i\dot B + (P^- - \operatorname{diag}(P^-)) \right\} \\
& + \left( e^{-B} A^- e^{B} - A^- - [A^-, B] \right) + \left( e^{-B} P^- e^{B} - P^- \right) - i\left( e^{-B} \dot B e^{B} - \dot B \right).
\end{aligned} \tag{3.12}$$

According to the standard procedure we subtract the mean of the perturbation. Namely, we write $A^+ = \operatorname{diag}(\lambda_i^+ + \mu_i^+(\omega t))$, where $\lambda_i^+ = \lambda_i^- + \overline{P^-_{ii}(\phi)}$ (the overline denotes angular average). Hence the functions $\mu^+(\phi)$ have zero average; the quantities $\lambda_i^+$ are independent of $\phi$ and by A3 fulfill the estimate $|\lambda_i^+ - \lambda_i^-| \le C_\mu^- i^\delta$.

The main step of the proof is to construct $B$ so as to make the curly bracket in (3.12) vanish, i.e. to solve for the unknown $B$ the equation

$$[A^-, B] - i\dot B + (P^- - \operatorname{diag}(P^-)) = 0. \tag{3.13}$$

The procedure explained in the previous section has to be modified since now the eigenvalues of $A^-$ depend also on the angles $\phi$. The construction is based on a lemma by Kuksin [16] that we now summarize. On the $n$-dimensional torus consider the equation

$$-i \sum_{k=1}^{n} \omega_k \frac{\partial}{\partial \phi_k} \chi(\phi) + E_1 \chi(\phi) + E_2 h(\phi) \chi(\phi) = b(\phi). \tag{3.14}$$

Here $\chi$ denotes the unknown, while $b, h$ denote given analytic functions on $\mathbb{T}^n_s$. $h$ has zero average; $E_1, E_2$ are positive constants and $\|h\|_s \le 1$. Concerning the frequency vector $\omega = (\omega_1, \ldots, \omega_n)$ the assumptions are:

$$|\omega \cdot k| \ge \frac{\gamma_2}{|k|^\tau}, \quad \forall\, k \in \mathbb{Z}^n - \{0\}, \qquad |\omega \cdot k + E_1| \ge \frac{\gamma_1}{1 + |k|^\tau}, \quad \forall\, k \in \mathbb{Z}^n. \tag{3.15}$$

The final hypothesis is an order assumption on the magnitude of the different parameters, namely: given $0 < \theta < 1$ and $C > 0$ we assume

$$E_1^\theta \ge C E_2. \tag{3.16}$$
Lemma 3.1. (Kuksin) Under the above assumptions Eq. (3.14) has a unique analytic solution $\chi$ which for any $0 < \sigma < s$ fulfills the estimate

$$\|\chi\|_{s - \sigma} \le C_1 \frac{1}{\gamma_1 \sigma^{a_1}} \exp\left( \frac{C_2}{\gamma_2^{a_2} \sigma^{a_3}} \right) \|b\|_s. \tag{3.17}$$

Here $a_1, a_2, a_3, C_1, C_2$ are constants independent of $E_1, E_2, \sigma, s, \gamma_1, \gamma_2, \omega$.

To apply this lemma to the construction and estimation of $B$, denote by $G$ the Banach space of all bounded operators $B$ in $\ell^2$ such that $A^{-\delta/d} B A^{\delta/d}$ extends to a bounded linear operator. The norm in $G$ is denoted

$$\|B\|^G := \max\left\{ \|B\|_0,\ \|A^{-\delta/d} B A^{\delta/d}\|_0 \right\}. \tag{3.18}$$

Moreover for the $s$-norms of an analytic function on the torus taking values in $G$ (possibly Lipschitz-continuous in $\omega \in \Pi_-$) we will use the notations

$$\|B\|^G_s, \qquad \|B\|^{G, L}_s.$$

In what follows the notation $a \mathrel{\le\cdot} b$ stands for “there exists a constant $C$ independent of $C_\omega^\pm, C_\mu^\pm, \gamma^\pm, s, \sigma, i, j, K$ (some of these parameters will be defined later on) such that $a \le Cb$”. Equivalently we will use the notation $b \mathrel{\cdot\ge} a$.

Lemma 3.2. Let $\frac{\delta}{d - 1} < \theta < 1$, $\gamma_* > 0$, $C_\omega^* > 0$, and $C^* > 0$ be fixed. Assume that

$$C^* > \frac{C_\mu^-}{C_\lambda^-}, \qquad \gamma^- \ge \gamma_*, \qquad C_\omega^- \le C_\omega^*. \tag{3.19}$$

Then for any $0 < \sigma < s$ Eq. (3.13) has a unique solution $B \in G$ analytic on $\mathbb{T}^n_{s - \sigma}$, fulfilling the estimate

$$\|B\|^{G, L}_{s - \sigma} \mathrel{\le\cdot} \frac{1}{\sigma^{b_1}} \exp\left( \frac{c}{\sigma^{b_2}} \right) \|P^-\|^L_{\delta, s}. \tag{3.20}$$

Here $c, b_1, b_2$ are constants depending only on $\theta, n, \tau, \delta, C^*, \gamma_*, C_\omega^*$.

Proof. Taking matrix elements among eigenvectors of $A^-$, Eq. (3.13) becomes

$$-i \sum_{k=1}^{n} \omega_k \frac{\partial}{\partial \phi_k} B_{ij} + (\lambda_i^- - \lambda_j^-) B_{ij} + (\mu_i^-(\phi) - \mu_j^-(\phi)) B_{ij} = P_{ij}, \qquad i \ne j. \tag{3.21}$$

The first inequality of (3.19) ensures that (3.16) holds with a suitable $C$ independent of all the relevant constants. Then a direct application of Kuksin’s Lemma yields that (3.13) has a unique analytic solution fulfilling the estimate

$$\|B_{ij}\|_{s - \sigma} \mathrel{\le\cdot} \frac{1}{\gamma\, |i^d - j^d|\, \sigma^{a_1}} \exp\left( \frac{c}{\gamma^{a_2} \sigma^{a_3}} \right) \|P_{ij}\|_s. \tag{3.22}$$

To estimate the sup norm of $B$ we use Lemma 5.2. To this end, first remark that $|i^d - j^d| \ge |i - j|(i^\delta + j^\delta)$. Then consider the infinite matrices of elements

$$\frac{P_{ij}}{i^\delta + j^\delta}, \qquad \frac{i^\delta}{j^\delta} \frac{P_{ij}}{i^\delta + j^\delta}.$$
Assumption H2 entails a fortiori that these infinite matrices represent bounded operators in $\ell^2$. Then Lemma 5.2 yields the estimate of the sup norm of $B$ and of $A^{-\delta/d} B A^{\delta/d}$, i.e. one has

$$\|B\|^G_{s - 2\sigma} \mathrel{\le\cdot} \frac{1}{\sigma^{a_1 + n}} \exp\left( \frac{c}{\sigma^{a_3}} \right) \|P^-\|_{\delta, s} \tag{3.23}$$

after redefinition of $\sigma$ as $2\sigma$ and of the constant $c$. To obtain the estimate of the Lipschitz norm we proceed as follows. Given a function $B$ of $\omega$ set

$$\Delta B := B(\omega) - B(\omega'). \tag{3.24}$$

Applying the operator $\Delta$ to (3.21) one gets that $\Delta B_{ij}$ fulfills an analogous equation. Hence by Kuksin’s Lemma its solution $\Delta B$ can be estimated by the same argument applied in estimating $B$. Dividing by $|\omega - \omega'|$ and applying again Lemma 5.2 one gets

$$\left\| \frac{\Delta B}{\Delta \omega} \right\|_{s - 3\sigma} \mathrel{\le\cdot} \|P\|^L_{\delta, s} + \frac{1}{\sigma^{a_1}} \exp\left( \frac{c}{\sigma^{a_3}} \right) \|P^-\|^L_{\delta, s},$$

whence the proof redefining $\sigma$ as $3\sigma$ and taking the sup as above. $\square$

We are now ready to state and prove the main result of this section.

Lemma 3.3. Consider the system (3.1) within the stated assumptions. Assume furthermore that also (3.19) holds. Then there exists an anti-selfadjoint operator $B \in G$ analytically depending on $\phi \in \mathbb{T}^n_{s - \sigma}$, and Lipschitz continuous in $\omega \in \Pi_-$ such that

1. $B$ fulfills the estimate (3.20);
2. For any $\omega \in \Pi_-$ the unitary operator $e^{B(\omega t)}$ transforms the system (3.1) into the system (3.10);
3. The new perturbation $P^+$ fulfills the estimate

$$\|P^+\|^L_{\delta, s - \sigma} \mathrel{\le\cdot} \left( \|P^-\|^L_{\delta, s} \right)^2 \exp\left( \frac{c}{\sigma^{b_1}} \right); \tag{3.25}$$

4. For any positive $K$ such that $(1 + K^\tau) < \dfrac{\gamma^-}{\|P^-\|_{\delta, s}}$, there exists a closed set $\Pi_+ \subset \Pi_-$ and a $d_4 > 1$ (independent of $K$) fulfilling

$$|\Pi_- - \Pi_+| \mathrel{\le\cdot} \gamma^- \frac{1}{1 + K^{d_4}}; \tag{3.26}$$

5. If $\omega \in \Pi_+$ then assumptions H1–H3 above are fulfilled also by $A^+$ provided the constants are replaced by the new ones defined by

$$\gamma^+ = \gamma^- - \|P^-\|_{\delta, s} (1 + K^\tau), \qquad C_\mu^+ = C_\mu^- + \|P^-\|_{\delta, s}, \tag{3.27}$$
$$C_\omega^+ = C_\omega^- + \|P^-\|^L_{\delta, s}, \qquad C_\lambda^+ = C_\lambda^- - 2\|P^-\|_{\delta, s}. \tag{3.28}$$
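Proof. The estimates on $B$ are an obvious consequence of Lemma 3.2 above. The estimate (3.25) is an immediate consequence of Lemmas 5.3 and 5.4. Concerning (3.27) and (3.28) the only nontrivial fact to be proved is the existence of a set $\Pi_+$ such that, for $\omega \in \Pi_+$, (3.7) and (3.8) are fulfilled with the new value of $\gamma$. Since (3.7) obviously holds, we examine (3.8). First remark that one has

$$|\lambda_i^- - \lambda_i^+| \le \|P^-\|_{\delta, s}\, i^\delta;$$

therefore, for $|k| \le K$ we can write, by (3.8) and the inequality $|i^d - j^d| \ge (i^\delta + j^\delta)$:

$$|\lambda_i^+ - \lambda_j^+ - \omega \cdot k| \ge |\lambda_i^- - \lambda_j^- - \omega \cdot k| - \|P^-\|_{\delta, s} (i^\delta + j^\delta) \ge \frac{\gamma^- - \|P^-\|_{\delta, s} (1 + K^\tau)}{1 + |k|^\tau}\, |i^d - j^d|.$$

Hence (3.8) is satisfied for such values of $k$. Fix $i, j, k$ and set:

$$R_{ijk}(\alpha) := \left\{ \omega \in \Pi_- : |\lambda_i^+ - \lambda_j^+ - \omega \cdot k| \le \alpha \right\}, \qquad i \ne j, \tag{3.29}$$
$$\Pi_+ := \Pi_- - \bigcup_{|k| \ge K} R_{ijk}\left( \frac{\gamma\, |i^d - j^d|}{1 + |k|^\tau} \right). \tag{3.30}$$

By Lemma 5.5 the set (3.29) is nonempty only if $|k| \ge |i^d - j^d| (C_\lambda^- - \gamma^-)$, and by Lemma 5.6, one has

$$\left| R_{ijk}\left( \frac{\gamma\, |i^d - j^d|}{1 + |k|^\tau} \right) \right| \mathrel{\le\cdot} \frac{\gamma\, |i^d - j^d|}{(1 + |k|^\tau)\, |k|}.$$

Since $|i^d - j^d| \ge |i - j|(i^{d-1} + j^{d-1})$, the cardinality of the set $\{(i, j) \mid |i^d - j^d| \le L\}$ is bounded by an absolute constant times $L^{2/(d-1)}$. Hence if $\tau > n + 1 + 2/(d - 1)$ one has

$$\sum_{ijk:\, |k| \ge K} \left| R_{ijk}\left( \frac{\gamma\, |i^d - j^d|}{1 + |k|^\tau} \right) \right| \mathrel{\le\cdot} \sum_{|k| \ge K,\ |i^d - j^d| \le C|k|} \frac{\gamma\, |i^d - j^d|}{(1 + |k|^\tau)\, |k|} \mathrel{\le\cdot} \gamma \sum_{s \ge K} \frac{1}{s^{\tau - n + 1 - 2/(d-1)}} \mathrel{\le\cdot} \frac{\gamma}{K^{d_4}}, \tag{3.31}$$

and this proves the assertion. $\square$

4. Iteration

In this section we set up the iteration needed to prove the stated results. First we preassign the values of the various constants occurring in the iterative estimates. Hence we keep $\varepsilon$, $K$, $s$ and $\gamma$ fixed and define, for $l \ge 1$,

$$\varepsilon_l := \varepsilon^{(4/3)^l}, \qquad \sigma_l := \frac{s}{4l^2}, \qquad s_l = s_{l-1} - \sigma_l, \qquad K_l := lK, \tag{4.1}$$
$$\gamma_l = \gamma_{l-1} - 4\varepsilon_l (1 + K_l^\tau), \qquad C_{\mu, l} = C_{\mu, l-1} + \varepsilon_l, \tag{4.2}$$
$$C_{\lambda, l} = C_{\lambda, l-1} - 2\varepsilon_l, \qquad C_{\omega, l} = C_{\omega, l-1} + \varepsilon_l. \tag{4.3}$$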
The initial values of the sequences are chosen as follows:

$$\gamma_0 := \gamma, \qquad s_0 = s, \qquad C_{\mu, 0} := 0, \qquad C_{\lambda, 0} := C_\lambda, \qquad C_{\omega, 0} := 0.$$
Proposition 4.1. There exist $\varepsilon_* = \varepsilon_*(\gamma) > 0$ and, for any $l \ge 1$, a closed set $\Pi_l^\gamma \subset \Pi$ such that, if $|\varepsilon| < \varepsilon_*$, one can construct for $\omega \in \Pi_l^\gamma$ a unitary transformation $U_l$, analytic and quasiperiodic in $t$ with frequencies $\omega$, mapping the system (2.1) into the system

$$i\dot x = (A_l + P^l(\omega t))x, \tag{4.4}$$

where:

1. $U_l(\omega t)$ is as follows: $U_l(\phi) = e^{B^1(\phi)} e^{B^2(\phi)} \cdots e^{B^l(\phi)}$, and the anti-selfadjoint operators $B^j \in G$, $j = 1, \ldots, l$ depend analytically on $\phi \in \mathbb{T}^n_{s - \sigma_l}$, are Lipschitz continuous in $\omega \in \Pi_l^\gamma$ and fulfill (3.20) with $P^{l-1}$, $\sigma_l$ in place of $P^-$, $\sigma$, respectively.
2. $A_l$ has the form of (3.2) with the upper index “minus” replaced by $l$, i.e.

$$A_l = \operatorname{diag}(\lambda_1^l(\omega) + \mu_1^l(\omega t, \omega),\ \lambda_2^l(\omega) + \mu_2^l(\omega t, \omega),\ \lambda_3^l(\omega) + \mu_3^l(\omega t, \omega), \ldots). \tag{4.5}$$

3. The corresponding $\lambda_i^l$ and $\mu_i^l$ fulfill conditions H1, H2, H3 of the previous section, provided $\lambda_i^-$, $\mu_i^-$ are replaced by $\lambda_i^l$, $\mu_i^l$, respectively.
4. The following estimates hold

$$\|P^l\|_{\delta, s_l} \le \varepsilon_l, \qquad \|B^l\|^{G, L}_{s_{l+1}} \le \varepsilon_l, \qquad |\Pi_l^\gamma - \Pi_{l+1}^\gamma| \le \gamma_l \frac{1}{1 + (lK)^{d_4}}. \tag{4.6}$$

Proof. We proceed by induction applying Lemma 3.3. First we want to apply it to the original system (2.1) to the effect of obtaining a system of the form (4.4) with $l = 1$. To this end remark that (2.1) satisfies all the assumptions of Lemma 3.3 except the nonresonance conditions (3.7) and (3.8) on the frequencies. We have to restrict the set of the frequencies. Define therefore

$$\Pi_0^\gamma := \Pi - \bigcup_{ijk} R_{ijk}\left( \frac{\gamma\, |i^d - j^d|}{1 + |k|^\tau} \right)$$

and remark that, by Lemma 5.6, $|\Pi - \Pi_0^\gamma| \mathrel{\le\cdot} \gamma$. Hence we can apply Lemma 3.3 and the starting point of our induction procedure is established. To go from step $l$ to step $l + 1$ one has to verify that the assumptions of Lemma 3.3 are satisfied for any $l$. More specifically, defining $\gamma_* := \gamma/2$ and fixing $C^*$ and $C_\omega^*$ we must verify that (3.19) holds. It is easy to check that this is true provided $\varepsilon$ is smaller than a constant which in particular vanishes as $\gamma \to 0$. Then it is immediately realized that the conclusions of Lemma 3.3 imply the thesis if $\varepsilon$ is small enough (independently of $l$). $\square$
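Proof of Theorem 1.1. Proposition 4.1 ensures the existence of $\varepsilon_* > 0$ such that, for $|\varepsilon| < \varepsilon_*(\gamma)$, $\lim_{l \to \infty} \gamma_l = \gamma^\infty$, $\gamma^\infty > \gamma/2$, and $\lim_{l \to \infty} s_l > s/2$. This entails the uniform convergence of the operator valued sequence of functions $U_l$ on $\mathbb{T}^n_{s/4}$. Hence the limit, denoted $U_\infty(\omega t)$, will be analytic and quasi-periodic. Moreover, writing $A_\infty := \operatorname{diag}(\lim_{l \to \infty} (\lambda_i^l + \mu_i^l))$, one has

$$\lim_{l \to \infty} \|A_l(\phi) - A_\infty(\phi)\|_\delta = 0$$

uniformly on $\mathbb{T}^n_{s/4}$. This proves T1 and T2. The first three estimates of T3 are also clearly implied by the above convergence. Set now $\Pi^\gamma = \bigcap_{l=1}^{\infty} \Pi_l^{\gamma/2}$. By the third of (4.6) we have

$$|\Pi - \Pi^\gamma| \mathrel{\le\cdot} \gamma_0 = \gamma.$$

Denote now by $\gamma(\varepsilon_*)$ the inverse function of $\gamma \mapsto \varepsilon_*(\gamma)$, and define $\Pi_* := \Pi^{\gamma(\varepsilon)}$. Then the fourth estimate of assertion T3 follows. $\square$

Proof of Corollaries 1.1 and 1.2. Integration of (1.4) yields:

$$\chi_i(t) = \chi_i(0)\, e^{i\lambda_i^\infty t}\, e^{iF_i^\infty(t)}, \qquad F_i^\infty(t) := \sum_{k \in \mathbb{Z}^n - \{0\}} \frac{\mu^\infty_{i,k}}{\omega \cdot k} (e^{i\omega \cdot k t} - 1), \qquad i = 0, 1, \ldots, \infty,$$

where $\mu^\infty_{i,k}$, $k \in \mathbb{Z}^n$, are the Fourier coefficients of $\mu_i^\infty(\phi)$. Setting $x_i := e^{iF_i(t)} \chi_i$ we get $i\dot x_i = \lambda_i^\infty x_i$. Formula (1.7) follows taking $\chi = U\phi$. Moreover it is trivially verified that $\phi_i^0(\omega t)\, e^{i\lambda_i^\infty t}$ solves (1.1) if and only if $\lambda_i^\infty + \langle k, \omega \rangle$ is an eigenvalue of (1.8). $\square$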
Proof of Theorem 1.2. Let $A$ denote the maximal operator in $L^2(\mathbb{R})$ generated by the differential expression $-\frac{d^2}{dx^2} + Q(x)$. It is well known that $A$ is self-adjoint, strictly positive and has compact resolvent and that, denoting by $\lambda_i$, $i = 1, 2, \ldots$, its eigenvalues, one has $\lambda_i \sim i^{2\alpha/(\alpha+2)}$ as $i \to \infty$. Hence Condition A1 is fulfilled if $\alpha > 2$. $A$ can also be realized as a pseudodifferential operator of symbol $\sigma_A(x, \xi) := \xi^2 + Q(x)$ under Weyl quantization. $\sigma_A(x, \xi)$ belongs to the symbol class $\Gamma^{\alpha}_{\rho}(\mathbb{R}) := \Gamma^{\alpha}_{\rho}$ for any $0 < \rho < 1$ (notations as in [19], Sect. 23). This class of symbols generates the class $G^{\alpha}_{\rho}$ of pseudodifferential operators in $L^2(\mathbb{R})$ under the Weyl quantization formula
$$(Au)(x) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n \times \mathbb{R}^n} e^{i(x-y)\xi}\, \sigma_A\Big(\frac{x+y}{2}, \xi\Big) u(y)\, dy\, d\xi, \qquad u \in \mathcal{S}(\mathbb{R}).$$
The inverse $[A + 1]^{-1}$, whose principal symbol is $\sigma_{(A+1)^{-1}}(x, \xi) = (\xi^2 + Q(x) + 1)^{-1}$, belongs to the class $G^{-\alpha}_{\rho}$. The functional calculus for pseudodifferential operators (see e.g. [19], Chapt. II.10, 11 or [4, Chap. 8]) can be applied to operators in these classes. Hence the self-adjoint operator $A^q$, $q > 0$, defined by the spectral theorem can also be realized as a pseudodifferential operator in $G^{\alpha q}_{\rho}$, with symbol in $\Gamma^{\alpha q}_{\rho}$. Its principal symbol is $\sigma_{A^q}(x, \xi) := (\xi^2 + Q(x))^q$, and the principal symbol of $[A^q + 1]^{-1} \in G^{-\alpha q}_{\rho}$ is $\sigma_{(A^q+1)^{-1}}(x, \xi) := [(\xi^2 + Q(x))^q + 1]^{-1}$. By assumption the symbol of the perturbation $V$ belongs to $\Gamma^{\beta}_{\rho}$ for any $0 < \rho < 1$, and hence $V$ belongs to $G^{\beta}_{\rho}$. By the composition property, the operator $T := V[A^q + 1]^{-1}$ admits a symbol in $\Gamma^{-\alpha q+\beta}_{\rho}$, and it will be bounded if $-\alpha q + \beta \le 0$ ([19], Thm. 24.3). In turn, it is enough to verify this property for the principal symbol, which in this case, by the composition formula, is given by $\sigma^P_T(x, \xi) = v(x, \xi; \phi)[(\xi^2 + Q(x))^q + 1]^{-1}$. Since here $q = \delta/d$, $|\sigma^P_T(x, \xi)|$ is bounded for all $(x, \xi) \in \mathbb{R}^n \times \mathbb{R}^n$ if there is $D > 0$ such that $|v(x, \xi; \phi)| \le D(\xi^2 + |x|^{\alpha})^{\delta/d}$. If $V \sim |x|^{\beta}$ as $|x| \to \infty$ the inequality is satisfied for $\beta \le \alpha\delta/d$. Now we can set $1 < d = \frac{2\alpha}{\alpha+2}$. Then $\delta < d - 1$ means $0 < \delta < \frac{\alpha-2}{\alpha+2}$ and therefore $\beta < \frac{\alpha-2}{2}$. □

5. Technical Lemmas

Lemma 5.1. Let $f_j$ be analytic functions on $T^n_s$. Then for any $0 < \sigma < s$ one has
$$\Big(\sum_{j \ge 1} \|f_j\|^2_{s-\sigma}\Big)^{1/2} \le \frac{4^n}{\sigma^n} \Big\| \Big(\sum_{j \ge 1} |f_j|^2\Big)^{1/2} \Big\|_s .$$
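Proof. This is Lemma B.3 of [15]; we reproduce its proof here for the convenience of the reader. First consider the case $n = 1$. For each $j \ge 1$ there exists a point $\phi_j \in T_{s-\sigma}$ such that $\|f_j\|_{s-\sigma} \le |f_j(\phi_j)|$. By the Cauchy integral formula
$$f_j(\phi_j) = \frac{1}{2\pi i} \oint_{\partial\Gamma_\rho} \frac{f_j(\zeta)}{\zeta - \phi_j}\, d\zeta,$$
where $0 < \rho < \sigma$ is a parameter independent of $j$, and $\partial\Gamma_\rho$ is the boundary of the set $\Gamma_\rho := \{\phi : -\rho < \mathrm{Re}\,\phi < 2\pi + \rho,\ -(s - \sigma + \rho) < \mathrm{Im}\,\phi < s - \sigma + \rho\}$. One has
$$\Big(\sum_{j \ge 1} \|f_j\|^2_{s-\sigma}\Big)^{1/2} \le \Big(\sum_{j \ge 1} \Big|\frac{1}{2\pi i} \oint_{\partial\Gamma_\rho} \frac{f_j(\zeta)}{\zeta - \phi_j}\, d\zeta\Big|^2\Big)^{1/2} \le \frac{1}{2\pi} \oint_{\partial\Gamma_\rho} \Big(\sum_{j \ge 1} \Big|\frac{f_j(\zeta)}{\zeta - \phi_j}\Big|^2\Big)^{1/2} |d\zeta| \le \frac{4}{\rho} \sup_{T_s} \Big(\sum_{j \ge 1} |f_j(\phi)|^2\Big)^{1/2}. \qquad (5.1)$$
Taking the limit $\rho \to \sigma$ one gets the result. The case $n > 1$ follows similarly. □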
Lemma 5.2. Let $F = (F_{ij})$ be a bounded operator on $\ell^2$ and let the matrix elements $(F_{ij})$ be analytic functions of $\phi \in T^n_s$. Let $R = (R_{ij})$ be another operator with matrix elements depending analytically on $\phi \in T^n_\sigma$ and such that
$$\sup_{\phi \in T^n_s} |R_{ij}(\phi)| \le \frac{1}{|i - j|} \sup_{\phi \in T^n_s} |F_{ij}(\phi)|, \qquad i \ne j.$$
Then, for any $\phi \in T^n_s$, $R$ is bounded in $\ell^2$ and for any positive $\sigma < s$ it fulfills the estimate
$$\|R\|_{0,s-\sigma} \le \frac{4^{n+1}}{\sigma^n} \|F\|_{0,s}.$$
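Proof. This is Lemma B.4 of [15]; again we reproduce its proof here for the convenience of the reader. Fix $\phi \in T_{s-\sigma}$. By Lemma 5.1 and the Schwarz inequality we have
$$\sum_{j \ge 1} |R_{ij}(\phi)| \le \sum_{j \ge 1} \|R_{ij}\|_{s-\sigma} \le \Big(\sum_{j \ne i} \frac{1}{|i - j|^2}\Big)^{1/2} \Big(\sum_{j \ge 1} \|F_{ij}\|^2_{s-\sigma}\Big)^{1/2} \le \frac{4^{n+1}}{\sigma^n} \sup_{T^n_s} \Big(\sum_{j \ge 1} |F_{ij}|^2\Big)^{1/2} \le \frac{4^{n+1}}{\sigma^n} \|F\|_{0,s}. \qquad (5.2)$$
The same estimate holds for $\sum_{i \ge 1} |R_{ij}(\phi)|$. Hence, for $\phi \in T^n_\sigma$,
$$\|R(\phi)v\|^2 = \sum_{i \ge 1} \Big|\sum_{j \ge 1} R_{ij}(\phi) v_j\Big|^2 \le \sum_{i \ge 1} \Big(\sum_{j \ge 1} |R_{ij}(\phi)|\,|v_j|\Big)^2 \le \sum_{i \ge 1} \Big(\sum_{j \ge 1} |R_{ij}(\phi)|\Big) \Big(\sum_{j \ge 1} |R_{ij}(\phi)|\,|v_j|^2\Big) \le \Big(\frac{4^{n+1}}{\sigma^n}\Big)^2 \|F\|^2_{0,s} \|v\|^2, \qquad (5.3)$$
which proves the result. □

Lemma 5.3. Let $B \in G$ be a bounded anti-selfadjoint operator, and let $P \in B^{\delta}$ be a selfadjoint operator. Then $e^{-B} P e^{B} \in B^{\delta}$ and, provided $\|B\|_G \le 1/2$, the following estimate holds:
$$\|e^{-B} P e^{B} - P\|_{\delta} \le 4 \|P\|_{\delta} \|B\|_G. \qquad (5.4)$$
Moreover, if both $B$ and $P$ are Lipschitz continuous with respect to $\omega$, then
$$\|e^{-B} P e^{B} - P\|^{L}_{\delta} \le 4 \|P\|^{L}_{\delta} \|B\|^{G,L}. \qquad (5.5)$$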
Proof. Define $P(t) := e^{-tB} P e^{tB}$. Then $P(t)$ fulfills the linear differential equation
$$\dot{P} = [B, P], \qquad P(0) = P,$$
whence
$$\|\dot{P}(t)\|_{\delta} \le 2 \|B\|_G \|P(t)\|_{\delta} \ \Longrightarrow\ \|P(t)\|_{\delta} \le \exp(2 \|B\|_G\, t)\, \|P\|_{\delta}.$$
Then (5.4) follows on account of
$$P(t) - P = \int_0^t [B, P(s)]\, ds.$$
To obtain the Lipschitz estimate, remark that (same notation as in the proof of Lemma 3.2) $\Delta P$ fulfills the equation $(\Delta P)^{\cdot} = [\Delta B, P] + [B, \Delta P]$, and then proceed as in the estimation of the operator norm. □
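The estimate (5.4) of Lemma 5.3 can be illustrated in the simplest finite-dimensional setting. The following is only a sketch under stated assumptions: the norms $\|\cdot\|_\delta$ and $\|\cdot\|_G$ of the paper are replaced by the ordinary operator norm on real 2×2 matrices, $B$ is taken to be the antisymmetric matrix $\theta J$ (so that $e^{B}$ is a rotation), and the helper names `matmul`, `op_norm` and `conjugation_defect` are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def op_norm(m):
    """Operator (spectral) norm of a real 2x2 matrix:
    square root of the largest eigenvalue of m^T m."""
    (a, b), (c, d) = m
    s = a * a + b * b + c * c + d * d          # trace of m^T m
    det = a * d - b * c                        # det(m^T m) = det**2
    disc = max(s * s - 4.0 * det * det, 0.0)
    return math.sqrt((s + math.sqrt(disc)) / 2.0)

def conjugation_defect(theta, P):
    """Return (||e^{-B} P e^{B} - P||, 4 ||P|| ||B||) for B = theta * J,
    J = [[0, 1], [-1, 0]], so that e^{B} = [[cos, sin], [-sin, cos]]
    and ||B|| = |theta| in the operator norm (assumed stand-in norms)."""
    c, s = math.cos(theta), math.sin(theta)
    exp_b = [[c, s], [-s, c]]
    exp_minus_b = [[c, -s], [s, c]]
    conj = matmul(matmul(exp_minus_b, P), exp_b)
    diff = [[conj[i][j] - P[i][j] for j in range(2)] for i in range(2)]
    return op_norm(diff), 4.0 * op_norm(P) * abs(theta)

if __name__ == "__main__":
    P = [[1.0, 0.3], [0.3, -2.0]]              # a symmetric "perturbation"
    for theta in (0.05, 0.25, 0.5):            # ||B|| <= 1/2 as in the lemma
        lhs, rhs = conjugation_defect(theta, P)
        print(theta, lhs, rhs, lhs <= rhs)
```

In this toy model the bound follows from the same integral argument as in the proof: $\|P(1) - P\| \le (e^{2\|B\|} - 1)\|P\|$, and $(e^{2x} - 1) \le 4x$ for $0 \le x \le 1/2$.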
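Lemma 5.4. Let $B \in G$ be the solution of Eq. (3.13) and let $0 < \sigma < s/2$. Then:
$$\|e^{-B} A^- e^{B} - A^- - [A^-, B]\|_{\delta,s-2\sigma} \mathrel{\le\!\cdot} \|B\|^{G}_{s-\sigma} \Big(\frac{1}{\sigma} \|B\|_{\delta,s-\sigma} + \|P^-\|_{\delta}\Big), \qquad (5.6)$$
$$\|e^{-B} A^- e^{B} - A^- - [A^-, B]\|^{L}_{\delta,s-2\sigma} \mathrel{\le\!\cdot} \|B\|^{G,L}_{s-\sigma} \Big(\frac{1}{\sigma} \|B\|^{L}_{\delta,s-\sigma} + \|P^-\|^{L}_{\delta}\Big). \qquad (5.7)$$
Proof. The proof goes by the same argument as that of Lemma 5.3; just use the formula
$$e^{-B} A^- e^{B} - A^- - [A^-, B] = \int_0^1 ds \int_0^s e^{-s_1 B} [[A^-, B], B] e^{s_1 B}\, ds_1$$
and compute $[A^-, B]$ from Eq. (3.13). The assertion easily follows. □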
Lemma 5.5. Assume that the sequence $\lambda_i$ fulfills Assumption H1a) of Sect. 3 and (3.4), and fix $\alpha < C_\lambda/2$; then if $i \ne j$ the set $R_{ijk}(\alpha |i^d - j^d|)$ is empty if $|k| < (C_\lambda/2) |i^d - j^d|$.

The proof of this lemma is straightforward and therefore omitted.

Lemma 5.6. If the sequence $\lambda_i$ fulfills Assumption H1a) and (3.4), there exists $C > 0$ such that, if
$$\frac{n C_\omega}{C_\lambda} \le \frac{1}{2},$$
then one has
$$|R_{ijk}(\alpha)| \le \frac{C\alpha}{|k|}.$$
Proof. Following the proof of Lemma 5 of ref. [18] we fix v ∈ {−1, 1}n such that v · k = |k| and write ω = av + w with w ∈ v ⊥ . One has that, as a function of a t (ω · k)s = |k|(t − s),
t λi − λj s ≤ Cω (i δ + j δ )|v|(t − s),
so, by Lemma 5.5, either Rij k is empty or
t 1 nCω 2 ≥ |k|(t − s), (ω · k + λi − λj ) s ≥ |k|(t − s) 1 − Cλ 2 and therefore by the assumption we can conclude Rij k (α) ≤ 4 α. |k|
" !
References
1. Arnold, V.I.: Chapitres supplémentaires de la théorie des équations différentielles ordinaires. Moscou: Mir, 1980
2. Bellissard, J.: Stability and instability in quantum mechanics. In: Trends and Developments in the Eighties, S. Albeverio and Ph. Blanchard, Editors, Singapore: World Scientific, 1985, pp. 1–106
3. Combescure, M.: The quantum stability problem for time-periodic perturbation of the harmonic oscillator. Ann. Inst. H. Poincaré 47, 62–82 (1987); Erratum ibidem, 451–454
4. Dimassi, M., Sjöstrand, J.: Spectral Asymptotics in the Semiclassical Limit. London Math. Soc. Lecture Notes Series 268, Cambridge: Cambridge University Press, 1999
5. Duclos, P., Stovicek, P.: Floquet Hamiltonians with Pure Point Spectrum. Commun. Math. Phys. 177, 327–347 (1996)
6. Duclos, P., Stovicek, P., Vittot, M.: Perturbation of an eigen-value from a dense point spectrum: A general Floquet Hamiltonian. Ann. Inst. H. Poincaré Phys. Théor. 71, 241–301 (1999)
7. Gallavotti, G.: The Elements of Mechanics. Berlin–Heidelberg–New York: Springer-Verlag, 1983
8. Graffi, S., Yajima, K.: Absolute Continuity of the Floquet Spectrum for a Nonlinearly Forced Harmonic Oscillator. Commun. Math. Phys. 215, 245–250 (2000)
9. Hagedorn, G., Loss, M., Slawny, J.: Non-stochasticity of time-dependent quadratic Hamiltonians and the spectra of canonical transformations. J. Phys. A 19, 521–531 (1986)
10. Howland, J.: Floquet Operators with Singular Spectrum, I. Ann. Inst. H. Poincaré 49, 309–323 (1989); II, ibidem, 325–334 (1989)
11. Jauslin, H.R., Monti, F.: Quantum Nekhoroshev theorem for quasi-periodic Floquet Hamiltonians. Rev. Math. Phys. 10, 393–428 (1998)
12. Jauslin, H.R., Lebowitz, J.L.: Spectral and stability aspects of quantum chaos. Chaos 1, 114–121 (1991)
13. Jorba, A., Simó, C.: On the reducibility of linear differential equations with quasiperiodic coefficients. J. Differ. Eqs. 98, 111–124 (1992)
14. Joye, A.: Absence of absolutely continuous spectrum of Floquet operators. J. Stat. Phys. 75, 929–952 (1994)
15. Kappeler, T., Pöschel, J.: Perturbation of KdV Equations – The KAM proof. Preprint 1997
16. Kuksin, S.B.: On small-denominators equations with large variable coefficients. J. Appl. Math. Phys. (ZAMP) 48, 262–271 (1997)
17. Nenciu, G.: Floquet operators without absolutely continuous spectrum. Ann. Inst. H. Poincaré 59, 91–97 (1993)
18. Pöschel, J.: A KAM-Theorem for some Partial Differential Equations. Ann. Scuola Norm. Sup. Pisa Cl. Sci. 23, 119–148 (1996)
19. Shubin, M.A.: Pseudodifferential Operators and Spectral Theory. Berlin–Heidelberg–New York: Springer-Verlag, 1987
20. Xu, J., Zheng, Q.: On the reducibility of linear differential equations with quasiperiodic coefficients which are degenerate. Proc. Am. Math. Soc. 126, 1445–1451 (1998)
21. Yajima, K.: Scattering Theory for Schrödinger Operators with Potentials Periodic in Time. J. Math. Soc. Japan 29, 729–743 (1977)

Communicated by B. Simon
Commun. Math. Phys. 219, 481 – 487 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Absolutely Singular Dynamical Foliations
David Ruelle1, Amie Wilkinson2
1 I.H.E.S., 35, route de Chartres, 91440 Bures-sur-Yvette, France. E-mail: [email protected]
2 Department of Mathematics, Northwestern University, 2033 Sheridan Rd., Evanston, IL 60208-2730, USA. E-mail: [email protected]
Received: 26 September 2000 / Accepted: 8 December 2000
Abstract: Let $A_3$ be the product of the automorphism $\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ of $T^2$ and of the identity on $T^1$. A small perturbation $g$ of $A_3$ among volume preserving diffeomorphisms will have an invariant family of smooth circles forming a continuous foliation of $T^3$. Corresponding to the vector bundle tangent to the circles there is a “central” Lyapunov exponent of ($g$, volume), which is nonzero for an open set of ergodic $g$’s. This surprising result of Shub and Wilkinson is complemented here by showing that the volume on $T^3$ has atomic conditional measures on the circles: there is a finite $k$ such that almost every circle carries $k$ atoms of mass $1/k$.

Introduction
Let $A_2$ be the automorphism of the 2-torus, $T^2 = \mathbb{R}^2/\mathbb{Z}^2$, given by $\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$. Let $A_3$ be the automorphism of the 3-torus $T^3 = \mathbb{R}^3/\mathbb{Z}^3$ given by $\begin{pmatrix} A_2 & 0 \\ 0 & 1 \end{pmatrix}$. Let $\mathrm{Diff}^2_\mu(T^3)$ be the set of $C^2$ diffeomorphisms of $T^3$ that preserve Lebesgue-Haar measure $\mu$. In [SW1], M. Shub and A. Wilkinson prove the following theorem.

Theorem. Arbitrarily close to $A_3$ there is a $C^1$-open set $U \subset \mathrm{Diff}^2_\mu(T^3)$ such that for each $g \in U$,
1. $g$ is ergodic.
2. There is an equivariant fibration $\pi : T^3 \to T^2$ such that $\pi g = A_2 \pi$. The fibers of $\pi$ are the leaves of a foliation $W^c_g$ of $T^3$ by $C^2$ circles. The 2-jets along the fibers of $\pi$ vary continuously in M.
3. There exists $\lambda^c > 0$ such that, for $\mu$-almost every $w \in T^3$, if $v \in T_w T^3$ is tangent to the leaf of $W^c_g$ containing $w$, then
$$\lim_{n \to \infty} \frac{1}{n} \log \|T_w g^n v\| = \lambda^c.$$
4. Consequently, there exists a set $S \subseteq T^3$ of full $\mu$-measure that meets every leaf of $W^c_g$ in a set of leaf-measure 0. The foliation $W^c_g$ is not absolutely continuous.

Additionally, it is shown that the diffeomorphisms in $U$ are nonuniformly hyperbolic and Bernoullian. In this note, we prove:

Theorem I. Let $g$ satisfy conclusions 1–3 of the previous theorem. Then there exist $S \subseteq T^3$ of full $\mu$-measure and $k \in \mathbb{N}$ such that $S$ meets every leaf of $W^c_g$ in exactly $k$ points. The foliation $W^c_g$ is absolutely singular.

Remark. Theorem I was also proved several years ago by Anatole Katok, as a first step in an attempt to show that examples such as those later constructed in [SW1] cannot exist (since the full argument turned out not to be valid, this work remains unpublished). We are indebted to Katok for useful conversations, and for pointing out the argument that shows that the number $k$ in Theorem I might necessarily be greater than 1. We also thank Michael Shub for useful conversations.

In Katok’s example of an absolutely singular foliation in [Mi], the leaves of the foliation meet the set of full measure in one point. In the [SW1] examples, the set $S$ may necessarily meet leaves of $W^c_g$ in more than one point, as the following argument of Katok’s shows. It follows from Theorem II in [SW2] that for $k \in \mathbb{Z}_+$ and for small $a, b > 0$, the map $g = j_{a,k} \circ h_b$ satisfies the hypotheses of Theorem I, where
$$h_b(x, y, z) = (2x + y,\ x + y,\ x + y + z + b \sin 2\pi y),$$
and
$$j_{a,k}(x, y, z) = (x, y, z) + a \cos(2\pi k z) \cdot (1 + \sqrt{5},\ 2,\ 0).$$
For $k \in \mathbb{N}$, let $\rho_k$ be the vertical translation that sends $(x, y, z)$ to $(x, y, z + \frac{1}{k})$. Note that $h_b \circ \rho_k = \rho_k \circ h_b$ and $j_{a,k} \circ \rho_k = \rho_k \circ j_{a,k}$. Thus $g \circ \rho_k = \rho_k \circ g$. The fibration $\pi : T^3 \to T^2$ was obtained in [SW1] by using the persistence of normally hyperbolic submanifolds under perturbations. In the present case the symmetries $\rho_k$ preserve the fibers of the trivial fibration $P : T^3 \to T^2$ from which one starts, and also the maps $g$. Therefore the fibers of $\pi : T^3 \to T^2$ (i.e., the leaves of the center foliation $W^c_g$) are invariant under the action of the finite group $\langle \rho_k \rangle$. Let $S$ be the (full measure) set of points in $T^3$ for which the center direction is a positive Lyapunov direction (i.e. for which conclusion 3 holds). Since $\rho_k(W^c_g) = W^c_g$, it follows that $\rho_k S = S$. If $p \in S \cap W^c(p)$, then $\rho_k(p) \in \rho_k(S) \cap \rho_k(W^c(p)) = S \cap W^c(p)$; that is, $S \cap W^c(p)$ contains at least $k$ points. Thus Theorem I is “sharp” in the sense that we cannot say more about the value of $k$ in general.

Theorem I has an interesting interpretation. Recall that a $G$-extension of a dynamical system $f : X \to X$ is a map $f_\rho : X \times G \to X \times G$, where $G$ is a compact group, of the form $(x, y) \mapsto (f(x), \rho(x) y)$. If $f$ preserves $\nu$, and $\rho : X \to G$ is measurable, then $f_\rho$ preserves the product of $\nu$ with Haar measure on $G$. A $\mathbb{Z}/k\mathbb{Z}$-extension is also called a $k$-point extension. Let $\lambda$ be an invariant probability measure for a $k$-point extension of $f : X \to X$, and $\{\lambda_x\}$ the family of conditional measures associated with the partition $\{\{x\} \times G\}$. We remark that if $\lambda$ is ergodic, then each atom of $\lambda_x$ must have the same weight $1/k$ (up to a set of $\lambda$-measure 0).
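The symmetry used in Katok's argument above, namely that $g = j_{a,k} \circ h_b$ commutes with the vertical translation $\rho_k$, can be checked numerically. The sketch below takes the formulas for $h_b$, $j_{a,k}$ and $\rho_k$ exactly as printed in the text and works on the torus with coordinates reduced mod 1; the function names and the sample parameter values are assumptions made for illustration only.

```python
import math

def h(p, b):
    """h_b(x, y, z) = (2x + y, x + y, x + y + z + b sin(2 pi y)), mod 1."""
    x, y, z = p
    return ((2 * x + y) % 1.0,
            (x + y) % 1.0,
            (x + y + z + b * math.sin(2 * math.pi * y)) % 1.0)

def j(p, a, k):
    """j_{a,k}(x, y, z) = (x, y, z) + a cos(2 pi k z) (1 + sqrt 5, 2, 0), mod 1."""
    x, y, z = p
    s = a * math.cos(2 * math.pi * k * z)
    return ((x + s * (1 + math.sqrt(5))) % 1.0, (y + 2 * s) % 1.0, z % 1.0)

def rho(p, k):
    """Vertical translation by 1/k."""
    x, y, z = p
    return (x % 1.0, y % 1.0, (z + 1.0 / k) % 1.0)

def g(p, a, b, k):
    """g = j_{a,k} o h_b."""
    return j(h(p, b), a, k)

def torus_dist(p, q):
    """Sup-distance on the 3-torus R^3/Z^3 between points with coordinates in [0, 1]."""
    return max(min(abs(u - v), 1.0 - abs(u - v)) for u, v in zip(p, q))

if __name__ == "__main__":
    a, b, k = 0.01, 0.02, 3            # illustrative values, not from the paper
    p = (0.123, 0.456, 0.789)
    print(torus_dist(g(rho(p, k), a, b, k), rho(g(p, a, b, k), k)))
```

The check works because the shift $z \mapsto z + 1/k$ passes unchanged through the third coordinate of $h_b$ and leaves $\cos(2\pi k z)$ invariant, which is exactly why $j_{a,k}$ is built with frequency $k$.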
Now take $g \in U$. Choose a coherent orientation on the leaves of $\{\pi^{-1}(x)\}_{x \in T^2}$. Take $h : T^3 \to T^2 \times T$ to be any continuous change of coordinates such that $h$ restricted to $\pi^{-1}(x)$ is smooth and orientation preserving to $\{x\} \times T$. We may then write $F = h \circ g \circ h^{-1} : T^2 \times T \to T^2 \times T$ in the form $F(x, p) = (A_2 x, \varphi_x(p))$, where $\varphi_x : T \to T$ is smooth and orientation preserving. If $P : T^2 \times T \to T^2$ is the projection on the first factor of the product, we have $P \circ h = \pi$. Therefore, writing $\lambda = h_* \mu$, we have $P_* \lambda = \pi_* \mu$. Let $\{\lambda_x\}$ be the disintegration of the measure $\lambda$ along the fibers $\{x\} \times T$. By a further measurable change of coordinates, smooth along each $\{x\} \times T$ fiber, we may assume that $\lambda$-almost everywhere, the atoms of $\lambda_x$ are at $l/k$, for $l = 0, \ldots, k - 1$. But then $\varphi_x$ permutes the atoms cyclically, and we obtain the following corollary.

Corollary. For every $g \in U$ there exists $k \in \mathbb{N}$ such that $(T^3, \mu, g)$ is isomorphic to an (ergodic) $k$-point extension of $(T^2, \pi_* \mu, A_2)$.

M. Shub has observed that if $g = j_{a,k} \circ h_b$, then entropy considerations imply that $\pi_* \mu$ is actually Lebesgue measure on $T^2$. Hence, for this $g$, $\pi$ is a finite-to-one semiconjugacy between $g$ and $A_2$, sending Lebesgue measure on $T^3$ to Lebesgue measure on $T^2$.

1. Proof of Theorem I

The proof of Theorem I follows from a more general result about fibered diffeomorphisms. Before stating this result, we describe the underlying setup and assumptions. Let $(X, \nu)$ be a probability space, and let $f : X \to X$ be invertible and ergodic with respect to $\nu$. It is convenient to assume that $X$ is in fact a Polish topological space since this assumption is made in the study of random smooth dynamical systems in [BL] (see footnote 1). Let $M$ be a compact Riemannian manifold and $\varphi$ a map $X \to \mathrm{Diff}^{1+\alpha}(M)$. Consider the skew-product transformation $F : X \times M \to X \times M$ given by $F(x, p) = (f(x), \varphi_x(p))$ and assume that it is (Borel) measurable. Also, let $\mu$ be an $F$-invariant ergodic probability measure on $X \times M$ such that $\pi_* \mu = \nu$, where $\pi : X \times M \to X$ is the projection onto the first factor.

For $x \in X$, let $\varphi^{(0)}_x$ be the identity map on $M$ and for $k \in \mathbb{Z}$, define $\varphi^{(k)}_x$ by
$$\varphi^{(k+1)}_x = \varphi_{f^k(x)} \circ \varphi^{(k)}_x.$$
Since the tangent bundle to $M$ is measurably trivial, the derivative map of $\varphi$ along the $M$ direction gives a cocycle $X \times M \times \mathbb{Z} \to GL(n, \mathbb{R})$, where $n = \dim(M)$:
$$(x, p, k) \mapsto D_p \varphi^{(k)}_x.$$
If $\log^+ \|D\varphi\| \in L^1(X \times M, \mu)$, then Oseledec’s Theorem and ergodicity imply that the Lyapunov exponents $\lambda_1 < \lambda_2 < \cdots < \lambda_l$ of this cocycle exist and are constant for $\mu$-a.e. $(x, p)$. We call these the fiberwise exponents of $F$.

Footnote 1: A Polish space is a separable topological space with topology given by a complete metric. We use only the Borel structure defined by the topology.
The next result, Theorem II, states that if these exponents are negative, then $\mu$ has atomic disintegration along $M$-fibers of $X \times M$. The proof of Theorem II uses a fibered Pesin stable manifold theorem, which requires a stronger hypothesis on $\varphi$ than integrability of $\log^+ \|D\varphi\|$. Namely, we assume that for some $\alpha > 0$, $\log^+ \|D\varphi\|_\alpha \in L^1(X, \nu)$, where $\|\cdot\|_\alpha$ is the $\alpha$-Hölder norm.

Theorem II. Suppose that $\lambda_l < 0$. Then there exists a set $S \subseteq X \times M$ and an integer $k \ge 1$ such that
• $\mu(S) = 1$,
• For every $(x, p) \in S$, we have $\#(S \cap \{x\} \times M) = k$.

This has the immediate corollary:

Corollary. Let $f \in \mathrm{Diff}^{1+\alpha}(M)$. If $\mu$ is an ergodic measure with all of its exponents negative, then it is concentrated on the orbit of a periodic sink.

The corollary has a simple proof using regular neighborhoods. Our proof is a fibered version. Theorem I is also a corollary of Theorem II. For this, the argument is actually applied to the inverse of $g$, which has negative fiberwise exponents, rather than to $g$ itself, whose fiberwise exponents are positive. As we described in the previous remarks, there is a continuous change of coordinates, smooth along the fibers of $\pi$, in which $g^{-1}$ is expressed as a skew product of $T^2 \times T$: $F(x, p) = (A_2 x, \varphi_x(p))$. Since the 2-jets of the fibers of $\pi$ vary continuously (by Assumption 2), the maps $x \mapsto \varphi_x$ can be chosen to vary continuously in the $C^2$-norm on $\mathrm{Diff}^2(M)$. This implies that $\log^+ \|D\varphi\|_\alpha \in L^1(X, \nu)$, for $\alpha = 1$.

Remark. Without the assumption that $f$ is invertible, Theorem II is false. An example is described by Y. Kifer [Ki], which we recall here. Let $f : T \to T$ be a $C^{1+\alpha}$ diffeomorphism with exactly two fixed points, one attracting and one repelling. Consider the following random diffeomorphism of $T$: with probability $p \in (0, 1)$, apply $f$, and with probability $1 - p$, rotate by an angle chosen randomly from the interval $[-\epsilon, \epsilon]$. Let $X = (\{0, 1\} \times T)^{\mathbb{N}}$. To generate a sequence of diffeomorphisms $f_0, f_1, \ldots$ according to the above rule, we first define $\varphi : X \to \mathrm{Diff}^{1+\alpha}(T)$ by
$$\varphi(\omega) = \begin{cases} f & \text{if } \omega(0) = (0, \theta), \\ R_\theta & \text{if } \omega(0) = (1, \theta), \end{cases}$$
where $R_\theta$ is rotation through angle $\theta$. Next, we let $\nu_\epsilon$ be the product of the $(p, 1 - p)$-measure on $\{0, 1\}$ with the measure on $T$ that is uniformly distributed on $[-\epsilon, \epsilon]$. Then corresponding to $\nu_\epsilon^{\mathbb{N}}$-almost every element $\omega \in X$ is the sequence $\{f_k = \varphi(\sigma^k(\omega))\}_{k=0}^{\infty}$, where $\sigma : X \to X$ is the one-sided shift $\sigma(\omega)(n) = \omega(n + 1)$. Put another way, the random diffeomorphism is generated by the (noninvertible) skew product $\tau : X \times T \to X \times T$, where $\tau(\omega, x) = (\sigma(\omega), \varphi(\omega)(x))$. An ergodic $\nu_\epsilon$-stationary measure for this random diffeomorphism is a measure $\mu_\epsilon$ on $T$ such that $\mu_\epsilon \times \nu_\epsilon^{\mathbb{N}}$ is $\tau$-invariant and ergodic. Such measures always exist ([Ki], Lemma I.2.2), but, for this example, there is an ergodic stationary measure with additional special properties.
Specifically, for every $\epsilon > 0$, there exists an ergodic $\nu_\epsilon$-stationary measure $\mu_\epsilon$ on $T$ such that, as $\epsilon \to 0$, $\mu_\epsilon \to \delta_{x_0}$ in the weak topology, where $\delta_{x_0}$ is Dirac measure concentrated on the sink $x_0$ for $f$. From this, it follows that, as $\epsilon \to 0$, the fiberwise Lyapunov exponent for $\mu_\epsilon$ approaches $\log |f'(x_0)| < 0$, which is the Lyapunov exponent of $\delta_{x_0}$. Thus, for $\epsilon$ sufficiently small, the fiberwise exponent for $\tau$ with respect to $\mu_\epsilon$ is negative. Nonetheless, it is easy to see that $\mu_\epsilon$ for $\epsilon > 0$ cannot be uniformly distributed on $k$ atoms; if $\mu_\epsilon$ were atomic, then $\tau$-invariance of $\mu_\epsilon \times \nu_\epsilon^{\mathbb{N}}$ would imply that, for every $x \in T$,
$$\mu_\epsilon(\{x\}) = p\, \mu_\epsilon(\{f^{-1}(x)\}) + (1 - p) \int_{-\epsilon}^{\epsilon} \mu_\epsilon(\{R_\theta(x)\})\, d\theta = p\, \mu_\epsilon(\{f^{-1}(x)\}),$$
which is impossible if $\mu_\epsilon$ has finitely many atoms. In fact, $\mu_\epsilon$ can be shown to be absolutely continuous with respect to Lebesgue measure (see [Ki], p. 173 ff. and the references cited therein). Hence invertibility is essential, and we indicate in the proof of Theorem II where it is used.

Proof of Theorem II. We first establish the existence of fiberwise “stable manifolds” for the skew product $F$. A general theory of stable manifolds for random dynamical systems is worked out in [Ki] (Theorem V.1.6) and more explicitly in [BL]. Since we are assuming that all of the fiberwise exponents for $F$ are negative, we are faced with the simpler task of constructing fiberwise regular neighborhoods for $F$ (see the Appendix by Katok and Mendoza in [KH]). We outline a proof, following closely [KH].

Theorem 1.1 (Existence of Regular Neighborhoods). There exists a set $\Lambda_0 \subseteq X \times M$ of full measure such that for $\epsilon > 0$:
• There exists a measurable function $r : \Lambda_0 \to (0, 1]$ and a collection of embeddings $\Phi_{(x,p)} : B(0, q(x, p)) \to M$ such that $\Phi_{(x,p)}(0) = p$ and $\exp(-\epsilon) < r(F(x, p))/r(x, p) < \exp(\epsilon)$.
• If $\varphi_{(x,p)} = \Phi^{-1}_{F(x,p)} \circ \varphi_x \circ \Phi_{(x,p)} : B(0, r(x, p)) \to \mathbb{R}^n$, then $D_0 \varphi_{(x,p)}$ satisfies
$$\exp(\lambda_1 - \epsilon) \le \|D_0 \varphi_{(x,p)}^{-1}\|^{-1}, \qquad \|D_0 \varphi_{(x,p)}\| \le \exp(\lambda_l + \epsilon).$$
• The $C^1$ distance $d_{C^1}(\varphi_{(x,p)}, D_0 \varphi_{(x,p)}) < \epsilon$ in $B(0, r(x, p))$.
• There exist a constant $K > 0$ and a measurable function $A : \Lambda_0 \to \mathbb{R}$ such that for $y, z \in B(0, r(x, p))$,
$$K^{-1} d(\Phi_{(x,p)}(y), \Phi_{(x,p)}(z)) \le \|y - z\| \le A(x)\, d(\Phi_{(x,p)}(y), \Phi_{(x,p)}(z)),$$
with $\exp(-\epsilon) < A(F(x, p))/A(x, p) < \exp(\epsilon)$.

Proof. See the proof of Theorem S.3.1 in [KH].

Decompose $\mu$ into a system of fiberwise measures $d\mu(x, p) = d\mu_x(p)\, d\nu(x)$. Invariance of $\mu$ with respect to $F$ implies that, for $\nu$-a.e. $x \in X$, $(\varphi_x)_* \mu_x = \mu_{f(x)}$.

Corollary. There exists a set $\Lambda \subseteq X \times M$, and real numbers $R > 0$, $C > 0$, and $c < 1$ such that
(1) $\mu(\Lambda) > 0.5$, and, if $(x, p) \in \Lambda$, then $\mu_x(\Lambda_x) > 0.5$, where $\Lambda_x = \{p \in M \mid (x, p) \in \Lambda\}$.
(2) If $(x, p) \in \Lambda$ and $d_M(p, q) \le R$, then $d_M(\varphi^{(m)}_x(p), \varphi^{(m)}_x(q)) \le C c^m d_M(p, q)$, for all $m \ge 0$.

Proof. This follows in a standard way from the Mean Value Theorem and Lusin’s Theorem.

To prove Theorem II, it suffices to show that there is a positive $\nu$-measure set $B \subseteq X$, such that for $x \in B$, the measure $\mu_x$ has an atom, as the following argument shows. For $x \in X$, let $d(x) = \sup_{p \in M} \mu_x(p)$. Clearly $d$ is measurable, $f$-invariant, and positive on $B$. Ergodicity of $f$ implies that $d(x) = d > 0$ is positive and constant for almost all $x \in X$. Let $S = \{(x, p) \in X \times M \mid \mu_x(p) \ge d\}$. Observe that $S$ is $F$-invariant, has measure at least $d$, and hence has measure 1. The conclusions of Theorem II follow immediately.

Let $\Lambda$, $R > 0$, $C > 0$, and $c < 1$ be given by Corollary 1, and let $B = \pi(\Lambda)$. Let $N$ be the number of $R/10$-balls needed to cover $M$. We now show that for $\nu$-almost every $x \in B$, the measure $\mu_x$ has at least one atom. For $x \in X$, let
$$m(x) = \inf \sum_{j=1}^{k} \mathrm{diam}(U_j),$$
where the infimum is taken over all collections of closed balls $U_1, \ldots, U_k$ in $M$ such that $k \le N$ and $\mu_x(\bigcup_{j=1}^{k} U_j) \ge 0.5$. Let $m = \operatorname{ess\,sup}_{x \in B} m(x)$. We now show that $m = 0$. If $m > 0$, then there exists an integer $J$ such that
$$C \Delta c^J N < m/2, \qquad (1)$$
where $\Delta$ is the diameter of $M$. Let $\mathcal{U}$ be a cover of $M$ by $N$ closed balls of radius $R/10$. For $x \in B$, let $U_1(x), \ldots, U_{k(x)}(x)$ be those balls in $\mathcal{U}$ that meet $\Lambda_x$. Since these balls cover $\Lambda_x$, and $\mu_x(\Lambda_x) > 0.5$, it follows that $\mu_x(\bigcup_{j=1}^{k(x)} U_j(x)) \ge 0.5$. But $(\varphi^{(i)}_x)_* \mu_x = \mu_{f^i(x)}$, and so it is also true that
$$\mu_{f^i(x)}\Big(\bigcup_{j=1}^{k(x)} \varphi^{(i)}_x(U_j(x))\Big) \ge 0.5, \qquad (2)$$
for all $i$. We now use the fact that $\varphi^{(i)}_x$ contracts regular neighborhoods to derive a contradiction. The balls $U_j(x)$ meet $\Lambda_x$ and have diameter less than $R/10$, and so by Corollary 1, (2), we have
$$\mathrm{diam}(\varphi^{(i)}_x(U_j(x))) \le C \Delta c^i. \qquad (3)$$
Let $\tau : B \to \mathbb{N}$ be the first-return time of $f^J$ to $B$, so that $f^{J\tau(x)}(x) \in B$, and $f^{Ji}(x) \notin B$, for $i \in \{1, \ldots, \tau(x) - 1\}$. Decompose the set $B$ according to these first return times:
$$B = \bigsqcup_{i=1}^{\infty} B_i \pmod 0,$$
where $B_i = \tau^{-1}(i)$. Because $f$ is invertible and $f^{-1}$ preserves measure, we also have the mod 0 equivalence:
$$B' := \bigsqcup_{i=1}^{\infty} f^{Ji}(B_i) = B \pmod 0.$$
Let $y \in B'$. Then $y = f^{Ji}(x)$, where $x \in B_i \subseteq B$, for some $i \ge 1$. It follows from the definition of $m(y)$ and inequalities (2), (3) and (1) that
$$m(y) \le \sum_{j=1}^{k(x)} \mathrm{diam}(\varphi^{(Ji)}_x(U_j(x))) \le C k(x) \Delta c^{Ji} \le C N \Delta c^{J} < m/2.$$
But then $m = \operatorname{ess\,sup}_{x \in B} m(x) = \operatorname{ess\,sup}_{y \in B'} m(y) < m/2$, contradicting the assumption $m > 0$. Thus $m = 0$, and, for $\nu$-almost every $x \in B$, we have $m(x) = 0$. If $m(x) = 0$, then there is a sequence of closed balls $U^1(x), U^2(x), \ldots$ with $\lim_{i \to \infty} \mathrm{diam}(U^i(x)) = 0$ and $\mu_x(U^i(x)) \ge 0.5/N$, for all $i$. Take $p_i \in U^i(x)$; any accumulation point of $\{p_i\}$ is an atom for $\mu_x$. Since we have shown that $\mu_x$ has an atom, for $\nu$-a.e. $x \in B$, the proof of Theorem II is complete.
References
[BL] Bahnmüller, J. and Liu, P.-D.: Characterization of measures satisfying Pesin’s entropy formula for random dynamical systems. J. Dynam. Diff. Eq. 10, no. 3, 425–448 (1998)
[KH] Katok, A. and Hasselblatt, B.: Introduction to the modern theory of dynamical systems. Cambridge, 1995
[Ki] Kifer, Y.: Ergodic theory of random transformations. Boston: Birkhäuser, 1986
[Mi] Milnor, J.: Fubini foiled: Katok’s paradoxical example in measure theory. Math. Intelligencer 19, no. 2, 30–32 (1997)
[SW1] Shub, M. and Wilkinson, A.: Pathological foliations and removable zero exponents. Inv. Math. 139, 495–508 (2000)
[SW2] Shub, M. and Wilkinson, A.: A stably Bernoullian diffeomorphism that is not Anosov. Preprint

Communicated by Ya. G. Sinai
Commun. Math. Phys. 219, 489 – 522 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Modulating Pulse Solutions for a Class of Nonlinear Wave Equations
M. D. Groves1, G. Schneider2
1 Mathematisches Institut A, Universität Stuttgart, Pfaffenwaldring 57, 70569 Stuttgart, Germany
2 Mathematisches Institut, Universität Bayreuth, 95440 Bayreuth, Germany
Received: 8 November 2000 / Accepted: 12 December 2000
Abstract: We consider modulating pulse solutions for a nonlinear wave equation on the infinite line. Such a solution consists of a permanent pulse-like envelope steadily advancing in the laboratory frame and modulating an underlying wave-train. The problem is formulated as an infinite-dimensional dynamical system with one stable, one unstable and infinitely many neutral directions. Using a partial normal form and invariant-manifold theory we establish the existence of modulating pulse solutions which decay to small-amplitude disturbances at large distances.

Supported by a Research Fellowship from the Alexander von Humboldt Foundation. Permanent address: Department of Mathematical Sciences, Loughborough University, Loughborough, LE11 3TU, UK

1. Introduction

1.1. The problem and main result. We consider the nonlinear wave equation
$$\partial_t^2 u = \partial_x^2 u - u + g(u)$$
on the infinite line $x \in \mathbb{R}$, where $g : \mathbb{R} \to \mathbb{R}$ is a smooth, odd function which satisfies $g(u) = O(u^3)$ and $g'''(0) > 0$. This class of equations includes for example the sine-Gordon equation $\partial_t^2 u = \partial_x^2 u - \sin u$ and the $\phi^4$ equation
$$\partial_t^2 u = \partial_x^2 u - u + u^3, \qquad (1)$$
upon which we will concentrate in order to keep the notation simple. It is well known that on time-scales of order $O(1/\epsilon^2)$ Eq. (1) has $O(\epsilon)$-amplitude solutions which are slow spatial and temporal modulations of an underlying wave train $e^{i(k_0 x - \omega_0 t)}$, where $k_0$ and $\omega_0$ are related by the linear dispersion relation $\omega_0^2 = k_0^2 + 1$. Such solutions are described by the formula
$$u_A(x, t) = \epsilon (A(X, T) e^{i(k_0 x - \omega_0 t)} + \mathrm{c.c.}) + O(\epsilon^2),$$
where $X = \epsilon(x - c_g t)$, $T = \epsilon^2 t$, $c_g = k_0/(1 + k_0^2)^{1/2}$ is the linear group velocity and $A$ satisfies the nonlinear Schrödinger equation
$$2i\omega_0 \partial_T A + (1 - c_g^2) \partial_X^2 A + 3|A|^2 A = 0 \qquad (2)$$
(e.g. see Kalyakin [10], Kirrmann, Schneider and Mielke [13] and Schneider [23]). Equation (2) possesses a three-parameter family of time-periodic solutions of the form
$$A(X, T) = B(X - X_0) e^{-i\gamma_0 T} e^{i\phi_0},$$
in which the real-valued function $B$ satisfies the second-order ordinary differential equation
$$B_{XX} = C_1 B - C_2 B^3,$$
where $C_1 = -2\gamma_0 \omega_0/(1 - c_g^2)$, $C_2 = 3/(1 - c_g^2)$. For $\gamma_0 < 0$ and $\omega_0 > 0$ this equation has two homoclinic solutions
$$B_{\mathrm{pulse}}(X) = \pm \Big(\frac{2C_1}{C_2}\Big)^{1/2} \mathrm{sech}\,(C_1^{1/2} X)$$
which connect the origin with itself. This procedure therefore identifies modulating pulse solutions of the nonlinear wave equation which are described by the approximate formula
$$u_{\mathrm{pulse}} = \epsilon (B_{\mathrm{pulse}}(X) e^{-i\gamma_0 T} e^{i(k_0 x - \omega_0 t)} + \mathrm{c.c.}) = \epsilon (B_{\mathrm{pulse}}(\epsilon(x - c_g t)) e^{i k_0 (x - (c_p + \gamma_1 \epsilon^2) t)} + \mathrm{c.c.})$$
accurately over time-scales of order $O(1/\epsilon^2)$; here $c_p = (1 + k_0^2)^{1/2}/k_0$ is the linear phase velocity and $\gamma_1 = \gamma_0/k_0$. In this paper we consider whether Eq. (1) has modulating pulse solutions which exist for all times $t \in \mathbb{R}$. We establish the following result.

Theorem 1. Fix a positive integer $n$ and a positive real number $k_0$. For sufficiently small $\epsilon > 0$ (depending upon $n$ and $k_0$) there exists an infinite-dimensional, continuous family of modulating pulse solutions to Eq. (1) of the form
$$u(x, t) = v(x - \tilde{c}_g t,\ x - \tilde{c}_p t),$$
where $v$ is $2\pi/k_0$-periodic in its second argument and
$$\tilde{c}_p = c_p + \gamma_1 \epsilon^2, \qquad \tilde{c}_g = \frac{1}{\tilde{c}_p}.$$
These solutions satisfy $v(\xi, y) = v(-\xi, y)$,
$$|v(\xi, y) - 2h(\xi, \epsilon) \sin k_0 y| \le \epsilon^{n+1}, \qquad \xi, y \in \mathbb{R},$$
where $h(\xi, \epsilon) = \epsilon B_{\mathrm{pulse}}(\epsilon \xi) + O(\epsilon^2)$ and $\lim_{\xi \to \pm\infty} h(\xi, \epsilon) = 0$.
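The relations between the velocities appearing in Theorem 1 can be made concrete with a short computation. The sketch below assumes only the formulas quoted in the text ($\omega_0^2 = k_0^2 + 1$, $c_g = k_0/(1 + k_0^2)^{1/2}$, $c_p = (1 + k_0^2)^{1/2}/k_0$, $\gamma_1 = \gamma_0/k_0$, and the corrected velocities of the theorem); the function name `velocities` and the sample values of $k_0$, $\gamma_0$ and $\epsilon$ are hypothetical.

```python
import math

def velocities(k0, gamma0, eps):
    """Linear and corrected group/phase velocities for the wave equation
    u_tt = u_xx - u + g(u), using the formulas quoted in the text."""
    omega0 = math.sqrt(1.0 + k0 ** 2)       # dispersion relation
    c_g = k0 / omega0                       # linear group velocity
    c_p = omega0 / k0                       # linear phase velocity
    gamma1 = gamma0 / k0
    c_p_tilde = c_p + gamma1 * eps ** 2     # corrected phase velocity
    c_g_tilde = 1.0 / c_p_tilde             # corrected envelope velocity
    return {"omega0": omega0, "c_g": c_g, "c_p": c_p,
            "c_p_tilde": c_p_tilde, "c_g_tilde": c_g_tilde}

if __name__ == "__main__":
    v = velocities(k0=1.0, gamma0=-1.0, eps=0.01)   # illustrative values
    # The linear velocities already satisfy c_g * c_p = 1 ...
    print(v["c_g"] * v["c_p"])
    # ... and the theorem keeps this relation for the corrected ones,
    # so the envelope speed shifts by about -c_g**2 * gamma1 * eps**2.
    print(v["c_g_tilde"] - v["c_g"])
```

The point of the check is that $\tilde{c}_g = 1/\tilde{c}_p$ is the exact analogue of the identity $c_g c_p = 1$ satisfied by the linear velocities, so both corrections are of order $\epsilon^2$.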
The modulating pulse solutions identified in Theorem 1 consist of a permanent pulse-like envelope which moves with constant speed $\tilde{c}_g$ and modulates a periodic wave train moving with velocity $\tilde{c}_p$. A modulating pulse of this kind is shown in Fig. 1. In the subsequent analysis it will become clear that our result is optimal in the sense that we cannot expect that $\lim_{\xi \to \pm\infty} v(\xi, y) = 0$.

Fig. 1. A modulating pulse solution (the envelope velocity and the wave-train velocity are marked in the figure)
1.2. The method. Theorem 1 is proved by formulating the governing equation for $v(\xi,y)$ as an evolutionary problem in which the unbounded spatial coordinate $\xi$ plays the role of time. This idea was introduced for nonlinear problems by Kirchgässner [11] and has become known as "spatial dynamics". The evolutionary problem is considered as an infinite-dimensional dynamical system in which the coordinates are the components of the Fourier sine-series expansion of $v(\cdot,y)$. The dynamical system obtained in this fashion is a reversible Hamiltonian system with infinitely many degrees of freedom. The spectrum of the linearised system consists of infinitely many purely imaginary eigenvalues together with one positive and one negative real eigenvalue which are both $O(\varepsilon)$ in the bifurcation parameter. Our task is to find pulse-like solutions of this dynamical system.

To understand the main steps and difficulties in the proof of Theorem 1 using the above spatial dynamics formulation, it is helpful to review an analogous finite-dimensional problem. Consider a Hamiltonian system with $n$ degrees of freedom whose Hamiltonian function takes the form
$$H(q,p,\varepsilon) = \frac{1}{2}p_1^2 - \frac{1}{2}C_1\varepsilon^2 q_1^2 + \sum_{m=2}^{n}\frac{\mu_m}{2}(q_m^2 + p_m^2) + O(|(q,p)|^3),$$
in which $C_1 > 0$ and the higher-order term is a smooth function of $q$, $p$ with coefficients which depend analytically upon $\varepsilon$. The associated linearised system has one positive real eigenvalue $\sqrt{C_1}\,\varepsilon$, one negative real eigenvalue $-\sqrt{C_1}\,\varepsilon$ and $n-1$ pairs of purely imaginary eigenvalues $\pm i\mu_m$, $m = 2,\dots,n$. Let us also suppose that the system is reversible, so that the Hamiltonian vector field anticommutes with the reverser $S : (q,p) \mapsto (q,-p)$; it follows that $(q(-\xi), -p(-\xi))$ solves Hamilton's equations
whenever $(q(\xi), p(\xi))$ is a solution. This Hamiltonian system is amenable to standard normal-form theory (see Elphick et al. [5] and Meyer and Hall [15, Ch. VII]), an application of which yields the following result.

Lemma 1. For each $N \ge 3$ there exists a near-identity, analytic, symplectic change of coordinates with the property that
$$H(q,p,\varepsilon) = \frac{1}{2}p_1^2 - \frac{1}{2}C_1\varepsilon^2 q_1^2 + \sum_{m=2}^{n}\frac{\mu_m}{2}(p_m^2 + q_m^2) + H_{\mathrm{BNF}}\big(q_1^2, \tfrac{1}{2}(q_2^2 + p_2^2), \dots, \tfrac{1}{2}(q_n^2 + p_n^2), \varepsilon\big) + O(|(q,p,\varepsilon)|^{N-1}|(q,p)|^3)$$
in the new coordinates. The term $H_{\mathrm{BNF}}$ is a polynomial of order $N+1$ in $(q,p,\varepsilon)$ which satisfies $H_{\mathrm{BNF}}(q,p,\varepsilon) = O(|(q,p)|^3)$. The change of coordinates preserves the reversibility.

The finite-dimensional Hamiltonian system obtained by using this lemma and omitting the $O(|(q,p,\varepsilon)|^{N-1}|(q,p)|^3)$ remainder term in the Hamiltonian function is completely integrable (the truncated Hamiltonian function and the action functionals $I_m = \frac{1}{2}(q_m^2 + p_m^2)$, $m = 2,\dots,n$ are independent first integrals). Its solution set can therefore be completely described in a systematic manner, as the following remarks indicate. The subspace $\{(q_1,p_1) = (0,0)\}$ is clearly invariant under the flow, and since the action variables $I_2,\dots,I_n$ act as integrals it is foliated by $(n-1)$-dimensional tori. We may use this construction to obtain a foliation of the $2n$-dimensional phase space into a family of two-dimensional reduced phase spaces parameterised by $I_2,\dots,I_n$: for each fixed value of these integrals the equations for $q_1$, $p_1$, namely
$$\partial_\xi q_1 = p_1, \qquad \partial_\xi p_1 = C_1\varepsilon^2 q_1 - 2q_1\partial_1 H_{\mathrm{BNF}}(q_1^2, I_2,\dots,I_n,\varepsilon) = C_1\varepsilon^2 q_1 - C_2 q_1^3 + \dots,$$
where $C_1, C_2 > 0$, describe a one-degree-of-freedom Hamiltonian system whose phase portrait is calculated by elementary methods. The angles $\alpha_2,\dots,\alpha_n$ corresponding to the action variables $I_2,\dots,I_n$ are recovered by quadrature. In particular, the above system has a pair of small-amplitude orbits which are homoclinic to the origin whenever the coefficient $C_2/2$ of $q_1^4$ in $H_{\mathrm{BNF}}$ is positive, and these orbits correspond to solutions of the truncated Hamiltonian system from Lemma 1 which are homoclinic to an $(n-1)$-torus in $\{(q_1,p_1) = (0,0)\}$. For $I_2 = \dots = I_n = 0$ we recover two homoclinic orbits $\pm h(\xi)$ which connect the origin with itself.
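The homoclinic orbits of the reduced system can be checked numerically. The following is a minimal sketch, assuming that the terms indicated by dots in the $(q_1,p_1)$ equations are dropped, so that the system reads $\partial_\xi q_1 = p_1$, $\partial_\xi p_1 = C_1\varepsilon^2 q_1 - C_2 q_1^3$. Under that assumption the candidate orbit is $q_1(\xi) = \varepsilon(2C_1/C_2)^{1/2}\,\mathrm{sech}(C_1^{1/2}\varepsilon\xi)$. The function names `pulse` and `residual` are illustrative and do not appear in the paper.

```python
import math

def sech(x):
    return 1.0 / math.cosh(x)

def pulse(xi, eps, C1, C2):
    """Return (q1, p1) on the positive homoclinic orbit of the cubic model.

    Assumption: higher-order terms of the reduced system are neglected.
    """
    a = math.sqrt(2.0 * C1 / C2) * eps   # amplitude, O(eps)
    b = math.sqrt(C1) * eps              # decay rate, O(eps)
    q = a * sech(b * xi)
    p = -a * b * sech(b * xi) * math.tanh(b * xi)   # p1 = dq1/dxi
    return q, p

def residual(xi, eps, C1, C2, dx=1e-4):
    """Central-difference residual of dp1/dxi - (C1 eps^2 q1 - C2 q1^3)."""
    q, _ = pulse(xi, eps, C1, C2)
    _, p_plus = pulse(xi + dx, eps, C1, C2)
    _, p_minus = pulse(xi - dx, eps, C1, C2)
    dp = (p_plus - p_minus) / (2.0 * dx)
    return dp - (C1 * eps**2 * q - C2 * q**3)
```

The residual vanishes up to discretisation error, and both components decay as $\xi \to \pm\infty$, which is what being homoclinic to the origin means for this planar model.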
A dynamical system with the above eigenvalue structure has a one-dimensional stable and a one-dimensional unstable manifold at zero; a homoclinic orbit at zero arises when these two manifolds coincide in the 2n-dimensional phase space. Such a situation can clearly only arise in 2n-dimensional space as the result of a degeneracy. In the case of the truncated Hamiltonian system from Lemma 1 the degeneracy is the invariance of {(qj , pj ) = (0, 0), j = 2, . . . , n}, itself a consequence of the complete integrability of the system. This degeneracy, and hence the homoclinic orbits at zero, generically do not persist when the higher-order remainder terms are re-introduced. One can also consider pulse-like solutions which consist of homoclinic connections to small-amplitude periodic or quasiperiodic orbits. The stable manifold to an invariant
$m$-torus consists of solutions which start at time $\xi = 0$ and converge to the torus as $\xi \to \infty$; it is an $(m+1)$-dimensional invariant manifold parameterised by time $\xi$ and the initial data for the angles defining the quasiperiodic flow on the torus. The stable manifold to one of the invariant $(n-1)$-tori in $\{(q_j,p_j) = (0,0),\ j = 2,\dots,n\}$ for the truncated Hamiltonian system from Lemma 1 can be calculated explicitly, and it is readily confirmed that this $n$-dimensional manifold intersects the $n$-dimensional symmetric section $\Sigma = \mathrm{Fix}\,S$ transversally in $2^n$ points. Such points correspond to points of symmetry on homoclinic connections to the $(n-1)$-torus. Because the intersection is transversal it generically survives the re-introduction of the higher-order remainder terms, and we expect that $2^n$ pulse-like solutions persist. This procedure is carried out for the case $n = 2$, in which the tori are periodic orbits, by Groves and Mielke [8] for a finite-dimensional system which arises in water-wave theory. For $n > 2$ there is an additional complication, namely that the persistence of the invariant $(n-1)$-tori is itself an issue which has to be resolved using KAM theory (see Pöschel [20] and the references therein). The above construction also relies upon the fact that the invariant tori are of maximal dimension $n-1$ in the subspace $\{(q_j,p_j) = (0,0),\ j = 2,\dots,n\}$. It can therefore not be extended to problems with an infinite-dimensional centre part, since KAM theory is at present restricted to existence results for finite-dimensional invariant tori in infinite-dimensional phase space (e.g. see Kuksin [14], Craig and Wayne [3] and Pöschel [21]).

Another possibility of constructing pulse-like solutions is to consider intersections of an appropriately defined centre-stable manifold with the symmetric section. Following standard constructions we can define a local centre-stable manifold $W^{cs}$ at time $\xi = 0$ for solutions near $h(\xi)$ in the following manner.
Consider solutions $v$ on $\xi \ge 0$ of the form $v(\xi) = w(\xi) + h(\xi)$, where the hyperbolic part of $w(\xi)$ remains $O(\delta)$-small for all $\xi \ge 0$ (so that the hyperbolic part of $v(\xi)$ remains $O(\delta)$-close to $h(\xi)$ for all $\xi \ge 0$). The function $w(\xi)$ satisfies a nonautonomous differential equation and we construct $W^{cs}$ from the initial data $w(0)$ on solutions whose centre parts are $O(\delta)$ on some time interval starting at $\xi = 0$; it is $(2n-1)$-dimensional and parameterised by the centre and stable components of the initial data. Of course intersections of $W^{cs}$ and $\Sigma$ do not necessarily correspond to points of symmetry on solutions which stay $O(\delta)$-close to $h(\xi)$ for all $\xi \in \mathbb{R}$ (the centre part of any solution can a priori grow polynomially in time). We can however exploit the Hamiltonian structure to show that there is a submanifold $\tilde{W}^{cs}$ of $W^{cs}$ which constitutes a global centre-stable manifold; this manifold is constructed by restricting to sufficiently small initial data (see below). The manifolds $W^{cs}$ and $\tilde{W}^{cs}$ can be calculated explicitly for the truncated Hamiltonian system from Lemma 1; in particular one finds that $\tilde{W}^{cs}$ intersects $\Sigma$ transversally in a continuum of points. The usual transversality argument therefore indicates that the symmetric pulse-like solutions survive the re-introduction of the higher-order remainder terms.

The existence of the submanifold $\tilde{W}^{cs}$ of $W^{cs}$ is a consequence of the positive-definiteness of $H$ on the locally invariant centre manifold $W^c$. This manifold is constructed by considering solutions on $\xi \in \mathbb{R}$ whose hyperbolic part remains $O(\delta)$ for all $\xi \in \mathbb{R}$. The manifold $W^c$ consists of those points on such solutions whose centre parts are $O(\delta)$; it is $(2n-2)$-dimensional and parameterised by the initial data for the centre parts of the solutions. A standard argument shows that $W^c$ can be described as the graph of a quadratic reduction function which defines the hyperbolic coordinate of a point as a function of its centre coordinate.
Inserting the reduction function into the Hamiltonian, we find that it is positive definite on $W^c$, and the usual Lyapunov stability argument shows that solutions which lie on $W^c$ with sufficiently small centre part at some time
$\xi$ remain on $W^c$ with $O(\delta)$ centre part for all subsequent times. Finally, we apply a familiar result from dynamical systems theory which asserts that a solution $h(\xi) + w_{cs}(\xi)$ with $w_{cs}(0) \in W^{cs}$ converges exponentially to a solution $v_c(\xi)$ on $W^c$ as $\xi \to \infty$. By taking sufficiently small initial data $w_{cs}(0)$ we can find $\xi^\star > 0$ so that $w_{cs}(\xi)$ and $|w_{cs}(\xi) + h(\xi) - v_c(\xi)|$ are both sufficiently small at $\xi = \xi^\star$ (the first quantity grows at most polynomially in time while the second decays exponentially in time). It follows that the centre part of $w_{cs}(\xi)$ remains $O(\delta)$ for all $\xi > \xi^\star$ and hence that $w_{cs}(\xi) + h(\xi)$ remains $O(\delta)$-close to $h(\xi)$ for all $\xi \ge 0$.

In this paper we use the above strategy based upon centre-stable manifolds to prove Theorem 1 in its infinite-dimensional setting. The problem is formulated as a reversible Hamiltonian system with infinitely many degrees of freedom in Sect. 2. A normal-form result as complete as that given in Lemma 1 is not possible due to asymptotic resonances among the purely imaginary eigenvalues, but a partial normal-form theory is available which yields a sequence of approximate systems, each of which possesses two homoclinic orbits connecting the origin to itself (see Sect. 3). The local centre-stable and centre manifolds $W^{cs}$, $W^c$ described above are constructed in Sect. 4, while the construction of the global centre-stable manifold $\tilde{W}^{cs}$ is handled in Sect. 5 (with $\delta = \varepsilon^{n+1}$ in the above notation). The semilinearity of the equation studied here allows us to follow the standard method used for finite-dimensional systems, in which the nonlinearities are modified by use of a cut-off function in the centre part and the solutions defining the required manifold are characterised as solutions of an integral equation.
The integral equation is solved using the contraction-mapping principle, and using initial data from the solutions we define global centre-stable and stable manifolds for the modified equations which correspond to local centre-stable and stable manifolds for the original equations. It is important to obtain precise estimates of the $\varepsilon$-dependence of the manifolds in the singular limit $\varepsilon \to 0$ studied here; we therefore give full details of their construction. Sect. 6 contains the proof that $\tilde{W}^{cs}$ intersects the symmetric section $\Sigma$ in a continuum of points. The proof is an analytical version of the heuristic transversality argument given above. The key is an application of the contraction-mapping principle to the hyperbolic part of initial data on $W^{cs}$; here the precise dependence of solutions upon $\varepsilon$ clearly plays a central role.

Spatial dynamics methods have recently been used in a number of applications involving modulating travelling waves. Haragus-Courcelle and Schneider [9] examined small-amplitude bifurcating fronts in the Taylor-Couette problem which occur when a trivial ground state (the Couette flow) loses stability and bifurcates into a spatially periodic pattern (the Taylor vortices). Their spatial dynamics formulation also has an infinite-dimensional centre part at criticality, but as the bifurcation parameter $\varepsilon$ is increased from zero all the eigenvalues leave the imaginary axis, four with speed $O(\varepsilon)$ and all others with speed $O(\varepsilon^{1/2})$. The small-amplitude dynamics are therefore controlled by a four-dimensional centre-manifold. A similar situation had previously been discussed for the Swift-Hohenberg equation by Eckmann and Wayne [4]. Sandstede and Scheel [22] consider a bifurcation scenario for a reaction-diffusion equation in which the zero equilibrium becomes unstable and bifurcates into a spatially periodic pattern (a Turing instability).
They assume the existence of a large-amplitude localised pulse at criticality and examine how the Turing instability affects it. Their problem has a finite-dimensional centre part but an infinite-dimensional hyperbolic part and is therefore ill-posed. By contrast, our problem is well-posed but has an infinite-dimensional centre part. Cubic nonlinear Schrödinger equations have been derived as modulation equations in applications such as nonlinear optics (Newell and Moloney [18]), nonlinear elasticity
(Fu [6]) and water waves (Zakharov [25], Craig, Sulem and Sulem [2]). The extension of the present result to these problems is planned as future research. There the situation is complicated by the quasilinearity of the governing equations and the presence of an additional infinite number of stable and unstable directions in the spatial dynamics formulation.

2. Spatial Dynamics Formulation

We look for modulating pulse solutions of the nonlinear wave Eq. (1) of the form $u(x,t) = v(x - c_g t, x - c_p t) = v(\xi,y)$, where $v$ is periodic in $y$ with period $2\pi/k_0$ for some $k_0 > 0$. Making this Ansatz, one arrives at the equation
$$(1 - c_g^2)\partial_\xi^2 v + 2(1 - c_g c_p)\partial_\xi\partial_y v + (1 - c_p^2)\partial_y^2 v - v + v^3 = 0.$$
It is convenient to choose
$$c_p = c_p^0 + \gamma_1\varepsilon^2, \qquad c_g = 1/c_p,$$
so that $c_p$ is a small perturbation of the phase velocity $c_p^0$ of the linearised problem and the equation simplifies to
$$(1 - c_g^2)\partial_\xi^2 v + (1 - c_p^2)\partial_y^2 v - v + v^3 = 0. \tag{3}$$
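The chain-rule computation behind the Ansatz can be checked numerically. The sketch below is illustrative only: it picks a hypothetical test profile $v(\xi,y) = \sin(a\xi)\cos(by)$ (not taken from the paper), forms $u(x,t) = v(x - c_g t, x - c_p t)$, and compares a finite-difference value of $\partial_x^2 u - \partial_t^2 u$ with the second-order part $(1-c_g^2)\partial_\xi^2 v + 2(1-c_g c_p)\partial_\xi\partial_y v + (1-c_p^2)\partial_y^2 v$ of the displayed equation.

```python
import math

A, B = 0.7, 1.3   # hypothetical wavenumbers of the test profile

def v(xi, y):
    return math.sin(A * xi) * math.cos(B * y)

def u(x, t, cg, cp):
    return v(x - cg * t, x - cp * t)

def wave_operator_fd(x, t, cg, cp, h=1e-3):
    """Central-difference approximation of u_xx - u_tt."""
    u0 = u(x, t, cg, cp)
    u_xx = (u(x + h, t, cg, cp) - 2.0 * u0 + u(x - h, t, cg, cp)) / h**2
    u_tt = (u(x, t + h, cg, cp) - 2.0 * u0 + u(x, t - h, cg, cp)) / h**2
    return u_xx - u_tt

def wave_operator_formula(x, t, cg, cp):
    """Second-order part of the equation for v, evaluated analytically."""
    xi, y = x - cg * t, x - cp * t
    v_xixi = -A * A * math.sin(A * xi) * math.cos(B * y)
    v_yy = -B * B * math.sin(A * xi) * math.cos(B * y)
    v_xiy = -A * B * math.cos(A * xi) * math.sin(B * y)
    return ((1.0 - cg**2) * v_xixi
            + 2.0 * (1.0 - cg * cp) * v_xiy
            + (1.0 - cp**2) * v_yy)

def cross_coefficient(cg, cp):
    """Coefficient of the mixed derivative; it vanishes when cg = 1/cp."""
    return 2.0 * (1.0 - cg * cp)
```

With the choice $c_g = 1/c_p$ the mixed-derivative coefficient is zero, which is the simplification leading to Eq. (3).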
We now formulate Eq. (3) as an infinite-dimensional dynamical system in which the coordinate $\xi$ is the time-like variable and $y$ is a spatial coordinate. Introducing the new variable $w = \partial_\xi v$, we find that
$$\partial_\xi v = w, \tag{4}$$
$$\partial_\xi w = -\frac{1 - c_p^2}{1 - c_g^2}\,\partial_y^2 v + \frac{1}{1 - c_g^2}\,v - \frac{1}{1 - c_g^2}\,v^3, \tag{5}$$
and these equations can be understood as a dynamical system in the infinite-dimensional phase space
$$\mathcal{X} = \{(v,w) \in H^{s+1}_{\mathrm{per}}(2\pi/k_0) \times H^{s}_{\mathrm{per}}(2\pi/k_0)\}, \qquad s \ge 0;$$
the domain of the densely-defined vector field on the right-hand side of Eqs. (4), (5) is
$$\mathcal{D} = \{(v,w) \in H^{s+2}_{\mathrm{per}}(2\pi/k_0) \times H^{s+1}_{\mathrm{per}}(2\pi/k_0)\}.$$
Equations (4), (5) represent Hamilton's equations for the Hamiltonian system $(\mathcal{X}, \Omega, H^\varepsilon)$, where the position-independent symplectic form $\Omega : T\mathcal{X}|_x \times T\mathcal{X}|_x \to \mathbb{R}$ is given by
$$\Omega((v_1,w_1),(v_2,w_2)) = \int_0^{2\pi/k_0} (w_2 v_1 - v_2 w_1)\, dy,$$
where in the following the tangent spaces $T\mathcal{X}|_x$ are identified with the phase space $\mathcal{X}$. The Hamiltonian function $H^\varepsilon \in C^\infty(\mathcal{X},\mathbb{R})$, which depends upon the parameter $\varepsilon$, is defined by
$$H^\varepsilon(v,w) = \int_0^{2\pi/k_0} \left\{ \frac{w^2}{2} - \frac{(1 - c_p^2)}{2(1 - c_g^2)}(\partial_y v)^2 - \frac{1}{2(1 - c_g^2)}v^2 + \frac{1}{4(1 - c_g^2)}v^4 \right\} dy.$$
The corresponding Hamiltonian vector field $v_{H^\varepsilon}$ is calculated by noting that the point $x \in \mathcal{X}$ belongs to $\mathcal{D}(v_{H^\varepsilon})$ with $v_{H^\varepsilon}|_x = \bar{v}|_x$ if and only if $\Omega(\bar{v}|_x, v) = dH^\varepsilon[x](v)$ for all $v \in T\mathcal{X}|_x$; a straightforward calculation based on this procedure confirms that Hamilton's equations are given by (4) and (5).

Equations (4), (5) also have a number of discrete symmetries. Firstly, they are reversible: the Hamiltonian vector field anticommutes with the reverser $S : (v,w) \mapsto (v,-w)$. This symmetry has the consequence that $(v(-\xi), -w(-\xi))$ solves the equations whenever $(v(\xi), w(\xi))$ is a solution. There are two further discrete symmetries, namely antisymmetry in the dependent variable and symmetry in the spatial variable: the equations are invariant under the transformations $(v,w) \mapsto (-v,-w)$ and $y \mapsto -y$. Note also that the periodicity in $y$ combines with the translation invariance in this variable to give an $O(2)$ symmetry.

The next step in the analysis is to examine the spectrum of the linearised system associated with (4) and (5). Writing the variables as Fourier series
$$v = \sqrt{\frac{k_0}{2\pi}} \sum_{m \in \mathbb{Z}} v_m e^{i k_0 m y}, \qquad w = \sqrt{\frac{k_0}{2\pi}} \sum_{m \in \mathbb{Z}} w_m e^{i k_0 m y}$$
with reality condition $v_{-m} = \bar{v}_m$, we find that the $m$th Fourier component satisfies the system
$$\partial_\xi v_m = w_m, \tag{6}$$
$$\partial_\xi w_m = \frac{m^2 k_0^2 (1 - c_p^2) + 1}{1 - c_g^2}\, v_m \tag{7}$$
of ordinary differential equations. The eigenvalues $\lambda_{m,\varepsilon}$ of this system are given by
$$\lambda_{m,\varepsilon}^2 = \frac{m^2 k_0^2 (1 - c_p^2) + 1}{1 - c_g^2} = (k_0^2 + 1)(1 - m^2) - 2k_0(1 + k_0^2)^{1/2}(k_0^2 + m^2)\gamma_1\varepsilon^2 + O(\varepsilon^4),$$
in which the $O(\varepsilon^4)$ estimate on the remainder term holds uniformly in $m$. To determine the spectrum of the linearised system associated with (4), (5) we decompose the phase space $\mathcal{X}$ into a direct sum $\oplus_{m \in \mathbb{N}_0} E_m$ of subspaces, where $E_m$ consists of the generalised eigenspaces corresponding to the $m$th and $-m$th Fourier components. Examining Eqs. (6), (7), we arrive at the following conclusions concerning the structure of $E_m$, $m \in \mathbb{N}_0$.
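$m = 0$: We have two simple, real eigenvalues $\lambda^\pm_{0,\varepsilon} = \pm(1 + k_0^2)^{1/2} + O(\varepsilon)$. The corresponding eigenvectors are given by
$$\begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} 1 \\ \pm\lambda_{0,\varepsilon} \end{pmatrix}.$$

$m = 1$: For $\varepsilon = 0$ we have a geometrically double zero eigenvalue. The two corresponding eigenvectors
$$\begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}\cos(k_0 y), \qquad \begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}\sin(k_0 y)$$
each have a generalised eigenvector, namely
$$\begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}\cos(k_0 y), \qquad \begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}\sin(k_0 y).$$
For $\varepsilon > 0$ we have two geometrically double eigenvalues $\lambda^\pm_{1,\varepsilon}$ which satisfy the equation $(\lambda^\pm_{1,\varepsilon})^2 = -2k_0\gamma_1\varepsilon^2(1 + k_0^2)^{3/2} + O(\varepsilon^4)$; they are therefore real if $\gamma_1 < 0$. The eigenvectors are
$$\begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} 1 \\ \lambda^\pm_{1,\varepsilon} \end{pmatrix}\cos(k_0 y), \qquad \begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} 1 \\ \lambda^\pm_{1,\varepsilon} \end{pmatrix}\sin(k_0 y).$$

$m > 1$: We have two geometrically double purely imaginary eigenvalues given by $\lambda^\pm_{m,\varepsilon} = \pm i(m^2 - 1)^{1/2}(k_0^2 + 1)^{1/2} + O(\varepsilon)$. The eigenvectors are
$$\begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} 1 \\ \lambda^\pm_{m,\varepsilon} \end{pmatrix}\cos(k_0 m y), \qquad \begin{pmatrix} v \\ w \end{pmatrix} = \begin{pmatrix} 1 \\ \lambda^\pm_{m,\varepsilon} \end{pmatrix}\sin(k_0 m y).$$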
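The eigenvalue picture above can be reproduced numerically. The following is a minimal sketch which compares the exact expression for $\lambda_{m,\varepsilon}^2$ with its stated expansion and classifies each mode as real or purely imaginary. It assumes that the phase velocity of the linearised problem equals $(1 + k_0^2)^{1/2}/k_0$; this value is inferred here from the requirement that $\lambda_{1,0} = 0$ and is not quoted from the text. The function names are illustrative.

```python
import math

def lambda_sq_exact(m, k0, gamma1, eps):
    """(m^2 k0^2 (1 - cp^2) + 1) / (1 - cg^2) with cg = 1/cp.

    Assumption: the linear phase velocity is sqrt(1 + k0^2) / k0.
    """
    cp = math.sqrt(1.0 + k0**2) / k0 + gamma1 * eps**2
    cg = 1.0 / cp
    return (m**2 * k0**2 * (1.0 - cp**2) + 1.0) / (1.0 - cg**2)

def lambda_sq_expansion(m, k0, gamma1, eps):
    """Leading terms of the expansion, with the O(eps^4) remainder dropped."""
    return ((k0**2 + 1.0) * (1.0 - m**2)
            - 2.0 * k0 * math.sqrt(1.0 + k0**2) * (k0**2 + m**2) * gamma1 * eps**2)

def classify(m, k0, gamma1, eps):
    """'real' if lambda^2 > 0 (hyperbolic pair), else 'imaginary' (centre pair)."""
    return "real" if lambda_sq_exact(m, k0, gamma1, eps) > 0.0 else "imaginary"
```

For $\gamma_1 < 0$ and small $\varepsilon > 0$ the modes $m = 0, 1$ are classified as real and all modes $m > 1$ as imaginary, in agreement with the three cases listed above.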
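Recall that (4), (5) are antisymmetric in $(v,w)$ and symmetric in $y$. The equations are therefore invariant under the transformation $(y,v,w) \mapsto (-y,-v,-w)$, and this further symmetry allows us to look for solutions in the form of sine series
$$v = \sqrt{\frac{k_0}{\pi}} \sum_{m=1}^{\infty} v_m \sin(k_0 m y), \qquad w = \sqrt{\frac{k_0}{\pi}} \sum_{m=1}^{\infty} w_m \sin(k_0 m y).$$
This reduction has the effect of eliminating the subspace $E_0$ and making all eigenvalues geometrically simple. This eigenvalue picture is summarised in Fig. 2; for $\varepsilon > 0$ we have a two-dimensional hyperbolic part and an infinite-dimensional centre part. Working in the coordinates $v_m$, $w_m$, so that
$$\mathcal{X} = \ell^2_{s+1} \times \ell^2_{s}, \qquad \mathcal{D} = \ell^2_{s+2} \times \ell^2_{s+1},$$
where
$$\ell^2_t = \Big\{ x = (x_m)_{m \in \mathbb{N}} \ \Big|\ \|x\|_t^2 := \sum_{m=1}^{\infty} m^{2t} x_m^2 < \infty \Big\},$$
we find that the Hamiltonian is given by
$$H^\varepsilon = \sum_{m=1}^{\infty} \left( \frac{w_m^2}{2} - \frac{\lambda_{m,\varepsilon}^2}{2} v_m^2 \right) + \sum_{\pm i \pm j \pm k \pm \ell = 0} g_{ijk\ell}\, v_i v_j v_k v_\ell,$$
in which
$$g_{ijk\ell} = \frac{k_0}{32\pi(1 - c_g^2)}\, \mathrm{sgn}_\pm\, \delta_{\pm i \pm j \pm k \pm \ell},$$
where $\mathrm{sgn}_\pm$ is $1$ or $-1$ according to whether there are an even or odd number of plus signs in the summand. Hamilton's equations
$$\partial_\xi v_m = \frac{\partial H^\varepsilon}{\partial w_m}, \qquad \partial_\xi w_m = -\frac{\partial H^\varepsilon}{\partial v_m}, \qquad m = 1, 2, \dots$$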
clearly inherit the reversibility of Eqs. (4), (5), and they are also invariant under the transformation $(v_m, w_m) \mapsto (-v_m, -w_m)$.

Fig. 2. The spectrum of the linearised problem consists of infinitely many simple purely imaginary eigenvalues together with a Jordan block at the origin for $\varepsilon = 0$ and two simple real eigenvalues for $\varepsilon > 0$ (two panels, $\varepsilon = 0$ and $\varepsilon > 0$, each with axes Re and Im)
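The sign and resonance structure of the quartic coefficients comes from integrating a product of four sine modes over one period. The sketch below checks the underlying identity numerically: the integral of $\sin(k_0 i y)\sin(k_0 j y)\sin(k_0 k y)\sin(k_0 \ell y)$ over $[0, 2\pi/k_0]$ equals $(2\pi/k_0)/16$ times the sum of the sign products over all sign choices with $\pm i \pm j \pm k \pm \ell = 0$. The function names are illustrative, and the rectangle rule is used because it is exact (up to rounding) for trigonometric polynomials of low degree.

```python
import math
from itertools import product

def four_sine_integral(i, j, k, l, k0, n=4096):
    """Rectangle-rule integral of the product of four sine modes over one period."""
    period = 2.0 * math.pi / k0
    h = period / n
    total = 0.0
    for s in range(n):
        y = s * h
        total += (math.sin(k0 * i * y) * math.sin(k0 * j * y)
                  * math.sin(k0 * k * y) * math.sin(k0 * l * y))
    return total * h

def signed_count(i, j, k, l):
    """Sum of sign products over the sign choices with (+-)i (+-)j (+-)k (+-)l = 0."""
    total = 0
    for s in product((1, -1), repeat=4):
        if s[0] * i + s[1] * j + s[2] * k + s[3] * l == 0:
            total += s[0] * s[1] * s[2] * s[3]
    return total

def four_sine_formula(i, j, k, l, k0):
    """Closed form predicted by expanding each sine into exponentials."""
    return (2.0 * math.pi / k0) * signed_count(i, j, k, l) / 16.0
```

Since four signs are involved, the sign product is $+1$ exactly when the number of plus signs is even, which matches the description of $\mathrm{sgn}_\pm$ above. Multiplying the integral by the prefactor $1/(4(1 - c_g^2))$ of the quartic term and by the fourth power of the (assumed) normalisation $\sqrt{k_0/\pi}$ gives the constant $k_0/(32\pi(1 - c_g^2))$.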
3. Normal-Form Theory

In this section we perform a sequence of symplectic changes of variable which simplifies the Hamiltonian function and predicts the existence of pulse solutions. The first step is to transform the Hamiltonian into its quadratic canonical form. To this end we make the linear symplectic change of variable
$$v_m = \frac{q_m}{\sqrt{\mu_m}}, \qquad w_m = \sqrt{\mu_m}\, p_m, \qquad m = 1, 2, \dots,$$
where $\mu_1 = 1$ and $\mu_m^2 = -\lambda_{m,0}^2 = (k_0^2 + 1)(m^2 - 1)$, $m = 2, 3, \dots$. This change of variable transforms the phase space into $\mathcal{X} = \ell^2_{s+1/2} \times \ell^2_{s+1/2}$ and the Hamiltonian function into
$$H^\varepsilon = \frac{p_1^2}{2} - \frac{\lambda_{1,\varepsilon}^2}{2} q_1^2 + \sum_{m=2}^{\infty} \frac{\mu_m}{2}(p_m^2 + q_m^2) - \sum_{m=2}^{\infty} \frac{\nu_{m,\varepsilon}}{2} q_m^2 + \sum_{\pm i \pm j \pm k \pm \ell = 0} \tilde{g}_{ijk\ell}\, q_i q_j q_k q_\ell,$$
in which
$$\tilde{g}_{ijk\ell} = \frac{1}{\sqrt{\mu_i \mu_j \mu_k \mu_\ell}}\, g_{ijk\ell}, \qquad \nu_{m,\varepsilon} = \frac{1}{\mu_m}(\lambda_{m,\varepsilon}^2 - \lambda_{m,0}^2).$$
Hamilton's equations become
$$\partial_\xi q_m = \frac{\partial H^\varepsilon}{\partial p_m}, \qquad \partial_\xi p_m = -\frac{\partial H^\varepsilon}{\partial q_m}, \qquad m = 1, 2, \dots, \tag{8}$$
where the domain of the Hamiltonian vector field is $\mathcal{D} = \ell^2_{s+3/2} \times \ell^2_{s+3/2}$. Note that the action of the reverser $S$ on $\mathcal{X}$ is given by $S(q_m, p_m) = (q_m, -p_m)$ and Hamilton's equations are invariant under the transformation $(q_m, p_m) \mapsto (-q_m, -p_m)$. We now take $s > 0$, so that $\mathcal{X}$ is a Banach algebra; this step leads to simpler estimates in the subsequent analysis. Writing $H^\varepsilon = H_L + H_L^\varepsilon + H_N$, where
$$H_L = \frac{1}{2}p_1^2 + \sum_{m=2}^{\infty} \frac{\mu_m}{2}(p_m^2 + q_m^2), \qquad H_L^\varepsilon = -\frac{\lambda_{1,\varepsilon}^2}{2} q_1^2 - \sum_{m=2}^{\infty} \frac{\nu_{m,\varepsilon}}{2} q_m^2,$$
we can also decompose $v_{H^\varepsilon} = v_{H_L} + v_{H_L^\varepsilon} + v_{H_N}$, where the linear vector fields $v_{H_L}, v_{H_L^\varepsilon} : \mathcal{D} \subset \mathcal{X} \to \mathcal{X}$ are unbounded and densely defined, while $v_{H_N} : \mathcal{X} \to \mathcal{X}$ is bounded and defined upon the whole of $\mathcal{X}$. Note that $v_{H_N}$ satisfies the estimate $\|v_{H_N}(q,p)\| = O(\|(q,p)\|^3)$; here, and in the remainder of this article, the symbol $\|\cdot\|$ denotes the norm in $\mathcal{X}$ and the order-of-magnitude estimates relate to terms which are smooth functions $\mathcal{X} \to \mathcal{X}$. This semilinearity plays a central role in the subsequent analysis.

The next step is to apply a series of normal-form transformations which eliminate certain terms in the Hamiltonian function. It is not possible to obtain a normal-form result as complete as that given in Lemma 1 for the corresponding finite-dimensional problem. The main difficulty lies in the fact that the normal-form transformations contain linear combinations of the frequencies $\mu_m = (k_0^2 + 1)^{1/2}(m^2 - 1)^{1/2}$, $m = 2, 3, \dots$ in their denominators, and asymptotic resonances among these frequencies lead to a small-divisor problem (see Pöschel [21] for a thorough explanation of this point). Nevertheless, it is possible to use a partial normal form to arrive at the same conclusions concerning homoclinic orbits in the normal-form approximations. The essential requirements on the normal-form approximations are that $S = \{(q_m, p_m) = 0,\ m = 2, 3, \dots\}$ is an invariant subspace and that the dynamics in this subspace are controlled by a dynamical system of the form
$$\partial_\xi q_1 = p_1, \tag{9}$$
$$\partial_\xi p_1 = \lambda_{1,\varepsilon}^2 q_1 - 2q_1 \partial_1 F(q_1^2, \varepsilon). \tag{10}$$
The following result shows that these requirements are met.
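Theorem 2. Consider the Hamiltonian system $(\mathcal{X}, \Omega, H^\varepsilon)$. For each $N \ge 3$ there is a near-identity, analytic, symplectic, local change of coordinates $\Phi : Y \to \mathcal{X}$, where $Y$ is a neighbourhood of the origin in $\mathcal{X}$, with the property that
$$H^\varepsilon = \frac{p_1^2}{2} - \frac{\lambda_{1,\varepsilon}^2}{2} q_1^2 + \sum_{m=2}^{\infty} \frac{\mu_m}{2}(p_m^2 + q_m^2) - \sum_{m=2}^{\infty} \frac{\nu_{m,\varepsilon}}{2} q_m^2 + H_{\mathrm{NF}}(q,p,\varepsilon) + O(\|(q,p,\varepsilon)\|^{N-2}\|(q,p)\|^4)$$
in the new coordinates. The term $H_{\mathrm{NF}}$ is a polynomial of order $N+1$ in $(q,p,\varepsilon)$; it satisfies $H_{\mathrm{NF}}(q,p,\varepsilon) = O(\|(q,p)\|^4)$, contains no terms which are linear in $(q_m, p_m)$, $m = 2, 3, \dots$ and $H_{\mathrm{NF}}|_S = F(q_1^2, \varepsilon)$. The change of coordinates preserves the reversibility and the invariance under the transformation $(q,p) \mapsto (-q,-p)$.

Proof. We construct a sequence $\Phi_j$, $j = 4, \dots, N+1$ of symplectic transformations with the property that $\Phi_j$ eliminates those monomials in the Hamiltonian of the form $\varepsilon^k p_1^\alpha q_1^\beta q_m$, $\varepsilon^k p_1^\alpha q_1^\beta p_m$, where $m \ge 2$, $k \le j-4$ and $\alpha + \beta = j - k - 1$, together with those monomials of the form $\varepsilon^k p_1^\alpha q_1^\beta$, where $\alpha > 0$, $k \le j-4$ and $\alpha + \beta = j - k$. Using the symbol $\{\cdot,\cdot\}$ to denote the Poisson bracket given by
$$\{F, G\} = \sum_{m=1}^{\infty} \left( \frac{\partial F}{\partial p_m}\frac{\partial G}{\partial q_m} - \frac{\partial F}{\partial q_m}\frac{\partial G}{\partial p_m} \right),$$
we find that
$$\{H_L, P_1\} = p_1^\alpha q_1^\beta q_m, \qquad \{H_L, P_2\} = p_1^\alpha q_1^\beta p_m, \qquad \{H_L, P_3\} = p_1^\alpha q_1^\beta,$$
where
$$P_1 = \begin{cases} c_{\beta+1} p_1^{\alpha+\beta} p_m + c_\beta p_1^{\alpha+\beta-1} q_1 q_m + \cdots + c_2 p_1^{\alpha+1} q_1^{\beta-1} q_m + c_1 p_1^\alpha q_1^\beta p_m, & \beta \text{ even}, \\ c_{\beta+1} p_1^{\alpha+\beta} q_m + c_\beta p_1^{\alpha+\beta-1} q_1 p_m + \cdots + c_2 p_1^{\alpha+1} q_1^{\beta-1} q_m + c_1 p_1^\alpha q_1^\beta p_m, & \beta \text{ odd}, \end{cases}$$
$$P_2 = \begin{cases} d_{\beta+1} p_1^{\alpha+\beta} q_m + d_\beta p_1^{\alpha+\beta-1} q_1 p_m + \cdots + d_2 p_1^{\alpha+1} q_1^{\beta-1} p_m + d_1 p_1^\alpha q_1^\beta q_m, & \beta \text{ even}, \\ d_{\beta+1} p_1^{\alpha+\beta} p_m + d_\beta p_1^{\alpha+\beta-1} q_1 q_m + \cdots + d_2 p_1^{\alpha+1} q_1^{\beta-1} p_m + d_1 p_1^\alpha q_1^\beta q_m, & \beta \text{ odd}, \end{cases}$$
$$P_3 = \frac{1}{\beta+1}\, p_1^{\alpha-1} q_1^{\beta+1}$$
and
$$c_k = \frac{(-1)^k \beta!}{\mu_m^k (\beta - k + 1)!}, \qquad d_k = \frac{(-1)^{k+1} \beta!}{\mu_m^k (\beta - k + 1)!}, \qquad k = 1, \dots, \beta + 1.$$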
Note that there are no small-divisor problems in the above formulae since $\mu_m \ge \mu_2 > 0$ for all $m \ge 2$. We can therefore write down a finite sum $S_j$ of monomials of degree $j$ in $(q,p,\varepsilon)$ with the property that the expression $\{H_L, S_j\}$ consists of precisely the terms of order $j$ we wish to eliminate. For a sufficiently small neighbourhood $Y$ of the origin in $\mathcal{X}$ we find that the Hamiltonian vector field $v_{S_j}$ associated with the Hamiltonian system $(Y, \Omega, S_j)$ is an analytic function $Y \to \mathcal{X}$ which depends analytically upon $\varepsilon$. The time-one map $v_{S_j}^t|_{t=1}$ of its flow is therefore well defined on $Y$ and by Liouville's theorem it constitutes a symplectic change of variable $\Phi_j : Y \to \mathcal{X}$ which depends analytically upon $\varepsilon$. Using Taylor's theorem, we find that the Hamiltonian $H^\varepsilon$ is transformed into
$$\begin{aligned} H^\varepsilon \circ \Phi_j &= H^\varepsilon \circ v_{S_j}^t|_{t=1} \\ &= H^\varepsilon + \{H^\varepsilon, S_j\} + \int_0^1 (1-t)\{\{H^\varepsilon, S_j\}, S_j\} \circ v_{S_j}^t \, dt \\ &= H_L - \frac{\lambda_{1,\varepsilon}^2}{2} q_1^2 - \sum_{m=2}^{\infty} \frac{\nu_{m,\varepsilon}}{2} q_m^2 + H_4 + \dots + H_{j-1} + H_j + \{H_L, S_j\} + H_R, \end{aligned}$$
where the symbol $H_i$ represents those terms in the Taylor series which are homogeneous of order $i$ in $(\varepsilon, q, p)$ with $H_i = O(\|(q,p,\varepsilon)\|^{i-4}\|(q,p)\|^4)$ and the remainder $H_R : Y \to \mathbb{R}$ satisfies $H_R = O(\|(q,p,\varepsilon)\|^{j-3}\|(q,p)\|^4)$. Observe that terms in $H_4, \dots, H_{j-1}$ are not affected, while those of order $j$ identified above are removed. Note that the above argument relies upon the fact that $v_{S_j}$ is defined upon the whole of $Y$, a consequence of the semilinearity of (8).

The reversibility implies that $H^\varepsilon(q,-p) = H^\varepsilon(q,p)$, so that monomials of the type $\varepsilon^k p_1^\alpha q_1^\beta q_m$, $\varepsilon^k p_1^\alpha q_1^\beta p_m$ and $\varepsilon^k p_1^\alpha q_1^\beta$ can only appear in $H^\varepsilon$ for respectively even, odd and even values of $\alpha$. Inspecting the above formulae, we find that $P_1(q,-p) = -P_1(q,p)$ ($\alpha$ is even), $P_2(q,-p) = -P_2(q,p)$ ($\alpha$ is odd), $P_3(q,-p) = -P_3(q,p)$ ($\alpha$ is even). The vector field $v_{S_j}$ is therefore antireversible, that is $S \circ v_{S_j} = v_{S_j} \circ S$, and the change of variables $\Phi_j$ inherits this property. It follows that $(H^\varepsilon \circ \Phi_j)(q,-p) = (H^\varepsilon \circ \Phi_j)(q,p)$, so that the transformed Hamiltonian system is reversible. A similar argument shows that the change of variable preserves the invariance under the transformation $(q,p) \mapsto (-q,-p)$.

Let us now write the Hamiltonian system obtained using Theorem 2 as
$$\partial_\xi X = L(X) + L_h^\varepsilon(X) + L_c^\varepsilon(X) + F^\varepsilon(X) + R^\varepsilon(X), \tag{11}$$
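where $L : \mathcal{D} \subset \mathcal{X} \to \mathcal{X}$ and $L_c^\varepsilon : \mathcal{D} \subset \mathcal{X} \to \mathcal{X}$ are the unbounded linear operators corresponding respectively to the quadratic Hamiltonians $H_L$ and $-\sum_{m=2}^{\infty} \frac{1}{2}\nu_{m,\varepsilon} q_m^2$, $L_h^\varepsilon : \mathcal{X} \to \mathcal{X}$ is the bounded linear operator corresponding to the quadratic Hamiltonian $-\frac{1}{2}\lambda_{1,\varepsilon}^2 q_1^2$, $F^\varepsilon : Y \to \mathcal{X}$ is the analytic vector field corresponding to the Hamiltonian $H_{\mathrm{NF}}$ and $R^\varepsilon$ is the smooth $O(\|(q,p,\varepsilon)\|^{N-1}\|(q,p)\|^3)$ remainder term. Clearly $S = \{(q_m, p_m) = 0,\ m = 2, 3, \dots\}$ is an invariant subspace for the truncated system
$$\partial_\xi X = L(X) + L_h^\varepsilon(X) + L_c^\varepsilon(X) + F^\varepsilon(X), \tag{12}$$
the dynamics in which are controlled by the two-dimensional system (9), (10). The solution set of Eqs. (9), (10) is analysed by introducing the scaled variables
$$\tilde{\xi} = \varepsilon\xi, \qquad q_1(\xi) = \varepsilon\tilde{q}_1(\tilde{\xi}), \qquad p_1(\xi) = \varepsilon^2 \tilde{p}_1(\tilde{\xi})$$
and writing
$$\lambda_{1,\varepsilon}^2 = C_1\varepsilon^2 + O(\varepsilon^4), \qquad F(q_1^2, \varepsilon) = \frac{C_2}{2} q_1^4 + O(q_1^6),$$
where
$$C_1 = -2k_0\gamma_1(k_0^2 + 1)^{3/2} > 0, \qquad C_2 = \frac{3k_0(k_0^2 + 1)}{4\pi} > 0.$$
It follows that
$$\partial_{\tilde{\xi}} \tilde{q}_1 = \tilde{p}_1, \qquad \partial_{\tilde{\xi}} \tilde{p}_1 = C_1\tilde{q}_1 - C_2\tilde{q}_1^3 + \tilde{R}(\tilde{q}_1, \varepsilon),$$
where the $O(\varepsilon)$ remainder term $\tilde{R}$ is odd in its first argument. In the limit $\varepsilon \to 0$ these equations are equivalent to
$$\partial_{\tilde{\xi}} \tilde{q}_1 = \tilde{p}_1, \qquad \partial_{\tilde{\xi}} \tilde{p}_1 = C_1\tilde{q}_1 - C_2\tilde{q}_1^3,$$
whose phase portrait is easily calculated by elementary methods and is depicted in Fig. 3. Notice in particular that it has two homoclinic orbits $(\tilde{q}_1^+, \tilde{p}_1^+)$, $(\tilde{q}_1^-, \tilde{p}_1^-)$ given by the explicit formulae
$$\tilde{q}_1^\pm(\tilde{\xi}) = \pm\left(\frac{2C_1}{C_2}\right)^{1/2} \mathrm{sech}(C_1^{1/2}\tilde{\xi}), \qquad \tilde{p}_1^\pm(\tilde{\xi}) = \frac{d}{d\tilde{\xi}}\big(\tilde{q}_1^\pm(\tilde{\xi})\big).$$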
Fig. 3. Dynamics in the $(\tilde{q}_1, \tilde{p}_1)$-subspace
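The explicit homoclinic formula for the limiting scaled system can be compared with a direct numerical integration. The following is a minimal sketch: it evaluates $C_1$ and $C_2$ from the formulae in the text for hypothetical parameter values, integrates $\partial_{\tilde\xi}\tilde q_1 = \tilde p_1$, $\partial_{\tilde\xi}\tilde p_1 = C_1\tilde q_1 - C_2\tilde q_1^3$ with a classical fourth-order Runge-Kutta scheme from the symmetric point $(\tilde q_1, \tilde p_1) = ((2C_1/C_2)^{1/2}, 0)$, and compares the result with the sech profile. The integrator and all function names are illustrative choices, not part of the paper.

```python
import math

def constants(k0, gamma1):
    """C1 and C2 from the formulae in the text; gamma1 < 0 gives C1 > 0."""
    C1 = -2.0 * k0 * gamma1 * (k0**2 + 1.0) ** 1.5
    C2 = 3.0 * k0 * (k0**2 + 1.0) / (4.0 * math.pi)
    return C1, C2

def vector_field(q, p, C1, C2):
    """Right-hand side of the limiting scaled system (eps -> 0)."""
    return p, C1 * q - C2 * q**3

def rk4(q, p, C1, C2, step, n_steps):
    """Classical fourth-order Runge-Kutta integration over n_steps steps."""
    for _ in range(n_steps):
        k1q, k1p = vector_field(q, p, C1, C2)
        k2q, k2p = vector_field(q + 0.5 * step * k1q, p + 0.5 * step * k1p, C1, C2)
        k3q, k3p = vector_field(q + 0.5 * step * k2q, p + 0.5 * step * k2p, C1, C2)
        k4q, k4p = vector_field(q + step * k3q, p + step * k3p, C1, C2)
        q += step * (k1q + 2.0 * k2q + 2.0 * k3q + k4q) / 6.0
        p += step * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
    return q, p

def sech_pulse(xi, C1, C2):
    """Positive homoclinic orbit given by the explicit formula."""
    return math.sqrt(2.0 * C1 / C2) / math.cosh(math.sqrt(C1) * xi)
```

Because the origin is a saddle point of this planar system, numerical errors along the homoclinic orbit grow exponentially with the integration time, so the comparison is only meaningful on moderate intervals.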
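We can exploit the reversibility to show that the phase portrait remains qualitatively unchanged for $\varepsilon > 0$. In particular, we find that (9), (10) has two homoclinic orbits $(q_1^+, p_1^+)$, $(q_1^-, p_1^-)$ of the form
$$\begin{pmatrix} q_1^\pm(\xi) \\ p_1^\pm(\xi) \end{pmatrix} = \begin{pmatrix} \varepsilon f^\pm(\varepsilon\xi) \\ \varepsilon^2 g^\pm(\varepsilon\xi) \end{pmatrix},$$
where $f^\pm$, $g^\pm$ are smooth functions with bounded derivatives. These homoclinic orbits satisfy
$$\begin{pmatrix} q_1^\pm(\xi) \\ p_1^\pm(\xi) \end{pmatrix} = \begin{pmatrix} q_1^\pm(-\xi) \\ -p_1^\pm(-\xi) \end{pmatrix}, \qquad \xi \in \mathbb{R}$$
and
$$|q_1^\pm(\xi)| \le M\varepsilon e^{-r|\xi|}, \qquad |p_1^\pm(\xi)| \le M\varepsilon^2 e^{-r|\xi|}, \qquad \xi \in \mathbb{R}$$
for $r \in (0, \lambda_{1,\varepsilon})$ (see Kirchgässner [12, Proposition 5.1] for an analytical explanation or Groves [7, §4] for a geometric explanation of this result). The Hamiltonian system (12) therefore has a pair of homoclinic orbits which are clearly related by the symmetry $(q,p) \mapsto (-q,-p)$. We denote one of these orbits by $h$ and note that its $q_j$ and $p_j$ components vanish for $j \ge 2$. Moreover $\|h(\xi)\| \le M\varepsilon e^{-r|\xi|}$ and $(Sh)(-\xi) = h(\xi)$ for each $\xi \in \mathbb{R}$; these facts play an important role in the subsequent analysis.

4. The Local Centre-Stable and Centre Manifolds

4.1. Notation and preliminary estimates. The change of variable $X = \varepsilon\hat{X}$ transforms (11) into
$$\partial_\xi \hat{X} = L(\hat{X}) + L_h^\varepsilon(\hat{X}) + L_c^\varepsilon(\hat{X}) + \hat{F}^\varepsilon(\hat{X}) + \rho\hat{R}^\varepsilon(\hat{X}), \tag{13}$$
and (12) into the same equation with $\rho = 0$; here
$$\hat{F}^\varepsilon(\hat{X}) = \varepsilon^{-1} F^\varepsilon(\varepsilon\hat{X}), \qquad \hat{R}^\varepsilon(\hat{X}) = \varepsilon^{-N} R^\varepsilon(\varepsilon\hat{X}), \qquad \rho = \varepsilon^{N-1}.$$
In the following analysis $\rho$ is used to activate or deactivate the remainder term $\hat{R}^\varepsilon$ in Eq. (13). We treat $\rho$ as a bifurcation parameter which is independent of $\varepsilon$, but check that all results are compatible with the relationship $\rho = \varepsilon^{N-1}$. Note that (13) represents Hamilton's equations for the scaled Hamiltonian system $(Y, \Omega, \hat{H}^\varepsilon)$, where the scaled Hamiltonian is given by the formula $\hat{H}^\varepsilon(\hat{X}) = \varepsilon^{-2} H^\varepsilon(\varepsilon\hat{X})$, so that
$$\hat{H}^\varepsilon = \frac{\hat{p}_1^2}{2} + \sum_{m=2}^{\infty} \frac{\mu_m}{2}(\hat{p}_m^2 + \hat{q}_m^2) + O(\varepsilon^2\|\hat{X}\|^2) + O(\rho\|(\hat{X}, \varepsilon)\|^{N-2}\|\hat{X}\|^4). \tag{14}$$
We begin by writing
$$\hat{X} = \hat{Z} + \hat{h},$$
to find that
$$\partial_\xi \hat{Z} = L(\hat{Z}) + L_h^\varepsilon(\hat{Z}) + L_c^\varepsilon(\hat{Z}) + N_1(\hat{Z}) + \rho N_2(\hat{Z}), \tag{15}$$
in which
$$N_1(\hat{Z}) = \hat{F}^\varepsilon(\hat{h} + \hat{Z}) - \hat{F}^\varepsilon(\hat{h}), \qquad N_2(\hat{Z}) = \hat{R}^\varepsilon(\hat{h} + \hat{Z}).$$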
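Here we have used the fact that $\hat{h}$ solves Eq. (13) with $\rho = 0$; this function satisfies
$$\|\hat{h}(\xi)\| \le Me^{-r|\xi|}, \qquad \xi \in \mathbb{R}. \tag{16}$$
Note that $N_1$, $N_2$ are smooth functions $Y \to \mathcal{X}$, so that (13) is semilinear. The function $N_1$ contains terms which are linear in $\hat{Z}$; we return to this point below.

Lemma 2. The estimates
$$\|N_2(\hat{Z})\| \le M\varepsilon(\|\hat{Z}\| + e^{-r|\xi|}), \qquad \|N_2(\hat{Z}_1) - N_2(\hat{Z}_2)\| \le M\varepsilon\|\hat{Z}_1 - \hat{Z}_2\|$$
hold for all $\hat{Z}, \hat{Z}_1, \hat{Z}_2 \in Y$.

Proof. Because $\hat{R}^\varepsilon$ is a smooth function on the open set $Y$ of the Banach algebra $\mathcal{X}$, we have the estimate
$$\|N_2(\hat{Z})\| \le \|\hat{R}^\varepsilon(\hat{Z} + \hat{h})\| \le M\varepsilon\|\hat{Z} + \hat{h}\|^3 \le M\varepsilon(\|\hat{Z}\|^3 + \|\hat{h}\|^3) \le M\varepsilon(\|\hat{Z}\| + e^{-r|\xi|}).$$
Similarly, we find that
$$\|N_2(\hat{Z}_1) - N_2(\hat{Z}_2)\| = \|\hat{R}^\varepsilon(\hat{Z}_1 + \hat{h}) - \hat{R}^\varepsilon(\hat{Z}_2 + \hat{h})\| \le M\varepsilon\|\hat{Z}_1 - \hat{Z}_2\|.$$

Lemma 3. The estimates
$$\|N_1(\hat{Z})\| \le M\varepsilon^2(\|\hat{Z}\|^2 + \|\hat{Z}\|e^{-r|\xi|}), \qquad \|N_1(\hat{Z}_1) - N_1(\hat{Z}_2)\| \le M\varepsilon^2(\|\hat{Z}_1\| + \|\hat{Z}_2\| + e^{-r|\xi|})\|\hat{Z}_1 - \hat{Z}_2\|$$
hold for all $\hat{Z}, \hat{Z}_1, \hat{Z}_2 \in Y$.

Proof. Recall that $\hat{F}^\varepsilon$ is a polynomial in $\hat{X}$ with no quadratic terms; we have that
$$\hat{F}^\varepsilon(\hat{X}) = \hat{F}_3(\hat{X}, \hat{X}, \hat{X}) + \cdots + \hat{F}_N(\hat{X}, \dots, \hat{X}),$$
where $\hat{F}_j = \frac{1}{j!} d^j \hat{F}^\varepsilon[0]$, and $\|\hat{F}_j\|_{L^j(\mathcal{X},\mathcal{X})}$ is $O(\varepsilon^{j-1})$ for $j \ge 3$. It follows that
$$N_1(\hat{Z}) = \hat{F}_3(\hat{Z} + \hat{h}, \hat{Z} + \hat{h}, \hat{Z} + \hat{h}) - \hat{F}_3(\hat{h}, \hat{h}, \hat{h}) + \cdots + \hat{F}_N(\hat{Z} + \hat{h}, \dots, \hat{Z} + \hat{h}) - \hat{F}_N(\hat{h}, \dots, \hat{h}),$$
and we can estimate
$$\begin{aligned} \|\hat{F}_j(\hat{Z} + \hat{h}, \dots, \hat{Z} + \hat{h}) - \hat{F}_j(\hat{h}, \dots, \hat{h})\| &= \|\hat{F}_j(\hat{Z}, \dots, \hat{Z}) + j\hat{F}_j(\hat{Z}, \dots, \hat{Z}, \hat{h}) + \dots + j\hat{F}_j(\hat{Z}, \hat{h}, \dots, \hat{h})\| \\ &\le M\varepsilon^{j-1}(\|\hat{Z}\|^2 + \|\hat{Z}\|e^{-r|\xi|}), \qquad j = 3, \dots, N. \end{aligned}$$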
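To obtain the second estimate note that
$$N_1(\hat{Z}_1) - N_1(\hat{Z}_2) = \sum_{j=3}^{N} \left[ \hat{F}_j(\hat{Z}_1 + \hat{h}, \dots, \hat{Z}_1 + \hat{h}) - \hat{F}_j(\hat{Z}_2 + \hat{h}, \dots, \hat{Z}_2 + \hat{h}) \right]$$
and write, for example,
$$\begin{aligned} \hat{F}_3(\hat{Z}_1 + \hat{h}, \dots, \hat{Z}_1 + \hat{h}) &- \hat{F}_3(\hat{Z}_2 + \hat{h}, \dots, \hat{Z}_2 + \hat{h}) \\ &= \hat{F}_3(\hat{Z}_1, \hat{Z}_1, \hat{Z}_1) - \hat{F}_3(\hat{Z}_2, \hat{Z}_2, \hat{Z}_2) + 3\hat{F}_3(\hat{Z}_1 - \hat{Z}_2, \hat{h}, \hat{h}) + 3\hat{F}_3(\hat{Z}_1, \hat{Z}_1, \hat{h}) - 3\hat{F}_3(\hat{Z}_2, \hat{Z}_2, \hat{h}) \\ &= \hat{F}_3(\hat{Z}_1 - \hat{Z}_2, \hat{Z}_1, \hat{Z}_1) + \hat{F}_3(\hat{Z}_1 - \hat{Z}_2, \hat{Z}_2, \hat{Z}_2) + \hat{F}_3(\hat{Z}_1 - \hat{Z}_2, \hat{Z}_1, \hat{Z}_2) \\ &\quad + 3\hat{F}_3(\hat{Z}_1 - \hat{Z}_2, \hat{h}, \hat{h}) + 3\hat{F}_3(\hat{Z}_1 - \hat{Z}_2, \hat{Z}_1, \hat{h}) + 3\hat{F}_3(\hat{Z}_1 - \hat{Z}_2, \hat{Z}_2, \hat{h}), \end{aligned}$$
so that
$$\|\hat{F}_3(\hat{Z}_1 + \hat{h}, \dots, \hat{Z}_1 + \hat{h}) - \hat{F}_3(\hat{Z}_2 + \hat{h}, \dots, \hat{Z}_2 + \hat{h})\| \le M\varepsilon^2(\|\hat{Z}_1\| + \|\hat{Z}_2\| + e^{-r|\xi|})\|\hat{Z}_1 - \hat{Z}_2\|.$$
The remaining summands are estimated in a similar fashion.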
In the theory below it is also necessary to estimate the strictly nonlinear part of $N_1$.

Lemma 4. The function $N_0$ defined by
$$N_0(\hat{Z}) = N_1(\hat{Z}) - d\hat{F}^\varepsilon[\hat{h}](\hat{Z})$$
satisfies the estimates
$$\|N_0(\hat{Z})\| \le M\varepsilon^2\|\hat{Z}\|^2, \qquad \|N_0(\hat{Z}_1) - N_0(\hat{Z}_2)\| \le M\varepsilon^2(\|\hat{Z}_1\| + \|\hat{Z}_2\|)\|\hat{Z}_1 - \hat{Z}_2\|$$
for all $\hat{Z}, \hat{Z}_1, \hat{Z}_2 \in Y$.

Proof. In the notation of the previous lemma, we have that
$$N_0(\hat{Z}) = N_1(\hat{Z}) - \sum_{j=1}^{N} j\hat{F}_j(\hat{h}, \dots, \hat{h}, \hat{Z}),$$
and the stated results are found by repeating the steps in the proof of the previous lemma.

Let us now decompose the space $\mathcal{X}$ into the hyperbolic and centre parts $\mathcal{X}_h$ and $\mathcal{X}_c$ determined by the spectrum of the operator $L$, so that
$$\mathcal{X} = \mathcal{X}_h \oplus \mathcal{X}_c,$$
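where
\[
\mathcal{X}_h = \{u \in \mathcal{X} \mid q_m = p_m = 0,\ m = 2, 3, \ldots\}, \qquad \mathcal{X}_c = \{u \in \mathcal{X} \mid q_1 = p_1 = 0\}.
\]
Denote the projection onto $\mathcal{X}_h$ along $\mathcal{X}_c$ by $P : \mathcal{X} \to \mathcal{X}$ and define $Q = I - P$. We also use the subscripts $c$ and $h$ as a shorthand notation, so that $\mathcal{X}_h = P\mathcal{X}$, $\mathcal{X}_c = Q\mathcal{X}$. Using this notation, we now establish some basic facts concerning the linear operators $L = L_h + L_c$ and $L_{\hat h} = L + d\hat F[\hat h]$.

Recall that $\mathcal{X}_c$ and $\mathcal{X}_h$ are invariant under $L$. The spectrum of the $2 \times 2$ matrix $L|_{\mathcal{X}_h}$ consists of two real eigenvalues $\pm\lambda$, where $\lambda = |\lambda_{1,\varepsilon}|$. The corresponding eigenvectors are $u = (1, \lambda)$ and $s = (1, -\lambda)$, and a direct calculation shows that the basis $\{s^*, u^*\}$ of $\mathbb{R}^2$ which is dual to $\{s, u\}$ in $\mathcal{X}_h$ satisfies
\[
|u^*| \le \frac{M}{\lambda}, \qquad |s^*| \le \frac{M}{\lambda}.
\]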
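The "direct calculation" behind the dual-basis bound can be reproduced in a few lines. The following Python sketch is only an illustration: the model matrix with rows $(0, 1)$ and $(\lambda^2, 0)$ is an assumption chosen because it has the eigenvectors $u = (1, \lambda)$ and $s = (1, -\lambda)$ quoted above, and the function names are hypothetical. It computes the dual basis explicitly and shows that $\lambda|u^*|$ and $\lambda|s^*|$ stay bounded as $\lambda \to 0$.

```python
import math

def apply_model_matrix(lam, v):
    """Apply the assumed model matrix [[0, 1], [lam**2, 0]] to v = (q, p)."""
    return (v[1], lam ** 2 * v[0])

def dual_basis(lam):
    """Dual basis (s_star, u_star) to s = (1, -lam), u = (1, lam) in R^2.

    The rows of the inverse of the matrix with columns s, u are the dual
    vectors; the determinant of that matrix is 2 * lam.
    """
    det = 2.0 * lam
    s_star = (lam / det, -1.0 / det)
    u_star = (lam / det, 1.0 / det)
    return s_star, u_star

def dot(a, b):
    """Usual inner product in R^2."""
    return a[0] * b[0] + a[1] * b[1]

if __name__ == "__main__":
    for lam in (1.0, 0.1, 0.01):
        s_star, u_star = dual_basis(lam)
        # lam * |u_star| = sqrt(1 + lam**2) / 2, bounded as lam -> 0
        print(lam, lam * math.hypot(*u_star), lam * math.hypot(*s_star))
```

In this model the product $\lambda|u^*|$ equals $\sqrt{1 + \lambda^2}/2$, so the constant $M$ can be taken equal to 1 for $\lambda \le 1$.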
The spectrum of $L|_{\mathcal{X}_c}$ consists of one pair of simple purely imaginary eigenvalues associated with each mode $(q_m, p_m)$, $m \ge 2$. The following lemma is a direct consequence of this observation.

Lemma 5. The operator $L|_{\mathcal{X}_c}$ generates a strongly continuous group $\{K(\xi)\}_{\xi \in \mathbb{R}}$ of operators in $\mathcal{L}(\mathcal{X}_c, \mathcal{X}_c)$ such that
\[
\sup_{\xi \in \mathbb{R}} \|K(\xi)\|_{\mathcal{L}(\mathcal{X}_c, \mathcal{X}_c)} \le M.
\]
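The following results state some spectral properties of the operator $L_{\hat h}$.

Lemma 6. The hyperbolic subspace $\mathcal{X}_h$ is invariant under $L_{\hat h}$.

Proof. It suffices to show that $\mathcal{X}_h$ is invariant under $d\hat F[\hat h]$. The function $\hat F$ has by construction the property that $Q\hat F(\hat X) = 0$ whenever $\hat X_c = 0$ (see Sect. 3). Since $\hat h_c = 0$, we conclude that $Q\,d\hat F[\hat h](\hat Z) = 0$ whenever $\hat Z_c = 0$.

Lemma 7. The equation
\[
\partial_\xi \hat Z_h = L_{\hat h}|_{\mathcal{X}_h} \hat Z_h
\]
has solutions $s(\xi)$, $u(\xi)$ on $[0, \infty)$ such that
\[
|s(\xi)| \le Me^{-\lambda\xi}, \qquad |u(\xi)| \le Me^{\lambda\xi}, \qquad \xi \in [0, \infty).
\]
The dual basis $\{s^*(\xi), u^*(\xi)\}$ to $\{s(\xi), u(\xi)\}$ in $\mathcal{X}_h$ satisfies
\[
|s^*(\xi)| \le \frac{M}{\lambda}e^{\lambda\xi}, \qquad |u^*(\xi)| \le \frac{M}{\lambda}e^{-\lambda\xi}, \qquad \xi \in [0, \infty).
\]

Proof. Note that
\[
\|(L_{\hat h} - L)|_{\mathcal{X}_h}\|_{\mathcal{L}(\mathcal{X}_h, \mathcal{X}_h)} \le Me^{-r|\xi|}, \qquad \xi \in \mathbb{R},
\]
and use the method explained by Groves and Mielke [8, §4.3].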
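4.2. The local centre-stable manifold. We now construct the local centre-stable manifold for solutions of Eq. (15) at time $\xi = 0$. This manifold consists of the initial data for solutions of (15) which exist for some time interval starting at $\xi = 0$. Such solutions are constructed by modifying the nonlinearities by a cut-off function and proving a global existence result for solutions of the modified system on $[0, \infty)$ whose hyperbolic parts are small. We write Eq. (15) as
\[
\begin{aligned}
\partial_\xi \hat Z_h &= L_{\hat h}(\hat Z_h) + P(N_0(\hat Z) + \rho N_2(\hat Z)), &\qquad (17)\\
\partial_\xi \hat Z_c &= L(\hat Z_c) + Q(N_1(\hat Z) + \rho N_2(\hat Z)) &\qquad (18)
\end{aligned}
\]
and modify the nonlinearities by means of a cut-off function $\theta \in C^\infty(\mathcal{X}, \mathbb{R})$ which satisfies
\[
\theta(\hat Z_c) = \begin{cases} 1, & \|\hat Z_c\| \le \delta, \\ 0, & \|\hat Z_c\| \ge 2\delta, \end{cases}
\]
where $\delta = \varepsilon^n$ for some $n \in \mathbb{N}$. We therefore consider the equations
\[
\begin{aligned}
\partial_\xi \hat Z_h &= L_{\hat h}(\hat Z_h) + P(\underline{N}_0(\hat Z) + \rho\underline{N}_2(\hat Z)), &\qquad (19)\\
\partial_\xi \hat Z_c &= L(\hat Z_c) + Q(\underline{N}_1(\hat Z) + \rho\underline{N}_2(\hat Z)), &\qquad (20)
\end{aligned}
\]
in which $\underline{N}_j(\hat Z) = N_j(\hat Z\theta(\hat Z_c))$. Note that $\underline{N}_0$, $\underline{N}_1$, $\underline{N}_2$ are controlled by the same estimates as respectively $N_0$, $N_1$, $N_2$.

Consider the integral equation
\[
\begin{aligned}
\hat Z(\xi) = {}& V_s s(\xi) + K(\xi)V_c
+ \int_0^\xi \langle P(\underline{N}_0(\hat Z) + \rho\underline{N}_2(\hat Z))(\tau), s^*(\tau)\rangle\, d\tau\; s(\xi) \\
&- \int_\xi^\infty \langle P(\underline{N}_0(\hat Z) + \rho\underline{N}_2(\hat Z))(\tau), u^*(\tau)\rangle\, d\tau\; u(\xi)
+ \int_0^\xi K(\xi - \tau)Q(\underline{N}_1(\hat Z) + \rho\underline{N}_2(\hat Z))(\tau)\, d\tau,
\end{aligned}
\]
where $V_s \in \mathbb{R}$, $V_c \in \mathcal{D}_c$ and $\langle\cdot\,,\cdot\rangle$ denotes the usual inner product in $\mathbb{R}^2$. We write this equation as
\[
\hat Z = \mathcal{F}(\hat Z)
\]
and study it in the Banach space
\[
E_r^+ = \{\hat Z \in C([0, \infty), \mathcal{X}) \mid \|\hat Z\|_r := \sup_{\xi \ge 0} e^{-r\xi}\|\hat Z(\xi)\| < \infty\},
\]
in which the exponent $r$ satisfies $0 < r < \lambda$. The following lemma is proved using standard techniques for the regularity of mild solutions of semilinear evolutionary problems (see Pazy [19, Ch. 6]).

Lemma 8. Any fixed point $\hat Z \in E_r^+$ of $\mathcal{F}$ is a solution of (19), (20) on $[0, \infty)$ with the property that
\[
\langle \hat Z_h(0), s^*(0)\rangle = V_s, \qquad Q\hat Z(0) = V_c.
\]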
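Theorem 3. Suppose that $|V_s| \le M\varepsilon^{n+1}$ and $\rho \le \varepsilon^{n+2}$. The function $\mathcal{F}$ is a contraction on
\[
B_\delta^+ = \{\hat Z \in E_r^+ \mid \sup_{\xi \ge 0} |\hat Z_h(\xi)| \le \delta\}
\]
and the unique fixed point of $\mathcal{F}$ in $B_\delta^+$ satisfies
\[
\sup_{\xi \ge 0} |\hat Z_h(\xi)| \le M\varepsilon^{n+1}.
\]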
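Proof. The first step is to show that $B_\delta^+$ is invariant under $\mathcal{F}$. For $|\hat Z_h(\xi)| \le \delta$ on $[0, \infty)$, we find that
\[
\begin{aligned}
|(\mathcal{F}(\hat Z))_h(\xi)| \le {}& M|V_s|e^{-\lambda\xi} + \frac{M}{\lambda}\int_0^\xi [\varepsilon^2\delta^2 + \rho\varepsilon(\delta + e^{-r\tau})]e^{\lambda\tau}\, d\tau\; e^{-\lambda\xi}
+ \frac{M}{\lambda}\int_\xi^\infty [\varepsilon^2\delta^2 + \rho\varepsilon(\delta + e^{-r\tau})]e^{-\lambda\tau}\, d\tau\; e^{\lambda\xi} \\
= {}& M|V_s|e^{-\lambda\xi} + \frac{M}{\lambda^2}[\varepsilon^2\delta^2 + \rho\varepsilon\delta](1 - e^{-\lambda\xi}) + \frac{M}{\lambda(\lambda - r)}\rho\varepsilon(e^{-r\xi} - e^{-\lambda\xi}) \\
&+ \frac{M}{\lambda^2}[\varepsilon^2\delta^2 + \rho\varepsilon\delta] + \frac{M}{\lambda(\lambda + r)}\rho\varepsilon e^{-r\xi} \\
\le {}& M\Big(|V_s| + \delta^2 + \frac{\rho}{\varepsilon}\Big) \qquad (21)
\end{aligned}
\]
and
\[
\begin{aligned}
\|(\mathcal{F}(\hat Z))_c(\xi)\| &\le M\|V_c\| + M\int_0^\xi [\varepsilon^2(\delta^2 + \delta e^{-r\tau}) + \rho\varepsilon(\delta + e^{-r\tau})]\, d\tau \\
&\le M\|V_c\| + M(\varepsilon^2\delta^2 + \rho\varepsilon\delta)\xi + M(\varepsilon\delta + \rho). \qquad (22)
\end{aligned}
\]
It follows that
\[
|(\mathcal{F}(\hat Z))_h(\xi)| \le M\varepsilon^{n+1} + M|V_s|, \qquad \|(\mathcal{F}(\hat Z))_c\|_r < \infty,
\]
whenever $\delta = \varepsilon^n$, $\rho \le \varepsilon^{n+2}$, and we conclude that $\mathcal{F}$ maps $B_\delta^+$ into itself with $|(\mathcal{F}(\hat Z))_h(\xi)| \le M\varepsilon^{n+1}$ on $[0, \infty)$ for $|V_s| \le M\varepsilon^{n+1}$.

It remains to show that $\mathcal{F}$ is a contraction in $B_\delta^+$. Observe that
\[
\begin{aligned}
|(\mathcal{F}(\hat Z_1))_h(\xi) - (\mathcal{F}(\hat Z_2))_h(\xi)|
\le {}& \frac{M}{\lambda}\int_0^\xi (\varepsilon^2\delta + \rho\varepsilon)\|\hat Z_1 - \hat Z_2\|e^{\lambda\tau}\, d\tau\; e^{-\lambda\xi}
+ \frac{M}{\lambda}\int_\xi^\infty (\varepsilon^2\delta + \rho\varepsilon)\|\hat Z_1 - \hat Z_2\|e^{-\lambda\tau}\, d\tau\; e^{\lambda\xi} \\
\le {}& \frac{M}{\lambda}\int_0^\xi (\varepsilon^2\delta + \rho\varepsilon)e^{(\lambda + r)\tau}\, d\tau\; e^{-\lambda\xi}\|\hat Z_1 - \hat Z_2\|_r
+ \frac{M}{\lambda}\int_\xi^\infty (\varepsilon^2\delta + \rho\varepsilon)e^{(r - \lambda)\tau}\, d\tau\; e^{\lambda\xi}\|\hat Z_1 - \hat Z_2\|_r \\
= {}& \frac{M}{\lambda(r + \lambda)}(\varepsilon^2\delta + \rho\varepsilon)(e^{r\xi} - e^{-\lambda\xi})\|\hat Z_1 - \hat Z_2\|_r
+ \frac{M}{\lambda(\lambda - r)}(\varepsilon^2\delta + \rho\varepsilon)e^{r\xi}\|\hat Z_1 - \hat Z_2\|_r
\end{aligned}
\]
and
\[
\begin{aligned}
\|(\mathcal{F}(\hat Z_1))_c(\xi) - (\mathcal{F}(\hat Z_2))_c(\xi)\|
&\le M\int_0^\xi [\varepsilon^2(\delta + e^{-r\tau}) + \rho\varepsilon]\|\hat Z_1 - \hat Z_2\|\, d\tau \\
&\le M\int_0^\xi [(\varepsilon^2\delta + \rho\varepsilon)e^{r\tau} + \varepsilon^2]\, d\tau\; \|\hat Z_1 - \hat Z_2\|_r \\
&\le \Big[\frac{M}{r}(\varepsilon^2\delta + \rho\varepsilon)(e^{r\xi} - 1) + M\varepsilon^2\xi\Big]\|\hat Z_1 - \hat Z_2\|_r
\end{aligned}
\]
for $\hat Z_1, \hat Z_2 \in B_\delta^+$, so that
\[
|(\mathcal{F}(\hat Z_1))_h - (\mathcal{F}(\hat Z_2))_h|_r \le \Big[\frac{M}{\lambda(r + \lambda)}(\varepsilon^2\delta + \rho\varepsilon) + \frac{M}{\lambda(\lambda - r)}(\varepsilon^2\delta + \rho\varepsilon)\Big]\|\hat Z_1 - \hat Z_2\|_r
\]
and
\[
\|(\mathcal{F}(\hat Z_1))_c - (\mathcal{F}(\hat Z_2))_c\|_r \le \Big[\frac{M}{r}(\varepsilon^2\delta + \rho\varepsilon) + \frac{M}{r}\varepsilon^2\Big]\|\hat Z_1 - \hat Z_2\|_r,
\]
in which the estimate
\[
\sup_{\xi \ge 0} r\xi e^{-r\xi} \le M
\]
has been used. It follows that $\mathrm{Lip}_{B_\delta^+}\mathcal{F} < 1$ for $\delta = \varepsilon^n$ and $\rho \le \varepsilon^{n+2}$.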
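To see why the two Lipschitz constants obtained at the end of this proof are small, it can help to evaluate them numerically. The sketch below is an illustration under stated assumptions, not part of the argument: it takes $\lambda$ proportional to $\varepsilon$ and $r$ a fixed fraction of $\lambda$ (so that $0 < r < \lambda$), sets the generic constant $M$ to 1, and uses $\delta = \varepsilon^n$, $\rho = \varepsilon^{n+2}$. The function name and the parameters `lam_over_eps` and `r_over_lam` are hypothetical.

```python
def lipschitz_bounds(eps, n, lam_over_eps=1.0, r_over_lam=0.5, M=1.0):
    """Evaluate the hyperbolic and centre Lipschitz constants for F.

    Assumptions (not taken from the text): lam = lam_over_eps * eps,
    r = r_over_lam * lam, and the generic constant M is a fixed number.
    """
    delta = eps ** n
    rho = eps ** (n + 2)
    lam = lam_over_eps * eps
    r = r_over_lam * lam
    small = eps ** 2 * delta + rho * eps          # the factor (eps^2 delta + rho eps)
    hyperbolic = M * small / (lam * (r + lam)) + M * small / (lam * (lam - r))
    centre = M * small / r + M * eps ** 2 / r
    return hyperbolic, centre

if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        print(eps, lipschitz_bounds(eps, n=1))
```

Under these assumptions the hyperbolic constant scales like $\varepsilon^n$ and the centre constant like $\varepsilon$, so both fall below 1 once $\varepsilon$ is small enough.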
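Lemma 9. Suppose that $\rho \le \varepsilon^{n+2}$. Any solution $\hat Z$ of (19), (20) on $[0, \infty)$ which satisfies
\[
\sup_{\xi \ge 0} |\hat Z_h(\xi)| \le \delta
\]
is a fixed point of $\mathcal{F}$ in $B_\delta^+$ with
\[
\langle \hat Z_h(0), s^*(0)\rangle = V_s, \qquad \hat Z_c(0) = V_c.
\]

Proof. Any solution of (19), (20) on $[0, \infty)$ satisfies
\[
\begin{aligned}
\hat Z(\xi) = {}& \langle \hat Z_h(0), s^*(0)\rangle s(\xi) + \int_0^\xi \langle P(\underline{N}_0(\hat Z) + \rho\underline{N}_2(\hat Z)), s^*(\tau)\rangle\, d\tau\; s(\xi) \\
&+ \langle \hat Z_h(0), u^*(0)\rangle u(\xi) + \int_0^\xi \langle P(\underline{N}_0(\hat Z) + \rho\underline{N}_2(\hat Z)), u^*(\tau)\rangle\, d\tau\; u(\xi) \\
&+ K(\xi)\hat Z_c(0) + \int_0^\xi K(\xi - \tau)Q(\underline{N}_1(\hat Z) + \rho\underline{N}_2(\hat Z))\, d\tau.
\end{aligned}
\]
Using this analogue of the variation-of-constants formula, we now follow the standard arguments given by Coddington and Levinson [1, pp. 332–333].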
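We now use the above results to define a local centre-stable manifold at time $\xi = 0$ for the nonautonomous Eqs. (17), (18); this manifold is a global centre-stable manifold for Eqs. (19), (20) in which the cut-off function is used.

Definition 1. Suppose that $\rho \le \varepsilon^{n+2}$ and take $V_c \in \mathcal{D}_c$, $V_s \in \mathbb{R}$ with the property that $\mathcal{F}$ has a unique fixed point $\hat Z_{V_s,V_c}$ in $B_\delta^+$. The set of points
\[
W^{cs} = \bigcup_{V_s, V_c} \{\hat Z_{V_s,V_c}(0)\}
\]
is called the local centre-stable manifold for solutions to (17), (18) at time $\xi = 0$.

4.3. The local centre manifold. We now construct a locally invariant manifold for Eq. (13) which consists of solutions whose hyperbolic parts are small. More precisely, we consider the equations
\[
\begin{aligned}
\partial_\xi \hat X_h &= L(\hat X_h) + P(\underline{\hat F}(\hat X) + \rho\underline{\hat R}(\hat X)), &\qquad (23)\\
\partial_\xi \hat X_c &= L(\hat X_c) + Q(\underline{\hat F}(\hat X) + \rho\underline{\hat R}(\hat X)), &\qquad (24)
\end{aligned}
\]
where the underscore represents modification by the cut-off function $\theta$. The estimates
\[
\|\hat R(\hat X)\| \le M\varepsilon\|\hat X\|^3, \qquad \|\hat R(\hat X_1) - \hat R(\hat X_2)\| \le M\varepsilon\|\hat X_1 - \hat X_2\|, \qquad (25)
\]
\[
\|\hat F(\hat X)\| \le M\varepsilon^2\|\hat X\|^3, \qquad \|\hat F(\hat X_1) - \hat F(\hat X_2)\| \le M\varepsilon^2(\|\hat X_1\| + \|\hat X_2\|)\|\hat X_1 - \hat X_2\| \qquad (26)
\]
clearly hold for all $\hat X, \hat X_1, \hat X_2 \in \mathcal{Y}$ and are also valid for $\underline{\hat F}$ and $\underline{\hat R}$.

Consider the integral equation
\[
\begin{aligned}
\hat X(\xi) = {}& K(\xi)V_c + \int_{-\infty}^\xi \langle P(\underline{\hat F}(\hat X) + \rho\underline{\hat R}(\hat X))(\tau), s^*e^{\lambda\tau}\rangle\, d\tau\; se^{-\lambda\xi} \\
&- \int_\xi^\infty \langle P(\underline{\hat F}(\hat X) + \rho\underline{\hat R}(\hat X))(\tau), u^*e^{-\lambda\tau}\rangle\, d\tau\; ue^{\lambda\xi}
+ \int_0^\xi K(\xi - \tau)Q(\underline{\hat F}(\hat X) + \rho\underline{\hat R}(\hat X))(\tau)\, d\tau,
\end{aligned}
\]
where $V_c \in \mathcal{D}_c$. We write this equation as
\[
\hat X = \mathcal{G}(\hat X)
\]
and study it in the Banach space
\[
E_r = \Big\{\hat X \in C(\mathbb{R}, \mathcal{X}) \;\Big|\; \|\hat X\|_r = \sup_{\xi \in \mathbb{R}} e^{-r|\xi|}\|\hat X(\xi)\| < \infty\Big\}.
\]
The following theorem is obtained in the same way as the results in Sect. 4.2.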
Theorem 4. Suppose that $\rho \le \varepsilon^{n+2}$.

(i) Any fixed point $\hat X \in E_r$ of $\mathcal{G}$ is a solution of (23), (24) on $\mathbb{R}$ with the property that $Q\hat X(0) = V_c$.

(ii) The function $\mathcal{G}$ is a contraction on
\[
B_\delta = \{\hat X \in E_r \mid \sup_{\xi \in \mathbb{R}} |\hat X_h(\xi)| \le \delta\}
\]
and the unique fixed point of $\mathcal{G}$ in $B_\delta$ satisfies
\[
\sup_{\xi \in \mathbb{R}} |\hat X_h(\xi)| \le M\varepsilon^{n+1}.
\]

(iii) Any solution to (23), (24) on $\mathbb{R}$ which satisfies
\[
\sup_{\xi \in \mathbb{R}} |\hat X_h(\xi)| \le \delta
\]
is a fixed point of $\mathcal{G}$ in $B_\delta$ with $\hat X_c(0) = V_c$.

We now define a local centre manifold for (13) as the set of initial data on solutions of (23), (24) on $\mathbb{R}$ whose hyperbolic parts are bounded by $\delta$. The precise choice of starting time for the solutions is, however, not important because the equations are autonomous. Notice that $W^c$ is a locally invariant manifold for (13) and a globally invariant manifold for (23), (24).

Definition 2. Suppose that $\rho \le \varepsilon^{n+2}$, take $V_c \in \mathcal{D}_c$ and denote the unique fixed point of $\mathcal{G}$ in $B_\delta$ by $\hat X_{V_c}$. The set of points
\[
W^c = \bigcup_{V_c} \{\hat X_{V_c}(0)\}
\]
is called the local centre manifold for (13).

It will prove useful to introduce a reduction function $\psi$ to describe the local part of $W^c$, so that
\[
W^c \cap \{\|\hat X_c\| \le \delta\} = \{(\psi(\hat X_c), \hat X_c) \mid \|\hat X_c\| \le \delta\}.
\]
The reduction function $\psi$ is defined by
\[
\psi(\hat X_c) = (\hat X_{\hat X_c})_h(0),
\]
and a familiar argument shows that $\|\psi(\hat X_c)\| \le M\|\hat X_c\|^2$ (see Mielke [16, p. 77]).
5. The Global Centre-Stable Manifold

In this section we show that solutions $\hat Z_{V_s,V_c}$ of (19), (20) whose initial data lie on $W^{cs}$ satisfy $\|\hat Z_c(\xi)\| \le \delta$ for $\xi \in [0, \infty)$ whenever $|V_s|, \|V_c\| \le \varepsilon^{n+2}$. Such solutions solve (17), (18), in which the cut-off function $\theta$ is not used, and we then refer to the submanifold $\tilde W^{cs}$ of $W^{cs}$ given by restricting to these values of $V_s$ and $V_c$ as a global centre-stable manifold for solutions of (17), (18) at $\xi = 0$.

The proof of the above assertion is given in Theorem 6 below. It relies upon three preliminary results. Firstly, the origin is Lyapunov stable within the centre manifold $W^c$, so that solutions on $W^c$ which are $O(\varepsilon^{n+1})$ at some time remain $O(\varepsilon^{n+1})$ for all subsequent times (Lemma 10). Secondly, the centre part of solutions with initial data on $W^{cs}$ remains $O(\varepsilon^{n+1})$ on timescales of order $1/\varepsilon^{n+1}$ (Lemma 11). Thirdly, any solution $\hat Z$ with initial data on $W^{cs}$ has the property that $\hat Z + \hat h$ converges exponentially to a solution $\hat X$ on $W^c$ (Theorem 5). The idea is therefore to control the centre part of $\hat Z$ over a long timescale using Lemma 11; at the end of this timescale $\hat Z_c$ is very close to $\hat X_c$ because of the exponential decay and can be controlled by the Lyapunov stability of the origin in $W^c$. The main issue is careful book-keeping of the exponential decay rates and multiplicative constants in the various estimates.

Lemma 10. Any solution $\hat X$ of (23), (24) with the property that $\|\hat X_c(\xi^\star)\| \le M\varepsilon^{n+1}$ for some $\xi^\star > 0$ satisfies
\[
\|\hat X_c(\xi)\| \le M\varepsilon^{n+1} < \delta, \qquad \xi \ge \xi^\star.
\]

Proof. Let $\hat X$ be a solution of (23), (24) that lies on $W^c$, so that
\[
\hat X(\xi) = (\psi(\hat X_c(\xi)), \hat X_c(\xi))
\]
whenever $\|\hat X_c(\xi)\| \le \delta$. It follows that $\hat H(\hat X(\xi)) = \hat H(\psi(\hat X_c(\xi)), \hat X_c(\xi))$, and using the formula (14) for $\hat H$ and the estimate $|\psi(\hat X_c)| \le M\|\hat X_c\|^2$ for $\psi$, we find that
\[
\hat H(\hat X(\xi)) \ge M\|\hat X_c(\xi)\|^2
\]
whenever $\|\hat X_c(\xi)\| \le \delta$. The classical Lyapunov stability argument now shows that $\|\hat X_c(\xi^\star)\| \le M\varepsilon^{n+1}$ implies that $\|\hat X_c(\xi)\| \le M\varepsilon^{n+1}$ for $\xi \ge \xi^\star$ (cf. Meyer & Hall [15, p. 5]).
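Lemma 11. Suppose that $\rho \le \varepsilon^{n+2}$. Any solution $\hat Z_{V_s,V_c}$ of (19), (20) with initial data on $W^{cs}$ which satisfies
\[
|V_s| \le M\varepsilon^{n+1}, \qquad \|V_c\| \le M\varepsilon^{n+1}
\]
has the properties that
\[
|\hat Z_h(\xi)| \le M\varepsilon^{n+1}, \qquad \xi \in [0, \infty)
\]
and
\[
\|\hat Z_c(\xi)\| \le M\varepsilon^{n+1}, \qquad \xi \in [0, \xi_1]
\]
for any $\xi_1 \le 1/\varepsilon^{n+1}$.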
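Proof. The fact that $|\hat Z_h(\xi)| < M\varepsilon^{n+1}$ for $\xi \in [0, \infty)$ follows directly from Theorem 3, and the estimate (22) in its proof shows that
\[
\|\hat Z_c(\xi)\| \le M\|V_c\| + M(\varepsilon^2\delta^2 + \rho\varepsilon\delta)\xi + M(\varepsilon\delta + \rho), \qquad \xi \in [0, \infty).
\]
The result for $\hat Z_c(\xi)$ follows from this inequality, the hypothesis on $V_c$ and the relations $\delta = \varepsilon^n$, $\rho \le \varepsilon^{n+2}$.

Consider a solution $\hat Z$ to (19), (20) with initial data on $W^{cs}$ which meets the hypotheses in Lemma 11. We now use the strategy explained by Mielke [17] to construct a solution $\hat X$ of (23), (24) to which $\hat Z + \hat h$ converges exponentially as $\xi \to \infty$. Define $\hat Y : \mathbb{R} \to \mathcal{Y}$ by
\[
\hat Y(\xi) = I(\xi)(\hat Z(\xi) + \hat h(\xi)), \qquad (27)
\]
where $I \in C^\infty(\mathbb{R}, \mathbb{R})$ is a cut-off function such that
\[
I(\xi) = \begin{cases} 0, & \xi \le \xi_2/2, \\ 1, & \xi \ge \xi_2, \end{cases}
\]
and is chosen so that $|I'(\xi)| \le M/\xi_2$ for all $\xi \in \mathbb{R}$. It follows that $\hat Y$ solves the equation
\[
\partial_\xi \hat Y = L\hat Y + \hat F(\hat Y) + \rho\hat R(\hat Y) + S,
\]
in which
\[
\begin{aligned}
S(\xi) = {}& I(\xi)[\hat F(\hat Z(\xi) + \hat h(\xi)) + \rho\hat R(\hat Z(\xi) + \hat h(\xi))] \\
&- \hat F(I(\xi)(\hat Z(\xi) + \hat h(\xi))) - \rho\hat R(I(\xi)(\hat Z(\xi) + \hat h(\xi))) + I'(\xi)(\hat Z(\xi) + \hat h(\xi)).
\end{aligned}
\]
Notice that the difference $J$ between $\hat Y$ and a solution $\hat X$ of Eq. (23), (24) solves the equation
\[
\partial_\xi J = LJ + \hat F(\hat Y) - \hat F(\hat Y - J) + \rho[\hat R(\hat Y) - \hat R(\hat Y - J)] + S. \qquad (28)
\]
We therefore seek a solution $J$ of this equation which decays exponentially to zero as $\xi \to \infty$ and has the property that $\hat X = \hat Y - J$ lies on $W^c$. Consider the integral equation
\[
\begin{aligned}
J(\xi) = {}& \int_{-\infty}^\xi \langle P(\hat F(\hat Y) - \hat F(\hat Y - J) + \rho[\hat R(\hat Y) - \hat R(\hat Y - J)] + S)(\tau), s^*e^{\lambda\tau}\rangle\, d\tau\; se^{-\lambda\xi} \\
&- \int_\xi^\infty \langle P(\hat F(\hat Y) - \hat F(\hat Y - J) + \rho[\hat R(\hat Y) - \hat R(\hat Y - J)] + S)(\tau), u^*e^{-\lambda\tau}\rangle\, d\tau\; ue^{\lambda\xi} \\
&- \int_\xi^\infty K(\xi - \tau)Q(\hat F(\hat Y) - \hat F(\hat Y - J) + \rho[\hat R(\hat Y) - \hat R(\hat Y - J)] + S)(\tau)\, d\tau.
\end{aligned}
\]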
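We write this equation as
\[
J = \mathcal{K}(J)
\]
and study it in the Banach space
\[
E_r^- = \{J \in C(\mathbb{R}, \mathcal{X}) \mid \|J\|_r = \sup_{\xi \in \mathbb{R}} e^{r\xi}\|J(\xi)\| < \infty\}.
\]

Theorem 5. Suppose that $\rho \le \varepsilon^{n+2}$ and $\xi_2$ satisfies $e^{-r\xi_2/2} = \varepsilon^{n+1}$.

(i) Any fixed point $J \in E_r^-$ of $\mathcal{K}$ is a solution of (28) on $\mathbb{R}$.

(ii) The function $\mathcal{K}$ is a contraction on
\[
B_\delta^- = \{J \in E_r^- \mid \sup_{\xi \in \mathbb{R}} |J_h(\xi)| \le \delta\}
\]
and the unique fixed point of $\mathcal{K}$ in $B_\delta^-$ satisfies
\[
\|J(\xi)\| \le \frac{M}{\varepsilon^2}e^{r\xi_2/2}e^{-r\xi}, \qquad |J_h(\xi)| \le M\frac{\varepsilon^n}{|\log\varepsilon|}, \qquad \xi \in \mathbb{R}.
\]

Proof. The first statement is proved in the same way as Lemma 8. Turning to the second statement, observe that
\[
\|\hat F(\hat Y) - \hat F(\hat Y - J)\| \le M\varepsilon^2(\|\hat Y\| + \|J\|)\|J\|, \qquad
\|\hat R(\hat Y) - \hat R(\hat Y - J)\| \le M\varepsilon\|J\|
\]
(see (25), (26)) and
\[
\|S(\xi)\| \le \frac{M}{\xi_2}(\|\hat Z(\xi)\| + \|\hat h(\xi)\|) + M\varepsilon^2(\|\hat Z(\xi)\|^3 + \|\hat h(\xi)\|^3), \qquad \xi \in [\xi_2/2, \xi_2]
\]
($S(\xi)$ vanishes outside this interval). Because $\xi_2 = 2(n+1)|\log\varepsilon|/r$ and $\|\hat Z(\xi)\| \le M\varepsilon^{n+1}$ for $\xi \le \xi_1$ (see Lemma 11), we find that
\[
\|S(\xi)\| \le M\frac{\varepsilon^{n+2}}{|\log\varepsilon|}, \qquad \xi \in [\xi_2/2, \xi_2] \qquad (29)
\]
(because $\xi_1 \le 1/\varepsilon^{n+1}$ and $\xi_2 = 2(n+1)|\log\varepsilon|/r$, we can always choose $\xi_1 > \xi_2$). Furthermore, the estimate
\[
\|\hat Z(\xi)\| \le \sup_{\xi \in [\xi_2/2, \xi_2]} \|\hat Z(\xi)\|e^{-r(\xi - \xi_2)} \le M\varepsilon^{n+1}e^{-r(\xi - \xi_2)} \le Me^{r\xi_2/2}e^{-r\xi}
\]
shows that
\[
\|S(\xi)\| \le Me^{r\xi_2/2}e^{-r\xi}, \qquad \xi \in [\xi_2/2, \xi_2]. \qquad (30)
\]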
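Using (30) to estimate $S(\xi)$, we find that
\[
\begin{aligned}
\|(\mathcal{K}(J))(\xi)\|e^{r\xi} \le {}& \frac{M}{\lambda}\int_{-\infty}^\xi (\varepsilon^2\delta\|J\|_r + \rho\varepsilon\|J\|_r + e^{r\xi_2/2}\chi)e^{(\lambda - r)\tau}\, d\tau\; e^{(r - \lambda)\xi} \\
&+ \frac{M}{\lambda}\int_\xi^\infty (\varepsilon^2\delta\|J\|_r + \rho\varepsilon\|J\|_r + e^{r\xi_2/2}\chi)e^{-(\lambda + r)\tau}\, d\tau\; e^{(r + \lambda)\xi} \\
&+ M\int_\xi^\infty (\varepsilon^2\delta\|J\|_r + \rho\varepsilon\|J\|_r + e^{r\xi_2/2}\chi)e^{-r\tau}\, d\tau\; e^{r\xi} \\
\le {}& \frac{M}{\lambda(\lambda - r)}(\varepsilon^2\delta + \rho\varepsilon)\|J\|_r + \frac{M}{\lambda(\lambda - r)}e^{r\xi_2/2}
+ \frac{M}{\lambda(\lambda + r)}(\varepsilon^2\delta + \rho\varepsilon)\|J\|_r + \frac{M}{\lambda(\lambda + r)}e^{r\xi_2/2} \\
&+ \frac{M}{r}(\varepsilon^2\delta + \rho\varepsilon)\|J\|_r + \frac{M}{r}e^{r\xi_2/2}, \qquad (31)
\end{aligned}
\]
where $\chi$ is the characteristic function of the interval $[\xi_2/2, \xi_2]$. It follows that $\|\mathcal{K}(J)\|_r < \infty$, so that $\mathcal{K}$ maps $E_r^-$ into itself. Replacing $\|J\|_r$ by $\delta$ and $r$ with 0 in the above inequality and using (29) to estimate $S(\xi)$, we find that
\[
|(\mathcal{K}(J))_h(\xi)| \le \frac{M}{\lambda^2}(\varepsilon^2\delta + \rho\varepsilon)\delta + \frac{M}{\lambda^2}\frac{\varepsilon^{n+2}}{|\log\varepsilon|},
\]
so that $\mathcal{K}$ maps $B_\delta^-$ into itself.

To show that $\mathcal{K}$ is a contraction on $B_\delta^-$, note that
\[
\begin{aligned}
(\mathcal{K}(J_1))(\xi) - (\mathcal{K}(J_2))(\xi)
= {}& \int_{-\infty}^\xi \langle P(\hat F(\hat Y - J_2) - \hat F(\hat Y - J_1) + \rho[\hat R(\hat Y - J_2) - \hat R(\hat Y - J_1)])(\tau), s^*e^{\lambda\tau}\rangle\, d\tau\; se^{-\lambda\xi} \\
&- \int_\xi^\infty \langle P(\hat F(\hat Y - J_2) - \hat F(\hat Y - J_1) + \rho[\hat R(\hat Y - J_2) - \hat R(\hat Y - J_1)])(\tau), u^*e^{-\lambda\tau}\rangle\, d\tau\; ue^{\lambda\xi} \\
&- \int_\xi^\infty K(\xi - \tau)Q(\hat F(\hat Y - J_2) - \hat F(\hat Y - J_1) + \rho[\hat R(\hat Y - J_2) - \hat R(\hat Y - J_1)])(\tau)\, d\tau.
\end{aligned}
\]
Using the estimates
\[
\|\hat F(\hat Y - J_2) - \hat F(\hat Y - J_1)\| \le M\varepsilon^2(\|\hat Y\| + \|J_1\| + \|J_2\|)\|J_1 - J_2\|, \qquad
\|\hat R(\hat Y - J_2) - \hat R(\hat Y - J_1)\| \le M\varepsilon\|J_1 - J_2\|,
\]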
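which are obtained from (25), (26), we find that
\[
\begin{aligned}
\|(\mathcal{K}(J_1))(\xi) - (\mathcal{K}(J_2))(\xi)\|e^{r\xi}
\le {}& \frac{M}{\lambda}\int_{-\infty}^\xi (\varepsilon^2\delta + \rho\varepsilon)\|J_1 - J_2\|e^{\lambda\tau}\, d\tau\; e^{-\lambda\xi}e^{r\xi}
+ \frac{M}{\lambda}\int_\xi^\infty (\varepsilon^2\delta + \rho\varepsilon)\|J_1 - J_2\|e^{-\lambda\tau}\, d\tau\; e^{\lambda\xi}e^{r\xi} \\
&+ M\int_\xi^\infty (\varepsilon^2\delta + \rho\varepsilon)\|J_1 - J_2\|\, d\tau\; e^{r\xi} \\
\le {}& \frac{M}{\lambda(\lambda - r)}(\varepsilon^2\delta + \rho\varepsilon)\|J_1 - J_2\|_r
+ \frac{M}{\lambda(\lambda + r)}(\varepsilon^2\delta + \rho\varepsilon)\|J_1 - J_2\|_r
+ \frac{M}{r}(\varepsilon^2\delta + \rho\varepsilon)\|J_1 - J_2\|_r,
\end{aligned}
\]
so that $\mathrm{Lip}_{B_\delta^-}\mathcal{K} < \frac{1}{2}$.

The estimates $\|\mathcal{K}(0)\|_r \le Me^{r\xi_2/2}/\varepsilon^2$ (see (31)) and $\mathrm{Lip}_{B_\delta^-}\mathcal{K} < \frac{1}{2}$ show that the unique fixed point $J$ of $\mathcal{K}$ in $B_\delta^-$ satisfies $\|J\|_r \le Me^{r\xi_2/2}/\varepsilon^2$, so that $\|J(\xi)\| \le Me^{r\xi_2/2}e^{-r\xi}/\varepsilon^2$ for each $\xi \in \mathbb{R}$.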
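The proof above relies on two timescales: $\xi_2$, defined by $e^{-r\xi_2/2} = \varepsilon^{n+1}$, and the bound $1/\varepsilon^{n+1}$ on $\xi_1$ from Lemma 11, with the requirement $\xi_1 > \xi_2$. The following Python sketch compares them numerically. The choice $r = \varepsilon/2$ used in the example is an assumption made only for illustration, and the function names are hypothetical.

```python
import math

def xi2(eps, n, r):
    """Solve exp(-r * xi2 / 2) = eps**(n + 1) for xi2 (0 < eps < 1)."""
    return 2.0 * (n + 1) * abs(math.log(eps)) / r

def xi1_max(eps, n):
    """Largest time allowed for xi1 in Lemma 11, namely 1 / eps**(n + 1)."""
    return eps ** (-(n + 1))

if __name__ == "__main__":
    n = 1
    for eps in (1e-1, 1e-2, 1e-3):
        r = 0.5 * eps  # illustrative assumption: r proportional to eps
        print(eps, xi2(eps, n, r), xi1_max(eps, n), xi2(eps, n, r) < xi1_max(eps, n))
```

With these illustrative values the ordering $\xi_2 < 1/\varepsilon^{n+1}$ fails at $\varepsilon = 0.1$ but holds at $\varepsilon = 0.01$ and below, which is consistent with the choice $\xi_1 > \xi_2$ being available once $\varepsilon$ is sufficiently small: $\xi_2$ grows like $|\log\varepsilon|/r$ whereas the bound on $\xi_1$ grows like a power of $1/\varepsilon$.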
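It now remains to combine the above results to prove the result announced at the start of this section and to construct a global centre-stable manifold for Eqs. (17), (18).

Theorem 6. Any solution $\hat Z_{V_s,V_c}$ of (19), (20) with $\hat Z_{V_s,V_c} \in W^{cs}$ and $|V_s|, \|V_c\| \le \varepsilon^{n+2}$ satisfies $\|\hat Z_c(\xi)\| \le \delta$ for $\xi \in [0, \infty)$.

Proof. Define $\hat Y$ by (27) and construct $J$ using Theorem 5. The function $\hat X : \mathbb{R} \to \mathcal{Y}$ given by
\[
\hat X(\xi) = \hat Y(\xi) - J(\xi), \qquad \xi \in \mathbb{R}
\]
is a solution of (23), (24) and because
\[
|\hat X_h(\xi)| \le |\hat Y_h(\xi)| + |J_h(\xi)| \le M\frac{\varepsilon^n}{|\log\varepsilon|} < \delta, \qquad \xi \in \mathbb{R}
\]
it lies on $W^c$ (see Theorem 4(iii)). Observe that
\[
\hat Y(\xi) = \hat Z(\xi) + \hat h(\xi), \qquad \xi \ge \xi_2,
\]
so that
\[
\|\hat Z_c(\xi) - \hat X_c(\xi)\| \le \|J(\xi)\| \le \frac{M}{\varepsilon^2}e^{r\xi_2/2}e^{-r\xi}, \qquad \xi \ge \xi_2.
\]
Lemma 11 shows that $\|\hat Z_c(\xi)\| \le M\varepsilon^{n+1}$, $\xi \le \xi_1$, and clearly
\[
\|\hat Z_c(\xi) - \hat X_c(\xi)\| \le \|J(\xi)\| \le \frac{M}{\varepsilon^2}e^{r\xi_2/2}e^{-r\xi_1}, \qquad \xi \ge \xi_1
\]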
because $\xi_1 > \xi_2$. It follows that
\[
\|\hat Z_c(\xi) - \hat X_c(\xi)\| \le \|J(\xi)\| \le M\varepsilon^{n+1}, \qquad \xi \ge \xi_1,
\]
and in particular $\|\hat X_c(\xi_1)\| \le M\varepsilon^{n+1}$. Lemma 10 now asserts that $\|\hat X_c(\xi)\| \le M\varepsilon^{n+1}$, $\xi \ge \xi_1$, so that $\|\hat Z_c(\xi)\| \le M\varepsilon^{n+1}$, $\xi \ge \xi_1$.

Definition 3. The submanifold $\tilde W^{cs}$ of $W^{cs}$ given by restricting to $|V_s|, \|V_c\| \le \varepsilon^{n+2}$ is called the global centre-stable manifold for solutions of (17), (18) at time $\xi = 0$.

6. Construction of Symmetric Modulating Pulses

In this section we identify solutions $\hat Z_{V_s,V_c}$ of (17), (18) on $[0, \infty)$ with $\hat Z_{V_s,V_c}(0) \in \tilde W^{cs}$ which can be extended to solutions of (17), (18) on $\mathbb{R}$. The idea is to exploit the reversibility of Eqs. (17), (18) (see Sect. 2); in particular, solutions with the property that $\hat Z_{V_s,V_c}(0)$ lies on the symmetric section $\Sigma := \mathrm{Fix}\, S = \mathcal{X} \cap \{p = 0\}$ can be extended to symmetric solutions on $\mathbb{R}$. Because $\hat Z_c(0) = V_c$ we have that $\hat Z_c(0) \in \Sigma_c$ whenever $V_c \in \Sigma_c$ and our task is reduced to that of identifying a criterion on $V_s$ which guarantees that $\hat Z_h(0) \in \Sigma_h$.

We consider a solution $\hat Z_{V_s,V_c}$ with $\hat Z_{V_s,V_c}(0) \in \tilde W^{cs}$ as a function of $V_s$ which depends upon $\rho \in \mathbb{R}$ and $V_c \in \Sigma_c$ as parameters (with $\rho, \|V_c\| \le \varepsilon^{n+2}$); accordingly we use the alternative notation $\hat Z_{\rho,V_c}(V_s)$. The above comments show that $\hat Z_{\rho,V_c}(V_s)|_{\xi=0} \in \Sigma$ whenever $V_s$ is a solution of the equation
\[
\mathcal{J}_{\rho,V_c}(V_s) = 0, \qquad (32)
\]
where $\mathcal{J}_{\rho,V_c} : \bar B_{\varepsilon^{n+2}}(0) \subset \mathbb{R} \to \mathbb{R}$ is defined by
\[
\mathcal{J}_{\rho,V_c}(V_s) = (I - S)\hat Z_{\rho,V_c}(V_s)|_{\xi=0}.
\]
(The right-hand side of this equation is a vector in $\mathcal{X}$ with only one nonzero entry, namely its $p_1$-component, and is therefore identified with a real number.) Notice that Eq. (32) has the solution $V_s = 0$ at $(\rho, V_c) = (0, 0)$ since the unique solution of (17), (18) with $(\rho, V_c) = (0, 0)$ is $\hat Z = 0$. We therefore seek a solution of (17), (18) near this known solution for parameter values $(\rho, V_c)$ near $(0, 0)$. It therefore seems natural to apply the implicit-function theorem; notice, however, that we are forced to work from first principles (by applying the contraction mapping principle) since we require precise information concerning the parameter-dependence of the solutions.

To carry out the above programme it is necessary to show that $\mathcal{J}$ is differentiable and obtain some estimates on its derivative. We therefore need to show that the solutions $\hat Z_{\rho,V_c}(V_s)$ described above are differentiable with respect to $V_s$ and obtain some estimates on their derivatives. Formally differentiating the fixed-point equation for $\mathcal{F}$ which defines $\hat Z_{\rho,V_c}(V_s)$ and replacing $\hat Z_{\rho,V_c}(V_s)$ and $d\hat Z_{\rho,V_c}[V_s](\tilde V_s)$ with respectively $\hat Z$ and $\hat Y$, we obtain the integral equation
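\[
\begin{aligned}
\hat Y(\xi) = {}& \tilde V_s s(\xi) + K(\xi)\tilde V_c
+ \int_0^\xi \langle \partial_{\hat Z}(P(N_0(\hat Z) + \rho N_2(\hat Z)))\hat Y, s^*(\tau)\rangle\, d\tau\; s(\xi) \\
&- \int_\xi^\infty \langle \partial_{\hat Z}(P(N_0(\hat Z) + \rho N_2(\hat Z)))\hat Y, u^*(\tau)\rangle\, d\tau\; u(\xi)
+ \int_0^\xi K(\xi - \tau)\partial_{\hat Z}(Q(N_1(\hat Z) + \rho N_2(\hat Z)))\hat Y\, d\tau,
\end{aligned}
\]
where the cut-off function $\theta$ has not been used because $\|\hat Z_c(\xi)\| < \delta$, $\xi \in [0, \infty)$. We write this equation as
\[
\hat Y = \mathcal{F}'(\hat Z)(\hat Y)
\]
and study it in the space $E_\mu^+$, where $\mu \in [r, 2r]$ and $r$ is now chosen so that $0 < 2r < \lambda$. Using the results of the following lemma, we find from the fibre contraction mapping principle and standard uniform continuity arguments that $\hat Z_{\rho,V_c}(V_s)$ is differentiable with $\hat Y = d\hat Z_{\rho,V_c}[V_s](\tilde V_s)$ (cf. Vanderbauwhede [24, §1.3]).

Lemma 12. (i) The mapping $\mathcal{F}'(\hat Z)$ is a uniform contraction on $E_\mu^+$ for $\hat Z$ on $\tilde W^{cs}$.

(ii) For each fixed $\hat Y \in E_r^+$ the mapping $\mathcal{F}'(\cdot)(\hat Y) : E_r^+ \to E_{2r}^+$ is continuous.

Proof. Observe that
\[
\|(\mathcal{F}'(\hat Z)(\hat Y))_c(\xi)\| \le M\varepsilon^2\int_0^\xi \|\hat Y\|\, d\tau \le \frac{M\varepsilon^2}{\mu}\|\hat Y\|_\mu e^{\mu\xi},
\]
so that
\[
\|(\mathcal{F}'(\hat Z)(\hat Y))_c\|_\mu \le \frac{M\varepsilon^2}{\mu}\|\hat Y\|_\mu
\]
and
\[
\begin{aligned}
|(\mathcal{F}'(\hat Z)(\hat Y))_h(\xi)| \le {}& M|\tilde V_s|e^{-\lambda\xi} + \frac{M}{\lambda}\int_0^\xi (\varepsilon^2\delta + \rho\varepsilon)\|\hat Y\|e^{\lambda\tau}\, d\tau\; e^{-\lambda\xi}
+ \frac{M}{\lambda}\int_\xi^\infty (\varepsilon^2\delta + \rho\varepsilon)\|\hat Y\|e^{-\lambda\tau}\, d\tau\; e^{\lambda\xi} \\
\le {}& M|\tilde V_s|e^{-\lambda\xi} + \frac{M}{\lambda}(\varepsilon^2\delta + \rho\varepsilon)\int_0^\xi \|\hat Y\|\, d\tau + \frac{M}{\lambda}(\varepsilon^2\delta + \rho\varepsilon)\int_\xi^\infty \|\hat Y\|\, d\tau \\
\le {}& M|\tilde V_s|e^{-\lambda\xi} + \frac{M}{\varepsilon^2}(\varepsilon^2\delta + \rho\varepsilon)\|\hat Y\|_\mu e^{\mu\xi},
\end{aligned}
\]
so that
\[
|(\mathcal{F}'(\hat Z)(\hat Y))_h|_\mu \le M|\tilde V_s| + \frac{M}{\varepsilon^2}(\varepsilon^2\delta + \rho\varepsilon)\|\hat Y\|_\mu.
\]
The first assertion follows from the above estimates and the fact that $\mathcal{F}'(\hat Z)(\hat Y)$ is linear in $\hat Y$.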
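Turning to the second assertion, note that
\[
\|(\mathcal{F}'(\hat Z_1)(\hat Y) - \mathcal{F}'(\hat Z_2)(\hat Y))_c(\xi)\| \le M\varepsilon^2\int_0^\xi \|\hat Z_1 - \hat Z_2\|\,\|\hat Y\|\, d\tau \le \frac{M\varepsilon^2}{r}\|\hat Z_1 - \hat Z_2\|_r\|\hat Y\|_r e^{2r\xi},
\]
so that
\[
\|(\mathcal{F}'(\hat Z_1)(\hat Y) - \mathcal{F}'(\hat Z_2)(\hat Y))_c\|_{2r} \le \frac{M\varepsilon^2}{r}\|\hat Z_1 - \hat Z_2\|_r\|\hat Y\|_r,
\]
and similarly we find that
\[
|(\mathcal{F}'(\hat Z_1)(\hat Y) - \mathcal{F}'(\hat Z_2)(\hat Y))_h|_{2r} \le M\|\hat Z_1 - \hat Z_2\|_r\|\hat Y\|_r.
\]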
The following corollary of Lemma 12 states the requisite estimates on the derivative of J . Corollary 1. (i) The operator dJ0,0 [0] : R → R is a bijection and |dJ0,0 [0]−1 | ≤
M . λ
(ii) The operator dJρ,Vc [Vs ] : R → R satisfies the estimate ρ dJρ,Vc [Vs ] − dJ0,0 [0] ≤ M |Vs | + Vc + .
Proof. Clearly
(33)
(34)
dJ0,0 [0] = (I − S)dZˆ 0,0 [0]|ξ =0
and
dZˆ 0,0 [0]|ξ =0 (V˜s ) = V˜s s(0),
so that
dJ0,0 [0](V˜s ) = V˜s (I − S)s(0).
Taking the inner product of this equation with (I + S)s ∗ (0) and using the fact that S : X → X is a self-adjoint involution, one finds that 1 V˜s = !dJ0,0 [0](V˜s ), (I + S)s ∗ (0)", 2 from which the first assertion is a direct consequence. Define Zˆ 1 = Zˆ ρ,Vc (Vs ), Yˆ1 = dZˆ ρ,Vc [Vs ], Yˆ2 = dZˆ 0,0 [0] and note that Zˆ 0,0 is identically zero. By construction we have that Yˆ1 − Yˆ2 = F (Zˆ 1 )(Yˆ1 ) − F (0)(Yˆ2 ), so that (Yˆ1 − Yˆ2 )h (ξ ) =
ξ
!∂Zˆ (P (N0 (Zˆ 1 ) + ρN2 (Zˆ 1 )))Yˆ1 , s ∗ (τ )" dτ s(ξ ) 0 ∞ !∂Zˆ (P (N0 (Zˆ 1 ) + ρN2 (Zˆ 1 )))Yˆ1 , u∗ (τ )" dτ u(ξ ). − ξ
The estimates ∂Zˆ (N0 (Zˆ 1 ) + ρN2 (Zˆ 1 ))Yˆ1 ≤ M( 2 Zˆ 1 + ρ )Yˆ1 ρ Zˆ 1 r ≤ M |Vs | + Vc + 2
and show that
∂Zˆ ((N0 (Zˆ 1 ) + ρN2 (Zˆ 1 )))Yˆ1 ≤ M( 2 |Vs + 2 Vc + ρ )Yˆ1 erξ . It follows that M 2 ( |Vs | + 2 Vc + ρ ) |(Yˆ1 − Yˆ2 )h (ξ )| ≤ λ ξ × Yˆ1 erτ eλτ dτ e−λξ + 0
∞ ξ
Yˆ1 erτ e−λτ dτ eλξ
M 2 ≤ ( |Vs | + 2 Vc + ρ )Yˆ1 r λ ξ ∞ 2rτ (2r−λ)τ λξ × e dτ + e dτ e 0
≤
ξ
M 2 ( |Vs | + 2 Vc + ρ )Yˆ1 r λ
from which the stated result is an immediate consequence.
e2rξ − 1 e2rξ + 2r λ − 2r
,
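We now study the solution set of the equation $\mathcal{J}_{\rho,V_c}(V_s) = 0$ near the known solution $V_s = 0$ at $(\rho, V_c) = (0, 0)$ by writing it as
\[
V_s = V_s - d\mathcal{J}_{0,0}[0]^{-1}\mathcal{J}_{\rho,V_c}(V_s) \qquad (35)
\]
and examining this fixed point problem. According to a standard argument in nonlinear analysis the fixed-point problem (35) has a unique solution $V_s = V_s(\rho, V_c)$ in $\bar B_\eta(0) \subset \mathbb{R}$ whenever
\[
\|d\mathcal{J}_{0,0}[0]^{-1}\mathcal{J}_{\rho,V_c}(0)\| \le \frac{\eta}{2}, \qquad
\|d\mathcal{J}_{0,0}[0]^{-1}\|\,\|d\mathcal{J}_{\rho,V_c}[V_s] - d\mathcal{J}_{0,0}[0]\| \le \frac{1}{2}, \quad V_s \in \bar B_\eta(0).
\]
The estimates (33), (34) and
\[
\|\mathcal{J}_{\rho,V_c}(0)\| \le M\|(\hat Z_{\rho,V_c}(0)|_{\xi=0})_h\| \le \frac{M}{\varepsilon^2}(\varepsilon^2\delta^2 + \rho\varepsilon)
\]
(see formula (21)) show that for $\rho = \varepsilon^{n+5}$ we can take $\eta = \varepsilon^{n+2}$, which is compatible with the restriction $|V_s| \le \varepsilon^{n+2}$ for $\hat Z_{\rho,V_c}$ on $\tilde W^{cs}$.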
We have therefore constructed a family of symmetric solutions $\hat Z_{V_c}$ to (17), (18) on $\mathbb{R}$ which are parameterised by $V_c \in \Sigma_c$ with $\|V_c\| \le \varepsilon^{n+2}$ and satisfy $\|\hat Z(\xi)\| \le \delta = \varepsilon^n$ for each $\xi \in \mathbb{R}$. The formula
\[
X(\xi) = h(\xi) + \hat Z_{V_c}(\xi), \qquad \xi \in \mathbb{R}
\]
defines a family of pulse-like solutions to the Hamiltonian system (11) which was obtained from the original spatial dynamics formulation of the problem by the normal-form theory in Sect. 3 (because $\rho = \varepsilon^{n+5}$ we have to take $N = n + 6$ in this normal-form theory). These solutions are parameterised by $V_c$, and although all $p_m$-components of $V_c$ vanish because $V_c \in \Sigma_c$ there still exists a continuum of solutions parameterised by the $q_m$-components of $V_c$. By construction we have that $\|X(\xi) - h(\xi)\| \le \varepsilon^{n+1}$ for each $\xi \in \mathbb{R}$, and because $h(0)$, $\hat Z_{V_c}(0)$ lie in the symmetric section $\Sigma$ all $p_m$-components of $X(0)$ vanish. Tracing the coordinate transformations back to the original variable $v(\xi, y)$, we find that $\partial_\xi v(0, y) = 0$ for these pulse-like solutions, which are therefore indeed symmetric. These remarks complete the proof of the existence result given in the Introduction (Theorem 1).

References

1. Coddington, E.A. and Levinson, N.: Theory of Ordinary Differential Equations. New York: McGraw-Hill, 1955
2. Craig, W., Sulem, C. and Sulem, P.L.: Nonlinear modulation of gravity waves: A rigorous approach. Nonlinearity 5, 497–522 (1992)
3. Craig, W. and Wayne, C.E.: Newton’s method and periodic solutions of nonlinear wave equations. Commun. Pure Appl. Math. 46, 1409–1498 (1993)
4. Eckmann, J.-P. and Wayne, C.E.: Propagating fronts and the center manifold theorem. Commun. Math. Phys. 136, 285–307 (1991)
5. Elphick, C., Tirapegui, E., Brachet, M.E., Coullet, P. and Iooss, G.: A simple global characterization for normal forms of singular vector fields. Physica D 29, 95–127 (1987)
6. Fu, Y.: On the propagation of nonlinear travelling waves in an incompressible plate. Wave Motion 19, 271–292 (1994)
7. Groves, M.D.: An existence theory for three-dimensional periodic travelling gravity-capillary water waves with bounded transverse profiles. Physica D 152–153, 395–415 (2001)
8. Groves, M.D. and Mielke, A.: A spatial dynamics approach to three-dimensional gravity-capillary steady water waves. Proc. Roy. Soc. Edin. A 131, 83–136 (2001)
9. Haragus-Courcelle, M. and Schneider, G.: Bifurcating fronts for the Taylor-Couette problem in infinite cylinders. J. Appl. Math. Phys. (Z.A.M.P.) 50, 120–151 (1999)
10. Kalyakin, L.A.: Long wave asymptotics, integrable equations as asymptotic limits of nonlinear systems. Russ. Math. Surv. 44, 3–42 (1989)
11. Kirchgässner, K.: Wave solutions of reversible systems and applications. J. Diff. Eqns. 45, 113–127 (1989)
12. Kirchgässner, K.: Nonlinear resonant surface waves and homoclinic bifurcation. Adv. Appl. Math. 26, 135–181 (1988)
13. Kirrmann, P., Schneider, G. and Mielke, A.: The validity of modulation equations for extended systems with cubic nonlinearities. Proc. Roy. Soc. Edin. A 122, 85–91 (1992)
14. Kuksin, S.B.: Nearly integrable infinite-dimensional Hamiltonian systems. Berlin: Springer-Verlag, 1993
15. Meyer, K.A. and Hall, G.R.: Introduction to Hamiltonian dynamics and the N-body problem. New York: Springer-Verlag, 1992
16. Mielke, A.: A reduction principle for nonautonomous systems in infinite-dimensional spaces. J. Diff. Eqns. 65, 68–88 (1986)
17. Mielke, A.: Normal hyperbolicity of center manifolds and Saint-Venant’s principle. Arch. Rational Mech. Anal. 110, 353–372 (1986)
18. Newell, A.C. and Moloney, J.V.: Nonlinear Optics. Reading, MA: Addison-Wesley, 1992
19. Pazy, A.: Semigroups of Linear Operators and Applications to Partial Differential Equations. New York: Springer-Verlag, 1983
20. Pöschel, J.: Über invariante Tori in differenzierbaren Hamiltonschen Systemen. Bonner Mathematische Schriften 120 (1980)
21. Pöschel, J.: Quasi-periodic solutions for a nonlinear wave equation. Comment. Math. Helv. 71, 269–296 (1980)
22. Sandstede, B. and Scheel, A.: Essential instability of pulses and bifurcations to modulated travelling waves. Proc. Roy. Soc. Edin. A 129, 1263–1290 (1999)
23. Schneider, G.: Justification of modulation equations for hyperbolic systems via normal forms. Nonlinear Differential Equations and Applications (NODEA) 5, 69–82 (1998)
24. Vanderbauwhede, A.: Centre manifolds, normal forms and elementary bifurcations. Dynamics Reported 2, 89–169 (1989)
25. Zakharov, V.E.: Stability of periodic waves of finite amplitude on the surface of a deep fluid. Zh. Prikl. Mekh. Tekh. Fiz. 9, 86–94 (1968), (English translation J. Appl. Mech. Tech. Phys. 2, 190–194)

Communicated by A. Kupiainen
Commun. Math. Phys. 219, 523 – 565 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Uniqueness of the Invariant Measure for a Stochastic PDE Driven by Degenerate Noise

J.-P. Eckmann (1,2), M. Hairer (1)

1 Département de Physique Théorique, Université de Genève, 1211 Genève, Switzerland
2 Section de Mathématiques, Université de Genève, 1211 Genève, Switzerland

Received: 10 September 2000 / Accepted: 13 December 2000
Abstract: We consider the stochastic Ginzburg–Landau equation in a bounded domain. We assume the stochastic forcing acts only on high spatial frequencies. The low-lying frequencies are then only connected to this forcing through the non-linear (cubic) term of the Ginzburg–Landau equation. Under these assumptions, we show that the stochastic PDE has a unique invariant measure. The techniques of proof combine a controllability argument for the low-lying frequencies with an infinite dimensional version of the Malliavin calculus to show positivity and regularity of the invariant measure. This then implies the uniqueness of that measure.
Contents

1. Introduction
2. Some Preliminaries on the Dynamics
3. Controllability
4. Strong Feller Property and Proof of Theorem 1.1
5. Regularity of the Cutoff Process
6. Malliavin Calculus
7. The Partial Malliavin Matrix
8. Existence Theorems
1. Introduction

In this paper, we study a stochastic variant of the Ginzburg–Landau equation on a finite domain with periodic boundary conditions. The deterministic equation is
\[
\dot u = \Delta u + u - u^3, \qquad u(0) = u^{(0)} \in \mathcal{H}, \qquad (1.1)
\]
where $\mathcal{H}$ is the real Hilbert space $W^{1}_{\mathrm{per}}([-\pi, \pi])$, i.e., the closure of the space of smooth periodic functions $u : [-\pi, \pi] \to \mathbb{R}$ equipped with the norm
\[
\|u\|^2 = \int_{-\pi}^{\pi} \big(|u(x)|^2 + |u'(x)|^2\big)\, dx.
\]
(The restriction to the interval $[-\pi, \pi]$ is irrelevant since other lengths of intervals can be obtained by scaling space, time and amplitude $u$ in (1.1).) While we work exclusively with the real Ginzburg–Landau equation (1.1) our methods generalize immediately to the complex Ginzburg–Landau equation
\[
\dot u = (1 + ia)\Delta u + u - (1 + ib)|u|^2u, \qquad a, b \in \mathbb{R}, \qquad (1.2)
\]
which has a more interesting dynamics than (1.1). But the notational details are slightly more involved because of the complex values of $u$ and so we stick with (1.1).

While a lot is known about existence and regularity of solutions of (1.1) or (1.2), only very little information has been obtained about the attractor of such systems, and in particular, nothing seems to be known about invariant measures on the attractor. On the other hand, when (1.1) is replaced by a stochastic differential equation, more can be said about the invariant measure, see [DPZ96] and references therein.

Since the problem (1.1) involves only functions with periodic boundary conditions, it can be rewritten in terms of the Fourier series for $u$:
\[
u(x, t) = \sum_{k \in \mathbb{Z}} e^{ikx}u_k(t), \qquad u_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-ikx}u(x)\, dx.
\]
We call $k$ the momenta, $u_k$ the modes, and, since $u(x, t)$ is real we must always have $u_k(t) = \bar u_{-k}(t)$, where $\bar z$ is the complex conjugate of $z$. With these notations (1.1) takes the form
\[
\dot u_k = (1 - k^2)u_k - \sum_{k_1 + k_2 + k_3 = k} u_{k_1}u_{k_2}u_{k_3},
\]
for all $k \in \mathbb{Z}$ and the initial condition satisfies $\{(1 + |k|)u_k(0)\} \in \ell^2$. In the sequel, we will use the symbol $\mathcal{H}$ indifferently for the space $W^{1}_{\mathrm{per}}([-\pi, \pi])$ and for its counterpart in Fourier space.

In the earlier literature on uniqueness of the invariant measure for stochastic differential equations, see the recent review [MS98], the authors are mostly interested in systems where each of the $u_k$ is forced by some external noise term. The main aim of our work is to study forcing by noise which acts only on the high-frequency part of $u$, namely on the $u_k$ with $|k| \ge k_*$ for some finite $k_* \in \mathbb{N}$. The low-frequency amplitudes $u_k$ with $|k| < k_*$ are then only indirectly forced through the noise, namely through the nonlinear coupling of the modes. In this respect, our approach is reminiscent of the work done on thermally driven chains in [EPR99a, EPR99b, EH00], where the chains were only stochastically driven at the ends.

In the context of our problem, the existence of an invariant measure is a classical result for the noise we consider [DPZ96], and the main novelty of our paper is a proof of uniqueness of that measure. To prove uniqueness we begin by proving controllability of the equations, i.e., to show that the high-frequency noise together with non-linear coupling effectively drives the low-frequency modes. Using this, we then use Malliavin calculus in infinite dimensions, to show regularity of the transition probabilities. This then implies uniqueness of the invariant measure.
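The mode form of (1.1) can be checked on trigonometric polynomials, for which the cubic term is an exact triple convolution of Fourier coefficients. The Python sketch below is an illustration only: the truncation of the right-hand side to $|k| \le K$ is a Galerkin-type assumption that is not made in the text, and all function names are hypothetical.

```python
import cmath
import math

def convolve(a, b):
    """Discrete convolution of two coefficient lists."""
    out = [0j] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def cubic_modes(c):
    """Fourier coefficients of u^3 for k = -3K..3K, given those of u for k = -K..K."""
    return convolve(convolve(c, c), c)

def evaluate(c, x):
    """Evaluate sum_k c_k e^{ikx}; c lists the modes k = -K..K in increasing order."""
    K = (len(c) - 1) // 2
    return sum(ck * cmath.exp(1j * (j - K) * x) for j, ck in enumerate(c))

def truncated_rhs(c):
    """(1 - k^2) u_k - (u^3)_k for |k| <= K, discarding the modes of u^3 with |k| > K."""
    K = (len(c) - 1) // 2
    cube = cubic_modes(c)
    # mode k sits at index k + K in c and at index k + 3K in cube
    return [(1 - (j - K) ** 2) * c[j] - cube[j + 2 * K] for j in range(2 * K + 1)]

if __name__ == "__main__":
    c = [0.5, 0.0, 0.5]  # u(x) = cos(x), i.e. u_{-1} = u_1 = 1/2
    x = 0.7
    print(evaluate(cubic_modes(c), x).real, math.cos(x) ** 3)
    print(truncated_rhs(c))
```

For $u(x) = \cos x$ the convolution reproduces $\cos^3 x = (3\cos x + \cos 3x)/4$, so the $k = \pm 1$ modes of the cubic term equal $3/8$ while the linear term $(1 - k^2)u_k$ vanishes for these modes.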
Uniqueness of Invariant Measure for Stochastic PDE
We will study the system of equations
\[ du_k = -k^2 u_k\,dt + \bigl(u_k - (u^3)_k\bigr)\,dt + \frac{q_k}{\sqrt{4\pi(1+k^2)}}\,dw_k(t), \tag{1.3} \]
with u ∈ H. The above equations hold for $k\in\mathbf{Z}$, and it is always understood that
\[ (u^3)_k = \sum_{\substack{k_1+k_2+k_3=k\\ k_1,k_2,k_3\in\mathbf{Z}}} u_{k_1}u_{k_2}u_{k_3}, \tag{1.4} \]
with $u_{-k} = \bar u_k$. To avoid inessential notational problems we will work with even periodic functions, so that $u_k = u_{-k}\in\mathbf{R}$. We will work with the basis
\[ e_k(x) = \frac{1}{\sqrt{\pi(1+k^2)}}\,\cos(kx). \tag{1.5} \]
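The convolution sum (1.4) is simply the Fourier-space form of the pointwise cube of u. The following is a minimal numerical sketch of that statement, not part of the original argument: the truncation to modes $|k|\le 3$, the random test data and the function names are assumptions made for illustration. It computes $(u^3)_k$ once by the triple sum and once by cubing u(x) on a grid, for an even real sequence $u_k = u_{-k}$.

```python
import numpy as np

def cubic_modes(u):
    """Modes (u^3)_k, k = -3N..3N, from modes u_k, k = -N..N, via the triple sum (1.4)."""
    return np.convolve(np.convolve(u, u), u)

def cubic_modes_pointwise(u, samples=64):
    """Same modes, obtained by cubing u(x) on a grid and taking Fourier coefficients."""
    N = (len(u) - 1) // 2
    x = 2 * np.pi * np.arange(samples) / samples
    k = np.arange(-N, N + 1)
    # u(x_j) = sum_k u_k exp(i k x_j)
    values = (u * np.exp(1j * np.outer(x, k))).sum(axis=1)
    # discrete version of (1/2pi) * integral of exp(-ikx) u(x)^3 dx; exact when samples > 6N
    coeffs = np.fft.fft(values**3) / samples
    k3 = np.arange(-3 * N, 3 * N + 1)
    return coeffs[k3 % samples]

rng = np.random.default_rng(0)
half = rng.standard_normal(4)              # hypothetical test data: u_0, ..., u_3
u = np.concatenate((half[:0:-1], half))    # even and real: u_{-k} = u_k
direct = cubic_modes(u)
pointwise = cubic_modes_pointwise(u)
print(np.max(np.abs(direct - pointwise)))  # difference at rounding level
```

For even real data the resulting modes are again even and real, which is why the restriction to even functions is compatible with the cubic term.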
Note that this basis is orthonormal w.r.t. the scalar product in H, but the $u_k$ are actually given by $u_k = (4\pi(1+k^2))^{-1/2}\langle u, e_k\rangle$. (We choose this to make the cubic term (1.4) look simple.) The noise is supposed to act only on the high frequencies, but there we need it to be strong enough in the following way. Let $a_k = k^2+1$. Then we require that there exist constants $c_1, c_2 > 0$ such that for $k\ge k_*$,
\[ c_1 a_k^{-\alpha} \le q_k \le c_2 a_k^{-\beta}, \qquad \alpha\ge 2, \qquad \alpha - 1/8 < \beta \le \alpha. \tag{1.6} \]
These conditions imply
\[ \sum_{k=0}^{\infty} \bigl(1+k^{4\alpha-3/2}\bigr)\,q_k^2 < \infty, \qquad \sup_{k\ge k_*} k^{-2\alpha} q_k^{-1} < \infty. \]
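The two consequences of (1.6) can be checked numerically for one concrete choice of amplitudes. The sketch below is only an illustration under stated assumptions: it takes $q_k = a_k^{-\alpha}$ for $k\ge k_*$ and $q_k = 0$ below (so $c_1 = c_2 = 1$ and $\beta = \alpha$), with the hypothetical values $k_* = 3$ and $\alpha = 2$, and the function name is invented here.

```python
import numpy as np

def noise_amplitudes(k, k_star=3, beta=2.0, c=1.0):
    """q_k = c * a_k**(-beta) for k >= k_star and 0 below, with a_k = k**2 + 1 (assumed form)."""
    a = k.astype(float) ** 2 + 1.0
    return np.where(k >= k_star, c * a ** (-beta), 0.0)

alpha = 2.0
k_star = 3
k = np.arange(0, 100_001)
q = noise_amplitudes(k, k_star=k_star, beta=alpha)

# Terms of the series sum_k (1 + k^(4*alpha - 3/2)) q_k^2; they decay like k^(-3/2).
terms = (1.0 + k.astype(float) ** (4 * alpha - 1.5)) * q**2
partial_small = terms[:10_001].sum()
partial_large = terms.sum()

# sup over k >= k_star of k^(-2*alpha) / q_k, here equal to (1 + 1/k_star^2)^2.
high = k >= k_star
sup_ratio = np.max(k[high].astype(float) ** (-2 * alpha) / q[high])

print(partial_small, partial_large, sup_ratio)
```

The partial sums change only slightly when the cutoff is raised from $10^4$ to $10^5$, in line with the $k^{-3/2}$ decay of the terms, and the supremum is attained at $k = k_*$.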
We formulate the problem in a more general setting: Let F(u) be a polynomial of odd degree with negative leading coefficient. Let A be the operator of multiplication by $1+k^2$ and let Q be the operator of multiplication by $q_k$. Then (1.3) is of the form
\[ d\Phi^t = -A\Phi^t\,dt + F(\Phi^t)\,dt + Q\,dW(t), \tag{1.7} \]
where $dW(t) = \sum_{k=0}^{\infty} e_k\,dw_k(t)$ is the cylindrical Wiener process on H with the $w_k$ mutually independent real Brownian motions.$^1$ We define $\Phi^t(\xi)$ as the solution of (1.7) with initial condition $\Phi^0(\xi) = \xi$. Clearly, the conditions on Q can be formulated as
\[ \|A^{\alpha-3/8}Q\|_{\rm HS} < \infty, \tag{1.8a} \]
\[ q_k^{-1}k^{-2\alpha} \text{ is bounded for } k\ge k_*, \tag{1.8b} \]
where $\|\cdot\|_{\rm HS}$ is the Hilbert–Schmidt norm on H. Note that for each k, (1.3) is obtained by multiplying (1.7) by $(4\pi(1+k^2))^{-1/2}\langle\cdot, e_k\rangle$.

$^1$ It is convenient to have, in the case of (1.3), $A = 1-\Delta$ and $F(u) = 2u - u^3$ rather than $A = -1-\Delta$ and $F(u) = -u^3$.
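To make the setting concrete, here is a rough simulation sketch of a Galerkin truncation of (1.3) for even functions, with noise acting only on the modes $k\ge k_*$. Everything numerical in it is an assumption made for illustration and does not come from the paper: the truncation at N modes, the semi-implicit Euler–Maruyama step (linear part implicit, cubic term and noise explicit), the choice $q_k = a_k^{-\alpha}$, and the names `simulate` and `cubic_modes_even`.

```python
import numpy as np

def cubic_modes_even(u):
    """(u^3)_k for k = 0..N from the modes u_0..u_N of an even function (u_{-k} = u_k)."""
    N = len(u) - 1
    full = np.concatenate((u[:0:-1], u))               # modes k = -N..N
    conv = np.convolve(np.convolve(full, full), full)  # modes k = -3N..3N
    return conv[3 * N : 4 * N + 1]                     # keep k = 0..N (truncation)

def simulate(N=16, k_star=3, alpha=2.0, dt=1e-3, steps=2000, seed=0):
    """Semi-implicit Euler-Maruyama for the truncated system, started from u = 0."""
    rng = np.random.default_rng(seed)
    k = np.arange(N + 1)
    q = np.where(k >= k_star, (k**2 + 1.0) ** (-alpha), 0.0)  # no noise below k_star
    sigma = q / np.sqrt(4 * np.pi * (1 + k**2))               # noise coefficient in (1.3)
    u = np.zeros(N + 1)
    for _ in range(steps):
        noise = sigma * np.sqrt(dt) * rng.standard_normal(N + 1)
        # du_k = (1 - k^2) u_k dt - (u^3)_k dt + sigma_k dw_k, linear part treated implicitly
        u = (u - dt * cubic_modes_even(u) + noise) / (1.0 - dt * (1.0 - k**2))
    return u

u_final = simulate()
print(u_final[:4])
```

Starting from u = 0, the modes below $k_*$ receive no noise directly, yet they become non-zero after a few steps because the cubic term couples them to the driven modes. This is only a qualitative picture of the indirect forcing; it says nothing about invariant measures.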
J.-P. Eckmann, M. Hairer
Important Remark. The crucial aspect of our conditions is the possibility of choosing $q_k = 0$ for all $k<k_*$, i.e., the noise drives only the high frequencies. But we also allow any of the $q_k$ with $k<k_*$ to be different from 0, which corresponds to long wavelength forcing. Furthermore, as we are allowing α to be arbitrarily large, this means that the forcing at high frequencies has an amplitude which can decay like any power. The point of this paper is to show that these conditions are sufficient to ensure the existence of a unique invariant measure for (1.7).

Theorem 1.1. The process (1.7) has a unique invariant Borel measure on H.

There are two main steps in the proof of Theorem 1.1. First, the nature of the nonlinearity F implies that the modes with $k\ge k_*$ couple in such a way to those with $k<k_*$ as to allow controllability. Intuitively, this means that any point in phase space can be reached to arbitrary precision in any given time, by a suitable choice of the high-frequency controls. Second, verifying a Hörmander-like condition, we show that a version of the Malliavin calculus can be implemented in our infinite-dimensional context. This will be the hard part of our study, and the main result of that part is a proof that the strong Feller property holds. This means that for any measurable function $\varphi\in B_b(H)$, the function
\[ P^t\varphi(\xi) \equiv \mathbf{E}\bigl(\varphi\circ\Phi^t\bigr)(\xi) \tag{1.9} \]
is continuous.$^2$ We show this by proving that a cutoff version of (1.7) (modifying the dynamics at large amplitudes by a parameter $\varrho$) makes $P^t_\varrho\varphi$ a differentiable map.

The interest in such highly degenerate stochastic PDE's is related to questions in hydrodynamics where one would ask how "energy" is transferred from high to low frequency modes, and vice versa when only some of the modes are driven. This could then shed some light on the entropy-enstrophy problem in the (driven) Navier-Stokes equation.

To end this introduction, we will try to compare the results of our paper to current work of others. These groups consider the 2-D Navier-Stokes equation without deterministic external forces, also in bounded domains. In these equations, any initial condition eventually converges to zero, as long as there is no stochastic forcing. First there is earlier work by Flandoli-Maslowski [FM95] dealing with noise whose amplitude is bounded below by $|k|^{-c}$. In the work of Bricmont, Kupiainen and Lefevere [BKL00a, BKL00b], the stochastic forcing acts on modes with low k, and they get uniqueness of the invariant measure and analyticity, with probability 1. Furthermore, they obtain exponential convergence to the stationary measure. In the work of Kuksin and Shirikyan [KS00] the bounded noise is quite general, acts on low-lying Fourier modes, and acts at definite times with "noise-less" intervals in-between. Again, the invariant measure is unique. It is supported by $C^\infty$ functions, is mixing and has a Gibbs property. In the work of [EMS00], a result similar to [BKL00b] is shown. The main difference between those results and the present paper is our control of a situation which is already unstable at the deterministic level. Thus, in this sense, it comes closer to a description of a deterministically turbulent fluid (e.g., obtained by an external force). On the other hand, in our work, we need to actually force all high spatial frequencies. Perhaps, this could be eliminated by a combination with ideas from the papers above.

$^2$ Throughout the paper, $\mathbf{E}$ denotes expectation and $\mathbf{P}$ denotes probability for the random variables.
2. Some Preliminaries on the Dynamics

Here, we summarize some facts about deterministic and stochastic GL equations from the literature which we need to get started. We will consider the dynamics on the following space:

Definition 2.1. We define H as the subspace of even functions in $W^{1}_{\rm per}([-\pi,\pi])$. The norm on H will be denoted by $\|\cdot\|$, and the scalar product by $\langle\cdot,\cdot\rangle$.

We consider first the deterministic equation
\[ \dot u = \Delta u + u - u^3, \qquad u(0) = u^{(0)}\in H. \tag{2.1} \]
Due to its dissipative character the solutions are, for positive times, analytic in a strip around the real axis. More precisely, denote by $\|\cdot\|_{\mathcal{A}_\eta}$ the norm
\[ \|f\|_{\mathcal{A}_\eta} = \sup_{|{\rm Im}\,z|\le\eta} |f(z)|, \]
and by $\mathcal{A}_\eta$ the corresponding Banach space of analytic functions. Then the following result holds.

Lemma 2.2. For every initial value $u^{(0)}\in H$, there exist a time T and a constant C such that for $0<t\le T$, the solution $u(t,u^{(0)})$ of (2.1) belongs to $\mathcal{A}_{\sqrt t}$ and satisfies $\|u(t,u^{(0)})\|_{\mathcal{A}_{\sqrt t}}\le C$.

Proof. The statement is proven in [Col94] for the case of the infinite line. Since the periodic functions form an invariant subspace under the evolution, the result applies to our case.

We next collect some useful results for the stochastic equation (1.7):

Proposition 2.3. For every t > 0 and every p ≥ 1 the solution of (1.7) with initial condition $\Phi^0(\xi) = \xi\in H$ exists in H up to time t. It defines by (1.9) a Markovian transition semigroup on H. One has the bound
\[ \mathbf{E}\sup_{s\in[0,t]} \|\Phi^s(\xi)\|^p \le C_{t,p}\,(1+\|\xi\|)^p. \]
Furthermore, the process (1.7) has an invariant measure.

These results are well-known and in Sect. 8.6 we sketch where to find them in the literature.
3. Controllability

In this section we show the "approximate controllability" of (1.3). The control problem under consideration is
\[ \dot u = \Delta u + u - u^3 + Q f(t), \qquad u(0) = u^{(i)}\in H, \tag{3.1} \]
where f is the control. Using Fourier series and the hypotheses on Q, we see that by choosing $f_k\equiv 0$ for $|k|<k_*$, (3.1) can be brought to the form
\[ \dot u_k = \begin{cases} -k^2u_k + u_k - \displaystyle\sum_{\ell+m+n=k} u_\ell u_m u_n + \frac{q_k}{\sqrt{4\pi(1+k^2)}}\,f_k(t), & |k|\ge k_*,\\[2mm] -k^2u_k + u_k - \displaystyle\sum_{\ell+m+n=k} u_\ell u_m u_n, & |k|<k_*, \end{cases} \tag{3.2} \]
with $\{u_k\}\in H$ and $t\mapsto\{f_k(t)\}\in L^\infty([0,\tau],H)$. We will refer in the sequel to $\{u_k\}_{|k|<k_*}$ […] $\tau>0$ the following is true: For every $u^{(i)}, u^{(f)}\in H$ and every ε > 0, there exists a control $f\in L^\infty([0,\tau],H)$ such that the solution u(t) of (3.1) with $u(0) = u^{(i)}$ satisfies $\|u(\tau) - u^{(f)}\|\le\varepsilon$.

Proof. The construction of the control proceeds in 4 different phases, of which the third is the actual controlling of the low-frequency part by the high-frequency controls. In the construction, we will encounter a time $\tau(R,\varepsilon')$ which depends on the norm R of $u^{(f)}$ and some precision $\varepsilon'$. Given this function, we split the given time τ as $\tau = \sum_{i=1}^{4}\tau_i$, with $\tau_4\le\tau(\|u^{(f)}\|,\varepsilon/2)$ and all $\tau_i>0$. We will use the cumulated times $t_j = \sum_{i=1}^{j}\tau_i$.

Step 1. In this step we choose f ≡ 0, and we define $u^{(1)} = u(t_1)$, where $t\mapsto u(t)$ is the solution of (3.1) with initial condition $u(0) = u^{(i)}$. Since there is no control, we really have (2.1) and hence, by Lemma 2.2, we see that $u^{(1)}\in\mathcal{A}_\eta$ for some η > 0.

Step 2. We will construct a smooth control $f : [t_1,t_2]\to H$ such that $u^{(2)} = u(t_2)$ satisfies $\Pi_H u^{(2)} = 0$. In other words, in this step, we drive the high-frequency part to 0. To construct f, we choose a $C^\infty$ function $\varphi : [t_1,t_2]\to\mathbf{R}$, interpolating between 1 and 0 with vanishing derivatives at the ends. Define $u_H(t) = \varphi(t)\,\Pi_H u^{(1)}$ for $t\in[t_1,t_2]$. This will be the evolution of the high-frequency part. We next define the low-frequency part $u_L = u_L(t)$ as the solution of the ordinary differential equation
\[ \dot u_L = \Delta u_L + u_L - \Pi_L\bigl((u_L+u_H)^3\bigr), \]
with $u_L(t_1) = \Pi_L u^{(1)}$. We then set $u(t) = u_L(t)\oplus u_H(t)$ and substitute into (3.1) which we need to solve for the control Qf(t) for $t\in[t_1,t_2]$. Since $u_L(t)\oplus u_H(t)$ as constructed above is in $\mathcal{A}_\eta$ and since $Qf = \dot u - \Delta u - u + u^3$, and Δ maps $\mathcal{A}_\eta$ to $\mathcal{A}_{\eta/2}$, we conclude that $Qf\in\mathcal{A}_{\eta/2}$. By construction, the components $q_k$ of Q decay polynomially with k and do not vanish for $k\ge k_*$. Therefore, $Q^{-1}$ is a bounded operator from $\mathcal{A}_{\eta/2}\cap H_H$ to $H_H$. Thus, we can solve for f in this step.

Step 3. As mentioned before, this step really exploits the coupling between high and low frequencies. Here, we start from $u^{(2)}$ at time $t_2$ and we want to reach $\Pi_L u^{(f)}$ at time $t_3$. In fact, we will instead reach a point $u^{(3)}$ with $\|\Pi_L u^{(3)} - \Pi_L u^{(f)}\| < \varepsilon/2$.
The idea is to choose for every low frequency $|k|<k_*$ a set of three$^3$ high frequencies that will be used to control $u_k$. To simplify matters we will assume (without loss of generality) that $k_*>2$:

Definition 3.2. We define for every k with $0\le k<k_*$ the set $I_k$ by
\[ I_k = \bigl\{10^{k_*+k}+k,\ 2\cdot10^{k_*+k},\ 3\cdot10^{k_*+k}\bigr\}. \]
We also define $I_L^0 = \{k : 0\le k<k_*\}$ and
\[ I = I_L^0\cup\bigcup_{0\le k<k_*} I_k. \]
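The sets of Definition 3.2 are easy to enumerate, and the arithmetic properties claimed for them in Lemma 3.3 can be checked by brute force for small $k_*$. The sketch below does this under assumptions of its own: the function names are invented, all eight sign combinations $\pm k_1\pm k_2\pm k_3$ are enumerated, and a "collection of three indices" is read as a multiset, so that the case $\{k,k',k'\}$ is one with a repeated element.

```python
from itertools import combinations_with_replacement, product

def index_sets(k_star):
    """The sets I_k of Definition 3.2 (as tuples) and the low set {0, ..., k_star - 1}."""
    high = {k: (10 ** (k_star + k) + k, 2 * 10 ** (k_star + k), 3 * 10 ** (k_star + k))
            for k in range(k_star)}
    low = set(range(k_star))
    return high, low

def signed_sums(triple):
    """All sums +-k1 +-k2 +-k3 over the eight sign choices."""
    return [sum(s * n for s, n in zip(signs, triple))
            for signs in product((1, -1), repeat=3)]

def check_lemma(k_star):
    high, low = index_sets(k_star)
    # (A): the only signed sums of I_k that are not larger than k_star in modulus are k and -k.
    for k, triple in high.items():
        small = sorted(s for s in signed_sums(triple) if abs(s) <= k_star)
        if small != [-k, k]:
            return False
    # (B): the low set and the I_k are mutually disjoint.
    everything = sorted(low) + [n for triple in high.values() for n in triple]
    if len(set(everything)) != len(everything):
        return False
    # (C): a triple with a small signed sum is an I_k, lies in the low set,
    #      or has a repeated element.
    for S in combinations_with_replacement(everything, 3):
        if any(abs(s) < k_star for s in signed_sums(S)):
            allowed = (set(S) <= low or len(set(S)) < 3
                       or any(set(S) == set(t) for t in high.values()))
            if not allowed:
                return False
    return True

print(check_lemma(3), check_lemma(4))
```

The powers of ten separate the sets $I_k$ by decades, which is what prevents accidental cancellations between indices taken from different sets.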
Lemma 3.3. The sets defined above have the following properties:
(A) Let $I_k = \{k_1,k_2,k_3\}$. Then, of the six sums $\pm k_1\pm k_2\pm k_3$ exactly one equals k and one equals −k. All others have modulus larger than $k_*$.
(B) The sets $I_k$ and $I_L^0$ are all mutually disjoint.
(C) Let S be a collection of three indices in I, $S = \{k_1,k_2,k_3\}$. If any of the sums $\pm k_1\pm k_2\pm k_3$ adds up to k with $|k|<k_*$ then either $S = I_k$ or $S\subset I_L^0$ or S is of the form $S = \{k,k',k'\}$.

Remark 3.4. At the end of this section, we indicate how this construction generalizes to the complex Ginzburg–Landau equation.

Proof. The claims (A) and (B) are obvious from the definition of $I_k$. To prove (C) let $S = \{k_1,k_2,k_3\}$. If $S\subset I_L^0$, we are done. Otherwise, at least one of the $k_i$ is an element of an $I_\ell$ for some $\ell = 0,\dots,k_*-1$. Clearly, if the two others are in $I_L^0$, none of the sums have modulus less than $k_*$. If a second $k_j$ is in $I_{\ell'}$ with $\ell'\ne\ell$ then again none of the 6 sums can lead to a modulus less than $k_*$. Finally if $k_j$ is in $I_\ell$ then either all 3 are in $I_\ell$ and we are done, or $k_i = k_j$ and thus $S = \{k,k',k'\}$. We have covered all cases and the proof of the lemma is complete.

We are going to construct a control which, in addition to driving the low frequency part as indicated, also implies $u_k(t)\equiv 0$ for $k\notin I$ for $t\in[t_2,t_3]$. By the conditions on I, the low-frequency part of (3.2) is for $0<k<k_*$ equal to (having chosen the controls equal to 0 for $k<k_*$):
\[ \dot u_k = \Bigl(1 - k^2 - 6\sum_{n\in I\setminus I_L^0} |u_n|^2\Bigr)u_k - \sum_{\substack{\pm\ell\pm m\pm n=k\\ \{\ell,m,n\}\subset I_L^0}} u_\ell u_m u_n - 6\prod_{n\in I_k} u_n. \tag{3.3} \]
When k = 0, the last term in (3.3) is replaced by $-12\prod_{n\in I_0}u_n$. This identity exploits the relations $u_{-n} = u_n$. To simplify the combinatorial problem, we choose the controls of the 3 amplitudes $u_n$ with $n\in I_k$ in such a way that these $u_n$ are all equal to a fixed real function $z_k(t)$ which we will determine below. With this particular choice, (3.3) reduces for $0<k<k_*$ to
\[ 0 = -\dot u_k + \Bigl(1 - k^2 - 18\sum_{0\le n<k_*}|z_n|^2\Bigr)u_k - \bigl((\Pi_L u)^3\bigr)_k - 6z_k^3. \tag{3.4} \]
For k = 0 the last term is $-12z_0^3$. We claim that for every path $\gamma\in C^\infty([t_2,t_3];H_L)$ and every ε > 0, we can find a set of bounded functions $t\mapsto z_k(t)$ such that the solution of (3.4) shadows γ at a distance at most ε. To prove this statement, consider the map $F : \mathbf{R}^{k_*}\to\mathbf{R}^{k_*}$ of the form (obtained when substituting the path γ into (3.4))
\[ F : \begin{pmatrix} z_0\\ z_1\\ \vdots\\ z_{k_*-1}\end{pmatrix} \mapsto \begin{pmatrix} F_0(z)\\ F_1(z)\\ \vdots\\ F_{k_*-1}(z)\end{pmatrix} = \begin{pmatrix} 2z_0^3\\ z_1^3\\ \vdots\\ z_{k_*-1}^3\end{pmatrix} + \begin{pmatrix} P_0(z)\\ P_1(z)\\ \vdots\\ P_{k_*-1}(z)\end{pmatrix}, \]
where the $P_m$ are polynomials of degree at most 2. We want to find a solution to F = 0. The $F_m$ form a Gröbner basis for the ideal of the ring of polynomials they generate. As an immediate consequence, the equation F(z) = 0 possesses exactly $3^{k_*}$ complex solutions, if they are counted with multiplicities [MS95]. Since the coefficients of the $P_m$ are real this implies that there exists at least one real solution. Having found a (possibly discontinuous) solution for the $z_k$, we find nearby smooth functions $\tilde z_k$ with the following properties:
– Eq. (3.4) with $\tilde z_k$ replacing $z_k$ and initial condition $u_k(t_2) = u_k^{(2)}$ leads to a solution u with $\|u(t_3) - \Pi_L u^{(f)}\|\le\varepsilon/2$.
– One has $\tilde z_k(t_3) = 0$.
Having found the $\tilde z_k$ we construct the $f_k$ in such a way that for $n\in I_k$ one has $u_n(t) = \tilde z_k(t)$. Finally, for $k\notin I$ we choose the controls in such a way that $u_k(t)\equiv 0$ for $t\in[t_2,t_3]$. We define $u^{(3)}$ as the solution obtained in this way for $t = t_3$.

Step 4. Starting from $u^{(3)}$ we want to reach $u^{(f)}$. Note that $u^{(3)}$ is in $\mathcal{A}_\eta$ (for every η > 0) since it has only a finite number of non-vanishing modes. By construction we also have $\|\Pi_L u^{(3)} - \Pi_L u^{(f)}\|\le\varepsilon/2$. We only need to adapt the high frequency part without moving the low-frequency part too much. Since $\mathcal{A}_\eta$ is dense in H, there is a $u^{(4)}\in\mathcal{A}_\eta$ with $\|u^{(4)} - u^{(f)}\|\le\varepsilon/4$. By the reasoning of Step 2 there is for every τ > 0 a control for which $\Pi_H u(t_3+\tau) = \Pi_H u^{(4)}$ when starting from $u(t_3) = u^{(3)}$. Given ε there is a $\tau_*$ such that if $\tau<\tau_*$ then
\[ \|\Pi_L u(t_3+\tau) - \Pi_L u(t_3)\| < \varepsilon/4. \]
This $\tau_*$ depends only on $\|u^{(f)}\|$ and ε, as can be seen from the following argument: Since $\Pi_H u^{(3)} = 0$, we can choose the controls in such a way that $\|\Pi_H u(t_3+t)\|$ is an increasing function of t and is therefore bounded by $\|\Pi_H u^{(f)}\|$. The equation for the low-frequency part is then a finite dimensional ODE in which all high-frequency contributions can be bounded in terms of $R = \|u^{(f)}\|$. Combining the estimates we see that
\[ \begin{aligned} \|u(t_4) - u^{(f)}\| &= \bigl\|\Pi_L\bigl(u(t_4)-u^{(f)}\bigr) + \Pi_H\bigl(u(t_4)-u^{(f)}\bigr)\bigr\|\\ &\le \bigl\|\Pi_L\bigl(u(t_4)-u(t_3)\bigr)\bigr\| + \bigl\|\Pi_L\bigl(u(t_3)-u^{(f)}\bigr)\bigr\| + \bigl\|\Pi_H\bigl(u^{(4)}-u^{(f)}\bigr)\bigr\| \le \varepsilon. \end{aligned} \]
The proof of Theorem 3.1 is complete.
3.1. The combinatorics for the complex Ginzburg–Landau equation. We sketch here those aspects of the combinatorics which change for the complex Ginzburg–Landau equation. In this case, both the real and the imaginary parts of $u_n$ and $u_{-n}$ are independent. Thus, we would need a noise which acts on each of the real and imaginary components of $u_n$ and of $u_{-n}$ independently, i.e., four components per n > 0 and two for n = 0. A possible definition of $I_k$ for $|k|<k_*$ is:
\[ I_k = \begin{cases} \bigl\{10^{k_*+2k}+k,\ 2\cdot10^{k_*+2k},\ -3\cdot10^{k_*+2k}\bigr\} & \text{for } k\ge 0,\\[1mm] \bigl\{10^{k_*+2|k|+1}-|k|,\ 2\cdot10^{k_*+2|k|+1},\ -3\cdot10^{k_*+2|k|+1}\bigr\} & \text{for } k<0. \end{cases} \]
We also define $I_L^0 = \{k : |k|<k_*\}$ and
\[ I = I_L^0\cup\bigcup_{|k|<k_*} I_k. \]
The analog of Lemma 3.3 is

Lemma 3.5. The sets defined above have the following properties:
(A) Let $I_k = \{k_1,k_2,k_3\}$. Then, the sum $k_1+k_2+k_3$ equals k.
(B) The sets $I_k$ and $I_L^0$ are all mutually disjoint.
(C) Let S be a collection of three indices in I, $S = \{k_1,k_2,k_3\}$. If the sum $k_1+k_2+k_3$ equals k with $|k|<k_*$ then either $S = I_k$ or $S\subset I_L^0$ or S is of the form $S = \{k,k',-k'\}$.

Finally, the analog of (3.4) is for $|k|<k_*$:
\[ 0 = -\dot u_k + \bigl(1-(1+ia)k^2\bigr)u_k - (1+ib)\bigl(\Pi_L u\,|\Pi_L u|^2\bigr)_k + 6z_k^3. \]
Apart from these combinatorial changes the complex Ginzburg–Landau equation is treated like the real one.

4. Strong Feller Property and Proof of Theorem 1.1

The aim of this section is to show the strong Feller property of the process defined by (1.3) resp. (1.7).

Theorem 4.1. The Markov semigroup $P^t$ defined in (1.9) is strong Feller.

Proof of Theorem 1.1. This proof follows a well-known strategy, see e.g., [DPZ96]. First of all, there is at least one invariant measure for the process (1.7), since for a problem in a finite domain, the semigroup $t\mapsto e^{-At}$ is compact, and therefore [DPZ96, Theorem 6.3.5] applies. By the controllability Theorem 3.1, we deduce, see [DPZ96, Theorem 7.4.1], that the transition probability from any point in H to any open set in H cannot vanish, i.e., the Markov process is irreducible. Furthermore, by Theorem 4.1 the process is strong Feller. By a classical result of Khas'minskiĭ, this implies that $P^t$ is regular. Therefore we can use Doob's theorem [DPZ96, pp. 42–43] to conclude that the invariant measure is unique. This completes the proof of Theorem 1.1.

Before we start with the proof of Theorem 4.1, we explain our strategy. Because of the polynomial nature of the nonlinearity in (1.3), the natural bounds diverge with some
power of the norm of the initial data. On the other hand, the nonlinearity is strongly dissipative at large amplitudes. Therefore we introduce a cutoff version of the dynamics beyond some fixed amplitude and then take the limit in which this cutoff goes to infinity. We seem to need such a technique to get the bounds (5.11) and (5.12). The precise definition of the cutoff version $F_\varrho$ of F is:
\[ F_\varrho(x) = \bigl(1-\chi\bigl(\|x\|/(3\varrho)\bigr)\bigr)F(x), \]
where χ is a smooth, non-negative function satisfying
\[ \chi(z) = \begin{cases} 1 & \text{if } z>2,\\ 0 & \text{if } z<1. \end{cases} \]
Similarly, we define
\[ Q_\varrho(x) = Q + \chi\bigl(\|x\|/\varrho\bigr)\,\Pi_{k_*}, \tag{4.1} \]
where $\Pi_{k_*}$ is the projection onto the frequencies below $k_*$.

Remark 4.2. These cutoffs have the following effect as a function of $\|x\|$:
– When $\|x\|\le\varrho$ then $Q_\varrho(x) = Q$ and $F_\varrho(x) = F(x)$.
– When $\varrho<\|x\|\le 2\varrho$ then $Q_\varrho(x)$ depends on x and $F_\varrho(x) = F(x)$.
– When $2\varrho<\|x\|\le 6\varrho$ then all Fourier components of $Q_\varrho(x)$ including the ones below $k_*$ are non-zero and $F_\varrho(x)$ equals F(x) times a factor ≤ 1.
– When $6\varrho<\|x\|$ then all Fourier components of $Q_\varrho(x)$ including the ones below $k_*$ are non-zero and $F_\varrho(x) = 0$.

At high amplitudes, the nonlinearity is truncated to 0. Thus, the Hörmander condition cannot be satisfied there unless the diffusion process is non-degenerate. We achieve this non-degeneracy by extending the stochastic forcing to all degrees of freedom when $\|x\|$ is large. Instead of (1.7) we then consider the modified problem
\[ d\Phi_\varrho^t = -A\Phi_\varrho^t\,dt + \bigl(F_\varrho\circ\Phi_\varrho^t\bigr)\,dt + \bigl(Q_\varrho\circ\Phi_\varrho^t\bigr)\,dW(t), \tag{4.2} \]
with $\Phi_\varrho^0(\xi) = \xi\in H$. Note that the cutoffs are chosen in such a way that the dynamics of $\Phi_\varrho^t(\xi)$ coincides with that of $\Phi^t(\xi)$ as long as $\|\Phi^t(\xi)\|<\varrho$. We will show that the solution of (4.2) defines a Markov semigroup
\[ P_\varrho^t\varphi(\xi) = \mathbf{E}\bigl(\varphi\circ\Phi_\varrho^t\bigr)(\xi), \]
with the following smoothing property:

Theorem 4.3. There exist exponents µ, ν > 0, and for all $\varrho>0$ there is a constant $C_\varrho$ such that for every $\varphi\in B_b(H)$, for every t > 0 and for every ξ ∈ H, the function $P_\varrho^t\varphi$ is differentiable and its derivative satisfies
\[ \|DP_\varrho^t\varphi(\xi)\| \le C_\varrho\,(1+t^{-\mu})\,(1+\|\xi\|^{\nu})\,\|\varphi\|_{L^\infty}. \tag{4.3} \]
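The four regimes of Remark 4.2 follow directly from where χ switches between 0 and 1. As a small illustration, the sketch below uses one standard smooth step for χ; the paper only requires χ to be smooth, non-negative, 0 below 1 and 1 above 2, so this particular formula, the helper names and the test values are assumptions.

```python
import math

def smooth_step(z):
    """One possible smooth chi: 0 for z <= 1, 1 for z >= 2, strictly between in (1, 2)."""
    def g(t):
        return math.exp(-1.0 / t) if t > 0 else 0.0
    return g(z - 1.0) / (g(z - 1.0) + g(2.0 - z))

def cutoff_factors(norm_x, rho):
    """Return (factor multiplying F in F_rho, coefficient of the projection in Q_rho)."""
    factor_F = 1.0 - smooth_step(norm_x / (3.0 * rho))
    coeff_Q = smooth_step(norm_x / rho)
    return factor_F, coeff_Q

rho = 1.0
for norm_x in (0.5, 1.5, 4.0, 7.0):   # one sample value per regime of Remark 4.2
    print(norm_x, cutoff_factors(norm_x, rho))
```

With these sample values, the first regime leaves both F and Q untouched, the second switches on the extra low-frequency noise partially, the third has the full extra noise and a damped nonlinearity, and the fourth has no nonlinearity at all.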
Using this theorem, the proof of Theorem 4.1 follows from a limiting argument.

Proof of Theorem 4.1. Choose x ∈ H, t > 0, and ε > 0. We denote by B the ball of radius $2\|x\|$ centered around the origin in H. Using Proposition 2.3 we can find a sufficiently large constant $\varrho = \varrho(x,t,\varepsilon)$ such that for every y ∈ B, the inequality
\[ \mathbf{P}\Bigl\{\sup_{s\in[0,t]}\|\Phi^s(y)\|>\varrho\Bigr\} \le \frac{\varepsilon}{8} \]
holds. Choose $\varphi\in B_b(H)$ with $\|\varphi\|_{L^\infty}\le 1$. We have by the triangle inequality
\[ |P^t\varphi(x) - P^t\varphi(y)| \le |P^t\varphi(x) - P_\varrho^t\varphi(x)| + |P_\varrho^t\varphi(x) - P_\varrho^t\varphi(y)| + |P^t\varphi(y) - P_\varrho^t\varphi(y)|. \]
Since the dynamics of the cutoff equation and the dynamics of the original equation coincide on the ball of radius $\varrho$, we can write, for every z ∈ B,
\[ |P^t\varphi(z) - P_\varrho^t\varphi(z)| = \bigl|\mathbf{E}\bigl((\varphi\circ\Phi^t)(z) - (\varphi\circ\Phi_\varrho^t)(z)\bigr)\bigr| \le 2\|\varphi\|_{L^\infty}\,\mathbf{P}\Bigl\{\sup_{s\in[0,t]}\|\Phi^s(z)\|>\varrho\Bigr\} \le \frac{\varepsilon}{4}. \]
This implies that
\[ |P^t\varphi(x) - P^t\varphi(y)| \le \frac{\varepsilon}{2} + |P_\varrho^t\varphi(x) - P_\varrho^t\varphi(y)|. \]
By Theorem 4.3 we see that if y is sufficiently close to x then
\[ |P_\varrho^t\varphi(x) - P_\varrho^t\varphi(y)| \le \frac{\varepsilon}{2}. \]
Since ε is arbitrary we conclude that $P^t\varphi$ is continuous when $\|\varphi\|_{L^\infty}\le 1$. The generalization to any value of $\|\varphi\|_{L^\infty}$ follows by linearity in φ. The proof of Theorem 4.1 is complete.
5. Regularity of the Cutoff Process In this section, we start the proof of Theorem 4.3. If the cutoff problem were finite dimensional, a result like Theorem 4.3 could be derived easily using, e.g., the works of Hörmander [Hör67, Hör85], Malliavin [Mal78], Stroock [Str86], or Norris [Nor86]. In the present infinite-dimensional context we need to modify the corresponding techniques, but the general idea retained is Norris’. The main idea will be to treat the (infinite number of) high-frequency modes by a method which is an extension of [DPZ96, Cer99], while the low-frequency part is handled by a variant of the Malliavin calculus adapted from [Nor86]. It is at the juncture of these two techniques that we need a cutoff in the nonlinearity.
5.1. Splitting and interpolation spaces. Throughout the remainder of this paper, we will again denote by $H_L$ and $H_H$ the spaces corresponding to the low (resp. high)-frequency parts. We slightly change the meaning of "low-frequency" by including in the low-frequency part all those frequencies that are driven by the noise which are in I as defined in Definition 3.2. More precisely, the low-frequency part is now $\{k : |k|\le L-1\}$, where $L = \max\{k : k\in I\}+1$. Note that L is finite. Since $A = 1-\Delta$ is diagonal with respect to this splitting, we can define its low (resp. high)-frequency parts $A_L$ and $A_H$ as operators on $H_L$ and $H_H$. From now on, L will always denote the dimension of $H_L$, which will therefore be identified with $\mathbf{R}^L$.$^4$ We also allow ourselves to switch freely between equivalent norms on $\mathbf{R}^L$, when deriving the various bounds.

In the sequel, we will always use the notations $D_L$ and $D_H$ to denote the derivatives with respect to $H_L$ (resp. $H_H$) of a differentiable function defined on H. The words "derivative" and "differentiable" will always be understood in the strong sense, i.e., if $f : B_1\to B_2$ with $B_1$ and $B_2$ some Banach spaces, then $Df : B_1\to\mathcal{L}(B_1,B_2)$, i.e., it is bounded from $B_1$ to $B_2$.

We introduce the interpolation spaces $H_\gamma$ (for every γ ≥ 0) defined as being equal to the domain of $A^\gamma$ equipped with the graph norm
\[ \|x\|_\gamma^2 = \|A^\gamma x\|^2 = \|(1-\Delta)^\gamma x\|^2. \]
Clearly, the $H_\gamma$ are Hilbert spaces and we have the inclusions
\[ H_\gamma\subset H_\delta \quad\text{if}\quad \gamma\ge\delta. \]
Note that in usual conventions, $H_\gamma$ would be the Sobolev space of index 2γ + 1. Our motivation for using non-standard notation comes from the fact that our basic space is that with one derivative, which we call H, and that γ measures additional smoothness in terms of powers of the generator of the linear part.
5.2. Proof of Theorem 4.3. The proof of Theorem 4.3 is based on Proposition 5.1 and Proposition 5.2 which we now state.

Proposition 5.1. Assume that the noise satisfies condition (1.6). Then (4.2) defines a stochastic flow $\Phi_\varrho^t$ on H with the following properties which hold for any p ≥ 1:
(A) If $\xi\in H_\gamma$ with some γ satisfying 0 ≤ γ ≤ α, the solution of (4.2) stays in $H_\gamma$, with a bound
\[ \mathbf{E}\sup_{0<t<T}\|\Phi_\varrho^t(\xi)\|_\gamma^p \le C_{T,p,\varrho}\,(1+\|\xi\|_\gamma)^p, \tag{5.1a} \]
for every T > 0. If γ ≥ 1 the solution exists in the strong sense in H.
(B) The quantity $\Phi_\varrho^t(\xi)$ is in $H_\alpha$ with probability 1 for every time t > 0 and every ξ ∈ H. Furthermore, for every T > 0 there is a constant $C_{T,p,\varrho}$ for which
\[ \mathbf{E}\sup_{0<t<T} t^{\alpha p}\,\|\Phi_\varrho^t(\xi)\|_\alpha^p \le C_{T,p,\varrho}\,(1+\|\xi\|)^p. \tag{5.1b} \]

$^4$ The choice of L above is dictated by the desire to obtain a dimension equal to L and not L + 1.
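(C) The mapping $\xi\mapsto\Phi_\varrho^t(\xi)$ (for ω and t fixed) has a.s. bounded partial derivatives with respect to ξ. Furthermore, we have for every ξ, h ∈ H the bound
\[ \mathbf{E}\sup_{0<t<T}\bigl\|\bigl(D\Phi_\varrho^t(\xi)\bigr)h\bigr\|^p \le C_{T,p,\varrho}\,\|h\|^p, \tag{5.1c} \]
for every T > 0.
(D) For every h ∈ H and $\xi\in H_\alpha$, the quantity $\bigl(D\Phi_\varrho^t(\xi)\bigr)h$ is in $H_\alpha$ with probability 1 for every t > 0. Furthermore, for a ν depending only on α the bound
\[ \mathbf{E}\sup_{0<t<T} t^{\alpha p}\,\bigl\|\bigl(D\Phi_\varrho^t(\xi)\bigr)h\bigr\|_\alpha^p \le C_{T,p,\varrho}\,(1+\|\xi\|_\alpha)^{\nu p}\,\|h\|^p, \tag{5.1d} \]
holds for every T > 0.
(E) For every $\xi\in H_\gamma$ with γ ≤ α, we have the small-time estimate
\[ \mathbf{E}\sup_{0<t<\varepsilon}\bigl\|\Phi_\varrho^t(\xi) - e^{-At}\xi\bigr\|_\gamma^p \le C_{T,p,\varrho}\,(1+\|\xi\|_\gamma)^p\,\varepsilon^{p/16}, \tag{5.1e} \]
which holds for every ε ∈ (0, T] and every T > 0.

This proposition will be proved in Sect. 8.4.

Proposition 5.2. There exist exponents $\mu_*,\nu_*>0$ such that for every $\varphi\in C_b^2(H)$, every $\xi\in H_\alpha$ and every t > 0,
\[ \|DP_\varrho^t\varphi(\xi)\| \le C_\varrho\,(1+t^{-\mu_*})\,\bigl(1+\|\xi\|_\alpha^{\nu_*}\bigr)\,\|\varphi\|_{L^\infty}. \tag{5.2} \]

Proof of Theorem 4.3. Note first that for all τ > 0, one has $\|P_\varrho^\tau\varphi\|_{L^\infty}\le\|\varphi\|_{L^\infty}$. Furthermore, for τ > 1,
\[ DP_\varrho^\tau\varphi(\xi) = D\bigl(P_\varrho^1(P_\varrho^{\tau-1}\varphi)\bigr)(\xi). \]
Therefore, if we can show (4.3) for t ≤ 1, then we find for any τ > 1:
\[ \|DP_\varrho^\tau\varphi(\xi)\| \le 2C_\varrho\,(1+\|\xi\|^\nu)\,\|P_\varrho^{\tau-1}\varphi\|_{L^\infty} \le 2C_\varrho\,(1+\|\xi\|^\nu)\,\|\varphi\|_{L^\infty}. \]
In view of the above, it clearly suffices to show Theorem 4.3 for t ∈ (0, 1]. We first prove the bound for the case $\varphi\in C_b^2(H)$. Let h ∈ H. Using Definition (1.9) of $P_\varrho^t\varphi$ and the Markov property of the flow we write
\[ \begin{aligned} \|DP_\varrho^{2t}\varphi(\xi)h\| &= \bigl\|D\,\mathbf{E}\bigl((P_\varrho^t\varphi)\circ\Phi_\varrho^t\bigr)(\xi)h\bigr\| = \bigl\|\mathbf{E}\bigl((DP_\varrho^t\varphi\circ\Phi_\varrho^t)(\xi)\,D\Phi_\varrho^t(\xi)h\bigr)\bigr\|\\ &\le \sqrt{\mathbf{E}\bigl\|(DP_\varrho^t\varphi\circ\Phi_\varrho^t)(\xi)\bigr\|^2}\;\sqrt{\mathbf{E}\bigl\|D\Phi_\varrho^t(\xi)h\bigr\|^2}. \end{aligned} \]
Bounding the first square root by Proposition 5.2 and then applying Proposition 5.1 (B–C), (with T = 1) we get a bound
\[ \begin{aligned} \|DP_\varrho^{2t}\varphi(\xi)h\| &\le C_\varrho\,\|\varphi\|_{L^\infty}\,(1+t^{-\mu_*})\,\sqrt{\mathbf{E}\bigl(1+\|\Phi_\varrho^t(\xi)\|_\alpha^{\nu_*}\bigr)^2}\;\sqrt{\mathbf{E}\bigl\|D\Phi_\varrho^t(\xi)h\bigr\|^2}\\ &\le C_\varrho\,\|\varphi\|_{L^\infty}\,(1+t^{-\mu_*})\,t^{-\alpha\nu_*}\,(1+\|\xi\|)^{\nu_*}\,\|h\|. \end{aligned} \]
Choosing $\mu = \mu_*+\alpha\nu_*$ and $\nu = \nu_*$ we find (4.3) in the case when $\varphi\in C_b^2(H)$. The method of extension to arbitrary $\varphi\in B_b(H)$ can be found in [DPZ96, Lemma 7.1.5]. The proof of Theorem 4.3 is complete.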
5.3. Smoothing properties of the transition semigroup. In this subsection we prove the smoothing bound Proposition 5.2. Thus, we will no longer be interested in smoothing in position space as shown in Proposition 5.1 but in smoothing properties of the transition semigroup associated to (4.2).

Important remark. In this section and up to Sect. 8.6 we always tacitly assume that we are considering the cutoff equation (4.2) and we will omit the index $\varrho$. Thus, we will write Eq. (4.2) as
\[ d\Phi^t = -A\Phi^t\,dt + \bigl(F\circ\Phi^t\bigr)\,dt + \bigl(Q\circ\Phi^t\bigr)\,dW(t). \tag{5.3} \]
The solution of (5.3) generates a semigroup on the space $B_b(H)$ of bounded Borel functions over $H = H_L\oplus H_H$ by
\[ P^t\varphi = \mathbf{E}\bigl(\varphi\circ\Phi^t\bigr), \qquad \varphi\in B_b(H). \]
Our goal will be to show that the mixing properties of the nonlinearity are strong enough to make $P^t\varphi$ differentiable, even if φ is only measurable. We will need a separate treatment of the high and low frequencies, and so we reformulate (5.3) as
\[ d\Phi_L^t = -A_L\Phi_L^t\,dt + \bigl(F_L\circ\Phi^t\bigr)\,dt + \bigl(Q_L\circ\Phi^t\bigr)\,dW_L(t), \qquad \Phi_L^t\in H_L, \tag{5.4a} \]
\[ d\Phi_H^t = -A_H\Phi_H^t\,dt + \bigl(F_H\circ\Phi^t\bigr)\,dt + Q_H\,dW_H(t), \qquad \Phi_H^t\in H_H, \tag{5.4b} \]
where $H_L$ and $H_H$ are defined in Sect. 5.1 and the cutoff version of Q was defined in (4.1). Note that $Q_H\circ\Phi^t(\xi)$ is independent of ξ and t by construction, which is why we can use $Q_H$ in (5.4b). The proof of Proposition 5.2 is based on the following two results dealing with the low-frequency part and the cross-terms between low and high frequencies, respectively.

Proposition 5.3. There exist exponents µ, ν > 0 such that for every $\varphi\in C_b^2(H)$, every $\xi\in H_\alpha$ and every T > 0, one has
\[ \bigl\|\mathbf{E}\bigl((D_L\varphi\circ\Phi^t)(\xi)\,(D_L\Phi_L^t)(\xi)\bigr)\bigr\| \le C_T\,t^{-\mu}\,\bigl(1+\|\xi\|_\alpha^{\nu}\bigr)\,\|\varphi\|_{L^\infty}, \]
for all t ∈ (0, T].$^5$

Lemma 5.4. For every T > 0 and every p ≥ 1, there is a constant $C_{T,p}>0$ such that for every t ≤ T, one has the estimates (valid for $h_L\in H_L$ and $h_H\in H_H$):
\[ \mathbf{E}\sup_{0<s<t}\bigl\|\bigl(D_L\Phi_H^s(\xi)\bigr)h_L\bigr\|^p \le C_{T,p}\,t^p\,\|h_L\|^p, \tag{5.5a} \]
[…]
These bounds are independent of ξ ∈ H.

Remark 5.5. In the absence of the cutoff $\varrho$ one can prove inequalities like (5.5), but with an additional factor of $(1+\|\xi\|^2)^p$ on the right. This is not good enough for our strategy and is the reason for introducing a cutoff.

$^5$ Recall that not only the flow, but for example also the constant $C_T$ depends on $\varrho$.
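The proof of Proposition 5.3 will be given in Sect. 6 and the proof of Lemma 5.4 will be given in Sect. 8.5.

Proof of Proposition 5.2. As in the proof of Theorem 4.3, it suffices to consider times t ≤ T, where T is any (small) positive constant. The proof will be performed in the spirit of [DPZ96] and [Cer99], using a modified version of the Bismut-Elworthy formula. Take a function $\varphi\in C_b^2(H)$. We consider $Q_L$ and $Q_H$ as acting on and into $H_L$ and $H_H$ respectively. It is possible to write as a consequence of Itô's formula:
\[ \begin{aligned} \varphi\circ\Phi^t(\xi) &= P^t\varphi(\xi) + \int_0^t \bigl((DP^{t-s}\varphi)\circ\Phi^s\bigr)(\xi)\,\bigl(Q\circ\Phi^s\bigr)(\xi)\,dW(s)\\ &= P^t\varphi(\xi) + \int_0^t \bigl((D_LP^{t-s}\varphi)\circ\Phi^s\bigr)(\xi)\,\bigl(Q_L\circ\Phi^s\bigr)(\xi)\,dW_L(s)\\ &\quad + \int_0^t \bigl((D_HP^{t-s}\varphi)\circ\Phi^s\bigr)(\xi)\,Q_H\,dW_H(s). \end{aligned} \tag{5.6} \]
Choose some $h\in H_H$. By Proposition 5.1 (D), $\bigl(D_H\Phi_H^t(\xi)\bigr)h$ is in $H_\alpha$ for positive times and is bounded by (5.1d). Using condition (1.8b) we see that $Q_H^{-1}$ maps to $H_H$ and so we can multiply both sides of (5.6) by
\[ \int_{t/4}^{3t/4}\bigl\langle Q_H^{-1}\bigl(D_H\Phi_H^s\bigr)(\xi)h,\ dW_H(s)\bigr\rangle, \]
where the scalar product is taken in $H_H$. Taking expectations on both sides, the first two terms on the right vanish because $dW_L$ and $dW_H$ are independent and of mean zero. Thus, we get
\[ \mathbf{E}\Bigl(\bigl(\varphi\circ\Phi^t\bigr)(\xi)\int_{t/4}^{3t/4}\bigl\langle Q_H^{-1}\bigl(D_H\Phi_H^s\bigr)(\xi)h,\ dW_H(s)\bigr\rangle\Bigr) = \mathbf{E}\int_{t/4}^{3t/4}\bigl((D_HP^{t-s}\varphi)\circ\Phi^s\bigr)(\xi)\,\bigl(D_H\Phi_H^s\bigr)(\xi)h\,ds. \tag{5.7} \]
We add to both sides of (5.7) the term
\[ \mathbf{E}\int_{t/4}^{3t/4}\bigl((D_LP^{t-s}\varphi)\circ\Phi^s\bigr)(\xi)\,\bigl(D_H\Phi_L^s\bigr)(\xi)h\,ds, \]
and note that the r.h.s. can be rewritten as
\[ \int_{t/4}^{3t/4} D_H\,\mathbf{E}\bigl((P^{t-s}\varphi)\circ\Phi^s\bigr)(\xi)h\,ds = \frac{t}{2}\,D_H\,\mathbf{E}\bigl(\varphi\circ\Phi^t\bigr)(\xi)h, \]
since by the Markov property, $\mathbf{E}\bigl((P^{t-s}\varphi)\circ\Phi^s\bigr)(\xi) = \mathbf{E}\bigl(\varphi\circ\Phi^t\bigr)(\xi)$. Therefore, (5.7) leads to
\[ \begin{aligned} \bigl(D_HP^t\varphi\bigr)(\xi)h &= \frac{2}{t}\,\mathbf{E}\Bigl(\bigl(\varphi\circ\Phi^t\bigr)(\xi)\int_{t/4}^{3t/4}\bigl\langle Q_H^{-1}\bigl(D_H\Phi_H^s\bigr)(\xi)h,\ dW_H(s)\bigr\rangle\Bigr)\\ &\quad + \frac{2}{t}\,\mathbf{E}\int_{t/4}^{3t/4}\bigl((D_LP^{t-s}\varphi)\circ\Phi^s\bigr)(\xi)\,\bigl(D_H\Phi_L^s\bigr)(\xi)h\,ds. \end{aligned} \tag{5.8} \]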
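For the low-frequency part, we use the equality
\[ \bigl(D_LP^t\varphi\bigr)(\xi) = \mathbf{E}\Bigl(\bigl(D_LP^{t/2}\varphi\circ\Phi^{t/2}\bigr)(\xi)\,\bigl(D_L\Phi_L^{t/2}\bigr)(\xi)\Bigr) + \mathbf{E}\Bigl(\bigl(D_HP^{t/2}\varphi\circ\Phi^{t/2}\bigr)(\xi)\,\bigl(D_L\Phi_H^{t/2}\bigr)(\xi)\Bigr). \tag{5.9} \]
We introduce the Banach spaces $\mathcal{B}_{T,\mu_*,\nu_*}$ of measurable functions $f : (0,T)\times H_\alpha\to H$, for which
\[ |||f|||_{T,\mu_*,\nu_*} \equiv \sup_{0<t<T}\ \sup_{\xi\in H_\alpha}\ \frac{t^{\mu_*}\,\|f(t,\xi)\|}{1+\|\xi\|_\alpha^{\nu_*}} \tag{5.10} \]
[…] > 0 such that $f_\varphi : (t,\xi)\mapsto\bigl(DP^t\varphi\bigr)(\xi)$ belongs to $\mathcal{B}_{T,\mu_*,\nu_*}$ and that $|||f_\varphi|||_{T,\mu_*,\nu_*}\le C\,\|\varphi\|_{L^\infty}$, thus proving Proposition 5.2. The fact that $f_\varphi\in\mathcal{B}_{T,\mu_*,\nu_*}$ for every T if $\varphi\in C_b^2(H)$ is shown in [DPZ92, Theorem 9.17], so we only have to show the bound on its norm.

The following inequalities are obtained by applying to (5.8) in order the Cauchy–Schwarz inequality and the definition (5.10), then (1.8b), (5.1d), and again Cauchy–Schwarz. The last inequality is obtained by applying (5.1a) and (5.1c). This yields for $h\in H_H$:
\[ \begin{aligned} \bigl|\bigl(D_HP^t\varphi\bigr)(\xi)h\bigr| &\le \frac{2}{t}\,\|\varphi\|_{L^\infty}\Bigl(\mathbf{E}\int_{t/4}^{3t/4}\bigl\|Q_H^{-1}\bigl(D_H\Phi_H^s\bigr)(\xi)h\bigr\|^2\,ds\Bigr)^{1/2}\\ &\quad + \frac{2}{t}\,|||f_\varphi|||_{t,\mu_*,\nu_*}\,\mathbf{E}\int_{t/4}^{3t/4}\frac{1+\|\Phi^s(\xi)\|_\alpha^{\nu_*}}{(t-s)^{\mu_*}}\,\bigl\|\bigl(D_H\Phi_L^s\bigr)(\xi)h\bigr\|\,ds\\ &\le C\,t^{-\alpha}\,\|\varphi\|_{L^\infty}\bigl(1+\|\xi\|_\alpha^{\nu_*}\bigr)\|h\|\\ &\quad + C\,t^{-\mu_*}\,|||f_\varphi|||_{t,\mu_*,\nu_*}\Bigl(\mathbf{E}\sup_{s\in[\frac{t}{4},\frac{3t}{4}]}\bigl(1+\|\Phi^s(\xi)\|_\alpha^{\nu_*}\bigr)^2\Bigr)^{1/2}\Bigl(\mathbf{E}\sup_{s\in[\frac{t}{4},\frac{3t}{4}]}\bigl\|\bigl(D_H\Phi_L^s\bigr)(\xi)h\bigr\|^2\Bigr)^{1/2}\\ &\le C\,t^{-\alpha}\,\|\varphi\|_{L^\infty}\bigl(1+\|\xi\|_\alpha^{\nu_*}\bigr)\|h\| + C\,t^{-\mu_*+1/4}\,|||f_\varphi|||_{t,\mu_*,\nu_*}\bigl(1+\|\xi\|_\alpha^{\nu_*}\bigr)\|h\|. \end{aligned} \tag{5.11} \]
Note that this is the place where the lower bound (1.8b) on the noise is really used. For the low-frequency part Eq. (5.9) we use first Proposition 5.3, $\|P^{t/2}\varphi\|_{L^\infty}\le\|\varphi\|_{L^\infty}$, and the definition (5.10), then Cauchy–Schwarz, and finally (5.5a) and (5.1b). This leads for $h\in H_L$ to:
\[ \begin{aligned} \bigl|\bigl(D_LP^t\varphi\bigr)(\xi)h\bigr| &\le C\,t^{-\mu_*}\,\|\varphi\|_{L^\infty}\bigl(1+\|\xi\|_\alpha^{\nu_*}\bigr)\|h\| + C\,t^{-\mu_*}\,|||f_\varphi|||_{t,\mu_*,\nu_*}\,\mathbf{E}\Bigl(\bigl(1+\|\Phi^{t/2}(\xi)\|_\alpha^{\nu_*}\bigr)\bigl\|\bigl(D_L\Phi_H^{t/2}\bigr)(\xi)h\bigr\|\Bigr)\\ &\le C\,t^{-\mu_*}\,\|\varphi\|_{L^\infty}\bigl(1+\|\xi\|_\alpha^{\nu_*}\bigr)\|h\| + C\,t^{-\mu_*}\,|||f_\varphi|||_{t,\mu_*,\nu_*}\sqrt{\mathbf{E}\bigl(1+\|\Phi^{t/2}(\xi)\|_\alpha^{\nu_*}\bigr)^2}\;\sqrt{\mathbf{E}\bigl\|\bigl(D_L\Phi_H^{t/2}\bigr)(\xi)h\bigr\|^2}\\ &\le C\,t^{-\mu_*}\,\|\varphi\|_{L^\infty}\bigl(1+\|\xi\|_\alpha^{\nu_*}\bigr)\|h\| + C\,t^{-\mu_*+1}\,|||f_\varphi|||_{t,\mu_*,\nu_*}\bigl(1+\|\xi\|_\alpha^{\nu_*}\bigr)\|h\|. \end{aligned} \tag{5.12} \]
Combining the above expressions we get for every T ∈ (0, 1] a bound of the type
\[ |||f_\varphi|||_{T,\mu_*,\nu_*} \le C_1\,\|\varphi\|_{L^\infty} + C_2\,T^{1/4}\,|||f_\varphi|||_{T,\mu_*,\nu_*}. \]
Our final choice of T is now $T^{1/4} = \min\{1,\ 1/(2C_2)\}$, and we find
\[ |||f_\varphi|||_{T,\mu_*,\nu_*} \le C\,\|\varphi\|_{L^\infty}. \tag{5.13} \]
Since $f_\varphi(t,\xi) = \bigl(DP^t\varphi\bigr)(\xi)$, inspection of (5.10) shows that (5.13) is equivalent to (5.2). The proof of Proposition 5.2 is complete.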
6. Malliavin Calculus

To prove Proposition 5.3 we will apply a modification of Norris' version of the Malliavin calculus. This modification takes into account some new features which are necessary due to our splitting of the problem into high and low frequencies (which in turn was done to deal with the infinite dimensional nature of the problem).

Consider first the deterministic PDE for a flow:
$$\frac{d\Psi^t(\xi)}{dt} = -A\Psi^t(\xi) + F\circ\Psi^t(\xi) . \tag{6.1}$$
This is really an abstract reformulation for the flow defined by the GL equation, and $\xi$ belongs to a space $\mathcal H$, which for our problem is a suitable Sobolev space. The linear operator $A$ is chosen as $1-\Delta$, while the non-linear term $F$ corresponds to $2u - u^3$ in the GL equation. Below, we will work with approximations to the GL equation, and all we need to know is that $A : \mathcal H \to \mathcal H$ is the generator of a strongly continuous semigroup, and $F$ will be seen to be bounded with bounded derivatives.

For each fixed $\xi \in \mathcal H$ we consider the following stochastic variant of (6.1):
$$d\Psi^t(\xi) = -A\Psi^t(\xi)\,dt + F\circ\Psi^t(\xi)\,dt + (Q\circ\Psi^t)(\xi)\,dW(t) \tag{6.2}$$
with initial condition $\Psi^0(\xi) = \xi$. Furthermore, $W$ is the cylindrical Wiener process on a separable Hilbert space $\mathcal W$ and $Q$ is a strongly differentiable map from $\mathcal H$ to $\mathcal L^2(\mathcal W, \mathcal H)$, the space of bounded linear Hilbert–Schmidt operators from $\mathcal W$ to $\mathcal H$.

We next introduce the notion of directional derivative (in the direction of the noise); the reader familiar with this concept can pass directly to (6.3). To understand this concept consider first the case of a function $t \mapsto v_i^t \in \mathcal W$. Then the variation $D_{v_i}\Psi^t$ of $\Psi^t$ in the direction $v_i$ is obtained by replacing $dW(t)$ by $dW(t) + \varepsilon v_i^t\,dt$, and it satisfies the equation
$$
\begin{aligned}
dD_{v_i}\Psi^t &= \bigl(-A D_{v_i}\Psi^t + (DF\circ\Psi^t)D_{v_i}\Psi^t\bigr)\,dt + \bigl((DQ\circ\Psi^t)D_{v_i}\Psi^t\bigr)\,dW(t) \\
&\quad + (Q\circ\Psi^t)\,v_i^t\,dt .
\end{aligned}
$$
Intuitively, the first line comes from varying $\Psi^t$ with respect to the noise and the second comes from varying the noise itself.
We will need a finite number $L$ of directional derivatives, and so we introduce some more general notation. We combine $L$ vectors $v_i$ as used above into a matrix called $v$, which is an element of $\Omega \times [0,\infty) \to \mathcal W^L$. We identify $\mathcal W^L$ with $\mathcal L(\mathbf R^L, \mathcal W)$. Note that we now allow $v$ to depend on $\Omega$, and to make things work, we require $v$ to be a predictable stochastic process, i.e., $v^t$ only depends on the noise before time $t$. The stochastic process $G_v^t \in \mathcal H^L$ (corresponding to $D_v\Psi^t$) is then defined as the solution of the equation
$$
\begin{aligned}
dG_v^t h &= \bigl(-A G_v^t + (DF\circ\Psi^t)\,G_v^t + (Q\circ\Psi^t)\,v^t\bigr)h\,dt \\
&\quad + \bigl((DQ\circ\Psi^t)\,G_v^t h\bigr)\,dW(t), \qquad G_v^0 = 0,
\end{aligned}
\tag{6.3}
$$
which has to hold for all $h \in \mathbf R^L$. Having given the detailed definition of $G_v^t$, we will denote it henceforth by the more suggestive $G_v^t(\xi) = D_v\Psi^t(\xi)$, to make clear that it is a directional derivative. We use the notation $D_v$ to distinguish this derivative from the derivative $D$ with respect to the initial condition $\xi$.

For (6.2) and (6.3) to make sense, two assumptions on $F$, $Q$ and $v$ are needed:

A1 $F : \mathcal H \to \mathcal H$ and $Q : \mathcal H \to \mathcal L^2(\mathcal W, \mathcal H)$ are of at most linear growth and have bounded first and second derivatives.

A2 The stochastic process $t \mapsto v^t$ is predictable, has a continuous version, and satisfies
$$\mathbf E \sup_{s\in[0,t]} \|v^s\|^p < \infty$$
for every $t > 0$ and every $p \ge 1$. (The norm being the norm of $\mathcal W^L$.)

It is easy to see that these conditions imply the hypotheses of Theorem 8.9 for the problems (6.2) and (6.3). Therefore $G_v^t$ is a well-defined strongly Markovian stochastic process. With these notations one has the well-known Bismut integration by parts formula [Nor86].

Proposition 6.1. Let $\Psi^t$ and $D_v\Psi^t$ be defined as above and assume A1 and A2 are satisfied. Let $\mathcal B \subset \mathcal H$ be an open subset of $\mathcal H$ such that $\Psi^t \in \mathcal B$ almost surely and let $\varphi : \mathcal B \to \mathbf R$ be a differentiable function such that
$$\mathbf E\|\varphi(\Psi^t)\|^2 + \mathbf E\|D\varphi(\Psi^t)\|^2 < \infty .$$
Then we have for every $h \in \mathbf R^L$ the following identity in $\mathbf R$:
$$\mathbf E\bigl(D\varphi(\Psi^t)\,D_v\Psi^t h\bigr) = \mathbf E\Bigl(\varphi(\Psi^t)\int_0^t \langle v^s h, dW(s)\rangle\Bigr), \tag{6.4}$$
where $\langle\cdot,\cdot\rangle$ is the scalar product of $\mathcal W$.
Remark 6.2. Equation (6.4) is useful because it relates the expectation of $D\varphi$ to that of $\varphi$. In order to fully exploit (6.4) we will need to get rid of the factor $D_v\Psi^t$. This will be possible by a clever choice of $v$. This procedure is explained for example in [Nor86], but we will need a new variant of his results because of the high-frequency part. In the sequel, we will proceed in two steps. We need only bounds on $D_L\varphi$, since the smoothness of the high-frequency part follows by other means. Thus, it suffices to construct $D_v\Psi^t$ in such a way that $\Pi_L D_v\Psi^t$ is invertible, where $\Pi_L$ is the orthogonal projection onto $\mathcal H_L$. The construction of $\Pi_L D_v\Psi^t$ follows closely the presentation of [Nor86]. However, we also want $\Pi_H D_v\Psi^t = 0$, and this elimination of the high-frequency part seems to be new.

Proof. The finite dimensional case is stated (with slightly different assumptions on $F$) in [Nor86]. The extension to the infinite-dimensional setting can be done without major difficulty. By A1–A2 and Theorem 8.9, we ensure that all the expressions appearing in the proof and the statement are well-defined. By A2, we can use Itô's formula to ensure the validity of the assumptions for the infinite-dimensional version of Girsanov's theorem [DPZ96].
6.1. The construction of $v$.

In order to use Proposition 6.1 we will construct $v = (v_L, v_H)$ in such a way that the high-frequency part of $D_v\Phi^t = (D_v\Phi_L^t, D_v\Phi_H^t)$ vanishes. This construction is new and will be explained in detail in this subsection.

Notation. The equations which follow are quite involved. To keep the notation at a reasonable level without sacrificing precision we will adopt the following conventions:
$$(D_LF_L)^t \equiv (D_LF_L)\circ\Phi^t, \qquad (D_LQ_L)^t \equiv (D_LQ_L)\circ\Phi^t,$$
and similarly for other derivatives of the $Q$ and the $F$. Furthermore, the reader should note that $D_LQ_L$ is a linear map from $\mathcal H_L$ to the linear maps $\mathcal H_L \to \mathcal H_L$ and therefore, below, $(D_LQ_L)h$ with $h \in \mathcal H_L$ is a linear map $\mathcal H_L \to \mathcal H_L$. The dimension of $\mathcal H_L$ is $L < \infty$.

Inspired by [Nor86], we define the $L\times L$ matrix-valued stochastic processes $U_L^t$ and $V_L^t$ by the following SDE's, which must hold for every $h \in \mathcal H_L$:
$$
\begin{aligned}
dU_L^t h &= -A_L U_L^t h\,dt + (D_LF_L)^t\,U_L^t h\,dt + \bigl((D_LQ_L)^t\,U_L^t h\bigr)\,dW_L(t), \\
U_L^0 &= I \in \mathcal L(\mathcal H_L, \mathcal H_L),
\end{aligned}
\tag{6.5a}
$$
$$
\begin{aligned}
dV_L^t h &= V_L^t A_L h\,dt - V_L^t (D_LF_L)^t h\,dt - V_L^t \bigl((D_LQ_L)^t h\bigr)\,dW_L(t) \\
&\quad + \sum_{i=0}^{L-1} V_L^t \Bigl((D_LQ_L)^t\bigl(\bigl((D_LQ_L)^t h\bigr)e_i\bigr)\Bigr)e_i\,dt, \\
V_L^0 &= I \in \mathcal L(\mathcal H_L, \mathcal H_L).
\end{aligned}
\tag{6.5b}
$$
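The correction term in (6.5b) can be motivated by a short Itô computation. The following sketch uses shorthand that is not in the text: $B = -A_L + (D_LF_L)^t$, $M_i = (D_LQ_L^i)^t$ (both viewed as $L\times L$ matrices, with $Q_L^i$ the $i$-th column of $Q_L$), $dw_i$ the components of $dW_L$, and $Y$ any process solving the same linear equation as $U_L^t$ with an additional adapted drift $g$ (a hypothetical placeholder).

```latex
% Shorthand (assumptions of this sketch): B = -A_L + (D_L F_L)^t, M_i = (D_L Q_L^i)^t,
% g = arbitrary adapted drift; time superscripts are suppressed.
\begin{align*}
dY &= (B\,Y + g)\,dt + \sum_{i=0}^{L-1} M_i\,Y\,dw_i, \\
dV_L &= -V_L B\,dt - \sum_{i=0}^{L-1} V_L M_i\,dw_i + \sum_{i=0}^{L-1} V_L M_i^2\,dt, \\
d(V_L Y) &= (dV_L)\,Y + V_L\,dY + d\langle V_L, Y\rangle \\
 &= \Bigl(-V_L B\,Y + \sum_i V_L M_i^2\,Y\Bigr)dt - \sum_i V_L M_i\,Y\,dw_i \\
 &\quad + \bigl(V_L B\,Y + V_L\,g\bigr)dt + \sum_i V_L M_i\,Y\,dw_i
   - \sum_i V_L M_i^2\,Y\,dt \\
 &= V_L\,g\,dt .
\end{align*}
```

With $g = 0$ and $Y = U_L$ this gives $d(V_L U_L) = 0$, hence $V_L^t U_L^t = I$ for all $t$. The same computation with a non-zero drift $g$ is what produces a pure $dt$-term when $V_L^t$ is multiplied with the solution of an inhomogeneous equation such as (6.12).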
The last term in the definition of $V_L^t$ will be written as
$$\sum_{i=0}^{L-1} V_L^t \bigl((D_LQ_L^i)^t\bigr)^2 h\,dt,$$
where $Q_L^i$ is the $i$-th column of the matrix $Q_L$.

For small times, the process $U_L^t$ is an approximation to the partial Jacobian $D_L\Phi_L^t$, and $V_L^t$ is an approximation to its inverse. We first make sure that the objects in (6.5) are well-defined. The following lemma summarizes the properties of $U_L$ and $V_L$ which we need later.

Lemma 6.3. The processes $U_L^t$ and $V_L^t$ satisfy the following bounds. For every $p \ge 1$ and all $T > 0$ there is a constant $C_{T,p,\varrho}$ independent of the initial data (for $\Phi^t$) such that
$$\mathbf E \sup_{t\in[0,T]} \bigl(\|U_L^t\|^p + \|V_L^t\|^p\bigr) \le C_{T,p,\varrho}, \tag{6.6a}$$
$$\mathbf E \sup_{t\in[0,\varepsilon]} \|V_L^t - I\|^p \le C_{T,p,\varrho}\,\varepsilon^{p/2}, \tag{6.6b}$$
for all $\varepsilon < T$. Furthermore, $V_L$ is the inverse of $U_L$ in the sense that $V_L^t = (U_L^t)^{-1}$ almost surely.

Proof. The bound (6.6a) is a straightforward application of Theorem 8.9, whose conditions are easily checked. (Note that we are here in a finite-dimensional, linear setting.) To prove (6.6b), note that $I$ is the initial condition for $V_L$. One writes (6.5b) in its integral form and then the result follows by applying (6.6a). The last statement can be shown easily by applying Itô's formula to the product $V_L^t U_L^t$. (In fact, the definition of $V_L$ was made precisely with this in mind.)

We continue with the construction of $v$. Since $A$ and $Q$ are diagonal with respect to the splitting $\mathcal H = \mathcal H_L \oplus \mathcal H_H$, we can write (6.3) as
$$
\begin{aligned}
d\,D_v\Phi_L^t &= \bigl(-A_L D_v\Phi_L^t + (D_LF_L)^t D_v\Phi_L^t + (D_HF_L)^t D_v\Phi_H^t + Q_L^t v_L^t\bigr)\,dt \\
&\quad + \bigl((D_LQ_L)^t D_v\Phi_L^t\bigr)\,dW_L(t) + \bigl((D_HQ_L)^t D_v\Phi_H^t\bigr)\,dW_L(t),
\end{aligned}
\tag{6.7a}
$$
$$
d\,D_v\Phi_H^t = \bigl(-A_H D_v\Phi_H^t + (D_LF_H)^t D_v\Phi_L^t + (D_HF_H)^t D_v\Phi_H^t + Q_H v_H^t\bigr)\,dt,
\tag{6.7b}
$$
with zero initial condition. Since we want to consider derivatives with respect to the low-frequency part, we would like to define (implicitly) $v_H^t$ as
$$v_H^t = -Q_H^{-1}(D_LF_H)^t D_v\Phi_L^t .$$
In this way, the solution of (6.7b) would be $D_v\Phi_H^t \equiv 0$. We next would define the "directions" $v_L$ and $v_H$ by
$$
\begin{aligned}
v_L^t &= \bigl(V_L^t Q_L^t\bigr)^*, \\
v_H^t &= -Q_H^{-1}(D_LF_H)^t D_v\Phi_L^t,
\end{aligned}
\tag{6.8}
$$
where $D_v\Phi_L^t$ is the solution to (6.7a) with $D_v\Phi_H^t$ replaced by $0$ and $v_L$ replaced by $(V_L^t Q_L^t)^*$. Here, $X^*$ denotes the transpose of the real matrix $X$.

The implicit problem (6.8) can be somewhat simplified by the following device: Since we are constructing a solution of (6.7) whose high-frequency part is going to vanish, we consider instead the simpler equation for $y^t \in \mathcal L(\mathcal H_L, \mathcal H_L)$:
$$dy^t = \bigl(-A_L y^t + (D_LF_L)^t y^t + Q_L^t (V_L^t Q_L^t)^*\bigr)\,dt + \bigl((D_LQ_L)^t y^t\bigr)\,dW_L(t), \tag{6.9}$$
with $y^0 = 0$, and where we use again the notation $F^t = F\circ\Phi^t$, and similar notation for $Q$. The verification that (6.9) is well-defined and can be bounded is again a consequence of Theorem 8.9 and is left to the reader. Given the solution of (6.9) we proceed to make our definitive choice of $v_L^t$ and $v_H^t$:

Definition 6.4. Given an initial condition $\xi \in \mathcal H^\alpha$ (for $\Phi^t$) and a cutoff $\varrho < \infty$ we define $v^t = v_L^t \oplus v_H^t$ by
$$
\begin{aligned}
v_L^t &\equiv \bigl(V_L^t Q_L^t\bigr)^* = \bigl(V_L^t (Q_L\circ\Phi^t)\bigr)^*, \\
v_H^t &\equiv -Q_H^{-1}(D_LF_H)^t y^t = -Q_H^{-1}\bigl((D_LF_H)\circ\Phi^t\bigr)\,y^t,
\end{aligned}
\tag{6.10}
$$
where $\Phi^t$ solves (5.3), $V_L^t$ is the solution of (6.5b), and $y^t$ solves (6.9).

Lemma 6.5. The process $v^t$ satisfies for all $p \ge 1$ and all $t > 0$:
$$\mathbf E \sup_{s\in[0,t]} \|v^s\|^p < C_{t,p,\varrho}\bigl(1+\|\xi\|_\alpha\bigr)^p,$$
i.e., it satisfies assumption A2 of Proposition 6.1.

Proof. By Proposition 5.1 (B), $\Phi^t$ is in $\mathcal H^\alpha$ for all $t \ge 0$. In Lemma 8.1 P6, it will be checked that $D_LF_H$ maps $\mathcal H^\alpha$ into $\mathcal L(\mathcal H_L, \mathcal H^\alpha \cap \mathcal H_H)$ and that this map has linear growth. By the lower bound (1.6) on the amplitudes $q_k$, we see that $Q_H^{-1}$ is bounded from $\mathcal H^\alpha \cap \mathcal H_H$ to $\mathcal H_H$ and thus the assertion follows.

We now verify that $D_v\Phi_H^t \equiv 0$. Indeed, consider Eq.(6.7). This is a system for two unknowns, $Y^t = D_v\Phi_L^t$ and $X^t = D_v\Phi_H^t$. For our choice of $v_L^t$ and $v_H^t$ this system takes the form
$$
\begin{aligned}
dY^t &= \bigl(-A_L Y^t + (D_LF_L)^t Y^t + (D_HF_L)^t X^t + Q_L^t (V_L^t Q_L^t)^*\bigr)\,dt \\
&\quad + \bigl((D_LQ_L)^t Y^t\bigr)\,dW_L(t) + \bigl((D_HQ_L)^t X^t\bigr)\,dW_L(t),
\end{aligned}
\tag{6.11a}
$$
$$
dX^t = \bigl(-A_H X^t + (D_LF_H)^t Y^t + (D_HF_H)^t X^t - (D_LF_H)^t y^t\bigr)\,dt .
\tag{6.11b}
$$
By inspection, we see that $X^t \equiv 0$ and
$$dY^t = \bigl(-A_L Y^t + (D_LF_L)^t Y^t\bigr)\,dt + \bigl((D_LQ_L)^t Y^t\bigr)\,dW_L(t) + Q_L^t (V_L^t Q_L^t)^*\,dt \tag{6.12}$$
solve the problem, i.e., $Y^t = y^t$, by the construction of $y^t$. Applying the Itô formula to the product $V_L^t Y^t$ and using Eqs.(6.5b) and (6.12), we see immediately that we have defined $Y^t = D_v\Phi_L^t$ in such a way that
$$d\bigl(V_L^t D_v\Phi_L^t\bigr) = V_L^t Q_L^t (Q_L^t)^* (V_L^t)^*\,dt,$$
because all other terms cancel. Thus we finally have shown

Theorem 6.6. Given an initial condition $\xi \in \mathcal H^\alpha$ (for $\Phi^t$) and a cutoff $\varrho < \infty$, the following is true: If $v^t$ is given by Definition 6.4 then
$$
\begin{aligned}
D_v\Phi_L^t &= U_L^t \int_0^t V_L^s \bigl((Q_LQ_L^*)\circ\Phi^s\bigr)(V_L^s)^*\,ds \equiv U_L^t C_L^t, \\
D_v\Phi_H^t &\equiv 0 .
\end{aligned}
\tag{6.13}
$$

Definition 6.7. We will call the matrix $C_L^t$ the partial Malliavin matrix of our system.

7. The Partial Malliavin Matrix

In this section, we estimate the partial Malliavin matrix $C_L^t$ from below. We fix some time $t > 0$ and denote by $S^L$ the unit sphere in $\mathbf R^L$. Our bound is

Theorem 7.1. There are constants $\mu, \nu \ge 0$ such that for every $T > 0$ and every $p \ge 1$ there is a $C_{T,p,\varrho}$ such that for all initial conditions $\xi \in \mathcal H^\alpha$ for the flow $\Phi^t$ and all $t < T$, one has
$$\mathbf E\bigl((\det C_L^t)^{-p}\bigr) \le C_{T,p,\varrho}\,t^{-\mu p}\bigl(1+\|\xi\|_\alpha\bigr)^{\nu p} .$$

Corollary 7.2. There are constants $\mu, \nu \ge 0$ such that for every $T > 0$ and every $p \ge 1$ there is a $C_{T,p,\varrho}$ such that for all initial conditions $\xi \in \mathcal H^\alpha$ for the flow $\Phi^t$ and all $t < T$, one has, with $v$ given by Definition 6.4:
$$\mathbf E\bigl\|(D_v\Phi_L^t)^{-p}\bigr\| \le C_{T,p,\varrho}\,t^{-\mu p}\bigl(1+\|\xi\|_\alpha\bigr)^{\nu p} .$$
This corollary follows from $(D_v\Phi_L^t)^{-1} = (C_L^t)^{-1}V_L^t$ and Eq.(6.6a).

As a first step, we formulate a bound from which Theorem 7.1 follows easily.

Theorem 7.3. There are a $\mu > 0$ and a $\nu > 0$ such that for every $p \ge 1$, every $t < T$ and every $\xi \in \mathcal H^2$, one has
$$\mathbf P\Bigl(\inf_{h\in S^L}\int_0^t \bigl\|Q_L^s (V_L^s)^* h\bigr\|^2\,ds < \varepsilon\Bigr) \le C_{T,p,\varrho}\,\varepsilon^p\,t^{-\mu p}\bigl(1+\|\xi\|_2\bigr)^{\nu p},$$
with $C_{T,p,\varrho}$ independent of $\xi$.
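To see why a tail estimate of this form controls negative moments of the determinant, one can argue as follows. This is a sketch rather than the authors' argument: $\lambda$ denotes the smallest eigenvalue of $C_L^t$ (the infimum appearing in Theorem 7.3, in view of (6.13)), $q = Lp$, and the estimate of Theorem 7.3 is applied with exponent $q+1$ for $\varepsilon \le 1$.

```latex
% lambda = inf_{h in S^L} <h, C_L^t h> and q = L p are notation introduced for this sketch.
\begin{align*}
\det C_L^t \ge \lambda^{L}
 &\;\Longrightarrow\;
 \mathbf E\,(\det C_L^t)^{-p} \le \mathbf E\,\lambda^{-q}, \\
\mathbf E\,\lambda^{-q}
 &= \int_0^\infty q\,\varepsilon^{-q-1}\,\mathbf P(\lambda<\varepsilon)\,d\varepsilon \\
 &\le \int_1^\infty q\,\varepsilon^{-q-1}\,d\varepsilon
  + \int_0^1 q\,\varepsilon^{-q-1}\,
    C_{T,q+1,\varrho}\,\varepsilon^{q+1}\,t^{-\mu(q+1)}\,(1+\|\xi\|_2)^{\nu(q+1)}\,d\varepsilon \\
 &= 1 + q\,C_{T,q+1,\varrho}\,t^{-\mu(q+1)}\,(1+\|\xi\|_2)^{\nu(q+1)} .
\end{align*}
```

Since $t < T$ and $q + 1 \le (L+1)p$, the right-hand side is again of the form $C\,t^{-\mu' p}(1+\|\xi\|_2)^{\nu' p}$ with new exponents $\mu'$, $\nu'$. Passing from $\|\xi\|_2$ to $\|\xi\|_\alpha$ presumes $\alpha \ge 2$, which is an assumption of this sketch.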
Proof of Theorem 7.1. Note that $\int_0^t \|Q_L^s (V_L^s)^* h\|^2\,ds$ is, by Eq.(6.13), nothing but the quantity $\langle h, C_L^t h\rangle$. Then, Theorem 7.1 follows at once.

The proof of Theorem 7.3 is largely inspired by [Nor86, Sect. 4], but we need some new features to deal with the infinite dimensional high-frequency part. This will take up the next three subsections.

Our proof needs a modification of the Lie brackets considered when we study the Hörmander condition. We explain first these identities in a finite dimensional setting.

7.1. Finite dimensional case.

Throughout this subsection we assume that both $\mathcal H_L$ and $\mathcal H_H$ are finite dimensional and we denote by $N$ the dimension of $\mathcal H$. The function $Q$ maps $\mathcal H$ to $\mathcal L(\mathcal H, \mathcal H)$, and we denote by $Q_i : \mathcal H \to \mathcal H$ its $i$-th column ($i = 0, \dots, N-1$).⁶ Finally, $\hat F$ is the drift (in this section, we absorb the linear part of the SDE into $\hat F = -A + F$, to simplify the expressions). The equation for $\Phi^t$ is
$$\Phi^t(\xi) = \xi + \int_0^t \hat F\circ\Phi^s(\xi)\,ds + \int_0^t \sum_{i=0}^{N-1} Q_i\circ\Phi^s(\xi)\,dw_i(s) .$$
Let $K : \mathcal H \to \mathcal H_L$ be a smooth function whose derivatives are all bounded and define $K^t = K\circ\Phi^t$, $\hat F^t = \hat F\circ\Phi^t$, and $Q_i^t = Q_i\circ\Phi^t$. We then have by Itô's formula
$$dK^t = (DK)^t \hat F^t\,dt + \sum_{i=0}^{N-1} (DK)^t Q_i^t\,dw_i(t) + \frac12 \sum_{i=0}^{N-1} (D^2K)^t (Q_i^t; Q_i^t)\,dt . \tag{7.1}$$
We next rewrite Eq.(6.5) for $V_L^t$ as:
$$dV_L^t = -V_L^t (D_L\hat F_L)^t\,dt - \sum_{i=0}^{L-1} V_L^t (D_LQ_i)^t\,dw_i(t) + \sum_{i=0}^{L-1} V_L^t \bigl((D_LQ_i)^t\bigr)^2\,dt .$$
By Itô's formula, we have therefore the following equation for the product $V_L^t K^t$:
$$
\begin{aligned}
d\bigl(V_L^t K^t\bigr) &= -V_L^t (D_L\hat F_L)^t K^t\,dt - \sum_{i=0}^{L-1} V_L^t (D_LQ_i)^t K^t\,dw_i(t) \\
&\quad + V_L^t \sum_{i=0}^{L-1} \bigl((D_LQ_i)^t\bigr)^2 K^t\,dt + V_L^t (DK)^t \hat F^t\,dt \\
&\quad + V_L^t \sum_{i=0}^{N-1} (DK)^t Q_i^t\,dw_i(t) \\
&\quad + \frac12 V_L^t \sum_{i=0}^{N-1} (D^2K)^t (Q_i^t; Q_i^t)\,dt \\
&\quad - V_L^t \sum_{i=0}^{L-1} (D_LQ_i)^t (DK)^t Q_i^t\,dt .
\end{aligned}
\tag{7.2}
$$

⁶ There is a slight ambiguity of notation here, since $Q_i$ really means $Q_{\varrho,i}$, which is not the same as $Q_\varrho$.
By construction, $D_LQ_i = 0$ for $i \ge L$ and therefore we can extend all the sums above to $N-1$. The following definition is useful to simplify (7.2). Let $A : \mathcal H \to \mathcal H$ and $B : \mathcal H \to \mathcal H_L$ be two functions with continuous bounded derivatives. We define the projected Lie bracket $[A,B]_L : \mathcal H \to \mathcal H_L$ by
$$[A,B]_L(x) = \Pi_L [A,B](x) = DB(x)\,A(x) - D_LA_L(x)\,B(x) .$$
A straightforward calculation then leads to
$$
\begin{aligned}
d\bigl(V_L^t K^t\bigr) &= V_L^t \Bigl([\hat F, K]_L^t + \frac12 \sum_{i=0}^{N-1} \bigl[Q_i, [Q_i, K]_L\bigr]_L^t\Bigr)\,dt \\
&\quad + V_L^t \sum_{i=0}^{N-1} [Q_i, K]_L^t\,dw_i(t) \\
&\quad + \frac12 V_L^t \sum_{i=0}^{N-1} \Bigl(\bigl((D_LQ_i)^t\bigr)^2 K^t - (DK)^t (DQ_i)^t Q_i^t + (DD_LQ_i)^t (Q_i^t; K^t)\Bigr)\,dt .
\end{aligned}
\tag{7.3}
$$
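The last sum in (7.3) is what is left over once the double bracket has been extracted from (7.2). As a check, with the superscript $t$ suppressed and $D_LQ_i$ written as shorthand for the derivative entering the projected bracket, expanding the definition twice gives:

```latex
% Expansion of the projected double bracket; all fields are evaluated at the same point.
\begin{align*}
[Q_i,K]_L &= (DK)\,Q_i - (D_LQ_i)\,K, \\
\bigl[Q_i,[Q_i,K]_L\bigr]_L
 &= D\bigl([Q_i,K]_L\bigr)\,Q_i - (D_LQ_i)\,[Q_i,K]_L \\
 &= (D^2K)(Q_i;Q_i) + (DK)(DQ_i)\,Q_i - (DD_LQ_i)(Q_i;K) \\
 &\quad - 2\,(D_LQ_i)(DK)\,Q_i + (D_LQ_i)^2 K .
\end{align*}
```

Halving this and subtracting it from the $dt$-terms of (7.2) that involve $Q_i$, namely $(D_LQ_i)^2K + \tfrac12 (D^2K)(Q_i;Q_i) - (D_LQ_i)(DK)Q_i$, leaves $\tfrac12\bigl((D_LQ_i)^2K - (DK)(DQ_i)Q_i + (DD_LQ_i)(Q_i;K)\bigr)$, which is the summand of the last term in (7.3).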
Note next that for $i < L$, both $K$ and $Q_i$ map to $\mathcal H_L$ and therefore $DD_LQ_i(Q_i; K) = D_L^2Q_i(Q_i; K)$ when $i < L$ and it is $0$ otherwise. Similarly, $(DK)(DQ_i)Q_i$ equals $(DK)(D_LQ_i)Q_i$ when $i < L$ and vanishes otherwise. Thus, the last sum in (7.3) only extends to $L-1$. In order to simplify (7.3) further, we define the vector field $\tilde F : \mathcal H \to \mathcal H$ by
$$\tilde F = \hat F - \frac12 \sum_{i=0}^{L-1} (D_LQ_i)\,Q_i .$$
Then we get
$$d\bigl(V_L^t K^t\bigr) = V_L^t \Bigl([\tilde F, K]_L^t + \frac12 \sum_{i=0}^{N-1} \bigl[Q_i, [Q_i, K]_L\bigr]_L^t\Bigr)\,dt + V_L^t \sum_{i=0}^{N-1} [Q_i, K]_L^t\,dw_i(t) .$$
This is very similar to [Nor86, p. 128], who uses conventional Lie brackets instead of $[\cdot,\cdot]_L$.
7.2. Infinite dimensional case.

In this case, some additional care is needed when we transcribe (7.1). The problem is that the stochastic flow $\Phi^t$ solves (5.4) in the mild sense but not in the strong sense. Nevertheless, this technical difficulty will be circumvented by choosing the initial condition in $\mathcal H^\alpha$. We have indeed by Proposition 5.1 (A) that if the initial condition is in $\mathcal H^\gamma$ with $\gamma \in [1, \alpha]$, then the solution of (5.4) is in the same space. Thus, Proposition 5.1 allows us to use Itô's formula also in the infinite dimensional case.

For any two Banach (or Hilbert) spaces $\mathcal B_1$, $\mathcal B_2$, we denote by $P(\mathcal B_1, \mathcal B_2)$ the set of all $C^\infty$ functions $\mathcal B_1 \to \mathcal B_2$ which are polynomially bounded together with all their derivatives. Let $K \in P(\mathcal H, \mathcal H_L)$ and $X \in P(\mathcal H, \mathcal H)$. We define as above $[X,K]_L \in P(\mathcal H, \mathcal H_L)$ by
$$[X,K]_L(x) = DK(x)\,X(x) - D_LX_L(x)\,K(x) .$$
Furthermore, we define $[A,K]_L \in P(D(A), \mathcal H_L)$ by the corresponding formula, i.e.,
$$[A,K]_L(x) = DK(x)\,Ax - A_L K(x),$$
where $A = 1 - \Delta$. Notice that if $K$ is a constant vector field, i.e., $DK = 0$, then $[A,K]_L$ extends uniquely to an element of $P(\mathcal H, \mathcal H_L)$.

We choose again the basis $\{e_i\}_{i=0}^\infty$ of Fourier modes in $\mathcal H$ (see Eq.(1.5)) and define $dw_i(t) = \langle e_i, dW(t)\rangle$. We also define the stochastic process $K^t(\xi) = (K\circ\Phi^t)(\xi)$ and
$$\tilde F = F - \frac12 \sum_{i=0}^{L-1} (D_LQ_i)\,Q_i,$$
where $Q_i(x) = Q(x)e_i$. Then one has

Proposition 7.4. Let $\xi \in \mathcal H^1$ and $K \in P(\mathcal H, \mathcal H_L)$. Then the equality
$$
\begin{aligned}
V_L^t(\xi)\,K^t(\xi) &= K(\xi) + \int_0^t V_L^s(\xi) \sum_{i=0}^\infty [Q_i, K]_L^s(\xi)\,dw_i(s) \\
&\quad + \int_0^t V_L^s(\xi)\Bigl(-[A,K]_L^s(\xi) + [\tilde F, K]_L^s(\xi)\Bigr)\,ds \\
&\quad + \frac12 \int_0^t V_L^s(\xi) \sum_{i=0}^\infty \bigl[Q_i, [Q_i, K]_L\bigr]_L^s(\xi)\,ds
\end{aligned}
$$
holds almost surely. The same equality also holds if $\xi \in \mathcal H^2$ and $K \in P(\mathcal H^1, \mathcal H_L)$. Note that by $[A,K]_L^s(\xi)$ we mean $(DK)\bigl(\Phi^s(\xi)\bigr)\,A\Phi^s(\xi) - A_L K\bigl(\Phi^s(\xi)\bigr)$.

Proof. This follows as in the finite dimensional case by Itô's formula.
7.3. The restricted Hörmander condition.

The condition for having appropriate mixing properties is the following Hörmander-like condition.

Definition 7.5. Let $\mathcal K = \{K^{(i)}\}_{i=1}^M$ be a collection of functions in $P(\mathcal H, \mathcal H_L)$. We say that $\mathcal K$ satisfies the restricted Hörmander condition if there exist constants $\delta, R > 0$ such that for every $h \in \mathcal H_L$ and every $y \in \mathcal H$ one has
$$\sup_{K\in\mathcal K}\ \inf_{\|x-y\|\le R} \langle h, K(x)\rangle^2 \ge \delta\,\|h\|^2 . \tag{7.4}$$

We now construct the set $\mathcal K$ for our problem. We define the operator $[X^0, \cdot\,]_L : P(\mathcal H^\gamma, \mathcal H_L) \to P(\mathcal H^{\gamma+1}, \mathcal H_L)$ by
$$[X^0, K]_L = -[A,K]_L + [F,K]_L + \frac12 \sum_{i=0}^\infty \bigl[Q_i, [Q_i, K]_L\bigr]_L - \frac12 \sum_{i=0}^{L-1} \bigl[(D_LQ_i)Q_i, K\bigr]_L .$$
This is a well-defined operation since $Q$ is Hilbert–Schmidt and $DQ$ is finite rank, and we can write
$$\sum_{i=0}^\infty \bigl[Q_i, [Q_i, K]_L\bigr]_L = \sum_{i=0}^\infty D^2K\,(Q_i; Q_i) + r,$$
with $r$ a finite sum of bounded terms.

Definition 7.6. We define
– $\mathcal K_0 = \{Q_i, \text{ with } i = 0, \dots, L-1\}$,
– $\mathcal K_1 = \{[X^0, Q_i]_L, \text{ with } i = k_*, \dots, L-1\}$,
– $\mathcal K_\ell = \{[Q_i, K]_L, \text{ with } K \in \mathcal K_{\ell-1} \text{ and } i = k_*, \dots, L-1\}$, when $\ell > 1$.
Finally, $\mathcal K = \mathcal K_0 \cup \dots \cup \mathcal K_3$.⁷

⁷ The number 3 is the power 3 in $u^3$.

Remark 7.7. Since for $i \ge k_*$ the $Q_i$ are constant vector fields, the quantity $[X^0, K]$ is in $P(\mathcal H, \mathcal H_L)$ and not only in $P(\mathcal H^1, \mathcal H_L)$. Furthermore, if $K \in \mathcal K$ then $D^jK$ is bounded for all $j \ge 0$.

We have

Theorem 7.8. The set $\mathcal K$ constructed above satisfies the restricted Hörmander condition for the cutoff GL equation if $\varrho$ is chosen sufficiently large. Furthermore, the inequality (7.4) holds for $R = \varrho/2$. Finally, $\delta > \delta_0 > 0$ for all sufficiently large $\varrho$.

Proof. The basic idea of the proof is as follows: The leading term of $F$ is the cubic term $u^m$ with $m = 3$. Clearly, if $i_1, i_2, i_3$ are any 3 modes, we find
$$\bigl[e_{i_1}, [e_{i_2}, [u \mapsto u^3, e_{i_3}]_L]_L\bigr]_L = \sum_{k = \pm i_1 \pm i_2 \pm i_3} C_k\,\Pi_L e_k, \tag{7.5}$$
where the $e_i$ are the basis vectors of $\mathcal H$ defined in (1.5), and the $C_k$ are non-zero combinatorial constants. By Lemma 3.3 the following is true: For every choice of a fixed $k$ the three numbers $i_1$, $i_2$, and $i_3$ of $I_k$ satisfy
– For $j = 1, 2, 3$ one has $i_j \in \{k_*, \dots, L-1\}$.
– If $|k| < k_*$ exactly one of the six sums $\pm i_1 \pm i_2 \pm i_3$ lies in the set $\{0, \dots, k_*-1\}$ and exactly one lies in $\{-(k_*-1), \dots, 0\}$.
In particular, the expression (7.5) does not depend on $u$. If instead of $u^3$ we take a lower power, the triple commutator will vanish.

The basic idea has to be slightly modified because of the cutoff $\varrho$. First of all, the constant $R$ in the definition of the Hörmander condition is set to $R = \varrho/2$. Consider first the case where $\|x\| \ge 5\varrho/2$. In that case we see from (4.1) that the $Q_{\varrho,i}$, viewed as vector fields, are of the form
$$Q_{\varrho,i}(x) = \begin{cases} (q_i + 1)\,e_i, & \text{if } i < k_*, \\ q_i\,e_i, & \text{if } i \ge k_* . \end{cases}$$
Since these vectors span a basis of $\mathcal H_L$ the inequality (7.4) follows in this case (already by choosing only $K \in \mathcal K_0$). Consider next the more delicate case when $\|x\| \le 5\varrho/2$.
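The claim that (7.5) does not depend on $u$ can be checked directly from the definition of the projected bracket, because the $e_i$ are constant vector fields and so their derivatives vanish. In the sketch below, products such as $u^2 e_{i_3}$ denote pointwise multiplication of functions, and the overall sign is understood to be absorbed into the constants $C_k$:

```latex
% X(u) = u^3; e_{i_1}, e_{i_2}, e_{i_3} are constant fields in H_L, so D e_{i_j} = 0.
% Definition used: [X,K]_L(u) = DK(u) X(u) - D_L X_L(u) K(u).
\begin{align*}
[u\mapsto u^3,\,e_{i_3}]_L(u) &= -\Pi_L\bigl(3u^2\,e_{i_3}\bigr), \\
\bigl[e_{i_2},[u\mapsto u^3,\,e_{i_3}]_L\bigr]_L(u) &= -\Pi_L\bigl(6u\,e_{i_2}e_{i_3}\bigr), \\
\bigl[e_{i_1},[e_{i_2},[u\mapsto u^3,\,e_{i_3}]_L]_L\bigr]_L(u) &= -\Pi_L\bigl(6\,e_{i_1}e_{i_2}e_{i_3}\bigr).
\end{align*}
```

Each bracket with a constant field lowers the degree in $u$ by one, so after three brackets nothing depends on $u$, and the product of three Fourier modes is a combination of the modes with indices $\pm i_1 \pm i_2 \pm i_3$. For a power $u^m$ with $m < 3$ the same computation terminates one step earlier and the triple bracket is zero.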
Lemma 7.9. For all $\|x\| \le 3\varrho$ one has for $\{i_1, i_2, i_3\} = I_k$ the identity
$$\bigl[e_{i_1}, [e_{i_2}, [X^0, e_{i_3}]_L]_L\bigr]_L(x) = \sum_{k = \pm i_1 \pm i_2 \pm i_3} C_k\,\Pi_L e_k + r_\varrho(x), \tag{7.6}$$
where $r_\varrho$ satisfies a bound
$$\|r_\varrho(x)\| \le C\varrho^{-1},$$
with the constant $C$ independent of $x$ and of $k < k_*$.

Proof. In $[X^0, \cdot\,]_L$ there are 4 terms. The first, $A$, leads successively to $[A, e_{i_3}]_L = (1 + i_3^2)\,e_{i_3}$, which is constant, and hence the Lie bracket with $e_{i_2}$ vanishes. The second term contains the non-linear interaction $F_\varrho$. Since $\|x\| \le 3\varrho$ one has $F_\varrho(x) = F(x)$. Thus, (7.5) yields the leading term of (7.6). The two remaining terms will contribute to $r_\varrho(x)$. We just discuss the first one. We have, using (4.1),
$$[Q_{\varrho,i}, e_{i_3}]_L(x) = -DQ_{\varrho,i}(x)\,e_{i_3} = -\frac{1}{\varrho}\,\chi'\bigl(\|x\|/\varrho\bigr)\,\frac{\langle x, e_{i_3}\rangle}{\|x\|}\,\Pi_{k_*} e_i .$$
This gives clearly a bound of order $\varrho^{-1}$ for this Lie bracket, and the further ones are handled in the same way.

We continue the proof of Theorem 7.8. When $k < k_*$, we consider the elements of $\mathcal K_3$. They are of the form
$$\bigl[Q_{\varrho,i_1}, [Q_{\varrho,i_2}, [X^0, Q_{\varrho,i_3}]_L]_L\bigr]_L(x) = q_{i_1} q_{i_2} q_{i_3} \Bigl(\sum_{k = \pm i_1 \pm i_2 \pm i_3} C_k\,\Pi_L e_k + r_\varrho(x)\Bigr) .$$
Thus, for $\varrho = \infty$ these vectors together with the $Q_i$ with $i \in \{k_*, \dots, L-1\}$ span $\mathcal H_L$ (independently of $y$ with $\|y\| \le 3\varrho$) and therefore (7.4) holds in this case, if $\|x\| \le 5\varrho/2$ and $R = \varrho/2$. The assertion for finite, but large enough $\varrho$ follows immediately by a perturbation argument. This completes the case of $\|x\| \le 5\varrho/2$ and hence the proof of Theorem 7.8.

Proof of Theorem 7.3. The proof is very similar to the one in [Nor86], but we have to keep track of the $x,t$-dependence of the estimates. First of all, choose $x \in \mathcal H^2$ and $t \in (0, t_0]$. From now on, we will use the notation $O(\nu)$ as a shortcut for $C(1 + \|x\|_2^\nu)$, where the constant $C$ may depend on $t_0$ and $p$, but is independent of $x$ and $t$. Denote by $R$ the constant found in Theorem 7.8 and define the subset $B_x$ of $\mathcal H^2$ by
$$B_x = \bigl\{y \in \mathcal H^2 : \|y - x\| \le R \text{ and } \|y\|_\gamma \le \|x\|_\gamma + 1 \text{ for } \gamma = 1, 2\bigr\} .$$
We also denote by $B(I)$ a ball of (small) radius $O(1/L)$ centered at the identity in the space of all $L \times L$ matrices. (Recall that $L$ is the dimension of $\mathcal H_L$, and that $K \in \mathcal K$ maps to $\mathcal H_L$.) We then have a bound of the type
$$\sup_{y\in B_x}\ \sup_{K\in\mathcal K}\ \sum_{i=0}^\infty \bigl\|[Q_i, K]_L(y)\bigr\|^2 \le O(0) . \tag{7.7}$$
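This is a consequence of the fact that $QQ^*$ is trace class and thus the sum converges and its principal term is equal to
$$
\begin{aligned}
\operatorname{Tr}\bigl(Q^*(y)\,(DK)^*(y)\,(DK)(y)\,Q(y)\bigr)
&= \operatorname{Tr}\bigl((DK)(y)\,Q(y)\,Q^*(y)\,(DK)^*(y)\bigr) \\
&= \sum_{i=0}^{L-1} \bigl\|Q^*(y)\,(DK)^*(y)\,e_i\bigr\|^2 \le C_\varrho .
\end{aligned}
$$
The last inequality follows from Remark 7.7. The other terms form a finite sum containing derivatives of the $Q_i$ and are bounded in a similar way. We have furthermore bounds of the type
$$
\begin{aligned}
\sup_{y\in B_x}\ \sup_{K\in\mathcal K}\ \bigl\|[X^0, K]_L(y)\bigr\|^2 &\le O(\nu), \\
\sup_{y\in B_x}\ \sup_{K\in\mathcal K}\ \bigl\|\bigl[X^0, [X^0, K]_L\bigr]_L(y)\bigr\|^2 &\le O(\nu), \\
\sup_{y\in B_x}\ \sup_{K\in\mathcal K}\ \sum_{i=0}^\infty \bigl\|\bigl[Q_i, [X^0, K]_L\bigr]_L(y)\bigr\|^2 &\le O(\nu),
\end{aligned}
\tag{7.8}
$$
where $\nu = 1$. Let $S_L$ be the unit sphere in $\mathcal H_L$. By the assumptions on $\mathcal K$ and the choice of $B(I)$ we see that:

(A) For every $h_0 \in S_L$, there exist a $K \in \mathcal K$ and a neighborhood $N$ of $h_0$ in $S_L$ such that
$$\inf_{y\in B_x}\ \inf_{V\in B(I)}\ \inf_{h\in N}\ \langle V K(y), h\rangle^2 \ge \frac{\delta}{2},$$
with $\delta$ the constant appearing in (7.4).

Next, we define a stopping time $\tau$ by $\tau = \min\{t, \tau_1, \tau_2\}$ with
$$\tau_1 = \inf\bigl\{s \ge 0 : \Phi^s(x) \notin B_x\bigr\}, \qquad \tau_2 = \inf\bigl\{s \ge 0 : V_L^s(x) \notin B(I)\bigr\},$$
and $t < T$ as chosen in the statement of Theorem 7.3. It follows easily from Proposition 5.1 (E) that the probability of $\tau_1$ being small (meaning that in the sequel we will always assume $\varepsilon \le 1$) can be bounded by
$$\mathbf P(\tau_1 < \varepsilon) \le C_p\,(1 + \|x\|_2)^{16p}\,\varepsilon^p,$$
with $C_p$ independent of $x$. Similarly, using Lemma 6.3, we see that
$$\mathbf P(\tau_2 < \varepsilon) \le C_p\,\varepsilon^p .$$
Observing that $\mathbf P(t < \varepsilon) < t^{-p}\varepsilon^p$ and combining this with the two estimates, we get for every $p \ge 1$:
$$\mathbf P(\tau < \varepsilon) \le O(16p)\,t^{-p}\,\varepsilon^p .$$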
From this and (A) we deduce

(B) for every $h_0 \in S_L$ there exist a $K \in \mathcal K$ and a neighborhood $N$ of $h_0$ in $S_L$ such that for $\varepsilon < 1$,
$$\sup_{h\in N} \mathbf P\Bigl(\int_0^\tau \langle V_L^s(x) K^s(x), h\rangle^2\,ds \le \varepsilon\Bigr) \le \mathbf P(\tau < 2\varepsilon/\delta) \le O(16p)\,t^{-p}\,\varepsilon^p,$$
with $\delta$ the constant appearing in (7.4).

Following [Nor86], we will show below that (B) implies:

(C) for every $h_0 \in S_L$ there exist an $i \in \{k_*, \dots, L-1\}$, a neighborhood $N$ of $h_0$ in $S_L$ and constants $\nu, \mu > 0$ such that for $\varepsilon < 1$ and $p > 1$ one has
$$\sup_{h\in N} \mathbf P\Bigl(\int_0^\tau \langle V_L^s(x) Q_i^s(x), h\rangle^2\,ds \le \varepsilon\Bigr) \le O(\nu p)\,t^{-\mu p}\,\varepsilon^p .$$

Remark 7.10. Note that for small $\|x\|$, $Q_i(x) = Q_{i,\varrho}(x)$ may be $0$ when $i < k_*$, but the point is that then we can find another $i$ for which the inequality holds.

By a straightforward argument, given in detail in [Nor86, p. 127], one concludes that (C) implies Theorem 7.3. It thus only remains to show that (B) implies (C). We follow closely Norris and choose a $K \in \mathcal K$ such that (B) holds. If $K$ happens to be in $\mathcal K_0$ then it is equal to a $Q_i$, and thus we already have (C). Otherwise, assume $K \in \mathcal K_j$ with $j \ge 1$. Then we use a martingale inequality.

Lemma 7.11. Let $\mathcal H$ be a separable Hilbert space and $W(t)$ be the cylindrical Wiener process on $\mathcal H$. Let $\beta^t$ be a real-valued predictable process and $\gamma^t$, $\zeta^t$ be predictable $\mathcal H$-valued processes. Define
$$
\begin{aligned}
a^t &= a^0 + \int_0^t \beta^s\,ds + \int_0^t \langle \gamma^s, dW(s)\rangle, \\
b^t &= b^0 + \int_0^t a^s\,ds + \int_0^t \langle \zeta^s, dW(s)\rangle .
\end{aligned}
$$
Suppose $\tau \le t_0$ is a bounded stopping time such that for some constant $C_0 < \infty$ we have
$$\sup_{0<s<\tau} \bigl\{|\beta^s|, |a^s|, \|\zeta^s\|, \|\gamma^s\|\bigr\} \le C_0 .$$
Then, for every $p > 1$, there exists a constant $C_{p,t_0}$ such that
$$\mathbf P\Bigl(\int_0^\tau (b^s)^2\,ds < \varepsilon^{20} \text{ and } \int_0^\tau \bigl(|a^s|^2 + \|\zeta^s\|^2\bigr)\,ds \ge \varepsilon\Bigr) \le C_{p,t_0}\,(1 + C_0^6)^p\,\varepsilon^p,$$
for every $\varepsilon \le 1$.

Proof. The proof is given in [Nor86], but without the explicit dependence on $C_0$. If we follow his proof carefully we get an estimate of the type
$$\mathbf P\Bigl(\int_0^\tau (b^s)^2\,ds < \varepsilon^{10} \text{ and } \int_0^\tau \bigl(|a^s|^2 + \|\zeta^s\|^2\bigr)\,ds \ge (1 + C_0^3)\,\varepsilon\Bigr) \le C_1\,(1 + C_0^{12})^p\,\varepsilon^p .$$
Replacing $\varepsilon$ by $\varepsilon^2$ and making the assumption $\varepsilon < 1/(1 + C_0^3)$, we recover our statement. The statement is trivial for $\varepsilon > 1/(1 + C_0^3)$, since any probability is at most 1.
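The substitution in the last step can be spelled out as follows. The sketch assumes that the estimate quoted from [Nor86] is available for every positive exponent, so that it may be applied with $\varepsilon^2$ in place of $\varepsilon$ and $p/2$ in place of $p$; the second replacement is not stated in the text and is an assumption made here so that the powers of $C_0$ match. The abbreviations $I_a = \int_0^\tau (|a^s|^2 + \|\zeta^s\|^2)\,ds$ and $I_b = \int_0^\tau (b^s)^2\,ds$ are also introduced only for this sketch.

```latex
% I_a, I_b as defined in the lead-in; exponent p/2 is an assumption of this sketch.
\begin{align*}
\varepsilon < \frac{1}{1+C_0^3}
 &\;\Longrightarrow\; (1+C_0^3)\,\varepsilon^2 < \varepsilon
 \;\Longrightarrow\; \{I_a \ge \varepsilon\} \subset \{I_a \ge (1+C_0^3)\,\varepsilon^2\}, \\
\mathbf P\bigl(I_b<\varepsilon^{20},\ I_a\ge\varepsilon\bigr)
 &\le \mathbf P\bigl(I_b<(\varepsilon^2)^{10},\ I_a\ge(1+C_0^3)\,\varepsilon^2\bigr) \\
 &\le C_1\,(1+C_0^{12})^{p/2}\,(\varepsilon^2)^{p/2}
 \le C_1\,(1+C_0^{6})^{p}\,\varepsilon^{p}, \\
\varepsilon \ge \frac{1}{1+C_0^3}
 &\;\Longrightarrow\; 1 \le (1+C_0^3)^p\,\varepsilon^p \le 2^p\,(1+C_0^6)^p\,\varepsilon^p .
\end{align*}
```

The two elementary inequalities used are $(1+C_0^{12})^{1/2} \le 1 + C_0^6$ and $1 + C_0^3 \le 2(1 + C_0^6)$.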
We apply this inequality as follows: Define, for $K_0 \in \mathcal K$,
$$
\begin{aligned}
a^t(x) &= \bigl\langle V_L^t\,[X^0, K_0]_L^t(x),\, h\bigr\rangle, \\
b^t(x) &= \bigl\langle V_L^t\,K_0^t(x),\, h\bigr\rangle, \\
\beta^t(x) &= \bigl\langle V_L^t\,\bigl[X^0, [X^0, K_0]_L\bigr]_L^t(x),\, h\bigr\rangle, \\
(\gamma^t)_i(x) &= \bigl\langle V_L^t\,\bigl[Q_i, [X^0, K_0]_L\bigr]_L^t(x),\, h\bigr\rangle, \\
(\zeta^t)_i(x) &= \bigl\langle V_L^t\,[Q_i, K_0]_L^t(x),\, h\bigr\rangle .
\end{aligned}
$$
In this expression, $\zeta^t(x) \in \mathcal H$, $(\zeta^t)_i(x) = \langle \zeta^t(x), e_i\rangle$, and similarly for the $\gamma^t$. It is clear by Proposition 7.4, Eq.(7.7), and Eq.(7.8) that the assumptions of Lemma 7.11 are satisfied with $C_0 = O(\nu)$ for some $\nu > 0$.

We continue the proof that (B) implies (C) in the case when $K \in \mathcal K_j$, with $j = 1$. Then, by the construction of $\mathcal K_j$ with $j \ge 1$, there is a $K_0 \in \mathcal K_{j-1}$ such that we have either $K = [Q_i, K_0]_L$ for some $i \in \{k_*, \dots, L-1\}$, or $K = [X^0, K_0]_L$. In fact, for $j = 1$ only the second case occurs and $K_0 = Q_i$ for some $i$, but we are already preparing an inductive step. Applying Lemma 7.11, we have for every $\varepsilon \le 1$:
$$
\begin{aligned}
\mathbf P\Bigl(&\int_0^\tau \langle V_L^s K_0^s(x), h\rangle^2\,ds < \varepsilon \quad\text{and} \\
&\int_0^\tau \Bigl(\langle V_L^s\,[X^0, K_0]_L^s(x), h\rangle^2 + \sum_{i=0}^\infty \langle V_L^s\,[Q_i, K_0]_L^s(x), h\rangle^2\Bigr)\,ds \ge \varepsilon^{1/20}\Bigr) \le O(6\nu p)\,\varepsilon^{p/20} .
\end{aligned}
$$
Since the second integral above is always larger than $\int_0^\tau \langle V_L^t K^t(x), h\rangle^2\,dt$, the probability for it to be smaller than $\varepsilon^{1/20}$ is, by (B), bounded by $O(16p)\,t^{-p}\,\varepsilon^{p/20}$. This implies (replacing $\nu$ by $\max\{6\nu, 16\}$) that
$$\mathbf P\Bigl(\int_0^\tau \langle V_L^s K_0^s(x), h\rangle^2\,ds < \varepsilon\Bigr) \le O(\nu p)\,t^{-p}\,\varepsilon^{p/20} .$$
Since for $j = 1$ we have $K_0 = Q_i$ with $i \in \{k_*, \dots, L-1\}$, we have shown (C) in this case.

The above reasoning is repeated for $j = 2$ and $j = 3$, by iterating the above argument. For example, if $K = [Q_{i_1}, [X^0, Q_{i_2}]_L]_L$, with $i_1, i_2 \in \{k_*, \dots, L-1\}$, we apply Lemma 7.11 twice, showing the first time that $\langle [X^0, Q_{i_2}]_L, h\rangle^2$ is unlikely to be small and then again to show that $\langle Q_{i_2}, h\rangle^2$ is also unlikely to be small (with other powers of $\varepsilon$), which is what we wanted. Finally, since every $K$ used in (B) is in $\mathcal K$, at most 3 such invocations of Lemma 7.11 will be sufficient to conclude that (C) holds. The proof of Theorem 7.3 is complete.

7.4. Estimates on the low-frequency derivatives (Proof of Proposition 5.3).

Having proven the crucial bound Theorem 7.1 on the partial Malliavin matrix, we can now proceed to prove Proposition 5.3, i.e., the smoothing properties of the dynamics in the low-frequency part. For convenience, we restate it here.
Proposition 7.12. There exist exponents $\mu, \nu > 0$ such that for every $\varphi \in C_b^2(\mathcal H)$, every $\xi \in \mathcal H^\alpha$ and every $T > 0$, one has
$$\bigl\|\mathbf E\bigl((D_L\varphi\circ\Phi^t)(\xi)\,(D_L\Phi_L^t)(\xi)\bigr)\bigr\| \le C_T\,t^{-\mu}\bigl(1 + \|\xi\|_\alpha^\nu\bigr)\|\varphi\|_{L^\infty}, \tag{7.9}$$
for all $t \in (0, T]$.

Proof. The proof will use the integration by parts formula (6.4) together with Theorem 7.1. Fix $\xi \in \mathcal H^\alpha$ and $t > 0$. In this proof, we omit the argument $\xi$ to gain legibility, but it will be understood that the formulas do generally only hold if evaluated at some $\xi \in \mathcal H^\alpha$. We extend our phase space to include $D_L\Phi^t$, $V_L^t$ and $D_v\Phi_L^t$. We define a new stochastic process $\Psi^t$ by
$$\Psi^t = \bigl(\Phi^t, D_v\Phi_L^t, D_L\Phi^t, V_L^t\bigr) \in \tilde{\mathcal H} = \mathcal H \oplus \mathbf R^{L\cdot L} \oplus \mathcal H^L \oplus \mathbf R^{L\cdot L} .$$
Applying the definitions of these processes, we see that $\Psi^t$ is defined by the autonomous SDE given by
$$
\begin{aligned}
d\Phi^t &= -A\Phi^t\,dt + F(\Phi^t)\,dt + Q(\Phi^t)\,dW(t), \\
dD_L\Phi^t &= -A D_L\Phi^t\,dt + DF(\Phi^t)\,D_L\Phi^t\,dt + \bigl(DQ(\Phi^t)\,D_L\Phi^t\bigr)\,dW(t), \\
dV_L^t &= V_L^t A_L\,dt - V_L^t\,D_LF_L(\Phi^t)\,dt - V_L^t\,D_LQ_L(\Phi^t)\,dW_L(t) + V_L^t \sum_{i=0}^{L-1} \bigl(D_LQ_L^i(\Phi^t)\bigr)^2\,dt, \\
dD_v\Phi_L^t &= -A_L D_v\Phi_L^t\,dt + D_LF_L(\Phi^t)\,D_v\Phi_L^t\,dt + Q_L(\Phi^t)^2 (V_L^t)^*\,dt + \bigl(D_LQ_L(\Phi^t)\,D_v\Phi_L^t\bigr)\,dW_L(t) .
\end{aligned}
$$
This expression will be written in the short form
$$d\Psi^t = -\tilde A\Psi^t\,dt + \tilde F(\Psi^t)\,dt + \tilde Q(\Psi^t)\,dW(t),$$
with $\Psi^t \in \tilde{\mathcal H}$ and $dW(t)$ the cylindrical Wiener process on $\mathcal H$. It can easily be verified that this equation satisfies assumption A1 of Proposition 6.1. We consider again the stochastic process $v^t \in \mathcal H$ defined in (6.10). It is clear from Lemma 6.5 that $v^t$ satisfies A2. With this particular choice of $v$, the first component of $D_v\Psi^t$ (the one in $\mathcal H$) is equal to $D_v\Phi_L^t \oplus 0$.

We choose a function $\varphi \in C_b^2(\mathcal H)$ and fix two indices $i, k \in \{0, \dots, L-1\}$. Define $\tilde\varphi_{i,k} : \tilde{\mathcal H} \to \mathbf R$ by
$$\tilde\varphi_{i,k}(\Psi^t) = \sum_{j=0}^{L-1} \varphi(\Phi^t)\,\bigl((D_v\Phi_L^t)^{-1}\bigr)_{i,j}\,\bigl(D_L\Phi_L^t\bigr)_{j,k},$$
where the inverse has to be understood as the inverse of a square matrix. By Theorem 7.1, $\tilde\varphi_{i,k}$ satisfies the assumptions of Proposition 6.1. A simple computation gives for every $h \in \mathbf R^L$ the identity:
$$
\begin{aligned}
D\tilde\varphi_{i,k}(\Psi^t)\,D_v\Psi^t h
&= D_L\varphi(\Phi^t)\,\bigl(D_v\Phi_L^t h\bigr)\,\bigl((D_v\Phi_L^t)^{-1}\bigr)_{i,j}\,\bigl(D_L\Phi_L^t\bigr)_{j,k} \\
&\quad + \varphi(\Phi^t)\,\Bigl((D_v\Phi_L^t)^{-1}\bigl(D_v^2\Phi_L^t h\bigr)(D_v\Phi_L^t)^{-1}\Bigr)_{i,j}\,\bigl(D_L\Phi_L^t\bigr)_{j,k} \\
&\quad + \varphi(\Phi^t)\,\bigl((D_vD_L\Phi_L^t)h\bigr)_{i,j}\,\bigl((D_v\Phi_L^t)^{-1}\bigr)_{j,k},
\end{aligned}
\tag{7.10}
$$
where summation over $j$ is implicit. We now apply the integration by parts formula in the form of Proposition 6.1. This gives the identity
$$\mathbf E\bigl(D\tilde\varphi_{i,k}(\Psi^t)\,D_v\Psi^t h\bigr) = \mathbf E\Bigl(\tilde\varphi_{i,k}(\Psi^t) \int_0^t \langle v^s h, dW(s)\rangle\Bigr) .$$
Substituting (7.10), we find
$$
\begin{aligned}
\mathbf E\Bigl(D_L\varphi(\Phi^t)\,\bigl(D_v\Phi_L^t h\bigr)\,\bigl((D_v\Phi_L^t)^{-1}\bigr)_{i,j}\,\bigl(D_L\Phi_L^t\bigr)_{j,k}\Bigr)
&= -\mathbf E\Bigl(\varphi(\Phi^t)\,\Bigl((D_v\Phi_L^t)^{-1}\bigl(D_v^2\Phi_L^t h\bigr)(D_v\Phi_L^t)^{-1}\Bigr)_{i,j}\,\bigl(D_L\Phi_L^t\bigr)_{j,k}\Bigr) \\
&\quad - \mathbf E\Bigl(\varphi(\Phi^t)\,\bigl((D_vD_L\Phi_L^t)h\bigr)_{i,j}\,\bigl((D_v\Phi_L^t)^{-1}\bigr)_{j,k}\Bigr) \\
&\quad + \mathbf E\Bigl(\varphi(\Phi^t)\,\bigl((D_v\Phi_L^t)^{-1}\bigr)_{i,j}\,\bigl(D_L\Phi_L^t\bigr)_{j,k} \int_0^t \langle v^s h, dW(s)\rangle\Bigr) .
\end{aligned}
$$
The summation over the index $j$ is implicit in every term. We now choose $h = e_i$ and sum over the index $i$. The left-hand side is then equal to
$$\mathbf E\bigl(D_L\varphi(\Phi^t)\,D_L\Phi_L^t e_k\bigr),$$
which is precisely the expression we want to bound. The right-hand side can be bounded in terms of $\|\varphi\|_{L^\infty}$ and of $\mathbf E\|(D_v\Phi_L^t)^{-4}\|$ (at worst). The other factors are all given by components of $D_v\Psi^t$ and can therefore be bounded by means of Theorem 8.9. Therefore, (7.9) follows. The proof of Proposition 7.12 is complete.

8. Existence Theorems

In this section, we prove existence theorems for several PDE's and SDE's, in particular we prove Proposition 5.1 and Lemma 5.4. Much of the material here relies on well-known techniques, but we include the details for completeness.

We consider again the problem
$$d\Phi^t = -A\Phi^t\,dt + F(\Phi^t)\,dt + Q(\Phi^t)\,dW(t), \tag{8.1}$$
with $\Phi^0 = \xi$ given. The initial condition $\xi$ will be taken in one of the Hilbert spaces $\mathcal H^\gamma$. We will show that, after some time, the solution lies in some smaller Hilbert space. Note that we are working here with the cutoff equations, but we omit the index $\varrho$.

We will of course require that all stochastic processes are predictable. This means that if we write $L^p(\Omega, Y)$, with $Y$ some Banach space of functions of the interval $[0, T]$, we really mean that the only functions we consider are those that are measurable with respect to the predictable $\sigma$-field when considered as functions over $\Omega \times [0, T]$.

We first state precisely what is known about the ingredients of (8.1).
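Lemma 8.1. The following properties hold for A, F and Q:
P1 The space $\mathcal{H}$ is a real separable Hilbert space and $A : D(A) \to \mathcal{H}$ is a self-adjoint strictly positive operator.
P2 The map $F : \mathcal{H} \to \mathcal{H}$ has bounded derivatives of all orders.
P3 For every $\gamma \ge 0$, F maps $\mathcal{H}_\gamma$ into itself. Furthermore, there exists a constant $n > 0$ independent of γ and constants $C_{F,\gamma}$ such that F satisfies the bounds
$$\|F(x)\|_\gamma \le C_{F,\gamma}\bigl(1 + \|x\|_\gamma\bigr), \tag{8.2a}$$
$$\|F(x) - F(y)\|_\gamma \le C_{F,\gamma}\,\|x - y\|_\gamma\,\bigl(1 + \|x\|_\gamma + \|y\|_\gamma\bigr)^n, \tag{8.2b}$$
for all x and y in $\mathcal{H}_\gamma$.
P4 There exists an $\alpha > 0$ such that for every $x, x_1, x_2 \in \mathcal{H}$ the map $Q : \mathcal{H} \to L(\mathcal{H}, \mathcal{H})$ satisfies
$$\|A^{\alpha-3/8} Q(x)\|_{HS} \le C, \qquad \|A^{\alpha-3/8}\bigl(Q(x_1) - Q(x_2)\bigr)\|_{HS} \le C\,\|x_1 - x_2\|,$$
where $\|\cdot\|_{HS}$ denotes the Hilbert–Schmidt norm in $\mathcal{H}$.
P5 The derivative of Q satisfies
$$\|A^{\alpha}\bigl(DQ(x)\,h\bigr)\|_{HS} \le C\,\|h\|, \tag{8.3}$$
for every $x, h \in \mathcal{H}$.
P6 The derivative of F satisfies
$$\|DF(x)\,y\|_\gamma \le C\bigl(1 + \|x\|_\gamma\bigr)\|y\|_\gamma,$$
for every $x, y \in \mathcal{H}_\gamma$.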
Proof. The points P1, P2 are obvious. The point P4 follows from the definition (1.6) of Q and the construction of $Q_\varrho$ in (4.1). To prove P3, recall that the map $F = F_\varrho$ of the GL equation is of the type
$$F_\varrho(u) = \chi\bigl(\|u\|/(3\varrho)\bigr)\,P(u),$$
with P some polynomial and $\chi \in C_0^\infty(\mathbf{R})$. The key point is to notice that the estimate
$$\|uv\|_\gamma \le C_\gamma\bigl(\|u\|\,\|v\|_\gamma + \|u\|_\gamma\,\|v\|\bigr)$$
holds for every $\gamma \ge 0$, where uv denotes the multiplication of two functions. In particular, we have
$$\|u^n\|_\gamma \le C\,\|u\|_\gamma\,\|u\|^{n-1},$$
which, together with the fact that χ has compact support, shows (8.2a). This also shows that the derivatives of F in $\mathcal{H}_\gamma$ are polynomially bounded and so (8.2b) holds. P6 follows by the same argument. The point P5 immediately follows from the fact that the image of the operator $DQ(x)\,h$ is contained in $\mathcal{H}_L$ for every $x, h \in \mathcal{H}$.

Remark 8.2. The condition P1 implies that $e^{-At}$ is an analytic semigroup of contraction operators on $\mathcal{H}$. We will use repeatedly the bound
$$\|e^{-At}x\|_\gamma \le C_\gamma\,t^{-\gamma}\,\|x\|.$$
J.-P. Eckmann, M. Hairer
We begin the study of (8.1) by considering the equation for the mild solution
$$\Psi(t,\xi,\omega) = e^{-At}\xi + \int_0^t e^{-A(t-s)} F\bigl(\Psi(s,\xi,\omega)\bigr)\,ds + \int_0^t e^{-A(t-s)} Q\bigl(\Psi(s,\xi,\omega)\bigr)\,dW(s,\omega). \tag{8.4}$$
The study of this equation is in several steps. We will consider first the noise term, then the equation for a fixed instance of ω, and finally prove existence and bounds. We need some more notation:

Definition 8.3. Let $\mathcal{H}_\alpha$ be as above the domain of $A^\alpha$ with the graph norm. We fix, once and for all, a maximal time T. We denote by $\mathcal{H}_T^\alpha$ the space $C([0,T], \mathcal{H}_\alpha)$ equipped with the norm
$$\|y\|_{\mathcal{H}_T^\alpha} = \sup_{t\in[0,T]} \|y(t)\|_\alpha.$$
We write $\mathcal{H}_T$ instead of $\mathcal{H}_T^0$.

8.1. The noise term. Let $y \in L^p(\Omega, \mathcal{H}_T)$. (One should think of y as being $y(t) = \Phi^t$.) The noise term in (8.4) will be studied as a function on $L^p(\Omega, \mathcal{H}_T)$. It is given by the function Z defined as
$$\bigl(Z(y)\bigr)(\omega) = t \mapsto \int_0^t e^{-A(t-s)}\,Q\bigl(y(\omega)(s)\bigr)\,dW(s,\omega). \tag{8.5}$$
We will show that $Z(y)$ is in $L^p(\Omega, \mathcal{H}_T^\alpha)$ when y is in $L^p(\Omega, \mathcal{H}_T)$. The natural norm here is the $L^p$ norm defined by
$$|\!|\!|Z(y)|\!|\!|_{\mathcal{H}_T^\alpha,p} = \Bigl(\mathbf{E}_\omega \sup_{t\in[0,T]} \|(Z(y))_t(\omega)\|_\alpha^p\Bigr)^{1/p}.$$

Proposition 8.4. Let $\mathcal{H}$, A and Q be as above and assume P1 and P4 are satisfied. Then, for every $p \ge 1$ and every $T < T_0$ one has
$$|\!|\!|Z(y)|\!|\!|_{\mathcal{H}_T^\alpha,p} \le C_{T_0}\,T^{p/16}. \tag{8.6}$$
Proof. Choose an element $y \in L^p(\Omega, \mathcal{H}_T)$. In the sequel, we will consider y as a function over $[0,T] \times \Omega$ and we will not write explicitly the dependence on Ω. In order to get bounds on Z, we use the factorization formula and the Young inequality. Choose $\gamma \in (1/p, 1/8)$. The factorization formula [DPZ92] then gives the equality
$$\bigl(Z(y)\bigr)(t) = C \int_0^t (t-s)^{\gamma-1} e^{-A(t-s)} \int_0^s (s-r)^{-\gamma} e^{-A(s-r)}\,Q(y(r))\,dW(r)\,ds.$$
Since A commutes with $e^{-At}$, the Hölder inequality leads to
$$\begin{aligned}
\|(Z(y))(t)\|_\alpha^p &= C\,\Bigl\|\int_0^t (t-s)^{\gamma-1} e^{-A(t-s)} \int_0^s (s-r)^{-\gamma} A^\alpha e^{-A(s-r)}\,Q(y(r))\,dW(r)\,ds\Bigr\|^p \\
&\le C\,t^{\nu} \int_0^t \Bigl\|\int_0^s (s-r)^{-\gamma} A^\alpha e^{-A(s-r)}\,Q(y(r))\,dW(r)\Bigr\|^p ds,
\end{aligned} \tag{8.7}$$
with $\nu = (p\gamma - 1)/(p - 1)$. For the next bound we need the following result:
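Lemma 8.5 ([DPZ92, Thm. 7.2]). Let $r \mapsto \Psi^r$ be an arbitrary predictable $L^2(\mathcal{H})$-valued process. Then, for every $p \ge 2$, there exists a constant C such that
$$\mathbf{E}\,\Bigl\|\int_0^s \Psi^r\,dW(r)\Bigr\|^p \le C\,\mathbf{E}\Bigl(\int_0^s \|\Psi^r\|_{HS}^2\,dr\Bigr)^{p/2}.$$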
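This lemma, the Young inequality applied to (8.7), and P4 above imply
$$\begin{aligned}
|\!|\!|Z(y)|\!|\!|_{\mathcal{H}_T^\alpha,p}^p &= \mathbf{E} \sup_{0\le t\le T} \Bigl\|\int_0^t A^\alpha e^{-A(t-s)}\,Q(y(s))\,dW(s)\Bigr\|^p \\
&\le C\,T^{\nu}\,\mathbf{E}\int_0^T \Bigl\|\int_0^s (s-r)^{-\gamma} A^\alpha e^{-A(s-r)}\,Q(y(r))\,dW(r)\Bigr\|^p ds \\
&\le C\,T^{\nu}\,\mathbf{E}\int_0^T \Bigl(\int_0^s (s-r)^{-2\gamma}\,\|A^\alpha e^{-A(s-r)} Q(y(r))\|_{HS}^2\,dr\Bigr)^{p/2} ds \\
&\le C\,T^{\nu}\,\mathbf{E}\int_0^T \Bigl(\int_0^s (s-r)^{-2\gamma}\,\|A^{3/8} e^{-A(s-r)}\|^2\,\|A^{\alpha-3/8} Q(y(r))\|_{HS}^2\,dr\Bigr)^{p/2} ds \\
&\le C\,T^{\nu}\,\mathbf{E}\int_0^T \Bigl(\int_0^s (s-r)^{-2\gamma-3/4}\,\|A^{\alpha-3/8} Q(y(r))\|_{HS}^2\,dr\Bigr)^{p/2} ds \\
&\le C\,T^{\nu}\,\Bigl(\int_0^T s^{-2\gamma-3/4}\,ds\Bigr)^{p/2}\,\mathbf{E}\int_0^T \|A^{\alpha-3/8} Q(y(s))\|_{HS}^p\,ds \\
&\le C\,T^{1+\nu}\,\Bigl(\int_0^T s^{-2\gamma-3/4}\,ds\Bigr)^{p/2},
\end{aligned} \tag{8.8}$$
provided $\gamma < 1/8$. We choose $\gamma = 1/16$ (which thus imposes the condition $p > 16$), and we find
$$|\!|\!|Z(y)|\!|\!|_{\mathcal{H}_T^\alpha,p}^p \le C\,T_0^{1+\nu}\,T^{p/16}.$$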
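The two constraints on γ and the resulting power of T can be checked by hand. The following is a sketch of that arithmetic, using only the exponents that appear in (8.7) and (8.8); the conjugate exponent $q = p/(p-1)$ is notation introduced here for illustration.

```latex
% Holder step behind (8.7): the kernel (t-s)^{\gamma-1} must lie in L^q(0,t),
% where q = p/(p-1) is the conjugate exponent (notation introduced here).
\int_0^t (t-s)^{(\gamma-1)q}\,ds < \infty
  \iff (1-\gamma)\,\frac{p}{p-1} < 1
  \iff \gamma > \frac{1}{p}.

% Last step of (8.8): the time integral is finite iff 2\gamma + 3/4 < 1.
\int_0^T s^{-2\gamma-3/4}\,ds = \frac{T^{1/4-2\gamma}}{1/4-2\gamma},
  \qquad \gamma < \tfrac{1}{8}.

% With \gamma = 1/16 the exponent is -7/8, so
\Bigl(\int_0^T s^{-7/8}\,ds\Bigr)^{p/2}
  = \bigl(8\,T^{1/8}\bigr)^{p/2}
  = 8^{p/2}\,T^{p/16},
  \qquad \tfrac{1}{p} < \tfrac{1}{16} \iff p > 16.
```

This is why the interval $(1/p, 1/8)$ is non-empty only for $p > 8$, and why the specific choice $\gamma = 1/16$ requires $p > 16$.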
Thus, we have shown (8.6) for $p > 16$. Since we are working in a probability space the case of $p \ge 1$ follows. This completes the proof of Proposition 8.4.

8.2. A deterministic problem. The next step in our study of (8.4) is the analysis of the problem for a fixed instance of the noise ω. Then (8.4) is of the form
$$h(t,\xi,z) = e^{-At}\xi + \int_0^t e^{-A(t-s)} F\bigl(h(s,\xi,z)\bigr)\,ds + z(t),$$
where we assume that $z \in \mathcal{H}_T^\alpha$. One should think of this as an instance of $Z(\Phi)$, but at this point of our proof, the necessary bounds are not yet available. We find it more convenient to study instead of h the quantity g defined by
$$g(t,\xi,z) = h(t,\xi,z) - z(t).$$
Then g satisfies
$$g(t,\xi,z) = e^{-At}\xi + \int_0^t e^{-A(t-s)} F\bigl(g(s,\xi,z) + z(s)\bigr)\,ds. \tag{8.9}$$
We consider the solution (assuming it exists) as a map from the initial condition ξ and the deterministic noise term z. More precisely, we define
$$G(\xi,z)_t = g(t,\xi,z).$$
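This is a map defined on $\mathcal{H} \times \mathcal{H}_T^\alpha$. Clearly, (8.9) reads:
$$G(\xi,z)_t = e^{-At}\xi + \int_0^t e^{-A(t-s)} F\bigl(G(\xi,z)_s + z(s)\bigr)\,ds. \tag{8.10}$$
To formulate the bounds on G, we need some more spaces that take into account the regularizing effect of the semigroup $t \mapsto e^{-At}$.

Definition 8.6. For $\gamma \ge 0$ the spaces $\mathcal{G}_T^\gamma$ are defined as the closures of $C([0,T], \mathcal{H}_\gamma)$ under the norm
$$\|y\|_{\mathcal{G}_T^\gamma} = \sup_{t\in(0,T]} t^\gamma\,\|y(t)\|_\gamma + \sup_{t\in[0,T]} \|y(t)\|.$$
Note that
$$\|y\|_{\mathcal{G}_T^\gamma} \le C_{\gamma,T}\,\|y\|_{\mathcal{H}_T^\gamma}.$$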
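With these definitions, one has:

Proposition 8.7. Assume the conditions P1–P4 are satisfied. Assume $\xi \in \mathcal{H}$ and $z \in \mathcal{H}_T^\alpha$. Then, there exists a map $G : \mathcal{H} \times \mathcal{H}_T^\alpha \to \mathcal{H}_T$ solving (8.10). One has the following bounds:
(A) If $\xi \in \mathcal{H}_\gamma$ with $\gamma \le \alpha$ one has for every $T > 0$ the bound
$$\|G(\xi,z)\|_{\mathcal{H}_T^\gamma} \le C_T\bigl(1 + \|\xi\|_\gamma + \|z\|_{\mathcal{H}_T^\gamma}\bigr). \tag{8.11}$$
(B) If $\xi \in \mathcal{H}$ one has for every $T > 0$ the bound
$$\|G(\xi,z)\|_{\mathcal{G}_T^\alpha} \le C_T\bigl(1 + \|\xi\| + \|z\|_{\mathcal{H}_T^\alpha}\bigr). \tag{8.12}$$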
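Before we start with the proof proper we note the following regularizing bound: Define
$$\bigl(\mathcal{N} f\bigr)(t) = \int_0^t e^{-A(t-s)} f(s)\,ds. \tag{8.13}$$
Then one has:

Lemma 8.8. For every $\varepsilon \in [0,1)$ and every $\gamma > \varepsilon$ there is a constant $C_{\varepsilon,\gamma}$ such that
$$\|\mathcal{N} f\|_{\mathcal{G}_T^\gamma} \le C_{\varepsilon,\gamma}\,T\,\|f\|_{\mathcal{G}_T^{\gamma-\varepsilon}},$$
for all $f \in \mathcal{G}_T^{\gamma-\varepsilon}$.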
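Proof. We start with
$$\begin{aligned}
\|(\mathcal{N} f)(t)\|_\gamma &\le \int_0^{t/2} \|A^\gamma e^{-A(t-s)} f(s)\|\,ds + \int_{t/2}^{t} \|A^\varepsilon e^{-A(t-s)} A^{\gamma-\varepsilon} f(s)\|\,ds \\
&\le C\int_0^{t/2} (t-s)^{-\gamma}\,\|f(s)\|\,ds + C\int_{t/2}^{t} (t-s)^{-\varepsilon}\,\|f(s)\|_{\gamma-\varepsilon}\,ds \\
&\le C\int_0^{t/2} (t-s)^{-\gamma}\,\|f\|_{\mathcal{G}_T^{\gamma-\varepsilon}}\,ds + C\int_{t/2}^{t} (t-s)^{-\varepsilon}\,s^{\varepsilon-\gamma}\,\|f\|_{\mathcal{G}_T^{\gamma-\varepsilon}}\,ds \\
&\le C\,t^{1-\gamma}\,\|f\|_{\mathcal{G}_T^{\gamma-\varepsilon}} + C\,t^{1-\varepsilon}\,t^{\varepsilon-\gamma}\,\|f\|_{\mathcal{G}_T^{\gamma-\varepsilon}}.
\end{aligned}$$
Therefore, $t^\gamma\,\|(\mathcal{N} f)(t)\|_\gamma \le C\,T\,\|f\|_{\mathcal{G}_T^{\gamma-\varepsilon}}$. Similarly, we have
$$\|(\mathcal{N} f)(t)\| \le \int_0^t \|e^{-A(t-s)} f(s)\|\,ds \le C\,t\,\|f\|_{\mathcal{G}_T^{\gamma-\varepsilon}}.$$
Combining the two inequalities, the result follows.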
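The last step of the estimate in the proof of Lemma 8.8 rests on two elementary integrals. As a sketch, under the lemma's assumptions $0 \le \varepsilon < 1$ and $\gamma > \varepsilon$, they can be bounded as follows.

```latex
% On [0, t/2] one has t - s >= t/2, hence (t-s)^{-\gamma} <= (t/2)^{-\gamma}:
\int_0^{t/2} (t-s)^{-\gamma}\,ds
  \le \frac{t}{2}\Bigl(\frac{t}{2}\Bigr)^{-\gamma}
  = \Bigl(\frac{t}{2}\Bigr)^{1-\gamma}.

% On [t/2, t] and for \gamma > \varepsilon one has
% s^{\varepsilon-\gamma} <= (t/2)^{\varepsilon-\gamma}, hence
\int_{t/2}^{t} (t-s)^{-\varepsilon}\,s^{\varepsilon-\gamma}\,ds
  \le \Bigl(\frac{t}{2}\Bigr)^{\varepsilon-\gamma}\,
      \frac{(t/2)^{1-\varepsilon}}{1-\varepsilon}
  = \frac{1}{1-\varepsilon}\Bigl(\frac{t}{2}\Bigr)^{1-\gamma}.

% Both terms are of order t^{1-\gamma}; multiplying by the weight t^{\gamma}
% of the G_T^{\gamma} norm gives a bound that is linear in t:
t^{\gamma}\cdot t^{1-\gamma} = t \le T.
```

Splitting the integral at $t/2$ is what keeps each singular factor away from its singularity: near $s = 0$ the semigroup absorbs the full power $A^\gamma$, while near $s = t$ only $A^\varepsilon$ with $\varepsilon < 1$ has to be absorbed, which remains integrable.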
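Proof of Proposition 8.7. We first choose an initial condition $\xi \in \mathcal{H}_\gamma$ and a function $z \in \mathcal{H}_T^\gamma$. The local existence of the solutions in $\mathcal{H}_\gamma$ is a well-known result. Thus there exists, for a possibly small time $\tilde{T} > 0$, a function $u \in C([0,\tilde{T}], \mathcal{H}_\gamma)$ satisfying
$$u(t) = e^{-At}\xi + \int_0^t e^{-A(t-s)} F\bigl(u(s) + z(s)\bigr)\,ds.$$
In order to get an a priori bound on $\|u(t)\|_\gamma$ we use assumption P3 and find
$$\begin{aligned}
\|u(t)\|_\gamma &\le \|\xi\|_\gamma + C_{F,\gamma}\int_0^t \bigl(1 + \|u(s) + z(s)\|_\gamma\bigr)\,ds \\
&\le C\bigl(1 + \|\xi\|_\gamma + \|z\|_{\mathcal{H}_T^\gamma}\bigr) + C_{F,\gamma}\int_0^t \|u(s)\|_\gamma\,ds.
\end{aligned}$$
By Gronwall's lemma we get for $t < T$,
$$\|u(t)\|_\gamma \le C_T\bigl(1 + \|\xi\|_\gamma + \|z\|_{\mathcal{H}_T^\gamma}\bigr). \tag{8.14}$$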
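For reference, the form of Gronwall's lemma used in this step can be sketched as follows; the symbols φ, a and b are placeholders introduced here, not notation from the paper.

```latex
% Gronwall's lemma, integral form, for constants a, b >= 0 and a
% non-negative continuous function \varphi on [0, T]:
\varphi(t) \le a + b\int_0^t \varphi(s)\,ds \quad (0 \le t \le T)
  \;\Longrightarrow\;
  \varphi(t) \le a\,e^{bt} \quad (0 \le t \le T).

% Applied to the a priori bound above with
\varphi(t) = \|u(t)\|_\gamma, \qquad
a = C\bigl(1 + \|\xi\|_\gamma + \|z\|_{\mathcal{H}_T^\gamma}\bigr), \qquad
b = C_{F,\gamma},
% one obtains (8.14) with the constant C_T = C\,e^{C_{F,\gamma} T}.
```

The point of the computation is that the constant depends on T but not on the local existence time, which is what allows the local solution to be continued.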
Note that (8.14) tells us that if the initial condition ξ is in $\mathcal{H}_\gamma$ and if z is in $\mathcal{H}_T^\gamma$, then u(t) is, for small enough t, again in $\mathcal{H}_\gamma$ with the above bound. Therefore, we can iterate the above reasoning and show the global existence of the solutions up to time T, with bounds. Thus, G is well-defined and satisfies the bound (8.11). We turn to the proof of the estimate (8.12). Define for $z \in \mathcal{H}_T$ the map $\mathcal{M}_z$ by
$$\bigl(\mathcal{M}_z(x)\bigr)(t) = e^{-At}\xi + \int_0^t e^{-A(t-s)} F\bigl(x(s) + z(s)\bigr)\,ds. \tag{8.15}$$
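Taking $\xi \in \mathcal{H}$ we see from (8.14) with $\gamma = 0$ that there exists a fixed point u of $\mathcal{M}_z$ which satisfies
$$\|u\|_{\mathcal{H}_T} = \sup_{t\in[0,T]} \|u(t)\| \le C_T\bigl(1 + \|\xi\| + \|z\|_{\mathcal{H}_T}\bigr).$$
Assume next that $z \in \mathcal{H}_T^\alpha$ and hence a fortiori $z \in \mathcal{G}_T^\alpha$. Then, by P3 one has
$$\|F(x + z)\|_{\mathcal{G}_T^\gamma} \le C\bigl(1 + \|x\|_{\mathcal{G}_T^\gamma} + \|z\|_{\mathcal{G}_T^\gamma}\bigr).$$
Since u is a fixed point and (8.15) contains a term of the form of (8.13) we can apply Lemma 8.8 and obtain for every $\gamma \le \alpha$ and $\varepsilon \in [0,1)$:
$$\begin{aligned}
\|u\|_{\mathcal{G}_T^{\gamma+\varepsilon}} = \|\mathcal{M}_z(u)\|_{\mathcal{G}_T^{\gamma+\varepsilon}} &\le C\|\xi\| + C\,T\,\|F(u + z)\|_{\mathcal{G}_T^\gamma} \\
&\le C\|\xi\| + C\,T\bigl(1 + \|u\|_{\mathcal{G}_T^\gamma} + \|z\|_{\mathcal{G}_T^\gamma}\bigr).
\end{aligned} \tag{8.16}$$
Thus, as long as $\|z\|_{\mathcal{G}_T^\gamma}$ is finite, we can apply repeatedly (8.16) until reaching $\gamma = \alpha$, and this proves (8.12). The proof of Proposition 8.7 is complete.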
8.3. Stochastic differential equations in Hilbert spaces. Before we can start with the final steps of the proof of Proposition 5.1 we state in this subsection a general existence theorem for stochastic differential equations in Hilbert spaces. The symbol $\mathcal{H}$ denotes a separable Hilbert space. We are interested in solutions to the SDE
$$dX^t = \bigl(-AX^t + N(t,\omega,X^t) + M^t\bigr)\,dt + B(t,\omega,X^t)\,dW(t), \tag{8.17}$$
where W(t) is the cylindrical Wiener process on a separable Hilbert space $\mathcal{H}_0$. We assume $B(t,\omega,X^t) : \mathcal{H}_0 \to \mathcal{H}$ is Hilbert–Schmidt. We will denote by Ω the underlying probability space and by $\{\mathcal{F}_t\}_{t\ge 0}$ the associated filtration. The exact conditions spell out as follows:
C1 The operator $A : D(A) \to \mathcal{H}$ is the generator of a strongly continuous semigroup in $\mathcal{H}$.
C2 There exists a constant $C > 0$ such that for arbitrary $x, y \in \mathcal{H}$, $t \ge 0$ and $\omega \in \Omega$ the estimates
$$\|N(t,\omega,x) - N(t,\omega,y)\| + \|B(t,\omega,x) - B(t,\omega,y)\|_{HS} \le C\,\|x - y\|,$$
$$\|N(t,\omega,x)\|^2 + \|B(t,\omega,x)\|_{HS}^2 \le C^2\bigl(1 + \|x\|^2\bigr),$$
hold.
C3 For arbitrary $x, h \in \mathcal{H}$ and $h_0 \in \mathcal{H}_0$, the stochastic processes $\langle N(\cdot,\cdot,x), h\rangle$ and $\langle B(\cdot,\cdot,x)h_0, h\rangle$ are predictable.
C4 The $\mathcal{H}$-valued stochastic process $M^t$ is predictable, has continuous sample paths, and satisfies
$$\sup_{t\in[0,T]} \mathbf{E}\,\|M^t\|^p < \infty,$$
for every $T > 0$ and every $p \ge 1$.
C5 For arbitrary $t > 0$ and $\omega \in \Omega$, the maps $x \mapsto N(t,\omega,x)$ and $x \mapsto B(t,\omega,x)$ are twice continuously differentiable with their derivatives bounded by a constant independent of t, x and ω.

We have the following existence theorem.

Theorem 8.9. Assume that $\xi \in \mathcal{H}$ and that C1–C4 are satisfied.
– For any $T > 0$, there exists a mild solution $X_\xi^t$ of (8.17) with $X_\xi^0 = \xi$. This solution is unique among the $\mathcal{H}$-valued processes satisfying
$$\mathbf{P}\Bigl(\int_0^T \|X_\xi^t\|^2\,dt < \infty\Bigr) = 1.$$
Furthermore, $X_\xi$ has a continuous version and is strongly Markov.
– For every $p \ge 1$ and $T > 0$, there exists a constant $C_{p,T}$ such that
$$\mathbf{E} \sup_{t\in[0,T]} \|X_\xi^t\|^p \le C_{p,T}\bigl(1 + \|\xi\|^p\bigr). \tag{8.18}$$
– If, in addition, C5 is satisfied, the mapping $\xi \mapsto X_\xi^t(\omega)$ has a.s. bounded partial derivatives with respect to the initial condition ξ. These derivatives satisfy the SDEs obtained by formally differentiating (8.17) with respect to X.

Proof. The proof of this theorem for the case $M^t \equiv 0$ can be found in [DPZ96]. The same proof carries through for the case of non-vanishing $M^t$ satisfying C4.
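8.4. Bounds on the cutoff dynamics (Proof of Proposition 5.1). With the tools from stochastic analysis in place, we can now prove Proposition 5.1. We start with the

Proof of (A). In this case we identify Eq. (8.17) with (4.2) and apply Theorem 8.9. The condition C1 of Theorem 8.9 is obviously true, and the condition C3 is redundant in this case. The condition C2 is satisfied because F and Q of (8.17) satisfy P2–P4. Therefore, (8.18) holds and hence we have shown (5.1a) for the case of $\gamma = 0$. In particular, $\Phi_\varrho^t$ exists and satisfies
$$\Phi_\varrho^t(\xi,\omega) = e^{-At}\xi + \int_0^t e^{-A(t-s)} F\bigl(\Phi_\varrho^s(\xi,\omega)\bigr)\,ds + \int_0^t e^{-A(t-s)} Q\bigl(\Phi_\varrho^s(\xi,\omega)\bigr)\,dW(s). \tag{8.19}$$
We can extend (5.1a) to arbitrary $\gamma \le \alpha$ as follows. We set as in (8.5),
$$\bigl(Z(\Phi_\varrho)\bigr)_t(\omega) = \int_0^t e^{-A(t-s)} Q\bigl(\Phi_\varrho^s(\xi,\omega)\bigr)\,dW(s). \tag{8.20}$$
By Proposition 8.4, we find that for all $p \ge 1$ one has
$$\Bigl(\mathbf{E}_\omega \sup_{t\in[0,T]} \|(Z(\Phi_\varrho))_t(\omega)\|_\alpha^p\Bigr)^{1/p} < C_{T,p} \tag{8.21}$$
for all ξ. From this, we conclude that, almost surely,
$$\sup_{t\in[0,T]} \|(Z(\Phi_\varrho))_t(\omega)\|_\alpha < \infty. \tag{8.22}$$
Subtracting (8.20) from (8.19) we get
$$\begin{aligned}
\Phi_\varrho^t(\xi,\omega) - \bigl(Z(\Phi_\varrho)\bigr)_t(\omega) &= e^{-At}\xi + \int_0^t e^{-A(t-s)} F\bigl(\Phi_\varrho^s(\xi,\omega)\bigr)\,ds \\
&= e^{-At}\xi + \int_0^t e^{-A(t-s)} F\Bigl(\Phi_\varrho^s(\xi,\omega) - \bigl(Z(\Phi_\varrho)\bigr)_s(\omega) + \bigl(Z(\Phi_\varrho)\bigr)_s(\omega)\Bigr)\,ds.
\end{aligned} \tag{8.23}$$
Comparing (8.23) with (8.10) we see that, a.s.,
$$\Phi_\varrho^t(\xi,\omega) - \bigl(Z(\Phi_\varrho)\bigr)_t(\omega) = G\Bigl(\xi, Z\bigl(\Phi_\varrho(\xi,\cdot)\bigr)(\omega)\Bigr)_t.$$
We now use z as a shorthand:
$$z(t) = \Bigl(Z\bigl(\Phi_\varrho(\xi,\cdot)\bigr)\Bigr)_t(\omega).$$
Assume now $\xi \in \mathcal{H}_\gamma$. Note that by (8.22), z(t) is in $\mathcal{H}_\alpha$. If $\gamma \le \alpha$, we can apply Proposition 8.7 and from (8.11) we conclude that almost surely,
$$\sup_{t\in[0,T]} \|G(\xi,z)_t\|_\gamma \le C_T\Bigl(1 + \|\xi\|_\gamma + \sup_{t\in[0,T]} \|z(t)\|_\gamma\Bigr).$$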
Finally, since $\gamma \le \alpha$, we find
$$\begin{aligned}
\mathbf{E} \sup_{t\in[0,T]} \|\Phi_\varrho^t(\xi)\|_\gamma^p &\le C\,\mathbf{E} \sup_{t\in[0,T]} \|G(\xi,z)_t\|_\gamma^p + C\,\mathbf{E} \sup_{t\in[0,T]} \|z(t)\|_\gamma^p \\
&\le C_{T,p}\bigl(1 + \|\xi\|_\gamma\bigr)^p + C\,\mathbf{E} \sup_{t\in[0,T]} \|z(t)\|_\gamma^p \\
&\le C_{T,p}\bigl(1 + \|\xi\|_\gamma\bigr)^p,
\end{aligned} \tag{8.24}$$
where we applied (8.21) to get the last inequality. Thus, we have shown (5.1a) for all $\gamma \le \alpha$. The fact that the solution is strong if $\gamma \ge 1$ is an immediate consequence of [Lun95, Lemma 4.1.6] and [DPZ92, Thm. 5.29].

Proof of (B). This bound can be shown in a similar way, using the bound (8.12) of Proposition 8.7: Take $\xi \in \mathcal{H}$. By the above, we know that there exists a solution to (8.19) satisfying the bound (5.1b) with $\gamma = 0$. We define z(t) and $G(\xi,z)_t$ as above. But now we apply the bound (8.12) of Proposition 8.7 and we conclude that almost surely,
$$\sup_{t\in[0,T]} t^\alpha\,\|G(\xi,z)_t\|_\alpha \le C_T\Bigl(1 + \|\xi\| + \sup_{t\in[0,T]} \|z(t)\|_\alpha\Bigr).$$
Following a procedure similar to (8.24), we conclude that (5.1b) holds.

Proof of (C). The existence of the partial derivatives follows from Theorem 8.9. To show the bound, choose $\xi \in \mathcal{H}$ and $h \in \mathcal{H}$ with $\|h\| = 1$, and define the process $\Psi^t = \bigl(D\Phi_\varrho^t(\xi)\bigr)h$. It is by Theorem 8.9 a mild solution to the equation
$$d\Psi^t = -A\Psi^t\,dt + \bigl((DF \circ \Phi_\varrho^t)(\xi)\,\Psi^t\bigr)\,dt + \bigl((DQ \circ \Phi_\varrho^t)(\xi)\,\Psi^t\bigr)\,dW(t). \tag{8.25}$$
By P3 and P5, this equation satisfies conditions C1–C3 of Theorem 8.9, so we can apply it to get the desired bound (5.1c). (The constant term drops since the problem is linear in h.)

Proof of (D). Choose $h \in \mathcal{H}$ and $\xi \in \mathcal{H}_\alpha$ and define as above $\Psi^t = \bigl(D\Phi_\varrho^t(\xi)\bigr)h$, which is the mild solution to (8.25) with initial condition h. We write this as
$$\begin{aligned}
\Psi^t &= e^{-At}h + \int_0^t e^{-A(t-s)}\bigl((DF \circ \Phi_\varrho^s)(\xi)\,\Psi^s\bigr)\,ds + \int_0^t e^{-A(t-s)}\bigl((DQ \circ \Phi_\varrho^s)(\xi)\,\Psi^s\bigr)\,dW(s) \\
&\equiv S_1^t + S_2^t + S_3^t.
\end{aligned}$$
The term $S_1^t$ satisfies
$$\sup_{t\in(0,T]} t^\alpha\,\|S_1^t\|_\alpha \le C_T\,\|h\|. \tag{8.26}$$
The term $S_3^t$ is very similar to what is found in (8.5), with $Q(y(s))$ replaced by $(DQ \circ \Phi_\varrho^s)\Psi^s$. Repeating the steps of (8.8) for a sufficiently large p, we obtain now with $\gamma = \tfrac14$, some $\mu > 0$ and writing $X^s = (DQ \circ \Phi_\varrho^s)(\xi)\,\Psi^s$:
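$$\begin{aligned}
\mathbf{E} \sup_{t\in[0,T]} \|S_3^t\|_\alpha^p &= \mathbf{E} \sup_{0\le t\le T} \Bigl\|\int_0^t A^\alpha e^{-A(t-s)} X^s\,dW(s)\Bigr\|^p \\
&\le C\,T^{\mu}\,\mathbf{E}\int_0^T \Bigl\|\int_0^s (s-r)^{-\gamma} A^\alpha e^{-A(s-r)} X^r\,dW(r)\Bigr\|^p ds \\
&\le C\,T^{\mu}\,\mathbf{E}\int_0^T \Bigl(\int_0^s (s-r)^{-2\gamma}\,\|A^\alpha e^{-A(s-r)} X^r\|_{HS}^2\,dr\Bigr)^{p/2} ds \\
&\le C\,T^{\mu}\,\mathbf{E}\int_0^T \Bigl(\int_0^s (s-r)^{-2\gamma}\,\|A^\alpha X^r\|_{HS}^2\,dr\Bigr)^{p/2} ds \\
&\le C\,T^{\mu}\,\Bigl(\int_0^T s^{-2\gamma}\,ds\Bigr)^{p/2}\,\mathbf{E}\int_0^T \|A^\alpha X^s\|_{HS}^p\,ds \\
&\le C\,T^{\mu+p/4}\,\mathbf{E}\int_0^T \bigl\|A^\alpha (DQ \circ \Phi_\varrho^s)(\xi)\,\Psi^s\bigr\|_{HS}^p\,ds.
\end{aligned}$$
We now use P5, i.e., (8.3) and then (5.1c) and get
$$\mathbf{E} \sup_{t\in[0,T]} \|S_3^t\|_\alpha^p \le C\,T^{\mu+p/4}\,\mathbf{E}\int_0^T \|\Psi^s\|^p\,ds \le C\,T^{\mu+p/4+1}\,\|h\|^p. \tag{8.27}$$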
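The powers of T in the estimate of $S_3^t$ can be traced explicitly. The sketch below uses the choice $\gamma = 1/4$ made in the text and assumes, as the proof of (C) indicates, that (5.1c) bounds $\mathbf{E}\sup_s \|\Psi^s\|^p$ by a constant times $\|h\|^p$; that reading of (5.1c) is an assumption, since the statement itself lies outside this section.

```latex
% With \gamma = 1/4 the singular factor is integrable:
\Bigl(\int_0^T s^{-2\gamma}\,ds\Bigr)^{p/2}
  = \Bigl(\int_0^T s^{-1/2}\,ds\Bigr)^{p/2}
  = \bigl(2\,T^{1/2}\bigr)^{p/2}
  = 2^{p/2}\,T^{p/4}.

% The factorization step needs \gamma > 1/p, which for \gamma = 1/4 means p > 4
% ("a sufficiently large p").

% Last inequality of (8.27), assuming (5.1c) in the form
% E \sup_{s \in [0,T]} \|\Psi^s\|^p <= C \|h\|^p:
\mathbf{E}\int_0^T \|\Psi^s\|^p\,ds
  \le T\,\mathbf{E}\sup_{s\in[0,T]} \|\Psi^s\|^p
  \le C\,T\,\|h\|^p.
```

Compared with (8.8), no factor $\|A^{3/8}e^{-A(s-r)}\|$ is needed here because P5 controls $A^\alpha DQ$ directly, so the singularity is only $(s-r)^{-2\gamma}$ and the larger value $\gamma = 1/4$ is admissible.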
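To treat the term $S_2^t$, we fix a realization $\omega \in \Omega$ of the noise and use Lemma 8.8. This gives for $\varepsilon \in [0,1)$ the bound
$$\sup_{t\in(0,T]} t^\gamma\,\|S_2^t\|_\gamma \le C\,T \sup_{t\in(0,T]} t^{\gamma-\varepsilon}\,\bigl\|(DF \circ \Phi_\varrho^t)(\xi)\,\Psi^t\bigr\|_{\gamma-\varepsilon}.$$
By P6, this leads to the bound, a.s.,
$$\sup_{t\in(0,T]} t^\gamma\,\|S_2^t\|_\gamma \le C\,T\Bigl(1 + \sup_{t\in(0,T]} \|\Phi_\varrho^t(\xi)\|_{\gamma-\varepsilon}\Bigr) \sup_{t\in(0,T]} t^{\gamma-\varepsilon}\,\|\Psi^t\|_{\gamma-\varepsilon}.$$
Taking expectations we have
$$\mathbf{E} \sup_{t\in(0,T]} t^{\gamma p}\,\|S_2^t\|_\gamma^p \le C\,T^p\,\mathbf{E}\Bigl(\Bigl(1 + \sup_{t\in(0,T]} \|\Phi_\varrho^t(\xi)\|_{\gamma-\varepsilon}\Bigr)^p \sup_{t\in(0,T]} t^{(\gamma-\varepsilon)p}\,\|\Psi^t\|_{\gamma-\varepsilon}^p\Bigr).$$
By the Schwarz inequality and (5.1a) we get
$$\mathbf{E} \sup_{t\in(0,T]} t^{\gamma p}\,\|S_2^t\|_\gamma^p \le C_{T,p}\bigl(1 + \|\xi\|_{\gamma-\varepsilon}\bigr)^p \Bigl(\mathbf{E} \sup_{t\in(0,T]} t^{(\gamma-\varepsilon)2p}\,\|\Psi^t\|_{\gamma-\varepsilon}^{2p}\Bigr)^{1/2}. \tag{8.28}$$
Since $\Psi^t = \bigl(D\Phi_\varrho^t(\xi)\bigr)h = S_1^t + S_2^t + S_3^t$, combining (8.26)–(8.28) leads to
$$\mathbf{E} \sup_{t\in(0,T]} t^{\gamma p}\,\bigl\|D\Phi_\varrho^t(\xi)\,h\bigr\|_\gamma^p \le C_{T,p}\,\|h\|^p + C_{T,p}\bigl(1 + \|\xi\|_{\gamma-\varepsilon}\bigr)^p \Bigl(\mathbf{E} \sup_{t\in(0,T]} t^{(\gamma-\varepsilon)2p}\,\bigl\|D\Phi_\varrho^t(\xi)\,h\bigr\|_{\gamma-\varepsilon}^{2p}\Bigr)^{1/2}.$$
Thus, we have gained ε in regularity. Choosing $\varepsilon = \tfrac12$ and iterating sufficiently many times we obtain (5.1d) for sufficiently large p. The general case then follows from the Hölder inequality.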
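The iteration can be organized as a bookkeeping of regularity against moments. The following is only a sketch: the shorthand $M(\beta, q)$ is introduced here for illustration, and it is assumed that the base case at regularity 0 is supplied by (5.1c).

```latex
% Shorthand introduced here: for regularity \beta >= 0 and moment q >= 1,
M(\beta, q) := \mathbf{E}\sup_{t\in(0,T]} t^{\beta q}\,
               \bigl\|D\Phi_\varrho^t(\xi)\,h\bigr\|_{\beta}^{q}.

% In this notation the combined bound above reads
M(\gamma, p) \le C_{T,p}\,\|h\|^p
  + C_{T,p}\,\bigl(1 + \|\xi\|_{\gamma-\varepsilon}\bigr)^p\,
    M(\gamma-\varepsilon,\, 2p)^{1/2}.

% Each application with \varepsilon = 1/2 lowers the regularity by 1/2
% and doubles the moment:
(\gamma, p) \to (\gamma - \tfrac12,\, 2p) \to (\gamma - 1,\, 4p)
  \to \cdots \to (\gamma - \tfrac{k}{2},\, 2^k p).

% After finitely many steps (the last one possibly with a smaller \varepsilon)
% the regularity index reaches 0, where M(0, q) is assumed to be bounded
% by C \|h\|^q via (5.1c).
```

Because every step doubles the moment, a bound for a given p at regularity γ consumes moments of order $2^k p$ at regularity 0; this is consistent with the statement being proved first for large p and then extended by the Hölder inequality.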
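Proof of (E). We estimate this expression by
$$\bigl\|\Phi_\varrho^t(\xi) - e^{-At}\xi\bigr\|_\gamma \le \int_0^t \bigl\|F\bigl(\Phi_\varrho^s(\xi)\bigr)\bigr\|_\gamma\,ds + \Bigl\|\int_0^t e^{-A(t-s)} Q\bigl(\Phi_\varrho^s(\xi)\bigr)\,dW(s)\Bigr\|_\gamma.$$
The first term can be bounded by combining (5.1b) and P3. The second term is bounded by Proposition 8.4. The proof of Proposition 5.1 is complete.

8.5. Bounds on the off-diagonal terms. Here, we prove Lemma 5.4. This is very similar to the proof of (D) of Proposition 5.1.

Proof. We fix $T > 0$ and $p \ge 1$. We start with (5.5b). Recall that here we do not write the cutoff $\varrho$. We choose $h \in \mathcal{H}_H$ and $\xi \in \mathcal{H}$. The equation for $\Psi^s = D_H\Phi_L^s(\xi)\,h$ is
$$\begin{aligned}
\Psi^s &= \int_0^s e^{-A(s-s')}\,(DF_L \circ \Phi_\varrho^{s'})(\xi)\,\bigl(D_H\Phi^{s'}(\xi)\,h\bigr)\,ds' + \int_0^s e^{-A(s-s')}\,(DQ_L \circ \Phi_\varrho^{s'})(\xi)\,\bigl(D_H\Phi^{s'}(\xi)\,h\bigr)\,dW(s') \\
&\equiv R_1^s + R_2^s.
\end{aligned}$$
Since $DF = DF_\varrho$ is bounded we get
$$\|R_1^s\| \le C\int_0^s \bigl\|D_H\Phi^{s'}(\xi)\,h\bigr\|\,ds' \le C\,s \sup_{s'\in[0,s]} \bigl\|D_H\Phi^{s'}(\xi)\,h\bigr\|.$$
Using (5.1c), this leads to
$$\mathbf{E} \sup_{s\in[0,t]} \|R_1^s\|^p \le C^p\,t^p\,\mathbf{E} \sup_{s\in[0,t]} \bigl\|D_H\Phi^{s}(\xi)\,h\bigr\|^p \le C_{T,p}\,t^p\,\|h\|^p.$$
The term $R_2^s$ is bounded exactly as in (8.27). Combining the bounds, (5.5b) follows. Since $Q_H$ is constant, see (4.1), we get for $\Psi^s = D_L\Phi_H^s(\xi)\,h$ and $h \in \mathcal{H}_L$:
$$\Psi^s = \int_0^s e^{-A(s-s')}\,(DF_H \circ \Phi_\varrho^{s'})(\xi)\,\bigl(D_L\Phi^{s'}(\xi)\,h\bigr)\,ds'.$$
This is bounded like $R_1^s$ and leads to (5.5a). This completes the proof of Lemma 5.4.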
8.6. Proof of Proposition 2.3. Here we point out where to find the general results on (1.7) which we stated in Proposition 2.3. Note that these are bounds on the flow without cutoff $\varrho$.

Proof of Proposition 2.3. There are many ways to prove this. To make things simple, without getting the best estimate possible, we note that a bound in $L^\infty$ can be found in [Cer99, Prop. 3.2]. To get from $L^\infty$ to $\mathcal{H}$, we note that $\xi \in \mathcal{H}$ and we use (1.7) in its integral form. The term $e^{-At}\xi$ is bounded in $\mathcal{H}$, while the non-linear term $\int_0^t e^{-A(t-s)} F\bigl(\Phi^s(\xi)\bigr)\,ds$ can be bounded by using a version of Lemma 8.8. Finally, the noise term is bounded by Proposition 8.4. Furthermore, because of the compactness of the semigroup generated by A, it is possible to show [DPZ96, Thm. 6.3.5] that an invariant measure exists.
Acknowledgement. We thank L. Rey-Bellet and G. Ben-Arous for helpful discussions. We also thank the referee for insightful remarks which clarified some ambiguities of an earlier version. This research was partially supported by the Fonds National Suisse.
References
[BKL00a] Bricmont, J., Kupiainen, A., and Lefevere, R.: Probabilistic Estimates for the Two Dimensional Stochastic Navier–Stokes Equations. J. Stat. Phys. 100, 743–756 (2000)
[BKL00b] Bricmont, J., Kupiainen, A., and Lefevere, R.: Ergodicity of the 2d Navier–Stokes Equation with Random Forcing. Commun. Math. Phys. (to appear)
[Cer99] Cerrai, S.: Smoothing Properties of Transition Semigroups Relative to SDEs with Values in Banach Spaces. Probab. Theory Relat. Fields 113, 85–114 (1999)
[Col94] Collet, P.: Non-Linear Parabolic Evolutions in Unbounded Domains. NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci 437, 97–104 (1994)
[DPZ92] Da Prato, G. and Zabczyk, J.: Stochastic Equations in Infinite Dimensions. Cambridge: Cambridge University Press, 1992
[DPZ96] Da Prato, G. and Zabczyk, J.: Ergodicity for Infinite Dimensional Systems. London Mathematical Society Lecture Note Series, Vol. 229, Cambridge: Cambridge University Press, 1996
[EH00] Eckmann, J.-P. and Hairer, M.: Non-Equilibrium Statistical Mechanics of Strongly Anharmonic Chains of Oscillators. Commun. Math. Phys. 212, 105–164 (2000)
[EMS00] E, W., Mattingly, J. C., and Sinai, Y. G.: Gibbsian Dynamics and Ergodicity for the Stochastically Forced Navier–Stokes Equation. Preprint, 2001
[EPR99a] Eckmann, J.-P., Pillet, C.-A., and Rey-Bellet, L.: Non-Equilibrium Statistical Mechanics of Anharmonic Chains Coupled to Two Heat Baths at Different Temperatures. Commun. Math. Phys. 201, 657–697 (1999)
[EPR99b] Eckmann, J.-P., Pillet, C.-A., and Rey-Bellet, L.: Entropy Production in Non-Linear, Thermally Driven Hamiltonian Systems. J. Stat. Phys. 95, 305–331 (1999)
[FM95] Flandoli, F. and Maslowski, B.: Ergodicity of the 2-D Navier–Stokes Equation Under Random Perturbations. Commun. Math. Phys. 172, 119–141 (1995)
[Hör67] Hörmander, L.: Hypoelliptic Second Order Differential Equations. Acta Math. 119, 147–171 (1967)
[Hör85] Hörmander, L.: The Analysis of Linear Partial Differential Operators 1–4. New York: Springer, 1985
[KS00] Kuksin, S. B. and Shirikyan, A.: Stochastic Dissipative PDE's and Gibbs Measures. Preprint, 2000
[Lun95] Lunardi, A.: Analytic Semigroups and Optimal Regularity in Parabolic Problems. Basel: Birkhäuser, 1995
[Mal78] Malliavin, P.: Stochastic Calculus of Variation and Hypoelliptic Operators. In: Proc. Intern. Symp. SDE (Kyoto), New York: Wiley, 1978
[MS98] Maslowski, B. and Seidler, J.: Invariant Measures for Nonlinear SPDE's: Uniqueness and Stability. Archivum Math. (Brno) 34, 153–172 (1998)
[MS95] Möller, H. M. and Stetter, H. J.: Multivariate Polynomial Equations with Multiple Zeros Solved by Matrix Eigenproblems. Numerische Mathematik 70, 311–329 (1995)
[Nor86] Norris, J.: Simplified Malliavin Calculus. Lecture Notes in Mathematics Vol. 1204, Berlin–Heidelberg–New York: Springer, 1986, pp. 101–130
[Str86] Stroock, D.: Some Applications of Stochastic Calculus to Partial Differential Equations. Lecture Notes in Mathematics Vol. 976, Berlin–Heidelberg–New York: Springer, 1986, pp. 267–382

Communicated by A. Kupiainen
Commun. Math. Phys. 219, 567 – 605 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Functorial QFT, Gauge Anomalies and the Dirac Determinant Bundle

Jouko Mickelsson (1), Simon Scott (2)

(1) Department of Theoretical Physics, Royal Institute of Technology, 10044 Stockholm, Sweden. E-mail: [email protected]
(2) Department of Mathematics, King's College, London WC2R 2LS, UK. E-mail: [email protected]

Received: 27 August 1999 / Accepted: 17 December 2000
Abstract: Using properties of the determinant line bundle for a family of elliptic boundary value problems, we explain how the Fock space functor defines an axiomatic quantum field theory which formally models the Fermionic path integral. The “sewing axiom” of the theory arises as an algebraic pasting law for the determinant of the Dirac operator. We show how representations of the boundary gauge group fit into this description and that this leads to a Fock functor description of certain gauge anomalies.

Contents
1. Introduction
   1.1 Axiomatic QFT
   1.2 Heuristic path integral formulae
2. Determinant Line Bundles and Fock Spaces
3. Construction of the FQFT
   3.1 Strategy
   3.2 Projective representations of categories
   3.3 The category Cd
   3.4 The category CGr
   3.5 The projective functor Cd → CGr
   3.6 The functor CGr → Cvect
   3.7 The Fock Functor
4. Gauge Anomalies and the Fock Functor
   4.1 Commutator anomaly on the boundary
   4.2 Chiral anomaly in the bulk
   4.3 Relation of the chiral anomaly to the commutator anomaly
   4.4 Summary
5. Path Integral Formulae and a 0+1-Dimensional Example
   5.1 Path integral formulae
   5.2 A (0 + 1)-dimensional example
   5.3 Relation to the Berezin integral
1. Introduction

Advances in the construction of topological invariants for low-dimensional manifolds using methods from gauge theory have led to a great deal of interest in the construction of quantum field theories as modified cohomology theories [1, 22, 23, 25, 26]; that is, as generalized functors from manifolds to vector spaces. The purpose of this paper is to explain a construction of a functorial quantum field theory (FQFT) using the Fock functor, generalizing a construction suggested by Segal [22] in (1+1)-dimensions. This may be of particular interest in view of recent developments in the theory of branes in superstring theory. In doing so, we realize the higher-dimensional gauge group representations of [14] in terms of a (d+1)-dimensional FQFT, while the gluing law of the FQFT arises as an algebraic pasting law for the determinant of a Dirac operator with respect to a partition of the underlying manifold.

The aim of FQFT is to abstract the algebraic structure that the path integral would create if it existed as a rigorous mathematical object. With respect to a partition of the underlying manifold, the functoriality formally encodes gluing laws for spectral or topological invariants realized as expectation values. The prototypical situation we consider is for a family of chiral Dirac operators over an even-dimensional manifold with closed odd-dimensional boundary. The parameter space A in this case is an affine space of gauge potentials cross Riemannian metrics, on which a group G of gauge transformations acts. To a spin manifold X with boundary Y endowed with an admissible decomposition $H_Y = W \oplus W^{\perp}$ of the space of boundary spinor fields, the Fock functor associates a Fock space $F_W$ of holomorphic sections of a relative determinant line bundle over the restricted Grassmannian defined by the polarization W. Globally the functor associates a bundle F of Fock spaces to the parameter space A and the sewing properties of the determinant line bundle for a family of elliptic boundary value problems explained in [20] translate into the required functorial properties of the FQFT.

The constructions of [14] arise in this situation in terms of two “orthogonal” G-anomalies. First, there is the “bulk” even-dimensional chiral anomaly measuring the obstruction to a G-equivariant determinant regularization. Second, associated to the gauge group on the boundary one has the odd-dimensional Mickelsson–Faddeev commutator anomaly. In the FQFT context the G-action is lifted from A to a projective bundle map on the Fock bundle, rather than an automorphism of the whole (fixed) space of sections of the determinant line bundle [14]. Since FQFT formally encodes the relation between the path integral and Hamiltonian approaches to second quantization, the Fock functor FQFT we consider provides a coherent framework in which to describe simultaneously the path integral (determinant line) description of gauge anomalies and their Hamiltonian (Fock space) realization. We hope this may serve to clarify some of the underlying mathematical structures of the QFT.

In the remainder of the Introduction we recall from [1, 23, 26] the axiomatic characterization of a QFT and the heuristic path integral formulae this aims to encode. In Sect. 2 we explain some facts about determinant bundles and Fock spaces associated to families of elliptic boundary value problems. In Sect. 3 we define the Fock space functor in general and outline its fundamental properties. In Sect. 4 we apply this to the interactive Yang–Mills gauge theory associated to A and discuss the boundary gauge group action and the chiral and commutator anomalies. In Sect. 5 we outline the path integral formulae for an elliptic boundary value problem which the Fock functor aims to model, and present a concrete (0+1)-dimensional example which relates our constructions to the finite-dimensional Fermionic (Berezin) integral.

1.1. Axiomatic QFT. A (d+1)-dimensional FQFT means a functor from the category $C_d$ of d-dimensional closed manifolds and cobordisms to the category of vector spaces and linear maps, which satisfies certain natural axioms suggested by path integral formulae. A morphism in $C_d$ between d-dimensional manifolds $Y_0, Y_1$ is a (d+1)-dimensional manifold X with boundary $\partial X = \overline{Y}_0 \sqcup Y_1$. The orientation on the “incoming” boundary $Y_0$ is assumed to be induced by the orientation of X and the inward directed normal vector field on the boundary, whereas for the “outgoing” boundary $Y_1$ the orientation is fixed by the outward directed normal vector field. Let $C_{vect}$ denote the category whose objects are topological vector spaces and whose morphisms are homomorphisms. A (d+1)-dimensional FQFT means a functor $Z : C_d \to C_{vect}$ assigning to each d-dimensional manifold Y a vector space Z(Y) and to each cobordism X a vector $Z_X \in Z(\partial X)$. By fiat $Z(\emptyset) = \mathbf{C}$, so that if X is closed then $Z_X$ is a complex number. Z is required to satisfy the following axioms. For (d+1)-dimensional manifolds $X, X_0, X_1$ and d-dimensional manifolds $Y, Y_0, Y_1$:
A1. Multiplicativity: $Z_{X_0 \sqcup X_1} = Z_{X_0} \otimes Z_{X_1}$, $\quad Z(Y_0 \sqcup Y_1) = Z(Y_0) \otimes Z(Y_1)$.
A2. Duality: If $\overline{Y}$ denotes Y with reversed orientation then $Z(\overline{Y}) = Z(Y)^*$.
A3. Associativity: If $M = X_0 \cup_Y X_1$ with $\partial X_0 = \overline{Y}_0 \sqcup Y$ and $\partial X_1 = \overline{Y} \sqcup Y_1$, then $Z_M = Z_{X_1} \circ Z_{X_0}$.
A4. Hermitian: $Z_{\overline{X}} = \overline{Z_X}$.

The associativity property refers to the fact that Axioms A1 and A2 mean that a cobordism $X \in C_d$ induces a linear transformation $Z_X \in C_{vect}$ through the identifications
$$Z_X \in Z(\overline{Y}_0 \sqcup Y_1) = Z(\overline{Y}_0) \otimes Z(Y_1) = Z(Y_0)^* \otimes Z(Y_1) = \operatorname{Hom}(Z(Y_0), Z(Y_1)).$$
Thus morphisms in $C_d$ are taken to morphisms in $C_{vect}$. In particular, we have then a canonical pairing
$$Z(\overline{Y}_0) \otimes Z(Y) \otimes Z(\overline{Y}) \otimes Z(Y_1) \longrightarrow Z(\overline{Y}_0) \otimes Z(Y_1).$$
In the case when $Y_0 = Y_1 = \emptyset$, so M is a closed manifold, this becomes a pairing
$$(\ ,\ ) : Z(Y) \otimes Z(\overline{Y}) \longrightarrow \mathbf{C}, \tag{1.1}$$
and A3 implies the sewing property
$$Z_M = (Z_{X_0}, Z_{X_1}). \tag{1.2}$$
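In finite dimensions the sewing property is simply the evaluation of a covector on a vector. The following toy computation is a sketch under the assumption that $Z(Y) = V$ is finite-dimensional; the basis $e_i$, the dual basis $e^j$ and the coefficients $a_i, b_j$ are illustrative names introduced here.

```latex
% Assumption: Z(Y) = V with basis e_1, ..., e_n, and Z(\bar{Y}) = V^* (Axiom A2)
% with dual basis e^1, ..., e^n. The two halves of M = X_0 \cup_Y X_1 give
Z_{X_0} = \sum_{i=1}^{n} a_i\,e_i \in V,
\qquad
Z_{X_1} = \sum_{j=1}^{n} b_j\,e^j \in V^{*}.

% The pairing (1.1) is evaluation, e^j(e_i) = \delta_{ij}, so (1.2) becomes
Z_M = (Z_{X_0}, Z_{X_1}) = \sum_{i,j} a_i\,b_j\,e^j(e_i) = \sum_{i=1}^{n} a_i\,b_i.
```

The complex number $Z_M$ is thus recovered from data attached to the two pieces, without reference to M as a whole.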
This is perhaps the most striking feature of a FQFT: it states that by partitioning the manifold M into “simpler” codimension 0 submanifolds, the number $Z_M$ can be computed by evaluating over the submanifolds and then sewing together the results via the bilinear pairing. The bilinear pairing further implies that if $\partial X = \overline{Y} \sqcup Y$ and $f : Y \to Y$ is an orientation reversing diffeomorphism, then
$$\operatorname{Tr}(Z_X(f)) = Z_{X_f}. \tag{1.3}$$
Here $X_f$ is the closed manifold obtained by identifying the boundary components via f, and $Z_X(f) \in \operatorname{End}(Z(Y))$ is induced by functoriality (and in (1.3) is implicitly assumed to be trace class). The Hermitian Axiom A4 applies to the case of a unitary FQFT. This means there is a non-degenerate Hermitian structure $\langle\ ,\ \rangle : Z(Y) \otimes \overline{Z(Y)} \to \mathbf{C}$, and hence a canonical isomorphism $Z(Y)^* \equiv \overline{Z(Y)}$. A4 is the corresponding expected behaviour of $Z_X$. These axioms are idealized, and in practice some modifications are needed. This is illustrated in the FQFT we consider in Sect. 3.

1.2. Heuristic path integral formulae. The above framework aims to algebraicize the relation between the Feynman path integral formulation of QFT and its Hilbert space formulation. The following heuristic interpretation is useful to bear in mind. If X has connected boundary, the vector $Z_X$ represents the partition function, which is given by a formal path integral
$$Z_X : E(Y) \longrightarrow \mathbf{C}, \qquad Z_X(f) = \int_{E_f(X)} e^{-S(\psi)}\,D\psi, \tag{1.4}$$
where $D\psi$ is a formal measure. Here $S : E_f(X) \to \mathbf{C}$ is an action functional on a space of fields on X, which for definiteness we shall take to be the space of $C^0$ functions on X, with boundary value $f \in E(Y)$. The vector space Z(Y) is a space of functions on E(Y) and forms the Hilbert space of the theory, and $Z_X$ is the vacuum state. To a cobordism $X \in C_d$ with $\partial X = \overline{Y}_0 \sqcup Y_1$ one has $f = (f_0, f_1)$, and then
$$Z_X(f_0, f_1) = \int_{E_{(f_0,f_1)}(X)} e^{-S(\psi)}\,D\psi$$
is the kernel of the linear operator $Z_X \in \operatorname{Hom}(Z(Y_0), Z(Y_1))$ defined by
$$Z_X(\xi_0)(f_1) = \int_{E(Y_0)} Z_X(f_0, f_1)\,\xi_0(f_0)\,Df_0.$$
If $Y_0 = Y_1 = Y$ we hence obtain the bilinear form corresponding to (1.1):
$$(\xi_0, \xi_1) = \int_{E(Y)} \xi_1(f)\,Z_X(\xi_0)(f)\,Df.$$
In the case of a closed manifold $M = X_0 \cup_Y X_1$ we can express the space of $C^0$ functions on M as a fibre product $E(M) = E(X_0) \times_{E(Y)} E(X_1)$ and so formally one expects an equality
$$\int_{E(M)} e^{-S(\psi)}\,D\psi = \int_{E(Y)} Df \int_{E_f(X_0)} e^{-S(\psi_0)}\,D\psi_0 \int_{E_f(X_1)} e^{-S(\psi_1)}\,D\psi_1 = \int_{E(Y)} Z_{X_0}(f)\,Z_{X_1}(f)\,Df, \tag{1.5}$$
which is the path integral version of the algebraic sewing formula (1.2). The Hamiltonian of the theory is defined by the Euclidean time evolution operator $e^{-tH} = Z_{Y \times [0,t]} \in \mathrm{End}(Z(Y))$, and to compute the trace one has the integral formulae
\[ \mathrm{Tr}(e^{-tH}) = \int_{\mathcal{E}(Y)} \mathrm{Tr}\, Z_X(f, f)\, Df, \]
and, corresponding to (1.3),
\[ \mathrm{Tr}(e^{-tH(f)}) = \int_{\mathcal{E}(X_f)} e^{-S(\psi)}\, D\psi. \tag{1.6} \]
The sewing formula (1.5) says that the partition function on $M$ is the vacuum-vacuum expectation value calculated from the partition functions on the two halves. Equivalently, the invariant $Z_M$ is obtained from $Z_{X_0}(f)$ and $Z_{X_1}(f)$ by integrating (“averaging”) away the choice of boundary data $f$. In the case of determinants of Dirac operators this formalism provides some insight into sewing formulae relative to a partition of the underlying manifold (see Sect. 5). First, we need to review some facts about determinants and Fock bundles for families of Dirac operators.
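The sewing formula (1.5) can be checked exactly in a finite toy model, which may help fix ideas. The sketch below is not part of the construction in this paper: it assumes a one-dimensional chain of sites in place of $M$, a single cut site in place of $Y$, a finite set of field values, and a nearest-neighbour action (all hypothetical choices), so that the formal integrals become finite sums.

```python
import itertools
import math

VALUES = (-1, 0, 1)  # finite set of field values (assumption of the toy model)

def action(field):
    """Nearest-neighbour action on a chain: sum over edges of (psi_i - psi_{i+1})^2 / 2."""
    return sum((a - b) ** 2 / 2 for a, b in zip(field, field[1:]))

def partition_function(n_sites, fixed=None):
    """Sum of exp(-S) over all fields on a chain with n_sites sites.

    `fixed` maps a site index to a prescribed value (the boundary data f).
    """
    fixed = fixed or {}
    total = 0.0
    for field in itertools.product(VALUES, repeat=n_sites):
        if all(field[i] == v for i, v in fixed.items()):
            total += math.exp(-action(field))
    return total

def sewn_partition_function(n_left, n_right):
    """Discrete analogue of (1.5): sum over f of Z_X0(f) * Z_X1(f).

    X0 has n_left sites and its last site lies on the cut Y;
    X1 has n_right sites and its first site lies on the cut Y.
    """
    return sum(
        partition_function(n_left, {n_left - 1: f})
        * partition_function(n_right, {0: f})
        for f in VALUES
    )

# The glued chain M has n_left + n_right - 1 sites (the cut site is shared).
z_whole = partition_function(5)
z_sewn = sewn_partition_function(3, 3)
print(math.isclose(z_whole, z_sewn))
```

The identity holds here because the action is a sum over edges and every edge of the glued chain belongs to exactly one of the two halves; this is the discrete counterpart of the fibre product description of $\mathcal{E}(M)$.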
2. Determinant Line Bundles and Fock Spaces

The determinant of a family of first-order elliptic operators arises canonically not as a function, but as a section of a complex line bundle called the determinant line bundle. The anomalies we shall discuss may be realized as obstructions to constructing appropriate trivializations of that bundle. Equivalently, we can view the determinant line of an operator as a ray in the associated Fock space (via the Plücker embedding), and globally the determinant bundle as a rank 1 subbundle of an infinite-dimensional Fock bundle to which the gauge group lifts as a projective bundle map.

First, recall the construction of the determinant line bundle for a family of Dirac-type operators over a closed compact manifold $X$. Such a family can be specified by a smooth fibration of manifolds $\pi : M \to B$ with fibre diffeomorphic to $X$, endowed with a Riemannian metric $g_{M/B}$ along the fibres and a vertical bundle of Clifford modules $S(M/B)$ which we may identify with the vertical spinor bundle tensored with an external vertical gauge bundle $\xi$. We assume that $\xi$ is endowed with a Hermitian structure with compatible connection. The manifold $B$ is not required to be compact. We refer to this data as a geometric fibration. Associated to a geometric fibration one has a smooth elliptic family of Dirac operators $D = \{D_b \mid b \in B\} : \mathcal{H} \to \mathcal{H}$, where $\mathcal{H} = \pi_*(S(M/B))$ is the infinite-dimensional Hermitian vector bundle on $B$ whose fibre at $b$ is the Fréchet space of smooth sections $\mathcal{H}_b = C^\infty(M_b, S_b)$, and $S_b$ a bundle of Clifford modules. If $X$ is even-dimensional there is a $\mathbb{Z}_2$ bundle grading $\mathcal{H} = \mathcal{H}^+ \oplus \mathcal{H}^-$ into positive and negative chirality fields and we then have a family of chiral Dirac operators $D : \mathcal{H}^+ \to \mathcal{H}^-$. The Quillen determinant line bundle $\mathrm{DET}(D)$ is a complex line bundle over $B$ with fibre at $b$ canonically isomorphic to the complex line $\mathrm{Det}(\mathrm{Ker}\, D_b)^* \otimes \mathrm{Det}\, \mathrm{Coker}(D_b)$ [5, 18], where for a finite-dimensional vector space $V$, $\mathrm{Det}\, V$ is the complex line $\wedge^{\max} V$.
The bundle structure is defined relative to the covering of B by open subsets Uλ , with λ ∈ R+ , parameterising those operators Db for which λ is not in the spectrum of the Laplacian
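$D_b^* D_b$. Over each $U_\lambda$ are smooth finite-rank vector bundles $H_\lambda^+$, $H_\lambda^-$ equal to the sum of eigenspaces of $D_b^* D_b$ (resp. of $D_b D_b^*$) for eigenvalues less than $\lambda$, and one defines
\[ \mathrm{DET}(D)|_{U_\lambda} = \mathrm{Det}(H_\lambda^+)^* \otimes \mathrm{Det}\, H_\lambda^- . \tag{2.1} \]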
The locally defined line bundles patch together naturally over the overlaps $U_\lambda \cap U_{\lambda'}$ [4]. This “spectral” construction of the determinant line bundle is designed to allow one to define the Quillen $\zeta$-function metric and a compatible connection whose curvature $R^\zeta$ is identified with the 2-form component of the Bismut family's index density:
\[ R^\zeta = \left[ (2\pi i)^{-n/2} \int_{M/B} \hat{A}(M/B)\, \mathrm{ch}(\xi) \right]_{(2)}, \tag{2.2} \]
where $\hat{A}(M/B)$ is the vertical A-hat form and $\mathrm{ch}(\xi)$ the Chern character, see [18, 5, 4].

There is, however, a natural alternative construction of the determinant line bundle, due to Segal [24, 17], and applied to Dirac families in [20], which allows us to consider more general smooth families of Fredholm operators, which need not be elliptic operators. Let $\alpha : H^0 \to H^1$ be a Fredholm operator of index zero. Then a point of the complex line over $\alpha$ is an equivalence class $[A, \lambda]$ of pairs $(A, \lambda)$, where the invertible operator $A : H^0 \to H^1$ is such that $A^{-1}\alpha \in \mathrm{End}(H^0)$ is of the form identity plus trace-class, $\lambda \in \mathbb{C}$, and the equivalence relation is defined by $(Aq, \lambda) \sim (A, \det_F(q)\lambda)$ for $q \in \mathrm{End}(H^0)$ an operator of the form identity plus trace-class, where $\det_F$ denotes the Fredholm determinant. If $\mathrm{ind}\, \alpha = d$ we define $\mathrm{Det}(\alpha) := \mathrm{Det}(\alpha \oplus 0)$ with $\alpha \oplus 0$ acting $H^0 \to H^1 \oplus \mathbb{C}^d$ if $d > 0$, or $H^0 \oplus \mathbb{C}^{-d} \to H^1$ if $d < 0$. Note that, by definition, a Fredholm operator of index zero has an approximation by an invertible operator $A$ such that $A - \alpha$ has finite rank. We work with the larger ideal of trace-class operators in order to be able to use the (complete) topology determined by the trace norm. With the above notation, the abstract determinant of $\alpha$ is defined to be the canonical element $\det \alpha := [A, \det_F(A^{-1}\alpha)] \in \mathrm{Det}(\alpha)$.

For an admissible smooth family of Fredholm operators $\mathcal{A} = \{\alpha_b \mid b \in B\} : \mathcal{H}^0 \to \mathcal{H}^1$ acting between (weak) vector bundles $\mathcal{H}^i$ [20, 24], the union $\mathrm{DET}(\mathcal{A})$ of the determinant lines is naturally a complex line bundle. The bundle structure is defined relative to a denumerable covering of open sets $U_\tau$, where $\tau : \mathcal{H}^0 \to \mathcal{H}^1$ is finite-rank and $U_\tau$ parameterizes those $b$ for which $A_b = \alpha_b + \tau_b$ is invertible, via the local trivialization $b \mapsto \det(\alpha_b + \tau_b)$ over $U_\tau$. On the intersection $U_{\tau^1} \cap U_{\tau^2}$ the transition function is $b \mapsto \det_F\bigl((\alpha_b + \tau_b^1)^{-1}(\alpha_b + \tau_b^2)\bigr)$.
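The equivalence relation defining the Segal determinant line can be tested in finite dimensions, where every operator is trace-class and $\det_F$ is the ordinary determinant. The following sketch is only an illustration under that assumption; the helper names (`trivialize`, `canonical_element`) and the sample matrices are hypothetical. It checks that the class $[A, \det(A^{-1}\alpha)]$ does not depend on the choice of invertible $A$, and that changing $A$ rescales $\lambda$ by the transition function $\det(A_1^{-1}A_2)$.

```python
from fractions import Fraction

def det(m):
    """Determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in m]
    n, d = len(m), Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if m[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return d

def matmul(a, b):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse(m):
    """Inverse of an invertible matrix by Gauss-Jordan elimination."""
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def trivialize(A, lam):
    """Send the pair (A, lam) to lam * det(A); constant on equivalence classes."""
    return lam * det(A)

def canonical_element(A, alpha):
    """Representative (A, det(A^{-1} alpha)) of the abstract determinant of alpha."""
    return A, det(matmul(inverse(A), alpha))

alpha = [[2, 1], [1, 3]]
A1 = [[3, 1], [1, 3]]   # alpha plus a rank-one perturbation
A2 = [[2, 1], [1, 5]]   # alpha plus a different rank-one perturbation
print(trivialize(*canonical_element(A1, alpha)) == trivialize(*canonical_element(A2, alpha)))
```

In this finite-dimensional setting the map $[A, \lambda] \mapsto \lambda \det A$ trivializes the line, and the abstract determinant is sent to the usual $\det \alpha$; in infinite dimensions no such global trivialization is available, which is the point of working with the line itself.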
For a family of elliptic operators, such as D, there is a canonical isomorphism between the two constructions of the determinant bundle described above which preserves the determinant section b → detDb , and we may therefore use them interchangeably [20]. This is important when we consider the determinant line bundle for a family of elliptic boundary value problems (EBVPs). To define such a family we proceed initially as for the case of a closed manifold with a geometric fibration π : M −→ B of connected manifolds, but with fibre diffeomorphic to a compact connected manifold X with boundary ∂X = Y . Note that the boundary manifolds ∂M and ∂X may possibly be disconnected. Globally we obtain as before a family of Dirac operators D = {Db | b ∈ B} : H −→ H. We assume that the geometry in a neighbourhood, U ≡ ∂M×[0, 1] of the boundary is a pull-back of the geometry induced on the boundary geometric fibration of closed boundary manifolds ∂π : ∂M −→ B. This means that all metrics and connections on T (M/B) and S(M/B) restricted to U are geometric products composed of the trivial geometry in the normal u-coordinate direction, and the boundary geometry in tangential
directions, so $g_{M/B} = du^2 + g_{\partial M/B}$ and so forth. In $U_b := U \equiv \partial X_b \times [0, 1]$ the Dirac operator $D_b$ then has the form
\[ D_{b|U} = G_b \left( \frac{\partial}{\partial u} + D_{Y,b} \right), \tag{2.3} \]
where $D_{Y,b}$ is a boundary Dirac operator and $G_b$ is a unitary bundle automorphism. The family of boundary Dirac operators $D_Y = \{D_{Y,b} \mid b \in B\} : \mathcal{H}_Y \to \mathcal{H}_Y$, where $\mathcal{H}_Y$ is the bundle with fibre $C^\infty(Y_b, S_{Y_b})$ at $b \in B$, is identified with the family defined by the fibration $\partial\pi : \partial M \to B$.

In contrast to the closed manifold case, the operators $D_b$ are not Fredholm. The crucial analytical property underlying the following determinant line and Fock bundle identifications is the existence of a canonical identification between the infinite-dimensional space $\mathrm{Ker}(D_b)$ of solutions to the Dirac operator and the boundary traces $K(D_b) = \gamma\, \mathrm{Ker}(D_b)$, where $\gamma : C^\infty(X_b, S_b) \to C^\infty(Y_b, S_{Y_b})$ is the operator restricting sections to the boundary. More precisely, the Poisson operator $K_b : C^\infty(Y_b, S_{Y_b}) \to C^\infty(X_b, S_b)$ restricts to define the above isomorphism. It extends to a continuous operator $K_b : H^{s-1/2}(Y_b; S|_{Y_b}) \to H^s(X_b; S)$ on the Sobolev completions with range $\mathrm{Ker}(D_b, s) = \{f \in H^s(X_b; S) \mid D_b f = 0 \text{ in } X_b \setminus Y_b\}$, and the restriction
\[ K_b^s : K(D_b, s) \to \mathrm{Ker}(D_b, s) \tag{2.4} \]
is an isomorphism (see [6, 11]). The Poisson operator of $D_b$ defines the Calderón projection:
\[ P(D_b) = \gamma K_b^1 . \tag{2.5} \]
$P(D_b)$ is a pseudodifferential projection on $L^2(Y_b, S_{Y_b})$ which we can take to be orthogonal with range equal to $\mathrm{Ker}(D, s)$. The construction depends smoothly on the parameter $b \in B$ and so globally we obtain a smooth map $P(D) : B \to \mathrm{End}(\mathcal{H}_Y)$ defining, equivalently, a smooth Fréchet subbundle $K(D)$ of $\mathcal{H}_Y$ with fibre $K(D_b) = \mathrm{ran}(P(D_b))$. Because of the tubular boundary geometry, $P(D_b)$ in fact differs from the APS spectral projection $\Pi_b$ by only a smoothing operator [19, 11]. Recall that there is a polarization $H_{Y_b} = H_b^+ \oplus H_b^-$ into the non-negative and negative energy modes of the elliptic selfadjoint boundary Dirac operator $D_{Y,b}$, and $\Pi_b$ is defined to be the orthogonal projection onto $H_b^+$. Hence $P(D_b)$ is certainly an element of the pseudodifferential Grassmannian $\mathrm{Gr}_b = \mathrm{Gr}(H_{Y_b})$ parameterizing projections on $H_{Y,b}$ which differ from $\Pi_b$ by a pseudodifferential operator of order $< -\dim M/2$, where by projection we mean selfadjoint idempotent. $\mathrm{Gr}_b$ is a dense submanifold of the Hilbert–Schmidt Grassmannian. Associated to $P \in \mathrm{Gr}_b$ we have the elliptic boundary value problem (EBVP) for $D_b$,
\[ D_{P,b} = D_b : \mathrm{dom}(D_{P,b}) \to L^2(X_b; S^1) \tag{2.6} \]
with domain $\mathrm{dom}\, D_{P,b} = \{s \in H^1(X_b; S_b^0) \mid P(s|_{Y_b}) = 0\}$. The operator $D_{P,b}$ is Fredholm with kernel and cokernel consisting of smooth sections; see [6] for a general account of EBVPs in index theory. The smooth family of EBVPs $D_{\mathrm{Gr}_b} := \{D_{P,b} \mid P \in \mathrm{Gr}_b\}$ defines a smooth admissible family of Fredholm operators and hence an associated
determinant line bundle $\mathrm{DET}(D_{\mathrm{Gr}_b}) \to \mathrm{Gr}_b$. On the other hand, for each choice of a basepoint $P_0 \in \mathrm{Gr}_b$ we have the smooth family of Fredholm operators $\{P_{W_0,W} := P \circ P_0 : W_0 \to W : P \in \mathrm{Gr}_b\}$, where $\mathrm{ran}(P_0) = W_0$, $\mathrm{ran}(P) = W$, and hence a relative (Segal) determinant line bundle $\mathrm{DET}_{W_0} \to \mathrm{Gr}_b$ based at $W_0$. The bundles so defined for different choices of basepoint are all isomorphic, but not quite canonically. More precisely, from [22, 20], given $P_0, P_1 \in \mathrm{Gr}_b$ there is a canonical line bundle isomorphism
\[ \mathrm{DET}_{W_0} \cong \mathrm{DET}_{W_1} \otimes \mathrm{DET}(W_0, W_1), \tag{2.7} \]
where $\mathrm{DET}(W_0, W_1)$ means the trivial line bundle with fibre the relative determinant line $\mathrm{DET}(W_0, W_1) := \mathrm{Det}(P_{W_0,W_1})$. The identification (2.4) defines a canonical line bundle isomorphism
\[ \mathrm{DET}(D_{\mathrm{Gr}_b}) \cong \mathrm{DET}_{K_b}, \tag{2.8} \]
preserving the determinant sections $\det(D_{\mathrm{Gr}_b}) \leftrightarrow \det(S_b(P))$, where $S_b(P) := P\,P(D_b) : K(D_b) \to \mathrm{ran}(P)$ (see [20]).

The translation of these facts into global statements follows by observing that the restricted Grassmannians $\mathrm{Gr}_b$, defined for each $b \in B$, fit together to define a fibration $\mathrm{Gr}_Y \to B$. A spectral section (or Grassmann section [20]) $P = \{P_b : b \in B\}$ for the family $D$ is defined to be a smooth section of $\mathrm{Gr}_Y$, and we denote the space of such sections by $\mathrm{Gr}(M/B)$. By cobordism, such sections always exist. In particular, the family of Dirac operators $D$ defines canonically the Calderón section $P(D) \in \mathrm{Gr}(M/B)$. In this sense one may think of the parameter space $B$ as a generalized Grassmannian (parameterizing the subspaces $K(D_b)$), and the usual Grassmannian as a “universal moduli space”. Notice, however, that the map $b \mapsto \Pi_b$ is generically not a smooth spectral section because of the flow of eigenvalues of the boundary family. Indeed, it is this elementary fact that is the source of gauge anomalies, see [15] and Sect. 4.

A spectral section has a number of consequences for determinants:

First. We obtain a smooth family of EBVPs $(D, P) = \{D_{P,b} := (D_b)_{P_b} \mid b \in B\}$ which has an associated determinant line bundle $\mathrm{DET}(D, P) \to B$ with determinant section $b \mapsto \det D_{P,b}$.

Second. A spectral section $P$ defines a smooth infinite-dimensional vector bundle $W$ with fibre $W_b = \mathrm{ran}(P_b)$, and associated to $P$ we have the smooth family of Fredholm operators $S(P) : K(D) \to W$, parameterizing the operators $S(P_b) := P_{K(D_b),W_b} : K(D_b) \to W_b$. This also has a determinant line bundle $\mathrm{DET}(S(P))$, and corresponding to (2.8), there is a canonical line bundle isomorphism
\[ \mathrm{DET}(D, P) \cong \mathrm{DET}(S(P)), \qquad \det(D_{P,b}) \leftrightarrow \det(S(P_b)), \tag{2.9} \]
preserving the determinant sections. Given a pair of sections $P_1, P_2 \in \mathrm{Gr}(M/B)$ there is the smooth family of admissible Fredholm operators $(P_1, P_2) : W^1 \to W^2$, and corresponding to (2.7) and (2.9) one finds a canonical isomorphism
\[ \mathrm{DET}(D, P_1) \cong \mathrm{DET}(D, P_2) \otimes \mathrm{DET}(P_2, P_1), \tag{2.10} \]
which does not preserve the determinant sections [20].

Third. We obtain a bundle of Fock spaces $\mathcal{F}_P$ over $B$. To see this, return for a moment to the case of a single operator and its Grassmannian $\mathrm{Gr}_b$. By choosing a basepoint $P_0 \in \mathrm{Gr}_b$, we obtain the determinant line bundle $\mathrm{DET}_{W_0} \to \mathrm{Gr}_b$. This is a holomorphic line bundle, but has no global holomorphic sections. The dual bundle $\mathrm{DET}^*_{W_0}$, on the other hand, has an infinite-dimensional space of holomorphic sections, and this, by definition, is the Fock space based at $W_0$:
\[ \mathcal{F}_{W_0,b} := \Gamma_{\mathrm{hol}}(\mathrm{Gr}_b; \mathrm{DET}^*_{W_0}). \tag{2.11} \]
This is the natural quantization defined by geometric (Kähler) quantization. Actually, a Fock space comes together with a vacuum vector and a representation of the canonical anticommutation relations; we shall return to this at the end of the section. Taking the union $\mathcal{F}_b := \cup_{W \in \mathrm{Gr}_b} \mathcal{F}_{W,b}$ we obtain the Fock bundle over $\mathrm{Gr}_b$. This bundle is topologically completely determined by “the” determinant bundle $\mathrm{DET}_{W_0}$; in fact this is the most direct way to define the bundle structure on $\mathcal{F}_b$. To be precise, if we change the basepoint we find, dropping the $b$ subscript, a canonical isomorphism
\[ \mathcal{F}_{W_1} = \Gamma_{\mathrm{hol}}(\mathrm{Gr}; \mathrm{DET}^*_{W_1}) \cong \Gamma_{\mathrm{hol}}(\mathrm{Gr}; \mathrm{DET}^*_{W_0} \otimes \mathrm{DET}(W_1, W_0)^*) \cong \Gamma_{\mathrm{hol}}(\mathrm{Gr}; \mathrm{DET}^*_{W_0}) \otimes \mathrm{DET}(W_1, W_0)^* \cong \mathcal{F}_{W_0} \otimes \mathrm{DET}(W_0, W_1), \]
where we use (2.7). Hence relative to a basepoint $W_0 \in \mathrm{Gr}_b$ we have a canonical isomorphism
\[ \mathcal{F}_b \cong \mathcal{F}_{W_0} \otimes \mathrm{DET}_{W_0}, \tag{2.12} \]
where the first factor on the right-side is the trivial bundle with fibre FW0 . Hence the topological type of the Fock bundle Fb is determined by that of the determinant line bundle DETW0 for any basepoint W0 . One moves between the isomorphisms for different basepoints via (2.7). As an abstract vector bundle, a Fock bundle is always trivial (but not necessarily canonically); this is because of the fact that (according to Kuiper’s theorem) the unitary group in an infinite-dimensional Hilbert space is contractible. However, as already mentioned, the Fock spaces are equipped with additional structure, the vacuum vectors related to a choice of a family of (Dirac) Hamiltonians, which will modify this statement. In the case of (2.12) we have a preferred line bundle (the “vacuum bundle”) inside of the Hilbert bundle and the structure group is reduced giving a nontrivial Fock bundle; this will be discussed in more detail in Sect. 4. Now as we let b vary we obtain a vertical Fock bundle F(M/B) over the total space GrY of the Grassmann fibration, which restricted to the fibre Grb of GrY coincides with Fb . The bundle structure is obvious from the local triviality of the fibration GrY → B.
A spectral section $P$ is a smooth cross section of that fibration, and hence by pull-back we get a Fock bundle over $B$ associated to $P$:
\[ \mathcal{F}_P := P^*(\mathcal{F}(M/B)) \to B, \tag{2.13} \]
with fibre $\mathcal{F}_{P_b} = \Gamma_{\mathrm{hol}}(\mathrm{Gr}_b; \mathrm{DET}^*_{W_b})$, $W_b = \mathrm{ran}(P_b)$, at $b \in B$. In the following we may at times also write $\mathcal{F}_P = \mathcal{F}_W$, where $W \to B$ is the bundle associated to $P$. Moreover, from the equivalences above we get that the various Fock bundles are related in the following way.

Proposition 2.1. For spectral sections $P_1, P_2 \in \mathrm{Gr}(M/B)$, there is a canonical isomorphism of Fock bundles
\[ \mathcal{F}_{P_1} \cong \mathcal{F}_{P_2} \otimes \mathrm{DET}(P_2, P_1). \tag{2.14} \]
Notice, in a similar way to $\mathcal{F}_P$, we can also identify the determinant line bundle $\mathrm{DET}(P_1, P_2)$ as a pull-back bundle. For, associated to the section $P_1$ we have a vertical determinant line bundle $\mathrm{DET}_{P_1} \to \mathrm{Gr}_Y$, which restricts to $\mathrm{DET}_{W_b^1}$ over $\mathrm{Gr}_b$, where $W_b^1 = \mathrm{ran}(P_b^1)$. Then $\mathrm{DET}(P_1, P_2) = (P_2)^*(\mathrm{DET}_{P_1})$. In particular, associated to the family $D$ of Dirac operators parameterized by $B$, we have the canonical spectral section $P(D)$, and by (2.9) we have a canonical isomorphism $\mathrm{DET}(D, P) \cong P^*(\mathrm{DET}_{P(D)})$. At the Fock space level we have a Fock bundle $\mathcal{F}_D$ canonically associated to the family $D$, independently of an extrinsic choice of spectral section, whose fibre at $b \in B$ is $\mathcal{F}_{D_b} := \Gamma_{\mathrm{hol}}(\mathrm{Gr}_b; \mathrm{DET}(D_{\mathrm{Gr}_b})^*)$. From (2.8) and (2.14) we obtain the Fock space version of (2.9) and (2.10):

Proposition 2.2. There is a canonical isomorphism of Fock bundles
\[ \mathcal{F}_D \cong \mathcal{F}_{P(D)}. \tag{2.15} \]
For $P \in \mathrm{Gr}(M/B)$:
\[ \mathcal{F}_D \cong \mathcal{F}_P \otimes \mathrm{DET}(D, P). \tag{2.16} \]
Thus the topology of the Fock bundle and the determinant line bundle are intimately related. This is the topological reason relating the Schwinger terms in the Hamiltonian anomaly to the index density.

The Fock space $\mathcal{F}_{W_0}$ based at $W_0 \in \mathrm{Gr}_b$ can be thought of more concretely in terms of equivariant functions on the Stiefel frame bundle over $\mathrm{Gr}_b$. To describe this we fix an orthonormal basis of eigenvectors of $D_{Y,b}$ in $H_{Y_b}$ such that $e_i \in H_b^-$ for $i \le 0$ and $e_i \in H_b^+$ for $i > 0$. A point in the fibre of the Stiefel bundle $\mathrm{St}_b$ based at $H_b^+$ over $W$ in the index zero component of $\mathrm{Gr}_b$ is a linear isomorphism $\xi : H_b^+ \to W$ such that $\Pi_b \circ \xi : H_b^+ \to H_b^+$ has a Fredholm determinant. $\xi$ is also referred to as an “admissible basis” for $W$ (relative to $H_b^+$), in so far as it transforms $e_i$, $i > 0$, to a basis for $W$. $\xi$ can be thought of as a matrix $\begin{pmatrix} \xi_+ \\ \xi_- \end{pmatrix}$ with columns labeling the elements of the basis and rows the coordinates in the standard basis $e_i$. If $\xi, \xi'$ are two admissible bases for $W$ then $\xi' = \xi.g$, where $g$ is an element of the restricted general linear group $\mathrm{Gl}_1$ consisting of invertible linear maps $g : H_b^+ \to H_b^+$ such that $g - I$ is trace-class. $\mathrm{Gl}_1$ acts freely
on $\mathrm{St}_b \times \mathbb{C}$ by $(\xi, \lambda).g = (\xi.g, \lambda \det_F(g)^{-1})$ and we obtain the alternative construction of
\[ \mathrm{DET}_{H_b^+} = \mathrm{St}_b \times_{\mathrm{Gl}_1} \mathbb{C}. \tag{2.17} \]
Similarly, for $W \in \mathrm{Gr}_b$ we have $\mathrm{DET}_W = \mathrm{St}_W \times_{\mathrm{Gl}_1} \mathbb{C}$, where $\mathrm{St}_W$ is the corresponding frame bundle based at $W$. In particular, notice that an isomorphism $\mathrm{St}_{W_0} \to \mathrm{St}_{W_1}$ is specified by an invertible operator $A : W_0 \to W_1$ such that $P_1 P_0 - A$ is trace-class, from which we once again have the identification (2.7). In this description, an element of the Fock space $\mathcal{F}_{W_b}$ is a holomorphic function $\psi : \mathrm{St}_{W_b} \to \mathbb{C}$ transforming equivariantly under the $\mathrm{Gl}_1$ action as $\psi(\xi.g) = \psi(\xi)\det_F(g)$. The distinguished element
\[ \nu_{W_b}(\xi) = \det\nolimits_F(P_{W_b}\, \xi) \tag{2.18} \]
is the vacuum vector. Equivalently, an element of $\mathcal{F}_{W_b}$ is a holomorphic function $f : \mathrm{DET}_{W_b} \to \mathbb{C}$ which is linear on each fibre, and from this viewpoint the vacuum vector is the function
\[ \nu_{W_b}([\alpha, \lambda]) = \lambda \det\nolimits_F(P_{W_b}\, \alpha), \tag{2.19} \]
for any representative $(\alpha, \lambda)$ of the equivalence class $[\alpha, \lambda] \in \mathrm{Det}(W_b, W)$.

A generalization of this leads to the Plücker embedding. First, for $W \in \mathrm{Gr}_b$, fix an orthonormal basis $\{e_i\}_{i \in \mathbb{Z}}$ of $H_b$ such that $e_i \in W^\perp$ for $i \le 0$ and $e_i \in W$ for $i > 0$. Let $\mathcal{S}$ be the set of all increasing sequences of integers $S = (i_1, i_2, \ldots)$ with $S - \mathbb{N}$ and $\mathbb{N} - S$ finite. For each sequence $S$ we have an admissible basis $\xi(S) = \{e_{i_1}, e_{i_2}, \ldots\} \in \mathrm{St}_W$, and the Fredholm index of the operator $P_S P_W : W \to H_S$, where $H_S$ is the closed subspace spanned by $\xi(S)$ and $P_S$ the corresponding orthogonal projection, defines a bijection $\pi_0(\mathrm{Gr}_b) \to \mathbb{Z}$. The Plücker coordinates of the basis $\omega \in \mathrm{St}_W$ are the collection of complex numbers $\psi_S(\omega) = \det_F(P_S\, \omega) = \det_F(\omega_S)$, where $\omega_S$ is the matrix formed from the rows of $\omega = \begin{pmatrix} \omega_+ \\ \omega_- \end{pmatrix}$ labeled by $S$. In particular $\psi_{\mathbb{N}}(\omega)$ is the coordinate defined by the vacuum vector. If $\omega$ is a basis for $W \in \mathrm{Gr}_b$, then the Plücker coordinates of a second admissible basis $\omega_1$ differ from those of $\omega$ by the Fredholm determinant of the matrix relating the two bases. The Plücker coordinates therefore define a projective embedding $\mathrm{Gr}_b \to \mathcal{F}_W$. This is prescribed equivalently by the map
\[ \phi : \mathrm{St}_{W_b} \times \mathrm{St}_{W_b} \to \mathbb{C}, \qquad \phi(\tau, \omega) = \det\nolimits_F(\tau^* \omega) = \det\nolimits_F(\tau_+^* \omega_+ + \tau_-^* \omega_-), \tag{2.20} \]
with respect to which the Plücker coordinates are $\psi_S(\omega) = \phi(\xi(S), \omega)$. $\phi$ is the same thing as the map on the determinant bundle
\[ g_\phi : \mathrm{DET}_{W_b} \times \mathrm{DET}_{W_b} \to \mathbb{C}, \qquad g_\phi([\alpha, \lambda], [\beta, \mu]) = \bar{\lambda}\mu \det\nolimits_F(\alpha^* P_W \beta), \tag{2.21} \]
where $\alpha : W_b \to W$ and $\beta : W_b \to W$; the map $g_\phi$ is antiholomorphic and antilinear in the first variable, and holomorphic and linear in the second. We then have the Plücker embedding map
\[ \mathrm{DET}_{W_b} - \{0\} \to \mathcal{F}_{W_b} \tag{2.22} \]
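which maps $[\omega, \lambda] \mapsto \bar{\lambda}\phi(\omega, \cdot\,)$, or, using the Segal definition of the determinant line,
\[ [\alpha, \lambda] \mapsto \psi_{[\alpha,\lambda]}, \tag{2.23} \]
defined for $[\alpha, \lambda] = [P_W \alpha, \lambda] \in \mathrm{DET}(W_b, W)$ and $\xi : W_b \to W$ by
\[ \psi_{[\alpha,\lambda]}(\xi) = \bar{\lambda} \det\nolimits_F(\alpha^* \circ \xi) = \bar{\lambda} \det\nolimits_F(\alpha^* \circ P_W \circ \xi). \tag{2.24} \]
Notice that
\[ \det(\mathrm{id}_{W_b}) \mapsto \nu_{W_b}, \tag{2.25} \]
where $\mathrm{id}_{W_b} := P_{W_b,W_b}$. The map (2.22) thus defines a projective embedding $\mathrm{Gr}_b \to \mathcal{F}_W$. The map (2.21), restricted to a linear map $\mathrm{DET}_{W_b} \times \mathrm{DET}_{W_b} \to \mathbb{C}$, defines a canonical metric on $\mathrm{DET}_{W_b}$ by $\|[\alpha, \lambda]\|^2 = |\lambda|^2 \det_F(\alpha^* \alpha)$, and globally, via (2.9), we get the canonical metric of [20] on $\mathrm{Det}(D, P)$,
\[ \|\det D_{P_b}\|^2 := g_\phi(S(P_b), S(P_b)) = \det\nolimits_F(S(P_b)^* S(P_b)). \tag{2.26} \]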
On the other hand, we can use the map $\phi$ (or $g_\phi$) to put a unitary structure on $\mathcal{F}_W$ with respect to which (2.22) is an isometry. To do that we use the fact that any section in $\mathcal{F}_W$ can be written as a linear combination of the $\psi_{[\alpha,\lambda]}$, and set
\[ \langle \psi_{[\alpha,\lambda]}, \psi_{[\beta,\mu]} \rangle_W = g_\phi([\alpha, \lambda], [\beta, \mu]). \tag{2.27} \]
In particular, the finite linear combinations of the sections $\psi_S$, $S \in \mathcal{S}$, are dense in $\mathcal{F}_W$ with respect to the topology of uniform convergence on compact subsets, and one has $\langle \psi_S, \psi_{S'} \rangle = \phi(\xi(S), \xi(S')) = \delta_{SS'}$. Notice further the identities
\[ \langle \nu_W, \nu_W \rangle_W = 1 \qquad \text{and} \qquad \langle \psi_{[\alpha,\lambda]}, \psi_{[\alpha,\lambda]} \rangle_W = \|[\alpha, \lambda]\|^2, \tag{2.28} \]
the latter being the statement that (2.22) is an isometry. For further details see [17] and [14].

There is a different way of thinking about Fock spaces which is perhaps more familiar to physicists, as an infinite-dimensional exterior algebra (fermionic Fock space). Recall [14] that a polarization $W$ of the Hilbert space $H_b$ fixes a representation of the canonical anticommutation relations (CAR) in a Fock space $\mathcal{F}(H_b, W)$, whose only non-zero anticommutators are
\[ a^*(v)a(u) + a(u)a^*(v) = \langle u, v \rangle. \tag{2.29} \]
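The defining property of this irreducible representation is that there is a vacuum vector $|W\rangle$ with the property $a(u)|W\rangle = 0 = a^*(v)|W\rangle$ for all $u \in W$, $v \in W^\perp$. One has
\begin{align}
\mathcal{F}(H_b, W) &= \wedge(W) \otimes \wedge((W^\perp)^*) \tag{2.30} \\
&= \bigoplus_{d = q - p \in \mathbb{Z}} \wedge^p(W) \otimes \wedge^q((W^\perp)^*). \tag{2.31}
\end{align}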
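The anticommutation relations (2.29) and the vacuum condition can be verified mechanically in a finite-dimensional model. The sketch below assumes a three-mode Hilbert space with orthonormal basis $e_0, e_1, e_2$, a hypothetical polarization in which $W$ is spanned by $e_0, e_1$ and $W^\perp$ by $e_2$, and the standard sign convention for exterior and interior multiplication on basis states encoded as bitmasks. It is an illustration only and does not address the infinite-dimensional representation used in the text.

```python
W_MODES = {0, 1}   # indices i with e_i in W (hypothetical choice)
N_MODES = 3        # total number of modes; mode 2 spans W-perp

def parity_below(i, state):
    """Sign (-1)^(number of occupied modes with index < i)."""
    return -1 if bin(state & ((1 << i) - 1)).count("1") % 2 else 1

def create(i, state):
    """Exterior multiplication by e_i on a basis state (bitmask)."""
    if (state >> i) & 1:
        return 0, state
    return parity_below(i, state), state | (1 << i)

def annihilate(i, state):
    """Interior multiplication by e_i, the adjoint of create."""
    if not (state >> i) & 1:
        return 0, state
    return parity_below(i, state), state & ~(1 << i)

def a(i, state):
    """a(e_i): interior multiplication on W, exterior multiplication on W-perp."""
    return annihilate(i, state) if i in W_MODES else create(i, state)

def a_star(i, state):
    """a*(e_i): the adjoint of a(e_i)."""
    return create(i, state) if i in W_MODES else annihilate(i, state)

def anticommutator(op1, i, op2, j, state):
    """Apply op1(i) op2(j) + op2(j) op1(i) to a basis state.

    Returns a dict {basis_state: coefficient} with zero entries dropped.
    """
    result = {}
    for first, second in (((op2, j), (op1, i)), ((op1, i), (op2, j))):
        c1, s1 = first[0](first[1], state)
        if c1 == 0:
            continue
        c2, s2 = second[0](second[1], s1)
        if c2:
            result[s2] = result.get(s2, 0) + c1 * c2
    return {s: c for s, c in result.items() if c}

VACUUM = 0  # the unit element of the exterior algebra
print(all(a(i, VACUUM)[0] == 0 for i in W_MODES), a_star(2, VACUUM)[0] == 0)
```

With these conventions the empty bitmask plays the role of $|W\rangle$: it is annihilated by $a(e_i)$ for $e_i \in W$ and by $a^*(e_i)$ for $e_i \in W^\perp$, while $\{a(e_i), a^*(e_j)\} = \delta_{ij}$ on every basis state.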
The vacuum $|W\rangle$ is represented as the unit element in the exterior algebra. For $u \in W$, $a(u)$ corresponds to interior multiplication by $u$, and the creation operator $a^*(u)$ is given by exterior multiplication. For $u \in W^\perp$ the operator $a(u)$ (resp., $a^*(u)$) is given by exterior (resp., interior) multiplication by $Ju$; here $J : H \to H^*$ is the canonical antilinear isomorphism from a complex Hilbert space to its dual. The vacuum $|W\rangle$ then has the characteristic property $a(u)|W\rangle = 0$, $u \in W$, and $a^*(v)|W\rangle = 0$, $v \in W^\perp$. If we choose a different $W' \in \mathrm{Gr}_b$ then there is a complex vacuum line in $\mathcal{F}(H_b, W)$ corresponding to the new polarization $W'$. The different vacuum lines parameterized by the planes $W'$ form another realization for the determinant bundle $\mathrm{DET}_W$ over $\mathrm{Gr}_b$ as a subbundle of the trivial Fock bundle with fibre $\mathcal{F}_W$. The Plücker embedding $\mathrm{DET}_W \to \mathcal{F}(H_b, W)$ is defined by mapping $(\omega, \lambda) \in \mathrm{St}_W \times \mathbb{C}$ to $\lambda \sum_{S \in \mathcal{S}} \det \omega_S\, \psi_S$, where $\omega_S$ is as before. A Hermitian metric on $\mathcal{F}(H_b, W)$ is again defined by $\langle \psi_S, \psi_{S'} \rangle = \delta_{SS'}$. On the other hand, the finite-dimensional matrix identity for $\alpha : \mathbb{C}^m \to \mathbb{C}^n$, $\beta : \mathbb{C}^n \to \mathbb{C}^m$ with $n \le m$:
\[ \det(\alpha\beta) = \sum_{(i)} \det(\alpha_{(i)}) \det(\beta_{(i)}), \tag{2.32} \]
the sum being over all sequences $(i) = \{1 \le i_1 < i_2 < \cdots < i_n \le m\}$, with $\alpha_{(i)}$ (resp. $\beta_{(i)}$) the matrix obtained from $\alpha$ (resp. $\beta$) by selecting the columns of $\alpha$ (resp. rows of $\beta$) labeled by $(i)$, implies the pairing of Fock space vectors
\[ \langle \psi_{[\alpha,\lambda]}, \psi_{[\beta,\mu]} \rangle = \sum_{S \in \mathcal{S}} \psi_{[\alpha,\lambda]}(\xi(S))^*\, \psi_{[\beta,\mu]}(\xi(S)). \tag{2.33} \]
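The matrix identity (2.32) is the Cauchy–Binet formula, and it is easy to confirm numerically for small matrices. The sketch below uses integer matrices and a cofactor-expansion determinant; the sample matrices and function names are illustrative choices, not taken from the paper. The minors $\det(\beta_{(i)})$ are the finite-dimensional Plücker coordinates of the column span of $\beta$, which is how the identity turns a sum over $\mathcal{S}$ into a single determinant.

```python
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not m:
        return 1
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )

def matmul(a, b):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def cauchy_binet_sum(alpha, beta):
    """Right-hand side of (2.32) for alpha of size n x m and beta of size m x n."""
    n, m = len(alpha), len(beta)
    total = 0
    for cols in combinations(range(m), n):
        alpha_s = [[row[j] for j in cols] for row in alpha]  # selected columns of alpha
        beta_s = [beta[j] for j in cols]                     # selected rows of beta
        total += det(alpha_s) * det(beta_s)
    return total

alpha = [[1, 2, 0], [0, 1, 3]]     # a map C^3 -> C^2
beta = [[1, 0], [2, 1], [0, 4]]    # a map C^2 -> C^3
print(det(matmul(alpha, beta)), cauchy_binet_sum(alpha, beta))
```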
The metrics so defined on the CAR construction $\mathcal{F}(H_b, W)$ and the geometric construction $\mathcal{F}_W$ of the Fock space then correspond under the algebraic isomorphism defined by associating to each section $\psi_S \in \mathcal{F}_W$ the vector $a(e_{i_1}) \cdots a(e_{i_p})\, a^*(e_{j_1}) \cdots a^*(e_{j_q}) |W\rangle \in \mathcal{F}(H_b, W)$, where $i_1 < i_2 < \cdots < i_p \le 0$ is the set of negative indices in the sequence $S$ and $0 < j_1 < j_2 < \cdots < j_q$ is the set of missing positive indices, giving a dense inclusion $\mathcal{F}(H_b, W) \to \mathcal{F}_W$.

Returning to the case of a family $D$ of Dirac-type operators parameterized by a manifold $B$, if we are given a spectral section $P \in \mathrm{Gr}(M/B)$, then we have the global version of the above properties. Associated to $P$ we have a Fock bundle $\mathcal{F}_W \to B$ and this is endowed with a unitary structure $\langle\ ,\ \rangle_P$, given on the fibre $\mathcal{F}_{W_b}$ by (2.27). The bundle $\mathcal{F}_W$ has a distinguished section, the vacuum section $\nu_P = \nu_W$, assigning to $b \in B$ the vacuum vector $\nu_{W_b}$, with unit norm in the fibres. Associated to the canonical Calderón section $P(D)$ defining the Fock bundle $\mathcal{F}_K$ we then have a determinant line bundle $\mathrm{DET}(D, P) \cong \mathrm{DET}(S(P))$ and a generalized Plücker embedding
\[ \mathrm{DET}(D, P) \cong \mathrm{DET}(S(P)) \to \mathcal{F}_K, \tag{2.34} \]
corresponding to the viewpoint on the parameter space $B$ as a generalized Grassmannian. More generally, for any pair of spectral sections $P_1, P_2$, there is a generalized Plücker embedding
\[ \mathrm{DET}(P_1, P_2) \to \mathcal{F}_{P_1}, \tag{2.35} \]
defined fibrewise by the embeddings $\mathrm{Det}(W_{1,b}, W_{2,b}) \hookrightarrow \mathrm{DET}_{W_{1,b}} \to \mathcal{F}_{W_{1,b}}$, which according to (2.22) is the map $[\alpha, \lambda] \mapsto \psi_{[\alpha,\lambda]} \in \mathcal{F}_{P_1}$. So, a section of the determinant
bundle $\mathrm{DET}(P_1, P_2)$ defines a section of the Fock bundle $\mathcal{F}_{P_1}$. In particular, the vacuum section is the image of the determinant section in the “trivial” case
\[ \mathrm{DET}(P_1, P_1) \to \mathcal{F}_{P_1}. \tag{2.36} \]
Associated to the family of Dirac operators $D$, we have a canonical vacuum section $\nu_K \in \mathcal{F}_K$ with $\langle \nu_{K_b}, \nu_{K_b} \rangle_{K_b} = 1$, and if we choose an external spectral section $P$, then via (2.34) we have a canonical section $\psi_{K,P}$ of $\mathcal{F}_K$ corresponding to the determinant section $b \mapsto \det(D_{P_b}) \leftrightarrow \det(S(P_b))$ of $\mathrm{DET}(D, P)$, with
\[ \langle \psi_{K,P}, \psi_{K,P} \rangle_{K_b} = \|\det(S(P_b))\|^2 . \tag{2.37} \]
That is, the generalized Plücker embedding (2.34) is an isometry with respect to the canonical metric on $\mathrm{DET}(D, P)$. This follows by construction from (2.28).

As we already mentioned, as an abstract vector bundle the Fock bundle is trivial. However, the non-triviality of the construction lies in the (locally defined) physical vacuum subbundle defined by the family of Hamiltonians. As an example, assume that we have a family of Dirac Hamiltonians parameterized by the set $\mathcal{A}$ of smooth vector potentials. Given a real number $\lambda$ we can define $W_0(A)$ as the subspace of the boundary Hilbert space corresponding to the spectral restriction $D_{Y,A} > \lambda$ for the boundary Hamiltonian; $A \mapsto W_0(A)$ is a smooth Grassmann section over the set $U_\lambda \subset \mathcal{A}$ of Hamiltonians with $\lambda \notin \mathrm{Spec}(D_{Y,A})$. Let $A \mapsto W_1(A)$ be a globally defined Grassmann section. For each $A \in U_\lambda$ we have a well-defined vacuum line $|A\rangle \in \mathcal{F}_{W_1(A)}$. This line is just the image of the determinant line $\mathrm{DET}(W_1(A), W_0(A))$ with respect to the map (2.22). If $\dim Y = 1$ the Grassmannian $\mathrm{Gr}_A$ does not depend on the parameter $A$ and we may take $W_1(A)$ as a constant section. In any case, the bundle of vacua over $U_\lambda$ can be identified as the relative determinant bundle $\mathrm{DET}(W_1, W_0)$, and the twisting of this bundle depends solely on the twisting of the local section $A \mapsto W_0(A)$.
3. Construction of the FQFT

In this section we utilize the facts presented in the previous section to piece together an FQFT, generalized from the two-dimensional case proposed by Segal [22]. As in Sect. 1.1, the constructions are mathematical and do not refer to any particular physical system. In the next section we explain how the chiral anomaly and commutator anomaly arise in this context.

3.1. Strategy. We define a projective functor from a subcategory $\mathcal{C}_d$ of the category of spin manifolds to the category $\mathcal{C}_{\mathrm{vect}}$ of $\mathbb{Z}$-graded vector spaces and linear maps, which factors through the category $\mathcal{C}_{\mathrm{Gr}}$ of linear relations:
\[ \begin{array}{ccc} \mathcal{C}_d & \longrightarrow & \mathcal{C}_{\mathrm{Gr}} \\ & \searrow & \downarrow \\ & & \mathcal{C}_{\mathrm{vect}} \end{array} \]
The combination of these functors is the Fock functor defining the FQFT.
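3.2. Projective representations of categories. By a category $\mathcal{C}$ we mean a set $\mathrm{Ob}(\mathcal{C})$ of elements called the objects of $\mathcal{C}$, and for any two elements $a, b \in \mathrm{Ob}(\mathcal{C})$ a set $\mathrm{Mor}_{\mathcal{C}}(a, b)$ of morphisms $a \to b$, such that for $a, b, c \in \mathrm{Ob}(\mathcal{C})$ there is a multiplication defined
\[ \mathrm{Mor}_{\mathcal{C}}(a, b) \times \mathrm{Mor}_{\mathcal{C}}(b, c) \to \mathrm{Mor}_{\mathcal{C}}(a, c), \qquad (f_{a,b}, f_{b,c}) \mapsto f_{b,c} \circ f_{a,b}. \]
The product is required to be associative, so that if $f_{c,d} \in \mathrm{Mor}_{\mathcal{C}}(c, d)$, then $f_{c,d}(f_{b,c} f_{a,b}) = (f_{c,d} f_{b,c}) f_{a,b}$. One usually also asserts the existence of an identity morphism $\mathrm{id}_b \in \mathrm{Mor}_{\mathcal{C}}(b, b)$ which satisfies $\mathrm{id}_b \circ f_{a,b} = f_{a,b}$ and $f_{b,c} \circ \mathrm{id}_b = f_{b,c}$. A (covariant) functor $C$ from a category $\mathcal{C}$ to a category $\mathcal{C}'$ means a map $C : \mathrm{Ob}(\mathcal{C}) \to \mathrm{Ob}(\mathcal{C}')$ and for each pair $a, b \in \mathrm{Ob}(\mathcal{C})$ a map $C_{a,b} : \mathrm{Mor}_{\mathcal{C}}(a, b) \to \mathrm{Mor}_{\mathcal{C}'}(C(a), C(b))$ such that
\[ C_{a,c}(f_{b,c} f_{a,b}) = C_{b,c}(f_{b,c})\, C_{a,b}(f_{a,b}). \tag{3.1} \]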
If $\mathcal{C}'$ is the category of vector spaces and linear maps, then $C$ is a representation of the category $\mathcal{C}$. Hence, from the viewpoint of FQFT, quantization may be roughly characterized as a projective representation of the category $\mathcal{C}_d$ of manifolds defined by classical geometric data.

A classical result of Wigner tells us that in quantum systems we must content ourselves with projective representations of symmetry groups. Similarly, with the Fock functor we have to consider projective category representations. This means that there is essentially a scalar ambiguity in defining the map $C_{a,b}$, so that (3.1) is replaced by
\[ C_{a,c}(f_{b,c} f_{a,b}) = c(f_{b,c}, f_{a,b})\, C_{b,c}(f_{b,c})\, C_{a,b}(f_{a,b}), \tag{3.2} \]
where the “cocycle” $c(f_{b,c}, f_{a,b})$ takes values in $\mathbb{C} - \{0\}$. To explain the meaning here of “essentially”, recall that a projective representation of a group $G$ is a true representation of an extension group $\hat{G}$ of $G$ by $\mathbb{C}^\times$. The group $\hat{G}$ forms a $\mathbb{C}^\times$ bundle over $G$ whose Lie algebra cocycle is the first Chern class of the associated line bundle. Equivalently, $\hat{G}$ is defined by assigning to each $g \in G$ a complex line $L_g$ such that $L_{g_1 g_2} = L_{g_1} \otimes L_{g_2}$. (A well-known instance of this occurs for loop groups, see Example 4.1 below, and more generally we shall in Sect. 4 give the gauge group representations for a Yang–Mills action functional a similar description.) Likewise, a projective representation of a category is a true representation of an extension category $\hat{\mathcal{C}}$ constructed by assigning to each $f_{a,b} \in \mathrm{Mor}_{\mathcal{C}}(a, b)$ a complex line $L_{f_{a,b}}$, and given $f_{b,c} \in \mathrm{Mor}_{\mathcal{C}}(b, c)$, an identification
\[ L_{f_{a,b}} \otimes L_{f_{b,c}} \to L_{f_{a,b} f_{b,c}}, \tag{3.3} \]
which is associative in the natural sense. One then has
\[ \mathrm{Ob}(\hat{\mathcal{C}}) = \mathrm{Ob}(\mathcal{C}), \qquad \mathrm{Mor}_{\hat{\mathcal{C}}}(a, b) = \{(f, \lambda) \mid f \in \mathrm{Mor}_{\mathcal{C}}(a, b),\ \lambda \in L_f\}. \tag{3.4} \]
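A standard finite example may make the cocycle in (3.2) concrete. The sketch below is not drawn from this paper: it assumes the group $\mathbb{Z}_N \times \mathbb{Z}_N$ (viewed as a category with one object) acting on $\mathbb{C}^N$ by the shift and clock matrices $X^a Z^b$, and it stores each operator as a monomial matrix with exponents of $w = e^{2\pi i/N}$ kept modulo $N$, so that all arithmetic is exact. The names `rho`, `compose`, `scale` and `cocycle_exponent` are hypothetical.

```python
N = 3  # order of the cyclic group (hypothetical choice)

def rho(a, b):
    """Operator X^a Z^b on C^N as a monomial matrix.

    Entry k of the returned list is (target, e), meaning |k> -> w^e |target>
    with w = exp(2*pi*i/N); exponents are reduced modulo N.
    """
    return [((k + a) % N, (b * k) % N) for k in range(N)]

def compose(p, q):
    """Operator product p*q (apply q first, then p)."""
    out = []
    for k in range(N):
        mid, e1 = q[k]
        target, e2 = p[mid]
        out.append((target, (e1 + e2) % N))
    return out

def scale(op, e):
    """Multiply an operator by the scalar w^e."""
    return [(t, (x + e) % N) for t, x in op]

def cocycle_exponent(g, h):
    """Exponent e with rho(g + h) = w^e * rho(g) * rho(h), in the shape of (3.2)."""
    (_, b), (a2, _) = g, h
    return (-b * a2) % N

g, h = (1, 2), (2, 1)
gh = ((g[0] + h[0]) % N, (g[1] + h[1]) % N)
print(rho(*gh) == scale(compose(rho(*g), rho(*h)), cocycle_exponent(g, h)))
```

The scalar $w^{-b a'}$ cannot be removed by rescaling the operators, since $X$ and $Z$ do not commute; assigning to each group element the line spanned by its operator gives the extension described in the text.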
3.3. The category Cd . An element of Ob(Cd ) is a pair (Y, W ), where Y = (Y, gY , SY ⊗ ξY ), with Y is a closed, smooth and oriented d-dimensional spin manifold, gY a Riemannian metric on Y , SY a spinor bundle over Y , ξY a Hermitian bundle over Y with compatible gauge connection, and W is an admissible polarization of the “one-particle” Hilbert space HY = W ⊕ W ⊥ to a pair of closed infinite-dimensional subspaces. Here HY = L2 (Y, SY ⊗ ξY ) and admissible means that PW ∈ GrY , where PW is the orthogonal projection onto W and GrY is the Hilbert–Schmidt Grassmannian defined with respect to the energy polarization HY = H + ⊕H − into positive, resp. negative, energies of the Dirac operator DY . Let (Yi , Wi ) ∈ Ob(Cd ), i = 1, 2, where Yi = (Yi , gYi , SYi ⊗ ξYi ). An element of MorCd ((Y1 , W1 ), (Y2 , W2 )) is a triple X = (X, gX , SX ⊗ ξX ), where X smooth and oriented (d+1)-dimensional spin manifold with boundary ∂X = Y1 Y2 , gX a Riemannian metric on X with (gX )|Yi = gYi , SX a spinor bundle and ξX a Hermitian bundle over X with compatible gauge connection, such that (SX ⊗ξX )|Yi ∼ = SY ⊗ξYi and the connections metrics correspond under the isomorphism. We refer to X as a geometric cobordism from Y1 to Y2 . We assume that: • In a collar neighbourhood of the boundary U = U1 U2 , where Ui = ([0, 1] × Yi ) all metrics, connections are of product-type. Recall this means that near the boundary the metric becomes the product of the standard metric on the real axis and the boundary metric. Similarly, the gauge connection approaches smoothly the connection on the boundary such that at the boundary all the normal derivatives vanish. Thus ξX|Ui is a pull-back of the boundary bundle (ξYi ), and similarly all metrics, connections, etc. are pull-backs of their boundary counterparts, so gX|Ui = du2 + gYi , etc. 
• The orientation on the “ingoing” boundary $Y_1$ is assumed to be induced by the orientation of $X$ and the inward directed normal vector field on the boundary, whereas for the “outgoing” boundary $Y_2$ the orientation is fixed by the outward directed normal vector field.

For notational brevity we may write $S_i := S_{Y_i} \otimes \xi_{Y_i}$, $g_i := g_{Y_i}$ etc., and $S_{1,2} := S_X \otimes \xi_X$, $g_{1,2} := g_X$ etc. in the following. We augment $C_d$ by including the empty set $\emptyset \in \mathrm{Ob}(C_d)$, and for each $(\mathcal{Y}, W) \in \mathrm{Ob}(C_d)$ we also allow $\emptyset := \mathrm{id}$ as an element of $\mathrm{Mor}_{C_d}((\mathcal{Y}, W), (\mathcal{Y}, W))$. In particular, a geometric cobordism $\mathcal{X} \in \mathrm{Mor}_{C_d}(\emptyset, (\mathcal{Y}, W))$ means a $(d+1)$-dimensional manifold $X$ with boundary $Y$ (plus bundles, connections etc.). Thus a morphism in $C_d$ may have disconnected, connected, or empty ($X$ is closed) boundary, according to whether $Y$ is disconnected, connected or empty. For $(\mathcal{Y}_i, W_i) \in \mathrm{Ob}(C_d)$, $i = 1, 2, 3$, there is an associative product map
$$\mathrm{Mor}_{C_d}((\mathcal{Y}_1, W_1), (\mathcal{Y}_2, W_2)) \times \mathrm{Mor}_{C_d}((\mathcal{Y}_2, W_2), (\mathcal{Y}_3, W_3)) \longrightarrow \mathrm{Mor}_{C_d}((\mathcal{Y}_1, W_1), (\mathcal{Y}_3, W_3)) \tag{3.5}$$
taking a pair $(\mathcal{X}_{1,2}, \mathcal{X}_{2,3})$ to the geometric cobordism
$$\mathcal{X}_{1,2} \cup_{Y_2} \mathcal{X}_{2,3} = (X_{1,2} \cup_{Y_2} X_{2,3},\ g_{1,2} \cup g_{2,3},\ S_{1,2} \cup_\sigma S_{2,3}). \tag{3.6}$$
This “sewing together” of bundles is defined in the usual way, the crucial point being that all geometric data in the collar is pulled back from the boundary and is hence compatible under sewing of manifolds. Briefly, the collar neighbourhood Ur = [0, 1) × Y2 of the boundary of Y2 in X2,3 is a copy of the collar neighbourhood Ul = (−1, 0] × Y2 of the boundary of Y2 in X1,2 but with orientation reversed. Hence we may glue together
the manifolds $X_{1,2}$ and $X_{2,3}$ along $Y_2$ to get the “doubled” manifold $X_{1,2} \cup_{Y_2} X_{2,3}$ with a tubular neighbourhood of the partition $Y_2$ which we may parameterize as $U = (-1, 1) \times Y_2$. Associated to the geometric data we have Dirac operators $D_{1,2}$ and $D_{2,3}$ acting respectively on sections of the Clifford bundles $S_{1,2}$ and $S_{2,3}$. Over $U_l$ the operator $D_{1,2}$ takes the product form $\sigma(\partial/\partial u + D_{Y_2})$, while, because of the change of orientation, and hence chirality, $(D_{2,3})|_{U_r} = \sigma^{-1}(\partial/\partial v + \sigma D_{Y_2} \sigma^{-1})$. Over $Y_2$ we construct $S_{1,2} \cup_\sigma S_{2,3}$ by gluing $S_{1,2}$ to $S_{2,3}$ via the unitary isomorphism $\sigma$, identifying $s \in (S_{1,2})|_Y$ with $\sigma s \in (S_{2,3})|_Y$. (Thus $\sigma$ takes positive spinors to negative spinors.) A section of $S_{1,2} \cup_\sigma S_{2,3}$ is a pair $(\psi, \phi)$ with $\psi$ (resp. $\phi$) a smooth section of $S_{1,2}$ (resp. of $S_{2,3}$) such that the normal derivatives of all orders match up:
$$\frac{\partial^k}{\partial u^k}\psi(0, y) = (-1)^k \sigma(y) \frac{\partial^k}{\partial u^k}\phi(0, y).$$
We then have the Dirac-type operator $(D_{1,2} \cup D_{2,3})(\psi, \phi) = (D_{1,2}\psi, D_{2,3}\phi)$ acting on $C^\infty(X_{1,2} \cup_{Y_2} X_{2,3}, S_{1,2} \cup_\sigma S_{2,3})$, associated to the induced geometric data on $X_{1,2} \cup_{Y_2} X_{2,3}$.

3.4. The category $C_{Gr}$. An element of $\mathrm{Ob}(C_{Gr})$ is a pair $(H, W)$ with $H$ a Hilbert space, and $W$ a polarization of $H$ into a pair of closed orthogonal infinite-dimensional subspaces $H = W \oplus W^\perp$. A morphism $(E, \epsilon) \in \mathrm{Mor}_{C_{Gr}}((\overline{H}_1, W_1^\perp), (H_2, W_2))$ is a closed subspace $E \subset \overline{H}_1 \oplus H_2$ such that $P_E - P_{W_1^\perp \oplus W_2}$ is a Hilbert–Schmidt operator, where $P_E$, $P_{W_1^\perp \oplus W_2}$ are the orthogonal projections with range $E$, $W_1^\perp \oplus W_2$ respectively, along with an element $\epsilon$ of the relative determinant line $\mathrm{Det}(W_1^\perp \oplus W_2, E)$. (It is convenient here to use the “reverse” polarization $W_1^\perp$ of $H_1$ in order to account for boundary orientations later on, see below.) Thus there is an identification
$$\mathrm{Mor}_{C_{Gr}}((\overline{H}_1, W_1^\perp), (H_2, W_2)) = \mathrm{DET}_{W_1^\perp \oplus W_2}, \tag{3.7}$$
where the right-side is the determinant line bundle based at $W_1^\perp \oplus W_2$ over the trace-class Grassmannian $\mathrm{Gr}(\overline{H}_1 \oplus H_2)$. Here $\overline{H}_1$ serves to remind us that we are considering the reverse polarization $W_1^\perp$; we may write
$$\mathrm{Gr}(\overline{H}_1 \oplus H_2) = \mathrm{Gr}(\overline{H}_1 \oplus H_2, W_1^\perp \oplus W_2), \qquad \mathrm{DET}_{W_1^\perp \oplus W_2} = \mathrm{DET}_{W_1^\perp \oplus W_2}(\overline{H}_1 \oplus H_2)$$
if we wish to emphasize the polarization. We also allow $\emptyset \in \mathrm{Ob}(C_{Gr})$ as an object, and define
$$\mathrm{Mor}_{C_{Gr}}(\emptyset, (H, W)) = \mathrm{DET}_W(H), \quad \mathrm{Mor}_{C_{Gr}}((\overline{H}, W^\perp), \emptyset) = \mathrm{DET}_{W^\perp}(\overline{H}), \quad \mathrm{Mor}_{C_{Gr}}(\emptyset, \emptyset) = \mathbb{C}. \tag{3.8}$$
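To define the product of morphisms in $C_{Gr}$, first recall, when $H_0 \neq \emptyset \neq H_2$, from the “category of linear relations”, the “join” product rule
$$\mathrm{Gr}(\overline{H}_0 \oplus H_1, W_0^\perp \oplus W_1) \times \mathrm{Gr}(\overline{H}_1 \oplus H_2, \tilde{W}_1^\perp \oplus W_2) \longrightarrow \mathrm{Gr}(\overline{H}_0 \oplus H_2, W_0^\perp \oplus W_2), \qquad (E_{01}, E_{12}) \longmapsto E_{01} * E_{12}, \tag{3.9}$$
where $\tilde{W}_1 \in \mathrm{Gr}(H_1, W_1)$, defined by
$$E_{01} * E_{12} = \{(u, v) \in H_0 \oplus H_2 \mid \exists\, w \in H_1 \text{ such that } (u, w) \in E_{01},\ (w, v) \in E_{12}\}.$$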
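In finite dimensions the join can be computed directly from spanning matrices, which may help in reading the definition. The following is a minimal sketch, assuming subspaces are represented by matrices whose columns span them; the helper names `null_space`, `graph` and `join` are illustrative and do not come from the text. For graphs of everywhere-defined linear maps the join reduces to the graph of the composite, which the example at the end sets up.

```python
import numpy as np


def null_space(A, tol=1e-10):
    """Orthonormal basis (as columns) of the kernel of A, computed via SVD."""
    _, s, vh = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return vh[rank:].conj().T


def graph(T):
    """Columns spanning the graph {(x, T x)} of a linear map T."""
    return np.vstack([np.eye(T.shape[1]), T])


def join(E01, E12, n0, n1, n2):
    """Columns spanning the join E01 * E12 inside H0 + H2.

    E01: columns span a subspace of H0 + H1 (dimensions n0 + n1).
    E12: columns span a subspace of H1 + H2 (dimensions n1 + n2).
    The join consists of pairs (u, v) for which some w in H1 satisfies
    (u, w) in E01 and (w, v) in E12.
    """
    k01 = E01.shape[1]
    # Coefficients (a, b) with E01 a = (u, w) and E12 b = (w', v); impose w = w'.
    constraint = np.hstack([E01[n0:, :], -E12[:n1, :]])
    K = null_space(constraint)
    a, b = K[:k01, :], K[k01:, :]
    u = E01[:n0, :] @ a
    v = E12[n1:, :] @ b
    return np.vstack([u, v])


# Example: for graphs of maps A : R^2 -> R^3 and B : R^3 -> R^2,
# the join should span the graph of the composite B A.
A = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]])
B = np.array([[1.0, 0.0, 1.0], [2.0, 1.0, 0.0]])
J = join(graph(A), graph(B), n0=2, n1=3, n2=2)
```

The returned columns span the join but need not be linearly independent for general relations, in which the relation may fail to be everywhere defined or may be multi-valued. This finite-dimensional sketch says nothing about the Hilbert–Schmidt or continuity conditions that matter in the infinite-dimensional setting.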
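The join is a generalized composition law of graphs of linear operators, but here the morphisms $E$ are not in general everywhere defined, but $\mathrm{dom}(E) = \mathrm{range}(P_{H_1} P_E : E \to H_1)$, and may also be “multi-valued”. The composition may therefore be discontinuous. From [22] we recall that for continuity one requires that: (i) the map $E_{01} \oplus E_{12} \to H_1$, $((u, w), (w', v)) \mapsto w - w'$ is surjective, and (ii) $E_{01} \oplus E_{12} \to H_0 \oplus H_1 \oplus H_2$, $((u, w), (w', v)) \mapsto (u, w - w', v)$ is injective. The crucial fact is the following:

Proposition 3.1. With the above notation, when $(H_0, W_0^\perp) = \emptyset = (H_2, W_2)$ there is a canonical pairing, linear and holomorphic on the fibres in the first and second variables,
$$\kappa : \mathrm{DET}_{W_1} \times \mathrm{DET}_{\tilde{W}_1^\perp} \longrightarrow \mathrm{Det}(W_1, \tilde{W}_1). \tag{3.10}$$
If $W_1 = \tilde{W}_1$, then
$$\kappa : \mathrm{DET}_{W_1} \times \mathrm{DET}_{W_1^\perp} \longrightarrow \mathbb{C}. \tag{3.11}$$
More generally, if (i) and (ii) hold, then one has such a pairing
$$\kappa : \mathrm{DET}_{W_0^\perp \oplus W_1}(\overline{H}_0 \oplus H_1) \times \mathrm{DET}_{\tilde{W}_1^\perp \oplus W_2}(\overline{H}_1 \oplus H_2) \longrightarrow \mathrm{DET}_{W_0^\perp \oplus W_2}(\overline{H}_0 \oplus H_2) \otimes \mathrm{Det}(W_1, \tilde{W}_1), \tag{3.12}$$
which respects the join multiplication: on each fibre
$$\kappa : \mathrm{Det}(W_0^\perp \oplus W_1, E_{01}) \times \mathrm{Det}(\tilde{W}_1^\perp \oplus W_2, E_{12}) \longrightarrow \mathrm{Det}(W_0^\perp \oplus W_2, E_{01} * E_{12}) \otimes \mathrm{Det}(W_1, \tilde{W}_1). \tag{3.13}$$
(Here the second factor on the right-side of (3.12) denotes the trivial bundle with fibre $\mathrm{Det}(W_1, \tilde{W}_1)$.) If $W_1 = \tilde{W}_1$, then
$$\kappa : \mathrm{DET}_{W_0^\perp \oplus W_1}(\overline{H}_0 \oplus H_1) \times \mathrm{DET}_{W_1^\perp \oplus W_2}(\overline{H}_1 \oplus H_2) \longrightarrow \mathrm{DET}_{W_0^\perp \oplus W_2}(\overline{H}_0 \oplus H_2). \tag{3.14}$$

Proof. As before, we denote by $P_{W,W'}$ the orthogonal projection onto $W$ restricted to the subspace $W'$. Given $E \in \mathrm{Gr}(H_1, W_1)$, $E' \in \mathrm{Gr}(\overline{H}_1, \tilde{W}_1^\perp) = \mathrm{Gr}(\overline{H}_1, W_1^\perp)$, we can represent elements $\epsilon \in \mathrm{Det}(W_1, E)$ and $\delta \in \mathrm{Det}(\tilde{W}_1^\perp, E')$ as the determinant elements of linear operators $a_\epsilon : W_1 \to E$ and $b_\delta : \tilde{W}_1^\perp \to E'$ with $P_{W_1} a_\epsilon - \mathrm{id}_{W_1}$ and $P_{\tilde{W}_1^\perp} b_\delta - \mathrm{id}_{\tilde{W}_1^\perp}$ trace-class. We define
$$\kappa : \mathrm{Det}(W_1, E) \times \mathrm{Det}(\tilde{W}_1^\perp, E') \longrightarrow \mathrm{Det}(W_1, \tilde{W}_1), \tag{3.15}$$
by
$$\kappa(\epsilon, \delta) = \det(P_1 a_\epsilon + \tilde{P}_1^\perp b_\delta) \in \mathrm{Det}(W_1 \oplus \tilde{W}_1^\perp, H_1) \cong \mathrm{Det}(W_1, \tilde{W}_1), \tag{3.16}$$
where $P_1$, $\tilde{P}_1$ are the projections on $W_1$, $\tilde{W}_1$, and $\mathrm{Det}(W_1 \oplus \tilde{W}_1^\perp, H_1)$ is the determinant line of $P_1 + \tilde{P}_1^\perp : W_1 \oplus \tilde{W}_1^\perp \to H_1$. This operator differs from $P_1 a_\epsilon + \tilde{P}_1^\perp b_\delta$ by an operator of trace-class (so (3.15) is well-defined) because $P_{W_1} a_\epsilon - \mathrm{id}_{W_1}$ and $P_{\tilde{W}_1^\perp} b_\delta - \mathrm{id}_{\tilde{W}_1^\perp}$ are trace-class.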
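The canonical isomorphism on the right-side of (3.16) is expressed via the commutative diagram with exact rows and Fredholm columns
$$\begin{array}{ccccccccc}
0 & \longrightarrow & W_1 & \longrightarrow & W_1 \oplus \tilde{W}_1^\perp & \longrightarrow & \tilde{W}_1^\perp & \longrightarrow & 0 \\
 & & \downarrow {\scriptstyle \tilde{P}_1 P_1} & & \downarrow {\scriptstyle P_1 + \tilde{P}_1^\perp} & & \downarrow {\scriptstyle \mathrm{id}} & & \\
0 & \longrightarrow & \tilde{W}_1 & \longrightarrow & H_1 & \longrightarrow & \tilde{W}_1^\perp & \longrightarrow & 0
\end{array}$$
where the horizontal maps are the obvious ones. Such a diagram defines an isomorphism between the determinant line of the centre map and the tensor product of the lines defined by the outer columns, mapping the determinant elements to each other [22, 20]. Hence since $\mathrm{Det}(\mathrm{id}) = \mathbb{C}$ canonically, the isomorphism follows, and with $E = W_1$ and $E' = \tilde{W}_1^\perp$,
$$\kappa(\det(\mathrm{id}_{W_1}), \det(\mathrm{id}_{\tilde{W}_1^\perp})) = \det(P_{\tilde{W}_1, W_1}), \tag{3.17}$$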
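where $\mathrm{id}_W = P_{W,W}$, which will be relevant later in this section.

For the general case (3.12), suppose initially that $W_1 = \tilde{W}_1$ and choose $\epsilon \in \mathrm{Det}(W_0^\perp \oplus W_1, E_{01})$ and $\delta \in \mathrm{DET}(W_1^\perp \oplus W_2, E_{12})$ identified with the determinant elements of linear operators $a_\epsilon : W_0^\perp \oplus W_1 \to E_{01}$ and $b_\delta : W_1^\perp \oplus W_2 \to E_{12}$. Define
$$\kappa_1 : \mathrm{Det}(W_0^\perp \oplus W_1, E_{01}) \times \mathrm{DET}(W_1^\perp \oplus W_2, E_{12}) \longrightarrow \mathrm{DET}(W_0^\perp \oplus H_1 \oplus W_2, E_{01} \oplus E_{12}), \qquad \kappa_1(\epsilon, \delta) = \det(a_\epsilon \oplus b_\delta). \tag{3.18}$$
On the other hand, from [22], conditions (i) and (ii) mean that there is an exact sequence
$$0 \longrightarrow E_{01} * E_{12} \longrightarrow E_{01} \oplus E_{12} \longrightarrow H_1 \longrightarrow 0, \tag{3.19}$$
and this fits into the commutative diagram with Fredholm columns
$$\begin{array}{ccccccccc}
0 & \longrightarrow & E_{01} * E_{12} & \longrightarrow & E_{01} \oplus E_{12} & \longrightarrow & H_1 & \longrightarrow & 0 \\
 & & \downarrow {\scriptstyle P_{W_0^\perp \oplus W_2} P_{E_{01} * E_{12}}} & & \downarrow {\scriptstyle G} & & \downarrow {\scriptstyle \mathrm{id}} & & \\
0 & \longrightarrow & W_0^\perp \oplus W_2 & \longrightarrow & W_0^\perp \oplus H_1 \oplus W_2 & \longrightarrow & H_1 & \longrightarrow & 0
\end{array}$$
where we modify (3.19) by composing the injection $E_{01} * E_{12} \to E_{01} \oplus E_{12}$ with the involution $((u, w), (w', v)) \mapsto ((u, w), (-w', v))$, and the following surjection to $((u, w), (w', v)) \mapsto (u, w + w', v)$, while the lower maps are again the obvious ones. The central column is
$$G(\xi, \eta) = (P_{W_0^\perp} P_{H_0} \xi,\ P_{H_1} \xi + P_{H_1} \eta,\ P_{W_2} P_{H_2} \eta).$$
Because $P_{H_1} P_{E_{01}} - P_{W_1} P_{H_1} P_{E_{01}} = P_{H_1}(P_{E_{01}} - P_{W_0^\perp \oplus W_1}) P_{E_{01}}$ and $P_{H_1} P_{E_{12}} - P_{W_1^\perp} P_{H_1} P_{E_{12}} = P_{H_1}(P_{E_{12}} - P_{W_1^\perp \oplus W_2}) P_{E_{12}}$ are trace-class, the operators $G$ and $G_{W_1}$, where
$$G_{W_1}(\xi, \eta) = (P_{W_0^\perp} P_{H_0} \xi,\ P_{W_1} P_{H_1} \xi + P_{W_1^\perp} P_{H_1} \eta,\ P_{W_2} P_{H_2} \eta),$$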
differ by only trace-class operators and so $\mathrm{Det}(G) = \mathrm{Det}(G_{W_1}) = \mathrm{DET}(E_{01} \oplus E_{12}, W_0^\perp \oplus H_1 \oplus W_2)$, while from the diagram we have $\mathrm{Det}(G) \cong \mathrm{Det}(E_{01} * E_{12}, W_0^\perp \oplus W_2)$. Thus by duality (i.e. take adjoints in the above diagrams, reversing the order of the columns and rows and the direction of the arrows) we have a canonical isomorphism $\mathrm{Det}(W_0^\perp \oplus W_2, E_{01} * E_{12}) \cong \mathrm{DET}(W_0^\perp \oplus H_1 \oplus W_2, E_{01} \oplus E_{12})$, and so composition with $\kappa_1$ completes the proof of (3.12) in the case $W_1 = \tilde{W}_1$. In the general case, replace $H_1$ in (3.18) and the lower row of the commutative diagram by $W_1 \oplus \tilde{W}_1^\perp$ and repeat the argument used in the proof of (3.10). Finally, we note for later reference that in the “vacuum case” $E_{01} = W_0^\perp \oplus W_1$ and $E_{12} = W_1^\perp \oplus W_2$ one has $E_{01} * E_{12} = W_0^\perp \oplus W_2$ and
$$\kappa(\det(\mathrm{id}_{E_{01}}), \det(\mathrm{id}_{E_{12}})) = \det(\mathrm{id}_{E_{01} * E_{12}}). \tag{3.20}$$
□
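From (3.14) and the identification (3.7) we now have a canonical multiplication
$$\mathrm{Mor}_{C_{Gr}}((\overline{H}_0, W_0^\perp), (H_1, W_1)) \times \mathrm{Mor}_{C_{Gr}}((\overline{H}_1, W_1^\perp), (H_2, W_2)) \longrightarrow \mathrm{Mor}_{C_{Gr}}((\overline{H}_0, W_0^\perp), (H_2, W_2)), \tag{3.21}$$
$$((E_{0,1}, \epsilon), (E_{1,2}, \delta)) \longmapsto (E_{0,1} * E_{1,2}, \epsilon * \delta), \qquad \text{where } \epsilon * \delta := \begin{cases} \kappa(\epsilon, \delta) & \text{if (i) and (ii) hold,} \\ 0 & \text{otherwise.} \end{cases}$$
In particular,
$$\mathrm{Mor}_{C_{Gr}}(\emptyset, (H_1, W_1)) \times \mathrm{Mor}_{C_{Gr}}((\overline{H}_1, W_1^\perp), \emptyset) \longrightarrow \mathrm{Mor}_{C_{Gr}}(\emptyset, \emptyset), \tag{3.22}$$
is precisely Eq. (3.11).

3.5. The projective functor $C_d \to C_{Gr}$. Define
$$C : \mathrm{Ob}(C_d) \longrightarrow \mathrm{Ob}(C_{Gr}), \qquad (\mathcal{Y}, W) \longmapsto (H_Y, W), \tag{3.23}$$
where as before $H_Y = L^2(Y, S_Y \otimes \xi_Y)$ and $W$ is an admissible polarization. While for $(\mathcal{Y}_1, W_1), (\mathcal{Y}_2, W_2) \in \mathrm{Ob}(C_d)$,
$$C : \mathrm{Mor}_{\hat{C}_d}((\mathcal{Y}_1, W_1), (\mathcal{Y}_2, W_2)) \longrightarrow \mathrm{Mor}_{\hat{C}_{Gr}}((\overline{H}_{Y_1}, W_1^\perp), (H_{Y_2}, W_2)), \qquad \mathcal{X} \longmapsto (K_{12}, \epsilon), \tag{3.24}$$
where $K_{12} \subset \overline{H}_{Y_1} \oplus H_{Y_2}$ is the Calderon subspace of boundary ‘traces’ of solutions to the Dirac operator $D^{1,2}$ over $X$ defined by the geometric data in $\mathcal{X}$, and $\epsilon \in \mathrm{Det}(W_1^\perp \oplus W_2, K_{12}) \cong \mathrm{Det}(D^{1,2}_{P_{W_1^\perp, W_2}})^*$. Taking into account that $Y_1$ is an incoming boundary, we have $K_{12} \in \mathrm{Gr}(\overline{H}_{Y_1} \oplus H_{Y_2}; W_1^\perp \oplus W_2)$ (in fact, an element of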
the ‘smooth Grassmannian’). The choice needed of the element $\epsilon$ means that $C$ is a true functor $\hat{C}_d \to C_{Gr}$, where $\hat{C}_d$ is the extension category of $C_d$ whose objects are the same as those of $C_d$, and
$$\mathrm{Mor}_{\hat{C}_d}(\mathcal{Y}_1, \mathcal{Y}_2) = \{(\mathcal{X}, \epsilon) \mid \mathcal{X} \in \mathrm{Mor}_{C_d}(\mathcal{Y}_1, \mathcal{Y}_2),\ \epsilon \in \mathrm{Det}(W_1^\perp \oplus W_2, K_{12})\}. \tag{3.25}$$
For a closed geometric cobordism $\mathcal{X} \in \mathrm{Mor}_{C_d}(\emptyset, \emptyset)$ we set
$$\mathrm{Mor}_{\hat{C}_d}(\emptyset, \emptyset) = \mathrm{Det}(D_X); \tag{3.26}$$
the projectivity of the functor in this case corresponds to a choice of generator for $\mathrm{Det}(D_X)$, defining $\mathrm{Det}(D_X) \cong \mathbb{C} = \mathrm{Mor}_{C_{Gr}}(\emptyset, \emptyset)$. To see that the functor respects the product rules in each category, it is enough to show that $K_{01} * K_{12}$ is the Calderon subspace of the operator $D^{0,1} \cup D^{1,2}$, i.e. $K(D^{0,1} \cup D^{1,2}) = K(D^{0,1}) * K(D^{1,2})$ defined by morphisms $\mathcal{X}_{0,1}$, $\mathcal{X}_{1,2}$. This, however, is immediate from the definition of $D^{0,1} \cup D^{1,2}$, and the fact that given $\psi \in \mathrm{Ker}\, D^{0,1}$, $\phi \in \mathrm{Ker}\, D^{1,2}$ it is enough for their boundary values to match up in order to get an element of $\mathrm{Ker}(D^{0,1} \cup D^{1,2})$. That in turn follows because the product geometry in the collar neighbourhood $U$ of the outgoing boundary $Y_1$ of $X_{0,1}$ implies that $\psi$ has the form $\psi(u, y) = \sum_k e^{-\lambda_k u} \psi_k(0) e_k(y)$, where $\{\lambda_k, e_k\}$ is a spectral resolution of $H_{Y_1}$ defined by the boundary Dirac operator. (To be quite correct, we should also include the identification by the boundary isomorphism $\sigma(y)$ in the definition of the join $K_{01} * K_{12}$, but this introduces no new phenomena.) Thus the requirement (3.3) for the rule $\mathcal{X} \mapsto \mathrm{Det}(W_1^\perp \oplus W_2, K_{12})$ to define a projective extension of $C_{Gr}$, is (3.14) of Proposition 3.1.

3.6. The functor $C_{Gr} \to C_{vect}$. The functor $M$ from the category $C_{Gr}$ to the category $C_{vect}$ of ($\mathbb{Z}$-graded) vector spaces and linear maps, is defined on objects of $C_{Gr}$ by
$$(H, W) \longmapsto F_W = F_W(H), \qquad (\overline{H}, W^\perp) \longmapsto F_{W^\perp} = F_{W^\perp}(\overline{H}), \qquad \emptyset \longmapsto \mathbb{C}. \tag{3.27}$$
Thus $M$ takes a polarized vector space to the Fock space defined by the polarization, and $F_\emptyset = \mathbb{C}$ is by fiat. Here $F_{W^\perp} = F_{W^\perp}(\overline{H}) := \Gamma_{hol}(\mathrm{Gr}(\overline{H}), \mathrm{DET}_{W^\perp})$ is the Fock space associated with the reverse polarization. $M$ is defined on morphisms as follows. From (3.7), a morphism in $\mathrm{Mor}_{C_{Gr}}((\overline{H}_1, W_1^\perp), (H_2, W_2))$ is the same thing as an element $\epsilon \in \mathrm{DET}_{W_1^\perp \oplus W_2}$, which we may think of as the pair $(E, \epsilon)$ where $\epsilon \in \mathrm{Det}(W_1^\perp \oplus W_2, E)$. By the Plücker embedding (2.22) this gives us a canonical vector
$$\phi_\epsilon \in F_{W_1^\perp \oplus W_2}(\overline{H}_1 \oplus H_2) \cong F_{W_1^\perp}(\overline{H}_1) \otimes F_{W_2}(H_2). \tag{3.28}$$
The isomorphism is immediate from (2.31) since $F_W(H)$ is the completion of $F(H, W)$. To proceed we need the following facts, generalizing Eq. (8.10) of [22]:
Proposition 3.2. The determinant bundle pairing $\kappa$ of Proposition 3.1 defines a canonical Fock space pairing
$$(\ ,\ ) : F_{W_1} \times F_{\tilde{W}_1^\perp} \longrightarrow \mathrm{Det}(W_1, \tilde{W}_1), \tag{3.29}$$
with
$$(\nu_{W_1}, \nu_{\tilde{W}_1^\perp}) = \det(P_{\tilde{W}_1, W_1}). \tag{3.30}$$
If $W_1 = \tilde{W}_1$, this becomes
$$(\ ,\ ) : F_{W_1} \times F_{W_1^\perp} \longrightarrow \mathbb{C}, \qquad (\nu_{W_1}, \nu_{W_1^\perp}) = 1. \tag{3.31}$$
More generally, $\kappa$ defines a pairing
$$F_{W_0^\perp \oplus W_1}(\overline{H}_0 \oplus H_1) \times F_{\tilde{W}_1^\perp \oplus W_2}(\overline{H}_1 \oplus H_2) \longrightarrow F_{W_0^\perp \oplus W_2}(\overline{H}_0 \oplus H_2) \otimes \mathrm{Det}(W_1, \tilde{W}_1), \tag{3.32}$$
with
$$(\phi_\epsilon, \phi_\delta) = \phi_{\kappa(\epsilon, \delta)}. \tag{3.33}$$
In particular,
$$(\nu_{W_0^\perp \oplus W_1}, \nu_{W_1^\perp \oplus W_2}) = \nu_{W_0^\perp \oplus W_2} \otimes \det(P_{W_2, W_0}). \tag{3.34}$$

Proof. First notice that in the finite-dimensional case there is a natural isomorphism between the Fock space (the exterior algebra) and its dual defined by the pairing $\wedge^k H \times \wedge^{n-k} H \to \mathrm{Det}(H)$, $(\lambda_1, \lambda_2) \mapsto \lambda_1 \wedge \lambda_2$, while in the infinite-dimensional case the pairing using the CAR construction follows directly from the definition $F(H, W) = \wedge(W) \otimes \wedge((W^\perp)^*)$. For the geometric Fock space $F_W$, the construction of the pairing from the determinant bundle pairing $\kappa$ on $\mathrm{DET}_W \times \mathrm{DET}_{W^\perp}$ is entirely analogous to the construction of the inner-product $\langle\ ,\ \rangle_W$ on $F_W$ from the determinant bundle pairing $g_\phi$ on $\mathrm{DET}_W \times \mathrm{DET}_W$ in Eq. (2.21). Indeed, in the case of the vacuum elements the two pairings are canonically identified (see (3.36) below and Sect. 5).

Let us deal first with the case (3.29). We give first the invariant definition, and then the “constructive” definition along the lines of $\langle\ ,\ \rangle_W$ in Sect. 2. Invariantly, in the case $W_1 = \tilde{W}_1$, the pairing $\kappa : \mathrm{DET}_{W_1} \times \mathrm{DET}_{W_1^\perp} \to \mathbb{C}$ defines an embedding $\gamma : \mathrm{DET}_{W_1} - 0 \to F_{W_1^\perp}$ by $\gamma(a)(\,\cdot\,) = \kappa(a, \,\cdot\,)$, and hence a map $\rho : F^*_{W_1^\perp} \to F_{W_1}$, $\rho(f)(\,\cdot\,) = f(\gamma(\,\cdot\,))$. This gives us a pairing $F^*_{W_1} \times F^*_{W_1^\perp} \to \mathbb{C}$ with $(f, g) = f(\gamma(g))$, and by duality the asserted pairing, since $F^{**}_W \cong F_W$ in the topology of uniform convergence on compact subsets of $\mathrm{Gr}(H)$ ($\psi_S \leftrightarrow$ evaluation at $\xi(S)$, cf. [17], Sect. 10.2; [12], Sect. 6.2). The general case follows in the same way with $\mathbb{C}$ replaced by $\mathrm{Det}(\tilde{W}_1, W_1)$.

Constructively, recall that any section in $F_W$ can be written as a linear combination of the $\psi_{[\alpha, \lambda]}$, with $[\alpha, \lambda] \in \mathrm{DET}_W$. Hence for $[\alpha, \lambda] \in \mathrm{DET}_{W_1}$, $[\beta, \mu] \in \mathrm{DET}_{\tilde{W}_1^\perp}$ we can define the Fock pairing by setting
$$(\psi_{[\alpha, \lambda]}, \psi_{[\beta, \mu]}) = \kappa([\alpha, \lambda], [\beta, \mu]) \in \mathrm{Det}(\tilde{W}_1, W_1), \tag{3.35}$$
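and then extending by linearity. In particular, from (2.25) we have $\nu_{W_1} = \psi_{[\mathrm{id}_{W_1}, 1]}$ and $\nu_{\tilde{W}_1^\perp} = \psi_{[\mathrm{id}_{\tilde{W}_1^\perp}, 1]}$, and so
$$(\nu_{W_1}, \nu_{\tilde{W}_1^\perp}) = \kappa(\det(\mathrm{id}_{W_1}), \det(\mathrm{id}_{\tilde{W}_1^\perp})) = \det(P_{\tilde{W}_1, W_1}),$$
where the final equality is Eq. (3.17). Notice further that if we extend $g_\phi$ in (2.21) to a map $g_\phi : \mathrm{DET}_{W_1} \times \mathrm{DET}_{\tilde{W}_1} \to \mathrm{Det}(\tilde{W}_1, W_1)$ by $g_\phi([\alpha, \lambda], [\beta, \mu]) = \lambda \mu \det(\alpha^* P_W \beta)$, then the Fock space inner-product becomes a Hermitian pairing $\langle\ ,\ \rangle_{W_1} : F_{W_1} \times F_{\tilde{W}_1} \to \mathrm{Det}(\tilde{W}_1, W_1)$ and with respect to the identification $\mathrm{Gr}(H, W_1) \leftrightarrow \mathrm{Gr}(\overline{H}_1, W_1^\perp)$, $W \leftrightarrow W^\perp$ we have $\nu_{\tilde{W}_1^\perp} \leftrightarrow \nu_{\tilde{W}_1}$ and
$$(\nu_{W_1}, \nu_{\tilde{W}_1^\perp}) = \langle \nu_{W_1}, \nu_{\tilde{W}_1} \rangle_{W_1} = \det(P_{\tilde{W}_1, W_1}). \tag{3.36}$$
The pairing (3.32) now follows from (3.29) and (3.28). Alternatively we can define it directly as $(\psi_{[\alpha, \lambda]}, \psi_{[\beta, \mu]}) = \psi_{\kappa([\alpha, \lambda], [\beta, \mu])}$, where $\kappa$ is the pairing (3.12). Note that if conditions (i) and (ii) do not hold then $\kappa([\alpha, \lambda], [\beta, \mu]) = 0$. Equation (3.33) is now just by construction, and Eq. (3.34) follows easily from (3.20). □

The Fock space pairing (3.31) defines an isomorphism $F_{W_1^\perp} \cong F^*_{W_1}$ and hence the vector $\phi_\epsilon \in F_{W_1^\perp}(\overline{H}_1) \otimes F_{W_2}(H_2)$ defined by $\epsilon \in \mathrm{Mor}_{C_{Gr}}((\overline{H}_1, W_1^\perp), (H_2, W_2))$ is canonically an element of $\mathrm{Hom}(F_{W_1}(H_1), F_{W_2}(H_2))$ which is a morphism of $C_{vect}$, as required. In the case $(\overline{H}_0, W_0^\perp) = \emptyset$ the map $\mathrm{Mor}_{C_{Gr}}(\emptyset, (H_1, W_1)) \to \mathrm{Hom}(\mathbb{C}, F_{W_1})$ is defined by $1 \mapsto \nu_W$, and similarly when $(H_2, W_2) = \emptyset$. The functoriality of the composition of linear maps with respect to the multiplication in $C_{Gr}$ is precisely (3.33). We may state this as:

Theorem 3.3. The category multiplication in $C_{Gr}$ induces through the Fock space functor a canonical multiplication in the category $C_{vect}$.

This can be conveniently summarized in the statement that the following diagram commutes:
$$\begin{array}{ccc}
\mathrm{Gr}(\overline{H}_0 \oplus H_1, W_0^\perp \oplus W_1) \times \mathrm{Gr}(\overline{H}_1 \oplus H_2, W_1^\perp \oplus W_2) & \xrightarrow{\ *\ } & \mathrm{Gr}(\overline{H}_0 \oplus H_2, W_0^\perp \oplus W_2) \\
\downarrow {\scriptstyle (\epsilon, \delta)} & & \downarrow {\scriptstyle \epsilon * \delta} \\
\mathrm{DET}_{W_0^\perp \oplus W_1} \times \mathrm{DET}_{W_1^\perp \oplus W_2} & \xrightarrow{\ \kappa\ } & \mathrm{DET}_{W_0^\perp \oplus W_2} \\
\downarrow {\scriptstyle \text{Plücker}} & & \downarrow {\scriptstyle \text{Plücker}} \\
F_{W_0^\perp} \otimes F_{W_1} \otimes F_{W_1^\perp} \otimes F_{W_2} & \xrightarrow{\ (\,,\,)\ } & F_{W_0^\perp} \otimes F_{W_2}
\end{array}$$
where $\epsilon$, $\delta$ are, respectively, a choice of section of the bundles $\mathrm{DET}_{W_0^\perp \oplus W_1}$ and $\mathrm{DET}_{W_1^\perp \oplus W_2}$.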
3.7. The Fock Functor. The Fock functor $Z : C_d \to C_{vect}$ is the projective representation of $C_d$ defined by the composition of the functors $C$ and $M$, thus $Z$ is the functor
$$Z = M \circ C : C_d \longrightarrow C_{vect}. \tag{3.37}$$
$Z$ acts on objects of $C_d$ by
$$Z : \mathrm{Ob}(C_d) \longrightarrow \mathrm{Ob}(C_{vect}), \qquad Z((\mathcal{Y}, W)) = F_W(H_Y), \qquad Z(\emptyset) = \mathbb{C}, \tag{3.38}$$
and on morphisms by
$$Z : \mathrm{Mor}_{C_d}((\mathcal{Y}_1, W_1), (\mathcal{Y}_2, W_2)) \longrightarrow \mathrm{Mor}_{C_{vect}}(F_{W_1}(H_{Y_1}), F_{W_2}(H_{Y_2})), \qquad Z((\mathcal{X}, \epsilon)) = \phi_\epsilon, \quad \epsilon \in \mathrm{Det}(W_1^\perp \oplus W_2, K(D_X)), \tag{3.39}$$
where $(\mathcal{Y}_1, W_1)$, $(\mathcal{Y}_2, W_2)$ are not both empty, and $\phi_\epsilon$ is defined as in Sect. 3.6 by the Fock space pairing. If $(\mathcal{Y}_1, W_1) = \emptyset = (\mathcal{Y}_2, W_2)$, so $X$ is a closed manifold, then $Z(\mathcal{X}) = \det(D_X) \in \mathrm{Det}(D_X) \cong \mathbb{C}$, where the trivialization requires a choice. The “sewing property” of the FQFT is precisely the functorial Fock space pairing of Proposition 3.2. Note that if both $W_i \neq \emptyset$ it is not possible to choose $\phi_\epsilon$ to be the vacuum vector $\nu_{W_1^\perp \oplus W_2} \in F_{W_1^\perp \oplus W_2}$, since $K(D_X)$, depending on global data, is always transverse to the pure boundary data $W_1^\perp \oplus W_2$. Consider though the case $W_1 = \emptyset$. Let $X$ be a closed connected manifold partitioned by an embedded codimension 1 submanifold $Y$, so that $X = X^0 \cup_Y X^1$. Here $X^0$, $X^1$ are manifolds with boundary $Y$, where $\partial X^0 = Y$ has outgoing orientation and $\partial X^1 = Y$ has incoming orientation. $X^0$ is assumed to be associated to a morphism $\mathcal{X}^0$ in $\mathrm{Mor}_{C_d}(\emptyset, (\mathcal{Y}, W))$ for a choice of admissible polarization $W \in \mathrm{Gr}(H_Y)$. In this case we can choose $W = K(D^0)$ and $\phi_\epsilon = \nu_{K(D^0)}$. Similarly, we have $\mathcal{X}^1 \in \mathrm{Mor}_{C_d}((\mathcal{Y}, W^\perp), \emptyset)$, and we may choose $W^\perp = K(D^1)$. As a corollary of the properties of $Z$ we then have the following algebraic sewing law for the determinant with respect to a partitioned closed manifold.

Theorem 3.4. There are functorial bilinear pairings
$$(\ ,\ ) : F_{K(D^0)}(H_Y) \times F_{W^\perp}(\overline{H}_Y) \longrightarrow \mathrm{Det}(D^0_{P_W}), \tag{3.40}$$
where the right-side is the determinant line of the EBVP $D^0_{P_W}$, with
$$(\nu_{K(D^0)}, \nu_{W^\perp}) = \det(D^0_{P_W}), \tag{3.41}$$
and
$$(\ ,\ ) : F_{K(D^0)}(H_Y) \times F_{K(D^1)}(\overline{H}_Y) \longrightarrow \mathrm{Det}(D_X), \tag{3.42}$$
where the right-side is the determinant line of the Dirac operator $D_X$ over the closed manifold $X$, with
$$(\nu_{K(D^0)}, \nu_{K(D^1)}) = \det(D_X). \tag{3.43}$$
Proof. We just need to recall a couple of facts. From Eqs. (3.29) and (3.30) we have a pairing $F_{K(D^0)}(H_Y) \times F_{W^\perp}(\overline{H}_Y) \to \mathrm{Det}(K(D^0), W)$ with $(\nu_{K(D^0)}, \nu_{W^\perp}) = \det(S(P_W))$, where $S(P_W) : K(D^0) \to W$ is the operator of Sect. 2. But from (2.9) there is a canonical isomorphism $\mathrm{Det}(S(P_W)) \cong \mathrm{Det}(D_{P_W})$, with $\det(S(P_W)) \mapsto \det(D_{P,b})$. This proves the first statement. The second statement follows similarly upon recalling from [20] (Theorem 3.2) that there is a canonical isomorphism $\mathrm{DET}((I - P(D^1)) \circ P(D^0)) \cong \mathrm{Det}(D_X)$, again preserving the determinant elements. □

Thus one may think of the determinant $\det(D_P)$ as an object in the complex line $\mathrm{Det}(K(D), P)$ depending on a choice of boundary condition $P$, or absolutely the “quantum determinant” of $D$ as a ray in the Fock space $F_{K(D)}$ defined by the vacuum vector that does not depend on a choice of $P$. The two viewpoints are related by (3.41).

Finally, we point out that, in particular, the Fock functor naturally defines a map from geometric fibrations to vector bundles. To a geometric fibration $N$ of closed $d$-dimensional manifolds endowed with a spectral section $P$ it assigns the corresponding Fock bundle $F_P$. A “projective” morphism between objects $(N_1, P_1)$ and $(N_2, P_2)$ is a geometric fibration $M$ of $(d+1)$-dimensional manifolds with boundary $N_1 \sqcup N_2$ along with a section of the determinant bundle $\mathrm{DET}(P_1^\perp \oplus P_2, K(D))$, where $D$ is the family of Dirac operators defined by $M$. This defines a bundle map $F_{P_1} \to F_{P_2}$ using the generalized Plücker embedding (2.34) and the Fock space pairing. For a partition of a closed geometric fibration $M = M^0 \cup_N M^1$ over a parameter manifold $B$ by an embedded fibration of codimension 1 manifolds, the analogue of Theorem 3.4 then states that there are functorial Fock bundle pairings:
$$(\ ,\ ) : F_{D^0} \times F_{P^\perp} \longrightarrow \mathrm{DET}(D^0, P), \qquad (\nu_{D^0}, \nu_{P^\perp}) = \det(D^0_P), \tag{3.44}$$
where the right-side is the determinant line bundle of the family of EBVPs $(D^0, P)$, while $\nu_{D^0}$, $\nu_P$ are the vacuum sections of the Fock bundles $F_{D^0}$, $F_P$ (see (2.15)), and $\det(D^0_P)$ the determinant section of $\mathrm{DET}(D^0, P)$; and
$$(\ ,\ ) : F_{D^0} \times F_{D^1} \longrightarrow \mathrm{Det}(D_M), \qquad (\nu_{D^0}, \nu_{D^1}) = \det(D_M). \tag{3.45}$$
The proof again requires only the properties of the Fock bundle pairing and the determinant bundle identifications of Sect. 2 and [20]. Notice that there is no regularization here of the determinant, only a pairing between bundle sections.

4. Gauge Anomalies and the Fock Functor

In this section we give a physical application of these ideas with a Fock functor description of the chiral and commutator anomalies for an even-dimensional manifold with (odd-dimensional) boundary. The Fock functor assigns vector spaces to all odd-dimensional compact oriented spin manifolds Y and polarizations. There is no further restriction on the topology of Y . However, in this section we shall restrict to a fixed topological type for Y . For our purposes this is no real restriction since our principal aim is to understand the action of continuous symmetries, diffeomorphisms and gauge transformations, on the family of
Fock spaces and on the morphisms between the Fock spaces; the action of the symmetry group cannot change the topological type of $Y$. To be more concrete, we shall consider the case of the parameter space $B = \mathcal{A}$ of smooth vector potentials labeling the geometries over $Y$. Thus we are led to consider the action of the group of gauge transformations on the bundle $F$ of Fock spaces over the base $\mathcal{A}$. The gauge transformations act naturally on the base $\mathcal{A}$ and thus we have a lifting problem: Construct a (projective) action of the gauge group in the total space of $F$ intertwining with the family of quantized Dirac Hamiltonians in the fibers. We want to stress that we are not going to construct a representation of the gauge group in a single Fock space; rather, we have a linear isometric action between different fibers of the Fock bundle.

First, we recall some known facts about gauge anomalies in even dimensions. Let $M$ be a closed even-dimensional Riemannian spin manifold and let $\mathcal{A}$ be the space of vector potentials on a trivial complex $G$-bundle over $M$. For each $A \in \mathcal{A}$ we have a coupled Dirac operator $D_A : C^\infty(M; S \otimes E) \to C^\infty(M; S \otimes E)$ given locally by
$$D_A = \sum_{i=1}^n \sigma_i (\partial_i + \Gamma_i + A_i),$$
where $\Gamma_i$ and $A_i$ are respectively the components of the local spin connection and $G$-connection $A$, and $\sigma_i$ the Clifford matrices. Since $M$ is even-dimensional, $D_A$ splits into positive and negative chirality components, and the object of interest is the chiral Dirac operator
$$D_A^+ = D_A \left( \frac{1 + \gamma_{n+1}}{2} \right) : C^\infty(M; S^+ \otimes E) \to C^\infty(M; S^- \otimes E).$$
Acting on $\mathcal{A}$ we have the group of based gauge transformations $\mathcal{G}$, which acts covariantly on the Dirac operators, $D^\pm_{g.A} = g^{-1} D^\pm_A g$, so that $\mathrm{Ker}\, D^\pm_{g.A} = g(\mathrm{Ker}\, D^\pm_A)$. We are interested in the Fermionic path integral:
$$Z(A) = \int_{C^\infty(M; S^+ \otimes E)} e^{\int_M \psi^* D_A^+ \psi \, dm} \, D\psi \, D\psi^*, \tag{4.1}$$
and a formal extension of finite-dimensional functional calculus gives
$$Z(A) := \det(D_A^+).$$
To obtain an unambiguous regularization of (4.1) we therefore require a gauge covariant regularized determinant varying smoothly with $A$ in order that $Z(A)$ pushes down smoothly to the moduli space $\mathcal{A}/\mathcal{G}$. In the case of Dirac fermions (both chirality sectors) this can be done and there is a gauge invariant regularized determinant $\det_{reg}(D_A)$. For chiral Fermions on the other hand, there is an obstruction due to the presence of zero modes of the Dirac operator. The covariance of the kernels means that the determinant line bundle descends to $\mathcal{A}/\mathcal{G}$ and the obstruction to the existence of a covariant $Z(A)$ varying smoothly with $A \in \mathcal{A}$ is the first Chern class of the determinant bundle on $\mathcal{A}/\mathcal{G}$, which is the topological chiral anomaly. A 2-form representative for the Chern class of $\mathrm{DET}\, D^+$ can be constructed as the transgression of the 1-form $\omega_1 \in \Omega(\mathcal{G})$,
$$\omega_1(g) = \frac{1}{2\pi i} \frac{d(\det_r(D^+_{gA}))}{\det_r(D^+_{gA})},$$
measuring the obstruction to gauge covariance of a choice of regularized determinant $\det_r(D_A^+)$. For details see [2, 14]. In the case of a manifold $X$ with boundary $Y$ new complications arise. Fixing an elliptic boundary condition (spectral section) $P$ for the family of chiral Dirac operators $D^+ = \{D_A^+ : A \in \mathcal{A}\}$, we obtain a Fock bundle $F_P$ over $\mathcal{A}$ to which we aim to lift the $\mathcal{G}$ action. It is natural to look first at gauge transformations (or diffeomorphisms) which are trivial on the boundary. In fact, the calculation of the Chern class in [2] can be extended to this case using a version of the families index theory for a manifold with boundary, [5, 16]. The gauge variation of the chiral determinant can be written as
$$\det\nolimits_r(D^+_{g.A}) = \det\nolimits_r(D^+_A)\, \omega(g; A), \tag{4.2}$$
where log ω is an integral over X of a local differential polynomial in g, A and the metric on X; ω is the integrated version of the “infinitesimal” anomaly form ω1 . The important point is that the formula applies both to the case of a manifold with/without boundary. In fact, in the latter case this gives a direct way to define the determinant bundle over A/G, [13]. The locality of the anomaly (4.2) is compatible with the formal sewing formula (5.2). Applying a gauge transformation which is trivial on Y to the right-hand side of the equation gives a gauge variation which is a product of gauge variations on the two halves X0 , X1 of M. This product is equal, by locality of the logarithm, to the gauge variation on M of the path integral on the left-hand-side. Since the cutting surface Y is arbitrary, one can drop the requirement that g is trivial on Y . The gauge transformations (and diffeomorphisms) which are not trivial on the boundary need a different treatment. This is because they act non-trivially on the boundary Fock spaces FY . We shall concentrate on the case when Y is odd dimensional. The first question to ask is how the action of the gauge group on the parameter space B of boundary geometries on Y is lifted to the total space of the bundle of Fock spaces F → B. This problem has already been analyzed (leading to Schwinger terms in the Lie algebra of the group G) in the literature, but in the present article we want to clarify how the boundary action intertwines with the Fock functor construction.
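The relation (4.2) constrains $\omega$ under composition of gauge transformations. As a short consistency check, assume that the action satisfies $(gh).A = g.(h.A)$ and that $\det_r(D^+_A) \neq 0$ at the potential considered; both are assumptions made here for illustration. Applying (4.2) twice then gives a multiplicative 1-cocycle identity:

```latex
% Apply (4.2) first with the pair (g, h.A), then with (h, A):
\det\nolimits_r(D^+_{(gh).A})
  = \det\nolimits_r(D^+_{h.A})\,\omega(g; h.A)
  = \det\nolimits_r(D^+_{A})\,\omega(h; A)\,\omega(g; h.A).
% Comparing with \det_r(D^+_{(gh).A}) = \det_r(D^+_A)\,\omega(gh; A):
\omega(gh; A) = \omega(g; h.A)\,\omega(h; A).
```

Taking logarithms, $\log \omega$ is additive in the same sense, which is compatible with its description as an integral of a local expression over $X$.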
4.1. Commutator anomaly on the boundary. Let b ∈ B and W ∈ Grb . In the rest of this section B denotes the space of metrics and vector potentials on a fixed manifold Y and Yb is the manifold Y equipped with the geometric data b. The pair (b, W ) is mapped to (g.b, g.W ) by a gauge transformation (or a diffeomorphism) g, acting on both potentials, metrics and spinor fields. This induces an unitary map from the Fock space F(Hb , W ) to F(Hg.b , g.W ), by a ∗ (u) → a ∗ (g · u) and similarly for the annihilation operators. However, sometimes (b, W ) do not appear independently, but W is given as a function of the boundary geometry; b → PW =Wb is a Grassmann section; this leads to the construction of the bundle of Fock spaces Fb parameterized by b ∈ B, as already mentioned above. An example of this situation is the following. Suppose the Dirac operators on the boundary do not have zero eigenvalue (this happens when massive Fermions are coupled to vector potentials). Then it is natural to take Wb = Hb+ as the space of positive energy states. Still this case does not lead to any complications because of the equivariance property Wg.b = g.Wb . However, there are cases when no equivariant choices for Wb exist. This happens when we have massless chiral Fermions coupled to gauge potentials. For some potentials there are always zero modes and one
cannot take $W_b$ as the positive energy subspace without introducing discontinuities into the construction. Let us assume that a Grassmann section $W_b$ is given. For each boundary geometry $b$ we have a Fermionic Fock space $F_b = F(H_b, W_b)$ determined by the polarization $H_{Y_b} = W_b \oplus W_b^\perp$. In order to determine the obstruction to lifting the gauge group action on $Y$ to the bundle of Fock spaces such that $g^{-1} D_{Y_b} g = D_{Y_{g.b}}$ we compare the action on $F$ to the natural action in the case of polarizations $W'_b$ defined by the positive energy subspaces of Dirac operators $D_{Y_b} - \lambda$. We have fixed a real parameter $\lambda$ and we consider only those boundary geometries $b \in B$ for which $\lambda$ is not an eigenvalue. Since the choice of polarizations $W'$ is equivariant, the gauge action lifts to the (local) Fock bundle $F'$. Relative to $W'$ the $F$ vacua form a complex line bundle $\mathrm{DET}(W', W)$; again, this is defined only locally in the parameter space.

Example 4.1. Let $Y$ be a unit circle with standard metric but varying gauge potentials. We can choose $H_Y = W \oplus W^\perp$ as the fixed polarization defined by the decomposition into positive and negative Fourier modes. If the gauge group is $SU(n)$ and Fermions are in the fundamental representation of $SU(n)$ then the mapping $g \mapsto g \cdot W$ defines an embedding of the loop group $LSU(n)$ into the Hilbert–Schmidt Grassmannian $\mathrm{Gr}_1(H_Y, W)$. The pullback of the Quillen determinant bundle over the Grassmannian to $LSU(n)$ defines the central extension of the loop group with level $k = 1$, [17].

There is a general method to describe the relative determinant bundle in terms of index theory on $X$ for $\partial X = Y$. We assume that the spin and gauge vector bundle on $Y$ can be smoothly continued to bundles on $X$. This is the case for example when $Y = S^{2n-1}$ and $X$ is chosen such that it has the topology of a solid ball, with a product metric near the boundary.
Any vector potential can be smoothly continued to a potential on $X$, for example as $A(x, r) = f(r) A(x)$ with $f$ increasing smoothly from zero to the value one at $r = 1$; all derivatives of $f$ vanishing at $r = 0, 1$. We can now define a spectral section $b \mapsto W_b$ as the Calderon subspace associated to the continued metric and vector potential in the bulk; we denote the Dirac operator defined by this geometric data in $X$ by $D_{X,b}$. The determinant line for a Dirac operator $D_{X,b}$ subject to the boundary condition $W$ is canonically the tensor product of the line $\mathrm{DET}(W', W)$ and the determinant line of the same operator $D_{X,b}$ but subject to another choice of boundary conditions $W'$, (2.7). Since the spectral section $W_b$ and the Dirac operator $D_{X,b}$ are parameterized by the affine space of geometric data (metrics and potentials) on the boundary, the corresponding Dirac determinant bundle is topologically trivial. Let $U_\lambda$ be the set of $b \in B$ such that the real number $\lambda$ is not in the spectrum of the corresponding Dirac operator $D_{Y_b}$. On $U_\lambda$ we can define the boundary conditions $W'_{b,\lambda}$ as the spectral subspace $D_{Y_b} > \lambda$ of the boundary Dirac operator. The set $U_\lambda$ is in general non-contractible and the Dirac determinant line bundle defined by the boundary conditions $W'$ can be nontrivial. The curvature of this bundle is given by the families index theorem [5, 16]. It can be written in terms of characteristic classes in the bulk and the so-called $\eta$-form on the boundary; the latter depends on spectral information about the family of Dirac operators. The curvature $\Omega$ when evaluated along gauge and diffeomorphism directions on the boundary data has a simplified expression; in particular, the $\eta$-form drops out since it is a spectral invariant and the contribution from the characteristic classes in the bulk reduces to a boundary integral involving the (gauge and metric) Chern–Simons forms, [7, 8]:
$$\Omega = \frac{1}{2\pi} \int_Y CS_{[2]}(A + v, \Gamma + w) \tag{4.3}$$
with
$$dCS(A, \Gamma) = \hat{A}(R)\, \mathrm{ch}(F),$$
where $[2]$ denotes the part that is a 2-form along parameter directions. The symbol $A + v$ means a connection form on $Y \times B$ such that in the $Y$ directions it is given by a vector potential $A$ and in the gauge directions $L_u$ on $B$ it is equal to the Lie algebra valued function $u$. In a similar way, $\Gamma + w$ is the sum of the Levi-Civita connection (on $Y$) and a metric connection $w$ such that the value of $w$ along a vector field $L_u$ on $B$, generated by a vector field $u$ on $Y$, is equal to the matrix valued function on $Y$ given by the Jacobian of the vector field $u$. The characteristic classes are
$$\hat{A}(X) = \det{}^{1/2}\left( \frac{iR/4\pi}{\sinh(iR/4\pi)} \right), \qquad \mathrm{ch}(X) = \mathrm{tr}\,(\exp(iF/2\pi)),$$
where $R$ is the Riemann curvature tensor associated to a metric $g$ in the bulk, and $F$ is the curvature of a gauge connection $A$.

From the previous discussion it follows that the topological information (de Rham cohomology class of the curvature) in the relative determinant bundle $\mathrm{DET}(W', W)$ is given by the curvature formula for the Dirac determinant bundle for boundary polarization $W'$. This leads to the explicit formula for lifting the gauge and diffeomorphism group action from the base $B = \mathcal{A} \times \mathcal{M}$ to the Fock bundle $F$, [7, 8]. Here $\mathcal{M}$ is the space of Riemann metrics on $Y$. Infinitesimally, the lifting leads to an extension of the Lie algebras $\mathrm{Lie}(\mathcal{G})$ and $\mathrm{Vect}(Y)$ by an abelian ideal $J$ consisting of complex valued functions on $\mathcal{A} \times \mathcal{M}$. The commutator of two pairs of elements $(u, f)$ and $(v, g)$ (where $f, g$ are in the extension part $J$ and $u, v$ are infinitesimal gauge transformations or vector fields) is given as
$$[(u, f), (v, g)] = ([u, v],\ L_u \cdot g - L_v \cdot f + c(u, v)), \tag{4.4}$$
where c(u, v) is an anti-symmetric bilinear function of the arguments u, v taking values in the ideal J . It satisfies the cocycle condition c(u1 , [u2 , u3 ]) + Lu1 · c(u2 , u3 ) + cyclic permutations = 0.
(4.5)
The cocycle c is just the curvature form evaluated along gauge (or diffeomorphism group) directions, c(u, v) = P(Lu , Lv ), where Lu is the vector field on A (resp. M) generated by the gauge (diffeomorphism) group action. When Y is one-dimensional, the cocycle reduces to the central term in an affine Lie algebra or in the Virasoro algebra; in this case the cocycle does not depend on the vector potential or the metric on Y . On S 3 the cocycle (Schwinger term) is given as [12, 9] i tr A[du, dv] (4.6) c(u, v) = 24π 2 Y when the Fermions are in the fundamental representation of the gauge group; here u, v : Y → Lie(G) are smooth infinitesimal gauge transformations. On S 3 the cocycle is trivial in case of vector fields and metrics. Also in higher dimensions explicit expressions can be worked out starting from (4.3), [8].
In the case when the projections $P_b$, $P'_b$ onto the spaces $W_b$, $W'_b$ of the Grassmann sections differ by trace-class operators, we have a general formula for the curvature of the relative determinant bundle,

$$ P(L_u, L_v) = \frac{1}{8\pi i}\, \mathrm{tr}\left[ F_b (L_u F_b)(L_v F_b) - F'_b (L_u F'_b)(L_v F'_b) \right], \tag{4.7} $$

where $F_b = P_b - P_b^\perp$ and $P_b^\perp$ is the projection onto the orthogonal complement $W_b^\perp$ (and similarly for $F'_b$). Note that neither of the two terms on the right has a finite trace, but the difference is trace class by the relative trace-class property of $P_b$, $P'_b$. Note also that in the case when all the projections $P$ are in a single restricted Grassmannian, the first term is the standard formula for the curvature of the Grassmannian. The second term can be viewed as a renormalization; it is in fact a background field dependent vacuum energy subtraction.

The proof of the curvature formula (4.7) is as follows. First, one notices that this gives the curvature of the relative determinant bundle when both variables $W_b$, $W'_b$ lie in the same restricted Grassmannian relative to a fixed base point $P_0$. Then one has to show that the difference actually makes sense when the assumption of a common base point is dropped. For that purpose one writes

$$ P(L_u, L_v) = \frac{1}{8\pi i}\, \mathrm{tr}\left[ (F_b - F'_b)(L_u F_b)(L_v F_b) + F'_b (L_u F_b - L_u F'_b)(L_v F_b) + F'_b (L_u F'_b)(L_v F_b - L_v F'_b) \right], $$

which is manifestly a trace of a sum of trace-class operators.
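The regrouping used in the last display is a purely algebraic telescoping step: the difference of two triple products is rewritten so that each summand contains one difference factor. As a finite-dimensional sanity check (a sketch only; the integer matrices are arbitrary stand-ins, and the names `F`, `Fp`, `A`, `Ap`, `B`, `Bp` for the two sign operators and their derivatives along $L_u$ and $L_v$ are ours, not the paper's), both sides can be compared directly:

```python
import random

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def sub(a, b):
    """Entrywise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def direct(F, Fp, A, Ap, B, Bp):
    """tr[F A B - F' A' B'], the shape of the right-hand side of (4.7)."""
    return trace(matmul(matmul(F, A), B)) - trace(matmul(matmul(Fp, Ap), Bp))

def telescoped(F, Fp, A, Ap, B, Bp):
    """tr[(F - F') A B + F' (A - A') B + F' A' (B - B')]."""
    return (trace(matmul(matmul(sub(F, Fp), A), B))
            + trace(matmul(matmul(Fp, sub(A, Ap)), B))
            + trace(matmul(matmul(Fp, Ap), sub(B, Bp))))

def random_matrix(n, rng):
    return [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]

rng = random.Random(0)                       # fixed seed, hypothetical test data
mats = [random_matrix(4, rng) for _ in range(6)]
assert direct(*mats) == telescoped(*mats)    # exact integer arithmetic
```

In finite dimensions every term has a trace, so the check only confirms the algebra. The analytic content of the argument, that each difference factor is trace class, is not visible at this level.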
4.2. Chiral anomaly in the bulk. In the construction of the Fock functor we took as independent parameter a choice of an element H ∈ DET(K(Db+ ), WYb ) in the boundary determinant bundle; recall that K(Db+ ) is the range of the Calderon projection. A choice of this element, as a function of the geometric data in the bulk, is a section of the determinant bundle. In quantum field theory such a choice is provided by a choice of the regularized determinant of the chiral Dirac operator Db+ . The determinant vanishes if and only if the orthogonal projection π : K(Db+ ) → WYb is singular and therefore it makes sense to choose H = Hb ∈ DET(K(Db+ ), WYb ) (represented as an admissible linear map Hb : WYb → K(Db+ )) such that det r (Db+ ), defined subject to the boundary conditions Wb , is equal to det F (π ◦ Hb ). In the case of chiral Fermions the determinant detr (Db+ ) is anomalous with respect to diffeomorphisms and gauge transformations on X and the variation of the determinant is given by the factor ω(g; b) in (4.2). This implies the transformation rule Hg.b = Hb · ω(g; b),
(4.8)
where g is either a gauge transformation or a diffeomorphism and b stands for both the metric and gauge potential on X. ω is a non-vanishing complex function, satisfying the cocycle condition ω(g1 g2 ; b) = ω(g1 ; g2 · b)ω(g2 ; b).
(4.9)
Here the boundary conditions should be invariant under g, meaning that the gauge transformations (and diffeomorphism) approach smoothly the identity at the boundary.
If the cocycle ω is nontrivial (and this is the generic case for chiral Fermions) in cohomology, then the relation (4.2) above tells us that the Fock functor is determined by the family of Calderon subspaces K(Db+ ) and a choice of a section (the regularized determinant) of a nontrivial line bundle over the quotient space B of B modulo diffeomorphisms and gauge transformations. Example 4.2. Let A(D) be the space of smooth potentials in a unit disk D. Let G(D, ∂D) be the group of gauge transformations which are trivial on the boundary ∂D = S 1 . For each A ∈ A(D) there is a unique g = gA : D → G such that A = g −1 Ag +g −1 dg is in the radial gauge, Ar = 0, and g(p) = 1, where p ∈ S 1 is a fixed point on the boundary. It follows that B = A(D)/G(D, ∂D) can be identified as Arad (D)×G(D)/G(D, ∂D) = Arad × PG, where Arad (D) is the set of potentials in the radial gauge and PG is the group of based loops, i.e., those loops in G which take the value 1 at the point p. The first factor Arad (D) is topologically trivial as a vector space. Thus in this case the topology of the Dirac determinant bundle over the moduli space B is given by the pull-back of the canonical line bundle over PG, [17], with respect to the map A → gA |∂D . The sections of DET → B are by definition complex functions λ : A(D) → C which obey the anomaly condition (4.2), [13].
4.3. Relation of the chiral anomaly to the commutator anomaly. The bulk anomaly and the extension (Schwinger terms) of the gauge group on the boundary are closely related, [13]. As we saw above, the Fock functor is determined by a choice of a section $b \mapsto H_b$ of the relative line bundle $\mathrm{DET}(K(D_b^+), W_{Y_b})$. The section transforms according to the chiral transformation law for regularized determinants, $H_{g \cdot b} = \omega(g; b) H_b$, for transformations $g$ which are equal to the identity on the boundary. If now $h$ is a transformation which is not equal to the identity on the boundary, we can define an operator $T(h)$ acting on sections by

$$ (T(h)\xi)(b) = \gamma(h; b)\, \xi(h^{-1} \cdot b), \tag{4.10} $$

where $\gamma$ is a complex function of modulus one and must be chosen in such a way that $\xi'(b) = (T(h)\xi)(b)$ satisfies the condition (4.2). Explicit expressions for $\gamma$ have been worked out in several cases, [14]. For example, if $\dim X = 2$ and $h$ is a gauge transformation then

$$ \gamma(h; A) = \exp\left( \frac{i}{2\pi} \int_X \mathrm{tr}\, A\, dh\, h^{-1} \right), \tag{4.11} $$

where tr is the trace in the representation of the gauge group determined by the action on Fermions. In general, $\gamma$ must satisfy the consistency condition

$$ \gamma(h; g \cdot b)\, \omega(hgh^{-1}; h^{-1} \cdot b) = \gamma(h; b)\, \omega(g; b). \tag{4.12} $$

In the two-dimensional gauge theory example,

$$ \omega(g; A) = \exp\left( \frac{i}{2\pi} \int_X \mathrm{tr}\, A\, dg\, g^{-1} + \frac{i}{24\pi} \int_M \mathrm{tr}\, (dg\, g^{-1})^3 \right). \tag{4.13} $$
The latter integral is evaluated over a 3-manifold M such that its boundary is the closed 2-manifold obtained from X by shrinking all its boundary components to a point. The introduction of the factor γ in (4.10) has the consequence that the composition law for the group elements is modified, T (g1 )T (g2 ) = θ(g1 , g2 ; b)T (g1 g2 ),
(4.14)
where θ is a S 1 valued function, defined by θ (g1 , g2 ; b) = γ (g1 g2 ; b)γ (g1 ; b)−1 γ (g2 ; g1 −1 · b).
(4.15)
Thus we have extended the original group of gauge transformations (diffeomorphisms) by the abelian group of circle valued functions on the parameter space B. At the Lie algebra level, the relation (4.14) leads to a modified commutator (by Schwinger terms discussed above) of the “naive” commutation relations of the algebra of infinitesimal gauge transformations or the algebra of vector fields on the manifold X. Actually, the modification is “sitting on the boundary”; the action of T (g) was defined in such a way that the (normal) subgroup of gauge transformations which are equal to the identity on the boundary acts trivially on the sections ξ(b). There is an additional slight twist to this statement. Actually, the normal subgroup is embedded in the extended group as the set of pairs (g, c(g)), where c(g) is the circle valued function defined by c(g) = γ (g; b)−1 ω(g; b).
(4.16)
The consistency condition (4.12) guarantees that the multiplication rule

$$ (g_1, c(g_1))(g_2, c(g_2)) = (g_1 g_2, c(g_1 g_2)) $$

holds in the extended group with the multiplication law

$$ (g_1, \mu_1)(g_2, \mu_2) = (g_1 g_2,\ \theta(g_1, g_2)\, \mu_1\, \mu_2^{g_1}), $$

where $\mu^g(b) = \mu(g^{-1} b)$.

4.4. Summary. Let us summarize the above discussion on Fock functors and group extensions. On the boundary manifold $Y = \partial X$ a choice of boundary conditions $W_b$ (labeled by a parameter space $B_Y$ of boundary geometries) defines a fermionic Fock space $F_{Y_b}$. The group of gauge transformations (or diffeomorphisms) on $Y$ acts in the bundle of Fock spaces (parameterized by geometric data on the boundary) through an abelian extension; the Lie algebra of the extension is determined by a 2-cocycle (Schwinger terms) which is computed via index theory from the curvature of the relative determinant bundles $\mathrm{DET}(W_b, W'_b)$, where $W'_b$ is the positive energy subspace defined by the boundary Dirac operator.

If the boundary is written as a union $Y = Y_{in} \cup Y_{out}$ of the ingoing and outgoing components then the Fock functor assigns to the geometric data on $X$ a linear operator $Z_X : F_{in} \to F_{out}$. A gauge transformation in the bulk $X$ sends $Z_X$ to $\gamma(g; X) Z_{g^{-1} X}$. This action defines an abelian extension of the gauge group. There is a normal subgroup isomorphic to the group of gauge transformations which are equal to the identity on the boundary. This subgroup acts trivially, therefore giving an action of (the abelian extension of) the quotient group on the boundary. The latter group is isomorphic to the group acting in the Fock bundle over boundary geometries.
5. Path Integral Formulae and a 0+1-Dimensional Example

In this section we outline the fermionic path integral formalism for an EBVP and explain how the Fock functor models this algebraically.

5.1. Path integral formulae. The analogue of (4.1) for an EBVP is

$$ Z_X(P) := \det(D_P) = \int_{E_P} e^{\int_X \psi^* D \psi\, dm}\, D\psi\, D\psi^*, \tag{5.1} $$

where $E_P = \mathrm{dom}(D_P)$. This is Eq. (1.4) for the case $S(\psi) = \int_X \psi^* D\psi\, dx$, where the local boundary condition $f$ has been replaced by the global boundary condition $P$. Consider a partition of the closed manifold $M = X_0 \cup_Y X_1$. The Dirac operator over $M$ restricts to Dirac operators $D^0$ over $X_0$ and $D^1$ over $X_1$. We assume that the geometry is tubular in a neighbourhood of the splitting manifold $Y$; then we have Grassmannians $Gr_{Y^i}$ of boundary conditions associated to $D^i$, where $Y^1 = Y = Y^0$. The reversal of orientation means that there is a diffeomorphism $Gr_{Y^0} \equiv Gr_{Y^1}$ given by $P \leftrightarrow I - P$, so that each $P \in Gr_{Y^0}$ defines the boundary value problems $D^0_P$ and $D^1_{I-P}$. According to (4.1) and (5.1), the analogue of the sewing formula (1.5) is

$$ \int_{E(M)} e^{\int_M \psi^* D_A^+ \psi\, dm}\, D\psi\, D\psi^* = \int_{Gr_Y} DP \int_{E_P(X_0)} e^{\int_{X_0} \psi_0^* D^0 \psi_0\, dx_0}\, D\psi_0\, D\psi_0^* \times \int_{E_{I-P}(X_1)} e^{\int_{X_1} \psi_1^* D^1 \psi_1\, dx_1}\, D\psi_1\, D\psi_1^*. \tag{5.2} $$

That is,

$$ Z_M = \int_{Gr_Y} Z_{X_0}(P)\, Z_{X_1}(I - P)\, DP \tag{5.3} $$

or

$$ \det(D) = \int_{Gr_Y} \det D^0_P \cdot \det D^1_{I-P}\, DP. \tag{5.4} $$

Notice that no regularization of the determinant is involved here. Without further choices, the path integral formulae express a relation between sections of the corresponding determinant bundles, and (5.4) is a pairing on the spaces of sections of those line bundles. More precisely, according to the properties of the Fock functor (see Sect. 3), the Fermionic integral may be rigorously understood as a linear functional $\wedge(H_Y \oplus \overline{H}_Y) \to \mathrm{Det}(D^0_P)$, while (5.4) is replaced by the evaluation of the Fock space bilinear pairing on vacuum elements (3.42):

$$ \det(D) = (\nu_{K(D^0)}, \nu_{K(D^1)}). \tag{5.5} $$
However, adopting a slightly different point of view gives a more precise meaning to the integral formulae above. With a given boundary condition P the determinants of the (chiral) Dirac operators on the manifolds X0 and X1 should be interpreted as elements of the determinant line bundle DET over the Grassmannian Gr Y , with base point H + . The actual numerical value of the Dirac determinant depends on the choice of a (local) trivialization. For example, one could define det(D) as the zeta function regularized
determinant det ζ ((DB )∗ DA ), where DB is a background Dirac operator chosen in such a way that (DB )∗ DA has a spectral cut, i.e., there is a cone in the complex plane with vertex at the origin and no eigenvalues of the operator lie inside of the cone. The value of the zeta determinant will depend on the choice of the background field B. A choice of an element in the line in DET over P 0 ∈ Gr Y is given by a choice of a pair (α, λ), where α : H+ → P 0 is a unitary map and λ ∈ C. It can be viewed as a holomorphic section of the dual determinant bundle DET∗ according to (2.24), ψ[α,λ] (ξ ) = λdetF (α ∗ ◦ π ◦ ξ ), where π : ξ(H+ ) → P 0 is the orthogonal projection. We can think of the variable ξ as the parameter for different elements W = ξ(H+ ) ∈ Gr Y . We want to replace the (ill-defined) integral GrY det(DP0 )det(DP1 )dP by a (so far ill-defined) integral of the form ψ[α,λ] (ξ )∗ ψ[β,µ] (ξ )dξ. (5.6) ξ
But this integral looks like the functional integral defining the inner product between a pair of fermionic wave functions (vectors in the Fock space) defined in Eq. (2.33): detF (α ∗ β) = !ψ[α,λ] , ψ[β,µ] " = ψα,λ (ξ(S))∗ ψ[β,µ] (ξ(S)). (5.7) S∈S
The relation with (5.5) is given by the identity (3.36) which tells us that (νK(D 0 ) , νK(D 1 ) ) = !νK(D 0 ) , νK(D 1 )⊥ ",
(5.8)
so here [α, λ] = det(PK 0 PK 0 ), [β, µ] = det(PK⊥1 PK⊥1 ). To illustrate this consider the case of the Dirac operator over a closed odd-dimensional spin manifold X partitioned by Y . According to (3.16), (3.35), the Fock pairing ( , ) : FK(D 0 ) (HY ) ⊗ FK(D 1 ) (H Y ) → Det(K(D 0 ), K(D 1 )⊥ ) ∼ = Det(DX ) evaluated on vacuum elements is the abstract determinant of the operator P (D 0 ) + P (D 1 ) : K(D 0 ) ⊗ K(D 1 ) → HY ,
(ξ, η) → ξ + η.
(5.9)
To realize this canonically as a number we can compare it to the Fock pairing coming from forming the doubled elliptic operator Ddouble = D 0 ∪ −D 0 over the closed double manifold X0 UY X0 [6]. To do that we use a canonical trivialization of the determinant lines resulting from the identification K(D 0 ) = graph(h0 : F + → F − ) proved in [19], where F ± are the spaces of positive and negative spinor fields over the even-dimensional boundary Y , h0 is a uniquely determined unitary isomorphism differing from g+ = (DY− DY+ )−1/2 DY+ by a smoothing operator, and DY± are the boundary chiral Dirac operators which we assume to be invertible. In particular, H + = graph(g+ : F + → F − ). Similarly K(D 1 ) = −1 by a smoothing operator F − → graph(h1 : F + → F − ), where h1 differs from −g+ + F . In this trivialization one has I h1 1 I h−1 1 0 1 0 , P (D ) = . P (D ) = 2 h0 I 2 h−1 I 1
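Hence the composition of (5.9) with the graph trivialization

$$ H_Y \to K(D^0) \oplus K(D^1), \qquad (\xi, \eta) \mapsto ((\xi, h_0 \xi), (h_1 \eta, \eta)), $$

gives us the automorphism of $H_Y = F^+ \oplus F^-$

$$ \begin{pmatrix} \xi \\ \eta \end{pmatrix} \mapsto \begin{pmatrix} I & h_1 \\ h_0 & I \end{pmatrix} \begin{pmatrix} \xi \\ \eta \end{pmatrix}. \tag{5.9b} $$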
In the case of Ddouble one has K(D 1 ) = K(D 0 )⊥ and so h1 in (5.9b) is then replaced by h1 = −h−1 0 . Hence the composition of (5.9b) with the inverse of (5.9b) for Ddouble is the operator I 21 (h1 + h−1 ξ ξ 0 ) , → η η 0 21 (I + h0 h1 ) which is of (Fredholm) determinant class and this gives the realization of the left side of (5.8) in the graph trivialization as 1 νK(D 0 ) , νK(D 1 )⊥ graph = detF (I − h0 h1 ). 2 On the other hand, a similar computation of the right side of (5.8) in the graph trivialization, using (5.7) yields 1 1 ∗ α−h−1 αh0 = detF (I − h1 h2 ) , 2 2 1 α+ the 1/2 arising from the relative term αh∗0 αh0 . Here αT = , where the column T α+ index labels the different vectors of the canonical basis for the graph of T : F + → F − , and the row labels of α+ label the different coordinates of a basis for F + . !νK(D 0 ) , νK(D 1 )⊥ "graph = detF
5.2. A (0 + 1)-dimensional example. The motivation for replacing the integration formula (5.6) by the sum in (5.7) comes from finite dimensions. If $H = H_- \oplus H_+$ is a decomposition of a $2N$-dimensional vector space into a pair of orthogonal $N$-dimensional subspaces, then the maps $\alpha, \beta, \xi$ above become (with respect to the basis $\{e_i\}$ with $i = \pm 1, \pm 2, \dots, \pm N$) $2N \times N$ matrices and we have the matrix identity

$$ \det(\alpha^* \beta) = \sum_{(i)} \det(\alpha^* \xi(i))\, \det(\xi(i)^* \beta), $$

the sum being over all sequences $-N \le i_1 < i_2 < \dots < i_N \le N$ (with $i_\nu \neq 0$). On the other hand, it follows from Eq. (3.48) in [10] that the following integration formula holds in this situation:

$$ \det(\alpha^* \beta) = a_N \int d\xi\, d\xi^*\, \det(\alpha^* \xi)\, \det(\xi^* \beta) \cdot \det(\xi^* \xi)^{-2N-1}, \tag{5.10} $$

where $a_N$ is a numerical factor and the last factor under the integral sign can be incorporated into the definition of the integration measure. If we consider the basis elements $\alpha_{-h_1^{-1}}$, $\beta_{h_0}$ for linear maps $h_i : H^+ \to H^-$ and integrate over the dense subspace $U_{graph}$ parameterizing the elements $\xi_T = (\xi^+, T\xi^-)$ with $T \in \mathrm{Hom}(H^+, H^-)$, the integral (5.10) becomes

$$ \det(1 - h_2 h_1) = a_N \int dT\, dT^*\, \det{}_{H^+}(1 - h_1 T)\, \det{}_{H^-}(1 + T^* h_2) \cdot \det(1 + T^* T)^{-2N-1}. \tag{5.11} $$
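The finite-dimensional matrix identity quoted at the start of this subsection is the Cauchy–Binet formula: the determinant of the $N \times N$ product $\alpha^* \beta$ is the sum, over all choices of $N$ basis vectors, of products of the corresponding $N \times N$ minors. The following sketch checks it on small examples. It assumes real integer entries, so that the adjoint is the transpose and the arithmetic is exact; the helper names are illustrative only.

```python
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def gram_det(alpha, beta):
    """det(alpha^* beta) for real 2N x N matrices alpha, beta."""
    return det(matmul(transpose(alpha), beta))

def cauchy_binet_sum(alpha, beta):
    """Sum over N-element index sets (i) of det(alpha^* xi(i)) * det(xi(i)^* beta).

    xi(i) is the 2N x N matrix whose columns are the chosen basis vectors, so
    xi(i)^* beta simply selects the corresponding rows of beta.
    """
    n = len(alpha[0])
    total = 0
    for rows in combinations(range(len(alpha)), n):
        alpha_rows = [alpha[r] for r in rows]   # xi(i)^* alpha
        beta_rows = [beta[r] for r in rows]     # xi(i)^* beta
        total += det(transpose(alpha_rows)) * det(beta_rows)
    return total

# Hypothetical test data with N = 2 (4 x 2 matrices).
alpha = [[1, 2], [0, 1], [3, -1], [2, 2]]
beta = [[2, 0], [1, 1], [-1, 4], [0, 3]]
assert gram_det(alpha, beta) == cauchy_binet_sum(alpha, beta)
```

The integral formula (5.10) plays the same role with the discrete sum over coordinate subspaces replaced by an integral over all of the Grassmannian.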
This has consequences for determinants in dimension one, where we work with the compact Grassmannian. Let X = [a0 , a1 ] and let E be a complex Hermitian bundle over X with unitary connection ∇. Then the associated generalized Dirac operator is simply D = i∇d/dx : C ∞ (X; E) → C ∞ (X; E). Choosing a trivialization of E, so that Ea0 ⊕ Ea1 = Cn ⊕ Cn , a global boundary condition D is specified by an element P ∈ Gr(Cn ⊕ Cn ), defining the elliptic boundary value problem: DP = i∇d/dx : dom(DP ) −→ L2 ([a0 , a1 ]; E). The Fock functor here is a topological 0+1-dimensional FQFT from the category C1 , whose objects are points endowed with a complex finite-dimensional Hermitian vector space V (we do not need to give a polarization in this finite-dimensional situation), and whose morphisms are compact 1-dimensional manifolds with boundary with Hermitian bundle with unitary connection. The Fock functor Z takes an object (p, V ) ∈ C1 to the Fock space Z(p, V ) := 2hol (Gr(V ); (Det(E))∗ )∗ ∼ = ∧V , where E is the usual canonical vector bundle over the Grassmannian. Consider two compatible morphisms T01 = ([a0 , a1 ], E 01 , ∇ 01 ) and T12 = ([a1 , a2 ], E 12 , ∇ 12 ) in C1 , so that T02 = T12 T01 = ([a0 , a2 ], E 02 , ∇ 02 ), with E 02 |[a0 ,a1 ] = E01 , etc. Let Vi be the fibre over ai , and in [ai , aj ] we assign ai to be “incoming” and aj to be “outgoing”. For incoming boundary components ai the associated object in C1 is (ai , Vi ). Then we define Z(Tij ) = νKij , where Kij ∈ Gr(V0 ⊕ Vj ) is the Calderon subspace of boundary values of solutions to the ‘Dirac’ ij operator D = i∇d/dx . We have Z(Tij ) ∈ Z((pi , V i ) (pj , Vj )) = Z(pi , V i ) ⊗ Z(pj , Vj ) ∼ = Hom(∧Vi , ∧Vj ) := FKij . = (∧V i ) ⊗ (∧Vj ) ∼ Because Kij = graph(hij : Vi → Vj ) with hij the parallel-transport of the connection on Eij between ai and aj , a simple computation gives under the above identification Z(Tij ) ←→ ∧hij ∈ Hom(∧Vi , ∧Vj ). 
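Two facts about exterior powers sit behind the identification $Z(T_{ij}) \leftrightarrow \wedge h_{ij}$: the assignment $h \mapsto \wedge h$ is multiplicative, so gluing intervals matches composing parallel transports, and the alternating sum of the traces of $\wedge^k h$ is $\det(I - h)$, which is how a supertrace over $\wedge V$ turns into a determinant. The sketch below checks both on small integer matrices, which stand in for the unitary parallel transports purely so that the arithmetic is exact; the function names are ours.

```python
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def exterior_power(m, k):
    """Matrix of the k-th exterior power of m in the basis e_I, |I| = k.

    The (I, J) entry is the k x k minor of m with rows I and columns J.
    """
    subsets = list(combinations(range(len(m)), k))
    return [[det([[m[r][c] for c in J] for r in I]) for J in subsets] for I in subsets]

def supertrace(m):
    """Sum over k of (-1)^k tr(wedge^k m)."""
    total = 0
    for k in range(len(m) + 1):
        p = exterior_power(m, k)
        total += (-1) ** k * sum(p[i][i] for i in range(len(p)))
    return total

def identity_minus(m):
    return [[(i == j) - m[i][j] for j in range(len(m))] for i in range(len(m))]

# Hypothetical stand-ins for two parallel transports.
h01 = [[1, 2, 0], [0, 1, -1], [3, 0, 2]]
h12 = [[2, 0, 1], [1, 1, 0], [0, -2, 1]]
h02 = matmul(h12, h01)

for k in range(4):
    # wedge^k is multiplicative: wedge^k(h12 h01) = wedge^k(h12) wedge^k(h01)
    assert exterior_power(h02, k) == matmul(exterior_power(h12, k), exterior_power(h01, k))

# Closing the interval up to a circle: the supertrace gives det(I - h).
assert supertrace(h02) == det(identity_minus(h02))
```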
Next we have a canonical pairing Z(a0 , V0 ) ⊗ Z(a1 , V1 ) ⊗ Z(a1 , V1 ) ⊗ Z(a2 , V2 ) −→ Z(a0 , V0 ) ⊗ Z(a2 , V2 ), (5.12) induced by subtraction V1 ⊕ V1 → V1 , with ∧h01 ⊗ ∧h12 −→ ∧h01 h12 . If we take the case, where a2 = a0 , so that T02 = T12 T01 = (S 1 = [a0 , a2 ], E 02 , ∇ 02 ), corresponding to morphisms in Gr(V0 ⊕ V1 ) and Gr(V1 ⊕ V0 ) respectively, then Z(T01 ) ∈ FK01
and $Z(T_{10}) \in F_{K_{10}^\perp}$, and the induced pairing $F_{K_{01}} \otimes F_{K_{10}^\perp} \to \mathbb{C}$, under the above identifications, is just the supertrace

$$ ( , ) : \mathrm{Hom}(\wedge V_0, \wedge V_1) \otimes \mathrm{Hom}(\wedge V_1, \wedge V_0) \to \mathbb{C}, \qquad (a, b) \mapsto \mathrm{tr}_s(ab) := \sum_k (-1)^k\, \mathrm{tr}(ab|_{\wedge^k}). $$

Applied to the vacuum elements $\nu_{W_{T_{01}}} \leftrightarrow \wedge T_{01} \in \mathrm{Hom}(\wedge V_0, \wedge V_1)$ and $\nu_{W^\perp_{T_{10}}} \leftrightarrow \wedge(-T_{10}^*) \in \mathrm{Hom}(\wedge V_1, \wedge V_0)$ we have

$$ (\nu_{W_{T_{01}}}, \nu_{W^\perp_{T_{10}}}) = \mathrm{tr}_s(\wedge(-T_{10}^*) \wedge T_{01}) = \sum_k \mathrm{tr}(\wedge^k (T_{10}^* T_{01})) = \det(I + T_{10}^* T_{01}). \tag{5.13} $$

Hence we have $(Z(T_{01}), Z(T_{10})) = \det(I - h_{10} h_{01})$ (since $h_{ij}$ is unitary), and $(Z(T_{01}), \nu_{W_T^\perp}) = \det(I + T^* h_{01})$. On the other hand it is well known that $\det(I + T^* h_{01}) = \det_\zeta(D_{P_T})$. So from Eq. (5.11) we have

$$ (Z(T_{01}), Z(T_{10})) = a_N \int dT\, dT^*\, \det{}_\zeta(D^{10}_{P_T})\, \det{}_\zeta(D^{01}_{P_{-T^*}}) \cdot \det(1 + T^* T)^{-2N-1}, \tag{5.14} $$

where $P_{-T^*} = I - P_T$, expressing the relation of the algebraic Fock space pairing to the path integral sewing formula Eq. (5.4).

Notice that the gauge group of a boundary component of $[a_0, a_1]$ is just a copy of the unitary group $U(n)$ and under the embedding $g \mapsto \mathrm{graph}(g) := W_g$, the Fock functor maps $g$ to $\wedge g$ on $\wedge V$. Thus in the case of 0+1 dimensions the FQFT representation of the boundary gauge group is the fundamental $U(n)$-representation, which is a restatement of the Borel–Weil Theorem for $U(n)$. This means that the invariant output by the FQFT, which in fact here is a TQFT, is the character of the fundamental representation $\pi$ of $U(n)$. This is what we would expect. We are dealing with a single particle evolving through time, and so its only invariants are the representations of its internal symmetry group, which is the symmetry group of the bundle $E$ over $[a, b]$. In this sense we are dealing with quantum mechanics, rather than QFT, and because it is a topological field theory the Hilbert space is finite-dimensional.

5.3. Relation to the Berezin integral. The above pairing can also be described by a Fermionic integral. Let $\wedge V$ denote the exterior algebra of the complex vector space $V$ with odd generators $\xi_1, \dots, \xi_n$. It has as its basis the monomials $\xi_I = \xi_{i_1} \cdots \xi_{i_p}$, $I = \{i_1, \dots, i_p\}$, $i_1 < \dots < i_p$, where $I$ runs over subsets of $\{1, \dots, n\}$, and we set $|I| := p$. The Fermionic (or Berezin) integral is the linear functional

$$ \int : \wedge V \to \mathbb{C}, \qquad f(\xi) \mapsto \int f(\xi)\, D\xi, $$

which picks out the top degree coefficient of $f(\xi)$ (a polynomial in the generators) relative to the generator $\xi = \xi_1 \cdots \xi_n$ of $\mathrm{Det}\, V = \wedge^n V$.
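This extends to a functional

$$ \int : \wedge \overline{V} \otimes \wedge V \to \mathbb{C}, \qquad f(\bar\xi, \xi) \mapsto \int f(\bar\xi, \xi)\, D\bar\xi\, D\xi, $$

defined relative to the generator $\bar\xi \xi := \bar\xi_1 \xi_1 \cdots \bar\xi_n \xi_n$ of $\mathrm{Det}\, \overline{V} \otimes \mathrm{Det}\, V$. Given an element $T \in \mathrm{End}(V)$ we associate to it the quadratic element $\bar\xi T \xi := \sum_{i,j} t_{ij}\, \bar\xi_i \xi_j$. We then have $\frac{1}{n!} (\bar\xi T \xi)^n = \det(T)\, \bar\xi \xi$, and more generally the Gaussian expression

$$ e^{\bar\xi T \xi} = \sum_I \det(T_I)\, \bar\xi_I \xi_I, $$

where $T_I$ denotes the submatrix $(t_{ij})$ with $i, j \in I$, so that we can write

$$ \int e^{\bar\xi T \xi}\, D\bar\xi\, D\xi = \det(T), \tag{5.15} $$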
so determinants are expressible as complex Fermion Gaussian integrals. Next, we have a bilinear form on ∧V ⊗ ∧V defined by < f, g >= g(ξ , ξ )σ f (ξ , ξ ) Dµ[(ξ, ξ ), (ξ , ξ )],
(5.16)
where f (ξ )σ is f (ξ ) with the order of the generators reversed, and f (ξ , ξ ) Dµ[(ξ, ξ ), (ξ , ξ )] := f (ξ , ξ )e2ξ ξ D(ξ, ξ )D(ξ , ξ ), is the Fermionic integral with respect to a Gaussian measure. The 2 arises in the exponent because we are dealing with ∧V ⊗ ∧V rather than ∧V . Applied to quadratic elements eξ T ξ and eξ Sξ defined for T , S ∈ End(V ) we then have ξ T ξ ξ Sξ !e , e " = eξ Sξ +ξ T ξ +2ξ ξ Dξ Dξ
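$$ = \int \exp\left[ (\bar\xi\ \ \bar\xi) \begin{pmatrix} I & T \\ S & I \end{pmatrix} \begin{pmatrix} \xi \\ \xi \end{pmatrix} \right] D\bar\xi\, D\xi = \det{}_{V \oplus V} \begin{pmatrix} I & T \\ S & I \end{pmatrix} = \det{}_V (I - ST). $$

Here we use (5.15) applied to $\begin{pmatrix} I & T \\ S & I \end{pmatrix} : V \oplus V \to V \oplus V$ and the general formula

$$ \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \det(d)\, \det(a - b d^{-1} c), $$

valid provided $d : V \to V$ is invertible.

We can repeat the process for a pair of complex vector spaces $V_0 \neq V_1$ of the same dimension and $T \in \mathrm{Hom}(V_0, V_1)$, $S \in \mathrm{Hom}(V_1, V_0)$. Now define the Fermionic integral just to be the projection onto the form of top degree, $\int : \wedge \overline{V}_0 \otimes \wedge V_1 \to \mathrm{Det}(V_0, V_1)$. Associated to $T$ we have $e^T \in \wedge V_0^* \otimes \wedge V_1 \cong \mathrm{Hom}(\wedge V_0, \wedge V_1)$, which we may regard as an element of $\wedge \overline{V}_0 \otimes \wedge V_1$ via the Hermitian isomorphism $\overline{V}_0 \cong V_0^*$, and then $\int e^T = \det(T) \in \mathrm{Det}(V_0, V_1)$. Here $\det(T)$ is the element $\det(T)(\xi_1 \cdots \xi_n) = T\xi_1 \cdots T\xi_n$, for a basis $\xi_i$ of $V_0$, which is canonically identified with $\det(T) \in \mathbb{C}$ when $V_0 = V_1$, and the Gaussian element is then $e^T = e^{\bar\xi T \xi}$. The bilinear pairing goes through as before, with $\langle e^T, e^S \rangle = \det_{V_0}(I - ST)$, which gives an alternative formulation of the Fock pairing $\langle\ ,\ \rangle_{\det} : F_{W_{T_0}} \times F_{W^\perp_{T_1}} \to \mathbb{C}$.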
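The statements of this subsection about Gaussian Berezin integrals can be checked mechanically in low dimension. The sketch below implements a small Grassmann algebra with exact arithmetic and verifies that the top coefficient of $e^{\bar\xi T \xi}$ is $\det(T)$, that its other coefficients are the principal minors $\det(T_I)$, and that the block determinant used above equals $\det(I - ST)$. Two assumptions are ours and not stated in the text: the generators are ordered as $\bar\xi_1, \xi_1, \dots, \bar\xi_n, \xi_n$, and $\bar\xi_I \xi_I$ is read as the interleaved product $\bar\xi_{i_1} \xi_{i_1} \cdots \bar\xi_{i_p} \xi_{i_p}$, in line with the definition of $\bar\xi \xi$. All function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j in range(len(m)))

def mono_mul(a, b):
    """Multiply two sorted monomials of odd generators; return (sign, sorted monomial)."""
    if set(a) & set(b):
        return 0, ()                      # a repeated odd generator gives zero
    seq = a + b
    inversions = sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq))
                     if seq[i] > seq[j])
    return (-1) ** inversions, tuple(sorted(seq))

def mul(x, y):
    """Product of two algebra elements stored as {monomial: coefficient}."""
    out = {}
    for ma, ca in x.items():
        for mb, cb in y.items():
            sign, m = mono_mul(ma, mb)
            if sign:
                out[m] = out.get(m, 0) + sign * ca * cb
    return {m: c for m, c in out.items() if c != 0}

def gaussian(T):
    """exp(xi_bar T xi), with xi_bar_i -> generator 2i and xi_i -> generator 2i+1."""
    n = len(T)
    x = {}
    for i in range(n):
        for j in range(n):
            sign, m = mono_mul((2 * i,), (2 * j + 1,))
            if T[i][j]:
                x[m] = sign * Fraction(T[i][j])
    result, term = {(): Fraction(1)}, {(): Fraction(1)}
    for k in range(1, n + 1):             # x is nilpotent: x**(n+1) == 0
        term = {m: c / k for m, c in mul(term, x).items()}
        for m, c in term.items():
            result[m] = result.get(m, 0) + c
    return result

def berezin(f, n):
    """Coefficient of the top monomial xi_bar_1 xi_1 ... xi_bar_n xi_n."""
    return f.get(tuple(range(2 * n)), 0)

def principal_minor(T, I):
    return det([[T[i][j] for j in I] for i in I])

# Hypothetical test matrices.
T = [[2, -1, 0], [1, 3, 4], [0, 5, -2]]
S = [[1, 0, 2], [0, -1, 1], [3, 1, 0]]
n = len(T)
E = gaussian(T)

assert berezin(E, n) == det(T)                                   # formula (5.15)
for p in range(n + 1):
    for I in combinations(range(n), p):
        mono = tuple(g for i in I for g in (2 * i, 2 * i + 1))
        assert E.get(mono, 0) == principal_minor(T, I)           # Gaussian expansion

eye = [[int(i == j) for j in range(n)] for i in range(n)]
block = [eye[i] + T[i] for i in range(n)] + [S[i] + eye[i] for i in range(n)]
ST = [[sum(S[i][k] * T[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
assert det(block) == det([[eye[i][j] - ST[i][j] for j in range(n)] for i in range(n)])
```

Exact rational arithmetic is used so that the comparisons are equalities rather than floating-point approximations; the Laplace expansion is only practical for the small sizes used here.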
References

1. Atiyah, M.F.: Topological quantum field theories. Inst. Hautes Etudes Sci. Publ. Math. 68, 175 (1989)
2. Atiyah, M.F. and Singer, I.M.: Dirac operators coupled to vector potentials. Proc. Nat. Acad. Sci. USA 81, 2597 (1984)
3. Atiyah, M.F., Patodi, V.K., and Singer, I.M.: Spectral asymmetry and Riemannian geometry. I. Math. Proc. Cambridge Phil. Soc. 77, 43 (1975)
4. Berline, N., Getzler, E. and Vergne, M.: Heat Kernels and Dirac Operators. Grundlehren der Mathematischen Wissenschaften 298, Berlin: Springer-Verlag, 1992
5. Bismut, J-M and Freed, D.S.: The analysis of elliptic families. II. Dirac operators, eta invariants, and the holonomy theorem. Commun. Math. Phys. 107, 103 (1986)
6. Booß-Bavnbek, B., and Wojciechowski, K.P.: Elliptic Boundary Problems for Dirac Operators. Boston: Birkhäuser, 1993
7. Carey, A., Mickelsson, J., and Murray, M.: Index theory, gerbes and Hamiltonian quantization. Commun. Math. Phys. 183, 707 (1997)
8. Ekstrand, C., and Mickelsson, J.: Gravitational anomalies, gerbes, and hamiltonian quantization. Commun. Math. Phys. 212, 613 (2000)
9. Faddeev, L., Shatashvili, S.: Algebraic and Hamiltonian methods in the theory of nonabelian anomalies. Theoret. Math. Phys. 60, 770 (1985)
10. Fujii, K., Kashiwa, T., and Sakoda, S.: Coherent states over Grassmann manifolds and the WKB exactness in path integral. J. Math. Phys. 37, 567 (1996)
11. Grubb, G.: Trace expansions for pseudodifferential boundary problems for Dirac-type operators and more general systems. Ark. Mat. 37, 45 (1999)
12. Mickelsson, J.: Chiral anomalies in even and odd dimensions. Commun. Math. Phys. 97, 361 (1985)
13. Mickelsson, J.: Kac–Moody groups, topology of the Dirac determinant bundle, and fermionization. Commun. Math. Phys. 110, 173 (1987)
14. Mickelsson, J.: Current algebras and groups. London and New York: Plenum Press, 1989
15. Mickelsson, J.: On the hamiltonian approach to commutator anomalies in 3+1 dimensions. Phys. Lett. B 241, 70 (1990)
16. Piazza, P.: Determinant bundles, manifolds with boundary and surgery. Commun. Math. Phys. 178, 597 (1996)
17. Pressley, A. and Segal, G.B.: Loop Groups. Oxford: Clarendon Press, 1986
18. Quillen, D.G.: Determinants of Cauchy–Riemann operators over a Riemann surface. Funkcionalnyi Analiz i ego Prilozhenya 19, 37 (1985)
19. Scott, S.G.: Determinants of Dirac boundary value problems over odd-dimensional manifolds. Commun. Math. Phys. 173, 43 (1995)
20. Scott, S.G.: Splitting the curvature of the determinant line bundle. Proc. Am. Math. Soc. 128, 2763–2775 (2000)
21. Scott, S.G., and Wojciechowski, K.P.: The ζ-Determinant and Quillen's determinant for a Dirac operator on a manifold with boundary. Geom. Funct. Anal. 10, 1202–1236 (2000)
22. Segal, G.B.: The definition of conformal field theory. Oxford preprint (1990)
23. Segal, G.B.: Geometric aspects of quantum field theory. Proc. Int. Cong. Math., Tokyo (1990)
24. Segal, G.B., and Wilson, G.: Loop groups and equations of the KdV type. Inst. Hautes Etudes Sci. Publ. Math. 61, 5 (1985)
25. Witten, E.: Topological quantum field theory. Commun. Math. Phys. 117, 353 (1988)
26. Witten, E.: Geometry and physics. Proc. Int. Cong. Math., Tokyo (1990)

Communicated by R. H. Dijkgraaf
Commun. Math. Phys. 219, 607 – 629 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Isotropic Steady States in Galactic Dynamics

Yan Guo¹, Gerhard Rein²

¹ Lefschetz Center for Dynamical Systems, Division of Applied Mathematics, Brown University, Providence, RI 02912, USA
² Mathematisches Institut der Universität München, Theresienstr. 39, 80333 München, Germany
Received: 24 October 2000 / Accepted: 7 January 2001
Abstract: The present paper completes our earlier results on nonlinear stability of stationary solutions of the Vlasov–Poisson system in the stellar dynamics case. By minimizing the energy under a mass-Casimir constraint we construct a large class of isotropic, spherically symmetric steady states and prove their nonlinear stability against general, i.e., not necessarily symmetric perturbations. The class is optimal in a certain sense; in particular, it includes all polytropes of finite mass with decreasing dependence on the particle energy.

1. Introduction

The question of which galaxies or globular clusters are stable has for many years attracted considerable attention in the astrophysics literature, cf. [4, 6] and the references there. If one neglects relativistic effects and collisions among the stars, then from a mathematics point of view the question is which steady states of the Vlasov–Poisson system

$$ \partial_t f + v \cdot \partial_x f - \partial_x U \cdot \partial_v f = 0, $$

$$ \Delta U = 4\pi \rho, \qquad \lim_{|x| \to \infty} U(t, x) = 0, $$

$$ \rho(t, x) = \int f(t, x, v)\, dv, $$

are stable. Here $f = f(t, x, v) \ge 0$ denotes the density of the stars in phase space, $t \in \mathbb{R}$ denotes time, $x, v \in \mathbb{R}^3$ denote position and velocity respectively, $\rho$ is the spatial mass density of the stars, and $U$ the gravitational potential which the ensemble induces collectively. If $U_0$ is a time-independent potential then the particle energy

$$ E = \frac{1}{2} |v|^2 + U_0(x), \tag{1.1} $$
is conserved along the characteristics of the Vlasov equation. Therefore, a standard technique to obtain steady states of the Vlasov–Poisson system is to prescribe the particle distribution $f_0$ as a function of the particle energy (this takes care of the Vlasov equation) and to solve self-consistently the remaining Poisson equation. The main problem then is to show that the resulting steady state has finite mass and possibly compact support. A well known class of steady states for which this approach works are the so-called polytropes

$$ f_0(x, v) = (E_0 - E)_+^k. \tag{1.2} $$

Here $(\cdot)_+$ denotes the positive part, $E_0 \in \mathbb{R}$ is a cut-off energy, and $-1/2 < k \le 7/2$; only for this range of exponents do these steady states have finite mass, and if $k < 7/2$ they have compact support in addition. If $f_0$ depends only on the particle energy, the resulting steady state is isotropic and spherically symmetric. Assuming spherical symmetry of $U_0$ to begin with, steady states may also depend on a further conserved quantity, the modulus of angular momentum squared,

$$ L := |x|^2 |v|^2 - (x \cdot v)^2, \tag{1.3} $$
in which case they are no longer isotropic. According to Jeans’ Theorem the distribution function of any spherically symmetric steady state has to be a function of the invariants E and L, cf. [2]. In [7, 9, 10, 17] we addressed the stability of steady states by a variational technique: It was shown that an appropriately chosen energy-Casimir functional has a minimizer under the constraint that the mass is prescribed, this minimizer was shown to be a steady state, and its nonlinear stability was derived from its minimizing property. While this turned out to be an efficient method to assess the stability of known steady states and also to construct new ones which automatically have finite mass, compact support, and are stable, there were two unwanted restrictions: Perturbations had to be spherically symmetric, and only the polytropes with 0 < k < 3/2 were covered. A physically realistic perturbation, say by the gravitational pull of some distant galaxy, is hardly spherically symmetric. Also, while the restriction k > 0 was indispensable – for k ≤ 0 no corresponding Casimir functional can be defined – and is probably necessary for stability since it makes f0 a decreasing function of the particle energy, the restriction k < 3/2 is less well motivated. In [8] the first author removed the latter restriction in the case of the polytropes, while in [18] the second author removed the restriction to spherically symmetric perturbations for a class of isotropic steady states including the polytropes with k < 3/2. It is the purpose of the present paper to combine these techniques to obtain a result which we believe is optimal in the following sense: It does not require any symmetry restrictions of the perturbations, and it covers all isotropic polytropes. As a matter of fact, the restrictions we require of the steady states are necessary to guarantee finite mass and to make the distribution function a decreasing function of the energy. 
Presumably, if the latter condition is violated sufficiently strongly, then the steady state is unstable. The new elements in our analysis which allow for the improvements described above are the following: Previously we minimized an energy-Casimir functional under a mass constraint. Now we minimize the total energy of the system under a mass-Casimir constraint. The change of the role of the Casimir functional – from part of the minimized functional into part of the constraint – allows us to remove the restriction k < 3/2 and was introduced in [8]. It also leads to a much cleaner assumption on the steady state or on the Casimir functional respectively. Inspired by the concentration-compactness argument
due to P. L. Lions [14], which was used in [18], we use a refinement of our previous scaling and splitting argument in the compactness analysis of the energy functional to get rid of the symmetry assumption for the perturbations. The investigation is restricted to isotropic steady states: If one includes anisotropic ones then the Casimir functional is not conserved along not spherically symmetric solutions, and the method breaks down. Anisotropic steady states are – under the appropriate assumptions – stable against spherically symmetric perturbations. Whether or not they are stable against general perturbations remains an open problem. The paper proceeds as follows: In the next section we establish some preliminary estimates which in particular show that the total energy is bounded from below and the kinetic energy is bounded along minimizing sequences. In Sect. 3 the existence of a minimizer of the energy is established. To prevent mass from running off to spatial infinity along a minimizing sequence we analyze how the total energy behaves under scaling transformations and under splittings of the distribution into different pieces. In Sect. 4 we show that such minimizers are spherically symmetric steady states of the Vlasov–Poisson system with finite mass and compact support. The stability properties of the steady states are then discussed in Sect. 5. Here we point out one problem: If f0 is a steady state then f0 (x + t V , v + V ) for any given velocity V ∈ R3 is a solution of the Vlasov–Poisson system which for V small starts close to f0 , but travels away from f0 at a linear rate in t. This trivial “instability”, which cannot be present for spherically symmetric perturbations, is handled by comparing f0 with an appropriate shift in x-space of the time dependent perturbed solution f (t). Technically, the necessity of this shift arises in the application of our compactness argument. 
The considerations discussed so far are restricted to Casimir functionals satisfying a growth condition which excludes the polytropic case $k = 7/2$. This limiting case, the so-called Plummer sphere, is investigated in the final section. It poses additional difficulties due to a particular scaling invariance of the various functionals considered, but by using rearrangements we are able to reduce it to the same problem with symmetry, which has been investigated in [8]. This shows that it can be essential to understand the symmetric case first, and we hope that such a reduction to symmetry can be applied to other problems as well.

We conclude the introduction with some references, where we also compare our approach with other approaches to the stability problem. The first nonlinear stability result for the Vlasov–Poisson system in the present stellar dynamics case is due to G. Wolansky [22]. It is restricted to spherically symmetric perturbations of the polytropes
\[ f_0(x,v) = (E_0 - E)_+^k L^l \tag{1.4} \]
with exponents $l > -1$, $0 < k < l + 3/2$ with $k \neq -l - 1/2$, and uses a variational approach for a reduced functional which is not defined on a set of phase space densities $f$ but on a set of mass functions $M(r) := \int_{|y| \le r} \rho(y)\,dy$ with $r \ge 0$ denoting the radial coordinate. In particular, it does not yield a stability estimate directly for the phase space distribution $f$. In [21] Y.-H. Wan proves stability by a careful investigation of the quadratic and higher order terms in a Taylor expansion of the energy-Casimir functional about a steady state. He has to assume the existence of the steady state, and requires a strong condition on $f_0$ which is satisfied by the polytropes only for $k = 1$ and $l = 0$, but his arguments do not require spherical symmetry of the admissible perturbations. We also mention [1] where stability for the limiting polytropic case $k = 7/2$ and $l = 0$ is considered. Global classical solutions to the initial value problem for the Vlasov–Poisson system were first established in [15], cf. also [20]. A rigorous result on linearized stability is given in [3]. For the plasma physics case, where the sign in the Poisson equation is
reversed, the stability problem is better understood; we refer to [5, 11, 12, 16]. Finally, a very general condition which guarantees finite mass and compact support of steady states, but not their stability, is established in [19].

2. Preliminaries

For a measurable function $f = f(x,v)$ we define
\[ \rho_f(x) := \int f(x,v)\,dv, \quad x \in \mathbb{R}^3, \]
and
\[ U_f := -\rho_f * \frac{1}{|\cdot|}. \]
Next we define
\[ E_{\mathrm{kin}}(f) := \frac12 \iint |v|^2 f(x,v)\,dv\,dx, \]
\[ E_{\mathrm{pot}}(f) := -\frac{1}{8\pi} \int |\nabla U_f(x)|^2\,dx = -\frac12 \iint \frac{\rho_f(x)\rho_f(y)}{|x-y|}\,dx\,dy, \]
\[ \mathcal{H}(f) := E_{\mathrm{kin}}(f) + E_{\mathrm{pot}}(f), \]
and
\[ \mathcal{C}(f) := \iint Q(f(x,v))\,dv\,dx, \]
where $Q$ is a given function satisfying certain assumptions specified below. We will minimize the total energy or Hamiltonian $\mathcal{H}$ of the system under a mass-Casimir constraint, i.e., over the set
\[ \mathcal{F}_M := \left\{ f \in L^1(\mathbb{R}^6) \mid f \ge 0,\ \mathcal{C}(f) = M,\ E_{\mathrm{kin}}(f) < \infty \right\}, \tag{2.1} \]
where $M > 0$ is prescribed. The function $Q$ has to satisfy the following

Assumptions on $Q$: $Q \in C^1([0,\infty[)$, $Q \ge 0$, $Q(0) = 0$, and
(Q1) $Q(f) \ge C\,(f + f^{1+1/k})$, $f \ge 0$, with constants $C > 0$ and $0 < k < 7/2$,
(Q2) $Q$ is convex.

Remark. (a) In the last section we consider the limiting case $k = 7/2$ for which
\[ Q(f) := f^{9/7}, \quad f \ge 0. \]
(b) On their support the minimizers obtained later will satisfy the relation $\lambda_0 Q'(f_0) = E$ with some Lagrange multiplier $\lambda_0 < 0$ and $E$ as defined in (1.1). Thus $f_0$ is a function of the particle energy and hence a steady state of the Vlasov–Poisson system, provided this identity can be inverted.
(c) A typical example of a function $Q$ satisfying the assumptions is
\[ Q(f) = f + f^{1+1/k}, \quad f \ge 0, \tag{2.2} \]
with $0 < k < 7/2$, which leads to a steady state of polytropic form (1.2). More generally, if an isotropic steady state $(f_0, U_0)$ is given with $f_0$ of the form $f_0(x,v) = \phi(E)$ with some function $\phi$, then the above assumptions for the Casimir functional hold if $\phi(E)$ vanishes for $E$ larger than some cut-off energy $E_0$, $\phi(E) \le C (E_0 - E)^k$, $E \le E_0$, where $0 < k < 7/2$, and $\phi'(E) < 0$, $E < E_0$. The existence of a cut-off energy is necessary in order that the steady state has finite mass. The growth condition is essential for the compactness properties of $\mathcal{H}$; cf. the difficulties in the limiting case $k = 7/2$, and note also that the polytropic ansatz with $k > 7/2$ leads to steady states with infinite mass. Finally, it is generally believed that steady states are unstable if the monotonicity condition on $\phi$ is violated sufficiently strongly. In this sense one can say that the assumptions on $Q$ are optimal.
(d) The function
\[ Q(f) = \begin{cases} f, & 0 \le f \le 1, \\ \tfrac12 (f^2 + 1), & f > 1 \end{cases} \tag{2.3} \]
also satisfies our assumptions and leads to
\[ f_0(x,v) = \begin{cases} E/E_0, & E < E_0, \\ 0, & E \ge E_0 \end{cases} \tag{2.4} \]
with some $E_0 < 0$. Thus the fact that we do not require $Q \in C^2(]0,\infty[)$ with $Q'' > 0$ allows for examples where $f_0$ has jump discontinuities, and these steady states will turn out to be dynamically stable as well.

We collect some estimates for $\rho_f$ and $U_f$ induced by an element $f \in \mathcal{F}_M$. As in the rest of the paper constants denoted by $C$ are positive, may depend on $Q$ and $M$, and their value may change from line to line.

Lemma 1. Let $n := k + 3/2$ so that $1 + 1/n > 6/5$. Then for any $f \in \mathcal{F}_M$ the following holds:
(a) $f \in L^{1+1/k}(\mathbb{R}^6)$ with
\[ \iint f^{1+1/k}\,dv\,dx + \iint f\,dv\,dx \le C. \]
(b) $\rho_f \in L^{1+1/n}(\mathbb{R}^3)$ with
\[ \int \rho_f^{1+1/n}\,dx \le C \left( \iint f^{1+1/k}\,dv\,dx \right)^{k/n} \left( \iint |v|^2 f\,dv\,dx \right)^{(n-k)/n} \le C\, E_{\mathrm{kin}}(f)^{3/(2n)}. \]
(c) $U_f \in L^6(\mathbb{R}^3)$ with $\nabla U_f \in L^2(\mathbb{R}^3)$, the two forms of $E_{\mathrm{pot}}(f)$ stated above are equal, and
\[ \int |\nabla U_f|^2\,dx \le C \|\rho_f\|_{6/5}^2 \le C\, E_{\mathrm{kin}}(f)^{1/2}. \]
The assertions in (b) and (c) remain valid in the limiting case $k = 7/2$, where $n = 5$, cf. Remark (a) above.

Proof. Part (a) is obvious from assumption (Q1). Splitting the $v$-integral according to $|v| \le R$ and $|v| > R$ and optimizing in $R$ yields
\[ \rho_f^{1+1/n} \le C \left( \int f^{1+1/k}\,dv \right)^{k/n} \left( \int |v|^2 f\,dv \right)^{(n-k)/n}. \]
Therefore, the first estimate in (b) follows from Hölder's inequality with indices $n/k$ and $n/(n-k)$, and part (a) implies the second estimate in (b). Since $\rho_f \in L^1 \cap L^{1+1/n}(\mathbb{R}^3)$ and $1 + 1/n > 6/5$ we find by interpolation,
\[ \|\rho_f\|_{6/5}^{6/5} \le C\, E_{\mathrm{kin}}(f)^{3/10}; \]
in the limiting case this follows directly without interpolation. The estimates for $U_f$ follow from the generalized Young's inequality, and the equality of the two representations for $E_{\mathrm{pot}}(f)$ follows by integration by parts after regularizing $\rho_f$ if necessary.

As an immediate corollary of the lemma above we note that on $\mathcal{F}_M$ the total energy $\mathcal{H}$ is bounded from below in such a way that $E_{\mathrm{kin}}$, and thus certain norms of $f$ and $\rho_f$, remain bounded along minimizing sequences:

Lemma 2. There exists a constant $C > 0$ such that
\[ \mathcal{H}(f) \ge E_{\mathrm{kin}}(f) - C\, E_{\mathrm{kin}}(f)^{1/2}, \quad f \in \mathcal{F}_M, \]
in particular,
\[ h_M := \inf_{\mathcal{F}_M} \mathcal{H} > -\infty, \]
and $E_{\mathrm{kin}}$ is bounded along minimizing sequences of $\mathcal{H}$ in $\mathcal{F}_M$.

The behavior of $\mathcal{H}$ and $\mathcal{C}$ under scaling transformations can be used to show that $h_M$ is negative and to relate the $h_M$'s for different values of $M$:

Lemma 3. (a) Let $M > 0$. Then $-\infty < h_M < 0$.
(b) For all $M, \bar M > 0$,
\[ h_{\bar M} = \left( \bar M / M \right)^{7/3} h_M. \]
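The exponent $7/3$ in Lemma 3 (b) comes from pure exponent bookkeeping for the rescaling $\bar f(x,v) = f(ax,bv)$ used in the proof. The following Python sketch tracks only the powers of the scaling parameter; the helper name `scaling_exponents` and the parametrization $a = s^{\alpha}$, $b = s^{\beta}$ are illustrative assumptions, not notation from the paper.

```python
from fractions import Fraction

def scaling_exponents(alpha, beta):
    """Powers of s picked up by C, E_kin and E_pot under
    f_bar(x, v) = f(a x, b v) with a = s**alpha, b = s**beta (hypothetical helper)."""
    casimir = -3 * (alpha + beta)        # C(f_bar)     = (a b)^(-3)    C(f)
    kinetic = -3 * alpha - 5 * beta      # E_kin(f_bar) = a^(-3) b^(-5) E_kin(f)
    potential = -5 * alpha - 6 * beta    # E_pot(f_bar) = a^(-5) b^(-6) E_pot(f)
    return casimir, kinetic, potential

# Part (a): a = 1/b, i.e. alpha = -1, beta = 1 with s = b.
c_a, k_a, p_a = scaling_exponents(Fraction(-1), Fraction(1))

# Part (b): b = a^(-2), i.e. alpha = 1, beta = -2 with s = a.
c_b, k_b, p_b = scaling_exponents(Fraction(1), Fraction(-2))

# The constraint value scales like a^3 and the energy like a^7,
# so the infimum scales with the constraint value to the power 7/3.
h_exponent = Fraction(k_b) / Fraction(c_b)
print(c_a, k_a, p_a)
print(c_b, k_b, p_b, h_exponent)
```

In case (a) the constraint is untouched while the kinetic part decays like $b^{-2}$ and the negative potential part only like $b^{-1}$, which is why the energy becomes negative for large $b$.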
Proof. Given any function $f$, we define a rescaled function $\bar f(x,v) = f(ax,bv)$, where $a, b > 0$. Then
\[ \mathcal{C}(\bar f) = \iint Q(f(ax,bv))\,dv\,dx = (ab)^{-3}\, \mathcal{C}(f), \tag{2.5} \]
i.e., $f \in \mathcal{F}_M$ iff $\bar f \in \mathcal{F}_{\bar M}$, where $\bar M := (ab)^{-3} M$. Next
\[ E_{\mathrm{kin}}(\bar f) = \frac12 \iint |v|^2 f(ax,bv)\,dv\,dx = a^{-3} b^{-5} E_{\mathrm{kin}}(f), \]
\[ E_{\mathrm{pot}}(\bar f) = -\frac12 \iiiint \frac{f(ax,bv)\, f(ay,bw)}{|x-y|}\,dw\,dv\,dy\,dx = a^{-5} b^{-6} E_{\mathrm{pot}}(f). \]
To prove (a) we fix any $f \in \mathcal{F}_M$ and let $a = b^{-1}$ so that $\bar f \in \mathcal{F}_M$ as well. Then
\[ \mathcal{H}(\bar f) = b^{-2} E_{\mathrm{kin}}(f) + b^{-1} E_{\mathrm{pot}}(f) < 0 \]
for $b > 0$ sufficiently large, since $E_{\mathrm{pot}}(f) < 0$. To prove (b) choose $a$ and $b$ such that $a^{-3} b^{-5} = a^{-5} b^{-6}$, i.e., $b = a^{-2}$. Then
\[ \mathcal{H}(\bar f) = a^7\, \mathcal{H}(f), \tag{2.6} \]
and since $a = (\bar M / M)^{1/3}$ and the mapping $\mathcal{F}_M \to \mathcal{F}_{\bar M}$, $f \mapsto \bar f$ is one-to-one and onto this proves (b).

One should note that both Lemma 2 and Lemma 3 remain valid in the limiting case $k = 7/2$.

3. Existence of Minimizers for k < 7/2

It is conceivable that along a minimizing sequence the mass could run off to spatial infinity and/or spread uniformly in space. The main problem in proving the existence of a minimizer is to show that this does not happen, which is done in the next lemma. Combined with a local compactness result for the induced fields and a new version of the splitting technique developed in our previous papers this will yield the existence of minimizers.

Lemma 4. Let $(f_i) \subset \mathcal{F}_M$ be a minimizing sequence of $\mathcal{H}$. Then there exist a sequence $(a_i) \subset \mathbb{R}^3$ and $\epsilon_0 > 0$, $R_0 > 0$ such that
\[ \int_{a_i + B_{R_0}} \int Q(f_i)\,dv\,dx \ge \epsilon_0 \]
for all sufficiently large $i \in \mathbb{N}$. Here we define $B_R := \{ x \in \mathbb{R}^3 \mid |x| \le R \}$.

Proof. For $R > 1$ define
\[ K_R(x) := \begin{cases} R, & |x| < 1/R, \\ 1/|x|, & 1/R \le |x| \le R, \\ 0, & |x| > R, \end{cases} \]
and
\[ F_R(x) := \frac{1}{|x|}\, 1_{\{|x| > R\}}(x), \qquad G_R(x) := \left( \frac{1}{|x|} - R \right) 1_{\{|x| < 1/R\}}(x) \]
so that we split the kernel
\[ \frac{1}{|x|} = K_R(x) + F_R(x) + G_R(x), \quad x \in \mathbb{R}^3. \tag{3.1} \]
Here $1_A$ denotes the indicator function of the set $A$. We split
\[ \frac{1}{4\pi} \int |\nabla U_i|^2\,dx = \iint \frac{\rho_i(x)\rho_i(y)}{|x-y|}\,dy\,dx = I_1 + I_2 + I_3 \tag{3.2} \]
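according to (3.1), where $\rho_i := \rho_{f_i}$. Since $(\rho_i)$ is bounded in $L^{1+1/n}(\mathbb{R}^3)$ and by (Q1) also in $L^1(\mathbb{R}^3)$, we find from Lemma 1 (b), using the boundedness of the kinetic energy,
\[ \begin{aligned} |I_1| &\le R \iint_{|x-y| \le R} \rho_i(x)\, \rho_i(y)\,dx\,dy \le R\, C \sup_{y \in \mathbb{R}^3} \int_{y + B_R} \rho_i(x)\,dx \\ &\le R^{(n+4)/(n+1)}\, C \sup_{y \in \mathbb{R}^3} \left( \int_{y + B_R} \rho_i^{1+1/n}\,dx \right)^{n/(n+1)} \\ &\le R^{(n+4)/(n+1)}\, C \sup_{y \in \mathbb{R}^3} \left( \int_{y + B_R} \int f_i^{1+1/k}\,dv\,dx \right)^{k/(n+1)}, \end{aligned} \tag{3.3} \]
and
\[ |I_2| \le \frac{1}{R} \iint \rho_i(x)\, \rho_i(y)\,dx\,dy \le C R^{-1}, \]
\[ |I_3| \le \|\rho_i\|_{1+1/n}\, \|\rho_i * G_R\|_{n+1} \le C \|G_R\|_{(n+1)/2} \le C R^{-(5-n)/(n+1)}; \]
for the last estimate we used Hölder's and Young's inequality. Since $(f_i)$ is a minimizing sequence we have, for any $R > 1$,
\[ h_M / 2 > \mathcal{H}(f_i) \ge -|I_1| - |I_2| - |I_3|, \tag{3.4} \]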
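provided $i$ is sufficiently large. Therefore,
\[ \begin{aligned} \liminf_{i \to \infty} \sup_{y \in \mathbb{R}^3} \left( \int_{y + B_R} \int f_i^{1+1/k}\,dv\,dx \right)^{k/(n+1)} &\ge C \liminf_{i \to \infty} |I_1|\, R^{-(n+4)/(n+1)} \\ &\ge C R^{-(n+4)/(n+1)} \left( -h_M/2 - R^{-1} - R^{-(5-n)/(n+1)} \right). \end{aligned} \tag{3.5} \]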
By Lemma 3 (a) the right-hand side of this estimate is positive for R sufficiently large, and the proof is complete.
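The kernel splitting (3.1) used in the proof above can be checked pointwise in the radial variable. The sketch below is an illustration only; the function name `kernel_parts` and the sample radii are assumptions. It also confirms the two elementary bounds behind the estimates for $I_1$ and $I_2$, namely $K_R \le R$ and $F_R \le 1/R$.

```python
def kernel_parts(r, R):
    """Return (K_R, F_R, G_R) at radius r = |x| > 0 for a cut-off R > 1 (hypothetical helper)."""
    if r < 1.0 / R:
        # Core region: K_R is capped at R, G_R carries the singular remainder.
        return R, 0.0, 1.0 / r - R
    if r <= R:
        # Middle shell: the full kernel sits in K_R.
        return 1.0 / r, 0.0, 0.0
    # Far field: the full kernel sits in F_R.
    return 0.0, 1.0 / r, 0.0

R = 4.0
for r in (0.01, 0.2, 0.25, 1.0, 4.0, 7.5, 100.0):
    K, F, G = kernel_parts(r, R)
    assert abs((K + F + G) - 1.0 / r) < 1e-12   # the three pieces add up to 1/|x|
    assert 0.0 <= K <= R                         # bound used for I_1
    assert 0.0 <= F <= 1.0 / R                   # bound used for I_2
    assert G >= 0.0                              # G_R lives only near the origin
```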
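Lemma 5. Let $(\rho_i) \subset L^1 \cap L^{1+1/n}(\mathbb{R}^3)$ be bounded with respect to both norms and $\rho_0 \in L^1 \cap L^{1+1/n}(\mathbb{R}^3)$ with
\[ \rho_i \rightharpoonup \rho_0 \ \text{weakly in } L^{1+1/n}(\mathbb{R}^3). \]
Then for any $R > 0$,
\[ \nabla U_{1_{B_R} \rho_i} \to \nabla U_{1_{B_R} \rho_0} \ \text{strongly in } L^2(\mathbb{R}^3). \]

Proof. Take any $R' > R$. Since by assumption on $k$ we have $1 + 1/n \in\, ]6/5, 5/3[$, the mapping
\[ L^{1+1/n}(\mathbb{R}^3) \ni \rho \mapsto 1_{B_{R'}} \nabla U_\rho \in L^2(B_{R'}) \]
is compact. Thus the asserted strong convergence holds on $B_{R'}$. On the other hand,
\[ \int_{|x| \ge R'} |\nabla U_{1_{B_R} \rho_i}|^2\,dx \le \frac{C}{R' - R}\, \|\rho_i\|_1^2 \le \frac{C}{R' - R}, \quad i \in \mathbb{N} \cup \{0\}, \]
which is arbitrarily small for $R'$ large.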
We are now ready to show the existence of a minimizer of $\mathcal{H}$.

Theorem 1. Let $M > 0$. Let $(f_i) \subset \mathcal{F}_M$ be a minimizing sequence of $\mathcal{H}$. Then there is a minimizer $f_0 \in \mathcal{F}_M$, a subsequence (still denoted by $(f_i)$), and a sequence of translations $T_i f_i(x,v) = f_i(x + a_i, v)$ with $(a_i) \subset \mathbb{R}^3$, such that
\[ \mathcal{H}(f_0) = \inf_{\mathcal{F}_M} \mathcal{H} = h_M \]
and $T_i f_i \rightharpoonup f_0$ weakly in $L^{1+1/k}(\mathbb{R}^6)$. For the induced potentials we have $\nabla U_{T_i f_i} \to \nabla U_0$ strongly in $L^2(\mathbb{R}^3)$.

Remark. Without admitting shifts in $x$-space the assertion of the theorem is wrong: Starting from a given minimizer $f_0$ and a sequence of shift vectors $(a_i) \subset \mathbb{R}^3$ the sequence $(T_i f_0)$ is minimizing and in $\mathcal{F}_M$, but if $|a_i| \to \infty$ this minimizing sequence converges weakly to zero, which is not in $\mathcal{F}_M$.

Proof of Theorem 1. Let $(f_i)$ be a minimizing sequence and $(a_i) \subset \mathbb{R}^3$ such that the assertion of Lemma 4 holds. Since $\mathcal{H}$ is translation invariant $(T_i f_i)$ is again a minimizing sequence. By Lemma 1 (a), $(T_i f_i)$ is bounded in $L^{1+1/k}(\mathbb{R}^6)$. Thus there exists a weakly convergent subsequence, denoted by $(T_i f_i)$ again:
\[ T_i f_i \rightharpoonup f_0 \ \text{weakly in } L^{1+1/k}(\mathbb{R}^6). \]
Clearly, $f_0 \ge 0$ a.e. By Lemma 2, $(E_{\mathrm{kin}}(T_i f_i))$ is bounded so by Lemma 1, $(\rho_i) = (\rho_{T_i f_i})$ is bounded in $L^{1+1/n}(\mathbb{R}^3)$, and by assumption (Q1) this sequence is also bounded in $L^1(\mathbb{R}^3)$. After extracting a further subsequence
\[ \rho_i \rightharpoonup \rho_0 := \rho_{f_0} \ \text{weakly in } L^{1+1/n}(\mathbb{R}^3). \]
Also by weak convergence
\[ E_{\mathrm{kin}}(f_0) \le \liminf_{i \to \infty} E_{\mathrm{kin}}(T_i f_i) < \infty. \]
By (Q2) the functional $\mathcal{C}$ is convex. Thus by Mazur's Lemma and Fatou's Lemma
\[ \mathcal{C}(f_0) \le \limsup_{i \to \infty} \mathcal{C}(T_i f_i) = M, \]
in particular, $\rho_0 \in L^1(\mathbb{R}^3)$ by (Q1). The key step is to show that up to a subsequence we have
\[ \|\nabla U_{T_i f_i} - \nabla U_0\|_2 \to 0. \tag{3.6} \]
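For $R_0 < R$ we denote
\[ B_{R_0,R} := \{ x \in \mathbb{R}^3 \mid R_0 \le |x| \le R \}, \]
and we split $T_i f_i$ as follows:
\[ T_i f_i = T_i f_i\, 1_{B_{R_0} \times \mathbb{R}^3} + T_i f_i\, 1_{B_{R_0,R} \times \mathbb{R}^3} + T_i f_i\, 1_{B_{R,\infty} \times \mathbb{R}^3} =: f_i^1 + f_i^2 + f_i^3. \tag{3.7} \]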
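Due to Lemma 5, $\nabla U_{f_i^1 + f_i^2}$ converges strongly in $L^2$ for any fixed $R$. It thus suffices to show that for any $\epsilon > 0$,
\[ \liminf_{i \to \infty} \int |\nabla U_{f_i^3}|^2\,dx < \epsilon \tag{3.8} \]
for sufficiently large $R$. By Lemma 1 (b) we only need to show that
\[ \liminf_{i \to \infty} \iint Q(f_i^3)\,dv\,dx < \epsilon \tag{3.9} \]
for sufficiently large $R$. We use the method of splitting to verify (3.9). According to (3.7),
\[ \begin{aligned} \mathcal{H}(T_i f_i) &= \mathcal{H}(f_i^1) + \mathcal{H}(f_i^2) + \mathcal{H}(f_i^3) - \iint \frac{\rho_i^2(x)\,(\rho_i^1 + \rho_i^3)(y)}{|x-y|}\,dx\,dy - \iint \frac{\rho_i^1(x)\,\rho_i^3(y)}{|x-y|}\,dx\,dy \\ &=: \mathcal{H}(f_i^1) + \mathcal{H}(f_i^2) + \mathcal{H}(f_i^3) - I_1 - I_2, \end{aligned} \tag{3.10} \]
with obvious definitions for $\rho_i^1, \rho_i^2, \rho_i^3$. The boundedness of $\|\nabla U_{\rho_i^1 + \rho_i^3}\|_2$ implies that
\[ I_1 \le C\, \|\nabla U_{\rho_i^2}\|_2. \]
Since $\rho_i^2$ converges weakly in $L^{1+1/n}$ to $\rho_0^2 := \rho_0\, 1_{B_{R_0,R}}$,
\[ \|\nabla U_{\rho_i^2} - \nabla U_{\rho_0^2}\|_2 \to 0, \quad i \to \infty, \]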
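by Lemma 5. For $R > 2R_0$ we use Hölder's inequality to estimate $I_2$ as follows:
\[ I_2 \le 2 \int_{B_{R_0}} \rho_i(x)\,dx \int_{B_{R,\infty}} |y|^{-1} \rho_i(y)\,dy \le C\, \|\rho_i\|_{6/5}^2 \left( \frac{R_0}{R} \right)^{1/2}. \]
It is a simple calculus exercise to show that
\[ \tau^{7/3} + (1-\tau)^{7/3} \le 1 - \frac{7}{3}\, \tau (1-\tau), \quad \tau \in [0,1]. \tag{3.11} \]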
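Inequality (3.11) is what makes splitting the mass strictly unfavorable. As a numerical sanity check (not a proof), the following sketch evaluates the gap between the two sides on a uniform grid; the function name `gap` and the grid size are assumptions made for illustration.

```python
def gap(tau, p=7.0 / 3.0):
    """Right-hand side minus left-hand side of (3.11) at tau in [0, 1]."""
    return (1.0 - p * tau * (1.0 - tau)) - (tau ** p + (1.0 - tau) ** p)

N = 10_000
# Smallest gap over the grid; it should be zero at the endpoints
# and positive in the interior, up to rounding error.
worst = min(gap(j / N) for j in range(N + 1))
print(worst)
```

Equality holds only at $\tau = 0$ and $\tau = 1$, which corresponds to no splitting at all.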
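With Lemma 3 and obvious definitions of $M_i^1, M_i^2, M_i^3$ this implies that
\[ \begin{aligned} \mathcal{H}(f_i^1) + \mathcal{H}(f_i^2) + \mathcal{H}(f_i^3) &\ge h_{M_i^1} + h_{M_i^2} + h_{M_i^3} \\ &= \left[ \left( \frac{M_i^1}{M} \right)^{7/3} + \left( \frac{M_i^2}{M} \right)^{7/3} + \left( \frac{M_i^3}{M} \right)^{7/3} \right] h_M \\ &\ge \left[ \left( \frac{M_i^1 + M_i^2}{M} \right)^{7/3} + \left( \frac{M_i^3}{M} \right)^{7/3} \right] h_M \\ &\ge \left[ 1 - \frac{7}{3}\, \frac{M_i^1 + M_i^2}{M}\, \frac{M_i^3}{M} \right] h_M \end{aligned} \]
and thus
\[ \begin{aligned} h_M - \mathcal{H}(T_i f_i) - C_1 h_M M_i^1 M_i^3 &\le I_1 + I_2 \\ &\le C_2 \left[ \|\nabla U_{f_0^2}\|_2 + \|\nabla U_{\rho_i^2} - \nabla U_{\rho_0^2}\|_2 + \left( \frac{R_0}{R} \right)^{1/2} \right]. \end{aligned} \]
Here $R > 2R_0$ are so far arbitrary, and the constants $C_1, C_2$ are independent of $R$ and $R_0$. Now assume (3.9) were false. Then there exists $\epsilon_1 > 0$ such that for every $R > 0$ and $i$ large we have
\[ \iint Q(f_i^3)\,dv\,dx \ge \epsilon_1. \tag{3.12} \]
Define $\epsilon_2 := -C_1 h_M \epsilon_0 \epsilon_1 > 0$, where $\epsilon_0$ is as in Lemma 4, and increase $R_0$ from that lemma such that $C_2 \|\nabla U_{f_0^2}\|_2 \le \epsilon_2 / 4$. Next choose $R > 2R_0$ such that $C_2 (R_0/R)^{1/2} \le \epsilon_2 / 4$. Then for $i$ large,
\[ h_M - \mathcal{H}(T_i f_i) + \epsilon_2 \le h_M - \mathcal{H}(T_i f_i) - C_1 h_M M_i^1 M_i^3 \le \frac12 \epsilon_2 + C_2 \|\nabla U_{\rho_i^2} - \nabla U_{\rho_0^2}\|_2. \]
By (3.11) this contradicts the fact that $(T_i f_i)$ is minimizing. Thus (3.9) holds, and (3.6) follows.

Clearly we have $\mathcal{H}(f_0) \le \lim_i \mathcal{H}(T_i f_i)$, and it remains to show that $\mathcal{C}(f_0) = M$. Assume that $M_0 := \mathcal{C}(f_0) < M$; $M_0 > 0$ since otherwise $f_0 = 0$ in contradiction to $\mathcal{H}(f_0) < 0$. Let
\[ b := \left( \frac{M_0}{M} \right)^{2/3} < 1, \qquad a := b^{-1/2}, \]
so that by (2.5), $\bar f_0 \in \mathcal{F}_M$. Then by (2.6),
\[ \mathcal{H}(\bar f_0) = a^7\, \mathcal{H}(f_0) \le b^{-7/2} h_M < h_M, \]
a contradiction; recall that $b < 1$ and $h_M < 0$.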
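The final rescaling step of the proof can be spelled out as follows. This is a short worked check, assuming as in (2.5) and (2.6) that $\bar f_0(x,v) = f_0(ax,bv)$ with $b = a^{-2}$; it uses only relations already stated above.

```latex
% With b = (M_0/M)^{2/3} and a = b^{-1/2} = (M/M_0)^{1/3} we have b = a^{-2}, hence
\mathcal{C}(\bar f_0) = (ab)^{-3}\,\mathcal{C}(f_0) = a^{3} M_0 = \frac{M}{M_0}\, M_0 = M,
\qquad
\mathcal{H}(\bar f_0) = a^{7}\,\mathcal{H}(f_0) = \left(\frac{M}{M_0}\right)^{7/3} \mathcal{H}(f_0).
% Since \mathcal{H}(f_0) \le h_M < 0 and (M/M_0)^{7/3} > 1,
\mathcal{H}(\bar f_0) \le \left(\frac{M}{M_0}\right)^{7/3} h_M < h_M ,
% which is impossible for an element of \mathcal{F}_M.
```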
4. Properties of Minimizers

The purpose of the present section is to show that the minimizers obtained in the previous one are indeed steady states of the Vlasov–Poisson system.

Theorem 2. Let $f_0 \in \mathcal{F}_M$ be a minimizer of $\mathcal{H}$. Then
\[ f_0(x,v) = \begin{cases} \phi(E), & E < E_0, \\ 0, & E \ge E_0 \end{cases} \quad \text{a.e.}, \]
where
\[ E := \frac12 |v|^2 + U_0(x), \qquad E_0 := \lambda_0 Q'(0), \qquad \lambda_0 := \frac{\iint E f_0\,dv\,dx}{\iint Q'(f_0) f_0\,dv\,dx} < 0, \]
$U_0$ is the potential induced by $f_0$, and
\[ \phi(E) := \inf\{ f \ge 0 \mid Q'(f) = E/\lambda_0 \}, \quad E \le E_0. \]
In particular, $f_0$ is a steady state of the Vlasov–Poisson system.

Remark. (a) The Euler–Lagrange equation for our constrained minimization problem will give us the relation $\lambda_0 Q'(f_0) = E$ on $f_0^{-1}(]0,\infty[)$, which we want to invert by means of the function $\phi$. Clearly, if $Q'$ is strictly increasing then $\phi(E) = (Q')^{-1}(E/\lambda_0)$, $E \le E_0$.
(b) Under our general assumption on $Q$ the function $Q' : [0,\infty[ \to [Q'(0),\infty[$ is continuous, increasing, and onto. This implies that for every $\eta \ge Q'(0)$ the set $(Q')^{-1}(\eta)$ is a closed, bounded interval, and there exists an at most countable set $V_{\mathrm{crit}}$ such that $(Q')^{-1}(\eta)$ consists of one point for $\eta \notin V_{\mathrm{crit}}$. The function $\phi$ is decreasing with $\phi(]-\infty, E_0]) = [0,\infty[$, and for $f \in [0,\infty[$ with $\lambda_0 Q'(f) \notin V_{\mathrm{crit}}$ we have $\phi(\lambda_0 Q'(f)) = f$ as desired.
(c) In the example given by (2.3),
\[ Q'(f) = \begin{cases} 1, & 0 \le f \le 1, \\ f, & f > 1 \end{cases} \]
which is not one-to-one on $[0,\infty[$, but the Euler–Lagrange equation can be inverted to yield (2.4).
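The generalized inverse $\phi$ from Theorem 2 can be illustrated numerically. The sketch below is an assumption-laden illustration: the function name `phi_generalized`, the bisection bracket `f_hi`, and the sample values of $\lambda_0$ are hypothetical. It uses the fact that for continuous, increasing $Q'$ the infimum over $\{Q'(f) = \eta\}$ coincides with the infimum over $\{Q'(f) \ge \eta\}$, and returns $0$ for $E \ge E_0$ in line with the statement of the theorem.

```python
def phi_generalized(E, lam0, Q_prime, f_hi=1e6, iters=200):
    """inf{f >= 0 : Q'(f) = E/lam0} for continuous, increasing Q', via bisection.
    Returns 0 when E/lam0 <= Q'(0), i.e. for E >= E0 = lam0 * Q'(0)."""
    eta = E / lam0
    if eta <= Q_prime(0.0):
        return 0.0
    lo, hi = 0.0, f_hi                 # invariant: Q'(lo) < eta <= Q'(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if Q_prime(mid) < eta:
            lo = mid
        else:
            hi = mid
    return hi

# Example (2.3): Q'(f) = max(1, f) has a flat piece on [0, 1].
dQ_flat = lambda f: max(1.0, f)
lam0 = -2.0                            # sample multiplier, so E0 = lam0 * Q'(0) = -2
f_below = phi_generalized(-6.0, lam0, dQ_flat)   # E < E0: expect E/E0 = 3, as in (2.4)
f_above = phi_generalized(-1.0, lam0, dQ_flat)   # E > E0: expect 0

# Polytrope (2.2) with k = 2: Q'(f) = 1 + (1 + 1/k) f^(1/k), closed-form inverse available.
k = 2.0
dQ_poly = lambda f: 1.0 + (1.0 + 1.0 / k) * f ** (1.0 / k)
eta = 4.0                              # eta = E/lam0 with E = -4, lam0 = -1
f_poly = phi_generalized(-4.0, -1.0, dQ_poly)
f_poly_exact = ((eta - 1.0) * k / (k + 1.0)) ** k
print(f_below, f_above, f_poly, f_poly_exact)
```

For the polytrope the closed form is proportional to $(E_0 - E)^k$, which is how (2.2) leads back to the polytropic form (1.2).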
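Proof of Theorem 2. Let $f_0$ and $U_0$ be a pointwise defined representative of a minimizer of $\mathcal{H}$ in $\mathcal{F}_M$ and its induced potential respectively; to derive the Euler–Lagrange relation we will argue first on $f_0^{-1}(]0,\infty[)$ and then on the complement. For $\epsilon > 0$ small,
\[ K_\epsilon := \left\{ (x,v) \in \mathbb{R}^6 \mid \epsilon \le f_0(x,v) \le \frac{1}{\epsilon} \right\} \]
defines a set of positive, finite measure. Let $w \in L^\infty(\mathbb{R}^6)$ be compactly supported and non-negative outside $K_\epsilon$, and define
\[ G(\sigma,\tau) := \iint Q(f_0 + \sigma 1_{K_\epsilon} + \tau w)\,dv\,dx; \]
for $\tau$ and $\sigma$ close to zero, $\tau \ge 0$, the function $f_0 + \sigma 1_{K_\epsilon} + \tau w$ is bounded on $K_\epsilon$, and non-negative otherwise. Therefore, $G$ is continuously differentiable for such $\tau$ and $\sigma$, and $G(0,0) = M$. Since
\[ \partial_\sigma G(0,0) = \iint_{K_\epsilon} Q'(f_0)\,dv\,dx \neq 0, \]
there exists by the implicit function theorem a continuously differentiable function $\tau \mapsto \sigma(\tau)$ with $\sigma(0) = 0$, defined for $\tau \ge 0$ small, such that $G(\sigma(\tau),\tau) = M$. Hence $f_0 + \sigma(\tau) 1_{K_\epsilon} + \tau w \in \mathcal{F}_M$. Furthermore,
\[ \sigma'(0) = -\frac{\partial_\tau G(0,0)}{\partial_\sigma G(0,0)} = -\frac{\iint Q'(f_0)\, w}{\iint_{K_\epsilon} Q'(f_0)}. \tag{4.1} \]
Since $\mathcal{H}(f_0 + \sigma(\tau) 1_{K_\epsilon} + \tau w)$ attains its minimum at $\tau = 0$, Taylor expansion implies
\[ 0 \le \mathcal{H}(f_0 + \sigma(\tau) 1_{K_\epsilon} + \tau w) - \mathcal{H}(f_0) = \tau \iint E\, [\sigma'(0) 1_{K_\epsilon} + w]\,dv\,dx + o(\tau) \]
for $\tau \ge 0$ small. With (4.1) we get
\[ \iint [-\lambda_\epsilon Q'(f_0) + E]\, w\,dv\,dx \ge 0, \tag{4.2} \]
where
\[ \lambda_\epsilon := \frac{\iint_{K_\epsilon} E}{\iint_{K_\epsilon} Q'(f_0)}. \]
By our choice for $w$ this implies that $E = \lambda_\epsilon Q'(f_0)$ a.e. on $K_\epsilon$ and $E \ge \lambda_\epsilon Q'(f_0)$ otherwise. This shows that $\lambda_\epsilon = \lambda_0$ does in fact not depend on $\epsilon$. Letting $\epsilon \to 0$, we conclude that
\[ E = \lambda_0 Q'(f_0) \ \text{a.e. on } f_0^{-1}(]0,\infty[), \tag{4.3} \]
\[ E \ge \lambda_0 Q'(0) = E_0 \ \text{a.e. on } f_0^{-1}(0). \tag{4.4} \]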
If we multiply (4.3) by $f_0$ and integrate we obtain the asserted formula for $\lambda_0$, and $\lambda_0 < 0$ as claimed, since
\[ \iint E f_0\,dv\,dx = E_{\mathrm{kin}}(f_0) + 2 E_{\mathrm{pot}}(f_0) < \mathcal{H}(f_0) < 0. \]
We need to invert (4.3). Let $V_{\mathrm{crit}}$ be the at most countable set of values where $Q'$ is not one-to-one, cf. part (b) of the remark above. Since for any constant $\eta \in \mathbb{R}$ the set $E^{-1}(\eta)$ has measure zero (for fixed $x$ this is a sphere in $v$-space), we conclude that
\[ S_{\mathrm{crit}} := \{ (x,v) \in \mathbb{R}^6 \mid E(x,v)/\lambda_0 \in V_{\mathrm{crit}} \} \]
is a set of measure zero, and on $f_0^{-1}(]0,\infty[) \setminus S_{\mathrm{crit}}$ the Euler–Lagrange equation (4.3) can be inverted to yield $f_0(x,v) = \phi(E)$ as claimed, cf. part (a) of the remark above. Together with (4.4) this proves that $f_0$ is a.e. equal to a function of the particle energy $E$.

Next we study the regularity, symmetry, and uniqueness of minimizers. Let $C_c^m$ and $C_b^m$ denote the space of $C^m$ functions with compact support and with bounded derivatives up to order $m$, respectively.

Theorem 3. (a) Let $f_0 \in \mathcal{F}_M$ be a minimizer of $\mathcal{H}$. Then $f_0$ is spherically symmetric with respect to some point in $x$-space.
(b) If $k \ge 1/2$ assume in addition that
\[ \phi(E) \le C (-E)^k, \quad E \to -\infty, \]
where $\phi$ is defined by $Q$ as in Theorem 2; this condition is compatible with the general assumptions on $Q$. Then $U_0 \in C_b^2(\mathbb{R}^3)$ with $\lim_{|x| \to \infty} U_0(x) = 0$ and $\rho_0 \in C_c^1(\mathbb{R}^3)$.
(c) If in particular $Q(f) = f + f^{1+1/k}$, $f \ge 0$, with $0 < k < 7/2$ then up to a shift in $x$-space there are at most two minimizers of $\mathcal{H}$ in $\mathcal{F}_M$.

Proof. To prove the spherical symmetry of $f_0$ we denote by $f_0^*$ the spherically symmetric rearrangement of $f_0$ with respect to $x$ so that $\rho_{f_0^*}$ is the spherically symmetric rearrangement of $\rho_0$. Clearly, $f_0^* \in \mathcal{F}_M$ so that $\mathcal{H}(f_0^*) \ge \mathcal{H}(f_0)$, and since $E_{\mathrm{kin}}(f_0^*) = E_{\mathrm{kin}}(f_0)$ this implies that $E_{\mathrm{pot}}(f_0^*) \ge E_{\mathrm{pot}}(f_0)$. Thus by Riesz' rearrangement inequality these two terms must be equal, and $\rho_0$ must be spherically symmetric with respect to some point in $\mathbb{R}^3$, cf. [13, Thms. 3.7 and 3.9]. By definition, $U_0$ is symmetric as well, and by Theorem 2 also $f_0$.

To prove part (b), consider first the case where $k < 1/2$, i.e., $n < 2$. Then $\rho_0 \in L^p \cap L^1$ with $p = 1 + 1/n > 3/2$. The usual $L^p$-regularity theory and Sobolev's embedding theorem imply that
\[ U_0 \in W^{2,p}_{\mathrm{loc}}(\mathbb{R}^3) \subset C(\mathbb{R}^3). \]
Moreover, for any $R > 0$ and $x \in \mathbb{R}^3$,
\[ -U_0(x) = \int_{|x-y| < 1/R} \frac{\rho_0(y)}{|x-y|}\,dy + \int_{1/R \le |x-y| < R} \cdots + \int_{|x-y| \ge R} \cdots \]
[...]
where $q$ is the conjugate exponent to $p$, so $q < 3$. This implies that
\[ U_0 \in C_b(\mathbb{R}^3), \qquad \lim_{|x| \to \infty} U_0(x) = 0. \]
This in turn implies that for $|x|$ sufficiently large, $E > E_0$; note that the latter quantity is negative by Theorem 2. By the same theorem, $f_0$ and $\rho_0$ have compact support. To continue, we note that since $f_0$ depends only on the particle energy $E$ via the function $\phi$,
\[ \rho_0(x) = h_\phi(U_0(x)), \quad x \in \mathbb{R}^3, \tag{4.5} \]
where
\[ h_\phi(u) := 4\pi\sqrt{2} \int_u^\infty \phi(E)\, \sqrt{E - u}\,dE, \quad u \in \mathbb{R}; \tag{4.6} \]
note that hφ (u) = 0 for u ≥ E0 . By the general assumptions on Q the function hφ is continuously differentiable. Thus the regularity of U0 implies that ρ0 ∈ Cc (R3 ), this in turn implies that U0 ∈ Cb1 (R3 ), thus ρ0 ∈ Cc1 (R3 ), and finally U0 ∈ Cb2 (R3 ). Consider now the case that k ≥ 1/2. Clearly we are done if we can but prove that ρ0 is not only in Lp with p = 1 + 1/n, which is now too small for the argument above, but in some Lp with p > 3/2. To show this, we use a bootstrap argument, based on (4.5). For this to work, we need some control on the growth of the function hφ which is the reason for the extra assumption on φ. Indeed, under that assumption the following estimate holds:
\[ h_\phi(u) \le C \left( 1 + (E_0 - u)^n \right), \quad u \le E_0. \]
If we use this estimate on the set where $\rho_0$ is large (this set has finite measure) and the integrability of $\rho_0$ on the complement we find that
\[ \int \rho_0(x)^p\,dx \le C + \int (-U_0(x))^{np}\,dx. \tag{4.7} \]
If we picked the limiting case $n = 5$, i.e., $\rho_0 \in L^{6/5}$, we would find by Young's inequality that $U_0 \in L^6$, and bootstrapping this via (4.7) gives us $\rho_0 \in L^{6/5}$ back. However, for $n < 5$ this works better: Starting with $p_0 = 1 + 1/n$ we apply Young's inequality to find that $U_0$ lies in $L^q$ with $q = (1/p_0 - 2/3)^{-1} > 1$, and substituting this into (4.7) we conclude that $\rho_0 \in L^{p_1}$ with $p_1 = q/n$; note that by assumption $p_0 < 3/2$. If $p_1 > 3/2$ we are done. If $p_1 = 3/2$ we decrease $p_1$ slightly (note that $\rho_0 \in L^1$) so that in the next bootstrap step we find $p_2$ as large as we wish. If $p_1 < 3/2$ we repeat the process. By induction one sees that
\[ p_k = \frac{3 (1 + 1/n)(n - 1)}{n^k (n - 5) + 2n + 2} > 1 \]
as long as $p_{k-1} < 3/2$. But since $2 \le n < 5$ the denominator would eventually become negative so that the process must stop after finitely many steps.

As to part (c) we first observe that up to some shift $U_0$ as a function of the radial variable $r := |x|$ solves the equation
\[ \frac{1}{r^2} \left( r^2 U_0' \right)' = c_k (E_0 - U_0)_+^{k+3/2}, \quad r > 0, \tag{4.8} \]
with some appropriately defined constant $c_k$. Here $'$ denotes the derivative with respect to $r$. The assertion now follows from the scaling properties of (4.8), and we refer to [8, Thm. 3] for the details.
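The bootstrap in the proof of part (b) is a one-line recursion on the integrability exponents: Young's inequality gives $1/q = 1/p_k - 2/3$ and (4.7) gives $p_{k+1} = q/n$. The following sketch iterates this in exact rational arithmetic and compares with the closed form for $p_k$ stated above. The function names are hypothetical, and the borderline case $p_k = 3/2$ (where the proof decreases $p_k$ slightly) is not modeled: the loop simply stops as soon as $p_k \ge 3/2$.

```python
from fractions import Fraction

def bootstrap_exponents(n):
    """Iterate 1/p_{k+1} = n (1/p_k - 2/3) from p_0 = 1 + 1/n while p_k < 3/2."""
    n = Fraction(n)
    ps = [1 + 1 / n]
    while ps[-1] < Fraction(3, 2):
        inv_q = 1 / ps[-1] - Fraction(2, 3)   # Young: U_0 in L^q with 1/q = 1/p - 2/3
        ps.append(1 / (n * inv_q))             # (4.7): rho_0 in L^(q/n)
    return ps

def closed_form(n, k):
    """Closed-form expression for p_k from the proof of Theorem 3 (b)."""
    n = Fraction(n)
    return 3 * (1 + 1 / n) * (n - 1) / (n ** k * (n - 5) + 2 * n + 2)

steps = bootstrap_exponents(Fraction(9, 2))    # n = 9/2, i.e. k = 3
print(steps)
print([closed_form(Fraction(9, 2), j) for j in range(len(steps))])
```

For $n$ close to $5$ the number of steps grows, which reflects the degeneracy of the limiting case $n = 5$ discussed in the proof.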
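5. Dynamical Stability

Let $f_0 \in \mathcal{F}_M$ be a minimizer as obtained in Theorem 1. To investigate its dynamical stability we note first that
\[ \begin{aligned} \mathcal{H}(f) - \mathcal{H}(f_0) &= \iint \left( \frac12 |v|^2 + U_0 \right) (f - f_0)\,dv\,dx - \frac{1}{8\pi} \|\nabla U_f - \nabla U_0\|_2^2 \\ &=: d(f, f_0) - \frac{1}{8\pi} \|\nabla U_f - \nabla U_0\|_2^2, \quad f \in \mathcal{F}_M. \end{aligned} \tag{5.1} \]
Since $\mathcal{C}(f) = \mathcal{C}(f_0)$,
\[ d(f, f_0) = \iint \left[ E (f - f_0) - \lambda_0 (Q(f) - Q(f_0)) \right] dv\,dx. \]
Since $Q$ is convex and the Lagrange multiplier $\lambda_0$ from Theorem 2 is negative, the integrand can be estimated from below by
\[ (E - \lambda_0 Q'(f_0)) (f - f_0). \]
According to Theorem 2, this quantity is zero on $\operatorname{supp} f_0$, while on $\mathbb{R}^6 \setminus \operatorname{supp} f_0$ it equals $(E - \lambda_0 Q'(0)) f = (E - E_0) f \ge 0$. Thus we see that
\[ d(f, f_0) \ge 0, \quad f \in \mathcal{F}_M. \]
We are now ready to state our stability result. Note that if we shift a minimizer in space we obtain another minimizer. Moreover, we do in general not know whether the minimizers are unique up to spatial shifts. This fact is reflected in two versions of our stability result.

Theorem 4. Let $\mathcal{M}_M \subset \mathcal{F}_M$ denote the set of all minimizers of $\mathcal{H}$ in $\mathcal{F}_M$.
(a) For every $\epsilon > 0$ there is a $\delta > 0$ such that for any solution $t \mapsto f(t)$ of the Vlasov–Poisson system with $f(0) \in C_c^1(\mathbb{R}^6) \cap \mathcal{F}_M$,
\[ \inf_{f_0 \in \mathcal{M}_M} \left[ d(f(0), f_0) + \frac{1}{8\pi} \|\nabla U_{f(0)} - \nabla U_0\|_2^2 \right] < \delta \]
implies that
\[ \inf_{f_0 \in \mathcal{M}_M} \left[ d(f(t), f_0) + \frac{1}{8\pi} \|\nabla U_{f(t)} - \nabla U_{f_0}\|_2^2 \right] < \epsilon, \quad t \ge 0. \]
(b) Suppose that $f_0 \in \mathcal{M}_M$ is isolated, i.e.,
\[ \inf \left\{ \|\nabla U_{f_0} - \nabla U_{\tilde f_0}\|_2 \mid \tilde f_0 \in \mathcal{M}_M \setminus \{ T^a f_0 \mid a \in \mathbb{R}^3 \} \right\} > 0. \]
Then for every $\epsilon > 0$ there is a $\delta > 0$ such that for any solution $t \mapsto f(t)$ of the Vlasov–Poisson system with $f(0) \in C_c^1(\mathbb{R}^6) \cap \mathcal{F}_M$,
\[ d(f(0), f_0) + \frac{1}{8\pi} \|\nabla U_{f(0)} - \nabla U_0\|_2^2 < \delta \]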
implies that for every $t \ge 0$ there exists a shift vector $a \in \mathbb{R}^3$ such that
\[ d(f(t), T^a f_0) + \frac{1}{8\pi} \|\nabla U_{f(t)} - \nabla U_{T^a f_0}\|_2^2 < \epsilon. \]
Here $T^a f(x,v) := f(x + a, v)$ for $a \in \mathbb{R}^3$.

Remark. (a) By Theorem 3 (c) the assumption of part (b) holds for the polytropes.
(b) We only showed that $d(f, f_0) \ge 0$ for $f \in \mathcal{F}_M$, but one may think of this term as a weighted $L^2$-difference of $f$ and $f_0$. For example, if $Q \in C^2(]0,\infty[)$ with
\[ c_Q := \inf_{0 < f \le f_{\max}} Q''(f) > 0, \]
where $f_{\max} \ge \|f_0\|_\infty$, as would be the case for the polytropes $Q(f) = f + f^{1+1/k}$ with $1 \le k < 7/2$, then by Taylor-expanding $Q$ we find
\[ d(f, f_0) \ge \frac12 c_Q \|f - f_0\|_2^2, \quad f \in \mathcal{F}_M \text{ with } f \le f_{\max}; \]
observe that the size restriction on $f$ propagates along solutions of the Vlasov–Poisson system.
(c) The restriction $f(0) \in \mathcal{F}_M$ for the perturbed initial data is acceptable from a physics point of view: A physical perturbation of a given galaxy, say by the gravitational pull of some outside object, would result in a perturbed state which is an equimeasurable rearrangement of the original state, in particular, the value of $\mathcal{C}(f)$ remains unchanged.

Proof of Theorem 4. Assume the assertion of part (a) were false. Then there exist $\epsilon > 0$, $t_n > 0$, and $f_n(0) \in C_c^1(\mathbb{R}^6) \cap \mathcal{F}_M$ such that for all $n \in \mathbb{N}$,
\[ \inf_{f_0 \in \mathcal{M}_M} \left[ d(f_n(0), f_0) + \frac{1}{8\pi} \|\nabla U_{f_n(0)} - \nabla U_0\|_2^2 \right] < \frac{1}{n} \tag{5.2} \]
but
\[ \inf_{f_0 \in \mathcal{M}_M} \left[ d(f_n(t_n), f_0) + \frac{1}{8\pi} \|\nabla U_{f_n(t_n)} - \nabla U_0\|_2^2 \right] \ge \epsilon. \tag{5.3} \]
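By (5.2) and (5.1),
\[ \lim_{n \to \infty} \mathcal{H}(f_n(0)) = h_M. \]
Since both $\mathcal{H}$ and $\mathcal{C}$ are conserved along classical solutions as launched by $f_n(0)$,
\[ \lim_{n \to \infty} \mathcal{H}(f_n(t_n)) = h_M \quad \text{and} \quad f_n(t_n) \in \mathcal{F}_M, \ n \in \mathbb{N}, \]
i.e., $(f_n(t_n))$ is a minimizing sequence for $\mathcal{H}$ in $\mathcal{F}_M$. Up to a subsequence we may therefore assume by Theorem 1 that there exists a minimizer $f_0 \in \mathcal{F}_M$ and a sequence $(a_n) \subset \mathbb{R}^3$ such that
\[ \|\nabla U_{f_n(t_n)} - \nabla U_{T^{a_n} f_0}\|_2^2 \to 0; \tag{5.4} \]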
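note that for any $f \in \mathcal{F}_M$ and $a \in \mathbb{R}^3$, $\|\nabla U_{T^a f} - \nabla U_{f_0}\|_2 = \|\nabla U_f - \nabla U_{T^{-a} f_0}\|_2$, also $d(T^a f, f_0) = d(f, T^{-a} f_0)$. Since $\lim_{n \to \infty} \mathcal{H}(f_n(t_n)) = h_M = \mathcal{H}(T^{a_n} f_0)$ we conclude by (5.4) and (5.1) that $d(f_n(t_n), T^{a_n} f_0) \to 0$, $n \to \infty$, and since $T^{a_n} f_0 \in \mathcal{M}_M$ we arrive at a contradiction to (5.3). Thus part (a) is established.

Now assume that $f_0$ is an isolated minimizer in $\mathcal{F}_M$, and define
\[ \delta_0 := \frac{1}{8\pi} \inf \left\{ \|\nabla U_{f_0} - \nabla U_{\tilde f_0}\|_2 \mid \tilde f_0 \in \mathcal{M}_M \setminus \{ T^a f_0 \mid a \in \mathbb{R}^3 \} \right\} > 0. \]
Let $\epsilon > 0$ be arbitrary. In order to find the corresponding $\delta$ we can without loss of generality assume that $\epsilon < \delta_0 / 4$. Now choose $\delta > 0$ according to part (a), without loss of generality $\delta < \epsilon$, and let $f(0) \in C_c^1(\mathbb{R}^6) \cap \mathcal{F}_M$ be such that
\[ d(f(0), f_0) + \frac{1}{8\pi} \|\nabla U_{f(0)} - \nabla U_0\|_2^2 < \delta. \]
The function
\[ \begin{aligned} h(t,a) &:= d(f(t), T^a f_0) + \frac{1}{8\pi} \|\nabla U_{f(t)} - \nabla U_{T^a f_0}\|_2^2 \\ &= \iint \frac12 |v|^2 (f(t) - f_0)\,dv\,dx + \frac{1}{8\pi} \|\nabla U_{f(t)}\|_2^2 + \frac{3}{8\pi} \|\nabla U_{f_0}\|_2^2 + 2 \int U_{T^a f_0}\, \rho_{f(t)}\,dx \end{aligned} \]
is continuous, and since the interaction term goes to zero as $|a| \to \infty$, uniformly on compact time intervals, $\inf_{a \in \mathbb{R}^3} h(t,a)$ is also continuous. Now assume that there exists $t > 0$ such that
\[ \inf_{a \in \mathbb{R}^3} h(t,a) \ge \epsilon. \]
Since at time zero the left hand side is less than $\epsilon$ there exists some $t^* > 0$ where
\[ \inf_{a \in \mathbb{R}^3} h(t^*, a) = \epsilon. \tag{5.5} \]
On the other hand, part (a) provides some $f_0^* \in \mathcal{M}_M$ such that
\[ d(f(t^*), f_0^*) + \frac{1}{8\pi} \|\nabla U_{f(t^*)} - \nabla U_{f_0^*}\|_2^2 < \epsilon \le \frac{\delta_0}{4}. \tag{5.6} \]
By (5.5) and (5.6) together with the non-negativity of $d$,
\[ \frac{1}{8\pi} \|\nabla U_{f_0} - \nabla U_{f_0^*}\|_2^2 \le \frac{\delta_0}{2}, \]
and by the definition of $\delta_0$ there must exist some $a^* \in \mathbb{R}^3$ such that $f_0^* = T^{a^*} f_0$. But this means that (5.5) contradicts (5.6), and the proof of part (b) is complete.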
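The non-negativity of $d(f, f_0)$, on which the whole stability argument rests, is a pointwise convexity statement about the integrand $E (f - f_0) - \lambda_0 (Q(f) - Q(f_0))$. The sketch below checks it on a grid for the polytropic $Q$ from (2.2). The function names, the sample value $\lambda_0 = -1$ (so that $E_0 = \lambda_0 Q'(0) = -1$), and the grid are assumptions made for illustration.

```python
def Q(f, k):
    """Polytropic Casimir integrand (2.2)."""
    return f + f ** (1.0 + 1.0 / k)

def phi(E, lam0, k):
    """Closed-form solution of lam0 * Q'(f) = E for (2.2); zero for E >= E0 = lam0."""
    eta = E / lam0
    return ((eta - 1.0) * k / (k + 1.0)) ** k if eta > 1.0 else 0.0

def integrand(E, f, lam0=-1.0, k=1.5):
    """Pointwise integrand of d(f, f0), with f0 = phi(E) determined by the energy."""
    f0 = phi(E, lam0, k)
    return E * (f - f0) - lam0 * (Q(f, k) - Q(f0, k))

# Scan energies in [-5, 2] and phase-space densities in [0, 10].
worst = min(integrand(-5.0 + 0.05 * i, 0.1 * j)
            for i in range(141) for j in range(101))
print(worst)

# For k = 1 the integrand below the cut-off energy is exactly (f - f0)^2.
print(integrand(-3.0, 2.5, lam0=-1.0, k=1.0))
```

The case $k = 1$ makes the interpretation in Remark (b) concrete: on the support of $f_0$ the quantity $d(f, f_0)$ is then an honest squared $L^2$-distance.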
6. The Case k = 7/2; the Plummer Sphere

In this section we study the so-called Plummer sphere which corresponds to the minimization of $\mathcal{H}$ on the constraint set
\[ \mathcal{F}_M := \left\{ f \in L^{9/7}(\mathbb{R}^6) \mid f \ge 0,\ \iint f^{9/7}\,dv\,dx = M,\ E_{\mathrm{kin}}(f) < \infty \right\}, \]
i.e., we take $Q(f) = f^{9/7}$ which means $k = 7/2$ and $n = 5$. Due to the fact that the scaling transformation
\[ (S_\lambda f)(x,v) = \lambda^{-7} f(\lambda^{-4} x, \lambda v) \tag{6.1} \]
leaves each term in both $\mathcal{H}$ and $\mathcal{C}$ invariant this case presents additional difficulties. As we noted at the end of Sect. 1 the assertion of Lemma 2 remains valid so that there exists a minimizing sequence $(f_i)$ of $\mathcal{H}$ in $\mathcal{F}_M$. We shall follow the steps in Sect. 3 to conclude the existence of a minimizer. The key step is to verify that the assertions of both Lemma 4 and Lemma 5 are still valid in the presence of the scaling (6.1). To this end we wish to employ the results in [8], so we consider $(f_i^*)$, the sequence of spherically symmetric rearrangements with respect to $x$, which is again minimizing and in $\mathcal{F}_M$. By [8, Thm. 2] there exists a symmetric minimizer $g$ such that $S_i f_i^* \rightharpoonup g$ weakly in $L^{9/7}(\mathbb{R}^6)$, and more importantly
\[ \|\nabla U_{S_i f_i^*} - \nabla U_g\|_2 \to 0, \]
where $S_i = S_{\lambda_i}$ with some $\lambda_i > 0$. The corresponding spatial densities also converge strongly:

Lemma 6. Up to a subsequence,
\[ \lim_{i \to \infty} \|\rho_{S_i f_i^*} - \rho_g\|_{6/5} = 0. \]
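The invariance under the scaling (6.1), which is the source of the extra difficulty in this section, can be verified by a change of variables. The following worked computation is a sketch using only the definitions of Sect. 2; the primed variables $x' = \lambda^{-4} x$, $v' = \lambda v$ are introduced here for the substitution.

```latex
% Substitution x = \lambda^{4} x', v = \lambda^{-1} v', so dv\,dx = \lambda^{-3}\lambda^{12}\,dv'\,dx' = \lambda^{9}\,dv'\,dx'.
\iint (S_\lambda f)^{9/7}\,dv\,dx
  = \lambda^{-9}\,\lambda^{9} \iint f^{9/7}\,dv'\,dx' = \iint f^{9/7}\,dv'\,dx',
\qquad
E_{\mathrm{kin}}(S_\lambda f)
  = \tfrac12\,\lambda^{-7}\,\lambda^{-2}\,\lambda^{9} \iint |v'|^2 f\,dv'\,dx' = E_{\mathrm{kin}}(f).
% For the potential energy, \rho_{S_\lambda f}(x) = \lambda^{-10}\rho_f(\lambda^{-4}x), hence
E_{\mathrm{pot}}(S_\lambda f)
  = -\tfrac12\,\lambda^{-20}\,\lambda^{24}\,\lambda^{-4}
    \iint \frac{\rho_f(x')\rho_f(y')}{|x'-y'|}\,dx'\,dy' = E_{\mathrm{pot}}(f).
```

Because every term is invariant, a minimizing sequence can concentrate or spread along the orbit of $S_\lambda$ without any change in energy or constraint, which is why the sequence has to be renormalized by suitable $S_{\lambda_i}$.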
Proof. Define $\rho_i^* = \rho_{S_i f_i^*}$. By further extracting a subsequence we can assume
\[ \rho_i^* \rightharpoonup \rho_g \ \text{weakly in } L^{6/5}(\mathbb{R}^3), \quad i \to \infty. \tag{6.2} \]
We claim that
\[ \rho_i^*(r) \to \rho_g(r) \tag{6.3} \]
for all $r \in\, ]0,\infty[$ at which $\rho_g$ is continuous. If $r_0$ were a point of continuity for $\rho_g$ at which this is not the case, then there is an $\epsilon_0 > 0$ and a subsequence such that
\[ |\rho_i^*(r_0) - \rho_g(r_0)| \ge \epsilon_0, \quad i \in \mathbb{N}. \]
Assume that $\rho_i^*(r_0) - \rho_g(r_0) \ge \epsilon_0$. Since $\rho_g$ is continuous at $r_0 > 0$, there exists $\delta > 0$ such that
\[ |\rho_g(r) - \rho_g(r_0)| < \epsilon_0 / 2 \]
Y. Guo, G. Rein
for $r \in [r_0 - \delta, r_0]$. From the monotonicity of $\rho_i^*$ we have

$$\int_{\{r_0-\delta \le r \le r_0\}} \rho_g\,dr \le \int_{\{r_0-\delta \le r \le r_0\}} [\rho_g(r_0) + \epsilon_0/2]\,dr \le \int_{\{r_0-\delta \le r \le r_0\}} [\rho_i^*(r_0) - \epsilon_0/2]\,dr \le \int_{\{r_0-\delta \le r \le r_0\}} [\rho_i^*(r) - \epsilon_0/2]\,dr$$

in contradiction to (6.2). The assumption $\rho_i^*(r_0) - \rho_g(r_0) < -\epsilon_0$ leads to the analogous contradiction on the interval $[r_0, r_0 + \delta]$ so that (6.3) is established. Now we choose two sequences $r_j \to 0$ and $R_j \to \infty$ at which $\rho_g$ is continuous. By the monotonicity and (6.3), the $\rho_i^*$ are uniformly bounded on each region $r_j \le r \le R_j$. By Lebesgue’s dominated convergence theorem,

$$\|\rho_i^* - \rho_g\|_{L^{6/5}(r_j \le r \le R_j)} \to 0. \tag{6.4}$$

But by (0.47) of [8] there is no concentration at either r = 0 or r = ∞, i.e., for any $\epsilon > 0$, there is R > 0 such that

$$\limsup_{i\to\infty} \int_{\{r \le 1/R\}\cup\{r \ge R\}} (S_i f_i^*)^{9/7} < \epsilon.$$

The first estimate in Lemma 1 (b) therefore implies that up to a subsequence, $\|\rho_i^*\|_{L^{6/5}(\{r \le r_j\}\cup\{R_j \le r\})} < C\epsilon$ for i large, and together with (6.4) this completes the proof.
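As a side check on the scaling (6.1) used throughout this section, the following sketch verifies that the constraint and both energy terms are unchanged by $S_\lambda$. It assumes the standard conventions $E_{kin}(f) = \frac12 \iint |v|^2 f\,dv\,dx$, $\rho_f = \int f\,dv$ and $U_f = -\rho_f * |\cdot|^{-1}$, which are not restated in this part of the text.

```latex
% Sketch under the conventions stated above (assumptions, not quoted from the text).
% Substitution: x' = \lambda^{-4} x, v' = \lambda v, hence
% dx\,dv = \lambda^{12}\lambda^{-3}\,dx'\,dv' = \lambda^{9}\,dx'\,dv'.
\begin{align*}
\iint (S_\lambda f)^{9/7}\,dv\,dx
  &= \lambda^{-9}\,\lambda^{9} \iint f^{9/7}\,dv'\,dx'
   = \iint f^{9/7}\,dv'\,dx', \\
\iint |v|^2\, S_\lambda f\,dv\,dx
  &= \lambda^{-7}\,\lambda^{-2}\,\lambda^{9} \iint |v'|^2 f\,dv'\,dx'
   = \iint |v'|^2 f\,dv'\,dx', \\
\rho_{S_\lambda f}(x) &= \lambda^{-10}\,\rho_f(\lambda^{-4}x), \qquad
U_{S_\lambda f}(x) = \lambda^{-2}\,U_f(\lambda^{-4}x), \\
\int |\nabla U_{S_\lambda f}|^2\,dx
  &= \lambda^{-12}\,\lambda^{12} \int |\nabla U_f|^2\,dx'
   = \int |\nabla U_f|^2\,dx'.
\end{align*}
```

Every exponent of $\lambda$ cancels, which is why minimizing sequences can only be controlled up to the scalings $S_i$ in addition to translations.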
We now establish the key fact that $S_i f_i$ are equi-integrable, i.e., there is no concentration. For a non-negative function h and a cut-off parameter N > 0 we define

$$h|_N(x) := \begin{cases} 0 & \text{if } 1/N \le h(x) \le N, \\ h(x) & \text{otherwise.} \end{cases}$$

Lemma 7. For any $\epsilon > 0$, there is an N > 0 such that up to a subsequence,

$$\|[\rho_{S_i f_i}]|_N\|_{6/5} < \epsilon, \quad i \in \mathbb{N}.$$

Proof. Let $\rho_i^* = \rho_{S_i f_i^*}$ and $\rho_i = \rho_{S_i f_i}$. We first use Lemma 6 to prove the assertion for $\rho_i^*$. To this end, let $A_i = \{r \mid \rho_i^* < 1/N\} \cup \{r \mid \rho_i^* > N\}$, and $A_g = \{r \mid \rho_g \le 1/N\} \cup \{r \mid \rho_g \ge N\}$. For any $\epsilon > 0$, there is N > 0 large such that $\|\rho_g\|_{L^{6/5}(A_g)} < \epsilon/2$. Now from Lemma 6 for i large we deduce that

$$\|(\rho_i^*)|_N\|_{6/5} = \|\rho_i^*\|_{L^{6/5}(A_i)} \le \|\rho_i^* - \rho_g\|_{L^{6/5}(A_i)} + \|\rho_g\|_{L^{6/5}(A_i)} \le \epsilon/2 + \|\rho_g\|_{L^{6/5}(A_i)}.$$

Again by Lemma 6 we find that up to a subsequence $\rho_i^* \to \rho_g$ pointwise a.e., and thus $\limsup_{i\to\infty} 1_{A_i} \le 1_{A_g}$ a.e. Therefore, applying Fatou’s lemma for $\rho_g (1 - 1_{A_i})$, we get

$$\limsup_{i\to\infty} \|\rho_g\|_{L^{6/5}(A_i)} \le \|\rho_g\|_{L^{6/5}(A_g)} < \epsilon/2.$$
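Hence up to a subsequence,

$$\|(\rho_i^*)|_N\|_{6/5} < \epsilon, \quad i \in \mathbb{N}. \tag{6.5}$$

Now by the equi-measurability of the rearrangements, (6.5) implies the assertion of the lemma. In fact, for any function h ≥ 0 and p ≥ 1,

\begin{align*}
\int (h|_N)^p &= p \int_0^\infty s^{p-1}\,\mu\{h 1_{\{h<1/N \text{ or } h>N\}} > s\}\,ds \\
&= p \int_0^\infty s^{p-1}\,\big[\mu\{h 1_{\{h<1/N\}} > s\} + \mu\{h 1_{\{h>N\}} > s\}\big]\,ds \\
&= p \int_0^\infty s^{p-1}\,\big[\mu\{s < h < 1/N\} + \mu\{h > \max[N,s]\}\big]\,ds \\
&= p \int_0^\infty s^{p-1}\,\big[\mu\{s < h^* < 1/N\} + \mu\{h^* > \max[N,s]\}\big]\,ds = \int (h^*|_N)^p.
\end{align*}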
Taking $h = \rho_i$ and noticing that by definition, $h^* = \rho_i^* = \rho_{[S_i(f_i)]^*} = \rho_{S_i(f_i^*)}$, we see that the assertion of the lemma follows from (6.5).

We are now ready to prove the analogue of Theorem 1 for the limiting case k = 7/2:

Theorem 5. Let M > 0. Let $(f_i) \subset F_M$ be a minimizing sequence of H. Then there is a minimizer $f_0 \in F_M$, a subsequence (still denoted by $(f_i)$), and a sequence of translations and scalings

$$T_i S_i f_i(x,v) := \lambda_i^{-7} f_i(\lambda_i^{-4} x + a_i, \lambda_i v)$$

with $(a_i) \subset \mathbb{R}^3$ and $\lambda_i > 0$ such that

$$H(f_0) = \inf_{F_M} H = h_M$$

and $T_i S_i f_i \rightharpoonup f_0$ weakly in $L^{1+1/k}(\mathbb{R}^6)$. For the induced potentials we have $\nabla U_{T_i S_i f_i} \to \nabla U_0$ strongly in $L^2(\mathbb{R}^3)$.

Proof. It suffices to verify the analogues of Lemma 4 and Lemma 5 for $\rho_i = \rho_{S_i f_i}$ which satisfies the estimate in Lemma 7. To this end we split $\rho_i =: \rho_i^b + (\rho_i)|_N$. Notice that for a fixed cut-off parameter N > 0, $(\rho_i^b)$ is bounded in any $L^p$, 1 ≤ p ≤ ∞, so that both Lemma 4 and Lemma 5 are valid for $\rho_i^b$. But by Lemma 7 and the generalized Young’s inequality, $\|\nabla U_{(\rho_i)|_N}\|_2$ can be made arbitrarily small for N large, uniformly in i. Therefore, the assertion of Lemma 5 is clearly valid for $(\rho_i)$. To verify Lemma 4 for $\rho_i$, we choose $\epsilon < -h_M/2$ in Lemma 7. Notice that

$$\|\nabla U_{\rho_i}\|_2 \le \|\nabla U_{\rho_i^b}\|_2 + \|\nabla U_{(\rho_i)|_N}\|_2.$$

Since Lemma 4 is valid for $\rho_i^b$ we deduce as in (3.4) that

$$h_M/2 > H(f_i) \ge -|I_1^b| - |I_2^b| - |I_3^b| - \epsilon,$$
where $I_k^b$, k = 1, 2, 3, are induced by $\rho_i^b$. Since

$$\sup_{y \in \mathbb{R}^3} \int_{y + B_R} \rho_i^b\,dx \le \sup_{y \in \mathbb{R}^3} \int_{y + B_R} \rho_i\,dx,$$

we deduce the assertion of Lemma 4 for $\rho_i$ by the same argument as before.
Next we show the analogues of the assertions of Theorems 2 and 3:

Theorem 6. Let $f_0 \in F_M$ be a minimizer of H. Then

$$f_0(x,v) = \begin{cases} (E/\lambda_0)^{7/2}, & E < 0, \\ 0, & E \ge 0, \end{cases}$$

with some constant $\lambda_0 < 0$; in particular, $f_0$ is a steady state of the Vlasov–Poisson system. Moreover, $f_0$ is spherically symmetric with respect to some point in x-space, and up to scalings and translations in x,

$$U_0(r) = -c_0 (1 + r^2)^{-1/2}, \qquad \rho_0(r) = \frac{3c_0}{4\pi} (1 + r^2)^{-5/2}, \qquad r \ge 0,$$

where the positive constant $c_0$ depends on $\lambda_0$.

Proof. The identity for $f_0$ follows exactly as in the proof of Theorem 2. The spherical symmetry follows as in the proof of Theorem 3. By monotonicity $\lim_{r\to\infty} U_0(r) \in\, ]-\infty, 0]$ exists, and since $U_0 \in L^6(\mathbb{R}^3)$ this limit must be zero so that $U_0$ is a solution of the corresponding Emden-Fowler equation (4.8) with k = 7/2 and $E_0 = 0$. The uniqueness up to scalings follows as in [8, Thm. 3], and the explicit formulas can be checked by direct computation.

Finally, we state the stability theorem for the limiting case k = 7/2:

Theorem 7. Let $f_0 \in F_M$ be a minimizer of H. Then for every $\epsilon > 0$ there is a $\delta > 0$ such that for any solution $t \mapsto f(t)$ of the Vlasov–Poisson system with $f(0) \in C_c^1 \cap F_M$,

$$d(f(0), f_0) + \frac{1}{8\pi} \|\nabla U_{f(0)} - \nabla U_{f_0}\|_{L^2}^2 < \delta$$

implies that for every t ≥ 0 there exists a shift vector $a \in \mathbb{R}^3$ and a scaling parameter $\lambda > 0$ such that

$$d(f(t), T^a S_\lambda f_0) + \frac{1}{8\pi} \|\nabla U_{f(t)} - \nabla U_{T^a S_\lambda f_0}\|_2^2 < \epsilon, \quad t \ge 0.$$

The only difference to the proof of Theorem 4 is that one now has to take into account not only the spatial shifts, but also the scaling transformations which arise in Theorem 5, and this is straightforward. The condition $f(0) \in F_M$ can also be relaxed by a scaling transformation as in [8, Thm. 4].

Note added in proof. The stability of isotropic steady states is also addressed in Wan, Y.-H.: Nonlinear stability of spherical systems in galactic dynamics. Preprint, 2000.
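The proof of Theorem 6 leaves the explicit Plummer formulas to direct computation. A minimal sketch of that check follows, assuming the Poisson equation is normalized as $\Delta U = 4\pi\rho$ (the normalization is an assumption here, since it is not restated in this section).

```latex
% Sketch: the pair (U_0, \rho_0) of Theorem 6 solves the radial Poisson
% equation, under the assumed normalization \Delta U = 4\pi\rho.
\begin{align*}
U_0(r) &= -c_0\,(1+r^2)^{-1/2}, \\
U_0'(r) &= c_0\, r\,(1+r^2)^{-3/2}, \\
\frac{1}{r^2}\bigl(r^2 U_0'(r)\bigr)'
  &= \frac{c_0}{r^2}\Bigl(3r^2(1+r^2)^{-3/2} - 3r^4(1+r^2)^{-5/2}\Bigr) \\
  &= 3c_0\,(1+r^2)^{-5/2} = 4\pi\,\rho_0(r).
\end{align*}
```

The last step uses $(1+r^2) - r^2 = 1$ after factoring out $3r^2(1+r^2)^{-5/2}$, which reproduces the density $\rho_0(r) = \frac{3c_0}{4\pi}(1+r^2)^{-5/2}$ stated in the theorem.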
References

1. Aly, J. J.: On the lowest energy state of a collisionless selfgravitating system under phase space volume constraints. Monthly Notices Royal Astronomical Soc. 241, 15–27 (1989)
2. Batt, J., Faltenbacher, W., Horst, E.: Stationary spherically symmetric models in stellar dynamics. Arch. Rational Mech. Anal. 93, 159–183 (1986)
3. Batt, J., Morrison, P., Rein, G.: Linear stability of stationary solutions of the Vlasov–Poisson system in three dimensions. Arch. Rational Mech. Anal. 130, 163–182 (1995)
4. Binney, J., Tremaine, S.: Galactic Dynamics. Princeton: Princeton University Press, 1987
5. Braasch, P., Rein, G., Vukadinović, J.: Nonlinear stability of stationary plasmas – An extension of the energy-Casimir method. SIAM J. Applied Math. 59, 831–844 (1999)
6. Fridman, A. M., Polyachenko, V. L.: Physics of Gravitating Systems, I. New York: Springer-Verlag, 1984
7. Guo, Y.: Variational method in polytropic galaxies. Arch. Rational Mech. Anal. 150, 209–224 (1999)
8. Guo, Y.: On the generalized Antonov’s stability criterion. Contem. Math. 263, 85–107 (2000)
9. Guo, Y., Rein, G.: Stable steady states in stellar dynamics. Arch. Rational Mech. Anal. 147, 225–243 (1999)
10. Guo, Y., Rein, G.: Existence and stability of Camm type steady states in galactic dynamics. Indiana University Math. J. 48, 1237–1255 (1999)
11. Guo, Y., Strauss, W.: Nonlinear instability of double-humped equilibria. Ann. Inst. Henri Poincaré 12, 339–352 (1995)
12. Guo, Y., Strauss, W.: Instability of periodic BGK equilibria. Comm. Pure Appl. Math. 48, 861–894 (1995)
13. Lieb, E. H., Loss, M.: Analysis. Providence, RI: American Mathematical Society, 1996
14. Lions, P.-L.: The concentration-compactness principle in the calculus of variations. The locally compact case. Part 1. Ann. Inst. H. Poincaré 1, 109–145 (1984)
15. Pfaffelmoser, K.: Global classical solutions of the Vlasov–Poisson system in three dimensions for general initial data. J. Diff. Eqs. 95, 281–303 (1992)
16. Rein, G.: Nonlinear stability for the Vlasov–Poisson system – The energy-Casimir method. Math. Meth. in the Appl. Sci. 17, 1129–1140 (1994)
17. Rein, G.: Flat steady states in stellar dynamics – Existence and stability. Commun. Math. Phys. 205, 229–247 (1999)
18. Rein, G.: Stability of spherically symmetric steady states in galactic dynamics against general perturbations. Preprint, 1999
19. Rein, G., Rendall, A. D.: Compact support of spherically symmetric equilibria in non-relativistic and relativistic galactic dynamics. Math. Proc. Camb. Phil. Soc. 128, 363–380 (2000)
20. Schaeffer, J.: Global existence of smooth solutions to the Vlasov–Poisson system in three dimensions. Commun. Part. Diff. Eqs. 16, 1313–1335 (1991)
21. Wan, Y.-H.: On nonlinear stability of isotropic models in stellar dynamics. Arch. Rational Mech. Anal. 147, 245–268 (1999)
22. Wolansky, G.: On nonlinear stability of polytropic galaxies. Ann. Inst. Henri Poincaré 16, 15–48 (1999)

Communicated by H. Spohn
Commun. Math. Phys. 219, 631 – 669 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Multi-Interval Subfactors and Modularity of Representations in Conformal Field Theory

Yasuyuki Kawahigashi¹, Roberto Longo²,⋆, Michael Müger²,⋆⋆

¹ Department of Mathematical Sciences, University of Tokyo, Komaba, Tokyo, 153-8914, Japan. E-mail: [email protected]
² Dipartimento di Matematica, Università di Roma “Tor Vergata”, Via della Ricerca Scientifica, 00133 Roma, Italy. E-mail: [email protected]; [email protected]

Received: 7 July 1999 / Accepted: 13 January 2001

Dedicated to John E. Roberts on the occasion of his sixtieth birthday

Abstract: We describe the structure of the inclusions of factors A(E) ⊂ A(E′)′ associated with multi-intervals E ⊂ ℝ for a local irreducible net A of von Neumann algebras on the real line satisfying the split property and Haag duality. In particular, if the net is conformal and the subfactor has finite index, the inclusion associated with two separated intervals is isomorphic to the Longo–Rehren inclusion, which provides a quantum double construction of the tensor category of superselection sectors of A. As a consequence, the index of A(E) ⊂ A(E′)′ coincides with the global index associated with all irreducible sectors, the braiding symmetry associated with all sectors is non-degenerate, namely the representations of A form a modular tensor category, and every sector is a direct sum of sectors with finite dimension. The superselection structure is generated by local data. The same results hold true if conformal invariance is replaced by strong additivity and there exists a modular PCT symmetry.
1. Introduction

This paper provides the solution to a natural problem in (rational) conformal quantum field theory, the description of the structure of the inclusion of factors associated to two or more separated intervals. This problem has been considered in the past years, seemingly with different motivations. The most detailed study of this inclusion so far has been done by Xu [50] for the models given by loop group construction for SU(n)_k [47]. In this case Xu has computed the index and the dual principal graph of the inclusions. A suggestion to study this inclusion has been made also in [43, Sect. 3]. Our analysis is model independent, and will display new structures and a deeper understanding also in these and other models.

⋆ Supported in part by GNAFA and MURST
⋆⋆ Supported by EU TMR Network “Noncommutative Geometry”. Address after June 2001: Korteweg-de Vries Institute, Amsterdam, The Netherlands
Let A be a local irreducible conformal net of von Neumann algebras on ℝ, i.e. an inclusion preserving map I → A(I) from the (connected) open intervals of ℝ to von Neumann algebras A(I) on a fixed Hilbert space. One may define A(E) for an arbitrary set E ⊂ ℝ as the von Neumann algebra generated by all the A(I)’s as I varies in the intervals contained in E. By locality A(E) and A(E′) commute, where E′ denotes the interior of ℝ ∖ E, and thus one obtains an inclusion

$$A(E) \subset \hat{A}(E),$$

where Â(E) ≡ A(E′)′. If Haag duality holds, as we shall assume¹, this inclusion is trivial if E is an interval, but it is in general non-trivial for a disconnected region E. We will explain its structure if E is the union of n separated intervals, a situation that can be reduced to the case n = 2, namely E = I_1 ∪ I_2, where I_1 and I_2 are intervals with disjoint closure, as we set for the rest of this introduction.

One can easily realize that the inclusion A(E) ⊂ Â(E) is related to the superselection structure of A, i.e. to the representation theory of A, as charge transporters between endomorphisms localized in I_1 and I_2 naturally live in Â(E), but not in A(E).

Assuming the index [Â(E) : A(E)] < ∞ and the split property², namely that A(I_1) ∨ A(I_2) is naturally isomorphic to A(I_1) ⊗ A(I_2), we shall show that indeed A(E) ⊂ Â(E) contains all the information on the superselection rules.

We shall prove that in this case A is rational, namely there exist only finitely many different irreducible sectors {[ρ_i]} with finite dimension and that A(E) ⊂ Â(E) is isomorphic to the inclusion considered in [28] (we refer to this as the LR inclusion, cf. Appendix A), which is canonically associated with A(I_1), {[ρ_i]} (with the identification A(I_2) ≃ A(I_1)^opp). In particular,

$$[\hat{A}(E) : A(E)] = \sum_i d(\rho_i)^2,$$

the global index of the superselection sectors. In fact A will turn out to be rational in an even stronger sense, namely there exist no sectors with infinite dimension, except the ones that are trivially constructed as direct sums of finite-dimensional sectors.

Moreover, we shall exhibit an explicit way to generate the superselection sectors of A from the local data in E: we consider the canonical endomorphism γ_E of Â(E) into A(E) and its restriction λ_E = γ_E|_{A(E)}; then λ_E extends to a localized endomorphism λ of A acting identically on A(I) for all intervals I disjoint from E. We have

$$\lambda = \bigoplus_i \rho_i \bar{\rho}_i, \tag{1}$$

where the ρ_i’s are inequivalent irreducible endomorphisms of A localized in I_1 with conjugates ρ̄_i localized in I_2 and the classes {[ρ_i]}_i exhaust all the irreducible sectors.

¹ As shown in [18], one may always extend A to the dual net A^d, which is conformal and satisfies Haag duality.
² This general property is satisfied, in particular, if Tr(e^{−βL_0}) < ∞ for all β > 0, where L_0 is the conformal Hamiltonian, cf. [5, 8].

To understand this structure, consider the symmetric case I_1 = I, I_2 = −I. Then A(−I) = j(A(I)), where j is the anti-linear PCT automorphism, hence we may identify
A(−I) with A(I)^opp. Moreover the formula ρ̄_i = j · ρ_i · j holds for the conjugate sector [17], thus by the split property we may identify {A(E), ρ_i ρ̄_i|_{A(E)}} with {A(I) ⊗ A(I)^opp, ρ_i ⊗ ρ_i^opp}. Now there is an isometry V_i that intertwines the identity and ρ_i ρ̄_i and belongs to Â(E). We then have to show that Â(E) is generated by A(E) and the V_i’s and that the V_i’s satisfy the (crossed product) relations characteristic of the LR inclusion. This last point is verified by identifying V_i with the standard implementation isometry as in [17], while the generating property follows by the index computation that will follow by the “transportability” of the canonical endomorphism above.

The superselection structure of A can then be recovered by formula (1) and the split property. Note that the representation tensor category of A ⊗ A^opp generated by {ρ_i ⊗ ρ_i^opp}_i corresponds to the connected component of the identity in the fusion graph for A, therefore the associated fusion rules and quantum 6j-symbols are encoded in the isomorphism class of the inclusion A(E) ⊂ Â(E), that will be completely determined by a crossed product construction.

A further important consequence is that the braiding symmetry associated with all sectors is always non-degenerate, in other words the localizable representations form a modular tensor category. As shown by Rehren [41], this implies the existence and nondegeneracy of Verlinde’s matrices S and T, thus the existence of a unitary representation of the modular group SL(2, ℤ), which plays a role in topological quantum field theory.

It follows that the net B ⊃ A ⊗ A^opp obtained by the LR construction is a field algebra for A ⊗ A^opp, namely B has no superselection sector (localizable in a bounded interval) and there is a generating family of sectors of A ⊗ A^opp that are implemented by isometries in B. Indeed B is a crossed product of A ⊗ A^opp by the tensor category of all its sectors.
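To make the statements about Verlinde's matrices and the global index concrete, here is a minimal numerical sketch for the SU(2) level-k models. It is an illustration only: the closed formulas for S, T, the conformal weights and the central charge are the standard textbook modular data for SU(2)_k and are not taken from this paper, and the function names (`su2_modular_data`, `check_level`) are hypothetical.

```python
import cmath
import math


def su2_modular_data(k):
    """Return (S, T, dims) for the k+1 sectors a = 0..k of SU(2) at level k.

    Assumed standard formulas (not from the paper):
      S_ab = sqrt(2/(k+2)) * sin((a+1)(b+1)pi/(k+2)),
      T_aa = exp(2 pi i (h_a - c/24)), h_a = a(a+2)/(4(k+2)), c = 3k/(k+2).
    """
    n = k + 2
    size = k + 1
    norm = math.sqrt(2.0 / n)
    S = [[norm * math.sin((a + 1) * (b + 1) * math.pi / n) for b in range(size)]
         for a in range(size)]
    c = 3.0 * k / n
    T = [[0j] * size for _ in range(size)]
    for a in range(size):
        h = a * (a + 2) / (4.0 * n)
        T[a][a] = cmath.exp(2j * math.pi * (h - c / 24.0))
    # Statistical dimensions d_a = S_a0 / S_00.
    dims = [S[a][0] / S[0][0] for a in range(size)]
    return S, T, dims


def matmul(A, B):
    """Plain square-matrix product on nested lists."""
    size = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(size)) for j in range(size)]
            for i in range(size)]


def max_deviation(A, B):
    """Largest entrywise distance between two square matrices."""
    size = len(A)
    return max(abs(A[i][j] - B[i][j]) for i in range(size) for j in range(size))


def check_level(k):
    """Numerically test S^2 = 1, (ST)^3 = S^2 and sum d_a^2 = 1/S_00^2."""
    S, T, dims = su2_modular_data(k)
    size = k + 1
    identity = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    S2 = matmul(S, S)
    ST = matmul(S, T)
    ST3 = matmul(ST, matmul(ST, ST))
    return {
        "S_squared_is_identity": max_deviation(S2, identity),
        "ST_cubed_equals_S_squared": max_deviation(ST3, S2),
        "global_index": sum(d * d for d in dims),
        "one_over_S00_squared": 1.0 / S[0][0] ** 2,
    }


if __name__ == "__main__":
    for level in range(1, 5):
        print(level, check_level(level))
```

In this family the invertibility of S (here S² = 1) is the non-degeneracy of the braiding, the relation (ST)³ = S² gives the representation of SL(2, ℤ), and the sum of the squared dimensions is the global index that the paper identifies with the index of the 2-interval inclusion.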
As shown by Masuda [30], Ocneanu’s asymptotic inclusion [35] and the Longo–Rehren inclusion in [28] are, from the categorical viewpoint, essentially the same constructions. The construction of the asymptotic inclusion gives a new subfactor M ∨ (M′ ∩ M_∞) ⊂ M_∞ from a hyperfinite II_1 subfactor N ⊂ M with finite index and finite depth and it is a subfactor analogue of the quantum double construction of Drinfel’d [11], as noted by Ocneanu. That is, the tensor category of the M_∞–M_∞ bimodules arising from the new subfactor is regarded as a “quantum double” of the original category of M–M (or N–N) bimodules. On the other hand, as shown in [33], the Longo–Rehren construction gives the quantum double of the original tensor category of endomorphisms. (See also [12, Chapter 12] for a general theory of asymptotic inclusions and their relations to topological quantum field theory.) Our result thus shows that the inclusion arising from two separated intervals as above gives the quantum double of the tensor category of all localized endomorphisms. However, as the braiding symmetry is non-degenerate, the quantum double will be isomorphic to the subcategory of the trivial doubling of the original tensor category corresponding to the connected component of the identity in the fusion graph. Indeed, in the conformal case, multi-interval inclusions are self-dual.

For our results conformal invariance is not necessary, although conformal nets provide the most interesting situation where they can be applied. We may deal with an arbitrary net on ℝ, provided it is strongly additive (a property equivalent to Haag duality on ℝ if conformal invariance is assumed) and there exists a cyclic and separating vector for the von Neumann algebras of half-lines (vacuum), such that the corresponding modular conjugations act geometrically as PCT symmetries (automatic in the conformal case). We will deal with this more general context.
Our paper is organized as follows. The second section discusses general properties of multi-interval inclusions and in particular gives motivations for the strong additivity assumption. The third section enters the core of our analysis and contains a first inequality between the global index of the sectors and the index of the 2-interval subfactor. In Sect. 4 we study the structure of sectors associated with the LR net, an analysis mostly based on the braiding symmetry, the work of Izumi [22] and the α-induction, which has been introduced in [28] and further studied in [49, 2, 3]. Section 5 combines and develops the previous analysis to obtain our main results for the 2-interval inclusion. These results are extended to the case of n-interval inclusions in Sect. 6. We then illustrate our results in models and examples in Sect. 7. We collect in Appendix A the results of the universal crossed product description of the LR inclusion and of its multiple iterations occurring in our analysis. We include a further appendix concerning the disintegration of locally normal or localizable representations into irreducible ones, which is needed in the paper; these results are however of independent interest. For basic facts concerning conformal nets of von Neumann algebras on ℝ or S¹, the reader is referred to [17, 28], see also Appendix B.
2. General Properties

In this section we shortly examine a few elementary properties for nets of von Neumann algebras, partly to motivate our strong additivity assumption in the main body of the paper, and partly to examine relations with dual nets. To get our main result, the reader may however skip this part, except for Proposition 5, and get directly to the next section, where we will restrict our study to completely rational nets.

In this section, A will be a local irreducible net of von Neumann algebras on S¹, namely, A is an inclusion preserving map ℐ ∋ I → A(I) from the set ℐ of intervals (open, non-empty sets with contractible closure) of S¹ to von Neumann algebras on a fixed Hilbert space H such that A(I_1) and A(I_2) commute if I_1 ∩ I_2 = ∅ and ⋁_{I∈ℐ} A(I) = B(H), where ∨ denotes the von Neumann algebra generated. If E ⊂ S¹ is any set, we put

$$A(E) \equiv \bigvee \{A(I) : I \in \mathcal{I},\ I \subset E\}$$

and set Â(E) ≡ A(E′)′ with E′ ≡ S¹ ∖ E.³ We shall assume Haag duality on S¹, which automatically holds if A is conformal [4], namely,

$$A(I)' = A(I'), \quad I \in \mathcal{I},$$

³ The results in this section are also valid for nets of von Neumann algebras on ℝ, if ℐ denotes the set of non-empty bounded open intervals of ℝ and E′ = ℝ ∖ E for E ⊂ ℝ.
thus Â(I) = A(I), I ∈ ℐ, but for a disconnected set E ⊂ S¹,

$$A(E) \subset \hat{A}(E)$$

is in general a non-trivial inclusion. We shall say that E ⊂ S¹ is an n-interval if both E and E′ are unions of n intervals with disjoint closures, namely

$$E = I_1 \cup I_2 \cup \dots \cup I_n, \quad I_i \in \mathcal{I},$$

where Ī_i ∩ Ī_j = ∅ if i ≠ j. The set of all n-intervals will be denoted by ℐ_n. Recall that A is n-regular, if A(S¹ ∖ {p_1, ..., p_n}) = B(H) for any p_1, ..., p_n ∈ S¹. Notice that A is 2-regular if and only if the A(I)’s are factors, since we are assuming Haag duality, and that A is 1-regular if for each point p ∈ S¹,

$$\bigcap_n A(I_n) = \mathbb{C} \tag{2}$$

if I_n ∈ ℐ and ⋂_n I_n = {p}.

Proposition 1. The following are equivalent for a fixed n ∈ ℕ:
(i) The inclusion A(E) ⊂ Â(E) is irreducible for E ∈ ℐ_n.
(ii) The net A is 2n-regular.

Proof. With E = I_1 ∪ ··· ∪ I_n and p_1, ..., p_{2n} the 2n boundary points of E, we have A(E)′ ∩ Â(E) = ℂ if and only if A(E) ∨ Â(E)′ = B(H), which holds if and only if A(E) ∨ A(E′) = B(H), thus if and only if A(S¹ ∖ {p_1, ..., p_{2n}}) = B(H), namely A is 2n-regular.

If A is strongly additive, namely,

$$A(I) = A(I \setminus \{p\}),$$

where I ∈ ℐ and p is an interior point of I, then A is n-regular for all n ∈ ℕ, thus all A(E) ⊂ Â(E) are irreducible inclusions of factors, E ∈ ℐ_n. A partial converse holds.

If N ⊂ M are von Neumann algebras, we shall say that N ⊂ M has finite index if the Pimsner–Popa inequality [38] holds, namely there exists λ > 0 and a conditional expectation E : M → N with E(x) ≥ λx, for all x ∈ M_+, and denote the index by [M : N]_E = λ^{−1} with λ the best constant for the inequality to hold and

$$[M : N] = [M : N]_{\min} = \inf_E\, [M : N]_E$$

denotes the minimal index (see [20] for an overview).

Recall that A is split if there exists an intermediate type I factor between A(I_1) and A(I_2) whenever I_1, I_2 are intervals and the closure Ī_1 is contained in the interior of I_2. This implies (indeed it is equivalent to e.g. if the A(I)’s are factors) that A(I_1) ∨ A(I_2)′ is naturally isomorphic to the tensor product of von Neumann algebras A(I_1) ⊗ A(I_2)′ (cf. [10]). For a conformal net, the split property holds if Tr(e^{−βL_0}) < ∞ for all β > 0, cf. [8]. Notice that if A is split and A(I) is a factor for I ∈ ℐ, then A(E) is a factor for E ∈ ℐ_n for any n.
Proposition 2. Let A be split and 1-regular. If there exists a constant C > 0 such that

$$[\hat{A}(E) : A(E)] < C \quad \forall E \in \mathcal{I}_2,$$

then

$$[A(I) : A(I \setminus \{p\})] < C \quad \forall I \in \mathcal{I},\ p \in I.$$

Proof. With I ∈ ℐ and p ∈ I an interior point, let I_1, I_2 ∈ ℐ be the connected components of I ∖ {p}, let I_2^(n) ⊂ I_2 be an increasing sequence of intervals with one boundary point in common with I such that p ∉ I_2^(n) and ⋃_n I_2^(n) = I_2. Then E_n ≡ I_1 ∪ I_2^(n) ∈ ℐ_2 and we have

$$A(E_n) \nearrow A(I \setminus \{p\}), \qquad \hat{A}(E_n) \nearrow A(I),$$

where N_n ↗ N means N_1 ⊂ N_2 ⊂ ··· and N = ⋁ N_n, while N_n ↘ N will mean N_1 ⊃ N_2 ⊃ ··· and N = ⋂ N_n. The first relation is clear by definition. The second relation follows because

$$\hat{A}(E_n) = A(E_n')' = (A(I') \vee A(L_n))',$$

where E_n ∈ ℐ_2, E_n′ = I′ ∪ L_n, and ⋂ L_n = {p}, therefore A(L_n) ↘ ℂ. By the split property A(I′) ∨ A(L_n) ≅ A(I′) ⊗ A(L_n), hence by Eq. (2) A(E_n′) ↘ A(I′), thus

$$\hat{A}(E_n) \nearrow A(I).$$

The rest of the proof is the consequence of the following general proposition.

Proposition 3. a) Let

$$\begin{array}{ccccccc} N_1 & \subset & N_2 & \subset & \cdots & \subset & N \\ \cap & & \cap & & & & \cap \\ M_1 & \subset & M_2 & \subset & \cdots & \subset & M \end{array}$$

be von Neumann algebras, N = ⋁ N_i, M = ⋁ M_i,

b) or let

$$\begin{array}{ccccccc} N_1 & \supset & N_2 & \supset & \cdots & \supset & N \\ \cap & & \cap & & & & \cap \\ M_1 & \supset & M_2 & \supset & \cdots & \supset & M \end{array}$$

be von Neumann algebras, N = ⋂ N_i, M = ⋂ M_i. Then

$$[M : N] \le \liminf_{i\to\infty}\, [M_i : N_i].$$
Proof. It is sufficient to prove the result in the situation b) as the case a) will follow after taking commutants. We may assume lim inf_{i→∞} [M_i : N_i] < ∞. Let E_i : M_i → N_i be an expectation and λ > lim inf_{i→∞} [M_i : N_i]_{E_i}. Then there exists i_0 such that for all x ∈ (M_i)_+, i ≥ i_0,

$$E_i(x) \ge \lambda^{-1} x.$$

Let E_i^(0) = E_i|_M, considered as a map from M to N_i, and let E be a weak limit point of E_i^(0). Then E(x) ≥ λ^{−1}x, x ∈ M_+, and E(M) ⊂ ⋂_i N_i = N, moreover E|_N = id, because E_i|_N = id. Thus E is an expectation of M onto N and [M : N] ≤ [M : N]_E ≤ λ. As E_i is arbitrary, we thus have [M : N] ≤ lim inf_{i→∞} [M_i : N_i].

Recall now that the dual net A^d of A is the net on the intervals of ℝ defined by

$$A^d(I) \equiv A(\mathbb{R} \setminus I)',$$

where we have chosen a point ∞ ∈ S¹ and identified S¹ with ℝ ∪ {∞}. Note that if A is conformal, then Haag duality automatically holds [18] and the dual net A^d is also a conformal net which is moreover strongly additive; furthermore A = A^d, if and only if A is strongly additive, if and only if Haag duality holds on ℝ.

Corollary 4. In the hypothesis of Proposition 2, let A^d be the dual net on ℝ, then A(I) ⊂ A^d(I) has finite index for all bounded intervals I of ℝ.

Proof. Denoting I_1 = I′, the complement of I in S¹, the commutant of the inclusion A(I) ⊂ A^d(I) is A(I_1 ∖ {∞}) ⊂ A(I_1), and this has finite index.

We have no example where A(I) ⊂ A^d(I) is non-trivial with finite index and A is conformal; therefore the equality A(I) = A^d(I), i.e. strong additivity, might follow from the assumptions in Corollary 2 in the conformal case.

Proposition 5. Let A be split and strongly additive, then
(a) The index [Â(E) : A(E)] is independent of E ∈ ℐ_2.
(b) The inclusion A(E) ⊂ Â(E) is irreducible for E ∈ ℐ_2.

Proof. Statement (b) is immediate by Proposition 1. Concerning (a), let E = I_1 ∪ I_2 and Ẽ = I_1 ∪ Ĩ_2, where Ĩ_2 ⊃ I_2 are intervals and I_0 ≡ Ĩ_2 ∖ I_2. Assuming λ^{−1} ≡ [Â(Ẽ) : A(Ẽ)] < ∞, let E_Ẽ be the corresponding expectation with λ-bound. Of course E_Ẽ is the identity on A(I_0), hence

$$E_{\tilde{E}}(\hat{A}(E)) \subset A(I_0)' \cap A(\tilde{E}) = A(E),$$

where the last equality follows at once by the split property and strong additivity as A(I_0)′ ∩ A(Ĩ_2) = A(I_2).
Therefore E_Ẽ|_{Â(E)} = E_E showing

$$[\hat{A}(E) : A(E)] \le [\hat{A}(\tilde{E}) : A(\tilde{E})],$$

where we omit the symbol “min” as the expectation is unique. Thus the index decreases by decreasing the 2-interval. Taking commutants, it also increases, hence it is constant.

Corollary 6. Let A satisfy the assumption of Proposition 2 and let A^d be the dual net on ℝ of A. Then

$$[\widehat{A^d}(E) : A^d(E)] < \infty \quad \forall E \in \mathcal{I}_2.$$

Proof. We fix the point ∞ and may assume E = I_1 ∪ I_2 with ∞ ∈ I_2. Set E′ = I_3 ∪ I_4 with I_3 ∋ ∞. Then A^d(I_3) = A(I_3), A^d(I_2) = A(I_2) and we have

$$A(E) \subset A^d(I_1) \vee A(I_2) = A^d(E) \subset \widehat{A^d}(E) = (A(I_3) \vee A^d(I_4))' \subset (A(I_3) \vee A(I_4))' = \hat{A}(E).$$

Anticipating results in the following, we have:

Corollary 7. Let A be a local irreducible conformal split net on S¹. If [Â(E) : A(E)] = I_global < ∞, E ∈ ℐ_2, then A is n-regular for all n ∈ ℕ.

Proof. If ρ is an irreducible endomorphism of A localized in an interval I, then ρ|_{A(I)} is irreducible [17]. Therefore, by Th. 9 (and comments thereafter) and Prop. 36, the assumptions imply that if E ∈ ℐ_2 then A(E) ⊂ Â(E) is the LR inclusion associated with the system of all irreducible sectors, which is irreducible. Then A(E) ⊂ Â(E) is irreducible for all E ∈ ℐ_n as we shall see in Sect. 6. By Prop. 1 this implies the regularity for all n.

In view of the above results, it is natural to deal with strongly additive nets, when considering multi-interval inclusions of local algebras and thus to deal with nets of factors on ℝ, as we shall do in the following.

3. Completely Rational Nets

In this section we will introduce the notion of completely rational net, that will be the main object of our study in this paper, and get a first analysis. In the following, we shall denote by ℐ the set of bounded open non-empty intervals of ℝ, set I′ = ℝ ∖ I and define A(E) = ⋁{A(I), I ⊂ E, I ∈ ℐ} for E ⊂ ℝ. We again denote by ℐ_n the set of unions of n elements of ℐ with pairwise disjoint closures.⁴

Definition 8. A local irreducible net A of von Neumann algebras on the intervals of ℝ is called completely rational if the following holds:
(a) Haag duality on ℝ: A(I′) = A(I)′, I ∈ ℐ,
(b) A is strongly additive,

⁴ There will be no conflict with the notations in the previous section as the point ∞ does not contribute to the local algebras and we may extend A to S¹ setting A(I) ≡ A(I ∖ {∞}), see Appendix B.
(c) A satisfies the split property,
(d) [Â(E) : A(E)] < ∞, if E ∈ ℐ_2.

Note that if A is the restriction to ℝ of a local conformal net on S¹ (namely a local net which is Möbius covariant with positive energy and cyclic vacuum vector), then (a) is equivalent to (b), cf. [18].

We shall denote by µ_A = [Â(E) : A(E)] the index of the irreducible inclusion of factors A(E) ⊂ Â(E) in case µ_A is independent of E ∈ ℐ_2, in particular if A is split and strongly additive, by Proposition 5.

By a sector [ρ] of A we shall mean the equivalence class of a localized endomorphism ρ of A, that will always be assumed to be transportable, i.e. localizable in each bounded interval I (see also Appendix B). Unless otherwise specified, a localized endomorphism ρ has finite dimension. If ρ is localized in the interval I, its restriction ρ|_{A(I)} is an endomorphism of A(I), thus it gives rise to a sector of the factor A(I) (i.e. a normal unital endomorphism of A(I) modulo inner automorphisms of A(I) [25]) and it will be clear from the context which sense will be attributed to the term sector. The reader unfamiliar with the sector structure is referred to [25, 28, 17] and to Appendix B.

Let E = I_1 ∪ I_2 ∈ ℐ_2 and ρ and σ be irreducible endomorphisms of A localized respectively in I_1 and in I_2. Then ρσ restricts to an endomorphism of A(E), since both ρ and σ restrict. Denote by γ_E the canonical endomorphism of Â(E) into A(E) and λ_E ≡ γ_E|_{A(E)}.

Theorem 9. Let A be completely rational. With the above notations, ρσ|_{A(E)} is contained in λ_E if and only if σ is conjugate to ρ. In this case ρσ|_{A(E)} ≺ λ_E with multiplicity one.

Proof. By [28] ρσ|_{A(E)} ≺ λ_E if and only if there exists an isometry v ∈ Â(E) such that

$$vx = \rho\sigma(x)v \quad \forall x \in A(E). \tag{3}$$

If Eq. (3) holds, then it holds for x ∈ A(I) for all I ∈ ℐ by strong additivity, hence σ = ρ̄. Conversely, if σ = ρ̄, then there exists an isometry v ∈ A(I) such that vx = ρσ(x)v for all x ∈ A(I), where I is the interval I ⊃ E given by I = I_1 ∪ I_2 ∪ Ī_3 with I_3 the bounded connected component of E′. Since ρ and σ act trivially on A(I_3), we have v ∈ A(I_3)′ ∩ A(I), but

$$A(I_3)' \cap A(I) = (A(I_3) \vee A(I'))' = A(E')' = \hat{A}(E),$$

therefore Eq. (3) holds true. As ρ and σ are irreducible, the isometry v in (3) is unique up to a phase and this is equivalent to ρρ̄|_{A(E)} ≺ λ_E with multiplicity one.

We remark that in the above theorem strong additivity is not necessary for ρρ̄ ≺ λ_E, as it can be replaced by the factoriality of A(E), equivalently of Â(E); this holds e.g. in the conformal case. Moreover the split property is also unnecessary, as it has not been used.
We shall say that the net $A$ on $\mathbb R$ has a modular PCT symmetry, if there exists a cyclic separating (vacuum) vector $\Omega$ for each $A(I)$, if $I$ is a half-line (Reeh-Schlieder property), and the modular conjugation $J$ of $A(a, \infty)$ with respect to $\Omega$ has the geometric property
$$J A(I + a) J = A(-I + a), \quad I \in \mathcal I,\ a \in \mathbb R. \tag{4}$$
This is automatic if $A$ is conformal, see [4, 15]. It is easy to see that the modular PCT property implies translation covariance, where the translation unitaries are products of modular conjugations, but positivity of the energy does not necessarily hold. Note that Eq. (4) implies Haag duality for half-lines
$$A(-\infty, a)' = A(a, \infty), \quad a \in \mathbb R.$$
Setting $j \equiv \mathrm{Ad}\,J$, the conjugate sector exists and it is given by the formula [16]
$$\bar\rho = j \cdot \rho \cdot j.$$
Corollary 10. If $A$ is completely rational with modular PCT, then $A$ is rational, namely there are only finitely many irreducible sectors $[\rho_0], [\rho_1], \dots, [\rho_n]$ with finite dimension and we have
$$\sum_{i=0}^{n} d(\rho_i)^2 \le \mu_A. \tag{5}$$
Proof. It is sufficient to show this last inequality. By the split property, the endomorphisms $\rho_i\bar\rho_i|_{A(E)}$ can be identified with the endomorphisms $\rho_i \otimes \bar\rho_i$ on $A(I_1) \otimes A(I_2)$, hence they are mutually inequivalent. By Theorem 9,
$$\bigoplus_{i=0}^{n} \rho_i\bar\rho_i|_{A(E)} \prec \lambda_E, \tag{6}$$
hence
$$\mu_A = [\hat A(E) : A(E)] = d(\lambda_E) \ge \sum_{i=0}^{n} d(\rho_i)^2.$$
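As a purely numerical illustration of inequality (5), the following sketch sums squared statistical dimensions for a finite family of sectors and compares the result with a candidate value of $\mu_A$. The Ising-type dimensions $(1, \sqrt 2, 1)$ and the function names `global_index` and `satisfies_bound` are assumptions introduced here for illustration; they are not taken from the text.

```python
import math

def global_index(dims):
    """Return the sum of squared statistical dimensions, sum_i d(rho_i)^2."""
    return sum(d * d for d in dims)

def satisfies_bound(dims, mu):
    """Check inequality (5): sum_i d(rho_i)^2 <= mu_A, with a small float tolerance."""
    return global_index(dims) <= mu + 1e-9

# Assumed example data (not from the paper): Ising-type dimensions 1, sqrt(2), 1.
ising_dims = [1.0, math.sqrt(2.0), 1.0]
total = global_index(ising_dims)
```

With these assumed dimensions the sum is 4 up to rounding, so any admissible $\mu_A$ for such a system would have to be at least 4.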
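We now give a partial converse to Theorem 9.
Lemma 11. Let $A$ be completely rational and let $\mathcal E_E$ be the conditional expectation $\hat A(E) \to A(E)$.
(a) If $E \subset \tilde E$ and $E, \tilde E \in \mathcal I_2$, then $\mathcal E_{\tilde E}|_{\hat A(E)} = \mathcal E_E$.
(b) There exists a canonical endomorphism $\gamma_{\tilde E}$ of $\hat A(\tilde E)$ to $A(\tilde E)$ such that $\gamma_{\tilde E}|_{\hat A(E)}$ is a canonical endomorphism of $\hat A(E)$ into $A(E)$ and satisfies $\gamma_{\tilde E}|_{\hat A(E)' \cap A(\tilde E)} = \mathrm{id}$.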
Proof. (a) has been shown in the proof of Proposition 5. (b) is an immediate variation of [16, Prop. 2.3] and [28, Theorem 3.2].
Theorem 12. Let $A$ be completely rational. Given $E \in \mathcal I_2$, $\lambda_E$ extends to a localized (transportable) endomorphism $\lambda$ of $A$ such that $\lambda|_{A(I)} = \mathrm{id}$, if $I \subset E'$, $I \in \mathcal I$. Moreover, $d(\lambda) = d(\lambda_E) = \mu_A$. In particular, if $A$ is conformal, then $\lambda$ is Möbius covariant with positive energy.
Proof. Let $E = (a, b) \cup (c, d)$, where $a < b < c < d$ and $\tilde E = (a', b) \cup (c, d')$, where $a' < a$ and $d' > d$. By Lemma 11 we have a $\gamma_{\tilde E}$ with $\lambda_{\tilde E}|_{A(I)} = \mathrm{id}$, if $I \subset \tilde E \setminus E$, $I \in \mathcal I$. Analogously there is a canonical endomorphism $\gamma' : \hat A(\tilde E) \to A(\tilde E)$ acting trivially on $A(E)$. We may write $\gamma_{\tilde E} = \mathrm{Ad}\,u \cdot \gamma'$ with $u \in A(\tilde E)$, hence
$$\lambda_{\tilde E} = \mathrm{Ad}\,u \cdot \lambda', \quad \lambda' = \gamma'|_{A(\tilde E)}.$$
Since $\gamma'|_{A(a,b)} = \mathrm{id}$, $\gamma'|_{A(c,d)} = \mathrm{id}$, we have $\lambda_{\tilde E} = \mathrm{Ad}\,u$ on $A(a, b)$, $A(c, d)$. Therefore, the formula
$$\tilde\lambda = \mathrm{Ad}\,u$$
defines an endomorphism of $A(a, d)$ acting trivially on $A(b, c)$, with $\tilde\lambda|_{A((a,b)\cup(c,d))} = \lambda_E$. We may also have chosen $\gamma'$ “localized” in $(a', a'') \cup (d'', d')$ with $a' < a'' < a$ and $d < d'' < d'$ so that we may assume $\tilde\lambda$ to act trivially on $A((a'', b) \cup (c, d''))$. Letting $a', a'' \to -\infty$ and $d', d'' \to +\infty$, we construct, by an inductive limit of the $\tilde\lambda$'s, an endomorphism $\lambda$ of the quasi-local $C^*$-algebra $\bigcup_{s>0} A(-s, s)$. Clearly, $\lambda$ is localized in $(a, d)$, acts trivially on $A(b, c)$ and is transportable. Moreover, $\lambda$ has finite index as the operators $R, \bar R \in (\iota, \lambda^2)$ in the standard solution for the conjugate equation [25, 29]
$$\bar R^* \lambda(R) = 1, \quad R^* \lambda(\bar R) = 1,$$
on $\hat A(E)$ give the same relation on $A(I)$ for any $I \supset E$, $I \in \mathcal I$.
If $A$ is conformal, then $\lambda$ is covariant with respect to translations and dilations by [17]. As we may vary the point $\infty$, $\lambda$ is covariant with respect to dilations and translations with respect to a different point at $\infty$, hence $\lambda$ is Möbius covariant.
Lemma 13. Let $A$ be completely rational. Then there are at most $\lfloor \mu_A \rfloor$ mutually different irreducible sectors of $A$ (with finite or infinite dimension).
Proof. Consider the family $\{[\rho_\lambda]\}$ of all irreducible sectors and let $N$ be the cardinality of this family. With $E = I_1 \cup I_2 \in \mathcal I_2$, we may assume that each $\rho_\lambda$ is localized in $I_1$ and choose endomorphisms $\sigma_\lambda$ equivalent to $\rho_\lambda$ and localized in $I_2$. Let then $u_\lambda \in (\rho_\lambda, \sigma_\lambda) \subset \hat A(E)$ be a unitary intertwiner and $\mathcal E$ the conditional expectation from $\hat A(E)$ to $A(E)$. Since
$$u_\lambda \rho_\lambda(x) = \sigma_\lambda(x) u_\lambda = x u_\lambda, \quad \forall x \in A(I_1),$$
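we have
$$u_{\lambda'}^* u_\lambda \rho_\lambda(x) = \rho_{\lambda'}(x)\, u_{\lambda'}^* u_\lambda, \quad \forall x \in A(I_1),$$
hence $T = \mathcal E(u_{\lambda'}^* u_\lambda) \in A(E)$ intertwines $\rho_\lambda|_{A(I_1)}$ and $\rho_{\lambda'}|_{A(I_1)}$. The split property allowing us to identify $A(E)$ and $A(I_1) \otimes A(I_2)$, every state $\varphi$ in $A(I_2)_*$ gives rise to a conditional expectation $\mathcal E_\varphi : A(E) \to A(I_1)$. Then $\mathcal E_\varphi(T) \in (\rho_\lambda, \rho_{\lambda'})$, and the inequivalence of $\rho_\lambda|_{A(I_1)}$, $\rho_{\lambda'}|_{A(I_1)}$, see above, entails $\mathcal E_\varphi(T) = 0$. Since this holds for every $\varphi \in A(I_2)_*$ we conclude
$$T = \mathcal E(u_{\lambda'}^* u_\lambda) = 0, \quad \lambda \ne \lambda'.$$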
Let $M$ be the Jones extension of $A(E) \subset \hat A(E)$, $e \in M$ the Jones projection implementing $\mathcal E$ and let $\mathcal E_1 : M \to \hat A(E)$ be the dual conditional expectation. Then $e u_{\lambda'}^* u_\lambda e = 0$ if $\lambda \ne \lambda'$ and therefore the $e_\lambda \equiv u_\lambda e u_\lambda^*$ are mutually orthogonal projections in $M$ with $\mathcal E_1(e_\lambda) = \mu_A^{-1}$. Since their (strong) sum $p = \sum_\lambda e_\lambda$ is again an orthogonal projection we have $p \le 1$ and thus $\mathcal E_1(p) \le \mathcal E_1(1) = 1$. This implies the bound $N \mu_A^{-1} \le 1$ and thus our claim.
We shall say that a sector $[\rho]$ is of type I if $\bigvee_{I \in \mathcal I} \rho(A(I))$ is a type I von Neumann algebra, namely $\rho$ is a type I representation of the quasi-local $C^*$-algebra $\bigcup_{s>0} A(-s, s)$.
Corollary 14. If $A$ is completely rational on a separable Hilbert space, then all factor representations of $A$ on separable Hilbert spaces are of type I.
Proof. Assuming the contrary, by Corollary 59 we get an infinite family $[\rho_\lambda]$ of different irreducible sectors. This is in contradiction with Lemma 13.
We end this section with the following variation of a known fact [10].
Proposition 15. Let $A$ be a completely rational net with modular PCT on a Hilbert space $H$. Then $H$ is separable.
Proof. We choose a pair $I \subset \tilde I$ of intervals and a type I factor $N$ between $A(I)$ and $A(\tilde I)$. The vacuum vector $\Omega$ is separating for $A(\tilde I)$, hence for $N$. Thus $N$ admits a faithful normal state, hence it is countably decomposable. Being of type I, $N$ is countably generated. So $A(I)\Omega \subset N\Omega$ is a separable subspace of $H$. But $\bigcup_{n=1}^{\infty} A(-n, n)\Omega$ is dense in $H$, thus $H$ is separable.
4. The Structure of Sectors for the (Time = 0) LR Net
This section contains a study of the sector structure for the net obtained by the LR construction, by means of the braiding symmetry. It will be continued in the next section by a different approach.
Let $N$ be an infinite factor and $\{[\rho_i]\}$ a rational system of sectors of $N$, namely the $[\rho_i]$'s form a finite family of mutually different irreducible finite-dimensional sectors of $N$ which is closed under conjugation and taking the irreducible components of compositions. The identity sector is usually labeled as $\rho_0$. We call $M \supset N \otimes N^{\mathrm{opp}}$ the LR inclusion, the canonical inclusion constructed in [28], where $M$ is a factor, $N \otimes N^{\mathrm{opp}} \subset M$ is irreducible with finite index and
$$\lambda = \bigoplus_i \rho_i \otimes \rho_i^{\mathrm{opp}}$$
for $\lambda \in \mathrm{End}(N \otimes N^{\mathrm{opp}})$ as the restriction of $\gamma : M \to N \otimes N^{\mathrm{opp}}$. We shall give an alternative characterization of this inclusion in Proposition 45.
The same construction works in slightly more generality, by replacing $N^{\mathrm{opp}}$ with a factor $N_1$ and $\{\rho_i^{\mathrm{opp}}\}_i$ by $\{\rho_i^{j}\}_i \subset \mathrm{End}(N_1)$, where $\rho \to \rho^{j}$ is an anti-linear invertible tensor functor of the tensor category generated by $\{\rho_i\}_i$ to the tensor category generated by $\{\rho_i^{j}\}_i$. Extensions of our results to this case are obvious, but sometimes useful, and will be considered possibly implicitly.
The following is due to Izumi [22]. Since it is easy to give a proof in our context, we include a proof here.
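Lemma 16. For every $\rho_i$, the $(N \otimes N^{\mathrm{opp}})$–$M$ sector $[\rho_i \otimes \mathrm{id}][\gamma] = [\mathrm{id} \otimes \bar\rho_i^{\mathrm{opp}}][\gamma]$ is irreducible and each irreducible $(N \otimes N^{\mathrm{opp}})$–$M$ sector arising from $N \otimes N^{\mathrm{opp}} \subset M$ is of this form, where $\gamma$ is regarded as an $(N \otimes N^{\mathrm{opp}})$–$M$ sector. If $[\rho_i] \ne [\rho_j]$ as A–A sectors, then $[\rho_i \otimes \mathrm{id}][\gamma] \ne [\rho_j \otimes \mathrm{id}][\gamma]$ as $(N \otimes N^{\mathrm{opp}})$–$M$ sectors. We have
$$[\rho_i \otimes \rho_j^{\mathrm{opp}}][\gamma] = \bigoplus_k N_{i\bar j}^{k}\, [\rho_k \otimes \mathrm{id}][\gamma]$$
as $(N \otimes N^{\mathrm{opp}})$–$M$ sectors, where $N_{i\bar j}^{k}$ is the structure constant for $\{\rho_i\}_i$.
Proof. Set $[\sigma] = [\rho_i \otimes \mathrm{id}][\gamma]$ and compute $[\sigma][\bar\sigma]$. Since $[\bar\gamma] = [\iota]$, where $\iota$ is the inclusion map of $N \otimes N^{\mathrm{opp}}$ into $M$ regarded as a $M$–$(N \otimes N^{\mathrm{opp}})$ sector, and $[\gamma][\iota] = [\lambda] = \bigoplus_k [\rho_k \otimes \rho_k^{\mathrm{opp}}]$, we have $[\sigma][\bar\sigma] = \bigoplus_k [\rho_i \rho_k \bar\rho_i \otimes \rho_k^{\mathrm{opp}}]$, and this contains the identity only once. So $[\rho_i \otimes \mathrm{id}][\gamma]$ is an irreducible $(N \otimes N^{\mathrm{opp}})$–$M$ sector. We can similarly prove that if $[\rho_i] \ne [\rho_j]$, then $[\rho_i \otimes \mathrm{id}][\gamma] \ne [\rho_j \otimes \mathrm{id}][\gamma]$.
We next set $[\sigma'] = [\mathrm{id} \otimes \bar\rho_i^{\mathrm{opp}}][\gamma]$ as an $(N \otimes N^{\mathrm{opp}})$–$M$ sector, which is also irreducible. We compute
$$[\sigma][\bar\sigma'] = [\rho_i \otimes \mathrm{id}][\lambda][\mathrm{id} \otimes \rho_i^{\mathrm{opp}}] = \bigoplus_k [\rho_i \rho_k \otimes \rho_k^{\mathrm{opp}} \rho_i^{\mathrm{opp}}],$$
which contains the identity only once. So we have $[\rho_i \otimes \mathrm{id}][\gamma] = [\mathrm{id} \otimes \bar\rho_i^{\mathrm{opp}}][\gamma]$. The rest is now easy.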
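The decomposition formula of Lemma 16 can be sanity-checked at the level of statistical dimensions: since every sector on both sides carries the same factor $d(\gamma)$, it requires $d(\rho_i)\,d(\rho_j) = \sum_k N_{i\bar j}^{k}\, d(\rho_k)$. The sketch below tests this for an assumed example, the Ising-type fusion rules with labels 0 (identity), 1 ($\sigma$), 2 ($\psi$). The data tables and the helper names `decompose` and `dimensions_match` are hypothetical and not part of the paper.

```python
import math

# Assumed example data (not from the paper): Ising-type fusion rules.
DIMS = {0: 1.0, 1: math.sqrt(2.0), 2: 1.0}
CONJ = {0: 0, 1: 1, 2: 2}  # every sector is self-conjugate in this example
FUSION = {
    (0, 0): {0: 1}, (0, 1): {1: 1}, (0, 2): {2: 1},
    (1, 0): {1: 1}, (1, 1): {0: 1, 2: 1}, (1, 2): {1: 1},
    (2, 0): {2: 1}, (2, 1): {1: 1}, (2, 2): {0: 1},
}

def decompose(i, j):
    """Multiplicities N_{i, conj(j)}^k of [rho_k (x) id][gamma] in [rho_i (x) rho_j^opp][gamma]."""
    return dict(FUSION[(i, CONJ[j])])

def dimensions_match(i, j):
    """Check d_i * d_j == sum_k N_{i, conj(j)}^k * d_k up to rounding."""
    rhs = sum(mult * DIMS[k] for k, mult in decompose(i, j).items())
    return math.isclose(DIMS[i] * DIMS[j], rhs, rel_tol=1e-12)
```

For instance, `decompose(1, 1)` lists the two odd vertices reached from the even vertex $(\sigma, \sigma)$ in this assumed example.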
Let us now assume we have a strongly additive, Haag dual, irreducible net of factors $A(I)$ on $\mathbb R$ with a rational system of irreducible sectors $\{[\rho_i]\}_i$ (with $\rho_0 = \mathrm{id}$), namely $\{[\rho_i]\}_i$ is a family of finitely many different irreducible sectors of $A$ with finite dimension stable under conjugation and irreducible components of compositions. One may construct [42, 28] a net of subfactors $A \otimes A^{\mathrm{opp}} \subset B$ so that the corresponding canonical endomorphism restricted on $A \otimes A^{\mathrm{opp}}$ is given by $\bigoplus_i \rho_i \otimes \rho_i^{\mathrm{opp}}$. We call this $B$ the LR net. For $A^{\mathrm{opp}}$, we use $\varepsilon^{\mathrm{opp}}(\rho_k^{\mathrm{opp}}, \rho_l^{\mathrm{opp}}) = j(\varepsilon(\rho_k, \rho_l))^*$, where $j$ is the anti-isomorphism from $A$ to $A^{\mathrm{opp}}$. In order to distinguish two braidings, we write $\varepsilon^+$ and $\varepsilon^-$.
In other words, the LR net here is obtained as the time zero fields from the canonical two-dimensional net constructed in [28]: it is a local net, but if $A$ is translation covariant with positive energy, $B$ is translation covariant without the spectrum condition (the translations on $B$ are space translations).
Then the net of inclusion $A \otimes A^{\mathrm{opp}}(I) \subset B(I)$ is a net of subfactors in the sense of [28, Sect. 3], that is, we have a vacuum vector with Reeh-Schlieder property and consistent conditional expectations. We denote by $\gamma$ the canonical endomorphism of $B$ into $A \otimes A^{\mathrm{opp}}$ and its restriction to $A \otimes A^{\mathrm{opp}}$ by $\lambda$. We may suppose that also $\lambda$ is localized in $I$. We shorten our notation by setting $N \equiv A(I)$ and $M = B(I)$. We thus have $\lambda(x) = \sum_i V_i (\rho_i \otimes \rho_i^{\mathrm{opp}})(x) V_i^*$, where $V_i$'s are isometries in $N \otimes N^{\mathrm{opp}}$ with $\sum_i V_i V_i^* = 1$.
We follow [21] for the terminology of $(N \otimes N^{\mathrm{opp}})$–$M$ sectors, and so on, and study the sector structure of the subfactor $N \otimes N^{\mathrm{opp}} \subset M$ in this section. In other words we study the sector structure of a single subfactor, not the structure of superselection sectors of the net, though we will be interested in this structure for the net in the next section. So the terminology sector is used for a subfactor, not for a net, in this section. However the inclusion $N \otimes N^{\mathrm{opp}} \subset M$ has extra structure inherited by the inclusion of nets $A \otimes A^{\mathrm{opp}} \subset B$, that is there are the left and right unitary braid symmetries and the extension and restriction maps.
We first note that $\{[\rho_i \otimes \rho_j^{\mathrm{opp}}]\}_{ij}$ gives a system of irreducible $A \otimes A^{\mathrm{opp}}$–$A \otimes A^{\mathrm{opp}}$ sectors. This gives the description of the principal graph of $N \otimes N^{\mathrm{opp}} \subset M$ as a corollary as follows, which was first found by Ocneanu in [35] for his asymptotic inclusion. Label even vertices with $(i, j)$ for $[\rho_i \otimes \rho_{\bar j}^{\mathrm{opp}}]$ and odd vertices with $k$ for $[\rho_k \otimes \mathrm{id}][\gamma]$ and draw an edge with multiplicity $N_{ij}^{k}$ between the even vertex $(i, j)$ and the odd vertex $k$. The connected component of this graph containing the vertex $(0, 0)$ is the principal graph of the subfactor $N \otimes N^{\mathrm{opp}} \subset M$.
Now we consider the $\alpha$-induction introduced in [28] and further studied in [49, 2], namely if $\sigma$ is a localized endomorphism of $A \otimes A^{\mathrm{opp}}$, we set
$$\alpha_\sigma^{\pm} = \gamma^{-1} \cdot \mathrm{Ad}(\varepsilon^{\pm}(\sigma, \lambda)) \cdot \sigma \cdot \gamma. \tag{7}$$
(The notation in [28] is $\sigma^{\mathrm{ext}}$.) Recall that if $\sigma$ is an endomorphism of $A \otimes A^{\mathrm{opp}}$ localized in the interval $I$, then $\alpha_\sigma^{\pm}$ is an endomorphism of $B$ localized in a positive/negative half-line containing $I$, yet, as shown in [2, I], $\alpha_\sigma^{\pm}$ restricts to an endomorphism of $M = B(I)$. We will denote this restriction by the same symbol $\alpha_\sigma^{\pm}$.
Lemma 17. The $M$–$M$ sectors $[\alpha^+_{\rho_i \otimes \mathrm{id}}]$ are irreducible and mutually different.
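Proof. We compute $\langle \alpha^+_{\rho_i \otimes \mathrm{id}}, \alpha^+_{\rho_j \otimes \mathrm{id}} \rangle$, the dimension of the intertwiner space between $\alpha^+_{\rho_i \otimes \mathrm{id}}$ and $\alpha^+_{\rho_j \otimes \mathrm{id}}$, by using [2, I, Theorem 3.9]. This number is then equal to
$$\Big\langle \bigoplus_k \rho_k \rho_i \otimes \rho_k^{\mathrm{opp}},\ \rho_j \otimes \mathrm{id} \Big\rangle = \delta_{ij}.$$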
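This gives the conclusion.
Lemma 18. As $M$–$M$ sectors, we have $[\alpha^+_{\rho_i \otimes \mathrm{id}}] = [\alpha^+_{\mathrm{id} \otimes \bar\rho_i^{\mathrm{opp}}}]$.
Proof. By a similar argument to the proof of the above lemma, we know that $[\alpha^+_{\mathrm{id} \otimes \bar\rho_i^{\mathrm{opp}}}]$ is also irreducible. [2, I, Theorem 3.9] gives
$$\big\langle [\alpha^+_{\rho_i \otimes \mathrm{id}}], [\alpha^+_{\mathrm{id} \otimes \bar\rho_i^{\mathrm{opp}}}] \big\rangle = \Big\langle \bigoplus_k \rho_k \rho_i \otimes \rho_k^{\mathrm{opp}},\ \mathrm{id} \otimes \bar\rho_i^{\mathrm{opp}} \Big\rangle = 1,$$
which gives the conclusion.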
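The two intertwiner counts used in the proofs of Lemmas 17 and 18 reduce to fusion-rule bookkeeping, because the dimension of the intertwiner space between tensor products of sectors factorizes. A minimal sketch follows, again for an assumed Ising-type example with labels 0 (identity), 1 ($\sigma$), 2 ($\psi$). The tables and the function names `hom_dim`, `lemma17_number` and `lemma18_number` are hypothetical illustrations, not the authors' notation.

```python
# Assumed example data (not from the paper): Ising-type fusion rules.
LABELS = (0, 1, 2)
CONJ = {0: 0, 1: 1, 2: 2}  # conjugation map; all sectors self-conjugate here
FUSION = {
    (0, 0): {0: 1}, (0, 1): {1: 1}, (0, 2): {2: 1},
    (1, 0): {1: 1}, (1, 1): {0: 1, 2: 1}, (1, 2): {1: 1},
    (2, 0): {2: 1}, (2, 1): {1: 1}, (2, 2): {0: 1},
}

def hom_dim(i, target_left, target_right):
    """dim Hom( (+)_k rho_k rho_i (x) rho_k^opp , rho_{target_left} (x) rho_{target_right}^opp )."""
    total = 0
    for k in LABELS:
        left = FUSION[(k, i)].get(target_left, 0)   # <rho_k rho_i, rho_{target_left}>
        right = 1 if k == target_right else 0       # <rho_k^opp, rho_{target_right}^opp>
        total += left * right
    return total

def lemma17_number(i, j):
    """The number computed in the proof of Lemma 17 (target rho_j (x) id)."""
    return hom_dim(i, j, 0)

def lemma18_number(i):
    """The number computed in the proof of Lemma 18 (target id (x) conj(rho_i)^opp)."""
    return hom_dim(i, 0, CONJ[i])
```

In this assumed example the first count is the Kronecker delta and the second is always 1, matching the values stated in the two proofs.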
We then have the following corollary.
Corollary 19. The set of irreducible $M$–$M$ sectors appearing in the decomposition of $\alpha^+_{\rho_i \otimes \rho_j^{\mathrm{opp}}}$ for all $i, j$ is $\{[\alpha^+_{\rho_i \otimes \mathrm{id}}]\}_i$.
The next theorem is useful for studying the subfactors arising from disconnected intervals for a conformal net. For the rest of this section we shall assume the braiding to be non-degenerate.
Theorem 20. Assume the braiding to be non-degenerate and suppose an irreducible $M$–$M$ sector $[\beta]$ appears in decompositions of both $\alpha^+_{\rho_i \otimes \rho_j^{\mathrm{opp}}}$ and $\alpha^-_{\rho_k \otimes \rho_l^{\mathrm{opp}}}$ for some $i, j, k, l$. Then $[\beta]$ is the identity of $M$.
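Proof. $\alpha^+$ and $\alpha^-$ map sectors localized in bounded intervals to soliton sectors localized in right unbounded and left unbounded half-lines, respectively. Hence $[\beta]$ is localized in a bounded interval. By the above corollary, we may assume that $[\beta] = [\alpha^+_{\rho_i \otimes \mathrm{id}}]$ for some $i$, hence $\rho_i \otimes \mathrm{id}$ must have trivial monodromy with $\lambda$, i.e., $\varepsilon(\rho_i \otimes \mathrm{id}, \lambda)\varepsilon(\lambda, \rho_i \otimes \mathrm{id}) = 1$, which in turn gives $\varepsilon(\rho_i, \rho_k)\varepsilon(\rho_k, \rho_i) = 1$ for all $k$. The non-degeneracy assumption gives $[\rho_i] = [\mathrm{id}]$ as desired.
We now define an endomorphism of $M$ by $\beta_{ij} = \alpha^+_{\rho_i \otimes \mathrm{id}}\, \alpha^-_{\mathrm{id} \otimes \rho_j^{\mathrm{opp}}}$. More explicitly, we have $\beta_{ij} = \gamma^{-1} \cdot \mathrm{Ad}(U_{ij}^{+-}) \cdot (\rho_i \otimes \rho_j^{\mathrm{opp}}) \cdot \gamma$, where
$$U_{ij}^{+-} = \sum_k V_k \big(\varepsilon^{+}(\rho_i, \rho_k) \otimes \varepsilon^{-,\mathrm{opp}}(\rho_j^{\mathrm{opp}}, \rho_k^{\mathrm{opp}})\big)(\rho_i \otimes \rho_j^{\mathrm{opp}})(V_k^*).$$
Note that if we define similarly
$$U_{ij}^{++} = \sum_k V_k \big(\varepsilon^{+}(\rho_i, \rho_k) \otimes \varepsilon^{+,\mathrm{opp}}(\rho_j^{\mathrm{opp}}, \rho_k^{\mathrm{opp}})\big)(\rho_i \otimes \rho_j^{\mathrm{opp}})(V_k^*),$$
we then have $\alpha^+_{\rho_i \otimes \rho_j^{\mathrm{opp}}} = \gamma^{-1} \cdot \mathrm{Ad}(U_{ij}^{++}) \cdot (\rho_i \otimes \rho_j^{\mathrm{opp}}) \cdot \gamma$. By [2],1 Prop. 18, we have
$$[\beta_{ij}] = [\alpha^+_{\rho_i \otimes \mathrm{id}}][\alpha^-_{\mathrm{id} \otimes \rho_j^{\mathrm{opp}}}] = [\alpha^-_{\bar\rho_j \otimes \mathrm{id}}][\alpha^+_{\rho_i \otimes \mathrm{id}}] = [\alpha^-_{\mathrm{id} \otimes \rho_j^{\mathrm{opp}}}][\alpha^+_{\rho_i \otimes \mathrm{id}}]$$
as $M$–$M$ sectors.
The following proposition is originally due to Izumi [22] (with a different proof) and first due to Ocneanu [37] in the setting of the asymptotic inclusion. (Also see [13].)
Proposition 21. Each $[\beta_{ij}]$ is an irreducible $M$–$M$ sector and these are mutually different for different pairs of $i, j$. Each irreducible $M$–$M$ sector arising from $N \otimes N^{\mathrm{opp}} \subset M$ is of this form.
Proof. We compute
$$\langle \beta_{ij}, \beta_{kl} \rangle = \big\langle \alpha^+_{\rho_i \otimes \mathrm{id}}\, \alpha^-_{\mathrm{id} \otimes \rho_j^{\mathrm{opp}}},\ \alpha^+_{\rho_k \otimes \mathrm{id}}\, \alpha^-_{\mathrm{id} \otimes \rho_l^{\mathrm{opp}}} \big\rangle = \big\langle \alpha^+_{\bar\rho_k \rho_i \otimes \mathrm{id}},\ \alpha^-_{\mathrm{id} \otimes \rho_l^{\mathrm{opp}} \bar\rho_j^{\mathrm{opp}}} \big\rangle.$$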
The only sector which can be contained in $[\alpha^+_{\bar\rho_k \rho_i \otimes \mathrm{id}}]$ and $[\alpha^-_{\mathrm{id} \otimes \rho_l^{\mathrm{opp}} \bar\rho_j^{\mathrm{opp}}}]$ is the identity by Theorem 20. So the above number is $\delta_{ik}\delta_{jl}$. Since the square sums of the statistical dimensions for $\{\rho_i \otimes \rho_j^{\mathrm{opp}}\}_{ij}$ and $\{\beta_{ij}\}_{ij}$ are the same, this completes the proof.
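The counting step that closes the proof compares two sums of squared dimensions. Assuming, as is standard for $\alpha$-induction but not restated in the proof, that $d(\beta_{ij}) = d(\rho_i)\,d(\rho_j)$, both sums equal $\big(\sum_i d(\rho_i)^2\big)^2$. The sketch below checks this numerically for assumed Ising-type dimensions; the data and function names are hypothetical.

```python
import itertools
import math

def square_sum(dims):
    """Return sum_i d_i^2."""
    return sum(d * d for d in dims)

def square_sum_of_pairs(dims):
    """Return sum_{i,j} (d_i * d_j)^2, the square sum over the pairs (i, j)."""
    return sum((di * dj) ** 2 for di, dj in itertools.product(dims, repeat=2))

# Assumed example data (not from the paper): Ising-type dimensions.
ising_dims = [1.0, math.sqrt(2.0), 1.0]
pairs_total = square_sum_of_pairs(ising_dims)
```

For these assumed values the nine pairs contribute a total of 16, the square of the single-system sum 4.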
Note that here we have used the definition in [28] for the map $\rho_i \otimes \rho_j^{\mathrm{opp}} \mapsto \beta_{ij}$, and a general theory of this map has been studied in [2] under the name $\alpha$-induction. But in [2], they assumed a certain condition, called chiral locality in the terminology of [3], and some results in [2] depend on this assumption, while the definition itself makes sense without it. Our mixed use of braidings $\varepsilon^+$ and $\varepsilon^-$ here violates this chiral locality condition, so we can use the results in [2] here only when they are independent of the chiral locality assumption. For example, it is easy to see that the analogue of [2, I, Theorem 3.9] does not hold for our map here.
With the above proposition, we have the following description of the dual principal graph of $N \otimes N^{\mathrm{opp}} \subset M$ as a corollary, which is originally due to Ocneanu [37]. (Also see [13].) Label even vertices with $(i, j)$ for $[\beta_{i\bar j}]$ and odd vertices with $k$ for $[\rho_k \otimes \mathrm{id}][\gamma]$ and draw an edge with multiplicity $N_{ij}^{k}$ between the even vertex $(i, j)$ and the odd vertex $k$. The connected component of this graph containing the vertex $(0, 0)$ is the dual principal graph of the subfactor $N \otimes N^{\mathrm{opp}} \subset M$, which is the same as the principal graph.
We next study the tensor category of the $M$–$M$ sectors.
Lemma 22. Let $V$, $W$ be intertwiners from $\rho_i \rho_k$ to $\rho_m$ and from $\rho_j \rho_l$ to $\rho_n$, respectively, in $N$. Then $V \otimes W^{*\,\mathrm{opp}} \in N \otimes N^{\mathrm{opp}}$ is an intertwiner from $\beta_{ij}\beta_{kl}$ to $\beta_{mn}$.
Proof. By a direct computation.
Then we easily get the following from the above lemma. (The quantum 6j-symbols for subfactors have been introduced in [36] as a generalization for classical 6j-symbols. See [12, Chapter 12] for details.)
Theorem 23. In the above setting, the tensor categories of $(N \otimes N^{\mathrm{opp}})$–$(N \otimes N^{\mathrm{opp}})$ sectors and $M$–$M$ sectors with quantum 6j-symbols are isomorphic.
5. Relations with the Quantum Double
This section contains our main results. Here below we will consider an inclusion $A \subset B$ of nets of factors. We shall say that $A \subset B$ has finite index if there is a consistent family of conditional expectations $\mathcal E_I : B(I) \to A(I)$, $I \in \mathcal I$ and $[B(I) : A(I)]_{\mathcal E_I} < \infty$ does not depend on $I \in \mathcal I$. The independence of the index of the interval $I$ automatically holds if there is a vector (vacuum) with Reeh-Schlieder property and $\mathcal E_I$ preserves the vacuum state (standard nets, see [28]). The index will be simply denoted by $[B : A]$.
Proposition 24. Let $A \subset B$ be a finite-index inclusion of nets of factors as above. If $A$ and $B$ are completely rational then
$$\mu_A = \mathrm{I}^2 \mu_B$$
with $\mathrm{I} = [B : A]$.
Proof. If $N_1$, $N_2$ are factors, we shall use the symbol
$$N_1 \overset{\alpha}{\subset} N_2$$
to indicate that $N_1 \subset N_2$ and $[N_2 : N_1] = \alpha$.
Let $E = I_1 \cup I_2 \in \mathcal I_2$; we will show that
$$\begin{array}{ccc}
B(E) & \overset{\mu_B}{\subset} & B(E')' \\
\cup\ {\scriptstyle \mathrm{I}^2} & & \cap\ {\scriptstyle \mathrm{I}^2} \\
A(E) & \overset{\mathrm{I}^2 \mu_A}{\subset} & A(E')'
\end{array}$$
where $A(E) \subset B(E)$ has index $\mathrm{I}^2$ because $A(E) \cong A(I_1) \otimes A(I_2)$, $B(E) \cong B(I_1) \otimes B(I_2)$ and $[B(I_i) : A(I_i)] = \mathrm{I}$. In the diagram, the commutants are taken in the Hilbert space $H_B$ of $B$, hence $B(E) \overset{\mu_B}{\subset} B(E')'$ is obvious. We now show that on $H_B$,
$$A(E) \overset{\mathrm{I}^2 \mu_A}{\subset} A(E')'.$$
Let $\gamma : B \to A$ be a canonical endomorphism with $\lambda = \gamma|_A$ localized in an interval $I_0$; then the net $I \to A(I)$ on $H_B$ ($I \supset I_0$) is unitarily equivalent to the net $I \to \lambda(A(I))$ on $H_A$ and we may assume $I_0 \subset I_1$. Then the correspondence associated with
$$A(E)\text{–}A(E') \quad \text{on } H_B,$$
namely $H_B$ with the natural commuting actions of $A(E)$ and $A(E')$, is unitarily equivalent to the one associated with
$$\lambda(A(E))\text{–}\lambda(A(E')) \quad \text{on } H_A,$$
namely $H_A$ with the commuting actions of $A(E)$ and $A(E')$ obtained by composing their defining actions with the map $X \to \lambda(X)$. But $\lambda(A(E)) = \lambda(A(I_1) \vee A(I_2)) = \lambda(A(I_1)) \vee A(I_2)$ and $\lambda(A(E')) = A(E')$, hence the $A(E)$–$A(E')$ correspondence on $H_B$ is unitarily equivalent to $(\lambda(A(I_1)) \vee A(I_2))$–$A(E')$ on $H_A$ and its index is
$$[\hat A(E) : \lambda(A(I_1)) \vee A(I_2)] = [\hat A(E) : A(E)]\,[A(E) : \lambda(A(I_1)) \vee A(I_2)] = \mu_A \mathrm{I}^2.$$
It follows from the diagram that $\mathrm{I}^2 \mu_A = \mu_B\, \mathrm{I}^2 \cdot \mathrm{I}^2$, thus $\mathrm{I}^2 \mu_B = \mu_A$.
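The index count at the end of the proof is plain multiplicativity along the two paths of the diagram. As a small arithmetic sketch, the helper below solves $\mu_A = \mathrm{I}^2 \mu_B$ for $\mu_B$ and checks the relation $\mathrm{I}^2 \mu_A = \mu_B\, \mathrm{I}^2 \cdot \mathrm{I}^2$ with exact rational arithmetic. The numerical values and the function names are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def mu_extension(mu_a, index):
    """Solve mu_A = I^2 * mu_B for mu_B (the relation of Proposition 24)."""
    return Fraction(mu_a) / (Fraction(index) ** 2)

def diagram_is_consistent(mu_a, mu_b, index):
    """Check I^2 * mu_A == mu_B * I^2 * I^2, the two ways of reading the diagram."""
    i2 = Fraction(index) ** 2
    return i2 * Fraction(mu_a) == Fraction(mu_b) * i2 * i2

# Hypothetical values chosen only for illustration.
mu_a, index = 16, 2
mu_b = mu_extension(mu_a, index)
```

Using `Fraction` rather than floats keeps the comparison exact, which matters because the check is an equality of indices.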
The following proposition may be generalized to the case of a finite-index inclusion A ⊂ B as above.
Proposition 25. Let $A$ be completely rational with modular PCT and $B \supset A \otimes A^{\mathrm{opp}}$ be the LR net. Then also $B$ is completely rational with modular PCT.
Proof. Let $E = I_1 \cup I_2$ and $I_3$ be the bounded connected component of $E'$. Set $C \equiv A \otimes A^{\mathrm{opp}}$. Then the conditional expectation $\mathcal E_I : B(I) \to C(I)$ associated with the interval $I$, where $I$ is the interior of $\bar I_1 \cup \bar I_2 \cup \bar I_3$, maps $\hat B(E)$ onto $\hat C(E)$, because $\mathcal E_I(\hat B(E)) \subset C(I_3)' \cap C(I) = \hat C(E)$, thus
$$\mathcal E \equiv \mathcal E_0 \cdot \mathcal E_I|_{\hat B(E)} \tag{8}$$
is a finite-index expectation of $\hat B(E)$ onto $C(E)$, where $\mathcal E_0$ is the expectation of $\hat C(E)$ onto $C(E)$. Therefore $\mu_B < \infty$ follows by a diagram similar to the one in the proof of Proposition 24 (with $A \otimes A^{\mathrm{opp}}$ instead of $A$), as we know a priori that the vertical inclusions have a finite index, while the bottom horizontal inclusion has finite index by the argument given there.
Then the strong additivity of $B$ follows easily, and so its modular PCT property, but we omit the arguments that are not essential here (in the conformal case this follows directly because then also $B$ is conformal).
We now show the split property of $B$. For notational convenience we treat the case of two separated intervals, rather than that of an interval and the complement of a larger interval. It will be enough to show that the above expectation (8) satisfies
$$\mathcal E(b_1 b_2) = \mathcal E(b_1)\mathcal E(b_2), \quad b_i \in B(I_i),$$
and $\mathcal E(B(I_i)) \subset C(I_i)$, as we may then compose a normal product state $\varphi_1 \otimes \varphi_2$ of $C(I_1) \vee C(I_2) \simeq C(I_1) \otimes C(I_2)$ with $\mathcal E$ to get a normal product state of $B(I_1) \vee B(I_2)$.
Let $R_i^{(h)} \in B(I_h)$, $h = 1, 2$, be elements satisfying the relations (15) so that $B(I_h)$ is generated by $C(I_h)$ and $\{R_i^{(h)}\}_i$. With $b^{(h)} \in B(I_h)$ we then have
$$b^{(h)} = \sum_i a_i^{(h)} R_i^{(h)}, \quad a_i^{(h)} \in C(I_h),$$
hence
$$b^{(1)} b^{(2)} = \sum_{i,j} a_i^{(1)} a_j^{(2)} R_i^{(1)} R_j^{(2)},$$
so we have to show that $\mathcal E(R_i^{(1)} R_j^{(2)}) = 0$ unless $i = j = 0$. Now $R_i^{(1)} = u_i R_i^{(2)}$ for some unitary $u_i \in \hat C(E)$ and
$$\mathcal E_I(R_i^{(2)} R_j^{(2)}) = \mathcal E_I\Big(\sum_k C_{ij}^{k\,(2)} R_k^{(2)}\Big) = C_{ij}^{0\,(2)} = \delta_{i\bar j}\, C_{ij}^{0\,(2)}$$
(see Appendix A for the definition of the $C_{ij}^{k}$), hence
$$\mathcal E(R_i^{(1)} R_j^{(2)}) = \mathcal E(u_i R_i^{(2)} R_j^{(2)}) = \mathcal E_0\big(u_i\, \mathcal E_I(R_i^{(2)} R_j^{(2)})\big) = \mathcal E_0\big(u_i\, C_{i\bar i}^{0\,(2)}\big) = \mathcal E_0(u_i)\, C_{i\bar i}^{0\,(2)},$$
which is 0 if $i \ne 0$ because $\mathcal E_0(u_i) \in C(E)$ is an intertwiner between irreducible endomorphisms localized in $I_1$ and $I_2$, while $\mathcal E_0(u_0) = \mathcal E_0(1) = 1$.
We get the following corollary, where the last part will follow from Proposition 36 later.
Corollary 26. Let $A$ be completely rational and $A \otimes A^{\mathrm{opp}} \subset B$ be the LR inclusion. Then
$$\mu_A^2 = \mathrm{I}_{\mathrm{global}}^2\, \mu_B,$$
where $\mathrm{I}_{\mathrm{global}} = \sum_i d(\rho_i)^2$. In particular, $\mu_B = 1$ if and only if $A(E) \subset \hat A(E)$ is isomorphic to the LR inclusion.
Proof. By Propositions 24, 25 and 36.
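Once the squared dimensions are known, the relation in Corollary 26 determines $\mu_B$ from $\mu_A$. The sketch below evaluates $\mu_B = \mu_A^2 / \mathrm{I}_{\mathrm{global}}^2$ exactly. The Ising-type squared dimensions $(1, 2, 1)$, the sample values of $\mu_A$, and the name `mu_lr_net` are assumptions made for illustration only.

```python
from fractions import Fraction

def mu_lr_net(mu_a, squared_dims):
    """Return mu_B from mu_A^2 = I_global^2 * mu_B, with I_global = sum_i d(rho_i)^2."""
    i_global = sum(Fraction(d2) for d2 in squared_dims)
    return Fraction(mu_a) ** 2 / i_global ** 2

# Assumed example data (not from the paper): squared Ising-type dimensions.
ising_squared_dims = [1, 2, 1]
mu_b_when_equal = mu_lr_net(4, ising_squared_dims)   # case mu_A == I_global
mu_b_when_larger = mu_lr_net(8, ising_squared_dims)  # hypothetical case mu_A > I_global
```

In this arithmetic, $\mu_B = 1$ occurs exactly when the supplied $\mu_A$ equals $\mathrm{I}_{\mathrm{global}}$, which is the numerical shadow of the last statement of the corollary.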
Lemma 27. Let $A_1$, $A_2$ be irreducible, Haag dual nets on separable Hilbert spaces. Assume that each sector of $A_1$ is of type I. If $\rho$ is an irreducible localized endomorphism of $A_1 \otimes A_2$, then $\rho \simeq \rho_1 \otimes \rho_2$ with $\rho_i$ irreducible localized endomorphisms of $A_i$.
Proof. Let $\pi$ be a DHR representation of $A_1 \otimes A_2$ (see Appendix B) on a separable Hilbert space $H$. Then $\pi(\mathfrak A_1)''$ and $\pi(\mathfrak A_2)''$ generate the von Neumann algebra $B(H)$, where $\mathfrak A_i$ denotes the quasi-local $C^*$-algebra associated with $A_i$. Hence $\pi(\mathfrak A_1)''$ and $\pi(\mathfrak A_2)''$ are factors. Let $\pi_i \equiv \pi|_{\mathfrak A_i}$, where we identify $\mathfrak A_1$ with $\mathfrak A_1 \otimes \mathbb C$ and $\mathfrak A_2$ with $\mathbb C \otimes \mathfrak A_2$, then $\pi_i$ is easily seen to be localizable in bounded intervals (namely if $I_1 \in \mathcal I$, the restriction of $\pi_i$ to the $C^*$-algebra generated by $\{A_i(I) : I \subset I_1', I \in \mathcal I\}$ extends to a normal representation of $A_i(I_1')$). Therefore $\pi_i$ is unitarily equivalent to a localized endomorphism of $A_i$. As $\pi_1$ is a factor representation, by assumption $\pi(\mathfrak A_1)''$ is a type I factor and so is $\pi(\mathfrak A_2)''$. We then have a decomposition $\pi = \pi_1 \otimes \pi_2$. This concludes the proof.
Corollary 28. Let $A$ be a completely rational net on a separable Hilbert space. The only irreducible finite dimensional sectors of $A \otimes A^{\mathrm{opp}}$ are
$$[\rho_i \otimes \rho_j^{\mathrm{opp}}]$$
with $[\rho_i]$, $[\rho_j]$ irreducible sectors of $A$.
Proof. Immediate by Corollary 14 and the above lemma.
Lemma 29. Let $A$ be completely rational and $B \supset A \otimes A^{\mathrm{opp}}$ the LR net. If $\sigma$ is an irreducible localized endomorphism of $B$ and $\sigma \prec \alpha^+_\rho$, $\sigma \prec \alpha^-_{\rho'}$ for some localized endomorphisms $\rho$, $\rho'$ of $A \otimes A^{\mathrm{opp}}$, then $\sigma$ is localized in a bounded interval.
Proof. The thesis follows because $\sigma \prec \alpha^+_\rho$ is localized in a right half-line and $\sigma \prec \alpha^-_{\rho'}$ in a left half-line.
The following lemma extends Theorem 20.
Lemma 30. Let $A$ be a completely rational net, $\{[\rho_i]\}_i$ the system of all irreducible sectors with finite dimension, and $B \supset A \otimes A^{\mathrm{opp}}$ the LR net. The following are equivalent:
(i) The braiding of the net $A$ is non-degenerate.
(ii) $B$ has no non-trivial localized endomorphism (localized in a bounded interval, finite index).
Proof. We use now an argument in [7]. Let $\sigma$ be a non-trivial irreducible localized endomorphism of $B$ localized in an interval, with $d(\sigma) < \infty$. By Frobenius reciprocity
$$\sigma \prec \alpha^+_{\sigma^{\mathrm{rest}}}, \quad \sigma \prec \alpha^-_{\sigma^{\mathrm{rest}}},$$
where $\sigma^{\mathrm{rest}} = \gamma \cdot \sigma|_{A \otimes A^{\mathrm{opp}}}$ and $\gamma : B \to A \otimes A^{\mathrm{opp}}$ is a canonical endomorphism. Hence if $\rho_k \otimes \mathrm{id} \prec \sigma^{\mathrm{rest}}$ is an irreducible sector with $[\alpha^+_{\rho_k \otimes \mathrm{id}}] = [\sigma]$, then by [28, Prop. 3.9], the monodromy of $\rho_k \otimes \mathrm{id}$ with $\gamma|_{A \otimes A^{\mathrm{opp}}} = \bigoplus_i \rho_i \otimes \rho_i^{\mathrm{opp}}$ must be trivial, namely $\rho_k$ is a non-trivial sector with degenerate braiding.
The converse is true, namely if $\rho_k$ is a non-trivial degenerate sector, then $\alpha^+_{\rho_k \otimes \mathrm{id}}$ is a non-trivial sector of $B$ localized in a bounded interval.
Lemma 31. Let $A$ be a completely rational net with modular PCT and let $\{[\rho_i]\}_i$ be the system of all finite dimensional sectors of $A$. If $E = I_1 \cup I_2 \in \mathcal I_2$, then
$$\lambda_E = \bigoplus_i \rho_i \bar\rho_i|_{A(E)},$$
where $\lambda_E = \gamma_E|_{A(E)}$, the $\rho_i$'s are localized in $I_1$ and the $\bar\rho_i$'s are localized in $I_2$.
Proof. Let $j = \mathrm{Ad}\,J$, where $J$ is the modular conjugation of $A(0, \infty)$. Given $I \in \mathcal I$ we may identify $A(I)^{\mathrm{opp}}$ with $j(A(I)) = A(-I)$. We define a net $\tilde A$ on $\mathbb R$ setting
$$\tilde A(I) \equiv A(I) \otimes A(I)^{\mathrm{opp}} = A(I) \otimes A(-I), \quad I \in \mathcal I.$$
With $I = (a, b)$ with $0 < a < b$ and $E = I \cup -I$, let $\gamma_E : \hat A(E) \to A(E)$ be the canonical endomorphism and $\lambda_E \equiv \gamma_E|_{A(E)}$. We identify now $\lambda_E$ with an endomorphism $\eta_I$ of $\tilde A(I)$ and want to show that $\eta_I$ extends to a localized endomorphism of $\tilde A$.
The proof is similar to the one of Theorem 12. With $d > c > b$, by Lemma 11 there is an extension $\eta$ of $\eta_{(a,b)}$ to $\tilde A(a, d)$ with $\eta|_{\tilde A(b,d)} = \mathrm{id}$ and a canonical endomorphism $\eta_{(a,d)}$ acting trivially on $\tilde A(a, c)$ with a unitary $u \in \tilde A(a, d)$ such that
$$\eta = \mathrm{Ad}\,u \cdot \eta_{(a,d)}.$$
Therefore $\mathrm{Ad}\,u|_{\tilde A(-\infty,c)}$ is an extension of $\eta_{(a,b)}$ to $\tilde A(-\infty, c)$ which acts trivially on $\tilde A(-\infty, a)$ and on $\tilde A(b, c)$. Letting $c \to \infty$ we obtain the desired extension of $\eta_{(a,b)}$ to $\tilde A$, that we still denote by $\eta$.
Now, by Lemma 27 for $\tilde A$, every irreducible subsector of $\eta$ will be equivalent to $\rho_h \otimes (j \cdot \rho_k \cdot j)$ for some $h, k$, hence each irreducible subsector of $\lambda_E$ must be equivalent to $\rho_h \cdot \bar\rho_k|_{A(E)}$, where $\rho_h$ is localized in $(a, b)$ and $\bar\rho_k$ is localized in $(-b, -a)$. By Theorem 9 this is possible if and only if $h = k$.
Corollary 32. Let $A$ be completely rational with modular PCT. The following are equivalent:
(i) The net $A$ has no non-trivial sector with finite dimension.
(ii) The net $A$ has no non-trivial sector (with finite or infinite dimension).
(iii) $\mu_A = 1$, namely $A(E)' = A(E')$ for all $E \in \mathcal I_2$.
Proof. (i) ⇒ (ii): It will be enough to show that every sector (possibly with infinite dimension) $\rho$ of $A$ contains the identity sector. Given $E = I_1 \cup I_2$ with $I_1, I_2 \in \mathcal I$, we may suppose that $\rho$ is localized in $I_1$ and choose a sector $\rho'$ equivalent to $\rho$ and localized in $I_2$. If $u$ is a unitary with $\mathrm{Ad}\,u \cdot \rho = \rho'$, then $u \in \hat A(E)$, hence $u \in A(E)$ by assumptions. Now $A(E) \simeq A(I_1) \otimes A(I_2)$ by the split property, hence there exists a conditional expectation $\mathcal E : A(E) \to A(I_1)$ with $\mathcal E(u) \ne 0$, thus $\mathcal E(u)$ is a non-zero intertwiner between $\rho$ and the identity.
(ii) ⇒ (iii) follows by Lemma 31.
(iii) ⇒ (i) follows by Th. 9 (or by Lemma 31).
The condition $\mu_A = 1$ is however compatible with the existence of soliton sectors.
Note also that the condition that $A(E) \subset \hat A(E)$ has depth ≤ 2 (equivalently $\hat A(E)$ is the crossed product of $A(E)$ by a finite-dimensional Hopf algebra) is equivalent to the innerness of the sector $\lambda$ extending $\lambda_E$ (because $\lambda_E$ is implemented by a Hilbert space of isometries in $\hat A(E)$ [26]), hence it is equivalent to the property that all irreducible sectors of $A$ have dimension 1 by Lemma 31.
The following is the main result of this paper.
Theorem 33. Let $A$ be completely rational with modular PCT. Then
$$\mu_A = \mathrm{I}_{\mathrm{global}} \equiv \sum_i d(\rho_i)^2$$
and $A(E) \subset \hat A(E)$ is isomorphic to the LR inclusion associated with $A(I_1) \otimes A(I_2)$ and all the finite-dimensional irreducible sectors $[\rho_i]$ of $A$.
Proof. $\hat A(E) \supset A(E)$ contains the LR inclusion by the following Proposition 36. Since $\mu_A = \mathrm{I}_{\mathrm{global}}$ by Lemma 31 it has to coincide with the LR inclusion.
Corollary 34. Let $A$ be completely rational and conformal. The inclusions $A(E) \subset \hat A(E)$ are all isomorphic for $E \in \mathcal I_2$.
Proof. If $I \in \mathcal I$ and the $\rho_i$'s are localized in $I$, for any given $I_1 \in \mathcal I$ there is a Möbius transformation giving rise to an isomorphism of $A(I)$ with $A(I_1)$ carrying the $\rho_i$'s to endomorphisms localized in $I_1$. Therefore the isomorphism class of $\{A(E), \lambda_E\}$ is independent of $E \in \mathcal I_2$. Hence the LR inclusions based on that are isomorphic.
Indeed, by using the uniqueness of the $III_1$ injective factor [6, 19] and the classification of its finite depth subfactors [40] we have the following.
Corollary 35. Let $A$ be completely rational and conformal. The isomorphism class of the inclusion $A(E) \subset \hat A(E)$, $E \in \mathcal I_2$, depends only on the tensor category of the sectors of $A$, not on its model realization.
Proof. If $A$ is non-trivial and $I$ is an interval, then $A(I)$ is a $III_1$ factor and, as the split property holds, $A(I)$ is injective (see e.g. [27]). Thus $A(I)$ is the unique injective $III_1$ factor [19]. By Popa's theorem [40], if $N$ is a $III_1$ injective factor and $T \subset \mathrm{End}(N)$ a rational tensor category isomorphic to the tensor category of sectors of $A$ (as abstract tensor categories), then there exists an isomorphism of $N$ with $A(I)$ implementing the equivalence between the two tensor categories. Since the LR inclusion $N \otimes N^{\mathrm{opp}} \subset M$ clearly depends, up to isomorphism, only on $N$ and the tensor category $T \subset \mathrm{End}(N)$, it is then isomorphic to $A(E) \subset \hat A(E)$.
We now show that, even in the infinite index case, the two-interval inclusion always contains the LR inclusion associated with any rational system of irreducible sectors.
Proposition 36. Let $A$ be completely rational with modular PCT $j$ and $E = I \cup -I \in \mathcal I_2$ a symmetric 2-interval and $\{[\rho_i]\}$ a rational system of irreducible sectors of $A$ with finite dimension, with the $\rho_i$'s localized in $I$. Let $R_i \in (\mathrm{id}, \bar\rho_i \rho_i)$ be non-zero intertwiners, where $\bar\rho_i = j \cdot \rho_i \cdot j$.
If $M$ is the von Neumann subalgebra of $\hat A(E)$ generated by $A(E)$ and $\{R_i\}_i$, then $M \supset A(E)$ is isomorphic to the LR inclusion associated with $\{[\rho_i]\}_i$, in particular
$$[M : A(E)] = \sum_i d(\rho_i)^2.$$
More generally this holds true if the assumption of complete rationality is relaxed with possibly $[\hat A(E) : A(E)] = \infty$.
Proof. Denoting by $N$ the factor $A(0, \infty)$, we may assume $\bar I \subset (0, \infty)$ and consider the $\rho_i$ as endomorphisms of $N$. Let then $V_i$ be the isometry standard implementation of $\rho_i$ as in [17]. Since $J V_i J = V_i$, we have $\rho_i \bar\rho_i(X) V_i = V_i X$ for all $X \in N \vee N'$, hence for all local operators $X$ by strong additivity. Since $\rho_i$ is irreducible, $(\mathrm{id}, \rho_i \bar\rho_i)$ is one-dimensional, thus $R_i$ is a multiple of $V_i$ and we may assume $R_i = \sqrt{d(\rho_i)}\, V_i$, thus
$$R_i^* R_i = d(\rho_i). \tag{9}$$
Now $V_i V_j$ is the standard implementation of $\rho_i \rho_j$ on $N$ hence by [17, Prop. A.4], we have
$$R_i R_j = \sum_k C_{ij}^{k} R_k, \tag{10}$$
where $C_{ij}^{k}$ is the canonical intertwiner between $\rho_k \bar\rho_k$ and $\rho_i \rho_j \bar\rho_i \bar\rho_j$ given by
$$C_{ij}^{k} = \sum_h w_h\, j(w_h) \simeq \sum_h w_h \otimes j(w_h), \tag{11}$$
where the $w_h$'s form an orthonormal basis of isometries in $(\rho_k, \rho_i \rho_j)$.
Setting $\rho_0 = \mathrm{id}$, we also have
$$R_i^* = d(\rho_i)\, C_{\bar i i}^{0*} R_{\bar i}. \tag{12}$$
Indeed the above equality holds up to sign by the $j$-invariance of both members [17, Lemma A.3], but the minus sign does not occur because both members have positive expectation values on the vacuum vector.
Now by the split property $A(E) = A(I) \vee A(-I) \simeq A(I) \otimes A(-I)$ and $A(-I) = j(A(I))$ can be identified with $A(I)^{\mathrm{opp}}$, therefore $M$ is isomorphic to the algebra generated by $A(I) \otimes A(I)^{\mathrm{opp}}$ and a multiple of isometries $R_i$ satisfying the above relations. Moreover, there exists a conditional expectation from $M$ to $A(I) \otimes A(I)^{\mathrm{opp}}$. Corollary 46 then gives the desired isomorphism between $A(E) \subset M$ and the LR inclusion. (The Longo–Rehren inclusion in [31], as well as in [28], is dual to the one in this paper, but it does not matter here. Notice further that, in the conformal case, the 2-interval inclusion $A(E) \subset \hat A(E)$ is manifestly self-dual.)
The above proof works also in the case $\mu_A = \infty$ thanks to Prop. 45.
Corollary 37. Let $A$ be completely rational with modular PCT. Then the braiding of the tensor category of all sectors of $A$ is non-degenerate.
Proof. With the notations in Corollary 26 we have $\mu_A^2 = \mathrm{I}_{\mathrm{global}}^2\, \mu_B$. On the other hand $\mathrm{I}_{\mathrm{global}}^2 = \mathrm{I}_{\mathrm{global}}(A \otimes A^{\mathrm{opp}})$, hence
$$\mathrm{I}_{\mathrm{global}}(A \otimes A^{\mathrm{opp}}) = \mu_A^2 = \mathrm{I}_{\mathrm{global}}^2\, \mu_B,$$
therefore $\mu_{\mathcal B} = 1$. By Corollary 32, $\mathcal B$ has no non-trivial sector localized in a bounded interval and this is equivalent to the non-degeneracy of the braiding by Lemma 30.

That $\mu_{\mathcal A} = I_{\mathrm{global}}$ implies the non-degeneracy of the braiding has been noticed in [32, Corollary 4.3]. An immediate consequence of Corollary 37 follows from the work [41], where a model independent construction of Verlinde's matrices $S$ and $T$ has been performed, provided the braiding symmetry is non-degenerate, thus providing a corresponding representation of the modular group $SL(2,\mathbb Z)$. Hence we have:

Corollary 38. The Verlinde matrices $T$ and $S$ constructed in [41] are non-degenerate, hence there exists an associated representation of the modular group $SL(2,\mathbb Z)$.

Corollary 39. Let $\mathcal A$ be completely rational with modular PCT. Every sector of $\mathcal A$ is a direct sum of finite dimensional sectors.

Proof. Assuming the contrary, by Proposition 59 we have an irreducible sector $[\rho]$ with infinite dimension. Let $E = I_1 \cup I_2 \in \mathcal I_2$ with $\rho$ localized in $I_1$ and $\rho'$ be equivalent to $\rho$ and localized in $I_2$. Let $u$ be a unitary in $(\rho, \rho')$. Then $u \in \hat{\mathcal A}(E)$, hence it has a unique expansion
$$u = \sum_i x_i R_i, \qquad x_i \in \mathcal A(E),$$
where the $R_i$ are as in Proposition 36. As $xu = u\rho(x)$, $x \in \mathcal A(I_1)$, we have
$$x \sum_i x_i R_i = \sum_i x_i R_i\, \rho(x) = \sum_i x_i\, (\rho_i \cdot \bar\rho_i)(\rho(x))\, R_i = \sum_i x_i\, \rho_i(\rho(x))\, R_i \qquad \forall x \in \mathcal A(I_1),$$
thus $x x_i = x_i\, \rho_i(\rho(x))$ for all $i$. As there is an $x_i \neq 0$, by the split property there is a non-zero intertwiner between $\rho_i \cdot \rho$ and the identity. As $\rho_i$ and $\rho$ are irreducible, this implies that $\rho$ is finite dimensional, contradicting our assumption.

Corollary 40. Let $\mathcal A$ be conformal and completely rational. Then every representation on a separable Hilbert space is Möbius covariant with positive energy.

Proof. By the preceding result every such representation is a direct sum of irreducible sectors with finite dimension. According to [16] every finite dimensional sector is covariant with positive energy, thus also a direct sum of such sectors.

6. n-Interval Inclusions

In this section we extend the results on the 2-interval subfactors to arbitrary multi-interval subfactors. Let $\mathcal A$ be a local, irreducible net on $S^1$. We assume $\mathcal A$ to be completely rational with modular PCT, so that our previous analysis applies. Alternatively $\mathcal A$ may be assumed to be conformal with $\mu_{\mathcal A} = [\hat{\mathcal A}(E) : \mathcal A(E)]$ finite and independent of the 2-interval $E$; this setting will be needed to derive Cor. 7. If $E \in \mathcal I_n$ we set
$$\mu_n = [\hat{\mathcal A}(E) : \mathcal A(E)].$$
With this notation $\mu_{\mathcal A} = \mu_2$. We also consider the situation occurring in representations different from the vacuum representation: if $\rho$ is a localizable representation of $\mathcal A$ (i.e. a DHR representation, which on $S^1$ is just a locally normal representation), we set
$$\mu_n^\rho = [\rho(\mathcal A(E'))' : \rho(\mathcal A(E))].$$

Lemma 41. $\mu_n^\rho = \mu_1^\rho\, \mu_n$, $\forall n \in \mathbb N$.
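Proof. Let $E = I_1 \cup I_2 \cup \cdots \cup I_n \in \mathcal I_n$. We may suppose that $\rho$ is an endomorphism of $\mathcal A$ localized in $I_1$. Since $\rho$ acts trivially on $E'$, we have $\rho(\mathcal A(E'))' = \mathcal A(E')' = \hat{\mathcal A}(E)$, thus the inclusion $\rho(\mathcal A(E)) \subset \rho(\mathcal A(E'))'$ is a composition
$$\rho(\mathcal A(E)) \subset \mathcal A(E) \subset \rho(\mathcal A(E'))' = \hat{\mathcal A}(E);$$
by the split property $\rho(\mathcal A(E)) \subset \mathcal A(E)$ is isomorphic to $\rho(\mathcal A(I_1)) \otimes \mathcal A(I_2 \cup \cdots \cup I_n) \subset \mathcal A(I_1) \otimes \mathcal A(I_2 \cup \cdots \cup I_n)$, therefore
$$\mu_n^\rho = [\hat{\mathcal A}(E) : \mathcal A(E)] \cdot [\mathcal A(I_1) : \rho(\mathcal A(I_1))].$$

Lemma 42. $\mu_n^\rho = d(\rho)^2\, \mu_2^{\,n-1}$, $\forall n \in \mathbb N$.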
Proof. By the index-statistics theorem [25] we have $\mu_1^\rho = d(\rho)^2$, hence, by Lemma 41, we only need to show that $\mu_n = \mu_2^{\,n-1}$. We proceed inductively. If $n = 1$ the claim is trivially true. Assume the claim for a given $n$ and let $E_n = I_1 \cup \cdots \cup I_n \in \mathcal I_n$ and $E_{n+1} = I_1 \cup \cdots \cup I_n \cup I_{n+1} \in \mathcal I_{n+1}$. Then
$$\mathcal A(E_{n+1}) = \mathcal A(E_n) \vee \mathcal A(I_{n+1}) \subset \hat{\mathcal A}(E_n) \vee \mathcal A(I_{n+1}) \subset \hat{\mathcal A}(E_{n+1}),$$
thus, by the split property, $\mu_{n+1} = \mu_n \cdot [\hat{\mathcal A}(E_{n+1}) : \hat{\mathcal A}(E_n) \vee \mathcal A(I_{n+1})]$ and, by the inductive assumption, we have to show that the index of $\hat{\mathcal A}(E_n) \vee \mathcal A(I_{n+1}) \subset \hat{\mathcal A}(E_{n+1})$ is equal to $\mu_2$. But the commutant of this latter inclusion, $\mathcal A(E_{n+1}') \subset \mathcal A(I_{n+1})' \cap \mathcal A(E_n')$, has index $\mu_2$ because, by the split property, it turns out to be isomorphic to $\mathcal A(I_\ell \cup I_r) \otimes \mathcal A(L) \subset \hat{\mathcal A}(I_\ell \cup I_r) \otimes \mathcal A(L)$, namely to a 2-interval inclusion tensored by a common factor, where $I_\ell$ and $I_r$ are the two intervals of $E_{n+1}'$ contiguous to $I_{n+1}$ and $L$ is the remaining $(n-1)$-subinterval of $E_{n+1}'$.

Theorem 43. Let $\mathcal A$ be a local, irreducible completely rational net with modular PCT. Let $E = \cup_{i=1}^n I_i \in \mathcal I_n$ and $\lambda^{(n)} = \gamma^{(n)}|_{\mathcal A(E)}$, where $\gamma^{(n)}$ is a canonical endomorphism from $\hat{\mathcal A}(E)$ into $\mathcal A(E)$. Then
$$\lambda^{(n)} \cong \bigoplus_{i_1,\dots,i_n} N^0_{i_1 \dots i_n}\, \rho_{i_1}\rho_{i_2} \cdots \rho_{i_n}, \tag{13}$$
where $\{[\rho_i]\}_i$ are all the irreducible sectors with finite statistics, $\rho_{i_k}$ being localized in $I_k$. $N^0_{i_1 \dots i_n}$ is the multiplicity of the identical endomorphism in the product $\rho_{i_1} \cdots \rho_{i_n}$. The same results hold true if complete rationality is replaced by conformal invariance and assuming $[\hat{\mathcal A}(E) : \mathcal A(E)] = I_{\mathrm{global}} < \infty$ independently of the 2-interval $E$.

Proof. Let $I$ be an interval which contains $\cup_i I_i$ and let $\rho_{i_k}$, $k = 1, \dots, n$, be irreducible endomorphisms localized in $I_k$, respectively. Then the intertwiner space between $\rho_{i_1}\rho_{i_2} \cdots \rho_{i_n}$, considered as an endomorphism of $\mathcal A(I)$, and the identity has dimension $N^0_{i_1 \dots i_n}$. We are using here the equivalence between local and global intertwiners, that holds either by strong additivity or by conformal invariance [17]. These intertwiners are multiples of isometries in $\hat{\mathcal A}(E)$. Thus, by the argument leading to Th. 9, $\rho_{i_1}\rho_{i_2} \cdots \rho_{i_n}|_{\mathcal A(E)}$ is contained in $\lambda^{(n)}$ with multiplicity $N^0_{i_1 \dots i_n}$. We have thus proved the inclusion $\succ$ in (13). Now the dimension of the endomorphism on the right-hand side of (13) has been computed in [50]. For the sake of self-containedness we repeat the argument:
$$\begin{aligned}
\sum_{i_1,\dots,i_n} N^0_{i_1 \dots i_n}\, d(\rho_{i_1}) \cdots d(\rho_{i_n})
&= \sum_{i_1,\dots,i_{n-1}} \Big( \sum_{i_n} N^{\bar i_n}_{i_1 \dots i_{n-1}}\, d(\rho_{i_n}) \Big)\, d(\rho_{i_1}) \cdots d(\rho_{i_{n-1}}) \\
&= \sum_{i_1,\dots,i_{n-1}} d(\rho_{i_1})^2 \cdots d(\rho_{i_{n-1}})^2 \\
&= \Big( \sum_i d(\rho_i)^2 \Big)^{n-1},
\end{aligned} \tag{14}$$
where we have used Frobenius reciprocity $N^0_{i_1 \dots i_n} = N^{\bar i_n}_{i_1 \dots i_{n-1}}$, the fact $d(\bar\rho) = d(\rho)$ and the identity $\sum_i \langle \rho_i, \rho \rangle\, d(\rho_i) = d(\rho)$. On the other hand, we have
$$d(\lambda^{(n)}) = [\hat{\mathcal A}(E) : \mathcal A(E)] = \mu_{\mathcal A}^{\,n-1} = I_{\mathrm{global}}^{\,n-1} = \Big( \sum_i d(\rho_i)^2 \Big)^{n-1},$$
where the first equality is obvious, the second is given by Lemma 42 and the last one follows from the results of the preceding section. Thus the endomorphisms on both sides of (13) have the same dimension, hence they are equivalent. The last claim in the statement follows by the same arguments and the equivalence between local and global intertwiners.
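The dimension count (14) can be checked numerically in a concrete family. The sketch below is only an illustration under stated assumptions: it uses the standard SU(2)_k fusion rules and quantum dimensions as input data (they are not derived in this paper), all sectors there are self-conjugate, and the function names are hypothetical. It sums $N^0_{i_1 \dots i_n}\, d(\rho_{i_1}) \cdots d(\rho_{i_n})$ by brute force and compares the result with $(\sum_i d(\rho_i)^2)^{n-1}$.

```python
import itertools
import math


def su2_dimensions(k):
    """Assumed input: d_j = sin((j+1)pi/(k+2)) / sin(pi/(k+2)) for j = 0..k."""
    q = math.pi / (k + 2)
    return [math.sin((j + 1) * q) / math.sin(q) for j in range(k + 1)]


def su2_fusion(k, a, b, c):
    """Assumed input: SU(2)_k fusion coefficient N_{ab}^c (labels are twice the spin)."""
    allowed = abs(a - b) <= c <= min(a + b, 2 * k - a - b) and (a + b + c) % 2 == 0
    return 1 if allowed else 0


def vacuum_multiplicity(k, labels):
    """Multiplicity of the identity sector (label 0) in the product of the given sectors."""
    vec = [0] * (k + 1)
    vec[0] = 1  # start from the identity sector
    for i in labels:
        # fuse the current decomposition with sector i
        vec = [sum(vec[a] * su2_fusion(k, a, i, c) for a in range(k + 1))
               for c in range(k + 1)]
    return vec[0]


def weighted_vacuum_sum(k, n):
    """Left-hand side of (14): sum over all n-tuples of N^0 times the product of dimensions."""
    d = su2_dimensions(k)
    total = 0.0
    for labels in itertools.product(range(k + 1), repeat=n):
        total += vacuum_multiplicity(k, labels) * math.prod(d[i] for i in labels)
    return total


def global_index_power(k, n):
    """Right-hand side of (14): (sum_i d_i^2)^(n-1)."""
    return sum(x * x for x in su2_dimensions(k)) ** (n - 1)


if __name__ == "__main__":
    for k in (1, 2, 3):
        for n in (1, 2, 3, 4):
            print(k, n, weighted_vacuum_sum(k, n), global_index_power(k, n))
```

The brute-force enumeration is deliberately naive; it keeps the correspondence with the multi-index sum in (14) visible and is only practical for small $k$ and $n$.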
Corollary 44. Let $\mathcal A$ be as in Th. 43. If $E \in \mathcal I_n$, then $\mathcal A(E) \subset \hat{\mathcal A}(E)$ is isomorphic to the $n$th iterated LR inclusion associated with $N \equiv \mathcal A(I)$, $I \in \mathcal I$, and the system of all sectors of $\mathcal A$ (considered as sectors of $N$). In particular, for a fixed $n \in \mathbb N$, the isomorphism class of $\mathcal A(E) \subset \hat{\mathcal A}(E)$ depends only on the superselection structure of $\mathcal A$ and not on $E \in \mathcal I_n$.

Proof. Let $E = I_1 \cup \cdots \cup I_n \in \mathcal I_n$ with $\bar E \subset (0,\infty)$ and $n = 2^k$. It follows by Lemma 42 and the split property that
$$[\hat{\mathcal A}(E \cup -E) : \hat{\mathcal A}(E) \vee \hat{\mathcal A}(-E)] = I_{\mathrm{global}}.$$
On the other hand, if the $\rho_i$'s are localized in $I_1$, then the algebra generated by $\hat{\mathcal A}(E) \vee \hat{\mathcal A}(-E)$ and the standard implementation isometries $V_i$ of $\rho_i|_{\hat{\mathcal A}(E)}$ gives the associated LR inclusion, analogously as in Th. 33, and is contained in $\hat{\mathcal A}(E \cup -E)$, hence coincides with it by the equality of the indices. The corollary then follows in the case $n = 2^k$ by induction, once we note that at each step the extension $\alpha^+_{\rho_i \otimes \mathrm{id}}$ from $\hat{\mathcal A}(E) \vee \hat{\mathcal A}(-E)$ to $\hat{\mathcal A}(E \cup -E)$ is $\rho_i|_{\hat{\mathcal A}(E \cup -E)}$. The same is then true for an arbitrary $n$ by taking relative commutants.

7. Examples and Further Comments

Our results may be first illustrated by considering the case of an inclusion of completely rational, local conformal irreducible nets $\mathcal A \subset \mathcal B$, where $\mathcal A = \mathcal B^G$ is the fixed-point net of $\mathcal B$ with respect to the action of a finite group $G$ and $\mu_{\mathcal B} = 1$. Then $[\mathcal B : \mathcal A] = |G|$, thus by Prop. 24, $I_{\mathrm{global}}(\mathcal A) = \mu_{\mathcal A} = |G|^2$. Now $\mathcal A$ has the DHR [9] irreducible sectors $[\rho_\pi]$ associated with $\pi \in \hat G$ and
$$\sum_{\pi \in \hat G} d(\rho_\pi)^2 = |G|,$$
therefore $\mathcal A$ has extra irreducible sectors $[\sigma_i]$ with
$$\sum_i d(\sigma_i)^2 = |G|^2 - |G|.$$
For example, in the case of the Ising model, we have $\mathcal A = \mathcal B^{\mathbb Z_2}$ as above (but with $\mathcal B$ twisted local, yet this does not alter our discussion), thus $\mu_{\mathcal A} = 4$ and thus $\sum_i d(\rho_i)^2 = 4$, so the standard three sectors are the only irreducible sectors. On the other hand, in the situation studied in [34], the superselection category of $\mathcal A$ is equivalent to the representation category of a twisted quantum double $D^\omega(G)$ with $\omega \in H^3(G, \mathbb T)$. Since $D^\omega(G)$ is semisimple we again have
$$\sum_{\sigma \in \widehat{D^\omega(G)}} d(\sigma)^2 = \dim D^\omega(G) = |G|^2 = \mu_{\mathcal A}.$$
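The counting $|G| + (|G|^2 - |G|) = |G|^2$ can be made concrete in the untwisted case. The following sketch assumes trivial $\omega$ (an assumption made only for illustration; the text allows a twist) and uses the standard labelling of the irreducible representations of $D(G)$ by pairs (conjugacy class $C$, irreducible representation $\pi$ of the centralizer $Z$), with dimension $|C| \dim \pi$. Since $\sum_\pi (\dim\pi)^2 = |Z| = |G|/|C|$, only the class sizes are needed. The function names are hypothetical.

```python
import itertools


def compose(p, q):
    """Composition p after q of permutations given as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    """Inverse of a permutation given as a tuple."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def conjugacy_classes(group):
    """Partition a finite permutation group (list of tuples) into conjugacy classes."""
    remaining = set(group)
    classes = []
    while remaining:
        g = next(iter(remaining))
        cls = {compose(compose(h, g), inverse(h)) for h in group}
        classes.append(cls)
        remaining -= cls
    return classes


def double_dimension_sum(group):
    """Sum of d^2 over the irreducibles of the untwisted double D(G).

    Each class C contributes |C|^2 * |Z|, where |Z| = |G| / |C| is the
    order of the centralizer of an element of C.
    """
    order = len(group)
    return sum(len(c) ** 2 * (order // len(c)) for c in conjugacy_classes(group))


if __name__ == "__main__":
    s3 = list(itertools.permutations(range(3)))
    total = double_dimension_sum(s3)
    # |G| comes from the DHR sectors labelled by the dual of G, the rest from extra sectors
    print(len(s3), total, total - len(s3))
```

For $G = S_3$ the DHR sectors account for $|G| = 6$ of the total, and the remaining sectors of the double account for $|G|^2 - |G| = 30$.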
One may compare this with the situation occurring on a higher dimensional spacetime. There the strong additivity property may be replaced by the requirement that $\mathcal A(O' \cap \tilde O)' \cap \mathcal A(\tilde O) = \mathcal A(O)$ if $O \subset \tilde O$ are double cones. If $E \equiv O_1 \cup O_2$, where $O_1$ and $O_2$ are double cones with space-like separated closure, the split property gives a natural isomorphism of $\mathcal A(O_1) \vee \mathcal A(O_2)$ with $\mathcal A(O_1) \otimes \mathcal A(O_2)$ and
$$[\mathcal A(E')' : \mathcal A(E)] = I_{\mathrm{global}} = \sum_{\pi \in \hat G} d(\rho_\pi)^2 = |G|,$$
where $G$ is the gauge group and the $\rho_\pi$'s are the DHR sectors [9] (there are no extra sectors). The reason for this difference is that on $S^1$ the complement of a 2-interval is still a 2-interval, thus the inclusion $\mathcal A(E) \subset \hat{\mathcal A}(E)$ is self-dual, while on the Minkowski spacetime the spacelike complement of $O_1 \cup O_2$ is a connected region producing no charge transfer inclusion.

The index $\mu_{\mathcal A}$ in the models given by the loop group construction for $SU(n)_k$ has been computed in [50]. Our results apply in particular to these nets and the 2-interval inclusion is the LR inclusion associated with the corresponding irreducible sectors $\{[\rho_i]\}_i$. We note that in this case the 2-interval inclusion is not the asymptotic inclusion of the corresponding Jones-Wenzl subfactor [24, 48], even up to tensoring by a common injective III$_1$ factor. Consider $SU(2)_k$ as an example. The net has $k+1$ sectors and if we choose the standard generator, we get a corresponding subfactor of Jones with principal graph $A_{k+1}$, up to tensoring a common injective factor of type III$_1$, as in [47]. If we apply the construction of the asymptotic inclusion to this subfactor, we get a “quantum double” of only the sectors corresponding to the even vertices of $A_{k+1}$. We get the same result if we apply the LR construction to the system of $N$–$N$ sectors (or $M$–$M$ sectors). But the construction of a subfactor from 4 intervals gives a “quantum double” of the system of all the sectors, both even and odd. If we want to get this system from the asymptotic inclusion or the Longo–Rehren inclusion, we have to use also bimodules/sectors corresponding to the odd vertices of the (dual) principal graph. In order to get this LR inclusion from the construction of the asymptotic inclusion, we need to proceed as follows. Let $\{[\rho_i]\}_i$ be the set of all the sectors for the net arising from the loop group construction for $SU(n)_k$ as above. Then for a fixed interval $I \subset S^1$, we consider $(\bigoplus_i \rho_i)(\mathcal A(I)) \subset \mathcal A(I)$, which has finite index and finite depth. Take a hyperfinite II$_1$ subfactor $P \subset Q$ with the same higher relative commutants as $(\bigoplus_i \rho_i)(\mathcal A(I)) \subset \mathcal A(I)$. Then the tensor categories of the sectors with quantum $6j$-symbols of $Q \vee (Q' \cap Q_\infty) \subset Q_\infty$ and $\mathcal A(E) \subset \hat{\mathcal A}(E)$ are isomorphic. For this reason, the index of the asymptotic inclusion of the Jones subfactor with principal graph $A_{k+1}$ is half of that of the subfactor arising from 4 intervals and the net for $SU(2)_k$. For $SU(n)_k$, this ratio of the two indices is $n$.

Finally we notice that there are models like the $SO(2N)_1$ WZW models, see [1] or [34], where all irreducible sectors have dimension one, yet the superselection category $\mathcal C$ is modular in agreement with our results. In these cases the fusion graph is disconnected, therefore the equivalent categories of $M$–$M$ and of $N \otimes N^{\mathrm{opp}}$–$N \otimes N^{\mathrm{opp}}$ sectors are proper subcategories of the categories $\mathcal C \times \mathcal C^{\mathrm{opp}} \simeq D(\mathcal C)$, where $D(\mathcal C)$ is the quantum double of $\mathcal C$.

We close this section with a few questions. Does there exist a net with only trivial sectors and non-trivial 2-interval inclusions (thus $\mu_{\mathcal A} = \infty$)? Is strong additivity automatic in the definition of complete rationality? Is the LR inclusion the only extension of $N \otimes N^{\mathrm{opp}}$ with the given canonical endomorphism $\bigoplus_i \rho_i \otimes \rho_i^{\mathrm{opp}}$?

A. The Crossed Product Structure of the LR Inclusion

Let $N$ be an infinite factor and $\{[\rho_i]\}_i$ a rational system of irreducible sectors of $N$. The LR inclusion [28] is a canonical inclusion $N \otimes N^{\mathrm{opp}} \subset M$ associated with $N$ and
$\{[\rho_i]\}_i$ such that
$$\lambda \simeq \bigoplus_i \rho_i \otimes \rho_i^{\mathrm{opp}},$$
where $\lambda$ is the restriction to $N \otimes N^{\mathrm{opp}}$ of the canonical endomorphism of $M$ into $N \otimes N^{\mathrm{opp}}$. In [28] such an inclusion is obtained by a canonical choice of the intertwiners $T \in (\mathrm{id}, \lambda)$ and $S \in (\lambda, \lambda^2)$ that characterize the canonical endomorphism [26] (Q-system). We now show the universality property of this inclusion and its crossed product structure, that will provide a different realization of it. By LR inclusion we will mean the upward LR inclusion. We shall consider the free $*$-algebra $M_0$ generated by $N \otimes N^{\mathrm{opp}}$ and elements $R_i$ satisfying the relations
$$\begin{aligned}
R_i x &= (\rho_i \otimes \rho_i^{\mathrm{opp}})(x)\, R_i, \qquad x \in N \otimes N^{\mathrm{opp}}, \\
R_i^* R_i &= d(\rho_i), \\
R_i R_j &= \sum_k C_{ij}^k R_k, \\
R_i^* &= d(\rho_i)\, C^{0*}_{\bar i i}\, R_{\bar i},
\end{aligned} \tag{15}$$
where $C_{ij}^k$ is the canonical intertwiner between $\rho_k \otimes \rho_k^{\mathrm{opp}}$ and $\rho_i\rho_j \otimes \rho_i^{\mathrm{opp}}\rho_j^{\mathrm{opp}}$ given by $C_{ij}^k = \sum_h w_h \otimes j(w_h)$, with $j$ the antilinear isomorphism of $N$ with $N^{\mathrm{opp}}$, and the $w_h$'s form an orthonormal basis of isometries in $(\rho_k, \rho_i\rho_j)$. We equip $M_0$ with the maximal C$^*$-seminorm associated to the representations of $M_0$ whose restrictions to $N \otimes N^{\mathrm{opp}}$ are normal, denote by $M$ the quotient of $M_0$ modulo the ideal formed by the elements that are null with respect to this seminorm, and refer to $M$ as the free reduced pre-C$^*$-algebra generated by $N \otimes N^{\mathrm{opp}}$ and the $R_i$'s.

Proposition 45. Let $N$ be an infinite factor with separable predual and $\{[\rho_i]\}_i$ a rational system of finite-dimensional irreducible sectors of $N$. Let $M$ be the free reduced pre-C$^*$-algebra generated by $N \otimes N^{\mathrm{opp}}$ and elements $R_i$ satisfying the relations (15) as above. Then $M$ is a factor and $N \otimes N^{\mathrm{opp}} \subset M$ is isomorphic to the LR inclusion associated with $N$ and $\{[\rho_i]\}_i$. In particular every element $X \in M$ has a unique expansion
$$X = \sum_i x_i R_i, \qquad x_i \in N \otimes N^{\mathrm{opp}}.$$
In other words: if $N \otimes N^{\mathrm{opp}}$ acts normally on a Hilbert space $\mathcal H$ and $R_i \in B(\mathcal H)$ are elements satisfying the relations (15), then the sub-algebra $M$ of $B(\mathcal H)$ generated by $N \otimes N^{\mathrm{opp}}$ and the $R_i$'s is a factor and $N \otimes N^{\mathrm{opp}} \subset M$ is isomorphic to the LR inclusion.

Proof. Clearly all elements of $M$ have the form
$$X = \sum_i x_i R_i, \qquad x_i \in N \otimes N^{\mathrm{opp}}, \tag{16}$$
and we may suppose that $M$ acts on a Hilbert space so that $N$ and $N^{\mathrm{opp}}$ are weakly closed.
We now construct a conditional expectation $E : M \to N \otimes N^{\mathrm{opp}}$. Setting $\rho_0 = \mathrm{id}$, the expectation $E$ may be defined by
$$E(X) = x_0 \tag{17}$$
for $X$ given by (16), once we show that this is well-defined. To this end we will apply the averaging argument in [23].

Let $J$ be the set of all $x_0 \in N \otimes N^{\mathrm{opp}}$ such that there exist $x_i \in N \otimes N^{\mathrm{opp}}$, $i > 0$, with $\sum_{i \geq 0} x_i R_i = 0$. Clearly $J$ is a two-sided ideal of $N \otimes N^{\mathrm{opp}}$, hence $J = 0$ (as we want to show) or $J = N \otimes N^{\mathrm{opp}}$ (we may suppose $N$ to be of type III). Suppose $J \neq 0$ and let $X = 1 + \sum_{i>0} x_i R_i = 0$, thus
$$X = uXu^* = 1 + \sum_{i>0} u x_i R_i u^* = 1 + \sum_{i>0} u x_i\, (\rho_i \otimes \rho_i^{\mathrm{opp}})(u^*)\, R_i = 0$$
for all unitaries $u \in N \otimes N^{\mathrm{opp}}$. Letting $u$ run in the unitary group of a simple injective subfactor $R$ of $N \otimes N^{\mathrm{opp}}$ and taking a mean over this group, we have
$$X = 1 + \sum_{i>0} y_i R_i = 0,$$
where $y_i \in N \otimes N^{\mathrm{opp}}$ intertwines $\mathrm{id}$ and $\rho_i \otimes \rho_i^{\mathrm{opp}}$ on $R$, thus on all $N \otimes N^{\mathrm{opp}}$ by the simplicity of $R$. Since $\rho_i \otimes \rho_i^{\mathrm{opp}}$ is irreducible, $y_i = 0$, $i > 0$, and we have $1 = 0$, a contradiction.

Notice now that
$$R_i R_i^* = d(\rho_i)\, R_i\, C^{0*}_{\bar i i}\, R_{\bar i} = d(\rho_i)\, (\rho_i \otimes \rho_i^{\mathrm{opp}})(C^{0*}_{\bar i i})\, R_i R_{\bar i} = d(\rho_i)\, (\rho_i \otimes \rho_i^{\mathrm{opp}})(C^{0*}_{\bar i i}) \sum_k C^k_{i \bar i}\, R_k,$$
thus, by the conjugate equation in [25], we have
$$E(R_i R_i^*) = d(\rho_i)\, (\rho_i \otimes \rho_i^{\mathrm{opp}})(C^{0*}_{\bar i i})\, C^0_{i \bar i} = \frac{1}{d(\rho_i)},$$
so every $X \in M$ has the unique expansion
$$X = \sum_i x_i R_i, \qquad x_i = d(\rho_i)\, E(X R_i^*). \tag{18}$$
Denoting by $M_1 \supset N \otimes N^{\mathrm{opp}}$ the LR inclusion associated with $N$ and $\{[\rho_i]\}_i$, $M_1$ is generated by $N \otimes N^{\mathrm{opp}}$ and elements $R_i'$, with an expectation $E'$, satisfying the relations as in (15) and (18) [31, Sect. 5], hence the linear map
$$\Phi : X \equiv \sum_i x_i R_i \in M \;\mapsto\; \Phi(X) \equiv \sum_i x_i R_i' \in M_1 \tag{19}$$
is clearly a homomorphism of $M$ onto $M_1$, which is the identity on $N \otimes N^{\mathrm{opp}}$. $\Phi$ is clearly one-to-one by the uniqueness of the expansion (18) both in $M$ and in $M_1$.
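The expansion (18) has a familiar finite-dimensional analogue, which may help in reading the formula $x_i = d(\rho_i) E(X R_i^*)$. The sketch below is a toy model and not the construction of Proposition 45: it assumes all sectors are automorphisms of dimension one, replaces $N \otimes N^{\mathrm{opp}}$ by the (commutative) algebra of diagonal $n \times n$ matrices, and takes $R_g = U^g$ with $U$ the cyclic shift, so that the crossed product is the full matrix algebra and $E$ is the projection onto the diagonal. All function names are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(a):
    """Transpose of a square matrix (the adjoint for real permutation matrices)."""
    n = len(a)
    return [[a[j][i] for j in range(n)] for i in range(n)]


def shift_power(n, g):
    """Matrix of U^g, where U maps the basis vector e_j to e_{(j+1) mod n}."""
    return [[1 if i == (j + g) % n else 0 for j in range(n)] for i in range(n)]


def expectation(x):
    """Toy conditional expectation: projection onto the diagonal subalgebra."""
    n = len(x)
    return [[x[i][j] if i == j else 0 for j in range(n)] for i in range(n)]


def coefficients(x):
    """Coefficients x_g = E(X (U^g)^*), the analogue of (18) with d = 1."""
    n = len(x)
    return [expectation(matmul(x, transpose(shift_power(n, g)))) for g in range(n)]


def reconstruct(coeffs):
    """Rebuild X = sum_g x_g U^g from its diagonal coefficients."""
    n = len(coeffs)
    total = [[0] * n for _ in range(n)]
    for g, c in enumerate(coeffs):
        term = matmul(c, shift_power(n, g))
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


if __name__ == "__main__":
    x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(coefficients(x))
    print(reconstruct(coefficients(x)) == x)
```

In this toy model uniqueness of the expansion is just the statement that a matrix is determined by its $n$ cyclic diagonals; the averaging argument in the proof plays the same role in the infinite-dimensional setting.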
Note that the above proposition gives an alternative construction of the LR inclusion, which is similar to Popa's construction of the symmetric enveloping algebra [39], as follows. Let $N$ act standardly on $L^2(N)$ and $V_i$ be the standard isometry implementing $\rho_i$. The $*$-algebra $A$ generated by $N$ and $N'$ is naturally isomorphic to the algebraic tensor product $N \odot N^{\mathrm{opp}}$ and the operators $R_i \equiv \sqrt{d(\rho_i)}\, V_i$ satisfy the relations (15) by [17, App. A]. By the above argument there exists a conditional expectation $E : B \to A$, where $B$ is the $*$-algebra generated by $A$ and the $V_i$'s. Taking a normal state $\varphi$ of $N$, the state $\tilde\varphi \equiv \varphi \otimes \varphi^{\mathrm{opp}} \cdot E$ of $B$ gives by the GNS representation the LR inclusion $\pi_{\tilde\varphi}(A)'' \subset \pi_{\tilde\varphi}(B)''$ (Prop. 45).

Corollary 46. Let $N$ be an infinite factor with separable predual and $\{[\rho_i]\}_i$ a rational system of finite-dimensional irreducible sectors of $N$. Let $M$ be a von Neumann algebra with $M \supset N \otimes N^{\mathrm{opp}}$ and $R_i \in M$ elements satisfying the relations (15). If $M$ is generated by $N \otimes N^{\mathrm{opp}}$ and the $R_i$'s, then $N \otimes N^{\mathrm{opp}} \subset M$ is isomorphic to the LR inclusion associated with $\{[\rho_i]\}_i$. In particular $(N \otimes N^{\mathrm{opp}})' \cap M = \mathbb C$ and there exists a normal conditional expectation from $M$ to $N \otimes N^{\mathrm{opp}}$.

Proof. The proof is immediate; the isomorphism is obtained as in (19):
$$X \in M \;\mapsto\; \sum_i d(\rho_i)\, E(X R_i^*)\, R_i'$$
(notations analogous to the ones in (19)).
In the following we shall iterate the LR construction, in order to describe the structure of multi-interval subfactors. With $N$ an infinite factor as above and $\{[\rho_i]\}_i$ a system of irreducible sectors with unitary braiding symmetry, let $\alpha^+$ be the induction map from sectors $\rho_i \otimes \rho_j^{\mathrm{opp}}$ of $N \otimes N^{\mathrm{opp}}$ to sectors of the LR extension $M_1 \equiv M$ defined by formula (7). Then $\{\alpha^+_{\rho_i \otimes \mathrm{id}}\}_i$ is a system of irreducible sectors of $M$ with braiding symmetry and we may construct the corresponding LR inclusion $M_1 \otimes M_1^{\mathrm{opp}} \subset M_2$, where the opposite of $\alpha^+_{\rho_i \otimes \mathrm{id}}$ is $\alpha^+_{\bar\rho_i \otimes \mathrm{id}}$. We may then iterate the procedure to obtain a tower $M_1 \subset M_2 \subset M_{2^k} \subset \cdots$ and thus an inclusion
$$N_n \subset M_n, \qquad n = 2^k,$$
where $N_n \equiv N \otimes N^{\mathrm{opp}} \otimes N \otimes \cdots \otimes N \otimes N^{\mathrm{opp}}$ ($2^k$ tensor factors). By construction this inclusion has index $I_{\mathrm{global}}^{\,n-1}$ and we refer to it as the $n$th iterated LR inclusion.

Proposition 47. Let $n = 2^k$. The $n$th iterated LR inclusion $N_n \subset M_n$ is irreducible. If $\gamma^{(n)} : M_n \to N_n$ is the canonical endomorphism, its restriction $\lambda^{(n)} = \gamma^{(n)}|_{N_n}$ is given by
$$\lambda^{(n)} \simeq \bigoplus_{i_1, i_2, \dots, i_n} N^0_{i_1 i_2 \dots i_n}\, \rho_{i_1} \otimes \rho_{i_2}^{\mathrm{opp}} \otimes \cdots \otimes \rho_{i_n}^{\mathrm{opp}}, \tag{20}$$
where $N^0_{i_1 i_2 \dots i_n} \equiv \langle \mathrm{id}, \rho_{i_1}\bar\rho_{i_2} \cdots \bar\rho_{i_n} \rangle$.
Proof. By a computation similar to the one in Sect. 6, $\lambda^{(n)}$ defined by formula (20) has dimension
$$d(\lambda^{(n)}) = I_{\mathrm{global}}^{\,n-1},$$
therefore the formula $\lambda^{(n)} = \gamma^{(n)}|_{N_n}$ will follow by showing that $\rho_{i_1} \otimes \rho_{i_2}^{\mathrm{opp}} \otimes \cdots \otimes \rho_{i_n}^{\mathrm{opp}} \prec \gamma^{(n)}|_{N_n}$ with multiplicity $N^0_{i_1 i_2 \dots i_n}$, and this will also imply the irreducibility of $N_n \subset M_n$ because then $\lambda^{(n)} \succ \mathrm{id}$ with multiplicity one. But $\rho_{i_1} \otimes \rho_{i_2}^{\mathrm{opp}} \otimes \cdots \otimes \rho_{i_n}^{\mathrm{opp}}$ is unitarily equivalent to $\rho_{i_1}\bar\rho_{i_2} \cdots \bar\rho_{i_n} \otimes \mathrm{id} \otimes \cdots \otimes \mathrm{id}$ in $M_n$, by applying iteratively Lemma 18, hence we have the conclusion.

Let now $m < n = 2^k$ be an integer and set $N_m$ as the alternate tensor product of $m$ copies of $N$ and $N^{\mathrm{opp}}$,
$$N_m \equiv N \otimes N^{\mathrm{opp}} \otimes N \otimes \cdots \otimes N \otimes N^{\mathrm{opp}}, \qquad m \text{ factors}.$$
We then define the $m$th iterated LR inclusion $N_m \subset M_m$, where $M_m$ is defined as the relative commutant in $M_n$ of the remaining $n - m$ copies of $N$ and $N^{\mathrm{opp}}$, i.e. $M_m = (N_m' \cap N_n)' \cap M_n$. Note that $N_m \subset M_m$ is an irreducible inclusion of factors because $N_m' \cap M_m \subset N_n' \cap M_n = \mathbb C$. Arguing similarly as above we then have:

Proposition 48. Proposition 47 holds true for all positive integers $n$ (in formula (20) $\rho_{i_n}^{\mathrm{opp}}$ is $\rho_{i_n}$ if $n$ is odd).

Proof. Let $n = 2^k$. Let $\{V^\ell_{i_1 \dots i_n} : \ell = 1, 2, \dots, N_{i_1 \dots i_n}\}$ be a basis of isometries in the space of elements in $M_n$ that intertwine $\rho_{i_1} \otimes \rho_{i_2}^{\mathrm{opp}} \otimes \cdots \otimes \rho_{i_n}^{\mathrm{opp}}$ on $N_n$. Arguing as in Prop. 45 we see that any element $X \in M_n$ has a unique expansion
$$X = \sum_{i_1 \dots i_n} \sum_\ell x^\ell_{i_1 \dots i_n}\, V^\ell_{i_1 \dots i_n}, \qquad x^\ell_{i_1 \dots i_n} \in N_n.$$
Using this expansion it is easy to check that for $m < n$ the factor $M_m$ defined above is generated by $N_m$ and the $V^\ell_{i_1 \dots i_n}$'s with $i_{m+1} = i_{m+2} = \cdots = i_n = 0$. The rest then follows easily.

B. Nets on $\mathbb R$ and on $S^1$ and Their Representations

In our paper we deal with nets on $\mathbb R$, rather than nets on $S^1$, for various reasons: because this is the natural language for our arguments, because our results are valid for nets that are not necessarily conformal and, finally, because even if our analysis were restricted to conformal nets on $S^1$, our proofs would require the analysis of more general nets on $\mathbb R$ (the $t = 0$ LR net is not conformal). In the next Sect. C we will however need to deal with nets on $S^1$ and their representations, and then conclude consequences for nets on $\mathbb R$. Although the relation between nets on $\mathbb R$ and on $S^1$ and their representations is straightforward, we will describe this point explicitly here for the convenience of the reader. However, for simplicity, we consider only the case of strongly additive, Haag dual nets.
Nets on $S^1$. Let $\mathcal A$ be a net of von Neumann algebras on $S^1$ on a separable Hilbert space satisfying Haag duality. We also assume the local von Neumann algebras $\mathcal A(I)$ to be properly infinite, which is automatically true if the split property holds, or if $\mathcal A$ is conformal (except, of course, for the trivial net $\mathcal A(I) \equiv \mathbb C$). A representation $\pi$ of $\mathcal A$ is, by definition, a map $I \in \mathcal I \to \pi_I$ that associates to each interval $I \in \mathcal I$ of $S^1$ a representation, on a fixed Hilbert space, of the von Neumann algebra $\mathcal A(I)$ such that $\pi_{\tilde I}|_{\mathcal A(I)} = \pi_I$ if $I \subset \tilde I$. We shall say that $\pi$ is locally normal if $\pi_I$ is normal for all $I \in \mathcal I$ and that $\pi$ is localizable if $\pi_I$ is unitarily equivalent to $\mathrm{id}|_{\mathcal A(I)}$ for all $I \in \mathcal I$. As the $\mathcal A(I)$'s are properly infinite the two notions coincide if $\pi$ acts on a separable Hilbert space. Moreover every representation of $\mathcal A$ on a separable Hilbert space is automatically locally normal [45], thus localizable.

Denote by $C^*(\mathcal A)$ the universal C$^*$-algebra [14] associated with $\mathcal A$ (see also [16]). For each $I \in \mathcal I$ there is a canonical embedding $\iota_I : \mathcal A(I) \to C^*(\mathcal A)$ and $\iota_{\tilde I}|_{\mathcal A(I)} = \iota_I$ if $I \subset \tilde I$; we identify $\mathcal A(I)$ with $\iota_I(\mathcal A(I))$ if no confusion arises. There is a one-to-one correspondence between representations of the C$^*$-algebra $C^*(\mathcal A)$ and representations of the net $\mathcal A$, given by $\pi \to \{I \to \pi_I \equiv \pi \cdot \iota_I\}$. Locally normal representations of the net $\mathcal A$ correspond, of course, to locally normal representations of $C^*(\mathcal A)$. We shall always assume our representations to act on a separable Hilbert space, thus local normality is automatic. As Haag duality holds, a localizable representation $\pi$ of $C^*(\mathcal A)$ is unitarily equivalent to a representation of the form $\sigma_0 \cdot \rho$, where $\sigma_0$ is the representation of $C^*(\mathcal A)$ corresponding to the identity representation of $\mathcal A$ (we shall however not need this result).

Nets on $\mathbb R$. Given a net $\mathcal A$ of von Neumann algebras on $S^1$ satisfying Haag duality we may associate a net $\mathcal A_0$ of von Neumann algebras on $\mathbb R = S^1 \setminus \{\infty\}$ (identification by Cayley transform) by setting $\mathcal A_0(I) = \mathcal A(I)$, for all bounded intervals $I$ of $\mathbb R$. We call $\mathcal A_0$ the restriction of $\mathcal A$ to $\mathbb R$. Clearly, if $\mathcal A$ is strongly additive, then $\mathcal A_0$ is also strongly additive and satisfies Haag duality on $\mathbb R$ in the form
$$\mathcal A_0(I)' = \mathcal A_0(\mathbb R \setminus I), \tag{21}$$
where $I \subset \mathbb R$ is either an interval or a half-line $(a, \infty)$ or $(-\infty, a)$, $a \in \mathbb R$. Here, if $E \subset \mathbb R$ has non-empty interior, we denote by $\mathfrak A_0(E)$ the C$^*$-algebra generated by the von Neumann algebras $\mathcal A_0(I)$ as $I$ runs in the intervals contained in the region $E$ and set $\mathcal A_0(E) = \mathfrak A_0(E)''$.

Conversely, let now $\mathcal A_0$ be a strongly additive net of properly infinite von Neumann algebras $\mathcal A_0(I)$ on the (bounded, non-trivial) intervals of $\mathbb R$ satisfying Haag duality (21). We may compactify $\mathbb R$ to $S^1 = \mathbb R \cup \{\infty\}$ and extend $\mathcal A_0$ to a net $\mathcal A$ on the intervals of $S^1$ by defining
$$\mathcal A(I) \equiv \mathcal A_0(S^1 \setminus I)' \tag{22}$$
if $I$ is an interval whose closure contains the point $\infty$. Clearly, $\mathcal A$ is the unique Haag dual net on $S^1$ whose restriction to $\mathbb R$ is $\mathcal A_0$; we thus call $\mathcal A$ the extension of $\mathcal A_0$ to $S^1$. We explicitly state this one-to-one correspondence in the following.
Lemma 49. Let $\mathcal A$ be a net on $S^1$ satisfying Haag duality and strong additivity. Then its restriction $\mathcal A_0$ to $\mathbb R$ satisfies strong additivity and Haag duality on $\mathbb R$. Conversely if $\mathcal A_0$ is a Haag dual (21), strongly additive net on $\mathbb R$, then its extension $\mathcal A$ to $S^1$ is strongly additive and Haag dual. Moreover $\mathcal A_0$ satisfies the split property on $\mathbb R$ if and only if $\mathcal A$ satisfies the split property on $S^1$.

Proof. The proof is immediate. The statement concerning the split property follows because an inclusion of von Neumann algebras $N \subset M$ is split iff the commutant inclusion $M' \subset N'$ is split.

We now consider the relation between representations of a net $\mathcal A$, satisfying Haag duality and strong additivity on $S^1$ as in Lemma 49, and its restriction $\mathcal A_0$ on $\mathbb R$. A DHR representation $\pi_0$ of $\mathcal A_0$ is, by definition, a representation $\pi_0$ of $\mathfrak A_0(\mathbb R)$ such that $\pi_0|_{\mathfrak A_0(\mathbb R \setminus I)}$ is unitarily equivalent to $\mathrm{id}|_{\mathfrak A_0(\mathbb R \setminus I)}$ for every bounded non-trivial interval $I$ of $\mathbb R$, cf. [9]. Clearly a localizable representation $\pi$ of $\mathcal A$ determines a DHR representation $\pi_0$ of $\mathcal A_0$; indeed $\pi_0$ is consistently defined on $\cup_{a>0} \mathcal A(-a, a)$ by $\pi_0(X) = \pi_I(X)$, $X \in \mathcal A(I)$, where $I \equiv (-a, a)$, hence on all $\mathfrak A_0(\mathbb R)$ by continuity. We call $\pi_0$ the restriction of $\pi$ to $\mathcal A_0$. Conversely, as we shall see, every DHR representation $\pi_0$ of $\mathfrak A_0(\mathbb R)$ determines uniquely a localizable representation $\pi$ of $\mathcal A$.

A localized endomorphism $\rho$ of $\mathcal A_0$ is, by definition, an endomorphism of $\mathfrak A_0(\mathbb R)$ such that $\rho|_{\mathfrak A_0(I')} = \mathrm{id}|_{\mathfrak A_0(I')}$ for some interval $I \subset \mathbb R$; one then says that $\rho$ is localized in $I$. $\rho$ is transportable if for each interval $I_1$ there is an endomorphism $\rho_1$ localized in $I_1$ and (unitarily) equivalent to $\rho$ (as representations of $\mathfrak A_0(\mathbb R)$). By Haag duality then $\rho_1 = \mathrm{Ad}\,u \cdot \rho$, where the unitary $u$ belongs to $\mathcal A_0(\tilde I)$, if $\tilde I$ is any interval containing both $I$ and $I_1$. In this paper (as is often the case) transportability is assumed in the definition of localized endomorphism.

By a classical simple argument [9], a DHR representation $\pi_0$ of $\mathfrak A_0(\mathbb R)$ is unitarily equivalent to a (transportable) endomorphism $\rho$ of $\mathfrak A_0(\mathbb R)$ localized in each given interval $I$; it is enough to put $\rho(X) \equiv U \pi_0(X) U^*$, $X \in \mathfrak A_0(\mathbb R)$, where $U$ is a unitary intertwiner between $\pi_0|_{\mathfrak A_0(\mathbb R \setminus I)}$ and $\mathrm{id}|_{\mathfrak A_0(\mathbb R \setminus I)}$.

Proposition 50. Let $\mathcal A$ be a strongly additive, Haag dual net on $S^1$ and $\mathcal A_0$ be its restriction to $\mathbb R$, as in Lemma 49. If $\pi$ is a localizable representation of $\mathcal A$, its restriction $\pi_0$ to $\mathcal A_0$ is a DHR representation of $\mathcal A_0$. Conversely, if $\pi_0$ is a DHR representation of $\mathcal A_0$, there exists an (obviously unique) localizable representation $\pi$ of $\mathcal A$ whose restriction to $\mathcal A_0$ is $\pi_0$.

Proof. By the above discussion, we only need to show that if $\pi_0$ is a DHR representation of $\mathcal A_0$, there exists a localizable representation $\pi$ of $\mathcal A$ such that $\pi_I = \pi_0|_{\mathcal A(I)}$ if $I$ is a bounded interval of $\mathbb R$. Indeed, if the closure of $I$ contains the point $\infty$, we can define $\pi_I$ as the normal extension of $\pi_0|_{\mathfrak A_0(I \setminus \{\infty\})}$, once we show the necessary normality property. Now the
normality of $\pi_0|_{\mathfrak A_0(I \setminus \{\infty\})}$ does not depend on the unitary equivalence class of $\pi_0$, thus we may replace $\pi_0$ by a DHR endomorphism $\rho$ of $\mathcal A_0$ localized in an interval $I_1 \subset \mathbb R$ with $I_1 \cap I = \emptyset$. But then $\rho|_{\mathfrak A_0(I \setminus \{\infty\})}$ is the identity, hence normal.

By definition, the sectors of $\mathcal A$ (resp. of $\mathcal A_0$) are the unitary equivalence classes of localizable representations of $\mathcal A$ (resp. of DHR representations of $\mathcal A_0$). By the above discussions, the two classes are in one-to-one correspondence. On the other hand localizable representations of $\mathcal A$ correspond to localizable representations of $C^*(\mathcal A)$ and DHR representations of $\mathcal A_0$ are equivalent to DHR localized endomorphisms of $\mathcal A_0$, hence we have the following.

Corollary 51. Let $\mathcal A_0$ be a strongly additive net on $\mathbb R$, Haag dual as in (21), and $\mathcal A$ be its extension to $S^1$. The restriction map $\pi \to \pi_0$ gives rise to a natural one-to-one correspondence between unitary equivalence classes of localizable representations of $C^*(\mathcal A)$ and unitary equivalence classes of DHR localized endomorphisms of $\mathcal A_0$. In particular $\pi(C^*(\mathcal A))'' = \pi_0(\mathfrak A_0(\mathbb R))''$, so $\pi$ is of type I iff $\pi_0$ is of type I.

Proof. It remains to check the last part of the statement. As $C^*(\mathcal A)$ is generated (as a C$^*$-algebra) by the von Neumann algebras $\mathcal A(I)$ as $I$ runs in the intervals of $S^1$, one has $\pi(C^*(\mathcal A))'' = \vee_I\, \pi_I(\mathcal A(I))$, thus clearly $\pi(C^*(\mathcal A))'' \supset \pi_0(\mathfrak A_0(\mathbb R))''$. On the other hand if $I$ is an interval of $S^1$, by local normality and strong additivity we have $\pi_I(\mathcal A(I)) = \pi_I(\mathfrak A(I \setminus \{\infty\}))'' \subset \pi_0(\mathfrak A_0(\mathbb R))''$, hence $\pi(C^*(\mathcal A))'' \subset \pi_0(\mathfrak A_0(\mathbb R))''$.

The naturality in the above corollary means that the tensor categories of localizable representations of $C^*(\mathcal A)$ and of DHR localized endomorphisms of $\mathcal A_0$ are equivalent, but we do not need this form of the above statement.

C. Disintegration of Locally Normal Representations and of Sectors

Takesaki and Winnink [44] have shown that a locally normal state decomposes into locally normal states, if the split property holds. We shall show here analogous results for localizable representations (sectors). Our arguments work, however, along the same lines to show that locally normal representations decompose into locally normal representations, also on higher dimensional manifolds. We begin with a simple lemma.

Lemma 52. Let $M$ be a von Neumann algebra, $L \subset M$ a $\sigma$-weakly dense C$^*$-subalgebra and $J \subset L$ a right ideal of $L$. If $\pi$ is a representation of $L$ on a Hilbert space $\mathcal H$ such that $\pi|_J$ is $\sigma$-weakly continuous and $\overline{\pi(J)\mathcal H} = \mathcal H$, then $\pi$ is $\sigma$-weakly continuous, thus it extends uniquely to a normal representation of $M$.

Proof. It is sufficient to show that $\pi$ is $\sigma$-weakly continuous on the unit ball of $L$, see e.g. [45]. Let then $\{a_i\}_i$ be a bounded net of elements $a_i \in L$ such that $a_i \to 0$ $\sigma$-weakly. If $t \in B(\mathcal H)$ is a $\sigma$-weak limit point of $\{\pi(a_i)\}_i$, we have to show that $t = 0$. By considering a subnet, if necessary, we may assume $\pi(a_i) \to t$. Given $h \in J$, we have $a_i h \in J$ and $a_i h \to 0$, thus $\pi(a_i h) \to 0$ because $\pi|_J$ is $\sigma$-weakly continuous, therefore
$$t\pi(h) = \lim_i \pi(a_i)\pi(h) = \lim_i \pi(a_i h) = 0,$$
and this entails $t = 0$ because $h$ is arbitrary and $\pi(J)\mathcal H$ is dense in $\mathcal H$.
We shall use the well-known fact that the C∗-algebra of compact operators on a separable Hilbert space H has only one non-degenerate (i.e. not containing the zero representation) representation, up to multiplicity, hence this representation has a unique normal extension to B(H).

Corollary 53. Let N be a type I factor with separable predual, K ⊂ N the ideal of compact operators relative to N and L a C∗-algebra with K ⊂ L ⊂ N. If π is a representation of L such that π|K is non-degenerate, then π is σ-weakly continuous, thus it extends uniquely to a normal representation of N.

Proof. Immediate because any non-degenerate representation of K is σ-weakly continuous and K is σ-weakly dense in N.

Let A be a net of von Neumann algebras on S¹ over a separable Hilbert space satisfying the split property and Haag duality. If I, I˜ are intervals, we write I ⊂⊂ I˜ if the closure of I is contained in the interior of I˜. For each pair of intervals I ⊂⊂ I˜ we choose an intermediate type I factor N(I, I˜) between A(I) and A(I˜) and let K(I, I˜) be the compact operators of N(I, I˜) (there is a canonical choice for N(I, I˜) [10], but this does not play a role here). We denote by IQ the set of intervals with rational endpoints and by A the C∗-subalgebra of C∗(A) generated by all K(I, I˜) as I ⊂⊂ I˜ run in IQ. Clearly A is norm separable. If I1 ⊂⊂ I˜1 ⊂ I2 ⊂⊂ I˜2 then clearly N(I1, I˜1) ⊂ N(I2, I˜2), but K(I1, I˜1) is not included in K(I2, I˜2). For this reason we define the C∗-algebras associated to pairs of intervals I ⊂⊂ I˜,

L(I, I˜) ≡ N(I, I˜) ∩ A.

As N(I, I˜) is the multiplier algebra of K(I, I˜), L(I, I˜) consists of the elements of A that are multipliers of K(I, I˜). By definition K(I, I˜) ⊂ L(I, I˜) ⊂ N(I, I˜) and A is the C∗-subalgebra of C∗(A) generated by all L(I, I˜) as I ⊂⊂ I˜ run in IQ.

Lemma 54. If I1 ⊂⊂ I˜1 ⊂ I2 ⊂⊂ I˜2 are intervals then L(I1, I˜1) ⊂ L(I2, I˜2).

Proof. L(I1, I˜1) ⊂ N(I1, I˜1) ⊂ N(I2, I˜2), thus L(I1, I˜1) ⊂ N(I2, I˜2) ∩ A = L(I2, I˜2).
Proposition 55. Let π be a locally normal representation of C∗(A). Then π|A is a representation of A and π|K(I,I˜) is non-degenerate for every pair of intervals I ⊂⊂ I˜. Conversely, if σ is a representation of A such that σ|K(I,I˜) is non-degenerate for all intervals I, I˜ ∈ IQ, I ⊂⊂ I˜, there exists a unique locally normal representation σ˜ of C∗(A) that extends σ. Moreover, equivalent representations of C∗(A) correspond to equivalent representations of A.
Proof. The only non-trivial part is that σ extends to a locally normal representation σ˜ of C∗(A). If I ⊂⊂ I˜ are intervals in IQ, we denote by σ˜ I,I˜ the unique normal extension of σ|L(I,I˜) to N(I, I˜) given by Corollary 53. Given an interval I, we choose I1, I˜1 ∈ IQ, I1 ⊂⊂ I˜1 such that I ⊂⊂ I1 and set σ˜ I ≡ σ˜ I1,I˜1 |A(I). We have to show that σ˜ I is well-defined; then I → σ˜ I is clearly a representation of A. Indeed, let I2, I˜2 ∈ IQ with I2 ⊂⊂ I˜2 be another pair such that I ⊂⊂ I2. We can choose I3, I˜3 ∈ IQ such that I ⊂⊂ I3 ⊂⊂ I˜3 ⊂⊂ I1 ∩ I2. Then by Lemma 54 L(I3, I˜3) ⊂ L(Ii, I˜i), i = 1, 2, and therefore σ˜ I3,I˜3 = σ˜ I1,I˜1 |N(I3,I˜3) = σ˜ I2,I˜2 |N(I3,I˜3). This concludes the proof.
Proposition 56. Let π be a locally normal representation of C∗(A) on a separable Hilbert space and denote by πA the restriction of π to A. If
\[ \pi_A = \int_X^{\oplus} \pi_\lambda \, d\mu(\lambda) \]
is a decomposition into irreducible representations πλ (which always exists), then πλ extends to a locally normal representation π˜λ of C∗(A) for almost all λ.

Proof. By Proposition 55, it is sufficient to show that there exists a null set E ⊂ X such that πλ|K(I,I˜) is non-degenerate for λ ∉ E and all I, I˜ ∈ IQ with I ⊂⊂ I˜. This is clear for a fixed pair I, I˜ of the family, because π|K(I,I˜) is non-degenerate. Then the statement follows since the considered family of K(I, I˜)'s is countable.
Proposition 57. With the notations in Proposition 56, if π(C∗(A)) is a factor not of type I, then for each λ ∈ X the set $X_\lambda \equiv \{\lambda' \in X : \pi_{\lambda'} \simeq \pi_\lambda\}$ has measure zero.

Proof. The set $X_\lambda$ is measurable by Lemma 60 below. We have $\mu(X \setminus X_\lambda) > 0$, as otherwise π would be quasi-equivalent to πλ, hence π(A) would be a type I factor. If $\mu(X_\lambda) > 0$, then πA would be the direct sum of two inequivalent representations
\[ \pi_A = \int_{X_\lambda}^{\oplus} \pi_{\lambda'} \, d\mu(\lambda') \;\oplus\; \int_{X \setminus X_\lambda}^{\oplus} \pi_{\lambda'} \, d\mu(\lambda'), \]
which is not possible since π(A) is a factor.
Corollary 58. If there exists a localizable representation π of C ∗ (A) with π(C ∗ (A)) a factor not of type I, then there exist uncountably many inequivalent irreducible localizable representations of C ∗ (A). Proof. If the representation π is factorial not of type I, then the family of the πλ ’s in the above proposition contains an uncountable set of mutually inequivalent irreducible localizable representations as desired.
Corollary 59. Let A0 be a strongly additive, split net of von Neumann algebras on the intervals of R which is Haag dual as in (21). If there exists a DHR localized endomorphism ρ of A0 with ρ(A0(R)) a factor not of type I, then there exist uncountably many inequivalent irreducible DHR localized endomorphisms of A0.

Proof. Immediate by Corollary 58 and Corollary 51.

Before concluding this appendix we have to prove a lemma that has been used. Let A be any separable C∗-algebra and σ a representation of A. Choose a sequence of elements $a_\ell \in A$ dense in the unit ball $A_1$, and a sequence $\varphi_i \in A^*$ dense in the Banach space of normal linear functionals $(\sigma(A)'')_*$ associated with σ. A linear functional $\varphi \in A^*$ is then normal with respect to σ if and only if
\[ \forall k \in \mathbb{N},\ \exists i \in \mathbb{N} : \ |\varphi(a_\ell) - \varphi_i(a_\ell)| \le \frac{1}{k}, \quad \forall \ell \in \mathbb{N}. \tag{23} \]
We thus have the following.

Lemma 60. Let A be a separable C∗-algebra, π a representation of A on a separable Hilbert space and $\pi = \int_X^{\oplus} \pi_\lambda \, d\mu(\lambda)$ a direct integral decomposition into a.e. irreducible representations πλ of A. For any irreducible representation σ of A, the set $X_\sigma \equiv \{\lambda : \pi_\lambda \simeq \sigma\}$ is measurable.

Proof. Let $\xi = \int_X^{\oplus} \xi(\lambda) \, d\mu(\lambda)$ be a vector with $\xi(\lambda) \neq 0$ for all λ ∈ X, and consider the functional of A given by $\varphi_\lambda = (\pi_\lambda(\cdot)\xi(\lambda), \xi(\lambda))$. As both σ and πλ are irreducible, we have $\sigma \simeq \pi_\lambda$ if and only if $\varphi_\lambda$ is normal with respect to σ. With the previous notations, we then have by Eq. (23)
\[ X_\sigma = \bigcap_{k} \bigcup_{i} \bigcap_{\ell} X_{ik\ell}, \]
where
\[ X_{ik\ell} = \Big\{ \lambda \in X : |\varphi_\lambda(a_\ell) - \varphi_i(a_\ell)| \le \frac{1}{k} \Big\}. \]
As $X_{ik\ell}$ is measurable, also $X_\sigma$ is measurable.
Acknowledgements. A part of this work was done during visits of the first-named author to Università di Roma “Tor Vergata”. Y.K. acknowledges the hospitality and financial support of CNR (Italy), Università di Roma “Tor Vergata” and the Kanagawa Academy of Science and Technology Research Grants. R.L. wishes to thank the Japan Society for the Promotion of Science for the invitation to the University of Tokyo in June 1997. The authors would like to thank K.-H. Rehren for comments.
References 1. Böckenhauer, J.: An algebraic formulation of level one Wess–Zumino–Witten models. Rev. Math. Phys. 8, 925–947 (1996) 2. Böckenhauer, J., Evans, D.E.: Modular invariants, graphs and α-induction for nets of subfactors I–III. Commun. Math. Phys. 197, 361–386 (1998), 200, 57–103 (1999) & 205, 183–228 (1999) 3. Böckenhauer, J., Evans, D.E., Kawahigashi, Y.: On α-induction, chiral projectors and modular invariants for subfactors. Commun. Math. Phys. 208, 429–487 (1999) 4. Brunetti, R., Guido, D., Longo, R.: Modular structure and duality in conformal quantum field theory. Commun. Math. Phys. 156, 201–219 (1993)
5. Buchholz, D., D’Antoni, C., Longo, R.: Nuclear maps and modular structures. II. Commun. Math. Phys. 129, 115–138 (1990) 6. Connes, A.: Classification of injective factors. Ann. Math. 104, 73–115 (1976) 7. Conti, R.: Inclusioni di algebre di von Neumann e teoria algebrica dei campi. Tesi del dottorato di ricerca in matematica, Università di Roma “Tor Vergata”, 1996 8. D’Antoni, C., Longo, R., Radulescu, F.: Conformal nets, maximal temperature and models from free probability. J. Oper. Th. 45, 195–208 (2001) 9. Doplicher, S., Haag, R., Roberts, J.E.: Local observables and particle statistics, I. Commun. Math. Phys. 23, 199–230 (1971); II. 35, 49–85 (1974) 10. Doplicher, S., Longo, R.: Standard and split inclusions of von Neumann algebras. Invent. Math. 73, 493–536 (1984) 11. Drinfel’d, V.G.: Quantum groups. Proc. ICM-86, Berkeley, 1986, pp. 798–820 12. Evans, D.E., Kawahigashi, Y.: Quantum symmetries on operator algebras. Oxford: Oxford University Press, 1998 13. Evans, D.E., Kawahigashi, Y.: Orbifold subfactors from Hecke algebras II — Quantum doubles and braiding. Commun. Math. Phys. 196, 331–361 (1998) 14. Fredenhagen, K., Rehren, K.-H., Schroer, B.: Superselection sectors with braid group statistics and exchange algebras II. Rev. Math. Phys. Special issue, 113–157 (1992) 15. Fröhlich, J., Gabbiani, F.: Operator algebras and conformal field theory. Commun. Math. Phys. 155, 569—640 (1993) 16. Guido, D., Longo, R.: Relativistic invariance and charge conjugation in quantum field theory. Commun. Math. Phys. 148, 521—551 (1992) 17. Guido, D., Longo, R.: The conformal spin and statistics theorem. Commun. Math. Phys. 181, 11–35 (1996) 18. Guido, D., Longo, R., Wiesbrock, H.-W.: Extensions of conformal nets and superselection structures. Commun. Math. Phys. 192, 217–244 (1998) 19. Haagerup, U.: Connes’ bicentralizer problem and the uniqueness of the injective factor of type I I I1 . Acta. Math. 158, 95–148 (1987) 20. Kosaki, H.: Type III Factors and Index Theory. Res. 
Inst. of Math., Lect. Notes 43, Seoul Nat. Univ., 1998 21. Izumi, M.: Subalgebras of infinite C ∗ -algebras with finite Watatani indices II: Cuntz–Krieger algebras. Duke Math. J. 91, 409–461 (1998) 22. Izumi, M.: The structure of sectors associated with the Longo–Rehren inclusions I. General theory. Commun. Math. Phys. 213, 127–179 (2000) 23. Izumi, M., Longo, R., Popa, S.: A Galois correspondence for compact groups of automorphisms of von Neumann algebras with a generalization to Kac algebras. J. Funct. Anal. 10, 25–63 (1998) 24. Jones, V.F.R.: Index for subfactors. Invent. Math. 72, 1–25 (1983) 25. Longo, R.: Index of subfactors and statistics of quantum fields I–II. Commun. Math. Phys. 126, 217–247 (1989); 130, 285–309 (1990) 26. Longo, R.: A duality for Hopf algebras and for subfactors. Commun. Math. Phys. 159, 133–150 (1994) 27. Longo, R.: Algebraic and modular structure of von Neumann algebras of physics. Proc. Symp. Pure Math. 38, Part 2, 551 (1982) 28. Longo, R., Rehren, K.-H.: Nets of subfactors. Rev. Math. Phys. 7, 567–597 (1995) 29. Longo, R., Roberts, J.E.: A theory of dimension. K-Theory 11, 103–159 (1997) 30. Masuda, T.: An analogue of Longo’s canonical endomorphism for bimodule theory and its application to asymptotic inclusions. Internat. J. Math. 8, 249–265 (1997) 31. Masuda, T.: Generalization of Longo–Rehren construction to subfactors of infinite depth and amenability of fusion algebras. J. Funct. Anal. 171, 53–77 (2000) 32. Müger, M.: On charged fields with group symmetry and degeneracies of Verlinde’s matrix S. Ann. Inst. H. Poincaré (Phys. Théor.) 71, 359–394 (1999) 33. Müger, M.: Categorical approach to paragroup theory I. Ambialgebras in and Morita equivalence of tensor categories & II. The quantum double of tensor categories and subfactors. In preparation 34. Müger, M.: Global symmetries in conformal field theory: Orbifold theories, simple current extensions and beyond. In preparation 35. 
Ocneanu, A.: Quantum symmetry, differential geometry of finite graphs and classification of subfactors. University of Tokyo Seminary Notes 45 (Notes recorded by Y. Kawahigashi), 1991 36. Ocneanu, A.: An invariant coupling between 3-manifolds and subfactors, with connections to topological and conformal quantum field theory. Preprint 1991 37. Ocneanu, A.: Chirality for operator algebras. (Notes recorded by Y. Kawahigashi), in: Subfactors (ed. H. Araki, et al.), Singapore: World Scientific, 1994, pp. 39–63 38. Pimsner, M., Popa, S.: Entropy and index for subfactors. Ann. Scient. Éco. Norm. Sup. 19, 57–106 (1986) 39. Popa, S.: Symmetric enveloping algebras, amenability and AFD properties for subfactors. Math. Res. Lett. 1, 409–425 (1994)
40. Popa, S.: Classification of Subfactors and their Endomorphisms. CBMS Lecture Notes Series, 86 41. Rehren, K.-H.: Braid group statistics and their superselection rules. In: The Algebraic Theory of Superselection Sectors. D. Kastler ed., Singapore: World Scientific, 1990 42. Rehren, K.-H.: Space-time fields and exchange fields. Commun. Math. Phys. 132, 461–483 (1990) 43. Schroer, B.: Recent developments of algebraic methods in quantum field theories. Int. J. Mod. Phys. B 6, 2041–2059 (1992) 44. Takesaki, M., Winnink, M.: Local normality in quantum statistical mechanics. Commun. Math. Phys. 30, 129–152 (1973) 45. Takesaki, M.: Theory of Operator Algebras. I. Springer-Verlag, Berlin–Heidelberg–New York: SpringerVerlag, 1979 46. Turaev, V.G.: Quantum invariants of knots and 3-manifolds. Berlin–New York: Walter de Gruyter, 1994 47. Wassermann, A.: Operator algebras and conformal field theory III: Fusion of positive energy representations of SU (N ) using bounded operators. Invent. Math. 133, 467–538 (1998) 48. Wenzl, H.: Hecke algebras of type An and subfactors. Invent. Math. 92, 345–383 (1988) 49. Xu, F.: New braided endomorphisms from conformal inclusions. Commun. Math. Phys. 192, 347–403 (1998) 50. Xu, F.: Jones-Wassermann subfactors for disconnected intervals. Commun. Contemp. Math. 2, 307–347 (2000) Communicated by A. Connes
Commun. Math. Phys. 219, 671 – 702 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Renormalization Group Flow of the Two-Dimensional Hierarchical Coulomb Gas Leonardo F. Guidi , Domingos H. U. Marchetti Instituto de Física, Universidade de São Paulo, Caixa Postal 66318, 05315 São Paulo, SP, Brasil. E-mail: [email protected]; [email protected] Received: 29 November 1999 / Accepted: 13 January 2001
Abstract: We consider a quasilinear parabolic differential equation associated with the renormalization group transformation of the two-dimensional hierarchical Coulomb system in the limit as the size of the block $L \downarrow 1$. We show that the initial value problem is well defined in a suitable function space and the solution converges, as $t \to \infty$, to one of the countably infinite equilibrium solutions. The $j$-th nontrivial equilibrium solution bifurcates from the trivial one at $\beta_j = 8\pi/j^2$, $j = 1, 2, \ldots$. These solutions are fully described and we provide a complete analysis of their local and global stability for all values of inverse temperature $\beta > 0$. Gallavotti and Nicoló's conjecture on an infinite sequence of “phase transitions” is also addressed. Our results rule out an intermediate phase between the plasma and the Kosterlitz–Thouless phases, at least in the hierarchical model we consider.

1. Introduction

We consider, for each $\beta > 0$, the partial differential equation
\[ u_t - \frac{\beta}{4\pi}\left(u_{xx} - u_x^2\right) - 2u = 0 \tag{1.1} \]
on $\mathbb{R}_+ \times (-\pi, \pi)$ with periodic boundary conditions, $u(t, -\pi) = u(t, \pi)$ and $u_x(t, -\pi) = u_x(t, \pi)$, in the space of even functions, satisfying the additional condition $u(t, 0) = 0$.¹ We show that the initial value problem is well defined in an appropriate function space $\mathcal{B}$ and that the solution exists and is unique for all $t > 0$. Furthermore, as $t \to \infty$, the solution converges in $\mathcal{B}$ to one of the (equilibrium) solutions $\phi$ of
\[ \frac{\beta}{4\pi}\left(\phi'' - (\phi')^2\right) + 2\phi = 0, \tag{1.2} \]
Supported by FAPESP under grant #98/10745 − 1.
Partially supported by CNPq, FINEP and FAPESP. 1 This is assured by a Lagrange multiplier (see Remark 3.1).
with $\phi(-\pi) = \phi(\pi)$ and $\phi'(-\pi) = \phi'(\pi)$. For $\beta > 8\pi$, $\phi_0 \equiv 0$ is the (globally) asymptotically stable solution of (1.1). For $\beta < 8\pi$ such that $8\pi/(k+1)^2 \le \beta < 8\pi/k^2$ holds for some $k \in \mathbb{N}_+$, $\phi_0$ is unstable and there exist $2k$ non-trivial equilibrium solutions $\phi_1^\pm, \ldots, \phi_k^\pm$ of (1.2), among which $\phi_1^\pm$ are the only asymptotically stable ones.
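As an illustrative aside, the counting rule just stated can be sketched in a few lines of Python. This is only a sketch: the function names `count_nontrivial_pairs` and `bifurcation_points` are hypothetical and introduced here for illustration. The rule itself is the one quoted above, namely that $k$ is the integer with $8\pi/(k+1)^2 \le \beta < 8\pi/k^2$, with $k = 0$ when $\beta \ge 8\pi$.

```python
import math


def count_nontrivial_pairs(beta: float) -> int:
    """Return k with 8*pi/(k+1)**2 <= beta < 8*pi/k**2 (k = 0 if beta >= 8*pi).

    The text then predicts 2*k non-trivial equilibria phi_1^+-, ..., phi_k^+-.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    # k < sqrt(8*pi/beta) <= k + 1  is equivalent to  k = ceil(sqrt(8*pi/beta)) - 1.
    return max(math.ceil(math.sqrt(8 * math.pi / beta)) - 1, 0)


def bifurcation_points(k: int) -> list[float]:
    """Inverse temperatures beta_j = 8*pi/j**2, j = 1..k, where new branches appear."""
    return [8 * math.pi / j**2 for j in range(1, k + 1)]


if __name__ == "__main__":
    for beta in (9 * math.pi, 3 * math.pi, math.pi):
        k = count_nontrivial_pairs(beta)
        print(f"beta = {beta:.4f}: k = {k}, non-trivial equilibria = {2 * k}")
```

Values of $\beta$ lying exactly on a threshold are sensitive to floating-point rounding, so the sketch is best used away from the points $\beta_j$.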
The aim of the present work is to show that, for $j \ge 1$, $\phi_j^\pm$ have a $(j-1)$-dimensional unstable manifold $M_j \subset \mathcal{B}$, so $\phi_j^\pm$ are more stable than $\phi_{j'}^\pm$ if $j < j'$. As a consequence, there exists a dense open set of initial conditions in $\mathcal{B}$ such that $\phi_1^+$ ($\phi_1^-$ is not physically admissible) is the non-trivial stable solution for all $\beta < 8\pi$. Our description of Eq. (1.1) is motivated by two distinct goals. Firstly, it provides a new example of a nonlinear parabolic differential equation for which a geometric theory can be carried out (see e.g. Henry [H]). According to this theory, the above scenario can be stated as follows: there exists a sufficiently large ball $B_0 \subset \mathcal{B}$ about the origin such that, if $u(t, B_0)$ denotes the set of points reached at time $t$ starting from any initial function in $B_0$, then the invariant set $\bigcap_{t \ge 0} u(t, B_0)$ coincides with the $k$-dimensional unstable manifold $K_k = \bigcup_{0 \le j \le k} M_j = \overline{M_0}$ provided $8\pi/(k+1)^2 \le \beta < 8\pi/k^2$. Secondly, the solution of the initial value problem (1.1) describes the renormalization group (RG) flow of the effective potential in the two-dimensional hierarchical Coulomb system, and the stationary solutions $\phi_j^+$, the fixed points of RG, contain information on its critical phenomena. The analysis of Eq. (1.1) presented here can hopefully shed some light on a question raised by Gallavotti and Nicoló [GN] on the “screening phase transitions” in two-dimensional Coulomb systems. The existence of infinitely many thresholds of “instabilities” found in the Mayer series at inverse temperature $\beta_n = 8\pi(1 - 1/(2n))$, $n \in \mathbb{N}_+$, indicates, according to the authors, a sequence of “intermediate” phase transitions from the “plasma phase” ($\beta \le \beta_1 = 4\pi$) to the multipole phase ($\beta \ge \beta_\infty = 8\pi$).
They conjectured that some partial screening takes place when the inverse temperature decreases from 8π to 4π, which prevents the formation of the neutral multipole of order larger than 2n, where n is the integer part of 1/(2 − β/4π ) (dipoles are the last to be prevented at 4π). The Kosterlitz–Thouless phase (multipole phase) was established by Fröhlich– Spencer [FS] and extended up to 8π by one of the present authors and A. Klein [MK]. Debye screening was only proved for sufficiently small β << 4π [Y, BF]. Study of the region [4π, 8π] began with the work by Benfatto, Gallavotti and Nicoló [BGN] on the ultraviolet collapses of neutral clusters in the Yukawa gas which served as a base for the results in [GN]. It seems improbable, in light of the present knowledge, that a conclusive answer to the Gallavotti–Nicoló conjecture will come up soon. It may be noted, however, that the scenario of an intermediate phase, which has challenged the conventional picture due to Jose et al. [JKKN], has been contested by Fisher et al. [FLL] based on Debye–Hückel–Bjerrum theory and by Dimock and Hurd [DH] who have reinterpreted the ultraviolet collapses in the Yukawa gas. The Kosterlitz–Thouless phase is manifested in the hierarchical model as a bifurcation from the trivial solution [MP]. Our results rule out the existence of further phase transitions since no other bifurcation arises from the stable solution (see Theorem 5.1 on the stability of φ1+ ). Even though the existence of the invariant unstable manifold Kk may provide a suitable explanation to the appearance of Gallavotti–Nicoló thresholds, their nature and location differ substantially from the ones described here because neutral multipoles
decouple from the system: in the hierarchical model they are so precisely neutral as to have no interactions at scales larger than the one at which they form. We believe, however, that our results may be helpful for the investigation of the plasma phase. “Screening” of integer charges was shown to hold in the hierarchical model by Benfatto and Renn [BR] as long as a nontrivial fixed point occurs. In reference [MP] the behavior of fractional charge correlations shows a “weak form of screening” for $\beta < 8\pi$ (but close to the transition point). Both results may now be extended to all inverse temperature values $0 < \beta < 8\pi$. It is worth mentioning that our numerical analysis shows that the stable solution $\phi_1^+$ looks like the Debye–Hückel potential $\phi_{DH} = (2\pi/\beta)\, x^2$ in $(-\pi, \pi)$ right after the transition takes place (see Remark 4.6). As in [F], the renormalization group (RG) flow (1.1) may be derived from the block-spin RG transformation of a two-dimensional hierarchical Coulomb system in the limit as the block size $L \downarrow 1$. This procedure, called local potential approximation, has been discussed by Felder [F] in the context of Dyson's hierarchical model, whose partial differential equation,
\[ u_t - \frac{1}{2} u_{xx} + \frac{d-2}{2}\, x\, u_x - d\, u + \frac{1}{2} u_x^2 = 0, \tag{1.3} \]
coincides with (1.1) when his dimensional parameter $d = 2$ if $\beta$ is equal to $2\pi$ (without boundary conditions). Felder showed that (1.3) has global stationary solutions $u^*_{2n}$ on $\mathbb{R}$ for $2 < d < d_n$ with $u^*_{2n}(x) \to 0$ as $d \uparrow d_n$ and calculated their profile. Here, $d_n = 2 + 2/(n-1)$, $n = 2, 3, \ldots$, is the sequence of thresholds where nontrivial fixed points are expected to appear as a bifurcation from the trivial solution. By a global solution we mean one which does not blow up at finite $x$. The present paper begins with a derivation of Eq. (1.1) in Sect. 2. The existence, uniqueness and continuous dependence on the initial value are presented in Sect. 3 and the precise statements are given in Theorems 3.2 and 3.4.
We describe all global solutions of (1.2) completely in Sect. 4. Due to smoothness and the periodic condition, blow-up of an admissible stationary solution is impossible. We show that the non-trivial stationary solution for $\beta < 8\pi$ is unique modulo solutions with period $2\pi/j$, $j = 2, 3, \ldots$, which are responsible for the existence of the unstable manifold (see Theorem 4.1). Finally, we analyze in detail the local and global stability of equilibrium solutions of (1.1) in Sect. 5. The main results are stated in Theorems 5.1 and 5.14.

2. The Flow Equation

This section is devoted to the derivation of (1.1) from the RG transformation of the two-dimensional hierarchical Coulomb system. We begin with a brief review of this model. A Coulomb system is an ensemble of two species (for simplicity) of charged particles, interacting via a two-body Coulomb potential $V$. In the grand canonical ensemble the total number of particles fluctuates around a mean value determined by the particle activity $z$. It will become clear that the charge ensemble, rather than the particle ensemble, is more appropriate for the RG transformation. A configuration $q$ of this system is a function $q : \Lambda \subset \mathbb{Z}^2 \to \mathbb{Z}$ which associates to each site $x$ of the lattice $\Lambda$ the total charge $q(x)$ at this position. To each configuration we associate two functionals: the total energy $E : \mathbb{Z}^\Lambda \to \mathbb{R}_+$,
\[ E(q) = \frac{1}{2} \sum_{x, y \in \Lambda} q(x)\, V(x, y)\, q(y) \tag{2.1} \]
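(self-energy is included) and an “a priori” weight $F : \mathbb{Z}^\Lambda \to \mathbb{R}_+$,
\[ F(q) = \prod_{x \in \Lambda} \lambda(q(x)) \tag{2.2} \]
defined for positive real valued functions $\lambda$. The equilibrium Gibbs measure $\mu_\Lambda : \mathbb{Z}^\Lambda \to \mathbb{R}_+$ is thus given by
\[ \mu_\Lambda(q) := \frac{1}{\Xi_\Lambda}\, F(q)\, e^{-\beta E(q)}, \tag{2.3} \]
where $\beta$ is the inverse temperature and
\[ \Xi_\Lambda = \sum_{q \in \mathbb{Z}^\Lambda} F(q)\, e^{-\beta E(q)} \tag{2.4} \]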
is the grand partition function. It has been shown (see e.g. [FS]) that the standard Coulomb system in the grand canonical ensemble with particle activity $z$ has charge activity given by $\lambda(q) = I_q(2z)$, where $I_q$ is the $q$-th modified Bessel function. If $\lambda(q) = \delta_{q,0} + z\,(\delta_{q,1} + \delta_{q,-1})$, then $\mu_\Lambda$ is the grand canonical ensemble of charged particles with hard core. Let us introduce our hierarchical model as proposed in ref. [MP]. The potential $V$ in (2.1) is replaced by a function
\[ V_h(x, y) = -\frac{1}{2\pi} \ln d_h(x, y), \]
given by the asymptotic behavior of the two-dimensional Coulomb potential with the Euclidean distance $|x - y|$ replaced by the hierarchical distance
\[ d_h(x, y) := L^{N(x, y)}, \tag{2.5} \]
defined for an integer $L > 1$, where
\[ N(x, y) := \inf\Big\{ N \in \mathbb{N}_+ : \Big[\frac{x}{L^N}\Big] = \Big[\frac{y}{L^N}\Big] \Big\} \tag{2.6} \]
and $[z] \in \mathbb{Z}^2$ has as components the integer parts of the components of $z \in \mathbb{R}^2$. Notice that $d_h$ is not invariant under translations. Now, given an integer $N > 1$, let $\Lambda = \Lambda_N = [-L^N, L^N - L^{N-1}]^2 \cap \mathbb{Z}^2$ and define, for each configuration $q \in \mathbb{Z}^\Lambda$, the block configuration $q^1 : \Lambda_{N-1} \to \mathbb{Z}$,
\[ q^1(x) = \sum_{0 \le y_i < L} q(Lx + y). \tag{2.7} \]
The renormalization group transformation $R$ acting on the space of Gibbs measures (2.3),
\[ \mu^1_{N-1}(q^1) = [R\mu_N](q^1) := \sum_{q \in \mathbb{Z}^{\Lambda_N} :\ q^1 \text{ fixed}} \mu_N(q), \tag{2.8} \]
involves an integration over the fluctuations about $q^1$ followed by a rescaling back to the original lattice. As has been shown in [MP], the RG transformation $R$ preserves the form of the Gibbs measure in the grand canonical ensemble of charges. The measure $\mu^1_{N-1}$ is thus given by (2.3) with the “a priori weight” $F$ replaced by
\[ F^1(q^1) = \prod_{x \in \Lambda_{N-1}} \lambda^1(q^1(x)), \tag{2.9} \]
where
\[ \lambda^1(p) = L^{-\beta p^2/(4\pi)}\, \underbrace{(\lambda * \lambda * \cdots * \lambda)}_{L^2\text{-times}}(p) \tag{2.10} \]
with $(\lambda * \rho)(p) = \sum_{q \in \mathbb{Z}} \lambda(p - q)\, \rho(q)$. Note that $\Xi_N(\lambda) = \Xi_{N-1}(\lambda^1)$.
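The map (2.10) acts on sequences of charge activities and is straightforward to evaluate for finitely supported $\lambda$. The following is a minimal numerical sketch, assuming NumPy and a finite symmetric support $q = -Q, \ldots, Q$; the function name `rg_step` and the array layout are hypothetical choices made only for this illustration.

```python
import numpy as np


def rg_step(lam: np.ndarray, L: int, beta: float) -> np.ndarray:
    """Apply (2.10) once: lambda^1(p) = L**(-beta*p**2/(4*pi)) * (lambda * ... * lambda)(p).

    `lam` holds lambda(q) for q = -Q..Q (odd length, centre entry is q = 0).
    The result holds lambda^1(p) for p = -L*L*Q .. L*L*Q.
    """
    if len(lam) % 2 != 1:
        raise ValueError("lam must have odd length (charges -Q..Q)")
    conv = lam.astype(float)
    for _ in range(L * L - 1):          # L**2 factors in total
        conv = np.convolve(conv, lam)   # full convolution; support grows by Q per side
    P = (len(conv) - 1) // 2
    p = np.arange(-P, P + 1)
    return float(L) ** (-beta * p**2 / (4 * np.pi)) * conv


if __name__ == "__main__":
    z = 0.5
    hard_core = np.array([z, 1.0, z])   # lambda(q) = delta_{q,0} + z (delta_{q,1} + delta_{q,-1})
    lam1 = rg_step(hard_core, L=2, beta=4 * np.pi)
    print(lam1)
```

For the hard-core activity with $z = 1/2$ and $L = 2$, the four-fold convolution is the coefficient sequence of $(1+x)^8/(16x^4)$, so the entry at $p = 0$ equals $70/16 = 4.375$; this gives a simple consistency check on the sketch. No normalization is applied here.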
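Remark 2.1. A peculiar feature of hierarchical models is the reduction of the measure space where $R$ acts to local functions. The RG transformation (2.8) induces a transformation $\lambda^1 = r\lambda$ given by (2.10) on the space of infinite sequences. Note that the space $\ell^1(\mathbb{Z})$ of summable sequences is closed under the $r$ transformation: $(\lambda * \lambda) \in \ell^1(\mathbb{Z})$ if $\lambda \in \ell^1(\mathbb{Z})$ by the Hausdorff–Young inequality.

In order to take the $L \downarrow 1$ limit of the RG transformation $r$ it is convenient to write the system in the sine-Gordon representation. Fourier transforming (2.10),
\[ \tilde\lambda(\varphi) = \sum_{q \in \mathbb{Z}} \lambda(q)\, e^{iq\varphi}, \]
and using the convolution theorem, yields
\[ \tilde\lambda^1(\varphi) = \widetilde{r\lambda}(\varphi) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \vartheta(\varphi - \tau)\, \tilde\lambda^{L^2}(\tau)\, d\tau, \tag{2.11} \]
where
\[ \vartheta(\varphi) = \sum_{q \in \mathbb{Z}} L^{-\beta q^2/(4\pi)}\, e^{iq\varphi} = \frac{2\pi}{(\beta \ln L)^{1/2}} \sum_{n \in \mathbb{Z}} e^{-\pi(\varphi + 2\pi n)^2/(\beta \ln L)} \tag{2.12} \]
by the Poisson formula. Plugging (2.12) into (2.11) and changing the variable $\zeta = \tau + 2\pi n$, Eq. (2.11) can be written as
\[ \widetilde{r\lambda}(\varphi) = \nu * \tilde\lambda^{L^2}(\varphi), \tag{2.13} \]
where $\nu *$ means convolution by a Gaussian measure with mean zero and variance $\beta \ln L/(2\pi)$:
\[ (\nu * f)(\varphi) = (\beta \ln L)^{-1/2} \int_{-\infty}^{\infty} d\zeta\ e^{-\pi(\varphi - \zeta)^2/(\beta \ln L)}\, f(\zeta) = e^{(\beta \ln L/4\pi)\, d^2/d\varphi^2} f(\varphi), \tag{2.14} \]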
where in the second form of the Gaussian convolution we have used Wick's theorem. Note that (2.13) is precisely the RG transformation derived by Gallavotti, who started directly from the sine-Gordon representation. In order to let the block size $L$ tend to 1, we introduce a variable $t := n \ln L$ which keeps track of the number of times the RG transformation (2.8) has to be iterated in order to bring two sites at hierarchical distance $L^n$ to $O(1)$ distance. We shall take the limit $L \downarrow 1$ together with $n \to \infty$ maintaining $t$ fixed. Define
\[ u(t, x) = -\ln \tilde\lambda_n(x), \tag{2.15} \]
where $\tilde\lambda_n = \widetilde{r^n \lambda}$ denotes the $n$th iteration of the transformation (2.13). If one writes $t' = (n+1) \ln L$ then, by taking the logarithm and using (2.15), Eq. (2.13) reads
\begin{align*}
u(t', x) &= -\ln\left\{ \exp\left( \frac{\beta t}{4\pi n} \frac{d^2}{dx^2} \right) \exp\left( -e^{2t/n}\, u(t, x) \right) \right\} \tag{2.16} \\
&= u(t, x) - \ln\left\{ 1 + \frac{t}{n} \left[ \frac{\beta}{4\pi} \left( u_x^2(t, x) - u_{xx}(t, x) \right) - 2u(t, x) \right] + O\left( \frac{1}{n^2} \right) \right\} \\
&= u(t, x) + \frac{t}{n} \left[ \frac{\beta}{4\pi} \left( u_{xx}(t, x) - u_x^2(t, x) \right) + 2u(t, x) \right] + O\left( \frac{1}{n^2} \right)
\end{align*}
which, combined with
\[ u_t(t, x) = \lim_{t' \downarrow t} \frac{u(t', x) - u(t, x)}{t' - t} = \lim_{n \to \infty} \frac{n}{t} \left[ u(t', x) - u(t, x) \right], \tag{2.17} \]
yields equation (1.1).

3. Existence, Uniqueness and Continuous Dependence

In this section the existence, uniqueness and continuous dependence on the initial value of Eq. (1.1) will be established by Picard's theorem for Banach spaces. To avoid the appearance of zero modes upon linearization, we differentiate (1.1) with respect to $x$ and consider the equation for $v = u_x$,
\[ v_t - \frac{\beta}{4\pi} \left( v_{xx} - 2 v v_x \right) - 2v = 0, \tag{3.1} \]
with $v(t, -\pi) = v(t, \pi)$ and $v_x(t, -\pi) = v_x(t, \pi)$, in the subspace of odd functions and initial value $v(0, \cdot) = v_0$. Note that the operator defined by the l.h.s. of (3.1) preserves this subspace. Before we proceed, we have the following
Remark 3.1. The “a priori weight” $\lambda(t, q) := \lambda_n(q)$ at scale $t = n \ln L$ is a positive symmetric, $\lambda(t, q) = \lambda(t, -q)$, sequence of real numbers and has to be normalized at all scales. In [MP] Eq. (2.10) was redefined so that $\lambda_n(0) = 1$ holds for all $n$. Here, the appropriate normalization is given by
\[ \sum_{q \in \mathbb{Z}} \lambda(t, q) = 1, \]
since, in view of Eq. (2.15), this leads to the condition $u(t, 0) = 0$, which is already imposed for all $t$ if
\[ u(t, x) = \int_0^x v(t, y)\, dy \tag{3.2} \]
with $v(s, x)$ an odd solution of (3.1). From (3.2), we have
\begin{align*}
u_t &= \int_0^x v_t(t, y)\, dy \\
&= \int_0^x \left[ \alpha \left( u_{yy} - u_y^2 \right) + 2u \right]_y dy \\
&= \alpha \left( u_{xx} - u_x^2 \right) + 2u - \alpha\, u_{xx}(t, 0), \tag{3.3}
\end{align*}
where $u_x(t, 0) = v(t, 0) = 0$ by parity. Note that $\bar u(t, x) = -\ln \tilde\lambda_n(x) + \ln \tilde\lambda_n(0)$ also satisfies (3.3) by Eqs. (2.16) and (2.17). Moreover, note that there is a one-to-one correspondence between the solution $u$ of (1.1) and the solution $\bar u$ of (3.3), with the same initial value $u_0$, given by
\[ \bar u(t, x) = u(t, x) - u(t, 0) \tag{3.4} \]
and
\[ u(t, x) = \bar u(t, x) + \alpha \int_0^t e^{2(t-s)}\, \bar u_{xx}(s, 0)\, ds, \tag{3.5} \]
where $\alpha\, \bar u_{xx}(t, 0)$ is the required Lagrange multiplier introduced in (3.3) to assure that $\bar u(t, 0) = 0$ (see comments after Eq. (1.1) in ref. [F]). This correspondence will be useful in Sect. 5.

Because the standard initial condition $u_0(x) = z\,(1 - \cos x)$ satisfies $u_0'(0) = u_0'(\pi) = 0$, Eq. (3.1) may equivalently be considered on $(0, \pi)$ with Dirichlet boundary conditions $v(t, 0) = v(t, \pi) = 0$. Another reason for considering (3.1) instead of (1.1) is the fact that the nonlinearity $2 v v_x$ is more suitable than $u_x^2$ for the analysis of equilibrium solutions and corresponding stabilities given in the next sections. The boundary and initial value problem (3.1) may be written as an ordinary differential equation
\[ \frac{dz}{dt} + Az = F(z) \tag{3.6} \]
in a conveniently defined Banach space $\mathcal{B}$, where
\[ Az = -\alpha z'' - 2z \quad \text{and} \quad F(z) = -2\alpha z' z, \tag{3.7} \]
with $\alpha = \beta/(4\pi)$ and initial value $z(0) = z_0$. The linear operator $A$ is defined on the space $C^2_{o,p}$ of smooth odd and periodic real-valued functions in $[-\pi, \pi]$, with inner product $(f, g) := \int_{-\pi}^{\pi} f(x)\, g(x)\, dx$.² Because
² From here on, the subindices in $C^2_{o,p}$, $L^2_{o,p}$, $L^2_{e,p}$, $H^1_{e,p}$, etc., indicate spaces of odd and periodic (o,p) or even and periodic (e,p) functions.
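of $(f, Ag) = (Af, g)$, $A$ may be extended to a self-adjoint operator in $L^2_{o,p}(-\pi, \pi)$. The domain $D(A)$ of $A$ is
\[ D(A) = \left\{ f \in L^2_{o,p}(-\pi, \pi) : Af \in L^2_{o,p}(-\pi, \pi) \right\} \]
and the spectrum of $A$,
\[ \sigma(A) = \left\{ \lambda_n = \alpha n^2 - 2,\ n \in \mathbb{N}_+ \right\}, \tag{3.8} \]
consists of simple eigenvalues with corresponding eigenfunctions $\phi_n(x) = (1/\pi)^{1/2} \sin nx$. Let $A_1$ denote a positive definite linear operator given by $A$ if $\alpha > 2$ and by $A + aI$ for some $a > 2 - \alpha$ otherwise. The following properties also hold for $A$ given by the closure in $L^q_{o,p}(-\pi, \pi)$, $1 \le q < \infty$, of the operator $\left( -\alpha\, d^2/dx^2 - 2 \right)\big|_{C^2_{o,p}}$.

1. The operator $A$ generates an analytic semi-group $T(t) = e^{-tA}$ given by the formula
\[ T(t) = \frac{1}{2\pi i} \int_\Gamma \frac{1}{\lambda + A}\, e^{\lambda t}\, d\lambda, \]
where $\Gamma$ is a contour in the resolvent set of $A$ with $\arg \lambda \to \pm\theta$, $\pi/2 < \theta < \pi$, as $|\lambda| \to \infty$. From this, we have
\[ \left\| e^{-tA} \right\| \le C e^{-ct} \quad \text{and} \quad \left\| A e^{-tA} \right\| \le \frac{C}{t}\, e^{-ct} \tag{3.9} \]
for $t > 0$, $c < \inf_{\lambda \in \sigma(A)} \lambda$ and $C < \infty$.

2. Given $\gamma \ge 0$, let the fractional power of $A_1$ be given by
\[ A_1^{-\gamma} = \frac{1}{\Gamma(\gamma)} \int_0^\infty t^{\gamma - 1} e^{-A_1 t}\, dt \]
and define $A_1^{\gamma} = \left( A_1^{-\gamma} \right)^{-1}$. $A_1^{-\gamma}$ is a bounded operator (compact if $\gamma > 0$) with $(d/dx) A_1^{-1/2}$ and $A_1^{-1/2} (d/dx)$ bounded in the $L^2_{o,p}(-\pi, \pi)$ norm. In addition, for $\gamma > 0$, $A_1^{\gamma}$ is closed and densely defined with the inclusion $D(A_1^{\gamma}) \subset D(A_1^{\tau})$ if $\gamma > \tau$.

It thus follows from 1. and 2. (see e.g. [H]) that
\[ \left\| A_1^{\gamma} e^{-tA_1} \right\| \le \frac{C_\gamma}{t^\gamma}\, e^{-ct} \tag{3.10} \]
holds for $0 < \gamma < 1$, $t > 0$. Here $C_\gamma$ is bounded in any compact interval of $(0, 1)$ and also bounded as $\gamma \downarrow 0$. Note that, if the operator norm is induced by the $L^2$-norm, Eq. (3.10) holds with
\[ C_\gamma = \sup_{n \in \mathbb{N}_+} (t\lambda_n)^\gamma e^{-t\lambda_n} \le \sup_{r > tc} r^\gamma e^{-r} \le \left( \frac{\gamma}{e} \right)^{\gamma}, \tag{3.11} \]
uniformly in $\gamma$, $t \ge 0$.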
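The last inequality in (3.11) rests on the elementary fact that $r \mapsto r^\gamma e^{-r}$ attains its maximum over $r > 0$ at $r = \gamma$, where it equals $(\gamma/e)^\gamma$. A quick numerical sanity check of this fact can be sketched as follows; the function name `sup_r_gamma_exp`, the grid size and the cut-off `r_max` are arbitrary illustrative choices, not part of the original argument.

```python
import math


def sup_r_gamma_exp(gamma: float, r_max: float = 50.0, n: int = 200_001) -> float:
    """Grid approximation of sup_{r > 0} r**gamma * exp(-r) on (0, r_max]."""
    step = r_max / (n - 1)
    return max((i * step) ** gamma * math.exp(-i * step) for i in range(1, n))


if __name__ == "__main__":
    for gamma in (0.1, 0.5, 0.9):
        bound = (gamma / math.e) ** gamma          # closed-form maximum, attained at r = gamma
        approx = sup_r_gamma_exp(gamma)
        print(f"gamma = {gamma}: grid sup = {approx:.8f}, (gamma/e)^gamma = {bound:.8f}")
```

The grid value never exceeds the closed-form bound and approaches it as the grid is refined, which is all that (3.11) requires.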
Following Picard's method, let us replace $F$ in (3.6) by a locally Hölder continuous function $f : [0, T] \to \mathcal{B}$: $\| f(r) - f(s) \| \le C |r - s|^\theta$ for $0 \le r \le s < T$ and $\theta > 0$. In this case, a solution to (3.6) is given by the variation of constants formula
\[ z(t) = e^{-tA} z_0 + \int_0^t e^{-(t-s)A} f(s)\, ds. \tag{3.12} \]
Note that $z : [0, T) \to \mathcal{B}$ is continuously differentiable with $z \in D(A)$ satisfying the differential equation (3.6). Moreover, $z(t)$ is the unique solution with $z(0) = z_0$ provided $f$ is such that
\[ \lim_{\rho \to 0} \int_0^\rho \| f(s) \|\, ds = 0. \]
Now, substituting $f(s) = F(z(s))$ into (3.12) leads to an integral equation
\[ z(t) = e^{-tA} z_0 + \int_0^t e^{-(t-s)A} F(z(s))\, ds \tag{3.13} \]
whose solution, if it exists, also solves the initial value problem (3.1) provided $F(z(s))$ is shown to be locally Hölder continuous on the interval $0 \le t < T$. To formulate the necessary condition on $F$ and state our results, let $\mathcal{B}^\gamma = D(A^\gamma)$, $\gamma \ge 0$, denote the Banach space with the graph norm $\| f \|_\gamma := \| A^\gamma f \|$. $F : \mathcal{B}^\gamma \to L^2_{o,p}(-\pi, \pi)$ is said to be locally Lipschitzian if there exist $U \subset \mathcal{B}^\gamma$ and a finite constant $L$ such that
\[ \| F(z_1) - F(z_2) \| \le L\, \| z_1 - z_2 \|_\gamma \tag{3.14} \]
holds for any $z_1, z_2 \in U$.

Theorem 3.2. The initial value problem (3.6) has a unique solution $z(t)$ for all $t \in \mathbb{R}_+$ with $z(0) = z_0 \in \mathcal{B}^{1/2}$. In addition, if $\| z(t) \|_{1/2}$ is bounded as $t \to \infty$, the trajectories $\{z(t)\}_{t \ge 0}$ lie on a compact set in $\mathcal{B}^{1/2}$.

Proof. The proof of Theorem 3.2 will be divided into four parts. Firstly, $F(z(t))$ will be shown to be Hölder continuous under the Lipschitz condition (3.14), which establishes the equivalence between the integral equation (3.13) and the initial value problem (3.6). Secondly, the Banach fixed point theorem will be used to show the existence of a unique solution $z(t)$ of (3.13) for $0 \le t \le T$. Then, by a compactness argument, the solution $z(t)$ will be extended to all $t \in \mathbb{R}_+$. Finally, assuming that $\| z(t) \|_{1/2}$ stays bounded for all $t > 0$, we conclude the proof. We have to wait until Sect. 5 for the boundedness hypothesis to be established.

Part I: Continuity. Let us show that $F : D(A^{1/2}) \to L^2_{o,p}(-\pi, \pi)$ given by $F(z) = -2\alpha z' z$ is locally Lipschitz. We note that $D(A^{1/2}) = H^1_{o,p}(-\pi, \pi)$, where $H^k_{o,p}(-\pi, \pi)$ is the Sobolev space of odd periodic functions which have distributional derivatives up to order $k$. It thus follows that, if $z \in H^1_{o,p}$, then $z(x) = \int_0^x z'(\xi)\, d\xi$ is absolutely continuous with
\[ \sup_{x \in [-\pi, \pi]} |z(x)| \le \sqrt{2\pi}\, \| z \|_{1/2}, \]
680
L. F. Guidi, D. H. U. Marchetti
by the Schwarz inequality. Moreover, using (3.10), we have F (z1 ) − F (z2 ) ≤ 2α z1 (z1 − z2 ) + (z1 − z2 )z2 √ (3.15) ≤ 2α 2π z1 z1 − z2 1/2 + z1 − z2 z2 1/2 √
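which satisfies (3.14) with $\gamma = 1/2$ and $L = 2\alpha \sqrt{2\pi}\, \left( \|z_1\|_{1/2} + \|z_2\|_{1/2} \right)$.

Suppose that $z : (0, T) \to \mathcal{B}^{1/2}$ is a continuous solution of (3.13). From the estimate (3.10), we have
$$\begin{aligned} \|(e^{-hA} - I)\, e^{-\tau A} w\|_{1/2} &\le \int_0^h \|A e^{-(s+\tau)A} w\|_{1/2}\, ds \\ &= \int_0^h \|A^{1-\delta} e^{-sA} A^{\delta} e^{-\tau A} w\|_{1/2}\, ds \\ &\le C_{1-\delta} \int_0^h \frac{1}{s^{1-\delta}}\, ds\; \|A^{\delta} e^{-\tau A} w\|_{1/2} \\ &\le \frac{C_{1-\delta}}{\delta}\, h^{\delta}\, C_{\delta+1/2}\, \frac{e^{-c\tau}}{\tau^{\delta+1/2}}\, \|w\| \end{aligned}$$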
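for $0 < \delta < 1/2$, which can be used in Eq. (3.13), along with (3.14), to get
\begin{align} \|z(t+h) - z(t)\|_{1/2} &\le \|(e^{-hA} - I)\, e^{-tA} z_0\|_{1/2} + \int_0^t \|(e^{-hA} - I)\, e^{-(t-s)A} F(z(s))\|_{1/2}\, ds \tag{3.16} \\ &\quad + \int_t^{t+h} \|e^{-(t+h-s)A} F(z(s))\|_{1/2}\, ds \le K h^{\delta} \tag{3.17} \end{align}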
for some constant $K < \infty$ in the open interval $(0, T)$. Combined with (3.14), this implies the Hölder continuity of $f(t) = F(z(t))$ and the equivalence between Eqs. (3.6) and (3.13).

Part II: Local existence. Let $V = \{ z \in \mathcal{B}^{1/2} : \|z - z_0\|_{1/2} \le \varepsilon \}$ be an $\varepsilon$-neighborhood of $z_0$ and let $L$ be the Lipschitz constant of $F$ on $V$. We set $B = \|F(z_0)\|$ and let $T$ be a positive number such that
$$\|(e^{-hA} - I)\, z_0\|_{1/2} \le \frac{\varepsilon}{2} \tag{3.18}$$
with $0 \le h \le T$ and
$$C_{1/2}\, (B + L\varepsilon) \int_0^T s^{-1/2} e^{-cs}\, ds \le \frac{\varepsilon}{2} \tag{3.19}$$
hold. Let $S$ denote the set of continuous functions $y : [t_0, t_0 + T] \to \mathcal{B}^{1/2}$ such that $\|y(t) - z_0\|_{1/2} \le \varepsilon$. Equipped with the sup-norm
$$\|y\|_T := \sup_{t_0 \le t \le t_0 + T} \|y(t)\|_{1/2},$$
$S$ is a complete metric space.
Defining $D[y] : [t_0, t_0 + T] \to \mathcal{B}^{1/2}$ for each $y \in S$ by
$$D[y](t) = e^{-(t-t_0)A} z_0 + \int_{t_0}^{t} e^{-(t-s)A} F(y(s))\, ds,$$
we now show that, under the conditions (3.18) and (3.19), $D : S \to S$ is a strict contraction. Using
$$\|F(y(t))\| \le \|F(y(t)) - F(z_0)\| + \|F(z_0)\| \le L\, \|y(t) - z_0\|_{1/2} + B \le L\varepsilon + B$$
and (3.10), we have
$$\begin{aligned} \|D[y](t) - z_0\|_{1/2} &\le \|(e^{-(t-t_0)A} - I)\, z_0\|_{1/2} + \int_{t_0}^{t_0+T} \|A^{1/2} e^{-(t-s)A} F(y(s))\|\, ds \\ &\le \frac{\varepsilon}{2} + C_{1/2}\, (B + L\varepsilon) \int_0^T s^{-1/2} e^{-cs}\, ds \le \varepsilon, \end{aligned}$$
and since $D[y]$ is continuous by an estimate analogous to (3.17), $D[y] \in S$. Analogously, from (3.14) and (3.19), for any $y, w \in S$,
$$\begin{aligned} \|D[y](t) - D[w](t)\|_{1/2} &\le \int_{t_0}^{t_0+T} \|A^{1/2} e^{-(t-s)A} \left( F(y(s)) - F(w(s)) \right)\|\, ds \\ &\le C_{1/2}\, L \int_0^T s^{-1/2} e^{-cs}\, ds\; \|y - w\|_T \le \frac{1}{2}\, \|y - w\|_T \end{aligned}$$
holds uniformly in $t \in [t_0, t_0 + T]$, concluding our claim. By the contraction mapping theorem, $D$ has a unique fixed point $z$ in $S$, which is the continuous solution of the integral equation (3.13) on $(t_0, t_0 + T)$ and, by Part I, is the solution of (3.6) in the same interval with $z(t_0) = z_0 \in \mathcal{B}^{1/2}$.

Part III: Global existence. As the set $U$ where (3.14) holds is compact, the same $T$ can be chosen in Part II for any initial condition $z_0 \in U$. Moreover, if $I_1 = (t_1, t_1 + T)$ and $I_2 = (t_2, t_2 + T)$ are two intervals containing $t_0$, then there exist $z_{0,1}, z_{0,2} \in U$ such that the two solutions $z_1(t)$ and $z_2(t)$ of Eq. (3.6) on $I_1$ with $z_1(t_1) = z_{0,1}$ and on $I_2$ with $z_2(t_2) = z_{0,2}$, respectively, coincide in the open interval $I_1 \cap I_2$. As a consequence, one can define an open maximal interval $I_{\max} = (t_-, t_+)$ (containing the origin), where the solution $z(t)$ of (3.6) is uniquely given by patching together the solutions $z_j(t)$ on intervals $I_j$ with $z_j(t_j) = z_{0,j}$. By construction, there is no solution to (3.6) on $(t_0, t')$ if $t' > t_+$. Therefore, either $t_+ = \infty$, or else there exists a sequence $\{t_n\}_{n \in \mathbb{N}_+}$, with $t_n \to t_+$ as $n \to \infty$, such that $z(t_n)$ tends to the boundary $\partial U$ of the compact set $U$. It thus follows that, if $t_+$ is finite, the solution $z(t)$ blows up in finite time. In what follows we show that $\|z(t)\|_{1/2}$ remains finite for all $t > t_0$, and this implies global existence of $z(t)$. Let us start with the following generalization of the Gronwall inequality.

Lemma 3.3 (Gronwall). Let $\xi$ and $\gamma$ be numbers and let $\theta$ and $\zeta$ be non-negative continuous functions defined in an interval $I = (0, T)$ such that $\xi \ge 0$, $\gamma > 0$ and
$$\zeta(t) \le \theta(t) + \xi \int_0^t (t - \tau)^{\gamma - 1} \zeta(\tau)\, d\tau. \tag{3.20}$$
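Then
$$\zeta(t) \le \theta(t) + \int_0^t E_{\gamma}'(t - \tau)\, \theta(\tau)\, d\tau \tag{3.21}$$
holds for $t \in I$, where $E_{\gamma}' = dE_{\gamma}/dt$,
$$E_{\gamma}(t) = \sum_{n=0}^{\infty} \frac{1}{\Gamma(n\gamma + 1)} \left( \xi\, \Gamma(\gamma)\, t^{\gamma} \right)^n \tag{3.22}$$
and $\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\, dt$ is the gamma function. In addition, if $\theta(t) \le K$ for all $t \in I$, then
$$\zeta(t) \le K\, E_{\gamma}(t) \le K' e^{\xi \Gamma(\gamma) T} \tag{3.23}$$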
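holds for some finite constant $K'$.

Proof of Lemma 3.3. If $\mathcal{T}$ is the integral operator given by the convolution
$$\mathcal{T} \zeta(t) = \xi \int_0^t (t - \tau)^{\gamma - 1} \zeta(\tau)\, d\tau, \tag{3.24}$$
then the inequality (3.20) can be formally solved by
$$\zeta(t) = \theta(t) + \sum_{n=1}^{\infty} \mathcal{T}^n \theta(t),$$
where $\mathcal{T}^n$ is also a convolution integral operator, which can be explicitly evaluated by the Laplace transform,
$$\begin{aligned} \mathcal{T}^n \theta(t) &= (\xi \Gamma(\gamma))^n\, \frac{1}{\Gamma(n\gamma)} \int_0^t (t - \tau)^{n\gamma - 1}\, \theta(\tau)\, d\tau \\ &= (\xi \Gamma(\gamma))^n\, \frac{1}{\Gamma(n\gamma + 1)} \int_0^t \frac{d}{dt} (t - \tau)^{n\gamma}\, \theta(\tau)\, d\tau \equiv f_n' * \theta\, (t), \end{aligned}$$
with $f_n(t) = (\xi \Gamma(\gamma)\, t^{\gamma})^n / \Gamma(n\gamma + 1)$. Equation (3.21) (and (3.23), by the fundamental theorem of calculus) thus follows by setting $E_{\gamma}(t) = \sum_{n \in \mathbb{N}} f_n(t)$. Note that this series is absolutely and uniformly convergent in $t \in I$, with $E_{\gamma}(0) = 1$, and it cannot grow faster than exponentially,
$$E_{\gamma}(T) \sim \frac{1}{\gamma}\, e^{\xi \Gamma(\gamma) T} \tag{3.25}$$
as $T \to \infty$ (see Lemma 7.1.1 in [H]). This concludes the proof of Lemma 3.3.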
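As a numerical illustration of Lemma 3.3 (not part of the original argument), the series (3.22) can be summed directly. The following is a minimal Python sketch under the assumptions $\xi \ge 0$ and $t \ge 0$; the function name `E_gamma` and the truncation parameter `terms` are ours and purely illustrative. For $\gamma = 1$ one has $\Gamma(1) = 1$ and the series reduces to $e^{\xi t}$, which recovers the standard form of the Gronwall bound.

```python
import math


def E_gamma(t, gamma, xi, terms=200):
    """Partial sum of the series (3.22):
    sum_n (xi * Gamma(gamma) * t**gamma)**n / Gamma(n*gamma + 1).

    Assumes t >= 0, gamma > 0 and xi >= 0; `terms` is an illustrative
    truncation order, not a value taken from the paper.
    """
    a = xi * math.gamma(gamma) * t ** gamma
    if a == 0.0:
        # Only the n = 0 term survives, so E_gamma = 1.
        return 1.0
    total = 0.0
    for n in range(terms):
        # Work with logarithms so that a**n and Gamma(n*gamma + 1)
        # do not overflow separately.
        log_term = n * math.log(a) - math.lgamma(n * gamma + 1.0)
        total += math.exp(log_term)
    return total


if __name__ == "__main__":
    # gamma = 1: the series is the exponential exp(xi * t).
    print(E_gamma(1.5, 1.0, 0.7), math.exp(0.7 * 1.5))
    # gamma = 1/2: the case used later with the (t - s)^(-1/2) kernel.
    print(E_gamma(1.0, 0.5, 0.5))
```

Summing in logarithms is only a convenience for large arguments; for the moderate values of $\xi \Gamma(\gamma) t^{\gamma}$ relevant here a direct sum would do as well.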
Taking the graph norm of (3.13), we have, in view of (3.9), (3.10) and (3.25),
$$\begin{aligned} \|z(t)\|_{1/2} &\le \|e^{-(t-t_0)A} z_0\|_{1/2} + L \int_{t_0}^{t} \|A^{1/2} e^{-(t-s)A}\|\, \|z(s)\|_{1/2}\, ds \\ &\le C\, \|z_0\|_{1/2} + L\, C_{1/2} \int_{t_0}^{t} (t - s)^{-1/2}\, \|z(s)\|_{1/2}\, ds \\ &\le C \exp\left( L C_{1/2} \sqrt{\pi}\, t \right) \|z_0\|_{1/2}, \end{aligned} \tag{3.26}$$
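which is finite for any $t \in \mathbb{R}_+$.

Part IV: Compact trajectories. Since $\mathcal{B}^{\gamma} \subset \mathcal{B}^{1/2}$ has compact inclusion if $1/2 < \gamma < 1$ [H], it suffices to show that $\|z(t)\|_{\gamma}$ remains bounded as $t \to \infty$. The hypothesis $\|z(t)\|_{1/2} < \infty$ combined with (3.15) implies the existence of $C' < \infty$ such that, analogously to (3.26),
$$\begin{aligned} \|z(t)\|_{\gamma} &\le \|e^{-tA} z_0\|_{\gamma} + \int_0^t \|A^{\gamma} e^{-(t-s)A} F(z(s))\|\, ds \\ &\le C_{\gamma - 1/2}\, t^{1/2 - \gamma} e^{-ct}\, \|z_0\|_{1/2} + C'\, C_{\gamma} \int_0^t (t - s)^{-\gamma} e^{-c(t-s)}\, ds, \end{aligned}$$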
which is bounded for $t > 0$ provided $c > 0$ (i.e. $\inf_{\lambda \in \sigma(A)} \lambda > 0$). Although the spectrum of $A$ is not positive if $\beta \le 8\pi$, we shall see in Sect. 5 that $A$ in the integral equation (3.13) can be replaced by a positive linear operator $L$ (see Theorems 5.2 and 5.3). This concludes the proof of Theorem 3.2.

It follows by an analogous procedure that if $z_1$ and $z_2$ are solutions of (3.6) differing by their initial value in $\mathcal{B}^{1/2}$, then
$$\begin{aligned} \|z_1(t) - z_2(t)\|_{1/2} &\le \|e^{-tA} (z_{0,1} - z_{0,2})\|_{1/2} + \int_0^t \|A^{1/2} e^{-(t-s)A} \left( F(z_1(s)) - F(z_2(s)) \right)\|\, ds \\ &\le \|e^{-tA} (z_{0,1} - z_{0,2})\|_{1/2} + C_{1/2}\, L \int_0^t (t - s)^{-1/2} e^{-c(t-s)}\, \|z_1(s) - z_2(s)\|_{1/2}\, ds, \end{aligned}$$
which implies, by the Gronwall inequality, the continuous dependence of $z(t)$ with respect to its initial condition. We may also consider the dependence of $z$ with respect to the parameter $\alpha = \beta/(4\pi)$. The next statement is a corollary of the above analysis.

Theorem 3.4. The solution $z(t) : \mathbb{R}_+ \times \mathcal{B}^{1/2} \to \mathcal{B}^{1/2}$ to the initial value problem (3.6), as a function of the bifurcation parameter $\alpha$ and the initial value $z_0$, is continuous.

Remark 3.5. It can be shown (see [H]) that for any initial value $z_0 \in \mathcal{B}^{\gamma}$, $0 < \gamma < 1$, the solution is actually in $D(A)$ at any later time. Moreover, since $F : \mathcal{B}^{1/2} \to L^2_{o,p}(-\pi, \pi)$ is $C^{\infty}$ (has Fréchet derivatives of all orders), it can also be shown that $(\alpha, z_0) \in \mathbb{R}_+ \times \mathcal{B}^{1/2} \mapsto z(t; \alpha, z_0)$ is $C^{\infty}$ for all $t > 0$.
Remark 3.6. With minor modifications, one can show existence, uniqueness and continuous dependence for (3.1) in the Sobolev space $H^1_{o,p}(-\pi, \pi)$ with norm $\|z\|_1 = \|z'\|_{L^2_{o,p}}$ (just include the linear term of (3.1) in the definition of $F$). The same results hold for Eq. (1.1) in the Sobolev space $H^1_{e,p}(-\pi, \pi)$ of even and periodic functions with both norms $\|\cdot\|_1$ and $\|\cdot\|_{1/2}$. Note from item 2 after (3.9) and from (3.7) that $\alpha \|z\|_1 = \|z\|_{1/2} + 2 \|z\|_{L^2_{o,p}}$, so both norms are equivalent.

4. Equilibrium Solutions

Time independent (equilibrium) solutions of (3.1) are odd solutions of the ordinary differential equation
$$\alpha \left( \psi'' - 2\psi \psi' \right) + 2\psi = 0, \tag{4.1}$$
with periodic conditions $\psi(-\pi) = \psi(\pi)$ and $\psi'(-\pi) = \psi'(\pi)$, $\alpha = \beta/(4\pi) \ge 0$, which can be written as
$$\begin{cases} w' = 2p \left( w - \alpha^{-1} \right), \\ p' = w, \end{cases} \tag{4.2}$$
by setting $p = \psi$ and $w = \psi'$. In this section we give a qualitative and quantitative description of the solutions of (4.2) in the phase space $\mathbb{R}^2$ and study their implications for the equilibrium solutions of (3.1). Our results are summarized as follows.

Theorem 4.1. The stationary equation (4.1) has two distinct regimes separated by $\alpha = 2$ ($\beta = 8\pi$). For $\alpha \ge 2$, $\psi_0 \equiv 0$ is the unique solution. For $\alpha < 2$ such that $2/(k+1)^2 \le \alpha < 2/k^2$ holds for some $k \in \mathbb{N}_+$, there exist $2k$ non-trivial solutions $\psi_j^+, \psi_j^-$, $j = 1, \dots, k$, with fundamental period $2\pi/j$, $\psi_j^{\pm}(-x) = -\psi_j^{\pm}(x)$ and $\psi_j^-(x) = \psi_j^+(x + \pi)$. Moreover, each pair of non-trivial solutions bifurcates from the trivial solution $\psi_0$ at $\alpha_j = 2/j^2$ ($\beta_j = 8\pi/j^2$) with
$$\lim_{\alpha \uparrow \alpha_j} \psi_j^{\pm} = 0.$$
In the phase space, these solutions, $(\psi_j', \psi_j)$, are closed orbits around $(0, 0)$ whose distance from the origin increases monotonically as $\alpha$ decreases. Numerical computations indicate that these orbits rapidly approach the open orbit $\{(\alpha^{-1}, \alpha^{-1} x),\ x \in \mathbb{R}\}$ from the left as $\alpha \to 0$.

Let us begin by stating the general properties derived by the same tools used in the analysis performed in Sect. 3. The vector field $f : \mathbb{R}^2 \to \mathbb{R}^2$, $(w, p) \mapsto f(w, p) = \left( 2p(w - \alpha^{-1}),\, w \right)$, on the right-hand side of (4.2), defines a smooth autonomous dynamical system. It thus follows from Picard's theorem (see e.g. [CL]) that there exists a unique solution $(w(x), p(x))$ of this system, globally defined in $\mathbb{R}^2$, with $(w(0), p(0)) = (w_0, p_0)$. As we have seen in Sect. 3, the existence of a global solution and its continuous dependence on the value $(w_0, p_0)$, and on the parameter $\alpha$, follow from Gronwall's lemma, which
holds here in its standard form. As a consequence, the phase space $\mathbb{R}^2$ is foliated by non-overlapping orbits
$$\gamma_P = \{ (w(x), p(x)) : x \in \mathbb{R} \text{ and } P = (w(0), p(0)) \},$$
each of which passes through $P = (w_0, p_0) \in \mathbb{R}^2$ at $x = 0$. Note that, by varying $P$ and $\alpha$ continuously, the orbit $\gamma_P$ varies continuously in the phase space.

We shall now determine the values $(P, \alpha)$ for which the solution of (4.2) defines closed orbits. Note that the orbits are symmetric with respect to the $w$-axis, $L = \{ (w, 0) : w \in \mathbb{R} \}$, since the system of equations (4.2) remains invariant if the signs of both $x$ and $p$ are reversed. As we shall see, there is no loss of generality if the initial value $(w(0), p(0)) = P$ belongs to $L$. We write $\gamma_P = \gamma_{w_0}$.

Proposition 4.2. Every orbit $\gamma_P$ is determined by a single value $P$ in the positive semi-axis $L_+ = \{ (w_0, 0) : w_0 \ge 0 \}$. For $w_0 > 0$, the orbit $\gamma_{w_0}$ is either closed or unbounded depending on whether $\alpha w_0 < 1$ or $\alpha w_0 \ge 1$, respectively. The orbit $\gamma_{\alpha^{-1}} = \{ (\alpha^{-1}, \alpha^{-1} x) : x \in \mathbb{R} \}$ separates the phase space $\mathbb{R}^2$ in such a way that $\gamma_P$ is closed if $P$ is on the left of $\gamma_{\alpha^{-1}}$ and unbounded otherwise. In addition, if $w_0 = 0$, then $\gamma_0 = \{ (0, 0) \}$, and the origin is enclosed by every closed orbit.

Proof. The proof of Proposition 4.2 follows from an explicit computation. By the chain rule, Eq. (4.2) can be written as
$$\frac{dp}{dw} = \frac{w}{2p \left( w - \alpha^{-1} \right)} \tag{4.3}$$
provided $\alpha w \ne 1$. The trajectories $\gamma_{w_0}$, obtained by integrating $2p\, dp = w\, dw / \left( w - \alpha^{-1} \right)$ with initial point $P = (w_0, 0)$,
$$p^2 = w - w_0 + \alpha^{-1} \ln \frac{1 - \alpha w}{1 - \alpha w_0}, \tag{4.4}$$
are portrayed in Fig. 1. We note that $P = (0, 0)$ is the only critical point of (4.2), and it is a center for all $\alpha > 0$, since linearizing $f(w, p)$ around $P = (0, 0)$ gives a matrix whose eigenvalues are $\lambda_{\pm} = \pm i \sqrt{2\alpha^{-1}}$. This implies that $\gamma_0 = \{ (0, 0) \}$ and that the orbits $\gamma_{w_0}$ with $w_0$ sufficiently close to $0$ are, in view of (4.4), ellipses defined by the equation $2\alpha^{-1} p^2 + w^2 = C$. When $\alpha w_0 = 1$, using mathematical induction and Eqs. (4.2) with $(w(0), p(0)) = (w_0, 0)$, we have
$$\frac{d^n w}{dx^n}(0) = 0$$
for all $n \ge 1$, which leads to $\gamma_{\alpha^{-1}} = \{ (\alpha^{-1}, \alpha^{-1} x) : x \in \mathbb{R} \}$.
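The first integral behind (4.4) lends itself to a quick numerical sanity check. The sketch below is ours and is not taken from the paper: it assumes a classical fourth-order Runge–Kutta discretization with an illustrative step size `h`, and all helper names are hypothetical. It integrates the system (4.2) from $(w_0, 0)$ and monitors the quantity $p^2 - w - \alpha^{-1} \ln(1 - \alpha w)$, which by (4.4) should stay constant along any orbit with $\alpha w_0 < 1$.

```python
import math


def rhs(w, p, alpha):
    """Right-hand side of (4.2): w' = 2 p (w - 1/alpha), p' = w."""
    return 2.0 * p * (w - 1.0 / alpha), w


def rk4_step(w, p, h, alpha):
    """One classical Runge-Kutta step of size h for the system (4.2)."""
    k1w, k1p = rhs(w, p, alpha)
    k2w, k2p = rhs(w + 0.5 * h * k1w, p + 0.5 * h * k1p, alpha)
    k3w, k3p = rhs(w + 0.5 * h * k2w, p + 0.5 * h * k2p, alpha)
    k4w, k4p = rhs(w + h * k3w, p + h * k3p, alpha)
    w_new = w + h * (k1w + 2.0 * k2w + 2.0 * k3w + k4w) / 6.0
    p_new = p + h * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
    return w_new, p_new


def first_integral(w, p, alpha):
    """Rearranged form of (4.4); requires alpha * w < 1."""
    return p * p - w - math.log(1.0 - alpha * w) / alpha


def max_drift(alpha, w0, x_end=20.0, h=1e-3):
    """Largest deviation of the first integral along the orbit through (w0, 0)."""
    w, p = w0, 0.0
    c0 = first_integral(w, p, alpha)
    drift = 0.0
    for _ in range(int(round(x_end / h))):
        w, p = rk4_step(w, p, h, alpha)
        drift = max(drift, abs(first_integral(w, p, alpha) - c0))
    return drift


if __name__ == "__main__":
    # Closed orbit (alpha * w0 = 0.5): the drift should be at round-off level.
    print(max_drift(1.0, 0.5))
```

On the separatrix $\alpha w_0 = 1$ the logarithm is undefined, so `first_integral` should not be called there; stepping with `rk4_step` alone reproduces $w \equiv \alpha^{-1}$ and $p = \alpha^{-1} x$.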
Fig. 1. Trajectories of the dynamical system (4.2) (horizontal axis: $w$, with $w = 1/\alpha$ marked; vertical axis: $p$)
Hence, if $\omega = \omega(P)$ denotes the set of limit points (the $\omega$-limit set) given by
$$\omega(P) = \left\{ (w^*, p^*) \in \mathbb{R}^2 : \lim_{n \to \infty} (w(x_n), p(x_n)) = (w^*, p^*) \right\} \tag{4.5}$$
for some sequence of points $\{x_n\}$ such that $x_n \to \infty$ as $n \to \infty$, $\gamma_{\alpha^{-1}}$ separates two different types of orbits: $\omega(P) = \gamma_P$ or $\omega(P) = \{\infty\}$, depending on whether the point $P$ is at the left or at the right of $\gamma_{\alpha^{-1}}$.

Proof of Theorem 4.1. The stationary solutions satisfy (4.2) with periodic conditions $w(0) = w(2\pi)$ and $p(0) = p(2\pi)$. By fixing the period $T$ of an orbit $\gamma_{w_0}$ at $2\pi$, the label $w_0$ becomes implicitly dependent on the parameter $\alpha$. In view of Proposition 4.2, Theorem 4.1 follows if, for $\alpha \ge 2$, no non-trivial solution (that is, none except the orbit $\gamma_0 = \{(0, 0)\}$) has period $T = 2\pi$ and, for $\alpha < 2$, there is a one-to-one correspondence between $w_0$ and $\alpha$ for $T$ fixed at any value $2\pi/k$, $k = 1, \dots, \left[ \sqrt{2/\alpha} \right]$. More precisely, let $T = T(\alpha, w_0)$ denote the period of the dynamical system (4.2) with initial value $(w(0), p(0)) = (w_0, 0)$:
$$T = \oint_{\gamma_{w_0}} dx = 2 \int \frac{dp}{w}, \tag{4.6}$$
where, by symmetry, the second integration is over the semi-orbit above the $w$-axis. For $D = \{ (\alpha, w_0) \in \mathbb{R}_+ \times \mathbb{R}_+ : \alpha w_0 \le 1 \}$, we set
$$G_j = T - \frac{2\pi}{j}$$
and note that $G_j : D \to \mathbb{R}$ is a continuous function of both variables satisfying
$$G_j\left( 2/j^2, 0 \right) = 0. \tag{4.7}$$
To see (4.7), we compute the period $T_L$ of an elliptic orbit, e.g. $(2/\alpha)\, p^2 + w^2 = 1$, of (4.2) linearized at the origin ($f(w, p)$ replaced by $\left( -2\alpha^{-1} p,\, w \right)$),
$$T_L = 4 \int_0^{\sqrt{\alpha/2}} \frac{dp}{\sqrt{1 - (2/\alpha)\, p^2}} = 2\pi \sqrt{\frac{\alpha}{2}}, \tag{4.8}$$
and note that $\lim_{w_0 \to 0} T(\alpha, w_0) = T_L$. Continuity follows from the general properties stated previously. Hence, provided
$$\frac{\partial T}{\partial w_0} > 0 \tag{4.9}$$
holds for all $(\alpha, w_0) \in D$, by the implicit function theorem, there exists a unique (strictly) monotone decreasing function $\hat{w}_j : \left[ 0, 2/j^2 \right] \to \mathbb{R}_+$ with $\hat{w}_j(2/j^2) = 0$ such that $G_j(\alpha, \hat{w}_j(\alpha)) = 0$. Note that (4.9) and
$$T(\alpha, w_0) = \sqrt{\alpha}\; T(1, \alpha w_0) \tag{4.10}$$
imply that $T$ is an increasing function of both $\alpha$ and $w_0$, independently. This fact, which can be seen by rescaling (4.2) by $x \to \tilde{x} = x/\sqrt{\alpha}$, $w \to \tilde{w} = \alpha w$ and $p \to \tilde{p} = \sqrt{\alpha}\, p$, explains the monotone behavior of $\hat{w}_j$.

It thus follows that, if $\alpha < 2$, for each $j = 1, \dots, k$ such that $2/(k+1)^2 \le \alpha < 2/k^2$ holds, a unique function $\hat{w}_j$ such that $\hat{w}_j(2/j^2) = 0$ exists. The non-trivial solutions $\psi_1^{\pm}, \dots, \psi_k^{\pm}$ of (4.1) are the $p$-components of $\gamma_{\hat{w}_j}$, $j = 1, \dots, k$, which wind around the origin $j$ times: $\psi_j^+$ is a $2\pi$-periodic odd function with fundamental period $2\pi/j$, $(\psi_j^+)'(0) > 0$, and it satisfies $\psi_j^+(x + \pi) = \psi_j^-(x)$. If $\alpha \ge 2$, because $T(\alpha, w_0)$ is a strictly increasing function of $w_0$ and $T(\alpha, 0) \ge 2\pi$ (see Eq. (4.8)), there is no solution of $G_j(\alpha, w_0) = 0$ besides $\hat{w}_j(\alpha) = 0$ for $j = 1$. This reduces the proof of Theorem 4.1 to the proof of inequality (4.9).

To prove (4.9), it is convenient to change variables. Let
$$q = \ln (1 - \alpha w) \tag{4.11}$$
be defined for $\alpha w < 1$. From (4.10), there is no loss of generality in taking $\alpha = 1$. The system of equations (4.2) under this condition is thus equivalent to the following Hamiltonian system$^3$
$$\begin{cases} q' = 2p, \\ p' = 1 - e^{q}, \end{cases} \tag{4.12}$$
$^3$ We thank G. Benfatto for explaining this transformation and for pointing us to Eq. (4.4) in a footnote of [F].
whose energy function is given by
$$H(q, p) = p^2 + e^{q} - q - 1. \tag{4.13}$$
The trajectory equation (4.4), when written in terms of the $q$-variable, gives exactly the energy level equation $H(q, p) = E$ with
$$E = -w_0 - \ln (1 - w_0). \tag{4.14}$$
We denote by $\gamma_E$ the orbits of (4.12) and note that, in view of the fact
$$\frac{dE}{dw_0} = \frac{w_0}{1 - w_0} > 0,$$
there is a one-to-one correspondence between the two families of closed orbits $\{ \gamma_{w_0},\ 0 \le w_0 < 1 \}$ and $\{ \gamma_E,\ 0 \le E < \infty \}$. Now, let $\mathsf{T} = \mathsf{T}(E)$ be the period of an orbit $\gamma_E$,
$$\mathsf{T} = \oint_{\gamma_E} dx = \int_{q_-}^{q_+} \frac{dq}{p}. \tag{4.15}$$
Using the energy conservation law, we have
$$p = p(q, E) = \sqrt{E - v(q)}, \tag{4.16}$$
where the potential energy is given by
$$v(q) = e^{q} - q - 1, \tag{4.17}$$
and $q_{\pm} = q_{\pm}(E)$ are the positive and negative roots of the equation $v(q) = E$. Equation (4.9) holds if and only if $d\mathsf{T}/dE > 0$ holds uniformly in $E \in \mathbb{R}_+$. But this follows from the monotonicity criterion given by C. Chicone [C] (see also [CG]):

Lemma 4.3. Let $v \in C^3(\mathbb{R})$ be a three-times differentiable function and let $f(q) = -v'(q)$ be the force acting at $q$. If $v/f^2$ is a convex function with
$$\left( \frac{v}{f^2} \right)'' = \frac{6 v (v'')^2 - 3 (v')^2 v'' - 2 v v' v'''}{(v')^4} > 0, \qquad q \ne 0, \tag{4.18}$$
then the period $\mathsf{T}$ is a monotone (strictly) increasing function of $E$.

Proof. Two basic facts follow from (4.16):
$$\frac{\partial p}{\partial q} = \frac{f}{2p} \qquad \text{and} \qquad p(q_{\pm}, E) = 0. \tag{4.19}$$
These will be used for deriving an appropriate integral representation of $d\mathsf{T}/dE$. Let
$$K := \frac{1}{3} \int_{q_-}^{q_+} p^3 \left( \frac{v}{f^2} \right)'' dq. \tag{4.20}$$
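Integrating twice by parts gives
$$\begin{aligned} K &= \left. \frac{p^3}{3} \left( \frac{v}{f^2} \right)' \right|_{q_-}^{q_+} - \left. \frac{p v}{2 f} \right|_{q_-}^{q_+} + \frac{1}{2} \int_{q_-}^{q_+} (p f)'\, \frac{v}{f^2}\, dq \\ &= \frac{1}{2} \int_{q_-}^{q_+} \left( \frac{v}{2p} + v p\, \frac{f'}{f^2} \right) dq \end{aligned}$$
in view of (4.19). Note that $f(q_{\pm}) \ne 0$, since $v'(q_{\pm}) = v(q_{\pm}) + q_{\pm} = E + q_{\pm}$ vanishes only at $E = 0$. This follows from the fact that $v$ is a convex positive function with $v(0) = 0$ and asymptotic behavior $v(q) \sim -q - 1$ and $\sim e^{q}$ as $q$ goes to $-\infty$ and $\infty$, respectively. Now, using $(v/f)' = v'/f - v f'/f^2 = -1 - v f'/f^2$, and integrating by parts, we continue
$$\begin{aligned} K &= \frac{1}{2} \int_{q_-}^{q_+} \left( \frac{v}{2p} - p - p \left( \frac{v}{f} \right)' \right) dq \\ &= \frac{1}{2} \int_{q_-}^{q_+} \left( \frac{v}{p} - p \right) dq - \left. \frac{1}{2}\, p\, \frac{v}{f} \right|_{q_-}^{q_+} \\ &= \frac{1}{2} \int_{q_-}^{q_+} \left( \frac{E}{p} - 2p \right) dq, \end{aligned} \tag{4.21}$$
where in the last equation we have used $v = E - p^2$. From (4.15), (4.20) and (4.21), we have
$$E\, \mathsf{T} = 2 \int_{q_-}^{q_+} p\, dq + \frac{2}{3} \int_{q_-}^{q_+} p^3 \left( \frac{v}{f^2} \right)'' dq.$$
Differentiating this with respect to $E$ and using (4.19) gives
$$\mathsf{T} + E\, \frac{d\mathsf{T}}{dE} = \int_{q_-}^{q_+} \frac{dq}{p} + \int_{q_-}^{q_+} p \left( \frac{v}{f^2} \right)'' dq,$$
which, in view of (4.15) and the assumption (4.18) of Lemma 4.3, implies
$$\frac{d\mathsf{T}}{dE} = \frac{1}{E} \int_{q_-}^{q_+} p \left( \frac{v}{f^2} \right)'' dq > 0.$$
It remains to verify (4.18) for $v$ given by (4.17). By an explicit computation (see Chicone [C]),
$$(v')^4 \left( \frac{v}{f^2} \right)'' = e^{q} g(q),$$
where $g(q) := e^{2q} + 4 (1 - q)\, e^{q} - 2q - 5$ is such that $g(0) = g'(0) = 0$ and $g''(q) = 4 e^{q} v(q) \ge 0$. This implies $g(q) \ge 0$ (with $g(q) = 0$ only if $q = 0$), which is the hypothesis of Lemma 4.3, and concludes the proof of Theorem 4.1.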
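The properties of the period function used in the proof of Theorem 4.1 can be probed numerically. The following sketch is ours and only illustrative: it assumes a classical Runge–Kutta integrator with an arbitrary step size `h`, it uses the symmetry of the orbits with respect to the $w$-axis to obtain $T$ as twice the time needed to go from $(w_0, 0)$ to the next crossing of $p = 0$, and the names `period`, `rk4_step` and `x_max` are hypothetical. It can be used to compare $T(\alpha, w_0)$ with the small-amplitude value (4.8), the monotonicity (4.9) and the scaling relation (4.10).

```python
import math


def rhs(w, p, alpha):
    """Right-hand side of (4.2): w' = 2 p (w - 1/alpha), p' = w."""
    return 2.0 * p * (w - 1.0 / alpha), w


def rk4_step(w, p, h, alpha):
    """One classical Runge-Kutta step of size h for the system (4.2)."""
    k1w, k1p = rhs(w, p, alpha)
    k2w, k2p = rhs(w + 0.5 * h * k1w, p + 0.5 * h * k1p, alpha)
    k3w, k3p = rhs(w + 0.5 * h * k2w, p + 0.5 * h * k2p, alpha)
    k4w, k4p = rhs(w + h * k3w, p + h * k3p, alpha)
    w_new = w + h * (k1w + 2.0 * k2w + 2.0 * k3w + k4w) / 6.0
    p_new = p + h * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
    return w_new, p_new


def period(alpha, w0, h=1e-3, x_max=1e3):
    """Approximate T(alpha, w0) for a closed orbit, 0 < alpha * w0 < 1.

    Integrates from (w0, 0) along the upper semi-orbit until p changes
    sign from positive to non-positive, and returns twice that time.
    """
    w, p, x = w0, 0.0, 0.0
    while x < x_max:
        w_new, p_new = rk4_step(w, p, h, alpha)
        if p > 0.0 and p_new <= 0.0:
            # Linear interpolation of the crossing time inside the last step.
            frac = p / (p - p_new)
            return 2.0 * (x + frac * h)
        w, p, x = w_new, p_new, x + h
    raise RuntimeError("no crossing of p = 0 found; is alpha * w0 < 1?")


if __name__ == "__main__":
    alpha = 1.0
    print("T_L            :", 2.0 * math.pi * math.sqrt(alpha / 2.0))
    for w0 in (1e-3, 0.2, 0.5, 0.8):
        print("T(1, %.3f)    :" % w0, period(alpha, w0))
    print("scaling (4.10) :", period(0.5, 0.4), math.sqrt(0.5) * period(1.0, 0.2))
```

Such a check is no substitute for Lemma 4.3; it only illustrates the statements that the lemma establishes.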
Turning back to the Coulomb system problem, some remarks are now in order.

Remark 4.4. Recalling $v(t, x) = u_x(t, x)$ and denoting by $\lambda^* = \lim_{n \to \infty} \lambda_n$ the charge activity at the fixed point, we have from (2.15)
$$\psi(0) = -i\, \frac{\sum_{q \in \mathbb{Z}} q\, \lambda^*(q)}{\sum_{q \in \mathbb{Z}} \lambda^*(q)} = 0 \qquad \text{and} \qquad \psi'(0) = \frac{\sum_{q \in \mathbb{Z}} q^2 \lambda^*(q)}{\sum_{q \in \mathbb{Z}} \lambda^*(q)} \ge 0.$$
These boundary conditions select $\psi_j^+$, $j = 1, \dots, k$, as being the only physically meaningful stationary solutions and imply $\varphi^+(x) = \int_0^x \psi^+(y)\, dy \ge 0$ on $(-\pi, \pi)$.
Remark 4.5. The value $\alpha = 2$ is a bifurcation point, as one can see by linearizing (4.1) about $\psi \equiv 0$. The linear operator $L[0] = A$ given by (3.7) in the subspace of odd $2\pi$-periodic functions has eigenvalues and associated eigenfunctions given by (3.8). Hence, if $\alpha > 2$, the eigenvalues are all positive and $\psi \equiv 0$ is locally stable. When $\alpha < 2$ (but close to 2) a single eigenvalue becomes negative and one can apply the Crandall–Rabinowitz bifurcation theory [C] to locally describe the stable solution which bifurcates from the trivial one. Note that the Crandall–Rabinowitz theory can also be applied in a neighborhood of $\alpha_j = 2/j^2$, $j > 1$, in the orthogonal complement of the span of $\{ \pi^{-1/2} \sin mx,\ m = 1, \dots, j - 1 \}$, corresponding to the odd functions with fundamental period $T = 2\pi/j$. These points were referred to in the introduction as a sequence of instability thresholds. In Theorem 4.1 we have given a global characterization of the non-trivial stationary solutions.

Remark 4.6. In the sine-Gordon representation, the effective potential $\varphi(x) = \int_0^x \psi(y)\, dy = x^2/(2\alpha)$ at $\gamma_{\alpha^{-1}}$ corresponds to the Debye–Hückel regime with Debye length $\alpha$. Although this regime is not reached for any $\beta > 0$, it is approached quite fast as $\beta = 4\pi\alpha$ tends to $0$. A numerical calculation is shown in Fig. 2. Note that at $\alpha = 1$ ($\beta = 4\pi$), $\hat{w}_1$ cannot be distinguished from $\alpha^{-1}$ (the numerical error is in the sixth decimal place).

Remark 4.7. The derivative of (4.6) with respect to $w_0$, computed from Eq. (4.4), ∂T 2αw0 = ∂w0 1 − αw0
2 (1 − αw) p2 /α
dp, 1 + 2 (1 − αw) p 2 /α
sign (w)
indicates that an estimate from below can be very delicate to obtain. Note that $\operatorname{sign}(w)$ changes along the orbit $\gamma_{w_0}$. This shows how remarkable Chicone's monotonicity result is for the problem at hand.
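Returning to the instability thresholds of Remark 4.5, the count of negative eigenvalues at $\psi \equiv 0$ can be tabulated in a few lines. The sketch below rests on an assumption of ours: consistently with the form of the linearized operator (5.3) at $\psi = 0$, we take $A = -\alpha\, d^2/dx^2 - 2$ acting on the odd modes $\sin mx$, so that its eigenvalues are $\lambda_m = \alpha m^2 - 2$, $m = 1, 2, \dots$. This explicit formula is our reading of (3.8), which is not reproduced in this section, and the function name and the cutoff `m_max` are hypothetical.

```python
def unstable_modes(alpha, m_max=1000):
    """Indices m in 1..m_max whose assumed eigenvalue alpha*m**2 - 2 is negative."""
    return [m for m in range(1, m_max + 1) if alpha * m * m - 2.0 < 0.0]


if __name__ == "__main__":
    # For 2/(k+1)^2 <= alpha < 2/k^2 one expects exactly k unstable modes.
    for alpha in (2.5, 2.0, 1.0, 0.5, 0.3):
        print(alpha, unstable_modes(alpha))
```

Under this assumption a new mode becomes unstable each time $\alpha$ decreases through $\alpha_j = 2/j^2$, in agreement with the thresholds of Theorem 4.1.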
Fig. 2. Comparison between the initial value function for the periodic orbit of period $2\pi$, $\hat{w}_1 = \hat{w}_1(\alpha)$, and for the nonperiodic Debye–Hückel orbit, $w_{DH}(\alpha) = \alpha^{-1}$ (horizontal axis: $\alpha$; vertical axis: $w_0$; curves labeled $\gamma_{\alpha^{-1}}$, nonperiodic orbit, and $\gamma_{\hat{w}_1}$, periodic orbit)
5. Stability

Let $z(t; z_0)$ denote the solution of the initial value problem (3.6) and (3.7). It follows from the analysis in Sect. 3 that
$$S(t) z_0 = z(t; z_0) \tag{5.1}$$
defines a dynamical system on a closed subset $V \subset D(A)$ of $\mathcal{B}^{1/2}$ with the topology induced by the graph norm $\|\cdot\|_{1/2}$. Note that $z(t; z_0)$ is continuous in both $t$ and $z_0$ with $z(0; z_0) = z_0$ and satisfies the (nonlinear) semi-group property
$$S(t + \tau) z_0 = z(t; z(\tau; z_0)) = S(t) S(\tau) z_0.$$
This section is devoted to the stability analysis of the equilibrium solutions described in Sect. 4. By local stability it is meant that $z(t; z_0)$ is uniformly continuous in $V$ for all $t \ge 0$: given $\varepsilon > 0$, $\|z(t; z_0) - z(t; z_1)\|_{1/2} < \varepsilon$ for all $t \ge 0$ and $z_1 \in V$ such that $\|z_1 - z_0\|_{1/2} < \delta$ for some $\delta = \delta(\varepsilon) > 0$. It is uniformly asymptotically stable if, in addition,
$$\lim_{t \to \infty} \|z(t; z_0) - z(t; z_1)\|_{1/2} = 0.$$
The Liapunov (global) stability analysis as developed by LaSalle and applied to semilinear parabolic differential equations by Chafee and Infante [CI] (see also [H]) will also be discussed and extended in this section. Let us begin with the local analysis.

Theorem 5.1 (Local Stability). There exists a neighborhood $U \subset \mathcal{B}^{1/2}$ of the origin such that, if $\alpha > 2$ and $z_0 \in U$, then $\psi_0 \equiv 0$ is stable, i.e., $\lim_{t \to \infty} \|z(t; z_0)\|_{1/2} = 0$. If $\alpha < 2$ is such that $2/(k+1)^2 \le \alpha < 2/k^2$ holds, then among all equilibrium solutions of (4.1), $\psi_0, \psi_j^{\pm}$, $j = 1, \dots, k$, the solutions $\psi_1^{\pm}$ are the only asymptotically stable ones. That is, there exists $\rho > 0$ such that if $\|z_0 - \psi\|_{1/2} \le \rho$, then $\lim_{t \to \infty} \|z(t; z_0) - \psi\|_{1/2} = 0$ for $\psi = \psi_1^{\pm}$ and, for any sequence $\{z_n\}_{n \ge 1}$ with $\lim_{n \to \infty} \|z_n - \psi\| = 0$, we have $\sup_{t > 0} \|z(t; z_n) - \psi\|_{1/2} \ge \varepsilon > 0$ for all $n$ and $\psi = \psi_j^{\pm}$, $j \ne 1$.
It is convenient to consider the equation
$$\frac{d\zeta}{dt} + L\zeta = F(\zeta) \tag{5.2}$$
for $\zeta = z - \psi$, where $\psi$ is a solution of (4.1). Here
$$L\zeta = L[\psi]\, \zeta = -\alpha \zeta'' + 2\alpha \psi \zeta' - 2 \left( 1 - \alpha \psi' \right) \zeta \tag{5.3}$$
is the linearization of the differential operator (3.1) around $\psi$, and $F$ is as in (3.7). Note that $L = A$ and (5.2) reduces to (3.6) if $\psi = \psi_0 = 0$.

Proof. The proof of Theorem 5.1 follows from the next two theorems.
Theorem 5.2. If the spectrum $\sigma(L)$ of (5.3) lies in $\{ \lambda \in \mathbb{R} : \lambda \ge c \}$ for some $c > 0$, then $\zeta = 0$ is the unique uniformly asymptotically stable solution of (5.2). On the other hand, if $\sigma(L) \cap \{ \lambda \in \mathbb{R} : \lambda < 0 \} \ne \emptyset$, then $\zeta = 0$ is unstable.

Theorem 5.3. Let $L = L[\psi]$ be given by (5.3). Then $\sigma(L) > 0$ whenever $\psi = \psi_0$ and $\alpha > 2$, or $\psi = \psi_1^{\pm}$ and $\alpha < 2$. If $\alpha$ is such that $2/(k+1)^2 \le \alpha < 2/k^2$ holds for some $k \in \mathbb{N}_+$, then $\sigma(L) \cap \{ \lambda \in \mathbb{R} : \lambda < 0 \} \ne \emptyset$ for $\psi = \psi_0$ and $\psi = \psi_j^{\pm}$, $j = 2, \dots, k$.

Proof of Theorem 5.2. We shall prove only the first part of Theorem 5.2 and refer to Theorem 5.1.3 of Henry's book [H] for the instability part. It follows from (3.13), (3.10), (3.15) and the hypothesis on $\sigma(L)$ that
$$\|\zeta(t)\|_{1/2} \le C_{1/2}\, e^{-ct}\, \|\zeta_0\|_{1/2} + \xi \int_0^t (t - s)^{-1/2} e^{-c(t-s)}\, \|\zeta(s)\|_{1/2}^2\, ds, \tag{5.4}$$
with $c > 0$, $C_{1/2} = 1/\sqrt{2e}$ and $\xi = 2\sqrt{2\pi}\, \alpha$. Let us assume that $\|\zeta(s)\|_{1/2} \le \rho$ on an interval $(0, t)$ for some $\rho$ satisfying
$$\xi \int_0^{\infty} t^{-1/2} e^{-ct}\, dt = \xi \sqrt{\frac{\pi}{c}} < \frac{1}{2\rho}, \tag{5.5}$$
i.e., $\rho < \dfrac{1}{4\pi\alpha} \sqrt{\dfrac{c}{2}}$. If $\|\zeta_0\|_{1/2} \le \rho \sqrt{e/2}$, then Eq. (5.4) can be bounded as
$$\|\zeta(t)\|_{1/2} \le \frac{\rho}{2} + \rho^2 \xi \int_0^t (t - s)^{-1/2} e^{-c(t-s)}\, ds < \rho, \tag{5.6}$$
and this implies the existence of a unique solution of (5.2) with $\|\zeta(t)\|_{1/2} \le \rho$ for all $t > 0$. Note that $\|\zeta_0\|_{1/2} < \rho$, and if $t_1$ is the maximum value for which $\|\zeta(t)\|_{1/2} < \rho$ for all $0 < t < t_1$, then either $\|\zeta(t_1)\|_{1/2} = \rho$ or $t_1 = \infty$. But the first case is impossible by (5.6). Going back to (5.4), using $\|\zeta(s)\|_{1/2} < \rho$ and a slight modification of the Gronwall inequality (Lemma 3.3) with
$$E_{1/2}(t) = \sum_{n=0}^{\infty} \frac{\left( \rho \xi \sqrt{\pi}\, t^{1/2} \right)^n}{\Gamma(n/2 + 1)},$$
we have
$$\begin{aligned} \|\zeta(t)\|_{1/2} &\le C_{1/2}\, \|\zeta_0\|_{1/2}\, E_{1/2}(t)\, e^{-ct} \\ &\le C_{1/2}\, \|\zeta_0\|_{1/2} \left( 1 + \rho \xi\, t^{1/2} \right) e^{-\left( c - \rho^2 \xi^2 \pi \right) t} \\ &\le \frac{1}{\sqrt{2e}}\, \|\zeta_0\|_{1/2} \left( 1 + \frac{1}{2} \sqrt{\frac{ct}{\pi}} \right) e^{-3ct/4}, \end{aligned}$$
in view of (5.5). This proves the stability statement of Theorem 5.2, since (5.2) defines a dynamical system in a closed subset $V_{\rho} = \{ \zeta \in \mathcal{B}^{1/2} : \|\zeta\|_{1/2} \le \rho \}$ with $\lim_{t \to \infty} \|\zeta(t)\|_{1/2} = 0$ if $\|\zeta_0\|_{1/2} = \|z_0 - \psi\|_{1/2} \le \rho \sqrt{e/2}$.

Remark 5.4. One can actually show that if $c = \inf_{\lambda \in \sigma(L)} \lambda$, then $\zeta(t; \zeta_0) = z(t; z_0) - \psi$ decays exponentially fast to $0$ as
$$\zeta(t; \zeta_0) = \kappa(\zeta_0)\, e^{-ct} + \varepsilon(t; \zeta_0),$$
where $\|\varepsilon(t; \zeta_0)\|_{1/2} \le C\, \|\zeta_0\|_{1/2}\, e^{-c' t}$ with $0 < c < c'$ and $\kappa : V_{\rho} \to N(L - cI)$ is continuous and such that $\kappa(0) = 0$, where $N(L - cI)$ is the one-dimensional span of the eigenfunction of $L$ associated with $c$.

Proof of Theorem 5.3. Since $L[\psi_0] = A$, Theorem 5.3 for $\psi = \psi_0$ with $\alpha \ge 0$ follows from the spectral computation in (3.8). Now, let $\psi$ be a nontrivial solution of the equilibrium equation (4.1) and note that $\psi(0) = \psi(\pi) = 0$ by parity. According to Theorem 5.2, $\psi$ is asymptotically stable if $\sigma(L) > 0$ and unstable if $\sigma(L) \cap \{ \lambda < 0 \} \ne \emptyset$. Let $\varphi$ be the solution of
$$L[\psi]\, \varphi = 0 \tag{5.7}$$
in the domain $0 < x < \pi$ satisfying
$$\varphi(0) = 0 \qquad \text{and} \qquad \varphi'(0) = 1. \tag{5.8}$$
As in [H], we shall use the comparison theorem to establish that $\psi$ is asymptotically stable if $\varphi(x) > 0$ on $0 < x \le \pi$ and unstable if $\varphi(x) < 0$ somewhere in $0 < x < \pi$. To apply the comparison theorem and complete the proof of Theorem 5.3, let
$$p(x) := e^{-2 \int_0^x \psi(y)\, dy} \tag{5.9}$$
be the weight which makes $L$ a self-adjoint operator:
$$p\, L[\psi]\, \zeta = -\alpha \left( p\, \zeta' \right)' - 2p \left( 1 - \alpha \psi' \right) \zeta. \tag{5.10}$$
Note that $(L\zeta, \eta)_p = (\zeta, L\eta)_p$ for any odd periodic functions $\zeta$ and $\eta$ of period $2\pi$, where $(f, g)_p := \int_{-\pi}^{\pi} f(x)\, g(x)\, p(x)\, dx$.
Theorem 5.5 (Comparison). Suppose $\zeta_1$ and $\zeta_2$ are two real solutions on the domain $(0, \pi)$ of
$$p\, L[\psi]\, \zeta_i = f_i, \qquad i = 1, 2, \tag{5.11}$$
respectively, with $\zeta_1(0) = \zeta_1(\pi) = 0$, $\zeta_1'(0) > 0$ and $\zeta_2(0) = 0$, $\zeta_2'(0) > 0$. If $\zeta_1 > 0$ and $f_i = f_i(\zeta; x)$ is such that $f_2 > f_1$ on $(0, \pi)$, then $\zeta_2$ must vanish at some point of this domain.
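Proof. Let us assume that $\zeta_2 > 0$ on $(0, \pi)$. Then, from (5.10) and the hypotheses of Theorem 5.5, we have
$$\begin{aligned} 2 \int_0^{\pi} (f_2 - f_1)\, dx &= (\zeta_1, L\zeta_2)_p - (L\zeta_1, \zeta_2)_p \\ &= 2\alpha \int_0^{\pi} \left[ \left( p\, \zeta_1' \right)' \zeta_2 - \zeta_1 \left( p\, \zeta_2' \right)' \right] dx \\ &= 2\alpha \int_0^{\pi} \left[ p \left( \zeta_1' \zeta_2 - \zeta_1 \zeta_2' \right) \right]' dx, \end{aligned}$$
which, in view of the boundary conditions and (5.11), implies a contradiction: $p(\pi)\, \zeta_1'(\pi)\, \zeta_2(\pi) > 0$. Note that $\zeta_1'(\pi) < 0$ since $\zeta_1 > 0$ on $(0, \pi)$ and $\zeta_1(\pi) = 0$. So, there must exist $x \in (0, \pi)$ such that $\zeta_2(x) = 0$.

If we consider the eigenvalue equation
$$L[\psi]\, \theta = \lambda \theta \tag{5.12}$$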
on $(0, \pi)$ for the smallest eigenvalue $\lambda$ in the space of odd periodic functions, $\theta$ satisfies the conditions of $\zeta_1$ in Theorem 5.5 with $f_1 = \lambda p\, \zeta$. Note that the eigenfunction associated with the smallest eigenvalue may be chosen to be positive in the domain $(0, \pi)$. Applying Theorem 5.5 to (5.7) and (5.12), we arrive at the following stability criterion:

Criterium 5.6. The smallest eigenvalue $\lambda$ of $L[\psi]$ is positive if $\varphi > 0$ on $(0, \pi)$ and negative if there exists $x \in (0, \pi)$ such that $\varphi(x) = 0$, where $\varphi$ is the solution of Eqs. (5.7) and (5.8).

Now, for a given non-trivial stationary solution $\psi$ let
$$\chi = c \left( -\alpha \psi'' + 4\psi \right), \tag{5.13}$$
where $c > 0$ is chosen so that $\chi'(0) = 1$. It follows from the equation $-\alpha \psi'' = 2 \left( 1 - \alpha \psi' \right) \psi$ (see (4.1)) that
$$\chi(0) = 0 \qquad \text{and} \qquad \chi > 0$$
whenever $\psi > 0$ (recall $\psi(0) = 0$ and $1 - \alpha \psi' > 0$ for all closed orbits). Moreover, we have

Proposition 5.7.
$$L[\psi]\, \chi = 8 c \alpha^2 \left( \psi' \right)^2 \psi > 0 \tag{5.14}$$
on the same domain $(0, \bar{x})$ where $\psi > 0$.
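Proof. Differentiating (4.1) twice,
$$-\alpha \psi'''' = -2\alpha \psi \psi''' + 2 \left( 1 - 3\alpha \psi' \right) \psi'',$$
and using (4.1) again, gives
$$\begin{aligned} L[\psi]\, \psi'' &= -\alpha \psi'''' + 2\alpha \psi \psi''' - 2 \left( 1 - \alpha \psi' \right) \psi'' = -4\alpha \psi' \psi'' \\ &= 8 \left( 1 - \alpha \psi' \right) \psi' \psi. \end{aligned}$$
In addition, we have
$$L[\psi]\, \psi = -\alpha \psi'' + 2\alpha \psi \psi' - 2 \left( 1 - \alpha \psi' \right) \psi = 2\alpha \psi \psi',$$
which, combined with the above equation, gives the equality in Proposition 5.7.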
Completion of the proof of Theorem 5.3. We are now in a position to prove Theorem 5.3 for non-trivial equilibrium solutions. Let $\chi$ be given by (5.13) with $\psi = \psi_1^+$. Then $\chi > 0$ on $(0, \pi)$ and Theorem 5.5 can be used to compare Eq. (5.14) with (5.7). This yields $\varphi > \chi \ge 0$ on $(0, \pi]$, which implies the stability of $\psi_1^+$ by Criterium 5.6.

For instability, we observe that $\psi'$ satisfies
$$\begin{aligned} L[\psi]\, \psi' &= -\alpha \psi''' + 2\alpha \psi \psi'' - 2 \left( 1 - \alpha \psi' \right) \psi' \\ &= \left( -\alpha \psi'' + 2\alpha \psi \psi' - 2\psi \right)' = 0, \end{aligned}$$
in view of Eq. (4.1). Recall that $\psi = \psi_j^+$ with $j \ge 2$ has fundamental period $2\pi/j$ and satisfies $\psi(\pi/j) = \psi''(\pi/j) = 0$ by the odd parity and Eq. (4.1). Since $\psi'(0) > 0$, this implies $\psi < 0$ on $(\pi/j, 2\pi/j)$, and the minimum of $\psi$ is attained at $\bar{x} = \dfrac{3\pi}{2j}$. Since $\psi'$ and $\varphi$ satisfy the same self-adjoint equation $p\, L[\psi]\, \zeta = 0$, their Wronskian
$$W\left( \varphi, \psi'; x \right) = \begin{vmatrix} \varphi & \psi' \\ -\alpha p\, \varphi' & -\alpha p\, \psi'' \end{vmatrix} = \alpha p \left( \varphi' \psi' - \varphi \psi'' \right) = \alpha \psi'(0) > 0$$
is a non-vanishing constant (recall $p(0) = 1$, $\varphi(0) = 0$ and $(\psi_j^+)'(0) > 0$). As a consequence,
$$W\left( \varphi, \psi'; \bar{x} \right) = -\alpha\, p(\bar{x})\, \varphi(\bar{x})\, \psi''(\bar{x}) > 0$$
implies $\varphi(\bar{x}) < 0$ because $\psi''(\bar{x}) > 0$. It thus follows from the stability criterion that $\psi_j^+$, $j = 2, \dots, k$, are unstable, since $\bar{x} \in (0, \pi)$ provided $j \ge 2$ and there exists $x \in (0, \pi)$, $x < \bar{x}$, such that $\varphi(x) = 0$. By a slight modification of these arguments, one may conclude the stability of $\psi_1^-$ and the instability of $\psi_j^-$, $j = 2, \dots, k$, as well. This concludes the proof of Theorem 5.3 and, consequently, the proof of Theorem 5.1.

Now, we turn to the Liapunov stability analysis with a proof of global stability of the trivial solution $\varphi_0 \equiv 0$.
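Let $V$ be a real-valued functional on the subspace of absolutely continuous functions of $D(A)$ given by
$$V(v) = \int_{-\pi}^{\pi} \left[ \left( \alpha^{-1} - v' \right) \ln \left( 1 - \alpha v' \right) + v' - v^2 \right] dx \tag{5.15}$$
and notice that $V(0) = 0$ and $V(\eta) = W(\eta) + o\left( \|\eta\|^2 \right)$ as $\eta \to 0$, where
$$W(v) = \frac{1}{2} \int_{-\pi}^{\pi} \left( \alpha\, v'^2 - 2 v^2 \right) dx,$$
by Taylor expanding $g(w) = \left( \alpha^{-1} - w \right) \ln (1 - \alpha w) + w$ around $w = 0$. Observe that $W(v) = (1/2)\, \|v\|_{1/2}^2$ if $\alpha > 2$ and, since $g(w) - (\alpha/2)\, w^2 \ge 0$ if $\alpha w < 1$, $V(v) \le W(v)$ holds on the space
$$\mathcal{V} = \left\{ v \in H^1_{o,p} \cap H^2_{o,p} : \alpha v' < 1 \right\}$$
of odd, positive and $2\pi$-periodic functions with distributional derivatives up to second order. A Liapunov function $V$ of a dynamical system $\{ S(t),\ t \ge 0 \}$ satisfies
$$\dot{V}(v) = \lim_{t \downarrow 0} \frac{1}{t} \left( V(S(t) v) - V(v) \right) \le 0 \tag{5.16}$$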
for all $v \in \mathcal{V}$. We now show that (5.16) holds if $S(t)$ is given by Eq. (3.1). More precisely,

Proposition 5.8. Let $S(t) v_0 = v(t; v_0)$ be the dynamical system in $\mathcal{V}$ given by (5.1). Then, the pair of functions
$$\rho(w) = \frac{1}{1 - \alpha w} \tag{5.17}$$
and
$$D(p, w) = \left( \alpha^{-1} - w \right) \ln (1 - \alpha w) + w - p^2 \tag{5.18}$$
generate the Liapunov function given by (5.15):
$$V(v) = \int_0^{\pi} D(v, v_x)\, dx \qquad \text{with} \qquad \dot{V}(v) = -\int_0^{\pi} \rho(v_x)\, v_t^2\, dx. \tag{5.19}$$
Proof. Note that, from the parity of $v$, the integral in (5.15) can be taken over $[0, \pi]$. By the calculus of variations and Eqs. (5.17) and (5.18), we have
$$\begin{aligned} \dot{V}(v(t, \cdot)) &= -\int_0^{\pi} \left( \frac{d}{dx} \frac{\partial D}{\partial v_x} - \frac{\partial D}{\partial v} \right) v_t\, dx + \left. \frac{\partial D}{\partial v_x}\, v_t \right|_0^{\pi} \\ &= -\int_0^{\pi} \left( -\frac{d}{dx} \ln (1 - \alpha v_x) + 2v \right) v_t\, dx \\ &= -\int_0^{\pi} \rho(v_x) \left( \alpha v_{xx} - 2\alpha v v_x + 2v \right) v_t\, dx, \end{aligned} \tag{5.20}$$
where $v_t(t, 0) = v_t(t, \pi) = 0$, $t \ge 0$, in view of the boundary conditions on $\mathcal{V}$. Since $\rho(w) \ge 0$ for $\alpha w < 1$, this, together with (3.1), concludes the proof of Proposition 5.8.
Remark 5.9. We have used the construction method based on the Euler–Lagrange equation to find this Liapunov function (see e.g. Chap. 2 of Zelenyak, Lavrentiev and Vishnevskii [ZLV]). A sufficient condition for (5.19) to hold leads to a first order partial differential equation for $\rho$,
$$w \rho_p - \frac{2}{\alpha} (1 - \alpha w)\, p\, \rho_w = -2 p \rho,$$
whose characteristics are given by the orbits $\gamma_{w_0}$ described in Sect. 4 in the study of the equilibrium solutions of (3.1). Note that Eq. (5.18) is the Lagrangian associated with the Hamiltonian (4.13) (with $q$ defined by (4.11)). Due to the requirement $\alpha w < 1$, our particular solution takes into account only the closed orbits. There may be other suitable choices which include all orbits.

The proof of global stability of $\varphi_0$ requires that a subspace of $\mathcal{V}$ be invariant under the flow equation (3.1). This is shown in the following by using the maximum principle.

Theorem 5.10. If $v(t, x)$ is a classical solution of Eq. (3.1) with initial condition $v(0, x) = v_0(x) \in \mathcal{V}$, then $\alpha v_x(t, x) < 1$ and
$$\alpha^{-1} (x - \pi) < v(t, x) < \alpha^{-1} x \tag{5.21}$$
hold for all t ≥ 0 and 0 ≤ x ≤ π . Proof. Denoting L[v] := F (vxx , vx , v) − vt ,
(5.22)
where F (a1 , a2 , a3 ) = α (a1 − 2a2 a3 ) + 2a3 is a continuous and differentiable function of its variables, the differential equation (3.1) can be written as L[v] = 0.
(5.23)
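As a quick worked instance of (5.22), added here for illustration only, consider a time-independent linear profile $v(x) = c\,(x - x_0)$; the constants $c$ and $x_0$ are symbols introduced for this example. Then $v_{xx} = 0$, $v_x = c$ and $v_t = 0$, so that

```latex
\[
L[v] = F\bigl(0,\; c,\; c\,(x - x_0)\bigr)
     = \alpha\bigl(0 - 2c \cdot c\,(x - x_0)\bigr) + 2c\,(x - x_0)
     = 2c\,(1 - \alpha c)\,(x - x_0).
\]
```

The sign of $L[v]$ for such a profile is therefore fixed by the signs of $c$, $1 - \alpha c$ and $x - x_0$, which is the kind of computation needed when checking differential inequalities for linear comparison functions.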
For $v$ satisfying (5.23) with $v(t,0) = v(t,\pi) = 0$, $0 \le t \le \tau$, and initial data $v(0,\cdot) = v_0$, let us suppose $z = z(t,x)$ and $Z = Z(t,x)$ are such that
\[ L[Z] \le 0 \le L[z] \tag{5.24} \]
for all $(t,x)$ in $D = (0,\tau)\times(0,\pi)$, with $z(t,y) \le 0 \le Z(t,y)$ for $y = 0, \pi$ and $0 \le t \le \tau$, and $z(0,x) \le v_0(x) \le Z(0,x)$ for $0 \le x \le \pi$. Then, by the maximum principle (see [PW], Theorem 12 in Chap. 3),
\[ z(t,x) \le v(t,x) \le Z(t,x) \tag{5.25} \]
in $\bar D = [0,\tau]\times[0,\pi]$. The lower limit function $z$ is given by
\[ z(x) = \theta\,(x-\pi), \tag{5.26} \]
with $\theta \ge 0$. From (5.23), $L[z] = 2\theta(\alpha\theta - 1)(\pi - x)$ is nonnegative provided $\alpha\theta \ge 1$. Analogously, the upper limit function $Z$ is given by
\[ Z(x) = \delta x, \tag{5.27} \]
from which $L[Z] = -2\delta(\alpha\delta - 1)x$ is nonpositive provided $\alpha\delta \ge 1$. Since (5.24) holds uniformly in $\tau$, Eq. (5.25) holds for all $(t,x)$ in $\mathbb R_+ \times [0,\pi]$.

Note that $v$ remains bounded irrespective of $\alpha v_x < 1$. However, if this condition holds for $t = 0$, it remains valid for all $t > 0$. To see this, observe from the equation $v_t = v_{xx} + (1-\alpha v_x)v$ with $v_{xx} = 0$ that the rate at which $|v|$ increases tends to zero when the inequality saturates. The inclusion of the Laplacian only smooths $v$ and prevents, even more, $v_x$ from increasing beyond the threshold. The same argument justifies the strict inequality (5.21). This concludes the proof of Theorem 5.10.

We pause to discuss some properties of the classical solutions of Eqs. (1.1) and (3.3). Recall that $u(t,x) = \int_0^x v(t,y)\,dy$ with $v$ satisfying (3.1).

Remark 5.11. Note that the cone $\mathcal C = \{u \in H^1_{e,p}\cap H^2_{e,p} : u \ge 0,\ \alpha u_{xx} < 1\}$ is invariant under the unnormalized evolution (1.1). For this, let $M[u] := \alpha(u_{xx} - u_x^2) + 2u$. If $u(t,x)$ is a classical solution of (1.1) with initial value $u_0 \in \mathcal C$, since $M[u] = 0$ for $u \equiv 0$, we have by Theorem 7 in Chap. 3 of [PW] (see also Remark (ii) after it) that $u(t,x) \ge 0$ for all $t > 0$. This, however, does not imply that $u(t,x)$ remains positive (recall (3.5)). A proof of this assertion goes as follows. Theorem 5.10 implies that $u(t,x)$ remains bounded, and $u_{xx}(t,0)$ bounded from above, if $u$ satisfies (3.3) with initial condition $u_0$ satisfying $\alpha^{-1}[(x-\pi)^2/2 - \pi^2/2] < u_0 < \alpha^{-1}x^2/2$ (by integrating (5.21)). The comparison principle applied directly to Eq. (3.3) leads to (5.24) with $L$ replaced by $M$ and $0$ replaced by $\alpha u_{xx}(t,0)$. Lower and upper solutions, $z$ and $Z$, can be obtained from the solution of the equilibrium initial value problem (4.2):
\[ D^{\pm}(\alpha, w_0; x) = \int_0^x S^{\pm}(\alpha, w_0; y)\,dy, \]
where $S^{\pm}(\alpha,w_0;x)$ is the $p$-component of the closed orbit $\gamma_{\pm w_0}$ starting at $(\pm w_0, 0)$. Note that $D^{\pm}$ is an even periodic function of period $T = T(\alpha,w_0)$ given by a monotonically increasing function of both $w_0$ and $\alpha$, with $T \to \infty$ as $\alpha w_0 \uparrow 1$ for $D^+$ and as $w_0 \to \infty$ for $D^-$ (see the proof of Theorem 4.1 for details). $D^{+}$ ($D^{-}$) is also a monotone increasing (decreasing) function of $x$ in $[0,T/2]$ and satisfies
\[ M[D^{\pm}] = \pm\alpha w_0, \]
with $D^{\pm}(0) = (D^{\pm})'(0) = (D^{\pm})'(T/2) = 0$ and $(D^{\pm})''(0) = \pm w_0$. The lower limit function $z$ is given by
\[ z(x) = \theta\left(D^{-}(\alpha, w_0; x + \tilde x) + \frac{\alpha}{2}\,\tilde w_0\right), \tag{5.28} \]
where $\theta < 1$, $\tilde w_0 \ge w_0$, and $\tilde x = \tilde x(\alpha, w_0, \tilde w_0)$ is such that $z(0) = 0$, with $w_0$ and $\tilde w_0$ chosen so that $T$ is very large and $z'(\pi) = 0$, which can always be done in view of the properties of $D^-$. The upper limit function $Z$ can also be written as (5.28) with $D^-$ replaced by $D^+$ and the second term with a minus sign. We have
\[ M[W^{\pm}] = \alpha\theta\left\{(1-\theta)\bigl((D^{\pm})'\bigr)^{2} \mp (\tilde w_0 - w_0)\right\}, \]
with $W^+ = Z$ and $W^- = z$. In order that inequality (5.25) holds uniformly in $\tau$, $u_{xx}(t,0)$ has to remain bounded from above and below. Since $u_{xx}(t,0) < \alpha^{-1}$ by Theorem 5.10, one may choose $\theta$ arbitrarily small in (5.28) and take $\tilde w_0$ and $w_0$ so large that $\theta(\tilde w_0 - w_0) > \alpha^{-1}$. In the limit as $\theta \to 0$ we have $u(t,x) \ge 0$ and $0 \le u_{xx}(t,0) < \alpha^{-1}$ for $0 \le t \le \tau$, uniformly in $\tau$, implying $u(t,x) \ge 0$ for all $t \ge 0$.

LaSalle's invariance principle allows us to apply Liapunov function techniques under milder assumptions. A subset $K \subset \mathcal V$ of a complete metric space $\mathcal V$ is said to be invariant (positive invariant) if, for any $v_0 \in K$, there exists a continuous curve $v : \mathbb R \to K$ with $v(0) = v_0$ and $S(t)v(\tau) = v(t+\tau)$ for all $t \ge 0$ and $\tau \in \mathbb R$ ($\mathbb R_+$). The following two theorems express the content of this principle.

Theorem 5.12. Suppose $v_0 \in \mathcal V$ is such that the orbit $\gamma(v_0) = \{S(t)v_0,\ t \ge 0\}$ through $v_0$ lies in a compact set in $\mathcal V$ and let $\omega(v_0)$ denote its $\omega$-limit set, i.e.,
\[ \omega(v_0) = \bigcap_{\tau \ge 0} \overline{\gamma^{+}(S(\tau)v_0)} \]
(see (4.5) for an alternative definition). Then $\omega(v_0)$ is nonempty, compact, invariant, connected and $\operatorname{dist}(S(t)v_0, \omega(v_0)) \to 0$ as $t \to \infty$.

Proof. We refer to Theorem 4.3.3 of [H] for details. Note that $\omega(v_0)$ is the intersection of a decreasing collection of nonempty compact sets. Note, in addition, that $\omega(v_0)$ is positive invariant by definition and is invariant by a compactness argument.

Theorem 5.13. Let $V$ be a Liapunov function for $t \ge 0$ and, for
\[ E := \{v \in \mathcal V : \dot V(v) = 0\}, \tag{5.29} \]
let $K$ be the maximal invariant set in $E$. If the orbit $\gamma(v_0)$ lies in a compact set in $\mathcal V$, then $S(t)v_0 \to K$ as $t \to \infty$.

Proof. By definition, $V(S(t)v_0)$ is a nonincreasing function of $t$ and bounded from below, by hypothesis. So $\lim_{t\to\infty} V(S(t)v_0) = \upsilon$ exists. If $y \in \omega(v_0)$, then $V(y) = \upsilon$ and, in view of the invariance of $\omega(v_0)$, we have $V(S(t)y) = \upsilon$ for all $t \ge 0$, which implies $\dot V(y) = 0$ and $\omega(v_0) \subset K$.
Now, we apply the invariance principle to the problem at hand. As we will see, if $B_0$ is a sufficiently large ball around $\phi_0 = 0$ in the cone $\mathcal C$ (with the induced topology of $H^1_{e,p}$), the invariant set $K_k = \{\omega(u_0),\ u_0 \in B_0\} \subset E$ consists of the union of unstable manifolds for the equilibrium points $\phi_0, \phi_1, \dots, \phi_k$, with $\phi_j(x) = \int_0^x \psi_j^+(y)\,dy$, provided $\alpha$ is such that $2/(k+1)^2 \le \alpha < 2/k^2$ holds for some $k \in \mathbb N$. Note that the hypotheses of Theorems 5.12 and 5.13 hold since the orbits of $S(t)v_0$ are bounded in $H^1_{o,p}$ by Theorem 5.10 and remain in a compact set of $H^1_{o,p}$ in view of Theorem 3.2. For this the Sobolev embedding theorem is invoked: $W^{2,2}(-\pi,\pi) \subset C^{1+a}(-\pi,\pi)$ with continuous inclusion, so $v$ has a continuous representative in $C^{1+a}_{o,p}$ which belongs to $C^{2+a}_{o,p}$ by Schauder estimates (see e.g. [S] and references therein). Therefore, any solution $u(t,x) = \int_0^x v(t,y)\,dy$ of (3.3) in $\mathcal C$ has a continuously three-times differentiable representative. We thus have

Theorem 5.14. If $\alpha > 2$, $\phi_0 = 0$ is a globally asymptotically stable solution of (3.3) in $\mathcal C = \{u \in H^1_{e,p}\cap H^2_{e,p} : u(0) = 0,\ u \ge 0 \text{ and } \alpha u_{xx} < 1\}$. If $\alpha < 2$, the origin is unstable in $\mathcal C$ and there exists an open dense set $\mathcal U \subset \mathcal C$ of initial conditions such that $\lim_{t\to\infty} u(t; u_0) = \phi_1^+$ for all $u_0 \in \mathcal U$.
Proof. It follows from Theorem 5.13 that $v(t;\cdot) \to \omega(v(0;\cdot)) \subset \{\psi : \dot V(\psi) = 0\}$ in $\mathcal V$ as $t \to \infty$. But, from (5.20), $\dot V(\psi) = 0$ iff
\[ \alpha\psi'' - 2\alpha\,\psi\psi' + 2\psi = 0, \tag{5.30} \]
whose solutions are $\psi = \psi_0$ and $\psi_j^+$, $j = 1, \dots, k$, studied in Sect. 4. We note that $\phi_j^+(x) = \int_0^x \psi_j^+(y)\,dy \ge 0$ ($\phi_j^-(x) \le 0$) for all $x \in [-\pi,\pi]$ and $j \ge 1$, since $(\psi_j^+)'(0) > 0$ ($(\psi_j^-)'(0) < 0$). Multiplying (5.30) by $\psi$ and integrating over $(-\pi,\pi)$ gives
\[ \int_{-\pi}^{\pi} \left(\alpha\psi'' + 2\psi\right)\psi\,dx = -\|\psi\|_{1/2}^{2} \le 0 \]
if $\alpha > 2$. The nonlinear term vanishes since, by integration by parts and periodicity,
\[ \int_{-\pi}^{\pi} \psi'\psi^{2}\,dx = -2\int_{-\pi}^{\pi} \psi'\psi^{2}\,dx . \]
This implies $\psi \equiv 0$ and proves that $S(t)v_0 \to 0$ as $t \to \infty$ in $\mathcal V$.

We quote Theorem 4.3.5 in [H] for the instability assertion. Since the spectrum $\sigma(L)$ of the linearized operator around the equilibrium points (see Theorem 5.3) lies on the real line, all equilibrium points are hyperbolic, $E$ given in (5.29) is a discrete and finite set and
\[ \mathcal V = \bigcup_{\psi \in E} W^{s}(\psi) \]
holds with $W^{s}(\psi) = \{v_0 \in \mathcal V : S(t)v_0 \to \psi \text{ as } t \to \infty\}$. It is proven in [H] that each stable manifold $W^{s}(\psi)$ is a $C^{2}$ embedded submanifold of $\mathcal V$ ($W^{s}(\phi)$ is a $C^{3}$ submanifold of $\mathcal C$) and, if $\psi$ is locally unstable, then $W^{s}(\psi)$ has codimension larger than or equal to 1. Therefore, $\mathcal V$, and consequently $\mathcal C$, can be written as a finite union of open connected sets together with a closed nowhere-dense remainder.

Finally, we show that, for an open set $\mathcal V_0 \subset \mathcal V$ given as before, the maximal invariant set is
\[ K_k = \bigcup_{\psi \in E} W^{u}(\psi), \tag{5.31} \]
where $W^{u}(\psi) = \{v_0 \in \mathcal V : S(t)v_0 \to \psi \text{ as } t \to -\infty\}$ is the unstable manifold of $\psi$. By Theorem 5.13, the orbit $v(t; v_0) = S(t)v_0$ exists and remains, by invariance, in $K_k$ for all $t \in \mathbb R$. Therefore, $\lim_{t\to\infty} v(t; v_0) = \psi$ exists and $\psi \in K_k$, so $K_k \subset \bigcup_{\psi \in E} W^{u}(\psi)$. Since the converse is also true, the equality (5.31) thus holds.

Acknowledgements. We wish to thank J. Fernando Perez for posing to one of the authors (D.H.U.M.), together with C. Ragazzo, the problem of what is the effect of the sequence of thresholds on the stable branch. The authors have benefited from discussions with C. Ragazzo, W. F. Wreszinski and J. C. A. Barata.
References

[BR] Benfatto, G. and Renn, J.: Nontrivial fixed point and screening in the hierarchical two-dimensional Coulomb gas. J. Stat. Phys. 67, 957–980 (1992)
[BF] Brydges, D. and Federbush, P.: Debye screening. Commun. Math. Phys. 73, 197–246 (1980)
[BGN] Benfatto, G., Gallavotti, G. and Nicoló, F.: On the massive sine-Gordon equation in the first few regions of collapse. Commun. Math. Phys. 83, 387–410 (1982)
[C] Chicone, C.: The monotonicity of the period function for planar Hamiltonian vector fields. J. Diff. Eqns. 69, 310–321 (1987)
[CG] Coppel, W.A. and Gavrilov, L.: The period function of a Hamiltonian quadratic system. Diff. and Int. Eqns. 6, 1357–1365 (1993)
[CI] Chafee, N. and Infante, E.F.: A bifurcation problem for a nonlinear partial differential equation of parabolic type. J. Applicable Anal. 4, 17–37 (1974)
[CL] Coddington, E.A. and Levinson, N.: Theory of ordinary differential equations. New York: McGraw-Hill Book Company, 1955
[DH] Dimock, J. and Hurd, T.R.: Sine-Gordon revisited. Ann. Henri Poincaré 1, 499–541 (2000)
[F] Felder, G.: Renormalization group in the local potential approximation. Commun. Math. Phys. 111, 101–121 (1987)
[FLL] Fisher, M.E., Li, X.-J. and Levin, Y.: On the absence of intermediate phases in the two-dimensional Coulomb gas. J. Stat. Phys. 79, 1–11 (1995)
[FS] Fröhlich, J. and Spencer, T.: The Kosterlitz–Thouless transition in two-dimensional abelian spin systems and the Coulomb gas. Commun. Math. Phys. 81, 527–602 (1981)
[GN] Gallavotti, G. and Nicoló, F.: The "screening phase transitions" in the two-dimensional Coulomb gas. J. Stat. Phys. 39, 133–156 (1985)
[H] Henry, D.: Geometric theory of semilinear parabolic equations. Lecture Notes in Mathematics 840, Berlin–Heidelberg–New York: Springer-Verlag, 1981
[JKKN] José, J.V., Kadanoff, L.P., Kirkpatrick, S. and Nelson, D.R.: Renormalization, vortices, and symmetry-breaking perturbations in the two-dimensional planar model. Phys. Rev. B 16, 1217–1241 (1977)
[MK] Marchetti, D.H.U. and Klein, A.: Power-law falloff in two-dimensional Coulomb gases at inverse temperature β > 8π. J. Stat. Phys. 64, 135–162 (1991)
[MP] Marchetti, D.H.U. and Perez, J.F.: The Kosterlitz–Thouless phase transition in two-dimensional hierarchical Coulomb gases. J. Stat. Phys. 55, 141–156 (1989)
[PW] Protter, M.H. and Weinberger, H.F.: Maximum principles in differential equations. New Jersey: Prentice–Hall, Inc., 1964
[S] Sattinger, D.H.: Monotone methods in nonlinear elliptic and parabolic boundary value problems. Indiana Univ. Math. J. 21, 979–1000 (1972)
[Y] Yang, W.-S.: Debye screening for two-dimensional Coulomb systems at high temperatures. J. Stat. Phys. 49, 1–32 (1987)
[ZLV] Zelenyak, T.I., Lavrentiev, Jr. M.M. and Vishnevskii, M.P.: Qualitative theory of parabolic equations. Utrecht: VSP, 1997
Communicated by D. C. Brydges