Commun. Math. Phys. 189, 1 – 7 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997
On Rigidity of Analytic Black Holes
Piotr T. Chruściel*
Département de Mathématiques, Faculté des Sciences, Parc de Grandmont, F37200 Tours, France. E-mail: [email protected]
Received: 25 October 1996 / Accepted: 30 January 1997
Abstract: We establish global extendibility (to the domain of outer communication) of locally defined isometries of appropriately regular analytic black holes. This allows us to fill a gap in the Hawking–Ellis proof of black–hole rigidity.
1. Introduction

According to Hawking and Ellis [11, Prop. 9.3.6], under appropriate conditions, which include analyticity of all the objects under consideration, the event horizon of a stationary, say electro–vacuum, black hole space–time $(M, g)$ is necessarily a Killing horizon. More precisely, the isometry group of $(M, g)$ should contain an $\mathbb{R}$ subgroup, the orbits of which are normal to the black hole horizon. In order to substantiate their claim the authors of [11] first argue that for each $t$ the map defined as the translation by $t$ along the appropriately parameterized generators of the event horizon extends to an isometry $\phi_t$ in a neighborhood of the event horizon. Next they assert that for all $t$ one can analytically continue $\phi_t$ to the whole space–time, to obtain a globally defined one parameter group of isometries. This last claim is wrong, which^1 can be seen as follows:

* On leave of absence from the Institute of Mathematics, Polish Academy of Sciences, Warsaw. Supported in part by KBN grant # 2 P301 105 007.
^1 The construction that follows is a straightforward adaptation to the problem at hand of a construction in [8, Sect. 5].

Let $(M, g_{ab})$ be the extension of the exterior region of the Kerr space–time (with $a^2 < m^2$) consisting of “two type I regions and two type II regions”, as described in Sect. 5.6 of [11] (thus $(M, g_{ab})$ consists of the four uppermost blocks of Fig. 28(i), p. 165 in [11]). Let $\phi_t$ denote those isometries of $(M, g_{ab})$ which are time–translations in an asymptotic region $M_{\rm ext}$, and let $\langle\langle M_{\rm ext}\rangle\rangle$ denote the domain of outer communication of $(M, g_{ab})$ as determined by $M_{\rm ext}$ (cf. Eq. (2.1) below; $M_{\rm ext}$ corresponds to one of the blocks “I” of Fig. 28 of [11]). Let $\Sigma$ be any asymptotically flat Cauchy surface of $(M, g_{ab})$ (thus $\Sigma$ has two asymptotic regions), and let $E$ be any embedded two-sided three-dimensional submanifold of $\mathrm{int}\,D^+(\Sigma; M) \setminus \langle\langle M_{\rm ext}\rangle\rangle$, invariant under $\phi_t$. We shall moreover suppose that $M \setminus E$ is connected, and that $E$ is not invariant under the $U(1)$ factor of the isometry group of $(M, g_{ab})$. Let $(M_a, g_a)$, $a = 1, 2$, be two copies of $M \setminus E$ with the metric induced from $g$. As $E$ is two-sided, there exists an open neighborhood $O$ of $E$ such that $E$ separates $O$ into two disjoint open sets $O_a$, $a = 1, 2$, with $\overline{O}_1 \cap \overline{O}_2 = E$, $O_1 \cap O_2 = \emptyset$. Let $\psi_a$ denote the natural embedding of $O_a$ into $M_a$. Let $M_3$ be the disjoint union of $M_1$, $M_2$ and $O$, with the following identifications: a point $p \in O_a \subset O$ is identified with $\psi_a(p) \in M_a$. It is easily seen that $M_3$ so defined is a Hausdorff topological space. We can equip $M_3$ with the obvious real analytic manifold structure and an obvious metric $g_3$ coming from $(M_1, g_1)$, $(M_2, g_2)$ and $(O, g|_O)$. Note that $g_3$ is real analytic with respect to this structure. Let finally $(M_4, g_4)$ be any maximal^2 vacuum real analytic extension of $(M_3, g_3)$. Then $(M_4, g_4)$ is a maximal vacuum real analytic stably causal extension of $\langle\langle M_{\rm ext}\rangle\rangle$ which clearly is not isometric to $(M, g)$. The space–time $(M_4, g_4)$ satisfies all the hypotheses of [11].

It should be clear that all the orbits of the rotation group acting on $(M, g_{ab})$ meeting $\overline{E} \setminus E$ are incomplete in $(M_4, g_4)$, so that the connected component of the identity of the group of isometries drops down from $\mathbb{R} \times U(1)$ (in the case of $(M, g_{ab})$) to $\mathbb{R}$ (in the case of $(M_4, g_4)$). We shall not attempt to give a rigorous proof of such a claim in all generality; let us instead give a simple argument showing that at least some orbits cannot be complete whatever $(M_4, g_4)$, under a supplementary condition on $E$. Suppose, for instance, that for all $p \in E \subset M$ the standard “rotational” Killing vector, say $Z$, in $(M, g_{ab})$ (which is equal to $\partial/\partial\varphi$ in the coordinate system used in Eq. (5.31) of [11]) is transverse to $E$ at $p$.
(A possible example of such a set $E$ is $E = \{u_+ \in \mathbb{R},\ r \in (a, b),\ \theta \in (\pi/4, 3\pi/4),\ \varphi = 0\}$ in the coordinate system used in Eq. (5.31) of [11], where $a$ and $b$ are arbitrary constants satisfying $r_- < a < b < r_+$.) For any point $p \in E$ the orbit $\phi_t[Z](p)$ of the Killing vector $Z$ starts at $p$ at $t = 0$, enters, say, $O_1$, travels in $M$ to come back at $t = 2\pi$ from $O_2$. Let $\chi_a$ be the natural embedding of $M_a$ into $M_4$, choose a point $p \in E$, let $\epsilon$ be small enough so that $\phi_t[Z](p) \in O_1$ for $t \in [0, \epsilon)$, set $\hat p = \chi_2(\phi_\epsilon[Z](p))$ (here we have implicitly identified the point $\phi_\epsilon[Z](p)$ with the corresponding point in $M_2$), and consider the orbit $\phi_t[\hat Z](\hat p)$ of $\hat Z$ through $\hat p$, where $\hat Z$ is the Killing vector in $M_4$ which coincides with $Z$ on $M$. (If no such Killing vector $\hat Z$ exists there is nothing to prove.) By construction we have $\phi_{2\pi}[\hat Z](\hat p) \ne \hat p$, as the orbit $\phi_t[\hat Z](\hat p)$ leaves $M_2$ and enters $M_1$ when crossing $E \subset O$ from $O_2$ into $O_1$. Thus there exist orbits of $\hat Z$ which are not $2\pi$ periodic on $M_4$. Suppose, for contradiction, that all the orbits of $\hat Z$ are complete. Choose a domain of outer communication $\langle\langle M_{\rm ext}\rangle\rangle$ of $M_4$ diffeomorphic to a standard domain of outer communication in the Kerr space–time $(M, g_{ab})$; we have $\phi_{2\pi}[\hat Z](p) = p$ for all $p \in \langle\langle M_{\rm ext}\rangle\rangle$. This, by standard results on one parameter groups of isometries on connected sets, implies that $\phi_{2\pi}[\hat Z]$ is the identity on $M_4$ (and, hence, all the orbits of $\hat Z$ are $2\pi$ periodic), which leads to a contradiction, and establishes incompleteness of some Killing orbits on $M_4$.

^2 Cf., e.g. [4, Appendix C] for a proof of existence of space–times maximal with respect to some property. It should be pointed out that there is an error in that proof, as the relation $\prec$ defined there is not a partial order. This is however easily corrected by adding the requirement that the isometry $\Phi$ considered there restricted to some fixed three–dimensional hypersurface be the identity.

Topological games put aside, the method of proof suggested in [11] of analytically extending $\phi_t$ faces the problem that $\phi_t$ might potentially be analytically extendible to a proper subset^3 of the space–time only. One can nevertheless hope that the analyticity of the domain of outer communication and some further conditions, as e.g. global hyperbolicity thereof, allow one to extend the locally defined isometries at least to the whole domain of outer communication. The aim of this paper is to show that this is indeed the case. More precisely, we wish to show the following:

Theorem 1.1. Consider an analytic space–time $(M, g_{ab})$ with a Killing vector field $X$ with complete orbits. Suppose that $M$ contains an asymptotically flat three–end $\Sigma_{\rm ext}$ with time-like ADM four–momentum, and with $X(p)$ time-like for $p \in \Sigma_{\rm ext}$. (Here asymptotic flatness is defined in the sense of Eq. (2.2) with $\alpha > 1/2$ and $k \ge 3$.) Let $\langle\langle M_{\rm ext}\rangle\rangle$ denote the domain of outer communication associated with $\Sigma_{\rm ext}$ as defined below, and assume that $\langle\langle M_{\rm ext}\rangle\rangle$ is globally hyperbolic and simply connected. If there exists a Killing vector field $Y$, which is not a constant multiple of $X$, defined on an open subset $O$ of $\langle\langle M_{\rm ext}\rangle\rangle$, then the isometry group of $\langle\langle M_{\rm ext}\rangle\rangle$ (with the metric obtained from $(M, g_{ab})$ by restriction) contains $\mathbb{R} \times U(1)$.

Remarks.
1. It should be noted that no field equations or energy inequalities are assumed.
2. Simple connectedness of the domain of outer communication necessarily holds when a positivity condition is imposed on the Einstein tensor of $g_{ab}$ [10].^4
3. When a positivity condition is imposed on the Einstein tensor of $g_{ab}$, the hypothesis of time-likeness of the ADM momentum can be replaced by that of existence of an appropriately regular Cauchy surface in $(M, g_{ab})$. See, e.g., [12] and references therein; cf. also [2] for a recent discussion.
4. It should be emphasized that no claims about isometries of $M \setminus \langle\langle M_{\rm ext}\rangle\rangle$ (with the obvious metric) are made.

Theorem 1.1 allows one to give a corrected version of the rigidity theorem; the reader is referred to [7] for a precise statement together with a proof.
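The caveat that an analytic object may extend only to a proper subset, and not in a unique way, can be illustrated numerically with the logarithm. The following is a minimal sketch, not part of the original argument: the helper `continue_log` and the two sampled semicircular paths are illustrative assumptions. It continues $\log z$ from $z = 1$ to $z = -1$ along the upper and the lower unit semicircle and compares the results.

```python
import cmath
import math

def continue_log(path):
    """Continue log z along a polyline of nonzero complex points,
    starting from the principal value at path[0] (hypothetical helper)."""
    value = cmath.log(path[0])
    for z_prev, z_next in zip(path, path[1:]):
        # For small steps, log(z_next) - log(z_prev) is the principal log of the ratio.
        value += cmath.log(z_next / z_prev)
    return value

n = 200
upper = [cmath.exp(1j * math.pi * k / n) for k in range(n + 1)]   # 1 -> -1 through +i
lower = [cmath.exp(-1j * math.pi * k / n) for k in range(n + 1)]  # 1 -> -1 through -i

log_upper = continue_log(upper)
log_lower = continue_log(lower)
print(log_upper, log_lower, log_upper - log_lower)
```

The two continuations arrive at the same point with values that differ by $2\pi i$, so neither a continuation to the whole punctured plane nor a preferred maximal domain exists.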
It seems of interest to remove the condition of completeness of the Killing orbits of $X$ above. Recall that completeness of those necessarily holds [5] in maximal globally hyperbolic, say vacuum, space–times under various conditions on the Cauchy data. (It was mentioned in [6] that the results of [5] generalize to the electro–vacuum case.) Those conditions are, however, somewhat unsatisfactory in the black hole context for the following reasons: recall that the existing theory of uniqueness of black holes gives only a classification of domains of outer communication $\langle\langle M_{\rm ext}\rangle\rangle$. Thus in this context one would like to have results which do not make any hypotheses about the global properties of the complement of $\langle\langle M_{\rm ext}\rangle\rangle$ in $M$. Moreover the hypotheses of those results of [5] which apply when degenerate Killing horizons are present require further justification.

^3 In the physics literature there seem to be misconceptions about existence and uniqueness of analytic extensions of various objects. As a useful example the reader might wish to consider the (both real and complex) analytic function $f$ from, say, the open disc $D(1, 1/2)$ of radius $1/2$ centered at $1$ into $\mathbb{C}$, defined as the restriction of the principal branch of $\log z$. Then: 1) There exists no analytic extension of $f$ from $D(1, 1/2)$ to $\mathbb{C}$. 2) There exists no unique maximal subset of $\mathbb{C}$ on which an analytic extension of $f$ is defined.
^4 Cf. also [9, 13, 11] for similar but weaker results. Note that in the stationary black hole context, under suitable hypotheses one can use Theorem 1.2 below to obtain completeness of orbits of $X$ in $\langle\langle M_{\rm ext}\rangle\rangle$, and then use [9] to obtain simple–connectedness of $\langle\langle M_{\rm ext}\rangle\rangle$.

Here we wish to raise the question, whether or not it makes sense to talk about a stationary black hole space–time for space–times for which the Killing orbits are not complete in the asymptotic region. We do not know an answer to that question. It is nevertheless tempting to decree that in “physically reasonable” stationary black hole space–times the orbits of the Killing vector field $X$ which is time-like in the asymptotically flat three–end $\Sigma_{\rm ext}$ are complete through points in the asymptotic region $\Sigma_{\rm ext}$. One would then like to be able to derive various desirable global properties of $\langle\langle M_{\rm ext}\rangle\rangle$ using this assumption. Our second result in this paper is the proof that in globally hyperbolic domains of outer communication the orbits of those Killing vector fields which are time-like in $\Sigma_{\rm ext}$ are complete “if and only if” they are so^5 for points $p \in \Sigma_{\rm ext}$ (it should be emphasized that, in contradistinction to [5], no maximality hypotheses are made and no field equations are assumed below; similarly no analyticity or simple connectedness conditions are made here):

Theorem 1.2. Consider a space–time $(M, g_{ab})$ with a Killing vector field $X$ and suppose that $M$ contains an asymptotically flat three–end $\Sigma_{\rm ext}$, with $X$ time-like in $\Sigma_{\rm ext}$. (Here the metric is assumed to be twice differentiable, while asymptotic flatness is defined in the sense of Eq. (2.2) with $\alpha > 0$ and $k \ge 0$.) Suppose that the orbits of $X$ are complete through all points $p \in \Sigma_{\rm ext}$. Let $\langle\langle M_{\rm ext}\rangle\rangle$ denote the domain of outer communication associated with $\Sigma_{\rm ext}$ as defined below. If $\langle\langle M_{\rm ext}\rangle\rangle$ is globally hyperbolic, then the orbits of $X$ through points $p \in \langle\langle M_{\rm ext}\rangle\rangle$ are complete.

In view of the recent classification of orbits of Killing vector fields in asymptotically flat space–times of [1] it is of interest to prove the equivalent of Theorem 1.2 for “stationary–rotating” Killing vectors $X$, as defined in [1]. In Theorem 3.1 below we prove that generalization.

2. Definitions, Proof of Theorem 1.1

Throughout this work all objects under consideration are assumed to be smooth. For a vector field $W$ we denote by $\phi_t[W]$ the (perhaps defined only locally) flow generated by $W$. Consider a Killing vector field $X$ which is time-like for $p \in \Sigma_{\rm ext}$.
If the orbits $\gamma_p$ of $X$ are complete through points $p \in \Sigma_{\rm ext}$, then we define the asymptotically flat four–end $M_{\rm ext}$ by
$$M_{\rm ext} = \bigcup_{t \in \mathbb{R}} \phi_t[X](\Sigma_{\rm ext}), \tag{2.1}$$
and the domain of outer communication $\langle\langle M_{\rm ext}\rangle\rangle$ by
$$\langle\langle M_{\rm ext}\rangle\rangle = J^-(M_{\rm ext}) \cap J^+(M_{\rm ext}).$$
Let $R > 0$ and let $(g_{ij}, K_{ij})$ be initial data on $\Sigma_{\rm ext} \equiv \Sigma_R \equiv \mathbb{R}^3 \setminus B(R)$ satisfying
$$g_{ij} - \delta_{ij} = O_k(r^{-\alpha}), \qquad K_{ij} = O_{k-1}(r^{-1-\alpha}), \tag{2.2}$$
with some $k \ge 1$ and some $0 < \alpha < 1$. A set $(\Sigma_{\rm ext}, g_{ij}, K_{ij})$ satisfying the above will be called an asymptotically flat three–end. Here a function $f$, defined on $\Sigma_R$, is said to be $O_k(r^\beta)$ if there exists a constant $C$ such that we have
$$|\partial^i f| \le C r^{\beta - i}, \qquad 0 \le i \le k.$$

^5 The quotation marks here are due to the fact that in our approach the asymptotic four–end $\langle\langle M_{\rm ext}\rangle\rangle$ is not even defined when the orbits of $X$ through $\Sigma_{\rm ext}$ are not complete. In that last case one could make sense of this sentence using Carter’s definition of the domain of outer communication [3], involving Scri.
^6 Actually in [14] it is assumed that $(M, g_{ab})$ is Riemannian. The reader will note that all the assertions and proofs of [14] remain valid word for word when “Riemannian” is replaced by “pseudo–Riemannian”.

We shall need the following result, which is a straightforward consequence^6 of what has been proved in [14]:
Theorem 2.1 (Nomizu). Let $(M, g_{ab})$ be a (connected) simply connected analytic pseudo–Riemannian manifold, and suppose that there exists a Killing vector field $Y$ defined on an open connected subset $O$ of $M$. Then there exists a Killing vector field $\hat Y$ defined on $M$ which coincides with $Y$ on $O$.

Let us pass to the proof of Theorem 1.1. Without loss of generality we may assume that $X$ is future oriented for $p \in \Sigma_{\rm ext}$. Simple connectedness and analyticity of $\langle\langle M_{\rm ext}\rangle\rangle$ together with Theorem 2.1 allow us to conclude that the Killing vector $Y$ can be globally extended to a Killing vector field $\hat Y$ defined on $\langle\langle M_{\rm ext}\rangle\rangle$. The time-likeness of the ADM four–momentum $p^\mu$ allows us to use the results in [1] to assert that there exists a linear combination $Z$ (with constant coefficients) of $X$ and $\hat Y$ which has complete periodic orbits through all points $p$ in $M_{\rm ext}$ which satisfy $r(p) \ge R$, for some $R$. (Moreover $Z$ and $X$ commute.) To prove Theorem 1.1 we need to show that the orbits of $Z$ are complete (and periodic) for all $p \in \langle\langle M_{\rm ext}\rangle\rangle$. Consider, thus, a point $p \in \langle\langle M_{\rm ext}\rangle\rangle$. There exist $q_\pm \in M_{\rm ext}$, with $r(q_\pm) \ge R$, such that $p \in J^-(q_+) \cap J^+(q_-)$. Completeness and periodicity of the orbits $\gamma_{q_\pm}[Z] \equiv \bigcup_{t \in \mathbb{R}} \phi_t[Z](q_\pm)$ of $Z$ through $q_\pm$ imply that the sets $\gamma_{q_\pm}[Z]$ are compact. Global hyperbolicity of $\langle\langle M_{\rm ext}\rangle\rangle$ implies then that
$$K \equiv J^-(\gamma_{q_+}[Z]) \cap J^+(\gamma_{q_-}[Z])$$
is compact. For $q \in \langle\langle M_{\rm ext}\rangle\rangle$ let $t_\pm(q) \in \mathbb{R} \cup \{\pm\infty\}$ be the forward and backward life time of orbits of $Z$ through $q$, defined by the requirement that $(t_-(q), t_+(q))$ is the largest connected interval containing $0$ such that the solution $\phi_t[Z](q)$ of the equation $d\phi_t[Z](q)/dt = Z \circ \phi_t[Z](q)$ is defined for all $t \in (t_-(q), t_+(q))$. From continuous dependence of solutions of ODEs upon initial values it follows that $t_+$ is a lower semi–continuous function and $t_-$ is an upper semi–continuous function. Let $\gamma : [0, 1] \to M$ be any future oriented causal curve such that $\gamma(0) = q_-$, $\gamma(1) = q_+$, and $p \in \gamma$. Set
$$T_+ = \inf_{q \in \gamma} t_+(q), \qquad T_- = \sup_{q \in \gamma} t_-(q). \tag{2.3}$$
Here and elsewhere $\inf$ and $\sup$ are taken in $\mathbb{R} \cup \{\pm\infty\}$. If $T_\pm = \pm\infty$ we are done; suppose thus that $T_+ \ne \infty$; the case $T_- \ne -\infty$ is analyzed in a similar way. By lower semi–continuity of $t_+$ and compactness of $\gamma$ there exists $\tilde p \in \gamma$ such that $t_+(\tilde p) = T_+$. By global hyperbolicity the family of causal curves $\phi_t[Z](\gamma)$, $t \in [0, T_+)$, accumulates at a causal curve $\tilde\gamma \subset K$. Consequently the orbit $\phi_t[Z](\tilde p)$, $t \in [0, T_+)$, has an accumulation point in $K$. It follows that $\phi_t[Z](\tilde p)$ can be extended beyond $T_+$, which gives a contradiction unless $T_+ = \infty$, and the result follows.

3. Proof of Theorem 1.2

Proof of Theorem 1.2. Without loss of generality we may suppose that $X$ is future oriented for $p \in \Sigma_{\rm ext}$. Consider a point $p \in \langle\langle M_{\rm ext}\rangle\rangle$; there exist $p_\pm \in M_{\rm ext}$ such that $p \in J^+(p_-) \cap J^-(p_+)$. Let $\Sigma$ be a Cauchy surface for $\langle\langle M_{\rm ext}\rangle\rangle$; without loss of generality we may assume that $p_- \in I^-(\Sigma)$ and $p_+ \in I^+(\Sigma)$. Let $t_\pm$ be defined as in the proof of Theorem 1.1; we have $t_-(p_\pm) = -\infty$, $t_+(p_\pm) = \infty$. Let $\gamma : [0, 1] \to \langle\langle M_{\rm ext}\rangle\rangle$ be any
causal curve such that $\gamma(0) = p_-$, $\gamma(1) = p_+$, and $p \in \gamma$. Define $T_\pm$ by Eq. (2.3). By lower semi–continuity of $t_+$ there exists $\tilde p \in \gamma$ such that $t_+(\tilde p) = T_+$. Define
$$\tilde\Omega = \{ s \in [0, T_+) : \phi_s[X](\tilde p) \in I^-(\Sigma) \}.$$
Consider any $s \in \tilde\Omega$. Then the curve obtained by concatenating $\phi_t[X](p_-)$, $t \in [0, s]$, with $\phi_s[X](\gamma)$ is a future directed causal curve which starts at $p_-$ and passes through $\phi_s[X](\tilde p)$, hence
$$s \in \tilde\Omega \ \Rightarrow\ \phi_s[X](\tilde p) \in K \equiv J^+(p_-) \cap J^-(\Sigma). \tag{3.1}$$
By global hyperbolicity of $\langle\langle M_{\rm ext}\rangle\rangle$ the set $K$ is compact. If $\tilde\Omega = \emptyset$ set $\omega = 0$, otherwise set $\omega = \sup \tilde\Omega$. Consider any sequence $\omega_i \in \tilde\Omega$ such that $\omega_i \to \omega$. By (3.1) and by compactness of $K$ the sequence $\phi_{\omega_i}[X](\tilde p)$ has an accumulation point in $K$. It follows that $\omega < T_+$. By definition of $\omega$ we have $\phi_s[X](\tilde p) \in J^+(\Sigma)$ for all $\omega \le s < T_+$. By Lemma 2.5 of [5] it follows that $T_+ = \infty$. As $t_+(p) \ge t_+(\tilde p) = T_+$ we obtain $t_+(p) = \infty$. The equality $t_-(p) = -\infty$ for all $p \in \langle\langle M_{\rm ext}\rangle\rangle$ is obtained similarly by using the time–dual version of Lemma 2.5 of [5].

Before presenting a generalization of Theorem 1.2 which covers the case of “stationary–rotating” Killing vectors, as defined in [9, 1], we need to introduce some terminology. Following [9] we shall say that the orbit through $p$ of a Killing vector field $Z$ is time–oriented if there exists $t_p > 0$ such that $\phi_{t_p}[Z](p) \in I^+(p)$. It then follows that for all $\alpha \in \mathbb{R}$ and all $z \in \mathbb{N}$ we have $\phi_{\alpha + z t_p}[Z](p) \in I^+(\phi_\alpha[Z](p))$: if $\gamma$ is a timelike curve from $p$ to $\phi_{t_p}[Z](p)$, one obtains a timelike curve from $\phi_\alpha[Z](p)$ to $\phi_{\alpha + z t_p}[Z](p)$ by concatenating $\phi_\alpha[Z](\gamma)$ with $\phi_{\alpha + t_p}[Z](\gamma)$ with $\phi_{\alpha + 2 t_p}[Z](\gamma)$, etc. A trivial example of a Killing vector field with time–oriented orbits is given by a timelike Killing vector field. A more interesting example is that of “stationary–rotating” Killing vector fields, as considered in [9, 1]; loosely speaking, those are Killing vectors which behave like $\alpha\,\partial/\partial t + \beta\,\partial/\partial\varphi$ in the asymptotic region, with $\alpha$ and $\beta$ non–vanishing, where $\varphi$ is an angular coordinate. Thus the theorem that follows applies in the “stationary–rotating” case.

Theorem 3.1. The conclusion of Theorem 1.2 will hold if to its hypotheses one adds the requirement that $k$ in (2.2) is larger than or equal to 2, and if the hypothesis that $X$ is timelike is replaced by the assumption that the orbits of $X$ are time–oriented through all $p \in \Sigma_{\rm ext}$.

Proof.
The proof is achieved by a minor modification of the proof of Theorem 1.2, as follows: Let $p_\pm$ be as in that proof; from the asymptotic behavior of Killing vector fields in asymptotically flat space–times (cf. e.g. Sect. 2 of [2]) it follows that we can without loss of generality assume that
$$\phi_{2\pi}[X](p_-) \in I^+(p_-), \qquad \phi_{2\pi}[X](p_+) \in I^+(p_+),$$
$$\forall s \in [0, 2\pi] \qquad \phi_s[X](p_-) \in I^-(\Sigma), \qquad \phi_s[X](p_+) \in I^+(\Sigma).$$
The proof proceeds then as before, up to the definition of the set $K$, Eq. (3.1). In the present case that definition is replaced by
$$K \equiv J^+\Big(\bigcup_{s \in [0, 2\pi]} \phi_s[X](p_-)\Big) \cap J^-(\Sigma).$$
This set is again compact, in view of global hyperbolicity of $M_{\rm ext}$. The fact that for $s \in \tilde\Omega$ we have $\phi_s[X](\tilde p) \in K$ follows by considering the causal curve obtained by concatenating a causal curve $\gamma_1$ from $\phi_{s - \lfloor s/2\pi \rfloor 2\pi}[X](p_-)$ to $\phi_s[X](p_-)$ with $\phi_s[X](\gamma)$. Here $\lfloor \alpha \rfloor$ denotes the largest integer smaller than or equal to $\alpha$; the existence of $\gamma_1$ is guaranteed by our discussion above.

Acknowledgement. The author is grateful to I. Rácz for comments about a previous version of this paper.
References
1. Beig, R. and Chruściel, P.T.: The isometry groups of asymptotically flat, asymptotically empty space–times with timelike ADM four–momentum. Commun. Math. Phys. 188, 585–597 (1997)
2. Beig, R. and Chruściel, P.T.: Killing vectors in asymptotically flat space–times: I. Asymptotically translational Killing vectors and the rigid positive energy theorem. Jour. Math. Phys. 37, 1939–1961 (1996), gr-qc/9510015
3. Carter, B.: Black hole equilibrium states. In: Black Holes (C. de Witt and B. de Witt, eds.), Proceedings of the Les Houches Summer School. Paris: Gordon & Breach, 1973
4. Chruściel, P.T.: On uniqueness in the large of solutions of Einstein equations (“Strong Cosmic Censorship”). Canberra: Australian National University Press, 1991
5. Chruściel, P.T.: On completeness of orbits of Killing vector fields. Class. Quantum Grav. 10, 2091–2101 (1993), gr-qc/9304029
6. Chruściel, P.T.: “No Hair” Theorems – folklore, conjectures, results. In: Differential Geometry and Mathematical Physics (J. Beem and K.L. Duggal, eds.), Vol. 170. Providence RI: American Mathematical Society, 1994, pp. 23–49, gr-qc/9402032
7. Chruściel, P.T.: Uniqueness of black holes revisited. Univ. de Tours preprint. To appear in the Proceedings of Journées Relativistes, May 1996 (N. Straumann, P. Jetzer, G. Lavrelashvili, eds.), Helv. Phys. Acta 69, 529–552 (1996)
8. Chruściel, P.T. and Rendall, A.: Strong cosmic censorship in vacuum space–times with compact, locally homogeneous Cauchy surfaces. Annals of Phys. 242, 349–385 (1995)
9. Chruściel, P.T. and Wald, R.M.: On the topology of stationary black holes. Class. Quantum Grav. 11, L147–152 (1994)
10. Galloway, G.J.: On the topology of the domain of outer communication. Class. Quantum Grav. 12, L99–L101 (1995)
11. Hawking, S.W. and Ellis, G.F.R.: The large scale structure of space-time. Cambridge: Cambridge University Press, 1973
12. Horowitz, G.T.: The positive energy theorem and its extensions. In: Asymptotic behavior of mass and spacetime geometry (F. Flaherty, ed.), Springer Lecture Notes in Physics, Vol. 202. New York: Springer Verlag, 1984
13. Jacobson, T. and Venkataramani, S.: Topology of event horizons and topological censorship. Class. Quantum Grav. 12, 1055–1061 (1995)
14. Nomizu, K.: On local and global existence of Killing vector fields. Ann. Math. 72, 105–120 (1960)

Communicated by H. Nicolai
Commun. Math. Phys. 189, 9 – 16 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997
Logarithmic Sobolev Inequalities on Path Spaces Over Riemannian Manifolds*
Elton P. Hsu
Department of Mathematics, Northwestern University, Evanston, IL 60208, USA. E-mail: [email protected]
Received: 3 September 1996 / Accepted: 6 February 1997
Abstract: Let $W_o(M)$ be the space of paths of unit time length on a connected, complete Riemannian manifold $M$ such that $\gamma(0) = o$, a fixed point on $M$, and $\nu$ the Wiener measure on $W_o(M)$ (the law of Brownian motion on $M$ starting at $o$). If the Ricci curvature is bounded by $c$, then the following logarithmic Sobolev inequality holds:
$$\int_{W_o(M)} F^2 \log|F|\, d\nu \le e^{3c} \|DF\|^2 + \|F\|^2 \log\|F\|.$$
1. Introduction

Logarithmic Sobolev inequalities were introduced in Gross [5] as a tool for studying hypercontractivity of symmetric Markov semigroups. Let $(X, \nu)$ be a probability space and $\mathcal{E}$ a densely defined nonnegative quadratic form on $L^2(X, \nu)$. We say that the logarithmic Sobolev inequality holds for $\mathcal{E}$ if
$$\int_X F^2 \log|F|\, d\nu \le \mathcal{E}(F, F) + \|F\|^2 \log\|F\|, \qquad \forall F \in \mathrm{Dom}(\mathcal{E}).$$
Gross [5] proved that it holds for the standard gaussian measure on $\mathbb{R}^N$ for any $N$ with
$$\mathcal{E}(F, F) = \int_{\mathbb{R}^N} |\nabla F|^2\, d\nu.$$
This implies immediately by a simple argument that the logarithmic Sobolev inequality holds for the quadratic form
$$\mathcal{E}(F, F) = \int_{W_o(\mathbb{R}^N)} |DF|^2\, d\nu$$
on the probability space $(W_o(\mathbb{R}^N), \nu)$, where $W_o(\mathbb{R}^N)$ is the path space on $\mathbb{R}^N$, $\nu$ the Wiener measure, and $D$ the gradient operator on $W_o(\mathbb{R}^N)$ (see the definition below). It was proved in Gross [6] that the logarithmic Sobolev inequality holds for the Wiener measure on $W_o(G)$, where $G$ is a connected Lie group with the gradient operator derived from the Cartan $(\pm)$-connection. Note that for these connections the curvature vanishes. It has been conjectured by the author that a logarithmic Sobolev inequality on the path space over a general complete, connected Riemannian manifold holds with a bounding constant which can be estimated in terms of the Ricci curvature. This conjecture is completely borne out by the main result of the present work.

* Research was supported in part by NSF grant DMS-9406888.

Let $M$ be a complete, connected Riemannian manifold of dimension $n$. Throughout this work we assume that $M$ has bounded Ricci curvature. We write $|\mathrm{Ric}_M| \le c$ if
$$\sup\{|\mathrm{Ric}(v, v)| : v \in T_x M,\ |v| = 1,\ x \in M\} \le c.$$
Fix a point $o \in M$ and let
$$W_o(M) = \{\gamma \in C([0, 1], M) : \gamma(0) = o\}$$
be the space of pinned paths from $o$. We will work with the Wiener measure $\nu$ on $W_o(M)$, which can be defined as follows. Let $O(M)$ be the bundle of orthonormal frames over $M$ and $\pi : O(M) \to M$ the canonical projection. Fix an orthonormal frame $u_o$ at $o$ and let $\{U_s\}$ be the solution of the following stochastic differential equation on $O(M)$:
$$dU_s = H_{U_s} \circ d\omega_s, \qquad U_0 = u_o, \tag{1.1}$$
where $\{\omega_s\}$ is an $\mathbb{R}^n$-valued Brownian motion. Here $H = \{H_i, 1 \le i \le n\}$ are the canonical horizontal vector fields on $O(M)$. The projected process $\gamma_s = \pi U_s$ is a Brownian motion on $M$ starting from $o$. The Wiener measure $\nu$ is just the law of the Brownian motion $\{\gamma_s\}$.

We now introduce the gradient operator $D$ on the path space. Let $H$ be the $\mathbb{R}^n$-valued Cameron-Martin space, i.e., the space of $\mathbb{R}^n$-valued functions $h$ such that $h_0 = 0$ and $\dot h \in L^2([0, 1]; \mathbb{R}^n)$. It is a Hilbert space with the norm
$$|h|_H^2 = \int_0^1 |\dot h_s|^2_{\mathbb{R}^n}\, ds.$$
For each $h \in H$, let $D_h$ be the vector field on the path space $W_o(M)$ defined by $D_h(\gamma)_s = U(\gamma)_s h_s$, where $U(\gamma)$ is the horizontal lift of $\gamma$ with initial value $u_o$. Let $\{\zeta_h^t, t \in \mathbb{R}\}$ be the flow on the path space $W_o(M)$ generated by the vector field $D_h$. Let $F$ be a real-valued function on $W_o(M)$. We define
$$D_h F(\gamma) = \lim_{t \to 0} \frac{F(\zeta_h^t \gamma) - F(\gamma)}{t},$$
if the limit exists in $L^2(\nu)$. The gradient $DF$ is defined to be the $H$-valued function $DF$ such that $\langle DF, h\rangle_H = D_h F$ for all $h \in H$. Suppose that $F$ is a cylindrical function given by
$$F(\gamma) = f(\gamma_{s_1}, \cdots, \gamma_{s_l}), \tag{1.2}$$
where $f : M \times \cdots \times M \to \mathbb{R}^1$ is smooth and $0 \le s_1 < \cdots < s_l \le 1$. Then it is easy to verify that
$$DF(\gamma)_s = \sum_{i=1}^{l} (s \wedge s_i)\, U(\gamma)_{s_i}^{-1} \nabla^{(i)} F(\gamma), \tag{1.3}$$
where $\nabla^{(i)} F$ denotes the gradient of $f$ with respect to the $i$th variable. From the definition we have for $F$ in (1.2), with the convention $s_0 = 0$,
$$|DF|_H^2 = \sum_{i=1}^{l} (s_i - s_{i-1}) \Big| \sum_{j=i}^{l} U_{s_j}^{-1} \nabla^{(j)} F \Big|^2. \tag{1.4}$$
This formula will be useful later. We state the main result of the present work.

Theorem 1.1. Suppose that $M$ is a complete, connected manifold such that $|\mathrm{Ric}_M| \le c$. Then the following logarithmic Sobolev inequality holds on the path space $W_o(M)$:
$$\int_{W_o(M)} F^2 \log|F|\, d\nu \le e^{3c} \|DF\|^2 + \|F\|^2 \log\|F\|, \qquad \forall F \in \mathrm{Dom}(D).$$
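The passage from (1.3) to (1.4) is a one-line computation: the Cameron-Martin path in (1.3) is piecewise linear, so its energy is a finite sum. A minimal sketch of this check in the flat case follows, under the assumption that $U$ is the identity and that the gradients $\nabla^{(i)} F$ are replaced by fixed vectors; the function names and the sample data are hypothetical.

```python
def cm_path(s, times, grads):
    """DF_s from (1.3) in flat space: sum_i min(s, s_i) * v_i (U taken as the identity)."""
    dim = len(grads[0])
    return [sum(min(s, t) * v[k] for t, v in zip(times, grads)) for k in range(dim)]

def energy_numeric(times, grads, n=1000):
    """Integral over [0, 1] of |d/ds DF_s|^2 via finite differences on a uniform grid."""
    total = 0.0
    for k in range(n):
        a, b = k / n, (k + 1) / n
        ha, hb = cm_path(a, times, grads), cm_path(b, times, grads)
        total += sum((y - x) ** 2 for x, y in zip(ha, hb)) / (b - a)
    return total

def energy_formula(times, grads):
    """Right-hand side of (1.4) with s_0 = 0 and U = identity."""
    total, prev = 0.0, 0.0
    dim = len(grads[0])
    for i, t in enumerate(times):
        # Tail sum over j >= i of the gradient vectors.
        tail = [sum(v[k] for v in grads[i:]) for k in range(dim)]
        total += (t - prev) * sum(c * c for c in tail)
        prev = t
    return total

# Hypothetical sample data: three times and three gradient vectors in R^2.
times = [0.2, 0.5, 0.9]
grads = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]
numeric = energy_numeric(times, grads)
closed = energy_formula(times, grads)
print(numeric, closed)
```

Both numbers agree because on each interval $(s_{i-1}, s_i)$ the derivative of (1.3) is the constant tail sum appearing in (1.4).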
We conclude this introduction by stating a few well known consequences of the logarithmic Sobolev inequality (see Gross [7]). The self-adjoint operator $L = -D^* D$ is a generalization of the usual Ornstein-Uhlenbeck operator for a euclidean path space. Let $Q_t = e^{Lt/2}$ be the associated Markovian semigroup.

Theorem 1.2. Let $M$ be a complete, connected Riemannian manifold such that $|\mathrm{Ric}_M| \le c$. Let $\lambda_M = e^{-3c}$.
(i) The semigroup $\{Q_t\}$ on the path space $W_o(M)$ is hypercontractive:
$$\|Q_t\|_{L^p(\nu) \to L^q(\nu)} \le 1 \quad \text{if} \quad e^{\lambda_M t} \ge \frac{q - 1}{p - 1}.$$
(ii) The spectral gap of $L$ exists and is at least $\lambda_M$, namely the following Poincaré inequality holds: if $F \in \mathrm{Dom}(D)$, then
$$\|F - EF\|^2 \le \lambda_M^{-1} \|DF\|^2.$$
(iii) If $F \in L^2(\nu)$, then $\|Q_t F - EF\| \le e^{-\lambda_M t/2} \|F\|$.

Remark 1.3. Using a Clark-Haussmann-Ocone formula for path spaces, Fang [4] proved directly the existence of a spectral gap for the Ornstein-Uhlenbeck operator $L$ on the path space over a connected, compact Riemannian manifold.

2. Gradient of a Wiener Functional

The key to the present proof of the logarithmic Sobolev inequality is a formula for $\nabla E_x F$, the gradient of the expected value of a cylindrical function $F$. Define the matrix-valued process $\{\phi_s\}$ by
$$\phi_s = I - \frac{1}{2} \int_0^s \phi_\tau\, \mathrm{Ric}_{U_\tau}\, d\tau, \tag{2.1}$$
where $\mathrm{Ric}_u : \mathbb{R}^n \to \mathbb{R}^n$ denotes the Ricci curvature transform read at the frame $u$ and $I$ is the identity matrix.
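Equation (2.1) is a linear ordinary differential equation along each path, and a lower bound $-c$ on the Ricci curvature forces the growth bound $|\phi_s| \le e^{cs/2}$. The following is a minimal sketch in a scalar toy model, where the curvature profile `ricci` along the path is a hypothetical stand-in and the explicit Euler scheme is chosen only for simplicity.

```python
import math

def phi_euler(ricci, s, steps=10_000):
    """Explicit Euler for the scalar version of (2.1):
    phi' = -0.5 * phi * ricci(tau), phi(0) = 1."""
    h = s / steps
    phi = 1.0
    for k in range(steps):
        phi -= 0.5 * phi * ricci(k * h) * h
    return phi

c = 2.0
s = 1.0
# Hypothetical curvature profile along the path, oscillating in [-c, c].
ricci = lambda tau: c * math.sin(7.0 * tau)

value = phi_euler(ricci, s)
# The extreme case Ric = -c makes phi grow as fast as the bound allows.
saturating = phi_euler(lambda tau: -c, s)
bound = math.exp(c * s / 2)
print(value, saturating, bound)
```

In the extreme case $\mathrm{Ric} \equiv -c$ the solution is exactly $e^{cs/2}$, so the exponent in the bound cannot be improved in this toy model.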
Proposition 2.1. Let $F$ be a cylindrical function given by (1.2). Then
$$\nabla E_x F = U_x E_x \Big\{ \sum_{i=1}^{l} \phi_{s_i} U_{s_i}^{-1} \nabla^{(i)} F \Big\}. \tag{2.2}$$

Proof. The case $l = 1$ is due to Bismut (see Bismut [2], p. 82). We give a proof here based solely on Itô’s formula for the horizontal Brownian motion $\{U_s\}$, see (1.1). Let $f$ be a smooth function on $M$ and consider the function $J(\tau, u) = E_{\pi u} f(\gamma_\tau)$. It satisfies the equation
$$\partial_\tau J(s - \tau, u) + \frac{1}{2} \Delta^H J(s - \tau, u) = 0, \tag{2.3}$$
where $\Delta^H = \sum_{i=1}^{n} H_i^2$ is Bochner’s horizontal Laplacian on $O(M)$. This implies that $\{J(s - \tau, U_\tau), 0 \le \tau \le s\}$ is a martingale. We now apply Itô’s formula to the horizontal gradient $\nabla^H J(s - \tau, U_\tau)$, using the fact that $\{U_\tau\}$ is a diffusion generated by $\Delta^H$:
$$\begin{aligned}
d_\tau \nabla^H J(s - \tau, U_\tau) &= \partial_\tau \nabla^H J(s - \tau, U_\tau)\, d\tau + \langle \nabla^H \nabla^H J(s - \tau, U_\tau), d\omega_\tau \rangle + \frac{1}{2} \Delta^H \nabla^H J(s - \tau, U_\tau)\, d\tau \\
&= \langle \nabla^H \nabla^H J(s - \tau, U_\tau), d\omega_\tau \rangle + \frac{1}{2} [\Delta^H, \nabla^H] J(s - \tau, U_\tau)\, d\tau \\
&\quad + \nabla^H \Big\{ \partial_\tau J(s - \tau, U_\tau) + \frac{1}{2} \Delta^H J(s - \tau, U_\tau) \Big\}\, d\tau \\
&= \langle \nabla^H \nabla^H J(s - \tau, U_\tau), d\omega_\tau \rangle + \frac{1}{2} \mathrm{Ric}_{U_\tau} \nabla^H J(s - \tau, U_\tau)\, d\tau.
\end{aligned}$$
Here we have used (2.3) and Bochner’s formula
$$[\Delta^H, \nabla^H] J(u) = \mathrm{Ric}_u \nabla^H J(u)$$
for the lift $J$ of a function on $M$. It follows that $\{\phi_\tau \nabla^H J(s - \tau, U_\tau), 0 \le \tau \le s\}$ is a martingale. Equating the expected values at $\tau = 0$ and $\tau = s$ we obtain
$$\nabla^H J(s, u_o) = E\{\phi_s \nabla^H J(0, U_s)\}.$$
This is equivalent to what we wanted because by definition $\nabla^H J(\tau, u) = u^{-1} \nabla E_{\pi u} f(\gamma_\tau)$. By the Markov property and the case $l = 1$ we have
$$\nabla E_x F = \nabla E_x g(\gamma_{s_1}) = U_x E_x \{\phi_{s_1} U_{s_1}^{-1} \nabla g(\gamma_{s_1})\}, \tag{2.4}$$
where
$$g(y) = E_y \{ f(y, \gamma_{s_2 - s_1}, \cdots, \gamma_{s_l - s_1}) \}.$$
Using the induction hypothesis we have
$$\nabla g(y) = E_y \{ \nabla^{(1)} f(y, \gamma_{s_2 - s_1}, \cdots, \gamma_{s_l - s_1}) \} + U_y \sum_{i=2}^{l} E_y \{ \phi_{s_i - s_1} U_{s_i - s_1}^{-1} \nabla^{(i)} f(y, \gamma_{s_2 - s_1}, \cdots, \gamma_{s_l - s_1}) \},$$
where $U_y$ is any orthonormal frame at $y$ and $\{U_s\}$ starts at $U_y$. Substituting this into (2.4) and using the Markov property again we obtain the desired equality.
The following corollary to Proposition 2.1 is known and appeared as a special case of Theorem 6.4(iii) in Donnelly and Li [3].

Corollary 2.2. Suppose that $\mathrm{Ric}_M \ge -c$ for a nonnegative constant $c$. Then
$$|\nabla E_x f(\gamma_s)| \le e^{cs/2} E_x |\nabla f(\gamma_s)|.$$

Proof. This follows from the preceding proposition (with $l = 1$) and the inequality $|\phi_s| \le e^{cs/2}$, which is a consequence of the assumption on the Ricci curvature.

3. A Finite Dimensional Case

Let $\mu_s$ be the gaussian measure on $\mathbb{R}^n$ given by
$$\mu_s(dx) = \Big( \frac{1}{2\pi s} \Big)^{n/2} e^{-|x|^2/2s}\, dx.$$
Gross [6] proved the following logarithmic Sobolev inequality:
$$\int_{\mathbb{R}^n} f^2 \log|f|\, d\mu_s \le s \int_{\mathbb{R}^n} |\nabla f|^2\, d\mu_s + \|f\|_{\mu_s}^2 \log\|f\|_{\mu_s}.$$
The first step towards proving a logarithmic Sobolev inequality in path space is to generalize the above result to Riemannian manifolds with the gaussian measure $\mu_s$ replaced by $\nu_{s,z}(dx) = p(s, z, x)\,dx$, where $p(s, z, x)$ is the heat kernel. Note that $\nu_{s,z}$ is the distribution of Brownian motion (starting at $z$) at time $s$. The main result of this section is Theorem 3.1 below. As before, let
$$P_s f(z) = \nu_{s,z}(f) = \int_M f(x)\, p(s, z, x)\, dx$$
be the heat semigroup.

Theorem 3.1. Suppose that $\mathrm{Ric}_M \ge -c$ for a nonnegative constant $c$. Then for any smooth function $f$ on $M$,
$$\int_M f^2 \log|f|\, d\nu_{s,z} \le \frac{e^{cs} - 1}{c} \|\nabla f\|_{\nu_{s,z}}^2 + \|f\|_{\nu_{s,z}}^2 \log\|f\|_{\nu_{s,z}}.$$

Proof. We may assume that $f$ is strictly greater than a fixed positive constant on $M$. Otherwise consider the function $f_\epsilon = \sqrt{f^2 + \epsilon^2}$ and let $\epsilon \to 0$. Let $g = f^2$ and consider the function $H_r = P_r \phi(P_{s-r} g)$, where $\phi(t) = 2^{-1} t \log t$. Differentiating with respect to $r$ and noting that $\Delta$ commutes with $P_r$ we have
$$\begin{aligned}
\frac{dH_r}{dr} &= \frac{1}{2} P_r \Delta \phi(P_{s-r} g) - \frac{1}{2} P_r \{ \phi'(P_{s-r} g) \Delta P_{s-r} g \} \\
&= \frac{1}{2} P_r \big\{ \phi'(P_{s-r} g) \Delta P_{s-r} g + \phi''(P_{s-r} g) |\nabla P_{s-r} g|^2 \big\} - \frac{1}{2} P_r \{ \phi'(P_{s-r} g) \Delta P_{s-r} g \} \\
&= \frac{1}{2} P_r \big\{ \phi''(P_{s-r} g) |\nabla P_{s-r} g|^2 \big\}.
\end{aligned}$$
Now using Corollary 2.2 and then the inequality $\{P_{s-r}|\nabla g|\}^2 \le 4P_{s-r}g\,P_{s-r}|\nabla f|^2$, we have
$$\frac{dH_r}{dr} \le \frac14 e^{c(s-r)}P_r\left\{\frac{(P_{s-r}|\nabla g|)^2}{P_{s-r}g}\right\} \le e^{c(s-r)}P_rP_{s-r}|\nabla f|^2 = e^{c(s-r)}P_s|\nabla f|^2.$$
Integrating over r from 0 to s we obtain the desired inequality.
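As a numerical illustration (not part of the original argument), the following sketch checks Gross's Gaussian inequality from the beginning of this section in dimension one; this is the flat case to which Theorem 3.1 reduces formally as $c \to 0$. The function names, the trapezoid quadrature, and the test functions are assumptions made only for this example.

```python
import math

def gaussian_expectation(h, s, half_width=12.0, n=24001):
    """Trapezoid approximation of the integral of h against N(0, s) on the real line."""
    L = half_width * math.sqrt(s)
    dx = 2.0 * L / (n - 1)
    total = 0.0
    for k in range(n):
        x = -L + k * dx
        w = 0.5 if k in (0, n - 1) else 1.0
        total += w * h(x) * math.exp(-x * x / (2.0 * s))
    return total * dx / math.sqrt(2.0 * math.pi * s)

def log_sobolev_sides(f, df, s):
    """Return (lhs, rhs) of Gross's inequality for mu_s in dimension one."""
    lhs = gaussian_expectation(lambda x: f(x) ** 2 * math.log(abs(f(x))), s)
    energy = gaussian_expectation(lambda x: df(x) ** 2, s)
    norm_sq = gaussian_expectation(lambda x: f(x) ** 2, s)
    # ||f||^2 log ||f|| equals norm_sq * (1/2) * log(norm_sq)
    rhs = s * energy + norm_sq * 0.5 * math.log(norm_sq)
    return lhs, rhs

# Hypothetical test function f(x) = 2 + sin(x); it stays away from zero.
lhs, rhs = log_sobolev_sides(lambda x: 2.0 + math.sin(x), math.cos, 0.7)
print(lhs <= rhs)
```

For exponentials $f(x) = e^{ax}$ both sides equal $2a^2s\,e^{2a^2s}$, so the same routine should return two nearly identical numbers for such an $f$.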
4. Proof of the Main Theorem

We are in a position to prove our main result, Theorem 1.1. We divide the proof into two steps, Lemma 4.1 and Lemma 4.3 below.

Lemma 4.1. Let $M$ be a Riemannian manifold such that $\mathrm{Ric}_M \ge -c$ for a nonnegative constant $c$. Suppose that $F$ is a cylindrical function given by (1.2). Define $\{\phi_{s_0,s},\ s \ge s_0\}$ by
$$\frac{d}{ds}\phi_{s_0,s} = -\frac12\phi_{s_0,s}\,\mathrm{Ric}_{U_s},\qquad \phi_{s_0,s_0} = I. \tag{4.1}$$
Then
$$\int_{W_o(M)} F^2\log|F|\,d\nu \le e^c\sum_{i=1}^l (s_i - s_{i-1})\,E\Big|\sum_{j=i}^l \phi_{s_i,s_j}U_{s_j}^{-1}\nabla^{(j)}F\Big|^2 + \|F\|^2\log\|F\|.$$

Proof. For the sake of simplicity we assume that $F(\gamma) = f(\gamma_{s_1},\gamma_{s_2},\gamma_{s_3})$. Using the Markov property and Theorem 3.1 we have
$$\|F\|^2\log\|F\| = \frac12 E E_{\gamma_{s_1}}f(\gamma_0,\gamma_{s_2-s_1},\gamma_{s_3-s_1})^2\,\log E E_{\gamma_{s_1}}f(\gamma_0,\gamma_{s_2-s_1},\gamma_{s_3-s_1})^2$$
$$= \frac12 E f_1(\gamma_{s_1})^2\log E f_1(\gamma_{s_1})^2$$
$$\ge -\frac{e^{cs_1}-1}{c}E|\nabla f_1(\gamma_{s_1})|^2 + E f_1(\gamma_{s_1})^2\log|f_1(\gamma_{s_1})|, \tag{4.2}$$
where $f_1(x) = \sqrt{E_x f(\gamma_0,\gamma_{s_2-s_1},\gamma_{s_3-s_1})^2}$. Let $g(x) = E_x f(\gamma_0,\gamma_{s_2-s_1},\gamma_{s_3-s_1})^2$. We have
$$|\nabla f_1|^2 = \frac{|\nabla g|^2}{4g}.$$
Now compute $\nabla g$ by Proposition 2.1 and use the Cauchy-Schwarz inequality. We have
$$E|\nabla f_1(\gamma_{s_1})|^2 \le E\big|U_{s_1}^{-1}\nabla^{(1)}F + \phi_{s_1,s_2}U_{s_2}^{-1}\nabla^{(2)}F + \phi_{s_1,s_3}U_{s_3}^{-1}\nabla^{(3)}F\big|^2.$$
Using this and the inequality $(e^{cs}-1)/c \le se^c$ for $c \ge 0$ and $0 \le s \le 1$ in (4.2) we have
$$\|F\|^2\log\|F\| \ge -e^c s_1 E\big|U_{s_1}^{-1}\nabla^{(1)}F + \phi_{s_1,s_2}U_{s_2}^{-1}\nabla^{(2)}F + \phi_{s_1,s_3}U_{s_3}^{-1}\nabla^{(3)}F\big|^2 + E f_1(\gamma_{s_1})^2\log|f_1(\gamma_{s_1})|. \tag{4.3}$$
Repeating the same calculation for the last term
$$f_1(x)^2\log|f_1(x)| = \frac12 E_x f(\gamma_0,\gamma_{s_2-s_1},\gamma_{s_3-s_1})^2\,\log E_x f(\gamma_0,\gamma_{s_2-s_1},\gamma_{s_3-s_1})^2,$$
we have
$$E f_1(\gamma_{s_1})^2\log|f_1(\gamma_{s_1})| \ge -e^c(s_2-s_1)E\big|U_{s_2}^{-1}\nabla^{(2)}F + \phi_{s_2,s_3}U_{s_3}^{-1}\nabla^{(3)}F\big|^2 + E f_2(\gamma_{s_1},\gamma_{s_2})^2\log|f_2(\gamma_{s_1},\gamma_{s_2})|, \tag{4.4}$$
where $f_2(x,y) = \sqrt{E_xE_y f(x,y,\gamma_{s_3-s_2})^2}$. We have for the same reason
$$E f_2(\gamma_{s_1},\gamma_{s_2})^2\log|f_2(\gamma_{s_1},\gamma_{s_2})| \ge -e^c(s_3-s_2)E\big|U_{s_3}^{-1}\nabla^{(3)}F\big|^2 + E\{F^2\log|F|\}. \tag{4.5}$$
The desired inequality follows immediately from (4.3)–(4.5).
Remark 4.2. The above proof is reminiscent of the Federbush–Faris–Gross additivity property of logarithmic Sobolev inequalities for product measure spaces (see Gross [7]). The independence property needed in the original proof is replaced by the weaker property of Markov dependence. The same idea appeared in the works of Stroock and Zegarlinski [9, 10], especially p. 118 in the second article.

Lemma 4.3. Suppose that $M$ is a Riemannian manifold such that $|\mathrm{Ric}_M| \le c$. Then
$$\sum_{i=1}^l (s_i - s_{i-1})\Big|\sum_{j=i}^l \phi_{s_i,s_j}U_{s_j}^{-1}\nabla^{(j)}F\Big|^2 \le e^{2c}|DF|_H^2. \tag{4.6}$$

Proof. Let $Z_i = \sum_{j=i}^l U_{s_j}^{-1}\nabla^{(j)}F$. From (4.1) and the assumption on the Ricci curvature we have
$$\|\phi_{s_i,s_j} - \phi_{s_i,s_{j-1}}\| \le \frac c2\int_{s_{j-1}}^{s_j} e^{c(s-s_i)/2}\,ds.$$
Note that this is the only place we have to use the absolute bound of the Ricci curvature instead of just the lower bound. Now we have
$$\Big|\sum_{j=i}^l \phi_{s_i,s_j}U_{s_j}^{-1}\nabla^{(j)}F\Big|^2 = \Big|Z_i + \sum_{j=i+1}^l\big(\phi_{s_i,s_j} - \phi_{s_i,s_{j-1}}\big)Z_j\Big|^2$$
$$\le \Big(|Z_i| + \frac c2\sum_{j=i+1}^l |Z_j|\int_{s_{j-1}}^{s_j} e^{c(\tau-s_i)/2}\,d\tau\Big)^2$$
$$\le (1+\lambda)|Z_i|^2 + \Big(1+\frac1\lambda\Big)\frac{c^2}{4}\Big(\int_{s_i}^1 e^{c(\tau-s_i)/2}g_\tau\,d\tau\Big)^2,$$
where $g_s = |Z_j|$ if $s \in [s_{j-1},s_j)$. It follows that the left-hand side of (4.6)
$$\le (1+\lambda)\int_0^1 g_s^2\,ds + \Big(1+\frac1\lambda\Big)\frac{c^2}{4}\int_0^1\Big(\int_s^1 e^{c(\tau-s)/2}g_\tau\,d\tau\Big)^2ds$$
$$\le \Big\{1 + \lambda + \frac14\Big(1+\frac1\lambda\Big)(ce^c - e^c + 1)\Big\}\int_0^1 g_s^2\,ds.$$
Note that $|DF|_H^2 = \int_0^1 g_s^2\,ds$ by (1.4). We complete the proof by using the inequality $ce^c - e^c + 1 \le c^2e^c$ and choosing $\lambda = (c/2)e^{c/2}$; with this choice the constant in braces is at most $(1+\lambda)^2 \le e^{2c}$.

Acknowledgement. The author thanks Dominique Bakry for his proof of Theorem 3.1. Thanks are also due to Bruce Driver, Jingyi Chen, Mike Cranston, Len Gross, and Dan Stroock for helpful discussions at various stages of this work. The results presented here were announced in [8]. After the present work was completed, the author was informed that Aida and Elworthy [1] obtained an independent proof of a logarithmic Sobolev inequality by embedding the manifold into a Euclidean space. Our result shows that the bounding constant depends only on the Ricci curvature.
References
1. Aida, S. and Elworthy, K. D.: Differential Calculus on path and loop spaces, I. Logarithmic Sobolev inequalities on path spaces. C. R. Acad. Sci. (Paris), 321, Série I, 97–102 (1995)
2. Bismut, J.-M.: Large Deviations and Malliavin Calculus. New York: Birkhäuser, 1984
3. Donnelly, H. and Li, P.: Lower bounds for the eigenvalues of Riemannian manifolds. Michigan Math. J. 29, 149–161 (1982)
4. Fang, S.: Un inégalité du type Poincaré sur un espace de chemins. C. R. Acad. Sci. (Paris), 318, Série I, 257–260 (1994)
5. Gross, L.: Logarithmic Sobolev inequalities. Am. J. of Math. 97, 1061–1083 (1975)
6. Gross, L.: Logarithmic Sobolev inequalities on Lie groups. Ill. J. Math. 36, 447–490 (1992)
7. Gross, L.: Lecture Notes in Mathematics, No. 1563. New York: Springer-Verlag, (1993), pp. 54–82
8. Hsu, E. P.: Inégalité de Sobolev logarithmiques sur un espace de chemins. C. R. Acad. Sci. (Paris), 320, Série I, 1009–1012 (1995)
9. Stroock, D. W. and Zegarlinski, B.: The equivalence of the logarithmic Sobolev inequality and the Dobrushin–Shlosman mixing condition. Commun. Math. Phys. 144, 303–323 (1992)
10. Stroock, D. W. and Zegarlinski, B.: The logarithmic Sobolev inequality for discrete spin systems on a lattice. Commun. Math. Phys. 149, 175–193 (1992)

Communicated by A. Jaffe
Commun. Math. Phys. 189, 17 – 33 (1997)
Communications in
Mathematical Physics © Springer-Verlag 1997
On the Formulation of the Feynman Path Integral Through Broken Line Paths Wataru Ichinose? Section of Applied Mathematics, Department of Computer Science, Ehime University, Matsuyama 790, Japan. E-mail:
[email protected] Received: 21 August 1996 / Accepted: 13 February 1997
Abstract: We study the formulation of the Feynman path integral through broken line paths in non-relativistic quantum mechanics. This formulation is very familiar and well known to be useful, but it has been given a rigorous meaning only in special cases. In the present paper, using ideas from the theory of difference methods and the theory of pseudo-differential operators, we show rigorously for some class of potentials that this formulation is well defined and that this Feynman path integral gives the probability amplitude, i.e., the solution of the Schrödinger equation.

1. Introduction

Consider some charged particles in an electromagnetic field. For the sake of simplicity we suppose charge = one and mass = $m > 0$. Let $V(t,x) \in \mathbb{R}$, $A(t,x) = (A_1,\dots,A_d) \in \mathbb{R}^d$ ($x \in \mathbb{R}^d$, $t \in [0,T]$) be the electromagnetic potentials, which are defined from
$$E_j = -\frac{\partial A_j}{\partial t} - \frac{\partial V}{\partial x_j}\quad (j = 1,\dots,d),\qquad d\Big(\sum_{j=1}^d A_j\,dx_j\Big) = \sum_{1\le j<k\le d} B_{jk}\,dx_j\wedge dx_k\quad\text{on } \mathbb{R}^d \tag{1.1}$$
for the electric strength $E(t,x) = (E_1,\dots,E_d)$ and the magnetic strength tensor $(B_{jk}(t,x))_{1\le j<k\le d}$. Then the Lagrangian function $L(t,x,\dot x)$, the Hamiltonian operator $H(t)$, and the Schrödinger equation are given by
$$L(t,x,\dot x) = \frac m2|\dot x|^2 + (\dot x\cdot A - V),\quad \dot x \in \mathbb{R}^d, \tag{1.2}$$
? Research partially supported by Grant-in-Aid for Scientific No.08640211, Ministry of Education, Science, and Culture, Japanese Government.
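$$H(t) = \frac{1}{2m}\sum_{j=1}^d(\hbar D_{x_j} - A_j)^2 + V\qquad\Big(D_{x_j} = \frac{\partial}{i\,\partial x_j}\Big), \tag{1.3}$$
and
$$i\hbar\frac{\partial}{\partial t}u(t) = H(t)u(t)\quad (t \in [s,T]),\qquad u(s) = f \tag{1.4}$$
respectively. We sometimes write the solution $u(t)$ of (1.4) as $U(t,s)f$. For a multi-index $\alpha = (\alpha_1,\dots,\alpha_d)$ we write $\partial_x^\alpha = (\partial/\partial x_1)^{\alpha_1}\cdots(\partial/\partial x_d)^{\alpha_d}$ and $|\alpha| = \sum_{j=1}^d\alpha_j$. Throughout the present paper we assume that $\partial_x^\alpha A_j$, $\partial_x^\alpha\partial_t A_j$, $\partial_x^\alpha V$, and $\partial_x^\alpha\partial_t V$ are continuous in $(t,x) \in [0,T]\times\mathbb{R}^d$ for any $\alpha$. Denote by $L^2 = L^2(\mathbb{R}^d)$ the space of all square integrable functions on $\mathbb{R}^d$ with inner product $(\cdot,\cdot)$ and norm $\|\cdot\|$.

Let $(\mathbb{R}^d)^{[s,t]}$ be the space of all paths $\gamma : [s,t] \to \mathbb{R}^d$ and $\Gamma_x$ the subspace of $(\mathbb{R}^d)^{[s,t]}$ with $\gamma(t) = x$. We denote by $S(\gamma)$ the classical action $\int_s^t L(\theta,\gamma(\theta),\dot\gamma(\theta))\,d\theta$ along the path $\gamma \in (\mathbb{R}^d)^{[s,t]}$. Feynman in [5, 6] expressed the solution of (1.4) in the integral form, which is called the Feynman path integral,
$$U(t,s)f = \frac1N\int_{\Gamma_x} e^{i\hbar^{-1}S(\gamma)}f(\gamma(s))\,\mu(d\gamma), \tag{1.5}$$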
where $\mu(d\gamma)$ is a uniform measure on $(\mathbb{R}^d)^{[s,t]}$ and $N$ is a normalization factor. Since then, much work has been devoted by physicists and mathematicians to giving a rigorous meaning to the Feynman path integral. In [1, 2, 12, 13, 18], etc., equations with the potentials
$$V = \sum_{j,k=1}^d a_{jk}x_jx_k + \sum_{j=1}^d b_jx_j + \int_{\mathbb{R}^d} e^{ix\cdot y}\,\nu(dy),\qquad A_l = \sum_{j=1}^d c_{lj}x_j + d_l\quad (l = 1,\dots,d)$$
were studied, where $a_{jk}$, $b_j$, $c_{lj}$, and $d_l$ are constants and $\nu(dy)$ is a complex measure of bounded variation on $\mathbb{R}^d$. See also [4]. On the other hand Fujiwara in [7, 8] gave the rigorous meaning of the Feynman path integral for a class of potentials, adopting the formulation through piecewise classical paths. In [19] this result was generalized for a wide class of potentials.

Let $\Delta : 0 = t_0 < t_1 < \cdots < t_n = t$ be an arbitrary subdivision of the interval $[0,t]$ and put $|\Delta| = \max_{1\le j\le n}(t_j - t_{j-1})$. Let $x^{(j)} \in \mathbb{R}^d$ ($j = 0,1,\dots,n-1$) and denote by $\gamma_c^\Delta = \gamma_c^\Delta(x^{(0)},x^{(1)},\dots,x^{(n-1)},x) \in (\mathbb{R}^d)^{[0,t]}$ the piecewise classical path joining $(t_j,x^{(j)})$ ($j = 0,1,\dots,n$, $x^{(n)} = x$). Set
$$C_c(\Delta)f = \prod_{j=1}^n\sqrt{\frac{m}{2\pi i\hbar(t_j - t_{j-1})}}^{\,d}\int_{\mathbb{R}^d}\cdots\int_{\mathbb{R}^d} e^{i\hbar^{-1}S(\gamma_c^\Delta)}f(x^{(0)})\,dx^{(0)}dx^{(1)}\cdots dx^{(n-1)},$$
where $\sqrt i = e^{i\pi/4}$. Assume
$$|\partial_x^\alpha B_{jk}(t,x)| \le C_\alpha\langle x\rangle^{-(1+\delta)},\quad |\alpha| \ge 1,$$
$$|\partial_x^\alpha A_j(t,x)| + |\partial_x^\alpha\partial_t A_j(t,x)| \le C_\alpha,\quad |\alpha| \ge 1,$$
$$|\partial_x^\alpha V(t,x)| \le C_\alpha,\quad |\alpha| \ge 2,\quad (t,x) \in [0,T]\times\mathbb{R}^d$$
with constants $C_\alpha$ for a constant $\delta > 0$. Here $\langle x\rangle = \sqrt{1+|x|^2}$. Then they proved that $C_c(\Delta)$ converges to $U(t,0)$ in the topology of the operator norm on $L^2(\mathbb{R}^d)$ as $|\Delta|$ tends to zero, applying the WKB method and the theory of oscillatory integral transformations to $C_c(\Delta)$. See also [9, 10].

In the present paper we study the formulation of the Feynman path integral through broken line paths. This formulation is very familiar and well known to be useful. We show rigorously for some class of potentials that this formulation is well defined and that this Feynman path integral gives the probability amplitude, i.e., the solution of the Schrödinger equation.

Let $\Delta$ be the subdivision of $[0,t]$ above and $x^{(j)} \in \mathbb{R}^d$ ($j = 0,1,\dots,n-1$). We denote by $\gamma_\Delta = \gamma_\Delta(x^{(0)},x^{(1)},\dots,x^{(n-1)},x) \in (\mathbb{R}^d)^{[0,t]}$ the broken line path joining $(t_j,x^{(j)})$ ($j = 0,1,\dots,n$, $x^{(n)} = x$). Set
$$C(\Delta)f = \prod_{j=1}^n\sqrt{\frac{m}{2\pi i\hbar(t_j - t_{j-1})}}^{\,d}\int_{\mathbb{R}^d}\cdots\int_{\mathbb{R}^d} e^{i\hbar^{-1}S(\gamma_\Delta)}f(x^{(0)})\,dx^{(0)}dx^{(1)}\cdots dx^{(n-1)}. \tag{1.6}$$
Then we obtain the theorem below.

Theorem. We assume the following. There exist constants $\delta > 0$ and $C_\alpha$ such that
$$|\partial_x^\alpha A_j(t,x)| \le C_\alpha\langle x\rangle^{-(1+\delta)},\quad |\alpha| \ge 2, \tag{1.7}$$
$$|\partial_x^\alpha\partial_t A_j(t,x)| \le C_\alpha,\quad |\alpha| \ge 1, \tag{1.8}$$
$$|\partial_x^\alpha V(t,x)| \le C_\alpha,\quad |\alpha| \ge 2,\quad (t,x) \in [0,T]\times\mathbb{R}^d. \tag{1.9}$$
In addition, there exist constants $\upsilon \ge 0$ and $C'_\alpha$ such that $|\partial_x^\alpha\partial_t V(t,x)| \le C'_\alpha\langle x\rangle^{\upsilon}$, $(t,x) \in [0,T]\times\mathbb{R}^d$, for any $\alpha$. Let $f \in L^2(\mathbb{R}^d)$. Then as $|\Delta|$ tends to zero, $C(\Delta)f$ converges to the solution $U(t,0)f$ of the Schrödinger equation in the topology of $L^2(\mathbb{R}^d)$ uniformly in $t \in [0,T]$.

Our method of proving the Theorem is new and easier than others (cf. Remark 2.1 in the present paper). We use the idea in the theory of difference methods (cf. [15, 16]). We define $\gamma^{t,s}_{x,y} \in (\mathbb{R}^d)^{[s,t]}$ by
$$\gamma^{t,s}_{x,y}(\theta) = y + \frac{\theta - s}{t - s}(x - y)\quad (s \le \theta \le t) \tag{1.10}$$
for $0 \le s < t \le T$ and set
$$C(t,s)f = \begin{cases}\sqrt{m/(2\pi i\hbar(t-s))}^{\,d}\displaystyle\int\exp\big(i\hbar^{-1}S(\gamma^{t,s}_{x,y})\big)f(y)\,dy, & s < t,\\[4pt] f, & s = t.\end{cases} \tag{1.11}$$
Then we can write
$$C(\Delta) = C(t,t_{n-1})C(t_{n-1},t_{n-2})\cdots C(t_1,0). \tag{1.12}$$
Let $\mathcal{S}$ be the space of rapidly decreasing functions on $\mathbb{R}^d$ with semi-norms $|f|_k = \max_{|l|+|\alpha|\le k}\sup_x\{\langle x\rangle^l|\partial_x^\alpha f(x)|\}$ ($k = 0,1,\dots$). First we will prove the consistency of $C(t,s)$, that is,
$$\lim_{\rho\to+0}\Big\|i\hbar\frac{C(s+\rho,s)f - f}{\rho} - H(s)f\Big\| = 0\quad\text{uniformly in } s \in [0,T],\ f \in \{g \in \mathcal{S};\ |g|_{\lambda_1} \le M_1\} \tag{1.13}$$
for constants $\lambda_1$, $M_1$. Secondly we will prove the stability of $C(t,s)$, that is, there exist constants $\rho^* > 0$ and $K \ge 0$ independent of $t$, $s$ such that
$$\|C(t,s)f\| \le (1 + K(t-s))\|f\|,\quad 0 \le t - s \le \rho^* \tag{1.14}$$
for all $f \in L^2(\mathbb{R}^d)$. To prove this we use the theory of pseudo-differential operators. Then, combining these with the results on the existence and the regularity of the solution of (1.4) in [11], the theorem can be easily proved.

The outline of the present paper is as follows. We will prove the consistency (1.13) in Sect. 2. The stability (1.14) will be proved in Sect. 3. In Sect. 4 the theorem will be proved.

2. Consistency

We write $\mathbf{x} = (t,x) \in \mathbb{R}^{d+1}$. As is done in the theory of relativity, we set
$$\mathbf{A} = (-V,A) \in \mathbb{R}^{d+1}\quad\text{and}\quad \boldsymbol{\gamma}^{t,s}_{x,y} : \boldsymbol{\gamma}^{t,s}_{x,y}(\theta) = (\theta,\gamma^{t,s}_{x,y}(\theta)) \in \mathbb{R}^{d+1}\quad (s \le \theta \le t). \tag{2.1}$$
Then we have from (1.2) and (1.10),
$$S(\gamma^{t,s}_{x,y}) = \frac{m|x-y|^2}{2(t-s)} + \int_{\boldsymbol{\gamma}^{t,s}_{x,y}}\mathbf{A}\cdot d\mathbf{x} \tag{2.2}$$
$$= \frac{m|x-y|^2}{2(t-s)} + (x-y)\cdot\int_0^1 A(s+\theta(t-s),\,y+\theta(x-y))\,d\theta - (t-s)\int_0^1 V(s+\theta(t-s),\,y+\theta(x-y))\,d\theta. \tag{2.3}$$
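To make (2.3) concrete, the sketch below evaluates the action of a straight segment in two ways: by integrating the Lagrangian (1.2) along the path (1.10), and by the right-hand side of (2.3). Both use the same midpoint rule, so they should agree up to rounding. The function names and the sample potentials `sample_A` and `sample_V` are hypothetical and chosen only for illustration.

```python
import math

def segment_action_direct(m, A, V, s, t, y, x, n=2000):
    """Integrate L = m|v|^2/2 + v.A - V along the straight path (1.10) by the midpoint rule."""
    rho = t - s
    v = [(xi - yi) / rho for xi, yi in zip(x, y)]
    kinetic = 0.5 * m * sum(vi * vi for vi in v)
    total = 0.0
    for k in range(n):
        theta = s + (k + 0.5) * rho / n
        pos = [yi + (theta - s) * vi for yi, vi in zip(y, v)]
        a = A(theta, pos)
        total += kinetic + sum(vi * ai for vi, ai in zip(v, a)) - V(theta, pos)
    return total * rho / n

def segment_action_formula(m, A, V, s, t, y, x, n=2000):
    """Evaluate the right-hand side of (2.3), with theta running over [0, 1]."""
    rho = t - s
    diff = [xi - yi for xi, yi in zip(x, y)]
    a_int = [0.0] * len(x)
    v_int = 0.0
    for k in range(n):
        th = (k + 0.5) / n
        pos = [yi + th * di for yi, di in zip(y, diff)]
        a = A(s + th * rho, pos)
        a_int = [acc + ai / n for acc, ai in zip(a_int, a)]
        v_int += V(s + th * rho, pos) / n
    return (m * sum(d * d for d in diff) / (2.0 * rho)
            + sum(d * ai for d, ai in zip(diff, a_int)) - rho * v_int)

def sample_A(t, x):
    """Hypothetical smooth vector potential on R^2 (illustration only)."""
    return [math.sin(x[1]) + t, math.cos(x[0])]

def sample_V(t, x):
    """Hypothetical scalar potential (illustration only)."""
    return 0.5 * (x[0] ** 2 + x[1] ** 2) + t * x[0]

direct = segment_action_direct(1.0, sample_A, sample_V, 0.2, 0.9, [0.3, -0.4], [1.1, 0.5])
closed = segment_action_formula(1.0, sample_A, sample_V, 0.2, 0.9, [0.3, -0.4], [1.1, 0.5])
print(abs(direct - closed))
```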
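Let $M \ge 0$ and $p(x,w)$ be a $C^\infty$ function on $\mathbb{R}^{2d}$ satisfying
$$|\partial_w^\alpha\partial_x^\beta p(x,w)| \le C_{\alpha,\beta}\langle x;w\rangle^M,\quad x,w \in \mathbb{R}^d \tag{2.4}$$
for all $\alpha$, $\beta$, where $\langle x;w\rangle = \sqrt{1+|x|^2+|w|^2}$. We define
$$P(t,s)f = \sqrt{\frac{m}{2\pi i\hbar(t-s)}}^{\,d}\int e^{i\hbar^{-1}S(\gamma^{t,s}_{x,y})}\,p\Big(x,\frac{x-y}{\sqrt{t-s}}\Big)f(y)\,dy \tag{2.5}$$
for $0 \le s < t \le T$ and $f \in \mathcal{S}$. Making the change of variables $w = (x-y)/\sqrt{t-s}$ in (2.5), we have from (2.3)
$$P(t,s)f = \sqrt{\frac{m}{2\pi i\hbar}}^{\,d}\int e^{i\hbar^{-1}\phi(t,s;x,w)}\,p(x,w)f(x-\sqrt\rho\,w)\,dw, \tag{2.6}$$
$$\phi(t,s;x,w) = \frac m2|w|^2 + \sqrt\rho\,w\cdot\int_0^1 A(s+\theta\rho,\,x-(1-\theta)\sqrt\rho\,w)\,d\theta - \rho\int_0^1 V(s+\theta\rho,\,x-(1-\theta)\sqrt\rho\,w)\,d\theta,\quad \rho = t-s. \tag{2.7}$$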
We set
$$P(s,s)f = \sqrt{\frac{m}{2\pi i\hbar}}^{\,d}\int e^{i\hbar^{-1}m|w|^2/2}\,p(x,w)\,dw\,f(x), \tag{2.8}$$
where the right-hand side above is the oscillatory integral (cf. [14]). Notice that if $p = 1$, then $P(t,s) = C(t,s)$ holds for $0 \le s \le t \le T$, since we know that $\int e^{i\hbar^{-1}m|w|^2/2}\,dw = \sqrt{2\pi i\hbar/m}^{\,d}$.
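The normalization $\int e^{i\hbar^{-1}m|w|^2/2}\,dw = \sqrt{2\pi i\hbar/m}^{\,d}$ can be illustrated numerically in one dimension (the $d$-dimensional integral factorizes into $d$ copies). The sketch below is only an illustration: it damps the integrand by $e^{-\varepsilon w^2}$, a regularization assumed here for the sake of the quadrature, and compares the result with the closed form $\sqrt{\pi/(\varepsilon - im/(2\hbar))}$, which tends to $\sqrt{2\pi i\hbar/m}$ as $\varepsilon \to 0$. The function names and parameters are assumptions.

```python
import cmath
import math

def regularized_fresnel(m, hbar, eps, half_width=40.0, n=80001):
    """Trapezoid rule for the damped integral of exp((i*m/(2*hbar) - eps) * w**2) over the real line."""
    a = m / (2.0 * hbar)
    dw = 2.0 * half_width / (n - 1)
    total = 0j
    for k in range(n):
        w = -half_width + k * dw
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * cmath.exp((1j * a - eps) * w * w)
    return total * dw

def fresnel_limit(m, hbar):
    """Closed form sqrt(2*pi*i*hbar/m), using the branch sqrt(i) = exp(i*pi/4)."""
    return cmath.exp(1j * math.pi / 4.0) * math.sqrt(2.0 * math.pi * hbar / m)

numeric = regularized_fresnel(1.0, 1.0, 0.02)
damped_exact = cmath.sqrt(math.pi / (0.02 - 0.5j))
print(abs(numeric - damped_exact), abs(numeric - fresnel_limit(1.0, 1.0)))
```

The first printed number measures the quadrature error; the second measures how far the damped integral still is from the undamped limit, and it shrinks as `eps` is decreased (with `half_width` enlarged accordingly).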
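Lemma 2.1. Assume (1.7) and (1.9). Then we have:
(i)
$$|\partial_w^\alpha\partial_x^\beta\phi(t,s;x,w)| \le C_{\alpha,\beta},\quad |\alpha+\beta| \ge 2,\ 0 \le s \le t \le T,\ x,w \in \mathbb{R}^d. \tag{2.9}$$
(ii) There exist constants $\rho_0 > 0$ and $\kappa > 0$ such that
$$\inf_{0\le t-s\le\rho_0,\,x,\,w}\det\frac{\partial^2\phi}{\partial w^2}(t,s;x,w) \ge \kappa, \tag{2.10}$$
where $\partial^2\phi/\partial w^2$ is the Hessian in $w$.

Proof. Let $|\alpha| \ge 1$. Then we have from (1.7),
$$|\partial_x^\alpha A_j(t,x) - \partial_x^\alpha A_j(t,0)| \le \mathrm{Const.}\int_0^1\frac{|x|}{\langle\theta x\rangle^{1+\delta}}\,d\theta \le \mathrm{Const.}\int_0^\infty\frac{1}{\langle\theta\rangle^{1+\delta}}\,d\theta < \infty,$$
and hence
$$|\partial_x^\alpha A_j(t,x)| \le C'_\alpha,\quad |\alpha| \ge 1,\ (t,x) \in [0,T]\times\mathbb{R}^d. \tag{2.11}$$
In the same way we have for $|\alpha| \ge 2$,
$$\Big|\sqrt\rho\,w\cdot\int_0^1(\partial_x^\alpha A)(s+\theta\rho,\,x-(1-\theta)\sqrt\rho\,w)\,d\theta\Big| \le \mathrm{Const.}\int_0^1\frac{\sqrt\rho\,|w|}{\langle x-\theta\sqrt\rho\,w\rangle^{1+\delta}}\,d\theta$$
$$\le \mathrm{Const.}\int_0^\infty\frac{1}{\langle x-\theta\omega\rangle^{1+\delta}}\,d\theta\quad\Big(\omega = \frac{w}{|w|}\Big)$$
$$\le \mathrm{Const.}\int_{-\infty}^\infty\frac{1}{\langle\theta\rangle^{1+\delta}}\,d\theta = C''_\alpha < \infty,\quad 0 \le s \le t \le T,\ x \in \mathbb{R}^d, \tag{2.12}$$
where we used $|x-\theta\omega| \ge |\theta - x\cdot\omega|$. The inequality (2.9) can be shown from assumption (1.9), (2.7), (2.11), and (2.12). So can (2.10), because we have $\partial^2\phi/\partial w^2 = m\,\mathrm{Id} + O(t-s)$, where $\mathrm{Id}$ is the identity matrix.

Proposition 2.2. Assume (1.7) and (1.9) on $V$ and $A$. We suppose (2.4) on $p(x,w)$. Then there exists a constant $0 < \rho_1 \le \rho_0$ such that the mapping $\mathcal{S} \ni f \to P(t,s)f \in \mathcal{S}$ is continuous uniformly in $t$, $s$ with $0 \le t-s \le \rho_1$.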
Proof. We first use the arguments in the proof of Lemma 2.1 in [3]. Let $0 \le t-s \le \rho_0$. Then it follows from Lemma 2.1 in the present paper that each component of $(\partial^2\phi/\partial w^2)^{-1}$ is bounded in $w \in \mathbb{R}^d$. So applying Theorem 1.22 in [17], we see that the mapping $\mathbb{R}^d \ni w \to \zeta = \partial_w\phi(t,s;x,w) = (\partial_{w_1}\phi,\dots,\partial_{w_d}\phi) \in \mathbb{R}^d$ is homeomorphic onto $\mathbb{R}^d$. Let us write the inverse mapping as $\mathbb{R}^d \ni \zeta \to w = w(t,s;x,\zeta) \in \mathbb{R}^d$. Denote by $\partial_\zeta w$ the Jacobi matrix of $w(t,s;x,\zeta)$ in $\zeta$. We have $\partial_\zeta w = (\partial^2\phi/\partial w^2)^{-1}$, and so get the following from Lemma 2.1. There exists a constant $C > 0$ independent of $t$, $s$, $x$, and $\zeta$ such that $|w(t,s;x,\zeta) - w(t,s;x,\zeta')| \le C|\zeta-\zeta'|$, and hence
$$|w - w'| \le C|\partial_w\phi(t,s;x,w) - \partial_w\phi(t,s;x,w')|. \tag{2.13}$$
Let $0 \le t-s \le \rho_0$. Since the mapping $\mathbb{R}^d \ni w \to \zeta = \partial_w\phi(t,s;x,w) \in \mathbb{R}^d$ is homeomorphic onto $\mathbb{R}^d$, the solution $\bar w(t,s,x) = (\bar w_1,\dots,\bar w_d) \in \mathbb{R}^d$ of $\partial_w\phi(t,s;x,w) = 0$ is uniquely determined. We can easily have from Lemma 2.1,
$$|\partial_x^\alpha\bar w_j(t,s,x)| \le C_\alpha,\quad |\alpha| \ge 1,\ 0 \le t-s \le \rho_0,\ x \in \mathbb{R}^d \tag{2.14}$$
for all $j$. So there exists a constant $0 < \rho_1 \le \rho_0$ satisfying
$$\frac12 < \det\big(I - \sqrt{t-s}\,\partial_x\bar w(t,s,x)\big) < 2,\quad 0 \le t-s \le \rho_1,\ x \in \mathbb{R}^d. \tag{2.15}$$
We also have from (2.13),
$$|w - \bar w(t,s,x)| \le C|\partial_w\phi(t,s;x,w)|. \tag{2.16}$$
Let $0 \le t-s \le \rho_1$, $f \in \mathcal{S}$, $L = \langle\partial_w\phi(t,s;x,w)\rangle^{-2}\big(1 - i\hbar\sum_j\partial_{w_j}\phi(t,s;x,w)\,\partial_{w_j}\big)$, and ${}^tL$ the transposed operator of $L$. Then we can write (2.6) as
$$P(t,s)f = \sqrt{\frac{m}{2\pi i\hbar}}^{\,d}\int e^{i\hbar^{-1}\phi}\,({}^tL)^l\{p(x,w)f(x-\sqrt\rho\,w)\}\,dw$$
for $l = 0,1,\dots$. Using Lemma 2.1, (2.14), and (2.16), we get for $N = 0,1,\dots$,
$$|P(t,s)f| \le C_{l,N}\int\langle w-\bar w(t,s,x)\rangle^{-l}\langle x;w\rangle^M\langle x-\sqrt\rho\,w\rangle^{-N}\,dw$$
$$= C_{l,N}\int\langle w\rangle^{-l}\langle x;w+\bar w\rangle^M\langle x-\sqrt\rho(w+\bar w)\rangle^{-N}\,dw$$
$$\le C'_{l,N}\int\langle w\rangle^{-l}\langle x\rangle^M\langle w\rangle^M\langle x-\sqrt\rho\,\bar w-\sqrt\rho\,w\rangle^{-N}\,dw.$$
The inequality $\langle x\rangle \le \mathrm{Const.}\langle x-\sqrt\rho\,\bar w(t,s,x)\rangle \le \mathrm{Const.}\langle x-\sqrt\rho\,\bar w-\sqrt\rho\,w\rangle\langle\sqrt\rho\,w\rangle$ follows from (2.14) and (2.15). Consequently we have
$$|P(t,s)f| \le C''_{l,N}\int\langle w\rangle^{-l+2M}\langle x-\sqrt\rho\,\bar w-\sqrt\rho\,w\rangle^{-N+M}\,dw.$$
Take $l > 2M+d$ and $N > M+d/2$. Using the Schwarz inequality and (2.15), we obtain
$$\|P(t,s)f\|^2 \le C'\Big(\int\langle w\rangle^{-l+2M}\,dw\Big)^2\int\langle x\rangle^{-2(N-M)}\,dx < \infty \tag{2.17}$$
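with a constant $C'$. In the same way we can show $\partial_x^\alpha\{\langle x\rangle^k P(t,s)f\} \in L^2$ from Lemma 2.1 for all $k$ and $\alpha$. Hence we can complete the proof by the Sobolev inequality.

Proposition 2.3. Let $V$ and $A$ satisfy the assumptions of the theorem. Then there exists a continuous function $q(t,s;x,w)$ in $0 \le s \le t \le T$, $x,w \in \mathbb{R}^d$ satisfying (2.4) with an $M$ so that for $f \in \mathcal{S}$
$$\Big\{i\hbar\frac{\partial}{\partial t} - H(t)\Big\}C(t,s)f = \sqrt{t-s}\,\sqrt{\frac{m}{2\pi i\hbar(t-s)}}^{\,d}\int e^{i\hbar^{-1}S(\gamma^{t,s}_{x,y})}\,q\Big(t,s;x,\frac{x-y}{\sqrt{t-s}}\Big)f(y)\,dy,\quad 0 \le s < t \le T. \tag{2.18}$$

Proof. By direct calculations we have from (1.3),
$$\Big\{i\hbar\frac{\partial}{\partial t} - H(t)\Big\}C(t,s)f = -\sqrt{\frac{m}{2\pi i\hbar(t-s)}}^{\,d}\int e^{i\hbar^{-1}S(\gamma^{t,s}_{x,y})}\Big\{r_1(t,s;x,y) + \frac{i\hbar}{2m}r_2(t,s;x,y)\Big\}f(y)\,dy, \tag{2.19}$$
$$r_1 = \partial_tS(\gamma^{t,s}_{x,y}) + \frac{1}{2m}\sum_{j=1}^d\{\partial_{x_j}S(\gamma^{t,s}_{x,y}) - A_j(t,x)\}^2 + V(t,x), \tag{2.20}$$
$$r_2 = \frac{dm}{t-s} - \Delta_xS(\gamma^{t,s}_{x,y}) + (\nabla\cdot A)(t,x), \tag{2.21}$$
where $\nabla\cdot A = \sum_j\partial_{x_j}A_j$. Set $\rho = t-s$. Using (2.3), we have
$$\partial_{x_j}S(\gamma^{t,s}_{x,y}) - A_j(t,x) = \frac{m(x_j-y_j)}{\rho} + \int_0^1\{A_j(s+\theta\rho,\,y+\theta(x-y)) - A_j(t,x)\}\,d\theta$$
$$\quad + \sum_k(x_k-y_k)\int_0^1\theta\frac{\partial A_k}{\partial x_j}(s+\theta\rho,\,y+\theta(x-y))\,d\theta - \rho\int_0^1\theta\frac{\partial V}{\partial x_j}(s+\theta\rho,\,y+\theta(x-y))\,d\theta$$
$$= \frac{m(x_j-y_j)}{\rho} + \int_0^1\{A_j(t-\theta\rho,\,x-\theta(x-y)) - A_j(t,x)\}\,d\theta$$
$$\quad + \sum_k(x_k-y_k)\int_0^1(1-\theta)\frac{\partial A_k}{\partial x_j}(t-\theta\rho,\,x-\theta(x-y))\,d\theta - \rho\int_0^1(1-\theta)\frac{\partial V}{\partial x_j}(t-\theta\rho,\,x-\theta(x-y))\,d\theta,$$
and so by the Taylor expansion
$$\partial_{x_j}S(\gamma^{t,s}_{x,y}) - A_j(t,x) = \frac{m(x_j-y_j)}{\rho} - \frac12\sum_l\frac{\partial A_j}{\partial x_l}(t,x)(x_l-y_l) + \frac12\sum_k\frac{\partial A_k}{\partial x_j}(t,x)(x_k-y_k) + \rho\,q_1\Big(t,s;x,\frac{x-y}{\sqrt\rho}\Big). \tag{2.22}$$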
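It follows from
$$-\sum_{j,l}(\partial_{x_l}A_j)(x_j-y_j)(x_l-y_l) + \sum_{j,k}(\partial_{x_j}A_k)(x_j-y_j)(x_k-y_k) = 0$$
that
$$\frac{1}{2m}\sum_{j=1}^d\{\partial_{x_j}S(\gamma^{t,s}_{x,y}) - A_j(t,x)\}^2 = \frac{m|x-y|^2}{2\rho^2} + \sqrt\rho\,q_2\Big(t,s;x,\frac{x-y}{\sqrt\rho}\Big). \tag{2.23}$$
The same arguments show that
$$\partial_tS(\gamma^{t,s}_{x,y}) = -\frac{m|x-y|^2}{2\rho^2} - V(t,x) + \sqrt\rho\,q_3\Big(t,s;x,\frac{x-y}{\sqrt\rho}\Big), \tag{2.24}$$
$$\Delta_xS(\gamma^{t,s}_{x,y}) = \frac{dm}{\rho} + (\nabla\cdot A)(t,x) + \sqrt\rho\,q_4\Big(t,s;x,\frac{x-y}{\sqrt\rho}\Big). \tag{2.25}$$
Inserting (2.23)–(2.25) into (2.19)–(2.21), we can complete the proof.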
Remark 2.1. By replacing $\gamma^{t,s}_{x,y}$ in (1.11) with the classical path $\gamma^{t,s}_{c,x,y}$ joining $(s,y)$ and $(t,x)$ we define the operator $C_c(t,s)$. In [7, 8, 19], applying the WKB method, they showed the result below. There exists a $q_c(t,s;x,y)$ satisfying (2.4) with $M = 0$ so that for $f \in \mathcal{S}$,
$$\Big\{i\hbar\frac{\partial}{\partial t} - H(t)\Big\}C_c(t,s)f = (t-s)\sqrt{\frac{m}{2\pi i\hbar(t-s)}}^{\,d}\int e^{i\hbar^{-1}S(\gamma^{t,s}_{c,x,y})}\,q_c(t,s;x,y)f(y)\,dy.$$
This result is essential in their proof for the formulation through piecewise classical paths. In our formulation through broken line paths such a result does not hold. In fact we get the following as an example. Let $d = 3$, $V = 0$, and $A = (x_2,0,0)$. Then we have $S(\gamma^{t,s}_{x,y}) = m|x-y|^2/(2(t-s)) + (x_1-y_1)(x_2+y_2)/2$ from (2.3). Hence it follows from (2.19)–(2.21) that
$$\Big\{i\hbar\frac{\partial}{\partial t} - H(t)\Big\}C(t,s)f = -\sqrt{\frac{m}{2\pi i\hbar(t-s)}}^{\,d}\int e^{i\hbar^{-1}S(\gamma^{t,s}_{x,y})}\,\frac{1}{8m}\{(x_1-y_1)^2 + (x_2-y_2)^2\}f(y)\,dy.$$
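The example in Remark 2.1 can be checked numerically. The sketch below is an illustration only (the function names and the finite-difference step are assumptions): it evaluates $r_1$ from (2.20) and $r_2$ from (2.21) for $d = 3$, $V = 0$, $A = (x_2,0,0)$ by central differences of the explicit action, and compares them with $\{(x_1-y_1)^2 + (x_2-y_2)^2\}/(8m)$ and $0$ respectively.

```python
def action(m, t, s, x, y):
    """Broken-line action from (2.3) for V = 0 and A = (x_2, 0, 0) in d = 3."""
    rho = t - s
    kinetic = m * sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / (2.0 * rho)
    return kinetic + (x[0] - y[0]) * (x[1] + y[1]) / 2.0

def remainders(m, t, s, x, y, h=1e-4):
    """Approximate r1 of (2.20) and r2 of (2.21) by central finite differences."""
    A = (x[1], 0.0, 0.0)  # vector potential at (t, x); its divergence vanishes
    dS_dt = (action(m, t + h, s, x, y) - action(m, t - h, s, x, y)) / (2 * h)
    S0 = action(m, t, s, x, y)
    grad_term, laplacian = 0.0, 0.0
    for j in range(3):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        Sp, Sm = action(m, t, s, xp, y), action(m, t, s, xm, y)
        grad_term += ((Sp - Sm) / (2 * h) - A[j]) ** 2
        laplacian += (Sp - 2 * S0 + Sm) / (h * h)
    r1 = dS_dt + grad_term / (2.0 * m)   # V = 0
    r2 = 3.0 * m / (t - s) - laplacian   # d = 3 and div A = 0
    return r1, r2

m, t, s = 1.3, 0.9, 0.4
x, y = (0.7, -0.2, 1.1), (0.1, 0.5, 0.3)
r1, r2 = remainders(m, t, s, x, y)
expected = ((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2) / (8.0 * m)
print(r1 - expected, r2)
```

Both printed numbers should be small; note that the remainder $r_1$ does not carry a factor $t-s$, which is the point of the remark.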
The theorem below follows from Propositions 2.2 and 2.3.

Theorem 2.4. Let $V$ and $A$ satisfy the assumptions of the Theorem. Then there exist a $\lambda_1$ and an $M_1$ so that (1.13) holds.

3. Stability

We assume (1.7) and (1.9) on $V$ and $A$. Then Proposition 2.2 holds. Let $0 \le t-s \le \rho_1$. The formal adjoint operator $C(t,s)^*$ of $C(t,s)$, defined by $(C(t,s)^*f,g) = (f,C(t,s)g)$ for $f,g \in \mathcal{S}$, is written as
$$C(t,s)^*f = \begin{cases}\sqrt{im/(2\pi\hbar(t-s))}^{\,d}\displaystyle\int\exp\big(-i\hbar^{-1}S(\gamma^{t,s}_{y,x})\big)f(y)\,dy, & s < t,\\[4pt] f, & s = t.\end{cases} \tag{3.1}$$
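By the same arguments as in the proof of Proposition 2.2 we can prove that the mapping $\mathcal{S} \ni f \to C(t,s)^*f \in \mathcal{S}$ is continuous uniformly in $0 \le t-s \le \rho_1$. Here, if necessary, we replace $\rho_1 > 0$ by a smaller constant. Consequently the continuous mapping $C(t,s)$ on $\mathcal{S}$ can be extended to that on $\mathcal{S}'$ by $(C(t,s)f,g) = (f,C(t,s)^*g)$ for $f \in \mathcal{S}'$, $g \in \mathcal{S}$. So can $C(t,s)^*$. Here $\mathcal{S}'$ is the dual space of $\mathcal{S}$.

Take an infinitely differentiable function $\chi(x)$ on $\mathbb{R}^d$ with compact support such that $\chi(x) = 1$ on $|x| \le 1$. Let $0 < t-s \le \rho_1$ and $f \in \mathcal{S}$. Then $C(t,s)f \in \mathcal{S}$ and $C(t,s)^*$ is continuous on $\mathcal{S}$. Since $\lim_{\epsilon\to+0}\chi(\epsilon\,\cdot)C(t,s)f = C(t,s)f$ in $\mathcal{S}$, we can write
$$C(t,s)^*C(t,s)f = \lim_{\epsilon\to+0}C(t,s)^*\chi(\epsilon\,\cdot)C(t,s)f$$
$$= \lim_{\epsilon\to+0}\Big(\frac{m}{2\pi\hbar(t-s)}\Big)^d\int f(y)\,dy\int\chi(\epsilon z)\exp\{-i\hbar^{-1}(S(\gamma^{t,s}_{z,x}) - S(\gamma^{t,s}_{z,y}))\}\,dz. \tag{3.2}$$
We have by (2.3),
$$S(\gamma^{t,s}_{z,x}) - S(\gamma^{t,s}_{z,y}) = \frac{m|z-x|^2}{2(t-s)} - \frac{m|z-y|^2}{2(t-s)} + \int_{\boldsymbol{\gamma}^{t,s}_{z,x}}\mathbf{A}\cdot d\mathbf{x} - \int_{\boldsymbol{\gamma}^{t,s}_{z,y}}\mathbf{A}\cdot d\mathbf{x}.$$
Consequently, letting $\boldsymbol{\gamma}^{s,s}_{x,y} : \boldsymbol{\gamma}^{s,s}_{x,y}(\theta) = (s,\,y+\theta(x-y)) \in \mathbb{R}^{d+1}$ ($0 \le \theta \le 1$), we get by the Stokes theorem,
$$S(\gamma^{t,s}_{z,x}) - S(\gamma^{t,s}_{z,y}) = -(x-y)\cdot\frac{m}{t-s}\Big(z - \frac{x+y}{2}\Big) - \int_{\boldsymbol{\gamma}^{s,s}_{x,y}}\mathbf{A}\cdot d\mathbf{x} - \iint_\Delta d(\mathbf{A}\cdot d\mathbf{x})$$
$$= -(x-y)\cdot\frac{m}{t-s}\Big(z - \frac{x+y}{2}\Big) - (x-y)\cdot\int_0^1 A(s,\,y+\theta(x-y))\,d\theta - \iint_\Delta d(\mathbf{A}\cdot d\mathbf{x}), \tag{3.3}$$
where $\Delta = \Delta(t,s,x,y,z)$ is the 2-dimensional plane with oriented boundary consisting of $-\boldsymbol{\gamma}^{s,s}_{x,y}$, $\boldsymbol{\gamma}^{t,s}_{z,y}$, and $-\boldsymbol{\gamma}^{t,s}_{z,x}$. The following is well known. We can prove it from (1.1) and (2.1).

Lemma 3.1. We have
$$d(\mathbf{A}\cdot d\mathbf{x}) = -\sum_{j=1}^d E_j(t,x)\,dt\wedge dx_j + \sum_{1\le j<k\le d}B_{jk}\,dx_j\wedge dx_k\quad\text{on } [0,T]\times\mathbb{R}^d.$$

Let $x \ne y$. We introduce coordinates $\sigma = (\sigma_1,\sigma_2)$ with $0 \le \sigma_1,\sigma_2 \le 1$ in $\Delta = \Delta(t,s,x,y,z)$ by
$$(\tau(\sigma),\zeta(\sigma)) = (1-\sigma_2)\{(1-\sigma_1)(t,z) + \sigma_1(s,x)\} + \sigma_2\{(1-\sigma_1)(t,z) + \sigma_1(s,y)\}$$
$$= (t - \sigma_1(t-s),\ z + \sigma_1(x-z) + \sigma_1\sigma_2(y-x)) \in \mathbb{R}^{d+1}. \tag{3.4}$$
It is clear that $\sigma$ gives the positive orientation of $\Delta$.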
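Lemma 3.2. Set $B_{jk} = -B_{kj}$ for $1 \le k < j \le d$ and $B_{jj} = 0$ for $j = 1,2,\dots,d$. Then we have
$$\iint_\Delta d(\mathbf{A}\cdot d\mathbf{x}) = (x-y)\cdot(\Psi_1,\dots,\Psi_d), \tag{3.5}$$
$$\Psi_j = -(t-s)\int_0^1\!\!\int_0^1\sigma_1E_j(\tau(\sigma),\zeta(\sigma))\,d\sigma_1d\sigma_2 - \sum_{k=1}^d(z_k-x_k)\int_0^1\!\!\int_0^1\sigma_1B_{jk}(\tau(\sigma),\zeta(\sigma))\,d\sigma_1d\sigma_2. \tag{3.6}$$

Proof. We have by (3.4),
$$\iint_\Delta E_j\,dt\wedge dx_j = \int_0^1\!\!\int_0^1E_j(\tau(\sigma),\zeta(\sigma))\det\frac{\partial(\tau,\zeta_j)}{\partial(\sigma_1,\sigma_2)}\,d\sigma_1d\sigma_2 = (t-s)(x_j-y_j)\int_0^1\!\!\int_0^1\sigma_1E_j\,d\sigma_1d\sigma_2, \tag{3.7}$$
$$\iint_\Delta B_{jk}\,dx_j\wedge dx_k = \int_0^1\!\!\int_0^1B_{jk}\det\frac{\partial(\zeta_j,\zeta_k)}{\partial(\sigma_1,\sigma_2)}\,d\sigma_1d\sigma_2$$
$$= -\{(x_k-y_k)(x_j-z_j) - (x_j-y_j)(x_k-z_k)\}\int_0^1\!\!\int_0^1\sigma_1B_{jk}\,d\sigma_1d\sigma_2, \tag{3.8}$$
and hence from Lemma 3.1,
$$\iint_\Delta d(\mathbf{A}\cdot d\mathbf{x}) = -(t-s)\sum_{j=1}^d(x_j-y_j)\int_0^1\!\!\int_0^1\sigma_1E_j\,d\sigma_1d\sigma_2$$
$$\quad - \sum_{1\le j<k\le d}\{(x_k-y_k)(x_j-z_j) - (x_j-y_j)(x_k-z_k)\}\int_0^1\!\!\int_0^1\sigma_1B_{jk}\,d\sigma_1d\sigma_2$$
$$= -(t-s)\sum_{j=1}^d(x_j-y_j)\int_0^1\!\!\int_0^1\sigma_1E_j\,d\sigma_1d\sigma_2 + \sum_{j,k=1}^d(x_j-y_j)(x_k-z_k)\int_0^1\!\!\int_0^1\sigma_1B_{jk}\,d\sigma_1d\sigma_2.$$
Thus Lemma 3.2 is proved.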
We can easily prove the following from (3.2), (3.3), and Lemma 3.2.

Proposition 3.3. Assume (1.7) and (1.9). Let $f \in \mathcal{S}$. Then we have for $0 < t-s \le \rho_1$,
$$C(t,s)^*C(t,s)f = \lim_{\epsilon\to+0}\Big(\frac{m}{2\pi\hbar(t-s)}\Big)^d\int f(y)\,dy\int\chi(\epsilon z)\exp\Big(i(x-y)\cdot\frac{m\Phi}{\hbar(t-s)}\Big)dz,$$
$$\Phi = \Phi(t,s;x,y,z) = (\Phi_1,\dots,\Phi_d), \tag{3.9}$$
$$\Phi_j = z_j - \frac{x_j+y_j}{2} + \frac{t-s}{m}\int_0^1A_j(s,\,x+\theta(y-x))\,d\theta - \frac{t-s}{m}\sum_{k=1}^d(z_k-x_k)\int_0^1\!\!\int_0^1\sigma_1B_{jk}(\tau(\sigma),\zeta(\sigma))\,d\sigma_1d\sigma_2$$
$$\quad - \frac{(t-s)^2}{m}\int_0^1\!\!\int_0^1\sigma_1E_j(\tau(\sigma),\zeta(\sigma))\,d\sigma_1d\sigma_2. \tag{3.10}$$
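The lemma below follows from (1.1).

Lemma 3.4. Assume (1.7)–(1.9). Then we get
$$|\partial_x^\alpha E_j(t,x)| \le C_\alpha,\quad |\alpha| \ge 1, \tag{3.11}$$
$$|\partial_x^\alpha B_{jk}(t,x)| \le C_\alpha\langle x\rangle^{-(1+\delta)},\quad |\alpha| \ge 1,\ (t,x) \in [0,T]\times\mathbb{R}^d. \tag{3.12}$$

Lemma 3.5. Let $f \in C^\infty(\mathbb{R}^d)$ and $|\partial_x^\alpha f| \le C_\alpha\langle x\rangle^{-(1+\delta)}$, $|\alpha| \ge 1$. Then we have:
(i) $f$ is a bounded function on $\mathbb{R}^d$.
(ii)
$$|x-z|\,\Big|\partial_x^\alpha\partial_y^\beta\partial_z^\gamma\int_0^1\!\!\int_0^1\sigma_1f(z+\sigma_1(x-z)+\sigma_1\sigma_2(y-x))\,d\sigma_1d\sigma_2\Big| \le C_{\alpha,\beta,\gamma},\quad |\alpha+\beta+\gamma| \ge 1,\ x,y,z \in \mathbb{R}^d.$$

Proof. (i) follows from
$$|f(x) - f(0)| \le C\int_0^1\frac{|x|}{\langle\theta x\rangle^{1+\delta}}\,d\theta \le C\int_0^\infty\frac{1}{\langle\theta\rangle^{1+\delta}}\,d\theta < \infty.$$
We have from the assumption
$$|x-z|\,\Big|\frac{\partial}{\partial z_j}\int_0^1\!\!\int_0^1\sigma_1f(z+\sigma_1(x-z)+\sigma_1\sigma_2(y-x))\,d\sigma_1d\sigma_2\Big| \le C|x-z|\int_0^1\!\!\int_0^1\frac{\sigma_1}{\langle z+\sigma_1(x-z)+\sigma_1\sigma_2(y-x)\rangle^{1+\delta}}\,d\sigma_1d\sigma_2$$
$$= C\int_0^1\!\!\int_0^1\frac{\sigma_1|\xi|}{\langle z+\sigma_1\xi+\sigma_1\sigma_2\eta\rangle^{1+\delta}}\,d\sigma_1d\sigma_2,$$
setting $\xi = x-z$ and $\eta = y-x$. Making the change of variables $\sigma_1' = \sigma_1$ and $\sigma_2' = \sigma_1\sigma_2$, we get
$$\int_0^1\!\!\int_0^1\frac{\sigma_1|\xi|}{\langle z+\sigma_1\xi+\sigma_1\sigma_2\eta\rangle^{1+\delta}}\,d\sigma_1d\sigma_2 = \int_0^1d\sigma_2'\int_{\sigma_2'}^1\frac{|\xi|}{\langle z+\sigma_1'\xi+\sigma_2'\eta\rangle^{1+\delta}}\,d\sigma_1'$$
$$= \int_0^1d\sigma_2'\int_{\sigma_2'|\xi|}^{|\xi|}\frac{1}{\langle z+\sigma_1'\omega+\sigma_2'\eta\rangle^{1+\delta}}\,d\sigma_1'\quad(\omega = \xi/|\xi|).$$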
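This is bounded by
$$\int_0^1d\sigma_2'\int_{-\infty}^\infty\frac{1}{\langle\sigma_1'\rangle^{1+\delta}}\,d\sigma_1' < \infty$$
because of $|z+\sigma_1'\omega+\sigma_2'\eta| \ge |\sigma_1' + \omega\cdot(z+\sigma_2'\eta)|$. Hence we see that $|x-z|\,|\partial_{z_j}\int_0^1\!\int_0^1\sigma_1f(z+\sigma_1(x-z)+\sigma_1\sigma_2(y-x))\,d\sigma_1d\sigma_2|$ is bounded on $\mathbb{R}^{3d}_{x,y,z}$. The general case can be proved in the same way.

We write $\Phi$ defined by (3.10) as
$$\Phi(t,s;x,y,z) = z - \frac{x+y}{2} + \frac{t-s}{m}\int_0^1A(s,\,x+\theta(y-x))\,d\theta - \frac{t-s}{m}B'(t,s;x,y,z) - \frac{(t-s)^2}{m}E'(t,s;x,y,z), \tag{3.13}$$
where $E' = (E_1',\dots,E_d')$, $B' = (B_1',\dots,B_d')$. Assume (1.7)–(1.9). Then it follows from Lemmas 3.4 and 3.5 that
$$|\partial_x^\alpha\partial_y^\beta\partial_z^\gamma E_j'| \le C_{\alpha,\beta,\gamma},\quad |\alpha+\beta+\gamma| \ge 1, \tag{3.14}$$
$$|\partial_x^\alpha\partial_y^\beta\partial_z^\gamma B_j'| \le C_{\alpha,\beta,\gamma},\quad |\alpha+\beta+\gamma| \ge 1,\ 0 \le s \le t \le T,\ x,y,z \in \mathbb{R}^d. \tag{3.15}$$
Note
$$\frac{\partial\Phi}{\partial z}(t,s;x,y,z) = \mathrm{Id} - \frac{t-s}{m}\frac{\partial B'}{\partial z} - \frac{(t-s)^2}{m}\frac{\partial E'}{\partial z}. \tag{3.16}$$
So it follows from (3.14) and (3.15) that Theorem 1.22 in [17] can be applied to the mapping $\mathbb{R}^d \ni z \to \xi = \Phi \in \mathbb{R}^d$. Then there exists a constant $0 < \rho^*$ ($\le \rho_1$) such that the mapping above is homeomorphic for each fixed $0 \le t-s \le \rho^*$, $x$, and $y$. We write its inverse mapping as $\mathbb{R}^d \ni \xi \to z = z(t,s;x,y,\xi) \in \mathbb{R}^d$. There we can assume $\det\partial z/\partial\xi > 0$ for $0 \le t-s \le \rho^*$, $x$, $y$, and $\xi$.

Lemma 3.6. Assume (1.7)–(1.9). Then we have
$$|\partial_x^\alpha\partial_y^\beta\partial_\xi^\gamma z_j(t,s;x,y,\xi)| \le C_{\alpha,\beta,\gamma},\quad |\alpha+\beta+\gamma| \ge 1,\ 0 \le t-s \le \rho^*,\ x,y,\xi \in \mathbb{R}^d. \tag{3.17}$$

Proof. It follows from (1.7), Lemma 3.5, and (3.13)–(3.15) that $|\partial_x^\alpha\partial_y^\beta\partial_z^\gamma\Phi| \le C_{\alpha,\beta,\gamma}$, $|\alpha+\beta+\gamma| \ge 1$, $0 \le s \le t \le T$, $x,y,z \in \mathbb{R}^d$. Hence we can easily prove Lemma 3.6.

Theorem 3.7. Assume (1.7)–(1.9). Then there exists a constant $K \ge 0$ independent of $t$, $s$ such that (1.14) holds.

Proof. Let $f \in \mathcal{S}$ and $0 < t-s \le \rho^*$. We can write from Proposition 3.3,
$$C(t,s)^*C(t,s)f = \lim_{\epsilon\to+0}\Big(\frac{m}{2\pi\hbar(t-s)}\Big)^d\int f(y)\,dy\int\exp\Big(i(x-y)\cdot\frac{m\xi}{\hbar(t-s)}\Big)\chi(\epsilon z(t,s;x,y,\xi))\det\frac{\partial z}{\partial\xi}\,d\xi$$
because of $\det\partial z/\partial\xi > 0$. It follows from (3.14)–(3.17) that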
$$\det\frac{\partial z}{\partial\xi} = 1 + (t-s)h(t,s;x,y,\xi), \tag{3.18}$$
$$|\partial_x^\alpha\partial_y^\beta\partial_\xi^\gamma h(t,s;x,y,\xi)| \le C_{\alpha,\beta,\gamma}\quad\text{for all } \alpha,\beta,\gamma. \tag{3.19}$$
Consequently we obtain
$$C(t,s)^*C(t,s)f = \lim_{\epsilon\to+0}\Big(\frac{1}{2\pi}\Big)^d\int f(y)\,dy\int e^{i(x-y)\cdot\eta}\chi(\epsilon z(t,s;x,y,\xi))\{1+(t-s)h(t,s;x,y,\xi)\}\,d\eta,\quad \xi = \hbar(t-s)\eta/m.$$
Therefore using Lemma 3.6 and (3.19), we can represent $C(t,s)^*C(t,s)$ as a pseudo-differential operator
$$C(t,s)^*C(t,s)f = \Big(\frac{1}{2\pi}\Big)^d\int f(y)\,dy\ \mathrm{Os}\!-\!\!\int e^{i(x-y)\cdot\eta}\{1+(t-s)h(t,s;x,y,\hbar(t-s)\eta/m)\}\,d\eta$$
$$= f + (t-s)\Big(\frac{1}{2\pi}\Big)^d\mathrm{Os}\!-\!\!\iint e^{i(x-y)\cdot\eta}h(t,s;x,y,\hbar(t-s)\eta/m)f(y)\,dy\,d\eta. \tag{3.20}$$
Here $\mathrm{Os}\!-\!\int(\cdots)\,d\eta$ denotes the oscillatory integral (cf. [14]). Apply the Calderón–Vaillancourt theorem (cf. [14]) to (3.20). Then there exists a constant $K' \ge 0$ such that
$$\|C(t,s)^*C(t,s)f\| \le (1+K'(t-s))\|f\|,\quad 0 \le t-s \le \rho^* \tag{3.21}$$
for all $f \in \mathcal{S}$, which shows (1.14) for $f \in \mathcal{S}$: indeed $\|C(t,s)f\|^2 = (C(t,s)^*C(t,s)f,f) \le (1+K'(t-s))\|f\|^2$, so that (1.14) holds with $K = K'/2$. Since the operator $C(t,s)$ on $\mathcal{S}'$ is continuous, (1.14) holds for all $f \in L^2$.

Remark 3.1. Let $\Delta : 0 = t_0 < t_1 < \cdots < t_n = t$ be a subdivision of the interval $[0,t]$ such that $|\Delta| \le \rho^*$. The stability, i.e., Theorem 3.7, gives
$$\|C(t,t_{n-1})\cdots C(t_1,0)f\| \le \exp\Big(K\sum_{j=1}^n(t_j-t_{j-1})\Big)\|f\| = e^{Kt}\|f\|,\quad f \in L^2 \tag{3.22}$$
because of $1+K(t-s) \le e^{K(t-s)}$ for $t-s \ge 0$.
4. Proof of the Theorem and a Remark on the Gauge Transformation

We denote by $p(x,D_x)$ the pseudo-differential operator $p(x,D_x)f = (2\pi)^{-d}\iint e^{i(x-y)\cdot\xi}p(x,\xi)f(y)\,dy\,d\xi$ with symbol $p(x,\xi)$. For a constant $a \ge 0$ we define the weighted Sobolev space $B^a$ by $B^a = \{f \in L^2;\ \|f\|_{B^a} \equiv \|\langle\cdot\rangle^af\| + \|\langle\cdot\rangle^a\hat f\| < \infty\}$ and denote its dual space by $B^{-a}$ with norm $\|f\|_{B^{-a}}$, where $\hat f$ is the Fourier transform $\int e^{-ix\cdot\xi}f(x)\,dx$. The space of all $B^a$-valued $j$ times continuously differentiable functions in $t \in [s,T]$ is denoted by $C_t^j([s,T];B^a)$. In [11] we showed the following.
Proposition 4.1. Assume $|\partial_x^\alpha A_j(t,x)| \le C_\alpha$, $|\alpha| \ge 1$, and $|\partial_x^\alpha V(t,x)| \le C_\alpha\langle x\rangle$, $|\alpha| \ge 1$, $(t,x) \in [0,T]\times\mathbb{R}^d$. Then for any $f \in B^a$ ($-\infty < a < \infty$) there exists a unique solution $U(t,s)f \in C_t^0([s,T];B^a)\cap C_t^1([s,T];B^{a-2})$ of (1.4). In addition, there exists a constant $C_a(T)$ such that
$$\|U(t,s)f\|_{B^a} \le C_a(T)\|f\|_{B^a},\quad 0 \le s \le t \le T. \tag{4.1}$$
In particular, when $a = 0$, we have
$$\|U(t,s)f\| = \|f\|,\quad 0 \le s \le t \le T. \tag{4.2}$$

Lemma 4.2. Let $V$ and $A$ satisfy the assumptions of Proposition 4.1. Then we have:
(i) $U(t,s) : \mathcal{S} \ni f \to U(t,s)f \in \mathcal{S}$ is continuous uniformly in $0 \le s \le t \le T$.
(ii) There exist constants $\lambda_2$ and $M_2$ such that
$$\lim_{\rho\to+0}\Big\|i\hbar\frac{U(s+\rho,s)f - f}{\rho} - H(s)f\Big\| = 0\quad\text{uniformly in } s \in [0,T],\ f \in \{g \in \mathcal{S};\ |g|_{\lambda_2} \le M_2\}. \tag{4.3}$$

Proof. Let $a \ge 0$. Then there exist a constant $\mu_a \ge 0$ and a $w_a(x,\xi)$ such that
$$|\partial_x^\alpha\partial_\xi^\beta w_a(x,\xi)| \le C_{\alpha,\beta}(\langle x\rangle^a + \langle\xi\rangle^a)^{-1} \tag{4.4}$$
for all $\alpha$ and $\beta$ and
$$w_a(x,D_x) = (\mu_a + \langle x\rangle^a + \langle D_x\rangle^a)^{-1}\quad\text{on } \mathcal{S} \tag{4.5}$$
(Lemma 2.3 in [11]). Now for an $l \ge 0$ and a multi-index $\alpha$ set $a = 2\max(l,|\alpha|)$. Then applying Lemma 2.1 in [11], we have from (4.4) and (4.5), $\|\langle x\rangle^l\partial_x^\alpha(\mu_a + \langle x\rangle^a + \langle D_x\rangle^a)^{-1}f\| \le \mathrm{Const.}\|f\|$, and so $\|\langle x\rangle^l\partial_x^\alpha f\| \le \mathrm{Const.}(\|\langle\cdot\rangle^af\| + \|\langle\cdot\rangle^a\hat f\|) = \mathrm{Const.}\|f\|_{B^a}$. Consequently, applying the Sobolev inequality, we get the following. For any $k = 0,1,\dots$ there exist constants $k' \ge 0$ and $C_{k,k'}$ such that
$$|f|_k \le C_{k,k'}\|f\|_{B^{k'}}. \tag{4.6}$$
Hence we have from (4.1) in Proposition 4.1,
$$|U(t,s)f|_k \le C_{k,k'}\|U(t,s)f\|_{B^{k'}} \le C'_{k,k'}\|f\|_{B^{k'}}, \tag{4.7}$$
and so (i). Consider (ii). Let $f \in \mathcal{S}$. Then we have from Proposition 4.1,
$$i\hbar\frac{U(s+\rho,s)f - f}{\rho} - H(s)f = i\hbar\frac{\partial U}{\partial t}(s+\theta\rho,s)f - H(s)f$$
$$= H(s+\theta\rho)\{U(s+\theta\rho,s)f - f\} + \{H(s+\theta\rho) - H(s)\}f$$
$$= \theta\rho\,H(s+\theta\rho)\frac{\partial U}{\partial t}(s+\theta'\theta\rho,s)f + \{H(s+\theta\rho) - H(s)\}f$$
for $0 < \theta,\theta' < 1$. We can easily see from the assumptions on $V$ and $A$ that, applying Lemma 2.5 in [11], we have
$$\|H(t)f\|_{B^a} \le \mathrm{Const.}\|f\|_{B^{a+2}},\quad 0 \le t \le T \tag{4.8}$$
for any $a \in \mathbb{R}$. Consequently we have from (4.1),
$$\Big\|\frac{\partial U}{\partial t}(t,s)f\Big\|_{B^2} \le \mathrm{Const.}\|U(t,s)f\|_{B^4} \le \mathrm{Const.}\|f\|_{B^4},\quad 0 \le s \le t \le T,$$
and hence
$$\Big\|i\hbar\frac{U(s+\rho,s)f - f}{\rho} - H(s)f\Big\| \le \mathrm{Const.}\,\theta\rho\Big\|\frac{\partial U}{\partial t}(s+\theta'\theta\rho,s)f\Big\|_{B^2} + \|(H(s+\theta\rho) - H(s))f\|$$
$$\le \mathrm{Const.}\,\rho\|f\|_{B^4} + \|(H(s+\theta\rho) - H(s))f\|,\quad 0 \le s \le T.$$
Therefore we obtain for any $M < \infty$,
$$\lim_{\rho\to+0}\Big\|i\hbar\frac{U(s+\rho,s)f - f}{\rho} - H(s)f\Big\| = 0\quad\text{uniformly in } s \in [0,T],\ f \in \{g \in B^4;\ \|g\|_{B^4} \le M\}, \tag{4.9}$$
and hence (ii).

Let us prove the Theorem. We use the idea in the theory of difference methods (cf. [15, 16]). At first we get the following from Theorem 2.4 and (ii) in Lemma 4.2. Set $\lambda_3 = \max(\lambda_1,\lambda_2)$ and $M_3 = \min(M_1,M_2)$. Then for any positive constant $\epsilon$ there exists a $\rho^{**} > 0$ such that
$$\|C(t,s)f - U(t,s)f\| \le \epsilon(t-s) \tag{4.10}$$
for $0 \le t-s \le \rho^{**}$ and $f \in \{g \in \mathcal{S};\ |g|_{\lambda_3} \le M_3\}$. Now take an $f \in \mathcal{S}$ and fix it. We can write by (1.12),
$$C(\Delta)f - U(t,0)f = C(t,t_{n-1})\cdots C(t_1,0)f - U(t,t_{n-1})\cdots U(t_1,0)f$$
$$= \sum_{j=1}^nC(t,t_{n-1})\cdots C(t_{j+1},t_j)\{C(t_j,t_{j-1}) - U(t_j,t_{j-1})\}U(t_{j-1},0)f. \tag{4.11}$$
Let $\epsilon > 0$ be an arbitrary constant. Then applying (3.22), (i) in Lemma 4.2, and (4.10) to (4.11), we obtain for sufficiently small $|\Delta|$,
$$\|C(\Delta)f - U(t,0)f\| \le \sum_{j=1}^n\exp(K(t-t_j))\,\epsilon\,(t_j-t_{j-1}) \le \epsilon\,e^{KT}T. \tag{4.12}$$
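The telescoping identity (4.11) and the resulting estimate (4.12) follow the classical pattern "consistency plus stability implies convergence" from the theory of difference methods. As a toy illustration only, the sketch below replaces the operators by complex numbers: the exact propagator is taken to be $e^{-i\omega(t-s)}$ and the one-step approximation is the explicit Euler factor $1 - i\omega(t-s)$. These scalar stand-ins, the value of `omega`, and all function names are assumptions made for the example; they commute, unlike the operators in the text, so only the algebra of (4.11) is being illustrated.

```python
import cmath

def exact(omega, t, s):
    """Toy exact propagator U(t, s) = exp(-i*omega*(t - s)); modulus one, as in (4.2)."""
    return cmath.exp(-1j * omega * (t - s))

def one_step(omega, t, s):
    """Toy one-step approximation C(t, s) = 1 - i*omega*(t - s) (explicit Euler)."""
    return 1.0 - 1j * omega * (t - s)

def product_error(omega, times):
    """Left-hand side of (4.11): C(Delta) - U(t, 0) for the subdivision `times`."""
    prod = 1.0 + 0j
    for k in range(1, len(times)):
        prod *= one_step(omega, times[k], times[k - 1])
    return prod - exact(omega, times[-1], times[0])

def telescoped_error(omega, times):
    """Right-hand side of (4.11) for the scalar toy operators."""
    n = len(times) - 1
    total = 0j
    for j in range(1, n + 1):
        tail = 1.0 + 0j
        for k in range(j + 1, n + 1):  # C(t, t_{n-1}) ... C(t_{j+1}, t_j)
            tail *= one_step(omega, times[k], times[k - 1])
        local = one_step(omega, times[j], times[j - 1]) - exact(omega, times[j], times[j - 1])
        total += tail * local * exact(omega, times[j - 1], times[0])
    return total

omega = 2.0
for n in (10, 40, 160):
    grid = [k / n for k in range(n + 1)]
    print(n, abs(product_error(omega, grid)), abs(product_error(omega, grid) - telescoped_error(omega, grid)))
```

In this toy model the one-step factor satisfies $|1 - i\omega\rho| \le 1 + (\omega^2\rho^*/2)\rho$ for $0 \le \rho \le \rho^*$, which plays the role of (1.14), and the local error is $O(\rho^2)$, which plays the role of (4.10).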
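This proves the Theorem for $f \in \mathcal{S}$. Since $\mathcal{S}$ is dense in $L^2$, the general case $f \in L^2$ follows from this together with the uniform bounds (3.22) and (4.2). Thus the Theorem is proved.

Remark 4.1. In this remark we study how the operator $\lim_{|\Delta|\to0}C(\Delta)$ is changed by the gauge transformation
$$V' = V - \partial_t\psi,\qquad A' = A + (\partial_{x_1}\psi,\dots,\partial_{x_d}\psi). \tag{4.13}$$
Let $V$ and $A$ satisfy the assumptions of the Theorem and define $C(\Delta)$ by (1.6). Let $\psi = \psi(t,x) \in \mathbb{R}$ be a continuously differentiable function on $[0,T]\times\mathbb{R}^d$ and consider the gauge transformation (4.13). Let us define $C(\Delta)'$ and $C(t,s)'$ by (1.6) and (1.11) for $V'$ and $A'$ respectively. We can easily see from (4.13) that
$$\int_{\boldsymbol{\gamma}^{t,s}_{x,y}}\mathbf{A}'\cdot d\mathbf{x} = \int_{\boldsymbol{\gamma}^{t,s}_{x,y}}\mathbf{A}\cdot d\mathbf{x} + \int_{\boldsymbol{\gamma}^{t,s}_{x,y}}\sum_{j=0}^d\partial_{x_j}\psi\,dx_j = \int_{\boldsymbol{\gamma}^{t,s}_{x,y}}\mathbf{A}\cdot d\mathbf{x} + \psi(t,x) - \psi(s,y),\quad x_0 = t,$$
and so from (2.3) that
$$C(t,s)'f = e^{i\hbar^{-1}\psi(t,\cdot)}C(t,s)\big(e^{-i\hbar^{-1}\psi(s,\cdot)}f\big).$$
Hence we have from (1.12),
$$C(\Delta)'f = e^{i\hbar^{-1}\psi(t,\cdot)}C(\Delta)\big(e^{-i\hbar^{-1}\psi(0,\cdot)}f\big), \tag{4.14}$$
and so from the theorem,
$$\lim_{|\Delta|\to0}C(\Delta)'f = e^{i\hbar^{-1}\psi(t,\cdot)}\lim_{|\Delta|\to0}C(\Delta)\big(e^{-i\hbar^{-1}\psi(0,\cdot)}f\big)\quad\text{in } L^2,\ f \in L^2. \tag{4.15}$$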
This gives the well-known Gauge invariance.
References
1. Albeverio, S., Brzeźniak, Z.: Oscillatory integrals on Hilbert spaces and Schrödinger equation with magnetic fields. J. Math. Phys. 36, 2135–2156 (1995)
2. Albeverio, S., Høegh-Krohn, R. J.: Mathematical theory of Feynman path integrals. Lecture Notes in Math. 523, Berlin–Heidelberg–New York: Springer, 1976
3. Asada, K., Fujiwara, D.: On some oscillatory integral transformations in $L^2(\mathbb{R}^n)$. Japan J. Math. 4, 299–361 (1978)
4. Cameron, R. H.: A family of integrals serving to connect the Wiener and Feynman integrals. J. Math. and Phys. 39, 126–140 (1960)
5. Feynman, R. P.: Space-time approach to non-relativistic quantum mechanics. Rev. Mod. Phys. 20, 367–387 (1948)
6. Feynman, R. P., Hibbs, A. R.: Quantum mechanics and path integrals. New York: McGraw-Hill, 1965
7. Fujiwara, D.: A construction of the fundamental solution for the Schrödinger equation. J. D'Analyse Math. 35, 41–96 (1979)
8. Fujiwara, D.: Remarks on convergence of the Feynman path integrals. Duke Math. J. 47, 559–600 (1980)
9. Fujiwara, D.: Some Feynman path integrals as oscillatory integrals over a Sobolev manifold. In: Lecture Notes in Math. 1540, Berlin–Heidelberg–New York: Springer, 1993, pp. 39–53
10. Fujiwara, D., Tsuchida, T.: The time slicing approximation of the fundamental solution for the Schrödinger equation with electromagnetic fields. J. Math. Soc. Japan 49, 299–327 (1997)
11. Ichinose, W.: A note on the existence and $\hbar$-dependency of the solution of equations in quantum mechanics. Osaka J. Math. 32, 327–345 (1995)
12. Itô, K.: Wiener integral and Feynman integral. In: Proc. 4th Berkeley symposium on Mathematical Statistics and Probability 2, Berkeley: Univ. of California Press, 1961, pp. 227–238
13. Itô, K.: Generalized uniform complex measures in the Hilbertian metric space with their application to the Feynman integral. In: Proc. 5th Berkeley symposium on Mathematical Statistics and Probability 2, Berkeley: Univ. of California Press, 1967, pp. 145–161
14. Kumano-go, H.: Pseudo-differential operators. Cambridge: MIT Press, 1981
15. Lax, P. D., Richtmyer, R. D.: Survey of the stability of linear finite difference equations. Comm. Pure Appl. Math. 9, 267–293 (1956)
16. Richtmyer, R. D., Morton, K. W.: Difference methods for initial-value problems. New York: Interscience Publishers, 1967
17. Schwartz, J. T.: Nonlinear functional analysis. New York, London, Paris, Montreux, Tokyo: Gordon and Breach Science Publishers, 1969
18. Truman, A.: The polygonal path formulation of the Feynman path integral. In: Lecture Notes in Phys. 106, Berlin–Heidelberg–New York: Springer, 1979, pp. 73–102
19. Yajima, K.: Schrödinger evolution equations with magnetic fields. J. D'Analyse Math. 56, 29–76 (1991)
Communicated by H. Araki
Commun. Math. Phys. 189, 35–71 (1997)
Communications in Mathematical Physics
© Springer-Verlag 1997
Exponentially Small Splitting of Separatrices Under Fast Quasiperiodic Forcing
Amadeu Delshams$^1$, Vassili Gelfreich$^{2,3}$, Àngel Jorba$^2$, Tere M. Seara$^1$
1 Departament de Matemàtica Aplicada I, Universitat Politècnica de Catalunya, Diagonal 647, 08028 Barcelona, Spain. E-mail: [email protected], [email protected]
2 Departament de Matemàtica Aplicada i Anàlisi, Universitat de Barcelona, Gran via 585, 08007 Barcelona, Spain. E-mail: [email protected], [email protected]
3 Chair of Applied Mathematics, St. Petersburg Academy of Aerospace Instrumentation, Bolshaya Morskaya 67, 190000, St. Petersburg, Russia. E-mail: [email protected]
Received: 19 February 1996 / Accepted: 14 February 1997
Abstract: We consider fast quasiperiodic perturbations with two frequencies $(1/\varepsilon, \gamma/\varepsilon)$ of a pendulum, where $\gamma$ is the golden mean number. The complete system has a two-dimensional invariant torus in a neighbourhood of the saddle point. We study the splitting of the three-dimensional invariant manifolds associated to this torus. Provided that the perturbation amplitude is small enough with respect to $\varepsilon$, and some of its Fourier coefficients (the ones associated to Fibonacci numbers) are separated from zero, it is proved that the invariant manifolds split and that the value of the splitting, which turns out to be exponentially small with respect to $\varepsilon$, is correctly predicted by the Melnikov function.
1. Introduction
At the end of the last century, H. Poincaré [Poi99] discovered the phenomenon of the splitting of separatrices, which seems to be the main cause of the stochastic behaviour in Hamiltonian systems. He formulated the general problem of dynamics as a perturbation of an integrable Hamiltonian
$$H(I,\varphi,\varepsilon) = H_0(I) + \varepsilon H_1(I,\varphi),$$
where $\varepsilon$ is a small parameter, $I = (I_1, I_2, \dots, I_n)$, $\varphi = (\varphi_1, \varphi_2, \dots, \varphi_n)$. The values of the actions $I$, such that the unperturbed frequencies $\omega_k(I) = \partial H_0/\partial I_k$ are rationally dependent, are called resonances. As a model for the motion near a resonance, Poincaré studied the pendulum with a high-frequency perturbation, which can be described by the Hamiltonian
$$\frac{y^2}{2} + \cos x + \mu \sin x \cos\frac{t}{\varepsilon}.$$
His calculations of the splitting, originally validated only for $|\mu|$ exponentially small with respect to $\varepsilon$, predicted correctly the splitting up to $|\mu| \le \varepsilon^p$ for any positive parameter $p$ [Gel97, Tre97].
The main problem in studying such kinds of systems is that the splitting is exponentially small with respect to $\varepsilon$. Namely, Neishtadt's theorem [Nei84] implies that in a Hamiltonian of the form $H(x, y, t/\varepsilon) = H_0(x, y) + H_1(x, y, t/\varepsilon)$, where the Hamiltonian system of $H_0$ has a saddle and an associated homoclinic orbit, and the perturbation $H_1$ is a periodic function of time with zero mean value, the splitting can be bounded from above by $O(e^{-\mathrm{const}/\varepsilon})$. For this estimate to be valid all the functions have to be real analytic in $x$ and $y$, but $C^1$ dependence on time is sufficient. Lately, the constant in the exponent was related to the position of complex time singularities of the unperturbed homoclinic orbit [HMS88, Fon93, Fon95].
The above-mentioned systems provide a realistic model for the motion near a resonance only in the case of two degrees of freedom. If one considers simple resonances of systems with more than two degrees of freedom, one can choose all the angles except one to be fast variables. The simplest case is a quasiperiodic perturbation of a planar Hamiltonian system. Neishtadt's averaging theorem was generalized to this case by C. Simó [Sim94], but the upper bounds provided for the splitting depend in an essential way on the frequency vector of the perturbation. For a perturbation of the pendulum depending on two frequencies, C. Simó [Sim94] checked numerically that a proper modification of the Melnikov method gives the correct prediction for the splitting.
Autonomous models with perturbations that depend on time in a quasiperiodic way appear in several problems of Celestial Mechanics. For instance, the motion of a spacecraft in the Earth–Moon system can be modeled assuming that Earth and Moon revolve in circles around their common centre of masses (this gives an autonomous model), and the main perturbations (difference between the circular and the real motion of the Moon, effect of the Sun, etc.)
are modeled as a time-dependent quasiperiodic function. For more details, see [DJS91, GJMS91].
In the present paper we consider a quasiperiodic high-frequency perturbation of the pendulum, described by the Hamiltonian function
$$\frac{\omega\cdot I}{\varepsilon} + h(x,y,\theta,\varepsilon), \tag{1.1}$$
where $\omega\cdot I = \omega_1 I_1 + \omega_2 I_2$,
$$h(x,y,\theta,\varepsilon) = \frac{y^2}{2} + \cos x + \varepsilon^p m(\theta_1,\theta_2)\cos x,$$
with symplectic form $dx\wedge dy + d\theta_1\wedge dI_1 + d\theta_2\wedge dI_2$. We assume that $\varepsilon$ is a small positive parameter and $p$ is a positive parameter. Mainly due to a technical limitation imposed by the Extension Theorem (Theorem 3), we will restrict ourselves to the case $p > 3$. We also assume that the frequency is of the form $\omega/\varepsilon$ for
$$\omega = (1,\gamma), \qquad \gamma = \frac{\sqrt{5}+1}{2}. \tag{1.2}$$
The number $\gamma$ is the famous golden mean number, which is the "most irrational" number [Khi63, Lan91]. The equations of motion associated with Hamiltonian (1.1) are
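$$\begin{aligned}
\dot x &= y, & \dot y &= (1+\varepsilon^p m(\theta_1,\theta_2))\sin x,\\
\dot\theta_1 &= \frac{1}{\varepsilon}, & \dot I_1 &= -\varepsilon^p\cos x\,\frac{\partial m}{\partial\theta_1}(\theta_1,\theta_2),\\
\dot\theta_2 &= \frac{\gamma}{\varepsilon}, & \dot I_2 &= -\varepsilon^p\cos x\,\frac{\partial m}{\partial\theta_2}(\theta_1,\theta_2).
\end{aligned} \tag{1.3}$$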
Actions $I_1$ and $I_2$ have only been introduced to put the Hamiltonian in autonomous form, but are not relevant from a dynamical point of view (note that they do not appear in the right-hand sides of the equations of motion). The function $m$ is assumed to be a $2\pi$-periodic function of two variables $\theta_1$ and $\theta_2$. Thus, it can be represented as a Fourier series:
$$m(\theta_1,\theta_2) = \sum_{k_1,k_2} m_{k_1k_2}\, e^{i(k_1\theta_1+k_2\theta_2)}. \tag{1.4}$$
We assume that, for some positive numbers $r_1$ and $r_2$,
$$\sup_{k_1,k_2} |m_{k_1k_2}|\, e^{r_1|k_1|+r_2|k_2|} < \infty, \tag{1.5}$$
and that there are positive numbers $a$ and $k_0$ such that
$$|m_{k_1k_2}| \ge a\,e^{-r_1|k_1|-r_2|k_2|} \tag{1.6}$$
for all $|k_1|/|k_2|$ which are continued fraction convergents of $\gamma$ with $|k_2| \ge k_0$. In fact, $k_1$ and $k_2$ are consecutive Fibonacci numbers: $k_1 = \pm F_{n+1}$ and $k_2 = \mp F_n$. The Fibonacci numbers are defined by the recurrence $F_0 = 1$, $F_1 = 1$, $F_{n+1} = F_n + F_{n-1}$ for $n \ge 1$. We call the corresponding terms in the perturbation resonant or Fibonacci terms. For example, the function
$$m(\theta_1,\theta_2) = \frac{\cos\theta_1\cos\theta_2}{(\cosh r_1 - \cos\theta_1)(\cosh r_2-\cos\theta_2)}$$
satisfies these conditions. The upper bound (1.5) implies that the function $m$ is analytic on the strip $\{|\operatorname{Im}\theta_1| < r_1\}\times\{|\operatorname{Im}\theta_2| < r_2\}$. Equation (1.6) implies that this function cannot be prolonged analytically onto a larger strip. Let us select $\alpha\in(0,1]$. Estimate (1.5) implies that
$$|m(\theta_1,\theta_2)| \le K\varepsilon^{-2\alpha} \tag{1.7}$$
on the strip
$$|\operatorname{Im}\theta_1| \le r_1-\varepsilon^\alpha, \qquad |\operatorname{Im}\theta_2| \le r_2 - \varepsilon^\alpha. \tag{1.8}$$
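That the example function satisfies (1.5) and (1.6) can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the paper: it computes the Fourier coefficients of one factor $\cos\theta/(\cosh r - \cos\theta)$ by the trapezoid rule and rescales them by $e^{r|k|}$. A side calculation with the Poisson kernel (an assumption of this sketch, not a statement from the text) suggests the rescaled product should equal $\coth r_1 \coth r_2$ whenever $k_1, k_2 \neq 0$. The values of `r1`, `r2` and the helper names are hypothetical.

```python
import math

def fourier_coefficient(r, k, n_points=512):
    """k-th Fourier coefficient of cos(t)/(cosh(r) - cos(t)) by the trapezoid rule."""
    total = 0.0
    for j in range(n_points):
        t = 2.0 * math.pi * j / n_points
        # The function is even in t, so only the cosine part contributes.
        total += math.cos(t) / (math.cosh(r) - math.cos(t)) * math.cos(k * t)
    return total / n_points

def scaled_coefficient(r1, r2, k1, k2):
    """m_{k1 k2} * exp(r1|k1| + r2|k2|) for the product example."""
    m = fourier_coefficient(r1, k1) * fourier_coefficient(r2, k2)
    return m * math.exp(r1 * abs(k1) + r2 * abs(k2))

r1, r2 = 0.5, 0.7                 # hypothetical strip widths
fib = [1, 1, 2, 3, 5, 8, 13]      # F_0, ..., F_6 with F_0 = F_1 = 1
for n in range(1, 6):
    k1, k2 = fib[n + 1], -fib[n]  # resonant (Fibonacci) indices
    print(k1, k2, scaled_coefficient(r1, r2, k1, k2))
print("coth(r1) * coth(r2) =", 1.0 / (math.tanh(r1) * math.tanh(r2)))
```

If the rescaled coefficients stay at a fixed positive value along the Fibonacci indices, both the upper bound (1.5) and the lower bound (1.6) hold for this example.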
Formula (1.6) implies that the upper bound (1.7) cannot be improved. The value of the splitting depends essentially on the width of these strips. The function $m$ under consideration has a singularity "of second order" in the sense that the upper bound (1.7) for the maximum of the modulus is quadratic with respect to the inverse of the distance to the boundary of the strip. In a similar way the case of a singularity of any "order" $q$ can be considered. In this case $m_{k_1k_2}$ should be replaced by $m_{k_1k_2}/|k|^{q-2}$ in (1.5) and (1.6). An example from [DGJS97b] shows that the Melnikov function and the splitting of separatrices can be of the order of some power of $\varepsilon$ if the function $m$ is not analytic, but has only a finite number of continuous derivatives. This makes a first qualitative
difference between periodic and quasiperiodic perturbations. Indeed, in the periodic case, only the $C^1$ dependence with respect to $\theta$ of the perturbed Hamiltonian is needed to prove that the splitting is $O(e^{-\mathrm{const}/\varepsilon})$. (In both cases, the analyticity with respect to $x$, $y$ is essential.)
The Hamiltonian (1.1) can be considered as a singular perturbation of the pendulum
$$h_0 = \frac{y^2}{2}+\cos x. \tag{1.9}$$
The unperturbed system has a saddle point $(0,0)$ and a homoclinic trajectory given by
$$x_0(t) = 4\arctan(e^t),\qquad y_0(t) = \dot x_0(t). \tag{1.10}$$
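As a quick sanity check of (1.10), one can verify numerically that the curve solves the unperturbed equations $\dot x = y$, $\dot y = \sin x$ and stays on the energy level of the saddle. The sketch below is illustrative only; the closed form $y_0(t) = 2/\cosh t$ is obtained by differentiating $x_0$, and the finite-difference step and function names are choices made here, not taken from the paper.

```python
import math

def x0(t):
    return 4.0 * math.atan(math.exp(t))

def y0(t):
    # Derivative of x0 in closed form: 4 e^t / (1 + e^{2t}) = 2 / cosh(t).
    return 2.0 / math.cosh(t)

def h0(x, y):
    """Unperturbed pendulum energy (1.9)."""
    return 0.5 * y * y + math.cos(x)

def derivative(f, t, step=1e-5):
    """Central finite difference, used only to test the equations of motion."""
    return (f(t + step) - f(t - step)) / (2.0 * step)

for t in (-3.0, -1.0, 0.0, 0.5, 2.0):
    energy = h0(x0(t), y0(t))                        # compare with h0(0, 0)
    res_x = derivative(x0, t) - y0(t)                # residual of x' = y
    res_y = derivative(y0, t) - math.sin(x0(t))      # residual of y' = sin x
    print(t, energy, res_x, res_y)
```

The energy printed in each row should agree with $h_0(0,0)$, and both residuals should be at the level of the finite-difference error.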
The complete system (1.3) has a whiskered torus $\mathcal{T}$: $(0, 0, \theta_1, \theta_2)$. The whiskers are 3D hypersurfaces in the 4D extended phase space $(x, y, \theta_1, \theta_2)$. These invariant manifolds are close to the unperturbed pendulum separatrix. Our main result is that for $p > 3$ and small $\varepsilon > 0$ the invariant manifolds split, and that the value of the splitting is correctly predicted by the Melnikov function
$$M(\theta_1,\theta_2;\varepsilon) = \int_{-\infty}^{\infty}\{h_0,h\}(x_0(t),y_0(t),\theta_1+t/\varepsilon,\theta_2+\gamma t/\varepsilon,\varepsilon)\,dt. \tag{1.11}$$
To give a more precise statement, we need to introduce a $2\log\gamma$-periodic function $c$ defined by
$$c(\delta) = C_0\cosh\Big(\frac{\delta-\delta_0}{2}\Big)\quad\text{for } \delta\in[\delta_0-\log\gamma,\ \delta_0+\log\gamma], \tag{1.12}$$
where
$$C_0 = \sqrt{\frac{2\pi(\gamma r_1+r_2)}{\gamma+\gamma^{-1}}},\qquad \delta_0=\log\varepsilon^*,\qquad \varepsilon^* = \frac{\pi(\gamma+\gamma^{-1})}{2\gamma^2(r_1\gamma+r_2)},$$
and continued by $2\log\gamma$-periodicity onto the whole real axis. The function $c$ is piecewise analytic and continuous.
Theorem 1 (Main Theorem). Given positive constants $T_1 < T_2$, there exists a canonical coordinate system $(H, T, \theta_1, \theta_2)$, such that for $p > 3$, $T_1 \le T \le T_2$ and real $\theta_1$ and $\theta_2$, the stable manifold has the equation $H = 0$, and the unstable manifold can be represented as the graph of a function $H = H^u(T, \theta_1, \theta_2; \varepsilon)$, where the function $H^u$ depends $2\pi$-periodically on $\theta_1$ and $\theta_2$ and is close to the Melnikov function:
$$\big|H^u(T,\theta_1,\theta_2;\varepsilon) - M(\theta_1-T/\varepsilon,\theta_2-\gamma T/\varepsilon;\varepsilon)\big| \le \mathrm{const}\;\varepsilon^{2p-4}\exp\Big(-\frac{c(\log\varepsilon)}{\sqrt{\varepsilon}}\Big), \tag{1.13}$$
with $c(\delta)$ as defined in (1.12). If condition (1.6) is fulfilled, then there exists $\varepsilon_0 > 0$ such that, for $0 < \varepsilon < \varepsilon_0$, the maximum of the modulus of the Melnikov function is larger than the right hand side of (1.13).
This result, which was already announced in [DGJS97a], is not trivial since the Melnikov function is exponentially small with respect to $\varepsilon$. As we will see, for a fixed small $\varepsilon$ the resonant terms with $k_1, k_2 \sim \mathrm{const}/\sqrt{\varepsilon}$ are the ones that give the largest contribution to the Melnikov function. Condition (1.6) is not needed to get an upper bound for the Melnikov function of the form
$$\mathrm{const}\;\varepsilon^{p-1}\exp\Big(-\frac{c(\log\varepsilon)}{\sqrt{\varepsilon}}\Big), \tag{1.14}$$
as well as the upper bound (1.13), which provide together an upper estimate for the splitting of separatrices. We need condition (1.6) to ensure a lower bound for the maximum of the Melnikov function of the same form (1.14), which dominates the error bound (1.13) and gives rise to a lower estimate for the splitting of separatrices. If condition (1.6) is not satisfied, we only get an upper bound for the splitting. In particular, this happens if m is a trigonometric polynomial or an entire function. Nevertheless, if the inequality (1.6) is satisfied for a sufficiently large (but finite) number of consecutive resonant terms, we can still find small positive numbers ε00 , ε0 > 0, such that the Melnikov function is greater than the error for small finite ε ∈ (ε00 , ε0 ). Then, from a practical point of view, the Melnikov theory gives a good approximation, but not an asymptotic formula. It is remarkable that the exponent of the asymptotic expression (1.14) for the Melnikov function is different from the case of a periodic perturbation. There appears not only a different power of ε, but a periodic function c(log ε) instead of a constant. In the case of an entire function m, we think that the method used in the present paper can be modified in order to improve the estimate of the error and to prove that the Melnikov function gives an actual asymptotic at least when the resonant terms decrease not faster than 1/k!. We note that the Melnikov function is not invariant with respect to canonical changes of variables. After a change, e.g. after a step of the classical averaging procedure, a lot of non-zero harmonics, which were not present in the original system, can appear. If in the original system the Fibonacci terms were not big enough, these new harmonics may give larger contribution to the splitting. This idea was used in [Sim94] to detect the splitting for a system with only 4 terms initially present. 
The assumption (1.2) that $\omega$ in the frequency vector is just $(1, \gamma)$ can be relaxed. The generalization of the present result to the case when $\gamma$ is a quadratic number is straightforward, with a similar expression (1.14) for the size of the Melnikov function. The case in which $\omega = (\omega_1, \omega_2)$, with the ratio $\omega_1/\omega_2$ being of constant type (the continued fraction expansion has bounded coefficients), but not quadratic, can be similarly analyzed, but in this case $c(\delta)$ is no longer a periodic function. In some sense one can say, properly speaking, that there are no asymptotics. But it seems that there still exist upper and lower bounds, with the factor $\sqrt{\varepsilon}$ in the denominator of the exponential term. The case of two frequencies whose ratio $\omega_1/\omega_2$ is not of constant type, as well as the case of more than two perturbing frequencies, is more complicated.
Our model is based on the paper [Sim94] by C. Simó, where a lot of semi-numerical computations were presented. It can be thought of as an intermediate step between a Hamiltonian with one and a half degrees of freedom and a Hamiltonian with $n$ degrees of freedom like the following generalization of Arnold's example:
$$H(x,\phi,y,I,\varepsilon,\mu) = \frac12 y^2+\frac12 I^2+\varepsilon(\cos x-1)+\mu F(x,\phi), \tag{1.15}$$
where $x\in\mathbb{T}$, $\phi\in\mathbb{T}^{n-1}$ are the coordinates, $y\in\mathbb{R}$, $I\in\mathbb{R}^{n-1}$ are the momenta, which was introduced by P. Lochak. It is remarkable that in his paper [Loc92, V§2], P. Lochak was already putting emphasis on perturbations $F$ with arbitrarily high harmonics, in contrast with the original Arnold's example [Arn64], in order to get realistic estimates for the splitting of separatrices. A similar Hamiltonian (the fast rotator-pendulum model) was studied by L. Chierchia and G. Gallavotti [CG94], and by G. Gallavotti [Gal94],
working to all orders of perturbation theory, and expressing the coefficients of the µth order contribution to the splitting of separatrices as improper time integrals from $t = -\infty$ to $t = +\infty$. However, by using a perturbation $F$ with finite harmonics, they were only able to get upper estimates for the splitting.
A related result is the Jeans–Landau–Teller approximation for adiabatic invariants, where the change of actions is given in first order by a sort of Melnikov function. Exponentially small upper bounds for the change of actions were obtained by G. Benettin, A. Carati and G. Gallavotti [BCG97], proving cancellations through tree-like diagrams. A numerical study performed by G. Benettin, A. Carati and F. Fassò [BCF97] for the case of a large asymptotic frequency $\lambda(1, \gamma)$ ($\lambda = 1/\varepsilon$ in our notation) with $\gamma = \sqrt{2}$, a quadratic number, shows a good agreement between the numerical values of the Melnikov function and the change of actions.
While we were revising this paper, we became aware of a new version of a remarkable preprint [RW97b] by M. Rudnev and S. Wiggins, devoted also to the Hamiltonian (1.15). Assuming similar conditions to (1.5), (1.6) for an even perturbation $F$ (in particular, $F$ possesses arbitrarily high harmonics), they give exponentially small upper and lower bounds for the splitting of separatrices for $\mu = O(\varepsilon^p)$. It is important to notice that their results apply to $n \ge 3$ degrees of freedom. It is interesting to remark here that since we restrict ourselves to a more concrete model, we obtain more information about the limit behaviour of the Melnikov function and the splitting of separatrices, for a lower value of the exponent $p$.
The rest of the paper is devoted to the proof of the Main Theorem. In contrast with the above-mentioned papers, the method used in the present paper is based on the geometrical ideas proposed by Lazutkin [Laz84] for the study of the separatrix splitting for the standard map, and adapted to differential equations by the authors [Gel90, DS92, Gel93]. In Sect. 2, the Melnikov function is carefully analyzed, to provide its asymptotic behaviour. In Sect. 3, like in [DS92, DS97], and as a first step to give a description of the dynamics near the two-dimensional hyperbolic invariant torus $\mathcal{T}$, the existence of a convergent normal form is established. This normal form result is similar to Moser's theorem [Mos56] on the normal form near a periodic hyperbolic orbit, and to [CG94] and [RW97a] in more general situations. However, our proof (see Sect. 8) is based on a quadratically convergent scheme, which allows us to show that the loss of domain in the phases $\theta$ is bounded by $\sqrt{\varepsilon}$, as required to obtain an asymptotic formula for the Melnikov function. Besides, the Normal Form Theorem ensures that the local unstable manifold is $O(\varepsilon^{p-1})$-close to the unperturbed separatrix. In Sect. 4, the Extension Theorem, proved in Sect. 9, extends this local approximation for solutions of system (1.3) to a global one, on a suitably chosen complex domain. Since the unperturbed homoclinic orbit comes back to the domain of the normal form, the same happens to the unstable manifold, which can be compared with the local stable manifold. This comparison is performed in Sect. 5, where Theorem 4 is proved, implying immediately the Main Theorem. Finally, Sect. 6 is devoted to the arithmetic properties of the golden mean number $\gamma$, and in Sect. 7 some analytic properties of the quasiperiodic functions are studied.
2. Melnikov Function
As is well-known, the Melnikov function (1.11) gives a first order approximation of the difference between the values of the unperturbed pendulum energy $h_0$ on the stable and unstable manifolds.
The next lemma describes its main features.
Lemma 1 (Properties of the Melnikov function). The Melnikov function defined by (1.11) is a $2\pi$-periodic function of $\theta_1$ and $\theta_2$, such that
1) $M(\theta_1 - T/\varepsilon, \theta_2 - \gamma T/\varepsilon; \varepsilon)$ is analytic on the product of strips $\{|\operatorname{Im}\theta_1| < r_1\} \times \{|\operatorname{Im}\theta_2| < r_2\} \times \{|\operatorname{Im} T| < \pi/2\}$;
2) the maximum of the modulus of the Melnikov function, $\max_{(\theta_1,\theta_2)\in\mathbb{T}^2} |M(\theta_1, \theta_2; \varepsilon)|$, taken on the real arguments, can be bounded from above and from below by terms of the form
$$\mathrm{const}\;\varepsilon^{p-1}\exp\Big(-\frac{c(\log\varepsilon)}{\sqrt{\varepsilon}}\Big)$$
with different $\varepsilon$-independent constants, where the function $c$ in the exponent is defined by (1.12);
3) for a fixed small $\varepsilon$ only 4 terms dominate in the Fourier series for the Melnikov function and the rest can be estimated from above by $O(e^{-C_1/\sqrt{\varepsilon}})$, where the constant $C_1 > \max c(\delta) = C_0\cosh(\log\sqrt{\gamma})$.
Remark 1. The number of leading terms depends on $\varepsilon$. In fact, the largest terms correspond to $(k_1, k_2) = \pm(F_{n(\varepsilon)+1}, -F_{n(\varepsilon)})$, where $F_{n(\varepsilon)}$ is the Fibonacci number closest to $F^*(\varepsilon) = \sqrt{\phi_0/\varepsilon}$, $\phi_0$ being a constant to be defined later in this section. Except for a small neighbourhood of $\varepsilon = \varepsilon^*\gamma^{-n}$, there is only one Fibonacci number closest to $F^*(\varepsilon)$, and then only two corresponding terms dominate in the Fourier series.
Proof of the lemma. Taking into account the explicit formula (1.10) for $x_0(t)$ and $y_0(t)$ we obtain easily that
$$M(\theta_1,\theta_2;\varepsilon) = \varepsilon^p\int_{-\infty}^{\infty} y_0(t)\sin(x_0(t))\,m(\theta_1+t/\varepsilon,\theta_2+\gamma t/\varepsilon)\,dt = -\varepsilon^p\int_{-\infty}^{\infty}\frac{4\sinh t}{\cosh^3 t}\,m(\theta_1+t/\varepsilon,\theta_2+\gamma t/\varepsilon)\,dt.$$
To prove assertion 1) of the lemma we note that
$$M(\theta_1-T/\varepsilon,\theta_2-\gamma T/\varepsilon;\varepsilon) = -\varepsilon^p\int_{-\infty}^{\infty}\frac{4\sinh(t+T)}{\cosh^3(t+T)}\,m(\theta_1+t/\varepsilon,\theta_2+\gamma t/\varepsilon)\,dt,$$
and the last integral is analytic with respect to $\theta_1$, $\theta_2$ and $T$. For the Fourier coefficients of the Melnikov function we have
$$M_{k_1,k_2}(\varepsilon) = -\varepsilon^p\, m_{k_1k_2}\int_{-\infty}^{\infty}\frac{4\sinh t}{\cosh^3 t}\,e^{i(k_1+\gamma k_2)t/\varepsilon}\,dt.$$
Calculating the integral by residues we obtain
$$M_{k_1,k_2}(\varepsilon) = -\frac{2\pi i\,\varepsilon^p\,(k_1+\gamma k_2)^2}{\varepsilon^2\sinh\dfrac{\pi(k_1+\gamma k_2)}{2\varepsilon}}\;m_{k_1k_2}. \tag{2.1}$$
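The residue computation leading to (2.1) amounts to the identity $\int_{-\infty}^{\infty} \frac{4\sinh t}{\cosh^3 t}\, e^{iat}\,dt = \frac{2\pi i\,a^2}{\sinh(\pi a/2)}$ with $a = (k_1 + \gamma k_2)/\varepsilon$. The following sketch compares a direct quadrature with this closed form for a few moderate values of $a$; the truncation width, the step and the function names are choices made for this illustration, not taken from the paper.

```python
import math

def kernel_integral_imag(a, half_width=40.0, step=1e-3):
    """Imaginary part of the integral of 4 sinh(t)/cosh(t)^3 * exp(i a t) over the real line.

    The real part vanishes because the kernel is odd in t, so only the
    sin(a t) component is integrated, using the evenness of the product.
    """
    n = int(round(half_width / step))
    total = 0.0
    for j in range(1, n + 1):
        t = j * step
        total += 4.0 * math.sinh(t) / math.cosh(t) ** 3 * math.sin(a * t)
    return 2.0 * total * step

def residue_formula_imag(a):
    """Closed form 2*pi*a^2 / sinh(pi*a/2) for the same quantity."""
    return 2.0 * math.pi * a * a / math.sinh(0.5 * math.pi * a)

for a in (0.5, 1.0, 3.0, 6.0):
    print(a, kernel_integral_imag(a), residue_formula_imag(a))
```

Multiplying the closed form by $-\varepsilon^p m_{k_1k_2}$ and substituting $a = (k_1+\gamma k_2)/\varepsilon$ reproduces the right-hand side of (2.1), including the exponentially small factor $1/\sinh(\pi a/2)$ for large $a$.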
All these coefficients are exponentially small with respect to $\varepsilon$, but the constant in the exponent depends, in an essential way, on the coefficient index. The dependence of the largest Fourier coefficients on $\varepsilon$ is represented in Fig. 1 in logarithmic scale for a perturbation with $|m_{k_1k_2}| = 1$. In this figure, for a fixed first coordinate, the lower a point is, the larger is the value. The scale is chosen in such a way that a horizontal line corresponds to the function $\exp(-C/\sqrt{\varepsilon})$ for some constant $C$.
Fig. 1. Each dashed line represents a Fourier coefficient of the Melnikov function as a function of $\varepsilon$: $-\sqrt{\varepsilon}\,\log_{10}|M_{k_1k_2}(\varepsilon)|$ versus $-\log_{10}\varepsilon$. The solid line represents the maximum of the modulus of the Melnikov function in the same scale.
The most important resonant terms correspond to Fibonacci numbers, that is $|k_1| = F_{n+1}$, $|k_2| = F_n$. Taking into account (1.5), (1.6) and (6.3) we bound these coefficients from below and from above by terms of the form
$$\mathrm{const}\,\frac{\varepsilon^{p-2}}{F_n^2}\exp\Big(-\frac{\pi C_F}{2\varepsilon F_n}-(r_1\gamma+r_2)F_n\Big).$$
For a fixed value of $\varepsilon$ the first term in the exponent is an increasing function of $F_n$ and the second one is decreasing. In order to describe this competition it is convenient to rewrite the last formula as
$$\mathrm{const}\,\frac{\varepsilon^p}{\varepsilon^2F_n^2}\exp\Bigg(-\frac{C_0\cosh\big(\frac12\log(\varepsilon F_n^2)-\frac12\log\phi_0\big)}{\sqrt{\varepsilon}}\Bigg),$$
where
$$C_0=\sqrt{2\pi C_F(r_1\gamma+r_2)},\qquad \phi_0=\frac{\pi C_F}{2(r_1\gamma+r_2)}.$$
For a fixed $\varepsilon$, the largest term corresponds to the minimal value of the numerator or, equivalently, minimizes $|\log(\varepsilon F_n^2) - \log\phi_0|$. This happens for $F_n$ closest to
$$F^*(\varepsilon)=\sqrt{\frac{\phi_0}{\varepsilon}}.$$
That is, the index of the most important terms in the Fourier series for the Melnikov function grows as $1/\sqrt{\varepsilon}$. Except when $F^*(\varepsilon)$ lies exactly in the centre of an interval $[F_n, F_{n+1}]$ there is only one Fibonacci number closest to $F^*(\varepsilon)$. The index of the leading term changes when $\varepsilon$ crosses this value. In a small neighbourhood of this value two terms are of the same order. In fact, the number of leading terms is two or four, respectively, since we have to take into account complex conjugate coefficients, $(-k_1, -k_2)$. Since $F_n = C_F(\gamma^{n+1} + (-1)^n\gamma^{-n-1})$ we have
$$\log(\varepsilon F_n^2) = \log\varepsilon+2(n+1)\log\gamma+2\log C_F+2\log\big(1+(-1)^n\gamma^{-2n-2}\big).$$
The value of this expression repeats with an error of the order $O(\gamma^{-2n-2}) = O(F_n^{-2}) = O(\varepsilon)$ when we increase $n$ by 1 and simultaneously decrease $\log\varepsilon$ by $2\log\gamma$. Thus we obtain that $\sup_{k_1k_2}|M_{k_1k_2}(\varepsilon)|$ can be estimated from below and from above by
$$\mathrm{const}\;\varepsilon^{p-1}\exp\Big(-\frac{c(\log\varepsilon)}{\sqrt{\varepsilon}}\Big),$$
where the function $c$ was defined by (1.12). In the exponent the numerator oscillates between $C_0$ and $C_0\cosh\log\sqrt{\gamma}$ with the period $2\log\gamma$ in $\log\varepsilon$. In particular, this gives the lower bound for the maximum of the Melnikov function modulus, since a Fourier coefficient of a function cannot be larger than the maximum value of the function. The Fourier coefficients which are not related to the Fibonacci numbers can be estimated in the same manner, but with a constant larger than $C_F$. That implies that they are exponentially small with respect to the Fibonacci ones for small values of $\varepsilon$. The proof of the fact that the sum of these terms is also exponentially small is straightforward, and we omit it since it literally repeats the proof of Lemma 4.
As we have established that for most small values of $\varepsilon$ only the terms with $(k_1, k_2) = \pm(F_{n(\varepsilon)+1}, -F_{n(\varepsilon)})$ are important, the Melnikov function is essentially
$$M(\theta_1,\theta_2;\varepsilon) \approx 2\,\big|M_{F_{n(\varepsilon)+1},-F_{n(\varepsilon)}}(\varepsilon)\big|\,\sin\big(F_{n(\varepsilon)+1}\theta_1 - F_{n(\varepsilon)}\theta_2 + \varphi(\varepsilon)\big).$$
The zeros of the Melnikov function correspond to homoclinic trajectories. The above formula implies that the zeros of the Melnikov function form two lines on the torus. As already noticed by C. Simó [Sim94], the averaged slopes of those lines approach $\gamma$ when $\varepsilon \to 0$.
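The competition between the two terms in the exponent, and its relation to the periodic function $c$ of (1.12), can be explored numerically. The sketch below is a hedged illustration: it takes $C_F = 1/(\gamma + \gamma^{-1})$, which follows from $F_0 = 1$ in the formula for $F_n$ above, and the values of `r1`, `r2` and `eps` are arbitrary choices. It selects the Fibonacci index that minimizes $\sqrt{\varepsilon}\,\big(\pi C_F/(2\varepsilon F_n) + (r_1\gamma + r_2)F_n\big)$ and compares the minimum with $c(\log\varepsilon)$.

```python
import math

GAMMA = (math.sqrt(5.0) + 1.0) / 2.0
C_F = 1.0 / (GAMMA + 1.0 / GAMMA)  # assumed normalization, from F_0 = 1

def fibonacci(n):
    """F_0 = F_1 = 1, F_{n+1} = F_n + F_{n-1}, as in the text."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def exponent(eps, n, r1, r2):
    """sqrt(eps) * (pi*C_F/(2*eps*F_n) + (r1*gamma + r2)*F_n)."""
    f = fibonacci(n)
    return math.sqrt(eps) * (math.pi * C_F / (2.0 * eps * f)
                             + (r1 * GAMMA + r2) * f)

def c_periodic(delta, r1, r2):
    """The function c of (1.12), continued 2*log(gamma)-periodically."""
    b = r1 * GAMMA + r2
    c0 = math.sqrt(2.0 * math.pi * C_F * b)
    delta0 = math.log(math.pi / (2.0 * C_F * GAMMA ** 2 * b))  # log(eps*)
    period = 2.0 * math.log(GAMMA)
    # Reduce delta - delta0 to the interval [-log(gamma), log(gamma)).
    shift = (delta - delta0 + 0.5 * period) % period - 0.5 * period
    return c0 * math.cosh(0.5 * shift)

def dominant_index(eps, r1, r2, n_max=60):
    """Index n of the Fibonacci term with the smallest exponent."""
    return min(range(n_max), key=lambda n: exponent(eps, n, r1, r2))

r1, r2, eps = 1.0, 1.0, 1e-4       # hypothetical parameter values
n = dominant_index(eps, r1, r2)
print("dominant F_n:", fibonacci(n))
print("minimal exponent:", exponent(eps, n, r1, r2))
print("c(log eps):", c_periodic(math.log(eps), r1, r2))
```

The two printed exponents should agree up to a small relative error, consistent with the $O(\varepsilon)$ error mentioned in the text, and scanning `eps` over several decades shows the oscillation of $c(\log\varepsilon)$ between $C_0$ and $C_0\cosh\log\sqrt{\gamma}$.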
3. Normal Form and Local Manifolds
As we have seen during the analysis of the Melnikov function, the size of the splitting depends essentially on the widths of the analyticity strip $(r_1, r_2)$ of the angular variables $\theta_1$, $\theta_2$, as well as on the width of the analyticity strip of the separatrix $(x_0(t), y_0(t))$. Therefore, to detect the splitting in the quasiperiodic case the loss of domain in the angular variables must be very small (i.e., $O(\varepsilon^\alpha)$, where $\alpha$ depends on the Diophantine properties of the frequencies). This makes another difference with the periodic case, where the size of the splitting does not depend on the width of the analyticity strip of the angular variable $\theta$, but only on the width of the analyticity strip of the separatrix $(x_0(t), y_0(t))$. When dealing with the frequencies $(1/\varepsilon, \gamma/\varepsilon)$ one needs a reduction of $O(\sqrt{\varepsilon})$ at most. Hence, during the proof of the convergence of the normal form one has to bound carefully the loss of domain (with respect to the angular variables) in order to achieve such a small reduction. Finally, we want to stress that if the amount of reduction is somewhat bigger, one can only produce upper bounds for the splitting of separatrices.
Theorem 2 (Normal Form Theorem). Let $\varepsilon \in (0, \varepsilon_0)$. In a neighbourhood of the hyperbolic torus $\mathcal{T}$ there is a canonical change of variables $(x, y) \mapsto (X, Y)$, which depends $2\pi$-periodically on $\theta_1$ and $\theta_2$, such that the Hamiltonian (1.1) takes the form
$$H(XY,\varepsilon) = H_0(XY)+\varepsilon^{p-1}H_1(XY,\varepsilon),$$
where $H_0$ is the normal form Hamiltonian for the unperturbed pendulum. Moreover, the change of variables has the form
$$\begin{aligned}
x &= x^{(0)}(X,Y)+\varepsilon^{p-1}x^{(1)}(X,Y,\theta_1,\theta_2,\varepsilon),\\
y &= y^{(0)}(X,Y)+\varepsilon^{p-1}y^{(1)}(X,Y,\theta_1,\theta_2,\varepsilon),
\end{aligned} \tag{3.1}$$
where $(x^{(0)}, y^{(0)})$ are normal form coordinates for the unperturbed pendulum. The functions $H_0$, $H_1$, $x^{(0)}$, $y^{(0)}$, $x^{(1)}$ and $y^{(1)}$ are analytic and uniformly bounded in the complex domain defined by
$$|X|^2+|Y|^2<r_0^2,\qquad |\operatorname{Im}\theta_1|<r_1-\sqrt{\varepsilon},\qquad |\operatorname{Im}\theta_2|<r_2-\sqrt{\varepsilon},$$
for $r_1$ and $r_2$ from (1.5) and some positive constant $r_0$.
We prove this theorem in a more general form in Sect. 8. In the normal form coordinates the stable whisker is given by the equation $X = 0$ and the unstable one by the equation $Y = 0$. Let $\lambda = H'(0, \varepsilon)$. The Normal Form Theorem provides a convenient parameterization for the invariant manifolds:
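$$\begin{aligned}
x &= x^s(T,\theta_1,\theta_2,\varepsilon) \equiv x(0,e^{-\lambda T},\theta_1,\theta_2,\varepsilon),\\
y &= y^s(T,\theta_1,\theta_2,\varepsilon) \equiv y(0,e^{-\lambda T},\theta_1,\theta_2,\varepsilon),
\end{aligned} \tag{3.2}$$
and
$$\begin{aligned}
x &= x^u(T,\theta_1,\theta_2,\varepsilon) \equiv x(e^{\lambda T},0,\theta_1,\theta_2,\varepsilon),\\
y &= y^u(T,\theta_1,\theta_2,\varepsilon) \equiv y(e^{\lambda T},0,\theta_1,\theta_2,\varepsilon),
\end{aligned} \tag{3.3}$$
where we have used the change (3.1). Theorem 2 also implies that there is a positive $\varepsilon$-independent number $T_0$, such that
$$|x^u(T,\theta_1,\theta_2,\varepsilon)-x_0(T)|\le C\varepsilon^{p-1},\qquad |y^u(T,\theta_1,\theta_2,\varepsilon)-y_0(T)|\le C\varepsilon^{p-1}\qquad\text{for } T\le -T_0, \tag{3.4}$$
and
$$|x^s(T,\theta_1,\theta_2,\varepsilon)-x_0(T)|\le C\varepsilon^{p-1},\qquad |y^s(T,\theta_1,\theta_2,\varepsilon)-y_0(T)|\le C\varepsilon^{p-1}\qquad\text{for } T\ge T_0. \tag{3.5}$$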
4. Extension Theorem
By the Normal Form Theorem, the unstable manifold is $O(\varepsilon^{p-1})$-close to the unperturbed separatrix. The next theorem extends this local approximation to a global one. Since the unperturbed separatrix $(x_0(T), y_0(T))$ has a singularity at $T = \pm i\pi/2$, we will restrict ourselves to $|\operatorname{Im} T| \le \pi/2 - \sqrt{\varepsilon}$, i.e., up to a distance to the singularity $T = \pm i\pi/2$ of the same order as the loss of domain in the angular variables. Besides, the extension time $t + T$ will be chosen big enough in order that the unperturbed separatrix reaches again the domain of convergence of the normal form.
Theorem 3 (Extension Theorem). Let $(x_0(t), y_0(t))$ be the unperturbed homoclinic trajectory given in (1.10), let $\alpha \in (0, 1)$, $s = 2\alpha$, and assume $p - s - 2\alpha > 0$. Then, there exists $\varepsilon_0 > 0$ such that the following extension property holds: For any positive constants $C$ and $T_0$ there exists a constant $C_1$, such that for any $\varepsilon \in (0, \varepsilon_0)$, every solution of system (1.3) that satisfies the initial conditions
$$\begin{aligned}
|x(t_0)-x_0(t_0+T)| &\le C\varepsilon^{p-s}, & |y(t_0)-y_0(t_0+T)| &\le C\varepsilon^{p-s},\\
|\operatorname{Im}\theta_1(t_0)| &\le r_1-\varepsilon^\alpha, & |\operatorname{Im}\theta_2(t_0)| &\le r_2-\varepsilon^\alpha,
\end{aligned} \tag{4.1}$$
for some $T \in \mathbb{C}$, $t_0 \in \mathbb{R}$ with
$$|\operatorname{Im}T| \le \pi/2-\varepsilon^\alpha,\qquad -T_0\le t_0+\operatorname{Re}T\le 0,$$
can be extended for $-T_0 \le t + \operatorname{Re} T \le T_0$ and verifies there
$$|x(t)-x_0(t+T)| \le C_1\varepsilon^{p-s-2\alpha},\qquad |y(t)-y_0(t+T)| \le C_1\varepsilon^{p-s-2\alpha}.$$
The proof is given in Sect. 9. From now on we fix $\alpha = 1/2$. Theorem 3 can be applied to get an approximation for the stable and the unstable manifold. As we will see in Sect. 7, constructing the approximation for the invariant manifolds in such a complex domain enables us to bound the error on the real axis in a way sufficiently precise to detect the splitting.
Corollary 1. The following estimate holds:
$$h_0(x^u,y^u)-h_0(x^s,y^s) = M(\theta_1-T/\varepsilon,\theta_2-\gamma T/\varepsilon;\varepsilon)+O\big(\varepsilon^{2(p-2)}\big), \tag{4.2}$$
where the value of $h_0$ is evaluated at points of the invariant manifolds corresponding to $(T, \theta_1, \theta_2, \varepsilon)$ such that
$$\operatorname{Re}T\in(T_0-R,T_0),\qquad |\operatorname{Im}T|\le\pi/2-\sqrt{\varepsilon},\qquad |\operatorname{Im}\theta_k|\le r_k-\sqrt{\varepsilon},\quad k=1,2, \tag{4.3}$$
for any positive constants $T_0$ and $R$, $R < T_0$. The constant in the estimate (4.2) depends on these two constants.
Proof of Corollary 1. Since $\dot h_0 = \{h_0, h\}$, we have
$$h_0(x^u,y^u)=\int_{-\infty}^{0}\{h_0,h\}(x^u,y^u,\theta+\omega t/\varepsilon,\varepsilon)\,dt$$
and
$$h_0(x^s,y^s) = -\int_{0}^{+\infty}\{h_0,h\}(x^s,y^s,\theta+\omega t/\varepsilon,\varepsilon)\,dt.$$
Inside the above integrals the functions $x^u$, $y^u$, $x^s$, $y^s$ are evaluated on $(T + t, \theta + \omega t/\varepsilon, \varepsilon)$, with $\theta + \omega t/\varepsilon = (\theta_1 + t/\varepsilon, \theta_2 + \gamma t/\varepsilon)$. We can write the difference in the form
$$\begin{aligned}
h_0(x^u,y^u)-h_0(x^s,y^s) ={}& \int_{-\infty}^{+\infty}\{h_0,h\}(x_0(t+T),y_0(t+T),\theta+\omega t/\varepsilon,\varepsilon)\,dt\\
&+\int_{-\infty}^{0}\big[\{h_0,h\}(x^u,y^u,\theta+\omega t/\varepsilon,\varepsilon)-\{h_0,h\}(x_0,y_0,\theta+\omega t/\varepsilon,\varepsilon)\big]\,dt\\
&+\int_{0}^{+\infty}\big[\{h_0,h\}(x^s,y^s,\theta+\omega t/\varepsilon,\varepsilon)-\{h_0,h\}(x_0,y_0,\theta+\omega t/\varepsilon,\varepsilon)\big]\,dt.
\end{aligned}$$
We have three integrals in this expression. The first one is the Melnikov integral, and we have to bound the second and third one. We note that
A. Delshams, V. Gelfreich, À. Jorba, T. Seara
$$\{h_0, h\}(x, y, \theta_1, \theta_2, \varepsilon) = \varepsilon^p\, y \sin x\; m(\theta_1, \theta_2)$$
and, consequently, since by (1.7) $|m(\theta_1, \theta_2)| \le K/\varepsilon$ holds on (4.3), we get
$$\left|\{h_0, h\}(x^{u,s}, y^{u,s}, \theta_1, \theta_2, \varepsilon) - \{h_0, h\}(x_0, y_0, \theta_1, \theta_2, \varepsilon)\right| \le K\varepsilon^{p-1}\left(|y^{u,s} - y_0|\,|\sin x^{u,s}| + |y_0|\,|\sin x^{u,s} - \sin x_0|\right).$$
We note that $\sin x^{u,s}$ decreases exponentially as $t$ goes to $\pm\infty$, respectively, and so does $y_0$ as $t$ goes to both $\pm\infty$. Then the extension theorem and the estimates (3.4–3.5) imply that the second and third integrals are bounded by $O(\varepsilon^{2(p-2)})$ and $O(\varepsilon^{2(p-1)})$, respectively. Indeed, for the third integral we only have to use the estimate (3.4) to get an $O(\varepsilon^{2(p-1)})$ bound, and for the second integral we only have to use the extension theorem to get an $O(\varepsilon^{2(p-2)})$ bound. (See Lemma 10 for related bounds.)
5. First return

A trajectory with initial conditions on the local unstable manifold leaves the domain of the normal form. Such a trajectory remains close to the homoclinic trajectory of the unperturbed pendulum at least during the time sufficient for the unperturbed trajectory to come back to the domain of the normal form. This part of the unperturbed separatrix and, consequently, the corresponding part of the unstable manifold are close to the local stable manifold. In order to describe the difference between the unstable and local stable manifolds it is convenient to take $H$ and $T = -\log Y / H'(XY)$ as canonical coordinates near the stable separatrix. The equation of the local stable manifold is $H = 0$. In this coordinate system the unstable separatrix is a graph of a quasiperiodic function. This function is approximately the Melnikov function with the error $O(\varepsilon^{2p-4})$. We use Fourier series arguments to show that for real values of the arguments the remainder is exponentially small. It is less than the amplitude of the Melnikov function, as shown in the next theorem, which contains the Main Theorem.

Theorem 4. There exist positive constants $T_0$ and $R$, $R < T_0$, such that in the coordinate system $(H, T, \theta_1, \theta_2)$ the unstable manifold can be represented as the graph of the function $H = H^u(T, \theta_1, \theta_2; \varepsilon)$, where the function $H^u$ depends $2\pi$-periodically on $\theta_1$ and $\theta_2$. In the domain
$$\mathrm{Re}\, T \in (T_0 - R, T_0), \quad |\mathrm{Im}\, T| \le \frac{\pi}{2} - \sqrt{\varepsilon}, \quad |\mathrm{Im}\,\theta_1| < r_1 - \sqrt{\varepsilon}, \quad |\mathrm{Im}\,\theta_2| < r_2 - \sqrt{\varepsilon}, \tag{5.1}$$
this function is analytic and close to the Melnikov function:
$$H^u(T, \theta_1, \theta_2; \varepsilon) = M(\theta_1 - T/\varepsilon, \theta_2 - \gamma T/\varepsilon; \varepsilon) + O\left(\varepsilon^{2p-4}\right). \tag{5.2}$$
Moreover,
$$H^u(T, \theta_1, \theta_2; \varepsilon) = H^u(0, \theta_1 - T/\varepsilon, \theta_2 - \gamma T/\varepsilon; \varepsilon), \tag{5.3}$$
and its mean value is zero:
$$\iint_{\mathbb{T}^2} H^u(0, \theta_1, \theta_2; \varepsilon)\, d\theta_1\, d\theta_2 = 0. \tag{5.4}$$
Furthermore, for $p > 3$ and real $T$, $\theta_1$ and $\theta_2$,
$$\left|H^u(T, \theta_1, \theta_2; \varepsilon) - M(\theta_1 - T/\varepsilon, \theta_2 - \gamma T/\varepsilon; \varepsilon)\right| \le \mathrm{const}\; \varepsilon^{2p-4} \exp\left(-\frac{c(\log \varepsilon)}{\sqrt{\varepsilon}}\right), \tag{5.5}$$
where $c(\delta)$ is defined in (1.12). If the condition (1.6) is fulfilled then the Melnikov function is larger than the right-hand side of (5.5).

Proof. The equation of the unstable manifold is
$$x = x^u(T^u, \theta_1, \theta_2, \varepsilon), \qquad y = y^u(T^u, \theta_1, \theta_2, \varepsilon).$$
The Extension Theorem implies that they are $\varepsilon^{p-2}$-close to the pendulum separatrix for $|\mathrm{Re}\, T^u| < T_0$. Choosing $T_0$ large enough we ensure that the segment of the unstable separatrix which corresponds to $T_0 - R < \mathrm{Re}\, T^u < T_0$ belongs to the domain of the normal form. Then we can represent this segment in the parametric form
$$H = \tilde H^u(T^u, \theta_1, \theta_2, \varepsilon), \qquad T = \tilde T(T^u, \theta_1, \theta_2, \varepsilon), \tag{5.6}$$
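evaluating $H$ and $T$ at a point $x = x^u(T^u, \theta_1, \theta_2, \varepsilon)$, $y = y^u(T^u, \theta_1, \theta_2, \varepsilon)$. Denote by $X^u$ and $Y^u$ the value of the normal form coordinates at the corresponding point. As it was pointed out previously, the stable manifold is given by $X^s = 0$ and $Y^s(T, \theta_1, \theta_2) = e^{-\lambda T}$. Using the Normal Form theorem, we obtain
$$\begin{aligned}
\tilde H^u(T^u, \theta_1, \theta_2, \varepsilon) &= H(X^u Y^u, \varepsilon) = H(X^u Y^u, \varepsilon) - H(X^s Y^s, \varepsilon) \\
&= H_0(X^u Y^u) - H_0(X^s Y^s) + \varepsilon^{p-1}\left(H_1(X^u Y^u, \varepsilon) - H_1(X^s Y^s, \varepsilon)\right) \\
&= H_0(X^u Y^u) - H_0(X^s Y^s) + O\left(\varepsilon^{2(p-2)}\right) \\
&= H_0\left(X^{(0)}(x^u, y^u)\, Y^{(0)}(x^u, y^u)\right) - H_0\left(X^{(0)}(x^s, y^s)\, Y^{(0)}(x^s, y^s)\right) + O\left(\varepsilon^{2(p-2)}\right) \\
&= h_0(x^u, y^u) - h_0(x^s, y^s) + O\left(\varepsilon^{2(p-2)}\right),
\end{aligned}$$
where $X^{(0)}$ and $Y^{(0)}$ denote the normal form coordinates of the unperturbed pendulum. Here the parameterizations of both invariant manifolds are taken at a point $(T^u, \theta_1, \theta_2, \varepsilon)$. Together with the estimate (4.2) this implies
$$\tilde H^u(T^u, \theta_1, \theta_2, \varepsilon) = M(\theta_1 - T^u/\varepsilon, \theta_2 - \gamma T^u/\varepsilon; \varepsilon) + O\left(\varepsilon^{2(p-2)}\right). \tag{5.7}$$
Now we have to eliminate the parameter $T^u$. From
$$\begin{aligned}
\tilde T(T^u, \theta_1, \theta_2, \varepsilon) &= T\left(X^u(T^u, \theta_1, \theta_2, \varepsilon), Y^u(T^u, \theta_1, \theta_2, \varepsilon)\right) \\
&= T\left(X^{(0)}(x^u, y^u), Y^{(0)}(x^u, y^u)\right) + O(\varepsilon^{p-2}) \\
&= T\left(X^{(0)}(x_0(T^u), y_0(T^u)), Y^{(0)}(x_0(T^u), y_0(T^u))\right) + O(\varepsilon^{p-2}) \\
&= T^u + O(\varepsilon^{p-2})
\end{aligned}$$
on the domain (5.1), we get, by Cauchy estimates, that
$$\frac{\partial \tilde T}{\partial T^u}(T^u, \theta_1, \theta_2, \varepsilon) = 1 + O(\varepsilon^{p-5/2}).$$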
If $p > 5/2$, the Implicit Function Theorem allows us to eliminate $\tilde T$ from the first equation in (5.6) and to obtain the estimate (5.2).

Suppose for a moment that the mean value of the function $H^u(T, \theta_1, \theta_2; \varepsilon)$ with respect to the angle variables is equal to zero. Then the estimate (5.5) for real $T$, $\theta_1$ and $\theta_2$ is a consequence of the quasiperiodicity of $H^u$, the estimate (5.2) and of Lemma 4 of Sect. 7. In the last lemma one has to replace $r_1$ and $r_2$ by $r_1 - \sqrt{\varepsilon}$ and $r_2 - \sqrt{\varepsilon}$, respectively, and take $\rho = \pi/2 - \sqrt{\varepsilon}$. Lemma 1 shows that for $p > 3$ the amplitude of the Melnikov function is larger than the error term in (5.5). The exponentially small upper bound for the error is proved for $p > 5/2$. So what we have for $5/2 < p \le 3$ is a very sharp upper bound for the splitting.

To use Lemma 4 we have to prove that the mean value of the function $H^u(T, \theta_1, \theta_2; \varepsilon)$ with respect to the angle variables is equal to zero. Indeed, in the variables $H, T, \theta_1, \theta_2$ the equations of motion have the form
$$\dot H = 0, \qquad \dot T = 1, \qquad \dot\theta_1 = \frac{1}{\varepsilon}, \qquad \dot\theta_2 = \frac{\gamma}{\varepsilon}.$$
Since $H$ is an integral of motion we obtain (5.3). The proof of the equality (5.4) is completely analogous to the case of periodic perturbation. Consider the part of the phase space bounded by a KAM torus and the segments of the stable and unstable separatrices. Since the flow is Hamiltonian the volume of this subset is time-invariant, that is, the volume of the trajectories which enter the subset equals the volume of the trajectories which leave it. The trajectories may enter or leave this subset only through the “turnstile” formed by the split separatrices. The algebraic value of the volume which passes through the “turnstile” during a small time interval $\Delta t$ may be evaluated in the coordinate system $(H, T, \theta_1, \theta_2)$ as
$$\Delta t \iint_{\mathbb{T}^2} H^u(T, \theta_1, \theta_2; \varepsilon)\, d\theta_1\, d\theta_2.$$
By (5.3), this integral does not depend on $T$, and thus it should be zero. In other words, the equality (5.4) means that on average there is no diffusion in the direction of the $T$-axis.

6. Rational Approximation of γ

In this section we discuss the approximations of the number $\gamma = (\sqrt{5} + 1)/2$ by rational numbers. The best approximation is given in terms of Fibonacci numbers, which are defined by the following recurrence formula:
$$F_0 = 1, \qquad F_1 = 1, \qquad F_{n+1} = F_n + F_{n-1}, \quad n \ge 1.$$
It is easy to check the following:
$$F_{n-1} = \frac{\gamma^n - (-1)^n \gamma^{-n}}{\gamma + \gamma^{-1}}, \tag{6.1}$$
and
$$F_n - \gamma F_{n-1} = \frac{(-1)^n}{\gamma^n} = \frac{(-1)^n}{F_n + \gamma^{-1} F_{n-1}}. \tag{6.2}$$
For large values of $n$ this implies
$$F_n - \gamma F_{n-1} = (-1)^n \frac{C_F}{F_{n-1}} + O\left(\frac{1}{F_{n-1}^3}\right), \qquad C_F = \frac{1}{\gamma + \gamma^{-1}}. \tag{6.3}$$
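The identities (6.1)–(6.3) are easy to confirm numerically. The following is a minimal sketch, not part of the original argument; the function names are illustrative, and the indexing convention $F_0 = F_1 = 1$ is the one used in this section.

```python
import math

GAMMA = (math.sqrt(5) + 1) / 2      # golden mean
C_F = 1 / (GAMMA + 1 / GAMMA)       # constant C_F of (6.3); equals 1/sqrt(5)


def fibonacci(n_max):
    """Return [F_0, ..., F_n_max] with the convention F_0 = F_1 = 1."""
    fib = [1, 1]
    while len(fib) <= n_max:
        fib.append(fib[-1] + fib[-2])
    return fib[: n_max + 1]


def closed_form(n):
    """Right-hand side of (6.1); it should reproduce F_{n-1}."""
    return (GAMMA**n - (-1) ** n * GAMMA ** (-n)) / (GAMMA + 1 / GAMMA)


def defect(fib, n):
    """Left-hand side of (6.2): F_n - gamma * F_{n-1}."""
    return fib[n] - GAMMA * fib[n - 1]
```

For instance, `defect(fibonacci(5), 2)` returns a value close to $\gamma^{-2}$, in agreement with (6.2) for $n = 2$.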
The estimate in the following lemma is not sharp, but it is sufficient for our purposes.

Lemma 2. If $N \in \mathbb{N}$ is not a Fibonacci number, then for all integers $k$,
$$|k - \gamma N| > \frac{\gamma C_F}{N}. \tag{6.4}$$
Proof. The first Fibonacci numbers are $F_1 = 1$, $F_2 = 2$, $F_3 = 3$ and $F_4 = 5$. Let us define
$$d_n = \min_{1 \le j < F_n}\ \min_{k} |k - \gamma j|.$$
Equation (6.2) implies that $d_2 = \gamma^{-2}$ and $d_3 = \gamma^{-3}$. Suppose that $d_{n'} = \gamma^{-n'}$ for all $n'$, $2 \le n' \le n$. First, consider an integer number $j$, $F_n < j < F_{n+1}$, and let $j' = j - F_n$. Obviously, $1 \le j' < F_{n+1} - F_n = F_{n-1}$ and we have for all $k$
$$\begin{aligned}
|k - \gamma j| &= |k - F_{n+1} - \gamma j' + (F_{n+1} - \gamma F_n)| \\
&\ge |k - F_{n+1} - \gamma j'| - |F_{n+1} - \gamma F_n| \\
&\ge d_{n-1} - \gamma^{-n-1} = \gamma^{-n+1} - \gamma^{-n-1} = \gamma^{-n}.
\end{aligned} \tag{6.5}$$
Then take $j = F_n$,
$$\min_{k} |k - \gamma j| = |F_{n+1} - \gamma F_n| = \gamma^{-n-1}.$$
Comparing with the previous inequality, we see that the minimum is reached at the Fibonacci numbers. So we have $d_{n+1} = \gamma^{-n-1}$. Consequently, by induction we obtain that $d_n = \gamma^{-n}$ for all $n \ge 2$.

Now consider a non-Fibonacci number $N$, and let $F_n < N < F_{n+1}$. Formula (6.1) implies that if $N > F_n$, then $N > \frac{\gamma+1}{\gamma+2}\gamma^n$. The estimate (6.5) with $j = N$ implies that
$$|k - \gamma N| > \gamma^{-n} \ge \frac{\gamma+1}{\gamma+2}\,\frac{1}{N},$$
which is equivalent to the desired estimate (6.4).
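As a sanity check of Lemma 2, one can search by brute force for counterexamples to (6.4) among small non-Fibonacci $N$. The sketch below is only an illustration in floating-point arithmetic (so it is meaningful for moderate $N$ only); the helper names are hypothetical.

```python
import math

GAMMA = (math.sqrt(5) + 1) / 2
C_F = 1 / (GAMMA + 1 / GAMMA)


def fibonacci_set(limit):
    """Fibonacci numbers not exceeding `limit`, with F_0 = F_1 = 1."""
    fibs, a, b = set(), 1, 1
    while a <= limit:
        fibs.add(a)
        a, b = b, a + b
    return fibs


def distance_to_integers(x):
    """min over integers k of |k - x|."""
    return abs(x - round(x))


def lemma2_violations(n_max):
    """Non-Fibonacci N <= n_max for which (6.4) fails (expected: none)."""
    fibs = fibonacci_set(n_max)
    return [
        n
        for n in range(1, n_max + 1)
        if n not in fibs and distance_to_integers(GAMMA * n) <= GAMMA * C_F / n
    ]
```

The restriction to non-Fibonacci numbers matters: for $N = 8$ the distance from $8\gamma$ to the nearest integer is already smaller than $\gamma C_F / 8$.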
7. Exponentially Small Upper Bounds

The following two lemmas provide a tool to pass from estimate (5.2) to the sharp estimate (5.5). The proof of the first lemma is standard.

Lemma 3. Let $F(\theta_1 + s/\varepsilon, \theta_2 + \gamma s/\varepsilon)$ be a $2\pi$-periodic function of the variables $\theta_1$, $\theta_2$, analytic in the product of strips $|\mathrm{Im}\,\theta_1| \le r_1$, $|\mathrm{Im}\,\theta_2| \le r_2$ and $|\mathrm{Im}\, s| \le \rho$, and $|F| \le A$ for these values of the variables. Then for all $k_1, k_2 \in \mathbb{Z}$,
$$|F_{k_1 k_2}| \le A e^{-|k_1| r_1 - |k_2| r_2} e^{-\rho |k_1 + \gamma k_2|/\varepsilon}. \tag{7.1}$$
Consider the $(2\log\gamma)$-periodic function $c_{\rho,r_1,r_2}(\delta)$ defined on the interval $[\log\varepsilon^* - \log\gamma, \log\varepsilon^* + \log\gamma]$ by
$$c_{\rho,r_1,r_2}(\delta) = C_0 \cosh\left(\frac{\delta - \log\varepsilon^*}{2}\right), \tag{7.2}$$
$$\varepsilon^* = \frac{\rho(\gamma + \gamma^{-1})}{(\gamma r_1 + r_2)\gamma^2}, \qquad C_0 = 2\sqrt{\frac{(\gamma r_1 + r_2)\rho}{\gamma + \gamma^{-1}}}, \tag{7.3}$$
and continued by $2\log\gamma$-periodicity. The following lemma gives the exponentially small upper bound for the function $F$ for real values of the variables.

Lemma 4. Let $F$ satisfy the conditions of Lemma 3. If $\gamma = (\sqrt{5} + 1)/2$ is the golden mean number and the mean value of the function $F$ is zero, then
$$|F(\theta_1, \theta_2)| \le \mathrm{const}\; A \exp\left(-\frac{c_{\rho,r_1,r_2}(\log\varepsilon)}{\sqrt{\varepsilon}}\right) \tag{7.4}$$
on the real values of its arguments. The constant depends continuously on $r_1$ and $r_2$ on $r_1 > 0$ and $r_2 > 0$.

Proof. If the arguments of the function $F$ are real, then
$$|F(\theta_1, \theta_2)| \le \sum_{|k_1|+|k_2| \ne 0} |F_{k_1 k_2}| \le A \sum_{|k_1|+|k_2| \ne 0} \exp\left(-|k_1| r_1 - |k_2| r_2 - \rho|k_1 + \gamma k_2|/\varepsilon\right), \tag{7.5}$$
due to the estimate (7.1). In order to estimate the last sum in (7.5) we separate it into two parts. The first one contains non-resonant terms, that is, all the terms such that $|k_1 + \gamma k_2| \ge 1/2$. We easily obtain the upper estimate
$$\sum_{|k_1 + \gamma k_2| \ge 1/2} e^{-|k_1| r_1 - |k_2| r_2 - \rho|k_1 + \gamma k_2|/\varepsilon} < e^{-\rho/(2\varepsilon)} \sum_{|k_1|+|k_2| \ne 0} e^{-|k_1| r_1 - |k_2| r_2} = \frac{2\left(e^{-r_1} + e^{-r_2} - e^{-r_1 - r_2}\right) e^{-\rho/(2\varepsilon)}}{(1 - e^{-r_1})(1 - e^{-r_2})}. \tag{7.6}$$
For the resonant terms we have $|k_1 + \gamma k_2| < 1/2$. Obviously, for every $k_2$ there exists exactly one integer $k_1 = k_1(k_2)$ such that this inequality holds. Since the coefficients of the sum (7.5) are even with respect to $(k_1, k_2)$ we can assume that $k_2$ is positive and, at the end, multiply the estimates by 2. The sum of the resonant terms with $k_2 \ge \varepsilon^{-1}$ can be easily estimated:
$$\sum_{k_2 \ge \varepsilon^{-1}} e^{-|k_1| r_1 - |k_2| r_2 - \rho|k_1 + \gamma k_2|/\varepsilon} \le \sum_{k_2 \ge \varepsilon^{-1}} e^{-|k_1| r_1 - |k_2| r_2} \le \sum_{k_2 \ge \varepsilon^{-1}} e^{r_1/2 - (\gamma r_1 + r_2) k_2} \le \frac{e^{r_1/2}\, e^{-(\gamma r_1 + r_2)/\varepsilon}}{1 - e^{-(\gamma r_1 + r_2)}}. \tag{7.7}$$
Now we estimate the resonant terms with $1 \le k_2 < \varepsilon^{-1}$. The number of such terms is large, but finite. We will show that all of them, except at most 4, can be estimated by $O(e^{-C_1/\sqrt{\varepsilon}})$ with a constant $C_1 > \max_\delta c_{\rho,r_1,r_2}(\delta) = C_0\left(\gamma^{1/2} + \gamma^{-1/2}\right)/2$. Let $B$ denote the following expression from the exponent of the right-hand side of (7.5), obtained after substituting $|k_1| = \gamma k_2$:
$$B(k_2, \varepsilon) = (\gamma r_1 + r_2) k_2 \sqrt{\varepsilon} + \frac{\rho|k_1 + \gamma k_2|}{\sqrt{\varepsilon}}.$$
It is sufficient to provide an appropriate lower bound for this function. If $k_2$ is not a Fibonacci number, then we use Lemma 2:
$$B(k_2, \varepsilon) \ge (\gamma r_1 + r_2) k_2 \sqrt{\varepsilon} + \frac{\rho\gamma C_F}{k_2\sqrt{\varepsilon}} \ge 2\sqrt{(\gamma r_1 + r_2)\rho\gamma C_F} \equiv C_1 = \sqrt{\gamma}\, C_0.$$
If $k_2$ is a Fibonacci number, then instead of (6.4), we use (6.2),
$$|k_1 + \gamma k_2| = \frac{1}{|k_1 + \gamma^{-1} k_2|} \ge \frac{1}{\gamma k_2 + 1 + \gamma^{-1} k_2} = \frac{C_F}{k_2 + C_F},$$
to obtain
$$B(k_2, \varepsilon) \ge (\gamma r_1 + r_2) k_2 \sqrt{\varepsilon} + \frac{\rho C_F}{(k_2 + C_F)\sqrt{\varepsilon}}.$$
Provided $\varepsilon$ is small, $0 < \varepsilon < \varepsilon_0$, there are two positive numbers $K_1$ and $K_2$ such that the right-hand side of the last inequality is larger than $C_1$ for $k_2$ outside the interval $(K_1/\sqrt{\varepsilon}, K_2/\sqrt{\varepsilon})$. Moreover, this interval contains at most 2 Fibonacci numbers, that is,
$$B(k_2, \varepsilon) \ge C_1 \tag{7.8}$$
for all except at most 2 terms. For these exceptional terms
$$B(k_2, \varepsilon) \ge (\gamma r_1 + r_2) k_2 \sqrt{\varepsilon} + \frac{\rho C_F}{k_2\sqrt{\varepsilon}} - \frac{\rho C_F^2 \sqrt{\varepsilon}}{K_1(K_1 + C_F\sqrt{\varepsilon})},$$
and it is convenient to rewrite
$$B(k_2, \varepsilon) \ge C_0 \cosh\left(\log(k_2\sqrt{\varepsilon}) - \log\sqrt{\frac{\rho C_F}{\gamma r_1 + r_2}}\right) - O(\sqrt{\varepsilon}).$$
The above $O(\sqrt{\varepsilon})$ term affects only the constant in front of the estimate (7.4), since the terms in the sum of the right-hand side of (7.5) are of the form $\exp(-B(k_2, \varepsilon)/\sqrt{\varepsilon})$. Since $k_2$ is a Fibonacci number, $k_2 = F_n$ for some $n$ and, taking into account (6.1), we obtain
$$B(k_2, \varepsilon) \ge C_0 \cosh\left(\frac{1}{2}\log\varepsilon + n\log\gamma + \log\frac{\gamma+1}{\gamma+2} - \log\sqrt{\frac{\rho C_F}{\gamma r_1 + r_2}}\right) - O(\sqrt{\varepsilon}).$$
The envelope of this family of curves is the function $c_{\rho,r_1,r_2}(\delta)$ defined by Eq. (7.2). Thus, in the sum of the resonant terms there is one leading term which is exponentially larger than the others except in the neighbourhoods of $\varepsilon = \varepsilon^*\gamma^n$, when the index of the leading term changes, and there are two terms of the same order. Moreover, we have established that for all resonant terms with $k_2 < \varepsilon^{-1}$,
$$B(k_2, \varepsilon) \ge c_{\rho,r_1,r_2}(\log\varepsilon) - O(\sqrt{\varepsilon}).$$
Together with the estimates (7.6), (7.7) and (7.8), this completes the proof.
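To make the exponent in (7.4) concrete, the periodic function $c_{\rho,r_1,r_2}$ can be evaluated as follows. This is a sketch under an assumption: the formulas for $\varepsilon^*$ and $C_0$ are taken from (7.2)–(7.3) as read here, namely $\varepsilon^* = \rho(\gamma+\gamma^{-1})/((\gamma r_1 + r_2)\gamma^2)$ and $C_0 = 2\sqrt{(\gamma r_1 + r_2)\rho/(\gamma+\gamma^{-1})}$. The function names are illustrative.

```python
import math

GAMMA = (math.sqrt(5) + 1) / 2


def envelope_constants(rho, r1, r2):
    """Return (eps_star, C0) following (7.3) as reconstructed above (assumption)."""
    a = GAMMA * r1 + r2
    eps_star = rho * (GAMMA + 1 / GAMMA) / (a * GAMMA**2)
    c0 = 2 * math.sqrt(a * rho / (GAMMA + 1 / GAMMA))
    return eps_star, c0


def c_envelope(delta, rho, r1, r2):
    """Evaluate c_{rho,r1,r2}(delta): (7.2) extended (2 log gamma)-periodically."""
    eps_star, c0 = envelope_constants(rho, r1, r2)
    period = 2 * math.log(GAMMA)
    # Shift delta into the base interval centred at log(eps_star).
    shifted = (delta - math.log(eps_star) + period / 2) % period - period / 2
    return c0 * math.cosh(shifted / 2)
```

With these definitions the function oscillates between $C_0$ and $C_0(\gamma^{1/2} + \gamma^{-1/2})/2$, which stays below the constant $C_1 = \sqrt{\gamma}\,C_0$ used for the non-leading terms.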
8. Normal Form Theorem

This theorem gives a convergent normal form in a neighbourhood of a hyperbolic torus of a one degree of freedom Hamiltonian system under quasiperiodic time-dependent perturbations (with two frequencies). The main contribution is that the convergence is ensured in a wide domain of the angle variables $\theta$ of the perturbation: the loss of domain is only $O(\sqrt{\varepsilon})$ in the complex direction of $\theta$.

Theorem 5 (Normal Form Theorem). Let $K$ be a Hamiltonian of the form
$$K(x, y, \theta, p, \varepsilon) = \frac{\omega \cdot p}{\varepsilon} + h_0(x, y, \varepsilon) + \varepsilon^q h_1(x, y, \theta, \varepsilon), \tag{8.1}$$
with regard to the symplectic form $dx \wedge dy + d\theta \wedge dp$, with $h_0$, $h_1$ analytic in the variables $x$, $y$, $\theta$ and with continuous (and bounded) dependence on $\varepsilon$, on the set $|x|, |y| < r_0$, $|\mathrm{Im}\,\theta_i| < \rho_i - \sqrt{\varepsilon}$ $(i = 1, 2)$, $0 < \varepsilon < \varepsilon_0$, for some positive constants $r_0$, $\rho = (\rho_1, \rho_2)$ and $\varepsilon_0$. Assume also:

1. There exists $c > 0$ such that $|k \cdot \omega| \ge c/|k|$, $\forall k \in \mathbb{Z}^2 \setminus \{0\}$.
2. The origin is a saddle point of the Hamiltonian $h_0(x, y, 0)$.
3. $h_1(0, 0, \theta, \varepsilon) = \partial_x h_1(0, 0, \theta, \varepsilon) = \partial_y h_1(0, 0, \theta, \varepsilon) = 0$.

Then, there exist $\varepsilon_1$ $(0 < \varepsilon_1 < \varepsilon_0)$, $r_1$ $(0 < r_1 < r_0)$ and a canonical change of variables
$$\begin{aligned}
x &= x^{(0)}(X, Y, \varepsilon) + \varepsilon^q x^{(1)}(X, Y, \theta, \varepsilon), \\
y &= y^{(0)}(X, Y, \varepsilon) + \varepsilon^q y^{(1)}(X, Y, \theta, \varepsilon), \\
\theta &= \theta, \\
p &= p(X, Y, \theta, P, \varepsilon),
\end{aligned}$$
analytic in the variables $X$, $Y$, $\theta$, $P$, and bounded and continuous in $\varepsilon$, on the set $|x|, |y| < r_1$, $|\mathrm{Im}\,\theta_i| < \rho_i - 2\sqrt{\varepsilon}$ $(i = 1, 2)$, $0 < \varepsilon < \varepsilon_1$, which transforms Hamiltonian (8.1) into its normal form:
$$K(X, Y, P, \varepsilon) = \frac{\omega \cdot P}{\varepsilon} + H_0(XY, \varepsilon) + \varepsilon^q H_1(XY, \varepsilon).$$
Moreover, the canonical change of variables $x = x^{(0)}(X, Y, \varepsilon)$, $y = y^{(0)}(X, Y, \varepsilon)$ transforms the unperturbed Hamiltonian $h_0$ into its normal form $H_0$.

8.1. Idea of the Proof. The proof is based on a quadratically convergent scheme, similar to the one used in the proof of KAM Theorem (see [Arn63]). The first step is to put the Hamiltonian in the action-angle variables of $H_0$:
$$\frac{\omega \cdot p}{\varepsilon} + H_0(z, \varepsilon) + \varepsilon^q H_1(z, \varphi, \theta, \varepsilon),$$
where the couples $(z, \varphi)$ and $(p, \theta)$ correspond to canonically conjugated variables. Here $H_0(z, 0)$ is the normal form of the unperturbed Hamiltonian.
The next step is to start a sequence of changes of variables to kill the term $\varepsilon^q H_1$, in the same way as it is done in the proof of the KAM Theorem. The main difference is due to the fact that the “small” divisors we will obtain are of the form
$$\varepsilon\ell\lambda_0(z, \varepsilon) + \sqrt{-1}\, k \cdot \omega, \qquad \ell \in \mathbb{Z},\ k \in \mathbb{Z}^2,\ |\ell| + |k| \ne 0,$$
where both $\lambda_0(z, \varepsilon) := \partial_z H_0(z, \varepsilon)$ and $\omega$ are real. As a consequence, all the divisors with $\ell \ne 0$ are separated from zero, so they do not produce convergence problems. The only small divisors correspond to the case $\ell = 0$, so we will ask $\omega$ to satisfy a suitable Diophantine condition. In this case, as $\omega$ is fixed in all the iterative process (see Lemma 7), the initial Diophantine condition will be satisfied in all the steps of the iterative scheme. This also implies that we can have convergence on an open set of the phase space. This makes a difference with standard KAM problems: there it is usual that frequencies depend on actions and then they have to be controlled at each step of the proof. This leads us to take out a Cantor-like set of actions, so the convergence is only proved on sets with empty interior. Finally, let us comment that the quadratic convergence allows us to be very strict in the amount of domain lost at each step, so we have been able to show that the loss of domain in the $\theta$ variables is bounded by $\sqrt{\varepsilon}$.

We want to stress that the good properties of this case are due to the fact that the unperturbed problem has a saddle point with one degree of freedom. If the saddle is replaced by a centre (with one or more degrees of freedom) we obtain a standard KAM Theorem valid only on a set with empty interior (see [JS96]). If the unperturbed Hamiltonian has a saddle point with more than one degree of freedom we obtain new small divisors (the resonances between the eigenvalues of the saddle when $k = 0$), which need to be controlled by using the actions of the Hamiltonian. This produces a KAM Theorem on the conservation of hyperbolic invariant manifolds.
There is another detail worth commenting on: if we proceed exactly in the way mentioned above, we have technical problems due to the lack of definition of action-angle variables at the origin (we want to show convergence on a neighbourhood of that point!). To avoid this difficulty, we will work all the time with spatial (cartesian) coordinates, but grouping them as if they were the action and angle ones. One can see that (some of) these groupings have poles at the origin (as expected), but they also have factors that cancel those singularities (as expected too). So, bounding them together we can show that the whole expression is well defined and convergent in a neighbourhood of the origin. Now, let us go on to the details.

8.2. Technical lemmas. This section contains the lemmas used during the proof of Theorem 5.

Lemma 5. Let $\omega = (\omega_1, \omega_2) \in \mathbb{R}^2$ be such that $|k \cdot \omega| \ge c/|k|$, $\forall k \in \mathbb{Z}^2 \setminus \{0\}$, for some constant $c > 0$. Then there exists a constant $\beta = \beta(\omega) > 0$ such that the following inequality holds for all $\alpha \in (0, 1]$:
$$\sum_{k \in \mathbb{Z}^2 \setminus \{0\}} \frac{e^{-\alpha|k|}}{|k \cdot \omega|} \le \frac{\beta}{\alpha^2}.$$

Proof. Assume, for instance, that $|\omega_1| \le |\omega_2|$. Given an integer $k_1 \ne 0$, there exists a unique integer $k_2 = k_2^*(k_1)$ such that $|k_1\omega_1/\omega_2 + k_2| < 1/2$, and in particular, such that $|k_1\omega_1 + k_2\omega_2| < |\omega_2|/2$. Moreover, $|k_2^*(k_1)| \le |k_1\omega_1/\omega_2| + |k_1\omega_1/\omega_2 + k_2^*(k_1)| < |k_1| + 1/2$. Therefore,
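$$\begin{aligned}
\sum_{k \in \mathbb{Z}^2 \setminus \{0\}} \frac{e^{-\alpha|k|}}{|k \cdot \omega|}
&= \sum_{k_1 \in \mathbb{Z}}\ \sum_{k_2 \ne k_2^*(k_1)} \frac{e^{-\alpha|k|}}{|k \cdot \omega|} + \sum_{k_1 \ne 0,\ k_2 = k_2^*(k_1)} \frac{e^{-\alpha|k|}}{|k \cdot \omega|} \\
&< \frac{2}{|\omega_2|} \sum_{k \in \mathbb{Z}^2} e^{-\alpha|k|} + \frac{1}{c} \sum_{k_1 \ne 0} e^{-\alpha|k_1|}\left(2|k_1| + \frac{1}{2}\right) \\
&= \frac{2}{|\omega_2|} \coth^2\frac{\alpha}{2} + \frac{2}{c} \sum_{n=1}^{\infty} e^{-\alpha n}\left(2n + \frac{1}{2}\right) \\
&\le \left(\frac{2}{|\omega_2|} + \frac{5 - e^{-\alpha}}{4c}\right) \coth^2\frac{\alpha}{2} \le \frac{\beta}{\alpha^2}.
\end{aligned}$$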
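The $\alpha^{-2}$ growth in Lemma 5 can be observed numerically. The sketch below is an illustration only: it fixes the frequency vector $\omega = (1, \gamma)$ and the Diophantine constant $c = 0.38$ (both assumptions chosen for this example, not values given in the text), uses $|k| = |k_1| + |k_2|$, and compares a truncated sum with the explicit majorant obtained in the proof before it is absorbed into $\beta/\alpha^2$.

```python
import math

GAMMA = (math.sqrt(5) + 1) / 2
OMEGA = (1.0, GAMMA)   # assumed frequency vector for this illustration
C_DIOPH = 0.38         # assumed constant with |k.omega| >= C_DIOPH / |k|


def small_divisor_sum(alpha, k_max):
    """Truncation of sum_{k != 0} exp(-alpha |k|) / |k.omega| to |k_i| <= k_max."""
    total = 0.0
    for k1 in range(-k_max, k_max + 1):
        for k2 in range(-k_max, k_max + 1):
            if k1 == 0 and k2 == 0:
                continue
            norm = abs(k1) + abs(k2)
            total += math.exp(-alpha * norm) / abs(k1 * OMEGA[0] + k2 * OMEGA[1])
    return total


def proof_bound(alpha):
    """Majorant from the proof: (2/|w2|) coth^2(a/2) + (2/c) sum_n e^{-a n}(2n + 1/2)."""
    q = math.exp(-alpha)
    coth_sq = ((1 + q) / (1 - q)) ** 2
    # Closed form of sum_{n >= 1} q^n (2n + 1/2).
    tail = 2 * q / (1 - q) ** 2 + 0.5 * q / (1 - q)
    return 2 / abs(OMEGA[1]) * coth_sq + (2 / C_DIOPH) * tail
```

For these choices, `alpha**2 * proof_bound(alpha)` stays bounded on $(0, 1]$, which is the content of the final inequality in the proof.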
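Lemma 6. Let us consider the change of variables $x = x(X, Y, \theta, \varepsilon)$, $y = y(X, Y, \theta, \varepsilon)$ given implicitly by
$$X = x + \varepsilon^m \frac{\partial S}{\partial Y}(x, Y, \theta, \varepsilon), \qquad y = Y + \varepsilon^m \frac{\partial S}{\partial x}(x, Y, \theta, \varepsilon), \tag{8.2}$$
where $S(z_1, z_2, \theta, \varepsilon)$ is defined on the set $\varepsilon \in (0, \varepsilon_0]$, $|z_i| < r$, $|\mathrm{Im}\,\theta_i| \le \rho_i$, $i = 1, 2$, with $z = (z_1, z_2) \in \mathbb{C}^2$ and $\mathrm{Re}\,\theta = (\mathrm{Re}\,\theta_1, \mathrm{Re}\,\theta_2) \in \mathbb{T}^2$. Moreover, it depends on $z_1$, $z_2$ and $\theta$ in an analytic way, and has continuous and bounded dependence on $\varepsilon$. Let us also assume that $|S(z_1, z_2, \theta, \varepsilon)| \le M$ on this set and let $\eta$ be such that $0 < \eta < 1$. Then, if
$$\varepsilon_0^m \le \frac{r^2\eta^2(1 - \eta)}{2M}, \tag{8.3}$$
Eq. (8.2) defines the change $x = x(X, Y, \theta, \varepsilon)$, $y = y(X, Y, \theta, \varepsilon)$, analytic in the variables $X$, $Y$, $\theta$ and bounded and continuous in $\varepsilon$, on the set $|X| < r(1 - \eta)^2$, $|Y| < r(1 - \eta)^2$, $|\mathrm{Im}\,\theta_i| \le \rho_i$. Besides, the following bounds hold:
$$|X - x| \le \varepsilon^m \frac{M}{r\eta}, \qquad |Y - y| \le \varepsilon^m \frac{M}{r\eta}. \tag{8.4}$$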
The proof of this lemma is omitted, since it is straightforward.

Now let us introduce some notations to be used in this section. Let $f(\theta)$ be a periodic function, $\theta \in \mathbb{T}^2$, analytic on a (complex) strip of width $\rho = (\rho_1, \rho_2)$, that is, analytic on $|\mathrm{Im}\,\theta| < \rho$ and continuous on the boundary. We denote by $\|f\|_\rho$ the sup norm on that set, that is,
$$\|f\|_\rho = \sup_{|\mathrm{Im}\,\theta| \le \rho} |f(\theta)|,$$
where $|\mathrm{Im}\,\theta| \le \rho$ means $|\mathrm{Im}\,\theta_1| \le \rho_1$ and $|\mathrm{Im}\,\theta_2| \le \rho_2$. Moreover, if $F(z, \theta)$ is an analytic periodic function with respect to $\theta \in \mathbb{T}^2$ on a strip of width $\rho$, and analytic with respect to $z$ on $|z| \le r$, we define the norm
$$\|F\|_{r,\rho} = \sup_{|z| \le r} \|F(z, \cdot)\|_\rho = \sup_{|z| \le r,\ |\mathrm{Im}\,\theta| \le \rho} |F(z, \theta)|.$$
If $F$ depends only on $z$, we simply denote $\|F\|_{r,0} = \sup_{|z| \le r} |F(z)|$. Here we have assumed that $z \in \mathbb{C}$ or $z \in \mathbb{C}^2$. Of course, in this last case the notation $|z| \le r$ means $|z_1| \le r$ and $|z_2| \le r$.
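For the sake of simplicity (and without loss of generality), we will assume that $\varepsilon_0 \le 1$, $\lambda \le 1$, $r \le 1$, $A_0 \ge 1$ and $A_1 \ge 1$. This will be used throughout this section to avoid cumbersome bounds.

Lemma 7 (Inductive Lemma). Let us consider the Hamiltonian
$$\frac{\omega \cdot p}{\varepsilon} + K_0(xy, \varepsilon) + \varepsilon^m K_1(x, y, \theta, \varepsilon), \tag{8.5}$$
where

1. $K_0(z, \varepsilon)$ is an analytic function of $z$ on $|z| \le r^2$ and $0 < \varepsilon \le \varepsilon_0$.
2. $|\partial_z K_0(z, \varepsilon)| \ge \lambda > 0$, on $|z| \le r^2$, $0 < \varepsilon \le \varepsilon_0$.
3. $K_0(z, \varepsilon)$ and $K_1(x, y, \theta, \varepsilon)$ depend on $\varepsilon$ in a continuous and bounded way, if $0 < \varepsilon \le \varepsilon_0$.
4. $K_1(x, y, \theta, \varepsilon)$ is analytic on $|x| \le r$, $|y| \le r$, $|\mathrm{Im}\,\theta_i| \le \rho_i$, $i = 1, 2$.
5. $K_1(0, 0, \theta, \varepsilon) = \partial_x K_1(0, 0, \theta, \varepsilon) = \partial_y K_1(0, 0, \theta, \varepsilon) = 0$.

Moreover, let us define $A_0 = \|K_0\|_{r,0}$ and $A_1 = \|K_1\|_{r,\rho}$, and let us consider $0 < \eta < 1/2$ and $0 < \delta < 1$. Then, for $\varepsilon_0$ small enough (which is detailed below), there exists a canonical change of variables, given implicitly by a generating function $\theta \cdot P + Yx + \varepsilon^m S(x, Y, \theta, \varepsilon)$, such that the new Hamiltonian takes the form
$$\frac{\omega \cdot P}{\varepsilon} + \hat K_0(XY, \varepsilon) + \varepsilon^{2m} \hat K_1(X, Y, \theta, \varepsilon), \tag{8.6}$$
where

1. The “normal form” $\hat K_0(Z, \varepsilon)$ is an analytic function of $Z$ on $|Z| \le r^2(1 - \eta)^6$, $0 < \varepsilon \le \varepsilon_0$.
2. $|\partial_Z \hat K_0(Z, \varepsilon)| \ge \lambda - \varepsilon_0^m A_1/\eta > 0$, on $|Z| \le r^2(1 - \eta)^6$.
3. $\hat K_0(Z, \varepsilon)$ and $\hat K_1(X, Y, \theta, \varepsilon)$ depend on $\varepsilon$ in a continuous and bounded way, if $0 < \varepsilon \le \varepsilon_0$.
4. $\hat K_1(X, Y, \theta, \varepsilon)$ is analytic on $|X| \le r(1 - \eta)^3$, $|Y| \le r(1 - \eta)^3$, $|\mathrm{Im}\,\theta_i| \le \hat\rho_i = \rho_i - \delta\sqrt{\varepsilon}$, $i = 1, 2$.
5. $\hat K_1(0, 0, \theta, \varepsilon) = \partial_x \hat K_1(0, 0, \theta, \varepsilon) = \partial_y \hat K_1(0, 0, \theta, \varepsilon) = 0$.

If we define $\hat A_0 = \|\hat K_0\|_{r(1-\eta)^3,0}$ and $\hat A_1 = \|\hat K_1\|_{r(1-\eta)^3,\hat\rho}$, the following bounds hold:
$$\hat A_0 \le A_0 + \varepsilon_0^m \frac{A_1}{\eta}, \qquad \hat A_1 \le C\,\frac{A_0 A_1^2}{\lambda^2 r^4 \delta^4 \eta^8},$$
where $C$ is a constant that only depends on $\omega$. The generating function is bounded by
$$\|S(x, Y, \theta, \varepsilon)\|_{r(1-\eta),\hat\rho} \le \frac{B A_1}{\lambda\delta^2\eta^2}, \tag{8.7}$$
where the constant $B$ only depends on $\omega$. Finally, in order to apply this lemma, $\varepsilon_0$ must satisfy (8.3), where $M$ is the right-hand side of (8.7).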
Proof. In order to facilitate the reading, the proof has been split into several parts.

1. Rearranging the initial Hamiltonian. Let us start by expanding $K_1$ in Taylor series with respect to $x$ and $y$,
$$K_1(x, y, \theta, \varepsilon) = \sum_{i+j \ge 2} a_{ij}(\theta, \varepsilon)\, x^i y^j,$$
where $a_{ij}(\theta, \varepsilon)$ is a periodic function of $\theta$ that can be expanded in Fourier series,
$$a_{ij}(\theta, \varepsilon) = \sum_{k \in \mathbb{Z}^2} a^k_{ij}(\varepsilon)\, e^{k \cdot \theta\sqrt{-1}}.$$
Moreover, the following bounds hold:
$$\|a_{ij}\|_\rho \le \frac{A_1}{r^{i+j}}, \qquad |a^k_{ij}| \le \frac{A_1}{r^{i+j}}\, e^{-|k_1|\rho_1 - |k_2|\rho_2}, \qquad 0 < \varepsilon \le \varepsilon_0.$$
Let us write $K_1$ in the following form:
$$K_1(x, y, \theta, \varepsilon) = \sum_{i+j \ge 2}\ \sum_{k \in \mathbb{Z}^2} a^k_{ij}(\varepsilon)\, (xy)^j x^{i-j} e^{k \cdot \theta\sqrt{-1}}.$$
Now, we define $\ell = i - j$. The conditions $i \ge 0$, $j \ge 0$ and $i + j \ge 2$ are simply replaced by $\ell + j \ge 0$, $j \ge 0$ and $\ell + 2j \ge 2$ in terms of $\ell$ and $j$, so
$$K_1(x, y, \theta, \varepsilon) = \sum_{k \in \mathbb{Z}^2,\ \ell \in \mathbb{Z}} \left[\ \sum_{j \ge \max\{0,\, 1 - \ell/2,\, -\ell\}} a^k_{\ell+j,j}(\varepsilon)\, (xy)^j \right] x^\ell e^{k \cdot \theta\sqrt{-1}},$$
and defining $a^{k,\ell}(xy, \varepsilon)$ as the expression between brackets, one obtains
$$K_1(x, y, \theta, \varepsilon) = \sum_{k \in \mathbb{Z}^2,\ \ell \in \mathbb{Z}} a^{k,\ell}(xy, \varepsilon)\, x^\ell e^{k \cdot \theta\sqrt{-1}}.$$
For the moment, we will handle these expressions as formal series and, once the change of variables is (formally) computed, we will check that everything is well defined on the suitable domains.

2. Computing the generating function. We will look for a change of variables given by a generating function of the type $P \cdot \theta + Yx + \varepsilon^m S(x, Y, \theta, \varepsilon)$. So, the corresponding change is
$$p = P + \varepsilon^m \frac{\partial S}{\partial \theta}, \qquad \theta = \theta, \qquad X = x + \varepsilon^m \frac{\partial S}{\partial Y}, \qquad y = Y + \varepsilon^m \frac{\partial S}{\partial x}. \tag{8.8}$$
We want this change to transform (8.5) into (8.6), so we insert (8.8) into (8.5), asking the result to be equal to (8.6):
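$$\begin{aligned}
&\frac{\omega}{\varepsilon} \cdot P + \varepsilon^m \frac{\omega}{\varepsilon} \cdot \frac{\partial S}{\partial \theta} + K_0\left(x\left(Y + \varepsilon^m \frac{\partial S}{\partial x}\right), \varepsilon\right) + \varepsilon^m K_1\left(x, Y + \varepsilon^m \frac{\partial S}{\partial x}, \theta, \varepsilon\right) \\
&\qquad = \frac{\omega}{\varepsilon} \cdot P + \hat K_0\left(\left(x + \varepsilon^m \frac{\partial S}{\partial Y}\right) Y, \varepsilon\right) + \varepsilon^{2m} \hat K_1\left(x + \varepsilon^m \frac{\partial S}{\partial Y}, Y, \theta, \varepsilon\right).
\end{aligned}$$
This equation is automatically satisfied at order 0 in $\varepsilon$, if we choose $\hat K_0(z, \varepsilon) = K_0(z, \varepsilon) + O(\varepsilon^m)$. If we take the terms of order $\varepsilon^m$ and we ask them to cancel out we obtain an equation that determines $S$:
$$\frac{\omega}{\varepsilon} \cdot \frac{\partial S}{\partial \theta} + \lambda_0(xY)\left(x \frac{\partial S}{\partial x} - Y \frac{\partial S}{\partial Y}\right) + K_1(x, Y, \theta, \varepsilon) = 0, \tag{8.9}$$
where $\lambda_0(z, \varepsilon) = \partial_z K_0(z, \varepsilon)$. We will try to solve this equation in the sense of formal series. We look for $S$ of the same type as $K_1$:
$$S(x, Y, \theta, \varepsilon) = \sum_{k \in \mathbb{Z}^2,\ \ell \in \mathbb{Z}} s^{k,\ell}(xY, \varepsilon)\, x^\ell e^{k \cdot \theta\sqrt{-1}},$$
with
$$s^{k,\ell}(xY, \varepsilon) = \sum_{j \ge \max\{0,\, 1 - \ell/2,\, -\ell\}} s^k_{\ell+j,j}(\varepsilon)\, (xY)^j.$$
The next step is to substitute $S$ into (8.9), to obtain
$$\sum_{k \in \mathbb{Z}^2,\ \ell \in \mathbb{Z}} \sqrt{-1}\, s^{k,\ell}(xY, \varepsilon)\, x^\ell\, k \cdot \frac{\omega}{\varepsilon}\, e^{k \cdot \theta\sqrt{-1}} + \lambda_0(xY) \sum_{k \in \mathbb{Z}^2,\ \ell \in \mathbb{Z}} \ell\, s^{k,\ell}(xY, \varepsilon)\, x^\ell e^{k \cdot \theta\sqrt{-1}} + \sum_{k \in \mathbb{Z}^2,\ \ell \in \mathbb{Z}} a^{k,\ell}(xY, \varepsilon)\, x^\ell e^{k \cdot \theta\sqrt{-1}} = 0.$$
Equating terms, we obtain the following set of equations:
$$\sqrt{-1}\, k \cdot \frac{\omega}{\varepsilon}\, s^{k,\ell}(xY, \varepsilon) + \lambda_0(xY, \varepsilon)\, \ell\, s^{k,\ell}(xY, \varepsilon) + a^{k,\ell}(xY, \varepsilon) = 0, \qquad k \in \mathbb{Z}^2,\ \ell \in \mathbb{Z}.$$
So, if $k \ne 0$ or $\ell \ne 0$, the generating function $S$ is defined (formally) by the coefficients
$$s^{k,\ell}(xY, \varepsilon) = -\frac{\varepsilon\, a^{k,\ell}(xY, \varepsilon)}{\varepsilon\ell\lambda_0(xY, \varepsilon) + \sqrt{-1}\, k \cdot \omega}.$$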
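The coefficient formula above can be illustrated with a few lines of code. This is a minimal sketch, not the authors' implementation: it treats $\lambda_0$ and $a^{k,\ell}$ as plain numbers (their values at one fixed point $xY$), and the function names are hypothetical.

```python
def generating_coefficient(a_kl, k, ell, omega, lam0, eps):
    """Solve sqrt(-1) (k.omega/eps) s + lam0 * ell * s + a_kl = 0 for s."""
    if k == (0, 0) and ell == 0:
        # The divisor vanishes here, so this term cannot be removed;
        # the coefficient s^{0,0} is free and is set to zero.
        return 0.0
    k_dot_omega = k[0] * omega[0] + k[1] * omega[1]
    divisor = eps * ell * lam0 + 1j * k_dot_omega
    return -eps * a_kl / divisor


def residual(s_kl, a_kl, k, ell, omega, lam0, eps):
    """Left-hand side of the coefficient equation; zero when s_kl solves it."""
    k_dot_omega = k[0] * omega[0] + k[1] * omega[1]
    return 1j * k_dot_omega / eps * s_kl + lam0 * ell * s_kl + a_kl
```

Since $\lambda_0$ and $\omega$ are real, the modulus of the divisor is at least $|k \cdot \omega|$, and at least $\varepsilon|\ell|\lambda$ when $k = 0$; only the terms with $\ell = 0$ give genuinely small divisors.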
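For $k = 0$ and $\ell = 0$ the divisor vanishes, so we cannot kill $a^{0,0}$ and Eq. (8.9) has no solution as stated. So, we solve Eq. (8.9) but with $a^{0,0}(xY, \varepsilon)$ instead of 0 in the right-hand side (this is not a problem because $a^{0,0}(xY, \varepsilon)$ is already in normal form). Since $s^{0,0}$ can be chosen arbitrarily, we simply take $s^{0,0}(xY, \varepsilon) = 0$.

3. Bounds on the generating function. Let us reduce the analyticity strip with respect to $\theta$ from $\rho = (\rho_1, \rho_2)$ to $\hat\rho = (\rho_1 - \delta\sqrt{\varepsilon}, \rho_2 - \delta\sqrt{\varepsilon})$ and the analyticity ball with respect to $x$ and $y$ from $r$ to $r(1 - \eta)$. Now, let us bound $S$:
$$\begin{aligned}
\|S\|_{r(1-\eta),\hat\rho} &\le \sum_{\substack{k \in \mathbb{Z}^2,\ \ell \in \mathbb{Z} \\ |k| + |\ell| \ne 0}} \|s^{k,\ell}(xY, \varepsilon)\, x^\ell\|_{r(1-\eta),0}\; e^{(\rho_1 - \delta\sqrt{\varepsilon})|k_1| + (\rho_2 - \delta\sqrt{\varepsilon})|k_2|} \\
&\le \sum_{\substack{k \in \mathbb{Z}^2,\ \ell \in \mathbb{Z} \\ |k| + |\ell| \ne 0}} \frac{\varepsilon\, \|a^{k,\ell}(xY, \varepsilon)\, x^\ell\|_{r(1-\eta),0}}{|\varepsilon\ell\lambda_0(xY) + \sqrt{-1}\, k \cdot \omega|}\; e^{(\rho_1 - \delta\sqrt{\varepsilon})|k_1| + (\rho_2 - \delta\sqrt{\varepsilon})|k_2|}.
\end{aligned} \tag{8.10}$$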
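To bound $\|a^{k,\ell}(xy, \varepsilon)\, x^\ell\|_{r(1-\eta),0}$ we recall that
$$a^{k,\ell}(xY, \varepsilon) = \sum_{j \ge \max\{0,\, 1 - \ell/2,\, -\ell\}} a^k_{\ell+j,j}\, (xY)^j.$$
Hence,
$$\|a^{k,\ell}(xY, \varepsilon)\, x^\ell\|_{r(1-\eta),0} \le \sum_{j \ge \max\{0,\, 1 - \ell/2,\, -\ell\}} |a^k_{\ell+j,j}|\, |xY|^j |x|^\ell \le A_1 e^{-\rho_1|k_1| - \rho_2|k_2|}\, (1 - \eta)^\ell \sum_{j \ge \max\{0,\, 1 - \ell/2,\, -\ell\}} (1 - \eta)^{2j}.$$
Here we distinguish several possibilities, according to the values of $\ell$:

a) $\ell \le -2$, which implies $j \ge -\ell$. Since $\sum_{j \ge -\ell} (1 - \eta)^{2j} < (1 - \eta)^{-2\ell}/\eta$, one has
$$\|a^{k,\ell}(xY, \varepsilon)\, x^\ell\|_{r(1-\eta),0} \le A_1 e^{-\rho_1|k_1| - \rho_2|k_2|}\, (1 - \eta)^{-\ell}/\eta.$$
b) $\ell = -1$, which implies $j \ge 2$. Since $\sum_{j \ge 2} (1 - \eta)^{2j} < (1 - \eta)^4/\eta$,
$$\|a^{k,\ell}(xY, \varepsilon)\, x^\ell\|_{r(1-\eta),0} \le A_1 e^{-\rho_1|k_1| - \rho_2|k_2|}\, (1 - \eta)^3/\eta.$$
c) $\ell = 0, 1$, which implies $j \ge 1$. Since $\sum_{j \ge 1} (1 - \eta)^{2j} < (1 - \eta)^2/\eta$,
$$\|a^{k,\ell}(xY, \varepsilon)\, x^\ell\|_{r(1-\eta),0} \le A_1 e^{-\rho_1|k_1| - \rho_2|k_2|}\, (1 - \eta)^{2+\ell}/\eta. \tag{8.11}$$
d) $\ell \ge 2$, which implies $j \ge 0$. Since $\sum_{j \ge 0} (1 - \eta)^{2j} < 1/\eta$,
$$\|a^{k,\ell}(xY, \varepsilon)\, x^\ell\|_{r(1-\eta),0} \le A_1 e^{-\rho_1|k_1| - \rho_2|k_2|}\, (1 - \eta)^\ell/\eta.$$

Now, putting these bounds into (8.10), separating the cases $k = 0$ and $k \ne 0$, and taking into account that $|\varepsilon\ell\lambda_0(xY) + \sqrt{-1}\, k \cdot \omega| \ge |k \cdot \omega|$, one obtains
$$\|S\|_{r(1-\eta),\hat\rho} \le \sum_{\ell \ne 0} \frac{\|a^{0,\ell}(xY, \varepsilon)\, x^\ell\|_{r(1-\eta),0}}{|\ell|\lambda} + \sum_{k \ne 0} \sum_{\ell \in \mathbb{Z}} \frac{\varepsilon\, \|a^{k,\ell}(xY, \varepsilon)\, x^\ell\|_{r(1-\eta),0}\; e^{(\rho_1 - \delta\sqrt{\varepsilon})|k_1| + (\rho_2 - \delta\sqrt{\varepsilon})|k_2|}}{|\varepsilon\ell\lambda_0(xY) + \sqrt{-1}\, k \cdot \omega|}.$$
To bound the first sum we start using a), b), c) and d) for the different values of $\ell$, and then we apply the bound
$$\sum_{\ell \ge 2} \frac{(1 - \eta)^\ell}{\ell} \le |\ln\eta| - (1 - \eta),$$
to obtain
$$\sum_{\ell \ne 0} \frac{\|a^{0,\ell}(xY, \varepsilon)\, x^\ell\|_{r(1-\eta),0}}{|\ell|\lambda} < \frac{2A_1(1 + \lambda\beta)}{\lambda\delta^2\eta^2}.$$
With a similar scheme one can show that
$$\sum_{k \ne 0} \sum_{\ell \in \mathbb{Z}} \frac{\varepsilon\, \|a^{k,\ell}(xY, \varepsilon)\, x^\ell\|_{r(1-\eta),0}\; e^{(\rho_1 - \delta\sqrt{\varepsilon})|k_1| + (\rho_2 - \delta\sqrt{\varepsilon})|k_2|}}{|\varepsilon\ell\lambda_0(xY) + \sqrt{-1}\, k \cdot \omega|} < \frac{2\varepsilon A_1}{\eta^2} \sum_{k \ne 0} \frac{e^{-\delta\sqrt{\varepsilon}|k|}}{|k \cdot \omega|}.$$
As $\delta\sqrt{\varepsilon} < 1$, we can apply Lemma 5 to control this sum. So, the final bound on the generating function is
$$\|S\|_{r(1-\eta),\hat\rho} \le \frac{2A_1(1 + \lambda\beta)}{\lambda\delta^2\eta^2} \le \frac{B A_1}{\lambda\delta^2\eta^2}, \tag{8.12}$$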
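where $\beta$ only depends on $\omega$ and $B = 2(1 + \beta)$.

4. The transformed Hamiltonian. Up to now, we have found a function $S(x, Y, \theta, \varepsilon)$ such that Eq. (8.9) holds, but with $a^{0,0}(xY, \varepsilon)$ instead of 0 in the right-hand side:
$$\frac{\omega}{\varepsilon} \cdot \frac{\partial S}{\partial \theta}(x, Y, \theta, \varepsilon) + \lambda_0(xY, \varepsilon)\left(x \frac{\partial S}{\partial x}(x, Y, \theta, \varepsilon) - Y \frac{\partial S}{\partial Y}(x, Y, \theta, \varepsilon)\right) + K_1(x, Y, \theta, \varepsilon) = a^{0,0}(xY, \varepsilon). \tag{8.13}$$
This function $S$ satisfies the bound (8.12) and it generates implicitly the canonical change (8.8), whose explicit expression is of the form
$$p = P + \varepsilon^m f_0(X, Y, \theta, \varepsilon), \qquad x = X + \varepsilon^m f_1(X, Y, \theta, \varepsilon), \qquad y = Y + \varepsilon^m f_2(X, Y, \theta, \varepsilon). \tag{8.14}$$
Applying this change to the initial Hamiltonian (8.5) and expanding it in power series one obtains
$$\frac{\omega}{\varepsilon} \cdot P + \varepsilon^m \frac{\omega}{\varepsilon} \cdot f_0(X, Y, \theta, \varepsilon) + K_0(XY, \varepsilon) + \varepsilon^m \lambda_0(XY, \varepsilon)\left(X f_2(X, Y, \theta, \varepsilon) + Y f_1(X, Y, \theta, \varepsilon)\right) + \varepsilon^m K_1(X, Y, \theta, \varepsilon) + O(\varepsilon^{2m}).$$
Let us define $\varepsilon^{2m} \tilde K_1(X, Y, \theta, \varepsilon)$ as the remainder $O(\varepsilon^{2m})$ in this formula. Then, $\varepsilon^{2m} \tilde K_1$ can be obtained in terms of the initial Hamiltonian by computing the remainder of the corresponding Taylor expansion with respect to $\varepsilon^m$. Indeed, let us call $R_1$ the contribution to $\varepsilon^{2m} \tilde K_1$ that comes from $K_1$ and $R_2$ the one that comes from $K_0$ (that is, $\varepsilon^{2m} \tilde K_1 = R_1 + R_2$). Then, it is straightforward to check that
$$R_1 = \varepsilon^{2m} f_1 \frac{\partial K_1}{\partial x}(X + \tau\varepsilon^m f_1, Y + \tau\varepsilon^m f_2) + \varepsilon^{2m} f_2 \frac{\partial K_1}{\partial y}(X + \tau\varepsilon^m f_1, Y + \tau\varepsilon^m f_2), \tag{8.15}$$
$$\begin{aligned}
R_2 ={}& \frac{\varepsilon^{2m}}{2} K_0''\left(XY + \tau\varepsilon^m[X f_2 + Y f_1] + \tau^2\varepsilon^{2m} f_1 f_2\right)\left(X f_2 + Y f_1 + 2\tau\varepsilon^m f_1 f_2\right)^2 \\
&+ \varepsilon^{2m} K_0'\left(XY + \tau\varepsilon^m[X f_2 + Y f_1] + \tau^2\varepsilon^{2m} f_1 f_2\right) f_1 f_2,
\end{aligned} \tag{8.16}$$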
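where $0 < \tau < 1$. Moreover, let us define
$$\begin{aligned}
R_p &= \varepsilon^m \frac{\omega}{\varepsilon} \cdot \left(f_0(X, Y, \theta, \varepsilon) - \frac{\partial S}{\partial \theta}(X, Y, \theta, \varepsilon)\right), \\
R_x &= \varepsilon^m \lambda_0(XY, \varepsilon)\, Y \left(\frac{\partial S}{\partial Y}(X, Y, \theta, \varepsilon) + f_1(X, Y, \theta, \varepsilon)\right), \\
R_y &= \varepsilon^m \lambda_0(XY, \varepsilon)\, X \left(f_2(X, Y, \theta, \varepsilon) - \frac{\partial S}{\partial x}(X, Y, \theta, \varepsilon)\right).
\end{aligned}$$
So, with these notations and using (8.13) (replacing the value $x$ by $X$), the Hamiltonian can be rewritten as
$$\frac{\omega}{\varepsilon} \cdot P + K_0(XY, \varepsilon) + \varepsilon^m a^{0,0}(XY, \varepsilon) + \varepsilon^{2m} \hat K_1(X, Y, \theta, \varepsilon),$$
where $\varepsilon^{2m} \hat K_1(X, Y, \theta, \varepsilon) = R_1 + R_2 + R_p + R_x + R_y$.

5. Bounds on the transformed Hamiltonian. The bound on $a^{0,0}$ has already been done in (8.11):
$$\|a^{0,0}\|_{r(1-\eta),0} \le \frac{A_1(1 - \eta)^2}{\eta} < \frac{A_1}{\eta}.$$
The bounds on $R_1$ and $R_2$ are obtained by bounding (8.15) and (8.16) directly, since bounds on $f_1$ and $f_2$ are given by (8.4), taking into account that the actual function $S(z_1, z_2, \theta, \varepsilon)$ is defined on $|z_1| \le r(1 - \eta)$ and $|z_2| \le r(1 - \eta)$. Of course, the constant $M$ that appears in (8.4) must be replaced by the bound (8.12). The bounds on $\partial_x K_1$ and $\partial_y K_1$ are produced by applying Cauchy estimates, reducing the analyticity domain of $K_1$ from $r$ to $r(1 - \eta)$ (we can reduce it more, but this is enough and it produces simpler bounds). Hence, for $|X| \le r(1 - \eta)^3$, $|Y| \le r(1 - \eta)^3$, $|\mathrm{Im}\,\theta_i| \le \hat\rho_i$, $i = 1, 2$ and $0 < \varepsilon \le \varepsilon_0$ (this is the domain where the change is defined, see Lemma 6) we have
$$\|R_1\|_{r(1-\eta)^3,\hat\rho} \le \varepsilon^{2m}\, \frac{4 B A_1^2}{\lambda\delta^2\eta^4 r^2}.$$
Similarly (but with a little more work), one can derive the corresponding bound for $R_2$:
$$\|R_2\|_{r(1-\eta)^3,\hat\rho} \le \varepsilon^{2m}\, \frac{72 B^2 A_0 A_1^2}{\lambda^2 r^4 \delta^4 \eta^8}.$$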
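To bound $R_p$, $R_x$ and $R_y$ we note that
$$f_0(X, Y, \theta, \varepsilon) = \frac{\partial S}{\partial \theta}(x(X, Y, \theta, \varepsilon), Y, \theta, \varepsilon), \tag{8.17}$$
$$f_1(X, Y, \theta, \varepsilon) = -\frac{\partial S}{\partial Y}(x(X, Y, \theta, \varepsilon), Y, \theta, \varepsilon), \tag{8.18}$$
$$f_2(X, Y, \theta, \varepsilon) = \frac{\partial S}{\partial x}(x(X, Y, \theta, \varepsilon), Y, \theta, \varepsilon), \tag{8.19}$$
where $x(X, Y, \theta, \varepsilon)$ is the change of variables for the $x$ coordinate given in (8.14). Our purpose is to apply the mean value theorem plus (8.4) to derive the desired bounds. Before continuing, let us give a bound to be used later:
$$\left\|\frac{\omega}{\varepsilon} \cdot \frac{\partial^2 S}{\partial x\, \partial\theta}\right\| \le C\, \frac{A_0 A_1}{r^4\lambda\delta^2\eta^5}, \tag{8.20}$$
where $C$ is a constant that only depends on the initial Hamiltonian. This has been obtained from (8.13), differentiating both sides with respect to $x$ and then applying Cauchy estimates. Now it is not difficult to produce bounds for $R_p$, $R_x$ and $R_y$, applying the mean value theorem:
$$\begin{aligned}
\|R_p\|_{r(1-\eta)^3,\hat\rho} &\le C_p\, \varepsilon^{2m}\, \frac{A_0 A_1^2}{r^4\lambda^2\delta^4\eta^8}, \\
\|R_x\|_{r(1-\eta)^3,\hat\rho} &\le C_x\, \varepsilon^{2m}\, \frac{A_0 A_1^2}{r^2\lambda^2\delta^4\eta^8}, \\
\|R_y\|_{r(1-\eta)^3,\hat\rho} &\le C_y\, \varepsilon^{2m}\, \frac{A_0 A_1^2}{r^2\lambda^2\delta^4\eta^8},
\end{aligned}$$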
where constants $C_p$, $C_x$ and $C_y$ only depend on the initial Hamiltonian. Of course, (8.20) has been used to bound $\|R_p\|_{r(1-\eta)^3}$. The final bound is obtained by taking the biggest factors of these two bounds (we recall that $r \le 1$, $\lambda \le 1$ and $A_0 \ge 1$).

8.3. Proof of Theorem 5.

The first step is to transform the autonomous Hamiltonian $h_0(x, y, \varepsilon)$ into its normal form. This can be done because it is an integrable Hamiltonian. The domain of definition of $h_0$ may be reduced, but in that case we rename this new domain in order to keep the same notation. Now, assume we have done $n$ changes of variables like the ones of Lemma 7, so the Hamiltonian has the form

$$H^{(n)}(x, y, \theta, p, \varepsilon) = \frac{\omega}{\varepsilon}\cdot p + H_0^{(n)}(xy, \varepsilon) + \varepsilon^{q2^n} H_1^{(n)}(x, y, \theta, \varepsilon), \tag{8.21}$$

where we have kept the same notation for the variables. This Hamiltonian is defined on some set $|x| \le r_n$, $|y| \le r_n$, $|\operatorname{Im}\theta_i| \le \rho_i^{(n)}$, $i = 1, 2$ and $0 < \varepsilon \le \varepsilon_0$. On this set, we define the following constants (bounds):

$$A_0^{(n)} = \|K_0\|_{r_n}, \qquad A_1^{(n)} = \|K_1\|_{r_n,\rho^{(n)}}, \qquad \lambda_n = \inf_{|z| \le r_n^2} |\partial_z K_0(z, \varepsilon)|.$$
Now, let us define the domains. Let $\delta_n$ be $\delta_0/(n+1)^2$, with $\delta_0 = 6/\pi^2$. Then, the reduction of the analyticity strip with respect to $\theta$ done to $H^{(n)}$ in order to compute the generating function will be $\delta_n\sqrt{\varepsilon}$ (see Lemma 7). Note that in this way the total reduction of the domain is exactly $\sqrt{\varepsilon}$. For the domain with respect to the spatial variables $x$ and $y$ we define the sequence $\{r_n\}$ as $r_n = r_{n-1}(1 - \eta_{n-1})^3$, where $\eta_n = 1 - \exp(-\eta/(n+1)^2)$ and $\eta = (2\ln 2)/\pi^2$. As the total reduction of domain is given by

$$\prod_{n \ge 1} (1 - \eta_{n-1})^3,$$

one can check that, with this selection of $\eta_n$ and $\eta$, this makes a reduction from $r_0$ to $r_0/2$. From Lemma 7 we obtain the following inequalities:

$$A_0^{(n)} \le A_0^{(n-1)} + \varepsilon_0^{q2^{n-1}}\frac{A_1^{(n-1)}}{\eta_{n-1}}, \tag{8.22}$$

$$A_1^{(n)} \le C\,\frac{A_0^{(n-1)}\bigl(A_1^{(n-1)}\bigr)^2}{\lambda_{n-1}^2\, r_{n-1}^4\,\delta_{n-1}^4\,\eta_{n-1}^8}, \tag{8.23}$$

$$\lambda_n \ge \lambda_{n-1} - \varepsilon_0^{q2^{n-1}}\frac{A_1^{(n-1)}}{\eta_{n-1}}. \tag{8.24}$$
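The two normalizations chosen above (a total strip reduction of exactly $\sqrt{\varepsilon}$, and a total radius reduction from $r_0$ to $r_0/2$) can be checked numerically. The following is a minimal Python sketch and not part of the original proof; the function names and the truncation at 200000 terms are our own choices.

```python
import math


def total_strip_reduction(n_terms: int) -> float:
    """Partial sum of delta_n = delta_0 / (n + 1)^2 with delta_0 = 6 / pi^2.

    The strip is reduced by delta_n * sqrt(eps) at step n, so the total
    reduction is sqrt(eps) exactly when this sum tends to 1.
    """
    delta0 = 6.0 / math.pi ** 2
    return sum(delta0 / (n + 1) ** 2 for n in range(n_terms))


def total_radius_factor(n_terms: int) -> float:
    """Partial product of (1 - eta_n)^3 with eta_n = 1 - exp(-eta / (n + 1)^2)."""
    eta = 2.0 * math.log(2.0) / math.pi ** 2
    factor = 1.0
    for n in range(n_terms):
        eta_n = 1.0 - math.exp(-eta / (n + 1) ** 2)
        factor *= (1.0 - eta_n) ** 3
    return factor


strip = total_strip_reduction(200_000)
radius = total_radius_factor(200_000)
print(f"sum of delta_n       ~ {strip:.6f}")
print(f"prod of (1-eta_n)^3  ~ {radius:.6f}")
```

Both limits follow from $\sum_{k \ge 1} k^{-2} = \pi^2/6$: the sum tends to 1 and the product to $\exp(-\eta\pi^2/2) = 1/2$.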
Let us assume that $\lambda_i \ge \lambda_0/2$ and $A_0^{(i)} \le 2A_0^{(0)}$, $i = 1, \ldots, n-1$. Then, from (8.23) one obtains

$$A_1^{(n)} \le E\,n^{24}\bigl(A_1^{(n-1)}\bigr)^2,$$

where $E$ is a constant that only depends on the initial Hamiltonian. Taking logarithms on both sides of the expression above one gets (see [JS92, lemma 5] for the details):

$$A_1^{(n)} \le E^{1+2+\cdots+2^{n-1}}\left[\prod_{i=0}^{n-1}(n-i)^{2^i}\right]^{24}\bigl(A_1^{(0)}\bigr)^{2^n} \le \frac{1}{E}\left[\left(\frac{5}{3}\right)^{24} E A_1^{(0)}\right]^{2^n}.$$
Now it is easy to finish the proof. It is immediate that there exists a sufficiently small (but different from zero) value of $\varepsilon_0$ such that

$$\varepsilon_0^{q2^n} A_1^{(n)} \le \frac{D}{80}\,(1/2)^{2^n}, \tag{8.25}$$

where $D = \max\{A_0, \lambda_0/2\}$. This value $\varepsilon_0$ does not depend on $n$ and only depends on the initial Hamiltonian. It has been chosen in this way because it satisfies

$$\sum_{n \ge 1} \varepsilon_0^{q2^{n-1}}\frac{A_1^{(n-1)}}{\eta_{n-1}} \le D.$$
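Two numerical facts used in this step can be checked independently of the proof: that unfolding $A_1^{(n)} \le E n^{24}(A_1^{(n-1)})^2$ gives the closed form with the product $\prod (n-i)^{2^i}$ and the factor $(5/3)^{24}$, and that the factor $1/80$ in (8.25) is small enough for the series to stay below $D$. The sketch below is a sanity check under our own assumptions: the sample values of $E$ and $A_1^{(0)}$ are hypothetical, all function names are ours, and (8.25) is assumed to hold for every index $n - 1 \ge 0$.

```python
import math

ETA = 2.0 * math.log(2.0) / math.pi ** 2  # eta from the definition of eta_n


def eta_n(n: int) -> float:
    """eta_n = 1 - exp(-eta / (n + 1)^2)."""
    return 1.0 - math.exp(-ETA / (n + 1) ** 2)


def log_iterated_bound(n: int, log_e: float, log_a0: float) -> float:
    """log of b_n, where b_k = E * k^24 * b_{k-1}^2 and b_0 = A_1^(0)."""
    log_b = log_a0
    for k in range(1, n + 1):
        log_b = log_e + 24.0 * math.log(k) + 2.0 * log_b
    return log_b


def log_closed_form(n: int, log_e: float, log_a0: float) -> float:
    """log of E^(2^n - 1) * [prod_{i<n} (n - i)^(2^i)]^24 * (A_1^(0))^(2^n)."""
    log_prod = sum(2 ** i * math.log(n - i) for i in range(n))
    return (2 ** n - 1) * log_e + 24.0 * log_prod + 2 ** n * log_a0


def log_final_bound(n: int, log_e: float, log_a0: float) -> float:
    """log of (1/E) * [(5/3)^24 * E * A_1^(0)]^(2^n)."""
    return -log_e + 2 ** n * (24.0 * math.log(5.0 / 3.0) + log_e + log_a0)


def normalized_series(n_terms: int) -> float:
    """sum_{n>=1} (1/80) * (1/2)^(2^(n-1)) / eta_{n-1}.

    By (8.25) the series in the text is at most D times this number.
    """
    return sum(
        0.5 ** (2 ** (n - 1)) / (80.0 * eta_n(n - 1))
        for n in range(1, n_terms + 1)
    )


LOG_E = math.log(3.0)    # hypothetical value of E
LOG_A0 = math.log(0.01)  # hypothetical value of A_1^(0)
for n in (1, 2, 5, 10):
    print(n, log_iterated_bound(n, LOG_E, LOG_A0),
          log_closed_form(n, LOG_E, LOG_A0),
          log_final_bound(n, LOG_E, LOG_A0))
print("normalized series:", normalized_series(10))
```

The first two columns printed for each $n$ should agree up to rounding, the third should dominate them, and the normalized series should come out well below 1.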
Then, from (8.22) and (8.24) one gets $\lambda_n \ge \lambda_0/2$ and $A_0^{(n)} \le 2A_0^{(0)}$. Then, using induction, the proof of the convergence of the normal form is finished.

To show the convergence of the sequence of changes of variables we use (8.4), where the bound $M$ on the generating function is given by (8.7). Then, at step $n$ ($n \ge 1$), we are applying a change of variables whose distance to the identity is bounded by

$$\varepsilon_0^{q2^{n-1}}\frac{B A_1^{(n-1)}}{r_{n-1}\,\lambda_{n-1}\,\delta_{n-1}^2\,\eta_{n-1}^3}.$$

Now, using (8.25), the convergence of the sequence of changes of variables is straightforward.

9. Proof of the Extension Theorem

Before proceeding to the proof of this theorem, we will introduce some notations as well as some auxiliary lemmas. In the sequel, $\alpha > 0$, $t_0$, $T_0 > 0$, will be the real parameters introduced in the Extension Theorem, $T$ the complex parameter in the strip $|\operatorname{Im} T| \le \pi/2 - \varepsilon^\alpha$ and $t$ will be the real time. $K$ will denote a generic positive constant independent of $\varepsilon$, and $\tau$ will denote $|t + T - \pi i/2|$. In order to bound the solution $(x(t), y(t))$, we will compare it with the homoclinic solution of the unperturbed system, and thus we introduce the functions

$$\xi(t) := x(t) - x_0(t + T), \qquad \eta(t) := \dot\xi(t) = y(t) - y_0(t + T),$$
which satisfy the system of differential equations with respect to the variable $t$:

$$\dot\xi = \eta, \qquad \dot\eta = \sin(x_0(t+T) + \xi) - \sin(x_0(t+T)) + \varepsilon^p \sin(x_0(t+T) + \xi)\cdot m(\theta_1(t), \theta_2(t)).$$

In order to study this system it is very convenient to write it as

$$\dot z = A(t+T)z + \varepsilon^p G(x_0(t+T), t) + F(\xi, t+T, t), \tag{9.1}$$

where $z = (\xi, \eta)^\top$, $A(u)$ is the matrix

$$A(u) = \begin{pmatrix} 0 & 1 \\ \cos(x_0(u)) & 0 \end{pmatrix},$$

and the functions $G = (0, g)^\top$ and $F = (0, f)^\top$, which also depend on $\varepsilon$, are given by

$$g(x, t) = \sin x\cdot m(\theta_1(t), \theta_2(t)), \qquad f(\xi, u, t) = \sin(x_0(u) + \xi) - \sin(x_0(u)) - \cos(x_0(u))\cdot\xi + \varepsilon^p\bigl[g(x_0(u) + \xi, t) - g(x_0(u), t)\bigr]. \tag{9.2}$$
From the initial condition (4.1), our goal is to bound solutions $z(t)$ of system (9.1) with $z(t_0) = O(\varepsilon^{p-s})$. To this end, we first seek a fundamental matrix of the corresponding homogeneous linear system

$$\frac{dz}{du} = A(u)z, \tag{9.3}$$

which can be integrated using the fact that $(y_0(u), \dot y_0(u))$ is a solution of Eq. (9.3). Another independent solution can be obtained in the form $\xi(u) = y_0(u)W(u)$, $\eta(u) = \dot\xi(u)$, with

$$W(u) = \int_b^u \frac{d\sigma}{y_0(\sigma)^2},$$

$b$ being an arbitrary complex number. It is very important to choose this parameter $b$ adequately, so as to get a function $W(u)$ that is as regular as possible near the singularities of $y_0$. We will consider first the case $0 \le \operatorname{Im} T \le \pi/2$ and we choose $b = \pi i/2$. In this way, at the point $u = \pi i/2$, since $y_0(u)$ has a simple pole, $W(u)$ has a triple zero and $y_0(u)W(u)$ has a double zero. Introducing

$$\Psi(u) = y_0(u) = \dot x_0(u), \qquad \Phi(u) = y_0(u)W(u) = \Psi(u)W(u),$$

a fundamental matrix of the linear Eq. (9.3) is

$$M(u) = \begin{pmatrix} \Psi(u) & \Phi(u) \\ \Psi'(u) & \Phi'(u) \end{pmatrix},$$

which has determinant 1: $\Psi(u)\Phi'(u) - \Phi(u)\Psi'(u) = y_0(u)^2 W'(u) = 1$. Expanding the functions $\Psi$, $\Phi$ near $u = \pi i/2$ we get

$$\Psi(u) = \frac{-2i}{u - \pi i/2}\bigl(1 + O((u - \pi i/2)^2)\bigr), \qquad \Phi(u) = \frac{i}{6}(u - \pi i/2)^2\bigl(1 + O((u - \pi i/2)^2)\bigr), \tag{9.4}$$

and that means that the fundamental matrix $M(u)$ behaves near $u = \pi i/2$ as

$$\begin{pmatrix} \dfrac{-2i}{u - \pi i/2} & \dfrac{i}{6}(u - \pi i/2)^2 \\[2ex] \dfrac{2i}{(u - \pi i/2)^2} & \dfrac{i}{3}(u - \pi i/2) \end{pmatrix}.$$

In passing, from formula (9.4) one easily gets the following bounds, which will be used later on.

Lemma 8. For $0 \le \operatorname{Im} T \le \pi/2$ and $0 \le \tau := |t + T - \pi i/2| \le T_0$, the following bounds hold:

$$|\Psi(t+T)| \le \frac{K}{\tau}, \qquad |\Phi(t+T)| \le K\tau^2, \qquad |\Psi'(t+T)| \le \frac{K}{\tau^2}, \qquad |\Phi'(t+T)| \le K\tau.$$
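The properties of $\Psi$ and $\Phi$ used above can be checked numerically from the explicit separatrix $y_0(u) = 2/\cosh u$ (with $\cos x_0(u) = 1 - y_0(u)^2/2$). The sketch below is our own illustration, not taken from the paper: it uses the closed form $W(u) = (u - \pi i/2)/8 + \sinh(2u)/16$, which we obtained by integrating $\cosh^2\sigma/4$ from $b = \pi i/2$, and the sample point and step sizes are arbitrary choices.

```python
import cmath

POLE = 0.5j * cmath.pi  # singularity u = pi*i/2 of y0(u) = 2 / cosh(u)


def psi(u: complex) -> complex:
    """Psi(u) = y0(u) = 2 / cosh(u)."""
    return 2.0 / cmath.cosh(u)


def w(u: complex) -> complex:
    """W(u) = int_{pi i/2}^{u} cosh(s)^2 / 4 ds in closed form (sinh(pi i) = 0)."""
    return (u - POLE) / 8.0 + cmath.sinh(2.0 * u) / 16.0


def phi(u: complex) -> complex:
    """Phi(u) = Psi(u) * W(u)."""
    return psi(u) * w(u)


def dpsi(u: complex) -> complex:
    """Psi'(u) = -2 sinh(u) / cosh(u)^2."""
    return -2.0 * cmath.sinh(u) / cmath.cosh(u) ** 2


def dphi(u: complex) -> complex:
    """Phi'(u) = Psi'(u) W(u) + Psi(u) W'(u), with W' = 1 / Psi^2."""
    return dpsi(u) * w(u) + 1.0 / psi(u)


def ode_residual(u: complex, h: float = 1e-4) -> float:
    """|Phi'' - cos(x0) Phi| at u, with Phi'' from a central second difference."""
    second = (phi(u + h) - 2.0 * phi(u) + phi(u - h)) / h ** 2
    return abs(second - (1.0 - psi(u) ** 2 / 2.0) * phi(u))


u0 = 0.3 + 0.4j  # arbitrary sample point away from the pole
print("Wronskian:", psi(u0) * dphi(u0) - phi(u0) * dpsi(u0))
print("ODE residual for Phi:", ode_residual(u0))

v = 0.01  # small offset from the pole, to compare with (9.4)
print("Psi / leading term:", psi(POLE + v) * v / (-2j))
print("Phi / leading term:", phi(POLE + v) / (1j * v ** 2 / 6.0))
```

The last two ratios differ from 1 by terms of order $v^2$, in line with the $O((u - \pi i/2)^2)$ corrections in (9.4).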
Since the fundamental solution $\varphi(u, \sigma)$ of the linear Eq. (9.3) satisfying the initial condition $\varphi(u, u) = \mathrm{Id}$ is given by $\varphi(u, \sigma) = M(u)M^{-1}(\sigma)$, we can now easily write the solution $z(t) = (\xi(t), \eta(t))$ of system (9.1) with initial condition $z(t_0)$ as

$$z(t) = z_{\mathrm{in}}(t_0, t) + \int_{t_0}^{t} \varphi(t+T, \sigma+T)\,N(\xi(\sigma), \sigma+T, \sigma)\,d\sigma,$$

with

$$z_{\mathrm{in}}(t_0, t) = \varphi(t+T, t_0+T)z(t_0) = M(t+T)M^{-1}(t_0+T)z(t_0)$$

and

$$N(\xi, u, t) = \varepsilon^p G(x_0(u), t) + F(\xi, u, t).$$

The integral equation for $z(t)$ can also be written as

$$\begin{aligned} z(t) &= M(t+T)M^{-1}(t_0+T)z(t_0) + M(t+T)\int_{t_0}^{t} M^{-1}(\sigma+T)\,N\Bigl(\xi(\sigma), \sigma+T, \frac{\sigma}{\varepsilon}\Bigr)\,d\sigma \\ &= z^1(t) + M(t+T)\int_{t_0}^{t} M^{-1}(\sigma+T)\,F\Bigl(\xi(\sigma), \sigma+T, \frac{\sigma}{\varepsilon}\Bigr)\,d\sigma, \end{aligned}$$

with

$$z^1(t) := M(t+T)\left[ M^{-1}(t_0+T)z(t_0) + \varepsilon^p\int_{t_0}^{t} M^{-1}(\sigma+T)\,G\Bigl(x_0(\sigma+T), \frac{\sigma}{\varepsilon}\Bigr)\,d\sigma \right].$$
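Writing this equation for $z(t)$ in components, we obtain

$$\xi(t) = \xi_1(t) - \Psi(t+T)\int_{t_0}^{t}\Phi(\sigma+T)f(\xi(\sigma), \sigma+T, \sigma)\,d\sigma + \Phi(t+T)\int_{t_0}^{t}\Psi(\sigma+T)f(\xi(\sigma), \sigma+T, \sigma)\,d\sigma, \tag{9.5}$$

$$\eta(t) = \dot\xi(t) = \dot\xi_1(t) - \Psi'(t+T)\int_{t_0}^{t}\Phi(\sigma+T)f(\xi(\sigma), \sigma+T, \sigma)\,d\sigma + \Phi'(t+T)\int_{t_0}^{t}\Psi(\sigma+T)f(\xi(\sigma), \sigma+T, \sigma)\,d\sigma, \tag{9.6}$$

where

$$\begin{aligned} \xi_1(t) ={}& \Psi(t+T)\left[\Phi'(t_0+T)\xi(t_0) - \Phi(t_0+T)\eta(t_0) - \varepsilon^p\int_{t_0}^{t}\Phi(\sigma+T)g(x_0(\sigma+T), \sigma)\,d\sigma\right] \\ &+ \Phi(t+T)\left[\Psi'(t_0+T)\xi(t_0) - \Psi(t_0+T)\eta(t_0) + \varepsilon^p\int_{t_0}^{t}\Psi(\sigma+T)g(x_0(\sigma+T), \sigma)\,d\sigma\right]. \end{aligned} \tag{9.7}$$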
We now have a suitable expression (9.5) for $\xi$, and we need to bound the functions $f$, $g$, defined in Eqs. (9.2), that appear therein.

Lemma 9. If $\varepsilon^\alpha \le \tau \le T_0 + \pi/2$, it follows that

$$|g(x_0(t+T), t)| \le \frac{K\varepsilon^{-s}}{\tau^2}. \tag{9.8}$$

Moreover, if $|\xi_j| \le \lambda/\tau^\beta \le 1$, $j = 1, 2$, then

$$|f(\xi_1, t+T, t) - f(\xi_2, t+T, t)| \le K\left(\frac{\lambda}{\tau^{\beta+2}} + \frac{\varepsilon^{p-s}}{\tau^2}\right)|\xi_1 - \xi_2|. \tag{9.9}$$
Proof of the lemma. We first recall that, at $\pm\pi i/2$, $y_0(u) = 2/\cosh u$ has a simple pole, and $\sin x_0(u) = \dot y_0(u)$ has a double pole, as does $\cos x_0(u) = 1 - y_0(u)^2/2$. The bound of $g$ also involves the bound (1.7) of $m(\theta_1, \theta_2)$ for the complex values (1.8) of $\theta_1$, $\theta_2$. In order to bound $f$ it is enough to apply Taylor's Theorem to the function $\sin x$.

Finally, here is a technical lemma that will be needed later on. Its proof is straightforward (and can be found in [DS92, lemma 7.1] for $\beta = 3$).

Lemma 10. Let $t$, $t_0$ be real and $T$ complex, such that

$$|\operatorname{Im} T| < \pi/2, \qquad -T_0 \le t_0 + \operatorname{Re} T \le t + \operatorname{Re} T \le T_0.$$

Then, given $\beta \in \mathbb{R}$, the following inequality holds:

$$\int_{t_0}^{t}\frac{d\sigma}{|\sigma + T - \pi i/2|^\beta} \le K\cdot\rho^{-(\beta-1)}_{[t_0,t]}(T),$$

with $K = K(T_0, \beta) > 0$, and

$$\rho^{-\beta}_{[t_0,t]}(T) := \begin{cases} \sup\dfrac{1}{|\sigma + T - \pi i/2|^\beta}, & \text{if } \beta \ne 0, \\[1.5ex] \sup\,\bigl|\ln(|\sigma + T - \pi i/2|)\bigr|, & \text{if } \beta = 0, \end{cases}$$
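where the supremum is taken for $\sigma \in [t_0, t]$.

The proof of the Extension Theorem, for the moment for $T \in D^+ = \{T \in \mathbb{C} : 0 < \operatorname{Im} T \le \pi/2 - \varepsilon^\alpha\}$, is based on the next two propositions. In the first one, the solutions of system (9.1) with initial conditions (4.1) will be extended up to $t = t_1(T)$. In the second proposition, we take $t = t_1(T)$ as the initial time. We divide the complex strip $D^+$ in two parts

$$D^{\mathrm{up}} = \{T \in \mathbb{C} : \pi/2 - \varepsilon^{2\alpha/3} \le \operatorname{Im} T \le \pi/2 - \varepsilon^\alpha\}, \tag{9.10}$$

$$D^{\mathrm{down}} = \{T \in \mathbb{C} : 0 \le \operatorname{Im} T \le \pi/2 - \varepsilon^{2\alpha/3}\}, \tag{9.11}$$

and define the separation point $t_1(T)$ by

$$t_1(T) + \operatorname{Re} T = \begin{cases} \varepsilon^{2\alpha/3}, & \text{for } T \in D^{\mathrm{up}}, \\ 0, & \text{for } T \in D^{\mathrm{down}}. \end{cases} \tag{9.12}$$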
Proposition 1. Let $z = (\xi(t), \eta(t))$ be a solution of system (9.1) with initial conditions satisfying

$$|\xi(t_0)| \le C\varepsilon^{p-s}, \qquad |\eta(t_0)| \le C\varepsilon^{p-s}. \tag{9.13}$$

Then, if $p - s - 2\alpha > 0$, there exists $\varepsilon_0 > 0$ such that, for $0 < \varepsilon < \varepsilon_0$, $(\xi(t), \eta(t))$ can be extended for $t \in [t_0, t_1(T)]$ and satisfies there the following bound:

$$|\Phi'(t+T)\xi(t)| + |\Phi(t+T)\eta(t)| \le K\varepsilon^{p-s}. \tag{9.14}$$
Proof of the proposition. We shall use the method of successive approximations. We begin the iteration process with $\xi_0(t) = 0$, and consider for $n \ge 0$ the recurrence suggested by Eq. (9.5):

$$\xi_{n+1}(t) = \xi_1(t) - \Psi(t+T)\int_{t_0}^{t}\Phi(\sigma+T)f(\xi_n(\sigma), \sigma+T, \sigma)\,d\sigma + \Phi(t+T)\int_{t_0}^{t}\Psi(\sigma+T)f(\xi_n(\sigma), \sigma+T, \sigma)\,d\sigma. \tag{9.15}$$

The first iterate is $\xi_1(t)$, as given by Eq. (9.7), and can be bounded easily, using the initial conditions (9.13), and Lemmas 8, 9, and 10:

$$|\xi_1(t)| \le \frac{K}{\tau}\bigl[\varepsilon^{p-s} + \varepsilon^{p-s}\bigr] + K\tau^2\bigl[\varepsilon^{p-s} + \varepsilon^{p-s}\rho^{-2}_{[t_0,t]}(T)\bigr]. \tag{9.16}$$

An analogous bound for $\Phi'(t+T)\xi_1(t)$ follows immediately:

$$|\Phi'(t+T)\xi_1(t)| \le K\bigl[\varepsilon^{p-s} + \varepsilon^{p-s}\bigr] + K\tau^3\bigl[\varepsilon^{p-s} + \varepsilon^{p-s}\rho^{-2}_{[t_0,t]}(T)\bigr].$$

Now, for $T \in D^{\mathrm{down}}$ we have $\rho^{-2}_{[t_0,t]}(T) \le K\tau^{-2}$, and for $T \in D^{\mathrm{up}}$ we have

$$\rho^{-2}_{[t_0,t]}(T) \le K\tau^{-2}, \quad \text{for } t_0 \le t \le -\operatorname{Re} T,$$

$$\rho^{-2}_{[t_0,t]}(T) \le K\varepsilon^{-2\alpha}, \quad \tau^3 \le K\varepsilon^{2\alpha}, \quad \text{for } -\operatorname{Re} T \le t \le t_1(T),$$

and, consequently, in all the strip $D^+$: for $T \in D^+$ and $t \in [t_0, t_1(T)]$, it follows that

$$\tau^3\rho^{-2}_{[t_0,t]}(T) \le K. \tag{9.17}$$
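The uniform bound (9.17) can be illustrated numerically by scanning $T$ over $D^{\mathrm{up}} \cup D^{\mathrm{down}}$ and $t$ over $[t_0, t_1(T)]$. The sketch below is only an illustration under our own assumptions: it writes $s = \sigma + \operatorname{Re} T$ and $d = \pi/2 - \operatorname{Im} T$, takes $t_0 + \operatorname{Re} T = -T_0$, and uses sample values of $\varepsilon$, $\alpha$ and $T_0$ that are not taken from the paper.

```python
import math


def tau(s: float, d: float) -> float:
    """|sigma + T - pi*i/2| with s = sigma + Re T and d = pi/2 - Im T."""
    return math.hypot(s, d)


def worst_ratio(eps: float, alpha: float, t0_bound: float, grid: int = 300) -> float:
    """Largest value of tau(t)^3 * rho^{-2}_{[t0,t]}(T) found on a grid.

    rho^{-2} is sup 1/tau(sigma)^2 over sigma in [t0, t]; since tau decreases
    while s <= 0 and increases afterwards, the infimum of tau is attained at
    min(s, 0), which is used below instead of a numerical search.
    """
    d_min = eps ** alpha                # Im T = pi/2 - eps^alpha
    d_split = eps ** (2.0 * alpha / 3)  # boundary between D^up and D^down
    d_values = [d_min + (d_split - d_min) * k / grid for k in range(grid + 1)]
    d_values += [d_split + (math.pi / 2 - d_split) * k / grid for k in range(grid + 1)]
    worst = 0.0
    for d in d_values:
        # t1(T) + Re T from (9.12): eps^(2 alpha/3) in D^up, 0 in D^down
        s_end = d_split if d < d_split else 0.0
        for k in range(grid + 1):
            s = -t0_bound + (s_end + t0_bound) * k / grid
            tau_min = tau(min(s, 0.0), d)
            worst = max(worst, tau(s, d) ** 3 / tau_min ** 2)
    return worst


T0 = 1.0     # hypothetical value of T_0
ALPHA = 0.5  # hypothetical value of alpha
BOUND = max(math.hypot(T0, math.pi / 2), 2.0 * math.sqrt(2.0))
for eps in (1e-2, 1e-4, 1e-6):
    print(eps, worst_ratio(eps, ALPHA, T0), "<=", BOUND)
```

The printed maxima stay below a constant that does not depend on $\varepsilon$, which is the content of (9.17); the constant used here, $\max\{\sqrt{T_0^2 + \pi^2/4},\, 2\sqrt{2}\}$, is our own estimate.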
Let us remark that the value of $t_1(T)$ has been chosen precisely so that bound (9.17) holds. In this way, we can bound $|\Phi'(t+T)\xi_1(t)|$ in a uniform way: $|\Phi'(t+T)\xi_1(t)| \le K\varepsilon^{p-s}$, or, in other words, $|\xi_1(t)| \le K\varepsilon^{p-s}/\tau$.

To begin the iteration process, we introduce the norm

$$\|\xi\| := \sup|\Phi'(t+T)\xi(t)|,$$

where the supremum is taken for $T \in D^+$ and $t \in [t_0, t_1(T)]$. The above bound on $\Phi'(t+T)\xi_1(t)$ now reads $\|\xi_1\| \le K\varepsilon^{p-s}$. Assuming that $\|\xi_{n-1}\|, \|\xi_n\| \le K\varepsilon^{p-s}$, we now consider

$$\xi_{n+1}(t) - \xi_n(t) = -\Psi(t+T)\int_{t_0}^{t}\Phi(\sigma+T)\bigl[f_n - f_{n-1}\bigr]\,d\sigma + \Phi(t+T)\int_{t_0}^{t}\Psi(\sigma+T)\bigl[f_n - f_{n-1}\bigr]\,d\sigma,$$

where $f_k$ denotes $f(\xi_k(\sigma), \sigma+T, \sigma)$. Applying Lemmas 8, 9 (with $\lambda = K\varepsilon^{p-s}$ and $\beta = 1$) and 10, as well as inequality (9.17), we obtain

$$\begin{aligned} |\Phi'(t+T)(\xi_{n+1}(t) - \xi_n(t))| &\le K\int_{t_0}^{t}\tau(\sigma)^2\,\frac{\varepsilon^{p-s}}{\tau(\sigma)^3}\,|\xi_n(\sigma) - \xi_{n-1}(\sigma)|\,d\sigma + K\tau^3\int_{t_0}^{t}\frac{1}{\tau(\sigma)}\cdot\frac{\varepsilon^{p-s}}{\tau(\sigma)^3}\,|\xi_n(\sigma) - \xi_{n-1}(\sigma)|\,d\sigma \\ &\le K\left[\int_{t_0}^{t}\frac{\varepsilon^{p-s}\,d\sigma}{\tau(\sigma)^2} + \tau^3\int_{t_0}^{t}\frac{\varepsilon^{p-s}\,d\sigma}{\tau(\sigma)^5}\right]\|\xi_n - \xi_{n-1}\| \\ &\le K\left[\varepsilon^{p-s}\rho^{-1}_{[t_0,t]}(T) + \varepsilon^{p-s}\tau^3\rho^{-4}_{[t_0,t]}(T)\right]\|\xi_n - \xi_{n-1}\| \\ &\le K\varepsilon^{p-s-2\alpha}\|\xi_n - \xi_{n-1}\|, \end{aligned}$$

where $\tau(\sigma)$ denotes $|\sigma + T - \pi i/2|$. Since $p - s - 2\alpha > 0$, if we now choose $\varepsilon_0 = \varepsilon_0(K, p - s - 2\alpha)$ small enough, it follows, by induction, that the inequalities

$$\|\xi_k\| \le 2\|\xi_1\| \le K\varepsilon^{p-s}, \qquad \|\xi_{k+1} - \xi_k\| \le \frac{1}{2}\|\xi_k - \xi_{k-1}\|, \tag{9.18}$$

are valid for $k \ge 1$, $0 < \varepsilon \le \varepsilon_0$, and consequently $(\xi_k)_{k \ge 0}$ converges uniformly for $T \in D^+$ and $t \in [t_0, t_1(T)]$ to the first component $\xi(t)$ of a solution of system (9.1), satisfying $|\Phi'(t+T)\xi(t)| \le K\varepsilon^{p-s}$. For the second component $\eta(t)$, we simply use its integral Eq. (9.6), and it is straightforward to check that $|\Phi(t+T)\eta(t)| \le K\varepsilon^{p-s}$, and consequently the bound (9.14) is proved.
From bound (9.14) we get the following global estimates

$$|\xi(t)| \le K\varepsilon^{p-s-\alpha}, \qquad |\eta(t)| \le K\varepsilon^{p-s-2\alpha},$$

for $t \in [t_0, t_1(T)]$. At the final point $t = t_1(T)$, bound (9.14) gives a better estimate

$$|\xi(t_1(T))| \le K\frac{\varepsilon^{p-s}}{\tau_1} \le K\varepsilon^{p-s-2\alpha/3}, \qquad |\eta(t_1(T))| \le K\frac{\varepsilon^{p-s}}{\tau_1^2} \le K\varepsilon^{p-s-4\alpha/3}, \tag{9.19}$$

where we have denoted $\tau_1 = |t_1(T) + T - \pi i/2|$, and we have used that $\tau_1 \ge \varepsilon^{2\alpha/3}$. These are the initial conditions for the next proposition.
Proposition 2. Let $(\xi(t), \eta(t))$ be a solution of system (9.1) with initial conditions satisfying (9.19). Then, if $p - s - 2\alpha > 0$, there exists $\varepsilon_0 > 0$ such that, for $0 < \varepsilon < \varepsilon_0$, $(\xi(t), \eta(t))$ can be extended for $t \in [t_1(T), T_0 - \operatorname{Re} T]$ and satisfies there the following bound:

$$|\Psi'(t+T)\xi(t)| + |\Psi(t+T)\eta(t)| \le K\varepsilon^{p-s-2\alpha}. \tag{9.20}$$

Proof of the proposition. We shall use exactly the same method of successive approximations as in Proposition 1, but replacing the initial time $t_0$ by $t_1(T)$:

$$\xi_{n+1}(t) = \xi_1(t) - \Psi(t+T)\int_{t_1(T)}^{t}\Phi(\sigma+T)f(\xi_n(\sigma), \sigma+T, \sigma)\,d\sigma + \Phi(t+T)\int_{t_1(T)}^{t}\Psi(\sigma+T)f(\xi_n(\sigma), \sigma+T, \sigma)\,d\sigma. \tag{9.21}$$

The first iteration gives $\xi_1(t)$ as provided by Eq. (9.7), but with $t_1(T)$ instead of $t_0$. Proceeding as in Proposition 1, but now using the initial conditions (9.19), we can bound the first iterate $\xi_1(t)$ as in (9.16):

$$|\xi_1(t)| \le \frac{K}{\tau}\bigl[\varepsilon^{p-s} + \varepsilon^{p-s}\bigr] + K\tau^2\left[\frac{\varepsilon^{p-s}}{\tau_1^3} + \varepsilon^{p-s}\rho^{-2}_{[t_1(T),t]}(T)\right].$$

Now the following inequalities hold

$$\rho^{-\beta}_{[t_1(T),t]}(T) \le K\tau_1^{-\beta}, \qquad \rho^{0}_{[t_1(T),t]}(T) \le K|\ln\tau_1|, \qquad \rho^{\beta}_{[t_1(T),t]}(T) \le K\tau^{\beta},$$

and consequently we can bound $\Psi'(t+T)\xi_1(t)$ as

$$|\Psi'(t+T)\xi_1(t)| \le K\left[\frac{\varepsilon^{p-s}}{\tau^3} + \frac{\varepsilon^{p-s}}{\tau_1^3}\right] \le \varepsilon^{p-s-2\alpha}, \tag{9.22}$$
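where we have used that $\tau \ge \tau_1 \ge \varepsilon^{2\alpha/3}$. In view of this bound, we now define the norm

$$\|\xi\| := \sup|\Psi'(t+T)\xi(t)|,$$

with the supremum taken for $T \in D^+$ and $t \in [t_1(T), T_0 - \operatorname{Re} T]$. With this new terminology we have proved that

$$\|\xi_1\| \le K\varepsilon^{p-s-2\alpha}, \quad \text{and therefore} \quad |\xi_1(t)| \le K\varepsilon^{p-s-2\alpha}\tau^2.$$

For the successive iterates we apply Lemmas 8, 9 (with $\lambda = K\varepsilon^{p-s-2\alpha}$ and $\beta = -2$) and 10, as well as inequalities (9.22), obtaining

$$\begin{aligned} |\Psi'(t+T)(\xi_{n+1}(t) - \xi_n(t))| &\le \frac{K}{\tau^3}\int_{t_1(T)}^{t}\tau(\sigma)^2\left[\varepsilon^{p-s-2\alpha} + \frac{\varepsilon^{p-s}}{\tau(\sigma)^2}\right]|\xi_n(\sigma) - \xi_{n-1}(\sigma)|\,d\sigma \\ &\quad + K\int_{t_1(T)}^{t}\frac{1}{\tau(\sigma)}\left[\varepsilon^{p-s-2\alpha} + \frac{\varepsilon^{p-s}}{\tau(\sigma)^2}\right]|\xi_n(\sigma) - \xi_{n-1}(\sigma)|\,d\sigma \\ &\le K\left[\frac{1}{\tau^3}\int_{t_1(T)}^{t}\bigl(\varepsilon^{p-s-2\alpha}\tau(\sigma)^4 + \varepsilon^{p-s}\tau(\sigma)^2\bigr)\,d\sigma + \int_{t_1(T)}^{t}\Bigl(\varepsilon^{p-s-2\alpha}\tau(\sigma) + \frac{\varepsilon^{p-s}}{\tau(\sigma)}\Bigr)\,d\sigma\right]\|\xi_n - \xi_{n-1}\| \\ &\le K\left[\frac{\varepsilon^{p-s-2\alpha}}{\tau^3}\rho^5 + \frac{\varepsilon^{p-s}}{\tau^3}\rho^3 + \varepsilon^{p-s-2\alpha}\rho^2 + \varepsilon^{p-s}\rho^0\right]\|\xi_n - \xi_{n-1}\| \\ &\le K\varepsilon^{p-s-2\alpha}\|\xi_n - \xi_{n-1}\|, \end{aligned}$$

where $\tau(\sigma)$ denotes $|\sigma + T - \pi i/2|$, and $\rho^\beta$ denotes $\rho^\beta_{[t_1(T),t]}(T)$. Since $p - s - 2\alpha > 0$, choosing now $\varepsilon_0 = \varepsilon_0(K, p - s - 2\alpha)$ small enough, it follows by induction that for $n \ge 1$,

$$\|\xi_n\| \le K\varepsilon^{p-s-2\alpha}, \qquad \|\xi_{n+1} - \xi_n\| \le \frac{1}{2}\|\xi_n - \xi_{n-1}\|,$$

and consequently $(\xi_n)_{n \ge 0}$ converges uniformly for $T \in D^+$ and $t \in [t_1(T), T_0 - \operatorname{Re} T]$ to the first component $\xi(t)$ of a solution of system (9.1), satisfying the required bound. As in Proposition 1, we can now bound $\eta(t)$ from its integral Eq. (9.6), and we finally obtain bound (9.20).

Proof of the Extension Theorem. First consider $0 \le \operatorname{Im} T \le \pi/2 - \varepsilon^\alpha$. Putting Propositions 1 and 2 together, as well as the bound (9.20) produced by this last proposition, we immediately obtain the Extension Theorem for $0 \le \operatorname{Im} T \le \pi/2 - \varepsilon^\alpha$, with the required estimates. For $-\pi/2 + \varepsilon^\alpha \le \operatorname{Im} T \le 0$ we only have to choose $b = -\pi i/2$ in the definition of $W(u)$, in order to get a second solution $\Phi(u)$ of the linear system (9.3) with a double zero at $u = -\pi i/2$. Lemma 8, as well as Propositions 1 and 2, are also valid for $-\pi/2 + \varepsilon^\alpha \le \operatorname{Im} T \le 0$, and consequently the Extension Theorem follows for $|\operatorname{Im} T| \le \pi/2 - \varepsilon^\alpha$.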
Acknowledgement. We are indebted to C. Simó for relevant discussions and remarks. Three of the authors (A. D., A. J. and T.M. S.) have been partially supported by the Spanish grant DGICYT PB94–0215, the EC grant ERBCHRXCT940460, and the Catalan grant CIRIT 1996SGR–00105. One of the authors (V. G.) was supported by a CICYT grant.
References

[Arn63] Arnold, V.I.: Proof of A.N. Kolmogorov's theorem on the preservation of quasi-periodic motions under small perturbations of the Hamiltonian. Russian Math. Surveys 18, 9–36 (1963)
[Arn64] Arnold, V.I.: Instability of dynamical systems with several degrees of freedom. Sov. Math. Dokl. 5, 581–585 (1964)
[BCF97] Benettin, G., Carati, A. and Fassò, F.: On the conservation of adiabatic invariants for a system of coupled rotators. Phys. D 104, 253–268 (1997)
[BCG97] Benettin, G., Carati, A. and Gallavotti, A.: A rigorous implementation of the Jeans-Landau-Teller approximation for adiabatic invariants. Nonlinearity 10, 479–505 (1997)
[CG94] Chierchia, L. and Gallavotti, G.: Drift and diffusion in phase space. Ann. Inst. H. Poincaré Phys. Théor. 60 (1), 1–144 (1994)
[DGJS97a] Delshams, A., Gelfreich, V.G., Jorba, A. and Seara, T.M.: Lower and upper bounds for the splitting of separatrices of the pendulum under a fast quasiperiodic forcing. ERA Amer. Math. Soc. 3, 1–10 (1997)
[DGJS97b] Delshams, A., Gelfreich, V.G., Jorba, A. and Seara, T.M.: Splitting of separatrices for (fast) quasiperiodic forcing. In C. Simó, editor, Hamiltonian Systems with Three or More Degrees of Freedom, NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci. Held in S'Agaró, Spain, 29–30 June 1995. Dordrecht: Kluwer Acad. Publ., to appear in 1997
[DJS91] Díez, C., Jorba, A. and Simó, C.: A dynamical equivalent to the equilateral libration points of the real Earth-Moon system. Celestial Mech. 50, 13–29 (1991)
[DS92] Delshams, A. and Seara, T.M.: An asymptotic expression for the splitting of separatrices of the rapidly forced pendulum. Commun. Math. Phys. 150, 433–463 (1992)
[DS97] Delshams, A. and Seara, T.M.: Splitting of separatrices in Hamiltonian systems with one and a half degrees of freedom. Preprint, 1997. Submitted to Math. Phys. EJ
[Fon93] Fontich, E.: Exponentially small upper bounds for the splitting of separatrices for high frequency periodic perturbations. Nonlinear Anal. 20 (6), 733–744 (1993)
[Fon95] Fontich, E.: Rapidly forced planar vector fields and splitting of separatrices. J. Differential Eq. 119 (2), 310–335 (1995)
[Gal94] Gallavotti, G.: Twistless KAM tori, quasi flat homoclinic intersections, and other cancellations in the perturbation series of certain completely integrable Hamiltonian systems. A review. Rev. Math. Phys. 6 (3), 343–411 (1994)
[Gel90] Gelfreich, V.G.: Separatrices splitting for the rapidly forced pendulum. (unpublished), 1990
[Gel93] Gelfreich, V.G.: Separatrices splitting for the rapidly forced pendulum. In S. Kuksin, V.F. Lazutkin, and J. Pöschel, editors, Proceedings of the Dynamical Systems Semester, Held in St. Petersburg, Russia, 17–30 November, 1991. Basel-Boston-Stuttgart: Birkhäuser, 1993, pp. 47–67
[Gel97] Gelfreich, V.G.: Reference systems for splitting of separatrices. Nonlinearity 10 (1), 175–193 (1997)
[GJMS91] Gómez, G., Jorba, A., Masdemont, J. and Simó, C.: A quasiperiodic solution as a substitute of L4 in the Earth-Moon system. In Proceedings of the 3rd International Symposium on Spacecraft Flight Dynamics, Noordwijk: ESTEC. ESA Publications Division 1991, pp. 35–41
[HMS88] Holmes, P., Marsden, J. and Scheurle, J.: Exponentially small splittings of separatrices with applications to KAM theory and degenerate bifurcations. Contemp. Math. 81, 213–244 (1988)
[JS92] Jorba, A. and Simó, C.: On the reducibility of linear differential equations with quasiperiodic coefficients. J. Differential Eq. 98, 111–124 (1992)
[JS96] Jorba, A. and Simó, C.: On quasiperiodic perturbations of elliptic equilibrium points. SIAM J. Math. Anal. 27 (6), 1704–1737 (1996)
[Khi63] Khintchine, A.Ya.: Continued Fractions. Groningen: P. Noordhoff, Ltd., 1963
[Lan91] Lang, S.: Introduction to Diophantine Approximations. New York: Springer, Second Edition, 1991
[Laz84] Lazutkin, V.F.: Splitting of separatrices for the Chirikov's standard map. Preprint VINITI No. 6372–84 (in Russian), 1984
[Loc92] Lochak, P.: Canonical perturbation theory via simultaneous approximation. Russ. Math. Surv. 47 (6), 57–133 (1992)
[Mos56] Moser, J.: The analytic invariants of an area-preserving mapping near a hyperbolic fixed point. Comm. Pure Appl. Math. 9, 673–692 (1956)
[Nei84] Neishtadt, A.I.: The separation of motions in systems with rapidly rotating phase. J. Appl. Math. Mech. 48 (2), 133–139 (1984)
[Poi99] Poincaré, H.: Les méthodes nouvelles de la mécanique céleste, Volume 1, 2, 3. Paris: Gauthier-Villars, 1892–1899
[RW97a] Rudnev, M. and Wiggins, S.: KAM theory near multiplicity one resonant surfaces in perturbations of a-priori stable hamiltonian systems. J. Nonlinear Sci. 7, 177–209 (1997)
[RW97b] Rudnev, M. and Wiggins, S.: Existence of exponentially small separatrix splittings and homoclinic connections between whiskered tori in weakly hyperbolic near-integrable hamiltonian systems. Preprint, archived in mp [email protected], #97-4, 1997. To appear in Phys. D
[Sim94] Simó, C.: Averaging under fast quasiperiodic forcing. In J. Seimenis, editor, Hamiltonian Mechanics: Integrability and Chaotic Behaviour, Volume 331 of NATO Adv. Sci. Inst. Ser. B Phys. Held in Toruń, Poland, 28 June–2 July 1993, New York: Plenum, 1994, pp. 13–34
[Tre97] Treschev, D.V.: Separatrix splitting for a pendulum with rapidly oscillating suspension point. Russian J. Math. Phys. 5 (1), 63–98 (1997)
Communicated by A. Jaffe
Commun. Math. Phys. 189, 73 – 105 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997
Global Existence of Small Solutions to a Relativistic Nonlinear Schrödinger Equation*

Anne de Bouard¹, Nakao Hayashi², Jean-Claude Saut¹

¹ Analyse Numérique et EDP, CNRS et Université Paris-Sud, Bât 425, 91405 Orsay, France
² Department of Applied Mathematics, Faculty of Science, Science University of Tokyo, 162 Tokyo, Japan
Received: 2 December 1996 / Accepted: 24 February 1997
Résumé: We study the Cauchy problem associated with a nonlinear Schrödinger equation modelling an ultra-intense, ultra-short laser pulse in a plasma. The new nonlinear terms are due to relativistic effects and to the ponderomotive force. We prove local existence and uniqueness of small solutions when the transverse space dimension equals 2 or 3, as well as local existence for initial data of arbitrary size in transverse space dimension one.

Abstract: We study the Cauchy problem associated to a nonlinear Schrödinger equation modelling the self-channeling of a high power, ultra-short laser pulse in matter. The new nonlinear terms arise from relativistic effects and from the ponderomotive force. We prove global existence and uniqueness of small solutions in transverse space dimensions 2 and 3, and local existence without any smallness condition in transverse space dimension 1.
1. Introduction

This paper is concerned with the global existence of small solutions to a nonlinear Schrödinger equation modelling the self-channeling of a high-power ultra-short laser pulse in a plasma. The propagation of a high-irradiance laser beam in a plasma creates an optical index depending nonlinearly on the light intensity and this leads to interesting new nonlinear wave equations [1, 3, 5, 11]. The new nonlinear effects are the relativistic decrease of the plasma frequency and the ponderomotive expelling of the electrons. In particular, a key parameter is the Lorentz factor $\gamma$ of the electrons, defined by $\gamma = (1 - v^2/c^2)^{-1/2}$.

* Partially supported by GDR 1180 POAN of CNRS

The full nonlinear model is obtained by coupling the Maxwell equations with the equations of fluid mechanics. Simplified models are derived by using a paraxial approximation and the approximation (when $|a|$ is "small") $\gamma = (1 + |a|^2)^{1/2}$, where $a = a(x, y, z, t)$ is the complex amplitude of the electromagnetic field vector potential. The nonlinear wave equation which describes the evolution of $a$ is (see [1])

$$\left(\frac{1}{v_g}\frac{\partial}{\partial t} + \frac{\partial}{\partial z}\right)a + \frac{i}{2k}\left\{\Box + k_p^2\left[1 - \frac{1 + k_p^{-2}\Delta\gamma}{\gamma}\right]\right\}a = 0, \tag{1.1}$$

where $v_g$ is the group velocity, $k^2 = k_0^2 - k_p^2$, $k_0 = \omega/c$ being the wave number, $k_p = \omega_{p,0}/c$ the plasma wave number, where $\omega_{p,0} = (4\pi e^2 n_{e,0}/m_{e,0})^{1/2}$ is the unperturbed plasma frequency. Finally,

$$\Box = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} - c^{-2}\frac{\partial^2}{\partial t^2} = \Delta_\perp + \frac{\partial^2}{\partial z^2} - c^{-2}\frac{\partial^2}{\partial t^2}$$
is the wave operator. In the approximation of slowly varying complex amplitude, that is, assuming that the complex amplitude varies slowly over distances on the order of the wavelength in the direction of propagation and over times on the order of the oscillation period of the high-frequency field oscillations, one has

$$\left|\frac{\partial a}{\partial z}\right| \ll k|a|, \qquad \frac{1}{c}\left|\frac{\partial a}{\partial t}\right| \ll k|a|,$$

so that the wave operator and the laplacian in the nonlinear term should be formally replaced by the transverse laplacian $\Delta_\perp$ and (1.1) takes the form

$$\left(\frac{1}{v_g}\frac{\partial}{\partial t} + \frac{\partial}{\partial z}\right)a + \frac{i}{2k}\left\{\Delta_\perp + k_p^2\left[1 - \frac{1 + k_p^{-2}\Delta_\perp\gamma}{\gamma}\right]\right\}a = 0. \tag{1.2}$$

If we are considering a semi-infinite medium $z > 0$ and the pulse enters this medium moving in the positive $z$ direction, it is convenient to perform the following change of variables (cf. [1]) $\xi = z$, $\tau = t - z/v_g$, and Eq. (1.2) is written as

$$\frac{\partial a}{\partial\xi} + \frac{i}{2k}\left\{\Delta_\perp + k_p^2\left[1 - \frac{1 + k_p^{-2}\Delta_\perp\gamma}{\gamma}\right]\right\}a = 0. \tag{1.3}$$
By rescaling $\xi$ and $(x, y)$, one can take $k = -1$, $k_p^2 = 1$ and we write (1.3), with a slight change of notations,

$$\begin{cases} 2i\,\dfrac{\partial a}{\partial t} + \Delta a + f(\gamma, \Delta\gamma)\,a = 0, & (t, x) \in \mathbb{R}\times\mathbb{R}^n, \\[1ex] a(0, x) = a_0(x), & x \in \mathbb{R}^n, \end{cases} \tag{1.4}$$

where $\gamma = \sqrt{1 + |a|^2}$, $f(\gamma, \Delta\gamma) = 1 - \dfrac{1 + \Delta\gamma}{\gamma}$, and $\Delta = \dfrac{\partial^2}{\partial x_1^2} + \cdots + \dfrac{\partial^2}{\partial x_n^2}$ is the Laplace operator in $\mathbb{R}^n$, $n = 1, 2, 3$. Note that the case $n = 3$ occurs when using a non paraxial theory ([11]). Note also that it is physically pertinent to assume that the electron density is positive (no cavitation), i.e.

$$1 + \Delta\gamma > 0, \tag{1.5}$$
although this condition will play no major role in our results.

In this paper, we will focus on the Cauchy problem (1.4). The main difficulty in obtaining local as well as global existence results for Eq. (1.4) is due to the presence of first and second order derivatives of the unknown function $a$ in the nonlinear term. Actually, to avoid the treatment of a laplacian of $a$ arising from $f(\gamma, \Delta\gamma)$ in the nonlinear term, Eq. (1.4) will first be rewritten as a system in $(a, \bar a)^t$ involving a linear term in $(\Delta a, \Delta\bar a)^t$, and in compensation exhibiting nonlinear terms as coefficients of the time derivative $\partial_t a$ (see Eq. (3.3)). We then use space and time derivatives of Eq. (3.3) to "linearize" the quadratic term involving $\nabla a$ (see Eqs. (3.4) and (3.7)). Since the diagonal coefficients of the highest order term in (3.3)–(3.7) are not pure imaginary, a gauge transform is used in order to apply the "energy method," and we then obtain a system in

$$U = \begin{pmatrix} a \\ \bar a \\ \gamma^{-1/2}\nabla a \\ \gamma^{-1/2}\nabla\bar a \\ \gamma^{-1/2}\partial_t a \\ \gamma^{-1/2}\partial_t\bar a \end{pmatrix}, \qquad \text{where } \gamma = \sqrt{1 + |a|^2},$$

involving pure imaginary terms as diagonal coefficients of its highest order term (see (3.17)). Because of the presence of a nonlinear term in front of the time derivative $\partial_t U$, we need to make use of function spaces involving time derivatives of $a$ (see Sect. 2). The presence of this nonlinear term is also the reason why we cannot remove the smallness condition on the initial data, even in the local existence theorem, except in the one-dimensional case.

The paper is organized as follows: in Sect. 2, we introduce the function spaces and state the results concerning local and global existence of solutions of Eq. (1.4) (Theorem 2.1, 2.2 and 2.3). Section 3 is devoted to rewriting Eq. (1.4) in the way described above, and stating intermediate results on the new equation (see Theorem 3.1 and 3.2). In Sect. 4, we prove Theorem 3.2 and deduce the proof of Theorem 2.2. The proof of Theorem 3.1 and 2.1 follows in a very similar way. We end Sect. 4 by showing that in the special case of dimension 1, the smallness condition on the data can be removed for the local existence result. Section 5 is devoted to a few concluding remarks. Part of those results have been announced in [2].

We thank T. Lehner for introducing us to this problem and for fruitful discussions. We also thank J. Ginibre who pointed out that our results were still valid for the Schrödinger flow for certain harmonic maps (see Remark 5.1).
2. Notations and Statement of the Results

In all that follows, we will denote for $m \in \mathbb{N}$, $p \in [1, +\infty]$,

$$W^{m,p} = W^{m,p}(\mathbb{R}^n) = \{f \in L^p(\mathbb{R}^n),\ \partial^\alpha f \in L^p(\mathbb{R}^n),\ 0 \le |\alpha| \le m\}$$

and $H^m = W^{m,2}$. The standard norm in $W^{m,p}$ (resp. $H^m$) will be denoted $\|\cdot\|_{m,p}$ (resp. $\|\cdot\|_m$), while $\|\cdot\| = \|\cdot\|_0 = \|\cdot\|_{L^2}$. In order to state our results on the Cauchy problem for Eq. (1.4), we need to introduce a few function spaces. The following spaces will be used to obtain local and global existence for small initial data. For any nonnegative integer $m$ and any $T \in \mathbb{R}^+$, we define

$$X_T^{2m} = \{f \in C([-T, T], L^2(\mathbb{R}^n)),\ |||f|||_{X_T^{2m}} < +\infty\},$$

where

$$|||f|||_{X_T^{2m}} = \sup_{-T \le t \le T}\ \sum_{0 \le j \le m}\|\partial_t^j f(t)\|_{2(m-j)},$$

and for $m \ge 1$,

$$X_\infty^{2m} = \{f \in C(\mathbb{R}^+, L^2(\mathbb{R}^n)),\ |||f|||_{X_\infty^{2m}} < +\infty\},$$

where

$$|||f|||_{X_\infty^{2m}} = \sup_{t \in \mathbb{R}}\ \sum_{0 \le j \le m}\|\partial_t^j f(t)\|_{2(m-j)} + \sup_{t \in \mathbb{R}}\ \sum_{0 \le j \le m-1}(1 + |t|)^{n/3}\,\|\partial_t^j f(t)\|_{2(m-j-1),6}.$$

We also set $\mathbf{X}_T^{2m} = (X_T^{2m})^{2n+4}$ and $\mathbf{X}_\infty^{2m} = (X_\infty^{2m})^{2n+4}$.

In the one dimensional case, we use the following function space to get local existence without any smallness restriction on the initial data. Let $n = 1$, $m$ a nonnegative integer and $T \in \mathbb{R}^+$, then we define

$$Y_T^m = \{f \in C([-T, T], L^2(\mathbb{R})),\ |||f|||_{Y_T^m} < +\infty\},$$

where

$$|||f|||_{Y_T^m} = \sup_{-T \le t \le T}\ \sum_{0 \le j \le m}\|\partial_t^j f(t)\|_{m-j}.$$

We also set $\mathbf{Y}_T^m = (Y_T^m)^6$. We can now state our results; let $n = 2$ or $3$, then the Cauchy problem for Eq. (1.4) is locally and globally well posed, as is stated in the two following theorems.
Theorem 2.1. Let $n = 1$, $2$ or $3$, and assume that $a_0 \in H^{2m}(\mathbb{R}^n)$, with $m \ge 3$, and $\|a_0\|_{2m}$ is sufficiently small. Then there is a positive $T$ and a unique solution $a$ of Eq. (1.4) such that $a \in X_T^{2m}$. Moreover, the solution satisfies the following conservation laws, for $|t| \le T$:

$$\|a(t)\| = \|a_0\|, \tag{2.1}$$

$$E(t) \equiv \frac{1}{2}\int_{\mathbb{R}^n}\Bigl[|\nabla a(t)|^2 - (\gamma(t) - 1)^2 - |\nabla\gamma(t)|^2\Bigr]\,dx = E(0). \tag{2.2}$$
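Theorem 2.2. Let $n = 2$ or $3$, and assume that $a_0 \in H^{2m}(\mathbb{R}^n) \cap W^{2(m-1),6/5}(\mathbb{R}^n)$ with $m \ge 4$, and $\|a_0\|_{2m} + \|a_0\|_{2(m-1),6/5}$ is sufficiently small. Then there is a unique global solution $a$ of (1.4) with $a \in X_\infty^{2m}$.

When $n = 1$, we are able to remove the smallness condition of Theorem 2.1 by working in the space $Y_T^m$ and we obtain

Theorem 2.3. Let $n = 1$ and assume that $a_0 \in H^m(\mathbb{R})$, with $m \ge 6$. Then there is a positive $T$ and a unique solution $a$ of Eq. (1.4) such that $a \in C([0, T]; H^m(\mathbb{R}))$, which satisfies the conservation laws (2.1) and (2.2).

3. Transformation of the Equation

In this section, we transform Eq. (1.4) into a system to which we are able to apply energy estimates. First, recalling that $\gamma = \sqrt{1 + |a|^2}$, a direct computation of the nonlinear term in (1.4) gives

$$2i\partial_t a + \frac{1}{2}\Delta a = -\frac{1}{2\gamma^2}\Delta a + \frac{a^2}{2\gamma^2}\Delta\bar a + \frac{a}{\gamma^2}\left(|\nabla a|^2 - \frac{1}{4}\frac{(\nabla|a|^2)^2}{\gamma^2}\right) + \left(\frac{1}{\gamma} - 1\right)a, \tag{3.1}$$

so that the equation can be written as a system involving $a$ and $\bar a$:

$$2i\partial_t\begin{pmatrix} a \\ \bar a \end{pmatrix} + \frac{1}{2\gamma^2}\begin{pmatrix} 2 + |a|^2 & -a^2 \\ \bar a^2 & -(2 + |a|^2) \end{pmatrix}\Delta\begin{pmatrix} a \\ \bar a \end{pmatrix} = \begin{pmatrix} \left(\dfrac{1}{\gamma} - 1\right)a + \dfrac{a}{\gamma^2}\left(|\nabla a|^2 - \dfrac{1}{4}\dfrac{(\nabla|a|^2)^2}{\gamma^2}\right) \\[2.5ex] -\left(\dfrac{1}{\gamma} - 1\right)\bar a - \dfrac{\bar a}{\gamma^2}\left(|\nabla a|^2 - \dfrac{1}{4}\dfrac{(\nabla|a|^2)^2}{\gamma^2}\right) \end{pmatrix}. \tag{3.2}$$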
Multiplying both sides of (3.2) by
−a 1 2 + |a| , 2 2 2 a¯ −(2 + |a| ) 2
H −1 (a) =
we obtain
2
2 2 −a a 2 + |a| a i ∂t + 1 a¯ 2 −(2 + |a|2 ) a¯ a¯ 2 2 1 1 (∇|a| ) 2 −1 a a + γ2 |∇a| − 4 γ 2 γ = 2 2 . 1 1 (∇|a| ) 2 2 − 1 a¯ a¯ + γ |∇a| − 4 γ2 γ
(3.3)
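The passage from (3.2) to (3.3) relies on the algebraic fact that $H^{-1}(a)$ squares to $\gamma^2$ times the identity, so that multiplying by $H^{-1}(a)$ turns the coefficient $\frac{1}{\gamma^2}H^{-1}(a)$ of the Laplacian into the identity. The following is a minimal numerical sanity check of that fact, not part of the original argument; the helper names `h_inv`, `matmul2` and `defect` are introduced here for illustration only.

```python
import random

def h_inv(a):
    """H^{-1}(a) = (1/2) [[2+|a|^2, -a^2], [conj(a)^2, -(2+|a|^2)]]."""
    m = abs(a) ** 2
    return [[(2 + m) / 2, -a * a / 2],
            [a.conjugate() ** 2 / 2, -(2 + m) / 2]]

def matmul2(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def defect(a):
    """Largest entry of H^{-1}(a)^2 - gamma^2 * Identity, with gamma^2 = 1 + |a|^2."""
    gamma2 = 1 + abs(a) ** 2
    sq = matmul2(h_inv(a), h_inv(a))
    return max(abs(sq[0][0] - gamma2), abs(sq[0][1]),
               abs(sq[1][0]), abs(sq[1][1] - gamma2))

random.seed(0)
samples = [complex(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(100)]
worst = max(defect(a) for a in samples)
print(worst)  # expected to be at rounding-error level
```

The same identity shows that $H^{-1}(a)$ is invertible with inverse $\gamma^{-2}H^{-1}(a)$.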
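Differentiating both sides of (3.3) with respect to $t$, we obtain
$$
\begin{aligned}
& i\begin{pmatrix} 2+|a|^2 & -a^2\\ \bar a^2 & -(2+|a|^2)\end{pmatrix}\partial_t\begin{pmatrix}\partial_t a\\ \partial_t\bar a\end{pmatrix} + \Delta\begin{pmatrix}\partial_t a\\ \partial_t\bar a\end{pmatrix}\\
&\quad= \begin{pmatrix} a\nabla\bar a - \dfrac{|a|^2\nabla|a|^2}{2\gamma^2} & a\nabla a - \dfrac{a^2\nabla|a|^2}{2\gamma^2}\\[2mm] \bar a\nabla\bar a - \dfrac{\bar a^2\nabla|a|^2}{2\gamma^2} & \bar a\nabla a - \dfrac{|a|^2\nabla|a|^2}{2\gamma^2}\end{pmatrix}.\nabla\begin{pmatrix}\partial_t a\\ \partial_t\bar a\end{pmatrix}
+ \begin{pmatrix} f_2 - i\partial_t|a|^2 & f_3 + i\partial_t a^2\\ \bar f_3 - i\partial_t\bar a^2 & \bar f_2 + i\partial_t|a|^2\end{pmatrix}\begin{pmatrix}\partial_t a\\ \partial_t\bar a\end{pmatrix},
\end{aligned}
\tag{3.4}
$$
where we have set
$$f_2 = |\nabla a|^2 - \frac14\frac{(\nabla|a|^2)^2}{\gamma^4} - a\nabla\bar a\,\frac{\nabla|a|^2}{2\gamma^2} + \Big(\frac32|a|^2 + 1\Big)\Big(\frac1\gamma - 1\Big) - \frac{|a|^2}{2} \tag{3.5}$$
and
$$f_3 = \frac{a^2}{4}\frac{(\nabla|a|^2)^2}{\gamma^4} - a\nabla a\,\frac{\nabla|a|^2}{2\gamma^2} + \frac{a^2}{2}\Big(\frac1\gamma - 2\Big). \tag{3.6}$$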
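In the same way, differentiating both sides of (3.3) with respect to $x_j$ we obtain for $j = 1, \dots, n$, denoting $\partial_j = \partial_{x_j}$,
$$
\begin{aligned}
& i\begin{pmatrix} 2+|a|^2 & -a^2\\ \bar a^2 & -(2+|a|^2)\end{pmatrix}\partial_t\begin{pmatrix}\partial_j a\\ \partial_j\bar a\end{pmatrix} + \Delta\begin{pmatrix}\partial_j a\\ \partial_j\bar a\end{pmatrix}\\
&\quad= \begin{pmatrix} a\nabla\bar a - \dfrac{|a|^2\nabla|a|^2}{2\gamma^2} & a\nabla a - \dfrac{a^2\nabla|a|^2}{2\gamma^2}\\[2mm] \bar a\nabla\bar a - \dfrac{\bar a^2\nabla|a|^2}{2\gamma^2} & \bar a\nabla a - \dfrac{|a|^2\nabla|a|^2}{2\gamma^2}\end{pmatrix}.\nabla\begin{pmatrix}\partial_j a\\ \partial_j\bar a\end{pmatrix}\\
&\qquad+ \begin{pmatrix} f_2 & f_3\\ \bar f_3 & \bar f_2\end{pmatrix}\begin{pmatrix}\partial_j a\\ \partial_j\bar a\end{pmatrix}
+ \begin{pmatrix} -i\partial_j|a|^2 & i\partial_j a^2\\ -i\partial_j\bar a^2 & i\partial_j|a|^2\end{pmatrix}\begin{pmatrix}\partial_t a\\ \partial_t\bar a\end{pmatrix}.
\end{aligned}
\tag{3.7}
$$
We cannot apply energy methods to the system of equations (3.3)–(3.7) directly, because the diagonal coefficients of the matrix
$$\begin{pmatrix} a\nabla\bar a - \dfrac{|a|^2\nabla|a|^2}{2\gamma^2} & a\nabla a - \dfrac{a^2\nabla|a|^2}{2\gamma^2}\\[2mm] \bar a\nabla\bar a - \dfrac{\bar a^2\nabla|a|^2}{2\gamma^2} & \bar a\nabla a - \dfrac{|a|^2\nabla|a|^2}{2\gamma^2}\end{pmatrix}$$
are not pure imaginary. But since the real part of those coefficients is the gradient of the function $h = \ln(\gamma^{1/2})$, we can use a gauge transform in order that they become pure imaginary (see [6, 7 and 8] for an explanation of the method which stems from an idea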
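of Soyeur [13]). Hence we multiply both sides of (3.4) by $e^{-h} = (1+|a|^2)^{-1/4} = \gamma^{-1/2}$ to obtain
$$
\begin{aligned}
& i\begin{pmatrix} 2+|a|^2 & -a^2\\ \bar a^2 & -(2+|a|^2)\end{pmatrix}\partial_t\begin{pmatrix}\gamma^{-1/2}\partial_t a\\ \gamma^{-1/2}\partial_t\bar a\end{pmatrix} + \Delta\begin{pmatrix}\gamma^{-1/2}\partial_t a\\ \gamma^{-1/2}\partial_t\bar a\end{pmatrix}\\
&\quad= \begin{pmatrix} 2i\,\mathrm{Im}\,a\nabla\bar a & 2a\nabla a - \dfrac{a^2\nabla|a|^2}{\gamma^2}\\[2mm] 2\bar a\nabla\bar a - \dfrac{\bar a^2\nabla|a|^2}{\gamma^2} & 2i\,\mathrm{Im}\,\bar a\nabla a\end{pmatrix}.\nabla\begin{pmatrix}\gamma^{-1/2}\partial_t a\\ \gamma^{-1/2}\partial_t\bar a\end{pmatrix}\\
&\qquad+ \begin{pmatrix} f_2 - i\partial_t|a|^2 & f_3 + i\partial_t a^2\\ \bar f_3 - i\partial_t\bar a^2 & \bar f_2 + i\partial_t|a|^2\end{pmatrix}\begin{pmatrix}\gamma^{-1/2}\partial_t a\\ \gamma^{-1/2}\partial_t\bar a\end{pmatrix}\\
&\qquad+ i\begin{pmatrix} 2+|a|^2 & -a^2\\ \bar a^2 & -(2+|a|^2)\end{pmatrix}\begin{pmatrix}\gamma^{1/2}\partial_t(\gamma^{-1/2}) & 0\\ 0 & \gamma^{1/2}\partial_t(\gamma^{-1/2})\end{pmatrix}\begin{pmatrix}\gamma^{-1/2}\partial_t a\\ \gamma^{-1/2}\partial_t\bar a\end{pmatrix}\\
&\qquad+ \begin{pmatrix}\gamma^{1/2}\Delta(\gamma^{-1/2}) & 0\\ 0 & \gamma^{1/2}\Delta(\gamma^{-1/2})\end{pmatrix}\begin{pmatrix}\gamma^{-1/2}\partial_t a\\ \gamma^{-1/2}\partial_t\bar a\end{pmatrix}\\
&\qquad- \begin{pmatrix} a\nabla\bar a - \dfrac{|a|^2\nabla|a|^2}{2\gamma^2} & a\nabla a - \dfrac{a^2\nabla|a|^2}{2\gamma^2}\\[2mm] \bar a\nabla\bar a - \dfrac{\bar a^2\nabla|a|^2}{2\gamma^2} & \bar a\nabla a - \dfrac{|a|^2\nabla|a|^2}{2\gamma^2}\end{pmatrix}.\begin{pmatrix}\gamma^{1/2}\nabla(\gamma^{-1/2}) & 0\\ 0 & \gamma^{1/2}\nabla(\gamma^{-1/2})\end{pmatrix}\begin{pmatrix}\gamma^{-1/2}\partial_t a\\ \gamma^{-1/2}\partial_t\bar a\end{pmatrix}.
\end{aligned}
\tag{3.8}
$$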
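The role of the factor $\gamma^{-1/2}$ can be seen on a single scalar function. The sketch below is not taken from the paper: $w$ is a generic smooth function introduced only for illustration, and the computation uses nothing beyond the product rule and $\gamma^2 = 1+|a|^2$.

```latex
\begin{aligned}
\Delta(\gamma^{-1/2}w) &= \gamma^{-1/2}\Delta w + 2\nabla(\gamma^{-1/2}).\nabla w + w\,\Delta(\gamma^{-1/2}),\\
2\nabla(\gamma^{-1/2}) &= -\gamma^{-3/2}\nabla\gamma = -\gamma^{-1/2}\,\frac{\nabla|a|^2}{2\gamma^2},\\
\mathrm{Re}\Big(a\nabla\bar a - \frac{|a|^2\nabla|a|^2}{2\gamma^2}\Big)
 &= \frac12\nabla|a|^2\Big(1 - \frac{|a|^2}{\gamma^2}\Big) = \frac{\nabla|a|^2}{2\gamma^2},\\
\gamma^{-1/2}\Big[\Delta w - \frac{\nabla|a|^2}{2\gamma^2}.\nabla w\Big]
 &= \Delta(\gamma^{-1/2}w) - w\,\Delta(\gamma^{-1/2}).
\end{aligned}
```

In words: after multiplication by $\gamma^{-1/2}$, a first-order term whose coefficient is the real part of the diagonal entries is absorbed into the Laplacian of the new unknown $\gamma^{-1/2}w$, at the price of a zero-order term.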
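In the same way, multiplying both sides of (3.7) by $\gamma^{-1/2}$, we obtain for $j = 1, \dots, n$,
$$
\begin{aligned}
& i\begin{pmatrix} 2+|a|^2 & -a^2\\ \bar a^2 & -(2+|a|^2)\end{pmatrix}\partial_t\begin{pmatrix}\gamma^{-1/2}\partial_j a\\ \gamma^{-1/2}\partial_j\bar a\end{pmatrix} + \Delta\begin{pmatrix}\gamma^{-1/2}\partial_j a\\ \gamma^{-1/2}\partial_j\bar a\end{pmatrix}\\
&\quad= \begin{pmatrix} 2i\,\mathrm{Im}\,a\nabla\bar a & 2a\nabla a - \dfrac{a^2\nabla|a|^2}{\gamma^2}\\[2mm] 2\bar a\nabla\bar a - \dfrac{\bar a^2\nabla|a|^2}{\gamma^2} & 2i\,\mathrm{Im}\,\bar a\nabla a\end{pmatrix}.\nabla\begin{pmatrix}\gamma^{-1/2}\partial_j a\\ \gamma^{-1/2}\partial_j\bar a\end{pmatrix}\\
&\qquad+ \begin{pmatrix} f_2 & f_3\\ \bar f_3 & \bar f_2\end{pmatrix}\begin{pmatrix}\gamma^{-1/2}\partial_j a\\ \gamma^{-1/2}\partial_j\bar a\end{pmatrix}
+ \begin{pmatrix} -i\partial_j|a|^2 & i\partial_j a^2\\ -i\partial_j\bar a^2 & i\partial_j|a|^2\end{pmatrix}\begin{pmatrix}\gamma^{-1/2}\partial_t a\\ \gamma^{-1/2}\partial_t\bar a\end{pmatrix}\\
&\qquad+ i\begin{pmatrix} 2+|a|^2 & -a^2\\ \bar a^2 & -(2+|a|^2)\end{pmatrix}\begin{pmatrix}\gamma^{1/2}\partial_t(\gamma^{-1/2}) & 0\\ 0 & \gamma^{1/2}\partial_t(\gamma^{-1/2})\end{pmatrix}\begin{pmatrix}\gamma^{-1/2}\partial_j a\\ \gamma^{-1/2}\partial_j\bar a\end{pmatrix}\\
&\qquad+ \begin{pmatrix}\gamma^{1/2}\Delta(\gamma^{-1/2}) & 0\\ 0 & \gamma^{1/2}\Delta(\gamma^{-1/2})\end{pmatrix}\begin{pmatrix}\gamma^{-1/2}\partial_j a\\ \gamma^{-1/2}\partial_j\bar a\end{pmatrix}\\
&\qquad- \begin{pmatrix} a\nabla\bar a - \dfrac{|a|^2\nabla|a|^2}{2\gamma^2} & a\nabla a - \dfrac{a^2\nabla|a|^2}{2\gamma^2}\\[2mm] \bar a\nabla\bar a - \dfrac{\bar a^2\nabla|a|^2}{2\gamma^2} & \bar a\nabla a - \dfrac{|a|^2\nabla|a|^2}{2\gamma^2}\end{pmatrix}.\begin{pmatrix}\gamma^{1/2}\nabla(\gamma^{-1/2}) & 0\\ 0 & \gamma^{1/2}\nabla(\gamma^{-1/2})\end{pmatrix}\begin{pmatrix}\gamma^{-1/2}\partial_j a\\ \gamma^{-1/2}\partial_j\bar a\end{pmatrix}.
\end{aligned}
\tag{3.9}
$$
In terms of
$$\begin{pmatrix} u_0\\ \bar u_0\\ u_1\\ \bar u_1\\ \vdots\\ u_{n+1}\\ \bar u_{n+1}\end{pmatrix} = \begin{pmatrix} a\\ \bar a\\ \gamma^{-1/2}\nabla a\\ \gamma^{-1/2}\nabla\bar a\\ \gamma^{-1/2}\partial_t a\\ \gamma^{-1/2}\partial_t\bar a\end{pmatrix} \in \mathbb{C}^{2n+4}$$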
we may rewrite
$$|\nabla a|^2 - \frac14\frac{(\nabla|a|^2)^2}{\gamma^2} = \sum_{k=1}^n\Big[(1+|u_0|^2)^{1/2}|u_k|^2 - \frac14(1+|u_0|^2)^{-1/2}(u_0\bar u_k + \bar u_0u_k)^2\Big],$$
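and Eq. (3.3) becomes
$$
\begin{aligned}
& i\begin{pmatrix} 2+|u_0|^2 & -u_0^2\\ \bar u_0^2 & -(2+|u_0|^2)\end{pmatrix}\partial_t\begin{pmatrix}u_0\\ \bar u_0\end{pmatrix} + \Delta\begin{pmatrix}u_0\\ \bar u_0\end{pmatrix}\\
&\quad= \sum_{k=1}^n\Big[(1+|u_0|^2)^{1/2}|u_k|^2 - \frac14(1+|u_0|^2)^{-1/2}(u_0\bar u_k + \bar u_0u_k)^2\Big]\begin{pmatrix}u_0\\ \bar u_0\end{pmatrix}
+ (1+|u_0|^2)\big((1+|u_0|^2)^{-1/2} - 1\big)\begin{pmatrix}u_0\\ \bar u_0\end{pmatrix}.
\end{aligned}
\tag{3.10}
$$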
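With the same notations, we have
$$(2i\,\mathrm{Im}\,a\nabla\bar a).\nabla = 2i(1+|u_0|^2)^{1/4}\sum_{1\le k\le n}\mathrm{Im}\,(u_0\bar u_k)\,\partial_k$$
and
$$\Big(2a\nabla a - \frac{a^2\nabla|a|^2}{\gamma^2}\Big).\nabla = \sum_{k=1}^n\big[2(1+|u_0|^2)^{1/4}u_0u_k - (1+|u_0|^2)^{-3/4}u_0^2(\bar u_0u_k + u_0\bar u_k)\big]\partial_k$$
so that
$$\begin{pmatrix} 2i\,\mathrm{Im}\,a\nabla\bar a & 2a\nabla a - \dfrac{a^2\nabla|a|^2}{\gamma^2}\\[2mm] 2\bar a\nabla\bar a - \dfrac{\bar a^2\nabla|a|^2}{\gamma^2} & 2i\,\mathrm{Im}\,\bar a\nabla a\end{pmatrix}.\nabla = \sum_{k=1}^n\begin{pmatrix} 2i(1+|u_0|^2)^{1/4}\,\mathrm{Im}\,u_0\bar u_k & b(u_0,u_k)\\[1mm] \overline{b(u_0,u_k)} & 2i(1+|u_0|^2)^{1/4}\,\mathrm{Im}\,\bar u_0u_k\end{pmatrix}\partial_k, \tag{3.11}$$
where we have set
$$b(u_0,u_k) = (1+|u_0|^2)^{-3/4}\big[(2+|u_0|^2)u_0u_k - u_0^3\bar u_k\big].$$
One can also compute (see (3.5) and (3.6))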
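$$
\begin{aligned}
f_2 ={}& (1+|u_0|^2)^{1/2}\sum_{k=1}^n|u_k|^2
- \frac14\frac{1}{(1+|u_0|^2)^{3/2}}\sum_{k=1}^n\big[2(2+|u_0|^2)|u_0|^2|u_k|^2 + (3+2|u_0|^2)u_0^2\bar u_k^2 + \bar u_0^2u_k^2\big]\\
&+ \Big(\frac32|u_0|^2 + 1\Big)\Big(\frac{1}{\sqrt{1+|u_0|^2}} - 1\Big) - \frac{|u_0|^2}{2}
\end{aligned}
\tag{3.12}
$$
and
$$f_3 = \frac14\frac{1}{(1+|u_0|^2)^{3/2}}\sum_{k=1}^n\big(u_0^4\bar u_k^2 - |u_0|^4u_k^2 - 2u_0^2|u_k|^2 - 2|u_0|^2u_k^2\big) + \frac{u_0^2}{2}\Big(\frac{1}{\sqrt{1+|u_0|^2}} - 2\Big). \tag{3.13}$$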
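Keeping these notations in view, we have
$$
\begin{aligned}
\Delta[(1+|a|^2)^{-1/4}] ={}& \sum_{k=1}^n\frac{5}{16}(1+|u_0|^2)^{-7/4}(\bar u_0u_k + u_0\bar u_k)^2 - \frac14(1+|u_0|^2)^{-5/4}(a\Delta\bar a + \bar a\Delta a)\\
&- \frac12\sum_{k=1}^n(1+|u_0|^2)^{-3/4}|u_k|^2,
\end{aligned}
$$
and since from Eq. (3.3),
$$\Delta a = -i(2+|a|^2)\partial_t a + ia^2\partial_t\bar a + \Big(|\nabla a|^2 - \frac14\frac{(\nabla|a|^2)^2}{(1+|a|^2)^2}\Big)a + (1+|a|^2)\Big(\frac{1}{\sqrt{1+|a|^2}} - 1\Big)a,$$
we may write $\bar a\Delta a + a\Delta\bar a$ as
$$
\begin{aligned}
\bar a\Delta a + a\Delta\bar a ={}& -4(1+|u_0|^2)^{5/4}\,\mathrm{Im}\,(u_0\bar u_{n+1})
+ 2|u_0|^2\sum_{k=1}^n\Big[(1+|u_0|^2)^{1/2}|u_k|^2 - \frac14\frac{(u_0\bar u_k + \bar u_0u_k)^2}{(1+|u_0|^2)^{3/2}}\Big]\\
&+ 2|u_0|^2(1+|u_0|^2)\Big(\frac{1}{\sqrt{1+|u_0|^2}} - 1\Big).
\end{aligned}
$$
Hence,
$$
\begin{aligned}
\Delta[(1+|a|^2)^{-1/4}] ={}& (1+|u_0|^2)^{-11/4}\Big(\frac{5}{16} + \frac{7}{16}|u_0|^2\Big)\sum_{k=1}^n(u_0\bar u_k + \bar u_0u_k)^2
- \frac12(1+|u_0|^2)^{1/4}\sum_{k=1}^n|u_k|^2\\
&+ \mathrm{Im}\,(u_0\bar u_{n+1}) - \frac12|u_0|^2(1+|u_0|^2)^{-1/4}\Big(\frac{1}{\sqrt{1+|u_0|^2}} - 1\Big).
\end{aligned}
\tag{3.14}
$$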
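The first display of this computation is the chain rule applied to $\varphi(s) = (1+s)^{-1/4}$ with $s = |a|^2$. A short sketch of the intermediate steps, with $\varphi$ and $s$ introduced here only as shorthand:

```latex
\begin{aligned}
\Delta\varphi(s) &= \varphi'(s)\,\Delta s + \varphi''(s)\,|\nabla s|^2,
\qquad \varphi'(s) = -\tfrac14(1+s)^{-5/4},\quad \varphi''(s) = \tfrac{5}{16}(1+s)^{-9/4},\\
\Delta s &= \bar a\Delta a + a\Delta\bar a + 2|\nabla a|^2,\\
|\nabla s|^2 &= (1+|u_0|^2)^{1/2}\sum_{k=1}^n(\bar u_0u_k + u_0\bar u_k)^2,
\qquad |\nabla a|^2 = (1+|u_0|^2)^{1/2}\sum_{k=1}^n|u_k|^2 .
\end{aligned}
```

Substituting the last two lines into the first gives the three terms with exponents $-7/4$, $-5/4$ and $-3/4$.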
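Equation (3.8) then becomes after using (3.11)–(3.14),
$$
\begin{aligned}
& i\begin{pmatrix} 2+|u_0|^2 & -u_0^2\\ \bar u_0^2 & -(2+|u_0|^2)\end{pmatrix}\partial_t\begin{pmatrix}u_{n+1}\\ \bar u_{n+1}\end{pmatrix} + \Delta\begin{pmatrix}u_{n+1}\\ \bar u_{n+1}\end{pmatrix}\\
&\quad= \sum_{k=1}^n\begin{pmatrix} 2i(1+|u_0|^2)^{1/4}\,\mathrm{Im}\,u_0\bar u_k & b(u_0,u_k)\\[1mm] \overline{b(u_0,u_k)} & 2i(1+|u_0|^2)^{1/4}\,\mathrm{Im}\,\bar u_0u_k\end{pmatrix}\partial_k\begin{pmatrix}u_{n+1}\\ \bar u_{n+1}\end{pmatrix}
+ \begin{pmatrix} f_2 & f_3\\ \bar f_3 & \bar f_2\end{pmatrix}\begin{pmatrix}u_{n+1}\\ \bar u_{n+1}\end{pmatrix}\\
&\qquad+ i(1+|u_0|^2)^{1/4}\begin{pmatrix} -(u_0\bar u_j + \bar u_0u_j) & 2u_0u_j\\ -2\bar u_0\bar u_j & u_0\bar u_j + \bar u_0u_j\end{pmatrix}\begin{pmatrix}u_{n+1}\\ \bar u_{n+1}\end{pmatrix}\\
&\qquad- \frac14 i(1+|u_0|^2)^{-3/4}(u_0\bar u_{n+1} + \bar u_0u_{n+1})\begin{pmatrix} 2+|u_0|^2 & -u_0^2\\ \bar u_0^2 & -(2+|u_0|^2)\end{pmatrix}\begin{pmatrix}u_{n+1}\\ \bar u_{n+1}\end{pmatrix}\\
&\qquad+ \sum_{k=1}^n\begin{pmatrix} f_4(u_0,u_k) & 0\\ 0 & f_4(u_0,u_k)\end{pmatrix}\begin{pmatrix}u_{n+1}\\ \bar u_{n+1}\end{pmatrix}\\
&\qquad+ \Big[(1+|u_0|^2)^{1/4}\,\mathrm{Im}\,u_0\bar u_{n+1} - \frac12|u_0|^2\big((1+|u_0|^2)^{-1/2} - 1\big)\Big]\begin{pmatrix}u_{n+1}\\ \bar u_{n+1}\end{pmatrix}\\
&\qquad+ \frac14(1+|u_0|^2)^{-1/2}\sum_{k=1}^n(u_0\bar u_k + \bar u_0u_k)\begin{pmatrix} u_0\bar u_k & u_0u_k\\ \bar u_0u_k & \bar u_0\bar u_k\end{pmatrix}\begin{pmatrix}u_{n+1}\\ \bar u_{n+1}\end{pmatrix}\\
&\qquad- \frac18(1+|u_0|^2)^{-3/2}\sum_{k=1}^n(u_0\bar u_k + \bar u_0u_k)^2\begin{pmatrix} |u_0|^2 & u_0^2\\ \bar u_0^2 & |u_0|^2\end{pmatrix}\begin{pmatrix}u_{n+1}\\ \bar u_{n+1}\end{pmatrix},
\end{aligned}
\tag{3.15}
$$
where
$$f_4(u_0,u_k) = (1+|u_0|^2)^{-5/2}\Big(\frac{5}{16} + \frac{7}{16}|u_0|^2\Big)(\bar u_0u_k + u_0\bar u_k)^2 - \frac12(1+|u_0|^2)^{1/2}|u_k|^2.$$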
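In the same way, Eq. (3.9) can be written as
$$
\begin{aligned}
& i\begin{pmatrix} 2+|u_0|^2 & -u_0^2\\ \bar u_0^2 & -(2+|u_0|^2)\end{pmatrix}\partial_t\begin{pmatrix}u_j\\ \bar u_j\end{pmatrix} + \Delta\begin{pmatrix}u_j\\ \bar u_j\end{pmatrix}\\
&\quad= \sum_{k=1}^n\begin{pmatrix} 2i(1+|u_0|^2)^{1/4}\,\mathrm{Im}\,u_0\bar u_k & b(u_0,u_k)\\[1mm] \overline{b(u_0,u_k)} & 2i(1+|u_0|^2)^{1/4}\,\mathrm{Im}\,\bar u_0u_k\end{pmatrix}\partial_k\begin{pmatrix}u_j\\ \bar u_j\end{pmatrix}
+ \begin{pmatrix} f_2 & f_3\\ \bar f_3 & \bar f_2\end{pmatrix}\begin{pmatrix}u_j\\ \bar u_j\end{pmatrix}\\
&\qquad+ i(1+|u_0|^2)^{1/4}\begin{pmatrix} -(u_0\bar u_j + \bar u_0u_j) & 2u_0u_j\\ -2\bar u_0\bar u_j & u_0\bar u_j + \bar u_0u_j\end{pmatrix}\begin{pmatrix}u_{n+1}\\ \bar u_{n+1}\end{pmatrix}\\
&\qquad- \frac14 i(1+|u_0|^2)^{-3/4}(u_0\bar u_{n+1} + \bar u_0u_{n+1})\begin{pmatrix} 2+|u_0|^2 & -u_0^2\\ \bar u_0^2 & -(2+|u_0|^2)\end{pmatrix}\begin{pmatrix}u_j\\ \bar u_j\end{pmatrix}\\
&\qquad+ \sum_{k=1}^n\begin{pmatrix} f_4(u_0,u_k) & 0\\ 0 & f_4(u_0,u_k)\end{pmatrix}\begin{pmatrix}u_j\\ \bar u_j\end{pmatrix}\\
&\qquad+ \Big[(1+|u_0|^2)^{1/4}\,\mathrm{Im}\,u_0\bar u_{n+1} - \frac12|u_0|^2\big((1+|u_0|^2)^{-1/2} - 1\big)\Big]\begin{pmatrix}u_j\\ \bar u_j\end{pmatrix}\\
&\qquad+ \frac14(1+|u_0|^2)^{-1/2}\sum_{k=1}^n(u_0\bar u_k + \bar u_0u_k)\begin{pmatrix} u_0\bar u_k & u_0u_k\\ \bar u_0u_k & \bar u_0\bar u_k\end{pmatrix}\begin{pmatrix}u_j\\ \bar u_j\end{pmatrix}\\
&\qquad- \frac18(1+|u_0|^2)^{-3/2}\sum_{k=1}^n(u_0\bar u_k + \bar u_0u_k)^2\begin{pmatrix} |u_0|^2 & u_0^2\\ \bar u_0^2 & |u_0|^2\end{pmatrix}\begin{pmatrix}u_j\\ \bar u_j\end{pmatrix}.
\end{aligned}
\tag{3.16}
$$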
Equations (3.10), (3.15) and (3.16) can be put together to give a system of the form:
$$iA(u_0)\partial_tU + \Delta U = \sum_{k=1}^nB(u_0,u_k)\partial_kU + F(U), \tag{3.17}$$
where
$$U(t,x) = \begin{pmatrix} u_0(t,x)\\ \bar u_0(t,x)\\ \vdots\\ u_{n+1}(t,x)\\ \bar u_{n+1}(t,x)\end{pmatrix} \in \mathbb{C}^{2n+4},$$
$$A(u_0) = 2\begin{pmatrix} H^{-1}(u_0) & 0 & \cdots & 0\\ 0 & H^{-1}(u_0) & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0\\ 0 & \cdots & 0 & H^{-1}(u_0)\end{pmatrix},\qquad H^{-1}(u_0) = \frac12\begin{pmatrix} 2+|u_0|^2 & -u_0^2\\ \bar u_0^2 & -(2+|u_0|^2)\end{pmatrix}, \tag{3.18}$$
and
$$B(u_0,u_k) = \begin{pmatrix} 0 & 0 & \cdots & 0\\ 0 & G(u_0,u_k) & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0\\ 0 & \cdots & 0 & G(u_0,u_k)\end{pmatrix},\qquad 1\le k\le n,$$
$$G(u_0,u_k) = (1+|u_0|^2)^{-3/4}\begin{pmatrix} 2i(1+|u_0|^2)\,\mathrm{Im}\,u_0\bar u_k & (2+|u_0|^2)u_0u_k - u_0^3\bar u_k\\ (2+|u_0|^2)\bar u_0\bar u_k - \bar u_0^3u_k & 2i(1+|u_0|^2)\,\mathrm{Im}\,\bar u_0u_k\end{pmatrix},$$
and at last,
$$F(U) = \begin{pmatrix} F_0(U)\\ \bar F_0(U)\\ \vdots\\ \bar F_{n+1}(U)\end{pmatrix}.$$
One can compute explicitly the terms
$$F_0(U) = \sum_{k=1}^n\Big[(1+|u_0|^2)^{1/2}|u_k|^2 - \frac14(1+|u_0|^2)^{-1/2}(u_0\bar u_k + \bar u_0u_k)^2\Big]u_0 + (1+|u_0|^2)\big((1+|u_0|^2)^{-1/2} - 1\big)u_0, \tag{3.19}$$
$$
\begin{aligned}
F_{n+1}(U) ={}& f_2u_{n+1} + f_3\bar u_{n+1} + \frac i4(1+|u_0|^2)^{-3/4}\big[-2u_0|u_{n+1}|^2 - 2\bar u_0u_{n+1}^2 - \bar u_0|u_0|^2u_{n+1}^2 + u_0^3\bar u_{n+1}^2\big]\\
&+ \sum_{k=1}^nf_4(u_0,u_k)u_{n+1} - (1+|u_0|^2)^{1/4}\,\mathrm{Im}\,(u_0\bar u_{n+1})u_{n+1} - \frac12|u_0|^2\big((1+|u_0|^2)^{-1/2} - 1\big)u_{n+1}\\
&+ \frac14(1+|u_0|^2)^{-1/2}\sum_{k=1}^n\Big[u_0^2\bar u_k^2u_{n+1} + |u_0|^2|u_k|^2u_{n+1} + u_0^2|u_k|^2\bar u_{n+1} + |u_0|^2u_k^2\bar u_{n+1}\\
&\qquad\qquad- \frac12\frac{u_0}{1+|u_0|^2}(u_0\bar u_k + \bar u_0u_k)^2(u_0\bar u_{n+1} + \bar u_0u_{n+1})\Big],
\end{aligned}
\tag{3.20}
$$
and for $j = 1, \dots, n$,
$$
\begin{aligned}
F_j(U) ={}& f_2u_j + f_3\bar u_j - i(1+|u_0|^2)^{1/4}(u_0\bar u_j + \bar u_0u_j)u_{n+1} + 2i(1+|u_0|^2)^{1/4}u_0u_j\bar u_{n+1}\\
&- \frac i4(1+|u_0|^2)^{-3/4}(2+|u_0|^2)(u_0\bar u_{n+1} + \bar u_0u_{n+1})u_j - \frac i4(1+|u_0|^2)^{-3/4}u_0^2\bar u_j\\
&+ \sum_{k=1}^nf_4(u_0,u_k)u_j - (1+|u_0|^2)^{1/4}\,\mathrm{Im}\,(u_0\bar u_{n+1})u_j - \frac12|u_0|^2\big((1+|u_0|^2)^{-1/2} - 1\big)u_j\\
&+ \frac14(1+|u_0|^2)^{-1/2}\sum_{k=1}^n\Big[u_0^2\bar u_k^2u_j + |u_0|^2|u_k|^2u_j + u_0^2|u_k|^2\bar u_j + |u_0|^2u_k^2\bar u_j\\
&\qquad\qquad- \frac12\frac{u_0}{1+|u_0|^2}(u_0\bar u_k + \bar u_0u_k)^2(u_0\bar u_j + \bar u_0u_j)\Big].
\end{aligned}
\tag{3.21}
$$
We will then consider the Cauchy problem for Eq. (3.17), supplemented with the initial condition
$$U(0) = U_0 = \begin{pmatrix}\varphi_0\\ \bar\varphi_0\\ \varphi_1\\ \vdots\\ \bar\varphi_{n+1}\end{pmatrix}. \tag{3.22}$$
In order to prove the theorems stated in Sect. 2, we first prove that the Cauchy problem (3.17), (3.22) is locally well posed for small initial data, and globally if the space dimension is greater than 1. More precisely we prove the following Theorem 3.1 and Theorem 3.2, where we recall that we have set $\mathbf{X}^{2m}_\infty = (X^{2m}_\infty)^{2n+4}$ and $\mathbf{X}^{2m}_T = (X^{2m}_T)^{2n+4}$.

Theorem 3.1. Assume that $n = 1, 2$, or $3$. There is an $\varepsilon_0 > 0$ such that if $U_0 \in (H^{2m}(\mathbb{R}^n))^{2n+4}$ with $m \ge 2$ and $\sum_{j=0}^{n+1}\|\varphi_j\|_{2m} \le \varepsilon_0$, then there exists $T > 0$ and a unique solution $U \in \mathbf{X}^{2m}_T$ of the system (3.17), (3.22).

Theorem 3.2. Assume that $n = 2$ or $3$. There is an $\varepsilon_0' > 0$ such that if $U_0 \in (H^{2m} \cap W^{2(m-1),6/5})^{2n+4}$ with $m \ge 3$ and $\sum_{j=0}^{n+1}(\|\varphi_j\|_{2m} + \|\varphi_j\|_{2(m-1),6/5}) \le \varepsilon_0'$, then there is a unique global solution $U \in \mathbf{X}^{2m}_\infty$ of the system (3.17), (3.22).

4. Proof of the Results

In this section, we first prove Theorem 3.2, by considering the linearized equation associated to (3.17), that is
$$iA(v_0)\partial_tU + \Delta U = \sum_{k=1}^nB(v_0,v_k)\partial_kU + F(V), \tag{4.1}$$
where $V = (v_0, \bar v_0, \dots, \bar v_{n+1})^t \in \mathbf{X}^{2m}_\infty$. After having proved Theorem 3.2, we explain how one may deduce the proof of Theorem 2.2. The proofs of Theorem 3.1 and Theorem 2.1 follow exactly the same lines, so that they are omitted.

Proof of Theorem 3.2. We assume that
$$\sum_{j=0}^{n+1}(\|\varphi_j\|_{2m} + \|\varphi_j\|_{2(m-1),6/5}) \le \varepsilon_1 \quad\text{in (3.22)},$$
and we define, for $V \in \mathbf{X}^{2m}_\infty$ and $|||V|||_{\mathbf{X}^{2m}_\infty} \le \varepsilon_1$, the mapping $M$ by $U = MV$, where $U$ is the solution of Eq. (4.1) with the initial condition (3.22) given by Lemma A.2 in the appendix. We consider the closed ball with radius $\rho$ in $\mathbf{X}^{2m}_\infty$,
$$\mathbf{X}^{2m}_{\infty,\rho} = \{W \in \mathbf{X}^{2m}_\infty;\ |||W|||_{\mathbf{X}^{2m}_\infty} \le \rho\},$$
where $\rho \le \varepsilon_1$, and assume that $V \in \mathbf{X}^{2m}_{\infty,\rho}$. We will prove that $M$ maps $\mathbf{X}^{2m}_{\infty,\rho}$ into itself and is a contraction mapping for the norm of $\mathbf{X}^{2(m-1)}_\infty$ in order to use a fixed point argument. To simplify the computations, we only prove the case $m = 3$.
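We differentiate Eq. (4.1) with respect to $t$ three times to obtain
$$iA(v_0)\partial_tU_t + \Delta U_t = -iA_t(v_0)U_t + \sum_{k=1}^nB_t(v_0,v_k)\partial_kU + F_t(V) + \sum_{k=1}^nB(v_0,v_k)\partial_kU_t, \tag{4.2}$$
$$
\begin{aligned}
iA(v_0)\partial_tU_{tt} + \Delta U_{tt} ={}& -2iA_t(v_0)U_{tt} - iA_{tt}(v_0)U_t + 2\sum_{k=1}^nB_t(v_0,v_k)\partial_kU_t + \sum_{k=1}^nB_{tt}(v_0,v_k)\partial_kU\\
&+ F_{tt}(V) + \sum_{k=1}^nB(v_0,v_k)\partial_kU_{tt},
\end{aligned}
\tag{4.3}
$$
and
$$
\begin{aligned}
iA(v_0)\partial_tU_{ttt} + \Delta U_{ttt} ={}& -3iA_t(v_0)U_{ttt} - 3iA_{tt}(v_0)U_{tt} - iA_{ttt}(v_0)U_t + 3\sum_{k=1}^nB_{tt}(v_0,v_k)\partial_kU_t\\
&+ 3\sum_{k=1}^nB_t(v_0,v_k)\partial_kU_{tt} + \sum_{k=1}^nB_{ttt}(v_0,v_k)\partial_kU + F_{ttt}(V) + \sum_{k=1}^nB(v_0,v_k)\partial_kU_{ttt}.
\end{aligned}
\tag{4.4}
$$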
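The coefficients 1, 2, 1 and 1, 3, 3, 1 in (4.3) and (4.4) are those of the Leibniz rule applied to the products $A(v_0)\,\partial_tU$ and $B(v_0,v_k)\,\partial_kU$. The following small sketch, whose function name `leibniz_terms` is introduced here for illustration, lists the terms of the $p$-th time derivative of $A\,\partial_tU$ as triples (coefficient, number of $t$-derivatives on $A$, number of $t$-derivatives on $U$).

```python
from math import comb

def leibniz_terms(p):
    """Terms of d^p/dt^p [A * dU/dt] as (binomial coefficient, order on A, order on U)."""
    return [(comb(p, q), q, p - q + 1) for q in range(p + 1)]

for p in (1, 2, 3):
    print(p, leibniz_terms(p))
```

For $p = 3$ the term with no derivative on $A$ stays on the left-hand side of (4.4), and the other three, moved to the right-hand side, carry the coefficients 3, 3, 1 in front of $A_tU_{ttt}$, $A_{tt}U_{tt}$ and $A_{ttt}U_t$.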
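Let
$$J = \begin{pmatrix} 1 & 0 & \cdots & \cdots & 0\\ 0 & -1 & \ddots & & \vdots\\ \vdots & \ddots & \ddots & \ddots & \vdots\\ \vdots & & \ddots & 1 & 0\\ 0 & \cdots & \cdots & 0 & -1\end{pmatrix}. \tag{4.5}$$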
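Multiplying both sides of (4.2) by $(J\bar U_t)^t$ and integrating the imaginary part of the resulting expression over $\mathbb{R}^n$, one obtains (we recall that $\|\cdot\| = \|\cdot\|_{L^2}$)
$$
\begin{aligned}
&\frac{d}{dt}\Big[\|(2+|v_0|^2)^{1/2}U_t(t)\|^2 - \int_{\mathbb{R}^n}\mathrm{Re}\,(v_0^2\bar U_t^2)\,dx\Big]\\
&\quad\le C\big(\|v_0\|_{L^\infty}\|v_{0t}\|_{L^\infty}\|U_t(t)\|^2 + (1+\rho^2)\|V(t)\|^2_{L^\infty}\|\nabla U(t)\|\,\|U_t(t)\|\\
&\qquad\quad+ (1+\rho^2)\|V(t)\|^2_{1,\infty}\|U_t(t)\|(\rho + \|U_t(t)\|)\big).
\end{aligned}
\tag{4.6}
$$
In what follows, we assume that $\rho \le 1$. We then deduce from (4.6) that
$$
\begin{aligned}
&\frac{d}{dt}\Big[\|(2+|v_0|^2)^{1/2}U_t(t)\|^2 - \int_{\mathbb{R}^n}\mathrm{Re}\,(v_0^2\bar U_t^2)\,dx\Big]\\
&\quad\le C(\|V(t)\|^2_{2,6} + \|V_t(t)\|^2_{1,6})(\|\nabla U(t)\|^2 + \|U_t(t)\|^2 + \rho\|U_t(t)\|)\\
&\quad\le C(1+|t|)^{-\frac23n}|||V|||^2_{\mathbf{X}^6_\infty}|||U|||_{\mathbf{X}^6_\infty}(\rho + |||U|||_{\mathbf{X}^6_\infty})\\
&\quad\le C(1+|t|)^{-\frac23n}\rho^2|||U|||_{\mathbf{X}^6_\infty}(\rho + |||U|||_{\mathbf{X}^6_\infty}).
\end{aligned}
\tag{4.7}
$$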
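The quantity differentiated on the left of (4.6) and (4.7) is the quadratic form obtained by pairing $A(v_0)$ with $J$. On one $2\times2$ block, with $w = (u, \bar u)^t$, one has $(J\bar w)^tA(v)w = 2\big[(2+|v|^2)|u|^2 - \mathrm{Re}\,(v^2\bar u^2)\big]$, which is real and bounded below by $4|u|^2$. The sketch below checks this pointwise identity numerically; it assumes, as read off from (3.18), that each diagonal block of $A(v)$ is $2H^{-1}(v)$, and the function names are hypothetical.

```python
import random

def block_a(v):
    """One 2x2 diagonal block of A(v): the matrix 2 H^{-1}(v) from (3.18)."""
    m = abs(v) ** 2
    return [[2 + m, -v * v], [v.conjugate() ** 2, -(2 + m)]]

def quadratic_form(v, u):
    """(J conj(w))^t A(v) w for w = (u, conj(u)) and J = diag(1, -1)."""
    a = block_a(v)
    w = [u, u.conjugate()]
    j_conj_w = [u.conjugate(), -u]  # J applied to conj(w) = (conj(u), u)
    return sum(j_conj_w[i] * a[i][j] * w[j] for i in range(2) for j in range(2))

def energy_density(v, u):
    """(2 + |v|^2) |u|^2 - Re(v^2 conj(u)^2), the integrand on the left of (4.6)."""
    return (2 + abs(v) ** 2) * abs(u) ** 2 - (v * v * u.conjugate() ** 2).real

def rand_c():
    return complex(random.uniform(-2, 2), random.uniform(-2, 2))

random.seed(1)
pairs = [(rand_c(), rand_c()) for _ in range(200)]
worst_gap = max(abs(quadratic_form(v, u) - 2 * energy_density(v, u)) for v, u in pairs)
min_margin = min(energy_density(v, u) - 2 * abs(u) ** 2 for v, u in pairs)
print(worst_gap, min_margin)
```

The lower bound `energy_density >= 2|u|^2` is what lets the differentiated quantity control $\|U_t(t)\|^2$.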
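In the same way as in the proof of (4.7), we obtain by taking the inner product of (4.3) with $J\bar U_{tt}$,
$$
\begin{aligned}
&\frac{d}{dt}\Big[\|(2+|v_0|^2)^{1/2}U_{tt}(t)\|^2 - \int_{\mathbb{R}^n}\mathrm{Re}\,(v_0^2\bar U_{tt}^2)\,dx\Big]\\
&\quad\le C(\|V(t)\|^2_{2,6} + \|V_t(t)\|^2_{L^\infty})(\|\nabla U_t(t)\|^2 + \|U_{tt}(t)\|^2)\\
&\qquad+ C(\|V(t)\|_{L^\infty}\|U_t(t)\|_{L^\infty}\|V_{tt}(t)\|_2\|U_{tt}(t)\| + \|V_t(t)\|^2_{L^6}\|U_t(t)\|_{L^6}\|U_{tt}(t)\|)\\
&\qquad+ C(\|V(t)\|_{L^\infty}\|\nabla U(t)\|_{L^\infty}\|V_{tt}(t)\| + \|U_{tt}(t)\| + \|V_t(t)\|^2_{L^6}\|\nabla U(t)\|_{L^6}\|U_{tt}(t)\|)\\
&\qquad+ C\|V(t)\|_{L^\infty}(\|V(t)\|_{L^\infty} + \|V_t(t)\|_{L^\infty})(\|V_t(t)\| + \|V_{tt}(t)\|)\|U_{tt}(t)\|\\
&\quad\le C(1+|t|)^{-\frac23n}|||V|||^2_{\mathbf{X}^6_\infty}|||U|||_{\mathbf{X}^6_\infty}(|||V|||_{\mathbf{X}^6_\infty} + |||U|||_{\mathbf{X}^6_\infty})\\
&\quad\le C(1+|t|)^{-\frac23n}\rho^2|||U|||_{\mathbf{X}^6_\infty}(\rho + |||U|||_{\mathbf{X}^6_\infty}).
\end{aligned}
\tag{4.8}
$$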
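In the same way as in the proof of (4.8), we have by (4.4),
$$\frac{d}{dt}\Big[\|(2+|v_0|^2)^{1/2}U_{ttt}(t)\|^2 - \int_{\mathbb{R}^n}\mathrm{Re}\,(v_0^2\bar U_{ttt}^2)\,dx\Big] \le C(1+|t|)^{-\frac23n}\rho^2|||U|||_{\mathbf{X}^6_\infty}(\rho + |||U|||_{\mathbf{X}^6_\infty}). \tag{4.9}$$
Collecting (4.7), (4.8), (4.9) and the estimate obtained in the same way using (4.1), we get
$$\frac{d}{dt}\sum_{j=0}^3\Big[\|(2+|v_0|^2)^{1/2}\partial_t^jU(t)\|^2 - \int_{\mathbb{R}^n}\mathrm{Re}\,\big(v_0^2(\partial_t^j\bar U)^2\big)\,dx\Big] \le C(1+|t|)^{-\frac23n}\rho^2|||U|||_{\mathbf{X}^6_\infty}(\rho + |||U|||_{\mathbf{X}^6_\infty}) \tag{4.10}$$
which, once integrated in $t$ gives, since $n \ge 2$,
$$\sum_{j=0}^3\|\partial_t^jU(t)\|^2 \le C\big(\|U_0\|_6^2 + \rho^2|||U|||_{\mathbf{X}^6_\infty}(\rho + |||U|||_{\mathbf{X}^6_\infty})\big). \tag{4.11}$$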
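By (4.1), (4.2), (4.3) and Sobolev inequalities,
$$\|\Delta U\| \le C(1+\rho^2)(\|\partial_tU\| + \|\nabla U\|) + C\rho^3, \tag{4.12}$$
$$\|\Delta U_t\| \le C(1+\rho^2)(\|\partial_t^2U\| + \|\partial_tU\| + \|\nabla U\| + \|\nabla\partial_tU\|) + C\rho^3, \tag{4.13}$$
$$\|\Delta U_{tt}\| \le C(1+\rho^2)\Big(\sum_{j=1}^3\|\partial_t^jU\| + \sum_{j=0}^2\|\nabla\partial_t^jU\|\Big) + C\rho^3. \tag{4.14}$$
Since, for $j = 0, 1, 2$,
$$\|\nabla\partial_t^jU\| \le \frac12\|\Delta\partial_t^jU\| + C\|\partial_t^jU\|,$$
we get by (4.11),
$$\sum_{j=0}^2\|\Delta\partial_t^jU\| \le C(1+\rho^2)(\|U_0\|_6 + \rho|||U|||_{\mathbf{X}^6_\infty} + \rho^2). \tag{4.15}$$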
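Applying the operator $\Delta$ to (4.1), we get
$$\Delta^2U = -iA(v_0)\partial_t\Delta U - 2i\nabla A.\nabla\partial_tU - \Delta A\,\partial_tU + \sum_{k=1}^n(\Delta B\,\partial_kU + 2\nabla B.\nabla\partial_kU + B\Delta\partial_kU) + \Delta F(V), \tag{4.16}$$
from which it follows
$$\|\Delta^2U\| \le C(1+\rho^2)(\|\partial_t\Delta U\| + \|\nabla\partial_tU\| + \|\partial_tU\| + \|\nabla U\| + \|\Delta U\| + \|\nabla\Delta U\|).$$
Hence by (4.15),
$$\|\Delta^2U\| \le C(1+\rho^2)(\|U_0\|_6 + \rho|||U|||_{\mathbf{X}^6_\infty} + \rho^2) \tag{4.17}$$
(where we have assumed that $\rho \le 1$). In the same way as in the proof of (4.17) we have
$$\|\Delta^3U\| \le C(\|U_0\|_6 + \rho|||U|||_{\mathbf{X}^6_\infty} + \rho^2) \tag{4.18}$$
and
$$\|\Delta^2\partial_tU\| \le C(\|U_0\|_6 + \rho|||U|||_{\mathbf{X}^6_\infty} + \rho^2). \tag{4.19}$$
From (4.11), (4.15)–(4.19) it follows that
$$\sup_{t\in\mathbb{R}}\sum_{j=0}^3\|\partial_t^jU(t)\|_{2(3-j)} \le C(\|U_0\|_6 + \rho|||U|||_{\mathbf{X}^6_\infty} + \rho^2). \tag{4.20}$$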
We now consider the integral version of Eq. (4.1): let $S(t) = e^{\frac{it}{2}\Delta}$ be the unitary group associated with the linear Schrödinger equation, that is $u(t) = S(t)u_0$ is the solution of
$$\begin{cases} iu_t + \frac12\Delta u = 0\\ u(0) = u_0,\end{cases}$$
and define
$$\mathbf{S}(t) = \begin{pmatrix} S(t) & 0 & \cdots & \cdots & 0\\ 0 & S(-t) & \ddots & & \vdots\\ \vdots & \ddots & \ddots & \ddots & \vdots\\ \vdots & & \ddots & S(t) & 0\\ 0 & \cdots & \cdots & 0 & S(-t)\end{pmatrix}.$$
Equation (4.1) is then equivalent to the integral equation
$$
\begin{aligned}
U(t) ={}& \mathbf{S}(t)U_0 + \int_0^t\mathbf{S}(t-s)(2J - A(v_0(s)))U_t(s)\,ds\\
&- i\sum_{k=1}^n\int_0^t\mathbf{S}(t-s)B(v_0(s),v_k(s))\partial_kU(s)\,ds - i\int_0^t\mathbf{S}(t-s)F(V(s))\,ds,
\end{aligned}
\tag{4.21}
$$
where $J$ is the $(2n+4)\times(2n+4)$ matrix defined by (4.5). Using the well-known $L^6$–$L^{6/5}$ time decay estimate
$$\|S(t)u_0\|_{L^6} \le (4\pi|t|)^{-n/3}\|u_0\|_{L^{6/5}}, \tag{4.22}$$
we get from (4.21),
$$
\begin{aligned}
\|U(t)\|_{L^6} \le{}& C(1+|t|)^{-n/3}\|U_0\|_{L^{6/5}} + C\int_0^t(1+|t-s|)^{-n/3}\|V(s)\|^2_{L^6}(\|U_t(s)\| + \|\nabla U(s)\| + \|V(s)\|)\,ds\\
\le{}& C(1+|t|)^{-n/3}(\|U_0\|_{4,6/5} + \|U_0\|_6) + C\rho^2(\rho + |||U|||_{\mathbf{X}^6_\infty})\int_0^t(1+|t-s|)^{-n/3}(1+|s|)^{-2n/3}\,ds.
\end{aligned}
\tag{4.23}
$$
In order to get estimates on $\|U_t(t)\|_{L^6}$ and $\|U_{tt}(t)\|_{L^6}$, we use the integral equations equivalent to (4.2) and (4.3), namely
$$
\begin{aligned}
U_t(t) ={}& \mathbf{S}(t)U_t(0) + \int_0^t\mathbf{S}(t-s)(2J - A(v_0(s)))U_{tt}(s)\,ds - \int_0^t\mathbf{S}(t-s)A_t(v_0(s))U_t(s)\,ds\\
&- i\sum_{k=1}^n\int_0^t\mathbf{S}(t-s)[B_t(v_0(s),v_k(s))\partial_kU(s) + B(v_0(s),v_k(s))\partial_kU_t(s)]\,ds\\
&- i\int_0^t\mathbf{S}(t-s)F_t(V(s))\,ds
\end{aligned}
\tag{4.24}
$$
and
$$
\begin{aligned}
U_{tt}(t) ={}& \mathbf{S}(t)U_{tt}(0) + \int_0^t\mathbf{S}(t-s)(2J - A(v_0(s)))U_{ttt}(s)\,ds\\
&- 2\int_0^t\mathbf{S}(t-s)A_t(v_0(s))U_{tt}(s)\,ds - \int_0^t\mathbf{S}(t-s)A_{tt}(v_0(s))U_t(s)\,ds\\
&- i\sum_{k=1}^n\int_0^t\mathbf{S}(t-s)[2B_t(v_0(s),v_k(s))\partial_kU_t(s) + B_{tt}(v_0(s),v_k(s))\partial_kU(s) + B(v_0(s),v_k(s))\partial_kU_{tt}(s)]\,ds\\
&- i\int_0^t\mathbf{S}(t-s)F_{tt}(V(s))\,ds.
\end{aligned}
\tag{4.25}
$$
As we obtained (4.23), we get from these two equations, using the time decay estimate (4.22),
$$
\begin{aligned}
\|U_t(t)\|_{L^6} \le{}& C(1+|t|)^{-n/3}\|U_t(0)\|_{L^{6/5}} + C\int_0^t(1+|t-s|)^{-n/3}(\|V(s)\|^2_{L^6} + \|V_t(s)\|^2_{L^6})\\
&\qquad\times(\|U_{tt}(s)\| + \|U_t(s)\| + \|\nabla U(s)\| + \|\nabla U_t(s)\| + \|V(s)\|)\,ds.
\end{aligned}
$$
$U_t(0)$ is estimated by using Eq. (4.1) again, i.e.
$$U_t(0) = i(A(v_0))^{-1}\Big[\Delta U - \sum_{k=1}^nB(v_0,v_k)\partial_kU - F(V)\Big],$$
and we get
$$
\begin{aligned}
\|U_t(0)\|_{L^{6/5}} &\le C(1+\rho^2)(\|\Delta U_0\|_{L^{6/5}} + \rho^2\|\partial_kU_0\|_{L^{6/5}} + \|V(0)\|^2_{L^6}\|V(0)\|)\\
&\le C(\rho + \|U_0\|_{4,6/5} + \|U_0\|_6).
\end{aligned}
\tag{4.26}
$$
Hence, we infer from (4.26) that
$$\|U_t(t)\|_{L^6} \le C(1+|t|)^{-n/3}(\|U_0\|_{4,6/5} + \|U_0\|_6) + C\rho^2(\rho + |||U|||_{\mathbf{X}^6_\infty})\int_0^t(1+|t-s|)^{-n/3}(1+|s|)^{-2n/3}\,ds. \tag{4.27}$$
In exactly the same way, using (4.25) we obtain
$$
\begin{aligned}
\|U_{tt}(t)\|_{L^6} \le{}& C(1+|t|)^{-n/3}\|U_{tt}(0)\|_{L^{6/5}} + C\int_0^t(1+|t-s|)^{-n/3}(\|V(s)\|^2_{L^6} + \|V_t(s)\|^2_{L^6} + \|V_{tt}(s)\|^2_{L^6})\\
&\qquad\times(\|U_{ttt}(s)\| + \|U_{tt}(s)\|_1 + \|U_t(s)\|_1 + \|U(s)\|_1)\,ds.
\end{aligned}
$$
We estimate $U_{tt}(0)$ with the help of Eq. (4.3),
$$\|U_{tt}(0)\|_{L^{6/5}} \le C(1+\rho^2)\big[\|\Delta U_t(0)\|_{L^{6/5}} + \rho^2(1+\rho^2)\|U_t(0)\|_{L^{6/5}} + \rho^2\|\nabla U_0\|_{L^{6/5}} + \rho^2\|V_t(0)\|\big],$$
and since from (4.16),
$$
\begin{aligned}
\|\Delta U_t(0)\|_{L^{6/5}} &\le C(1+\rho^2)\big[\rho\|\nabla U_t(0)\|_{L^{6/5}} + \rho\|\partial_tU(0)\|_{L^{6/5}} + (1+\rho)\|U_0\|_{4,6/5}\big]\\
&\le \frac12\|\Delta U_t(0)\|_{L^{6/5}} + C(\|U_0\|_{4,6/5} + \|\partial_tU(0)\|_{L^{6/5}}),
\end{aligned}
$$
we finally get
$$\|U_{tt}(t)\|_{L^6} \le C(1+|t|)^{-n/3}(\|U_0\|_{4,6/5} + \|U_0\|_6) + C\rho^2(\rho + |||U|||_{\mathbf{X}^6_\infty})\int_0^t(1+|t-s|)^{-n/3}(1+|s|)^{-2n/3}\,ds. \tag{4.28}$$
It remains to estimate $\|\Delta^2U(t)\|_{L^6} + \|\Delta U_t(t)\|_{L^6}$; this is done in a similar fashion by applying the operators $\Delta^2$ and $\Delta$ respectively to Eqs. (4.21) and (4.24), and the estimate reads
$$\|\Delta^2U(t)\|_{L^6} + \|\Delta U_t(t)\|_{L^6} \le C(1+|t|)^{-n/3}(\|U_0\|_{4,6/5} + \|U_0\|_6) + C\rho^2(\rho + |||U|||_{\mathbf{X}^6_\infty})\int_0^t(1+|t-s|)^{-n/3}(1+|s|)^{-2n/3}\,ds. \tag{4.29}$$
Collecting (4.23), (4.27), (4.28) and (4.29), we have
$$
\begin{aligned}
\sum_{j=0}^2(1+|t|)^{n/3}\|\partial_t^jU(t)\|_{4-2j,6} \le{}& C(\|U_0\|_{4,6/5} + \|U_0\|_6)\\
&+ C\rho^2(1+|t|)^{n/3}(\rho + |||U|||_{\mathbf{X}^6_\infty})\int_0^t(1+|t-s|)^{-n/3}(1+|s|)^{-2n/3}\,ds\\
\le{}& C(\|U_0\|_{4,6/5} + \|U_0\|_6) + C\rho^2(\rho + |||U|||_{\mathbf{X}^6_\infty}).
\end{aligned}
\tag{4.30}
$$
Hence we finally get from (4.20) and (4.30),
$$|||U|||_{\mathbf{X}^6_\infty} \le C(\|U_0\|_{4,6/5} + \|U_0\|_6 + \rho|||U|||_{\mathbf{X}^6_\infty} + \rho^2). \tag{4.31}$$
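The last inequality in (4.30) uses the fact that $(1+|t|)^{n/3}\int_0^t(1+|t-s|)^{-n/3}(1+|s|)^{-2n/3}\,ds$ stays bounded in $t$ when $n = 2$ or $3$. A rough numerical illustration for $t \ge 0$ follows; it is only a sketch using a midpoint rule, and the function names and the sample values of $t$ are choices made here.

```python
def decay_integral(t, n, steps=20000):
    """Midpoint-rule approximation of int_0^t (1+t-s)^(-n/3) (1+s)^(-2n/3) ds for t >= 0."""
    if t == 0:
        return 0.0
    h = t / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        total += (1 + t - s) ** (-n / 3) * (1 + s) ** (-2 * n / 3)
    return total * h

def weighted(t, n):
    """The integral multiplied by the weight (1+t)^(n/3) appearing in (4.30)."""
    return (1 + t) ** (n / 3) * decay_integral(t, n)

values = {n: [weighted(t, n) for t in (1, 10, 100, 1000)] for n in (2, 3)}
for n, vals in values.items():
    print(n, vals)
```

Splitting the integral at $s = t/2$ gives the same conclusion by hand: on $[0, t/2]$ the first factor is at most $(1+t/2)^{-n/3}$ and $(1+s)^{-2n/3}$ is integrable because $2n/3 > 1$, which is where $n \ge 2$ enters.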
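Using the same computations, one can show that if $V_1, V_2 \in \mathbf{X}^6_{\infty,\rho}$, with $\rho \le \inf(1, \varepsilon_1)$, and if we set $U_1 = MV_1$ and $U_2 = MV_2$, then
$$|||U_1 - U_2|||_{\mathbf{X}^4_\infty} \le C(\rho + |||U_1|||_{\mathbf{X}^6_\infty} + |||U_2|||_{\mathbf{X}^6_\infty})(|||U_1 - U_2|||_{\mathbf{X}^4_\infty} + |||V_1 - V_2|||_{\mathbf{X}^4_\infty}). \tag{4.32}$$
Note that we cannot estimate the $\mathbf{X}^6_\infty$-norm of $U_1 - U_2$, due to the presence of the term $A(v_0)$ in Eq. (4.1). Now, we choose $\rho$ in order that
$$C\rho \le \frac19,$$
where $C$ is the maximum of the constants appearing in (4.31) and (4.32), and assume that
$$\|U_0\|_{4,6/5} + \|U_0\|_6 \le \frac{7\rho}{9C}.$$
We then infer from (4.31) and (4.32) that $M$ maps $\mathbf{X}^6_{\infty,\rho}$ into itself and is a contraction mapping for the norm of $\mathbf{X}^4_\infty$. This completes the proof of Theorem 3.2.

Proof of Theorem 2.2. Let $a_0 \in H^{2m}(\mathbb{R}^n) \cap W^{2(m-1),6/5}(\mathbb{R}^n)$, $m \ge 4$; we set $\varphi_0 = a_0$, $\varphi_j = (1+|a_0|^2)^{-1/4}\partial_ja_0$, for $j = 1, \dots, n$, and $\varphi_{n+1} = (1+|a_0|^2)^{-1/4}a_{n+1}$, where
$$\begin{pmatrix}a_{n+1}\\ \bar a_{n+1}\end{pmatrix} = \frac{i}{4(1+|a_0|^2)}\begin{pmatrix} 2+|a_0|^2 & -a_0^2\\ \bar a_0^2 & -(2+|a_0|^2)\end{pmatrix}\Delta\begin{pmatrix}a_0\\ \bar a_0\end{pmatrix} + \begin{pmatrix} \dfrac{a_0}{1+|a_0|^2}\Big(|\nabla a_0|^2 - \dfrac14\dfrac{(\nabla|a_0|^2)^2}{1+|a_0|^2}\Big) + \Big(\dfrac{1}{\sqrt{1+|a_0|^2}} - 1\Big)a_0\\[3mm] \dfrac{-\bar a_0}{1+|a_0|^2}\Big(|\nabla a_0|^2 - \dfrac14\dfrac{(\nabla|a_0|^2)^2}{1+|a_0|^2}\Big) + \Big(\dfrac{1}{\sqrt{1+|a_0|^2}} - 1\Big)\bar a_0\end{pmatrix}.$$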
We have $U_0 = (\varphi_0, \bar\varphi_0, \dots, \bar\varphi_{n+1})^t \in (H^{2(m-1)})^{2n+4}$, and assuming that $\|a_0\|_{2m} + \|a_0\|_{2(m-1),6/5}$ is sufficiently small, we have
$$\sum_{j=0}^{n+1}(\|\varphi_j\|_{2(m-1)} + \|\varphi_j\|_{2(m-2),6/5}) \le \varepsilon_0',$$
where $\varepsilon_0'$ is the smallness condition of Theorem 3.2 in $H^{2(m-1)}(\mathbb{R}^n) \cap W^{2(m-2),6/5}(\mathbb{R}^n)$. Hence, by Theorem 3.2, there is a unique global solution $U(t) = (u_0(t), \bar u_0(t), \dots, u_{n+1}(t), \bar u_{n+1}(t))^t \in \mathbf{X}^{2(m-1)}_\infty$ of (3.17) with $U(0) = U_0$. We then set $a(t) = u_0(t)$, and
$$V(t) = \begin{pmatrix} a(t)\\ \bar a(t)\\ (1+|a|^2)^{-1/4}\nabla a(t)\\ (1+|a|^2)^{-1/4}\nabla\bar a(t)\\ (1+|a|^2)^{-1/4}\partial_ta(t)\\ (1+|a|^2)^{-1/4}\partial_t\bar a(t)\end{pmatrix};$$
we will prove that $a(t)$ is the unique solution of Eq. (1.4) by proving that $V$ is identically equal to $U$. By the way we constructed system (3.17), $V$ necessarily satisfies (3.17) with $U$, $u_k$ replaced by $V$, $v_k$ (note that $u_0(t) = a(t) = v_0(t)$) and we have $V(0) = U(0)$; furthermore, $V \in \mathbf{X}^{2(m-2)}_\infty$. $U - V$ thus satisfies
$$iA(u_0)\partial_t(U - V) + \Delta(U - V) = \sum_{k=1}^n[B(u_0,u_k) - B(u_0,v_k)]\partial_kU + \sum_{k=1}^nB(u_0,v_k)\partial_k(U - V) + F(U) - F(V). \tag{4.33}$$
Multiplying (4.33) by $[J(\bar U - \bar V)]^t$ and integrating the imaginary part of the resulting expression over $\mathbb{R}^n$, we get
$$
\begin{aligned}
&\frac{d}{dt}\Big[\|(2+|u_0|^2)^{1/2}(U(t) - V(t))\|^2 - \int_{\mathbb{R}^n}\mathrm{Re}\,\big(u_0^2[\bar U(t) - \bar V(t)]^2\big)\,dx\Big]\\
&\quad\le C\|u_0\|_{L^\infty}\|\partial_tu_0\|_{L^\infty}\|U - V\|^2 + C\|u_0\|_{L^\infty}(1 + \|u_0\|_{L^\infty})\sum_{k=1}^n\|u_k - v_k\|\,\|\partial_kU\|_{L^\infty}\|U - V\|\\
&\qquad+ \sum_{k=1}^n\mathrm{Im}\int_{\mathbb{R}^n}(\bar U - \bar V)^tJB(u_0,v_k)\partial_k(U - V)\,dx + C(\|U\|_{L^\infty},\|V\|_{L^\infty})\|U - V\|^2\\
&\quad\le C(|||U|||_{\mathbf{X}^{2(m-1)}_\infty}, |||V|||_{\mathbf{X}^{2(m-2)}_\infty})\|U - V\|^2 + C\sum_{k=1}^n\|\partial_kJB(u_0,v_k)\|_{L^\infty}\|U - V\|^2.
\end{aligned}
$$
But $v_k = (1+|a|^2)^{-1/4}\partial_ka$ and $a = u_0 \in X^{2(m-1)}_\infty \subset C(\mathbb{R}, H^4(\mathbb{R}^n))$, so that
$$\|\partial_kJB(u_0,v_k)\|_{L^\infty} \le C(\|u_0\|_{L^\infty}, \|\nabla u_0\|_{L^\infty}, \|v_k\|_{L^\infty}, \|\nabla v_k\|_{L^\infty}) \le C(|||U|||_{\mathbf{X}^{2(m-1)}_\infty})$$
and $|||V|||_{\mathbf{X}^{2(m-2)}_\infty} \le |||U|||_{\mathbf{X}^{2(m-1)}_\infty}$. Hence, we have
$$\frac{d}{dt}\Big[\|(2+|u_0|^2)^{1/2}(U(t) - V(t))\|^2 - \int_{\mathbb{R}^n}\mathrm{Re}\,\big(u_0^2[\bar U(t) - \bar V(t)]^2\big)\,dx\Big] \le C(|||U|||_{\mathbf{X}^{2(m-1)}_\infty})\|U - V\|^2,$$
and since $U(0) = V(0)$, this shows that $U = V$. Furthermore, since $U = V \in \mathbf{X}^{2(m-1)}_\infty$, we get $\Delta a \in X^{2(m-1)}_\infty$ by using Eq. (3.17), and since $\partial_ta \in X^{2(m-1)}_\infty$, we finally get $a \in X^{2m}_\infty$. Due to the regularity of $a$, the conservation laws (2.1) and (2.2) are obtained in a very classical way: to get (2.1), we multiply Eq. (1.4) by $\bar a$, take the imaginary part and integrate over $\mathbb{R}^n$; to get (2.2), we multiply Eq. (1.4) by $\bar a_t$, take the real part and integrate over $\mathbb{R}^n$. This completes the proof of Theorem 2.2.

Proof of Theorem 2.3. In order to prove Theorem 2.3, i.e. to remove the smallness condition on $a$ in Theorem 2.1 when $n = 1$, we also use the transformations of Sect. 3 and Eq. (3.17), and we will prove that when $n = 1$, Eq. (3.17) has solutions in $(H^{2m}(\mathbb{R}))^6$ for $m \ge 2$, and for some $T > 0$, by using the space $Y^m_T$. More precisely, let $U_0 = (\varphi_0, \bar\varphi_0, \varphi_1, \bar\varphi_1, \varphi_2, \bar\varphi_2)^t \in (H^{2m}(\mathbb{R}))^6$; we first briefly show that for some $T > 0$, (3.17) has a unique solution $U \in \mathbf{Y}^m_T$ with $U(0) = U_0$, after which we prove that $U \in C([0,T]; (H^{2m}(\mathbb{R}))^6)$. For simplicity, we only treat the case $m = 2$. As in the proof of Theorem 3.2, we consider Eq. (4.1) and set for $V \in \mathbf{Y}^2_\infty$, $U = MV$, the solution of (4.1) with $U(0) = U_0$ given by Lemma A.1 in the appendix. We then apply $J\partial_x^k$ to Eq. (4.1), where $J$ is defined by (4.5), take the inner product with $\partial_x^k\bar U$, and integrate the imaginary part of the resulting expression over $\mathbb{R}$. Doing this for $k = 0, 1, 2$, and assuming that $|||V|||_{\mathbf{Y}^2_T} \le \rho$, we easily get
$$\frac{d}{dt}\Big[\|(2+|v_0|^2)^{1/2}U(t)\|^2 - \int_{\mathbb{R}}\mathrm{Re}\,(v_0^2\bar U(t)^2)\,dx\Big] \le C(\rho)|||U|||_{\mathbf{Y}^2_T}(\rho + |||U|||_{\mathbf{Y}^2_T}).$$
Performing the same computations for $k = 0, 1$ on Eq. (4.2), and for $k = 0$ on Eq. (4.3), we obtain
$$\frac{d}{dt}\sum_{j=0}^2\sum_{k=0}^{2-j}\Big[\|(2+|v_0|^2)^{1/2}\partial_t^j\partial_x^kU(t)\|^2 - \int_{\mathbb{R}}\mathrm{Re}\,\big(v_0^2(\partial_t^j\partial_x^k\bar U(t))^2\big)\,dx\Big] \le C(\rho)|||U|||_{\mathbf{Y}^2_T}(\rho + |||U|||_{\mathbf{Y}^2_T}). \tag{4.34}$$
In order to integrate (4.34) in $t$, we impose $V(0) = U_0$ (this will cause no trouble in the fixed point procedure), define $U_1 = (\psi_0, \bar\psi_0, \psi_1, \bar\psi_1, \psi_2, \bar\psi_2)^t$ by
$$iA(\varphi_0)U_1 + \Delta U_0 = B(\varphi_0,\varphi_1)\partial_xU_0 + F(U_0) \tag{4.35}$$
so that $\partial_tU(0) = U_1$, and define $U_2$ by
$$iA(\varphi_0)U_2 + \Delta U_1 = -iA'(\varphi_0).\psi_0U_1 + B'(\varphi_0,\varphi_1).(\psi_0,\psi_1)\partial_xU_0 + B(\varphi_0,\varphi_1)\partial_xU_1 + F'(U_0)U_1, \tag{4.36}$$
so that $\partial_t^2U(0) = U_2$. We clearly have $\|U_1\|_1 \le C(\|U_0\|_4)$ and $\|U_2\| \le C(\|U_0\|_4)$. Hence, (4.34) can be integrated over $[0,T]$ to give
$$|||U|||_{\mathbf{Y}^2_T} \le C_1(\|U_0\|_4) + C_2(\rho)T(1 + |||U|||_{\mathbf{Y}^2_T}). \tag{4.37}$$
In the same way as we obtained (4.34), we can check that if $U = MV$, $U' = MV'$ and $U(0) = V(0) = U'(0) = V'(0) = U_0$, then
$$
\begin{aligned}
&\frac{d}{dt}\Big[\|(2+|u_0|^2)^{1/2}(U(t) - U'(t))\|^2 - \int_{\mathbb{R}}\mathrm{Re}\,(u_0^2\bar U(t)^2)\,dx\Big]\\
&\quad+ \frac{d}{dt}\Big[\|(2+|u_0|^2)^{1/2}\partial_x(U(t) - U'(t))\|^2 - \int_{\mathbb{R}}\mathrm{Re}\,(u_0^2\partial_x\bar U(t)^2)\,dx\Big]\\
&\quad+ \frac{d}{dt}\Big[\|(2+|u_0|^2)^{1/2}\partial_t(U(t) - U'(t))\|^2 - \int_{\mathbb{R}}\mathrm{Re}\,(u_0^2\partial_t\bar U(t)^2)\,dx\Big]\\
&\qquad\le C(\rho)|||U - U'|||_{\mathbf{Y}^1_T}(|||V - V'|||_{\mathbf{Y}^1_T} + |||U - U'|||_{\mathbf{Y}^1_T}),
\end{aligned}
$$
which can be integrated over $[0,T]$ to give
$$|||U - U'|||_{\mathbf{Y}^1_T} \le C_3(\rho)T(|||V - V'|||_{\mathbf{Y}^1_T} + |||U - U'|||_{\mathbf{Y}^1_T}). \tag{4.38}$$
Now, choosing $\rho$ in order that
$$C_1(\|U_0\|_4) \le \frac{\rho^2}{3},$$
and then choosing $T$ small enough so that
$$C_3(\rho)T \le \inf\Big(\frac13, \frac{\rho^2}{3}\Big),$$
we have
$$|||U|||_{\mathbf{Y}^2_T} \le \rho,\qquad\text{and}\qquad |||U - U'|||_{\mathbf{Y}^1_T} \le \frac12|||V - V'|||_{\mathbf{Y}^1_T}, \tag{4.39}$$
showing that there is a unique solution $U \in \mathbf{Y}^2_T$ of (3.17) such that $U(0) = U_0$. Now, since $\partial_t^2U \in C([0,T]; (L^2(\mathbb{R}))^6)$ and $\partial_tU \in C([0,T]; (H^1(\mathbb{R}))^6)$, it follows from Eq. (4.2) with $V = U$, that $U_t \in C([0,T]; (H^2(\mathbb{R}))^6)$, and then it follows from Eq. (3.17) that $U \in C([0,T]; (H^4(\mathbb{R}))^6)$. In the general case $m \ge 2$, which is treated in the same way, we would recover $U \in C([0,T]; (H^{2m}(\mathbb{R}))^6)$. Now, let $a_0 \in H^m(\mathbb{R})$ with $m \ge 6$. Setting
$$U_0 = \begin{pmatrix} a_0\\ \bar a_0\\ (1+|a_0|^2)^{-1/4}\partial_xa_0\\ (1+|a_0|^2)^{-1/4}\partial_x\bar a_0\\ a_2\\ \bar a_2\end{pmatrix} \in \big(H^{m-2}(\mathbb{R})\big)^6,$$
where
$$\begin{pmatrix}a_2\\ \bar a_2\end{pmatrix} = \frac{i}{4(1+|a_0|^2)}\begin{pmatrix} 2+|a_0|^2 & -a_0^2\\ \bar a_0^2 & -(2+|a_0|^2)\end{pmatrix}\Delta\begin{pmatrix}a_0\\ \bar a_0\end{pmatrix} + \begin{pmatrix} \dfrac{a_0}{1+|a_0|^2}\Big(|\nabla a_0|^2 - \dfrac14\dfrac{(\nabla|a_0|^2)^2}{1+|a_0|^2}\Big) + \Big(\dfrac{1}{\sqrt{1+|a_0|^2}} - 1\Big)a_0\\[3mm] \dfrac{-\bar a_0}{1+|a_0|^2}\Big(|\nabla a_0|^2 - \dfrac14\dfrac{(\nabla|a_0|^2)^2}{1+|a_0|^2}\Big) + \Big(\dfrac{1}{\sqrt{1+|a_0|^2}} - 1\Big)\bar a_0\end{pmatrix},$$
we show as in the proof of Theorem 2.2 that the solution $U(t) \in \mathbf{Y}^{m-2}_T$ of (3.17) with $U(0) = U_0$ has the form
$$U(t) = \begin{pmatrix} a\\ \bar a\\ \gamma^{-1/2}\nabla a\\ \gamma^{-1/2}\nabla\bar a\\ \gamma^{-1/2}\partial_ta\\ \gamma^{-1/2}\partial_t\bar a\end{pmatrix}\qquad\text{with}\qquad \gamma = \sqrt{1+|a|^2};$$
this shows again that $a$ is the unique solution of (1.4) and that $a \in C([0,T]; H^m(\mathbb{R})) \cap C^1([0,T], H^{m-2}(\mathbb{R}))$, which completes the proof of Theorem 2.3.
5. Final Remarks and Open Problems

Remark 5.1. It was pointed out by J. Ginibre that all the results in the present paper are still valid for the nonlinear Schrödinger equation associated with the energy
$$E(u) = \frac12\int_{\mathbb{R}^n}|\nabla u|^2 - |\nabla g|^2\,dx,$$
where $g = g(|u|^2)$ is any function defined on $\mathbb{R}^+$ satisfying $1 - 4\rho(g'(\rho))^2 > 0$ for $\rho \ge 0$. The associated NLS equation is
$$i\partial_tu + \frac12\Delta u = ug'(|u|^2)\Delta(g(|u|^2)).$$
Note that $E(u)$ generates the harmonic maps from $\mathbb{R}^n$ with euclidean metric to $\mathbb{C}$ with the metric
$$du^2 = (1 - 4r^2g'^2(r^2))\,dr^2 + r^2d\theta^2.$$
In this case, after using the same computations as those leading to (3.4) and (3.7), the real part of the diagonal coefficients of the matrix corresponding to the highest order terms in (3.4) and (3.7) is equal to
$$-\frac14(1 - 4|u|^2g'^2)^{-1}\nabla(1 - 4|u|^2g'^2) = \nabla h$$
with $h(u) = \ln(1 - 4|u|^2g'^2)^{-1/4}$. Due to the gradient form of $h$, the use of a gauge transform (multiplying both sides of the equations by $e^{-h} = (1 - 4|u|^2g'^2)^{1/4}$) still allows one to get rid of the obstructing terms. All the estimates of the paper work in the same way.
Throughout this paper, we were mainly concerned with the existence of small solutions. No result seems to be known so far on the local well posedness of the Cauchy problem (1.4) for arbitrary data in dimension 2 or 3. A fortiori, no global in time result is known! However, the following observation due to Chen and Sudan [5] shows that a solution with negative energy cannot disperse as t → +∞. This has been observed in the numerical simulations of [4].
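Proposition 5.1 [5]. Let $a \in C(\mathbb{R}^+; H^2(\mathbb{R}^n))$, $n = 1, 2$ or $3$, be a solution of (1.4) with $E(0) < 0$. Then $a$ cannot disperse in the sense that
$$\|a(\cdot,t)\|_{L^\infty(\mathbb{R}^n)} \ge \frac{2|E(0)|^{1/2}}{\|a_0\|_{L^2}}. \tag{5.1}$$
Proof. By the computations of [5], one has
$$|a|^4 = (\gamma - 1)^2(\gamma + 1)^2 \ge 4(\gamma - 1)^2,$$
and
$$\|a\|^2_{L^\infty(\mathbb{R}^n)}\int_{\mathbb{R}^n}|a|^2 \ge \int_{\mathbb{R}^n}|a|^4 \ge 4\int_{\mathbb{R}^n}(\gamma - 1)^2.$$
Now,
$$|\nabla a|^2 - |\nabla\gamma|^2 \ge |\nabla|a||^2 - |\nabla\gamma|^2 = \Big(1 - \frac{|a|^2}{1+|a|^2}\Big)|\nabla|a||^2 \ge 0.$$
Therefore, for $E(0) < 0$, one has by (2.2),
$$\|a(\cdot,t)\|^2_{L^\infty(\mathbb{R}^n)} \ge \frac{4|E(0)|}{N},$$
where $N = \frac12\int_{\mathbb{R}^n}|a|^2$.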
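The last step of the proof can be written out as a single chain of inequalities. This is a sketch that uses only the conservation laws (2.1) and (2.2) together with the two pointwise inequalities displayed in the proof:

```latex
\begin{aligned}
|E(0)| = -E(t)
 &= \frac12\int_{\mathbb{R}^n}(\gamma-1)^2\,dx - \frac12\int_{\mathbb{R}^n}\big(|\nabla a|^2 - |\nabla\gamma|^2\big)\,dx
 \;\le\; \frac12\int_{\mathbb{R}^n}(\gamma-1)^2\,dx\\
 &\le \frac18\int_{\mathbb{R}^n}|a|^4\,dx
 \;\le\; \frac18\,\|a(\cdot,t)\|^2_{L^\infty(\mathbb{R}^n)}\int_{\mathbb{R}^n}|a|^2\,dx
 \;=\; \frac{N}{4}\,\|a(\cdot,t)\|^2_{L^\infty(\mathbb{R}^n)} .
\end{aligned}
```

Since $N = \frac12\|a_0\|^2_{L^2}$ by (2.1), this gives $\|a(\cdot,t)\|^2_{L^\infty} \ge 8|E(0)|/\|a_0\|^2_{L^2}$, which in particular implies (5.1).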
Another interesting issue is the existence and stability of solitary waves solutions of (1.4), a(x, t) = 8ω (x)eiωt . This question has been answered when n = 1 by Iliev and Kirchev [9]. No result seems to be known in higher dimension. We will conclude this section by some remarks on the nonlinear wave equation (1.1). It turns out that the issues considered in the present paper for the nonlinear Schr¨odinger equation (1.2) (or (1.3), (1.4) ) are considerably simpler for (1.1). First, using the change of function a(x, y, z, t) = eiαt+iβz u(x, y, z, t), c2 k , β = k, the nonlinear wave equation (1.1) is transformed into the vg nonlinear Klein–Gordon equation ! ÿ 2 2 1 + kp−2 1γ k c 2 2 u = 0, (5.2) − k 0 u + kp u+ vg2 γ
where α = −
with γ = (1 + |u|2 )1/2 . Note that
vg2 k 2 c2 k2 2 − k > 0 since > 1, and < 1. 0 vg2 c2 k02
Since we are interested in small solutions, we can write (5.2) by expanding (recall that k 2 = k02 − kp2 )
1 to get γ
1γ u− u+m u+ γ 2
ÿ
! kp2 2 4 |u| + O(|u| ) u = 0 , 2
(5.3)
c2 − 1 > 0. vg2 Writing u = v + iw, (5.2) can be written as a system of two nonlinear Klein–Gordon equations for v and w which satisfies the hypothesis of Shatah [12] or Klainerman and Ponce [10] (cubic case). We then obtain n Theorem 5.1. Assume that n = 2, 3. There exists an integer s0 > + 2 depending only 2 on n, and a constant δ > 0 such that if u0 ∈ H s (Rn ) ∩ W s,6/5 (Rn ), s ≥ s0 and ku0 ks + ku0 ks,6/5 ≤ δ, then (5.3) admits a unique smooth solution u ∈ C([0, +∞); H s (Rn )) ∩ C 1 ([0, +∞); H s−2 (Rn )). Moreover, for large t, one has
where m2 = k 2
$$\|u(\cdot,t)\|_{L^\infty} = O(t^{-n/3}),\qquad \|u(\cdot,t)\|_{L^6} = O(t^{-n/3}),\qquad \|u(\cdot,t)\|_{L^2} = O(1).$$
It would be interesting, though not easy, to justify rigorously the approximation leading to (1.4) from (1.1).

A. Appendix

In this appendix, we justify the a-priori estimates of Sect. 3 by showing that (4.1) has a unique solution which is as smooth as needed for the afore-mentioned estimates. We first treat the one dimensional case, in which we only need to use a parabolic regularization.

Lemma A.1. Let $n = 1$, $U_0 \in (H^{2m}(\mathbb{R}))^6$, with $m \ge 2$, and $V \in \mathbf{Y}^m_\infty$; then there is a unique solution $U \in \mathbf{Y}^m_\infty$ of (4.1) with $U(0) = U_0$.

Proof. We treat the case $m = 2$ to keep notations clear. Let us consider for $\varepsilon > 0$ the regularized equation
$$iA(v_0)\partial_tU + i\varepsilon A(v_0)\partial_x^4U + \Delta U = B(v_0,v_1)\partial_xU + F(V). \tag{A.1}$$
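It is easy to see by applying standard theorems on parabolic equations that given $V \in \mathbf{Y}^2_\infty$ and $U_0 \in (H^4(\mathbb{R}))^6$, there is a unique solution $U^\varepsilon \in \mathbf{Y}^2_\infty$ of (A.1), with $\partial_x^2U^\varepsilon \in \mathbf{Y}^2_\infty$ satisfying $U^\varepsilon(0) = U_0$. Let $\rho = |||V|||_{\mathbf{Y}^2_\infty}$, and fix $T > 0$. We now prove that $U^\varepsilon$ is bounded in $\mathbf{Y}^2_T$ independently of $\varepsilon$. Taking the inner product of Eq. (A.1) (with $U = U^\varepsilon$) with $J\bar U^\varepsilon$, where $J$ is defined by (4.5), and integrating the imaginary part of the resulting expression over $\mathbb{R}$, we obtain as in Sect. 3,
$$\mathrm{Re}\int_{\mathbb{R}}JA(v_0)\partial_tU^\varepsilon(t).\bar U^\varepsilon(t)\,dx + \varepsilon\,\mathrm{Re}\int_{\mathbb{R}}JA(v_0)\partial_x^4U^\varepsilon(t).\bar U^\varepsilon(t)\,dx \le C(\rho)\Big(1 + \sum_{j=0}^2\|\partial_t^jU^\varepsilon(t)\|_{2-j}\Big). \tag{A.2}$$
Integrating the second term in the left-hand side of (A.2) by parts, we get
$$\varepsilon\,\mathrm{Re}\int_{\mathbb{R}}JA(v_0)\partial_x^4U^\varepsilon(t).\bar U^\varepsilon(t)\,dx \ge \varepsilon\,\mathrm{Re}\int_{\mathbb{R}}JA(v_0)\partial_x^2U^\varepsilon(t).\partial_x^2\bar U^\varepsilon(t)\,dx - C(\rho)\big(\|U^\varepsilon(t)\|_2^2 + \varepsilon\|U^\varepsilon(t)\|_3^2\big),$$
so that (A.2) yields
$$\frac{d}{dt}\mathrm{Re}\int_{\mathbb{R}}JA(v_0)U^\varepsilon(t).\bar U^\varepsilon(t)\,dx + \varepsilon\,\mathrm{Re}\int_{\mathbb{R}}JA(v_0)\partial_x^2U^\varepsilon(t).\partial_x^2\bar U^\varepsilon(t)\,dx \le C(\rho)\Big(1 + \sum_{j=0}^2\|\partial_t^jU^\varepsilon(t)\|^2_{2-j} + \varepsilon\|U^\varepsilon\|_3^2\Big). \tag{A.3}$$
In the same way, we multiply the second $x$-derivative of Eq. (A.1) by $(J\partial_x^2\bar U^\varepsilon)^t$ and integrate the imaginary part over $\mathbb{R}$ to get
$$\mathrm{Re}\int_{\mathbb{R}}JA(v_0)\partial_t\partial_x^2U^\varepsilon.\partial_x^2\bar U^\varepsilon\,dx + \varepsilon\,\mathrm{Re}\int_{\mathbb{R}}JA(v_0)\partial_x^4\partial_x^2U^\varepsilon.\partial_x^2\bar U^\varepsilon\,dx \le C(\rho)\Big(1 + \sum_{j=0}^2\|\partial_t^jU^\varepsilon(t)\|^2_{2-j}\Big),$$
and integrating again by parts the second term in the left-hand side, the estimate reads
$$\frac{d}{dt}\mathrm{Re}\int_{\mathbb{R}}JA(v_0)\partial_x^2U^\varepsilon.\partial_x^2\bar U^\varepsilon\,dx + \varepsilon\int_{\mathbb{R}}JA(v_0)\partial_x^4U^\varepsilon.\partial_x^4\bar U^\varepsilon\,dx \le C(\rho)\Big(1 + \sum_{j=0}^2\|\partial_t^jU^\varepsilon(t)\|^2_{2-j} + \varepsilon\|\partial_x^4U^\varepsilon\|\,\|U^\varepsilon\|_3\Big). \tag{A.4}$$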
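We proceed in the same way with the first and second time-derivatives of Eq. (A.1), which we multiply respectively by $(J\partial_x\bar U^\varepsilon)^t$ and by $(J\bar U^\varepsilon)^t$. Collecting all those estimates, we obtain
$$\frac{d}{dt}\sum_{j=0}^{2}\sum_{k=0}^{2-j}\Big(\|(2+|v_0|^2)^{1/2}\partial_t^j\partial_x^k U^\varepsilon(t)\|_{2-j}^2 - \mathrm{Re}\int_{\mathbb{R}}\big(v_0^2(\partial_t^j\partial_x^k\bar U^\varepsilon(t))^2\big)\,dx\Big) + \varepsilon\sum_{j=0}^{2}\|\partial_t^j\partial_x^2 U^\varepsilon(t)\|_{2-j}^2$$
$$\le C(\rho)\Big(1 + \sum_{j=0}^{2}\|\partial_t^j U^\varepsilon(t)\|_{2-j}^2 + \varepsilon\sum_{j=0}^{2}\|\partial_t^j U^\varepsilon(t)\|_{3-j}\,\|\partial_t^j\partial_x^2 U^\varepsilon(t)\|_{2-j}\Big)$$
$$\le C(\rho)\Big(1 + \sum_{j=0}^{2}\|\partial_t^j U^\varepsilon(t)\|_{2-j}^2\Big) + \frac{\varepsilon}{2}\sum_{j=0}^{2}\|\partial_t^j\partial_x^2 U^\varepsilon(t)\|_{2-j}^2, \tag{A.5}$$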
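where we have used $\|\partial_t^j U^\varepsilon\|_{3-j} \le \alpha\|\partial_t^j\partial_x^2 U^\varepsilon\|_{2-j} + C(\alpha)\|\partial_t^j U^\varepsilon\|_{2-j}$ for any $\alpha > 0$. Let
$$g(t) = \sum_{j=0}^{2}\sum_{k=0}^{2-j}\Big(\|(2+|v_0|^2)^{1/2}\partial_t^j\partial_x^k U^\varepsilon(t)\|_{2-j}^2 - \mathrm{Re}\int_{\mathbb{R}} v_0^2(\partial_t^j\partial_x^k\bar U^\varepsilon(t))^2\,dx\Big).$$
By (A.5), and the inequality
$$\sum_{j=0}^{2}\|\partial_t^j U^\varepsilon(t)\|_{2-j}^2 \le g(t), \tag{A.6}$$
we have
$$g'(t) \le C(\rho)(1 + g(t)) \tag{A.7}$$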
and we conclude by the Gronwall Lemma and (A.6) again that $U^\varepsilon$ is bounded in $Y^2_T$ independently of $\varepsilon$ for any $T > 0$; hence, some subsequence of $U^\varepsilon$ converges to some $U \in Y^2_T$ for any $T > 0$, which is a solution of (4.1). Uniqueness is proved in a classical way, using the estimates of Sect. 4.
We proceed with the case of dimension 2 and 3. In this case, we prove the existence of a solution to the linear equation under the restriction that $U_0$ and $V$ are small, since this restriction allows us to use the same estimates as in Sect. 4, and because this restriction is anyway needed in the proof of Theorem 3.1. However, Lemma A.2 could be proved without any smallness restriction on the size of the initial data and $V$.
Lemma A.2. Let $m \ge 3$. There is an $\varepsilon_1 > 0$ such that if $U_0 \in (H^{2m}(\mathbb{R}^n) \cap W^{2(m-1),6/5}(\mathbb{R}^n))^{2n+4}$ and $V \in X^{2m}_\infty$, with $\|U_0\|_{2m} + \|U_0\|_{2(m-1),6/5} \le \varepsilon_1$ and $|||V|||_{X^{2m}_\infty} \le \varepsilon_1$, then there is a unique solution $U \in X^{2m}_\infty$ of (4.1) satisfying $U(0) = U_0$.
Proof. We first fix $R > 0$ and use a Galerkin method to find a solution on $B(0,R)$, after which we will let $R$ go to infinity. Let $(e^R_m)_{m\in\mathbb{N}^*}$ be an orthonormal basis of $L^2(B(0,R))$ consisting of eigenfunctions of the Laplacian operator on $B(0,R)$ with Dirichlet boundary conditions. Let $P^R_m$ be the orthogonal projector on $V^R_m = \mathrm{sp}\{e^R_1,\dots,e^R_m\}$. We also take $\psi$ a nonnegative $C_0^\infty$ function with $\mathrm{supp}\,\psi \subset B(0,1)$ and $\psi \equiv 1$ on $B(0,1/2)$, and define $\psi_R = \psi(\cdot/R)$. We then consider the following finite dimensional system of differential equations:
$$iP^R_m A(v_0)\partial_t U^R_m + \Delta U^R_m = \sum_k P^R_m B(v_0,v_k)\partial_k U^R_m + P^R_m F(V) \tag{A.8}$$
with initial data
$$U^R_m(0) = P^R_m(\psi_R U_0).$$
It can easily be seen that (A.8) has a unique solution $U^R_m \in C(\mathbb{R}^+, V^R_m)$, since the map
$$V^R_m \to V^R_m, \qquad U \mapsto P^R_m A(v_0)U$$
is invertible, as follows from the inequality
$$(JP^R_m A(v_0)U, \bar U) = (JA(v_0)U, \bar U) = \|(2+|v_0|^2)^{1/2}U\|^2 - \mathrm{Re}\int_{\mathbb{R}^n}(v_0^2\bar U^2)\,dx \ge \|U\|^2,$$
which is true for any $U \in V^R_m$. We now prove that $U^R_m$ is bounded in $X^{2m}_T$, for any $T > 0$, independently of $m$ and $R$. We may reproduce the estimates of Sect. 4 for $U^R_m$ satisfying (A.8) with $U^R_m(0) = P^R_m(\psi_R U_0)$, and assuming that $|||V|||_{X^{2m}_\infty} \le \varepsilon_1$ and $\|U_0\|_{2(m-1),6/5} + \|U_0\|_{2m} \le \varepsilon_1$, with $\varepsilon_1 \le 1$, we obtain exactly as in (4.20),
$$|||U^R_m|||_{X^{2m}_T} \le C\varepsilon_1\big(1 + \varepsilon_1 + |||U^R_m|||_{X^{2m}_T}\big),$$
where the constant $C$ does not depend on $m$ or $R$. Choosing $\varepsilon_1$ sufficiently small, this shows that $U^R_m$ is bounded in $X^{2m}_T$ independently of $m$ and $R$. Hence, some subsequence of $U^R_m$ converges in $X^{2m}_T$ weak star to some $U$ which is a solution of (4.1) for any $T > 0$. We can now reproduce the second part of the estimates of Sect. 4 (using the integral equation (4.21)) on $U$ and obtain easily as we obtained (4.31),
$$|||U|||_{X^{2m}_\infty} \le C\varepsilon_1\Big(1 + \varepsilon_1 + \sup_{t\in\mathbb{R}}\sum_{j=0}^{m}\|\partial_t^j U(t)\|_{2(m-j)}\Big),$$
showing thus that $U$ is in $X^{2m}_\infty$. Time-continuity (i.e. the fact that $U$ indeed belongs to $X^{2m}_\infty$) and uniqueness of $U$ are again proved in a classical way. This completes the proof of Lemma A.2.
References
1. Borovskii, A.V., Galkin, A.L.: Dynamic modulation of an ultrashort high-intensity laser pulse in matter. JETP 77, 4, 562–573 (1993)
2. de Bouard, A., Hayashi, N., Saut, J.C.: Sur une équation de Schrödinger non linéaire en physique relativiste. C. R. Acad. Sci. Paris 321, Série I, 175–178 (1995)
3. Brandi, H.S., Manus, C., Mainfray, G., Lehner, T., Bonnaud, G.: Relativistic and ponderomotive self-focusing of a laser beam in a radially inhomogeneous plasma. I. Paraxial approximation. Phys. Fluids B 5, 10, 3539–3550 (1993)
4. Bruneau, C.H., Di Menza, L., Lehner, T.: Numerical simulations of nonlinear plasmas. Preprint, 1996
5. Chen, X.L., Sudan, R.N.: Necessary and sufficient conditions for self-focusing of short ultraintense laser pulse in underdense plasma. Phys. Rev. Lett. 70, 14, 2082–2085 (1993)
6. Chihara, H.: Local existence for semilinear Schrödinger equations in one space dimension. J. Math. Kyoto Univ. 34, 353–367 (1994)
7. Hayashi, N., Ozawa, T.: Remarks on nonlinear Schrödinger equations in one space dimension. Diff. and Integral Eqs. 7, 453–461 (1994)
8. Hayashi, N., Ozawa, T.: Global, small radially symmetric solutions to nonlinear Schrödinger equations and a gauge transformation. Diff. and Integral Eqs. 8, 1061–1072 (1995)
9. Iliev, I.D., Kirchev, K.P.: Stability and instability of solitary waves for one-dimensional singular Schrödinger equations. Diff. Int. Eq. 6, 685–703 (1993)
10. Klainerman, S., Ponce, G.: Global, small amplitude solutions to nonlinear evolution equations. Comm. Pure Appl. Math. 36, 133–141 (1983)
11. Ritchie, B.: Relativistic self-focusing and channel formation in laser-plasma interactions. Phys. Rev. E 50, 2, 687–689 (1994)
12. Shatah, J.: Global existence of small solutions to nonlinear evolution equations. J. Diff. Eq. 46, 409–425 (1982)
13. Soyeur, A.: The Cauchy problem for the Ishimori equations. J. Funct. Anal. 105, 232–255 (1992)
Communicated by H. Araki
Commun. Math. Phys. 189, 107 – 126 (1997)
Communications in
Mathematical Physics c Springer-Verlag 1997
Spin$^q$, Twistor and Spin$^c$
Masayoshi Nagase
Department of Mathematics, Faculty of Science, Saitama University, Urawa, Saitama 338, Japan. E-mail: [email protected]
Received: 3 October 1995 / Accepted: 2 March 1997
Abstract: Spin$^q$ structures induce (Spin$^q$ style) twistor spaces, which possess canonical Spin$^c$ structures. Such structures produce Dirac operators. Their indices for the even dimensional case, and the adiabatic limit of their reduced $\eta$-invariants for the odd dimensional case, are discussed.
Introduction
Let $(M, g^M)$ be an $n$-dimensional oriented Riemannian manifold equipped with a Spin$^q$ structure introduced in [13]: $\mathrm{Spin}^q(n) = \mathrm{Spin}(n)\times_{\mathbb{Z}_2} Sp(1)$. Namely, the reduced structure bundle $P_{SO(n)}$ is assumed to have principal $\mathrm{Spin}^q(n)$-, $SO(3)$-bundles $P_{\mathrm{Spin}^q(n)}$, $P_{SO(3)}$ together with a $\mathrm{Spin}^q(n)$-equivariant bundle map from $P_{\mathrm{Spin}^q(n)}$ to the fibre product bundle $P_{SO(n)}\times P_{SO(3)}$,
$$\xi^q : P_{\mathrm{Spin}^q(n)} \to P_{SO(n)}\times P_{SO(3)}.$$
The concept came out originally from the idea of twisting a Spin-structure with $Sp(1)$ to fit it to an almost quaternionic structure ([13]), just as one originally twisted a Spin-structure with $U(1)$ to fit it to an almost complex structure and called the twisted one a Spin$^c$-structure ([3, §5, Remark 4]). Using the canonical action of $\mathrm{Spin}^q(n)$ on the quotient $\mathrm{Spin}^q(n)/\mathrm{Spin}^c(n) = Sp(1)/U(1) = \mathbb{CP}^1$, we get a $\mathbb{CP}^1$-fibration,
$$\pi : Z = P_{\mathrm{Spin}^q(n)}\times_{\mathrm{can}}\frac{\mathrm{Spin}^q(n)}{\mathrm{Spin}^c(n)} \to M.$$
Because of twisting with $\mathbb{CP}^1$ it will be suitable to call $Z$ a (Spin$^q$ style) twistor space. By pulling back the product of the Levi-Civita connection associated to $g^M$ and a fixed connection on $P_{SO(3)}$ we obtain a connection on $P_{\mathrm{Spin}^q(n)}$. This induces a splitting of the
tangent bundle $TZ$ into the horizontal and vertical components, $H\oplus V$. Let $g^V$ denote the invariant metric on the fibres associated to the Fubini-Study metric of $\mathbb{CP}^1$ (with holomorphic sectional curvature 1). We may now define a metric $g^Z$ on $Z$ by
$$g^Z = \pi^* g^M + g^V, \qquad \pi^* g^M = g^Z|_H.$$
Then the reduced structure bundle $P_{SO(n+2)}(Z)$ of $(Z, g^Z)$ turns out to admit a natural Spin$^c$ structure
$$\xi^c : P_{\mathrm{Spin}^c(n+2)}(Z) \to P_{SO(n+2)}(Z)\times P_{U(1)}(Z).$$
We obtain thus a chain of attractive materials, Spin$^q$-Twistor-Spin$^c$. These naturally induce spinor bundles $\mathbb{S}^q$, $\mathbb{S}^c$ over $M$, $Z$, which possess the usual Dirac operators $D^q$, $D^c$. In the even dimensional case (§4), their indices will be calculated. In the odd dimensional case (§5), we will construct another Dirac operator $D^{qc}$ on $Z$, which differs from $D^c$, and study the so-called adiabatic limit of its reduced $\eta$-invariant $\bar\eta(D^{qc})$. It follows from Theorem 5.1 that: Let $D^{qc}_\varepsilon$ be its adiabatic version associated to the metric
$$g^Z_\varepsilon = \varepsilon^{-1}\pi^* g^M + g^V, \qquad \varepsilon > 0.$$
Then, while both $\bar\eta(D^{qc}_\varepsilon)$ and $\bar\eta(D^q)$ are not locally computable (even in the base space direction), the difference $\lim_{\varepsilon\to 0}\bar\eta(D^{qc}_\varepsilon) - \bar\eta(D^q)$ modulo $\mathbb{Z}$ is a value which is partially locally computable in the base space direction and which we may call an anomaly, a term which often appears in physics. The remarkable idea of extracting a partially locally computable value by taking the (adiabatic) limit is originally due to Witten [18, IV], in which he found that, for a determinant line bundle associated to a family of invertible Dirac operators over a circle, the global anomaly (or the holonomy) is equal to $\exp(-2\pi\sqrt{-1}\lim_{\varepsilon\to 0}\bar\eta)$. If we want to generalize his result to the higher dimensional universe which may admit a Spin$^q$-structure and may be blown up to a Spin$^c$ manifold according to the twistor theory, Theorem 5.1 will be an interesting result. The paper might be thus of interest not only to mathematicians but to physicists.
In Sect. 1, after reviewing Spin, Spin$^c$, Spin$^q$ briefly, we will discuss a natural Spin$^c$ structure of the twistor space. The reduction problem for $\pi^* P_{\mathrm{Spin}^q(n)}$ will be discussed also there. In Sect. 2 we will study some kinds of connections on the twistor space and discuss their relations. In Sect. 3 we will define Spin$^q$-, Spin$^c$-vector bundles $\mathbb{S}^q$, $\mathbb{S}^c$ and Dirac operators $D^q$, $D^c$. The Lichnerowicz formulas for them will be given. In Sect. 4 we will consider an even-dimensional Spin$^q$ manifold $M$ and its twistor space. According to the splittings of spinor bundles, the Dirac operators are graded into $D^{q\pm}$, $D^{c\pm}$. The indices of $D^{q+}$, $D^{c+}$ are calculated. In Sect. 5 we will consider an odd-dimensional Spin$^q$ manifold and its twistor space. The Dirac operator $D^{qc}$ is defined and the difference mentioned above will be studied.
1. Twistor Space Associated To Spin$^q$ Structure
First let us briefly recall relevant facts on Spin, Spin$^c$, Spin$^q$. $\mathrm{Spin}(n)$ is the covering group of $SO(n)$ with a short exact sequence,
$$1 \to \mathbb{Z}_2 \to \mathrm{Spin}(n) \xrightarrow{\ \xi\ } SO(n) \to 1. \tag{1.1}$$
If $n \ge 3$, it is the universal one because $\pi_1(SO(n)) = \mathbb{Z}_2$. Twisting it with the unitary group $U(1)$, we get $\mathrm{Spin}^c(n) = \mathrm{Spin}(n)\times_{\mathbb{Z}_2} U(1)$, which has a short exact sequence,
$$1 \to \mathbb{Z}_2 \to \mathrm{Spin}^c(n) \xrightarrow{\ \xi^c\ } SO(n)\times U(1) \to 1, \tag{1.2}$$
where $\xi^c([\varphi,z]) = (\xi^c_0([\varphi,z]), \xi^c_1([\varphi,z])) = (\xi(\varphi), z^2)$. Further, by twisting it with the quaternionic unitary (or symplectic) group $Sp(1)$, we obtain $\mathrm{Spin}^q(n) = \mathrm{Spin}(n)\times_{\mathbb{Z}_2} Sp(1)$ together with a short exact sequence,
$$1 \to \mathbb{Z}_2 \to \mathrm{Spin}^q(n) \xrightarrow{\ \xi^q\ } SO(n)\times SO(3) \to 1, \tag{1.3}$$
where $\xi^q([\varphi,\lambda]) = (\xi^q_0([\varphi,\lambda]), \xi^q_1([\varphi,\lambda])) = (\xi(\varphi), \mathrm{Ad}(\lambda))$. Here $\mathbb{R}^3$ and the imaginary part of the quaternion field, $\mathrm{Im}\,\mathbb{H} = \{a_1 i + a_2 j + a_3 k \mid a_\ell \in \mathbb{R}\}$ ($i^2 = j^2 = k^2 = -1$, $ij = -ji = k$), are naturally identified and the homomorphism $\mathrm{Ad}$ is defined by $SO(3) = SO(\mathrm{Im}\,\mathbb{H}) \ni \mathrm{Ad}(\lambda) : w \mapsto \lambda w\lambda^{-1}$.
Let us now consider a principal $SO(n)$-bundle over a $C^\infty$-manifold $M$,
$$\pi_{SO(n)} : P_{SO(n)} \to M. \tag{1.4}$$
Its Spin-structure is then a principal $\mathrm{Spin}(n)$-bundle $P_{\mathrm{Spin}(n)}$ together with a bundle map $\xi : P_{\mathrm{Spin}(n)} \to P_{SO(n)}$ which is equivariant with respect to the map $\xi$ in (1.1). The definitions of Spin$^c$-, Spin$^q$-structures may be easily imagined now. For further details refer to [12, Appendix D] and [13]. Now assume that (1.4) is equipped with a Spin$^q$-structure:
$$\begin{array}{ccc} P_{\mathrm{Spin}^q(n)} & \xrightarrow{\ \ \xi^q\ \ } & P_{SO(n)}\times P_{SO(3)} \\ {\scriptstyle \pi_0}\searrow & & \swarrow{\scriptstyle \pi_{SO(n)}\times\pi_{SO(3)}} \\ & M & \end{array} \tag{1.5}$$
The bundle $P_{SO(n)}$ is an abstract one so that the situation is more general than that in the introduction. It exists if and only if the second Stiefel-Whitney class of $P_{SO(n)}$ is equal to that of some $P_{SO(3)}$,
$$w_2(P_{SO(n)}) = w_2(P_{SO(3)}). \tag{1.6}$$
In this section we will define a twistor space associated to the Spin$^q$-structure and investigate its canonical Spin$^c$-structure. Consider the canonical identifications
$$\frac{\mathrm{Spin}^q(n)}{\mathrm{Spin}^c(n)} = \frac{Sp(1)}{U(1)} = \mathbb{CP}^1. \tag{1.7}$$
The second one is given by the action, $(a + jb, [w_1,w_2]) \mapsto [aw_1 - \bar b w_2,\ bw_1 + \bar a w_2]$, of $Sp(1) = SU(2)$ on $\mathbb{CP}^1$. Then, using the canonical action of $\mathrm{Spin}^q(n)$, we have
Definition 1.1. The total space of the following $\mathbb{CP}^1$-fibration is called a (Spin$^q$ style) twistor space associated to (1.5):
$$\pi : Z = P_{\mathrm{Spin}^q(n)}\times_{\mathrm{can}}\frac{\mathrm{Spin}^q(n)}{\mathrm{Spin}^c(n)} \to M.$$
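The second identification in (1.7) can be checked directly from the action described above. The following short computation is only a sketch; the choice of $[1,0]\in\mathbb{CP}^1$ as base point is an assumption made here for illustration and is not stated explicitly in the text.

```latex
% Assumption: base point [1,0]; elements of Sp(1) are written a + jb with |a|^2 + |b|^2 = 1.
\[
(a + jb)\cdot[1,0] \;=\; [\,a\cdot 1 - \bar b\cdot 0,\ \ b\cdot 1 + \bar a\cdot 0\,] \;=\; [a,\,b].
\]
% Transitivity: any point of CP^1 has a representative (w_1, w_2) with
% |w_1|^2 + |w_2|^2 = 1, and it is reached by taking a = w_1, b = w_2.
% Stabilizer: [a,b] = [1,0] holds exactly when b = 0, hence |a| = 1.
\[
\mathrm{Stab}_{Sp(1)}\big([1,0]\big) \;=\; \{\,a\in\mathbb{C} : |a| = 1\,\} \;=\; U(1),
\qquad\text{so}\qquad
Sp(1)/U(1)\;\cong\;\mathbb{CP}^1 .
\]
```

With this choice the coset $[1]$ corresponds to the point $[1,0]$.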
If $(M^{4n}, g^M)$ has an almost quaternionic structure $P_{Sp(n)\cdot Sp(1)}$ ($Sp(n)\cdot Sp(1) = Sp(n)\times_{\mathbb{Z}_2} Sp(1)$), then it admits a canonical Spin$^q$-structure as was shown in [13, Sect. 3]. Indeed we may find a homomorphism $\Xi^q = [\Xi^q_0, \Xi^q_1]$ which makes the following diagram commutative:
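$$\begin{array}{ccc} & & \mathrm{Spin}^q(4n) \\ & {\scriptstyle \Xi^q}\nearrow & \big\downarrow{\scriptstyle \xi^q} \\ Sp(n)\cdot Sp(1) & \xrightarrow{\ \mathrm{inc}\times\mathrm{Ad}'\ } & SO(4n)\times SO(3) \end{array} \tag{1.8}$$
Here we set $\mathrm{inc}([S,\lambda]) : \mathbb{R}^{4n} = \mathbb{H}^n \ni x \mapsto Sx\bar\lambda \in \mathbb{H}^n = \mathbb{R}^{4n}$, and $\mathrm{Ad}'([S,\lambda]) : \mathrm{Im}\,\mathbb{H} \to \mathrm{Im}\,\mathbb{H}$ is given by $w \mapsto w$ (if $4n = 8m$), $w \mapsto \lambda w\lambda^{-1} = \mathrm{Ad}(\lambda)w$ (if $4n = 8m+4$). The canonical one is then defined by
$$P_{\mathrm{Spin}^q(4n)} = P_{Sp(n)\cdot Sp(1)}\times_{\Xi^q}\mathrm{Spin}^q(4n). \tag{1.9}$$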
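Hence the twistor space is of the form
$$Z = P_{Sp(n)\cdot Sp(1)}\times_{\Xi^q_1}\mathbb{CP}^1 = \begin{cases} M\times\mathbb{CP}^1 \ \text{(trivial)}, & 4n = 8m, \\ P_{Sp(n)\cdot Sp(1)}\times_{\mathrm{can}}\mathbb{CP}^1, & 4n = 8m+4. \end{cases} \tag{1.10}$$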
If $4n = 8m+4$, then it is thus the almost quaternionic style twistor space studied in [16]. On the other hand, it is trivial if $4n = 8m$, in contrast to [16]. This is because we take the trivial Spin$^q$-structure as the canonical one since $M^{8m}$ has a Spin-structure ($w_2(M^{8m}) = 0$). Though one may feel it undesirable, it will be proper from the standpoint of Spin$^q$ theory that the Spin$^q$ style twistor space is reduced to a trivial one provided the Spin$^q$-structure is trivial.
Let us return to the general situation and discuss a canonical Spin$^c$-structure of the twistor space $Z$,
$$\begin{array}{ccc} P_{\mathrm{Spin}^c(n)}(Z) & \equiv & P_{\mathrm{Spin}^q(n)} \\ \big\downarrow{\scriptstyle \pi_1} & & \big\downarrow{\scriptstyle \pi_0} \\ Z & \xrightarrow{\ \ \pi\ \ } & M \end{array} \tag{1.11}$$
Here, since $\pi_1 : P_{\mathrm{Spin}^q(n)} \ni p_x \mapsto [p_x,[1]] \in Z$ obviously has a structure of principal $\mathrm{Spin}^c(n)$-bundle, $P_{\mathrm{Spin}^q(n)}$ regarded as the total space of the bundle $\pi_1$ is denoted by $P_{\mathrm{Spin}^c(n)}(Z)$. Moreover, let us consider the vector bundle of vertical tangent vectors of $Z$,
$$V = P_{\mathrm{Spin}^q(n)}\times_{\mathrm{can}_*} T\mathbb{CP}^1 \to Z. \tag{1.12}$$
Since the complex structure and the Fubini-Study metric $ds^2 = 4(1+|z|^2)^{-2}\,dz\otimes d\bar z$ ($z = w_2/w_1$ or $w_1/w_2$) (with holomorphic sectional curvature 1) of $\mathbb{CP}^1$ are $Sp(1)$-invariant, they induce a hermitian line bundle structure on $V$. The set of unitary frames with respect to the hermitian metric $ds^V$ forms a principal $U(1)$-bundle $P^V_{U(1)}(Z)$ over $Z$. The reduced structure bundle $P^V_{SO(2)}(Z)$ ($= P^V_{U(1)}(Z)$) of $V$ with the underlying Riemannian metric $g^V$ (of $ds^V$) admits then a canonical $\mathrm{Spin}^c(2)$-structure. Actually, let $J_0$ and $v$ denote the canonical complex structure and a fixed unitary frame of $\mathbb{C}$ and let us consider the homomorphism
$$\Xi^c_1 : U(1) \to \mathrm{Spin}^c(2), \qquad e^{\sqrt{-1}\theta} \mapsto \Big[\cos\frac{\theta}{2} + \sin\frac{\theta}{2}\,vJ_0v,\ e^{\sqrt{-1}\theta/2}\Big]. \tag{1.13}$$
Then the desired one is given by
$$\xi^c : P^V_{\mathrm{Spin}^c(2)}(Z) = P^V_{U(1)}(Z)\times_{\Xi^c_1}\mathrm{Spin}^c(2) \to P^V_{SO(2)}(Z)\times P^V_{U(1)}(Z). \tag{1.14}$$
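Now, using the embeddings $\mathbb{R}^n, \mathbb{C} \hookrightarrow \mathbb{R}^n\oplus\mathbb{C} = \mathbb{R}^{n+2}$, we embed $\mathrm{Spin}^c(n)$ and $\mathrm{Spin}^c(2)$ into $\mathrm{Spin}^c(n+2)$ and denote by mult the multiplication in $\mathrm{Spin}^c(n+2)$. Observe (1.2) with $n$ replaced by $n+2$.
Definition 1.2. We put
$$P_{\mathrm{Spin}^c(n+2)}(Z) = \big(P_{\mathrm{Spin}^c(n)}(Z)\times P^V_{\mathrm{Spin}^c(2)}(Z)\big)\times_{\mathrm{mult}}\mathrm{Spin}^c(n+2),$$
$$P_{SO(n+2)}(Z) = P_{\mathrm{Spin}^c(n+2)}(Z)\times_{\xi^c_0} SO(n+2), \qquad P_{U(1)}(Z) = P_{\mathrm{Spin}^c(n+2)}(Z)\times_{\xi^c_1} U(1).$$
Then a Spin$^c$-structure of $P_{SO(n+2)}(Z)$ is defined canonically by
$$\xi^c : P_{\mathrm{Spin}^c(n+2)}(Z) \to P_{SO(n+2)}(Z)\times P_{U(1)}(Z). \tag{1.15}$$
Obviously there exists also an expression
$$P_{SO(n+2)}(Z) = \big(\pi^* P_{SO(n)}\times P^V_{SO(2)}(Z)\big)\times_{\mathrm{can}} SO(n+2). \tag{1.16}$$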
This says: If $P_{SO(n)}$ has a reduction embedding into the frame bundle $F(M)$ and a connection $\alpha^{M,q}$ of $P_{\mathrm{Spin}^q(n)}$ is fixed, then there is a natural reduction embedding of $P_{SO(n+2)}(Z)$ into $F(Z)$. Namely, by means of the metrics $g^M$, $g^V$ and the splitting into the horizontal and vertical components
$$TZ = H\oplus V \tag{1.17}$$
induced by $\alpha^{M,q}$, a metric $g^Z$ of $Z$ can be defined by
$$g^Z = \pi^* g^M + g^V, \qquad \pi^* g^M = g^Z|_H. \tag{1.18}$$
Now, if $P_{SO(n)}$ is the reduced structure bundle given by $g^M$, then the one given by $g^Z$ is (1.16). The situation is just as that in the introduction. The definition given above is thus natural.
Next let us investigate more closely the principal $\mathrm{Spin}^c(n)$-bundle structure of $\pi_1$. Consider $\tilde\pi_1 : \mathrm{Spin}^q(n) \to \mathrm{Spin}^q(n)/\mathrm{Spin}^c(n)$, $[\varphi,\lambda] \mapsto [[\varphi,\lambda]] \equiv [\lambda]$, and take a cross-section $f = [1,f_1]$ over a neighborhood $W$ of $[1]$. Then we can take an open covering $\{W_\ell\}$ of $\mathrm{Spin}^q(n)/\mathrm{Spin}^c(n)$ each of which is of the form $W_\ell = g_\ell\cdot W$ ($g_\ell = [1,g_{1\ell}] \in \mathrm{Spin}^q(n)$) and define cross-sections $f_\ell : W_\ell \to \mathrm{Spin}^q(n)$ by $f_\ell([s]) = g_\ell f(g_\ell^{-1}[s])$. Note that $f_{\ell'\ell}([s]) = f_{\ell'}([s])^{-1}f_\ell([s]) \in \mathrm{Spin}^c(n)$ for $[s] \in W_{\ell'}\cap W_\ell$ and these form a family of transition functions of $\tilde\pi_1$. Moreover let $\tilde f_{ba} : U_b\cap U_a \to \mathrm{Spin}^q(n)$ be transition functions of $\pi_0$. Then, for $(x,[s]) \in (U_b\times W_{\ell'})\cap(U_a\times W_\ell)$ ($\subset Z$),
$$\psi_{(b,\ell')(a,\ell)}(x,[s]) = f_{\ell'}\big(\tilde f_{ba}(x)[s]\big)^{-1}\tilde f_{ba}(x)f_\ell([s]) \tag{1.19}$$
belongs to $\mathrm{Spin}^c(n)$ and these $\psi_{(b,\ell')(a,\ell)}$ turn out to form a family of transition functions of $\pi_1$.
Lemma 1.3. The following is a reduction embedding:
$$P_{\mathrm{Spin}^c(n)}(Z) \equiv P_{\mathrm{Spin}^q(n)} \hookrightarrow \pi^* P_{\mathrm{Spin}^q(n)}, \qquad p_x \mapsto ([p_x,[1]], p_x). \tag{1.20}$$
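Proof. The usual transition functions of $\pi^* P_{\mathrm{Spin}^q(n)} = \{(z,p) \in Z\times P_{\mathrm{Spin}^q(n)} \mid \pi(z) = \pi_0(p)\}$ are $\tilde f_{ba} : \pi^{-1}(U_b)\cap\pi^{-1}(U_a) \to \mathrm{Spin}^q(n)$, $\tilde f_{ba}(z_x) = \tilde f_{ba}(x)$. By choosing $U_a\times W_\ell$ as new local coordinate neighborhoods, one can easily show that they are reduced to $\psi_{(b,\ell')(a,\ell)}$.
Lemma 1.4. Using $\xi^c_1 : \mathrm{Spin}^c(n) \to U(1)$, we set
$$P^H_{U(1)}(Z) = P_{\mathrm{Spin}^c(n)}(Z)\times_{\xi^c_1} U(1). \tag{1.21}$$
Then we have
(1) $P_{U(1)}(Z) = P^H_{U(1)}(Z)\otimes P^V_{U(1)}(Z)$,
(2) the embedding $U(1) = SO(2) \hookrightarrow SO(3)$, $A \mapsto \begin{pmatrix} 1 & 0 \\ 0 & A \end{pmatrix}$, and (1.20) induce a reduction embedding
$$P^H_{U(1)}(Z) \hookrightarrow \pi^* P_{SO(3)}. \tag{1.22}$$
Proof. We have
$$P^H_{U(1)}(Z)\times P^V_{U(1)}(Z) = \big(P_{\mathrm{Spin}^c(n)}(Z)\times P^V_{U(1)}(Z)\big)\times_{(\xi^c_1,\mathrm{can})}\big(U(1)\times U(1)\big),$$
$$P_{U(1)}(Z) = \big(P_{\mathrm{Spin}^c(n)}(Z)\times P^V_{U(1)}(Z)\big)\times_{\xi^c_1\circ\mathrm{mult}\circ(\mathrm{id},\Xi^c_1)} U(1), \tag{1.23}$$
so that the canonical multiplication $U(1)\times U(1) \to U(1)$ defines a well-defined homomorphism $P^H_{U(1)}(Z)\times P^V_{U(1)}(Z) \to P_{U(1)}(Z)$, which obviously induces the identification in (1). As for (2), (1.22) is given by
$$P^H_{U(1)}(Z) = P_{\mathrm{Spin}^c(n)}(Z)\times_{\xi^c_1} U(1) \hookrightarrow P_{\mathrm{Spin}^c(n)}(Z)\times_{\mathrm{Ad}} SO(3) = \pi^* P_{SO(3)}, \qquad [[\varphi,z],A] \mapsto \Big[[\varphi,z], \begin{pmatrix} 1 & 0 \\ 0 & A \end{pmatrix}\Big]. \tag{1.24}$$
This is actually well-defined because $\mathrm{Ad}(z)(x_1 i + x_2 j + x_3 k) = x_1 i + z^2(x_2 + x_3 i)j$.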
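The identity for $\mathrm{Ad}(z)$ used at the end of the proof can be verified by a short quaternion computation. The sketch below assumes that $U(1)\subset Sp(1)$ is realized as the unit complex numbers $z = a + bi$ (so $z^{-1} = \bar z$), which is consistent with the convention $a + jb$ used for (1.7) but is spelled out here only for illustration.

```latex
% Assumption: z = a + b i with |z| = 1, so z^{-1} = \bar z and z commutes with i.
% Basic relation: j w = \bar w j for every complex number w (because ji = -ij).
\begin{align*}
\mathrm{Ad}(z)(x_1 i) &= z\,(x_1 i)\,\bar z = x_1 i\, z\bar z = x_1 i,\\
\mathrm{Ad}(z)(x_2 j + x_3 k) &= z\,(x_2 + x_3 i)\,j\,\bar z
   \qquad (\text{since } k = ij)\\
 &= z\,(x_2 + x_3 i)\,z\,j
   \qquad (\text{since } j\bar z = z j)\\
 &= z^2 (x_2 + x_3 i)\, j .
\end{align*}
% Identifying the (j,k)-plane with C via x_2 j + x_3 k <-> x_2 + x_3 i,
% Ad(z) fixes the i-axis and acts on that plane as multiplication by z^2.
```

Since $\xi^c_1([\varphi,z]) = z^2$, this is exactly the block form $\begin{pmatrix} 1 & 0 \\ 0 & A \end{pmatrix}$ with $A = z^2$ that appears in (1.24).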
The existence of a Spin$^c$-structure guarantees the identity
$$w_2(P_{SO(n+2)}(Z)) \equiv c_1(P_{U(1)}(Z)) \pmod 2, \tag{1.25}$$
where $c_1(\cdot)$ is the first Chern class. It is also easy to show this only by handling the characteristic classes as follows: (1.22) says
$$\pi^* P_{SO(3)}\times_{\mathrm{can}}\mathbb{R}^3 = (Z\times\mathbb{R})\oplus\big(P^H_{U(1)}(Z)\times_{\mathrm{can}}\mathbb{C}\big). \tag{1.26}$$
Hence we have $w_2(\pi^* P_{SO(3)}) = w_2(P^H_{U(1)}(Z)) \equiv c_1(P^H_{U(1)}(Z))$. This, combined with (1.16) and (1.6), implies
$$w_2(P_{SO(n+2)}(Z)) = w_2\big(\pi^* P_{SO(n)}\times P^V_{SO(2)}(Z)\big) = w_2(\pi^* P_{SO(3)}) + w_2(P^V_{SO(2)}(Z)) \equiv c_1(P^H_{U(1)}(Z)) + c_1(P^V_{U(1)}(Z))$$
$$= c_1\big(P^H_{U(1)}(Z)\times P^V_{U(1)}(Z)\big) = c_1\big(P^H_{U(1)}(Z)\otimes P^V_{U(1)}(Z)\big) = c_1(P_{U(1)}(Z)).$$
2. Connection on the Twistor Space
From now on, throughout the paper, $(M, g^M)$ is assumed to be a Spin$^q$ manifold, that is, the reduced structure bundle $P_{SO(n)}$ is assumed to have a fixed Spin$^q$-structure (1.5). Moreover, let $\alpha^M$ be the Levi-Civita connection on $P_{SO(n)}$ and let $\alpha^{SO(3)}$ be a fixed connection on $P_{SO(3)}$. These give a connection
$$\alpha^{M,q} = \xi^{q*}(\alpha^M\oplus\alpha^{SO(3)}) \tag{2.1}$$
on $P_{\mathrm{Spin}^q(n)}$, which induces a splitting (1.17) and a metric $g^Z$ on $Z$ given in (1.18). The reduced structure bundle $P_{SO(n+2)}(Z)$ possesses then a Spin$^c$-structure (1.15). The Levi-Civita connection on $P_{SO(n+2)}(Z)$ associated to $g^Z$ is denoted by $\alpha^Z$. Let us denote by $\nabla^Z$ the covariant derivative on $TZ$ associated to $\alpha^Z$ and by $\Omega^Z$ its curvature 2-form. Composed with the orthogonal projection $P^V : TZ \to V$, it induces a connection $\nabla^V = P^V\nabla^Z$ on $V$, which is compatible with the metric $g^V$ and whose restriction to each fibre equals the Levi-Civita connection associated to $g^V$. Its curvature 2-form is denoted by $\Omega^V$, which takes values in $so(2)$. Further, by lifting the covariant derivative $\nabla^M$ (with curvature 2-form $\Omega^M$) associated to $\alpha^M$, we obtain a covariant derivative $\pi^*\nabla^M$ on $H$, i.e., $(\pi^*\nabla^M)_X f\pi^* Y = X(f)\pi^* Y + f\pi^*\nabla^M_{\pi_* X}Y$ for $X \in \Gamma(TZ)$, $Y \in \Gamma(TM)$ and $f \in C^\infty(Z)$. We may define now another covariant derivative on $TZ = H\oplus V$ by
$$\nabla^\oplus = \pi^*\nabla^M\oplus\nabla^V, \tag{2.2}$$
whose curvature 2-form is denoted by $\Omega^\oplus$. Note that it is compatible with the metric $g^Z$ but the torsion $T$ may not vanish, so that the tensor
$$S = \nabla^Z - \nabla^\oplus \tag{2.3}$$
may not vanish. Now, according to the splitting (1.17), let us decompose $X \in TZ$ into $X = X^H + X^V$. Then we have
Lemma 2.1. Let $X$, $Y$, $U$, $V$ be vector fields on $Z$ with $X = X^H$, $Y = Y^H$, $U = U^V$ and $V = V^V$.
(1) The fibres of $\pi$ are totally geodesic, i.e., $(\nabla^Z_U V)^H = 0$.
(2) $\nabla^V_X U = [X,U]^V$.
(3) $S(U)V = \nabla^Z_U V - \nabla^\oplus_U V = 0$, $T(X,Y) = -[X,Y]^V$, $T(U,V) = T(U,X) = 0$,
$$g^Z(S(X)U,Y) = -g^Z(S(X)Y,U) = g^Z(S(U)X,Y) = \frac{1}{2}g^Z(T(X,Y),U),$$
$g^Z(S(\cdot)\cdot,\cdot) = 0$ (otherwise).
(4) $\nabla^V$ can be a unitary connection on the hermitian complex line bundle $V$.
(5) Let $\kappa_Z$, $\kappa_M$ be the scalar curvatures of $g^Z$, $g^M$ and let us set $|T|^2 = \sum |T(X_i,X_j)|^2_{g^Z}$, where $\{X_j\}$ is an orthonormal basis of $H$. Then we have
$$\kappa_Z = 2 + \pi^*\kappa_M - \frac{1}{4}|T|^2. \tag{2.4}$$
The number 2 on the right side is the scalar curvature of $g^V$. (If $M$ is a quaternionic Kähler manifold, $\kappa_Z$ can be more closely examined because $M$ is an Einstein manifold.)
(6) For each element $\beta \in \Gamma(\wedge^i T^*Z)$, we denote by $\int_{Z/M}\beta \in \Gamma(\wedge^{i-2}T^*M)$ its integral over the fibres, that is, the form determined by the condition that $\int_M\big(\int_{Z/M}\beta\big)\wedge\gamma = \int_Z\beta\wedge\pi^*\gamma$ holds for all compactly supported $\gamma \in \Gamma(\wedge T^*M)$: see [4, (1.15)-(1.17)]. Note that the integral naturally induces a map $H^i_{DR}(Z) \to H^{i-2}_{DR}(M)$ on de Rham cohomology groups. Further, let us set $\hat A(\Omega^V) = \det{}^{1/2}\big((\sqrt{-1}\,\Omega^V/4\pi)/\sinh(\sqrt{-1}\,\Omega^V/4\pi)\big)$ (here $\Omega^V$ takes values in $so(2)$ and the determinant is taken in $gl(2,\mathbb{R})$ ($\supset so(2)$)) and $c_1(\Omega^V) = \mathrm{tr}\big(\sqrt{-1}\,\Omega^V/2\pi\big) = \sqrt{-1}\,\Omega^V/2\pi$ (here $\Omega^V$ is the curvature of $\nabla^V$ regarded as a unitary connection and hence takes values in $u(1)$, and the trace is taken in $gl(1,\mathbb{C})$ ($\supset u(1)$)), namely, the $\hat A$-form and the first Chern form made from $\Omega^V$. Then we have
$$\int_{Z/M}\hat A(\Omega^V)\exp\Big(\frac{1}{2}c_1(\Omega^V)\Big) = 1 \quad\text{in } H^{2*}_{DR}(M). \tag{2.5}$$
Proof. (1) is shown in [17, Theorem 3.5] or [5, Theorem 9.59]. Since $g^Z(\nabla^V_X U - [X,U]^V, V) = g^Z(\nabla^Z_X U - [X,U], V) = g^Z(\nabla^Z_U X, V) = -g^Z(X,\nabla^Z_U V)$, (1) implies (2). (3) is easily shown by straightforward computation: see [4, Sect. 10.1]. Let us show (4). It will suffice to show that $\nabla^V$ commutes with the (almost) complex structure $J^V$ on $V$. Let $X$ be specially a basic vector field, i.e., the lifting of a vector field on $M$. Then $[X,U]$ is certainly a vertical vector field. Further, since the horizontal transport associated to $H$ preserves $J^V$, the Lie derivative along $X$ (acting on $\Gamma(V)$) commutes with $J^V$: see [5, Sect. 14.72]. Hence we have $J^V\nabla^V_X U = J^V[X,U] = [X,J^V U] = \nabla^V_X J^V U$. On the other hand, since the restriction of $\nabla^V$ to each fibre is just the hermitian connection on the fibre, we have $J^V\nabla^V_V U = \nabla^V_V J^V U$. Thus we have proved (4). (5) is due to (1) and [5, (9.70d)], in which not $T$ but the O'Neill tensor $A$ is used. Note that $A = -T/2$ on $H$. (6) will be shown in Sect. 3 by understanding the meaning of the left side of (2.5).
Now we attach to $P^V_{U(1)}(Z)$ an (Ehresmann) connection $\alpha^V$ associated to the unitary $\nabla^V$. Next, on $P^H_{U(1)}(Z)$, let us take a connection $\alpha^H_{U(1)}$ which is made from $\alpha^{SO(3)}$ as follows. Lemma 1.4(2) says that it is a subbundle of $\pi^* P_{SO(3)}$ and, moreover, regarding $U(1)$ as a subgroup of $SO(3)$ by the embedding given there, $SO(3)/U(1)$ is reductive, that is, by the inclusion $u(1) = so(2) \hookrightarrow so(3)$, we have a decomposition into subspaces
$$so(3) = u(1) + m, \qquad m = \left\{\begin{pmatrix} 0 & -c & -d \\ c & 0 & 0 \\ d & 0 & 0 \end{pmatrix} \,\middle|\, c, d \in \mathbb{R}\right\}, \tag{2.6}$$
with $\mathrm{Ad}(U(1))m \subset m$. Hence the $u(1)$-component of $\pi^*\alpha^{SO(3)}$ restricted to $P^H_{U(1)}(Z)$ gives its connection $\alpha^H_{U(1)}$. Let us investigate its curvature 2-form $\Omega^H_{U(1)}$ ($\in \Gamma(\wedge^2 T^*Z\otimes u(1))$) briefly. We use the identification $\mathbb{R}^3 = \mathrm{Im}\,\mathbb{H}$, $f_1, f_2, f_3 \leftrightarrow i, j, k$, given in (1.3). By putting $(f_a\wedge f_b)(v) = \langle f_a,v\rangle f_b - \langle f_b,v\rangle f_a$ ($\langle\,,\rangle$ is the standard inner product of $\mathbb{R}^3$), we obtain a basis $\{f_a\wedge f_b\}_{a<b}$ [...]
$$\sum_{a<b}(f_a\wedge f_b)\otimes\Omega_{ba} = \frac{1}{2}\sum_{a,b}(f_a\wedge f_b)\otimes\Omega_{ba}. \tag{2.7}$$
Note that the subspaces $u(1)$ and $m$ are spanned by $\{f_2\wedge f_3\}$ and $\{f_1\wedge f_2, f_1\wedge f_3\}$ respectively. Accordingly we set $\Omega^H_{U(1)} = \sqrt{-1}\,\tilde\Omega_{32}$. Then, if we take tangent vectors $X$, $Y$ at $x \in U_a \subset M$ and regard them as tangent vectors at $(x,[s]) \in U_a\times W_\ell \subset Z$, $\tilde\Omega_{32}(X,Y)$ is given by the formula
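$$\mathrm{Ad}\big(\mathrm{Ad}(f_\ell([s]))\big)^{-1}\Omega^{SO(3)}(X,Y) = \sum_{a<b}(f_a\wedge f_b)\,\tilde\Omega_{ba}(X,Y). \tag{2.8}$$
Notice, however, that $\tilde\Omega_{32}(U,V)$ may not vanish for tangent vectors $U$, $V$ at $[s] \in W_\ell$ ($\subset \mathbb{CP}^1$) in contrast to the fact $(\pi^*\Omega^{SO(3)})(U,V) = 0$. On $P_{U(1)}(Z) = P^H_{U(1)}(Z)\otimes P^V_{U(1)}(Z)$, a canonical connection is now defined by
$$\alpha_{U(1)} = \alpha^H_{U(1)}\otimes 1 + 1\otimes\alpha^V, \tag{2.9}$$
whose curvature 2-form is certainly
$$\Omega_{U(1)} = \Omega^H_{U(1)} + \Omega^V. \tag{2.10}$$
Further, by using (Ehresmann) connections $\alpha^Z$ and $\alpha^\oplus = \pi^*\alpha^M\oplus\alpha^V$ on $P_{SO(n+2)}(Z)$ associated to $\nabla^Z$ and $\nabla^\oplus$, we can define now two kinds of connections on $P_{\mathrm{Spin}^c(n+2)}(Z)$ as
$$\alpha^{Z,c} = \xi^{c*}(\alpha^Z\oplus\alpha_{U(1)}), \qquad \alpha^{\oplus,c} = \xi^{c*}(\alpha^\oplus\oplus\alpha_{U(1)}). \tag{2.11}$$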
Lastly let us examine the second one $\alpha^{\oplus,c}$ more closely. $\mathrm{Spin}^q(n)/\mathrm{Spin}^c(n) = Sp(1)/U(1)$ is reductive. Indeed we have a decomposition $spin^q(n) = spin^c(n) + m$, where $m$ is identified with the subspace $m$ given in (2.6) through the adjoint map
$$\mathrm{ad} = \mathrm{Ad}_* : sp(1) \cong so(3), \qquad \frac{1}{2}(a_1 i + a_2 j + a_3 k) \mapsto a_1 f_2\wedge f_3 + a_2 f_3\wedge f_1 + a_3 f_1\wedge f_2. \tag{2.12}$$
Note that $\mathrm{Ad}(\mathrm{Spin}^c(n))m \subset m$. Hence the $spin^c(n)$-component of $\pi^*\alpha^{M,q}$ restricted to the subbundle $P_{\mathrm{Spin}^c(n)}(Z)$ (see Lemma 1.3) gives its connection $\alpha^{H,c}$. On the other hand, $\alpha^V$ induces a connection $\alpha^{V,c}$ on $P^V_{\mathrm{Spin}^c(2)}(Z)$: see (1.14). Now the direct sum connection $\alpha^{H,c}\oplus\alpha^{V,c}$ and a canonical homomorphism $\Theta^c : P_{\mathrm{Spin}^c(n)}(Z)\times P^V_{\mathrm{Spin}^c(2)}(Z) \to P_{\mathrm{Spin}^c(n+2)}(Z)$, $\Theta^c(p_z, p^V_z) = [(p_z, p^V_z), 1]$, induce a connection $\Theta^c_*(\alpha^{H,c}\oplus\alpha^{V,c})$ on $P_{\mathrm{Spin}^c(n+2)}(Z)$. Then we want to show
Lemma 2.2.
$$\alpha^{\oplus,c} = \Theta^c_*(\alpha^{H,c}\oplus\alpha^{V,c}).$$
Proof. First, we have $\alpha^{H,c} =$ "$(\pi^*\xi^q)^*(\alpha^M\oplus\alpha^{SO(3)})$ restricted" $= \xi^{s*}(\pi^*\alpha^M\oplus\alpha^H_{U(1)})$, where $\xi^s : P_{\mathrm{Spin}^c(n)}(Z) \to \pi^* P_{SO(n)}\times P^H_{U(1)}(Z)$ is a reduction of $\pi^*\xi^q : \pi^* P_{\mathrm{Spin}^q(n)} \to \pi^* P_{SO(n)}\times\pi^* P_{SO(3)}$ through (1.20) and (1.22). Let us consider then two kinds of canonical homomorphisms $\Xi^c_1 : P^V_{U(1)}(Z) \to P^V_{\mathrm{Spin}^c(2)}(Z)$ with $p \mapsto [p,1]$ (see (1.14)) and $\theta^c = \Theta^c\circ(\mathrm{id},\Xi^c_1) : P_{\mathrm{Spin}^c(n)}(Z)\times P^V_{U(1)}(Z) \to P_{\mathrm{Spin}^c(n+2)}(Z)$, where id is the identity map. Further, taking a canonical homomorphism $\mathrm{can} : \pi^* P_{SO(n)}\times P^V_{U(1)}(Z) \to P_{SO(n+2)}(Z)$ induced from (1.16) and the multiplication $\mathrm{mult} : P^H_{U(1)}(Z)\times P^V_{U(1)}(Z) \to P_{U(1)}(Z)$ (see (1.23)), we may define a map $\mathrm{can}\times\mathrm{mult} : \pi^* P_{SO(n)}\times P^H_{U(1)}(Z)\times P^V_{U(1)}(Z) \to P_{SO(n+2)}(Z)\times P_{U(1)}(Z)$ by $(\mathrm{can}\times\mathrm{mult})(p_z, p^H_z, p^V_z) = (\mathrm{can}(p_z, p^V_z), \mathrm{mult}(p^H_z, p^V_z))$. Then obviously we have $\xi^c\circ\theta^c = (\mathrm{can}\times\mathrm{mult})\circ(\xi^s,\mathrm{id})$, and hence $\Theta^c_*(\alpha^{H,c}\oplus\alpha^{V,c}) = \theta^c_*(\alpha^{H,c}\oplus\alpha^V) = \xi^{c*}(\mathrm{can}\times\mathrm{mult})_*(\pi^*\alpha^M\oplus\alpha^H_{U(1)}\oplus\alpha^V) = \alpha^{\oplus,c}$.
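3. Spin$^q$-, Spin$^c$-Vector Bundles and Dirac Operators
Let $Cl(n)$ denote the Clifford algebra associated to $\mathbb{R}^n$ with standard quadratic form and let $\circ$ denote the Clifford multiplication, i.e., $x\circ x = -\sum_{\ell=1}^n x_\ell^2$ for $x \in \mathbb{R}^n$. First let us recall the irreducible $\mathbb{C}$-representations of $Cl(n)$ and $\mathbb{C}l(n) = Cl(n)\otimes\mathbb{C}$. We take a positively oriented orthonormal basis $\{e_i\}$ of $\mathbb{R}^n$ and define a complex volume element by
$$\tau = \tau_n = \sqrt{-1}^{\,[(n+1)/2]}\,e_1\circ\cdots\circ e_n. \tag{3.1}$$
Lemma 3.1. (0) $Cl(2n)$ (respectively $\mathbb{C}l(2n)$) has a unique irreducible $\mathbb{C}$-representation;
$$r^{Cl} : Cl(2n) \to \mathrm{End}_{\mathbb{C}}(S), \tag{3.2}$$
$$r^{\mathbb{C}l} : \mathbb{C}l(2n) \to \mathrm{End}_{\mathbb{C}}(S), \qquad S = S_{2n} = \mathbb{C}^{2^n}. \tag{3.3}$$
$S$ has a $\mathbb{Z}_2$-grading $S^\pm = (1\pm r^{\mathbb{C}l}(\tau))S$ and the above representations are both $\mathbb{Z}_2$-graded.
(1) $Cl(2n+1)$ (respectively $\mathbb{C}l(2n+1)$) has two kinds of irreducible $\mathbb{C}$-representations;
$$r^{Cl\pm} : Cl(2n+1) \to \mathrm{End}_{\mathbb{C}}(S^\pm), \tag{3.4}$$
$$r^{\mathbb{C}l\pm} : \mathbb{C}l(2n+1) \to \mathrm{End}_{\mathbb{C}}(S^\pm), \qquad S^\pm = S^\pm_{2n+1} = \mathbb{C}^{2^n}, \tag{3.5}$$
with $r^{\mathbb{C}l\pm}(\tau) = \pm 1$.
For the proof and concrete constructions refer to [12], etc. In (1), we set $r^{Cl} = r^{Cl+}$, $r^{\mathbb{C}l} = r^{\mathbb{C}l+}$, $S = S_{2n+1} = S^+_{2n+1}$, and hence we have $r^{\mathbb{C}l}(\tau) = 1$.
Lemma 3.2. (cf. [6, Sect. 1]). The representation $r^{\mathbb{C}l}_n$ of $\mathbb{C}l(n)$ is $\mathbb{C}$-equivalent to a representation $r^{\mathbb{C}l\,\prime}_n$ of $\mathbb{C}l(n-2)\otimes\mathbb{C}l(2) = \mathbb{C}l(n)$ with representation space $S_{n-2}\otimes S_2$ given by
$$r^{\mathbb{C}l\,\prime}_n(e')(s\otimes t) = r^{\mathbb{C}l}_{n-2}(e')(s)\otimes r^{\mathbb{C}l}_2(\tau_2)(t), \qquad r^{\mathbb{C}l\,\prime}_n(e'')(s\otimes t) = s\otimes r^{\mathbb{C}l}_2(e'')(t),$$
where $e' \in \mathbb{R}^{n-2} \subset Cl(n-2)$ and $e'' \in \mathbb{R}^2 \subset Cl(2)$.
Let us take the standard representation
$$r_{\mathbb{C}} : U(1) \to GL_{\mathbb{C}}(\mathbb{C}) \equiv GL_{\mathbb{C}}(\mathbb{C}) \tag{3.6}$$
and the representation given by left multiplication
$$r_{\mathbb{H}} : Sp(1) \to (GL_{\mathbb{H}}(\mathbb{H}) \hookrightarrow)\ GL_{\mathbb{C}}(\mathbb{C}^2) \equiv GL_{\mathbb{C}}(\mathbb{H}), \qquad r_{\mathbb{H}}(\xi + j\eta) = \begin{pmatrix} \xi & -\bar\eta \\ \eta & \bar\xi \end{pmatrix}. \tag{3.7}$$
Lemma 3.3. (i) $\bar r_{\mathbb{C}} \ne r_{\mathbb{C}}$, $\bar r_{\mathbb{H}} = r_{\mathbb{H}}$.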
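(ii) As for the (half) spinor representations $\Delta^{(\pm)}_n =$ "$r^{Cl(\pm)}_n$ restricted" $: \mathrm{Spin}(n) \to GL_{\mathbb{C}}(S^{(\pm)})$, we have
$$(0)\quad \bar\Delta_{2n} = \Delta_{2n}, \qquad \bar\Delta^\pm_{2n} = \begin{cases} \Delta^\pm_{2n}, & n \text{ even}, \\ \Delta^\mp_{2n}, & n \text{ odd}, \end{cases}$$
$$(1)\quad \Delta^+_{2n+1} = \Delta^-_{2n+1} = \Delta_{2n+1}, \qquad \bar\Delta_{2n+1} = \Delta_{2n+1}. \tag{3.8}$$
Proof. $\bar r_{\mathbb{H}} = r_{\mathbb{H}}$ is due to the fact that $r_{\mathbb{H}}$ has a quaternionic structure; see [16] for example. We can show (ii)(0) by observing the image of $e_1\circ\cdots\circ e_{2n} \in \mathrm{Spin}(2n)$.
Definition 3.4. The spinor representations are defined by:
(0) $\Delta^{c(\pm)}_{2n} = \Delta^{(\pm)}_{2n}\otimes r_{\mathbb{C}} : \mathrm{Spin}^c(2n) \to GL_{\mathbb{C}}(S^{c(\pm)})$, $S^{c(\pm)} = S^{(\pm)}\otimes\mathbb{C}$,
$\Delta^{q(\pm)}_{2n} = \Delta^{(\pm)}_{2n}\otimes r_{\mathbb{H}} : \mathrm{Spin}^q(2n) \to GL_{\mathbb{C}}(S^{q(\pm)})$, $S^{q(\pm)} = S^{(\pm)}\otimes\mathbb{H}$,
(1) $\Delta^{c+}_{2n+1} = \Delta_{2n+1}\otimes r_{\mathbb{C}} : \mathrm{Spin}^c(2n+1) \to GL_{\mathbb{C}}(S^{c+})$, $S^{c+} = S\otimes\mathbb{C}$,
$\Delta^{c-}_{2n+1} = \Delta_{2n+1}\otimes\bar r_{\mathbb{C}} : \mathrm{Spin}^c(2n+1) \to GL_{\mathbb{C}}(S^{c-})$, $S^{c-} = S\otimes\mathbb{C}$,
$\Delta^c_{2n+1} = \Delta^{c+}_{2n+1}\oplus\Delta^{c-}_{2n+1} : \mathrm{Spin}^c(2n+1) \to GL_{\mathbb{C}}(S^c)$, $S^c = S\otimes(\mathbb{C}\oplus\mathbb{C})$,
$\Delta^q_{2n+1} = \Delta_{2n+1}\otimes r_{\mathbb{H}} : \mathrm{Spin}^q(2n+1) \to GL_{\mathbb{C}}(S^q)$, $S^q = S\otimes\mathbb{H}$.
We will use the above expressions for representations throughout the paper, but we would like to mention here that, up to $\mathbb{C}$-equivalence, there exist the following expressions and relations, which will be important in some cases:
$$\Delta^{c(\pm)}_{2n} = \text{"}r^{\mathbb{C}l(\pm)}_{2n}\text{ restricted"}, \qquad \bar\Delta^{c\pm}_{2n+1} = \Delta^{c\mp}_{2n+1}, \tag{3.9}$$
$$\bar\Delta^{q\pm}_{2n} = \begin{cases} \Delta^{q\pm}_{2n}, & n \text{ even}, \\ \Delta^{q\mp}_{2n}, & n \text{ odd}, \end{cases} \tag{3.10}$$
$$\bar\Delta^q_{2n} = \Delta^q_{2n}, \qquad \bar\Delta^q_{2n+1} = \Delta^q_{2n+1}. \tag{3.11}$$
Further we have the following expressions for $\Delta^q$ in the style of [13, Sects. 1, 2].
Lemma 3.5. Set $N = 2^{[n/2]}$. $\Delta^q$ has the following expression:
(i) If $n = 8m, 8m+1, 8m+5, 8m+6, 8m+7$, then
$$\Delta^q : \mathrm{Spin}^q(n) \to GL_{\mathbb{C}}(\mathbb{H}^N) = GL_{\mathbb{C}}(\mathbb{C}^{2N}), \qquad \Delta^q([\varphi,\lambda])(z + wj) = r^{Cl}(\varphi)(z + wj)\bar\lambda.$$
(ii) If $n = 8m+2, 8m+3, 8m+4$, then $\Delta^q$ is the complexification of
$$\Delta^q_{\mathbb{R}} : \mathrm{Spin}^q(n) \to GL_{\mathbb{R}}(\mathbb{H}^{N/2}) = GL_{\mathbb{R}}(\mathbb{R}^{2N}), \qquad \Delta^q_{\mathbb{R}}([\varphi,\lambda])(z + jw) = r^{Cl}(\varphi)(z + jw)\bar\lambda.$$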
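Remark 3.6. (ii) is just the Spin$^q$-representation given in [13]. In the case (i), we used, in [13], not $\Delta^q$ given in (i) above but the complexification $\tilde\Delta^q$ of
$$\tilde\Delta^q_{\mathbb{R}} : \mathrm{Spin}^q(n) \to GL_{\mathbb{R}}(\mathbb{H}^N) = GL_{\mathbb{R}}(\mathbb{R}^{4N}), \qquad \tilde\Delta^q_{\mathbb{R}}([\varphi,\lambda])(z + jw) = \tilde r^{Cl}(\varphi)(z + jw)\bar\lambda,$$
where $\tilde r^{Cl} : Cl(n) \to GL_{\mathbb{H}}(\mathbb{H}^N)$ ($= \tilde r^{Cl+}$ if $n$ is odd) is the $\mathbb{H}$-representation given by left multiplication. The relation is obviously $\tilde\Delta^q = 2\Delta^q$.
Proof. The representation $r'_{\mathbb{H}} : Sp(1) \to GL_{\mathbb{C}}(\mathbb{H}) = GL_{\mathbb{C}}(\mathbb{C}^2)$ given by $r'_{\mathbb{H}}(\lambda)(z + wj) = (z + wj)\bar\lambda$ is $\mathbb{C}$-equivalent to $r_{\mathbb{H}}$. Since the expression in (i) is obviously $\mathbb{C}$-equivalent to $\Delta_n\otimes r'_{\mathbb{H}}$, (i) is true. (ii) can be shown easily by the same argument as that of [13, Proof of Theorem 5.1 for $n = 4 + 8m$].
Now, Spin$^q$-, Spin$^c$-vector bundles over $M$, $Z$ are defined by
$$\mathbb{S}^q = P_{\mathrm{Spin}^q(n)}\times_{\Delta^q} S^q, \qquad \mathbb{S}^c = P_{\mathrm{Spin}^c(n+2)}(Z)\times_{\Delta^c} S^c. \tag{3.12}$$
We denote by $\nabla^{M,q}$, $\nabla^{Z,c}$ the covariant derivatives on (3.12) induced from $\alpha^{M,q}$ and $\alpha^{Z,c}$. Dirac operators $D^q$, $D^c$ on $\Gamma(\mathbb{S}^q)$, $\Gamma(\mathbb{S}^c)$ are defined by
$$D^q = \sum_a e'_a\circ\nabla^{M,q}_{e'_a}, \qquad D^c = \sum_a e_a\circ\nabla^{Z,c}_{e_a}, \tag{3.13}$$
where $\{e'_1,\cdots,e'_n\}$, $\{e_1,\cdots,e_{n+2}\}$ are orthonormal bases of $T_xM$ with $g^M$, $T_zZ$ with $g^Z$, respectively.
Proposition 3.7 (Lichnerowicz formulas). Set $N = 2^{[n/2]}$ and denote by $1_N$ the $N\times N$-identity matrix. Then we have
$$(D^q)^2 = (\nabla^{M,q})^*\nabla^{M,q} + \frac{1}{4}\kappa_M + \frac{1}{2}\,1_N\otimes\left\{\frac{1}{\sqrt{-1}}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\Omega_{23} + \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\Omega_{31} + \sqrt{-1}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\Omega_{12}\right\}, \tag{3.14}$$
$$(D^c)^2 = (\nabla^{Z,c})^*\nabla^{Z,c} + \frac{1}{4}\kappa_Z + \frac{1}{2}\Omega_{U(1)}. \tag{3.15}$$
Proof (cf. [13, (4.30)]). Let $\Omega^{Sp(1)}$ denote the curvature 2-form of the locally defined vector bundle $\mathbb{H} = P_{\mathrm{Spin}^q(n)}\times_{r_{\mathbb{H}}}\mathbb{H}$ with connection $\xi^{q*}_1\alpha^{SO(3)}$. Then, as is well-known (see [4, Sect. 3.5] for example), we have
$$(D^q)^2 = (\nabla^{M,q})^*\nabla^{M,q} + \frac{1}{4}\kappa_M + \sum_{a<b}\Omega^{Sp(1)}(e'_a,e'_b)\,e'_a\circ e'_b\circ. \tag{3.16}$$
Through the isomorphism (2.12), obviously we have
$$\Omega^{Sp(1)} = -\frac{1}{2}\big(i\,\Omega_{23} + j\,\Omega_{31} + k\,\Omega_{12}\big). \tag{3.17}$$
Taking its image by $r_{\mathbb{H}*}$, we get thus the last term in (3.14). Next, similarly, since the curvature 2-form of $(P_{U(1)}(Z))^{1/2}$ is $\Omega_{U(1)}/2$, we have (3.15).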
Finally we give the proof of Lemma 2.1(6) by investigating the Chern character of an index bundle of a family of Dirac operators. Set
\[ \mathbf{S}^{c(\pm)}_V = P^V_{Spin^c(2)}(Z) \times_{\Delta^{c(\pm)}} S_2^{c(\pm)}, \tag{3.18} \]
which possesses a covariant derivative $\nabla^{V,c(\pm)}$ induced from $\alpha^{V,c}$. In the same manner as in (3.12) and (3.13), we may define a family of Dirac operators
\[ D^V = \{ D^V_x \mid x \in M \}, \qquad D^V_x = \sum_i e''_i \circ \nabla^{V,c}_{e''_i} : \Gamma(\mathbf{S}^{c(\pm)}_{V,(x)}) \to \Gamma(\mathbf{S}^{c(\mp)}_{V,(x)}), \tag{3.19} \]
where $\{e''_1, e''_2\}$ is an orthonormal basis of $V$ and $\mathbf{S}^c_{V,(x)}$ is the restriction of $\mathbf{S}^c_V$ to the fibre $\pi^{-1}(x)$.
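Proof of Lemma 2.1(6). It is well-known that we can identify:
\[ \mathbf{S}^c_{V,(x)} = \mathbf{S}^{c+}_{V,(x)} \oplus \mathbf{S}^{c-}_{V,(x)} = \wedge^{0,*}(T^*_{\mathbb{C}}\mathbb{C}P^1) = \wedge^{0,0}(T^*_{\mathbb{C}}\mathbb{C}P^1) \oplus \wedge^{0,1}(T^*_{\mathbb{C}}\mathbb{C}P^1), \]
\[ D^V = \begin{pmatrix} 0 & D^V_- \\ D^V_+ & 0 \end{pmatrix} = 2(\bar\partial + \bar\partial^*) = \begin{pmatrix} 0 & 2\bar\partial^* \\ 2\bar\partial & 0 \end{pmatrix}. \tag{3.20} \]
Thus we have $\dim \operatorname{Ker} D^V_+ = 1$, $\dim \operatorname{Ker} D^V_- = 0$ and the index bundle $\operatorname{Ind} D^V = \coprod_{x \in M} \operatorname{Ker} D^V_x$ is a trivial line bundle over $M$. Hence the Chern character of the bundle is equal to 1, i.e., $\operatorname{ch} \operatorname{Ind} D^V = \exp c_1(\operatorname{Ind} D^V) = 1$ in $H^{2*}_{DR}(M)$. On the other hand, the Atiyah-Singer family index theorem ([4, Corollary 10.24]) asserts
\[ \operatorname{ch} \operatorname{Ind} D^V = \int_{Z/M} \hat A(\Omega^V) \exp\Bigl( \frac{1}{2} c_1(\Omega^V) \Bigr) \quad \text{in } H^{2*}_{DR}(M). \tag{3.21} \]
Thus we get the equality (2.5).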
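4. The Indices of Dirac Operators

Let us assume here that $(M, g^M)$ is compact and of dimension $2n$. Using the complex volume elements of $M$, $Z$,
\[ \tau^M = (\sqrt{-1})^{n}\, e'_1 \circ \cdots \circ e'_{2n}, \qquad \tau^Z = (\sqrt{-1})^{n+1}\, e_1 \circ \cdots \circ e_{2n+2}, \tag{4.1} \]
we have the splittings
\[ \mathbf{S}^q = \mathbf{S}^{q+} \oplus \mathbf{S}^{q-}, \qquad \mathbf{S}^{q\pm} = (1 \pm \tau^M)\,\mathbf{S}^q = P_{Spin^q(2n)} \times_{\Delta^{q\pm}} S^{q\pm}, \]
\[ \mathbf{S}^c = \mathbf{S}^{c+} \oplus \mathbf{S}^{c-}, \qquad \mathbf{S}^{c\pm} = (1 \pm \tau^Z)\,\mathbf{S}^c = P_{Spin^c(2n+2)}(Z) \times_{\Delta^{c\pm}} S^{c\pm}. \tag{4.2} \]
Accordingly the Dirac operators can be written as
\[ D^q = \begin{pmatrix} 0 & D^{q-} \\ D^{q+} & 0 \end{pmatrix}, \qquad D^c = \begin{pmatrix} 0 & D^{c-} \\ D^{c+} & 0 \end{pmatrix}. \tag{4.3} \]
Note that, in [13, Sect. 5], $e'_1 \circ \cdots \circ e'_{2n}$ was used as a (complex) volume element $\tau^M$ to make such a splitting of $\mathbf{S}^q$ as in (4.2), so that, in the case $2n = 8m + 4$, $\mathbf{S}^{q\pm}$ and $D^{q\pm}$ in (4.2) and (4.3) are equal to $\mathbf{S}^{\mp}$ and $D^{\mp}$ in [13, Sect. 5].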
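Theorem 4.1. Let $\hat A(\,\cdot\,)$ be the $\hat A$-class and $p_1(\,\cdot\,)$ be the first Pontryagin class. Then we have
\[ \operatorname{ind} D^{q+} = 2\int_M \hat A(M) \cosh\Bigl( \frac{1}{2}\, p_1(P_{SO(3)})^{1/2} \Bigr) \quad (= 0 \text{ if } n \text{ is odd}), \tag{4.4} \]
\[ \operatorname{ind} D^{c+} = \int_Z \hat A(Z) \exp\Bigl( \frac{1}{2}\, c_1(P_{U(1)}(Z)) \Bigr). \tag{4.5} \]

Proof (cf. [13, Sect. 5]). Let us show (4.4) first. According to the Atiyah-Singer index theorem, it suffices to calculate the Chern character of the (locally defined) $\mathbf{H}$ given in the proof of Proposition 3.7. Its curvature 2-form $\Omega^{Sp(1)}$ is given by
\[ \frac{\sqrt{-1}}{2\pi}\,\Omega^{Sp(1)} = \frac{1}{4\pi}\begin{pmatrix} \Omega_{23} & -\Omega_{12} + \sqrt{-1}\,\Omega_{31} \\ -\Omega_{12} - \sqrt{-1}\,\Omega_{31} & -\Omega_{23} \end{pmatrix}. \tag{4.6} \]
Hence we have
\[ \det\Bigl( t\,1_2 - \frac{\sqrt{-1}}{2\pi}\,\Omega^{Sp(1)} \Bigr) = t^2 - \frac{1}{(4\pi)^2}\bigl( \Omega_{12}^2 + \Omega_{23}^2 + \Omega_{31}^2 \bigr) = t^2 - \frac{1}{4}\, p_1(P_{SO(3)}) \equiv (t - x)(t + x). \tag{4.7} \]
Thus we have
\[ \operatorname{ch}(\mathbf{H}) = e^{x} + e^{-x} = 2\cosh\Bigl( \frac{1}{2}\, p_1(P_{SO(3)})^{1/2} \Bigr). \tag{4.8} \]
The formula (4.5) is obvious.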
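The step from (4.7) to (4.8) can be made concrete by expanding $e^x + e^{-x}$ with $x^2 = p_1/4$ as a power series in $p_1$. The following is a minimal sketch in plain Python, not part of the original argument; the function name `ch_coefficients` is hypothetical, and $p_1$ is treated as a formal variable.

```python
from fractions import Fraction
from math import factorial

def ch_coefficients(max_power):
    """Coefficients a_k in ch(H) = sum_k a_k * p1**k, from e^x + e^(-x) with x^2 = p1/4."""
    # e^x + e^(-x) = 2 * sum_k x^(2k) / (2k)!  and  x^(2k) = (p1/4)^k
    return [2 * Fraction(1, 4) ** k / factorial(2 * k) for k in range(max_power + 1)]

coeffs = ch_coefficients(3)
for k, a in enumerate(coeffs):
    print(f"p1^{k}: {a}")
```

The leading coefficient is the rank 2 of the bundle, and only integer powers of $p_1$ occur, so the square root in (4.8) is purely notational.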
5. Reduced η-Invariants of Dirac Operators and the Adiabatic Limit

Let us assume here that $(M, g^M)$ is compact and of dimension $2n - 1$. We consider the Lie group $Spin^{qc}(2n+1) = Spin(2n+1) \times_{\mathbb{Z}_2} Sp(1) \times_{\mathbb{Z}_2} U(1)$ and set
\[ P_{Spin^{qc}(2n+1)}(Z) = \pi^* P_{Spin^q(2n-1)} \times P^V_{Spin^c(2)}(Z) \times_{mult} Spin^{qc}(2n+1), \qquad P_{SO(3)\times U(1)}(Z) = \pi^* P_{SO(3)} \times P^V_{U(1)}(Z), \tag{5.1} \]
which induce canonically a, so to say, $Spin^{qc}$-structure of $P_{SO(2n+1)}(Z)$,
\[ \xi^{qc} : P_{Spin^{qc}(2n+1)}(Z) \to P_{SO(2n+1)}(Z) \times P_{SO(3)\times U(1)}(Z). \tag{5.2} \]
Let us take a connection $\alpha^{SO(3)\times U(1)} = \alpha^{SO(3)} \times \alpha^{U(1)}$ on $P_{SO(3)\times U(1)}(Z)$. Then we can define two kinds of connections on $P_{Spin^{qc}(2n+1)}(Z)$ as
\[ \alpha^{Z,qc} = \xi^{qc*}(\alpha^Z \oplus \alpha^{SO(3)\times U(1)}), \qquad \alpha^{\oplus,qc} = \xi^{qc*}(\alpha^{\oplus} \oplus \alpha^{SO(3)\times U(1)}), \tag{5.3} \]
which induce covariant derivatives $\nabla^{Z,qc}$, $\nabla^{\oplus,qc}$ on
\[ \mathbf{S}^{qc} \equiv \pi^*\mathbf{S}^q \otimes \mathbf{S}^c_V = P_{Spin^{qc}(2n+1)}(Z) \times_{can} S^q_{2n-1} \otimes S^c_2. \tag{5.4} \]
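In the same way as Lemma 2.2, it is easily shown that we have $\alpha^{\oplus,qc} = \Theta_{qc}^*(\pi^*\alpha^{M,q} \oplus \alpha^{V,c})$, where $\Theta_{qc} : \pi^*P_{Spin^q(2n-1)} \times P^V_{Spin^c(2)}(Z) \to P_{Spin^{qc}(2n+1)}(Z)$ is a canonical homomorphism given by $\Theta_{qc}(p_z, p^V_z) = [(p_z, p^V_z), 1]$. We have hence $\nabla^{\oplus,qc} = \pi^*\nabla^{M,q} \otimes 1 + 1 \otimes \nabla^{V,c}$. Recall that we have the reduction embedding (1.20) and we can identify $\Delta^q|Spin^c(2n-1) = \Delta^{c+} \oplus \Delta^{c-} = \Delta^c : Spin^c(2n-1) \to GL_{\mathbb{C}}(S^q) = GL_{\mathbb{C}}(S \otimes (\mathbb{C} \oplus \mathbb{C})) = GL_{\mathbb{C}}(S^c)$. If we set
\[ \mathbf{S}^{c(\pm)}_H = P_{Spin^c(2n-1)}(Z) \times_{\Delta^{c(\pm)}} S^{c(\pm)}_{2n-1}, \tag{5.5} \]
then there exists thus an isomorphism
\[ \pi^*\mathbf{S}^q \cong \mathbf{S}^c_H \tag{5.6} \]
as $Cl(H)$-module bundles. Notice that, in general, the decomposition $\mathbf{S}^c_H = \mathbf{S}^{c+}_H \oplus \mathbf{S}^{c-}_H$ is twisted along the fibres, so that it does not induce a decomposition of $\mathbf{S}^q$. We can now identify
\[ \mathbf{S}^{qc} \cong \mathbf{S}^c_H \otimes \mathbf{S}^c_V \tag{5.7} \]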
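as $Cl(TZ)$-module bundles, where we assume that the actions of $Cl(TZ)$ on $\mathbf{S}^{qc}$, $\mathbf{S}^c_H \otimes \mathbf{S}^c_V$ are induced from the representation $r'^{\,Cl}_{2n+1}$ given in Lemma 3.2. Hence (5.4) has also the covariant derivatives $\nabla^{Z,c}$, $\nabla^{\oplus,c} = \nabla^{H,c} \otimes 1 + 1 \otimes \nabla^{V,c}$ associated to (2.11) (see Lemma 2.2), where $\nabla^{H,c}$ is associated to $\alpha^{H,c}$. It is noteworthy that the covariant derivatives $\nabla^{Z,qc}$, $\nabla^{\oplus,qc}$ may not be reduced to $\nabla^{Z,c}$, $\nabla^{\oplus,c}$.

In this section, our interest lies in the Dirac operators, $D^q$ for $\mathbf{S}^q$, $D^{qc} = \sum e_a \circ \nabla^{Z,qc}_{e_a}$ for $\mathbf{S}^{qc}$ (not $D^c$ in (3.13)) and its adiabatic version $D^{qc}_\varepsilon$ associated to the metric
\[ g^Z_\varepsilon = \varepsilon^{-1}\pi^* g^M + g^V. \tag{5.8} \]
For such Dirac operators $D$, the η-functions are defined by
\[ \eta(D)(s) = \frac{1}{\Gamma((s+1)/2)} \int_0^\infty t^{(s-1)/2}\, \operatorname{Tr}\bigl( D e^{-tD^2} \bigr)\, dt, \qquad \operatorname{Re} s \gg 0. \tag{5.9} \]
By analytic continuation to the whole complex plane we obtain meromorphic functions, which are regular at $s = 0$ ([2]). The η-invariants and the reduced η-invariants are then defined by
\[ \eta(D) = \eta(D)(0) = \frac{1}{\sqrt{\pi}} \int_0^\infty t^{-1/2}\, \operatorname{Tr}\bigl( D e^{-tD^2} \bigr)\, dt, \qquad \bar\eta(D) = \frac{1}{2}\{ \dim \operatorname{Ker} D + \eta(D) \}. \tag{5.10} \]
Note that $\operatorname{Tr}(D e^{-tD^2}) = O(t^{1/2})$ as $t \to 0$ ([7, (2.13)]) so that the integral expressions for $\eta(D)$ are well-defined. The purpose of this section is to prove

Theorem 5.1. The (adiabatic) limit $\lim_{\varepsilon\to 0} \bar\eta(D^{qc}_\varepsilon)$ exists in $\mathbb{R}/\mathbb{Z}$ and there is an odd degree form $\tilde\eta$ on $M$ such that
\[ \lim_{\varepsilon\to 0} \bar\eta(D^{qc}_\varepsilon) - \bar\eta(D^q) \equiv 2\int_M \hat A(\Omega^M) \cosh\Bigl( \frac{1}{2}\, p_1(\Omega^{SO(3)})^{1/2} \Bigr) \wedge \tilde\eta \pmod{\mathbb{Z}}, \tag{5.11} \]
\[ d\tilde\eta = \int_{Z/M} \hat A(\Omega^V) \exp\Bigl( \frac{1}{2}\, c_1(\Omega^V) \Bigr) - 1. \tag{5.12} \]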
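For orientation only, the definition of the reduced η-invariant can be tried on a finite-dimensional toy model, where $\eta(D)(0)$ is simply the number of positive eigenvalues minus the number of negative ones. The sketch below is an illustration under that assumption and is not part of the paper's argument; the names `reduced_eta` and `family` are hypothetical. It shows why such invariants are naturally compared modulo $\mathbb{Z}$: when one eigenvalue crosses zero, the reduced η-invariant changes by an integer.

```python
from fractions import Fraction

def reduced_eta(eigenvalues):
    """Reduced eta-invariant (dim Ker D + eta(D)) / 2 of a finite self-adjoint matrix D."""
    positive = sum(1 for lam in eigenvalues if lam > 0)
    negative = sum(1 for lam in eigenvalues if lam < 0)
    kernel = sum(1 for lam in eigenvalues if lam == 0)
    eta = positive - negative  # eta(D)(0) = sum of sign(lambda) over nonzero eigenvalues
    return Fraction(kernel + eta, 2)

def family(t):
    # Hypothetical one-parameter family of spectra: the eigenvalue t crosses zero.
    return [-2, -1, t, 3]

values = [reduced_eta(family(t)) for t in (1, 0, -1)]
jumps = [b - a for a, b in zip(values, values[1:])]
print(values, jumps)
```

In this toy family every jump is an integer, so the class of the reduced η-invariant in $\mathbb{R}/\mathbb{Z}$ does not see the crossing.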
Remark 5.2. In general: (1) $\hat A(\Omega^M)$, etc., should not be replaced by the cohomology class $\hat A(M)$, etc., in contrast to Theorem 4.1. (2) In a similar way we can show $\lim_{\varepsilon\to 0} \bar\eta(D^c_\varepsilon) = \int_M \hat A(\Omega^M) \wedge \tilde\eta'$, in which unfortunately no terms related to $\bar\eta(D^q)$ appear. This will be discussed in detail elsewhere.

In general it is quite hard to calculate (reduced) η-invariants explicitly because of the analytic continuation and, more than anything else, because they are not locally computable, that is, they cannot be obtained by integrating any differential forms which are given by canonical expressions derived from the symbols of Dirac operators. It follows, however, from (5.11) and the definition of the form $\tilde\eta$ (see (5.24) and (5.29)) that the difference $\lim_{\varepsilon\to 0} \bar\eta(D^{qc}_\varepsilon) - \bar\eta(D^q)$ modulo $\mathbb{Z}$ is locally computable at least in the base space direction. The idea of extracting such a partially locally computable value by taking the adiabatic limit (i.e., by killing an extra local term) is originally due to Witten [18, IV] (for some fibration over a circle). His work has been given full mathematical treatment by Bismut-Freed [7] and Cheeger [9], and was extended by Bismut-Cheeger [6] and Dai [10] to the case in which the base space of the fibration is a compact Spin manifold. The argument in this section follows essentially the lines of [6, Sect. 4] and [10, Theorem 0.1′], but requires some care because the interest of [6, 10] lies not in $Spin^q$ nor in $Spin^c$ but in Spin.

We take a positively oriented local orthonormal frame $(e_1, \dots, e_{2n-1}) = (e'_1, \dots, e'_{2n-1})$ of $TM$ as before and denote its lift to $H$ by the same symbol. Further let $(e_{2n}, e_{2n+1}) = (e''_1, e''_2)$ be such a local frame of $V$. Then (2.3) yields
\[ \nabla^{Z,qc}_{e_a} - \nabla^{\oplus,qc}_{e_a} = \frac{1}{4} \sum g^Z(S(e_a)e_b, e_c)\, e_b \circ e_c \circ. \tag{5.13} \]
Notice that the skew-endomorphism $S(e_a)$ of $TZ$ is lifted (by the map $\xi$) to the right side which acts on $\mathbf{S}^{qc}$: see [4, Proposition 3.7].
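On the other hand, (5.7) induces the tensor product expression
\[ \Gamma(\mathbf{S}^{qc}) = \pi^*\Gamma(\mathbf{S}^q) \otimes \Gamma(\mathbf{S}^c_V). \tag{5.14} \]
First we want to express $D^{qc}$ (acting on the left side), in terms of operators acting on the right side, $\nabla^{\oplus,qc}$ and
\[ D^V = 1 \otimes D^V = 1 \otimes \sum_i e''_i \circ \nabla^{V,c}_{e''_i}, \qquad c(T) = \sum_{i \le j} e'_i \circ e'_j \circ c(T)(e'_i, e'_j) = \sum_{i \le j} e'_i \circ e'_j \circ \sum_k g^Z(T(e'_i, e'_j), e''_k)\, e''_k \circ. \tag{5.15} \]
Here $c(T)$ is the quantization of the torsion tensor $T$ of $\nabla^{\oplus}$ and is thus a global cross-section of $Cl(TZ)$.

Lemma 5.3 (cf. [6, (4.26)]).
\[ D^{qc} = \sum_i e'_i \circ \nabla^{\oplus,qc}_{e'_i} + D^V - \frac{1}{4}\, c(T). \]

Proof. (5.13) and Lemma 2.1(3) imply that $D^{qc}$ is equal to
\[ \sum e_a \circ \nabla^{\oplus,qc}_{e_a} + \frac{1}{4} \sum g^Z(S(e_a)e_b, e_c)\, e_a \circ e_b \circ e_c \circ = \sum e_a \circ \nabla^{\oplus,qc}_{e_a} - \frac{1}{8} \sum g^Z(T(e'_i, e'_j), e''_k)\, e''_k \circ e'_i \circ e'_j \circ. \]
Hence, using the fact $\nabla^{\oplus,qc}_{e''_i} = 1 \otimes \nabla^{V,c}_{e''_i}$, we obtain the lemma.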
Let us then find a similar expression for $D^{qc}_\varepsilon$. We have fixed the local frame $(e_1, \dots, e_{2n+1})$ for $g^Z$, and, accordingly, we may take the frame $(\varepsilon^{1/2}e_1, \dots, \varepsilon^{1/2}e_{2n-1}, e_{2n}, e_{2n+1})$ for $g^Z_\varepsilon$. By identifying the two frames, the two reduced structure bundles $P_{SO(2n+1)}(Z) = P_{SO(2n+1)}(Z, g^Z)$ and $P_{SO(2n+1)}(Z, g^Z_\varepsilon)$ may be naturally identified. The identification canonically identifies $Cl(TZ) = Cl(TZ, g^Z)$ and $Cl(TZ, g^Z_\varepsilon)$, and, moreover, identifies $\mathbf{S}^{qc} = \mathbf{S}^{qc}(Z, g^Z)$ and $\mathbf{S}^{qc}(Z, g^Z_\varepsilon)$. Through the identifications, let us regard $D^{qc}_\varepsilon$ as acting on $\Gamma(\mathbf{S}^{qc})$: see [6, Sect. 1]. Thus both $D^{qc}$ and $D^{qc}_\varepsilon$ act on the same object $\Gamma(\mathbf{S}^{qc})$ and it is then easy to get a similar expression for $D^{qc}_\varepsilon$. Namely, in the expression for $D^{qc}$, we have only to replace as follows:
\[ g^Z \Rightarrow g^Z_\varepsilon, \qquad e'_i \Rightarrow \varepsilon^{1/2} e'_i \qquad (e'_i\circ,\ e''_i,\ e''_i\circ \text{ are left unchanged}). \tag{5.16} \]
We obtain thus the expression
\[ D^{qc}_\varepsilon = \varepsilon^{1/2} \sum_i e'_i \circ \nabla^{\oplus,qc}_{e'_i} + D^V - \frac{\varepsilon}{4}\, c(T). \tag{5.17} \]
Now, using this, we want to calculate $\lim_{\varepsilon\to 0} \bar\eta(D^{qc}_\varepsilon)$ in the same way as [6, Sect. 4] and [10, Theorem 0.1′]. Notice that $D^V$ is not invertible: see (3.20). Hence we cannot apply [6, Sect. 4(b)] directly. However, fortunately, since $\operatorname{Ind} D^V$ forms a trivial line bundle over $M$, we can apply the same arguments as in [6, Sect. 4(d)] and [10] to our case.

First, let us define an infinite dimensional vector bundle $H_\infty = H^+_\infty \oplus H^-_\infty$ over $M$ by
\[ H^{\pm}_{\infty,x} = \Gamma(\mathbf{S}^{c\pm}_{V,(x)}). \tag{5.18} \]
We have an obvious functorial isomorphism
\[ \Gamma(\mathbf{S}^c_V) \cong \Gamma(H_\infty), \qquad \psi \leftrightarrow \tilde\psi, \qquad \tilde\psi(x) = \bigl( \pi^{-1}(x) \ni z \mapsto \psi(z) \bigr), \tag{5.19} \]
which defines a hermitian inner product $(\ ,\ )$ on $H_\infty$ by
\[ (\tilde\psi_1, \tilde\psi_2)_x = \int_{\pi^{-1}(x)} (\psi_1, \psi_2)_{\mathbf{S}^c_V}\, \mathrm{vol}_{V,x}, \tag{5.20} \]
where $\mathrm{vol}_{V,x}$ is the volume element of $\pi^{-1}(x)$ with metric $g^V$. Now, similarly to [6, Definition 4.29], we will take the Levi-Civita superconnection (or the Bismut superconnection) on $H_\infty$,
\[ A_u = \tilde\nabla^{V,c} + u^{1/2} D^V - \frac{1}{4u^{1/2}}\, \hat c(T), \qquad u > 0, \tag{5.21} \]
where we set $\tilde\nabla^{V,c}_{e'_i}\tilde\psi = (\nabla^{V,c}_{e'_i}\psi)^{\sim}$ and $\hat c(T) = \sum_{i \le j} e'^{*}_i \wedge e'^{*}_j \otimes c(T)(e'_i, e'_j)$. Note that this is unitary with respect to the hermitian inner product (5.20). Further, the orthogonal projection of $\tilde\nabla^{V,c}$ to the subbundle $\operatorname{Ind} D^V$, which is trivial, gives its connection $\nabla^{Ind}$, which obviously equals the exterior differential on $M$. Let us consider now the renormalized Chern character form of $A_u$ (or the fiberwise supertrace of $e^{-A_u^2}$), $\operatorname{str}(e^{-A_u^2}) = \operatorname{tr}(e^{-A_u^2}|H^+_\infty) - \operatorname{tr}(e^{-A_u^2}|H^-_\infty)$. Then we have

Lemma 5.4 (cf. [4, Corollary 9.22 and Theorem 10.23], etc.).
\[ \lim_{u\to\infty} \operatorname{str} e^{-A_u^2} = 1, \qquad \lim_{u\to 0} \operatorname{str} e^{-A_u^2} = (2\pi\sqrt{-1})^{-1} \int_{Z/M} \hat A(2\pi\sqrt{-1}\,\Omega^V) \exp\Bigl( \frac{1}{2}\, c_1(2\pi\sqrt{-1}\,\Omega^V) \Bigr). \]
Proof. Since the curvature $\Omega^{Ind}$ of $\nabla^{Ind}$ vanishes, we have $\lim_{u\to\infty} \operatorname{str}(e^{-A_u^2}) = \exp(-\Omega^{Ind}) = 1$.
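Next, let us consider an obvious functorial isomorphism
\[ \pi^*\Gamma(\mathbf{S}^q) \otimes \Gamma(\mathbf{S}^c_V) \cong \Gamma(\mathbf{S}^q) \otimes \Gamma(H_\infty), \qquad \pi^*\phi \otimes \psi \leftrightarrow \phi \otimes \tilde\psi. \tag{5.22} \]
The superconnection
\[ B_u = \nabla^{M,q} \otimes 1 + 1 \otimes A_u \tag{5.23} \]
on $\mathbf{S}^q \otimes H_\infty$ produces a Dirac operator which is equal to $u^{1/2} D^{qc}_{1/u}$ through the identifications (5.14) and (5.22). We set
\[ \hat\eta(u) = \operatorname{str}\Bigl( \Bigl( D^V + \frac{\hat c(T)}{4u} \Bigr) e^{-A_u^2} \Bigr) = \sum [\hat\eta(u)]_{2j-1}, \qquad \tilde\eta(u) = \sum \frac{1}{(2\pi\sqrt{-1})^{j}}\, [\hat\eta(u)]_{2j-1}, \tag{5.24} \]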
where $[\hat\eta(u)]_{2j-1}$ is the homogeneous component of $\hat\eta(u)$ of degree $2j - 1$. It follows from [4, Theorems 9.23 and 10.32(1)] that we have uniform convergence
\[ \hat\eta(u) = \begin{cases} O(u^{-1}), & u \to \infty, \\ O(1), & u \to 0. \end{cases} \tag{5.25} \]
Moreover, in the same way as the proof of [6, (4.40)] ($du/2u^{1/2}$ of which should be removed), we obtain

Lemma 5.5. We have uniform convergence as $\varepsilon \to 0$,
\[ \operatorname{Tr}\bigl( D^{qc}_\varepsilon e^{-u(D^{qc}_\varepsilon)^2} \bigr) = 2\sqrt{\pi} \int_M \hat A(\Omega^M) \cosh\Bigl( \frac{1}{2}\, p_1(\Omega^{SO(3)})^{1/2} \Bigr) \wedge \tilde\eta(u) + O\bigl( \varepsilon^{1/2}(1 + u^N) \bigr) \]
for some $N$.

Proof. First of all, note that here we use the usual (unrenormalized) characteristic forms $\hat A(\Omega^M)$, etc., while in [6] the renormalized ones $\hat A(2\pi\sqrt{-1}\,\Omega^M)$, etc. are used. To prove the lemma, it will suffice to indicate only one point: In [6] Bismut and Cheeger take a Spin manifold $B$ and consider a bundle $S(B) \otimes H_\infty$, but here we take a $Spin^q$ manifold $M$ and consider the bundle $\mathbf{S}^q \otimes H_\infty$. The difference makes us modify [6, (4.69)] into
\[ -\Bigl( \partial_\alpha + \frac{1}{8}\, g^M\bigl( \Omega^M(f_\alpha, f_\beta) f_\gamma, f_\delta \bigr)\, y^\beta\, dy^\gamma\, dy^\delta \Bigr)^2 + \Omega^{Sp(1)}, \tag{5.26} \]
where the first term is just the same one as in [6, (4.69)] ($\{y^\alpha\}$ is a normal coordinate system around a point $y_0 \in M$ and $\{f_\alpha\}$ is the parallel transport of $\{\partial_\alpha = \partial/\partial y^\alpha\}$ at $y_0$ along the geodesics) and $\Omega^{Sp(1)}$ is the curvature 2-form given in the proof of Proposition 3.7. Accordingly, [6, (4.74)] (the number $\ell$ of which should be replaced by $k$) is changed into
\[ \lim_{\varepsilon\to 0} \frac{1}{2\sqrt{\pi}\, u^{1/2}} \operatorname{Tr}\bigl( D^{qc}_\varepsilon e^{-u(D^{qc}_\varepsilon)^2} \bigr) = \frac{2}{2u^{1/2}} \int_M \hat A(\Omega^M) \cosh\Bigl( \frac{1}{2}\, p_1(\Omega^{SO(3)})^{1/2} \Bigr) \wedge \tilde\eta(u). \tag{5.27} \]
Note that $\Omega^{Sp(1)}$ produces $2\cosh(p_1(\Omega^{SO(3)})^{1/2}/2)$: see (4.8).
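Now we can prove Theorem 5.1.

Proof of Theorem 5.1. First it follows from [6, Sect. 4(d)] and the argument in [10] that the limit $\lim_{\varepsilon\to 0} \bar\eta(D^{qc}_\varepsilon)$ exists in $\mathbb{R}/\mathbb{Z}$. We have then
\[ \lim_{\varepsilon\to 0} \bar\eta(D^{qc}_\varepsilon) - \bar\eta(D^q) = \lim_{\varepsilon\to 0} \bar\eta(D^{qc}_\varepsilon) - \bar\eta(D^q \otimes \operatorname{Ind} D^V) \equiv 2\int_M \hat A(\Omega^M) \cosh\Bigl( \frac{1}{2}\, p_1(\Omega^{SO(3)})^{1/2} \Bigr) \wedge \tilde\eta, \tag{5.28} \]
where we set $D^q \otimes \operatorname{Ind} D^V = \sum e'_i \circ (\nabla^{M,q}_{e'_i} \otimes 1 + 1 \otimes \nabla^{Ind}_{e'_i})$ acting on $\Gamma(\mathbf{S}^q \otimes \operatorname{Ind} D^V)$ and
\[ \tilde\eta = \int_0^\infty \tilde\eta(u)\, \frac{du}{2u^{1/2}}, \tag{5.29} \]
which is obviously convergent because of (5.25). The fact that $\operatorname{Ind} D^V$ is a trivial line bundle with trivial connection implies the first equality. Lemma 5.5 and the same argument as in the proof of [10, Theorem 0.1′] (the term $h/2$ of which should be removed) imply the second one. Notice that the term $2\int_M \hat A(\Omega^M) \cosh(p_1(\Omega^{SO(3)})^{1/2}/2) \wedge \tilde\eta$ is obtained by calculating $(2\sqrt{\pi})^{-1} \int_0^\infty t^{-1/2} \lim_{\varepsilon\to 0} \operatorname{Tr}(D^{qc}_\varepsilon \exp(-t(D^{qc}_\varepsilon)^2))\, dt$: see (5.10).

Next, let us calculate the exterior derivative of $\tilde\eta$. We set $\hat\eta = \int_0^\infty (\hat\eta(u)/2u^{1/2})\, du$. Then the transgression formula ([4]) for the superconnection $A_u$,
\[ \frac{\partial}{\partial u} \operatorname{str} e^{-A_u^2} = -d\, \frac{1}{2u^{1/2}} \operatorname{str}\Bigl( \Bigl( D^V + \frac{\hat c(T)}{4u} \Bigr) e^{-A_u^2} \Bigr) = -d\, \frac{\hat\eta(u)}{2u^{1/2}}, \tag{5.30} \]
and Lemma 5.4 imply
\[ d\hat\eta = d \int_0^\infty \hat\eta(u)\, \frac{du}{2u^{1/2}} = \lim_{u\to 0} \operatorname{str} e^{-A_u^2} - \lim_{u\to\infty} \operatorname{str} e^{-A_u^2} = (2\pi\sqrt{-1})^{-1} \int_{Z/M} \hat A(2\pi\sqrt{-1}\,\Omega^V) \exp\Bigl( \frac{1}{2}\, c_1(2\pi\sqrt{-1}\,\Omega^V) \Bigr) - 1. \tag{5.31} \]
Hence we have
\[ d\tilde\eta = \sum \frac{1}{(2\pi\sqrt{-1})^{j}}\, d[\hat\eta]_{2j-1} = \int_{Z/M} \hat A(\Omega^V) \exp\Bigl( \frac{1}{2}\, c_1(\Omega^V) \Bigr) - 1. \tag{5.32} \]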
References
1. Atiyah, M.F., Bott, R. and Shapiro, A.: Clifford modules. Topology 3 (Suppl. 1), 3–38 (1964)
2. Atiyah, M.F., Patodi, V.K. and Singer, I.M.: Spectral asymmetry and Riemannian geometry I. Math. Proc. Camb. Philos. Soc. 77, 43–69 (1975)
3. Atiyah, M.F. and Singer, I.M.: The index of elliptic operators III. Ann. Math. 87, 546–604 (1968)
4. Berline, N., Getzler, E. and Vergne, M.: Heat kernels and Dirac operators. Berlin–Heidelberg: Springer-Verlag, 1992
5. Besse, A.L.: Einstein manifolds. Berlin–Heidelberg: Springer-Verlag, 1987
6. Bismut, J.-M. and Cheeger, J.: η-invariants and their adiabatic limits. J. Am. Math. Soc. 2, 33–70 (1989)
7. Bismut, J.-M. and Freed, D.S.: The analysis of elliptic families II, Dirac operators, eta invariants and the holonomy theorem. Commun. Math. Phys. 107, 103–163 (1986)
8. Cheeger, J.: On the formulas of Atiyah-Patodi-Singer and Witten. Proc. of ICM, Berkeley, 1986, pp. 515–521
9. Cheeger, J.: η-invariants, the adiabatic approximation and conical singularities. J. Diff. Geom. 26, 175–221 (1987)
10. Dai, X.: Adiabatic limits, nonmultiplicativity of signature and Leray spectral sequence. J. Am. Math. Soc. 4, 265–321 (1991)
11. Gilkey, P.B.: Invariance theory, the heat equation and the Atiyah-Singer index theorem. Math. Lecture Series, No. 11, Boston: Publish or Perish, 1984
12. Lawson, H.B. and Michelsohn, M.: Spin geometry. Princeton, NJ: Princeton Univ. Press, 1989
13. Nagase, M.: Spinq structures. J. Math. Soc. Japan 47, 93–119 (1995)
14. O’Brian, N.R. and Rawnsley, J.H.: Twistor spaces. Ann. Global Anal. Geom. 3, 29–58 (1985)
15. Roe, J.: Elliptic operators, topology and asymptotic methods. Pitman Res. Notes in Math. Series 179, Harlow: Longman Scientific and Technical, 1988
16. Salamon, S.M.: Quaternionic Kähler manifolds. Invent. Math. 67, 143–171 (1982)
17. Vilms, J.: Totally geodesic maps. J. Diff. Geom. 4, 73–79 (1970)
18. Witten, E.: Global gravitational anomalies. Commun. Math. Phys. 100, 197–229 (1985)

Communicated by S. T. Yau
Commun. Math. Phys. 189, 127 – 144 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997
Translational Symmetry Breaking and Soliton Sectors for Massive Quantum Spin Models in 1 + 1 Dimensions
Taku Matsui
Graduate School of Mathematics, Kyushu University, 1-10-6 Kakozaki, Highashi-ku, Fukuoka 812-81, Japan. E-mail: [email protected]
Received: 13 January 1997 / Accepted: 11 March 1997
Abstract: We consider the classification of pure infinite volume ground states and that of soliton sectors for 1+1 dimensional massive quantum spin models. We prove that non-translationally invariant ground states cannot exist for a class of translationally invariant Hamiltonians including the spin 1 AKLT (Affleck Kennedy Lieb Tasaki) antiferromagnetic spin model. We also obtain a complete classification of soliton sectors (up to unitary equivalence) for certain massive models (e.g. ferromagnetic XXZ models).
1. Preliminary

In this paper, we prove the absence of non-translationally invariant ground states for one class of quantum spin models, and give the classification of soliton sectors for another class of models; both are 1+1 dimensional quantum lattice systems. Our method depends crucially on the assumption of a uniform spectral gap (massiveness) for finite volume Hamiltonians. We present a class of Hamiltonians with a unique infinite volume ground state which have more than two local ground states. We will also present a complete classification of stable soliton sectors. By “stable” we mean that the state has positive energy. Our analysis implies that the spin 1 antiferromagnetic model introduced by I. Affleck, T. Kennedy, E. Lieb and H. Tasaki in [1] has a unique ground state. The model is determined by the following Hamiltonian:
\[ H = \sum_{j\in\mathbb{Z}} \Bigl\{ S^{(j)} \cdot S^{(j+1)} + \frac{1}{3}\bigl( S^{(j)} \cdot S^{(j+1)} \bigr)^2 \Bigr\} = \sum_{j\in\mathbb{Z}} h_j. \tag{1.1} \]
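As a quick sanity check on (1.1), the two-site term $h_j$ can be diagonalized by hand: on the total-spin-$J$ sector of two spin 1 sites one has $S^{(j)} \cdot S^{(j+1)} = (J(J+1) - 4)/2$. The following minimal sketch (plain Python; the function name `bond_energy` is ours, not from [1]) tabulates the resulting eigenvalues and shows that, after subtracting the lowest one, $h_j$ vanishes on $J = 0, 1$ and equals 2 on $J = 2$.

```python
from fractions import Fraction

def bond_energy(total_spin):
    """Eigenvalue of S.S + (1/3)(S.S)^2 on the total-spin-J sector of two spin-1 sites."""
    # S1.S2 = (J(J+1) - s1(s1+1) - s2(s2+1)) / 2 with s1 = s2 = 1
    dot = Fraction(total_spin * (total_spin + 1) - 4, 2)
    return dot + Fraction(1, 3) * dot ** 2

energies = {J: bond_energy(J) for J in (0, 1, 2)}
lowest = min(energies.values())
shifted = {J: e - lowest for J, e in energies.items()}

for J in (0, 1, 2):
    print(J, energies[J], shifted[J])
```

In particular $\inf \operatorname{spec}(h_0) = -2/3$, and the shifted local Hamiltonian is positive with a zero-energy subspace spanned by the $J = 0$ and $J = 1$ sectors.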
The Hamiltonian (1.1) is referred to as the AKLT model. In [1], all the finite volume ground states were determined explicitly. It was also shown that the infinite volume state
$\varphi$ satisfying the zero energy condition $\varphi(h_j) = \inf \operatorname{spec}(h_0)$ is unique, where $\inf \operatorname{spec}(h_0)$ is the infimum of the spectrum of $h_0$. This unique state is called the “valence bond ground state”. Our uniqueness result is stronger. Namely, consider the finite volume ground state $\varphi_\Lambda$ for the finite volume Hamiltonian $H_\Lambda + B_\Lambda$ with the boundary term $B_\Lambda$,
\[ H_\Lambda + B_\Lambda = \sum_{j\in\Lambda} h_j + B_\Lambda. \]
Assume that the infinite volume limit of this Hamiltonian is equivalent to the AKLT model (1.1) in the sense that it gives rise to the same infinite volume dynamics. In other words,
\[ \lim_{\Lambda\to\mathbb{Z}} [B_\Lambda, Q] = 0 \]
for any local observable $Q$. Our claim is that the infinite volume limit of $\varphi_\Lambda$ is necessarily the valence bond ground state of [1]. Another consequence of our analysis is a classification of soliton sectors. Our typical example is the ferromagnetic XXZ model,
\[ H = \sum_{j\in\mathbb{Z}} \Bigl\{ 1 - \sigma_z^{(j)}\sigma_z^{(j+1)} - \frac{1}{\Delta}\bigl( \sigma_x^{(j)}\sigma_x^{(j+1)} + \sigma_y^{(j)}\sigma_y^{(j+1)} \bigr) \Bigr\}, \tag{1.2} \]
where $\Delta > 1$. Using the quantum group $SU_q(2)$ invariant boundary condition, we obtain non-translationally invariant infinite volume ground states called “kink” or “domain wall”. This fact was discovered by S.R. Alcaraz, R.S. Salinas, and W.F. Wreszinski (cf. [2]) and by C.T. Gottstein and R. Werner independently (see [7]), and the complete classification of these ground states was obtained by C.T. Gottstein and R. Werner (see [7] and [15]). Our analysis covers the classification problem in this context and the proof presented here is simpler than those published before. We can apply our results to other models.

Mathematically speaking, the infinite quantum spin models are described in the language of $C^*$-algebras (see [6]). We now fix notations to describe our basic assumptions precisely. The algebra of quantum observables is the UHF $C^*$-algebra $\mathcal{A}$ (the infinite tensor product of the algebra $M_n(\mathbb{C})$ of $n$ by $n$ complex matrices),
\[ \mathcal{A} = \overline{\bigotimes_{\mathbb{Z}} M_n(\mathbb{C})}^{\,C^*}, \]
where any component of the tensor product is indexed by an integer $j$. Let $Q$ be a matrix in $M_n(\mathbb{C})$. By $Q^{(j)}$ we denote the following element of $\mathcal{A}$:
\[ \cdots \otimes 1 \otimes 1 \otimes \underbrace{Q}_{\text{the } j\text{-th component}} \otimes 1 \otimes 1 \otimes \cdots \in \mathcal{A}. \]
Given a subset $\Lambda$ of $\mathbb{Z}$, $\mathcal{A}_\Lambda$ is defined as the subalgebra of $\mathcal{A}$ generated by all $Q^{(j)}$ with $Q \in M_n(\mathbb{C})$, $j \in \Lambda$. We also set
\[ \mathcal{A}_{loc} = \bigcup_{|\Lambda| < \infty} \mathcal{A}_\Lambda, \]
where $|\Lambda|$ is the cardinality of $\Lambda$. Suppose that $\varphi$ is a state of $\mathcal{A}$. The restriction of $\varphi$ to $\mathcal{A}_\Lambda$ is denoted by $\varphi_\Lambda$,
\[ \varphi|_{\mathcal{A}_\Lambda} = \varphi_\Lambda. \]
The lattice translation $\tau_j$ is an automorphism of $\mathcal{A}$ defined by $\tau_j(Q^{(k)}) = Q^{(j+k)}$. The time evolution $\alpha_t$ of our systems is generated by a translationally invariant finite range Hamiltonian. This means that we have a selfadjoint local energy operator $h_0$,
\[ h_0 = h_0^* \in \mathcal{A}_{\Lambda_0}, \qquad \tau_j(h_0) = h_j, \]
and the finite volume Hamiltonian $H_\Lambda$ is determined by
\[ H_\Lambda = \sum_{j:\ j + \Lambda_0 \subset \Lambda} h_j. \]
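The time evolution $\alpha_t(Q)$ of $Q \in \mathcal{A}$ is obtained via the thermodynamic limit,
\[ \alpha_t(Q) = \lim_{\Lambda\to\mathbb{Z}} e^{itH_\Lambda} Q e^{-itH_\Lambda}. \]
Two Hamiltonians, $H = \lim H_\Lambda$ and $\tilde H = \lim \tilde H_\Lambda$, are called equivalent if
\[ \lim_{\Lambda\to\mathbb{Z}} [H_\Lambda, Q] = \lim_{\Lambda\to\mathbb{Z}} [\tilde H_\Lambda, Q] \qquad \text{for any } Q \in \mathcal{A}_{loc}. \tag{1.3} \]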
The next task is to recall the definition of the infinite volume ground state. For finite systems, the ground state is an eigenvector of $H_\Lambda$ with the least eigenvalue, where $H_\Lambda$ is identified with a finite matrix rather than an element of $\mathcal{A}$. An infinite volume ground state is, then, the infinite volume limit of these states. However, a moment's thought reveals that these are not the only possible physical ground states. The reason is twofold. One is the effect of boundary conditions. It is easy to imagine that the addition of boundary external fields can change the degeneracy of ground states completely, as is the case for the ferromagnetic XXZ model. The second reason is the possible existence of low lying spectrum which converges to the ground state energy exponentially fast in the infinite volume limit. Such an example is provided by the quantum Ising model in a transversal field. The Hamiltonian of this model is
\[ H = \sum_{j\in\mathbb{Z}} \bigl\{ \sigma_z^{(j)}\sigma_z^{(j+1)} + \lambda\,\sigma_x^{(j)} \bigr\}, \]
where $\lambda$ is a real parameter. If $\lambda \neq 0$ the finite volume Hamiltonian has a unique ground state. This follows from the Perron-Frobenius theorem for positive matrices. However, the exact solution obtained via the Jordan-Wigner transformation shows that the second least eigenvalue converges to the ground state energy exponentially fast in the infinite volume limit, and we obtain two pure ground states when $|\lambda| < 1$. Taking these cases into account, three different definitions of ground states have been used in mathematical physics. Suppose $h_0$ (hence $H_\Lambda$) is given.
Definition 1.1. Let $\varphi$ be a state of $\mathcal{A}$. $\varphi$ is a translationally invariant ground state if and only if $\varphi$ is translationally invariant,
\[ \varphi \circ \tau_j = \varphi \qquad \text{for any } j, \]
and
\[ \varphi(h_0) = \inf\{ \psi(h_0) \}, \tag{1.4} \]
where the infimum is taken among all translationally invariant states $\psi$. The set of all translationally invariant ground states will be denoted by $G^{\mathbb{Z}}$.

Remark 1.2. Due to translational invariance of the state $\psi$ in (1.4), we have
\[ \lim_{\Lambda\to\mathbb{Z}} \psi\Bigl( \frac{H_\Lambda}{|\Lambda|} \Bigr) = \psi(h_0). \]
So the condition (1.4) means that the ground state minimizes the average energy per unit volume. A translationally invariant ground state exists for any finite range translationally invariant Hamiltonian.

Definition 1.3. A state $\varphi$ of $\mathcal{A}$ is a zero energy state if and only if for any $j$ in $\mathbb{Z}$,
\[ \varphi(h_j) = \inf\{ \psi(h_0) \}, \tag{1.5} \]
where the infimum is taken among all states $\psi$.

Remark 1.4. In (1.5), the right-hand side is zero when we subtract the ground state energy from $H_\Lambda$. The zero energy state may not exist in general. See [1], [5] and [12] for examples of Hamiltonians with non-trivial zero energy states. A structure analysis of zero energy states was initiated in [8]. When a zero energy state exists, any translationally invariant ground state is a zero energy state. However, non-translationally invariant zero energy states may exist. The set of all zero energy states is denoted by $G_{zero}$.

Definition 1.5. A state $\varphi$ is a ground state if and only if
\[ \varphi(Q^*[H, Q]) = \lim_{\Lambda\to\mathbb{Z}} \varphi(Q^*[H_\Lambda, Q]) \geq 0 \tag{1.6} \]
for any $Q \in \mathcal{A}_{loc}$. The set of all ground states will be denoted by $G$.

Let $\xi_\Lambda$ be the ground state eigenvector of the finite volume Hamiltonian $H_\Lambda$ with the eigenvalue $E_\Lambda$, $H_\Lambda \xi_\Lambda = E_\Lambda \xi_\Lambda$. Then the state $\varphi_\infty$ determined by
\[ \varphi_\infty(Q) = \lim_{\Lambda\to\mathbb{Z}} (\xi_\Lambda, Q\xi_\Lambda) \]
is a ground state in the sense of (1.6). To see this, note that for $Q$ in $\mathcal{A}_\Lambda$,
\[ \bigl( \xi_{\Lambda'}, Q^*[H_{\Lambda'}, Q]\, \xi_{\Lambda'} \bigr) = \bigl( Q\xi_{\Lambda'}, (H_{\Lambda'} - E_{\Lambda'})\, Q\xi_{\Lambda'} \bigr) \geq 0 \]
for sufficiently large $\Lambda'$. This implies (1.6).
Remark 1.6. If $\varphi$ is a translationally invariant state satisfying (1.6), (1.4) is also valid (cf. [4]). As a consequence, if $G_{zero}$ is non-empty,
\[ G^{\mathbb{Z}} \subset G_{zero} \subset G. \]
$G^{\mathbb{Z}}$, $G_{zero}$ and $G$ are convex, weak$^*$ closed subsets of the state space of $\mathcal{A}$. Any extremal state in $G_{zero}$ is a pure state and the associated GNS representation is irreducible. The same remark is valid for $G$; however, it may happen that an extremal state in $G^{\mathbb{Z}}$ is not pure and that it decomposes into non-translationally invariant periodic states. In various examples, it is possible to establish the uniqueness and the classification of states in $G_{zero}$ and $G^{\mathbb{Z}}$ ([1], [5], [11], [12]). The difference between $G_{zero}$ and $G$ is rather difficult to clarify. This article deals with the existence and the classification of ground states which are not necessarily translationally invariant. We will always assume that zero energy states exist and that the spectral gap of $H_\Lambda$ is open uniformly in the volume $\Lambda$. The GNS representation of $\mathcal{A}$ associated with a non-translationally invariant ground state is referred to as, in generic cases, the soliton sector because it is an interface of two mutually non-equivalent translationally invariant ground state representations.
2. Infinite Volume Ground States
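Let us recall several equivalent conditions for ground states defined by (1.6). Suppose that $h_0 = h_0^* \in \mathcal{A}_{\Lambda_0}$, where $\Lambda_0$ is a finite subset of $\mathbb{Z}$ containing the origin 0. Let $r$ be the diameter of $\Lambda_0$, $r = \sup\{ |i - j| \mid i, j \in \Lambda_0 \}$. Set
\[ B_\Lambda = \sum_{j:\ (j+\Lambda_0)\cap\Lambda \neq \emptyset,\ (j+\Lambda_0)\cap\Lambda^c \neq \emptyset} h_j. \]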
The following theorem is due to O. Bratteli, A. Kishimoto and D. Robinson (see [4]).

Theorem 2.1 (Bratteli, Kishimoto, Robinson). Let $\varphi$ be a state of $\mathcal{A}$. The following conditions are equivalent.
(i) $\varphi$ is a ground state.
(ii) For any integers $n$ and $m$ ($n < m$),
\[ \varphi(H_{[n,m]} + B_{[n,m]}) = \inf \psi(H_{[n,m]} + B_{[n,m]}), \tag{2.1} \]
where the infimum is taken among all states $\psi$ satisfying $\varphi_{[n,m]^c} = \psi_{[n,m]^c}$.

To obtain the above theorem, O. Bratteli, A. Kishimoto and D. Robinson used the following lemma, which is crucial in our argument below.

Lemma 2.2. Suppose that $\Lambda$ is a finite subset of $\mathbb{Z}$ and $\varphi$ and $\psi$ are states of $\mathcal{A}$. If these are identical outside $\Lambda$, $\varphi_{\Lambda^c} = \psi_{\Lambda^c}$, the GNS representations associated with $\varphi$ and $\psi$ are quasi-equivalent. In particular, when both $\varphi$ and $\psi$ are pure they are unitarily equivalent.
In terms of the GNS representation, the ground state condition (1.6) is equivalent to the positivity of the spectrum of the effective Hamiltonian in the GNS space. More precisely, if $\varphi$ is a ground state and $\{\pi_\varphi, \Omega_\varphi, \mathcal{H}_\varphi\}$ is the GNS triple for $\varphi$, $\varphi(Q) = (\Omega_\varphi, \pi_\varphi(Q)\Omega_\varphi)$, there exists a positive selfadjoint operator $H^\varphi$ on $\mathcal{H}_\varphi$ satisfying
\[ H^\varphi \Omega_\varphi = 0, \qquad H^\varphi \geq 0 \tag{2.2} \]
and
\[ e^{itH^\varphi} \pi_\varphi(Q) e^{-itH^\varphi} = \pi_\varphi(\alpha_t(Q)). \tag{2.3} \]
$H^\varphi$ will be referred to as the effective Hamiltonian. In various examples, $H^\varphi$ is obtained by the subtraction of the ground state energy. It is known that $H^\varphi$ is affiliated to the von Neumann algebra $\pi_\varphi(\mathcal{A})''$ generated by $\pi_\varphi(\mathcal{A})$. For any $t \in \mathbb{R}$,
\[ e^{itH^\varphi} \in \pi_\varphi(\mathcal{A})''. \]
Any unit vector $\xi$ satisfying $H^\varphi \xi = 0$ gives rise to a ground state $\varphi_\xi$ defined by $\varphi_\xi(Q) = (\xi, \pi_\varphi(Q)\xi)$.
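We next consider the zero energy state. Without loss of generality, we may assume that the local Hamiltonian is positive ($h_j \geq 0$) and for any zero energy state $\varphi$,
\[ H_\Lambda \geq 0, \qquad \varphi(H_\Lambda) = 0 \qquad \text{for any } \Lambda. \tag{2.4} \]
Then passing to the GNS space associated with the zero energy state $\varphi$, (2.4) reads $\pi_\varphi(H_\Lambda)\Omega_\varphi = 0$. The effective Hamiltonian $H^\varphi$ is the limit of $\pi_\varphi(H_\Lambda)$. The convergence is in the sense of strong resolvent convergence,
\[ \frac{1}{H^\varphi + i} = \operatorname{st-lim}_{\Lambda\to\mathbb{Z}} \frac{1}{\pi_\varphi(H_\Lambda) + i}, \qquad \operatorname{st-lim}_{\Lambda\to\mathbb{Z}} e^{it\pi_\varphi(H_\Lambda)} = e^{itH^\varphi}. \]

Lemma 2.3. Suppose that (2.4) is valid. Then
\[ H^\varphi \xi = 0 \tag{2.5} \]
if and only if
\[ (\xi, \pi_\varphi(H_\Lambda)\xi) = 0 \qquad \text{for any } \Lambda. \tag{2.6} \]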
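Proof. Assume (2.5) is valid. To show (2.6), we have only to note that
\[ \Bigl( \xi, \frac{1}{H^\varphi + 1}\,\xi \Bigr) = \lim_{\Lambda\to\mathbb{Z}} \Bigl( \xi, \frac{1}{\pi_\varphi(H_\Lambda) + 1}\,\xi \Bigr) \]
and, by the monotone decreasing property of $(H_\Lambda + 1)^{-1}$ in $\Lambda$ due to $h_j \geq 0$, we have
\[ \Bigl( \xi, \frac{1}{H^\varphi + 1}\,\xi \Bigr) \leq \Bigl( \xi, \frac{1}{\pi_\varphi(H_\Lambda) + 1}\,\xi \Bigr). \]
Thus, due to (2.5) we obtain
\[ (\xi, \xi) = \Bigl( \xi, \frac{1}{H^\varphi + 1}\,\xi \Bigr) \leq \Bigl( \xi, \frac{1}{\pi_\varphi(H_\Lambda) + 1}\,\xi \Bigr) \leq (\xi, \xi). \]
This implies
\[ (\xi, \xi) = \Bigl( \xi, \frac{1}{\pi_\varphi(H_\Lambda) + 1}\,\xi \Bigr). \]
As a consequence, we obtain (2.6). The converse implication is obvious.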
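The last step of the proof uses the elementary fact that, for a positive operator, the resolvent expectation $(\xi, (H+1)^{-1}\xi)$ equals $(\xi, \xi)$ only if $\xi$ lies in the kernel of $H$. A finite-dimensional illustration, assuming a diagonal $H$ with hypothetical eigenvalues and a unit vector described by its spectral weights $|\xi_k|^2$ (the name `resolvent_expectation` is ours), can be sketched as follows.

```python
from fractions import Fraction

def resolvent_expectation(energies, weights):
    """(xi, (H+1)^(-1) xi) for H = diag(energies) >= 0 and |xi_k|^2 = weights[k]."""
    return sum(w / (e + 1) for e, w in zip(energies, weights))

# Hypothetical spectrum with a two-dimensional kernel.
energies = [Fraction(0), Fraction(0), Fraction(1), Fraction(3)]
in_kernel = [Fraction(1, 2), Fraction(1, 2), Fraction(0), Fraction(0)]
leaking = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4), Fraction(0)]

full = resolvent_expectation(energies, in_kernel)        # equals (xi, xi) = 1
deficit = 1 - resolvent_expectation(energies, leaking)   # strictly positive
print(full, deficit)
```

Any weight outside the kernel lowers the expectation strictly below 1, which is the mechanism that forces (2.6) in the proof.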
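Lemma 2.4. Suppose that (2.4) is valid and $\varphi$ is a zero energy state. Then for any ground state $\psi$, the total energy is finite,
\[ \psi(H_\Lambda) \leq 2r \| h_0 \| \tag{2.7} \]
for any $\Lambda$.

Proof. In (2.1), put the trial state $\tilde\psi$ defined by $\tilde\psi = \psi_{\Lambda^c} \otimes \varphi_\Lambda$. For an interval $\Lambda = [n, m]$ this gives $\psi(H_\Lambda + B_\Lambda) \leq \tilde\psi(H_\Lambda + B_\Lambda) = \tilde\psi(B_\Lambda) \leq \| B_\Lambda \|$, and (2.7) follows because $B_\Lambda \geq 0$ is a sum of at most $2r$ terms $h_j$.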
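The factor $2r$ in (2.7) is simply the number of local terms $h_j$ that straddle the boundary of an interval. The following sketch counts them under the simplifying assumption $\Lambda_0 = \{0, 1, \dots, r\}$ and $m - n \geq r$, neither of which is imposed in the text; the function name `boundary_sites` is hypothetical.

```python
def boundary_sites(n, m, r):
    """Sites j for which j + {0,...,r} meets both [n, m] and its complement."""
    sites = []
    for j in range(n - 2 * r, m + 2 * r + 1):
        support = range(j, j + r + 1)
        inside = any(n <= k <= m for k in support)
        outside = any(k < n or k > m for k in support)
        if inside and outside:
            sites.append(j)
    return sites

# A few hypothetical intervals [n, m] and interaction ranges r.
counts = {(n, m, r): len(boundary_sites(n, m, r))
          for (n, m, r) in [(0, 10, 1), (-5, 5, 2), (3, 20, 4)]}
print(counts)
```

In each case there are $r$ terms crossing the left end and $r$ crossing the right end, so $\| B_\Lambda \| \leq 2r \| h_0 \|$.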
We now state a sufficient condition for absence of the non-translationally invariant ground state.

Assumption 2.5. Suppose that (2.4) is valid. Assume further
(i) $H_{[n,m]}$ ($n < m$) has the spectral gap $\gamma$ ($> 0$) uniformly in $n$ and $m$,
\[ \operatorname{spec} H_{[n,m]} \cap (0, \gamma) = \emptyset. \tag{2.8} \]
(ii) There exist the unique zero energy states $\psi$, $\psi_-$, $\psi_+$ for Hamiltonians $H_{\mathbb{Z}}$, $H_{(-\infty,-1]}$, and $H_{[0,\infty)}$. Namely, $\psi_-$ is the unique state of $\mathcal{A}_{(-\infty,-1]}$ specified with the condition
\[ \psi_-(h_j) = 0 \qquad \text{for any } j - r < 0, \]
$\psi_+$ is the unique state of $\mathcal{A}_{[0,\infty)}$ specified with the condition
\[ \psi_+(h_j) = 0 \qquad \text{for any } j + r \geq 0, \]
where $r$ is the range of interaction, and $\psi$ is the unique state of $\mathcal{A} = \mathcal{A}_{\mathbb{Z}}$ specified with the condition $\psi(h_j) = 0$ for any $j \in \mathbb{Z}$.
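Proposition 2.6. If Assumption 2.5 is valid, the ground state is unique; in particular, it is translationally invariant.

Proof of Proposition 2.6. Let $\psi$ and $\psi_\pm$ be the zero energy states of Assumption 2.5 and $\varphi$ (for $H_{\mathbb{Z}}$) be a pure ground state. By $\{\Omega_\varphi, \pi_\varphi, \mathcal{H}_\varphi\}$ we denote the GNS representation associated with $\varphi$. Let $P_\Lambda$ be the projection onto the eigenspace of $\pi_\varphi(H_\Lambda)$ with eigenvalue zero. Obviously
\[ P_\Lambda P_{\Lambda'} = P_{\Lambda'} P_\Lambda = P_{\Lambda'} \qquad \text{provided } \Lambda \subset \Lambda'. \tag{2.9} \]
By Lemma 2.4, for any small $\varepsilon > 0$, we have $n_\varepsilon$ such that
\[ 0 \leq \sum_{|j| > n_\varepsilon} \varphi(h_j) < \varepsilon. \tag{2.10} \]
Set
\[ \tilde P_m = P_{\tilde\Lambda_m}, \qquad \tilde\Lambda_m = [-m, -n_\varepsilon] \cup [n_\varepsilon, m]. \]
As $\tilde P_m$ is a decreasing sequence of projections (due to (2.9)), the following limit exists in the strong operator topology,
\[ \tilde P_\infty = \operatorname{st-lim}_{m\to\infty} \tilde P_m. \]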
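We claim that $\tilde P_\infty \Omega_\varphi \neq 0$. In fact,
\[ \| \Omega_\varphi - \tilde P_\infty \Omega_\varphi \|^2 = 1 - \| \tilde P_\infty \Omega_\varphi \|^2 = \varphi(1 - \tilde P_\infty) \]
and
\[ \gamma\, \varphi(1 - \tilde P_m) \leq \varphi\bigl( (1 - \tilde P_m) H_{\tilde\Lambda_m} (1 - \tilde P_m) \bigr) = \varphi(H_{\tilde\Lambda_m}) < \varepsilon. \]
As a consequence,
\[ 1 - \| \tilde P_\infty \Omega_\varphi \|^2 \leq \frac{\varepsilon}{\gamma} < 1. \]
Thus $\tilde P_\infty \Omega_\varphi \neq 0$.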
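The inequality $\gamma\,\varphi(1 - P) \leq \varphi(H)$ used above only requires that $H \geq 0$ has no spectrum in $(0, \gamma)$. A minimal finite-dimensional sketch, assuming a diagonal $H$ with a hypothetical spectrum and a vector state given by its spectral weights (the name `gap_bound` is ours), reads:

```python
from fractions import Fraction

def gap_bound(energies, weights, gamma):
    """Return (weight outside Ker H, <H>/gamma) for a vector state on H = diag(energies)."""
    # Spectral gap assumption: no eigenvalue lies in the open interval (0, gamma).
    assert all(e == 0 or e >= gamma for e in energies)
    excited = sum(w for e, w in zip(energies, weights) if e > 0)   # phi(1 - P)
    mean_energy = sum(e * w for e, w in zip(energies, weights))    # phi(H)
    return excited, mean_energy / gamma

# Hypothetical spectrum with gap 2 and a state of small mean energy.
energies = [0, 0, 2, 5]
weights = [Fraction(3, 5), Fraction(1, 5), Fraction(1, 10), Fraction(1, 10)]
excited, bound = gap_bound(energies, weights, gamma=2)
print(excited, bound)
```

A state with small energy therefore has most of its weight in the zero-energy subspace, which is how the small tail energy (2.10) yields a non-zero projection $\tilde P_\infty \Omega_\varphi$.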
Next consider the vector state $\tilde\varphi$ defined by
\[ \tilde\varphi(Q) = \frac{1}{\| \tilde P_\infty \Omega_\varphi \|^2} \bigl( \tilde P_\infty \Omega_\varphi, \pi_\varphi(Q) \tilde P_\infty \Omega_\varphi \bigr). \tag{2.11} \]
Obviously we have
\[ \tilde\varphi(h_j) = 0 \qquad \text{if } |j| > n_\varepsilon + r. \]
By our Assumption 2.5 (ii), $\tilde\varphi_{[-n_\varepsilon - r,\, n_\varepsilon + r]^c} = \psi_{[-n_\varepsilon - r,\, n_\varepsilon + r]^c}$. As a consequence $\tilde\varphi$, $\varphi$ and $\psi$ are unitarily equivalent due to Lemma 2.2. It turns out that $\varphi$ is a zero energy state due to Lemma 2.3. Thus, $\varphi = \psi$.

Unfortunately Assumption 2.5 is valid only for a few models such as the ferromagnetic XXX model in an external field. In many cases, the zero energy state is not unique for the half infinite Hamiltonians $H_{(-\infty,-1]}$ and $H_{[0,\infty)}$, even though the zero energy state is unique for the two sided infinite Hamiltonian $H_{\mathbb{Z}}$. With a modification of the above argument we can apply our idea to a larger class of Hamiltonians.
Assumption 2.7. Suppose that (2.4) and the condition (i) of Assumption 2.5 are valid. Furthermore we assume:
(i) The zero energy state for the Hamiltonian $H_{\mathbb{Z}}$ is unique. By $\psi$ we denote this unique zero energy state.
(ii) All the zero energy states for $H_{(-\infty,-1]}$ are mutually quasi-equivalent. Namely, if a state $\varphi$ of $\mathcal{A}_{(-\infty,-1]}$ satisfies the condition $\varphi(h_j) = 0$ for $j < 0$, then $\varphi$ is quasi-equivalent to the state $\psi_{(-\infty,-1]}$ (the restriction of $\psi$ in (i) to $\mathcal{A}_{(-\infty,-1]}$).
(iii) All the zero energy states for $H_{[0,\infty)}$ of $\mathcal{A}_{[0,\infty)}$ are mutually quasi-equivalent.

Proposition 2.8. Suppose Assumption 2.7 is valid. The ground state is unique.

Examples for the above proposition are presented in the next section.

Proof of Proposition 2.8. Let $\varphi$ be a pure ground state for the Hamiltonian $H_{\mathbb{Z}}$ in Proposition 2.8 and denote the GNS triple associated with $\varphi$ by $\{\Omega_\varphi, \pi_\varphi, \mathfrak{H}_\varphi\}$. The projections $P_\Lambda$, $\tilde P_m$ and $\tilde P_\infty \neq 0$ are defined as in the proof of Proposition 2.6. Consider again the vector state $\tilde\varphi$ defined by (2.11). By our Assumption 2.7, $\tilde\varphi_{[-n_\varepsilon, n_\varepsilon]^c}$ and $\psi_{[-n_\varepsilon, n_\varepsilon]^c}$ are quasi-equivalent. Hence, $\tilde\varphi$ and $\psi$ are quasi-equivalent. As $\psi$ is pure due to Assumption 2.7 (i), $\psi$ is a vector state in the GNS representation $\{\Omega_\varphi, \pi_\varphi, \mathfrak{H}_\varphi\}$, which, in turn, implies Proposition 2.8 due to Lemma 2.3 and the uniqueness of the zero energy state of $H_{\mathbb{Z}}$.

To show that two states $\varphi$ and $\psi$ are quasi-equivalent, the following condition should be verified (cf. [4], Corollary 2.6.11).

Lemma 2.9. Two states $\varphi$ and $\psi$ are quasi-equivalent if and only if for any $\varepsilon > 0$ there exists a finite set $\Lambda$ such that
$$|\psi(Q) - \varphi(Q)| \le \varepsilon \|Q\| \quad \text{for } Q \in \mathcal{A}_{\Lambda^c}. \qquad (2.12)$$
3. Massive Models with the Unique Ground State

In this section, we present quantum models for which Proposition 2.8 is valid. The first class of Hamiltonians is that considered in [12]. An example in this class is the following quantum Ising model with the next nearest neighbour interaction in a transversal field,
$$H = \sum_{j\in\mathbb{Z}} \Big\{ -\sigma_x^{(j)} + 2\sinh\beta\cosh\beta\,\sigma_z^{(j)}\sigma_z^{(j+1)} + \sinh^2\beta\,\sigma_z^{(j-1)}\sigma_z^{(j+1)} \Big\},$$
where $\beta$ is a real parameter. For this Hamiltonian the zero energy state is obtained by a pure state extension of the Gibbs measure for the one-dimensional classical Ising model at the inverse temperature $\beta$. The second example is the AKLT model of (1.1), for which the zero energy state is the valence bond state introduced in [1].

We begin with a description of the models of [12]. We only consider the spin 1/2 case for simplicity of exposition and notation. By $\sigma_x^{(j)}$, $\sigma_y^{(j)}$ and $\sigma_z^{(j)}$ we denote the Pauli
spin matrices on the site $j$ in $\mathbb{Z}$. Consider the configuration space $X$ of the classical Ising spin system on the one-dimensional integer lattice,
$$X = \{1, -1\}^{\mathbb{Z}}.$$
With the product topology, $X$ is compact, and the set of all continuous functions on $X$ is denoted by $C(X)$. The coordinate function (the projection to the $j$th component) is denoted by $\sigma^{(j)}$. Let $\mathcal{B}$ be the abelian subalgebra generated by $\sigma_z^{(j)}$ ($j$ in $\mathbb{Z}$). $\mathcal{B}$ can be identified with $C(X)$ in a natural way, $\sigma_z^{(j)} = \sigma^{(j)}$. If $A$ is a finite subset of $\mathbb{Z}$ we introduce $\sigma(A)$ and $\sigma_\alpha(A)$ via the following equations:
$$\sigma(A) = \prod_{j\in A} \sigma^{(j)}, \qquad \sigma_\alpha(A) = \prod_{j\in A} \sigma_\alpha^{(j)} \quad (\alpha = x, y, z).$$
To introduce the Gibbs measure on $X$, we fix an interaction $\{J(A) \mid A \subset \mathbb{Z}\}$, which is a real valued function defined on the set of all finite subsets $A$ of $\mathbb{Z}$. For a finite subset $\Lambda$ of $\mathbb{Z}$ we introduce the classical Hamiltonian $J_\Lambda$ and the boundary energy $B_\Lambda$ via the equations
$$J_\Lambda = \sum_{A\subset\Lambda} J(A)\sigma(A), \qquad B_\Lambda = \sum_{A\cap\Lambda\neq\emptyset,\ A\cap\Lambda^c\neq\emptyset} J(A)\sigma(A). \qquad (3.1)$$
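Throughout this section, we assume that the interaction is translationally invariant and of finite range,
$$J(A + k) = J(A) \quad (k \in \mathbb{Z}), \qquad J(A) = 0 \ \text{ if the diameter of } A \text{ is larger than } r_0.$$
The Gibbs measure $\mu$ is obtained by
$$\int d\mu\, F(\sigma) = \lim_{\Lambda\to\mathbb{Z}} \frac{1}{Z_\Lambda} \int \prod_{j\in\mathbb{Z}} d\sigma^{(j)}\, e^{-J_\Lambda - B_\Lambda} F(\sigma).$$
Another equivalent definition of the Gibbs measure for the interaction $\{J(A) \mid A \subset \mathbb{Z}\}$ is the unique probability measure $\mu$ on $X$ characterized by
$$\frac{d\mu(\sigma_j)}{d\mu(\sigma)} = \exp\Big(2\sum_{A:\,A\ni j} J(A)\sigma(A)\Big), \qquad (3.2)$$
where $\sigma_j$ is the configuration obtained by the spin flip at the site $j$ and $\frac{d\mu(\sigma_j)}{d\mu(\sigma)}$ is the Radon-Nikodym derivative for this coordinate transformation. Let $\mathfrak{H}$ be the $L^2$ space for the Gibbs measure $\mu$. We consider the irreducible representation $\pi_\mu$ of $\mathcal{A}$ on $\mathfrak{H}$ determined by the following equations:
$$\pi_\mu(\sigma_x^{(j)})F(\sigma) = \sqrt{\frac{d\mu(\sigma_j)}{d\mu(\sigma)}}\, F(\sigma_j),$$
$$\pi_\mu(\sigma_y^{(j)})F(\sigma) = -\sqrt{-1}\,\sigma^{(j)} \sqrt{\frac{d\mu(\sigma_j)}{d\mu(\sigma)}}\, F(\sigma_j),$$
$$\pi_\mu(\sigma_z^{(j)})F(\sigma) = \sigma^{(j)} F(\sigma) \qquad (3.3)$$
for $F(\sigma)$ in $\mathfrak{H} = L^2(\mu)$.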
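The spin-flip relation (3.2) can be checked by brute force in a finite toy setting. The sketch below is an illustration only: it replaces the infinite lattice by a ring of five sites (so there is no boundary term and no infinite-volume limit), takes the unnormalised weight to be exp(-sum_A J(A) sigma(A)) in line with the factor e^{-J_Λ - B_Λ} above, and uses a hypothetical nearest-neighbour coupling plus field; none of these numerical choices come from the text.

```python
import itertools
import math


def energy(spins, J):
    """Classical energy sum_A J(A) * sigma(A); J maps tuples of sites to couplings."""
    return sum(c * math.prod(spins[i] for i in A) for A, c in J.items())


def flip(spins, j):
    """Configuration sigma_j obtained from sigma by flipping the spin at site j."""
    s = list(spins)
    s[j] = -s[j]
    return tuple(s)


def weight_ratio(spins, j, J):
    """mu(sigma_j) / mu(sigma) for the weight mu(sigma) ~ exp(-energy(sigma))."""
    return math.exp(-energy(flip(spins, j), J)) / math.exp(-energy(spins, J))


def rhs_of_3_2(spins, j, J):
    """Right-hand side of (3.2): exp(2 * sum over A containing j of J(A) sigma(A))."""
    total = sum(c * math.prod(spins[i] for i in A) for A, c in J.items() if j in A)
    return math.exp(2 * total)


# Hypothetical interaction on a ring of n sites (illustrative values only).
n = 5
J = {(i, (i + 1) % n): -0.7 for i in range(n)}   # nearest-neighbour pairs
J.update({(i,): 0.2 for i in range(n)})          # single-site field

all_agree = all(
    math.isclose(weight_ratio(s, j, J), rhs_of_3_2(s, j, J), rel_tol=1e-12)
    for s in itertools.product((1, -1), repeat=n)
    for j in range(n)
)
print(all_agree)
```

The check works because flipping site j reverses the sign of σ(A) exactly for those A that contain j, which is the mechanism behind (3.2).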
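The vector state associated with the constant function is denoted by $\varphi_\mu$,
$$\varphi_\mu(Q) = \int (\pi_\mu(Q)1)(\sigma)\, d\mu(\sigma). \qquad (3.4)$$
We introduce the quantum Hamiltonian for which $\varphi_\mu$ is the unique zero energy state.

Assumption 3.1. Let $\{V_A(\sigma_z) \mid A \subset \mathbb{Z},\ |A| < \infty\}$ be a collection of polynomials of $\sigma_z^{(j)}$ satisfying the following conditions:
1. Positivity: $V_{\{j\}}(\sigma_z) > 0$, $V_A(\sigma_z) \ge 0$.
2. Translational Invariance: $\tau_j(V_A(\sigma_z)) = V_{A+j}(\sigma_z)$.
3. Finiteness of the Range of Interaction: $V_A(\sigma_z) = 0$ if the diameter of $A$ is larger than $r_1$.
4. Reversibility: $V_A(\sigma_z)\sigma_x(A) = V_A(\sigma_z)\sigma_x(A)$.

For any finite $\Lambda$ ($\Lambda \subset \mathbb{Z}$, $|\Lambda| < \infty$) we set
$$H_\Lambda = \sum_{A:\,\Lambda\supset A} V_A(\sigma_z) \Big\{ \exp\Big( \sum_{B:\,|A\cap B| \text{ odd}} J(B)\sigma_z(B) \Big) - \sigma_x(A) \Big\}. \qquad (3.5)$$
Then $H_\Lambda \ge 0$ and $\varphi_\mu(H_\Lambda) = 0$. In [12] we have shown the converse: if $\varphi(H_\Lambda) = 0$ for any $\Lambda$, then $\varphi = \varphi_\mu$.

Theorem 3.2. Suppose that Assumption 3.1 is valid and consider the infinite volume Hamiltonian introduced via Eq. (3.5). The ground state is unique.

Proof. Let $r$ be the range of interaction defined by $r = \max\{r_0, r_1\}$. Thus if $n < m$ and $\Lambda = [n, m]$, then $H_\Lambda \in \mathcal{A}_{[n-r,\,m+r]}$. For this $\Lambda = [n, m]$ we set $\partial^r\Lambda = [n-r, n-1] \cup [m+1, m+r]$. We say that $\omega$ is a boundary configuration when $\omega$ is a configuration of the classical Ising spin at the augmented boundary $\partial^r\Lambda$, $\omega \in \{1, -1\}^{\partial^r\Lambda}$; the component at $j$ in $\partial^r\Lambda$ is denoted by $\omega^{(j)}$. For any boundary configuration $\omega$ we consider the projection
$$P^\omega_{\partial^r\Lambda} = \prod_{j\in\partial^r\Lambda} \tfrac{1}{2}\big(1 + \omega^{(j)}\sigma_z^{(j)}\big)$$
and the partial state (unital completely positive map from $\mathcal{A}$ to $\mathcal{A}_{(\partial^r\Lambda)^c}$) defined by
$$\varphi^{(\omega)}_{\partial^r\Lambda}(Q) = \mathrm{tr}_{\mathcal{A}_{\partial^r\Lambda}}\big(P^\omega_{\partial^r\Lambda}\, Q\, P^\omega_{\partial^r\Lambda}\big).$$
We define the local Hamiltonian with the boundary condition $\omega$ by the following equation:
$$H_\Lambda^{(\omega)} = \varphi^{(\omega)}_{\partial^r\Lambda}(H_\Lambda) \ge 0. \qquad (3.6)$$
It is easy to see
$$\sum_\omega H_\Lambda^{(\omega)} P^\omega_{\partial^r\Lambda} = H_\Lambda, \qquad (3.7)$$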
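and
$$\varphi_\mu\big(H_\Lambda^{(\omega)} P^\omega_{\partial^r\Lambda}\big) = 0. \qquad (3.8)$$
$H_\Lambda^{(\omega)}$ has a unique ground state and a spectral gap that is uniform in $\Lambda$ and $\omega$. This is because $H_\Lambda^{(\omega)}$ is unitarily equivalent to the generator of a spin flip Markov semigroup. The latter operator is a positive perturbation of the generator of the stochastic Ising model. For both generators zero is the ground state eigenvalue with multiplicity one, where the eigenvector is the constant function. The fact that the spectral gap of $H_\Lambda^{(\omega)}$ is open uniformly in $\Lambda$ follows from the result of [10].

By Proposition 2.8, it suffices to show that all the zero energy states of the half infinite Hamiltonian $H_{[0,\infty)}$ (or $H_{(-\infty,-1]}$) are mutually quasi-equivalent. For any ground state $\varphi$ there exists a constant $K$ independent of $\Lambda$ such that
$$0 \le \varphi\big(H_\Lambda^{(\omega)} P^\omega_{\partial^r\Lambda}\big) \le \varphi(H_\Lambda) \le K < \infty. \qquad (3.9)$$
We concentrate on a state $\varphi$ of $\mathcal{A}_{[0,\infty)}$ satisfying $\varphi(H_{[n,m]}) = 0$ ($0 \le n < m$). This means that we consider a state $\varphi$ satisfying
$$\varphi\big(H_\Lambda^{(\omega)} P^\omega_{\partial^r\Lambda}\big) = 0 \qquad (3.10)$$
for any $\omega$. If $n$ is sufficiently large,
$$\varphi\big(P^\omega_{[n-r,n]}\big) \neq 0 \qquad (3.11)$$
for any $\omega$. This is because $\varphi_\mu$ is the unique zero energy state for $H_{\mathbb{Z}}$, and due to (3.10),
$$\operatorname{w-}\!\lim_{j\to+\infty} \varphi\circ\tau_j = \varphi_\mu.$$
Let $\varphi^{(\omega)}_{[n,\infty)}$ be the state of $\mathcal{A}_{[n+1,\infty)}$ defined by
$$\varphi^{(\omega)}_{[n,\infty)}(Q) = \frac{\varphi\big(P^\omega_{[n-r,n]}\, Q\, P^\omega_{[n-r,n]}\big)}{\varphi\big(P^\omega_{[n-r,n]}\big)}.$$
Note that $H_{\Lambda'} \le H_\Lambda^{(\omega)}$ if $\Lambda' \cup \partial^r\Lambda' \subset \Lambda$. By (3.10) we have
$$\varphi^{(\omega)}_{[n,\infty)}(H_\Lambda) = 0 \qquad (3.12)$$
provided that $\Lambda = [n', m']$ ($n \le n' < m'$). By the proof of Theorem 3.1 of [12], we conclude
$$\varphi^{(\omega)}_{[n,\infty)}(Q) = \frac{\varphi_\mu\big(P^\omega_{[n-r,n]}\, Q\, P^\omega_{[n-r,n]}\big)}{\varphi_\mu\big(P^\omega_{[n-r,n]}\big)}$$
for any $Q$ in $\mathcal{A}_{[n+1,\infty)}$. This means that $\varphi_\mu|_{[n+1,\infty)}$ and $\varphi|_{[n+1,\infty)}$ are quasi-equivalent, hence $\varphi_\mu|_{[0,\infty)}$ and $\varphi|_{[0,\infty)}$ are quasi-equivalent. By the same argument, any zero energy state $\psi$ of $\mathcal{A}_{(-\infty,-1]}$ and $\varphi_\mu|_{(-\infty,-1]}$ are quasi-equivalent.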
In principle, Proposition 2.8 can be applied to Hamiltonians with a unique zero energy state which possesses the "quantum Markov property" (cf. [5]). Here we take the AKLT model (1.1) as a typical example. We give a different presentation of the Hamiltonian (1.1). The algebra of observables is now the infinite tensor product of 3 by 3 matrices, $\mathcal{A}_{loc} = \otimes_{\mathbb{Z}} M_3(\mathbb{C})$, and $S_x^{(j)}$, $S_y^{(j)}$ and $S_z^{(j)}$ are the spin 1 matrices at the site $j$. Set
$$(S^{(j)} \cdot S^{(j+1)}) = \sum_{\alpha = x,y,z} S_\alpha^{(j)} S_\alpha^{(j+1)}$$
and
$$h_j = \frac{1}{2}(S^{(j)} \cdot S^{(j+1)}) + \frac{1}{6}(S^{(j)} \cdot S^{(j+1)})^2 + \frac{1}{3}.$$
$h_j$ is the projection to the spin 2 representation of SU(2) at the bond $(j, j+1)$,
$$h_j = h_j^* = h_j^2 \ge 0.$$
The Hamiltonian $H^{AKLT}$ of the AKLT model is
$$H^{AKLT} = \sum_{j\in\mathbb{Z}} h_j.$$
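The algebraic properties of the bond operator can be confirmed with exact rational arithmetic. The sketch below is only an illustration: it builds $S\cdot S$ for two spin-1 sites in the $S_z$ product basis (using the standard fact that $\tfrac12(S_+\otimes S_- + S_-\otimes S_+)$ has matrix elements equal to 1 for spin 1), forms $h_j$ from the formula above, and counts zero energy vectors on short open chains. All function names are hypothetical helpers introduced here, and the finite open chain is an assumption standing in for the "interval" of the text.

```python
from fractions import Fraction

M = (1, 0, -1)  # S_z eigenvalues labelling the basis of a single spin-1 site


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker product; the first factor carries the more significant index."""
    nb = len(b)
    n = len(a) * nb
    return [[a[i // nb][j // nb] * b[i % nb][j % nb] for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum((a[i][k] * b[k][j] for k in range(n)), Fraction(0))
             for j in range(n)] for i in range(n)]


def lincomb(terms):
    """Entrywise sum of coeff * matrix over a list of (coeff, matrix) pairs."""
    n = len(terms[0][1])
    return [[sum((c * m[i][j] for c, m in terms), Fraction(0))
             for j in range(n)] for i in range(n)]


def spin_dot():
    """S.S on two spin-1 sites, basis index 3*i1 + i2 with m = M[i]."""
    x = [[Fraction(0)] * 9 for _ in range(9)]
    for i1, m1 in enumerate(M):
        for i2, m2 in enumerate(M):
            col = 3 * i1 + i2
            x[col][col] = Fraction(m1 * m2)                    # S_z S_z
            if i1 > 0 and i2 < 2:                              # raise site 1, lower site 2
                x[3 * (i1 - 1) + i2 + 1][col] = Fraction(1)
            if i1 < 2 and i2 > 0:                              # lower site 1, raise site 2
                x[3 * (i1 + 1) + i2 - 1][col] = Fraction(1)
    return x


def bond_projection():
    """h = (1/2) S.S + (1/6) (S.S)^2 + 1/3 on a single bond."""
    x = spin_dot()
    return lincomb([(Fraction(1, 2), x), (Fraction(1, 6), matmul(x, x)),
                    (Fraction(1, 3), identity(9))])


def chain_hamiltonian(n):
    """Sum of bond projections on an open chain of n >= 2 sites."""
    h = bond_projection()
    return lincomb([(Fraction(1),
                     kron(kron(identity(3 ** j), h), identity(3 ** (n - 2 - j))))
                    for j in range(n - 1)])


def rank(a):
    """Rank by exact Gaussian elimination over the rationals."""
    a = [row[:] for row in a]
    r = 0
    for c in range(len(a[0])):
        piv = next((i for i in range(r, len(a)) if a[i][c] != 0), None)
        if piv is None:
            continue
        a[r], a[piv] = a[piv], a[r]
        for i in range(r + 1, len(a)):
            if a[i][c] != 0:
                f = a[i][c] / a[r][c]
                a[i] = [u - f * v for u, v in zip(a[i], a[r])]
        r += 1
    return r


def zero_energy_dimension(n):
    H = chain_hamiltonian(n)
    return len(H) - rank(H)


h = bond_projection()
print(matmul(h, h) == h, sum(h[i][i] for i in range(9)))
print(zero_energy_dimension(2), zero_energy_dimension(3))
```

Since the bond terms are positive, the kernel of their sum is the intersection of their kernels, so the count returned by `zero_energy_dimension` is the number of independent vectors annihilated by every $h_j$ on the chain.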
The zero energy state $\psi$ satisfying $\psi(h_j) = 0$ for any $j$ is unique. The proof of uniqueness and the construction of this state are due to I. Affleck, T. Kennedy, E. Lieb and H. Tasaki. They also raised the question whether their valence bond state is the unique ground state of the model or not (see the Remark after Theorem 2.7 of [1]).

Theorem 3.3. The ground state of $H^{AKLT}$ is unique.

To verify the condition of Proposition 2.8 of this paper, we have only to recall the following facts proved by I. Affleck, T. Kennedy, E. Lieb and H. Tasaki.
(i) The dimension of the space of zero energy vectors for each interval is 4.
(ii) The spectral gap opens uniformly in the volume.
(iii) For the half infinite chain, the set of pure zero energy states is parametrized by unit vectors in the two dimensional irreducible representation of SU(2).

By $\psi_\pm$ we denote the pure zero energy states (of $\mathcal{A}_{[0,\infty)}$ or $\mathcal{A}_{(-\infty,-1]}$) associated with spin up and down. Thus we have to show that $\psi_\pm$ are mutually quasi-equivalent. In fact, it is possible to verify the condition of Lemma 2.9 (see (2.7), (2.14) and (2.29) of [1]). As all the material needed to prove equivalence of $\psi_\pm$ is already presented in [1], we omit the details.

4. Gauge Symmetry Breaking

In this section, we consider the situation where there are more than two pure zero energy states and non-trivial soliton sectors appear. In this case, it is more natural to consider positive energy representations rather than ground states.

Definition 4.1. Let $(\mathcal{A}, \alpha_t)$ be a $C^*$-dynamical system, where $\mathcal{A}$ is a $C^*$-algebra and $\alpha_t$ is a one-parameter group of automorphisms of $\mathcal{A}$. A representation $\{\pi(\mathcal{A}), \mathfrak{H}\}$ of $\mathcal{A}$ on the Hilbert space $\mathfrak{H}$ is called a positive energy representation if there is a positive selfadjoint operator $H_\pi$ on $\mathfrak{H}$ satisfying
$$H_\pi \ge 0, \qquad e^{itH_\pi}\pi(Q)e^{-itH_\pi} = \pi(\alpha_t(Q)), \qquad Q \in \mathcal{A},\ t \in \mathbb{R}. \qquad (4.1)$$
In the context of algebraic local quantum field theory, the positive energy condition is a part of the selection criteria for the physically relevant representations of the algebra of local observables (see [9]). The classification of the positive energy representations was studied for the exactly solvable one-dimensional XY model by H. Araki (see [3]). (In [3] the positive energy representation is referred to as the finite energy representation.) A theorem of Borchers tells us that we can redefine $H_\pi$ so that it is affiliated to the von Neumann algebra $\pi(\mathcal{A})''$ generated by $\pi(\mathcal{A})$ (cf. Theorem 3.2.46 of [6]).

Theorem 4.2 (Borchers). Let $\{\pi(\mathcal{A}), \mathfrak{H}\}$ be a positive energy representation of the $C^*$-dynamical system $(\mathcal{A}, \alpha_t)$. Then there exists a selfadjoint operator $H_\pi$ on $\mathfrak{H}$ satisfying (4.1) such that $e^{itH_\pi}$ is a unitary group in the von Neumann algebra $\pi(\mathcal{A})''$.

Due to this theorem of Borchers, the central decomposition of a positive energy state gives rise to a decomposition into factorial positive energy representations. We now turn to our quantum lattice models. We again assume that zero energy states exist and (2.4) is valid.

Lemma 4.3. Let $\{\pi(\mathcal{A}), \mathfrak{H}\}$ be a positive energy representation for the Hamiltonian satisfying (2.4). Suppose that a zero energy state $\psi$ exists. Let $H_\pi$ be the positive operator of Theorem 4.2. Suppose that $\xi$ is a unit vector in $\mathfrak{H}$ with compact spectral support in the spectral decomposition of $H_\pi$. Consider the vector state $\varphi_\xi$ of $\mathcal{A}$ determined by $\varphi_\xi(Q) = (\xi, \pi(Q)\xi)$. Then there exists a constant $K$ such that
$$0 \le \sum_{j\in\mathbb{Z}} \varphi_\xi(h_j) \le K. \qquad (4.2)$$

Proof of Lemma 4.3. The proof is similar to that for ground states (cf. [4]). We will denote the $\sigma$-weak extension of $\varphi_\xi$ to the von Neumann algebra $\pi(\mathcal{A})''$ by the same symbol $\varphi_\xi$. Set $\Lambda(k) = \{-k, -k+1, \ldots, k-1, k\}$ and let $r$ be the range of the interaction. Suppose $H_{\Lambda(k)} \in \mathcal{A}_{\Lambda(k)}$ and
$$[H_{\Lambda(k+l)} - H_{\Lambda(k+r)},\, Q] = 0 \quad \text{if } Q \text{ is in } \mathcal{A}_{\Lambda(k)} \text{ and } l > r.$$
As a conditional expectation exists from $\mathcal{A}$ to $\mathcal{A}_{\Lambda^c}$, we have $\pi(\mathcal{A})'' \cap \pi(\mathcal{A}_\Lambda)' = \pi(\mathcal{A}_{\Lambda^c})''$. By the Trotter-Kato formula,
$$\lim_{N\to\infty} \Big( e^{i\frac{t}{N}H_\pi}\, e^{-i\frac{t}{N}\pi(H_\Lambda)} \Big)^N = e^{it(H_\pi - \pi(H_\Lambda))}.$$
Thus $H_\pi - \pi(H_\Lambda)$ is affiliated with $\pi(\mathcal{A})''$. Furthermore, if $Q$ is in $\mathcal{A}_{\Lambda(k)}$,
$$[\,H_\pi - \pi(H_{\Lambda(k+r)}),\ \pi(Q)\,] = 0.$$
Any local element $Q$ is an analytic element of the time evolution generated by $H_\pi - \pi(H_{\Lambda(k+r)})$. We can conclude that
$$e^{it(H_\pi - \pi(H_{\Lambda(k+r)}))}\, \pi(Q)\, e^{-it(H_\pi - \pi(H_{\Lambda(k+r)}))} = \pi(Q)$$
for $Q$ in $\mathcal{A}_{\Lambda(k)}$. Thus $e^{it(H_\pi - \pi(H_{\Lambda(k+r)}))} \in \pi(\mathcal{A})'' \cap \pi(\mathcal{A}_{\Lambda(k)})'$. Hence $H_\pi - \pi(H_{\Lambda(k+r)})$ is affiliated with $\pi(\mathcal{A}_{\Lambda(k)^c})''$, that is, $e^{it(H_\pi - \pi(H_{\Lambda(k+r)}))} \in \pi(\mathcal{A}_{\Lambda(k)^c})''$. Consider the state $\tilde\varphi_\xi$ defined by
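$$\tilde\varphi_\xi = (\varphi_\xi)_{\Lambda(k)^c} \otimes \psi_{\Lambda(k)}.$$
$\tilde\varphi_\xi$ is quasi-equivalent to $\varphi_\xi$ and $(\tilde\varphi_\xi)_{\Lambda(k)^c} = (\varphi_\xi)_{\Lambda(k)^c}$. Thus for the $\sigma$-weak extensions of $(\tilde\varphi_\xi)_{\Lambda(k)^c}$ and $(\varphi_\xi)_{\Lambda(k)^c}$ to $\pi(\mathcal{A}_{\Lambda(k)^c})''$ we obtain
$$\tilde\varphi_\xi\big(e^{it(H_\pi - \pi(H_{\Lambda(k+r)}))}\big) = \varphi_\xi\big(e^{it(H_\pi - \pi(H_{\Lambda(k+r)}))}\big). \qquad (4.3)$$
By differentiating (4.3) with respect to $t$, we have
$$\varphi_\xi\big(H_\pi - \pi(H_{\Lambda(k+r)})\big) = \tilde\varphi_\xi\big(H_\pi - \pi(H_{\Lambda(k+r)})\big) = \tilde\varphi_\xi(H_\pi) - \tilde\varphi_\xi(H_{\Lambda(k+r)}) \ge -\tilde\varphi_\xi(H_{\Lambda(k+r)}) \ge -K_0,$$
where $K_0$ is a constant independent of $k$. As a consequence, we get
$$0 \le \varphi_\xi(H_\Lambda) \le \varphi_\xi(H_\pi) + K_0 = K.$$

Assumption 4.4. Assume (2.4) is valid and the spectral gap is open in the sense of (2.8). We further assume that the number of unitary equivalence classes of (GNS representations for) pure zero energy states of $\mathcal{A}_{(-\infty,-1]}$ for $H_{(-\infty,-1]}$ is finite. The same condition is valid for $H_{[0,\infty)}$.

Theorem 4.5. Suppose that Assumption 4.4 is valid. Take zero energy states $\psi_-^{(\alpha)}$ of $\mathcal{A}_{(-\infty,-1]}$ for $H_{(-\infty,-1]}$ and $\psi_+^{(\beta)}$ of $\mathcal{A}_{[0,\infty)}$ for $H_{[0,\infty)}$ which are representatives of the unitary equivalence classes of pure zero energy states. Let $\{\pi_-^{(\alpha)}, \mathfrak{H}_-^{(\alpha)}\}$ (resp. $\{\pi_+^{(\beta)}, \mathfrak{H}_+^{(\beta)}\}$) be the GNS representation of $\mathcal{A}_{(-\infty,-1]}$ (resp. $\mathcal{A}_{[0,\infty)}$) associated with $\psi_-^{(\alpha)}$ (resp. $\psi_+^{(\beta)}$). Consider the representation $\{\pi_{\mathbb{Z}}, \mathfrak{H}_{\mathbb{Z}}\}$ of $\mathcal{A}$ determined by
$$\pi_{\mathbb{Z}} = \sum_{\alpha,\beta}^{\oplus} \pi_-^{(\alpha)} \otimes \pi_+^{(\beta)}, \qquad \mathfrak{H}_{\mathbb{Z}} = \sum_{\alpha,\beta}^{\oplus} \mathfrak{H}_-^{(\alpha)} \otimes \mathfrak{H}_+^{(\beta)}. \qquad (4.4)$$
Any positive energy representation is quasi-contained in $\{\pi_{\mathbb{Z}}, \mathfrak{H}_{\mathbb{Z}}\}$; namely, it is quasi-equivalent to a subrepresentation of $\{\pi_{\mathbb{Z}}, \mathfrak{H}_{\mathbb{Z}}\}$.

The above theorem tells us that the existence of non-translationally invariant ground states is reduced to another spectral problem, namely, the existence of point spectrum at the bottom of $\mathrm{spec}(H_\pi)$ in the soliton sector in the universal positive energy representation $\{\pi_{\mathbb{Z}}, \mathfrak{H}_{\mathbb{Z}}\}$.

Corollary 4.6. Suppose that Assumption 4.4 is valid. For any ground state $\varphi$, there exists a density matrix $\rho_\varphi$ in $\mathfrak{H}_{\mathbb{Z}}$ of Theorem 4.5 satisfying
$$\mathrm{tr}_{\mathfrak{H}_{\mathbb{Z}}}\big(\rho_\varphi\, \pi_{\mathbb{Z}}(Q)\big) = \varphi(Q) \quad \text{for } Q \in \mathcal{A}. \qquad (4.5)$$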
In many cases, the zero energy states $\psi_-^{(\alpha)}$ and $\psi_+^{(\beta)}$ for half infinite systems can be identified via the reflection at the point 1/2 (the middle point of the origin and 1 in the lattice $\mathbb{Z}$), and the state $\psi_-^{(\alpha)} \otimes \psi_+^{(\alpha)}$ of the full algebra $\mathcal{A}$ is equivalent to a translationally invariant ground state. So the soliton sector is the part $\{\pi_-^{(\alpha)} \otimes \pi_+^{(\beta)},\ \mathfrak{H}_-^{(\alpha)} \otimes \mathfrak{H}_+^{(\beta)}\}$ for $\alpha \neq \beta$.

Proof of Theorem 4.5. Suppose that $\{\pi, \mathfrak{H}\}$ is a positive energy representation. Let $\pi(P_\Lambda)$ be the zero energy projection for $H_\Lambda$. Let $\varphi_\xi$ be the vector state of Lemma 4.3. As in the proof of Proposition 2.6, there exists $m$ such that for $\Lambda = [-n, -m] \cup [m, n]$,
$$\lim_{n\to\infty} P_\Lambda \xi = \xi_\infty \neq 0.$$
The state $\varphi_\infty$ defined by
$$\varphi_\infty(Q) = \frac{1}{\|\xi_\infty\|^2}\,(\xi_\infty, \pi(Q)\xi_\infty)$$
satisfies the zero energy condition for $H_{[-m+1,\,m-1]^c}$. Hence, by Lemma 2.9, the representation $\{\pi(\mathcal{A}), \overline{\pi(\mathcal{A})\xi_\infty}\}$ (the restriction of $\pi(\mathcal{A})$ to $\overline{\pi(\mathcal{A})\xi_\infty}$) is quasi-contained in $\{\pi_{\mathbb{Z}}, \mathfrak{H}_{\mathbb{Z}}\}$. Then the same procedure continues in the orthogonal complement of $\overline{\pi(\mathcal{A})\xi_\infty}$.

In the above proof, the assumption of the finiteness of equivalence classes of pure zero energy states is not essential. For example, consider the gauge group action $\gamma_g$ on our UHF algebra $\mathcal{A}$ (a product type action of a compact group $G$ on $\mathcal{A}$). Hence
$$\gamma_g(Q^{(j)}) \in \mathcal{A}_{\{j\}} \quad \text{for any } Q \text{ in } M_n(\mathbb{C}) \text{ and any } j \text{ in } \mathbb{Z}.$$
The fixed point subalgebras for this action are denoted by $\mathcal{A}^G$, $\mathcal{A}^G_{loc}$ and $\mathcal{A}^G_\Lambda$:
$$\mathcal{A}^G = \{Q \in \mathcal{A} \mid \gamma_g(Q) = Q\ \ \forall g \in G\}, \qquad \mathcal{A}^G_\Lambda = \mathcal{A}^G \cap \mathcal{A}_\Lambda, \qquad \mathcal{A}^G_{loc} = \mathcal{A}^G \cap \mathcal{A}_{loc}.$$

Assumption 4.7. Assume (2.4) is valid and the spectral gap opens in the sense of (2.8). We further assume the following conditions.
(i) The Hamiltonian is gauge invariant, $h_j \in \mathcal{A}^G \cap \mathcal{A}_{loc}$.
(ii) The gauge invariant zero energy state exists uniquely for $H_{\mathbb{Z}}$.
(iii) All the gauge invariant zero energy states of $\mathcal{A}_{(-\infty,-1]}$ for $H_{(-\infty,-1]}$ are mutually quasi-equivalent. The same condition is valid for $H_{[0,\infty)}$.

By the idea of the proof of Theorem 4.5, we can also show the following result.

Theorem 4.8. Suppose Assumption 4.7 is valid. Let $\psi_-$ be a pure zero energy state of $\mathcal{A}_{(-\infty,-1]}$ for $H_{(-\infty,-1]}$. Let $\psi_+$ be a pure zero energy state of $\mathcal{A}_{[0,\infty)}$ for $H_{[0,\infty)}$. Let $\varphi$ be a ground state of $H_{\mathbb{Z}}$. Then there exist $g_\pm \in G$ such that $\varphi$ and $\psi_- \circ \gamma_{g_-} \otimes \psi_+ \circ \gamma_{g_+}$ are (unitarily) equivalent.

In short, Theorem 4.8 claims that the soliton sectors of a massive model are parametrized by the pairs of states $\psi_- \circ \gamma_{g_-}$, $\psi_+ \circ \gamma_{g_+}$ ($g_- \neq g_+$).

We present an example where the gauge group $G$ is $\mathbb{Z}_2$. The model is the $\mathbb{Z}_2$ symmetric version of the Hamiltonian of (3.5) (see [14] for the higher spin version of the models described below). The ferromagnetic XXZ model (1.2) is the simplest example. Suppose a finite range translationally invariant interaction $\{J_A\}$ for the classical Ising spin is given. Let $r_0$ be the range of the interaction.
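Assumption 4.9. Let $\{V_A(\sigma_z) \mid A \subset \mathbb{Z},\ |A| < \infty\}$ be a collection of polynomials of $\sigma_z^{(j)}$ satisfying the following conditions.
1. Positivity: $V_{\{j,j+1\}}(\sigma_z) > 0$, $V_A(\sigma_z) \ge 0$.
2. Translational Invariance: $\tau_j(V_A(\sigma_z)) = V_{A+j}(\sigma_z)$.
3. Finiteness of the Range of Interaction: $V_A(\sigma_z) = 0$ if the diameter of $A$ is larger than $r_1$.
4. Reversibility: $V_A(\sigma_z)\sigma_x(A) = V_A(\sigma_z)\sigma_x(A)$.
5. $\mathbb{Z}_2$ Invariance:
$$V_A(\sigma_z) = 0 \quad \text{if } |A| \text{ is odd}. \qquad (4.6)$$

Let $\Theta$ be the automorphism of $\mathcal{A}$ determined by
$$\Theta(\sigma_z^{(j)}) = \sigma_z^{(j)}, \qquad \Theta(\sigma_\alpha^{(j)}) = -\sigma_\alpha^{(j)} \quad \text{if } \alpha = x, y. \qquad (4.7)$$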
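On a single site the action (4.7) can be modelled concretely. The sketch below is an illustration under a stated assumption: it restricts attention to one site and realises the sign flips as conjugation by $\sigma_z$ (a standard identity for Pauli matrices); on the infinite chain $\Theta$ is defined site by site by (4.7) and is not claimed here to be inner. The helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def neg(a):
    return [[-x for x in row] for row in a]


# Pauli matrices
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]


def theta_one_site(q):
    """Single-site model of Theta: q -> sigma_z q sigma_z (sigma_z is its own inverse)."""
    return matmul(matmul(SZ, q), SZ)


print(theta_one_site(SZ) == SZ)        # sigma_z is fixed
print(theta_one_site(SX) == neg(SX))   # sigma_x changes sign
print(theta_one_site(SY) == neg(SY))   # sigma_y changes sign
print(theta_one_site(theta_one_site(SX)) == SX)  # applying the map twice gives the identity
```

The last line reflects that the map squares to the identity, which is what makes it a $\mathbb{Z}_2$ action.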
This $\Theta$ gives rise to a $\mathbb{Z}_2$ action on $\mathcal{A}$. Consider the Hamiltonian $H_\Lambda$ defined by (3.5) under Assumption 4.9. Due to (4.6), $H_\Lambda$ is $\Theta$ invariant. When restricted to $\mathcal{A}^{\mathbb{Z}_2}$ the time evolution is conjugate to that of Theorem 3.2 via the Kramers-Wannier duality. In particular, the spectral gap is open uniformly in $\Lambda$. In [13], we have also shown that any zero energy state is a convex combination of $\varphi_\mu$ and $\varphi_\mu \circ \Theta$, where $\varphi_\mu$ is the state defined in (3.4). Thus by Theorem 4.8 and an argument similar to the proof of Theorem 3.2 we obtain possible candidates for non-translationally invariant ground states. Let $\Theta_-$ be the automorphism of $\mathcal{A}$ defined by
$$\Theta_-(\sigma_z^{(j)}) = \sigma_z^{(j)} \quad \text{for any } j,$$
$$\Theta_-(\sigma_\alpha^{(j)}) = \sigma_\alpha^{(j)} \quad \text{if } \alpha = x, y \text{ and } j > 0,$$
$$\Theta_-(\sigma_\alpha^{(j)}) = -\sigma_\alpha^{(j)} \quad \text{if } \alpha = x, y \text{ and } j \le 0. \qquad (4.8)$$

Theorem 4.10. Suppose Assumption 4.9 is valid. Any irreducible positive energy representation is equivalent to the GNS representation associated with one of the following states: $\varphi_\mu$, $\varphi_\mu \circ \Theta$, $\varphi_\mu \circ \Theta_-$ or $\varphi_\mu \circ \Theta \circ \Theta_-$. If a non-translationally invariant pure ground state $\psi$ exists, it is equivalent to $\varphi_\mu \circ \Theta_-$ or $\varphi_\mu \circ \Theta \circ \Theta_-$.

Theorem 4.10 tells us that any positive energy representation of the XXZ model (1.2) with $\Delta > 1$ is necessarily a ground state representation. This is because any vector state associated with any factorial positive energy representation is equivalent to the GNS representation of one of the listed states of Theorem 4.10. We know that these are equivalent to ground states due to the result of [7].

Acknowledgement. This work is supported by Sumitomo Foundation (grant 960268).
References
1. Affleck, I., Kennedy, T., Lieb, E.H., Tasaki, H.: Valence Bond Ground States in Isotropic Quantum Antiferromagnets. Commun. Math. Phys. 115, 477–528 (1988)
2. Alcaraz, S.R., Salinas, R.S., Wreszinski, W.F.: Anisotropic ferromagnetic quantum domain. Phys. Rev. Lett. 75, 930–933 (1995)
3. Araki, H.: Soliton sector of the XY-model. International Journal of Modern Physics B 10, Nos. 13 & 14, 1685–1693 (1996)
4. Bratteli, O., Kishimoto, A., Robinson, D.: Ground states of quantum spin systems. Commun. Math. Phys. 64, 41–48 (1978)
5. Fannes, M., Nachtergaele, B., Werner, R.: Finitely Correlated States on Quantum Spin Chains. Commun. Math. Phys. 144, 443–490 (1992)
6. Bratteli, O., Robinson, D.: Operator algebras and quantum statistical mechanics I, II. Berlin–Heidelberg–New York: Springer, 1979
7. Gottstein, C.T., Werner, R.: Zero-energy states of the ferromagnetic XXZ chain. Preprint, Osnabrück, 1995
8. Gottstein, C.T., Werner, R.: Zero-energy ground states of quantum lattice systems. Preprint, Osnabrück, 1995
9. Haag, R.: Local Quantum Physics. Berlin–Heidelberg–New York: Springer-Verlag, 1992
10. Holley, R.: Rapid convergence to equilibrium in one-dimensional stochastic Ising models. Ann. Probab. 13, 72–89 (1985)
11. Matsui, T.: Uniqueness of translationally invariant ground state in quantum spin systems. Commun. Math. Phys. 126, 453–467 (1990)
12. Matsui, T.: Gibbs measure as quantum ground states. Commun. Math. Phys. 135, 79–89 (1990)
13. Matsui, T.: On Ground State Degeneracy of Z2 Symmetric Quantum Spin Models. Publ. RIMS, Kyoto Univ. 27, 657–659 (1991)
14. Matsui, T.: Markov Semigroups which Describe the Time Evolution of Some Higher Spin Quantum Models. J. Functional Analysis 116, 179–198 (1993)
15. Matsui, T.: On ground states of the one-dimensional ferromagnetic XXZ model. Lett. Math. Phys. 37, 397–403 (1996)

Communicated by H. Araki
Commun. Math. Phys. 189, 145 – 164 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997

Existence of Constant Mean Curvature Foliations in Spacetimes with Two-Dimensional Local Symmetry

Alan D. Rendall*
Institut des Hautes Etudes Scientifiques, 35 Route de Chartres, 91440 Bures sur Yvette, France
* Present address: Max-Planck-Institut für Gravitationsphysik, Schlaatzweg 1, 14473 Potsdam, Germany

Received: 15 July 1996 / Accepted: 12 March 1997

Abstract: It is shown that in a class of maximal globally hyperbolic spacetimes admitting two local Killing vectors, the past (defined with respect to an appropriate time orientation) of any compact constant mean curvature hypersurface can be covered by a foliation of compact constant mean curvature hypersurfaces. Moreover, the mean curvature of the leaves of this foliation takes on arbitrarily negative values and so the initial singularity in these spacetimes is a crushing singularity. The simplest examples occur when the spatial topology is that of a torus, with the standard global Killing vectors, but more exotic topologies are also covered. In the course of the proof it is shown that in this class of spacetimes a kind of positive mass theorem holds. The symmetry singles out a compact surface passing through any given point of spacetime and the Hawking mass of any such surface is non-negative. If the Hawking mass of any one of these surfaces is zero then the entire spacetime is flat.

1. Introduction

There are a number of general results in the literature on the properties of foliations by compact spacelike hypersurfaces of constant mean curvature (CMC hypersurfaces) in spacetimes which admit a compact Cauchy hypersurface. (See [18, 1] and references therein.) In particular, a basic result of Gerhardt [9] shows that if there is a foliation whose mean curvature tends uniformly to infinity, there is a CMC foliation with the same property. However, the only results which give criteria in terms of Cauchy data for the existence of such foliations covering more than a small neighbourhood of a given CMC hypersurface are restricted to special classes of spacetimes, all of which have high symmetry. The results of this paper also apply only to certain spacetimes with symmetry but represent a significant generalization, since they include for the first time spacetimes containing both matter and gravitational waves. The method used suggests that there is
a close connection between the question of global existence of CMC foliations and that of global existence of solutions of the Einstein-matter equations and such a connection helps to explain why it has up to now been necessary to make symmetry assumptions: we cannot understand the question of global existence of CMC foliations in a context where we do not understand the question of global existence for the Einstein-matter equations. When a CMC foliation exists in a spatially compact spacetime satisfying the strong energy condition it is unique. If the exceptional case of flat spacetime is excluded, the mean curvature of the leaves of this foliation varies in a strictly monotone manner. Thus it provides an invariantly defined preferred time coordinate on spacetime. The spacetimes studied in the following are defined by two conditions. The first is that they be solutions of the Einstein equations coupled to certain matter fields and the second is that they admit a compact CMC Cauchy hypersurface which possesses a two-dimensional abelian group of local symmetries without fixed points. (The second condition, stated here informally, is made precise in the next section.) The simplest example of a symmetry of this kind is that where the compact Cauchy hypersurface is diffeomorphic to the torus T 3 = S 1 × S 1 × S 1 , with the symmetries given by the action of U (1) × U (1) acting on two of the three S 1 factors by rotations. As shown below, the mean curvature of the Cauchy hypersurface must be non-zero, except in the trivial case where the spacetime is flat. Without loss of generality, reversing the time orientation if necessary, it can be assumed to be negative. 
The main result of this paper (Theorems 5.1 and 6.1) is that if the spacetime is the maximal globally hyperbolic development of data with local U (1) × U (1) symmetry on a CMC Cauchy hypersurface then the entire past of this hypersurface can be covered by a CMC foliation, with the mean curvature taking all values in the interval (−∞, H0 ], where H0 is the mean curvature of the initial hypersurface. In particular the initial singularity in these spacetimes is a crushing singularity in the sense of Eardley and Smarr[8]. The assumption made on the matter model is that it is either collisionless matter modelled by the Vlasov equation (Theorem 5.1) or a wave map with values in an arbitrary complete Riemannian manifold (Theorem 6.1). It is also shown that the CMC foliation can be extended so that the mean curvature takes on all values in the interval (−∞, 0). Unfortunately this does not by itself suffice to show that the CMC foliation covers the entire future of the initial hypersurface. Special cases of this result are already known. The first is that of the Gowdy spacetimes on the torus. These are vacuum spacetimes with global U (1) × U (1) symmetry which satisfy the additional condition that the so-called twist constants vanish. (The meaning of this is explained in Sect. 3. For more information on its significance, see [6].) The result was proved in this case by Isenberg and Moncrief [14]. The second is that of solutions of the Einstein-Vlasov system with plane symmetry [19]. In the first of these cases there is no matter present, while in the second there are no gravitational waves. The results of this paper contain both these results as special cases. It should be noted that they go beyond previous results even in the vacuum case in two ways: they require only local, rather than global symmetry and they allow non-vanishing twist constants. 
The essential new element in comparison with the cases considered previously is the occurrence of nonlinear hyperbolic equations which are coupled to the matter equations. These are treated with the help of methods introduced by Gu [13] in the study of wave maps and by Glassey and Strauss [12] in the study of the Vlasov-Maxwell system. As a by-product of this analysis, a theorem on the positivity of the Hawking mass in spacetimes with local U (1)×U (1) symmetry is obtained (Proposition 3.1). This says that the Hawking mass of any surface of symmetry in a spacetime of this type is non-negative and that if the Hawking mass vanishes for any one of these surfaces in a spacetime, then the spacetime is flat.
2. Local U (1) × U (1) Symmetry The spacetimes considered in the following are defined on manifolds of the form M = R × S, where S is a bundle over the circle S 1 whose fibre is a compact orientable surface F . Let p be the projection of the universal cover F˜ onto F . Let gαβ be a globally hyperbolic metric on M for which each submanifold {t} × S is a Cauchy hypersurface. S is covered by its pull-back to a bundle over R. Since R is contractible, the latter bundle is isomorphic to R × F . This in turn is covered in a natural way by R × F˜ , which is simply connected. Hence the universal cover S˜ of S can be identified with R × F˜ and there is a natural fibre preserving projection corresponding to p. Let pˆ be the associated ˜ = R2 × F˜ onto M . Define gˆ αβ to be the pull-back of gαβ by p. ˆ Suppose projection of M ˜ by isometries of gˆ αβ in such a that a two-dimensional Lie group G acts effectively on M way that the orbits are the inverse images under pˆ of the fibres of the bundle S. Each orbit with its induced metric is a simply connected Riemannian manifold of constant curvature and thus must be, up to a constant conformal rescaling, isometric to the standard metric on the sphere, the Euclidean plane or the hyperbolic plane. The isometry group of the sphere has no two-dimensional subgroups and thus the case F = S 2 is not possible. (If G is replaced by a three-dimensional Lie group then F = S 2 is possible and the spherically symmetric spacetimes studied in [19] are obtained.) The surfaces diffeomorphic to F which correspond to the fibres of S and whose inverse images are the group orbits will be referred to as surfaces of symmetry. These spacetimes will be said to have twodimensional local symmetry. In the case where the orbits are isometric to the Euclidean plane, they will be said to have local U (1) × U (1) symmetry. Consider now any spacelike hypersurface S in the spacetime (M, gαβ ) which is a union of surfaces of symmetry. 
Choose one of these surfaces of symmetry and call it F0 . Let γ be an affinely parametrized geodesic of the induced metric on S which starts orthogonal to F0 . It is also orthogonal to all the other surfaces of symmetry which it meets. Taking all geodesics of this type and following them until they meet F0 again gives a smooth mapping φ from F0 × I to S, where I is some interval. Let the parameter along the geodesics be chosen so that I = [0, 2π]. Let ψ denote the mapping which takes φ(x, 0) to φ(x, 2π) and let ψ˜ denote the lift of this mapping to a diffeomorphism between ˜ defined by following geodesics in the covering space. two inverse images of F0 in M The mapping ψ˜ maps the Killing vectors of F0 corresponding to the group action on one ˜ to those corresponding to another inverse image. These of the inverse images of F0 in M two inverse images can be identified with each other by an isometry, which is uniquely determined up to an element of the isometry group of the Euclidean or hyperbolic plane respectively by their induced metrics. In the case that F˜0 is isometric to the Euclidean plane ψ˜ 0 is the composition of a linear mapping ψ˜ 1 with a translation ψ˜ 2 . The mapping ψ˜ must preserve the lattice of inverse images in F˜0 of a given point in F0 . These lattices are isometric, and so by using the freedom in identifying the two covering spaces, it can be assumed without loss of generality that they are identified with each other. It follows that ψ˜ 1 can be represented as an element of GL(2, Z). When this element is not the identity it means in general that the topology of S is that of a non-trivial torus bundle over the circle. Let y A be periodic coordinates on F0 corresponding to Cartesian coordinates on F˜0 . Let (x, y A ) be Gauss coordinates based on F0 such that y A restrict to the previously chosen coordinates on F0 . Now let B 4 = det(gAB ), where upper case Roman indices take the values 2 and 3. 
The metric takes the form:
\[ dx^2 + B^2 \tilde g_{AB}(x)\, dy^A dy^B, \tag{2.1} \]
where $\det \tilde g_{AB} = 1$. Let $L$ be the length of a geodesic which starts normal to an orbit and ends when it intersects the same orbit again. Define $a = 2\pi / \left(\int_0^L B^{-1}(x)\,dx\right)$ and
\[ x' = a \int_0^x B^{-1}(y)\,dy. \tag{2.2} \]
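The rescaling in (2.2) can be tried out numerically. The following is a minimal sketch, assuming a hypothetical positive profile `B_example` and a trapezoid-rule quadrature; neither is taken from the paper. It tabulates $x'$ on $[0, L]$ and shows that the choice of $a$ makes $x'$ run from $0$ to $2\pi$.

```python
import math

def rescale(B, L, n=2000):
    """Tabulate x' = a * int_0^x dy / B(y) on [0, L] with the trapezoid rule.

    Returns (a, xs, xps), where a = 2*pi / int_0^L dy / B(y), so that the
    last entry of xps equals 2*pi up to rounding.
    """
    h = L / n
    xs = [i * h for i in range(n + 1)]
    inv_B = [1.0 / B(x) for x in xs]
    cumulative = [0.0]
    for i in range(n):
        cumulative.append(cumulative[-1] + 0.5 * h * (inv_B[i] + inv_B[i + 1]))
    a = 2.0 * math.pi / cumulative[-1]
    return a, xs, [a * c for c in cumulative]

def B_example(x):
    # Hypothetical positive profile, used only to exercise the formulas.
    return 2.0 + math.sin(x)

a, xs, xps = rescale(B_example, L=3.0)
# Conformal factor A = B / a at the grid points (still as a function of old x).
A_values = [B_example(x) / a for x in xs]
```

Because $B > 0$, the tabulated $x'$ is strictly increasing, so it is a legitimate coordinate along the geodesic.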
If $x'$ is used as a coordinate and the primes omitted from the notation the metric takes the form:
\[ A^2 (dx^2 + a^2 \tilde g_{AB}\, dy^A dy^B), \tag{2.3} \]
where $A$ is $a^{-1}B$, reexpressed as a function of $x'$. The new coordinate $x$ runs from $0$ to $2\pi$. The functions $\tilde g_{AB}(x)$ satisfy the relation $\tilde g_{AB}(x + 2\pi) = n^C_A n^D_B \tilde g_{CD}(x)$, where $n^A_B$ are the components of a matrix in $GL(2, \mathbb{Z})$. $A$ satisfies $A(x + 2\pi) = A(x)$.

A similar analysis for the case that $\tilde F_0$ is the hyperbolic plane would no doubt be more complicated and is not attempted here. However it appears that, due to the fact that vector fields and tracefree symmetric rank two tensors on a surface of genus higher than one must have zeroes, in that case nothing will be obtained which goes beyond the spacetimes with hyperbolic symmetry already studied in [19].

The metric $\tilde g_{AB}$ can be parametrized in terms of two functions $W$ and $V$ in the following way:
\[ \tilde g_{22} = e^{W} \cosh V, \qquad \tilde g_{33} = e^{-W} \cosh V, \qquad \tilde g_{23} = \sinh V. \tag{2.4} \]
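As a sanity check on (2.4), the sketch below verifies numerically that the parametrization has unit determinant, and that the relation $\tilde g_{AB} \mapsto n^C_A n^D_B \tilde g_{CD}$ with an integer matrix of determinant $\pm 1$ preserves this property, so that new values of $W$ and $V$ can be read off again. The sample matrix `N`, the sample values of $W$ and $V$, and the index convention `n[C][A]` are assumptions made for illustration only.

```python
import math

def g_tilde(W, V):
    """Matrix [[g22, g23], [g23, g33]] of the parametrization (2.4)."""
    return [[math.exp(W) * math.cosh(V), math.sinh(V)],
            [math.sinh(V), math.exp(-W) * math.cosh(V)]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def transform(n, g):
    """Return n^C_A n^D_B g_CD, with the convention n[C][A] = n^C_A."""
    return [[sum(n[c][a] * n[d][b] * g[c][d] for c in range(2) for d in range(2))
             for b in range(2)] for a in range(2)]

def to_WV(g):
    """Invert (2.4) for a positive definite symmetric matrix of determinant 1."""
    V = math.asinh(g[0][1])
    W = 0.5 * math.log(g[0][0] / g[1][1])
    return W, V

N = [[2, 1], [1, 1]]      # hypothetical element of GL(2, Z), determinant 1
g0 = g_tilde(0.3, -0.7)   # arbitrary sample values of W and V
g1 = transform(N, g0)
W1, V1 = to_WV(g1)
```

The map $(W, V) \mapsto (W_1, V_1)$ obtained this way involves `asinh` and `log` of quadratic expressions in $e^{\pm W}\cosh V$ and $\sinh V$, which gives a feeling for why it has no simple closed form.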
The values of $V$ and $W$ at $x = 0$ and $x = 2\pi$ are related by a diffeomorphism $N$ which does not have a simple explicit form. There are several special cases which are of interest. Consider first the case where the matrix $N$ with components $n^A_B$ is the identity. Then there is a natural action of $U(1) \times U(1)$ on the spacetime and we have the case of (global) $U(1) \times U(1)$ symmetry. A further specialization is given by the assumption that the reflections $y^A \mapsto -y^A$ are isometries of the spacetime metric for $A = 2, 3$. Spacetimes satisfying this condition will be called polarized $U(1) \times U(1)$-symmetric spacetimes. They have the property that $V = 0$. The vacuum spacetimes of this class are the polarized Gowdy spacetimes [7]. The plane symmetric spacetimes studied in [19] have the property that $W = V = 0$.

When $N$ is not the identity, there are two qualitatively different cases. If $N$ is diagonalizable and not the identity, then either it is minus the identity, or the two eigenvalues are distinct. If it is minus the identity then $S$ has a two-fold cover which is a torus and for our purposes is essentially the same as when $N$ is the identity. When the eigenvalues are distinct the manifold $S$ admits a geometric structure of type Sol in the sense of Thurston [20]. The metrics obtained in that case include ones which are of Bianchi type VI$_0$. There is a polarized case, where reflections in the eigendirections of $N$ are supposed to be isometries of the metric $\hat g_{\alpha\beta}$ on the universal covering space. If $N$ has a non-standard Jordan form then, by passing to a two-fold cover if necessary, we can assume that these eigenvalues are equal to unity. The resulting manifold $S$ admits a geometric structure of type Nil [20]. The metrics obtained in that case include those of Bianchi type II.

Lemma 2.1. Let $(M, g)$ be a non-flat spacetime with local $U(1) \times U(1)$ symmetry having a symmetric constant mean curvature Cauchy hypersurface and satisfying the dominant and strong energy conditions.
Then given any point $p$ on the Cauchy surface there exists an open neighbourhood $U$ of $p$ and a smooth local diffeomorphism $\phi$ of $I \times [0, 2\pi] \times T^2$ onto $U$ for some interval $I$ such that:
(i) If $\phi(t, x_1, y_1) = \phi(t, x_2, y_2)$ then $x_1 = 0$ and $x_2 = 2\pi$ or vice versa.
(ii) For each $t \in I$ the set $\phi(\{t\} \times [0, 2\pi] \times T^2)$ is a hypersurface of constant mean curvature $t$.
(iii) The pull-back of the metric under $\phi$ has the form
\[ -\alpha^2 dt^2 + A^2 \left[ (dx + \beta^1 dt)^2 + a^2 \tilde g_{AB} (dy^A + \beta^A dt)(dy^B + \beta^B dt) \right]. \tag{2.5} \]
The functions $\alpha$, $\beta^a$, $A$ and $\tilde g_{AB}$ depend on $t$ and $x$ and $\tilde g_{AB}$ has unit determinant. They satisfy $\alpha(t, 2\pi) = \alpha(t, 0)$, $A(t, 2\pi) = A(t, 0)$, $\beta^1(t, 2\pi) = \beta^1(t, 0) = 0$, $\beta^B(t, 0) = 0$, $\tilde g_{AB}(t, 2\pi) = n^C_A n^D_B \tilde g_{CD}(t, 0)$, where $n^A_B$ is an element of $GL(2, \mathbb{Z})$. The quantity $a$ depends only on $t$.

Proof. It is a standard fact that, in a non-flat spacetime satisfying the strong energy condition, a neighbourhood of a compact CMC hypersurface with non-zero mean curvature can be foliated by constant mean curvature hypersurfaces and that the mean curvature of these hypersurfaces can be used as a time coordinate. If the $U(1) \times U(1)$-symmetry is global then it follows from the uniqueness of CMC hypersurfaces that they are unions of surfaces of symmetry. If the symmetry of the data is only local then some more care is needed, but since an almost identical argument has been given in [19] the details are omitted here. The mean curvature of the Cauchy hypersurface in the assumption of the lemma cannot be zero. To see this consider the Hamiltonian constraint
\[ R - k^{ab} k_{ab} + (\operatorname{tr} k)^2 = 16\pi\rho. \tag{2.6} \]
When the mean curvature is zero this implies that the scalar curvature $R$ is non-negative. The topology of the Cauchy hypersurface is such that any metric with non-negative scalar curvature must be flat, for its universal cover is diffeomorphic to $\mathbb{R}^3$. This implies ([16], p. 324) that the Cauchy hypersurface admits no metric of positive scalar curvature and it is well-known that a compact 3-manifold satisfying the latter condition admits no non-flat metrics of non-negative scalar curvature. Hence the induced metric on the Cauchy hypersurface is flat and, from the Hamiltonian constraint, the second fundamental form and the energy density are zero. It follows from the dominant energy condition that the spacetime is vacuum everywhere and uniqueness in the Cauchy problem for the vacuum Einstein equations shows that the spacetime is flat. Since the spacetime is by hypothesis non-flat, it follows that a maximal hypersurface is impossible. Let $U$ be an open neighbourhood of the Cauchy hypersurface covered by a CMC foliation and let $t$ be the function on $U$ which is equal to the mean curvature of the leaf of the foliation on which the point lies. Choose a surface of symmetry $F_0$ in the initial hypersurface $t = \text{const.}$ and identify this with surfaces in the other hypersurfaces $t = \text{const.}$ by means of geodesics which start on $F_0$ orthogonal to the Cauchy hypersurface. Construct a mapping on each hypersurface $t = \text{const.}$ from $[0, 2\pi] \times T^2$ in the way described above. Putting together these mappings for all values of $t$ occurring in the foliation of $U$ gives the mapping whose existence is asserted by the lemma.

Remark. If the hypotheses of the lemma are weakened to allow the spacetime to be flat then almost all the conclusions remain true. The only property which is lost is that the hypersurfaces of constant $t$, while still CMC, cannot always be chosen to have mean curvature $t$.

3. Estimates for the Hawking Mass and Area Radius

In this section certain general estimates for spacetimes with local $U(1) \times U(1)$ symmetry are derived. A solution of the Einstein constraint equations consists of a 3-dimensional manifold $S$ and a Riemannian metric $h_{ab}$, a symmetric tensor $k_{ab}$, a real-valued function $\rho$ and a covector $j_a$ on $M$ which satisfy the Hamiltonian constraint (2.6) and the momentum constraint
\[ \nabla^a k_{ab} - \nabla_b (\operatorname{tr} k) = 8\pi j_b. \tag{3.1} \]
In this paper it is always assumed that the dominant energy condition holds and this implies that $\rho \ge |j_a|$. Suppose now that $S$ is covered by a Gaussian foliation. In other words, if $F_0$ is a fixed leaf of the foliation, any other leaf is obtained by going a fixed distance along the geodesics which start normal to $F_0$. If we think of $S$ as being embedded in spacetime, then the resulting embedding of each leaf $F$ in this spacetime defines various geometrical objects on $F$, as is always the case for an embedding of pseudo-Riemannian manifolds. We present the definition of these objects in terms of such an embedding, but in fact they are uniquely defined by the initial data. There is a preferred orthonormal basis of the normal bundle of $F$ in spacetime, where the first vector is normal to $S$ and the second vector tangent to $S$. These vectors are defined uniquely up to sign by this condition. The geometric objects defined on $F$ by the embedding are then the induced metric, a second fundamental form associated to each normal vector and a 1-form, which is the representation of the normal connection in the given normal basis. The two second fundamental forms will be denoted by $\kappa_{AB}$ and $\lambda_{AB}$ respectively and the 1-form representing the normal connection will be denoted by $\eta_A$. (Here upper case Roman indices are used for objects intrinsic to $F$. Indices of objects of this kind are raised and lowered using the induced metric $g_{AB}$ and its inverse.) In a vacuum spacetime the freedom in $\eta_A$ consists of just two spacetime constants. These are the twist constants referred to in the introduction, whose vanishing is one of the defining conditions of Gowdy spacetimes.
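If $u$ is an arc length parameter along the normal geodesics, the constraints can be written in the following form:
\[ \partial_u (\operatorname{tr}\lambda + \operatorname{tr}\kappa) = H(\operatorname{tr}\lambda + \operatorname{tr}\kappa) + \nabla^A \eta_A + K - 8\pi(\rho + J) - \tfrac{3}{4}(\operatorname{tr}\lambda + \operatorname{tr}\kappa)^2 - \tfrac{1}{2}(\tilde\lambda_{AB} + \tilde\kappa_{AB})(\tilde\lambda^{AB} + \tilde\kappa^{AB}) - \eta^A \eta_A, \tag{3.2} \]
\[ \partial_u (\operatorname{tr}\lambda - \operatorname{tr}\kappa) = -H(\operatorname{tr}\lambda - \operatorname{tr}\kappa) - \nabla^A \eta_A + K - 8\pi(\rho - J) - \tfrac{3}{4}(\operatorname{tr}\lambda - \operatorname{tr}\kappa)^2 - \tfrac{1}{2}(\tilde\lambda_{AB} - \tilde\kappa_{AB})(\tilde\lambda^{AB} - \tilde\kappa^{AB}) - \eta^A \eta_A, \tag{3.3} \]
\[ \partial_u \eta_A = -(\operatorname{tr}\lambda)\eta_A - \nabla^B \kappa_{AB} - 8\pi j_A. \tag{3.4} \]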
Here $H$ is the trace of the second fundamental form $k_{ab}$ (i.e. the mean curvature), $K$ is the Gaussian curvature of $F$, $\tilde\kappa_{AB}$ and $\tilde\lambda_{AB}$ are the trace-free parts of $\kappa_{AB}$ and $\lambda_{AB}$ respectively and $J$ is the contraction of the unit normal vector to $F$ in $S$ with $j_a$. This way of writing the constraints generalizes an approach used by Malec and Ó Murchadha [17] for spherically symmetric asymptotically flat spacetimes, by the author [19] for spatially compact surface symmetric spacetimes and by Chruściel [5] for vacuum spacetimes with $U(1) \times U(1)$ symmetry. In the following these equations are only used in the case of local $U(1) \times U(1)$ symmetry. It should, however, be noted that the form of the equations suggests that there may exist an analogue in the general case. The terms which cause difficulties in general are those containing derivatives tangential to the foliation by surfaces $F$. These are of the form $\nabla^A \eta_A$ and $K$. The integral of the first of these over $F$ is zero, while the integral of the second is a constant only depending on the topology as a consequence of the Gauss-Bonnet theorem. Thus integrating over $F$ eliminates the tangential derivatives from Eqs. (3.2) and (3.3).

Consider now the case of local $U(1) \times U(1)$ symmetry, with the Gaussian foliation being that by surfaces of symmetry. Let $\theta = \operatorname{tr}\lambda - \operatorname{tr}\kappa$, $\theta' = \operatorname{tr}\lambda + \operatorname{tr}\kappa$. These are the expansions of the two families of null geodesics orthogonal to $F$. Let the area radius $r$ be the square root of the area of $F$. The Hawking mass is defined by $m = -\tfrac{1}{2} r \nabla^\alpha r \nabla_\alpha r$. In this case $\nabla^A \eta_A = K = 0$ and equations (3.2) and (3.3) become:
\[ \partial_u \theta = -H\theta - P, \tag{3.5} \]
\[ \partial_u \theta' = H\theta' - P', \tag{3.6} \]
where the quantities $P$ and $P'$ are non-negative. It is also useful, following [17], to write these equations in the alternative form:
\[ \begin{aligned} \partial_u (r\theta) &= -Q - \frac{1}{4r}\left(\theta^2 r^2 + 4\theta H r^2 + \theta r(\theta r - \theta' r)\right), \\ \partial_u (r\theta') &= -Q' - \frac{1}{4r}\left(\theta'^2 r^2 - 4\theta' H r^2 + \theta' r(\theta' r - \theta r)\right), \end{aligned} \tag{3.7} \]
where the quantities $Q$ and $Q'$ are non-negative. Consider now a symmetric Cauchy hypersurface $S$, i.e. one which is a union of surfaces of symmetry. Denote the maximum value attained by $r\theta$ and $r\theta'$ on this hypersurface by $M_+$ and the minimum by $M_-$. Let $x_0$ be a point where the maximum is attained and suppose without loss of generality that $\theta(x_0) \ge \theta'(x_0)$. Since $x_0$ is a critical point of $r\theta$, it follows from (3.7) that at that point either $r\theta \le 0$ or
\[ \theta^2 r^2 + (4Hr)(\theta r) \le 0. \tag{3.8} \]
It follows that $M_+ \le 4|Hr|$. Similarly, $M_- \ge -4|Hr|$. These inequalities show that $\theta$ and $\theta'$ can be bounded in modulus by $4|H|$. The Hawking mass is related to the area radius and the expansions by $-2m/r = \tfrac{1}{4} r^2 \theta\theta'$. Thus in a spacetime with local $U(1) \times U(1)$ symmetry which is foliated by compact CMC hypersurfaces with the mean curvature varying in a finite interval $(t_1, t_2)$ and which satisfies the dominant energy condition, if $r$ is bounded then $2m/r$ is bounded. These equations can also be used to prove a kind of positive mass theorem.

Proposition 3.1. Let $(M, g)$ be a spatially compact spacetime with local $U(1) \times U(1)$ symmetry which satisfies the dominant energy condition. Then the Hawking mass of each surface of symmetry is non-negative and if the Hawking mass of any surface of symmetry is zero the spacetime is flat.

Proof. The proof is similar to that of Lemma 2.4 of [19]. If $m$ vanishes on some surface $F$ then $\theta$ or $\theta'$ is zero there. Suppose without loss of generality that it is $\theta$. Then it can be concluded as in the proof of Lemma 2.4 of [19] that $\theta$ and $P$ vanish on any symmetric compact Cauchy hypersurface containing $F$. When $\theta$ is zero the other expansion $\theta'$ is given by the rate of change of $r$ along the compact Cauchy hypersurface. Hence $\theta'$ must vanish somewhere and, repeating the previous argument, $\theta'$ and $P'$ vanish on the whole Cauchy hypersurface. Looking at the explicit forms of $P$ and $P'$ shows that $\rho = 0$, $\kappa_{AB} = 0$, $\lambda_{AB} = 0$ and $\eta_A = 0$ on the Cauchy hypersurface.
The vanishing of $\lambda_{AB}$ implies that $\tilde g_{AB}$ is independent of $x$. It follows that a linear transformation with constant coefficients of the coordinates $y^A$ can be used to reduce the metric $\tilde g_{AB}$ on a given Cauchy hypersurface to the form $\delta_{AB}$. This, together with the vanishing of $\kappa_{AB}$ and $\eta_A$, shows that the initial data are plane symmetric. The subset of spacetime where $m = 0$ is closed. Because of the possibility of deforming spacelike hypersurfaces, it is also open and must be the whole spacetime. Hence the spacetime is plane symmetric and applying Lemma 2.4 of [19] shows that it is flat. If the spacetime is not flat then it follows that $\theta$ and $\theta'$ can never vanish. If they had the same sign then, by the relation $-2m/r = \tfrac{1}{4} r^2 \theta\theta'$, this would mean that the gradient of $r$ was everywhere spacelike and hence that the restriction of $r$ to a Cauchy hypersurface was strictly monotonic. This is clearly impossible, since this restriction must have a critical point somewhere. Hence $\theta$ and $\theta'$ have opposite signs, the gradient of $r$ is timelike and the Hawking mass is positive.

The timelike vector $\nabla^a r$ is past-pointing. For otherwise $\theta$ would be negative and $\theta'$ positive. Integrating (3.5) from $0$ to $2\pi$ on a hypersurface of constant time would then imply that $H$ was positive somewhere, contrary to what has already been assumed. It follows that $r$ is non-decreasing to the future along any timelike curve and that its value at any point with time coordinate $t_1$ is bounded above by its value on the hypersurface $t = t_2$ if $t_1 < t_2$. Let $n^a$ denote the unit normal to the surfaces of symmetry in the hypersurfaces $t = \text{const.}$ and define $K = k_{ab} n^a n^b$. Then, with respect to the coordinates introduced in Lemma 2.1, some of the field equations take the following explicit forms:
\[ \partial_x^2 (A^{1/2}) = -\tfrac{1}{8} A^{5/2} \left[ \tfrac{3}{2}(K - \tfrac{1}{3}t)^2 - \tfrac{2}{3}t^2 + 2\eta_A \eta^A + \tilde\kappa_{AB}\tilde\kappa^{AB} + \tilde\lambda_{AB}\tilde\lambda^{AB} + 16\pi\rho \right], \tag{3.9} \]
\[ \partial_x^2 \alpha + A^{-1} \partial_x A\, \partial_x \alpha = \alpha A^2 \left[ \tfrac{3}{2}(K - \tfrac{1}{3}t)^2 + \tfrac{1}{3}t^2 + 2\eta_A \eta^A + \tilde\kappa_{AB}\tilde\kappa^{AB} + 4\pi(\rho + \operatorname{tr} S) \right] - A^2, \tag{3.10} \]
\[ \partial_x K + 3A^{-1} \partial_x A\, K - A^{-1} \partial_x A\, t - \tilde\kappa^{AB} \tilde\lambda_{AB} = 8\pi J A, \tag{3.11} \]
\[ \partial_x \beta^1 = -a^{-1} \partial_t a + \tfrac{1}{2}\alpha(3K - t), \tag{3.12} \]
\[ \partial_t a = a\left[ -\partial_x \beta^1 + \tfrac{1}{2}\alpha(3K - t) \right], \tag{3.13} \]
\[ \partial_t A = -\alpha K A + \partial_x (\beta^1 A). \tag{3.14} \]
These equations have been written in a form which makes as clear as possible how they differ from the form they take in the special case of plane symmetric spacetimes. The differences are not very great and in particular Eqs. (3.12)–(3.14) are identical to the corresponding Eqs. (2.6)–(2.8) in [19]. In terms of these variables the expansions are given by $\theta = 2A^{-2}\partial_x A - t + K$, $\theta' = 2A^{-2}\partial_x A + t - K$. On an interval where $H$ is bounded the quantities $\theta$ and $\theta'$ and $K = t + \tfrac{1}{2}(\theta - \theta')$ are bounded. On the other hand, it follows from the lapse equation (3.10) that $\alpha \le 3/t^2$. Integrating Eq. (3.13) in space shows that on a finite time interval $a$ and $a^{-1}$ are bounded. Putting this back into the integrated equation shows that $\partial_t a$ is bounded. Equation (3.13) then implies that $\partial_x \beta^1$ is bounded. Equation (3.14) can be rewritten as
\[ \partial_t (\log A) - \beta^1 \partial_x (\log A) = -\alpha K + \partial_x \beta^1. \tag{3.15} \]
Together with the bounds which have just been derived this implies that $A$ and its inverse are bounded. Since $r = aA$ it follows immediately that $r$ and its inverse are bounded. The inequalities just obtained serve as a replacement for the bound for $m^{-1}$ obtained at the corresponding point in the argument in [19]. The argument of [19] would apparently not work in the present case because $\eta_A$ makes a contribution to the equation for $\nabla_a m$ which has the wrong sign. The argument used here also has the advantage that it only requires the matter to satisfy the dominant and strong energy conditions and no assumption on the positivity of the pressure is needed. For this reason it applies to more general matter models and in particular to situations where an electromagnetic field is present. The following theorem can now be proved:
Theorem 3.1. Let a solution of the Einstein equations with local $U(1) \times U(1)$ symmetry be given and suppose that when coordinates are chosen which cast the metric into the form (2.5) with constant mean curvature time slices the time coordinate takes all values in the finite interval $(t_1, t_2)$. Suppose further that:
i) the dominant and strong energy conditions hold
ii) $t_2 < 0$
Then the following quantities are bounded on the interval $(t_1, t_2)$:
\[ \alpha,\ \partial_x \alpha,\ A,\ A^{-1},\ \partial_x A,\ K,\ \beta^1,\ a,\ a^{-1},\ \partial_t a, \tag{3.16} \]
\[ \partial_t A,\ \partial_x \beta^1. \tag{3.17} \]

Proof. It has already been shown that $\alpha$, $A$, $A^{-1}$, $a$, $a^{-1}$, $K$ and $\partial_x \beta^1$ are bounded. The fact that $\theta$ and $\theta'$ are bounded implies that $\partial_x A$ is bounded. The boundedness of $\partial_x \beta^1$ and the fact that $\beta^1$ vanishes at one point show that $\beta^1$ is bounded. Equation (3.14) gives a bound for $\partial_t A$. Integrating equation (3.9) over the circle and using the bounds obtained already shows that $\int_0^{2\pi} \rho$ and $\int_0^{2\pi} (2\eta_A \eta^A + \tilde\kappa_{AB}\tilde\kappa^{AB} + \tilde\lambda_{AB}\tilde\lambda^{AB})$ are bounded. By the dominant energy condition it follows that $\int_0^{2\pi} j$ and $\int_0^{2\pi} \operatorname{tr} S$ are bounded. Finally, integrating (3.10) starting at a point where $\partial_x \alpha = 0$ gives a bound for $\partial_x \alpha$.
4. Estimates for the Hyperbolic and Vlasov Equations

The field equations which are used to control $W$ and $V$ are hyperbolic. These quantities may be thought of as describing gravitational waves. The fact that these equations are coupled with the matter equations and are themselves nonlinear means intuitively that the waves interact with the matter and with each other. The equations will be written in terms of a 2+2 split of the metric. Here lower case Roman indices refer to objects which live on the quotient of spacetime by the symmetry group. Indices of objects of this kind are raised and lowered using the metric $g_{ab}$ on the quotient space and its inverse. The equations are:
\[ \nabla^a (r^2 \nabla_a W) = -2r^2 \tanh V\, \nabla^a W \nabla_a V - r^2 (\cosh V)^{-1} \left[ e^{-W} T_{22} - e^{W} T_{33} - \tfrac{1}{2}\left( e^{-W}(\eta_2)^2 - e^{W}(\eta_3)^2 \right) \right], \tag{4.1} \]
\[ \nabla^a (r^2 \nabla_a V) = r^2 \cosh V \sinh V\, \nabla^a W \nabla_a W - 2r^2 (\cosh V)^{-1} \left[ \left( T_{23} - \tfrac{1}{2}\tilde h^{AB} T_{AB}\, \tilde g_{23} \right) - \tfrac{1}{2}\left( \eta_2 \eta_3 - \tfrac{1}{2}(\tilde h^{AB} \eta_A \eta_B)\, \tilde g_{23} \right) \right]. \tag{4.2} \]
The derivation of these equations is lengthy and it proved useful for this purpose to make use of the calculations of Kundu [15]. Let $S_W$ and $S_V$ denote the right hand sides of Eqs. (4.1) and (4.2) respectively. It will now be shown that the modulus of each of these quantities can be bounded by a constant multiple of the expression $\rho + \eta_A \eta^A + \tilde\kappa_{AB}\tilde\kappa^{AB} + \tilde\lambda_{AB}\tilde\lambda^{AB}$. It then follows from what was said in the proof of Theorem 3.1 that under the hypotheses of that theorem the $L^1$ norms of $S_W$ and $S_V$ in space are bounded by a constant which does not depend on time. For this purpose it is necessary to calculate $\tilde\lambda_{AB}$ and $\tilde\kappa_{AB}$ explicitly in terms of $W$ and $V$,
\[ \tilde\lambda_{AB}\tilde\lambda^{AB} = \tfrac{1}{2} A^{-2} \left( \cosh^2 V\, W_x^2 + V_x^2 \right), \tag{4.3} \]
\[ \tilde\kappa_{AB}\tilde\kappa^{AB} = \tfrac{1}{2} \alpha^{-2} \left[ \cosh^2 V\, (W_t - \beta^1 W_x)^2 + (V_t - \beta^1 V_x)^2 \right]. \tag{4.4} \]
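The algebra behind (4.3) can be checked numerically. The sketch below assumes, as the form of (4.3) suggests but the text does not state explicitly, that $\tilde\lambda_{AB}$ is built from the $x$-derivative of $\tilde g_{AB}$, so that $\tilde\lambda_{AB}\tilde\lambda^{AB}$ is proportional to $\operatorname{tr}\big((\tilde g^{-1}\partial_x \tilde g)^2\big)$. Only the identity $\operatorname{tr}\big((\tilde g^{-1}\partial_x \tilde g)^2\big) = 2(\cosh^2 V\, W_x^2 + V_x^2)$ is tested; the overall factor $\tfrac{1}{4}A^{-2}$ that would turn it into (4.3) is an assumption about normalisation.

```python
import math

def trace_identity(W, V, Wx, Vx):
    """Return (tr((g^-1 dg)^2), 2*(cosh(V)^2 * Wx^2 + Vx^2)) for g~ as in (2.4)."""
    c, s = math.cosh(V), math.sinh(V)
    eW, eWi = math.exp(W), math.exp(-W)
    # Inverse of g~: the determinant is 1, so this is just the adjugate.
    g_inv = [[eWi * c, -s], [-s, eW * c]]
    # x-derivative of g~ obtained from (2.4) by the chain rule.
    dg = [[eW * (Wx * c + Vx * s), Vx * c],
          [Vx * c, eWi * (-Wx * c + Vx * s)]]
    M = [[sum(g_inv[i][k] * dg[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    tr_M2 = sum(M[i][k] * M[k][i] for i in range(2) for k in range(2))
    return tr_M2, 2.0 * (c * c * Wx * Wx + Vx * Vx)

# Arbitrary sample values (W, V, W_x, V_x), chosen only for illustration.
samples = [(0.0, 0.0, 1.0, 0.0), (0.4, -1.1, 0.9, 2.3), (-1.3, 0.8, -0.5, 0.2)]
results = [trace_identity(*p) for p in samples]
```

The same computation with $\partial_x$ replaced by $\partial_t - \beta^1 \partial_x$ gives the bracket in (4.4). The quadratic form $\cosh^2 V\, dW^2 + dV^2$ that appears here is the one that reappears later as the target metric of the wave map.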
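This shows that the first term on the right-hand side of each of the Eqs. (4.1) and (4.2) can be bounded by $\tilde\lambda_{AB}\tilde\lambda^{AB} + \tilde\kappa_{AB}\tilde\kappa^{AB}$. To bound the other terms on the right hand side of (4.1) and (4.2), define an orthonormal frame on each orbit by:
\[ e_2 = (Aa)^{-1} \left( e^{-W/2} \cosh(V/2)\, \partial/\partial y^2 - e^{W/2} \sinh(V/2)\, \partial/\partial y^3 \right), \tag{4.5} \]
\[ e_3 = (Aa)^{-1} \left( -e^{-W/2} \sinh(V/2)\, \partial/\partial y^2 + e^{W/2} \cosh(V/2)\, \partial/\partial y^3 \right). \tag{4.6} \]
Then
\[ e^{-W/2}\, \partial/\partial y^2 = Aa \left[ \cosh(V/2)\, e_2 + \sinh(V/2)\, e_3 \right], \tag{4.7} \]
\[ e^{W/2}\, \partial/\partial y^3 = Aa \left[ \sinh(V/2)\, e_2 + \cosh(V/2)\, e_3 \right]. \tag{4.8} \]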
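That the frame (4.5)–(4.6) is orthonormal with respect to the orbit metric $A^2 a^2 \tilde g_{AB}$ of (2.5), and that (4.7) inverts it, can be confirmed with a few lines of arithmetic. The following sketch uses arbitrary sample values of $W$, $V$, $A$ and $a$; these numbers are assumptions made purely for the check.

```python
import math

def frame(W, V, A, a):
    """Components of e_2 and e_3 along (d/dy^2, d/dy^3), as in (4.5)-(4.6)."""
    ch, sh = math.cosh(V / 2), math.sinh(V / 2)
    p, q = math.exp(-W / 2), math.exp(W / 2)
    e2 = [p * ch / (A * a), -q * sh / (A * a)]
    e3 = [-p * sh / (A * a), q * ch / (A * a)]
    return e2, e3

def orbit_metric(W, V, A, a):
    """Induced metric A^2 a^2 g~_AB on an orbit, with g~ as in (2.4)."""
    f = (A * a) ** 2
    return [[f * math.exp(W) * math.cosh(V), f * math.sinh(V)],
            [f * math.sinh(V), f * math.exp(-W) * math.cosh(V)]]

def inner(g, u, v):
    return sum(g[i][j] * u[i] * v[j] for i in range(2) for j in range(2))

W, V, A, a = 0.6, -0.9, 1.7, 0.8   # arbitrary sample values
e2, e3 = frame(W, V, A, a)
g = orbit_metric(W, V, A, a)
gram = [[inner(g, u, v) for v in (e2, e3)] for u in (e2, e3)]

# Right-hand side of (4.7): should reproduce e^{-W/2} d/dy^2.
ch, sh = math.cosh(V / 2), math.sinh(V / 2)
rhs_47 = [A * a * (ch * e2[i] + sh * e3[i]) for i in range(2)]
```

The check relies only on the half-angle identities $\cosh^2(V/2) + \sinh^2(V/2) = \cosh V$ and $2\sinh(V/2)\cosh(V/2) = \sinh V$.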
The components of the covector $\eta_A$ expressed in an orthonormal frame can be bounded in terms of $\eta^A \eta_A$. Thus if the latter expression is bounded it follows that the components of $\eta_A$ expressed with respect to the basis $(e^{-W/2}\partial/\partial y^2, e^{W/2}\partial/\partial y^3)$ can be bounded by a constant multiple of $\cosh(V/2)$ or, equivalently, by a constant multiple of $(\cosh V)^{1/2}$. This means that $e^{-W/2}\eta_2$ and $e^{W/2}\eta_3$ can be bounded by an expression of the form $C(\eta^A \eta_A)^{1/2}(\cosh V)^{1/2}$ for some constant $C$. This allows the expressions on the right-hand side of Eqs. (4.1) and (4.2) containing $\eta_A$, which are quadratic in these components, to be bounded in modulus by a constant multiple of $\eta_A \eta^A$. The terms involving the energy-momentum tensor can be handled in a very similar way. The dominant energy condition implies that the components of the energy-momentum tensor in an orthonormal frame are bounded in modulus by $\rho$ and using (4.7) and (4.8) allows this to be translated into a bound on the matter terms on the right-hand side of Eqs. (4.1) and (4.2) in terms of $\rho$.

Lemma 4.1. Under the hypotheses of Theorem 3.1 the quantities $W$, $V$, $\eta_A$, $\beta^A$, $\partial_x \beta^A$ are bounded.

Proof. Choose some $t_3$ in the interval $(t_1, t_2)$ and let $(t_4, x_4)$ be some point of the quotient manifold $\bar M$ parametrized by $t$ and $x$ with $t_4 < t_3$. (The case $t_4 > t_3$ is similar.) Equations (4.1) and (4.2) have the same characteristics. These are the null curves of the metric defined on $\bar M$. Let $\gamma_1$ and $\gamma_2$ be the two characteristics passing through $(t_4, x_4)$ and let $(t_3, x_5)$ and $(t_3, x_6)$ be the coordinates of the points where they meet the hypersurface $t = t_3$. The left-hand side of Eq. (4.1) has the form of a divergence. Applying Stokes' theorem to the triangular region $T$ bounded by $\gamma_1$, $\gamma_2$ and the curve $t = t_3$ gives the identity:
\[ \int_T S_W\, \alpha A\, dt\, dx = \int_{t_4}^{t_3} (r^2\, DW/Dt)(t, \gamma_1(t))\, dt + \int_{t_4}^{t_3} (r^2\, DW/Dt)(t, \gamma_2(t))\, dt - \int_{x_5}^{x_6} r^2 (W_t - \beta^1 W_x)(t_3, x)\, A\, dx, \]
and hence, after integration by parts:
\[ \begin{aligned} (r^2 W)(t_4, x_4) = {} & \tfrac{1}{2}\left[ (r^2 W)(t_3, x_5) + (r^2 W)(t_3, x_6) \right] - \tfrac{1}{2}\int_{t_4}^{t_3} (2rW\, Dr/Dt)(t, \gamma_1(t))\, dt - \tfrac{1}{2}\int_{t_4}^{t_3} (2rW\, Dr/Dt)(t, \gamma_2(t))\, dt \\ & - \tfrac{1}{2}\int_{x_5}^{x_6} r^2 (W_t - \beta^1 W_x)(t_3, x)\, A\, dx - \tfrac{1}{2}\int_T S_W\, \alpha A\, dt\, dx. \end{aligned} \tag{4.9} \]
Here $D/Dt$ denotes a derivative in the direction of the characteristic along which the integration is carried out, with this characteristic being parametrized with respect to $t$. In other words, for any function $f$, $Df/Dt = d/dt(f(\gamma(t)))$. Most of the quantities in (4.9) are already known to be bounded. This is in particular true of $Dr/Dt$. Thus the following inequality holds:
\[ \|r^2 W(t_3 - t)\|_\infty \le C \left[ 1 + \int_0^t \left( \|r^2 W(t_3 - s)\|_\infty + \int_{\gamma_1(t_3 - s)}^{\gamma_2(t_3 - s)} |S_W(s, x)|\, dx \right) ds \right]. \tag{4.10} \]
Since $\int_0^{2\pi} |S_W(t, x)|\, dx$ is known to be bounded (as shown in the discussion preceding this lemma) and under the given hypotheses the number of times the characteristics can go around the circle between $t = t_1$ and $t = t_3$ is bounded, it follows from Gronwall's inequality that $W$ is bounded. The same kind of argument shows that $V$ is bounded. The remaining conclusions of the lemma are simple consequences of the boundedness of $W$ and $V$, as will now be shown. The momentum constraint implies that:
\[ \partial_x (A^2 \eta_A) = 8\pi A^2 j_A, \tag{4.11} \]
which means that $(A^2 \eta_A)(t, x_1) - (A^2 \eta_A)(t, x_2)$ is bounded independently of $t$, $x_1$ and $x_2$. On the other hand the boundedness of $\int_0^{2\pi} \eta^A \eta_A(x)\, dx$ together with that of $V$ and $W$ shows that $\int_0^{2\pi} |A^2 \eta_A(x)|\, dx$ is bounded. These two facts together show that $\eta_A$ is bounded. The definition of the second fundamental form gives the equation:
\[ \tilde g_{AB}\, \partial_x \beta^B = 2\alpha A^{-1} a^{-2} \eta_A. \tag{4.12} \]
This means that $\partial_x \beta^A$ is bounded and, remembering that by definition $\beta^A(0) = 0$, this implies that $\beta^A$ is bounded. This completes the proof.

Everything which has been done up to now consists of obtaining bounds for parts of the geometry using nothing about the matter model except the dominant and strong energy conditions. Now the special case of the Vlasov equation will be considered. In this class of spacetimes the Vlasov equation for particles of unit mass takes the following form:
\[ \partial f/\partial t + \left( \alpha A^{-1}(v^1/v^0) - \beta^1 \right) \partial f/\partial x + F^i\, \partial f/\partial v^i = 0. \tag{4.13} \]
Here the quantities $F^i$ are functions of $t$, the quantities listed in (3.16) and (3.17), $\beta^A$ and their spatial derivatives, $\eta_A$ and the first derivatives of $W$ and $V$ with respect to $t$ and $x$. They depend linearly on the derivatives of $W$ and $V$. The mass shell condition $v^0 = \sqrt{1 + \delta_{ij} v^i v^j}$ defines $v^0$ in terms of $v^i$. The characteristics of Eq. (4.13) satisfy the system:
\[ dx^i/ds = \left[ \alpha A^{-1}(v^1/v^0) - \beta^1 \right] \delta^i_1, \qquad dv^i/ds = F^i. \tag{4.14} \]
The spacetimes considered here have two local Killing vectors. If $k^\alpha$ is a Killing vector in any spacetime and $p^\alpha$ the unit tangent vector to a timelike geodesic, then the quantity $p^\alpha k_\alpha$ is conserved along the geodesic. This allows two conserved quantities for the Eqs. (4.14) to be derived. They can be computed, using (4.7) and (4.8), to be:
\[ Aa\, e^{W/2} \left[ \cosh(V/2)\, v^2 + \sinh(V/2)\, v^3 \right], \qquad Aa\, e^{-W/2} \left[ \sinh(V/2)\, v^2 + \cosh(V/2)\, v^3 \right]. \tag{4.15} \]
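The two quantities in (4.15) are linear in $v^2$ and $v^3$, and the coefficient matrix has determinant $(Aa)^2(\cosh^2(V/2) - \sinh^2(V/2)) = (Aa)^2$, so the system can be inverted explicitly. The sketch below writes out this inversion and a resulting crude bound; the function names and the sample values are assumptions introduced here, not notation from the paper.

```python
import math

def conserved(v2, v3, W, V, A, a):
    """The two conserved quantities of (4.15)."""
    ch, sh = math.cosh(V / 2), math.sinh(V / 2)
    c1 = A * a * math.exp(W / 2) * (ch * v2 + sh * v3)
    c2 = A * a * math.exp(-W / 2) * (sh * v2 + ch * v3)
    return c1, c2

def velocities(c1, c2, W, V, A, a):
    """Solve (4.15) for (v^2, v^3); the determinant of the system is (A*a)^2."""
    ch, sh = math.cosh(V / 2), math.sinh(V / 2)
    v2 = (ch * math.exp(-W / 2) * c1 - sh * math.exp(W / 2) * c2) / (A * a)
    v3 = (-sh * math.exp(-W / 2) * c1 + ch * math.exp(W / 2) * c2) / (A * a)
    return v2, v3

def velocity_bound(c1, c2, W, V, A, a):
    """Crude bound on |v^2| and |v^3| in terms of |W|, |V| and 1/(A*a)."""
    return math.cosh(V / 2) * math.exp(abs(W) / 2) * (abs(c1) + abs(c2)) / (A * a)

W, V, A, a = -0.4, 1.2, 0.9, 1.5   # arbitrary sample values
c1, c2 = conserved(0.7, -2.1, W, V, A, a)
v2, v3 = velocities(c1, c2, W, V, A, a)
bound = velocity_bound(c1, c2, W, V, A, a)
```

Since $c_1$ and $c_2$ are constant along a characteristic, the bound depends only on pointwise control of $W$, $V$ and $(Aa)^{-1}$.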
It is easy to solve for $v^2$ and $v^3$ in terms of these two conserved quantities and the boundedness of $W$ and $V$ implies that $v^2$ and $v^3$ are bounded along a characteristic. Consider now a solution of the Einstein-Vlasov system where the initial datum for the distribution function has compact support. Let $P(t)$ be the supremum of $|v|$ over the support of $f(t)$. Since $a$, $A$, $W$ and $V$ have already been controlled pointwise the components $T_{AB}$ of the energy-momentum tensor occurring on the right hand side of (4.1) and (4.2) can be estimated in terms of the corresponding frame components. Looking at the explicit expressions for these frame components and using the boundedness of $v^2$ and $v^3$ in the support of $f$ shows that:
\[ \|T_{AB}(t)\|_\infty \le C P(t), \tag{4.16} \]
where $C$ is a constant which only depends on the initial data. To make use of (4.16) an estimate for $v^1$ must be obtained. Define:
\[ Q(t) = \|\partial_x W(t)\|_\infty + \|\partial_t W(t)\|_\infty + \|\partial_x V(t)\|_\infty + \|\partial_t V(t)\|_\infty. \tag{4.17} \]
Lemma 4.2. If the hypotheses of Theorem 3.1 are satisfied by a solution of the Einstein-Vlasov system then the following inequality holds for $t_4 < t_3$:
\[ 1 + P(t_4) \le C \left[ 1 + P(t_3) + \int_0^{t_3 - t_4} \left( 1 + P(t_3 - t) + Q(t_3 - t) \right) dt \right]. \tag{4.18} \]
The analogous inequality holds for $t_4 > t_3$.

Proof. In a 3+1 decomposition of a general spacetime the Vlasov equation takes the following form when expressed in terms of frame components:
\[ \partial f/\partial t + \left( \alpha (v^i/v^0) e^a_i - \beta^a \right) \partial f/\partial x^a - \left[ e_i(\alpha) v^0 + \alpha\left( -k_{ab} e^a_i e^b_j + \gamma^i_{0j} \right) v^j + \alpha \gamma^i_{jk} v^j v^k / v^0 \right] \partial f/\partial v^i = 0. \tag{4.19} \]
Here $\gamma^i_{0j}$ and $\gamma^i_{jk}$ are Ricci rotation coefficients. Consider the terms appearing in $F^1$ in the case of the symmetry considered here which contain derivatives of $V$ and $W$. No such terms arise from the terms in (4.19) involving the derivatives of $\alpha$ and the second fundamental form. To go further it is necessary to have more explicit expressions for the rotation coefficients. The four-dimensional rotation coefficients $\gamma^i_{jk}$ are identical with the corresponding three-dimensional ones while:
\[ \gamma^i_{0j} = -\alpha^{-1} \gamma^i_{kj} \theta^k_a \beta^a + \tfrac{1}{2}\alpha^{-1} \left( e^a_j \nabla_a \beta^b \theta^i_b - \delta^{is} e^a_s \nabla_a \beta^b \theta^t_b \delta_{jt} + c^i_j - \delta^{is} c^t_s \delta_{jt} \right). \tag{4.20} \]
Here $c^i_j = e^a_j \partial_t \theta^i_a$. Each term in the expressions for the coefficients of the Vlasov equation is either independent of the derivatives of $W$ and $V$, in which case it is bounded as a consequence of the estimates already proved, or it is linear in these derivatives. Consider now the equation for $v^1$ in the characteristic system. The estimate (4.18) is obtained by considering the dependence on $v$ of those terms which are linear in the derivatives of $W$ and $V$. The quantity $v^i$ is bounded unless $i = 1$ while the quantity $v^j v^k / v^0$ is bounded unless $j = k = 1$. However, by the symmetry properties of the rotation coefficients, $\gamma^1_{j1} = 0$. Hence all terms on the right hand side of the equation for $v^1$ can be bounded by an expression of the form $C(1 + P + Q)$.

Lemma 4.3. If the hypotheses of Theorem 3.1 are satisfied by a solution of the Einstein-Vlasov system then the quantities $P$, $\partial_t W$, $\partial_t V$, $\partial_x W$, $\partial_x V$, $\rho$, $\alpha^{-1}$, the derivative with respect to $x$ of all the quantities in (3.16) and (3.17), $\partial_x \eta_A$ and $\partial_x^2 \beta^A$ are bounded.
Proof. The first step is to obtain an estimate for the first derivatives of $W$ and $V$. In order to do this, it is useful to write Eqs. (4.1) and (4.2) in a slightly different way:
\[ \nabla^a \nabla_a W + 2\tanh V\, \nabla^a W \nabla_a V = -(2/r) \nabla^a r \nabla_a W - (\cosh V)^{-1} \left[ e^{-W} T_{22} - e^{W} T_{33} - \tfrac{1}{2}\left( e^{-W}(\eta_2)^2 - e^{W}(\eta_3)^2 \right) \right], \tag{4.21} \]
\[ \nabla^a \nabla_a V - \sinh V \cosh V\, \nabla^a W \nabla_a W = -(2/r) \nabla^a r \nabla_a V - 2(\cosh V)^{-1} \left[ \left( T_{23} - \tfrac{1}{2}\tilde h^{AB} T_{AB}\, \tilde g_{23} \right) - \tfrac{1}{2}\left( \eta_2 \eta_3 - \tfrac{1}{2}(\tilde h^{AB} \eta_A \eta_B)\, \tilde g_{23} \right) \right]. \tag{4.22} \]
The advantage of this is that if the right hand sides of (4.21) and (4.22) are replaced by zero the resulting equations are those for a wave map (hyperbolic harmonic map) with target space $\mathbb{R}^2$, endowed with the metric $\cosh^2 V\, dW^2 + dV^2$. This is a representation of the standard metric of the hyperbolic plane in a certain coordinate system. It is natural to try to generalize estimates which have been used in the study of wave maps to the present situation. Here this is done with an estimate of Gu [13], who used it to prove global existence of classical solutions in the Cauchy problem for wave maps defined on two-dimensional Minkowski space. Define two null vectors on the two-dimensional space coordinatized by $t$ and $x$ by
\[ e_+ = \alpha^{-1}(\partial/\partial t - \beta^1 \partial/\partial x) + A^{-1} \partial/\partial x, \qquad e_- = \alpha^{-1}(\partial/\partial t - \beta^1 \partial/\partial x) - A^{-1} \partial/\partial x. \tag{4.23} \]
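A short numerical check confirms that the vectors in (4.23) are null. The sketch assumes that the metric on the quotient is the $(t, x)$ part of (2.5), namely $-\alpha^2 dt^2 + A^2 (dx + \beta^1 dt)^2$; the sample values of $\alpha$, $A$ and $\beta^1$ are arbitrary and serve only as an illustration.

```python
def quotient_metric(alpha, A, beta):
    """Components of -alpha^2 dt^2 + A^2 (dx + beta dt)^2 in the basis (d/dt, d/dx)."""
    return [[-alpha ** 2 + (A * beta) ** 2, A ** 2 * beta],
            [A ** 2 * beta, A ** 2]]

def null_pair(alpha, A, beta):
    """Components of e_+ and e_- from (4.23) in the basis (d/dt, d/dx)."""
    e_plus = [1.0 / alpha, -beta / alpha + 1.0 / A]
    e_minus = [1.0 / alpha, -beta / alpha - 1.0 / A]
    return e_plus, e_minus

def inner(g, u, v):
    return sum(g[i][j] * u[i] * v[j] for i in range(2) for j in range(2))

alpha, A, beta = 1.3, 0.7, -0.4   # arbitrary sample values
g2 = quotient_metric(alpha, A, beta)
e_plus, e_minus = null_pair(alpha, A, beta)
norm_plus = inner(g2, e_plus, e_plus)
norm_minus = inner(g2, e_minus, e_minus)
cross = inner(g2, e_plus, e_minus)
```

With this normalization the inner product of $e_+$ and $e_-$ is the constant $-2$, independent of $\alpha$, $A$ and $\beta^1$.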
The (2-dimensional) covariant derivatives $\nabla_{e_-} e_+$ and $\nabla_{e_+} e_-$ are given by:
\[ \nabla_{e_-} e_+ = \alpha^{-1}(b_{++} e_+ + b_{+-} e_-), \qquad \nabla_{e_+} e_- = \alpha^{-1}(b_{-+} e_+ + b_{--} e_-) \tag{4.24} \]
for some bounded functions $b_{++}$, $b_{+-}$, $b_{-+}$ and $b_{--}$. The normalization chosen for the vectors $e_+$ and $e_-$ here is important, since otherwise the covariant derivatives could contain the time derivatives of $\alpha$ and $\beta^1$, quantities which have not yet been shown to be bounded. Let $E_+$ and $E_-$ be the images of $e_+$ and $e_-$ under the wave map, i.e.
\[ E_+ = e_+(W)\, \partial/\partial W + e_+(V)\, \partial/\partial V, \qquad E_- = e_-(W)\, \partial/\partial W + e_-(V)\, \partial/\partial V. \tag{4.25} \]
Let $\gamma_1$ and $\gamma_2$ be integral curves of $e_-$ and $e_+$ respectively and let $\hat\gamma_i = (W \circ \gamma_i, V \circ \gamma_i)$. The observation of Gu is that the equations obtained from (4.21) and (4.22) by replacing the right hand side by zero and the given metric by the flat metric just say that $E_+$ is parallelly transported along $\hat\gamma_1$ and that $E_-$ is parallelly transported along $\hat\gamma_2$. A similar calculation can be done for Eqs. (4.21) and (4.22) and this gives rise to the following equation along $\hat\gamma_1$ (and an analogous equation along $\hat\gamma_2$):
\[ \nabla_{\alpha E_-} E_+ = \left( b_{++} - r^{-1} \alpha e_-(r) \right) E_+ + \left( b_{+-} - r^{-1} \alpha e_+(r) \right) E_- + B_-, \tag{4.26} \]
where $B_-$ satisfies an inequality of the form $|B_-| \le C(1 + \|T_{AB}\|_\infty)$. These equations allow the lengths of the vectors $E_+$ and $E_-$ to be controlled. Multiplying (4.26) and the analogous equation for $E_-$ by $\alpha$ allows the following inequality to be derived:
\[ Q(t_4) \le C \left[ Q(t_3) + \int_0^{t_3 - t_4} \left( 1 + Q(t_3 - t) + \|T_{AB}(t_3 - t)\|_\infty \right) dt \right]. \tag{4.27} \]
Putting together (4.16), (4.18) and (4.27) gives:
\[ (1 + P + Q)(t_4) \le (1 + P + Q)(t_3) + C \int_0^{t_3 - t_4} (1 + P + Q)(t_3 - t)\, dt. \tag{4.28} \]
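Inequality (4.28) has the form $u(s) \le u(0) + C \int_0^s u$, with $u(s) = (1 + P + Q)(t_3 - s)$, and Gronwall's lemma then gives $u(s) \le u(0) e^{Cs}$. A discrete analogue can be illustrated as follows. This is a minimal sketch with made-up data: the sequence `u` is constructed so as to satisfy the discrete integral inequality, with an arbitrary factor `theta` in $[1/2, 1]$ making it non-extremal, and is then compared with the exponential bound.

```python
import math

def discrete_gronwall_demo(u0, C, T, n):
    """Build u_k <= u0 + C*h*sum_{j<k} u_j and the bound u0*exp(C*k*h)."""
    h = T / n
    u = [u0]
    partial_sum = 0.0                            # h * sum_{j<k} u_j
    for k in range(1, n + 1):
        partial_sum += h * u[k - 1]
        theta = 0.5 + 0.5 * math.cos(k) ** 2     # arbitrary factor in [0.5, 1]
        u.append(theta * (u0 + C * partial_sum))
    bounds = [u0 * math.exp(C * k * h) for k in range(n + 1)]
    return u, bounds

u, bounds = discrete_gronwall_demo(u0=2.0, C=3.0, T=1.5, n=300)
```

The bound holds because the partial sums $S_k = u_0 + C h \sum_{j<k} u_j$ satisfy $S_{k+1} \le (1 + Ch) S_k$, so that $u_k \le S_k \le u_0 (1 + Ch)^k \le u_0 e^{Ckh}$.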
Hence by Gronwall’s lemma $P$, $\partial_t W$, $\partial_t V$, $\partial_x W$ and $\partial_x V$ are bounded. It then follows immediately that $\rho$ is bounded and (3.10) shows that $\alpha^{-1}$ is bounded. The equations (3.9)–(3.14) can be used directly to show that the first derivatives with respect to $x$ of all the quantities in (3.16) and (3.17) are bounded. It follows from (4.11) that $\partial_x \eta_A$ is bounded and from (4.12) that $\partial_x^2 \beta^A$ is bounded.

Lemma 4.4. If the hypotheses of Theorem 3.1 are satisfied by a solution of the Einstein-Vlasov system then the second derivatives of $W$ and $V$ and the first derivatives of $f$ are bounded.

Proof. If $f$ were zero (the vacuum case) then it would be rather simple to prove this lemma, since the equations obtained by differentiating the equations for $W$ and $V$ with respect to $x$ are linear in the highest order derivatives in that case. With the coupling to $f$ things are less straightforward. When the Vlasov equation is differentiated with respect to $x$ terms come up which involve second derivatives of $W$ and $V$ multiplied by first derivatives of $f$. In other words, there are terms which are quadratic in the quantities to be estimated and this precludes a direct application of Gronwall’s inequality. This problem can be solved using a device of Glassey and Strauss [12], which can be seen in a particularly simple form, adequate for the present application, in the paper [10] of Glassey and Schaeffer (see also [11]). The equation for $W$ can be written in the following form:

$$l^a\nabla_a(n^b\nabla_b W) = \big(Y_1(W,V)\,l^a + Y_2(W,V)\,n^a\big)\nabla_a W + Z(W,V), \qquad (4.29)$$
where $Z(W,V)$ contains no derivatives of $W$ or $V$ and $Y_1(W,V)$ and $Y_2(W,V)$ contain them at most linearly. Here, for ease of notation, $l = e_+$ and $n = e_-$. The equation for $V$ can of course be written in a similar form. There are also alternative forms of both of these equations where the roles of $l$ and $n$ are interchanged. Differentiating Eq. (4.29) with respect to $x$ gives an equation of the form

$$l^a\nabla_a(\partial_x(n^b\nabla_b W)) = \big(Y_1(W,V)\,l^a + Y_2(W,V)\,n^a\big)\partial_x(\nabla_a W) + \tilde Z(W,V), \qquad (4.30)$$

where the expression $\tilde Z(W,V)$ does not depend on second derivatives of $W$ and $V$. Suppose now that we integrate Eq. (4.30) along the characteristic which is an integral curve of $l^a$. The only terms which cannot be bounded straightforwardly (even before integration) are those which contain derivatives of the energy momentum tensor with respect to $x$. It will now be shown how a typical term of this type can be handled. The others which occur can be taken care of in a strictly analogous way. The term which is to be bounded is:

$$\int_0^{t_3-t_4} [(e^W\cosh V)^{-1}\,\partial_x T_{22}](t_3-t)\,dt. \qquad (4.31)$$
In fact the coordinate components of the energy-momentum tensor may be replaced by frame components at this stage since their spatial derivatives only differ by terms which are bounded. Substituting the definition of the frame component T (e2 , e2 ) into the expression of interest gives
$$\int_0^{t_3-t_4}\!\!\int [(e^W\cosh V)^{-1}(v_2)^2(1+|v|^2)^{-1}]\,\partial_x f\,dv\,dt. \qquad (4.32)$$
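The idea of [12] is to express $\partial_x$ as a linear combination of $l$ and the vector

$$m = \partial/\partial t + \big(\alpha A^{-1}(v^1/v^0) - \beta^1\big)\,\partial/\partial x. \qquad (4.33)$$

The result is:

$$\partial/\partial x = \alpha^{-1}A\,(1 - v^1/v^0)^{-1}(l - m). \qquad (4.34)$$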
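As a quick consistency check of (4.33) and (4.34): taken together, the two relations force $l = \partial/\partial t + (\alpha A^{-1} - \beta^1)\,\partial/\partial x$. That coordinate form of $l$ is inferred here from the two displayed equations and is not quoted from the text. The minimal Python sketch below represents vector fields in the $(t, x)$ plane by their component pairs and verifies the decomposition for hypothetical sample values of $\alpha$, $A$, $\beta^1$, $v^1$ and $v^0$.

```python
def frame_vectors(alpha, A, beta1, v1, v0):
    """Components (d/dt, d/dx) of l and m.
    The form of l is an assumption inferred from (4.33) and (4.34)."""
    l = (1.0, alpha / A - beta1)
    m = (1.0, (alpha / A) * (v1 / v0) - beta1)
    return l, m

def d_dx_from_l_and_m(alpha, A, beta1, v1, v0):
    """Right-hand side of (4.34): alpha^{-1} A (1 - v1/v0)^{-1} (l - m)."""
    l, m = frame_vectors(alpha, A, beta1, v1, v0)
    coeff = A / (alpha * (1.0 - v1 / v0))
    return tuple(coeff * (lc - mc) for lc, mc in zip(l, m))

# Sample values only; v0 > |v1| holds for a unit-mass particle.
result = d_dx_from_l_and_m(alpha=0.7, A=1.3, beta1=0.2, v1=0.5, v0=1.5)
print(result)
```

The first component vanishes and the second equals one up to rounding, so the combination reproduces $\partial/\partial x$. The factor $(1 - v^1/v^0)^{-1}$ is finite because $|v^1| < v^0$.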
This allows the integral in (4.32) to be rewritten as a sum of two terms, one containing $l$ and the other containing $m$. Now it is possible to substitute for $mf$ using the Vlasov equation and the result contains only derivatives of $f$ with respect to the velocity variables. These derivatives can be eliminated by an integration by parts in $v$ and the result is a bounded quantity. The other term is equal to

$$\begin{aligned}
&\int\!\!\int_0^{t_3-t_4} [\alpha^{-1}A(e^W\cosh V)^{-1}(v_2)^2(1+|v|^2)^{-1}\,l^a\nabla_a f](\gamma_1(t_3-t))\,dt\,(1-v^1/v^0)^{-1}\,dv \\
&\quad = \int [(\alpha^{-1}A(e^W\cosh V)^{-1})(t_4,x_4)\,f(t_4,x_4,v) - (\alpha^{-1}A(e^W\cosh V)^{-1})(t_3,x_5)\,f(t_3,x_5,v)] \\
&\qquad \times (v_2)^2(1+|v|^2)^{-1}(1-v^1/v^0)^{-1}\,dv + \ldots. \qquad (4.35)
\end{aligned}$$

This is obtained by integrating by parts in $t$ along the characteristic. Only the boundary terms are written explicitly. The other terms are integrals in $t$ of bounded quantities. Thus the term involving $lf$ coming from (4.32) is also bounded. The same trick can be applied when $W$ is replaced by $V$ and when $l$ and $n$ are interchanged. (In the last case $\partial/\partial x$ must be replaced by a combination of $m$ and $n$.) The result of all this is that if

$$Q_1 = \|\partial_x(l^a\nabla_a W)\|_\infty + \|\partial_x(n^a\nabla_a W)\|_\infty + \|(l^a\nabla_a V)\|_\infty + \|(n^a\nabla_a V)\|_\infty \qquad (4.36)$$

then an estimate of the form:

$$1 + Q_1(t_4) \le 1 + Q_1(t_3) + C\int_0^{t_3-t_4} \big(1 + Q_1(t_3-t)\big)\,dt \qquad (4.37)$$
is obtained. It follows from Gronwall’s inequality that $Q_1$ is bounded. Hence the derivatives $W_{xx}$, $W_{tx}$, $V_{xx}$ and $V_{tx}$ are bounded. Using this information in the equations obtained by differentiating the Vlasov equation with respect to $x$ or $v$ shows that the first derivatives of $f$ with respect to these variables are bounded.

5. The Main Result

In this section the estimates collected in Sect. 4 are applied to prove the main result. First one last auxiliary lemma is required.

Lemma 5.1. Consider a CMC initial data set for the Einstein-Vlasov system with local $U(1) \times U(1)$ symmetry. Then there exists a local Cauchy evolution of this data which has local $U(1) \times U(1)$ symmetry, so that the hypotheses of Lemma 2.1 are satisfied. Consider next a family of initial data sets of this type on the same manifold such that:
(i) the data in the family are uniformly bounded in the $C^\infty$ topology,
(ii) the metrics are uniformly positive definite,
(iii) the supports of the distribution functions are contained in a common compact set,
(iv) the mean curvatures are uniformly bounded away from zero.
Then the time interval in the conclusion of Lemma 2.1 can be chosen uniformly for the Cauchy evolutions of all data in the family.

Proof. The first statement of the lemma is essentially a direct consequence of the standard local existence theorem for the Einstein-Vlasov system and for CMC hypersurfaces and the fact that the resulting spacetimes inherit any symmetry which is present. When there is only local symmetry the inheritance of symmetry argument should be applied to the universal cover (cf. [19]). The second part of the lemma, concerning families, is a consequence of the stability of various operations. Firstly, the statement is used that the time of existence of a solution of the Einstein-Vlasov system, measured with respect to an appropriate time coordinate, depends only on the size of the initial data and that on a fixed closed time interval the solution depends continuously on the initial data. Secondly, the fact is used that the interval on which a CMC foliation exists in a neighbourhood of a given CMC hypersurface depends only on the size of the metric coefficients in an appropriate coordinate system and a positive lower bound for the lapse function, provided the mean curvature of the starting hypersurface remains bounded away from zero.

Theorem 5.1. Let $(M, g, f)$ be a $C^\infty$ solution of the Einstein-Vlasov system with local $U(1) \times U(1)$ symmetry which is the maximal globally hyperbolic development of data on a symmetric hypersurface of constant mean curvature $H_0 < 0$. Then the part of the spacetime to the past of the initial hypersurface can be covered by a foliation of CMC hypersurfaces with the mean curvature taking all values in the interval $(-\infty, H_0]$.
Moreover, the CMC foliation can be extended to the future of the initial hypersurface in such a way that the mean curvature attains all negative real values.

Proof. Let $T$ be the largest number (possibly infinite) such that the local foliation by CMC hypersurfaces which exists near the initial hypersurface can be extended so that the mean curvature takes on all values in the interval $(-T, H_0)$. Suppose that $T$ is finite. Then Theorem 3.1 and the results of Sect. 4 imply the boundedness of many quantities on the interval $(-T, H_0]$. It will now be shown by induction that the spatial derivatives of all orders of all quantities of interest are bounded on the given interval. The inductive hypothesis is that the following quantities are bounded:

$$D^n f,\ D^{n+1}W,\ D^n(\partial_t W),\ D^{n+1}V,\ D^n(\partial_t V),\ D^{n+1}\alpha,\ D^{n+1}\beta^1,\ D^{n+1}A,\ D^n(\partial_t A),\ D^n K,\ D^n\eta_A,\ D^{n+1}\beta^A. \qquad (5.1)$$
It follows from the results of Sect. 4, and in particular Lemma 4.4, that the inductive hypothesis is satisfied for $n = 1$. Suppose now that it is satisfied for a given value of $n$. Then it follows immediately from the field Eqs. (3.9)–(3.14) and (4.11)–(4.12) that all quantities which are required to be bounded by the inductive hypothesis at the next step are bounded, except possibly for the relevant derivatives of $f$, $W$ and $V$. Consider the equation obtained by differentiating the Vlasov equation $n + 1$ times with respect to $x$. There results a linear equation for $D^{n+1}f$ with coefficients which are known to be bounded, except for terms involving derivatives of $W$ and $V$ in the inhomogeneous term. If $F_n(t) = \|D^n f\|_\infty$ and $Q_n(t) = \|D^{n+1}W\|_\infty + \|D^{n+1}V\|_\infty + \|D^n(\partial_t W)\|_\infty + \|D^n(\partial_t V)\|_\infty$, then this equation implies an inequality of the form

$$F_{n+1}(t) \le F_{n+1}(t_0) + C\int_0^{t_0-t} \big(F_{n+1}(t_0-s) + Q_{n+1}(t_0-s)\big)\,ds. \qquad (5.2)$$
Similarly, differentiating Eqs. (4.29) and (4.30) $n$ times with respect to $x$ gives a linear system of equations for derivatives of $V$ and $W$ with coefficients which are known to be bounded, except for terms involving derivatives of order $n + 1$ of matter quantities in the inhomogeneous term. Hence:

$$Q_{n+1}(t) \le Q_{n+1}(t_0) + C\int_0^{t_0-t} \big(F_{n+1}(t_0-s) + Q_{n+1}(t_0-s)\big)\,ds. \qquad (5.3)$$

Putting together (5.2) and (5.3) and applying Gronwall’s inequality proves that $F_{n+1}$ and $Q_{n+1}$ are bounded and completes the inductive step. Thus the quantities in (5.1) are bounded for all $n$. Consider now the data obtained by restricting the given solution to the hypersurfaces $t = \text{const}$. By what has just been proved, this family of data satisfies the conditions of Lemma 5.1. Hence there exists some $\epsilon > 0$ such that each of these initial data has a corresponding solution on a time interval of length $\epsilon$. Hence the original solution extends to the interval $(-T - \epsilon, H_0)$, contradicting the maximality of $T$. It follows that in fact $T = \infty$, as desired. This means in particular that the spacetime has a crushing singularity in the past, and hence that the CMC foliation covers the entire past of the initial hypersurface.

Now let $T'$ be the largest number such that the CMC foliation can be extended to the interval $(-\infty, T')$. Since the spacetime contains no compact maximal hypersurface, $T' \le 0$. If $T'$ were strictly less than zero it could be argued as in the first part of the proof that the CMC foliation could be extended further, which would contradict the definition of $T'$. Hence in fact $T' = 0$.

This argument does not prove that the entire future of the initial hypersurface is covered by the CMC foliation. In connection with this it is interesting to note that if instead of assuming, as is done in this paper, that the cosmological constant $\Lambda$ vanishes, it is assumed that $\Lambda < 0$, then the same types of arguments apply to give a stronger theorem. (The choice of sign convention for the cosmological constant used here is such that $\Lambda < 0$ corresponds to anti-de Sitter space.) With $\Lambda < 0$ the result is that the whole spacetime can be covered by a CMC foliation with the mean curvature taking on all real values. The reason for this difference can be traced to the estimate for $\alpha$ following from the lapse equation, which in general reads $\alpha \le (\tfrac{1}{3}t^2 - \Lambda)^{-1}$.

6. The Case of Wave Maps

In this section we consider what happens when the collisionless matter described by the Vlasov equation is replaced by a wave map as source in the Einstein equations. This is quite natural, given that, as was seen in Sect. 4, a wave map comes up automatically in the case of vacuum spacetimes. Let $(N, h)$ be a complete Riemannian manifold. If $(M, g)$ is a Lorentz manifold a wave map $\phi$ from $M$ to $N$ is a map which satisfies the equation whose expression in local coordinates $x^\alpha$ on $M$ and $y^I$ on $N$ is:

$$\nabla_\alpha\nabla^\alpha\phi^I + \Gamma^I_{JK}\nabla_\alpha\phi^J\nabla^\alpha\phi^K = 0. \qquad (6.1)$$
(Wave maps are also known as (hyperbolic) harmonic maps or nonlinear sigma models.) The global Cauchy problem for wave maps on two-dimensional Minkowski space was
solved by Gu [13] and for wave maps on three-dimensional Minkowski space which are invariant or equivariant under rotations by Christodoulou, Shatah and Tahvildar-Zadeh [3, 4, 21]. The results of [4] were applied to the Einstein-Maxwell equations in [2]. Associated to a wave map $\phi$ is the energy-momentum tensor:

$$T_{\alpha\beta} = \big[\nabla_\alpha\phi^I\nabla_\beta\phi^J - \tfrac{1}{2}(\nabla_\gamma\phi^I\nabla^\gamma\phi^J)\,g_{\alpha\beta}\big]h_{IJ}, \qquad (6.2)$$
and this can be used to couple the wave map to the Einstein equations. In harmonic coordinates the coupled equations form a system of nonlinear wave equations and so a local existence and uniqueness theorem can be proved by the usual methods. The energy-momentum tensor of a wave map satisfies both the dominant and strong energy conditions. This can be seen by noting that both these conditions are purely algebraic in nature and can be checked using normal coordinates based at a given point of $N$. Then the energy-momentum is reduced at a point to a sum of terms, each of which is the energy-momentum tensor of a massless scalar field.

Consider now the case of a solution of the Einstein equations with local $U(1) \times U(1)$ symmetry coupled to an invariant wave map. To say that the wave map is invariant means that each surface of symmetry is mapped to a single point of $N$. Since the relevant energy conditions hold, it follows that the analogues of the results obtained in Sects. 2 and 3 for the Einstein-Vlasov system are also valid for the Einstein-wave map system. Given that in proving Theorem 5.1 a wave map was already estimated, albeit for a special target manifold $(N, h)$, it appears straightforward to generalize that theorem to the case of the Einstein-wave map system. In fact the equation of motion for the wave map does not involve $W$, $V$ or $\eta_A$ while the combinations of matter terms occurring in (4.1) and (4.2) vanish identically for the energy-momentum tensor of an invariant wave map. Thus there is no direct coupling between the wave map describing the matter and the wave-map-like equation satisfied by $W$ and $V$. The one difficulty which occurs is that, in contrast to the special case of the hyperbolic plane, there is no global coordinate system on $N$ in the general case. The equation for an invariant wave map can be written in the form:

$$\nabla_a(r^2\nabla^a\phi^I) + r^2\,\Gamma^I_{JK}\nabla_a\phi^J\nabla^a\phi^K = 0. \qquad (6.3)$$
This bears a strong resemblance to Eqs. (4.1)–(4.2), with the difference that there are no terms involving $\eta$ or matter quantities in (6.3). This makes the analogue of the calculation (4.9) for the wave map superfluous. This is just as well, since it seems difficult to formulate an analogue of (4.9) in the case that there is no global coordinate system on $N$. What can be done instead is to go directly to the analogue of (4.26) for the wave map. Define:

$$\tilde E_+ = e_+(\phi^I)\,\partial/\partial\phi^I, \qquad \tilde E_- = e_-(\phi^I)\,\partial/\partial\phi^I. \qquad (6.4)$$

Then $\tilde E_+$ and $\tilde E_-$ satisfy propagation equations like (4.26) along $\hat\gamma_1$ and $\hat\gamma_2$ respectively. There is no term corresponding to $B_-$ in this case. It follows that under the hypotheses of Theorem 3.1 the length of the vectors $\tilde E_+$ and $\tilde E_-$ is bounded on the given time interval. This implies a bound on the distance of any point of the image under $\phi$ of this time interval from the image of the initial hypersurface. In particular, the image of this time interval under $\phi$ is contained in a compact subset of $N$. This compact set can be covered by a finite number of charts, each of which can be chosen to be defined on a domain with compact closure in a larger chart domain. In each of these charts the quantities $\phi^I$, $\partial_t\phi^I$ and $\partial_x\phi^I$ are bounded. Moreover, in any of these charts the following analogue of (4.30) holds:

$$l^a\nabla_a(\partial_x(n^b\nabla_b\phi^I)) = \big(Y_1(\phi^I)\,l^a + Y_2(\phi^I)\,n^a\big)\partial_x(\nabla_a\phi^I) + \tilde Z(\phi^I). \qquad (6.5)$$
This equation and the equations obtained by differentiating it repeatedly with respect to $x$ can be used to inductively bound all spatial derivatives of $\phi^I$. This proceeds essentially as in the proof of Theorem 5.1; it is merely necessary to be careful about the different charts which occur. In the case of a wave map define $F_n(t)$ to be the maximum over the finite set of charts chosen of $\|D^{n+1}\phi^I\|_\infty$ and $\|D^n(\partial_t\phi^I)\|_\infty$. When the derivatives of lower orders are known to be bounded, this is equivalent to choosing for each point one chart which contains its image and only taking the supremum over those values. When the quantity $F_n(t)$ is bounded the derivatives of order $n$ of the frame components of the energy-momentum tensor are bounded. In order to get an inequality which can be used to control $F_n(t)$, we would like to integrate a derivative of (6.5) along a characteristic (integral curve of $l$ or $n$). The image of this characteristic under $\phi$ need not be contained in a single chart. Consider such a characteristic $\gamma$, parametrized by $t$ from $t = 0$ to $t = T$. For each $t \in [0, T]$ there exists an interval $I$, open in $[0, T]$, whose image under $\phi \circ \gamma$ is contained in one of the chosen charts on $N$. By compactness of $[0, T]$, finitely many of these intervals cover it. It follows that there is a finite sequence of times $\{0 = t_1, t_2, \ldots, t_k = T\}$ such that $\phi \circ \gamma([t_i, t_{i+1}])$ is contained in one of the chosen charts for all $i$ between $1$ and $k - 1$. It follows from (6.5) that

$$F_n(t_k) \le F_n(t_{k-1})\,e^{C(t_k - t_{k-1})}. \qquad (6.6)$$
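The chart-by-chart argument can be sketched in a few lines of Python. The sketch below takes a finite cover of $[0, T]$ by intervals, greedily extracts times $0 = t_1 < \ldots < t_k = T$ so that each $[t_i, t_{i+1}]$ lies in the closure of a single interval of the cover, and then chains an estimate of the form (6.6) over the pieces. The function names, the sample cover, and the values of `C` and `F0` are all hypothetical and serve only as illustration.

```python
import math

def partition_from_cover(cover, T):
    """Greedy choice of times 0 = t_1 < ... < t_k = T such that every
    segment [t_i, t_{i+1}] lies in the closure of one interval of the cover."""
    times = [0.0]
    while times[-1] < T:
        t = times[-1]
        # right endpoints of the intervals that contain the current time
        reach = [b for (a, b) in cover if a <= t < b]
        if not reach:
            raise ValueError("the given intervals do not cover [0, T]")
        times.append(min(max(reach), T))
    return times

def chained_bound(F0, C, times):
    """Apply F(t_{i+1}) <= F(t_i) * exp(C * (t_{i+1} - t_i)) piece by piece."""
    F = F0
    for t_prev, t_next in zip(times, times[1:]):
        F *= math.exp(C * (t_next - t_prev))
    return F

# Hypothetical cover of [0, 2] by five overlapping intervals.
cover = [(0.0, 0.4), (0.3, 0.9), (0.25, 0.6), (0.8, 1.5), (1.4, 2.0)]
times = partition_from_cover(cover, T=2.0)
final = chained_bound(F0=1.0, C=0.5, times=times)
print(times, final)
```

The exponents telescope, so the chained bound equals $F_n(0)\,e^{CT}$ regardless of how many charts are needed, provided the same constant $C$ works in every chart.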
This is enough to allow $F_n$ to be bounded for all $t \in [0, T]$. Thus the following analogue of Theorem 5.1 is obtained:

Theorem 6.1. Let $(N, h)$ be a complete Riemannian manifold and let $(M, g, \phi)$ be a $C^\infty$ solution of the Einstein equations with local $U(1) \times U(1)$ symmetry coupled to an invariant wave map with target space $(N, h)$ which is the maximal globally hyperbolic development of data on a symmetric hypersurface of constant mean curvature $H_0 < 0$. Then the part of the spacetime to the past of the initial hypersurface can be covered by a foliation of CMC hypersurfaces with the mean curvature taking all values in the interval $(-\infty, H_0]$. Moreover, the CMC foliation can be extended to the future of the initial hypersurface in such a way that the mean curvature attains all negative real values.

Acknowledgement. I am grateful to Lars Andersson for helpful discussions.
References
1. Bartnik, R.: Remarks on cosmological spacetimes and constant mean curvature hypersurfaces. Commun. Math. Phys. 117, 615–624 (1988)
2. Berger, B. K., Chruściel, P. T. and Moncrief, V.: On ‘asymptotically flat’ spacetimes with G2-invariant Cauchy surfaces. Ann. Phys. 237, 322–354 (1995)
3. Christodoulou, D., Tahvildar-Zadeh, S.: On the regularity of spherically symmetric wave maps. Commun. Pure Appl. Math. 46, 1041–1091 (1993)
4. Christodoulou, D., Tahvildar-Zadeh, S.: On the asymptotic behaviour of spherically symmetric wave maps. Duke Math. J. 71, 31–69 (1993)
5. Chruściel, P. T.: On spacetimes with U(1) × U(1) symmetric compact Cauchy surfaces. Ann. Phys. 202, 100–150 (1990)
6. Chruściel, P. T.: On uniqueness in the large of solutions of Einstein’s equations (Strong cosmic censorship). Proceedings of the C.M.A. 27, Australian National University (1991)
7. Chruściel, P. T., Isenberg, J. and Moncrief, V.: Strong cosmic censorship in polarised Gowdy spacetimes. Class. Quantum Grav. 7, 1671–1680 (1990)
8. Eardley, D., Smarr, L.: Time functions in numerical relativity: Marginally bound dust collapse. Phys. Rev. D 19, 2239–2259 (1979)
9. Gerhardt, C.: H-surfaces in Lorentzian manifolds. Commun. Math. Phys. 89, 523–553 (1983)
10. Glassey, R., Schaeffer, J.: On the ‘one and one-half dimensional’ relativistic Vlasov-Maxwell system. Math. Meth. Appl. Sci. 13, 169–179 (1990)
11. Glassey, R., Schaeffer, J.: The relativistic Vlasov-Maxwell equations in low dimension. In: Murthy, M. K. V., Spagnolo, S. (eds.): Nonlinear hyperbolic equations and field theory. London: Pitman, 1992
12. Glassey, R., Strauss, W.: Singularity formation in a collisionless plasma could only occur at high velocities. Arch. Rat. Mech. Anal. 92, 56–90 (1986)
13. Gu, C.-H.: On the Cauchy problem for harmonic maps defined on two-dimensional Minkowski space. Commun. Pure Appl. Math. 33, 727–737 (1980)
14. Isenberg, J., Moncrief, V.: The existence of constant mean curvature foliations of Gowdy 3-torus spacetimes. Commun. Math. Phys. 86, 485–493 (1983)
15. Kundu, P.: Projection tensor formalism for stationary axisymmetric gravitational fields. Phys. Rev. D 18, 4471–4479 (1978)
16. Lawson, H. B., Michelson, M.-L.: Spin geometry. Princeton: Princeton University Press, 1989
17. Malec, E., Ó Murchadha, N.: Optical scalars and singularity avoidance in spherical spacetimes. Phys. Rev. D 50, 6033–6036 (1994)
18. Marsden, J. E., Tipler, F. J.: Maximal hypersurfaces and foliations of constant mean curvature in general relativity. Phys. Rep. 66, 109–139 (1980)
19. Rendall, A. D.: Crushing singularities in spacetimes with spherical, plane and hyperbolic symmetry. Class. Quantum Grav. 12, 1517–1533 (1995)
20. Scott, P.: The geometries of 3-manifolds. Bull. London Math. Soc. 15, 401–487 (1983)
21. Shatah, J., Tahvildar-Zadeh, S.: Regularity of harmonic maps from the Minkowski space into rotationally symmetric manifolds. Commun. Pure Appl. Math. 45, 947–971 (1992)

Communicated by S.-T. Yau
Commun. Math. Phys. 189, 165 – 204 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997
Breakdown of Stability of Motion in Superquadratic Potentials
Vadim Zharnitsky*
Theoretical Division and Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
Received: 1 February 1995 / Accepted: 15 March 1997
* Supported by AFOSR F49620-92-J-0049 DEF and DOE.

Abstract: Based on the KAM theory, investigation of the equation of motion of a classical particle in a one-dimensional superquadratic potential well, under the influence of an external time-periodic forcing, raised a hope that all the solutions are bounded. Indeed, due to the superquadraticity of the potential the frequency of oscillations of the solutions in the system tends to infinity as the amplitude increases. Therefore, because of this relationship between the frequency and the amplitude, intuitively one might expect that all resonances that could cause the accumulation of energy would be destroyed, and thus all solutions would stay bounded for all time. More formally, according to Moser’s twist theorem, this could mean the existence of invariant tubes in the extended phase space and therefore would result in the boundedness of the solutions. Actually, the boundedness results have been established for a large class of superquadratic potentials, but in general, the above intuition turns out to be wrong. Littlewood showed it by creating a superquadratic potential in which an unbounded motion occurs in the presence of some particular piecewise constant forcing. Moreover, Littlewood’s result holds for a larger class of forcings. Here it is proven for the continuous time-periodic forcing. For this purpose a new averaging technique for the forced motions in superquadratic potentials with rather weak assumptions on the differentiability of the potentials has been developed.

Contents
1 Introduction
1.1 Historical remarks
1.2 Outline of Littlewood’s counterexample
1.3 The main theorem
2 Construction of an Unbounded Solution
2.1 Reduction to the action-angle variables
2.2 Mechanism of the increase of the action
2.3 An outline of the construction of the unbounded solution
2.4 Averaging procedure and adiabatic invariance in wavy potentials
2.5 Estimation of finite differences of periods
3 Proof of the Main Theorem
4 Preliminary Estimates
5 Increase in the Action
6 Proof of Lemma 3.1 and Lemma 3.2
6.1 Proof of Lemma 3.1
6.2 Proof of Lemma 3.2
A Proof of the Proposition 5.2
A.1 Estimation of $\Delta T$
A.2 Estimation of $\Delta x$
A.3 Estimation of $\Delta\langle x\rangle$
A.4 Estimation of $\Delta\phi$
B Modification
B.1 Proof of Lemma 3.1
B.2 Proof of Lemma 6.1.1
B.3 Proof of Lemma 6.1.2
B.4 Proof of Lemma 6.1.3
C Proof of Lemma 3.2
C.1 Proof of Lemma 6.2.1
D Proof of Propositions 5.3, A.2, A.3, and B.1
D.1 Proof of Proposition A.2
D.2 Proof of Proposition A.3
D.3 Proof of Proposition 5.3
D.4 Proof of Proposition B.1
E Hamiltonian in the Action-Angle Variables
1. Introduction

1.1. Historical remarks. In the early 1960’s Littlewood asked whether or not all the solutions of the Duffing-type equation

$$\ddot x + U'(x) = p(t),$$

where $p(t + 1) = p(t)$ and $U'(x)/x \to \infty$ as $x \to \pm\infty$, stay bounded for all time. The above equation represents the motion of a classical particle in a one-dimensional potential field $U(x)$ subjected to external time-periodic force $p(t)$. The Poincaré period map which takes the plane of initial conditions onto itself is area-preserving due to the conservative nature of the equation. The superlinear growth of $U'(x)$ implies that the Poincaré map possesses a twist at infinity raising a hope that the KAM theory, see [12] and [13], is applicable to show the existence of invariant curves. The greatest difficulty of the problem is to link the analytic properties of the potential $U(x)$ with the geometry of the Poincaré map to which the KAM theory could be applied. One of the conditions for invoking the theory is the monotone twist of the Poincaré map. Therefore more strict assumptions on $U'(x)$ have to be made to prove boundedness of the solutions. The simplest condition which guarantees the monotonicity of the twist is the monotone growth of $U'(x)/x$. In fact, all boundedness results for the above equation include this condition (monotone growth of the ratio $U'(x)/x$) as well as some other conditions, e.g. see [9–11], and [14]. See also [2, 3, 7, and 4] for related problems and more references.

That this condition is crucial follows from Littlewood’s example, where the ratio $U'(x)/x$ tends to infinity but not monotonically and there exists an unbounded solution, see [5–8] or Subsect. 1.2. The choice of a piecewise constant forcing, $p(t) = (-1)^{[t]+1}$, made Littlewood’s construction clear and persuasive, because the system is piecewise autonomous. However, that left an open question about the effect of smoothness of $p(t)$ on the boundedness of the solutions. In particular, it is of interest to construct an example of instability with a continuous forcing. In this work an unbounded solution for a continuous $p$ and for a superlinear $U'$ is constructed.

1.2. Outline of Littlewood’s counterexample. Firstly, we give an outline of Littlewood’s construction, see Fig. 1. A good description of the counterexample, much simpler than the original one [6], can be found in [5].

[Figure 1: graph of the potential $U(x)$ against $x$ with the turning points $-X_3$, $-X_1$, $X_0$, $X_2$, $X_4$ and arrows marked $p = 1$ and $p = -1$, together with the graph of the forcing $p(t)$ for $t$ from 0 to 3.]
Fig. 1. Resonance conditions and the modification of the potential. Each arrow represents oscillation of a particle during an autonomous phase. If the graph of $U_0 = x^4$ is replaced by $U_0 \pm x$ then the arrows marked $p = \pm 1$ become horizontal
Let us start with the equation $\ddot x + 4x^3 = (-1)^{[t]+1}$, where $[t]$ is the integer part of $t$. We choose the initial condition $(x(0), \dot x(0)) = (x_0, 0)$ ($x_0 > 0$) so that the solution $x(t)$ would swing left and stop at $x(1) = -x_1$ ($x_1 > 0$) with $\dot x(1) = 0$. If the potential is left unmodified, the solution will make more than one full swing during the second half-period stopping at $x_2$ ($x_2 > x_1$) at least once. The potential can be modified in the open interval $(x_0, x_2)$ in such a way that the solution
is slowed down so that the point $x_2$ is reached exactly at the end of the period. A simple argument based on the superquadratic character of $x^4$ shows that during the next half-period the solution will again make at least one trip from $x_2$ to $-x_3$, where $x_3 > x_2$. By modifying the potential on the interval $(-x_3, -x_1)$ we slow the solution down so that $x(3) = -x_3$, thus again achieving the resonance condition. Continuing this procedure we obtain a monotonically increasing in magnitude sequence of turning points $\{(-1)^k x_k\}_{k=0}^{\infty}$, where $x_k > 0$. We note that the potential is not modified at these points. The sequence $\{(-1)^k x_k\}$ is uniquely defined by the conservation of energy relation once $x_0 > 0$ is fixed. Such a sequence is unbounded and so is the solution starting at $x_0$. However, the above argument does not show whether superquadraticity of $U$ is preserved. This turns out to be the most delicate part of the construction. It is shown in [5] that the modification producing an unbounded solution can be such that $|U'(x)| > cx^2$.

1.3. The main theorem. We start with the equation

$$\ddot x + U_0'(x) = p(t), \qquad (1)$$

where $p$ is continuous, periodic of period 4, $p(t + 2) = -p(t)$ and on $t \in [0, 2]$,

$$p(t) = \begin{cases} 1 - 2\sqrt{t} & \text{if } t \in (0, 1) \\ -1 & \text{if } t \in (1, 2), \end{cases}$$

see Fig. 2, and $U_0(x) = x^4$.

[Figure 2: graph of $p$ against $t$ for $t$ from 0 to 4, with values between $-1$ and $1$.]
Fig. 2. $p(t)$

It is known that this equation has no unbounded solution, see [9]. Our main goal is to modify the potential in order to obtain at least one unbounded solution $x(t)$ for the new equation. We prove the following theorem

Theorem 1.1. There exists a $C^\infty$-modification $U(x)$ of the quartic potential $U_0(x) = x^4$ satisfying the estimates
– $0 \le U(x) - U_0(x) \le C\sqrt{|x|}$
– $c|x|^{1.5} \le |U'(x)| \le C|x|^3$
such that the equation

$$\ddot x + U'(x) = p(t) \qquad (2)$$

has an unbounded solution $x(t)$, that is $|x| + |\dot x| \to \infty$ as $t \to \infty$.
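The forcing $p(t)$ defined before Theorem 1.1 can be written out explicitly on a full period by using $p(t + 2) = -p(t)$. The following minimal Python sketch (the function name `forcing` is an assumption made for illustration) implements it and spot-checks the continuity at the junction points and the antisymmetry.

```python
import math

def forcing(t):
    """Continuous forcing of period 4 with p(t + 2) = -p(t):
    p = 1 - 2*sqrt(t) on [0, 1] and p = -1 on [1, 2]."""
    s = t % 4.0
    if s < 1.0:
        return 1.0 - 2.0 * math.sqrt(s)
    if s < 2.0:
        return -1.0
    if s < 3.0:
        # second half-period: p(s) = -p(s - 2)
        return -(1.0 - 2.0 * math.sqrt(s - 2.0))
    return 1.0

samples = [0.0, 0.25, 0.5, 1.0, 1.5, 2.0, 2.75, 3.5]
antisymmetric = all(abs(forcing(t + 2.0) + forcing(t)) < 1e-12 for t in samples)
print(antisymmetric)
```

The derivative $\dot p = -1/\sqrt{t}$ is unbounded as $t \to 0^+$, so this $p$ is continuous but not differentiable at the start of each non-autonomous phase.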
The proof of this theorem requires some estimates for the solution $x(t)$ and is presented below, see Sect. 3.

Remark 1.1. Since this paper contains many constants some of them will be denoted by the same letters $C$ or $c$ whenever the values of the constants are irrelevant.

2. Construction of an Unbounded Solution

2.1. Reduction to the action-angle variables. In this subsection we carry out the standard reduction to the action-angle variables. In these variables the oscillatory behavior of the solutions becomes explicit.

[Figure 3: phase plane $(x, \dot x)$ showing a level curve enclosing the area $I$, its turning points $x_-$ and $x_+$, the point $\langle x\rangle$, and an infinitesimal ring of area $\Delta I$ of which the part $\Delta A$ defines $\phi = \Delta A/\Delta I$.]
Fig. 3. The definition of the action-angle variables. To each point $(x, \dot x)$ the action $I$ is assigned given by the area enclosed by the level curve through $(x, \dot x)$ and the angle $\phi$ which is given by the proportion of the area of an infinitesimal ring between the $\dot x$-axis and $(x, \dot x)$. The center of gravity of the infinitesimal ring is at $\langle x\rangle$

The Hamiltonian of the problem is given by

$$H = \frac{\dot x^2}{2} + U(x) - p(t)x.$$

Let $I(x, \dot x, p)$ be the area bounded by the closed curve $H(u, v, p) = H(x, \dot x, p)$ and let $\phi(x, \dot x, p)$ be the angular variable defined as the product of the frequency $\omega(H, p)$ of the corresponding autonomous system and the time $t$ it takes the solution of the autonomous system to get from $(0, \dot x_0 > 0)$ to $(x, \dot x)$. Thus, the angular variable is defined modulo 1 and therefore the period $T(I, p)$ is reciprocal to the frequency $\omega(I, p)$. In geometrical terms, see Fig. 3, $\phi$ is given by the proportion of the area of an infinitesimal ring between the $\dot x$-axis and $(x, \dot x)$. The center of gravity of this infinitesimal ring is at $\langle x\rangle$. The Hamiltonian in the action-angle variables is given by

$$K(I, \phi, p) = H_0(I, p) + \dot p\,H_1(I, \phi, p), \qquad (3)$$

where $H_0$ is the old Hamiltonian in the new variables and

$$H_1 = T(I, p)\Big(\int_0^\phi x(I, \psi, p)\,d\psi - \phi\int_0^1 x(I, \psi, p)\,d\psi\Big), \qquad (4)$$
V. Zharnitsky
see Appendix E. The equations of motion are given by

$$\dot{I} = -\dot{p} f(I, \phi, p), \qquad \dot{\phi} = \omega(I, p) + \dot{p} g(I, \phi, p),$$

where $f(I, \phi, p) = T(I, p)(x(I, \phi, p) - \langle x \rangle(I, p))$ and $g = \frac{\partial H_1}{\partial I}$. In the autonomous case ($p$ = constant) the motion becomes a uniform rotation:

$$I = I(0), \qquad \phi = \omega(0)t + \phi(0), \qquad p = p(0).$$

2.2. Mechanism of the increase of the action. In this subsection we use the equations of motion in the action-angle variables to show the mechanism of the increase of the action. We consider the non-autonomous phase $t \in [0, 1]$, $p(t) = 1 - 2\sqrt{t}$. The equations of motion have singularities at $t = 0$:

$$\dot{I} = \frac{1}{\sqrt{t}} f(I, \phi, p), \qquad \dot{\phi} = \omega(I, p) - \frac{1}{\sqrt{t}} g(I, \phi, p).$$

Note that $f$ as a function of $\phi$ has zero mean and changes sign only two times during one period of $\phi$ (when $x(I, \phi, p) = \langle x \rangle(I, p)$, see Fig. 3). The change of the action is given by

$$I(t) - I(0) = \int_0^t f(I(s), \phi(s), p(s)) \frac{ds}{\sqrt{s}}. \tag{5}$$
The calculations show that for the quartic potential the following estimates hold: $|f|, |g| \ll \omega$, $f = O(1)$, and the frequency of the oscillator is asymptotically proportional to the amplitude (the distance between the origin and the point where the autonomous solution turns around), see e.g. [4]. Therefore such a system can be treated as a perturbation of a completely integrable one for large amplitudes. Evaluating the integral in (5) over the autonomous trajectory we obtain

$$I(1) - I(0) = \int_0^1 f(I_0, \phi_0 + \omega_0 s, p_0) \frac{ds}{\sqrt{s}}.$$

This integral gives the leading asymptotic term of the action change for "well-behaved" potentials. For example, if $f = \sin \phi$ and $\phi_0 = 0$ we have $I(1) - I(0) \approx \sqrt{\frac{\pi}{2\omega_0}}$ as $\omega_0 \to \infty$. For the class of potentials satisfying the conditions of Theorem 1.1 the change in the action is later shown to be also of order $\frac{1}{\sqrt{\omega_0}}$ if $\phi_0$ satisfies the resonance condition

$$x(I_0, \phi_0, p) = \langle x \rangle(I_0, p_0). \tag{6}$$

If $\dot{x}(I_0, \phi_0, p_0) < 0$ ($> 0$) the action decreases (increases).

Remark 2.1. The corresponding change of energy is $\Delta H \approx \frac{\partial H}{\partial I} \Delta I \approx \omega \Delta I \approx \sqrt{\omega}$.
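The order $1/\sqrt{\omega_0}$ of the action change in the model case $f = \sin\phi$, $\phi_0 = 0$ can be checked numerically. The sketch below is an illustration only: it evaluates $\int_0^1 \sin(\omega_0 s)\,ds/\sqrt{s}$ with a Simpson rule after the substitution $s = u^2$ and compares it with the Fresnel-type leading term $\sqrt{\pi/(2\omega_0)}$. The function names, the sample frequencies and the quadrature are assumptions of this sketch, not part of the paper.

```python
import math

def action_change(omega0: float, n: int = 200_000) -> float:
    """Approximate the integral of sin(omega0 * s) / sqrt(s) over s in [0, 1].

    The substitution s = u**2 removes the endpoint singularity: the integral
    equals 2 * (integral of sin(omega0 * u**2) over u in [0, 1]), which is
    evaluated with the composite Simpson rule (n must be even).
    """
    h = 1.0 / n
    total = math.sin(0.0) + math.sin(omega0)
    for k in range(1, n):
        weight = 4.0 if k % 2 else 2.0
        u = k * h
        total += weight * math.sin(omega0 * u * u)
    return 2.0 * total * h / 3.0

def leading_term(omega0: float) -> float:
    """Leading asymptotic term sqrt(pi / (2 * omega0)) for large omega0."""
    return math.sqrt(math.pi / (2.0 * omega0))

for omega0 in (100.0, 400.0, 1600.0):
    print(omega0, action_change(omega0), leading_term(omega0))
```

The two columns agree up to a correction of relative size $O(1/\sqrt{\omega_0})$, which comes from the upper limit of integration.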
Fig. 4. Change of the action due to the rapid change of forcing. On the autonomous interval the trajectory stays on the tube with constant area of the $t$-cross-section. At $t \approx 2n$ the tube section moves so fast that the trajectory does not catch up with it and starts rotating near a tube with a different area of section.
We provide an additional geometrical explanation of the action increase. Consider the tube in the $(x, \dot{x}, t)$-space whose $t$-cross-sections have a constant area $I_0$:

$$T_{I_0} = \{(x, \dot{x}, t) : H(x, \dot{x}, p(t)) = H(I_0, p(t))\}.$$

When $p$ = constant = 1 the trajectory spirals on the tube, see Fig. 4. At $t = 0$ the tube is sheared to the left. If the solution at $t = 0$ finds itself to the right of $x = \langle x \rangle$ then the distance to the tube increases and so does the action. The action decreases when the solution enters $x \le \langle x \rangle$. However, this decrease is smaller than the preceding increase. Indeed, if at $t = 0$ (when $\dot{p} = \infty$) the solution is at $x = \langle x \rangle$ then it spends the largest possible time in $x \ge \langle x \rangle$, and by the time it enters $x \le \langle x \rangle$, $\dot{p}$ is smaller, making the change in $I$ smaller. The result is that more action is gained than lost after the first revolution. The same happens in the succeeding rotations. This effect is due to the monotonicity of $\dot{p}$. In the next subsection we show how the above mechanism of action increase is used to create an unbounded solution.

2.3. An outline of the construction of the unbounded solution. Now we outline the induction procedure producing an unbounded solution. The procedure is based on the repeated use of the resonance condition and is similar to Littlewood's construction. We define the instantaneous right and left turning points $\pm x_\pm = \pm x_\pm(x, \dot{x}, t)$ associated with a point $(x, \dot{x}, p(t))$, see Fig. 3, as the solutions of the equation

$$H(\pm x_\pm, 0, p(t)) = H(x, \dot{x}, p(t)).$$

Thus, $-x_-$ and $x_+$ are the points where the solution would turn around had the system been frozen at the moment $t$. Our modification of $U_0 = x^4$ will leave $U_0$ unchanged at a sequence of points $\{(-1)^k A_k\}$, where $A_{k+1} > A_k > 0$ and $(-1)^k A_k$ is the point where the solution turns around on the autonomous intervals $t \in [2k - 1, 2k]$.
In contrast with Littlewood's construction, the sequence $A_k$ cannot be defined by a recursive formula, although we will construct it so that it increases according to the asymptotic relation $^1$

$$A_{n+1} - A_n \sim \frac{1}{A_n^{2.5}}. \tag{7}$$

$^1$ In this paper "$a_n \sim b_n$" means $0 < c \le \frac{a_n}{b_n} \le C$, where $c$ and $C$ are independent of $n$.

Fig. 5. Behavior of the unbounded solution. a) Drift of the solution due to the monotone change of $p$. The arrows indicate the sequence of turning points on $t \in [0, 1]$; b) Global behavior of the unbounded solution. The dashed arrows indicate the sequence $(-1)^n A_n$.
It is important for our construction that such a sequence is unbounded (indeed, $A_n \to A$ leads to an immediate contradiction $0 = \frac{1}{A^{2.5}}$). We describe the inductive procedure which will eventually produce an unbounded solution. Let the initial condition $(x(0), \dot{x}(0))$ be chosen so that it satisfies the resonance condition (6) and $\dot{x} > 0$, with a yet unspecified $I_0$. Had the system remained autonomous, the solution would oscillate between $x_+(0) = A_0$ and $-x_-(0) = -(A_0)_-$, where $-(A_0)_-$ is defined by the energy relation. But in our nonautonomous system the solution moves as shown in the diagram: the turning points drift to the left as $p$ decreases from 1 to $-1$. At $t = 1$ the system becomes autonomous again and the solution oscillates between $-A_1$ and $(A_1)_+$. Because of the resonance condition (6) the action has increased: $A_1 > A_0$ (the estimates (7) will be verified later). To repeat this increase ($A_2 > A_1$) we need to fulfill the resonance condition at $t = 2$. This is done by increasing $I_0$ (or, equivalently, $A_0$). The number of full rotations of $(x, \dot{x})$ in the phase plane during $t \in [0, 2]$ thus increases (as we will prove later), and therefore for some $A_0$ the resonance condition is satisfied. To create the resonance again at $t = 4$ the potential is deformed on $(A_0, A_2)$ with a parameter $\nu$ ($\nu_1 \le \nu \le \nu_2$), see Fig. 5. If for some deformation of the potential the number of full rotations on $t \in [2, 4]$ decreases by 2, then the solution vector $z^\nu(4)$ makes a full revolution in $(x, \dot{x})$ as $\nu$ travels through $[\nu_1, \nu_2]$ $^2$. Therefore for some intermediate $\nu$-value the resonance condition at $t = 4$ is satisfied. Repeating this procedure each half-period we obtain an unbounded solution. In the next subsection we discuss the main technical difficulties encountered in the above construction.

$^2$ The one-parameter family of deformations $U^\nu$ is constructed later.

2.4. Averaging procedure and adiabatic invariance in wavy potentials. In Subsect. 2.2 it was shown that the increase in the action takes place near $t = 0$ due to the rapidly changing force. We have to ensure that this gain in the action achieved near $\dot{p} = \infty$ is not lost by the end of the half period. This requires an adiabatic invariance theorem. Unfortunately, none of the standard adiabatic invariance results, see e.g. [1], apply to our system since it is not $C^1$-close to an integrable one: the dependence $\omega = \omega(I)$ is very irregular since $U$ has "waves", see Fig. 6. Nevertheless, the following adiabatic invariance result with weak assumptions on $U$ holds. Although this lemma is not quite sufficient for our purposes it still has the main ingredients of later proofs.

Lemma 2.1. If the potential $U(x)$ satisfies the conditions of Theorem 1.1 and $\|p\|_1$ is bounded then we have

$$I(1) - I(0) = O\left(\frac{1}{\omega(0)}\right).$$
Remark 2.2. In the standard normal form proof of adiabatic invariance the change of variables moves the $(\phi, t)$-dependence to higher order terms (see the sketch of the proof below). But this approach requires estimates on the growth rate of $\|H\|_2$. These estimates hold for the well-behaved potentials, e.g. $U_0 = x^4$, but unfortunately they do not hold for all "wavy" potentials of Theorem 1.1. Below, we show how the proof of the adiabatic invariance theorem can be modified so that bounds on the growth rate of the derivatives are no longer required. Instead, bounds on the growth rate of some finite differences have to be assumed. These finite differences can be estimated directly, see the next subsection.

Remark 2.3. Methods used in the proof apply to the analysis when $\|p\|_1$ is unbounded.

Sketch of the proof. We recall a standard averaging procedure for the proof of adiabatic invariance of the action, see e.g. [1], in order to indicate the step which fails for the modified potentials; then we will present the modified averaging procedure that works. Since $\|p\|_1$ is bounded, the equations of motion can be rewritten in the following form:

$$\dot{I} = -f(I, \phi, p), \qquad \dot{\phi} = \omega(I, p) + g(I, \phi, p).$$

In this system $\frac{1}{\omega(0)}$ is used as a small parameter. This is justified because, as we will show later, $\omega \to \infty$ as $I \to \infty$. The idea of the proof of the adiabatic invariance theorem is to find a change of the variable $I$ close to the identity so that the dependence on $\phi$ in the equation for $\dot{I}$ moves to higher order terms. Let $J = I + \frac{1}{\omega_0} P(I, \phi, p)$; then the equation for the rate of change of $J$ is given by

$$\dot{J} = \dot{I} + \frac{\omega}{\omega_0} \frac{\partial P}{\partial \phi} + O\left(\frac{1}{\omega_0}\right) = -f(I, \phi, p) + \frac{\omega}{\omega_0} \frac{\partial P}{\partial \phi} + O\left(\frac{1}{\omega_0}\right),$$

provided $\|P\|_1$ is bounded. Since $\langle f \rangle = 0$ the function

$$P = \frac{\omega_0}{\omega} \int_0^{\phi} f(I, \psi, p)\,d\psi$$

is periodic in $\phi$. Then under appropriate assumptions on the growth rate of $f$, $g$, $\omega$ and their derivatives we obtain
$$\dot{J} = O\left(\frac{1}{\omega_0}\right), \qquad J - I = O\left(\frac{1}{\omega_0}\right),$$

and therefore $I(1) - I(0) = O(\frac{1}{\omega_0})$. Unfortunately, the last step fails for modified potentials since $f$, $g$, $\omega$ do not behave well. We show now how the above averaging procedure can be adjusted to work for the modified potentials by making the following observations:

1. Between two consecutive intersections at $t = t_n$ and at $t = t_{n+1}$ of the solution with the section $\phi = 0$ we approximate the evolution of action in the original system by that in the corresponding autonomous one:

$$I(t_{n+1}) - I(t_n) = \int_{t_n}^{t_{n+1}} f(I, \phi, p(t))\,dt = \int_{t_n}^{t_{n+1}} f(I_n, \omega(I_n, p_n)(t - t_n), p_n)\,dt + \int_{t_n}^{t_{n+1}} \big(f(I, \phi, p(t)) - f(I_n, \omega(I_n, p_n)(t - t_n), p_n)\big)\,dt.$$
2. Taking the sum over all revolutions during $t \in [0, 1]$ and suppressing the dependence of $f$ on $I$, $\phi$, $p$ we obtain

$$I(1) - I(0) = \sum_{n=0}^{N} \int_{t_n}^{t_{n+1}} f_n(t)\,dt + \sum_{n=0}^{N} \int_{t_n}^{t_{n+1}} (f(t) - f_n(t))\,dt,$$

where $N \sim \omega_0$. The integrals in the first sum would vanish if they were taken over the intervals $t \in [t_n, t_n + \omega^{-1}(I_n, p_n)] = [t_n, t_n + T_n]$, because $\langle f \rangle = 0$. It is assumed that $t_0 = 0$ and $t_N = 1$; however, there is no loss of generality in these assumptions since $p(t)$ can be taken constant outside the interval $(0, 1)$ without affecting $I(1) - I(0)$.

3. Thus, the change of action is $O(\frac{1}{\omega_0})$ if $T_n - (t_{n+1} - t_n) = O(\frac{1}{\omega_0^2})$ and $f - f_n = O(\frac{1}{\omega_0})$ for $t \in [t_n, t_{n+1}]$ for all $n \le N$.

Therefore, the problem is reduced to estimating the differences of some functions on the solutions of autonomous and nonautonomous systems. In the next subsection we show how these differences can be estimated directly without computing the corresponding derivatives.

2.5. Estimation of finite differences of periods. In the last subsection we introduced a version of the averaging procedure which requires boundedness of some finite differences rather than pointwise boundedness of the derivatives. Here we show how these differences can be estimated directly even if the corresponding derivatives are unbounded. As an example we consider estimation of the difference of periods.

Theorem 2.1. Let $U(x)$ be a modified quartic potential satisfying the assumptions of Theorem 1.1. If $H_1 - H_2 = O(\sqrt{x_+(H_1)})$, then $T(H_1) - T(H_2) = O\left(\frac{1}{x_+^2(H_1)}\right)$, where $T(H)$ is the period of a solution of the autonomous system $\ddot{x} + U'(x) = 0$ with energy $H$.
Remark 2.4.
1. The condition $H_1 - H_2 = O(\sqrt{x_+(H_1)})$ comes from estimates on how much energy a solution can lose or gain during one revolution in the phase plane.
2. We will write $x_+$ instead of $x_+(H_1)$ because $x_+(H_1) \sim x_+(H_2)$.

First, we show that $\Delta T = T(H_1) - T(H_2)$ cannot be estimated through the derivative. The following formula was obtained by Levi in [4]:

$$T'(H) = \frac{\sqrt{2}}{2H} \int_{-x_-(H)}^{x_+(H)} \left(1 - 2\frac{V V''}{(V')^2}\right) \frac{dx}{\sqrt{H - V(x)}}. \tag{8}$$

For the quartic potential this formula gives a good estimate $T'(H) \le C\frac{T}{H} \approx \frac{1}{x_+^5}$, since $\frac{V V''}{(V')^2}$ is bounded, but for deformed potentials $T'(H)$ may not only fail this bound but can even take on infinite values at some points. More precise estimates show that $T'(H)$ indeed exceeds the required bound for some values of $H$, when the quartic potential is modified as in our construction.

Fig. 6. a) On the calculation of the difference of periods; b) The dependence of the frequency on the amplitude. Note that although the derivative is not bounded the finite differences are close to those of the quartic potential.

We now give a direct estimate of $T(H_1) - T(H_2)$, see Fig. 6,

$$T(H_1) - T(H_2) = \oint \frac{dx}{v_1(x)} - \oint \frac{dx}{v_2(x)} = \sqrt{2} \int_{-x_-(H_1)}^{x_+(H_1)} \frac{dx}{\sqrt{H_1 - U(x)}} - \sqrt{2} \int_{-x_-(H_2)}^{x_+(H_2)} \frac{dx}{\sqrt{H_2 - U(x)}}. \tag{9}$$
For convenience we introduce a cut-off function $\gamma(x)$ satisfying
1. $\gamma(x) \in C^\infty(\mathbb{R})$,
2. $0 \le \gamma(x) \le 1$ and $\gamma(-x) = \gamma(x)$,
3. $\gamma(x) = 0$ if $|x| \le 1$ and $\gamma(x) = 1$ if $|x| \ge 2$.

Using the inequality $x^4 \le U(x) \le x^4 + \gamma(x) C\sqrt{|x|}$ we estimate the difference of integrals as follows. On the interval $(-x_m, x_m)$, where $x_m^4 + \gamma(x_m) C\sqrt{|x_m|} \le \min(H_1, H_2)$, the difference of the integrands can be estimated by substituting $x^4 + \gamma(x) C\sqrt{|x|}$ instead of $U(x)$. Indeed, this substitution decreases the expressions under the square root signs by the same value; the difference of the fractions, therefore, increases (this follows from the monotonicity of the derivative of $\frac{1}{\sqrt{x}}$). The difference of the integrals over this "middle interval" can now be estimated by (8). The remaining left and right intervals $(-x_-, -x_m)$ and $(x_m, x_+)$ have lengths of order $x_+^{-2.5}$. Indeed, using the equality

$$x_m^4 + \gamma(x_m) C\sqrt{|x_m|} = x_+^4(H_2) + O(H_2 - H_1)$$

we obtain

$$|x_+(H_1) - x_m| \le |x_+(H_2) - x_m| \le \frac{C\sqrt{x_+}}{x_+^3} \le C x_+^{-2.5}.$$

Therefore, the time the solution spends on each interval is bounded from above by

$$\Delta t \le C\sqrt{\frac{\text{distance}}{\text{minimal slope}}} \le C\sqrt{\frac{x_+^{-2.5}}{x_+^{1.5}}} = \frac{C}{x_+^2}.$$

Thus, we obtain that $\Delta T = O(\frac{1}{x_+^2})$ provided $\Delta H \le C\sqrt{x_+}$.
3. Proof of the Main Theorem

In the previous section we produced a controlling sequence $A_k$ associated with an unbounded solution $x(t)$. In the proof of the main theorem we reverse this procedure: a solution $x(t)$ is constructed so that the sequence $A_k$ satisfies the asymptotic relation (7). First, we give a formal definition of $\{A_k\}_{k=0}^{\infty}$:

Definition 3.1.

$$A_k := \begin{cases} x_+(2k) & \text{if } k \text{ is even}, \\ x_-(2k) & \text{if } k \text{ is odd}. \end{cases}$$

Note that $A_k$ give approximate amplitudes. The next theorem is equivalent to Theorem 1.1.

Theorem 3.1. There exists a $C^\infty$-modification $U$ of the quartic potential $U_0 = x^4$ such that Eq. (2) possesses an unbounded solution $x(t)$ and the following estimates on the "amplitudes" $A_k$ and on the potential $U$ hold:
1. $|A_k - u_k| \le \frac{3b}{u_k^3}$ for all $k \in \mathbb{Z}$,
2. $c'|x|^{1.5} \le |U'(x)| \le C'|x|^3$ for all $x \in \mathbb{R}$,
3. $U((-1)^k A_k) = A_k^4$,
where the sequence $\{u_n\}_{n=0}^{\infty}$ is defined recursively:

$$u_{n+1} = u_n + \frac{a}{u_n^{2.5}} \quad \text{and} \quad u_0 = A_0.$$

The positive constants $A_0$, $a$, $b$, $C'$, and $c'$ will be defined in the proofs below.
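The recursion for $u_n$ in Theorem 3.1 is easy to iterate, and doing so shows why the controlling sequence is unbounded. The sketch below is an illustration, not part of the proof: the sample values $u_0 = 100$ and $a = 1$ are assumptions chosen to be compatible with the a priori bounds used later ($A_0 \ge 100$, $0 \le a \le 1$), and the lower bound follows from the convexity of $u \mapsto u^{3.5}$, which gives $u_{n+1}^{3.5} \ge u_n^{3.5} + 3.5a$.

```python
def controlling_sequence(u0: float, a: float, steps: int) -> list[float]:
    """Iterate u_{n+1} = u_n + a / u_n**2.5 starting from u_0 = u0."""
    u = [u0]
    for _ in range(steps):
        u.append(u[-1] + a / u[-1] ** 2.5)
    return u

# Sample values (assumptions of this sketch).
u0, a, steps = 100.0, 1.0, 100_000
u = controlling_sequence(u0, a, steps)

# Convexity of u -> u**3.5 gives u_{n+1}**3.5 >= u_n**3.5 + 3.5 * a,
# hence u_n**3.5 >= u_0**3.5 + 3.5 * a * n, which grows without bound.
lower_bound = (u0 ** 3.5 + 3.5 * a * steps) ** (1.0 / 3.5)
print(u[-1], lower_bound)
```

The growth is slow, of order $n^{1/3.5}$, which matches the small increments $A_{n+1} - A_n \sim A_n^{-2.5}$ in relation (7).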
Proof. The theorem is proved by induction: at the $n$-th step of the induction procedure we modify the potential on the interval $(A_{n-2}, A_n)$.

Remark 3.1. We concentrate on the even intervals $I_{2n} = [A_{2n-2}, A_{2n}]$; the odd intervals are treated similarly.

Suppose that the quartic potential $U_0 = x^4$ has already been modified on the interval $[-A_{2n-1}, A_{2n-2}]$ so that the new potential $U(x)$ satisfies condition 2 of the theorem and there is a solution $x(t)$ satisfying conditions 1 and 3 of the theorem on $t \in [0, 4n - 2]$. Consider the continuation $x^0(t)$ of this solution on $t \in [4n - 2, 4n]$.

Definition 3.2. $A^0_{2n} := x^0_+(4n)$.

Definition 3.3. We will say that $x$ satisfies condition $\mathcal{A}_{2n}$ (or $\mathcal{B}_{2n}$, respectively) if $x \ge A_{2n-2}$ and

$$\mathcal{A}_{2n}: \ |x - u_{2n}| \le \frac{2b}{u_{2n}^3}, \qquad \mathcal{B}_{2n}: \ |x - (A_{2n-1-2q})_+| \ge \frac{2b}{u_{2n}^3},$$

where $q = 0, 1, \ldots, n - 1$ and $(A_{2n-1-2q})_+$ are the right turning points corresponding to the "amplitudes" $\{A_{2n-1-2q}\}$ associated with the solution $x(t)$ on $t \in [0, 4n - 2]$.

Induction Step. Suppose that the potential is modified on $[-A_{2n-1}, A_{2n-2}]$ so that
1. the modification on $[-A_{2n-1}, A_{2n-2}]$ satisfies condition 2,
2. there is a solution $x(t)$ satisfying conditions 1 and 3 on $t \in [0, 4n - 2]$,
3. $A^0_{2n}$ satisfies $\mathcal{A}_{2n}$ and $\mathcal{B}_{2n}$.
Then there exists a deformation of the potential $U^\nu$ in $(A_{2n-2}, A^\nu_{2n})$, where $A^\nu_{2n} = x^\nu_+(4n)$, such that
1. $U^\nu$ satisfies condition 2 on $x \in [A_{2n-2}, A^\nu_{2n}]$,
2. $A^\nu_{2n}$ satisfies conditions 1 and 3,
3. $A^0_{2n+1}$ $^3$ satisfies both $\mathcal{A}_{2n+1}$ and $\mathcal{B}_{2n+1}$, where conditions $\mathcal{A}_{2n+1}$ and $\mathcal{B}_{2n+1}$ are defined similarly to $\mathcal{A}_{2n}$ and $\mathcal{B}_{2n}$.

Modification of the potential. We define a two-parameter family of deformations of the quartic potential $U^{\sigma,h}(\xi)$, where $\sigma$ characterizes the smallness of the slope of the modification and $h$ is related to the length of the modified interval, on the interval $[A_{2n-2}, u_{2n+2}]$, as follows:
– $U^{\sigma,h}(\xi)$ is a continuous function,
– $U^{\sigma,h}(\xi) = U(\xi)$ if $\xi \notin (A_{2n-2}, h^{1/4})$, where $h \in [A_{2n-2}^4, u_{2n+2}^4]$,
– $U^{\sigma,h}(\xi)$ is linear on each half of the interval $[A_{2n-2}, \sqrt[4]{h}]$, where the potential is deformed, and on the second half of the interval the slope is $\sigma$.

$^3$ $A^0_{2n+1}$ is obtained by continuing the solution $x^\nu(t)$ defined on $t \in [0, 4n]$ to $t \in [4n, 4n + 2]$.
Fig. 7. Construction of a two-parameter family of deformations of the potential
These conditions uniquely define the piecewise linear $(\sigma, h)$-modification, see Fig. 7. The smoothing procedure is discussed in Remark 6.2. Next we choose a continuous one-parameter subfamily $U^\nu = U^{\sigma(\nu),h(\nu)}(\xi)$ so that the resonance condition is achieved for some sufficiently small $\sigma$ while $A^\nu_{2n}$ stays "near" $u_{2n}$. Due to the resonance $-A^\nu_{2n+1}$ comes closer to $-u_{2n+1}$. The theorem follows from the two lemmas.

Lemma 3.1. Under the induction assumptions there exists a continuous one-parameter family of deformations of the potential on $(A_{2n-2}, u_{2n+2})$ satisfying condition 2, with $A^\nu_{2n}$ satisfying conditions 1 and 3, such that the number of full rotations on $t \in [4n - 2, 4n]$ for some values $\nu_1$ and $\nu_2$ of the controlling parameter will differ at least by two, i.e. $|N(\nu_1) - N(\nu_2)| \ge 2$.

Lemma 3.2. $A^\nu_{2n+1}$ satisfies $\mathcal{A}_{2n+1}$ and $\mathcal{B}_{2n+1}$ for some $\nu \in [\nu_1, \nu_2]$.

Remark 3.2. Lemma 3.1 shows that as $\nu$ passes through $[\nu_1, \nu_2]$, the solution vector $(x^\nu(4n), \dot{x}^\nu(4n))$ makes a full revolution in the phase space. Lemma 3.2 shows that for some intermediate value of $\nu$ the system will gain the necessary amount of energy ($-A^\nu_{2n+1}$ will be sufficiently close to $-u_{2n+1}$).

4. Preliminary Estimates

In this section we find preliminary estimates for the solutions of Eq. (2), where $U(x)$ satisfies the conditions of Theorem 1.1. Assuming a priori bounds on the constants
1. $0 \le a, b \le 1$,
2. $c' \ge 10^{-6}$,
3. $A_0 \ge 100$,
we prove

Proposition 4.1. $|U'(x)| \le 10|x|^3$ and $0 \le U(x) - U_0(x) \le 25a\sqrt{|x|}$, i.e. $C' = 10$.

Proof. We prove only the first inequality; the proof of the second inequality is similar. The modification procedure is organized so that on each interval $I_{2n-2} = (A_{2n-2}, A_{2n})$ the slope cannot become steeper than

$$|U'(x)|_{x \in (A_{2n-2}, A_{2n})} \le \frac{A_{2n}^4 - A_{2n-2}^4}{0.5(A_{2n} - A_{2n-2})} \le 2 s_3(A_{2n}, A_{2n-2}) \le 8A_{2n}^3 \le 8x^3\left(\frac{A_{2n}}{x}\right)^3.$$

To estimate the ratio of amplitudes we use the assumptions of Theorem 1.1 and the a priori bounds on $a$, $b$, and $A_0$:

$$\left|\frac{A_{2n}}{A_{2n-2}} - 1\right| = \frac{|A_{2n} - A_{2n-2}|}{A_{2n-2}},$$

and since

$$|A_{2n} - A_{2n-2}| \le |A_{2n} - u_{2n}| + |u_{2n} - u_{2n-1}| + |u_{2n-1} - A_{2n-1}| \le \frac{3}{u_{2n}^3} + \frac{1}{u_{2n-1}^{2.5}} + \frac{3}{u_{2n-1}^3} \le \frac{5}{u_{2n-1}^{2.5}},$$

we have

$$\left|\frac{A_{2n}}{A_{2n-2}} - 1\right| \le \frac{5}{A_{2n-2}\,u_{2n-1}^{2.5}} \le \frac{5}{A_{2n-2}} \le \frac{5}{100}.$$

Using the inequality $1.05^3 < \frac{10}{8}$ we obtain the result. The proof for $I_{2n-1} = (-A_{2n+1}, -A_{2n-1})$ is the same.
The next lemma shows that the left and right turning points $-x_-(H, p)$ and $x_+(H, p)$ have the same absolute value up to an error of order $\frac{1}{x_+^2}$.

Lemma 4.1. $|x_+ - x_-| \le \frac{3}{x_\pm^2}$ if $x_+ \ge 10^4$.

Proof. Using the energy relation we write

$$U(x_+) - p x_+ = U(-x_-) + p x_-. \tag{10}$$

Therefore $|U(x_+) - U(-x_-)| = |p|(x_+ + x_-)$. Applying the inequality for $|U(x) - U_0(x)|$ we obtain

$$|x_+^4 - x_-^4| \le |p|(x_+ + x_-) + 25(\sqrt{x_+} + \sqrt{x_-}),$$

then

$$|x_+ - x_-| \le \frac{x_+ + x_- + 25(\sqrt{x_+} + \sqrt{x_-})}{s_3(x_+, x_-)} \le \frac{2\max(x_+, x_-) + 50\max(\sqrt{x_+}, \sqrt{x_-})}{(\max(x_+, x_-))^3} \le \frac{2}{(\max(x_+, x_-))^2} + \frac{50}{(\max(x_+, x_-))^{2.5}} \le \frac{2}{x_\pm^2} + \frac{50}{x_\pm^{2.5}} \le \frac{3}{x_\pm^2}$$

if $x_\pm \ge 10^4$.
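For the unmodified quartic $U_0(x) = x^4$ the energy relation (10) can be solved numerically, which gives a quick sanity check of the size of $x_+ - x_-$. The sketch below is an illustration under that assumption only: it does not involve the modified potential $U$, the helper name `left_turning_point` is hypothetical, and the sample amplitude $x_+ = 100$ is chosen for convenience (for $U_0$ the factorization of $x_+^4 - x_-^4$ gives $x_+ - x_- = p/(x_+^2 + x_-^2)$ exactly).

```python
def left_turning_point(x_plus: float, p: float) -> float:
    """Solve x**4 + p*x = x_plus**4 - p*x_plus for x_minus > 0 by bisection.

    Uses the unmodified quartic U_0(x) = x**4 (an assumption of this sketch);
    the equation is the energy relation (10) with U replaced by U_0.
    """
    energy = x_plus ** 4 - p * x_plus
    # On [x_plus / 2, 2 * x_plus] the left-hand side is increasing for |p| <= 1
    # and large x_plus, and the root is bracketed by the endpoints.
    lo, hi = 0.5 * x_plus, 2.0 * x_plus
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid ** 4 + p * mid < energy:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x_plus = 100.0
for p in (-1.0, -0.5, 0.5, 1.0):
    x_minus = left_turning_point(x_plus, p)
    gap = x_plus - x_minus
    print(p, gap, 3.0 / x_plus ** 2)
```

For $|p| \le 1$ the printed gap is about $p/(2x_+^2)$, well inside the bound $3/x_+^2$; the extra terms in the lemma account for the deviation of $U$ from $U_0$.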
Let $-x_-(t)$ and $x_+(t)$ be the left and right turning points associated with the solution $(x(t), \dot{x}(t), p(t))$, i.e. if $p$ got frozen at $t = t_0$ then the solution would oscillate between $-x_-(t_0)$ and $x_+(t_0)$.

Lemma 4.2. $|x_\pm(t) - x_\pm([t])| \le 2$ provided $x_\pm(0)$ is sufficiently large.

Proof. We prove that $|x_+(t) - x_+([t])| \le 2$ for $t \in [0, 1]$; the other cases are proved similarly. Differentiating $H = U(x_+) - p x_+$ and applying $\dot{H} = -\dot{p} x$ we obtain after straightforward calculations that

$$\dot{x}_+ = \dot{p}\,\frac{x_+ - x}{U'(x_+) - p}. \tag{11}$$

Estimating the fraction on the right hand side we obtain that

$$\frac{x_+ - x}{U'(x_+) - p} \le \frac{3x_+}{10^{-6} x_+^{1.5} - p} \le 1$$

if $x_+(t) > 10^{14}$. Taking $x_+(0) > 10^{14} + 2$ we have $\Delta x_+ \le \Delta p \le 2$.

Corollary 4.1. If $\Delta t \le 1$ then $|\Delta H| \le 2|\Delta p| x_+(0)$.

Proof. The result follows from $\dot{H} = -\dot{p} x$ and the monotonicity of $p$.

With these estimates we can calculate the time between two consecutive intersections of a solution with the section $X^+$ given by

$$X^+ = \{(x, \dot{x}) \in \mathbb{R}^2 \mid \dot{x} = 0 \ \&\ x > 0\}.$$

Theorem 4.1. Let $\Delta t_{X^+}$ denote the interval of time between two consecutive intersections of a solution with $X^+$. Then for any $t_0$ and $x_0$ the solution $x(t)$ with initial condition $(x(t_0), \dot{x}(t_0)) = (x_0, 0)$ has the following property: $|\Delta t_{X^+} - \frac{1}{x_0} T_0(1)| = O(\frac{1}{x_0^{1.75}})$ for sufficiently large $x_0$, where $T_0(1)$ is a constant.

Proof. Since $v = \frac{d\xi}{dt}$ ($\xi(t)$ is the solution) we have

$$\Delta t_{X^+} = \oint \frac{d\xi}{v(\xi)} = \int_{\dot{x} \le 0} \frac{d\xi}{v(\xi)} + \int_{\dot{x} \ge 0} \frac{d\xi}{v(\xi)}. \tag{12}$$

We estimate the first integral on the right hand side of the equation,

$$\Delta t_- = \int_{\dot{x} \le 0} \frac{d\xi}{v(\xi)} = \frac{1}{\sqrt{2}} \int_{x_-}^{x_+} \frac{d\xi}{\sqrt{H(\xi) - U(\xi) + p(\xi)\xi}}; \tag{13}$$

the second integral is estimated similarly. For $t - t_0 \le 1$ we have
1. $|\xi(t)| \le 2x_0$,
2. $\Delta H \le 4x_0$,
3. $|x_\pm(t) - x_\pm(t_0)| \le 2$,
4. $U(\xi) \le \xi^4 + \gamma(\xi)\xi$;
these inequalities follow immediately from the above lemmas and propositions in this section.

Fig. 8. On the estimation of the time of one revolution in the phase space. The real trajectory on this diagram is represented by $x$ and $\frac{\dot{x}^2}{2} + U(x)$. The vertical distance between the point on the trajectory and the potential is equal to the kinetic energy. The proof uses the fact that the trajectory is confined to the strip with height of order $x_0$.

Applying the estimates 1, 3, and 4, we obtain

$$H_0 - \xi^4 - 10x_0 \le H(\xi) - U(\xi) + p\xi \le H_0 - \xi^4 + 10x_0.$$

Therefore, the following inequalities for the kinetic energy hold:

$$H_0 - 10x_0 - \xi^4 \le \frac{v^2(\xi)}{2} \le H_0 + 10x_0 - \xi^4. \tag{14}$$

Let $a$ and $b$ be such that $H_0 - 10x_0 = a^4$ and $H_0 + 10x_0 = b^4$. By (14) the turning points satisfy $a \le x_\pm \le b$ and $|b - a| = (H_0 + 10x_0)^{1/4} - (H_0 - 10x_0)^{1/4} \le \frac{C}{x_0^2}$. Now, we estimate the integral in (13), which is the sum of three parts: one where the left part of (14) is positive and the other two where it is negative,

$$\Delta t_- = (\Delta t_-)_1 + (\Delta t_-)_2 + (\Delta t_-)_3.$$

First, we show that the time the solution spends on the right and left intervals gives a negligible contribution to the integral (note that these intervals are contained in $(-b, -a)$ and $(a, b)$). Integrating the obvious inequality $\ddot{x}(t) \ge \ddot{x}_{\min}$ and using the identity $\dot{x}(t_{\dot{x}=0}) = 0$ we obtain

$$|b - a| \ge |x(t) - x(t_{\dot{x}=0})| \ge \frac{\ddot{x}_{\min}(t - t_{\dot{x}=0})^2}{2}.$$

Combining these estimates with the lower bound on acceleration

$$\ddot{x} \ge c' x^{1.5} - 1 \ge 10^{-6} x^{1.5} - 1 \ge \frac{1}{2} 10^{-6} x_0^{1.5},$$

we obtain
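$$(\Delta t_-)_{1,3} \le \sqrt{\frac{2(b - a)}{\ddot{x}_{\min}}} \le \sqrt{\frac{20}{x_0^2}}\,\frac{1}{\sqrt{0.5 \cdot 10^{-6} x_0^{1.5}}} \le 2 \cdot 10^{3.5}\, x_0^{-1.75}.$$

The main contribution comes from $(\Delta t_-)_2$; this is given by the integral in (13) taken over $(-a, a)$. Applying (14) we estimate the integral

$$\frac{1}{\sqrt{2}} \int_{-a}^{a} \frac{d\xi}{\sqrt{H_0 + 10x_0 - \xi^4}} \le \frac{1}{\sqrt{2}} \int_{-a}^{a} \frac{d\xi}{\sqrt{H(\xi) - U(\xi) + p(\xi)\xi}} \le \frac{1}{\sqrt{2}} \int_{-a}^{a} \frac{d\xi}{\sqrt{H_0 - 10x_0 - \xi^4}}. \tag{15}$$

Using the definitions of $a$ and $b$ the left and the right integrals can be rewritten as

$$\frac{1}{\sqrt{2}} \int_{-a}^{a} \frac{d\xi}{\sqrt{b^4 - \xi^4}}, \qquad \frac{1}{\sqrt{2}} \int_{-a}^{a} \frac{d\xi}{\sqrt{a^4 - \xi^4}}. \tag{16}$$

The right integral can be calculated by substituting $\xi = a\eta$:

$$\frac{1}{\sqrt{2}} \int_{-a}^{a} \frac{d\xi}{\sqrt{a^4 - \xi^4}} = \frac{T_0(1)}{2a} = \frac{T_0(1)}{2x_0} + O\left(\frac{1}{x_0^2}\right), \tag{17}$$

where $T_0(1)$ is the time of one revolution in the quartic potential with $A_0 = 1$. To estimate the left integral we add and subtract two integrals over $(a, b)$:

$$\frac{1}{\sqrt{2}} \int_{-a}^{a} \frac{d\xi}{\sqrt{b^4 - \xi^4}} = \frac{1}{\sqrt{2}} \int_{-b}^{b} \frac{d\xi}{\sqrt{b^4 - \xi^4}} - \frac{2}{\sqrt{2}} \int_{a}^{b} \frac{d\xi}{\sqrt{b^4 - \xi^4}} = \frac{T_0(1)}{2b} - O\left(\sqrt{\frac{b - a}{b^3}}\right) = \frac{T_0(1)}{2x_0} + O\left(\frac{1}{x_0^2}\right).$$

Adding up $(\Delta t_-)_k$ we obtain

$$\Delta t_- = \frac{T_0(1)}{2x_0} + O\left(\frac{1}{x_0^{1.75}}\right).$$

Due to the symmetry we have the same expression for $\Delta t_+$ and therefore

$$\Delta t_{X^+} = \frac{1}{x_0} T_0(1) + O\left(\frac{1}{x_0^{1.75}}\right).$$

Corollary 4.2. $T(H, p) = \frac{1}{x_0} T_0(1) + O\left(\frac{1}{x_0^{1.75}}\right)$.

Proof. It is a particular case of the theorem with $H$ and $p$ constant.

Corollary 4.3. If $\Delta t \le 1$ then

$$\Delta I \le 4T_0(1)\Delta p. \tag{18}$$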
Proof. Using the expression for $\dot{I}$ and applying the estimates for $T(H, p)$ and $x$ we get the result.

Corollary 4.4. The interval of time between two consecutive intersections of the solution with the set $W = \{(x, \dot{x}) \mid x = 0 \vee \dot{x} = 0\}$ is given by

$$\Delta t_W = \frac{T_0(1)}{4x_0} + O\left(\frac{1}{x_0^{1.75}}\right).$$

Proof. The result follows from the symmetry properties of the integrals in (16).
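The leading terms in Corollaries 4.2 and 4.4 can be illustrated for the pure quartic oscillator $\ddot{x} + 4x^3 = 0$ (that is, $U = U_0$ and $p = 0$, an assumption of this sketch). The code below computes $T_0(1) = \sqrt{2}\int_{-1}^{1} d\eta/\sqrt{1 - \eta^4}$ by quadrature, using the substitution $\eta = \sin\theta$, and compares it with the quarter period measured from a Runge-Kutta integration started at $(x_0, 0)$. The function names, step sizes and tolerances are choices made for this illustration.

```python
import math

def t0_of_one(n: int = 2000) -> float:
    """T_0(1) = 2*sqrt(2) * integral of d(theta)/sqrt(1 + sin(theta)**2) on [0, pi/2].

    Obtained from sqrt(2) * integral of d(eta)/sqrt(1 - eta**4) over [-1, 1]
    with eta = sin(theta); composite Simpson rule, n even.
    """
    def g(theta: float) -> float:
        return 1.0 / math.sqrt(1.0 + math.sin(theta) ** 2)

    h = (math.pi / 2.0) / n
    total = g(0.0) + g(math.pi / 2.0)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(k * h)
    return 2.0 * math.sqrt(2.0) * total * h / 3.0

def quarter_period(x0: float, steps_per_unit: int = 20_000) -> float:
    """Time for x'' + 4 x**3 = 0 to travel from (x0, 0) to x = 0 (classical RK4)."""
    def acc(y: float) -> float:
        return -4.0 * y ** 3

    dt = 1.0 / (steps_per_unit * x0)   # the period scales like 1 / x0
    x, v, t = x0, 0.0, 0.0
    while True:
        k1x, k1v = v, acc(x)
        k2x, k2v = v + 0.5 * dt * k1v, acc(x + 0.5 * dt * k1x)
        k3x, k3v = v + 0.5 * dt * k2v, acc(x + 0.5 * dt * k2x)
        k4x, k4v = v + dt * k3v, acc(x + dt * k3x)
        x_new = x + dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v_new = v + dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        if x_new <= 0.0:               # crossed x = 0: interpolate linearly in time
            return t + dt * x / (x - x_new)
        x, v, t = x_new, v_new, t + dt

T0 = t0_of_one()
for x0 in (1.0, 5.0):
    print(x0, 4.0 * x0 * quarter_period(x0), T0)
```

For the pure quartic the product $4x_0\,\Delta t_W$ reproduces $T_0(1)$ with no correction term; the $O(x_0^{-1.75})$ remainder in the corollaries comes from the forcing and from the modification of the potential.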
5. Increase in the Action

Proposition 5.1. Suppose that at $t = 0$ the initial condition $(x(0), \dot{x}(0))$ is such that $\dot{x}(0) > 0$ and

$$x(H, \phi, p) = \langle x \rangle_\phi(H, p), \tag{19}$$

then

$$I(2) - I(0) \ge \frac{T_0^{1.5}(1)}{100\sqrt{x_+(0)}}.$$

Proof. Consider the section defined by (19) and by the inequality $\dot{x} > 0$. A solution crosses this section at least once between two intersections with $X^+$. Let us denote the moments of the first crossings by $(t_n)_{n=0}^{n=N-1}$ and let $t_N = 2$ and $t_0 = 0$. Using (5) we have

$$I(2) - I(0) = \int_0^2 f(H, \phi, p) \frac{dt}{\sqrt{t}} = \sum_{n=0}^{N-1} \int_{t_n}^{t_{n+1}} f(H, \phi, p) \frac{dt}{\sqrt{t}}. \tag{20}$$

Let us estimate each integral in the last sum:

$$\int_{t_n}^{t_{n+1}} f(H, \phi, p) \frac{dt}{\sqrt{t}} = \int_{t_n}^{t_{n+1}} f(H_n, \phi_n + \omega_n(t - t_n), p_n) \frac{dt}{\sqrt{t}} + \int_{t_n}^{t_{n+1}} \big(f(H, \phi, p) - f(H_n, \phi_n + \omega_n(t - t_n), p_n)\big) \frac{dt}{\sqrt{t}}, \tag{21}$$

where $H_n$, $\phi_n$, $\omega_n$, $p_n$ are the values of the corresponding variables at $t = t_n$. Then

$$I(2) - I(0) = \sum_{n=0}^{N-1} \int_{t_n}^{t_{n+1}} f(H_n, \phi_n + \omega_n(t - t_n), p_n) \frac{dt}{\sqrt{t}} + R, \tag{22}$$

where

$$R = \sum_{n=0}^{N-1} \int_{t_n}^{t_{n+1}} \big(f(H, \phi, p) - f(H_n, \phi_n + \omega_n(t - t_n), p_n)\big) \frac{dt}{\sqrt{t}}. \tag{23}$$
Proposition 5.2. For any $n$ and for any $t \in [t_n, t_{n+1}]$ we have

$$|f(H, \phi, p) - f(H_n, \phi_n + \omega_n(t - t_n), p_n)| \le \frac{C}{x_+(0)}.$$

This proposition implies that $R$ in (23) is less than $\frac{C}{x_+(0)}$.

Proposition 5.3. In the sum in (22) all the terms are positive and the first one is larger than $\frac{T_0^{1.5}(1)}{100\sqrt{x_+(0)}}$.

Proof. From Proposition 5.2 and Proposition 5.3 we get the result.
Corollary 5.1. If the condition $\dot{x} > 0$ is changed for $\dot{x} < 0$ then the action change is negative, satisfying the same estimate.

Remark 5.1. The same increase (decrease) in the action takes place if at $t = 2$ the initial condition $(x(2), \dot{x}(2))$ is such that $\dot{x}(2) < 0$ ($\dot{x}(2) > 0$) and (19) holds.

6. Proof of Lemma 3.1 and Lemma 3.2

6.1. Proof of Lemma 3.1. The amount of energy that the solution gains at the end of the forcing phase depends on the deformation of the potential. However, we have the following

Lemma 6.1.1. For the two-parameter family of deformations of the potential on $[A_{2n-2}, u_{2n+2}]$ defined above we have $|U^{\sigma,h}(A^{\sigma,h}_{2n}) - U^0(A^0_{2n})| \le 10T_0(1)$.

This lemma and inductive assumption 3 on $A_{2n}$, see Sect. 3, imply that $A_{2n-2}^4 = h_{\min} \le U^{\sigma,h}(A^{\sigma,h}_{2n}) \le h_{\max} = u_{2n+2}^4$. Therefore the equation

$$U^{\sigma,h(\sigma)}(A^{\sigma,h(\sigma)}_{2n}) = h(\sigma) \tag{24}$$

has at least one solution for any $\sigma$. Let $\sigma_1$ and $\sigma_2$ be two numbers given by
– $\sigma_1 = c_1 u_{2n-2}^{1.5}$, where $c_1 = 2c'$,
– $\sigma_2 = u_{2n-2}^{1.5}$,
and let $h_1$ and $h_2$ be the corresponding solutions of Eq. (24) with the minimal $h$'s. The minimal $h$'s exist by the following argument. For any $\sigma$ there exists the infimum $h_{\inf}(\sigma)$ by Lemma 6.1.1. If $\sigma$ and $h_{\inf}(\sigma)$ satisfy (24) then we are done; if not, then there exists a sequence $U^{\sigma,h_k}$ converging to $U^{\sigma,h_{\inf}}$ as $k \to \infty$ and satisfying (24). Since the solution depends continuously on the deformation of the potential, the equality (24) will hold after taking the limit. Now, we can choose a continuous one-parameter $\nu$-family connecting the deformations $U^{\nu_1} = U^{h_1,\sigma_1}$ and $U^{\nu_2} = U^{h_2,\sigma_2}$ so that $A^\nu_{2n} \ge (h^\nu)^{1/4}$, see Fig. 9. The proof follows from the two lemmas proved in Appendix B.

Lemma 6.1.2. The obtained family $U^\nu = U^{h(\nu),\sigma(\nu)}$ is continuous and $A^\nu_{2n}$ satisfies conditions 1 and 3 of Theorem 3.1 for all $\nu \in [\nu_1, \nu_2]$.

Lemma 6.1.3. The number of full rotations on $t \in [4n - 2, 4n]$ in the potentials $U^{\nu_1}$ and $U^{\nu_2}$ differs at least by two.
Fig. 9. a) Two special deformations of the potential b) The construction of the continuous one-parameter family connecting the two potentials
By the last lemma the resonance condition is satisfied for some $U^\nu$, where $\nu_1 \le \nu \le \nu_2$.

6.2. Proof of Lemma 3.2. We have to show that while $\nu$ changes from $\nu_1$ to $\nu_2$, $-A^\nu_{2n+1}$ passes the interval

$$\left(-u_{2n+1} - \frac{2b}{u_{2n+1}^3},\ -u_{2n+1} + \frac{2b}{u_{2n+1}^3}\right), \tag{25}$$

thus implying Lemma 3.2. If for any $x$ in the above interval and any $A^\nu_{2n}$ the difference $|I(x, 0, -1) - I(A^\nu_{2n}, 0, 1)|$ is smaller than a possible increase and decrease of the action, then for some $\nu_+$ and $\nu_-$ such that $\nu_1 \le \nu_\pm \le \nu_2$ we have
– $A^{\nu_+}_{2n+1} \ge u_{2n+1} + \frac{2b}{u_{2n+1}^3}$,
– $A^{\nu_-}_{2n+1} \le u_{2n+1} - \frac{2b}{u_{2n+1}^3}$.
Then Lemma 3.2 follows from

Lemma 6.2.1.

$$\left|I\left(-u_{2n+1} - \frac{2b}{u_{2n+1}^3}, 0, -1\right) - I(A^\nu_{2n}, 0, 1)\right| \le \frac{200T_0(1)a}{\sqrt{u_{2n}}},$$

$$\left|I\left(-u_{2n+1} + \frac{2b}{u_{2n+1}^3}, 0, -1\right) - I(A^\nu_{2n}, 0, 1)\right| \le \frac{200T_0(1)a}{\sqrt{u_{2n}}},$$

which is proved in Appendix B.

Remark 6.1. Combining these inequalities with the expression for the possible change of action, see Proposition 5.3, we obtain the inequality for $a$:

$$\frac{T_0(1)^{1.5}}{100\sqrt{u_{2n}}} \ge \frac{200T_0(1)a}{\sqrt{u_{2n}}} \ \Rightarrow\ a \le \frac{\sqrt{T_0(1)}}{2 \cdot 10^4}.$$
Remark 6.2 (Smoothing of the potential). We use the piecewise linearity of deformations of the potential only in the proof of Lemma 3.1. Proofs of other statements require only continuity of the potential. We can define a two-parameter family of $C^\infty$ deformations of the potential by smoothing $U^{\sigma,h}$ in small neighborhoods of the corners. The smoothing can be organized so that the new family depends continuously on the parameters. Then after minor changes the proof of Lemma 3.2 works for a smooth family as well, because smoothing can be done on arbitrarily small intervals.
A. Proof of Proposition 5.2

We prove a statement which implies Proposition 5.2.

Proposition A.1. Consider the equation $\ddot{x} + U'(x) = p(t)$, where $U$ satisfies the assumptions of Theorem 3.1. Let the initial condition be $(x(t_0), \dot{x}(t_0))$; then for any $t \in [t_0, t_0 + \frac{3T_0(1)}{x_+(t_0)}]$ we have

$$|f(H, \phi, p) - f(H_0, \phi_0 + \omega(H_0, p_0)(t - t_0), p_0)| \le \frac{C}{x_+(t_0)},$$

where $H_0 = H(t_0)$ and $p_0 = p(t_0)$.

Proof. Applying the triangle inequality we obtain

$$|f(H, \phi, p) - f(H_0, \phi_0 + \omega(H_0, p_0)(t - t_0), p_0)| \le |f(H, \phi, p) - f(H_0, \phi, p_0)| + |f(H_0, \phi, p_0) - f(H_0, \phi_0 + \omega(H_0, p_0)(t - t_0), p_0)| \le |f(H, \phi, p) - f(H_0, \phi, p_0)| + \left|\frac{\partial f}{\partial \phi}\right| |\phi - \phi_0 - \omega(H_0, p_0)(t - t_0)|.$$

To estimate $|\frac{\partial f}{\partial \phi}|$ we use that $f(H, \phi, p) = T(H, p)(x(H, \phi, p) - \langle x \rangle_\phi(H, p))$ and $\frac{\partial f(H, \phi, p)}{\partial \phi} = T\frac{\partial x}{\partial \phi} = T^2 \dot{x}$, see [4], hence $|\frac{\partial f(H, \phi, p)}{\partial \phi}| \le T^2 |\dot{x}| \le C$, since using the energy equation one immediately obtains the inequality $|\dot{x}|_{\max} \le C x_+^2$. It follows from the above observations that the statement of the proposition is true if under the same conditions for the same amount of time we have the estimates
1. $\Delta T = T(H_0, p_0) - T(H, p) = O(\frac{1}{x_+^2})$,
2. $\Delta x = x(H, \phi, p) - x(H_0, \phi, p_0) = O(1)$,
3. $\Delta\langle x \rangle = \langle x \rangle(H, p) - \langle x \rangle(H_0, p_0) = O(1)$,
4. $\Delta\phi = \phi - \phi_0 - \omega(H_0, p_0)(t - t_0) = O(\frac{1}{x_+})$.

A.1. Estimation of $\Delta T$. From the definition of the period we have

$$T(H, p) - T(H_0, p_0) = \oint \frac{d\xi}{v(\xi)} - \oint \frac{d\xi}{v_0(\xi)} = \sqrt{2}\left(\int \frac{d\xi}{\sqrt{H - U(\xi) + p\xi}} - \int \frac{d\xi}{\sqrt{H_0 - U(\xi) + p_0\xi}}\right). \tag{26}$$

Since $p(t)$ does not change faster than $\sqrt{t}$ we have the following estimates if $\Delta t \le \frac{C}{x_+}$:
[Figure 10: the potential curve $U - p_0 x$, the curve $\dot{x}^2/2 + U(x) - p_0 x$, and the horizontal levels $H_0 - C_1\sqrt{x_+}$, $H_0$, $H_0 + C_1\sqrt{x_+}$, plotted against $x$ with the points $0$ and $x_+$ marked.]

Fig. 10. Estimation of the difference of periods $\Delta T$. The trajectory on this diagram is represented by $x$ and $y = \dot{x}^2/2 + U(x) - p_0 x$. The vertical distance between the point on the trajectory and the potential $U(x) - p_0 x$ is equal to the kinetic energy. The estimates hold because the real trajectory is confined to the strip with the height of order $\sqrt{x_+}$
$$|p - p_0| \le \frac{C}{\sqrt{x_+}}, \qquad |H - H_0| \le C\sqrt{x_+}.$$
By Proposition 4.1 we have $\xi^4 \le U(\xi) \le \xi^4 + 25\gamma(\xi)\sqrt{|\xi|}$. Consider the interval where the expressions under the square roots remain positive if $U(\xi)$ is replaced by $\xi^4 + 25\gamma(\xi)\sqrt{|\xi|}$. On this interval $(-a_-, a_+)$ we have
$$\sqrt{2}\left|\int_{-a_-}^{a_+} \frac{d\xi}{\sqrt{H - U(\xi) + p\xi}} - \int_{-a_-}^{a_+} \frac{d\xi}{\sqrt{H_0 - U(\xi) + p_0\xi}}\right|$$
$$\le \sqrt{2}\left|\int_{-a_-}^{a_+} \frac{d\xi}{\sqrt{H - \xi^4 - 25\gamma\sqrt{|\xi|} + p\xi}} - \int_{-a_-}^{a_+} \frac{d\xi}{\sqrt{H_0 - \xi^4 - 25\gamma\sqrt{|\xi|} + p_0\xi}}\right|$$
$$\le \sqrt{2}\left|\int_{-a_-}^{a_+} \frac{d\xi}{\sqrt{H_0 - \xi^4 - 25\gamma\sqrt{|\xi|} + p_0\xi + C_1\sqrt{x_+}}} - \int_{-a_-}^{a_+} \frac{d\xi}{\sqrt{H_0 - \xi^4 - 25\gamma\sqrt{|\xi|} + p_0\xi - C_1\sqrt{x_+}}}\right|,$$
because $|H + p\xi - H_0 - p_0\xi| \le |H - H_0| + |p - p_0||\xi| \le C_1\sqrt{x_+}$. We choose $-a_-$ and $a_+$ as the solutions of the equation
$$H_0 - \xi^4 - 25\sqrt{|\xi|} + p_0\xi - C_1\sqrt{x_+} = 0.$$
First, we estimate the time of motion outside the middle interval $(-a_-, a_+)$. The right and left turning points $\pm x_\pm(H, p)$ and $\pm x_\pm(H_0, p_0)$, which are the limits of integration in (26), are solutions of the equations
$$H - U(\xi) + p\xi = 0, \qquad H_0 - U(\xi) + p_0\xi = 0.$$
We prove that
$$|x_\pm(H, p) - a_\pm| \le \frac{C}{x_+^{2.5}}, \qquad |x_\pm(H_0, p_0) - a_\pm| \le \frac{C}{x_+^{2.5}}. \tag{27}$$
Indeed, after subtracting the equations for $a_+$ and $x_+(H, p)$,
$$H_0 - a_+^4 - 25\sqrt{|a_+|} + p_0 a_+ - C_1\sqrt{x_+} = 0, \qquad H - U(x_+(H, p)) + p\,x_+(H, p) = 0,$$
and using the estimates we obtain
$$|x_+^4(H, p) - a_+^4 - p_0(x_+(H, p) - a_+)| \le C\sqrt{x_+},$$
therefore we obtain
$$|x_+(H, p) - a_+| \le \frac{C}{x_+^{2.5}}.$$
The other inequalities in (27) are proved similarly. Since $U'(x) \ge c' x^{1.5}$ the time the solutions spend outside of the middle interval is of order
$$\sqrt{\frac{x_+^{-2.5}}{x_+^{1.5}}} = \frac{1}{x_+^2}.$$
The difference of integrals in (26) taken over $[-a_-, a_+]$ is majorized by (27). Note that the distance between $a_+$ ($-a_-$) and the right limit (the left limit) point, where the denominators vanish, is of order $\frac{1}{x_+^{2.5}}$. These estimates are obtained by the same calculations as those in (27). By the same argument as before we can calculate that replacing the limits of integration $a_+$ and $a_-$ by the limits at which the denominators vanish, we make an error of order $\frac{1}{x_+^2}$. Therefore the problem reduces to estimating the difference of periods:
$$|T(H_0 + C_1\sqrt{x_+}, p_0) - T(H_0 - C_1\sqrt{x_+}, p_0)|$$
with the potential $U_{aux}(\xi) = \xi^4 + 25\gamma(\xi)\sqrt{|\xi|} - p_0\xi$.

Proposition A.2. For $U_{aux}$ we have $T'(H) \sim \frac{T}{H}$.

Thus
$$T(H_0 + C_1\sqrt{x_+}, p_0) - T(H_0 - C_1\sqrt{x_+}, p_0) \sim \frac{1}{x_+^{4.5}}.$$
Finally we obtain
$$\Delta T \le \frac{C}{x_+^2}.$$
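The $x_+^{-4.5}$ scaling of the period gap can be illustrated on a simplified model. The sketch below assumes the pure quartic potential $U(\xi) = \xi^4$ with $p_0 = 0$ (not the auxiliary potential $U_{aux}$ itself), for which $T(H) = c\,H^{-1/4}$; the function names and the value of $C_1$ are hypothetical.

```python
import math

def quartic_period_constant(n: int = 2000) -> float:
    """c in T(H) = c * H**(-1/4) for x'' + 4 x^3 = 0.

    Uses T = sqrt(2) * int dxi / sqrt(H - xi^4) with xi = H^(1/4) sin(theta),
    which removes the endpoint singularities; midpoint rule in theta.
    """
    h = math.pi / n
    total = 0.0
    for k in range(n):
        theta = -0.5 * math.pi + (k + 0.5) * h
        total += h / math.sqrt(1.0 + math.sin(theta) ** 2)
    return math.sqrt(2.0) * total

def period(H: float, c: float) -> float:
    return c * H ** -0.25

def scaled_period_gap(x_plus: float, C1: float, c: float) -> float:
    """(T(H0 - C1 sqrt(x+)) - T(H0 + C1 sqrt(x+))) * x+^4.5 with H0 = x+^4."""
    H0 = x_plus ** 4
    d = C1 * math.sqrt(x_plus)
    return (period(H0 - d, c) - period(H0 + d, c)) * x_plus ** 4.5
```

In this model the scaled gap tends to $c\,C_1/2$ as $x_+$ grows, which matches $T'(H) \sim T/H$ applied over an energy window of width $2C_1\sqrt{x_+}$.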
Corollary A.1. Consider two trajectories starting on the same ray of the set $W = \{(x, \dot{x}) \mid x = 0 \vee \dot{x} = 0\}$. Let one of the trajectories be a solution of the real system and the other one be a solution of the autonomous system (i.e. $p$ is frozen). If the initial conditions are such that $\Delta H = O(\sqrt{x_+})$ and $\Delta p = O\left(\frac{1}{\sqrt{x_+}}\right)$, then it takes the same time for both solutions to reach the next ray of $W$ up to an error of order $\frac{1}{x_+^2}$ (provided $H_1$ and $H_2$ are sufficiently large).
Proof. The proof goes along the lines of the above one for $\Delta T$. The estimate holds for the same reason: both trajectories on the diagram are confined to the strip of the height of order $\sqrt{x_+}$, see Fig. 10.

A.2. Estimation of $\Delta x$. To estimate $|x(H, \phi, p) - x(H_0, \phi, p_0)|$ we use the definition of $\phi$:
$$\phi = \frac{1}{\sqrt{2}}\,\omega(H, p) \int_0^{x(H,\phi,p)} \frac{d\xi}{\sqrt{H - U(\xi) + p\xi}},$$
$$\phi = \frac{1}{\sqrt{2}}\,\omega(H_0, p_0) \int_0^{x(H_0,\phi,p_0)} \frac{d\xi}{\sqrt{H_0 - U(\xi) + p_0\xi}}.$$
Subtracting one equation from the other we obtain
$$(\omega(H, p) - \omega(H_0, p_0)) \int_0^{x(H,\phi,p)} \frac{d\xi}{\sqrt{H - U(\xi) + p\xi}} + \omega(H_0, p_0)\left(\int_0^{x(H,\phi,p)} \frac{d\xi}{\sqrt{H - U(\xi) + p\xi}} - \int_0^{x(H_0,\phi,p_0)} \frac{d\xi}{\sqrt{H_0 - U(\xi) + p_0\xi}}\right) = 0.$$
The first term in the last equation is of order $\frac{1}{x_+}$. Indeed, $\omega(H, p) - \omega(H_0, p_0) = O(1)$ by the result in Subsect. A.1 and the integral representing the time of less than three revolutions is by Theorem 4.1 of order $\frac{1}{x_+}$. Dividing the equation by $\omega(H_0, p_0)$ we obtain
$$\int_0^{x(H,\phi,p)} \frac{d\xi}{\sqrt{H - U(\xi) + p\xi}} - \int_0^{x(H_0,\phi,p_0)} \frac{d\xi}{\sqrt{H_0 - U(\xi) + p_0\xi}} = O\left(\frac{1}{x_+^2}\right).$$
If $x(H, \phi, p)$ and $x(H_0, \phi, p_0)$ are of the same sign then one of them is larger in magnitude than the other; if, however, their signs are different then they are much smaller than $x_+$ (otherwise their symplectic angles would be different, see Corollary 4.4). In both cases we can subtract and add a proper mixed term to obtain
$$\int_0^{x(H,\phi,p)} \frac{d\xi}{\sqrt{H - U(\xi) + p\xi}} - \int_0^{x(H,\phi,p)} \frac{d\xi}{\sqrt{H_0 - U(\xi) + p_0\xi}} + \int_0^{x(H,\phi,p)} \frac{d\xi}{\sqrt{H_0 - U(\xi) + p_0\xi}} - \int_0^{x(H_0,\phi,p_0)} \frac{d\xi}{\sqrt{H_0 - U(\xi) + p_0\xi}} = O\left(\frac{1}{x_+^2}\right).$$
The integrals in the first difference represent the times it takes the solutions to travel between $x = 0$ and $x$. We estimate this difference by first applying Corollary A.1. Then the problem reduces to estimating the difference in case the parts of trajectories on which we estimate the travel times lie in the first quadrant.

Lemma A.1. Suppose that $H - H_0 = O(\sqrt{x_+})$ and $p - p_0 = O\left(\frac{1}{\sqrt{x_+}}\right)$. If the integrals are taken over those parts of trajectories which lie in the first quadrant then
$$\int_0^{x(H,\phi,p)} \frac{d\xi}{\sqrt{H - U(\xi) + p\xi}} - \int_0^{x(H,\phi,p)} \frac{d\xi}{\sqrt{H_0 - U(\xi) + p_0\xi}} = O\left(\frac{1}{x_+^2}\right).$$
Proof. Due to the assumptions on $H - H_0$ and $p - p_0$ there exists a constant $C_1$ such that
$$H_0 + p_0\xi - H - p\xi \le C_1\sqrt{x_+}. \tag{28}$$
We estimate the difference on two intervals: $[0, a]$ and $[a, x]$, where $a$ is the least of $x$ and the positive solution of
$$H_0 - \xi^4 - 25\gamma\sqrt{|\xi|} + p_0\xi - C_1\sqrt{x_+} = 1.$$
We have
$$x_+(H_0, p_0) - a = O\left(\frac{1}{x_+^2}\right), \qquad x_+(H, p) - a = O\left(\frac{1}{x_+^2}\right).$$
These estimates correspond to those in (27) and are obtained similarly. By the same argument as in Subsect. A.1 we obtain that both solutions stay in $[a, x]$ for the time of order $\frac{1}{x_+^2}$. Thus, to finish the proof it remains to show that
$$\int_0^{a} \frac{d\xi}{\sqrt{H - U(\xi) + p\xi}} - \int_0^{a} \frac{d\xi}{\sqrt{H_0 - U(\xi) + p_0\xi}} = O\left(\frac{1}{x_+^2}\right).$$
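By Eq. (28) this difference does not exceed
$$\left|\int_0^a \frac{d\xi}{\sqrt{H_0 - U(\xi) + p_0\xi + C_1\sqrt{x_+}}} - \int_0^a \frac{d\xi}{\sqrt{H_0 - U(\xi) + p_0\xi - C_1\sqrt{x_+}}}\right|$$
$$\le \left|\int_0^a \frac{d\xi}{\sqrt{H_0 + C_1\sqrt{x_+} - U_{aux}(\xi)}} - \int_0^a \frac{d\xi}{\sqrt{H_0 - C_1\sqrt{x_+} - U_{aux}(\xi)}}\right| \le \max\left(\left|\frac{d}{dH}\int_0^a \frac{d\xi}{\sqrt{H - U_{aux}(\xi)}}\right|\right) 2C_1\sqrt{x_+},$$
where $U_{aux}(\xi) = \xi^4 + 25\gamma(\xi)\sqrt{|\xi|} - p_0\xi$ and $H \in [H_0 - C_1\sqrt{x_+}, H_0 + C_1\sqrt{x_+}]$.

Proposition A.3. For $U_{aux}$ we have
$$\left|\frac{d}{dH}\int_0^a \frac{d\xi}{\sqrt{H - U_{aux}(\xi)}}\right| \le C\left(\frac{T}{H} + \left|\frac{U_{aux}(a)}{H\,U'_{aux}(a)\sqrt{H - U_{aux}(a)}}\right|\right).$$
By the last proposition and due to the choice of $a$ we can continue the inequality to obtain
$$C\left(\frac{1}{x_+^5} + \frac{1}{x_+^3}\right) 2C_1\sqrt{x_+} \le \frac{C}{x_+^{2.5}}.$$
This ends the proof of the lemma.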
To estimate the second difference we rewrite it
$$\int_{x(H_0,\phi,p_0)}^{x(H,\phi,p)} \frac{d\xi}{\sqrt{H_0 - U(\xi) + p_0\xi}} \ge |x(H, \phi, p) - x(H_0, \phi, p_0)| \frac{1}{\sqrt{H_0 + 1}}.$$
Therefore we obtain
$$x(H, \phi, p) - x(H_0, \phi, p_0) = \sqrt{H_0 + 1}\; O\left(\frac{1}{x_+^2}\right).$$
By Proposition 4.1 we have
$$|H_0 - x_+^4| = |U(x_+) - p_0 x_+ - x_+^4| \le |25\sqrt{x_+} + x_+| \le 2x_+.$$
Therefore
$$\sqrt{H_0 + 1} \le \sqrt{x_+^4 + 2x_+ + 1} \le 2x_+^2$$
and we obtain the desired estimate $x(H, \phi, p) - x(H_0, \phi, p_0) = O(1)$.
A.3. Estimation of $\Delta\langle x\rangle$.
$$|\langle x\rangle(H, p) - \langle x\rangle(H_0, p_0)| = \left|\oint (x(H, \phi, p) - x(H_0, \phi, p_0))\, d\phi\right| \le |x(H, \phi, p) - x(H_0, \phi, p_0)|_{\max} = O(1).$$
A.4. Estimation of $\Delta\phi$. Let us denote the values of the variables in the nonautonomous system at $t$ by the same letters with subscripts $t$, e.g. $H_t = H(t)$, $p_t = p(t)$, $\phi_t = \phi(t)$ etc. Using the definition of $\phi$ we write
$$\phi - \phi_0 - \omega(H_0, p_0)(t - t_0) = \frac{1}{\sqrt{2}}\,\omega_t \int_0^{x_t} \frac{d\xi}{\sqrt{H_t - U(\xi) + p_t\xi}} - \frac{1}{\sqrt{2}}\,\omega(H_0, p_0) \int_0^{x_t} \frac{d\eta}{\sqrt{H(\eta) - U(\eta) + p(\eta)\eta}}.$$

Remark A.1. The first integral is taken over the closed trajectory of the corresponding autonomous system with $H = H_t$ and $p = p_t$, the second integral is taken over the autonomous trajectory with $(H_0, p_0)$ on $x \in [0, x_0]$ and then over the actual trajectory $\eta(s)$, where $(\eta(t_0), \dot{\eta}(t_0)) = (x_0, \dot{x}_0)$.

Subtracting and adding the mixed term we obtain
$$\phi - \phi_0 - \omega(H_0, p_0)(t - t_0) = \frac{1}{\sqrt{2}}(\omega_t - \omega(H_0, p_0)) \int_0^{x_t} \frac{d\xi}{\sqrt{H_t - U(\xi) + p_t\xi}} + \frac{1}{\sqrt{2}}\,\omega(H_0, p_0)\left(\int_0^{x_t} \frac{d\xi}{\sqrt{H_t - U(\xi) + p_t\xi}} - \int_0^{x_t} \frac{d\xi}{\sqrt{H(\xi) - U(\xi) + p(\xi)\xi}}\right).$$
Only the difference of integrals has not been estimated before; it is estimated as a similar expression in Subsect. A.2 and is of order $\frac{1}{x_+^2}$. Therefore we get the result $\Delta\phi = O\left(\frac{1}{x_+}\right)$.
B. Modification

B.1. Proof of Lemma 3.1. According to the induction procedure we consider the behavior of the system for $t \in [4n - 2, 4n]$. At $t = 4n - 2$ the left turning point is $-A_{2n-1}$. At $t = 4n - 1$ the right turning point is $A^0_{2n}$, it satisfies assumptions $A_{2n}$ and $B_{2n}$.

B.2. Proof of Lemma 6.1.1. Plan of the proof.

1. Using the energy relation the time when $x_+ \ge A_{2n-2}$ can be estimated, if the potential is not modified on $x \ge A_{2n-2}$. Since $x_+$ increases monotonically on $t \in [4n - 2, 4n - 1]$ the time when $x_+ \ge A_{2n-2}$ is independent of a modification on $(A_{2n-2}, u_{2n+2})$.
2. Using 1., the variation in $I(4n - 1)$ due to the modification on $(A_{2n-2}, u_{2n+2})$ is estimated.
3. Knowing $\Delta I(4n - 1)$ we estimate $\Delta H(4n - 1)$ for different modifications on $(A_{2n-2}, u_{2n+2})$.

Proof. 1. Consider the energy relation
$$H = \frac{y^2}{2} + U(x) - px = U(x_+) - px_+.$$
After differentiation and rearrangement of the terms we obtain
$$\dot{U}(x_+) = \dot{p}(x_+ - x) + p\dot{x}_+.$$
We estimate $\dot{U}(x_+)$ from below using that if $x \le 0$ and $p \ge 0$ then
$$\dot{U}(x_+) \ge \dot{p}x_+.$$
This inequality is true, in particular, for $t \in [4n - 2 + \frac{1}{4}, 4n - 1]$ and on this interval $1 \le \dot{p} \le 2$. By Corollary 4.4 the interval of time between two consecutive intersections of a solution with the surface $\{(x, y, t) \mid x = 0\}$ is asymptotically half of the time of one revolution, i.e. $\Delta t = \frac{T_0(1)}{2A_{2n-2}} + o\left(\frac{1}{A_{2n-2}}\right)$.

Thus, $\dot{U}(x_+) \ge x_+$ during $\Delta t = \frac{T_0(1)}{2A_{2n-2}} + O\left(\frac{1}{A_{2n-2}}\right)$ every revolution. Using these estimates we obtain that on $t \in [4n - 2 + \frac{1}{4}, 4n - 1]$ $U(x_+)$ increases at least by
$$\dot{U}(x_+)_{\min}\,\Delta t \ge cA_{2n-2}\frac{T_0(1)}{2A_{2n-2}} = c\,\frac{T_0(1)}{2}$$
after each revolution.

Since by the inductive assumptions $A^0_{2n} - A_{2n-2} \le \frac{C}{A_{2n-2}^{2.5}}$ then $U(A^0_{2n}) - U(A_{2n-2}) = U_0(A^0_{2n}) - U_0(A_{2n-2}) \le C\sqrt{A_{2n-2}}$. Thus, the solution passes the interval $[A_{2n-2}, A^0_{2n}]$ in less than $C\sqrt{A_{2n-2}}$ revolutions. Since the time of one revolution is of order $\frac{T_0(1)}{A_{2n-2}}$ the interval is crossed in less than $\frac{C}{\sqrt{A_{2n-2}}}$ seconds. Note that during this time $p \ge 0$.

2. We use the averaging procedure similar to the one used in the proof of Theorem 4.1. On the considered time interval $\dot{p}$ is bounded (by 2), therefore we can rewrite the expression for $\dot{I}$ as follows:
$$\dot{I} = -\dot{p}f(H, \phi, p) = -\frac{2}{p + 1}f(H, \phi, p) = F(H, \phi, p).$$
The solution intersects the plane $\{(x, y, t) \mid x = 0, \dot{x} > 0\}$ once each revolution. The moments of intersections are denoted by $\{t_n\}_{n=0}^{n=N}$, where $t_0$ corresponds to the last intersection before $x_+ = A_{2n-2}$ and $t_N$ corresponds to the last intersection before $t = 4n - 1$. Integrating the equality for $\dot{I}$ we obtain
$$I(4n - 1) - I(t_0) = \int_{t_0}^{4n-1} F(H, \phi, p)\, dt = \sum_{k=0}^{N-1} \int_{t_k}^{t_{k+1}} F(H, \phi, p)\, dt + \int_{t_N}^{4n-1} F(H, \phi, p)\, dt. \tag{29}$$
Proposition B.1. The sum in (29) is less than $\frac{C}{u_{2n}^{1.25}}$.

To estimate the integral in (29) we observe
$$|F(H, \phi, p)| \le 2f \le 3A_{2n-2}T \le 4A_{2n-2}\frac{1.1\,T_0(1)}{A_{2n-2}} \le 5T_0(1),$$
and $4n - 1 - t_N$ is less than the time of one revolution, i.e. $|(4n - 1) - t_N| \le \frac{1.1\,T_0(1)}{A_{2n-2}}$. Therefore we have the result
$$I(4n - 1) - I(t_0) \le \frac{5.5\,T_0^2(1)}{A_{2n-2}} + O\left(\frac{1}{A_{2n-2}^{1.5}}\right) \le \frac{6T_0^2(1)}{u_{2n}}. \tag{30}$$

3. We use the fact that only during the last $\frac{C}{\sqrt{A_{2n-2}}}$ seconds the motions in modified and unmodified potentials (on the interval $[A_{2n-2}, u_{2n+2}]$) are different. The energy changes by $C\sqrt{A_{2n-2}}$ during this time in both cases, see Corollary 3.1. This implies that $|H^{\sigma,h} - H^0| \le C\sqrt{A_{2n-2}}$, where $H^0$ denotes the energy of the solution $x(t)$ at $t = 4n - 1$ in the potential which is not modified on the interval $(A_{2n-2}, u_{2n+2})$, and $H^{\sigma,h}$ the energy in the potential modified on the above interval. However, we need a sharper estimate $|H^{\sigma,h} - H^0| \le C$. To establish the last inequality we consider the difference
$$|H^{\sigma,h}(I^{\sigma,h}, 1) - H^0(I^0, 1)| \le |H^{\sigma,h}(I^{\sigma,h}, 1) - H^{\sigma,h}(I^0, 1)| + |H^{\sigma,h}(I^0, 1) - H^0(I^0, 1)|.$$
Let us introduce the notation for some energy values:

– $H^{\sigma,h} = H^{\sigma,h}(I^{\sigma,h}, 1)$
– $H^0 = H^0(I^0, 1)$
– $H_0^{\sigma,h} = H^{\sigma,h}(I^0, 1)$.

The first difference in the right part of the above inequality is estimated by applying Theorem 4.1,
$$|H^{\sigma,h}(I^{\sigma,h}, 1) - H^{\sigma,h}(I^0, 1)| \le \frac{\partial H^{\sigma,h}}{\partial I}(I, 1)\,|I^{\sigma,h} - I^0| \le 1.1\,T_0^{-1}(1)A_{2n-2}\frac{6T_0^2(1)}{A_{2n-2}} \le 6.6\,T_0(1).$$
The second difference turns out to be of higher order. We prove this by considering the difference of actions which is zero by the obvious identity $I^{\sigma,h}(H^{\sigma,h}(I^0, 1), 1) = I^0$,
$$0 = I^{\sigma,h}(H_0^{\sigma,h}, 1) - I^0(H^0, 1) = \int_{-x_-^{\sigma,h}}^{x_+^{\sigma,h}} \sqrt{H_0^{\sigma,h} - U^{\sigma,h}(\xi) + \xi}\; d\xi - \int_{-x_-^0}^{x_+^0} \sqrt{H^0 - U^0(\xi) + \xi}\; d\xi$$
$$= \int_{-(A_{2n-2})_-}^{A_{2n-2}} \left(\sqrt{H_0^{\sigma,h} - U^0(\xi) + \xi} - \sqrt{H^0 - U^0(\xi) + \xi}\right) d\xi + R,$$
where $R$ is a sum of four integrals$^4$ over the intervals on the left and right of $-(A_{2n-2})_-$ and $A_{2n-2}$. The lengths of these intervals are of order $\frac{1}{A_{2n-2}^{2.5}}$ and the expressions under the square roots on these intervals are of order $\sqrt{A_{2n-2}}$ by the previous observations, therefore
$$|R| \le \frac{C}{A_{2n-2}^{2.5}}\sqrt[4]{A_{2n-2}} \le \frac{C}{A_{2n-2}^{2}}.$$
Thus, we obtain the inequality
$$\frac{C}{A_{2n-2}^2} \ge \int_{-(A_{2n-2})_-}^{A_{2n-2}} \left|\sqrt{H_0^{\sigma,h} - U^0(\xi) + \xi} - \sqrt{H^0 - U^0(\xi) + \xi}\right| d\xi = |H_0^{\sigma,h} - H^0| \int_{-(A_{2n-2})_-}^{A_{2n-2}} \frac{d\xi}{\sqrt{H_0^{\sigma,h} - U^0(\xi) + \xi} + \sqrt{H^0 - U^0(\xi) + \xi}}.$$
Suppose for definiteness that $H^0 \ge H_0^{\sigma,h}$ (the opposite case is proved similarly), then
$$\frac{C}{A_{2n-2}^2} \ge |H_0^{\sigma,h} - H^0| \int_{-(A_{2n-2})_-}^{A_{2n-2}} \frac{d\xi}{2\sqrt{H^0 - U^0(\xi) + \xi}} = |H_0^{\sigma,h} - H^0|\left(\int_{-x_-^0}^{x_+^0} \frac{d\xi}{2\sqrt{H^0 - U^0(\xi) + \xi}} + R\right),$$
where $R$ is the sum of two boundary integrals, which are estimated as before and are of order $\frac{1}{A_{2n-2}^2}$. The last integral in the last inequality is equal to the period of one revolution in the autonomous system and by Corollary 4.2 we have
$$|H_0^{\sigma,h} - H^0| \le \frac{C}{A_{2n-2}^2}\,O(A_{2n-2}) \le \frac{1}{A_{2n-2}}.$$
Adding up the estimates for both differences we obtain the inequality $|H^{\sigma,h} - H^0| \le 7T_0(1)$. Finally, we have
$$|U^{\sigma,h}(A^{\sigma,h}_{2n}) - U^0(A^0_{2n})| = |H^{\sigma,h} + A^{\sigma,h}_{2n} - H^0 - A^0_{2n}| \le |H^{\sigma,h} - H^0| + |A^{\sigma,h}_{2n} - A^0_{2n}| \le 10T_0(1),$$
since $|A^0_{2n} - A^{\sigma,h}_{2n}| \le \frac{C}{A_{2n-2}^2}$.
$^4$ Substitution of $U^0$ for $U^{\sigma,h}$ in the first integral is possible since by construction $U^{\sigma,h}(x) = U^0(x)$ for $x \le A_{2n-2}$.
B.3. Proof of Lemma 6.1.2. Proof. The one parameter family $U^\nu$ is continuous by construction. Now we estimate how far $A^\sigma_{2n}$ can deviate from $A^0_{2n}$. Since the potential is not modified at $A^\nu_{2n}$ we have $U^0 = (A^0_{2n})^4$, $U^\nu = (A^\nu_{2n})^4$. Therefore
$$|A^0_{2n} - A^\nu_{2n}|\, s_3(A^0_{2n}, A^\nu_{2n}) = |U^0 - U^\nu| \;\Rightarrow\; |A^0_{2n} - A^\nu_{2n}| \le \frac{10T_0(1)}{4u_{2n-2}^3} \le \frac{10T_0(1)}{u_{2n}^3}.$$
Now, we specify $b$: let $b = 10T_0(1)$. Then
$$|u_{2n} - A^\nu_{2n}| \le |u_{2n} - A^0_{2n}| + |A^0_{2n} - A^\nu_{2n}| \le \frac{2b}{u_{2n}^3} + \frac{b}{u_{2n}^3} \le \frac{3b}{u_{2n}^3},$$
therefore, $A^\nu_{2n}$ satisfies condition 1 of Theorem 3.1.
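The step from $|U^0 - U^\nu|$ to $|A^0_{2n} - A^\nu_{2n}|$ rests on factoring a difference of fourth powers. The sketch below assumes that $s_3(a, b) = a^3 + a^2 b + a b^2 + b^3$ (the definition of $s_3$ is not restated in this appendix), and the helper names are hypothetical.

```python
from fractions import Fraction

def s3(a, b):
    """Assumed definition: a^4 - b^4 = (a - b) * s3(a, b)."""
    return a**3 + a**2 * b + a * b**2 + b**3

def turning_point_shift(potential_gap, a, b):
    """|a - b| recovered from |a^4 - b^4| = potential_gap."""
    return potential_gap / s3(a, b)
```

If both turning points are at least $u$, then $s_3 \ge 4u^3$, which gives the bound of the form $\text{gap}/(4u^3)$ used above.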
B.4. Proof of Lemma 6.1.3. We consider the motion in the two potentials $U^{\nu_1}$ and $U^{\nu_2}$, defined in Subsect. 6.1. The corresponding solutions are denoted by $x^{\nu_1}(t)$ and $x^{\nu_2}(t)$. The proof consists of two parts:

– first, we show that the number of rotations of the solutions $x^{\nu_1}(t)$ and $x^{\nu_2}(t)$ on $t \in [4n - 2, 4n - 1]$ is different by not more than one
– second, we prove that the number of full rotations for the two solutions on $t \in [4n - 1, 4n]$ is different by more than 3.

By Theorem 4.1 the interval of time between two consecutive intersections of the solution $x^\nu(t)$ with the section $X^+$, see Sect. 4 for definition, is $\Delta t = \frac{T_0(1)}{A_{2n-2}} + O\left(\frac{1}{A_{2n-2}^{1.75}}\right)$. Therefore, the corresponding number of revolutions per second is $\frac{A_{2n-2}}{T_0(1)} + O(A_{2n-2}^{0.25})$. But since the interval of time during which the system “feels” modification on $(A_{2n-2}, u_{2n+2})$, i.e. while $x_+ \ge A_{2n-2}$, is of order $\frac{1}{\sqrt{A_{2n-2}}}$ (this is precisely the result of the first part of Lemma 6.1.1's proof) the difference in the number of rotations for the two modifications is given by
$$\Delta N = O(A_{2n-2}^{0.25})\frac{1}{\sqrt{A_{2n-2}}} = O\left(\frac{1}{\sqrt[4]{A_{2n-2}}}\right),$$
which proves the first part of the lemma. To prove the second part, it suffices to show that the difference of frequencies for the solutions $x^{\nu_1}(t)$ and $x^{\nu_2}(t)$ on $t \in [4n - 1, 4n]$ is larger than 3, since the number of revolutions per second coincides with the frequency. Expressing this condition in terms of periods we obtain
$$|\omega_1 - \omega_2| \ge 3 \;\Leftrightarrow\; \left|\frac{1}{T_1} - \frac{1}{T_2}\right| \ge 3 \;\Leftrightarrow\; |T_2 - T_1| \ge 3T_1T_2.$$
By Corollary 4.2 it suffices to show that
$$|T_2 - T_1| \ge \frac{4T_0^2(1)}{u_{2n-2}^2}.$$
We write the difference of periods as follows
$$T_2 - T_1 = \oint \frac{d\xi}{v_2(\xi)} - \oint \frac{d\xi}{v_1(\xi)} = \sqrt{2}\left(\int \frac{d\xi}{\sqrt{H_2 - U_2(\xi) + \xi}} - \int \frac{d\xi}{\sqrt{H_1 - U_1(\xi) + \xi}}\right),$$
where the limits of integration are the points where the corresponding denominators vanish. The difference of the times the solutions spend on the left and on the right of $A_{2n-2}$ is estimated separately.
[Figure 11: the energy levels $H^{\nu_1}$ and $H^{\nu_2}$ over the $x$-axis, with the points $-a_-$, $0$, $a_+$, $A_{2n-2}$, $A^{\nu_1}_{2n}$ and $A^{\nu_2}_{2n}$ marked.]

Fig. 11. The difference of periods for $U^{\nu_1}$ and $U^{\nu_2}$. The largest contribution is due to $\Delta t$ on $x \ge A_{2n-2}$. The left interval $x \le -a_-$ gives smaller contribution because the potential is steep there by condition $B_{2n}$. The middle interval gives also smaller input because of the large velocity
First consider the difference of integrals on $\xi \ge A_{2n-2}$, where they differ the most. Using linearity of the potential on each half of the intervals under consideration $[A_{2n-2}, A^{\nu_i}_{2n}]$ we can estimate the corresponding integrals. We use the formula for the time it takes a freely falling body to travel from the rest point to a given one
$$\Delta t = \sqrt{\frac{2\,\text{distance}}{\text{acceleration}}}.$$
We use the above formula and two simple facts:

– it takes less time for a particle to travel a smaller distance (for the first integral)
– it takes longer to travel the same distance with a smaller acceleration (for the second integral)

to estimate
$$\Delta t_1 = 2\int_{A_{2n-2}}^{A^{\nu_1}_{2n}} \frac{d\xi}{v_1(\xi)} \ge 2\sqrt{2\,\frac{(A^{\nu_1}_{2n} - A_{2n-2})/2}{\sigma_1}}$$
and
$$\Delta t_2 = 2\int_{A_{2n-2}}^{A^{\nu_2}_{2n}} \frac{d\xi}{v_2(\xi)} \le 2\sqrt{2\,\frac{A^{\nu_2}_{2n} - A_{2n-2}}{\sigma_2}}.$$
Since $\sigma_1 = c_1 u_{2n-2}^{1.5}$ and $\sigma_2 = u_{2n-2}^{1.5}$ and we can choose $c_1$ to be arbitrarily small we have
$$\Delta t_1 - \Delta t_2 \ge 2\sqrt{\frac{A^{\nu_1}_{2n} - A_{2n-2}}{\sigma_1}} - 2\sqrt{2\,\frac{A^{\nu_2}_{2n} - A_{2n-2}}{\sigma_2}}$$
$$= 2\sqrt{\frac{u_{2n} - u_{2n-2} + O(u_{2n-2}^{-3})}{c_1 u_{2n-2}^{1.5}}} - 2\sqrt{2\,\frac{u_{2n} - u_{2n-2} + O(u_{2n-2}^{-3})}{u_{2n-2}^{1.5}}}$$
$$= 2\sqrt{\frac{\frac{2a}{u_{2n-2}^{2.5}} + O(u_{2n-2}^{-3})}{c_1 u_{2n-2}^{1.5}}} - 2\sqrt{2\,\frac{\frac{2a}{u_{2n-2}^{2.5}} + O(u_{2n-2}^{-3})}{u_{2n-2}^{1.5}}} = \frac{4\sqrt{a}}{u_{2n-2}^2}\left(\sqrt{\frac{1}{2c_1}} - 1\right) + O(u_{2n-2}^{-2.5}).$$
By choosing sufficiently small $c_1$ ($a$ is already fixed by Remark 6.1) we make the last expression larger than $\frac{4T_0^2(1)}{u_{2n-2}^2}$. Therefore we also fix $c_0 = 0.5c_1$; note that $c_0 > 10^{-6}$ as was originally assumed. Now, it remains to show that
$$\int_{\xi \le A_{2n-2}} \frac{d\xi}{\sqrt{H_2 - U(\xi) + \xi}} - \int_{\xi \le A_{2n-2}} \frac{d\xi}{\sqrt{H_1 - U(\xi) + \xi}}$$
is smaller in order than $\frac{1}{u_{2n-2}^2}$. We estimate this difference on three intervals, see Fig. 11. The middle interval $(-a_-, a_+)$ is defined by: $\{a_\pm : a_\pm^4 + 25\sqrt{|a_\pm|} \pm a_\pm = \min(H_1, H_2)$ and $a_+ \le A_{2n-2}\}$. The lengths of the left and right intervals are of order $\frac{1}{u_{2n-2}^{2.5}}$ as in the theorems before. The integrals over the right intervals give contribution of order $\frac{1}{u_{2n-2}^{2.5}}$ since on these intervals $v(\xi) \ge 1$. The integrals over the left intervals where $v(\xi) \ge 1$ are estimated similarly. The integrals over the part where $v(\xi) < 1$ are also of higher order because the potential has a steep slope there by condition $B_{2n}$. Now we estimate the difference of the integrals over the middle interval:
$$\left|\int_{a_-}^{a_+} \frac{d\xi}{v_1(\xi)} - \int_{a_-}^{a_+} \frac{d\xi}{v_2(\xi)}\right| \le \sqrt{2}\left|\int_{a_-}^{a_+} \frac{d\xi}{\sqrt{H_2 - \xi^4 - 25\gamma(\xi)\sqrt{|\xi|} + \xi}} - \int_{a_-}^{a_+} \frac{d\xi}{\sqrt{H_1 - \xi^4 - 25\gamma(\xi)\sqrt{|\xi|} + \xi}}\right|$$
$$= \sqrt{2}\left|\int_{x_-(H_1)}^{x_+(H_1)} \frac{d\xi}{\sqrt{H_2 - \xi^4 - 25\gamma(\xi)\sqrt{|\xi|} + \xi}} - \int_{x_-(H_2)}^{x_+(H_2)} \frac{d\xi}{\sqrt{H_1 - \xi^4 - 25\gamma(\xi)\sqrt{|\xi|} + \xi}}\right| + R,$$
where $R$ is the sum of four integrals which are estimated as before and are of order $A_{2n-2}^{-2.5}$. The last difference is equal to $T(H_1) - T(H_2)$ for the auxiliary potential $U_{aux}(x) = x^4 + 25\gamma(x)\sqrt{|x|} - x$.
Using Proposition A.2 we have the estimate
$$|T(H_1) - T(H_2)| \le |T'(H)|_{\max}\,|H_1 - H_2| \le C\frac{T}{H} \le C\frac{1}{A_{2n-2}^5}.$$
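The leading-order algebra in the estimate of $\Delta t_1 - \Delta t_2$ in this proof, built on the free-fall formula $\Delta t = \sqrt{2\,\text{distance}/\text{acceleration}}$, can be checked numerically. The sketch below drops the $O(u_{2n-2}^{-3})$ corrections and treats $a$, $c_1$, $u = u_{2n-2}$ and $T_0(1)$ as free parameters; all function names are hypothetical.

```python
import math

def fall_time(distance: float, acceleration: float) -> float:
    """Time for a body starting at rest to cover `distance` at constant acceleration."""
    return math.sqrt(2.0 * distance / acceleration)

def leading_gap(a: float, c1: float, u: float) -> float:
    """Leading part of the lower bound for dt1 - dt2."""
    d = 2.0 * a / u ** 2.5          # u_2n - u_(2n-2) to leading order
    sigma1 = c1 * u ** 1.5          # slope used for the first potential
    sigma2 = u ** 1.5               # slope used for the second potential
    return 2.0 * fall_time(d / 2.0, sigma1) - 2.0 * fall_time(d, sigma2)

def closed_form(a: float, c1: float, u: float) -> float:
    """(4 sqrt(a) / u^2) * (sqrt(1 / (2 c1)) - 1)."""
    return 4.0 * math.sqrt(a) / u ** 2 * (math.sqrt(1.0 / (2.0 * c1)) - 1.0)

def c1_threshold(a: float, T0_1: float) -> float:
    """Values c1 below this make closed_form exceed 4 T0(1)^2 / u^2."""
    return 1.0 / (2.0 * (1.0 + T0_1 ** 2 / math.sqrt(a)) ** 2)
```

The threshold follows from solving $4\sqrt{a}\,(\sqrt{1/(2c_1)} - 1) > 4T_0^2(1)$ for $c_1$; it depends on $a$ and $T_0(1)$ but not on $n$.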
C. Proof of Lemma 3.2

C.1. Proof of Lemma 6.2.1. We will prove the first inequality; the proof of the other inequality is similar. Using the triangle inequality we obtain
$$\left|I\left(-u_{2n+1} - \frac{2b}{u_{2n+1}^3}, 0, -1\right) - I(A^\nu_{2n}, 0, 1)\right| \le \left|I\left(-u_{2n+1} - \frac{2b}{u_{2n+1}^3}, 0, -1\right) - I(-A^\nu_{2n}, 0, -1)\right| + |I(-A^\nu_{2n}, 0, -1) - I(A^\nu_{2n}, 0, 1)|.$$
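1. Estimation of the first difference $\left|I\left(-u_{2n+1} - \frac{2b}{u_{2n+1}^3}, 0, -1\right) - I(-A^\nu_{2n}, 0, -1)\right|$. First, we observe that
$$\left|u_{2n+1} + \frac{2b}{u_{2n+1}^3} - A^\nu_{2n}\right| \le |u_{2n+1} - u_{2n}| + |u_{2n} - A^\nu_{2n}| + \frac{2b}{u_{2n+1}^3} \le \frac{a}{u_{2n}^{2.5}} + \frac{2b}{u_{2n}^3} + \frac{2b}{u_{2n+1}^3} \le \frac{2a}{u_{2n}^{2.5}}$$
for sufficiently large $n$. Using that the potential is not modified for $x < -A_{2n-1}$ we have
$$\left|H\left(-u_{2n+1} - \frac{2b}{u_{2n+1}^3}, 0, -1\right) - H(-A^\nu_{2n}, 0, -1)\right| = \left|\left(u_{2n+1} + \frac{2b}{u_{2n+1}^3}\right)^4 + u_{2n+1} + \frac{2b}{u_{2n+1}^3} - (A^\nu_{2n})^4 - A^\nu_{2n}\right|$$
$$\le \left|u_{2n+1} + \frac{2b}{u_{2n+1}^3} - A^\nu_{2n}\right| \left|s_3\left(u_{2n+1} + \frac{2b}{u_{2n+1}^3}, A^\nu_{2n}\right) + 1\right| \le \frac{2a}{u_{2n}^{2.5}}(4u_{2n+1}^3 + C) \le 16a\sqrt{u_{2n}}.$$
Using Corollary 4.2 we estimate the first difference
$$\left|I\left(-u_{2n+1} - \frac{2b}{u_{2n+1}^3}, 0, -1\right) - I(-A^\nu_{2n}, 0, -1)\right| \le \left|\frac{\partial I(H, -1)}{\partial H}\right|_{\max} |\Delta H| \le \left(\frac{T_0(1)}{u_{2n}} + O\left(\frac{1}{u_{2n}^{1.75}}\right)\right) 16a\sqrt{u_{2n}} \le \frac{20aT_0(1)}{\sqrt{u_{2n}}}.$$

2. Estimation of the second difference $|I(-A^\nu_{2n}, 0, -1) - I(A^\nu_{2n}, 0, 1)|$. Since the potential $U^\nu$ is not modified at $x = \pm A^\nu_{2n}$, then $H(-A^\nu_{2n}, 0, -1) = H(A^\nu_{2n}, 0, 1)$. Adding and subtracting a proper mixed term we obtain
$$|I(-A^\nu_{2n}, 0, -1) - I(A^\nu_{2n}, 0, 1)| \le |I(-A^\nu_{2n}, 0, -1) - I_0(-A^\nu_{2n}, 0, -1)| + |I(A^\nu_{2n}, 0, 1) - I_0(A^\nu_{2n}, 0, 1)|,$$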
where $I_0(x, \dot{x}, p)$ is the action at $(x, \dot{x}, p)$ in the quartic potential $U_0(x) = x^4$. We show how to estimate the second difference in the last inequality, the first one is estimated similarly. Using the definition of $I$ we have
$$|I(A^\nu_{2n}, 0, 1) - I_0(A^\nu_{2n}, 0, 1)| = 2\sqrt{2}\left|\int_{x_-(H)}^{x_+(H)} \sqrt{H - U(\xi) + \xi}\; d\xi - \int_{x_-^0(H)}^{x_+^0(H)} \sqrt{H - U_0(\xi) + \xi}\; d\xi\right| \le,$$
where the limits of integration are found from the energy relations. Observing that $x_+(H) = x_+^0(H) = A^\nu_{2n}$, $x_-(H) = x_-^0(H) + O(u_{2n}^{-2.5})$ and using the inequality $U'(\xi) \ge c\xi^{1.5}$ we can continue
$$\le 2\sqrt{2}\int_{x_-(H)}^{x_+(H)} \frac{25a\sqrt{|\xi|}\; d\xi}{\sqrt{H - U(\xi) + \xi} + \sqrt{H - U_0(\xi) + \xi}} + O(u_{2n}^{-2}) \le 2\sqrt{2}\cdot 25a\sqrt{x_+(H)}\int_{x_-(H)}^{x_+(H)} \frac{d\xi}{\sqrt{H - U(\xi) + \xi}} + O(u_{2n}^{-2})$$
$$\le 2\sqrt{2}\cdot 25a\sqrt{x_+}\left(\frac{T_0(1)}{x_+} + O\left(\frac{1}{x_+^{1.75}}\right)\right) + O(u_{2n}^{-2}) \le \frac{75aT_0(1)}{\sqrt{u_{2n}}},$$
since $x_+(H) = A^\nu_{2n}$ and $|A^\nu_{2n} - u_{2n}| \le \frac{C}{u_{2n}^3}$.

With the same calculations we obtain
$$|I(-A^\nu_{2n}, 0, -1) - I_0(-A_{2n}, 0, -1)| \le \frac{75aT_0(1)}{\sqrt{u_{2n}}}.$$
Finally, adding up all three terms we obtain the bound
$$\left|I\left(-u_{2n+1} - \frac{2b}{u_{2n+1}^3}, 0, -1\right) - I(A^\nu_{2n}, 0, 1)\right| \le \frac{200aT_0(1)}{\sqrt{u_{2n}}}.$$
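The constant 200 in the final bound leaves some slack, which a short bookkeeping sketch makes visible. The coefficients below multiply $aT_0(1)/\sqrt{u_{2n}}$ and are read off the three estimates above; the variable and function names are hypothetical.

```python
import math

# Coefficients of a*T0(1)/sqrt(u_2n) in the pieces of the estimate.
FIRST_DIFFERENCE = 20.0                      # step 1
RAW_SECOND = 2.0 * math.sqrt(2.0) * 25.0     # prefactor 2*sqrt(2)*25 before rounding up
ROUNDED_SECOND = 75.0                        # bound used for each comparison with I_0

def total_coefficient() -> float:
    """Sum of the first difference and the two comparisons with the quartic action."""
    return FIRST_DIFFERENCE + 2.0 * ROUNDED_SECOND
```

The raw prefactor is about 70.7, so rounding it to 75 absorbs the lower-order terms, and the total of 170 stays below 200.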
D. Proof of Propositions 5.3, A.2, A.3, and B.1

D.1. Proof of Proposition A.2. It was shown in [4] that if $V(x)$ satisfies the following conditions:

1. $V(x_\pm(H)) = H$,
2. $V(x) < H$ for $x_- < x < x_+$,
3. $V(0) = V'(0) = 0$ and $V'(x) \ne 0$ for $x \ne 0$,

then the derivative of period of one oscillation in the autonomous system $\ddot{x} + V'(x) = 0$ is given by
$$T'(H) = \frac{\sqrt{2}}{2H}\int_{x_-(H)}^{x_+(H)} \left(1 - 2\frac{VV''}{(V')^2}\right)\frac{dx}{\sqrt{H - V(x)}}. \tag{31}$$
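Formula (31) can be sanity-checked numerically on a potential satisfying conditions 1 to 3. The sketch below uses the sample potential $V(x) = x^2 + x^4$ (an assumption made only for illustration, not a potential from the paper), removes the endpoint singularities with the substitution $x = x_+\sin\theta$, and compares (31) with a central finite difference of $T(H) = \sqrt{2}\int dx/\sqrt{H - V(x)}$. All function names are hypothetical.

```python
import math

def _theta_midpoints(n: int):
    h = math.pi / n
    return h, [-0.5 * math.pi + (k + 0.5) * h for k in range(n)]

def amplitude(H: float) -> float:
    """Positive root x_+ of x^2 + x^4 = H."""
    return math.sqrt((-1.0 + math.sqrt(1.0 + 4.0 * H)) / 2.0)

def period(H: float, n: int = 400) -> float:
    """T(H) = sqrt(2) * int dx / sqrt(H - V); with x = x_+ sin(theta) the
    integrand becomes 1 / sqrt(1 + x_+^2 (1 + sin^2 theta))."""
    a = amplitude(H)
    h, thetas = _theta_midpoints(n)
    return math.sqrt(2.0) * sum(
        h / math.sqrt(1.0 + a * a * (1.0 + math.sin(t) ** 2)) for t in thetas)

def weight(x: float) -> float:
    """1 - 2 V V'' / (V')^2 for V = x^2 + x^4, common factor x^2 cancelled."""
    return 1.0 - 2.0 * (1.0 + x * x) * (2.0 + 12.0 * x * x) / (2.0 + 4.0 * x * x) ** 2

def period_derivative_formula(H: float, n: int = 400) -> float:
    """Right-hand side of (31) for the sample potential."""
    a = amplitude(H)
    h, thetas = _theta_midpoints(n)
    total = 0.0
    for t in thetas:
        s = math.sin(t)
        total += h * weight(a * s) / math.sqrt(1.0 + a * a * (1.0 + s * s))
    return math.sqrt(2.0) / (2.0 * H) * total

def period_derivative_fd(H: float, dH: float = 1e-4) -> float:
    """Central finite difference of the period."""
    return (period(H + dH) - period(H - dH)) / (2.0 * dH)
```

For a pure power $V = x^n$ the weight is the constant $(2 - n)/n$, so (31) reduces to $T' = \frac{2 - n}{2n}\frac{T}{H}$, which is the $T'(H) \sim T/H$ behaviour used in Proposition A.2.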
In our case $U(\xi)$ does not have all the above properties, however, the function $V(\eta) = U(\eta + \xi_0) - U(\xi_0)$, where $\xi_0 : U'(\xi_0) = 0$, does. The periods of oscillations for these potentials are related via $T_U(H) = T_V(H - U(\xi_0))$, therefore $T'_U(H) = T'_V(H - U(\xi_0))$. To estimate $T'_V(K = H - U(\xi_0))$ we use (31),
$$|T'_V(K)| = \left|\frac{\sqrt{2}}{2K}\int_{x_-(K)}^{x_+(K)} \left(1 - 2\frac{VV''}{(V')^2}\right)\frac{dx}{\sqrt{K - V(x)}}\right| \le \frac{1}{K}\left|2\frac{VV''}{(V')^2}\right|_{\max} T_V(K).$$
Since $H \sim K$ and $T_U(H) = T_V(K)$ we have proved

Proposition D.1. Let $U(\xi)$ satisfy conditions 1 and 2 above and let $U(\xi_0) = U'(\xi_0) = 0$ and $U'(x) \ne 0$ for $x \ne \xi_0$, then
$$T'_U(H) \le C\frac{T}{H} \quad \text{if} \quad \left|\frac{(U(x) - U(\xi_0))\,U''(x)}{(U'(x))^2}\right| = O(1),$$
which implies Proposition A.2.
D.2. Proof of Proposition A.3. Under the same conditions on the potential a formula similar to (31) has been obtained in [4] for the derivative (w.r.t. $H$) of
$$\int_0^a \frac{d\xi}{\sqrt{H - V(\xi)}}.$$
Using the same procedure as in the previous subsection we obtain an asymptotic expression for a potential which does not satisfy condition 3.

D.3. Proof of Proposition 5.3. We prove the first part of the proposition by showing that
$$\int_{t_n}^{t_{n+1}} f(H_n, \phi_n + \omega_n(t - t_n), p_n)\frac{dt}{\sqrt{t}} \ge 0 \tag{32}$$
for any $t_{n+1} \ge t_n$. Indeed, integrating by parts (32) we obtain that it is equal to
$$\frac{F(t_{n+1})}{\sqrt{t_{n+1}}} + \int_{t_n}^{t_{n+1}} \frac{F(t)\, dt}{2\sqrt{t^3}},$$
where
$$F(t) = \int_{t_n}^{t} f(H_n, \phi_n + \omega_n(t - t_n), p_n)\, dt \ge 0.$$
To finish the proof it is sufficient to show that $F(t)$ is not negative. This follows immediately from the properties of $f$ and the choice of $t_n$: $f$ is periodic with zero average, it changes the sign only two times during one period, and at $\phi_n = \phi(t_n)$ it is at the beginning of its positive phase. To prove the second part of the proposition it suffices to show that
$$\int_0^{t_1} f(H_0, \phi_0 + \omega_0 t, p_0)\frac{dt}{\sqrt{t}} \ge \int_0^{T_0} f(H_0, \phi_0 + \omega_0 t, p_0)\frac{dt}{\sqrt{t}} \ge \frac{C}{\sqrt{x_+(0)}}. \tag{33}$$
The inequality for the integrals follows from the fact that if $t_1 \le T_0$ ($t_1 \ge T_0$) then $f(t)$ is negative (positive) on $t \in [t_1, T_0]$ ($t \in [T_0, t_1]$). Changing variables ($t \to x(t)$) we obtain
$$0 = \int_0^{T_0} f(H_0, \phi_0 + \omega_0 t, p_0)\, dt = \sqrt{2}\,T(H_0, p_0)\int_{-x_-}^{x_+} \frac{x - \langle x\rangle}{\sqrt{H_0 - V(x, p_0)}}\, dx,$$
therefore
$$\int_{-x_-}^{\langle x\rangle} \frac{x - \langle x\rangle}{\sqrt{H_0 - V(x, p_0)}}\, dx + \int_{\langle x\rangle}^{x_+} \frac{x - \langle x\rangle}{\sqrt{H_0 - V(x, p_0)}}\, dx = 0. \tag{34}$$
The second integral in (33) can be decomposed into the sum of four integrals which would be equal in magnitude in the absence of the multiplying factor of $\frac{1}{\sqrt{t}}$:
$$\int_0^{T_0} f(H_0, \phi_0 + \omega_0 t, p_0)\frac{dt}{\sqrt{t}} = T(H_0, p_0)\int_0^{t(x_+)} (x - \langle x\rangle)\frac{dt}{\sqrt{t}} + T(H_0, p_0)\int_{t(x_+)}^{t(\langle x\rangle)} (x - \langle x\rangle)\frac{dt}{\sqrt{t}}$$
$$+\; T(H_0, p_0)\int_{t(\langle x\rangle)}^{t(-x_-)} (x - \langle x\rangle)\frac{dt}{\sqrt{t}} + T(H_0, p_0)\int_{t(-x_-)}^{t(\langle x\rangle)} (x - \langle x\rangle)\frac{dt}{\sqrt{t}}.$$
Since $t$ increases along the path of integration the integrals are decreasing in absolute value in the order they occur in the sum. Therefore the algebraic sum of the integrals is larger than that of the first and the fourth ones. Indeed, the sum of the second and the third integrals is positive and hence their absence decreases the whole sum. Using the equality of the integrals without the multiplying factor of $\frac{1}{\sqrt{t}}$ we have
$$\int_0^{T_0} f(H_0, \phi_0 + \omega_0 t, p_0)\frac{dt}{\sqrt{t}} \ge T(H_0, p_0)\int_0^{t(x_+)} (x - \langle x\rangle)\frac{dt}{\sqrt{t}} + T(H_0, p_0)\int_{t(-x_-)}^{t(\langle x\rangle)} (x - \langle x\rangle)\frac{dt}{\sqrt{t}}$$
$$\ge T(H_0, p_0)\left(\frac{1}{\sqrt{t(x_+)}} - \frac{1}{\sqrt{t(-x_-)}}\right)\int_{\langle x\rangle}^{x_+} \frac{x - \langle x\rangle}{\sqrt{H_0 - V(x, p_0)}}\, dx$$
$$= T\,\frac{t(-x_-) - t(x_+)}{\sqrt{t(x_+)\,t(-x_-)}\left(\sqrt{t(x_+)} + \sqrt{t(-x_-)}\right)}\int_{\langle x\rangle}^{x_+} \frac{x - \langle x\rangle}{\sqrt{H_0 - V(x, p_0)}}\, dx$$
$$\ge T\,\frac{T/2}{T \cdot 2\sqrt{T}}\int_{\langle x\rangle}^{x_+} \frac{x - \langle x\rangle}{\sqrt{H_0 - V(x, p_0)}}\, dx \ge \frac{\sqrt{T_0(1)}}{5\sqrt{x_+(0)}}\int_{\langle x\rangle}^{x_+} \frac{x - \langle x\rangle}{\sqrt{H_0 - V(x, p_0)}}\, dx.$$
We have used the formula for the period $T = \frac{T_0(1)}{x_+} + o(x_+^{-1})$ and obvious inequalities $t(x_+) \le t(-x_-) \le T$ and $t(-x_-) - t(x_+) \le T$. Now, it remains to show that the last integral is bounded from zero by some constant. By (34) our task reduces to showing that any of the two integrals in (34) is larger than a positive constant in magnitude. Suppose for definiteness that $\langle x\rangle(H_0, p_0) \le 0$ then
$$\int_{\langle x\rangle}^{x_+} \frac{x - \langle x\rangle}{\sqrt{H_0 - V(x, p_0)}}\, dx \ge \int_0^{x_+} \frac{x}{\sqrt{H_0 - V(x, p_0)}}\, dx = \int_{t(0)}^{t(x_+)} x\, dt.$$
By Corollary 4.4 $t(x_+) - t(0) \sim \frac{1}{4}T$ and since in the autonomous system the speed decreases as the amplitude increases we have
$$\int_{t(0)}^{t(x_+)} x\, dt \ge \frac{x_+}{2}\,\frac{T}{8} \ge \frac{x_+}{2}\,\frac{T_0(1)}{10x_+} = 0.05\,T_0(1).$$
Therefore we obtain the estimate
$$\Delta I \ge \frac{T_0(1)^{1.5}}{100\sqrt{x_+}}.$$

D.4. Proof of Proposition B.1. We estimate
$$\sum_{n=0}^{N-1}\int_{t_n}^{t_{n+1}} F(H, \phi, p)\, dt = \sum_{n=0}^{N-1}\int_{t_n}^{t_{n+1}} F(H_n, \phi_n + \omega_n(t - t_n), p_n)\, dt + \sum_{n=0}^{N-1}\int_{t_n}^{t_{n+1}} \left(F(H, \phi, p) - F(H_n, \phi_n + \omega_n(t - t_n), p_n)\right) dt, \tag{35}$$
where $H_n$, $\phi_n$, $\omega_n$, $p_n$ are the values of the corresponding variables at $t = t_n$. By Proposition 5.2 $\Delta F \le \frac{C}{u_{2n}}$ and since $t_N - t_0 \sim \frac{1}{\sqrt{u_{2n}}}$ the second sum is of order $\frac{1}{u_{2n}^{1.5}}$. Now we estimate the first sum
$$\int_{t_n}^{t_{n+1}} F(H_n, \phi_n + \omega_n(t - t_n), p_n)\, dt = \int_{t_n}^{t_n + T_n} F(H_n, \phi_n + \omega_n(t - t_n), p_n)\, dt + \int_{t_n + T_n}^{t_{n+1}} F(H_n, \phi_n + \omega_n(t - t_n), p_n)\, dt.$$
The first integral is equal to zero because $\langle F\rangle = 0$. The second integral is majorized by $C|t_n + T_n - t_{n+1}|$ which is the difference of the time of one revolution and the period. By Theorem 4.1 this difference is of order $\frac{1}{u_{2n}^{1.75}}$. Since the number of the terms in the sum is of order $\sqrt{u_{2n}}$ the first sum is at most of order $\frac{1}{u_{2n}^{1.25}}$.
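The order counting in this proof is bookkeeping with powers of $u_{2n}$, which can be made explicit with exact fractions. This is only an illustrative sketch; the helper name is hypothetical.

```python
from fractions import Fraction

def product_order(*exponents: Fraction) -> Fraction:
    """Exponent of u_2n for a product of factors u_2n**e."""
    return sum(exponents, Fraction(0))

# First sum in (35): about sqrt(u_2n) terms, each of order u_2n^(-1.75).
first_sum = product_order(Fraction(1, 2), Fraction(-7, 4))

# Second sum in (35): an integrand of order 1/u_2n over a time of order 1/sqrt(u_2n).
second_sum = product_order(Fraction(-1), Fraction(-1, 2))

# The slower-decaying contribution controls the whole sum.
dominant = max(first_sum, second_sum)
```

The dominant exponent is $-5/4$, in agreement with the bound $C/u_{2n}^{1.25}$ stated in Proposition B.1.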
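E. Hamiltonian in the Action-Angle Variables

In this subsection we will obtain the expression for $H_1(I, \phi, p)$ in (4). Let the Hamiltonian in the original variables be given by $H = H(x, y, p(t))$. Assuming that the level curves of the Hamiltonian function are simple and closed, we define the action variable $I(x, y, p)$ as the area enclosed by the curve $H(u, v, p) = H(x, y, p)$,
$$I = \oint y(u, H, p)\, du, \tag{36}$$
where $y(u, H, p)$ is the inverse function of the Hamiltonian. The generating function of the transformation to the action-angle variables is given by
$$S(x, I, p) = \int_0^x y(u, H_0(I, p), p)\, du,$$
where $H_0(I, p)$ is the inverse function of $I(H, p)$ in (36), see [1] for details and motivation. The new Hamiltonian is given by
$$K(I, \phi, t) = H_0(I, p(t)) + \dot{p}(t)\frac{\partial S}{\partial p}(x, I, p(t)),$$
where $x$ is defined implicitly by
$$y = S_x(x, I, p), \qquad \phi = S_I(x, I, p),$$
$$\frac{\partial S}{\partial p} = \int_0^x \left(\frac{\partial y}{\partial H_0}\frac{\partial H_0}{\partial p} + \frac{\partial y}{\partial p}\right) du = \int_0^x \left(\frac{\partial H_0}{\partial p}\Big/\frac{\partial H}{\partial y} - \frac{\partial H}{\partial p}\Big/\frac{\partial H}{\partial y}\right) du = \int_0^x \left(\frac{\partial H_0}{\partial p} - \frac{\partial H}{\partial p}\right)\frac{du}{H_y}.$$
Using the definition of the angular variable, see Subsect. 2.1, we have
$$\omega\frac{du}{d\phi} = H_y,$$
therefore $H_1$ takes the form
$$H_1 = \frac{\partial S}{\partial p} = \int_0^\phi \left(\frac{\partial H_0}{\partial p} - \frac{\partial H}{\partial p}\right)\frac{d\psi}{\omega} = \frac{1}{\omega}\frac{\partial H_0}{\partial p}\phi - \frac{1}{\omega}\int_0^\phi \frac{\partial H}{\partial p}\, d\psi.$$
To compute $\frac{\partial H_0}{\partial p}$ we differentiate $H_0(I, p)$ given implicitly by (36),
$$0 = \oint \left(\frac{\partial y}{\partial H_0}\frac{\partial H_0}{\partial p} + \frac{\partial y}{\partial p}\right) du.$$
Repeating the same procedure as above we obtain
$$\frac{\partial H_0}{\partial p} = \int_0^1 \frac{\partial H}{\partial p}\, d\psi.$$
Finally we obtain the desired expression for $H_1$,
$$H_1(I, \phi, p) = \frac{1}{\omega(I, p)}\left(\phi\int_0^1 \frac{\partial H}{\partial p}\, d\psi - \int_0^\phi \frac{\partial H}{\partial p}\, d\psi\right).$$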
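The final expression for $H_1$ can be evaluated numerically once $\partial H/\partial p$ is known along an orbit as a function of the angle. The sketch below is an illustration under stated assumptions: for $H = y^2/2 + U(x) - px$ one has $\partial H/\partial p = -x$, and the orbit profile $x(\psi) = \cos(2\pi\psi) + 0.3$ together with the value of $\omega$ are hypothetical sample data, not taken from the paper.

```python
import math
from typing import Callable

def midpoint_integral(g: Callable[[float], float], lo: float, hi: float, n: int = 2000) -> float:
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

def H1(phi: float, dH_dp: Callable[[float], float], omega: float, n: int = 2000) -> float:
    """H1 = (phi * int_0^1 dH/dp dpsi - int_0^phi dH/dp dpsi) / omega."""
    mean = midpoint_integral(dH_dp, 0.0, 1.0, n)
    partial = midpoint_integral(dH_dp, 0.0, phi, n)
    return (phi * mean - partial) / omega

def sample_dH_dp(psi: float) -> float:
    """Hypothetical profile: dH/dp = -x with x(psi) = cos(2 pi psi) + 0.3."""
    return -(math.cos(2.0 * math.pi * psi) + 0.3)
```

For this sample profile the mean cancels and $H_1(\phi) = \sin(2\pi\phi)/(2\pi\omega)$, so $H_1$ vanishes at $\phi = 0$ and $\phi = 1$, as expected of a function that is periodic in the angle.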
Acknowledgement. I would like to thank Mark Levi for suggesting this problem to me and for many important comments.
References

1. Arnold, V.: Mathematical Methods of Classical Mechanics.
2. Dieckerhoff, R., Zehnder, E.: An “a priori” estimate for oscillatory equation. Dyn. Syst. and Bifurcations. Groningen, 1984, LNM 1125, Berlin–Heidelberg–New York: Springer-Verlag, 1985, pp. 9–14
3. Dieckerhoff, R., Zehnder, E.: Boundedness of solutions via the twist-theorem. Ann. Scuola Norm. Sup. Pisa 14(1), 79–95 (1987)
4. Levi, M.: Quasiperiodic motions in superquadratic time-periodic potentials. Commun. Math. Phys. 143, 43–83 (1991)
5. Levi, M.: On the Littlewood’s counterexample of unbounded motions in superquadratic potentials. Dynamics Reported (1993)
6. Littlewood, J.E.: Unbounded solutions of an equation $\ddot{y} + g(y) = p(t)$, with $p(t)$ periodic and bounded and $g(y)/y \to \infty$ as $y \to \pm\infty$. J. Lond. Math. Soc. 41, 497–507 (1966)
7. Liu, B.: Boundedness for solutions of nonlinear Hill’s equations with periodic forcing terms via Moser’s twist theorem. J. Diff. Equations 79, 304–315 (1989)
8. Long, Y.: An unbounded solution of a superlinear Duffing’s equation. Acta Math. Sinica, New Series 7, 360–369 (1991)
9. Morris, G.R.: A case of boundedness in Littlewood’s problem on oscillatory differential equations. Bull. Austr. Math. Soc. 14, 71–93 (1976)
10. Moser, J.K.: Stable and Random Motions in Dynamical Systems. Princeton: Princeton University Press, 1973
11. Norris, J.W.: Boundedness in periodically forced second order conservative systems. J. London Math. Soc. (2) 45, 97–112 (1992)
12. Russman, H.: On the existence of invariant curves of twist mapping of an annulus. Geometric Dynamics. J. Palis (ed.), LNM, vol. 1007, Berlin–Heidelberg–New York: Springer-Verlag, 1981, pp. 677–718
13. Siegel, C., Moser, J.K.: Lectures on Celestial Mechanics. Berlin–Heidelberg–New York: Springer-Verlag, 1971
14. You, J.: The boundedness of solutions and the existence of pseudo-periodic solutions of superlinear Duffing’s equations. Science in China, 1992

Communicated by M. Herman
Commun. Math. Phys. 189, 205 – 226 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997
Long Time Stability of Some Small Amplitude Solutions in Nonlinear Schrödinger Equations
Dario Bambusi
Dipartimento di Matematica dell’Università, Via Saldini 50, 20133 Milano, Italy. E-mail: [email protected]
Received: 10 December 1996 / Accepted: 17 March 1997
Abstract: We consider small perturbations of the Zakharov–Shabat nonlinear Schrödinger equation on $[0, \pi]$ with vanishing or periodic boundary conditions; we prove a Nekhoroshev type result for solutions starting in the neighbourhood (in the $H^1$ topology) of the majority of small amplitude finite dimensional invariant tori of the linearized system. More precisely, we prove that along the considered solutions all the actions of the linearized system are approximately constant up to times growing exponentially with the inverse of a suitable small parameter.

1. Introduction

We consider the equation
$$ i u_t + u_{xx} \pm \pi u |u|^2 = \epsilon\, \frac{\partial H(x, u, \bar u)}{\partial \bar u} , \tag{1.1} $$
with vanishing or periodic boundary conditions on $[0, \pi]$. Here $H$ is a sufficiently smooth real valued function (the coefficient $\pi$ in front of the third order term has been introduced for future convenience). When $\epsilon = 0$ Eq. (1.1) reduces to the Zakharov–Shabat (ZS) equation, which is well known to be integrable. So, it has an infinite sequence of constants of motion and, for any finite $N$, there exist families of $N$ dimensional invariant tori (finite gap solutions) which, as $N \to \infty$, fill densely the phase space. In the case $\epsilon \neq 0$, it is known that there exists a sequence $\epsilon_N$ such that, if $|\epsilon| \le \epsilon_N$, then the majority of the above $N$ dimensional tori can be continued to invariant tori of (1.1) [10, 14, 11, 6] (see also [7]). However $\epsilon_N \to 0$ as $N \to \infty$, and therefore the above results give a good insight into the dynamics of (1.1) only in compact subsets of the phase space. By contrast, very little is known [5] about the dynamics in open subsets of the phase space (a detailed discussion of [5] will be given in Sect. 4).
In the present paper we give a bound on the long time diffusion of the actions for initial data in open subsets of the phase space, in the spirit of Nekhoroshev’s theorem [12, 13]. Actually, our result is restricted to solutions which (i) are of small amplitude, and (ii) correspond to initial data close to the majority of finite dimensional invariant tori of the linearized system, namely those fulfilling a suitable (generic) nonresonance condition. The reasons for the limitations (i) and (ii) are as follows: due to (i) we avoid all difficulties related to the actual integration of the ZS equation. Indeed, as pointed out in [11], the ZS equation can be approximated, in a neighbourhood of the origin, by the integrable nonlinear system given by the first order Birkhoff normal form of the ZS equation itself (averaged system). The advantage is that the Hamiltonian function of the averaged system can be explicitly calculated in terms of the action variables. In particular this allows one to study the dependence of the frequency on the initial datum, and to show that the majority (in a measure sense) of finite dimensional invariant tori have some good nonresonance properties. Our result concerns precisely the dynamics in a neighbourhood of such nonresonant tori. It turns out that the condition of being close to such tori can be formulated in terms of invariant tori of the linearized system; this gives rise to condition (ii).

The paper is organized as follows. In Sect. 2 we give a precise statement of our main result (Theorem 2.1), and in Sect. 3 we present the scheme of its proof. Section 4 contains a discussion of the relation of the present result with some recent works on long time behaviour in nonlinear PDEs. Sections 5, 6, 7 constitute the technical part of the paper. In Sect. 5 we perform a transformation putting the system in first order Birkhoff normal form, and give the Hamiltonian a form adapted for the subsequent steps of the proof. In Sect. 6 we state the abstract normal form theorem which constitutes the main technical result of the paper as well as the main step for the proof of Theorem 2.1; in our opinion this could have some independent interest. In that section we also give the original elements leading to the proof of the normal form result. In Sect. 7 we apply the normal form theorem to the nonlinear Schrödinger equation, and complete the proof of Theorem 2.1. We also give a slightly more precise description of the dynamics.
2. Statement of the Main Result

We come now to a precise statement of our result. For definiteness we first consider Eq. (1.1) with vanishing boundary conditions on $[0, \pi]$; concerning the function $H$ we assume that the application $\mathbb{C}^2 \ni (y, z) \mapsto H(\cdot, y, z)$ takes values in $H_0^1([0, \pi], \mathbb{C})$ (Sobolev space of $L^2$ functions having $L^2$ derivatives and satisfying $u(0) = u(\pi) = 0$) and moreover is analytic in a neighbourhood of the origin as an application from $\mathbb{C}^2$ to $H_0^1([0, \pi], \mathbb{C})$. We recall that this simply means that, considering the Taylor expansion of $H$ in $y$ and $z$, namely $H(x, y, z) = \sum_{k,l \ge 0} H_{kl}(x)\, y^k z^l$, the series
$$ \sum_{k,l \ge 0} \|H_{kl}(\cdot)\|_{H^1}\, |y|^k |z|^l \tag{2.1} $$
is convergent in a neighbourhood of the origin. Consider now the Fourier expansion of $u$, namely
$$ u(x) = \sum_{k \ge 1} u_k \sqrt{\frac{2}{\pi}}\, \sin kx , \tag{2.2} $$
and define the action variables $Z_k := |u_k|^2/2$ of the linearized system. Fix an arbitrary $N$ dimensional diophantine vector $(\omega_1, \dots, \omega_N) \in \mathbb{R}^N$, namely a vector such that there exist $\tau \ge 0$ and $\beta > 0$ with the property that
$$ \Big| \sum_{i=1}^{N} \omega_i k_i \Big| \ge \frac{\beta}{|k|^{\tau}} , \qquad \forall\, k \in \mathbb{Z}^N \setminus \{0\} ; \tag{2.3} $$
we also assume $\omega_i > 0$, $i = 1, \dots, N$. A rescaling of the vector $\omega := (\omega_1, \dots, \omega_N, 0, 0, 0, 0, \dots) \in \mathbb{R}^\infty$ will be used as the reference value of the actions. As anticipated above, our main result deals with small amplitude solutions. In the statement of the forthcoming theorem the role of order of magnitude of the solution will be played by the parameter $R$.

Theorem 2.1. Under the above assumptions there exist positive constants $\mu_*, C_1, \dots, C_4$ with the following property: fix $R > 0$ and assume that
$$ \mu := R + \frac{|\epsilon|}{R^5} \le \mu_* ; $$
then, provided the initial value of the actions is sufficiently close to $R^2\omega$, precisely satisfies
$$ \sum_{k \ge 1} k^2 \left| Z_k(0) - R^2\omega_k \right| \le C_1\, \mu^{1/(\tau+1)} R^3 , \tag{2.4} $$
where $\omega_i := 0$ for $i \ge N+1$, the value of the actions remains close to $R^2\omega$, precisely satisfies
$$ \sum_{k \ge 1} k^2 \left| Z_k(t) - R^2\omega_k \right| \le C_2\, \mu^{1/(\tau+1)} R^3 \tag{2.5} $$
for exponentially long times, namely for all times $t$ satisfying
$$ |t| \le \frac{C_3}{\mu R^2} \exp\left[ \left( \frac{C_4}{\mu} \right)^{1/(\tau+1)} \right] . \tag{2.6} $$

Consider now the $N$-dimensional torus
$$ \mathcal{M}_{R,\omega} := \left\{ u \in H_0^1([0, \pi], \mathbb{C}) \ :\ \frac{|u_k|^2}{2} = R^2\omega_k ,\ k = 1, \dots, N ;\ \ u_k = 0 ,\ k \ge N+1 \right\} , \tag{2.7} $$
and denote by $d_{H^1}(\cdot, \cdot)$ the distance in the $H^1$ norm; then the following corollary holds:

Corollary 2.2. Provided $\mu \le \mu_*$, there exist positive constants $C_5, C_6$ such that solutions of (1.1) corresponding to initial data $u_0$ satisfying $d_{H^1}(u_0, \mathcal{M}_{R,\omega}) \le C_5 R^2 \mu^{1/(\tau+1)}$ remain close to $\mathcal{M}_{R,\omega}$, namely satisfy $d_{H^1}(u(t), \mathcal{M}_{R,\omega}) \le C_6 R^2 \mu^{1/(\tau+1)}$, for exponentially long times, namely for times satisfying (2.6).
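To get a feeling for the quantities entering Theorem 2.1, the following minimal sketch probes the diophantine condition (2.3) on a finite range of integer vectors and evaluates the right-hand side of (2.6). It is only an illustration under stated assumptions: the function names, the choice $|k| = \sum_i |k_i|$, the sample vector $(1, (1+\sqrt5)/2)$ and the values $C_3 = C_4 = 1$ are hypothetical and are not taken from the paper, whose constants are not computed explicitly.

```python
import math
from itertools import product

def diophantine_beta(omega, tau, k_max):
    """Smallest value of |omega . k| * |k|**tau over 0 < |k| <= k_max.

    Here |k| = sum_i |k_i| (an assumption); the result is the best constant
    beta in (2.3) restricted to this finite range of integer vectors.
    """
    best = math.inf
    for k in product(range(-k_max, k_max + 1), repeat=len(omega)):
        size = sum(abs(ki) for ki in k)
        if size == 0 or size > k_max:
            continue
        small_divisor = abs(sum(w * ki for w, ki in zip(omega, k)))
        best = min(best, small_divisor * size ** tau)
    return best

def stability_time(R, eps, tau, C3=1.0, C4=1.0):
    """Right-hand side of (2.6) with placeholder constants C3, C4."""
    mu = R + abs(eps) / R ** 5
    return C3 / (mu * R ** 2) * math.exp((C4 / mu) ** (1.0 / (tau + 1)))

golden = (1 + math.sqrt(5)) / 2
beta = diophantine_beta((1.0, golden), tau=1, k_max=20)
times = [stability_time(R, eps=0.0, tau=1) for R in (0.1, 0.05, 0.025)]
```

With these placeholder constants the bound grows rapidly as $R$ decreases, which is the qualitative content of "exponentially long times"; the actual numbers have no meaning beyond that.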
A particularly interesting case of (1.1) is
$$ i u_t + u_{xx} = u\, g(|u|^2) , \tag{2.8} $$
where $g : \mathbb{R} \to \mathbb{R}$ is analytic in a neighbourhood of the origin, and satisfies $g(0) = 0$, $g'(0) \neq 0$. For such an equation we have the following

Theorem 2.3. Consider Eq. (2.8); there exist positive constants $R_*, C_7, \dots, C_{10}$ such that for any positive $R < R_*$ the following holds true: if the initial value of the actions is sufficiently close to $R^2\omega$, namely satisfies
$$ \sum_{i \ge 1} i^2 \left| Z_i(0) - R^2\omega_i \right| \le C_7 R^3 R^{1/(\tau+1)} , \tag{2.9} $$
then one has
$$ \sum_{i \ge 1} i^2 \left| Z_i(t) - R^2\omega_i \right| \le C_8 R^3 R^{1/(\tau+1)} , \tag{2.10} $$
for exponentially long times, namely for
$$ |t| \le \frac{C_9}{R^3} \exp\left[ \left( \frac{C_{10}}{R} \right)^{1/(\tau+1)} \right] . \tag{2.11} $$
Remark 2.4. Theorem 2.3, in exactly the same form, holds for Eq. (1.1) with $\epsilon = 1$, if the series (2.1) starts with terms of the 6th order ($l + k \ge 6$).

Remark 2.5. Strictly speaking Theorem 2.3 is not a corollary of Theorem 2.1; indeed in this case one has $\epsilon := R^6$, and
$$ H(u, \bar u) = \frac{1}{R^6} \left[ G(|u|^2) - G'(0)\, |u|^2 \right] , $$
where $G$ is a primitive of $g$ with $G(0) = 0$, and therefore $H$ depends also on $\epsilon$. However, as will be clear, the proof of Theorem 2.1 holds also in the case of Eq. (2.8).

In the case of periodic boundary conditions Theorem 2.1 (and also Corollary 2.2 and Theorem 2.3) hold essentially unchanged, the only difference being that the Fourier expansion has to be done on the basis $e^{ikx}$, $k \in \mathbb{Z}$, so that the index in (2.4) and (2.5) runs over $\mathbb{Z}$, and the coefficient of $|Z_0 - R^2\omega_0|$ is 1; moreover, the vector $\omega$ has to satisfy $\omega_0 > 0$. The proof of Theorem 2.1 will be obtained using a method which we expect to work in quite general situations (see Sect. 3 for more details). For definiteness, in all the rest of the paper we will consider only Eq. (1.1) with vanishing boundary conditions.
3. Scheme of the Proof

As anticipated above we first consider Eq. (1.1) as a perturbation of the completely resonant linearized system $i\dot u + u_{xx} = 0$, and, following [11] (see also [6]), we perform a first order averaging. Then we use the nonlinearity of the averaged system to tune suitably the frequency of the unperturbed motions around which we study the dynamics of Eq. (1.1). So, we consider the invariant torus of the averaged system on which the actions $Z_k$ are equal to $R^2\omega_k$ ($\omega$ as in Theorem 2.1). Expanding the Hamiltonian in a neighbourhood of this torus we obtain
$$ H = \sum_{k=1}^{N} \tilde\nu_k I_k + \sum_{k \ge N+1} \tilde\omega_k Z_k + \text{higher order terms} , \tag{3.1} $$
where $\tilde\nu_k$ are the frequencies of the motions on the torus and $\tilde\omega_k$ are the frequencies of the small oscillations around it (and $I_k := Z_k - R^2\omega_k$); in terms of $\omega$ such frequencies are given by
$$ \tilde\nu_k := k^2 + R^2 \sum_{i=1}^{N} \omega_i (4 - \delta_{ik}) , \qquad \tilde\omega_k := k^2 + R^2 \sum_{i=1}^{N} 4\omega_i . \tag{3.2} $$
Subsequently we develop normal form theory for system (3.1). From the technical point of view this is the most difficult step, the main difficulty being related to the fact that $\tilde\omega_k \to \infty$ when $k \to \infty$. To illustrate this point remark that (at least formally) it is possible to introduce suitable action angle variables $(J_1, \dots, J_\infty; \psi_1, \dots, \psi_\infty)$ for the main part of (3.1) by defining $J_k := -I_k + 4\big( \sum_{i=1}^{N} I_i + \sum_{i \ge N+1} Z_i \big)$, $k = 1, \dots, N$, $J_{N+1} := \sum_{i=1}^{N} i^2 I_i + \sum_{i \ge N+1} i^2 Z_i$, and completing to a canonical transformation. In such a way system (3.1) takes the form
$$ R^2 \sum_{k=1}^{N} \omega_k J_k + J_{N+1} + \text{higher order terms} . \tag{3.3} $$
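The passage from (3.1) and (3.2) to (3.3) is a short algebraic identity, and it can be checked numerically. The sketch below is an illustration only: it assumes that the averaged Hamiltonian, written in the actions, is $\sum_k k^2 Z_k + 2(\sum_k Z_k)^2 - \frac12 \sum_k Z_k^2$ (the expression that follows from the averaged nonlinearity computed in Sect. 5), it truncates the infinitely many modes to a hypothetical finite number `K`, and all names are made up for the example. It verifies that the frequencies (3.2) are the derivatives of this Hamiltonian on the torus $Z = R^2\omega$, and that the linear parts of (3.1) and (3.3) coincide.

```python
import random

random.seed(0)
N, K, R = 3, 8, 0.1   # K is a hypothetical truncation of the infinitely many modes
omega = [random.uniform(0.5, 1.5) for _ in range(N)]
W = sum(omega)

def averaged_hamiltonian(Z):
    """sum_k k^2 Z_k + 2 (sum_k Z_k)^2 - (1/2) sum_k Z_k^2, with Z[k-1] = Z_k."""
    total = sum(Z)
    return (sum((k + 1) ** 2 * z for k, z in enumerate(Z))
            + 2 * total ** 2 - 0.5 * sum(z * z for z in Z))

def frequency_32(k):
    """Right-hand sides of (3.2): nu-tilde for k <= N, omega-tilde for k > N."""
    if k <= N:
        return k ** 2 + R ** 2 * (4 * W - omega[k - 1])
    return k ** 2 + R ** 2 * 4 * W

# Frequencies as derivatives of the averaged Hamiltonian on the torus Z = R^2 omega.
Z0 = [R ** 2 * w for w in omega] + [0.0] * (K - N)
step = 1e-6

def numerical_frequency(k):
    up, down = list(Z0), list(Z0)
    up[k - 1] += step
    down[k - 1] -= step
    return (averaged_hamiltonian(up) - averaged_hamiltonian(down)) / (2 * step)

max_frequency_error = max(abs(numerical_frequency(k) - frequency_32(k))
                          for k in range(1, K + 1))

# Linear parts of (3.1) and (3.3) at a random small displacement from the torus.
I = [random.uniform(-1e-3, 1e-3) for _ in range(N)]
Ztail = [random.uniform(0.0, 1e-3) for _ in range(K - N)]
form_31 = (sum(frequency_32(k) * I[k - 1] for k in range(1, N + 1))
           + sum(frequency_32(k) * Ztail[k - N - 1] for k in range(N + 1, K + 1)))
S = sum(I) + sum(Ztail)
J = [-I[k] + 4 * S for k in range(N)]
J_last = (sum((k + 1) ** 2 * I[k] for k in range(N))
          + sum(k ** 2 * Ztail[k - N - 1] for k in range(N + 1, K + 1)))
form_33 = R ** 2 * sum(w * j for w, j in zip(omega, J)) + J_last
```

The point of the change of variables is visible in `form_33`: only the $N$ frequencies $R^2\omega_k$ and the single frequency 1 appear, instead of the unbounded sequence $\tilde\omega_k$.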
Provided $(\omega_1, \dots, \omega_N)$ is diophantine we are thus reduced to a perturbation of an integrable isochronous system with a finite number of nonresonant frequencies$^1$. Thus one expects that it should be possible to put the system in normal form up to an exponentially small remainder, in order to conclude that $J_1, \dots, J_{N+1}$ are approximate constants of motion. However, one immediately realizes that, since the flow generated by $J_{N+1}$ is essentially the flow of the linear Schrödinger equation, the angle $\psi_{N+1}$ is a continuous non-differentiable function of the phase point. As a consequence the perturbation, when expressed in terms of these action angle variables, depends only continuously on the angle $\psi_{N+1}$. In particular this has the consequence that one cannot use Fourier series to solve the homological equation, and to develop normal form theory. We solve the above difficulty by introducing a new method for the solution of the homological equation which avoids the introduction of the angle $\psi_{N+1}$, and even an implicit use of the Fourier expansion in such an angle (see Lemma 6.4 below). Such a method is based on the fact that there is only one bad angle.

$^1$ Actually there could be some resonances between $R^2\omega_i$ on the one side and 1 on the other side; however we will prove that, due to the $R^2$ factor, such resonances are of such high order that they do not matter.

We point out that one could proceed also in a different way, namely exploiting the fact that the dynamics generated by $J_{N+1}$ is much faster than the dynamics generated by $R^2 \sum_{k=1}^{N} \omega_k J_k$. We think that Theorem 6.2, giving a normal form result for Hamiltonian systems of the form (3.3) (without making use of the different time scales quoted above), could be interesting in itself. Besides the above difficulty there is the well known problem that action variables are singular at the origin, but such a problem can be solved by the same method developed in [3]. So, we are able to prove a normal form result in the vicinity of a nonresonant torus. This allows one to conclude that the functions $J_1, \dots, J_{N+1}$ are small for exponentially long times provided the solution does not leave the set where the normal form holds. We point out that the above procedure can be used also to study perturbations of integrable Hamiltonian PDEs which (i) are nonlinear and (ii) have an equilibrium point with the property that the spectrum of small oscillations is an infinite sequence of real numbers which are linear combinations with integer coefficients of a finite number of fundamental frequencies.

The final step to prove Theorem 2.1 is to use the functions $J_1, \dots, J_{N+1}$ as Lyapunov functions in order to prove that all the actions are approximate constants of motion (see Lemma 7.4) and the solution does not leave the set where the normal form holds up to exponentially long times.

4. Discussion

We compare here our result with some recent works on long time behaviour in nonlinear PDEs. In particular we will consider [5] and [4]. In [5] (see also [6]) Bourgain proved a general theorem on the long time behaviour in the nonlinear Schrödinger equation
$$ i\dot u + u_{xx} - V(x)u = \epsilon\, \frac{\partial H(u, \bar u)}{\partial \bar u} , \qquad u(0) = u(\pi) = 0 , \tag{4.1} $$
where $V$ is a typical smooth potential. Bourgain’s theorem applies to $H^s$ ($s \ge 1$) initial data such that “the norm stored in the high frequency modes” is very small, namely such that the relation
$$ \sqrt{ \sum_{i \ge N+1} i^{2s}\, \frac{|u_i(0)|^2}{2} } \le C \epsilon^{M} \tag{4.2} $$
holds for some positive (large) $M$ (here $u_i(0)$ is the component of the initial datum on the $i$th eigenvector of $-\partial_{xx} + V$). The conclusion is that the solution is $O(\epsilon^M)$ close (in the $H^s$ topology) to a quasiperiodic motion up to times of order $\epsilon^{-M}$. Bourgain’s theorem does not apply directly to Eq. (1.1), but we think that it should apply to a neighbourhood of the manifold $\mathcal{M}_{R,\omega}$, at least in the case of vanishing boundary conditions. The technique of the present paper allows one to improve Bourgain’s result in the case of $H^1$ initial data, the improvement being essentially that condition (4.2) on the initial datum is removed, and that the time scales on which the dynamics is controlled are much longer than those considered by Bourgain. Moreover the present method allows one to deal with the case of periodic boundary conditions, which seems difficult to treat with Bourgain’s technique. However, in the general case of perturbations of linear systems (like Eq. (4.1)) Bourgain’s method is much more general (in a measure sense) than that of the present paper.

We come now to the work [4], where the perturbation of a completely resonant linear system was studied. In that paper a method was given for constructing a family of closed
phase curves which are approximate solutions of the considered system, and are stable over exponentially long times. Such a method was also applied to the nonlinear wave equation. We expect that the method of [4] applies also to Eq. (1.1), allowing one to prove the existence of a family of closed curves with the property that solutions starting $O(R^b)$ close (in the $H^1$ topology) to one of these curves remain $O(R^{b/2})$ close to it (in the same topology) up to times of order $R^b \exp(c/R^2)$. The result of the present paper is much stronger since it applies to initial data close to the majority of finite dimensional tori, instead of initial data close to the particular curves selected by the method of [4]. On the other hand we point out that the method of [4] applies to generic perturbations of the linear Schrödinger equation, while the method of the present paper strongly relies on the integrability and on the nonlinearity of the ZS equation.

We also recall the works [1–3]; the technique of the present paper is essentially a generalization of the techniques developed in those papers; it turns out that the main abstract theorems of [1] and [3] are contained in Theorem 6.2 of the present paper.

5. First Normalization

It is well known that Eq. (1.1) with vanishing boundary conditions is Hamiltonian, e.g., on the phase space $\mathcal{P} = H_0^1([0, \pi], \mathbb{C})$ endowed with the scalar product $\langle \cdot, \cdot \rangle$, and with the symplectic form $\Omega(\cdot, \cdot)$ defined respectively by
$$ \langle u, v \rangle := \mathrm{Re} \int_0^\pi \bar u_x v_x \, dx , \qquad \Omega(u, v) := \mathrm{Re} \int_0^\pi u\, i\, \bar v \, dx . \tag{5.1} $$
We recall that the Hamiltonian vector field $\nabla_\Omega f$ corresponding to a differentiable function $f : \mathcal{P} \to \mathbb{R}$ is defined by
$$ \Omega(\nabla_\Omega f(u), X) = df(u) X , \qquad \forall X \in \mathcal{P} , \tag{5.2} $$
so that the Hamiltonian function corresponding to (1.1) is given by
$$ \frac{1}{2} \int_0^\pi |u_x|^2 \, dx + \frac{\pi}{4} \int_0^\pi |u|^4 \, dx + \epsilon \int_0^\pi H(x, u(x), \bar u(x)) \, dx . \tag{5.3} $$
We will also denote
$$ h_0(u) := \frac{1}{2} \int_0^\pi |u_x|^2 \, dx . $$
It is useful (and possible) to introduce canonical coordinates. To this end consider the Fourier expansion of $u$ (cf. Eq. (2.2)) and define
$$ q_k := \mathrm{Im}\, u_k , \qquad p_k := \mathrm{Re}\, u_k ; $$
then $(p_k, q_k)$ are canonically conjugate coordinates (which obviously are not orthonormal). Remark that in terms of these coordinates one has
$$ h_0(u) = \frac{1}{2} \sum_{k \ge 1} k^2 |u_k|^2 = \sum_{k \ge 1} k^2\, \frac{p_k^2 + q_k^2}{2} . \tag{5.4} $$
Following [11] we now put (5.3) in normal form up to quantities of order $|u|^6 + |\epsilon|$.
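Identity (5.4) is Parseval’s relation for the sine basis of (2.2), and it is easy to confirm numerically. The following minimal sketch (function names and sample coefficients are hypothetical, chosen only for illustration) compares a trapezoidal evaluation of $\frac12 \int_0^\pi |u_x|^2 dx$ with the mode sum $\sum_k k^2 (p_k^2 + q_k^2)/2$ for a function with three nonzero Fourier coefficients.

```python
import math

def u_x(x, coeffs):
    """Derivative of u(x) = sum_k u_k sqrt(2/pi) sin(kx), with coeffs[k-1] = u_k."""
    scale = math.sqrt(2 / math.pi)
    return sum(c * scale * k * math.cos(k * x) for k, c in enumerate(coeffs, start=1))

def h0_quadrature(coeffs, m=256):
    """(1/2) int_0^pi |u_x|^2 dx by the trapezoidal rule with m intervals."""
    h = math.pi / m
    total = 0.0
    for j in range(m + 1):
        weight = 0.5 if j in (0, m) else 1.0
        total += weight * abs(u_x(j * h, coeffs)) ** 2
    return 0.5 * h * total

def h0_modes(coeffs):
    """sum_k k^2 (p_k^2 + q_k^2) / 2 with p_k = Re u_k and q_k = Im u_k."""
    return sum(k ** 2 * (c.real ** 2 + c.imag ** 2) / 2
               for k, c in enumerate(coeffs, start=1))

sample = [0.3 + 0.1j, -0.2j, 0.05 - 0.07j]
difference = abs(h0_quadrature(sample) - h0_modes(sample))
```

For a finite trigonometric sum the trapezoidal rule is exact up to rounding, so `difference` is at the level of machine precision.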
So, consider the complexification $\mathcal{P}^{\mathbb{C}}$ of $\mathcal{P}$, i.e. allow $p_k$ and $q_k$ to assume complex values, and denote by $B_R \subset \mathcal{P}^{\mathbb{C}}$ the (complex) ball of radius $R$ and center at the origin. We will also use the notation $a \preceq b$ to mean “there exists a positive constant $C$ independent of $R$ and $\epsilon$ such that $a \le Cb$”.

Lemma 5.1. Consider the Hamiltonian system (5.3) in the domain $B_R$. There exists a positive $\mu_\sharp$ such that, if $|\epsilon| + R^6 \le \mu_\sharp$, then the following holds true: there exists an analytic canonical transformation $\mathcal{T}_1$ such that

1. $\mathcal{T}_1$ is defined on $B_{3R/4}$, one has $B_{R/2} \subset \mathcal{T}_1(B_{3R/4}) \subset B_R$ and
$$ \sup_{B_{3R/4}} \| u - \mathcal{T}_1(u) \|_{H^1} \preceq R^3 . \tag{5.5} $$

2.
$$ H \circ \mathcal{T}_1 = h_0 + \langle f \rangle + \tilde f , \tag{5.6} $$
where
$$ \langle f \rangle(u) := \frac{1}{2} \Big( \sum_{k \ge 1} |u_k|^2 \Big)^2 - \frac{1}{8} \sum_{k \ge 1} |u_k|^4 , \tag{5.7} $$
$\tilde f$ is analytic on $B_{R/2}$ and satisfies the estimates
$$ \frac{1}{R} \sup_{u \in B_{R/2}} \big\| \nabla_\Omega \tilde f(u) \big\|_{H^1} \preceq R^4 + \frac{|\epsilon|}{R^2} , \qquad \sup_{u \in B_{R/2}} \big| \tilde f(u) \big| \preceq R^6 + |\epsilon| . \tag{5.8} $$

3. For any analytic function $g : B_R \to \mathcal{B}$, with $\mathcal{B}$ a Banach space, one has
$$ \sup_{u \in B_{R/2}} \| g(u) - g(\mathcal{T}_1(u)) \|_{\mathcal{B}} \preceq R^2 \sup_{u \in B_R} \| g(u) \|_{\mathcal{B}} , \tag{5.9} $$
and the same estimate holds for $\mathcal{T}_1^{-1}$.

Such a lemma is just a reformulation of Lemma 4 of [11]. We gave it a form suitable for our purpose.

Proof. The proof can be obtained by standard perturbation theory. For example one can define the transformation $\mathcal{T}_1$ as the time one flow of a Hamiltonian system with Hamiltonian function $\chi$ given by
$$ \chi(u) := \frac{1}{2\pi} \int_0^{2\pi} t \left[ f(e^{At} u) - \langle f \rangle(e^{At} u) \right] dt , \tag{5.10} $$
where $e^{At}$ is the flow generated by the linearized system ($A = -\partial_{xx}$), and
$$ f(u) := \frac{\pi}{4} \int_0^\pi |u|^4 \, dx , \qquad \langle f \rangle(u) := \frac{1}{2\pi} \int_0^{2\pi} f(e^{At} u) \, dt , \tag{5.11} $$
are respectively the Hamiltonian of the nonlinear part of the ZS equation and its average with respect to the unperturbed flow. The estimate (5.8) can be obtained exploiting the relations
$$ \nabla_\Omega \chi = \frac{1}{2\pi} \int_0^{2\pi} t\, e^{-At} \left[ \nabla_\Omega f(e^{At} u) - \nabla_\Omega \langle f \rangle(e^{At} u) \right] dt \tag{5.12} $$
(which follows from the fact that $e^{At}$ is a canonical transformation) and
$$ \sup_{B_{R(1-d)}} \big\| \nabla_\Omega \{g, g_1\} \big\| \le \frac{1}{Rd}\, \sup_{B_R} \big\| \nabla_\Omega g \big\| \, \sup_{B_R} \big\| \nabla_\Omega g_1 \big\| , \tag{5.13} $$
which holds for any two functions whose Hamiltonian vector field is analytic. Equation (5.13) follows from $\nabla_\Omega \{g, g_1\} = [\nabla_\Omega g, \nabla_\Omega g_1]$ (for the complete proof of (5.13) see [1], Lemma 5.2). For more details on the estimate (5.8) see [11] or [1]. One has also to prove that $\langle f \rangle$, as defined by (5.11), coincides with the function (5.7). This can be obtained by explicit calculation exactly as in [11] (see in particular Lemma 5); the details are omitted.

Remark 5.2. The proof has been obtained by performing a resonant normal form, i.e. by averaging the first order part of the perturbation with respect to the linear flow. The resonant normal form thus obtained turns out to coincide with the nonresonant normal form obtained in [11]. This is not surprising since along a period of the flow of the linearized system all the angles conjugated to the linear actions vary by an integer multiple of $2\pi$.

Remark 5.3. Particularly useful cases of (5.9) are
$$ \sup_{B_{R/2}} \left| Z_k(u) - Z_k(\mathcal{T}_1(u)) \right| \preceq R^4 , \tag{5.14} $$
$$ \sup_{B_{R/2}} \Big| \sum_{k \ge N+1} k^2 Z_k(u) - \sum_{k \ge N+1} k^2 Z_k(\mathcal{T}_1(u)) \Big| \preceq R^4 . \tag{5.15} $$
We concentrate now on system (5.6). Remark that it is a perturbation of the integrable system $h_0 + \langle f \rangle$, for which action variables are $Z_k = (p_k^2 + q_k^2)/2$. We will use perturbation theory to construct a local normal form around a nonresonant finite dimensional invariant torus of such an integrable system. We give now a suitable form to our Hamiltonian. Fix a positive integer $N$ and introduce the first $N$ action angle variables for our system, namely define $(Z_i, \phi_i)$ by
$$ p_i = \sqrt{2 Z_i}\, \cos \phi_i , \qquad q_i = \sqrt{2 Z_i}\, \sin \phi_i , \qquad i = 1, \dots, N . $$

Remark 5.4. Outside the set $Z_i = 0$ for some $i$, this is an analytic change of coordinates, and therefore $\tilde f$ is analytic in terms of these coordinates.

As in the introduction we fix an $N$-dimensional vector $(\omega_1, \dots, \omega_N)$ with positive entries ($\omega_i > 0$, $\forall\, i = 1, \dots, N$), which we assume to be diophantine, precisely to satisfy (2.3); we also assume $\sum_{i=1}^{N} i^2 \omega_i \le 1/8$; the general case in which such a quantity is arbitrary can be dealt with by minor modifications. In what follows we will denote $I_i := Z_i - R^2\omega_i$.
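The coincidence of the average defined in (5.11) with the closed form (5.7), whose proof is omitted above, can be checked numerically for a function with finitely many modes. The sketch below is only an illustration under stated assumptions: the linear flow is taken to act on the Fourier coefficients of (2.2) as $u_k \mapsto e^{-ik^2 t} u_k$ (this is the flow of $i\dot u + u_{xx} = 0$), the integrals are replaced by uniform sums that are exact for trigonometric polynomials of the degrees involved, and the function names and sample coefficients are hypothetical.

```python
import cmath
import math

def f_quartic(coeffs, m=64):
    """(pi/4) int_0^pi |u|^4 dx by the trapezoidal rule with m intervals.

    u(x) = sum_k u_k sqrt(2/pi) sin(kx) vanishes at both endpoints,
    so only the interior nodes contribute.
    """
    h = math.pi / m
    scale = math.sqrt(2 / math.pi)
    total = 0.0
    for j in range(1, m):
        x = j * h
        u = sum(c * scale * math.sin(k * x) for k, c in enumerate(coeffs, start=1))
        total += abs(u) ** 4
    return (math.pi / 4) * h * total

def averaged_f(coeffs, n_t=64):
    """(1/2pi) int_0^{2pi} f(e^{At} u) dt as an n_t-point average over one period."""
    acc = 0.0
    for j in range(n_t):
        t = 2 * math.pi * j / n_t
        rotated = [c * cmath.exp(-1j * k * k * t) for k, c in enumerate(coeffs, start=1)]
        acc += f_quartic(rotated)
    return acc / n_t

def closed_form(coeffs):
    """Right-hand side of (5.7)."""
    s2 = sum(abs(c) ** 2 for c in coeffs)
    s4 = sum(abs(c) ** 4 for c in coeffs)
    return 0.5 * s2 ** 2 - 0.125 * s4

sample = [0.3 + 0.1j, -0.2j, 0.05 - 0.07j]
average_value = averaged_f(sample)
formula_value = closed_form(sample)
```

The agreement reflects the fact that the only quartic terms surviving the time average are those in which the mode indices pair up, which is why $\langle f \rangle$ depends on the actions $|u_k|^2$ alone.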
Remark 5.5. By our assumption on $\omega$, the torus
$$ \mathcal{M} := \{ (I, \phi, p, q) \in \mathcal{P} \ :\ I_k = p_l = q_l = 0 ,\ k = 1, \dots, N ,\ l \ge N+1 \} \tag{5.16} $$
is in $B_{R/4}$, therefore it is in the domain of definition of $\mathcal{T}_1$; by (5.5) it is $O(R^3)$ close in the $H^1$ norm to the torus $\mathcal{M}_{R,\omega}$ defined by (2.7).

We also introduce the notations
$$ r := R^{1.5} , \qquad \zeta = \big( \{I_i, \phi_i\}_{i=1}^{N} , \{p_k, q_k\}_{k \ge N+1} \big) . \tag{5.17} $$
In what follows the role of perturbative parameter will be played by $r$. We introduce now a new $r$-dependent norm in $\mathcal{P}$, by
$$ \|\zeta\| := \sum_{i=1}^{N} \frac{i^2 |I_i|}{r} + \sup_{i = 1, \dots, N} r |\phi_i| + \sqrt{ \sum_{i \ge N+1} i^2\, \frac{|p_i|^2 + |q_i|^2}{2} } , \tag{5.18} $$
(recall that $r = R^{1.5}$) which is clearly equivalent to the $H^1$ norm used before. From now on we will use only this new norm, except when explicitly specified. In what follows we will consider domains of the form
$$ G_\rho := \bigcup_{\phi \in \mathbb{T}^N} B_\rho(I, \phi, p, q) , $$
where
$$ B_\rho(I, \phi, p, q) = B_\rho(\zeta) := \{ \zeta_1 \in \mathcal{P} \ :\ \|\zeta - \zeta_1\| \le \rho \} . \tag{5.19} $$
The norm (5.18) is useful since in the domain $G_r$ the actions $I$ and the coordinates $(p, q)$ are small, but the angles $\phi$ can change by quantities of order one also in the complex directions.

Remark 5.6. For $R$ (and therefore $r$) small enough we have
$$ G_{4r} \subset B_{R/2} , \tag{5.20} $$
and therefore $G_{4r}$ (which will be of particular interest) is contained in the domain of definition of $\mathcal{T}_1$.

Remark 5.7. Let $u \in H_0^1$ be such that its coordinates $\zeta$ are real and belong to $G_{4r}$; then one has (for $R$ small enough)
$$ d_{H^1}(u, \mathcal{M}) \preceq d_{\mathcal{P}}(\zeta, \mathcal{M}) , \qquad d_{\mathcal{P}}(\zeta, \mathcal{M}) \preceq \frac{R}{r}\, d_{H^1}(u, \mathcal{M}) , $$
where $d_{\mathcal{P}}(\cdot, \cdot)$ is the distance in the norm (5.18).
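The weights in the norm (5.18) are what make the domains $G_\rho$ anisotropic. The following minimal sketch evaluates the norm for a finite number of modes; the function name, the list-based data layout and the truncation of the tail $k \ge N+1$ are assumptions made for the example and do not come from the paper.

```python
import math

def zeta_norm(I, phi, p, q, r):
    """Norm (5.18) for a truncated point zeta.

    I, phi: action displacements and angles for the modes 1..N;
    p, q:   coordinates of the modes N+1, N+2, ... (finitely many kept).
    """
    N = len(I)
    action_part = sum((i + 1) ** 2 * abs(I[i]) for i in range(N)) / r
    angle_part = r * max(abs(a) for a in phi)
    tail = sum((N + 1 + j) ** 2 * (abs(p[j]) ** 2 + abs(q[j]) ** 2) / 2
               for j in range(len(p)))
    return action_part + angle_part + math.sqrt(tail)

R = 0.01
r = R ** 1.5
# An action displacement of size r costs 1, an angle displacement of size 1 costs only r.
action_only = zeta_norm([r, 0.0], [0.0, 0.0], [0.0], [0.0], r)
angle_only = zeta_norm([0.0, 0.0], [1.0, 0.5], [0.0], [0.0], r)
```

In a ball of radius $\rho \sim r$ the actions are therefore confined to a range of order $r^2$, while the angles may vary by quantities of order one, as stated after (5.19).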
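Proposition 5.8. In terms of the variables $(I, \phi, p, q)$, the Hamiltonian (5.6) can be written in the form
$$ H(\zeta) = \sum_{i=1}^{N+1} \nu_i J_i(\zeta) + \bar f(\zeta) , \tag{5.21} $$
where
$$ J_i(\zeta) := -I_i + 4 \Big( \sum_{k=1}^{N} I_k + \sum_{k \ge N+1} \frac{p_k^2 + q_k^2}{2} \Big) , \quad i = 1, \dots, N , \qquad J_{N+1}(\zeta) := \sum_{i=1}^{N} i^2 I_i + \sum_{i \ge N+1} i^2\, \frac{p_i^2 + q_i^2}{2} , $$
$$ \nu_i := R^2 \omega_i , \quad i = 1, \dots, N ; \qquad \nu_{N+1} := 1 . \tag{5.22} $$
Moreover there exists $r_*$ such that, for $r \le r_*$, the estimate
$$ \frac{1}{r} \sup_{G_{2r}} \big\| \nabla_\Omega \bar f \big\| \preceq r^2 + \frac{|\epsilon|}{r^2} \tag{5.23} $$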
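holds, and $\bar f$ is analytic in $G_{2r}$.

Remark 5.9. The flow of $\nu_i J_i$ is periodic with period $T_i := 2\pi/\nu_i$.

Proof. A simple calculation shows that (apart from a constant) $\langle f \rangle = \sum_{i=1}^{N} \nu_i J_i + \hat h$, where $\langle f \rangle$ is the function (5.7) expressed in terms of the new variables, and
$$ \hat h := 2 \Big( \sum_{k=1}^{N} I_k + \sum_{k \ge N+1} Z_k \Big)^2 - \frac{1}{2} \Big( \sum_{k=1}^{N} I_k^2 + \sum_{k \ge N+1} Z_k^2 \Big) . \tag{5.24} $$
Define $\bar f := \hat h + \tilde f$. We estimate now $\nabla_\Omega \tilde f$ in the new norm. One has
$$ \sqrt{ \sum_{k \ge N+1} \frac{k^2}{2} \left( \left| \frac{\partial \tilde f}{\partial p_k} \right|^2 + \left| \frac{\partial \tilde f}{\partial q_k} \right|^2 \right) } \le \sqrt{ \sum_{k \ge 1} \frac{k^2}{2} \left( \left| \frac{\partial \tilde f}{\partial p_k} \right|^2 + \left| \frac{\partial \tilde f}{\partial q_k} \right|^2 \right) } \preceq R^5 + \frac{|\epsilon|}{R} < r^3 + \frac{|\epsilon|}{r} . $$
Consider now
$$ \sum_{i=1}^{N} \frac{i^2}{r} \left| \frac{\partial \tilde f}{\partial \phi_i} \right| + \sup_{i} r \left| \frac{\partial \tilde f}{\partial I_i} \right| . $$
Notice that, due to (5.20) and (5.18), $\tilde f$, as a function of $\phi_i$, is analytic in the complex strip $|\mathrm{Im}\, \phi_i| \preceq 1$. It follows, by Cauchy inequality, that
$$ \sup_{G_{2r}} \left| \frac{\partial \tilde f}{\partial \phi_i} \right| \preceq \sup_{G_{3r}} |\tilde f| \preceq R^6 + |\epsilon| = r^4 + |\epsilon| . $$
Analogously
$$ \sup_{G_{2r}} \left| \frac{\partial \tilde f}{\partial I_i} \right| \preceq \frac{1}{r^2} \sup_{G_{3r}} |\tilde f| \preceq r^2 + \frac{|\epsilon|}{r^2} , $$
and therefore $\| \nabla_\Omega \tilde f \| \preceq r^3 + |\epsilon|/r$. The estimate of $\nabla_\Omega \hat h$ is a straightforward calculation that we omit.

6. Perturbation Theory for Hamiltonians of the Form (5.21)

System (5.21) has the form of a small perturbation of an integrable isochronous system whose dynamics is continuous but not differentiable and quasiperiodic with a finite number of frequencies; we will find a canonical transformation putting it in resonant normal form up to an exponentially small remainder. We will prove a result for abstract systems of the form (5.21) without taking into account the fact that $\nu$ is of order $R^2$; in our opinion such a result could have some independent interest.

So, consider a weakly symplectic [8] Banach space $\mathcal{P}$ endowed with the symplectic structure $\Omega$. Let $\mathcal{P}^{\mathbb{C}}$ be the complexification of $\mathcal{P}$, and let $G \subset \mathcal{P}$ be a domain. For each $\rho > 0$ define the complex extension of $G$ by
$$ G_\rho := \bigcup_{\zeta \in G} B_\rho(\zeta) , $$
where $B_\rho(\zeta)$ is the (complex) ball of radius $\rho$ and center $\zeta$. We fix once and for all a real number $r > 0$ and consider, in $G_{2r}$, an analytic Hamiltonian function of the form
$$ \sum_{i=1}^{N+1} h^i + f . \tag{6.1} $$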
Essentially we will assume that the $h^i$’s commute and that each of them generates a periodic flow $\Phi^i$, $i = 1, \dots, N+1$; finally we assume that the flows $\Phi^i$, $i = 1, \dots, N$, are analytic in time, while $\Phi^{N+1}$ is only continuous. The precise assumptions we need are

1) For $i = 1, \dots, N$ the symplectic gradient $\nabla_\Omega h^i$ (defined by (5.2)) exists in $G_{2r}$ and is analytic; moreover there exists a domain $D$ dense in $G_{2r}$, where $\nabla_\Omega h^{N+1}$ exists.

2) There exist positive constants $T_i$, $i = 1, \dots, N+1$, such that the flow $\Phi^i$ generated by $\nabla_\Omega h^i$ is periodic with period $T_i$, namely
$$ \Phi^i_{t+T_i} = \Phi^i_t , \qquad i = 1, \dots, N+1 ; $$
moreover $\Phi^i_t(G_\rho) \subset G_\rho$ $\forall\, \rho < 2r$.

3) The functions $h^i$ commute with each other, namely one has
$$ \{ h^i, h^j \} \equiv 0 , \qquad i, j = 1, \dots, N+1 , \quad i < j . \tag{6.2} $$

Remark 6.1. Using the flows $\Phi^i$, $i = 1, \dots, N$, one can define an action $\Psi$ of $\mathbb{T}^N$ on $G_\rho$, namely
$$ \Psi_{\psi_1, \dots, \psi_N}(\zeta) := \Phi^1_{\psi_1/\nu_1} \circ \Phi^2_{\psi_2/\nu_2} \circ \cdots \circ \Phi^N_{\psi_N/\nu_N}(\zeta) , $$
where $\nu_i = 2\pi/T_i$, $i = 1, \dots, N$.
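We assume

4) Denote by $\mathbb{T}^N + i\sigma$ the set of the $\psi$’s belonging to the complexified $N$-dimensional torus such that $|\mathrm{Im}\, \psi_i| \le \sigma$; then we assume that there exists a $\sigma > 0$ such that $\forall\, \zeta \in G_r$, the function $\mathbb{T}^N \ni \psi \mapsto \Psi_\psi(\zeta) \in G_r$ can be extended to a complex analytic function from $\mathbb{T}^N + i\sigma$ to $G_{2r}$.

5) $\Phi^{N+1}_t(D) \subset D$, the map $\Phi^{N+1}_t$ leaves invariant $G_\rho$ for each $\rho \le 2r$ and is analytic (for each fixed $t$); moreover it is a canonical transformation, i.e. its differential $(\Phi^{N+1}_t)'(\zeta)$ satisfies
$$ \Omega\big( (\Phi^{N+1}_t)'(\zeta) X , (\Phi^{N+1}_t)'(\zeta) Y \big) = \Omega(X, Y) , $$
and satisfies also the estimate
$$ \sup_{\zeta \in G_r} \big\| (\Phi^{N+1}_t)'(\zeta) \big\|_{\mathcal{P}, \mathcal{P}} \le 1 . $$

6) The different periodic motions have frequencies which form a vector close to a diophantine one; precisely there exist $\gamma > 0$, $\tau \ge 0$ and $M > 0$ such that
$$ \Big| \sum_{i=1}^{N} \nu_i k_i + \omega n \Big| \ge \frac{\gamma}{|k|^\tau} , \qquad \forall\, (k, n) \in \mathbb{Z}^{N+1} \setminus \{0\} , \quad |k| := \sum_{i=1}^{N} |k_i| \le M , \tag{6.3} $$
here $\omega := 2\pi/T_{N+1}$, $\nu_i = 2\pi/T_i$, $i = 1, \dots, N$.

Theorem 6.2. Consider the Hamiltonian system (6.1); assume that it is analytic over $G_{2r}$, that it satisfies (1–6) above; assume also that $\nabla_\Omega f$ is analytic on $G_{2r}$, and that there exist constants $\omega_f$, $c_T$ such that
$$ \frac{1}{r} \sup_{\zeta \in G_{2r}} \big\| \nabla_\Omega f \big\| \le \omega_f , \qquad \sup_{\zeta \in G_r} \big\| \Psi'_\psi(\zeta) \big\|_{\mathcal{P}, \mathcal{P}} \le c_T , \quad \forall\, \psi \in \mathbb{T}^N + i\sigma . $$
Define the dimensionless parameter
$$ \mu := 24\, (K_1)^\tau e^{2\tau}\, \frac{\pi \tilde\omega_f}{\gamma} , $$
where
$$ \tilde\omega_f := c_T \left( \frac{1 + e^{-\sigma/2}}{1 - e^{-\sigma/2}} \right)^{N} \omega_f , \qquad K_1 := \max\left\{ 1 , \frac{\tau + 1 + \log 2}{\sigma/2} \right\} . $$
Then the following holds true: if $\mu < 1$ and
$$ M \ge \frac{K_1}{\mu^{1/(\tau+1)}} , \tag{6.4} $$
then there exists an analytic canonical transformation $\mathcal{T} : G_{r/2} \to G_{3r/4}$, with $\mathcal{T}(G_{r/2}) \supset G_{r/4}$, such that $H \circ \mathcal{T}$ turns out to be of the form
$$ H(\mathcal{T}(\zeta)) = \sum_{i=1}^{N+1} h^i(\zeta) + \mathcal{Z}(\zeta) + \mathcal{R}(\zeta) , \tag{6.5} $$
where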
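R1) $\mathcal{Z}$ is analytic together with $\nabla_\Omega \mathcal{Z}$; it is in normal form, namely satisfies $\{ \mathcal{Z}, h^i \} \equiv 0$, $i = 1, \dots, N+1$, and moreover satisfies the estimate
$$ \frac{1}{r} \sup_{\zeta \in G_{r/2}} \big\| \nabla_\Omega \mathcal{Z}(\zeta) \big\| \le \tilde\omega_f (e + 1) ; \tag{6.6} $$

R2) $\mathcal{R}$ is an exponentially small remainder, namely one has
$$ \frac{1}{r} \sup_{\zeta \in G_{r/2}} \big\| \nabla_\Omega \mathcal{R}(\zeta) \big\|_{\mathcal{P}} \le e^{\tau+2}\, \tilde\omega_f\, \mu^{1/(\tau+1)} \exp\left[ -(\tau + 1 - \log 2) \left( \frac{1}{\mu} \right)^{1/(\tau+1)} \right] ; $$

R3) for any analytic function $g : G_r \to \mathcal{B}$, where $\mathcal{B}$ is a Banach space, one has
$$ \sup_{\zeta \in G_{r/2}} \| g(\zeta) - g(\mathcal{T}(\zeta)) \|_{\mathcal{B}} \le \mu^{1/(\tau+1)} \sup_{\zeta \in G_r} \| g(\zeta) \|_{\mathcal{B}} . \tag{6.7} $$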
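To see how the quantities in Theorem 6.2 depend on the data, one can simply evaluate the formulas. The sketch below is an illustration only and rests on a reading of the statement as $\mu = 24 K_1^\tau e^{2\tau} \pi \tilde\omega_f / \gamma$, $\tilde\omega_f = c_T \big( (1 + e^{-\sigma/2}) / (1 - e^{-\sigma/2}) \big)^N \omega_f$ and $K_1 = \max\{1, (\tau + 1 + \log 2)/(\sigma/2)\}$; the function names and the sample values are hypothetical.

```python
import math

def normal_form_parameters(omega_f, c_T, sigma, gamma, tau, N):
    """K_1, omega_f-tilde and mu, as read from the statement of Theorem 6.2."""
    K1 = max(1.0, (tau + 1 + math.log(2)) / (sigma / 2))
    ratio = (1 + math.exp(-sigma / 2)) / (1 - math.exp(-sigma / 2))
    omega_f_tilde = c_T * ratio ** N * omega_f
    mu = 24 * K1 ** tau * math.exp(2 * tau) * math.pi * omega_f_tilde / gamma
    return K1, omega_f_tilde, mu

def minimal_cutoff(K1, mu, tau):
    """Right-hand side of (6.4): the range |k| <= M needed in (6.3)."""
    return K1 / mu ** (1.0 / (tau + 1))

def remainder_bound(omega_f_tilde, mu, tau):
    """Right-hand side of the estimate in R2)."""
    a = 1.0 / (tau + 1)
    return (math.exp(tau + 2) * omega_f_tilde * mu ** a
            * math.exp(-(tau + 1 - math.log(2)) * (1 / mu) ** a))

K1, omega_f_tilde, mu = normal_form_parameters(
    omega_f=1e-6, c_T=1.0, sigma=2.0, gamma=1.0, tau=1, N=2)
M_min = minimal_cutoff(K1, mu, tau=1)
small = remainder_bound(omega_f_tilde, mu, tau=1)
larger = remainder_bound(omega_f_tilde, 4 * mu, tau=1)
```

The nonresonance condition (6.3) is required only up to $|k| \le M$ with $M$ of order $\mu^{-1/(\tau+1)}$, and the remainder bound decreases faster than any power of $\mu$ as $\mu \to 0$.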
The remainder of this section is devoted to the scheme of the proof of this theorem. It is almost identical to the proof of Theorem 4.1 of [3], the only variant being the technique used to solve the homological equation. First we introduce a suitable Fourier expansion for functions: related to the action $\Psi$ we define the “$k$th Fourier coefficient” $\hat g_k(\zeta)$ of a function $g : \mathcal{P}^{\mathbb{C}} \to \mathcal{B}$, where $\mathcal{B}$ is a Banach space, by
$$ \hat g_k(\zeta) := \frac{1}{(2\pi)^N} \int_{\mathbb{T}^N} g\big( \Psi_\psi(\zeta) \big)\, e^{i k \cdot \psi} \, d^N\psi . \tag{6.8} $$
It is easy to see that for smooth functions one has
$$ g(\Psi_\psi(\zeta)) = \sum_{k \in \mathbb{Z}^N} e^{-i k \cdot \psi}\, \hat g_k(\zeta) , $$
so that, in particular
$$ g(\zeta) = \sum_{k \in \mathbb{Z}^N} \hat g_k(\zeta) . $$
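The Fourier coefficients (6.8) are functions of the phase point, not numbers, which is what allows the construction to avoid angle coordinates. A minimal sketch for $N = 1$ follows; it assumes, purely for illustration, that the torus action is the rotation of a single $(p, q)$ plane (the flow of $(p^2 + q^2)/2$), and it replaces the integral over the circle by a uniform average, which is exact for the polynomial test function used. All names are hypothetical.

```python
import cmath
import math

def rotate(psi, zeta):
    """Hypothetical torus action for N = 1: rotation of the (p, q) plane by psi."""
    p, q = zeta
    return (p * math.cos(psi) - q * math.sin(psi),
            p * math.sin(psi) + q * math.cos(psi))

def fourier_coefficient(g, k, zeta, n=64):
    """(6.8) for N = 1, with the integral replaced by an n-point average."""
    total = 0j
    for j in range(n):
        psi = 2 * math.pi * j / n
        total += g(rotate(psi, zeta)) * cmath.exp(1j * k * psi)
    return total / n

def g(zeta):
    """A quadratic test function; only the harmonics k = 0, +2, -2 are present."""
    p, q = zeta
    return p * p + 0.3 * p * q

zeta = (0.7, -0.4)
coefficients = {k: fourier_coefficient(g, k, zeta) for k in range(-3, 4)}
reconstructed = sum(coefficients.values())
```

Here `coefficients[0]` is the average of $g$ along the orbit of the action, i.e. $(p^2 + q^2)/2$ at the chosen point, and summing all coefficients returns $g(\zeta)$, in agreement with the last formula above.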
Remark 6.3. Using standard arguments and hypothesis 4) of the theorem it is possible to prove exponential decay of the Fourier coefficients of an analytic function; moreover, if a function $g$ has symplectic gradient which is analytic then the following estimate holds:
$$ \sup_{G_r} \big\| \nabla_\Omega \hat g_k \big\| \le c_T\, e^{-\sigma |k|} \sup_{G_{2r}} \big\| \nabla_\Omega g \big\| \tag{6.9} $$
(for the proof see [3] Lemma 7.4).

Then we make the so called ultraviolet cutoff. So we fix $K \in \mathbb{N}_+$, and correspondingly we give the following:

Definition. A function $g : G_r \to \mathbb{C}$ is said to be of class $\Pi_l$ if one has
$$ g(\zeta) = \sum_{|k| \le Kl} \hat g_k(\zeta) . $$
Next, decompose the perturbation in terms of class $\Pi_s$:
$$ f = \sum_{s \ge 1} f_s , \qquad \text{where} \quad f_s := \sum_{(s-1)K \le |k| < sK} \hat f_k . $$
From (6.9) it follows
$$ \sup_{G_r} \big\| \nabla_\Omega f_s \big\| \le \tilde\omega_f\, e^{-\sigma (s-1) K/2} $$
(for the proof see [9] Lemma 8), which allows one (provided $K$ is sufficiently large) to consider $f_s$ of order $s$. At this point, provided one is able to solve the homological equation, one can use any formal algorithm to develop perturbation theory for the Hamiltonian
$$ H = \sum_{i=1}^{N+1} h^i + \sum_{s \ge 1} f_s , $$
where $f_s \in \Pi_s$ is of order $s$. We recall that the homological equation has the form
$$ \{ h_\omega, \chi \}(\zeta) + \mathcal{Z}(\zeta) = g(\zeta) , \tag{6.10} $$
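where $h_\omega := \sum_{i=1}^{N+1} h^i$ and $g$ are given, while the unknowns are $\mathcal{Z}$, which has to satisfy $\{ \mathcal{Z}, h^i \} \equiv 0$, and $\chi$. Concerning the homological equation we have the following

Lemma 6.4. Let $g : G_r \to \mathbb{C}$ be an analytic function of class $\Pi_s$ with analytic symplectic gradient; then the homological equation (6.10) has a solution
$$ \mathcal{Z}(\zeta) := \frac{1}{T_{N+1}} \int_0^{T_{N+1}} \hat g_0\big( \Phi^{N+1}_s(\zeta) \big) \, ds , \tag{6.11} $$
where $\hat g_0$ is the 0th Fourier coefficient of $g$, and $\chi \in \Pi_s$, which is defined by
$$ \hat\chi_0(\zeta) := \frac{1}{T_{N+1}} \int_0^{T_{N+1}} s\, \hat Z_0\big( \Phi^{N+1}_s(\zeta) \big) \, ds , \tag{6.12} $$
where $Z := g - \mathcal{Z}$, and
$$ \hat\chi_k(\zeta) := \frac{ e^{-i \frac{\nu \cdot k}{\omega} 2\pi} }{ 1 - e^{-i \frac{\nu \cdot k}{\omega} 2\pi} } \int_0^{T_{N+1}} e^{i \nu \cdot k s}\, \hat g_k\big( \Phi^{N+1}_s(\zeta) \big) \, ds , \qquad 0 \neq |k| \le Ks . \tag{6.13} $$
The function $\chi$ has the same analyticity properties as $g$; moreover for any $\rho \le r$, the following inequalities hold:
$$ \sup_{\zeta \in G_\rho} \big\| \nabla_\Omega \hat\chi_k(\zeta) \big\| \le \frac{\pi (Ks)^\tau}{\gamma} \sup_{\zeta \in G_\rho} \big\| \nabla_\Omega \hat g_k(\zeta) \big\| , \quad \forall\, k \in \mathbb{Z}^N \setminus \{0\} , \qquad \sup_{\zeta \in G_\rho} \big\| \nabla_\Omega \mathcal{Z}(\zeta) \big\| \le \sup_{\zeta \in G_\rho} \big\| \nabla_\Omega \hat g_0(\zeta) \big\| . \tag{6.14} $$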
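Proof. First we define $Z$ by (6.11). Then, with $\mathcal{Z} := g - Z$, we will solve the equation
\[ \{h_\omega, \chi\}(\zeta) = \mathcal{Z}(\zeta) . \tag{6.15} \]
Consider $\{h_\omega, \chi\}$: one has
\[ \{h_\omega, \chi\} = \sum_{j=1}^{N} \nu_j \frac{\partial}{\partial\psi_j}\Big|_{\psi=0} (\chi\circ\Psi_\psi) + \frac{d}{dt}\Big|_{t=0} (\chi\circ\Phi^{N+1}_t) ; \tag{6.16} \]
passing to Fourier series we get that the r.h.s. of (6.16) is equal to
\[ \sum_{j=1}^{N}\sum_{k\in\mathbb{Z}^N} i\nu_j k_j\,\hat\chi_k + \sum_{k\in\mathbb{Z}^N} \frac{d}{dt}\Big|_{t=0} \big(\chi\circ\Phi^{N+1}_t\big)^{\wedge}_k = \sum_{k\in\mathbb{Z}^N} i\nu\cdot k\,\hat\chi_k + \sum_{k\in\mathbb{Z}^N} \frac{d}{dt}\Big|_{t=0} \hat\chi_k\circ\Phi^{N+1}_t , \tag{6.17} \]
where the last equality follows from (6.2). So, writing Eq. (6.15) in Fourier components we get
\[ i\nu\cdot k\,\hat\chi_k + \frac{d}{dt}\Big|_{t=0} \hat\chi_k\circ\Phi^{N+1}_t = \hat{\mathcal{Z}}_k . \tag{6.18} \]
Actually we will solve the equation
\[ i\nu\cdot k\,\hat\chi_k\circ\Phi^{N+1}_t + \frac{d}{dt}\big(\hat\chi_k\circ\Phi^{N+1}_t\big) = \hat{\mathcal{Z}}_k\circ\Phi^{N+1}_t . \tag{6.19} \]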
Applying both sides of (6.19) to a fixed $\zeta$, we obtain a linear nonhomogeneous ordinary differential equation which can be solved explicitly. We need a solution periodic with period $T_{N+1}$. We will simply write $\chi_k(t)$ for $\hat\chi_k\circ\Phi^{N+1}_t(\zeta)$, and similarly for the right-hand side of (6.19). It is convenient to study separately the case $k = 0$. Indeed in such a case (6.19) reduces to $\dot\chi_0 = \hat{\mathcal{Z}}_0$, where $\mathcal{Z} := g - Z$, whose solution can be written in the form
\[ \chi_0(t) := \frac{1}{T_{N+1}} \int_0^{T_{N+1}} s\,\hat{\mathcal{Z}}_0(s+t)\,ds , \]
from which we get (6.12). Such a quantity is analytic and actually solves our equation. Moreover, from hypothesis 5) it follows that $\nabla\hat\chi_0$ satisfies a formula similar to (5.12) (with $(\Phi^{N+1}_t)'(\zeta)$ in place of $e^{At}$), and therefore one has
\[ \sup_{G_\rho} \|\nabla\hat\chi_0\| \le \frac{T_{N+1}}{2} \sup_{G_\rho} \|\nabla\hat{\mathcal{Z}}_0\| . \]
In the case $k \ne 0$ the general solution of (6.19) is
\[ e^{-i\nu\cdot k t} a_k + \int_0^t e^{-i\nu\cdot k (t-s)}\,\hat{\mathcal{Z}}_k(s)\,ds , \]
where $a_k$ is a constant of integration. Requiring the above function to be periodic of period $T_{N+1}$, we get an equation for $a_k$ whose solution is
\[ a_k = \frac{e^{-i\frac{\nu\cdot k}{\omega}2\pi}}{1 - e^{-i\frac{\nu\cdot k}{\omega}2\pi}} \int_0^{T_{N+1}} e^{i\nu\cdot k s}\,\hat{\mathcal{Z}}_k(s)\,ds , \]
which is well defined by hypothesis 5). So, we define $\hat\chi_k$ by (6.13), and it actually solves the original equation (6.18). We can use (6.13) to estimate the norm of the symplectic gradient of $\hat\chi_k$. Exploiting hypothesis 5) we get
\[ \left| e^{-i\frac{\nu\cdot k}{\omega}2\pi} - 1 \right| \ge \frac{2\gamma}{\omega |k|^\tau} , \]
from which
\[ \sup_{G_\rho} \|\nabla\hat\chi_k(\zeta)\| \le \pi \frac{|k|^\tau}{\gamma} \sup_{G_\rho} \|\nabla\hat g_k\| . \]
So the thesis holds.
Now, to make the formal normalization procedure rigorous, one has to introduce a norm for analytic functions (to be precise, for their vector fields), obtain estimates of the norm of the Poisson bracket of two analytic functions, and finally use one of the well-established recursive algorithms which allow one to show that the normalizing transformation is well defined in a suitable domain and to obtain the exponential estimate of the remainder. This whole program can be carried out exactly as in [3], so we omit the details of the proof.
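The choice of the constant of integration $a_k$ in the proof of Lemma 6.4 can be checked numerically in a scalar toy model. The sketch below is only an illustration under stated assumptions: it replaces the forcing term by a single harmonic $e^{im\omega s}$, writes `a` for $\nu\cdot k$, takes the period to be $2\pi/\omega$, and uses a composite Simpson rule; the names `a`, `m`, `omega` and the sample values are hypothetical and do not come from the paper. For this forcing the periodic solution of $\dot\chi + ia\chi = e^{im\omega t}$ is $e^{im\omega t}/(i(a+m\omega))$, so the formula for $a_k$ should reproduce $1/(i(a+m\omega))$.

```python
import cmath
import math


def simpson(func, lower, upper, panels=2000):
    """Composite Simpson rule for a complex-valued integrand (panels must be even)."""
    step = (upper - lower) / panels
    total = func(lower) + func(upper)
    for j in range(1, panels):
        weight = 4 if j % 2 else 2
        total += weight * func(lower + j * step)
    return total * step / 3


def periodic_constant(a, m, omega):
    """a_k from the proof, for the toy forcing exp(i*m*omega*s) and period 2*pi/omega."""
    period = 2 * math.pi / omega
    phase = cmath.exp(-1j * a * period)  # plays the role of e^{-i (nu.k / omega) 2 pi}
    integral = simpson(
        lambda s: cmath.exp(1j * a * s) * cmath.exp(1j * m * omega * s), 0.0, period
    )
    return phase / (1 - phase) * integral


def solution(t, a, m, omega):
    """General solution of the linear ODE with the periodic choice of a_k inserted."""
    a_k = periodic_constant(a, m, omega)
    forced = simpson(
        lambda s: cmath.exp(-1j * a * (t - s)) * cmath.exp(1j * m * omega * s), 0.0, t
    )
    return cmath.exp(-1j * a * t) * a_k + forced


if __name__ == "__main__":
    a, m, omega = 0.37, 2, 1.0  # hypothetical non-resonant sample values
    period = 2 * math.pi / omega
    # Both printed numbers should be at the level of the quadrature error.
    print(abs(periodic_constant(a, m, omega) - 1 / (1j * (a + m * omega))))
    print(abs(solution(period, a, m, omega) - solution(0.0, a, m, omega)))
```

The factor `1 - phase` in the denominator is the small divisor controlled by hypothesis 5); the toy model breaks down exactly when `a * period` is a multiple of $2\pi$.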
7. On Stability over Exponential Times in Nonlinear Schrödinger Equations

We apply Theorem 6.2 to the nonlinear Schrödinger equation. We get

Lemma 7.1. Consider the Hamiltonian system (5.21). There exist $\mu_*$ and $C_4$ such that, if
\[ \mu := R + ||/R^5 \le \mu_* , \]
then there exists an analytic canonical transformation $\zeta = T(\zeta')$, with $T : G_{r/2} \to G_{3r/4}$ (recall that $r := R^{1.5}$), such that
\[ H\circ T = \sum_{i=1}^{N+1} \nu_i J_i + Z + \mathcal{R} , \tag{7.1} \]
where $Z$ is in normal form, i.e. $\{Z, J_i\} \equiv 0$ ($i = 1, \dots, N+1$), and $\mathcal{R}$ is exponentially small with $R$, namely satisfies
\[ \frac{1}{r} \sup_{\zeta'\in G_{r/2}} \|\nabla\mathcal{R}(\zeta')\| \preceq \mu R^2 \mu^{1/(\tau+1)} \exp\left[ -\left(\frac{C_4}{\mu}\right)^{1/(\tau+1)} \right] ; \tag{7.2} \]
moreover, for any analytic function $g : G_r \to B$, where $B$ is a Banach space, one has
\[ \sup_{\zeta'\in G_{r/2}} \|g(\zeta') - g(T(\zeta'))\|_B \preceq \mu^{1/(\tau+1)} \sup_{\zeta\in G_r} \|g(\zeta)\|_B . \tag{7.3} \]
Proof. We have
\[ \Phi^{l}_t(\zeta) = \big(I_k,\ \phi_k + (4 - \delta_{kl})\nu_l t,\ u_k e^{i4\nu_l t}\big) , \qquad l = 1, \dots, N , \]
\[ \Phi^{N+1}_t(\zeta) = \big(I_k,\ \phi_k + k^2 t,\ u_k e^{ik^2 t}\big) , \]
where $u_k = p_k + iq_k$. It follows
\[ \Psi_\psi(\zeta) = \Big(I_k,\ \phi_k - \psi_k + 4\sum_{l=1}^{N}\psi_l,\ u_k e^{i4\sum_{l=1}^{N}\psi_l}\Big) . \]
Then, choosing $\sigma$ small enough, hypothesis 4) holds, and $c_T \le 2$. Hypothesis 5) is also satisfied. By (5.23) one has $\omega_f \preceq r^2 + ||/r^2$. We come to (6.3); we now show that it holds with $\gamma = R^2\beta$ and any $M$ satisfying
\[ M \le \frac{1}{R^2 \sup_i \omega_i} - \beta . \tag{7.4} \]
The statement is obvious when $n = 0$. Consider the case $|n| \ge 1$; assuming $|k| \le M$, we have
\[ \Big| n + R^2 \sum_{i=1}^{N} \omega_i k_i \Big| \ge 1 - R^2 M \sup_i \omega_i , \]
and the r.h.s. is larger than $\beta R^2$ provided (7.4) holds. So we have $\mu = R + ||/R^5$. It is now clear that it is possible to choose $M$ in such a way that both (6.4) and (7.4) hold. So we can apply Theorem 6.2, and therefore the thesis holds.

Remark 7.2. Particular cases of (7.3) are
\[ |J_i\circ T - J_i| \preceq \mu^{1/(\tau+1)} R^3 . \tag{7.5} \]
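Remark 7.3. One has (by the Cauchy inequality)
\[ \sup_{\zeta'\in G_{r/2}} |\dot J_i(\zeta')| = \sup_{\zeta'\in G_{r/2}} |\{J_i, \mathcal{R}\}(\zeta')| \le \frac{2}{r} \sup_{\zeta'\in G_r} |J_i(\zeta')| \sup_{\zeta'\in G_{r/2}} \|\nabla\mathcal{R}(\zeta')\| \preceq R^5 \mu\, \mu^{1/(\tau+1)} \exp\left[ -\left(\frac{C_4}{\mu}\right)^{1/(\tau+1)} \right] . \tag{7.6} \]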
From Lemma 7.1 we obtain that the functions $J_i(\zeta')$ move exponentially slowly. We aim to use such functions as Lyapunov functions in order to show that the distance of the solution from the torus $I' = p' = q' = 0$ increases exponentially slowly. To this end we will need the following:

Lemma 7.4. For any $\rho > 0$ consider real $(p, q, I, \phi)$ and assume
\[ |J_i(\zeta)| \le \rho , \qquad i = 1, \dots, N+1 ; \tag{7.7} \]
then there exists a constant $C_{11}$ such that
\[ |I_k| \le C_{11}\rho , \quad k = 1, \dots, N , \qquad \sum_{i\ge N+1} i^2\, \frac{p_i^2 + q_i^2}{2} \le C_{11}\rho . \tag{7.8} \]
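The proof of this lemma hinges on the positivity of $(N+1)^2 - \alpha(N)$, where $\alpha(N) := 4\sum_{l=1}^{N} l^2/(4N-1)$. A minimal exact-arithmetic check of that claim is sketched below; the function names are illustrative, and the closed form $(N+1)(8N^2+7N-3)/(3(4N-1))$ used in the comments is obtained here by elementary algebra from the definition, not quoted from the paper.

```python
from fractions import Fraction


def sum_of_squares(n):
    """Closed form N*(N^2/3 + N/2 + 1/6) for 1^2 + ... + N^2, in exact arithmetic."""
    return n * (Fraction(n * n, 3) + Fraction(n, 2) + Fraction(1, 6))


def alpha(n):
    """alpha(N) := 4 * (sum of l^2 for l = 1..N) / (4N - 1)."""
    return 4 * sum_of_squares(n) / (4 * n - 1)


def gap(n):
    """(N+1)^2 - alpha(N); algebraically equal to (N+1)(8N^2+7N-3) / (3(4N-1))."""
    return (n + 1) ** 2 - alpha(n)


def lower_constant(n):
    """inf over i >= N+1 of (i^2 - alpha(N)) / i^2.

    The ratio equals 1 - alpha(N)/i^2 and increases with i, so the infimum
    is attained at i = N+1.
    """
    return gap(n) / (n + 1) ** 2


if __name__ == "__main__":
    for n in (1, 2, 5, 50):
        print(n, alpha(n), gap(n), float(lower_constant(n)))
```

Since $8N^2 + 7N - 3 > 0$ for every $N \ge 1$, the gap is positive for all $N$, which is what gives the constant $c > 0$ in the proof.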
Proof. Define
\[ A := 4\sum_{l=1}^{N} I_l , \qquad J := 4\sum_{k\ge N+1} \frac{p_k^2 + q_k^2}{2} , \]
so that one has $J_i = -I_i + A + J$, $i = 1, \dots, N$. The first $N$ of (7.7) are therefore equivalent to $A + J - \rho \le I_l \le A + J + \rho$; inserting these inequalities in the definition of $A$, solving with respect to $A$ and substituting again in the above inequalities we get
\[ -\frac{1}{4N-1} J - \frac{8N-1}{4N-1}\rho \le I_l \le -\frac{1}{4N-1} J + \frac{8N-1}{4N-1}\rho . \]
Inserting in the last of (7.7), and remembering the definition of $J$, we get
\[ \sum_{i\ge N+1} \big(i^2 - \alpha(N)\big) Z_i \le \Big(1 + \frac{8N-1}{4}\alpha(N)\Big)\rho , \]
where
\[ \alpha(N) := 4\,\frac{\sum_{l=1}^{N} l^2}{4N-1} . \]
Recalling that
\[ \sum_{l=1}^{N} l^2 = N\Big(\frac{N^2}{3} + \frac{N}{2} + \frac{1}{6}\Big) , \]
it is easy to see that $(N+1)^2 - \alpha(N)$ is positive, so that there exists a positive constant $c$ such that
\[ \inf_{i\ge N+1} \frac{i^2 - \alpha(N)}{i^2} \ge c > 0 . \]
From this the thesis follows.

Remark 7.5. The above lemma has an interesting geometrical interpretation. Indeed it can be shown to be equivalent to the statement that $J_{N+1}$ restricted to the surface $J_i = 0$, $i = 1, \dots, N$, has a minimum at zero.

Remark 7.6. By Lemma 7.4 there exists $C_{12}$ such that, provided $\rho$ is small enough,
\[ |J_i(\zeta)| \le \rho \implies d(\zeta, \mathcal{M}) \le C_{12}\Big(\frac{\rho}{r} + \sqrt{\rho}\Big) , \tag{7.9} \]
where $\mathcal{M}$ was defined by (5.16). We have now the following

Theorem 7.7. Consider the Hamiltonian system (5.21) with real initial data. There exist $\mu_*$, $C_4$, and $C_{13}$ such that, if $\mu \le \mu_*$, then corresponding to initial data such that
\[ \sum_{i=1}^{N} i^2 |I_i(\zeta(0))| + \sum_{i\ge N+1} i^2 Z_i(\zeta(0)) \le C_{13} R^3 \mu^{1/(\tau+1)} , \tag{7.10} \]
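one has
\[ \sum_{i=1}^{N} i^2 |I_i(\zeta(t))| + \sum_{i\ge N+1} i^2 Z_i(\zeta(t)) \preceq R^3 \mu^{1/(\tau+1)} \]
for all times satisfying
\[ |t| \preceq \frac{1}{R^2\mu} \exp\left[ \left(\frac{C_4}{\mu}\right)^{1/(\tau+1)} \right] . \tag{7.11} \]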
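To get a feeling for how fast the time scale in (7.11) grows as $\mu$ decreases, one can tabulate its right-hand side. The sketch below is purely illustrative: the constants $C_4$ and $\tau$ are not specified numerically in the text, so the defaults `c4=1.0` and `tau=1.0` are hypothetical, the implied constant in the estimate is set to 1, and the sample points take $R = \mu$, which is only one admissible choice since $\mu \ge R$.

```python
import math


def stability_time(mu, radius, c4=1.0, tau=1.0):
    """Right-hand side of (7.11) with the implied constant set to 1 (an assumption).

    mu     -- the small parameter of Lemma 7.1
    radius -- the amplitude parameter R
    c4     -- hypothetical value standing in for the constant C_4
    tau    -- hypothetical value standing in for the exponent tau
    """
    exponent = (c4 / mu) ** (1.0 / (tau + 1.0))
    return math.exp(exponent) / (radius ** 2 * mu)


if __name__ == "__main__":
    for mu in (1e-1, 1e-2, 1e-3):
        # Exponential time scale next to the purely polynomial factor 1 / (R^2 mu).
        print(mu, stability_time(mu, mu), 1.0 / (mu ** 3))
```

The comparison column shows that the exponential factor dominates any fixed inverse power of $\mu$ once $\mu$ is small, which is the sense in which the stability times are exponentially long.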
Proof. Equation (7.10) implies $|J_i(\zeta(0))| \preceq R^3 \mu^{1/(\tau+1)}$ for all $i = 1, \dots, N+1$, and $\zeta(0) \in G_{r/2}$, so we can apply Lemma 7.1, which implies (7.5), from which
\[ |J_i(\zeta'(0))| \preceq R^3 \mu^{1/(\tau+1)} , \tag{7.12} \]
and moreover (7.6) holds. It implies
\[ |J_i(\zeta'(t)) - J_i(\zeta'(0))| \preceq R^3 \mu^{1/(\tau+1)} , \tag{7.13} \]
provided $t$ satisfies (7.11) and $\zeta'(t) \in G_{r/2}$. We prove such an inclusion. By (7.12) and (7.13) it follows that $|J_i(\zeta'(t))| \preceq R^3 \mu^{1/(\tau+1)}$, which by (7.9) implies $\zeta'(t) \in G_{CR\mu^{1/2(\tau+1)}}$, which, provided $\mu$ is small enough, implies $\zeta'(t) \in G_{r/2}$. Using again (7.5) we get the thesis.

Theorem 2.1 is a simple corollary of Theorem 7.7. We have

Proof of Theorem 2.1. First we prove that (2.4) implies (7.10). Indeed, denoting by $u'$ the function whose coordinates are $\zeta$, and by $u := T_1(u')$, we have
\[ \sum_{i=1}^{N} i^2 |I_i| + \sum_{i\ge N+1} i^2\, \frac{p_i^2 + q_i^2}{2} = \sum_{i=1}^{N} i^2 |Z_i(u') - \nu_i| + \sum_{i\ge N+1} i^2 Z_i(u') \]
\[ \le \sum_{i=1}^{N} i^2 |Z_i(u) - Z_i(u')| + \sum_{i=1}^{N} i^2 |Z_i(u) - \nu_i| + \Big| \sum_{i\ge N+1} i^2 \big(Z_i(u') - Z_i(u)\big) \Big| + \sum_{i\ge N+1} i^2 Z_i(u) . \]
Using (5.14) and (5.15) we obtain
\[ \sum_{i=1}^{N} i^2 |I_i| + \sum_{i\ge N+1} i^2\, \frac{p_i^2 + q_i^2}{2} \preceq R^4 + \sum_{i\ge 1} |Z_i(u) - \nu_i| , \]
which shows that indeed (2.4) implies (7.10). Using the analogues of (5.14) and (5.15) for $T_1^{-1}$ it is clear that the converse estimate also holds, and therefore that Theorem 7.7 implies Theorem 2.1.

A slightly different description of the dynamics is given by the following:
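Theorem 7.8. There exist $\mu_*$ and $C_{14}$ such that, if $\mu \le \mu_*$, then there exists a manifold $\mathcal{N} \subset \mathcal{P}$ isomorphic to an $N$-dimensional torus, with the following properties:

1) It is stable over exponentially long times, namely $d(\zeta(0), \mathcal{N}) \le C_{14} r$ implies
\[ d(\zeta(t), \mathcal{N}) \preceq \sqrt{ r\, d(\zeta(0), \mathcal{N}) + |t| \mu R^5 \mu^{1/(\tau+1)} \exp\left[ -\left(\frac{C_4}{\mu}\right)^{1/(\tau+1)} \right] } , \tag{7.14} \]
for the times satisfying (7.11).

2) $\mathcal{N}$ is close to the manifold $\mathcal{M}$ defined by (5.16) (and therefore also to $\mathcal{M}_{R,\omega}$), namely one has
\[ \sup_{\zeta\in\mathcal{M}} d(\zeta, \mathcal{N}) \preceq r \mu^{1/2(\tau+1)} . \tag{7.15} \]

Proof. $\mathcal{N}$ is defined as the set such that $I'_k = p'_l = q'_l = 0$, $\forall\, k = 1, \dots, N$, $\forall\, l \ge N+1$. Denote for simplicity $\rho := d(\zeta(0), \mathcal{N})$; then, since the Lipschitz constant of $T$ is bounded, we have $d(\zeta'_0, \mathcal{N}) \preceq \rho$, with $\zeta'_0 := T^{-1}(\zeta(0))$; it follows, by the very definition of the norm and of $\mathcal{N}$, that $|J_i(\zeta'_0)| \preceq (r + \rho)\rho \preceq r\rho$, and, by (7.6),
\[ |J_i(\zeta'(t))| \preceq r\rho + |t| \mu R^5 \mu^{1/(\tau+1)} \exp\left[ -\left(\frac{C_4}{\mu}\right)^{1/(\tau+1)} \right] ; \]
using (7.9) we get
\[ d(\zeta'(t), \mathcal{N}) \preceq \sqrt{ r\rho + |t| \mu R^5 \mu^{1/(\tau+1)} \exp\left[ -\left(\frac{C_4}{\mu}\right)^{1/(\tau+1)} \right] } , \]
and, by the boundedness of the Lipschitz constant of $T_1^{-1}$, we get (7.14). To prove (7.15) use (7.3), which gives
\[ I_k = p_l = q_l = 0 \implies |I'_k| \preceq \mu^{1/(\tau+1)} R^3 , \qquad \sum_{i\ge N+1} i^2\, \frac{{p'_i}^2 + {q'_i}^2}{2} \preceq \mu^{1/(\tau+1)} R^3 , \]
from which the thesis immediately follows.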
Acknowledgement. I would like to thank Antonio Giorgilli for many fruitful discussions. This work has been developed with the partial support of the grant EC contract ERBCHRXCT940460 for the project “Stability and universality in classical mechanics”, and of the grant CE n. CHRX-CT96-0330/DG.
References

1. Bambusi, D., Giorgilli, A.: Exponential Stability of States Close to Resonance in Infinite Dimensional Hamiltonian Systems. J. Stat. Phys. 71, 569–606 (1993)
2. Bambusi, D.: A Nekhoroshev–Type Theorem for the Pauli–Fierz Model of Classical Electrodynamics. Ann. Inst. Henri Poincaré, Physique théorique 60, 339–371 (1994)
3. Bambusi, D.: Exponential Stability of Breathers in Hamiltonian Chains of Weakly Coupled Oscillators. Nonlinearity 9, 433–457 (1996)
4. Bambusi, D., Nekhoroshev, N.N.: A property of exponential stability in nonlinear wave equation near the fundamental linear mode. Preprint 16/1996, Dipartimento di Matematica, Università di Milano
5. Bourgain, J.: Construction of approximative and almost periodic solutions of perturbed linear Schrödinger and wave equation. GAFA 6, 201–230 (1995)
6. Bourgain, J.: Nonlinear Schrödinger equations. Preprint 1995
7. Craig, W., Wayne, C.E.: Newton's Method and Periodic Solutions of Nonlinear Wave Equations. Comm. Pure and Appl. Math. 46, 1409–1501 (1993)
8. Chernoff, M.P., Marsden, J.E.: Properties of Infinite Dimensional Hamiltonian Systems. Lect. Notes Math. 425, Berlin–Heidelberg–New York: Springer-Verlag, 1974
9. Giorgilli, A., Zehnder, E.: Exponential stability for time dependent potentials. J. Appl. Math. Phys. 43, 827–855 (1992)
10. Kuksin, S.B.: Nearly Integrable Infinite-Dimensional Hamiltonian Systems. Lect. Notes Math. 1556, Berlin–Heidelberg–New York: Springer, 1994
11. Kuksin, S.B., Pöschel, J.: Invariant Cantor manifolds of quasi-periodic oscillations for a nonlinear Schrödinger equation. Ann. of Math. 142, 149–179 (1995)
12. Nekhoroshev, N.N.: Behaviour of Hamiltonian systems close to integrable. Funct. Anal. and Appl. 5, 338–339 (1971)
13. Nekhoroshev, N.N.: Exponential estimate of the stability time of near integrable Hamiltonian systems. Russ. Math. Surv. 32 (6), 1–65 (1977)
14. Wayne, C.E.: Periodic and Quasi-periodic Solutions of Nonlinear Wave Equation via KAM Theory. Commun. Math. Phys. 127, 479–528 (1990)

Communicated by A. Kupiainen
Commun. Math. Phys. 189, 227 – 235 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997

Indefinite Kähler-Einstein Metrics on Compact Complex Surfaces

Jimmy Petean

Department of Mathematics, SUNY, Stony Brook, NY 11794-3651, USA. E-mail: [email protected]

Received: 6 December 1996 / Accepted: 17 March 1997

Abstract: We completely classify those compact complex surfaces which admit indefinite Ricci-flat Kähler metrics. Slightly weaker results are also obtained for indefinite Kähler-Einstein metrics with non-zero scalar curvature.
1. Introduction

A pseudo-Riemannian metric on a smooth manifold is called Einstein if the Ricci tensor of the Levi-Civita connection equals a scalar multiple of the metric. These equations first appeared as the vacuum case of the Einstein field equations "with cosmological constant" and were introduced by Einstein as a system of hyperbolic partial differential equations for an unknown Lorentzian-signature metric on a 4-manifold. Since then, mathematicians have been highly interested in these equations, but attention has been focused on the Riemannian case. While we still have only limited knowledge about general Riemannian solutions, strong results have been obtained about the existence of (positive definite) Kähler-Einstein metrics (cf. [1, 20]). In [3] the reader can find a detailed discussion of these topics and an extensive list of references.

In recent years, especially since the work of Ooguri and Vafa [17] on N = 2 string theory, indefinite Ricci-flat metrics of Kähler type on complex surfaces have attracted considerable attention from physicists. We will study indefinite Kähler-Einstein metrics on compact complex surfaces, focusing on the existence problem. We will completely classify surfaces admitting indefinite Ricci-flat metrics of Kähler type, and almost completely classify those admitting indefinite Kähler-Einstein metrics with non-zero Einstein constant. We will also display some non-locally-homogeneous examples. These examples will show that the moduli spaces of these metrics can be highly non-trivial and surprisingly different from those encountered in the positive definite case. In particular, we will see that indefinite Ricci-flat metrics on tori need not be flat.
Let us begin by considering a compact complex manifold $(M^{2n}, J)$. Here $M$ is a $2n$-dimensional smooth compact manifold and $J$ is an integrable almost complex structure on $M$. If $n = 2$, $M$ is called a (compact) complex surface. A pseudo-Riemannian metric $g$ on $M^{2n}$ is said to be Hermitian (or $J$-compatible) if $g(x, y) = g(Jx, Jy)$ for all $x, y$. At any point of the manifold one can choose an orthogonal basis of the tangent space of the form $\{x_1, Jx_1, \dots, x_n, Jx_n\}$; so, if $g$ is Hermitian, its signature is of the form $(2k, 2l)$. In particular if $M$ is a complex surface, any indefinite Hermitian metric on $M$ has signature $(2, 2)$. If $g$ is a Hermitian pseudo-Riemannian metric then $\omega(x, y) = g(Jx, y)$ is a 2-form, called the Kähler form of $g$.

Definition 1. A Hermitian pseudo-Riemannian metric $g$ is called Kähler if its Kähler form is closed. In particular, if $g$ is not positive or negative definite, it is called an indefinite Kähler metric.

Consider now the Levi-Civita connection $\nabla$ of $g$ on $M$. Assume that $g$ is Kähler; then $J$ is parallel with respect to $\nabla$. This is usually stated only in the Riemannian case, but it is not difficult to check that it is also valid in the indefinite case (by exactly the same proof). Let Ric be the Ricci tensor of $\nabla$. Then Ric is $J$-invariant and hence $\rho(x, y) = \mathrm{Ric}(Jx, y)$ is a 2-form. It is called the Ricci form of $g$. It is also true in the indefinite case that $-i\rho$ is the curvature of the canonical line bundle of $M$ (the bundle of holomorphic 2-forms); the proof is the same as in the Riemannian case. In particular $\rho$ is closed and the de Rham class $[\rho/2\pi]$ is equal to the first Chern class of $M$ in cohomology with real coefficients.

Definition 2. An indefinite Kähler metric $g$ on $M$ is called indefinite Kähler-Einstein if there exists $\lambda \in \mathbb{R}$ such that $\mathrm{Ric} = \lambda g$ (or $\rho = \lambda\omega$). In this case $\lambda$ is called the Einstein constant.

If $g$ is an indefinite Kähler-Einstein metric on $M$ and $k \in \mathbb{R}$ is nonzero, then $\hat g = kg$ is also an indefinite Kähler-Einstein metric (even if $k < 0$). The Kähler form of $\hat g$ is $\hat\omega = k\omega$ while the Ricci form is $\hat\rho = \rho$. If $\rho = \lambda\omega$, then $\hat\rho = (\lambda/k)\hat\omega$. Without loss of generality, we may therefore assume that $\lambda$ is either 0 or 1. Indefinite Kähler-Einstein metrics on compact complex surfaces are the object of study of this paper. The following are the simplest examples.

Complex Tori. Let $M = \mathbb{C}^2/\Lambda$ be a complex 2-dimensional torus. Let $z_1, z_2$ be the standard coordinates on $\mathbb{C}^2$. The 1-forms $dz_1, dz_2, d\bar z_1, d\bar z_2$ then descend to $M$. If $A = (a_{jk})$ is a $2\times 2$ (constant) Hermitian non-degenerate matrix, then $\omega = \sum a_{jk}\, dz_j \wedge d\bar z_k$ defines a closed, real, (1,1)-form on $M$. So $\omega$ is the Kähler form of a Kähler metric $g$. Moreover, this pseudo-metric is flat. If we choose $A$ to be indefinite, then $g$ is an indefinite Kähler-Einstein metric on $M$ with Einstein constant 0.

Minimal Ruled Surfaces. Let $S$ be a Riemann surface of genus $g \ge 2$. There is a unique Riemannian metric $h_1$ compatible with the complex structure of $S$ with constant scalar curvature $-2$; $h_1$ is a Kähler-Einstein metric on $S$ with Einstein constant $-1$. In the same way we have a Kähler-Einstein metric $h_2$ on $\mathbb{CP}^1$ with Einstein constant 1. Then $h_2 - h_1$ is a well defined indefinite Kähler-Einstein metric on $M = \mathbb{CP}^1 \times S$ with Einstein constant 1. The general ruled surface is of the form $P(E)$, where $E$ is a 2-dimensional complex vector bundle over a Riemann surface $S$. We will later construct indefinite Kähler-Einstein metrics on "most" of these twisted products (assuming always that the genus of $S$ is greater than 1).
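For the complex torus example, whether the resulting metric is definite or indefinite can be read off from the eigenvalues of the Hermitian matrix $A$: each complex eigen-direction contributes a real 2-plane, so the real signature is $(2p, 2q)$ when $A$ has $p$ positive and $q$ negative eigenvalues. The sketch below illustrates this for $2\times 2$ matrices; the function names and the tolerance are illustrative assumptions, and the overall sign and normalization convention relating $\omega$ to $g$ is ignored.

```python
import math


def hermitian_eigenvalues(a11, a12, a22):
    """Eigenvalues of the Hermitian matrix [[a11, a12], [conj(a12), a22]].

    a11 and a22 are real, a12 may be complex. The discriminant equals
    (a11 - a22)^2 + 4|a12|^2, so it is never negative.
    """
    trace = a11 + a22
    det = a11 * a22 - abs(a12) ** 2
    disc = math.sqrt(trace * trace - 4.0 * det)
    return ((trace - disc) / 2.0, (trace + disc) / 2.0)


def real_signature(a11, a12, a22, tol=1e-12):
    """Signature (positive, negative) of the real form v -> v* A v on C^2 = R^4."""
    eigenvalues = hermitian_eigenvalues(a11, a12, a22)
    if any(abs(value) < tol for value in eigenvalues):
        raise ValueError("matrix is degenerate")
    positive = sum(1 for value in eigenvalues if value > 0)
    negative = 2 - positive
    # Each complex eigen-direction is a real 2-plane, hence the doubling.
    return (2 * positive, 2 * negative)


if __name__ == "__main__":
    print(real_signature(1.0, 0.0, -1.0))  # diagonal indefinite example
    print(real_signature(0.0, 1.0, 0.0))   # off-diagonal indefinite example
    print(real_signature(2.0, 1j, 3.0))    # positive definite example
```

An indefinite non-degenerate $A$ has negative determinant, so exactly one eigenvalue of each sign, which reproduces the signature $(2, 2)$ mentioned for complex surfaces.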
More examples (including non-locally-homogeneous ones) will be presented in the last section. The following theorem determines on which surfaces solutions could be found.

Theorem 1. Let $M$ be a compact complex surface. If $M$ admits an indefinite Kähler-Einstein metric, then $M$ is one of the following:
a) a complex torus;
b) a hyperelliptic surface;
c) a primary Kodaira surface;
d) a minimal ruled surface over a curve of genus $g \ge 2$; or
e) a minimal surface of class $VII_0$ with no global spherical shell, and with second Betti number even and positive.

Remark 1. No surface of type (e) is known, and it has been conjectured that they simply do not exist (cf. [14, Sect. 5]). Moreover, if such a surface existed and admitted an indefinite Kähler metric, then providing it with the opposite orientation would yield a symplectic manifold with $b^+ > 1$ violating the Bogomolov inequality $2\chi \ge 3\tau$; no such symplectic manifold is known at present.

Remark 2. We will display indefinite Kähler-Einstein metrics with Einstein constant 0 on the surfaces (a), (b) and (c) and with Einstein constant 1 on "most" surfaces of type (d).
2. Indefinite Kähler Metrics

A natural question to consider, independently of the Einstein equations, is the existence of indefinite Kähler metrics on complex manifolds. Our main tool to study this problem in the case of compact complex surfaces will be the Seiberg–Witten invariants introduced in [19]. Let us first fix some notation. $M$ will always be a compact complex surface and $\overline{M}$ will mean the smooth manifold $M$ provided with the non-standard orientation. As usual $b_k(M)$ denotes the $k$th Betti number of $M$ and $b^+(M)$ ($b^-(M)$) the dimension of a maximal subspace of $H^2(M, \mathbb{R})$ on which the intersection form is positive (negative) definite. So $b_2 = b^+ + b^-$ and the signature of $M$ is $\tau(M) = b^+ - b^-$. The blow-up of $M$ is denoted by $\widehat{M}$, and $M$ is called minimal if it cannot be obtained as the blow-up of another surface.

If $M$ admits an indefinite Kähler metric, its Kähler form $\omega$ is a symplectic form compatible with the orientation of $\overline{M}$. In particular $b^-(M) > 0$. But much more can be said: the work of Taubes [18] shows that the Seiberg-Witten invariant of the canonical $\mathrm{Spin}^c$ structure induced by $\omega$ on $\overline{M}$ is different from 0. This turns out to be a very strong obstruction, as we can see in the following lemma.

Lemma 1. If a compact complex surface admits an indefinite Kähler metric, then it is minimal or a one-point blow-up of $\mathbb{CP}^2$.

Proof. Assume that $N = \widehat{M}$ admits an indefinite Kähler metric. From the discussion above we know that there is at least one $\mathrm{Spin}^c$ structure on $\overline{N}$ with non-trivial Seiberg-Witten invariant. All the Seiberg-Witten invariants of a connected sum of manifolds
vanish unless one of them has a negative-definite intersection form [19, 18]. Since $\overline{N} = \overline{M} \# \mathbb{CP}^2$ we must have $b^-(M) = 0$ (in particular $M$ is minimal) and hence $b_1(M) = b_1(N)$ must be even (recall that the symplectic form produces an almost complex structure on $\overline{N}$). Now we invoke the classification of compact complex surfaces; cf. e.g. [2]. If $\mathrm{Kod}(M)$ is 0 or 1, then $c_1^2(M) = 0$ and the formula $c_1^2 + 8q + b^- = 10p_g + 9$ would imply $b^-(M) > 0$. If $\mathrm{Kod}(M) = -\infty$ the only possibility is $M = \mathbb{CP}^2$. If $\mathrm{Kod}(M) = 2$, then $0 < c_1^2 = 2c_2 + 3\tau \le 3c_2$. This implies that $b_1(M) = 0$ and $b_2(M) = 1$. But then it is known that $M$ is a quotient of the unit ball in $\mathbb{C}^2$ (cf. [2, p.136]). By a theorem of Mal'tsev (cf. [21, p.151]) the fundamental group of $M$ is then "residually finite" and hence $M$ admits non-trivial finite coverings. Consider a covering of order $k > 1$; then $N$ is covered by a surface which is a $k$-fold blow-up. Such a surface cannot admit an indefinite Kähler metric and so neither can $N$.

Remark 3. The blow-up of $\mathbb{CP}^2$ at one point is a ruled surface and any ruled surface $M$ admits an indefinite Kähler metric: let $\pi : E \to S$ be a 2-dimensional holomorphic vector bundle over a Riemann surface $S$ and $M = P(E)$. Given a Hermitian metric on $E$ and a Kähler form $\omega_0$ on $S$, a sign variation on a well known form gives $\omega = \pi^*(\omega_0) - is\,\partial\bar\partial \log \|W\|$, which, for small $s$, is the Kähler form of an indefinite Kähler metric on $M$.

Now we have to study which minimal complex surfaces do admit indefinite Kähler metrics. The Kodaira classification (cf. [2]) will be of much help. The previous remarks deal with the surfaces of Kodaira number $-\infty$ of Kähler type. Those which are not of Kähler type are called surfaces of class $VII_0$. The following lemma shows that none of the known examples admits such a metric. Nevertheless the classification here is not complete and we cannot decide if new examples could hold indefinite Kähler metrics; but, as we said in the introduction, it seems very unlikely.

Lemma 2. If $M$ is a surface of class $VII_0$ with a global spherical shell (cf. [14]) and $b_2(M) = b^-(M) > 0$, then $M$ does not admit an indefinite Kähler metric.

Proof. Such a surface is diffeomorphic to the connected sum of $S^1 \times S^3$ with $b_2(M)$ copies of $\overline{\mathbb{CP}}^2$ (cf. [14]). Then $M$ admits Riemannian metrics of positive scalar curvature; hence all the Seiberg-Witten invariants of $\overline{M}$ vanish and the lemma follows from the previous remarks.

Now we turn our attention to elliptic surfaces:

Proposition 1. A minimal elliptic surface of Kähler type admits an indefinite Kähler metric if and only if its Euler characteristic is 0.

Proof. The smooth structure of an elliptic surface $M$ with positive Euler characteristic $12d$ is determined by its base orbifold and $d$ (cf. [5]). For any orbifold and any $d > 0$ it is easy to construct (as in [9]) an elliptic surface with constant $j$-invariant, only singular fibers of types $mI_0$ and $I_0^*$, and the given base orbifold and Euler characteristic. It follows then that $\overline{M}$ has embedded 2-spheres of self-intersection 2. Note also that $b^+(\overline{M}) > 1$. This implies that $M$ does not admit indefinite Kähler metrics because all the Seiberg-Witten invariants of a 4-manifold $X$ that has embedded 2-spheres with positive self-intersection vanish (cf. [4, 10]). Indeed, if $X$ has a $\mathrm{Spin}^c$ structure with non-trivial Seiberg-Witten invariant so does $X \# \overline{\mathbb{CP}}^2$ [4]. We can then assume that $X$ has
an embedded sphere $S$ whose cohomology class is non-trivial and has self-intersection 0. Let $E$ be the exceptional curve in $X \# \overline{\mathbb{CP}}^2$ (i.e. $E = \mathbb{CP}^1 \subset \overline{\mathbb{CP}}^2$). For any positive integer $k$ the cohomology class of $kS + E$ can be represented by an embedded sphere of self-intersection $-1$ and hence there is a diffeomorphism of $X \# \overline{\mathbb{CP}}^2$ realizing the reflection on the orthogonal complement of this class (with respect to the intersection form). It is easy to check that these diffeomorphisms produce infinitely many different $\mathrm{Spin}^c$ structures with non-trivial Seiberg-Witten invariants. But this is a contradiction, since Witten has observed [19] that the number of $\mathrm{Spin}^c$ structures with non-trivial invariants is always finite.

Now assume that the Euler characteristic of $M$ is 0. We will construct an indefinite Kähler metric on $M$. First note that $M$ has only singular fibers of type $mI_0$. Let $\pi : M \to S$ be the projection onto the base curve and $\omega$ be a Kähler form on $M$. At any $p \in M$ the fiber of $\pi$ through $p$ is smooth and hence it has a tangent plane; this is of course contained in the kernel of $\pi_* : T_pM \to T_{\pi(p)}S$. It is not equal to the kernel exactly when $p$ is in a multiple fiber. Assume that this is the case; there exists a neighbourhood $U$ of $\pi(p)$ such that $\pi^{-1}(U)$ is isomorphic to the quotient of the product $V \times T$ of the unit disc $V$ and a torus $T$ by the action of a finite cyclic group $G$ generated by an automorphism $\chi$ of the form $\chi(z, t) = (e^{2i\pi/m}z, t + h(z))$ (cf. [8]). Let $z$ be a holomorphic coordinate in $V$ and $f$ be a smooth positive function of $\|z\|$ with compact support in $V$ which is equal to 1 in a neighbourhood of 0. Consider $\psi_p = f\, dz \wedge d\bar z$; $\psi_p$ is invariant under $G$ and so descends to $\pi^{-1}(U)$. Since this form has compact support in $\pi^{-1}(U)$, it can be extended to the whole of $M$. Construct such a form for each multiple fiber. Summing up these forms and the pull-back of a Kähler form on $S$ we get a (1,1)-form $\hat\psi$ on $M$ which is closed, vanishes on the tangent plane to the fibers and is strictly positive in the orthogonal plane. For a big positive constant $\lambda$, $\bar\omega = \omega - \lambda\hat\psi$ is then the Kähler form of an indefinite Kähler metric on $M$.

Finally, we turn our attention to a surface $M$ of general type. In this case, it has been proved (cf. [10], [13]) that the existence of non-trivial Seiberg-Witten invariants for $\overline{M}$ implies that $\tau(M) \ge 0$; if $b^+(\overline{M}) = 1$ then clearly $\tau(M) \ge 0$, so we can assume that $b^+(\overline{M}) > 1$. Hence, for any Riemannian metric $g$ on $\overline{M}$ we have (cf. [12])
\[ \int_M s_g^2\, d\mathrm{vol}_g \ge 32\pi^2 c_1^2(\overline{M}) . \]
When $M$ is a minimal surface of general type with no $(-2)$-sphere (which is guaranteed by the presence of a non-trivial Seiberg-Witten invariant for $\overline{M}$), $c_1(M)$ is negative-definite and hence [20] $M$ admits a (Riemannian) Kähler-Einstein metric $g_0$. For this metric
\[ \int_M s_{g_0}^2\, d\mathrm{vol}_{g_0} = 32\pi^2 c_1^2(M) . \]
This shows that $c_1^2(\overline{M}) \le c_1^2(M)$ and hence $\tau(M) \ge 0$.

Let us now summarize the results of this section:

Theorem 2. Suppose $M$ admits an indefinite Kähler metric. Then
i) If $\mathrm{Kod}(M) = -\infty$, then $M$ is a ruled surface or is as in Theorem 1 (e).
ii) If $\mathrm{Kod}(M) = 0$, then $M$ is a torus, a hyperelliptic surface or a primary Kodaira surface.
iii) If $\mathrm{Kod}(M) = 1$, then $M$ is minimal and $\tau(M) = 0$.
iv) If $\mathrm{Kod}(M) = 2$, then $M$ is minimal and $\tau(M)$ is non-negative and even.

We have seen that ruled surfaces and the surfaces in (iii) of Kähler type do admit indefinite Kähler metrics. We will see in the next section that all the surfaces in (ii) actually admit indefinite Kähler-Einstein metrics. The existence of indefinite Kähler metrics in the other cases remains unknown.

3. Indefinite Kähler-Einstein Metrics

We will first see that the existence of an indefinite Kähler-Einstein metric completely determines the Kodaira number of a compact complex surface. Together with the results of the last section, this will prove Theorem 1.

Proposition 2. If $M$ admits an indefinite Kähler-Einstein metric with Einstein constant $\ne 0$, then $\mathrm{Kod}(M) = -\infty$ and $c_1^2 < 0$.

Proof. If $M$ admits such a metric then its Ricci form $\rho = k\omega$ is everywhere non-degenerate and indefinite. If $\gamma \in \mathcal{O}(K^m)$ is not trivial then
\[ \rho = \frac{1}{im}\, \partial\bar\partial \log |\gamma|^2 \]
would be semi-negative where $|\gamma|$ attains its maximum. Hence, for all $m > 0$, $K^m$ has no non-trivial global section, and $\mathrm{Kod}(M) = -\infty$. The second assertion follows from the facts that $[\rho] = 2\pi c_1$ and $\omega \wedge \omega < 0$.

Corollary 1. If $M$ admits an indefinite Kähler-Einstein metric with Einstein constant $\ne 0$, then $M$ is as in (d) or (e) of Theorem 1.

Proposition 3. If $M$ admits an indefinite Kähler-Einstein metric with Einstein constant 0, then $\mathrm{Kod}(M) = 0$ and $c_1(M, \mathbb{R}) = 0$.

Proof. Suppose that $M$ admits such a metric $g$. Then $c_1(M, \mathbb{R}) = 0$ and $M$ must be minimal. The only surfaces with Kodaira number $-\infty$ and vanishing real first Chern class are the minimal surfaces of class $VII$ with second Betti number 0, which do not admit indefinite Kähler metrics. So we can assume that there exist $m > 0$ and $\gamma \in \mathcal{O}(K_M^m)$ non-trivial. Let $\widetilde{M}$ be the universal covering of $M$. The pull-back of $g$ gives an indefinite Kähler-Einstein metric on $\widetilde{M}$ (with Einstein constant 0). Since this metric is Ricci flat, there are holomorphic 2-forms of constant length in a neighborhood of any point (this fact is usually stated only in the Riemannian case, but it is not difficult to check that the proof also works in the indefinite case). Since $\widetilde{M}$ is simply connected it then admits a global non-trivial holomorphic 2-form $\varphi$ of constant length. The pull-back $\hat\gamma$ of $\gamma$ can be written $\hat\gamma = f\varphi^m$ for some holomorphic function $f$ on $\widetilde{M}$ of bounded length. Hence $f$ is constant and $\|\gamma\|$ is constant. Then $\gamma$ is never zero and $K_M^m$ is trivial. It follows that $\mathrm{Kod}(M) = 0$.

Corollary 2. If $M$ admits an indefinite Kähler-Einstein metric with Einstein constant 0, then $M$ is as in (a), (b) or (c) of Theorem 1.
By now we have already proved Theorem 1. The only thing remaining is to display the promised examples.

Non flat solutions on Tori. On the torus $M = \mathbb{C}/\Lambda_1 \times \mathbb{C}/\Lambda_2$ consider
\[ \gamma = f(z)\, dz \wedge d\bar z + dz \wedge d\bar w + dw \wedge d\bar z , \]
where $z$ and $w$ are holomorphic coordinates on each complex plane and $f$ is a smooth positive function on $\mathbb{C}/\Lambda_1$. It is clear that $\gamma$ defines an indefinite Kähler metric $g$ on $M$. We will now compute the curvature of $g$. If $z = x_1 + ix_2$ and $w = x_3 + ix_4$, then $(\partial/\partial x_i)$ is the standard basis of the tangent space of $\mathbb{R}^4$. This basis, of course, descends to $M$ and one can apply the Gram-Schmidt process to it to get a basis $(v_i)$ orthonormal with respect to $g$. Let $(e_i)$ be the basis of $T^*M$ dual to $(v_i)$. The canonical orientation of $M$ and $g$ induce a splitting of the space of 2-forms $\Lambda^2(T^*M) = \Lambda^+ \oplus \Lambda^-$ into self-dual and anti-self-dual 2-forms. And
\[ \phi_1 = e_1 \wedge e_2 + e_3 \wedge e_4 , \qquad \phi_2 = e_1 \wedge e_3 + e_2 \wedge e_4 , \qquad \phi_3 = e_1 \wedge e_4 - e_2 \wedge e_3 \]
form an orthogonal basis of the space of self-dual forms. Considering the curvature tensor $R$ as a section of $\mathrm{End}(\Lambda^2(T^*M))$ and taking into account the splitting we have
\[ R = \begin{pmatrix} A & B \\ B^* & D \end{pmatrix} . \]
Direct computations show that $B = D = 0$ and, in the basis of $\Lambda^+$ considered above,
\[ A = (2f^{-2}\Delta f) \begin{pmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ 1 & 0 & -1 \end{pmatrix} , \]
so the metric is Einstein, and is locally homogeneous (and flat) iff $f$ is constant. It is interesting to compare this computation with [11]. Our metric is Einstein on a manifold with vanishing Euler characteristic; nevertheless the metric is not flat. This is allowed because, as shown in this example, a symmetric operator $A$ with respect to a metric of signature (2,2) (instead of a Riemannian metric) can satisfy $\mathrm{trace}(A^2) = 0$ while $A \ne 0$.

Primary Kodaira surfaces. As described in [8, p.786] such a surface $M$ is of the form $M = \mathbb{C}^2/G$, where $G = \langle \psi_1, \psi_2, \psi_3, \psi_4 \rangle$, each $\psi_i$ is an affine automorphism of $\mathbb{C}^2$ and $G$ is fixed point free. More precisely $\psi_i$ is of the form
\[ \psi_i(w_1, w_2) = (w_1 + \alpha_i,\ w_2 + \bar\alpha_i w_1 + \beta_i) , \]
where $\alpha_i, \beta_i \in \mathbb{C}$ and $\alpha_1 = \alpha_2 = 0$. Let $S$ be the torus given by the lattice $\langle \alpha_3, \alpha_4 \rangle$ and $f$ be a smooth function on $S$. On $\mathbb{C}^2$ consider
\[ \gamma = \big(f(w_1) - 2\,\mathrm{Re}(w_1)\big)\, dw_1 \wedge d\bar w_1 + dw_1 \wedge d\bar w_2 + dw_2 \wedge d\bar w_1 . \]
The same computations as in the torus show that $\gamma$ defines an indefinite Kähler-Einstein metric on $\mathbb{C}^2$ which is homogeneous (and flat) only when $f$ is constant. Moreover, $\gamma$ is invariant under the $\psi_i$'s and hence the metric descends to $M$.

Hyperelliptic surfaces. It is shown in [6, p.585] that any hyperelliptic surface $M$ is of the form $M = (F \times C)/G$, where $F$ and $C$ are elliptic curves and $G$ is a finite group of fixed-point-free automorphisms of $F \times C$. Moreover, let $F = \mathbb{C}/\Lambda$ with $\Lambda = \langle 1, \tau \rangle$;
then $G = \langle \phi, \varphi \rangle$, where $\phi$ is of the form $\phi(z, w) = (z + \tau/m,\ e^{2k\pi i/m} w)$ and $\varphi$ is a translation of order $m$. If $z, w$ are the standard holomorphic coordinates in $\mathbb{C}^2$ then $dz \wedge d\bar z - dw \wedge d\bar w$ is the Kähler form of an indefinite Kähler flat metric (on $\mathbb{C}^2$). This form is invariant under translations and so projects to a (1,1)-form on $F \times C$. A direct computation shows that this form is invariant under $\phi$ and $\varphi$ and hence defines a (1,1)-form on $M$; this is the Kähler form of an indefinite Kähler-Einstein metric on $M$ with Einstein constant 0.

Minimal irrational ruled surfaces. Now let $M = P(E)$, where $E$ is a 2-dimensional holomorphic vector bundle over a curve $S$ of genus $g \ge 2$. We will construct indefinite Kähler-Einstein metrics on $M$ when the bundle $E$ is stable or the direct sum of two line bundles of the same degree (see [7], [16]). Note that given vector bundles $E$ and $\widehat{E}$, $P(E)$ and $P(\widehat{E})$ are isomorphic if and only if $\widehat{E} = E \otimes L$ for a line bundle $L$; and that $\widehat{E}$ satisfies any of the conditions above if and only if $E$ does. So both conditions are really properties of $M$.

Consider $M$ as a $\mathbb{CP}^1$-bundle over $S$. Let $(U_i)_{i=1}^N$ be an open cover of $S$ and $g_{ij} : U_i \cap U_j \to GL(2, \mathbb{C})$ be a set of transition functions for $E$. Then $[g_{ij}] : U_i \cap U_j \to PGL(2, \mathbb{C})$ are transition functions for $M$. Under the conditions stated above, Narasimhan and Seshadri [16] proved that $M$ admits constant transition functions in $P(U(2))$. Let $g_1$ be the Fubini-Study metric on $\mathbb{CP}^1$; then $g_1$ is a Kähler-Einstein metric on $\mathbb{CP}^1$, invariant under the action of $P(U(2))$. Renormalize $g_1$ so that the Einstein constant is 1 and let $g_2$ be a Kähler-Einstein metric on $S$ with Einstein constant $-1$. Then $g_1 - g_2$ is invariant under the transition functions and so defines an indefinite Kähler-Einstein metric on $M$ with Einstein constant 1.

Remark 4. In [15, p.395] M.S. Narasimhan and S. Ramanan proved that every vector bundle (over a curve of genus greater than 1) can be "approximated" by stable vector bundles. A little more precisely, every vector bundle is contained in an analytic family of vector bundles for which the set of stable bundles is open and dense. The cases considered above therefore contain "most" of the minimal ruled surfaces (over curves of genus greater than 1).

Acknowledgement. The author most gratefully thanks his supervisor, Claude LeBrun, for his continuing help and support during the preparation of this paper.
Note added in proof. Concerning Theorem 2 (iv), note that there exist surfaces of general type with strictly positive signature which admit indefinite Kähler metrics. Examples of these are the fibre bundles constructed by Kodaira (“A certain type of irregular Algebraic Surfaces”, J. Anal. Math. 19, 207–215, 1967) and Atiyah (“The signature of fibre-bundles”, Global Analysis, Univ. of Tokyo Press, Tokyo, 1969, 73–84).
References
1. Aubin, T.: Equations du type de Monge-Ampere sur les varietes kähleriennes compactes. C.R. Acad. Sci. Paris 283, 119–121 (1976)
2. Barth, W., Peters, C., Van de Ven, A.: Compact Complex Surfaces. Berlin–Heidelberg–New York: Springer-Verlag, 1984
3. Besse, A.: Einstein Manifolds. Berlin–Heidelberg–New York: Springer-Verlag, 1987
4. Fintushel, R., Stern, R.J.: Immersed spheres in 4-dimensions and the immersed Thom conjecture. Proc. of the Gokova Geometry-Topology Conference 1994, Turkish J. Math. 19 (2), 27–39 (1995)
5. Friedman, R., Morgan, J.W.: Smooth 4-manifolds and Complex Surfaces. Berlin–Heidelberg–New York: Springer-Verlag, 1991
6. Griffiths, P., Harris, J.: Principles of Algebraic Geometry. New York: Wiley, 1978
7. Kobayashi, S.: Differential Geometry of Complex Vector Bundles. Tokyo–Princeton, NJ: Iwanami Shoten and Princeton Univ. Press, 1987
8. Kodaira, K.: On the structure of compact complex analytic surfaces I. Am. J. Math. 86, 751–798 (1964)
9. Kodaira, K.: On Compact Analytic Surfaces II. Ann. of Math. 77, 563–626 (1963)
10. Kotschick, D.: Orientations and Geometrizations of Compact Complex Surfaces. Preprint
11. Law, P.R.: Neutral metrics in four dimensions. J. of Math. Phys. 32 (11), 3039–3042 (1991)
12. LeBrun, C.: Polarized 4-manifolds, extremal Kähler metrics, and Seiberg-Witten theory. Math. Res. Lett. 2, 653–662 (1995)
13. Leung, N.C.: Seiberg-Witten invariants and uniformizations. Math. Ann. 306, 31–46 (1996)
14. Nakamura, I.: Towards classification of Non-Kählerian complex surfaces. Sugaku Expositions 2, 209–229 (1989)
15. Narasimhan, M.S., Ramanan, S.: Deformations of the moduli space of vector bundles over an algebraic curve. Ann. of Math. 101, 391–417 (1975)
16. Narasimhan, M.S., Seshadri, C.S.: Stable and unitary bundles on a compact Riemann surface. Ann. of Math. 82, 540–567 (1965)
17. Ooguri, H., Vafa, C.: Self-Duality and N = 2 String magic. Mod. Phys. Lett. A Vol. 5, no 18, 1389–1398 (1990)
18. Taubes, C.H.: The Seiberg-Witten invariants and symplectic forms. Math. Res. Lett. 1, 809–822 (1994)
19. Witten, E.: Monopoles and Four-manifolds. Math. Res. Lett. 1, 769–796 (1994)
20. Yau, S.T.: On Calabi’s conjecture and some new results in algebraic geometry. Proc. Nat. Acad. Sci. USA 74, 1798–1799 (1977)
21. Zalesskij, A.E.: Encyclopedia of Mathematical Sciences. Vol. 37, Part II. Berlin–Heidelberg–New York: Springer-Verlag, 1993
Communicated by A. Jaffe
Commun. Math. Phys. 189, 237 – 257 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997
Computer-Assisted Bounds for the Rate of Decay of Correlations
Gary Froyland*
Department of Mathematics, The University of Western Australia, Nedlands WA 6907, Australia. E-mail: [email protected]
Received: 8 January 1997 / Accepted: 20 March 1997
Abstract: The rate of decay of correlations quantitatively describes the rate at which a chaotic system “mixes” the state space. We present a new rigorous method to estimate a bound for this rate of mixing. The technique may be implemented on a computer and is applicable to both multidimensional expanding and hyperbolic systems. The bounds produced are significantly less conservative than current rigorous bounds. In some situations it is possible to approximate resonant eigenfunctions and to strengthen our bound to an estimate of the decay rate. Order of convergence results are stated.
1. Introduction and Motivation
One of the major concerns in dynamical systems and ergodic theory is the rate at which systems settle down into their regular (statistically speaking) behaviour. Let T : M → M govern a discrete dynamical system on a compact Riemannian manifold M. We assume that our system (M, T) has a unique asymptotic distribution for a “large” set of initial points, that is, (1/n) Σ_{i=0}^{n−1} δ_{T^i x} → µ weakly as n → ∞ for Lebesgue almost all x. The measure µ is called the “physical” or “natural” invariant measure of the system (M, T) [2, 18]. We are interested in the rate at which an initial concentration of mass in phase space approaches the distribution given by µ under the evolution of T.
Definition 1.1. The system (T, µ) is called mixing if for each pair of Borel sets A, B ∈ B(M),
    \lim_{k\to\infty} \mu(A \cap T^{-k}B) = \mu(A)\,\mu(B). (1)
* URL: http://maths.uwa.edu.au/~gary/
We may rewrite this equation as lim_{k→∞} µ(A ∩ T^{-k}B)/µ(A) = µ(B), and interpret it as saying that the probability (according to µ) of a point x moving from a set A to a set B
in k iterations, given that x ∈ A, is (for k large) roughly the probability of x being in B, or the “size” of B (according to µ). As this equation holds for any A, we see that T^{-k}B is being evenly dispersed throughout phase space according to the physical measure µ. It is the rate at which this dispersion occurs that we are concerned with.
In this paper we shall assume that T is at least piecewise C^1 with Hölder continuous derivative. As we are after rigorous results, T is assumed to be either (i) expanding or (ii) transitive Anosov, and to possess an absolutely continuous invariant measure. For manifolds M of dimension d ≥ 2, we call a map T expanding if ‖D_xT(v)‖ > ‖v‖ for all x ∈ M, v ∈ T_xM. It is known (Mañé [15] and Bowen [2], for example) that µ is the unique absolutely continuous invariant measure, that (T, µ) is mixing, and that the density h : M → R^+ of µ is positive and at least Hölder continuous.
Definition 1.2. Let ϕ, ψ : M → R be C^1 test functions. We define the correlation function of ϕ and ψ as
    C_{\varphi,\psi}(k) = \left| \int_M \varphi \circ T^k \cdot \psi \, d\mu - \int_M \varphi \, d\mu \cdot \int_M \psi \, d\mu \right|. (2)
By putting ϕ = χ_B and ψ = χ_A, we see that (2) reduces to (1), and that C_{χ_A,χ_B}(k) → 0 as k → ∞. In fact, by approximating ϕ and ψ by step functions it may be shown that C_{ϕ,ψ}(k) → 0 as k → ∞ for all ϕ, ψ ∈ L^2(M, µ) is equivalent to (T, µ) mixing. We shall, however, restrict ϕ and ψ to be at least piecewise continuously differentiable, as we are then guaranteed that C_{ϕ,ψ}(k) approaches zero exponentially fast with k (Bowen [2], Ruelle [18]). The test functions ϕ and ψ may be thought of as physical observables of our dynamical system. The special case of ϕ = ψ gives what is known as the autocorrelation function:
    C_{\varphi,\varphi}(k) = \left| \int_M \varphi \circ T^k \cdot \varphi \, d\mu - \left( \int_M \varphi \, d\mu \right)^2 \right|. (3)
C_{ϕ,ϕ}(k) gives an indication of how much the observable ϕ at time t = 0 is correlated with itself at time t = k.
Remark 1.3. It is reasonable to suppose that our physical observables vary smoothly in phase space. If we choose our observables ϕ and ψ from the larger spaces L^2(M, µ) or C^0(M, R), pathological examples may be found for which the rate of decay of correlations is very slow (subexponential). For instance, in the case where M = T^2 is the 2-torus, T : T^2 → T^2 is defined by the linear map T(x_1, x_2) = (2x_1 + x_2, x_1 + x_2) (mod 1), and µ is normalised Lebesgue measure, Crawford and Cary [5] construct a C^0 test function ϕ for which C_{ϕ,ϕ}(k) = 2/k^2.
Since (2) approaches zero at an exponential rate for ϕ, ψ ∈ C^1(M, R), one may find constants C = C(ϕ, ψ) and 0 < r_0 < 1 such that C_{ϕ,ψ}(k) ≤ C(ϕ, ψ) r^k for all r > r_0. We shall call the minimal such r_0 the rate of decay of correlations for the system (T, µ). Only relatively recently has work on rigorous bounds for the rate of decay of correlations been done; Rychlik [21], Liverani [13, 14]. Unfortunately, the estimates of [21] and [13, 14] are extremely conservative. For example, one may easily show that the rate of decay of correlations for the tent map with respect to C^1 test functions is 0.5.
The estimates provided by Rychlik’s and Liverani’s papers are 0.9986 and 0.5 respectively. While the estimate of [13] is optimal in this case, we shall demonstrate later that the bounds quickly worsen as small nonlinearities are introduced into T .
One could consider a naive direct approach to the estimation of r_0 by scattering a very large number of test points x ∈ M throughout M, distributed according to the measure µ. Once specific functions ϕ and ψ are chosen, the integrals in (2) could then be approximately evaluated as a finite sum over the scattered points. A linear fit of log C_{ϕ,ψ}(k) vs. k would then estimate r_0. This approach is infeasible, however, because of the exponentially stretching nature of the map T. If the points are scattered so that they are roughly a distance ε apart, then nearby scatter points may be stretched apart to a distance of order the diameter of M after only k = log(diam M/ε)/log(1/λ) iterations, where λ is the expansion constant (λ = 1/inf_{x∈M} inf_{v∈T_xM} ‖D_xT(v)‖/‖v‖). Thus the number of scatter points required grows exponentially with k, making such a calculation infeasible, or at the very least, unreliable for low values of k. One also cannot be sure that the particular functions ϕ and ψ used in this calculation do not yield an uncharacteristically low value for r_0.
Other methods of bounding the rate of decay are discussed in the final section. These methods compute rates of decay for test functions that are either real-analytic [3, 20] or of bounded variation [1] (one-dimensional systems). We argue that such function spaces are not good models of physical observables. We instead have concentrated on piecewise C^γ or C^1 test functions as many physical measurements naturally fall into this class.
To reiterate, our aim is to provide a rigorous computational method of bounding the value of r_0. We expect our method to provide much better bounds than those of [21] or [13, 14]. In addition, we believe our bounds to be of greater physical significance than those of [1] and [3]. In later sections we discuss the order of convergence of our bounds.
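The stretching estimate above can be made concrete with a short Python sketch. It only evaluates the formula k = log(diam M/ε)/log(1/λ) and the resulting point count; the function names and the inversion ε = diam M · λ^k used for the point count are assumptions of this illustration, not part of the paper.

```python
import math


def iterations_until_separated(eps, diam, lam):
    """k = log(diam/eps) / log(1/lam): iterations after which points that
    start a distance eps apart may be a distance of order diam apart."""
    if not 0.0 < lam < 1.0:
        raise ValueError("expansion constant lam must lie in (0, 1)")
    if not 0.0 < eps < diam:
        raise ValueError("need 0 < eps < diam")
    return math.log(diam / eps) / math.log(1.0 / lam)


def scatter_points_required(k, lam, dim=1):
    """Rough count of scatter points needed to resolve k iterations.

    Assumption of this sketch: the spacing must satisfy eps = diam * lam**k,
    so about (diam/eps)**dim = lam**(-k*dim) points are needed."""
    return math.ceil(lam ** (-k * dim))


if __name__ == "__main__":
    # lam = 0.5706 is the approximate expansion constant quoted for the
    # interval map in the example section; eps is an arbitrary choice.
    print(iterations_until_separated(eps=1e-6, diam=1.0, lam=0.5706))
    print(scatter_points_required(k=30, lam=0.5706))
```

The second function makes the exponential growth in k explicit, which is the reason the direct approach is described as infeasible.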
In some instances we are able to compute resonant eigenfunctions of our system (T, µ), thus strengthening our bound to an estimate of the rate of decay. Particular features of our result are that it may be applied to multidimensional and hyperbolic systems, and that an estimate of the physical invariant measure is obtained “for free” with no extra work. The order of convergence of our bounds is also discussed.
2. Outline of Method
Without loss of generality, we henceforth assume that ∫_M ψ dµ = 0. Define the Perron-Frobenius operator P for expanding T by
    P\psi(x) = \sum_{y \in T^{-1}x} \frac{\psi(y)}{|\det D_yT|}. (4)
We shall consider P : C^γ(M, C) → C^γ(M, C), where the Banach space C^γ(M, C), 0 < γ ≤ 1, has norm
    \|\psi\|_\gamma = \sup_{x \in M} |\psi(x)| + \sup_{x,y \in M} \frac{|\psi(x) - \psi(y)|}{\|x - y\|^\gamma} := |\psi|_\infty + |\psi|_\gamma. (5)
We produce an upper bound for C_{ϕ,ψ}(k) by rewriting (2) in terms of the Perron-Frobenius operator:
    C_{\varphi,\psi}(k) = \left| \int_M \varphi \circ T^k \cdot \psi \cdot h \, dm \right| (6)
        = \left| \int_M \varphi \cdot P^k(\psi \cdot h) \, dm \right|
        \le \|\varphi\|_\gamma \, \|P^k(\psi \cdot h)\|_\gamma
        \le \|\varphi\|_\gamma \, \|\psi \cdot h\|_\gamma \, H r^k. (7)
Here r > r_0, the spectral radius of P restricted to C^⊥_γ = {ψ ∈ C^γ(M, C) : ∫_M ψ dµ = 0}. Thus the problem boils down to one of estimating the spectral radius of the Perron-Frobenius operator acting on C^γ test functions of zero µ-integral.
Definition 2.1. We define the essential spectral radius ress to be the smallest nonnegative number for which elements of σ(P) outside the disk {z ∈ C : |z| ≤ ress } are isolated eigenvalues of finite multiplicity. The spectrum of P may now be divided into the part that lies inside the disk |z| ≤ ress and that which lies outside this disk.
Fig. 1. Schematic representation of the spectrum (as a subset of the complex plane) of the Perron-Frobenius operator acting on the space of C^γ test functions. Labels in the figure: isolated spectrum; eigenvalue corresponding to invariant density; essential spectrum; unit disk
We are not concerned with the composition of the part of σ(P) inside |z| ≤ λ^γ. It is known (Ruelle [19]) that r_ess ≤ λ^γ, where λ = 1/inf_{x∈M} inf_{v∈T_xM} ‖D_xT(v)‖/‖v‖. Collet & Isola [4] have better estimates for one-dimensional Markov maps, but these are difficult to compute. We can thus obtain reasonable bounds for the radius of the essential spectrum, and now need only worry about a bound for the radius of the remainder of the non-unit spectrum, that which consists of isolated eigenvalues of finite multiplicity. It is this remaining spectrum of P in the region |z| > λ^γ that we wish to approximate. We shall partition the state space M into a finite number of connected sets A_1, ..., A_n and form the n × n transition matrix
    P_{ij} = \frac{m(A_i \cap T^{-1}A_j)}{m(A_i)}, (8)
where m is the Riemannian volume measure on M . One may think of the entry Pij as representing the probability of a point in the region Ai moving into the region Aj in one step. Ulam [23] originally proposed this matrix as a finite-dimensional approximation to the Perron-Frobenius operator in the case where T was an expanding interval map. In [9] the author proved that provided that the regions A1 , . . . , An are carefully chosen, the matrix P is a very good approximation of the Perron-Frobenius operator for both
multidimensional expanding maps and uniformly hyperbolic maps. The main result of [9] is that the left eigenvector of the stochastic matrix P defines a good approximation of the physical measure µ. In the present paper, we go on to show that the same matrix in fact gives us more information, namely a bound on the rate of decay of correlations for T with respect to the physical measure µ.
Theorem 2.2 (Main Result). Let T : M → M be a C^{1+γ} (0 < γ ≤ 1) expanding (resp. Anosov) map of a compact d-dimensional (resp. 2-dimensional) Riemannian manifold M. Denote by {P_n}_{n=n_0}^∞ a sequence of Markov partitions for T on M, with the property that max_{A∈P_n} diam A → 0 as n → ∞. Construct the matrix
    P_{n,ij} = \frac{m(A_{n,i} \cap T^{-1}A_{n,j})}{m(A_{n,j})}, \quad A_{n,i}, A_{n,j} \in P_n, \quad 1 \le i, j \le \operatorname{card} P_n. (9)
Denote by σ′(P_n) the spectral values of P_n that lie in the region |z| > λ^{γ′}, and by σ(P) the spectrum of P : C^{γ′}(M, C) → C^{γ′}(M, C) (0 < γ′ < γ). Then given ε > 0, there is n_ε ∈ Z^+ such that
    \sigma(P) \setminus \{|z| \le \lambda^{\gamma'}\} \subset B_\varepsilon(\sigma'(P_n)) \quad \text{for all } n > n_\varepsilon.
Simply put, the above theorem says that the spectral values of the matrices P_n lying outside the disk {|z| ≤ λ^{γ′}} converge to a set containing the isolated spectrum of P : C^{γ′}(M, C) → C^{γ′}(M, C) outside this disk. The matrices P_n are relatively simple to compute, and provide us with a means of approximating the important portion of the isolated spectrum of the Perron-Frobenius operator. In the case of an example computation, if it so happens that all of σ(P_n) \ {1} is contained in the disk |z| ≤ λ^{γ′}, then we take λ^{γ′} as our bound for the rate of decay of correlations.
Remarks 2.3.
(i) Finite Markov partitions exist for the class of maps that we are considering [2]. If T is Anosov, the behaviour of the boundaries of the Markov partition sets becomes difficult to control in dimensions d ≥ 3 (see Mañé, p. 184 for a discussion).
(ii) For Anosov T, the operator P is not the Perron-Frobenius operator for the full map T. A standard construction is followed, where the dynamics of T is projected onto the unstable boundaries of the Markov partition. This projected dynamics induces an expanding map on the unstable boundaries, and we define P for T to be the Perron-Frobenius operator for this induced expanding map.
(iii) The matrix defined in (9) is slightly different from that of (8). The two, however, are related via a similarity transformation using the matrix Q_ij = δ_ij m(A_i) and thus have the same eigenvalues.
(iv) We have enlarged our space of test functions from C^γ(M, C) to C^{γ′}(M, C), γ′ < γ, for technical reasons. This slight change makes no practicable difference to computations.
Corollary 2.4. Let M, T, P, and P_n be defined as in Theorem 2.2. An upper bound for the rate of decay of correlations of T with respect to C^{γ′} test functions (0 < γ′ < γ) is:
Expanding T:
    \max\left\{ \lambda^{\gamma'}, \ \lim_{n\to\infty} \max_{z \in \sigma(P_n) \setminus \{1\}} |z| \right\}. (10)
Anosov T:
    \max\left\{ \lambda^{\gamma'}, \ \lim_{n\to\infty} \max_{z \in \sigma(P_n) \setminus \{1\}} |z| \right\}^{1/3}. (11)
Proof of Corollary. The expanding case follows from Theorem 2.2 and the inequality (7). The Anosov situation is described in Sect. 5.
Thus the Ulam matrices provide us with estimates for the isolated spectrum of the Perron-Frobenius operator; it is these isolated spectral values that cause the estimates of [21] and [13] to be so poor. If there are no isolated eigenvalues for P (besides unity) then we take the much more reasonable value λ^{γ′} as a bound for the decay rate.
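The relation between the two normalisations (8) and (9) noted in Remarks 2.3 (iii) can be checked on a toy case. The Python sketch below does this for the tent map T(x) = 1 − |2x − 1| with Lebesgue measure and the Markov partition {[0, 1/4], [1/4, 1/2], [1/2, 1]}; this choice of map and partition, and all function names, are assumptions made for illustration and are not the example treated in the paper.

```python
def tent_preimage(c, d):
    """Preimage of [c, d] under the tent map T(x) = 1 - |2x - 1|:
    one interval on each of the two inverse branches."""
    return [(c / 2.0, d / 2.0), (1.0 - d / 2.0, 1.0 - c / 2.0)]


def overlap(a, b, c, d):
    """Length of the intersection of the intervals [a, b] and [c, d]."""
    return max(0.0, min(b, d) - max(a, c))


def ulam_matrices(breaks):
    """Return (P8, P9, sizes): the matrices normalised as in (8) and (9)."""
    cells = list(zip(breaks[:-1], breaks[1:]))
    n = len(cells)
    sizes = [b - a for a, b in cells]
    # mass[i][j] = m(A_i ∩ T^{-1} A_j), computed exactly from the branches
    mass = [[sum(overlap(a, b, p, q) for p, q in tent_preimage(c, d))
             for c, d in cells] for a, b in cells]
    p8 = [[mass[i][j] / sizes[i] for j in range(n)] for i in range(n)]
    p9 = [[mass[i][j] / sizes[j] for j in range(n)] for i in range(n)]
    return p8, p9, sizes


if __name__ == "__main__":
    P8, P9, sizes = ulam_matrices([0.0, 0.25, 0.5, 1.0])
    for row in P8:
        print(row)  # each row of the matrix (8) sums to 1
    # (9) equals Q P8 Q^{-1} with Q = diag(m(A_i)), entry by entry
    n = len(sizes)
    print(all(abs(P9[i][j] - sizes[i] * P8[i][j] / sizes[j]) < 1e-12
              for i in range(n) for j in range(n)))
```

Because the two matrices are conjugate by the diagonal matrix Q, either normalisation may be used when only eigenvalues are of interest.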
3. Coding
We deal with the expanding situation first; the Anosov case will follow from this by extracting its expanding part as described in Sect. 5. It is convenient for us to code the dynamics of our map T using the symbolic dynamics provided by our Markov partition P. We shall determine a bound for the rate of decay of our induced system and use this as a bound for our original smooth system. Define
    A_{ij} = \begin{cases} 0, & \text{if } \operatorname{Int} A_i \cap T^{-1} \operatorname{Int} A_j = \emptyset, \\ 1, & \text{otherwise.} \end{cases}
The matrix A defines a subset Σ_A^+ = {ξ ∈ Σ^+ : A_{ξ_i ξ_{i+1}} = 1 for all i ∈ Z^+} ⊂ {1, 2, ..., s}^{Z^+} = Σ^+ of “allowable” sequences. This subset Σ_A^+ is invariant under the one-sided left-shift σ : Σ^+ → Σ^+ defined by [σ(ξ)]_i = ξ_{i+1}. Denote by [ξ_0, ..., ξ_{N−1}] the cylinder set {η ∈ Σ_A^+ : η_0 = ξ_0, ..., η_{N−1} = ξ_{N−1}}. We define π : Σ^+ → M to be the semiconjugacy between Σ^+ and M that maps the sequence (ξ_0, ξ_1, ...) onto the unique point x satisfying T^i x ∈ A_{ξ_i} for all i ≥ 0. We have the following commutative diagram; π : Σ_A^+ → M is continuous and surjective; see Bowen [2] for details:
    \begin{array}{ccc} \Sigma_A^+ & \xrightarrow{\ \sigma\ } & \Sigma_A^+ \\ \downarrow \pi & & \downarrow \pi \\ M & \xrightarrow{\ T\ } & M \end{array}
The metric on Σ_A^+ will be defined by d_θ(ξ, η) = θ^N, where N is the maximal integer for which ξ_i = η_i, i = 0, ..., N − 1, and 0 < θ < 1 is fixed. Define a norm on the space of complex-valued functions on Σ_A^+ by
    \|\phi\|_\theta = |\phi|_\infty + \sup_{N \ge 0} \ \sup_{\substack{\xi,\eta \in \Sigma_A^+ \\ \xi_0 = \eta_0, \ldots, \xi_{N-1} = \eta_{N-1}}} \frac{|\phi(\xi) - \phi(\eta)|}{\theta^N} := |\phi|_\infty + |\phi|_\theta. (12)
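The metric d_θ is easy to evaluate on finite truncations of symbol sequences. The following Python sketch is only an illustration under that truncation assumption: two truncations that agree on all of their symbols are treated as agreeing exactly up to the truncation length, and the function names are hypothetical.

```python
def agreement_length(xi, eta):
    """Largest N with xi[i] == eta[i] for i = 0, ..., N-1
    (computed on finite truncations of the two sequences)."""
    n = 0
    for a, b in zip(xi, eta):
        if a != b:
            break
        n += 1
    return n


def d_theta(xi, eta, theta):
    """d_theta(xi, eta) = theta**N with N the agreement length, 0 < theta < 1."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie in (0, 1)")
    return theta ** agreement_length(xi, eta)


if __name__ == "__main__":
    # Sequences differing already in the first symbol are at distance 1,
    # however close their images under the coding map may be in M.
    print(d_theta((1, 2, 3, 4), (2, 2, 3, 4), 0.5))
    print(d_theta((1, 2, 3, 4), (1, 2, 4, 4), 0.5))
```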
Lemma 3.1. If ϕ is γ-Hölder on M (0 < γ ≤ 1), the function ϕ ∘ π is Lipschitz on Σ_A^+ with respect to the d_{λ^γ} metric.
Proof. Let ϕ ∈ C^γ(M, R); that is, there is a constant C_2 < ∞ such that |ϕ(x) − ϕ(y)| ≤ C_2 ‖x − y‖^γ for all x, y ∈ M. Let |ϕ|_γ denote the minimal such constant. We note that if 1/λ := inf_{x∈M} inf_{v∈T_xM} ‖D_xT(v)‖/‖v‖ we may find a universal constant C_1 < ∞ such that if ξ_i = η_i for i < N, then π([ξ_0, ξ_1, ..., ξ_{N−1}]) ⊂ M has diameter less than C_1 λ^N (Ruelle [18]). Also suppose that d_θ(ξ, η) = θ^N. Then
    |\varphi \circ \pi(\xi) - \varphi \circ \pi(\eta)| \le |\varphi|_\gamma \, \|\pi(\xi) - \pi(\eta)\|^\gamma \le |\varphi|_\gamma \left( C_1 \lambda^N \right)^\gamma = |\varphi|_\gamma C_1^\gamma (\lambda^\gamma)^N = |\varphi|_\gamma C_1^\gamma \, d_{\lambda^\gamma}(\xi, \eta). (13)
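We denote this space of Lipschitz functions on Σ_A^+ by
    F_{\lambda^\gamma} = \{\phi : \Sigma_A^+ \to \mathbb{C} : |\phi|_{\lambda^\gamma} < \infty\}.
One has
    \|\varphi \circ \pi\|_{\lambda^\gamma} = |\varphi \circ \pi|_\infty + |\varphi \circ \pi|_{\lambda^\gamma}
        = |\varphi|_\infty + \sup_{\xi,\eta \in \Sigma_A^+} \frac{|\varphi \circ \pi(\xi) - \varphi \circ \pi(\eta)|}{d_{\lambda^\gamma}(\xi, \eta)}
        \le |\varphi|_\infty + |\varphi|_\gamma C_1^\gamma \quad \text{by (13)}
        \le \max\{1, C_1^\gamma\} \, \|\varphi\|_\gamma. (14)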
We are now in a position to define a rate of decay for our symbolic system. Given a T-invariant probability measure µ on M, there is a σ-invariant probability measure ν on Σ_A^+ such that π*ν = µ ([2] p. 91); thus ψ ∘ π has ν integral zero. The measure ν plays the role of the “physical” measure µ for our symbolic system. Denote by F^⊥_{λ^γ} the subspace of F_{λ^γ} with zero ν integral, and define the Perron-Frobenius operator (or transfer operator) L : (F_{λ^γ}, ‖·‖_{λ^γ}) → (F_{λ^γ}, ‖·‖_{λ^γ}) by
    (L\phi)(\xi) = \sum_{\eta \in \sigma^{-1}\xi} \frac{\phi(\eta)}{|\det D_{\pi(\eta)}T|}. (15)
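F^⊥_{λ^γ} is the space of test functions on Σ_A^+ that corresponds to the space C^⊥_γ of test functions on the smooth space M. L is the operator corresponding to P. Recall that if x ↦ |det D_xT| is C^γ then ξ ↦ |det D_{π(ξ)}T| is in F_{λ^γ}. We use the symbolic dynamics to rewrite (6) as follows: (h_ν ∈ F_{λ^γ} satisfies L(h_ν) = h_ν and m_Σ is the probability measure on Σ_A^+ that assigns equal weight to cylinder sets of equal length. ν = h_ν · m_Σ in the same way that µ = h · m.)
    \left| \int_M \varphi \circ T^k \cdot \psi \, d\mu \right| = \left| \int_{\Sigma_A^+} (\varphi \circ T^k) \circ \pi \cdot \psi \circ \pi \, d\nu \right|
        = \left| \int_{\Sigma_A^+} (\varphi \circ T^k) \circ \pi \cdot (\psi \cdot h_\nu) \circ \pi \, dm_\Sigma \right|
        = \left| \int_{\Sigma_A^+} (\varphi \circ \pi) \cdot L^k((\psi \cdot h_\nu) \circ \pi) \, dm_\Sigma \right|
        \le \|\varphi \circ \pi\| \, \|L^k((\psi \cdot h_\nu) \circ \pi)\|_{\lambda^\gamma}
        \le H_2 \|\varphi \circ \pi\|_{\lambda^\gamma} \, \|(\psi \cdot h_\nu) \circ \pi\|_{\lambda^\gamma} R^k
        \le H_3 \|\varphi\|_\gamma \, \|\psi \cdot h_\nu\|_\gamma R^k. (16)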
Here R is any positive number greater than R_0, the spectral radius of L|F^⊥_{λ^γ}.
Remark 3.2. The class of functions F^⊥_{λ^γ} is larger than the space C^⊥_γ ∘ π = {ϕ ∘ π : ϕ ∈ C^⊥_γ}. Functions ϕ ∘ π with ϕ discontinuous on boundaries of partition sets in M (and inverse images of the boundaries) are allowed in F^⊥_{λ^γ} because π(ξ) and π(η) may be very close, but will be assigned a distance of 1 if ξ_0 ≠ η_0. Thus R_0, the spectral radius of L|F^⊥_{λ^γ}, will in general be larger than r_0, the spectral radius of P|C^⊥_γ.
4. Approximating L
There are two main ingredients to the approximation of the spectrum of the operator L. The first thing we do is define a simpler operator L_N that is close in norm to L. Because L_N is close to L in the operator norm topology, standard perturbation theory tells us that their spectra are also close. The second step is to restrict the simpler operator L_N to a small (finite-dimensional) invariant subspace. The operator restricted to this finite-dimensional space now has a matrix representation, and its spectrum is easily computed as the eigenvalues of this matrix.
4.1. A Simpler Operator. To make finding the spectrum of L tractable, we construct a simpler operator
    (L_N \phi)(\xi) = \sum_{\eta \in \sigma^{-1}\xi} g_N(\eta) \, \phi(\eta). (17)
We shall think of g_N : Σ_A^+ → R^+ as an approximation to g(ξ) := 1/|det D_{π(ξ)}T| that is constant on N-cylinders [ξ_0, ξ_1, ..., ξ_{N−1}]. Formally, define
    g_N(\xi) = \frac{m(\pi([\xi_0, \ldots, \xi_{N-2}]) \cap T^{-1}\pi([\xi_1, \ldots, \xi_{N-1}]))}{m(\pi([\xi_1, \ldots, \xi_{N-1}]))}; (18)
see Fig. 2.
Lemma 4.1. |g − g_N|_∞ ≤ |g|_θ θ^N.
Proof. Clearly,
    \inf_{\xi \in [\xi_0, \ldots, \xi_{N-1}]} |\det D_{\pi(\xi)}T| \le \frac{m(\pi([\xi_1, \ldots, \xi_{N-1}]))}{m(\pi([\xi_0, \xi_1, \ldots, \xi_{N-1}]))} \le \sup_{\xi \in [\xi_0, \ldots, \xi_{N-1}]} |\det D_{\pi(\xi)}T|,
so that
    \inf_{\xi \in [\xi_0, \ldots, \xi_{N-1}]} \frac{1}{|\det D_{\pi(\xi)}T|} \le \frac{m(\pi([\xi_0, \xi_1, \ldots, \xi_{N-2}]) \cap T^{-1}\pi([\xi_1, \ldots, \xi_{N-1}]))}{m(\pi([\xi_1, \ldots, \xi_{N-1}]))} \le \sup_{\xi \in [\xi_0, \ldots, \xi_{N-1}]} \frac{1}{|\det D_{\pi(\xi)}T|}.
Fig. 2. Schematic representation of the sets in M involved in defining g_N: T maps π([ξ_0, ..., ξ_{N−1}]), a subset of π([ξ_0, ..., ξ_{N−2}]), onto π([ξ_1, ..., ξ_{N−1}]). We identify π([ξ_0, ..., ξ_{N−2}]) with A_{n,i} and π([ξ_1, ..., ξ_{N−1}]) with A_{n,j} in Eq. (9)
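Hence,
    |g - g_N|_\infty \le \max_{\xi_0, \ldots, \xi_{N-1} \in \{1, \ldots, r\}} \left( \sup_{\xi \in [\xi_0, \ldots, \xi_{N-1}]} \frac{1}{|\det D_{\pi(\xi)}T|} - \inf_{\xi \in [\xi_0, \ldots, \xi_{N-1}]} \frac{1}{|\det D_{\pi(\xi)}T|} \right)
        = \sup_{\substack{\xi,\eta \in \Sigma_A^+ \\ \xi_0 = \eta_0, \ldots, \xi_{N-1} = \eta_{N-1}}} |g(\xi) - g(\eta)|
        \le |g|_\theta \, \theta^N,
where we have used the definition of |g|_θ in Eq. (12).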
By standard perturbation theory (e.g. Kato [12] or Dunford & Schwartz [6] Lemma 6.3) if the operators L and L_N are close in operator norm, then their spectra are also close in the sense of Hausdorff distance. We may simply bound the norm of the difference L − L_N as follows.
    \|(L - L_N)\phi\|_\theta \le \left\| \sum_{\eta \in \sigma^{-1}\xi} \left( \frac{1}{|\det D_{\pi(\eta)}T|} - g_N(\eta) \right) \right\|_\theta \|\phi\|_\theta
        \le 8 \|\phi\|_\theta \left\| \frac{1}{|\det D_{\pi(\cdot)}T|} - g_N(\cdot) \right\|_\theta, (19)
where 8 is the maximum number of inverse branches of T. It is shown in Froyland [9], Lemma 4.1 that g_N ∘ π^{-1}(·) → 1/|det D_· T| uniformly in the C^0 topology on M. However, this corresponds to convergence of g_N(·) → 1/|det D_{π(·)}T| in ‖·‖_θ with θ = 1 and this is not enough for convergence of L_N to L in the ‖·‖_θ operator norm, where 0 ≤ θ < 1. To obtain the required convergence we use a small trick, namely slightly relaxing our norm and enlarging our space of test functions.
Proposition 4.2. Let g ∈ F_θ^+ and suppose g_N is constructed as above. For any 0 < θ < θ′ < 1 we have that
    |g - g_N|_{\theta'} \le |g|_\theta \left( \frac{\theta}{\theta'} \right)^N, \quad N \ge 0. (20)
Proof. The only property required of the g_N’s is that listed in Lemma 4.1. We refer the reader to Proposition 1.3, Parry & Pollicott [16] for a proof.
If T is C^{1+γ}, then |det DT| is C^γ. We shall consider L to be acting on the Banach space (F^⊥_{λ^{γ′}}, ‖·‖_{λ^{γ′}}), where 0 < γ′ < γ. For brevity we shall sometimes write ‖·‖_{θ′} to mean ‖·‖_{λ^{γ′}}. By virtue of Proposition 4.2 and Lemma 4.1, g_N → 1/|det D_{π(·)}T| in the ‖·‖_{θ′} norm (see Eq. (22)), and so ‖L − L_N‖_{θ′} → 0. As we now have norm convergence of L_N to L we may apply standard perturbation results to the spectrum of L. The following lemma is a suitably modified version of Lemma 6.3 [6].
Lemma 4.3. If
    \|L - L_N\|_{\theta'} \le \|R(z, L_N)\|_{\theta'}^{-1} := \|(L_N - zI)^{-1}\|_{\theta'}^{-1}, (21)
for all z ∈ C \ B_ε(σ(L_N)), then σ(L) ⊂ B_ε(σ(L_N)), where B_ε(σ(L_N)) denotes an ε-neighbourhood of the spectrum of L_N.
Proof. Recall that if L_N − zI is invertible, then so too are all operators in a ball centred at L_N − zI of radius ‖(L_N − zI)^{-1}‖^{-1}_{θ′}. Thus if ‖(L_N − zI) − (L − zI)‖_{θ′} ≤ ‖(L_N − zI)^{-1}‖^{-1}_{θ′} for z ∈ C \ B_ε(σ(L_N)) then L − zI is invertible on C \ B_ε(σ(L_N)) and so σ(L) ⊂ B_ε(σ(L_N)).
We summarise our findings so far.
Proposition 4.4. H(σ(L_N), σ(L)) → 0 as N → ∞, where H denotes the Hausdorff metric on C defined by H(E, F) = max{sup_{x∈E} dist(x, F), sup_{y∈F} dist(E, y)} for E, F ⊂ C.
Proof. Putting together (19), Lemma 4.1, and Proposition 4.2, we see that
    \|L - L_N\|_{\theta'} \le 8 \left( |g - g_N|_\infty + |g - g_N|_{\theta'} \right) \le 8 \left( |g|_{\theta'} \, \theta'^N + |g|_\theta \left( \frac{\theta}{\theta'} \right)^N \right) \to 0 (22)
as N → ∞.
By applying Lemma 4.3 twice (reversing the roles of L and LN ) we are done.
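For finite sets of complex numbers, such as the eigenvalues of two matrices, the Hausdorff metric H of Proposition 4.4 can be evaluated directly. The Python sketch below is an illustration restricted to finite, non-empty sets (the spectra in the proposition need not be finite); the function name is hypothetical.

```python
def hausdorff(E, F):
    """H(E, F) = max{ sup_{x in E} dist(x, F), sup_{y in F} dist(E, y) }
    for finite non-empty collections E, F of complex numbers."""
    E, F = list(E), list(F)
    if not E or not F:
        raise ValueError("both sets must be non-empty")

    def one_sided(X, Y):
        # largest distance from a point of X to its nearest point of Y
        return max(min(abs(x - y) for y in Y) for x in X)

    return max(one_sided(E, F), one_sided(F, E))


if __name__ == "__main__":
    # The two one-sided distances differ here, which is why the maximum
    # of both is taken: every point of {0} is in {0, 1}, but not conversely.
    print(hausdorff([0, 1], [0]))
    print(hausdorff([1, 0.5j], [1, 0.6j]))
```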
We shall now show how to extract the isolated part of the spectrum of L_N.
4.2. A Smaller Space. The operator L_N has a finite dimensional invariant subspace, namely those functions that are constant on N-cylinders [ξ_0, ξ_1, ..., ξ_{N−1}] in Σ_A^+. Recall that π([ξ_0, ξ_1, ..., ξ_{N−1}]) = A_{ξ_0} ∩ T^{-1}A_{ξ_1} ∩ ··· ∩ T^{-(N−1)}A_{ξ_{N−1}}, so that functions φ : Σ_A^+ → R piecewise constant on N-cylinders correspond to functions ϕ : M → R piecewise constant on the refined Markov partition ⋁_{i=0}^{N−1} T^{-i}P. Let V_N = sp{χ_{[ξ_0,ξ_1,...,ξ_{N−1}]} : ξ_i ∈ {1, ..., r}} ⊂ F_{θ′} be the span of the test functions constant on N-cylinders. We may now define two operators, namely L_N|V_N : V_N → V_N (the restriction of L_N to V_N), and L_N/V_N : F_{θ′}/V_N → F_{θ′}/V_N (the quotient operator on the quotient space F_{θ′}/V_N). There are three very important points regarding these induced operators:
(i) The operator L_N|V_N has matrix representation
    [P_n]_{ij} = \frac{m(A_{n,i} \cap T^{-1}A_{n,j})}{m(A_{n,j})},
where the partition P_n = {A_{n,1}, ..., A_{n,n}} in M is formed from the images (under π) of the N-cylinder sets. That is, P_n = {π([ξ_0, ..., ξ_{N−1}]) : [ξ_0, ..., ξ_{N−1}] ⊂ Σ_A^+}. This is clear from the construction of the weight function g_N in (18).
(ii) The spectrum of L_N is contained in the union of the spectra of L_N|V_N and L_N/V_N; symbolically, σ(L_N) ⊂ σ(L_N|V_N) ∪ σ(L_N/V_N). See, for example, Erdelyi and Lange [7]. Note that the spectrum of L_N|V_N is simply the set of eigenvalues of the matrix representing L_N|V_N, that is, the eigenvalues of the matrix P_n defined above.
(iii) The spectral radius of L_N/V_N in F_{θ′}/V_N is bounded above by θ. This is proven in Pollicott [17].
By putting these three facts together, we see that the part of σ(L_N) lying outside the disk {|z| ≤ θ′} must be wholly contained in σ(L_N|V_N). Further, σ(L_N|V_N) may be simply computed as the eigenvalues of the matrix P_n. Thus
    R_{0,N} := r_\sigma(L_N|_{F^\perp_{\theta'}}) \le \max\left\{ \theta', \max_{z \in \sigma(P_n) \setminus \{1\}} |z| \right\}. (23)
We now have an upper bound for the spectral radius of the simpler operator L_N acting on the space F_{θ′}, and by our perturbation theorems, this will be close to an upper bound for L acting on F_{θ′}.
Proof (of Main Result). We first note that if ϕ is an eigenfunction of P, then ϕ ∘ π is an eigenfunction of L. Thus σ(P) \ {|z| ≤ λ^{γ′}} ⊂ σ(L) \ {|z| ≤ θ′}. By Proposition 4.4 we have in particular that H(σ(L) \ {|z| ≤ θ′}, σ(L_N) \ {|z| ≤ θ′}) → 0 as N → ∞. Finally, we have just shown that σ(L_N) \ {|z| ≤ θ′} ⊂ σ(P_n). By putting these three observations together we are done.
Remark 4.5. Corollary 3.3 of [19] states that the part of the discrete spectrum of our transfer operator L that we are considering does not vary as we vary the norm ‖·‖_θ and function space F_θ with θ. Formally, the spectra of L : (F_θ, ‖·‖_θ) → (F_θ, ‖·‖_θ) and L : (F_{θ′}, ‖·‖_{θ′}) → (F_{θ′}, ‖·‖_{θ′}) coincide in the region {z : |z| > θ′}. In particular, this tells us that the choice of θ′ is not overly important in practice.
5. When T is Anosov
In the case where T is Anosov (all of M is uniformly hyperbolic), one must not deal directly with the full Perron-Frobenius operator of T, as the action of P would be to stretch in unstable directions (decrease the slope of a test function) and to compress in stable directions (increase the slope of a test function). Thus under the action of P, the C^γ norm of test functions would most likely increase with each application of P. We instead induce an expanding map from T on the unstable boundaries of a Markov partition as described in Remark 2.3 (ii). This induced expanding map will have its own Perron-Frobenius operator, and we may calculate its rate of mixing as before. We then
Fig. 3. Graph of the piecewise quartic interval map T as defined by (24)
appeal to a standard result to show that if R_0 is a bound for the rate of mixing of the induced expanding map, then R_0^{1/3} is a bound for the rate of mixing of the full Anosov map. The advantage of our matrix technique is that the induced expanding map need never be constructed. We compute the matrices P_n as before using the full map T, and simply take the cube root of the maximal non-unit eigenvalue to obtain a bound.
It is shown in [9] Lemmas 7.1 and 7.3 that the matrix P_n as defined in (9) may be taken via a similarity transformation into the matrix representing L_N|V_N (here L_N is the Perron-Frobenius operator for the induced expanding map acting on test functions on symbol space). Thus as in the expanding case, the eigenvalues of P_n outside the disk {|z| ≤ λ^{γ′}} coincide with the spectrum of L_N outside {|z| ≤ λ^{γ′}}. We calculate P_n rather than the true matrix representation of L_N|V_N as it is much easier to compute and does not require the construction of the induced expanding map.
The fact that we may take R_0^{1/3} as a bound for the rate of decay of correlations for C^γ test functions, when R_0 is a bound for the induced expanding map, can be found in [16], Proposition 2.4, for example.
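The recipe of Corollary 2.4 can be written as a small helper, sketched below in Python. It is applied to the eigenvalues of a single matrix P_n, whereas the corollary is stated for the limit n → ∞, so the value returned is only a finite-n stand-in; the function name, the argument names, and the rule for discarding the eigenvalue nearest 1 are assumptions of this sketch.

```python
def decay_bound(eigenvalues, lam, gamma_prime, anosov=False):
    """max{ lam**gamma_prime, max |z| over the non-unit eigenvalues },
    raised to the power 1/3 in the Anosov case (cf. (10) and (11))."""
    eigs = list(eigenvalues)
    if not eigs:
        raise ValueError("need at least one eigenvalue")
    # Discard exactly one eigenvalue: the one closest to 1 (assumed to be
    # the eigenvalue associated with the invariant density).
    unit = min(range(len(eigs)), key=lambda i: abs(eigs[i] - 1.0))
    moduli = [abs(z) for i, z in enumerate(eigs) if i != unit]
    largest = max(moduli) if moduli else 0.0
    bound = max(lam ** gamma_prime, largest)
    return bound ** (1.0 / 3.0) if anosov else bound


if __name__ == "__main__":
    eigs = [1.0, 0.3 + 0.4j, 0.3 - 0.4j, -0.2]  # made-up spectrum
    print(decay_bound(eigs, lam=0.25, gamma_prime=1.0))
    print(decay_bound(eigs, lam=0.25, gamma_prime=1.0, anosov=True))
```

When every non-unit eigenvalue lies inside the disk of radius λ^{γ′}, the helper falls back to λ^{γ′}, matching the convention adopted after Theorem 2.2.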
6. An Example
For ease of presentation, we illustrate our technique for a one-dimensional system. The map that we choose is a piecewise quartic interval map with two inverse branches, defined by¹
    Tx = \begin{cases} 0.9982x^4 - 0.8104x^3 + 0.1390x^2 + 1.7821x + 0.1131, & 0.1154 \le x \le 1/2, \\ 1.1572x^4 - 3.5886x^3 + 4.0826x^2 - 3.7830x + 2.2471, & 1/2 \le x \le 1. \end{cases} (24)
¹ All numerical values in this section are approximate only.
This rather complicated-looking piecewise polynomial is just a slight nonlinear perturbation of a piecewise linear Markov map; see Fig. 3. The perturbation is so slight that the graph of T appears linear. That T is not linear is evident from the graph of the unique invariant density h of T as shown in Fig. 4.
Fig. 4. Graph of the unique invariant density h of T
If T were linear, this invariant
density would be piecewise constant. The density h defines a probability measure µ, and as h is everywhere positive, the Birkhoff theorem tells us that Lebesgue almost all x ∈ [0.1154, 1] exhibit µ. Clearly, µ is a natural candidate for the physical measure of T. We are interested in how quickly initial blobs of mass in state space distribute themselves according to the measure µ under the action of T. That is, we want an estimate of the rate of mixing, or the rate of decay of correlations for T with respect to the measure µ.
The value of λ = 1/min_{x∈[0.1154,1]} |T′(x)| is approximately 0.5706. This shall be an initial bound for our rate of decay. As T is piecewise C^2 we choose P to act on the space of piecewise C^γ real-valued test functions, γ < 1. Typically, if there are kinks in the smoothness of T on a set K ⊂ M, we allow breaks in the smoothness of the test functions at B = ⋃_{i>0} T^i K. In our example, K = {1/2} and B = {T^3(1/2), T^4(1/2)} = {0.3194, 0.6806} (as T^4(1/2) is fixed under T; we have discarded the endpoints of M = [0.1154, 1]). Thus P will act on test functions in C^γ(M, R), γ < 1, with breaks allowed at 0.3194 and 0.6806. The norm for this space is ‖·‖_γ as defined in (5) where each pair x, y ∈ M are both elements of one of the three subintervals [0.1154, 0.3194], [0.3194, 0.6806], [0.6806, 1].
Our strategy is as follows. Choose γ < 1. If the Perron-Frobenius operator for T has no isolated spectrum, then we take 0.5706^γ as our upper bound. If, however, there are some isolated eigenvalues of the Perron-Frobenius operator lying outside the disk {|z| ≤ 0.5706^γ}, we will approximate these as eigenvalues of our matrices P_n, and use these values as our upper bounds instead.
The map T has been chosen for our example because it is close to a map T̃ whose Perron-Frobenius operator P̃ is known to have some eigenvalues lying outside the region {|z| ≤ λ}. The author is grateful to Viviane Baladi for communicating the piecewise linear map T̃x = −1.7525|x − 1/2| + 1 as an example of a map with this property. Our nonlinear map T is a small perturbation of T̃ that demonstrates a nontrivial application of our technique.
An initial Markov partition $\mathcal{P}$ = {[0.1154, 0.3194], [0.3194, 0.5], [0.5, 0.6806], [0.6806, 1]} is formed. Refined Markov partitions are constructed by simply taking the join of $\mathcal{P}$ with its inverse images, that is, define $\mathcal{P}^{(N)} = \bigvee_{i=0}^{N-1} T^{-i}\mathcal{P}$, where $\mathcal{A} \vee \mathcal{B} := \{A \cap B : A \in \mathcal{A}, B \in \mathcal{B}\}$. The 12 set partition $\mathcal{P}^{(2)}$ is illustrated in Fig. 3. At each stage of refinement, a transition matrix $P_n$ is constructed (note that $P_n$ is an n × n matrix, whereas N refers to the number of inverse iterates required to construct the partition; thus $P_{12}$ is the matrix corresponding to $\mathcal{P}^{(2)}$). The absolute values of the maximum non-unit eigenvalues of the sequence of matrices for N = 0, . . . , 9 are given in Table 1.

Table 1. Estimates for the magnitude of the largest non-unit eigenvalue of the transfer operator of L

Inverse iterates N    Bound for rate of decay, $\max_{z\in\sigma(P_n)\setminus\{1\}} |z|$
0                     0.60037
1                     0.59687
2                     0.59649
3                     0.59846
4                     0.59815
5                     0.59812
6                     0.59823
7                     0.59812
8                     0.59819
9                     0.59816
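As a rough illustration of the refinement step, the join of an interval partition with its inverse images can be sketched by storing each partition as a sorted list of breakpoints. This is only a toy under stated assumptions: it uses the full-branch tent map x ↦ 1 − |2x − 1| on [0, 1] as a stand-in (it is not the map T of this example), for which the preimages of a point b are b/2 and 1 − b/2. The helper name `refine` and its argument `steps` (the number of preimage steps taken) are hypothetical.

```python
def refine(breakpoints, steps):
    """Breakpoints of the join of a partition with its first `steps` inverse images.

    Stand-in map (an assumption): the tent map T(x) = 1 - |2x - 1| on [0, 1],
    whose two inverse branches are b -> b/2 and b -> 1 - b/2.
    For interval partitions the join A v B = {A ∩ B} is obtained by merging
    the breakpoint sets.
    """
    points = set(breakpoints)
    frontier = set(breakpoints)
    for _ in range(steps):
        # Breakpoints of the next inverse image are preimages of the current ones.
        frontier = {y for b in frontier for y in (b / 2.0, 1.0 - b / 2.0)}
        points |= frontier
    return sorted(points)


initial = [0.0, 0.5, 1.0]  # two-cell Markov partition for the stand-in map
for steps in range(4):
    cells = len(refine(initial, steps)) - 1
    print(steps, cells)
```

For this stand-in map the number of cells doubles with every preimage step, which is the kind of growth of n with N that the text goes on to compare with the rate of convergence.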
Figure 5 shows the position of the eigenvalues of $P_{212}$ (corresponding to N = 7) relative to the region bounding the essential spectrum.

Fig. 5. A plot of the spectrum of $P_{212}$ (the transition matrix for T constructed from a Markov partition refined from 7 inverse iterations). The dotted circle is given by {|z| = 0.5706}; it is known that the essential spectrum is contained in this region. The crosses represent the eigenvalues of $P_{212}$ that are estimating the isolated spectrum of P

From the above results, we can fairly confidently state that P does have a non-trivial isolated spectrum, and that therefore, it is this isolated spectrum that will control the rate of decay. Corollary 2.4 says that a bound for the rate of decay is $\max\{\lambda^\gamma, \lim_{n\to\infty} \max_{z\in\sigma(P_n)\setminus\{1\}} |z|\}$. We thus take 0.599 as a safe upper bound for the mixing rate. By Remark 4.5, 0.599 is also the rate of decay with respect to piecewise $C^1$ test functions. In Sect. 8 we shall see that not only is 0.599 a bound, but that 0.5982 is an estimate of the rate of decay. This result will follow from an analysis of the eigenvectors of $P_n$. As mentioned earlier, if it so happened that all of the eigenvalues of $P_{212}$ were contained in $\{|z| \le 0.5706^\gamma\}$, we would simply take $0.5706^\gamma$ as our upper bound instead.
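The way the bound of Corollary 2.4 is read off from the computed eigenvalues can be sketched numerically. The values below are those of Table 1 and λ = 0.5706 from the text; the choice γ = 0.95, the use of the largest entry for N ≥ 5 as a stand-in for the limit, and the function name `decay_bound` are assumptions made purely for illustration.

```python
# Table 1: entry N is the largest non-unit eigenvalue magnitude of P_n.
table1 = [0.60037, 0.59687, 0.59649, 0.59846, 0.59815,
          0.59812, 0.59823, 0.59812, 0.59819, 0.59816]
lam = 0.5706  # lambda = 1 / min |T'(x)|, as quoted in the text


def decay_bound(gamma, isolated_estimate, lam=lam):
    """max(lambda**gamma, estimate of the isolated spectral radius)."""
    return max(lam ** gamma, isolated_estimate)


# Assumption: treat the entries for N >= 5 as "settled" and take their maximum
# as a crude proxy for the limit in Corollary 2.4.
isolated = max(table1[5:])
bound = decay_bound(gamma=0.95, isolated_estimate=isolated)
print(bound, bound < 0.599)
```

With these inputs the isolated-spectrum estimate dominates the essential-spectrum term, and the result stays below the safe bound of 0.599 adopted in the text.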
7. Order of Convergence of the Spectrum

In this section we consider the situation where L has nontrivial isolated spectrum. We wish to know the rate at which the isolated spectrum of $L_N$ approaches that of L. Recall that any isolated spectral values of $L_N$ appear as eigenvalues of the matrix $P_n$, and that the spectral radius of $L_N|_{F^{\perp}_{\theta'}}$ coincides with the magnitude of the second largest eigenvalue of $P_n$. Denote by $\hat{z}$ one of the eigenvalues of L satisfying $|\hat{z}| = \max_{z\in\sigma(L)\setminus\{1\}} |z|$. Whether or not this maximum is attained by more than one eigenvalue of L (in the case of a complex conjugate pair, for example) we do not care, as we are only concerned with the magnitude of the eigenvalue. We decompose the spectrum as $\sigma(L) = \Sigma_0 \cup \Sigma_1$, where $\Sigma_0 = \{\hat{z}\}$ and $\Sigma_1 = \sigma(L) \setminus \{\hat{z}\}$. One has the L-invariant subspace decomposition $F_{\theta'} = X_0 \oplus X_1$, satisfying $\sigma(L|_{X_i}) = \Sigma_i$, i = 0, 1 ([12] Theorem III-6.17). Denote by $\Gamma \subset \mathbb{C}$ a simple closed curve containing an open neighbourhood of $\hat{z}$, but no elements of $\Sigma_1$. The operator $\Pi_0 : F_{\theta'} \to X_0$ defined by

$$\Pi_0 = -\frac{1}{2\pi i} \int_\Gamma (L - zI)^{-1}\, dz \qquad (25)$$

is a projection onto $X_0$ along the direction $X_1$. The following theorem is a modified version of Theorem IV-3.16 [12].

Theorem 7.1. If $\|L - L_N\|_{\theta'}$ is sufficiently small, the spectrum of $L_N$ is separated by $\Gamma$ into two parts $\Sigma_{0,N}$ and $\Sigma_{1,N}$. There is an associated $L_N$-invariant decomposition $F_{\theta'} = X_{0,N} \oplus X_{1,N}$ with $\dim(X_{i,N}) = \dim(X_i)$, i = 0, 1. Furthermore the projection onto $X_{0,N}$ along $X_{1,N}$ given by

$$\Pi_{0,N} = -\frac{1}{2\pi i} \int_\Gamma (L_N - zI)^{-1}\, dz$$

approaches $\Pi_0$ in norm as $\|L - L_N\|_{\theta'} \to 0$.

Proof. See [12] p.213.
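The contour-integral formula for the spectral projection can be checked numerically in a finite-dimensional toy setting. The sketch below is an illustration only, not part of the method of the paper: it replaces the operator by a hypothetical 2 × 2 matrix with eigenvalues 1 and 0.6, takes the curve to be a circle of radius 0.2 about 0.6, and approximates the integral by the trapezoid rule. The function names are assumptions.

```python
import cmath
import math


def inverse_2x2(m):
    """Inverse of a 2x2 (complex) matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def riesz_projection(A, centre, radius, nodes=200):
    """Approximate -(1/(2 pi i)) * integral of (A - zI)^{-1} dz
    over the circle |z - centre| = radius, using the trapezoid rule."""
    P = [[0j, 0j], [0j, 0j]]
    for k in range(nodes):
        angle = 2.0 * math.pi * k / nodes
        unit = cmath.exp(1j * angle)
        z = centre + radius * unit
        dz = 1j * radius * unit * (2.0 * math.pi / nodes)
        R = inverse_2x2([[A[0][0] - z, A[0][1]], [A[1][0], A[1][1] - z]])
        for i in range(2):
            for j in range(2):
                P[i][j] += -R[i][j] * dz / (2j * math.pi)
    return P


# Hypothetical matrix: eigenvalue 1 with eigenvector (1, 0),
# eigenvalue 0.6 with eigenvector (3, -4).
A = [[1.0, 0.3], [0.0, 0.6]]
P0 = riesz_projection(A, centre=0.6, radius=0.2)
print([[round(entry.real, 6) for entry in row] for row in P0])
```

The computed matrix is idempotent up to quadrature error and maps every vector onto the eigendirection of 0.6 along that of 1, which is the finite-dimensional analogue of the projection onto $X_0$ along $X_1$.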
For large enough N we have that a portion of $\sigma(L_N)$ is isolated inside the closed curve $\Gamma$, and that the dimension of the corresponding invariant subspace is equal to the dimension of the eigenspace associated with $\hat{z}$. We now go on to bound the distance between the elements of $\Sigma_{0,N}$ and $\hat{z}$. The following two results are borrowed from Chatelin [8] p.278.

Lemma 7.2. $\Pi_{0,N}|_{X_0} : X_0 \to X_{0,N}$ is a bijection for sufficiently large N.

Proof. Let $\phi_0 \in X_0$. One has

$$\bigl|\, \|\phi_0\|_{\theta'} - \|\Pi_{0,N}\phi_0\|_{\theta'} \bigr| = \bigl|\, \|\Pi_0\phi_0\|_{\theta'} - \|\Pi_{0,N}\phi_0\|_{\theta'} \bigr| \le \|(\Pi_0 - \Pi_{0,N})\phi_0\|_{\theta'} \le \|\phi_0\|_{\theta'}/2$$

for sufficiently large N by Theorem 7.1. Thus $\|\Pi_{0,N}\phi_0\|_{\theta'} \ge \|\phi_0\|_{\theta'}/2$, and so

$$\|\Pi_{0,N}|_{X_0}\|_{\theta'} \ge 1/2 \implies \|(\Pi_{0,N}|_{X_0})^{-1}\|_{\theta'} \le 2.$$

In particular $\Pi_{0,N}|_{X_0}$ is injective; as $X_0$ is finite dimensional and $\dim(X_{0,N}) = \dim(X_0)$ by Theorem 7.1, it is a bijection.

Proposition 7.3. Let N be large enough to make the conclusions of Theorem 7.1 and Lemma 7.2 true. Define $\hat{L}, \hat{L}_N : X_0 \to X_0$ by $\hat{L} = L|_{X_0}$ and $\hat{L}_N = (\Pi_{0,N}|_{X_0})^{-1} \circ L_N \circ \Pi_{0,N}|_{X_0}$. Then

$$\|\hat{L} - \hat{L}_N\|_{\theta'} \le 2\|L - L_N\|_{\theta'},$$

and

$$H(\Sigma_{0,N}, \{\hat{z}\}) = O\bigl(\|L - L_N\|_{\theta'}^{1/\dim(X_0)}\bigr).$$

Proof. Clearly $\sigma(\hat{L}) = \{\hat{z}\}$ and $\sigma(\hat{L}_N) = \Sigma_{0,N}$. Now,

$$\begin{aligned}
\bigl\|\hat{L} - (\Pi_{0,N}|_{X_0})^{-1} \circ L_N \circ \Pi_{0,N}|_{X_0}\bigr\|_{\theta'}
&= \bigl\|\hat{L} - (\Pi_{0,N}|_{X_0})^{-1} \circ \Pi_{0,N} \circ L_N|_{X_0}\bigr\|_{\theta'} \\
&= \bigl\|(\Pi_{0,N}|_{X_0})^{-1} \circ \Pi_{0,N} \circ (L - L_N)|_{X_0}\bigr\|_{\theta'} \\
&\le \bigl\|(\Pi_{0,N}|_{X_0})^{-1} \circ \Pi_{0,N}\bigr\|_{\theta'}\, \|L - L_N\|_{\theta'} \le 2\|L - L_N\|_{\theta'}
\end{aligned}$$

for N as in Lemma 7.2.

For the second part of the proposition we apply a standard result of Elsner (see Stewart & Sun [22], p.168, for example). Let $[\hat{L}]$, $[\hat{L}_N]$ denote matrix representations of $\hat{L}$ and $\hat{L}_N$ with respect to some basis of $X_0$.

Sublemma 7.4.

$$H(\hat{L}, \hat{L}_N) \le \bigl(\|[\hat{L}]\|_2 + \|[\hat{L}_N]\|_2\bigr)^{1 - 1/\dim(X_0)}\, \|[\hat{L}] - [\hat{L}_N]\|_2^{1/\dim(X_0)},$$

where $\|\cdot\|_2$ is the standard $L^2$ matrix norm.

Using the fact that all norms are equivalent on finite dimensional spaces, by Sublemma 7.4 we are done.

Eq. (22) and Proposition 7.3 imply that the order of convergence of $\max_{z\in\sigma(P_n)\setminus\{1\}} |z|$ to $|\hat{z}|$ is at least exponential in N with rate $\max\{\theta', \theta/\theta'\}$. Such behaviour is corroborated by Fig. 6. In our example, θ = λ = 0.5706, and we may take $\theta' = \lambda^{\gamma'} = 0.597$, say. The exact figure of θ' doesn't matter, so long as it lies between θ and where we think our discrete eigenvalues are. In this case, the rate of convergence is at least $(0.5706/0.597)^N = 0.9558^N$.

If we wish to consider how quickly our estimates converge when compared to an increase in the size of $P_n$ (in other words, the size of the partition $\mathcal{P}_n$), we may proceed as follows. Define $\Theta := 1/\sup_{x\in M} |\det D_x T|$ (T expanding). There exists a constant C such that the number of sets n in a Markov partition constructed from N inverse iterates of the original partition satisfies $n \le C/\Theta^N$. Choose κ > 0 such that $\Theta^\kappa = \max\{\theta', \theta/\theta'\}$; in other words, put $\kappa = \log \max\{\theta', \theta/\theta'\} / \log \Theta$. Then

$$\|L - L_N\|_{\theta'} \le 8|g|_\theta (\Theta^\kappa)^N \le 8|g|_\theta \frac{C^\kappa}{n^\kappa} = O\Bigl(\frac{1}{n^\kappa}\Bigr).$$

In our example, Θ = 0.5515, and κ = log(0.5706/0.597)/ log(0.5515) = 0.0760.
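The arithmetic behind the quoted rate and exponent is easy to reproduce. The short sketch below uses only the numbers given in the text (θ = 0.5706, θ' = 0.597, Θ = 0.5515); the variable names and the derived quantity `iterates_per_digit` are introduced here for illustration.

```python
import math

theta = 0.5706        # theta = lambda in the example
theta_prime = 0.597   # the illustrative choice of theta' made in the text
Theta = 0.5515        # Theta = 1 / sup |det D_x T| in the example

# Exponential rate in N: max{theta', theta/theta'}.
rate = max(theta_prime, theta / theta_prime)

# Exponent kappa with Theta**kappa = rate, giving the O(1/n**kappa) order in n.
kappa = math.log(rate) / math.log(Theta)

# Derived quantity (not stated in the text): number of inverse iterates needed
# for the guaranteed error factor rate**N to shrink by a factor of ten.
iterates_per_digit = math.log(0.1) / math.log(rate)

print(round(rate, 4), round(kappa, 4), iterates_per_digit)
```

The small value of κ makes explicit that the guaranteed order of convergence in the matrix size n is slow, even though the convergence in N is exponential.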
Fig. 6. A plot of maxz∈σ(Pn )\{1} |z| versus N . Note that the convergence to a single value appears to be occurring exponentially as expected. The safe bound of 0.599 is shown dotted
8. Estimating the Rate of Decay Using Eigenvectors

By computing the eigenvectors of the matrix $P_n$ in addition to its eigenvalues, one may further refine the analysis of the rate of decay. We assume that we are in the situation where L has nontrivial isolated spectrum, and N is large enough to satisfy the conditions of Proposition 7.3. We denote by $\hat{z}_N$ one of the eigenvalues of $P_n$ satisfying $|\hat{z}_N| = \max_{z\in\sigma(P_n)\setminus\{1\}} |z|$. Norm convergence of $L_N$ to L not only guarantees that the isolated spectrum of L is approached by that of $L_N$ (that is, $\hat{z}_N \to \hat{z}$), but that the corresponding eigenvectors (eigenfunctions) of L are also approximated in $\|\cdot\|_{\theta'}$ norm by those of $L_N$. Formally, one has the following result; see [8], Theorem 6.7.

Theorem 8.1. Under the hypotheses of Proposition 7.3, let $\hat{\phi}_N$ be an eigenvector of $P_n$ corresponding to the eigenvalue $\hat{z}_N \in \Sigma_{0,N}$. Consider $\hat{\phi}_N$ as a test function in $X_{0,N}$. Then

$$\mathrm{dist}(\hat{\phi}_N, X_0) := \min_{\phi\in X_0} \|\hat{\phi}_N - \phi\|_{\theta'} = O\bigl(\|L - L_N\|_{\theta'}^{1/\dim(X_0)}\bigr). \qquad (26)$$
Such convergence results sometimes allow us to refine our bounds into estimates of the rate of decay. If the maximal non-unit eigenvalue $\hat{z}_N$ of $P_n$ satisfies $|\hat{z}_N| > \theta'$, then $|\hat{z}_N|$ actually gives us an estimate, rather than just a bound, for the spectral radius of $L|_{F^{\perp}_{\theta'}}$. As mentioned in Remark 3.2, the space $F^{\perp}_{\theta'}$ is a good deal larger than $C^{\gamma'}(M, \mathbb{R})$ and $C^{\gamma'}(M, \mathbb{C})$. $F^{\perp}_{\theta'}$ contains a lot of functions that do not concern us, namely those with more discontinuities than we are allowing. Recall that if T is piecewise smooth, we allow breaks in the smoothness of test functions at all forward images of the breaks in T. However, functions in $F^{\perp}_{\theta'}$ are allowed breaks at infinitely many places. We can identify these unwanted eigenfunctions by numerically examining the eigenvectors of $P_n$. If any of these eigenvectors have more breaks in smoothness than is allowed, we disregard their corresponding eigenvalues. By removing the offending eigenfunctions and corresponding eigenvalues, we sharpen our bounds on the rate of decay. If, after removal of all irrelevant eigenvalues, there still remains an eigenvalue that is clearly outside the disk $\{|z| \le \lambda^{\gamma'}\}$, then the magnitude of this eigenvalue is an estimate of the rate of decay of correlations, rather than merely a bound. This is because we have found an explicit approximation to an eigenfunction, namely the corresponding eigenvector, for which (6) displays a decay rate given by the magnitude of the eigenvalue.
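One crude way to automate the inspection of an eigenvector for disallowed breaks is to look for jumps between neighbouring cells that are much larger than the typical jump. The sketch below is a heuristic under stated assumptions and not the procedure used in the paper: it samples a synthetic piecewise smooth function on a uniform grid over [0.1154, 1] (the cells of a Markov partition are not uniform), and the function names, the threshold `factor` and the tolerance are hypothetical. Only the allowed break positions 0.3194 and 0.6806 are taken from the text.

```python
def find_breaks(midpoints, values, factor=10.0):
    """Positions where the jump between neighbouring samples exceeds
    `factor` times the median jump (a heuristic threshold)."""
    jumps = [abs(values[i + 1] - values[i]) for i in range(len(values) - 1)]
    typical = sorted(jumps)[len(jumps) // 2]
    return [0.5 * (midpoints[i] + midpoints[i + 1])
            for i, jump in enumerate(jumps) if jump > factor * typical]


def only_allowed(breaks, allowed, tol):
    """True if every detected break lies within `tol` of an allowed position."""
    return all(any(abs(b - a) <= tol for a in allowed) for b in breaks)


def sample(x, extra_break=False):
    """Synthetic piecewise smooth test vector (an assumption, for illustration)."""
    y = x
    if x >= 0.3194:
        y += 1.0
    if x >= 0.6806:
        y -= 3.0
    if extra_break and x >= 0.5:
        y += 2.0  # a break at a position that is not allowed
    return y


left, right, n = 0.1154, 1.0, 200
h = (right - left) / n
mid = [left + (i + 0.5) * h for i in range(n)]
allowed = [0.3194, 0.6806]

good = [sample(x) for x in mid]
bad = [sample(x, extra_break=True) for x in mid]

print(only_allowed(find_breaks(mid, good), allowed, tol=h))
print(only_allowed(find_breaks(mid, bad), allowed, tol=h))
```

In practice the threshold would need tuning, since a steep but smooth eigenfunction can produce large jumps on a coarse partition; a visual check such as Fig. 7 remains the more reliable test.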
Fig. 7. The real and imaginary part of the (approximate) eigenfunction corresponding to the eigenvalue −0.5014 + 0.3261i calculated using the matrix P212 (N = 7). These two functions (considered separately as real-valued) also span a two-dimensional real invariant subspace associated with the complex eigenvalues −0.5014 ± 0.3261i
In our example, both of our eigenvalues of $P_n$ lying outside {|z| ≤ 0.5706} are complex. We compute the eigenvectors corresponding to this complex conjugate pair. From Fig. 7 it is apparent that the corresponding eigenfunction is piecewise smooth with breaks only at the allowed positions of x = 0.3194 and x = 0.6806. Thus 0.59816 is an estimate of the rate of decay of (T, µ) with respect to complex-valued $C^1$ test functions, and the function shown in Fig. 7 is an example of a function for which (6) decays at this rate. This value is also an estimate for the rate of decay of (T, µ) with respect to real-valued $C^1$ test functions. The complex conjugate eigenvalue pair has an associated two-dimensional invariant subspace of real-valued functions (spanned by the two functions shown in Fig. 7). This two-dimensional invariant subspace contains no eigenfunctions, but all functions in the subspace decay in norm under the action of P at the same rate as the complex eigenfunctions.

9. Comparisons and Discussion

We have presented a method of computing a bound for the rate of decay of correlations for $C^{1+\gamma}$ expanding and hyperbolic maps acting on $C^\gamma$, 0 < γ ≤ 1, test functions. Our technique used the relative volumes of intersection of Markov partition sets with their inverse images to provide us with a matrix approximation of the Perron-Frobenius operator for the map. This approximation was close to P in the operator norm topology, and so their spectra were close also. To compute the spectrum of our matrix approximation was a simple matter. Our method also yields a numerical approximation of the physical invariant measure of T “for free”; see [9] for details.
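The matrix built from relative volumes of intersection can be sketched for a simple piecewise linear map. Everything specific here is an assumption: the entry formula $P_{ij} = m(A_i \cap T^{-1}A_j)/m(A_i)$ is our reading of the verbal description above (equation (9) is not reproduced in this section), the map is the full-branch tent map on [0, 1] rather than the map T of the example, and the function names are hypothetical. The stationary row vector, obtained by power iteration, plays the role of the approximate invariant measure.

```python
def tent(x):
    """Stand-in map (an assumption): T(x) = 1 - |2x - 1| on [0, 1]."""
    return 1.0 - abs(2.0 * x - 1.0)


def transition_matrix(breakpoints, slope=2.0):
    """P[i][j] = m(A_i ∩ T^{-1}A_j) / m(A_i) for cells A_i = [b_i, b_{i+1}].

    Assumes every cell lies inside one linear branch of the tent map, so that
    T(A_i) is an interval and m(A_i ∩ T^{-1}A_j) = m(T(A_i) ∩ A_j) / slope.
    """
    cells = list(zip(breakpoints[:-1], breakpoints[1:]))
    matrix = []
    for a, b in cells:
        lo, hi = sorted((tent(a), tent(b)))
        row = []
        for c, d in cells:
            overlap = max(0.0, min(hi, d) - max(lo, c))
            row.append(overlap / slope / (b - a))
        matrix.append(row)
    return matrix


def stationary(matrix, steps=100):
    """Power iteration p -> pP starting from a point mass on the first cell."""
    n = len(matrix)
    p = [1.0] + [0.0] * (n - 1)
    for _ in range(steps):
        p = [sum(p[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return p


P = transition_matrix([0.0, 0.25, 0.5, 0.75, 1.0])
weights = stationary(P)
for row in P:
    print(row)
print(weights)
```

For the tent map the stationary weights are equal on equal-length cells, consistent with Lebesgue measure being invariant; for a nonlinear map one would need the measure of each preimage, for instance by inverting each branch numerically.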
In the introduction, we mentioned a simple-minded method of calculating the correlation function (1.2) by choosing specific real-valued functions ϕ and ψ, sprinkling a large number of points in M distributed according to µ, and iterating them forward k steps. The integral with respect to µ is then approximated by averaging the contributions from the sprinkled points; in this way Cϕ,ψ (k) is “computed”. Once one has values for Cϕ,ψ (k) for k = 1, . . . , K, one may subject this sequence to a number of analyses to determine a rate of decay. The simplest is to perform a sum-of-exponentials fit to the data. One may also perform Fourier analysis (see Isola [11], for example) on the sequence to try to extract the frequencies corresponding to decay rates. Either way, the number of sprinkled points required to maintain accuracy becomes prohibitively large very quickly because of the very exponential stretching properties one is trying to measure. Usually, this distribution of points approximating µ is obtained by running out a single long orbit of T . This procedure itself is an unreliable method for approximating µ because of computer roundoff and the possibility that the orbit chosen represents atypical behaviour (for example, it may be an orbit falling into a weak periodic sink, while we only observe its initial transient effects and do not detect the eventual periodicity). Our construction is a single-step method, thus avoiding problems such as compounding computer roundoff and long-term transient effects. Finally, the particular functions ϕ and ψ chosen may not be representative of the slowest possible decay rate. Results obtained from such an analysis must be treated with caution. Rugh [20] has shown that one may compute the spectrum of the Perron-Frobenius operator for real analytic hyperbolic maps acting on the space of analytic test functions, using the periodic points of the map. 
This method of using the periodic orbits of a map to build up a picture of the dynamics has been popular with physicists for some time; see Christiansen et al. [3] for numerical examples of the determination of correlation spectra. However, the technique is very restrictive, requiring extremely smooth (analytic) maps and observables ϕ, ψ. From a physical point of view, one would expect to have to deal with both systems and physical observables that are not infinitely differentiable. In this paper, we have made the choice of observables that are $C^1$ or at least $C^\gamma$, 0 < γ < 1, and believe that such a model of the world is much more realistic.

Our main result is in a similar vein to that of Baladi et al. [1]. In their paper they consider expanding Markov maps of the interval, with the Perron-Frobenius operator acting on test functions of bounded variation, and use the variational norm to compute the spectra. Again, we consider physical observables that are $C^1$ or Hölder continuous to be more realistic than observables of bounded variation. Their result is also restrictive in the sense that it is only applicable to one-dimensional expanding systems. Although the matrix construction of [1] turns out to be identical to ours, their method of proof is entirely different.

Our goal was to improve the known theoretical bounds for the rate of mixing with respect to $C^1$ test functions. In the table below, we compare the estimates of [21] and [13, 14] with our computer-assisted bounds. It is clear that just a small amount of nonlinearity greatly affects the known theoretical bounds. It is interesting that although Ruelle’s [19] bound of λ for the essential spectral radius increases as we add nonlinearities to $\tilde{T}$ (because the minimum slope 1/λ decreases), our bound for the mixing rate actually improves. That is, our bound for the isolated spectrum moves in from a magnitude of 0.6009 to somewhere around 0.598–0.599. Thus in this case, adding nonlinearities may actually speed up the mixing of phase space. Such an observation would not be possible using the bounds of [21] or [13, 14] as added nonlinearities always worsen their estimates.
Table 2. Our computer-assisted bound for the rate of decay with respect to piecewise $C^1$ test functions compared to known theoretical bounds (see note 1)

Map                                       Ruelle bound            Our bound   Liverani bound   Rychlik bound
                                          (essential spectrum)
$\tilde{T}x = -1.7525|x - 1/2| + 1$       0.5652                  0.6009      0.6009           0.9986
T as in (24)                              0.5706                  0.599       0.9162           0.99999915
1 Liverani’s results strictly only apply to interval maps for which each branch is onto. As the left branches of T˜ and T are not onto, we generously make this estimate the best possible for T˜ , assuming that his bound actually finds the isolated eigenvalues. When the map is piecewise linear and all branches are onto, his algorithm does return the best estimate, however, for such maps there are no isolated eigenvalues to find. For the map T , we compute his bound again assuming that both branches of this nonlinear map are onto.
We have established an order of convergence of our bounds to the “optimal” bounds in terms of (i) the number of inverse iterates N of T used to construct the Markov partition and (ii) the number of partition sets n used to construct our matrix approximation Pn . A computation of the eigenvectors of Pn may allow a refinement of our bound, in some cases producing an estimate of the rate of decay rather than merely a bound. The order of convergence of such estimates is identical to the order of convergence of our bounds. Finally, we acknowledge that although the construction of Markov partitions for one-dimensional maps is relatively easy, it may be time consuming for some higher dimensional maps. If one is in a hurry, one may construct matrices defined by (9) using any finite partition of M into connected subsets (a triangulation, for example), and compute its eigenvalues. Because one no longer has the coding machinery available, rigorous results are not as easy to come by. In the author’s experience, “most” randomly chosen partitions tend to give pessimistic estimates of the rate of mixing. That is, for systems with known rates of mixing, matrices constructed from arbitrary partitions tend to have eigenvalues lying outside the disk containing all the true eigenvalues of the Perron-Frobenius operator. A discussion of these results and which mixing rate is actually being approximated (be it with respect to C 1 test functions, L1 test functions, or test functions of bounded variation) is contained in [10]. Acknowledgement. The author thanks Lai-Sang Young for helpful discussions and Viviane Baladi for communicating the map T˜ . This work was partially supported by a grant from the Australian Research Council.
References

1. Baladi, V., Isola, S. and Schmitt, B.: Transfer operator for piecewise affine approximations of interval maps. Annales de l’Institut Henri Poincaré - Physique théorique, 62 (3), 251–265 (1995)
2. Bowen, R.: Equilibrium States and the Ergodic Theory of Anosov Diffeomorphisms, Volume 470 of Lecture Notes in Mathematics. Berlin: Springer-Verlag, 1975
3. Christiansen, F., Paladin, G. and Rugh, H.H.: Determination of correlation spectra in chaotic systems. Phys. Rev. Lett. 65 (17), 2087–2090 (1990)
4. Collet, P. and Isola, S.: On the essential spectrum of the transfer operator for expanding Markov maps. Commun. Math. Phys. 139, 551–557 (1991)
5. Crawford, J. and Cary, J.: Decay of correlations in a chaotic measure-preserving transformation. Physica D 6 (2), 223–232 (1983)
6. Dunford, N. and Schwartz, J.T.: Linear Operators. I. General Theory, Volume 7 of Pure and Applied Mathematics. New York: Interscience, 1958
7. Erdelyi, I. and Lange, R.: Spectral Decomposition on Banach Spaces, Volume 623 of Lecture Notes in Mathematics. Berlin: Springer-Verlag, 1977
8. Chatelin, F.: Spectral Approximation of Linear Operators. Computer Science and Applied Mathematics. New York: Academic Press, 1983
9. Froyland, G.: Finite approximation of Sinai-Bowen-Ruelle measures of Anosov systems in two dimensions. Random & Comp. Dyn. 3 (4), 251–264 (1995)
10. Froyland, G.: Estimating Physical Invariant Measures and Space Averages of Dynamical Systems Indicators. PhD thesis, The University of Western Australia, Perth, 1996. Available at http://maths.uwa.edu.au/~gary/
11. Isola, S.: Resonances in chaotic dynamics. Commun. Math. Phys. 116, 343–352 (1988)
12. Kato, T.: Perturbation Theory for Linear Operators, Volume 132 of Grundlehren der mathematischen Wissenschaften. Berlin: Springer-Verlag, second edition, 1976
13. Liverani, C.: Decay of correlations. Ann. Math. 142 (2), 239–301 (1995)
14. Liverani, C.: Decay of correlations for piecewise expanding maps. J. Stat. Phys. 78 (3/4), 1111–1129 (1995)
15. Mañé, R.: Ergodic Theory and Differentiable Dynamics. Berlin: Springer-Verlag, 1987
16. Parry, W. and Pollicott, M.: Zeta Functions and the Periodic Orbit Structure of Hyperbolic Dynamics. Volume 187–188. Société Mathématique de France, 1990
17. Pollicott, M.: Meromorphic extensions of generalised zeta functions. Invent. Math. 85, 147–164 (1986)
18. Ruelle, D.: A measure associated with axiom-A attractors. Am. J. Math. 98 (3), 619–654 (1976)
19. Ruelle, D.: The thermodynamic formalism for expanding maps. Commun. Math. Phys. 125, 239–262 (1989)
20. Rugh, H.H.: The correlation spectrum for hyperbolic analytic maps. Nonlinearity, 5, 1237–1263 (1992)
21. Rychlik, M.: Regularity of the metric entropy for expanding maps. Trans. Am. Math. Soc. 315 (2), 833–847 (1989)
22. Stewart, G.W. and Sun, J.G.: Matrix Perturbation Theory. Computer Science and Scientific Computing. Boston: Academic Press, 1990
23. Ulam, S.M.: Problems in Modern Mathematics. New York: Interscience, 1964

Communicated by Ya.G. Sinai
Commun. Math. Phys. 189, 259 – 261 (1997)
Communications in
Mathematical Physics c Springer-Verlag 1997
Roland L. Dobrushin A. Jaffe, J. Lebowitz, Ya. Sinai
We dedicate this issue of Communications in Mathematical Physics to our dear friend and colleague Roland Dobrushin, who died of cancer on November 13, 1995, at the age of 65. Dobrushin was not only a great, classic scientist, but he was also a man of integrity, courage, and good humor. These qualities stood out at a time and place where they were extremely scarce commodities. His joy of life was infectious and his death is an irreplaceable loss to his family, his friends, and the world scientific community. Dobrushin made fundamental contributions to many fields of mathematics and to mathematical physics. Others have treated his work on analysis and probability theory, [1, 9, 10, 12, 13, 14, 15], so we concentrate here on his achievements in mathematical physics. Trained as a probabilist, Dobrushin became intrigued early in his career with the question of how to formulate the notion of a statistical state in infinite volume. In a finite volume, there is no problem, as one can use standard Gibbs
distributions. Taking this as a starting point, Dobrushin introduced an interesting mathematical definition of an equilibrium statistical state by specifying a net of conditional distributions, obtained by imposing different boundary conditions in finite volumes [2]. Lanford and Ruelle introduced a similar notion at approximately the same time [11]. Since then, we usually refer to conditions for conditional distributions as DLR-equations. This point of view had an immediate effect: researchers in the field began to regard the problem of characterizing phase transitions as a problem of how to describe the set of all possible Gibbs states for a given inter-particle potential and thermodynamic parameters. Today, Dobrushin’s point of view has become generally accepted – not only by mathematicians – but also by theoretical physicists.

Dobrushin pioneered the development of a mathematical formulation of the famous Peierls’ contour method in the theory of phase transitions [17], rediscovering it independently of the related work of Griffiths [8]. Dobrushin proved the existence of phase transitions in the low-temperature Ising model, establishing non-uniqueness of the infinite-volume Gibbs’ state [3]. Combining his ideas on limiting states with Peierls’ method, Dobrushin established a remarkable phenomenon – the existence of a non-translationally invariant Gibbs state at low temperatures for Ising models in dimension d ≥ 3 [4]. Today, this is known as the “roughening transition” or the “Dobrushin phase”. In his last years, working together with R. Kotecky and S. Shlosman, Dobrushin developed the theory of phase “droplets” in Ising type models [5]. This body of work justifies the famous Wulff construction in the theory of solids [18].

Dobrushin possessed an unusual intuition about probabilistic matters, as did his teacher A. N. Kolmogorov.
A manifestation of this quality can be seen in the paper on the absence of continuous symmetry breaking in two-dimensional lattice models. Dobrushin and Shlosman reduced this old question of Mermin and Wagner [16] to a well-known result in probability theory about the distribution of sums of special, non-identically distributed, random variables [6].

Dobrushin always had some interest in the dynamical problems of statistical mechanics. One of several examples was his result with J. Fritz showing the existence of dynamics for statistical mechanical systems in two dimensions containing an infinite number of particles, along with partial results in three dimensions [7]. These problems lead to difficulties that look similar to those in the study of the Navier-Stokes equations; they are under control in two dimensions, but may lead to singular behavior of the energy in three dimensions that is not yet understood.

References

1. Bassalygo, L., Malyshev, V., Minlos, R., Ovseevich, I., Pechersky, E., Pinsker, M., Prelov, V., Rybko, A., Suhov, Yu., Shlosman, S.: In memory of Roland L’vovich Dobrushin. Problemy Peredachi Informacii, 32 nr. 3, 3–24 (1996)
2. Dobrushin, R.L.: Gibbsian random fields for lattice systems with pair interaction. Funct. Anal. and Appl. 2, 31–43 (1968); Problem of uniqueness of a Gibbs random field and phase transitions. Funct. Anal. and Appl. 2, 44–57 (1968); Gibbs field: The general case. Funct. Anal. and Appl. 3, 27–35 (1969)
3. Dobrushin, R.L.: Existence of a phase transition in two- and three-dimensional lattice models. Prob. Theory and Appl. 10, 209–230 (1965)
4. Dobrushin, R.L.: Gibbs states which describe the co-existence of phases for a three-dimensional Ising model. Prob. Theory and Appl. 17, 612–639 (1972)
5. Dobrushin, R.L., Kotecky, R. and Shlosman, S.: Wulff construction – A global shape from local interaction. Translations of Math. Monographs, 104, Providence, RI: AMS, 1992
6. Dobrushin, R.L. and Shlosman, S.: Absence of breakdown of continuous symmetry in two-dimensional models of statistical physics. Commun. Math. Phys. 42, 31–40 (1975)
7. Dobrushin, R.L. and Fritz, J.: Non-equilibrium dynamics of two-dimensional infinite particle systems with a singular interaction. Commun. Math. Phys. 57, 67–81 (1977)
8. Griffiths, R.: Peierls’ proof of spontaneous magnetization of a two-dimensional Ising ferromagnet. Phys. Rev. A136, 437–439 (1964)
9. Gurevich, B., Ibragimov, I., Minlos, R., Ovseevich, I., Oseledec, V., Pinsker, M., Prelov, V., Prohorov, Yu., Sinai, Ya., Shiryaev, A., Holevo, A.: Roland L. Dobrushin: Teor. Veroyatnos. i Primenen. 41, 164–169 (1996)
10. Gurevich, B., Ibragimov, I., Minlos, R., Prelov, V., Suhov, Yu.: Memory of R. Dobrushin. Russian Mathematical Surveys 52, (1997)
11. Lanford, O. and Ruelle, D.: Observables at infinity and states with short-range correlations in statistical mechanics. Commun. Math. Phys. 13, 194–215 (1969)
12. Malyshev, V.A., Minlos, R.A.: Roland Lvovich Dobrushin (1929–1995), (English), Markov Process. Related Fields. 1, no. 4, 447–458 (1995)
13. Minlos, R., Pechersky, E. and Suhov, Yu.: Remarks on the life and research of Roland L. Dobrushin. Jour. Appl. Math. and Stochastic Anal. 9, 337–372 (1996)
14. Minlos, R., Shlosman, S., Sinai, Ya.: Obituary: Roland L. Dobrushin (1929–1995). (English), Ergodic Theory Dynam. Systems 16, no. 5, 863–869 (1996)
15. Minlos, R., Shlosman, S. and Vvedenskaya, N.: Memories of Roland Dobrushin. Translated from Russian by Wm. Faris, Notices of the American Mathematical Society 43, 428–429 (1996)
16. Mermin, D. and Wagner, H.: Absence of ferromagnetism or antiferromagnetism in one- or two-dimensional isotropic Heisenberg models. Phys. Rev. Lett. 17, 1133–1136 (1966)
17. Peierls, R.: Ising’s model of ferromagnetism. Math. Proc. Cambridge Philos. Soc. 32, 477–481 (1936)
18. Wulff, G.: Zur Frage der Geschwindigkeit des Wachstums und der Auflösung der Krystallflächen. Z. Kryst. 34, 449–530 (1901)
Commun. Math. Phys. 189, 263 – 275 (1997)
Communications in
Mathematical Physics c Springer-Verlag 1997
Reversibility, Coarse Graining and the Chaoticity Principle
F. Bonetto1, G. Gallavotti2
1 Matematica, Università di Roma “La Sapienza”, P.le Moro 2, 00185 Roma, Italia. E-mail: [email protected]
2 Fisica, Università di Roma “La Sapienza”, P.le Moro 2, 00185, Roma, Italia. E-mail: [email protected]
Received: 28 February 1996 / Accepted: 12 February 1997
Dedicated to the memory of Roland Dobrushin

Abstract: We describe a way of interpreting the chaotic principle of [GC1] more extensively than it was meant in the original works. Mathematically the analysis is based on the dynamical notions of Axiom A and Axiom B and on the notion of Axiom C, that we introduce arguing that it is suggested by the results of an experiment ([BGG]) on chaotic motions. Physically we interpret a breakdown of the Anosov property of a time reversible attractor (replaced, as a control parameter changes, by an Axiom A property) as a spontaneous breakdown of the time reversal symmetry: the relation between time reversal and the symmetry that remains after the breakdown is analogous to the breakdown of T-invariance while TCP still holds.

1. Introduction

In reference [GC2] a general mechanical system in a non equilibrium situation was considered. Calling C the (compact or “finite”) phase space of the “observed events”, µ0 the volume measure on it and S the map describing the time evolution (regarded as a discrete invertible mapping of “observed events” into the “next ones”, i.e. as a Poincaré map on some surface in the full phase space) a principle holding when motions have an empirically chaotic nature was introduced:

1. Chaotic hypothesis. A chaotic many particle system in a stationary state can be regarded, for the purpose of computing macroscopic properties, as a smooth dynamical system with a transitive Axiom A globally attracting set. In reversible systems it can be regarded, for the same purposes, as a smooth transitive Anosov system.

• Chaotic is an empirical qualitative notion that means that most points of the attracting set have a stable and an unstable manifold with positive dimension. In the applications in [GC1, GC2, G1] the use of the hypothesis, which is a natural extension of a principle proposed by Ruelle, was based on reversibility and on transitivity.
• An attracting set is a closed invariant set such that all points in its vicinity evolve (in the future) tending to it and such that no subset has the same property (i.e. it is “minimal”). A set is globally attracting if it is attracting and all points of an open dense set evolve tending to it. • An Axiom A attracting set is a hyperbolic attracting set: i.e. an attracting set with each of its points possessing stable and unstable manifolds depending continuously upon the points and with contraction and expansion rates bounded uniformly away from 0. Furthermore the periodic points are dense. • Reversibility means that there is a smooth map i of C onto itself that changes the sign to time in the sense that iS = S −1 i and i2 = 1. As is well known reversibility should not be confused with the invertibility of the map S (always assumed below). • Transitivity is intended to mean that the stable and unstable manifolds of the attracting set points are dense on it (this is not a very strong requirement in view of (7.6) in [Sm], p. 783). • Anosov system is a dynamical system in which the whole phase space is hyperbolic. Note that if the Axiom A attracting set is supposed to be also a smooth manifold then the restriction of the dynamics to it is an Anosov system. This is the meaning that we give, in this paper, to the second assumption in the above hypothesis (see Sect. 6 of [BGG]). However in the previous paper [G4] transitivity was instead intended to mean, at least in the reversible cases, density of the stable and unstable manifolds of the attracting set points on the entire phase space (so that the system was in fact a transitive Anosov system). This is not always a property that one may be willing to consider as reasonable. It is reasonable for systems that are very close to conservative ones (as in [G4]). But it is very likely (see [BGG], Sect. 6, Fig. 
14) to be incorrect in systems that are under strong non conservative forces, even if still evolving with a reversible dynamics. In fact the attracting sets of such systems often evolve, as the strength of the forces increases, from a very chaotic initial attracting set to a more ordered situation characterized by a periodic orbit or by a very small attracting “tube” almost identical to a periodic orbit; the evolution shows a gradual decrease of the dimension and of the phase space region occupied by the attracting set, which therefore quite soon may become contained in a proper closed subset of phase space (so that the system cannot have stable and unstable manifolds with dimension half that of phase space: a necessary consequence of time reversibility in transitive Anosov systems). In such cases it is still reasonable, see [R1, ER], to think that the attracting set, if chaotic, can be regarded “just” as an Axiom A attracting set, an assumption weaker than assuming that the system is an Anosov system. What can be said for such systems? Is there a suitable reformulation of the chaoticity principle that could make it applicable even in very strongly forced (but still “chaotic”) systems making possible an analysis similar to that leading to the fluctuation theorem of [GC1, GC2] or to the Onsager relations of [G4]? These are the questions we address here.
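The reversibility relations quoted in the introduction, $iS = S^{-1}i$ and $i^2 = 1$, can be tested on a toy case. The sketch below does so for the Arnold cat map restricted to a rational lattice of the torus, where integer arithmetic makes the check exact. The map, the involution `I_MAT` and the lattice size `N` are illustrative assumptions chosen here; they are not the systems studied in the paper.

```python
# Toy check of reversibility: i S = S^{-1} i and i^2 = 1.
# Assumption: the cat map and the involution below are illustrative choices,
# not taken from the paper.

N = 7  # lattice size: torus points (a/N, b/N) are stored as integers mod N

S_MAT = ((1, 1), (1, 2))     # cat map S (determinant 1)
S_INV = ((2, -1), (-1, 1))   # its inverse S^{-1}
I_MAT = ((1, 0), (1, -1))    # candidate time reversal i (a linear involution)


def apply(mat, point):
    """Apply a 2x2 integer matrix to a lattice point, modulo N."""
    (a, b), (c, d) = mat
    x, y = point
    return ((a * x + b * y) % N, (c * x + d * y) % N)


def is_reversible():
    """Return True if i S = S^{-1} i, i^2 = 1 and S^{-1} S = 1 on every lattice point."""
    for x in range(N):
        for y in range(N):
            p = (x, y)
            if apply(I_MAT, apply(S_MAT, p)) != apply(S_INV, apply(I_MAT, p)):
                return False
            if apply(I_MAT, apply(I_MAT, p)) != p:
                return False
            if apply(S_INV, apply(S_MAT, p)) != p:
                return False
    return True


print(is_reversible())
```

Since the cat map is a transitive Anosov map of the whole torus, this toy case corresponds to the situation in which the attracting set is the entire phase space.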
2. A Distinction Between Anosov and Axiom A Properties. Statistics. Axioms B and C

We shall try to adopt the notations used in the well known paper by Smale, [Sm]. Furthermore in this paper, as in [GC1, GC2], a distinction will be made between an attractor and its closure that we call more properly an attracting set: this is often a
Reversibility, Coarse Graining and Chaoticity Principle
rather confusing point because some authors identify an attractor with its closure. Here we shall not. Thus, recalling that $\mu_0$ denotes the volume measure on $\mathcal{C}$, we shall formally say that:

Definition 1. A point $x \in \mathcal{C}$ admits a statistics if there is a probability distribution $\mu$ in phase space such that:

$$\lim_{M\to\infty} \frac{1}{M} \sum_{j=0}^{M-1} F(S^j x) = \int_{\mathcal{C}} F(y)\,\mu(dy) \tag{2.1}$$
for all continuous functions $F$ on phase space $\mathcal{C}$, and:

Definition 2. An attractor is a set $C$ enjoying the properties: (a) it is invariant and dense in an attracting set, (b) $\mu_0$-almost all points in its vicinity admit the same statistics $\mu$ and $\mu(C) = 1$, (c) it has minimal Hausdorff dimension and (d) it is minimal.¹ An attracting set verifies Axiom A if it has the properties: (i) each of its points is hyperbolic, (ii) the periodic points are dense on it, (iii) it contains a dense orbit.

With this definition one has to live with the fact that there will be several essentially identical attractors: if the distribution $\mu$ gives, for instance, 0 probability to individual points on the attractor $C$ then any subset of $C$ obtained by removing the orbits of a countable number of points will still be an attractor (of course with the same closure as $C$ itself). This is the main reason why it is wise to distinguish the notions of attractor and of attracting set; in the cases met in this paper it will be appropriate to call the latter an attracting basic set (see below).

It is useful to recall some more general definitions and properties:
• a dynamical system verifies Axiom A if each point in the set $\Omega$ of “nonwandering points” (i.e. in the set of all “recurrent points”) is “hyperbolic”, i.e. each nonwandering point admits stable and unstable manifolds, continuously dependent on the point and with expansion and contraction rates uniformly bounded away from 0; furthermore the periodic points are dense on $\Omega$, [Sm], p. 777. The closure of an attractor for an Axiom A system is an Axiom A attracting set in the above sense.
• Axiom A systems have a rather simple structure as the sets of their nonwandering points consist of a finite number of closed invariant indecomposable topologically transitive sets, called basic sets, see [Sm], p. 777.²
• A basic set that is the closure of an attractor is called an attracting basic set; likewise one defines a repelling basic set.³
• Another interesting class of dynamical systems is the class of Axiom B systems (see [Sm], p. 778): they are the dynamical systems verifying Axiom A and the further mild transversality property⁴ that if $\Omega_i$, $\Omega_j$ are a pair of basic sets such that the stable set $W^s(\Omega_i)$ and the unstable set $W^u(\Omega_j)$ intersect then they intersect transversally (see below).
• The stable (resp. unstable) set of $\Omega_i$ (resp. $\Omega_j$) is the union of the stable (resp. unstable) manifolds of all its points (see [Sm], p. 777). The intersection between $W^s(\Omega_i)$ and $W^u(\Omega_j)$ is transversal if the stable manifold $W^s(p)$ and the unstable manifold $W^u(q)$ of any two periodic points $p \in \Omega_i$ and $q \in \Omega_j$ have a point of transversal intersection (see [Sm], p. 783). Finally two manifolds intersect transversally at a point if their tangent planes span the full tangent plane (see [Sm], p. 752).

¹ This means that any other set with the same properties has a closure that contains the closure $\mathrm{clos}(C)$.
² Topologically transitive, [Sm] p. 776, means that they contain a point with a dense orbit. Note that this is weaker than transitive in the sense of Sect. 1. Indecomposable means that they contain no subset with the same properties. The basic sets are the building pieces (or the “bases”) of the part of phase space where the dynamics is non trivial. Transitive Anosov systems are simply Axiom A systems with a (unique) basic set coinciding with the whole phase space.
³ Thus a basic set for an Axiom A system can be regarded as a dynamical system in itself: in this case it may fail to be Anosov only because in general it is not a smooth manifold but just a closed set.
⁴ Mild because of Theorem (6.7), p. 779, in [Sm].

We also need to recall that, particularly in the numerical experiments, for dynamical systems with “chaotic behavior” it is usually also assumed that, after the “obvious” conservation laws and symmetries are taken into account, the whole phase space admits a statistics, in the sense that almost all points (with respect to the volume measure) admit a statistics, the same for all of them. This is usually called the zeroth law, [UF]:

Extended zeroth law: A dynamical system $(\mathcal{C}, S)$ describing a many particle system (or a continuum such as a fluid) generates motions that admit a statistics $\mu$ in the sense that, given any (smooth) macroscopic observable $F$ defined on the points $x$ of the phase space $\mathcal{C}$, the time average of $F$ exists for all $\mu_0$-randomly-chosen initial data $x$ and is given by:

$$\lim_{M\to\infty} \frac{1}{M} \sum_{j=0}^{M-1} F(S^j x) = \int_{\mathcal{C}} \mu(dx')\,F(x') \tag{2.2}$$
with $\mu$ being an $S$-invariant probability distribution on $\mathcal{C}$.

It is important to note the physical meaning of the above law: in fact, among other things, it implies that the dynamical system $(\mathcal{C}, S)$ cannot have more than one attracting basic set. We shall say that a system for which the property described by the above “law” holds verifies the zeroth law. For our purposes only systems verifying at least Axiom B will be relevant. For such systems the notion of diagram of a dynamical system, see [Sm] p. 754, allows us to interpret the zeroth law as saying that the diagram of the system is a partially ordered set with unique top and bottom points.

In this paper we shall deal with reversible dynamical systems $(\mathcal{C}, S)$ verifying Axiom B and the zeroth law: hence with a unique attracting basic set $\Omega_+$ and a unique repelling basic set $\Omega_-$. Furthermore the attracting set will be assumed transitive in the sense that the stable and unstable manifolds of each of its points are dense on it.

The Axiom B and the zeroth law could be reasonably taken as definitions of models for globally “chaotic” or “globally hyperbolic systems”. However the problem that we pose in the next section suggests that the appropriate notion for “globally hyperbolic” or “globally chaotic” dynamical systems is somewhat stronger. Its definition has been suggested by our effort to interpret the results of the experiment [BGG] and we allowed ourselves to give the name of Axiom C systems to systems verifying a stronger property; to describe it we introduce the notion of distance of a point $x$ to the basic sets $\{\Omega_i\}$ of an Axiom A system as:
$$\delta(x) = \min\left\{ \min_i \frac{d_i(x)}{d_0},\ \min_{j,\,-\infty<n<+\infty} \frac{d_j(S^n x)}{d_0} \right\}, \tag{2.3}$$
where $d_0$ is the diameter of the phase space $\mathcal{C}$, $d_i(x)$ is the distance of the point $x$ from the basic set $\Omega_i$, the minimum over $i$ runs over the attracting or repelling basic sets and the minimum over $j$ runs over the other $k \ge 0$ basic sets. We can then define the Axiom C systems as:

Definition 3. A smooth dynamical system $(\mathcal{C}, S)$ verifies Axiom C if it is Anosov or if it verifies Axiom A and:
(1) among the basic sets there are a unique attracting and a unique repelling basic set, denoted $\Omega_+$, $\Omega_-$ respectively, with (open) full volume dense basins, that we call the poles of the system (future or attracting and past or repelling poles, respectively).
(2) for every $x \in \mathcal{C}$ the tangent space $T_x$ admits a Hölder-continuous⁵ decomposition as a direct sum of three subspaces $T^u_x$, $T^s_x$, $T^m_x$ such that:
a) $dS\,T^\alpha_x = T^\alpha_{Sx}$, $\alpha = u, s, m$,
b) $|dS^n w| \le C e^{-\lambda n} |w|$, $w \in T^s_x$, $n \ge 0$,
c) $|dS^{-n} w| \le C e^{-\lambda n} |w|$, $w \in T^u_x$, $n \ge 0$,
d) $|dS^n w| \le C \delta(x)^{-1} e^{-\lambda |n|} |w|$, $w \in T^m_x$, $\forall n$,
where the dimensions of $T^u_x$, $T^s_x$, $T^m_x$ are $> 0$ and $\delta(x)$ is defined in (2.3).
(3) if $x$ is on the attracting basic set $\Omega_+$ then $T^s_x \oplus T^m_x$ is tangent to the stable manifold in $x$; vice versa if $x$ is on the repelling basic set $\Omega_-$ then $T^u_x \oplus T^m_x$ is tangent to the unstable manifold in $x$.

Although $T^u_x$ and $T^s_x$ are not uniquely determined the planes $T^s_x \oplus T^m_x$ and $T^u_x \oplus T^m_x$ are uniquely determined for $x \in \Omega_+$ and, respectively, $x \in \Omega_-$. It is clear that an Axiom C system is necessarily also an Axiom B system verifying the zeroth law (as it follows from [R2]). We do not know an example of an Axiom B system with a unique attracting and a unique repelling basic set which is not at the same time an Axiom C system.

Apart from property (1) that is meant to imply the validity of the zeroth law, one can also say that (“at most”) the real difference between an Axiom B and an Axiom C system is that the latter has a stronger, and more global, hyperbolicity property.
Namely, if $\Omega_+$ and $\Omega_-$ are the two poles of the system the stable manifold of a periodic point $p \in \Omega_+$ and the unstable manifold of a periodic point $q \in \Omega_-$ not only have a point of transversal intersection, but they intersect transversally all the way on a manifold connecting $\Omega_+$ to $\Omega_-$; the unstable manifold of a point in $\Omega_-$ will accumulate on $\Omega_+$ without winding around it.

In fact one can “attach” to $W^s(p)$, $p \in \Omega_+$, points on $\Omega_-$ as follows: we say that a point $z \in \Omega_-$ is attached to $W^s(p)$ if it is an accumulation point for $W^s(p)$ and there is a curve with finite length linking a point $z_0 \in W^s(p)$ to $z$ and entirely lying on $W^s(p)$, with the exception of the endpoint $z$. A drawing helps understanding this simple geometrical construction, slightly unusual because of the density of $W^s(p)$ on $\Omega_-$.

We call $\overline{W}^s(p)$ the set of the points either on $W^s(p)$ or just attached to $W^s(p)$ on the system basic sets (the set $\overline{W}^s(p)$ should not be confused with the closure $\mathrm{clos}(W^s(p))$, which is the whole space, see [Sm], p. 783). If a system verifies Axiom C the set $\overline{W}^s(p)$ intersects $\Omega_-$ on a stable manifold, by 2) in the above definition. The set $\overline{W}^u(q)$, $q \in \Omega_-$, is defined symmetrically by exchanging $\Omega_+$ with $\Omega_-$.

Furthermore if a system verifies Axiom C and $p \in \Omega_+$, $q \in \Omega_-$ are two periodic points, on the attracting basic set and on the repelling basic set of the system respectively, then $\overline{W}^s(p)$ and $\overline{W}^u(q)$ have a dense set of points in $\Omega_+$ and $\Omega_-$, respectively. Note that $\overline{W}^s(p) \cap \overline{W}^u(q)$ is dense in $\mathcal{C}$ as well as in $\Omega_+$ and $\Omega_-$. This follows from the density of $W^s(p)$ and $W^u(q)$ on $\Omega_+$ and $\Omega_-$ respectively and from the continuity of $T^m_x$. Furthermore if $z \in \Omega_+$ is such that $z \in \overline{W}^s(p) \cap \overline{W}^u(q)$ then the surface $\overline{W}^s(p) \cap \overline{W}^u(q)$ intersects $\Omega_-$ in a unique point $\tilde z \equiv \tilde\imath z$ which can be reached by the shortest smooth path on $\overline{W}^s(p) \cap \overline{W}^u(q)$ linking $z$ to $\Omega_-$ (the path is on the surface obtained as the envelope of the tangent planes $T^m$, but it is in general not unique even if $T^m$ has dimension 1, see the example in Sect. 4 below). The map $\tilde\imath$, as a map of $\Omega_+ \cup \Omega_-$ into itself, commutes with both $S$ and $i$, squares to the identity and will play a key role in the following analysis.

We conjecture that the Axiom C systems are $\Omega$-stable in the sense of Smale, [Sm] p. 749 (added on revision: this follows from Robbin’s theorem, see [R3], p. 170).

⁵ One might prefer to require real smoothness, e.g. $C^p$ with $1 \le p \le \infty$: but this would be too much for rather trivial reasons. On the other hand Hölder continuity might be equivalent to simple $C^0$-continuity as in the case of Anosov systems, see [AA, Sm].
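Definition 1 of this section (a point “admits a statistics” when the time averages in (2.1) converge to an integral against a fixed distribution) can be illustrated numerically. The sketch below uses a deliberately simple and non-chaotic toy system, an irrational rotation of the circle, for which the statistics is the Lebesgue measure; the rotation number `ALPHA`, the observable and the starting point are assumptions made here for illustration only and do not come from the paper.

```python
import math

# Assumption: a toy system chosen for illustration, not one studied in the paper.
ALPHA = (math.sqrt(5.0) - 1.0) / 2.0  # irrational rotation number


def rotation(x):
    """One step of the circle rotation S x = x + ALPHA (mod 1)."""
    return (x + ALPHA) % 1.0


def observable(x):
    """A continuous observable F on the circle; its Lebesgue integral is 0."""
    return math.cos(2.0 * math.pi * x)


def time_average(x0, steps):
    """Finite-M version of the left-hand side of (2.1): (1/M) sum_{j<M} F(S^j x0)."""
    total = 0.0
    x = x0
    for _ in range(steps):
        total += observable(x)
        x = rotation(x)
    return total / steps


# The averages should approach the space average (here 0) as M grows.
for m in (10, 100, 1000, 10000):
    print(m, time_average(0.1, m))
```

For the chaotic systems considered in the paper the limiting distribution is in general not the volume measure, which is why the definition only asks for the existence of some probability distribution $\mu$.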
3. Axiom A, B, C and Time Reversibility: The Problem

If one considers the closure $\Omega_+ = \mathrm{clos}(C)$ of an attractor $C$ verifying Axiom A then the action of the dynamics $S$ on it fails to be an Anosov system only because $\mathrm{clos}(C)$ might be a fractal set rather than a smooth surface. In nonequilibrium statistical mechanics the dimensionality of the attractors is usually very large so that their fractality is likely to be irrelevant. This is part of the hypothesis that the system can be regarded as an Anosov system for the purpose of studying averages of relevant quantities. And in fact the Anosov property is used in the above references only to obtain a representation of the SRB distribution, i.e. of the distribution describing the averages of observables. The same representation holds for the SRB distribution on an Axiom A attracting set. For this reason the fractality of an attractor was regarded in [GC2] as “an unfortunate accident”.

Therefore the really non trivial hypothesis in the mentioned applications is the reversibility of the motion on the attracting set. Such reversibility is of course implied by the reversibility of the motion on the whole phase space if the attracting set and the whole phase space coincide: in the above references this was taken as a consequence of the chaoticity hypothesis. However one may wish to see how far this is justified in the cases in which the attractor $C$ is really smaller than the whole phase space. We shall refer to such cases as the cases in which the attracting set verifies Axiom A: we therefore include under the latter denomination also the case in which the attracting set is a smooth surface (and could therefore be said to be an Anosov system). The possible fractality of the closure of the attractor or its smoothness play no role in the following.

Suppose that the reversible mechanical system under consideration verifies Axiom C (a stronger notion than Axiom B, and a kind of “global hyperbolicity” condition as discussed in Sect. 2). Suppose that the attracting pole $\Omega_+ = \mathrm{clos}(C)$ is not the whole phase space $\mathcal{C}$. Can one then conclude that the fluctuation theorem of [GC1] holds, or at least some modification of it?

We “answer” this in the affirmative by noting that the global time reversal map $i$, a priori assumed to exist, induces on the pole $\Omega_+ = \mathrm{clos}(C)$ of the system a natural “smooth” (i.e. Hölder continuous) map $i^*$ which verifies:

$$i^* S = S^{-1} i^*, \qquad (i^*)^2 = 1. \tag{3.1}$$
In fact since the map $\tilde\imath$ commutes with $S$ and maps the attracting pole $\Omega_+$ onto the repelling pole $\Omega_-$ we can set $i^* = \tilde\imath\, i$ and define a map of $\Omega_\pm$ into themselves verifying:

$$i^* S = \tilde\imath\, i\, S = \tilde\imath\, S^{-1} i = S^{-1} i^*. \tag{3.2}$$
Hence $i^*$ is a time reversal on the future attracting set. Note that $i^*$ is not the restriction of $i$ to the future attracting set. This map will be the local time reversal. One should stress that since $\tilde\imath$ is defined only on $\Omega_\pm$ also $i^*$ has only a meaning as a map of such sets onto themselves.

Its existence immediately implies the validity of a fluctuation theorem ([GC1, GC2, G3]) for systems that are globally reversible and chaotic in the sense that they verify Axiom C (and hence verify the zeroth law and have an Axiom A attracting set), with the only difference that the theorem applies to the phase space contraction that occurs on $\Omega_+$ rather than to the phase space contraction occurring in the whole phase space $\mathcal{C}$ (see Sect. 6, (b), for a discussion).

Moreover it implies also that the chaotic hypothesis can be conceptually simplified. Curiously enough this does not even require a modification of its formulation, but it allows for a broader interpretation of it as the word “Anosov” can be essentially replaced by “reversible Axiom A attracting set” and this covers explicitly the cases in which the attracting set is not dense on phase space.

It is important to note here that the existence of $i^*$ is not trivial: because $i^*$ cannot be (unless $\mathcal{C} = \Omega_+$, i.e. unless the future pole of the system is the whole phase space so that the system is actually an Anosov system) the restriction to $\Omega_+$ of the time reversal map $i$, as one would naively surmise. In fact reversible systems with Axiom A attractors usually have attractors $C_+$ and $C_-$ for the forward motion and for the backward motions which are distinct, [S1], in the sense that $\Omega_+ \cap \Omega_- \equiv \mathrm{clos}(C_+) \cap \mathrm{clos}(C_-) = \emptyset$. In such cases the time reversal $i$ maps $\Omega_+$ into $\Omega_-$ and vice versa. Therefore although the global time reversal $i$ has the property $Si = iS^{-1}$ it does not leave invariant the attracting basic set $\Omega_+ = \mathrm{clos}(C_+)$. The map $i^*$ is an effective time reversal acting on the closure of the attractor $\Omega_+ = \mathrm{clos}(C_+)$ for the future motion.
Its existence could be expected on philosophical grounds: if a system is reversible there should be no way of knowing whether one is moving on the future pole $\Omega_+$ or on the past pole $\Omega_-$. In particular we should be able to see that the motion is reversible without ever even knowing about the existence of the past attractor $C_-$. Hence we expect that there is a “local time reversal” $i^*$ on both the future and the past attractors: and the problem is to find a way to construct, at least in principle, $i^*$.

The reader will notice the analogy between the above picture and the spontaneous symmetry breaking: when the attractor dimension decreases (because some Lyapunov exponent changes sign as a parameter changes) the time reversal symmetry is no longer valid, but some other symmetry survives which still has the effect of changing the sign of time: a well known example is the breaking of T-symmetry in relativistic quantum mechanics, with the TCP-symmetry remaining valid.

In Sect. 4 we present a model in which $i^*$ can be constructed and provides the paradigm of a reversible Axiom C system. In Sect. 5 we discuss the meaning in symbolic dynamics of the map $i^*$. On the mathematical side there are various points that would require closer investigation. But the discussion seems to indicate that the scenario for the construction of $i^*$ should work rather generally, as we think it is quite naturally suggested by the results of the experiment in [BGG].
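For convenience, both relations in (3.1) can be written out as a short worked computation. It is a sketch that uses only properties stated in the text: $iS = S^{-1}i$ and $i^2 = 1$ for the global time reversal, and the fact (end of Sect. 2) that $\tilde\imath$ commutes with $S$ and with $i$ and squares to the identity. All identities are assumed to be applied on $\Omega_+ \cup \Omega_-$, where $\tilde\imath$ is defined; the step $\tilde\imath S^{-1} = S^{-1}\tilde\imath$ follows from $\tilde\imath S = S \tilde\imath$ by multiplying on both sides by $S^{-1}$.

```latex
\begin{aligned}
i^* S &= \tilde\imath\, i\, S = \tilde\imath\, S^{-1} i = S^{-1}\, \tilde\imath\, i = S^{-1} i^*, \\
(i^*)^2 &= \tilde\imath\, i\, \tilde\imath\, i = \tilde\imath\, \tilde\imath\, i\, i = \tilde\imath^{\,2}\, i^2 = 1 .
\end{aligned}
```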
4. An Example

We give here an example in which $i^*$, the local time reversal, arising in the applications can be easily constructed. The example illustrates what we think is a typical situation. The poles $\Omega_\pm$, closures of an attractor $C_+$ and respectively of a repeller $C_-$, will be two compact regular surfaces, identical in the sense that they will be mapped into each other by the time reversal $i$ defined below.

If $x$ is a point in $M^* = \Omega_+ = \mathrm{clos}(C_+)$ the generic point of the phase space will be determined by a pair $(x, z)$, where $x \in M^*$ and $z$ is a set of transversal coordinates that tell us how far we are from the attractor. The coordinate $z$ takes two well defined values on $\Omega_+$ and $\Omega_-$ that we can denote $z_+$ and $z_-$ respectively. The coordinate $x$ identifies a point on the compact manifold $M^*$ on which a reversible transitive Anosov map $S^*$ acts (see [G3]). And the map $S$ on phase space is defined by:

$$S(x, z) = (S^* x, \tilde S z), \tag{4.1}$$
where $\tilde S$ is a map acting on the $z$ coordinate (marking a point on a compact manifold) which is an evolution leading from an unstable fixed point $z_-$ to a stable fixed point $z_+$. For instance $z$ could consist of a pair of coordinates $v, w$ with $v^2 + w^2 = 1$ (i.e. $z$ is a point on a circle) and an evolution of $v, w$ could be governed by the equation $\dot v = -\alpha v$, $\dot w = E - \alpha w$ with $\alpha = E w$. If we set $\tilde S z$ to be the time 1 evolution (under the latter differential equations) of $z = (v, w)$ we see that such evolution sends $v \to 0$ and $w \to \pm 1$ as $t \to \pm\infty$ and the latter are non marginal fixed points for $\tilde S$.

Thus if we set $S(x, z) = (S^* x, \tilde S z)$ we see that our system is hyperbolic on the basic sets $\Omega_\pm = M^* \times \{z_\pm\}$ and the future pole $\Omega_+ = \mathrm{clos}(C_+)$ is the set of points $(x, z_+)$ with $x \in M^*$; while the past pole $\Omega_- = \mathrm{clos}(C_-)$ is the set of points $(x, z_-)$ with $x \in M^*$. Clearly the two poles are mapped into each other by the map $i(x, z_\pm) = (i^* x, z_\mp)$. But on each attractor a “local time reversal” acts: namely the map $i^*(x, z_\pm) = (i^* x, z_\pm)$.

The system is “chaotic” as it has an Axiom A attracting set with closure consisting of the points having the form $(x, z_+)$ for the motion towards the future and a different Axiom A attracting set with closure consisting of the points having the form $(x, z_-)$ for the motion towards the past. In fact the dynamical systems $(\Omega_+, S)$ and $(\Omega_-, S)$ obtained by restricting $S$ to the future or past attracting sets are Anosov systems because $\Omega_\pm$ are regular manifolds.

We may think that in the reversible cases the situation is always the above: namely there is an “irrelevant” set of coordinates $z$ that describes the departure from the future and past attractors. The future and past attractors are copies (via the global time reversal $i$) of each other and on each of them is defined a map $i^*$ which inverts the time arrow, leaving the attractor invariant: such map will be naturally called the local time reversal.
In the above case the map i∗ and the coordinates (x, z) are “obvious”. The problem is to see that they are defined quite generally under the only assumption that the system is reversible and has unique future and unique past attractors that verify the Axiom A. This is a problem that is naturally solved in general when the system verifies the Axiom C of Sect. 2 (see Sect. 3, (3.1) above). In the following section we shall describe the interpretation of i∗ in terms of symbolic dynamics when the system verifies Axiom C: as one may expect the construction is simple but it is deeply related to the properties of hyperbolic systems such as their Markov partitions.
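The transversal dynamics of the example in this section, $\dot v = -\alpha v$, $\dot w = E - \alpha w$ with $\alpha = Ew$, can be explored numerically. The sketch below integrates the equations with a classical Runge-Kutta scheme and defines the time 1 map on $z = (v, w)$. The value `E = 1.0`, the step count and the function names (`field`, `rk4_step`, `flow`, `s_tilde`) are assumptions introduced here for illustration. Since $\dot w = E(1 - w^2)$ does not involve $v$, the closed form $w(t) = \tanh(Et + \operatorname{artanh} w_0)$ is available as an independent check for $|w_0| < 1$.

```python
import math

E = 1.0  # forcing strength (hypothetical value, assumed positive)


def field(v, w):
    """Right-hand side of v' = -alpha v, w' = E - alpha w with alpha = E w."""
    alpha = E * w
    return (-alpha * v, E - alpha * w)


def rk4_step(v, w, dt):
    """One classical fourth-order Runge-Kutta step for the (v, w) equations."""
    k1 = field(v, w)
    k2 = field(v + 0.5 * dt * k1[0], w + 0.5 * dt * k1[1])
    k3 = field(v + 0.5 * dt * k2[0], w + 0.5 * dt * k2[1])
    k4 = field(v + dt * k3[0], w + dt * k3[1])
    v_new = v + dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
    w_new = w + dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return v_new, w_new


def flow(v, w, t, steps=1000):
    """Integrate from time 0 to time t (t may be negative) in equal steps."""
    dt = t / steps
    for _ in range(steps):
        v, w = rk4_step(v, w, dt)
    return v, w


def s_tilde(z):
    """Time 1 map on z = (v, w), playing the role of the map S-tilde of the text."""
    return flow(z[0], z[1], 1.0)


# A point on the circle v^2 + w^2 = 1, followed forward and backward in time.
v0, w0 = math.cos(0.3), math.sin(0.3)
print(flow(v0, w0, 10.0))   # approaches the stable fixed point (0, +1)
print(flow(v0, w0, -10.0))  # approaches the unstable fixed point (0, -1)
```

With $E > 0$ the stable fixed point is $z_+ = (0, 1)$ and the unstable one is $z_- = (0, -1)$; the circle $v^2 + w^2 = 1$ is invariant because $\frac{d}{dt}(v^2 + w^2) = 2Ew\,(1 - v^2 - w^2)$.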
5. Local Time Reversal and Markov Partitions

In this section we discuss the properties of the map $i^*$ and its relation with the Markov partitions and the symbolic dynamics. We assume the reader is familiar with the notion of Markov partition: in any event the results of this paper logically follow those of [GC1, GC2] and [G2, G3] and we can expect that only readers familiar with those papers can have any interest in the present one.

In [GC1] we mention that a transitive Anosov reversible system admits a Markov partition $\mathcal{P} = \{Q_\sigma\}$, which is invariant under time reversal: $i\mathcal{P} = \mathcal{P}$. This means that for every element $Q \in \mathcal{P}$ one can find an element $Q' \in \mathcal{P}$ such that $iQ = Q'$.

If the dynamical system only has a transitive Axiom A attracting set we can still construct a Markov partition $\mathcal{P} = i\mathcal{P}$ but it will not have a transitive transition matrix. The transition matrix is in fact defined by setting $T_{\sigma,\sigma'} = 1$ if $S Q_\sigma \cap \mathrm{int}(Q_{\sigma'}) \ne \emptyset$ (here $\mathrm{int}(Q)$ are the interior points of $Q$ relatively to the closure of the attractor) and $T_{\sigma,\sigma'} = 0$ otherwise. And transitivity means that there is a power $k$ of $T$ such that $T^k_{\sigma,\sigma'} > 0$ for all $\sigma, \sigma'$ (see comments after the chaotic hypothesis Sect. 1 and footnote 2).

The lack of transitivity in the above sense is simply due to the fact that the Markov partition $\mathcal{P}$ really splits into two transitive Markov partitions, one, denoted $\mathcal{P}_+$, paving the closure $\Omega_+ = \mathrm{clos}(C_+)$ of the future attractor and one, denoted $\mathcal{P}_-$, paving the closure $\Omega_- = \mathrm{clos}(C_-)$ of the past attractor $C_-$. And of course there is no possibility of a transition from one to the other under the action of $S$ as the two are $S$-invariant sets. But for Axiom C systems the Markov partition can be built in a special way that takes into account more deeply the global time reversal symmetry of the system.
Let in fact $O$ be a fixed point of $S$ on $\Omega_+$ (if no fixed point exists $O$ can, for the purposes of the following discussion, be replaced by a point on one of the periodic orbits on $\Omega_+$; recall that by the Axiom A property the periodic orbits are dense on $\Omega_\pm$). We shall assume, for simplicity, that $\Omega_+$ (hence $\Omega_-$) are smooth surfaces: then we consider the stable manifold of $O$. The latter is dense on $\Omega_+$ because of the assumed transitivity of the attractor and it has a part that is not contained on the attracting set (because we are supposing that the attractor is not dense in phase space). If $n_0$ is the dimension of the pole $\Omega_+$ and $n_s$ is the dimension of the part of the stable manifold $W^s_O$ lying on $\Omega_+$ then the dimension of the stable manifold will be $n_s + n$ for some $n \ge 1$.

The manifold $W^s_O$ will intersect the pole $\Omega_-$: otherwise it would lead to another repelling basic set, violating the assumption that there are only two poles (i.e. only one attractor for the future motion and one for the backward motion as expressed by the zeroth law above, see 3) in the definition of Axiom C).

The pole $\Omega_-$ has (by the time reversal symmetry) the same dimension $n_0$ of the pole $\Omega_+$ and its intersection with $W^s_O$ will be a $n_s$-dimensional manifold in $\Omega_-$, an unstable manifold for the map $S^{-1}$, dense on $\Omega_-$. Likewise we can consider the point $iO \in \Omega_-$ and perform the same construction by using the unstable manifold $W^u_{iO} = iW^s_O$ of $iO$. It will have a part of dimension $n_u = n_s$ lying on $\Omega_-$ and its dimension will be $n_u + n$. The manifolds $W^s_O$ and $W^u_{iO}$ intersect densely on $\Omega_\pm$ and each intersection point $x \in \Omega_+$ is on a $n$-dimensional manifold which has one point $\tilde\imath x \in \Omega_-$ on $\Omega_-$.

It is clear that the densely defined map $\tilde\imath$ of $\Omega_+ \longleftrightarrow \Omega_-$ commutes with $S$ and it can be extended by continuity to a map of $\Omega_+$ to $\Omega_-$. If $\mathcal{P}_+$ is a Markov partition of $\Omega_+$ then $\tilde\imath \mathcal{P}_+ = \mathcal{P}_-$ is a Markov partition of $\Omega_-$. This is just another way of looking at the construction of the map $\tilde\imath$, hence of $i^*$, see (3.2).

This also shows that we can establish a natural correspondence $\sigma \longleftrightarrow \tilde\sigma$ between labels of elements of $\mathcal{P}$ such that $Q_{\tilde\sigma} = \tilde\imath Q_\sigma$. Note that if $Q_\sigma \in \mathcal{P}_\pm$ then $Q_{\tilde\sigma} \in \mathcal{P}_\mp$.
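The notion of transitivity of a transition matrix used in this section (some power $T^k$ has all entries positive) and its failure when the partition splits into two invariant pieces can be checked mechanically. The sketch below works with boolean matrix powers; the matrices `T_PLUS` and `T_FULL` are hypothetical examples invented for illustration and do not come from any system in the paper.

```python
def mat_mult_bool(a, b):
    """Boolean product of two square 0/1 matrices (entry 1 if some path exists)."""
    n = len(a)
    return [[1 if any(a[i][k] and b[k][j] for k in range(n)) else 0
             for j in range(n)] for i in range(n)]


def is_transitive(t):
    """Return True if some power T^k, k >= 1, has all entries positive."""
    n = len(t)
    power = [row[:] for row in t]
    # Wielandt's bound: if any power is positive, one with k <= (n-1)^2 + 1 is.
    for _ in range((n - 1) ** 2 + 1):
        if all(all(row) for row in power):
            return True
        power = mat_mult_bool(power, t)
    return False


# Hypothetical transition matrix of a partition paving one pole.
T_PLUS = [[1, 1],
          [1, 0]]

# Hypothetical matrix for a partition that splits into two invariant pieces:
# block diagonal, so no transition between the two blocks is ever possible.
T_FULL = [[1, 1, 0, 0],
          [1, 0, 0, 0],
          [0, 0, 1, 1],
          [0, 0, 1, 0]]

print(is_transitive(T_PLUS), is_transitive(T_FULL))
```

The block diagonal matrix mimics the splitting of $\mathcal{P}$ into $\mathcal{P}_+$ and $\mathcal{P}_-$ described above: each block is transitive on its own, while the full matrix is not.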
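Note that the map $i^*$ has a very simple and natural symbolic dynamics interpretation. Given an allowed sequence $\sigma = \{\sigma_j\}$ we set $\tilde\sigma = \{\tilde\sigma_{-j}\}$; since $T_{\sigma,\sigma'} = 1$ means $S Q_\sigma \cap \mathrm{int}(Q_{\sigma'}) \ne \emptyset$, we deduce that it means also $i\, S Q_\sigma \cap i\, \mathrm{int}(Q_{\sigma'}) \ne \emptyset$, hence $S^{-1} i Q_\sigma \cap i\, \mathrm{int}(Q_{\sigma'}) \ne \emptyset$. So that $i\, \mathrm{int}(Q_\sigma) \cap S\, i\, Q_{\sigma'} \ne \emptyset$, hence $i^* Q_\sigma \cap S\, i^* \mathrm{int}(Q_{\sigma'}) \ne \emptyset$:

$$T_{\sigma,\sigma'} = 1 \longleftrightarrow Q_{\tilde\sigma} \cap S\, \mathrm{int}(Q_{\tilde\sigma'}) \ne \emptyset, \tag{5.1}$$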
i.e. $T_{\sigma,\sigma'} = 1$ is equivalent to $T_{\tilde\sigma',\tilde\sigma} = 1$. This means that if $\sigma = \{\sigma_j\}$ is an allowed sequence of symbols for a point $x$ on the pole $\Omega_+$ (i.e. it is the history on $\mathcal{P}$ of a point $x \in \Omega_+$ in the sense that $S^j x \in Q_{\sigma_j}$ for all $j$) then also $\tilde\sigma = \{\tilde\sigma_{-j}\}$ is an allowed sequence and $i^* x$ is the (unique) point on the pole $\Omega_-$ that has $\{\tilde\sigma_{-j}\}$ as the history under $S$.

6. Markov Partitions, Coarse Graining and Trajectory Segments. Extended Liouville Measure

a) We first discuss the notion of coarse graining, making precise some ideas that were advanced in [G1]. We show that Anosov systems with Axiom A attracting sets admit, in spite of the chaoticity of the motions that they describe, a rather natural decomposition of phase space into cells so that the time evolution can be naturally represented as a cells permutation and the SRB distribution can be naturally interpreted as the distribution that gives equal weight to each of the cells. This may look surprising and contradictory with the property of hyperbolicity and chaoticity of the system. Therefore it is a particularly interesting (rather elementary) feature of the SRB distribution which makes it even more analogous to the microcanonical distribution in equilibrium.

Imagine that $\mathcal{P}$ is a Markov partition for a transitive Axiom A attracting set. We use it to set up a symbolic dynamic description of the attractor.

Let $T$ be large and $\mathcal{P}_T = \vee_{j=-T/2}^{T/2} S^j \mathcal{P}$. Then it is well known, [S1, R1] (see also [G2]), that we can represent the SRB distribution as a limit of probability distributions obtained by assigning to the elements $Q \in \mathcal{P}_T$ a weight:

$$\Lambda^{-1}_{e,T}(x_Q), \tag{6.1}$$
where $\Lambda_{e,T}(x)$ is the expansion rate (i.e. the modulus of the Jacobian determinant) of the map $S^T$ as a map of the unstable manifold of $S^{-T/2} x$ to that of $S^{T/2} x$, and $x_Q$ is a (suitable, see [GC2, G3]) point in $Q$. Then we can imagine to partition each $Q \in \mathcal{P}_T$ into boxes so that the number of boxes, that we call cells, in $Q$ is proportional to (6.1). In this way we find a representation for the SRB distribution in which each cell of phase space has the same weight. The SRB distribution thus appears as the uniform distribution on the attracting set (thus partitioned) and the time evolution can be rather faithfully represented simply as a permutation of the cells, in spite of the hyperbolicity.

This also shows that one has to be careful in saying that “it is obvious that the SRB distribution is obtained by attributing the weights (6.1) to points on the attractor”, sometimes erroneously called the “trajectory segment method”: this is right only if the points are identified with the cells of a Markov partition $\mathcal{P}_T$, refinement of a fixed Markov partition. This means that (6.1) is correct only if a suitable coarse graining of the phase space is made, and incorrect otherwise.
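The bookkeeping behind the cell picture can be made concrete with a small sketch: each element $Q$ is split into a number of cells proportional to the weight (6.1), and the uniform distribution on cells then gives back the weights. The expansion rates in `EXPANSION`, the labels `Q1` to `Q4` and the total number of cells are hypothetical values chosen here so that the arithmetic is exact; they are not taken from any system in the paper.

```python
from fractions import Fraction

# Hypothetical expansion rates Lambda_{e,T}(x_Q) for four elements Q of P_T.
EXPANSION = {"Q1": 2, "Q2": 4, "Q3": 8, "Q4": 8}


def cell_counts(expansion, total_cells):
    """Number of cells in each Q, proportional to the weight 1 / Lambda_{e,T}(x_Q)."""
    weights = {q: Fraction(1, lam) for q, lam in expansion.items()}
    norm = sum(weights.values())
    return {q: int(round(total_cells * w / norm)) for q, w in weights.items()}


def uniform_cell_measure(counts):
    """Probability of each Q when every cell carries the same weight."""
    total = sum(counts.values())
    return {q: Fraction(n, total) for q, n in counts.items()}


counts = cell_counts(EXPANSION, 16)
print(counts)
print(uniform_cell_measure(counts))
```

In this toy case the uniform distribution on the 16 cells assigns to each $Q$ exactly the normalized weight $\Lambda^{-1}_{e,T}(x_Q)$; with generic rates the proportionality holds only up to rounding, which is one way to see why a sufficiently fine coarse graining is needed.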
If the points are chosen differently then the weight to give to each may well be very different, as in the latter case in which it is equal for all cells, no matter what the value of $\Lambda^{-1}_{e,T}$ is in each of them. And in some sense the latter representation is the most natural one, and it realizes in general the Boltzmann idea that all points in phase space are equivalent and the dynamics is just a one cycle permutation of the cells, [G2].

One can say that if the system admits an Axiom A attracting set then it is possible to define a coarse graining of phase space such that the dynamics is eventually just a one cycle cell permutation (even though the evolution may be non volume preserving): the SRB distribution appears then as the uniform distribution on the relevant phase space part (i.e. the attracting set). In other words the chaotic hypothesis can be regarded as a natural version of the original viewpoint of Boltzmann, [G1], on time evolution and ergodicity.

In general the attracting set will support two stationary distributions: one, coinciding with $\mu_+$, which describes the statistics of the data that are chosen randomly on the attracting set with a distribution proportional to the area elements of the set itself, and a second one describing the statistics of the same data evolving backward in time. The latter statistics can be denoted $\mu^*_+$ and it is different from the statistics of the data that are chosen randomly with distribution proportional to the full phase space volume (the latter is in fact concentrated “elsewhere”, on the set obtained from the attracting set by the global time reversal). A representation of the above dynamical properties in terms of “coarse grain” cells is also easy to set up.
We can imagine to partition each $Q \in \mathcal{P}_T$ into boxes of type $+$ and boxes of type $-$ so that the number of boxes of type $+$ in $Q$ is proportional to $\Lambda^{-1}_{e,T}(x_Q)$ and the number of boxes of type $-$ in $Q$ is proportional to $\Lambda_{c,T}(x_Q)$, where the latter is the modulus of the Jacobian determinant of $S^T$ restricted to the part of the stable manifold of $S$ on the attracting set, regarded (locally) as a submanifold of the attracting set (that we are supposing to be a manifold). One can define the forward evolution on the attracting set as a one cycle permutation of the $+$ cells and the backward evolution as a one cycle permutation of the $-$ cells. The local time reversal $i^*$ can now be represented as a one to one transformation of the $+$ cells into the $-$ cells. The “wandering” points could also be represented as cells of a third type, but they are not interesting for the description of the statistical properties of the motions on the attracting set.

The representation of the dynamics in terms of two interpenetrating sets of coarse grain cells gives a clear idea of how it is possible that the forward and backward evolutions have different statistics related by the same time reversal operation $i^*$ that is the basis of the proof of the fluctuation theorem.

b) Consider an Axiom C system. Then we can define a local time reversal $i^*$ on the future pole. This means that a fluctuation theorem holds for the statistics of the Liouville distribution $\mu_0$ on $\mathcal{C}$. The formulation of the fluctuation theorem is unchanged provided one defines the entropy production rate as the contraction of the surface area on the attracting set. This is to be expected to be a rather difficult quantity to evaluate in concrete cases because we cannot expect to have a precise knowledge of the geometric structure of the attracting set.
In this respect one can remark that the system may have other properties that nevertheless allow us to establish a relationship between the contraction rate of the Liouville measure (directly accessible from the measurement of the divergence of the equations of motion) and the contraction rate of the surface measure on the attracting
set: an interesting instance of this has been found in [BGG], Sect. 6, (ii), where the extra property used was the pairing rule that held in that case (see [ECM1, DM]). In general on the pole $\Omega_+$ one can define a probability distribution $\mu^*_0$ that is the natural extension of the Liouville distribution in the equilibrium case and for which the fluctuation theorem holds in the same form that it has in the Anosov case. It is the probability distribution that is defined in the symbolic dynamics representation by the Gibbs distribution, [D, LR, R2], with non translationally invariant potential given by the formal energy function:
\[ H(\sigma) = \sum_{k=-\infty}^{-1} h_-(\vartheta^k \sigma) + \sum_{k=0}^{\infty} h_+(\vartheta^k \sigma), \tag{6.2} \]
where $\sigma$ is the symbolic sequence corresponding to a point $x$ on $\Omega_+$ evaluated on a Markov partition $\mathcal{P}$, see Sect. 5; $\vartheta$ is the shift of the sequence $\sigma$; and we have set:
\[ h_-(\sigma) = -\log \Lambda^*_s(X(\sigma)), \qquad h_+(\sigma) = \log \Lambda_u(X(\sigma)), \tag{6.3} \]
where $\Lambda^*_s$, $\Lambda_u$ are the jacobian determinants of the map $S$ restricted to the intersections of the stable or, respectively, unstable manifolds with $\Omega_+$. In the Anosov case (6.2) defines a distribution equivalent to the ordinary Liouville distribution, see [G3, G2]. In the Axiom C case it defines a distribution on $\Omega_+$ which is absolutely continuous with respect to the surface area on $\Omega_+$ when the poles are smooth manifolds (because in such a case the system $(\Omega_+, S)$ is an Anosov system). But if the pole $\Omega_+$ is just an Axiom A attracting set which is not a smooth manifold then $\Lambda^*_s$ is not properly defined as a jacobian determinant (because the intersection $\hat W^s_x = W^s_x \cap \Omega_+$ is not a manifold). Nevertheless it can be defined by using $i^*$ via:
\[ \Lambda^*_s(x) = \Lambda_u^{-1}(i^* x), \tag{6.4} \]
and this is our proposal for a natural extension of the definition of the Liouville measure on the attracting basic set $\Omega_+$. It is a distribution that may have several further properties that seem worth investigating. c) Finally, with reference to Sect. 1 above, we note that in a system like the one studied in [BGG] it is possible that, as a forcing parameter grows, the Lyapunov spectrum changes nature because some initially positive exponents continuously evolve into negative ones as the forcing increases. Every time one “positive” exponent “becomes” negative the dimension of the future pole diminishes (usually by 2 units when the pairing rule is verified, see [BGG]). At this “bifurcation” the future pole splits into two basic sets: one will be the new pole and the other will be its $i^*$ image. Of course the above analysis implies that there will be a new local time reversal $i^{**}$ on the new future pole. The $i^*$ image of the future pole, however, will not be the past pole: one can easily see that the latter is more stable than the $i$-image of the future pole; hence the past pole will be the full time reversal of the future pole, no matter how many intermediate bifurcations took place. The picture in terms of diagrams, see [Sm] p. 754, is quite suggestive: after $n = 0, 1, \ldots$ “positive” Lyapunov exponents have “become” negative the diagram of the system consists of $2^n$ points totally ordered starting from the past pole and going straight down to the future pole. During the evolution of the bifurcations $n + 1$ time reversals are defined, $i^*_0 \equiv i, i^*_1, \ldots, i^*_n$, and the $k$-th time reversal $i^*_k$ leaves invariant the set of nodes in the diagram with labels $1, 2, \ldots, 2^{n-k}$, $k = 0, \ldots$.
Acknowledgement. We are indebted to N. Chernov, E.G.D. Cohen, P. Garrido, G. Gentile, J.L. Lebowitz for stimulating discussions and comments. This work is part of the research program of the European Network on: “Stability and Universality in Classical Mechanics”, # ERBCHRXCT940460, and it has been partially supported also by CNR-GNFM and Rutgers University.
References
[AA] Arnold, V., Avez, A.: Ergodic problems of classical mechanics. New York: Benjamin, 1966
[BGG] Bonetto, F., Gallavotti, G., Garrido, P.: Chaotic principle: an experimental test. Roma, Preprint, (1996)
[CG1] Gallavotti, G., Cohen, E.G.D.: Dynamical ensembles in nonequilibrium statistical mechanics. Phys. Rev. Lett. 74, 2694–2697 (1995)
[CG2] Gallavotti, G., Cohen, E.G.D.: Dynamical ensembles in stationary states. In print in J. Stat. Phys. (1995)
[D] Dobrushin, R.L.: Gibbs random fields for lattice systems with pair interactions. Functional Anal. and Appl. 2, 31–43 (1968)
[DM] Dettman, Morriss, G.: Preprint 1995
[ECM1] Evans, D.J., Cohen, E.G.D., Morriss, G.P.: Viscosity of a simple fluid from its maximal Lyapunov exponents. Phys. Rev. 42A, 5990–5997 (1990)
[ER] Eckmann, J.P., Ruelle, D.: Ergodic theory of strange attractors. Rev. Mod. Phys. 57, 617–656 (1985)
[G1] Gallavotti, G.: Ergodicity, ensembles, irreversibility in Boltzmann and beyond. J. Stat. Phys. 78, 1571–1589 (1995)
[G2] Gallavotti, G.: Topics in chaotic dynamics. Lectures at the Granada School, ed. Garrido–Marro, Lecture Notes in Physics 448, 1995
[G3] Gallavotti, G.: Reversible Anosov diffeomorphisms and large deviations. Mathematical Phys. Electr. J. 1, (1) (1995)
[G4] Gallavotti, G.: Chaotic hypothesis: Onsager reciprocity and fluctuation–dissipation theorem. J. Stat. Phys. 84, 899–926 (1996)
[LR] Lanford, O., Ruelle, D.: Observables at infinity and states with short range correlations. Commun. Math. Phys. 9, 327–338 (1968)
[R1] Ruelle, D.: Chaotic motions and strange attractors. Lezioni Lincee, notes by S. Isola, Accademia Nazionale dei Lincei, Cambridge: Cambridge University Press, 1989; see also: Ruelle, D.: Measures describing a turbulent flow. Annals of the New York Academy of Sciences 357, 1–9 (1980). For more technical expositions see Ruelle, D.: Ergodic theory of differentiable dynamical systems. Pub. Math. de l'IHES 50, 275–306 (1980)
[R2] Ruelle, D.: A measure associated with Axiom A attractors. Am. J. Math. 98, 619–654 (1976)
[R3] Ruelle, D.: Elements of differentiable dynamics and bifurcation theory. New York: Academic Press, 1989
[S1] Sinai, Y.G.: Gibbs measures in ergodic theory. Russ. Math. Surv. 27, 21–69 (1972). Also: Introduction to ergodic theory. Princeton: Princeton U. Press, 1977
[Sm] Smale, S.: Differentiable dynamical systems. Bull. Am. Math. Soc. 73, 747–818 (1967)
[UF] Uhlenbeck, G.E., Ford, G.W.: Lectures in Statistical Mechanics. Providence, RI: Am. Math. Soc., 1963, pp. 5, 16, 30
Communicated by J.L. Lebowitz
Commun. Math. Phys. 189, 277 – 286 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997
Relative Energies for Non-Gibbsian States
Christian Maes*, Koen Vande Velde
Instituut voor Theoretische Fysica, K.U. Leuven, Belgium. E-mail: [email protected], [email protected]
* Onderzoeksleider N.F.W.O. Belgium. Research supported by EC grant CHRX-CT93-0411
Received: 27 March 1996 / Accepted: 29 July 1996
In memory of Roland Dobrushin
Abstract: We investigate the suggestion of R.L. Dobrushin, that for many examples of non-Gibbsian measures, a well-defined interaction potential can still be derived. Such an interaction cannot be absolutely summable, uniformly in all configurations. Rather there will exist a typical set of configurations on which the interaction decays sufficiently fast. We show how to construct an interaction, using quite elementary methods, for the example of the restriction of the Ising pure phases to one layer of the lattice. We sketch a general setup, showing that there is an intimate relation between the problem treated here and the decay of correlations in certain disordered systems.
1. Introduction
Ever since the first appearance of “natural” examples of non-Gibbsian measures [9, 12, 13] in simple models of statistical mechanics, there has been the feeling that in many cases the non-Gibbsianness is very weak. After all, all that matters is that one can make sense of relative energies somehow. These relative energies are indeed the objects most people concentrate on to parametrize or to investigate the effect of transformations of equilibrium states. This was especially emphasized by R.L. Dobrushin, who pointed out that the “pathologies” of certain transformations [5] are not so worrisome as one might think at first. He explained that, in practice, one would not see much of these pathologies if the non-Gibbsian aspect is due only to special configurations of the system, which are very untypical anyway. In [4] he has shown (at least for one example [14]) how relative energies can be defined for typical configurations. In this paper, we investigate his suggestion for this example, where we show by quite elementary methods how to construct a potential. For this we have benefitted from discussions with J. Bricmont, who together with A. Kupiainen [2] is using very
similar methods to investigate the statistical mechanics of infinite dimensional dynamical systems. As will become clear, an interesting relation follows between non-Gibbsian aspects and certain singularities in disordered systems. These singularities show up if one considers atypical realizations of the disorder, not allowing a nice decay of the correlation functions uniformly in the set of all disorder realizations. The paper is organized as follows. In Sect. 2 we sketch a general setup and make the link to problems in disordered systems. Section 3 is devoted to the construction of an interaction for the projection of the Ising phases.
2. General Setup
The main problem of the paper is how to construct interactions for states that are obtained from Gibbs states via some transformation. For more details on the relationship between Gibbs states and interactions we refer to [5, 7]. Let $\mu$ be a probability measure on $\Omega_1 := \{+,-\}^{\Lambda}$ for some finite volume $\Lambda \subset \mathbb{Z}^d$. From $\mu$ we obtain another state $\nu$ on $\Omega_2 := \{+,-\}^{V}$ ($V \subset \Lambda$) via a stochastic transformation $T$. $T$ is a stochastic matrix, i.e.:
\[ T(\xi,\sigma) \ge 0, \qquad \sum_{\xi\in\Omega_2} T(\xi,\sigma) = 1, \qquad \sigma \in \Omega_1,\ \xi \in \Omega_2, \]
and $\mu T = \nu$, i.e.:
\[ \sum_{\sigma\in\Omega_1} T(\xi,\sigma)\,\mu(\sigma) = \nu(\xi). \]
Let $H$ be some (energy) function on $\Lambda$ and $\beta \ge 0$. Think of $\mu$ as a Gibbs measure with respect to the Hamiltonian $H$ at inverse temperature $\beta$. Of course, for finite volumes $\Lambda$ and $V$, $\nu$ will always be Gibbsian. The problem is then to see whether and in what sense the Gibbsian character is preserved as the volumes tend to infinity. We assume that for all sets $A \subset V$ there is a set $B_A \subset \Lambda$ such that
\[ \sum_{\sigma\in\Omega_1} T(\xi^A,\sigma)\,\mu(\sigma) = \sum_{\sigma\in\Omega_1} T(\xi,\sigma)\,\mu(\sigma)\exp[-\beta\Delta_{B_A}H(\sigma)] \tag{2.1} \]
where, for any sets $A \subset V$ and $B \subset \Lambda$,
\[ \xi^A(x) = -\xi(x) \ \text{ if } x \in A, \qquad \xi^A(x) = \xi(x) \ \text{ if } x \notin A, \qquad \Delta_B H(\sigma) = H(\sigma^B) - H(\sigma), \]
the latter being the relative energy with respect to a spin-flip in $B$. (2.1) is satisfied whenever
\[ \frac{\mu(\sigma^B)}{\mu(\sigma)} = \exp[-\beta\Delta_B H(\sigma)] \quad\text{and}\quad T(\xi^A,\sigma) = T(\xi,\sigma^{B_A}). \tag{2.2} \]
In particular this is the case when $T$ is a projection or decimation transformation, because then
\[ T(\xi,\sigma) = \prod_{x\in V} \frac{1+\xi(x)\sigma(x)}{2}, \tag{2.3} \]
and we can take $B_A = A$. It holds also for (real) stochastic transformations like the Kadanoff transformation [11], where $\Lambda$ is divided into disjoint blocks $B_x$, $x \in V$, such that $\cup_{x\in V} B_x = \Lambda$. The transformation is
\[ T(\xi,\sigma) = \prod_{x\in V} \frac{\exp[p\,\xi(x)\sum_{y\in B_x}\sigma(y)]}{2\cosh[p\sum_{y\in B_x}\sigma(y)]}, \tag{2.4} \]
$0 < p < \infty$. It is clear that we can take $B_A = \cup_{x\in A} B_x$. Let $x, y \in V$, $x \ne y$. If we want to define an interaction $\tilde H$ for $\nu$ then the relative energy for a spin-flip at the site $x$ in the configuration $\xi$ should be given by
\[ \Delta_x \tilde H(\xi) = -\log\frac{\nu(\xi^x)}{\nu(\xi)}. \tag{2.5} \]
The effective temperature should be thought of as included in $\tilde H$. Formula (2.1) makes it possible to write
\[ \Delta_x \tilde H(\xi) = -\log\frac{\nu(\xi^x)}{\nu(\xi)} = -\log\langle\exp[-\beta\Delta_{B_x}H]\rangle_\xi, \qquad \Delta_x \tilde H(\xi^y) = -\log\frac{\nu(\xi^{\{x,y\}})}{\nu(\xi^y)} = -\log\frac{\langle\exp[-\beta\Delta_{B_{\{x,y\}}}H]\rangle_\xi}{\langle\exp[-\beta\Delta_{B_y}H]\rangle_\xi}, \tag{2.6} \]
where
\[ \langle f\rangle_\xi = \frac{\sum_{\sigma\in\Omega_1} f(\sigma)\,T(\xi,\sigma)\,\mu(\sigma)}{\sum_{\sigma\in\Omega_1} T(\xi,\sigma)\,\mu(\sigma)}. \tag{2.7} \]
The contribution of $\xi(y)$ to $\Delta_x\tilde H(\xi)$ is given by
\[ \Delta_x\tilde H(\xi) - \Delta_x\tilde H(\xi^y) = \log\frac{\nu(\xi^{\{x,y\}})}{\nu(\xi^y)} - \log\frac{\nu(\xi^x)}{\nu(\xi)} = \log\frac{\langle\exp[-\beta\Delta_{B_{\{x,y\}}}H]\rangle_\xi}{\langle\exp[-\beta\Delta_{B_x}H]\rangle_\xi\,\langle\exp[-\beta\Delta_{B_y}H]\rangle_\xi}. \tag{2.8} \]
If $H$ is a Hamiltonian for a strictly local interaction, then
\[ \Delta_{B_{\{x,y\}}}H = \Delta_{B_x}H + \Delta_{B_y}H \tag{2.9} \]
whenever $x$ and $y$ are far enough apart and
\[ \Delta_x\tilde H(\xi) - \Delta_x\tilde H(\xi^y) = \log\left(1 + \frac{\langle f_x f_y\rangle_\xi - \langle f_x\rangle_\xi\langle f_y\rangle_\xi}{\langle f_x\rangle_\xi\langle f_y\rangle_\xi}\right), \tag{2.10} \]
where $f_z = \exp[-\beta\Delta_{B_z}H]$.
We see then that the decay of the interaction for $\nu$ is closely related to the decay of correlations in the state $\langle\cdot\rangle_\xi$. To define $\nu$ as a Gibbs measure in the infinite volume case, one needs that the state $\langle\cdot\rangle_\xi$ has good decay of correlations uniformly in $\xi$, while $\Lambda$ and $V$ become infinite. But this is often too much to ask. We can compare the situation with the one in disordered systems where the decay of correlations is always proven only for a set
of typical configurations of the random couplings and not for all configurations. There also will be ‘bad’ configurations spoiling a good decay of the correlations uniformly in the random couplings. We will therefore be interested in the decay of correlations (as in (2.10)) for a typical set of configurations $\xi$. This allows one to define an interaction potential for $\nu$ which behaves nicely (but not uniformly so) for a set of measure one. When $T$ is interpreted as a dynamics, the measure $\langle\cdot\rangle_\xi$ is seen to be the time reversal with respect to $\mu$: it gives the probability for the state $\mu$ when the configuration in the next instant is known to be $\xi$. Also note that the main result in [10] follows from the above analysis: if the correlations $\langle f; g\rangle_\xi$ decay fast enough, uniformly in $\xi$, then $\nu$ can be defined as a bona-fide Gibbs measure also in the limit $\Lambda, V \uparrow \infty$. The way we presented the problem here allows one to iterate the transformation. In particular, we do not need $\mu$ to be a Gibbs measure in the strong sense. All we need is a DLR-type equation such as (2.1) and a sufficiently local character as in (2.9) for the associated relative energies.
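The identity between the relative energy (2.5) and the conditional average in (2.6) can be checked by brute force on a very small system. The sketch below is only an illustration under assumptions that are not in the text: a free-boundary nearest-neighbour Ising chain on four sites, the decimation (2.3) onto the sites 0 and 2, and the value β = 0.7. All function names are hypothetical.

```python
import itertools
import math

BETA = 0.7      # inverse temperature (arbitrary illustrative value)
N_SITES = 4     # Lambda = {0, 1, 2, 3} (assumed toy volume)
V = (0, 2)      # visible sites kept by the decimation

def energy(sigma):
    """Nearest-neighbour Ising chain energy H(sigma), free boundary."""
    return -sum(sigma[i] * sigma[i + 1] for i in range(N_SITES - 1))

def flip(conf, k):
    """Return conf with the entry at position k reversed."""
    return tuple(-s if i == k else s for i, s in enumerate(conf))

configs = list(itertools.product((-1, 1), repeat=N_SITES))
Z = sum(math.exp(-BETA * energy(s)) for s in configs)
mu = {s: math.exp(-BETA * energy(s)) / Z for s in configs}

def T(xi, sigma):
    """Decimation (2.3): equals 1 if sigma agrees with xi on V, else 0."""
    return 1.0 if all(xi[k] == sigma[x] for k, x in enumerate(V)) else 0.0

def nu(xi):
    """Transformed measure nu(xi) = sum_sigma T(xi, sigma) mu(sigma)."""
    return sum(T(xi, s) * mu[s] for s in configs)

def average(f, xi):
    """Conditional average <f>_xi as in (2.7)."""
    return sum(f(s) * T(xi, s) * mu[s] for s in configs) / nu(xi)

def rel_energy_direct(xi, k):
    """Left-hand side of (2.6): -log nu(xi^x)/nu(xi) with x = V[k]."""
    return -math.log(nu(flip(xi, k)) / nu(xi))

def rel_energy_average(xi, k):
    """Right-hand side of (2.6), using B_x = {x} for a decimation."""
    x = V[k]
    f = lambda s: math.exp(-BETA * (energy(flip(s, x)) - energy(s)))
    return -math.log(average(f, xi))

max_gap = max(
    abs(rel_energy_direct(xi, k) - rel_energy_average(xi, k))
    for xi in itertools.product((-1, 1), repeat=len(V))
    for k in range(len(V))
)
print(max_gap)
```

The two expressions agree up to floating-point rounding because, for a decimation, flipping ξ at x amounts to flipping σ at the same site, which is the second condition in (2.2).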
3. The Projection of the Ising Model
For convenience, we restrict ourselves here to the two-dimensional case. This is however not essential. Consider the planar lattice $\mathbb{Z}^2$ for which each site $x \in \mathbb{Z}^2$ has coordinates $(i, j)$. The integer lattice $\mathbb{Z}$ can be viewed as a one-dimensional sublattice containing the origin. That is, we take $x \in \mathbb{Z}$ iff $x = (i, 0)$. An Ising spin configuration on $\mathbb{Z}^2$ is an element $\sigma$ of $\Omega_2 = \{-1,+1\}^{\mathbb{Z}^2}$. Similarly, $\Omega = \{-1,+1\}^{\mathbb{Z}}$ contains all spin configurations $\xi = \{\xi(x), x \in \mathbb{Z}\}$ on $\mathbb{Z}$. We look at the Ising model on $\mathbb{Z}^2$. At low temperatures the restrictions $\nu^\pm$ to $\mathbb{Z}$ of the pure phases $\mu^\pm$ of the two-dimensional Ising model are known to be non-Gibbsian [14]. Our goal is to construct an interaction for $\nu^+$ that decays exponentially fast for a set of typical configurations of $\nu^+$. First we need some definitions. We write $\langle xy\rangle$ to indicate that two sites $x$ and $y$ are nearest neighbors in $\mathbb{Z}^2$. For a subset $\Lambda \subset \mathbb{Z}^2$, define $\Lambda^c$ to be the complement of $\Lambda$ and $\partial\Lambda = \{x \in \Lambda \mid \exists y \in \Lambda^c, \langle xy\rangle\}$. For $n \ge 1$, let $\Lambda_n = [-n,n]^2 \cap \mathbb{Z}^2$ and $V_n = \Lambda_n \cap \mathbb{Z}$. Let $\Gamma_n = \Lambda_n \cap \{(i,j) \in \mathbb{Z}^2 \mid j > 0\}$. Let $\mu^+_n$ be the Ising measure on the volume $\Lambda_n$ with + boundary conditions at inverse temperature $\beta > 0$:
\[ H^+_n(\sigma) := -\sum_{\langle xy\rangle\subset\Lambda_n} (\sigma(x)\sigma(y) - 1) - \sum_{\langle xy\rangle:\ x\in\Lambda_n,\ y\in\Lambda_n^c} (\sigma(x) - 1), \tag{3.1} \]
\[ \mu^+_n(\sigma) := \frac{1}{Z^+_n(\beta)}\exp(-\beta H^+_n(\sigma)), \tag{3.2} \]
where $Z^+_n(\beta)$ is the usual normalizing partition function. Denote by $\nu^+_n$ the restriction of $\mu^+_n$ to $V_n$. For any set $A \subset \mathbb{Z}$ we write $\xi_A$ for the configuration in $\Omega$ that takes the same values as $\xi$ inside $A$ and the value $+1$ outside. For notational convenience we put ${}^x\xi := \xi_{\{x\}^c}$. Let $+$ denote the configuration with $+(x) = +$, $\forall x$. ‘$\xi$ on $A$’ stands for the event that for every $x \in A$ the spin equals $\xi(x)$. We define for $A \subset V_n$ the relative energy
\[ E^+_n(A,\xi) := \log\frac{\nu^+_n(\xi)}{\nu^+_n(\xi_{A^c})} = \log\frac{\nu^+_n[\xi \text{ on } A \mid \xi \text{ on } V_n\setminus A]}{\nu^+_n[+ \text{ on } A \mid \xi \text{ on } V_n\setminus A]}. \tag{3.3} \]
It follows from the results in [6] (Proposition 3.1) that $E^+_n(A,\xi) \to E^+(A,\xi)$ for all $\xi \in \Omega$ and that the limit is a relative energy for the state $\nu^+$, i.e.
\[ E^+(A,\xi) = \log\frac{\nu^+[\xi \text{ on } A \mid \xi \text{ on } A^c]}{\nu^+[+ \text{ on } A \mid \xi \text{ on } A^c]} \tag{3.4} \]
for $\nu^+$-a.e. $\xi \in \Omega$. We now construct an interaction potential ([15]). Write
\[ E^+_n(V_n,\xi) = \sum_{j=-n}^{n} E^+_n(j,\xi_{[j,n]}). \tag{3.5} \]
Define the interaction potential by
\[ E^+_n(j,\xi_{[j,n]}) = -\sum_{k=j}^{n} \Phi^n_{[j,k]}(\xi), \tag{3.6} \]
where for $j < k$
\[ \Phi^n_{[j,k]}(\xi) = E^+_n(j,\xi_{[j,k-1]}) - E^+_n(j,\xi_{[j,k]}). \tag{3.7} \]
In that way, the relative energy can be expressed through $\nu^+_n(\xi) = \nu^+_n(+)\exp[-\tilde H^+_n(\xi)]$ in the formula:
\[ \tilde H^+_n(\xi) = \sum_{j=-n}^{n}\sum_{k=j}^{n} \Phi^n_{[j,k]}(\xi). \tag{3.8} \]
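The telescoping construction (3.5)–(3.8) does not use any special property of the Ising projection, so it can be illustrated on an arbitrary strictly positive measure on a short chain. The following sketch makes several assumptions that are not in the text: the chain has sites −2, …, 2, the measure is built from pseudo-random weights, and the diagonal term Φ_[j,j] is taken to be −E(j, ξ_[j,j]), which is the choice that makes (3.6) hold when (3.7) is used for j < k. The names are hypothetical.

```python
import itertools
import math
import random

N = 2
SITES = list(range(-N, N + 1))          # the chain V_n = {-N, ..., N}

# Assumption: an arbitrary strictly positive measure stands in for nu_n^+.
random.seed(0)
weights = {c: random.uniform(0.5, 2.0)
           for c in itertools.product((-1, 1), repeat=len(SITES))}
total = sum(weights.values())
nu = {c: w / total for c, w in weights.items()}

def restrict(xi, a, b):
    """xi_[a,b]: equal to xi on the sites a..b and +1 elsewhere."""
    return tuple(s if a <= x <= b else 1 for x, s in zip(SITES, xi))

def E(j, xi):
    """Single-site relative energy (3.3) with A = {j}."""
    reset = tuple(1 if x == j else s for x, s in zip(SITES, xi))
    return math.log(nu[xi] / nu[reset])

def phi(j, k, xi):
    """Potential (3.7) for j < k; the case j == k is an assumed convention."""
    if j == k:
        return -E(j, restrict(xi, j, j))
    return E(j, restrict(xi, j, k - 1)) - E(j, restrict(xi, j, k))

def H_tilde(xi):
    """Double sum (3.8)."""
    return sum(phi(j, k, xi) for j in SITES for k in range(j, N + 1))

plus = tuple(1 for _ in SITES)
max_err = max(abs(nu[xi] - nu[plus] * math.exp(-H_tilde(xi))) for xi in nu)
print(max_err)
```

For every configuration the reconstructed weight ν(+) exp[−H̃(ξ)] matches ν(ξ) up to rounding, which is just the statement that the sums in (3.5) and (3.6) telescope.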
We want to show that there exists a set $\Omega_+$ of “good” configurations on which the convergence of (3.7) is fast enough. We define the sets $D_k$, $k \in \mathbb{N}$, by $D_{2l+1} = [-l, l] \cap \mathbb{Z}$ and $D_{2l} = [-l+1, l] \cap \mathbb{Z}$, except for $D_0 = \emptyset$. Let $D_k(x) = D_k + x$. Define
\[ \Omega^l_+(x) := \Big\{\xi \in \Omega \ \Big|\ \forall k \ge l,\ \frac{1}{k}\sum_{y\in D_k(x)} \xi(y) \ge 3/4\Big\}, \qquad \Omega_+ := \bigcup_l \Omega^l_+(x). \tag{3.9} \]
The 3/4 is of course arbitrary. Although the definition of $\Omega^l_+(x)$ depends on the position of $x$, $\Omega_+$ is independent of $x$ and thus translation invariant. Indeed, take the site $z$ as a reference point instead of $x$. If $\xi \in \Omega^l_+(x)$ for some $l$ then also $\xi \in \Omega^{l'}_+(z)$ for some $l'$. In the following we will write $\Omega^l_+ = \Omega^l_+(o)$ ($o$ denotes the origin). The important thing is
Lemma 1. If $\beta$ is large enough, then $\nu^+(\Omega_+) = 1$.
Proof. The proof is easy using that for $\beta$ large enough, there exists $\alpha = \alpha(\beta) > 0$ such that
\[ \nu^+\Big(\Big\{\xi \in \Omega \ \Big|\ \frac{1}{k}\sum_{x\in D_k} \xi(x) < 3/4\Big\}\Big) < e^{-\alpha k}. \tag{3.10} \]
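To make the definition (3.9) concrete, the sets $D_k$ and the averaging condition can be written out as a small membership test. This is only a sketch: the true condition involves all $k \ge l$, whereas the code below checks it up to a finite cutoff `k_max`, which is an assumption introduced here, as are the function names. Integer arithmetic is used so that the threshold 3/4 is compared exactly.

```python
def D(k):
    """The set D_k: D_{2l+1} = [-l, l], D_{2l} = [-l+1, l], D_0 empty."""
    if k == 0:
        return []
    l, odd = divmod(k, 2)
    return list(range(-l, l + 1)) if odd else list(range(-l + 1, l + 1))

def in_omega_plus_l(xi, l, x=0, k_max=200):
    """Check (1/k) * sum_{y in D_k(x)} xi(y) >= 3/4 for max(l, 1) <= k <= k_max.

    xi is a function from integer sites to spins +1 or -1. The cutoff k_max
    is an assumption of this sketch; k = 0 is skipped since D_0 is empty.
    """
    for k in range(max(l, 1), k_max + 1):
        total = sum(xi(y + x) for y in D(k))
        if 4 * total < 3 * k:       # exact form of total / k < 3/4
            return False
    return True

all_plus = lambda y: 1
one_minus = lambda y: -1 if y == 0 else 1   # a single minus spin at the origin

print(in_omega_plus_l(all_plus, 1))
print(in_omega_plus_l(one_minus, 7), in_omega_plus_l(one_minus, 8))
```

With a single minus spin at the origin the average over $D_k$ is $(k-2)/k$, which reaches 3/4 exactly at $k = 8$, so this configuration passes the test for $l = 8$ but not for $l = 7$.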
Theorem 1. 1. $|E^+(x,\xi_{D_k(x)}) - E^+(x,\xi)| \le \mathrm{const}(\beta)\,e^{-\delta(\beta)k}$ for all $\xi \in \Omega^l_+(x)$, whenever $k > l$, and $\delta(\beta) \uparrow \infty$ as $\beta \uparrow \infty$.
2. For all $\xi \in \Omega_+$:
\[ \lim_n \Phi^n_{[j,k]}(\xi) = \Phi_{[j,k]}(\xi) \tag{3.11} \]
exists and is translation invariant with, for large $|k - j|$,
\[ |\Phi_{[j,k]}(\xi)| \le \mathrm{const}(\beta)\exp[-\delta(\beta)|k - j|]. \tag{3.12} \]
Statement 2 is a straightforward consequence of 1 and (3.11): exponentially fast convergence implies absolute convergence. We now set out to prove 1. Because of translation invariance we restrict ourselves now to $x = o$ (the origin) and study the object
\[ h^+_n(\xi) := E^+_n(0,\xi) = \log\frac{\nu^+_n(\xi)}{\nu^+_n({}^o\xi)}. \tag{3.13} \]
A small calculation shows that
\[ h^+_n(\xi) = \beta(1-\xi(0))\,\xi(0)\,(\xi(1)+\xi(-1)) + 2\log\frac{Z^{+,\xi}_n(\beta)}{Z^{+,{}^o\xi}_n(\beta)}, \tag{3.14} \]
where
\[ Z^{+,\xi}_n(\beta) := \sum_{\sigma(x)=\pm,\ x\in\Gamma_n} \exp(-\beta H^{+,\xi}_n(\sigma)) \tag{3.15} \]
with $H^{+,\xi}_n$ the Ising Hamiltonian on the volume $\Gamma_n$ with + boundary conditions on $\Gamma^c_n\setminus\mathbb{Z}$ and $\xi$ boundary conditions on $\mathbb{Z}$:
\[ H^{+,\xi}_n(\sigma) := -\sum_{\langle xy\rangle\subset\Gamma_n} (\sigma(x)\sigma(y) - 1) - \sum_{\langle xy\rangle:\ x\in\Gamma_n,\ y\in\Lambda_n^c} (\sigma(x) - 1) - \sum_{x\in V_n} (\xi(x)\sigma(x') - 1), \tag{3.16} \]
where $\sigma(x')$ is the neighboring spin in $\Gamma_n$ of $\xi(x)$. The relative energy (3.13) can be expanded using the telescopic identity (as in [2])
\[ h^+_n(\xi) = \sum_{k=1}^{2n+1} \big(h^+_n(\xi_k) - h^+_n(\xi_{k-1})\big), \tag{3.17} \]
where we have put $\xi_k := \xi_{D_k}$ for notational convenience. The constant $h^+_n(+) = 0$. Now define
\[ \Phi^n_k(\xi) := h^+_n(\xi_k) - h^+_n(\xi_{k-1}) \tag{3.18} \]
for $k \ge 1$. More explicitly,
\[ \Phi^n_k(\xi) = 2\log\frac{Z^{+,\xi_k}_n(\beta)\,Z^{+,{}^o\xi_{k-1}}_n(\beta)}{Z^{+,{}^o\xi_k}_n(\beta)\,Z^{+,\xi_{k-1}}_n(\beta)} \tag{3.19} \]
for all $k > 3$. Let $a_k = D_k\setminus D_{k-1}$. Observe that (3.19) equals zero whenever $\xi_k = \xi_{k-1}$, i.e. whenever $\xi(a_k) = +$, or whenever $\xi = {}^o\xi$, i.e. when $\xi(o) = +$. As pointed out in the general setup, we can write the $\Phi^n_k$ in terms of correlation functions. For any $\xi, \eta \in \Omega$ let $\mu^{\eta,\xi}_n$ be the measure on $\Gamma_n$ with $\xi$ boundary conditions on $V_n$ and $\eta$ boundary conditions on the rest of $\partial\Gamma_n$. One readily sees from (3.19) that for $k > 3$
\[ \Phi^n_k(\xi) = \frac{1}{2}(1-\xi(0))(1-\xi(a_k))\log\frac{\mu^{+,\xi_k}_n\big(e^{2\beta\sigma(0,1)}e^{2\beta\sigma(a_k,1)}\big)}{\mu^{+,\xi_k}_n\big(e^{2\beta\sigma(0,1)}\big)\,\mu^{+,\xi_k}_n\big(e^{2\beta\sigma(a_k,1)}\big)}. \tag{3.20} \]
Proposition 1. The projection $\nu^+_n$ satisfies
\[ \nu^+_n({}^o\xi) = \nu^+_n(\xi)\exp\Big[-\sum_{k=0}^{2n+1}\Phi^n_k(\xi)\Big] \tag{3.21} \]
with $\{\Phi^n_k\}_k$ defined in (3.18), (3.19) satisfying:
1. $\lim_n \Phi^n_k(\xi) = \Phi_k(\xi)$ exists for all $\xi \in \Omega$; and for $\beta$ large,
2. $|\Phi^n_k(\xi)| \le \mathrm{const}(\beta)\,e^{-\delta(\beta)k}$ for all $\xi \in \Omega^l_+$ whenever $k > l$, and $\delta(\beta) \uparrow \infty$ as $\beta \uparrow \infty$;
3. $\sum_{k=0}^{2n+1}|\Phi^n_k(\xi)| \le c(\xi) < \infty$ for all $\xi \in \Omega_+$, uniformly in $n \uparrow \infty$;
4. For all $\xi \in \Omega_+$, $h^+(\xi) := \lim_n h^+_n(\xi) = \sum_{k=0}^{\infty}\Phi_k(\xi)$.
The first statement of Proposition 1 (existence of the limit) can be easily obtained via a monotonicity argument: $h^+_n(\xi_k)$ is increasing in $n$ and bounded. The proof of 2 follows from combining (3.20) with
Proposition 2. Take $f_m(\sigma) = \exp[2\beta\sigma(m,1)]$. For $\beta$ large, there exists $\delta = \delta(\beta)$ ($\delta \uparrow \infty$ as $\beta \uparrow \infty$), such that for $\xi \in \Omega^l_+$ and $2n+1 \ge k \ge l$
\[ |\mu^{+,\xi_k}_n(f_0 f_{a_k}) - \mu^{+,\xi_k}_n(f_0)\,\mu^{+,\xi_k}_n(f_{a_k})| \le \mathrm{const}(\beta)\,e^{-\delta k}. \tag{3.22} \]
Statements 3 and 4 of Proposition 1 follow from 2. Statement 3 is a direct consequence of 2. To prove 4, we remark that
\[ \sum_{k=0}^{2n+1}\Phi^n_k(\xi) = \sum_{k=0}^{\infty}\Phi^n_k(\xi), \tag{3.23} \]
since the extra terms on the right-hand side are identically zero. We can then use dominated convergence to get
\[ \lim_n \sum_{k=0}^{2n+1}\Phi^n_k(\xi) = \sum_{k=0}^{\infty}\Phi_k(\xi) \tag{3.24} \]
on $\Omega_+$. We introduce the following notation. For any two sets $A, B \subset \mathbb{Z}^2$, define the event $A \to B = \{(\sigma,\sigma') \in \Omega_2\times\Omega_2 \mid$ there is a path $\pi$ from $A$ to $B$ such that $(\sigma(x),\sigma'(x)) \ne (+,+)$ for all $x \in \pi\}$. A path from $A$ to $B$ is a sequence $x(0) \in A, x(1), \ldots, x(n) \in B$ of consecutive nearest neighbor sites. Before proving Proposition 2 we need the following
Lemma 2. If $\beta$ is large enough there exists $\delta = \delta(\beta)$ ($\delta \uparrow \infty$ as $\beta \uparrow \infty$), such that for $\xi \in \Omega^l_+$ and $k \ge l$,
\[ \mu^{+,\xi_k}_n\times\mu^{+,\xi_k}_n[(0,1) \to \Lambda^c_k] \le \mathrm{const}(\beta)\,e^{-\delta k}. \tag{3.25} \]
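Proof. First note that for $\xi_k$ with $\xi \in \Omega^k_+$ only one spin out of eight can be minus in $D_k$ and thus
\[ \mu^{+,\xi_k}_n\times\mu^{+,\xi_k}_n[(0,1) \to \Lambda^c_k] \le \exp\Big[4\beta\sum_{x\in D_k}(1-\xi(x))\Big]\,\mu^{+,+}_n\times\mu^{+,+}_n[(0,1) \to \Lambda^c_k] \le e^{\beta k}\,\mu^{+,+}_n\times\mu^{+,+}_n[(0,1) \to \Lambda^c_k]. \tag{3.26} \]
We can then use Proposition 2.4 in [3] (with some trivial modifications) to conclude that for $\beta$ large
\[ \mu^{+,\xi_k}_n\times\mu^{+,\xi_k}_n[(0,1) \to \Lambda^c_k] \le \mathrm{const}(\beta)\,e^{-\delta k} \tag{3.27} \]
and $\delta \uparrow \infty$ as $\beta \uparrow \infty$.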
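Proof of Proposition 2. For any configuration $\xi \in \Omega$ we have that
\[ |\mu^{+,\xi}_n(f_0 f_{a_k}) - \mu^{+,\xi}_n(f_0)\,\mu^{+,\xi}_n(f_{a_k})| = \big|\mu^{+,\xi}_n\big(\mu^{+,\xi}_n(f_{a_k} \mid \sigma = \eta \text{ on } \partial\Lambda^c_k)\,[\mu^{\eta,\xi}_k(f_0) - \mu^{+,\xi}_n(f_0)]\big)\big| \]
\[ \le e^{2\beta}\,\mu^{+,\xi}_n\big(|\mu^{\eta,\xi}_k(f_0) - \mu^{+,\xi}_n(f_0)|\big) = e^{2\beta}\,\mu^{+,\xi}_n\big(|\mu^{\eta,\xi}_k\times\mu^{+,\xi}_n(f_0\times 1 - 1\times f_0)|\big). \tag{3.28} \]
The rest of the argument is similar to the ideas in [1]. Pick a configuration $(\sigma,\sigma')$ from the distribution $\mu^{\eta,\xi}_k\times\mu^{+,\xi}_n$. If there is no path from $(0,1)$ to $\Lambda^c_k$ on which $(\sigma,\sigma') \not\equiv (+,+)$, there exists a ∗-chain around $(0,1)$ separating it from $\Lambda^c_k$ and on which $\sigma \equiv \sigma'$. This chain has a part in $\Gamma_k$ on which $(\sigma,\sigma') \equiv (+,+)$ and a part in $V_k$ on which lives the configuration $\xi$ and $(\sigma,\sigma') \equiv (\xi,\xi)$. It follows that there exists a maximal ∗-chain $\Delta$ with this property inside $\Lambda_k$ (maximal in the sense that it is contained in no other ∗-chain). Therefore
\[ \mu^{\eta,\xi}_k\times\mu^{+,\xi}_n(f_0\times 1 - 1\times f_0) = \sum_{\text{∗-chains } \Delta \text{ around } (0,1)} \mu^{\eta,\xi}_k\times\mu^{+,\xi}_n(f_0\times 1 - 1\times f_0 \mid \Delta)\ \mu^{\eta,\xi}_k\times\mu^{+,\xi}_n(\Delta) \]
\[ \qquad + \mu^{\eta,\xi}_k\times\mu^{+,\xi}_n(f_0\times 1 - 1\times f_0 \mid (0,1) \to \Lambda^c_k)\ \mu^{\eta,\xi}_k\times\mu^{+,\xi}_n[(0,1) \to \Lambda^c_k]. \tag{3.29} \]
Now since $\sigma = \sigma'$ on $\Delta$,
\[ \mu^{\eta,\xi}_k\times\mu^{+,\xi}_n(f_0\times 1 - 1\times f_0 \mid \Delta) = 0 \tag{3.30} \]
for every $\Delta$, so
\[ |\mu^{\eta,\xi}_k\times\mu^{+,\xi}_n(f_0\times 1 - 1\times f_0)| \le 2e^{2\beta}\,\mu^{\eta,\xi}_k\times\mu^{+,\xi}_n[(0,1) \to \Lambda^c_k]. \tag{3.31} \]
Lemma 2 and (3.28), (3.31) now yield that
\[ |\mu^{+,\xi_k}_n(f_0 f_{a_k}) - \mu^{+,\xi_k}_n(f_0)\,\mu^{+,\xi_k}_n(f_{a_k})| \le e^{2\beta}\,\mu^{+,\xi_k}_n\big[|\mu^{\eta,\xi_k}_k\times\mu^{+,\xi_k}_n(f_0\times 1 - 1\times f_0)|\big] \le 2e^{4\beta}\,\mu^{+,\xi_k}_n\times\mu^{+,\xi_k}_n[(0,1) \to \Lambda^c_k] \le \mathrm{const}(\beta)\,e^{-\delta k}. \tag{3.32} \]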
Proof of Theorem. Take $x = o$, $\xi \in \Omega^l_+$. We use statements 2 and 4 of Proposition 1.
\[ |E^+(o,\xi_k) - E^+(o,\xi)| = |h^+(\xi_k) - h^+(\xi)| = \Big|\sum_{i=k+1}^{\infty}\Phi_i(\xi)\Big| \le \sum_{i=k+1}^{\infty}|\Phi_i(\xi)| \le \mathrm{const}(\beta)\,e^{-\delta(\beta)k}. \tag{3.33} \]
Remarks.
• The proof of Lemma 2 works only for dimension $d = 2$. The result is however more general. It can be proven with a little bit more effort. We are also able to treat the Ising model under a Kadanoff transformation in essentially the same way. We will return to this in a future publication. The proof of Proposition 2 does not depend essentially on specific features of the Ising model. The important thing is to be able to prove Lemma 2, but this is expected to hold for a much wider class of measures, see chapter 6 in [7]. A result that comes close to Lemma 2 is found in [8]. In fact, from [8] we know that for all temperatures $T < T_c$ the $(+,+)$ sites percolate in the $\mu^+\times\mu^+$ state.
• The same things can of course be done for the minus-phase $\nu^-$ (obtained from $\mu^-$), resulting in a good behavior of a potential $U'$ on $\Omega_- = \bigcup_l \{\xi \in \Omega \mid \forall k \ge l,\ \frac{1}{k}\sum_{x\in D_k}\xi(x) \le -3/4\}$. It is clear that $U \ne U'$, e.g. $U_{\{x\}} \ne U'_{\{x\}}$. This comes from the fact that in the +-phase one starts from all + as a reference, while in the −-phase one starts with all −. For any $\xi \in \Omega$ we have that $h^+_n(\xi) = h^-_n(-\xi)$. We do not know whether $h^+(\xi) = h^-(\xi)$ for $\xi \in \Omega_+\cup\Omega_-$ nor do we know anything on the continuity of these functions on $\Omega_+\cup\Omega_-$ (for more on the relation between Gibbsianness and continuity see [5–7]).
References
1. van den Berg, J.: A uniqueness condition for Gibbs measures with application to the two-dimensional Ising antiferromagnet. Commun. Math. Phys. 152, 161 (1993) and van den Berg, J., Maes, C.: Disagreement percolation in the study of Markov fields. Ann. Prob. 22, 749 (1994)
2. Bricmont, J., Kupiainen, A.: Infinite dimensional SRB-measures. Submitted to Physica D
3. Burton, R.M., Steif, J.E.: Quite weak Bernoulli with exponential rate and percolation for random fields. Stoch. Process. Appl. 58, 35 (1995)
4. Dobrushin, R.: A Gibbsian representation for non-Gibbsian fields. Talk presented in Renkum, September 1995
5. van Enter, A.C.D., Fernández, R., Sokal, A.D.: Regularity properties and pathologies of position-space renormalization transformations: Scope and limitations of Gibbsian theory. J. Stat. Phys. 72, 879 (1993)
6. Fernández, R., Pfister, C.-E.: Global specifications and non-quasilocality of projections of Gibbs measures. Preprint EPFL Lausanne (1996)
7. Georgii, H.-O.: Gibbs measures and phase transitions. Berlin, New York: de Gruyter 1988
8. Giacomin, G., Lebowitz, J.L., Maes, C.: Agreement percolation and phase coexistence in some Gibbs systems. J. Stat. Phys. 80, 1379 (1995)
9. Griffiths, R.B., Pearce, P.A.: Mathematical properties of renormalization-group transformations. J. Stat. Phys. 20, 499 (1979)
10. Haller, K., Kennedy, T.: Absence of renormalization group pathologies near the critical temperature. Two examples. Preprint 95–505, Texas Mathematical Physics Archive
11. Kadanoff, L.P., Houghton, A.: Numerical evaluations of the critical properties of the two-dimensional Ising model. Phys. Rev. B 11, 377 (1975)
12. Lebowitz, J.L., Maes, C.: The effect of an external field on an interface, entropic repulsion. J. Stat. Phys. 46, 39 (1987)
13. Lebowitz, J.L., Schonmann, R.: Pseudo-free energies and large deviations for non-Gibbsian FKG measures. Probab. Th. Rel. Fields 77, 49 (1988)
14. Schonmann, R.H.: Projections of Gibbs measures may be non-Gibbsian. Commun. Math. Phys. 124, 1 (1989)
15. Sullivan, W.G.: Potentials for almost Markovian random fields. Commun. Math. Phys. 33, 61 (1973)
Communicated by A. Kupiainen
Commun. Math. Phys. 189, 287 – 298 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997
Phase Diagram of Ising Systems with Additional Long Range Forces*
T. Bodineau (1), E. Presutti (2)
(1) DMI, École Normale Supérieure, 45 Rue d'Ulm, 75005 Paris, France
(2) Dipartimento di Matematica, Università di Roma Tor Vergata, Via della Ricerca Scientifica, 00133 Roma, Italy
Received: 9 April 1996 / Accepted: 26 November 1996
Dedicated to the memory of Roland Dobrushin
Abstract: We consider ferromagnetic Ising systems where the interaction is given by the sum of a fixed reference potential and a Kac potential of intensity λ ≥ 0 and scaling parameter γ > 0. In the Lebowitz Penrose limit γ → 0+ the phase diagram in the (T, λ) positive quadrant is described by a critical curve λmf (T ), which separates the regions with one and two phases, respectively below and above the curve. We prove that if λ > λmf (T ), i.e. above the curve, there are at least two Gibbs states for small values of γ. If instead λ < λmf (T ) and if the reference Gibbs state (i.e. without the Kac potential) satisfies a mixing condition at the temperature T , then, at the same temperature the full interaction (i.e. with also the Kac potential) satisfies the Dobrushin Shlosman uniqueness condition for small values of γ so that there is a unique Gibbs state.
1. Introduction
In the original van der Waals theory the occurrence of phase transitions is due to long range attractive forces between molecules. In its Statistical Mechanics formulation, [11], these forces are described by Kac potentials that depend on a scaling parameter $\gamma > 0$ which controls the strength and the range of the potential. In a large variety of systems the scaling limit $\gamma \to 0^+$ reproduces the van der Waals theory, see [11] and [12]. Here we consider the special case of Ising spins $\sigma_x$, $x \in \mathbb{Z}^d$, $d \ge 2$, $\sigma_x = \pm 1$, with Hamiltonian
\[ H(\sigma) := -\frac{1}{2}\sum_{x\ne y}\big[j(x,y) + \lambda J_\gamma(x,y)\big]\sigma_x\sigma_y - h\sum_x \sigma_x, \tag{1.1} \]
* The research has been partially supported by CNR (CNR-CNRS agreement), GNFM and by the grant CEE CHRX-CT93-0411.
where $h$ is an external magnetic field, $j(x,y) \ge 0$ is a “fixed”, translationally invariant, finite range potential, $\lambda \ge 0$ a strength parameter, $\gamma > 0$ the Kac scaling parameter and
\[ J_\gamma(x,y) := \gamma^d J(\gamma|x-y|) \tag{1.2} \]
the Kac potential, where $J(s)$, $s \ge 0$, is a smooth, non-negative function, supported by $s \in [0,1]$ and such that
\[ \int_{\mathbb{R}^d} dr\, J(|r|) = 1. \tag{1.3} \]
The system with $\lambda = 0$ is called by Lebowitz and Penrose the “reference system”. We denote by $P^{ref}(T,h)$ and $F^{ref}(T,m)$ the thermodynamic limits of the grand canonical pressure and, respectively, of the canonical free energy density in the reference system, $T$ being the absolute temperature and $m$ the magnetization density. $F^{ref}(T,m)$ is the Legendre transform of $P^{ref}(T,h)$,
\[ F^{ref}(T,m) = \sup_h\{hm - P^{ref}(T,h)\}. \]
Let $P_\gamma(T,h,\lambda)$ and $F_\gamma(T,m,\lambda)$ be the corresponding quantities relative to (1.1). We recall from [12] that
\[ \lim_{\gamma\to 0^+} F_\gamma(T,m,\lambda) = F_0(T,m,\lambda) = CE\Big(-\frac{\lambda}{2kT}m^2 + F^{ref}(T,m)\Big), \tag{1.4} \]
where, denoting by $f_{T,\lambda}(m)$ the argument of $CE(\cdot)$ in (1.4), $CE(f_{T,\lambda}(m))$ is the convex envelope of $f_{T,\lambda}(m)$, as a function of $m$ with $T$ and $\lambda$ fixed. $F_0(T,m,\lambda)$ has a phase transition when $T$, $\lambda$ and $m$ are such that $CE(f_{T,\lambda}(m)) \ne f_{T,\lambda}(m)$. In the positive quadrant of the plane $(T,\lambda)$ there is a phase transition above a curve $\lambda = \lambda_{mf}(T)$: namely for each $(T,\lambda)$ with $\lambda > \lambda_{mf}(T)$, there is an interval $[-m_{T,\lambda}, m_{T,\lambda}]$, $m_{T,\lambda} > 0$, where $F_0(T,m,\lambda)$ is different from $f_{T,\lambda}(m)$ and it is a flat function of $m$. There are therefore two pure phases with magnetization density $\pm m_{T,\lambda}$ and this phase transition, described by $F_0(T,m,\lambda)$, is of the van der Waals type. The subcritical region $\lambda \le \lambda_{mf}(T)$ is determined by either one of the following two equivalent conditions:
\[ \frac{\partial^2}{\partial m^2}F^{ref}(T,0) \ge \frac{\lambda}{kT}, \qquad \lambda\,\frac{\partial^2}{\partial h^2}P^{ref}(T,0) \le kT. \tag{1.5} \]
Since $P^{ref}(T,h)$ has a discontinuous derivative at $h = 0$ for $T < T_c^{ref}$, with $T_c^{ref}$ the critical temperature of the reference system alone (whose existence follows from ferromagnetic inequalities), $\lambda_{mf}(T) = 0$ for all $T \le T_c^{ref}$. For $T > T_c^{ref}$, $\lambda_{mf}(T) > 0$ is defined by the equation $\lambda P''(T,0) = kT$; hereafter we simply denote by $P(T,h)$ the pressure in the reference system, $P''(T,h)$ being its second derivative with respect to $h$. The purpose of this paper is to characterize the phase transition region for the system with Hamiltonian (1.1). The aim is to prove that the critical curve $\lambda_\gamma(T)$ converges to $\lambda_{mf}(T)$ as $\gamma \to 0^+$. This statement is proved in [2] for $j \equiv 0$. We have successfully extended the part of this result which refers to the existence of phase transitions:
Theorem 1.1 (Existence of phase transitions). There is a positive function γ(T, λ), T > 0, λ > λmf (T ), so that for any γ ≤ γ(T, λ) the Hamiltonian (1.1), with h = 0, has at least two distinct Gibbs states. Theorem 1.1 is proved in Sect. 2. The proof, similar to that in [2], is only sketched. It involves a Peierls estimate on contours defined in terms of suitable block spin variables. The converse statement, on the absence of phase transitions, is true for general Ising ferromagnets at h 6= 0, so that hereafter we restrict to h = 0. When j ≡ 0 uniqueness below λmf (T ) follows directly from the Dobrushin high temperature theorem, [4]. Applied to (1.1) it says in fact that there is only one Gibbs state if X j(x, y) + λJγ (x, y) < kT . (1.6) y
If $j \equiv 0$, (1.6) becomes $\sum_y \lambda J_\gamma(x,y) < kT$. By (1.3),
$$\lim_{\gamma\to 0^+} \sum_y J_\gamma(x,y) = \int_{R^d} dr\, J(|r|) = 1,$$
so that there is a unique Gibbs state for small $\gamma$ if $\lambda < kT$. On the other hand when $j \equiv 0$,
$$F^{ref}(T,m) := -kT\, i(m), \qquad i(m) := -\frac{1-m}{2}\log\frac{1-m}{2} - \frac{1+m}{2}\log\frac{1+m}{2}, \tag{1.7}$$
and the first condition in (1.5) becomes $\lambda < kT$. Thus "miraculously" the Dobrushin theorem gives the mean field critical curve.
The condition (1.6) defines (also for $j \neq 0$) a curve $\lambda^D_\gamma(T)$ below which there is uniqueness. However, in contrast to the case $j = 0$, $\lambda^D_\gamma(T) \to \lambda^D(T)$ as $\gamma \to 0^+$ and $\lambda^D(T)$ is strictly below the mean field critical value $\lambda_{mf}(T)$. It thus remains to examine what happens between these two curves. We will prove uniqueness for $\lambda < \lambda_{mf}(T)$ if the reference Gibbs measure with temperature $T$ satisfies the strong mixing condition below.
We are now considering $T > T_c^{ref}$, as the region below $\lambda_{mf}(T)$ is void in $\{T \le T_c^{ref}\}$. We denote by $\nu^{h,T}$ the (unique) Gibbs measure for the reference system at temperature $T$ and magnetic field $h$, and if $\Lambda$ is a finite set in $Z^d$, $\nu^{h,T}_\Lambda(\cdot|\tau)$ denotes its conditional probability in $\Lambda$ with boundary conditions $\tau$. If $\Delta$ is any set included in $\Lambda$ we denote by $\nu^{h,T}_{\Delta,\Lambda}(\cdot|\tau)$ the projection of the measure $\nu^{h,T}_\Lambda(\cdot|\tau)$ on the set $\{-1,1\}^\Delta$.
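For the case $j \equiv 0$ discussed above, both the normalization of the Kac potential and the mean field criterion can be checked numerically. The following is only a sketch under stated assumptions: it works in $d = 1$, it assumes the usual Kac scaling $J_\gamma(x,y) = \gamma J(\gamma|x-y|)$ (equation (1.3) is not reproduced in this section), and the kernel `J_example` as well as all function names are hypothetical choices made for illustration.

```python
import math


def J_example(r):
    # Illustrative kernel (an assumption, not taken from the paper):
    # supported on [0, 1], and its integral over the real line equals 1.
    return 0.75 * (1.0 - r * r) if r < 1.0 else 0.0


def kac_row_sum(gamma, J, cutoff=1.0):
    """Sum over y in Z of J_gamma(0, y), assuming J_gamma(0, y) = gamma * J(gamma * |y|)."""
    n = int(math.ceil(cutoff / gamma))
    return sum(gamma * J(gamma * abs(y)) for y in range(-n, n + 1))


def f_second_derivative_at_zero(kT, lam):
    # For j == 0: f(m) = -lam * m**2 / 2 - kT * i(m) with i as in (1.7).
    # Since i''(0) = -1, we get f''(0) = kT - lam.
    return kT - lam


def mean_field_magnetization(kT, lam, iterations=10_000):
    """Largest solution of m = tanh(lam * m / kT), by fixed-point iteration from m = 1."""
    m = 1.0
    for _ in range(iterations):
        m = math.tanh(lam * m / kT)
    return m


if __name__ == "__main__":
    for gamma in (0.1, 0.01):
        print(gamma, kac_row_sum(gamma, J_example))
    print(f_second_derivative_at_zero(1.0, 0.8), mean_field_magnetization(1.0, 0.8))
    print(f_second_derivative_at_zero(1.0, 1.5), mean_field_magnetization(1.0, 1.5))
```

In this sketch the row sum approaches 1 as $\gamma$ decreases, $f''(0) \ge 0$ exactly when $\lambda \le kT$, and a nonzero magnetization appears only for $\lambda > kT$, in agreement with the first condition in (1.5).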
Definition 1.2 (The strong mixing (SM) condition). The condition SM holds at temperature $T$ if there are $C_1$ and $C_2$ positive so that for any cube $\Lambda$, any subset $\Delta$ in $\Lambda$ and every site $x$ outside $\Lambda$,
$$\sup_{h\in R}\ \sup_{\tau}\ \mathrm{var}\big(\nu^{h,T}_{\Delta,\Lambda}(\cdot|\tau);\ \nu^{h,T}_{\Delta,\Lambda}(\cdot|\tau^x)\big) \le C_1 \exp\big(-C_2\,\mathrm{dist}(x,\Delta)\big), \tag{1.8}$$
where var denotes the variation distance and $\tau^x$ is obtained from $\tau$ by flipping the spin at $x$.
The condition SM holds when (1.6) is satisfied. SM is similar to the Dobrushin–Shlosman complete analyticity condition, but restricted to all the cubes $\Lambda$ in the lattice, see [6, 7]; for a general discussion on these issues see also [13–15] and [10]. We actually use in the proofs a weaker condition involving the two body correlation functions, as will become clear in Sect. 3.
Theorem 1.3 (Absence of phase transitions). There is a positive function $\gamma(T,\lambda)$ defined for all $T$ for which SM holds and for all $\lambda < \lambda_{mf}(T)$, so that if $\gamma \le \gamma(T,\lambda)$ then there is a unique Gibbs state for the Hamiltonian (1.1) and the Dobrushin–Shlosman uniqueness condition is satisfied.
As already remarked, the condition SM holds for all $T > T^D$ (the temperature at which (1.6) with $\lambda = 0$ becomes an equality), so that by the above theorem uniqueness extends in the half-plane $T > T^D$ up to the critical curve $\lambda_{mf}(T)$, beyond the region where the Dobrushin high temperature condition is verified for the full system (1.1): thus for the interaction (1.1) the Dobrushin condition (1.6) is not valid, but the weaker Dobrushin–Shlosman uniqueness condition still holds. There is a general belief (if we understand the literature correctly) that for such ferromagnetic systems as ours there is a good chance that SM holds all the way down to the critical temperature $T_c^{ref}$. This is indeed the case in the two dimensional Ising model with nearest neighbor interactions, as proved in [14]. Therefore in $d = 2$ with $j(x,y)$ the nearest neighbor interaction, for small $\gamma$ and in the sense of Theorems 1.1 and 1.3, the phase diagram for the Hamiltonian (1.1) exhibits essentially the same features as the diagram of $F_0(T,m,\lambda)$, i.e. it is of the van der Waals type.
A few final remarks: While it is evident from the proofs why the Dobrushin and the Dobrushin–Shlosman uniqueness conditions are just those needed to extend the analysis up to the critical curve, it is still surprising that these uniqueness conditions, introduced in apparently different contexts, coincide with the criticality condition for mean field. What we have done in this paper fits so well in the scheme conceived by Dobrushin in his theory of Gibbs states that it seems as if he already had all these applications in mind. This work is dedicated to his memory; we will always remember him as a great scientist and friend.
2. Proof of Theorem 1.1 (Phase Transition)
Let $T_c^{ref}$ be the critical temperature of the reference system. It is enough to prove that a phase transition occurs for the Hamiltonian (1.1) with $h = 0$ for each point $(T,\lambda)$ with $\lambda > \lambda_{mf}(T)$ and $T > T_c^{ref}$, because due to ferromagnetic inequalities this implies that there is also a phase transition for any smaller value of $T$ at the same $\lambda$. We thus hereafter restrict to $T > T_c^{ref}$. The existence of more than one Gibbs state is due to a breaking of the spin flip symmetry and it is proved by a Peierls estimate on contours. The role of the temperature is played by $\gamma$.
Block spins. The block spins are defined in terms of spin averages over cubes in $Z^d$ whose size depends on $\gamma$. To scale out such a dependence we first represent for any given $\gamma > 0$ the spin configurations as elements in $L^\infty(R^d; \{\pm 1\})$, by setting $s(r) = \sigma(x)$ if $\gamma^{-1} r$ is in the unit cube of $R^d$ with center in $x$. We then introduce two partitions of $R^d$,
$\{\Delta_i, i \in Z^d\}$ and $\{C_i, i \in Z^d\}$, into cubes of side $\ell$ and, respectively, $L$. We suppose the former finer than the latter and that $L \gg \ell$; $L$ and $\ell$ are fixed independently of $\gamma$. We also suppose, for simplicity, that each cube of side $\gamma$ where $s(r)$ is constant is entirely contained in some cube $\Delta_i$ (by suitably restricting the values of $\gamma$). Given an "accuracy parameter" $\zeta > 0$ we define for each spin configuration $s$ a block spin configuration $\eta \in L^\infty(R^d; [-1,1])$ by setting $\eta(r) = \pm 1$ for all $r \in C$ if for all $\Delta \subset C$
$$\Big| \frac{1}{|\Delta|}\int_\Delta dr'\, s(r') \mp m_{T,\lambda} \Big| < \zeta \tag{2.1}$$
($\pm m_{T,\lambda}$ being the magnetizations of the pure phases relative to the limit free energy $F_0(T,m,\lambda)$). We set $\eta(r) = 0$ in all the other cases. We will sometimes write $\eta_i$ for the value of $\eta(r)$ when $r \in C_i$ and, by using the previous rules, we will also introduce block spin configurations $\eta$ obtained from any $m \in L^\infty(R^d; [-1,1])$.
The correct set and the contours. Two cubes $C_i$ and $C_j$ (resp. $\Delta_i$ and $\Delta_j$) are $*$-connected if their closures have non-empty intersection. A block $C_i$ is "correct" if $\eta_i \neq 0$ and $\eta_i = \eta_j$ for all the $*$-neighbor blocks $C_j$. The contours are the maximal $*$-connected components of the complement of the "correct set." We denote by $\Gamma$ a contour, including the specification of the values $\eta_i$ on the cubes $C_i$ whose closure has non-empty intersection with the closure of the spatial support of $\Gamma$. When clear from the context, we may denote by $\Gamma$ only the spatial part of the contour.
The free energy of a contour. Given a contour $\Gamma$ we denote by $\bar\Gamma$ the set union of $\Gamma$ and all the cubes $\Delta$ in $\Gamma^c$ which have distance from $\Gamma$ $\le L/2 - 10$ (we are supposing $L$ large enough). We call $\delta\Gamma$ the union of all the cubes $\Delta$ in $\bar\Gamma^c$ whose distance from $\Gamma$ is $\le L/2 + 10$. Let $s_{\delta\Gamma}$ be a spin configuration on $\delta\Gamma$ compatible with $\Gamma$: namely (2.1) holds on all $\Delta$ in $\delta\Gamma$ with the same sign in (2.1) as that of $\eta$, which is specified by the contour $\Gamma$.
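The block spin rule (2.1) and the definition of correct blocks and contours can be sketched in one dimension. This is a minimal illustration under simplifying assumptions, not the construction used in the proof: spins live directly on lattice sites (no rescaling by $\gamma$), fine blocks contain `ell` sites and coarse blocks `L` sites, and in $d = 1$ two blocks are $*$-connected exactly when they are adjacent. The function names are hypothetical.

```python
def block_spins(s, ell, L, m_star, zeta):
    """Return eta (+1, -1 or 0) for each coarse block of L sites.

    s is a list of +1/-1 spins; ell must divide L and L must divide len(s).
    eta = +1 (resp. -1) if every fine block inside the coarse block has an
    average within zeta of +m_star (resp. -m_star); eta = 0 otherwise.
    """
    assert L % ell == 0 and len(s) % L == 0
    eta = []
    for start in range(0, len(s), L):
        averages = [sum(s[a:a + ell]) / ell for a in range(start, start + L, ell)]
        if all(abs(avg - m_star) < zeta for avg in averages):
            eta.append(1)
        elif all(abs(avg + m_star) < zeta for avg in averages):
            eta.append(-1)
        else:
            eta.append(0)
    return eta


def correct_blocks(eta):
    """Block i is correct if eta_i != 0 and eta_i equals eta on its neighbors."""
    n = len(eta)
    result = []
    for i, value in enumerate(eta):
        neighbors = [eta[k] for k in (i - 1, i + 1) if 0 <= k < n]
        result.append(value != 0 and all(v == value for v in neighbors))
    return result


def contours(eta):
    """Maximal runs of consecutive incorrect blocks, as (first, last) index pairs."""
    runs, start = [], None
    for i, ok in enumerate(correct_blocks(eta)):
        if not ok and start is None:
            start = i
        if ok and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(eta) - 1))
    return runs


if __name__ == "__main__":
    plus = [1, 1, 1, -1]      # fine block with average +0.5
    minus = [-1, -1, -1, 1]   # fine block with average -0.5
    s = plus * 2 + plus * 2 + plus + minus + minus * 2 + minus * 2
    eta = block_spins(s, ell=4, L=8, m_star=0.5, zeta=0.3)
    print(eta, correct_blocks(eta), contours(eta))
```

Note that a contour contains not only the blocks with $\eta = 0$ but also the blocks with $\eta \neq 0$ adjacent to a block with a different value, which is why the single interface in the example produces a contour of three blocks.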
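We then define $s^*_{\delta\Gamma}$ as the configuration on $\delta\Gamma$ such that $s^*_{\delta\Gamma}(r) = s_{\delta\Gamma}(r)$ if $\eta(r) = 1$, while $s^*_{\delta\Gamma}(r) = -s_{\delta\Gamma}(r)$ if $\eta(r) = -1$, $\eta(r)$ being the block spin value specified by the contour $\Gamma$. The (excess) free energy of a contour $\Gamma$ with boundary spins $s_{\delta\Gamma}$ is then defined as
$$F(\Gamma, s_{\delta\Gamma}) := -kT\Big\{ \log Z^{constr}_{\bar\Gamma}(s_{\delta\Gamma}) - \log Z_{\bar\Gamma}(s^*_{\delta\Gamma}) \Big\}, \tag{2.2}$$
where $Z_{\bar\Gamma}(s^*_{\delta\Gamma})$ is the partition function in the region $\bar\Gamma$ with boundary conditions $s^*_{\delta\Gamma}$ (recall that we are using spin variables on $R^d$ and that to go back to the familiar notation on the lattice we need to scale the region by a factor $\gamma^{-1}$ and then discretize, associating to each site in $Z^d$ the unit cube with that center). Denoting by $\beta = 1/kT$,
$$Z^{constr}_{\bar\Gamma}(s_{\delta\Gamma}) := \sum_{s_{\bar\Gamma}} 1_{\{s_{\bar\Gamma} \Rightarrow \Gamma\}}\, e^{-\beta H(s_{\bar\Gamma}|s_{\delta\Gamma})}, \tag{2.3}$$
where $\{s_{\bar\Gamma} \Rightarrow \Gamma\}$ denotes the set of all the spin configurations $s_{\bar\Gamma}$ that can be completed into a configuration $s$ equal to $s_{\delta\Gamma}$ on $\delta\Gamma$ and such that $\Gamma$ is a contour for $s$. $H(s_{\bar\Gamma}|s_{\delta\Gamma})$ is the energy of $s_{\bar\Gamma}$ in interaction with $s_{\delta\Gamma}$.
Theorem 2.1. Let $\lambda > \lambda_{mf}(T)$, $T > T_c^{ref}$. Then there are $\ell$, $L$, $\zeta$ and $c$ all positive so that for any finite contour $\Gamma$, any $s_{\delta\Gamma}$ and any $\gamma > 0$ small enough,
$$F(\Gamma, s_{\delta\Gamma}) \ge c\,\gamma^{-d} N_\Gamma, \tag{2.4}$$
where $N_\Gamma$ is the number of cubes $C$ in $\Gamma$.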
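Remark. When $\gamma$ is small the estimate (2.4) implies by the Peierls argument the existence of at least two Gibbs states; we refer for instance to [8, 16, 2] for a proof of such an implication. The rest of the section is devoted to the proof of Theorem 2.1.
Reduction to a variational problem. Let $\Lambda$ and $\Delta$ be measurable regions in $R^d$, $m_\Lambda$ and $m_\Delta$ measurable functions on $\Lambda$ resp. $\Delta$ with values in $[-1,1]$. We set
$$F_{\Lambda,m_\Delta}(m_\Lambda) := \int_\Lambda dr\, \omega(m_\Lambda(r)) + \frac14 \int_{\Lambda\times\Lambda} dr\, dr'\, J(|r-r'|)\,[m_\Lambda(r) - m_\Lambda(r')]^2 + \frac12 \int_{\Lambda\times\Delta} dr\, dr'\, J(|r-r'|)\,[m_\Lambda(r) - m_\Delta(r')]^2, \tag{2.5}$$
where for $s \in [-1,1]$,
$$\omega(s) := \Big[-\frac{\lambda s^2}{2} + F^{ref}(T,s)\Big] - \min_{s\in[-1,1]}\Big[-\frac{\lambda s^2}{2} + F^{ref}(T,s)\Big]. \tag{2.6}$$
By [9], $\omega(s)$ is a double well potential with minima at $\pm m_{T,\lambda}$, $m_{T,\lambda} > 0$.
Lemma 2.2. There is a function $0(\gamma)$ that vanishes as $\gamma \to 0^+$ so that the following holds. Let $\Gamma$ be a contour, $s$ a configuration that produces $\Gamma$ and $s_{\delta\Gamma}$ its restriction to $\delta\Gamma$. Then
$$\Big| F(\Gamma, s_{\delta\Gamma}) - \gamma^{-d}\Big\{ \inf_{m_{\bar\Gamma}\Rightarrow\Gamma} F_{\bar\Gamma,s_{\delta\Gamma}}(m_{\bar\Gamma}) - \inf_{m_{\bar\Gamma}} F_{\bar\Gamma,s^*_{\delta\Gamma}}(m_{\bar\Gamma}) \Big\} \Big| \le N_\Gamma\, 0(\gamma), \tag{2.7}$$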
see (2.3) and (2.4) for notation. The proof of Lemma 2.2 is based on the original proof by Lebowitz and Penrose, [12]: first we introduce a new partition into cubes of side $\gamma^{1-a}$, $a \in (0,1)$. Then we express the partition function in terms of averages of spins over the new cubes. An error comes when we take the Kac potential constant in each cube (we do that to have the energy expressed in terms of the average spins). Another error comes after we have fixed the values of these averages, when we express the canonical partition function of the reference system in each cube in terms of the limit free energy $F^{ref}(T,s)$. We omit the details and refer for the case $j \equiv 0$ to [1] and [2].
We are going to prove that there is a positive constant $c$ such that
$$\inf_{m_{\bar\Gamma}\Rightarrow\Gamma} F_{\bar\Gamma,s_{\delta\Gamma}}(m_{\bar\Gamma}) \ge cN_\Gamma + \inf_{m_{\bar\Gamma}\Rightarrow\Gamma} F_{\bar\Gamma/\Gamma,s_{\delta\Gamma}}(m_{\bar\Gamma}), \tag{2.8}$$
where $\bar\Gamma/\Gamma$ is the set theoretical difference of $\bar\Gamma$ and $\Gamma$; of course we are now referring to their spatial supports. Dropping the interactions between $\bar\Gamma/\Gamma$ and $\Gamma$, we get
$$\inf_{m_{\bar\Gamma}\Rightarrow\Gamma} F_{\bar\Gamma,s_{\delta\Gamma}}(m_{\bar\Gamma}) \ge \inf_{m_\Gamma\Rightarrow\Gamma} F_\Gamma(m_\Gamma) + \inf_{m_{\bar\Gamma}\Rightarrow\Gamma} F_{\bar\Gamma/\Gamma,s_{\delta\Gamma}}(m_{\bar\Gamma/\Gamma}),$$
where $F_\Gamma$ is defined as in (2.5) without the last term, i.e. with no interaction with the boundaries. Thus to derive (2.8) we need only prove that $\inf_{m_\Gamma\Rightarrow\Gamma} F_\Gamma(m_\Gamma) \ge cN_\Gamma$. Notice that the set $\{m_{\bar\Gamma} \in L^2(\bar\Gamma; [-1,1]) \mid m_{\bar\Gamma} \Rightarrow \bar\Gamma\}$ is a finite intersection of open and closed sets in the weak topology of $L^2(\bar\Gamma; [-1,1])$. For each cube $C$ we define a functional $\bar F_C$ on $L^2(\bar\Gamma; [-1,1])$,
$$\bar F_C(m_{\bar\Gamma}) := \int_C dr\, j_C(r)\, \omega(m_{\bar\Gamma}(r)) + \frac14 \int_{C\times\bar\Gamma} dr\, dr'\, J(|r-r'|)\,[m_{\bar\Gamma}(r) - m_{\bar\Gamma}(r')]^2, \tag{2.9}$$
the term $j_C(r) := \int_C dr'\, J(r-r')$ has been added to make the functional $\bar F_C$ lower semi-continuous in the weak topology of $L^2(C; [-1,1])$. We get
$$\inf_{m_{\bar\Gamma}\Rightarrow\bar\Gamma} F_{\bar\Gamma}(m_{\bar\Gamma}) \ge \inf_{m_{\bar\Gamma}\Rightarrow\bar\Gamma} \sum_{C\subset\bar\Gamma} \bar F_C(m_{\bar\Gamma}). \tag{2.10}$$
Let $\mathcal{C}$ be the set of functions $m(r)$, $r \in C \cup \partial C$, with values in $[-1,1]$, $\partial C$ denoting the union of the cubes $*$-connected to $C$. The set $\mathcal{C}$ is compact in the weak topology. We consider the subset $\mathcal{C}_1$ of $\mathcal{C}$ which contains only the configurations such that $\eta$ equals 0 on $C$; this set is also compact in the weak topology. By using the lower semi-continuity of $\bar F_C$ we deduce that
$$\inf_{m_{\bar\Gamma}\Rightarrow\mathcal{C}_1} \bar F_C(m_{\bar\Gamma}) \ge c_1,$$
where $c_1$ is a positive constant which does not depend on the location of $C$. Let $\mathcal{C}_2$ be the subset of $\mathcal{C}$ which contains the configurations such that $\eta$ equals 1 (resp. $-1$) on $C$ and $-1$ (resp. 1) on a cube $*$-connected to $C$. A similar argument implies that there is a positive constant $c_2$ such that
$$\inf_{m_{\bar\Gamma}\Rightarrow\mathcal{C}_2} \bar F_C(m_{\bar\Gamma}) \ge c_2.$$
Any contour $\Gamma$ is an intersection of shifted sets of type $\mathcal{C}_1$ or $\mathcal{C}_2$. Noticing that $\bar F_C(m_{\bar\Gamma})$ depends only on the values of $m_{\bar\Gamma}$ on the set $C \cup \partial C$, we get for $c = \min(c_1, c_2)$,
$$\inf_{m_{\bar\Gamma}\Rightarrow\bar\Gamma} \bar F_C(m_{\bar\Gamma}) \ge \inf\Big\{ \inf_{m_{\bar\Gamma}\Rightarrow\mathcal{C}_1} \bar F_C(m_{\bar\Gamma});\ \inf_{m_{\bar\Gamma}\Rightarrow\mathcal{C}_2} \bar F_C(m_{\bar\Gamma}) \Big\} \ge c,$$
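thus combining the previous estimates and (2.10) we derive (2.8). We will next prove that for any $\varepsilon > 0$ there is $L$ sufficiently large such that
$$\inf_{m_{\bar\Gamma}} F_{\bar\Gamma,s^*_{\delta\Gamma}}(m_{\bar\Gamma}) \le \varepsilon N_\Gamma + \inf_{m_{\bar\Gamma/\Gamma}} F_{\bar\Gamma/\Gamma,s^*_{\delta\Gamma}}(m_{\bar\Gamma/\Gamma}), \tag{2.11}$$
and this, together with (2.7) and (2.8), will prove Theorem 2.1, because, by symmetry, the last infimum in (2.11) is equal to the infimum when the conditioning is $s_{\delta\Gamma}$. To prove (2.11) we follow [2]. Given any $\varepsilon' > 0$ we will construct a configuration $m^1_{\bar\Gamma}$ equal to $m_{T,\lambda}$ on $\Gamma$ and such that its restriction $m^1_{\bar\Gamma/\Gamma}$ satisfies
$$\inf_{m_{\bar\Gamma/\Gamma}} F_{\bar\Gamma/\Gamma,s^*_{\delta\Gamma}}(m_{\bar\Gamma/\Gamma}) \ge F_{\bar\Gamma/\Gamma,s^*_{\delta\Gamma}}(m^1_{\bar\Gamma/\Gamma}) - \varepsilon'.$$
We will also show that
$$F_{\bar\Gamma,s^*_{\delta\Gamma}}(m^1_{\bar\Gamma}) \le \varepsilon N_\Gamma + F_{\bar\Gamma/\Gamma,s^*_{\delta\Gamma}}(m^1_{\bar\Gamma/\Gamma}). \tag{2.12}$$
This is accomplished by showing that for some constants $a$ and $b$ and all $r \in \bar\Gamma/\Gamma$,
$$|m^1_{\bar\Gamma}(r) - m_{T,\lambda}| \le a \exp\big(-b\,\mathrm{dist}(r, \delta\Gamma)\big). \tag{2.13}$$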
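If the side length $L$ is sufficiently large, (2.13) implies in fact that $m^1_{\bar\Gamma}(r)$ is exponentially close to $m_{T,\lambda}$ as $r \to \Gamma$. Then by (2.13) the cost of the interaction between $\Gamma$ and $\bar\Gamma/\Gamma$ is bounded by $\varepsilon N_\Gamma$.
The configuration $m^1_{\bar\Gamma}(r)$ is constructed using a dynamics for which $F_{\bar\Gamma/\Gamma,s^*_{\delta\Gamma}}$ is a Lyapunov function. Then, starting from a configuration $m^0_{\bar\Gamma/\Gamma}$ which satisfies
$$\inf_{m_{\bar\Gamma/\Gamma}} F_{\bar\Gamma/\Gamma,s^*_{\delta\Gamma}}(m_{\bar\Gamma/\Gamma}) \ge F_{\bar\Gamma/\Gamma,s^*_{\delta\Gamma}}(m^0_{\bar\Gamma/\Gamma}) - \varepsilon',$$
the dynamics will evolve it into another configuration, $m^1_{\bar\Gamma/\Gamma}$, which, as we will see, has the property (2.13), but also (because the evolution decreases the free energy functional)
$$F_{\bar\Gamma/\Gamma,s^*_{\delta\Gamma}}(m^1_{\bar\Gamma/\Gamma}) \le F_{\bar\Gamma/\Gamma,s^*_{\delta\Gamma}}(m^0_{\bar\Gamma/\Gamma}). \tag{2.14}$$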
The natural choice for the dynamics would be to take the limit of the spin flip Glauber dynamics, see [3] and [1]. This is what is done in [2], but here, where $j \neq 0$, the analysis of the continuum limit of the Glauber dynamics is an open, non-trivial problem that we will avoid by defining the evolution directly in the macroscopic regime. We define in fact the dynamics as the solution $m_t$ of the following Cauchy problem in $\bar\Gamma/\Gamma$. Setting
$$\bar J(r,r') = j(r)\,\delta(r-r') + J(r-r') \quad\text{and}\quad j(r) = \int_\Gamma dr'\, J(r-r'), \tag{2.15}$$
we define for $r \in \bar\Gamma/\Gamma$ and $t \ge 0$,
$$\frac{\partial}{\partial t} m_t(r) = -m_t(r) + P'\Big(\frac{\lambda}{kT}\, \bar J \star [m_t(r) + s^*_{\delta\Gamma}(r)]\Big), \tag{2.16}$$
where the $\star$ product denotes the action of the kernel $\bar J$ on what follows and $P(h)$ is the pressure in the reference system as a function of the magnetic field; we are omitting here the dependence on $T$, which is fixed (since $T > T_c^{ref}$, the critical temperature, the pressure is continuously differentiable). Our initial condition is $m_0 = m^0_{\bar\Gamma/\Gamma}$. By an explicit computation we find that
$$\frac{dF_{\bar\Gamma/\Gamma,s^*_{\delta\Gamma}}(m_t)}{dt} = -\int_{\bar\Gamma/\Gamma} dr\, \Big(-m_t(r) + P'\Big(\frac{\lambda}{kT}\,\bar J \star [m_t(r) + s^*_{\delta\Gamma}(r)]\Big)\Big)\Big(\frac{\lambda}{kT}\,\bar J \star [m_t(r) + s^*_{\delta\Gamma}(r)] - F'(m_t(r))\Big), \tag{2.17}$$
where $F(m)$ is the free energy density in the reference system as a function of $m$ (we are again omitting the dependence on $T$). As $P$ is the Legendre transform of $F$, the inverse of $F'$ is linked to $P'$ by $F'^{-1} = P'$. This implies that the rhs of (2.17) is non-positive: since $P'$ is increasing and $m_t(r) = P'(F'(m_t(r)))$, the two factors in the integrand have the same sign. Thus $F$ is a Lyapunov function for (2.16) and (2.14) holds with $m_t$ replacing $m^1_{\bar\Gamma/\Gamma}$. As in [2] we can find bounds on the solution of (2.16) by using subsolutions and supersolutions. In this way we prove that for $t$ large enough $m_t$ satisfies (2.13); we then choose $m^1_{\bar\Gamma/\Gamma}$ equal to $m_t$ and we thus derive the upper bound (2.11). We omit the details.
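The Lyapunov property of the evolution (2.16) can be illustrated numerically in a simplified setting. The sketch below is not the dynamics of the proof: it assumes $j \equiv 0$ (so that $P'(h) = \tanh h$ and $F'(m) = \mathrm{artanh}\, m$), a one dimensional periodic ring with no boundary term $s^*_{\delta\Gamma}$, a hypothetical triangular discrete kernel normalized to sum 1, and an explicit Euler time step. The functional monitored is the dimensionless analogue $\sum_r \phi(m_r) - \frac{\beta\lambda}{2}\sum_{r,r'} J(r-r')\, m_r m_{r'}$ with $\phi = -i$, $i$ as in (1.7).

```python
import math


def kernel_weights(radius):
    """Nonnegative symmetric weights on offsets -radius..radius, summing to 1."""
    raw = [radius + 1 - abs(k) for k in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def convolve(m, weights):
    """Periodic convolution (J * m)_i = sum_k J(k) m_{i+k}."""
    n, radius = len(m), len(weights) // 2
    return [sum(w * m[(i + k - radius) % n] for k, w in enumerate(weights))
            for i in range(n)]


def free_energy(m, weights, beta_lambda):
    """sum_r phi(m_r) - (beta_lambda / 2) * sum_{r,r'} J(r - r') m_r m_r'."""
    def phi(x):  # minus the entropy i(x) of (1.7)
        return (0.5 * (1 + x) * math.log(0.5 * (1 + x))
                + 0.5 * (1 - x) * math.log(0.5 * (1 - x)))
    Jm = convolve(m, weights)
    interaction = sum(x * y for x, y in zip(m, Jm))
    return sum(phi(x) for x in m) - 0.5 * beta_lambda * interaction


def relax(m0, weights, beta_lambda, dt=0.05, steps=2000):
    """Euler steps for dm/dt = -m + tanh(beta_lambda * J * m); returns (m, energies)."""
    m, energies = list(m0), []
    for _ in range(steps):
        energies.append(free_energy(m, weights, beta_lambda))
        Jm = convolve(m, weights)
        m = [x + dt * (-x + math.tanh(beta_lambda * y)) for x, y in zip(m, Jm)]
    energies.append(free_energy(m, weights, beta_lambda))
    return m, energies


if __name__ == "__main__":
    n = 24
    m0 = [0.1 + 0.05 * math.sin(2 * math.pi * i / n) for i in range(n)]
    m, energies = relax(m0, kernel_weights(3), beta_lambda=1.5)
    print(min(m), max(m), energies[0], energies[-1])
```

With $\beta\lambda = 1.5$ (above the mean field threshold) and a small positive initial profile, the functional decreases along the discrete trajectory and the profile relaxes to the positive solution of $m = \tanh(\beta\lambda m)$.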
3. Proof of Theorem 1.3 (Uniqueness)
In this section we suppose that the reference system satisfies the strong mixing property of Definition 1.2 and prove that the Hamiltonian (1.1) has a unique Gibbs state. We will denote by $\mu_{\gamma,\Lambda}(\cdot|\tau)$, $\Lambda$ a finite region in $Z^d$, the Gibbs conditional probability in $\Lambda$ with Hamiltonian (1.1) and boundary conditions $\tau$. $T > T_c^{ref}$ and $\lambda > 0$ are fixed and dropped from the notation. We will prove uniqueness by checking that the Hamiltonian (1.1) satisfies the Dobrushin and Shlosman criterion $C_\Lambda$, [5].
Definition 3.1 (The criterion $C_\Lambda$). The Hamiltonian (1.1) satisfies $C_\Lambda$, $\Lambda$ a finite subset of $Z^d$, if there exists a function $k_x \ge 0$, $x \in \Lambda^c$, for which the following holds: For all $x \in \Lambda^c$ and all configurations $\tau$, $\tau'$ which differ only in $x$
$$R_\rho\big(\mu_{\gamma,\Lambda}(\cdot|\tau),\ \mu_{\gamma,\Lambda}(\cdot|\tau')\big) \le k_x, \tag{3.1}$$
where $R_\rho$ is the Vasserstein distance, [4, 5], defined in terms of
$$\rho(\sigma_\Lambda, \sigma'_\Lambda) := \sum_{x\in\Lambda} |\sigma_x - \sigma'_x| \tag{3.2}$$
and
$$\sum_{x\in\Lambda^c} k_x < |\Lambda|. \tag{3.3}$$
By [5] if a Hamiltonian satisfies $C_\Lambda$ the corresponding Gibbs state is unique, hence we only need to prove that for all $\gamma$ small enough (1.1) satisfies $C_\Lambda$.
Step 1. Let $\Lambda$ be the cube centered in 0 with side length $[\gamma^{-\delta}]$, $\delta \in (0, 1/10)$. Let $\tau$ be a configuration in $\Lambda^c$; we set
$$h := \sum_{y\in\Lambda^c} J_\gamma(0,y)\,\tau(y). \tag{3.4}$$
For any $x$ in $\Lambda^c$, we denote by $\tau'$ the configuration obtained from $\tau$ by flipping the spin at $x$. We will prove that there is $c_\gamma(x)$, $x \in \Lambda^c$, so that $c_\gamma(x) = c^{(1)}_\gamma(x) + \cdots + c^{(5)}_\gamma(x)$, $c^{(i)}_\gamma(x) \ge 0$,
$$\lim_{\gamma\to 0^+} \sum_{x\in\Lambda^c} c^{(i)}_\gamma(x) = 0, \qquad i = 1, \ldots, 5, \tag{3.5}$$
and such that the following holds uniformly in $\tau$:
$$R_\rho\big(\mu_{\gamma,\Lambda}(\cdot|\tau),\ \mu_{\gamma,\Lambda}(\cdot|\tau')\big) \le \frac{|\Lambda|}{kT}\Big\{ \lambda J_\gamma(0,x)\,P''(h) + c_\gamma(x) \Big\}, \tag{3.6}$$
where $P(h)$ is the pressure of the reference system with external magnetic field $h$ at the given temperature. By GHS inequalities we check that $P''(h) \le P''(0)$ (see for instance [9]). Thus
$$R_\rho\big(\mu_{\gamma,\Lambda}(\cdot|\tau),\ \mu_{\gamma,\Lambda}(\cdot|\tau')\big) \le k_\gamma(x) := \frac{|\Lambda|}{kT}\Big\{ \lambda J_\gamma(0,x)\,P''(0) + c_\gamma(x) \Big\} \tag{3.7}$$
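and
$$\sum_{x\in\Lambda^c} k_\gamma(x) \le \frac{|\Lambda|}{kT}\Big\{ \lambda P''(0) + \sum_{x\in\Lambda^c} c_\gamma(x) + \lambda P''(0)\Big[\sum_{x\in\Lambda^c} J_\gamma(0,x) - 1\Big] \Big\}. \tag{3.8}$$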
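As was noticed in the introduction, see (1.5), the region below the critical curve $\lambda_{mf}(T)$ is defined by $\lambda P''(0) < kT$. By (1.3) and (3.3), the criterion $C_\Lambda$ is thus satisfied for all $\gamma$ small enough.
Step 2. Since the interaction is ferromagnetic the two measures $\mu_{\gamma,\Lambda}(\cdot|\tau)$ and $\mu_{\gamma,\Lambda}(\cdot|\tau')$ are ordered, then
$$R_\rho\big(\mu_{\gamma,\Lambda}(\cdot|\tau),\ \mu_{\gamma,\Lambda}(\cdot|\tau')\big) \le \sum_{y\in\Lambda} \big| \mu_{\gamma,\Lambda}(\sigma_y|\tau) - \mu_{\gamma,\Lambda}(\sigma_y|\tau') \big|. \tag{3.9}$$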
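By generalizing the notion of boundary conditions to configurations with continuous spins, for any $s \in [0,1]$ we denote by $\tau^s$ the configuration such that $\tau^s$ equals $\tau$ on $Z^d - \{x\}$ and $\tau^s_x = s\tau_x + (1-s)\tau'_x$. We also denote by
$$p(\sigma_x; \sigma_y) = p(\sigma_x\sigma_y) - p(\sigma_x)\,p(\sigma_y) \tag{3.10}$$
the truncated two body correlation functions of the probability measure $p$. Then
$$R_\rho\big(\mu_{\gamma,\Lambda}(\cdot|\tau),\ \mu_{\gamma,\Lambda}(\cdot|\tau')\big) \le \sum_{y\in\Lambda}\sum_{z\in\Lambda} \frac{1}{kT}\big[\lambda J_\gamma(x,z) + j(x,z)\big] \int_0^1 ds\ \mu_{\gamma,\Lambda}(\sigma_y; \sigma_z|\tau^s). \tag{3.11}$$
Step 3. We now want to get an estimate of $\sup_{0\le s\le 1} \mu_{\gamma,\Lambda}(\sigma_y; \sigma_z|\tau^s)$. We introduce the magnetic field
$$h(s,y) = \sum_{z\in\Lambda^c} \lambda J_\gamma(y,z)\,\tau^s_z,$$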
which is almost constant in $\Lambda$: there is a positive constant $c'$ such that
$$|h(s,y) - h| \le c'\gamma^{1-\delta} \tag{3.12}$$
with $h$ as in (3.4). We notice that $\mu_{\gamma,\Lambda}(\cdot|\tau^s)$ differs from the reference measure $\nu^h_\Lambda(\cdot|\tau)$ because of a local magnetic field of the order of $\gamma^{1-\delta}$ and because of the long range interaction energy
$$\sum_{y\in\Lambda}\sum_{z\in\Lambda} \lambda J_\gamma(y,z)\,\sigma_y\sigma_z,$$
which is bounded by a term of the order $\gamma^{d-2\delta d}$. This leads to
$$\sup_{0\le s\le 1} \big| \mu_{\gamma,\Lambda}(\sigma_y; \sigma_z|\tau^s) - \nu^h_\Lambda(\sigma_y; \sigma_z|\tau^s) \big| \le O(\gamma), \tag{3.13}$$
where the function $O(\cdot)$ is independent of $\tau$ and $h$ (recall that $\tau$ and $h$ are linked by (3.4)) and $O(\cdot)$ vanishes as $\gamma$ goes to 0. We thus set
$$c^{(1)}_\gamma(x) := \frac{1}{|\Lambda|} \sum_{y,z\in\Lambda} \big[\lambda J_\gamma(x,z) + j(x,z)\big]\, O(\gamma)$$
that, by the choice of $\delta$, satisfies (3.5), so that we can replace $\mu_{\gamma,\Lambda}(\sigma_y; \sigma_z|\tau^s)$ by $\nu^h_\Lambda(\sigma_y; \sigma_z|\tau^s)$.
Step 4. We now want to obtain an upper bound for the term in (3.11) that contains the factor $j(x,z)$. By applying the SM condition we have for some positive constant $C$ that for any $x \in \Lambda^c$,
$$\sup_{0\le s\le 1} \sum_{y\in\Lambda}\sum_{z\in\Lambda} j(x,z)\,\nu^h_\Lambda(\sigma_y; \sigma_z|\tau^s) \le C\, 1_{d(x,\Lambda)\le r_0}, \tag{3.14}$$
where $r_0$ denotes the range of the reference interaction. We then set
$$c^{(2)}_\gamma(x) := \frac{1}{|\Lambda|}\, C\, 1_{d(x,\Lambda)\le r_0}$$
that also satisfies (3.5) because $|\Lambda| \to \infty$ as $\gamma \to 0^+$. We next replace $J_\gamma(x,z)$ by $J_\gamma(x,0)$ by setting
$$c^{(3)}_\gamma(x) := \frac{2}{|\Lambda|} \sup_{z\in\Lambda} |J_\gamma(x,z) - J_\gamma(x,0)|$$
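(the truncated correlation functions are bounded by 2). Also $c^{(3)}_\gamma(x)$ satisfies (3.5).
It thus remains to relate $\frac{1}{|\Lambda|}\sum_{y\in\Lambda}\sum_{z\in\Lambda} \nu^h_\Lambda(\sigma_y; \sigma_z|\tau^s)$ to the second derivative of the pressure of the reference system $P''(h)$. The SM condition will again play a key role. Let $\Lambda'$ be a cube contained in $\Lambda$ such that the distance between $\Lambda'$ and $\Lambda^c$ is $[\gamma^{-\delta/2}]$. By SM
$$\sup_{0\le s\le 1} \Big| \sum_{y\in\Lambda'}\sum_{z\in\Lambda'} \big[ \nu^h_\Lambda(\sigma_y; \sigma_z|\tau^s) - \nu^h(\sigma_y; \sigma_z) \big] \Big| \le C_1 |\Lambda'|^2 \exp(-C_2\gamma^{-\delta/2}), \tag{3.15}$$
where $\nu^h$ is the reference Gibbs measure (unique because $T > T_c^{ref}$). Thus
$$c^{(4)}_\gamma(x) := \frac{J_\gamma(x,0)}{|\Lambda|}\, C_1 |\Lambda'|^2 \exp(-C_2\gamma^{-\delta/2})$$
satisfies (3.5). By SM there is $c > 0$ so that
$$\sum_{y\in\Lambda}\sum_{z\in\Lambda/\Lambda'} \nu^h_\Lambda(\sigma_y; \sigma_z|\tau^s) \le c\,|\Lambda/\Lambda'| \tag{3.16}$$
and
$$c^{(5)}_\gamma(x) := \frac{J_\gamma(x,0)}{|\Lambda|}\, c\,|\Lambda/\Lambda'|$$
satisfies (3.5). Since
$$\sum_{y\in\Lambda'}\sum_{z\in\Lambda'} \nu^h(\sigma_y; \sigma_z) \le \sum_{y\in\Lambda'}\sum_{z\in Z^d} \nu^h(\sigma_y; \sigma_z) = |\Lambda'|\,P''(h) \tag{3.17}$$
and $|\Lambda'| < |\Lambda|$, we have concluded the proof of (3.6). Theorem 1.3 is proved.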
Acknowledgement. We are indebted to Joel Lebowitz, Enzo Olivieri and Milos Zahradnik for many useful discussions. One of us (T.B.) acknowledges very kind hospitality at the Dipartimento di Matematica di Roma Tor Vergata.
References
1. Alberti, G., Bellettini, G., Cassandro, M., Presutti, E.: Surface tension in Ising system with Kac potentials. J. Stat. Phys. 82, 743–796 (1996)
2. Cassandro, M., Presutti, E.: Phase transitions in Ising systems with long but finite range. Markov Processes and Related Fields 2, 241–262 (1996)
3. DeMasi, A., Orlandi, E., Presutti, E., Triolo, L.: Glauber evolution with Kac potentials. I. Mesoscopic and macroscopic limits, interface dynamics. Nonlinearity 7, 1–67 (1994)
4. Dobrushin, R.L.: Prescribing a system of random variables by the help of conditional distributions. Theor. Prob. and Appl. 15, N 3, 469–497 (1970)
5. Dobrushin, R.L., Shlosman, S.B.: Constructive criterion for the uniqueness of Gibbs fields. Statistical Physics and Dynamical Systems, Basel-Boston: Birkhäuser, 1985
6. Dobrushin, R.L., Shlosman, S.B.: Completely analytical Gibbs fields. Statistical Physics and Dynamical Systems, Basel-Boston: Birkhäuser, 1985
7. Dobrushin, R.L., Shlosman, S.B.: Completely analytical interactions: Constructive description. J. Stat. Phys. 46, 983–1014 (1987)
8. Dobrushin, R.L., Shlosman, S.B.: The problem of translation invariance of Gibbs states at low temperatures. In: S.P. Novikov (ed), Soviet Scientific Reviews C, Math. Phys. 5, London: Harwood Ac. Publ, 1985, pp. 53–196
9. Ellis, R.: Entropy large deviations and statistical mechanics. Berlin-Heidelberg-New York: Springer-Verlag, 1985
10. Higuchi, Y.: Coexistence of infinite *-cluster. Ising Percolation in two dimensions. Prob. Theor. Rel. Fields 97, 1–34 (1993)
11. Kac, M., Uhlenbeck, G., Hemmer, P.C.: On the Van der Waals theory of vapor-liquid equilibrium: I. Discussion of a one dimensional model. J. Math. Phys. 4, 216–228 (1963); II. Discussion of the distribution functions. J. Math. Phys. 4, 229–247 (1963); III. Discussion of the critical region. J. Math. Phys. 5, 60–74 (1964)
12. Lebowitz, J., Penrose, O.: Rigorous treatment of the Van der Waals Maxwell theory of the liquid vapour transition. J. Math. Phys. 7, 98–113 (1966)
13. Martinelli, F., Olivieri, E.: Approach to equilibrium of Glauber dynamics in the one phase region. I. The attractive case. Commun. Math. Phys. 161, 447–486 (1994); II. The general case. Commun. Math. Phys. 161, 487–514 (1994)
14. Martinelli, F., Olivieri, E., Schonmann, R.: For 2-D lattice spin systems weak mixing implies strong mixing. Commun. Math. Phys. 165, 33–47 (1994)
15. Schonmann, R., Shlosman, S.: Complete analyticity for 2D Ising completed. Commun. Math. Phys. 170, 453–482 (1995)
16. Zahradnik, M.: A short course on the Pirogov–Sinai theory. Roma Tor Vergata, Feb. 1996
Communicated by J.L. Lebowitz
Commun. Math. Phys. 189, 299– 309 (1997)
Exponential Relaxation of Glauber Dynamics with Some Special Boundary Conditions
Roberto H. Schonmann (1), Nobuo Yoshida (2, 3)
(1) Mathematics Department, UCLA, Los Angeles, CA 90024, USA
(2) Division of Mathematics, Graduate School of Science, Kyoto University, Kyoto 606-01, Japan
(3) Mathematics Department, 2-101, MIT, Cambridge, MA 02139, USA
Received: 24 May 1996 / Accepted: 24 May 1996
This paper is dedicated to the memory of Professor Roland Dobrushin
Abstract: We consider attractive finite-range Glauber dynamics and show that if a certain mixing condition is satisfied, then the system evolving on arbitrary subsets of the lattice, with appropriate boundary conditions, converges to equilibrium exponentially fast, in the uniform sense, uniformly over the subsets of the lattice. This result applies, for instance, to the ferromagnetic nearest neighbor Ising model in the so-called "Basuev region," where complete analyticity is expected to fail. Technically the result in this paper is an extension of a result of Martinelli and Olivieri, who proved that under a weaker form of mixing the infinite system approaches equilibrium exponentially fast. Conceptually this paper may be seen as a step towards developing and exploiting a restricted notion of complete analyticity in which the boundary conditions, rather than the shapes of the regions under consideration, are being restricted.
1. Introduction
In [MO94a], Sect. 3, Martinelli and Olivieri solved an important and old problem, by showing that for attractive spin flip systems (subject to some natural and mild technical conditions) the following holds. Exponential decay of the influence from the boundary for the invariant measures of the system restricted to cubic boxes implies exponential convergence to equilibrium for the infinite system, i.e., exponential ergodicity. In the case of spin flip systems which are Glauber dynamics associated to some interaction, the invariant measure for the system restricted to such a box is a finite-volume Gibbs measure, and the hypothesis in the theorem quoted above corresponds to a sort of mixing condition for the associated Gibbs measures. In this sense the result says that good equilibrium mixing properties imply fast approach to equilibrium for the class of Glauber dynamics under consideration. (A precise statement of this result is postponed to Sect. 3, after the pertinent notation is introduced.)
The fact that the result recalled above refers only to the infinite lattice dynamics and not to the corresponding dynamics restricted to subsets of the lattice, with boundary conditions frozen outside, may seem surprising and disappointing at first sight. Nevertheless this feature of the result is not simply a technical weakness of the approach used, but a matter of reality. Even for the nearest neighbor ferromagnetic Ising model in dimension d = 3 it is believed that at low temperatures and under appropriate values of the external magnetic field, "boundary phase transitions" can occur and prevent the dynamics from approaching equilibrium exponentially fast uniformly over all cubic boxes, for carefully chosen boundary conditions (in which some spins are frozen as −1, and others as +1). Following the tradition, we will refer to the part of the phase diagram where this phenomenon is expected to occur as the "Basuev region." (This name derives from the fact that according to oral tradition, transmitted to one of us by S. Shlosman, Basuev first proposed the existence of such a phenomenon in a Moscow seminar.) A similar phenomenon is rigorously known to occur for some non-ferromagnetic systems known as Czech models (see [Shl86]), and for SOS approximations to the ferromagnetic Ising model (see [DM94, CM96a and CM96b]).
In this paper we address the following question: would it be possible to extend the result above of [MO94a], to include exponential approach to equilibrium for finite subsets of the lattice, uniformly in the size of the subset, provided that only "non-tricky" boundary conditions are considered?
Partially supported by the American N.S.F. under the grant DMS 9400644.
Indeed, it is very natural to ask, for instance, whether for the Ising model such a result holds when the boundary condition corresponds to having all spins +1, or all spins −1. We provide a result of this type, which is stated as Theorem 3.2. The mixing condition that is assumed to hold in the hypothesis of our theorem is stronger than the one in the Martinelli–Olivieri theorem, and involves the special type of boundary condition being considered. The good news is that in several cases of interest, one can check that this mixing condition does hold, and hence conclude that exponential convergence to equilibrium, in the sense discussed above, takes place. These successful cases are summarized in Theorem 4.1 and include the Basuev region. We conclude this introduction with some remarks. A major direction of recent research on the issues of mixing properties of Gibbs measures and fast convergence to equilibrium of the associated Glauber dynamics, has concerned the equivalence of the (very strong) type of mixing condition called Dobrushin–Shlosman complete analyticity (introduced in [DS85]) and exponential convergence uniformly over subsets of the lattice, with arbitrary boundary conditions. The major breakthrough in this direction was obtained in [SZ92]. Such results are very strong and impressive, but come with one drawback. In the way the Dobrushin–Shlosman complete analyticity condition was originally introduced, it referred to uniform properties over all the subsets of the lattice, with arbitrary boundary conditions. Even for the ferromagnetic Ising model in two dimensions, counterexamples to complete analyticity in this original sense can actually be obtained in parts of the phase diagram where the system is supposed to be very well behaved. This is done by taking sequences of subsets of the lattice which have boundaries of sizes comparable to their volume. 
It was realised in [MO94a] and [MO94b] that by restricting the class of subsets of the lattice to, say, only cubes (actually much larger classes can be considered) the
pathologic counterexamples would on one hand disappear, while on the other hand, the relations with rapid convergence to equilibrium would be preserved, uniformly now only over the allowed subsets of the lattice, but still uniformly over all boundary conditions. One way to look at the present paper is as a first attempt to restore the relations between good equilibrium mixing properties and fast convergence of the dynamics uniformly over subsets of the lattice, by restricting the set of allowed boundary conditions, rather than the set of allowed subsets of the lattice. Indeed, the mixing condition (3.3), which appears in the hypothesis of Theorem 3.2, is a natural analogue of the complete analyticity mixing condition (in the ferromagnetic case) restricted to the boundary condition $\omega$. The problem of obtaining stronger results than the one provided in Theorem 3.2 under such a mixing condition, e.g., a bound on the Logarithmic Sobolev constant, uniform on the subsets of the lattice, but restricted to the appropriate boundary condition, remains open. Such a result would imply, in particular, the finiteness of the logarithmic Sobolev constant for the unique Gibbs measure of the infinite ferromagnetic Ising model at low temperatures, for all non-null values of the external magnetic field, including hence the Basuev region. (Naturally, one can also contemplate the possible advantages of introducing a type of mixing in which both the class of lattice subsets and the boundary conditions are restricted.) It is natural to remind the reader at this point that in d = 2 the mixing condition in the hypothesis of Theorem 3.1 is actually known to imply the stronger type of mixing (complete analyticity restricted to squares), which, in particular, implies exponential convergence to equilibrium uniformly over all squares, with arbitrary boundary conditions (see [MOS94]).
Still, it seems to us that even in d = 2 the results in Theorem 3.2 and Theorem 4.1 are new, since they refer to arbitrary subsets of the lattice.
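2. Notation and Definitions

The lattice. We will work on the $d$-dimensional integer lattice $\mathbb{Z}^d = \{x = (x_i)_{i=1}^d : x_i \in \mathbb{Z}\}$, on which we consider the $l_1$-metric; $\|x\|_1 = \sum_{i=1}^d |x_i|$. The number of points contained in a set $\Lambda \subset \mathbb{Z}^d$ is denoted by $|\Lambda|$ and we write $\Lambda \Subset \mathbb{Z}^d$ when $1 \le |\Lambda| < \infty$. The diameter of $\Lambda$ will be denoted by $\operatorname{diam}(\Lambda)$. For $x \in \mathbb{Z}^d$ and $l = 1, 2, \ldots$, $Q_l(x)$ denotes the cube with the center $x$ and the radius $l$;
$$Q_l(x) = \Big\{ y \in \mathbb{Z}^d : \max_i |x_i - y_i| \le l \Big\}. \tag{2.1}$$

The configurations. The set of all spin configurations of $S = \{-1, 1\}$ on $\Lambda \subset \mathbb{Z}^d$ is denoted by $S^\Lambda$;
$$S^\Lambda = \{\sigma = (\sigma_x)_{x \in \Lambda} ;\ \sigma_x \in S\}, \quad \Lambda \subset \mathbb{Z}^d.$$
As usual, $S^\Lambda$ is endowed with the product topology inherited from the discrete topology on $S$. In $S^\Lambda$, the following partial order is introduced:
$$\sigma \le \sigma' \quad \text{if } \sigma_x \le \sigma'_x \text{ for all } x \in \Lambda.$$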
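Clearly, the maximal and the minimal elements in this partial order are $+$ and $-$, which are respectively the configurations with all spins $+1$ and $-1$. For $\Lambda \Subset \mathbb{Z}^d$ and $(\sigma, \omega) \in S^{\mathbb{Z}^d} \times S^{\mathbb{Z}^d}$, $\sigma_\Lambda \cdot \omega_{\Lambda^c}$ denotes the following configuration:
$$(\sigma_\Lambda \cdot \omega_{\Lambda^c})_x = \begin{cases} \sigma_x & \text{if } x \in \Lambda, \\ \omega_x & \text{if } x \notin \Lambda. \end{cases}$$
For $f : S^\Lambda \to \mathbb{R}$ and $x \in \Lambda$, we set $\nabla_x f(\sigma) = f(\sigma^x) - f(\sigma)$, where $\sigma^x = (\sigma^x_y)_{y \in \Lambda}$ is the configuration obtained from $\sigma$ by flipping the spin at $x$,
$$\sigma^x_y = \begin{cases} -\sigma_x & \text{if } y = x, \\ \sigma_y & \text{if } y \ne x. \end{cases}$$
For $f : S^\Lambda \to \mathbb{R}$, we introduce the notation
$$\Lambda_f = \{x \in \Lambda ;\ \nabla_x f \text{ is not identically zero}\}, \qquad \|f\| = \sup_{\sigma \in S^\Lambda} |f(\sigma)|,$$
and
$$|||f||| = \sum_{x \in \Lambda} \|\nabla_x f\|.$$
The function spaces $C$ and $C_\Lambda$ ($\Lambda \subset \mathbb{Z}^d$) are defined respectively by
$$C = \{f : S^{\mathbb{Z}^d} \to \mathbb{R} ;\ |\Lambda_f| < \infty\},$$
and
$$C_\Lambda = \{f : S^{\mathbb{Z}^d} \to \mathbb{R} ;\ \Lambda_f \subset \Lambda\}.$$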
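The notation above can be made concrete on a very small lattice. The following is a minimal sketch, not part of the original paper: it represents a configuration as a Python dictionary from sites to spins and computes the spin flip $\sigma^x$, the discrete gradient $\nabla_x f$, the set $\Lambda_f$ and the norm $|||f|||$ by brute-force enumeration. The four-site set `sites` and the test function `f` are illustrative assumptions.

```python
from itertools import product


def flip(sigma, x):
    """Return sigma^x: a copy of sigma with the spin at site x flipped."""
    tau = dict(sigma)
    tau[x] = -tau[x]
    return tau


def grad(f, x, sigma):
    """Discrete gradient (nabla_x f)(sigma) = f(sigma^x) - f(sigma)."""
    return f(flip(sigma, x)) - f(sigma)


def configurations(sites):
    """Enumerate all configurations in S^Lambda with S = {-1, +1}."""
    for values in product((-1, 1), repeat=len(sites)):
        yield dict(zip(sites, values))


def sup_norm(g, sites):
    """Sup norm ||g|| over all configurations on the given sites."""
    return max(abs(g(s)) for s in configurations(sites))


def support(f, sites):
    """Lambda_f: the sites x at which nabla_x f is not identically zero."""
    return {x for x in sites
            if any(grad(f, x, s) != 0 for s in configurations(sites))}


def triple_norm(f, sites):
    """|||f||| = sum over x of ||nabla_x f||."""
    return sum(sup_norm(lambda s, x=x: grad(f, x, s), sites) for x in sites)


# Hypothetical example: a 2 x 2 block and a two-point function.
sites = [(0, 0), (0, 1), (1, 0), (1, 1)]
f = lambda s: s[(0, 0)] * s[(0, 1)]
```

For this choice of `f`, flipping either of the two sites it depends on changes its sign, so each of them contributes $2$ to $|||f|||$ while the other two sites contribute nothing.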
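The interaction and finite volume Gibbs states. A family $\Phi = \{\Phi_X \in C_X : X \Subset \mathbb{Z}^d\}$ is called a bounded, finite range interaction if it satisfies the following:

($\Phi$-1) There exists $M_0 < \infty$ such that
$$\|\Phi\| := \sup_{x \in \mathbb{Z}^d} \sum_{X :\, X \ni x} \|\Phi_X\| \le M_0. \tag{2.2}$$

($\Phi$-2) There exists $r_0 < \infty$ such that
$$r(\Phi) := \sup\{\operatorname{diam}(X) ;\ \Phi_X \not\equiv 0\} \le r_0. \tag{2.3}$$

$\|\Phi\|$ in (2.2) and $r(\Phi)$ in (2.3) are called the norm and the range of the interaction, respectively. From here on, we fix a bounded, finite range interaction $\Phi$. For each $\Lambda \Subset \mathbb{Z}^d$, we define the Hamiltonian $H_\Lambda \in C$ by
$$-H_\Lambda = \sum_{X :\, X \cap \Lambda \ne \emptyset} \Phi_X. \tag{2.4}$$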
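For each $\Lambda \Subset \mathbb{Z}^d$ and $\omega \in S^{\mathbb{Z}^d}$, we define the finite volume Gibbs state $\mu_{\Lambda,\omega}$ as the probability measure on $S^\Lambda$, in which each configuration $\sigma_\Lambda \in S^\Lambda$ appears with probability
$$\mu_{\Lambda,\omega}(\{\sigma_\Lambda\}) = \frac{\exp\big(-H_\Lambda(\sigma_\Lambda \cdot \omega_{\Lambda^c})\big)}{Z_{\Lambda,\omega}}, \tag{2.5}$$
where $Z_{\Lambda,\omega}$ is the normalizing constant.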
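The infinite volume Gibbs state. A Borel probability measure $\mu$ is called an infinite volume Gibbs state with respect to the interaction $\Phi$, or simply a Gibbs state, if it solves the following Dobrushin–Lanford–Ruelle equation;
$$\mu(\mu_{\Lambda,\cdot} f) = \mu f, \quad \forall \Lambda \Subset \mathbb{Z}^d,\ \forall f \in C. \tag{2.6}$$
The set of Gibbs states with respect to $\Phi$ will be denoted by $\mathrm{Gibbs}(\Phi)$.

The stochastic dynamics. We introduce now for the model above, the time evolution called Glauber dynamics. We define for each $x \in \mathbb{Z}^d$ an operator $A_x : C \to C$ by
$$A_x f = c_x \nabla_x f,$$
where the flip rates $c_x : S^{\mathbb{Z}^d} \to (0, \infty)$ are functions on which we assume the following.

(R-1) Boundedness: There exist positive constants $\underline{c}(\Phi)$ and $\overline{c}(\Phi)$, which depend only on the norm $\|\Phi\|$ of the interaction, such that
$$\underline{c}(\Phi) \le c_x(\sigma) \le \overline{c}(\Phi), \quad \text{for all } (x, \sigma) \in \mathbb{Z}^d \times S^{\mathbb{Z}^d}.$$

(R-2) Finite range: There exists $r_1 \ge 0$ such that
$$\nabla_y c_x \equiv 0 \quad \text{if } |x - y| > r_1. \tag{2.7}$$

(R-3) Detailed Balance Condition:
$$\nabla_x \{c_x(\sigma) \exp(-H_{\{x\}}(\sigma))\} \equiv 0, \quad \text{for all } x \in \mathbb{Z}^d. \tag{2.8}$$

(R-4) Attractiveness: For each $x \in \mathbb{Z}^d$, $\sigma \mapsto c_x(\sigma)|_{\sigma_x = -1}$ is non-decreasing and $\sigma \mapsto c_x(\sigma)|_{\sigma_x = +1}$ is non-increasing.

We now define $A : C \to C$ by $Af = \sum_{x \in \mathbb{Z}^d} A_x f$. Then, by general arguments in [Lig, Chapter I], we know that $A$ can be closed on $(C(S^{\mathbb{Z}^d}), \|\cdot\|)$ and that the closure generates a contraction, strongly continuous semi-group $(\exp tA)_{t \ge 0}$, which we will refer to as Glauber dynamics. Further, we see from (2.8) that
$$-\mu(f A g) = \frac{1}{2} \sum_{x \in \mathbb{Z}^d} \mu(c_x \nabla_x f \nabla_x g), \quad \{f, g\} \subset C. \tag{2.9}$$
We also need to introduce the Glauber dynamics in a finite set $\Lambda \Subset \mathbb{Z}^d$ with the boundary condition $\omega \in S^{\mathbb{Z}^d}$. The semi-group will be generated by an operator
$$A_{\Lambda,\omega} f(\sigma) = \sum_{x \in \Lambda} A_x f(\sigma_\Lambda \cdot \omega_{\Lambda^c}), \quad f \in C.$$
We then have
$$-\mu_{\Lambda,\omega}(f A_{\Lambda,\omega} g) = \frac{1}{2} \sum_{x \in \Lambda} \mu_{\Lambda,\omega}(c_x \nabla_x f \nabla_x g), \quad \{f, g\} \subset C. \tag{2.10}$$
We set
$$T_t^{\Lambda,\omega} = \exp tA_{\Lambda,\omega}, \quad t > 0.$$
We have defined $A_{\Lambda,\omega}$ and $T_t^{\Lambda,\omega}$ as operators acting on $C$. But we can also regard them as operators from $C_\Lambda$ into itself.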
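Conditions (R-3) and (R-4) can be checked numerically for a concrete choice of rates. The sketch below is an illustration under stated assumptions, not a construction from the paper: it takes the nearest-neighbour Ising interaction of Section 4 on a periodic ring of `N` sites in $d = 1$ (the periodic ring, the parameter values and all function names are hypothetical choices), uses the heat-bath rates $c_x(\sigma) = 1/(1 + \exp(2\beta\sigma_x m_x(\sigma)))$ with local field $m_x(\sigma) = \sigma_{x-1} + \sigma_{x+1} + h$, and verifies detailed balance and attractiveness by enumeration.

```python
import math
from itertools import product

# Illustrative parameters (assumptions, not taken from the paper).
BETA, H, N = 0.7, 0.3, 5


def local_field(sigma, x):
    """m_x(sigma): sum of the two neighbouring spins on the ring plus h."""
    return sigma[(x - 1) % N] + sigma[(x + 1) % N] + H


def minus_H_x(sigma, x):
    """-H_{x}(sigma): all interaction terms that involve the site x."""
    return BETA * sigma[x] * local_field(sigma, x)


def rate(sigma, x):
    """Heat-bath flip rate c_x(sigma)."""
    return 1.0 / (1.0 + math.exp(2.0 * BETA * sigma[x] * local_field(sigma, x)))


def flip(sigma, x):
    """Configuration sigma^x (tuples are immutable, so build a new one)."""
    return sigma[:x] + (-sigma[x],) + sigma[x + 1:]


CONFIGS = list(product((-1, 1), repeat=N))


def detailed_balance_holds(tol=1e-12):
    """Check c_x(sigma) e^{-H_x(sigma)} = c_x(sigma^x) e^{-H_x(sigma^x)}."""
    for s in CONFIGS:
        for x in range(N):
            t = flip(s, x)
            lhs = rate(s, x) * math.exp(minus_H_x(s, x))
            rhs = rate(t, x) * math.exp(minus_H_x(t, x))
            if abs(lhs - rhs) > tol:
                return False
    return True


def is_attractive(tol=1e-12):
    """Check (R-4) over all ordered pairs s <= t that agree at x."""
    for s in CONFIGS:
        for t in CONFIGS:
            if any(a > b for a, b in zip(s, t)):
                continue  # keep only pairs with s <= t componentwise
            for x in range(N):
                if s[x] != t[x]:
                    continue
                if s[x] == -1 and rate(s, x) > rate(t, x) + tol:
                    return False
                if s[x] == +1 and rate(s, x) < rate(t, x) - tol:
                    return False
    return True
```

With these rates the product $c_x(\sigma)\exp(-H_{\{x\}}(\sigma))$ equals $1/(2\cosh(\beta m_x(\sigma)))$, which does not depend on $\sigma_x$; this is why the detailed balance check succeeds for any $\beta > 0$ and $h$.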
3. An Extension of a Result of Martinelli and Olivieri

First we recall the following result:

Theorem 3.1 ([MO94a]). Suppose that there exists $\{C_0, C_1\} \subset (0, \infty)$ such that
$$\sup_{x \in \mathbb{Z}^d} \{\mu_{Q_l(x),+}(\sigma_x) - \mu_{Q_l(x),-}(\sigma_x)\} \le C_0 \exp\Big(-\frac{l}{C_1}\Big) \tag{3.1}$$
for all $l = 1, 2, \ldots$ (cf. (2.1)). Then, the set $\mathrm{Gibbs}(\Phi)$ consists of a unique element $\mu$ and there exists $\{C_2, C_3\} \subset (0, \infty)$ which depends only on $d$, $M_0$, $r_0$, $r_1$, $C_0$, and $C_1$ (cf. (2.2), (2.3) and (2.7)) such that
$$\|(\exp tA)f - \mu f\| \le C_2 |||f||| \exp\Big(-\frac{t}{C_3}\Big) \tag{3.2}$$
for all $f \in C$ and all $t \ge 0$.

Remark. In [MO94a], the interaction is supposed to be shift-invariant, but their proof works also without shift-invariance.

Next we state the main result in this paper. This result says that a stronger conclusion follows under a stronger type of mixing condition than (3.1) above. In the statement of this stronger mixing condition, a certain boundary condition $\omega$ is singled out and in the conclusion of the theorem the same special boundary condition $\omega$ is the only one that can be considered. In typical applications of this theorem the constants $C_0$ and $C_1$, which appear in its statement, will not depend on $\Lambda$.

Theorem 3.2. Suppose that $\Lambda \Subset \mathbb{Z}^d$ and that $\{C_0, C_1\} \subset (0, \infty)$ are such that for a certain configuration $\omega$
$$\sup_{x \in \Lambda} \{\mu_{\Lambda \cap Q_l(x),\, (+)_\Lambda \cdot \omega_{\Lambda^c}}(\sigma_x) - \mu_{\Lambda \cap Q_l(x),\, (-)_\Lambda \cdot \omega_{\Lambda^c}}(\sigma_x)\} \le C_0 \exp\Big(-\frac{l}{C_1}\Big) \tag{3.3}$$
for all $l = 1, 2, \ldots$ Then there exists $\{C_2, C_3\} \subset (0, \infty)$ which depends only on $d$, $M_0$, $r_0$, $r_1$, $C_0$, and $C_1$ above such that
$$\|T_t^{\Lambda,\omega} f - \mu_{\Lambda,\omega} f\| \le C_2 |||f||| \exp\Big(-\frac{t}{C_3}\Big) \tag{3.4}$$
for all $f \in C_\Lambda$ and all $t \ge 0$.

Proof. We define a new bounded, finite range interaction $\hat\Phi$ by setting
$$\hat\Phi_F(\sigma) = \sum_{X :\, X \cap \Lambda = F} \Phi_X(\sigma_\Lambda \cdot \omega_{\Lambda^c}).$$
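Clearly $r(\hat\Phi) \le r(\Phi)$ and $\|\hat\Phi\|$ is dominated by a constant which depends on $M_0$ and $r_0$, but not on the choice of $\Lambda$. We will denote the corresponding Hamiltonian and the finite volume Gibbs states respectively by $\hat H_Y$ and $\hat\mu_{Y,\eta}$ ($Y \Subset \mathbb{Z}^d$). Then, $\hat H_Y(\sigma_Y \cdot \eta_{Y^c})$ can be seen to be equal to $H_{\Lambda \cap Y}(\sigma_{\Lambda \cap Y} \cdot \eta_{\Lambda \setminus Y} \cdot \omega_{\Lambda^c})$. In fact,
$$-\hat H_Y(\sigma_Y \cdot \eta_{Y^c}) = \sum_{F :\, F \cap Y \ne \emptyset}\ \sum_{X :\, X \cap \Lambda = F} \Phi_X(\sigma_{\Lambda \cap Y} \cdot \eta_{\Lambda \setminus Y} \cdot \omega_{\Lambda^c}) = \sum_{X :\, X \cap \Lambda \cap Y \ne \emptyset} \Phi_X(\sigma_{\Lambda \cap Y} \cdot \eta_{\Lambda \setminus Y} \cdot \omega_{\Lambda^c}).$$
From this, we see that
$$\hat\mu_{Y,\eta} = \mu_{\Lambda \cap Y,\, \eta_{\Lambda \setminus Y} \cdot \omega_{\Lambda^c}} \otimes \nu_{Y \setminus \Lambda}, \tag{3.5}$$
which implies that
$$\mathrm{Gibbs}(\hat\Phi) = \{\mu_{\Lambda,\omega} \otimes \nu_{\Lambda^c}\},$$
where $\nu_X$ denotes the $1/2$-Bernoulli measure on $S^X$ for $X \subset \mathbb{Z}^d$. Also, using (3.3) and (3.5), we can verify the condition (3.1) for the finite volume Gibbs states $\{\hat\mu_{Y,\eta}\}$ with the same constants $C_0$ and $C_1$ that appear in (3.3). On the other hand, we define a new generator $\hat A$ of an attractive Glauber dynamics corresponding to the interaction $\hat\Phi$ by
$$\hat A = A_{\Lambda,\omega} + \sum_{x \notin \Lambda} \nabla_x. \tag{3.6}$$
Then, it is clear that
$$(\exp t\hat A)f = T_t^{\Lambda,\omega} f, \quad \text{for } f \in C_\Lambda. \tag{3.7}$$
We now apply Theorem 3.1 to $\hat A$, to conclude
$$\|(\exp t\hat A)f - (\mu_{\Lambda,\omega} \otimes \nu_{\Lambda^c})f\| \le C_2 |||f||| \exp\Big(-\frac{t}{C_3}\Big) \tag{3.8}$$
for all $f \in C$ and for all $t \ge 0$. In view of (3.7), (3.4) follows from (3.8).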
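To get a feeling for mixing conditions of the type (3.1) and (3.3), it helps to evaluate the left-hand side in the simplest possible setting. The sketch below is an illustration only and makes several assumptions that are not in the paper at this point: it takes $d = 1$, the nearest-neighbour Ising interaction with inverse temperature `beta` and $h = 0$ (the interaction treated in the next section), and the cube $Q_l(0) = \{-l, \ldots, l\}$ with boundary spins fixed at the two sites $\pm(l+1)$. The function names are hypothetical. The difference of the magnetizations at the centre under $+$ and $-$ boundary conditions is computed by brute-force enumeration.

```python
import math
from itertools import product


def center_magnetization(beta, l, boundary):
    """Expected value of sigma_0 in the cube {-l, ..., l} for the d = 1
    nearest-neighbour Ising model with h = 0 and both boundary spins
    fixed to the value `boundary` (+1 or -1)."""
    numerator = 0.0
    partition = 0.0
    for values in product((-1, 1), repeat=2 * l + 1):
        chain = (boundary,) + values + (boundary,)
        # -H = beta * sum over nearest-neighbour bonds of the chain
        weight = math.exp(beta * sum(a * b for a, b in zip(chain, chain[1:])))
        numerator += values[l] * weight  # values[l] is the spin at site 0
        partition += weight
    return numerator / partition


def mixing_gap(beta, l):
    """Left-hand side of (3.1) at x = 0 in this one-dimensional example."""
    return (center_magnetization(beta, l, +1)
            - center_magnetization(beta, l, -1))
```

In this one-dimensional example a standard transfer-matrix computation gives the closed form $4u/(1+u^2)$ with $u = (\tanh\beta)^{l+1}$, so the gap is bounded by $4(\tanh\beta)^{l+1}$ and (3.1) holds with, for instance, $C_0 = 4\tanh\beta$ and $C_1 = 1/|\log\tanh\beta|$. The enumeration can be compared against this formula for small $l$.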
4. Application to the Ising Ferromagnet

In this section, we turn our attention to the most classical example of the interaction $\Phi$;
$$\Phi_X(\sigma) = \begin{cases} \beta \sigma_x \sigma_y & \text{if } X = \{x, y\} \text{ and } \|x - y\|_1 = 1, \\ \beta h \sigma_x & \text{if } X = \{x\}, \end{cases} \tag{4.1}$$
where $\beta > 0$ and $h \in \mathbb{R}$ are fixed parameters. By $\beta_c$ we denote, as usual, the critical inverse temperature. The finite volume Gibbs state $\mu_{\Lambda,\omega}$ is defined by (2.5), with a slight extension. Namely, we will take the boundary condition $\omega$ from $\{-1, 0, +1\}^{\mathbb{Z}^d}$, instead of $S^{\mathbb{Z}^d}$. The major reason for this extension is to include an important special case $\omega_x \equiv 0$, i.e., the free boundary condition. One can easily check that Theorem 3.2 applies also to these extended boundary conditions.

Theorem 4.1. Suppose that one of the following holds.

(a) $\beta < \beta_c$, $\omega \in \{-1, 0, +1\}^{\mathbb{Z}^d}$ and
$$\inf_{x \in \Lambda} \Big\{ h + \sum_{y \notin \Lambda :\, \|x - y\|_1 = 1} \omega_y \Big\} \ge 0. \tag{4.2}$$
(b) $d = 2$, $\beta \ge \beta_c$, $h > 0$, and $\omega = +$.

(c) $\beta$ is sufficiently large, $h > 0$, and $\omega = +$.

Then there exist $\{C_0, C_1\} \subset (0, \infty)$ which are independent of the choice of $\Lambda$ such that (3.3) holds for any $l \ge 1$, and hence there exists $\{C_2, C_3\} \subset (0, \infty)$ which depends only on $d$, $M_0$, $r_0$, $r_1$, $C_0$, and $C_1$ above such that (3.4) holds for all $f \in C_\Lambda$ and $t \ge 0$. In case (a), it is possible to choose $\{C_0, C_1\}$, and hence $\{C_2, C_3\}$, uniformly in $h$ and $\omega$ within the class defined by (4.2). If $\beta > \beta_c$ in cases (b) and (c), it is possible to choose $C_1$ uniformly in $h > 0$.

Proof. We will prove that (3.3) holds in each one of the cases. To handle case (a), the following estimate is available ([Aiz81]); if $\beta < \beta_c$, there exists $C = C(\beta, d) > 0$ such that
$$\mu_{\beta,0}(\sigma_x \sigma_y) \le C \exp\Big(-\frac{\|x - y\|_1}{C}\Big), \quad \text{for all } \{x, y\} \subset \mathbb{Z}^d, \tag{4.3}$$
where $\mu_{\beta,0}$ denotes the infinite volume Gibbs state with $h = 0$. The proof of (3.3) will be built on an argument used in the proof of [Hig93, Theorem 2], which makes use of several inequalities and of (4.3). (Part (ii) of Theorem 2 in [Hig93] is actually the particular case of (3.3), in which $\omega = +$.) For an inhomogeneous magnetic field $\{h_z \in \mathbb{R} ;\ z \in \Lambda \cap Q_l(x)\}$, we define a probability measure $\mu^{\{h_z\}}$ on $S^{\Lambda \cap Q_l(x)}$ by
$$\mu^{\{h_z\}}(\{\sigma\}) = \frac{\exp\big(-H^{\{h_z\}}(\sigma)\big)}{\text{normalization}},$$
where
$$-H^{\{h_z\}}(\sigma) = \beta \sum_{\langle v, w \rangle} \sigma_v \sigma_w + \beta \sum_{z \in \Lambda \cap Q_l(x)} h_z \sigma_z,$$
with $\sum_{\langle v, w \rangle}$ denoting the summation over all nearest neighbour pairs in the set $\Lambda \cap Q_l(x)$. We will need the following fact; if inhomogeneous magnetic fields $\{h_z^\pm\}$ and $\{k_z^\pm\}$ are given such that for each $z \in \Lambda \cap Q_l(x)$,
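$$0 \le h_z^+ - h_z^- \le k_z^+ - k_z^- \tag{4.4}$$
and
$$0 \le k_z^+ + k_z^- \le h_z^+ + h_z^-, \tag{4.5}$$
then
$$\mu^{\{h_z^+\}}(\sigma_x) - \mu^{\{h_z^-\}}(\sigma_x) \le \mu^{\{k_z^+\}}(\sigma_x) - \mu^{\{k_z^-\}}(\sigma_x). \tag{4.6}$$
The proof of the above fact, which is based on Lebowitz' inequality for duplicated spins ([Leb74]), can be found in [Hig93]. At this point, we make the following particular choice of $\{h_z^\pm\}$ and $\{k_z^\pm\}$:
$$h_z^\pm = h + \sum_{\substack{y \notin \Lambda \cap Q_l(x) \\ \|y - z\|_1 = 1}} ((\pm)_\Lambda \cdot \omega_{\Lambda^c})_y$$
and
$$k_z^\pm = \sum_{\substack{y \notin \Lambda \cap Q_l(x) \\ \|y - z\|_1 = 1}} ((\pm)_\Lambda \cdot 0_{\Lambda^c})_y.$$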
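In this case (4.4) is obvious and (4.5) follows from (4.2). Since $\mu^{\{h_z^\pm\}} = \mu_{\Lambda \cap Q_l(x),\, (\pm)_\Lambda \cdot \omega_{\Lambda^c}}$ and $\mu^{\{k_z^\pm\}} = \mu_{\Lambda \cap Q_l(x),\, (\pm)_\Lambda \cdot 0_{\Lambda^c}}|_{h=0}$, (4.6) implies in this case that
$$\mu_{\Lambda \cap Q_l(x),\, (+)_\Lambda \cdot \omega_{\Lambda^c}}(\sigma_x) - \mu_{\Lambda \cap Q_l(x),\, (-)_\Lambda \cdot \omega_{\Lambda^c}}(\sigma_x) \le \mu_{\Lambda \cap Q_l(x),\, (+)_\Lambda \cdot 0_{\Lambda^c}}(\sigma_x)|_{h=0} - \mu_{\Lambda \cap Q_l(x),\, (-)_\Lambda \cdot 0_{\Lambda^c}}(\sigma_x)|_{h=0} = \int_{-1}^{+1} \frac{d\,\mu^{\{\lambda k_z^+\}}(\sigma_x)}{d\lambda}\, d\lambda. \tag{4.7}$$
On the other hand, we have
$$\frac{d\,\mu^{\{\lambda k_z^+\}}(\sigma_x)}{d\lambda} = \beta \sum_{z \in \Lambda \cap Q_l(x)} k_z^+\, \mu^{\{\lambda k_z^+\}}(\sigma_x ; \sigma_z) \le 2d\beta \sum_{z \in \partial_{\mathrm{int}} Q_l(x)} \mu^{\{\lambda k_z^+\}}(\sigma_x ; \sigma_z) \le 2d\beta \sum_{z \in \partial_{\mathrm{int}} Q_l(x)} \mu_{\beta,0}(\sigma_x \sigma_z) \le 4d^2 l^{d-1} \beta C \exp\Big(-\frac{l}{C}\Big). \tag{4.8}$$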
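Here, the second step comes from an observation that $k_z^+ = 0$ if $z \notin \partial_{\mathrm{int}} Q_l(x)$, while we have used the GHS and GKS inequalities in the third step and (4.3) in the last step. Plugging (4.8) in (4.7) we obtain (3.3). The constants $\{C_0, C_1\}$ obtained from the above proof are obviously uniform in $h$ and $\omega$ under consideration.

To prove that (3.3) also holds in cases (b) and (c) of the theorem, we will use some percolation notions. As usual we say that a chain is a sequence of distinct sites $x_1, \ldots, x_n$, with the property that for $i = 1, \ldots, n - 1$, $\|x_i - x_{i+1}\|_1 = 1$. The sites $x_1$ and $x_n$ are called the end-points of the chain $x_1, \ldots, x_n$. A chain is said to connect two sets if it has one end-point in each set. Let $B$ be the event that there is a chain of $-$ spins connecting $x$ to $\Lambda \cap (Q_l(x))^c$. The statement that we want to prove is reduced to an upper estimate on $\mu_{\Lambda \cap Q_l(x),\, (-)_\Lambda \cdot \omega_{\Lambda^c}}(B)$, via an argument that we present next. We partition $B^c$ according to the set of sites in $Q_l(x)$ which are connected to $\Lambda \cap (Q_l(x))^c$ by chains of $-$ spins. We will use the notation $\{F_\alpha\}$ to denote this partition. Using a self-explanatory notation for conditional expectations, from the Markov property and the FKG inequalities, we obtain for each $\alpha$,
$$\mu_{\Lambda \cap Q_l(x),\, (-)_\Lambda \cdot \omega_{\Lambda^c}}(\sigma_x \mid F_\alpha) \ge \mu_{\Lambda \cap Q_l(x),\, (+)_\Lambda \cdot \omega_{\Lambda^c}}(\sigma_x).$$
Therefore
$$\mu_{\Lambda \cap Q_l(x),\, (-)_\Lambda \cdot \omega_{\Lambda^c}}(\sigma_x) = \sum_\alpha \mu_{\Lambda \cap Q_l(x),\, (-)_\Lambda \cdot \omega_{\Lambda^c}}(\sigma_x \mid F_\alpha)\, \mu_{\Lambda \cap Q_l(x),\, (-)_\Lambda \cdot \omega_{\Lambda^c}}(F_\alpha) + \mu_{\Lambda \cap Q_l(x),\, (-)_\Lambda \cdot \omega_{\Lambda^c}}(\sigma_x \mid B)\, \mu_{\Lambda \cap Q_l(x),\, (-)_\Lambda \cdot \omega_{\Lambda^c}}(B)$$
$$\ge \mu_{\Lambda \cap Q_l(x),\, (+)_\Lambda \cdot \omega_{\Lambda^c}}(\sigma_x)\, \mu_{\Lambda \cap Q_l(x),\, (-)_\Lambda \cdot \omega_{\Lambda^c}}(B^c) - \mu_{\Lambda \cap Q_l(x),\, (-)_\Lambda \cdot \omega_{\Lambda^c}}(B).$$
Hence
$$\mu_{\Lambda \cap Q_l(x),\, (+)_\Lambda \cdot \omega_{\Lambda^c}}(\sigma_x) - \mu_{\Lambda \cap Q_l(x),\, (-)_\Lambda \cdot \omega_{\Lambda^c}}(\sigma_x) \le \mu_{\Lambda \cap Q_l(x),\, (-)_\Lambda \cdot \omega_{\Lambda^c}}(B) + \mu_{\Lambda \cap Q_l(x),\, (+)_\Lambda \cdot \omega_{\Lambda^c}}(\sigma_x)\, \mu_{\Lambda \cap Q_l(x),\, (-)_\Lambda \cdot \omega_{\Lambda^c}}(B) \le 2 \mu_{\Lambda \cap Q_l(x),\, (-)_\Lambda \cdot \omega_{\Lambda^c}}(B). \tag{4.9}$$
But since $\omega = +$, we have from the FKG inequalities,
$$\mu_{\Lambda \cap Q_l(x),\, (-)_\Lambda \cdot \omega_{\Lambda^c}}(B) \le \mu_{Q_l(x),-}(B). \tag{4.10}$$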
The inequality (3.3) now follows from (4.9) and (4.10), since it is known that under the conditions in parts (b) or (c) of the theorem that we are proving, the probability in the right-hand side of (4.10) decays exponentially with $l$. For the two-dimensional case this result can be found in [SS95] (combine inequalities (4.24) and (4.27) in that paper). For higher dimensions it is enough to observe that the renormalization argument used in [SS95] to prove this result in two dimensions works in the same way in higher dimensions, if one uses as input Theorem 1 in [Mar87], in the place of part (b) of Corollary 1 in [SS95]. (For an alternative proof of Martirosyan's result the reader can consult [Sch94]; see Theorem 3 there.) If $\beta > \beta_c$, the above argument yields a uniform choice of $C_1$ in $h > 0$.

Remark. Part (a) of Theorem 4.1 can be generalized. As the proof we presented suggests, the nearest neighbor interaction can be replaced by a more general translation-invariant finite-range ferromagnetic two-point interaction. Part (a) of Theorem 4.1 includes the case in which $h = 0$ and the boundary condition is free. The result that the system approaches equilibrium exponentially fast in this case for all $\beta < \beta_c$, uniformly in the size of the system, can be contrasted with the result that for $\beta > \beta_c$ the rate of exponential convergence to equilibrium for the system in a square box with free boundary conditions vanishes as an exponential of the side-length of the box, with the rate of this exponential decay being given by the surface tension (see [Mar94] and [CGMS]). Regarding parts (b) and (c) of Theorem 4.1, we expect the same statements also to be true with free boundary conditions, but unfortunately the proof given above does not work in this case.

Acknowledgement. N.Y. thanks Y. Higuchi for discussions. The authors also thank two anonymous referees for their comments.
References

[Aiz81] Aizenman, M.: Rigorous studies of critical behavior II. In: Fritz, J., Jaffe, A., Szász, D. (eds.) Statistical Physics and Dynamical Systems. Proceedings, Köszeg 1984, Boston, Basel, Stuttgart: Birkhäuser, 1985, pp. 453–481
[CGMS] Cesi, F., Guadagni, G., Martinelli, F., Schonmann, R.H.: On the 2D dynamical Ising model in the phase coexistence region near the critical point. J. Stat. Phys. (to appear)
[CM96a] Cesi, F., Martinelli, F.: On the layering transition of an SOS surface interacting with a wall. I. Equilibrium results. J. Stat. Phys. 82, 823–913 (1996)
[CM96b] Cesi, F., Martinelli, F.: On the layering transition of an SOS surface interacting with a wall. II. The Glauber dynamics. Commun. Math. Phys. 177, 173–201 (1996)
[DM94] Dinaburg, E.I., Mazel, A.E.: Layering transition in SOS model with external magnetic field. J. Stat. Phys. 74, 533–563 (1994)
[DS85] Dobrushin, R., Shlosman, S.: Completely analytical Gibbs fields. In: Fritz, J., Jaffe, A., Szász, D. (eds.) Statistical Physics and Dynamical Systems. Boston, Basel, Stuttgart: Birkhäuser, 1985, pp. 371–403
[Hig93] Higuchi, Y.: Coexistence of infinite (∗)-clusters II: Ising percolation in two dimension. Prob. Th. Rel. Fields 97, 1–33 (1993)
[HY95] Higuchi, Y., Yoshida, N.: Slow relaxation of stochastic Ising models with random and non-random boundary conditions. To appear in Proceeding of Taniguchi Symposium (1995)
[Leb74] Lebowitz, J.L.: GHS and other inequalities. Commun. Math. Phys. 35, 87–92 (1974)
[Lig85] Liggett, T.M.: Interacting Particle Systems. Berlin-Heidelberg-Tokyo: Springer Verlag, 1985
[Mar94] Martinelli, F.: On the two dimensional dynamical Ising model in the phase coexistence region. J. Stat. Phys. 76, 1179–1246 (1994)
[MO94a] Martinelli, F., Olivieri, E.: Approach to equilibrium of Glauber dynamics in the one phase region I. The attractive case. Commun. Math. Phys. 161, 447–486 (1994)
[MO94b] Martinelli, F., Olivieri, E.: Approach to equilibrium of Glauber dynamics in the one phase region II. The general case. Commun. Math. Phys. 161, 487–514 (1994)
[MOS94] Martinelli, F., Olivieri, E., Schonmann, R.H.: For 2-D lattice spin systems weak mixing implies strong mixing. Commun. Math. Phys. 165, 33–47 (1994)
[Mar87] Martirosyan, D.G.: Theorems on strips in the classical Ising ferromagnetic model. Sov. J. Contemp. Math. 22, 59–83 (1987)
[SS95] Schonmann, R.H., Shlosman, S.B.: Complete analyticity for 2D Ising completed. Commun. Math. Phys. 170, 453–482 (1995)
[Sch94] Schonmann, R.H.: Slow droplet-driven relaxation of stochastic Ising models in the vicinity of the phase coexistence region. Commun. Math. Phys. 161, 1–49 (1994)
[Shl86] Shlosman, S.B.: Uniqueness and half space non-uniqueness of Gibbs states in Czech models. Theor. Math. Phys. 66, 284–293 (1986)
[SZ92] Stroock, D.W., Zegarlinski, B.: The logarithmic Sobolev inequality for discrete spin systems on a lattice. Commun. Math. Phys. 149, 175–193 (1992)

Communicated by J.L. Lebowitz
Commun. Math. Phys. 189, 311 – 321 (1997)
Communications in
Mathematical Physics © Springer-Verlag 1997
On the Uniqueness of Gibbs States in the Pirogov-Sinai Theory*

J.L. Lebowitz¹, A.E. Mazel²

¹ Department of Mathematics and Physics, Rutgers University, New Brunswick, NJ 08903, USA
² International Institute of Earthquake Prediction Theory and Mathematical Geophysics, Russian Academy of Sciences, Moscow 113556, Russia

* Work supported by NSF grant DMR 92-13424

Received: 4 June 1996 / Accepted: 30 October 1996

Dedicated to the memory of Roland Dobrushin

Abstract: We prove that, for low-temperature systems considered in the Pirogov-Sinai theory, uniqueness in the class of translation-periodic Gibbs states implies global uniqueness, i.e. the absence of any non-periodic Gibbs state. The approach to this infinite volume state is exponentially fast.

1. Introduction

The problem of uniqueness of Gibbs states was one of R.L. Dobrushin's favorite subjects in which he obtained many classical results. In particular when two or more translation-periodic states coexist, it is natural to ask whether there might also exist other, non translation-periodic, Gibbs states, which approach asymptotically, in different spatial directions, the translation-periodic ones. The affirmative answer to this question was given by R.L. Dobrushin with his famous construction of such states for the Ising model, using ± boundary conditions, in three and higher dimensions [D]. Here we consider the opposite situation: we will prove that in the regions of the low-temperature phase diagram where there is a unique translation-periodic Gibbs state one actually has global uniqueness of the limit Gibbs state. Moreover we show that, uniformly in boundary conditions, the finite volume probability of any local event tends to its infinite volume limit value exponentially fast in the diameter of the domain.

The first results concerning this problem in the framework of the Pirogov-Sinai theory [PS] were obtained by R.L. Dobrushin and E.A. Pecherski in [DP]. The Pirogov-Sinai theory describes the low-temperature phase diagram of a wide class of spin lattice models, i.e. it determines all their translation-periodic limit Gibbs states [PS, Z]. The results of [DP], corrected and extended in [Sh], imply that, for any values of parameters at which the model has a unique ground state, the Gibbs state is unique for sufficiently small temperatures. But the closer these parameters are to the points with non unique ground state the smaller the temperature for which uniqueness of the Gibbs state is given by this method. Independently, an alternative method leading to similar results was developed in [M1,2].

The main difficulty in establishing the results of this type is due to the necessity of having sufficiently detailed knowledge of the partition function in a finite domain with an arbitrary boundary condition. This usually requires a detailed analysis of the geometry of the so called boundary layer produced by such a boundary condition (see [M1,2, Sh]). Here we develop a new simplified approach to the problem. The simplification is achieved by transforming questions concerning the finite volume Gibbs measure with arbitrary boundary conditions into questions concerning the distribution with a stable (in the sense of [Z]) boundary condition. The latter can be easily investigated by means of the polymer expansion constructed for it in the Pirogov-Sinai theory. This also allows the extension of the uniqueness results from systems with a unique ground state to the case with several ground states but unique stable ground state (see [Z]).

Since the publication of the paper [PS] about twenty years ago the Pirogov-Sinai theory has been extended in different directions. For a good exposition of the initial theory we refer the reader to [Si] and [Sl]. Some of the generalizations can be found in [BKL, BS, DS, DZ, HKZ] and [P]. Below we present our results in the standard settings of [PS] and [Z]: a finite spin space with a translation-periodic finite potential of finite range, a finite degeneracy of the ground state and a stability of the ground states expressed via the so called Peierls or Gertzik-Pirogov-Sinai condition. The extension to other cases is straightforward. Our method also works for unbounded spins, see [LM].
2. Models and Results

The models are defined on some lattice, which for the sake of simplicity we take to be the $d$-dimensional ($d \ge 2$) cubic lattice $\mathbb{Z}^d$. The spin variable $\sigma_x$ associated with the lattice site $x$ takes values from the finite set $S = \{1, 2, \ldots, |S|\}$. The energy of the configuration $\sigma \in S^{\mathbb{Z}^d}$ is given by the formal Hamiltonian
$$H_0(\sigma) = \sum_{A \subset \mathbb{Z}^d,\ \operatorname{diam} A \le r} U_0(\sigma_A). \tag{1}$$
Here $\sigma_A \in S^A$ is a configuration in $A \subset \mathbb{Z}^d$ and the potential $U_0(\sigma_A) : S^A \to \mathbb{R}$ satisfies $U_0(\sigma_A) = U_0(\sigma_{A+y})$ for any $y$ belonging to some subgroup of $\mathbb{Z}^d$ of finite index, and the sum is extended over subsets $A$ of $\mathbb{Z}^d$ with a diameter not exceeding $r$. Accordingly for a finite domain $V \subset \mathbb{Z}^d$, with the boundary condition $\bar\sigma_{V^c}$ given on its complement $V^c = \mathbb{Z}^d \setminus V$, the conditional Hamiltonian is
$$H_0(\sigma_V \mid \bar\sigma_{V^c}) = \sum_{A \cap V \ne \emptyset,\ \operatorname{diam} A \le r} U_0(\sigma_A), \tag{2}$$
where $\sigma_A = \sigma_{A \cap V} + \bar\sigma_{A \cap V^c}$ for $A \cap V^c \ne \emptyset$, i.e. the spin at site $x$ is equal to $\sigma_x$ for $x \in A \cap V$ and $\bar\sigma_x$ for $x \in A \cap V^c$.

A ground state of (1) is a configuration $\sigma$ in $\mathbb{Z}^d$ whose energy cannot be lowered by changing $\sigma$ in some local region. We assume that (1) has a finite number of translation-periodic (i.e. invariant under the action of some subgroup of $\mathbb{Z}^d$ of finite index) ground states. By a standard trick of partitioning the lattice into disjoint cubes $Q(y)$ centered at $y \in q\mathbb{Z}^d$ with an appropriate $q$ and enlarging the spin space from $S$ to $S^Q$ one can transform the model above into a model on $q\mathbb{Z}^d$ with a translation-invariant potential and only translation-invariant or non-periodic ground states. Hence, without loss of generality, we assume translation-invariance instead of translation-periodicity and we permute the spin so that the ground states of the model will be $\sigma^{(1)}, \ldots, \sigma^{(m)}$ with $\sigma^{(k)}_x = k$ for any $x \in \mathbb{Z}^d$. Taking $q > r$ one obtains a model with nearest neighbor and next nearest neighbor (diagonal) interaction, i.e. the potential is not vanishing only on lattice cubes $Q_1$ of linear size 1, containing $2^d$ sites.

Given a configuration $\sigma$ in $\mathbb{Z}^d$ we say that site $x$ is in the $k$th phase if this configuration coincides with $\sigma^{(k)}$ inside the lattice cube $Q_2(x)$ of linear size 2 centered at $x$. Every connected component of sites not in one of the phases is called a contour of the configuration $\sigma$. It is clear that for $\sigma = \sigma_V + \sigma^{(k)}_{V^c}$ contours are connected subsets of $V$ which we denote by $\tilde\gamma_1(\sigma), \ldots, \tilde\gamma_l(\sigma)$. The important observation is that the excess energy of a configuration $\sigma$ with respect to the energy of the ground state $\sigma^{(k)}$ is concentrated along the contours of $\sigma$. More precisely,
$$H_0(\sigma_V \mid \sigma^{(k)}_{V^c}) - H_0(\sigma^{(k)}_V \mid \sigma^{(k)}_{V^c}) = \sum_i H_0(\tilde\gamma_i(\sigma)), \tag{3}$$
where
$$H_0(\tilde\gamma_i(\sigma)) = \sum_{Q_1 :\, Q_1 \subseteq \tilde\gamma_i(\sigma)} \Big( U_0(\sigma_{Q_1}) - U_0(\sigma^{(k)}_{Q_1}) \Big) \tag{4}$$
and the sum is taken over the unit lattice cubes $Q_1$, containing $2^d$ sites. The Peierls condition is
$$H_0(\tilde\gamma_i(\sigma)) \ge \tau |\tilde\gamma_i(\sigma)|, \tag{5}$$
where $\tau > 0$ is an absolute constant and $|\tilde\gamma_i(\sigma)|$ denotes the number of sites in $\tilde\gamma_i(\sigma)$.

Consider now a family of Hamiltonians $H_n(\sigma) = \sum_{A \subset \mathbb{Z}^d,\ \operatorname{diam} A \le r} U_n(\sigma_A)$, $n = 1, \ldots, m - 1$, satisfying the same conditions as $H_0$ with the same or smaller set of translation-periodic ground states. For $\lambda = (\lambda_1, \ldots, \lambda_{m-1})$ belonging to a neighborhood of the origin in $\mathbb{R}^{m-1}$ define a perturbed formal Hamiltonian
$$H = H_0 + \sum_{n=1}^{m-1} \lambda_n H_n. \tag{6}$$
Here $\lambda_n H_n$ play the role of generalized magnetic fields removing the degeneracy of the ground state. The finite volume Gibbs distribution is
$$\mu_{V, \bar\sigma_{V^c}}(\sigma_V) = \frac{\exp\big[-\beta H(\sigma_V \mid \bar\sigma_{V^c})\big]}{\Xi(V \mid \bar\sigma_{V^c})}, \tag{7}$$
where $\beta > 0$ is the inverse temperature and $\mu_{V, \bar\sigma_{V^c}}(\sigma_V)$ is the probability of the event that the configuration in $V$ is $\sigma_V$, given $\bar\sigma_{V^c}$. Here the conditional Hamiltonian is
$$H(\sigma_V \mid \bar\sigma_{V^c}) = \sum_{Q_1 :\, Q_1 \cap V \ne \emptyset} U(\sigma_{Q_1}), \qquad U(\cdot) = \sum_{n=0}^{m-1} U_n(\cdot), \tag{8}$$
and the partition function is
$$\Xi(V \mid \bar\sigma_{V^c}) = \sum_{\sigma_V} \exp\big[-\beta H(\sigma_V \mid \bar\sigma_{V^c})\big]. \tag{9}$$
The notion of a stable ground state was introduced in [Z] (see also the next section) and it is crucial for the Pirogov-Sinai theory because of the following theorem.

Theorem [PS, Z]. Consider a Hamiltonian $H$ of the form (6) satisfying all the conditions above. Then for $\beta$ large enough, $\beta \ge \beta_0(\lambda)$, every stable ground state $\sigma^{(k)}$ generates a translation-invariant Gibbs state
$$\mu^{(k)}(\cdot) = \lim_{V \to \mathbb{Z}^d} \mu_{V, \sigma^{(k)}_{V^c}}(\cdot). \tag{10}$$
These Gibbs states are different for different $k$ and they are the only translation-periodic Gibbs states of the system.

An obvious corollary of the above theorem is

Corollary. If there is only a single stable ground state, say $\sigma^{(1)}$, then for $\beta \ge \beta_0(\lambda)$ there is a unique translation-periodic Gibbs state
$$\mu^{(1)}(\cdot) = \lim_{V \to \mathbb{Z}^d} \mu_{V, \sigma^{(1)}_{V^c}}(\cdot). \tag{11}$$
Our extension of this result is given by

Theorem 1. Under conditions of the Corollary the Gibbs state $\mu^{(1)}(\cdot)$ is unique. Denoting this unique state by $\mu(\cdot)$, for finite $A \subset V$ such that $\operatorname{dist}(A, V^c) \ge 2d \operatorname{diam} A + C_1(\tau, \beta, \lambda, d)$, any configuration $\sigma_A$ and any boundary condition $\bar\sigma_{V^c}$, one has
$$\big| \mu_{V, \bar\sigma_{V^c}}(\sigma_A) - \mu(\sigma_A) \big| \le \exp\big[-C_2(\tau, \beta, \lambda, d) \operatorname{dist}(A, V^c)\big], \tag{12}$$
where $C_1, C_2 > 0$.

Remark 1. In contrast with [DP, Sh] and [M1,2] the theorem above treats the situation when there are several ground states with only one of them being stable. Moreover, the result is true for all sufficiently low temperatures not depending on how close the parameters are to the points with non unique ground state.

Remark 2. If some of the conditions of the Theorem are violated the statement can be wrong. The simplest counterexample can be constructed from the Ising model in $d = 3$. It is well-known that at low temperatures this model contains precisely two translation-invariant Gibbs states taken into each other by ± symmetry and infinitely many non translation-invariant Gibbs states, i.e. the Dobrushin states mentioned earlier. Identifying configurations taken into each other by ± symmetry one obtains a model with a unique translation-invariant Gibbs state and infinitely many non translation-invariant ones. The condition of the Theorem which is not true for this factorized model is the finiteness of the potential: the model contains a hard-core constraint. Indeed, consider a dual lattice formed by the centers of the unit cubes of the initial lattice. Then define on the dual lattice the model with the spin taking 128 values represented by the unit cubes of the initial lattice with the following properties:
(i) The bonds of the cube are labeled by $+1$ or $-1$.
(ii) The product of labels along any of the 6 plaquettes is 1.
(iii) The energy of the cube is $-1/2$ times the sum of the labels of the bonds.
With the hard-core condition saying that for any two neighboring cubes the labels of common bonds are the same in each cube one obtains precisely the factorized Ising model above. Another counterexample based on a gauge model can be found in [B].

Remark 3. We conjecture that for Hamiltonians $H$ satisfying our conditions the statement of our theorem is true at all temperatures. This is known to be the case for systems satisfying FKG inequalities [LML]. Furthermore the only examples of non-translation-invariant states for such $H$ which we are aware of are those of the Dobrushin interface type [D], [HKZ] which in $d \ge 3$ "connect" different pure phases.

3. Preliminaries

In this section we assume that the reader is familiar with the Pirogov-Sinai theory and we only list the appropriate notations and quote some necessary results.

A contour is a pair $\gamma = (\tilde\gamma, \sigma_{\tilde\gamma})$ consisting of the support $\tilde\gamma$ and the configuration $\sigma_{\tilde\gamma}$ in it. The components of the interior of the contour $\gamma$ are denoted by $\mathrm{Int}_j \gamma$ and the exterior of $\gamma$ is denoted by $\mathrm{Ext}\,\gamma$. The family $\{\tilde\gamma, \mathrm{Int}_j \gamma, \mathrm{Ext}\,\gamma\}$ is a partition of $\mathbb{Z}^d$. The configuration $\sigma_{\tilde\gamma}$ can be uniquely extended to the configuration $\sigma'$ in $\mathbb{Z}^d$ taking constant values $I_j(\gamma)$ and $E(\gamma)$ on the connected components of $\tilde\gamma^c$. Generally these values are different for different components. The contour $\gamma$ is said to be from the phase $k$ if $E(\gamma) = k$. The energy of the contour $\gamma$ is
$$H(\tilde\gamma(\sigma)) = \sum_{Q_1 :\, Q_1 \subseteq \tilde\gamma(\sigma)} \Big( U(\sigma_{Q_1}) - U(\sigma^{(E(\gamma))}_{Q_1}) \Big). \tag{13}$$
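The statistical weight of $\gamma$ is
$$w(\gamma) = \exp(-\beta H(\gamma)) \tag{14}$$
and satisfies
$$0 \le w(\gamma) \le e^{-\beta \tau |\tilde\gamma|}. \tag{15}$$
The renormalized statistical weight of the contour is
$$W(\gamma) = w(\gamma) \prod_j \frac{\Xi(\mathrm{Int}^*_j \gamma \mid I_j(\gamma))}{\Xi(\mathrm{Int}^*_j \gamma \mid E(\gamma))}, \tag{16}$$
where for any $A \subset \mathbb{Z}^d$ we denote
$$A^* = \{x \in A \mid x \text{ is not adjacent to } A^c\}. \tag{17}$$
The contour $\gamma$ is stable if
$$W(\gamma) \le \exp\Big(-\frac{1}{3} \beta \tau |\tilde\gamma|\Big) \tag{18}$$
and the ground state $\sigma^{(k)}$ is stable if all contours $\gamma$ with $E(\gamma) = k$ are stable. It is known (see [Z]) that at least one of the ground states is stable. Because of (18) for any $x \in \mathbb{Z}^d$, $N \ge 1$ and $\beta$ large enough
$$\sum_{\gamma :\, (\mathrm{Ext}\,\gamma)^c \ni x,\ |\tilde\gamma| \ge N} W(\gamma) \le e^{-C_3 N}, \tag{19}$$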
where $C_3 = C_3(\tau, \beta, d)$ is positive and monotone increasing in $\tau$ and $\beta$. In particular
$$\sum_{\gamma :\, (\mathrm{Ext}\,\gamma)^c \ni x} W(\gamma) \le C_4, \tag{20}$$
where $C_4 = e^{-C_3}$. For the stable ground state $\sigma^{(k)}$ the corresponding partition function can be represented as
$$\Xi(V \mid k) = e^{-\beta H(\sigma^{(k)}_V \mid \sigma^{(k)}_{V^c})} \sum_{[\gamma_i]^s \in V,\ E([\gamma_i]^s) = k}\ \prod_i W(\gamma_i), \tag{21}$$
where the sum is taken over all collections of contours $[\gamma_i]$ such that $\tilde\gamma_i$ are disjoint, $E(\gamma_i) = k$ for all $i$ and $\tilde\gamma_i \subseteq V$ for all $i$. Representation (21) and estimate (18) allow one to write an absolutely convergent polymer expansion
$$\log \Xi(V \mid k) = -\beta H(\sigma^{(k)}_V \mid \sigma^{(k)}_{V^c}) + \sum_{\pi^{(k)} \in V} W(\pi^{(k)}), \tag{22}$$
where the sum is taken over so called polymers $\pi^{(k)}$ of the phase $k$ belonging to the domain $V$. By definition a polymer $\pi^{(k)} = (\gamma_i)$ is a collection of, not necessarily different, contours $\gamma_i$ of the phase $k$ such that $\cup_i \tilde\gamma_i$ is connected. The statistical weight $W(\pi^{(k)})$ is uniquely defined via $W(\gamma_i)$ and satisfies the estimate (see [Se])
$$|W(\pi^{(k)})| \le \exp\Big[-\Big(\frac{1}{3} \beta \tau - 6d\Big) \sum_i |\tilde\gamma_i|\Big] \tag{23}$$
implying
$$\sum_{\pi^{(k)} = (\gamma_i) :\ \cup_i (\mathrm{Ext}\,\gamma_i)^c \ni x,\ \sum_i |\tilde\gamma_i| \ge N} |W(\pi^{(k)})| \le e^{-C_3 N}. \tag{24}$$
Denote by $\mu^{(k)}_V(\{\gamma_i\}, \mathrm{ext})$ the probability of the event that all contours of the collection $\{\gamma_i\}$ are external ones inside $V$. By the construction
$$\mu^{(k)}_V(\{\gamma_i\}, \mathrm{ext}) \le \prod_i W(\gamma_i). \tag{25}$$
From the polymer expansion (22) and estimate (24) it is not hard to conclude that for $\{\gamma_i\}$ with $\operatorname{dist}(\cup_i \tilde\gamma_i, V^c) \ge |\cup_i \tilde\gamma_i|$
$$\big| \mu^{(k)}_V(\{\gamma_i\}, \mathrm{ext}) - \mu^{(k)}(\{\gamma_i\}, \mathrm{ext}) \big| \le \mu^{(k)}(\{\gamma_i\}, \mathrm{ext})\, |\cup_i \tilde\gamma_i| \exp\big[-C_5 \operatorname{dist}(\cup_i \tilde\gamma_i, V^c)\big], \tag{26}$$
where $C_5 = C_5(\tau, \beta, d)$ is positive and monotone increasing in $\tau$ and $\beta$. For any $A \subset V$, $\operatorname{dist}(A, V^c) \ge \operatorname{diam} A$, and any $\sigma_A$ estimate (26) implies in a standard way that
$$\big| \mu^{(k)}_V(\sigma_A) - \mu^{(k)}(\sigma_A) \big| \le \exp\big[-C_6 \operatorname{dist}(A, V^c)\big], \tag{27}$$
Uniqueness of Gibbs States
317
where again C6 = C6 (τ, β, d) is positive and monotone increasing in τ and β. From now on we suppose that σ (1) is the only stable ground state of H and denote by µV (·) and µV,σ¯ V c (·) the Gibbs distribution with the stable boundary condition σV(1)c and an arbitrary boundary condition σ¯ V c respectively. Clearly µV ,σ¯ c (σA ) depends only on V σ¯ ∂V , where c (28) ∂V = {x ∈ V : x is adjacent to V }, and we freely use the notation µV,σ¯ ∂V (σA ). For a domain V fix a collection {γi }e of all external contours in V touching ∂V and suppose that the number of the points at which ∪i γ˜ i touches ∂V is not greater than L. Consider a smaller domain V 0 = ∪i ∪j Int∗j γi with P P (Ij (γi )) P 0 e the boundary condition σδV = i j σ∂Int ∈ V and M > i |γ˜ i | ∗ γ . Given {γi } j i denote by E{γi }e ,M the event thatPthe number of unstable (see Def. 2 of Sect. 3.2 in [2]) sites in V 0 is not less than M − i |γ˜ i |. According to the Theorem of Sect. 3.2 in [Z], P P µV 0 ,σ0 0 E{γi }e ,M ≤ e−C7 (τ,β,λ,d)(M − i |γ˜ i |)+C8 (τ,β,λ,d) i |γ˜ i | . (29) ∂V
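The positive constants $C_7$ and $C_8$ tend to 0 as $\beta \to \infty$ or $(\beta, \lambda)$ approaches the manifold on which $\sigma^{(1)}$ is not the only stable ground state. For different $\{\gamma_i\}^e$ the events $E_{\{\gamma_i\}^e, M}$ are disjoint and for their union $E_{V,M,L} = \cup_{\{\gamma_i\}^e \in V} E_{\{\gamma_i\}^e, M}$ one has the estimate
$$ \mu_V\big( E_{V,M,L} \big) = \sum_{\{\gamma_i\}^e \in V} \mu_{V',\sigma'_{\partial V'}}\big( E_{\{\gamma_i\}^e, M} \big)\, \mu_V\big( \{\gamma_i\}^e \big) $$
$$ \le \sum_{\{\gamma_i\}^e \in V} e^{-C_7 (M - \sum_i |\tilde\gamma_i|) + C_8 \sum_i |\tilde\gamma_i|} \prod_i W(\gamma_i) $$
$$ \le e^{-C_7 M} \sum_{\{\gamma_i\}^e \in V} \prod_i e^{-(\frac{1}{3}\tau\beta - C_7 - C_8) |\tilde\gamma_i|} $$
$$ \le e^{-C_7 M} (1 + C_4)^L \le e^{-C_7 M + C_4 L}. \tag{30} $$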
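Finally observe that for any $A \subseteq V$ and any $\sigma_A$
$$ \mu_V(\sigma_A)\, e^{-C_9 L(\bar\sigma_{\partial V})} \le \mu_{V,\bar\sigma_{V^c}}(\sigma_A) \le \mu_V(\sigma_A)\, e^{C_9 L(\bar\sigma_{\partial V})}, \tag{31} $$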
where L(σ¯ ∂V ) is the number of sites x ∈ ∂V with σ¯ x 6= 1 and C9 = 2d β max |U (σQ1 )| . σQ1
(32)
4. Proof of Theorem
We are now ready to prove the theorem. Given $\sigma_V$ denote by ${}(\sigma_V)$ the union of the connected components of the set $\{x \in V : \sigma_x \ne 1\}$ adjacent to $\{x \in \partial V : \bar\sigma_x \ne 1\}$. This set is called the boundary layer of $\sigma_V$. Take an integer $N > 0$ and suppose that $V$ contains a cube $Q_{6N}$ with sides of length $6N$ centered at the origin. From now on all cubes are assumed to be centered at the origin. Let $Q_{N'}$, $N' \ge 6N$, be the maximal cube contained in $V$. Denote $\partial' V = \partial Q_{N'} \cap \partial V$. First we consider boundary conditions $\bar\sigma_{\partial V}$ which coincide with $\sigma^{(1)}$ on $\partial V \setminus \partial' V$ and differ from $\sigma^{(1)}$ on $\partial' V$ by at most $\sqrt{N}$ lattice sites. Introduce the event $E_0 = \{\sigma_V : {}(\sigma_V) \cap Q_{4N} \ne \emptyset\}$. By construction for
318
J.L. Lebowitz, A.E. Mazel
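$\sigma_V \in E_0$ every component ${}_i(\sigma_V)$ touches $\partial V$ at some site $x \in \partial V$ with $\bar\sigma_x \ne 1$ and there exists at least one component ${}_i$ intersecting $Q_{4N}$. Without loss of generality we suppose that it is ${}_1(\sigma_V)$. This leads to the estimate
$$ \mu_{V,\bar\sigma_{V^c}}(E_0) \le e^{C_9 \sqrt{N}}\, \mu_V(E_0) \le e^{C_9 \sqrt{N}}\, \sqrt{N}\, e^{-C_3 N} (1 + C_4)^{\sqrt{N}} \le e^{-C_{10} N}. \tag{33} $$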
In the first inequality of (33) we used (31) reducing the problem to the calculation for the stable boundary condition $\sigma^{(1)}$. The second inequality comes in a standard way from the cluster expansion for $\mu_V(\cdot)$. Indeed, in the domain $V$ with the stable boundary condition $\sigma^{(1)}_{V^c}$ every component ${}_i$ contains an external contour $\gamma_i$ such that $E(\gamma_i) = 1$, $\tilde\gamma_i \subseteq {}_i$ and ${}_i \subseteq (\mathrm{Ext}(\gamma_i))^c$. One may simply say that $\gamma_i$ is the external boundary of ${}_i$ and clearly $\tilde\gamma_i$ touches $\partial V$. If ${}_1$ intersects $Q_{4N}$ then $\tilde\gamma_1$ intersects or encloses $Q_{4N}$. The number of possibilities to choose the site $x \in \partial V$, $\bar\sigma_x \ne 1$, at which $\tilde\gamma_1$ touches $\partial V$ does not exceed $\sqrt{N}$, which produces the factor $\sqrt{N}$ in the estimate. The next factor estimates the sum of the statistical weights of all possible $\gamma_1$ touching this site. It is based on (19) and takes into account the fact that the diameter of $\tilde\gamma_1$, and hence $|\tilde\gamma_1|$, is not less than $N$. The constant $C_4$ (see (20)) estimates the sum of the statistical weights of all possible $\gamma_i$ touching a given lattice site and $(1 + C_4)^{\sqrt{N}}$ estimates the statistical weight of all possibilities to choose $\{\gamma_i, i \ne 1\}$. The whole estimate uses (25) and the fact that $\mu_V(\{\gamma_i\}, \mathrm{ext})$ is the upper bound for the sum of $\mu_V$-probabilities of boundary layers ${} = \{{}_i\}$ having $\gamma_i$ as the boundary of ${}_i$. The third inequality in (33) is trivial for $C_{10} = 0.5\,C_3$ and $N \ge 4(C_9 + 1 + C_4)^2 C_3^{-2}$.
Denote by $E_0^c$ the complement of $E_0$. If $\sigma_V \in E_0^c$ then $(V \setminus {}(\sigma_V))^* \supseteq Q_{4N-2}$ (see (17) for the definition of $(\cdot)^*$). It is not hard to see that the configuration $\sigma_V \in E_0^c$ equals 1 on the boundary of $(V \setminus {}(\sigma_V))^*$. Now fix $A \subset Q_{2N}$ and $\sigma_A$. In view of (27) one has
$$ \big| \mu_{V,\bar\sigma_{V^c}}(\sigma_A | E_0^c) - \mu(\sigma_A) \big| \le e^{-C_6 N}. \tag{34} $$
This gives us
$$ \big| \mu_{V,\bar\sigma_{V^c}}(\sigma_A) - \mu(\sigma_A) \big| \le \big| \mu_{V,\bar\sigma_{V^c}}(\sigma_A | E_0) - \mu(\sigma_A) \big|\, \mu_{V,\bar\sigma_{V^c}}(E_0) + \big| \mu_{V,\bar\sigma_{V^c}}(\sigma_A | E_0^c) - \mu(\sigma_A) \big|\, \mu_{V,\bar\sigma_{V^c}}(E_0^c) $$
$$ \le e^{-C_{10} N} + e^{-C_6 N} \le e^{-C_{11} N}, \tag{35} $$
where $C_{11} = 0.5 \min(C_{10}, C_6)$ and $N \ge \log 2 / C_{11}$.
To extend (35) to the wider class of boundary conditions we suppose that $V \supseteq Q_{8N}$ and $Q_{N'}$, $N' \ge 8N$, is the maximal cube contained in $V$. Now we consider boundary conditions $\bar\sigma_{\partial V}$ which coincide with $\sigma^{(1)}$ on $\partial V \setminus \partial' V$ and differ from $\sigma^{(1)}$ on $\partial' V$ by at most $(\sqrt{N})^2$ lattice sites. Denote $\Delta_i = Q_{8N-2i+2} \setminus Q_{8N-2i}$ and let ${}^{(i)}(\sigma_V)$ be a union of the connected components of the set $\{x \in V \setminus Q_{8N-2i} : \sigma_x \ne 1\}$ adjacent to $\{x \in \partial V : \bar\sigma_x \ne 1\}$, $i = 1, \ldots, N$. Introduce disjoint events
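$$ E_1 = \{\sigma_V : |{}^{(1)}(\sigma_V) \cap \Delta_1| < \sqrt{N}\}, $$
$$ E_i = \{\sigma_V : |{}^{(1)}(\sigma_V) \cap \Delta_1| \ge \sqrt{N}, \ldots, |{}^{(i-1)}(\sigma_V) \cap \Delta_{i-1}| \ge \sqrt{N},\ |{}^{(i)}(\sigma_V) \cap \Delta_i| < \sqrt{N}\} $$
and
$$ E^c = \Big( \bigcup_{i=1}^{N} E_i \Big)^c. \tag{36} $$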
If $\sigma_V \in E^c$ then the boundary contour ${}(\sigma_V)$ contains at least $N\sqrt{N}$ sites. Hence (30) implies the following estimate for the probability of $E^c$:
$$ \mu_{V,\bar\sigma_{V^c}}(E^c) \le e^{C_9 (\sqrt{N})^2}\, \mu_V(E^c) \le e^{C_9 (\sqrt{N})^2}\, e^{-C_7 N\sqrt{N} + C_4 (\sqrt{N})^2} \le e^{-C_{12} N \sqrt{N}}, \tag{37} $$
where $C_{12} = 0.5\,C_7$ and $N \ge 4(C_9 + C_4)^2 C_7^{-2}$. For $\sigma_V \in E_i$ consider the volume $V_i = (V \setminus {}^{(i)}(\sigma_V))^* \cup Q_{8N-2i}$ with the boundary condition $\bar\sigma_{V^c} + \sigma_{V \setminus V_i}$. By construction the number of sites $x \in \partial V_i$ with $\sigma_x \ne 1$ is less than $\sqrt{N}$ and one can apply (35) to obtain the bound
$$ \big| \mu_{V,\bar\sigma_{V^c}}(\sigma_A | E_i) - \mu(\sigma_A) \big| \le e^{-C_{11} N}. \tag{38} $$
Joining (37) and (38) we conclude
$$ \big| \mu_{V,\bar\sigma_{V^c}}(\sigma_A) - \mu(\sigma_A) \big| = \Big| \sum_{i=1}^{N} \mu_{V,\bar\sigma_{V^c}}(E_i) \big( \mu_{V,\bar\sigma_{V^c}}(\sigma_A | E_i) - \mu(\sigma_A) \big) + \mu_{V,\bar\sigma_{V^c}}(E^c) \big( \mu_{V,\bar\sigma_{V^c}}(\sigma_A | E^c) - \mu(\sigma_A) \big) \Big| $$
$$ \le e^{-C_{11} N} + e^{-C_{12} N \sqrt{N}} \le 2 e^{-C_{11} N}, \tag{39} $$
where $N \ge (C_{11}/C_{12})^2$. Expression (39) is a version of (35) which is weaker by the factor 2 in the RHS but is applicable to the wider class of boundary conditions containing $(\sqrt{N})^2$ unstable sites instead of $\sqrt{N}$ for (35).
The argument leading from (35) to (39) can be iterated several times. The first iteration treats the following situation. Suppose that $V \supseteq Q_{10N}$ and $Q_{N'}$, $N' \ge 10N$, is the maximal cube contained in $V$. Consider boundary conditions $\bar\sigma_{\partial V}$ which coincide with $\sigma^{(1)}$ on $\partial V \setminus \partial' V$ and differ from $\sigma^{(1)}$ on $\partial' V$ by at most $(\sqrt{N})^3$ lattice sites. Then the analogue of (39) is
$$ \big| \mu_{V,\bar\sigma_{V^c}}(\sigma_A) - \mu(\sigma_A) \big| \le 3 e^{-C_{11} N}. \tag{40} $$
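Similarly after $2d$ iterations one obtains that for any $V \supseteq Q_{2dN}$ with the boundary condition $\bar\sigma_{\partial V}$ containing not more than $(\sqrt{N})^{2d}$ unstable sites
$$ \big| \mu_{V,\bar\sigma_{V^c}}(\sigma_A) - \mu(\sigma_A) \big| \le 2d\, e^{-C_{11} N}. \tag{41} $$
Now set $C_1 = 2d \max\big( 4(C_9 + 1 + C_4)^2 C_3^{-2},\ \log 2 / C_{11},\ 4(C_9 + C_4)^2 C_7^{-2},\ (C_{11}/C_{12})^2 \big)$ and for any $A$ and $\sigma_A$ consider a cube $Q_{2L}$ with $L \ge C_1$ and $\mathrm{dist}(A, Q_{2L}^c) \ge (1 - 1/d) L$. Taking $N = L/d$ and $C_2 = C_{11}/2d$ for $V = Q_{2L}$ one obtains (12) from (41). For any $V \supseteq Q_{2L}$ with $\partial V \cap \partial Q_{2L} \ne \emptyset$ we have
$$ \big| \mu_{V,\bar\sigma_{V^c}}(\sigma_A) - \mu(\sigma_A) \big| \le \sum_{\sigma_{\partial Q_{2L}}} \big| \mu_{Q_{2L},\sigma_{\partial Q_{2L}}}(\sigma_A) - \mu(\sigma_A) \big|\, \mu_{V,\bar\sigma_{V^c}}(\sigma_{\partial Q_{2L}}) \le \exp\big( -C_2\, \mathrm{dist}(A, V^c) \big), \tag{42} $$
which finishes the proof of the Theorem.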
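The elementary inequalities behind the thresholds on $N$ quoted after (33), (35), (37) and (39) can be sanity-checked numerically. The following is a minimal sketch, not part of the original argument: the function names and the sample values of $C_3, C_4, C_6, C_7, C_9$ are hypothetical, and the comparison is done on logarithms to avoid underflow.

```python
import math


def log_add_exp(a, b):
    """Return log(exp(a) + exp(b)) without underflow."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def leq(a, b, tol=1e-9):
    """a <= b up to a small relative tolerance (the thresholds can be tight)."""
    return a <= b + tol * max(1.0, abs(a), abs(b))


def thresholds_hold(C3, C4, C6, C7, C9, scales=(1.0, 2.0, 10.0)):
    """Check the last inequality of (33), (35), (37), (39) for N at and above the stated thresholds."""
    C10 = 0.5 * C3
    C11 = 0.5 * min(C10, C6)
    C12 = 0.5 * C7
    n_min = max(
        4 * (C9 + 1 + C4) ** 2 / C3 ** 2,  # threshold quoted after (33)
        math.log(2) / C11,                 # threshold quoted after (35)
        4 * (C9 + C4) ** 2 / C7 ** 2,      # threshold quoted after (37)
        (C11 / C12) ** 2,                  # threshold quoted after (39)
    )
    for s in scales:
        N = s * n_min
        r = math.sqrt(N)
        # (33): e^{C9 r} r e^{-C3 N} (1 + C4)^r <= e^{-C10 N}
        ok33 = leq(C9 * r + math.log(r) + r * math.log1p(C4) - C3 * N, -C10 * N)
        # (35): e^{-C10 N} + e^{-C6 N} <= e^{-C11 N}
        ok35 = leq(log_add_exp(-C10 * N, -C6 * N), -C11 * N)
        # (37): e^{C9 N} e^{-C7 N r + C4 N} <= e^{-C12 N r}
        ok37 = leq(C9 * N - C7 * N * r + C4 * N, -C12 * N * r)
        # (39): e^{-C11 N} + e^{-C12 N r} <= 2 e^{-C11 N}
        ok39 = leq(log_add_exp(-C11 * N, -C12 * N * r), math.log(2.0) - C11 * N)
        if not (ok33 and ok35 and ok37 and ok39):
            return False
    return True


# Hypothetical sample constants, chosen only for illustration.
print(thresholds_hold(C3=1.0, C4=0.5, C6=2.0, C7=1.5, C9=0.3))
```

The check only confirms the arithmetic of the constants; it says nothing about the probabilistic estimates themselves.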
Acknowledgement. We are greatly indebted to F. Cesi, F. Martinelli and S. Shlosman for many very helpful discussions.
References
[B] Borgs, C.: Translation Symmetry Breaking in Four-Dimensional Lattice Gauge Theories. Commun. Math. Phys. 96, 251–284 (1984)
[BKL] Bricmont, J., Kuroda, K. and Lebowitz, J.L.: First Order Phase Transitions in Lattice and Continuous Systems: Extension of Pirogov-Sinai Theory. Commun. Math. Phys. 101, 501–538 (1985)
[BS] Bricmont, J. and Slawny, J.: Phase Transitions in Systems with a Finite Number of Dominant Ground States. J. Stat. Phys. 54, 89–161 (1989)
[DS] Dinaburg, E.I. and Sinai, Ya.G.: Contour Models with Interaction and their Applications. Sel. Math. Sov. 7, 291–315 (1988)
[D] Dobrushin, R.L.: Gibbs State Describing Phase Coexistence for Three Dimensional Ising Model. Theor. Probab. and Appl. N4, 619–639 (1972)
[DP] Dobrushin, R.L. and Pecherski, E.A.: Uniqueness Conditions for Finitely Dependent Random Fields. In: Colloquia mathematica Societatis Janos Bolyai, 27, Random Fields, 1, J. Fritz, J.L. Lebowitz and D. Szasz, eds., Amsterdam, New York: North-Holland Pub., 1981, pp. 223–262
[DZ] Dobrushin, R.L. and Zahradnik, M.: Phase Diagrams for Continuous Spin Models. Extension of Pirogov-Sinai Theory. In: Mathematical Problems of Statistical Mechanics and Dynamics. R.L. Dobrushin ed., Dordrecht, Boston: Kluwer Academic Publishers, 1986, pp. 1–123
[HKZ] Holicky, P., Kotecky, R. and Zahradnik, M.: Rigid Interfaces for Lattice Models at Low Temperatures. J. Stat. Phys. 50, 755–812 (1988)
[LM] Lebowitz, J.L. and Mazel, A.E.: A Remark on the Low Temperature Behavior of the SOS Interface in Halfspace. J. Stat. Phys. 84, 379–391 (1996)
[LML] Lebowitz, J.L. and Martin-Löf, A.: On the uniqueness of the equilibrium state for Ising spin systems. Commun. Math. Phys. 25, 276–282 (1971)
[M1] Martirosyan, D.G.: Uniqueness of Gibbs States in Lattice Models with One Ground State. Theor. and Math. Phys. 63, N1, 511–518 (1985)
[M2] Martirosyan, D.G.: Theorems Concerning the Boundary Layers in the Classical Ising Models. Soviet J. Contemp. Math. Anal. 22, N3, 59–83 (1987)
[P] Park, Y.M.: Extension of Pirogov-Sinai Theory of Phase Transitions to Infinite Range Interactions I. Cluster Expansion and II. Phase Diagram. Commun. Math. Phys. 114, 187–218 and 219–241 (1988)
[PS] Pirogov, S.A. and Sinai, Ya.G.: Phase Diagrams of Classical Lattice Systems. Theor. and Math. Phys. 25, 358–369, 1185–1192 (1975)
[Se] Seiler, E.: Gauge Theories as a Problem of Constructive Quantum Field Theory and Statistical Mechanics. Lect. Notes in Physics, 159, Berlin: Springer-Verlag (1982)
[Si] Sinai, Ya.G.: Theory of Phase Transitions, Budapest: Academia Kiado and London: Pergamon Press, 1982
[Sl] Slawny, J.: Low-Temperature Properties of Classical Lattice Systems: Phase Transitions and Phase Diagrams. In: Phase Transitions and Critical Phenomena 11, C. Domb and J.L. Lebowitz, eds., Oxford: Pergamon Press, 1987, pp. 128–205
[Sh] Shlosman, S.B.: Uniqueness and Half-Space Nonuniqueness of Gibbs States in Czech Models. Theor. and Math. Phys. 66, 284–293, 430–444 (1986)
[Z] Zahradnik, M.: An Alternate Version of Pirogov-Sinai Theory. Commun. Math. Phys. 93, 559–581 (1984)
Communicated by Ya. G. Sinai
Commun. Math. Phys. 189, 323 – 335 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997
Relaxation to Equilibrium for Two Dimensional Disordered Ising Systems in the Griffiths Phase*
F. Cesi (1), C. Maes (2), F. Martinelli (3)
(1) Dipartimento di Fisica, Università “La Sapienza”, P.le A. Moro 2, 00185 Roma, Italy. E-mail: [email protected]
(2) Instituut voor Theoretische Fysika, K.U. Leuven, Celestijnenlaan 200D, B-3001 Leuven and Onderzoeksleider N.F.W.O., Belgium. E-mail: [email protected]
(3) Dipartimento di Energetica, Università dell’Aquila, Italy. E-mail: [email protected]
Received: 12 June 1996 / Accepted: 31 January 1997
Dedicated to the memory of Roland Dobrushin
Abstract: We consider Glauber-type dynamics for two dimensional disordered magnets of Ising type. We prove that, if the disorder-averaged influence of the boundary condition is sufficiently small in the equilibrium system, then the corresponding Glauber dynamics is ergodic with probability one and the disorder-average $C(t)$ of the time-autocorrelation function satisfies $C(t) \lesssim e^{-m(\log t)^2}$ (for large $t$). For the standard two dimensional dilute Ising ferromagnet with i.i.d. random nearest neighbor couplings taking the values 0 or $J_0 > 0$, our results apply even if the active bonds percolate and $J_0$ is larger than the critical value $J_c$ of the corresponding pure Ising model. For the same model we also prove that in the whole Griffiths' phase the previous upper bound is optimal. This implies the existence of a dynamical phase transition which occurs when $J$ crosses $J_c$.
1. Introduction
Roland Dobrushin certainly was one of the pioneers in the theory of interacting particle systems and in the study of stochastic dynamics for Gibbs fields in particular. From the start (at the end of the 60's) his insights related important concepts in probability theory to those of statistical physics. The mathematical tools he developed have turned out to be extremely useful in several physically relevant problems. The present paper concerns a topic he was very much interested in and to which he contributed many ideas. The general question is the study of Gibbs fields with random interactions. In particular we want to study (see [GM1], [GZ1], [GZ2], [CMM] and references therein for related results) the speed of convergence to equilibrium in a single spin flip, Glauber-type, dynamics, which has a reversible (random) Gibbs measure in the so-called Griffiths' phase, namely that region of the phase diagram between the paramagnetic (high-temperature) phase and the phase with long-range order (at least for ferromagnetic systems).
* Work partially supported by grant CHRX-CT93-0411 of the EEC
324
F. Cesi, C. Maes, F. Martinelli
The simplest example of such a system is the dilute Ising ferromagnet below the percolation threshold. In this case the random couplings Jxy between nearest neighbor spins take only two values, Jxy = 0 and Jxy = J with probability 1−p and p respectively, with J > Jc and p < pc , Jc and pc being the critical values for the “pure” Ising model and independent bond percolation in Zd respectively. Since p < pc , equilibrium truncated correlations decay exponentially fast (with probability one or averaged over the disorder), but the presence of arbitrarily large connected clusters of the pure system below its critical temperature destroys, for example, the global analyticity of the free energy as a function of the external field (see [Gr], [F] and also [BD], [DKP], [GM2], [GM3] for recent progress in this direction for systems without dilution). More interesting but much more difficult to analyze is the case J > Jc and p > pc , since now there is, with probability one, an infinite cluster of low-temperature bonds. It is remarkable that, using the random–cluster representation for the Ising model (see [ACCN] and [N]) and some older results relating averaged correlations to the usual type of correlations for a translation–invariant system (see [OPG] ), one can still find, for all p ≥ pc , an explicit value J(p) > Jc (see Corollary 4.2 below) such that for all Jc ≤ J < J(p) the equilibrium behaviour is qualitatively similar to that below the percolation threshold. Moreover the behaviour of J(p) close to pc is essentially optimal (see [ACCN]). Much more pronounced are the differences between the relaxational behaviour of Glauber–type dynamics in the paramagnetic and in the Griffiths phase. A rather complete, though not completely rigorous, theory was developed some years ago in [DRS], [B1], [B2] for dilute systems below the percolation threshold (see also [RSP]). 
The result was a predicted asymptotic decay for the disorder-averaged spin autocorrelation $C(t)$ of the form
$$ C(t) \approx \exp\big[ -A (\log t)^{\frac{d}{d-1}} \big] \tag{1.1} $$
to be compared with the expected pure exponential decay in the paramagnetic phase. Notice that such a decay is much slower than a stretched exponential $e^{-t^{\alpha}}$, $\alpha < 1$ being the so-called Kohlrausch exponent, that has been argued to occur for several disordered systems (see [AAPS], [DLO] and [O]). Computer simulations (see [J]) suggest however that the asymptotic behaviour (1.1) can only be seen for very large times and that for intermediate times a stretched exponential decay is more appropriate. In [CMM] we prove a behaviour very close to (1.1) for all those disordered (not necessarily dilute) discrete lattice spin systems which satisfy a certain equilibrium assumption (hypothesis (H) in Section 2 below). This assumption was shown to hold at least in part of the Griffiths phase of some natural models, and it is an interesting question to check its validity for dilute models in that part of the Griffiths phase above the percolation threshold. The main scope of the present work is to carry out this analysis for two dimensional systems. For Ising-type but not necessarily ferromagnetic models, we will prove that the main hypothesis behind the results of [CMM] holds if the disorder average of the infinite volume two point function of the associated ferromagnetic model with couplings $|J_{xy}|$ decays exponentially fast. When applied to the standard dilute Ising model this result, combined with the bounds of [ACCN], [N] and [OPG], allows us to extend (1.1) to a non trivial part of the Griffiths phase above $p_c$. For this model we also rigorously establish the existence of a dynamical phase transition as soon as one crosses the critical temperature $T_c \equiv J_c^{-1}$ of the pure system (see [DRS] for earlier results in this direction).
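To get a feeling for how slow the decay (1.1) is compared with a stretched or a pure exponential, one can compare the logarithms of the three candidate forms at a large time. This is only an illustrative sketch: the function names and the values $A = 1$, $\alpha = 0.5$ and unit rate are assumptions made for the example, not quantities taken from the text.

```python
import math


def log_griffiths_decay(t, A=1.0, d=2):
    """log of exp[-A (log t)^{d/(d-1)}], the form (1.1); requires d >= 2 and t > 1."""
    return -A * math.log(t) ** (d / (d - 1))


def log_stretched_exponential(t, alpha=0.5):
    """log of the stretched exponential exp(-t^alpha), alpha < 1."""
    return -(t ** alpha)


def log_exponential(t, rate=1.0):
    """log of the pure exponential exp(-rate * t)."""
    return -rate * t


t = 1.0e6
print(log_griffiths_decay(t), log_stretched_exponential(t), log_exponential(t))
```

With these sample parameters the form (1.1) gives by far the largest (least negative) logarithm at $t = 10^6$, in line with the remark that it decays much more slowly than a stretched exponential.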
More precisely we will show that for any J < Jc the average over the disorder of the spin–spin time autocorrelation function decays exponentially fast, while if J > Jc
Relaxation to Equilibrium for 2D Disordered Ising Systems
325
it is bounded from below by $e^{-k(\log t)^2}$ for a suitable constant $k$. Our limitation on the dimension, $d = 2$, comes from the fact that we need in an essential way a recent beautiful extension to random systems (see [Be]) of the results of [MOS].
2. Definition of the Model and Review of Previous Results
In this section we will briefly review the models of disordered magnets and the results for the relaxation to equilibrium of an associated Glauber-type dynamics discussed in [CMM] in the so-called Griffiths phase. In [CMM] we studied general finite range many body interactions, while here we restrict our attention to Ising-like spin systems with nearest neighbor interactions, since our main results are on the dilute Ising model.
2.1. The model. We consider the $d$ dimensional lattice $\mathbb{Z}^d$ whose vertices are called sites and the set $\hat{\mathbb{Z}}^d$ of unordered pairs $b = \{x, y\} = \{y, x\}$ of sites $x, y$ in $\mathbb{Z}^d$, with Euclidean distance $d(x, y) \equiv |x - y| = 1$. By $Q_L$ we denote the cube of all $x = (x_1, \ldots, x_d) \in \mathbb{Z}^d$ such that for each $i$, $x_i \in \{0, \ldots, L - 1\}$. If $x \in \mathbb{Z}^d$, $Q_L(x)$ stands for $Q_L + x$. A finite subset $\Lambda$ of $\mathbb{Z}^d$ is said to be a “multiple” of $Q_L$, if $\Lambda$ is the union of a finite number of cubes $Q_L(x_i)$, where $x_i \in L\mathbb{Z}^d$. If $\Lambda$ is a finite subset of $\mathbb{Z}^d$ we write $\Lambda \subset\subset \mathbb{Z}^d$ and we denote by $\hat\Lambda$ the set of all bonds $b = \{x, y\} \in \hat{\mathbb{Z}}^d$ such that either $x$ or $y$ belong to $\Lambda$. The cardinality of $\Lambda$ is denoted by $|\Lambda|$.
The configuration space. Our configuration space is $\Omega = S^{\mathbb{Z}^d}$, where $S = \{-1, 1\}$, or $\Omega_V = S^V$ for some $V \subset \mathbb{Z}^d$. The single spin space $S$ is endowed with the discrete topology and $\Omega$ with the corresponding product topology. Given $\sigma \in \Omega$ and $\Lambda \subset \mathbb{Z}^d$ we denote by $\sigma_\Lambda$ the natural projection over $\Omega_\Lambda$. If $U$, $V$ are disjoint, $\sigma_U \eta_V$ is the configuration on $U \cup V$ which is equal to $\sigma$ on $U$ and $\eta$ on $V$. Moreover, for any bond $b = \{x, y\}$, we set $\sigma_b = \sigma(x)\sigma(y)$. If $f$ is a function on $\Omega$, $\Lambda_f$ denotes the smallest subset of $\mathbb{Z}^d$ such that $f(\sigma)$ depends only on $\sigma_{\Lambda_f}$. $f$ is called local if $\Lambda_f$ is finite. The gradient of a function $f$ is defined as
$$ (\nabla_x f)(\sigma) = f(\sigma^x) - f(\sigma), $$
where $\sigma^x \in \Omega$ is the configuration obtained from $\sigma$ by flipping the spin at the site $x$.
The interaction and the Gibbs measures. We consider an abstract probability space $(\Theta, \mathcal{B}, \mathbb{P})$ and two sets of i.i.d. real valued random variables indexed by the sites and bonds of $\mathbb{Z}^d$: $h = \{h_x\}_{x \in \mathbb{Z}^d}$ and $J = \{J_b\}_{b \in \hat{\mathbb{Z}}^d}$. $\mathbb{E}(\cdot)$ stands for the expectation with respect to $\mathbb{P}$. We assume (see hypothesis (H3) in [CMM]) bounded interactions, i.e. there exist $J_\infty > 0$ and $h_\infty \ge 0$ such that $\mathbb{P}\{|J_b| > J_\infty \text{ or } |h_x| > h_\infty\} = 0$. As a particular case we consider the dilute Ising model in which the couplings $J_b$ are i.i.d. random variables and take value $J_b = J_0$ with probability $p$ and $J_b = 0$ with probability $1 - p$. Unless we specify the contrary, when we speak of dilute Ising model, $J_0$ can be negative, and we do not make any extra assumption on the magnetic field $h$ beyond the boundedness specified above. We will sometimes refer to this model as the dilute Ising model with parameters $J_0$ and $p$. We denote by $|J|$ the set of random variables $\{|J_b|\}_{b \in \hat{\mathbb{Z}}^d}$. If $g$ is any real function, we also let $\mathbb{E}(g(J))$ be the set of (translation invariant) couplings $\{\mathbb{E} g(J_b)\}_{b \in \hat{\mathbb{Z}}^d}$.
For each $V \subset\subset \mathbb{Z}^d$ we define the Hamiltonian $H_V : \Omega \mapsto \mathbb{R}$ by
$$ H_V^{J,h}(\sigma) = -\sum_{b \in \hat{V}} J_b \sigma_b - \sum_{x \in V} h_x \sigma(x). \tag{2.1} $$
We also set, for $\sigma, \tau \in \Omega$,
$$ H_V^{J,h,\tau}(\sigma) = H_V^{J,h}(\sigma_V \tau_{V^c}) \tag{2.2} $$
and $\tau$ is called the boundary condition. Free boundary conditions are obtained by (formally) letting $\tau \equiv 0$ in (2.2). The corresponding Hamiltonian is denoted by $H_V^{J,h,\emptyset}$. For each $V \subset\subset \mathbb{Z}^d$ and $\tau \in \Omega$ we define the (finite volume) conditional Gibbs measure by
$$ \mu_V^{J,h,\tau}(\sigma) = \begin{cases} \big( Z_V^{J,h,\tau} \big)^{-1} \exp\big[ -H_V^{J,h,\tau}(\sigma) \big] & \text{if } \sigma(x) = \tau(x) \text{ for all } x \in V^c, \\ 0 & \text{otherwise,} \end{cases} \tag{2.3} $$
where $Z_V^{J,h,\tau}$ is the proper normalization factor called partition function. Notice that we have absorbed the inverse temperature $\beta$ in the Hamiltonian. We will sometimes drop the superscripts $J, h$ if that does not generate confusion. Given a measurable bounded function $f$ on $\Omega$, $\mu_V^\tau(f)$ denotes its average and $\mu_V^\tau(f, g)$ stands for the covariance of $f$ and $g$. Finally, given $\Delta \subset \Lambda \subset \mathbb{Z}^d$ we denote by $\mu_{\Lambda,\Delta}^\tau$ the projection of the Gibbs measure $\mu_\Lambda^\tau$ over $\Omega_\Delta$, that is, for $\sigma_\Delta \in \Omega_\Delta$, we let $\mu_{\Lambda,\Delta}^\tau(\sigma_\Delta) \equiv \mu_\Lambda^\tau\{\eta \in \Omega : \eta_\Delta = \sigma_\Delta\}$.
For further reference, we define $J_c(d)$ as the critical coupling of the ordinary (non-random) ferromagnetic Ising model on $\mathbb{Z}^d$, and $p_c(d)$ as the critical density for independent bond percolation on $\mathbb{Z}^d$. We will sometimes omit the argument $d$ when it is clear from the context. It is well known that $J_c(2) = (1/2)\log(1 + \sqrt{2})$ and $p_c(2) = 1/2$.
The dynamics. The stochastic dynamics we want to study is determined by the Markov generators $L_V^{J,h}$, $V \subset\subset \mathbb{Z}^d$, defined by
$$ (L_V^{J,h} f)(\sigma) = \sum_{x \in V} c_{J,h}(x, \sigma)\, (\nabla_x f)(\sigma). \tag{2.4} $$
The nonnegative real quantities $c_{J,h}(x, \sigma)$, $x \in \mathbb{Z}^d$, $\sigma \in \Omega$, are the transition rates for the process. The general assumptions on the transition rates are
(1) Finite range. If $\sigma(y) = \sigma'(y)$ for all $y$ such that $|x - y| \le 1$, then $c_{J,h}(x, \sigma) = c_{J,h}(x, \sigma')$.
(2) Detailed balance. For all $\sigma \in \Omega$ and $x \in \mathbb{Z}^d$,
$$ \exp\big[ -H_{\{x\}}^{J,h}(\sigma) \big]\, c_{J,h}(x, \sigma) = \exp\big[ -H_{\{x\}}^{J,h}(\sigma^x) \big]\, c_{J,h}(x, \sigma^x). \tag{2.5} $$
(3) Positivity and boundedness. There exist non-negative real numbers $c_m$ and $c_M$ such that for all $J$,
$$ c_m \le \inf_{x,\sigma,J,h} c_{J,h}(x, \sigma) \quad \text{and} \quad \sup_{x,\sigma} c_{J,h}(x, \sigma) \le c_M. \tag{2.6} $$
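The detailed balance condition (2.5) can be verified by brute force on a very small system, for instance for rates of Metropolis type $\min\{e^{-(\nabla_x H_{\{x\}})(\sigma)}, 1\}$ and of heat-bath type $[1 + e^{(\nabla_x H_{\{x\}})(\sigma)}]^{-1}$. The sketch below is only an illustration under stated assumptions: it uses a one dimensional ring of five sites instead of $\mathbb{Z}^d$, dilute couplings with hypothetical parameters $J_0 = 0.8$, $p = 0.6$, a bounded random field, and function names invented for the example.

```python
import itertools
import math
import random


def local_energy(sigma, x, J, h):
    """H_{x}(sigma) as in (2.1) with V = {x}, on a ring; J[i] couples sites i and i+1 (mod n)."""
    n = len(sigma)
    left, right = (x - 1) % n, (x + 1) % n
    bonds = J[left] * sigma[left] * sigma[x] + J[x] * sigma[x] * sigma[right]
    return -bonds - h[x] * sigma[x]


def flip(sigma, x):
    """Configuration sigma^x obtained by flipping the spin at site x."""
    return tuple(-s if i == x else s for i, s in enumerate(sigma))


def gradient(sigma, x, J, h):
    """(nabla_x H_{x})(sigma) = H_{x}(sigma^x) - H_{x}(sigma)."""
    return local_energy(flip(sigma, x), x, J, h) - local_energy(sigma, x, J, h)


def metropolis(g):
    """Metropolis-type rate as a function of the energy gradient g."""
    return min(math.exp(-g), 1.0)


def heat_bath(g):
    """Heat-bath-type rate as a function of the energy gradient g."""
    return 1.0 / (1.0 + math.exp(g))


def detailed_balance_holds(rate, n=5, J0=0.8, p=0.6, h_max=0.3, seed=0):
    """Check (2.5) for every configuration and every site of the ring."""
    rng = random.Random(seed)
    J = [J0 if rng.random() < p else 0.0 for _ in range(n)]  # dilute couplings
    h = [rng.uniform(-h_max, h_max) for _ in range(n)]       # bounded field
    for sigma in itertools.product((-1, 1), repeat=n):
        for x in range(n):
            flipped = flip(sigma, x)
            lhs = math.exp(-local_energy(sigma, x, J, h)) * rate(gradient(sigma, x, J, h))
            rhs = math.exp(-local_energy(flipped, x, J, h)) * rate(gradient(flipped, x, J, h))
            if not math.isclose(lhs, rhs, rel_tol=1e-12):
                return False
    return True


print(detailed_balance_holds(metropolis), detailed_balance_holds(heat_bath))
```

Both rates depend on $\sigma$ only through the gradient of the local energy, which is why they can be passed around as functions of a single number; a constant rate, by contrast, fails the check as soon as the local energies differ.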
Two cases one may want to keep in mind are
$$ c_{J,h}(x, \sigma) = \min\{ e^{-(\nabla_x H_{\{x\}})(\sigma)}, 1 \}, \tag{2.7} $$
$$ c_{J,h}(x, \sigma) = \mu_{\{x\}}^{J,\sigma}(\sigma^x) = \big[ 1 + e^{(\nabla_x H_{\{x\}})(\sigma)} \big]^{-1} \tag{2.8} $$
corresponding to the Metropolis and heat-bath dynamics respectively.
We denote by $L_V^{J,h,\tau}$ the operator $L_V^{J,h}$ acting on $L^2(\Omega, d\mu_V^{J,h,\tau})$ (this amounts to choosing $\tau$ as the boundary condition). Assumptions (1), (2) and (3) guarantee that there exists a unique Markov process whose generator is $L_V^{J,h,\tau}$, and whose semigroup we denote by $\{T_V^{J,h,\tau}(t)\}_{t \ge 0}$. $L_V^{J,h,\tau}$ is actually a bounded selfadjoint operator on $L^2(\Omega, d\mu_V^{J,h,\tau})$. The process has a unique invariant measure given by $\mu_V^{J,h,\tau}$. Moreover $\mu_V^{J,h,\tau}$ is reversible with respect to the process, i.e. $L_V^{J,h,\tau}$ is self-adjoint on $L^2(\Omega, d\mu_V^{J,h,\tau})$.
Infinite volume dynamics. Let $\mu$ be a Gibbs measure for the interaction $J, h$. Since the transition rates are bounded, the infinite volume generator $L^{J,h}$ obtained by choosing $V = \mathbb{Z}^d$ in (2.4) is well defined on the set of functions $f$ in $L^2(\Omega, d\mu)$ (or $C(\Omega)$) such that $|||f||| \equiv \sum_{x \in \mathbb{Z}^d} \|\nabla_x f\|_\infty$ is finite. The closure of $L^{J,h}$ in $L^2(\Omega, d\mu)$ ($C(\Omega)$) is a Markov generator (see, for instance, Theorems 3.9 in Chapter I and 4.1 in Chapter IV of [L]), which defines a Markov semigroup denoted by $T(t)$. Again $L^{J,h}$ is self-adjoint on $L^2(\Omega, d\mu)$.
2.2. Main results on the dynamics. In [CMM] we prove upper bounds on the speed of relaxation of the dynamics under a key probabilistic assumption (hypothesis (H) below) concerning the equilibrium system. In the same paper we also show that these upper bounds are more or less optimal, since they are “almost” saturated in a quite general class of models (see Theorems 3.2 and 3.3 in [CMM] for precise statements, and for the meaning of “almost”). In order to state hypothesis (H) we need the following definition. Given $V \subset\subset \mathbb{Z}^d$, $n \in \mathbb{Z}_+$ and $\alpha > 0$, we say that the condition $SMT(V, n, \alpha)$ holds if for all local functions $f$ and $g$ on $\Omega$ such that $d(\Lambda_f, \Lambda_g) \ge n$ we have
$$ \sup_{\tau \in \Omega} |\mu_V^\tau(f, g)| \le |\Lambda_f|\, |\Lambda_g|\, \|f\|_\infty \|g\|_\infty \exp\big( -\alpha\, d(\Lambda_f, \Lambda_g) \big). $$
Then the main hypothesis of [CMM] can be formulated as follows:
(H) There exist $L_0 \in \mathbb{Z}_+$, $\alpha > 0$, $\vartheta > 0$ such that for all $L \ge L_0$, $\mathbb{P}\{SMT(Q_L, L/2, \alpha)\} \ge 1 - e^{-\vartheta L}$.
Remark. It is not difficult to check that the main result of [CMM] (see Theorem 2.1 below) follows even if in hypothesis (H) we replace “for all $L \ge L_0$” with the slightly weaker “for all $L$ multiple of $L_0$”. It is actually this (apparently) weaker form of (H) that we will show to be implied by a simpler assumption (see Theorem 3.4 below).
The following result is a special case of Theorem 3.3 in [CMM].
Theorem 2.1. Assume bounded interactions (see Sect. 2.1) and hypothesis (H). Then
(a) If $d \ge 1$ there exists a set $\bar\Theta \subset \Theta$ of full measure such that for each $J, h \in \bar\Theta$ there exists a unique infinite volume Gibbs measure $\mu^{J,h}$. Moreover there exists a constant $k$ and, for each $J, h \in \bar\Theta$ and for any local function $f$ there exists $t_0(J, h, f) < \infty$ such that for all $t \ge t_0$,
$$ \|T^{J,h}(t) f - \mu^{J,h}(f)\|_\infty \le \exp\Big[ -t \exp\big[ -k (\log t)^{1 - \frac{1}{d}} (\log\log t)^{d-1} \big] \Big]. \tag{2.9} $$
(b) Let $d \ge 2$. Then there exists a constant $k$ and for any local function $f$ there exists $t_0(f) < \infty$ such that, if $t \ge t_0(f)$ then
$$ \mathbb{E}\, \|T^{J,h}(t) f - \mu^{J,h}(f)\|_\infty \le \exp\big[ -k (\log t)^{\frac{d}{d-1}} (\log\log t)^{-d} \big]. \tag{2.10} $$
Remark. Some of the results of [CMM] hold also for unbounded interaction as long as $\mathbb{E}\{e^{|J_b|^{1+\delta}}\} < \infty$ and $\mathbb{E}\{e^{|h_x|^{1+\delta}}\} < \infty$ for some positive $\delta$. Here we prefer to work directly in the bounded case since the latter is more relevant from the physical point of view and the results on the dynamics are more transparent.
The main purpose of this paper is then to study the two dimensional dilute Ising model with parameters $J_0$ and $p$, and
(1) to show that hypothesis (H) is verified in a portion of the $(J_0, p)$ plane which intersects the region $\{p > p_c(2)\} \cap \{J_0 > J_c(2)\}$ (Corollary 4.2).
(2) to prove a lower bound on the RHS of (2.10) which shows that in the Griffiths' phase (and in the ferromagnetic case), the corresponding upper bound is “almost” optimal (Proposition 4.3).
3. An Alternative Assumption to Hypothesis (H) in Two Dimensions
We present here, for two dimensional systems, an alternative assumption to the basic hypothesis (H) of [CMM]. The advantage of this new assumption is that in some concrete case, like the dilute Ising ferromagnet, it can be explicitly verified in interesting regions of the phase diagram. The idea behind what follows is essentially based on a result obtained in [MOS] for two dimensional non-random systems and recently extended in [Be] to random ones. In order to formulate the main result of this section we first need some additional definitions.
Definition 3.1. Given $C > 0$, $m > 0$ and $\Lambda \subset \mathbb{Z}^d$, we say that $WME(\Lambda, C, m)$ holds if for all $\Delta \subset \Lambda$
$$ \mathbb{E}\Big\{ \sup_{\tau, \tau'} \mathrm{Var}\big( \mu_{\Lambda,\Delta}^{\tau}, \mu_{\Lambda,\Delta}^{\tau'} \big) \Big\} \le C \sum_{x \in \Delta} e^{-m\, d(x, \Lambda^c)}, $$
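where Var denotes the variation distance. We say that $WME(C, m)$ holds if $WME(\Lambda, C, m)$ holds for all $\Lambda \subset \mathbb{Z}^d$.
Definition 3.2. Given $C > 0$, $m > 0$ and $\Lambda \subset \mathbb{Z}^d$, we say that $SME(\Lambda, C, m)$ holds if for all $\Delta \subset \Lambda$ and all $y \in \Lambda^c$
$$ \mathbb{E}\Big\{ \sup_{\tau, \tau^y} \mathrm{Var}\big( \mu_{\Lambda,\Delta}^{\tau}, \mu_{\Lambda,\Delta}^{\tau^y} \big) \Big\} \le C \sum_{x \in \Delta} e^{-m |x - y|}. $$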
Using these notions we have the following key result [Be] (see also [MOS] for a similar statement in case of non-random systems).
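Theorem 3.3. Let $d = 2$. Then $WME(C, m)$ implies that there exist $L_0 \in \mathbb{Z}_+$, $m' > 0$ and $C' > 0$ such that $SME(\Lambda, C', m')$ holds for all sets $\Lambda$ multiple of $Q_{L_0}$.
Remark. Actually in [Be] the above result is proved under the hypothesis that $WME(\Lambda, C, m)$ holds for all rectangles $\Lambda$.
We are finally in a position to give our alternative to assumption (H).
Lemma 3.4. Let $d = 2$ and let $\varepsilon \in (0, 1)$. Then $WME(C, m)$ implies that there exist $L_0 \in \mathbb{Z}_+$, $\alpha > 0$ and $\vartheta > 0$ such that for all integers $L$ multiple of $L_0$, $\mathbb{P}\{SMT(Q_L, \varepsilon L, \alpha)\} \ge 1 - e^{-\vartheta L}$.
Proof. Let $l_0 < l_1 < L$ be three integers such that $l_1$ and $L$ are both multiples of $l_0$ and suppose moreover that $L^{2/3} \le l_1 \le \varepsilon L/4$. Let us partition $\Lambda \equiv Q_L$ into disjoint rectangles as follows
$$ B_i = Q_{l_1}(x_i) \cap \Lambda; \qquad \Lambda = \cup_{i=1}^n B_i, \qquad x_i \in l_1 \mathbb{Z}^2, $$
and let $\Lambda_I = \cup_{i \in I} B_i$ for all $I \subset \{1 \ldots n\}$. Finally we denote with $\mathcal{I}_\varepsilon$ the set of all pairs $(I_1, I_2)$ with $I_1 \subset I_2 \subset \{1 \ldots n\}$ such that $d(\Lambda_{I_1}, \Lambda \setminus \Lambda_{I_2}) \ge \varepsilon L/2$. Next, for a given $\alpha > 0$, we define $\Theta(\alpha, \varepsilon)$ as the set of those couplings such that for all pairs $(I_1, I_2) \in \mathcal{I}_\varepsilon$ one has
$$ \sup_{y \in \Lambda \setminus \Lambda_{I_2}} \sup_{\tau} \mathrm{Var}\big( \mu_{\Lambda_{I_2},\Lambda_{I_1}}^{\tau}, \mu_{\Lambda_{I_2},\Lambda_{I_1}}^{\tau^y} \big) \le e^{-2\alpha L}. \tag{3.1} $$
We claim that for all $J \in \Theta(\alpha, \varepsilon)$ property $SMT(Q_L, \varepsilon L, \alpha)$ holds, provided that $L$ is large enough depending only on $\alpha$. Given in fact two arbitrary functions $f$ and $g$, with $d(\Lambda_f, \Lambda_g) \ge \varepsilon L$, let us set
$$ I_f = \{i \in \{1 \ldots n\} : B_i \cap \Lambda_f \ne \emptyset\}, \qquad I_f^c = \{1 \ldots n\} \setminus I_f, $$
and similarly for $g$. Notice that, because of our choice of $l_1$, we have
$$ d(\Lambda_{I_f}, \Lambda_{I_g}) \ge \varepsilon L - 2 l_1 \ge \varepsilon L/2. \tag{3.2} $$
Now we write, using the DLR equations,
$$ \mu_\Lambda^\tau(f, g) = \sum_{(\sigma, \sigma')} \mu_\Lambda^\tau(\sigma)\, \mu_\Lambda^\tau(\sigma')\, f(\sigma) \big[ \mu_{\Lambda_{I_f^c}}^{\sigma}(g) - \mu_{\Lambda_{I_f^c}}^{\sigma'}(g) \big], \tag{3.3} $$
so that, using a standard interpolation argument (see e.g. the proof of Proposition 4.3 in [CMM]),
$$ \sup_\tau |\mu_\Lambda^\tau(f, g)| \le \|f\|_\infty \|g\|_\infty \sum_{y \in \Lambda_{I_f}} \sup_\tau \mathrm{Var}\big( \mu_{\Lambda_{I_f^c},\Lambda_{I_g}}^{\tau}, \mu_{\Lambda_{I_f^c},\Lambda_{I_g}}^{\tau^y} \big). \tag{3.4} $$
Notice that, because of (3.2), the pair $(I_f^c, I_g)$ belongs to $\mathcal{I}_\varepsilon$. Thus, thanks to (3.1), we get
$$ \text{RHS of (3.4)} \le \|f\|_\infty \|g\|_\infty L^2 e^{-2\alpha L} \le \|f\|_\infty \|g\|_\infty |\Lambda_f|\, |\Lambda_g|\, e^{-\alpha\, d(\Lambda_f, \Lambda_g)}, \tag{3.5} $$
namely $SMT(\Lambda, \varepsilon L, \alpha)$, for any $J \in \Theta(\alpha, \varepsilon)$ and any $L$ large enough. In conclusion we have proved that, for any $L$ large enough,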
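$$ \mathbb{P}\{SMT(Q_L, \varepsilon L, \alpha) \text{ holds}\} \ge \mathbb{P}\{\Theta(\alpha, \varepsilon)\}. $$
Let us estimate from above $\mathbb{P}\{\Theta(\alpha, \varepsilon)^c\}$. We write
$$ \mathbb{P}\{\Theta(\alpha, \varepsilon)^c\} \le |\mathcal{I}_\varepsilon| \sup_{(I_1, I_2) \in \mathcal{I}_\varepsilon} |\Lambda \setminus \Lambda_{I_2}| \sup_{y \in \Lambda \setminus \Lambda_{I_2}} \mathbb{P}\Big\{ \sup_\tau \mathrm{Var}\big( \mu_{\Lambda_{I_2},\Lambda_{I_1}}^{\tau}, \mu_{\Lambda_{I_2},\Lambda_{I_1}}^{\tau^y} \big) \ge e^{-2\alpha L} \Big\}. \tag{3.6} $$
Using the standard Chebyshev inequality we can bound from above the probability appearing in the r.h.s. of (3.6) by
$$ \mathbb{P}\Big\{ \sup_\tau \mathrm{Var}\big( \mu_{\Lambda_{I_2},\Lambda_{I_1}}^{\tau}, \mu_{\Lambda_{I_2},\Lambda_{I_1}}^{\tau^y} \big) \ge e^{-2\alpha L} \Big\} \le e^{2\alpha L}\, \mathbb{E}\Big\{ \sup_\tau \mathrm{Var}\big( \mu_{\Lambda_{I_2},\Lambda_{I_1}}^{\tau}, \mu_{\Lambda_{I_2},\Lambda_{I_1}}^{\tau^y} \big) \Big\}. \tag{3.7} $$
We are finally in a position to use our key hypothesis $WME(C, m)$. We know in fact, thanks to Theorem 3.3, that $WME(C, m)$ implies that there exist $L_0$, $C'$ and $m' > 0$ such that $SME(C', m')$ holds for all sets that are multiples of the square $Q_{L_0}(0)$. Thus, if we choose $l_0 = L_0$, we get that all sets of the form $\Lambda_I$ for some $I \subset \{1 \ldots n\}$ are “multiple” of $Q_{L_0}(0)$ since $l_1$ and $L$ are both multiples of $l_0$. Thus, using the definition of $SME(C', m')$, we get that
$$ \text{RHS of (3.7)} \le L^2 e^{2\alpha L} C' e^{-m' \varepsilon L/2}. $$
If we finally plug the above bound into the r.h.s. of (3.6) and use the trivial bounds
$$ |\mathcal{I}_\varepsilon| \le 2^{(L/l_1)^2} \le 2^{L^{2/3}}, \qquad \sup_I |\Lambda \setminus \Lambda_I| \le L^2, $$
we obtain
$$ \mathbb{P}\{\Theta(\alpha, \varepsilon)^c\} \le 2^{L^{2/3}} L^4 e^{2\alpha L} C' e^{-m' \varepsilon L/2}. \tag{3.8} $$
The lemma now follows with e.g. $\vartheta = \alpha$ and $4\alpha = \varepsilon m'$.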
4. Applications
We have shown in Lemma 3.4 that, in two dimensions, hypothesis (H) of Theorem 2.1 can be replaced by $WME(C, m)$. In this final section we will show that condition $WME(C, m)$ holds if either one of two simple inequalities involving the expectation of a certain function of $J_{xy}$ is verified. It is then straightforward to check that, for the two dimensional diluted Ising model with couplings $J_{xy} \in \{0, J_0\}$ and dilution $p$, the region (in the $(J_0, p)$ plane) where at least one of these inequalities holds includes the case where there exists an infinite connected cluster of bonds with coupling $J_0 > J_c(2)$. Finally we consider the ferromagnetic case with no magnetic field and prove a lower bound on the disorder-averaged autocorrelation function which almost saturates the corresponding upper bound.
Theorem 4.1. Assume $d \ge 2$, bounded interactions (see Sect. 2.1) and either
(a) $h_x = 0$ for all $x \in \mathbb{Z}^d$ or
(b) $h_x \ge 0$ and $J_b \ge 0$ for all $x \in \mathbb{Z}^d$ and all $b \in \hat{\mathbb{Z}}^d$.
Then any of the following two conditions implies that there exist two positive constants $C$ and $m$ such that $WME(C, m)$ holds.
(i) $\mathbb{E}\,|J_{xy}| < J_c(d)$,
(ii) $\mathbb{E}\,(1 - e^{-2|J_{xy}|}) < p_c(d)$,
where $J_c(d)$ and $p_c(d)$ are those defined in Sect. 2.1.
Remark. We give this result for all $d \ge 2$ since its proof does not depend on $d$, but for our purposes we only need the $d = 2$ case.
Proof of Theorem 4.1. (a) Using the Fortuin-Kasteleyn representation for the Ising model in the absence of external field, one can prove the following very nice inequality (see formula (3.5) in [N])
$$ \sup_{\tau, \tau'} \mathrm{Var}\big( \mu_{\Lambda,\Delta}^{J,\tau}, \mu_{\Lambda,\Delta}^{J,\tau'} \big) \le 2 \sum_{x \in \Delta} \mu_\Lambda^{|J|,+}(\sigma(x)), \tag{4.1} $$
where denotes the Gibbs measure in 3 with plus boundary conditions and cou|J|,+ plings {|Jb |}b∈3ˆ . Since the Gibbs measure µ3 is ferromagnetic, we can use at this point a result of [Hi] (see inequality (2.20) in Sect. 2 there) which states that X X |J|,+ |J|,∅ µ3 (σ(x)σ(z)), (4.2) µ3 (σ(x)) ≤ y ∈3 /
z∈3 |z−y|=1
|J|,∅
where the superscript ∅ in µ3 denotes free boundary conditions. Thus, in order to prove the theorem, it is sufficient to show that either (i) or (ii) imply that there exist two positive constants C and m such that |J|,∅
E{µ3
(σ(x)σ(z))} ≤ Ce−m|x−z| .
(4.3)
Let us start with (i). In this case we can use an old result [OPG] that states that |J|,∅
E{µ3
E{|J|},∅
(σ(x)σ(z))} ≤ µ3
(σ(x)σ(z)).
(4.4)
E{|J|},∅
where µ3 denotes standard Ising Gibbs measure in 3 with zero external field, free boundary conditions and coupling E{|Jb |} . Thanks to the second Griffiths inequality R.H.S. of (4.4) ≤ µE{|J|} (σ(x)σ(z)) ≤ Ce−m|x−z|
(4.5)
for suitable C and m. Here µE{|J|} denotes the infinite volume limit as 3 → Zd of E{|J|},∅ and in the last inequality in the r.h.s of (4.5) we used the fact that E{|J|} < µ3 Jc implies exponential decay of infinite volume correlations (see [ABF]). Let us now consider case (ii). Here we can use another result on the FK representation for the random Ising model (see e.g. formula (5a), (5b) and following the remark in [ACCN] or formula (1.9), (1.11) and Theorem 1 in [N]) which states that |J|,∅
E{µ3
(σ(x)σ(z))} ≤ Probp (x → z),
(4.6)
where p ≡ E{(1− e−2|Jxy | )} and Probp (x → z) denotes the probability for independent bond percolation with occupation density p that there exists a path of occupied bonds
332
F. Cesi, C. Maes, F. Martinelli
starting in x and ending in z. Since by assumption p < pc this last probability is exponentially small in |x − y| (see e.g. [Gri] and references therein), (4.3) follows. (b) In the ferromagnetic case Jb ≥ 0 with non-negative external field hx , it has been proved in [Hi] (see formula (2.13) and (2.15) there) that X J,h=0,+ 1 X J,h,+ [µ3 (σ(x)) − µJ,h,− (σ(x))] ≤ µ3 (σ(x)). 3 2 x∈1 τ,τ 0 x∈1 (4.7) At this point we proceed exactly as before. 0
J,τ sup Var(µJ,τ 3,1 , µ3,1 ) ≤
As a corollary we get

Corollary 4.2. Consider the two dimensional dilute Ising model with parameters $J_0$ and $p$, and assume that either hypothesis (a) or hypothesis (b) of Theorem 4.1 holds and that
$$|J_0| < \begin{cases} \dfrac{J_c}{p} \vee \dfrac{1}{2} \log\Big(\dfrac{2p}{2p-1}\Big) & \text{if } p > 1/2, \\ \infty & \text{if } p \le 1/2. \end{cases}$$
Then hypothesis (H) holds and Theorem 2.1 applies.

Proof of the Corollary. If $|J_0|$ is smaller than the stated value then either (i) or (ii) of Theorem 4.1 holds, since $p_c = 1/2$ for $d = 2$. Thus in two dimensions we can apply Lemma 3.4 above and get the result.

Remark. Using the Fortuin-Kasteleyn representation, it has also been shown in [ACCN] that the dilute Ising magnet exhibits the phenomenon of spontaneous magnetization provided that
$$\mathbb{E}\Big\{\frac{1 - e^{-2J_b}}{1 + e^{-2J_b}}\Big\} > p_c,$$
so that the infinite volume Gibbs state is non-unique and the associated dynamics is no longer ergodic.

We conclude by observing that if $J_0$ is smaller than the critical value $J_c$, then, thanks to the Griffiths inequality,
$$\mu_\Lambda^{J,h=0,\emptyset}(\sigma(x)\sigma(z)) \le C e^{-m|x-z|}$$
for any configuration of $J$. Thus, in this case, we can apply Theorem 3.1 of [MO1] and get that the relaxation to equilibrium is purely exponential for any configuration of $J$ and any value of $p$. In order to show that a dynamical phase transition occurs when $J_0$ crosses the critical value $J_c$, we need a lower bound like the one given in Theorem 2.1 for any $J_0 > J_c$.

Proposition 4.3. Consider the two dimensional ferromagnetic dilute Ising model with parameters $J_0$ and $p$ and no magnetic field ($h_x = 0$ for all $x$). Let $J_0 > J_c$ and assume that there is a unique Gibbs measure with probability one (w.r.t. $\mathbb{P}(\cdot)$). Let $f(\sigma) = \sigma(0)$. Then for any $\delta > 0$ we have, for $t$ large enough,
$$\mathbb{E}\, \|T^J(t)f - \mu^J(f)\|_{L^2(\mu^J)} \ge e^{-(1+\delta)\, k(p,J_0)\, (\log t)^2},$$
where $k(p, J_0) = 2 \log(1/p)\,(J_0 \tau(J_0))^{-2}$ and $\tau(J_0)$ is the surface tension at inverse temperature $J_0$ in the direction of the coordinate axes for the standard ferromagnetic two dimensional Ising model.

Proof. We follow closely the proof of the lower bound of Theorem 3.3 given in [CMM]. Since some details are different, we repeat the basic steps for clarity. Since there exists a unique infinite volume Gibbs state, which we denote by $\mu^J$, and since there is no magnetic field, we have, by spin flip symmetry, $\mu^J(f) = 0$. Therefore, in order to prove the result, it is sufficient to prove the required lower bound only for $\mathbb{E}\, \|T^J(t)f\|_{L^2(\mu^J)}$. For this purpose let $\Lambda = Q_L$, and let $\bar\Theta$ be the set of all interactions $J \in \Theta$ such that
(a) $J_{xy} = J_0$ for all $\{x, y\}$ such that $\{x, y\} \subset \Lambda$,
(b) $J_{xy} = 0$ for all $\{x, y\}$ which intersect both $\Lambda$ and $\Lambda^c$ (the boundary edges).
Notice that for all events $X$ which depend only on the spins inside $\Lambda$ we have $\mu^J(X) = \mu_\Lambda^{J_0,\emptyset}(X)$, where $\mu_\Lambda^{J_0,\emptyset}$ denotes the ordinary Ising Gibbs measure in $\Lambda$ with free boundary conditions and coupling $J_0$. For $a \in [-1, 1]$, consider then the event
$$F_\Lambda = \{\sigma \in \Omega : m_\Lambda(\sigma) > a\},$$
where $m_\Lambda(\sigma) = |\Lambda|^{-1} \sum_{x \in \Lambda} \sigma(x)$ denotes the normalized magnetization in $\Lambda$ of the configuration $\sigma$. It is then easy to see (see (6.24), (6.25) in [CMM]) that
$$\mathbb{E}\, \|T^J(t)f\|_{L^2(\mu^J)} \ge \mathbb{P}(\bar\Theta) \inf_{J \in \bar\Theta} \sqrt{\mu^J(F_\Lambda)}\; \mu^J(T^J(t)\, m_\Lambda \mid F_\Lambda). \tag{4.8}$$
If $J \in \bar\Theta$, the last factor can be estimated (see formulas (6.25), ..., (6.28) in [CMM]) as
$$\mu^J(T^J(t)\, m_\Lambda \mid F_\Lambda) \ge a - (a+1)\, \mu_\Lambda^{J_0,\emptyset}(F_\Lambda)^{-1} \Big[ k\,|\Lambda|\,t\; \mu_\Lambda^{J_0,\emptyset}\Big\{|m_\Lambda(\sigma) - a| \le \frac{2}{|\Lambda|}\Big\} + e^{-k'|\Lambda| t} \Big] \tag{4.9}$$
for suitable positive constants $k$, $k'$. In [CMM] we have set $a = 1/2$, and inequality (4.9) appears with the factor $2/|\Lambda|$ replaced by $1/100$. Here we want to be more careful because we are arbitrarily close to the critical temperature. It is straightforward to check that the replacement $(1/100) \to (2/|\Lambda|)$ is correct. The $2/|\Lambda|$ in fact represents the absolute value of the smallest change of $m_\Lambda$ due to a spin flip. As for the value of $a$, we let here $a = 1/|\Lambda|$, which is enough for our purposes. From (4.9) we then get
$$\mu^J(T^J(t)\, m_\Lambda \mid F_\Lambda) \ge \frac{1}{|\Lambda|} - 2\, \mu_\Lambda^{J_0,\emptyset}(F_\Lambda)^{-1} \Big[ k\,|\Lambda|\,t\; \mu_\Lambda^{J_0,\emptyset}\Big\{|m_\Lambda(\sigma)| \le \frac{3}{|\Lambda|}\Big\} + e^{-k'|\Lambda| t} \Big]. \tag{4.10}$$
In order to conclude the proof we use a large deviation result for the magnetization of the two dimensional Ising model [CGMS].

Theorem 4.4. Let $d = 2$ and $J_0 > J_c$. Then for any $\delta' > 0$ there exists $L_0(\delta')$ such that for all $L \ge L_0(\delta')$
$$\mu_{Q_L}^{J_0,\emptyset}\Big\{ |m_{Q_L}(\sigma)| \le \frac{3}{|Q_L|} \Big\} \le e^{-J_0 \tau(J_0)(1-\delta')L}$$
and
$$\mu_{Q_L}^{J_0,\emptyset}\Big\{ m_{Q_L}(\sigma) \ge \frac{1}{|Q_L|} \Big\} \ge 1/3,$$
where $\tau(J_0)$ is defined as in Proposition 4.3.

Proof. The first result is a particular case of Proposition 2.3 in [CGMS], while the second inequality follows from the first (plus spin flip symmetry).

Choose now the side of the square $\Lambda$ depending on $t$ as
$$L = L_t = \lfloor \log t / (J_0 \tau(J_0)(1 - 2\delta')) \rfloor.$$
Using Theorem 4.4, we see that if $t$ is large enough then the RHS of (4.10) is greater than, say, $1/(2|\Lambda_t|)$ for all $J \in \bar\Theta$. Using (4.8) and again the second statement of the theorem from [CGMS] (Theorem 4.4) we obtain, for large enough $t$,
$$\mathbb{E}\, \|T^J(t)f\|_{L^2(\mu^J)} \ge \frac{1}{2|\Lambda_t|} \frac{1}{\sqrt{3}}\, p^{2L_t^2} (1-p)^{4L_t} \ge e^{-(1-3\delta')^{-2}\, k(p,J_0)\, (\log t)^2}.$$
The proposition then follows with an appropriate choice of $\delta'$ as a function of $\delta$.
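To make the scales in this proof concrete, the following minimal sketch evaluates the constant $k(p, J_0)$, the side length $L_t$ and the logarithm of the lower bound of Proposition 4.3. The function names are hypothetical, and the surface tension $\tau(J_0)$ is not computed here: it is passed in as a number `tau` that the caller is assumed to know.

```python
import math

def k_constant(p, J0, tau):
    """k(p, J0) = 2 log(1/p) (J0 * tau)^(-2), as in Proposition 4.3.
    tau stands for the surface tension tau(J0); it is an input, not computed."""
    return 2.0 * math.log(1.0 / p) / (J0 * tau) ** 2

def side_length(log_t, J0, tau, delta_prime):
    """L_t = floor(log t / (J0 tau (1 - 2 delta'))), the side of the square."""
    return math.floor(log_t / (J0 * tau * (1.0 - 2.0 * delta_prime)))

def log_lower_bound(log_t, p, J0, tau, delta):
    """Logarithm of the bound exp(-(1 + delta) k(p, J0) (log t)^2)."""
    return -(1.0 + delta) * k_constant(p, J0, tau) * log_t ** 2

if __name__ == "__main__":
    # Purely illustrative numbers, not taken from the paper.
    print(k_constant(0.8, 1.0, 1.0))
    print(side_length(10.0, 1.0, 1.0, 0.25))
    print(log_lower_bound(10.0, 0.8, 1.0, 1.0, 0.1))
```

The sketch works with $\log t$ rather than $t$ so that very large times do not overflow; the bound decays like a Gaussian in $\log t$, that is, slower than any stretched exponential in $t$.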
References
[AAPS] Palmer, R.G., Stein, D.L., Abrahams, E. and Anderson, P.W.: Models of hierarchically constrained dynamics for glassy relaxation. Phys. Rev. Lett. 53, No 10, 958 (1984)
[ABF] Aizenman, M., Barsky, D.J. and Fernandez, R.: The phase transition in a general class of Ising–type models is sharp. J. Stat. Phys. 47, No. 3/4, 343 (1987)
[ACCN] Aizenman, M., Chayes, J.T., Chayes, L. and Newman, C.M.: The phase boundary in dilute and random Ising and Potts ferromagnets. J. Phys. A: Math. Gen. 20, L313 (1987)
[BD] Bassalygo, L.A. and Dobrushin, R.L.: Uniqueness of a Gibbs field with random potential - an elementary approach. Theory Prob. Appl. 31, No 4, 572 (1986)
[B1] Bray, A.J.: Upper and lower bounds on dynamic correlations in the Griffiths phase. J. Phys. A: Math. Gen. 22, L81 (1989)
[B2] Bray, A.J.: Dynamics of dilute magnets above Tc. Phys. Rev. Lett. 60, No 8, 720 (1988)
[Be] van den Berg, J.: A constructive mixing condition for 2-D Gibbs measures with random interactions. Preprint 1996
[CGMS] Cesi, F., Guadagni, G., Martinelli, F. and Schonmann, R.H.: On the 2D stochastic Ising model in the phase coexistence region close to the critical point. J. Stat. Phys. 85, No 1/2, 55 (1996)
[CMM] Cesi, F., Maes, C. and Martinelli, F.: Relaxation of disordered magnets in the Griffiths phase. Roma preprint 1996, Commun. Math. Phys., in press
[DLO] De Dominicis, C., Orland, H. and Lainée, F.: Stretched exponential in systems with random free energies. J. Physique Lett. 46, L–463 (1985)
[DRS] Dhar, D., Randeria, M. and Sethna, J.P.: Griffiths singularities in the dynamics of disordered Ising models. Europhys. Lett. 5, No 6, 485 (1988)
[DKP] Von Dreyfus, H., Klein, A. and Perez, J.F.: Taming Griffiths singularities: infinite differentiability of quenched correlations functions. Commun. Math. Phys. 170, 21 (1995)
[F] Fröhlich, J.: Mathematical aspects of the physics of disordered systems. In: Critical Phenomena, Random Systems, Gauge Theories. Eds. K. Osterwalder and R. Stora, Amsterdam: Elsevier, 1986
[GM1] Gielis, G. and Maes, C.: Percolation techniques in disordered spin flip dynamics: Relaxation to the unique invariant measure. Commun. Math. Phys. 177, 83 (1995)
[GM2] Gielis, G. and Maes, C.: Local analyticity and bounds on the truncated correlation functions in disordered systems. Markov Processes Relat. Fields 1, 459 (1995)
[GM3] Gielis, G. and Maes, C.: The uniqueness regime of Gibbs fields with unbounded disorder. J. Stat. Phys. 81, No. 3/4, 829 (1995)
[Gr] Griffiths, R.: Non-analytic behaviour above the critical point in a random Ising ferromagnet. Phys. Rev. Lett. 23, 17 (1969)
[Gri] Grimmett, G.: Percolation. Berlin–Heidelberg–New York: Springer-Verlag, 1989
[GZ1] Guionnet, A. and Zegarlinski, B.: Decay to equilibrium in random spin systems on a lattice. Commun. Math. Phys. 181, 703 (1996)
[GZ2] Guionnet, A. and Zegarlinski, B.: Decay to equilibrium in random spin systems on a lattice II. J. Stat. Phys. 86, 899 (1997)
[J] Jain, S.: Anomalously slow relaxation in the diluted Ising model below the percolation threshold. Physica A 218, 279 (1995)
[Hi] Higuchi, Y.: Coexistence of infinite (*)-clusters II: Ising percolation in two dimension. Prob. Th. Rel. Fields 97, 1 (1993)
[L] Liggett, T.M.: Interacting particles systems. Berlin–Heidelberg–New York: Springer-Verlag, 1985
[M] Martinelli, F.: On the two dimensional dynamical Ising model in the phase coexistence region. J. Stat. Phys. 76, No. 5/6, 1179 (1994)
[MO1] Martinelli, F. and Olivieri, E.: Approach to equilibrium of Glauber dynamics in the one phase region I: The attractive case. Commun. Math. Phys. 161, 447 (1994)
[MOS] Martinelli, F., Olivieri, E. and Schonmann, R.H.: For Gibbs state of 2D lattice spin systems weak mixing implies strong mixing. Commun. Math. Phys. 165, 33 (1994)
[N] Newman, C.M.: Disordered Ising systems and random cluster representation. In: Probability and Phase Transition, edited by G. Grimmett, NATO ASI Series Vol. 120 (1993)
[O] Ogielski, A.T.: Dynamics of three dimensional Ising spin glasses in thermal equilibrium. Phys. Rev. B 32, No 11, 7384 (1985)
[OPG] Olivieri, E., Perez, F. and Goulart, S.–Rosa–Jr.: Some rigorous results on the phase diagram of the dilute Ising model. Phys. Lett. 94A, No 6,7, 309 (1983)
[RSP] Randeria, M., Sethna, J.P. and Palmer, R.G.: Low-frequency relaxation in Ising spin–glasses. Phys. Rev. Lett. 54, No. 12, 1321 (1985)
Communicated by J. L. Lebowitz
Commun. Math. Phys. 189, 337 – 363 (1997)
Communications in
Mathematical Physics © Springer-Verlag 1997

Fluctuations of Principal Eigenvalues and Random Scales
Alain-Sol Sznitman
Department Mathematik, ETH-Zürich, CH-8092 Zürich, Switzerland
Received: 13 June 1996 / Accepted: 10 March 1997

Dedicated to the memory of Roland Dobrushin

Abstract: We investigate the principal Dirichlet eigenvalue of the Laplacian with soft Poissonian obstacles in large boxes of $\mathbb{R}^d$, $d \ge 2$. With the help of our recent version of the method of enlargement of obstacles [18], we derive quantitative confidence intervals for these eigenvalues. We also provide less quantitative estimates, which however point out the correct size of fluctuations, and indicate a stiffness in their behavior. In the two-dimensional case we derive geometric controls, which relate these eigenvalues to certain empty circular droplets. Our results also have natural applications to the study of the location of minima of certain intermittent random variational problems, motivated by [13, 17].
0. Introduction

We consider a soft repulsive Poissonian potential in $\mathbb{R}^d$, $d \ge 2$:
$$V(x, \omega) = \sum_i W(x - x_i) = \int W(x - y)\, \omega(dy), \quad x \in \mathbb{R}^d,\ \omega = \sum_i \delta_{x_i} \in \Omega, \tag{0.1}$$
where $\Omega$ is the space of locally finite simple pure point measures on $\mathbb{R}^d$ and $\omega$ the generic cloud configuration. The function $W(\cdot)$, which models the soft obstacles, is non negative, bounded measurable, compactly supported, not a.e. equal to 0. We let $\mathbb{P}$ stand for the canonical Poisson law of constant intensity $\nu > 0$ on $\mathbb{R}^d$. For each open set $U \subseteq \mathbb{R}^d$, and $\omega \in \Omega$, we introduce
$$\lambda_\omega(U) = \text{the principal Dirichlet eigenvalue of } -\tfrac{1}{2}\Delta + V(\cdot, \omega) \text{ in } U. \tag{0.2}$$
The main body of this article is concerned with the derivation of estimates on the fluctuations of $\lambda_\omega((-\ell, \ell)^d)$, as $\ell \to \infty$. The motivation for this study is to develop a better understanding of the asymptotic (i.e. large $t$) location of minima of the variational problem:
$$\inf_{\ell > 0} F_t(\ell, \omega), \tag{0.3}$$
where
$$F_t(\ell, \omega) = \ell + t\, \lambda_\omega((-\ell, \ell)^d), \quad \ell > 0. \tag{0.4}$$
Related variational problems show up in the study of Brownian motion in a Poissonian potential, when one tries to determine where the particle settles down, see [13, 17]. In fact random variational problems naturally arise in several questions of disordered media, see for instance [1] and references therein. The one dimensional case of (0.3) can be discussed as in [17] I, Sect. 5, see also (5.8) below. We want to investigate here the $d \ge 2$ situation. It turns out that the asymptotic location of minima in (0.3) is a rather delicate question, which is closely related to the size of fluctuations of the random variables $\lambda_\omega((-\ell, \ell)^d)$. At an informal level, if $s(t)$ denotes the "typical scale" for $\ell$ in which the minima occur and $\sigma(\ell)$ the "typical size of fluctuations of $\lambda_\omega((-\ell, \ell)^d)$ around a median", the two quantities are connected by the heuristic principle:
$$s(t) \approx t\, \sigma(s(t)), \quad \text{for large } t. \tag{0.5}$$
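As a rough numerical illustration of the heuristic (0.5), one can solve $s = t\,\sigma(s)$ by bisection. The sketch below assumes $\sigma(\ell) = (\log \ell)^{-2/d - 1}$ with constant 1, which is only the order of magnitude this article identifies for the fluctuations; the exact prefactor is not specified, and the function names are hypothetical.

```python
import math

def sigma(ell, d):
    """Assumed fluctuation size (log ell)^(-2/d - 1); the constant 1 is a placeholder."""
    return math.log(ell) ** (-(2.0 / d + 1.0))

def typical_scale(t, d, iterations=200):
    """Solve s = t * sigma(s) on [e, t] by bisection (geometric midpoints).
    Requires t > e, so that s - t*sigma(s) changes sign on the bracket."""
    lo, hi = math.e, t
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if mid - t * sigma(mid, d) < 0.0:
            lo = mid   # fixed point lies to the right of mid
        else:
            hi = mid   # fixed point lies to the left of mid
    return math.sqrt(lo * hi)

if __name__ == "__main__":
    for t in (1e6, 1e9, 1e12):
        print(t, typical_scale(t, d=2))
```

Bisection is used because $s - t\,\sigma(s)$ is increasing in $s$ on the bracket, so the root is unique there; under this assumed $\sigma$ the scale $s(t)$ grows slightly slower than $t$, by a power of $\log t$.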
This "Ansatz" roughly corresponds to balancing out the extra cost of moving at distances of order $s(t)$ and the associated gain induced by the decrease of the principal eigenvalue. We refer the reader to the beginning of Sect. 5 for further comments on the variational problem (0.3).

Let us now explain how the article is organized. Section 1 introduces some further notations and recalls some results which we shall need here. In Sect. 2, we derive confidence intervals on $\lambda_\omega((-\ell, \ell)^d)$, which show (see Theorem 2.2) that:
$$\mathbb{P}\Big[\lambda_\omega((-\ell, \ell)^d) \in \Big[\frac{c(d,\nu)}{(\log \ell)^{2/d}} - \frac{1}{(\log \ell)^{\frac{2+\chi}{d}}},\ \frac{c(d,\nu)}{(\log \ell)^{2/d}} + \frac{\gamma}{(\log \ell)^{3/d}}\Big]\Big] \ge 1 - \exp\{-(\log \ell)^b\}, \tag{0.6}$$
as $\ell \to \infty$, with $\chi$, $\gamma$, $b$ suitable positive constants. Here we use the notation
$$c(d, \nu) = \frac{\lambda_d}{R_0^2}, \quad \text{and} \quad R_0 = \Big(\frac{d}{\nu \omega_d}\Big)^{1/d}, \tag{0.7}$$
where $\lambda_d = \lambda_{-\frac{1}{2}\Delta}(B(0, 1))$ stands for the principal Dirichlet eigenvalue of $-\frac{1}{2}\Delta$ in $B(0, 1)$ and $\omega_d = |B(0, 1)|$ for its volume. These estimates come as a quick application of the version of the method of enlargement of obstacles developed in [18] and for instance improve recent results of Beliaev and Yurinsky [2], in the case of "hard spherical obstacles".

In Sect. 3 we derive less quantitative estimates which indicate that the size of fluctuations of the variables $\lambda_\omega((-\ell, \ell)^d)$ around a median, in a suitable sense, has order $(\log \ell)^{-\frac{2}{d} - 1}$, for large $\ell$, see Theorem 3.2, Corollaries 3.4 and 3.5. To appreciate the significance of this order of magnitude and of the $c(d, \nu)(\log \ell)^{-2/d}$ in (0.6), consider $v(\ell, \omega)$ the volume of the largest ball in $(-\ell, \ell)^d$ receiving no point of the cloud $\omega$ and
$$\tilde\lambda_\omega(\ell) = \lambda_d\, (\omega_d / v(\ell, \omega))^{2/d}, \tag{0.8}$$
its associated principal Dirichlet eigenvalue for $-\frac{1}{2}\Delta$. It is known (see Janson [12], Hall [8], Theorem 2), that when $\ell$ is large, $v(\ell, \omega)$ has fluctuations of order 1 around a median of size $(d \log \ell + (d - 1) \log\log \ell)/\nu$. This fact displays the rigidity of the behavior of $v(\ell, \omega)$. Moreover it is easily inferred from these references that $\tilde\lambda_\omega(\ell)$ has fluctuations of precisely the above mentioned order $(\log \ell)^{-\frac{2}{d} - 1}$, when $\ell$ is large. This indicates a surprising stiffness of our random eigenvalues.

In Sect. 4 we discuss geometric controls in the two dimensional situation. For instance we show in Theorem 4.6 that with high $\mathbb{P}$-probability $\lambda_\omega((-\ell, \ell)^2)$ is close to:
$$\lambda_\omega\big(B(X_\ell(\omega), (R_0 + (\log \ell)^{-\chi'}) \sqrt{\log \ell})\big), \tag{0.9}$$
where the random centering $X_\ell(\omega)$ of the disc which appears in (0.9) is such that:
$$B(X_\ell(\omega), (R_0 - (\log \ell)^{-\chi'}) \sqrt{\log \ell}) \tag{0.10}$$
receives no point of $\omega$, and
$$B(X_\ell(\omega), 2R_0 \sqrt{\log \ell}) \subseteq (-\ell, \ell)^2. \tag{0.11}$$
By a slight variation of our arguments, we study, when $\varepsilon \to 0$, the conditional distribution
$$\mathbb{P}_\varepsilon[\ \cdot\ /\ \lambda_\varepsilon((-1, 1)^2) \le c] \quad \text{for fixed } c > \lambda_2, \tag{0.12}$$
where $\mathbb{P}_\varepsilon$ stands for the Poisson law with intensity $\nu \varepsilon^{-d}$ ($d = 2$ in (0.12)), and where for an open set $U \subseteq \mathbb{R}^d$:
$$\lambda_\varepsilon(U) = \text{the principal Dirichlet eigenvalue of } -\tfrac{1}{2}\Delta + \sum_i \varepsilon^{-2} W\Big(\frac{\cdot - x_i}{\varepsilon}\Big) \text{ in } U. \tag{0.13}$$
We show, somewhat in the vein of the Wulff droplet phenomenon, see [6, 10], that with high conditional probability an empty disc of radius comparable to $R$, where $\lambda_2 R^{-2} = c$, occurs within $(-1, 1)^2$, see Theorem 4.7.

In Sect. 5, we apply the results of Sects. 2 and 3 to study the variational problem (0.3). Although less complete than in the one dimensional case, our results display the importance of the fluctuation scale $(\log \ell)^{-\frac{2}{d} - 1}$ found in Sect. 3.

The Appendix proves some results which are only used in Sect. 4. In particular we provide a quantitative version of the Faber-Krahn inequality in the two dimensional case, and a lower bound on the shift of principal eigenvalue due to the presence of one soft obstacle in a disc. The quantitative Faber-Krahn inequality is close to the results in Sect. 4 of Melas [14] or in Sect. 3 of [16]. With the help of the isoperimetric inequality from Hall [9], it should be possible to obtain qualitatively similar results in a $d \ge 3$ setting, and thus extend part of the results of Section 4 to a $d \ge 3$ situation. We do not discuss these generalizations here for simplicity.

Our results are written for the case of soft obstacles, but can routinely be adapted to the case of hard obstacles (when one imposes Dirichlet conditions on $x_i + K$, with $K$ a fixed non polar compact set, for instance $K = \bar B(0, a)$, $a > 0$). Let us also point out that they merely rely on [18] (and not on [16, 17]). Finally we thank A. Barbour for mentioning reference [12] and M. van den Berg for mentioning references [9] and [14].
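1. Setting and Notations

The goal of this section is to introduce some further notations and recall some of the results of [18], which we shall use here. For $\ell > 1$, we shall write
$$\varepsilon = (\log \ell)^{-1/d}. \tag{1.1}$$
It then follows from a straightforward scaling argument (see (A.3) of [18]) that:
$$\lambda_\omega((-\ell, \ell)^d) \text{ has same distribution under } \mathbb{P} \text{ as } \varepsilon^2 \lambda_\varepsilon(T) \text{ under } \mathbb{P}_\varepsilon, \tag{1.2}$$
where $\mathbb{P}_\varepsilon$ is defined after (0.12), and $T$ stands for the open set
$$T = (-\ell (\log \ell)^{-1/d}, \ell (\log \ell)^{-1/d})^d = (-\varepsilon e^{\varepsilon^{-d}}, \varepsilon e^{\varepsilon^{-d}})^d. \tag{1.3}$$
With somewhat different notations from [18], Sect. III, we define
$$\mathcal{C}_\ell = \text{the collection of blocks } B = q + (0, [\gamma_1 \log\log \ell])^d,\ q \in \mathbb{Z}^d, \text{ with } B \cap T \ne \emptyset, \tag{1.4}$$
here $\gamma_1(d, \nu) > 0$ is a constant determined in (3.14) of [18]. It was proven in Proposition 3.2 of [18] that
$$\mathbb{P}_\varepsilon[F] \ge 1 - 3\ell^{-d}, \quad \text{when } \ell \text{ is large}, \tag{1.5}$$
where
$$F = \Big\{ c(d, \nu) + \gamma_2\, \varepsilon \ge \inf_{B \in \mathcal{C}_\ell} \lambda_\varepsilon(B \cap T) \ge \lambda_\varepsilon(T) \ge \inf_{B \in \mathcal{C}_\ell} \lambda_\varepsilon(B \cap T) - \varepsilon^{2d} \Big\}, \tag{1.6}$$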
with the notation of (0.7) and $\gamma_2(d, \nu, W) > 0$, a constant defined in Proposition 3.2 of [18]. We shall make repeated use of (1.5) in this article. It indicates that $\lambda_\varepsilon(T)$ tends to behave like a minimum of the not too dependent variables $\lambda_\varepsilon(B \cap T)$, $B \in \mathcal{C}_\ell$.

We shall now recall some key estimates and elements of the method of enlargement of obstacles of [18]. The method is based on the construction, for $\varepsilon \in (0, 1)$ and $\omega \in \Omega$, of two disjoint subsets of $\mathbb{R}^d$, $\mathcal{D}$ and $\mathcal{B}$ (dropping the $\varepsilon$, $\omega$ dependence in the notation). The set $\mathcal{D}$ (as "density") is where one "enlarges" or "solidifies" the obstacles, see (1.15) below, whereas the set $\mathcal{B}$ is a "bad set", where some obstacles are present, but are not "enlarged". The construction of Sect. I and II in [18] goes as follows. One chooses
$$0 < \alpha < \gamma < \beta < 1, \tag{1.7}$$
which correspond to rough scales $1 \gg \varepsilon^\alpha \gg \varepsilon^\gamma \gg \varepsilon^\beta \gg \varepsilon$, and
$$L \ge 2, \tag{1.8}$$
an integer. One has an $L$-adic decomposition of $\mathbb{R}^d$ into boxes of size $L^{-k}$, $k \ge 0$ (see Sect. I of [18]). The scales of interest are $L^{-n_\alpha(\varepsilon)}$, $L^{-n_\gamma(\varepsilon)}$, $L^{-n_\beta(\varepsilon)}$, where $n_\alpha$, $n_\gamma$, $n_\beta$ are determined by
$$L^{-n_\alpha - 1} \le \varepsilon^\alpha < L^{-n_\alpha}, \tag{1.9}$$
and similar inequalities for $n_\gamma$, $n_\beta$. Finally one has a parameter:
$$\delta > 0. \tag{1.10}$$
The set $\mathcal{D}$ is then the union of all "density boxes", that is of all boxes of size $L^{-n_\gamma}$ which fulfill a certain quantitative Wiener criterion, see (1.15) of [18]. This condition corresponds to a non degeneracy in average (role of $\delta$), between scales $L^{-n_\gamma}$ and $L^{-n_\alpha}$, of the capacity in scale $L^{-k}$ of a certain skeleton of the soft obstacles. By skeleton we mean that the construction only involves the set $\bigcup_i \bar B(x_i, a)$, where
$$a(W) = \inf\{r > 0,\ W = 0 \text{ on } \bar B(0, r)^c\} > 0, \tag{1.11}$$
and not the detailed structure of $W$. On the other hand $\mathcal{B}$ is the union of "bad boxes", that is the boxes of size $L^{-n_\beta}$ which receive a point of $\omega$ and are contained in a box of size $L^{-n_\gamma}$ where the Wiener criterion breaks down. As a result of this construction, we obtain sets $\mathcal{D}$ and $\mathcal{B}$, which depend measurably on $\omega \in \Omega$, and satisfy (see (2.34) of [18]):
$$\mathcal{D} \cap \mathcal{B} = \emptyset, \tag{1.12}$$
$$\omega(\mathbb{R}^d \setminus (\mathcal{D} \cup \mathcal{B})) = 0, \tag{1.13}$$
$$\text{for each box } C_q = q + [0, 1)^d,\ q \in \mathbb{Z}^d, \text{ the sets } \mathcal{D} \cap C_q \text{ (resp. } \mathcal{B} \cap C_q) \text{ can take no more than } 2^{\varepsilon^{-d\gamma}} \text{ (resp. } 2^{\varepsilon^{-d\beta}}) \text{ different possible shapes as } \omega \text{ varies over } \Omega. \tag{1.14}$$
We shall now state the two estimates we shall use in the sequel. The values of $\rho_0$ and $\kappa_0$ are given in (1.17), (1.19) below.

Spectral control (Theorem 1.2 of [18]): For $\rho \in (0, \rho_0)$ and $M > 0$,
$$\lim_{\varepsilon \to 0}\ \sup_{\omega, U}\ \varepsilon^{-\rho} \big(\lambda_\varepsilon(U \setminus \bar{\mathcal{D}}) \wedge M - \lambda_\varepsilon(U) \wedge M\big) = 0, \tag{1.15}$$
here $\omega$ runs over $\Omega$ and $U$ over the open subsets of $\mathbb{R}^d$. The replacement of $U$ by $U \setminus \bar{\mathcal{D}}$ corresponds to "solidifying the set $\bar{\mathcal{D}}$ as a hard obstacle".

Volume controls (Theorem 2.5 of [18]): There is an $L_0(d) \ge 2$ and $\delta_0(d, L) > 0$, such that for $L \ge L_0(d)$ and $\delta \in (0, \delta_0(d, L))$:
$$\lim_{\varepsilon \to 0}\ \sup_{q \in \mathbb{Z}^d,\, \omega \in \Omega}\ \varepsilon^{-\kappa_0} |\mathcal{B} \cap C_q| < \infty, \tag{1.16}$$
with the notation of (1.14). The constant $\rho_0$ is equal to
$$\rho_0 = \delta\, c_1\, \frac{(\gamma - \alpha)}{(d + 2) \log L}, \tag{1.17}$$
where $c_1(d, W) > 0$ is defined in (1.21), (1.22) of [18] and has the scaling invariance property:
$$c_1(d, W(\cdot)) = c_1\Big(d, \eta^{-2} W\Big(\frac{\cdot}{\eta}\Big)\Big), \quad \text{for } \eta > 0. \tag{1.18}$$
(In the case of hard obstacles given by Dirichlet conditions on $\bar B(x_i, a)$, formally corresponding to $W(\cdot) = \infty\, 1\{|\cdot| \le a\}$, $c_1$ only depends on $d$.) As for the constant $\kappa_0$ (see (2.47) of [18]), it equals
δ log(3d + 1) − (d − 2)(1 − β), when d ≥ 3 , κ0 = 1 − (γ − α) 2 − δ0 log L δ log(10 c log L) (1.19) , when d = 2 , 1− (γ − α) 2 − δ0 log L with c > 0, a numerical constant .

Let us finally provide some comments on how we shall choose the parameters we have introduced. For the applications discussed in this article, with the sole exception of Theorem 4.7, we shall pick, see (0.7),
$$M = 2c(d, \nu). \tag{1.20}$$
We shall also say that the parameters $\alpha$, $\gamma$, $\beta$, $\delta$, $L$, $\rho$, $\kappa$ are admissible, if (1.7) holds,
$$L \ge L_0(d), \quad 0 < \delta < \delta_0(d, L), \tag{1.21}$$
$$\kappa_0 > 0, \quad \kappa \in (0, \kappa_0), \quad \rho \in (0, \rho_0). \tag{1.22}$$
(In view of (1.19) the condition $\kappa_0 > 0$ is certainly fulfilled if $L$ is large enough and $\beta$ is close to 1.)

2. Confidence Intervals

We shall now prove in this section quantitative confidence intervals for $\lambda_\omega((-\ell, \ell)^d)$ under $\mathbb{P}$, as $\ell$ becomes large. We begin with a volume estimate for $|B \setminus (\mathcal{D} \cup \mathcal{B})|$, $B \in \mathcal{C}_\ell$, which is a simple consequence of (1.13), (1.14).

Lemma 2.1. Consider an admissible choice of parameters (see the end of Sect. 1), $\beta' \in (\beta, 1)$, and define
$$G = \Big\{ \omega \in \Omega,\ \sup_{B \in \mathcal{C}_\ell} |B \setminus (\mathcal{D} \cup \mathcal{B})| \le \frac{d}{\nu} + \varepsilon^{d(1 - \beta')} \Big\}, \tag{2.1}$$
then for large $\ell$,
$$\mathbb{P}_\varepsilon[G] \ge 1 - \exp\Big\{ -\frac{\nu}{2} (\log \ell)^{\beta'} \Big\}. \tag{2.2}$$
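Proof. Consider $B_0 = (0, [\gamma_1 \log\log \ell])^d$. It follows from (1.14) that:
$$B_0 \setminus (\mathcal{D} \cup \mathcal{B}) \text{ can take at most } 2^{2\gamma_1^d (\log\log \ell)^d \varepsilon^{-d\beta}} \text{ distinct possible shapes as } \omega \text{ varies}. \tag{2.3}$$
Therefore in view of (1.13), estimating the number of shapes with (2.3), and the probability that a given shape of volume larger than $\frac{d}{\nu} + \varepsilon^{d(1-\beta')}$ receives no point of $\omega$, we find:
$$\mathbb{P}_\varepsilon\Big[ |B_0 \setminus (\mathcal{D} \cup \mathcal{B})| \ge \frac{d}{\nu} + \varepsilon^{d(1-\beta')} \Big] \le 2^{2\gamma_1^d (\log\log \ell)^d \varepsilon^{-d\beta}} \exp\{ -d \varepsilon^{-d} - \nu \varepsilon^{-d\beta'} \}. \tag{2.4}$$
Thus for large $\ell$,
$$\mathbb{P}_\varepsilon[G^c] \le (4\ell\varepsilon)^d\, \mathbb{P}_\varepsilon\Big[ |B_0 \setminus (\mathcal{D} \cup \mathcal{B})| \ge \frac{d}{\nu} + \varepsilon^{d(1-\beta')} \Big] \le 4^d \varepsilon^d \exp\{ 2\gamma_1^d (\log\log \ell)^d \varepsilon^{-d\beta} - \nu \varepsilon^{-d\beta'} \} \le \exp\Big\{ -\frac{\nu}{2} \varepsilon^{-d\beta'} \Big\}.$$
This proves our claim.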
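We now introduce the exponent
$$\chi_0(d, W) = \sup\{ \rho_0 \wedge \kappa_0 \wedge d(1 - \beta);\ \text{admissible parameters} \} \in (0, d]. \tag{2.5}$$
It is plain that $\chi_0$ has the same scaling invariance as $c_1$:
$$\chi_0(d, W(\cdot)) = \chi_0\Big(d, \eta^{-2} W\Big(\frac{\cdot}{\eta}\Big)\Big), \quad \eta > 0. \tag{2.6}$$
Our main result in this section is:

Theorem 2.2. If $\chi \in (0, \chi_0)$, then there exists $b > 0$, such that for large $\ell$:
$$\mathbb{P}\Big[ \frac{c(d,\nu)}{(\log \ell)^{2/d}} - \frac{1}{(\log \ell)^{\frac{2+\chi}{d}}} \le \lambda_\omega((-\ell, \ell)^d) \le \frac{c(d,\nu)}{(\log \ell)^{2/d}} + \frac{\gamma_2}{(\log \ell)^{3/d}} \Big] \ge 1 - \exp\{ -(\log \ell)^b \}. \tag{2.7}$$
Proof. Consider an admissible choice of parameters and $\beta' \in (\beta, 1)$ so that: $\chi < \rho \wedge \kappa \wedge d(1 - \beta')$. As a result of (1.5) and (2.2), we see that for large $\ell$:
$$\mathbb{P}_\varepsilon[F \cap G] \ge 1 - \exp\{ -(\log \ell)^\beta \}. \tag{2.8}$$
We know from (1.6) that on the event $F$, $\lambda_\varepsilon(T) \le c(d, \nu) + \gamma_2\, \varepsilon$; we shall now see that when $\ell$ is large, on the event $F \cap G$, $\lambda_\varepsilon(T) \ge c(d, \nu) - \varepsilon^\chi$. Our claim (2.7) will then follow, thanks to the scaling argument (1.2). Define $\tilde{\mathcal{C}}_\ell$ to be the subcollection of $\mathcal{C}_\ell$ of boxes $B$ such that:
$$c(d, \nu) + \gamma_2\, \varepsilon \ge \lambda_\varepsilon(B \cap T). \tag{2.9}$$
For large $\ell$, when $\omega \in F \cap G$, $\tilde{\mathcal{C}}_\ell$ is not empty, and $c(d, \nu) + \gamma_2\, \varepsilon + \varepsilon^\rho < M = 2c(d, \nu)$, see (1.20). Thus using (1.15), for $B \in \tilde{\mathcal{C}}_\ell$:
$$\lambda_\varepsilon(B \cap T) \ge \lambda_\varepsilon(B \cap T \setminus \bar{\mathcal{D}}) \wedge M - \varepsilon^\rho = \lambda_\varepsilon(B \cap T \setminus \bar{\mathcal{D}}) - \varepsilon^\rho. \tag{2.10}$$
On the other hand, for large $\ell$, by (1.16) and (2.1):
$$|B \cap T \setminus \bar{\mathcal{D}}| \le |B \cap T \setminus (\mathcal{D} \cup \mathcal{B})| + |B \cap \mathcal{B}| \le \frac{d}{\nu} + \varepsilon^{d(1-\beta')} + \varepsilon^\kappa \gamma_1^d (\log\log \ell)^d. \tag{2.11}$$
If we now use Faber-Krahn's inequality (i.e. a ball of volume $|U|$ has a lower principal Dirichlet eigenvalue for $-\frac{1}{2}\Delta$ than $U$), see Chavel [4], p. 87-92,
$$\lambda_\varepsilon(B \cap T \setminus \bar{\mathcal{D}}) \ge \lambda_{-\frac{1}{2}\Delta}(B \cap T \setminus \bar{\mathcal{D}}) \ge \lambda_d\, (\omega_d / |B \cap T \setminus \bar{\mathcal{D}}|)^{2/d} \ge \lambda_d \Big[ \omega_d \Big/ \Big( \frac{d}{\nu} + \varepsilon^{d(1-\beta')} + \varepsilon^\kappa \gamma_1^d (\log\log \ell)^d \Big) \Big]^{2/d}. \tag{2.12}$$
Since for large $\ell$ we also have for $\omega \in F \cap G$, by (1.6):
$$\lambda_\varepsilon(T) \ge \inf_{B \in \tilde{\mathcal{C}}_\ell} \lambda_\varepsilon(B \cap T) - \varepsilon^{2d},$$
combining (2.10), (2.12), we see that for large $\ell$ and $\omega \in F \cap G$:
$$\lambda_\varepsilon(T) \ge \lambda_d \Big( \frac{\nu \omega_d}{d} \Big)^{2/d} - \varepsilon^\chi.$$
Our claim now follows from (0.7) and (1.2).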
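For orientation, the constants of (0.7) and the interval of (2.7) can be evaluated numerically in low dimension. The sketch below is only illustrative: it covers $d = 2$ and $d = 3$, where the principal Dirichlet eigenvalue of $-\frac{1}{2}\Delta$ on the unit ball is $j_{0,1}^2/2$ and $\pi^2/2$ respectively, and it treats $\chi$ and $\gamma_2$ as placeholder inputs, since the text only asserts their existence. All function names are hypothetical.

```python
import math

J01 = 2.404825557695773  # first positive zero of the Bessel function J_0

def unit_ball_volume(d):
    """omega_d = |B(0,1)| in R^d."""
    return math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0)

def lambda_d(d):
    """Principal Dirichlet eigenvalue of -(1/2)Laplacian on the unit ball, d = 2 or 3."""
    if d == 2:
        return J01 ** 2 / 2.0
    if d == 3:
        return math.pi ** 2 / 2.0
    raise ValueError("only d = 2 and d = 3 are tabulated in this sketch")

def R0(d, nu):
    """R_0 = (d / (nu * omega_d))^(1/d), see (0.7)."""
    return (d / (nu * unit_ball_volume(d))) ** (1.0 / d)

def c_const(d, nu):
    """c(d, nu) = lambda_d / R_0^2, see (0.7)."""
    return lambda_d(d) / R0(d, nu) ** 2

def confidence_interval(ell, d, nu, chi, gamma2):
    """Endpoints of the interval in (2.7); chi and gamma2 are placeholder values."""
    x = math.log(ell)
    centre = c_const(d, nu) / x ** (2.0 / d)
    return centre - x ** (-(2.0 + chi) / d), centre + gamma2 * x ** (-3.0 / d)

if __name__ == "__main__":
    print(c_const(2, 1.0))
    print(confidence_interval(1e6, 2, 1.0, chi=0.5, gamma2=1.0))
```

Both corrections are of smaller order than the leading term $c(d,\nu)(\log \ell)^{-2/d}$, which is the eigenvalue of an empty ball of radius $R_0 (\log \ell)^{1/d}$.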
Corollary 2.3. If $\chi \in (0, \chi_0)$, then
$$\mathbb{P}\text{-a.s. for large } \ell, \quad \frac{c(d,\nu)}{(\log \ell)^{2/d}} - \frac{1}{(\log \ell)^{\frac{2+\chi}{d}}} \le \lambda_\omega((-\ell, \ell)^d) \le \frac{c(d,\nu)}{(\log \ell)^{2/d}} + \frac{2\gamma_2}{(\log \ell)^{3/d}}. \tag{2.13}$$
Proof. Observe that $\lambda_\omega((-\ell, \ell)^d)$ is a decreasing function of $\ell$. Then (2.13) is an easy consequence of Borel-Cantelli's lemma and (2.7) applied to blocks $(-2^k, 2^k)^d$, $k \ge 1$, with an exponent $\chi' \in (\chi, \chi_0)$ (one uses also $\chi_0 \le d$, and the constant 2 in front of $\gamma_2$ takes care of the interpolation of $\ell$ between $2^k$ and $2^{k+1}$).
3. Fluctuations of Principal Eigenvalues

We shall develop in this section estimates on the size of fluctuations of $\lambda_\omega(B)$ around a median, when $B$ is a "typical large block". These estimates are less quantitative than the controls derived in the previous section, however they single out the "correct size of fluctuations". The strategy to derive our results is as follows. We compare the distribution of $\lambda_\omega(4B)$ to the distribution of a minimum of a certain number of iid copies of $\lambda_\omega(B)$. This provides estimates on the downward shift of the median of $\lambda_\omega(4B)$ relative to that of $\lambda_\omega(B)$, in terms of the size of fluctuations of $\lambda_\omega(B)$ around this median. We shall then pile up these estimates over several scales and produce upper and lower estimates for the discrepancy of the medians of $\lambda_\omega(B)$ and $\lambda_\omega(2^k B)$, when $B$ and $2^k$ are large and suitably chosen. On the other hand this discrepancy can be estimated with the help of the confidence intervals from Theorem 2.2. This combination will produce our desired estimates on the size of fluctuation (see Theorem 3.2).

We first need some notations. For $u \in (0, 1)$ and $\ell > 0$, we consider the $u$-quantile of the distribution of $\lambda_\omega((-\ell, \ell)^d)$:
$$m_\ell(u) = \inf\{ \lambda > 0,\ \mathbb{P}[\lambda_\omega((-\ell, \ell)^d) \le \lambda] \ge u \}, \tag{3.1}$$
so that
$$\mathbb{P}[\lambda_\omega((-\ell, \ell)^d) \le m_\ell(u)] \ge u, \quad \mathbb{P}[\lambda_\omega((-\ell, \ell)^d) \ge m_\ell(u)] \ge 1 - u. \tag{3.2}$$
We also introduce the set
$$S = \{ \bar u = (u_0, u_1, u_2) \in (0, 1)^3;\ 1 - u_0 = (1 - u_1)^{2^d},\ 1 - u_0 = (1 - u_2)^{2 \cdot 3^d} \}. \tag{3.3}$$
Clearly, when $\bar u = (u_0, u_1, u_2) \in S$, $0 < u_2 < u_1 < u_0 < 1$. We now proceed with our basic estimates on the downward shift of medians.

Lemma 3.1. Let $K$ be a compact subset of $S$. Then for large $\ell$ and $\bar u \in K$:
$$m_{4\ell}(u_0) \le m_\ell(u_1), \tag{3.4}$$
and
$$m_{2\ell}(u_0) \ge m_\ell(u_2) - (\log 2\ell)^{-2 - \frac{2}{d}}. \tag{3.5}$$
Proof. We begin with the proof of (3.4). Observe that $(-4\ell, 4\ell)^d$ contains at least $2^d$ blocks, which are translates of $(-\ell, \ell)^d$ and lie at mutual distance $> 2\ell$. If we assume that
$$\ell > a(W), \tag{3.6}$$
(see (1.11) for the notation), the principal eigenvalues associated to these $2^d$ blocks are iid variables distributed like $\lambda_\omega((-\ell, \ell)^d)$, and their minimum is larger than $\lambda_\omega((-4\ell, 4\ell)^d)$. It follows that for $\bar u \in K$:
$$\mathbb{P}[\lambda_\omega((-4\ell, 4\ell)^d) > m_\ell(u_1)] \le \mathbb{P}[\lambda_\omega((-\ell, \ell)^d) > m_\ell(u_1)]^{2^d} = (1 - \mathbb{P}[\lambda_\omega((-\ell, \ell)^d) \le m_\ell(u_1)])^{2^d} \le 1 - u_0.$$
This now implies our claim (3.4). Let us now prove (3.5). Assume $\ell$ large enough so that
$$\gamma_1 (\log\log 2\ell)(\log 2\ell)^{1/d} < \ell$$
(see (1.4) for the notation). Then any block of the form $[(\log 2\ell)^{1/d} B] \cap (-2\ell, 2\ell)^d$, $B \in \mathcal{C}_{2\ell}$, is included in one of the $3^d$ blocks: $B_v = (-\ell, \ell)^d + \ell v$, $v \in \{-1, 0, 1\}^d$, so that
$$\inf_{B \in \mathcal{C}_{2\ell}} \lambda_\omega([(\log 2\ell)^{1/d} B] \cap (-2\ell, 2\ell)^d) \ge \inf_v \lambda_\omega(B_v). \tag{3.8}$$
Observe now that the $\lambda_\omega(B_v)$ are increasing functions of $\omega \in \Omega$ (for the usual order of (point) measures). The FKG inequality for Poisson point measures (see Janson [11], Lemma 2.2) implies that:
$$\inf_v \lambda_\omega(B_v) \text{ stochastically dominates the minimum of } 3^d \text{ iid copies of } \lambda_\omega((-\ell, \ell)^d). \tag{3.9}$$
If we now use (1.5), (3.8), (3.9), we find
$$\begin{aligned} \mathbb{P}[\lambda_\omega((-2\ell, 2\ell)^d) < m_\ell(u_2) - (\log 2\ell)^{-\frac{2}{d} - 2}] &\le \mathbb{P}\Big[\inf_{B \in \mathcal{C}_{2\ell}} \lambda_\omega([(\log 2\ell)^{1/d} B] \cap (-2\ell, 2\ell)^d) < m_\ell(u_2)\Big] + 3(2\ell)^{-d} \\ &\le \mathbb{P}\Big[\inf_v \lambda_\omega(B_v) < m_\ell(u_2)\Big] + 3(2\ell)^{-d} \\ &\le 1 - (\mathbb{P}[\lambda_\omega((-\ell, \ell)^d) \ge m_\ell(u_2)])^{3^d} + 3(2\ell)^{-d} \\ &\le 1 - (1 - u_2)^{3^d} + 3(2\ell)^{-d} = \frac{u_0}{1 + \sqrt{1 - u_0}} + 3(2\ell)^{-d} < u_0, \end{aligned} \tag{3.10}$$
when
$$3(2\ell)^{-d} < u_0 \sqrt{1 - u_0}\, (1 + \sqrt{1 - u_0})^{-1}. \tag{3.11}$$
This implies (3.5) and finishes the proof of the lemma.
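The triples in the set $S$ of (3.3) are easy to generate from $u_0$ alone, and doing so makes the two identities used in the proof above explicit. The following minimal sketch (function names are hypothetical) computes $u_1$ and $u_2$ from $u_0$ and $d$, and checks that $1 - (1 - u_2)^{3^d} = u_0 / (1 + \sqrt{1 - u_0})$, the equality used in the last line of (3.10).

```python
import math

def quantile_triple(u0, d):
    """Return (u0, u1, u2) in S: 1-u0 = (1-u1)^(2^d) and 1-u0 = (1-u2)^(2*3^d)."""
    u1 = 1.0 - (1.0 - u0) ** (1.0 / (2 ** d))
    u2 = 1.0 - (1.0 - u0) ** (1.0 / (2 * 3 ** d))
    return u0, u1, u2

def min_tail(u, n):
    """P[min of n iid copies > u-quantile] for a continuous law: (1 - u)^n."""
    return (1.0 - u) ** n

if __name__ == "__main__":
    d = 2
    u0, u1, u2 = quantile_triple(0.5, d)
    print(u0, u1, u2)
    # Step behind (3.4): the minimum over 2^d independent blocks.
    print(min_tail(u1, 2 ** d), 1.0 - u0)
    # Step behind (3.10): the minimum over 3^d blocks.
    print(1.0 - min_tail(u2, 3 ** d), u0 / (1.0 + math.sqrt(1.0 - u0)))
```

Each printed pair should agree up to rounding, which is exactly why the exponents $2^d$ and $2 \cdot 3^d$ appear in the definition of $S$.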
Our main result is now:

Theorem 3.2. There exist positive constants $\gamma_3(d, \nu)$, $\gamma_4(d, \nu)$ such that when $K$ is a compact subset of $S$ and $\chi \in (0, \chi_0(d, W) \wedge 1)$,
$$\lim_{k_0 \to \infty}\ \sup_{\bar u \in K}\ \frac{1}{k_1 - k_0} \sum_{k_0 \le k < k_1} (\log 2^k)^{\frac{2}{d} + 1} (m_{2^k}(u_0) - m_{2^k}(u_1)) < \gamma_3, \tag{3.12}$$
$$\lim_{k_0 \to \infty}\ \inf_{\bar u \in K}\ \frac{1}{k_1 - k_0} \sum_{k_0 \le k < k_1} (\log 2^k)^{\frac{2}{d} + 1} (m_{2^k}(u_0) - m_{2^k}(u_2)) > \gamma_4, \tag{3.13}$$
with the notation
$$k_1 = k_0 + [k_0^{1 - \frac{\chi}{d}}]. \tag{3.14}$$
Proof. As a result of (3.4), when $k$ is large, for any $\bar u \in K$:
$$m_{2^k}(u_1) \le m_{2^k}(u_0) \le m_{2^{k-2}}(u_1), \tag{3.15}$$
and therefore:
$$0 \le m_{2^k}(u_0) - m_{2^k}(u_1) \le m_{2^{k-2}}(u_1) - m_{2^k}(u_1).$$
Summing these inequalities over $k \in [k_0, k_1 - 1]$, we find:
$$\sum_{k_0 \le k < k_1} (m_{2^k}(u_0) - m_{2^k}(u_1)) \le m_{2^{k_0 - 2}}(u_1) + m_{2^{k_0 - 1}}(u_1) - m_{2^{k_1 - 2}}(u_1) - m_{2^{k_1 - 1}}(u_1). \tag{3.16}$$
It follows from (2.7) that for large $k$, uniformly in $\bar u \in K$, the right member of (3.16) is smaller than:
$$2 \Big[ \frac{c(d,\nu)}{((k_0 - 2) \log 2)^{2/d}} + \frac{\gamma_2}{((k_0 - 2) \log 2)^{3/d}} - \frac{c(d,\nu)}{(k_1 \log 2)^{2/d}} + \frac{1}{(k_1 \log 2)^{\frac{2+\chi}{d}}} \Big] \le \gamma(d, \nu) \Big[ \frac{k_1 - k_0}{k_0^{\frac{2}{d} + 1}} + \frac{1}{k_0^{\frac{2+\chi}{d}}} \Big] \le \gamma'(d, \nu)\, \frac{k_1 - k_0}{k_0^{\frac{2}{d} + 1}},$$
where we used $\chi < 1$ in the first inequality. This easily implies (3.12). As for (3.13), we know from (3.5) that for large $k_0$ and $\bar u \in K$:
$$m_{2^{k_0}}(u_0) - m_{2^{k_1}}(u_0) = \sum_{k_0 \le k < k_1} (m_{2^k}(u_0) - m_{2^{k+1}}(u_0)) \le \sum_{k_0 \le k < k_1} \big[ m_{2^k}(u_0) - m_{2^k}(u_2) + (\log 2^{k_0})^{-\frac{2}{d} - 2} \big]. \tag{3.17}$$
If we use (2.7), with $\chi < \chi' < \chi_0 \wedge 1$, we also have for large $k_0$:
$$m_{2^{k_0}}(u_0) - m_{2^{k_1}}(u_0) \ge \frac{c(d,\nu)}{(k_0 \log 2)^{2/d}} - \frac{1}{(k_0 \log 2)^{\frac{2+\chi'}{d}}} - \frac{c(d,\nu)}{(k_1 \log 2)^{2/d}} - \frac{\gamma_2}{(k_1 \log 2)^{3/d}} \ge \gamma''(d, \nu)\, \frac{k_1 - k_0}{k_0^{\frac{2}{d} + 1}}. \tag{3.18}$$
Our claim (3.13) now follows from (3.17) and (3.18).
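Remark 3.3. It should be observed that $\gamma_3$ and $\gamma_4$ solely depend on $d$ and $\nu$. This may be seen as an indication that the “limiting behaviour of fluctuations” might not depend on $W(\cdot)$. A simple example of this type of behavior is the case of hard obstacles $x_i + (-a, a)$, when $d = 1$. In this case the length $M_\ell$ of the largest interval in $(-\ell, \ell) \setminus \bigcup_i (x_i - a, x_i + a)$ has fluctuations governed by a double exponential distribution:

\[ P\Big[ M_\ell - \frac{1}{\nu} \log 2\ell\nu + 2a \le x \Big] \underset{\ell \to \infty}{\longrightarrow} \exp\{-e^{-\nu x}\}, \quad x \in \mathbb{R} , \]

and the principal eigenvalue is simply:

\[ \lambda_\omega((-\ell, \ell)) = \frac{\pi^2}{2 M_\ell^2} . \]

One can define a median:

\[ \lambda(\ell) = \frac{(\pi\nu)^2}{2(\log 2\ell\nu - 2a\nu)^2}, \quad \text{so that } P[\lambda_\omega((-\ell, \ell)) \ge \lambda(\ell)] \underset{\ell \to \infty}{\longrightarrow} e^{-1} , \]

and easily check that for $\mu \in \mathbb{R}$,

\[ P\big[(\log \ell)^3 (\lambda_\omega((-\ell, \ell)) - \lambda(\ell)) \ge \mu\big] \underset{\ell \to \infty}{\longrightarrow} \exp\Big\{ -e^{\frac{\mu}{(\nu\pi)^2}} \Big\} . \]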
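The double exponential limit for $M_\ell$ in this hard-obstacle example lends itself to a quick numerical sanity check. The following is a minimal simulation sketch and not part of the original argument: the function names, the parameter values and the use of exponential spacings to generate the Poisson points are illustrative assumptions.

```python
import math
import random


def largest_free_interval(ell, nu, a, rng):
    """Length M_ell of the largest interval of (-ell, ell) left uncovered by the
    obstacles (x_i - a, x_i + a), for Poisson points x_i of intensity nu."""
    best = 0.0
    left = -ell                      # left end of the current free interval
    x = -ell + rng.expovariate(nu)   # Poisson points via exponential spacings
    while x < ell:
        best = max(best, (x - a) - left)
        left = x + a
        x += rng.expovariate(nu)
    return max(best, ell - left)


def empirical_cdf_at(x, ell, nu, a, trials, seed=0):
    """Fraction of samples with M_ell - (1/nu) log(2 ell nu) + 2a <= x."""
    rng = random.Random(seed)
    shift = math.log(2 * ell * nu) / nu - 2 * a
    hits = sum(largest_free_interval(ell, nu, a, rng) - shift <= x
               for _ in range(trials))
    return hits / trials


def gumbel_limit(x, nu):
    """Limiting distribution function exp{-exp(-nu x)} of the remark."""
    return math.exp(-math.exp(-nu * x))
```

For instance, `empirical_cdf_at(0.0, 200.0, 1.0, 0.1, 2000)` can be compared with `gumbel_limit(0.0, 1.0)`, which equals $e^{-1}$; the agreement is only approximate at finite $\ell$, and changing `a` moves the centering but not the limit law.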
In this example, the parameter $a$, which governs the “shape of the hard obstacles”, affects the behavior of the median, but not the limiting behavior of fluctuations.

Corollary 3.4. For any $I = [v, u] \subset (0, 1)$, there exists a constant $\Gamma(I, d, \nu) > 0$, such that for $\chi \in (0, \chi_0 \wedge 1)$, with the notation (3.14):

\[ \limsup_{k_0 \to \infty} \frac{1}{k_1 - k_0} \sum_{k_0 \le k < k_1} (\log 2^k)^{\frac{2}{d}+1} \big(m_{2^k}(u) - m_{2^k}(v)\big) < \Gamma . \tag{3.19} \]

Moreover for any $\gamma > 0$ and $u \in (0, 1)$ (resp. $v \in (0, 1)$) there exist $I = [v, u] \subset (0, 1)$, with $v$ (resp. $u$) depending solely on $\gamma$, $d$, $\nu$, such that for $\chi \in (0, \chi_0 \wedge 1)$:

\[ \liminf_{k_0 \to \infty} \frac{1}{k_1 - k_0} \sum_{k_0 \le k < k_1} (\log 2^k)^{\frac{2}{d}+1} \big(m_{2^k}(u) - m_{2^k}(v)\big) > \gamma . \tag{3.20} \]

Proof. Let us prove (3.19). Given $0 < v < u < 1$, we can choose a finite sequence $\bar u^{(1)}, \dots, \bar u^{(p)}$ in $S$, so that

\[ u = u_0^{(1)} > u_1^{(1)} = u_0^{(2)} > \dots > u_1^{(p-1)} = u_0^{(p)} > u_1^{(p)}, \quad \text{with } u_1^{(p)} < v , \]

and $p$ only depending on $I$ and $d$. Adding the inequalities (3.12) for $\bar u^{(1)}, \dots, \bar u^{(p)}$ proves (3.19), with $\Gamma = p \cdot \gamma_3$. The proof of (3.20) is similar. One now chooses $p$ so that $p\, \gamma_4 > \gamma$, and a sequence $\bar u^{(1)}, \dots, \bar u^{(p)}$ in $S$ with

\[ u_2^{(i)} = u_0^{(i+1)}, \ \text{for } 1 \le i < p, \quad \text{and } u = u_0^{(1)} \ (\text{resp. } v = u_2^{(p)}) . \]

Our claim then follows from (3.13).
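Corollary 3.5. For any $\gamma > 0$, $P$-a.s. for infinitely many $k$,

\[ \lambda_\omega((-2^k, 2^k)^d) > \lambda_\omega((-2^{k+2}, 2^{k+2})^d) + \frac{\gamma}{(\log 2^k)^{\frac{2}{d}+1}} . \tag{3.21} \]

Proof. Consider $\gamma > 0$, and $0 < v < u < 1$, for which (3.20) holds with $2\gamma$ in place of $\gamma$. Define by induction an increasing sequence of integers $k_i$, $i \ge 1$, such that:

\[ 2^{k_i + 3} < (\log 2^{k_{i+1}})^{1/d}, \quad \text{and} \tag{3.22} \]

\[ m_{2^{k_i}}(u) - m_{2^{k_i}}(v) \ge 2\gamma\, (\log 2^{k_i})^{-1-\frac{2}{d}} . \tag{3.23} \]

Then define the events:

\[ A_i = \{\lambda_\omega(B_{2^{k_i}} + 3 \cdot 2^{k_i} e_1) \le m_{2^{k_i}}(v)\} \ \cap \bigcap_{\substack{B \in C_{2^{k_i}} \\ B \cap [-2,2]^d = \emptyset}} \{\lambda_\omega((\log 2^{k_i})^{1/d} B \cap B_{2^{k_i}}) \ge m_{2^{k_i}}(u)\} , \tag{3.24} \]

$i \ge 1$, with $e_1 = (1, 0, \dots, 0) \in \mathbb{R}^d$, the first vector of the canonical basis, and $B_\ell = (-\ell, \ell)^d$, for $\ell > 0$. We easily deduce from (3.22) that for large enough $i_0$, the $A_i$, $i \ge i_0$, are independent, and for $i \ge i_0$,

\[
\begin{aligned}
P(A_i) &= P[\lambda_\omega(B_{2^{k_i}} + 3 \cdot 2^{k_i} e_1) \le m_{2^{k_i}}(v)]\ P[\text{for } B \in C_{2^{k_i}} \text{ with } B \cap [-2,2]^d = \emptyset,\ \lambda_\omega((\log 2^{k_i})^{1/d} B \cap B_{2^{k_i}}) \ge m_{2^{k_i}}(u)] \\
&\ge P[\lambda_\omega(B_{2^{k_i}}) \le m_{2^{k_i}}(v)]\ P[\lambda_\omega(B_{2^{k_i}}) \ge m_{2^{k_i}}(u)] \ge v \times (1 - u) > 0 .
\end{aligned} \tag{3.25}
\]

It follows from Borel Cantelli's lemma that:

\[ P[\limsup A_i] = 1 . \tag{3.26} \]

With a second application of Borel Cantelli's lemma and (1.5) we also have $P$-a.s. for large $k$,

\[ \lambda_\omega(B_{2^k}) \ge \inf_{B \in C_{2^k}} \lambda_\omega((\log 2^k)^{1/d} B \cap B_{2^k}) - (\log 2^k)^{-(\frac{2}{d}+2)} . \tag{3.27} \]

Moreover, when $i$ is large enough, $B \in C_{2^{k_i}}$ and $B \cap [-2, 2]^d \ne \emptyset$, implies:

\[ (\log 2^{k_i})^{1/d} B \subseteq \big(-(\log 2^{k_i})^{2/d}, (\log 2^{k_i})^{2/d}\big)^d , \]

(see (1.4)), so that taking (2.13) into account, $P$-a.s. for large $i$, for any $B \in C_{2^{k_i}}$ with $B \cap [-2, 2]^d \ne \emptyset$,

\[ \lambda_\omega((\log 2^{k_i})^{1/d} B) \ge \frac{c(d, \nu)}{2} \Big[ \frac{2}{d} \log (k_i \log 2) \Big]^{-2/d} , \tag{3.28} \]

and this last quantity is of course for large $i$ bigger than:

\[ m_{2^{k_i}}(u) \sim c(d, \nu)\, (k_i \log 2)^{-2/d} , \ \text{see (2.7)} . \]

Observe that clearly $A_i \subseteq \{\lambda_\omega(B_{2^{k_i + 2}}) \le m_{2^{k_i}}(v)\}$. It now follows from (3.26), (3.27), (3.28) that:

\[ P[\limsup \widetilde A_i] = 1, \ \text{where} \tag{3.29} \]

\[ \widetilde A_i = \{\lambda_\omega(B_{2^{k_i + 2}}) \le m_{2^{k_i}}(v)\} \cap \{\lambda_\omega(B_{2^{k_i}}) \ge m_{2^{k_i}}(u) - (\log 2^{k_i})^{-\frac{2}{d}-2}\} . \tag{3.30} \]

In view of (3.23), this implies our claim (3.21).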
4. Geometric Controls

We shall now derive geometric controls in the two dimensional situation. We shall for instance show in Theorem 4.6 that with high $P$-probability, $\lambda_\omega((-\ell, \ell)^2)$, when $\ell$ is large, is close to $\lambda_\omega(D)$, where $D$ is some open disc inside $(-\ell, \ell)^2$ with radius of order $R_0 (\log \ell)^{1/2}$, and $D$ receives almost no point of $\omega$. With a slight variation of the argument, we shall also prove a “Wulff droplet” type result in Theorem 4.7 below. Indeed we shall show that conditional on $\lambda_\varepsilon((-1, 1)^2) \le c$, with $c > \lambda_2$, there is with high $P_\varepsilon$ probability an empty disc of radius close to $R$ inside $(-1, 1)^2$, where $R$ is chosen so that $\lambda_2 / R^2 = c$. Let us also mention that with the help of the isoperimetric inequality of Hall [9], one should be able to extend Theorem A.1 in a suitable form to a $d \ge 3$ setting, and consequently part of the results presented here. We first define the exponent ($d = 2$):

χ1(W) = (1/6) sup{1 ∧ 6α ∧ ρ0 ∧ 2(1 − β) ∧ κ0; over admissible parameters} ∈ (0, 1/6].    (4.1)

$\chi_1$ has the same scaling invariance as $\chi_0$ in (2.6). Our strategy is to analyse the consequences of the occurrence of the event $F \cap G$, see (1.6), (2.1).

Proposition 4.1. Consider $\chi \in (0, \chi_1)$, a choice of admissible parameters and $\beta' \in (\beta, 1)$, with

\[ \chi < (1 \wedge 6\alpha \wedge \rho \wedge 2(1 - \beta') \wedge \kappa)/6 , \tag{4.2} \]

then for large $\ell$, for any $\omega \in F \cap G$ and $B \in \widetilde C_\ell$, there exists a disc $\widetilde D$ with radius

\[ R = R_0 + \varepsilon^{\chi}, \ \text{where } R_0 = \sqrt{\frac{2}{\pi\nu}}, \ \text{such that} \tag{4.3} \]

\[ |B \cap T \setminus (\widetilde D \cup \mathcal{D})| \le \varepsilon^{2\chi} . \tag{4.4} \]
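(We use here the notations of Sects. 1 and 2).

Proof. If $\chi' \in (\chi, (1 \wedge 6\alpha \wedge \rho \wedge 2(1 - \beta') \wedge \kappa)/6)$, it follows from (1.6), (2.10), (2.11) that for large $\ell$, any $\omega \in F \cap G$ and $B \in \widetilde C_\ell$:

\[ \lambda_\varepsilon(B \cap T \setminus \mathcal{D}) \le c(2, \nu) + \varepsilon^{6\chi'} , \ \text{and} \tag{4.5} \]

\[ |B \cap T \setminus \mathcal{D}| \le \frac{2}{\nu} + \varepsilon^{6\chi'} . \tag{4.6} \]

Since $c(2, \nu) = \lambda_2 R_0^{-2}$ and $\frac{2}{\nu} = \pi R_0^2$, our claim follows from Theorem A.1 of the Appendix, with the choice $h = \varepsilon^{2\chi'}$.

Our main technical step is now

Proposition 4.2. Consider $\chi \in (0, \chi_1)$, a choice of admissible parameters and $\beta' \in (\beta, 1)$ so that (4.2) holds. Then for large $\ell$, when $\omega \in F \cap G$ and $B \in \widetilde C_\ell$, there exists a disc $D$ with radius $R$ (see (4.3)) with:

\[ \lambda_\varepsilon(B \cap T \cap D) \le \lambda_\varepsilon(B \cap T) + \varepsilon^{6} . \tag{4.7} \]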
Proof. The argument is essentially a variation on the proof of Proposition 1.4 of [18]. We first choose $\chi' \in (\chi, (1 \wedge 6\alpha \wedge \rho \wedge 2(1 - \beta') \wedge \kappa)/6)$. We can apply Proposition 4.1, with $\chi'$. Thus for large $\ell$, $\omega \in F \cap G$ and $B \in \widetilde C_\ell$, there exists a closed disc $\widetilde D$ with radius

\[ \widetilde R = R_0 + \varepsilon^{\chi'}, \ \text{so that} \tag{4.8} \]

\[ |B \cap T \setminus (\widetilde D \cup \mathcal{D})| \le \varepsilon^{2\chi'} . \tag{4.9} \]

We now define the open set

\[ O = B \cap T \setminus \widetilde D . \tag{4.10} \]

Lemma 4.3. There exists a numerical constant $c_2 > 0$, such that when $\ell$ is large, $\omega \in F \cap G$, $B \in \widetilde C_\ell$,

\[ \lambda_\varepsilon(O) \ge c_2\, \varepsilon^{-2\chi'} . \tag{4.11} \]

Proof. The argument is close to the proof of Proposition 1.3 of [18]. It relies on the application of Lemma A.2 of [18]. We first introduce some notations. We let $Z_\cdot$ stand for the canonical two dimensional Brownian motion and $P_x$ for the two dimensional Wiener measure starting from $x \in \mathbb{R}^2$. When $U$ is an open set in $\mathbb{R}^2$, $T_U$ denotes the exit time of $Z_\cdot$ from $U$, whereas for $C$ a closed set in $\mathbb{R}^2$, $H_C$ stands for the entrance time of $Z_\cdot$ in $C$. We define

\[ r = \varepsilon^{\chi'} , \tag{4.12} \]

and introduce the stopping time

\[ S_1 = \inf\{s \ge 0, \ \|Z_s - Z_0\| \ge 3r\}, \ \text{where } \|z\| = \sup_i |z_i|, \ z \in \mathbb{R}^2 . \tag{4.13} \]

Observe that when $\ell$ is large, for $\omega \in F \cap G$, $B \in \widetilde C_\ell$, (4.9) implies that:

\[ |B_\infty(x, r) \cap (O^c \cup \mathcal{D})| \ge |B_\infty(x, r)| - |O \setminus \mathcal{D}| \ge 3\varepsilon^{2\chi'}, \ x \in \mathbb{R}^2 , \tag{4.14} \]

with the notation $B_\infty(x, r) = \{z \in \mathbb{R}^2, \ \|z - x\| < r\}$.

Since $\chi' < \alpha$, it follows, with the notations of Sect. 1, that for large $\ell$:

\[ L^{-n_\alpha} < r . \tag{4.15} \]
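If we now define $H_{n_\alpha}$ as in (4.13) with $3r$ replaced by $L^{-n_\alpha}$, we find for $x \in O$, with $V(\cdot, \omega) = \sum_i \varepsilon^{-2} W\big(\frac{\cdot - x_i}{\varepsilon}\big)$, and $\theta_t$, $t \ge 0$, the canonical shift:

\[
\begin{aligned}
& E_x\Big[S_1 < T_O, \ \exp\Big\{-\int_0^{S_1} V(Z_s, \omega)\,ds\Big\}\Big] \\
& \quad \le P_x[T_{B_\infty(x,2r)} \le H_{\mathcal{D} \cup O^c}] + E_x\Big[H_{\mathcal{D} \cup O^c} < T_{B_\infty(x,2r)} < S_1 < T_O, \ \exp\Big\{-\int_{H_{\mathcal{D} \cup O^c}}^{H_{\mathcal{D} \cup O^c} + H_{n_\alpha} \circ \theta_{H_{\mathcal{D} \cup O^c}}} V(Z_s, \omega)\,ds\Big\}\Big] \\
& \quad \le 1 - P_x[H_{\mathcal{D} \cup O^c} < T_{B_\infty(x,2r)}] + E_x\Big[H_{\mathcal{D} \cup O^c} < T_{B_\infty(x,2r)} \wedge T_O, \ E_{Z_{H_{\mathcal{D} \cup O^c}}}\Big[\exp\Big\{-\int_0^{H_{n_\alpha}} V(Z_s, \omega)\,ds\Big\}\Big]\Big] .
\end{aligned} \tag{4.16}
\]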
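Observe that on $\{H_{\mathcal{D} \cup O^c} < T_{B_\infty(x,2r)} \wedge T_O\}$, $Z_{H_{\mathcal{D} \cup O^c}} \in \mathcal{D}$; so by (1.20) and (1.36) of [18], as soon as $\ell$ is large, for any $\omega \in \Omega$ and $z \in \mathcal{D}$:

\[ E_z\Big[\exp\Big\{-\int_0^{H_{n_\alpha}} V(Z_s, \omega)\,ds\Big\}\Big] \le \frac{1}{2} . \tag{4.17} \]

We thus see that when $\ell$ is large, $\omega \in F \cap G$, $B \in \widetilde C_\ell$, and $x \in O$:

\[ E_x\Big[S_1 < T_O, \ \exp\Big\{-\int_0^{S_1} V(Z_s, \omega)\,ds\Big\}\Big] \le 1 - \frac{1}{2} P_x[H_{\mathcal{D} \cup O^c} < T_{B_\infty(x,2r)}] \le \gamma_6 < 1 , \tag{4.18} \]

with $\gamma_6$ a numerical constant, using scaling and (4.14). We now choose $\lambda = c_2 / r^2$, with $c_2$ a small enough numerical constant such that (using scaling once more):

\[ E_0[\exp\{2\lambda S_1\}] = \bar c_2 < \gamma_6^{-1} . \tag{4.19} \]

If we now let $O$ play the role of $U$ in Lemma A.2 of [18], (A.5), (A.7) are immediately fulfilled. Using (4.18), (4.19), together with Cauchy Schwarz's inequality, we find:

\[
\begin{aligned}
\alpha^2 &\overset{\mathrm{def}}{=} \sup_x E_x\Big[S_1 < T_O, \ \exp\Big\{\lambda S_1 - \int_0^{S_1} V(Z_s, \omega)\,ds\Big\}\Big]^2 \\
&\le \bar c_2 \cdot \sup_{x \in O} E_x\Big[S_1 < T_O, \ \exp\Big\{-2\int_0^{S_1} V(Z_s, \omega)\,ds\Big\}\Big] < 1 .
\end{aligned} \tag{4.20}
\]

Our claim $\lambda_\varepsilon(O) \ge c_2 / r^2$ is now a consequence of (A.8) of [18].

We now proceed with the proof of Proposition 4.2. For large $\ell$, when $\omega \in F \cap G$, $B \in \widetilde C_\ell$, we define $D$ as the disc concentric to $\widetilde D$ with radius $R = R_0 + \varepsilon^{\chi} > \widetilde R$. We shall now apply Theorem A.3 of [18] to control the shift $\lambda_\varepsilon(B \cap T \cap D) - \lambda_\varepsilon(B \cap T)$. In the notations of Theorem A.3 of [18], $B \cap T$ will play the role of $U_2$, whereas $B \cap T \cap D$ will play the role of $U_1$. Recall that $M$ has been chosen in (1.20). When $\ell$ is large enough, it follows from Lemma 4.3 that:

\[ \lambda_\varepsilon(O) \ge c_2\, r^{-2} > 4M . \tag{4.21} \]

If we now define the stopping time

\[ \tau = H_{\widetilde D} , \tag{4.22} \]

then (A.11) of [18] is obviously fulfilled. We then define

\[ \lambda = (\lambda_\varepsilon(T \cap B \cap D) \wedge M - \varepsilon^{6})_+ , \tag{4.23} \]

either $\lambda = 0$, in which case $\lambda_\varepsilon(T \cap B \cap D) \wedge M \le \varepsilon^{6}$, and (4.7) follows, if $\varepsilon^{6} < M$, or $\lambda > 0$, in which case

\[ 0 < \lambda \le \lambda_\varepsilon(T \cap B \cap D) \Big(1 - \frac{\varepsilon^{6}}{M}\Big) . \tag{4.24} \]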
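It now follows from Proposition A.1 of [18] that

\[ A \overset{\mathrm{def}}{=} 1 + \sup_x \int_0^\infty \lambda e^{\lambda u} E_x\Big[T_{T \cap B \cap D} > u, \ \exp\Big\{-\int_0^u V(Z_s, \omega)\,ds\Big\}\Big] du \le K(d = 2) \cdot \Big(\frac{M}{\varepsilon^{6}}\Big)^2 . \tag{4.25} \]

Moreover since $\lambda \le M$, and $\tau \wedge T_{T \cap B} = T_O$, it follows from (4.21) that:

\[ B \overset{\mathrm{def}}{=} \sup_{x \notin T \cap B \cap D} \int_0^\infty \lambda e^{\lambda u} E_x\Big[\tau \wedge T_{T \cap B} > u, \ \exp\Big\{-\int_0^u V(Z_s, \omega)\,ds\Big\}\Big] du < \infty , \]

and (A.12) of [18] holds. Consider then:

\[ C \overset{\mathrm{def}}{=} \sup_{x \notin T \cap B \cap D} E_x\Big[\tau < T_{T \cap B}, \ \exp\Big\{\lambda\tau - \int_0^\tau V(Z_s, \omega)\,ds\Big\}\Big] . \tag{4.26} \]

The above expectation vanishes when $x \notin T \cap B$. It thus suffices to consider $x \in T \cap B \setminus D$. Using Cauchy Schwarz's inequality:

\[
\begin{aligned}
C^2 \le{} & \sup_{x \in T \cap B \setminus D} E_x\Big[\tau < T_{T \cap B}, \ \exp\Big\{2\lambda\tau - \int_0^\tau V(Z_s, \omega)\,ds\Big\}\Big] \\
& \cdot \sup_{x \in T \cap B \setminus D} E_x\Big[\tau < T_{T \cap B}, \ \exp\Big\{-\int_0^\tau V(Z_s, \omega)\,ds\Big\}\Big] .
\end{aligned} \tag{4.27}
\]

If one uses (A.16) of [18], to control the first term in the right member of (4.27), we find

\[
\begin{aligned}
C^2 \le{} & \Big(1 + \sup_x \int_0^\infty 2M e^{2Mu} E_x\Big[T_O > u, \ \exp\Big\{-\int_0^u V(Z_s, \omega)\,ds\Big\}\Big] du\Big) \\
& \cdot \sup_{x \in T \cap B \setminus D} E_x\Big[\tau < T_{T \cap B}, \ \exp\Big\{-\int_0^\tau V(Z_s, \omega)\,ds\Big\}\Big] .
\end{aligned} \tag{4.28}
\]

Now when $x \in T \cap B \setminus D$, the process must perform at least $(m + 1)$ successive displacements at distance $3r = 3\varepsilon^{\chi'}$, before reaching $\widetilde D$, with:

\[ m \overset{\mathrm{def}}{=} \big[(\varepsilon^{\chi} - \varepsilon^{\chi'}) / 3\sqrt{2}\, \varepsilon^{\chi'}\big] = \big[(\varepsilon^{\chi - \chi'} - 1) / 3\sqrt{2}\big] . \tag{4.29} \]

It now follows from a repeated use of the strong Markov property and (4.18) that the second term in the right member of (4.28) is smaller than $\gamma_6^m$, and using (A.2) of [18] and (4.21) to control the first term:

\[ C^2 \le K(d = 2)\, 2^2\, \gamma_6^m, \ \text{so that } A \cdot C \le K(d = 2, M)\, \varepsilon^{-12} \gamma_6^m < 1, \ \text{when } \ell \text{ is large} . \tag{4.30} \]

It now follows from Theorem A.3 of [18] that

\[ \lambda \le \lambda_\varepsilon(B \cap T) \le c(d, \nu) + \gamma_2\, \varepsilon < M , \]

and therefore keeping in mind the definition of $\lambda$ in (4.23)

\[ \lambda_\varepsilon(B \cap T \cap D) \le \lambda_\varepsilon(B \cap T) + \varepsilon^{6} . \]

This finishes the proof of our claim.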
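Our last step before our main result is:

Proposition 4.4. Consider $\chi \in (0, \chi_1)$, a choice of admissible parameters, and $\beta' \in (\beta, 1)$ so that (4.2) holds. Then for large $\ell$, when $\omega \in F \cap G$, there exist $B \in C_\ell$ and two concentric discs $D_i \subseteq D_e$ with respective radii:

\[ R_i = R_0 - \varepsilon^{\chi/4} , \tag{4.31} \]

\[ R_e = R_0 + \varepsilon^{\chi} , \ \text{so that} \tag{4.32} \]

\[ c(2, \nu) + (\gamma_2 + 1)\,\varepsilon \ge \lambda_\varepsilon(B \cap T \cap D_e) \ge \lambda_\varepsilon(T) \ge \lambda_\varepsilon(B \cap T \cap D_e) - 2\varepsilon^{4} , \tag{4.33} \]

\[ \text{no point of } \omega \text{ falls in } D_i . \tag{4.34} \]

Proof. Choose $\chi' \in (\chi, (1 \wedge 6\alpha \wedge \rho \wedge 2(1 - \beta') \wedge \kappa)/6)$; we can apply Proposition 4.2, with $\chi'$. For large $\ell$, $\omega \in F \cap G$, we can find a $B \in \widetilde C_\ell$ and an open disc $D$ with radius $R_0 + \varepsilon^{\chi'}$ such that:

\[ c(2, \nu) + \gamma_2\, \varepsilon \ge \lambda_\varepsilon(B \cap T) \ge \lambda_\varepsilon(T) \ge \lambda_\varepsilon(B \cap T) - \varepsilon^{4} , \ \text{and} \tag{4.35} \]

\[ \lambda_\varepsilon(B \cap T) + \varepsilon^{6} \ge \lambda_\varepsilon(B \cap T \cap D) . \tag{4.36} \]

Define $D_i$ and $D_e$ the concentric discs to $D$ with respective radii (4.31) and (4.32). Then clearly (4.33) holds if $\ell$ is large enough. Moreover, it follows from scaling and Theorem A.2 from the Appendix (with $\eta = \varepsilon / (R_0 + \varepsilon^{\chi'})$) that when $\ell$ is large and $\omega \in F \cap G$ is such that $\omega$ puts some point in $D_i$,

\[ \lambda_\varepsilon(D) \ge c(2, \nu) + \gamma_7(\nu, W)\, \frac{\big(\varepsilon^{\chi/4} - \gamma_8(\nu)\, \varepsilon^{\chi'}\big)^4}{\log \frac{1}{\varepsilon}} > c(2, \nu) + \gamma_2\, \varepsilon + \varepsilon^{6} , \tag{4.37} \]

which contradicts (4.35), (4.36). This completes the proof of Proposition 4.4.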
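Remark 4.5. In the case of hard obstacles, one can do slightly better, using (A.21) in place of (A.22), and as a result replace $\varepsilon^{\chi/4}$ by $\varepsilon^{\chi/2}$ in (4.31).

Define for $\chi > 0$, $\ell > 1$,

\[ L_{\chi,\ell} = \{z = q (\log \ell)^{\frac{1-\chi}{2}}, \ \text{with } q \in \mathbb{Z}^2 \text{ such that } B(z, 2R_0 \sqrt{\log \ell}) \subset (-\ell, \ell)^2\} . \tag{4.38} \]

Our main result is now:

Theorem 4.6. If $\chi \in (0, \chi_1)$ and $\ell > 1$, there exists an $L_{\chi,\ell}$-valued random variable $X_{\chi,\ell}$ and $b > 0$, so that

\[ D_e = B\big(X_{\chi,\ell}, \ R_0 \sqrt{\log \ell} + (\log \ell)^{\frac{1-\chi}{2}}\big) , \tag{4.39} \]

\[ D_i = B\big(X_{\chi,\ell}, \ R_0 \sqrt{\log \ell} - (\log \ell)^{\frac{1}{2} - \chi/8}\big) , \tag{4.40} \]

satisfy for large $\ell$

\[ P\Big[ \frac{c(2, \nu)}{\log \ell} + \frac{(\gamma_2 + 1)}{(\log \ell)^{3/2}} \ge \lambda_\omega(D_e) \ge \lambda_\omega((-\ell, \ell)^2) \ge \lambda_\omega(D_e) - \frac{2}{(\log \ell)^3}, \ \text{and } \omega \text{ has no point in } D_i \Big] \ge 1 - \exp\{-(\log \ell)^b\} . \tag{4.41} \]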
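Proof. Consider a choice of admissible parameters, $\beta' \in (\beta, 1)$ and $\chi'$ such that $\chi' \in (\chi, (1 \wedge 6\alpha \wedge \rho \wedge 2(1 - \beta') \wedge \kappa)/6)$. Let $H$ be the event

\[ H = \big\{\forall q \in \mathbb{Z}^2, \ B(q\varepsilon^{\chi}, 2R_0) \cap \partial T \ne \emptyset \Longrightarrow \omega\big(B(q\varepsilon^{\chi}, R_0 - \varepsilon^{\chi/4})\big) \ne 0\big\} . \tag{4.42} \]

Observe that when $\ell$ is large, when $\omega \in F \cap G \cap H$ we can find by Proposition 4.4 two concentric discs with radii $R_i = R_0 - \varepsilon^{\chi'/4}$ and $R_e = R_0 + \varepsilon^{\chi'}$ for which (4.33), (4.34) hold. We can then find $q \in \mathbb{Z}^2$ such that $q\varepsilon^{\chi}$ lies within distance $\varepsilon^{\chi}/\sqrt{2}$ from the center of these discs. Therefore for large $\ell$ and $\omega \in F \cap G \cap H$:

\[ B(q\varepsilon^{\chi}, R_0 - \varepsilon^{\chi/4}) \subset D_i \subset D_e \subset B(q\varepsilon^{\chi}, R_0 + \varepsilon^{\chi}) , \]

and since $D_e \cap T \ne \emptyset$ and $\omega(D_i) = 0$, $q\varepsilon^{\chi} \in L_{\chi,\ell}$. Moreover when $\ell$ is large, in view of (2.8):

\[ P_\varepsilon[F \cap G \cap H] \ge 1 - \exp\{-(\log \ell)^{\beta}\} - P_\varepsilon[H^c] , \tag{4.43} \]

and for large $\ell$:

\[ P_\varepsilon[H^c] \le \mathrm{const}(\nu)\, \frac{\ell}{\varepsilon^{2\chi}} \exp\Big\{-\frac{\nu}{\varepsilon^{2}}\, \pi (R_0 - \varepsilon^{\chi/4})^2\Big\} \le \exp\Big\{-\frac{1}{2} (\log \ell)\Big\} , \tag{4.44} \]

where we used $\nu\pi R_0^2 = 2$. Using scaling, it now easily follows from this that we can construct an $L_{\chi,\ell}$-valued variable $X_{\chi,\ell}$, for which (4.41) holds.

We shall now close this section with an application to the study of the behavior of the obstacles under $P_\varepsilon$, conditional upon the occurrence of the exceptional event $\lambda_\varepsilon((-1, 1)^2) \le c$, where $c > \lambda_2 = \lambda_{-\frac{1}{2}\Delta}(B(0, 1))$ is some fixed number. We shall see below that with conditional probability tending to 1 as $\varepsilon$ tends to 0, there is an empty circular droplet in $(-1, 1)^2$ with radius of order $R$, where $\lambda_2 R^{-2} = c$. We first define a slightly different scaling invariant exponent (see (2.6)):

χ2(W) = sup{(6α ∧ ρ0 ∧ 2(1 − β) ∧ κ0)/6; over admissible parameters with β > 1/2}.    (4.45)

Theorem 4.7. Consider $c > \lambda_2$, $R \in (0, 1)$ with $c = \lambda_2 / R^2$, and $\chi \in (0, \chi_2)$. Then:

\[ \lim_{\varepsilon \to 0} P_\varepsilon[E / \lambda_\varepsilon((-1, 1)^2) \le c] = 1, \ \text{with} \tag{4.46} \]

\[ E = \bigcup_{q \in \mathbb{Z}^2} \{\lambda_\varepsilon((-1, 1)^2 \cap B(q\varepsilon^{\chi}, R + \varepsilon^{\chi})) \le c + \varepsilon^{6}, \ \omega(B(q\varepsilon^{\chi}, R - \varepsilon^{\chi/4})) = 0\} . \tag{4.47} \]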
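Proof. Pick an admissible choice of parameters, $\beta'$, $\chi'$ such that $\frac{1}{2} < \beta < \beta' < 1$, and $\chi' \in (\chi, (6\alpha \wedge \rho \wedge 2(1 - \beta') \wedge \kappa)/6)$. Observe that for small $\varepsilon$:

\[ P_\varepsilon[\lambda_\varepsilon((-1, 1)^2) \le c] \ge \exp\Big\{-\frac{\nu}{\varepsilon^{2}}\, \pi (R + a\varepsilon)^2\Big\} = \exp\Big\{-\frac{\nu}{\varepsilon^{2}}\, \pi R^2 - \frac{\nu}{\varepsilon}\, 2\pi R a - \nu\pi a^2\Big\} . \tag{4.48} \]

Define the event

\[ \widetilde G = \{|(-1, 1)^2 \setminus (\mathcal{D} \cup \mathcal{B})| \le \pi R^2 + \varepsilon^{2(1 - \beta')}\} , \tag{4.49} \]

then by a similar calculation as in (2.4), for small $\varepsilon$

\[ P_\varepsilon[\widetilde G^c] \le \exp\Big\{-\frac{\nu}{\varepsilon^{2}}\, \pi R^2 - \varepsilon^{-2\beta}\Big\} , \]

and since $\beta > \frac{1}{2}$, we see from (4.48) that for small $\varepsilon$

\[ P_\varepsilon[\widetilde G / \lambda_\varepsilon((-1, 1)^2) \le c] \ge 1 - \exp\Big\{-\frac{1}{2}\, \varepsilon^{-2\beta}\Big\} . \tag{4.50} \]

If we now consider the event $\widetilde G \cap \{\lambda_\varepsilon((-1, 1)^2) \le c\}$, then the proofs of Propositions 4.1, 4.2, 4.4 show that when $\varepsilon$ is small and $\omega \in \widetilde G \cap \{\lambda_\varepsilon((-1, 1)^2) \le c\}$, we can find two concentric discs $D_i \subset D_e$ with respective radii $R - \varepsilon^{\chi'/4}$ and $R + \varepsilon^{\chi'}$ such that $\lambda_\varepsilon(D_e \cap (-1, 1)^2) \le c + \varepsilon^{6}$ and $\omega(D_i) = 0$. Therefore when $\varepsilon$ is small enough,

\[ \widetilde G \cap \{\lambda_\varepsilon((-1, 1)^2) \le c\} \subseteq E , \]

and our claim (4.46) follows.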
5. Applications to Random Scales

The object of this section is to develop the applications of Sects. 2 and 3 to the study of the large $t$ behaviour of the variational problem (0.3). Before stating and proving results, we shall recall some facts and provide some additional comments complementing the discussion of the introduction. It is standard to argue that for $\omega \in \Omega$, the function

\[ \ell \in (0, \infty) \to \lambda_\omega((-\ell, \ell)^d) \tag{5.1} \]

is continuous decreasing, tends to $+\infty$ as $\ell$ tends to 0, (see for instance Lemma 1.1 of [17], I). Moreover, see (2.13):

\[ P\text{-a.s.} \quad \lambda_\omega((-\ell, \ell)^d) \sim c(d, \nu) (\log \ell)^{-2/d} . \tag{5.2} \]

Therefore for $t > 0$, $\omega \in \Omega$, the set of minima of $F_t(\ell, \omega) = \ell + t \lambda_\omega((-\ell, \ell)^d)$:

\[ M_{t,\omega} = \{\ell > 0, \ F_t(\ell, \omega) = \inf F_t(\cdot, \omega)\} , \tag{5.3} \]

is a nonempty compact subset of $(0, \infty)$.
The difficulty in studying the large $t$ behavior of $M_{t,\omega}$ stems in part from the fact that the function $F_t(\cdot, \omega)$ tends to be rather flat near its minima. For instance on the set of full $P$-measure where (5.2) holds, one has:

\[ F_t(\ell(t), \omega) \sim \inf F_t(\cdot, \omega), \ \text{as } t \to \infty , \tag{5.4} \]

whenever $\ell(t)$ is a function tending to $\infty$ in such a fashion that:

\[ \forall \chi \in (0, 1), \ t^{\chi} = o(\ell(t)), \ \text{and } \ell(t) = o(t (\log t)^{-2/d}) . \tag{5.5} \]

A first naive guess for the location of $M_{t,\omega}$ is obtained by replacing $\lambda_\omega((-\ell, \ell)^d)$ with the equivalent quantity given in (5.2) and studying the minima of:

\[ \ell \in (1, \infty) \to \ell + t\, \frac{c(d, \nu)}{(\log \ell)^{2/d}} , \tag{5.6} \]

(see [13]). The above function is easily seen to have a unique minimum $m(t)$ having the asymptotic behaviour

\[ m(t) \sim \frac{2\, c(d, \nu)}{d}\, \frac{t}{(\log t)^{\frac{2}{d}+1}} , \ \text{as } t \to \infty . \tag{5.7} \]
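The naive guess can be explored numerically. Setting the derivative of (5.6) to zero gives $\ell (\log \ell)^{\frac{2}{d}+1} = 2 c(d, \nu) t / d$, which the sketch below solves by bisection in $s = \log \ell$. This is an illustration only: the function names are hypothetical and the value of $c$ passed in is an arbitrary stand-in for $c(d, \nu)$.

```python
import math


def objective(ell, t, c, d):
    """The function of (5.6): ell + t * c / (log ell)^(2/d), for ell > 1."""
    return ell + t * c / math.log(ell) ** (2.0 / d)


def naive_minimizer(t, c, d):
    """Unique minimiser m(t) of (5.6) on (1, infinity).

    With s = log ell the stationarity condition reads
    s + (2/d + 1) * log(s) = log(2 c t / d); the left side is increasing in s."""
    p = 2.0 / d + 1.0
    target = math.log(2.0 * c * t / d)

    def g(s):
        return s + p * math.log(s)

    lo, hi = 1.0, max(2.0, target + 1.0)
    while g(lo) > target:            # shrink until the root is bracketed
        lo /= 2.0
    for _ in range(200):             # plain bisection
        mid = 0.5 * (lo + hi)
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


def asymptotic_guess(t, c, d):
    """Right-hand side of (5.7): (2 c / d) * t / (log t)^(2/d + 1)."""
    return (2.0 * c / d) * t / math.log(t) ** (2.0 / d + 1.0)
```

The ratio `naive_minimizer(t, c, d) / asymptotic_guess(t, c, d)` tends to 1 only very slowly, since $\log m(t)$ and $\log t$ differ by a multiple of $\log \log t$; this slow convergence is visible when comparing, say, $t = 10^{10}$ with $t = 10^{50}$.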
However this is only a naive guess, and it is not difficult to construct using suitable “downward bumps,” continuous decreasing functions $\lambda(\ell)$ on $(0, \infty)$ with $\lambda(\ell) \sim c(d, \nu)(\log \ell)^{-2/d}$, $\ell \to \infty$, for which the loci of minima of

\[ \ell \in (0, \infty) \to \ell + t\, \lambda(\ell) , \]

do not remain bounded in scale $t / (\log t)^{\frac{2}{d}+1}$ (see also (5.10) below). The heuristic principle (0.5), which relates the scale in which $M_{t,\omega}$ occurs, with the size of fluctuations of the principal eigenvalue associated to large blocks, turns out to be more helpful. It can indeed be shown in the case of dimension 1 that (see Sect. 5 of [17], I)

\[ \lim_{\rho \to 0}\ \lim_{t \to \infty} P\Big[ M_{t,\omega} \subseteq \Big[\rho, \frac{1}{\rho}\Big]\, t (\log t)^{-3} \Big] = 1 . \tag{5.8} \]

In this case the typical size of fluctuations of $\lambda_\omega((-\ell, \ell))$ is $(\log \ell)^{-3}$ and (5.8) gives a precise interpretation of the heuristic principle (0.5). Further support for (0.5) comes from (5.10) and its proof. There we are able to infer a certain behaviour of $M_{t,\omega}$ when $d \ge 2$, using information on the size of fluctuations of the principal eigenvalue of large blocks (not on the location of the medians).

Theorem 5.1.

\[ P\text{-a.s. for } \chi \in (0, \chi_0 \wedge 1), \ \text{for large } t, \quad M_{t,\omega} \subset \Big[ t \exp\big\{-(\log t)^{1 - \frac{\chi}{d}}\big\}, \ \frac{t}{(\log t)^{\frac{2+\chi}{d}}} \Big] . \tag{5.9} \]

\[ P\text{-a.s.} \quad \limsup_{t \to \infty} \frac{\inf M_{t,\omega}}{t (\log t)^{-\frac{2}{d}-1}} = \infty . \tag{5.10} \]
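Proof. We begin with the proof of (5.9). Denote by $\Omega_1$ the set of full $P$-measure where (2.13) holds for large $\ell$ whenever $\chi \in (0, \chi_0 \wedge 1)$. From the upper bound of (2.13), we know that when $\omega \in \Omega_1$, for large $t$:

\[ \inf F_t(\cdot, \omega) \le F_t\big(t (\log t)^{-\frac{2}{d}-1}, \omega\big) \le t \Big[ \frac{c(d, \nu)}{(\log t)^{2/d}} + \frac{(2\gamma_2 + 1)}{(\log t)^{3/d}} \Big] . \tag{5.11} \]

On the other hand, using the lower bound of (2.13), when $\omega \in \Omega_1$, $\chi \in (0, \chi_0 \wedge 1)$, for sufficiently large $t$:

\[ \inf\{F_t(\ell, \omega); \ \ell \ge t (\log t)^{-\frac{2+\chi}{d}}\} \ge t \Big[ \frac{c(d, \nu)}{(\log t)^{2/d}} + \frac{1}{2 (\log t)^{\frac{2+\chi}{d}}} \Big] , \tag{5.12} \]

(where we have used (2.13) with $\chi' > \chi$, and separately considered the cases $\ell \in [t (\log t)^{-\frac{2+\chi}{d}}, t]$ and $\ell \in (t, \infty)$). Since $\chi < 1$, it follows from (5.11), (5.12) that:

\[ \text{for } \omega \in \Omega_1, \ \chi \in (0, \chi_0 \wedge 1), \quad M_{t,\omega} \subset [0, \ t (\log t)^{-\frac{2+\chi}{d}}] \ \text{for large } t . \tag{5.13} \]

As for the lower bound part, observe that when $\omega \in \Omega_1$, and $\chi < \chi' < \chi_0 \wedge 1$, then for large $t$:

\[
\begin{aligned}
\inf\{\lambda_\omega((-\ell, \ell)^d); \ \ell \in (0, \ t \exp\{-(\log t)^{1 - \frac{\chi}{d}}\}]\}
&\ge \frac{c(d, \nu)}{[\log t - (\log t)^{1 - \frac{\chi}{d}}]^{2/d}} - \frac{1}{[\log t - (\log t)^{1 - \frac{\chi}{d}}]^{\frac{2+\chi'}{d}}} \\
&\ge \frac{c(d, \nu)}{(\log t)^{2/d}} + \frac{1}{d (\log t)^{\frac{2+\chi}{d}}} .
\end{aligned}
\]

Combining this and (5.11), we see that for large $t$:

\[ \inf\{F_t(\ell, \omega), \ \ell \in (0, \ t \exp\{-(\log t)^{1 - \frac{\chi}{d}}\}]\} > \inf F_t(\cdot, \omega) . \]

This finishes the proof of (5.9). Let us now prove (5.10). Denote by $\Omega_2$ a set of full measure where (3.21) holds for all constants $\gamma > 0$. Pick a fixed $\gamma > 0$, and define:

\[ t_k = \frac{1}{\gamma}\, 2^{k+2} (\log 2^{k+2})^{\frac{2}{d}+1} , \ k \ge 0 . \tag{5.14} \]

When $\omega \in \Omega_2$, then for infinitely many $k$

\[ \lambda_\omega((-2^k, 2^k)^d) > \lambda_\omega((-2^{k+2}, 2^{k+2})^d) + \frac{\gamma}{(\log 2^k)^{\frac{2}{d}+1}} > \lambda_\omega((-2^{k+2}, 2^{k+2})^d) + \frac{2^{k+2}}{t_k} . \]

As a consequence, for infinitely many $k$: $M_{t_k,\omega} \subset [2^k, \infty)$. It now follows that when $\omega \in \Omega_2$ and $\gamma > 0$:

\[ \limsup_{t \to \infty} \frac{\inf M_{t,\omega}}{t (\log t)^{-\frac{2}{d}-1}} > \frac{\gamma}{8} . \tag{5.15} \]

This proves our claim (5.10).

Remark 5.2. In view of the above discussion and of the results we have just obtained, it is natural to wonder whether (5.8) holds in dimension $d \ge 2$, with $t (\log t)^{-3}$ replaced by $t (\log t)^{-\frac{2}{d}-1}$.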
Appendix

We prove here two results which were used in Sect. 4. We begin with a reinforcement of Faber-Krahn's inequality, see [5], in the 2-dimensional context. This result is close to Sect. 4 of [14], or Sect. 3 of [16], see also [3]. The key ingredient, apart from some obvious modifications, to deriving similar results in a $d \ge 3$ situation is the replacement of Bonnesen's isoperimetric inequality below (see after (A.16)) by Theorem 1 and the remark at the end of Sect. 2 of Hall [9].

Theorem A.1. There is a numerical constant $c > 0$, such that for all non empty finite area open sets $U \subseteq \mathbb{R}^2$, and all $h \in (0, \frac{1}{\sqrt{|U|}})$, there exists a disc $D$ with radius

\[ R = \sqrt{\frac{|U|}{\pi}} + c\, \frac{\sqrt{\eta}}{h} , \tag{A.1} \]

for which

\[ |U \setminus D| \le \frac{|U|}{1 + \eta} \big(\eta + 2h \sqrt{|U|}\big) , \tag{A.2} \]

with the notation

\[ \eta = \frac{\lambda_{-\frac{1}{2}\Delta}(U)\, |U|}{\lambda_2\, \pi} - 1 \ge 0 , \tag{A.3} \]

(by Faber-Krahn's inequality).

Proof. Using inner approximation we can assume that $U$ is smooth and bounded. Let $U_1$ be a connected component of $U$ with

\[ \lambda_{-\frac{1}{2}\Delta}(U_1) = \lambda_{-\frac{1}{2}\Delta}(U) , \]

and denote by $\psi \ge 0$ the $L^2(U_1)$ normalized eigenfunction associated to $\lambda_{-\frac{1}{2}\Delta}(U_1)$, and by $M_\psi$ its maximum, so that

\[ M_\psi \ge \frac{1}{\sqrt{|U|}} . \tag{A.4} \]

Let us also introduce

\[ A_t = |\{\psi > t\}|, \ t \in (0, M_\psi) , \tag{A.5} \]

and for $t$ a regular value in $(0, M_\psi)$,

\[ d\sigma_t = \text{the length measure on } \{\psi = t\} , \tag{A.6} \]

\[ L_t = \int_{\{\psi = t\}} d\sigma_t , \ \text{the length of } \{\psi = t\} , \tag{A.7} \]

\[ L_t^o = \sqrt{4\pi A_t} , \ \text{the perimeter of a disc of area } A_t . \tag{A.8} \]

Applying the co-area formula to $\psi$, whose stationary points have null area, see Chavel [5], p. 85-89, or Burago-Zalgaller [4], p. 103-105, we find:
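\[
\begin{aligned}
\frac{1}{2} \int |\nabla\psi|^2 dx &= \frac{1}{2} \int_0^{M_\psi} \!\! \int |\nabla\psi|\, d\sigma_t\, dt \\
&= \frac{1}{2} \int_0^{M_\psi} \Big( \int |\nabla\psi|\, d\sigma_t - \frac{L_t^2}{|A_t'|} \Big) dt + \frac{1}{2} \int_0^{M_\psi} \frac{L_t^2 - L_t^{o2}}{|A_t'|}\, dt + \Big( \frac{1}{2} \int_0^{M_\psi} \frac{L_t^{o2}}{|A_t'|}\, dt - \frac{\lambda_2 \pi}{|U_1|} \Big) + \frac{\lambda_2 \pi}{|U_1|} \\
&= \frac{\lambda_2 \pi}{|U|} (1 + \eta) ,
\end{aligned} \tag{A.9}
\]

using the definition of $\eta$ and $\psi$ for the last expression. For regular values of $t$:

\[ A_t' = -\int \frac{d\sigma_t}{|\nabla\psi|} , \tag{A.10} \]

so that by Cauchy Schwarz's inequality:

\[ L_t^2 \le |A_t'| \int |\nabla\psi|\, d\sigma_t . \tag{A.11} \]

On the other hand if $\psi_*$ is the decreasing symmetric rearrangement of $\psi$

\[ \frac{1}{2} \int_0^{M_\psi} \frac{L_t^{o2}}{|A_t'|}\, dt = \frac{1}{2} \int |\nabla\psi_*|^2 dy \ge \frac{\lambda_2 \pi}{|U_1|} , \]

(see [5], p. 87-89), and $L_t \ge L_t^o$ (isoperimetric inequality). We thus see that the third member of (A.9) is a sum of non negative contributions and

\[ \frac{1}{2} \int_0^{M_\psi} \frac{L_t^2 - L_t^{o2}}{|A_t'|}\, dt \le \frac{\lambda_2 \pi \eta}{|U|} + \lambda_2 \pi \Big( \frac{1}{|U|} - \frac{1}{|U_1|} \Big) \le \frac{\lambda_2 \pi \eta}{|U|} . \tag{A.12} \]

Since $L_t^{o2} = 4\pi A_t$, we find

\[ \Big( \int_0^{M_\psi} (L_t^2 - 4\pi A_t)^{1/2} dt \Big)^2 \le \int_0^{M_\psi} \frac{L_t^2 - L_t^{o2}}{|A_t'|}\, dt \int_0^{M_\psi} |A_t'|\, dt \le 2\, \frac{\lambda_2 \pi \eta}{|U|} \times |U_1| \le 2 \lambda_2 \pi \eta . \tag{A.13} \]

In view of (A.4) and (A.13), for $h \in (0, \frac{1}{\sqrt{|U|}})$ there exists a regular value of $\psi$, $t_0 \in (0, h)$ with:

\[ (L_{t_0}^2 - 4\pi A_{t_0})^{1/2} \le \frac{2}{h} \sqrt{\lambda_2 \pi \eta} . \tag{A.14} \]

Observe that:

\[ \|(\psi - t_0)_+\|_2 \ge \|\psi\|_2 - \|\psi \wedge t_0\|_2 \ge 1 - t_0 \sqrt{|U|} \ge 1 - h \sqrt{|U|} , \]

and $(\psi - t_0)_+ \in H_0^1(\{\psi > t_0\})$ with:

\[ \frac{1}{2} \int |\nabla(\psi - t_0)_+|^2 dx \le \frac{1}{2} \int |\nabla\psi|^2 dx = \lambda_{-\frac{1}{2}\Delta}(U) . \]

It follows that

\[ \lambda_{-\frac{1}{2}\Delta}(\{\psi > t_0\}) \le \lambda_{-\frac{1}{2}\Delta}(U) \big(1 - h \sqrt{|U|}\big)^{-2} , \tag{A.15} \]

and by Faber-Krahn's inequality: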
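\[ A_{t_0} = |\{\psi > t_0\}| \ge \frac{\lambda_2 \pi}{\lambda_{-\frac{1}{2}\Delta}(\{\psi > t_0\})} \ge \frac{\lambda_2 \pi}{\lambda_{-\frac{1}{2}\Delta}(U)} \big(1 - h \sqrt{|U|}\big)^2 = \frac{|U|}{1 + \eta} \big(1 - h \sqrt{|U|}\big)^2 . \tag{A.16} \]

We now want to use Bonnesen's isoperimetric inequality, see Osserman [15]. The set $\{\psi > t_0\}$ is not necessarily simply connected, and we introduce $V$, the complement of the unbounded component of $\{\psi > t_0\}^c$. Then $V \supseteq \{\psi > t_0\}$ is simply connected, and $\partial V \subseteq \{\psi = t_0\}$. If we denote by $D$ the circumscribed disc to $V$ and by $R$ its radius, it follows from (16) in Theorem 3 of [12] that:

\[
\begin{aligned}
2\pi R &\le |\partial V| + \sqrt{|\partial V|^2 - 4\pi |V|} , \ \text{where } |\partial V| \text{ denotes the length of } \partial V \\
&\le L_{t_0} + \sqrt{L_{t_0}^2 - 4\pi A_{t_0}} ,
\end{aligned} \tag{A.17}
\]

and since

\[ L_{t_0} = (L_{t_0}^2 - 4\pi A_{t_0} + 4\pi A_{t_0})^{1/2} \le (4\pi A_{t_0})^{1/2} + (L_{t_0}^2 - 4\pi A_{t_0})^{1/2} \]

we see from (A.14) that:

\[ 2\pi R \le \sqrt{4\pi |U|} + \frac{4}{h} \sqrt{\lambda_2 \pi \eta} , \tag{A.18} \]

and using (A.16):

\[ |U \setminus D| \le |U \setminus \{\psi > t_0\}| = |U| - A_{t_0} \le \frac{|U|}{1 + \eta} \big(\eta + 2h \sqrt{|U|} - h^2 |U|\big) . \tag{A.19} \]

This proves our claim.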
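To get a feel for the deficiency $\eta$ of (A.3), one can evaluate it on domains whose principal Dirichlet eigenvalue of $-\frac{1}{2}\Delta$ is known in closed form. The sketch below is an illustration only, with hypothetical helper names: it uses the classical facts that this eigenvalue equals $j_{0,1}^2/2$ on the unit disc, with $j_{0,1}$ the first positive zero of the Bessel function $J_0$, and $\frac{\pi^2}{2}(a^{-2} + b^{-2})$ on an $a \times b$ rectangle.

```python
import math


def bessel_j0(x, terms=40):
    """Power series J0(x) = sum_k (-1)^k (x/2)^(2k) / (k!)^2 (adequate for moderate x)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -((x / 2.0) ** 2) / ((k + 1) ** 2)
    return total


def first_zero_j0():
    """First positive zero j_{0,1} of J0, by bisection on [2, 3] where J0 changes sign."""
    lo, hi = 2.0, 3.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if bessel_j0(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# lambda_2: principal Dirichlet eigenvalue of -(1/2) Laplacian on the unit disc.
LAMBDA_2 = 0.5 * first_zero_j0() ** 2


def deficiency(eigenvalue, area):
    """eta of (A.3): eigenvalue * area / (lambda_2 * pi) - 1."""
    return eigenvalue * area / (LAMBDA_2 * math.pi) - 1.0


def eta_disc(radius):
    """Discs are the equality case of Faber-Krahn, so eta vanishes."""
    return deficiency(LAMBDA_2 / radius ** 2, math.pi * radius ** 2)


def eta_rectangle(a, b):
    """eta for an a x b rectangle, eigenvalue (pi^2 / 2)(1/a^2 + 1/b^2)."""
    return deficiency(0.5 * math.pi ** 2 * (1.0 / a ** 2 + 1.0 / b ** 2), a * b)
```

A square has $\eta = \pi/\lambda_2 - 1$, roughly $0.086$, while elongated rectangles have a larger deficiency; Theorem A.1 quantifies how a small $\eta$ forces most of $U$ into a disc of nearly the same area.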
The second result we shall now prove provides a lower bound on the shift of principal Dirichlet eigenvalue due to the presence of one obstacle falling in $B(0, 1) \subseteq \mathbb{R}^2$. We shall write

\[ \lambda_\eta(x) = \lambda_{-\frac{1}{2}\Delta + \eta^{-2} W(\frac{\cdot - x}{\eta})}(B(0, 1)), \ \eta > 0, \ x \in \mathbb{R}^2 , \tag{A.20} \]

for the principal Dirichlet eigenvalue of $-\frac{1}{2}\Delta + \eta^{-2} W(\frac{\cdot - x}{\eta})$ in $B(0, 1)$. Here $W(\cdot)$ is as specified in the introduction, that is bounded measurable, compactly supported, not a.e. equal to 0, and $a(W)$ is defined in (1.11). We recall the following estimate from Lemma 4.6 of [16]. There exists $\eta_0 > 0$, such that for $\eta \in (0, \eta_0)$, and any compact set $F \subset \bar B(x, \eta) \subset B(0, 1)$:

\[ \lambda_{-\frac{1}{2}\Delta}(B(0, 1) \setminus F) \ge \lambda_2 + \gamma_9\, \mathrm{cap}_{B(0,1)}(F)\, (1 - |x| - \eta)^2 , \tag{A.21} \]

where $\gamma_9 > 0$ is a numerical constant and $\mathrm{cap}_{B(0,1)}(F)$ denotes the capacity of $F$ relative to $B(0, 1)$. Our main object is the proof of:

Theorem A.2. There exists $\gamma_{10}(W)$ such that when $0 < a(W)\, \eta < \eta_0 \wedge \frac{1}{4}$, and $x \in B(0, 1)$,

\[ \lambda_\eta(x) > \lambda_2 + \gamma_{10}(W) \Big( \log \frac{1}{a\eta} \Big)^{-1} (1 - |x| - a\eta)_+^4 , \tag{A.22} \]

moreover the constant $\gamma_{10}(W)$ remains uniformly bounded away from 0, when $a^2 \|W\|_\infty$ and $1 / \int W dy$ remain bounded.
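Proof. We only need consider $|x| < 1 - a\eta$ and $a\eta < \eta_0 \wedge \frac{1}{4}$. We let $\varphi$ denote the non negative normalized eigenfunction associated to $\lambda_\eta(x)$. Let $c > 0$ denote some number. Then either

\[ \int \eta^{-2} W\Big(\frac{z - x}{\eta}\Big) \varphi^2(z)\, dz > c , \ \text{and therefore} \tag{A.23} \]

\[ \lambda_\eta(x) \ge \lambda_2 + c , \tag{A.24} \]

or

\[ \int \eta^{-2} W\Big(\frac{z - x}{\eta}\Big) \varphi^2(z)\, dz \le c , \tag{A.25} \]

so that

\[ \int \eta^{-2} W\Big(\frac{z - x}{\eta}\Big)\, 1\Big\{\varphi^2 > \frac{2c}{\int W dy}\Big\}\, dz \le \frac{1}{2} \int W dy , \]

and therefore

\[ \int \eta^{-2} W\Big(\frac{z - x}{\eta}\Big)\, 1\Big\{\varphi^2 \le \frac{2c}{\int W dy}\Big\}\, dz \ge \frac{1}{2} \int W dy , \tag{A.26} \]

from which it follows that the compact set

\[ F = \bar B(x, a\eta) \cap \Big\{\varphi^2 \le 2c \Big/ \int W dy\Big\} \subseteq B(0, 1) \tag{A.27} \]

has area

\[ |F| \ge \frac{1}{2}\, \eta^2 \int W dy \Big/ \|W\|_\infty . \tag{A.28} \]

If $r$ stands for the reduced function of $\varphi$ on $F$, then

\[ \text{q.e.} \quad r(x) = E_x[H_F < T_{B(0,1)}, \ \varphi(Z_{H_F})] , \tag{A.29} \]

see Theorem 4.4.1 of Fukushima [7], and $\varphi - r$ and $r$ are the respective orthogonal projections of $\varphi$ on $H_0^1(B(0, 1) \setminus F)$ and its orthogonal in $H_0^1(B(0, 1))$ relative to the Hilbertian norm $\frac{1}{2} \int |\nabla f|^2 dx$, $f \in H_0^1(B(0, 1))$. We thus have:

\[
\begin{aligned}
\frac{1}{2} \int |\nabla\varphi|^2 dx \ge \frac{1}{2} \int |\nabla(\varphi - r)|^2 dx &\ge \lambda_{-\frac{1}{2}\Delta}(B(0, 1) \setminus F)\, \|\varphi - r\|_2^2 \\
&\ge \lambda_{-\frac{1}{2}\Delta}(B(0, 1) \setminus F) - \gamma_{11} \int r\varphi\, dx ,
\end{aligned} \tag{A.30}
\]

where we used $\lambda_{-\frac{1}{2}\Delta}(B(0, 1) \setminus F) \le \gamma_{11}/2$, for some numerical constant $\gamma_{11}$, since $F \subset \bar B(x, a\eta)$ and $a\eta < \frac{1}{4}$. If we let $v_F$ and $e_F$ denote respectively the equilibrium potential and equilibrium measure of $F$ relative to $B(0, 1)$, then from (A.29):

\[ \text{q.e.} \quad r \le \Big(2c \Big/ \int W dy\Big)^{1/2} v_F . \tag{A.31} \]

If $g(x, y)$ denotes the Green function relative to $B(0, 1)$,

\[ \int r\varphi\, dx \le \Big(\frac{2c}{\int W dy}\Big)^{1/2} \int\!\!\int \varphi(x)\, g(x, y)\, dx\, e_F(dy) . \]

Using Cauchy Schwarz's inequality: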
\[ \int\!\!\int \varphi(x)\, g(x, y)\, dx\, e_F(dy) \le \mathrm{cap}_{B(0,1)}(F) \cdot \sup_y \Big( \int g^2(x, y)\, dx \Big)^{1/2} , \]

and a standard comparison argument shows that the last quantity is finite. Combining (A.30), the above bound on $\int r\varphi\, dx$ and (A.21), we find

\[ \lambda_\eta(x) \ge \lambda_2 + \Big[ \gamma_9 (1 - |x| - a\eta)^2 - \gamma_{12} \Big(c \Big/ \int W dy\Big)^{1/2} \Big]\, \mathrm{cap}_{B(0,1)}(F) , \tag{A.32} \]

with $\gamma_{12} > 0$ a suitable numerical constant. Moreover from Tsuji [19], p. 58, and a standard comparison argument

\[ \mathrm{cap}_{B(0,1)}(F) \ge \gamma_{13} \Big( \gamma_{14} + \log \frac{1}{|F|} \Big)^{-1} \ge \gamma_{15} \Big( \gamma_{16} + \log \frac{\pi a^2 \|W\|_\infty}{\int W dy} \Big)^{-1} \Big( \log \frac{1}{a\eta} \Big)^{-1} , \tag{A.33} \]

for suitable numerical constants, using (A.28) and $\log \frac{1}{a\eta} > 1$ in the last step. If we now choose the value

\[ c = \frac{1}{4}\, \gamma_9^2\, \gamma_{12}^{-2} \int W dy\, (1 - |x| - a\eta)^4 , \tag{A.34} \]

our claim now readily follows from (A.24), (A.32), (A.33). Observe by the way that the constant $a^2 \|W\|_\infty / \int W dy$ is invariant under the scaling $u^{-2} W(\frac{\cdot}{u})$ of $W(\cdot)$.
References

1. Albeverio, S., Molchanov, S.A., Surgailis, D.: Stratified structure of the universe and Burger's equation – A probabilistic approach. Probab. Th. Rel. Fields 100, 457–484 (1994)
2. Beliaev, A.Yu., Yurinsky, V.V.: The spectrum bottom of the Laplacian in a Poisson cloud. Siberian Adv. in Math. 5, No. 4, 113–150 (1995)
3. Bolthausen, E.: Localization of a two-dimensional random walk with an attractive path interaction. Ann. Probab. 22, 875–918 (1994)
4. Burago, Yu.D., Zalgaller, V.A.: Geometric inequalities. Berlin: Springer-Verlag, 1988
5. Chavel, I.: Eigenvalues in Riemannian geometry. New York: Academic Press, 1984
6. Dobrushin, R.L., Kotecky, R., Shlosman, S.: Wulff construction: A global shape from local interaction. AMS Translations series 104, Providence R.I.: Am. Math. Soc., 1992
7. Fukushima, M.: Dirichlet forms and Markov processes. Kodansha, Amsterdam, Tokyo: North-Holland, 1980
8. Hall, P.: Distribution of size, structure and number of vacant regions in a high intensity mosaic. Z. f. Wahrsch. verw. Gebiete 70, 237–261 (1985)
9. Hall, R.R.: A quantitative isoperimetric inequality in n-dimensional space. J. reine angew. Math. 428, 161–176
10. Ioffe, D.: Exact large deviation bounds up to Tc for the Ising model in two dimensions. Probab. Th. Rel. Fields 102, 3, 313–330 (1995)
11. Janson, S.: Bounds on the distribution of extremal values of a scanning process. Stoch. Proc. and their Appl. 18, 313–328 (1984)
12. Janson, S.: Random coverings in several dimensions. Acta Math. 156, 83–118 (1986)
13. Krug, J., Halpin Healy, T.: Directed polymer in the presence of columnar disorder. J. Phys. I, France, 3, 2179–2198 (1993)
14. Melas, A.D.: The stability of some eigenvalue estimates. J. Differential Geometry 36, 19–33 (1992)
15. Osserman, R.: Bonnesen-style isoperimetric inequalities. Amer. Math. Monthly 86, 1–29 (1979)
16. Sznitman, A.S.: On the confinement property of Brownian motion among Poissonian obstacles. Comm. Pure Appl. Math. 44, 1137–1170 (1991)
17. Sznitman, A.S.: Brownian confinement and pinning in a Poissonian potential I, II. Probab. Th. Rel. Fields 105, 1–30, 31–56 (1996)
18. Sznitman, A.S.: Capacity and principal eigenvalues: The method of enlargement of obstacles revisited. Ann. Probab. 25, 3, 1180–1209 (1997)
19. Tsuji, M.: Potential theory in modern function theory. 2nd ed., New York: Chelsea, 1975

Communicated by J.L. Lebowitz
Commun. Math. Phys. 189, 365 – 371 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997

Entropy Production in Nonequilibrium Statistical Mechanics

David Ruelle

I.H.E.S., 91440 Bures-sur-Yvette, France

Received: 15 July 1996 / Accepted: 30 October 1996
Dedicated to the memory of Roland Dobrushin
Abstract: We consider systems of nonequilibrium statistical mechanics, driven by nonconservative forces and in contact with an ideal thermostat. These are smooth dynamical systems for which one can define natural stationary states µ (SRB in the simplest case) and entropy production e(µ) (minus the sum of the Lyapunov exponents in the simplest case). We give exact and explicit definitions of the entropy production e(µ) for the various situations of physical interest. We prove that e(µ) ≥ 0 and indicate cases where e(µ) > 0. The novelty of the approach is that we do not try to compute entropy production directly, but make it depend on the identification of a natural stationary state for the system.
Memories

My first contact with Roland L. Dobrushin consisted in the exchange of scientific papers. I did not know (or care) that he was perhaps the greatest active probabilist of the time. But I had been working on a problem of statistical mechanics with Oscar Lanford [15], and Jean Lascoux pointed out to us that Dobrushin [7, 8] had published something on the same problem. As it turned out, our results complemented his in showing that translationally invariant Gibbs states are identical with equilibrium states (in present day terminology). Then Yasha Sinai [22, 23] obtained his great results on the use of Gibbs states in hyperbolic dynamical systems, complemented later by Rufus Bowen [2]. All these results are now well known, and used without reference (as they will be in the present paper), but participating in the discovery was a great experience.

I first met Dobrushin in Moscow (the Soviet authorities did not allow him to travel to the West). He was quite outspoken about the regime, with the usual precaution of taking a little stroll in a park before saying anything compromising (not indoors!). I remember Roland Dobrushin as smiling and fearless. When I brought him a copy of a novel by Solzhenitsyn, he thanked me by saying it was “a gift for a king”, and not a word about
the danger of accepting such a gift. He told me about the “extraordinary stability” of the Soviet system, expressing the belief that it would take five hundred years before it went down. There he was wrong: he lived to see the collapse of Soviet Socialism. But Nomenklaturism is not dead, and anywhere in the world we may yet have to see the people crushed in the name of the People, and freedom and democracy destroyed in the name of Freedom and Democracy. Anywhere in the world we may yet have to take a little stroll in the park to express opinions that do not conform with the official Truth.
Introduction
A revival of nonequilibrium statistical mechanics is currently taking place, using the ideas and methods of the ergodic theory of smooth dynamical systems (see in particular Chernov et al. [4], Bunimovich and Spohn [3], Gallavotti and Cohen [11], Gallavotti [10]). In the present note we adopt this dynamical systems point of view and give (for various cases of physical interest) explicit formulae for the entropy production in terms of nonequilibrium stationary states. We check that the entropy production is nonnegative, and we sometimes can prove that it is strictly positive.
One way to approach nonequilibrium statistical mechanics is to try to define stationary states which might replace the Gibbs ensembles of equilibrium statistical mechanics. The definition of nonequilibrium stationary states of a system will involve some amount of idealization because the forces keeping the system outside of equilibrium normally produce heat, which has to be evacuated by a thermostat. We want in fact an ideal thermostat, such that its state is not altered by the heat it absorbs from the system. In what follows we want to study entropy production by a system in contact with an ideal thermostat and subjected to forces that maintain it outside of equilibrium. We shall propose various expressions (depending on the type of thermostat) for the entropy production associated with a nonequilibrium stationary state, and show that for a natural stationary state the entropy production is positive.
Time Evolution
From now on we shall discretize the time, so that it takes integer values. (One may for instance consider a system with continuous time at integral multiples of some arbitrary time unit, or use a Poincaré map.) Time evolution is thus given by a diffeomorphism f of a smooth compact manifold M. The so-called Gaussian thermostat (see Hoover [13]), which constrains time evolution to some energy shell, gives just such a pair (M, f) (after discretization). The symplectic structure is lost because we use nonconservative forces. We shall need a volume element dx to define entropy, but any choice will give the same result for the entropy production. Note that the map f incorporates both the effect of the nonconservative forces and of the thermostat. Physically, f does not put energy into the system but may be thought of as pumping entropy out of it. We now list several types of time evolutions corresponding to different ideal thermostats. (These are discussed in more detail in Ruelle [20, 21]; for a physically oriented introduction to dynamical systems see Eckmann and Ruelle [9].)
(i) Diffeomorphism f of manifold M. This has been discussed above; f has an inverse $f^{-1}$.
(ii) Map f of M; f is not assumed to have an inverse, and the folding of M by f will contribute to entropy production. (Example: f describes a shock of a particle in a gas with the container (= thermostat), and distinct initial states may give the same final state, see Chernov and Lebowitz [6].)
(iii) Map f restricted to a neighborhood N of a compact invariant set X ⊂ M. As discussed by Gaspard and Nicolis [12], one can express diffusion coefficients in terms of the study of orbits spending a long time near X (see also [9]).
(iv) Random dynamics. Let (Ω, τ) be a dynamical system with invariant probability measure P, and $(f_\omega)_{\omega \in \Omega}$ a family of diffeomorphisms of M. The time k map is $f_{\tau^{k-1}\omega} \circ \cdots \circ f_{\tau\omega} \circ f_\omega$ (with ω distributed according to P). Since the action of a real thermostat is stochastic at the microscopic level, it is natural to describe it by random dynamics. A case of particular interest is when the $f_{\tau^k\omega}$ are independent. (For this case see in particular Kifer [14], Ledrappier and Young [18], Liu and Qian [19], the recent paper by Bahnmüller and Liu [1], and references quoted there.)

Natural Stationary States
The dynamical systems (M, f) here considered typically only have singular invariant measures (no smooth invariant measure of the form ρ(x)dx). If we start with a smooth measure ρ(dx) = ρ(x)dx, apply time evolution to obtain $f^k\rho$, and let somehow k → ∞, we obtain candidates for describing natural nonequilibrium states. Consider for definiteness the case (i) of a diffeomorphism f of M. Any limit ρ* when m → ∞ of
$$\frac{1}{m} \sum_{k=0}^{m-1} f^k \rho \qquad (\text{with } \rho(dx) = \rho(x)\,dx)$$
is a natural nonequilibrium stationary state. Another class of natural states are the SRB measures (named after Sinai, Ruelle, Bowen, see Ledrappier and Young [17]). The SRB states are invariant measures µ satisfying
$$h_f(\mu) = \sum \text{positive Lyapunov exponents},$$
where $h_f(\mu)$ is the time entropy associated with f (see [9, 17] for details).

Entropy Production
Define the entropy of a smooth probability measure ρ(dx) = ρ(x)dx to be
$$S(\rho) = -\int dx\, \rho(x) \log \rho(x).$$
From this, and taking limits, one obtains expressions for the entropy production per time step $e_f(\mu)$ associated with a natural stationary state µ in the above cases (i)-(iv). (We shall indicate how in case (i).)
(i) Let J(x) be the absolute value of the Jacobian of f with respect to some Riemann metric on M. If ρ has density ρ and fρ density $\rho_1 = (\rho \circ f^{-1})/(J \circ f^{-1})$, the entropy production is
$$e(\rho) = -[S(\rho_1) - S(\rho)] = -\int \rho(x)\,dx\, \log J(x) = -\int \rho(dx)\, \log J(x).$$
Therefore we write for a general probability measure µ,
$$e_f(\mu) = -\int \mu(dx)\, \log J(x).$$
When µ is ergodic, this is also minus the sum of all Lyapunov exponents.
(ii) We may write (disintegration of the f-invariant probability measure µ with respect to f)
$$\mu(dx) = \int \mu(dy)\, \nu_y(dx),$$
where $\nu_y$ is a probability measure carried by $f^{-1}\{y\}$. If $\nu_y$ has mass $c_{y\alpha}$ at $x_\alpha$ let $H(\nu_y) = -\sum_\alpha c_{y\alpha} \log c_{y\alpha}$ and define the folding entropy
$$F(\mu) = \int \mu(dy)\, H(\nu_y).$$
Here
$$e_f(\mu) = F(\mu) - \int \mu(dx)\, \log J(x).$$
(iii) In this case we use the rate of escape from X under f, which is (up to a change of sign)
$$P = \sup_\rho \Big\{ h(\rho) - \sum \text{positive Lyapunov exponents for } (\rho, f) \Big\} \le 0,$$
where the sup is taken over f-ergodic measures ρ with support in A. We have then the following formula for the entropy production:
$$e^X_f(\mu) = -P - \int \mu(dx)\, \log J(x).$$
The term −P corresponds to a renormalization of the probability to compensate for leakage out of a neighborhood of X. In the present situation it is natural to assume that the time evolution is Hamiltonian, so that J = 1, and the entropy production reduces to −P.
(iv) A natural nonequilibrium measure is here a measure µ on Ω × M such that its image by the projection Ω × M → Ω is P. Let $J_\omega$ denote the absolute value of the Jacobian of $f_\omega$, then
$$e_f(\mu) = -\int \mu(d\omega\, dx)\, \log J_\omega(x).$$
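As a rough numerical illustration of the formula of case (i), the sketch below estimates $-\int \mu(dx) \log J(x)$ by a time average of $-\log J$ along a single orbit. The choice of the Hénon map, its parameters, the starting point and the function names are assumptions made only for this example; the Hénon map is not discussed in the text and does not act on a compact manifold, so this is an illustration of the formula rather than of the setting. Its Jacobian determinant is constant, so the time average equals $-\log|b|$ whatever the stationary state is.

```python
import math

A, B = 1.4, 0.3  # classical Hénon parameters (an assumption for this sketch)

def henon(x, y):
    """One step of the Hénon map, used here as a toy dissipative diffeomorphism."""
    return 1.0 - A * x * x + y, B * x

def abs_jacobian(x, y):
    """|det Df(x, y)| with Df = [[-2*A*x, 1], [B, 0]]; the determinant is -B."""
    return abs((-2.0 * A * x) * 0.0 - 1.0 * B)

def entropy_production(n_steps=10000, burn_in=1000):
    """Time average of -log J along one orbit started at the origin."""
    x, y = 0.0, 0.0
    for _ in range(burn_in):          # discard the transient
        x, y = henon(x, y)
    total = 0.0
    for _ in range(n_steps):
        total += -math.log(abs_jacobian(x, y))
        x, y = henon(x, y)
    return total / n_steps

e = entropy_production()
```

Since the map contracts volume at every point, the estimate is strictly positive, in line with the interpretation of the entropy production as minus the sum of the Lyapunov exponents.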
Positivity of Entropy Production
We consider for definiteness the case (i) of a diffeomorphism f of M. For every f-ergodic measure µ we have
$$h(\mu) \le \sum \text{positive Lyapunov exponents}.$$
Replacing f by $f^{-1}$, this gives also
$$h(\mu) \le -\sum \text{negative Lyapunov exponents}.$$
For a SRB state µ we have
$$h(\mu) = \sum \text{positive Lyapunov exponents},$$
and subtracting the above inequality gives
$$0 \ge \sum \text{all Lyapunov exponents},$$
hence $e_f(\mu) \ge 0$. Consider now the probability measure ρ(dx) = ρ(x)dx with density ρ, and assume S(ρ) finite. If ρ* is a limit of $\rho^{(m)} = (1/m) \sum_{k=0}^{m-1} \rho_k$ (with $\rho_k = f^k \rho$) we have
$$e_f(\rho^*) = \lim e_f(\rho^{(m)}) = -\lim \int dx\, \rho^{(m)}(x) \log J(x) = -\lim \frac{1}{m} \sum_{k=0}^{m-1} \int dx\, \rho_k(x) \log J(x)$$
$$= \lim \frac{1}{m} \sum_{k=1}^{m} [S(\rho_{k-1}) - S(\rho_k)] = \lim \frac{1}{m} [S(\rho) - S(\rho_m)],$$
which is ≥ 0 because $S(\rho_m)$ is bounded above (by log vol M). While the above arguments are very simple, the corresponding problems are more difficult in cases (ii), (iii), (iv) and rigorous proofs exist only under specific assumptions (for which see [20, 21]).
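The role of the bound $S \le \log \mathrm{vol}\, M$ can be checked numerically in a one-dimensional toy case. The sketch below is an assumption-laden example, not taken from the text: it uses the circle diffeomorphism $f(x) = x + (\varepsilon/2\pi)\sin(2\pi x)$ (mod 1) with $0 < \varepsilon < 1$, whose Jacobian is $J(x) = 1 + \varepsilon\cos(2\pi x)$, and the uniform density ρ = 1, for which S(ρ) = 0 = log vol M is already maximal. The one-step entropy production $e(\rho) = -\int \rho(x)\,dx\,\log J(x)$ must then be nonnegative, and it can be compared with the classical closed form $-\log\big((1 + \sqrt{1 - \varepsilon^2})/2\big)$.

```python
import math

def log_jacobian(x, eps):
    """log J(x) for the circle map f(x) = x + (eps / 2 pi) sin(2 pi x) mod 1."""
    return math.log(1.0 + eps * math.cos(2.0 * math.pi * x))

def one_step_entropy_production(eps, n=2000):
    """e(rho) = -∫ rho(x) log J(x) dx for the uniform density rho = 1 (midpoint rule)."""
    h = 1.0 / n
    return -sum(log_jacobian((i + 0.5) * h, eps) for i in range(n)) * h

eps = 0.5  # any value in (0, 1) keeps f a diffeomorphism of the circle
e_numeric = one_step_entropy_production(eps)
e_exact = -math.log((1.0 + math.sqrt(1.0 - eps * eps)) / 2.0)
```

The midpoint rule is used because the integrand is smooth and periodic, so a modest number of nodes already reproduces the closed form to high accuracy.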
Strict Positivity of Entropy Production
When can one assert that $e_f(\mu) > 0$? In case (i), if µ is singular with respect to the Riemann volume and has no zero Lyapunov exponent, then $e_f(\mu) > 0$. This follows from a result of Ledrappier [16], see [20]. In case (ii) note that we may have $e_f(\mu) = 0$ even if F(µ) > 0. (If f : x ↦ 2x (mod 1) and µ is Lebesgue measure, then $e_f(\mu) = \log 2 - \log 2 = 0$.) In case (iii) suppose that f satisfies Smale’s Axiom A, and X is a basic set. The natural measure µ then satisfies
$$h(\mu) - \sum \text{positive Lyapunov exponents} = P.$$
This is compatible with $e_f(\mu) = 0$ only if X is an attractor for $f^{-1}$ and µ is the corresponding SRB measure on X (see [16]). Finally, consider the i.i.d. subcase of case (iv), i.e., $(\Omega, P) = (A^{\mathbb{Z}}, p^{\mathbb{Z}})$ and we have maps $f_\alpha : M \to M$ independently distributed according to p(dα). Assume furthermore that there is a steady state with density m with respect to Riemann volume. The image of m(x)dx by $f_\alpha$ has the density $m_\alpha(x) = m(f_\alpha^{-1} x)/J_\alpha(f_\alpha^{-1} x)$ and the entropy production vanishes only if $m_\alpha(x) = m(x)$ a.e. with respect to p(dα)m(x)dx. This follows from a remark by Kifer [14] (see also Ledrappier and Young [18], Ruelle [21]).
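The doubling-map remark of case (ii) above can be reproduced, and slightly generalized, in a few lines. The sketch below rests on an assumption that goes beyond the text: it considers piecewise linear maps of the unit interval with full branches of lengths $p_1, \dots, p_n$ (summing to 1), each mapped linearly onto [0, 1). For such maps Lebesgue measure is invariant, the Jacobian on branch i is $1/p_i$, and the conditional measure $\nu_y$ gives mass $p_i$ to the preimage of y in branch i. The function names are illustrative only.

```python
import math

def folding_entropy(branch_lengths):
    """F(mu) = ∫ mu(dy) H(nu_y); here H(nu_y) = -sum p_i log p_i for every y."""
    return -sum(p * math.log(p) for p in branch_lengths)

def mean_log_jacobian(branch_lengths):
    """∫ mu(dx) log J(x) for Lebesgue measure: J = 1/p_i on a branch of length p_i."""
    return sum(p * math.log(1.0 / p) for p in branch_lengths)

def entropy_production(branch_lengths):
    """e_f(mu) = F(mu) - ∫ mu(dx) log J(x), the formula of case (ii)."""
    return folding_entropy(branch_lengths) - mean_log_jacobian(branch_lengths)

doubling = [0.5, 0.5]        # f: x -> 2x (mod 1)
uneven = [0.2, 0.3, 0.5]     # a hypothetical three-branch example

e_doubling = entropy_production(doubling)
e_uneven = entropy_production(uneven)
```

In both examples the folding entropy is strictly positive while the entropy production vanishes, because the folding term is exactly compensated by the expansion term.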
Further Problems
(a) In case (ii) we know that for every f-ergodic measure µ
$$h(\mu) \le \sum \text{positive Lyapunov exponents}.$$
Under what condition is the following true?
$$h(\mu) \le F(\mu) + \sum |\text{negative Lyapunov exponents}|.$$
(The case where f is piecewise smooth and 1/J bounded is dealt with in [20].)
(b) The results known in case (iii) assume that f is an Axiom A diffeomorphism and X a basic set. One can prove that the natural measure µ for this case satisfies
$$h(\mu) - \sum \text{positive Lyapunov exponents} = P.$$
(See [20], and also the special case treated by Chernov and Markarian [5].) There remains the problem to treat more general situations.
References
1. Bahnmüller, J. and Liu Pei-Dong: Characterization of measures satisfying Pesin’s entropy formula for random dynamical systems. To appear
2. Bowen, R.: Equilibrium states and the ergodic theory of Anosov diffeomorphisms. Lecture Notes in Math. 470. Springer, Berlin, 1975
3. Bunimovich, L.A. and Spohn, H.: Viscosity for a periodic two disk fluid: an existence proof. Commun. Math. Phys. 176, 661–680 (1996)
4. Chernov, N.I., Eyink, G.L., Lebowitz, J.L. and Sinai, Ya.G.: Steady-state electrical conduction in the periodic Lorentz gas. Commun. Math. Phys. 154, 569–601 (1993)
5. Chernov, N.I. and Markarian, R.: Ergodic properties of Anosov maps with rectangular holes. To appear
6. Chernov, N.I. and Lebowitz, J.L.: Stationary shear flow in boundary driven Hamiltonian systems. Phys. Rev. Letters 75, 2831–2834 (1995); Stationary nonequilibrium states in boundary driven Hamiltonian systems: shear flow. J. Statist. Phys., to appear
7. Dobrushin, R.L.: The description of a random field by means of conditional probabilities and conditions of its regularity. Teorija Verojatn. i Ee Prim. 13, 201–229 (1968). English translation, Theory Prob. Applications 13, 197–224 (1968)
8. Dobrushin, R.L.: Gibbsian random fields for lattice systems with pairwise interactions. Funkts. Analiz i Ego Pril. 2, No 4, 31–43 (1968). English translation, Functional Anal. Appl. 2, 292–301 (1968)
9. Eckmann, J.-P. and Ruelle, D.: Ergodic theory of strange attractors. Rev. Mod. Phys. 57, 617–656 (1985)
10. Gallavotti, G.: Chaotic hypothesis: Onsager reciprocity and fluctuation-dissipation theorem. J. Statist. Phys. 84, 899–926 (1996)
11. Gallavotti, G. and Cohen, E.G.D.: Dynamical ensembles in nonequilibrium statistical mechanics. Phys. Rev. Letters 74, 2694–2697 (1995); Dynamical ensembles in stationary states. J. Statist. Phys. 80, 931–970 (1995)
12. Gaspard, P. and Nicolis, G.: Transport properties, Lyapunov exponents, and entropy per unit time. Phys. Rev. Letters 65, 1693–1696 (1990)
13. Hoover, W.G.: Molecular dynamics. Lecture Notes in Physics 258. Springer, Heidelberg, 1986
14. Kifer, Yu.: Ergodic theory of random transformations. Birkhäuser, Boston, 1986
15. Lanford, O.E. and Ruelle, D.: Observables at infinity and states with short range correlations in statistical mechanics. Commun. Math. Phys. 13, 194–215 (1969)
16. Ledrappier, F.: Propriétés ergodiques des mesures de Sinai. Publ. math. IHES 59, 163–188 (1984)
17. Ledrappier, F. and Young Lai-Sang: The metric entropy of diffeomorphisms: I. Characterization of measures satisfying Pesin’s formula, II. Relations between entropy, exponents and dimension. Ann. of Math. 122, 509–539, 540–574 (1985)
18. Ledrappier, F. and Young Lai-Sang: Entropy formula for random transformations. Prob. Th. Rel. Fields 80, 217–240 (1988)
19. Liu Pei-Dong and Qian Min: Smooth ergodic theory of random dynamical systems. Lecture Notes in Mathematics 1606. Springer, Berlin, 1995
20. Ruelle, D.: Positivity of entropy production in nonequilibrium statistical mechanics. J. Statist. Physics 85, 1–25 (1996)
21. Ruelle, D.: Positivity of entropy production in the presence of a random thermostat. J. Statist. Physics, to appear
22. Sinai, Ya.G.: Markov partitions and C-diffeomorphisms. Funkts. Analiz i Ego Pril. 2, No 1, 64–89 (1968). English translation, Functional Anal. Appl. 2, 61–82 (1968)
23. Sinai, Ya.G.: Gibbsian measures in ergodic theory. Uspehi Mat. Nauk 27, No 4, 21–64 (1972). English translation, Russian Math. Surveys 27, No 4, 21–69 (1972)

Communicated by Ya. G. Sinai
Commun. Math. Phys. 189, 373 – 393 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997
Complete Analyticity of the 2D Potts Model above the Critical Temperature
Aernout C. D. van Enter 1,*, Roberto Fernández 2,**,***, Roberto H. Schonmann 3,†, Senya B. Shlosman 4,‡
1 Instituut voor Theoretische Natuurkunde, Rijksuniversiteit Groningen, 9747 AG Groningen, The Netherlands
2 Instituto de Matemática e Estatística, Universidade de São Paulo, Caixa Postal 66281, 05389-970 São Paulo, Brazil
3 Mathematics Department, University of California at Los Angeles, Los Angeles, CA 90024, USA
4 Mathematics Department, University of California at Irvine, Irvine, CA 92717, USA and Institute for the Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
Received: 16 July 1996 / Accepted: 8 January 1997
Dedicated to the memory of Roland L’vovich Dobrushin

Abstract: We investigate the complete analyticity (CA) of the two-dimensional q-state Potts model for large values of q. We are able to prove it for every temperature $T > T_{cr}(q)$, provided we restrict ourselves to nice subsets, their niceness depending on the temperature T. Contrary to this restricted complete analyticity (RCA), the full CA is known to fail for some values of the temperature above $T_{cr}(q)$. Our proof is based on Pirogov-Sinai theory and cluster expansions for the Fortuin-Kasteleyn representation, which are available for the Potts model at all temperatures, provided q is large enough.

1. Introduction
In this paper we are dealing with the two-dimensional q-state Potts model, which is the statistical mechanics model on Z² with formal Hamiltonian
$$H(\sigma) = -\sum_{\{x,y\}} \delta_{\sigma(x),\sigma(y)}, \tag{1.1}$$
where σ(x) = 1, . . . , q is the spin variable at the site x ∈ Z², $\delta_{\sigma(x),\sigma(y)}$ is 1 for σ(x) = σ(y) and is 0 otherwise, the summation is taken over nearest neighbors, and q > 1 is an integer. The case q = 2 is the well known Ising model. It is known that the Potts model undergoes a first-order phase transition at a certain transition temperature $T_{cr} = T_{cr}(q)$, provided q is large enough. Namely, the model has
* Work partially supported by EU grant CHRX-CT93-0411.
** Researcher of the National Research Council (CONICET), Argentina.
*** Work partially supported by Fundación Antorchas and FAPESP, Projeto Temático 95/0790-1.
† Work partially supported by N.S.F. grants DMS 9100725 and DMS 9400644.
‡ Work partially supported by N.S.F. grant DMS 9500958 and by grant 930101470 of the Russian Fund for Fundamental Research.
q different Gibbs states for temperatures $T < T_{cr}$, q + 1 states at $T = T_{cr}$ and one state for $T > T_{cr}$ (see, for example, [KS]). We are going to study the Potts model in this last high temperature regime, and we want to investigate the problem of whether the unique Gibbs state the model has in that regime is completely analytic.
The notion of “Complete Analyticity” (CA) of an interaction U was introduced in [DS2] and [DS3] for lattice spin systems. It can be defined in many different equivalent ways. According to one of them one has to consider an arbitrary finite subset Λ of Z^d, and to compare the conditional Gibbs measures in Λ defined by U and two boundary conditions which only differ at a single site y. Namely, one asks for the distance in total variation between the restrictions of these Gibbs measures to an arbitrary subset $\Lambda_0 \subset \Lambda$ to decay exponentially with the Euclidean distance from y to $\Lambda_0$. In the papers [DS2] and [DS3] complete analyticity was shown to be equivalent to several properties of the conditional Gibbs measures corresponding to finite subsets of the lattice and arbitrary boundary conditions. All these properties are in the form of some estimates which are uniform, both in all the finite subsets of the lattice and in all the corresponding boundary conditions. They include the analytic dependence of the logarithm of the (conditional) partition function on the interaction parameters $U = \{U_A,\ A \subset Z^d,\ |A| < \infty\}$, the representation of the logarithm of the partition function as a sum of a volume term and a boundary term, the exponential decay of the truncated correlation functions, etc.
Later, Stroock and Zegarlinski showed in [SZ] that complete analyticity is also equivalent to some statements about the various corresponding Glauber-type dynamics (i.e., reversible spin flip dynamics) and their corresponding Dirichlet forms – including logarithmic Sobolev inequalities, and exponential convergence to equilibrium.
Again all the statements were uniform over all boundary conditions and all finite subsets of the lattice. It is natural to ask in the case of concrete models, like the Ising model, for which values of the parameters one has all these nice properties. It was realized that the notion of complete analyticity as originally defined, uniform over all finite subsets of the lattice, is actually too strong to hold in certain cases in which one still expects the system to have a very decent behavior. In particular, it is violated for the Ising model at low temperature and small nonzero field (for uncountably many curves in dimension D = 2 and for an open region of the (T, h)-plane for D ≥ 3, see [EFS], pp. 1010–15). Another explicit two-dimensional counterexample, due to one of us, was described in [MO2]. In this example the Hamiltonian considered is just slightly more complicated than the one of the Ising model, but the analysis is much simpler. The idea behind these examples is that if one considers arbitrary subsets of Z2 , then pathologies should not be unexpected, simply because the subsets may have boundaries which are comparable in size to the sets themselves. From the point of view of the physics involved in such problems, one is ready to compromise over such weird shapes and be satisfied with a condition of complete analyticity restricted to “reasonable” subsets of the lattice, including sufficiently thick rectangles, say. A project of this type was carried out by Martinelli and Olivieri in [MO2], [MO3] and related results appeared also in [LY]. In these papers results similar to those of Stroock and Zegarlinski were proven, in the form of equivalences between statements of complete analyticity, properties of reversible spin-flip dynamics and logarithmic Sobolev inequalities, uniformly only over certain subsets of the lattice, including all (sufficiently large) cubes. 
This weaker property was called “strong mixing for cubes” in [MO2], [MO3], [MOS] and “restricted complete analyticity” (RCA) in [SS]. As was remarked by Roland Dobrushin, the notion of restricted complete analyticity for a given lattice model is equivalent to (full) complete analyticity of another model, obtained from the initial one by partitioning the lattice into cubic blocks of a certain size
and considering the spin configurations of the initial system in the blocks to be the spin values of the new system. Therefore one can formulate the notion of restricted complete analyticity in as many equivalent ways as it is possible for the usual complete analyticity. The introduction of the notion of restricted complete analyticity turns out to be meaningful once one is able to establish this property beyond the region where the standard (full) complete analyticity is known to hold. The first such result was obtained in [MO2], where it was proven that the d-dimensional Ising model is RCA for any nonzero value of the magnetic field h, provided the temperature T is low enough: T ≤ T (d, h). This includes values where the standard CA property is violated ([EFS]). It was actually this result which prompted the remark quoted in the preceding paragraph. Indeed, if one considers the model obtained by partitioning the Ising model into cubic blocks of size l, then there exists a scale l(h) such that for l ≥ l(h) the block model satisfies the so-called e B(0) property, introduced in [DS1], which means that the finite volume ground state configurations should not depend on the boundary conditions for any volume. From that property the low-temperature RCA follows as a direct corollary of results in [DS2]. The next result in this direction was obtained in [MOS], where the authors showed, among other things, that RCA holds for the two-dimensional zero-field Ising model down to the critical temperature. Later some of us extended these results in [SS] by proving that RCA for 2D Ising model holds everywhere on the (T, h)-plane except for the phase transition segment ((T = 0, h = 0), (T = Tc , h = 0)) and the critical point (T = Tc , h = 0). It should be mentioned that it is not known to which extent CA holds in the above region. As mentioned above, in [EFS] this property was shown to be violated at low temperatures and small fields. 
It is not known whether or not CA holds at all temperatures above $T_c$.
In the present paper we study the same problem for the 2D q-state Potts model. Our main result is that if the number of states q is large enough, then the model is RCA for all temperatures T above the transition temperature $T_c = T_c(q)$. The validity of the RCA property for the Potts model turns out to be of more importance than for the zero-field Ising model: while CA for the Ising model still might hold at all temperatures above the critical one, it definitely fails for the Potts model for some supercritical temperatures, as was shown in [EFK]. For an estimate of the temperatures where CA does hold, see [L1,L2].
More precisely, we are proving the following result. Let l be an integer, and consider a natural partition of the lattice Z² by 2l × 2l squares. Consider the collection of all finite subsets Λ ⊂ Z² which can be represented as a finite union of these squares. Such subsets Λ will be called l-regular. Then the following result holds:
Theorem 1. Restricted Complete Analyticity for the 2D Potts model. Consider the two-dimensional q-state Potts model with q large enough, at any temperature $T > T_{cr}(q)$. Then each of the 12 equivalent properties of Complete Analyticity, formulated in [DS3], is valid for every l-regular box Λ, provided l ≥ l(T), where l(T) is some finite function.
The analysis of the result in [EFK] leads us to expect that the function l(T) has to diverge as $T \downarrow T_{cr}(q)$ for such a theorem to hold. We point out that our results hold for all q for which Theorem A below applies. While in its original form this Theorem does not provide the expected range for q, presumably Theorem A (or some version of it) will eventually be proven for all q for which there is a first-order phase transition in the temperature.
The strategy of the proof of the main result is the following: we first note that from the results of [MOS] it follows that for the two-dimensional systems the RCA
property holds provided the system satisfies the weak mixing condition. The latter is defined by saying that for an arbitrary finite subset Λ of Z², if we compare the Gibbs measures with any two boundary conditions, then the distance in total variation between the restrictions of the corresponding Gibbs measures to an arbitrary set $\Lambda_0 \subset \Lambda$ decays exponentially with the Euclidean distance between $\Lambda_0$ and ∂Λ. (We want to stress that in higher dimensions it may happen that weak mixing holds while RCA is violated – one example being the Czech models, studied in [Shl] – and that is the reason why our results are necessarily restricted to the two-dimensional case.)
To show the weak mixing we use the Edwards-Sokal (ES) coupling between the q-state Potts model and the corresponding Fortuin-Kasteleyn random cluster model (or FK model). The latter is described in detail below; for the box Λ it is a probability distribution on the set of all partitions of Λ into connected components, which are called clusters. The conditional distribution of the Potts model in Λ, given the configuration of FK clusters, which is defined by the ES coupling, is remarkably simple: for every cluster independently one has to choose one of the values 1, . . . , q with equal probability 1/q, and to put all spins in that cluster to be equal to the value chosen. The exception comes from the clusters attached to the boundary, where the (common) value for the spins in the cluster is defined by the boundary conditions. So, roughly speaking, the conditional distribution in $\Lambda_0 \subset \Lambda$ under the condition that there are no clusters connecting $\Lambda_0$ with the boundary ∂Λ does not depend on boundary conditions. As a result, we have weak mixing, provided we can show that the probability that there exists a cluster connecting $\Lambda_0$ with ∂Λ decays exponentially with the Euclidean distance between $\Lambda_0$ and ∂Λ. The analogues of the last statement were obtained for different models by different methods.
One was proven by Martirosyan [Mar2] for the low temperature Ising model with magnetic field (in arbitrary dimension). This result was strengthened (and the proof simplified) by one of us in [Sch]. Another result of this kind was obtained by one of us for the so called Czech models in [Shl]. Here we prove it by using the cluster expansion and Pirogov-Sinai theory for the large-q FK model, obtained in [LMMRS]. The specific feature of the large-q FK model is that it admits a cluster expansion which converges for all temperatures (and not only for low or high temperatures, like the Ising model). That enables us to obtain the weak mixing for all temperatures above the critical one.
Once we know the weak mixing property, we can claim, by invoking the result of [MOS], that the q-state Potts model has the following properties above the critical temperature $T_{cr}(q)$:
(i) Restricted complete analyticity, in the sense that the sets Λ in the definition that we reviewed above are restricted to be l-regular, l = l(T).
(ii) Exponential convergence to equilibrium of the associated Glauber dynamics uniformly over l-regular subsets, uniformly over boundary conditions and over initial conditions.
(iii) Positive lower bound for the spectral gap of the generator of the associated Glauber dynamics, uniform over l-regular subsets with arbitrary boundary conditions.
(iv) Finite upper bound for the logarithmic Sobolev constant of the generator of the associated Glauber dynamics, uniform over l-regular subsets with arbitrary boundary conditions.
(v) A constructive condition for uniqueness of the Gibbs measure in infinite volume which was introduced by Dobrushin and one of us in [DS1] is satisfied.
In fact, these nice properties were stated in [MOS] to hold only for square subsets, but they are valid for all l-regular subsets of the lattice and also all rectangles. We refer the reader to [MOS] for the precise statements and the various necessary definitions.
Another consequence of RCA is ([MO1]) that a sufficiently often iterated decimation transformation (how often depends on how close one is to the transition temperature) acting on the infinite volume Gibbs measure results in a Gibbs measure, even though applying the decimation transformation only a few times can result in a non-Gibbsian measure [EFK]. Similarly, the restriction of the RCA Gibbs measure to the spins on a line will be a Gibbs measure, compare [L1,L2].
It should be noted that our restriction to the two-dimensional case comes only from the fact that in the final step of the proof we apply the results of [MOS], which are essentially two-dimensional. However, all the results of the present paper which do not rely on [MOS] hold in any dimension.
In the next section we introduce the necessary notation and review some known results. The proof of our main statement is given in Section 3.

2. Notation and terminology
The lattice: The cardinality of a set Λ ⊂ Z² will be denoted by |Λ|. The expression Λ ⊂⊂ Z² will mean that Λ is a finite subset of Z². For each x ∈ Z², we define the usual norm $\|x\| = \max\{|x_1|, |x_2|\}$. The distance between two sets A, B ⊂ Z² will be denoted by $\mathrm{dist}(A, B) = \inf\{\|x - y\| : x \in A,\ y \in B\}$. The (interior) boundary of a set Λ ⊂ Z² will be denoted by
$$\partial\Lambda = \{x \in \Lambda : \|x - y\| = 1 \text{ for some } y \notin \Lambda\}.$$
For lattice squares centered at the origin, we will use the notation $\Lambda(l) = Z^2 \cap [-l, l]^2$. We will consider also layers
$$L(l) = \Lambda(l) \setminus \Lambda(l - 1). \tag{2.1}$$
The graph of bonds, i.e., (unordered) pairs of nearest neighbors, is defined as
$$B = \{\{x, y\} : x, y \in Z^2 \text{ and } \|x - y\| = 1\}.$$
Given a set Λ ⊂⊂ Z² we define also
$$B_\Lambda = \{\{x, y\} : x \text{ or } y \in \Lambda \text{ and } \|x - y\| = 1\},$$
$$\partial B_\Lambda = \{\{x, y\} : x \in \Lambda,\ y \notin \Lambda \text{ and } \|x - y\| = 1\}.$$
A chain is a sequence of distinct sites $x_1, \dots, x_n$, with the property that for i = 1, . . . , n − 1, $\|x_i - x_{i+1}\| = 1$. The sites $x_1$ and $x_n$ are called the end-points of the chain $x_1, \dots, x_n$, and n is its length. A chain is said to connect two sets if it has one end-point in each set.
The configurations, observables and measures: At each site in Z² there is a spin which can take values 1, . . . , q, where q is an integer. The spin configurations will therefore be elements of the set $\{1, \dots, q\}^{Z^2} = \Omega$. Given σ ∈ Ω, we write σ(x) for the spin at the site x ∈ Z². For A ⊂ Z² we denote by $\sigma_A$ the restriction of σ to A. Likewise, this restriction $\sigma_A$ can be viewed as a subset of the set of all configurations:
σA = {σ ∈ : σ|A = σA }.
(2.2)
The single spin space, {1, . . . , q} is endowed with the discrete topology and is endowed with the corresponding product topology. The following definition will be important when we introduce finite systems with boundary conditions later on; given 3 ⊂⊂ Z2 and a configuration η ∈ , we introduce 3,η = {σ ∈ :σ(x) = η(x) for all x 6∈ 3}. Real-valued functions with domain in are called observables. Local observables are those which depend only on the values of finitely many spins; more precisely, f : → R is a local observable if there exists a set S ⊂⊂ Z2 such that f (σ) = f (η) whenever σ(x) = η(x) for all x ∈ S. The smallest S with this property is called the support of f , denoted supp(f ). The topology introduced above on , has the nice feature that it makes the set of local observables dense in the set of all continuous observables. We will also use bond variables. For every bond {x, y} ∈ B we introduce the bond variable nxy , taking values 0 and 1. The bond configurations n will be the elements of the set B = {0, 1}B . We call the bond {x, y} open with respect to the configuration n, if nxy = 1, and closed otherwise. Two bonds open with respect to n will be called connected by n, if there is a chain of endpoints of bonds open with respect to n, joining the endpoints of these bonds. A maximal connected component of open bonds will be called a (open) cluster of n. A single site, not connected to any other site, forms a cluster by definition. We introduce now the sets of bond configurations which are compatible with (site) boundary conditions. Namely, for every 3 ⊂⊂ Z2 and every configuration η ∈ , we introduce / B3 , B3,η ={n ∈ B : nxy = 0 for all {x, y} ∈ (2.3) η(u) = η(v) for all u, v ∈ / 3, connected by n}. We denote by B3 the larger family of all bond configurations, which are indifferent to the boundary conditions: / B3 }. B3 = {n ∈ B : nxy = 0 for all {x, y} ∈ We endow also with the Borel σ-algebra corresponding to the topology introduced above. 
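In this fashion, each probability measure $\mu$ on this space can be identified by the corresponding expected values $\int f \, d\mu$ of all the local observables $f$.
The Gibbs measures: We will always consider the formal Hamiltonian (1.1). In order to give precise definitions, we define, for each set $\Lambda \subset\subset \mathbb{Z}^2$, each boundary condition $\eta \in \Omega$ and each $\sigma \in \Omega_{\Lambda,\eta}$,
$$H_{\Lambda,\eta}(\sigma) = - \sum_{\{x,y\} \in B_\Lambda} \delta_{\sigma(x),\sigma(y)}. \tag{2.4}$$
Given $\Lambda \subset\subset \mathbb{Z}^2$, $\eta \in \Omega$, and $E \subset \Omega$, we write
$$Z^{P}_{\Lambda,\eta,T}(E) = \sum_{\sigma \in \Omega_{\Lambda,\eta} \cap E} \exp(-\beta H_{\Lambda,\eta}(\sigma)), \tag{2.5}$$
where $\beta = 1/T$ is the inverse temperature, and the superscript $P$ stands for “Potts”. We abbreviate $Z^{P}_{\Lambda,\eta,T} = Z^{P}_{\Lambda,\eta,T}(\Omega)$.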
Complete Analyticity of the 2D Potts Model above the Critical Temperature
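The Gibbs (probability) measure in $\Lambda$ with boundary condition $\eta$ at temperature $T$ is now defined on $\Omega$ as
$$\mu_{\Lambda,\eta,T}(\sigma) = \begin{cases} \dfrac{\exp(-\beta H_{\Lambda,\eta}(\sigma))}{Z^{P}_{\Lambda,\eta,T}}, & \text{if } \sigma \in \Omega_{\Lambda,\eta}, \\ 0, & \text{otherwise.} \end{cases} \tag{2.6}$$
The Fortuin-Kasteleyn model: For every bond configuration $n \in \mathcal{B}_{\Lambda,\eta}$ we define the total number of open bonds by
$$|n| = \sum_{\{x,y\} \in B_\Lambda} n_{xy}.$$
For every bond configuration $n \in \mathcal{B}_{\Lambda,\eta}$ we define the number $C_\Lambda(n)$ of inner clusters in $\Lambda$ to be the number of open clusters of $n$ which contain no points outside $\Lambda$. The FK (probability) measure in $\Lambda$ with boundary condition $\eta$ at temperature $T$ is defined on $\mathcal{B}$ by
$$\mu_{\Lambda,\eta,T}(n) = \begin{cases} \dfrac{(e^{\beta} - 1)^{|n|} q^{C_\Lambda(n)}}{Z^{FK}_{\Lambda,\eta,T}}, & \text{if } n \in \mathcal{B}_{\Lambda,\eta}, \\ 0, & \text{otherwise,} \end{cases} \tag{2.7}$$
where the FK partition function is
$$Z^{FK}_{\Lambda,\eta,T} = \sum_{n \in \mathcal{B}_{\Lambda,\eta}} (e^{\beta} - 1)^{|n|} q^{C_\Lambda(n)}. \tag{2.8}$$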
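The Edwards-Sokal coupling: Finally we remind the reader of the construction of the ES coupling between the Potts model and the Fortuin-Kasteleyn random cluster model. Consider the box $\Lambda$, and for any bond $\{x, y\} \in B_\Lambda$ let us introduce a new variable $n_{xy}$, taking values 0 and 1. Using the identity $\exp\{\beta \delta_{u,v}\} = 1 + (e^{\beta} - 1)\delta_{u,v}$, we can rewrite the expression for the Gibbs factor $\exp(-\beta H_{\Lambda,\eta}(\sigma))$ in the formula for the Gibbs distribution (2.6). Namely, for every $\sigma \in \Omega_{\Lambda,\eta}$ we have
$$\begin{aligned} \exp(-\beta H_{\Lambda,\eta}(\sigma)) &= \exp\Big\{\beta \sum_{\{x,y\} \in B_\Lambda} \delta_{\sigma(x),\sigma(y)}\Big\} \\ &= \sum_{n \in \mathcal{B}_\Lambda} \prod_{\{x,y\} \in B_\Lambda} [\delta_{n_{xy},0} + (e^{\beta} - 1)\delta_{n_{xy},1}\delta_{\sigma_x,\sigma_y}] \\ &= \sum_{n \in \mathcal{B}_{\Lambda,\eta}} \prod_{\{x,y\} \in B_\Lambda} [\delta_{n_{xy},0} + (e^{\beta} - 1)\delta_{n_{xy},1}\delta_{\sigma_x,\sigma_y}]. \end{aligned} \tag{2.9}$$
The second equality is straightforward, and the third one holds because the bond configurations from $\mathcal{B}_\Lambda \setminus \mathcal{B}_{\Lambda,\eta}$ do not contribute to the sum. So we can now introduce the Edwards-Sokal probability distribution on the pairs $(\sigma, n) \in \Omega_{\Lambda,\eta} \times \mathcal{B}_{\Lambda,\eta}$ by
$$\mu_{\Lambda,\eta,T}(\sigma, n) = \frac{\prod_{\{x,y\} \in B_\Lambda} [\delta_{n_{xy},0} + (e^{\beta} - 1)\delta_{n_{xy},1}\delta_{\sigma_x,\sigma_y}]}{Z^{P}_{\Lambda,\eta,T}} \equiv \frac{\prod_{\{x,y\} \in B_\Lambda} [\delta_{n_{xy},0} + (e^{\beta} - 1)\delta_{n_{xy},1}\delta_{\sigma_x,\sigma_y}]}{Z^{ES}_{\Lambda,\eta,T}}, \tag{2.10}$$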
A.C.D. van Enter, R. Fernández, R. H. Schonmann, S. B. Shlosman
where the Edwards-Sokal partition function $Z^{ES}_{\Lambda,\eta,T}$ is defined by
$$Z^{ES}_{\Lambda,\eta,T} = \sum_{n \in \mathcal{B}_\Lambda,\ \sigma \in \Omega_{\Lambda,\eta}} \ \prod_{\{x,y\} \in B_\Lambda} [\delta_{n_{xy},0} + (e^{\beta} - 1)\delta_{n_{xy},1}\delta_{\sigma_x,\sigma_y}] \equiv Z^{P}_{\Lambda,\eta,T}. \tag{2.11}$$
The statement that formula (2.10) indeed defines a probability measure, as well as the equality of the two partition functions, follows immediately from the identity (2.9). A straightforward check shows that the marginal distribution of the $n$-variables under the measure (2.10) is nothing else than the FK measure (2.7), and therefore the three partition functions are equal:
$$Z^{P}_{\Lambda,\eta,T} = Z^{ES}_{\Lambda,\eta,T} = Z^{FK}_{\Lambda,\eta,T}. \tag{2.12}$$
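The equality of the Potts and FK partition functions in (2.12) can be checked by brute force on a very small box. The following Python sketch is only an illustration under stated assumptions: the two-site box, the six boundary sites, the boundary condition and the values of q and β are hypothetical choices, and the function names are not taken from the paper. It sums the Gibbs factor over spins as in (2.5), and the FK weight over compatible bond configurations as in (2.8), counting only clusters that contain no site outside the box.

```python
import itertools
import math


def find(parent, s):
    """Union-find root lookup with path halving."""
    while parent[s] != s:
        parent[s] = parent[parent[s]]
        s = parent[s]
    return s


def potts_partition(inner, boundary_spin, bonds, q, beta):
    """Z^P: sum of exp(-beta*H) over spins on `inner`; outside spins are fixed."""
    total = 0.0
    for values in itertools.product(range(q), repeat=len(inner)):
        spin = dict(boundary_spin)
        spin.update(zip(inner, values))
        agreeing = sum(1 for x, y in bonds if spin[x] == spin[y])
        total += math.exp(beta * agreeing)  # -H counts the agreeing bonds
    return total


def fk_partition(inner, boundary_spin, bonds, q, beta):
    """Z^FK: sum of (e^beta - 1)^|n| q^C(n) over compatible bond configurations."""
    sites = list(inner) + list(boundary_spin)
    total = 0.0
    for n in itertools.product((0, 1), repeat=len(bonds)):
        parent = {s: s for s in sites}
        for (x, y), is_open in zip(bonds, n):
            if is_open:
                parent[find(parent, x)] = find(parent, y)
        # A cluster touching the outside must see a single boundary spin value.
        seen = {}
        compatible = True
        for site, eta in boundary_spin.items():
            if seen.setdefault(find(parent, site), eta) != eta:
                compatible = False
                break
        if not compatible:
            continue
        # Inner clusters are those containing no site outside the box.
        inner_roots = {find(parent, s) for s in inner} - set(seen)
        total += (math.exp(beta) - 1.0) ** sum(n) * q ** len(inner_roots)
    return total


# Hypothetical toy box: two inner sites, six fixed neighbours, seven bonds.
q, beta = 3, 0.7
inner = [(0, 0), (1, 0)]
boundary_spin, bonds = {}, []
for x, y in inner:
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nb = (x + dx, y + dy)
        if nb in inner:
            if (nb, (x, y)) not in bonds:
                bonds.append(((x, y), nb))
        else:
            boundary_spin[nb] = (nb[0] + nb[1]) % q  # arbitrary boundary condition
            bonds.append(((x, y), nb))

z_potts = potts_partition(inner, boundary_spin, bonds, q, beta)
z_fk = fk_partition(inner, boundary_spin, bonds, q, beta)
print(z_potts, z_fk)
```

The two printed numbers should agree up to rounding, since summing the Edwards-Sokal weight over spins gives a factor q for every inner cluster and a factor 1 or 0 for every boundary cluster.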
3. The Proof of Theorem 1
3.1. The general strategy. As we already said in the introduction, the RCA property will be established once we check the weak mixing property for the Potts model. This property is the following estimate on the variation distance:
$$\mathrm{Var}(\mu_{\Lambda,\eta^1,T}|_A, \mu_{\Lambda,\eta^2,T}|_A) \le C \sum_{x \in A,\ y \notin \Delta} \exp\{-c\, \mathrm{dist}(x, y)\}, \tag{3.1}$$
which should hold for every $A$, $\Delta$ such that $A \subset V \subset\subset \mathbb{Z}^2$, and where $\mu_{\Lambda,\eta,T}|_A$ is the restriction of the measure to $A$, while $C$, $c$ are positive constants which do not depend on $A$, $\Delta$, $\eta^1$, $\eta^2$. Actually, the proof of the implication {weak mixing} ⇒ RCA, given in [MOS], does not use the full strength of the weak-mixing property (3.1). It is enough to know (3.1) only for some special $A$-s and $\Delta$-s. Namely, let some $k$ be fixed, and consider the subsets $\Lambda$ which can be obtained by taking unions, intersections and complements of at most $k$ lattice rectangles of sizes not greater than $l$, for some real $l$. Let $0 < p < 1$ be some real number, and let $A = A(\Lambda, l, p) = \{x \in \Lambda : \mathrm{dist}(x, \Lambda^c) \ge l^p\}$. In that case (3.1) boils down to
$$\mathrm{Var}(\mu_{\Lambda,\eta^1,T}|_A, \mu_{\Lambda,\eta^2,T}|_A) \le \exp\{-c\, l^p\} \tag{3.2}$$
(with smaller $c$). To use [MOS] one needs to know (3.2) for all $l$ sufficiently big and $k \le 2$. In order to check (3.2) it is enough to prove for all such $\Lambda$, $A$ the estimate
$$\left| \frac{\mu_{\Lambda,\eta^1,T}(\sigma_A)}{\mu_{\Lambda,\eta^2,T}(\sigma_A)} - 1 \right| \le \exp\{-c\, l^p\} \tag{3.3}$$
for every configuration $\sigma_A$, uniformly in $\eta^1$, $\eta^2$, which clearly implies (3.2). We will give the proof only for the case of the square box $\Lambda(l)$; the generalization will then be obvious to the reader. Without loss of generality we can suppose that $\eta^1, \eta^2 \in \Omega_{\sigma_A}$ (see (2.2)), as is evident from the definitions (2.4), (2.6). In this case the ratio in (3.3) can be rewritten as
$$\frac{Z^{P}_{\Lambda \setminus A,\eta^1,T}\, Z^{P}_{\Lambda,\eta^2,T}}{Z^{P}_{\Lambda,\eta^1,T}\, Z^{P}_{\Lambda \setminus A,\eta^2,T}}. \tag{3.4}$$
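The step from the ratio estimate (3.3) to the variation-distance estimate (3.2) is elementary: if the ratio of the two probabilities differs from 1 by at most ε for every configuration, then the sum of the absolute differences is at most ε. The Python sketch below illustrates this on two hypothetical distributions over four configurations. The normalisation of the variation distance with a factor 1/2 is an assumption made here, since the text does not fix the convention; without the factor the bound reads ε instead of ε/2.

```python
def variation_distance(mu1, mu2):
    """Total variation distance, with the 1/2 normalisation (an assumption)."""
    keys = set(mu1) | set(mu2)
    return 0.5 * sum(abs(mu1.get(k, 0.0) - mu2.get(k, 0.0)) for k in keys)


def max_ratio_deviation(mu1, mu2):
    """Largest value of |mu1/mu2 - 1|; requires mu2 > 0 on every configuration."""
    return max(abs(mu1[k] / mu2[k] - 1.0) for k in mu2)


# Hypothetical distributions on four configurations sigma_A.
mu2 = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
mu1 = {"a": 0.26, "b": 0.24, "c": 0.255, "d": 0.245}

eps = max_ratio_deviation(mu1, mu2)
tv = variation_distance(mu1, mu2)
print(eps, tv)
```

The inequality behind the check is $\sum_\sigma |\mu_1(\sigma) - \mu_2(\sigma)| = \sum_\sigma \mu_2(\sigma)\, |\mu_1(\sigma)/\mu_2(\sigma) - 1| \le \varepsilon$.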
Let us explain why one should expect the last ratio to be close to one in the regime T > Tcr . The Potts model has one Gibbs state in that regime, which is called the chaotic
state. Therefore the typical configuration of the system in the box $\Lambda$ under the boundary condition $\eta$ is the following: near the boundary it is dictated by the boundary condition $\eta$, whereas somewhere close to the boundary $\partial\Lambda$ there is a long contour $\Gamma$, separating the boundary layer from the rest of the box, where the system behaves chaotically. So the partition function can be written as a sum over such contours,
$$Z^{P}_{\Lambda,\eta,T} = \sum_{\Gamma} Z^{P}_{\Lambda,\eta,T}(\Gamma).$$
Within the precision we need, we can rewrite it as
$$Z^{P}_{\Lambda,\eta,T} \approx \sum^{(\prime)}_{\Gamma} Z^{P}_{\Lambda,\eta,T}(\Gamma), \tag{3.5}$$
where the summation is restricted to those $\Gamma$ which are close to the boundary $\partial\Lambda$. The partition function $Z^{P}_{\Lambda \setminus A,\eta,T}$ can be written in the same way. However, the boundary of the box $\Lambda \setminus A$ is not connected, so the analogue of (3.5) is the following:
$$Z^{P}_{\Lambda \setminus A,\eta,T} \approx \sum^{(\prime)}_{\Gamma, \bar\Gamma} Z^{P}_{\Lambda \setminus A,\eta,T}(\Gamma, \bar\Gamma), \tag{3.6}$$
where again the summation is restricted to $\Gamma$ lying close to the boundary $\partial\Lambda$ and to $\bar\Gamma$ lying close to the boundary $\partial A$. We have, therefore, approximate equalities:
$$Z^{P}_{\Lambda,\eta^i,T} \approx \sum^{(\prime)}_{\Gamma_i} Z^{P}_{\Lambda,\eta^i,T}(\Gamma_i), \tag{3.7}$$
$$Z^{P}_{\Lambda \setminus A,\eta^i,T} \approx \sum^{(\prime)}_{\Gamma_i, \bar\Gamma} Z^{P}_{\Lambda \setminus A,\eta^i,T}(\Gamma_i, \bar\Gamma). \tag{3.8}$$
So it is enough to estimate the ratio
$$\frac{Z^{P}_{\Lambda \setminus A,\eta^1,T}(\Gamma_1, \bar\Gamma)\, Z^{P}_{\Lambda,\eta^2,T}(\Gamma_2)}{Z^{P}_{\Lambda,\eta^1,T}(\Gamma_1)\, Z^{P}_{\Lambda \setminus A,\eta^2,T}(\Gamma_2, \bar\Gamma)}$$
for every triple $(\Gamma_1, \Gamma_2, \bar\Gamma)$ of contours which are close to the corresponding parts of the boundaries of our subsets. To see the desired cancellation we observe that the logarithm of the partition function $Z^{P}_{\Lambda \setminus A,\eta,T}(\Gamma, \bar\Gamma)$ can be represented as a volume term plus a boundary term, and if the two boundaries $\partial\Lambda$, $\partial A$ are well separated, this boundary term is nearly a sum of two terms corresponding to the contours $\Gamma$, $\bar\Gamma$ (again with the same precision). This is the strategy we are going to follow. There are different options to study these partition functions. One way is to use the variant of the Pirogov-Sinai theory for the Potts model, developed in [Mar1]. Technically, however, it is easier to pass first to the FK representation introduced above, and then to use the Pirogov-Sinai theory for it, developed in [LMMRS]. To implement this program we rewrite (3.4) with the help of the identity (2.12) as
$$\frac{Z^{P}_{\Lambda \setminus A,\eta^1,T}\, Z^{P}_{\Lambda,\eta^2,T}}{Z^{P}_{\Lambda,\eta^1,T}\, Z^{P}_{\Lambda \setminus A,\eta^2,T}} = \frac{Z^{FK}_{\Lambda \setminus A,\eta^1,T}\, Z^{FK}_{\Lambda,\eta^2,T}}{Z^{FK}_{\Lambda,\eta^1,T}\, Z^{FK}_{\Lambda \setminus A,\eta^2,T}}. \tag{3.9}$$
We want to express the above partition functions in terms of more familiar FK partition functions with free and wired boundary conditions. We then want to use the corresponding contour models to treat the latter. In order to proceed we need some more notation, notions and results which we borrow from [LMMRS].
3.2. Pirogov-Sinai theory of the FK model and cluster expansions. We call a plaquette $p$ any four-tuple of bonds in $B$ which form an elementary cell. We call two bonds adjacent if they share a vertex, and we call them coadjacent if they belong to the same plaquette. These definitions lead to natural notions of connectedness and coconnectedness of a subset of $B$. Let $X \subset B$ be a subgraph with no isolated sites. We denote by $v(X) \subset \mathbb{Z}^2$ the set of its vertices, and by $|X|$ the number of its bonds. The subset $v_I(X) \subset v(X)$ of inner vertices consists of all vertices which belong to four bonds of $X$. The bond $b \in X$ belongs to the boundary $\partial X \subset X$ iff $b \in p$, where $p$ is a plaquette such that $p \not\subset X$. The bond $b \notin X$ belongs to the coboundary $\delta X \subset X^c$ iff $v(b) \cap v(X) \ne \emptyset$. We denote by $C(X)$ the number of connected components of the graph $X$. Let now $V \subset B$ be a finite subgraph without isolated vertices. We introduce the partition functions with free and wired boundary conditions by
$$Z^{f}(V) = \sum_{X \subset V,\ X \cap \delta V^c = \emptyset} (e^{\beta} - 1)^{|X|} q^{C(X) + |v_I(V) \setminus v(X)|},$$
$$Z^{w}(V) = \sum_{X \subset V,\ \partial V \subset X} (e^{\beta} - 1)^{|X|} q^{C(X) - C(V) + |v_I(V) \setminus v(X)|}.$$
The following limits exist and are equal:
$$\lim_{V \to B} (1/|V|) \ln Z^{f}(V) = \lim_{V \to B} (1/|V|) \ln Z^{w}(V) = f(\beta).$$
A coconnected subset $\Gamma \subset B$ is called a contour if it is a coboundary of some $X \subset B$. If $\Gamma$ is finite, then either $X$ or $X^c$ is finite. The unique infinite component of $B \setminus \Gamma$ is called the exterior of $\Gamma$ and is denoted by $\mathrm{Ext}(\Gamma)$. We also introduce $V(\Gamma) = B \setminus \mathrm{Ext}(\Gamma)$, and $\mathrm{Int}(\Gamma) = V(\Gamma) \setminus \Gamma$. For $b \in \delta X$ we introduce $d(b)$ as the number of endpoints of $b$ which belong to $X$, and we define the length of the contour $\Gamma$ by
$$\|\Gamma\| = \sum_{b \in \Gamma} d(b).$$
If $X$ is finite, then $\Gamma$ is called a contour of the free class, and if $X^c$ is finite, then $\Gamma$ is called a contour of the wired class. Note that some of the contours belong to both classes. For each of the classes one introduces in the standard way the notions of compatible contours and external contours. For a family $\theta = \{\Gamma_1, \dots, \Gamma_n\}$ of mutually compatible external contours in $V$ we introduce $V(\theta) = \cup_i V(\Gamma_i)$, $\mathrm{Int}(\theta) = V(\theta) \setminus \theta$, $\mathrm{Ext}(\theta) = B \setminus V(\theta)$, $\mathrm{Ext}_V(\theta) = V \setminus V(\theta)$. With these definitions we obtain the following relations between the partition functions:
$$Z^{f}(V) = \sum_{\theta_f \subset V} q^{|v_I(V \setminus \mathrm{Int}(\theta_f))|} Z^{w}(\mathrm{Int}(\theta_f)), \tag{3.10}$$
where the sum is over the families $\theta_f$ of mutually compatible external $f$-contours in $V$, and
$$Z^{w}(V) = \sum_{\theta_w \subset V} (e^{\beta} - 1)^{|\mathrm{Ext}_V(\theta_w)|} Z^{f}(V(\theta_w)), \tag{3.11}$$
where the sum runs over the families $\theta_w$ of mutually compatible external $w$-contours in $V$ which do not intersect the boundary $\partial V$. A contour model is specified by assigning weights $\varphi(\Gamma)$ to contours. The corresponding partition function is defined by
$$Z(V|\varphi) = \sum_{\partial \subset V} \prod_{\Gamma \in \partial} \varphi(\Gamma), \tag{3.12}$$
where the sum is over admissible families $\partial$ of contours in $V$. We are going to consider contour models both for $f$- and $w$-contours; in the first case admissibility means that the contours $\Gamma_f$ are compatible and are in $V$, while in the second case it means that the contours $\Gamma_w$ are compatible, are in $V$ and moreover $\Gamma_w \cap \partial V = \emptyset$. For every family $\partial$ of admissible contours we introduce the subset $\theta(\partial) \subset \partial$ as the collection of all external contours in $\partial$. Evidently,
$$Z(V|\varphi) = \sum_{\theta \subset V} \prod_{\Gamma \in \theta} \varphi(\Gamma) Z(\mathrm{Int}(\Gamma)|\varphi), \tag{3.13}$$
where the summation is over all families $\theta$ of external contours. We are going to consider the probability distribution $\nu_{V,\varphi}$ on the ensemble of the admissible contours in $V$, corresponding to the contour functional $\varphi$. Namely, we define the probability to observe the family $\partial$ by
$$\nu_{V,\varphi}(\partial) = \frac{\prod_{\Gamma \in \partial} \varphi(\Gamma)}{Z(V|\varphi)}. \tag{3.14}$$
By applying the Peierls transformation one gets immediately from this definition that the probability of a given contour $\Gamma$ to appear in $V$ satisfies the Peierls estimate:
$$\nu_{V,\varphi}\{\partial : \Gamma \in \partial\} \le \varphi(\Gamma). \tag{3.15}$$
The contour model with a parameter $a \ge 0$ is defined by the following partition function:
$$Z(V|\varphi, a) = \sum_{\theta \subset V} \prod_{\Gamma \in \theta} e^{a|V(\Gamma)|} \varphi(\Gamma) Z(\mathrm{Int}(\Gamma)|\varphi), \tag{3.16}$$
where the sum runs over all families $\theta$ of external contours. We also introduce the probability distribution $\nu_{V,\varphi,a}(\partial)$ for the contours of the contour model with parameter by modifying the definition (3.14) in an obvious way. The important difference is that once $a > 0$, the estimate (3.15) is no longer valid in general. A contour model with parameter is in fact associated to an “unstable phase” or “wrong” boundary condition. The presence of a parameter $a > 0$ favors the formation of a “large” contour representing a flip into a “stable phase”, taking place very close to the boundary (Lemma 1 below). The advantage of contour models lies in the fact that they can be treated by means of the cluster expansion technique. However, that is possible only for those contour models whose contour functional satisfies the estimate $|\varphi(\Gamma)| \le e^{-\tau \|\Gamma\|}$, with $\tau$ reasonably big. In that case the functional is called a $\tau$-functional, following [PS1], [PS2], [Sin]. This ensures the existence of the free energy per bond:
$$f(\varphi) = \lim_{V \to B} (1/|V|) \ln Z(V|\varphi).$$
Actually, it implies much more. Namely, one has the following formula for the partition function:
$$\ln Z(V|\varphi) = \sum_{B \subset V} \Phi(B),$$
where the sum runs over all connected subsets of $V$, and $\Phi$ is a $\varphi$-dependent function which satisfies the bound $\Phi(B) \le e^{-\frac{\tau}{2} d(B)}$, where $d(B)$ is the number of bonds in the smallest connected set which contains all boundary bonds of $B$. In particular, one has the following formula for the logarithm of the partition function:
$$\ln Z(V|\varphi) = |V| f(\varphi) + \sum_{b \in \partial V} g_\varphi(b, V), \tag{3.17}$$
where the function $g_\varphi(b, V)$ is defined for every pair consisting of a bond $b$ and a box $V$ such that $b \in \partial V$, and has the following regularity properties:
$$|g_\varphi(b, V)| \le C e^{-\frac{\tau}{2}}, \tag{3.18}$$
$$|g_\varphi(b, V_1) - g_\varphi(b, V_2)| \le C e^{-\frac{\tau}{2} \mathrm{dist}(b, V_1 \triangle V_2)} \tag{3.19}$$
for $b \in \partial V_1 \cap \partial V_2$, where $V_1 \triangle V_2$ stands for the symmetric difference. (The above statements are standard from the point of view of the theory of cluster expansions and can be found, for example, in [DKS], sect. 3.11.)
In [LMMRS] the contour functionals which describe the FK model (in a sense which will be explained later) were constructed. We will need the following result, which is part of the main result of [LMMRS]:
Theorem A. Consider the two-dimensional FK model for the $q$-state Potts model, $q$ being large enough, in the regime $\beta < \beta_{cr}(q)$. Then there exist $\tau$-functionals $\varphi_f$, $\varphi_w$ and a real parameter $a = a(\beta) > 0$ such that
$$Z^{f}(V) = q^{|v_I(V)|} Z(V|\varphi_f), \tag{3.20}$$
$$Z^{w}(V) = (e^{\beta} - 1)^{|V|} Z(V|\varphi_w, a). \tag{3.21}$$
The following relations hold:
$$a + \ln(e^{\beta} - 1) + f(\varphi_w) = \tfrac{1}{2} \ln q + f(\varphi_f) = f(\beta), \tag{3.22}$$
$$\varphi_f(\Gamma_f) Z(\mathrm{Int}(\Gamma_f)|\varphi_f) = q^{-|v(\mathrm{Int}(\Gamma_f))|} Z^{w}(\mathrm{Int}(\Gamma_f)), \tag{3.23}$$
$$\varphi_w(\Gamma_w) Z(\mathrm{Int}(\Gamma_w)|\varphi_w) = e^{-a|V(\Gamma_w)|} (e^{\beta} - 1)^{-|V(\Gamma_w)|} Z^{f}(V(\Gamma_w)). \tag{3.24}$$
The parameter τ can be chosen arbitrarily large, provided q is sufficiently large.
The relation between the contour models and the initial FK model comes from comparing the formulas (3.10), (3.11) with (3.12), (3.16), (3.20) and (3.21): the distribution of the external contours of the FK model in the box $V$ with free b.c. coincides with the distribution of the external contours in $V$ defined by the contour model with contour functional $\varphi_f$, while that of the FK model with wired b.c. coincides with the distribution of the contour model with the functional $\varphi_w$ and parameter $a$. Indeed, in both cases the partition function is written as a sum of products of terms corresponding to compatible external contours. Since the formulas (3.10), (3.11), (3.12), (3.16), (3.20) and (3.21) are valid for all volumes, it follows that the factors corresponding to external contours are actually the same.
3.3. The boundary clusters. We are now ready to rewrite the ratio (3.9) with the help of the partition functions introduced above. We will consider first the case when $\Lambda$ is the square box $\Lambda(l)$. Let $n \in \mathcal{B}_{\Lambda(l),\eta}$, and consider all open clusters $K$ of $n$ which have sites in $\Lambda(l)^c$. Such clusters will be called boundary clusters. By $\mathcal{K} = \mathcal{K}(n)$ we denote the collection of all boundary clusters $K$ of $n$. The set of all possible collections $\mathcal{K}$ of boundary clusters of configurations in $\mathcal{B}_{\Lambda(l),\eta}$ will be denoted by $S_\eta$. Denote by $O = O(\mathcal{K})$ the complement $O = B_{\Lambda(l)} \setminus \cup_{K \in \mathcal{K}} K$. It is immediate to see that
$$Z^{FK}_{\Lambda(l),\eta,T} = \sum_{\mathcal{K} \in S_\eta} Z^{f}(O(\mathcal{K})) (e^{\beta} - 1)^{\sum_{K \in \mathcal{K}} |K|}. \tag{3.25}$$
Let us introduce the shorthand notation $\Lambda(l, l^p)$ for the annulus $\Lambda(l) \setminus \Lambda(l - l^p)$. Then for every configuration $n \in \mathcal{B}_{\Lambda(l,l^p),\eta}$ we can introduce the set of its boundary clusters in the same manner as above. This set splits into two families: the family $\mathcal{K}$ of boundary clusters which are attached to the exterior boundary of the annulus $\Lambda(l, l^p)$, and the family $\bar{\mathcal{K}}$ of boundary clusters which are attached to the interior boundary of $\Lambda(l, l^p)$ and are disjoint from the exterior one. The set of all such pairs $(\mathcal{K}, \bar{\mathcal{K}})$ will be denoted by $\tilde{S}_\eta$. In the obvious notation one has the following analogue of the formula (3.25):
$$Z^{FK}_{\Lambda(l,l^p),\eta,T} = \sum_{(\mathcal{K}, \bar{\mathcal{K}}) \in \tilde{S}_\eta} Z^{f}(O(\mathcal{K} \cup \bar{\mathcal{K}})) (e^{\beta} - 1)^{\sum_{K \in \mathcal{K} \cup \bar{\mathcal{K}}} |K|}. \tag{3.26}$$
Let us introduce the subset $S'_\eta \subset S_\eta$ formed by all families $\mathcal{K}$ such that every $K \in \mathcal{K}$ has a height
$$\tilde{h}(K, \partial\Lambda(l)) := \max\{\mathrm{dist}(u, \partial\Lambda(l)) : u \in K\} \le l^p/3.$$
In the same way we define the subset $\tilde{S}'_\eta \subset \tilde{S}_\eta$ as the collection of all pairs $(\mathcal{K}, \bar{\mathcal{K}})$ with heights $\tilde{h}(K, \partial\Lambda(l)) \le l^p/3$ and $\tilde{h}(\bar{K}, \partial\Lambda(l - l^p)) \le l^p/3$. If we denote by $\bar{S}'_\eta$ the set of all families $\bar{\mathcal{K}}$ of boundary clusters $\bar{K}$ satisfying the last restriction, then clearly
$$\tilde{S}'_\eta = S'_\eta \times \bar{S}'_\eta. \tag{3.27}$$
Suppose now for a moment that we are able to show that
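$$Z^{FK}_{\Lambda(l),\eta,T} = \sum_{\mathcal{K} \in S'_\eta} Z^{f}(O(\mathcal{K})) (e^{\beta} - 1)^{\sum_{K \in \mathcal{K}} |K|} \, (1 + C e^{-\tilde\tau l^p}), \tag{3.28}$$
and that
$$Z^{FK}_{\Lambda(l,l^p),\eta,T} = \sum_{(\mathcal{K}, \bar{\mathcal{K}}) \in \tilde{S}'_\eta} Z^{f}(O(\mathcal{K} \cup \bar{\mathcal{K}})) (e^{\beta} - 1)^{\sum_{K \in \mathcal{K} \cup \bar{\mathcal{K}}} |K|} \, (1 + C' e^{-\tilde\tau l^p}), \tag{3.29}$$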
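where the constants $C = C(l, p, \eta)$, $C' = C'(l, p, \eta)$ are uniformly bounded in $l$ and $\eta$, and $\tilde\tau = \tilde\tau(\tau) > 0$ is independent of $l$ and $\eta$. We claim that in such a case the relation (3.3) follows from the expansion (3.17) and the relation (3.19). Indeed, let us insert the expansions (3.28) and (3.29) into (3.9), with $\Lambda = \Lambda(l)$, $A = \Lambda(l^p)$. Using (3.27), we have
$$\frac{Z^{FK}_{\Lambda(l,l^p),\eta^1,T}\, Z^{FK}_{\Lambda(l),\eta^2,T}}{Z^{FK}_{\Lambda(l),\eta^1,T}\, Z^{FK}_{\Lambda(l,l^p),\eta^2,T}} = \frac{\sum Z^{f}(O(\mathcal{K}_1 \cup \bar{\mathcal{K}})) (e^{\beta} - 1)^{\sum_{K \in \mathcal{K}_1 \cup \bar{\mathcal{K}}} |K|}\, Z^{f}(O(\mathcal{K}_2)) (e^{\beta} - 1)^{\sum_{K \in \mathcal{K}_2} |K|}}{\sum Z^{f}(O(\mathcal{K}_1)) (e^{\beta} - 1)^{\sum_{K \in \mathcal{K}_1} |K|}\, Z^{f}(O(\mathcal{K}_2 \cup \bar{\mathcal{K}})) (e^{\beta} - 1)^{\sum_{K \in \mathcal{K}_2 \cup \bar{\mathcal{K}}} |K|}} \times (1 + C'' e^{-\tilde\tau l^p}),$$
where both sums run over $\mathcal{K}_1 \in S'_{\eta^1}$, $\mathcal{K}_2 \in S'_{\eta^2}$ and $\bar{\mathcal{K}} \in \bar{S}'_{\eta^1}$ $(= \bar{S}'_{\eta^2})$.
Consider the ratio of the corresponding terms:
$$\frac{Z^{f}(O(\mathcal{K}_1 \cup \bar{\mathcal{K}})) (e^{\beta} - 1)^{\sum_{K \in \mathcal{K}_1 \cup \bar{\mathcal{K}}} |K|}\, Z^{f}(O(\mathcal{K}_2)) (e^{\beta} - 1)^{\sum_{K \in \mathcal{K}_2} |K|}}{Z^{f}(O(\mathcal{K}_1)) (e^{\beta} - 1)^{\sum_{K \in \mathcal{K}_1} |K|}\, Z^{f}(O(\mathcal{K}_2 \cup \bar{\mathcal{K}})) (e^{\beta} - 1)^{\sum_{K \in \mathcal{K}_2 \cup \bar{\mathcal{K}}} |K|}}.$$
Note that the total sets of the boundary clusters $K$ appearing in the numerator and in the denominator are the same, and each is equal to $\mathcal{K}_1 \cup \mathcal{K}_2 \cup \bar{\mathcal{K}}$. Hence all the factors $(e^{\beta} - 1)^{\sum_{*}}$ cancel out. Now, the sets $O(\mathcal{K}_* \cup \bar{\mathcal{K}})$, $O(\mathcal{K}_*)$ are in general not connected, so the corresponding partition functions split into products, and the factors which appear both in the numerator and in the denominator also cancel. A moment’s thought leads to the conclusion that what is left equals the ratio
$$\frac{Z^{f}(\tilde{O}(\mathcal{K}_1 \cup \bar{\mathcal{K}}))\, Z^{f}(\tilde{O}(\mathcal{K}_2))}{Z^{f}(\tilde{O}(\mathcal{K}_1))\, Z^{f}(\tilde{O}(\mathcal{K}_2 \cup \bar{\mathcal{K}}))},$$
where $\tilde{O}(\mathcal{K}_* \cup \bar{\mathcal{K}})$, $\tilde{O}(\mathcal{K}_*)$ are those connected components of the sets $O(\mathcal{K}_* \cup \bar{\mathcal{K}})$, $O(\mathcal{K}_*)$ which contain the whole “middle level”, i.e. the set $\partial\Lambda(l - \frac{1}{2} l^p)$. The application of the expansion (3.17) and the relation (3.19) implies immediately that the last ratio is equal to $1 + C e^{-\tilde\tau l^p}$ with $C = C(\mathcal{K}_1, \mathcal{K}_2, \bar{\mathcal{K}}, l, p)$ uniformly bounded in $\mathcal{K}_1$, $\mathcal{K}_2$, $\bar{\mathcal{K}}$, $l$, which proves our statement (3.3).
The above argument shows that the only things that remain to be proven are the relations (3.28), (3.29). We will do this in the next subsection. The reason why our project is bound to succeed is that above the critical temperature $\beta_{cr}^{-1}(q)$ the FK model (as well as the Potts model) has a unique state, the chaotic one,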
which is characterized by the appearance of a large number of small connected clusters. So the boundary conditions, fixed around some box $V$, are unable to influence the behavior of the system in the bulk. More precisely, no matter which boundary conditions we choose, there will be a contour in the vicinity of the boundary $\partial V$, separating the boundary-influenced behavior outside it from the chaotic one inside. We start the rigorous proof of this picture by considering the wired b.c. In that case the formula (3.21) tells us that the corresponding distribution of the external contours coincides with the one for the contour model with parameter. In light of that the appearance of the following statement is natural:
Lemma 1 (Estimate on the volume of the unstable phase). Let $\theta_w = \{\Gamma_1, \dots, \Gamma_n\}$ be a family of mutually compatible external $w$-contours in $V$. Consider the event that the contours $\{\Gamma_1, \dots, \Gamma_n\}$ are the only external contours in the ensemble defined by the contour $\tau$-functional $\varphi_w$ with parameter $a$. That is, we consider the probability distribution
$$\nu_{V,\varphi_w,a}(\theta_w) = \nu_{V,\varphi_w,a}(\Gamma_1, \dots, \Gamma_n) = \frac{\prod_{i=1}^{n} e^{a|V(\Gamma_i)|} \varphi_w(\Gamma_i) Z(\mathrm{Int}(\Gamma_i)|\varphi_w)}{Z(V|\varphi_w, a)}. \tag{3.30}$$
Introduce the random variable $u_V = u_V(\theta_w) = |\mathrm{Ext}_V(\theta_w)|$. Then
$$\nu_{V,\varphi_w,a}(u_V \ge N) \le \exp\{-aN + C|\partial V|\}, \tag{3.31}$$
where $C = C(\tau, \beta)$.
Note. It is worth noting that our statement does not hold for an arbitrary contour model with parameter, even for large $\tau$. The reason is that when one discusses general contour models, one asks for the upper bound $|\varphi(\Gamma)| \le e^{-\tau \|\Gamma\|}$ only, and so one does not rule out the possibility that $\varphi(\Gamma)$ is actually much smaller and even vanishes for some contours. But in such a case the number of sites in the box $V$ which stay outside all external contours is of the order of $|V|$, and the estimate (3.31) breaks down. However, for the situation at hand we also have the lower bound
$$\varphi(\Gamma) \ge e^{-\bar\tau \|\Gamma\|} \tag{3.32}$$
for some real $\bar\tau$, and this is enough to prove the estimate (3.31).
Proof of Lemma 1. The idea of the proof of the upper bound is to replace the partition function in the denominator of (3.30) by a lower bound which has the form of one of the factors of the numerator of (3.30). To do this we consider the collection $\Theta_w(V) = \{\Gamma_1, \dots, \Gamma_k,\ k = k(V)\}$ of mutually compatible external $w$-contours in $V$ which minimizes the variable $u_V$. It is clear that $u_V(\Theta_w(V)) = C|\partial V|$ for some $C$. Then
$$\begin{aligned} \nu_{V,\varphi_w,a}(u_V \ge N) &= \sum_{\theta_w \subset V : u_V(\theta_w) \ge N} \nu_{V,\varphi_w,a}(\theta_w) = \frac{\sum_{\theta_w \subset V : u_V(\theta_w) \ge N} \prod_{\Gamma \in \theta_w} e^{a|V(\Gamma)|} \varphi_w(\Gamma) Z(\mathrm{Int}(\Gamma)|\varphi_w)}{Z(V|\varphi_w, a)} \\ &\le \frac{\sum_{\theta_w \subset V : u_V(\theta_w) \ge N} \prod_{\Gamma \in \theta_w} e^{a|V(\Gamma)|} \varphi_w(\Gamma) Z(\mathrm{Int}(\Gamma)|\varphi_w)}{e^{a(|V| - u_V(\Theta_w(V)))} \prod_{\Gamma \in \Theta_w(V)} \varphi_w(\Gamma) Z(\mathrm{Int}(\Gamma)|\varphi_w)} \\ &\le e^{a(u_V(\Theta_w(V)) - N)} \, \frac{1}{\prod_{\Gamma \in \Theta_w(V)} \varphi_w(\Gamma)} \, \frac{Z(V|\varphi_w)}{\prod_{\Gamma \in \Theta_w(V)} Z(\mathrm{Int}(\Gamma)|\varphi_w)}. \end{aligned}$$
We now claim that each of the last two factors admits an upper bound of the order of $\exp\{C|\partial V|\}$ for some $C$. For the last factor this follows from the expansion (3.17), since the complement $V \setminus \cup_{\Gamma \in \Theta_w(V)} \mathrm{Int}(\Gamma)$ is contained in the neighborhood of $\partial V$ of radius 2. For the first one we use (3.24) and (3.20) to express the contour functional $\varphi_w$ via partition functions $Z(\ast|\varphi_w)$, $Z(\ast|\varphi_f)$ of contour models (with no parameters). We obtain that
$$\varphi_w(\Gamma_w) = e^{-a|V(\Gamma_w)|} (e^{\beta} - 1)^{-|V(\Gamma_w)|} q^{|v_I(V(\Gamma_w))|} \, \frac{Z(V(\Gamma_w)|\varphi_f)}{Z(\mathrm{Int}(\Gamma_w)|\varphi_w)}.$$
We then use the expansion (3.17) to write each partition function as an exponential of the volume term and the boundary term, and the relation (3.22) to observe that all volume terms cancel out. (The fact that we are dealing not with just an abstract contour model, but with a specific one which admits the lower bound (3.32) on the contour functional, is made explicit by our use of the relation (3.24), which implies in particular the strict positivity of the contour functional.)
3.4. Fingers of the boundary clusters and their surgeries. In what follows we prove the relation (3.29) for the case of the square box $\Lambda(l)$. The relation (3.28) is easier and can be proven by the same argument with simpler notation. In the following statement we estimate the probability of the event that the boundary cluster goes deep inside the box.
Lemma 2 (Estimate of the probability of a long finger). Let $q$ be such that Theorem A above holds. Fix a real number $0 < p < 1$ and consider the event
$$\pi(l, p, \eta) = \{n \in \mathcal{B}_{\Lambda(l,l^p),\eta} : K \cap \Lambda(l - l^p/3,\ l^p + l^p/3) \ne \emptyset \text{ for some } K \in \mathcal{K}(n)\}. \tag{3.33}$$
Then
$$\mu_{\Lambda(l,l^p),\eta,T}(\pi(l, p, \eta)) \le C \exp\{-b\tau l^p\}, \tag{3.34}$$
where $C = C(p, T) > 0$ and $b > 0$ is an absolute constant (e.g. 1/6).
Proof of Lemma 2. The idea of the proof is to study “fingers”, which are protruding parts of the boundary clusters. A finger can either be attached to the exterior boundary of $\Lambda(l, l^p)$ or join the exterior and the interior boundaries of $\Lambda(l, l^p)$. If the finger is “thin” somewhere (which means that its length is of higher order than its thickness), then one can cut across it, obtaining an exterior contour of length of the order $l^p$, which implies the estimate needed. If the finger is “fat” everywhere, then the number of open bonds inside it is much larger than the perimeter, so one can hope to control the situation by using the estimate (3.31). To implement this program we start by defining fingers and their parameters.
For a boundary cluster $K$ and fixed numbers $0 < k, h < l^p/6$, we define the set $F_{k,h} \subset K$, the $(k, h)$-finger, and the sets of bonds $B_k, B_h \subset K$, the bases of the finger, by the following properties:
i) $F_{k,h} \cap \big[ L(l - (l^p/3)) \cup L(l^p + l^p/3) \big] \ne \emptyset$ [see (2.1)],
ii) $B_k \subset L(l^p + k) \cap K$, $B_h \subset L(l - h) \cap K$,
iii) $F_{k,h}$ is a connected component of $K \setminus (B_k \cup B_h)$,
iv) there is no path in $F_{k,h}$ connecting $L(l - (l^p/3)) \cup L(l^p + l^p/3)$ to the boundary $\partial\Lambda(l, l^p)$,
v) $B_k \cup B_h$ is the smallest set of bonds satisfying ii), iii) and iv). (Either $B_k$ or $B_h$ can be empty.)
The proof is based on the following three ingredients:
1) Surgery of a finger. This is a map that to each configuration $n$ exhibiting a finger $F_{k,h}$ with bases $B_k$, $B_h$ associates the configuration $n' = n'(n)$ which is obtained from $n$ by declaring all bonds in $B_k$, $B_h$ to be closed. This configuration $n'$ is characterized by an exterior contour $\kappa_{k,h}$ (of the wired class) delimiting the finger $F_{k,h}$. The map is many-to-one, with the multivaluedness coming from the number of ways to choose the $|B_k| + |B_h|$ bonds in a proper place in the layers $L(l^p + k)$, $L(l - h)$. For our purposes it is enough to take the rough bound
$$(2l^2)^{|B_k| + |B_h|} \tag{3.35}$$
for the number of preimages. On the other hand, the relation between the probabilities $\mu_{\Lambda(l,l^p),\eta,T}(n)$ and $\mu_{\Lambda(l,l^p),\eta,T}(n')$ is the following:
$$\frac{\mu_{\Lambda(l,l^p),\eta,T}(n)}{\mu_{\Lambda(l,l^p),\eta,T}(n')} = \frac{(e^{\beta} - 1)^{|B_k| + |B_h|}}{q}. \tag{3.36}$$
The numerator comes from the number of connections severed by the surgery, and the denominator $q$ arises from the one extra cluster, $F_{k,h}$, obtained after the surgery. As a consequence, the probability of an event $E(F_{k,h}, B_k, B_h)$ that the given finger appears satisfies the inequality
$$\mu_{\Lambda(l,l^p),\eta,T}\big(E(F_{k,h}, B_k, B_h)\big) \le (2l^2)^{|B_k| + |B_h|} \, \frac{(e^{\beta} - 1)^{|B_k| + |B_h|}}{q} \, \mu_{\Lambda(l,l^p),\eta,T}\big(E(F_{k,h}, \kappa_{k,h})\big), \tag{3.37}$$
where $E(F_{k,h}, \kappa_{k,h})$, the event to observe the contour $\kappa_{k,h}$ delimiting the cluster $F_{k,h}$, is obtained from $E(F_{k,h}, B_k, B_h)$ through surgery.
2) Thin-finger estimation. A finger $F_{k,h}$ will be called $(l^s, \gamma)$-thin, for some $s > 0$ and $0 \le \gamma < 1$, if for some $c > 0$,
$$|\kappa_{k,h}| \ge c\, l^s \quad \text{and} \quad |B_k| + |B_h| \le 2 l^{\gamma s}.$$
(Here and in the following we will be interested in situations when $c$ is fixed, while $l$ is large.) The $\mu_{\Lambda(l,l^p),\eta,T}$-probability of the union of all configurations $n'$ which have the contour $\kappa_{k,h}$ among their external contours is at most $\exp\{-\tau \|\kappa_{k,h}\|\}$. Hence we can use (3.37) plus a Peierls estimate (3.15) to obtain the following bound:
$$\begin{aligned} \mu_{\Lambda(l,l^p),\eta,T}&\{n \in \pi(l, p, \eta) : \text{some } K \text{ in } \mathcal{K}(n) \text{ contains a } (l^s, \gamma)\text{-thin finger}\} \\ &\le (2l^2)^{2l^{\gamma s}} \, \frac{(e^{\beta} - 1)^{2l^{\gamma s}}}{q} \sum_{j \ge c l^s} (2l)^2\, 4^{2j} \exp\{-\tau j\} \\ &\le \exp\Big\{-\frac{c\tau}{2}\, l^s\Big\}, \end{aligned} \tag{3.38}$$
provided $l$ is large enough. Here the combinatorial factor $(2l)^2 4^{2j}$ estimates the number of contours of length $j$ that can be drawn inside our box.
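To see what “provided l is large enough” means in the thin-finger bound (3.38), one can compare the logarithms of its middle and right-hand expressions numerically. The Python sketch below sums the geometric tail in closed form, which requires τ > log 16. The parameter values (s, γ, c, τ, β, q) are hypothetical and chosen only for illustration; they are not taken from the paper, and the reading of the surgery prefactor as $(2l^2)^{2l^{\gamma s}} (e^{\beta}-1)^{2l^{\gamma s}}/q$ follows (3.37).

```python
import math


def log_thin_finger_bound(l, s, gamma, c, tau, beta, q):
    """Log of the middle expression in (3.38), with the tail sum in closed form."""
    log_ratio = math.log(16.0) - tau  # log of 4^2 * e^{-tau}
    if log_ratio >= 0.0:
        raise ValueError("the geometric series needs tau > log 16")
    n_bonds = 2.0 * l ** (gamma * s)  # upper bound on |B_k| + |B_h|
    surgery = n_bonds * (math.log(2.0 * l * l) + math.log(math.exp(beta) - 1.0))
    surgery -= math.log(q)
    j0 = math.ceil(c * l ** s)  # shortest admissible contour length
    tail = 2.0 * math.log(2.0 * l) + j0 * log_ratio - math.log1p(-math.exp(log_ratio))
    return surgery + tail


def log_target(l, s, c, tau):
    """Log of the right-hand side exp{-(c*tau/2) l^s} of (3.38)."""
    return -0.5 * c * tau * l ** s


# Hypothetical parameter values, for illustration only.
params = dict(s=0.5, gamma=0.5, c=1.0 / 3.0, tau=20.0, beta=1.0, q=25)
for l in (1e4, 1e8):
    lhs = log_thin_finger_bound(l, **params)
    rhs = log_target(l, params["s"], params["c"], params["tau"])
    print(l, lhs, rhs, lhs <= rhs)
```

With these values the inequality fails for the smaller box and holds for the larger one: the surgery prefactor grows like $l^{\gamma s} \log l$, while the Peierls gain grows like $l^s$, so the latter wins only once $l$ is large.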
3) Fat-finger estimation. A finger $F_{k,h}$ will be called $(l^s, \gamma)$-fat, for some $s > 0$ and $0 \le \gamma < 1$, if for some $c > 0$
$$|\kappa_{k,h}| \le l^{\gamma s} \quad \text{and} \quad |F_{k,h}| \ge c\, l^s.$$
If we apply the estimate (3.31) with $V$ taken to be the interior of $\kappa$, we obtain, via inequality (3.37) and the fact that $|B_k| + |B_h| \le |\kappa_{k,h}|$, the bound
$$\begin{aligned} \mu_{\Lambda(l,l^p),\eta,T}&\{n \in \pi(l, p, \eta) : \text{some } K \text{ in } \mathcal{K}(n) \text{ contains a } (l^s, \gamma)\text{-fat finger}\} \\ &\le (2l^2)^{2l^{\gamma s}} \, \frac{(e^{\beta} - 1)^{l^{\gamma s}}}{q} \, \nu_{\mathrm{Int}(\kappa_{k,h}),\varphi_w,a}\big(u_{\mathrm{Int}(\kappa_{k,h})} \ge c\, l^s\big) \\ &\le (2l^2)^{2l^{\gamma s}} \, \frac{(e^{\beta} - 1)^{l^{\gamma s}}}{q} \, \exp\{-a c\, l^s + C l^{\gamma s}\} \\ &\le \exp\Big\{-\frac{ac}{2}\, l^s\Big\}, \end{aligned} \tag{3.39}$$
for $l$ large.
With these ingredients, the proof of (3.34) proceeds as follows. We fix a positive real number $0 < \alpha < 1$ such that $1 - \alpha$ is sufficiently small to guarantee that
$$\frac{\alpha}{1 - \alpha}\, p > 1, \tag{3.40}$$
and perform the following finite sequence of steps:
Step 1. We consider first the configurations $n \in \pi(l, p, \eta)$ which for some $k_1, h_1 < l^p/(3 \cdot 2)$ have a finger $F_{k_1,h_1}$ with both bases having less than $l^{\alpha p}$ bonds:
$$\max\big(|B_{k_1}|, |B_{h_1}|\big) \le l^{\alpha p}. \tag{3.41}$$
The length of the contour $\kappa_{k_1,h_1}$ is at least $l^p/3$, because it penetrates at least a distance $l^p/3$ inside $\Lambda(l, l^p)$, while the bases are at most at a distance $l^p/6$ from the boundary $\partial\Lambda(l, l^p)$. Hence, such a finger is $(l^p, \alpha)$-thin and the bound (3.38) shows that the configurations considered in this step have a probability of occurrence not exceeding
$$\exp\Big\{-\frac{\tau}{6}\, l^p\Big\}. \tag{3.42}$$
Step 2. For the remaining configurations the condition (3.41) is violated for all $k, h < l^p/(3 \cdot 2)$. We consider the following part of them: those configurations for which for some $k_2, h_2 \le l^p/(3 \cdot 2^2)$ both bases have less than $l^{\alpha(\alpha p + p)}$ bonds. That is, either
$$|B_k| > l^{\alpha p} \text{ for all } 0 \le k \le l^p/(3 \cdot 2) \tag{3.43}$$
or
$$|B_h| > l^{\alpha p} \text{ for all } 0 \le h \le l^p/(3 \cdot 2), \tag{3.44}$$
and also
$$\max\big(|B_{k_2}|, |B_{h_2}|\big) \le l^{\alpha(\alpha p + p)} \text{ for some } 0 \le k_2, h_2 \le l^p/(3 \cdot 2^2). \tag{3.45}$$
To bound their contribution we consider two cases:
Case 2.1. If the length of the contour $\kappa_{k_2,h_2}$ is of an order larger than the size of the bases:
$$|\kappa_{k_2,h_2}| \ge l^{s_2} \quad \text{with} \quad s_2 = (\alpha p + p)(1 + \alpha)/2,$$
then the finger in question is $(l^{s_2}, \tilde\alpha)$-thin, with $\tilde\alpha = \frac{2\alpha}{1 + \alpha}$. The thin-finger bound (3.38) tells us that the probability of these configurations is bounded above by
$$\exp\Big\{-\frac{\tau}{2}\, l^{s_2}\Big\}.$$
Case 2.2. In the opposite case we have
$$|\kappa_{k_2,h_2}| \le l^{s_2},$$
which implies that the finger is $(l^{\alpha p + p}, \tilde{\tilde\alpha})$-fat, with $\tilde{\tilde\alpha} = \frac{1 + \alpha}{2}$. Indeed, given that either (3.43) or (3.44) is satisfied, the finger contains at least $l^{\alpha p} \times l^p/(3 \cdot 2^2)$ bonds. Applying (3.39) we conclude that we are dealing with configurations whose probability is at most
$$\exp\Big\{-\frac{a}{3 \cdot 2^3}\, l^{\alpha p + p}\Big\}.$$
We proceed by induction and arrive at Step $m$. Introduce the quantity
$$r_m = \sum_{i=1}^{m-1} \alpha^i p.$$
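The exponents $r_m$ satisfy the recursion $r_{m+1} = \alpha(r_m + p)$ with $r_1 = 0$, and they increase towards $\alpha p/(1 - \alpha)$, which is larger than 1 by condition (3.40). The following Python sketch tabulates them together with the thresholds $s_m = (r_m + p)(1 + \alpha)/2$ until $r_m$ first exceeds 1. The values α = 0.9 and p = 0.5 are hypothetical, and stopping at exponent 1 is a simplification: for a finite $l$ the base sizes are bounded by $4l$ rather than $l$, so the true stopping exponent is slightly above 1.

```python
def finger_exponents(alpha, p):
    """Return [r_1, r_2, ...] up to and including the first r_m that exceeds 1."""
    if not (0.0 < alpha < 1.0 and 0.0 < p < 1.0):
        raise ValueError("need 0 < alpha < 1 and 0 < p < 1")
    if alpha * p / (1.0 - alpha) <= 1.0:
        raise ValueError("condition (3.40) fails, the recursion never passes 1")
    r = [0.0]  # r_1 is an empty sum
    while r[-1] <= 1.0:
        r.append(alpha * (r[-1] + p))  # r_{m+1} = alpha * (r_m + p)
    return r


alpha, p = 0.9, 0.5  # hypothetical values satisfying (3.40)
rs = finger_exponents(alpha, p)
for m, r_m in enumerate(rs, start=1):
    s_m = (r_m + p) * (1 + alpha) / 2  # exponent separating thin from fat fingers
    print(m, round(r_m, 4), round(s_m, 4))
```

The number of entries returned does not depend on $l$, which is the reason the induction over steps stops after finitely many steps.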
During the $m$-th step we treat the portion of configurations not treated before, namely those which have fingers such that for all $0 \le h, k \le l^p/(3 \cdot 2^{m-1})$ all corresponding base-widths satisfy
$$\max\big(|B_k|, |B_h|\big) > l^{r_m},$$
while for some $k_m, h_m \le l^p/(3 \cdot 2^m)$
$$\max\big(|B_{k_m}|, |B_{h_m}|\big) \le l^{\alpha(r_m + p)} \ (\equiv l^{r_{m+1}}).$$
The first inequality implies that either $|B_k| > l^{r_m}$ for all $0 \le k \le l^p/(3 \cdot 2^{m-1})$, or $|B_h| > l^{r_m}$ for all $0 \le h \le l^p/(3 \cdot 2^{m-1})$ (or both). We have two cases:
Case m.1. If the order of the length of the contour $\kappa_{k,h}$ exceeds that of the size of the bases:
$$|\kappa_{k,h}| \ge l^{s_m} \quad \text{with} \quad s_m = (r_m + p)(1 + \alpha)/2,$$
then the finger is $(l^{s_m}, \tilde\alpha)$-thin. From (3.38) the probability of the corresponding configurations is bounded by
$$\exp\Big\{-\frac{\tau}{2}\, l^{s_m}\Big\}. \tag{3.46}$$
Case m.2. In the opposite case, when $|\kappa_{k,h}| \le l^{s_m}$,
we use the fact that the finger contains at least $l^{r_m} \times l^p/(3 \cdot 2^m)$ bonds. Hence the finger is $(l^{r_m + p}, \tilde{\tilde\alpha})$-fat and (3.39) implies that the event formed by these configurations has a probability bounded by
$$\exp\Big\{-\frac{a}{3 \cdot 2^{m+1}}\, l^{r_m + p}\Big\}. \tag{3.47}$$
One might think that we are in trouble here, since the exponent in (3.47) goes to 0 as $m \to \infty$. Happily, our procedure terminates after a finite number of steps, because condition (3.40) ensures that there exists an $m_0$, independent of $l$, such that for all $l$ the quantity $l^{\sum_{i=1}^{m_0} \alpha^i p}$ exceeds $4l$, which is the maximum possible size for $|B_k|$ and $|B_h|$. The sum of the (finitely many) estimates (3.46)–(3.47) proves the bound (3.34). We see that the leading contribution comes at the first step, which was taken care of in (3.42).
As was mentioned above, the result of Lemma 2 implies the relations (3.28), (3.29), which in turn imply Theorem 1.
Acknowledgement. R. F. and R. H. S. wish to thank the Instituut voor Theoretische Natuurkunde, Rijksuniversiteit Groningen, where the collaboration was started, for its hospitality.
Communicated by Ya. G. Sinai
Commun. Math. Phys. 189, 395 – 445 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997
Fluctuations of the Phase Boundary in the 2D Ising Ferromagnet
R. Dobrushin, O. Hryniv 1,*
1 Institute for Applied Problems of Mechanics and Mathematics, Ukrainian Academy of Sciences, Naukova 3“b”, Lviv 290601, Ukraine. E-mail: [email protected]
Received: 20 July 1996 / Accepted: 18 February 1997
Abstract: We discuss some statistical properties of the phase boundary in the 2D low-temperature Ising ferromagnet in a box with the two-component boundary conditions. We prove the weak convergence in C[0, 1] of measures describing the fluctuations of phase boundaries in the canonical ensemble of interfaces with fixed endpoints and area enclosed below them. The limiting Gaussian measure coincides with the conditional distribution of a certain Gaussian process obtained by the integral transformation of the white noise.

1. Introduction

The large deviation probabilities for the total magnetization in the two-dimensional (2D) Ising ferromagnet are known to possess non-classical asymptotics in the phase coexistence region. The exponential decay here is of the surface order [25, 14], reflecting the fact that phase separation is the main mechanism responsible for this asymptotic behaviour. (Without being explicitly stated, this fact was essentially presented in the early papers by Minlos and Sinai [19, 20], where the case of the d-dimensional (d ≥ 2) Ising model was rigorously studied.) The rate function corresponds to the total surface tension of the phase boundary, and the limiting shape of the latter can be described in the framework of the Wulff theory [7, 23]. In particular, in the typical configurations, the immersed phase tends to form a unique macroscopic droplet with the shape and the area close to that of the Wulff droplet, i.e., the solution of the related variational problem. As a result, the optimal value of the Wulff functional provides the correct constant on the surface scale of the exponential decay of large deviation probabilities. Note the remarkable fact that the last observation is actually true for all subcritical temperatures, i.e., in the whole phase coexistence region [15, 16].

* Current address: TU Berlin, FB 3, Secr. MA 7-3, Str. des 17. Juni 136, 10623 Berlin, Germany; E-mail: [email protected]
The results obtained in [7, 23, 15, 16] describe many interesting properties of the phase boundary as well as typical configurations in the considered situation. However, they are not sufficient to deliver the exact asymptotics of the probabilities of large deviations. To this end one needs more detailed information about the fluctuations of the phase boundary with respect to the limiting Wulff shape, information that is also of independent interest.

The present paper is an attempt to fill this gap. Namely, we discuss statistical properties of the phase boundary in the 2D low-temperature Ising ferromagnet with the two-component boundary conditions in the canonical ensemble of interfaces with fixed endpoints and fixed “area enclosed below them”. We prove the weak convergence in C[0, 1] of the probability distributions describing the fluctuations of such interfaces around the corresponding part of the Wulff shape to a certain conditional Gaussian distribution. This limiting measure coincides with the conditional distribution of a Gaussian random process obtained by the integral transformation of the white noise.

As in the preceding paper [6], where a similar problem for a general model of SOS type was investigated, we use extensively the large deviation principle in the strong form [8] combined with ideas further developed from the original book [7]. These results were announced in [13].

To our knowledge, there were only two mathematical papers¹ studying weak convergence of measures describing fluctuations of the phase boundary in the 2D Ising ferromagnet [12, 5]. Nevertheless, the methods used there were adjusted to the investigation of interfaces with fixed endpoints (even only horizontal ones in [12]) and are not applicable to the additional volume constraint discussed here.

The paper is organized as follows. Section 2 contains notions and known facts to be used later on. The main results are stated in Sect. 3. The basic polymer representation of the partition function is developed in Sect. 4. Then, in Sect. 5 we prove the analyticity of the corresponding free energy and discuss some of its properties that are used in the proofs of limit theorems in Sect. 6. Convergence of finite dimensional distributions of the considered conditional process is established in Sect. 7. The proof of the main result is completed in Sect. 8, where the tightness condition for the sequence of measures is checked. Finally, in the Appendix we present the geometric construction of the solution to the Wulff variational problem corresponding to the discussed situation.

Professor Roland Dobrushin left us forever when the work described in this paper was still in progress. But even this irreversible loss could not reduce his personal influence on the whole work; without any doubt, he is the main author of this result. In fact, this text is an attempt by the second author to realize some ideas of his Teacher. This paper is dedicated to the memory of R. L. Dobrushin.
¹ Many interesting ideas appeared already in the pioneering paper [9], where, however, only a particular one-dimensional distribution of the phase boundary was discussed.

2. Preliminaries

To fix the notation, let us briefly recall certain notions and facts from the theory of the 2D Ising model (for a detailed discussion see, e.g., [7]).

Lattices. Let $\mathbb{Z}^2$ be the two-dimensional integer lattice and $(\mathbb{Z}^2)^*$ be its dual, $(\mathbb{Z}^2)^* = (\mathbb{Z} + 1/2)^2$, both consisting of sites. These lattices are immersed into $\mathbb{R}^2$ equipped with the usual Euclidean distance $|\cdot|$, $|x - y| = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$, where $x = (x_1, x_2)$
and $y = (y_1, y_2)$. We call a bond any segment of unit length connecting two neighbouring sites of the dual lattice. Let s, t be two neighbours in $\mathbb{Z}^2$ and let f denote the unit segment connecting s and t. By definition, a bond e separates these sites if the segments f and e are orthogonal and meet at their midpoints.

Fix one of the two directions (1, 1) and (1, −1). Any straight line passing through a site in this fixed direction is called a diagonal. Thus, any site belongs to a certain (uniquely determined) diagonal. By definition, a site $s \in \mathbb{Z}^2$ is attached to $s^* \in (\mathbb{Z}^2)^*$ provided they share the diagonal and $|s - s^*| = \sqrt{2}/2$. A site $s \in \mathbb{Z}^2$ is attached to a bond e if s is attached to one end of e. Let $e_1$ and $e_2$ be two orthogonal bonds that share a site of the dual lattice. We say that $e_1$ and $e_2$ form a linked pair of bonds if they belong to the same half-plane in $\mathbb{R}^2$ determined by the diagonal passing through their common point.

For a set $V \subset \mathbb{Z}^2$, $|V|$ denotes its cardinality and $\partial V$ is its outer boundary,
$$\partial V = \bigl\{ s \in \mathbb{Z}^2 \setminus V : \exists\, t \in V \text{ with } |t - s| = 1 \bigr\} .$$
A bond e is called a boundary bond of the set V if there exist $t \in V$ and $s \in \mathbb{Z}^2 \setminus V$ such that e separates t and s.

Configurations. For $V \subset \mathbb{Z}^2$ denote by $\Omega_V = \{-1, 1\}^V$ the set of all possible configurations $\sigma = \sigma_V$ in V. In the case $V = \{s\}$ the configuration $\sigma_V$ is reduced to the spin at the site s and is denoted simply by $\sigma_s$. If $V_N$, $N > 1$, is the vertical strip in $\mathbb{Z}^2$ of width N,
$$V_N = \bigl\{ t = (t_1, t_2) \in \mathbb{Z}^2 : 0 < t_1 < N \bigr\} , \tag{2.1}$$
we denote the corresponding set $\{-1, 1\}^{V_N}$ of configurations by $\Omega_N$.

Fix any $V \subset \mathbb{Z}^2$. A configuration $\bar\sigma = \bar\sigma_{\mathbb{Z}^2 \setminus V}$ in the complement $\mathbb{Z}^2 \setminus V$ is called a boundary condition (for V). Two kinds of boundary conditions will mainly be considered in the following: the constant plus boundary condition $\bar\sigma^+$,
$$\bar\sigma^+_t = 1 \quad \text{for all } t \in \mathbb{Z}^2 \setminus V , \tag{2.2}$$
and the two-component boundary condition $\bar\sigma^\varphi$, $\varphi \in (-\pi/2, \pi/2)$,
$$\bar\sigma^\varphi_t = \begin{cases} 1, & \text{if } t_2 > t_1 \tan\varphi , \\ -1, & \text{otherwise.} \end{cases} \tag{2.3}$$
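The two boundary conditions (2.2) and (2.3) are simple enough to write down directly. The following is a minimal sketch, not part of the original text; the function names `sigma_plus` and `sigma_phi` are hypothetical and chosen only for illustration.

```python
import math


def sigma_plus(t):
    """Constant plus boundary condition (2.2): spin +1 at every site t."""
    return 1


def sigma_phi(t, phi):
    """Two-component boundary condition (2.3) at the site t = (t1, t2).

    The spin is +1 strictly above the line t2 = t1 * tan(phi) and -1 otherwise.
    The angle phi is assumed to lie in (-pi/2, pi/2).
    """
    t1, t2 = t
    return 1 if t2 > t1 * math.tan(phi) else -1


if __name__ == "__main__":
    # Print a small window of the boundary condition for a tilted interface.
    phi = math.atan(0.5)
    for t2 in range(3, -3, -1):
        print("".join("+" if sigma_phi((t1, t2), phi) == 1 else "-"
                      for t1 in range(-4, 5)))
```

Printing a window of sites, as in the usage lines, shows the plus phase above the tilted line and the minus phase below it, which is what forces an open contour across the strip.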
Contours. Let $\sigma$ be a configuration in a set $V \subset \mathbb{Z}^2$ and $\bar\sigma$ be a boundary condition. The boundary $\Gamma(\sigma, \bar\sigma)$ of the configuration $\sigma$ under the boundary condition $\bar\sigma$ is the collection of all bonds separating the sites in $\mathbb{Z}^2$ with different values of spins. Then any site $s^*$ of the dual lattice is the meeting point of an even number of such bonds. If four bonds meet at a common vertex we split them up into two pairs of linked bonds. This procedure is actually a fixed choice of the so-called “rounding of corners” along the diagonal passing through the common vertex of these bonds. Apply this procedure at any dual site that is a meeting point of four bonds from $\Gamma(\sigma, \bar\sigma)$. Then the boundary $\Gamma(\sigma, \bar\sigma)$ splits up into connected components to be called contours. Let $V_{NM}$, $M > 1$, be the set (cf. (2.1))
$$V_{NM} = \bigl\{ t = (t_1, t_2) \in V_N : 1 - M < t_2 < M \bigr\} \tag{2.4}$$
and $\bar\sigma \equiv \bar\sigma^+$. Then every contour of $\Gamma(\sigma, \bar\sigma)$, $\sigma \in \Omega_{NM} = \{-1, 1\}^{V_{NM}}$, is a closed polygon. For $\bar\sigma \equiv \bar\sigma^\varphi$ the boundary $\Gamma(\sigma, \bar\sigma)$ contains one (infinite) open polygon S. In the case $M \ge [N \tan\varphi] + 1$ this open polygon passes through the points $(0, 1/2)$ and $(N, [N \tan\varphi] + 1/2)$.

Phase boundary. Let $\sigma$ be a configuration in $V_N$ (recall (2.1)) and $\bar\sigma^\varphi$ be the boundary condition defined in (2.3). As before, denote by $S \in \Gamma(\sigma, \bar\sigma)$ the (infinite) open contour passing through the points $(0, 1/2)$ and $(N, [N \tan\varphi] + 1/2)$. Let $\Delta(S)$ be the set of all points from $\mathbb{Z}^2 \cap R_N$,
$$R_N = \bigl\{ (x_1, x_2) \in \mathbb{R}^2 : x_1 \in [0, N] \bigr\} ,$$
that are attached to bonds of S. The restriction of S to the vertical strip $R_N$ is called the phase boundary and is denoted also by S.

Let $T^\varphi_N$ denote the set of all phase boundaries consistent with the boundary condition $\bar\sigma^\varphi$. Fix any $S \in T^\varphi_N$. The point $(0, 1/2)$ is the initial point and $(N, [N \tan\varphi] + 1/2)$ is the ending point of the phase boundary S. By definition, the height h(S) of S is the difference in the ordinates of the ending and the initial points of S. Thus, for $S \in T^\varphi_N$ one has $h(S) = [N \tan\varphi]$.

Assume that $M = M(S) > 1$ is such that the contour S is covered by the rectangle $R_{NM} = [0, N] \times (1 - M, M)$. Then the polygon S splits up the rectangle $R_{NM}$ into two parts, the “upper” and the “lower” ones, with the areas $Q^+_N$ and $Q^-_N$ respectively. The quantity
$$a(S) = a_N(S) = \frac{Q^-_N - Q^+_N}{2} \tag{2.5}$$
is called the area under the phase boundary S. Clearly, this definition does not depend upon M provided it is sufficiently large, $M \ge M_0(S)$. Observe also that for a “nice” contour S that intersects any vertical line $x = k$, $k = 1, 2, \ldots, N - 1$, at a unique point the quantity a(S) gives the value of the integral of the piecewise constant function appearing after removing all vertical segments from S.

Gibbs measures. Let V be a finite subset of $\mathbb{Z}^2$ and $\bar\sigma$ be a boundary condition.
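The Gibbs distribution $P_{V,\beta}(\cdot|\bar\sigma)$ in V with the boundary condition $\bar\sigma$ is the probability measure in $\Omega_V$ given by
$$P_{V,\beta}(\sigma|\bar\sigma) = Z(V, \beta, \bar\sigma)^{-1} \exp\{-\beta H(\sigma|\bar\sigma)\} , \qquad \sigma \in \Omega_V , \tag{2.6}$$
where the hamiltonian $H(\sigma|\bar\sigma)$ is defined by
$$H(\sigma|\bar\sigma) = - \sum_{\substack{s, t \in V , \\ |s - t| = 1}} \sigma_s \sigma_t \; - \sum_{\substack{s \in V , \, t \in \partial V , \\ |s - t| = 1}} \sigma_s \bar\sigma_t , \tag{2.7}$$
the partition function $Z(V, \beta, \bar\sigma)$ is
$$Z(V, \beta, \bar\sigma) = \sum_{\sigma \in \Omega_V} \exp\{-\beta H(\sigma|\bar\sigma)\} , \tag{2.8}$$
and $\beta > 0$ denotes the inverse temperature. In what follows we will always assume that $\beta$ is sufficiently large.

Ensembles of phase boundaries. Consider the box $V_{NM}$ defined in (2.4) and let $\bar\sigma^\varphi$ be the boundary condition from (2.3). Let $P_{N,M,\beta}(\cdot|\bar\sigma^\varphi)$ be the Gibbs distribution in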
$\Omega_{NM} = \{-1, 1\}^{V_{NM}}$ defined as in (2.6)–(2.8). For $M > N \tan\varphi$ denote by $T^\varphi_{NM}$ the set of all phase boundaries in $V_{NM}$ consistent with the boundary condition $\bar\sigma^\varphi$. The Gibbs distribution $P_{N,M,\beta}(\cdot|\bar\sigma^\varphi)$ induces the probability distribution $P_{N,M,\beta,\varphi}(\cdot)$ in $T^\varphi_{NM}$ according to the following formula:
$$P_{N,M,\beta,\varphi}(S) = P_{N,M,\beta}\bigl( \{ \sigma \in \Omega_{NM} : \Gamma(\sigma, \bar\sigma^\varphi) \ni S \} \,\big|\, \bar\sigma^\varphi \bigr) , \qquad S \in T^\varphi_{NM} .$$
Another form of this distribution will be of importance in the following ([7, §4.3]). Namely, let $\Phi(\Lambda)$ be the function of finite subsets $\Lambda \subset \mathbb{Z}^2$ determined from the cluster expansion of the partition function $Z(V_{NM}, \beta, \bar\sigma^+)$ ([7, §3.9]), let $|S|$ denote the length² of the polygon S, and let $\Delta(S)$ be the set of all sites attached to the phase boundary. Then, defining the weights $w_{NM}(S)$ via
$$w_{NM}(S) = \exp\Bigl\{ -2\beta|S| - \sum_{\Lambda \subset V_{NM} :\, \Lambda \cap \Delta(S) \neq \emptyset} \Phi(\Lambda) \Bigr\} , \tag{2.9}$$
we rewrite
$$P_{N,M,\beta,\varphi}(S) = \frac{w_{NM}(S)}{\Xi(N, M, \varphi)} , \tag{2.10}$$
where $\Xi(N, M, \varphi)$ is the corresponding partition function,
$$\Xi(N, M, \varphi) = \sum_{S \in T^\varphi_{NM}} w_{NM}(S) .$$
² Observe that two external half-bonds of S did not contribute to $|S|$ in [7], but this does not affect the value on the right-hand side of (2.10).

For future reference we recall here the following important properties of the function $\Phi(\Lambda)$ ([7, §3.9, §4.3]): $\Phi(\Lambda)$ is a translation invariant function vanishing on non-connected sets $\Lambda \subset \mathbb{Z}^2$; moreover, there exists $\beta_0 < \infty$ such that for all $\beta \ge \beta_0$ one has
$$|\Phi(\Lambda)| \le \exp\{-2(\beta - \beta_0) d(\Lambda)\} , \tag{2.11}$$
where the function $d(\Lambda)$ satisfies the inequality
$$d(\Lambda) > 2\,\mathrm{diam}(\Lambda) + 2 \tag{2.12}$$
with $\mathrm{diam}(\Lambda)$ denoting the diameter of the set $\Lambda$, $\mathrm{diam}(\Lambda) = \max\{|x - y| : x, y \in \Lambda\}$. According to Lemma 3.10 ([7]), estimate (2.11) implies the inequality
$$\sum_{\Lambda \subset \mathbb{Z}^2 :\, \Lambda \cap \Delta(S) \neq \emptyset} |\Phi(\Lambda)| \le K|S| , \tag{2.13}$$
where $K = K(\beta)$ is a constant such that $K \searrow 0$ as $\beta \nearrow +\infty$. Therefore, for all sufficiently large $\beta$ the weights (cf. (2.9))
$$w(S) = \exp\Bigl\{ -2\beta|S| - \sum_{\Lambda :\, \Lambda \cap \Delta(S) \neq \emptyset} \Phi(\Lambda) \Bigr\} \tag{2.14}$$
are well defined.

Let $T^\varphi_N = \cup_M T^\varphi_{NM}$ be the set of all phase boundaries in $V_N$ consistent with the boundary condition $\bar\sigma^\varphi$ and let $T_N = \cup_\varphi T^\varphi_N$ denote the set of all possible phase boundaries
in $V_N$ (the union here is over all $\varphi \in (-\pi/2, \pi/2)$). Due to [7, Theorem 4.8] the quantities
$$\Xi(N, \varphi) = \sum_{S \in T^\varphi_N} w(S) , \qquad \Xi(N) = \sum_{S \in T_N} w(S)$$
are finite (in fact, $\Xi(N)$ coincides with the partition function $\Xi(N, 0, \mathrm{restr})$, where $\Xi(N, H, \mathrm{restr})$ is the partition function for the restricted grand canonical ensemble of the phase boundaries (see definition (4.3.16) in [7])). As a result, one can define the probability distributions $P_{N,\beta,\varphi}(\cdot) \equiv P_{N,+\infty,\beta,\varphi}(\cdot)$ and $P_{N,\beta}(\cdot)$ in $T^\varphi_N$ and $T_N$ respectively via the following formulas:
$$P_{N,\beta,\varphi}(S) = \frac{w(S)}{\Xi(N, \varphi)} , \qquad S \in T^\varphi_N , \tag{2.15}$$
and
$$P_{N,\beta}(S) = \frac{w(S)}{\Xi(N)} , \qquad S \in T_N . \tag{2.16}$$
Here again one has the condition $\beta \ge \beta_1 > \beta_{cr}$, which is a consequence of applying the cluster expansion technique.

Surface tension, free energy, Legendre transformation. For any fixed $\varphi \in (-\pi/2, \pi/2)$ denote by $n = n(\varphi) = (-\sin\varphi, \cos\varphi)$ the unit vector orthogonal to the straight line $t_2 = t_1 \tan\varphi$ in $\mathbb{R}^2$. Let the box $V_{NM}$, $M > N \tan\varphi$, be as in (2.4) and let $Z(V_{NM}, \beta, \bar\sigma)$ denote the partition function in $\Omega_{NM}$ corresponding to the boundary condition $\bar\sigma$. By definition, the surface tension in the direction of n is given by
$$\tau_\beta(n) = - \lim_{N \to \infty} \lim_{M \to \infty} \frac{\cos\varphi}{\beta N} \log \frac{Z(V_{NM}, \beta, \bar\sigma^\varphi)}{Z(V_{NM}, \beta, \bar\sigma^+)} , \tag{2.17}$$
where the boundary conditions σ ϕ and σ + are defined by (2.3) and (2.2) respectively. The surface tension is closely related to another important function, the so-called free energy. To define it we fix any δ > 0 and for any complex number H satisfying the condition |
$$\Xi(N, H) = \sum_{S \in T_N} \exp\{\beta H h(S)\}\, w(S) \tag{2.19}$$
with h(S) denoting the height of the phase boundary S. The limit
$$F(H) = \lim_{N \to \infty} \frac{\log \Xi(N, H)}{N} \tag{2.20}$$
is called the free energy corresponding to the height h(S) of the phase boundary. According to Theorem 4.8 [7] this limit exists and is an analytic function of H in the domain (2.18). The free energy F(H) defined in (2.20) is dual to the surface tension $\tau_\beta(\cdot)$. Namely ([7, Theorem 4.12]), one has
$$\tau_\beta(n) = \frac{1}{\beta} F^*(\beta \tan\varphi) \cos\varphi , \tag{2.21}$$
where $f^*(\cdot)$ denotes the Legendre transformation of the real convex function³ $f : \mathbb{R} \to \mathbb{R}$,
$$f^*(p) = \sup_x \bigl( px - f(x) \bigr) .$$
The following property of the Legendre transformation will be used below.

Property 2.1. Let $f(\cdot)$ be a strictly convex twice continuously differentiable real function defined in a region $U \subset \mathbb{R}^m$, $m \ge 1$, and $f^*(p)$ be its Legendre transformation, $f^*(p) \equiv \sup_x \bigl( (x, p) - f(x) \bigr)$, $p \in \mathbb{R}^m$. Assume that the values $x \in U$ and $p \in \mathbb{R}^m$ are related via $\nabla f(x) = p$. Then the following relations hold:
$$f^*(p) = (x, p) - f(x) , \qquad \nabla f^*(p) = x , \qquad \mathrm{Hess}\, f^*(p) = \bigl( \mathrm{Hess}\, f(x) \bigr)^{-1} . \tag{2.22}$$
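The relations (2.22) can be checked numerically in the one-dimensional case. The sketch below is an illustration only: the test function $f(x) = x^4/4 + x^2/2$, the search interval and the finite-difference steps are assumptions made here, not objects from the paper. The transform is computed by direct maximization, so the three identities are verified rather than built in.

```python
def f(x):
    """Hypothetical strictly convex C^2 test function (not from the paper)."""
    return x**4 / 4.0 + x**2 / 2.0


def fprime(x):
    return x**3 + x


def fsecond(x):
    return 3.0 * x**2 + 1.0


def legendre(p, lo=-10.0, hi=10.0, iters=200):
    """f*(p) = sup_x (p*x - f(x)), found by ternary search.

    The map x -> p*x - f(x) is strictly concave, so ternary search on
    [lo, hi] converges to its maximizer, assumed here to lie in that interval.
    """
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if p * a - f(a) < p * b - f(b):
            lo = a
        else:
            hi = b
    x = 0.5 * (lo + hi)
    return p * x - f(x)


if __name__ == "__main__":
    x = 1.0
    p = fprime(x)  # x and p are related via f'(x) = p
    h = 1e-3
    first = (legendre(p + h) - legendre(p - h)) / (2 * h)
    second = (legendre(p + h) - 2 * legendre(p) + legendre(p - h)) / h**2
    print("f*(p) vs x*p - f(x):", legendre(p), x * p - f(x))
    print("(f*)'(p) vs x:", first, x)
    print("(f*)''(p) vs 1/f''(x):", second, 1.0 / fsecond(x))
```

Each printed pair should agree up to the finite-difference error, which is the content of Property 2.1 for m = 1.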
Observe that in the considered case the matrix $\mathrm{Hess}\, f(x)$ of the second derivatives of $f(x)$ as a function of $x \in \mathbb{R}^m$ is strictly positive definite at x. This duality property of the Legendre transformation can be verified directly or deduced from known facts ([24, Chap. 5]).

Wulff shape. Let $\tau_\beta(\varphi) = \tau_\beta(n)$ be the surface tension defined in (2.17). Using the symmetry properties of the lattice $\mathbb{Z}^2$ we easily have
$$\tau_\beta(\varphi) \equiv \tau_\beta(\pi/2 - \varphi) , \qquad \tau_\beta(\varphi) \equiv \tau_\beta(-\varphi) ,$$
and thus $\tau_\beta(n)$ can be defined for all unit vectors $n \in S^1$. Denote by $\mathcal{D}$ the set of all closed self-avoiding rectifiable curves $\gamma \subset \mathbb{R}^2$ that are boundaries of bounded regions (thus, the boundary of any bounded convex region belongs to $\mathcal{D}$). Recall that any such rectifiable curve has finite length and has a tangent at almost every point. To each $\gamma \in \mathcal{D}$ we assign the quantity
$$W(\gamma) = W_\beta(\gamma) = \int_\gamma \tau_\beta(n_s)\, ds , \tag{2.23}$$
where ds denotes the length element and $n_s$ is the unit outward normal vector to the curve $\gamma$ at the point $s \in \gamma$. The functional (2.23) is called the Wulff functional corresponding to the surface tension $\tau_\beta(\cdot)$. For any $\gamma \in \mathcal{D}$ denote by $\mathrm{Vol}(\gamma)$ the area of the enclosed region. By definition, the Wulff shape $w_\beta$ is a solution to the variational problem
$$W_\beta(\gamma) \to \inf : \qquad \gamma \in \mathcal{D} , \quad \mathrm{Vol}(\gamma) \ge 1 .$$
³ Here and in the following we omit restrictions near signs like upper bounds, sums, integrals, etc. when the corresponding operation runs over the whole set of possible values of parameters, summation indices, or integration variables respectively.

Alternatively, one defines
$$W_{\beta,\lambda} = \bigcap_{n \in S^1} \bigl\{ x \in \mathbb{R}^2 : (x, n) \le \lambda \tau_\beta(n) \bigr\} ,$$
where $(\cdot, \cdot)$ denotes the usual scalar product in $\mathbb{R}^2$, n is a unit vector, and $\tau_\beta(\cdot)$ is the surface tension defined in (2.17). Then the Wulff shape $w_\beta$ coincides with the boundary of the set $W_{\beta,\lambda_0}$, where $\lambda_0$ is determined from the condition $\mathrm{Vol}(W_{\beta,\lambda_0}) = 1$. The Wulff
shape is known to be unique up to translations in $\mathbb{R}^2$ [26, 27]. Due to positiveness of the stiffness,⁴ $\tau_\beta(\varphi) + \frac{d^2}{d\varphi^2} \tau_\beta(\varphi)$, the Wulff shape is a smooth strictly convex closed curve in $\mathbb{R}^2$ and inherits the natural symmetries from $\mathbb{Z}^2$ [7, §2.20, §4.21].

Wulff profile. The main goal of the present paper is to study the statistical properties of phase boundaries of the 2D Ising ferromagnet in a bulk with the two-component boundary conditions $\bar\sigma^\varphi$. More precisely, we investigate the limiting behaviour of the probability distributions $P_{N,\beta,\varphi}(\cdot)$ ($P_{N,M,\beta,\varphi}(\cdot)$ resp.) in the canonical ensemble of phase boundaries $S \in T^\varphi_N$ ($T^\varphi_{NM}$ resp.) with fixed value of the area (recall (2.5))
$$a_N(S) = N^2 q_N , \qquad q_N \to q \quad \text{as } N \to \infty ,$$
enclosed below them. The phase boundary here is an open polygon; thus, its limiting behaviour is closely related to the corresponding piece of the Wulff shape, to be called below the Wulff profile.

To construct the Wulff profile we use the following geometric algorithm.⁵ Let l be a non-vertical straight line intersecting the Wulff shape at two different points O and A (we denote by A the one of them that is to the left; see Fig. 1,a)). The segment OA splits up the interior of the Wulff shape into two parts, the “upper” one $Q^+_l$ and the “lower” one $Q^-_l$, with the areas $|Q^+_l|$ and $|Q^-_l| = 1 - |Q^+_l|$ correspondingly. Clearly, $Q^-_l$ and $Q^+_l$ are convex sets having tangents at all their boundary points except O and A.
Fig. 1. Geometric construction of the Wulff profile (panels a and b; labels O, A, l, O′, A′, l′)
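The description of the Wulff shape as the boundary of the intersection of half-planes $W_{\beta,\lambda}$ can be explored numerically. The sketch below is only an illustration under stated assumptions: the surface tension of the Ising model is not used; instead a hypothetical smooth function $\tau(\varphi) = 1 + \varepsilon \cos 4\varphi$ with $\varepsilon = 0.05$ is taken, which has the lattice symmetries $\tau(\varphi) = \tau(-\varphi) = \tau(\pi/2 - \varphi)$ and positive stiffness $\tau + \tau'' = 1 - 15\varepsilon \cos 4\varphi$. For positive stiffness, the boundary point with outward normal $n(\varphi) = (-\sin\varphi, \cos\varphi)$ is given by the standard support-function formula $x(\varphi) = \lambda\bigl(\tau(\varphi) n(\varphi) + \tau'(\varphi) n'(\varphi)\bigr)$.

```python
import math

EPS = 0.05  # hypothetical anisotropy; stiffness 1 - 15*EPS*cos(4*phi) stays positive


def tau(phi):
    """Hypothetical surface tension with the symmetries of the square lattice."""
    return 1.0 + EPS * math.cos(4.0 * phi)


def dtau(phi):
    return -4.0 * EPS * math.sin(4.0 * phi)


def normal(phi):
    """Unit normal n(phi) = (-sin phi, cos phi), as in the text."""
    return (-math.sin(phi), math.cos(phi))


def boundary_point(phi, lam=1.0):
    """Point of the boundary of W_{beta,lam} whose outward normal is n(phi)."""
    nx, ny = normal(phi)
    tx, ty = -math.cos(phi), -math.sin(phi)  # derivative of n with respect to phi
    return (lam * (tau(phi) * nx + dtau(phi) * tx),
            lam * (tau(phi) * ny + dtau(phi) * ty))


def enclosed_area(lam=1.0, n=20000):
    """Shoelace area of the polygon through n boundary points."""
    pts = [boundary_point(2.0 * math.pi * i / n, lam) for i in range(n)]
    s = 0.0
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


# lambda_0 is fixed by the normalization Vol(W_{beta,lambda_0}) = 1.
lam0 = 1.0 / math.sqrt(enclosed_area(1.0))

if __name__ == "__main__":
    print("lambda_0 =", lam0)
    print("normalized area =", enclosed_area(lam0))
```

The area scales as $\lambda^2$, which is why a single evaluation at $\lambda = 1$ suffices to determine $\lambda_0$ in this sketch.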
⁴ Here we treat the surface tension $\tau_\beta(\cdot)$ as a function of $\varphi$ (recall that $n = (-\sin\varphi, \cos\varphi)$).

⁵ The analytical expression for the Wulff profile in terms of the free energy $F(\cdot)$ from (2.20) is given in (3.14) below. See also the Appendix for a more detailed discussion of the problem in the framework of a general 1D SOS model.

We say that the line l generates a $(q, \varphi)$-cutting of the Wulff shape if the following two conditions hold: a) the line l has the slope angle $\varphi$; b) the area $|Q^-_l|$ ($|Q^+_l|$ in the case $q < \frac{1}{2} \tan\varphi$) satisfies the equality
$$|Q^-_l| = \Bigl| q - \frac{\tan\varphi}{2} \Bigr| \cdot |OA|^2 \cos^2\varphi$$
with $|OA|$ denoting the length of the segment OA (and thus $|OA| \cos\varphi$ is its horizontal projection). Due to the strict convexity of the Wulff shape $w_\beta$, for any $q \in \mathbb{R}$ and
$\varphi \in (-\pi/2, \pi/2)$ there exists a unique $(q, \varphi)$-cutting of $w_\beta$ (for $q = \frac{1}{2} \tan\varphi$ the points O and A coincide, and l becomes a tangent to the Wulff shape). If, in addition, the limiting value q is relatively small,
$$\Bigl| q - \frac{1}{2} \tan\varphi \Bigr| < Q_0(\varphi) \tag{2.24}$$
(with $Q_0(\varphi)$ easily identified in terms of the Wulff shape), all the tangents to $Q^-_l$ at its boundary points (different from O and A) have uniformly bounded slope angles. Then the simple transformation (reflection + scaling; see Fig. 1,b)) of the arc OA gives the corresponding Wulff profile (in the degenerate case $q = \frac{1}{2} \tan\varphi$ the Wulff profile becomes a segment O′A′). In what follows we will always assume the validity of condition (2.24) (which, in particular, will make possible the SOS approximation of phase boundaries for sufficiently large values of the inverse temperature $\beta$).

3. Results

Let $T_N$ be the set of all possible phase boundaries in $V_N$ and let $P(\cdot) \equiv P_{N,\beta}(\cdot)$ denote the probability distribution from (2.16). Let $E(\cdot) \equiv E_{N,\beta}(\cdot)$ be the corresponding operator of mathematical expectation. Fix any $S \in T_N$ and for all $k = 0, 1, \ldots, N$ define
$$g^+_N(k) = \max\{ t_2 : (k, t_2) \in S \} . \tag{3.1}$$
Let $g^+_N(x)$, $x \in [0, N]$, be the piecewise linear interpolation of the values $g^+_N(k)$. Denote by $\xi^+_N(t)$, $t \in [0, 1]$, the random polygonal function
$$\xi^+_N(t) = g^+_N(N t) - g^+_N(0) . \tag{3.2}$$
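Definitions (3.1) and (3.2) translate into a few lines of code once a phase boundary is represented concretely. The sketch below is an illustration under an assumed data layout that is not taken from the paper: `crossings[k]` lists the ordinates at which S meets the vertical line $x = k$, for $k = 0, \ldots, N$. The lower envelope, built from the minimum instead of the maximum, is included for comparison.

```python
def envelopes(crossings):
    """Upper and lower envelopes of a phase boundary.

    crossings[k] is the (hypothetical) list of ordinates t2 with (k, t2) in S.
    Returns the lists g_plus and g_minus with g_plus[k] = max and
    g_minus[k] = min over the k-th column, cf. (3.1).
    """
    g_plus = [max(column) for column in crossings]
    g_minus = [min(column) for column in crossings]
    return g_plus, g_minus


def xi(g, t):
    """Polygonal function (3.2): linear interpolation of g at N*t, minus g(0)."""
    n = len(g) - 1
    x = n * t
    k = min(int(x), n - 1)  # left node of the interpolation interval
    frac = x - k
    return (1.0 - frac) * g[k] + frac * g[k + 1] - g[0]


if __name__ == "__main__":
    # A boundary over N = 4 columns with one overhang at k = 2.
    crossings = [[0.5], [1.5], [0.5, 1.5, 2.5], [2.5], [1.5]]
    g_plus, g_minus = envelopes(crossings)
    print([xi(g_plus, i / 8) for i in range(9)])
    print([xi(g_minus, i / 8) for i in range(9)])
```

The two envelopes differ only at columns where the boundary has overhangs, which is the situation the comparison of upper and lower processes is concerned with.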
Our aim here is to describe the statistical properties of the trajectories $\xi^+_N(t)$ conditioned by fixing the values of the area $a_N(S)$ and the height h(S). More precisely, let $\Lambda_N$ be the random vector
$$\Lambda_N = (Y_N, h_N) , \tag{3.3}$$
where $h_N = h_N(S)$ is the height of $S \in T_N$ and
$$Y_N = \frac{1}{N}\, a_N(S) \tag{3.4}$$
is the normalized area under S (recall (2.5)). For $H = (H_0, H_1)$, denote by $L_{\Lambda_N}(H)$ the logarithmic moment generating function of the random vector $\Lambda_N$ (recall (2.16)),
$$L_{\Lambda_N}(H) \equiv \log E \exp\bigl\{ \beta (H, \Lambda_N) \bigr\} = \log \Xi(N, \Lambda, H) - \log \Xi(N) , \tag{3.5}$$
where the partition function $\Xi(N, \Lambda, H)$ is calculated via
$$\Xi(N, \Lambda, H) = \sum_{S \in T_N} \exp\Bigl\{ -2\beta|S| + \beta H_0 Y_N + \beta H_1 h_N - \sum_{\Lambda :\, \Lambda \cap \Delta(S) \neq \emptyset} \Phi(\Lambda) \Bigr\} . \tag{3.6}$$
We will show below (see Remark 5.1.1) that the last expression is finite provided the real part
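$$D^2_\delta = \bigl\{ (H_0, H_1) \in \mathbb{R}^2 : |H_1| < 2 - \delta/\beta , \; |H_1 + H_0| < 2 - \delta/\beta \bigr\} \tag{3.7}$$
with some $\delta > 0$ and $\beta \ge \beta_0(\delta)$. Consider any sequence of real vectors $A_N = (N q_N, N b_N)$ such that $2 N^2 q_N$ and $N b_N$ are integer numbers and
$$N^{-1} A_N \to A = (q, b) , \qquad 2q \neq b , \tag{3.8}$$
in such a way that
$$N^{-1} A_N - A = o\Bigl( \frac{1}{\sqrt{N}} \Bigr) \quad \text{as } N \to \infty . \tag{3.9}$$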
Definition 3.1. Let $\delta$ be a positive number. Any sequence $A_N$ satisfying (3.8)–(3.9) is called $(\Lambda_N, \delta)$-regular if the following conditions hold:

1) for any $N > 1$,
$$P(\Lambda_N = A_N) > 0 ; \tag{3.10}$$
2) for all $N > 1$ there exists a solution $H_N \in D^2_\delta$ of the equation
$$\beta^{-1} \nabla_H L_{\Lambda_N}(H) \Big|_{H = H_N} = A_N ; \tag{3.11}$$
3) there exists a solution $\hat H = (Q, H) \in D^2_\delta$ of the equation
$$I(H) \equiv \beta^{-1} \nabla_H \int_0^1 F(H_0 y + H_1)\, dy \,\Big|_{H = \hat H} = A . \tag{3.12}$$
Here $D^2_\delta$ is the set from (3.7), $\nabla_H$ denotes the gradient with respect to $H = (H_0, H_1)$, and $F(\cdot)$ is the free energy from (2.20).

Remark 3.1.1. It can be checked directly that (3.10) is true provided $N b_N$ and $2 N^2 q_N$ are integer numbers of the same parity.

Remark 3.1.2. The condition $H_N \in D^2_\delta$ for all $N > 1$ is a technical one; namely, we will show below (see the discussion after (7.5)) that the inclusion $\hat H \in D^2_\delta$ implies $H_N \in D^2_\delta$ for all sufficiently large N.

Remark 3.1.3. Using the strict convexity of the function $F(\cdot)$ one can show that the relations $2q \neq b$ and $Q \neq 0$ are equivalent (see also the discussion in the Appendix below).

Fix any $(\Lambda_N, \bar\delta)$-regular sequence $A_N$ and consider the conditional random process
$$\theta^+_N(t) = \bigl( \xi^+_N(t) \,\big|\, \Lambda_N = A_N \bigr) \tag{3.13}$$
with $\xi^+_N(t)$ defined in (3.2). Applying arguments similar to those used in [7] one can prove the law of large numbers for the process $\theta^+_N(t)$. Namely, the distribution of the process $N^{-1} \theta^+_N(t)$ tends weakly in the space C[0, 1] of continuous functions on the segment [0, 1] to the distribution concentrated on some deterministic function $\hat e(t)$, $t \in [0, 1]$. The function $\hat e(t)$ is the solution of the following variational problem (cf. (2.23), (2.21)):
$$W(f) = \beta^{-1} \int_0^1 F^*(f'(t))\, dt \to \inf , \qquad f \in \Bigl\{ g \in AC[0, 1] : g(0) = 0 , \; g(1) = b , \; \int_0^1 g(t)\, dt = q \Bigr\}$$
(here AC[0, 1] is the space of absolutely continuous functions on [0, 1]) and can be computed explicitly,
$$\hat e(t) = \frac{F(H + Q) - F(H + Q - Qt)}{\beta Q} , \tag{3.14}$$
where (Q, H) is the solution of (3.12). Observe that due to Remark 3.1.3 one has $Q \neq 0$ and thus $\hat e(t)$ is well defined. Moreover, in view of the inclusion $(Q, H) \in D^2_\delta$, the derivative of $\hat e(t)$ is uniformly bounded in [0, 1]. Consider the random process
$$\theta^*_N(t) = \frac{1}{\sqrt{N}} \bigl( \theta^+_N(t) - N \hat e(t) \bigr) , \qquad t \in [0, 1] , \tag{3.15}$$
and denote the corresponding measure in C[0, 1] by $\mu^*_N = \mu^{+,*}_N$. The following theorem formulates the main result of the present paper.

Theorem 3.2. Let a $(\Lambda_N, \bar\delta)$-regular sequence $A_N$ be as described above. Then there exists $\beta_0 = \beta_0(\bar\delta) < \infty$ such that for all $\beta \ge \beta_0$ the sequence of measures $\mu^*_N$ converges weakly to some Gaussian measure $\mu^*$ in C[0, 1]. The limiting measure $\mu^*$ coincides with the conditional probability distribution of the random process $\hat\xi(t)$, $t \in [0, 1]$, obtained by the integral transformation of the white noise $dw_s$,
$$\hat\xi(t) \equiv \beta^{-1} \int_0^t \bigl( F''(H + Q - Qs) \bigr)^{1/2}\, dw_s ,$$
conditioned by the conditions
$$\hat\eta \equiv \int_0^1 \hat\xi(t)\, dt = 0 \qquad \text{and} \qquad \hat\xi(1) = 0 .$$
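It may help to see that the explicit profile (3.14) is consistent with the two components of equation (3.12), that is, with the constraints $\hat e(1) = b$ and $\int_0^1 \hat e(t)\, dt = q$. The sketch below checks this numerically. It rests on assumptions made purely for illustration: the function $F(H) = \log\cosh H$ is a hypothetical stand-in for the free energy (the true F is not available in closed form here), and the values of $\beta$, Q, H are arbitrary.

```python
import math

BETA, Q, H = 2.0, 0.7, -0.3  # hypothetical parameter values, chosen arbitrarily


def F(h):
    """Hypothetical stand-in for the free energy: smooth and strictly convex."""
    return math.log(math.cosh(h))


def dF(h):
    return math.tanh(h)


def e_hat(t):
    """The profile (3.14) built from the stand-in F."""
    return (F(H + Q) - F(H + Q - Q * t)) / (BETA * Q)


def simpson(g, a, b, n=1000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(a + i * h)
    return s * h / 3.0


# Components of A = (q, b) computed from the left-hand side of (3.12).
q_from_312 = simpson(lambda y: y * dF(Q * y + H), 0.0, 1.0) / BETA
b_from_312 = simpson(lambda y: dF(Q * y + H), 0.0, 1.0) / BETA

# The same quantities computed from the profile (3.14).
q_from_profile = simpson(e_hat, 0.0, 1.0)
b_from_profile = e_hat(1.0)

if __name__ == "__main__":
    print("q:", q_from_312, q_from_profile)
    print("b:", b_from_312, b_from_profile)
```

The agreement reflects the identity $\hat e'(t) = \beta^{-1} F'(H + Q - Qt)$ together with one integration by parts, and does not depend on the particular stand-in chosen for F.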
Remark 3.2.1. The random vector $\Lambda_N$ from (3.3) has zero mean and the variances of its components are of order N (see Lemma 6.1 below). Therefore, the condition $2q \neq b$ means that the events $\{\Lambda_N = A_N\}$ are in the large deviation region for the distribution $P_{N,\beta}(\cdot)$.

Plan of the proof of Theorem 3.2. The proof of our main result follows the same scenario as in the case of random walks [6], with necessary modifications. Namely, for any natural number k and a set S of real numbers $s_i$, $0 < s_1 < s_2 < \ldots < s_k < 1 = s_{k+1}$, consider the random vector
$$\Theta_N \equiv \bigl( Y_N , X_N(s_1), \ldots, X_N(s_k), X_N(1) \bigr) \in \mathbb{R}^{k+2} , \tag{3.16}$$
where $Y_N$ was defined in (3.4), and $X_N(t)$, $t \in [0, 1]$, are calculated via (cf. (3.2))
$$X_N(t) \equiv g^+_N([N t]) - g^+_N(0) , \tag{3.17}$$
with $[N t]$ denoting the integral part of $N t$. Let $M^{k+2}_N$, $k = 0, 1, \ldots$, be the set
$$M^{k+2}_N = \bigl\{ M = (m_0, m_1, \ldots, m_{k+1}) : \{ 2N m_0, m_1, \ldots, m_{k+1} \} \subset \mathbb{Z}^1 \bigr\} . \tag{3.18}$$
Then for any $M_N \in M^{k+2}_N$ of the kind $M_N = (N q_N, m^1_N, \ldots, m^k_N, N b_N)$ one has the relation
$$P\bigl( X_N(s_1) = m^1_N, \ldots, X_N(s_k) = m^k_N \,\big|\, \Lambda_N = A_N \bigr) = \frac{P(\Theta_N = M_N)}{P(\Lambda_N = A_N)} . \tag{3.19}$$
Here $\Lambda_N = (Y_N, X_N(1))$ is the vector from (3.3) and $A_N = (N q_N, N b_N)$ is the $(\Lambda_N, \bar\delta)$-regular sequence fixed above.

First, we investigate the asymptotic behaviour of the numerator and the denominator in (3.19) and obtain the central limit theorem for the finite dimensional distributions of the random process
$$\Theta_N(t) \equiv \bigl( X_N(t) \,\big|\, \Lambda_N = A_N \bigr) . \tag{3.20}$$
Then, we prove that the difference between the conditional process $\theta^+_N(t)$ (recall (3.13)) and $\Theta_N(t)$ has uniformly bounded exponential moments in some neighbourhood of the origin. This observation implies immediately the same central limit theorem for the corresponding finite dimensional distributions of the process $\theta^+_N(t)$.

Finally, we check the following inequality:
$$E \bigl| \theta^*_N(t) - \theta^*_N(s) \bigr|^4 \le C |t - s|^{7/4}$$
with some constant $C > 0$ uniformly in $s, t \in [0, 1]$ and sufficiently large N. This implies the weak compactness of the sequence $\mu^*_N$ and finishes the proof by applying known results on weak convergence of measures in C[0, 1] ([10]).

A similar result holds also for the random process
$$\theta^-_N(t) = \bigl( \xi^-_N(t) \,\big|\, \Lambda_N = A_N \bigr) , \qquad t \in [0, 1] ,$$
induced by the lowest points of intersection (cf. (3.1)),
$$g^-_N(k) = \min\{ t_2 : (k, t_2) \in S \} ,$$
via
$$\xi^-_N(t) = g^-_N(N t) - g^-_N(0) .$$
Let $\mu^{-,*}_N$ denote the probability distribution in C[0, 1] corresponding to the process (recall (3.15))
$$\theta^{-,*}_N(t) = \frac{1}{\sqrt{N}} \bigl( \theta^-_N(t) - N \hat e(t) \bigr) , \qquad t \in [0, 1] .$$

Theorem 3.3. For the sequence of measures $\mu^{-,*}_N$ the statement of Theorem 3.2 holds true. Moreover, for any sequence of real numbers $\alpha_N$, $\alpha_N \to 0$ as $N \to \infty$, one has the convergence
$$\alpha_N \bigl( \theta^+_N(t) - \theta^-_N(t) \bigr) \to 0 \tag{3.21}$$
in probability as $N \to \infty$.

Clearly, the formulated results are valid also for the measures $\mu^{\pm,*}_{NM}$ describing the statistical properties of the phase boundaries $S \in T^\varphi_{NM}$ in the box $V_{NM}$ with the boundary condition $\bar\sigma^\varphi$, provided only $M > \bigl( \max_{t \in [0,1]} |\hat e(t)| + \varepsilon \bigr) N$ with any fixed $\varepsilon > 0$. This follows immediately from the observation that the events $\{ \max_{t \in [0,1]} |N^{-1} \theta^\pm_N(t) - \hat e(t)| > \varepsilon \}$ belong to the large deviations region for the measures $\mu^\pm_N$ and thus have exponentially small probabilities as $N \to \infty$.
Fluctuations of Phase Boundary in 2D Ising Ferromagnet
407
4. Basic Representation of the Partition Function
We start by discussing the statistical properties of the vector $\Theta_N$ of joint distribution (recall (3.16)),
$$\Theta_N \equiv \bigl(Y_N, X_N(s_1), \dots, X_N(s_k), X_N(s_{k+1})\bigr) \in \mathbb{R}^{k+2}, \tag{4.1}$$
where $k$ is a natural number, the quantities $s_i$ satisfy the condition $0 < s_1 < \dots < s_k < s_{k+1} = 1$, the normalized area $Y_N$ is defined in (3.4), and the process $X_N(t)$, $t \in [0, 1]$, is determined via (recall (3.17))
$$X_N(t) \equiv g^+_N([Nt]) - g^+_N(0). \tag{4.2}$$
For future reference we consider a more general situation. Namely, fix any natural number $k$ and a collection $R = \{r_1, \dots, r_{k+1}\}$ of natural numbers (they can depend on $N$, i.e., $r_i = r_{i,N}$) such that for all sufficiently large $N \ge N_0(R)$ one has the relation $0 < r_1 < \dots < r_k < r_{k+1} = N$. Denote (cf. (4.2))
$$X(r_i) \equiv g^+_N(r_i) - g^+_N(0), \tag{4.3}$$
and consider the random vector
$$\Theta_{N,R} = \bigl(Y_N, X(r_1), \dots, X(r_k), X(r_{k+1})\bigr) \in \mathbb{R}^{k+2}. \tag{4.4}$$
For any complex vector $H = (H_0, H_1, \dots, H_{k+1}) \in \mathbb{C}^{k+2}$ we denote by $L_{N,R}(H)$ the logarithmic moment generating function of the random vector $\Theta_{N,R}$,
$$L_{N,R}(H) \equiv \log E\exp\{\beta(H, \Theta_{N,R})\}.$$
Observe that the last equality can be rewritten in the form (cf. (3.5))
$$L_{N,R}(H) = \log\Xi(N, R, H) - \log\Xi(N), \tag{4.5}$$
where
$$\Xi(N, R, H) = \sum_{S\in\mathcal{T}_N}\exp\Bigl\{-2\beta|S| + \beta(H, \Theta_{N,R}) - \sum_{\Lambda:\,\Lambda\cap\Delta(S)\ne\emptyset}\Phi(\Lambda)\Bigr\}. \tag{4.6}$$
As we will show below (see Theorem 5.1), the last expression is finite provided
408
R. Dobrushin, O. Hryniv
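$\Psi(\Lambda) = \exp\{-\Phi(\Lambda)\} - 1$, we observe that there exists $\beta_0 < \infty$ such that
$$|\Psi(\Lambda)| \le \exp\{-2(\beta - \beta_0)d(\Lambda)\} \tag{4.8}$$
for all $\beta \ge \beta_0$ and any finite set $\Lambda$ (cf. (2.11)–(2.12)). In particular, $\Psi(\Lambda)$ vanishes on non-connected sets $\Lambda$. Denote by $\mathcal{C}_N$ the set of all collections $C = \{S, \Lambda_1, \dots, \Lambda_j\}$, where $S \in \mathcal{T}_N$, the finite sets $\Lambda_i \subset \mathbb{Z}^2$ are connected and satisfy the condition $\Lambda_i \cap \Delta(S) \ne \emptyset$, $i = 1, \dots, j$; $j = 0, 1, \dots$; here $\Delta(S)$ is the set of all sites attached to the phase boundary $S$. Then the partition function $\Xi(N, R, H)$ can be rewritten in the form
$$\Xi(N, R, H) = \sum_{S\in\mathcal{T}_N}\exp\bigl\{-2\beta|S| + \beta(H, \Theta_{N,R})\bigr\}\prod_{\Lambda:\,\Lambda\cap\Delta(S)\ne\emptyset}\bigl(\Psi(\Lambda) + 1\bigr) = \sum_{C\in\mathcal{C}_N}\exp\bigl\{-2\beta|S| + \beta(H, \Theta_{N,R})\bigr\}\prod_{l=1}^{j}\Psi(\Lambda_l). \tag{4.9}$$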
Fix any $C = \{S, \Lambda_1, \dots, \Lambda_j\} \in \mathcal{C}_N$. We say that the collection $C$ is regular in the column $m \in \mathbb{N}$ if the line $\{(x, y) \in \mathbb{R}^2 : x = m\}$ intersects the set $S \cup \Lambda_1 \cup \dots \cup \Lambda_j$ at a unique point. Let $1 \le m_1 < m_2 < \dots < m_l \le N - 1$, $l = l(C) \in \{0, 1, \dots, N - 1\}$, be the set of all $m$, $1 \le m \le N - 1$, such that the collection $C$ is regular in the column $m$. Denote
$$\triangle_1 = \{(x, y) \in \mathbb{R}^2 : x \le m_1\},\quad \triangle_2 = \{(x, y) \in \mathbb{R}^2 : m_1 \le x \le m_2\},\quad \dots,\quad \triangle_l = \{(x, y) \in \mathbb{R}^2 : m_{l-1} \le x \le m_l\},\quad \triangle_{l+1} = \{(x, y) \in \mathbb{R}^2 : m_l \le x\}$$
(in the case $l = 0$ we have $\triangle_1 = \mathbb{R}^2$). By definition, the animal $\xi_i$, $i = 1, \dots, l + 1$, is the collection $\xi_i = \{S_i, \Lambda_{j_1}, \dots, \Lambda_{j_s}\}$, where
$$S_i = S \cap \triangle_i, \qquad \{\Lambda_{j_1}, \dots, \Lambda_{j_s}\} = \{\Lambda \in C : \Lambda \subset \triangle_i\}.$$
Let $(m_i, y_i) = S \cap \{(x, y) \in \mathbb{R}^2 : x = m_i\}$, $i = 1, \dots, l$. We put also $(m_0, y_0) = (0, 1/2)$ and $(m_{l+1}, y_{l+1}) = (N, h(S) + 1/2)$. For any animal $\xi_i$ we define the following quantities: the length $|\xi_i|$, which coincides with the length of the polygon $S_i$; the base $J(\xi_i) = (m_{i-1}, m_i]$; the width $|J(\xi_i)| = m_i - m_{i-1}$; the height $h(\xi_i) = y_i - y_{i-1}$, with $(m_{i-1}, y_{i-1})$ and $(m_i, y_i)$ denoting the beginning and the end of the animal $\xi_i$. Then we define the area $a(\xi_i)$ below $\xi_i$ as
$$a(\xi_i) = \frac{1}{2}\bigl(a^-_i - a^+_i\bigr),$$
where $a^-_i$ and $a^+_i$ denote the areas of the lower and the upper parts of the rectangle $[m_{i-1}, m_i] \times [y_{i-1} - M, y_{i-1} + M]$ that appear after cutting it along $S_i$ (clearly, this definition is independent of $M$ provided it is sufficiently large, $M \ge M_0(S)$; cf. (2.5)). Finally, for $r \in J(\xi_i) = (m_{i-1}, m_i]$ we denote by $h(r, \xi_i)$ the height of the animal $\xi_i$ in the $r$th column,
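$$h(r, \xi_i) = g^+_N(r) - g^+_N(m_{i-1}).$$
Direct computations give us the following relations:
$$h(S) = \sum_{i=1}^{l+1}h(\xi_i), \qquad X(r) = \sum_{i=1}^{j(r)-1}h(\xi_i) + h(r, \xi_{j(r)}), \qquad a(S) = \sum_{i=1}^{l+1}\bigl[a(\xi_i) + (N - m_i)h(\xi_i)\bigr] \tag{4.10}$$
with $j(r)$ denoting $j$ such that $r \in J(\xi_j)$. Define the activity of $\xi_i$ via
$$\Psi_{N,R,H}(\xi_i) = \exp\Bigl\{-2\beta|\xi_i| + \beta h(\xi_i)\Bigl[\Bigl(1 - \frac{m_i}{N}\Bigr)H_0 + \sum_{n=1}^{k+1}1_{\{i<j(r_n)\}}H_n\Bigr] + \beta\sum_{n=1}^{k+1}1_{\{i=j(r_n)\}}H_n\,h(r_n, \xi_{j(r_n)}) + \frac{1}{N}\beta H_0\,a(\xi_i)\Bigr\}\prod_{\Lambda_s\in\xi_i}\Psi(\Lambda_s), \tag{4.11}$$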
where $1_{\{i<j(r_n)\}}$ and $1_{\{i=j(r_n)\}}$ denote the indicator functions of the relations $i < j(r_n)$ and $i = j(r_n)$, respectively. Then the partition function $\Xi(N, R, H)$ can be rewritten in the form
$$\Xi(N, R, H) = \sum_{C\in\mathcal{C}_N}\prod_{i=1}^{l(C)+1}\Psi_{N,R,H}(\xi_i). \tag{4.12}$$
Fix any animal $\xi$. An animal $\xi'$ is called vertically congruent to $\xi$ iff it can be obtained by shifting all components of $\xi$ by the same distance in the vertical direction. Let $\hat\xi$ denote the class of all animals that are vertically congruent to $\xi$. Clearly, all $\xi' \in \hat\xi$ have the same length, base, height, etc. and thus have the same activity $\Psi_{N,R,H}(\hat\xi)$. Observe that any collection $C \in \mathcal{C}_N$ can be rewritten in the form $\{\hat\xi_1, \dots, \hat\xi_{l+1}\}$ such that the class $\hat\xi_i$ has the base $J(\hat\xi_i) = (m_{i-1}, m_i]$ and $0 = m_0 < m_1 < \dots < m_{l+1} = N$. On the other hand, to any such collection $\{\hat\xi_1, \dots, \hat\xi_{l+1}\}$ corresponds a unique $C \in \mathcal{C}_N$; therefore, there exists a one-to-one mapping between $\mathcal{C}_N$ and the set $\widehat{\mathcal{K}}_N$ of all ordered collections $\{\hat\xi_1, \dots, \hat\xi_{l+1}\}$ described above. As a result, (4.12) can be rewritten in the form
$$\Xi(N, R, H) = \sum_{\{\hat\xi_1,\dots,\hat\xi_{l+1}\}\in\widehat{\mathcal{K}}_N}\ \prod_{i=1}^{l+1}\Psi_{N,R,H}(\hat\xi_i). \tag{4.13}$$
In a similar way we consider the set $\widehat{\mathcal{K}}_{(a,b]}$, $(a, b] \subseteq [0, N]$, $a, b \in \mathbb{N}$, of ordered collections $\{\hat\xi_1, \dots, \hat\xi_{l+1}\}$ of the equivalence classes $\hat\xi_i$ such that $J(\hat\xi_i) = (m_{i-1}, m_i]$ and $a = m_0 < m_1 < \dots < m_{l+1} = b$. Using the activities from (4.11) we introduce the partition function
$$\Xi((a, b], N, R, H) = \sum_{\{\hat\xi_1,\dots,\hat\xi_{l+1}\}\in\widehat{\mathcal{K}}_{(a,b]}}\ \prod_{i=1}^{l+1}\Psi_{N,R,H}(\hat\xi_i). \tag{4.14}$$
(In the case $a = b$ we put as usual $\Xi(\emptyset, N, R, H) = 1$.) Relations (4.13) and (4.14) will be the starting point of our considerations.
It follows from estimate (2.13) that the weights $w(S)$ from (2.14) coincide asymptotically as $\beta \to \infty$ with $\exp\{-2\beta|S|\}$. Therefore, the probability distribution (2.15) is "close" to the distribution concentrated on the polygons $S \in \mathcal{T}^{\varphi}_N$ of minimal length. It is convenient to consider a slightly larger set of phase boundaries
$$\mathcal{T}_{N,\infty} = \{S \in \mathcal{T}_N : |S \cap \{(x, y) : x = m\}| = 1,\ \forall m = 0, \dots, N\} \tag{4.15}$$
and the probability distribution
$$P_{N,\beta,\infty}(S) = \frac{\exp\{-2\beta|S|\}}{\Xi(N, \beta, \infty)}, \qquad S \in \mathcal{T}_{N,\infty}, \tag{4.16}$$
with the partition function
$$\Xi(N, \beta, \infty) = \sum_{S\in\mathcal{T}_{N,\infty}}\exp\{-2\beta|S|\}. \tag{4.17}$$
Note that according to definition (4.15) every $S \in \mathcal{T}_{N,\infty}$ is regular in any column $m$, $m = 0, \dots, N$. Therefore, any animal $\xi$ corresponding to $S \in \mathcal{T}_{N,\infty}$ has unit width and is called a tame animal. The probability distribution $P_{N,\beta,\infty}(\cdot)$ from (4.16)–(4.17) is called the ensemble of tame animals. Any animal that is not tame is called wild. For any $S \in \mathcal{T}_{N,\infty}$ one has $|S| = |\xi_1| + \dots + |\xi_N|$. Moreover, for any tame animal $\xi$ one easily gets $|J(\xi)| = 1$, $|\xi| = |h(\xi)| + 1$, $a(\xi) = h(\xi)/2$, and therefore (cf. (4.10))
$$X(r) = \sum_{j=1}^{r}h(\xi_j), \qquad a(S) = \sum_{j=1}^{N}\bigl(N - j + 1/2\bigr)h(\xi_j).$$
As a result, the distribution (4.16)–(4.17) coincides with the distribution of the homogeneous random walk with the generating function $Z(H)$ of one step, $Z(H) \equiv E\exp\{\beta Hh(\xi)\} = Q(H)/Q(0)$, where
$$Q(H) = \sum_{k=-\infty}^{+\infty}\exp\{-2\beta(|k| + 1) + \beta Hk\} = e^{-2\beta}\,\frac{\sinh(2\beta)}{\cosh(2\beta) - \cosh(H\beta)}. \tag{4.18}$$
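The closed form in (4.18) comes from summing two geometric series, and it can be confirmed numerically. The following sketch compares a truncated version of the series with the closed expression for a real $H$ with $|H| < 2$; the function names and the truncation level `kmax` are choices made for this illustration only.

```python
import math


def q_series(beta, H, kmax=2000):
    """Truncated series sum_k exp{-2*beta*(|k|+1) + beta*H*k}, |k| <= kmax."""
    return sum(math.exp(-2.0 * beta * (abs(k) + 1) + beta * H * k)
               for k in range(-kmax, kmax + 1))


def q_closed(beta, H):
    """Closed form e^{-2 beta} sinh(2 beta) / (cosh(2 beta) - cosh(H beta))."""
    return (math.exp(-2.0 * beta) * math.sinh(2.0 * beta)
            / (math.cosh(2.0 * beta) - math.cosh(H * beta)))
```

For real $|H| < 2$ the terms decay geometrically, so a moderate truncation already reproduces the closed form to machine precision.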
Thus, the limiting behaviour of the phase boundary $S$ in the ensemble of tame animals with fixed values $X(N) = Nb_N$ and $a(S) = N^2q_N$ can be described by Theorem 2.3 from [6], where such asymptotics for a general random walk was investigated. To extend that result to the case of the probability distribution $P_{N,\beta}(\cdot)$ (recall (2.16)) in the ensemble of phase boundaries $S \in \mathcal{T}_N$ (i.e., to prove Theorem 3.2) is the main goal of the present paper.
In the ensemble of tame animals the partition function (4.6) is reduced to
$$\Xi(N, R, H, \infty) = \sum_{S\in\mathcal{T}_{N,\infty}}\exp\bigl\{-2\beta|S| + \beta(H, \Theta_{N,R})\bigr\}. \tag{4.19}$$
We rewrite it in the form
$$\Xi(N, R, H, \infty) = \prod_{j=1}^{N}Q(H_{N,j}), \tag{4.20}$$
where $Q(\cdot)$ was defined in (4.18) and the quantities $H_{N,j}$, $j = 1, \dots, N$, are calculated via
$$H_{N,j} = \bigl(1 - (j - 1/2)/N\bigr)H_0 + \sum_{n=1}^{k+1}H_n\,1_{\{j\le r_n\}}. \tag{4.21}$$
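The factorization (4.20)–(4.21) can be illustrated by brute force on a very small system. The sketch below is hedged in several ways: heights are truncated to $|h| \le K$ (so the product uses the correspondingly truncated one-step sums rather than the full $Q$), the normalized area is taken to be $Y_N = a(S)/N$ as suggested by the form of (4.11), and all function names are hypothetical.

```python
import itertools
import math


def tilts(N, H0, Hs, rs):
    """H_{N,j}, j = 1..N, as in (4.21); Hs[n] is paired with rs[n]."""
    return [(1.0 - (j - 0.5) / N) * H0
            + sum(Hn for Hn, rn in zip(Hs, rs) if j <= rn)
            for j in range(1, N + 1)]


def q_truncated(beta, H, K):
    """One-step sum restricted to heights |k| <= K."""
    return sum(math.exp(-2.0 * beta * (abs(k) + 1) + beta * H * k)
               for k in range(-K, K + 1))


def brute_force(N, beta, H0, Hs, rs, K):
    """Sum exp{-2 beta |S| + beta (H, Theta)} over all tame boundaries, |h_j| <= K."""
    total = 0.0
    for hs in itertools.product(range(-K, K + 1), repeat=N):
        length = sum(abs(h) + 1 for h in hs)
        area = sum((N - j + 0.5) * h for j, h in enumerate(hs, start=1))
        y = area / N  # assumption of this sketch: Y_N = a(S) / N
        xs = [sum(hs[:r]) for r in rs]  # X(r_n) for each r_n
        pairing = H0 * y + sum(Hn * x for Hn, x in zip(Hs, xs))
        total += math.exp(-2.0 * beta * length + beta * pairing)
    return total


def product_form(N, beta, H0, Hs, rs, K):
    """Product over columns of the truncated one-step sums."""
    return math.prod(q_truncated(beta, Hj, K) for Hj in tilts(N, H0, Hs, rs))
```

With, say, `N = 3`, `K = 3` and `rs = [2, 3]`, the brute-force sum over the 343 height sequences agrees with the product over columns, which is the mechanism behind (4.20).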
For future reference we define also the partition function (cf. (4.14))
$$\Xi((a, b], N, R, H, \infty) = \prod_{j=a+1}^{b}Q(H_{N,j}), \tag{4.22}$$
where (a, b] ⊆ [0, N ] is a segment with integer endpoints a, b. Here again Ξ(∅, N, R, H, ∞) = 1. Observe that the function Q(·) is finite for all H such that |
0 and β ≥ β0 (δ) > 0 one has
then
|
(4.23)
| cosh(Hβ)| cosh(|
(4.24)
if only β ≥ β0 (δ) > 0 and therefore tanh(2β) cosh(2β − δ/2) − 1 ≤ < e−δ/4 . 2β e Q(H) cosh(2β)
(4.25)
As a result, $\log Q(H)$ is well defined and uniformly bounded for all real $H$ satisfying (4.23) with any fixed $\beta \ge \beta_0(\delta) > 0$.
Consider arbitrary $H \in \widehat{D}^{k+2}_{\delta}$ (recall (4.7)). Then any $H_{N,j}$ from (4.21) satisfies (4.23) and therefore the function $N^{-1}\log\Xi(N, R, H, \infty)$ is bounded uniformly in $N$ and any such $H$. Since the asymptotic properties of the partition function $\Xi(N, R, H, \infty)$ are well understood ([6]), we can reduce the investigation of the partition function $\Xi(N, R, H)$ from (4.6) to the study of the relative partition function
$$\widehat\Xi(N, \beta, R, H) = \frac{\Xi(N, R, H)}{\Xi(N, R, H, \infty)}. \tag{4.26}$$
In the remaining part of this section we develop the so-called polymer representation of this partition function and obtain certain estimates for the polymer weights. All the considerations will be applicable also to the relative partition function
$$\widehat\Xi((a, b], N, R, H) \equiv \frac{\Xi((a, b], N, R, H)}{\Xi((a, b], N, R, H, \infty)} \tag{4.27}$$
(recall (4.14), (4.22)) for any interval $(a, b] \subseteq (0, N]$ with integer endpoints. Substituting (4.13) and (4.20) into (4.26) one easily obtains (see footnote 6)
$$\widehat\Xi(N, \beta, R, H) = \sum_{\{\hat\xi_1,\dots,\hat\xi_{l+1}\}\in\widehat{\mathcal{K}}_N}\ \prod_{i=1}^{l+1}\Bigl[\Psi_{N,R,H}(\hat\xi_i)\prod_{j\in J(\hat\xi_i)}Q(H_{N,j})^{-1}\Bigr]. \tag{4.28}$$
6 Here and below $j$ is always an integer number; therefore, $j \in J(\hat\xi)$ means $j \in J(\hat\xi) \cap \mathbb{Z}^1$. For any segment $I = (a, b] \subseteq [0, N]$ with integer endpoints we denote by $|I|$ its length, $|I| = b - a$.
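For any segment $I = (a, b] \subseteq [0, N]$ denote
$$\widehat X_{N,R,H}(I) = \Bigl(\prod_{j\in I}Q(H_{N,j})\Bigr)^{-1}\sum_{\hat\xi:\,J(\hat\xi)=I}\Psi_{N,R,H}(\hat\xi). \tag{4.29}$$
Then (4.28) can be rewritten in the form
$$\widehat\Xi(N, \beta, R, H) = \sum_{\alpha=0}^{[N/2]}\ \sum_{\{I_1, I_2, \dots, I_\alpha\}}\ \prod_{i=1}^{\alpha}\widehat X_{N,R,H}(I_i), \tag{4.30}$$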
where the inner sum is taken over all families of mutually disjoint intervals Ii = (ai , bi ] ⊆ bN,R,H (I) = 1. [0, N ] such that |Ii | ≥ 2. Observe that |I| = 1 implies X Formula (4.30) is a particular case of the polymer representation of the partition function ([17, 7]). To apply the cluster expansions technique we need the following estimate (cf. [7, Lemma 4.7]). b k+2 and a real number γ satisfies the Lemma 4.1. Let H ∈ Ck+2 be such that
Y j∈I
−1 X
Q(HN,j )
ˆ exp{γ|ξ|}. ˆ 9N,R,H (ξ)
ˆ ξ)=I ˆ ξ:I(
Then there exists $\bar\beta > 0$ depending only upon the value $\beta_0$ from (4.8) and on the constant $\delta$ such that for all $\beta \ge \bar\beta$ and all intervals $I \subset (0, N]$ under consideration one has
$$\bigl|\overline{\widehat X}_{N,R,H}(I)\bigr| \le \exp\{-4(\beta - \bar\beta)(|I| - 1)\}. \tag{4.31}$$
The functions $\overline{\widehat X}_{N,R,H}(I)$ depend analytically on such $H$.
Remark 4.1.1. Putting $\gamma = 0$ we obtain estimate (4.31) for the polymer weights $\widehat X_{N,R,H}(I)$ from (4.29).
Proof. We start with the following observation. Let $\xi$ be a wild animal with the base $J(\xi) = (m', m'']$ and let a natural number $m$ satisfy the condition $m' < m < m''$. Since $\xi$ is not regular in the column $m$, at least one of the following two events occurs: 1) the vertical line $\{(x, y) \in \mathbb{R}^2 : x = m\}$ intersects the corresponding part $S = S_\xi$ of the phase boundary at least at three points; 2) a point from some set $\Lambda \in \xi$ belongs to the column $m$ and thus at least two boundary bonds of the set $\Lambda$ are intersected by this line. Therefore, for any wild animal $\xi = (S, \Lambda_1, \dots, \Lambda_k)$ one has the inequality
$$|J(S)| - 1 \le \frac{1}{2}\Bigl(N_h(S) - \bigl(|J(S)| - 1\bigr) + \sum_{\Lambda\in\xi}d(\Lambda)\Bigr),$$
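where $N_h(S)$ denotes the number of full horizontal bonds in $S$, the function $d(\cdot)$ satisfies (2.11)–(2.12) and $J(S) \equiv J(\xi)$. As a result,
$$\sum_{\Lambda\in\xi}d(\Lambda) \ge 3\bigl(|J(S)| - 1\bigr) - N_h(S).$$
Denote
$$\widetilde X(S) \equiv \sum_{\substack{k,\,\Lambda_1,\dots,\Lambda_k:\ \Lambda_i\cap\Delta(S)\ne\emptyset \\ \sum_{i=1}^{k}d(\Lambda_i)\,\ge\,3(|J(S)|-1)-N_h(S)}}\exp\Bigl\{-2(\beta - \beta_0)\sum_{i=1}^{k}d(\Lambda_i)\Bigr\} \tag{4.32}$$
and fix any $\beta_1 > 0$. As was shown in [7] (see Eq. (4.7.11)), there exists a function $\varepsilon = \varepsilon(\beta_1)$, $\varepsilon(\beta_1) \searrow 0$ as $\beta_1 \nearrow \infty$, such that
$$\widetilde X(S) \le \exp\bigl\{-6(\beta - \beta_2)\bigl(|J(S)| - 1\bigr) + 2(\beta - \beta_2)N_h(S)\bigr\}\exp\{\varepsilon|S|\} \tag{4.33}$$
with $\beta_2 = \beta_0 + \beta_1$. Define
$$X_{N,R,H}(S) = \sum_{\xi:\,S_\xi=S}|\Psi_{N,R,H}(\xi)|, \tag{4.34}$$
where the sum is taken over all wild animals $\xi$ with fixed $S_\xi = S$. We prove below the following estimate:
$$X_{N,R,H}(S) \le \exp\{-2\beta|S| + (2\beta - \delta/2)N_v(S)\}\,\widetilde X(S) \tag{4.35}$$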
with Nv (S) denoting the number of vertical bonds in S. Then (4.31) follows directly. b k+2 , one has (recall (4.25)) Indeed, for any H,
−1
≤ e2β+2β3
with some β3 = β3 (β0 , δ). Therefore the inequality Y −1 X b¯ XN,R,H (I) ≤ Q(HN,j ) j∈I
XN,R,H (S)eγ|S|
S:I(S)=I, yin (S)=0
(here yin (S) denotes the y-coordinate of the initial point of S) can be rewritten in the form b¯ XN,R,H (I) ≤ e−(4β−6β2 +2β3 )(|I|−1) e2β+2β3 e−(2β−ε−γ) X exp{ 2β − 2β2 − (2β − ε − γ) Nh (S)} S:I(S)=I, yin (S)=0
exp{(−δ/2 + ε + γ)Nv (S)}, where the identity |S| = Nv (S)+Nh (S)+1 was used. Let β1 be such that ε = ε(β1 ) < δ/8 and β2 = β0 + β1 ≥ β3 . Then X b¯ e−β4 (Nh (S)+1)−δNv (S)/4 , (4.36) XN,R,H (I) ≤ e−4(β−2β2 )(|I|−1) S:I(S)=I, yin (S)=0
where we used the obvious inequality $2\beta_2 + 2\beta_3 \le 2(\beta_2 + \beta_3)(|I| - 1)$ (recall that for any wild animal $\xi$ one has $|J(\xi)| > 1$) and denoted $\beta_4 = 2\beta_2 - \varepsilon - \gamma$. It remains to observe that the last sum was shown to be bounded [7, p. 119],
$$\sum_{S:\,I(S)=I,\ y_{\mathrm{in}}(S)=0}e^{-\beta_4(N_h(S)+1) - \delta N_v(S)/4} \le R(\beta_4, \delta)^{|I|}\bigl(1 - R(\beta_4, \delta)\bigr)^{-1}, \tag{4.37}$$
provided $\beta_4$ is large enough, $\beta_4 \ge \bar\beta_4(\delta)$, to guarantee the estimate
$$R(\beta_4, \delta) = 2e^{-\beta_4}\,\frac{1 + e^{-\delta/4}}{1 - e^{-\delta/4}} < 1.$$
As a result, (4.31) follows directly from (4.36) and (4.37).
It remains to establish (4.35). To do this we cut the polygon $S$ into pieces by all vertical lines $x = m$, $m \in \mathbb{N}$. Then $S$ splits up into a certain collection of zigzag fragments $f_n$, each consisting of two horizontal half-bonds and (possibly) a vertical segment of $S$. The ordering of the $f_n$ in $S$ determines in a unique way the initial and ending points of $f_n$. Define the height $h(f_n)$ of $f_n$ as the difference between the ordinates of the ending and initial points of $f_n$. Clearly,
$$h(\xi_i) = \sum_{f_n\in\xi_i}h(f_n), \qquad N_v(\xi_i) = \sum_{f_n\in\xi_i}|h(f_n)|. \tag{4.38}$$
Define the midpoint $c_n$ of the fragment $f_n$ as the midpoint of the vertical segment belonging to $f_n$ (provided it is not empty) or as the midpoint of the fragment $f_n$ itself (otherwise). Let $d_n$ denote the distance from $c_n$ to the vertical line $x = m_i$ passing through the ending point of the animal $\xi_i$ (recall that $J(\xi_i) = (m_{i-1}, m_i]$). Direct geometric considerations give the equality
$$a(\xi_i) = \sum_{f_n\in\xi_i}d_n\,h(f_n). \tag{4.39}$$
Now, (4.38) and (4.39) imply the relation (cf. (4.11))
$$h(\xi_i)\Bigl(1 - \frac{m_i}{N}\Bigr) + \frac{1}{N}a(\xi_i) = \sum_{f_n\in\xi_i}h(f_n)\Bigl(1 - \frac{m_i - d_n}{N}\Bigr). \tag{4.40}$$
b k+2 and the inequality Then, the inclusion
(4.41)
imply the estimate (recall (3.7), (4.7)) k n o X mi 1 < βh(ξi ) 1 − H0 + 1{i<j(rl )} Hl + Hk+1 + βH0 a(ξi ) N N l=1 ≤ 2β − 3δ/4 Nv (S).
(4.42)
b k+2 and the obvious inequality On the other hand, from the inclusion
k n X o δ < β 1{i=j(rn )} Hn h(rn , ξj(rn ) ) ≤ Nv (Sξi ). 4
(4.43)
n=1
Finally, (4.35) follows immediately from (4.34), (4.11), (4.7), (4.42) and (4.43). Estimate (4.31) is proved. It remains to observe that the uniform estimates obtained above imply the analyticity b¯ b k+2 of X N,R,H (I) as a function of H,
n=1
(4.44) Then there exist constants β0 and N0 = N0 (β0 ) such that for all β ≥ β0 , N ≥ N0 and all segments I = (a, b] ⊆ [0, N ], b − a ≤ log2 N , with integer endpoints one has the estimate e bN,R,H (I) ≤ 2 e2β log4 N/N − 1 exp{−4(β − β0 )(|I| − 1)}. (4.45) XN,R,H (I) − X Proof. We start with the following simple observation. There exists β¯ > 0 (probably different from β¯ in (4.31)) such that for all αN > 0 and all β ≥ β¯ one has Y −1 Q(HN,j ) j∈I
¯ ˆ ≤ e−δαN /8 e−4(β−β)(|I|−1) 9N,R,H (ξ) .
X ˆ ξ)=I ˆ ξ:I( ˆ Nv (ξ)≥αN
Indeed, using the relation (cf. (4.37)) X
e−β4 (Nh (S)+1)−δNv (S)/4
S:J(S)=I, yin (S)=0, Nv (S)≥αN
≤ e−δαN /8
X
e−β4 (Nh (S)+1)−δNv (S)/8
S:J(S)=I, yin (S)=0
≤e
−δαN /8
one easily deduces (4.46) from (4.36). Now,
R(β4 , δ/2)|I| 1 − R(β4 , δ/2)
−1
(4.46)
Y b eN,R,H (I) |Q(HN,j )| XN,R,H (I) − X ≤
j∈I
ˆ −9 ˆ e N,R,H (ξ) 9N,R,H (ξ)
X 2N ˆ ξ)=I, ˆ ˆ ξ:J( Nv (ξ)≤log
+
X
ˆ + 9N,R,H (ξ)
e ˆ . 9N,R,H (ξ)
X
ˆ ξ)=I, ˆ ξ:J(
ˆ ξ)=I, ˆ ξ:J(
2 ˆ Nv (ξ)>log N
2 ˆ Nv (ξ)>log N
(4.47)
ˆ definitions (4.11) and (4.44), one ˆ ≤ |I|Nv (ξ), Then, using the simple estimate |a(ξ)| obtains N (S) ˆ −9 ˆ ≤ e2β|I| vN − 1 9N,R,H (ξ) ˆ e N,R,H (ξ) 9N,R,H (ξ) (4.48) 4 ˆ , ≤ e2β log N/N − 1 9N,R,H (ξ) provided |I| ≤ log2 N and Nv (S) ≤ log2 N . Finally, substituting (4.48) into (4.47) and using (4.46) to evaluate the last two sums in (4.47), one easily deduces (4.45) from Lemma 4.1 for all sufficiently large N . 5. Cluster Expansion and Limiting Properties of the Partition Function b We establish here the cluster expansion for the relative partition function Ξ(N, β, R, H) and investigate some asymptotical properties of the corresponding free energy to be used later. The following statement presents the main result of this section. Theorem 5.1. There exists a constant β0 depending only on δ and the constant β0 from b k+2 (recall (4.7)), the partition (2.11) such that for all β ≥ β0 , N , and H,
=
N P log Q(HN,j ) log Ξ(N, R, H) − j=1
≤
(5.1)
N exp{−4(β − β0 )}.
There exist functions $\Phi_{N,R,H}(I)$ of intervals $I = (a, b] \subseteq (0, N]$ with integer endpoints such that
$$|\Phi_{N,R,H}(I)| \le \exp\{-4(\beta - \beta_0)(|I| - 1)\}, \tag{5.2}$$
and
$$\log\widehat\Xi(N, \beta, R, H) = \sum_{I\subset[0,N]}\Phi_{N,R,H}(I). \tag{5.3}$$
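To make the passage from polymer weights to cluster weights concrete, here is a toy sketch of a one-dimensional polymer model of the type (4.30) together with an expansion of the type (5.3). Everything in it is an assumption made for illustration: the weights are hypothetical, translation invariant and positive (a function of the length only), and the cluster weights are obtained by second differences of the log-partition function, which is one way of inverting a sum over subintervals; it is not claimed to reproduce the exact form in which the paper computes its cluster weights.

```python
import math


def polymer_partition(n, weight):
    """Z[m], m = 0..n: sum over families of disjoint intervals in (0, m],
    each of length >= 2, of the product of weight(length)."""
    Z = [1.0] * (n + 1)
    for m in range(1, n + 1):
        # column m is either uncovered or closes an interval of length ell >= 2
        Z[m] = Z[m - 1] + sum(weight(ell) * Z[m - ell] for ell in range(2, m + 1))
    return Z


def cluster_weights(n, weight):
    """Phi[ell], ell = 1..n, chosen so that log Z[m] equals the sum of
    Phi[|I|] over all integer-endpoint subintervals I of (0, m]."""
    log_z = [math.log(z) for z in polymer_partition(n, weight)]

    def L(ell):
        # log-partition function of an interval of length ell (0 if empty)
        return log_z[ell] if ell > 0 else 0.0

    return {ell: L(ell) - 2.0 * L(ell - 1) + L(ell - 2) for ell in range(1, n + 1)}
```

In this toy model an interval of length 1 carries no polymer, so its partition function is 1 and the corresponding cluster weight vanishes; summing `(n - ell + 1) * Phi[ell]` over `ell` recovers `log Z[n]`.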
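Finally, the functions $\Phi_{N,R,H}(I)$ depend analytically on the polymer weights $\widehat X_{N,R,H}(I')$, $I' \subseteq I$, and the following inequality holds:
$$\Bigl|\frac{\partial\Phi_{N,R,H}(I)}{\partial\widehat X_{N,R,H}(I')}\Bigr| \le \bigl(|I| - |I'| + 1\bigr)^2\exp\bigl\{|I'|\exp\{-4(\beta - \beta_0)\}\bigr\}. \tag{5.4}$$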
Remark 5.1.1. For k = 0 one has Ξ(N, R, H) ≡ Ξ(N, 3, H) (recall (3.6)) and therefore this partition function is finite for all H,
(5.5)
i=1
that is valid for some $\bar\beta_0 < \infty$ and arbitrary $I_0 = (a, a + 1] \subset [0, N]$, $a \in \mathbb{Z}$.
It remains to check (5.4). Due to the Möbius inversion formula (see, e.g., [18, §2.6], [7, §3.8], [8, §3.3]) the cluster weights $\Phi_{N,R,H}(I)$ can be calculated from (recall (4.27))
$$\Phi_{N,R,H}(I) = \sum_{I^*:\ \emptyset\ne I^*\subseteq I}(-1)^{|I\setminus I^*|}\log\widehat\Xi(I^*, N, R, H), \tag{5.6}$$
where again $I^*$ are intervals with integer endpoints. According to Proposition 3.6 ([8]) the functions $\log\widehat\Xi(I^*, N, R, H)$ depend analytically on the polymer weights $\widehat X_{N,R,H}(I')$, $I' \subseteq I^*$. Moreover, using (4.30) and (5.3) one has (see footnote 7)
$$\frac{\partial\log\widehat\Xi(I^*, N, R, H)}{\partial\widehat X_{N,R,H}(I')} = \frac{\widehat\Xi(I^*\setminus I', N, R, H)}{\widehat\Xi(I^*, N, R, H)} = \exp\Bigl\{-\sum_{\tilde I=(a,b]:\ \tilde I\subseteq I^*,\ \tilde I\cap I'\ne\emptyset}\Phi_{N,R,H}(\tilde I)\Bigr\}.$$
As a result, (5.5) implies directly that
$$\Bigl|\frac{\partial\log\widehat\Xi(I^*, N, R, H)}{\partial\widehat X_{N,R,H}(I')}\Bigr| \le \exp\bigl\{|I'|e^{-4(\beta - \tilde\beta_0)}\bigr\}, \tag{5.7}$$
with some $\tilde\beta_0$ depending only on $\beta_0$. It remains to observe that for any pair $I$, $I'$, $I' \subseteq I$, of intervals with integer endpoints there exist no more than $\bigl(|I| - |I'| + 1\bigr)^2$ intervals $\tilde I$ satisfying the condition $I' \subseteq \tilde I \subseteq I$. Finally, (5.4) follows immediately from (5.6), (5.7) and the last observation.
Remark 5.1.2. We have proved (5.4) using only the polymer representation (4.30) of the partition function $\widehat\Xi(N, \beta, R, H)$ and the estimate (4.31) of the polymer weights $\widehat X_{N,R,H}(I)$ (recall Remark 4.1.1). Since the explicit form of these polymer weights was not used, our result is valid for any partition function defined via (4.30) with any collection of polymer weights satisfying (4.31).
7 In the case $I^* \setminus I' = I_1 \cup I_2$ with disjoint intervals $I_1$ and $I_2$ we denote
$$\widehat\Xi(I_1 \cup I_2, N, R, H) \equiv \widehat\Xi(I_1, N, R, H)\,\widehat\Xi(I_2, N, R, H).$$
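The counting step used in the proof of (5.4), namely that at most $(|I| - |I'| + 1)^2$ intervals lie between $I'$ and $I$, can be checked by direct enumeration. The helper below is a hypothetical illustration; intervals $(c, d]$ with integer endpoints are represented as pairs `(c, d)`.

```python
def intervals_between(outer, inner):
    """List all (c, d] with integer endpoints such that inner ⊆ (c, d] ⊆ outer."""
    a, b = outer
    a1, b1 = inner
    return [(c, d)
            for c in range(a, b)
            for d in range(c + 1, b + 1)
            if c <= a1 and b1 <= d]


def counting_bound(outer, inner):
    """The bound (|I| - |I'| + 1)^2 from the text."""
    a, b = outer
    a1, b1 = inner
    return ((b - a) - (b1 - a1) + 1) ** 2
```

The exact count is $(a' - a + 1)(b - b' + 1)$ for $I = (a, b]$ and $I' = (a', b']$, which never exceeds the squared bound since the two factors are positive integers with sum $|I| - |I'| + 2$.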
In the remaining part of the present section we obtain some corollaries of Theorem 5.1 to be used later on. Let first k = 0, R = {r1 }, r1 = N and H = (0, H) ∈ C2 . Then the partition function Ξ(N, R, H) from (4.6) coincides with the partition function Ξ(N, H) (recall (2.19)) b for the height h(S) of the phase boundary S. Define Ξ(N, H) similarly to (4.26). The following result was obtained in [7]. Corollary 5.2 ([7], Theorem 4.8). Let H satisfy the condition |
(5.8)
Then all the statements of Theorem 5.1 are valid for the partition function $\widehat\Xi(N, H)$. Moreover, the functions $\Phi_{N,R,H}(I)$ do not depend on $N$, $\Phi_{N,R,H}(I) \equiv \Phi_H(|I|)$, where $|I|$ denotes the length of the interval $I$, and there exists a limit
$$\widehat F(H) = \lim_{N\to\infty}\frac{\log\widehat\Xi(N, H)}{N}, \tag{5.9}$$
which is an analytic function of $H$ in the region (5.8). Finally, one has the expansion
$$\widehat F(H) = \sum_{i=2}^{\infty}\Phi_H(i) \tag{5.10}$$
and the estimate
$$|\widehat F(H)| \le \exp\{-4(\beta - \beta_0)\}, \tag{5.11}$$
where $\beta \ge \beta_0$ with sufficiently large $\beta_0$.
Remark 5.2.1. Due to (4.27) one has $\widehat\Xi(I, N, R, H) = 1$ for any $I \subset [0, N]$ such that $|I| = 1$. Thus, (5.6) implies $\Phi_H(1) = 0$, which explains the absence of $i = 1$ in (5.10). The expansion (5.10) plays an important role in the following considerations.
Remark 5.2.2. It follows from (5.9) and definitions (4.26) and (4.20) that the limit (recall (2.20))
$$F(H) = \lim_{N\to\infty}\frac{\log\Xi(N, H)}{N}$$
exists, is an analytic function of $H$ in the region (5.8), and satisfies there the following identity ([7, p. 120]):
$$F(H) \equiv \widehat F(H) + \log Q(H).$$
To study the asymptotic properties of the area $a_N(S)$ below the phase boundary $S$ we put $k + 1 = 0$ in (4.1). Denote the corresponding partition function by $\Xi(N, H, \mathrm{area})$ and define the relative partition function $\widehat\Xi(N, H, \mathrm{area})$ as in (4.26).
Corollary 5.3. Assume that $H$ satisfies (5.8). Then $\widehat\Xi(N, H, \mathrm{area})$ is a non-vanishing analytic function of such $H$. Moreover, there exists the limit
$$\widehat F_{\mathrm{area}}(H) \equiv \lim_{N\to\infty}\frac{1}{N}\log\widehat\Xi(N, H, \mathrm{area}) = \int_0^1\widehat F\bigl((1 - x)H\bigr)\,dx, \tag{5.12}$$
where $\widehat F(\cdot)$ is the free energy from (5.9) corresponding to the height $h(S)$ of the phase boundary $S$. Finally, there exist constants $\beta_0$ and $N_0$ such that for all $N \ge N_0$ and $\beta \ge \beta_0$,
$$\Bigl|\log\widehat\Xi(N, H, \mathrm{area}) - N\int_0^1\widehat F\bigl((1 - x)H\bigr)\,dx\Bigr| \le \exp\{-3(\beta - \beta_0)\}\log^{10}N. \tag{5.13}$$
Remark 5.3.1. Due to the integral representation in (5.12), the function $\widehat F_{\mathrm{area}}(\cdot)$ is an analytic function of $H$ in the region (5.8).
Remark 5.3.2. The derivatives of $N^{-1}\log\widehat\Xi(N, H, \mathrm{area})$ with respect to $H$ converge to the corresponding derivatives of $\widehat F_{\mathrm{area}}(H)$. In this case estimate (5.13) is also true with possibly another constant $\beta_0$.
The following simple property of real functions will be used below.
Property 5.4. Let $f(\cdot)$ be a smooth real function, $f : U \to \mathbb{R}^1$, where $U$ is some open convex set in $\mathbb{R}^k$. Assume that for any $i = 1, \dots, k$ one has
$$\Bigl|\frac{\partial f(x)}{\partial x_i}\Big|_{x=y}\Bigr| \le a_i \tag{5.14}$$
uniformly in $y \in U$. Then for all $y, z \in U$,
$$|f(y) - f(z)| \le \sum_{i=1}^{k}a_i|y_i - z_i|. \tag{5.15}$$
Proof. Define $g(t) = f\bigl(z + t(y - z)\bigr)$. Then
$$g'(t) = \sum_{i=1}^{k}\frac{\partial f}{\partial x_i}\bigl(z + t(y - z)\bigr)\cdot(y_i - z_i),$$
and therefore (recall (5.14))
$$|f(y) - f(z)| \equiv |g(1) - g(0)| \le \int_0^1|g'(t)|\,dt \le \sum_{i=1}^{k}a_i|y_i - z_i|.$$
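Property 5.4 is a coordinatewise Lipschitz bound, and a quick numerical experiment shows how it is used. The test function `f` and the constants in `A` below are hypothetical choices for this sketch: for $f(x) = \sin x_0 + \tfrac12\cos 2x_1$ on $\mathbb{R}^2$ one has $|\partial f/\partial x_0| = |\cos x_0| \le 1$ and $|\partial f/\partial x_1| = |\sin 2x_1| \le 1$.

```python
import math
import random


def f(x):
    """Hypothetical smooth test function on R^2 (a convex domain)."""
    return math.sin(x[0]) + 0.5 * math.cos(2.0 * x[1])


# Uniform bounds a_i on the partial derivatives of f, as in (5.14).
A = (1.0, 1.0)


def lipschitz_gap(y, z):
    """Right-hand side minus left-hand side of (5.15); should be >= 0."""
    bound = sum(a * abs(yi - zi) for a, yi, zi in zip(A, y, z))
    return bound - abs(f(y) - f(z))
```

Sampling random pairs of points and checking that `lipschitz_gap` stays non-negative (up to rounding) illustrates the statement; it is of course no substitute for the one-line proof above.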
Proof of Corollary 5.3. The analyticity of $\log\widehat\Xi(N, H, \mathrm{area})$ with respect to $H$ in the region (5.8), the cluster expansion
$$\log\widehat\Xi(N, H, \mathrm{area}) = \sum_{I\subset[0,N]}\Phi_{N,H,\mathrm{area}}(I) \tag{5.16}$$
and the estimates for $\Phi_{N,H,\mathrm{area}}(I)$ of the type (5.2) and (5.4) follow directly from Theorem 5.1.
It remains to establish (5.13). We will check below that there exists a constant $C_1 = C_1(\delta, \beta_0)$ such that for all $\beta \ge \beta_0$ and all intervals $I = (m', m''] \subset [0, N]$, $|m' - m''| \le \log^2N$, with integer endpoints the following inequality holds:
$$\bigl|\Phi_{N,H,\mathrm{area}}(I) - \Phi_{(1-m''/N)H}(|I|)\bigr| \le C_1\frac{\log^8N}{N}\exp\{-3(\beta - \beta_0)\}, \tag{5.17}$$
where the quantities $\Phi_H(k)$ coincide with the elements of expansion (5.10). Then (5.13) will follow directly from (5.17). Indeed, using (5.16) we obtain
$$\Bigl|\log\widehat\Xi(N, H, \mathrm{area}) - N\int_0^1\widehat F\bigl((1 - x)H\bigr)\,dx\Bigr| \le \sum_{m''=1}^{N}\Bigl|\widehat F\bigl((1 - m''/N)H\bigr) - \sum_{I=(m,m'']:\ I\subseteq(0,m'']}\Phi_{N,H,\mathrm{area}}(I)\Bigr| + \Bigl|N\int_0^1\widehat F\bigl((1 - x)H\bigr)\,dx - \sum_{j=1}^{N}\widehat F\bigl((1 - j/N)H\bigr)\Bigr|, \tag{5.18}$$
where in view of (5.10),
$$\widehat F\bigl((1 - j/N)H\bigr) = \sum_{k=2}^{\infty}\Phi_{(1-j/N)H}(k). \tag{5.19}$$
Let us estimate every term on the right-hand side of (5.18). First of all, due to the analyticity of $\widehat F(\cdot)$ there exists a constant $C_2 = C_2(\delta, \beta) > 0$ such that for all $\beta \ge \beta_0$ and $H$ in the region (5.8) one has
$$\Bigl|N\int_0^1\widehat F\bigl((1 - x)H\bigr)\,dx - \sum_{j=1}^{N}\widehat F\Bigl(\Bigl(1 - \frac{j}{N}\Bigr)H\Bigr)\Bigr| \le C_2.$$
Then, (5.11) and the analog of (5.2) imply
$$R_N = \Bigl|\widehat F\bigl((1 - m''/N)H\bigr) - \sum_{I=(m,m'']:\ I\subseteq(0,m'']}\Phi_{N,H,\mathrm{area}}(I)\Bigr| \le \exp\{-4(\beta - \beta_0)\} + \exp\{-4(\beta - \bar\beta_0)\} = C_3 < \infty. \tag{5.20}$$
Finally, for any $m'' \ge \log^2N$ we rewrite (recall (5.19))
$$R_N = \Bigl|\widehat F\bigl((1 - m''/N)H\bigr) - \sum_{I=(m,m'']:\ I\subseteq(0,m'']}\Phi_{N,H,\mathrm{area}}(I)\Bigr| \le \sum_{\substack{I=(m,m'']:\ I\subseteq(0,m''] \\ |m''-m|\le\log^2N}}\bigl|\Phi_{N,H,\mathrm{area}}(I) - \Phi_{(1-m''/N)H}(I)\bigr| + \sum_{\substack{I=(m,m'']:\ I\subseteq(0,m''] \\ |m''-m|>\log^2N}}\Bigl(\bigl|\Phi_{N,H,\mathrm{area}}(I)\bigr| + \bigl|\Phi_{(1-m''/N)H}(I)\bigr|\Bigr).$$
Applying (5.17) to every term in the first sum and using (5.2) for all other terms we obtain
$$R_N \le \log^2N\cdot C_1\frac{\log^8N}{N}\exp\{-3(\beta - \beta_0)\} + 2\sum_{k\ge\log^2N}\exp\{-4(\beta - \beta_0)k\} \le C_4\frac{\log^{10}N}{N}\exp\{-3(\beta - \beta_0)\} \tag{5.21}$$
for all sufficiently large $N$. Finally, applying (5.20) for $m'' \le \log^2N$ and (5.21) in the opposite case, $\log^2N < m'' \le N$, we obtain
$$\Bigl|\log\widehat\Xi(N, H, \mathrm{area}) - N\int_0^1\widehat F\bigl((1 - x)H\bigr)\,dx\Bigr| \le C_3\log^2N + C_4\bigl(N - \log^2N\bigr)\frac{\log^{10}N}{N}\exp\{-3(\beta - \beta_0)\} + C_2$$
≤ C5 exp{−3(β − β0 )} log10 N (with some constant C5 > 0) for all sufficiently large N . Thus, it remains to prove (5.17). Fix any I = (m0 , m00 ] ⊂ [0, N ], |m00 − m0 | ≤ 2 b H, area) correlog N , with integer endpoints. Recall that the partition function Ξ(N, sponding to the normalized area YN is expressed in terms of activities n o Y mi 1 h(ξi ) + βH a(ξi ) 9(3s ), 9N,H,area (ξi ) = exp −2β|ξi | + βH 1 − N N 3s ∈ξi
where the animal ξi has the base J(ξi ) = (mi−1 , mi ]. Define (cf. (4.29)) Y −1 X bN,H,area (I 0 ) = ˆ Q 1 − j/N H 9N,H,area (ξ), X j∈I 0
ˆ ξ)=I ˆ 0 ξ:I(
ˆ ⊂ (m0 , m00 ] consider also new activities For all ξˆ with J(ξ) o Y n 00 ˆ ˆ = exp −2β|ξ| ˆ + βH 1 − m h(ξ) 9N,H,area (ξ) 9(3s ) N 3s ∈ξi
ˆ and polymer weights (with the same value m00 for all such animals ξ)
I 0 ⊆ I.
b 0 X N,H,area (I ) =
Y
Q
−1 X ˆ 1 − j/N H 9N,H,area (ξ),
j∈I 0
I 0 ⊆ I.
ˆ ξ)=I ˆ 0 ξ:I(
b 0 0 Clearly, the polymer weights X N,H,area (·) satisfy (4.31). Moreover, for all I , I ⊆ I, one has b b 0 XN,H,area (I 0 ) − X N,H,area (I ) 4 ¯ − 1)} ≤ 2 e2β log N/N − 1 exp{−4(β − β)(|I| (5.22) ≤ 4β
log4 N 2β log4 N/N ¯ e exp{−4(β − β)(|I| − 1)}, N
provided N and β are sufficiently large, β ≥ β¯ and N ≥ N0 . In the second inequality above we have used the simple inequality ex − 1 ≤ xex that is true for all x ≥ 0. Let 8N,H,area (I) and 8N,H,area (I) be the cluster weights generated by b 0 0 bN,H,area (I 0 ) and X X N,H,area (I ), I ⊆ I, correspondingly. In view of Remark 5.1.2 we apply (5.4) and (5.15) to obtain 8N,H,area (I) − 8N,H,area (I) X 0 −4(β−β0 ) b b 0 0 e|I |e · log4 N · X ≤ N,H,area (I ) − XN,H,area (I ) . I 0 :I 0 ⊂I
Then, using (5.22) one gets 8N,H,area (I) − 8N,H,area (I) ≤ 4β
log8 N 2β log4 N/N X −4(β−β0 )(|I 0 |−1) |I 0 |e−4(β−β0 ) e e e , N 0 0 I :I ⊂I
8
≤
log N −3(β−β 0 ) e N
provided N is sufficiently large and β ≥ β 0 > 0. It remains to observe that due to its definition 8N,H,area (I) coincides with 8(1−m00 /N )H (I). Finally, consider the random vector ΘN from (4.1)–(4.2), ΘN ≡ YN , XN (s1 ), . . . , XN (sk ), XN (1) ∈ Rk+2 , where the collection S = {s1 , . . . , sk+1 } is such that 0 < s1 < . . . < sk+1 = 1. Denote o n R(S) = [N s1 ], . . . , [N sk ], N . b Then the corresponding partition function Ξ(N, R(S), H) is given by (4.6) with R k+2 b replaced by R(S). For any H,
k+1 X l=1
Hl 1{x<sl } .
(5.23)
b Corollary 5.5. The partition function Ξ(N, R(S), H) is a non-vanishing analytical k+2 b function of H,
(5.24)
e where Fb (·) is the free energy from (5.9) and H(x) was defined in (5.23). Finally, there ˜ exist constants N0 and β0 such that for all N ≥ N0 and β ≥ β˜0 , Z b Ξ(N, R(S), H) − N log
1
e Fb H(x) dx ≤ log10 N exp{−3(β − β0 )}.
(5.25)
0
Remark 5.5.1. Due to the integral representation in (5.24), the free energy FbR(S) (H) is b k+2 . an analytical function of H,
+
k X
1
e Fb H(x) dx
0
si+1
e b (ri , ri+1 ], N, R, H Fb H(x) dx − log Ξ
si
8N,R(S),H (I) .
X
i=1 I: (ri ,ri +1]⊆I⊂[0,N ]
Therefore, (5.26) and (5.5) imply the inequality Z b R(S), H) − N log Ξ(N,
1
e Fb H(x) dx
0
≤ (k + 1) log N exp{−3(β − β0 )} + k exp{−4(β − β0 )} ≤ log10 N exp{−3(β − β˜0 )} 10
for all sufficiently large N and β ≥ β˜0 .
6. Limit Theorems for the Joint Distribution
We study here the asymptotic behaviour of the probabilities $P(\Theta_N = M_N)$ and $P(\Lambda_N = A_N)$ entering the right-hand side of (3.19). Let an integer number $k \ge 0$ and a set $S$ of real numbers $s_i$, $0 < s_1 < \dots < s_k < 1 = s_{k+1}$, be fixed. Denote $R = \{r_i : r_i = [Ns_i],\ i = 1, \dots, k + 1\}$, and for $H \in \widehat{D}^{k+2}_{\delta}$ consider the logarithmic moment generating function $L_{N,R}(H)$ corresponding to the random vector $\Theta_{N,R} \equiv \Theta_N$ from (4.1)–(4.2),
$$L_{N,R}(H) = \log E\exp\{\beta(H, \Theta_{N,R})\}. \tag{6.1}$$
For any $H \in \widehat{D}^{k+2}_{\delta}$ we introduce also the random vector $\Theta_{N,R,H}$ with $H$-tilted distribution,
$$P(\Theta_{N,R,H} = M) = \exp\{\beta(M, H) - L_{N,R}(H)\}\,P(\Theta_{N,R} = M), \tag{6.2}$$
where $M \in \mathcal{M}^{k+2}_N$ (recall (3.18)). Observe that the mean vector $E\Theta_{N,R,H}$ and the covariance matrix $\mathrm{Cov}\,\Theta_{N,R,H}$ of $\Theta_{N,R,H}$ can be calculated via
$$\beta\,E\Theta_{N,R,H} = \nabla_HL_{N,R}(H), \qquad \beta^2\,\mathrm{Cov}\,\Theta_{N,R,H} = \mathrm{Hess}\,L_{N,R}(H), \tag{6.3}$$
where $\nabla_H$ denotes the gradient and $\mathrm{Hess}\,L_{N,R}(H)$ is the Hessian (the matrix of the second derivatives) of $L_{N,R}(H)$ as a function of $H = (H_0, H_1, \dots, H_{k+1})$. Assuming that $H$ and $M$ are related via $\beta M = \nabla_HL_{N,R}(H)$, one easily obtains (recall (6.2), (4.5))
$$P(\Theta_{N,R} = M) = \exp\{-\beta(M, H)\}\,\frac{\Xi(N, R, H)}{\Xi(N)}\,P(\Theta_{N,R,H} = M) = \exp\{-L^*_{N,R}(M)\}\,P(\Theta_{N,R,H} = M) \tag{6.4}$$
with $L^*_{N,R}(\cdot)$ denoting the Legendre transformation
$$L^*_{N,R}(M) \equiv \sup_H\bigl(\beta(M, H) - L_{N,R}(H)\bigr).$$
In view of (6.4) the problem is reduced to the investigation of the asymptotic behaviour of the probability $P(\Theta_{N,R,H} = M)$.
For any $H \in \widehat{D}^{k+2}_{\delta}$ define the matrix
$$B_{N,R}(H) \equiv \frac{1}{\beta^2N}\,\mathrm{Hess}\,L_{N,R}(H)$$
and introduce the quadratic form $B_{N,R,H}(T)$, $T = (t_0, t_1, \dots, t_{k+1}) \in \mathbb{R}^{k+2}$, $B_{N,R,H}(T) \equiv \bigl(B_{N,R}(H)T, T\bigr)$. Consider also the quadratic form
$$B_{R,H}(T) \equiv \bigl(B_R(H)T, T\bigr) \tag{6.5}$$
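corresponding to the matrix (recall (5.24))
$$B_R(H) \equiv \frac{1}{\beta^2}\,\mathrm{Hess}\int_0^1\bigl(\log Q + \widehat F\bigr)\bigl(\widetilde H(x)\bigr)\,dx, \tag{6.6}$$
where $Q(\cdot)$, $\widehat F(\cdot)$, and $\widetilde H(x)$ were defined in (4.18), (5.9), and (5.23) respectively.
Lemma 6.1. Let $\beta \ge \beta_0$ with $\beta_0$ fixed in (5.11). Then uniformly in $H \in \widehat{D}^{k+2}_{\delta}$ and $T \in \mathbb{R}^{k+2}$, $|T| = 1$, one has
$$B_{N,R,H}(T) \to B_{R,H}(T) \quad\text{as } N \to \infty. \tag{6.7}$$
Moreover, there exist positive constants $b$, $N_0$, and $\bar\beta$ depending only on $\beta_0$ from (5.11) and $\delta$ such that uniformly in $H \in \widehat{D}^{k+2}_{\delta}$, $N \ge N_0$, and $\beta \ge \bar\beta$ one has
$$B_{R,H}(T) \ge b|T|^2, \qquad B_{N,R,H}(T) \ge b|T|^2. \tag{6.8}$$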
Proof. In view of (6.5), (4.5), and (4.26) one easily obtains
$$B_{N,R}(H) = \frac{1}{\beta^2N}\,\mathrm{Hess}\log\Xi(N, R, H, \infty) + \frac{1}{\beta^2N}\,\mathrm{Hess}\log\widehat\Xi(N, \beta, R, H). \tag{6.9}$$
The first term on the right-hand side of (6.9) is the normalized covariance matrix for the ensemble of tame animals. Due to (4.20) the corresponding quadratic form $Q_{N,R,H}(T)$ satisfies the relation
$$Q_{N,R,H}(T) = Q_{R,H}(T) + O(N^{-1})|T|^2 \quad\text{as } N \to \infty, \tag{6.10}$$
where the limiting quadratic form $Q_{R,H}(T)$ is calculated via
$$Q_{R,H}(T) = \frac{1}{\beta^2}\int_0^1(\log Q)''\bigl(\widetilde H(x)\bigr)\Bigl((1 - x)t_0 + \sum_{l=1}^{k+1}1_{\{x<s_l\}}t_l\Bigr)^2dx. \tag{6.11}$$
Let $\widehat F_{N,R,H}(T)$ be the quadratic form corresponding to the second term on the right-hand side of (6.9). According to Remark 5.5.2 one has
$$\widehat F_{N,R,H}(T) = \widehat F_{R,H}(T) + O\Bigl(\frac{\log^{10}N}{N}\exp\{-3(\beta - \beta_0)\}\Bigr)|T|^2 \quad\text{as } N \to \infty \tag{6.12}$$
with the limiting quadratic form (cf. (6.11))
$$\widehat F_{R,H}(T) = \frac{1}{\beta^2}\int_0^1(\widehat F)''\bigl(\widetilde H(x)\bigr)\Bigl((1 - x)t_0 + \sum_{l=1}^{k+1}1_{\{x<s_l\}}t_l\Bigr)^2dx.$$
As a result, (6.7) follows immediately from (6.10) and (6.12).
It remains to prove the inequalities in (6.8). First, observe that $B_{R,H}(T) \equiv Q_{R,H}(T) + \widehat F_{R,H}(T)$. We will show later that the function $\beta^{-2}\bigl(\log Q + \widehat F\bigr)''(H)$ is uniformly bounded from below (and above) by two positive constants uniformly in $H$, $|H| < 2 - 3\delta/4\beta$, provided
426
R. Dobrushin, O. Hryniv
β is sufficiently large, β ≥ β¯0 . Then the first inequality in (6.8) follows from the observation that the quadratic form Z 1
(1 − x)t0 +
k+1 X
0
2 1{x<sl } tl
dx
l=1
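is a positive continuous function of $T = (t_0, \dots, t_{k+1})$ on the unit sphere $|T| = 1$, and thus is bounded from below by some positive constant $C_1$. To prove that the function $\beta^{-2}\big(\log Q + \hat F\big)''(H)$ is bounded uniformly in $H$, $|H| < 2 - 3\delta/4\beta$, we observe that due to (4.18),
$$\frac{\partial^2}{\beta^2\,\partial H^2}\log Q(H) = \frac{\cosh(2\beta)\cosh(H\beta) - 1}{\big(\cosh(2\beta) - \cosh(H\beta)\big)^2}$$
and thus (recall (4.24))
$$e^{-2\beta}\,\frac{\cosh(2\beta_0) - 1}{\cosh(2\beta_0)} \le \frac{\cosh(2\beta_0) - 1}{\cosh(2\beta_0)}\cdot\frac{\cosh(H\beta)}{\cosh(2\beta)} \le \frac{\partial^2}{\beta^2\,\partial H^2}\log Q(H) \le \frac{e^{\delta/4}}{\big(e^{\delta/4} - 1\big)^2}$$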
if only $\beta \ge \beta_0$ and $H \in \mathbb{R}^1$ satisfies (4.23). On the other hand, due to Corollary 5.2 for any fixed $H_0$, $|H_0| < 2 - 3\delta/4\beta$, the function $\hat F(H)$ is analytic in the disk of radius $\delta/4\beta$ with the center at $H_0$. Applying the Cauchy formula and estimate (5.11) one obtains
$$\Big|\frac{\partial^2}{\beta^2\,\partial H^2}\hat F(H)\Big| \le C(\delta)\exp\{-4(\beta - \beta_0)\},$$
where $C(\delta) > 0$ is a constant depending only on $\delta$. The needed inequality follows immediately provided $\beta$ is such that
$$C(\delta)\exp\{-4(\beta - \beta_0)\} \le \frac{1}{2}\,e^{-2\beta}\,\frac{\cosh(2\beta_0) - 1}{\cosh(2\beta_0)} = q_1.$$
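The closed form of the second derivative of $\log Q$ used in this proof can be checked numerically. The sketch below is only an illustration: it takes the first-derivative expression $\frac{\partial}{\beta\,\partial H}\log Q(H) = \sinh(H\beta)/(\cosh 2\beta - \cosh H\beta)$ quoted in Sect. 7, differentiates it by a central difference, and compares the result with the closed form and with the left-most lower bound of the chain above. All function names and the sample values of $\beta$, $\beta_0$, $H$ are assumptions made for the example.

```python
import math

def dlogq_over_beta(H, beta):
    """(1/beta) d/dH log Q(H), in the closed form quoted in Sect. 7."""
    return math.sinh(H * beta) / (math.cosh(2 * beta) - math.cosh(H * beta))

def d2logq_over_beta2(H, beta):
    """Closed form of (1/beta^2) d^2/dH^2 log Q(H) stated in the proof."""
    c2, ch = math.cosh(2 * beta), math.cosh(H * beta)
    return (c2 * ch - 1.0) / (c2 - ch) ** 2

def finite_difference(H, beta, step=1e-5):
    """Central difference of dlogq_over_beta in H, divided by beta."""
    up = dlogq_over_beta(H + step, beta)
    down = dlogq_over_beta(H - step, beta)
    return (up - down) / (2 * step * beta)

def lower_bound(beta, beta0):
    """Left-most term of the chain: e^{-2 beta} (cosh 2 beta0 - 1) / cosh 2 beta0."""
    c0 = math.cosh(2 * beta0)
    return math.exp(-2 * beta) * (c0 - 1.0) / c0

if __name__ == "__main__":
    beta0, beta = 1.0, 1.5  # illustrative values with beta >= beta0
    for H in (-1.5, -0.5, 0.0, 0.7, 1.6):
        exact = d2logq_over_beta2(H, beta)
        approx = finite_difference(H, beta)
        print(f"H={H:+.1f}  closed form={exact:.8f}  finite difference={approx:.8f}")
```

The check only covers $|H| < 2$, where the denominator $\cosh 2\beta - \cosh H\beta$ stays positive.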
Put $b = q_1 C_1/2$. Since the convergence in (6.7) is uniform in $H \in \hat D^{k+2}_\delta$, the last inequality in (6.8) follows for all sufficiently large $N$, $N \ge N_0$.
Let $\Theta$ be the Gaussian random vector with zero mean and the covariance matrix $B^{R}(H)$ (recall (6.6)). Denote its characteristic function by $\chi_H(T)$,
$$\chi_H(T) \equiv \exp\Big\{-\frac{1}{2}B_{R,H}(T)\Big\}, \qquad T \in \mathbb{R}^{k+2}. \tag{6.13}$$
Since the matrix $B^{R}(H)$ is positive definite, the distribution of $\Theta$ is non-degenerate and has the density $p_H(X)$, $X \in \mathbb{R}^{k+2}$.
Theorem 6.2. Let a sequence of vectors $H_N \in \hat D^{k+2}_\delta$ satisfy the condition $H_N \to H \in \hat D^{k+2}_\delta$ as $N \to \infty$. Consider the random vector
$$\Theta^*_N \equiv \frac{1}{\sqrt N}\big(\Theta_{N,R,H_N} - E\Theta_{N,R,H_N}\big). \tag{6.14}$$
Then for all $\beta \ge \beta_0$ with sufficiently large $\beta_0$ the distribution of $\Theta^*_N$ converges weakly as $N \to \infty$ to the distribution of the random vector $\Theta$ with the characteristic function $\chi_H(T)$.
Proof. Let $\chi_N(T)$ be the characteristic function of the random vector $\Theta_{N,R,H_N}$,
$$\chi_N(T) \equiv E\exp\{i(T, \Theta_{N,R,H_N})\} = \frac{\Xi(N, R, H_N + i\beta^{-1}T)}{\Xi(N, R, H_N)}. \tag{6.15}$$
Then the characteristic function $\chi^*_N(T)$ of the random vector $\Theta^*_N$ satisfies
$$\log\chi^*_N(T) = -\frac{1}{2}B_{N,R,H_N}(T) - \frac{i}{6N^{3/2}}R_N, \tag{6.16}$$
where
$$R_N = \frac{1}{\beta^3}\sum_{l,m,p=0}^{k+1} t_l t_m t_p\,\frac{\partial^3}{\partial H_l\,\partial H_m\,\partial H_p}\log\Xi(N, R, H)\Big|_{H = H_N + \frac{i\omega}{\beta\sqrt N}T} \tag{6.17}$$
with some $\omega = \omega(H_N, T)$, $0 \le \omega \le 1$. Since the convergence in (6.7) is valid for $T$ belonging to any compact set in $\mathbb{R}^{k+2}$ (uniformly in $H \in \hat D^{k+2}_\delta$), it remains to prove that
$$R_N = o\big(N^{3/2}\big) \quad \text{as } N \to \infty. \tag{6.18}$$
Let $\chi_{N,R,H}(T)$ be the characteristic function of the random vector $\Theta_{N,R,H}$, $H \in \hat D^{k+2}_\delta$ (cf. (6.15)),
$$\chi_{N,R,H}(T) = \frac{\Xi(N, R, H + i\beta^{-1}T)}{\Xi(N, R, H)}. \tag{6.19}$$
We will show below that the function $\log\chi_{N,R,H}(T)$ can be extended to an analytic function of $T$ in the region $\{T \in \mathbb{C}^{k+2} : \sum_{i=0}^{k+1}|\Im t_i| < \delta/4\}$. Then, applying the Cauchy formula one obtains
$$\Big|\frac{\partial^3}{\partial t_l\,\partial t_m\,\partial t_p}\log\chi_{N,R,H}(T)\Big| \le C(\delta)\sup_{(H,T)\in G(\delta)}|\log\chi_{N,R,H}(T)| \tag{6.20}$$
for all such $T$, where (recall (4.7))
$$G(\delta) = \Big\{(H, T) : H \in D^{k+2}_{\delta/2},\ T \in \mathbb{C}^{k+2},\ \sum_{i=0}^{k+1}|\Im t_i| < \delta/4\Big\},$$
and the constant C(δ) depends only on δ. This will give us the needed estimate for the remainder RN . Using (4.26) we rewrite (6.19) in the form
$$\chi_{N,R,H}(T) = \chi^{\infty}_{N,R,H}(T)\,\frac{\hat\Xi(N, R, H + i\beta^{-1}T)}{\hat\Xi(N, \beta, R, H)}, \tag{6.21}$$
where $\chi^{\infty}_{N,R,H}(T)$ denotes the corresponding characteristic function in the ensemble of tame animals (recall (4.20), (4.18)),
$$\chi^{\infty}_{N,R,H}(T) = \prod_{j=1}^{N}\frac{Q(H_{N,j} + i\beta^{-1}t_{N,j})}{Q(H_{N,j})}, \tag{6.22}$$
and the quantities $t_{N,j}$ are calculated via (cf. (4.21))
$$t_{N,j} = \big(1 - (j - 1/2)/N\big)t_0 + \sum_{n=1}^{k+1} t_n 1_{\{j \le r_n\}}.$$
It follows from (5.1) that
$$\Big|\log\frac{\hat\Xi(N, R, H + i\beta^{-1}T)}{\hat\Xi(N, \beta, R, H)}\Big| \le 2N\exp\{-4(\beta - \beta_0)\} \tag{6.23}$$
uniformly in $(H, T) \in G(\delta)$ provided $\beta \ge \beta_0$ with $\beta_0 = \beta_0(2\delta/3) > 0$. On the other hand (see Eq. (4.10.18) in [7]), the inequality
$$|\log Q(H_{N,j} + i\beta^{-1}t_{N,j}) - \log Q(H_{N,j})| \le \bar C(\delta)e^{-\beta(2 - |H_{N,j}|)} \le \bar C(\delta)e^{-3\beta/4} \tag{6.24}$$
holds uniformly in $N$, $j = 1, \dots, N$ and $(H, T) \in G(\delta)$. Then, (6.21), (6.22), (6.23), and (6.24) imply the estimate
$$|\log\chi_{N,R,H}(T)| \le \tilde C(\delta)\,N e^{-3\delta/4} \tag{6.25}$$
for all $N$, $(H, T) \in G(\delta)$, provided $\beta \ge \beta_0(2\delta/3) > 0$. Finally, the analyticity of $\log\chi_{N,R,H}(T)$ follows directly from (6.25), definitions (6.21), (6.22), (4.18), and Theorem 5.1. Since (6.18) follows directly from (6.17), (6.20), and (6.25), one has the convergence
$$\chi^*_N(T) \to \chi_H(T) \quad \text{as } N \to \infty \tag{6.26}$$
that is uniform in $T$ belonging to any compact set in $\mathbb{R}^{k+2}$ provided $\beta$ is sufficiently large.
Let $H_N$, $H_N \to H \in \hat D^{k+2}_\delta$, be the sequence of vectors from Theorem 6.2. For any $N$ define
$$E_N \equiv E\Theta_{N,R,H_N} = \frac{1}{\beta}\nabla_H L_{N,R}(H)\Big|_{H = H_N},$$
and for any $M_N \in \mathcal{M}^{k+2}_N$ (recall (3.18)) put
$$X_N = \frac{1}{\sqrt N}\big(M_N - E_N\big).$$
Theorem 6.3. Uniformly in $M_N \in \mathcal{M}^{k+2}_N$ and $H_N \in \hat D^{k+2}_\delta$, $H_N \to H \in \hat D^{k+2}_\delta$, one has
$$2N^{\frac{k+4}{2}}\,P(\Theta_{N,R,H_N} = M_N) - p_H(X_N) \to 0 \quad \text{as } N \to \infty,$$
where $p_H(\cdot)$ denotes the density of the random vector $\Theta$ from Theorem 6.2, provided $\beta \ge \beta_0$ with sufficiently large $\beta_0 > 0$.
Proof. Using the well-known inversion formula for the Fourier transformation we rewrite the difference
$$\rho_N = 2N^{\frac{k+4}{2}}\,P(\Theta_{N,R,H_N} = M_N) - p_H(X_N)$$
in the form
$$\rho_N = \frac{1}{(2\pi)^{k+2}}\int_{\mathcal{A}}\chi^*_N(T)e^{-i(T, X_N)}\,dT - \frac{1}{(2\pi)^{k+2}}\int_{\mathbb{R}^{k+2}}\chi_H(T)e^{-i(T, X_N)}\,dT, \tag{6.27}$$
where
$$\mathcal{A} = \{T = (t_0, \dots, t_{k+1}) \in \mathbb{R}^{k+2} : |t_0| \le 2\pi N^{3/2},\ |t_l| \le \pi\sqrt N,\ l = 1, 2, \dots, k+1\}.$$
Following the standard proof of the local limit theorem (see, e.g., [11, §43]) we bound the right-hand side of (6.27) by the sum of four terms, $(2\pi)^{-(k+2)}(J_1 + J_2 + J_3 + J_4)$, where for some positive constants $A$ and $\Delta$,
$$J_1 = \int_{A_1}|\chi^*_N(T) - \chi_H(T)|\,dT, \qquad A_1 = [-A, A]^{k+2},$$
$$J_2 = \int_{A_2}\chi_H(T)\,dT, \qquad A_2 = \mathbb{R}^{k+2}\setminus A_1,$$
$$J_p = \int_{A_p}|\chi^*_N(T)|\,dT, \qquad p = 3, 4,$$
with
$$A_3 = \{T \in \mathbb{R}^{k+2} : |t_l| \le \Delta\sqrt N,\ l = 0, 1, \dots, k+1\}\setminus A_1, \qquad A_4 = \mathcal{A}\setminus(A_1 \cup A_3).$$
Fix any $\varepsilon > 0$. We will show in the following that the constants $A$ and $\Delta$ can be chosen in such a way to imply $J_p < \varepsilon/4$, $p = 1, \dots, 4$, if only $\beta \ge \bar\beta_0$ (and $N \ge N_0$) with sufficiently large $\beta_0 > 0$ (and $N_0 > 1$).
First, due to (6.26) one has $J_1 \to 0$ as $N \to \infty$ for any fixed $A > 0$ and all $\beta \ge \beta_0$, provided $\beta_0$ is sufficiently large. Then, since the distribution of the random vector $\Theta$ is non-degenerate, one has $J_2 \to 0$ as $A \to \infty$ for all $\beta \ge \beta_0$ with sufficiently large $\beta_0$.
To estimate $J_3$ fix any $T \in A_3$. Then $|T| \le \Delta\sqrt{N(k+2)}$ and for any $N$ one gets (recall (6.17), (6.20), and (6.25))
$$|R_N| \le C_1(\delta)N e^{-3\delta/4}\Big(\sum_{l=0}^{k+1}|t_l|\Big)^3 \le C_1(\delta)N\exp\{-3\delta/4\}(k+2)^{3/2}|T|^3 \le C_1(\delta)N^{3/2}\exp\{-3\delta/4\}(k+2)^2\Delta|T|^2.$$
Consequently (recall (6.16)),
$$\Big|\log\chi^*_N(T) + \frac{1}{2}B_{N,R,H_N}(T)\Big| = \Big|\frac{i}{6N^{3/2}}R_N\Big| \le \frac{C_1(\delta)(k+2)^2}{6}\,e^{-3\delta/4}\Delta|T|^2.$$
Let $\Delta > 0$ be such that
$$\frac{C_1(\delta)(k+2)^2\exp\{-3\delta/4\}\Delta}{6} \le \frac{b}{4}$$
with the constant $b$ from (6.8). Then
$$\Re\log\chi^*_N(T) \le -\frac{1}{2}B_{N,R,H_N}(T) + \frac{b}{4}|T|^2 \le -\frac{b}{4}|T|^2$$
and therefore
$$|\chi^*_N(T)| \le \exp\{\Re\log\chi^*_N(T)\} \le \exp\{-b|T|^2/4\}$$
for all $T \in A_3$ uniformly in $N \ge N_0$ and $\beta \ge \bar\beta$ (with $N_0$ and $\bar\beta$ from Lemma 6.1). As a result,
$$J_3 = \int_{A_3}|\chi^*_N(T)|\,dT \le \int_{A_2}e^{-b|T|^2/4}\,dT \searrow 0 \quad \text{as } A \nearrow \infty.$$
Finally, fix any $T \in A_4$ and rewrite $|\chi^*_N(T)|$ in the form (recall (6.21), (6.14))
$$|\chi^*_N(T)| = |\chi^{\infty}_{N,R,H_N}(N^{-1/2}T)|\,\frac{|\hat\Xi(N, R, H_N + iT/\beta\sqrt N)|}{|\hat\Xi(N, R, H_N)|}. \tag{6.28}$$
The arguments, similar to those used in the proof of Theorem 4.2 from [6], imply the existence of a constant $C = C(R, \delta, \beta_0) > 0$ such that for all $T \in A_4$, $H \in \hat D^{k+2}_\delta$, $\beta \ge \beta_0$, and $N$ sufficiently large one has
$$|\chi^{\infty}_{N,R,H}(N^{-1/2}T)| \le \exp\{-CN\}.$$
Then, applying (5.1) to estimate the partition functions on the right-hand side of (6.28) one immediately gets
$$|\chi^*_N(T)| \le \exp\{-N(C - 2\exp\{-4(\beta - \beta_0)\})\}.$$
Therefore, for all sufficiently large $\beta$, $\beta \ge \bar\beta_0$, one obtains
$$J_4 = \int_{A_4}|\chi^*_N(T)|\,dT \le \int_{\mathcal{A}}e^{-CN/2}\,dT = (2\pi)^{k+2}N^{\frac{k+4}{2}}\exp\{-CN/2\} \searrow 0$$
as $N \to \infty$, which finishes the proof of the theorem.
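As a one-dimensional toy analogue of Theorem 6.3 (not the ensemble of phase boundaries itself), the following sketch measures the quantity $\sup_m|\sqrt n\,P(S_n = m) - p(x_m)|$ for a binomial sum $S_n$ with $x_m = (m - np)/\sqrt n$ and $p(\cdot)$ the centered Gaussian density with variance $p(1-p)$. The binomial lattice has unit span, so the normalization here is $\sqrt n$ rather than the factor $2N^{(k+4)/2}$ of the theorem. The function names and the choice of the binomial model are assumptions made purely for illustration.

```python
import math

def binomial_pmf(n, m, p=0.5):
    """P(S_n = m) for S_n ~ Binomial(n, p), computed via log-gamma to avoid overflow."""
    if m < 0 or m > n:
        return 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
               + m * math.log(p) + (n - m) * math.log(1 - p))
    return math.exp(log_pmf)

def gaussian_density(x, variance):
    """Density of the centered Gaussian law with the given variance."""
    return math.exp(-x * x / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def local_limit_error(n, p=0.5):
    """Largest deviation over lattice points m of sqrt(n) P(S_n = m) from the Gaussian density."""
    variance = p * (1 - p)
    worst = 0.0
    for m in range(n + 1):
        x = (m - n * p) / math.sqrt(n)
        diff = abs(math.sqrt(n) * binomial_pmf(n, m, p) - gaussian_density(x, variance))
        worst = max(worst, diff)
    return worst

if __name__ == "__main__":
    for n in (100, 400, 1600, 6400):
        print(n, local_limit_error(n))
```

The printed deviations should shrink as $n$ grows, which is the uniform convergence that a local limit theorem asserts.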
In the arguments above the Gaussian density pH (·) can be replaced by the density of zero-mean Gaussian distribution with the covariance matrix B N,R (HN ) (recall (6.5), (6.16), and (6.26)). In particular, one has
Corollary 6.4. There exist positive constants $N_0$, $\beta_0$, $c_0$, and $C_0$ such that for all $N \ge N_0$ and $\beta \ge \beta_0$,
$$\frac{c_0}{N}\beta^2 \le \big(\det\operatorname{Hess}L_{\Lambda_N}(H_N)\big)^{1/2}\,P(\Lambda_{N,H_N} = A_N) \le \frac{C_0}{N}\beta^2, \tag{6.29}$$
where $L_{\Lambda_N}(\cdot)$ was determined in (3.5) and $H_N$ in (3.11).
For future reference we formulate also the following simple statement.
Corollary 6.5. Let all $X_N$ be uniformly bounded. Then under the conditions of Theorem 6.3 one has
$$P(\Theta_{N,R,H_N} = M_N) = \frac{1}{2}N^{-\frac{k+4}{2}}\,p_H(X_N)\cdot(1 + o(1)),$$
where the estimate $o(1)$ is uniform with respect to the considered sequences $H_N \in \hat D^{k+2}_\delta$ and $X_N$, provided only $\beta$ is sufficiently large. Moreover, there exist positive constants $\beta_0$, $c_i$, $C_i$, $i = 1, 2$, and a number $N_0$ such that
$$c_2\beta^{k+2} \le c_1 p_H(X_N) \le N^{\frac{k+4}{2}}\,P\big(\Theta_{N,R,H_N} = M_N\big) \le C_1 p_H(X_N) \le C_2\beta^{k+2} \tag{6.30}$$
uniformly in $N \ge N_0$ and the sequences $H_N$, $X_N$ under consideration, provided only $\beta \ge \beta_0$.

7. Convergence of Finite Dimensional Distributions

We prove here the convergence of finite dimensional distributions of the conditional random process (recall (3.15))
$$\theta^*_N(t) = \frac{1}{\sqrt N}\big(\theta^+_N(t) - N\hat e(t)\big), \qquad t \in [0, 1],$$
to the corresponding distributions of the Gaussian measure $\mu^*$ from Theorem 3.2.
Consider first the vector $\Lambda_N$ of conditions (3.3) with the logarithmic moment generating function $L_{\Lambda_N}(H)$ from (3.5). Assume that $H$ belongs to the set $D^2_\delta$ defined in (3.7). Then
$$\frac{1}{N\beta}\nabla_H L_{\Lambda_N}(H) = I(H) + O\Big(\frac{\log^{10} N}{N}\Big), \tag{7.1}$$
where $I(H)$ was defined in (3.12) and the estimate $O(\cdot)$ is uniform in $H \in D^2_\delta$. Indeed, it follows from (3.5) and (4.26) that
$$\beta^{-1}\nabla_H L_{\Lambda_N}(H) = \beta^{-1}\nabla_H\big(\log\hat\Xi(N, \Lambda, H) + \log\Xi(N, \Lambda, H, \infty)\big). \tag{7.2}$$
Then, due to Remark 5.5.2 one has
$$\Big|\frac{1}{N\beta}\nabla_H\log\hat\Xi(N, \Lambda, H) - \frac{1}{\beta}\int_0^1\nabla_H\hat F\big((1-x)H_0 + H_1\big)\,dx\Big| \le e^{-3(\beta - \tilde\beta_0)}\,\frac{\log^{10} N}{N}. \tag{7.3}$$
On the other hand, the analyticity and uniform boundedness of $\log Q(\cdot)$ in the region (4.23) imply the estimate
$$\Big|\frac{1}{N\beta}\nabla_H\log\Xi(N, \Lambda, H, \infty) - \frac{1}{\beta}\int_0^1\nabla_H\log Q\big((1-x)H_0 + H_1\big)\,dx\Big| = O\big(N^{-1}\big). \tag{7.4}$$
Finally, (7.1) follows directly from (7.2)–(7.4) and definition (3.12).
Let $H_N$ and $\hat H$ be the solutions of (3.11) and (3.12) respectively. Applying the implicit function theorem to $I(\cdot)$ and taking into account estimate (7.1) one easily obtains
$$|H_N - \hat H| = \beta^{-1}O\Big(\frac{\log^{10} N}{N}\Big) + \beta^{-1}O\big(N^{-1}A_N - A\big), \tag{7.5}$$
where the estimates $O(\cdot)$ are uniform in $H_N \in D^2_\delta$, and $N^{-1}A_N \in I\big(D^2_{\delta'}\big)$ respectively (here $\delta' > 0$ is any fixed number and $I\big(D^2_{\delta'}\big)$ denotes the image of the region $D^2_{\delta'}$). Thus $H_N \to \hat H$ as $N \to \infty$ and therefore all $H_N$ with sufficiently large $N$ belong to the region $D^2_\delta$ from (3.7) (recall Remark 3.1.2).
Let $\Theta_N$ be the random vector from (3.16),
$$\Theta_N \equiv \big(Y_N, X_N(s_1), \dots, X_N(s_k), X_N(s_{k+1})\big) \in \mathbb{R}^{k+2}.$$
For $H_N = (H^0_N, H^1_N)$ determined from (3.11) we introduce the vector
$$\mathbf{H}^0_N \equiv (H^0_N, 0, \dots, 0, H^1_N) \in \mathbb{R}^{k+2}.$$
Clearly, the sequence $\mathbf{H}^0_N$ converges to
$$\mathbf{H}^0 = (Q, 0, \dots, 0, H) \in \mathbb{R}^{k+2},$$
where $\hat H = (Q, H)$ denotes the solution of (3.12); thus, all $\mathbf{H}^0_N$ with sufficiently large $N$ belong to the region $\hat D^{k+2}_\delta$ from (4.7). Denote (recall (6.1), (6.3))
$$E^0_N \equiv E\Theta_{N,R,\mathbf{H}^0_N} = \big(Nq_N, e^1_N, \dots, e^k_N, Nb_N\big)$$
with
$$e^i_N = e_N(s_i) = \frac{1}{\beta}\frac{\partial}{\partial H_i}L_{N,R}(H)\Big|_{H = \mathbf{H}^0_N}.$$
Similarly to (7.1) one easily obtains the relation (recall (7.5))
$$\frac{1}{N}e_N(s) = \hat e(s) + sO\Big(\frac{\log^{10} N}{N}\Big) + sO\big(N^{-1}A_N - A\big), \tag{7.6}$$
where (cf. (3.14))
$$\hat e(s) = \frac{1}{\beta}\int_0^s F'\big((1-x)Q + H\big)\,dx = \frac{F(H + Q) - F(H + Q - Qs)}{\beta Q}, \tag{7.7}$$
and the estimates $O(\cdot)$ are uniform in $s \in [0, 1]$, provided $\beta$ is sufficiently large. For any $M_N \in \mathcal{M}^{k+2}_N$ (see (3.18)) of the kind
$$M_N = (Nq_N, m^1_N, \dots, m^k_N, Nb_N),$$
we put
$$x^i_N = \frac{1}{\sqrt N}\big(m^i_N - e^i_N\big), \qquad i = 1, \dots, k.$$
Let $p_k(\cdot)$ denote the probability distribution of the Gaussian random vector $\Theta = (\bar\eta, \bar\xi_1, \dots, \bar\xi_{k+1})$ with the characteristic function $\chi_{\mathbf{H}^0}(T)$ from (6.13). Then
$$\tilde p_k(x_1, \dots, x_k \mid 0) \equiv \frac{p_k(X^0)}{p_0(0)}, \qquad X^0 = (0, x_1, \dots, x_k, 0) \in \mathbb{R}^{k+2},$$
presents the density of the conditional distribution $(\bar\xi_1, \dots, \bar\xi_k \mid \bar\eta = 0, \bar\xi_{k+1} = 0)$. Finally, define the random process (recall (3.20), (7.7))
$$\Theta^*_N(t) = \frac{1}{\sqrt N}\big(\Theta_N(t) - N\hat e(t)\big). \tag{7.8}$$
Theorem 7.1. Let a natural number $k$ and a collection of real numbers $t_i$, $0 < t_1 < \dots < t_k < 1$, be fixed. Then for all $\beta \ge \beta_0$ with sufficiently large $\beta_0$ the distribution of the random vector $\big(\Theta^*_N(t_1), \dots, \Theta^*_N(t_k)\big)$ converges weakly to the Gaussian distribution with the density $\tilde p_k(\cdot \mid 0)$. This limiting distribution coincides with the corresponding distribution of the measure $\mu^*$ from Theorem 3.2.
The proof of Theorem 7.1 can be obtained by literal repetition of that of Theorem 5.2 in [6]. It is based on the following simple observation that follows immediately from Theorem 6.3 (cf. Lemma 5.1 in [6]).
Lemma 7.2. Let all $x^i_N$ be uniformly bounded. Then
$$P\big(\Theta_N(s_1) = m^1_N, \dots, \Theta_N(s_k) = m^k_N\big) = N^{-\frac{k}{2}}\,\tilde p_k(x^1_N, \dots, x^k_N \mid 0)\,(1 + o(1))$$
as $N \to \infty$ if only $\beta$ is sufficiently large, $\beta \ge \beta_0 > 0$; the estimate $o(\cdot)$ is uniform in such $x^i_N$.
Denote (cf. (4.3))
$$\Delta_j X = \Delta_j X(S) = g^+_N(j) - g^+_N(j-1)$$
and choose any $\rho$,
$$0 < \rho < \bar\delta/12 \tag{7.9}$$
with $\bar\delta$ fixed in Theorem 3.2.
Lemma 7.3. There exist positive constants $C$, $\beta_0$, and $N_0$ such that for all $\beta \ge \beta_0$, $N \ge N_0$, and all $j = 1, \dots, N$ one has
$$E\big(\exp\{\rho|\Delta_j X|\} \mid \Lambda_N = A_N\big) \le C. \tag{7.10}$$
Proof. Fix any $j \in \{1, 2, \dots, N\}$ and a phase boundary $S \in T_N$. Applying to $S$ the animal decomposition described in Sect. 4 we observe that $\Delta_j X$ is uniquely determined by the animal $\hat\xi$ satisfying the condition $J(\hat\xi) \supseteq (j-1, j]$. Denote by $\{\hat\xi\}$ the event
$$\{\hat\xi\} = \{S \in T_N : \text{the animal decomposition of } S \text{ contains } \hat\xi\}.$$
Then one has
$$E\big(e^{\rho|\Delta_j X|} \mid \Lambda_N = A_N\big) = \sum_{\hat\xi}\exp\{\rho|\Delta_j X(\hat\xi)|\}\,P\big(\{\hat\xi\} \mid \Lambda_N = A_N\big), \tag{7.11}$$
where the summation is going over the whole set of disjoint events $\{\hat\xi\}$ such that $J(\hat\xi) = (m_{i-1}, m_i] \supseteq (j-1, j]$. Relation (7.11) will be the initial point of our reasoning.
We start with the following simple observation. Let $\Xi(N, R, H)$, $H \in \hat D^{k+2}_\delta$, be the partition function from (4.13) and $\hat\xi \in \hat K_N$ be the animal fixed above. Denote by $\hat K_N(\hat\xi) \subset \hat K_N$ the set of all collections from $\hat K_N$ that contain $\hat\xi$,
$$\hat K_N(\hat\xi) = \big\{\{\hat\xi_1, \dots, \hat\xi_{l+1}\} \in \hat K_N : \hat\xi \in \{\hat\xi_1, \dots, \hat\xi_{l+1}\}\big\}.$$
Clearly, the sets $\hat K_N(\hat\xi)$ form the partition of $\hat K_N$ labeled by $\hat\xi$ under consideration. Define (cf. (4.13))
$$\Xi\big(N, R, H; \hat\xi\big) \equiv \sum_{\{\hat\xi_1, \dots, \hat\xi_{l+1}\} \in \hat K_N(\hat\xi)}\ \prod_{i=1}^{l+1}\Psi_{N,R,H}(\hat\xi_i) = \Xi\big(N, R, H \mid \hat\xi\big)\cdot\Psi_{N,R,H}(\hat\xi). \tag{7.12}$$
Then for all $H \in \hat D^{k+2}_\delta$ and sufficiently large $\beta$ one has
$$\Big|\frac{\Xi\big(N, R, H \mid \hat\xi\big)}{\Xi(N, R, H)}\Big| \le \exp\{(2\beta + Q_\delta + \rho)|J(\hat\xi)|\}, \tag{7.13}$$
where $J(\hat\xi)$ is the base of the animal $\hat\xi$ and (recall (4.25))
$$Q_\delta \equiv \max_{H : |H| < 2 - \delta/2\beta}|\log Q(H) + 2\beta|. \tag{7.14}$$
To check (7.13) observe that the cluster expansion of $\log\Xi\big(N, R, H \mid \hat\xi\big)$ contains only the cluster weights depending on $I = (a, b] \subseteq (0, N]\setminus J(\hat\xi)$. Since the same weights appear in the expansion for $\log\Xi(N, R, H)$ one easily obtains (recall (4.26), (4.20), (4.21), and (5.5))
$$\Big|\log\Xi\big(N, R, H \mid \hat\xi\big) - \log\Xi(N, R, H) + \sum_{j \in J(\hat\xi)}\log Q(H_{N,j})\Big| \le K|J(\hat\xi)| \tag{7.15}$$
for all $H \in \hat D^{k+2}_\delta$ and sufficiently large $\beta$, where the constant $K = K(\beta) \searrow 0$ as $\beta \nearrow \infty$. Thus, (7.13) follows directly from (7.15), (7.14), and the inequality $|H_{N,j}| < 2 - \delta/2\beta$ (cf. (4.7), (4.23)).
We will show below that for some constant $\bar C > 0$ and all sufficiently large $\beta$ one has
$$P\big(\{\hat\xi\} \mid \Lambda_N = A_N\big) = \frac{P\big(\Lambda_N = A_N; \{\hat\xi\}\big)}{P(\Lambda_N = A_N)} \le \bar C\exp\{(2\beta + Q_\delta + 2\rho)|J(\hat\xi)| + \rho|\Lambda[\hat\xi]|\}\,\Psi_{N,\Lambda,H_N}\big(\hat\xi\big), \tag{7.16}$$
where
$$\Lambda[\hat\xi] = \big(a[\hat\xi], h[\hat\xi]\big) = \Big(\frac{1}{N}a(\hat\xi) + \Big(1 - \frac{m_i}{N}\Big)h(\hat\xi),\ h(\hat\xi)\Big)$$
(recall that $J(\hat\xi) = (m_{i-1}, m_i]$), $H_N$ is the solution to (3.11), and (cf. (4.11))
$$\Psi_{N,\Lambda,H}\big(\hat\xi\big) \equiv \exp\Big\{-2\beta|\hat\xi| + \beta h(\hat\xi)\Big(H_0\Big(1 - \frac{m_i}{N}\Big) + H_1\Big) + \beta H_0\frac{1}{N}a(\hat\xi)\Big\}\prod_{\Lambda_s \in \hat\xi}\Psi(\Lambda_s) = \exp\big\{-2\beta|\hat\xi| + \beta\big(\Lambda[\hat\xi], H\big)\big\}\prod_{\Lambda_s \in \hat\xi}\Psi(\Lambda_s)$$
with $H \in D^2_\delta$. Then (7.10) follows directly. Indeed, according to (4.42),
$$\beta\big(\Lambda[\hat\xi], H\big) \le \big(2\beta - 3\delta/4\big)N_v(\hat\xi) \tag{7.17}$$
for any $H \in D^2_\delta$. Then, the inequalities
$$|h(\hat\xi)| \le N_v(\hat\xi), \qquad |a(\hat\xi)| \le |J(\hat\xi)|\cdot N_v(\hat\xi), \qquad |J(\hat\xi)| \le m_i \tag{7.18}$$
imply $|a[\hat\xi]| \le N_v(\hat\xi)$ and therefore
$$|\Lambda[\hat\xi]| \le |a[\hat\xi]| + |h[\hat\xi]| \le 2N_v(\hat\xi). \tag{7.19}$$
As a result, using the simple observation
$$|\Delta_j X(\hat\xi)| \le N_v(\hat\xi), \tag{7.20}$$
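one obtains (recall (7.9))
$$\begin{aligned} E\big(\exp\{\rho|\Delta_j X|\} \mid \Lambda_N = A_N\big) &\le \bar C\sum_{\hat\xi}e^{-2\beta|\hat\xi| + (2\beta + Q_\delta + 2\rho)|J(\hat\xi)| + (2\beta - \delta/2)N_v(\hat\xi)}\prod_{\Lambda_s \in \hat\xi}\Psi(\Lambda_s)\\ &= \bar C\sum_{I}e^{(2\beta + Q_\delta + 2\rho)|I|}\sum_{\hat\xi : J(\hat\xi) = I}e^{-2\beta|\hat\xi| + (2\beta - \delta/2)N_v(\hat\xi)}\prod_{\Lambda_s \in \hat\xi}\Psi(\Lambda_s), \end{aligned} \tag{7.21}$$
where $\sum_I$ denotes the summation over all $I = (a, b] \subseteq (0, N]$ such that $I \supseteq (j-1, j]$. As in Sect. 4 (recall (4.32)–(4.35)) we estimate the inner sum via
$$\sum_{S : I(S) = I,\ y_{in}(S) = 0}e^{-2\beta|\hat\xi| + (2\beta - \delta/2)N_v(\hat\xi)}\,\tilde X(S) \le e^{-6(\beta - \beta_2)(|I| - 1) + 2(\beta_2 - \beta)}\sum_{S : I(S) = I,\ y_{in}(S) = 0}e^{-\delta N_v(S)/4 - \beta_3(N_h(S) + 1)},$$
where $\beta_3 = 2\beta_2 - \varepsilon$ and $\beta_2$ in (4.33) is sufficiently large to imply $\varepsilon(\beta_2) < \delta/4$. Evaluating the last sum with the help of (4.37) one easily obtains (7.10),
$$E\big(e^{\rho|\Delta_j X|} \mid \Lambda_N = A_N\big) \le \tilde C\sum_{n=1}^{\infty}(n+1)e^{-4(\beta - \beta_4)n}\,\frac{R(\beta_3, \delta)^n}{1 - R(\beta_3, \delta)} \le C, \tag{7.22}$$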
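for $\beta \ge \beta_4$ and some constant $C$, where we set $\beta_4 = (6\beta_2 + Q_\delta + 2\rho)/4$ and $\tilde C = \bar C\exp\{2\beta_2 + Q_\delta + 2\rho\}$.
It remains to establish (7.16). First, we apply the analog of (6.4) to rewrite
$$\frac{P\big(\Lambda_N = A_N; \{\hat\xi\}\big)}{P(\Lambda_N = A_N)} = \frac{\Xi(N, H_N, \Lambda; \hat\xi)}{\Xi(N, H_N, \Lambda)}\cdot\frac{P\big(\Lambda_{N;\hat\xi,H_N} = A_N\big)}{P\big(\Lambda_{N,H_N} = A_N\big)} \tag{7.23}$$
with $H_N$ denoting the solution of (3.11). The first fraction on the right-hand side of (7.23) can be estimated with the help of (7.12)–(7.13),
$$\frac{\Xi(N, H_N, \Lambda; \hat\xi)}{\Xi(N, H_N, \Lambda)} = \frac{\Xi(N, H_N, \Lambda \mid \hat\xi)}{\Xi(N, H_N, \Lambda)}\,\Psi_{N,\Lambda,H_N}\big(\hat\xi\big) \le \exp\{(2\beta_2 + Q_\delta + \rho)|J(\hat\xi)|\}\,\Psi_{N,\Lambda,H_N}\big(\hat\xi\big). \tag{7.24}$$
On the other hand, similarly to (7.13) one obtains
$$\Big|\frac{1}{\beta}\nabla_H\big(\log\Xi(N, H_N, \Lambda \mid \hat\xi) - \log\Xi(N, H_N, \Lambda)\big)\Big| \le (Q_{\delta,1} + \rho)|J(\hat\xi)| \tag{7.25}$$
with (recall (4.24))
$$Q_{\delta,1} \equiv \max_{H : |H| < 2 - \delta/2\beta}\Big|\frac{\partial}{\beta\,\partial H}\log Q(H)\Big| = \max_{H : |H| < 2 - \delta/2\beta}\frac{\sinh H\beta}{\cosh 2\beta - \cosh H\beta} \le \frac{\sinh(2\beta - \delta/2)}{\cosh 2\beta - \cosh(2\beta - \delta/2)} \le \frac{e^{-\delta/2}}{1 - e^{-\delta/4}} = \frac{e^{-\delta/4}}{e^{\delta/4} - 1}$$
for all $\beta \ge \beta_0(\delta)$. Thus, taking into account the simple identity
$$\frac{1}{\beta}\nabla_H\log\Psi_{N,\Lambda,H}\big(\hat\xi\big) = \Lambda[\hat\xi]$$
(that can be obtained by direct computation) one deduces immediately that
$$E_{\Lambda_N;\hat\xi,H_N} \equiv \frac{1}{\beta}\nabla_H\log\Xi(N, H, \Lambda; \hat\xi)\Big|_{H = H_N}$$
satisfies the estimate
$$\big|E_{\Lambda_N;\hat\xi,H_N} - A_N - \Lambda[\hat\xi]\big| \le (Q_{\delta,1} + \rho)|J(\hat\xi)|. \tag{7.26}$$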
It remains to evaluate the last fraction in (7.23). Let first $|J(\hat\xi)| \le A\sqrt N$ with some fixed constant $A > 0$. Observe that the analog of (7.25) for the second derivatives can be obtained in a similar way; therefore, the analog of (5.24) for our special case $R = \{N\}$ implies the convergence
$$\frac{1}{\beta^2 N}\operatorname{Hess}\log\Xi(N, H, \Lambda; \hat\xi) \to \frac{1}{\beta^2}\int_0^1\operatorname{Hess}F\big((1-x)H_0 + H_1\big)\,dx$$
for any $\beta \ge \beta_0(\delta)$ uniformly in $H = (H_0, H_1) \in \hat D^2_\delta$. Thus, the limiting properties of the random vector $\Lambda_N; \{\hat\xi\}$ are the same as those of $\Lambda_N$. In particular, if $|\Lambda[\hat\xi]| \le B\sqrt N$ with any fixed constant $B > 0$, one can apply Corollary 6.4 to obtain (recall (7.26))
$$\frac{P\big(\Lambda_{N;\hat\xi,H_N} = A_N\big)}{P\big(\Lambda_{N,H_N} = A_N\big)} \le \bar C_1 \tag{7.27}$$
provided $\beta$ is sufficiently large. In the opposite case, $|\Lambda[\hat\xi]| > B\sqrt N$, one has (recall (6.30))
$$\frac{P\big(\Lambda_{N;\hat\xi,H_N} = A_N\big)}{P\big(\Lambda_{N,H_N} = A_N\big)} \le \frac{1}{P\big(\Lambda_{N,H_N} = A_N\big)} \le \bar C_2\frac{N^2}{\beta^2} \le \bar C_3 e^{\rho|\Lambda[\hat\xi]|}. \tag{7.28}$$
Finally, for $|J(\hat\xi)| > A\sqrt N$, one gets
$$\frac{P\big(\Lambda_{N;\hat\xi,H_N} = A_N\big)}{P\big(\Lambda_{N,H_N} = A_N\big)} \le \frac{1}{P\big(\Lambda_{N,H_N} = A_N\big)} \le \bar C_2\frac{N^2}{\beta^2} \le \bar C_4 e^{\rho|J(\hat\xi)|}. \tag{7.29}$$
Now, (7.16) follows immediately from (7.23), (7.24), and (7.27)–(7.29).
Observe that this proof can be applied to any local variable that satisfies the analog of (7.20) with the right-hand side of the kind $CN_v(\hat\xi)$, where $C > 0$ is any fixed constant; then (7.9) should be replaced by $0 < \rho < \bar\delta/12C$. In particular, one has
Corollary 7.4. Let the constants $C$, $\beta_0$, and $N_0$ be as determined in Lemma 7.3. Then
$$E\big(\exp\{\rho|g^-_N(j) - g^+_N(j)|\} \mid \Lambda_N = A_N\big) \le C$$
for all $j = 1, 2, \dots, N$, provided only $N \ge N_0$ and $\beta \ge \beta_0$.
For future reference we formulate here the following corollary of Lemma 7.3 that could be obtained directly from (7.16) using calculations similar to those in (7.21)–(7.22).
Corollary 7.5. Fix a number $j \in \{1, 2, \dots, N\}$. For any phase boundary $S \in T_N$ apply the animal decomposition and denote by $\xi(j)$ the animal satisfying $J(\xi(j)) \supseteq (j-1, j]$. Then there exists $\bar\beta < \infty$ such that for all $\beta \ge \bar\beta$ and all $l \ge 1$ one has
$$P\big(|J(\xi(j))| \ge l + 1 \mid \Lambda_N = A_N\big) \le \exp\{-4(\beta - \bar\beta)l\}.$$
Another consequence of Lemma 7.3 is the following
Theorem 7.6. For all $\beta \ge \beta_0$ with $\beta_0$ determined in Theorem 7.1 the finite dimensional distributions of the random process $\theta^*_N(t)$, $t \in [0, 1]$, have the same limiting behaviour as that of $\Theta^*_N(t)$.
Proof. In view of the observation (recall (3.15), (3.20), (7.8), (3.17), and (3.2))
$$\theta^*_N(t) - \Theta^*_N(t) = \frac{\{Nt\}}{\sqrt N}\Big(g^+_N([Nt] + 1) - g^+_N([Nt])\ \Big|\ \Lambda_N = A_N\Big)$$
the statement of the theorem follows immediately from (7.10). For details see [6, Theorem 5.4].
8. Proof of Main Theorems

To complete the proof of our main result we need to check the weak compactness of the sequence of measures $\mu^*_N$. We obtain it here as an implication of Theorem 2.2 from [10, Chap. 9] which provides the sufficient condition for the weak compactness of measures in $C[0, 1]$. The following statement verifies the assumption of the above mentioned theorem.
Theorem 8.1. There exist positive numbers $C$, $\beta_0$, and $N_0$ such that
$$E|\theta^*_N(t) - \theta^*_N(s)|^4 \le C|t - s|^{7/4}$$
uniformly in $N \ge N_0$ and all segments $[s, t] \subseteq [0, 1]$, $s < t$, provided only $\beta \ge \beta_0$.
As in [6] we consider two cases, $\Delta = \Delta_N \equiv |t - s| \le N^{-8/9}$ and $\Delta > N^{-8/9}$, separately.
Lemma 8.2. There exist positive numbers $C_1$ and $N_1$ such that
$$E|\theta^*_N(t) - \theta^*_N(s)|^4 \le C_1|t - s|^{7/4}$$
uniformly in $[s, t] \subset [0, 1]$, $\Delta \le N^{-8/9}$, if only $N \ge N_1$ and $\beta \ge \beta_0$ with $\beta_0$ determined in Lemma 7.3.
The proof is based on estimate (7.10) and can be obtained by literal repetition of that of Lemma 6.2 from [6].
Lemma 8.3. There exist positive numbers $C_2$, $\beta_2$, and $N_2$ such that
$$E|\theta^*_N(t) - \theta^*_N(s)|^4 \le C_2|t - s|^2 \tag{8.1}$$
uniformly in $[s, t] \subset [0, 1]$, $\Delta > N^{-8/9}$, if only $N \ge N_2$ and $\beta$ is sufficiently large, $\beta \ge \beta_2$.
Proof. Denote (recall (3.2))
$$\zeta_N \equiv \xi^+_N(t) - \xi^+_N(s) = g^+_N(Nt) - g^+_N(Ns)$$
and introduce the random vector (cf. (3.3))
$$\tilde\Lambda_N = \big(Y_N, h_N, \zeta_N/\sqrt\Delta\big)$$
with the logarithmic moment generating function $\tilde L_{\Lambda_N}(H)$, $H \in \mathbb{R}^3$ (cf. (3.5)),
$$\tilde L_{\Lambda_N}(H) \equiv \log E\exp\big\{\beta\big(H, \tilde\Lambda_N\big)\big\} = \log\tilde\Xi(N, \Lambda, H) - \log\Xi(N). \tag{8.2}$$
For $H_N = (H^0_N, H^1_N)$ determined from (3.11) we define
$$\mathbf{H}^0_N = (H^0_N, H^1_N, 0)$$
and
$$\tilde E_N \equiv \frac{1}{\beta}\nabla_H\log\tilde\Xi(N, \Lambda, H)\Big|_{H = \mathbf{H}^0_N} = (Nq_N, Nb_N, \tilde e_N), \tag{8.3}$$
where similarly to (7.6) one obtains the relation (recall (3.9))
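$$\tilde e_N = N\big(\hat e(t) - \hat e(s)\big) + \Delta\,o\big(\sqrt N\big).$$
As a result, for all sufficiently large $N$ one has
$$E\Big|\frac{\theta^*_N(t) - \theta^*_N(s)}{\sqrt\Delta}\Big|^4 \le 2\sum_{k \ge 0}(k+1)^4\,P\Big(\frac{|\zeta_N - \tilde e_N\sqrt\Delta|}{\sqrt{N\Delta}} > k\ \Big|\ \Lambda_N = A_N\Big). \tag{8.4}$$
We will show below that for all $N \ge N_2$ and $\beta \ge \beta_2$ with sufficiently large $N_2 > 0$ and $\beta_2 > 0$ one gets the estimate
$$P\big(|\zeta_N - \tilde e_N\sqrt\Delta| > k\sqrt{N\Delta} \mid \Lambda_N = A_N\big) \le f_N(k), \tag{8.5}$$
where
$$f_N(k) = \begin{cases} D_1\exp\{-\alpha_1 k^2\}, & \text{if } |k| \le \varepsilon\sqrt{N\Delta},\\ D_2\exp\{-\alpha_2 N^{1/18}|k|\}, & \text{if } |k| > \varepsilon\sqrt{N\Delta}, \end{cases} \tag{8.6}$$
and $D_1$, $D_2$, $\alpha_1$, $\alpha_2$, $\varepsilon$ are some fixed positive constants. Thus, the series in (8.4) is convergent and (8.1) follows immediately.
It remains to establish estimates (8.5)–(8.6). To do this we introduce the vector (recall (8.3))
$$\tilde Z_N \equiv (Nq_N, Nb_N, \tilde e_N + k\sqrt N) = \tilde E_N + (0, 0, k\sqrt N) \tag{8.7}$$
and determine $\tilde H_N = \tilde H_N(k) = \big(\tilde H^0_N(k), \tilde H^1_N(k), \tilde H^2_N(k)\big)$ from the equation
$$\frac{1}{\beta}\nabla_H\log\tilde\Xi(N, \Lambda, H)\Big|_{H = \tilde H_N} = \tilde Z_N. \tag{8.8}$$
It follows from (8.2) and the implicit function theorem that provided $k$ in (8.7) is of order $\sqrt{N\Delta}$ the quantities $\tilde H^0_N(k) - H^0_N$, $\tilde H^1_N(k) - H^1_N$, and $\tilde H^2_N(k)\sqrt\Delta$ are of order $\Delta$. Therefore, there exist $\varepsilon = \varepsilon(\rho) > 0$, $N_3 > 0$, and $\beta_3 > 0$ such that for all $k$, $|k| \le \varepsilon\sqrt{N\Delta}$, all $\beta \ge \beta_3$, and all $N \ge N_3$ the following inequalities hold true
$$|\tilde H^0_N(k) - H^0_N| < \rho\Delta, \qquad |\tilde H^1_N(k) - H^1_N| < \rho\Delta, \qquad |\tilde H^2_N(k)| < \rho\sqrt\Delta.$$
Thus, applying arguments similar to those used in the proof of Lemma 6.1 one obtains the inequality (cf. (6.8))
$$\Big(\operatorname{Hess}\tilde L_{\Lambda_N}(H)\Big|_{H = \tilde H_N(k)}T, T\Big) \ge C\beta^2 N|T|^2 \tag{8.9}$$
for all $k$, $|k| \le \varepsilon\sqrt{N\Delta}$, all $T \in \mathbb{R}^3$, $\beta \ge \beta_4$, $N \ge N_4$, where $C$, $\beta_4$, and $N_4$ are some positive constants depending only on $\varepsilon$ and $\beta_0$ from (5.11). For future reference we fix such a value of $\varepsilon > 0$.
Assuming that $\zeta_N - \tilde e_N\sqrt\Delta \ge 0$ (in the opposite case the estimates are similar) we rewrite
$$\begin{aligned} P\big(\zeta_N > \tilde e_N\sqrt\Delta + k\sqrt{N\Delta} \mid \Lambda_N = A_N\big) &= \frac{P\big(\Lambda_N = A_N,\ \zeta_N > \tilde e_N\sqrt\Delta + k\sqrt{N\Delta}\big)}{P(\Lambda_N = A_N)}\\ &= \frac{e^{-\tilde L^*_{\Lambda_N}(\tilde Z_N)}}{e^{-L^*_{\Lambda_N}(A_N)}}\cdot\frac{P_{\tilde H_N}\big(\Lambda_N = A_N,\ \zeta_N > \tilde e_N\sqrt\Delta + k\sqrt{N\Delta}\big)}{P_{H_N}(\Lambda_N = A_N)}, \end{aligned} \tag{8.10}$$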
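where $\tilde L^*_{\Lambda_N}(\cdot)$ and $L^*_{\Lambda_N}(\cdot)$ denote the Legendre transformations of the functions $\tilde L_{\Lambda_N}(\cdot)$ and $L_{\Lambda_N}(\cdot)$ correspondingly, $\tilde H_N$ was determined in (8.8), $H_N$ in (3.11), and $P_{\tilde H_N}(\cdot)$, $P_{H_N}(\cdot)$ denote the tilted distributions of the random vectors $\tilde\Lambda_N$ and $\Lambda_N$ with parameters $\tilde H_N$ and $H_N$ respectively.
Let us evaluate first the difference $\tilde L^*_{\Lambda_N}(\tilde Z_N) - L^*_{\Lambda_N}(A_N)$. It follows from (8.2), (8.3), (3.11) and the duality relations (2.22) for the Legendre transformation that
$$\tilde L^*_{\Lambda_N}(\tilde E_N) = L^*_{\Lambda_N}(A_N) \quad \text{and} \quad \partial_2\tilde L^*_{\Lambda_N}(\tilde E_N) = 0,$$
where $\partial_2\tilde L^*_{\Lambda_N}(\cdot)$ denotes the derivative of the function $\tilde L^*_{\Lambda_N}(x_0, x_1, x_2)$ with respect to $x_2$. Consequently (cf. relation (A.5) in [6]),
$$\tilde L^*_{\Lambda_N}(\tilde Z_N) - \tilde L^*_{\Lambda_N}(\tilde E_N) = \int_0^{k\sqrt N}\big(k\sqrt N - y\big)(\partial_2)^2\tilde L^*_{\Lambda_N}(Nq_N, Nb_N, \tilde e_N + y)\,dy, \tag{8.11}$$
and one needs to evaluate $(\partial_2)^2\tilde L^*_{\Lambda_N}(\cdot)$ from below. Denote
$$\tilde E^y_N = \tilde E_N + (0, 0, y) = (Nq_N, Nb_N, \tilde e_N + y). \tag{8.12}$$
We will show below that in the case $|k| \le \varepsilon\sqrt{N\Delta}$ there exist positive constants $\alpha_1 = \alpha_1(\varepsilon)$ and $\beta_5$ such that for all $y$, $|y| \le k\sqrt N$, one has
$$(\partial_2)^2\tilde L^*_{\Lambda_N}(\tilde E^y_N) \ge \alpha_1/N. \tag{8.13}$$
Then (8.11) implies
$$\tilde L^*_{\Lambda_N}(\tilde Z_N) - \tilde L^*_{\Lambda_N}(\tilde E_N) \ge \alpha_1 k^2, \tag{8.14}$$
provided $|k| \le \varepsilon\sqrt{N\Delta}$, and due to the convexity of $\tilde L^*_{\Lambda_N}(\cdot)$ (see also Property A.2 in [6])
$$\tilde L^*_{\Lambda_N}(\tilde Z_N) - \tilde L^*_{\Lambda_N}(\tilde E_N) \ge 2\alpha_2 N^{1/18}|k| \tag{8.15}$$
in the opposite case, $|k| > \varepsilon\sqrt{N\Delta}$. Thus, it remains to prove (8.13). To do this determine $\tilde H^y_N = \big(\tilde H^0_N(y), \tilde H^1_N(y), \tilde H^2_N(y)\big)$ from the condition (recall (8.12))
$$\frac{1}{\beta}\nabla_H\tilde L_{\Lambda_N}(H)\Big|_{H = \tilde H^y_N} = \tilde E^y_N,$$
and consider the matrix $\operatorname{Hess}\tilde L_{\Lambda_N}(\tilde H^y_N)$. Since it is positive definite (recall inequality (8.9)) there exists $C_5 = C_5(\varepsilon) > 0$ such that for all $y$, $|y| \le k\sqrt N \le \varepsilon N\sqrt\Delta$, one has
$$\Big[\frac{\partial^2}{\partial H_0^2}\tilde L_{\Lambda_N}(H)\,\frac{\partial^2}{\partial H_1^2}\tilde L_{\Lambda_N}(H) - \Big(\frac{\partial^2}{\partial H_1\,\partial H_0}\tilde L_{\Lambda_N}(H)\Big)^2\Big]_{H = \tilde H^y_N} \ge C_5 N^2. \tag{8.16}$$
On the other hand,
$$\det\operatorname{Hess}\tilde L_{\Lambda_N}(H)\Big|_{H = \tilde H^y_N} \le C_6 N^3 \tag{8.17}$$
uniformly in such $y$ with some fixed constant $C_6 > 0$. Since due to the duality relations (2.22) the value of the derivative $(\partial_2)^2\tilde L^*_{\Lambda_N}(\tilde E^y_N)$ coincides with the ratio of the left-hand sides in (8.16) and (8.17), one immediately obtains (8.13).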
It remains to evaluate the last fraction in (8.10). Consider first the case $|k| \le \varepsilon\sqrt{N\Delta}$. Let $\Lambda_{N,\tilde H_N}$ be the random vector with the distribution induced by $P_{\tilde H_N}(\cdot)$ and $L_{\Lambda_N,\tilde H_N}(H)$, $H = (H_0, H_1)$, be its logarithmic moment generating function,
$$L_{\Lambda_N,\tilde H_N}(H) \equiv \log\sum_{M \in \mathcal{M}^2_N}e^{\beta(H, M)}\,P_{\tilde H_N}(\Lambda_N = M) = \tilde L_{\Lambda_N}\big(\tilde H_N + (H_0, H_1, 0)\big) - \tilde L_{\Lambda_N}(\tilde H_N).$$
Note that this function is strictly convex and satisfies the condition
$$\det\operatorname{Hess}L_{\Lambda_N,\tilde H_N}(H)\Big|_{H = (0,0)} \ge C_5 N^2 \tag{8.18}$$
(since the expression on the left-hand side of (8.18) coincides with the left-hand side of (8.16) with $y = k\sqrt N$). As a result, applying the analog of (6.29) one gets
$$P_{\tilde H_N}\big(\Lambda_N = A_N\big) \le \frac{C_0}{N^2}\beta^2.$$
On the other hand, the denominator $P_{H_N}(\Lambda_N = A_N)$ can be evaluated from below via the analog of (6.30). Thus, there exist positive constants $C_7$, $\beta_7$, and $N_7$ such that for all $N \ge N_7$, $\beta \ge \beta_7$, and $|k| \le \varepsilon\sqrt{N\Delta}$ one has
$$\frac{P_{\tilde H_N}\big(\Lambda_N = A_N,\ \zeta_N > \tilde e_N\sqrt\Delta + k\sqrt{N\Delta}\big)}{P_{H_N}(\Lambda_N = A_N)} \le \frac{P_{\tilde H_N}(\Lambda_N = A_N)}{P_{H_N}(\Lambda_N = A_N)} \le C_7. \tag{8.19}$$
In the opposite case, $|k| > \varepsilon\sqrt{N\Delta}$, one easily gets (recall (6.30))
$$\frac{P_{\tilde H_N}\big(\Lambda_N = A_N,\ \zeta_N > \tilde e_N\sqrt\Delta + k\sqrt{N\Delta}\big)}{P_{H_N}(\Lambda_N = A_N)} \le \frac{1}{P_{H_N}(\Lambda_N = A_N)} \le \frac{N^2}{c_2\beta^2} \le C_8\exp\{\alpha_2 N^{1/18}|k|\}. \tag{8.20}$$
It remains to observe that (8.5)–(8.6) follow immediately from (8.10), (8.14), (8.15), (8.19), and (8.20).
Proof of Theorem 3.2. The statement of the theorem follows directly from Theorems 7.1, 7.6, 8.1, and Theorem 2.2 from [10].
Proof of Theorem 3.3. The first part of the theorem can be obtained in the same way as Theorem 3.2. The convergence in (3.21) follows from Corollary 7.4.
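The change of measure behind (8.10), which writes a point probability as $e^{-L^*}$ times the probability of the same point under a tilted law, can be illustrated on a toy model. The sketch below is an assumption-laden example, not the ensemble of the paper: it takes $S_n \sim \mathrm{Binomial}(n, 1/2)$, sets $\beta = 1$, and checks that $\log P(S_n = m) = -L^*(m) + \log P_h(S_n = m)$, where $L(h) = \log E e^{hS_n}$, $h$ solves $L'(h) = m$, and $P_h$ is the tilted law (again binomial, with success probability $e^h/(1+e^h)$). All function names are hypothetical.

```python
import math

def log_binomial_pmf(n, m, p):
    """log P(S_n = m) for S_n ~ Binomial(n, p)."""
    return (math.lgamma(n + 1) - math.lgamma(m + 1) - math.lgamma(n - m + 1)
            + m * math.log(p) + (n - m) * math.log(1 - p))

def log_mgf(h, n):
    """L(h) = log E exp{h S_n} for S_n ~ Binomial(n, 1/2)."""
    return n * math.log((1 + math.exp(h)) / 2)

def tilt_parameter(n, m):
    """Solution h of L'(h) = m, i.e. n e^h / (1 + e^h) = m, for 0 < m < n."""
    return math.log(m / (n - m))

def legendre_transform(n, m):
    """L*(m) = h m - L(h) evaluated at the maximizing h."""
    h = tilt_parameter(n, m)
    return h * m - log_mgf(h, n)

def both_sides(n, m):
    """Return (log P(S_n = m), -L*(m) + log P_h(S_n = m)); the two should agree."""
    h = tilt_parameter(n, m)
    p_tilted = math.exp(h) / (1 + math.exp(h))
    lhs = log_binomial_pmf(n, m, 0.5)
    rhs = -legendre_transform(n, m) + log_binomial_pmf(n, m, p_tilted)
    return lhs, rhs

if __name__ == "__main__":
    print(both_sides(50, 40))
```

Under the tilted law the point $m$ is the mean, so the second factor is of local-limit size while the exponential factor carries the large-deviation cost, which mirrors the way the two fractions in (8.10) are treated separately.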
A. Wulff Construction in 1D Models of SOS Type

The 1D SOS model is the simplest interface model. In view of its simplicity it is very popular in the physical literature and is used mainly as a “toy model” for discussing the statistical properties of interfaces. In particular, the Wulff construction for this model is well understood ([1, 21]). On the other hand, the interfaces appearing in the 1D SOS model present sample paths of a 1D random walk of a special type (see, e.g., [6, Sect. 3]) and therefore the Wulff construction here follows immediately from the known facts of the sample paths large deviations theory ([3, Chap. 5], [22]). Using the probabilistic interpretation one can investigate a much more general case of random walks than those usually appearing in the physical literature in the context of 1D models of SOS type (see, e.g., [2] for a list of typical examples). In this sense, the random walks provide the most general model of SOS type and for this reason we will use the probabilistic language in the present section.
We will restrict ourselves to the discrete case, though the generalization to the continuous one is straightforward [6, Sect. 2]. Let $\xi_i$ be a sequence of independent integer valued random variables having the same non-degenerate distribution that is concentrated on the lattice $\mathbb{Z}^1$. Then the interface is described by the sequence of partial sums, $S_0 = 0$, $S_k = \sum_{i=1}^{k}\xi_i$, of the corresponding random walk. Denote by $L(h) \equiv \log E\exp\{h\xi\}$ the logarithmic moment generating function (the free energy) of a single step of this random walk. Assume in addition that $L(\cdot)$ is a finite function (and thus analytic) in some open neighbourhood of the origin.⁸ Finally, for any $n \ge 1$ and $t \in [0, 1]$ define a random polygonal function (a piece-wise linearly interpolated interface)
$$x_n(t) = S_{[nt]} + \{nt\}\xi_{[nt]+1} = \sum_{i=1}^{[nt]}\xi_i + \{nt\}\xi_{[nt]+1}$$
with $[nt]$ and $\{nt\}$ denoting the integral and the fractional parts of $nt$ correspondingly. Then the distribution of $n^{-1}x_n(t)$ satisfies the large deviations principle with the rate function ([22, 4, 3])
$$J(f) = \begin{cases}\displaystyle\int_0^1 L^*\big(f'(t)\big)\,dt, & \text{if } f \in AC[0, 1],\ f(0) = 0,\\ +\infty & \text{otherwise,}\end{cases}$$
where $AC[0, 1]$ is the space of absolutely continuous functions on $[0, 1]$ and $L^*(\cdot)$ is the Legendre transformation of $L(\cdot)$,
$$L^*(x) = \sup_h\big(xh - L(h)\big),$$
that is well defined due to the strict convexity of $L(\cdot)$. In particular, for any admissible pair $(q, b)$ (i.e., satisfying condition (A.4) below) one has
$$\lim_{\varepsilon \searrow 0}\lim_{n \to \infty}\frac{1}{n}\log P\Big(x_n(1) \in (b, b + \varepsilon),\ \int_0^1 x_n(t)\,dt \in (q, q + \varepsilon)\Big) = -J(\bar f),$$
where $\bar f(\cdot)$ presents the solution of the variational problem:
$$J(f) \to \inf : \quad f(0) = 0, \quad f(1) = b, \quad \int_0^1 f(t)\,dt = q. \tag{A.1}$$
⁸ This is a usual assumption in applications; moreover, typically one demands the existence of all exponential moments for $\xi$ (see, e.g., [2]).
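Note that the functional $J(\cdot)$ is closely related to the Wulff functional with naturally defined surface tension (see, e.g., [6, Sect. 3]), and therefore the function $\bar f(\cdot)$ is the Wulff profile in the considered situation.
[Figure 2: panel a) shows the graph of $L(h)$ against $h$ with the points $A$ and $O$ at horizontal separation $h_0$; panel b) shows the Wulff profile $\bar f(t)$ for $t \in [0, 1]$ with the points $O'$ and $A'$.]
Fig. 2. Wulff construction in a general 1D model of SOS type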
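The Legendre transformation entering the rate function $J$ can be computed explicitly in simple cases. As an illustration only, the sketch below assumes the symmetric walk with steps $\pm 1$, for which $L(h) = \log\cosh h$; this choice of step distribution and all function names are assumptions of the example, not part of the construction above. It evaluates $L^*(x) = \sup_h(xh - L(h))$ by a ternary search on the concave objective and compares it with the closed form $\frac12\big((1+x)\log(1+x) + (1-x)\log(1-x)\big)$ obtained from the maximizer $h = \operatorname{artanh} x$.

```python
import math

def L(h):
    """Free energy of one step of the symmetric +/-1 walk: log cosh h."""
    return math.log(math.cosh(h))

def legendre_numeric(x, lo=-20.0, hi=20.0, iterations=200):
    """Ternary search for sup_h (x h - L(h)); the objective is concave in h."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if x * m1 - L(m1) < x * m2 - L(m2):
            lo = m1
        else:
            hi = m2
    h = 0.5 * (lo + hi)
    return x * h - L(h)

def legendre_closed_form(x):
    """Closed form for |x| < 1, obtained from the maximizer h = artanh(x)."""
    return 0.5 * ((1 + x) * math.log(1 + x) + (1 - x) * math.log(1 - x))

def rate_of_straight_line(b):
    """J(f) for f(t) = b t: the integrand L*(f'(t)) is the constant L*(b)."""
    return legendre_closed_form(b)

if __name__ == "__main__":
    for x in (0.0, 0.3, -0.6, 0.9):
        print(x, legendre_numeric(x), legendre_closed_form(x))
```

The search interval $[-20, 20]$ is adequate only while the maximizer $\operatorname{artanh} x$ lies inside it, that is for $|x|$ not too close to 1.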
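It turns out that the variational problem (A.1) can be solved explicitly. Namely, define the quantities $\hat h_0 = \hat h_0(q, b)$ and $\hat h_1 = \hat h_1(q, b)$ from the equations
$$\int_0^1 L'\big(\hat h_1 + y\hat h_0\big)\,dy = b, \qquad \int_0^1 y\,L'\big(\hat h_1 + y\hat h_0\big)\,dy = q. \tag{A.2}$$
Then the Wulff profile $\bar f(\cdot)$ is defined via ([6, Sect. 2])
$$\bar f(t) = \begin{cases}\dfrac{L\big(\hat h_1 + \hat h_0\big) - L\big(\hat h_1 + (1 - t)\hat h_0\big)}{\hat h_0}, & \text{if } \hat h_0 \ne 0,\\[2mm] L'(\hat h_1)\,t \equiv bt, & \text{otherwise.}\end{cases} \tag{A.3}$$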
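That the profile (A.3) meets the constraints of (A.1) can be checked numerically. The sketch below is a forward check under stated assumptions: it again takes the illustrative free energy $L(h) = \log\cosh h$, picks a pair $(\hat h_0, \hat h_1)$, computes $(b, q)$ from (A.2) by Simpson quadrature, and verifies $\bar f(0) = 0$, $\bar f(1) = b$, and $\int_0^1\bar f(t)\,dt = q$. It does not solve (A.2) for a prescribed $(q, b)$; the helper names and the quadrature are hypothetical choices made for the example.

```python
import math

def L(h):
    """Illustrative free energy of the symmetric +/-1 walk."""
    return math.log(math.cosh(h))

def L_prime(h):
    """Derivative of L."""
    return math.tanh(h)

def simpson(func, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3

def constraints(h0, h1):
    """Left-hand sides of (A.2): the pair (b, q) produced by (h0, h1)."""
    b = simpson(lambda y: L_prime(h1 + y * h0), 0.0, 1.0)
    q = simpson(lambda y: y * L_prime(h1 + y * h0), 0.0, 1.0)
    return b, q

def wulff_profile(t, h0, h1):
    """The profile (A.3), including the degenerate case h0 = 0."""
    if h0 != 0.0:
        return (L(h1 + h0) - L(h1 + (1 - t) * h0)) / h0
    return L_prime(h1) * t

if __name__ == "__main__":
    h0, h1 = 1.2, -0.4  # arbitrary sample parameters
    b, q = constraints(h0, h1)
    area = simpson(lambda t: wulff_profile(t, h0, h1), 0.0, 1.0)
    print("b =", b, " profile at 1 =", wulff_profile(1.0, h0, h1))
    print("q =", q, " area under profile =", area)
```

The agreement of the area with $q$ reflects an integration by parts: $\bar f'(t) = L'(\hat h_1 + (1-t)\hat h_0)$, so $\int_0^1\bar f\,dt = \int_0^1 (1-t)\bar f'(t)\,dt$, which is the second equation in (A.2) after the substitution $y = 1 - t$.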
Relations (A.2)–(A.3) have a simple geometric interpretation. Namely, rewriting (A.2) in the form (cf. [21, Theorem 3]) hˆ 0 = b, L hˆ 1 + hˆ 0 − L hˆ 1 Z hˆ 0 ˆ 1 L(h1 + hˆ 0 ) + L(hˆ 1 ) ˆ − L(h1 + y) dy = q − b/2 ˆ2 2 h0 0 we infer that these conditions prescribe to find two points A(hˆ 1 , L(hˆ 1 )) and O(hˆ 1 + hˆ 0 , L(hˆ 1 + hˆ 0 )) on the graph of the function L(·) such that (see Fig. 2,a)): 1) the straight line passing through O and A has the slope coefficient b; 2) the area Qb (h0 ) of the figure bounded by the segment OA and the arc of the graph of L(·) with the endpoints A and O equals (q − b/2)h0 2 , where h0 denotes the horizontal separation of the points A and O (in the case q < b/2 one should interchange these points). Then the Wulff proflie f¯(·)
R. Dobrushin, O. Hryniv
is obtained by a simple transformation (reflection + scaling) of the arc $OA$ (see Fig. 2,b)). In the critical case $2q = b$ the points $O$ and $A$ coincide and due to the second line in (A.3) the corresponding Wulff profile is reduced to the segment $O'A'$ (Fig. 2,b)). Due to the strict convexity and analyticity of the function $L(\cdot)$, the normalized area $Q_b(h_0)/h_0^2$ is an increasing function of $h_0$ and $Q_b(h_0)/h_0^2 \to 0$ as $h_0 \to 0$. In particular, the conditions $\hat h_0 = 0$ and $2q = b$ are equivalent (recall (A.3)). As a result, equations (A.2) have at most one solution. Such a solution clearly exists for every pair $(q, b)$ satisfying the condition⁹
$$|q - b/2| < \sup_h Q_b(h)/h^2. \tag{A.4}$$
Here the supremum corresponds to the most “upper” limiting position of the secant $OA$; thus, (A.4) means that the real secant should be below the limiting one (if such exists).
Acknowledgement. This work was partially supported by ISF grant No. M5E000. O. H. thanks the E. Schrödinger International Institute for Mathematical Physics (Vienna) for warm hospitality and FWF for financial support through L. Meitner Fellowship No. M00289-MAT during the preparation of the final version of this text. We also thank R. Kotecký for critical reading of the manuscript and useful remarks.
⁹ In some cases a solution of (A.2) exists also when both expressions in (A.4) are equal. But this depends on the distribution of the single step $\xi$, and we will not discuss this question here.
References
1. DeConinck, J., Dunlop, F., Rivasseau, V.: On the microscopic validity of the Wulff construction and of the generalized Young equation. Commun. Math. Phys. 121, 401–419 (1989)
2. DeConinck, J., Ruiz, J.: Fluctuations of interfaces and anisotropy. J. Phys. A: Math. Gen. 21, L147–153 (1988)
3. Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications. Boston-London: Jones & Bartlett Publishers, 1992
4. Deuschel, J.-D., Stroock, D.W.: Large Deviations. London: Academic Press, 1989
5. Dobrushin, R.: A statistical behaviour of shapes of boundaries of phases. In: Kotecký, R. (ed.) Phase Transitions: Mathematics, Physics, Biology.... Singapore: World Scientific, 1993, pp. 60–70
6. Dobrushin, R., Hryniv, O.: Fluctuations of Shapes of Large Areas under Paths of Random Walks. Probab. Theor. Relat. Fields 105, 423–458 (1996)
7. Dobrushin, R., Kotecký, R., Shlosman, S.: Wulff Construction: A Global Shape from Local Interaction. (Translations of Mathematical Monographs, 104) Providence, R.I.: Amer. Math. Soc., 1992
8. Dobrushin, R., Shlosman, S.: Large and Moderate Deviations in the Ising Model. In: Dobrushin, R. L. (ed.) Probability Contributions to Statistical Mechanics. (Advances in Soviet Mathematics, 20), Providence, R.I.: Amer. Math. Soc. 1994, pp. 91–220
9. Gallavotti, G.: The phase separation line in the two-dimensional Ising model. Commun. Math. Phys. 27, 103–136 (1972)
10. Gikhman, I.I., Skorokhod, A.V.: Introduction to the theory of random processes. Philadelphia: Saunders, 1969
11. Gnedenko, B.V.: The theory of probability. New York: Chelsea, 1962
12. Higuchi, Y.: On some Limit Theorems Related to the Phase Separation Line in the Two-dimensional Ising Model. Z. Wahrscheinlichkeitstheorie verw. Gebiete. 50, 287–315 (1979)
13. Hryniv, O.O. and Dobrushin, R.L.: On Fluctuations of the Wulff Shape in the 2D Ising Model. Uspekhi Mat. Nauk, 50(6), 177–178 (1995)
14. Föllmer, H. and Ort, M.: Large deviations and surface entropy for Markov fields. Asterisque, 157–158, 173–190 (1988)
15. Ioffe, D.: Large Deviations for the 2D Ising Model: A Lower Bound without Cluster Expansions. J. Stat. Phys. 74, 411–432 (1994)
16. Ioffe, D.: Exact large deviation bounds up to Tc for the Ising model in two dimensions. Probab. Theor. Relat. Fields 102, 313–330 (1995)
17. Kotecký, R., Preiss, D.: Cluster expansions for abstract polymer models. Commun. Math. Phys. 103, 491–498 (1986)
18. Malyshev, V. A., Minlos, R. A.: Gibbs random fields: cluster expansions. Dordrecht: Kluwer, 1991
19. Minlos, R.A. and Sinai, Ya.G.: The phenomenon of “phase separation” at low temperatures in some lattice models of a gas. I. Math. USSR-Sb. 2, 335–395 (1967)
20. Minlos, R.A. and Sinai, Ya.G.: The phenomenon of “phase separation” at low temperatures in some lattice models of a gas. II. Trans. Moscow Math. Soc. 19, 121–196 (1968)
21. Miracle-Sole, S., Ruiz, J.: On the Wulff Construction as a Problem of Equivalence of Statistical Ensembles. In: Fannes, M. et al (eds.) On Three Levels: Micro-, Meso-, and Macro-Approaches in Physics. New York: Plenum Press, 1994, pp. 295–302
22. Mogulskii, A.A.: Large deviations for trajectories of multidimensional random walks. Th. Prob. Appl. 21, 300–315 (1976)
23. Pfister, C.E.: Large deviations and phase separation in the two-dimensional Ising model. Helv. Phys. Acta. 64, 953–1054 (1991)
24. Rockafellar, R.: Convex analysis. Princeton, N.J.: Princeton Univ. Press, 1970
25. Schonmann, R.H.: Second Order Large Deviation Estimates for Ferromagnetic Systems in the Phase Coexistence Region. Commun. Math. Phys. 112, 409–422 (1987)
26. Taylor, J.E.: Unique structure of solutions to a class of nonelliptic variational problems. Proc. Symp. Pure Math. 27, 419–427 (1975)
27. Taylor, J.E.: Some crystalline variational techniques and results. Asterisque 154–155, 307–320 (1987)
Communicated by Ya.G. Sinai
Commun. Math. Phys. 189, 447 – 464 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997
Non-Perturbative Criteria for Gibbsian Uniqueness
K.S. Alexander¹,⋆, L. Chayes²
¹ Department of Mathematics, USC, Los Angeles, CA 90089-1113, USA. E-mail: [email protected]
² Department of Mathematics, UCLA, Los Angeles, CA 90095-1555, USA. E-mail: [email protected]
Received: 22 August 1996 / Accepted: 15 November 1996
This paper is dedicated to the memory of R. Dobrushin: A loss that cannot be replaced.
Abstract: For spin-systems with an internal symmetry, we provide sufficient conditions for unicity of the Gibbs state and/or complete analyticity by comparison to random cluster models.
Introductory Remarks
In the realm of statistical mechanics, under the subject headings high-temperature behavior, analyticity and uniqueness, the philosophical and mathematical contributions of R. Dobrushin will remain intact as long as the subject still exists. The usual approach to these questions consists of “expansion techniques” – high temperature expansions, cluster expansions, etc. These expansions have the advantage that they may be applied to virtually any (short-ranged) system; however, they suffer in that they are only functional for extreme values of parameters. As was often stressed by Dobrushin, a peculiar feature of these expansions is that while the formulation and resolution of problems within such a framework constitute definitive probabilistic statements, the intermediate steps do not. Concrete actions towards the repair of this deficiency were taken in [D2] where a not-cluster expansion was derived. Most of the usual high-temperature results can be obtained by this method (but unfortunately with the same sorts of restrictions) and in addition, certain new problems are suggested.
Carrying the probabilistic attitude to its extreme, we arrive at the other edge of the spectrum: Graphical representations in statistical mechanics. These are faithful representations of the problem at hand, leading to stochastic-geometric problems that are well defined for all values of parameters. Prominent examples include the random cluster [FK] and random current [Ai] representations. The above examples are successful
⋆ Work supported in part by the NSF under the grant DMS-95-04462 (K.A.)
K.S. Alexander, L. Chayes
in the sense that phase transitions are characterized by a geometric phase transition in the graphical representation ([ACCN] and [Ai], respectively).¹ The shortcomings of this approach are all too apparent: Such representations have only been found for a very few systems – each new result along these lines represents a separate challenge. The above cited applies, respectively, to the Potts ferromagnets and to Ising-type (Griffiths–Simon class) systems, period. The complete list (to date, to the authors’ knowledge) consists of the 2-component Widom-Rowlinson model [CCK, GLM], the cubic (generalized Ashkin–Teller) models and some models with first-order transitions [CM]. In this paper, we will pursue a hybrid approach: we will consider graphical representations for a “wider than usual” class of systems but sacrifice the “successfulness” clause usually associated with such representations. Let us address the specifics of these two points:
(i) The systems that we study consist of interacting spins taking values in a compact (or discrete) group. The group structure is respected by the Hamiltonian and by the single-spin measure – Haar measure. In other words, for a given spin, all spin states are a priori equivalent. Thus, we are well away from a statement concerning “all possible spin-systems.” However, we are by no means restricted to phase transitions that result from a breakdown of symmetry. For ease of exposition, we will further restrict to translation invariant nearest neighbor interactions on the $d$-dimensional hypercubic lattices. By and large, these latter restrictions are far less important. (Related results on non-translation invariant systems, e.g. “disordered” systems, will appear in a future paper.)
(ii) In a successful representation, the usual signal of a phase transition in the underlying spin-system is percolation in the graphical problem. In one form or another, this is the case in all the examples mentioned.
Here we will find situations where percolation in the graphical representation implies nothing in particular for the spin-system. On the other hand, the absence of percolation in these representations is strongly suggestive of high temperature behavior. Unfortunately, as of yet, these systems are too poorly understood to demonstrate that absence of percolation is, in fact, a sufficient criterion for uniqueness. Nevertheless, these representations can be compared with and coupled to other graphical models, e.g. the independent percolation models. When the comparison models fail to percolate, uniqueness and, under stronger conditions, complete analyticity can be established. Of course the use of “non-percolation” as a tool for establishing uniqueness or complete analyticity is hardly new. These ideas are implicitly in play when the cluster expansion is shown to converge and, e.g. in the original derivation of Dobrushin [D1]. Furthermore, the works of [vdBM] and [N] both use (absence of) percolation in a dominating measure to establish complete analyticity/uniqueness. However, as will be discussed to some extent at the end, the results here represent an improvement over the existing (general) sufficient conditions.
¹ A related approach, designed for the study of lattice models that approximate field theories, is given by the random walk expansions [BFSp, BFSo]. Although it may be that the full statistical mechanics model can be recovered from this expansion, it is difficult to conceive of explicit expressions, e.g. for the probability of cylinder sets in terms of the polymer weights. Nevertheless, it is presumably the case that some version of “percolation” in these expansions corresponds to the multiple phase regime in the lattice system. E.g. in finite volume, the dominant contribution to the two-point function could come from terms where the polymer fills a fraction of the available space. However, to the authors’ knowledge, such a statement has not appeared in the literature.
A Criterion for Uniqueness
Derivation of the Expansion
In what follows, we will consider only nearest neighbor interactions on $\mathbb{Z}^d$. The forthcoming is easily generalizable to any system with pair interactions (i.e. any graph) and, with some additional labor, to systems with multi-spin interactions including, e.g. lattice gauge theories. Let $G$ denote a compact group, let $h : G \times G \to \mathbb{R}$ denote a left-invariant function and consider the Hamiltonian described by the formal expression
$$H = \frac{1}{2} \sum_{\substack{i,j \in \mathbb{Z}^d \\ |i-j|=1}} h(s_i, s_j). \tag{1}$$
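As a brief aside, and assuming the standard meaning of left-invariance, namely $h(gs, gt) = h(s, t)$ for all $g \in G$, the pair energy depends on the two spins only through the single group element $s_i^{-1}s_j$. The sketch below spells this out; the Potts line is an illustrative special case with $G = \mathbb{Z}_q$ written additively and the maximum energy normalized to zero.

```latex
\begin{align*}
h(s_i, s_j) &= h\bigl(s_i^{-1} s_i,\; s_i^{-1} s_j\bigr)
   && \text{(left-invariance with } g = s_i^{-1}\text{)} \\
            &= h\bigl(e,\; s_i^{-1} s_j\bigr),
   && e = \text{identity of } G, \\
\text{Potts } (G = \mathbb{Z}_q):\quad
h(s_i, s_j) &= -\delta_{s_i, s_j} = -\delta_{0,\, s_j - s_i}.
\end{align*}
```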
Remarks and restrictions. (a) Here, the left-invariance of $h$ is the mechanism for assuring that the spin-states at a single site are a priori equivalent. (b) In the above formula, each neighboring pair is counted twice. We will get rid of this convention – and the $\frac{1}{2}$ – by asserting that $h$ is symmetric (which is physically reasonable). For future convenience – but of no physical significance – we will assume that each bond of the lattice has some fixed orientation and, for $|i - j| = 1$, use $\langle i, j\rangle$ as notation for the bond pointing from $i$ to $j$. (c) We will only consider (again on physical grounds) the cases where $h$ is continuous, or in the discrete cases, bounded. Without loss of generality, we will set the maximum value to zero.
Throughout this work, we will often consider graphical subsets of $\mathbb{Z}^d$; that is, collections of sites and some of the edges (or bonds) connecting nearest neighbor pairs. Although we will often be notationally cavalier regarding the distinction between the bonds and/or the sites of a graph and/or the graph itself, in all instances, the meaning should be clear from context. Let $\Lambda \subset \mathbb{Z}^d$ and let $|\Lambda|$ denote the number of sites in $\Lambda$. The boundary, $\partial\Lambda$, is here defined as the sites in $\mathbb{Z}^d \setminus \Lambda$ with a neighbor in $\Lambda$ and we will use $\bar\Lambda$ to denote $\Lambda \cup \partial\Lambda$. For $|\Lambda| < \infty$, and fixed spin configuration $s_{\partial\Lambda} \in G^{|\partial\Lambda|}$, the Hamiltonian is a well defined function of spin configuration $s_\Lambda \in G^{|\Lambda|}$, that can be inferred from Eq. (1) and will be denoted by $H(s_\Lambda \mid s_{\partial\Lambda})$. The partition function on $\Lambda$ at temperature $1/\beta$ with boundary condition $s_{\partial\Lambda}$ is given by
$$Z^{\Lambda, s_{\partial\Lambda}}_{H,\beta} = \int e^{-\beta H(s_\Lambda \mid s_{\partial\Lambda})}\, d^{|\Lambda|}s, \tag{2}$$
where $ds$ is normalized Haar measure. To avoid cumbersome expressions, we will use $\circledast$ as notation for both the generic lattice $\Lambda$ and the generic boundary condition $s_{\partial\Lambda}$; indeed, we will further extend the notation and allow $\circledast$ to stand for superpositions of boundary conditions, periodic boundary conditions, etc. As usual, the integrand in Eq. (2) defines the finite volume Gibbs measures on $G^{|\Lambda|}$; we denote these measures (or their densities) by $g^{\circledast}_{H,\beta}(-)$.
The derivation of the expansion follows closely the derivation of the random cluster representation in [FK]; a less compressed version (for the discrete cases) of what is to follow can be found in [CM]. For $t \in G$, let $E(t) = h(e, t)$ (where $e$ is the identity) and define $R_t = R_t(\beta) = e^{\beta|E(t)|} - 1$. The partition function admits the expression
$$Z^{\circledast}_{H,\beta} = \int d^{|\Lambda|}s \prod_{\langle i,j\rangle} \bigl(R_{s_i^{-1}s_j}(\beta) + 1\bigr), \tag{3}$$
where we assume that a single spin configuration provides the boundary condition on $\partial\Lambda$ or there are free boundary conditions on $\Lambda$. (Otherwise, a boundary spin integral would be required.) Let $B_\Lambda$ denote the set of bonds of $\Lambda$ – including those connecting $\Lambda$ with $\partial\Lambda$. Let $\omega \subset B_\Lambda$ and for $b \in B_\Lambda$, define $\omega_b = 1$ (or “occupied”) if $b \in \omega$ and $0$ (or “vacant”) if $b \notin \omega$. Expanding the product in Eq. (3), we may identify each term in the expansion with an $\omega \subset B_\Lambda$: $\omega_b$ is occupied if the “$R$” term is selected and is vacant otherwise. This defines a set of graphical weights:
$$W^{\circledast}_{H,\beta}(\omega) = \int d^{|\Lambda|}s \prod_{\langle i,j\rangle \in \omega} R_{s_i^{-1}s_j}(\beta) \tag{4}$$
which will be our principal tool. We will denote the corresponding finite volume graphical measures (measures on $\{0, 1\}^{B_\Lambda} \equiv \Omega_{B_\Lambda}$) by $\mu^{\circledast}_{H,\beta}(-)$ and we will refer to these as the grey measures. The configuration $\omega$ divides the lattice into connected components – isolated sites and clusters (components that contain bonds). We will denote the total number of clusters by $k(\omega)$ and, for future reference, the total number of components by $c(\omega)$. Obviously, the isolated sites can be integrated away which allows us to express the weights as a product over clusters:
$$W^{\circledast}_{H,\beta}(\omega) = \prod_{\ell=1}^{k(\omega)} \int d^{|K_\ell|}s \prod_{\langle i,j\rangle \in K_\ell} R_{s_i^{-1}s_j}(\beta), \tag{5}$$
where $K_\ell$ is the $\ell$th cluster of $\omega$ and $|K_\ell|$ denotes the number of sites within this cluster. Already Eq. (5) hints at a conditional independence for the behavior of spins residing in disjoint clusters; this matter will be discussed in greater depth after the following paragraph.
At this point it is worth pausing to make contact with the familiar random cluster representation for the Potts model. Here, $s \in \{1, \ldots, q\}$, the group structure is of no particular significance (we may take $G = \mathbb{Z}_q$) and $R_s(\beta) = e^{\beta} - 1$ if $s = e$ and is zero otherwise (which serves to define the Potts Hamiltonian). We will consider, for simplicity, the case of free boundary conditions on $\Lambda$. Examining Eq. (5) for this case, it is seen that the “integral” over any cluster vanishes unless all spins of the cluster are in the same state. In the $q$ cases where this happens, the result is a factor of $R_e^{\|K_\ell\|}(\beta)$, where $\|A\|$ denotes the number of bonds in the set $A$. Multiplying in the normalization constant of $1/q$ for the single-spin measure at each site we obtain the factor of $q R_e^{\|K_\ell\|}(\beta)\, q^{-|K_\ell|}$ for the cluster $K_\ell$. Now, multiplying the total weight for any configuration by an irrelevant factor of $q^{|\Lambda|}$, and using [components] = [clusters] + [isolated sites], the weight of the configuration $\omega$ is given by $q^{c(\omega)} R_e^{\|\omega\|}(\beta)$. This is equivalent to the usual random cluster weights (with free boundary conditions), i.e. $W^{FK;f}_{q,p} \propto p^{\|\omega\|}(1-p)^{[\|\Lambda\| - \|\omega\|]} q^{c(\omega)}$ with $R_e(\beta) = p/(1-p)$.
Although information about the spin-system is clearly lost in going to the grey representation, this can, in principle, be recovered or “built back”. Let $\omega \in \Omega_{B_\Lambda}$ denote a bond configuration and $s_\Lambda$ a spin configuration and assume, for simplicity, that the boundary condition has been provided by a single spin configuration. Consider the function
$$g^{\circledast}_{H,\beta}(s_\Lambda \mid \omega) = \frac{1}{W^{\circledast}_{H,\beta}(\omega)} \prod_{\langle i,j\rangle \in \omega} R_{s_i^{-1}s_j}. \tag{6}$$
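This is clearly positive and integrates to one. We claim that $g^{\circledast}_{H,\beta}(s_\Lambda \mid \omega)$ has the interpretation of the conditional Gibbs density given the configuration $\omega$. Indeed,
$$\sum_{\omega \in \Omega_{B_\Lambda}} \mu^{\circledast}_{H,\beta}(\omega)\, g^{\circledast}_{H,\beta}(s_\Lambda \mid \omega) = \frac{1}{Z^{\circledast}_{H,\beta}} \sum_{\omega} \prod_{\langle i,j\rangle \in \omega} R_{s_i^{-1}s_j} = \frac{1}{Z^{\circledast}_{H,\beta}} \prod_{\langle i,j\rangle \in B_\Lambda} \bigl(R_{s_i^{-1}s_j} + 1\bigr), \tag{7}$$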
which (cf. Eq. (3)) is exactly the Gibbsian probability density for the configuration $s_\Lambda$. Furthermore, a brief examination of Eq. (6) – written as a product over clusters as in Eq. (5) – clearly exhibits the conditional independence mentioned previously.
There are several ways to define percolation in the grey representation. The following is the least stringent definition in the sense that if the system does not satisfy the forthcoming criterion for percolation, it certainly cannot percolate by any other definition. Let $\Lambda \subset \mathbb{Z}^d$ with $|\Lambda| < \infty$ and $0 \in \Lambda$. Let $T_{0,\partial\Lambda}$ denote the event that the origin is connected to the boundary by a path of occupied bonds and define
$$P_\Lambda(\beta) = \max_{s_{\partial\Lambda}} \mu^{\Lambda, s_{\partial\Lambda}}_{H,\beta}(T_{0,\partial\Lambda}). \tag{8}$$
If $(\Lambda_k)$ is any sequence of boxes satisfying $\Lambda_{k+1} \supset \Lambda_k$ and $\Lambda_k \nearrow \mathbb{Z}^d$ it is not hard to see that
$$P_\infty(\beta) = \lim_{k\to\infty} P_{\Lambda_k}(\beta) \tag{9}$$
exists and is independent of the sequence $(\Lambda_k)$. We say that there is percolation if $P_\infty > 0$. There are a few circumstances where percolation in the grey representation is known to coincide with a phase transition in the spin-system. In particular, this is the case for the Potts models [ACCN], the cubic models (generalized Ashkin–Teller models) – for a certain region of parameters [CM] and, in some generality, systems with discontinuous transitions [CM]. But this is certainly not always the case. For example, it is possible to show that for the 4-state clock model on $\mathbb{Z}^2$, percolation in the grey representation occurs well above the critical temperature [C].
On a less ambitious tack, it seems that the absence of percolation in the grey measure should imply uniqueness of the limiting Gibbs measure. Along these lines, a considerably weaker statement was established in [CM]: if there is no percolation, then all Gibbs states are invariant under the action of $G$. To date, a full theorem to the effect that non-percolative behavior in a grey representation implies the uniqueness of the corresponding Gibbs measure has required the additional ingredient of a monotonicity property, e.g. the FKG property of the former. Although such monotonicity properties are plausible under some general condition of “ferromagnetism” of the Hamiltonian, the FKG property has only been established in a handful of cases. Notwithstanding the lack of monotonicity, some progress is possible when the graphical measure is dominated by a (non-percolating) measure that does have the FKG property. This is exactly the strategy that was used for the case considered in [N] and is operating implicitly in the derivation of [vdBM]. It is therefore worthwhile to consider comparison inequalities between the grey measures and other graphical problems such as the FK random cluster model. Via such comparisons, non-perturbative statements about high temperature behavior are possible.
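The expansion of Eqs. (3)–(5) can be checked by brute force on a very small graph. The sketch below assumes a finite cyclic group $\mathbb{Z}_q$ (so that normalized Haar measure is the uniform average over spin values), free boundary conditions, and a triangle as the graph; the function names, the graph and the sample energies are illustrative choices and are not taken from the text. It compares the sum of the grey weights with the partition function and, for Potts energies, compares $q^{|\Lambda|} W(\omega)$ with $q^{c(\omega)} R_e^{\|\omega\|}$.

```python
import itertools
import math


def bond_factors(energies, beta):
    """R_t = exp(beta * |E(t)|) - 1 for t in Z_q, where energies[t] = E(t) <= 0."""
    return [math.exp(beta * abs(e)) - 1.0 for e in energies]


def grey_weight(omega, n_sites, q, R):
    """W(omega) as in Eq. (4): uniform average over spins of prod_{<i,j> in omega} R."""
    total = 0.0
    for s in itertools.product(range(q), repeat=n_sites):
        term = 1.0
        for i, j in omega:
            term *= R[(s[j] - s[i]) % q]   # additive notation for s_i^{-1} s_j
        total += term
    return total / q ** n_sites


def partition_function(bonds, n_sites, q, R):
    """Z as in Eq. (3), free boundary conditions."""
    total = 0.0
    for s in itertools.product(range(q), repeat=n_sites):
        term = 1.0
        for i, j in bonds:
            term *= R[(s[j] - s[i]) % q] + 1.0
        total += term
    return total / q ** n_sites


def all_bond_configs(bonds):
    """Every subset omega of the bond set."""
    for k in range(len(bonds) + 1):
        yield from itertools.combinations(bonds, k)


def count_components(omega, n_sites):
    """c(omega): connected components, isolated sites included (union-find)."""
    parent = list(range(n_sites))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in omega:
        parent[find(i)] = find(j)
    return len({find(x) for x in range(n_sites)})


# Hypothetical test case: a triangle with free boundary conditions.
BONDS = [(0, 1), (1, 2), (0, 2)]
N_SITES = 3
BETA = 0.7

# 4-state clock-type energies, symmetric under t -> -t, maximum equal to zero.
clock_R = bond_factors([-2.0, -1.0, 0.0, -1.0], BETA)
Z = partition_function(BONDS, N_SITES, 4, clock_R)
sum_W = sum(grey_weight(w, N_SITES, 4, clock_R) for w in all_bond_configs(BONDS))

# 3-state Potts energies: E(e) = -1 and zero otherwise.
potts_R = bond_factors([-1.0, 0.0, 0.0], BETA)
potts_pairs = [
    (3 ** N_SITES * grey_weight(w, N_SITES, 3, potts_R),
     3 ** count_components(w, N_SITES) * potts_R[0] ** len(w))
    for w in all_bond_configs(BONDS)
]
```

The first comparison is just the expansion of the product in Eq. (3); the second reproduces, configuration by configuration, the Potts weights discussed after Eq. (5).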
A Comparison Inequality
For finite $\Lambda \subset \mathbb{Z}^d$, $p \in (0, 1)$ and $q > 0$, let $\nu^{FK;\circledast}_{q,p}(-)$ denote the random cluster measures with boundary conditions (appropriate to a random cluster model) as specified in $\circledast$ (cf. the description in the statement of Proposition 1). For $h(s_1, s_2)$ of the form described in the remarks and restrictions following Eq. (1), we define $E_0 = \min_{s_1,s_2} h(s_1, s_2)$, the quantity
$$R_0 = R_0(\beta) = e^{\beta|E_0|} - 1 \equiv \max_{t \in G} R_t(\beta) \tag{10}$$
and
$$R = R(\beta) = \int ds\, R_s(\beta). \tag{11}$$
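For a finite group, where normalized Haar measure is the uniform average, the quantities in Eqs. (10) and (11) reduce to a maximum and a mean. The sketch below is a minimal illustration under that assumption; the function name `grey_parameters`, the variable `R_avg` (standing for the $R$ of Eq. (11)) and the sample inputs are hypothetical. It also returns the combinations $P = R_0/(1 + R_0)$ and $Q = R_0/R$ that enter Proposition 1.

```python
import math


def grey_parameters(energies, beta):
    """Return (R0, R_avg, P, Q) for a finite group with uniform single-spin measure.

    energies[t] = E(t) = h(e, t) <= 0, with the maximum value set to zero;
    beta > 0 and at least one strictly negative energy are assumed.
    """
    R = [math.exp(beta * abs(e)) - 1.0 for e in energies]
    R0 = max(R)                  # Eq. (10): maximum of R_t over the group
    R_avg = sum(R) / len(R)      # Eq. (11): Haar integral as a uniform average
    P = R0 / (1.0 + R0)          # equals 1 - exp(-beta * |E0|)
    Q = R0 / R_avg
    return R0, R_avg, P, Q


# Hypothetical input: q-state Potts energies, E(e) = -1 and zero otherwise.
beta = 0.9
q = 5
R0, R_avg, P, Q = grey_parameters([-1.0] + [0.0] * (q - 1), beta)
```

For Potts energies this gives $Q = q$ and $P = 1 - e^{-\beta}$, in line with the remark in the text that the comparison is an identity for the Potts models.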
The following domination bound is elementary:
Proposition 1. For a finite lattice, consider the random cluster measures $\nu^{FK;\circledast}_{q,p}(-)$, where $\circledast$ indicates a boundary condition in which various subsets of the boundary are considered to be “preconnected” (i.e. they act as a single site) and the rest are left free. (This includes free, wired and periodic.) For $H$ of the type that has been described, let $\mu^{\circledast}_{H,\beta}(-)$ denote the grey graphical measures with the same boundary condition. Then
$$\mu^{\circledast}_{H,\beta}(-) \underset{\mathrm{FKG}}{\leq} \nu^{FK;\circledast}_{Q,P}(-),$$
where $P = R_0/(1 + R_0) \equiv 1 - e^{-\beta|E_0|}$ and $Q = R_0/R$.
Proof. For boundary conditions of the type stated, the weights of the random cluster measure have the expression
$$\nu^{FK;\circledast}_{q,p}(-) \propto \Bigl[\frac{p}{1-p}\Bigr]^{\|\omega\|} q^{c^{\circledast}(\omega)}, \tag{12}$$
where $c^{\circledast}(\omega)$ counts the number of connected components according to the rules specified by the boundary conditions in $\circledast$. As is well known, these are FKG measures. For convenience, let us express these measures in “loop form”: Let $\ell^{\circledast}(\omega)$ denote the minimum number of bonds in $\omega$ that must be removed until what remains is a tree. (As is the case for the number of components, this depends on boundary conditions.) Using $c^{\circledast}(\omega) = \ell^{\circledast}(\omega) - \|\omega\| + \text{constant}$, we get
$$\nu^{FK;\circledast}_{q,p}(-) \propto \Bigl[\frac{p}{q(1-p)}\Bigr]^{\|\omega\|} q^{\ell^{\circledast}(\omega)}. \tag{13}$$
To establish our claim, we show that the weights $W^{\circledast}_{H,\beta}(\omega)$ may be expressed in the form
$$W^{\circledast}_{H,\beta}(\omega) \propto \bigl[\nu^{FK;\circledast}_{Q,P}(\omega)\bigr] F(\omega) \propto \Bigl[R^{\|\omega\|} \Bigl(\frac{R_0}{R}\Bigr)^{\ell^{\circledast}(\omega)}\Bigr] F(\omega), \tag{14}$$
where $F$ is a decreasing function. Defining $F(\omega)$ to be the ratio of the right-hand side of Eq. (5) to the quantity $R^{\|\omega\|}[R_0/R]^{\ell^{\circledast}(\omega)}$, let $\omega \subset B_\Lambda$ and $b \in B_\Lambda \setminus \omega$. Consider $F(\omega)$ versus $F(\omega \vee b)$: The bond $b$ either joins two components of $\omega$ or closes a loop in $\omega$. We claim that in the former case, $W^{\circledast}_{H,\beta}(\omega \vee b) = R\,W^{\circledast}_{H,\beta}(\omega)$ (and hence $F(\omega \vee b) = F(\omega)$). To see this, consider an integration over one component as in the right-hand side of Eq. (5):
$$\tilde W^{\circledast}_{H,\beta}(C) \equiv \int \prod_{\langle i,j\rangle \in C} R_{s_i^{-1}s_j}\, d^{|C|}s, \tag{15}$$
where here one of the “sites” may include a boundary component and if $C$ is a single site, the integrand is taken to be unity. For any site $a \in C$, if the integration is performed so that $s_a$ is integrated last, it is seen that the final integrand is a constant independent of $s_a$. Indeed, this follows directly from the invariance of Haar measure: For fixed value of $s_a$, let us compare this last integrand with its value at $g s_a$, $g \in G$. Noting that the $s_a$ dependence always comes in the form $s_a^{-1} s_j$, we perform the other integrations after the change of variables $s_j \to g s_j$, $j \neq a$. The result, after the first $|C| - 1$ integrations, is thus manifestly independent of $g$ so indeed the final integrand is a constant. Now consider two disjoint components, $C_x$ and $C_y$ of the configuration $\omega$ and suppose that $b$ joins an $x \in C_x$ with a $y \in C_y$. It follows immediately that $\tilde W^{\circledast}_{H,\beta}(C_x \cup C_y \cup b) = R\,\tilde W^{\circledast}_{H,\beta}(C_x)\,\tilde W^{\circledast}_{H,\beta}(C_y)$ because here, saving the $s_x$ and $s_y$ integrations for last, we get $\tilde W^{\circledast}_{H,\beta}(C_x)\,\tilde W^{\circledast}_{H,\beta}(C_y) \times \int ds_x\, ds_y\, R_{s_x^{-1}s_y}$ by the previous argument.
In the case where $b$ closes a loop in $\omega$, the derivation is simple: If $b$ joins $x$ to $y$, as an upper bound we replace the new factor of $R_{s_x^{-1}s_y}$ that appears in the integrand defining the weight for $\omega \vee b$ with $R_0$. This results in $F(\omega \vee b) \leq F(\omega)$.
Remark. Following the same derivation, it is easily shown that for virtually any boundary condition in the grey system, we get the above sort of dominations if we compare to the random cluster measure with wired boundary conditions. In particular (and of particular importance) are the $\circledast$’s that come from a fixed spin configuration at the boundary. Here the argument is identical if the “new bond” is not connected to the boundary. If the new bond attaches a previously isolated cluster to the boundary, the derivation is the same as when two isolated clusters are joined: the grey weight gets multiplied by $R$ and the number of loops has not changed.
Finally, if the new bond joins two clusters that are already attached to the boundary, then, by the definition according to wired boundary conditions, the number of loops increases by one, and, as in the previous loop case, the new weight factor does not exceed $R_0$.
Let us also observe that these dominations are identities for the Potts models and therefore expected to be fairly sharp for models that are “close” to the Potts models: systems with a significant energy gap and relatively few low-lying states that occur only when the spins are nearly aligned. As an example, suppose the spins take values on the unit circle and are parameterised by $\theta$, $0 \leq \theta \leq 2\pi$ and, using additive notation, a pair interaction given by $V(\theta_i - \theta_j)$. If $V(\theta) = -(1 + \cos\theta)$, this is the usual XY model; let us consider the case where $V(\theta) = -1$ if $|\theta| < \varepsilon$ and is zero otherwise. Here we have $Q = 1/\varepsilon$ and $P = 1 - e^{-\beta}$. If $\varepsilon \ll 1$ then, using the well known results for the Potts model (cf. the discussion before Theorem 3) we can show uniqueness – and exponential decay of correlations – down to temperatures $\beta^{-1}$ satisfying $e^{\beta} \geq \text{const.}(\varepsilon^{-1/d})$.
An FKG Decomposition
To implement our strategy, we need some standard terminology from percolation theory. In what follows, we will focus on a fixed $U \subset \Lambda$. Consider a minimal set of bonds that separates $U$ from $\partial\Lambda$. Such an object is often better envisioned on the dual lattice and will be referred to as a separating surface. If $C$ is such a surface, we will denote the interior graph – sites and bonds with both endpoints that are inside $C$ – by $I(C)$. Similarly, the
exterior graph will be denoted by $E(C)$. Since $|\Lambda| < \infty$, we may consider the entire collection $C_1, \ldots, C_N$ of such separating surfaces. We let $\mathcal{C}_j \subset \Omega_{B_\Lambda}$ denote the event that all the bonds in $C_j$ are vacant. Finally, for future reference, let us observe that the surfaces $\{C_j\}$ have a natural partial order by containment of interiors. Let $\mathbb{C}_j \subset \mathcal{C}_j$ denote the event
$$\mathbb{C}_j = \{\omega \in \mathcal{C}_j \mid C_j \text{ is the outermost vacant surface separating } U \text{ from } \partial\Lambda\}. \tag{16}$$
We remark on two standard features of these $\mathbb{C}_j$: First, any configuration in which $U$ is disconnected from $\partial\Lambda$ belongs to a unique $\mathbb{C}_j$ and second, the event $\mathbb{C}_j$ is determined exclusively by the bonds in $C_j \cup E(C_j)$. Since, for $H$ of the type described, the weight factor $W^{\circledast}_{H,\beta}$ is given by a product over clusters, it follows that restriction of the conditional graphical measure to the set $U$, $\mu^{\circledast}_{H,\beta}(- \mid \mathbb{C}_j)|_U$, is identical to (the restriction of) the measure on $I(C_j)$ with free boundary conditions on $C_j$:
$$\mu^{\circledast}_{H,\beta}(- \mid \mathbb{C}_j)|_U = \mu^{I(C_j),f}_{H,\beta}(-)|_U. \tag{17}$$
Indeed, the above holds for the restrictions to $I(C_j)$. Furthermore, considering the Gibbsian (built back) viewpoint, it is clear that under the condition $\mathbb{C}_j$, the spins on the inside of $C_j$ have the same distribution as the spin-system with free boundary conditions on $C_j$. Explicitly,
$$g^{\circledast}_{H,\beta}(- \mid \mathbb{C}_j)|_{I(C_j)} = g^{I(C_j),f}_{H,\beta}(-), \tag{18}$$
where $g^{\circledast}_{H,\beta}(- \mid \mathbb{C}_j) \equiv \sum_{\omega} \mu^{\circledast}_{H,\beta}(\omega \mid \mathbb{C}_j)\, g^{\circledast}_{H,\beta}(- \mid \omega)$.
Next, let us define a version of FKG dominance that is slightly stronger than usual. Suppose that $\nu$ and $\mu$ are probability measures on some finite $\Omega_\Sigma = \{0, 1\}^{\Sigma}$ and that $\nu$ FKG-dominates $\mu$ in the usual sense and in addition, for any $\Gamma \subset \Sigma$ and any configuration $\eta_\Gamma$ on $\Gamma$, then for all $\omega_\Gamma \prec \eta_\Gamma$, we have $\mu(- \mid \omega_\Gamma) \underset{\mathrm{FKG}}{\leq} \nu(- \mid \eta_\Gamma)$. Then we will call such a FKG relationship extended FKG dominance and express this relationship by the symbol $\underset{\mathrm{FKG}e}{\geq}$ (with the dominating measure on the left). We remark in passing that in the above definition and in the decomposition that will follow below, there is no requirement that either measure have the FKG property in its own right. Of importance in the present context is the fact that for systems of the type described, if $\circledast$ is any boundary condition in the spin system, the corresponding grey measure satisfies
$$\nu^{FK;\Lambda,w}_{Q,P}(-) \underset{\mathrm{FKG}e}{\geq} \mu^{\circledast}_{H,\beta}(-), \tag{19}$$
where $P$ and $Q$ are as described in Proposition 1. First, if $\omega_\Gamma \in \Omega_\Gamma$, we claim that $\nu^{FK;\Lambda,w}_{Q,P}(- \mid \omega_\Gamma) \underset{\mathrm{FKG}}{\geq} \mu^{\circledast}_{H,\beta}(- \mid \omega_\Gamma)$. Indeed, the vacant bonds of $\omega_\Gamma$ are accounted for by considering a graph with these edges deleted. The occupied clusters of $\omega_\Gamma$ that are detached from the boundary constitute (part of) a boundary condition on $\Lambda \setminus \Gamma$ of the type described in Proposition 1, while the bonds in $\omega_\Gamma$ that are attached to $\partial\Lambda$ are equivalent to a superposition of spin-state boundary conditions on this portion of $\partial(\Lambda \setminus \Gamma)$. But then, if $\eta_\Gamma \succ \omega_\Gamma$, we automatically have $\nu^{FK;\Lambda,w}_{Q,P}(- \mid \eta_\Gamma) \underset{\mathrm{FKG}}{\geq} \nu^{FK;\Lambda,w}_{Q,P}(- \mid \omega_\Gamma)$ (and hence $\nu^{FK;\Lambda,w}_{Q,P}(- \mid \eta_\Gamma) \underset{\mathrm{FKG}}{\geq} \mu^{\circledast}_{H,\beta}(- \mid \omega_\Gamma)$) by the strong FKG property of the random cluster measures with $Q \geq 1$.
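Whenever $\nu$ and $\mu$ are measures on an $\Omega_\Sigma$ with $\nu \underset{\mathrm{FKG}e}{\geq} \mu$ (that is, $\nu$ dominates $\mu$ in the extended FKG sense), we claim that $\mu$ admits a family of decompositions that are analogous to decompositions into conditional measures for cylinder events. However here the measure $\nu$ influences the nature of the “conditional measures” and completely determines the coefficients of the decomposition. We will illustrate with the simplest example: Let $\Gamma \subset \Sigma$. Then we claim that $\mu$ may be expressed as
$$\mu(-) = \sum_{\eta_\Gamma \in \Omega_\Gamma} \nu(\eta_\Gamma)\, \mu_{\eta_\Gamma}(-), \tag{20}$$
where each $\mu_{\eta_\Gamma}(-)$ is a probability measure which itself is a convex sum of measures obtained by conditioning on configurations $\omega_\Gamma$ that are below $\eta_\Gamma$:
$$\mu_{\eta_\Gamma}(-) = \sum_{\omega_\Gamma : \omega_\Gamma \prec \eta_\Gamma} \lambda_{\eta_\Gamma}(\omega_\Gamma)\, \mu(- \mid \omega_\Gamma) \tag{21}$$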
with $0 \leq \lambda_{\eta_\Gamma}(\omega_\Gamma) \leq 1$ and $\sum_{\omega_\Gamma} \lambda_{\eta_\Gamma}(\omega_\Gamma) = 1$. Similar decompositions occur for a wider variety of partitioning events. In the general case, the conditioning events are not always situated on the same set; indeed $\Gamma$ may be random but should be constructed via a growth algorithm. Of immediate importance is when $\Gamma$ coincides with $C_j \cup E(C_j)$ whenever the event of Eq. (16), that $C_j$ is the outermost vacant separating surface, occurs in the $\eta$-configuration. One can formulate these decompositions in terms of “couplings” as follows: The pair $(\omega_\Gamma, \eta_\Gamma)$ are constrained in such a way that $\omega_\Gamma$ always lies below $\eta_\Gamma$ and the conditional distribution of $\omega_\Lambda$ depends only on $\omega_\Gamma$. The measures $\mu_{\eta_\Gamma}$ and $\lambda_{\eta_\Gamma}$ can then be interpreted as the conditional distributions given $\eta_\Gamma$. Expressed in this language, the decomposition is not dissimilar to the couplings used in [vdBM] and in [N]. A precise statement of the generalization and a proof of the existence of such decompositions will be provided in the Appendix. On this basis, we have the following:
Proposition 2. Let $\Lambda \subset \mathbb{Z}^d$, $|\Lambda| < \infty$, $U \subset \Lambda$, and let $H$ be a Hamiltonian of the type described in Eq. (1) and the remarks that follow. Let $\mu^{\circledast}_{H,\beta}(-)$ denote a grey measure with $\circledast$ denoting boundary conditions coming from a single spin configuration or combinations thereof and let $\nu^{FK;\Lambda,w}_{Q,P}(-)$ denote the random cluster measure with wired boundary conditions on $\partial\Lambda$ and with parameters $Q = Q(H, \beta)$ and $P = P(H, \beta)$ as described in Proposition 1. Then $\mu^{\circledast}_{H,\beta}(-)$ admits the decomposition
$$\mu^{\circledast}_{H,\beta}(-) = \sum_{j=0}^{N} \nu^{FK;\Lambda,w}_{Q,P}(\mathbb{C}_j)\, \mu^{\circledast}_{\mathbb{C}_j;H,\beta}(-),$$
where for $j = 1, \ldots, N$, $\mathbb{C}_j$ is the event of Eq. (16) and
$$\mu^{\circledast}_{\mathbb{C}_j;H,\beta}(-)|_{I(C_j)} = \mu^{I(C_j),f}_{H,\beta}(-),$$
and the measure $\mu^{\circledast}_{\mathbb{C}_0;H,\beta}(-)$ (with $\mathbb{C}_0$ denoting the configurations where there is a connection between $U$ and $\partial\Lambda$) is a certain combination of $\mu^{\circledast}$’s that have been conditioned on cylinder events defined outside of $U$.
Proof. This is an immediate consequence of Corollary II to Proposition A.1. (Cf. also the discussion following the proof of Corollary II).
Principal Results

Our principal results will follow after a few definitions and remarks pertaining to the random cluster models: Consider the random cluster models on $\mathbb{Z}^d$ with parameters q ≥ 1 and p ∈ (0, 1). Define $P_\infty(q, p)$ as in Eq. (9) – here the optimizing boundary condition is known to be the wired boundary condition. Let $p_c(q)$ denote the percolation threshold:
$$p_c(q) = \inf\{p \mid P_\infty(q, p) > 0\}. \tag{22}$$
For the sequence of hypercubes $\Lambda_L$ of side L centered at the origin, let $w_c(q)$ be defined as the supremum of the set of p's for which the estimate $P_{\Lambda_L}(q, p) \le D_1 e^{-D_2 L}$ holds uniformly in L for some $D_1 < \infty$ and $D_2 > 0$.

Remark. Obviously $p_c(q) \ge w_c(q)$. An additional notion of a transition point may be defined: Assume for simplicity that q ≥ 1 and let $\tau_n(p, q)$ denote the probability, in the limiting wired measure, that the origin and the point (n, 0, ..., 0) are in the same connected cluster. Standard subadditive arguments show that
$$\frac{1}{\xi} = -\lim_{n\to\infty} \frac{\log \tau_n}{n} \tag{23}$$
exists. The point $\pi_c(q)$ is defined by $\pi_c(q) = \sup\{p \mid \xi(q, p) < \infty\}$. By straightforward arguments, $p_c(q) \ge \pi_c(q) \ge w_c(q)$. It is widely believed that for all q and in all dimensions, $p_c(q) = \pi_c(q) = w_c(q)$ and that in d = 2, the unique transition point is located at the self dual point $p_D(q) = \sqrt{q}/(1 + \sqrt{q})$ (which, a priori, lies in $[\pi_c(q), p_c(q)]$). For q = 1, these issues have all been settled starting with [K] and ending with [AB, MMS]. For q = 2 (starting with [O] and ending with [ABF]) these problems have also been solved. For $q \gg 1$, a variety of techniques can be brought into play: Expansion techniques [LMMsRS] (for general q) or reflection positivity [KS, CM] (for integer q) can be used to show that $p_c(q) = \pi_c(q)$ and that in d = 2, these coincide with $p_D(q)$. The stronger result $p_c(q) = w_c(q)$ is established (for large q and general dimension) in [vEFSS]. Using different methods, exclusive to two dimensions, for $q \gtrsim 25.9$ the result $p_c(q) = w_c(q) = p_D(q)$ is proved in [Al1 and G]. For d = 2 and integers (of relevance) between 3 and 25, it can be shown that $w_c(q) \ge p_D(q - 1)$ [Al2]. For d ≥ 3 and moderately large q, it is known that $w_c(q) \ge [(q - 1) - (q - 1)^{(d-1)/d}]/[q - 2]$ [Al2] which agrees, to lowest non-trivial order, with the large q expansion for $p_c(q)$ in [LMMsRS].

Theorem 3. Consider a spin-system with Hamiltonian H at inverse temperature β as described in Eq. (1) and in the paragraph that follows and let Q and P denote the quantities defined in Proposition 1. Then if $P < p_c(Q)$, there is a unique limiting measure for the spin-system and if $P < w_c(Q)$, the spin-system has weak mixing.

Proof. Let $\Lambda \subset \mathbb{Z}^d$, $U \subset \Lambda$ and consider the spin system with two boundary conditions on $\partial\Lambda$, one of them denoted by ⊗. Let $\mathrm{Var}_U(g_{H,\beta}, g^{\otimes}_{H,\beta})$ denote the variational distance between the two measures:
$$\mathrm{Var}_U(g_{H,\beta}, g^{\otimes}_{H,\beta}) = \frac{1}{2} \int d^{|U|}s\, \bigl|g_{H,\beta}(s_U) - g^{\otimes}_{H,\beta}(s_U)\bigr|. \tag{24}$$
Finally, let $T_{U,\partial\Lambda} \subset B_\Lambda$ denote the event that there is a connection between U and $\partial\Lambda$. (Note that $T_{U,\partial\Lambda} = C_0$.) We claim that $\mathrm{Var}_U(g_{H,\beta}, g^{\otimes}_{H,\beta})$ is bounded above by $2\nu^{FK;\Lambda,w}_{Q,P}(T_{U,\partial\Lambda})$.
Let us start things off with the following elementary consequence of the decomposition described in Proposition 2: For each $\omega \in B_\Lambda$, let $J_U(\omega)$ denote those bonds in the connected component of U. If $\zeta \subset B_\Lambda$ is a set of bonds that are connected to U (and therefore a candidate to be $J_U$) we claim that
$$\mu^{\otimes}_{H,\beta}(J_U = \zeta) - \mu_{H,\beta}(J_U = \zeta) = \nu^{FK;\Lambda,w}_{Q,P}(T_{U,\partial\Lambda})\,\bigl[\mu^{\otimes}_{C_0;H,\beta}(J_U = \zeta) - \mu_{C_0;H,\beta}(J_U = \zeta)\bigr]. \tag{25}$$
Indeed, for j = 1, ..., N, if ζ pokes through $C_j$, then both $\mu^{\otimes}_{C_j;H,\beta}(J_U = \zeta)$ and $\mu_{C_j;H,\beta}(J_U = \zeta)$ are zero because these measures insist that all the bonds of $C_j$ are vacant. On the other hand, if $C_j \cap \zeta = \emptyset$, the event $J_U = \zeta$ is determined in $I(C_j)$ where these measures agree. Thus, the only surviving term in the difference of the two decompositions from Proposition 2 is the zeroth which is the right-hand side of Eq. (25).

In what follows, we will label a generic ζ with a subscripted G (for good) if no bond of ζ touches $\partial\Lambda$ and with a subscripted B otherwise. Recall, for $\omega \in B_\Lambda$, the objects $g_{H,\beta}(s_\Lambda \mid \omega)$ or the similarly defined $g_{H,\beta}(s_U \mid \omega)$ obtained by integrating out the spins in $\Lambda \setminus U$. It is clear that $g_{H,\beta}(s_U \mid \omega)$ depends only on $J_U(\omega)$ so we may write $g_{H,\beta}(s_U \mid J_U(\omega))$ for these conditional densities. However, if $J_U(\omega) = \zeta_G$ for some “good” $\zeta_G$ then the density is independent of the boundary condition and we will write $g_{H,\beta}(s_U \mid \zeta_G)$. We thus have
$$\begin{aligned}
g_{H,\beta}(s_U) &= \sum_{\omega} \mu_{H,\beta}(\omega)\,g_{H,\beta}(s_U \mid \omega) \\
&= \sum_{\zeta} \mu_{H,\beta}(J_U(\omega) = \zeta)\,g_{H,\beta}(s_U \mid \zeta) \\
&= \sum_{\zeta_G} \mu_{H,\beta}(J_U(\omega) = \zeta_G)\,g_{H,\beta}(s_U \mid \zeta_G) + \sum_{\zeta_B} \mu_{H,\beta}(J_U(\omega) = \zeta_B)\,g_{H,\beta}(s_U \mid \zeta_B).
\end{aligned} \tag{26}$$
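Obviously there is a similar expression for the ⊗ boundary condition. The variational distance may now be estimated:
$$\begin{aligned}
2\,\mathrm{Var}_U(g_{H,\beta}, g^{\otimes}_{H,\beta}) \le{}& \int d^{|U|}s \sum_{\zeta_G} g_{H,\beta}(s_U \mid \zeta_G)\,\bigl|\mu_{H,\beta}(J_U(\omega) = \zeta_G) - \mu^{\otimes}_{H,\beta}(J_U(\omega) = \zeta_G)\bigr| \\
&+ \int d^{|U|}s \Bigl[\sum_{\zeta_B} \mu_{H,\beta}(J_U(\omega) = \zeta_B)\,g_{H,\beta}(s_U \mid \zeta_B) + \sum_{\zeta_B} \mu^{\otimes}_{H,\beta}(J_U(\omega) = \zeta_B)\,g^{\otimes}_{H,\beta}(s_U \mid \zeta_B)\Bigr].
\end{aligned} \tag{27}$$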
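In both terms, we may now simply integrate the spins away – these are probability densities. The remains of the first term can be estimated using Eq. (25):
$$\begin{aligned}
\sum_{\zeta_G} \bigl|\mu_{H,\beta}(J_U(\omega) = \zeta_G) - \mu^{\otimes}_{H,\beta}(J_U(\omega) = \zeta_G)\bigr| &\le \nu^{FK;\Lambda,w}_{Q,P}(T_{U,\partial\Lambda}) \sum_{\zeta} \bigl[\mu_{C_0;H,\beta}(J_U(\omega) = \zeta) + \mu^{\otimes}_{C_0;H,\beta}(J_U(\omega) = \zeta)\bigr] \\
&= 2\nu^{FK;\Lambda,w}_{Q,P}(T_{U,\partial\Lambda}).
\end{aligned} \tag{28}$$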
Meanwhile, the second term is just $\mu_{H,\beta}(T_{U,\partial\Lambda}) + \mu^{\otimes}_{H,\beta}(T_{U,\partial\Lambda})$ which by the (basic) domination is also bounded by $2\nu^{FK;\Lambda,w}_{Q,P}(T_{U,\partial\Lambda})$. We have established the claim that followed Eq. (24).

The theorem is now easily proved: if $P < p_c(Q)$ then for fixed U, $\nu^{FK;\Lambda,w}_{Q,P}(T_{U,\partial\Lambda})$ vanishes as $\Lambda \nearrow \mathbb{Z}^d$ while if $P < w_c(Q)$, it is easily shown that there are positive constants $D_3$ and $D_4$ that are independent of U and Λ, such that
$$\nu^{FK;\Lambda,w}_{Q,P}(T_{U,\partial\Lambda}) \le \sum_{x \in \partial U,\; y \in \partial\Lambda} D_3 e^{-D_4 |x - y|} \tag{29}$$
which implies weak mixing.
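The mechanism behind the bound on the variational distance can be illustrated with a toy computation. The sketch below is not the proof: it assumes a hypothetical three-point spin space and two densities that share a common "good" component `h` and differ only in a component of weight `t`, with `t` standing in for the probability of the connection event. All names are illustrative.

```python
def variational_distance(g1, g2):
    """Half the L1 distance between two discrete densities, in the spirit of Eq. (24)."""
    keys = set(g1) | set(g2)
    return 0.5 * sum(abs(g1.get(s, 0.0) - g2.get(s, 0.0)) for s in keys)


def mix(t, shared, special):
    """Return the density (1 - t) * shared + t * special."""
    keys = set(shared) | set(special)
    return {s: (1 - t) * shared.get(s, 0.0) + t * special.get(s, 0.0) for s in keys}


# Hypothetical data: h is the part on which both boundary conditions agree.
h = {"a": 0.5, "b": 0.3, "c": 0.2}
bad_1 = {"a": 1.0}   # contribution felt only under the first boundary condition
bad_2 = {"c": 1.0}   # contribution felt only under the second boundary condition
t = 0.1              # weight of the event on which the two measures may differ

g_1 = mix(t, h, bad_1)
g_2 = mix(t, h, bad_2)
distance = variational_distance(g_1, g_2)
```

In this toy setting the shared component cancels exactly, so the distance is `t` times the distance between `bad_1` and `bad_2`, and hence at most `t`.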
Corollary. In two dimensions, for the above systems with a discrete spin-space, the condition $P < w_c(Q)$ implies that the interaction βH has the restricted complete analyticity property.

Proof. This is an application of [MOS] where it was established that for two dimensional discrete spin-systems, weak mixing implies “strong mixing for squares” ≡ restricted complete analyticity.

For d > 2, the following is of interest:

Theorem 4. For discrete spin-systems of the type described in the statement of Theorem 3, if $P < p_c(1)$, the interaction βH is completely analytic.

Proof. We will use condition IIIc of [DS] which for present purposes may be read as follows: Let Λ and U denote sets as described earlier and let $s_{\partial\Lambda}$ and $s'_{\partial\Lambda}$ denote two boundary conditions that differ only at a single site $y \in \partial\Lambda$. Then complete analyticity follows if for all y and for any such $s_{\partial\Lambda}$ and $s'_{\partial\Lambda}$,
$$\mathrm{Var}_U\bigl(g^{\Lambda,s_{\partial\Lambda}}_{H,\beta},\, g^{\Lambda,s'_{\partial\Lambda}}_{H,\beta}\bigr) \le \sum_{x \in \partial U} D_5 e^{-D_6 |x - y|} \tag{30}$$
with $D_5$ and $D_6$ positive and independent of U, Λ and the boundary conditions $s_{\partial\Lambda}$ and $s'_{\partial\Lambda}$. The strategy is identical to that used in the proof of Theorem 3 except that here our partitioning events, $\mathcal{A}_1, \dots, \mathcal{A}_N$, feature the surfaces $A_1, \dots, A_N$ that separate U from y. Indeed, let $\mathcal{A}_j$ denote the event that all the bonds in $A_j$ are vacant and let $I(A_j)$ denote the region that can be reached by a path inside Λ that starts from U and does not use any bond in $A_j$. Then $\mu^{\Lambda,s_{\partial\Lambda}}_{H,\beta}(-\mid \mathcal{A}_j)\big|_{I(A_j)}$ is identical to the grey measure with the boundary condition provided by the configuration $s_{\partial\Lambda}$ restricted to $\partial I(A_j) \cap \partial\Lambda$ and free boundary conditions on $A_j$. It is evident that this is the same as the restriction of the similar conditional measure with $s_{\partial\Lambda}$ replaced by $s'_{\partial\Lambda}$. With this observation in mind, the derivation is now identical to the one in the previous theorem with the result
$$\mathrm{Var}_U\bigl(g^{\Lambda,s_{\partial\Lambda}}_{H,\beta},\, g^{\Lambda,s'_{\partial\Lambda}}_{H,\beta}\bigr) \le 2\nu^{FK;\Lambda,w}_{Q,P}(T_{U,y}), \tag{31}$$
where $T_{U,y}$ is defined similarly to the previous T's. By the well known domination inequalities, we may replace Q by 1 in which case the stipulation “wired” is meaningless. Finally we have $\nu^{\Lambda}_{1,P}(T_{U,y}) \le \sum_{x \in U} \nu^{\Lambda}_{1,P}(T_{x,y}) \le \sum_{x \in U} e^{-|x - y|/\xi(P)}$, where ξ, the correlation length, is positive for $P < p_c(1)$. Thus, with $D_5 = 2$ and $D_6 = 1/\xi(P)$, complete analyticity is established.
Brief Comparison to Other Methods

Actual (general) conditions under which expansions converge are often “nearly existential” in their statement and then easily proved at high temperature or low activity. One exception is [D2] where a long string of equations leading back to Eq. (2.8) allows us to calculate $P < c/2d \Longrightarrow$ CA (complete analyticity) where c ≈ .082. Of course orthodox enthusiasts will argue that this condition is not optimal. In [KP], one of the better bounds is obtained. Here it is required that $e^{-\tau} < k/2d$ with k ≈ .206, where $e^{-\tau|\gamma|}$ provides a bound on the weight of the “contour functional” for contours γ of length |γ|. In simple cases, it can be shown that $e^{-\tau} \sim R_0 \sim P$ (if $P \ll 1$) hence this is similar to the above mentioned with the constant improved by a factor of 2–3. By contrast, substituting the mean-field bound $p_c(1) > \lambda^{-1}(d) > 1/(2d - 1)$ with λ(d) the connectivity constant of the lattice, we obtain $P < 1/(2d - 1) \Longrightarrow$ CA. This represents a substantial improvement if $d \gg 1$ and, even more so in moderate dimensions because λ and/or $p_c(1)$ have improved estimates.

The second general method involves the calculation of variational norms. The most prominent example is the original result of [D1] which, in the present context, reads $\rho_D < 1/2d \Longrightarrow$ CA. (See [DS].) Here $\rho_D$ is the maximum variational distance between two single site measures whose neighbors differ at a single site. The more recent result of [vdBM] provides $\rho_{BM} < s_c(d)$, where $s_c(d)$ is the site percolation threshold and $\rho_{BM}$ is similar to $\rho_D$, but here the boundary conditions are allowed to be different at any or all the neighboring sites. Thus $\rho_D \le \rho_{BM}$. For highly frustrated systems, it is argued [vdBM] that $\rho_D \approx \rho_{BM}$ and hence the advantage of $s_c$ (where $s_c(d) \ge 1/(2d - 1)$) versus 1/2d. However, for ferromagnetic-type systems, the impact of the full neighborhood is felt and $\rho_{BM}$ is significantly larger than $\rho_D$. (If $\beta \ll 1$ it is larger by a factor of 2d.)

Thus the Dobrushin condition is usually better with the main advantage of [vdBM] coming in d = 2 (where $s_c = .59$ may be accepted on faith). Here, if we make certain uncontrolled approximations, we self consistently arrive at $\rho_D \approx HP$ with H a number in the range of 2–5. However, it is difficult to really tell: These variational norms are again easy to estimate as “small” for extreme values of parameters but, in practical situations, they are very difficult to work with, especially for the derivation of general conditions.

In this work, the two-dimensional systems divide into two cases: $Q \gg 1$ and Q of the order of one. In the latter case, it often happens that Q ≥ 2 and we may compare directly to the Ising case which gives RCA (restricted complete analyticity) for $P < \sqrt{2}/(1 + \sqrt{2})$. On the other hand, for large Q (thanks to [vEFSS]) our condition for uniqueness is the same as that of RCA which reads $RR_0 < 1 \Longrightarrow$ RCA. Thus, in the example following the proof of Proposition 1 (discretized for convenience) the Corollary to Theorem 3 implies RCA down to temperatures satisfying e^β < 1 + ( 2n )^{1/2} (which is correct to within constants [C]). An unadorned cluster expansion will not pick up the dependence and an actual variational calculation may even produce the wrong direction of the dependence (as in [CKS] for another large q model). Now it may be possible that a better variational calculation picks up the correct trend with . But for large q and no extra symmetry (as in the Potts models) the number of calculations really required is unmanageable. At this moment of writing, we believe that the most tangible asset of these methods is the intrinsic simplicity of the required calculations.
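The numerical conditions quoted in this comparison can be tabulated side by side. The following is a small arithmetic sketch, using only the constants stated above (c ≈ .082, k ≈ .206, the mean-field bound 1/(2d − 1), and the self dual point); the function names are illustrative, and the [KP] entry is a bound on $e^{-\tau}$ rather than on P itself.

```python
import math


def self_dual_point(q):
    """p_D(q) = sqrt(q) / (1 + sqrt(q)), the self dual point in d = 2."""
    return math.sqrt(q) / (1 + math.sqrt(q))


def high_temperature_bounds(d):
    """Right-hand sides of the three sufficient conditions for dimension d."""
    return {
        "D2": 0.082 / (2 * d),          # P < c/2d with c ~ .082
        "KP": 0.206 / (2 * d),          # e^{-tau} < k/2d with k ~ .206
        "mean_field": 1 / (2 * d - 1),  # P < 1/(2d - 1)
    }


for d in (2, 3, 4):
    bounds = high_temperature_bounds(d)
    print(d, {name: round(value, 4) for name, value in bounds.items()})

# Ising comparison value used for Q >= 2 in two dimensions.
ising_threshold = self_dual_point(2)
```

For every d ≥ 2 the three thresholds are ordered as D2 < KP < mean_field, which is the sense in which the percolation-based condition is the least restrictive of the three.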
Appendix

Here we establish the decomposition formulas discussed in the text. Let us start off with the one bond case for measures with the usual sort of FKG domination:

Proposition A.1. Consider measures ν(−) and µ(−) defined on some finite $\{0, 1\}^{\Sigma}$ that satisfy $\nu \underset{\mathrm{FKG}}{\le} \mu$. Assume, without loss of generality, that ∀b ∈ Σ, ν(b = 1) ≠ 0. Then for any b ∈ Σ, we may write
$$\mu(-) = \sum_{\eta_b = 0,1} \nu(\eta_b)\,\mu_{\eta_b}(-),$$
where $\mu_{\eta_b = 0}(-) = \mu(-\mid \omega_b = 0)$ and $\mu_{\eta_b = 1}(-)$ is a convex combination of $\mu(-\mid \omega_b = 1)$ and $\mu(-\mid \omega_b = 0)$.

Proof. We write, tentatively,
$$\mu(-) = \nu(\eta_b = 0)\,\mu(-\mid \omega_b = 0) + \nu(\eta_b = 1)\,\bigl[\lambda\,\mu(-\mid \omega_b = 1) + (1 - \lambda)\,\mu(-\mid \omega_b = 0)\bigr] \tag{A.1}$$
and attempt to solve for λ. This is accomplished by directly expanding µ in terms of its conditional measures and equating coefficients. The result is
$$\lambda = \frac{\mu(\omega_b = 1)}{\nu(\omega_b = 1)}; \tag{A.2}$$
this provides a sensible solution since $\nu \underset{\mathrm{FKG}}{\le} \mu$ implies λ ≤ 1.
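The step of equating coefficients can be written out explicitly. The following is a short worked version under the notation of (A.1), with the two occupation probabilities written as $\mu(\omega_b = 1)$ and $\nu(\eta_b = 1)$; it adds nothing beyond the computation the proof describes.

```latex
\begin{align*}
\text{Expansion of } \mu:\quad
  \mu(-) &= \mu(\omega_b = 1)\,\mu(-\mid\omega_b = 1) + \mu(\omega_b = 0)\,\mu(-\mid\omega_b = 0), \\
\text{(A.1) regrouped}:\quad
  \mu(-) &= \nu(\eta_b = 1)\,\lambda\,\mu(-\mid\omega_b = 1)
           + \bigl[\nu(\eta_b = 0) + \nu(\eta_b = 1)(1 - \lambda)\bigr]\,\mu(-\mid\omega_b = 0). \\
\text{Coefficient of } \mu(-\mid\omega_b = 1):\quad
  \nu(\eta_b = 1)\,\lambda &= \mu(\omega_b = 1)
  \;\Longrightarrow\; \lambda = \frac{\mu(\omega_b = 1)}{\nu(\eta_b = 1)}, \\
\text{Coefficient of } \mu(-\mid\omega_b = 0):\quad
  \nu(\eta_b = 0) + \nu(\eta_b = 1) - \nu(\eta_b = 1)\,\lambda
  &= 1 - \mu(\omega_b = 1) = \mu(\omega_b = 0).
\end{align*}
```

The second coefficient is therefore matched automatically once λ is fixed by the first, and λ ≤ 1 is exactly the statement that the bond b is occupied under ν at least as often as under µ, which is where the domination hypothesis enters.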
Remark. If it happens that ν(b = 1) = 0 the above decomposition is still valid (and trivial) if the formulas are properly interpreted – indeed, all the ill-defined measures in the decomposition appear with zero coefficient. Hereafter, we will assume this interpretation and omit the provisos analogous to ν(b = 1) ≠ 0.

An immediate corollary is the “fixed Γ” decomposition described in the text:

Corollary I. Let Σ, µ, ν denote the quantities described above but now let us assume that $\nu(-) \underset{\mathrm{FKGe}}{\le} \mu(-)$. Then for any Γ ⊂ Σ, we may write
$$\mu(-) = \sum_{\eta_\Gamma} \nu(\eta_\Gamma)\,\mu_{\eta_\Gamma}(-),$$
where the sum runs over all $\eta_\Gamma \in \{0, 1\}^{\Gamma}$ and where the $\mu_{\eta_\Gamma}(-)$ are convex combinations of the µ measure conditioned on configurations $\omega_\Gamma \in \{0, 1\}^{\Gamma}$ that lie below $\eta_\Gamma$:
$$\mu_{\eta_\Gamma}(-) = \sum_{\omega_\Gamma:\,\omega_\Gamma \prec \eta_\Gamma} \lambda_{\eta_\Gamma}(\omega_\Gamma)\,\mu(-\mid \omega_\Gamma),$$
$0 \le \lambda_{\eta_\Gamma} \le 1$ and $\sum_{\omega_\Gamma} \lambda_{\eta_\Gamma}(\omega_\Gamma) = 1$.
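Proof. Suppose that such a measure can be constructed for any Γ ⊂ Σ with k elements, let Γ denote one such example and let Γ′ = Γ ∪ b with b ∈ Σ \ Γ. We write the full expansion for µ(−) with the further expansion of $\mu(-\mid \omega_\Gamma)$ into the two possibilities for $\omega_b$:
$$\mu(-) = \sum_{\eta_\Gamma} \nu(\eta_\Gamma) \sum_{\omega_\Gamma \prec \eta_\Gamma} \lambda_{\eta_\Gamma}(\omega_\Gamma)\,\bigl[\mu(\omega_b = 1 \mid \omega_\Gamma)\,\mu(-\mid \omega_\Gamma, \omega_b = 1) + \mu(\omega_b = 0 \mid \omega_\Gamma)\,\mu(-\mid \omega_\Gamma, \omega_b = 0)\bigr]. \tag{A.3}$$
Now we wish to write
$$\begin{aligned}
\mu(-) = \sum_{\eta_\Gamma} \Bigl[\,&\nu(\eta_\Gamma, \eta_b = 1) \sum_{\omega_\Gamma \prec \eta_\Gamma} \lambda_{\eta_\Gamma, \eta_b = 1}(\omega_\Gamma, \omega_b = 1)\,\mu(-\mid \omega_\Gamma, \omega_b = 1) \\
&+ \nu(\eta_\Gamma, \eta_b = 1) \sum_{\omega_\Gamma \prec \eta_\Gamma} \lambda_{\eta_\Gamma, \eta_b = 1}(\omega_\Gamma, \omega_b = 0)\,\mu(-\mid \omega_\Gamma, \omega_b = 0) \\
&+ \nu(\eta_\Gamma, \eta_b = 0) \sum_{\omega_\Gamma \prec \eta_\Gamma} \lambda_{\eta_\Gamma, \eta_b = 0}(\omega_\Gamma, \omega_b = 0)\,\mu(-\mid \omega_\Gamma, \omega_b = 0)\Bigr].
\end{aligned}$$
As we will demonstrate, this can be (non-uniquely) accomplished by simply equating coefficients. First off, we are forced to take
$$\lambda_{\eta_\Gamma, \eta_b = 1}(\omega_\Gamma, \omega_b = 1) = \frac{\mu(\omega_b = 1 \mid \omega_\Gamma)}{\nu(\eta_b = 1 \mid \eta_\Gamma)}\,\lambda_{\eta_\Gamma}(\omega_\Gamma) \tag{A.4a}$$
which lies in [0, 1] by the inductive assumption (for $\lambda_{\eta_\Gamma}(\omega_\Gamma)$) and the extended FKG dominance. As for the remainder, there is still a great deal of leeway. A natural choice is
$$\lambda_{\eta_\Gamma, \eta_b = 0}(\omega_\Gamma, \omega_b = 0) = \lambda_{\eta_\Gamma}(\omega_\Gamma), \tag{A.4b}$$
which leaves us with
$$\lambda_{\eta_\Gamma, \eta_b = 1}(\omega_\Gamma, \omega_b = 0) = \Bigl[1 - \frac{\mu(\omega_b = 1 \mid \omega_\Gamma)}{\nu(\eta_b = 1 \mid \eta_\Gamma)}\Bigr]\,\lambda_{\eta_\Gamma}(\omega_\Gamma). \tag{A.4c}$$
This provides the desired decomposition.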
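The inductive construction can be checked numerically on a small example. The sketch below is an illustration only: it assumes that both measures are Bernoulli product measures on three bonds (so the required domination holds trivially when the ν-density is the larger one), processes the bonds in a fixed order, and applies the update rules (A.4a) to (A.4c) with the ν conditional probability as the normalising denominator. The helper names `bernoulli`, `marginal` and `build_lambda` are hypothetical.

```python
from itertools import product


def bernoulli(p, n):
    """Product measure on {0,1}^n with occupation density p (illustrative choice)."""
    return {c: p ** sum(c) * (1 - p) ** (n - sum(c)) for c in product((0, 1), repeat=n)}


def marginal(measure, k):
    """Marginal of a measure on {0,1}^n on its first k coordinates."""
    out = {}
    for conf, weight in measure.items():
        out[conf[:k]] = out.get(conf[:k], 0.0) + weight
    return out


def build_lambda(mu, nu, n):
    """Coefficients lam[(eta, omega)] built bond by bond via (A.4a)-(A.4c)."""
    lam = {((), ()): 1.0}
    for k in range(n):
        mu_k, mu_k1 = marginal(mu, k), marginal(mu, k + 1)
        nu_k, nu_k1 = marginal(nu, k), marginal(nu, k + 1)
        new = {}
        for (eta, omega), value in lam.items():
            m1 = mu_k1[omega + (1,)] / mu_k[omega]  # mu(omega_b = 1 | omega)
            n1 = nu_k1[eta + (1,)] / nu_k[eta]      # nu(eta_b = 1 | eta)
            new[(eta + (1,), omega + (1,))] = value * m1 / n1        # (A.4a)
            new[(eta + (0,), omega + (0,))] = value                  # (A.4b)
            new[(eta + (1,), omega + (0,))] = value * (1 - m1 / n1)  # (A.4c)
        lam = new
    return lam


mu = bernoulli(0.3, 3)
nu = bernoulli(0.6, 3)
lam = build_lambda(mu, nu, 3)

# The decomposition is equivalent to: sum_eta nu(eta) * lam_eta(omega) = mu(omega).
worst_error = max(
    abs(sum(nu[eta] * lam.get((eta, omega), 0.0) for eta in nu) - mu[omega])
    for omega in mu
)
```

Only pairs with ω lying below η receive a coefficient, and for each η the coefficients sum to one, so each $\mu_{\eta_\Gamma}$ is a convex combination as the corollary requires.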
Remark. It is reemphasized that the above constructed $\mu_{\eta_\Gamma}(-)$ is by no means the only possibility. Indeed, the order in which the bonds of Γ are “processed” appears to affect the details of the outcome. We also remark that all of the above results (and those which follow) can be derived from a technically weaker condition than extended dominance. Indeed, all that is needed is the corresponding inequality for single site occupations. However, it is hard to imagine a system that satisfies the weaker condition without enjoying the stronger property.

The generalizations to situations of the sort needed for this work follow from the consideration of partitioning events that can be defined via a growth algorithm. Let us start with an informal description: The algorithm starts with some (predetermined) bond $b_0$. The bond is “checked” to see if $\eta_{b_0}$ is occupied or vacant. Depending on the outcome, the algorithm picks a new bond $b_1(\eta_{b_0})$. This new bond is checked and, depending on $\eta_{b_0}$ and $\eta_{b_1}$, a new bond $b_2$ is determined and so forth. The procedure continues until a stopping condition is fulfilled (which depending on the algorithm could even happen on the first step). The algorithm is defined so that any possible choice of outcomes will eventually lead to a stopping condition – this partitions the configuration space. Let us illustrate this procedure with the “fixed set” rule as featured in the above corollary. The bonds of Γ are deterministically ordered, $b_0, b_1, \dots, b_N$ and the algorithm dictates that after $b_{k-1}$ has been checked, go to the bond $b_k$, k = 1, 2, ..., N; after the bond $b_N$ has been checked, stop. The formal definition in the general case is as follows:

Definition. Let $\{0, 1\}^{\Sigma}$ be a finite space and consider a growth algorithm $\Phi = (\Phi_1, \dots, \Phi_M; b_0)$ defined as follows: The $\Phi_k$ are functions with values in Σ ∪ [stop] and the domains are particular configurations on particular subsets Γ of Σ. In general, if $\eta_\Gamma$ is in the domain of $\Phi_k$ and $\Phi_k(\eta_\Gamma) \ne$ [stop], then $\Phi_k(\eta_\Gamma) \in \Sigma \setminus \Gamma$. Thus all possible “alive” sets at the k-th stage are of size k + 1. Let $\Xi_k = \{(\Gamma_\alpha, \eta_{\Gamma_\alpha})\}$ denote the collection of all possible alive sets and configurations at the k-th stage. These may be generated as follows:
$$\begin{aligned}
\Xi_0 &= \{(b_0, \eta_{b_0})\}, \\
\Xi_{k+1} &= \{(\Gamma, \eta_\Gamma) \mid \Gamma = \Gamma' \cup \Phi_{k+1}(\eta_{\Gamma'});\ \Phi_{k+1}(\eta_{\Gamma'}) \ne [\text{stop}];\ \eta_\Gamma = (\eta_{\Gamma'}, \eta_{\Phi_{k+1}(\eta_{\Gamma'})});\ (\Gamma', \eta_{\Gamma'}) \in \Xi_k\}.
\end{aligned} \tag{A.5}$$
It is required that $\Phi_{k+1}$ be defined on all $\eta_\Gamma$ with $(\Gamma, \eta_\Gamma) \in \Xi_k$. Finally, if relevant, when $k = \|\Gamma\| - 1$ so the first component of each $\Xi_{\|\Gamma\|-1}$ is all of Σ, it is required that $\Phi_{\|\Gamma\|-1}(\eta_\Sigma) \equiv$ [stop]. It is evident that such an algorithm partitions $\{0, 1\}^{\Sigma}$ into disjoint events: $\{0, 1\}^{\Sigma} = \cup_y K_y$ with $K_y \cap K_{y'} = \emptyset$ if $y \ne y'$. The $K_y$ are defined as cylinder events: $K_y = \{\eta_{\Gamma_\alpha};\ (\Gamma_\alpha, \eta_{\Gamma_\alpha}) \in \Xi_k \text{ for some } k,\ \Phi_{k+1}(\eta_{\Gamma_\alpha}) = [\text{stop}]\}$. With this in mind, the generalization used in the text is another corollary.

Corollary II. Let Σ, µ, ν be defined as in Corollary I and let Φ denote a growth algorithm as defined above. Then µ can be decomposed:
$$\mu(-) = \sum_{\alpha} \nu(\eta_{\Gamma_\alpha})\,\mu_{\eta_{\Gamma_\alpha}}(-),$$
where $\eta_{\Gamma_\alpha}$ denote the partitioning cylinder events of the algorithm and where
$$\mu_{\eta_{\Gamma_\alpha}}(-) = \sum_{\omega_{\Gamma_\alpha} \prec \eta_{\Gamma_\alpha}} \lambda_{\eta_{\Gamma_\alpha}}(\omega_{\Gamma_\alpha})\,\mu(-\mid \omega_{\Gamma_\alpha}),$$
with the $\lambda_{\eta_{\Gamma_\alpha}}$ denoting convex coefficients.

Proof. Fix n and suppose that the corollary is true for any algorithm that always stops by the n-th stage. (That is, the cardinality of every “alive” set is at most n.) This holds if n = 1 by Proposition A.1 and now the method of Corollary I readily establishes Corollary II by induction on n.

Example. For the situation discussed in Proposition 2, the growth algorithm is straightforward: The bonds of Λ are deterministically ordered and $b_0$ is defined to be the lowest bond that is touching the boundary of Λ. The connected (occupied) cluster of $b_0$ is explored using the ordering to determine the sequence of bonds checked until this cluster hits the boundary of U or is fully explored (i.e. cut off by vacant bonds). In the former case, we get a [stop] and in the latter case, we uncover the lowest unexplored bond touching the boundary of Λ and repeat. If this procedure exhausts all the bonds touching the boundary of Λ and never gets to U, then there is evidently no occupied connection between $\partial\Lambda$ and $\partial U$ and we get a [stop] when the cluster of the last bond touching $\partial\Lambda$ has been fully explored. In the latter cases, the exposed configuration defines the outermost separating surface $C_j$, j = 1, ..., N. For conceptual convenience, we may take all the $\eta_\Gamma$'s that produce the same event $C_j$ and combine the resulting $\mu_{\eta_\Gamma}$'s into $\mu_{C_j}(-)$, j = 1, ..., N. What remains – where the connected component of $\partial\Lambda$ succeeded in reaching U – defines the event $C_0$ and the measure $\mu_{C_0}(-)$. In a similar fashion, one can define a growth algorithm for the decomposition used in Theorem 4.
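The way a growth algorithm partitions the configuration space into stopping cylinders can be made concrete with a small enumeration. The sketch below is an assumption-laden toy, not the cluster-exploration rule of the Example: bonds are the integers 0, ..., n − 1, the hypothetical rule `first_vacant_rule` checks them in order and stops at the first vacant bond, and `cylinders` enumerates the resulting stopping configurations.

```python
from itertools import product

STOP = None  # marker playing the role of [stop]


def cylinders(phi, b0):
    """Enumerate the stopping cylinders {bond: state} of a growth algorithm phi."""
    finished = []
    alive = [{b0: state} for state in (0, 1)]
    while alive:
        eta = alive.pop()
        next_bond = phi(eta)
        if next_bond is STOP:
            finished.append(eta)
        else:
            assert next_bond not in eta  # a new bond must lie outside the alive set
            alive.extend({**eta, next_bond: state} for state in (0, 1))
    return finished


def first_vacant_rule(n):
    """Toy rule: check bonds 0, 1, ... in order; stop at the first vacant one."""
    def phi(eta):
        last = max(eta)
        if eta[last] == 0 or last == n - 1:
            return STOP
        return last + 1
    return phi


def compatible(full, cylinder):
    """True if the full configuration belongs to the cylinder event."""
    return all(full[bond] == state for bond, state in cylinder.items())


n = 4
events = cylinders(first_vacant_rule(n), 0)
counts = [
    sum(compatible(full, cyl) for cyl in events)
    for full in product((0, 1), repeat=n)
]
```

Every full configuration is compatible with exactly one stopping cylinder, which is the partition property used in Corollary II; the cluster exploration described in the Example follows the same pattern with a more elaborate choice of next bond.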
References

[Ai] Aizenman, M.: Geometric Analysis of $\Phi^4$ Fields and Ising Models. Parts I and II. Commun. Math. Phys. 86, 1–48 (1982)
[Al1] Alexander, K.S.: On Weak Mixing in Lattice Models. Preprint (1996)
[Al2] Alexander, K.S.: Unpublished
[AB] Aizenman, M., Barsky, D.J.: Sharpness of the Phase Transition in Percolation Models. Commun. Math. Phys. 108, 489–526 (1987)
[ABF] Aizenman, M., Barsky, D., Fernandez, R.: The Phase Transition in a General Class of Ising–Type Models is Sharp. J. Stat. Phys. 47, 343–374 (1987)
[ACCN] Aizenman, M., Chayes, J.T., Chayes, L., Newman, C.M.: Discontinuity of the Magnetization in One–Dimensional $1/|x - y|^2$ Ising and Potts Models. J. Stat. Phys. 50, 1–40 (1988)
[BFSo] Brydges, D., Fröhlich, J., Sokal, A.D.: The Random Walk Representation of Classical Spin-Systems and Correlation Inequalities. II. The Skeleton Inequalities. Commun. Math. Phys. 91, 117–139 (1983)
[BFSp] Brydges, D., Fröhlich, J., Spencer, T.: The Random Walk Representation of Classical Spin-Systems and Correlation Inequalities. Commun. Math. Phys. 83, 123–150 (1982)
[vdBM] van den Berg, J., Maes, C.: Disagreement Percolation in the Study of Markov Fields. Ann. Prob. 22, 749–763 (1994)
[C] Chayes, L.: Unpublished, Available on request: [email protected] (1996)
[CCK] Chayes, J.T., Chayes, L., Kotecký, R.: The Analysis of the Widom-Rowlinson Model by Stochastic Geometric Methods. Commun. Math. Phys. 172, 551–569 (1995)
[CKS] Chayes, L., Kotecký, R., Shlosman, S.B.: Aggregation and Intermediate Phases in Dilute Spin Systems. Commun. Math. Phys. 171, 203–232 (1995)
[CM] Chayes, L., Machta, J.: Graphical representations and Cluster Algorithms Part I: Discrete Spin Systems. Preprint (1996)
[D1] Dobrushin, R.: The Description of a Random Field by Means of Conditional Probabilities and Conditions of its Regularity. Theor. Prob. Appl. 13, 197–224 (1968)
[D2] Dobrushin, R.: Induction on Volume and no Cluster Expansion. VIIth International Congress on Mathematical Physics, M. Mebkhout and R. Sénéor, eds., Singapore: World Scientific, 1987
[DS] Dobrushin, R., Shlosman, S.B.: Completely Analytical Interactions: Constructive Description. J. Stat. Phys. 46, 983–1014 (1987)
[vEFSS] van Enter, A., Fernandez, R., Schonmann, R.H., Shlosman, S.B.: On complete Analyticity of the 2D Potts Model. Preprint (1996)
[FK] Fortuin, C.M., Kasteleyn, P.W.: On the Random Cluster Model I. Introduction and Relation to Other Models. Physica 57, 536–564 (1972)
[G] Grimmett, G.R.: Percolation and Disordered Systems. Unpublished manuscript (1996)
[GLM] Giacomin, G., Lebowitz, J.L., Maes, C.: Agreement Percolation and Phase Coexistence in Some Gibbs Systems. J. Stat. Phys. 22, 1379–1403 (1995)
[K] Kesten, H.: The Critical Probability of Bond Percolation on the Square Lattice Equals 1/2. Commun. Math. Phys. 74, 41–59 (1980)
[KP] Kotecký, R., Preiss, D.: Cluster Expansion for Abstract Polymer Models. Commun. Math. Phys. 103, 491–498 (1986)
[KS] Kotecký, R., Shlosman, S.B.: First-Order Phase Transitions in Large Entropy Lattice Models. Commun. Math. Phys. 83, 493–515 (1982)
[LMMsRS] Laanait, L., Messager, A., Miracle–Sole, S., Ruiz, J., Shlosman, S.: Interfaces in the Potts Model I: Pirogov–Sinai Theory of the Fortuin–Kasteleyn Representation. Commun. Math. Phys. 140, 81–91 (1991)
[MOS] Martinelli, F., Olivieri, E., Schonmann, R.H.: For 2–d Lattice Spin Systems, Weak Mixing Implies Strong Mixing. Commun. Math. Phys. 165, 33–47 (1994)
[MMS] Menshikov, M.V., Molchanov, S.A., Sidovenko, S.A.: Percolation Theory and some Applications. Itogi Nauki i Techniki (Series of Probability Theory, Mathematical Statistics and Theoretical Cybernetics) 24, 53–110 (1986)
[N] Newman, C.M.: Disordered Ising Systems and Random Cluster Representations. Probability and Phase Transitions (G. Grimmett ed.), NATO ASI Series No. 420, Dordrecht: Kluwer, 1994
[O] Onsager, L.: Crystal Statistics. I. A Two–Dimensional Model with an Order–Disorder Transition. Phys. Rev. 65, 117–149 (1944)

Communicated by J.L. Lebowitz
Commun. Math. Phys. 189, 465 – 480 (1997)
Communications in Mathematical Physics
© Springer-Verlag 1997

Decay of Correlations in Random-Cluster Models

G.R. Grimmett, M.S.T. Piza

Statistical Laboratory, University of Cambridge, 16 Mill Lane, Cambridge CB2 1SB, United Kingdom

Received: 25 September 1996 / Accepted: 21 February 1997

In memory of Roland L. Dobrushin

Abstract: We prove exponential decay for the tail of the radius R of the cluster at the origin, for subcritical random-cluster models, under an assumption slightly weaker than that $E(R^{d-1}) < \infty$ (here, d is the number of dimensions). Specifically, if $E(R^{d-1}) < \infty$ throughout the subcritical phase, then $P(R \ge n) \le \exp(-\alpha n)$ for some α > 0. This implies the exponential decay of the two-point correlation function of subcritical Potts models, subject to a hypothesis of (at least) polynomial decay of this function. Similar results are known already for percolation and Ising models, and for Potts models when the number q of available states is sufficiently large; indeed the hypothesis of polynomial decay has been proved rigorously for these cases. In two dimensions, the hypothesis that $E(R) < \infty$ is weaker than requiring that the susceptibility be finite, i.e., that the two-point function be summable. The principal new technique is a form of Russo's formula for random-cluster models reported by Bezuidenhout, Grimmett, and Kesten. For the current application, this leads to an analysis of a first-passage problem for random-cluster models, and a proof that the associated time constant is strictly positive if and only if the tail of R decays exponentially.
1. Introduction

The probability theory of phase transition in physical systems is fairly developed (see the papers published in [21]). For a variety of models of interest, it turns out that there is a unique point of phase transition, which separates a ‘subcritical’ phase from a ‘supercritical’ phase. Throughout the subcritical phase, one often finds that the correlation functions decay exponentially over large distances. In contrast, they are bounded away from zero in the supercritical phase. This general picture of statistical mechanics has been verified in many probabilistic systems, including the percolation and Ising models. Such percolation/Ising systems may be incorporated together with Potts models within the broader class of ‘random-cluster models’, and the latter class of models provides a beautiful general setting for studying such systems. In particular, one may ask whether or not the exponential decay of the connectivity function characterises the subcritical phases of all random-cluster models. The current paper is directed at this question.

Decay rates are fundamental to understanding the structure of models of statistical physics. One of the major thrusts of the modern theory of Gibbs states is directed towards a control of correlation functions over large spatial scales. This programme was initiated in part in a famous paper of Dobrushin and Pecherski [17], who established in a certain context that polynomial decay of correlation functions implies exponential decay. Such results have provided stepping stones towards proofs of full exponential decay; see [25, 32, 36] for further examples of such theorems. In the present paper, we prove a similar result in the general context of the random-cluster model (otherwise known as the Fortuin–Kasteleyn representation).

In advance of presenting the technical details, we state briefly the main result of this paper. (For formal definitions, the reader is referred to Sect. 2.) Let p and q be the parameters of a random-cluster model on $\mathbb{Z}^d$, where d ≥ 2; here, p is the edge parameter, and q is the cluster-weighting parameter. Suppose q ≥ 1, and let $p_c(q)$ be the critical value of p, i.e., $p_c(q) = \sup\{p : \phi_{p,q}(0 \leftrightarrow \infty) = 0\}$, where $\phi_{p,q}$ is the appropriate probability measure, and $\{0 \leftrightarrow \infty\}$ is the event that the origin is in an infinite open cluster. [For cognoscenti, we remark that $\phi_{p,q}$ is the random-cluster measure obtained using ‘free boundary conditions’.] Writing $\{0 \leftrightarrow \partial\Lambda_n\}$ for the event that the open cluster at the origin intersects the sphere of radius n, it is presumably the case that
$$\phi_{p,q}(0 \leftrightarrow \partial\Lambda_n) \le e^{-\alpha n} \tag{1.1}$$
for some α = α(p, q) satisfying
$$\alpha(p, q) > 0 \quad \text{if} \quad p < p_c(q). \tag{1.2}$$
Inequalities of the form (1.1) have been proved in the special cases when q = 1, q = 2, and q is sufficiently large. These cases correspond respectively to the percolation model ([19]), the Ising model ([2, 4, 7]), and Potts models with large q ([29, 30, 31]). Although the arguments used in these three special situations have certain features in common, there is no unified proof, and in particular no proof which extends to general values of q.

For percolation and Ising models, the exponential decay of the two-point function was first proved in two stages. Initially, it was shown that exponential decay is valid whenever the susceptibility is finite, i.e., whenever the two-point connectivity function (or correlation function in the case of the Ising model) is summable; and later it was proved that the susceptibility is indeed finite throughout the subcritical phase. (This was achieved by Hammersley [25] and Aizenman–Barsky [3] for percolation, and by Simon–Lieb [32, 36] and Aizenman–Barsky–Fernández [4] for the Ising model. In the case of percolation, a direct argument, avoiding the first stage, was discovered by Menshikov [34, 35].) In proofs of exponential decay for the percolation model, the BK inequality plays a central role (see [10, 19]). When q = 2, this role is played by the Simon–Lieb inequality (see [32, 36]). No such method is known for general q, although various attempts have been made to fill the gap (see [14, 22]).

In this paper, we establish the first stage of the above programme in the general setting of random-cluster models. We prove that if
$$\limsup_{n\to\infty} \bigl\{ n^{d-1} \phi_{p,q}(0 \leftrightarrow \partial\Lambda_n) \bigr\} < \infty \quad \text{when} \quad p < p_c(q), \tag{1.3}$$
then there exists α = α(p, q), satisfying α(p, q) > 0 when $p < p_c(q)$, such that
$$\phi_{p,q}(0 \leftrightarrow \partial\Lambda_n) \le e^{-\alpha n} \quad \text{for all large } n. \tag{1.4}$$
Next we discuss briefly the assumption (1.3). Hypothesis (1.3) requires that $\phi_{p,q}(0 \leftrightarrow \partial\Lambda_n)$ decay at least as fast as $1/n^{d-1}$, and is implied by the stronger statement that
$$\phi_{p,q}(R^{d-1}) < \infty \quad \text{when} \quad p < p_c(q), \tag{1.5}$$
where $R = \max\{n : 0 \leftrightarrow \partial\Lambda_n\}$ is the radius of the open cluster at the origin (and we use $\phi_{p,q}$ to denote expectation as well as probability); we shall return to this discussion just before the statement of Theorem 1 in Sect. 3. By elementary geometrical considerations, there exists a positive constant β = β(d) such that
$$\beta |C|^{1/d} \le R + 1 \le |C|, \tag{1.6}$$
where $C = \{x : 0 \leftrightarrow x\}$ is the open cluster at the origin. Therefore (1.5) is implied by the statement
$$\phi_{p,q}(|C|^{d-1}) < \infty \quad \text{when} \quad p < p_c(q), \tag{1.7}$$
which is equivalent, when d = 2, to the finiteness of the susceptibility $\chi(p, q) = \phi_{p,q}(|C|)$.

The relationship between random-cluster models and percolation/Ising/Potts models is well explored and documented elsewhere (see the references in [23]). The result described above has the following implication for ferromagnetic Potts models. If the two-point correlation function decays at least as fast as a certain negative polynomial, then it decays at least as fast as $e^{-\alpha n}$. Hypothesis (1.3) is not easily translated into an exactly equivalent statement for Potts models. Either of the following two conditions suffices for Potts models:
(a) the two-point correlation function decays at least as fast as $1/n^{2(d-1)}$,
(b) the finite-volume quantity $\pi^1_{\Lambda_n}(\sigma_0 = 1) - q^{-1}$ decays at least as fast as $1/n^{d-1}$.
(Here, $\pi^1_{\Lambda}$ is a ferromagnetic Potts measure on $\{1, 2, \dots, q\}^{\Lambda}$ having ‘1’ boundary conditions, and $\sigma_0$ is the spin at the origin.)
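The “elementary geometrical considerations” behind (1.6), and the chain of implications from (1.7) to (1.3), can be spelled out as follows. This is a sketch for a finite cluster; the value β = 1/2 is one admissible choice made here for illustration and is not a constant taken from the paper.

```latex
\begin{align*}
&\text{Upper bound on } R:\ C \text{ contains a lattice path from } 0 \text{ to } \partial\Lambda_R,
  \text{ which has at least } R + 1 \text{ vertices, so } R + 1 \le |C|. \\
&\text{Lower bound on } R:\ C \subseteq \Lambda_R = [-R, R]^d, \text{ so }
  |C| \le (2R + 1)^d \ \Longrightarrow\ |C|^{1/d} \le 2R + 1 \le 2(R + 1), \\
&\qquad \text{i.e. } \tfrac{1}{2}\,|C|^{1/d} \le R + 1. \\
&(1.7) \Rightarrow (1.5):\quad R \le |C| \ \Longrightarrow\
  \phi_{p,q}(R^{d-1}) \le \phi_{p,q}(|C|^{d-1}) < \infty. \\
&(1.5) \Rightarrow (1.3):\quad \phi_{p,q}(0 \leftrightarrow \partial\Lambda_n) = \phi_{p,q}(R \ge n)
  \le \frac{\phi_{p,q}(R^{d-1})}{n^{d-1}} \quad \text{(Markov's inequality)}.
\end{align*}
```

The last line shows that the quantity inside the lim sup of (1.3) is bounded by the finite moment in (1.5), which is why the moment condition is the stronger of the two statements.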
For this study, it is natural to investigate a certain related first-passage problem arising as follows from the random-cluster model. Let $F_n$ denote the minimum number of closed edges amongst paths of the lattice joining the origin to $\partial\Lambda_n$, i.e., $F_n$ is the minimal number of extra edges required to be open in order that $\{0 \leftrightarrow \partial\Lambda_n\}$ occurs. It may be shown, using the ergodicity of $\phi_{p,q}$ (see [18, 23, 28]), that the limit
$$\mu(p,q) = \lim_{n\to\infty} \bigl\{ n^{-1} F_n \bigr\} \tag{1.8}$$
exists and is constant ($\phi_{p,q}$-a.s.). It is presumably the case that
$$\mu(p,q) > 0 \quad \text{for } p < p_c(q). \tag{1.9}$$
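The passage time $F_n$ can be computed for a given configuration by a shortest-path search in which open edges cost 0 and closed edges cost 1. The following is a minimal sketch only, not taken from the paper: it assumes $d = 2$, and the function name `passage_time` and the callback `is_open` are hypothetical names introduced for illustration.

```python
from collections import deque

def passage_time(n, is_open):
    """Minimum number of closed edges on a lattice path from the origin to the
    boundary of the box [-n, n]^2 (a 0-1 breadth-first search).

    `is_open(u, v)` is a hypothetical callback returning True when the edge
    between neighbouring vertices u and v is open.
    """
    origin = (0, 0)
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        x = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            y = (x[0] + dx, x[1] + dy)
            if max(abs(y[0]), abs(y[1])) > n:
                continue  # a path first reaching the boundary stays inside the box
            cost = 0 if is_open(x, y) else 1
            candidate = dist[x] + cost
            if y not in dist or candidate < dist[y]:
                dist[y] = candidate
                # zero-cost moves go to the front, unit-cost moves to the back
                if cost == 0:
                    queue.appendleft(y)
                else:
                    queue.append(y)
    return min(d for v, d in dist.items() if max(abs(v[0]), abs(v[1])) == n)
```

With every edge closed the function returns $n$, and with every edge open it returns 0, matching the two extreme cases of the definition.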
We show in Theorem 4 that (1.9) holds if and only if $\phi_{p,q}(0 \leftrightarrow \partial\Lambda_n)$ decays exponentially as $n \to \infty$ when $p < p_c(q)$ (i.e., (1.1) and (1.2) hold). As noted earlier in a related
context, exponential decay is proved only for $q = 1$, $q = 2$, and for sufficiently large $q$. The above first-passage problem has been studied in the case of percolation ($q = 1$) by Kesten [27], and related results are known for the two-dimensional Ising model (see [18] and its references). A similar first-passage problem has been studied by Fontes and Newman [18]. By utilising one of their arguments, we shall establish sufficient conditions for the conclusion $\mu(p,q) > 0$. This in turn implies the required exponential decay. Incidentally, the comparison inequalities (see [23], Thm 2.2) imply exponential decay for sufficiently small $p$. The problem is to prove it all the way up to the critical point.

2. Random-Cluster Models

In this section, we introduce appropriate notation, and we define random-cluster measures. For general results and historical background, we refer the reader to [23] and the references therein. We define a random-cluster measure on a finite graph $G = (V, E)$ as follows. Let $0 \le p \le 1$ and $q > 0$. The relevant sample space is the finite set $\Omega_E = \{0, 1\}^E$, containing configurations that allocate 0's and 1's to the edges of $G$. For $\omega \in \Omega_E$, we call an edge $e$ open if $\omega(e) = 1$, and closed otherwise. The random-cluster measure on $G$, having parameters $p$ and $q$, is the probability measure $\phi_{G,p,q}$ on $\Omega_E$ given by
$$\phi_{G,p,q}(\omega) = \frac{1}{Z_{G,p,q}} \Bigl\{ \prod_{e \in E} p^{\omega(e)} (1-p)^{1-\omega(e)} \Bigr\} q^{k(\omega)}, \quad \omega \in \Omega_E, \tag{2.1}$$
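where $k(\omega)$ is the number of open components of $\omega$ (i.e., the number of components of the graph $(V, \eta(\omega))$, where $\eta(\omega)$ is the set of open edges under $\omega$), and
$$Z_{G,p,q} = \sum_{\omega \in \Omega_E} \Bigl\{ \prod_{e \in E} p^{\omega(e)} (1-p)^{1-\omega(e)} \Bigr\} q^{k(\omega)} \tag{2.2}$$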
is the normalising factor (or 'partition function'). We shall define a random-cluster measure on an infinite lattice by taking weak limits of such measures on finite boxes of the lattice. In advance of doing this, we present some notation which will be useful later.

Let $\mathbb{L}$ be the $d$-dimensional hypercubic lattice having vertex set $\mathbb{Z}^d$ and edge set $\mathbb{E}$ containing all pairs of vertices which are euclidean distance 1 apart; we assume throughout that $d \ge 2$. We shall write $x = (x_1, x_2, \dots, x_d)$ for $x \in \mathbb{Z}^d$, and denote by $\langle x, y \rangle$ an edge joining vertices $x$ and $y$. A path of $\mathbb{L}$ is an alternating sequence $x_0, e_0, x_1, e_1, \dots$ of distinct vertices $x_i$ and edges $e_j$ such that $e_j = \langle x_j, x_{j+1} \rangle$ for each $j$. If this path terminates at some $x_n$ then it is said to join $x_0$ to $x_n$ and to have length $n$; if a path has infinitely many vertices then it is said to connect $x_0$ to $\infty$. We write
$$\|x\| = \max_i \{|x_i|\},$$
where $x = (x_1, x_2, \dots, x_d)$.
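On a small finite graph the measure (2.1) and the partition function (2.2) can be evaluated by enumerating all of $\{0,1\}^E$. The sketch below is illustrative only; the function names `count_components` and `random_cluster_measure`, and the representation of a configuration as a tuple of 0's and 1's in edge order, are assumptions made here and do not come from the paper.

```python
from itertools import product

def count_components(vertices, open_edges):
    """k(omega): number of connected components of (V, eta(omega)), via union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in open_edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in vertices})

def random_cluster_measure(vertices, edges, p, q):
    """Return (phi, Z): phi maps each configuration (tuple of 0/1, one entry
    per edge) to its probability under (2.1); Z is the sum in (2.2)."""
    weights = {}
    for omega in product((0, 1), repeat=len(edges)):
        open_edges = [e for e, state in zip(edges, omega) if state]
        n_open = len(open_edges)
        weights[omega] = (p ** n_open
                          * (1 - p) ** (len(edges) - n_open)
                          * q ** count_components(vertices, open_edges))
    Z = sum(weights.values())
    return {omega: w / Z for omega, w in weights.items()}, Z
```

For $q = 1$ the factor $q^{k(\omega)}$ disappears, so $Z = 1$ and the measure reduces to product (percolation) measure. On the graph with a single edge, the probability that the edge is open is $p/(p + (1-p)q)$.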
The basic configuration space is $\Omega = \{0, 1\}^{\mathbb{E}}$ endowed with the σ-field $\mathcal{F}$ generated by the finite-dimensional cylinders of $\Omega$. A configuration $\omega$ ($\in \Omega$) is an assignment of 0 or 1 to each edge $e$ ($\in \mathbb{E}$), and may be put into one–one correspondence with the set $\eta(\omega) = \{e \in \mathbb{E} : \omega(e) = 1\}$ of 'open' edges in $\omega$. The 'open paths' of a configuration
$\omega$ are those paths of $\mathbb{L}$ all of whose edges are open. If $A$ and $B$ are sets of vertices, we write $\{A \leftrightarrow B\}$ for the event that there exists an open path joining some vertex of $A$ to some vertex of $B$. Similarly we write $\{A \leftrightarrow \infty\}$ for the event that some vertex of $A$ is the endpoint of an infinite open path. The complements of such events are denoted using the symbol $\nleftrightarrow$. For any subset $E$ of $\mathbb{E}$, we write $\mathcal{F}_E$ for the σ-field of subsets of $\Omega$ generated by the finite-dimensional cylinders of $E$, so that $\mathcal{F} = \mathcal{F}_{\mathbb{E}}$. A box $\Lambda$ is a subset of $\mathbb{Z}^d$ of the form
$$\Lambda = \prod_{i=1}^{d} [x_i, y_i]$$
for some $x, y \in \mathbb{Z}^d$, and where $[x_i, y_i]$ is interpreted as $[x_i, y_i] \cap \mathbb{Z}$. The box $\Lambda$ generates a subgraph of $\mathbb{L}$ with vertex set $\Lambda$ and edge set $\mathbb{E}_\Lambda$ containing all edges $\langle u, v \rangle$ with $u, v \in \Lambda$. Of particular interest are the boxes $\Lambda_n = [-n, n]^d$, for $n \ge 1$. The boundary $\partial V$ of a set $V$ of vertices is the set of all vertices $x$ ($\in V$) which are adjacent to some vertex of $\mathbb{L}$ not in $V$. For a box $\Lambda$, we write $\Omega^0_\Lambda$ for the subset of $\Omega$ containing all configurations $\omega$ satisfying $\omega(e) = 0$ for $e \notin \mathbb{E}_\Lambda$.

Let $0 \le p \le 1$ and $q \ge 1$. We define $\phi^0_{\Lambda,p,q}$ to be the random-cluster measure on the finite graph $(\Lambda, \mathbb{E}_\Lambda)$ 'with boundary condition 0' (this is the equivalent of free boundary conditions for ferromagnetic systems). This is done basically as in (2.1), but on a slightly different probability space. More precisely, let $\phi^0_{\Lambda,p,q}$ be the probability measure on $(\Omega, \mathcal{F})$ satisfying
$$\phi^0_{\Lambda,p,q}(\omega) = \frac{1}{Z^0_{\Lambda,p,q}} \Bigl\{ \prod_{e \in \mathbb{E}_\Lambda} p^{\omega(e)} (1-p)^{1-\omega(e)} \Bigr\} q^{k(\omega,\Lambda)} \quad \text{for } \omega \in \Omega^0_\Lambda, \tag{2.3}$$
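where $k(\omega, \Lambda)$ is the number of components of the graph $(\mathbb{Z}^d, \eta(\omega))$ which intersect $\Lambda$, and where $Z^0_{\Lambda,p,q}$ is the appropriate normalising constant
$$Z^0_{\Lambda,p,q} = \sum_{\omega \in \Omega^0_\Lambda} \Bigl\{ \prod_{e \in \mathbb{E}_\Lambda} p^{\omega(e)} (1-p)^{1-\omega(e)} \Bigr\} q^{k(\omega,\Lambda)}. \tag{2.4}$$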
Note that $\phi^0_{\Lambda,p,q}(\Omega^0_\Lambda) = 1$. The following facts are known and relevant (see [23]).
(a) The limit $\phi_{p,q} = \lim_{\Lambda \to \mathbb{Z}^d} \phi^0_{\Lambda,p,q}$ exists, in the sense of weak convergence of measures.
(b) The measure $\phi_{p,q}$ is ergodic.
(c) If $\phi^\xi_{\Lambda,p,q}$ is a random-cluster measure on $\Lambda$ with some boundary condition $\xi$ other than '0' (see [23]), then all weak limits as $\Lambda \to \mathbb{Z}^d$ of $\phi^\xi_{\Lambda,p,q}$ are equal to $\phi_{p,q}$, so long as $p < p_c(q)$, where $p_c(q)$ is the following critical value:
$$p_c(q) = \sup\{p : \phi_{p,q}(0 \leftrightarrow \infty) = 0\}. \tag{2.5}$$
(d) Random-cluster measures (with $q \ge 1$) satisfy the FKG inequality.
The relationship between random-cluster models and Potts models is well documented elsewhere (see the references in [23]). We note here only that the $q$-state Potts model with pair-interaction $J$ ($> 0$) corresponds to the random-cluster model with parameters $p = 1 - e^{-J}$ and $q$. In particular, the two-point correlation function of the Potts model with spins $\sigma$ satisfies
$$\langle \delta_{\sigma_0,\sigma_x} \rangle - q^{-1} = (1 - q^{-1})\, \phi_{p,q}(0 \leftrightarrow x),$$
where $\langle \cdot \rangle$ denotes averages with respect to the Potts measure on $\mathbb{L}$ arising from free boundary conditions, and $\delta_{i,j}$ is the Kronecker delta. Now,
$$\phi_{p,q}(0 \leftrightarrow x) \le \phi_{p,q}(0 \leftrightarrow \partial\Lambda_n) \quad \text{if } \|x\| = n,$$
so that upper bounds for $\phi_{p,q}(0 \leftrightarrow \partial\Lambda_n)$ imply upper bounds for the Potts correlation function.

3. Exponential Decay

We are interested here in the rate of decay of connectivity functions in the subcritical phase, i.e., when $p < p_c(q)$. We prove exponential decay under a certain assumption which we introduce next. Let $q \ge 1$. For $0 \le p \le 1$, define
$$Z(p,q) = \limsup_{n\to\infty} \bigl\{ n^{d-1} \phi_{p,q}(0 \leftrightarrow \partial\Lambda_n) \bigr\}. \tag{3.1}$$
Now $Z(p,q)$ is non-decreasing in $p$, and we may therefore define
$$p_g(q) = \sup\{p : Z(p,q) < \infty\}. \tag{3.2}$$
Clearly $p_g(q) \le p_c(q)$, and it is generally believed that equality holds here. The critical point $p_g(q)$ plays the role of the quantity $p_T$ in the percolation literature (see [19], p. 45), although $p_g(q)$ and $p_T$ have different (but similar) definitions. As observed in Sect. 1, it is known that $p_g(q) = p_c(q)$ if $q = 1$, $q = 2$, or $q$ is sufficiently large.

The condition $Z(p,q) < \infty$ amounts to assuming that the radius $R = \max\{\|x\| : 0 \leftrightarrow x\}$ has a tail decaying at least as fast as $n^{-(d-1)}$, and is a weaker assumption than the moment condition $\phi_{p,q}(R^{d-1}) < \infty$. [The expression $\mu(X)$ denotes the mean of the random variable $X$ under the measure $\mu$.] Actually $Z(p,q) = 0$ if $\phi_{p,q}(R^{d-1}) < \infty$, since
$$n^{d-1} \phi_{p,q}(0 \leftrightarrow \partial\Lambda_n) = n^{d-1} \phi_{p,q}(R \ge n) \le \sum_{k=n}^{\infty} k^{d-1} \phi_{p,q}(R = k),$$
and the right side is the tail of a convergent series, whence it tends to 0 as $n \to \infty$. There is a converse also. If $p < p_g(q)$ then $Z(p,q) < \infty$, implying that
$$n^{c} \phi_{p,q}(0 \leftrightarrow \partial\Lambda_n) \to 0 \quad \text{for all } c \text{ satisfying } c < d - 1.$$
This in turn implies that $\phi_{p,q}(R^c) < \infty$ for all $c$ satisfying $c < d - 1$ (see [24], Problem 5.6.18).

Theorem 1. Let $0 < p < 1$ and $q \ge 1$, and suppose that $p < p_g(q)$. There exists $\alpha = \alpha(p,q)$ satisfying $\alpha(p,q) > 0$ such that
$$\phi_{p,q}(0 \leftrightarrow \partial\Lambda_n) \le e^{-\alpha n} \quad \text{for all large } n. \tag{3.3}$$
This theorem is proved in Sect. 5. When $d = 2$, it is believed that the critical point $p_c(q)$ coincides with the self-dual point $\kappa_q = \sqrt{q}/(1 + \sqrt{q})$; see [23, 37]. It is known that $p_c(q) \ge \kappa_q$, but no rigorous proof of the converse inequality is available for general $q$ ($\ge 1$). It would be sufficient to prove a 'reasonable' decay rate for $\phi_{p,q}(0 \leftrightarrow \partial\Lambda_n)$ as $n \to \infty$, when $p < p_c(q)$. Using Theorem 1, we find that $p_c(q) = \kappa_q$ if
$$\phi_{p,q}(0 \leftrightarrow \partial\Lambda_n) \le \frac{c(p,q)}{n} \quad \text{for all } n,$$
where $c(p,q) < \infty$ for $p < p_c(q)$.

4. Two Lemmas, and a First-Passage Problem

Next we state and prove two fundamental inequalities. After this, we apply them in studying a first-passage problem. First we review a fundamental formula of [13]. Fix $q \in (0, \infty)$, $p \in (0, 1)$, and let $\psi_p$ be the random-cluster measure with parameters $p$ and $q$ on the finite graph $G = (V, E)$; later we shall set $G = \Lambda$ and $\psi_p = \phi^0_{\Lambda,p,q}$. It is proved in [13] that, for any event $A$,
$$\frac{d}{dp} \psi_p(A) = \frac{1}{p(1-p)} \bigl\{ \psi_p(N 1_A) - \psi_p(N) \psi_p(A) \bigr\}, \tag{4.1}$$
where $1_A$ is the indicator function of $A$, and $N$ is the number of open edges (i.e., for $\omega \in \Omega_E = \{0, 1\}^E$, we have $N(\omega) = \sum_e \omega(e)$). A version of this formula is often attributed to Russo in the case $q = 1$ (percolation) although it was known earlier to those working in reliability theory (see the discussion in [19]).

There is a partial order on $\Omega_E$ given by: $\omega \le \omega'$ if and only if $\omega(e) \le \omega'(e)$ for all $e \in E$. A function $f : \Omega_E \to \mathbb{R}$ is called increasing if $f(\omega) \le f(\omega')$ whenever $\omega \le \omega'$, and is called decreasing if $-f$ is increasing. An event $A$ ($\subseteq \Omega_E$) is called increasing (resp. decreasing) if its indicator function $1_A$ is increasing (resp. decreasing). Henceforth we assume that $q \ge 1$, so that $\psi_p$ satisfies the FKG inequality.

Suppose that $A$ is an increasing event (but not the empty set $\emptyset$). For $\omega \in \Omega_E$, let $F_A(\omega)$ be the minimum number of additional edges necessary for $A$ to occur; that is to say,
$$F_A(\omega) = \inf\Bigl\{ \sum_e \{\omega'(e) - \omega(e)\} : \omega' \ge \omega,\ \omega' \in A \Bigr\}. \tag{4.2}$$
It may be checked that $N + F_A$ is an increasing random variable, and also that $F_A(\omega) 1_A(\omega) = 0$ for all $\omega$. Therefore, by the FKG inequality,
$$\psi_p(N 1_A) = \psi_p\bigl((N + F_A) 1_A\bigr) \ge \psi_p(N + F_A)\, \psi_p(A),$$
whence
$$\psi_p(N 1_A) - \psi_p(N) \psi_p(A) \ge \psi_p(F_A)\, \psi_p(A).$$
Substituting this into (4.1), we obtain the following lemma.

Lemma 2. Let $q \ge 1$ and $0 < p < 1$. For any increasing event $A$ ($\ne \emptyset$),
$$\frac{d}{dp} \{\log \psi_p(A)\} \ge \frac{\psi_p(F_A)}{p(1-p)}. \tag{4.3}$$
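Formula (4.1) and inequality (4.3) can be checked numerically on a small graph by brute-force enumeration. The sketch below is an illustration under stated assumptions, not part of the paper: the graph (a 4-cycle), the event $A$ = "vertices 0 and 2 are joined by an open path", the parameter values, and all function names are choices made here.

```python
from itertools import product
from math import log

def roots(n_vertices, open_edges):
    """Union-find representative of each vertex under the given open edges."""
    parent = list(range(n_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in open_edges:
        parent[find(u)] = find(v)
    return [find(v) for v in range(n_vertices)]

def measure(n_vertices, edges, p, q):
    """Random-cluster measure (2.1) on a finite graph, by enumeration."""
    w = {}
    for omega in product((0, 1), repeat=len(edges)):
        open_edges = [e for e, state in zip(edges, omega) if state]
        k = len(set(roots(n_vertices, open_edges)))
        w[omega] = p ** sum(omega) * (1 - p) ** (len(edges) - sum(omega)) * q ** k
    z = sum(w.values())
    return {omega: x / z for omega, x in w.items()}

def check_lemma_2(p=0.4, q=2.0, h=1e-6):
    """Return (numeric derivative of log psi_p(A), right side of (4.1) divided
    by psi_p(A), lower bound psi_p(F_A) / (p(1-p)) from (4.3))."""
    n_vertices, edges = 4, [(0, 1), (1, 2), (2, 3), (3, 0)]
    configs = list(product((0, 1), repeat=len(edges)))

    def in_A(omega):
        r = roots(n_vertices, [e for e, state in zip(edges, omega) if state])
        return r[0] == r[2]

    A = [w for w in configs if in_A(w)]

    def F_A(omega):
        # definition (4.2): fewest extra open edges needed to reach A from omega
        return min(sum(b - a for a, b in zip(omega, w))
                   for w in A if all(b >= a for a, b in zip(omega, w)))

    def prob_A(x):
        mu_x = measure(n_vertices, edges, x, q)
        return sum(mu_x[w] for w in A)

    mu = measure(n_vertices, edges, p, q)
    p_A = prob_A(p)
    mean_N = sum(mu[w] * sum(w) for w in configs)
    mean_N_on_A = sum(mu[w] * sum(w) for w in A)
    mean_F = sum(mu[w] * F_A(w) for w in configs)
    numeric = (log(prob_A(p + h)) - log(prob_A(p - h))) / (2 * h)
    exact = (mean_N_on_A - mean_N * p_A) / (p * (1 - p) * p_A)
    lower = mean_F / (p * (1 - p))
    return numeric, exact, lower
```

The first two returned values should agree up to the finite-difference error, and the second should be at least the third whenever $q \ge 1$.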
In the proof of Theorem 1, this inequality plays the role of inequalities (3.10) and (3.36) of [19], used by Menshikov [34, 35] to prove exponential decay for subcritical percolation models. Integrating (4.3) over the interval $[r, s]$, and using the facts that $p(1-p) \le \frac14$ and that $F_A$ is a decreasing random variable, we find that
$$\psi_r(A) \le \psi_s(A) \exp\Bigl\{ -4 \int_r^s \psi_p(F_A)\, dp \Bigr\} \le \psi_s(A) \exp\{-4(s-r)\psi_s(F_A)\}, \quad \text{if } r \le s. \tag{4.4}$$
There is a further relation between the probability of $A$ and the mean of $F_A$.

Lemma 3. Let $q \ge 1$ and $0 < r < s < 1$. Then, for any increasing event $A$,
$$\psi_r(F_A \le k) \le \Bigl\{ \frac{q}{s-r} \cdot \frac{(1-r)q}{r + (1-r)q} \Bigr\}^{k} \psi_s(A) \quad \text{for all } k \ge 0. \tag{4.5}$$
This lemma is very closely related to the 'sprinkling' lemma of [5], a version of which is valid for random-cluster models; see also [19]. We shall make use of it in the following way. By (4.5) with $C = q^2(1-r)/\{(s-r)(r + (1-r)q)\}$,
$$\psi_r(F_A) = \sum_{k=0}^{\infty} \psi_r(F_A > k) \ge \sum_{k=0}^{K} \bigl\{ 1 - C^k \psi_s(A) \bigr\},$$
where $K = \max\{k : C^k \psi_s(A) \le 1\}$. We sum this as usual, noting that $C > 1$, to find that
$$\psi_r(F_A) \ge \frac{-\log \psi_s(A)}{\log C} - \frac{C - \psi_s(A)}{C - 1} \quad \text{if } r < s. \tag{4.6}$$
In advance of proving the latter lemma, we present an application of the two lemmas together. Henceforth let $q \ge 1$. Returning to the lattice $\mathbb{L}$, we set $A_n = \{0 \leftrightarrow \partial\Lambda_n\}$, and write $F_n$ for $F_{A_n}$. As remarked in Sect. 1, Derriennic's theorem (see [18, 28]) implies the existence of the constant limit
$$\mu(p,q) = \lim_{n\to\infty} \bigl\{ n^{-1} F_n \bigr\} \quad \phi_{p,q}\text{-a.s.} \tag{4.7}$$
Using a comparison inequality (see [23], Thm 2.2) we have that $\mu(p,q)$ is non-increasing in $p$, and we define
$$p_{\mathrm{flow}}(q) = \sup\{p : \mu(p,q) > 0\}.$$
Next we define the correlation length $\xi(p,q)$ by
$$\xi(p,q)^{-1} = \lim_{n\to\infty} \Bigl\{ -\frac{1}{n} \log \phi_{p,q}(0 \leftrightarrow n e_1) \Bigr\},$$
where $e_1$ is a unit vector in the direction of increasing first coordinate, and where the limit exists by the FKG inequality and subadditivity. (We adopt the convention that $\infty^{-1} = 0$.) Note that $\xi(p,q)$ is non-decreasing in $p$. Using the argument of [23], Thm 5.14, we have that
$$\lim_{n\to\infty} \Bigl\{ -\frac{1}{n} \log \phi_{p,q}(A_n) \Bigr\} = \xi(p,q)^{-1},$$
whence $\phi_{p,q}(A_n)$ decays exponentially if and only if $\xi(p,q) < \infty$. We define the further critical point
$$p_{\mathrm{corr}}(q) = \sup\{p : \xi(p,q) < \infty\}.$$
Theorem 4. Let $q \ge 1$. It is the case that $p_{\mathrm{flow}}(q) = p_{\mathrm{corr}}(q)$.

It is clear from the above observations that $p_{\mathrm{flow}}(q) = p_{\mathrm{corr}}(q) \le p_g(q) \le p_c(q)$, and it is a consequence of Theorem 1 that $p_{\mathrm{corr}}(q) = p_g(q)$. It is believed also that $p_g(q) = p_c(q)$. As observed earlier, this is known only for $q = 1$, $q = 2$, and for sufficiently large $q$.

The first-passage problem and the time constant $\mu(p,q)$ have been studied in detail when $q = 1$; see [27, 28]. Several authors have paid serious attention to a closely related question when $q = 2$ and $d = 2$, namely, the corresponding question for the two-dimensional Ising model, where the 'passage time' $F_n$ is replaced by the minimum number of changes of spin along paths from the origin to $\partial\Lambda_n$; see [1, 15, 18]. The time constant in the Ising case cannot exceed the corresponding random-cluster time constant $\mu(p, 2)$, since each edge of the Ising model having endpoints with unlike spins gives rise to a closed edge in the associated (coupled) random-cluster process.

In some of the following proofs we shall make use of Lemmas 2 and 3 applied to the infinite-volume random-cluster measures. Let $A$ be an increasing (non-empty) cylinder event in the measurable space $(\Omega, \mathcal{F})$, and set $\psi_p = \phi^0_{\Lambda_M,p,q}$, where $M$ is a positive integer. We apply (4.4) and (4.6) accordingly, noting that
$$C = \frac{q}{s-r} \cdot \frac{q(1-r)}{r + (1-r)q} \le \frac{q}{s-r},$$
and let $M \to \infty$ to obtain, for $r < s$,
$$\phi_{r,q}(A) \le \phi_{s,q}(A) \exp\{-4(s-r)\phi_{s,q}(F_A)\}, \tag{4.8}$$
$$\phi_{r,q}(F_A) \ge \frac{-\log \phi_{s,q}(A)}{\log\{q/(s-r)\}} - \frac{C}{C-1}. \tag{4.9}$$
Before turning to the proof of Theorem 4, we make one further observation. Inequalities (4.8) and (4.9), with $A = A_n$, imply that the correlation length $\xi(p,q)$ is strictly increasing in $p$ whenever it is finite (cf. [19], Thm 5.14).

Proof of Lemma 3. Let $r < s$. We shall employ a suitable coupling of the measures $\psi_r$ and $\psi_s$. Let $E = \{e_1, e_2, \dots, e_m\}$ be the edges of the graph $G$, and let $U_1, U_2, \dots, U_m$ be independent random variables having the uniform distribution on $[0, 1]$. We shall examine the edges in turn, to determine whether they are open or closed for the respective parameters $r$ and $s$. The outcome will be a pair $(\pi, \omega)$ of configurations each lying in $\Omega_E = \{0, 1\}^E$ such that $\pi \le \omega$. The configurations $\pi, \omega$ are random in the sense that they are functions of the $U_j$. First, we declare
$$\pi(e_1) = 1 \text{ if and only if } U_1 < \psi_r(J_1), \qquad \omega(e_1) = 1 \text{ if and only if } U_1 < \psi_s(J_1),$$
where $J_i$ is the set of configurations $\gamma$ ($\in \Omega_E$) with $\gamma(e_i) = 1$. Note that $\psi_r(J_1) \le \psi_s(J_1)$ since $r < s$, and therefore $\pi(e_1) \le \omega(e_1)$. Let $M$ be an integer satisfying $1 \le M < m$. Having defined $\pi(e_i), \omega(e_i)$ such that $\pi(e_i) \le \omega(e_i)$ (for $i \le M$), we define $\pi(e_{M+1})$ and $\omega(e_{M+1})$ as follows. We declare
$$\pi(e_{M+1}) = 1 \text{ if and only if } U_{M+1} < \psi_r\bigl(J_{M+1} \mid F_M(\pi)\bigr), \qquad \omega(e_{M+1}) = 1 \text{ if and only if } U_{M+1} < \psi_s\bigl(J_{M+1} \mid F_M(\omega)\bigr),$$
where $F_M(\gamma)$ is the set of configurations $\nu$ satisfying $\nu(e_i) = \gamma(e_i)$ for $1 \le i \le M$. We have that $\psi_r(J_{M+1} \mid F_M(\pi)) \le \psi_s(J_{M+1} \mid F_M(\omega))$ since $r < s$ and $\pi(e_i) \le \omega(e_i)$ for $1 \le i \le M$; this implies that $\pi(e_{M+1}) \le \omega(e_{M+1})$. Continuing likewise, we obtain a pair $(\pi, \omega)$ of configurations satisfying:
(a) $\pi \le \omega$,
(b) $\pi$ is distributed according to the measure $\psi_r$,
(c) $\omega$ is distributed according to the measure $\psi_s$.
We write $\mu$ for the probability measure associated with the $U_j$. By a straightforward computation (cf. Eq. (3.10) of [23]),
$$\psi_p(J_i \mid D_i) = \frac{p}{p + (1-p)q}, \qquad \psi_p(J_i \mid D_i^c) = p,$$
where $D_i$ is the event that there is no open path of $E \setminus \{e_i\}$ joining the endpoints of $e_i$, and $D_i^c$ is the complement of $D_i$. Using conditional expectations, we deduce that, since $q \ge 1$, then
$$\frac{p}{p + (1-p)q} \le \psi_p(J_i \mid D) \le p \tag{4.10}$$
for any event $D$ defined on the states of $E \setminus \{e_i\}$. It follows from the definition of the $\pi(e_i)$ and $\omega(e_i)$ that
$$\mu\bigl(\pi(e_{M+1}) = 0 \mid U_1, U_2, \dots, U_M\bigr) = 1 - \psi_r\bigl(J_{M+1} \mid F_M(\pi)\bigr) \le \frac{(1-r)q}{r + (1-r)q}.$$
By a similar argument,
$$\mu\bigl(\omega(e_{M+1}) = 1,\ \pi(e_{M+1}) = 0 \mid U_1, U_2, \dots, U_M\bigr) = \psi_s\bigl(J_{M+1} \mid F_M(\omega)\bigr) - \psi_r\bigl(J_{M+1} \mid F_M(\pi)\bigr) \ge \frac{s-r}{q}.$$
A full derivation of the last inequality is obtainable as follows. Using Lemma 2 with $A = J_i$ (so that $F_{J_i} = 1_{J_i^c}$) together with (4.10),
$$\psi_p'(J_i) \ge \frac{\psi_p(J_i)\{1 - \psi_p(J_i)\}}{p(1-p)} \ge \frac{1}{p + (1-p)q} \ge \frac{1}{q}.$$
Now integrate over the interval $[r, s]$ to obtain that
$$\psi_s(J_i) - \psi_r(J_i) \ge \frac{s-r}{q}. \tag{4.11}$$
Finally,
$$\psi_s\bigl(J_{M+1} \mid F_M(\omega)\bigr) - \psi_r\bigl(J_{M+1} \mid F_M(\pi)\bigr) \ge \psi_s\bigl(J_{M+1} \mid F_M(\omega)\bigr) - \psi_r\bigl(J_{M+1} \mid F_M(\omega)\bigr),$$
and the claim follows by applying (4.11) with $i = M + 1$ to the graph obtained from $G$ by contracting (resp. deleting) any edge $e_i$ (for $1 \le i \le M$) with $\omega(e_i) = 1$ (resp. $\omega(e_i) = 0$). Cf. Theorem 2.3 of [23]. It follows from the above that
$$\mu\bigl(\omega(e_{M+1}) = 1 \mid \pi(e_{M+1}) = 0,\ U_1, U_2, \dots, U_M\bigr) \ge \frac{s-r}{q} \cdot \frac{r + (1-r)q}{(1-r)q}. \tag{4.12}$$
Now fix a configuration $\xi$ ($\in \Omega_E$) and a set $B$ of edges such that $\xi(e) = 0$ for $e \in B$. We claim that
$$\mu\bigl(\pi = \xi,\ \omega(e) = 1 \text{ for } e \in B\bigr) \ge \Bigl\{ \frac{s-r}{q} \cdot \frac{r + (1-r)q}{(1-r)q} \Bigr\}^{|B|} \mu(\pi = \xi). \tag{4.13}$$
This follows from the recursive construction of $\pi$ and $\omega$ in terms of the family $U_1, U_2, \dots, U_m$, in the light of the bound (4.12).

Inequality (4.13) implies the claim of the lemma, as follows. Let $\xi$ be a configuration satisfying $F_A(\xi) \le k$. There exists a set $B = B_\xi$ of edges such that
(a) $|B| \le k$,
(b) $\xi(e) = 0$ for $e \in B$,
(c) the configuration obtained from $\xi$ by allocating state 1 to all edges in $B$ lies in the event $A$.
If more than one such set $B$ exists, we pick the earliest in some deterministic ordering of all subsets of $E$. Then, by (4.13),
$$\psi_s(A) \ge \mu\bigl(F_A(\pi) \le k,\ \omega(e) = 1 \text{ for } e \in B_\pi\bigr) = \sum_{\xi : F_A(\xi) \le k} \mu\bigl(\pi = \xi,\ \omega(e) = 1 \text{ for } e \in B_\xi\bigr) \ge \Bigl\{ \frac{s-r}{q} \cdot \frac{r + (1-r)q}{(1-r)q} \Bigr\}^{k} \psi_r(F_A \le k).$$
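The edge-by-edge coupling used in the proof can be sketched for a small graph as follows. This is an illustrative sketch only: conditional probabilities are computed by brute-force enumeration rather than by any method from the paper, and the function names (`n_components`, `weight`, `conditional_open_prob`, `coupled_pair`) are hypothetical.

```python
import random
from itertools import product

def n_components(n_vertices, open_edges):
    """Number of connected components of the graph with the given open edges."""
    parent = list(range(n_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in open_edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in range(n_vertices)})

def weight(omega, edges, n_vertices, p, q):
    """Unnormalised random-cluster weight of the configuration omega, as in (2.1)."""
    open_edges = [e for e, state in zip(edges, omega) if state]
    n_open = len(open_edges)
    return (p ** n_open * (1 - p) ** (len(edges) - n_open)
            * q ** n_components(n_vertices, open_edges))

def conditional_open_prob(prefix, edges, n_vertices, p, q):
    """psi_p(J_{M+1} | F_M(prefix)): probability that edge M+1 is open given
    the states of the first M = len(prefix) edges."""
    n_rest = len(edges) - len(prefix) - 1
    open_mass = total_mass = 0.0
    for tail in product((0, 1), repeat=n_rest):
        for state in (0, 1):
            w = weight(prefix + (state,) + tail, edges, n_vertices, p, q)
            total_mass += w
            if state:
                open_mass += w
    return open_mass / total_mass

def coupled_pair(uniforms, edges, n_vertices, r, s, q):
    """Build (pi, omega) from the uniforms U_1, ..., U_m, examining edges in turn."""
    pi, omega = (), ()
    for u in uniforms:
        pi_open = u < conditional_open_prob(pi, edges, n_vertices, r, q)
        omega_open = u < conditional_open_prob(omega, edges, n_vertices, s, q)
        pi += (int(pi_open),)
        omega += (int(omega_open),)
    return pi, omega
```

For $q \ge 1$ and $r < s$, every pair returned by `coupled_pair` should satisfy $\pi \le \omega$ edge by edge, in line with property (a) of the construction, and each conditional probability should lie within the bounds (4.10).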
Proof of Theorem 4. Let $r < s < p_{\mathrm{flow}}(q)$. There exists a constant $\gamma = \gamma(s, q)$ ($> 0$) such that
$$\phi_{s,q}(F_n) \ge n\gamma(s,q) \quad \text{for all } n \ge 1. \tag{4.14}$$
Now let $A = A_n = \{0 \leftrightarrow \partial\Lambda_n\}$. In conjunction with (4.14), (4.8) implies the exponential decay of $\phi_{r,q}(A_n)$, whence $r < p_{\mathrm{corr}}(q)$. Therefore $p_{\mathrm{flow}}(q) \le p_{\mathrm{corr}}(q)$.

Conversely, suppose that $r < s < p_{\mathrm{corr}}(q)$. There exists $\alpha = \alpha(s, q)$ ($> 0$) such that $\phi_{s,q}(A_n) \le e^{-\alpha n}$ for all $n$. By (4.9) with $A = A_n$ and some positive $\beta = \beta(r, s, q)$,
$$\phi_{r,q}(F_n) \ge \frac{-\log(e^{-\alpha n})}{\log\{q/(s-r)\}} - \beta = \frac{\alpha n}{\log\{q/(s-r)\}} - \beta,$$
whence $r < p_{\mathrm{flow}}(q)$. Therefore $p_{\mathrm{corr}}(q) \le p_{\mathrm{flow}}(q)$.
5. Proof of Theorem 1

There are two stages in the proof. In the first stage, we use inequalities (4.4) and (4.6) in an iterative scheme in order to prove that $\phi_{p,q}(A_n)$ decays 'near-exponentially' when $p < p_g(q)$. In the second stage, we use Theorem 4 together with an argument developed by Fontes and Newman [18] to deduce full exponential decay. The conclusions of these two stages are summarised in the following two lemmas.

Lemma 5. Let $0 < p < 1$ and $q \ge 1$, and suppose that $p < p_g(q)$. There exist constants $c(p)$, $\Delta(p)$, satisfying $c(p) > 0$, $0 < \Delta(p) < 1$, such that
$$\phi_{p,q}(A_n) \le \exp\{-c n^{\Delta}\} \quad \text{for all } n \ge 1.$$

We recall the flow constant $\mu(p,q)$ defined in (1.8) and (4.7). As before, $C$ is the vertex set of the open cluster at the origin.

Lemma 6. Let $0 < p < 1$ and $q \ge 1$. If $\phi_{p,q}(|C|^{2d+\epsilon}) < \infty$ for some $\epsilon > 0$, then $\mu(p,q) > 0$.

Before embarking on the proofs of these lemmas, we make some remarks. First, Lemma 5 will be proved by an iterative scheme which may be continued further. If this is done, one obtains thereby a proof that $\phi_{p,q}(A_n)$ decays at least as fast as $\exp\{-\alpha_k(p) n / \log_k n\}$ for any $k \ge 1$, where $\alpha_k(p) > 0$ and $\log_k n$ is the $k$th iterate of logarithm. Secondly, the hypothesis of Lemma 6 is implied by the conclusion of Lemma 5, using (1.6). Therefore Lemmas 5 and 6 imply that $\mu(p,q) > 0$ when $p < p_g(q)$, whence Theorem 1 follows by Theorem 4. Thirdly, essentially the only feature of the measure $\phi_{p,q}$ which enables Lemma 6 is the FKG property. More precisely, a version of Lemma 6 holds with $\phi_{p,q}$ replaced by any ergodic probability measure satisfying the FKG inequality. In addition, the moment condition may be relaxed just a little; see [16, 18].

Proof of Lemma 5. We shall make central use of inequalities (4.4) and (4.6), in an iterative scheme. Rather than using these inequalities in the forms presented for finite graphs, we shall make use of their infinite-volume versions (4.8) and (4.9). In the following, we shall sometimes use real quantities when integers are required. It will be clear that this notational simplification has no ultimate effect on the validity of the proof. All $o(1)$ and $O(1)$ terms are to be interpreted in the limit $n \to \infty$.

Fix $q \ge 1$. For $p < p_g(q)$, there exists $c_1(p)$ satisfying $c_1(p) > 0$ such that
$$\phi_{p,q}(A_n) \le \frac{c_1(p)}{n^{d-1}} \quad \text{for all } n. \tag{5.1}$$
Let $r < s < t < p_g(q)$. By (4.9),
$$\phi_{s,q}(F_n) \ge \frac{-\log \phi_{t,q}(A_n)}{\log C} + O(1) \ge \frac{(d-1)\log n}{\log C} + O(1),$$
where $1 < C = q/(t-s) < \infty$. Insert this into (4.8) to obtain that
$$\phi_{r,q}(A_n) \le \frac{c_2(r)}{n^{d-1+\Delta_2(r)}} \quad \text{for all } n \tag{5.2}$$
for some strictly positive and finite $c_2(r)$ and $\Delta_2(r)$. This holds for all $r < p_g(q)$, and is an improvement over (5.1).

Next we shall obtain an improvement of (5.2). Let $m$ be a positive integer, and let $R_i = im$ for $0 \le i \le K$, where $K = \lfloor n/m \rfloor$. Let $L_i$ be the event $\{\partial\Lambda_{R_i} \leftrightarrow \partial\Lambda_{R_{i+1}}\}$, and let $H_i = F_{L_i}$, the minimal number of extra edges needed for $L_i$ to occur. Clearly,
$$F_n \ge \sum_{i=0}^{K-1} H_i, \tag{5.3}$$
since every path from 0 to $\partial\Lambda_n$ traverses each annulus $\Lambda_{R_{i+1}} \setminus \Lambda_{R_i}$. There exists a constant $\eta$ ($\ge 1$) such that $|\partial\Lambda_R| \le \eta R^{d-1}$ for all $R$. Therefore, by the translation invariance of $\phi_{p,q}$,
$$\phi_{p,q}(L_i) \le |\partial\Lambda_{R_i}|\, \phi_{p,q}(A_m) \le \eta n^{d-1} \phi_{p,q}(A_m). \tag{5.4}$$
Let $r < s < p_g(q)$, and let $c_2 = c_2(s)$, $\Delta_2 = \Delta_2(s)$, where the functions $c_2(p)$ and $\Delta_2(p)$ are given as in (5.2). It follows from (5.2) and (5.4) that
$$\phi_{s,q}(L_i) \le \eta n^{d-1} \frac{c_2}{m^{d-1+\Delta_2}} \le \frac12 \tag{5.5}$$
if
$$m = \{(2\eta c_2) n^{d-1}\}^{1/(d-1+\Delta_2)}, \tag{5.6}$$
and we choose $m$ accordingly (here and later, we assume that $n$ is large). Now $H_i \ge 1$ if $L_i$ does not occur, whence
$$\phi_{s,q}(F_n) \ge \sum_{i=0}^{K-1} \{1 - \phi_{s,q}(L_i)\} \ge \tfrac12 K \tag{5.7}$$
by (5.3) and (5.5). Also,
$$K = \lfloor n/m \rfloor \ge D n^{\Delta_3} \tag{5.8}$$
by (5.6), for appropriate positive constants $D$, $\Delta_3$ satisfying $D > 0$, $0 < \Delta_3 < 1$. In conjunction with (4.8) and (5.7), this lower bound for $K$ implies that
$$\phi_{r,q}(A_n) \le \exp\{-c_3 n^{\Delta_3}\} \quad \text{for all } n, \tag{5.9}$$
where $c_3 = c_3(r) > 0$, $0 < \Delta_3 = \Delta_3(r) < 1$. This holds for all $r < p_g(q)$.
Proof of Lemma 6. We prove that $\mu(p,q) > 0$ by an argument to be found in [18]. Let $\Pi_n$ be the set of all paths of $\mathbb{L}$ joining the origin to $\partial\Lambda_n$. With $T(\pi)$ denoting the number of closed edges in a path $\pi$, we have that
$$T(\pi) + 1 \ge \sum_{x \in \pi} \frac{1}{|C_x \cap \pi|},$$
where the sum is over all vertices $x$ of $\pi$, and $C_x$ is the open cluster at $x$. It follows by Jensen's inequality that
$$\frac{T(\pi) + 1}{|\pi|} \ge \frac{1}{|\pi|} \sum_{x \in \pi} \frac{1}{|C_x|} \ge \Bigl\{ \frac{1}{|\pi|} \sum_{x \in \pi} |C_x| \Bigr\}^{-1}.$$
Therefore,
$$\frac{F_n + 1}{n} \ge \inf_{\pi \in \Pi_n} \Bigl\{ \frac{T(\pi) + 1}{|\pi|} \Bigr\} \ge K_n^{-1},$$
where
$$K_n = \sup_{\pi \in \Pi_n} \Bigl\{ \frac{1}{|\pi|} \sum_{x \in \pi} |C_x| \Bigr\}.$$
Using (4.7), we find that $\mu(p,q) \ge K^{-1}$ a.s., where
$$K = \limsup_{m\to\infty} \Bigl[ \sup\Bigl\{ \frac{1}{|\pi|} \sum_{x \in \pi} |C_x| : |\pi| = m \Bigr\} \Bigr], \tag{5.10}$$
where the (inner) supremum is over all paths from the origin containing $m$ vertices. We propose to show that $K < \infty$ a.s., whence $\mu(p,q) > 0$ as required.

Let $\{\widetilde{C}_x : x \in \mathbb{Z}^d\}$ be a collection of independent subsets of $\mathbb{Z}^d$ with the property that $\widetilde{C}_x$ has the same distribution as $C_x$. We claim, as in [18], that $\{|C_x| : x \in \mathbb{Z}^d\}$ is dominated stochastically by $\{M_x : x \in \mathbb{Z}^d\}$, where
$$M_x = \sup\{|\widetilde{C}_y| : y \in \mathbb{Z}^d,\ x \in \widetilde{C}_y\}.$$
We prove this inductively. Let $v_1, v_2, \dots$ be a deterministic ordering of $\mathbb{Z}^d$. Given the random variables $\{\widetilde{C}_x : x \in \mathbb{Z}^d\}$, we shall construct a family $\{D_x : x \in \mathbb{Z}^d\}$ having the same joint distributions as $\{C_x : x \in \mathbb{Z}^d\}$ and satisfying (for each $x$) $D_x \subseteq \widetilde{C}_y$ for some $y$ depending on $x$. First, we set $D_{v_1} = \widetilde{C}_{v_1}$. Given $D_{v_1}, D_{v_2}, \dots, D_{v_n}$, we define $E = \bigcup_{i=1}^{n} D_{v_i}$. If $v_{n+1} \in E$, we set $D_{v_{n+1}} = D_{v_j}$ for some $j$ such that $v_{n+1} \in D_{v_j}$. If $v_{n+1} \notin E$, we argue as follows. Let $\Delta E$ be the set of edges of $\mathbb{Z}^d$ having exactly one endpoint in $E$. We may find a (random) subset $F$ of $\widetilde{C}_{v_{n+1}}$ such that $F$ has the conditional distribution of $C_{v_{n+1}}$ given that all edges in $\Delta E$ are closed; we now set $D_{v_{n+1}} = F$. [It is here that we use the FKG inequality.] We obtain the stochastic domination accordingly. It follows by (5.10) that
$$K \le \limsup_{m\to\infty} \Bigl[ \sup\Bigl\{ \frac{1}{|\pi|} \sum_{x \in \pi} M_x : |\pi| = m \Bigr\} \Bigr] \quad \text{a.s.}$$
By Lemma 2 of [18], p. 760,
$$K \le 2 \limsup_{m\to\infty} \Bigl[ \sup\Bigl\{ \frac{1}{|\Gamma|} \sum_{x \in \Gamma} |\widetilde{C}_x|^2 : |\Gamma| = m \Bigr\} \Bigr] \quad \text{a.s.},$$
where the (inner) supremum is over all animals $\Gamma$ of $\mathbb{L}$ having $m$ vertices and containing the origin. Using the result of [16], the right side is a.s. finite so long as $|\widetilde{C}_x|^2$ has finite $(d + \epsilon)$th moment for some $\epsilon > 0$. The conclusion of Lemma 6 follows.

Acknowledgement. This work was aided by partial support from the European Union under contract CHRX–CT93–0411, and from the Engineering and Physical Sciences Research Council under grant GR/L15425.
References
1. Abraham, D.B. and Newman, C.M.: The wetting transition in a random surface model. J. Stat. Phys. 63, 1097–1111 (1991)
2. Aizenman, M.: Geometric analysis of $\phi^4$ fields and Ising models. Commun. Math. Phys. 86, 1–48 (1982)
3. Aizenman, M. and Barsky, D.J.: Sharpness of the phase transition in percolation models. Commun. Math. Phys. 108, 489–526 (1987)
4. Aizenman, M., Barsky, D.J., and Fernández, R.: The phase transition in a general class of Ising-type models is sharp. J. Stat. Phys. 47, 343–374 (1987)
5. Aizenman, M., Chayes, J.T., Chayes, L., Fröhlich, J., and Russo, L.: On a sharp transition from area law to perimeter law in a system of random surfaces. Commun. Math. Phys. 92, 19–69 (1983)
6. Aizenman, M., Chayes, J.T., Chayes, L., and Newman, C.M.: Discontinuity of the magnetization in one-dimensional $1/|x - y|^2$ Ising and Potts models. J. Stat. Phys. 50, 1–40 (1988)
7. Aizenman, M. and Fernández, R.: On the critical behavior of the magnetization in high-dimensional Ising models. J. Stat. Phys. 44, 393–454 (1986)
8. Aizenman, M. and Grimmett, G.R.: Strict monotonicity for critical points in percolation and ferromagnetic models. J. Stat. Phys. 63, 817–835 (1991)
9. Baxter, R.J.: Exactly Solved Models in Statistical Mechanics. London: Academic Press, 1982
10. van den Berg, J. and Kesten, H.: Inequalities with applications to percolation and reliability. J. Appl. Prob. 22, 556–569 (1985)
11. Bezuidenhout, C.E. and Grimmett, G.R.: The critical contact process dies out. Ann. Prob. 18, 1462–1482 (1990)
12. Bezuidenhout, C.E. and Grimmett, G.R.: Exponential decay for subcritical contact and percolation processes. Ann. Prob. 19, 984–1009 (1991)
13. Bezuidenhout, C.E., Grimmett, G.R., and Kesten, H.: Strict inequality for critical values of Potts models and random-cluster processes. Commun. Math. Phys. 158, 1–16 (1993)
14. Borgs, C. and Chayes, J.T.: The covariance matrix of the Potts model: a random cluster analysis. J. Stat. Phys. 82, 1235–1297 (1996)
15. Chayes, L.: The density of Peierls contours in d = 2 and the height of the wedding cake. J. Phys. A: Math. and Gen. 26, L481–L488 (1993)
16. Cox, J.T., Gandolfi, A., Griffin, P., and Kesten, H.: Greedy lattice animals I: Upper bounds. Adv. Appl. Prob. 3, 1151–1169 (1993)
17. Dobrushin, R.L. and Pecherski, E.A.: Uniqueness conditions for finitely dependent random fields. In: Random Fields, Esztergom (Hungary) 1979, vol. 1. Amsterdam: North-Holland, 1981, pp. 233–261
18. Fontes, L. and Newman, C.M.: First passage percolation for random colorings of $\mathbb{Z}^d$. Ann. Appl. Prob. 3, 746–762 (1993); Erratum 4, 254
19. Grimmett, G.R.: Percolation. Berlin: Springer-Verlag, 1989
20. Grimmett, G.R.: The random-cluster model. In: Probability, Statistics and Optimisation. ed. F.P. Kelly, Chichester: John Wiley & Sons, 49–63
21. Grimmett, G.R.: Probability and Phase Transition. Dordrecht: Kluwer, 1994
22. Grimmett, G.R.: Comparison and disjoint-occurrence inequalities for random-cluster models. J. Stat. Phys. 78, 1311–1324 (1995)
23. Grimmett, G.R.: The stochastic random-cluster process and the uniqueness of random-cluster measures. Ann. Prob. 23, 1461–1510 (1995)
24. Grimmett, G.R. and Stirzaker, D.R.: Probability and Random Processes: Problems and Solutions. Oxford: Oxford University Press, 1992
25. Hammersley, J.M.: Percolation processes. Lower bounds for the critical probability. Ann. Math. Stat. 28, 790–795 (1957)
26. Hintermann, D., Kunz, H., and Wu, F.Y.: Exact results for the Potts model in two dimensions. J. Stat. Phys. 19, 623–632 (1978)
27. Kesten, H.: On the time constant and path length of first-passage percolation. Adv. Appl. Prob. 12, 848–863 (1980)
28. Kesten, H.: Aspects of first-passage percolation. In: École d'Été de Probabilités de Saint Flour XIV-1984 (P.L. Hennequin, ed.), Lecture Notes in Mathematics no. 1180. Berlin: Springer, 1986, pp. 125–264
29. Kotecký, R. and Shlosman, S.: First order phase transitions in large entropy lattice systems. Commun. Math. Phys. 83, 493–515 (1982)
30. Laanait, L., Messager, A., Miracle-Sole, S., Ruiz, J., and Shlosman, S.: Interfaces in the Potts model I: Pirogov–Sinai theory of the Fortuin–Kasteleyn representation. Commun. Math. Phys. 140, 81–91 (1982)
31. Laanait, L., Messager, A., and Ruiz, J.: Phase coexistence and surface tensions for the Potts model. Commun. Math. Phys. 105, 527–545 (1986)
32. Lieb, E.H.: A refinement of Simon's correlation inequality. Commun. Math. Phys. 77, 127–135 (1980)
33. Menshikov, M.V.: Quantitative estimates and rigorous inequalities for critical points of a graph and its subgraphs. Th. Prob. Appl. 32, 544–547 (1980)
34. Menshikov, M.V.: Coincidence of critical points in percolation problems. Sov. Math. Dokl. 33, 856–859 (1986)
35. Menshikov, M.V., Molchanov, S.A., and Sidorenko, A.F.: Percolation theory and some applications. In: Itogi Nauki i Techniki (Series of Probability Theory, Mathematical Statistics, Theoretical Cybernetics) 24, 53–110 (1986)
36. Simon, B.: Correlation inequalities and the decay of correlations in ferromagnets. Commun. Math. Phys. 77, 111–126 (1980)
37. Welsh, D.J.A.: Percolation in the random-cluster process. J. Phys. A: Math. and Gen. 26, 2471–2483 (1993)

Communicated by J.L. Lebowitz
Commun. Math. Phys. 189, 481 – 496 (1997)
Communications in Mathematical Physics
© Springer-Verlag 1997
Reversibility in Infinite Hamiltonian Systems with Conservative Noise*

József Fritz(1), Carlangelo Liverani(2), Stefano Olla(3)

(1) Department of Probability and Statistics, Eötvös Loránd University of Sciences, H-1088 Budapest, Múzeum krt. 6-8, Hungary. E-mail: [email protected]
(2) II Università di Roma "Tor Vergata," Dipartimento di Matematica, 00133 Roma, Italy. E-mail: [email protected]
(3) Université de Cergy–Pontoise, Département de Mathématiques, 2 avenue Adolphe Chauvin, Pontoise 95302 Cergy–Pontoise Cedex, France and Centre de Mathématiques Appliquées, Ecole Polytechnique, 91128 Palaiseau Cedex, France. E-mail: [email protected]

Received: 26 September 1996 / Accepted: 3 January 1997
In Memoriam Roland Dobrushin

Abstract: The set of stationary measures of an infinite Hamiltonian system with noise is investigated. The model consists of particles moving in $\mathbb{R}^3$ with bounded velocities and subject to a noise that does not violate the classical laws of conservation, see [OVY]. Following [LO] we assume that the noise has also a finite radius of interaction, and prove that translation invariant stationary states of finite specific entropy are reversible with respect to the stochastic component of the evolution. Therefore the results of [LO] imply that such invariant measures are superpositions of Gibbs states.

0. Introduction

Let $\Omega$ denote the space of locally finite configurations $\omega = (q_\alpha, p_\alpha)_{\alpha \in I}$ indexed by a countable set $I$, that is $q_\alpha, p_\alpha \in \mathbb{R}^3$ are the position and momentum of particle $\alpha \in I$; the set $\{q_\alpha\}_{\alpha \in I}$ has no limit points in $\mathbb{R}^3$ by assumption. The classical dynamics of the system is governed by a formal Hamiltonian $H$,
$$H(\omega) = \sum_{\alpha \in I} \phi(p_\alpha) + \frac12 \sum_{\alpha \in I} \sum_{\beta \ne \alpha} V(q_\alpha - q_\beta),$$
where the kinetic energy φ : R3 7→ R is strictly convex with bounded derivatives, and V : R3 7→ R is a symmetric and superstable pair potential of finite range. The associated Liouville operator will be denoted by L , Lϕ =
X α∈I
?
h
∂H ∂ϕ ∂H ∂ϕ , i−h , i, ∂pα ∂qα ∂qα ∂pα
* Work partially supported by grants CIPA-CT92-4016 and CHRX-CT94-0460 of the Commission of the European Community, and by grant T 16665 of the Hungarian NSF. Two of us (J.F. and C.L.) acknowledge hospitality of the Erwin Schrödinger Institute.
where $\langle\cdot,\cdot\rangle$ denotes the usual scalar product in $\mathbb{R}^3$. Almost nothing is known about the ergodic properties of such infinite systems. In fact, very few results are available even for finite systems of this type (e.g., [KSS, DL, BLPS, LW]). To ensure proper ergodic behavior of the system we add some noise, thereby obtaining stochastic equations of motion; these equations read
\[ \begin{aligned} dq_\alpha &= \phi'(p_\alpha)\,dt, \\ dp_\alpha &= -\sum_{\beta\neq\alpha} V'(q_\alpha - q_\beta)\,dt + b_\alpha(\omega)\,dt + \sum_{\theta=1}^{d}\sum_{\beta\neq\alpha}\sigma^\theta_{\alpha,\beta}(\omega)\,dw^\theta_{\alpha,\beta}, \end{aligned} \tag{0.1} \]
where $w^\theta_{\alpha,\beta}$ is a family of independent one-dimensional Wiener processes for $\theta = 1, 2, \dots, d$ and $\alpha\neq\beta$ such that $w^\theta_{\alpha,\beta} = -w^\theta_{\beta,\alpha}$; $\phi'$ and $V'$ denote the gradients of $\phi$ and $V$, respectively. The coefficients $b_\alpha, \sigma^\theta_{\alpha,\beta}:\Omega\mapsto\mathbb{R}^3$ are smooth local functions to be specified in the next section in such a way that total energy and momentum are both preserved by the randomized evolution (0.1). In addition, any Gibbs state $\mathbb{P}$ with energy $H$ will be a reversible measure for the stochastic part of the evolution:
\[ \int \varphi(\omega)\,\hat L\psi(\omega)\,\mathbb{P}(d\omega) = \int \psi(\omega)\,\hat L\varphi(\omega)\,\mathbb{P}(d\omega) \tag{0.2} \]
for all smooth local functions $\varphi, \psi:\Omega\mapsto\mathbb{R}$, where
\[ \hat L\psi = \sum_{\alpha\in I}\Bigl\langle b_\alpha, \frac{\partial\psi}{\partial p_\alpha}\Bigr\rangle + \frac14\sum_{\theta=1}^{d}\sum_{\alpha\in I}\sum_{\beta\neq\alpha}\bigl\langle \sigma^\theta_{\alpha,\beta}, (D^2_{\alpha,\beta}\psi)\,\sigma^\theta_{\alpha,\beta}\bigr\rangle, \tag{0.3} \]
and $D^2_{\alpha,\beta}\psi$ is the matrix of second derivatives obtained by applying $D_{\alpha,\beta} = \partial/\partial p_\alpha - \partial/\partial p_\beta$ twice to $\psi$. Since the Liouville operator is antisymmetric with respect to Gibbs distributions, the full generator, $\tilde L = L + \hat L$, also satisfies the stationary Kolmogorov equation, $\int \tilde L\psi\,d\mathbb{P} = 0$, for a wide class of test functions $\psi$ and any Gibbs state $\mathbb{P}$.

The converse problem is much more complex. In our basic reference [LO] it is shown that if a translation invariant measure $Q$ with finite specific entropy satisfies the stationary Kolmogorov equation and (0.2), together with some other technical conditions, then $Q$ enjoys the Gibbs property. Let us remark that finiteness of specific entropy is a fairly natural and effective condition in the theory of hydrodynamic limits (see [OVY]). On the contrary, condition (0.2) looks rather restrictive and, at least in general, not particularly natural.

The main purpose of this paper is to show that condition (0.2) of reversibility is superfluous (i.e., it follows from the stationarity of the measure).¹ To obtain such a result we are forced to prove the existence of a semigroup defined by (0.1); its regularity (locality) will play a crucial role in the argument. This problem may not seem to be a very difficult one since $\phi'(p_\alpha)$, the velocity of particle $\alpha$, is bounded by assumption. However, the evolution must be defined for a very large set $\bar\Omega\subset\Omega$ of initial configurations: we need $Q(\bar\Omega) = 1$ for any probability measure $Q$ of finite specific entropy. On the other hand, to obtain the necessary regularity properties of the dynamics we have to restrict the configuration space by excluding extremely high values of particle density. We shall see that the desired construction fails unless the dimension of the space is less than four, cf. [FD] and [S].
¹ Note that in applications to hydrodynamics the reversibility (0.2) is ensured by construction (see [OVY], Lemma 4.4), hence the present paper does not add to hydrodynamics type problems for which the results in [LO] suffice. The focus here is on the classification of stationary translation invariant measures.
1. Notations and Results

Configurations can be interpreted as $\sigma$-finite integer valued measures on $\mathbb{R}^3\times\mathbb{R}^3$; sometimes we write $\omega = (q, p)$ with $q = (q_\alpha)_{\alpha\in I}$ and $p = (p_\alpha)_{\alpha\in I}$, and if $\Lambda\subset\mathbb{R}^3$, then $\omega_\Lambda$ denotes the restriction of $\omega$ to $\Lambda$, i.e. $\omega_\Lambda = (q_\alpha, p_\alpha)_{q_\alpha\in\Lambda}$, while $|\omega_\Lambda|$ is the cardinality of this set. The centered cubic box of side $r > 0$ will be denoted by $\Lambda_r$. Referring to functionals $\omega(\varphi) := \sum_{\alpha\in I}\varphi(q_\alpha, p_\alpha)$, where $\varphi:\mathbb{R}^3\times\mathbb{R}^3\mapsto\mathbb{R}$ is continuous with compact support, we equip $\Omega$ with the associated weak topology and Borel structure, and $C_0(\Omega)$ denotes the space of cylinder (local) functions $\Psi(\omega) = f(\omega(\varphi_1), \omega(\varphi_2), \dots, \omega(\varphi_n))$ such that $f\in C(\mathbb{R}^n)$. Since all sets $\Sigma(\delta)$, defined for an increasing sequence $\delta = (\delta_1, \delta_2, \dots)$ such that $\delta_1\ge 1$ by
\[ \Sigma(\delta) := \{\omega\in\Omega : |\omega_{\Lambda_n}|\le\delta_n \text{ and } |p_\alpha|\le\delta_n \text{ if } q_\alpha\in\Lambda_n\}, \]
are compact, we need not worry too much about topology. Indeed, in the forthcoming considerations we always do have an a priori bound allowing us to restrict calculations to some compact $\Sigma(\delta)$. In these situations all reasonable topologies coincide; moreover, any continuous function can be uniformly approximated by elements of $C_0(\Omega)$ in view of the Stone-Weierstrass Theorem.

Interaction. We consider a repelling pair potential $V$ of finite range such that $V(x) = V(-x)$ is twice continuously differentiable, $V(0) > 0$ but $V(x) = 0$ if $|x| > R_0$, and finally $\langle x, V'(x)\rangle\le 0$ for all $x\in\mathbb{R}^3$. These conditions imply that $V$ is superstable, see [R]: for each cubic box or ball $\Lambda\subset\mathbb{R}^3$ there exist some constants $A_\Lambda\ge 0$ and $B_\Lambda > 0$ such that, for any configuration, we have
\[ \sum_{\alpha:\,q_\alpha\in\Lambda}\sum_{\beta\neq\alpha} V(q_\alpha - q_\beta) \ge B_\Lambda|\omega_\Lambda|^2 - A_\Lambda|\omega_\Lambda|. \tag{1.1} \]
Kinetic energy. We assume that $\phi$ has bounded second derivatives, and velocities are also bounded, i.e. $|\phi'(y)|\le\bar c < +\infty$ for all $y\in\mathbb{R}^3$. To define Gibbs measures we need a lower bound: $\liminf_{|y|\to\infty}\phi(y)/|y|\ge c > 0$. When results of [LO] are applied an extra technical condition on $\phi$ is needed. For simplicity one can consider the case in which $\phi(y) = \sum_{i=1}^{3}\phi_0(y_i)$ for $y = (y_1, y_2, y_3)$, where $\phi_0\in C^\infty(\mathbb{R})$ is strictly convex and
\[ \frac12\frac{d^2}{du^2}\bigl(\phi_0''(u)\bigr)^2 = \phi_0'''(u)^2 + \phi_0''(u)\,\phi_0^{iv}(u) \neq 0 \]
apart from, at most, finitely many points (see [LO], Sect. Two, “Condition on the Noise” for the general condition that $\phi$ must satisfy). Notice that if $\frac{d^2}{du^2}(\phi_0''(u))^2 = 0$ for each $u$ and if we require the natural condition $\phi_0(u) = \phi_0(-u)$, then $\phi_0''$ is a constant, which is the classical case of a quadratic kinetic energy function.

Stochastic perturbation of classical dynamics. There are several ways to select the coefficients of the stochastic perturbation. We set
\[ b_\alpha(\omega) = \sum_{\beta\neq\alpha}\gamma_{\alpha,\beta}(q)\,F(p_\alpha, p_\beta) \quad\text{and}\quad \sigma^\theta_{\alpha,\beta}(\omega) = \sqrt{\gamma_{\alpha,\beta}(q)}\;G^\theta(p_\alpha, p_\beta), \tag{1.2} \]
where, as in [LO],² $\gamma_{\alpha,\beta}(q) = \gamma_{\beta,\alpha}(q)\ge 0$ is continuously differentiable, $\gamma_{\alpha,\beta}(q) > 0$ if $|q_\alpha - q_\beta| < R_1$ and it is zero for $|q_\alpha - q_\beta|\ge R_1$, i.e. $R_1 > R_0$ is the radius of stochastic interaction. The functions $F, G^\theta:\mathbb{R}^6\mapsto\mathbb{R}^3$ are infinitely differentiable and bounded together with their derivatives; they are chosen in such a way that the stochastic interaction also preserves the total momentum and energy of an interacting couple of particles. Moreover, $\{G^\theta\}_{\theta=1}^{d}$ spans, at each point, all $\mathbb{R}^3$. It is natural to assume that $\gamma_{\alpha,\beta}$ depends only on the interparticle distances, and it does not depend on a coordinate $q_\delta$ if $|q_\alpha - q_\delta| > R_2$ or $|q_\beta - q_\delta| > R_2$, where $R_2 > 2R_1$ is a constant. Therefore the stochastic interaction is also translation invariant and has a finite range $R_3 := R_1 + R_2$. A new feature of the present model is that $\gamma_{\alpha,\beta}$ vanishes when the number of particles near $q_\alpha$ or $q_\beta$ tends to infinity. For convenience, we set $\gamma_{\alpha,\beta}(q) = \sigma(q_\alpha - q_\beta)\,\Theta_{\alpha,\beta}(q)$, where
\[ \Theta_{\alpha,\beta}(q) := \Bigl(1 + \sum_{\delta\in I}\chi(q_\alpha - q_\delta) + \sum_{\delta\in I}\chi(q_\beta - q_\delta)\Bigr)^{-1}. \tag{1.3} \]
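In (1.3) $\sigma, \chi:\mathbb{R}^3\mapsto[0, 1]$ are twice continuously differentiable, $\sigma(x) = \sigma(-x) > 0$ if $|x| < R_1$ and it is zero for $|x|\ge R_1$. Similarly, $\chi(x) = \chi(-x) > 0$ if $|x|\le 2R_1$ and $\chi(x) = 0$ if $|x| > R_2$ with some $R_2 > 2R_1$. A technical condition, $|\chi'(x)|\le K\chi(x)^{1-\kappa}$, where $0 < \kappa < 1/9$, will be exploited in Lemma 2.4.

Since $w^\theta_{\alpha,\beta} = -w^\theta_{\beta,\alpha}$, the condition $F(p_\alpha, p_\beta) = -F(p_\beta, p_\alpha)$ of antisymmetry of $F$ clearly implies the conservation of total momentum. For convenience, we choose
\[ F(p_\alpha, p_\beta) := \frac12\sum_{\theta=1}^{d}\bigl\langle G^\theta(p_\alpha, p_\beta), D_{\alpha,\beta}\bigr\rangle\,G^\theta(p_\alpha, p_\beta) \quad\text{and}\quad X^\theta_{\alpha,\beta}\varphi := \frac{1}{\sqrt2}\bigl\langle G^\theta(p_\alpha, p_\beta), D_{\alpha,\beta}\varphi\bigr\rangle, \]
then the formal generator $\hat L$ of the random component of our process becomes³
\[ \hat L\varphi = \frac12\sum_{\theta=1}^{d}\sum_{\alpha\in I}\sum_{\beta\neq\alpha} X^\theta_{\alpha,\beta}\,\gamma_{\alpha,\beta}(q)\,X^\theta_{\alpha,\beta}\varphi. \tag{1.4} \]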
In this case the orthogonality relations
\[ \bigl\langle G^\theta(p_\alpha, p_\beta), \phi'(p_\alpha) - \phi'(p_\beta)\bigr\rangle = 0 \tag{1.5} \]
imply the formal conservation of energy, see [LO]. To have conservation of phase volume it is also assumed that
\[ \bigl\langle D_{\alpha,\beta}, G^\theta(p_\alpha, p_\beta)\bigr\rangle = 0, \tag{1.6} \]
i.e. the operators $X^\theta_{\alpha,\beta}\gamma_{\alpha,\beta}X^\theta_{\alpha,\beta}$ are symmetric with respect to Lebesgue measure $dp_\alpha\,dp_\beta$. As a consequence we shall see that the conservation laws imply the reversibility of Gibbs states with respect to $\hat L$. For an explicit example of $F$ and $G^\theta$, see the Appendix of [LO].

² In fact, in [LO], the functions $\gamma_{\alpha\beta}$ depend only on the variables $q_\alpha$ and $q_\beta$; yet all that is done there applies without changes to the situation described here.
³ The future requirements (1.5) and (1.6) imply that $X^{\theta\,*}_{\alpha,\beta} = -X^\theta_{\alpha,\beta}$, where the adjoint is taken with respect to the measure defined by the kinetic energy (see [LO]).
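
The role of the orthogonality relation (1.5) can be made explicit by a short computation. What follows is only a sketch, not a quotation from [LO]: it assumes that the noise attached to the pair $(\alpha,\beta)$ acts on observables through the first order operator $X^\theta_{\alpha,\beta}$ defined above, and it uses that $\gamma_{\alpha,\beta}$ depends on positions only.

```latex
% Sketch: X^\theta_{\alpha,\beta} annihilates the momentum and the kinetic
% energy of the interacting pair; e_i denotes the i-th unit vector of R^3.
\begin{align*}
D_{\alpha,\beta}\bigl(p^i_\alpha + p^i_\beta\bigr)
  &= e_i - e_i = 0, \\
D_{\alpha,\beta}\bigl(\phi(p_\alpha)+\phi(p_\beta)\bigr)
  &= \phi'(p_\alpha)-\phi'(p_\beta), \\
X^\theta_{\alpha,\beta}\bigl(\phi(p_\alpha)+\phi(p_\beta)\bigr)
  &= \frac{1}{\sqrt2}\,\bigl\langle G^\theta(p_\alpha,p_\beta),\,
     \phi'(p_\alpha)-\phi'(p_\beta)\bigr\rangle = 0
     \qquad\text{by (1.5).}
\end{align*}
```

Since each term of (1.4) has the form $X^\theta_{\alpha,\beta}\gamma_{\alpha,\beta}X^\theta_{\alpha,\beta}$, any function that is annihilated by $X^\theta_{\alpha,\beta}$ is formally annihilated by the corresponding term of $\hat L$ as well; the potential energy is unaffected because $X^\theta_{\alpha,\beta}$ differentiates momenta only.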
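Gibbs measures. Let $\lambda = (\lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4)$ be a set of real parameters with $\lambda_4 > 0$ and $\lambda_1^2 + \lambda_2^2 + \lambda_3^2 < c^2$, and denote by $\Pi$ the distribution of a Poisson process of unit intensity in $\mathbb{R}^3$. A probability measure $\mathbb{P}$ on $\Omega$ is called a Gibbs state for $H$ with parameters $\lambda$ if its conditional distributions given the configuration outside of any cubic box $\Lambda\subset\mathbb{R}^3$ can be represented as
\[ \mathbb{P}[d\omega_\Lambda\,|\,\omega_{\Lambda^c}] = \frac{1}{Z_\Lambda}\exp\Bigl[\lambda_0|\omega_\Lambda| + \sum_{\alpha=1}^{n}\sum_{i=1}^{3}\lambda_i p^i_\alpha - \lambda_4 H_\Lambda(\omega_\Lambda, \omega_{\Lambda^c})\Bigr]\,\Pi(dq_\Lambda)\,dp_\Lambda, \]
where $Z_\Lambda$ is the normalization, and a natural decomposition $\omega_\Lambda = (q_\Lambda, p_\Lambda)$ is used, see [D]. The local Hamiltonian $H_\Lambda$ is defined as
\[ H_\Lambda(\omega_\Lambda, \omega_{\Lambda^c}) = \sum_{q_\alpha\in\omega_\Lambda}\phi(p_\alpha) + \frac12\sum_{q_\alpha, q_\beta\in\omega_\Lambda;\ \alpha\neq\beta} V(q_\alpha - q_\beta) + \sum_{q_\alpha\in\omega_\Lambda,\ q_\beta\in\omega_{\Lambda^c}} V(q_\alpha - q_\beta); \]
the set of such measures will be denoted by $\mathcal{P}_\lambda$, see [R] for the existence of Gibbs states for superstable interactions.

Relative entropy. Let $Q$ and $P$ be probability measures on $\Omega$, and for any $\Lambda\subset\mathbb{R}^3$ denote by $\mathcal{F}_\Lambda$ the set of continuous and bounded functions $\psi:\Omega\mapsto\mathbb{R}$ such that $\psi(\omega) = \psi(\omega_\Lambda)$ for all $\omega\in\Omega$. The entropy of $Q$ in $\Lambda$, relative to $P$, is defined by
\[ H_\Lambda(Q|P) = \sup_{\psi\in\mathcal{F}_\Lambda}\bigl\{\mathbb{E}^Q(\psi) - \log\mathbb{E}^P(e^\psi)\bigr\}, \tag{1.7} \]
where $\mathbb{E}^Q$ denotes the expectation with respect to the probability measure $Q$. If $\Lambda = \mathbb{R}^3$ then the subscript $\Lambda$ of $H_\Lambda$ will be omitted; for properties of $H_\Lambda$, see, for example, [OVY]. As a reference measure a distinguished, translation invariant Gibbs state $P = \mathbb{P}$ will be chosen. We say that $Q$ has finite specific entropy if there exists a constant $C$ such that $H_\Lambda(Q|\mathbb{P})\le C(1 + |\Lambda|)$ for any cubic box $\Lambda$. If $Q$ is translation invariant with finite specific entropy, then the particle density $\rho = \rho(\omega)$ is $Q$-a.s. defined as the following limit taken along any increasing sequence of cubic boxes, see [LO],
\[ \rho(\omega) = \lim_{|\Lambda|\to\infty}|\Lambda|^{-1}|\omega_\Lambda|. \]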
Main results of [LO] for the system under consideration can be summarized as follows.

Theorem 1.1. Suppose that $Q$ is a translation invariant probability measure on $\Omega$ with finite specific entropy, and let $\rho_c := 3/(4\pi R_1^3)$. If
(i) $Q[\rho(\omega) > \rho_0] = 1$ for some $\rho_0 > \rho_c$,
(ii) $Q$ is invariant with respect to $\tilde L = L + \hat L$ in the sense that, for any smooth local function $\psi$, we have $\mathbb{E}^Q(\tilde L\psi) = 0$,
(iii) $Q$ is reversible with respect to $\hat L$, i.e. $\mathbb{E}^Q(\psi\hat L\varphi) = \mathbb{E}^Q(\varphi\hat L\psi)$ for any pair $\varphi, \psi$ of smooth local functions,
then $Q$ is a convex combination of Gibbs states.
Statement of the result. Notice that the theorem above is stated without any reference to the existence of the infinite dynamics; properties (ii) and (iii) of invariance are purely formal. However, the extraction of such local information as reversibility is usually based on a method of Liapunov functions, namely entropy and its rate of change are compared, so the first step of our argument is intrinsically related to the evolution.

Theorem 1.2. Under the conditions on the stochastic dynamics listed above, there exists an explicitly defined set $\bar\Omega\subset\Omega$ such that $Q(\bar\Omega) = 1$ for each $Q$ with finite specific entropy. Moreover, for each $\omega_0\in\bar\Omega$ we have a unique strong solution $\omega(t)$, $t\ge 0$, to (0.1) such that $\omega(0) = \omega_0$ and $\omega(t)\in\bar\Omega$ a.s. The solution is a measurable function of the initial configuration, and every Gibbs state $\mathbb{P}\in\mathcal{P}_\lambda$ with $\lambda_4 > 0$ and $\lambda_1 = \lambda_2 = \lambda_3 = 0$ is a stationary measure for the random evolution.

This theorem is proven in the next section; solutions are defined by a limiting procedure starting from finite systems. The restriction on the parameters of a Gibbs measure in the last statement could have been removed by elaborating some technical details, but we do not need such a general assertion. Having constructed the infinite evolution we can consider stationary measures instead of simply measures formally invariant as in Theorem 1.1 (ii).⁴

Theorem 1.3. Every translation invariant stationary measure with finite specific entropy is reversible with respect to the stochastic part $\hat L$ of the generator; that is, condition (iii) in Theorem 1.1 holds.

The proof of Theorem 1.3 is the content of Sect. 3. Combining the above results we get the final result of the paper:

Theorem 1.4. Let $Q$ be a translation invariant stationary measure with finite specific entropy, then condition (i) of Theorem 1.1 implies that $Q$ is a superposition of Gibbs states.
2. Infinite Dynamics

We start this section by describing the set of allowed initial configurations. Although the definition is a bit technical, our choice boils down to configurations for which the energy in a box does not grow too fast with respect to the size of the box. The exact meaning of this construction will become clearer later on when the desired a priori bounds for a family of partial dynamics and the requirements for the existence of a unique limiting dynamics are discussed.

Initial conditions. Let $H_m(\omega, r)$ denote the total energy of $\omega\in\Omega$ in a ball $B_m(r)\subset\mathbb{R}^3$ of center $m$ and radius $r\le\infty$, i.e. $H_m(\omega, r) := H(\omega_{B_m(r)})$; the number of points of $q$ in $B_{q_\alpha}(r)$ will be denoted as $N_\alpha(q, r)$. For $\kappa\in(0, 1/9)$, see (1.3), and $r\ge R_3 = R_1 + R_2$, define
\[ \bar H_{\kappa,r}(\omega) := \sup_{|m|\le r}\frac{H_m(\omega, R_3)}{1 + |m|^{3+2\kappa}}, \qquad \bar\Omega_{\kappa,r}(h) = \{\omega\in\Omega : \bar H_{\kappa,r}(\omega)\le h\}. \]
4 Notice that, since the infinite dynamics satisfies Eqs. (0.1), if Q is stationary, then it satisfies (ii) of Theorem 1.1.
Let $\mathcal{Q}_r(k)$ be the set of Borel probability measures on $\Omega$ such that $Q(H_m(\omega, R_3))\le k$ for all $|m|\le r$. Now the set of all allowed configurations is defined as
\[ \bar\Omega_{\kappa,\infty} = \bigcup_{h>0}\bar\Omega_{\kappa,\infty}(h) = \{\omega\in\Omega : \bar H_{\kappa,\infty}(\omega) < \infty\}. \]
Remember that the level sets $\bar\Omega_{\kappa,\infty}(h)$ are compact in the weak topology of $\Omega$ and, in view of (1.1), $N_\alpha(q, R_3) = O(\sqrt{h}\,L^{3/2+\kappa})$ for $q_\alpha\in B_m(L - R_3)$ and $(q, p)\in\bar\Omega_{\kappa,L}(h)$. We shall see that the initial condition for the existence and uniqueness of the limiting dynamics could have been formulated in terms of $N_\alpha$ only, but a preservation of bounds on kinetic energy will be needed when we prove locality of the dynamics.

Lemma 2.1. If $\kappa > 0$ then for any fixed $k > 0$ we have
\[ \lim_{h\to\infty}\ \inf_{r\ge R_3}\ \inf_{Q\in\mathcal{Q}_r(k)} Q(\bar\Omega_{\kappa,r}(h)) = 1, \]
that is $Q(\bar\Omega_{\kappa,\infty}) = 1$ if $Q\in\mathcal{Q}_\infty := \bigcup_{k>0}\mathcal{Q}_\infty(k)$.

Proof. This statement is a direct consequence of the Markov inequality. In fact we have some universal $v > 0$ such that (by $v\mathbb{Z}^3$ we denote the three-dimensional cubic lattice of size $v$)
\[ Q(\bar\Omega_{\kappa,r}(h)^c) \le \sum_{m\in B_0(r)\cap v\mathbb{Z}^3} Q\bigl[H_m(\omega, R_3) > vh(1 + |m|^{3+2\kappa})\bigr] \le \sum_{m\in B_0(r)\cap v\mathbb{Z}^3}\frac{Q(H_m)}{h(1 + |m|^{3+2\kappa})\,v} \le \frac{k}{vh}\sum_{m\in v\mathbb{Z}^3}\frac{1}{1 + |m|^{3+2\kappa}}, \]
which proves the statement for any $\kappa > 0$.
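
The last sum in the proof is finite precisely because $\kappa > 0$. A sketch of the standard comparison with an integral is given below; the constant $C_v$, depending only on the lattice spacing $v$ and on $\kappa$, is introduced here for illustration and does not appear in the original argument.

```latex
% Sketch: the lattice sum is dominated by a radial integral.
\begin{align*}
\sum_{m\in v\mathbb{Z}^3}\frac{1}{1+|m|^{3+2\kappa}}
 &\le C_v\int_{\mathbb{R}^3}\frac{dx}{1+|x|^{3+2\kappa}}
  = 4\pi\,C_v\int_0^\infty\frac{\rho^{2}\,d\rho}{1+\rho^{3+2\kappa}} < \infty ,
\end{align*}
% since the radial integrand decays like \rho^{-1-2\kappa} as \rho\to\infty,
% which is integrable at infinity if and only if \kappa > 0.
```

The resulting bound on $Q(\bar\Omega_{\kappa,r}(h)^c)$ is therefore of order $k/h$, uniformly in $r\ge R_3$ and $Q\in\mathcal{Q}_r(k)$, and it vanishes as $h\to\infty$.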
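Observe that the entropy condition $H_\Lambda(Q|\mathbb{P})\le C(1 + |\Lambda|)$ implies $Q\in\mathcal{Q}_\infty$ via (1.7) and (1.1), see Lemma 3.1 of [LO].

Local dynamics. There are several ways to define a family of partial dynamics; the advantage of the following construction consists in its direct relation to Gibbs states. Let $a:\mathbb{R}^3\mapsto[0, 1]$ be twice continuously differentiable with compact support. We assume also $|a'(x)|\le 1$ for all $x\in\mathbb{R}^3$. We interpret $a$ as a smooth version of the indicator function of a ball; its concrete shape is not very important. For every such cutoff $a$ and inverse temperature $\lambda_4 > 0$ we consider a system of stochastic differential equations,
\[ \begin{aligned} dq_\alpha &= -\frac{1}{\lambda_4}\,e^{\lambda_4 H_\Lambda(\omega_\Lambda,\omega_{\Lambda^c})}\frac{\partial}{\partial p_\alpha}\Bigl(a(q_\alpha)\,e^{-\lambda_4 H_\Lambda(\omega_\Lambda,\omega_{\Lambda^c})}\Bigr)\,dt, \\ dp_\alpha &= \frac{1}{\lambda_4}\,e^{\lambda_4 H_\Lambda(\omega_\Lambda,\omega_{\Lambda^c})}\frac{\partial}{\partial q_\alpha}\Bigl(a(q_\alpha)\,e^{-\lambda_4 H_\Lambda(\omega_\Lambda,\omega_{\Lambda^c})}\Bigr)\,dt + a(q_\alpha)\sum_{\beta\neq\alpha}\gamma_{\alpha,\beta}(q)\,a(q_\beta)\,F(p_\alpha, p_\beta)\,dt \\ &\quad + \sqrt{a(q_\alpha)}\sum_{\theta=1}^{d}\sum_{\beta\neq\alpha}\sqrt{a(q_\beta)\,\gamma_{\alpha,\beta}(q)}\;G^\theta(p_\alpha, p_\beta)\,dw^\theta_{\alpha,\beta}, \end{aligned} \tag{2.1} \]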
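where it is assumed that $\Lambda\subset\mathbb{R}^3$ is bounded and contains the support of $a$ in its interior; in such a situation the equations above do not depend on the particular choice of $\Lambda$. Notice that in a region where $a = 1$ our particles follow the original equations of motion, while they are frozen outside of the support of $a$, i.e. $\dot q_\alpha = \dot p_\alpha = 0$. Particles approaching the boundary of the support of $a$ slow down, thus we have a smooth transition between moving and frozen particles, see [F1] for a similar construction. This means that we essentially have a finite dimensional diffusion. Let $P^t_{\lambda_4,a}$ denote the Markov semigroup induced by partial dynamics (2.1), i.e. $P^t_{\lambda_4,a}\psi(\omega) := \mathbb{E}_w(\psi(\omega(t)))$, where $\omega(t)$ is the solution with initial condition $\omega(0) = \omega$, $\psi:\Omega\mapsto\mathbb{R}$ is continuous and bounded, while $\mathbb{E}_w$ denotes the expectation with respect to the joint distribution of our Wiener processes. By a direct application of the Ito lemma we see that the (formal) generator of $P^t_{\lambda_4,a}$ decomposes as $\tilde L_{\lambda_4,a} = L_{\lambda_4,a} + \hat L_a$, where
\[ \begin{aligned} L_{\lambda_4,a}\psi &= -\frac{1}{\lambda_4}\sum_{\alpha\in I}\Bigl\langle e^{\lambda_4 H_\Lambda(\omega_\Lambda,\omega_{\Lambda^c})}\frac{\partial}{\partial p_\alpha}\Bigl(a(q_\alpha)\,e^{-\lambda_4 H_\Lambda(\omega_\Lambda,\omega_{\Lambda^c})}\Bigr), \frac{\partial\psi}{\partial q_\alpha}\Bigr\rangle \\ &\quad + \frac{1}{\lambda_4}\sum_{\alpha\in I}\Bigl\langle e^{\lambda_4 H_\Lambda(\omega_\Lambda,\omega_{\Lambda^c})}\frac{\partial}{\partial q_\alpha}\Bigl(a(q_\alpha)\,e^{-\lambda_4 H_\Lambda(\omega_\Lambda,\omega_{\Lambda^c})}\Bigr), \frac{\partial\psi}{\partial p_\alpha}\Bigr\rangle, \\ \hat L_a\psi &= \frac12\sum_{\theta=1}^{d}\sum_{\alpha\in I}\sum_{\beta\neq\alpha}\gamma_{\alpha,\beta}(q)\,a(q_\alpha)\,a(q_\beta)\,X^\theta_{\alpha,\beta}(X^\theta_{\alpha,\beta}\psi). \end{aligned} \tag{2.2} \]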
Since the coefficients of (2.1) are bounded smooth functions, we have a differentiable dependence of solutions on initial values. Therefore a class $\mathcal{D}_a$ of twice continuously differentiable functions forms a common core of $L_{\lambda_4,a}$ and $\hat L_a$, e.g. in the space of continuous and bounded functions. An extension to the space $L^2(\mathbb{P}_\lambda)$ of square integrable functions with respect to a distinguished Gibbs state $\mathbb{P}_\lambda$ follows by the next lemma.

Lemma 2.2. Let $\lambda_1 = \lambda_2 = \lambda_3 = 0$ while $\lambda_4 > 0$, then every Gibbs state $\mathbb{P}\in\mathcal{P}_\lambda$ satisfies
\[ \mathbb{E}^{\mathbb{P}}(\psi_1 L_{\lambda_4,a}\psi_2) = -\mathbb{E}^{\mathbb{P}}(\psi_2 L_{\lambda_4,a}\psi_1) \quad\text{and}\quad \mathbb{E}^{\mathbb{P}}(\psi_1\hat L_a\psi_2) = \mathbb{E}^{\mathbb{P}}(\psi_2\hat L_a\psi_1) \]
for $\psi_1, \psi_2\in\mathcal{D}_a$, consequently $\mathbb{P}$ is a stationary measure of the process $P^t_{\lambda_4,a}$ for each cutoff $a$.

Proof. Both symmetry relations follow from the definition of $\mathbb{P}$ by integrating by parts. The property of reversibility is a direct consequence of (1.6). Integration by parts with respect to positions is possible because of the presence of the cutoff $a$.

Since (2.1) violates the law of momentum conservation in regions where $a$ is not a constant, Lemma 2.2 is not true for general Gibbs measures.

Construction of the infinite dynamics. First we derive an a priori bound for local dynamics; we show that the set of initial conditions is preserved for all $t > 0$ and the related bound does not depend on the particular choice of the cutoff function $a$.

Lemma 2.3. There exists a constant $c_1$ depending only on $\lambda_4$ and on the parameters of the infinite system (0.1) such that
\[ \mathbb{E}_w\bigl(H_m(\omega_a(t), R_3)\bigr) \le (c_1 + c_1 t)\bigl(1 + H_m(\omega_a(0), R_3 + \bar c t)\bigr) \]
for all $m$, $t$ and $a$, where $\omega_a(t)$ is any solution to (2.1).
Proof. Since all velocities are bounded by $\bar c$, we have $|q_\alpha(t) - q_\alpha(0)|\le\bar c t$, whence $N_\alpha(q(t), r)\le N_\alpha(q(0), r + \bar c t)$, which yields an explicit deterministic bound for the potential energy via superstability (1.1). On the other hand,
\[ |b_\alpha(\omega)| + \sum_{\theta=1}^{d}\sum_{\beta\neq\alpha}|\sigma^\theta_{\alpha,\beta}(\omega)|^2 \le c_1' N_\alpha(q, R_1), \]
and the same bound holds true for the corresponding coefficients of (2.1). From the stochastic equations by the Schwarz inequality we get
\[ \mathbb{E}_w|p_\alpha(t) - p_\alpha(0)|^2 \le c_1''\,t(1 + t)\,N_\alpha(q(0), R_1 + \bar c t)^2, \tag{2.3} \]
which completes the proof. Indeed, as $\phi(y)\le\phi(0) + \bar c|y|$, taking the square root of both sides and summing for $\alpha\in I$ such that $|q_\alpha(0) - m|\le R_3 + \bar c t$, we get a bound for $H_m(\omega(t), R_3)$; the square of the number of points at time zero is estimated again by superstability.

To prove the existence of limiting solutions when the cutoff is removed we have to compare different partial solutions. Let $\mathcal{A}_L$ denote the set of twice continuously differentiable $a:\mathbb{R}^3\mapsto[0, 1]$ with compact support such that $|a'(x)|\le 1$ for all $x$ and $a(x) = 1$ if $|x|\le L$. For $a, \bar a\in\mathcal{A}_L$ let $\omega(t) = (q(t), p(t))$ and $\bar\omega(t) = (\bar q(t), \bar p(t))$ denote the corresponding solutions to (2.1) with a common initial value $\omega(0) = \bar\omega(0) = (\xi, \eta)$. Supposing $|x|\le L - 2\bar c t$, for $|x - \xi_\alpha|\le R_0 + 2\bar c t$ we get $\partial_t|q_\alpha - \bar q_\alpha|\le K_0|p_\alpha - \bar p_\alpha|$, whence
\[ |q_\alpha(t) - \bar q_\alpha(t)|^2 \le 2K_0\int_0^t |q_\alpha(s) - \bar q_\alpha(s)|\,|p_\alpha(s) - \bar p_\alpha(s)|\,ds; \tag{2.4} \]
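here and in what follows, $K_0, K_1, K_1', \dots$ denote constants depending only on the coefficients of the infinite system. The case of the momentum variables is more complex. If $a = \bar a = 1$ can be assumed as before, then by Ito's formula we get
\[ \mathbb{E}_w|p_\alpha(t) - \bar p_\alpha(t)|^2 = \mathbb{E}_w\int_0^t\bigl(J_{\alpha,1}(s) + J_{\alpha,2}(s) + J_{\alpha,3}(s)\bigr)\,ds, \]
where
\[ \begin{aligned} J_{\alpha,1} &= -2\sum_{\beta\neq\alpha}\bigl\langle p_\alpha - \bar p_\alpha,\ V'(q_\alpha - q_\beta) - V'(\bar q_\alpha - \bar q_\beta)\bigr\rangle, \\ J_{\alpha,2} &= 2\sum_{\beta\neq\alpha}\bigl\langle p_\alpha - \bar p_\alpha,\ \gamma_{\alpha,\beta}(q)F(p_\alpha, p_\beta) - \gamma_{\alpha,\beta}(\bar q)F(\bar p_\alpha, \bar p_\beta)\bigr\rangle, \\ J_{\alpha,3} &= \sum_{\theta=1}^{d}\sum_{\beta\neq\alpha}\Bigl|\sqrt{\gamma_{\alpha,\beta}(q)}\,G^\theta(p_\alpha, p_\beta) - \sqrt{\gamma_{\alpha,\beta}(\bar q)}\,G^\theta(\bar p_\alpha, \bar p_\beta)\Bigr|^2. \end{aligned} \]
Introduce now $\Gamma_i$ for $i = 1, 2$ and $r, t\ge 0$ by
\[ \Gamma_1(\xi, \eta, a, \bar a; r, t) := \sum_{\alpha:\,|\xi_\alpha|\le r}|q_\alpha(t) - \bar q_\alpha(t)|^2; \]
in the definition of $\Gamma_2$ the variables $q_\alpha$ and $\bar q_\alpha$ should be replaced by $p_\alpha$ and $\bar p_\alpha$, respectively. Our main tool is the following: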
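Lemma 2.4. Suppose that $\kappa < 1/9$, $2\bar c T < R_1$. For all $r\ge R_3$, $t\le T$, $h > 0$ and $i = 1, 2$ we have
\[ \lim_{L\to\infty}\ \sup_{a,\bar a\in\mathcal{A}_L}\ \sup_{(\xi,\eta)\in\bar\Omega_{\kappa,L}(h)}\mathbb{E}_w\bigl(\Gamma_i(\xi, \eta, a, \bar a; r, t)\bigr) = 0, \]
and the convergence is uniform on the time interval $[0, T]$.

Proof. The idea of the proof is to define and to estimate a “distance” (based on $\Gamma_i$) among different partial dynamics in boxes of radius $r < L$. This will lead us to Eq. (2.9), in which such a distance in a given box is related to the distance in a larger box; the result will easily follow.

Let $\bar N = \bar N_{L,T} := \max N_\alpha(\xi, R_3 + 2\bar c T)$ for all $\alpha\in I$ such that $|\xi_\alpha| + R_3 + 2\bar c T\le L$. Suppose $r + R_3 + 2\bar c T < L$, $|\xi_\alpha|\le r$, and remember that $|q_\delta(t) - \xi_\delta|\le\bar c t$ is always true. The uniform Lipschitz continuity of $V'$, $\sigma$, $F$ and $G^\theta$ shall also be used without any further reference in the following calculations. Let $\tilde\gamma = \tilde\gamma_{\alpha,\beta}(t)$ denote any matrix such that $0\le\tilde\gamma_{\alpha,\beta}(t)\le 1$ if $t\le T$, moreover $\tilde\gamma_{\alpha,\beta}(t) = 0$ whenever $|\xi_\alpha - \xi_\beta|\ge R_3 + 2\bar c t$. For $J_1$ we get
\[ J_{\alpha,1}(t) \le K_1|p_\alpha - \bar p_\alpha|\sum_{\beta\in I}\tilde\gamma_{\alpha,\beta}(t)\bigl(|q_\alpha - \bar q_\alpha| + |q_\beta - \bar q_\beta|\bigr) =: K_1\tilde J_{\alpha,1}(t). \tag{2.5} \]
In the case of $J_2$ the pattern $|ax - by|\le\min\{|a|, |b|\}\,|x - y| + |a - b|\max\{|x|, |y|\}$ is used several times to derive
\[ \begin{aligned} J_{\alpha,2}(t) &\le K_2\tilde J_{\alpha,1}(t) + K_2|p_\alpha - \bar p_\alpha|\sum_{\beta\neq\alpha}\gamma_{\alpha,\beta}(q)\bigl(|p_\alpha - \bar p_\alpha| + |p_\beta - \bar p_\beta|\bigr) \\ &\quad + K_2|p_\alpha - \bar p_\alpha|\sum_{\beta\neq\alpha}\sum_{\delta\in I}\sigma(q_\alpha - q_\beta)\,|\partial_\delta\Theta_{\alpha,\beta}(\tilde q^{\alpha,\beta})|\,|q_\delta - \bar q_\delta|, \end{aligned} \]
where $\partial_\delta := \partial/\partial q_\delta$ and $\tilde q^{\alpha,\beta}$ is an intermediate configuration on the line segment connecting $q$ and $\bar q$. Observe that by Hölder's inequality
\[ \begin{aligned} \sum_{\delta\in I}|\partial_\delta\Theta_{\alpha,\beta}(\tilde q)| &\le K\,\Theta_{\alpha,\beta}(\tilde q)^2\sum_{\delta\in I}\bigl(\chi(\tilde q_\alpha - \tilde q_\delta)^{1-\kappa} + \chi(\tilde q_\beta - \tilde q_\delta)^{1-\kappa}\bigr) \\ &\le K\,\Theta_{\alpha,\beta}(\tilde q)\bigl(N_\alpha(\tilde q, R_2)^\kappa + N_\beta(\tilde q, R_2)^\kappa\bigr). \end{aligned} \tag{2.6} \]
On the other hand, $\sigma(q_\alpha - q_\beta) > 0$ implies $|\tilde q^{\alpha,\beta}_\alpha - \tilde q^{\alpha,\beta}_\beta|\le R_1 + 2\bar c t\le 2R_1$, i.e. $N_\alpha(q, R_1)\le N_\alpha(\tilde q^{\alpha,\beta}, 2R_1)$. This means that $N_\alpha(q, R_1)\le 1/\Theta_{\alpha,\beta}(\tilde q^{\alpha,\beta})$, consequently
\[ J_{\alpha,2}(t) \le K_2'\,\bar N^{\kappa}_{L,T}\,\tilde J_{\alpha,1}(t) + K_2'\,\tilde J_{\alpha,2}(t); \qquad \tilde J_{\alpha,2}(t) := \sum_{\beta\neq\alpha}\gamma_{\alpha,\beta}(q)\bigl(|p_\alpha - \bar p_\alpha|^2 + |p_\beta - \bar p_\beta|^2\bigr). \tag{2.7} \]
In a similar way we get
\[ \begin{aligned} J_{\alpha,3}(t) &\le K_3\tilde J_{\alpha,2}(t) + K_3\tilde J_{\alpha,3}(t) + K_3\sum_{\beta\neq\alpha}\frac{\sigma(q_\alpha - q_\beta)}{\Theta_{\alpha,\beta}(\tilde q^{\alpha,\beta})}\Bigl(\sum_{\delta\in I}|\partial_\delta\Theta_{\alpha,\beta}(\tilde q^{\alpha,\beta})|\,|q_\delta - \bar q_\delta|\Bigr)^2, \\ \tilde J_{\alpha,3}(t) &:= \sum_{\delta\in I}\tilde\gamma_{\alpha,\beta}(t)\bigl(|q_\alpha - \bar q_\alpha|^2 + |p_\delta - \bar p_\delta|^2\bigr), \end{aligned} \]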
whence by the Cauchy inequality and (2.6)
\[ J_{\alpha,3}(t) \le K_3'\,\tilde J_{\alpha,2} + K_3'\,\bar N^{2\kappa}_{L,T}\,\tilde J_{\alpha,3}(t). \tag{2.8} \]
Introduce now $d(r, t) := \mathbb{E}_w(\Gamma_2(\xi, \eta, a, \bar a; r, t)) + \bar N_{L,T}\,\mathbb{E}_w(\Gamma_1(\xi, \eta, a, \bar a; r, t))$ for $t < T$. Comparing (2.4), (2.5)–(2.8) and using the elementary inequality $2|p_\alpha - \bar p_\alpha|\,|q_\delta - \bar q_\delta|\le\bar N^{-1/2}|p_\alpha - \bar p_\alpha|^2 + \bar N^{1/2}|q_\delta - \bar q_\delta|^2$, we obtain, by a direct calculation,
\[ d(r, t) \le K_4\,\bar N^{1/2+\kappa}_{L,T}\int_0^t d(r + R_3 + 2\bar c T, s)\,ds, \tag{2.9} \]
which completes the proof by a standard iteration procedure. Indeed, we get
\[ d(r, t) \le \frac{T^{\ell+1}}{\ell!}\bigl(K_4\,\bar N^{1/2+\kappa}_{L,T}\bigr)^{\ell}\,\sup_t d(L, t), \tag{2.10} \]
where $\ell$, the number of allowed iterations, is at least $c_T L$ with $c_T > 0$ depending only on $R_3$ and $T$, while $\bar N_{L,T} = O(\sqrt{h}\,L^{3/2+\kappa})$. Using $|q_\alpha(t) - \xi_\alpha|\le\bar c t$ and the second a priori bound (2.3), we see that the right-hand side of (2.10) vanishes as $L\to+\infty$ because $\ell! = O\bigl((\ell/e)^\ell\bigr)$ and $(1/2 + \kappa)(3/2 + \kappa) < 1$ by hypothesis.

Now we are in a position to prove the existence and uniqueness of limiting solutions to (0.1). Let us consider a sequence of partial solutions $\omega_n = \omega_n(t)$, $n\in\mathbb{N}$, of (2.1) with a common initial value $\omega_n(0) = (\xi, \eta)\in\bar\Omega_{\kappa,\infty}$; the corresponding cutoff $a_n:\mathbb{R}^3\mapsto\mathbb{R}$ is assumed to be a decreasing smooth function of $|x|$ such that $a_n(x) = 1$ if $|x|\le n$ and $a_n(x) = 0$ if $|x| > n + 1$. In view of Lemma 2.4, $\omega_n$ converges in probability to some limit $\omega(t)$ for each $t < T = R_1/(2\bar c)$. It is easy to verify that the limit satisfies the infinite system (0.1); the uniqueness of limiting solutions follows again by Lemma 2.4. Since $T$ does not depend on the initial configuration $(\xi, \eta)\in\bar\Omega_{\kappa,\infty}$, the construction extends to all times.

Properties of the infinite dynamics. Let $P^t_n$ denote the Markov semigroup induced by the partial dynamics $\omega_n$. Since $\omega_n(t)$ is a continuous function of the initial data, it is well defined by $P^t_n\psi(\xi, \eta) := \mathbb{E}_w(\psi(\omega_n(t)))$ if $\omega_n(0) = (\xi, \eta)$ for any measurable and bounded $\psi:\bar\Omega_{\kappa,\infty}\mapsto\mathbb{R}$. As a limit of measurable functions, the limiting solution $\omega(t)$ is again a jointly measurable function of $(\xi, \eta)$ and the random element representing the Wiener processes $w^\theta_{\alpha,\beta}$; the limiting semigroup, $P^t$, can be defined in the same way. If the initial configuration is distributed by $Q\in\mathcal{Q}_\infty$, then $QP^t_n$ and $QP^t$ denote the evolved measure at time $t > 0$. In view of Lemma 2.1 and Lemma 2.3 we know that $QP^t\in\mathcal{Q}_\infty$, too. While $P^t_n$ has fairly good regularity properties, semigroup theory does not apply directly to the limiting case. Nevertheless, all we need in the next section is summarized as follows.
Lemma 2.5. Suppose that $\psi:\bar\Omega_{\kappa,\infty}\mapsto\mathbb{R}$ is a continuous and bounded local function, i.e. $\psi(\omega) := \psi(\omega_{B_0(r)})$ for some $r > 0$, then
\[ \lim_{\ell\to\infty}\ \sup_{n>\ell+r}\ \sup_{Q\in\mathcal{Q}_n(k)}|QP^t_n\psi - QP^t\psi| = 0 \]
for all $t, k > 0$, and the convergence is uniform on compact time intervals.
Proof. The a priori bound of Lemma 2.3 extends immediately to the limiting dynamics, thus for any $\varepsilon, T > 0$ we have some $\bar k > k$ and $h > \bar k$ such that $Q(\bar\Omega_{\kappa,r}(h))\ge 1 - \varepsilon$, $QP^t_n(\bar\Omega_{\kappa,r}(h))\ge 1 - \varepsilon$, and $QP^t(\bar\Omega_{\kappa,r}(h))\ge 1 - \varepsilon$ whenever $t < T$, $n > r + R_3 + 2\bar c T$ and $Q\in\mathcal{Q}_n(k)$. Since $\bar\Omega_{\kappa,r}(\bar k)$ is compact, there exists also an $\varepsilon' > 0$ such that $|q_\alpha - \bar q_\alpha| + |p_\alpha - \bar p_\alpha| < \varepsilon'$, for all $\alpha\in I$ with $|q_\alpha|, |\bar q_\alpha|\le r$, implies $|\psi(\omega) - \psi(\bar\omega)|\le\varepsilon$ for $\omega, \bar\omega\in\bar\Omega_{\kappa,r}(\bar k)$. Therefore the statement follows from Lemma 2.4 and the Chebyshev inequality by a $3\varepsilon$ argument.

The final statement of Theorem 1.2 on stationarity of certain Gibbs states is now a direct consequence of Lemma 2.2.

3. An Entropy Argument

In this section we extend a familiar argument by Holley [H] to the present more complex situation. For a probability measure $Q$ on $\Omega$, let $H(Q|\mathbb{P}_\lambda)$ denote the entropy relative to a distinguished Gibbs state $\mathbb{P}_\lambda$ with $\lambda_1 = \lambda_2 = \lambda_3 = 0$, as defined by (1.7) with $\Lambda = \mathbb{R}^3$. The family of partial dynamics (2.1) has been chosen such that $\mathbb{P}_\lambda$ is a common stationary measure of each local dynamics $P^t_n = P^t_{\lambda_4,a_n}$ introduced in Sect. 2. Therefore $P^t_n$ is a strongly continuous semigroup in $L^2(\mathbb{P}_\lambda)$, and smooth cylinder functions form a core for its generator $\tilde L_n = L_n + \hat L_n$, see (2.2). Remember that $L_n := L_{\lambda_4,a_n}$, the Hamiltonian part, is antisymmetric in $L^2(\mathbb{P}_\lambda)$ while the symmetric (reversible) component is just $\hat L_n := \hat L_{a_n}$.

If $G$ is a generator in $L^2(\mathbb{P}_\lambda)$, then the corresponding Donsker-Varadhan rate function is defined as
\[ D(Q|G) = \sup\Bigl\{-\int\frac{G\psi}{\psi}\,dQ : \psi\in\mathrm{Dom}\,G,\ \inf\psi > 0\Bigr\}. \]
If $G$ is self-adjoint and $G < 0$, then we can apply the following result due to Donsker and Varadhan (cf. [DV], Theorem 5).

Theorem 3.1. $D(Q|G) < +\infty$ if and only if $Q\ll\mathbb{P}_\lambda$ and $g := \sqrt{dQ/d\mathbb{P}_\lambda}\in\mathrm{Dom}\sqrt{-G}$; moreover
\[ D(Q|G) = \int\bigl(\sqrt{-G}\,g\bigr)^2\,d\mathbb{P}_\lambda. \tag{3.1} \]
Our main tool consists of the following entropy inequality.

Proposition 3.2. Let $\bar Q^t_n := (1/t)\int_0^t QP^s_n\,ds$. If $H(Q|\mathbb{P}_\lambda) < \infty$, then
\[ H(QP^t_n|\mathbb{P}_\lambda) + 2t\,D(\bar Q^t_n|\hat L_n) \le H(Q|\mathbb{P}_\lambda). \tag{3.2} \]
Proof. Let $P^{*t}_n$ be the adjoint semigroup with respect to $\mathbb{P}_\lambda$, which is again a diffusion with formal generator $\tilde L^*_n = -L_n + \hat L_n$. Both forward and backward diffusion are essentially finite dimensional with smooth coefficients, thus twice continuously differentiable functions form a common core $\mathcal{D}_n$ of $L_n$ and $L^*_n$. This suffices to justify the following computations. Observe first that, as an easy consequence of Jensen's inequality, we have
\[ H(Q'P^\tau_n|Q''P^\tau_n) \le H(Q'|Q'') \tag{3.3} \]
for any two measures $Q'$, $Q''$. For any strictly positive $\psi\in\mathcal{D}_n$ with $\mathbb{P}_\lambda(\psi) = 1$ define $Q''$ by $dQ'' = \psi\,d\mathbb{P}_\lambda$. Since
\[ \frac{dQ''P^\tau_n}{d\mathbb{P}_\lambda} = P^{*\tau}_n\psi, \]
we have
\[ H(Q'|Q'') = H(Q'|\mathbb{P}_\lambda) - Q'(\log\psi) \]
and
\[ H(Q'P^\tau_n|Q''P^\tau_n) = H(Q'P^\tau_n|\mathbb{P}_\lambda) - Q'P^\tau_n(\log P^{*\tau}_n\psi). \]
Accordingly, by (3.3), $H(Q'|\mathbb{P}_\lambda) - H(Q'P^\tau_n|\mathbb{P}_\lambda)\ge Q'(\log\psi) - Q'P^\tau_n(\log P^{*\tau}_n\psi)$, whence, by the concavity of the logarithm and the inequality $\log(x + 1)\le x$,
\[ H(Q'|\mathbb{P}_\lambda) - H(Q'P^\tau_n|\mathbb{P}_\lambda) \ge Q'(\log\psi) - Q'(\log P^\tau_n P^{*\tau}_n\psi) \ge \int\frac{\psi - P^\tau_n P^{*\tau}_n\psi}{\psi}\,dQ'. \]
Remembering that $P^t_n$ and $P^{*t}_n$ are both Feller semigroups and $\psi$ belongs to the common core of $\tilde L_n$ and $\tilde L^*_n$, we have, for small $\tau$,
\[ \psi - P^\tau_n P^{*\tau}_n\psi = (\psi - P^\tau_n\psi) + (\psi - P^{*\tau}_n\psi) + P^\tau_n(\psi - P^{*\tau}_n\psi) - (\psi - P^{*\tau}_n\psi) = -2\tau\hat L_n\psi + o(\tau). \]
Therefore, by dividing the given interval $[0, t]$ into $m$ small pieces, with $\tau = t/m$ and $Q' = QP^{it/m}_n$, we get
\[ \begin{aligned} H(Q|\mathbb{P}_\lambda) - H(QP^t_n|\mathbb{P}_\lambda) &= \lim_{m\to\infty}\sum_{i=0}^{m-1}\Bigl[H(QP^{it/m}_n|\mathbb{P}_\lambda) - H(QP^{(i+1)t/m}_n|\mathbb{P}_\lambda)\Bigr] \\ &\ge -2\int_0^t ds\int\frac{\hat L_n\psi}{\psi}\,dQP^s_n. \end{aligned} \]
By taking the supremum over all $\psi$ considered we conclude the proof.
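
The small-$\tau$ expansion used in the proof can be unpacked as follows. This is a sketch that assumes, as stated in the proof, that $\psi$ lies in a common core so that first order expansions of both semigroups are available; the bookkeeping of the $o(\tau)$ terms is an added explanation rather than part of the original text.

```latex
% Sketch: why only the symmetric part \hat L_n survives at first order.
\begin{align*}
\psi - P_n^{\tau}\psi &= -\tau\,\tilde L_n\psi + o(\tau), \qquad
\psi - P_n^{*\tau}\psi = -\tau\,\tilde L_n^{*}\psi + o(\tau), \\
P_n^{\tau}\bigl(\psi - P_n^{*\tau}\psi\bigr) - \bigl(\psi - P_n^{*\tau}\psi\bigr)
  &= o(\tau), \\
\psi - P_n^{\tau}P_n^{*\tau}\psi
  &= -\tau\bigl(\tilde L_n + \tilde L_n^{*}\bigr)\psi + o(\tau) \\
  &= -\tau\bigl((L_n+\hat L_n) + (-L_n+\hat L_n)\bigr)\psi + o(\tau)
   = -2\tau\,\hat L_n\psi + o(\tau).
\end{align*}
```

The antisymmetric Hamiltonian part $L_n$ cancels between the forward and the adjoint semigroup, which is why the rate function in (3.2) involves $\hat L_n$ alone.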
Observe now that if $D(Q|\hat L_n) < \infty$, then by Theorem 3.1 it can be written as a sum,⁵ namely, if $g = \sqrt{dQ/d\mathbb{P}_\lambda}$,
\[ D(Q|\hat L_n) = \frac12\sum_{\theta,\alpha,\beta}\int a_n(q_\alpha)\,a_n(q_\beta)\,\gamma_{\alpha\beta}(q)\,(X^\theta_{\alpha\beta}\,g)^2\,d\mathbb{P}_\lambda. \]
Let $a_{n,1}(x), a_{n,2}(x), \dots, a_{n,j}(x)$ be smooth non-negative functions with compact support, and assume that their supports are disjoint. Furthermore, assume that $a_n(x)\ge a_{n,1}(x) + \cdots + a_{n,j}(x)$, then $D(Q|\hat L_n)\ge D(Q|\hat L_{a_{n,1}}) + \cdots + D(Q|\hat L_{a_{n,j}})$. Therefore, from (3.2), we have

⁵ To see this, since $g\in\mathrm{Dom}\sqrt{-\hat L_n}$ (hence $g\in\mathrm{Dom}\bigl(\sqrt{a(q_\alpha)\,a(q_\beta)\,\gamma_{\alpha\beta}(q)}\,X_{\alpha\beta}\bigr)$), one can approximate it by local smooth functions, then use the closability of the Dirichlet form $D$.
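\[ H(QP^t_n|\mathbb{P}_\lambda) + 2t\sum_{i=1}^{j} D(\bar Q^t_n|\hat L_{a_{n,i}}) \le H(Q|\mathbb{P}_\lambda). \]
Thus we can choose strictly positive and smooth functions $\psi_0, \psi_1, \dots, \psi_j$ such that
\[ QP^t_n(\psi_0) - \log\mathbb{P}_\lambda(e^{\psi_0}) - 2t\sum_{i=1}^{j}\bar Q^t_n\Bigl(\frac{\hat L_{a_{n,i}}\psi_i}{\psi_i}\Bigr) \le H(Q|\mathbb{P}_\lambda). \]
This inequality extends by continuity to the infinite dynamics (cf. Lemma 2.5 and note that $Q\in\mathcal{Q}_\infty$):
\[ QP^t(\psi_0) - \log\mathbb{P}_\lambda(e^{\psi_0}) - 2t\sum_{i=1}^{j}\bar Q^t\Bigl(\frac{\hat L_{a_{n,i}}\psi_i}{\psi_i}\Bigr) \le H(Q|\mathbb{P}_\lambda). \tag{3.4} \]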
Now we are in a position to take the thermodynamic limit and conclude the main result of this section.

Proposition 3.3. If $Q^*$ is a translation invariant stationary measure of the infinite system (0.1), and $Q^*$ has finite specific entropy with respect to $\mathbb{P}_\lambda$, then $D(Q^*|\hat L_{\bar a}) = 0$ for all smooth functions $\bar a\le 1$ of compact support.

Proof. We are going to use (3.4) with $Q = Q^*_n$, where $Q^*_n$ is defined by
\[ Q^*_n(\psi) = \int\mathbb{P}_\lambda(\psi|\mathcal{F}_{\Lambda_n})\,dQ^*, \]
and $\Lambda_n$ denotes the centered cubic box of size $n$. Of course, $H(Q^*_n|\mathbb{P}_\lambda) = H_{\Lambda_n}(Q^*|\mathbb{P}_\lambda)$, thus
\[ \bar H(Q^*|\mathbb{P}_\lambda) := \lim_{n\to\infty}\frac{H(Q^*_n|\mathbb{P}_\lambda)}{|\Lambda_n|} = \sup_\psi\bigl\{Q^*(\psi) - \bar F(\psi)\bigr\}, \tag{3.5} \]
where $\psi$ are the local, bounded and continuous functions; in addition,
\[ \bar F(\psi) := \lim_{n\to\infty}\frac{1}{|\Lambda_n|}\log\int\exp\Bigl(\sum_{k\in\Lambda_n\cap\mathbb{Z}^3} s_k\psi\Bigr)\,d\mathbb{P}_\lambda, \tag{3.6} \]
and $s_k$ denotes the shift in $\mathbb{R}^3$ by $k\in\mathbb{R}^3$, i.e. $s_k\psi(p, q) = \psi(p, s_k q)$ and $s_k q = \{q_\alpha + k\}$ if $q = \{q_\alpha\}$. The proof of the existence of (3.5) and (3.6) can be found in [OVY]. Now we set
\[ \psi_0 = \sum_{k\in\Lambda_n\cap\mathbb{Z}^3} s_k\psi \]
for some local bounded continuous function $\psi$. Without loss of generality we can suppose $n$ so large that $\Lambda_n$ contains the support of $\bar a$, and define $a_{n,i}(x) = \bar a(x + k_i)$, $k_i\in J_n$, where $J_n$ is a discrete subset of $\Lambda_n$ such that the $a_{n,i}$ have disjoint supports contained in $\Lambda_n$, and $|J_n|/n^3\ge\bar J_0$ for some fixed constant $\bar J_0$.⁶ Correspondingly we choose $\psi_i = s_{k_i}\bar\psi$, $k_i\in J_n$, for a local bounded continuous function $\bar\psi$.

⁶ This can be done in such a way as to ensure that $a_n\ge\sum_{i=1}^{j} a_{n,i}$.
Substituting in (3.4) and dividing by $|\Lambda_n|$, it remains to prove that
\[ \lim_{n\to\infty} \frac{1}{|\Lambda_n|} \sum_{k\in\Lambda_n\cap\mathbb{Z}^3} Q^*_n P^t(s_k\psi) = Q^*(\psi) \tag{3.7} \]
and
\[ \lim_{n\to\infty} \frac{1}{|J_n|} \sum_{k_i\in J_n} \bar Q^{*t}_n\Big( s_{k_i}\frac{\hat L_{\bar a}\bar\psi}{\bar\psi} \Big) = Q^*\Big( \frac{\hat L_{\bar a}\bar\psi}{\bar\psi} \Big). \tag{3.8} \]
Indeed, then (3.4), (3.5), (3.6), (3.7) and (3.8) imply
\[ Q^*(\psi) - \bar F(\psi) - 2t\bar J_0\, Q^*\Big( \frac{\hat L_{\bar a}\bar\psi}{\bar\psi} \Big) \le \bar H(Q^*|\mathbb{P}_\lambda), \]
and taking the supremum over all $\psi$ and $\bar\psi$ considered we obtain the wanted result.

To prove (3.7), observe first that the rate of convergence in Lemma 2.5 depends only on the magnitude and the modulus of continuity of the underlying function. In the present situation all functions are translates of each other, thus the convergence is uniform on such functions. Therefore, for $k \in \Lambda_{n-\sqrt n}$ we approximate $P^t$ with the local dynamics $s_kP^t_{\sqrt n}$ in the ball $B_k(\sqrt n)$; otherwise we simply use the uniform bound of $\psi$. The proof of (3.8) is similar.

As is well known, see [DV], $D(Q^*|\hat L_{\bar a}) = 0$ implies the reversibility of $Q^*$ with respect to $\hat L_{\bar a}$, which completes the proof of Theorem 1.3, thereby proving Theorem 1.4 as well, by a direct argument.
References

[BLPS] Bunimovich, L., Liverani, C., Pellegrinotti, A., Suhov, Y.: Ergodic Systems of n Balls in a Billiard Table. Commun. Math. Phys. 146, 357–396 (1992)
[D] Dobrushin, R.L.: Gibbsian random fields for particles without hard core (in Russian). Teor. Mat. Fiz. 4, 101–118 (1969)
[DL] Donnay, V., Liverani, C.: Potential on the Two-Torus for which the Hamiltonian Flow is ergodic. Commun. Math. Phys. 135, 267–302 (1991)
[DV] Donsker, M.D., Varadhan, S.R.S.: Asymptotic Evaluation of Certain Markov Process Expectations for Large Time. I. Commun. Math. Phys. 28, 1–47 (1975)
[FD] Fritz, J., Dobrushin, R.L.: Non-equilibrium dynamics of two-dimensional infinite particle systems with a singular interaction. Commun. Math. Phys. 57, 67–81 (1977)
[FFL] Fritz, J., Funaki, T., Lebowitz, J.L.: Stationary States of Random Hamiltonian Systems. Probab. Theory Related Fields 99, 211–236 (1994)
[F1] Fritz, J.: Gradient dynamics of infinite point systems. Ann. Prob. 15, 478–514 (1987)
[F2] Fritz, J.: Stationary States of Hamiltonian Systems with Noise. In: On Three Levels, M. Fannes, Ch. Maes, A. Verbeure Eds, New York: Plenum, 1994, pp. 203–214
[H] Holley, R.: Free energy in a Markovian model of a lattice spin system. Commun. Math. Phys. 23, 87–99 (1971)
[KSS] Krámli, A., Simanyi, N., Szász, D.: The K property for Four Billiard Balls. Commun. Math. Phys. 144, 107–148 (1992)
[LO] Liverani, C., Olla, S.: Ergodicity in Infinite Hamiltonian Systems with Conservative Noise. Probab. Theory Related Fields 106, 3, 401–445 (1996)
[LW] Liverani, C., Wojtkowski, M.: Ergodicity in Hamiltonian Systems. Dynamics Reported 4, 130–202 (1995)
[OVY] Olla, S., Varadhan, S.R.S., Yau, H.T.: Hydrodynamics Limit for a Hamiltonian System with Weak Noise. Commun. Math. Phys. 155, 523–560 (1993)
[R] Ruelle, D.: Superstable interactions in classical statistical mechanics. Commun. Math. Phys. 18, 127–159 (1970)
[S] Siegmund-Schultze, R.: On nonequilibrium dynamics of multidimensional infinite particle systems. Commun. Math. Phys. 100, 245–265 (1985)

Communicated by J.L. Lebowitz
Commun. Math. Phys. 189, 497 – 512 (1997)
Communications in
Mathematical Physics © Springer-Verlag 1997
Taming Griffiths’ Singularities in Long Range Random Ising Models

Abel Klein*, Sharareh Masooman**

Department of Mathematics, University of California, Irvine, Irvine, CA 92697-3875, USA. E-mail: [email protected], [email protected]

Received: 30 September 1996 / Accepted: 18 February 1997
Dedicated to the memory of Roland Dobrushin

Abstract: We study long range random Ising models and develop modified high temperature and strong magnetic field expansions that give decay of truncated correlation functions and uniqueness of Gibbs states, in spite of the presence of Griffiths’ singularities.

1. Introduction

In 1969 Griffiths [G] considered the statistical mechanics of a random ferromagnetic Ising model; he showed that for the site diluted model, the quenched magnetization displayed non-analytic behavior for values of the inverse temperature at which the system has neither long-range order nor spontaneous magnetization (see also [F, Su]). His arguments are believed to apply to a large class of random models. At the origin of this behavior is the fact that even if, with probability one, the infinite system is not ordered as a whole, there are, also with probability one, infinitely many arbitrarily large regions inside which the system is strongly correlated.

This phenomenon of Griffiths’ singularities is now recognized to be a regular feature in the statistical mechanics of disordered systems. It has the unpleasant consequence that the usual high temperature or strong magnetic field expansions, the standard tools for obtaining decay of correlation functions (and also existence and uniqueness of the thermodynamical limit), fail to converge.

In this article we consider long range random Ising models; we develop modified high temperature and strong magnetic field expansions that give decay of truncated correlation functions and uniqueness of Gibbs states, in spite of the presence of Griffiths’ singularities. Our expansions are based on the work of von Dreifus, Klein and Perez [DKP].

* This author was supported in part by the NSF Grant DMS-9500720.
** This author was supported in part by the NSF Grant DMS-9500720.
The short range random Ising model was studied by Berretti [B], Fröhlich and Imbrie [FI], Bassalygo and Dobrushin [BD], von Dreifus, Klein and Perez [DKP] and Gielis and Maes [GM]. The high temperature behavior of long range spin glasses has been studied by Fröhlich and Zegarlinski [FZ] and Zegarlinski [Z].

The subject of this paper is the long range random Ising model whose Hamiltonian in a finite volume $\Lambda \subset \mathbb{Z}^d$ is given by
\[ H_\Lambda(\sigma) = -\sum_{\{x,y\}\in\Lambda^*} J_{xy}\sigma_x\sigma_y + B\sum_{x\in\Lambda} h_x\sigma_x, \tag{1} \]
with exchange couplings $J = \{J_{xy};\ \{x,y\} \in \mathbb{Z}^{d*}\}$ and magnetic field $h = \{h_x;\ x \in \mathbb{Z}^d\}$; we use the notation $\Gamma^* = \{\{x,y\};\ x,y \in \Gamma,\ x \ne y\}$ for $\Gamma \subset \mathbb{Z}^d$, and $\sigma = \{\sigma_x;\ x \in \mathbb{Z}^d\} \in \{-1,1\}^{\mathbb{Z}^d}$. We assume $J_{xy} = X_{xy}/|x-y|^{\alpha d}$ for each $\{x,y\} \in \mathbb{Z}^{d*}$, where $\alpha > 1$. We take $X = \{X_{xy};\ \{x,y\} \in \mathbb{Z}^{d*}\}$ and $h$ to be independent families of independent identically distributed (within each family) random variables, with $\tilde X \equiv \mathbb{E}(|X_{xy}|) < \infty$.

Since $\alpha > 1$, for any $x \in \mathbb{Z}^d$ we have $\sum_{y\in\mathbb{Z}^d} |J_{xy}| < \infty$ for a.e. $J$, so boundary conditions can be introduced in the usual way. A boundary condition is a map $\chi : \mathbb{Z}^d \to [-1,1]$. It is an external boundary condition if it is a configuration on $\mathbb{Z}^d$, i.e., a map $\chi : \mathbb{Z}^d \to \{-1,1\}$. If $\chi \equiv 0$ we have free boundary conditions, in which case we omit $\chi$ in the notation below. For a given boundary condition $\chi$ we set
\[ H^\chi_\Lambda(\sigma) = H_\Lambda(\sigma) - \sum_{x\in\Lambda,\ y\notin\Lambda} J_{xy}\sigma_x\chi_y. \tag{2} \]
We now define thermal averages for local observables; $A$ is a local observable if it depends nontrivially on only a finite number of sites, which we call the support of $A$. We set $\|A\| = \sup_\sigma |A(\sigma)|$. If $A$ and $B$ are local observables we will write $d(A,B)$ for the distance between the supports of $A$ and $B$. Given an inverse temperature $\beta$, a local observable $A$, a finite volume $\Lambda$ containing the support of $A$ and a boundary condition $\chi$, we define
\[ \langle A\rangle^\chi_\Lambda = \frac{1}{Z^\chi_\Lambda} \sum_\sigma A(\sigma)\, e^{-\beta H^\chi_\Lambda(\sigma)}, \quad\text{with}\quad Z^\chi_\Lambda = \sum_\sigma e^{-\beta H^\chi_\Lambda(\sigma)}. \tag{3} \]
The truncated finite volume correlation function of two local observables $A$ and $B$, with boundary condition $\chi$, is defined by
\[ \langle A;B\rangle^\chi_\Lambda = \langle AB\rangle^\chi_\Lambda - \langle A\rangle^\chi_\Lambda \langle B\rangle^\chi_\Lambda. \tag{4} \]
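Definitions (1), (3) and (4) can be evaluated exactly on a very small volume by summing over all spin configurations. The following is a minimal sketch under stated assumptions, not part of the paper: it takes $d = 1$, free boundary conditions, a volume of six sites, and standard normal $X_{xy}$ and $h_x$; the function names are hypothetical.

```python
import itertools
import math
import random

def make_couplings(n, alpha, rng):
    # J_xy = X_xy / |x - y|**alpha on sites 0..n-1 of Z (d = 1);
    # X_xy i.i.d. standard normal is an assumption of this sketch.
    return {(x, y): rng.gauss(0.0, 1.0) / (y - x) ** alpha
            for x in range(n) for y in range(x + 1, n)}

def energy(sigma, J, h, B):
    # H_Lambda(sigma) as in (1), free boundary conditions (chi = 0).
    pair = sum(Jxy * sigma[x] * sigma[y] for (x, y), Jxy in J.items())
    field = sum(hx * sx for hx, sx in zip(h, sigma))
    return -pair + B * field

def average(obs, J, h, B, beta):
    # <obs>_Lambda as in (3), by summing over all 2**n configurations.
    Z = 0.0
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=len(h)):
        w = math.exp(-beta * energy(sigma, J, h, B))
        Z += w
        total += obs(sigma) * w
    return total / Z

def truncated(obs_a, obs_b, J, h, B, beta):
    # <A;B>_Lambda = <AB> - <A><B> as in (4).
    ab = average(lambda s: obs_a(s) * obs_b(s), J, h, B, beta)
    return ab - average(obs_a, J, h, B, beta) * average(obs_b, J, h, B, beta)

rng = random.Random(0)
n, alpha, beta, B = 6, 2.0, 0.3, 0.5
J = make_couplings(n, alpha, rng)
h = [rng.gauss(0.0, 1.0) for _ in range(n)]
corr_ends = truncated(lambda s: s[0], lambda s: s[n - 1], J, h, B, beta)
print(corr_ends)
```

Exhaustive summation costs $2^n$ terms, so this is only a way to experiment with the definitions on toy volumes.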
We start with the high temperature case. In this case we fix arbitrary $B \in \mathbb{R}$ and $h \in \mathbb{R}^{\mathbb{Z}^d}$ in (1); only $J$ is random (all our estimates will be uniform in $B$ and $h$).

Theorem 1 (High Temperature Regime). There exists $\beta_1 = \beta_1(\alpha,d) > 0$, such that for all $0 < \beta < \beta_1$ we can find $C = C(\beta) < \infty$, such that for any two local observables $A$ and $B$ and any finite $\Lambda$ containing their supports, we have
\[ \mathbb{E}\big(|\langle A;B\rangle^\chi_\Lambda|\big) \le C\,\|A\|\,\|B\|\,|\mathrm{supp}\,A|\,|\mathrm{supp}\,B|\; d(A,B)^{-\alpha d}, \tag{5} \]
for all $B \in \mathbb{R}$, $h \in \mathbb{R}^{\mathbb{Z}^d}$, and any boundary condition $\chi$.
If in addition $\mathbb{E}\big(|X_{xy}|^{\frac{\alpha}{\alpha-1}+\epsilon}\big) \le C_1 < \infty$ for some $\epsilon > 0$, then for each $0 < \beta < \beta_1$ we have, with probability one (in $J$), that there exists a unique Gibbs state and for every local observable $A$
\[ \langle A\rangle \equiv \lim_{\Lambda\to\mathbb{Z}^d} \langle A\rangle^{\chi_\Lambda}_\Lambda \tag{6} \]
exists and is independent of the choice of the boundary conditions $\chi_\Lambda$.

We now turn to the strong magnetic field case. In this case both $J$ and $h$ are random, and we assume $\mathbb{P}(h_x = 0) = 0$.

Theorem 2 (Strong Magnetic Field Regime). Let $1 < \alpha' < \alpha$. Then for all $\beta > 0$ there exists $B_1 = B_1(\beta,\alpha',d) > 0$, such that for all $B$ with $|B| > B_1$ we can find $C = C(B) < \infty$ such that for any two local observables $A$ and $B$ and any finite $\Lambda$ containing their supports, we have
\[ \mathbb{E}\big(|\langle A;B\rangle^\chi_\Lambda|\big) \le C\,\|A\|\,\|B\|\,|\mathrm{supp}\,A|\,|\mathrm{supp}\,B|\; d(A,B)^{-\alpha' d}. \tag{7} \]
If in addition $\mathbb{E}\big(|X_{xy}|^{\frac{\alpha'}{\alpha'-1}+\epsilon}\big) \le C_1 < \infty$ for some $\epsilon > 0$, then for all $|B| > B_1$ we have, with probability one (in $J$ and $h$), that there exists a unique Gibbs state and for every local observable $A$
\[ \langle A\rangle \equiv \lim_{\Lambda\to\mathbb{Z}^d} \langle A\rangle^{\chi_\Lambda}_\Lambda \tag{8} \]
exists and is independent of the choice of the boundary conditions $\chi_\Lambda$.

2. The Expansions

To deal with truncated correlation functions we use the duplication trick; we consider two non-interacting copies of the system, i.e., a new spin system with configurations $\tilde\sigma = \{\tilde\sigma_x = (\sigma_x, \sigma'_x);\ x \in \mathbb{Z}^d\}$, $\sigma_x, \sigma'_x \in \{-1,+1\}$, and finite volume Hamiltonian $\tilde H^\chi_\Lambda(\tilde\sigma)$, where for any function $F(\sigma)$ we set $\tilde F(\tilde\sigma) = F(\sigma) + F(\sigma')$. Finite volume thermal averages of a local observable $C(\tilde\sigma)$ in the duplicated system, with boundary condition $\chi$ (same for both copies), are given by
\[ \langle\langle C\rangle\rangle^\chi_\Lambda = \frac{1}{\tilde Z^\chi_\Lambda} \sum_{\tilde\sigma} C(\tilde\sigma)\, e^{-\beta\tilde H^\chi_\Lambda(\tilde\sigma)}, \quad\text{where}\quad \tilde Z^\chi_\Lambda = \sum_{\tilde\sigma} e^{-\beta\tilde H^\chi_\Lambda(\tilde\sigma)}. \tag{9} \]
Truncated correlation functions of the original system may be expressed as ordinary correlation functions of the duplicated system through the identity
\[ \langle A;B\rangle^\chi_\Lambda = \frac12 \big\langle\big\langle \hat A\hat B \big\rangle\big\rangle^\chi_\Lambda, \tag{10} \]
where to every local observable $A$ of the original system we associate an observable $\hat A$ of the duplicated system by setting $\hat A(\tilde\sigma) = A(\sigma) - A(\sigma')$.

2.1. The high temperature expansion. We start with an infinite range version of [DKP, Theorem 2.1], the main difference being the use of long range self avoiding bond walks.
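The duplication identity (10) is easy to confirm numerically on a toy system. The sketch below is an illustration under assumptions, not the authors' computation: four sites, free boundary conditions, and arbitrary fixed values of $J$, $h$, $B$, $\beta$; the helper names are hypothetical. It compares the truncated correlation $\langle\sigma_a;\sigma_b\rangle$ with half the duplicated-system average of $\hat\sigma_a\hat\sigma_b$.

```python
import itertools
import math

def weight(sigma, J, h, B, beta):
    # Boltzmann weight exp(-beta * H_Lambda(sigma)) with H as in (1), chi = 0.
    n = len(sigma)
    e = -sum(J[x][y] * sigma[x] * sigma[y] for x in range(n) for y in range(x + 1, n))
    e += B * sum(h[x] * sigma[x] for x in range(n))
    return math.exp(-beta * e)

def check_duplication(J, h, B, beta, a, b):
    configs = list(itertools.product((-1, 1), repeat=len(h)))
    w = [weight(s, J, h, B, beta) for s in configs]
    Z = sum(w)
    mean = lambda f: sum(f(s) * ws for s, ws in zip(configs, w)) / Z
    # Left side of (10): <A;B> with A = sigma_a, B = sigma_b.
    lhs = mean(lambda s: s[a] * s[b]) - mean(lambda s: s[a]) * mean(lambda s: s[b])
    # Right side of (10): the duplicated weight factorizes over the two copies.
    dup = 0.0
    for s, ws in zip(configs, w):
        for t, wt in zip(configs, w):
            dup += (s[a] - t[a]) * (s[b] - t[b]) * ws * wt
    rhs = 0.5 * dup / (Z * Z)
    return lhs, rhs

# Only the upper triangle of J is used; all numbers are arbitrary illustrative values.
J = [[0.0, 0.8, -0.3, 0.1],
     [0.0, 0.0, 0.5, -0.2],
     [0.0, 0.0, 0.0, 0.4],
     [0.0, 0.0, 0.0, 0.0]]
h = [0.3, -1.1, 0.6, 0.9]
lhs, rhs = check_duplication(J, h, B=0.5, beta=0.7, a=0, b=3)
print(lhs, rhs)
```

The two numbers agree up to rounding because expanding $(A - A')(B - B')$ and using independence of the copies gives $2\langle AB\rangle - 2\langle A\rangle\langle B\rangle$.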
This deterministic result holds for an arbitrary set $S \subset \mathbb{Z}^{d*}$; it will be later applied to the case of random exchange couplings $J$, with the “singular” set $S$ chosen appropriately for each realization of $J$.

A self avoiding (bond) walk from a site $x$ to a site $y$, written $\omega : x \to y$, is a finite sequence of bonds $\{x_1,y_1\}, \{x_2,y_2\}, \dots, \{x_n,y_n\}$, such that $x_1 = x$ and $y_n = y$, $x_{i+1} = y_i$ for $i = 1, \dots, n-1$, and $x_i \ne x_j$ if $i \ne j$. For such a walk we set $|\omega| = n$. We define $W_{xy} = \{\omega : x \to y\}$; if $A$ and $B$ are local observables we set $W_{AB} = \{\omega : x \to y;\ x \in \mathrm{supp}\,A,\ y \in \mathrm{supp}\,B\}$.

Theorem 3. Given $S \subset \mathbb{Z}^{d*}$ let
\[ \rho_{xy} = \rho_{xy}(S,J,\beta) = \begin{cases} \xi_{xy} \equiv e^{4\beta|J_{xy}|} - 1 & \text{if } \{x,y\} \notin S, \\ 1 & \text{if } \{x,y\} \in S. \end{cases} \tag{11} \]
Then for any two local observables $A$ and $B$, any finite $\Lambda$ containing their supports, and any boundary condition $\chi$ we have
\[ |\langle A;B\rangle^\chi_\Lambda| \le 2\,\|A\|\,\|B\| \sum_{\omega\in W_{AB}}\ \prod_{\{x,y\}\in\omega} \rho_{xy}. \tag{12} \]
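As a sanity check of (12), one can enumerate walks on a handful of sites and compare the walk sum with an exact truncated correlation. The sketch below makes several assumptions that are not in the paper: four sites, free boundary conditions, $S = \emptyset$ (so $\rho_{xy} = \xi_{xy}$), single-site observables $A = \sigma_a$, $B = \sigma_b$, and walks taken to be simple paths that stop on first reaching $b$. Function names and numerical values are illustrative only.

```python
import itertools
import math

def truncated_corr(J, h, B, beta, a, b):
    # Exact <sigma_a; sigma_b> for H as in (1) with free boundary conditions.
    n = len(h)
    Z = sa = sb = sab = 0.0
    for s in itertools.product((-1, 1), repeat=n):
        e = -sum(J[x][y] * s[x] * s[y] for x in range(n) for y in range(x + 1, n))
        e += B * sum(h[x] * s[x] for x in range(n))
        w = math.exp(-beta * e)
        Z += w
        sa += s[a] * w
        sb += s[b] * w
        sab += s[a] * s[b] * w
    return sab / Z - (sa / Z) * (sb / Z)

def walk_sum(J, beta, a, b):
    # Sum over simple paths a -> b of prod xi_xy, xi_xy = exp(4 beta |J_xy|) - 1.
    n = len(J)

    def xi(x, y):
        return math.exp(4.0 * beta * abs(J[min(x, y)][max(x, y)])) - 1.0

    def extend(current, visited):
        if current == b:
            return 1.0
        return sum(xi(current, nxt) * extend(nxt, visited | {nxt})
                   for nxt in range(n) if nxt not in visited)

    return extend(a, {a})

# Only the upper triangle of J is used; values are arbitrary.
J = [[0.0, 0.8, -0.3, 0.1],
     [0.0, 0.0, 0.5, -0.2],
     [0.0, 0.0, 0.0, 0.4],
     [0.0, 0.0, 0.0, 0.0]]
h = [0.3, -1.1, 0.6, 0.9]
beta, B, a, b = 0.2, 0.5, 0, 3
lhs = abs(truncated_corr(J, h, B, beta, a, b))
rhs = 2.0 * walk_sum(J, beta, a, b)   # ||A|| = ||B|| = 1 for single spins
print(lhs, rhs)
```

The bound is far from sharp on such a small system; its value lies in the fact that it is uniform in the volume, the field and the boundary condition.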
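Proof. We start by redefining the Hamiltonian as
\[ H_\Lambda(\sigma) = -\sum_{\{x,y\}\in\Lambda^*} \big( J_{xy}\sigma_x\sigma_y + |J_{xy}| \big) + B\sum_{x\in\Lambda} h_x\sigma_x, \tag{13} \]
which differs from (1) by an overall constant. We then perform an expansion only in $\Lambda^*\setminus S$. Let
\[ T_{\Lambda^*\setminus S} = -\sum_{\{x,y\}\in\Lambda^*\setminus S} \big( J_{xy}\sigma_x\sigma_y + |J_{xy}| \big) \tag{14} \]
and
\[ T^\chi_{\Lambda^*\cap S} = H^\chi_\Lambda - T_{\Lambda^*\setminus S}. \tag{15} \]
Also let
\[ E_{xy} = \exp\big\{ \beta\big[ J_{xy}(\sigma_x\sigma_y + \sigma'_x\sigma'_y) + 2|J_{xy}| \big] \big\} - 1. \tag{16} \]
Then
\begin{align}
\big\langle\big\langle \hat A\hat B \big\rangle\big\rangle^\chi_\Lambda
&= \frac{1}{\tilde Z^\chi_\Lambda} \sum_{\tilde\sigma} \hat A\hat B\, e^{-\beta\tilde T^\chi_{\Lambda^*\cap S}}\, e^{-\beta\tilde T_{\Lambda^*\setminus S}} \tag{17} \\
&= \frac{1}{\tilde Z^\chi_\Lambda} \sum_{\tilde\sigma} \hat A\hat B\, e^{-\beta\tilde T^\chi_{\Lambda^*\cap S}} \exp\Big\{ -\beta\Big[ -\sum_{\{x,y\}\in\Lambda^*\setminus S} \big( J_{xy}(\sigma_x\sigma_y + \sigma'_x\sigma'_y) + 2|J_{xy}| \big) \Big] \Big\} \tag{18} \\
&= \frac{1}{\tilde Z^\chi_\Lambda} \sum_{\tilde\sigma} \hat A\hat B\, e^{-\beta\tilde T^\chi_{\Lambda^*\cap S}} \prod_{\{x,y\}\in\Lambda^*\setminus S} \big( E_{xy} + 1 \big) \tag{19} \\
&= \frac{1}{\tilde Z^\chi_\Lambda} \sum_{\tilde\sigma} \hat A\hat B\, e^{-\beta\tilde T^\chi_{\Lambda^*\cap S}} \sum_{G\subset\Lambda^*\setminus S}\ \prod_{\{x,y\}\in G} E_{xy}. \tag{20}
\end{align}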
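Now suppose $G$ is such that $G \cup S$ does not contain a walk $\omega \in W_{AB}$. Due to the invariance of the Hamiltonian under the exchange $\sigma \leftrightarrow \sigma'$, such a set would contribute zero to the above sum. We may therefore restrict the sum to those $G$ of the form $G = \omega_S \cup G'$, where $\omega_S = \omega\setminus S$ and $G' \subset (\Lambda^*\setminus S)\setminus\omega_S$. Thus
\begin{align}
\Big| \big\langle\big\langle \hat A\hat B \big\rangle\big\rangle^\chi_\Lambda \Big|
&= \Big| \frac{1}{\tilde Z^\chi_\Lambda} \sum_{\tilde\sigma} \hat A\hat B\, e^{-\beta\tilde T^\chi_{\Lambda^*\cap S}} \sum_{\omega\in W_{AB}}\ \prod_{\{x,y\}\in\omega_S} E_{xy} \sum_{G'\subset(\Lambda^*\setminus S)\setminus\omega_S}\ \prod_{\{x',y'\}\in G'} E_{x'y'} \Big| \tag{21} \\
&\le \frac{1}{\tilde Z^\chi_\Lambda} \sum_{\omega\in W_{AB}} \sum_{\tilde\sigma} \big| \hat A\hat B \big|\, e^{-\beta\tilde T^\chi_{\Lambda^*\cap S}} \prod_{\{x,y\}\in\omega_S} E_{xy} \sum_{G'\subset(\Lambda^*\setminus S)\setminus\omega_S}\ \prod_{\{x',y'\}\in G'} E_{x'y'} \tag{22} \\
&\le 4\,\|A\|\,\|B\|\, \frac{1}{\tilde Z^\chi_\Lambda} \sum_{\omega\in W_{AB}}\ \prod_{\{x,y\}\in\omega_S} \xi_{xy} \sum_{\tilde\sigma} e^{-\beta\tilde T^\chi_{\Lambda^*\cap S}} \prod_{\{x,y\}\in\Lambda^*\setminus S} \big( E_{xy} + 1 \big) \tag{23} \\
&= 4\,\|A\|\,\|B\|\, \frac{1}{\tilde Z^\chi_\Lambda}\, \tilde Z^\chi_\Lambda \sum_{\omega\in W_{AB}}\ \prod_{\{x,y\}\in\omega_S} \xi_{xy} \tag{24} \\
&\le 4\,\|A\|\,\|B\| \sum_{\omega\in W_{AB}}\ \prod_{\{x,y\}\in\omega} \rho_{xy}, \tag{25}
\end{align}
which, in view of (10), is the same as (12).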
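We can now prove the first part of Theorem 1. Taking the expectation of both sides of (12) gives
\[ \mathbb{E}\big(|\langle A;B\rangle^\chi_\Lambda|\big) \le 2\,\|A\|\,\|B\| \sum_{\omega\in W_{AB}}\ \prod_{\{x,y\}\in\omega} \bar\rho_{xy} \le 2\,\|A\|\,\|B\| \sum_{x_0\in\mathrm{supp}\,A,\ y_0\in\mathrm{supp}\,B} \Gamma_{x_0y_0}, \tag{26} \]
where $\Gamma_{x_0y_0} \equiv \sum_{\omega\in W_{x_0y_0}} \prod_{\{x,y\}\in\omega} \bar\rho_{xy}$; we used the fact that the $\rho_{xy}$’s are independent random variables. Here
\[ \bar\rho_{xy} \equiv \mathbb{E}(\rho_{xy}) \le \mathbb{E}\big( \xi_{xy};\ \{x,y\} \notin S \big) + \mathbb{P}\big( \{x,y\} \in S \big). \tag{27} \]
Let us now choose $S = \{\{x,y\} \in \mathbb{Z}^{d*};\ \xi_{xy} > 1\}$, i.e., $S = \{\{x,y\} \in \mathbb{Z}^{d*};\ |J_{xy}| > \delta\}$ with $\delta = \frac{\ln 2}{4\beta}$. It follows that
\[ \mathbb{P}\big( \{x,y\} \in S \big) \le \frac{1}{\delta}\,\mathbb{E}(|J_{xy}|) = \frac{1}{\delta}\,\frac{\tilde X}{|x-y|^{\alpha d}}. \tag{28} \]
Setting $J^{(\delta)}_{xy} = J_{xy}$ if $|J_{xy}| \le \delta$ and $J^{(\delta)}_{xy} = 0$ otherwise, it follows from (11) and (27) that for $\beta$ small enough we have
\[ \bar\rho_{xy} \le \mathbb{E}\big( 5\beta\,|J^{(\delta)}_{xy}| \big) + \frac{1}{\delta}\,\frac{\tilde X}{|x-y|^{\alpha d}} \le \Big( 5 + \frac{4}{\ln 2} \Big) \frac{\beta\tilde X}{|x-y|^{\alpha d}}. \tag{29} \]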
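For reference, the choice of $\delta$ and the bound (28) can be unpacked in two lines. This is only a restatement of the step above, using the definition of $\xi_{xy}$ in (11), the assumption $J_{xy} = X_{xy}/|x-y|^{\alpha d}$, and Markov's inequality:
\[ \xi_{xy} > 1 \iff e^{4\beta|J_{xy}|} > 2 \iff |J_{xy}| > \frac{\ln 2}{4\beta} = \delta, \]
\[ \mathbb{P}\big( |J_{xy}| > \delta \big) \le \frac{\mathbb{E}(|J_{xy}|)}{\delta} = \frac{4\beta}{\ln 2}\cdot\frac{\mathbb{E}(|X_{xy}|)}{|x-y|^{\alpha d}} = \frac{4}{\ln 2}\,\frac{\beta\tilde X}{|x-y|^{\alpha d}} . \]
The second display is the source of the term $4/\ln 2$ in (29).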
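From (29) we can see that given any $\theta > 0$ we can choose $\beta$ small enough such that $\bar\rho_{xy} \le \dfrac{\theta}{|x-y|^{\alpha d}}$. This along with (26) gives
\begin{align}
\Gamma_{0y_0} &\le \sum_{\omega:0\to y_0}\ \prod_{\{x,y\}\in\omega} \frac{\theta}{|x-y|^{\alpha d}} \tag{30} \\
&= \sum_{n=1}^{\infty} \theta^n \sum_{x_1,\dots,x_{n-1}} \frac{1}{|x_1|^{\alpha d}}\, \frac{1}{|x_2-x_1|^{\alpha d}} \cdots \frac{1}{|y_0-x_{n-1}|^{\alpha d}} \tag{31} \\
&= \sum_{n=1}^{\infty} \theta^n\, \underbrace{(\varphi * \varphi * \cdots * \varphi)}_{n\ \text{times}}(y_0), \tag{32}
\end{align}
where
\[ \varphi(x) = \begin{cases} \dfrac{1}{|x|^{\alpha d}} & \text{if } x \ne 0, \\ 0 & \text{if } x = 0. \end{cases} \tag{33} \]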
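The following lemma gives an upper bound for the $n$-fold convolution in (32).

Lemma 2.1. Let $\varphi, \psi : \mathbb{Z}^d \to \mathbb{R}$ and suppose there exist finite constants $C_\varphi$ and $C_\psi$ such that $|\varphi(x)| \le \dfrac{C_\varphi}{1+|x|^{\alpha d}}$ and $|\psi(x)| \le \dfrac{C_\psi}{1+|x|^{\alpha d}}$ for all $x \in \mathbb{Z}^d$. Then
\[ |(\varphi*\psi)(x)| \le \frac{2^{\alpha d+1} K C_\varphi C_\psi}{1+|x|^{\alpha d}} \tag{34} \]
for all $x \in \mathbb{Z}^d$, where $K \equiv \sum_{x\in\mathbb{Z}^d} \dfrac{1}{1+|x|^{\alpha d}}$.

Proof. We have
\begin{align}
|(\varphi*\psi)(x)| &= \Big| \sum_{y\in\mathbb{Z}^d} \varphi(x-y)\psi(y) \Big| \le \sum_{y\in\mathbb{Z}^d} \frac{C_\varphi}{1+|x-y|^{\alpha d}}\, \frac{C_\psi}{1+|y|^{\alpha d}} \notag \\
&= \sum_{\substack{y\in\mathbb{Z}^d \\ |x-y|\le\frac{|x|}{2}}} \frac{C_\varphi}{1+|x-y|^{\alpha d}}\, \frac{C_\psi}{1+|y|^{\alpha d}} + \sum_{\substack{y\in\mathbb{Z}^d \\ |x-y|>\frac{|x|}{2}}} \frac{C_\varphi}{1+|x-y|^{\alpha d}}\, \frac{C_\psi}{1+|y|^{\alpha d}}. \tag{35}
\end{align}
We treat each sum in (35) separately:
\begin{align}
\sum_{\substack{y\in\mathbb{Z}^d \\ |x-y|\le\frac{|x|}{2}}} \frac{C_\varphi}{1+|x-y|^{\alpha d}}\, \frac{C_\psi}{1+|y|^{\alpha d}}
&\le \frac{C_\varphi C_\psi}{1+\big(\frac{|x|}{2}\big)^{\alpha d}} \sum_{\substack{y\in\mathbb{Z}^d \\ |x-y|\le\frac{|x|}{2}}} \frac{1}{1+|x-y|^{\alpha d}} \notag \\
&\le \frac{C_\varphi C_\psi K}{1+\big(\frac{|x|}{2}\big)^{\alpha d}} \le \frac{2^{\alpha d} C_\varphi C_\psi K}{1+|x|^{\alpha d}}, \tag{36}
\end{align}
and
\begin{align}
\sum_{\substack{y\in\mathbb{Z}^d \\ |x-y|>\frac{|x|}{2}}} \frac{C_\varphi}{1+|x-y|^{\alpha d}}\, \frac{C_\psi}{1+|y|^{\alpha d}}
&\le \frac{C_\varphi C_\psi}{1+\big(\frac{|x|}{2}\big)^{\alpha d}} \sum_{\substack{y\in\mathbb{Z}^d \\ |x-y|>\frac{|x|}{2}}} \frac{1}{1+|y|^{\alpha d}} \notag \\
&\le \frac{2^{\alpha d} C_\varphi C_\psi K}{1+|x|^{\alpha d}}. \tag{37}
\end{align}
Combining (35), (36), and (37) we get (34).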
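The convolution bound (34) can be probed numerically. The sketch below is an illustration with assumed parameters, not part of the proof: it takes $d = 1$, $\alpha d = 2$, $\varphi = \psi = 1/(1+|x|^{\alpha d})$ truncated to $|x| \le N$ (so $C_\varphi = C_\psi = 1$ and $K$ is replaced by the corresponding finite sum, for which the argument of the proof goes through unchanged), and reports the largest ratio of the convolution to the right-hand side of (34).

```python
def phi(x, alpha_d):
    # phi(x) = 1 / (1 + |x|**(alpha*d)); here d = 1.
    return 1.0 / (1.0 + abs(x) ** alpha_d)

def worst_ratio(alpha_d=2.0, N=100):
    # Functions are truncated to |x| <= N, so all sums below are finite and exact.
    sites = range(-N, N + 1)
    K = sum(phi(x, alpha_d) for x in sites)
    worst = 0.0
    for x in range(-2 * N, 2 * N + 1):
        conv = sum(phi(x - y, alpha_d) * phi(y, alpha_d)
                   for y in sites if abs(x - y) <= N)
        bound = 2.0 ** (alpha_d + 1.0) * K * phi(x, alpha_d)
        worst = max(worst, conv / bound)
    return worst

ratio = worst_ratio()
print(ratio)
```

A ratio well below 1 is expected: the constant $2^{\alpha d+1}K$ in (34) is chosen for simplicity of the proof, not for sharpness.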
Let $\varphi$ be as in (33); using (34) and an induction we conclude that
\[ \underbrace{(\varphi * \cdots * \varphi)}_{n\ \text{times}}(x) \le \frac{2\,\big( 2^{\alpha d+2}K \big)^{n-1}}{1+|x|^{\alpha d}} \equiv \frac{K_1K_2^{\,n}}{1+|x|^{\alpha d}}. \tag{38} \]
The above inequality and (32) lead to
\[ \Gamma_{0y_0} \le 2\sum_{n=1}^{\infty} \theta^n K_1K_2^{\,n}\, \frac{1}{1+|y_0|^{\alpha d}} = \frac{2K_1K_2\theta}{1-K_2\theta}\, \frac{1}{1+|y_0|^{\alpha d}} \equiv \frac{K_3}{1+|y_0|^{\alpha d}}, \tag{39} \]
where the equality holds if we choose $\theta < \dfrac{1}{K_2}$. From (26), (39) and the translation invariance of expectations we get
\[ \mathbb{E}\big(|\langle A;B\rangle^\chi_\Lambda|\big) \le 2\,\|A\|\,\|B\|\, K_3 \sum_{x_0\in\mathrm{supp}\,A,\ y_0\in\mathrm{supp}\,B} \frac{1}{1+|x_0-y_0|^{\alpha d}}, \tag{40} \]
from which (5) follows, concluding the proof of the first part of Theorem 1.

2.2. The strong magnetic field expansion. The long range strong magnetic field expansion cannot be done as in the short range case [DKP]. The reason is that the events of sites being singular would not be independent for any two given sites. To circumvent this problem we rewrite the Hamiltonian given in (2) as the sum of “short range” and “long range” Hamiltonians; we then perform a short range strong magnetic field expansion on the short range part and a long range high temperature expansion on the long range part, the range being chosen large enough so that the latter expansion converges. Without loss of generality we assume $B \ge 0$.

So let us fix a range $R > 0$, to be chosen later. We write $H^\chi_\Lambda = H^\chi_{\Lambda,s.r.} + H^\chi_{\Lambda,l.r.}$, where the short range Hamiltonian is
\[ H^\chi_{\Lambda,s.r.}(\sigma) = -\sum_{\substack{\{x,y\}\in\Lambda^* \\ |x-y|\le R}} J_{xy}\sigma_x\sigma_y + B\sum_{x\in\Lambda} h_x\sigma_x - \sum_{\substack{x\in\Lambda,\ y\notin\Lambda \\ |x-y|\le R}} J_{xy}\sigma_x\chi_y, \tag{41} \]
and the long range Hamiltonian is
\[ H^\chi_{\Lambda,l.r.}(\sigma) = -\sum_{\substack{\{x,y\}\in\Lambda^* \\ |x-y|>R}} J_{xy}\sigma_x\sigma_y - \sum_{\substack{x\in\Lambda,\ y\notin\Lambda \\ |x-y|>R}} J_{xy}\sigma_x\chi_y. \tag{42} \]
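We redefine $H^\chi_{\Lambda,l.r.}$ by adding an overall constant as follows:
\[ H^\chi_{\Lambda,l.r.}(\sigma) = -\sum_{\substack{\{x,y\}\in\Lambda^* \\ |x-y|>R}} \big( J_{xy}\sigma_x\sigma_y + |J_{xy}| \big) - \sum_{\substack{x\in\Lambda,\ y\notin\Lambda \\ |x-y|>R}} J_{xy}\sigma_x\chi_y. \tag{43} \]
For a subset $K$ of $\Lambda^*$ we set
\[ T_K(\sigma) = -\sum_{\substack{\{x,y\}\in K \\ |x-y|>R}} \big( J_{xy}\sigma_x\sigma_y + |J_{xy}| \big) \tag{44} \]
and
\[ T^\chi_K(\sigma) = T_K(\sigma) - \sum_{\substack{x\in\Lambda,\ y\notin\Lambda \\ |x-y|>R}} J_{xy}\sigma_x\chi_y. \tag{45} \]
Note that
\[ H^\chi_{\Lambda,l.r.} = T^\chi_{\Lambda^*} = T^\chi_{\Lambda^*\cap S_b} + T_{\Lambda^*\setminus S_b} \tag{46} \]
for any subset $S_b$ of $\mathbb{Z}^{d*}$. We also define “long range bonds” and “short range sites” for a self avoiding walk $\omega = \{\{x_1,y_1\}, \{x_2,y_2\}, \dots, \{x_n,y_n\}\}$ (as defined in the previous subsection):
\[ \omega_l = \{\{x,y\} \in \omega;\ |x-y| > R\}, \]
\[ \omega_s = \{x \in \mathbb{Z}^d;\ x = x_i \text{ for some } i \text{ and either } |x-x_{i-1}| \le R \text{ or } |x-x_{i+1}| \le R\}. \]

Theorem 4. Given $S_b \subset \mathbb{Z}^{d*}$ and $S_s \subset \mathbb{Z}^d$, let $\rho_{xy} = \rho_{xy}(S_b,J,\beta)$ be as in (11) and
\[ \theta_x = \theta_x(S_s,J,B,h,\beta) = \begin{cases} \zeta_x & \text{if } x \notin S_s, \\ 1 & \text{if } x \in S_s, \end{cases} \tag{47} \]
where
\[ \zeta_x = \zeta_x(J,B,h,\beta) = \exp\Big\{ -2\beta\Big( 2B|h_x| - 6\sum_{\substack{y\in\mathbb{Z}^d \\ |x-y|\le R}} |J_{xy}| \Big) \Big\}. \tag{48} \]
Then for any two local observables $A$ and $B$, any finite $\Lambda$ containing their supports, and any boundary condition $\chi$ we have
\[ |\langle A;B\rangle^\chi_\Lambda| \le 2\,\|A\|\,\|B\| \sum_{\omega\in W_{AB}}\ \prod_{x\in\omega_s} 3\theta_x \prod_{\{x,y\}\in\omega_l} \rho_{xy}. \tag{49} \]

Proof. For $H^\chi_{\Lambda,s.r.}$ we proceed as in [DKP]; we introduce variables $\eta = \{\eta_x;\ x \in \mathbb{Z}^d\}$, where
\[ \eta_x = \frac{(\mathrm{sgn}\,h_x)\,\sigma_x + 1}{2} \in \{0,1\}, \tag{50} \]
with $\mathrm{sgn}\,u = 1$ if $u \ge 0$ and $\mathrm{sgn}\,u = -1$ otherwise. We set $K_{xy} = (\mathrm{sgn}\,h_x)(\mathrm{sgn}\,h_y)J_{xy}$. We redefine $H^\chi_{\Lambda,s.r.}$ by subtracting an overall constant and rewrite it in the new variables:
\[ H^\chi_{\Lambda,s.r.}(\eta) = -4\sum_{\substack{\{x,y\}\in\Lambda^* \\ |x-y|\le R}} K_{xy}\eta_x\eta_y + 2\sum_{x\in\Lambda} \Big( B|h_x| + \sum_{\substack{y\in\Lambda \\ |x-y|\le R}} K_{xy} - \sum_{\substack{y\notin\Lambda \\ |x-y|\le R}} K_{xy}\,(\mathrm{sgn}\,h_y)\,\chi_y \Big)\eta_x. \tag{51} \]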
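The change of variables (50) and the rewritten short range Hamiltonian (51) can be checked on a toy volume: the $\sigma$-form (41) and the $\eta$-form (51) should differ by a configuration-independent constant. The sketch below assumes $d = 1$, free boundary conditions ($\chi \equiv 0$), five sites, $R = 2$ and arbitrary nonzero fields; all names and values are illustrative and not taken from the paper.

```python
import itertools

def sgn(u):
    # Sign convention of the text: sgn u = 1 if u >= 0, -1 otherwise.
    return 1 if u >= 0 else -1

def h_sr_sigma(sigma, J, h, B, R):
    # Short range Hamiltonian (41) with chi = 0, in the spin variables sigma.
    n = len(sigma)
    pair = sum(J[x][y] * sigma[x] * sigma[y]
               for x in range(n) for y in range(x + 1, n) if y - x <= R)
    return -pair + B * sum(h[x] * sigma[x] for x in range(n))

def h_sr_eta(eta, J, h, B, R):
    # Short range Hamiltonian (51) with chi = 0, in the occupation variables eta.
    n = len(eta)

    def K(x, y):
        return sgn(h[x]) * sgn(h[y]) * J[min(x, y)][max(x, y)]

    quad = sum(K(x, y) * eta[x] * eta[y]
               for x in range(n) for y in range(x + 1, n) if y - x <= R)
    lin = sum((B * abs(h[x])
               + sum(K(x, y) for y in range(n) if y != x and abs(x - y) <= R)) * eta[x]
              for x in range(n))
    return -4.0 * quad + 2.0 * lin

def offsets(J, h, B, R):
    # Difference between the two forms over all configurations.
    n = len(h)
    out = []
    for eta in itertools.product((0, 1), repeat=n):
        sigma = [sgn(h[x]) * (2 * eta[x] - 1) for x in range(n)]  # inverse of (50)
        out.append(h_sr_sigma(sigma, J, h, B, R) - h_sr_eta(eta, J, h, B, R))
    return out

# Only the upper triangle of J is used; values are arbitrary, with h_x != 0.
J = [[0.0, 0.7, -0.4, 0.2, 0.1],
     [0.0, 0.0, 0.3, -0.6, 0.2],
     [0.0, 0.0, 0.0, 0.5, -0.1],
     [0.0, 0.0, 0.0, 0.0, 0.9],
     [0.0, 0.0, 0.0, 0.0, 0.0]]
h = [0.4, -1.2, 0.8, -0.3, 1.5]
diffs = offsets(J, h, B=2.0, R=2)
print(min(diffs), max(diffs))
```

In this setting the constant equals $-\sum K_{xy} - B\sum_x |h_x|$, the first sum running over pairs with $|x-y| \le R$.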
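Given a configuration $\tilde\eta = (\eta, \eta')$ of the duplicated system we define $G_{\tilde\eta} = \{x \in \mathbb{Z}^d : \eta_x + \eta'_x > 0\}$, and say that a configuration $\tilde\eta$ is compatible with $G \subset \mathbb{Z}^d$, and write $\tilde\eta \prec G$, if $G_{\tilde\eta} = G$. In what follows we consider $\tilde\sigma$ as a function of $\tilde\eta$ by (50), so we only need to sum over $\tilde\eta$. We have
\begin{align}
\big\langle\big\langle \hat A\hat B \big\rangle\big\rangle^\chi_\Lambda
&= \frac{1}{\tilde Z^\chi_\Lambda} \sum_{\tilde\eta} \hat A\hat B\, e^{-\beta\big[ \tilde H^\chi_{\Lambda,s.r.}(\tilde\eta) + \tilde H^\chi_{\Lambda,l.r.}(\tilde\sigma) \big]} \tag{52} \\
&= \frac{1}{\tilde Z^\chi_\Lambda} \sum_{G\subset\Lambda} \sum_{\tilde\eta\prec G} \hat A\hat B\, e^{-\beta\tilde H^\chi_{\Lambda,s.r.}(\tilde\eta)}\, e^{-\beta\tilde H^\chi_{\Lambda,l.r.}(\tilde\sigma)} \tag{53} \\
&= \frac{1}{\tilde Z^\chi_\Lambda} \sum_{G\subset\Lambda} \sum_{\tilde\eta\prec G} \hat A\hat B\, e^{-\beta\tilde H^{\Lambda,\chi}_{G,s.r.}(\tilde\eta)}\, e^{-\beta\tilde H^\chi_{\Lambda,l.r.}(\tilde\sigma)} \tag{54} \\
&= \frac{1}{\tilde Z^\chi_\Lambda} \sum_{G\subset\Lambda} \sum_{\tilde\eta\prec G} \hat A\hat B\, e^{-\beta\tilde H^{\Lambda,\chi}_{G,s.r.}(\tilde\eta)}\, e^{-\beta\tilde T^\chi_{\Lambda^*\cap S_b}(\tilde\sigma)}\, e^{-\beta\tilde T_{\Lambda^*\setminus S_b}(\tilde\sigma)}, \tag{55}
\end{align}
where
\[ H^{\Lambda,\chi}_{G,s.r.}(\eta) = -4\sum_{\substack{\{x,y\}\in G^* \\ |x-y|\le R}} K_{xy}\eta_x\eta_y + 2\sum_{x\in G} \Big( B|h_x| + \sum_{\substack{y\in\Lambda \\ |x-y|\le R}} K_{xy} - \sum_{\substack{y\notin\Lambda \\ |x-y|\le R}} K_{xy}\,(\mathrm{sgn}\,h_y)\,\chi_y \Big)\eta_x. \tag{56} \]
The equality in (54) follows from the fact that for $\tilde\eta \prec G$ and $x \notin G$ we have $\eta_x = \eta'_x = 0$; (55) comes from (46).

To deal with the long range part of the above equation we follow the same method as in [DKP], using $E_{xy}(\tilde\sigma) = \exp\big\{ \beta\big[ J_{xy}(\sigma_x\sigma_y + \sigma'_x\sigma'_y) + 2|J_{xy}| \big] \big\} - 1$:
\begin{align}
e^{-\beta\tilde T_{\Lambda^*\setminus S_b}(\tilde\sigma)}
&= \exp\Big\{ -\beta\Big[ -\sum_{\substack{\{x,y\}\in\Lambda^*\setminus S_b \\ |x-y|>R}} \big( J_{xy}(\sigma_x\sigma_y + \sigma'_x\sigma'_y) + 2|J_{xy}| \big) \Big] \Big\} \tag{57} \\
&= \prod_{\substack{\{x,y\}\in\Lambda^*\setminus S_b \\ |x-y|>R}} \big( E_{xy}(\tilde\sigma) + 1 \big) = \sum_{G_b\subset\Lambda^*\setminus S_b}\ \prod_{\substack{\{x,y\}\in G_b \\ |x-y|>R}} E_{xy}(\tilde\sigma), \tag{58}
\end{align}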
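so we have
\[ \big\langle\big\langle \hat A\hat B \big\rangle\big\rangle^\chi_\Lambda = \frac{1}{\tilde Z^\chi_\Lambda} \sum_{G\subset\Lambda}\ \sum_{G_b\subset\Lambda^*\setminus S_b}\ \sum_{\tilde\eta\prec G} \hat A\hat B\, e^{-\beta\tilde H^{\Lambda,\chi}_{G,s.r.}(\tilde\eta)}\, e^{-\beta\tilde T^\chi_{\Lambda^*\cap S_b}(\tilde\sigma)} \prod_{\substack{\{x,y\}\in G_b \\ |x-y|>R}} E_{xy}(\tilde\sigma). \tag{59} \]
Now we notice that if $G \subset \Lambda$ and $G_b \subset \Lambda^*\setminus S_b$ are such that there is no self avoiding walk $\omega$ from $A$ to $B$ with $\omega_s \subset G \cup S_s$ and $\omega_l \subset G_b \cup S_b$, then the term with $G$ and $G_b$ contributes zero to the sum in (59). So we can restrict the sum to pairs $G, G_b$ for which there exists $\omega \in W_{AB}$ such that $G = G' \cup (\omega_s\setminus S_s)$, where $G' \subset \Lambda\setminus(\omega_s\setminus S_s)$, and $G_b = G'_b \cup (\omega_l\setminus S_b)$, where $G'_b \subset (\Lambda^*\setminus S_b)\setminus\omega_l$. Thus
\begin{align}
\big\langle\big\langle \hat A\hat B \big\rangle\big\rangle^\chi_\Lambda
= \frac{1}{\tilde Z^\chi_\Lambda} \sum_{\omega\in W_{AB}}\ \sum_{G'\subset\Lambda\setminus(\omega_s\setminus S_s)}\ \sum_{\tilde\eta\prec G'\cup(\omega_s\setminus S_s)} & \hat A\hat B\, e^{-\beta\tilde H^{\Lambda,\chi}_{G'\cup(\omega_s\setminus S_s),s.r.}(\tilde\eta)} \notag \\
\times\ e^{-\beta\tilde T^\chi_{\Lambda^*\cap S_b}(\tilde\sigma)} & \prod_{\{x,y\}\in\omega_l\setminus S_b} E_{xy} \sum_{G'_b\subset(\Lambda^*\setminus S_b)\setminus\omega_l}\ \prod_{\substack{\{x',y'\}\in G'_b \\ |x'-y'|>R}} E_{x'y'}, \tag{60}
\end{align}
and so
\begin{align}
\Big| \big\langle\big\langle \hat A\hat B \big\rangle\big\rangle^\chi_\Lambda \Big|
\le \frac{4\,\|A\|\,\|B\|}{\tilde Z^\chi_\Lambda} \sum_{\omega\in W_{AB}}\ \sum_{G'\subset\Lambda\setminus(\omega_s\setminus S_s)}\ \sum_{\tilde\eta\prec G'\cup(\omega_s\setminus S_s)} & e^{-\beta\tilde H^{\Lambda,\chi}_{G'\cup(\omega_s\setminus S_s),s.r.}(\tilde\eta)} \notag \\
\times\ e^{-\beta\tilde T^\chi_{\Lambda^*\cap S_b}(\tilde\sigma)} \prod_{\{x,y\}\in\omega_l\setminus S_b} E_{xy} & \prod_{\substack{\{x',y'\}\in(\Lambda^*\setminus S_b)\setminus\omega_l \\ |x'-y'|>R}} \big( E_{x'y'} + 1 \big) \notag \\
\times \prod_{\{x,y\}\in\omega_l\setminus S_b} \big( E_{xy} + 1 \big) & \quad \text{(this expression is bounded below by 1!)} \tag{61} \\
= \frac{4\,\|A\|\,\|B\|}{\tilde Z^\chi_\Lambda} \sum_{\omega\in W_{AB}}\ \sum_{G'\subset\Lambda\setminus(\omega_s\setminus S_s)}\ \sum_{\tilde\eta\prec G'\cup(\omega_s\setminus S_s)} & e^{-\beta\tilde H^{\Lambda,\chi}_{G'\cup(\omega_s\setminus S_s),s.r.}(\tilde\eta)} \notag \\
\times \underbrace{e^{-\beta\tilde T^\chi_{\Lambda^*\cap S_b}(\tilde\sigma)} \prod_{\substack{\{x',y'\}\in\Lambda^*\setminus S_b \\ |x'-y'|>R}} \big( E_{x'y'} + 1 \big)}_{\text{this is exactly } e^{-\beta\tilde H^\chi_{\Lambda,l.r.}(\tilde\sigma)}} & \prod_{\{x,y\}\in\omega_l\setminus S_b} E_{xy}. \tag{62}
\end{align}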
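We will now look at the expression $H^{\Lambda,\chi}_{G'\cup(\omega_s\setminus S_s),s.r.}(\eta)$ (given in (56)). Let us examine the first term of this expression:
\begin{align}
-4\sum_{\substack{\{x,y\}\in(G'\cup(\omega_s\setminus S_s))^* \\ |x-y|\le R}} K_{xy}\eta_x\eta_y
&= -4\sum_{\substack{\{x,y\}\in(G')^* \\ |x-y|\le R}} K_{xy}\eta_x\eta_y - 4\sum_{x\in(\omega_s\setminus S_s)}\ \sum_{\substack{y\in G'\cup(\omega_s\setminus S_s) \\ |x-y|\le R}} K_{xy}\eta_x\eta_y \notag \\
&= -4\sum_{\substack{\{x,y\}\in(G')^* \\ |x-y|\le R}} K_{xy}\eta_x\eta_y - 4\sum_{x\in(\omega_s\setminus S_s)}\ \sum_{\substack{y\in\Lambda \\ |x-y|\le R}} K_{xy}\eta_x\eta_y, \tag{63}
\end{align}
where the last equality holds if $\tilde\eta \prec G' \cup (\omega_s\setminus S_s)$. So now we have
\begin{align}
H^{\Lambda,\chi}_{G'\cup(\omega_s\setminus S_s),s.r.}(\eta)
&= -4\sum_{\substack{\{x,y\}\in(G')^* \\ |x-y|\le R}} K_{xy}\eta_x\eta_y - 4\sum_{x\in(\omega_s\setminus S_s)}\ \sum_{\substack{y\in\Lambda \\ |x-y|\le R}} K_{xy}\eta_x\eta_y \notag \\
&\quad + 2\sum_{x\in G'} \Big( B|h_x| + \sum_{\substack{y\in\Lambda \\ |x-y|\le R}} K_{xy} - \sum_{\substack{y\notin\Lambda \\ |x-y|\le R}} K_{xy}\,(\mathrm{sgn}\,h_y)\,\chi_y \Big)\eta_x \notag \\
&\quad + 2\sum_{x\in(\omega_s\setminus S_s)} \Big( B|h_x| + \sum_{\substack{y\in\Lambda \\ |x-y|\le R}} K_{xy} - \sum_{\substack{y\notin\Lambda \\ |x-y|\le R}} K_{xy}\,(\mathrm{sgn}\,h_y)\,\chi_y \Big)\eta_x \notag \\
&= H^{\Lambda,\chi}_{G',s.r.}(\eta) + \sum_{x\in(\omega_s\setminus S_s)} \Gamma^\chi_{x,\Lambda}(\eta), \tag{64}
\end{align}
where
\begin{align}
\Gamma^\chi_{x,\Lambda}(\eta) &= -4\sum_{\substack{y\in\Lambda \\ |x-y|\le R}} K_{xy}\eta_x\eta_y + 2\Big( B|h_x| + \sum_{\substack{y\in\Lambda \\ |x-y|\le R}} K_{xy} - \sum_{\substack{y\notin\Lambda \\ |x-y|\le R}} K_{xy}\,(\mathrm{sgn}\,h_y)\,\chi_y \Big)\eta_x \notag \\
&\ge 2\Big( 2B|h_x| - 6\sum_{\substack{y\in\mathbb{Z}^d \\ |x-y|\le R}} |J_{xy}| \Big) \equiv Y_{B,x} = Y_{B,x}(J,h). \tag{65}
\end{align}
Thus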
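\begin{align}
\Big| \big\langle\big\langle \hat A\hat B \big\rangle\big\rangle^\chi_\Lambda \Big|
&\le \frac{4}{\tilde Z^\chi_\Lambda}\,\|A\|\,\|B\| \sum_{\omega\in W_{AB}}\ \sum_{G'\subset\Lambda\setminus(\omega_s\setminus S_s)}\ \sum_{\tilde\eta\prec G'\cup(\omega_s\setminus S_s)} e^{-\beta\tilde H^\chi_{\Lambda,l.r.}(\tilde\sigma)} \notag \\
&\qquad \times\ e^{-\beta\tilde H^{\Lambda,\chi}_{G',s.r.}(\tilde\eta)} \prod_{x\in(\omega_s\setminus S_s)} e^{-\beta\tilde\Gamma^\chi_{x,\Lambda}(\tilde\eta)} \prod_{\{x,y\}\in\omega_l\setminus S_b} E_{xy}(\tilde\sigma) \tag{66} \\
&\le \frac{4}{\tilde Z^\chi_\Lambda}\,\|A\|\,\|B\| \sum_{\omega\in W_{AB}}\ \sum_{G'\subset\Lambda\setminus(\omega_s\setminus S_s)}\ \sum_{\tilde\eta\prec G'\cup(\omega_s\setminus S_s)} e^{-\beta\tilde H^\chi_{\Lambda,l.r.}(\tilde\sigma)} \notag \\
&\qquad \times\ e^{-\beta\tilde H^{\Lambda,\chi}_{G',s.r.}(\tilde\eta)} \prod_{x\in(\omega_s\setminus S_s)} e^{-\beta Y_{B,x}} \prod_{\{x,y\}\in\omega_l\setminus S_b} \xi_{xy} \tag{67} \\
&= \frac{4}{\tilde Z^\chi_\Lambda}\,\|A\|\,\|B\| \sum_{\omega\in W_{AB}}\ \prod_{x\in(\omega_s\setminus S_s)} e^{-\beta Y_{B,x}} \prod_{\{x,y\}\in\omega_l\setminus S_b} \xi_{xy} \notag \\
&\qquad \times \sum_{G'\subset\Lambda\setminus(\omega_s\setminus S_s)}\ \sum_{\tilde\eta\prec G'\cup(\omega_s\setminus S_s)} e^{-\beta\tilde H^{\Lambda,\chi}_{G',s.r.}(\tilde\eta)}\, e^{-\beta\tilde H^\chi_{\Lambda,l.r.}(\tilde\sigma)}. \tag{68}
\end{align}
Since $\#\{\tilde\eta : \tilde\eta \prec G' \cup (\omega_s\setminus S_s)\} = 3^{|\omega_s\setminus S_s|} \times \#\{\tilde\eta : \tilde\eta \prec G'\}$, we get
\begin{align}
\Big| \big\langle\big\langle \hat A\hat B \big\rangle\big\rangle^\chi_\Lambda \Big|
&\le \frac{4}{\tilde Z^\chi_\Lambda}\,\|A\|\,\|B\| \sum_{\omega\in W_{AB}}\ \prod_{x\in(\omega_s\setminus S_s)} 3e^{-\beta Y_{B,x}} \prod_{\{x,y\}\in\omega_l\setminus S_b} \xi_{xy} \notag \\
&\qquad \times \sum_{G'\subset\Lambda\setminus(\omega_s\setminus S_s)}\ \sum_{\tilde\eta\prec G'} e^{-\beta\tilde H^{\Lambda,\chi}_{G',s.r.}(\tilde\eta)}\, e^{-\beta\tilde H^\chi_{\Lambda,l.r.}(\tilde\sigma)} \tag{69} \\
&\le \frac{4}{\tilde Z^\chi_\Lambda}\,\|A\|\,\|B\| \sum_{\omega\in W_{AB}}\ \prod_{x\in(\omega_s\setminus S_s)} 3e^{-\beta Y_{B,x}} \prod_{\{x,y\}\in\omega_l\setminus S_b} \xi_{xy} \notag \\
&\qquad \times \underbrace{\sum_{G'\subset\Lambda}\ \sum_{\tilde\eta\prec G'} e^{-\beta\tilde H^{\Lambda,\chi}_{G',s.r.}(\tilde\eta)}\, e^{-\beta\tilde H^\chi_{\Lambda,l.r.}(\tilde\sigma)}}_{\text{this is exactly } \tilde Z^\chi_\Lambda} \tag{70} \\
&= 4\,\|A\|\,\|B\| \sum_{\omega\in W_{AB}}\ \prod_{x\in(\omega_s\setminus S_s)} 3e^{-\beta Y_{B,x}} \prod_{\{x,y\}\in\omega_l\setminus S_b} \xi_{xy} \tag{71} \\
&= 4\,\|A\|\,\|B\| \sum_{\omega\in W_{AB}}\ \prod_{x\in\omega_s} 3\theta_x \prod_{\{x,y\}\in\omega_l} \rho_{xy}, \tag{72}
\end{align}
so (49) now follows from (10).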
We will now prove the first part of Theorem 2. As before we will take expectations of both sides of (49). Notice that the random variables $\{\theta_x;\ x \in \mathbb{Z}^d\}$ are not independent. We set
\[ \Xi_{0y_0} = \sum_{\omega:0\to y_0} \mathbb{E}\Big( \prod_{x\in\omega_s} 3\theta_x \Big) \prod_{\{x,y\}\in\omega_l} \bar\rho_{xy}. \tag{73} \]
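We pick $S_b = \{\{x,y\} \in \mathbb{Z}^{d*};\ |x-y| > R \text{ and } \xi_{xy} > 1\}$, so proceeding as in (29), we get, with $R$ sufficiently large, that
\begin{align}
\bar\rho_{xy} &\le \Big( 5 + \frac{4}{\ln 2} \Big) \frac{\beta\tilde X}{|x-y|^{\alpha d}} \tag{74} \\
&< 13\beta\tilde X\, \frac{1}{|x-y|^{\alpha' d}}\, \frac{1}{|x-y|^{(\alpha-\alpha')d}} \qquad (\text{for any } 1 < \alpha' < \alpha) \notag \\
&< \frac{13\beta\tilde X}{R^{(\alpha-\alpha')d}}\, \frac{1}{|x-y|^{\alpha' d}} \qquad (\text{since } |x-y| > R). \tag{75}
\end{align}
Hence for any given $\theta > 0$ and any $1 < \alpha' < \alpha$ we can find $R$ large enough so that $\bar\rho_{xy} < \dfrac{\theta}{|x-y|^{\alpha' d}}$.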
We now turn to the $\theta_x$. Let $\omega$ be a self avoiding walk with $\omega_s = \{x_1, \dots, x_N\}$, the $x$'s being ordered as they appear in $\omega$, so $\prod_{x\in\omega_s}\theta_x = \prod_{i=1}^{N}\theta_{x_i}$. We obtain a subset of $\omega_s$ inductively as follows: The first step is to consider those $x_i \in \omega_s$ for which $|x_i - x_1| \le R$ and to throw them away. From the remaining $x_i$’s ($i \ne 1$) we now pick the one with the lowest index, say $x_{i_2}$, and throw away all remaining $x_i$’s such that $|x_i - x_{i_2}| \le R$. The procedure is repeated until we exhaust the $x_i$’s. After this elimination process, what is left is a subset of $\omega_s$, say $\{y_1 (= x_1), y_2, \dots, y_M\}$, with the property that $|y_i - y_j| > R$ for all $1 \le i < j \le M$, and so the $\theta_{y_i}$ are independent random variables. Note that at each step at most $R^d - 1$ sites are thrown out, so at the end at most $M(R^d - 1)$ sites have been thrown out. Therefore $N \le M + M(R^d - 1)$ and thus $M \ge \dfrac{N}{R^d}$.

Let us now choose $S_s = \big\{ x : 2B|h_x| - 6\sum_{y:\,|x-y|\le R} |J_{xy}| \le 0 \big\}$; then, since $\mathbb{P}(h_x = 0) = 0$,
\[ \mathbb{P}(x \in S_s) \le \mathbb{P}\Big( B \le \frac{6\sum_{y:\,|x-y|\le R} |J_{xy}|}{2|h_x|} \Big) \longrightarrow 0 \quad \text{as } B \to \infty. \tag{76} \]
Thus $\bar\theta \equiv \mathbb{E}(\theta_x) \to 0$ as $B \to \infty$. Since the above definition for $S_s$ implies that $\theta_x \le 1$, we have
\[ \mathbb{E}\Big( \prod_{x\in\omega_s} \theta_x \Big) \le \mathbb{E}\Big( \prod_{i=1}^{M} \theta_{y_i} \Big) = \bar\theta^{\,M} \le \bar\theta^{\,N/R^d} \equiv \tilde\theta^{\,N}. \tag{77} \]
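The elimination process used to extract the $R$-separated sites $y_1, \dots, y_M$ is a greedy thinning, and can be sketched as follows. The sketch is illustrative only: it uses the max-norm on $\mathbb{Z}^2$ and a random list of sites in place of $\omega_s$, and the counting check uses the number $(2R+1)^d$ of lattice sites in a max-norm ball, which is an assumption of this sketch (the text states the count as $R^d$).

```python
import random

def max_norm(x, y):
    # Distance |x - y| on Z^d; the max-norm is an assumption of this sketch.
    return max(abs(a - b) for a, b in zip(x, y))

def thin(sites, R):
    # Keep the first remaining site, discard every remaining site within
    # distance R of it, and repeat until no sites are left.
    remaining = list(sites)
    kept = []
    while remaining:
        y = remaining[0]
        kept.append(y)
        remaining = [x for x in remaining[1:] if max_norm(x, y) > R]
    return kept

rng = random.Random(1)
d, R = 2, 3
sites = sorted({(rng.randrange(20), rng.randrange(20)) for _ in range(150)})
kept = thin(sites, R)
print(len(sites), len(kept))
```

Every kept pair is more than $R$ apart, which is what makes the corresponding $\theta_{y_i}$ depend on disjoint sets of couplings.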
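Thus given $\theta > 0$, $1 < \alpha' < \alpha$ and $R > 0$, if $B$ is large enough we have $3\tilde\theta < \dfrac{\theta}{R^{\alpha' d}}$. In this case it follows from (73) that
\begin{align}
\Xi_{0y_0} &\le \sum_{\omega:0\to y_0}\ \prod_{x\in\omega_s} \frac{\theta}{R^{\alpha' d}} \prod_{\{x,y\}\in\omega_l} \frac{\theta}{|x-y|^{\alpha' d}} \tag{78} \\
&\le \sum_{\omega:0\to y_0}\ \prod_{\{x,y\}\in\omega} \frac{\theta}{|x-y|^{\alpha' d}}. \tag{79}
\end{align}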
The inequality (7) now follows from (79) and (49), as in the proof of (5), concluding the proof of the first part of Theorem 2.
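3. Uniqueness of Gibbs States

In this section we consider a long range random Ising model as in (1), except that we do not require independence of the random variables. (We still require $X$ and $h$ to be families of identically distributed (within each family) random variables.)

Lemma 3.2. Suppose (5) (the expectation being over all the random variables) holds with $\mathbb{E}\big(|X_{xy}|^{\frac{\alpha}{\alpha-1}+\epsilon}\big) \le C_1 < \infty$ for some $\epsilon > 0$. Then
\[ \lim_{L\to\infty} \mathbb{E}\Big( \sup_\chi \big| \langle\sigma_\Delta\rangle^\chi_{B_L} - \langle\sigma_\Delta\rangle_{B_L} \big| \Big) = 0, \tag{80} \]
where the supremum is taken over all external boundary conditions $\chi$, $B_L$ is a box of side $2L$ centered at the origin in $\mathbb{Z}^d$, $\Delta$ is a finite subset of $\mathbb{Z}^d$, and $\sigma_\Delta = \prod_{x\in\Delta}\sigma_x$.

Proof. It follows from the Fundamental Theorem of Calculus that
\[ \big| \langle\sigma_\Delta\rangle^\chi_{B_L} - \langle\sigma_\Delta\rangle_{B_L} \big| = \Big| \int_0^1 \frac{d}{ds} \langle\sigma_\Delta\rangle^{s\chi}_{B_L}\, ds \Big| \le \beta \sum_{x\in B_L,\ y\notin B_L} |J_{xy}| \int_0^1 \big| \langle\sigma_\Delta;\sigma_x\rangle^{s\chi}_{B_L} \big|\, ds, \tag{81} \]
so
\[ \sup_\chi \big| \langle\sigma_\Delta\rangle^\chi_{B_L} - \langle\sigma_\Delta\rangle_{B_L} \big| \le \beta \sum_{x\in B_L,\ y\notin B_L} |J_{xy}|\, G_L(\Delta,x), \tag{82} \]
where
\[ G_L(\Delta,x) = \sup_\chi\, \sup_{0\le s\le 1} \big| \langle\sigma_\Delta;\sigma_x\rangle^{s\chi}_{B_L} \big|. \tag{83} \]
Clearly $G_L(\Delta,x) \le 2$, so by hypothesis we have
\[ \mathbb{E}\big( G_L(\Delta,x) \big) \le \min\Big\{ \frac{C|\Delta|}{d(\Delta,x)^{\alpha d}},\ 2 \Big\} \quad \text{for all } L. \tag{84} \]
Taking the expectation of both sides in (82) gives
\[ \mathbb{E}\Big( \sup_\chi \big| \langle\sigma_\Delta\rangle^\chi_{B_L} - \langle\sigma_\Delta\rangle_{B_L} \big| \Big) \le \beta \sum_{x\in B_L,\ y\notin B_L} \mathbb{E}\big( |J_{xy}|\, G_L(\Delta,x) \big). \tag{85} \]
Let us fix a number $\gamma$ with $0 < \gamma < \min\{1, \alpha-1\}$, and expand the last sum by dividing $B_L$ into two parts so that
\[ \sum_{x\in B_L,\ y\notin B_L} \mathbb{E}\big( |J_{xy}|\, G_L(\Delta,x) \big) = S_1 + S_2, \]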
where $S_1 = \sum_{|x|\le L^\gamma,\ |y|>L} \mathbb{E}\big( |J_{xy}|\, G_L(\Delta,x) \big)$ and $S_2 = \sum_{L^\gamma<|x|\le L,\ |y|>L} \mathbb{E}\big( |J_{xy}|\, G_L(\Delta,x) \big)$.

We will first show convergence of $S_1$ to 0:
\begin{align}
S_1 &\le \sum_{|x|\le L^\gamma}\ \sum_{y:\,|x-y|>L-L^\gamma} \mathbb{E}\big( |J_{xy}|\, G_L(\Delta,x) \big) \tag{86} \\
&\le 2\tilde X \sum_{|x|\le L^\gamma}\ \sum_{y:\,|x-y|>L-L^\gamma} \frac{1}{|x-y|^{\alpha d}} \le 2\tilde X C_d \sum_{|x|\le L^\gamma}\ \sum_{r=L-L^\gamma+1}^{\infty} \frac{r^{d-1}}{r^{\alpha d}}, \tag{87}
\end{align}
where $C_d$ is some finite constant depending only on $d$. The series $\sum_{r=L-L^\gamma+1}^{\infty} r^{d-\alpha d-1}$ converges since $\alpha > 1$. In fact it behaves like the integral $\int_{L-L^\gamma}^{\infty} t^{d-\alpha d-1}\, dt$, which is a constant multiple of $(L-L^\gamma)^{d-\alpha d}$. Therefore $S_1$ is bounded by a constant multiple of $L^{\gamma d}(L-L^\gamma)^{d-\alpha d}$, which (for large $L$) behaves like $L^{\gamma d}L^{d-\alpha d} = L^{\gamma d+d-\alpha d}$, which converges to 0 as $L \to \infty$. (Recall that we chose $0 < \gamma < \alpha-1$.) Thus $S_1 \to 0$ as $L \to \infty$.

We now apply Hölder’s inequality to the expectation in $S_2$ with $q = \frac{\alpha}{\alpha-1} + \epsilon$, $\frac1p + \frac1q = 1$:
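\begin{align}
S_2 &\le \sum_{L^\gamma<|x|\le L,\ |y|>L} \big( \mathbb{E}\,|J_{xy}|^q \big)^{\frac1q} \big( \mathbb{E}\, G_L(\Delta,x)^p \big)^{\frac1p} \tag{88} \\
&\le \sum_{L^\gamma<|x|\le L,\ |y|>L} \Big( \frac{C_1}{|x-y|^{q\alpha d}} \Big)^{\frac1q} \Big( 2^{p-1}\, \frac{C|\Delta|}{d(\Delta,x)^{\alpha d}} \Big)^{\frac1p} \tag{89} \\
&= C_2\, |\Delta|^{\frac1p} \sum_{L^\gamma<|x|\le L} \frac{1}{d(\Delta,x)^{\frac{\alpha d}{p}}} \sum_{|y|>L} \frac{1}{|x-y|^{\alpha d}} \tag{90} \\
&\le C_3\, |\Delta|^{\frac1p} \sum_{L^\gamma<|x|\le L} \frac{1}{\big( \frac{|x|}{2} \big)^{\frac{\alpha d}{p}}} \qquad (\text{for } L \text{ large enough}) \tag{91} \\
&\le C_3\, |\Delta|^{\frac1p}\, C_d\, 2^{\frac{\alpha d}{p}} \sum_{r=L^\gamma+1}^{L} \frac{r^{d-1}}{r^{\frac{\alpha d}{p}}}, \tag{92}
\end{align}
where $C_2$ and $C_3$ are finite constants. But
\[ \sum_{r=L^\gamma+1}^{L} r^{d-1-\frac{\alpha d}{p}} \approx \int_{L^\gamma+1}^{L} t^{d-1-\frac{\alpha d}{p}}\, dt \to 0 \quad \text{as } L \to \infty. \]
(Recall that $\frac{\alpha}{p} > 1$, so that $d - \frac{\alpha d}{p}$ is a negative number.) Hence we also have $S_2 \to 0$ as $L \to \infty$, and that completes the proof.

Lemma 3.3. Suppose (80) holds. Then there is a unique Gibbs state with probability one.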
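Proof. It follows from (80) that, with probability one, for each finite $\Delta \subset \mathbb{Z}^d$ we can find a sequence $L_{\Delta,k} \to \infty$ (depending on $\Delta$) such that
\[ \lim_{k\to\infty} \sup_\chi \big| \langle\sigma_\Delta\rangle^\chi_{B_{L_{\Delta,k}}} - \langle\sigma_\Delta\rangle_{B_{L_{\Delta,k}}} \big| = 0. \tag{93} \]
Now suppose $\Phi$ is an arbitrary Gibbs state for a given $X$ and $h$. It follows from the DLR equations that for any finite $\Delta \subset \mathbb{Z}^d$ we have
\[ \big| \Phi(\sigma_\Delta) - \langle\sigma_\Delta\rangle_\Lambda \big| \le \sup_\chi \big| \langle\sigma_\Delta\rangle^\chi_\Lambda - \langle\sigma_\Delta\rangle_\Lambda \big| \tag{94} \]
for all finite subsets $\Lambda$ of $\mathbb{Z}^d$. Thus it follows from (93) that, with probability one, we have
\[ \lim_{k\to\infty} \big| \Phi(\sigma_\Delta) - \langle\sigma_\Delta\rangle_{B_{L_{\Delta,k}}} \big| \le \lim_{k\to\infty} \sup_\chi \big| \langle\sigma_\Delta\rangle^\chi_{B_{L_{\Delta,k}}} - \langle\sigma_\Delta\rangle_{B_{L_{\Delta,k}}} \big| = 0 \tag{95} \]
for all finite $\Delta \subset \mathbb{Z}^d$. Thus $\Phi$ is unique with probability one.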
The second parts of Theorems 1 and 2 now follow from the respective first parts and Lemmas 3.2 and 3.3.

References

[BD] Bassalygo, L. and Dobrushin, R.: Uniqueness of a Gibbs field with random potential – An elementary approach. Theory Probab. Appl. 31, 572–589 (1986)
[B] Berretti, A.: Some properties of random Ising models. J. Stat. Phys. 38, 483–496 (1985)
[DKP] von Dreifus, H., Klein, A. and Perez, J.F.: Taming Griffith’s singularities: infinite differentiability of correlation functions. Commun. Math. Phys. 170, 21–39 (1995)
[F] Fröhlich, J.: Mathematical aspects of the Physics of disordered systems. In: Critical Phenomena, Random Systems, Gauge Theories, K. Osterwalder and R. Stora (eds.), Amsterdam: Elsevier, 1986
[FI] Fröhlich, J. and Imbrie, J.: Improved perturbation expansion for disordered systems: beating Griffiths singularities. Commun. Math. Phys. 96, 145–180 (1984)
[FZ] Fröhlich, J. and Zegarlinski, B.: The high temperature phase of long range spin glasses. Commun. Math. Phys. 110, 121–155 (1987)
[GM] Gielis, G. and Maes, C.: The uniqueness regime of Gibbs fields with unbounded disorder. J. Stat. Phys. 81, 829–835 (1995)
[G] Griffiths, R. B.: Non-analytic behavior above the critical point in a random Ising ferromagnet. Phys. Rev. Lett. 23, 17–19 (1969)
[Su] Süto, A.: Weak singularity and absence of metastability in random Ising ferromagnets. J. Phys. A15, L749–L752 (1982)
[Z] Zegarlinski, B.: Spin glasses and long range interactions at high temperature. J. Stat. Phys. 47, 911–930 (1987)

Communicated by Ya. G. Sinai
Commun. Math. Phys. 189, 513 – 519 (1997)
Communications in
Mathematical Physics © Springer-Verlag 1997
Models with Even Potential and the Behaviour of Total Spin at the Critical Point

Boris S. Nahapetian

Institute of Mathematics of Armenian National Academy of Sciences, Marshal Bargramian Ave. 24–b, 375019 Yerevan, Armenia

Received: 10 October 1996 / Accepted: 20 January 1997
Dedicated to the memory of Roland Dobrushin

Abstract: We consider a model with even potential and describe its phase diagram. We show that at the critical point of this model the total spin is asymptotically standard normal. A formula expressing the Ising model total spin probabilities by means of total spin probabilities of the considered model is established.

1. Many probabilistic methods are widely used in problems of mathematical statistical physics. However, the martingale method, powerful and well known as it is, has been practically ignored in this theory. The reason apparently is that the notion of a martingale is essentially based on the complete ordering of the real line, whereas the main problems of statistical physics are, as a rule, multidimensional. In [1–4] an attempt was made to use the martingale method in the theory of Gibbs random fields (r.f.). The basic notion of these works is that of a martingale-difference r.f. A r.f. $\xi_t$, $t \in \mathbb{Z}^\nu$ is called a martingale-difference r.f. if $E|\xi_t| < \infty$, $t \in \mathbb{Z}^\nu$, and
\[ E(\xi_t \,|\, \xi_s,\ s \in \mathbb{Z}^\nu\setminus\{t\}) = 0 \quad \text{a.s.}, \qquad t \in \mathbb{Z}^\nu. \]
Note that for a martingale-difference r.f. $\xi_t$, $t \in \mathbb{Z}^\nu$ the family of random variables
\[ S_V = \sum_{t\in V} \xi_t, \qquad V \in W, \quad W = \{V \subset \mathbb{Z}^\nu : |V| < \infty\} \]
forms a martingale with respect to any sequence of increasing subsets $V_n \in W$, $n = 1, 2, \dots$, i.e.
\[ E(S_{V_n} \,|\, \sigma(\xi_t,\ t \in V_{n-1})) = S_{V_{n-1}} \quad \text{a.s.}, \qquad n = 2, 3, \dots. \]
There are various constructions of martingale-difference r.f. (see [2–4]). We mention the following one: let a Gibbs random field component take values in some symmetric set $X \subseteq \mathbb{R}^1$ ($x \in X \Rightarrow -x \in X$) with finite symmetric measure $\mu$ ($\mu(A) = \mu(-A)$) and let the potential $\Phi$ of this Gibbs r.f. be even, i.e.
$$\Phi_V(x_t,\ t \in V) = \Phi_V(|x_t|,\ t \in V), \qquad V \in W.$$
Then the Gibbs r.f. will be a martingale-difference r.f.

2. For martingale-difference r.f. the following central limit theorem (CLT) is valid.

Theorem 1. ([2]) Let $\xi_t$, $t \in \mathbb{Z}^\nu$, be a homogeneous ergodic martingale-difference random field such that $0 < \sigma^2 = E\xi_0^2 < \infty$. Then

$$\lim_{n\to\infty} P\left(\frac{S_{V_n}}{\sigma n^{\nu/2}} < x\right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\,du, \qquad x \in \mathbb{R}^1, \tag{1}$$

where $V_n$ is a $\nu$-dimensional cube with side length $n$, $n = 1, 2, \ldots$.

Corollary 1. Let $\xi_t$, $t \in \mathbb{Z}^\nu$, be a translation-invariant ergodic Gibbs random field with even potential and $0 < E\xi_0^2 < \infty$. Then (1) is valid for this r.f.

3. Classical statistical physics generally assumes that the CLT breaks down at the critical temperature for lattice models in dimensions $\nu = 2, 3$ (see, for instance, [5–7]). The usual justification refers to the Ising ferromagnetic model. This model is defined on the integer lattice $\mathbb{Z}^\nu$, $\nu \ge 1$, by means of the nearest neighbor potential

$$\Phi_{\{t,s\}}(x_t, x_s) = \begin{cases} x_t x_s, & |t - s| = 1, \\ 0, & |t - s| \ne 1, \end{cases}$$

where $|t - s| = \max_{1 \le i \le \nu} |t^{(i)} - s^{(i)}|$, $t, s \in \mathbb{Z}^\nu$, and $x_t, x_s \in X = \{-1, 1\}$.
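Let $H_\Lambda^{\beta,h}(x/\bar x)$, $\Lambda \subset \mathbb{Z}^\nu$, $|\Lambda| < \infty$, be the Hamiltonian of the Ising model defined on the configurations $x = (x_t,\ t \in \Lambda) \in X^\Lambda$ with boundary conditions $\bar x \in X^{\partial\Lambda}$, where $\partial\Lambda = \{t \in \mathbb{Z}^\nu \setminus \Lambda : d(t, \Lambda) = 1\}$ and $d(t, \Lambda) = \inf_{s \in \Lambda} |t - s|$, i.e.

$$H_\Lambda^{\beta,h}(x/\bar x) = \frac{\beta}{2} \sum_{\langle t,s\rangle,\ t,s \in \Lambda} x_t x_s + \beta \sum_{\langle t,s\rangle,\ t \in \Lambda,\ s \in \partial\Lambda} x_t \bar x_s + h \sum_{t \in \Lambda} x_t.$$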
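Above, $\langle t, s\rangle$ denotes summation over all ordered pairs of nearest neighbors $t$ and $s$; the parameters $h \in \mathbb{R}^1$ and $\beta \in \mathbb{R}^1_+$ are called the external field and the inverse temperature respectively. In the Ising model the distribution in a finite volume $\Lambda$ with boundary conditions $\bar x \in X^{\partial\Lambda}$ has the Gibbs form

$$P_\Lambda^{\beta,h}(x/\bar x) = \left(Z_\Lambda^{\beta,h}(\bar x)\right)^{-1} \exp[-H_\Lambda^{\beta,h}(x/\bar x)], \quad x \in X^\Lambda, \qquad Z_\Lambda^{\beta,h}(\bar x) = \sum_{x \in X^\Lambda} \exp[-H_\Lambda^{\beta,h}(x/\bar x)].$$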
The boundary conditions $\{\bar x_t = 1,\ t \in \partial\Lambda\}$ and $\{\bar x_t = -1,\ t \in \partial\Lambda\}$, which are called "plus" and "minus" respectively, are of special importance. We denote the corresponding Hamiltonians and Gibbs distributions by $H_{\Lambda,+}^{\beta,h}$, $P_{\Lambda,+}^{\beta,h}$, $H_{\Lambda,-}^{\beta,h}$, $P_{\Lambda,-}^{\beta,h}$. It is well known that when $\Lambda \uparrow \mathbb{Z}^\nu$, both measures $P_{\Lambda,+}^{\beta,h}$ and $P_{\Lambda,-}^{\beta,h}$ converge weakly to some limits $P_+^{\beta,h}$ and $P_-^{\beta,h}$. Both limiting distributions are ergodic Gibbs distributions, and any other space-invariant limiting distribution is a linear combination of these two.
For $h \ne 0$ and every $\beta$, $0 < \beta < \infty$, the distributions $P_+^{\beta,h}$ and $P_-^{\beta,h}$ coincide: $P_+^{\beta,h} = P_-^{\beta,h} = P^{\beta,h}$. This means that in the Ising model we have uniqueness for this range of parameters $(\beta, h)$. For $h = 0$ uniqueness holds only in some interval $0 < \beta \le \beta_{cr}$, and for $\beta > \beta_{cr}$ we have $P_+^{\beta,0} \ne P_-^{\beta,0}$. The last relation means that as $\beta$ increases and passes the inverse critical temperature $\beta_{cr}$, we have a phase transition. Note also that for $0 < \beta < \beta_{cr}$, $h = 0$, the pair correlations $E^{\beta,0} x_t x_s$ for the distribution $P^{\beta,0}$ decrease exponentially as $|t - s| \to \infty$. In other words, there exists a positive quantity $\tau(\beta, 0)$, which is called the correlation length, such that

$$E^{\beta,0} x_t x_s \sim \exp\left(-\frac{|t - s|}{\tau(\beta, 0)}\right).$$
The standard, albeit not rigorous, arguments in favor of a CLT break-down at the critical point of the Ising model are as follows. As $\beta$ approaches $\beta_{cr}$ from below with $h = 0$, the correlation between spins increases and the correlation length $\tau(\beta, 0)$ tends to infinity. At the critical point $\tau(\beta_{cr}, 0)$ becomes infinite and correlations decay according to a power law. In this situation one cannot consider the spins as weakly dependent random variables; hence the CLT breaks down. At the same time the following question arises: how should one rescale the total spin to obtain a nondegenerate limiting distribution? To demonstrate the lack of rigor in this reasoning we give below a simple example of an exactly solvable lattice model for which the total spin at the critical temperature is asymptotically normal. In this model the pair correlations between spins equal zero for any $(h, \beta)$, in particular for $\beta = \beta_{cr}$ and $h = h_{cr}$. Our model is described by the following nearest neighbor potential
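$$\tilde\Phi_{\{t,s\}}(y_t, y_s) = \begin{cases} |y_t| \cdot |y_s|, & |t - s| = 1, \\ 0, & |t - s| \ne 1, \end{cases} \qquad t, s \in \mathbb{Z}^\nu,$$

where $y_t, y_s \in Y = \{-1, 0, 1\}$. The Hamiltonian is given by

$$\tilde H_\Lambda^{\beta,h}(y/\bar y) = \frac{\beta}{2} \sum_{\langle t,s\rangle,\ t,s \in \Lambda} |y_t| \cdot |y_s| + \beta \sum_{\langle t,s\rangle,\ t \in \Lambda,\ s \in \partial\Lambda} |y_t| \cdot |\bar y_s| + h \sum_{t \in \Lambda} |y_t|, \qquad y \in Y^\Lambda,\ \bar y \in Y^{\partial\Lambda}.$$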
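The claim that pair correlations vanish, and the martingale-difference property behind it, can be checked by brute force in a tiny volume. The sketch below is only an illustration under stated assumptions: it uses a periodic one-dimensional chain instead of a volume with boundary conditions, counts each nearest-neighbor pair once with weight β (equivalent to β/2 over ordered pairs), and all function names are hypothetical.

```python
import itertools
import math


def even_model_weights(n_sites, beta, h):
    """Unnormalised weights exp(-H) on a periodic chain, spins in {-1, 0, 1}.

    Assumption: H = beta * sum over neighbor pairs |y_t||y_s| + h * sum |y_t|.
    """
    weights = {}
    for y in itertools.product((-1, 0, 1), repeat=n_sites):
        a = [abs(v) for v in y]
        pair_term = sum(a[t] * a[(t + 1) % n_sites] for t in range(n_sites))
        weights[y] = math.exp(-(beta * pair_term + h * sum(a)))
    return weights


def pair_correlation(weights, t, s):
    """E[y_t y_s] under the normalised Gibbs weights."""
    z = sum(weights.values())
    return sum(w * y[t] * y[s] for y, w in weights.items()) / z


def max_conditional_mean(weights, t):
    """Largest |E(y_t | all other spins)| over all conditioning configurations."""
    num, den = {}, {}
    for y, w in weights.items():
        rest = y[:t] + y[t + 1:]
        num[rest] = num.get(rest, 0.0) + w * y[t]
        den[rest] = den.get(rest, 0.0) + w
    return max(abs(num[r] / den[r]) for r in den)
```

Because the energy depends on the spins only through $|y_t|$, the configurations with $y_t = 1$ and $y_t = -1$ carry equal weight, so both quantities vanish up to rounding for any choice of β and h.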
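Let us rewrite the Hamiltonian $\tilde H_\Lambda^{\beta,h}$ in a more convenient form. Below, $\sum_{\Lambda}$ stands for the sum over $\langle t,s\rangle$ with $t, s \in \Lambda$, and $\sum_{\partial}$ for the sum over $\langle t,s\rangle$ with $t \in \Lambda$, $s \in \partial\Lambda$. We have

$$\begin{aligned}
\tilde H_\Lambda^{\beta,h}(y/\bar y) ={}& \frac{\beta}{8} \sum_{\Lambda} (2|y_t| - 1)(2|y_s| - 1) + \frac{\beta}{2} \sum_{\Lambda} |y_t| - \frac{\beta}{8} \sum_{\Lambda} 1 \\
&+ \frac{\beta}{4} \sum_{\partial} (2|y_t| - 1)(2|\bar y_s| - 1) + \frac{\beta}{2} \sum_{\partial} |y_t| + \frac{\beta}{2} \sum_{\partial} |\bar y_s| - \frac{\beta}{4} \sum_{\partial} 1 \\
&+ \frac{h}{2} \sum_{t \in \Lambda} (2|y_t| - 1) + \frac{h}{2} |\Lambda| \\
={}& \frac{\beta}{8} \sum_{\Lambda} (2|y_t| - 1)(2|y_s| - 1) + \frac{\beta}{4} \sum_{\partial} (2|y_t| - 1)(2|\bar y_s| - 1) + \frac{h}{2} \sum_{t \in \Lambda} (2|y_t| - 1) \\
&+ \frac{\beta}{2} \sum_{\Lambda} |y_t| + \frac{\beta}{2} \sum_{\partial} |y_t| + \frac{\beta}{2} \sum_{\partial} |\bar y_s| + f(|\Lambda|, |\partial\Lambda|) + \frac{h}{2} |\Lambda|,
\end{aligned}$$

where

$$f(|\Lambda|, |\partial\Lambda|) = -\frac{\beta}{8} \sum_{\Lambda} 1 - \frac{\beta}{4} \sum_{\partial} 1.$$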
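Further,

$$\begin{aligned}
\tilde H_\Lambda^{\beta,h}(y/\bar y) ={}& \frac{\beta}{8} \sum_{\langle t,s\rangle,\ t,s \in \Lambda} (2|y_t| - 1)(2|y_s| - 1) + \frac{\beta}{4} \sum_{\langle t,s\rangle,\ t \in \Lambda,\ s \in \partial\Lambda} (2|y_t| - 1)(2|\bar y_s| - 1) \\
&+ \frac{h}{2} \sum_{t \in \Lambda} (2|y_t| - 1) + \frac{\beta}{2} \sum_{t \in \Lambda} |y_t| \cdot 2\nu + f(|\Lambda|, |\partial\Lambda|) + \frac{h}{2} |\Lambda| + \frac{\beta}{2} \sum_{\langle t,s\rangle,\ t \in \Lambda,\ s \in \partial\Lambda} |\bar y_s|.
\end{aligned}$$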
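Finally, since $\beta\nu \sum_{t \in \Lambda} |y_t| = \frac{\beta\nu}{2} \sum_{t \in \Lambda} (2|y_t| - 1) + \frac{\beta\nu}{2} |\Lambda|$,

$$\begin{aligned}
\tilde H_\Lambda^{\beta,h}(y/\bar y) ={}& \frac{\beta}{8} \sum_{\langle t,s\rangle,\ t,s \in \Lambda} (2|y_t| - 1)(2|y_s| - 1) + \frac{\beta}{4} \sum_{\langle t,s\rangle,\ t \in \Lambda,\ s \in \partial\Lambda} (2|y_t| - 1)(2|\bar y_s| - 1) \\
&+ \frac{h + \beta\nu}{2} \sum_{t \in \Lambda} (2|y_t| - 1) + \frac{\beta}{2} \sum_{\langle t,s\rangle,\ t \in \Lambda,\ s \in \partial\Lambda} |\bar y_s| + \frac{h + \beta\nu}{2} |\Lambda| + f(|\Lambda|, |\partial\Lambda|) \\
={}& H_\Lambda^{\beta/4,\,(h+\beta\nu)/2}(2|y| - 1 \,/\, 2|\bar y| - 1) + \frac{\beta}{2} \sum_{\langle t,s\rangle,\ t \in \Lambda,\ s \in \partial\Lambda} |\bar y_s| + \frac{h + \beta\nu}{2} |\Lambda| + f(|\Lambda|, |\partial\Lambda|),
\end{aligned} \tag{2}$$

where

$$2|y| - 1 = \{2|y_t| - 1,\ t \in \Lambda\}, \qquad 2|\bar y| - 1 = \{2|\bar y_t| - 1,\ t \in \partial\Lambda\}.$$

The terms in (2) that follow the Ising Hamiltonian do not depend on $y$ and therefore cancel in the Gibbs distribution. Using (2) we write the corresponding Gibbs distribution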
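$$Q_\Lambda^{\beta,h}(y/\bar y) = \frac{\exp\left[-H_\Lambda^{\beta/4,\,(h+\beta\nu)/2}(2|y| - 1 \,/\, 2|\bar y| - 1)\right]}{\sum_{y \in Y^\Lambda} \exp\left[-H_\Lambda^{\beta/4,\,(h+\beta\nu)/2}(2|y| - 1 \,/\, 2|\bar y| - 1)\right]}.$$

Put $\tilde Y = \{0, 1\}$. Then

$$\begin{aligned}
\sum_{y \in Y^\Lambda} \exp\left[-H_\Lambda^{\beta/4,\,(h+\beta\nu)/2}(2|y| - 1 \,/\, 2|\bar y| - 1)\right]
&= \sum_{\tilde y \in \tilde Y^\Lambda} 2^{\sum_{t \in \Lambda} \tilde y_t} \exp\left[-H_\Lambda^{\beta/4,\,(h+\beta\nu)/2}(2\tilde y - 1 \,/\, 2|\bar y| - 1)\right] \\
&= 2^{|\Lambda|/2} \sum_{\tilde y \in \tilde Y^\Lambda} 2^{\frac12 \sum_{t \in \Lambda} (2\tilde y_t - 1)} \exp\left[-H_\Lambda^{\beta/4,\,(h+\beta\nu)/2}(2\tilde y - 1 \,/\, 2|\bar y| - 1)\right] \\
&= 2^{|\Lambda|/2} \sum_{\tilde y \in \tilde Y^\Lambda} \exp\left[-H_\Lambda^{\beta/4,\,(h+\beta\nu-\ln 2)/2}(2\tilde y - 1 \,/\, 2|\bar y| - 1)\right] \\
&= 2^{|\Lambda|/2}\, Z_\Lambda^{\beta/4,\,(h+\beta\nu-\ln 2)/2}(2|\bar y| - 1).
\end{aligned}$$

Further we have

$$\begin{aligned}
Q_\Lambda^{\beta,h}(y/\bar y)
&= 2^{-\sum_{t \in \Lambda} |y_t|}\; \frac{2^{\frac12 \sum_{t \in \Lambda} (2|y_t| - 1)} \exp\left[-H_\Lambda^{\beta/4,\,(h+\beta\nu)/2}(2|y| - 1 \,/\, 2|\bar y| - 1)\right]}{Z_\Lambda^{\beta/4,\,(h+\beta\nu-\ln 2)/2}(2|\bar y| - 1)} \\
&= 2^{-\sum_{t \in \Lambda} |y_t|}\; \frac{\exp\left[-H_\Lambda^{\beta/4,\,(h+\beta\nu-\ln 2)/2}(2|y| - 1 \,/\, 2|\bar y| - 1)\right]}{Z_\Lambda^{\beta/4,\,(h+\beta\nu-\ln 2)/2}(2|\bar y| - 1)} \\
&= 2^{-\sum_{t \in \Lambda} |y_t|}\; P_\Lambda^{\beta/4,\,(h+\beta\nu-\ln 2)/2}(2|y| - 1 \,/\, 2|\bar y| - 1).
\end{aligned}$$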
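The final identity between the Gibbs distribution of the even model and the Ising distribution with shifted parameters can be verified by exhaustive enumeration. The following is a minimal sketch under stated assumptions, not the author's computation: it takes ν = 1, a chain of n sites with one boundary site on each side, counts each nearest-neighbor pair once with the full coupling (equivalent to half the coupling over ordered pairs), and uses hypothetical function names.

```python
import itertools
import math


def even_energy(y, left, right, beta, h):
    """Energy of the even model on a chain with boundary spins left/right."""
    a = [abs(left)] + [abs(v) for v in y] + [abs(right)]
    pairs = sum(a[i] * a[i + 1] for i in range(len(a) - 1))
    return beta * pairs + h * sum(a[1:-1])


def ising_energy(x, left, right, coupling, field):
    """Ising energy with the same sign convention, so the weight is exp(-H)."""
    s = [left] + list(x) + [right]
    pairs = sum(s[i] * s[i + 1] for i in range(len(s) - 1))
    return coupling * pairs + field * sum(x)


def gibbs(states, energy):
    """Normalised Gibbs probabilities exp(-energy) / Z over the given states."""
    w = {s: math.exp(-energy(s)) for s in states}
    z = sum(w.values())
    return {s: v / z for s, v in w.items()}


def max_discrepancy(n, left, right, beta, h, nu=1):
    """max over y of |Q(y) - 2^(-sum|y_t|) * P(2|y| - 1)| for the 1D chain."""
    q = gibbs(itertools.product((-1, 0, 1), repeat=n),
              lambda y: even_energy(y, left, right, beta, h))
    coupling = beta / 4
    field = (h + beta * nu - math.log(2)) / 2
    p = gibbs(itertools.product((-1, 1), repeat=n),
              lambda x: ising_energy(x, 2 * abs(left) - 1, 2 * abs(right) - 1,
                                     coupling, field))
    worst = 0.0
    for y, qy in q.items():
        x = tuple(2 * abs(v) - 1 for v in y)
        predicted = 2.0 ** (-sum(abs(v) for v in y)) * p[x]
        worst = max(worst, abs(qy - predicted))
    return worst
```

The discrepancy should be at rounding level for zero, nonzero and mixed boundary spins, which also confirms that the additive constants in (2) play no role.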
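Consider now two sets of boundary conditions

$$\{\bar y \in Y^{\partial\Lambda} : \bar y_t = 0,\ t \in \partial\Lambda\}, \qquad \{\bar y \in Y^{\partial\Lambda} : \bar y_t \ne 0,\ t \in \partial\Lambda\};$$

the first set consists of the zero configuration only, while the second set includes all configurations with nonzero components. Denote by $Q_{\Lambda,0}^{\beta,h}$ and $Q_{\Lambda,+}^{\beta,h}$ the Gibbs distributions corresponding to zero boundary conditions and nonzero boundary conditions respectively. We have

$$Q_{\Lambda,0}^{\beta,h}(y) = 2^{-\sum_{t \in \Lambda} |y_t|}\, P_{\Lambda,-}^{\beta/4,\,(h+\beta\nu-\ln 2)/2}(2|y| - 1), \qquad Q_{\Lambda,+}^{\beta,h}(y) = 2^{-\sum_{t \in \Lambda} |y_t|}\, P_{\Lambda,+}^{\beta/4,\,(h+\beta\nu-\ln 2)/2}(2|y| - 1).$$

For $I \subset \Lambda$ the following equalities are valid:

$$\begin{aligned}
(Q_{\Lambda,0}^{\beta,h})_I(y) &= 2^{-\sum_{t \in I} |y_t|} \sum_{\tilde y \in Y^{\Lambda \setminus I}} 2^{-\sum_{t \in \Lambda \setminus I} |\tilde y_t|}\, P_{\Lambda,-}^{\beta/4,\,(h+\beta\nu-\ln 2)/2}(2|y| - 1,\ 2|\tilde y| - 1) \\
&= 2^{-\sum_{t \in I} |y_t|} \left(P_{\Lambda,-}^{\beta/4,\,(h+\beta\nu-\ln 2)/2}\right)_I (2|y| - 1),
\end{aligned} \tag{3}$$

where

$$(2|y| - 1,\ 2|\tilde y| - 1) = (2|y_t| - 1,\ t \in I;\ 2|\tilde y_t| - 1,\ t \in \Lambda \setminus I).$$

Similarly

$$(Q_{\Lambda,+}^{\beta,h})_I(y) = 2^{-\sum_{t \in I} |y_t|} \left(P_{\Lambda,+}^{\beta/4,\,(h+\beta\nu-\ln 2)/2}\right)_I (2|y| - 1). \tag{4}$$

Since the limits of the right hand sides of (3) and (4) exist as $\Lambda \uparrow \mathbb{Z}^\nu$, we get

$$(Q_0^{\beta,h})_I(y) = 2^{-\sum_{t \in I} |y_t|} \left(P_-^{\beta/4,\,(h+\beta\nu-\ln 2)/2}\right)_I (2|y| - 1), \qquad (Q_+^{\beta,h})_I(y) = 2^{-\sum_{t \in I} |y_t|} \left(P_+^{\beta/4,\,(h+\beta\nu-\ln 2)/2}\right)_I (2|y| - 1), \qquad y \in Y^I.$$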
From these equations we conclude that whenever $h + \beta\nu - \ln 2 \ne 0$, the limiting Gibbs distribution in our model for $\nu = 2$ is unique (since for the corresponding parameters we have uniqueness in the Ising model), and that there is a phase transition on the line $h + \beta\nu - \ln 2 = 0$, $0 < \beta < \infty$. The critical point has the coordinates

$$\beta_{cr}^* = 4\beta_{cr}, \qquad h_{cr}^* = -\beta_{cr}^* \nu + \ln 2,$$
where $\beta_{cr}$ is the critical inverse temperature for the Ising model. Note now that the Gibbs distribution of our model at the critical point is unique and therefore ergodic. This distribution is translation-invariant and has the martingale-difference property since the corresponding potential is even (see, for example, [2]–[4]). It remains to use Corollary 1.

4. Denote by $S_V^{Is}$ and $S_V^{ev}$ the total spin in the volume $V \in W$ for the Ising model considered above and for the model with even potential respectively. It is not difficult to see that the following relation holds:

$$P(S_V^{ev} = k) = \sum_{j=0}^{[(|V|-k)/2]} C_{k+2j}^{2j}\, 2^{k+2j}\, P(S_V^{Is} = k + 2j), \qquad k \ge 0,$$
where $[\cdot]$ denotes the integer part of a number. More important is the following inverse formula:

$$\begin{aligned}
P(S_V^{Is} = k) ={}& 2^{-k} P(S_V^{ev} = k) - \sum_{j=1}^{[(|V|-k)/2]} C_{k+2j}^{2j}\, P(S_V^{ev} = k + 2j) \\
&+ \sum_{j=1}^{[(|V|-k)/2]} C_{k+2j}^{2j}\, 2^{2j} \sum_{s=1}^{[(|V|-k)/2]-j} C_{k+2(s+j)}^{2s}\, 2^{2s}\, P(S_V^{ev} = k + 2(s + j)) \\
&- \sum_{j=1}^{[(|V|-k)/2]} C_{k+2j}^{2j}\, 2^{2j} \sum_{s=1}^{[(|V|-k)/2]-j} C_{k+2(s+j)}^{2s}\, 2^{2s} \sum_{l=1}^{[(|V|-k)/2]-j-s} C_{k+2(s+j+l)}^{2l}\, 2^{2l}\, P(S_V^{ev} = k + 2(s + j + l)) + \cdots.
\end{aligned}$$

This formula may make it possible to investigate the asymptotic behaviour of the total spin of the Ising model by means of the properties obtained for the model with even potential.
References
1. Nahapetian, B.S., Petrosian, A.N.: Martingale-difference Gibbs random fields and central limit theorem. Ann. Acad. Sci. Fennicae, Ser. A. I. Math., Vol. 17, 105–110 (1992)
2. Nahapetian, B.S.: Billingsley-Ibragimov theorem for martingale-difference random fields and its applications to some models of classical statistical physics. C. R. Acad. Sci. Paris, Vol. 320, Ser. I, 1539–1544 (1995)
3. Nahapetian, B.S., Petrosian, A.N.: Martingale-difference random fields. Limit theorems and some applications. E. Schrödinger Intern. Inst., Vienna, Preprint ESI 283, 1995
4. Nahapetian, B.S., Petrosian, A.N.: Limit theorems for martingale-difference random fields [in Russian]. Izv. Akad. Nauk Armenii. Matematika [English translation: J. Contemp. Math. Anal. (Armenian Academy of Sciences)], Vol. 30, 6, 22–38 (1995)
5. Sinai, Ya.G.: Theory of Phase Transitions: Rigorous Results. Oxford: Pergamon Press and Budapest: Akademiai Kiado, 1982
6. Ellis, R.S.: Entropy, Large Deviations and Statistical Mechanics. Berlin–Heidelberg–New York: Springer-Verlag, 1985
7. Georgii, H.-O.: Gibbs Measures and Phase Transitions. Berlin, New York: de Gruyter, 1988
Communicated by Ya. G. Sinai
Commun. Math. Phys. 189, 521 – 531 (1997)
Communications in
Mathematical Physics © Springer-Verlag 1997
Statistical Approach to Dynamical Inverse Problems* P. L. Chow, R. Z. Khasminskii Department of Mathematics, Wayne State University, Detroit, MI 48202, USA Received: 10 October 1996 / Accepted: 20 January 1997

* This work was supported in part by the Office of Naval Research under the grants N00014-93-0936 and N00014-95-0793.

Dedicated to the memory of Roland Dobrushin

Abstract: Based on nonparametric estimation ideas, a statistical approach to the dynamical inverse problem for some nth order nonlinear differential equations is introduced. It is proved that estimators of real-time filter type converge, as the data noise tends to zero, to the unknown force function and its derivatives, respectively, at a rate that is optimal in the minimax sense.

1. Introduction

According to Hadamard [6], an initial and/or boundary value problem in mathematical physics is said to be well-posed if the problem has a unique solution which depends continuously on the coefficients of the equation and the initial/boundary data. Otherwise the problem is considered ill-posed. In the mathematical formulation of a physical problem, well-posedness is usually required of the direct problem of finding the solutions to the governing differential equations with known coefficients and given initial/boundary conditions. In many engineering and physical applications, it is of great interest to consider an inverse problem: given the usually imprecise information on the solution, determine the unknown coefficients and, possibly, the initial/boundary data. Such inverse problems are typically ill-posed, since a small error in the solutions can result in a large deviation from the true values of the unknown coefficients or other parameters to be determined. It is this lack of continuous dependence that causes the main difficulties in most inverse problems. To get around these difficulties, there exist many deterministic approaches to this type of problem: for instance, the method of quasi-solutions [9] and the regularization method of Tikhonov [15]. Alternatively, statistical approaches to inverse problems have been considered by many researchers, including Sudakov and Khalfin [13], Franklin [4], Wahba [16], O'Sullivan [14], Donoho [2], Banks and Kunisch [1],
Johnstone and Silverman [10], Golubev and Pinsker [5], and Ermakov [3], among others. In a statistical approach, as in the least-squares method, random data noise is introduced to model the measurement errors, and the ill-posed problem is reformulated as a well-defined statistical estimation problem. All these authors except [1] were concerned with linear or linearized inverse problems, while the methodology used in [1], basically a least-squares approach, is quite different from what will be proposed here. This paper is concerned with a statistical approach to some nonlinear dynamical inverse problems. As an example, consider the dynamical equation for the nonlinear oscillation of a particle given by

$$x''(t) = f(t, x, x') + S(t), \tag{1.1}$$
where $f(t, x, x')$ is the combination of the nonlinear elastic and frictional forces, and $S(t)$ is the external force, which is only known to belong to a certain class. The problem is to determine the force function $S(t)$ by observing the trajectory $\{x(s),\ s \le t\}$ in the presence of measurement noise. As a generalization of (1.1), we shall consider the case of an nth order equation describing certain multidimensional dynamical systems. Based on a nonparametric estimation idea, we propose a new statistical approach to treat a class of dynamical inverse problems governed by nonlinear equations such as (1.1). For the upper bounds we apply the nonparametric kernel estimation technique due to Rosenblatt [12] and Parzen [11]. For dynamical problems, to satisfy the causality principle in physics, the window function is modified so that the estimator becomes a filter "in real time". Moreover it will be shown that the proposed statistical approach provides a nonlinear filter type of estimator which converges to the unknown force function, as the data noise tends to zero, at an optimal rate in a minimax sense. The main results are contained in Theorems 1-2 and Theorems 3-4, which give, respectively, the upper and the lower bounds on the estimation error for different classes of $S$ and its derivatives within a large class of loss functions. For simplicity we shall only consider dynamical inverse problems governed by a single nth order differential equation. The current approach can be adapted to the case of systems of nonlinear equations as well.
2. Nonparametric Estimation for Inverse Problem

Let $x(t)$ be a solution of the initial value problem for the nth order differential equation

$$x^{(n)}(t) = f(t, x, x', \ldots, x^{(n-1)}) + S(t), \qquad 0 < t \le T, \tag{2.1}$$

subject to the initial condition $x^{(i)}(0) = x_0^{(i)}$, $i = 0, 1, \ldots, n - 1$, where $f$ has $(k + 1)$ bounded partial derivatives with respect to $t, x, x', \ldots, x^{(n-1)}$, and the forcing function $S(t)$, $t \in [0, T]$, and, possibly, the initial data are unknown. The data $\dot Y_\varepsilon(t)$, $t \in [0, T]$, for the position $x(t)$ are observed in the presence of a white noise of intensity $\varepsilon$, so that

$$dY_\varepsilon(t) = x(t)\,dt + \varepsilon\,dW(t), \qquad 0 < t \le T, \tag{2.2}$$

with $Y_\varepsilon(0) = 0$, where $W(t)$, $t \ge 0$, is the standard Wiener process. Let $\beta = k + \alpha$, with $k$ being a nonnegative integer and $\alpha \in (0, 1]$, and denote by $\Sigma(\beta, L)$ the set of functions $S(t)$ on $[0, T]$ having $k$ continuous derivatives in $(0, T)$ such that

$$|S^{(k)}(t + h) - S^{(k)}(t)| \le L|h|^\alpha, \tag{2.3}$$
for all $t, (t + h) \in (0, T)$, where $L > 0$ is a constant. The problem is, given the a priori information $S \in \Sigma(\beta, L)$, to estimate $S(t)$ based on the observed data $\{Y_\varepsilon(s),\ s \le t\}$. Let $\Lambda(\mu)$ denote the class of functions $\ell : \mathbb{R} \to \mathbb{R}_+$, monotonically nondecreasing for $x > 0$, such that $\ell(0) = 0$, $\ell(-x) = \ell(x)$ and $\ell(x) \le c \exp\{\mu x^2\}$ for some positive constants $c$ and $\mu$. In [7] the authors applied the Parzen-Rosenblatt type of estimators for the estimation of a signal and its derivatives in Gaussian white noise of intensity $\varepsilon$, corresponding to the case $x(t) = S(t)$ in Eq. (2.2). It was proved that these estimators are optimal in a minimax sense with respect to the rate at which the risks converge to zero as $\varepsilon \to 0$. The main goal of this paper is to generalize these results to the nonlinear estimation problem given by (2.1) and (2.2).
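Theorem 1. Let $g : \mathbb{R} \to \mathbb{R}$ be a function with $n + k$ continuous derivatives and with the following properties:

(1) $$g(t) = 0 \quad \text{for } t \ge 0 \text{ and for } t < -A, \tag{2.4}$$

where $A > 0$ is a constant,

(2) $$\int_{-A}^{0} g(t)\,dt = 1; \qquad \int_{-A}^{0} t^i g(t)\,dt = 0, \quad i = 1, 2, \ldots, k. \tag{2.5}$$

Let $S \in \Sigma(\beta, L)$ and consider the filtering type of estimators for $x^{(i)}(t)$:

$$\hat X_{\varepsilon,\delta}^{(i)}(t) = \frac{(-1)^i}{\delta^{i+1}} \int_0^t g^{(i)}\!\left(\frac{s - t}{\delta}\right) dY_\varepsilon(s), \qquad i = 0, 1, \ldots, n + k, \tag{2.6}$$

and the estimator for $S(t)$:

$$\hat S_{\varepsilon,\delta}(t) = \hat X_{\varepsilon,\delta}^{(n)}(t) - f(t, \hat X_{\varepsilon,\delta}(t), \hat X_{\varepsilon,\delta}^{(1)}(t), \ldots, \hat X_{\varepsilon,\delta}^{(n-1)}(t)). \tag{2.7}$$

Then there exist constants $C_1 > 0$ and $C_2 > 0$ (independent of $\varepsilon$, $\delta$) such that, for any $\varepsilon > 0$, $\delta > 0$ and $t > A\delta$,

$$E|\hat S_{\varepsilon,\delta}(t) - S(t)|^2 \le C_1 \delta^{2\beta} + C_2 \frac{\varepsilon^2}{\delta^{2n+1}}. \tag{2.8}$$

In particular, by choosing $\delta(\varepsilon) = \gamma \varepsilon^{2/[2(\beta+n)+1]}$, we get

$$\sup_{S \in \Sigma(\beta,L)} E|\hat S_\varepsilon(t) - S(t)|^2 \le C \varepsilon^{4\beta/[2(\beta+n)+1]}, \tag{2.9}$$

where $\hat S_\varepsilon(t) = \hat S_{\varepsilon,\delta(\varepsilon)}(t)$.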
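A kernel satisfying (2.4) and (2.5) can be written down explicitly. The sketch below is one possible construction and is not taken from the paper: it assumes $g(t) = w(t)\,p(t)$ on $[-A, 0]$ with the polynomial weight $w(t) = (-t(t + A))^m$ and a polynomial $p$ of degree $k$ fixed by the moment conditions, so that $g$ extended by zero has $m - 1$ continuous derivatives (take $m \ge n + k + 1$). Exact rational arithmetic is used; all function names are hypothetical.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_eval(p, t):
    """Value of the polynomial p at t."""
    return sum(c * Fraction(t) ** i for i, c in enumerate(p))


def integrate(p, a, b):
    """Exact integral of the polynomial p over [a, b]."""
    a, b = Fraction(a), Fraction(b)
    return sum(c * (b ** (i + 1) - a ** (i + 1)) / (i + 1) for i, c in enumerate(p))


def solve(matrix, rhs):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def causal_kernel(k, m, A=1):
    """Coefficients of g on [-A, 0]; g is understood to be 0 outside [-A, 0]."""
    A = Fraction(A)
    weight = [Fraction(1)]
    for _ in range(m):
        weight = poly_mul(weight, [Fraction(0), -A, Fraction(-1)])  # -t (t + A)
    # moments[d] = integral of t^d * w(t) over [-A, 0]
    moments = [integrate(poly_mul(weight, [Fraction(0)] * d + [Fraction(1)]), -A, 0)
               for d in range(2 * k + 1)]
    matrix = [[moments[i + j] for j in range(k + 1)] for i in range(k + 1)]
    rhs = [Fraction(1)] + [Fraction(0)] * k
    return poly_mul(weight, solve(matrix, rhs))
```

The linear system is a moment matrix of a positive weight, so it is invertible, and the resulting $g$ is supported on the past only, which is what makes (2.6) a filter in real time.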
Remark 1. Note that the filtering estimators $\hat X_{\varepsilon,\delta}^{(i)}(t)$ are Gaussian processes. From this fact and the boundedness of the derivatives of $f$, it follows that the following stronger result holds:

$$\overline{\lim_{\varepsilon \to 0}} \sup_{S \in \Sigma(\beta,L)} E\,\ell\!\left(\varepsilon^{-2\beta/[2(n+\beta)+1]} (\hat S_\varepsilon(t) - S(t))\right) < \infty, \tag{2.10}$$

by a suitable choice of $\gamma$, where $\ell \in \Lambda(\mu)$ and $\mu > 0$ is small enough.
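Proof of Theorem 1. By (2.2), (2.6) and (2.7), we have

$$\begin{aligned}
\Delta\hat S_{\varepsilon,\delta}(t) :={}& \hat S_{\varepsilon,\delta}(t) - S(t) \\
={}& \hat X_{\varepsilon,\delta}^{(n)}(t) - f(t, \hat X_{\varepsilon,\delta}, \ldots, \hat X_{\varepsilon,\delta}^{(n-1)}) - S(t) \\
={}& \frac{(-1)^n}{\delta^{n+1}} \int_0^t g^{(n)}\!\left(\frac{s - t}{\delta}\right) dY_\varepsilon(s) - f(t, \hat X_{\varepsilon,\delta}, \ldots, \hat X_{\varepsilon,\delta}^{(n-1)}) - S(t) \\
={}& \left\{\frac{(-1)^n}{\delta^{n+1}} \int_0^t g^{(n)}\!\left(\frac{s - t}{\delta}\right) x(s)\,ds - f(t, \hat X_{\varepsilon,\delta}, \ldots, \hat X_{\varepsilon,\delta}^{(n-1)}) - S(t)\right\} \\
&+ (-1)^n \frac{\varepsilon}{\delta^{n+1}} \int_0^t g^{(n)}\!\left(\frac{s - t}{\delta}\right) dW(s).
\end{aligned} \tag{2.11}$$

By setting $u = (s - t)/\delta$, integrating by parts and invoking (2.1), (2.4) and (2.5), (2.11) can be rewritten for $t > A\delta$ as

$$\begin{aligned}
\Delta\hat S_{\varepsilon,\delta}(t) ={}& \int_{-A}^{0} g(u)\,[x^{(n)}(t + \delta u) - x^{(n)}(t)]\,du \\
&+ \{f(t, x, \ldots, x^{(n-1)}) - f(t, \hat X_{\varepsilon,\delta}, \ldots, \hat X_{\varepsilon,\delta}^{(n-1)})\} \\
&+ (-1)^n \frac{\varepsilon}{\delta^{n+1}} \int_0^t g^{(n)}\!\left(\frac{s - t}{\delta}\right) dW(s),
\end{aligned} \tag{2.12}$$

which yields the following estimate:

$$\begin{aligned}
E|\Delta\hat S_{\varepsilon,\delta}(t)|^2 \le{}& 3\left\{\int_{-A}^{0} g(u)\,[x^{(n)}(t + \delta u) - x^{(n)}(t)]\,du\right\}^2 \\
&+ 3E|f(t, \hat X_{\varepsilon,\delta}, \ldots, \hat X_{\varepsilon,\delta}^{(n-1)}) - f(t, x, \ldots, x^{(n-1)})|^2 \\
&+ 3\frac{\varepsilon^2}{\delta^{2n+1}} \int_{-\infty}^{0} |g^{(n)}(u)|^2\,du.
\end{aligned} \tag{2.13}$$

Since $S \in \Sigma(\beta, L)$ and $f$ has $(k + 1)$ bounded derivatives with respect to all the arguments, we can deduce that $x(t) \in \Sigma(n + \beta, L_1)$ for some $L_1 > 0$. By a Taylor series expansion for $x^{(n)}(t + \delta u)$ with remainder:

$$x^{(n)}(t + \delta u) - x^{(n)}(t) = \sum_{j=1}^{k-1} \frac{(\delta u)^j}{j!} x^{(n+j)}(t) + \frac{1}{(k - 1)!} \int_0^{\delta u} (\delta u - s)^{k-1} x^{(n+k)}(t + s)\,ds,$$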
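and by (2.5), we have

$$\begin{aligned}
\int_{-A}^{0} g(u)\,[x^{(n)}(t + \delta u) - x^{(n)}(t)]\,du
&= \frac{1}{(k - 1)!} \int_{-A}^{0} g(u) \left[\int_0^{\delta u} (\delta u - s)^{k-1} x^{(n+k)}(t + s)\,ds\right] du \\
&= \frac{1}{(k - 1)!} \int_{-A}^{0} g(u) \int_0^{\delta u} (\delta u - s)^{k-1} [x^{(n+k)}(t + s) - x^{(n+k)}(t)]\,ds\,du.
\end{aligned}$$

Since $x(\cdot) \in \Sigma(\beta + n, L_1)$, it follows that

$$\begin{aligned}
\left|\int_{-A}^{0} g(u)\,[x^{(n)}(t + \delta u) - x^{(n)}(t)]\,du\right|
&\le \frac{L_1}{(k - 1)!} \int_{-A}^{0} |g(u)| \left(\int_{\delta u}^{0} |\delta u|^\alpha |\delta u - s|^{k-1}\,ds\right) du \\
&= \frac{L_1}{k!} \delta^\beta \int_{-A}^{0} |g(u)||u|^\beta\,du \\
&\le C_k \delta^\beta.
\end{aligned}$$

Similarly, for each $i \le n$, we can show

$$\left|\int_{-A}^{0} g(u)\,[x^{(i)}(t + \delta u) - x^{(i)}(t)]\,du\right| \le C_i \delta^{n-i+\beta}, \tag{2.14}$$

for some $C_i > 0$. By the smoothness of $f$ with bounded derivatives, there exists $K_2 > 0$ such that

$$E|f(t, \hat X_{\varepsilon,\delta}, \ldots, \hat X_{\varepsilon,\delta}^{(n-1)}) - f(t, x, \ldots, x^{(n-1)})|^2 \le K_2 E\{|\hat X_{\varepsilon,\delta}(t) - x(t)|^2 + \cdots + |\hat X_{\varepsilon,\delta}^{(n-1)}(t) - x^{(n-1)}(t)|^2\}. \tag{2.15}$$

Now, for $i = 0, 1, \ldots, n$, we have, as in (2.13) and by noting (2.14),

$$\begin{aligned}
E|\hat X_{\varepsilon,\delta}^{(i)}(t) - x^{(i)}(t)|^2
&= E\left\{\frac{(-1)^i}{\delta^{i+1}} \int_0^t g^{(i)}\!\left(\frac{s - t}{\delta}\right) dY_\varepsilon(s) - x^{(i)}(t)\right\}^2 \\
&\le \left\{\int_{-A}^{0} g(u)\,[x^{(i)}(t + \delta u) - x^{(i)}(t)]\,du\right\}^2 + \frac{\varepsilon^2}{\delta^{2i+1}} \int_{-\infty}^{0} |g^{(i)}(u)|^2\,du \\
&\le C_i \delta^{2(\beta+n-i)} \left(\int_{-\infty}^{0} |g(u)||u|^\beta\,du\right)^2 + \frac{\varepsilon^2}{\delta^{2i+1}} \int_{-\infty}^{0} |g^{(i)}(u)|^2\,du.
\end{aligned} \tag{2.16}$$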
In view of (2.11) to (2.16), there exist $C_1 > 0$ and $C_2 > 0$ such that the mean-square error satisfies

$$E|\Delta\hat S_{\varepsilon,\delta}(t)|^2 \le C_1 \delta^{2\beta} + C_2 \frac{\varepsilon^2}{\delta^{2n+1}},$$

that is, the estimate (2.8) holds. By setting $\delta(\varepsilon) = \gamma \varepsilon^{2/[2(n+\beta)+1]}$ the estimate (2.9) follows immediately.

Remark 2. This theorem shows that the mean-square estimation error is of order $O(\varepsilon^{4\beta/[2(\beta+n)+1]})$ for small $\varepsilon$. Clearly the rate at which $\hat S_\varepsilon$ converges to $S$ increases with the degree $\beta$ of smoothness of $S$ and decreases with the order $n$ of the differential equation (2.1). According to the upper bound (2.9), the best possible rate of convergence approaches $\varepsilon^2$ as $\beta \to \infty$.

Remark 3. The same approach can be applied to the estimation of the derivatives $S^{(i)}(t)$ for $i = 1, 2, \ldots, k$. Consider the estimators:

$$\hat S_{\varepsilon,\delta}^{(i)}(t) = \hat X_{\varepsilon,\delta}^{(n+i)}(t) - \frac{d^i}{dt^i} f(t, \hat X_{\varepsilon,\delta}(t), \ldots, \hat X_{\varepsilon,\delta}^{(n-1)}(t)). \tag{2.17}$$

Theorem 2. Let the conditions of Theorem 1 be fulfilled. Then, for $i = 1, 2, \ldots, k$, there exists a constant $C > 0$ such that, for any $\varepsilon > 0$ and $t > A\varepsilon^{2/[2(n+\beta)+1]}$,

$$\sup_{S \in \Sigma(\beta,L)} E|\hat S_\varepsilon^{(i)}(t) - S^{(i)}(t)|^2 \le C \varepsilon^{4(\beta-i)/[2(\beta+n)+1]}. \tag{2.18}$$
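Proof. For brevity, we will only sketch the proof of the simplest case with $i = 1$, since the proof for the general case is quite similar. Now, for $i = 1$, the estimator for $S'(t)$ is given by

$$\hat S'_{\varepsilon,\delta}(t) = \hat X_{\varepsilon,\delta}^{(n+1)}(t) - \frac{d}{dt} f(t, \hat X_{\varepsilon,\delta}(t), \ldots, \hat X_{\varepsilon,\delta}^{(n-1)}(t)).$$

Then

$$\Delta\hat S'_{\varepsilon,\delta}(t) = \hat S'_{\varepsilon,\delta}(t) - S'(t) = \frac{(-1)^{n+1}}{\delta^{n+2}} \int_0^t g^{(n+1)}\!\left(\frac{s - t}{\delta}\right) dY_\varepsilon(s) - \frac{d}{dt} f(t, \hat X_{\varepsilon,\delta}(t), \ldots, \hat X_{\varepsilon,\delta}^{(n-1)}(t)) - S'(t). \tag{2.19}$$

By Eqs. (2.1), (2.2), integration by parts and a change of variables, similarly to (2.12), Eq. (2.19) can be rewritten for $t > A\delta$ as

$$\begin{aligned}
\Delta\hat S'_{\varepsilon,\delta}(t) ={}& \int_{-A}^{0} g(u)\,[x^{(n+1)}(t + \delta u) - x^{(n+1)}(t)]\,du \\
&+ \left\{\frac{d}{dt} f(t, x, \ldots, x^{(n-1)}) - \frac{d}{dt} f(t, \hat X_{\varepsilon,\delta}, \ldots, \hat X_{\varepsilon,\delta}^{(n-1)})\right\} \\
&+ (-1)^{n+1} \frac{\varepsilon}{\delta^{n+2}} \int_0^t g^{(n+1)}\!\left(\frac{s - t}{\delta}\right) dW(s),
\end{aligned} \tag{2.20}$$

which implies that

$$\begin{aligned}
E|\Delta\hat S'_{\varepsilon,\delta}(t)|^2 \le{}& 3\left\{\int_{-A}^{0} g(u)\,[x^{(n+1)}(t + \delta u) - x^{(n+1)}(t)]\,du\right\}^2 \\
&+ 3E\left|\frac{d}{dt} f(t, \hat X_{\varepsilon,\delta}, \ldots, \hat X_{\varepsilon,\delta}^{(n-1)}) - \frac{d}{dt} f(t, x, \ldots, x^{(n-1)})\right|^2 \\
&+ 3\frac{\varepsilon^2}{\delta^{2n+3}} \int_{-\infty}^{0} |g^{(n+1)}(u)|^2\,du.
\end{aligned} \tag{2.21}$$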
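Similarly to (2.14), it is easy to show that

$$\left|\int_{-A}^{0} g(u)\,[x^{(n+1)}(t + \delta u) - x^{(n+1)}(t)]\,du\right| \le C\delta^{\beta-1}. \tag{2.22}$$

Next consider

$$\begin{aligned}
&\left|\frac{d}{dt} f(t, \hat X_{\varepsilon,\delta}, \ldots, \hat X_{\varepsilon,\delta}^{(n-1)}) - \frac{d}{dt} f(t, x, \ldots, x^{(n-1)})\right| \\
&\quad \le \left|\frac{\partial}{\partial t} f(t, \hat X_{\varepsilon,\delta}, \ldots, \hat X_{\varepsilon,\delta}^{(n-1)}) - \frac{\partial}{\partial t} f(t, x, \ldots, x^{(n-1)})\right| \\
&\qquad + \sum_{i=0}^{n-1} \left|f_i(t, \hat X_{\varepsilon,\delta}, \ldots, \hat X_{\varepsilon,\delta}^{(n-1)})\,\hat X_{\varepsilon,\delta}^{(i)}(t) - f_i(t, x, \ldots, x^{(n-1)})\,x^{(i)}(t)\right|,
\end{aligned} \tag{2.23}$$

where

$$f_i(t, x, \ldots, x^{(n-1)}) = \frac{\partial}{\partial x_i} f(t, x, \ldots, x^{(n-1)}).$$

In view of (2.23) and the Lipschitz continuity and boundedness of $f$, $f_i$, $x^{(i)}$, we can write

$$E\left|\frac{d}{dt} f(t, \hat X_{\varepsilon,\delta}, \ldots, \hat X_{\varepsilon,\delta}^{(n-1)}) - \frac{d}{dt} f(t, x, \ldots, x^{(n-1)})\right|^2 \le C \sum_{i=0}^{n} E|\hat X_{\varepsilon,\delta}^{(i)}(t) - x^{(i)}(t)|^2 \le C_1\left(\delta^{2\beta} + \frac{\varepsilon^2}{\delta^{2n+1}}\right). \tag{2.24}$$

(We took (2.16) into account in the last inequality.) Taking into account also (2.21)-(2.24), we can write

$$E|\Delta\hat S'_{\varepsilon,\delta}(t)|^2 \le C_2\left(\delta^{2(\beta-1)} + \frac{\varepsilon^2}{\delta^{2n+3}}\right). \tag{2.25}$$

Choosing now $\delta = \varepsilon^{2/[2(\beta+n)+1]}$, we reach the desired conclusion for $i = 1$.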
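The last step can be spelled out as a short exponent computation. The display below is a worked check rather than part of the original proof; the abbreviation $D$ is introduced here only for readability, and the general-$i$ bound in the last line is an assumed analogue of (2.25), not a statement taken from the text.

```latex
% Notation (assumption for this check only): D := 2(\beta + n) + 1, \delta = \varepsilon^{2/D}.
\begin{aligned}
\delta^{2(\beta-1)} &= \varepsilon^{4(\beta-1)/D}, \\
\frac{\varepsilon^{2}}{\delta^{2n+3}} &= \varepsilon^{\,2 - 2(2n+3)/D}
  = \varepsilon^{\,[4\beta + 4n + 2 - 4n - 6]/D}
  = \varepsilon^{4(\beta-1)/D}, \\
E|\Delta\hat S'_{\varepsilon,\delta}(t)|^{2} &\le 2C_{2}\,\varepsilon^{4(\beta-1)/[2(\beta+n)+1]}, \\
% Assumed analogue for general i: a bound of the form
% \delta^{2(\beta-i)} + \varepsilon^{2}\delta^{-(2(n+i)+1)} balances in the same way:
\frac{\varepsilon^{2}}{\delta^{2(n+i)+1}} &= \varepsilon^{\,[4\beta + 4n + 2 - 4n - 4i - 2]/D}
  = \varepsilon^{4(\beta-i)/D} = \delta^{2(\beta-i)}.
\end{aligned}
```

Both terms of (2.25) are therefore of the same order for this choice of $\delta$, which is the rate stated in (2.18) for $i = 1$, and the condition $t > A\delta$ becomes $t > A\varepsilon^{2/[2(n+\beta)+1]}$ as in Theorem 2.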
3. Lower Bound for Risks

Now we will show that the rate of convergence indicated by (2.9) is indeed optimal in a minimax sense. To this end let $\varphi \in \Sigma(n + \beta, 1)$ be an $(n + k + 1)$-times differentiable function with $\varphi(t) = 0$ for $t < -A$, $\varphi^{(i)}(0) = 0$ for $i \le (n - 1)$ and $\varphi^{(n)}(0) \ne 0$. For a fixed $t_0 \in (0, T)$, let

$$x_0(t) = \theta \varepsilon^{2(\beta+n)/[2(\beta+n)+1]} \varphi(\kappa(t - t_0)\varepsilon^{-2/[2(\beta+n)+1]}), \tag{3.1}$$
where θ ∈ Θ = [0, 1] is a parameter and κ is a constant. We shall need the following lemma, whose proof can be found in [7] or [8] (see Lemma 2.1 in [7]).
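Lemma 1. Consider the problem of estimating the parameter $\theta \in \Theta$ based on the observation equation

$$dx_\varepsilon(t) = \theta \varphi_\varepsilon(t)\,dt + \varepsilon\,dW(t), \qquad t \in (0, T], \tag{3.2}$$

with $x_\varepsilon(0) = 0$. Assume that the set $\{\theta : |\theta| < b\varepsilon \|\varphi_\varepsilon\|^{-1}_{L_2[0,T]}\}$ is contained in $\Theta$. Then for any $\ell \in \Lambda$,

$$\liminf_{\varepsilon \to 0} \inf_{T_\varepsilon} \sup_{\theta \in \Theta} E\,\ell\!\left(\varepsilon^{-1} \|\varphi_\varepsilon\|_{L_2[0,T]} (T_\varepsilon - \theta)\right) \ge \frac{1}{2\sqrt{2\pi}} \int_{|x| < \cdots} \ell(x) e^{-x^2/2}\,dx, \tag{3.3}$$

where the inf is taken over all estimators $T_\varepsilon$ based on $x_\varepsilon(t)$, $t \in [0, T]$.

In particular, if $x_\varepsilon(t)$ satisfies

$$dx_\varepsilon(t) = x_0(t)\,dt + \varepsilon\,dW(t), \qquad t \in [0, t_0], \tag{3.4}$$

so that

$$\varphi_\varepsilon(t) = \varepsilon^{2(\beta+n)/[2(\beta+n)+1]} \varphi(\kappa(t - t_0)\varepsilon^{-2/[2(\beta+n)+1]}), \tag{3.5}$$

then, for $t_0 > \frac{A}{\kappa} \varepsilon^{2/[2(\beta+n)+1]}$, we obtain

$$\begin{aligned}
\|\varphi_\varepsilon\|^2_{L_2[0,t_0]} &= \varepsilon^{4(\beta+n)/[2(\beta+n)+1]} \int_0^{t_0} \varphi^2(\kappa(t - t_0)\varepsilon^{-2/[2(\beta+n)+1]})\,dt \\
&= \varepsilon^{[4(\beta+n)+2]/[2(\beta+n)+1]}\, \frac{1}{\kappa} \int_{-A}^{0} \varphi^2(u)\,du \\
&= \frac{\varepsilon^2}{\kappa} \|\varphi\|^2.
\end{aligned}$$

Therefore, for the estimation problem (3.2), Lemma 1 implies the following lower bound:

$$\liminf_{\varepsilon \to 0} \inf_{T_\varepsilon} \sup_{\theta \in \Theta} E\,\ell\!\left(\frac{\|\varphi\|}{\kappa} (T_\varepsilon - \theta)\right) \ge \frac{1}{2\sqrt{2\pi}} \int_{|x| < \cdots} \ell(x) e^{-x^2/2}\,dx, \tag{3.6}$$

for $t_0 > \frac{A}{\kappa} \varepsilon^{2/[2(\beta+n)+1]}$.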
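The scaling identity $\|\varphi_\varepsilon\|^2_{L_2[0,t_0]} = \varepsilon^2 \|\varphi\|^2/\kappa$ can be checked numerically. The sketch below is illustrative only: the profile $\varphi(u) = \sin^2(\pi u / A)$ on $[-A, 0]$ is an assumed example chosen because its squared norm is $3A/8$ in closed form (it is not claimed to meet all smoothness requirements placed on $\varphi$), the integral is approximated by the midpoint rule, and the function names are hypothetical.

```python
import math


def phi(u, A=1.0):
    """Illustrative profile: sin^2(pi u / A) on [-A, 0], zero elsewhere."""
    return math.sin(math.pi * u / A) ** 2 if -A <= u <= 0.0 else 0.0


def squared_norm_phi_eps(eps, beta, n, kappa, t0, A=1.0, steps=20_000):
    """Midpoint-rule value of the squared L2[0, t0] norm of phi_eps from (3.5)."""
    d = 2.0 * (beta + n) + 1.0
    amplitude = eps ** (2.0 * (beta + n) / d)   # prefactor in (3.5)
    stretch = kappa * eps ** (-2.0 / d)         # time rescaling in (3.5)
    dt = t0 / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * dt
        total += (amplitude * phi(stretch * (t - t0), A)) ** 2
    return total * dt


def predicted_squared_norm(eps, kappa, A=1.0):
    """eps^2 / kappa * ||phi||^2 with ||phi||^2 = 3A/8 for the profile above."""
    return eps ** 2 / kappa * (3.0 * A / 8.0)
```

The comparison is meaningful only when $t_0 > (A/\kappa)\,\varepsilon^{2/[2(\beta+n)+1]}$, so that the whole support of the rescaled profile lies inside $[0, t_0]$.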
Consider now the function

$$S_{\varepsilon,\theta}(t) = x_0^{(n)}(t) - f(t, x_0(t), x_0'(t), \ldots, x_0^{(n-1)}(t)), \tag{3.7}$$

where the function $x_0(t)$ is defined in (3.1).

Lemma 2. The function $S_{\varepsilon,\theta}(t)$ belongs to the class $\Sigma(\beta, L)$ for some suitable choice of $\kappa$ in (3.1).
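Proof. By (3.1), we obtain

$$S_{\varepsilon,\theta}(t) = \theta \kappa^n \varepsilon^{2\beta/[2(\beta+n)+1]} \varphi^{(n)}(\kappa(t - t_0)\varepsilon^{-2/[2(\beta+n)+1]}) - f(t, x_0(t), \ldots, x_0^{(n-1)}(t)), \tag{3.8}$$

so that

$$\begin{aligned}
S_{\varepsilon,\theta}^{(k)}(t) &= \theta \kappa^{n+k} \varepsilon^{2\alpha/[2(\beta+n)+1]} \varphi^{(n+k)}(\kappa(t - t_0)\varepsilon^{-2/[2(\beta+n)+1]}) - \frac{d^k}{dt^k} f(t, x_0(t), x_0'(t), \ldots, x_0^{(n-1)}(t)) \\
&=: S_1(t) - S_2(t).
\end{aligned} \tag{3.9}$$

In the expressions (3.8) and (3.9), the arguments of $f$ are given by

$$x_0^{(\ell)}(t) = \theta \kappa^\ell \varepsilon^{2(\beta+n-\ell)/[2(\beta+n)+1]} \varphi^{(\ell)}(\kappa(t - t_0)\varepsilon^{-2/[2(\beta+n)+1]}), \tag{3.10}$$

for $\ell = 0, 1, \ldots, (n - 1)$. It is clear from the above expression and from the boundedness of $\partial^\gamma f / \partial \xi^\gamma$ for $|\gamma| \le k + 1$ ($\gamma$ is a multi-index) and of $x_0^{(j)}$ for $j \le n + k - 1$ that $S_2(t)$ has $(k + 1)$ bounded partial derivatives with $|S_2^{(k+1)}(t)| < L/2$, if $\kappa$ is chosen sufficiently small. Now let us consider $S_1(t)$:

$$\begin{aligned}
|S_1(t + h) - S_1(t)| &= \theta \kappa^{n+k} \varepsilon^{2\alpha/[2(\beta+n)+1]} \cdot \left|\varphi^{(n+k)}(\kappa(t - t_0 + h)\varepsilon^{-2/[2(\beta+n)+1]}) - \varphi^{(n+k)}(\kappa(t - t_0)\varepsilon^{-2/[2(\beta+n)+1]})\right| \\
&\le C\theta \kappa^{n+k+\alpha} |h|^\alpha.
\end{aligned}$$

So for $\kappa$ small enough $|S_1(t + h) - S_1(t)| < (L/2)|h|^\alpha$. The lemma is proved.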
With the aid of the lemmas, we can easily verify the following theorem.

Theorem 3. For the estimation of the value of $S(t)$ at $t_0$ in Eq. (2.1) with arbitrary initial conditions

$$x(0) = x_0,\ x'(0) = x_1,\ \ldots,\ x^{(n-1)}(0) = x_{n-1}, \tag{3.11}$$

by observing $Y_\varepsilon(t)$ with $t \in [0, T]$, the following lower bound holds for any $\ell \in \Lambda(\mu)$ and $t_0 > C\varepsilon^{2/[2(\beta+n)+1]}$, for a sufficiently large constant $C$:

$$\liminf_{\varepsilon \to 0} \inf_{T_\varepsilon} \sup_{S} E\,\ell\!\left(\varepsilon^{-2\beta/[2(\beta+n)+1]} (T_\varepsilon - S(t_0))\right) > 0, \tag{3.12}$$

where the inf is taken over all estimators $T_\varepsilon$ and the sup over $S \in \Sigma(\beta, L)$.
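Proof. Consider first the case of zero initial conditions

$$x(0) = x'(0) = \cdots = x^{(n-1)}(0) = 0. \tag{3.13}$$

Then (see (3.7)) the function $x_0(t)$ defined in (3.1) satisfies Eq. (2.1) with $S(t) = S_{\varepsilon,\theta}(t)$ and the zero initial conditions. Moreover

$$S_{\varepsilon,\theta}(t_0) = \theta \kappa^n \varepsilon^{2\beta/[2(\beta+n)+1]} \varphi^{(n)}(0) - f(t_0, 0, \ldots, 0). \tag{3.14}$$

By Lemma 2, we have $S_{\varepsilon,\theta} \in \Sigma(\beta, L)$ and hence

$$\begin{aligned}
&\liminf_{\varepsilon \to 0} \inf_{T_\varepsilon} \sup_{S \in \Sigma(\beta,L)} E\,\ell\!\left(\varepsilon^{-2\beta/[2(\beta+n)+1]} (T_\varepsilon - S(t_0))\right) \\
&\quad \ge \liminf_{\varepsilon \to 0} \inf_{T_\varepsilon} \sup_{\theta \in \Theta} E\,\ell\!\left(\varepsilon^{-2\beta/[2(\beta+n)+1]} (T_\varepsilon - S_{\varepsilon,\theta}(t_0))\right) \\
&\quad = \liminf_{\varepsilon \to 0} \inf_{\tilde T_\varepsilon} \sup_{\theta \in \Theta} E\,\ell\!\left(\tilde T_\varepsilon - \theta \kappa^n \varphi^{(n)}(0)\right), \quad \text{by noting (3.14)}, \\
&\quad \ge \frac{1}{2\sqrt{2\pi}} \int_{|x| < \cdots} \ell\!\left(\kappa^{n+1} \varphi^{(n)}(0) \|\varphi\|^{-1} x\right) e^{-x^2/2}\,dx, \quad \text{by (3.6)},
\end{aligned}$$

which is independent of $\varepsilon$ and is strictly positive for any $\ell \in \Lambda$ if $\kappa$ is chosen properly. The theorem is proved for the initial conditions (3.13). For the general initial conditions, we need to consider, instead of $x_0(t)$, the function $x_1(t) = x_0(t) + P_{n-1}(t)$, where $P_{n-1}(t)$ is a polynomial of degree $(n - 1)$ such that

$$P_{n-1}^{(i)}(0) = x_i, \qquad i = 0, 1, \ldots, (n - 1).$$

Then, analogously, the function $x_1(t)$ satisfies Eq. (2.1) with the initial conditions (3.11) and with

$$S(t) = \tilde S_{\varepsilon,\theta}(t) = \theta \kappa^n \varepsilon^{2\beta/[2(\beta+n)+1]} \varphi^{(n)}(\kappa(t - t_0)\varepsilon^{-2/[2(\beta+n)+1]}) - f(t, x_1(t), x_1'(t), \ldots, x_1^{(n-1)}(t)).$$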
Similarly to Lemma 2, it is easily seen that the function $\tilde S_{\varepsilon,\theta}(t) \in \Sigma(\beta, L)$ for a suitable choice of $\kappa$. So we can use the same kind of arguments as before to establish the validity of the theorem in general.

Remark 4. A simple modification of the proof, requiring the function $\varphi(\cdot)$ to be $(n + k + 1)$-times differentiable with $\varphi^{(j)}(0) = 0$ for $0 \le j \le n + i - 1$ and $\varphi^{(n+i)}(0) \ne 0$, allows one to prove that the upper bound of Theorem 2 is also tight.

Theorem 4. Consider the same estimation problem as in Theorem 3. Given any $t_0 \ge C\varepsilon^{2/[2(\beta+n)+1]}$ with $C > 0$ sufficiently large, the following lower bound holds for $i = 1, 2, \ldots, k$:

$$\liminf_{\varepsilon \to 0} \inf_{T_\varepsilon} \sup_{S} E\,\ell\!\left(\varepsilon^{-2(\beta-i)/[2(\beta+n)+1]} (T_\varepsilon - S^{(i)}(t_0))\right) > 0, \tag{3.15}$$

where the inf is taken over all estimators $T_\varepsilon$ and the sup over $S \in \Sigma(\beta, L)$.
Remark 5. Theorems 1-4 support our claim that, within the class $\Lambda$ of loss functions $\ell$, the optimal rate of convergence of the estimator $T_\varepsilon$ to $S$ is indeed of the order $O(\varepsilon^{2\beta/[2(\beta+n)+1]})$, which cannot be improved. In closing we wish to point out that the statistical approach based on nonparametric estimation ideas has been applied to some inverse problems connected with partial differential equations. The results will be presented in another paper later on.

References
1. Banks, H. T. and Kunisch, K.: Estimation Techniques for Distributed Parameter Systems, Birkhäuser, Boston, 1989
2. Donoho, D.L.: Nonlinear solution of linear inverse problems by wavelet-vaguelette decomposition, 1991, Stanford University. Preprint
3. Ermakov, M.S.: Minimax estimation of solution of ill-posed problem of type of convolution, Problems of Information Transmission, 25, 3, 28–39 (1989)
4. Franklin, J.: Well-posed stochastic extensions of ill-posed linear problems, Jour. Math. Analy. and Appl. 31, 682–716 (1970)
5. Golubev, G.K. and Pinsker, M.S.: Extremal problems for minimax estimation of sequences, Problems of Information Transmission, 21, 3, 36–52 (1985)
6. Hadamard, J.: Le Problème de Cauchy et les Equations aux Dérivées Partielles Linéaires Hyperboliques. Hermann et Cie, Paris, 1932
7. Ibragimov, I. A. and Khasminskii, R. Z.: Estimates of the signal, its derivatives and point of maximum for Gaussian distributions, Theory Probab. Appl. 25, 703–720 (1980)
8. Ibragimov, I. A. and Khasminskii, R. Z.: Statistical Estimation: Asymptotic Theory. Springer-Verlag, New York, 1981
9. Ivanov, V. K.: On ill-posed problems, Soviet Math., Sbornik, 61, 211–223 (1963) (in Russian)
10. Johnstone, I.M. and Silverman, B.W.: Discretization effects in statistical inverse problems, Journal of Complexity, 1, 1–34 (1991)
11. Parzen, E.: On estimation of a probability density function and mode, Ann. Math. Statist. 33, 1065–1076 (1962)
12. Rosenblatt, M.: Curve estimation, Ann. Math. Statist., 42, 1815–1842 (1971)
13. Sudakov, V. N. and Halfin, L. A.: Statistical approach to the correctness of problems in mathematical physics, Soviet Math. Doklady 157, 1094–1096 (1964)
14. O'Sullivan, F.: A statistical perspective on ill-posed inverse problems, Statist. Science 1, 502–518 (1986)
15. Tikhonov, A. N. and Arsenin, V. Ya.: Solutions of Ill-posed Problems, John Wiley and Sons, New York, 1977
16. Wahba, G.: Smoothing and ill-posed problems. In: Solutions Methods for Integral Equations with Applications, ed. by M. Goldberg, Plenum, N.Y. (1979), pp. 183–194
Communicated by Ya. G. Sinai
Commun. Math. Phys. 189, 533 – 557 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997
Almost-Sure Central Limit Theorem for Directed Polymers and Random Corrections

C. Boldrighini (1,*), R.A. Minlos (2,**), A. Pellegrinotti (3,***)

1 Dipartimento di Matematica e Fisica, Università di Camerino, via Madonna delle Carceri 9, 62032 Camerino, Italy
2 Institute for Problems of Information Transmission, Russian Academy of Sciences
3 Dipartimento di Matematica, Università degli Studi di Roma Tre, Via C. Segre 2, 00146 Roma, Italy
Received: 14 October 1996 / Accepted: 28 March 1997
Dedicated to the memory of Roland Dobrushin

Abstract: We consider a general model of directed polymers on the lattice $\mathbb{Z}^\nu$, $\nu \ge 3$, weakly coupled to a random environment. We prove that the central limit theorem holds almost surely for the discrete-time random walk $X_T$ associated with the polymer. Moreover, we show that the random corrections to the cumulants of $X_T$ are finite, starting from some dimension depending on the index of the cumulants, and that there are corresponding random corrections of order $T^{-k/2}$, $k = 1, 2, \ldots$, in the asymptotic expansion of the expectations of smooth functions of $X_T$. Full proofs are carried out for the first two cumulants. Finally, we prove a kind of local theorem showing that the ratio of the probabilities of the events $X_T = y$ to the corresponding probabilities with no randomness, in the region $|y - bT| = o(T^{2/3})$ of "moderate" deviations from the average drift $bT$, is, for almost all choices of the environment, uniformly close, as $T \to \infty$, to a functional of the environment "as seen from $(T, y)$".
1. Introduction

Models of growth of directed polymers in a random environment have recently attracted much attention both in the physical and in the mathematical literature (see for example [De-Sp]). The first rigorous results are due, to our knowledge, to Imbrie and Spencer [Im-Sp]. They considered a nearest-neighbor symmetric random walk on $\mathbb{Z}^\nu$ weakly coupled to a space-time independent field, and proved diffusivity for $\nu \ge 3$. The same model was then considered by other authors. An almost-sure central limit theorem, also for $\nu \ge 3$, was proved in [Bo], combining martingale methods and explicit estimates.

* Partially supported by research funds of M.U.R.S.T., of the University of Camerino and of C.N.R. (G.N.F.M.)
** Partially supported by C.N.R. (G.N.F.M.) and M.U.R.S.T. research funds, by R.F.R.F. grant n. 93-0111470, by AMS funds of US-AID, and NSF grant MJ000
*** Partially supported by C.N.R. (G.N.F.M.) and M.U.R.S.T. research funds
Further results, notably an invariance principle and the existence, in dimension $\nu \ge 7$, of a finite random correction for the second moment of the displacement $X_T$, were obtained in [Al-Zh] using similar methods. Sinai [Si] considers a slightly different model, proving $L^2$ estimates by a sort of cluster expansion, and giving a sharp upper bound for the values of the strength of the field which ensure convergence. We should also mention a recent paper [Ki], in which almost-sure central limit theorems are proved for models of discrete and continuous symmetric random walks, weakly coupled to a two-point independent random field, and the results are applied to the analysis of the Burgers equation with a random force.

We consider here a general finite-range non-degenerate random walk on $\mathbb{Z}^\nu$, weakly coupled to an independent field, in dimension $\nu \ge 3$. Our results are based on a kind of cluster-expansion technique which allows us to estimate the $L^{2n}$ norms of some general classes of functionals of the field. Our methods are related to those of [Im-Sp], which is also based on $L^{2n}$ estimates, and of Sinai [Si] (who considers only $L^2$ norms), with the crucial addition that we devise a new procedure which can be considered, roughly speaking, as a sort of iteration in the index $(2n)$ of the norm.

In addition to an almost-sure central limit theorem, we prove that one can find finite random corrections to the cumulants of the displacement $X_T$ if the dimension is high enough. We give full proofs only for the first two cumulants, and indicate the procedure for the third one, but it is easy to see how one could proceed for the higher ones. A connected result is the existence of random terms of order $T^{-k/2}$ in the asymptotic expansion of the expectation of smooth functions of $X_T$. Again we give a full proof only for $k = 1, 2$. Finally, we prove a kind of local limit theorem, the $L^2$ form of which is proved for a nearest-neighbor walk in [Si]. The $L^{2n}$ estimates are proved in the appendix.
Similar methods were applied in the paper [Bo-Mi-Pe] to a model of random walks in dynamical random environment, which however behaves differently with respect to dimension. In particular, due to the Markov character of the model, one can prove an a.s. central limit theorem in dimension $\nu \ge 2$. One also gets finite corrections to the cumulants in lower dimension. The cluster expansion method used for $L^{2n}$ estimates is more involved for that model, though the procedure is essentially the same.

Our model is described as follows. Let $\{P_0(u) : u \in \mathbb{Z}^\nu\}$ denote the collection of the transition probabilities of a discrete-time homogeneous random walk, which we suppose to be non-degenerate and finite-range (i.e., there is some $D$ so large that $P_0(u) = 0$ for $|u| > D$). Drift and covariance matrix are denoted by
\[ b = \sum_u u P_0(u), \qquad c_{ij} = \sum_u (u_i - b_i)(u_j - b_j) P_0(u), \qquad i, j = 1, \ldots, \nu. \tag{1.1} \]
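As an illustration of (1.1), the following Python sketch computes the drift $b$ and the covariance matrix $\{c_{ij}\}$ of a finite-range walk. The function name `drift_and_covariance` and the step law `P0_EXAMPLE` are hypothetical choices made for this example only; they do not come from the paper.

```python
def drift_and_covariance(p0):
    """Return (b, c) of Eq. (1.1) for a step law given as {step tuple: probability}."""
    nu = len(next(iter(p0)))                      # lattice dimension
    assert abs(sum(p0.values()) - 1.0) < 1e-12, "P0 must sum to 1"
    # Drift: b_i = sum_u u_i P0(u)
    b = [sum(u[i] * p for u, p in p0.items()) for i in range(nu)]
    # Covariance: c_ij = sum_u (u_i - b_i)(u_j - b_j) P0(u)
    c = [[sum((u[i] - b[i]) * (u[j] - b[j]) * p for u, p in p0.items())
          for j in range(nu)] for i in range(nu)]
    return b, c

# Hypothetical example: a lazy nearest-neighbour walk on Z^3, biased along the first axis.
P0_EXAMPLE = {
    (0, 0, 0): 0.1,
    (1, 0, 0): 0.25, (-1, 0, 0): 0.05,
    (0, 1, 0): 0.15, (0, -1, 0): 0.15,
    (0, 0, 1): 0.15, (0, 0, -1): 0.15,
}
b, c = drift_and_covariance(P0_EXAMPLE)
print("drift:", b)
print("covariance:", c)
```

The walk is non-degenerate in the sense used here when the matrix `c` is positive definite, which is the case for this example since `c` is diagonal with positive entries.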
The environment is given by a collection $\xi = \{\xi_t(x) : t \in \mathbb{Z}, x \in \mathbb{Z}^\nu\}$ of i.i.d. random variables which are taken as copies of a non-degenerate random variable $\zeta \in [-1, 1]$ with zero expectation. The corresponding measure on $\Omega = [-1, 1]^{\mathbb{Z}^{\nu+1}}$ is denoted by $\Pi$, and the notation $\langle \cdot \rangle$ will refer to averages with respect to $\Pi$. For a fixed $\xi \in \Omega$ we consider the measure on the trajectories
\[ \pi(X^{t,y}_{s,x}) = \prod_{\tau=s}^{t-1} P_0(X_{\tau+1} - X_\tau)\,(1 + \epsilon\, \xi_\tau(X_\tau)), \qquad t > s, \tag{1.2} \]
where $X^{t,y}_{s,x} = \{(s, X_s), (s+1, X_{s+1}), \ldots, (t, X_t)\}$ is a trajectory with $X_s = x$ and $X_t = y$, i.e., starting at time $s$ at the site $x$ and ending at time $t$ at the site $y$. Here $\epsilon \in (0, 1)$ is a parameter on which we shall impose a smallness condition. We are interested in the inhomogeneous random walk with transition probabilities
\[ \mathrm{Prob}(X_t = y \,|\, X_s = x) = \frac{1}{Z_x(s, t|\xi)} \sum_{X^{t,y}_{s,x}} \pi(X^{t,y}_{s,x}), \qquad s < t, \tag{1.3} \]
where $Z_x(s, t|\xi) = \sum_y \sum_{X^{t,y}_{s,x}} \pi(X^{t,y}_{s,x})$ is the "partition function" (or "statistical sum"). By translation invariance, it is enough for most purposes to consider the special case $s = 0$, $x = 0$. We set for brevity $Z_T(\xi) = Z_0(0, T|\xi)$ and
\[ \pi_T(y|\xi) = \frac{1}{Z_T(\xi)} \sum_{X^{T,y}_{0,0}} \pi(X^{T,y}_{0,0}). \tag{1.4} \]
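The definitions (1.2)-(1.4) can be evaluated numerically by propagating the unnormalized path weights one time step at a time. The Python sketch below does this for a one-dimensional walk, chosen only for brevity (the results of the paper concern $\nu \ge 3$). The names `polymer_distribution` and `sample_environment`, and the choice of a $\pm 1$-valued field, are assumptions of this illustration.

```python
import random

def polymer_distribution(p0, xi, eps, T):
    """Return (Z_T, pi_T) of Eqs. (1.3)-(1.4) for the walk started at 0.

    p0  : {integer step: probability}
    xi  : function (t, x) -> value in [-1, 1], the environment
    eps : coupling strength in (0, 1)
    """
    weights = {0: 1.0}          # unnormalized weight of all paths ending at x at time t
    for t in range(T):
        new = {}
        for x, w in weights.items():
            # In (1.2) the environment acts at the departure point (t, x).
            factor = w * (1.0 + eps * xi(t, x))
            for u, p in p0.items():
                new[x + u] = new.get(x + u, 0.0) + factor * p
        weights = new
    Z = sum(weights.values())
    return Z, {y: w / Z for y, w in weights.items()}

def sample_environment(seed):
    """i.i.d. zero-mean field with values +1 or -1, generated on demand and cached."""
    rng = random.Random(seed)
    cache = {}
    def xi(t, x):
        if (t, x) not in cache:
            cache[(t, x)] = rng.choice((-1.0, 1.0))
        return cache[(t, x)]
    return xi

p0 = {-1: 0.25, 0: 0.25, 1: 0.5}     # hypothetical step law
Z, pi = polymer_distribution(p0, sample_environment(seed=1), eps=0.3, T=6)
print("Z_T =", Z)
print("pi_T =", pi)
```

For `eps = 0` the routine returns $Z_T = 1$ and the $T$-fold convolution of `p0`, which is the "corresponding probability with no randomness" referred to in the abstract.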
Before stating the results we introduce some more notation. Let $B \subset \mathbb{Z}^\nu \times \mathbb{Z}_+$ be a subset of points of a possible trajectory starting at 0 at time 0 and ending at the point $y$ at time $T$. We label the points of $B$ in order of increasing time, so that if $|B| = n$ (where $|\cdot|$ denotes the cardinality of a set) we write $B = \{(t_0, y_0), \ldots, (t_{n-1}, y_{n-1})\}$. We also use the notation $(t_j(B), y_j(B))$ for the $j$-th point and $(t_f(B), y_f(B))$ for the "final point" $(t_{|B|-1}, y_{|B|-1})$ of $B$. We set
\[ M_B(\xi) = \xi_{t_0}(y_0) \prod_{j=1}^{|B|-1} P_0^{t_j - t_{j-1}}(y_j - y_{j-1})\, \xi_{t_j}(y_j), \tag{1.5a} \]
and
\[ M(t_1, y_1, t_2, y_2|\xi) = \sum_{\substack{B:\, t_0(B) = t_1,\, y_0(B) = y_1 \\ t_f(B) = t_2,\, y_f(B) = y_2}} \epsilon^{|B|} M_B(\xi), \qquad M(t, y|\xi) = \sum_{0 \le t_0 \le t} \sum_{y_0} P_0^{t_0}(y_0)\, M(t_0, y_0, t, y|\xi), \tag{1.5b} \]
where $P_0^t$ denotes the $t$-fold convolution $P_0 * P_0 * \cdots * P_0$ and we use the convention $P_0^0(y) = \delta_{y,0}$. By expanding the products we find
\[ Z_T(\xi)\, \pi_T(y|\xi) = P_0^T(y) + \sum_{t=0}^{T-1} \sum_z M(t, z|\xi)\, P_0^{T-t}(y - z), \qquad T \ge 1. \tag{1.6a} \]
Summing over $y$, since $\pi_T(\cdot|\xi)$ is normalized, we get $Z_T(\xi) = 1 + \mathcal{M}_{0,T}(\xi)$ with
\[ \mathcal{M}_{T,T'}(\xi) = \sum_{t=T}^{T'-1} \sum_z M(t, z|\xi). \tag{1.6b} \]
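The identity $Z_T(\xi) = 1 + \mathcal{M}_{0,T}(\xi)$ can be checked numerically on a small example. The Python sketch below is an illustration under stated assumptions: it works in one dimension, and it computes $M(t, z|\xi)$ through the recursion $M(t, z|\xi) = \epsilon\, \xi_t(z)\,[P_0^t(z) + \sum_{s<t} \sum_w M(s, w|\xi) P_0^{t-s}(z - w)]$, which is our reading of (1.5a)-(1.5b) obtained by splitting off the final point of $B$; the recursion is not stated in this form in the text. All function names are hypothetical.

```python
import random

def convolution_powers(p0, T):
    """Return [P0^0, P0^1, ..., P0^T] as dicts {site: probability}."""
    powers = [{0: 1.0}]
    for _ in range(T):
        nxt = {}
        for x, w in powers[-1].items():
            for u, p in p0.items():
                nxt[x + u] = nxt.get(x + u, 0.0) + w * p
        powers.append(nxt)
    return powers

def partition_function_two_ways(p0, xi, eps, T):
    """Return (Z_T from the path sum (1.2)-(1.3), 1 + M_{0,T} from (1.5)-(1.6b))."""
    powers = convolution_powers(p0, T)
    # Direct path sum, propagated one time step at a time.
    weights = {0: 1.0}
    for t in range(T):
        nxt = {}
        for x, w in weights.items():
            for u, p in p0.items():
                nxt[x + u] = nxt.get(x + u, 0.0) + w * (1.0 + eps * xi[(t, x)]) * p
        weights = nxt
    z_direct = sum(weights.values())
    # M(t, z) obtained by splitting off the last point (t, z) of B.
    M = {}
    for t in range(T):
        for z, p_tz in powers[t].items():
            inner = p_tz                                  # B consists of (t, z) only
            for (s, w), m in M.items():
                if s < t:                                 # previous point of B is (s, w)
                    inner += m * powers[t - s].get(z - w, 0.0)
            M[(t, z)] = eps * xi[(t, z)] * inner
    return z_direct, 1.0 + sum(M.values())

rng = random.Random(0)
p0 = {-1: 0.25, 0: 0.25, 1: 0.5}                          # hypothetical step law
T = 6
xi = {(t, x): rng.choice((-1.0, 1.0)) for t in range(T) for x in range(-T, T + 1)}
z_direct, z_expanded = partition_function_two_ways(p0, xi, eps=0.2, T=T)
print(z_direct, z_expanded)
```

The two numbers agree up to rounding, which is the content of (1.6a) summed over $y$.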
We state here the main results. The following sections are devoted to the proofs.
Theorem 1. For $\nu \ge 3$ and any integer $n \ge 1$ there is a positive number $\epsilon(\nu, n)$ such that for $\epsilon < \epsilon(\nu, n)$ the sequence $Z_T$ converges as $T \to \infty$ in $L^{2n}(\Pi)$, as well as $\Pi$-a.e., to the functional
\[ Z(\xi) = 1 + \mathcal{M}(\xi), \qquad \mathcal{M}(\xi) = \sum_{t=0}^{\infty} \sum_z M(t, z|\xi). \tag{1.7} \]
Moreover $Z(\xi)$ is positive $\Pi$-a.e.

We next consider the limiting behavior of the corrections to the average value and to the covariance matrix:
\[ E^T(\xi) = \sum_{y \in \mathbb{Z}^\nu} y\, \pi_T(y|\xi) - bT, \]
\[ C^T_{ij}(\xi) = \sum_{y \in \mathbb{Z}^\nu} \pi_T(y|\xi)\,(y_i - b_i T - E^T_i(\xi))(y_j - b_j T - E^T_j(\xi)) - T c_{ij}, \qquad i, j = 1, \ldots, \nu. \]
Theorem 2. i) For $\nu \ge 5$ and any integer $n \ge 1$ the sequence $E^T$ converges as $T \to \infty$, for $\epsilon$ small enough, in $L^{2n}(\Pi)$ as well as $\Pi$-a.e., to a limiting functional $E(\xi) = [Z(\xi)]^{-1} \hat{E}(\xi)$, with
\[ \hat{E}(\xi) = \sum_{t=0}^{\infty} \sum_{z \in \mathbb{Z}^\nu} M(t, z|\xi)\,(z - bt). \tag{1.8} \]
ii) For $\nu \ge 7$ and any integer $n \ge 1$ the sequences $C^T_{ij}(\xi)$ converge as $T \to \infty$, in $L^{2n}(\Pi)$ as well as $\Pi$-a.e., to the limiting functionals
\[ C_{ij}(\xi) = -E_i(\xi) E_j(\xi) + \frac{1}{Z(\xi)} \hat{C}_{ij}(\xi), \tag{1.9a} \]
where
\[ \hat{C}_{ij}(\xi) = \sum_{t=0}^{\infty} \sum_z M(t, z|\xi)\,[(z_i - b_i t)(z_j - b_j t) - t c_{ij}]. \tag{1.9b} \]
As indicated by Remark 2.2 below (after the proof of Theorem 2), a similar result is easily obtained for the correction to the third cumulant for $\nu \ge 9$. Higher cumulants could also be considered.

The other results concern the central limit theorem. Consider the linear functionals $\mu^\xi_T$ on the space $C^0$ of the continuous bounded functions on $\mathbb{R}^\nu$,
\[ \mu^\xi_T(f) = \sum_{z \in \mathbb{Z}^\nu} \pi_T(z|\xi)\, f\!\left(\frac{z - bT}{\sqrt{T}}\right). \tag{1.10} \]
They define positive measures on $\mathbb{R}^\nu$ which we again denote by $\mu^\xi_T$. Let moreover $\mu$ denote the gaussian measure with density
\[ g(u) = \sqrt{\frac{C}{(2\pi)^\nu}}\; e^{-\frac{1}{2} A(u)}, \tag{1.11} \]
where $A(u) = \sum_{i,j} a_{ij} u_i u_j$, $\{a_{ij}\}$ being the elements of the matrix $A$, inverse of the matrix $C$ with elements $c_{ij}$ given by Eq. (1.1), and $C = \det A$.
Theorem 3. (Central limit theorem $\Pi$-a.e.) For all $\nu \ge 3$ and $\epsilon$ small enough one can find a subset $\hat{\Omega} \subset \Omega$ of full $\Pi$-measure such that for $\xi \in \hat{\Omega}$ the measures $\mu^\xi_T$ tend weakly, as $T \to \infty$, to the gaussian measure $\mu$.

Though the parameters of the C.L.T. are the same for $\Pi$-a.a. $\xi \in \Omega$, the terms of order $\frac{1}{\sqrt{T}}$ and higher in the asymptotic expansion of $\mu^\xi_T(f)$, for smooth $f$, do depend on $\xi$. Let $C^k$, $k \ge 0$, be the Banach space of the functions on $\mathbb{R}^\nu$ which are $k$ times differentiable and are bounded together with their derivatives of order $\le k$, endowed with the norm $\|f\|_k = \max_{x \in \mathbb{R}^\nu} \max_{\alpha, |\alpha| \le k} |D^\alpha f(x)|$, where $\alpha = (\alpha_1, \ldots, \alpha_\nu)$ is a multiindex with integer values, $D^\alpha = \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \cdots \partial x_\nu^{\alpha_\nu}}$, $\alpha_i \ge 0$, $|\alpha| = \sum_{i=1}^{\nu} \alpha_i$. In what follows we denote by $Q_i$, $i = 1, 2, \ldots$ the polynomials appearing in the corrections to the local central limit theorem for non-degenerate random walks according to the standard asymptotic expansion (see, e.g., [Gi-Sk], and formula (3.1) below).
Theorem 4. i) For $\nu \ge 5$ and any integer $n \ge 1$, if $\epsilon$ is small enough, the sequence $\sqrt{T}\,[\mu^\xi_T(f) - \mu(f)]$ converges in $L^{2n}(\Pi)$, as $T \to \infty$, for any $f \in C^2$, to the limiting functional
\[ \Phi(f|\xi) = \mu(Q_1 f + E(\xi) \cdot \nabla f). \tag{1.12} \]
Moreover if $n(\frac{\nu}{2} - 2) > 1$ there is a subset $\hat{\Omega}'$ of full $\Pi$ measure such that for $\xi \in \hat{\Omega}'$ the sequence converges for all $f \in C^2$.

ii) For $\nu \ge 7$ and any integer $n \ge 1$, if $\epsilon$ is small enough, the sequence
\[ T \left[ \mu^\xi_T(f) - \mu(f) - \frac{1}{\sqrt{T}} \Phi(f|\xi) \right] \tag{1.13a} \]
converges in $L^{2n}(\Pi)$, as $T \to \infty$, for any $f \in C^3$, to the limiting functional
\[ \Psi(f|\xi) = \mu(f Q_2) + \mu(Q_1 E(\xi) \cdot \nabla f) + \frac{1}{Z(\xi)} \sum_{ij} \hat{C}_{ij}(\xi)\, \mu(f_{ij}). \tag{1.13b} \]
Moreover if $n(\frac{\nu}{2} - 3) > 1$ there is a subset $\hat{\Omega}''$ of full $\Pi$ measure such that for $\xi \in \hat{\Omega}''$ the sequence converges for all $f \in C^3$.
Remark. Theorem 4 is closely connected to Theorem 2. One can think of expanding the integral over the gaussian distribution with average $b + \frac{E(\xi)}{\sqrt{T}}$ and covariance matrix $C + \frac{C(\xi)}{T}$ in the small parameter $\alpha = \frac{1}{\sqrt{T}}$. One would then get the right corrections, except for the terms containing the polynomials $Q_k$, which express the "deterministic" corrections to the local limit theorem.

The last result is a kind of local limit theorem. We define the "backward field" $\eta^{(T,y)}$ (as seen from the point $(T, y)$) by setting
\[ \eta^{(T,y)}_t(x) = \xi_{T-t}(y - x). \tag{1.14} \]
We can consider values $t > T$ as well, since the field $\xi_t$ is defined for all $t \in \mathbb{Z}$. The field $\eta^{(T,y)}$ has the same distribution as $\xi$. The probability that the random walk takes a particular value $y$ at time $T$ clearly depends on the environment in the neighborhood of $y$, i.e., on the backward field $\eta^{(T,y)}$. Hence an almost-sure local limit theorem cannot hold. It turns out that the probability $\pi_T(y|\xi)$ is asymptotically given by the corresponding probability $P_0^T(y)$ for the random walk times a factor depending on $\eta^{(T,y)}$, which is the "backward analogue" of the normalization functional $Z$ of Theorem 1.

Theorem 5. For all $\nu \ge 3$, if $\epsilon$ is small enough, there is a functional $\hat{Z}$ of the backward field which is $\Pi$-a.e. positive and such that for any choice of the constants $K > 0$ and $\beta \in (0, \frac{1}{6})$ one can find a subset $\Omega_0 \subset \Omega$ of full measure such that for all $\xi \in \Omega_0$
\[ \lim_{T \to \infty} \; \max_{y:\, |y - bT| \le K T^{\frac{1}{2} + \beta}} \left| \frac{\pi_T(y|\xi)}{P_0^T(y)} - \hat{Z}(\eta^{(T,y)}) \right| = 0. \tag{1.15} \]
The explicit expression of $\hat{Z}$ is given at the end of Sect. 4.

The rest of the paper is organized as follows. In Sect. 2 we prove Theorems 1 and 2. The proofs for the central limit theorem and its stochastic corrections (Theorems 3-4) are carried out in Sect. 3, and the proof of Theorem 5 is given in Sect. 4. The appendix is devoted to the proof of the basic $L^{2n}$ estimates.
2. Proof of Theorems 1-2

Proof of Theorem 1. The proof is based on the estimates of the appendix (Proposition A.1). It follows immediately from Inequality (A.18a), which has to be applied for $g = 1$, that if $\epsilon$ is small enough, then
\[ \langle (\mathcal{M}_{T,T'}(\xi))^{2n} \rangle \le \frac{\mathrm{const}}{T^{n(\frac{\nu}{2} - 1)} + 1}, \tag{2.1} \]
where the constant depends on $\epsilon$, $\nu$ and $n$. Hence $\mathcal{M}_{0,T}(\xi) \to \mathcal{M}(\xi)$ in $L^{2n}$. If now $n(\frac{\nu}{2} - 1) > 1$, Inequality (2.1) implies, by the Chebyshev inequality and the Borel-Cantelli lemma, that $\mathcal{M}_{0,T}(\xi) \to \mathcal{M}(\xi)$ $\Pi$-a.e. Otherwise one can observe that $\mathcal{M}_{0,T}(\xi)$ is a martingale with respect to the $\sigma$-algebras $\mathcal{F}_t$ generated by the variables $\xi_s(x)$, $x \in \mathbb{Z}^\nu$, $s < t$, i.e., $\langle \mathcal{M}_{0,T}(\xi) | \mathcal{F}_{T-1} \rangle = \mathcal{M}_{0,T-1}(\xi)$. Since it is bounded in $L^1$-norm, by Inequality (2.1), it converges $\Pi$-a.e.

As for positivity, observe that, since the field is homogeneous, the limits $Z(t|\xi) = \lim_{T \to \infty} Z_0(t, T|\xi)$ also exist $\Pi$-a.e. for all $t \in \mathbb{Z}$, and the probabilities of the sets $A^t_+ = \{\xi : Z(t|\xi) > 0\}$ do not depend on $t$. A simple inequality shows that $(1 - \epsilon) Z_0(t+1, T|\xi) \le Z_0(t, T|\xi)$, for all $\xi$, and, taking the limit $T \to \infty$, we see that $A^{t+1}_+ \subseteq A^t_+$. As $\Pi(A^{t+1}_+) = \Pi(A^t_+)$, we see that the set $A^0_+$ is invariant mod 0 under time shifts, and, since the field $\xi$ is ergodic and $\langle Z(\xi) \rangle = 1$, we conclude that $\Pi(A^0_+) = 1$.

Proof of Theorem 2. We set
\[ \hat{E}^T(\xi) = \sum_{t=0}^{T-1} \sum_z M(t, z|\xi)\,(z - bt). \tag{2.2} \]
We consider the functionals $\mathcal{M}_{T,T'}(g|\xi)$ given by Eq. (A.11) with $g(t, y) = (y - bt)_i$, for $i = 1, \ldots, \nu$. Clearly $|g(t, y)| \le |y - bt|$, and for $\nu \ge 5$, since $\frac{\nu}{2} - 2 > 0$, we can apply Inequality (A.18a), which gives for $\epsilon$ small enough
\[ \langle (\mathcal{M}_{T,T'}(g|\xi))^{2n} \rangle \le \frac{\mathrm{const}}{T^{n(\frac{\nu}{2} - 2)} + 1}, \tag{2.3} \]
where the same remarks as before hold for the constant. Reasoning as in the previous proof we see that $\mathcal{M}_{0,T}(g|\xi) \to \hat{E}(\xi)$ in $L^{2n}$, and, if $n(\frac{\nu}{2} - 2) > 1$, $\Pi$-a.e. as well. Otherwise one can observe that the components $\hat{E}^T_i(\xi)$, $i = 1, \ldots, \nu$, are martingales with respect to the $\sigma$-algebras $\mathcal{F}_t$. For this one has to note that $\hat{E}^T_i(\xi)$ can be written as
\[ \sum_{u_1, \ldots, u_T \in \mathbb{Z}^\nu} \; \prod_{j=1}^{T} P_0(u_j)\,(1 + \epsilon\, \xi_{j-1}(X_{j-1})) \; \sum_{i=1}^{T} \hat{u}_i, \]
where $\hat{u} = u - b$, and the sum $\sum_{i=1}^{T} \hat{u}_i$ is a martingale w.r.t. the $\sigma$-algebras $\mathcal{M}_n$ generated by the variables $u_1, \ldots, u_n$. Since $\hat{E}^T$ is bounded in $L^1$, it converges $\Pi$-a.e. The convergence of $Z_T$ is given by Theorem 1, and assertion i) is proved.

As for assertion ii) we have
\[ C^T_{ij}(\xi) = -E^T_i E^T_j + (Z_T(\xi))^{-1} \sum_{t=0}^{T-1} \sum_{z \in \mathbb{Z}^\nu} M(t, z|\xi)\,[(z_i - b_i t)(z_j - b_j t) - t c_{ij}]. \]
We consider the functionals $\mathcal{M}_{T,T'}(g|\xi)$ with $g(t, y) = g_{ij}(t, y) = (y - bt)_i (y - bt)_j$. Since $|g_{ij}(t, y)| \le |y - bt|^2$ and $\frac{\nu}{2} - 3 > 0$ for $\nu \ge 7$, we can apply once again Inequality (A.18a), to get
\[ \langle (\mathcal{M}_{T,T'}(g_{ij}|\xi))^{2n} \rangle \le \frac{\mathrm{const}}{T^{n(\frac{\nu}{2} - 3)} + 1}. \]
The same inequality holds for $g(t, y) = t$, and we get convergence in $L^{2n}$ for the functionals
\[ \hat{C}^T_{ij}(\xi) = \sum_{t=0}^{T-1} \sum_z M(t, z|\xi)\,[(z_i - b_i t)(z_j - b_j t) - t c_{ij}] \]
to the limiting functional $\hat{C}_{ij}$. If $n(\frac{\nu}{2} - 3) > 1$ convergence $\Pi$-a.e. is granted by the Borel-Cantelli lemma, otherwise one can observe that the sequences $\hat{C}^T_{ij}$ are martingales bounded in $L^1$, just as in the proof of assertion i). The $L^{2n}$ convergence of the product $E^T_i E^T_j$ is granted if $\epsilon$ is so small that $E^T$ converges in $L^{4n}$.

Remark 2.1. The $L^2(\Pi)$ norm of $\mathcal{M}_{0,T}(\xi)$ diverges if $\nu \le 2$. In fact, let $M^{(1)}(t, z|\xi) = \epsilon P_0^t(z)\, \xi_t(z)$, and $\mathcal{M}^{(1)}_{0,T} = \sum_{t=0}^{T-1} \sum_z M^{(1)}(t, z|\xi)$. Then $\mathcal{M}^{(1)}_{0,T}$ and $\mathcal{M}_{0,T} - \mathcal{M}^{(1)}_{0,T}$ are orthogonal in $L^2(\Pi)$, so that
\[ \langle (\mathcal{M}_{0,T})^2 \rangle \ge \langle (\mathcal{M}^{(1)}_{0,T})^2 \rangle = \epsilon^2 \langle \zeta^2 \rangle \sum_{t=0}^{T-1} \sum_z (P_0^t(z))^2 \to \infty, \]
as $T \to \infty$. In the same way one can see that the $L^2$ norm of $\hat{E}^T$ diverges as $T \to \infty$ if $\nu \le 4$, and the same happens for $\hat{C}^T_{ij}$ if $\nu \le 6$.

Remark 2.2. It is not hard to see that the stochastic correction to the third cumulant
\[ R^T_{ijk}(\xi) = \sum_y \pi_T(y|\xi)\,(y_i - b_i T - E^T_i)(y_j - b_j T - E^T_j)(y_k - b_k T - E^T_k) - T r_{ijk}, \]
where $r_{ijk} = \sum_u P_0(u)(u_i - b_i)(u_j - b_j)(u_k - b_k)$, is given by the expression
\[ \frac{1}{Z_T(\xi)} \sum_{t=0}^{T-1} \sum_z M(t, z|\xi) \left[ \hat{z}_i \hat{z}_j \hat{z}_k - t \hat{z}_i c_{jk} - t \hat{z}_j c_{ik} - t \hat{z}_k c_{ij} - t r_{ijk} \right] - E^T_i C^T_{jk} - E^T_j C^T_{ik} - E^T_k C^T_{ij} - E^T_i E^T_j E^T_k, \]
where $\hat{z} = z - bt$. Reasoning as in the previous proof one easily gets convergence for $\nu \ge 9$.

3. Proof of Theorems 3 and 4

By a standard result for non-degenerate random walks on the $\nu$-dimensional lattice (see [Gi-Sk], Ch. 2, Sect. 8), we have for any integer $r \ge 0$,
\[ P_0^t(z) = \frac{C^{\frac{1}{2}}}{(2\pi t)^{\frac{\nu}{2}}}\, e^{-\frac{1}{2} A\left(\frac{z - bt}{\sqrt{t}}\right)} \left[ 1 + \sum_{j=1}^{r} \frac{1}{t^{\frac{j}{2}}}\, Q_j\!\left(\frac{z - bt}{\sqrt{t}}\right) \right] + \frac{\delta_r(t)}{t^{\frac{\nu}{2}}}\, N^{(r)}_t\!\left(\frac{z - bt}{\sqrt{t}}\right), \tag{3.1} \]
where $Q_j$ is a polynomial of order $3j$, $\delta_r(t) = o(t^{-\frac{r}{2}})$ and $|N^{(r)}_t(v)| \le \frac{\mathrm{const}}{1 + |v|^{2+r}}$.
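To make the leading term of (3.1) concrete, the following Python sketch compares the exact $t$-fold convolution $P_0^t(z)$ with the gaussian term ($r = 0$) for a one-dimensional walk, where $A = 1/c$ and $C = \det A = 1/c$. The step law and the function names are hypothetical, and the one-dimensional setting is chosen only to keep the example short.

```python
import math

def convolution_power(p0, t):
    """t-fold convolution of a one-dimensional step law {step: probability}."""
    law = {0: 1.0}
    for _ in range(t):
        nxt = {}
        for x, w in law.items():
            for u, p in p0.items():
                nxt[x + u] = nxt.get(x + u, 0.0) + w * p
        law = nxt
    return law

def gaussian_leading_term(p0, t, z):
    """Leading (r = 0) term of (3.1) for nu = 1, where A = 1/c and C = det A = 1/c."""
    b = sum(u * p for u, p in p0.items())
    c = sum((u - b) ** 2 * p for u, p in p0.items())
    v = (z - b * t) / math.sqrt(t)
    return math.exp(-0.5 * v * v / c) / math.sqrt(2.0 * math.pi * c * t)

P0_EXAMPLE = {-1: 0.25, 0: 0.25, 1: 0.5}    # hypothetical step law, drift b = 0.25
t = 400
exact = convolution_power(P0_EXAMPLE, t)
for z in (80, 100, 120):
    print(z, exact[z], gaussian_leading_term(P0_EXAMPLE, t, z))
```

At the centre $z = bt$ the two values are very close; away from the centre the discrepancy is governed by the $Q_j$ terms of (3.1), which this sketch does not compute.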
We define the measures $\mu^{(0)}_T$ and $\tilde{\mu}^{(0)}_T$ by setting
\[ \mu^{(0)}_T(f) = \sum_z P_0^T(z)\, f\!\left(\frac{z - bT}{\sqrt{T}}\right), \qquad \tilde{\mu}^{(0)}_T(f) = \frac{1}{T^{\frac{\nu}{2}}} \sum_z g\!\left(\frac{z - bT}{\sqrt{T}}\right) f\!\left(\frac{z - bT}{\sqrt{T}}\right). \]
($\nu(\cdot)$ denotes, as usual, average with respect to the measure $\nu$.) Using some easy estimates on Riemann sums (see [Bo-Mi-Pe], App. B), one finds that for $f \in C^s$ there are constants $c^*_s$ such that $\tilde{\mu}^{(0)}_T(f) = \mu(f) + r_T(f)$ with $|r_T(f)| \le c^*_s \|f\|_s T^{-\frac{s}{2}}$. This implies, as shown in [Bo-Mi-Pe], that there are constants $c_s$ such that if $f \in C^{s+1}$, then
\[ \mu^{(0)}_T(f) = \sum_{j=0}^{s} \frac{\mu(f Q_j)}{T^{\frac{j}{2}}} + r_s(T; f), \qquad |r_s(T; f)| \le c_s \|f\|_{s+1} T^{-\frac{s+1}{2}}. \tag{3.2} \]
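Proof of Theorem 3. We first prove that for $\Pi$-a.a. $\xi$ and $f \in C^1$
\[ \lim_{T \to \infty} \mu^\xi_T(f) = \mu(f). \tag{3.3} \]
Setting $T_0 = [T^\alpha]$, with $\alpha \in (0, 1)$, we write
\[ \mu^\xi_T(f) = \frac{1}{Z_T(\xi)} \left[ \mu^{(0)}_T(f) + \tilde{R}_T(f|\xi) + \hat{R}_T(f|\xi) \right], \tag{3.4} \]
with
\[ \hat{R}_T(f|\xi) = \sum_{t=0}^{T_0 - 1} \sum_y M(t, y|\xi) \sum_z P_0^{T-t}(z - y)\, f\!\left(\frac{z - bT}{\sqrt{T}}\right), \qquad \tilde{R}_T(f|\xi) = \sum_{t=T_0}^{T-1} \sum_y M(t, y|\xi) \sum_z P_0^{T-t}(z - y)\, f\!\left(\frac{z - bT}{\sqrt{T}}\right). \tag{3.5} \]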
Denoting $g_T(t, y) = \sum_z P_0^{T-t}(z - y)\, f\!\left(\frac{z - bT}{\sqrt{T}}\right)$ we see that $|g_T| \le \|f\|_0$. By Corollary A.2 of the appendix (and the consequent Inequality (A.18a)) we see that for all $\nu \ge 3$, and $n \ge 1$, if $\epsilon$ is small enough we have
\[ \langle (\tilde{R}_T(f|\xi))^{2n} \rangle \le \frac{\mathrm{const}\, \|f\|_0^{2n}}{T^{n\alpha(\frac{\nu}{2} - 1)} + 1}. \tag{3.6} \]
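If $n$ is so large that $n\alpha(\frac{\nu}{2} - 1) > 1$, by the Borel-Cantelli lemma we conclude that the sequence $\tilde{R}_T(f|\xi) \to 0$ as $T \to \infty$, $\Pi$-a.e. By adding and subtracting we find
\[ g_T(t, y) = \mu(f) + \frac{1}{\sqrt{T}} \left[ g^*_T(t, y) + g'_T(t, y) \right], \tag{3.7a} \]
with
\[ g^*_T(t, y) = \sqrt{T}\,(\mu^{(0)}_{T-t}(f) - \mu(f)), \qquad g'_T(t, y) = \sqrt{T} \sum_z \left[ f\!\left(\frac{z + y - bT}{\sqrt{T}}\right) - f\!\left(\frac{z - b(T-t)}{\sqrt{T-t}}\right) \right] P_0^{T-t}(z), \tag{3.7b} \]
and consequently
\[ \hat{R}_T(f|\xi) = \mathcal{M}_{0,T_0}(\xi)\, \mu(f) + \frac{1}{\sqrt{T}} \left[ \mathcal{M}_{0,T_0}(g^*_T|\xi) + \mathcal{M}_{0,T_0}(g'_T|\xi) \right]. \tag{3.8} \]
By a Taylor expansion of $f$ we get
\[ g'_T(t, y) = \sum_z P_0^{T-t}(z)\, \nabla f(\bar{z}) \cdot u, \tag{3.9a} \]
where $\bar{z}$ is a point depending on $z$, $y$, $T$ and $t$. We can write
\[ u = y - bt - \frac{t}{2\sqrt{T}} \left( 1 + r\!\left(\frac{t}{T}\right) \right) \frac{z - b(T-t)}{\sqrt{T-t}}, \tag{3.9b} \]
where $r$ is a $C^\infty$ function on $[0, 1)$ such that $r(0) = 0$. Hence for all $t < T_0$ and $T$ large enough we have $g'_T(t, y) = \hat{g}^{(1)}_T(t, y) - \frac{1}{2\sqrt{T}}\, \hat{g}^{(2)}_T(t, y)$ with
\[ \hat{g}^{(1)}_T(t, y) = \sum_z P_0^{T-t}(z)\, \nabla f(\bar{z}) \cdot (y - bt), \qquad \hat{g}^{(2)}_T(t, y) = t \left( 1 + r\!\left(\frac{t}{T}\right) \right) \sum_z P_0^{T-t}(z)\, \nabla f(\bar{z}) \cdot \frac{z - b(T-t)}{\sqrt{T-t}}. \tag{3.10} \]
By Inequality (3.2) $|g^*_T(t, y)| \le \mathrm{const}\, \|f\|_1$. Using Inequalities (A.18a,b,c), we see that for all $\nu \ge 3$, $n \ge 1$, if $\epsilon$ is small, we have $\langle (\mathcal{M}_{0,T_0}(g^*_T|\cdot))^{2n} \rangle \le \mathrm{const}\, \|f\|_1^{2n}$ and
\[ \langle (\mathcal{M}_{0,T_0}(\hat{g}^{(1)}_T|\cdot))^{2n} \rangle \le \begin{cases} \mathrm{const}\, \|f\|_1^{2n} & \nu > 4 \\ \mathrm{const}\, \|f\|_1^{2n} (\log_+ T_0)^n & \nu = 4 \\ \mathrm{const}\, \|f\|_1^{2n}\, T_0^{n(2 - \frac{\nu}{2})} & \nu < 4. \end{cases} \tag{3.11} \]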
Taking $n$ large enough, by the Borel-Cantelli lemma, we see that $\frac{1}{\sqrt{T}} \mathcal{M}_{0,T_0}(\hat{g}^{(1)}_T|\cdot) \to 0$, $\Pi$-a.e. As for $\hat{g}^{(2)}_T$, it satisfies the inequality $|\hat{g}^{(2)}_T(t, y)| \le \mathrm{const}\, \|f\|_1\, t$, and we have to apply Inequalities (A.18a,b,c) for $p = \frac{\nu}{2} - 3$. We get a table similar to (3.11), with the difference that we have convergence for $\nu \ge 7$, logarithmic divergence for $\nu = 6$, and divergence with a power $T_0^{n(3 - \frac{\nu}{2})}$ for $\nu < 6$. Again we see that $\frac{1}{T} \mathcal{M}_{0,T_0}(\hat{g}^{(2)}_T|\cdot) \to 0$, $\Pi$-a.e., if $\epsilon$ is small, for all $\nu \ge 3$. In conclusion we find $\hat{R}_T(f|\xi) - \mathcal{M}_{0,T_0}(\xi)\, \mu(f) \to 0$, $\Pi$-a.e., and substituting into (3.4), taking account of Theorem 1, we see that the relation (3.3) is proved for $f \in C^1$, and $\xi$ in some subset $\Omega_f \subseteq \Omega$ of full $\Pi$ measure.

One can now consider a discrete set of functions $\wp \subset C^1$, which is dense in $C^0$. Consider the intersection $\hat{\Omega} = \cap_{f \in \wp} \Omega_f$, and let $f$ be any function of $C^0$. For any $\delta > 0$ we can find a function $f_* \in \wp$ such that $\|f - f_*\|_0 < \frac{\delta}{2}$. If $\xi \in \hat{\Omega}$ we have
\[ \limsup_{T \to \infty} |\mu^\xi_T(f) - \mu(f)| \le 2 \|f - f_*\|_0 + \limsup_{T \to \infty} |\mu^\xi_T(f_*) - \mu(f_*)| \le \delta. \]
Hence the result.
Remark. Theorem 3 implies convergence $\Pi$-a.e. of the probabilities
\[ \sum_{y:\; y - bT \in \sqrt{T}\, G} \pi_T(y|\xi) \]
for any region $G$ such that its boundary has vanishing Lebesgue measure, a result which, due to the lattice structure, would be hard to obtain by explicit estimates.

Proof of Theorem 4. Taking $\alpha \in (\frac{2}{3}, 1)$, and $T_0 = [T^\alpha]$, as above, we see by Inequality (3.6) that for all $\nu \ge 5$ the $L^{2n}$ norm of $\sqrt{T}\, \tilde{R}_T$ goes to zero as $T \to \infty$, and the same happens to $\sqrt{T}\,(\mathcal{M}_{0,T}(\xi) - \mathcal{M}_{0,T_0}(\xi))$. Moreover if $n$ is large enough, by the Chebyshev inequality and the Borel-Cantelli lemma, both quantities tend to 0 $\Pi$-a.e. Hence, taking into account the expression (3.8) of $\hat{R}_T(f|\xi)$, it is enough to consider the limit
\[ \lim_{T \to \infty} \sqrt{T} \left[ \mu^{(0)}_T(f) + \hat{R}_T(f|\xi) - \mu(f)\, Z_{T_0}(\xi) \right] = \lim_{T \to \infty} \left[ \mathcal{M}_{0,T_0}(g^*_T|\xi) + \mathcal{M}_{0,T_0}(g'_T|\xi) \right]. \]
We have
\[ \mathcal{M}_{0,T_0}(g^*_T|\xi) - \mathcal{M}_{0,T_0}(\xi)\, \mu(f Q_1) = \frac{1}{\sqrt{T}}\, \mathcal{M}_{0,T_0}(g^{**}_T|\xi), \tag{3.12} \]
with $g^{**}_T(t, y) = \sqrt{T}\,\big( \sqrt{T}\,(\mu^{(0)}_{T-t}(f) - \mu(f)) - \mu(f Q_1) \big)$, and $|g^{**}_T(t, y)| \le \mathrm{const}\, \|f\|_2$, by Inequality (3.2). Repeating the same arguments as above, we see that $\frac{1}{\sqrt{T}} \mathcal{M}_{0,T_0}(g^{**}_T|\xi) \to 0$ $\Pi$-a.e. if $\epsilon$ is small enough, for all $\nu \ge 5$. By expanding $f$ to second order in (3.7b) we write $g'_T$ as
\[ g'_T = g^{(1)}_T - \frac{1}{2\sqrt{T}}\, g''_T, \tag{3.13a} \]
where
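\[ g^{(1)}_T = \mu^{(0)}_{T-t}(\nabla f) \cdot (y - bt), \tag{3.13b} \]
\[ g''_T = g^{(2)}_T + \tilde{g}^{(2)}_T - \hat{g}^{(11)}_T + \frac{1}{\sqrt{T}}\, \hat{g}^{(12)}_T - \frac{1}{T}\, \hat{g}^{(22)}_T, \tag{3.13c} \]
and, setting $h(u) = u \cdot \nabla f(u)$, we have
\[ \begin{aligned} g^{(2)}_T &= t\, \mu^{(0)}_{T-t}(h), \qquad \tilde{g}^{(2)}_T = t\, r\!\left(\frac{t}{T}\right) \mu^{(0)}_{T-t}(h), \\ \hat{g}^{(11)}_T &= \sum_{ij} \sum_z P_0^{T-t}(z)\, f_{ij}(\tilde{z})\,(y - bt)_i (y - bt)_j, \\ \hat{g}^{(12)}_T &= t \left( 1 + r\!\left(\frac{t}{T}\right) \right) \sum_{ij} \sum_z P_0^{T-t}(z)\, f_{ij}(\tilde{z})\,(y - bt)_i\, \frac{z_j - b_j(T-t)}{\sqrt{T-t}}, \\ \hat{g}^{(22)}_T &= \frac{1}{4}\, t^2 \left( 1 + r\!\left(\frac{t}{T}\right) \right)^2 \sum_z \sum_{ij} P_0^{T-t}(z)\, f_{ij}(\tilde{z})\, \frac{(z_i - b_i(T-t))(z_j - b_j(T-t))}{T-t}, \end{aligned} \tag{3.13d} \]
$\tilde{z}$ being a point depending on $y$, $z$, $T$ and $t$. Further we write $g^{(1)}_T = \mu(\nabla f) \cdot (y - bt) + \frac{1}{\sqrt{T}}\, g^{(1*)}_T$, so that
\[ \mathcal{M}_{0,T_0}(g^{(1)}_T|\xi) - \mu(\nabla f) \cdot \hat{E}^{T_0}(\xi) = \frac{1}{\sqrt{T}}\, \mathcal{M}_{0,T_0}(g^{(1*)}_T|\xi). \tag{3.14} \]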
Again by Inequality (3.2) we see that $|g^{(1*)}_T| \le \mathrm{const}\, \|f\|_2$ and $\frac{1}{\sqrt{T}} \mathcal{M}_{0,T_0}(g^{(1*)}_T|\xi) \to 0$, $\Pi$-a.e., for $\epsilon$ small enough. A tedious but straightforward check, which makes use once again of Inequalities (A.18a,b,c), shows that $\frac{1}{\sqrt{T}} \mathcal{M}_{0,T_0}(g''_T|\xi)$ does the same. Therefore, taking into account (3.12), (3.13a), and (3.14), we see that, if $\epsilon$ is small,
\[ \lim_{T \to \infty} \left[ \mathcal{M}_{0,T_0}(g^*_T|\xi) + \mathcal{M}_{0,T_0}(g'_T|\xi) - \mathcal{M}_{0,T_0}(\xi)\, \mu(f Q_1) - \hat{E}^{T_0}(\xi) \cdot \mu(\nabla f) \right] = 0. \]
The conclusion, for the given $f$, is obtained by applying Theorems 1 and 2. The proof of assertion i) now comes, as for Theorem 3, by taking intersections.

The proof of assertion ii) follows the same lines, and we will not repeat obvious details. First of all we take $\alpha \in (\frac{4}{5}, 1)$, and we see, again by Inequality (3.6), that the $L^{2n}$ norm of $T \tilde{R}_T$ goes to zero as $T \to \infty$. The same happens to the quantities $T (Z_T(\xi) - Z_{T_0}(\xi))$, $\sqrt{T}\, \big( \frac{Z_T(\xi)}{Z(\xi)} - 1 \big)$ and $\sqrt{T}\,(\hat{E}^T - \hat{E}^{T_0})$. If $\epsilon$ is small enough all these quantities go to 0 $\Pi$-a.e. Therefore, taking into account Relations (3.8), (3.12), (3.13a) and (3.14), and using (3.2) for $s = 2$, we see that the limit on the left of Eq. (1.13a) is equal to
\[ \frac{1}{Z(\xi)} \lim_{T \to \infty} \left[ \mu(f Q_2) + \mathcal{M}_{0,T_0}(g^{**}_T|\xi) + \mathcal{M}_{0,T_0}(g^{(1*)}_T|\xi) - \frac{1}{2}\, \mathcal{M}_{0,T_0}(g''_T|\xi) \right]. \]
Using Inequality (3.2) we see that $\mathcal{M}_{0,T_0}(g^{**}_T|\xi) - \mathcal{M}_{0,T_0}(\xi)\, \mu(f Q_2) \to 0$, $\Pi$-a.e., and the same happens for the quantity $\mathcal{M}_{0,T_0}(g^{(1*)}_T|\xi) - \mu(Q_1 \hat{E}^{T_0}(\xi) \cdot \nabla f)$. As for $g''_T$, expanding the difference (3.7b) to third order, we see that we have to take into account only the terms $g^{(2)}_T$ and
\[ g^{(11)}_T(t, y) = \sum_{ij} \mu^{(0)}_{T-t}(f_{ij})\,(y - bt)_i (y - bt)_j. \]
Using once more the results of the appendix we see that for all $\nu \ge 7$, if $\epsilon$ is small enough, the quantities $\mathcal{M}_{0,T_0}(g^{(2)}_T|\xi) - \mu(h) \sum_{t=0}^{T_0 - 1} \sum_y t\, M(t, y|\xi)$ and
\[ \mathcal{M}_{0,T_0}(g^{(11)}_T|\xi) - \sum_{ij} \mu(f_{ij}) \sum_{t=0}^{T_0 - 1} \sum_y M(t, y|\xi)\,(y - bt)_i (y - bt)_j \]
tend to 0, $\Pi$-a.e., and consequently
\[ \lim_{T \to \infty} \Big[ \mathcal{M}_{0,T_0}(g^{(11)}_T|\xi) - \mathcal{M}_{0,T_0}(g^{(2)}_T|\xi) - \sum_{ij} \mu(f_{ij})\, \hat{C}^{T_0}_{ij} \Big] = 0. \]
The conclusion now comes by Theorems 1 and 2.
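4. Proof of Theorem 5

For the proof we need the following lemma.

Lemma 4.1. Let $(T, y)$ be such that $|y - bT| \le K T^{\frac{1}{2} + \beta}$, $\beta \in (0, \frac{1}{6})$. One can find a constant $c_1$, independent of $T$ and $y$, such that for any collection $(t_1, y_1), \ldots, (t_n, y_n)$ of points in space-time satisfying the conditions $t_i > 0$, $\sum_{i=1}^{n} t_i = T$, and $\sum_{i=1}^{n} y_i = y$, the following inequality holds:
\[ \frac{\prod_{i=1}^{n} P_0^{t_i}(y_i)}{P_0^T(y)} \le c_1^n\, \frac{T^{\frac{\nu}{2}}}{\prod_{i=1}^{n} t_i^{\frac{\nu}{2}}}. \tag{4.1} \]

Proof. The proof is based on some results of the Cramér theory of large deviations (see, e.g. [Ne]), which we briefly recall. Let $\lambda \in \mathbb{R}^\nu$ and consider the function
\[ \psi(\lambda) = \sum_u P_0(u)\, e^{(\lambda, \hat{u})}, \qquad \hat{u} = u - b. \tag{4.2} \]
$\psi(\lambda)$ is analytic, $\psi(0) = 1$, and $\nabla \psi(0) = 0$. Moreover $c_{ij}(\lambda) = \frac{\partial^2}{\partial \lambda_i \partial \lambda_j} \log \psi(\lambda)$ are the elements of the covariance matrix $C(\lambda)$ for the "shifted random walk" with transition probabilities $P_\lambda(u) = (\psi(\lambda))^{-1} e^{(\lambda, \hat{u})} P_0(u)$. $C(\lambda)$ is positive definite for all $\lambda \in \mathbb{R}^\nu$, and $c_{ij} = c_{ij}(0)$. Let $H$ be the convex hull in $\mathbb{R}^\nu$ of the points $\{\hat{u} : P_0(u) > 0\}$, and $H^{(0)}$ its interior part. For $\alpha \in H^{(0)}$ the equation (in $\lambda$) $\nabla \log \psi(\lambda) = \alpha$ has a unique solution $\lambda(\alpha)$, which is an analytic function in $H^{(0)}$. The "Cramér function"
\[ \Lambda(\alpha) = \sup_{\lambda \in \mathbb{R}^\nu} \left[ (\alpha, \lambda) - \log \psi(\lambda) \right] = (\alpha, \lambda(\alpha)) - \log \psi(\lambda(\alpha)) \tag{4.3} \]
is also analytic in $H^{(0)}$, and $\nabla \Lambda(\alpha) = \lambda(\alpha)$. Moreover if $\alpha = \frac{u - bt}{t} \in H^{(0)}$ we have
\[ P_0^t(u) \le \sum_v P_0^t(v)\, e^{(\lambda(\alpha), \hat{v})}\, e^{-t(\lambda(\alpha), \alpha)} = e^{-t[(\alpha, \lambda(\alpha)) - \log \psi(\lambda(\alpha))]} = e^{-t\Lambda(\alpha)}. \tag{4.4} \]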
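The bound (4.4) can be checked numerically. The Python sketch below, written for a one-dimensional walk under our own assumptions, approximates $\lambda(\alpha)$ by bisection on the increasing function $\frac{d}{d\lambda} \log \psi(\lambda)$, evaluates $\Lambda(\alpha)$ through (4.3), and compares $P_0^t(u)$ with $e^{-t\Lambda(\alpha)}$. The bracketing interval $[-40, 40]$, the step law and the function names are hypothetical choices made for the illustration.

```python
import math

def cramer_function(p0, alpha, lo=-40.0, hi=40.0, iters=200):
    """Approximate Lambda(alpha) of (4.3) for a one-dimensional step law {step: prob}."""
    b = sum(u * p for u, p in p0.items())

    def log_psi(lam):
        return math.log(sum(p * math.exp(lam * (u - b)) for u, p in p0.items()))

    def dlog_psi(lam):
        num = sum(p * (u - b) * math.exp(lam * (u - b)) for u, p in p0.items())
        den = sum(p * math.exp(lam * (u - b)) for u, p in p0.items())
        return num / den

    # (log psi)' is increasing, so the equation (log psi)'(lambda) = alpha
    # can be solved by bisection inside the assumed bracket [lo, hi].
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dlog_psi(mid) < alpha:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return alpha * lam - log_psi(lam)

def convolution_power(p0, t):
    """t-fold convolution of the step law."""
    law = {0: 1.0}
    for _ in range(t):
        nxt = {}
        for x, w in law.items():
            for u, p in p0.items():
                nxt[x + u] = nxt.get(x + u, 0.0) + w * p
        law = nxt
    return law

P0_EXAMPLE = {-1: 0.25, 0: 0.25, 1: 0.5}     # hypothetical step law
b = sum(u * p for u, p in P0_EXAMPLE.items())
t = 30
law = convolution_power(P0_EXAMPLE, t)
for u in (-20, 0, 8, 20):
    alpha = (u - b * t) / t
    print(u, law[u], math.exp(-t * cramer_function(P0_EXAMPLE, alpha)))
```

Since the exponential bound holds for every $\lambda$ and not only for the maximizer, an imprecise solution of the equation for $\lambda(\alpha)$ only weakens the printed bound; it never invalidates it.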
Setting $\bar{\beta} = \min\{P_0(u) : P_0(u) > 0\}$, if $P_0^t(u) > 0$ it follows that $P_0^t(u) \ge \bar{\beta}^t$, and the previous inequality implies $\Lambda(\alpha) \le \frac{1}{t} \log \frac{1}{P_0^t(u)} \le \log \bar{\beta}^{-1}$. Therefore $\Lambda$ is a bounded strictly convex function defined in a $\nu$-dimensional polytope, which implies (see [Ro], Th. 10.3) that it can be extended to a convex continuous function on the whole of $H$. Hence the Inequality $P_0^t(u) \le e^{-t\Lambda(\alpha)}$ holds for all $\alpha \in H$, and holds for all $\alpha \in \mathbb{R}^\nu$ if we set $\Lambda(\alpha) = +\infty$ for $\alpha \notin H$. For $\alpha \in H^{(0)}$ we have, by a standard computation,
\[ P_0^t(u) = \frac{e^{-t\Lambda(\alpha)}}{(2\pi)^\nu} \int d\tau_1 \ldots d\tau_\nu\; e^{-i(\tau, u)}\, (\phi_\alpha(\tau))^t, \tag{4.5a} \]
where $\phi_\alpha(\tau)$ is the characteristic function of the shifted random walk with $\lambda = \lambda(\alpha)$, $\alpha = \frac{u - bt}{t}$. Making use of the standard results on the local limit theorem for random walks in $\mathbb{Z}^\nu$, one gets for large $t$ the asymptotics
\[ P_0^t(u) = \frac{C^{\frac{1}{2}}(\alpha)}{(2\pi t)^{\frac{\nu}{2}}}\, e^{-t\Lambda(\alpha)} \left( 1 + \frac{r_t(\alpha)}{t^{\frac{1}{2}}} \right). \tag{4.5b} \]
Here $C(\alpha)$ is the determinant of the matrix $A(\alpha)$ with elements $a_{ij}(\alpha) = \frac{\partial^2}{\partial \alpha_i \partial \alpha_j} \Lambda(\alpha) = \frac{\partial \lambda_i}{\partial \alpha_j}$, which is the inverse of the matrix $C(\lambda(\alpha))$ and is therefore positive definite for all $\alpha \in H^{(0)}$. Moreover the analyticity of $\Lambda$ implies that $r_t(\alpha)$ is uniformly bounded for all $t$ and $\alpha$ in any bounded set $\mathcal{A}$ such that the closure $\bar{\mathcal{A}}$ is contained in $H^{(0)}$.

We now come to the actual proof of the lemma. Let $a > 0$ be so small that the region $B_a = \{\alpha : |\alpha| \le a\} \subset H^{(0)}$. We set $\alpha_i = \frac{y_i - b t_i}{t_i}$, $\alpha = \frac{y - bT}{T}$, and
\[ C_* = \sup_{t \ge 1}\, \max_{\alpha \in B_a}\; \frac{C^{\frac{1}{2}}(\alpha)}{(2\pi)^{\frac{\nu}{2}}} \left( 1 + \frac{r_t(\alpha)}{t^{\frac{1}{2}}} \right) < \infty. \tag{4.6a} \]
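Let us suppose that $n_1 \le n$ of the $\alpha_i$'s satisfy the inequality $|\alpha_i| \le a$, and the remaining $n_2 = n - n_1$ belong to $H$ but do not satisfy this inequality. Since $\alpha \to 0$ for $T \to \infty$ we see, by relations (4.4) and (4.5b), that there is a constant $c$ such that $P_0^T(y) \ge \frac{c}{T^{\frac{\nu}{2}}}\, e^{-T\Lambda(\alpha)}$ and that the inequality
\[ \frac{\prod_{i=1}^{n} P_0^{t_i}(y_i)}{P_0^T(y)} \le \frac{C_*^{n_1}}{c}\, \frac{T^{\frac{\nu}{2}}}{\prod_{i=1}^{n} t_i^{\frac{\nu}{2}}}\; G(\alpha_1, \ldots, \alpha_n; \alpha) \]
holds, where we have labeled the $\alpha_j$'s so that $|\alpha_i| \le a$ for $i = 1, \ldots, n_1$, and
\[ G(\alpha_1, \ldots, \alpha_n; \alpha) = \prod_{i=1}^{n_1} e^{-t_i \Lambda(\alpha_i)} \prod_{i=n_1+1}^{n} t_i^{\frac{\nu}{2}}\, e^{-t_i \Lambda(\alpha_i)}\; e^{T\Lambda(\alpha)}. \]
Setting $r_i = \frac{t_i}{T}$ we have $\sum_{i=1}^{n} r_i \alpha_i = \alpha$. Consider the expression $\Delta = \sum_{i=1}^{n} r_i (\Lambda(\alpha_i) - \Lambda(\alpha))$. By a Taylor expansion we can write
\[ \Lambda(\alpha_i) - \Lambda(\alpha) = (\alpha_i - \alpha, \lambda(\alpha)) + \frac{1}{2} (\alpha_i - \alpha, A(\alpha^*_i)(\alpha_i - \alpha)), \tag{4.6b} \]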
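where $\alpha^*_i$ is a point intermediate between $\alpha$ and $\alpha_i$. Since the shifted random walks $P_\lambda$ have the same range as $P_0$, we see that $\sum_{ij} c_{ij}(\lambda) z_i z_j \le \bar{D}^2 z^2$, where $\bar{D} = \max\{|u - b| : P_0(u) > 0\}$, so that for any $\alpha^* \in H^{(0)}$ and $\alpha \in \mathbb{R}^\nu$ we find, setting $a_0 = \bar{D}^{-2}$,
\[ (\alpha, A(\alpha^*)\alpha) \ge a_0 \alpha^2. \tag{4.6c} \]
Since $\sum_{i=1}^{n} r_i (\alpha_i - \alpha) = 0$, the first-order terms of (4.6b) cancel in $\Delta$. Hence $\Delta \ge \frac{a_0}{2} \sum_{i=1}^{n} r_i (\alpha_i - \alpha)^2$, and, since for $i = n_1 + 1, \ldots, n$ we have, for $T$ large enough, $|\alpha_i - \alpha| > \frac{a}{2}$, we get, for some positive constant $\gamma$,
\[ G(\alpha_1, \ldots, \alpha_n; \alpha) \le \exp\Big\{ - \sum_{i=n_1+1}^{n} \big( \gamma t_i - \frac{\nu}{2} \log t_i \big) \Big\}. \]
Taking $K = -\min_{x \ge 1} (\gamma x - \frac{\nu}{2} \log x)$, we see that $G(\alpha_1, \ldots, \alpha_n; \alpha) \le e^{K n_2}$, which proves the lemma.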
Remark. Replacing in (4.6b) $\alpha_i$ by $\alpha$ and $\alpha$ by 0, since $\Lambda(0) = 0$, by (4.6c) we get $\Lambda(\alpha) \ge \frac{a_0}{2} \alpha^2$. Inequality (4.4) then gives $P_0^t(u) \le e^{-\frac{a_0}{2t}(u - bt)^2}$. For $|\alpha| < a$ by (4.5b) and (4.6a) we have $P_0^t(u) \le C_*\, t^{-\frac{\nu}{2}}\, e^{-\frac{a_0}{2t}(u - bt)^2}$. Hence for any $a_* < a_0$ we can find a constant $c_2$ such that for all $u \in \mathbb{Z}^\nu$ and $t \ge 1$,
\[ P_0^t(u) \le c_2\, \frac{e^{-a_* \frac{(u - bt)^2}{2t}}}{t^{\frac{\nu}{2}}}. \tag{4.6d} \]

Proof of Theorem 5. We set, in analogy with (1.5b),
\[ \hat{M}(t, y|\xi) = \sum_{1 \le t_1 \le t} \sum_{y_1} P_0^{t_1}(y_1)\, M(t_1, y_1, t, y|\xi), \tag{4.7a} \]
and
\[ \hat{\mathcal{M}}_{T,T'}(\xi) = \sum_{t=T}^{T'-1} \sum_y \hat{M}(t, y|\xi). \tag{4.7b} \]
The proof of the existence $\Pi$-a.e., for small $\epsilon$, of the limit
\[ \hat{\mathcal{M}}(\xi) = \lim_{T \to \infty} \hat{\mathcal{M}}_{0,T}(\xi) \tag{4.8} \]
follows from Theorem 1. We introduce further the quantities
\[ M^*_{(T,y)}(t, z|\xi) = \sum_{t \le t_1} \sum_{y_1} M(t, z; t_1, y_1|\xi)\, P_0^{T - t_1}(y - y_1), \]
and
\[ \mathcal{M}^{T'}_{(T,y)}(\xi) = \sum_{T' \le t} \sum_z M^*_{(T,y)}(t, z|\xi). \tag{4.9} \]
Theorem 1 implies also the existence $\Pi$-a.e., for small $\epsilon$, of the limits
\[ \mathcal{M}^*_{(T,y)}(\xi) = \lim_{T' \to -\infty} \mathcal{M}^{T'}_{(T,y)}(\xi), \tag{4.10} \]
and it is easily seen that
\[ \mathcal{M}^*_{(T,y)}(\xi) = \hat{\mathcal{M}}(\eta^{(T,y)}). \tag{4.11} \]
We write
\[ Z_T(\xi)\, \pi_T(y|\xi) = P_0^T(y)\,(1 + L_{(T,y)}(\xi)), \tag{4.12a} \]
where, introducing for each set $B$ the time projection $\beta(B) = \{t : (t, y) \in B\}$, we have
\[ L_{(T,y)}(\xi) = \sum_{B:\, \beta(B) \subset [0, T-1]} \epsilon^{|B|} M_B(\xi)\, \frac{P_0^{t_0(B)}(y_0(B))\, P_0^{T - t_f(B)}(y - y_f(B))}{P_0^T(y)}. \tag{4.12b} \]
Let $T_0 = [T^\alpha]$, where $\alpha \in (0, 1)$ will be determined in due time. We divide $L_{(T,y)}(\xi)$ into two pieces by setting
\[ L^{(2)}_{(T,y)}(\xi) = \sum_{\substack{B:\, \beta(B) \subset [0, T-1] \\ \beta(B) \cap \Delta_T \ne \emptyset}} \epsilon^{|B|} M_B(\xi)\, \frac{P_0^{t_0(B)}(y_0(B))\, P_0^{T - t_f(B)}(y - y_f(B))}{P_0^T(y)}, \]
where $\Delta_T = [T_0 + 1, T - T_0 - 1]$, and $L^{(1)}_{(T,y)}(\xi) = L_{(T,y)}(\xi) - L^{(2)}_{(T,y)}(\xi)$. For what follows we need one more lemma.
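Lemma 4.2. For all $n = 1, \ldots$, the following inequality holds for $\epsilon$ small enough:
\[ \Big\langle \big( L^{(2)}_{(T,y)}(\xi) \big)^{2n} \Big\rangle \le \frac{\mathrm{const}}{T^{n\alpha(\frac{\nu}{2} - 1)} + 1}. \tag{4.13} \]

Proof. The proof follows the same lines as for Proposition A.1; we will report in detail only the main points. For each $B$ such that $\beta(B) \subset [0, T-1]$ we set $\hat{B} = B \cup \{(T, y)\}$. We have, in analogy with Inequality (A.4a), for all $k = 1, 2, \ldots$,
\[ \Big\langle \big( L^{(2)}_{(T,y)}(\xi) \big)^{k} \Big\rangle \le \sum_{\substack{(B_1, \ldots, B_k) \in \mathcal{B}_k(0,T) \\ \beta(B_i) \cap \Delta_T \ne \emptyset}} \; \prod_{i=1}^{k} \epsilon^{|B_i|}\, \frac{N(\hat{B}_i)}{P_0^T(y)}. \tag{4.14} \]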
We define the "central point" $(t_c(B), y_c(B))$ of $B$ as follows. $t_c(B)$ is the point of $\beta(B)$ for which the distance $|t - \frac{T}{2}|$ is minimal, if it is unique, and if there are two such points $t_1 < t_2$ we set $t_c = t_1$. $y_c(B)$ is then determined by the condition $(t_c(B), y_c(B))\in B$. We proceed as in the proof of Proposition A.1, with small changes. We define, for a given choice of $B_1,\dots,B_k$, the "minimal central point" $(t_*, y_*)$, and the "minimal collection" $(B_{i_1},\dots,B_{i_s})$ with multiplicity $s$ by the condition that $(t_*,y_*) = (t_c(B_{i_1}), y_c(B_{i_1})) = \dots = (t_c(B_{i_s}), y_c(B_{i_s}))$, and $(t_*,y_*) < (t_c(B_j), y_c(B_j))$ for $j\notin\{i_1,\dots,i_s\}$. The definition of the sets $B_j^{(1)}$, $B_j^{(2)}$, etc., is as in the proof of Proposition A.1, and we set $\hat B_j^{(i)} = B_j^{(i)}\cup\{(T,y)\}$. Let $|\hat B| = m$, $\hat\beta = \beta(\hat B) = \{t_0,\dots,t_{m-1}\}$, and $\ell_1 = t_0$, $\ell_j = t_{j-1}-t_{j-2}$. By Lemma 4.1 we have, for some constant $\bar c$,
$$\frac{N(\hat B)}{P_0^T(y)} \le \frac{c_1^m\, T^{\nu/2}}{\prod_{i=1}^m \ell_i^{\nu/2}} \le \frac{\bar c^{\,m}\, T^{\nu/2}}{\prod_{i=1}^m (\ell_i^{\nu/2}+1)} := M_T(\hat\beta). \tag{4.15}$$
The analogue of Equality (A.8) (obtained by replacing $B_j^{(1)}$ with $\hat B_j^{(1)}$) holds, and Inequality (A.9) implies, setting $\tau = \min\{t, T-t\}$,
$$\sum_{\beta:\,t_c(\beta)=t} (K\varepsilon)^{|\beta|}\, M_T(\hat\beta) \le \frac{C'\, T^{\nu/2}}{(t^{\nu/2}+1)\bigl((T-t)^{\nu/2}+1\bigr)} \le \frac{C''}{\tau^{\nu/2}+1} \tag{4.16}$$
for some constants $C'$, $C''$. Hence, if the minimal collection is made of the only set $B_k$ (i.e., $s=1$), then we get, in analogy with Inequality (A.6),
$$\sum_{\beta_k:\,t_c(\beta_k)=t}\ \sum_{\hat\beta_k,\kappa,\kappa'}\ \sum_{y_k(t'):\,t'\in\hat\beta_k} \varepsilon^{|\beta_k|}\, \frac{N(\hat B_k)}{P_0^T(y)}\ \prod_{j=1}^{k-1} \varepsilon^{|B_j^{(1)}\cup B_{jk}|}\, N(\hat B_j^{(1)}\cup B_{jk}) \le \frac{C'}{\tau^{\nu/2}+1}\ \prod_{j=1}^{k-1} \varepsilon^{|B_j^{(1)}|}\, N(\hat B_j^{(1)}).$$
The analysis for $s > 1$ proceeds in the same way as in the Appendix. Taking into account that all the central points $t_c(B_j)\in\Delta_T$, and that at least two of them must overlap, we find, in analogy with Inequality (A.3), and repeating the steps that led to Inequality (3.6), that
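$$\Bigl\langle \bigl(L^{(2)}_{(T,y)}(\xi)\bigr)^{2n} \Bigr\rangle \le \sum_{k=1}^{2n-1}\ \sum_{\substack{n_1,\dots,n_k\ge1,\ \max n_j>1\\ \sum n_j = 2n}} C'(n_1,\dots,n_k)\ \prod_{j=1}^{k}\ \sum_{t_j=T_0+1}^{[T/2]} \frac{2}{(t_j^{\nu/2}+1)^{m_j}} \le \frac{\mathrm{const}}{T^{n\alpha(\frac{\nu}{2}-1)}+1}. \tag{4.17}$$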
By the usual Borel-Cantelli argument we see that the quantity
$$\max_{y:\,|y-bT|\le T^{\frac12+\beta}} \bigl|L^{(2)}_{(T,y)}(\xi)\bigr|$$
goes to $0$ as $T\to\infty$ $\Pi$-a.e., if $n\alpha(\frac{\nu}{2}-1) > 1$.

$L^{(1)}_{(T,y)}$ can be written as a sum of three pieces: $L^{(1)}_{(T,y)} = L'_{(T,y)} + L''_{(T,y)} + L'''_{(T,y)}$ with
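$$L'_{(T,y)}(\xi) = \sum_{B:\,\beta(B)\subset[0,T_0]} \varepsilon^{|B|} M_B(\xi)\, P_0^{t_0(B)}(y_0(B))\, \frac{P_0^{T-t_f(B)}(y-y_f(B))}{P_0^T(y)},$$
$$L''_{(T,y)}(\xi) = \sum_{B:\,\beta(B)\subset[T-T_0,T-1]} \frac{P_0^{t_0(B)}(y_0(B))}{P_0^T(y)}\, \varepsilon^{|B|} M_B(\xi)\, P_0^{T-t_f(B)}(y-y_f(B)),$$
$$L'''_{(T,y)}(\xi) = \sum_{\substack{B_1:\,\beta(B_1)\subset[0,T_0]\\ B_2:\,\beta(B_2)\subset[T-T_0,T-1]}} \varepsilon^{|B_1|} M_{B_1}(\xi)\, P_0^{t_0(B_1)}(y_0(B_1))\, \frac{P_0^{t_0(B_2)-t_f(B_1)}(y_0(B_2)-y_f(B_1))}{P_0^T(y)}\, \varepsilon^{|B_2|} M_{B_2}(\xi)\, P_0^{T-t_f(B_2)}(y-y_f(B_2)).$$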
Taking $\alpha\in(0,\frac12)$ and recalling the position $T_0 = [T^\alpha]$, it is easy to see that there are positive constants $K'$ and $\gamma$ such that
$$\max_{0\le\tau\le 2T_0}\ \max_{z:\,|z|\le DT_0}\ \max_{y:\,|y-bT|\le KT^{\frac12+\beta}} \left| \frac{P_0^{T-\tau}(y+z)}{P_0^T(y)} - 1 \right| \le \frac{K'}{T^\gamma}.$$
Once again the Chebyshev inequality and the Borel-Cantelli lemma give that
$$\max_{|y-bT|\le KT^{\frac12+\beta}} \bigl| L'_{(T,y)}(\xi) - M_{0,T_0+1}(\xi) \bigr| \to 0,$$
$$\max_{|y-bT|\le KT^{\frac12+\beta}} \bigl| L''_{(T,y)}(\xi) - M^{T-T_0}_{(T,y)}(\xi) \bigr| \to 0,$$
$$\max_{|y-bT|\le KT^{\frac12+\beta}} \bigl| L'''_{(T,y)}(\xi) - M_{0,T_0+1}(\xi)\, M^{T-T_0}_{(T,y)}(\xi) \bigr| \to 0,$$
$$\max_{|y-bT|\le KT^{\frac12+\beta}} \bigl| M^{T-T_0}_{(T,y)}(\xi) - M^*_{(T,y)}(\xi) \bigr| \to 0,$$
as $T\to\infty$, $\Pi$-a.e., if $\varepsilon$ is small. Hence we find that
$$\max_{|y-bT|\le KT^{\frac12+\beta}} \bigl| 1 + L_{(T,y)}(\xi) - (1 + M_{0,T}(\xi))(1 + M^*_{(T,y)}(\xi)) \bigr| \to 0.$$
By Theorem 1 we have
$$\lim_{T\to\infty}\ \max_{|y-bT|\le KT^{\frac12+\beta}} \left| \frac{\pi_T(y|\xi)}{P_0^T(y)} - (1 + M^*_{(T,y)}(\xi)) \right| = 0$$
$\Pi$-a.e. Taking into account relations (4.11), Theorem 5 is proved with $Z(\eta^{(T,y)}) = 1 + \hat M(\eta^{(T,y)})$.

Appendix: $L^{2n}$ estimates

The iteration method that we use in proving $L^{2n}$-estimates is essentially the same as that of [Bo-Mi-Pe] for random walks in a dynamical random environment. Details and results are however different, and in fact proofs for the polymer models are simpler. Proofs are carried out in full, but we refer to [Bo-Mi-Pe] for some standard parts. We will need the following simple inequality (see [Bo-Mi-Pe], Lemma A.3): for any choice of $a$ and $b$ such that $a > 1$ and $a \ge b \ge 1$, one can find a constant $c(a,b)$ such that
$$\sum_{n=0}^{T} \frac{1}{n^a+1}\cdot\frac{1}{(T-n)^b+1} \le \frac{c(a,b)}{T^b+1}. \tag{A.1}$$
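As an informal illustration of Inequality (A.1), the following minimal sketch evaluates the ratio of the left side to $1/(T^b+1)$ over a range of $T$ and reports its maximum. The function names and the range of $T$ are assumptions made only for this illustration; a bounded ratio on a finite range is consistent with (A.1) but is of course not a proof, and the value found is not the constant $c(a,b)$ of [Bo-Mi-Pe].

```python
def a1_ratio(T, a, b):
    """Left side of (A.1) multiplied by (T**b + 1)."""
    s = sum(1.0 / (n ** a + 1.0) / ((T - n) ** b + 1.0) for n in range(T + 1))
    return s * (T ** b + 1.0)


def max_a1_ratio(a, b, T_max):
    """Largest ratio observed for T = 0, 1, ..., T_max (hypothetical helper)."""
    return max(a1_ratio(T, a, b) for T in range(T_max + 1))


if __name__ == "__main__":
    # Exponents a = b = nu/2 with nu = 3, the case used repeatedly in the Appendix.
    print(max_a1_ratio(1.5, 1.5, 400))
```

For $a = b$ a crude bound follows by splitting the sum at $n = T/2$, which gives a ratio of at most $2\cdot 2^{b}\sum_{n\ge 0}(n^a+1)^{-1}$.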
The main result of the Appendix concerns the quantity $M_{T,T'}$ defined by Eq. (1.6b), which can be written as
$$M_{T,T'}(\xi) = \sum_{B\in\mathcal{B}(T,T')} \varepsilon^{|B|}\, P_0^{t_0(B)}(y_0(B))\, M_B(\xi), \tag{A.2}$$
where $T$ and $T'$ are integers such that $T' > T > 0$, $\mathcal{B}(T,T')$ is the collection of the $B$'s with endpoint in the interval $[T,T')$, and $M_B(\xi)$ is given by Eq. (1.5a).

Proposition A.1. For any integer $n > 0$ and $\nu\ge 3$ the following inequalities hold for $\varepsilon$ small enough, depending on $n$:
$$\bigl\langle (M_{T,T'}(\xi))^{2n} \bigr\rangle \le \sum_{k=1}^{2n-1}\ \sum_{\substack{n_1,\dots,n_k\ge1,\ \max n_j>1\\ \sum_{j=1}^k n_j = 2n}} C(n_1,\dots,n_k)\ \prod_{j=1}^{k}\ \sum_{t_j=T}^{T'-1} \frac{1}{(t_j^{\nu/2}+1)^{m_j}}, \tag{A.3}$$
where $m_j = \max\{1, n_j-1\}$ and $C(n_1,\dots,n_k)$ are constants which depend on $\varepsilon$.
Proof. By formula (A.2), and the fact that the random variables $\{\xi_t(x) : (t,x)\in\mathbb{Z}\times\mathbb{Z}^\nu\}$ are independent with zero average, we have, for any positive integer $k$,
$$\bigl|\bigl\langle (M_{T,T'})^k \bigr\rangle\bigr| \le \sum_{(B_1,\dots,B_k)\in\mathcal{B}_k(T,T')}\ \prod_{i=1}^{k} \varepsilon^{|B_i|}\, N(B_i), \tag{A.4a}$$
where
$$N(B) = P_0^{t_0}(y_0) \prod_{j=1}^{|B|-1} P_0^{t_j-t_{j-1}}(y_j-y_{j-1}) \tag{A.4b}$$
with the convention $N(\emptyset) = N((0,0)) = 1$. The sum runs over the class $\mathcal{B}_k(T,T')$ of all collections $(B_1,\dots,B_k)$ of subsets of $\mathbb{Z}\times\mathbb{Z}^\nu$ such that $T\le t_f(B_j) < T'$, which possess the property of "covering", i.e.,
$$B_j \subseteq \bigcup_{\substack{i=1,\dots,k\\ i\ne j}} B_i, \qquad j = 1,\dots,k.$$
We denote by $<$ the lexicographic order of the points $(t,y)\in\mathbb{Z}\times\mathbb{Z}^\nu$. Let $B_{i_1},\dots,B_{i_s}$ be the collection of the sets $(B_1,\dots,B_k)$ for which
$$(t_f(B_{i_1}), y_f(B_{i_1})) = \dots = (t_f(B_{i_s}), y_f(B_{i_s})) \equiv (\hat t,\hat y) < (t_f(B_j), y_f(B_j)), \qquad j\notin\{i_1,\dots,i_s\}.$$
$(\hat t,\hat y)$ is called the "left point" of the collection $(B_1,\dots,B_k)$, the number $s$ its "multiplicity", and the collection $(B_{i_1},\dots,B_{i_s})$ the "minimal collection". We denote by $\mathcal{B}_k^s(T,T')$ the set of the collections $(B_1,\dots,B_k)$ such that the minimal collection has multiplicity $s$, and it is not restrictive to assume that it is made of the sets $(B_{k-s+1},\dots,B_k)$. Taking into account the number of possible choices of the sets in the minimal collection we have
$$\sum_{(B_1,\dots,B_k)\in\mathcal{B}_k(T,T')} = \sum_{s=1}^{k} \binom{k}{s} \sum_{(B_1,\dots,B_k)\in\mathcal{B}_k^s(T,T')}.$$
Assume that $k > 2$, and let $B_{j,k}$, $j = 1,\dots,k-1$, denote the subset of the points $(\tau,x)\in B_k\cap B_j$ which do not belong to any other $B_i$, $i\ne j$. Their union is denoted as $\hat B_k\subset B_k$. Let furthermore $B_j^{(1)} = B_j\setminus B_{j,k}$, $j = 1,\dots,k-1$. Clearly the sets $(B_1^{(1)},\dots,B_{k-1}^{(1)})$ also possess the property of covering.

We can recover all $k$-tuples $(B_1,\dots,B_k)$ in $\mathcal{B}_k$ which correspond to a given choice of $(B_1^{(1)},\dots,B_{k-1}^{(1)})\in\mathcal{B}_{k-1}$ in the following way. Let $\beta(B) = \{\tau : (\tau,y)\in B\}\subset\mathbb{Z}$ denote the time projection of $B$. $B$ can be understood as the graph of a function $y : \beta(B)\to\mathbb{Z}^\nu$. We shall use the notation $\beta_i = \beta(B_i)$, $\beta_{jk} = \beta(B_{j,k})$, $y_i$ and $y_{jk}$ being the corresponding functions, and also $\hat\beta_k = \beta(\hat B_k)$, $\beta_j^{(1)} = \beta(B_j^{(1)})$, $y_j^{(1)}$, etc. We can assign $B_k$ by giving first a set of times $\beta_k$, and a subset $\hat\beta_k\subseteq\beta_k$. Because of the property of covering the subset $\hat\beta_k$ must satisfy the following conditions:
$$\beta_k\setminus\bigcup_{j=1}^{k-1}\beta_j^{(1)} \subseteq \hat\beta_k, \qquad \hat\beta_k\cap\Bigl(\bigcap_{j=1}^{k-1}\beta_j^{(1)}\Bigr) = \emptyset. \tag{A.5}$$
For $\tau\in\beta_k\setminus\hat\beta_k$ we then have to assign the points $y_k(\tau)$ in such a way that they coincide with one of the points $y_j^{(1)}(\tau)$. By the property of covering $y_i^{(1)}(\tau)\in\cup_{j\ne i}\, y_j^{(1)}(\tau)$, so there are at most $[\frac{k-1}{2}]$ choices. Hence the number of possible choices of the space positions $y_k(\tau)$ for all $\tau\in\beta_k\setminus\hat\beta_k$ does not exceed $([\frac{k-1}{2}])^{|\beta_k\setminus\hat\beta_k|}$. Having fixed $\hat\beta_k$ and a possible choice $\kappa$ of the space positions, we choose an admissible partition of $\hat\beta_k$, which specifies for each $\tau\in\hat\beta_k$ the index $j$ of the set $B_j$ with which $B_k$ has a common point with time coordinate $\tau$. We denote by $\kappa'$ any such partition, and their number does not exceed $(k-1)^{|\hat\beta_k|}$. The whole collection $(B_1,\dots,B_k)$ is now completely recovered by assigning the space coordinates $y_k(\tau)$ for $\tau\in\hat\beta_k$.

We perform the sum in (A.4a) by summing first over all possible $B_k$'s, and partitions of $\hat B_k$, for a given choice of the $(k-1)$-tuple $(B_1^{(1)},\dots,B_{k-1}^{(1)})$, with the property of covering. The main point in the proof is that there is a way of majorizing the result of the sum as a factor times an expression similar to (A.4a), written for the $k-1$ sets $(B_1^{(1)},\dots,B_{k-1}^{(1)})$.

We consider first the case $s = 1$, i.e., the minimal collection is made of one set. Recalling the discussion above, and observing that $(t_f(B_j^{(1)}), y_f(B_j^{(1)})) = (t_f(B_j), y_f(B_j))$, for $j = 1,\dots,k-1$, we can write for a fixed choice of $t_f(B_k) = t\in[T,T')$,
$$\sum_{\substack{(B_1,\dots,B_k)\in\mathcal{B}_k^1(T,T')\\ t_f(B_k)=t}}\ \prod_{i=1}^{k} \varepsilon^{|B_i|} N(B_i) = \sum_{(B_1^{(1)},\dots,B_{k-1}^{(1)})\in\mathcal{B}_{k-1}(T,T')}\ \sum_{\substack{\beta_k\\ t_f(\beta_k)=t}}\ \sum_{\hat\beta_k,\kappa,\kappa'}\ \sum_{y_k(\tau):\,\tau\in\hat\beta_k} \varepsilon^{|B_k|} N(B_k) \prod_{j=1}^{k-1} \varepsilon^{|B_j^{(1)}\cup B_{j,k}|}\, N(B_j^{(1)}\cup B_{j,k}).$$
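Here $t_f(\beta)$ denotes the maximal point of $\beta$, $\kappa$ is a possible choice of $y_k(\tau)$ for $\tau\in\beta_k\setminus\hat\beta_k$, $\kappa'$ is an admissible partition of $\hat\beta_k$, and the sum is over all choices of $\beta_k$, $\hat\beta_k\subseteq\beta_k$ that satisfy conditions (A.5). We recall that $\beta_k$, $\hat\beta_k$, $\kappa$, $\kappa'$, and the values $\{y_k(\tau) : \tau\in\hat\beta_k\}$ completely determine $B_k$ as well as the partition $\{B_{j,k} : j = 1,\dots,k-1\}$ of $\hat B_k$. We shall now show that there is a constant $C$, depending on $\varepsilon$ and $k$, such that
$$\sum_{\beta_k:\,t_f(\beta_k)=t}\ \sum_{\hat\beta_k,\kappa,\kappa'}\ \sum_{y_k(\tau):\,\tau\in\hat\beta_k} \varepsilon^{|B_k|} N(B_k) \prod_{j=1}^{k-1} \varepsilon^{|B_j^{(1)}\cup B_{j,k}|}\, N(B_j^{(1)}\cup B_{j,k}) \le \frac{C}{t^{\nu/2}+1} \prod_{j=1}^{k-1} \varepsilon^{|B_j^{(1)}|}\, N(B_j^{(1)}). \tag{A.6}$$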
The expression (A.4b) of $N(B)$, setting $|B| = m$, $\beta = \beta(B) = \{t_0,\dots,t_{m-1}\}$, $t_f(B) = t_{m-1} = t$, and $\ell_1 = t_0$, $\ell_j = t_{j-1}-t_{j-2}$, $j > 1$, using Inequality (4.6d), gives, for some constant $c_* > 0$,
$$N(B) \le M(\beta) \equiv \frac{c_*^m}{\prod_{i=1}^m (\ell_i^{\nu/2}+1)}. \tag{A.7}$$
We have, for $j = 1,\dots,k-1$,
$$\sum_{\{y_k(\tau):\,\tau\in\beta_{jk}\}} N(B_j^{(1)}\cup B_{j,k}) = N(B_j^{(1)}). \tag{A.8}$$
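In fact, consider the case $j = 1$, set $m_1 = |B_1^{(1)}|$, and denote by $(t_i,y_i)$, $i = 0,\dots,m_1-1$, the points of $B_1^{(1)}$, in order of increasing time. Let $\beta_{1k}^i = \beta_{1k}\cap(t_{i-1},t_i)$, with the convention $t_{-1} = 0$. If we perform the sum over the space positions $y_k(\tau)$, for $\tau\in\beta_{1k}^i$, we get a factor $P_0^{t_i-t_{i-1}}(y_i-y_{i-1})$, which proves Eq. (A.8). It follows that for $\beta_k$, $\hat\beta_k$, $\kappa$ and $\kappa'$ fixed we have
$$\sum_{y_k(\tau):\,\tau\in\hat\beta_k} \varepsilon^{|B_k|} N(B_k) \prod_{j=1}^{k-1} \varepsilon^{|B_j^{(1)}\cup B_{j,k}|}\, N(B_j^{(1)}\cup B_{j,k}) \le \varepsilon^{|\beta_k|} M(\beta_k) \prod_{j=1}^{k-1} \varepsilon^{|\beta_{jk}|}\, \varepsilon^{|B_j^{(1)}|} N(B_j^{(1)}) = \varepsilon^{|\beta_k|}\, \varepsilon^{|\hat\beta_k|}\, M(\beta_k) \prod_{j=1}^{k-1} \varepsilon^{|B_j^{(1)}|} N(B_j^{(1)}).$$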
We have at most $(k-1)^{|\hat\beta_k|}$ possible partitions $\kappa'$ of $\hat\beta_k$ and at most $[\frac{k-1}{2}]^{|\beta_k|-|\hat\beta_k|}$ possible choices $\kappa$. Taking into account that for $\beta_k$ fixed one can choose in $\binom{|\beta_k|}{p}$ ways a subset $\hat\beta_k$ of cardinality $p$, and summing over $p$, we see that the sum on the left side of Inequality (A.6) for $\beta_k$ fixed is bounded by
$$(K\varepsilon)^{|\beta_k|}\, M(\beta_k) \prod_{j=1}^{k-1} \varepsilon^{|B_j^{(1)}|} N(B_j^{(1)}),$$
where $K = (k-1) + [\frac{k-1}{2}]$. The final step consists in showing that for $\varepsilon$ small enough there is a constant $C$ such that
$$\sum_{\beta:\,t_f(\beta)=t} (K\varepsilon)^{|\beta|}\, M(\beta) \le \frac{C}{t^{\nu/2}+1}. \tag{A.9}$$
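The counting step that produces the constant $K$ is just the binomial theorem: summing, over the cardinality $p$ of the chosen subset, the number of subsets times the number of partitions times the number of position choices gives $K^{|\beta_k|}$. The sketch below checks this identity in exact integer arithmetic; the function names are hypothetical, and the additional power of the small parameter attached to the chosen subset is simply bounded by $1$ here, which is an assumption that the parameter does not exceed $1$.

```python
from math import comb


def counting_sum(n, k):
    """Sum over p of C(n, p) * (k-1)^p * floor((k-1)/2)^(n-p).

    n plays the role of |beta_k| and p the cardinality of the subset hat(beta)_k.
    """
    half = (k - 1) // 2
    return sum(comb(n, p) * (k - 1) ** p * half ** (n - p) for p in range(n + 1))


def K(k):
    """The constant K = (k - 1) + floor((k - 1) / 2) from the text."""
    return (k - 1) + (k - 1) // 2


if __name__ == "__main__":
    for k in range(2, 6):
        print(k, [counting_sum(n, k) == K(k) ** n for n in range(6)])
```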
Summing over all $\beta$ with $|\beta| = m$ is equivalent to summing over all values of $\ell_i\ge 0$ such that their sum is $t$. Hence, by an iterated application of Inequality (A.1), we see that the sum in (A.9) for $|\beta| = m$ is bounded by
$$\frac{\bigl(K\varepsilon\, c_*\, c(\frac{\nu}{2},\frac{\nu}{2})\bigr)^m}{t^{\nu/2}+1},$$
and we have to sum over $m$ from $1$ to $t$. If now $K\varepsilon\, c_*\, c(\frac{\nu}{2},\frac{\nu}{2}) < 1$ we get Inequality (A.6) with $C^{-1} = 1 - K\varepsilon\, c_*\, c(\frac{\nu}{2},\frac{\nu}{2})$.

We next consider the case when the minimal collection is made of two sets, i.e., the sum for $(B_1,\dots,B_k)\in\mathcal{B}_k^2(T,T')$. We denote the coordinates of the common endpoint of $B_{k-1}$ and $B_k$ by $(t,y)$. If $(t,y)\in\cup_{j=1}^{k-2} B_j$, then we simply apply the previous procedure two times, the first time for the $B_j$'s and the second time for the $B_j^{(1)}$'s. The result is
$$\sum_{\substack{(B_1,\dots,B_k)\in\hat{\mathcal{B}}_k^2(T,T')\\ (t,y)\in\cup_{j=1}^{k-2}B_j}}\ \prod_{i=1}^{k} \varepsilon^{|B_i|} N(B_i) \le \sum_{t=T}^{T'-1} \Bigl(\frac{C}{t^{\nu/2}+1}\Bigr)^2 \sum_{(B_1^{(2)},\dots,B_{k-2}^{(2)})\in\mathcal{B}_{k-2}(T,T')}\ \prod_{j=1}^{k-2} \varepsilon^{|B_j^{(2)}|} N(B_j^{(2)}), \tag{A.10a}$$
where $C$ is the same as before.

If $(t,y)\notin\cup_{j=1}^{k-2} B_j$, then let $(t_1,y_1)$ be the final point of $B_{k-1}^{(1)}$, which of course belongs to $\cup_{j=1}^{k-2} B_j^{(1)}$. Summation over the coordinates $y_k(\tau)$ leads to the factor $N(B_{k-1}^{(1)})$ as above. The final result is the same inequality (A.6), with the difference that the final time of $B_{k-1}^{(1)}$ can be smaller than $T$. We now repeat the procedure. Summation over $B_{k-1}^{(1)}$ for fixed $t_1$ and a fixed choice of the $(k-2)$-tuple $(B_1^{(2)},\dots,B_{k-2}^{(2)})$ leads of course to a factor $\frac{C}{t_1^{\nu/2}+1}$. Summing over $t_1$ we find
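$$\sum_{\substack{(B_1,\dots,B_k)\in\mathcal{B}_k^2(T,T')\\ (t,y)\notin\cup_{j=1}^{k-2}B_j}}\ \prod_{i=1}^{k} \varepsilon^{|B_i|} N(B_i) \le \sum_{t_1=0}^{T'-1} \frac{C}{t_1^{\nu/2}+1}\ \sum_{t=T}^{T'-1} \frac{C}{t^{\nu/2}+1} \sum_{(B_1^{(2)},\dots,B_{k-2}^{(2)})\in\mathcal{B}_{k-2}(T,T')}\ \prod_{j=1}^{k-2} \varepsilon^{|B_j^{(2)}|} N(B_j^{(2)}). \tag{A.10b}$$
Putting together Inequalities (A.10a,b), setting $C_2 = C^2\sum_{t_1=0}^{\infty} (t_1^{\nu/2}+1)^{-1}$, we find
$$\sum_{(B_1,\dots,B_k)\in\mathcal{B}_k^2(T,T')}\ \prod_{i=1}^{k} \varepsilon^{|B_i|} N(B_i) \le \sum_{t=T}^{T'-1} \frac{C_2}{t^{\nu/2}+1} \sum_{(B_1^{(2)},\dots,B_{k-2}^{(2)})\in\mathcal{B}_{k-2}(T,T')}\ \prod_{j=1}^{k-2} \varepsilon^{|B_j^{(2)}|} N(B_j^{(2)}). \tag{A.10c}$$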
If the multiplicity of the minimal set is $s > 2$ we repeat $s-2$ times the same procedure as for $s = 1$ and then apply the one for $s = 2$. In conclusion we find, for all $s < k$,
$$\sum_{\substack{(B_1,\dots,B_k)\in\hat{\mathcal{B}}_k^s(T,T')\\ (t,y)\notin\cup_{j=1}^{k-2}B_j}}\ \prod_{i=1}^{k} \varepsilon^{|B_i|} N(B_i) \le \sum_{t=T}^{T'-1} \frac{C_s}{(t^{\nu/2}+1)^{s-1}} \sum_{(B_1^{(s)},\dots,B_{k-s}^{(s)})\in\mathcal{B}_{k-s}(T,T')}\ \prod_{j=1}^{k-s} \varepsilon^{|B_j^{(s)}|} N(B_j^{(s)}),$$
where $\hat{\mathcal{B}}_k^s(T,T')$ denotes the collection of the sets $(B_1,\dots,B_k)$ such that the minimal set is made of $B_{k-s+1},\dots,B_k$. The case $s = k$ is easy to handle, by applying $k-1$ times the procedure for $s = 1$ and observing that $B_1^{(k-1)} = \emptyset$. Iterating the procedure we get Inequality (A.3).
We also need a refinement of the preceding result, which we state as a Corollary. Let $g(t,y)$ be a real function such that $|g(t,y)|\le t^s|y-bt|^r$, for some $r, s\ge 0$. Then the following result holds.

Corollary A.2. Let
$$M_{T,T'}(g|\xi) = \sum_{B\in\mathcal{B}(T,T')} \varepsilon^{|B|} M_B(\xi)\, g(t_f(B), y_f(B)). \tag{A.11}$$
Then for any integer $n > 0$ and $\nu\ge 3$ the following inequality holds for $\varepsilon$ small enough, depending on $n$:
$$\bigl\langle (M_{T,T'}(g|\xi))^{2n} \bigr\rangle \le \sum_{k=1}^{2n-1}\ \sum_{\substack{n_1,\dots,n_k\ge1,\ \max_j n_j>1\\ \sum_{j=1}^k n_j = 2n}} C_r(n_1,\dots,n_k)\ \prod_{j=1}^{k}\ \sum_{t_j=T}^{T'-1} \frac{t_j^{(\frac{r}{2}+s)n_j}}{(t_j^{\nu/2}+1)^{m_j}}, \tag{A.12}$$
where the constants $C_r(n_1,\dots,n_k)$ depend on $\varepsilon$.

Proof. Observe first that, by Inequalities (4.1) and (4.6d) we have
$$\prod_{j=1}^{n} P_0^{t_j}(y_j) \le c_2\, c_1^n\, \frac{e^{-a_*\frac{(y-bt)^2}{2t}}}{\prod_{j=1}^n t_j^{\nu/2}}, \tag{A.13}$$
for all $n\ge 1$, $t_j\ge 1$, $y_j\in\mathbb{Z}^\nu$ such that $\sum_{j=1}^n t_j = t$, $\sum_{j=1}^n y_j = y$. Going back to Expression (A.4b) we have, in analogy with Inequality (A.7),
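$$N(B)\,|y-bt|^r \le \frac{c_2\, c_1^m\, K_r\, t^{r/2}}{\prod_{i=1}^m \ell_i^{\nu/2}} = K_r\, t^{r/2}\, M'(\beta), \tag{A.14}$$
where $K_r = \max_{\xi\in\mathbb{R}^\nu} e^{-a_*\frac{\xi^2}{2}}|\xi|^r$, $\beta$ is as above the time projection of $B$ and $M'(\beta)$ differs from $M(\beta)$ defined in (A.7) only in that the constant $c_*$ is replaced by some other constant $\bar c_*$. In analogy with (A.4a) we see that
$$\bigl|\bigl\langle (M_{T,T'}(g|\xi))^k \bigr\rangle\bigr| \le \sum_{(B_1,\dots,B_k)\in\mathcal{B}_k(T,T')}\ \prod_{i=1}^{k} \varepsilon^{|B_i|} N(B_i)\, |g(t_f(B_i), y_f(B_i))|,$$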
and repeating almost word for word the steps of the preceding proof we get that the sum over all $(B_1,\dots,B_k)\in\mathcal{B}_k^1(T,T')$ such that $t_f(B_k) = t$ and the "reduced sets" $(B_1^{(1)},\dots,B_{k-1}^{(1)})$ are fixed, is bounded, up to a constant factor depending on $r$, by
$$\frac{t^{\frac{r}{2}+s}}{t^{\nu/2}+1} \prod_{j=1}^{k-1} \varepsilon^{|B_j^{(1)}|} N(B_j^{(1)})\, |g(t_f(B_j^{(1)}), y_f(B_j^{(1)}))|. \tag{A.15}$$
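The constant $K_r$ in (A.14) comes from the scaling substitution $\xi = (y-bt)/\sqrt{t}$, and since the function being maximized depends only on $|\xi|$, it can be evaluated in one variable: setting the derivative of $e^{-a_* x^2/2}x^r$ to zero gives $x^2 = r/a_*$ and hence $K_r = (r/(a_* e))^{r/2}$. The sketch below compares this closed form with a brute-force grid maximum; the function names, grid and sample values of $r$ and $a_*$ are assumptions chosen for illustration only.

```python
import math


def K_r_closed_form(r, a_star):
    """Maximum of exp(-a_star * x^2 / 2) * x^r over x >= 0, attained at x^2 = r / a_star."""
    if r == 0:
        return 1.0
    return (r / (a_star * math.e)) ** (r / 2.0)


def K_r_grid(r, a_star, x_max=20.0, steps=200000):
    """Brute-force maximum of the same function on a uniform grid (illustrative only)."""
    best = 0.0
    for i in range(steps + 1):
        x = x_max * i / steps
        best = max(best, math.exp(-a_star * x * x / 2.0) * x ** r)
    return best


if __name__ == "__main__":
    for r, a_star in [(1, 1.0), (2, 0.5), (3, 1.0)]:
        print(r, a_star, K_r_closed_form(r, a_star), K_r_grid(r, a_star))
```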
Passing now to the sum over $\mathcal{B}_k^2(T,T')$, we see that it gives no problems if the common final point $(t,y)$ of $B_k$ and $B_{k-1}$ belongs to $\cup_{j=1}^{k-2} B_j$. If $(t,y)\notin\cup_{j=1}^{k-2} B_j$, then $B_{k-1}^{(1)}$ has a final point $(t_1,y_1)$, with $t_1 < t$. We introduce the set $\hat B_{k-1}^{(1)} = B_{k-1}^{(1)}\cup\{(t,y)\}$, and perform the sum over all sets $(B_1,\dots,B_k)\in\mathcal{B}_k^2(T,T')$ such that the common point of the minimal set $(t,y)$ is fixed. Reasoning as in the previous case we get, up to a constant factor, the expression
$$\frac{t^{\frac{r}{2}+s}}{t^{\nu/2}+1} \prod_{j=1}^{k-2} \varepsilon^{|B_j^{(1)}|} N(B_j^{(1)})\, |g(t_f(B_j^{(1)}), y_f(B_j^{(1)}))|\ \varepsilon^{|B_{k-1}^{(1)}|} N(\hat B_{k-1}^{(1)})\, |y-bt|^r\, t^s. \tag{A.16}$$
Clearly $N(\hat B_{k-1}^{(1)}) = P_0^{t-t_1}(y-y_1)\, N(B_{k-1}^{(1)})$, so that, summing over $y$ we get a factor $H_t(t_1,y_1) = \sum_y P_0^{t-t_1}(y-y_1)\,|y-bt|^r$. Recalling that there are positive constants $k$, $k'$, depending on $r$, such that $|u+v|^r\le k(|u|^r+|v|^r)$, and $\sum_u P_0^t(u)\,|u-bt|^r\le k'\, t^{r/2}$, we see that $H_t(t_1,y_1)\le k\,|y_1-bt_1|^r + kk'\,(t-t_1)^{r/2}$. By Inequality (A.14) we have, for some constants $K'_r$, $K''_r$,
$$N(B_{k-1}^{(1)})\, H_t(t_1,y_1) \le K'_r\,\bigl[(t-t_1)^{r/2} + t_1^{r/2}\bigr]\, M'(\beta_1) \le K''_r\, t^{r/2}\, M''(\beta_1),$$
where $\beta_1 = \beta(B_{k-1}^{(1)})$ and $M''$ is defined as in (A.7), replacing $c_*$ by $\max\{c_*,\bar c_*\}$. Further we have to sum the terms (A.16) over all possible choices of $(B_1^{(1)},\dots,B_{k-1}^{(1)})\in\mathcal{B}^1(t_1,T')$ for which $t_f(B_{k-1}^{(1)}) = t_1$ is fixed, and the set $(B_1^{(2)},\dots,B_{k-2}^{(2)})$ is also fixed. The sum for $t_1$ fixed leads now, repeating the steps that led to (A.10b), to
$$\frac{t^{2(\frac{r}{2}+s)}}{(t^{\nu/2}+1)(t_1^{\nu/2}+1)} \prod_{j=1}^{k-2} \varepsilon^{|B_j^{(2)}|} N(B_j^{(2)})\, |g(t_f(B_j^{(2)}), y_f(B_j^{(2)}))|,$$
again up to a constant factor. Finally we sum over $t_1$ from $0$ to $t$ and get the bound
$$\mathrm{const}\ \frac{t^{2(\frac{r}{2}+s)}}{t^{\nu/2}+1} \sum_{(B_1^{(2)},\dots,B_{k-2}^{(2)})\in\mathcal{B}_{k-2}(T,T')}\ \prod_{j=1}^{k-2} \varepsilon^{|B_j^{(2)}|} N(B_j^{(2)})\, |g(t_f(B_j^{(2)}), y_f(B_j^{(2)}))|,$$
which leads to the analogue of Inequality (A.10c). The case of higher multiplicity is treated in a similar way.
In conclusion we derive some consequences of Inequality (A.12) on convergence or divergence of the right side for large $T'$. We set $q = \frac{r}{2}+s$, and distinguish three cases according to the sign of $p = \frac{\nu}{2} - 2q - 1$.

i) $p > 0$. The terms $t_j^{qn_j}(t_j^{\nu/2}+1)^{-m_j}$ are summable for all possible values of $n_j$. Summing up from $T$ to $T'$ we find
$$\prod_{j=1}^{k}\ \sum_{t_j=T}^{T'-1} \frac{t_j^{qn_j}}{(t_j^{\nu/2}+1)^{m_j}} \le \frac{\mathrm{const}}{T^{(\frac{\nu}{2}-q)\sum_{j=1}^k m_j\, -\, q\sum_{j=1}^k (n_j-m_j)\, -\, k}}. \tag{A.17}$$
Let $k_1 = \mathrm{card}\,\{j : n_j = 1\}$. We have $k_2 = k-k_1\ge 1$ and $\sum_{j=1}^k (n_j-m_j) = k_2$, so that the exponent of $T$ in the right side of (A.17) is equal to
$$e_n(k,k_1) = \Bigl(\frac{\nu}{2}-q\Bigr)(2n-k) - (q+1)k + \frac{\nu}{2}k_1.$$
If $k\le n$, then $0\le k_1\le k-1$, so that
$$e_n(k,k_1)\ge e_n(k,0)\ge e_n(n,0) = np.$$
If $k = n+h$ with $h\ge 1$, then $2h\le k_1\le k-1$, and
$$e_n(n+h,k_1)\ge e_n(n+h,2h) = np + \Bigl(\frac{\nu}{2}-1\Bigr)h \ge np.$$
We conclude that for all $n\ge 1$ if $\varepsilon$ is small enough there is a constant depending on $n$, $\varepsilon$ and $\nu$ such that
$$\bigl\langle (M_{T,T'}(g|\xi))^{2n} \bigr\rangle \le \frac{\mathrm{const}}{T^{np}+1}. \tag{A.18a}$$

ii) $p = 0$. The sums over $t_j$ in the left side of (A.17) diverge for $n_j = 2$, as a logarithm, and converge for $n_j\ne 2$. Hence, reasoning as above we see that
$$\bigl\langle (M_{0,T}(g|\xi))^{2n} \bigr\rangle \le \mathrm{const}\,(\log_+ T)^n. \tag{A.18b}$$
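The exponent bookkeeping of case i) can be checked mechanically. The sketch below, in exact rational arithmetic, compares the exponent read off directly from (A.17), with $m_j = \max\{1, n_j-1\}$, against the closed form $e_n(k,k_1)$, for every composition of $2n$ into $k$ parts. The helper names and the sample values of $\nu$ and $q$ are assumptions made for the illustration.

```python
from fractions import Fraction


def compositions(total, parts):
    """Yield all tuples of `parts` integers >= 1 that sum to `total`."""
    if parts == 1:
        if total >= 1:
            yield (total,)
        return
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def exponent_direct(ns, nu, q):
    """(nu/2 - q) * sum(m_j) - q * sum(n_j - m_j) - k, the exponent of T in (A.17)."""
    ms = [max(1, n - 1) for n in ns]
    k = len(ns)
    return (Fraction(nu, 2) - q) * sum(ms) - q * sum(n - m for n, m in zip(ns, ms)) - k


def e_n(n, k, k1, nu, q):
    """Closed form (nu/2 - q)(2n - k) - (q + 1)k + (nu/2) k1."""
    return (Fraction(nu, 2) - q) * (2 * n - k) - (q + 1) * k + Fraction(nu, 2) * k1


if __name__ == "__main__":
    nu, q = 5, Fraction(1, 2)
    p = Fraction(nu, 2) - 2 * q - 1
    print(e_n(3, 3, 0, nu, q) == 3 * p)
```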
iii) $p = -p_* < 0$. Then on the left side of (A.17) we have divergence as a power $T^{p_*}$ for $n_j = 2$, and may have power-law divergence, logarithmic divergence or convergence for other values of $n_j$. For a fixed choice of $k$, and of the multiplicities $n_j$, summation over the $t_j$'s from $0$ to $T$ gives a divergent factor which is equal to $T$ to the power $\sum^*_j (qn_j - \frac{\nu}{2}m_j + 1)$ (where $\sum^*$ denotes summation only over those $j$ such that $qn_j - \frac{\nu}{2}m_j + 1 > 0$) times some logarithmic factor. Let $k'\le k$ be the number of the $j$'s such that $qn_j - \frac{\nu}{2}m_j + 1 > 0$, and $m = \sum^*_j n_j\le 2n$ the sum of their multiplicities. $k'_1$ will denote the number of the $j$'s for which $n_j = 1$ (if $n_j = 1$ gives divergence). Reasoning as for case i) we see that
$$\sum\nolimits^*_j \Bigl(qn_j - \frac{\nu}{2}m_j + 1\Bigr) = \Bigl(q-\frac{\nu}{2}\Bigr)(m-k') + (q+1)k' - \frac{\nu}{2}k'_1 := e^*_m(k',k'_1).$$
If $m$ is even, $m = 2s$, the same arguments as above give
$$e^*_m(k',k'_1)\le s\Bigl(2q+1-\frac{\nu}{2}\Bigr).$$
If $m = 2s+1$ is odd the argument needs a small modification. If $k'\le s$ then $0\le k'_1\le k'$ and
$$e^*_m(k',k'_1)\le e^*_m(k',0)\le e^*_m(s,0) = s\Bigl(2q+1-\frac{\nu}{2}\Bigr).$$
If $k' = s+h$, $h\ge 1$, then $k'_1\ge 2h-1$, and
$$e^*_m(k',k'_1)\le e^*_m(k',2h-1)\le e^*_m(s+1,0)\le s\Bigl(2q+1-\frac{\nu}{2}\Bigr).$$
Hence the largest possible divergence is for $k' = k = n$, and $n_j = 2$ for all $j$, which gives no logarithmic factors, so that
$$\bigl\langle (M_{0,T}(g|\xi))^{2n} \bigr\rangle \le \mathrm{const}\ T^{p_* n}. \tag{A.18c}$$
References

[Al-Zh] Albeverio, S., Zhou Xian Yin: A martingale approach to Directed Polymers in Random Environment. J. Theoret. Prob. 9, n. 1, 171–189 (1996)
[Bo] Bolthausen, E.: A note on the diffusion of directed polymers in a random environment. Commun. Math. Phys. 123, 529–534 (1989)
[Bo-Mi-Pe] Boldrighini, C., Minlos, R.A., Pellegrinotti, A.: Almost-sure central limit theorem for random walks in dynamical random environment with random corrections. Probability Th. Rel. Fields. In press
[De-Sp] Derrida, B., Spohn, H.: Polymers on disordered trees, spin glasses, and traveling waves. J. Stat. Phys. 51, 817–840 (1988)
[Gi-Sk] Gihman, I.I., Skorohod, A.V.: The Theory of Stochastic Processes, Vol. I. Grundlehren der mathematischen Wissenschaften 210, Berlin-Heidelberg-New York: Springer-Verlag, 1974
[Im-Sp] Imbrie, J., Spencer, T.: Diffusion of directed polymers in a random environment. J. Stat. Phys. 52, 609–626 (1988)
[Ki] Kifer, Yu.: The Burgers Equation with a random force and a general model for directed polymers in random environments. Preprint (1996)
[Ne] Ney, P.: Dominating points and the asymptotics of large deviations for random walk on $\mathbb{R}^d$. Ann. Prob. 11, 158–167 (1983)
[Ro] Rockafellar, R.T.: Convex Analysis. Princeton: Princeton University Press, 1970
[Si] Sinai, Ya.G.: A Remark concerning Random Walks with Random Potentials. Fund. Math. 147, n. 2, 173–180 (1995)
Communicated by J. L. Lebowitz
Commun. Math. Phys. 189, 559 – 575 (1997)
Communications in
Mathematical Physics c Springer-Verlag 1997
Splitting of the Low Landau Levels into a Set of Positive Lebesgue Measure under Small Periodic Perturbations

E.I. Dinaburg¹, Ya.G. Sinai²,³, A.B. Soshnikov²

¹ United Institute of Earth Physics, Russian Academy of Sciences, Moscow, Russia
² Princeton University, Mathematics Department, Fine Hall, Washington Road, Princeton, NJ 08544, USA
³ Landau Institute of Theoretical Physics, Russian Academy of Sciences, Moscow, Russia
Received: 14 October 1996 / Accepted: 27 February 1997
Dedicated to the memory of Roland Dobrushin
Abstract: We study the spectral properties of a two-dimensional Schrödinger operator with a uniform magnetic field and a small external periodic field:
$$L_{\varepsilon_0}(B) = -\frac12\left[\left(\frac{\partial}{\partial x} - iBy\right)^2 + \frac{\partial^2}{\partial y^2}\right] + \varepsilon_0 V(x,y),$$
where $V(x,y) = V_0(y) + \varepsilon_1 V_1(x,y)$, and $\varepsilon_0$, $\varepsilon_1$ are small parameters. Representing $L_{\varepsilon_0}$ as the direct integral of one-dimensional quasi-periodic difference operators with long-range potential and employing recent results of E.I. Dinaburg about Anderson localization for such operators (we assume $2\pi/B$ to be typical irrational) we construct the full set of generalised eigenfunctions for the low Landau bands. We also show that the Lebesgue measure of the low bands is positive and proportional in the main order to $\varepsilon_0$.
1. Introduction

Spectral properties of the Schrödinger operator describing electrons in a magnetic field have received special attention recently in connection with attempts to explain the Quantum Hall Effect ([1]-[8]). D. Thouless et al. in [1] considered a two-dimensional model with constant magnetic field and a small periodic external field. In the Landau gauge it leads to the operator
$$L_{\varepsilon_0}(B) = -\frac12\left[\left(\frac{\partial}{\partial x} - iBy\right)^2 + \frac{\partial^2}{\partial y^2}\right] + \varepsilon_0 V(x,y), \tag{1.1}$$
where $B$ is the value of the magnetic field, $V(x,y)$ is a smooth enough 1-periodic function, and $\varepsilon_0$ is a small parameter. (In [1] the case of external potential $\alpha\cos(2\pi x) + \beta\cos(2\pi\tau y)$ was considered.) If $\varepsilon_0$ equals zero, the spectrum $\sigma(L_0(B))$ of $L_0(B)$ consists of the discrete sequence of numbers
$$\lambda_m = \Bigl(m+\frac12\Bigr)B, \qquad m\in\mathbb{Z}^1_+, \tag{1.2}$$
($\mathbb{Z}^1_+$ is the set of nonnegative integers) called Landau levels ([10]). Each level is infinitely degenerate and the differential operator (1.1) leaves invariant the subspace of functions $\exp(2\pi ipx)\Psi(y)$, $p\in\mathbb{R}^1$, since
$$L_0(B)\bigl(\exp(2\pi ipx)\Psi(y)\bigr) = \exp(2\pi ipx)\cdot\left[-\frac12\frac{d^2}{dy^2} + \frac{B^2}{2}\bigl(y-2\pi B^{-1}p\bigr)^2\right]\Psi(y). \tag{1.3}$$
In other words, if we consider $L^2(\mathbb{R}^2)$ as a direct integral of $L^2(\mathbb{R}^1)$:
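$$L^2(\mathbb{R}^2) = \int^{\oplus}_{(-\infty,\infty)} \tilde H_p\, dp, \qquad \tilde H_p\sim L^2(\mathbb{R}^1),$$
where $\tilde H_p$ consists of the functions $\hat f(p,y)$ given by Fourier transform
$$f(x,y) = \int_{-\infty}^{\infty} e^{2\pi ipx}\hat f(p,y)\, dp,$$
then $L_0(B)$ is equal to the direct integral of shifted harmonic oscillators:
$$L_0 = \int^{\oplus}_{(-\infty,\infty)} \tilde L_{0,p}\, dp, \qquad \tilde L_{0,p} = -\frac12\frac{d^2}{dy^2} + \frac{B^2}{2}\bigl(y-2\pi B^{-1}p\bigr)^2.$$
(For the definition and properties of the direct integral see [11], vol. 1 ch. 2.1, vol. 4 ch. 13.16.) The eigenfunctions of $\tilde L_{0,p}$ are
$$\Bigl\{B^{\frac14}\,\psi_m\bigl(B^{\frac12}(y-2\pi B^{-1}p)\bigr)\Bigr\}, \qquad m\in\mathbb{Z}^1_+, \tag{1.4}$$
where $\psi_m$ are Weber-Hermite functions:
$$\psi_m(y) = \frac{(-1)^m}{\pi^{\frac14}(2^m m!)^{\frac12}}\,\exp\Bigl(\frac{y^2}{2}\Bigr)\frac{d^m}{dy^m}\exp(-y^2).$$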
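The statement that the scaled and shifted Weber-Hermite functions are normalized eigenfunctions of the shifted harmonic oscillator, with eigenvalue $(m+\frac12)B$, can be checked numerically. The sketch below is only an illustration under stated assumptions: the functions are generated by the standard three-term recurrence for normalized Hermite functions (equivalent to the derivative formula above), the second derivative is approximated by central differences, and the function names, step sizes and sample parameters are all hypothetical choices.

```python
import math


def hermite_function(m, y):
    """Normalized Weber-Hermite function of order m, via the three-term recurrence."""
    prev = 0.0
    cur = math.pi ** -0.25 * math.exp(-y * y / 2.0)
    for j in range(m):
        nxt = math.sqrt(2.0 / (j + 1)) * y * cur - math.sqrt(j / (j + 1)) * prev
        prev, cur = cur, nxt
    return cur


def landau_eigenfunction(m, B, p, y):
    """B^(1/4) * psi_m(B^(1/2) * (y - 2*pi*p/B)), as in (1.4)."""
    y0 = 2.0 * math.pi * p / B
    return B ** 0.25 * hermite_function(m, math.sqrt(B) * (y - y0))


def residual(m, B, p, y, h=1e-3):
    """(-1/2 f'' + (B^2/2)(y - y0)^2 f - (m + 1/2) B f)(y), with f'' by central differences."""
    f = lambda t: landau_eigenfunction(m, B, p, t)
    y0 = 2.0 * math.pi * p / B
    second = (f(y + h) - 2.0 * f(y) + f(y - h)) / (h * h)
    return -0.5 * second + 0.5 * B * B * (y - y0) ** 2 * f(y) - (m + 0.5) * B * f(y)


def norm_squared(m, B, p, steps=4000):
    """Riemann-sum approximation of the integral of f^2 over a window around y0."""
    y0 = 2.0 * math.pi * p / B
    half_width = 12.0 / math.sqrt(B)
    h = 2.0 * half_width / steps
    return sum(landau_eigenfunction(m, B, p, y0 - half_width + i * h) ** 2
               for i in range(steps + 1)) * h


if __name__ == "__main__":
    print(residual(2, 1.5, 0.1, 0.7), norm_squared(3, 2.0, 0.25))
```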
The eigensubspace corresponding to the $m$th Landau level is denoted by $E_0^{(m)}$. It follows from the general theory of perturbations that for $\varepsilon_0\ll 1$ the operator $L_{\varepsilon_0}(B)$ has invariant subspaces $E_{\varepsilon_0}^{(m)}$, close to $E_0^{(m)}$. In this paper we will study the spectrum of the restriction of $L_{\varepsilon_0}(B)$ to $E_{\varepsilon_0}^{(m)}$, $m < M(\varepsilon_0, B, V(x,y))$. If the external potential $V(x,y)$ depends only on $y$, $L_{\varepsilon_0}$ is still periodic in $x$ and exhibits localization in the $y$ direction:
$$L_{\varepsilon_0}(B)\bigl(\exp(2\pi ipx)\Psi(y)\bigr) = \exp(2\pi ipx)\cdot\left[-\frac12\frac{d^2}{dy^2} + \frac{B^2}{2}\bigl(y-2\pi B^{-1}p\bigr)^2 + \varepsilon_0 V(y)\right]\Psi(y).$$
Under such a periodic perturbation each Landau level $\lambda_m$ transforms into some interval of length $\mathrm{const}_m\,\varepsilon_0$ located in an $O(\varepsilon_0)$ neighborhood of $\lambda_m$. The band function, which we denote by $\Lambda^{(m)}(p)$, is the $m$th eigenvalue of a quantum Hamiltonian
$$-\frac12\frac{d^2}{dy^2} + \frac{B^2}{2}y^2 + \varepsilon_0 V(y+2\pi B^{-1}p).$$
The corresponding eigenfunctions are decaying superexponentially in the $y$ direction. The function $\Lambda^{(m)}$ will be as smooth as we want (or even analytic) if we assume the smoothness (analyticity) condition on $V$ (see [11] vol. 4 ch. 12, [24]). The aim of our paper is to extend the study of the low Landau bands to the case
$$V(x,y) = V_0(y) + \varepsilon_1 V_1(x,y), \tag{1.5}$$
where $\varepsilon_1$ is a small parameter. We assume that $V_0$, $V_1$ are smooth enough:
(C) $V_0(y)\in C^6(S^1)$; $V_1$ has continuous derivatives $\frac{\partial^i V_1}{\partial y^i}$, $i\le 7$, in the cube $\{x : |\mathrm{Im}\, x| < \delta\}\times\{y : 0\le y\le 1\}$; $\frac{\partial^7 V_1}{\partial y^7}(x,y)$ is analytic in the strip $|\mathrm{Im}\, x| < \delta$ for any fixed $y$.

Here $\delta$ is some positive number, $\varepsilon_1\ll 1$. Some of our results are valid under stronger conditions on the smoothness of $V_0$, $V_1$:

(C$^*$) $V_0\in C^\infty(S^1)$, $V_1\in C^\infty(T^2)$; all derivatives $\frac{\partial^i V_1}{\partial y^i}$ are analytic in the strip $|\mathrm{Im}\, x| < \delta$.
The spectrum of (1.1) depends on the arithmetic nature of $\omega = 2\pi B^{-1}$. The case of rational $\omega$ was fully investigated by S.P. Novikov ([12]) and B.A. Dubrovin, S.P. Novikov ([13],[14]). We study below the case of typical (Diophantine) irrational $\omega$; i.e.
$$|\{n\cdot\omega\}| > \frac{C}{|n|^\kappa}, \qquad n\in\mathbb{Z}^1\setminus 0, \tag{D}$$
for some constants $C > 0$, $\kappa > 1$. Below we represent the restriction of $L_{\varepsilon_0}(B)$ to $E_{\varepsilon_0}^{(m)}$ as the direct integral of difference operators on the lattice with quasi-periodic coefficients, which allows us to apply known results about Anderson localization for such operators (see [15–18]). We are able to construct the full family of generalized eigenfunctions $\{\Phi_q\}_{q\in\mathbb{R}^1}$ for the low Landau levels. The corresponding band functions $\Lambda^{(m)}$ are $\varepsilon_0^2$ close to the band functions of the $x$-periodic operator obtained by setting $\varepsilon_1 = 0$. For $\varepsilon_1\ne 0$ we prove polynomial localization in the $y$ direction. We formulate our main results in Sect. 2 (Theorem 3). Proposition 1 and Theorem 2 are of more auxiliary nature.
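To get a feeling for the Diophantine condition (D), the sketch below computes, for a given frequency, the smallest value of $|n|^\kappa$ times the distance from $n\omega$ to the nearest integer over a finite range of $n$. Two assumptions are made loudly here: the quantity $|\{n\cdot\omega\}|$ is read as the distance to the nearest integer, and a finite scan can only suggest, never establish, that (D) holds. The function names and test frequencies are hypothetical.

```python
import math


def dist_to_integers(x):
    """Distance from x to the nearest integer (assumed reading of |{x}| in (D))."""
    return abs(x - round(x))


def diophantine_constant(omega, kappa, n_max):
    """min over 1 <= n <= n_max of n^kappa * dist(n * omega, Z)."""
    return min(n ** kappa * dist_to_integers(n * omega) for n in range(1, n_max + 1))


if __name__ == "__main__":
    golden = (math.sqrt(5.0) - 1.0) / 2.0
    # The golden mean is badly approximable: the minimum stays bounded away from 0.
    print(diophantine_constant(golden, 1.0, 5000))
    # A rational frequency violates (D) for every C > 0: the minimum is 0.
    print(diophantine_constant(0.5, 1.0, 10))
```

Since the golden mean satisfies the bound already with exponent $1$, it satisfies it for any $\kappa > 1$ with the same constant.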
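2. Formulation of the Main Results

If $\varepsilon_1\ne 0$ the differential operator (1.1) no longer leaves invariant the subspace of functions $\exp(2\pi ipx)\Psi(y)$. Nevertheless the image of any linear combination
$$\sum_n \exp(2\pi i(p+n)x)\,\Psi_n(y) \tag{2.1}$$
is again a function of this type. Choosing in the space of functions (2.1) the basis
$$\Bigl\{\exp(2\pi i(p+n)x)\, B^{\frac14}\,\psi_m\bigl(B^{\frac12}(y-(p+n)\omega)\bigr)\Bigr\},$$
where the double index $(m,n)$ runs through $\mathbb{Z}^1_+\times\mathbb{Z}^1$, we arrive at

Proposition 1. The Hilbert space $L^2(\mathbb{R}^2)$ can be represented as a direct integral of $l^2(\mathbb{Z}^1_+\times\mathbb{Z}^1)$
$$L^2(\mathbb{R}^2) = \int^{\oplus}_{[0,\omega]} H_p\, dp, \qquad H_p\sim l^2(\mathbb{Z}^1_+\times\mathbb{Z}^1),$$
such that the Schrödinger operator (1.1) equals the direct integral of difference operators $L_{\varepsilon_0,p}$ acting on $l^2(\mathbb{Z}^1_+\times\mathbb{Z}^1)$:
$$L_{\varepsilon_0}(B) = \int^{\oplus}_{[0,\omega]} L_{\varepsilon_0,p}\, dp,$$
where for $h(m,n)\in l^2(\mathbb{Z}^1_+\times\mathbb{Z}^1)$,
$$\bigl(L_{\varepsilon_0,p}h\bigr)(m,n) = \Bigl[\Bigl(m+\frac12\Bigr)B + \varepsilon_0 V_{m,m}(p+n\omega)\Bigr]h(m,n) + \varepsilon_0\sum_{m_1\ne m,\ m_1\ge 0} V_{m_1,m}(p+n\omega)\,h(m_1,n) + \varepsilon_0\varepsilon_1\sum_{k=-\infty}^{\infty}\ \sum_{m_1=0}^{\infty} W^{(k)}_{m_1,m}(p+n\omega)\,h(m_1,n-k). \tag{2.2}$$
In these expressions
$$V_{m_1,m}(\alpha) = B^{\frac12}\int_{-\infty}^{\infty} V_0(y+\alpha)\,\psi_{m_1}(B^{\frac12}y)\,\psi_m(B^{\frac12}y)\, dy, \tag{2.3}$$
$$W^{(k)}_{m_1,m}(\alpha) = B^{\frac12}\int_{-\infty}^{\infty} V_1^{(k)}(y+\alpha)\,\psi_{m_1}\bigl(B^{\frac12}(y+k\omega)\bigr)\,\psi_m(B^{\frac12}y)\, dy, \tag{2.4}$$
$$V_1(x,y) = \sum_{k=-\infty}^{\infty} \exp(2\pi ikx)\,V_1^{(k)}(y). \tag{2.5}$$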
The subspace corresponding to the $m$th Landau level is generated by the vectors $\{\delta_{m,m_1}\delta_{n,n_1}\}_{n\in\mathbb{Z}^1}$. If we denote the projection to it by $P^{(m)}_{0,p}$ then, as was first observed by D. Hofstadter [9],
$$\bigl(P^{(m)}_{0,p}\, L_{\varepsilon_0,p}\, P^{(m)}_{0,p}\, h\bigr)(m,n) = \Bigl[\Bigl(m+\frac12\Bigr)B + \varepsilon_0 V_{m,m}(p+n\omega)\Bigr]h(m,n) + \varepsilon_0\varepsilon_1\sum_k W^{(k)}_{m,m}(p+n\omega)\,h(m,n-k) \tag{2.6}$$
is the one-dimensional difference operator with exponentially decaying quasiperiodic coefficients. (If $V(x,y) = \alpha\cos(2\pi y) + \beta\cos(2\pi\tau x)$, (2.6) is just the Almost Mathieu operator.) It turns out that one can find such a unitary operator $U(p)$ that the restrictions of $U(p)^{-1}L_{\varepsilon_0,p}U(p)$ to the invariant subspaces $E^{(m)}_{\varepsilon_0,p}$, $m\le M(B,\varepsilon_0,V(x,y))$, have the form similar to the r.h.s. of (2.6). This is the main result of Theorem 2.

Theorem 2. Assume that the parameters $\varepsilon_0$, $\varepsilon_1$ are small enough. Then there exists an integer $M = M(B,\varepsilon_0,V)$ tending to $\infty$ as $\varepsilon_0,\varepsilon_1\to 0$ so that for any $m\le M$ the restriction of $L_{\varepsilon_0}(B)$ to $E^{(m)}_{\varepsilon_0}$ is the direct integral of one-dimensional difference operators with exponentially decaying coefficients:
$$L_{\varepsilon_0}(B)\big|_{E^{(m)}_{\varepsilon_0}} = \int^{\oplus}_{[0,\omega]} L^{(m)}_{\varepsilon_0}(p)\, dp,$$
where for $\varphi\in l^2(\mathbb{Z}^1)$,
$$\bigl(L^{(m)}_{\varepsilon_0}(p)\varphi\bigr)(n) = d_m(p+n\omega)\,\varphi(n) + \sum_{k\ne n} a_m(n-k,\, p+n\omega)\,\varphi(k), \tag{2.7}$$
$$\Bigl\| d_m - \Bigl(m+\frac12\Bigr)B - \varepsilon_0 V_{m,m} \Bigr\|_{C^2(S^1)} < \mathrm{const}_1\,\varepsilon_0^2, \tag{2.8}$$
$$\sum_{k\ne 0} \| a_m(k,p) \|_{C^2(S^1)}\, e^{\frac{2\delta}{3}|k|} < \mathrm{const}_2\,\varepsilon_1\varepsilon_0. \tag{2.9}$$
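To make the structure of (2.7) concrete, the sketch below assembles a finite truncation of such an operator: row $n$ carries the diagonal entry $d(p+n\omega)$ and off-diagonal entries $a(n-k,\, p+n\omega)$. Everything specific here is an assumption for illustration: the names `truncated_operator`, `d_example` and `a_example`, the particular cosine diagonal and exponentially decaying hopping, the reduction of the argument modulo 1, and the cutoffs `N` and `k_max`; none of them are the functions $d_m$, $a_m$ constructed in the paper.

```python
import math


def truncated_operator(d, a, p, omega, N, k_max):
    """Matrix of an operator of the form (2.7) on sites -N..N, keeping |n - k| <= k_max."""
    sites = range(-N, N + 1)
    matrix = []
    for n in sites:
        phase = (p + n * omega) % 1.0  # coefficients are assumed 1-periodic
        row = []
        for k in sites:
            if k == n:
                row.append(d(phase))
            elif abs(n - k) <= k_max:
                row.append(a(n - k, phase))
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix


def d_example(x):
    """Hypothetical diagonal term: a Morse function on the circle with two critical points."""
    return 0.5 + 0.1 * math.cos(2.0 * math.pi * x)


def a_example(k, x):
    """Hypothetical off-diagonal term, exponentially decaying in |k| and independent of x."""
    return 0.01 * math.exp(-abs(k))


if __name__ == "__main__":
    omega = (math.sqrt(5.0) - 1.0) / 2.0
    L = truncated_operator(d_example, a_example, 0.2, omega, 3, 2)
    for row in L:
        print(["%.4f" % v for v in row])
```

With an off-diagonal term that is even in $k$ and independent of the phase, as in this toy choice, the truncated matrix is symmetric.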
The proof of Theorem 2 uses standard methods of perturbation theory. However, some nontrivial details due to the special form of the operator $L_{\varepsilon_0,p}$ remain. The proof is given in Sects. 3, 4.

The family $L^{(m)}_{\varepsilon_0}(p)$ is an ergodic family of operators in the sense of [18], associated with the dynamical system $(S^1, T_\omega, l)$ and defined by some function $h^{(m)}(n,p)$. Here $S^1$ is the unit circle, $T_\omega$ is the rotation $x\to(x+\omega)\bmod 1$, $l$ is the Lebesgue measure and $h^{(m)}(n,p)$ is a function of two variables $n\in\mathbb{Z}^1$, $p\in S^1$, such that $h^{(m)}(n,p) = d_m(p)$ for $n = 0$, $h^{(m)}(n,p) = a_m(n,p)$ for $n\ne 0$. The matrix elements of $L^{(m)}_{\varepsilon_0}(p)$ in the natural basis are given by the formula
$$L^{(m)}_{\varepsilon_0}(p)_{kl} = h^{(m)}(l-k,\, T_\omega^k p).$$
It follows from Theorem 2 that if $d^{(m)}(p)$ is a Morse function on the unit circle, having two critical points, the family $L^{(m)}_{\varepsilon_0}(p)$ satisfies the conditions of the main theorem from
[18]. For completeness we formulate this theorem below. Let $h(n,p)\in C^2(S^1)$ for any $n\in\mathbb{Z}^1$; $h(0,p)$ be a Morse function with two critical points,
$$\sum_{n\ne 0} \|h(n,p)\|_{C^2(S^1)}\, e^{\rho|n|} < \varepsilon \quad\text{for some } \rho > 0,$$
and $\omega$ satisfy the Diophantine condition (D) with constants $C$, $\kappa$.

Theorem ([18]). One can find $\bar\varepsilon = \bar\varepsilon(C,\kappa,h(0,p),\rho)$ so that for any $|\varepsilon| < \bar\varepsilon$,
a) For a.e. $p$ (with respect to Lebesgue measure), the spectrum of $L^{(m)}_{\varepsilon_0}(p)$ is pure point, its eigenvalues coincide with the values of some function $\Lambda(p)\in L^\infty(S^1)$ along the trajectory $\{T_\omega^n p\}_{n\in\mathbb{Z}^1}$ of the point $p$.
b) The corresponding eigenfunctions decay exponentially. They can be constructed with the help of a function $\Psi(n,p)$, measurable for any $n\in\mathbb{Z}^1$, such that for a.e. $p$, $\sum_n |\Psi(n,p)|\, e^{\frac{\rho}{2}|n|} < \infty$. Then the set of eigenfunctions $\{\varphi_k\}_{k=-\infty}^{+\infty}$ is given by the formula $\varphi_k(n) = \Psi(n-k,\, p+k\omega)$.
c) The spectrum is nondegenerate.
d) The spectrum as a set (i.e. the closure of the set of eigenvalues) has a positive Lebesgue measure, greater than $l(\mathrm{Ran}(h(0,p))) - \mathrm{const}\cdot\varepsilon^\sigma$, $\sigma > 0$.

Combining this result and the statement of Theorem 2 we arrive at
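Theorem 3. Let
$$V_{m,m}(p) = B^{1/2}\int_{-\infty}^{\infty} V_0(y+p)\,\psi_m^2(B^{1/2}y)\,dy, \qquad m \le M(B,\varepsilon_0,V),$$
be a Morse function with two critical points and let $\omega = 2\pi B^{-1}$ satisfy (D). Then there exist positive constants $\bar\varepsilon_0$, $\bar\varepsilon_1(B, V_0, V_1, C, \kappa, m)$ such that for $|\varepsilon_0| < \bar\varepsilon_0$, $|\varepsilon_1| < \bar\varepsilon_1$ the following statements hold:

(i) For any fixed $s \in \mathbb{Z}^1_+$, $n \in \mathbb{Z}^1$ there exist functions $c^{(m)}(s,n;p)$, 1-periodic and measurable in $p$, and 1-periodic measurable functions $\Lambda^{(m)}(p)$ such that
$$\bigl\| \Lambda^{(m)}(p) - \bigl((m+\tfrac12)B + \varepsilon_0 V_{m,m}(p)\bigr) \bigr\|_{L^\infty(S^1)} < \mathrm{const}\cdot\varepsilon_0^2;$$
for a.e. $p \in [0,1]$
$$\sum_{s,n} \bigl| c^{(m)}(s,n;p) \bigr|\,(s^2+1)\,e^{\frac{\delta}{3}|n|} < \infty; \qquad (2.10)$$
and for every $k \in \mathbb{Z}^1$, a.e. $p \in [0,\omega]$ the series
$$\Phi^{(m)}_{p,k}(x,y) = \sum_{s,n} c^{(m)}(s,\,n-k,\,p+k\omega)\, e^{2\pi i(\frac{p}{\omega}+n)x}\cdot \psi_s\bigl(B^{1/2}(y-p-n\omega)\bigr) \qquad (2.11)$$
and their first and second derivatives converge uniformly in $x$, $y$, giving generalized eigenfunctions of $L_{\varepsilon_0}(B)$ with the eigenvalues $\Lambda^{(m)}(p+k\omega)$:
$$L_{\varepsilon_0}(B)\,\Phi^{(m)}_{p,k}(x,y) = \Lambda^{(m)}(p+k\omega)\,\Phi^{(m)}_{p,k}(x,y).$$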
The constructed functions $\Phi^{(m)}_{p,k}(x,y)$ are infinitely differentiable in $x$; $\Phi^{(m)}_{p,k}(x+1,y) = e^{ipB}\,\Phi^{(m)}_{p,k}(x,y)$, and they decay at infinity in $y$ at least as $\frac{1}{y^2+1}$:
$$\bigl| \Phi^{(m)}_{p,k}(x,y) \bigr| \le \frac{\mathrm{const}(p,k,m)}{y^2+1}. \qquad (2.12)$$
If the functions $V_0$, $V_1$ satisfy the stronger condition (C*), then for any integer $N > 0$ one can find $\bar\varepsilon_0$, $\bar\varepsilon_1$ so small (depending on $N$) that for a.e. $p$ the functions $\Phi^{(m)}_{p,k}(x,y)$ are infinitely differentiable in $x$ and $y$ and
$$\bigl| \Phi^{(m)}_{p,k}(x,y) \bigr| \le \frac{\mathrm{const}(p,k,m)}{y^{2N}+1}. \qquad (2.13)$$
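(ii) One can construct a full family of eigenfunctions, defining them for any real parameter $q \in \mathbb{R}^1$ by the formula
$$\Phi^{(m)}_q(x,y) = \Phi^{(m)}_{\{\frac{q}{\omega}\}\omega,\,[\frac{q}{\omega}]}(x,y),$$
so that for any $f(x,y)$ from the Schwartz space $J(\mathbb{R}^2)$ the Plancherel formula holds:
$$\bigl\| P^{(m)}_{\varepsilon_0} f \bigr\|^2 = \int_{-\infty}^{\infty} |g_f(q)|^2\,dq,$$
where $P^{(m)}_{\varepsilon_0}$ is the projection to $E^{(m)}_{\varepsilon_0}$ and
$$g_f(q) = \int_{\mathbb{R}^2} f(x,y)\,\Phi_q(x,y)\,dx\,dy.$$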
(iii) The restriction of $L_{\varepsilon_0}(B)$ to $E^{(m)}_{\varepsilon_0}$ is unitarily equivalent to the multiplication operator on $L^2(\mathbb{R}^1)$ with the multiplication function $\Lambda^{(m)}$; the Lebesgue measure of the $m$th Landau band equals $\varepsilon_0\, l(\mathrm{Ran}(V_{m,m})) + o(\varepsilon_0)$.

Remark. The nature of the spectrum clearly depends on the type of the distribution function of $\Lambda^{(m)}$. For the Almost Mathieu operator the distribution function of $\Lambda^{(m)}$ is known to be absolutely continuous (see [15]). However, for the general quasi-periodic operators with long-range potential, studied in [18], this is still an open question.

Remark. The condition on $V_{m,m}(p)$ formulated in Theorem 3 is satisfied, for example, by $V_0(y) = \cos(2\pi y)$.

Remark. B. Helffer and J. Sjöstrand applied in [19, 20] the semiclassical analysis of the Almost Mathieu equation to the case of the Schrödinger operator with a strong symmetric external field ($\varepsilon_0 \gg 1$). They showed (under some conditions on the continued fraction expansion of $2\pi/B$) that in the neighborhood of the first eigenvalue of the approximating Hamiltonian with a quadratic potential, the spectrum of $L$ is a Cantor set of zero Lebesgue measure.

We will discuss Theorem 3 in more detail in Sect. 5.
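For the example $V_0(y) = \cos(2\pi y)$ mentioned in the remark above, the function $V_{m,m}$ can be written out explicitly. The short computation below is a sketch under assumptions: $\psi_m$ denotes the real, $L^2$-normalized Weber-Hermite functions (the symbol is our notation), $L_m$ is the $m$th Laguerre polynomial, and the integral $I(m,m;b)$ is the one evaluated in Lemma 6, formula (4.8), taken with $\bar m = m$.

```latex
\begin{aligned}
V_{m,m}(p)
  &= B^{1/2}\int_{-\infty}^{\infty}\cos\bigl(2\pi(y+p)\bigr)\,\psi_m^2(B^{1/2}y)\,dy
   = \operatorname{Re}\Bigl[e^{2\pi i p}\int_{-\infty}^{\infty} e^{ibu}\,\psi_m^2(u)\,du\Bigr],
   \qquad b = 2\pi B^{-1/2},\\
  &= \operatorname{Re}\bigl[e^{2\pi i p}\,I(m,m;b)\bigr],
   \qquad
   I(m,m;b) = e^{-b^2/4}\sum_{l=0}^{m}(-1)^l\binom{m}{l}\frac{1}{l!}\Bigl(\frac{b^2}{2}\Bigr)^{l}
            = e^{-b^2/4}\,L_m\!\Bigl(\frac{b^2}{2}\Bigr),\\
  &= e^{-\pi^2/B}\,L_m\!\Bigl(\frac{2\pi^2}{B}\Bigr)\cos(2\pi p).
\end{aligned}
```

Under these assumptions $V_{m,m}$ is a nonzero multiple of $\cos(2\pi p)$, hence a Morse function with two critical points on the circle, whenever $L_m(2\pi^2/B) \neq 0$; for $m = 0$ this always holds since $L_0 \equiv 1$.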
3. Reduction of the Matrix Representation of $L_{\varepsilon_0,p}$ in the Neighborhood of the Low Landau Levels to the Special Block Type

We will prove Theorem 2 in the case of the lowest Landau level. The generalization to the case of a few Landau levels is straightforward. Let us write the matrix of $L_{\varepsilon_0,p}$ in the block form
$$\begin{pmatrix} (0,0) & (0,1) & (0,2) & (0,3) & \cdots \\ (1,0) & (1,1) & (1,2) & \cdots & \cdots \\ (2,0) & (2,1) & \cdots & \cdots & \cdots \\ \cdots & \cdots & \cdots & \cdots & \cdots \end{pmatrix} \qquad (3.1)$$
which consists of a countable number of blocks, enumerated by the double index $(m, m_1)$, $m \in \mathbb{Z}^1_+$, $m_1 \in \mathbb{Z}^1_+$. Each block is infinite-dimensional and its matrix elements correspond to the interaction between the $m$th and $m_1$th Landau levels. In this special representation we are looking for a unitary operator $U(p)$ such that the matrix $U(p)^{-1} L_{\varepsilon_0,p} U(p)$ has zero non-diagonal blocks $(0,m_1)$, $(m_1,0)$ for $m_1 \neq 0$, and the block $(0,0)$ is given by an operator of the type (2.7)–(2.9). We represent the matrix of $L_{\varepsilon_0,p}$ as $L = D_{(1)} + A^{(1)}_{(1)} + A^{(2)}_{(1)}$, where $D_{(1)}$ is the diagonal part,
$$D_{(1)}(m,n;m_1,n_1) = \delta_{m,m_1}\delta_{n,n_1} L(m,n;m_1,n_1),$$
$A^{(2)}_{(1)}$ corresponds to the interaction of the zero Landau level with the other levels,
$$A^{(2)}_{(1)}(m,n;m_1,n_1) = \delta_{0,m}(1-\delta_{0,m_1}) L(m,n;m_1,n_1) + (1-\delta_{0,m})\delta_{0,m_1} L(m,n;m_1,n_1),$$
and $A^{(1)}_{(1)} = L - D_{(1)} - A^{(2)}_{(1)}$. We can write the conditions on $U = e^{iW}$ as
$$\left.\begin{aligned} &\bigl(e^{-iW} L e^{iW}\bigr)(0,n;m_1,n_1;p) = 0 \quad \text{if } m_1 > 0,\\ &\sum_{n_1} \bigl\| \bigl(e^{-iW} L e^{iW}\bigr)(0,n;0,n_1;p) \bigr\|_{C^2(S^1)}\, e^{\frac{2}{3}\delta|n_1-n|} < +\infty. \end{aligned}\right\} \qquad (3.2)$$
We also require $W(p)$ to be an ergodic family of operators: $W(m,n;m_1,n_1;p) = W(m,0;m_1,n_1-n;p+n\omega)$. To define $W$ we use the well known formula
$$e^{-iW} L e^{iW} = L + \sum_{k=1}^{\infty} \frac{i^k}{k!}\,\underbrace{[\cdots[L,W],\cdots W]}_{k\ \text{times}}.$$
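The commutator expansion above can be checked numerically on small matrices. The sketch below is only an illustration: the matrix size, the random Hermitian matrices standing in for $L$ and $W$, and the truncation order are all assumptions chosen for the test, not objects from the paper.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def random_hermitian(n, scale):
    """Random Hermitian matrix with entries of typical size `scale`."""
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return scale * (a + a.conj().T) / 2.0

def exp_i(W, sign=1.0):
    """exp(sign * i * W) for Hermitian W, via its eigendecomposition."""
    lam, V = np.linalg.eigh(W)
    return (V * np.exp(sign * 1j * lam)) @ V.conj().T

def commutator_series(L, W, order):
    """Partial sum L + sum_{k=1}^{order} i^k / k! [...[L, W], ..., W]."""
    total = L.copy()
    nested = L.copy()
    for k in range(1, order + 1):
        nested = nested @ W - W @ nested          # one more commutator with W
        total = total + (1j ** k) / math.factorial(k) * nested
    return total

n = 5
L = random_hermitian(n, 1.0)
W = random_hermitian(n, 0.1)

exact = exp_i(W, -1.0) @ L @ exp_i(W, +1.0)      # e^{-iW} L e^{iW}
approx = commutator_series(L, W, order=30)
series_error = np.max(np.abs(exact - approx))
print(series_error)
```

For a small $W$ the series converges quickly, which is the regime used in the perturbative construction.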
In the first approximation $W_{(1)}$ is the solution of the equation
$$i\,[D_{(1)}, W_{(1)}] = -A^{(2)}_{(1)}, \qquad (3.3)$$
i.e. for $m_1 \le m_2$
$$\left.\begin{aligned} &W_{(1)}(m_1,n_1;m_2,n_2) = 0 \quad \text{if } m_1 > 0,\\ &W_{(1)}(0,n_1;m_2,n_2) = i\,\frac{A^{(2)}_{(1)}(0,n_1;m_2,n_2)}{D(0,n_1;0,n_1) - D(m_2,n_2;m_2,n_2)}. \end{aligned}\right\} \qquad (3.4)$$
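A finite-dimensional toy model shows what one such step achieves. The sketch below is an assumption-laden illustration, not the infinite-dimensional operator of the paper: a $6\times 6$ Hermitian matrix with a two-dimensional "level 0" block, unperturbed levels $m + \tfrac12$ (that is, $B = 1$), and a fixed random perturbation. It builds $W$ from the formula $W = iA^{(2)}/(D_a - D_b)$ on the coupling entries and compares the size of the coupling block before and after conjugation.

```python
import numpy as np

rng = np.random.default_rng(1)

def exp_i(W, sign=1.0):
    """exp(sign * i * W) for Hermitian W, via its eigendecomposition."""
    lam, V = np.linalg.eigh(W)
    return (V * np.exp(sign * 1j * lam)) @ V.conj().T

# Toy data (assumed): level index of each basis vector, two vectors in level 0.
level = np.array([0, 0, 1, 2, 3, 4])
n = level.size
D0 = np.diag(level + 0.5)                          # unperturbed levels (m + 1/2)
G = rng.uniform(-1, 1, (n, n)) + 1j * rng.uniform(-1, 1, (n, n))
V = (G + G.conj().T) / 2.0                         # fixed Hermitian perturbation

is0 = level == 0
mask = is0[:, None] != is0[None, :]                # entries coupling level 0 to the rest

def one_step(eps):
    """Return the coupling norm before and after one transformation step."""
    L = D0 + eps * V
    d = np.real(np.diag(L))                        # diagonal part D
    A2 = np.where(mask, L, 0.0)                    # coupling part A^(2)
    denom = d[:, None] - d[None, :]
    W = np.zeros_like(L)
    W[mask] = 1j * A2[mask] / denom[mask]          # solves i[D, W] = -A^(2)
    L2 = exp_i(W, -1.0) @ L @ exp_i(W, +1.0)
    return np.linalg.norm(A2), np.linalg.norm(L2[mask])

before, after = one_step(1e-4)
ratio = one_step(1e-3)[1] / after                  # should be close to 100
print(before, after, ratio)
```

The denominators in the toy model stay close to the level spacing, so no small denominators occur; the remaining coupling scales like the square of the perturbation size, in line with the statement that $A^{(2)}_{(2)}$ has norm of order $\varepsilon_0^2$.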
Then for $L_{(2)} = e^{-iW_{(1)}} L e^{iW_{(1)}}$ we obtain the analogous representation $L_{(2)} = D_{(2)} + A^{(1)}_{(2)} + A^{(2)}_{(2)}$, in which $A^{(2)}_{(2)}$ has a norm of order $\varepsilon_0^2$. In the same way we can find the next approximation by solving the equation $i\,[D_{(2)}, W_{(2)}] = -A^{(2)}_{(2)}$, and so on. It is clear that
$$\bigl| D_{(1)}(0,n_1;0,n_1) - D_{(1)}(m_2,n_2;m_2,n_2) \bigr| > \frac{B}{2}\,m_2 - \varepsilon_0\bigl( \|V_{0,0}\|_{C^2(S^1)} + \|V_{m_2,m_2}\|_{C^2(S^1)} \bigr) > \frac{B}{4}\,m_2$$
if $m_2 > 0$ and $\varepsilon_0$ is small enough. The same inequality for $D_{(s)}$ immediately follows from the inductive assumptions ($I_s$–$III_s$) (see below). It means that small denominators do not appear at any step of our inductive procedure and the standard perturbation theory can be applied. The most convenient way to formulate the inductive hypotheses is to use the functions $l(m_1,m_2;n,p) := L(m_1,0;m_2,n;p)$, such that $L(m_1,n_1;m_2,n_2;p) = l(m_1,m_2;n_2-n_1;p+n_1\omega)$. Note also that the product of two ergodic operators $B$, $C$ corresponds to the convolution of the functions $b$, $c$:
$$(b\cdot c)(m_1,m_2;n_1,p) = \sum_{m,n} b(m_1,m;n,p)\cdot c(m,m_2;n_1-n;p+n\omega).$$
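Now we are ready to formulate the inductive assumptions at the $s$th step of induction:

($I_{s,l}$)
a) $$\sum_{m_1=0}^{\infty} \bigl\| a^{(1)}_{(s)}(m,m_1;0;p) \bigr\|_{C^2(S^1)} \cdot (m_1^l+1) \le (m^{l+1}+1)\,\delta_{(s)},$$
b) $$\sum_{m_1=0}^{\infty} \sum_{n\neq 0} \bigl\| a^{(1)}_{(s)}(m,m_1;n;p) \bigr\|_{C^2(S^1)} \cdot e^{\frac{2}{3}\delta|n|} \cdot (m_1^l+1) \le \varepsilon_1 (m^{l+1}+1)\,\delta_{(s)}.$$

($II_{s,l}$)
a) $$\sum_{m_1} \bigl\| a^{(2)}_{(s)}(0,m_1;0;p) \bigr\|_{C^2(S^1)} \cdot (m_1^l+1) < \varepsilon_{(s)},$$
b) $$\sum_{m_1} \sum_{n\neq 0} \bigl\| a^{(2)}_{(s)}(0,m_1;n;p) \bigr\|_{C^2(S^1)} \cdot e^{\frac{2}{3}\delta|n|} \cdot (m_1^l+1) \le \varepsilon_1 \varepsilon_{(s)}.$$

($III_{s,l}$)
$$\bigl\| d_{(s)}(m,m;0;p) - (m+\tfrac12)B - \varepsilon_0 V_{m,m}(p) \bigr\|_{C^2(S^1)} \le \varepsilon_0\,\delta_{(s)}.$$

We will see later that there exist some constants $\mathrm{const}_{3,l}(V_0,V_1,B)$ and $\mathrm{const}_{4,l}(V_0,V_1,B)$ such that
$$0 < \delta_{(s)} < \mathrm{const}_{3,l}\cdot\varepsilon_0, \qquad 0 < \varepsilon_{(s)} < \mathrm{const}_{4,l}\cdot\varepsilon_0^{\,s}.$$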
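These bounds are consistent with the recursion stated in Lemma 5 below, $\varepsilon_{(s+1)} = \varepsilon_{(s)}\cdot\mathrm{const}_{7,l}\cdot\delta_{(s)}$ and $\delta_{(s+1)} = \delta_{(s)}(1+\mathrm{const}_{6,l}\,\varepsilon_{(s)})$, started from $\varepsilon_{(1)} = \delta_{(1)} = \mathrm{const}_{5,l}\cdot\varepsilon_0$. A minimal sketch of that recursion follows; the numerical values of the constants (`c5`, `c6`, `c7`, all set to 1) and of `eps0` are hypothetical and serve only to show the qualitative behaviour.

```python
def iterate_bounds(eps0, c5=1.0, c6=1.0, c7=1.0, steps=12):
    """Iterate eps_(s+1) = eps_(s) * c7 * delta_(s) and
    delta_(s+1) = delta_(s) * (1 + c6 * eps_(s)), starting from
    eps_(1) = delta_(1) = c5 * eps0 (constants are illustrative)."""
    eps_s = delta_s = c5 * eps0
    history = [(eps_s, delta_s)]
    for _ in range(steps - 1):
        # Both updates use the values from the previous step.
        eps_s, delta_s = eps_s * c7 * delta_s, delta_s * (1.0 + c6 * eps_s)
        history.append((eps_s, delta_s))
    return history

hist = iterate_bounds(eps0=0.05)
for s, (eps_s, delta_s) in enumerate(hist, start=1):
    print(s, eps_s, delta_s)
```

With these illustrative values $\varepsilon_{(s)}$ decays geometrically with ratio of order $\varepsilon_0$, while $\delta_{(s)}$ stays within a fixed factor of its initial value.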
Proposition 4. The inductive assumptions $I_{1,1}$–$III_{1,1}$ are valid for $s = 1$ and $\varepsilon_{(1)} = \delta_{(1)} = \mathrm{const}_{5,1}(V_0,V_1,B)\cdot\varepsilon_0$. If $V_0$, $V_1$ satisfy (C*), the inductive assumptions $I_{1,l}$–$III_{1,l}$, $l = 1, 2, \ldots$, are valid at the first step of induction $s = 1$ with $\varepsilon_{(1)} = \delta_{(1)} = \mathrm{const}_{5,l}(V_0,V_1,B)\cdot\varepsilon_0$.

Remark. The various constants appearing in the proofs of Theorems 2, 3 depend only on the magnetic field and the external potential $V(x,y)$.

Proposition 4 will be proven in Sect. 4.

Lemma 5. Assume that the inductive assumptions $I_{s,l}$–$III_{s,l}$ are valid at the $s$th step of induction. Define $W_{(s)}$ with the help of the formula $[D_{(s)}, W_{(s)}] = iA^{(2)}_{(s)}$. Then the inductive assumptions $I_{s+1,l}$–$III_{s+1,l}$ are valid for
$$L_{(s+1)} = e^{-iW_{(s)}} L_{(s)} e^{iW_{(s)}}, \qquad \varepsilon_{(s+1)} = \varepsilon_{(s)}\cdot\mathrm{const}_{7,l}\cdot\delta_{(s)}, \qquad \delta_{(s+1)} = \delta_{(s)}\bigl(1 + \mathrm{const}_{6,l}\,\varepsilon_{(s)}\bigr).$$
Moreover
$$w_{(s)}(0,m;n) \sim \frac{a^{(2)}_{(s)}(0,m;n)}{m+1}.$$

Remark. The last relation allows us to write an additional power of $m$ in the r.h.s. of inequalities ($I_{s,l}$). The proof of the inductive lemma is rather standard and will be omitted. The operator $U(p) = e^{iW} = \lim_{s\to\infty} \prod e^{iW_{(s)}}$ is well defined and satisfies the statement of Theorem 2.

4. Checking the Inductive Assumptions at the First Step of Induction

Writing $L_{\varepsilon_0,p} = D + A^{(1)} + A^{(2)}$ we have the following representation for the matrix elements:
$$d(m,m;0;p) = (m+\tfrac12)B + \varepsilon_0 V_{m,m}(p) = (m+\tfrac12)B + \varepsilon_0 B^{1/2}\int_{-\infty}^{\infty} V_0(y+p)\,\psi_m(B^{1/2}y)\,\psi_m(B^{1/2}y)\,dy, \qquad (4.1)$$
$$a^{(1)}(m,m_1;n,p) = \varepsilon_0\varepsilon_1 W^{(n)}_{m,m_1}(p) = \varepsilon_0\varepsilon_1 B^{1/2}\int_{-\infty}^{\infty} V_1^{(n)}(y+p)\,\psi_m\bigl(B^{1/2}(y+n\omega)\bigr)\,\psi_{m_1}(B^{1/2}y)\,dy \qquad (4.2)$$
if $n \neq 0$, where $V_1^{(n)}(y)$ are the Fourier coefficients of $V_1(\cdot,y)$:
$$V_1(x,y) = \sum_{n=-\infty}^{\infty} e^{2\pi i n x}\,V_1^{(n)}(y)$$
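(clearly we can assume $V_1^{(0)}(y) \equiv 0$, or add it to $V_0(y)$),
$$a^{(1)}(m,m_1;0,p) = \varepsilon_0 V_{m,m_1}(p) = \varepsilon_0 B^{1/2}\int_{-\infty}^{\infty} V_0(y+p)\,\psi_m(B^{1/2}y)\,\psi_{m_1}(B^{1/2}y)\,dy \qquad (4.3)$$
if $m_1 \neq m \neq 0$;
$$a^{(2)}(0,m_1;0,p) = \varepsilon_0 V_{0,m_1}(p) = \varepsilon_0 B^{1/2}\int_{-\infty}^{\infty} V_0(y+p)\,\psi_0(B^{1/2}y)\,\psi_{m_1}(B^{1/2}y)\,dy \qquad (4.4)$$
if $m_1 > 0$; and
$$a^{(2)}(0,m_1;n,p) = \varepsilon_0\varepsilon_1 W^{(n)}_{0,m_1}(p) = \varepsilon_0\varepsilon_1 B^{1/2}\int_{-\infty}^{\infty} V_1^{(n)}(y+p)\,\psi_0\bigl(B^{1/2}(y+n\omega)\bigr)\,\psi_{m_1}(B^{1/2}y)\,dy \qquad (4.5)$$
if $n \neq 0$, $m_1 > 0$. Conditions $I_{1,l}$–$III_{1,l}$, rewritten in terms of $V_{m,m_1}(p)$, $W^{(n)}_{m,m_1}(p)$, lead to the inequalities:
$$\sum_{m_1} \| V_{m,m_1} \|_{C^2(S^1)}\,(m_1^l+1) \le \mathrm{const}_{6,l}\,(m^{l+1}+1), \qquad (4.6)$$
$$\sum_{m_1} \Bigl( \sum_{n} \bigl\| W^{(n)}_{m,m_1} \bigr\|_{C^2(S^1)}\, e^{\frac{2}{3}\delta|n|} \Bigr)(m_1^l+1) \le \mathrm{const}_{7,l}\,(m^{l+1}+1). \qquad (4.7)$$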
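The main part of the proof of estimates (4.6), (4.7) is contained in Lemmas 6–7.

Lemma 6. Let $\bar m \ge m$, $b \ge 0$. Then
$$I(m,\bar m;b) = \int_{-\infty}^{\infty} e^{iby}\,\psi_m(y)\,\psi_{\bar m}(y)\,dy = i^{(\bar m-m)}\Bigl(\frac{b}{\sqrt2}\Bigr)^{(\bar m-m)}\cdot\binom{\bar m}{m}^{1/2}\cdot\frac{1}{((\bar m-m)!)^{1/2}}\cdot e^{-\frac{b^2}{4}}\cdot\sum_{l=0}^{m}(-1)^l\binom{m}{l}\Bigl(\frac{b^2}{2}\Bigr)^{l}\frac{(\bar m-m)!}{(\bar m-m+l)!}. \qquad (4.8)$$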
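Formula (4.8) can be checked numerically. The sketch below assumes that the functions in Lemma 6 are the standard real, $L^2$-normalized Weber-Hermite functions (written `psi` here), generated by the three-term recurrence; the grid, the values of `b`, and the helper names are choices made for this illustration only.

```python
import math
import numpy as np

def hermite_functions(n_max, y):
    """L2-normalised Weber-Hermite functions psi_0..psi_{n_max} on the grid y."""
    psi = np.zeros((n_max + 1, y.size))
    psi[0] = np.pi ** -0.25 * np.exp(-y ** 2 / 2.0)
    if n_max >= 1:
        psi[1] = np.sqrt(2.0) * y * psi[0]
    for n in range(1, n_max):
        psi[n + 1] = (np.sqrt(2.0 / (n + 1)) * y * psi[n]
                      - np.sqrt(n / (n + 1)) * psi[n - 1])
    return psi

def I_numeric(m, mbar, b, psi, y):
    """Riemann-sum approximation of the integral of e^{iby} psi_m psi_mbar."""
    dy = y[1] - y[0]
    return np.sum(np.exp(1j * b * y) * psi[m] * psi[mbar]) * dy

def I_closed(m, mbar, b):
    """Right-hand side of (4.8), for mbar >= m."""
    k = mbar - m
    s = sum((-1) ** l * math.comb(m, l) * (b * b / 2.0) ** l
            * math.factorial(k) / math.factorial(k + l) for l in range(m + 1))
    pref = ((1j ** k) * (b / math.sqrt(2.0)) ** k
            * math.sqrt(math.comb(mbar, m) / math.factorial(k)))
    return pref * math.exp(-b * b / 4.0) * s

y = np.linspace(-20.0, 20.0, 40001)
psi = hermite_functions(8, y)
max_err = max(abs(I_numeric(m, mb, b, psi, y) - I_closed(m, mb, b))
              for b in (0.0, 0.7, 2.5) for m in range(4) for mb in range(m, 8))
print(max_err)
```

For $b = 0$ the check reduces to the orthonormality of the functions, and for $m = 0$ to the classical integral quoted as (4.9) in the proof.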
Lemma 7. a)
$$\sum_{m_1=0}^{\infty} |I(m,m_1;b)|\cdot(m_1^l+1) \le 4\bigl(\max(4m;\,m+18b^2)\bigr)^{l+\frac12} + \mathrm{const}_{8,l} \le (12l+12)\Bigl((5m)^{l+\frac12} + (3\sqrt2\,b)^{2l+1}\Bigr) + \mathrm{const}_{8,l}.$$
b) Let the function $f(y)$ be periodic with period $\tau$ and $(2l+2)$ times continuously differentiable. Then
$$\sum_{m_1=0}^{\infty} \sup_{\alpha}\Bigl| \int_{-\infty}^{\infty} f(y+\alpha)\,\psi_m(y)\,\psi_{m_1}(y)\,dy \Bigr|\,(m_1^l+1) \le \bigl\| f^{(2l+2)} \bigr\|_{L^2(S^1)}\cdot\mathrm{const}_{9,l}(\tau)\cdot\bigl(m^{l+\frac12}+1\bigr).$$
Proof of Lemma 6.
$$I(0,\bar m;b) = \int_{-\infty}^{\infty} e^{iby}\,\psi_0(y)\,\psi_{\bar m}(y)\,dy$$
is a well known integral (see [21, 22]):
$$I(0,\bar m;b) = i^{\bar m}\Bigl(\frac{b}{\sqrt2}\Bigr)^{\bar m}\cdot\frac{e^{-\frac{b^2}{4}}}{(\bar m!)^{1/2}}. \qquad (4.9)$$
It is not difficult to see that
$$I(m,\bar m;b) = \sqrt{\frac{\bar m}{m}}\;I(m-1,\bar m-1;b) + i\,\frac{b}{\sqrt2}\sqrt{\frac{1}{m}}\cdot I(m-1,\bar m;b). \qquad (4.10)$$
Iterating (4.10) $m$ times we arrive at (4.8).

Proof of Lemma 7. a)
$$\sum_{m_1=0}^{\infty} |I(m,m_1;b)|\cdot(m_1^l+1) = \sum_{m_1=0}^{\max(4m,\,m+18b^2)} + \sum_{m_1>\max(4m,\,m+18b^2)}.$$
We use a rough estimate for the first sum. Since
$$\sum_{m_1=0}^{\max(4m,\,m+18b^2)} |I(m,m_1;b)|^2 \le \sum_{m_1=0}^{\infty} |I(m,m_1;b)|^2 = 1,$$
we have
$$\sum_{m_1=0}^{\max(4m,\,m+18b^2)} |I(m,m_1;b)|\cdot(m_1^l+1) \le \bigl(\max(4m,\,m+18b^2)+1\bigr)^{\frac12}\cdot\Bigl(\bigl(\max(4m,\,m+18b^2)\bigr)^l+1\Bigr). \qquad (4.11)$$
The second sum is uniformly bounded by a constant. To see this, we need

Lemma 8. Let $\bar m > \max(4m,\,m+18b^2)$. Then
$$\Bigl| \sum_{l=0}^{m}(-1)^l\binom{m}{l}\Bigl(\frac{b^2}{2}\Bigr)^{l}\frac{(\bar m-m)!}{(\bar m-m+l)!} \Bigr| \le 1. \qquad (4.12)$$
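Denoting $\bigl(\frac{b^2}{2}\bigr)^{l}\frac{(\bar m-m)!}{(\bar m-m+l)!}$ by $r(l)$ we can write
$$\sum_{l=0}^{m}(-1)^l\binom{m}{l}\,r(l) = \bar\Delta^m r(m),$$
where $\bar\Delta r(l) = r(l-1) - r(l)$, $l = 1,\cdots,m$, and $\bar\Delta^k r(l) = \bar\Delta\bigl(\bar\Delta^{k-1} r\bigr)(l)$. In our case
$$\bar\Delta r(l) = \Bigl(\frac{2(\bar m-m)}{b^2} + \frac{2l}{b^2} - 1\Bigr) r(l),$$
and, in general, for $t \le l$
$$\bar\Delta^t r(l) = \Bigl(\frac{2(\bar m-m)}{b^2} + \frac{2(l+1-t)}{b^2} - 1\Bigr)\bar\Delta^{t-1} r(l) - \frac{2}{b^2}\,\bar\Delta^{t-2} r(l). \qquad (4.13)$$
Equations (3.12) and (3.13) imply $\bar\Delta^t r(l) > 0$, $\bar\Delta^t r(l) > \bar\Delta^{(t-1)} r(l)$. Finally
$$\bar\Delta^t r(l) \le r(l)\prod_{j=0}^{t-1}\Bigl(\frac{2(\bar m-m)+2(l-j)}{b^2} - 1\Bigr)$$
and
$$\bar\Delta^m r(m) \le r(m)\prod_{j=0}^{m-1}\Bigl(\frac{2(\bar m-j)}{b^2} - 1\Bigr) \le \prod_{j=0}^{m-1}\Bigl(1 - \frac{b^2}{2(\bar m-j)}\Bigr) \le 1.$$
To finish the proof of Lemma 7 a) we write
$$\sum_{\bar m>\max(4m,\,m+18b^2)} |I(m,\bar m,b)|\,(\bar m^l+1) \le \sum_{k>\max(3m,\,18b^2)}\Bigl(\frac{b}{\sqrt2}\Bigr)^{k}\binom{k+m}{m}^{\frac12}\frac{1}{\sqrt{k!}}\,e^{-\frac{b^2}{4}}\cdot\bigl((k+m)^l+1\bigr) \le \sum_{k>18b^2}\Bigl(\frac{b}{\sqrt2}\Bigr)^{k}\frac{1}{\sqrt{k!}}\,e^{-\frac{b^2}{4}}\,2^k\bigl((2k)^l+1\bigr).$$
It is clear that the sum of the last series is uniformly bounded in $b$. Part b) of Lemma 7 follows from part a) and estimates on the decay of Fourier coefficients of differentiable functions. Lemma 7 is proven.

Remark. Since for Weber-Hermite functions
$$\Bigl| \int_{-\infty}^{\infty}\psi_m(y-n\omega)\,\psi_{m_1}(y)\,dy \Bigr| = \Bigl| \int_{-\infty}^{\infty} e^{in\omega y}\,\psi_m(y)\,\psi_{m_1}(y)\,dy \Bigr|,$$
the estimates from part a) of Lemma 7 imply
$$\sum_{n=-\infty}^{\infty} e^{-\frac{\delta}{12}|n|}\sum_{m_1=0}^{\infty}\Bigl| \int_{-\infty}^{\infty}\psi_m(y-n\omega)\,\psi_{m_1}(y)\,dy \Bigr|\cdot(m_1^l+1) \le \mathrm{const}_{10,l}\cdot\bigl(m^{l+\frac12}+1\bigr). \qquad (4.14)$$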
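Now we are ready to prove inequalities (4.6)–(4.7). The first of them immediately follows from Lemma 7 b). To check (4.7) we consider the Fourier series for $V_1(x,y)$:
$$V_1(x,y) = \sum_{n=-\infty}^{\infty} e^{2\pi i n x}\,V_1^{(n)}(y) = \sum_{n,l} g(n,l)\cdot e^{2\pi i n x}\cdot e^{2\pi i l y}.$$
The condition (C) implies
$$|g(n,l)| \le e^{-\frac34\delta|n|}\cdot\frac{1}{l^7+1}\cdot\mathrm{const}(V_1).$$
Then
$$\begin{aligned}
&\sum_{m_1=0}^{\infty}\sum_{n=-\infty}^{\infty} e^{\frac23\delta|n|}\sup_{\alpha}\Bigl| \int_{-\infty}^{\infty}\psi_m\bigl(B^{1/2}(y-n\omega)\bigr)\cdot V_1^{(n)}(y+\alpha)\,\psi_{m_1}(B^{1/2}y)\,dy \Bigr|\cdot(m_1+1)\\
&\quad= \sum_{m_1=0}^{\infty}\sum_{n=-\infty}^{\infty} e^{\frac23\delta|n|}\sup_{\alpha}\Bigl| \int_{-\infty}^{\infty}\psi_m\bigl(B^{1/2}(y-n\omega)\bigr)\cdot\Bigl(\sum_{l} g(n,l)\,e^{2\pi i l y}e^{2\pi i l\alpha}\Bigr)\psi_{m_1}(B^{1/2}y)\,dy \Bigr|\cdot(m_1+1)\\
&\quad= \sum_{m_1=0}^{\infty}\sum_{n=-\infty}^{\infty} e^{\frac23\delta|n|}\sup_{\alpha}\Bigl| \int_{-\infty}^{\infty}\sum_{l=-\infty}^{\infty} g(n,l)\,e^{2\pi i l\alpha}\sum_{\bar m=0}^{\infty} I(m,\bar m,lB^{-\frac12})\cdot\psi_{\bar m}\bigl(B^{1/2}(y-n\omega)\bigr)\cdot\psi_{m_1}(B^{1/2}y)\,dy \Bigr|\cdot(m_1+1)\\
&\quad\le \sum_{l=-\infty}^{\infty}\frac{\mathrm{const}(V_1)}{l^7+1}\cdot\sum_{\bar m=0}^{\infty}\bigl| I(m,\bar m,lB^{-\frac12}) \bigr|\cdot\sum_{m_1=0}^{\infty}\sum_{n=-\infty}^{\infty} e^{-\frac{\delta}{12}|n|}\cdot\Bigl| \int_{-\infty}^{\infty}\psi_{\bar m}\bigl(B^{1/2}(y-n\omega)\bigr)\cdot\psi_{m_1}(B^{1/2}y)\,dy \Bigr|\cdot(m_1+1)\\
&\quad\le \sum_{l=-\infty}^{\infty}\frac{\mathrm{const}(V_1)}{l^7+1}\cdot\sum_{\bar m=0}^{\infty}\bigl| I(m,\bar m,lB^{-\frac12}) \bigr|\cdot\bigl(\bar m^{\frac32}+1\bigr)\cdot\mathrm{const}_{10}.
\end{aligned} \qquad (4.15)$$
Here the last inequality follows from (4.14). Using the result of Lemma 7 a) once more one can show that the r.h.s. of (4.15) is less than
$$\mathrm{const}_{11}\cdot\sum_{l=-\infty}^{\infty}\bigl(m^2+l^4+1\bigr)\cdot\frac{1}{l^7+1} \le \mathrm{const}_{12}\,(m^2+1).$$
The estimates of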
$$\sum_{m_1=0}^{\infty}\sum_{n=-\infty}^{\infty} e^{\frac23\delta|n|}\sup_{\alpha}\Bigl| \int_{-\infty}^{\infty}\psi_m\bigl(B^{1/2}(y-n\omega)\bigr)\cdot\frac{\partial^k}{\partial y^k}V_1^{(n)}(y+\alpha)\,\psi_{m_1}(B^{1/2}y)\,dy \Bigr|\cdot(m_1^l+1)$$
for $k = 1, 2$; $l = 1$ or $k = 0, 1, 2$; $l > 1$ can be derived in a similar way. Proposition 4 is proven.

5. Proof of Theorem 3

Mainly we will consider the case of functions $V_0$, $V_1$ satisfying the condition (C). We proved in Sects. 3, 4 the existence of a unitary operator $U(p) = e^{iW}$ such that $U(p)^{-1} L_{\varepsilon_0,p} U(p)\big|_{E^{(m)}_{\varepsilon_0,p}}$ is given by formulas (2.7)–(2.9). The columns of the matrix representation of $U(p)$ produce the new basis $\{e_j(p)\}_{j=-\infty}^{\infty}$: $e_j(p)(m,n) = e^{iW}(0,m;n-j;p+j\omega)$. It follows from (2.3) and the inductive assumptions ($II_{s,1}$), $s = 1, 2, 3, \cdots$, that
$$|e_j(p)(m,n)|\cdot(m^2+1)\,e^{\frac23\delta|n-j|} < \mathrm{const}. \qquad (5.1)$$
The last inequality, combined with the results a), b) concerning the spectrum of
$$L^{(m)}_{\varepsilon_0} = U(p)^{-1} L_{\varepsilon_0,p} U(p)\big|_{E^{(m)}_{\varepsilon_0,p}},$$
gives us the series representation (2.10)–(2.11) for the generalized eigenfunctions of $L_{\varepsilon_0}(B)$. The trivial estimate $|\psi_l(y)| < \mathrm{const}\cdot(l+1)$ and the formula
$$\frac{d}{dy}\psi_l = \Bigl(\frac{l}{2}\Bigr)^{\frac12}\psi_{l-1} - \Bigl(\frac{l+1}{2}\Bigr)^{\frac12}\psi_{l+1}$$
imply the uniform convergence of (2.11) and allow us to differentiate it twice in $x$ and $y$ term by term. To prove (2.12) we decompose the series (2.11) into two parts:
$$\Phi_{p,k}(x,y) = \sum_{l<(\frac{y}{20})^2} + \sum_{l\ge(\frac{y}{20})^2}.$$
We derive the trivial bound of the second sum:
$$\Bigl| \sum_{l\ge(\frac{y}{20})^2}\sum_{n} c(l,n-k;p+k\omega)\cdot e^{2\pi i(\frac{p}{\omega}+n)x}\,\psi_l\bigl(B^{1/2}(y-p-n\omega)\bigr) \Bigr| \le \frac{1}{(\frac{y}{20})^2+1}\cdot\sum_{l,n} |c(l,n-k,p+k\omega)|\;\mathrm{const}\,(l^2+1)\cdot e^{\frac{\delta}{3}|n|} \le \frac{\mathrm{const}(p,k)}{y^2+1}.$$
To consider the first sum, we recall that $\psi_l(y)$ oscillates on the interval $[-2\sqrt{l},\,2\sqrt{l}]$ and decays superexponentially off this interval. In particular
$$|\psi_l(y)| < \exp\Bigl(-\frac{y^2}{10}\Bigr) \quad \text{if } |y| > \sqrt{l}\cdot 10 \qquad (5.2)$$
(see [22]). Thus
$$\sum_{l:\,|y|>20\sqrt{l}}\ \sum_{n} c(l,n-k,p+k\omega)\cdot e^{2\pi i(\frac{p}{\omega}+n)x}\,\psi_l\bigl(B^{1/2}(y-p-n\omega)\bigr) = \sum_{l:\,|y|>20\sqrt{l}}\ \sum_{n:\,|n|<\frac{y}{2}} + \sum_{l:\,|y|>20\sqrt{l}}\ \sum_{n:\,|n|\ge\frac{y}{2}}.$$
Using (5.2) in the first subsum and (2.10) in the second, we can easily show that they are exponentially small in $y$. This gives us (2.12). If $V_0$, $V_1$ satisfy (C*) we replace (5.1) by
$$|e_j(p)(m,n)|\,\bigl(m^{N+1}+1\bigr)\,e^{\frac23\delta|n-j|} < \mathrm{const}, \qquad (5.1')$$
where $N$ can be taken arbitrarily large if $\varepsilon_0, \varepsilon_1 \to 0$, and use similar arguments. The infinite differentiability of $\Phi^{(m)}_{p,k}$ follows from the Friedrichs theorem for strongly elliptic operators ([23]).

To prove part (iii), we consider for every $k \in \mathbb{Z}^1$ and a.e. $p \in [0,\omega]$ the eigenspace $H_{m,k}(p)$ of the operator $L^{(m)}_{\varepsilon_0}(p)$, generated by the eigenfunction $\varphi_k(p)$ with the eigenvalue $\Lambda^{(m)}(p+k\omega)$. If we define $H_{m,k} = \int_0^{\omega}\oplus\,H_{m,k}(p)\,dp$, then $E^{(m)}_{\varepsilon_0} = \bigoplus_{k=-\infty}^{\infty} H_{m,k}$, each $H_{m,k}$ is $L_{\varepsilon_0}(B)$-invariant and the restriction of $L_{\varepsilon_0}(B)$ to $H_{m,k}$ is unitarily equivalent to the multiplication operator on $L^2([0,\omega])$ with the multiplication function $\Lambda(\cdot+k\omega)$. Part (ii) follows from the representation of $L_{\varepsilon_0}(B)$ as the direct integral of the difference operators, Theorem 2 and the previous considerations. Theorem 3 is proven.

Acknowledgement. E.D. and A.S. are sincerely grateful to Professor R. Seiler for the warm hospitality at the Technical University of Berlin in May–July 1993, where a part of this work was written. E.D. acknowledges RFFI (grant # 96-01-0037) for partial support.
References
1. Thouless, D., Kohmoto, M., Nightingale, P., den Nijs, M.: Quantized Hall conductance in a two-dimensional periodic potential. Phys. Rev. Lett. 49, 1405 (1982)
2. Avron, J., Seiler, R.: On the quantum Hall effect. J. Geom. Phys. 1, 3, 13–23 (1984)
3. Avron, J., Seiler, R., Simon, B.: Homotopy and quantization in condensed matter physics. Phys. Rev. Lett. 51, 51 (1983)
4. Kunz, H.: The quantum Hall effect for electron in a random potential. Commun. Math. Phys. 112, 1, 121–145 (1987)
5. Bellissard, J.: Ordinary quantum Hall effect and non-commutative cohomology. Teubner-Texte Phys., 16, Leipzig: Teubner, 1987
6. Bellissard, J.: C* algebras in solid state physics. 2-D electron in a uniform magnetic field. In: Operator algebras and applications. Vol. 2 (ed. E. Evans, M. Takesaki), Cambridge: Cambridge University Press, 1988, pp. 49–76
7. Bellissard, J., van Elst, A., Schulz-Baldes, H.: The non-commutative geometry of the quantum Hall effect. J. Math. Phys. 35, 5373–5451 (1994)
8. Avron, J., Seiler, R., Simon, B.: Charge Deficiency, Charge Transport and Comparison of Dimensions. Commun. Math. Phys. 159, 399–422 (1994)
9. Hofstadter, D.: Energy levels and wave functions for Bloch electrons in rational and irrational magnetic fields. Phys. Rev. B 14, 2239–2249 (1976)
10. Landau, L.D., Lifschitz, E.M.: Quantum mechanics. Nonrelativistic theory. 2nd ed., New York: Pergamon Press, 1965
11. Reed, M., Simon, B.: Methods of Modern Mathematical Physics. Vols. 1–4, New York: Academic Press, 1978
12. Novikov, S.P.: Magnetic Bloch functions and vector bundles. Typical dispersion laws and their quantum numbers. Soviet Mathematics Doklady 23, 2, 298–303 (1981)
13. Dubrovin, B.A., Novikov, S.P.: Ground states in a periodic field. Magnetic Bloch functions and vector bundles. Soviet Mathematics Doklady 22, 1, 240–244 (1980)
14. Dubrovin, B.A., Novikov, S.P.: Ground states of a two-dimensional electron in a periodic magnetic field. Soviet Physics JETP 52, 3, 511–516 (1980)
15. Sinai, Ya.G.: Anderson localization for one-dimensional difference Schrödinger operators with quasi-periodic potentials. J. Stat. Phys. 46, 861–909 (1987)
16. Fröhlich, J., Spencer, T., Wittwer, P.: Localization for a class of one-dimensional quasi-periodic Schrödinger operators. Commun. Math. Phys. 132, 5–26 (1990)
17. Chulaevsky, V.A., Dinaburg, E.I.: Methods of KAM-theory for long-range quasi-periodic operators on $\mathbb{Z}^\nu$. Pure Point spectrum. Commun. Math. Phys. 153, 559–577 (1993)
18. Dinaburg, E.I.: Some problems of spectral theory of discrete operators with quasi-periodic coefficients. Russian Math. Surveys (To appear)
19. Helffer, B., Sjöstrand, J.: Semiclassical Analysis for Harper's Equation III. Cantor Structure of the Spectrum. Mémoires de la Société Mathématique de France, n. 39, 117:4, 1–124 (1989)
20. Helffer, B., Sjöstrand, J.: Analyse semi-classique pour l'équation de Harper (avec application à l'équation de Schrödinger avec champ magnétique). Mémoires de la Société Mathématique de France, n. 34, 116:4, 1–113 (1988)
21. Gradshteyn, I.S., Ryzhik, I.M.: Table of integrals, series and products. Boston: Academic Press, 1994
22. Bateman, H., Erdelyi, A.: Higher transcendental functions. Vol. 2, New York, Toronto and London: McGraw-Hill, 1953
23. Friedrichs, K.O.: Differentiability of Solutions of Elliptic Partial Differential Equations. Math. Scand. 1, 55–72 (1953)
24. Kato, T.: Perturbation Theory for Linear Operators. 2nd ed., Berlin and New York: Springer-Verlag, 1976

Communicated by J.L. Lebowitz
Commun. Math. Phys. 189, 577 – 590 (1997)
Communications in
Mathematical Physics c Springer-Verlag 1997
One-Dimensional Hard-Rod Caricature of Hydrodynamics: “Navier–Stokes Correction” for Local Equilibrium Initial States*

C. Boldrighini¹, Yu.M. Suhov²

¹ Dipartimento di Matematica e Fisica, Università di Camerino, Camerino, Italy
² Institute for Problems of Information Transmission, Russian Academy of Sciences, Moscow, Russia; Statistical Laboratory, DPMMS and Isaac Newton Institute for Mathematical Sciences, University of Cambridge; St John’s College, Cambridge, UK

Received: 14 October 1996 / Accepted: 13 February 1997

Dedicated to the memory of Roland Dobrushin

* This research was supported in part by CNR (GNFM) funds, MURST funds and research funds of the University of Camerino (C.B.); the EC Grant “Training Mobility and Research” No 16296 (Contract CHRXCT 93-0411) and the INTAS Grant “Mathematical methods for stochastic discrete even systems” 93-820 (Yu.M.S.)

Abstract: A one-dimensional system consisting of identical hard-rod particles of length $a$ is studied in the hydrodynamical limit. A “Navier–Stokes correction” to the Euler equation is found for an initial local equilibrium family of states $P_\varepsilon$, $\varepsilon > 0$, of constant density. The correction is given, at $t \sim 0$, by a non-linear second order differential operator acting on $f(q,v)$, the hydrodynamical density at a point $q \in \mathbb{R}^1$ of the “species” of fluid with velocity $v \in \mathbb{R}^1$.

1. Introduction

The dynamics of a system of one-dimensional identical hard rods of length $a$ corresponds to a simple law of motion: the particles move freely on the line $\mathbb{R}^1$, except that they undergo elastic collisions, which occur when the distance between particles equals $a$. When two particles collide they simply exchange their velocities. (One can only consider binary collisions: multiple collisions occur with zero probability with respect to all natural initial measures [2].) As velocities are conserved, no thermalization is possible, and the hard-rod (h.r.) model is certainly far from physical reality. Nevertheless it is quite remarkable, since it is the only mechanical deterministic model for which transport coefficients can be computed by the Green-Kubo formula and proved to be nonvanishing, in addition to being the only one for which a non-linear “Euler equation” can be deduced in the hydrodynamical limit.

We mention here some papers related to this model in the context of the present work. The ergodic properties of the model have been studied in [13, 1, 2]. In [4] it was proved that the hydrodynamical limit leads to a non-linear “Euler equation” which appeared
earlier in the physical literature [12]. Equilibrium fluctuations and transport coefficients were studied in [16, 5] (see also [17], §7.3). A starting point here was an earlier result on the equilibrium correlation functions [9].

The present paper intends to study corrections to the hydrodynamical limit for the h.r. system. The general problem here is to derive the Euler and Navier–Stokes equations for large particle systems evolving according to Hamiltonian dynamics. The reader is referred to [17] for an overview of the state of the problem. The hydrodynamical equations are usually written for densities of the “canonical” first integrals of the motion: mass, momentum and energy. The hydrodynamical limit is related to scaling both (microscopic) time and length by multiplying them by a factor $\varepsilon \to 0$. One considers a family $\{P_\varepsilon\}$ of initial states such that the local distribution induced by $P_\varepsilon$ changes “very little” over distances $\sim o(\varepsilon^{-1})$ but the change becomes noticeable over distances $\sim O(\varepsilon^{-1})$. For details, see [8, 7, 17]. A general result was obtained in an important paper [11], where the Euler equation was established for a general system of moving particles combining Hamiltonian and stochastic laws of motion. Introducing randomness into the motion may be regarded as a simple way of tackling the local instability of the deterministic dynamics. An alternative (and perhaps more physical) approach was proposed in [14].

The peculiarity of the h.r. model is that it possesses a much richer family of “natural” first integrals. In fact, the fraction of particles with a prescribed velocity is preserved in time, and therefore the hydrodynamical equations connect densities of various “species”, labelled by the value of velocity $v \in \mathbb{R}^1$, of the hard-rod “fluid”. The quantity under investigation is therefore the moment function $\rho_{T^*_{\varepsilon^{-1}t}P_\varepsilon}(\varepsilon^{-1}q, v)$ (see below), giving the density of particles with velocity $v$ at a (micro-) point $\varepsilon^{-1}q$ and a (micro-) time $\varepsilon^{-1}t$.

The result of [4] is that, under some conditions on the initial states $P_\varepsilon$ and the test function $g$, the limit
$$\int dq\,\nu_0(dv)\,f_t(q,v)\,g(q,v) = \lim_{\varepsilon\to 0}\int dq\,\nu_0(dv)\,\rho_{T^*_{\varepsilon^{-1}t}P_\varepsilon}(\varepsilon^{-1}q,v)\,g(q,v) \qquad (1.1)$$
exists and gives a (weak) solution of the equation
$$\frac{\partial}{\partial t} f_t(q,v) = (Af_t)(q,v), \qquad (1.2)$$
where $A$ is a non-linear first order differential operator
$$(Af_t)(q,v) = -\frac{\partial}{\partial q}\Bigl[ f_t(q,v)\Bigl( v + a\int\nu_0(dw)\,(v-w)\,f_t(q,w)\,(1-a\rho_t(q))^{-1} \Bigr) \Bigr]. \qquad (1.3)$$
Here $\nu_0$ is a “basic” measure on $\mathbb{R}^1$ (one may think of the Lebesgue measure) and $\rho_t(q) = \int\nu_0(d\bar w)\,f_t(q,\bar w)$. Equations (1.2)–(1.3) are interpreted as the Euler equation for the h.r. fluid, expressed in the “macroscopic” variables $q, t \in \mathbb{R}^1$. The quantity $f_t(q,v)$ is the density of mass of the fluid species with velocity $v$, and $\rho_t(q)$ is the total fluid mass density, at the (macro-) point $q$ and (macro-) time $t$.

The Navier–Stokes equation arises when we take into account the gradients of the conserved quantities. Since in the hydrodynamical scaling such gradients are of order $\varepsilon$, it appears that the function describing the fluid “at the Navier–Stokes level” should be obtained by keeping the corrections of order $\varepsilon$ to the limit (1.1), and it should coincide with the solution of the Navier–Stokes equation for the h.r. fluid, obtained by adding
terms of order $\varepsilon$ to the right side of Eq. (1.2). The Navier–Stokes solution would give a better approximation of the true density $\rho_{T^*_{\varepsilon^{-1}t}P_\varepsilon}(\varepsilon^{-1}q,v)$ for finite (macro-)times, and should remain close to it for longer times, of the order $t \sim \varepsilon^{-1}$ on the macroscopic scale. The connection between the two properties is however unclear. There are actually no results on the derivation of the Navier–Stokes equations from microscopic dynamics, except for free motion, and the hard rods seem at present the only model which can provide further insight on the issue.

Some contradicting variants of a possible Navier–Stokes equation for the hard rod model were proposed in [3, 15, 6]. However there has been no rigorous derivation, so far, of any of those equations. An important remark is that in deriving the Navier–Stokes equation the conditions on the hydrodynamic family $\{P_\varepsilon\}$ of initial states are more restrictive than for the Euler case: as an analysis of our proof below shows, it is essential that the initial states should be “true local equilibrium” states, i.e., Gibbs states with pure hard-rod potential.

In this paper we derive the h.r. Navier–Stokes equation as a correction to the Euler equation (1.2) up to the $O(\varepsilon)$-terms in the RHS of (1.1). The equation is established at an infinitesimal level: we prove that for appropriately defined local equilibrium initial states $P_\varepsilon$ and any nice test function $g$, the following relation holds:
$$\int dq\,\nu_0(dv)\,(Bf_0)(q,v)\,g(q,v) = \frac{d}{dt}\,\lim_{\varepsilon\to 0}\,\varepsilon^{-1}\int dq\,\nu_0(dv)\Bigl[ \rho_{T^*_{\varepsilon^{-1}t}P_\varepsilon}(\varepsilon^{-1}q,v) - f_t(q,v) \Bigr] g(q,v)\Big|_{t=0+}. \qquad (1.4)$$
Here $f_t$ is the solution of Eqs. (1.2), (1.3) and $B$ is a non-linear second order differential operator:
$$(Bf_0)(q,v) = \frac{a^2}{2}\frac{\partial}{\partial q}\Bigl[ \Bigl( \int\nu_0(dw)\,|v-w|\,f_0(q,w)\,\frac{\partial}{\partial q}f_0(q,v) - f_0(q,v)\int\nu_0(dw)\,|v-w|\,\frac{\partial}{\partial q}f_0(q,w) \Bigr)(1-a\rho_0(q))^{-1} \Bigr]. \qquad (1.5)$$
For the case $t \to 0-$ the RHS of (1.5) changes sign. In the present paper we prove the result only for initial states with constant mass density $\rho_0(q) = \int\nu_0(d\bar w)\,f_0(q,\bar w) = \rho$.

This result suggests the following version of a “short-time” Navier–Stokes equation:
$$\frac{\partial}{\partial t} f^{(\varepsilon)}_t = Af^{(\varepsilon)}_t + \varepsilon Bf^{(\varepsilon)}_t. \qquad (1.6)$$
This is the form predicted long ago by H. Spohn [15]. (It differs however from the form proposed in [17], Ch. 7.) One could argue that the derivation of Eq. (1.6) for small times is physically irrelevant, since for such times the fluctuations are much larger than the Navier–Stokes correction to the Euler solution. Our main aim is however, as we explained above, to get some insight into the nature of the “Navier–Stokes terms”, and to test the consistency of our procedure in deriving them. An important remark in this respect is that our result is in agreement with the expression of the transport coefficients, computed via the Green-Kubo formula as integrals of the current-current correlation functions [17].

Throughout the paper we use mainly the physical terminology for probabilistic notions (state, moment measure, moment function, particle density, etc.), but the paper does not require any use of the physical literature. On the other hand, the probabilistic background may be provided, e.g., by the book [10].
2. Euler and Navier–Stokes Equations for Hard-Rod Particles

2.1. Preliminaries. The dynamics of identical hard rods can be handled mainly because, due to the conservation of velocities, it can be, to some extent, reduced to free motion. In fact the h.r. dynamics is equivalent to that of “pulses”, i.e., objects which preserve their velocities and make jumps $\pm a$ at collision, and it is not hard to see that the evolution of the system can be represented in terms of the free motion of a point particle system (obtained by squeezing the rods to a point) and of random shifts which have a simple expression in terms of the initial conditions. (The precise formula is given by Eq. (2.2) below.)

In what follows we need to consider maps between the phase space of the h.r. system $\mathcal{M}^{(a)}$ and the point particle phase space $\mathcal{M}$ (or between the corresponding configuration spaces $\mathcal{X}^{(a)}$ and $\mathcal{X}$), called “contractions” and “dilations”. We give here only a very sketchy summary of the notions and results that we need, and refer the reader to the papers [2, 4, 8] (we mainly follow the notation of [2], [4]).

A point $X$ of $\mathcal{X}$ or $\mathcal{X}^{(a)}$ (a configuration) is a “locally finite” set of points $x \in \mathbb{R}^1$ (distant at least $a$ apart for $X \in \mathcal{X}^{(a)}$), indicating the positions of the particles. A point $Y$ of $\mathcal{M}$ or $\mathcal{M}^{(a)}$ is an “equipped” configuration, i.e., a set of points $(x,v) \in \mathbb{R}^1\times\mathbb{R}^1$ such that its image under the projection $(x,v) \mapsto x$ is a configuration in $\mathcal{X}$ or $\mathcal{X}^{(a)}$, respectively. Furthermore, when $Y \in \mathcal{M}^{(a)}$ the projection is one-to-one (i.e., there is at most one particle with a given position) and when $Y \in \mathcal{M}$ the projection has finitely many pre-images, i.e., there are at most finitely many particles with a given position $x \in \mathbb{R}^1$. Finally, we assume that if $(x,v), (x',v') \in Y \in \mathcal{M}^{(a)}$ and $x' = x + a$ then $v \le v'$. The spaces $\mathcal{X}$, $\mathcal{X}^{(a)}$, $\mathcal{M}$ and $\mathcal{M}^{(a)}$ are equipped with standard (vague) topologies. Measures are supposed to be given on the corresponding $\sigma$-algebras, though we will speak, by a standard abuse of language, of measures on the spaces. A probability measure (p.m.) on $\mathcal{X}$ ($\mathcal{X}^{(a)}$) is called a “configuration state” (“configuration h.r. state”), and a p.m. on $\mathcal{M}$ ($\mathcal{M}^{(a)}$) a “full state” (“full h.r. state”). Given a full state $P$, we can construct a configuration state $Q$ as the image of $P$ under the projection induced by the map $(x,v) \mapsto x$ (we shall call the state $Q$ the configuration projection of $P$). Conversely, given a configuration state $Q$, we can build up a full state $P$ by indicating a family of conditional probabilities which we denote as $P(\cdot\,|X)$, $X \in \mathcal{X}$ or $X \in \mathcal{X}^{(a)}$.

Dealing with h.r. dynamics we need to define dilation and contraction transformations. Given $Y \in \mathcal{M}$ and $(x,v) \in Y$, we denote by $D_{(x,v)}Y$ the “dilation of $Y$”, i.e., the element of $\mathcal{M}^{(a)}$ obtained as follows. We label the points $(\tilde x,\tilde v) \in Y$ by the integers $n \in \mathbb{Z}^1$, according to the lexicographic order on $\mathbb{R}^1\times\mathbb{R}^1$, and giving the label zero to $(x,v)$. Then $D_{(x,v)}Y$ is formed by the points $(x_n + na, v_n)$, $n \in \mathbb{Z}^1$. Conversely, given $Y \in \mathcal{M}^{(a)}$ and $(x,v) \in Y$, we denote by $C_{(x,v)}Y$ the “contraction of $Y$”, i.e., the element of $\mathcal{M}$ which is formed by the points $(x_n - na, v_n)$, $n \in \mathbb{Z}^1$, with the same rule of labelling as before. By projection one defines the dilation $D_x$ on $\mathcal{X}$ and the contraction $C_x$ on $\mathcal{X}^{(a)}$, for $x \in \mathbb{R}^1$.

The free dynamics of a point $(x,v)$ is defined by $(x,v) \mapsto (x+\tau v, v)$, $\tau \in \mathbb{R}^1$ being the time variable. Given $Y \in \mathcal{M}$ and $(x,v) \in Y$ we set
$$M(x,v;\tau,Y) = \mathrm{Card}\{(\tilde x,\tilde v) \in Y : \tilde x > x,\ \tilde x+\tau\tilde v \le x+\tau v\} - \mathrm{Card}\{(\tilde x,\tilde v) \in Y : \tilde x < x,\ \tilde x+\tau\tilde v \ge x+\tau v\}. \qquad (2.1)$$
Given $Y \in \mathcal{M}^{(a)}$, we define the h.r. dynamics by
1-D Hard-Rod Caricature of Hydrodynamics
\[ T_\tau Y = \{ (x + \tau v + a M(x, v; \tau, C_{(x,v)} Y),\, v) : (x, v) \in Y \}, \qquad \tau \in \mathbb{R}^1, \tag{2.2} \]
cf. [2, 4]. Our aim is to study the evolution of a (full) h.r. state $P$ under the h.r. dynamics:
\[ T_\tau^* P = P(T_{-\tau}\, \cdot), \qquad \tau \in \mathbb{R}^1. \tag{2.3} \]
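For a finite configuration, the rule (2.1), (2.2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the configuration is a finite list rather than a locally finite infinite set, and the function names (`contraction`, `crossing_count`, `evolve_hard_rods`) are hypothetical, not taken from the paper.

```python
from typing import List, Tuple

Particle = Tuple[float, float]  # (position x, velocity v)


def contraction(Y: List[Particle], i: int, a: float) -> List[Particle]:
    """Contract Y around particle i: the n-th neighbour is moved by -n*a."""
    # Label particles in lexicographic order on (x, v); particle i gets label 0.
    order = sorted(range(len(Y)), key=lambda k: Y[k])
    rank = {k: r for r, k in enumerate(order)}
    return [(x - (rank[k] - rank[i]) * a, v) for k, (x, v) in enumerate(Y)]


def crossing_count(x: float, v: float, tau: float, Y: List[Particle]) -> int:
    """M(x, v; tau, Y) of Eq. (2.1) for a finite point-particle configuration."""
    from_right = sum(1 for xt, vt in Y if xt > x and xt + tau * vt <= x + tau * v)
    from_left = sum(1 for xt, vt in Y if xt < x and xt + tau * vt >= x + tau * v)
    return from_right - from_left


def evolve_hard_rods(Y: List[Particle], tau: float, a: float) -> List[Particle]:
    """T_tau Y of Eq. (2.2): free motion plus a shift a*M per pulse."""
    out = []
    for i, (x, v) in enumerate(Y):
        shift = a * crossing_count(x, v, tau, contraction(Y, i, a))
        out.append((x + tau * v + shift, v))
    return out


# Two rods of length a = 1: one moving right from x = 0, one at rest at x = 3.
rods = [(0.0, 1.0), (3.0, 0.0)]
print(evolve_hard_rods(rods, tau=4.0, a=1.0))  # [(5.0, 1.0), (2.0, 0.0)]
```

In the example the rods collide at time 2 and exchange velocities, so at time 4 the occupied phase points are $(2, 0)$ and $(5, 1)$; the pulse with velocity 1 has jumped by $+a$ and the pulse with velocity 0 by $-a$.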
The (first-order) moment measure (MM) of a configuration state $Q$ or full state $P$ is denoted by $R_Q$ or $R_P$; it is a Borel measure on $\mathbb{R}^1$ or $\mathbb{R}^1 \times \mathbb{R}^1$, respectively. The Radon–Nikodym derivatives (if they exist), $\rho_Q(x) = (dR_Q/d\lambda)(x)$, $x \in \mathbb{R}^1$, or $\rho_P(x, v) = (dR_P/d(\lambda \times \nu_0))(x, v)$, $x, v \in \mathbb{R}^1$, where $\lambda$ is the Lebesgue measure and $\nu_0$ is a σ-finite non-negative Borel measure on $\mathbb{R}^1$, are called the moment function (MF) of $Q$ and the $\nu_0$-moment function ($\nu_0$-MF) of $P$, respectively. (If $\nu_0$ is fixed, we call $\rho_P$ simply the MF of $P$.) $\rho_Q$ and $\rho_P$ (if $\nu_0 = \lambda$) are interpreted as local particle densities in the one-particle configuration space $\mathbb{R}^1$, or in the one-particle phase space $\mathbb{R}^1 \times \mathbb{R}^1$. For a translation-invariant (t.i.) configuration state $Q$, the particle density $\rho_Q(x) = \rho$ is a constant. For a t.i. full state $P$, the MM is of the form $R_P = \rho(\lambda \times \nu)$, where $\nu$ is a p.m. on $\mathbb{R}^1$ (interpreted as the velocity distribution) and $\rho$ is again a constant giving the particle density in the state $P$.

We will also need the concept of Palm distribution associated to a configuration state $Q$ or to a full state $P$. For the sake of brevity, we give the definition for full states; otherwise one simply applies the projection $\mathcal{M} \to \mathcal{X}$. Consider a non-negative measure $\pi$ on $(\mathbb{R}^1 \times \mathbb{R}^1) \times \mathcal{M}$ (more precisely, on the set $\bar{\mathcal{M}} = \{((x, v), Y) : (x, v) \in Y\}$) given by
\[ \int \pi((dx \times dv) \times dY)\, g((x, v), Y) = E_P N^{(g)}, \qquad N^{(g)}(Y) = \int Y(dx \times dv)\, g((x, v), Y), \]
where $g$ is a (non-negative) measurable function on $\bar{\mathcal{M}}$. (The point $Y \in \mathcal{M}$ is here considered as an atomic measure on the phase space $\mathbb{R}^1 \times \mathbb{R}^1$ with atoms of mass 1 at the points $(x, v) \in Y$.) The image of the measure $\pi$ under the projection $((x, v), Y) \mapsto (x, v)$ is precisely the moment measure $R_P$. Hence, by the Fubini Theorem,
\[ \int \pi((dx \times dv) \times dY)\, g((x, v), Y) = \int R_P(dx \times dv) \int \hat P_{(x,v)}(dY)\, g((x, v), Y), \tag{2.4} \]
where $\{\hat P_{(x,v)},\ (x, v) \in \mathbb{R}^1 \times \mathbb{R}^1\}$ is a family of probabilities on $\mathcal{M}$ (more precisely, on $\mathcal{M}_{(x,v)} = \{Y : Y \ni (x, v)\}$) which are defined for $R_P$-a.a. $(x, v) \in \mathbb{R}^1 \times \mathbb{R}^1$. We call this family the Palm family of the state $P$, and a single measure $\hat P_{(x,v)}$ the Palm distribution (or Palm state) associated to $P$ at the point $(x, v)$.

Let $P^0$ be a t.i. full state. By dilation we transform the elements of its Palm family $\hat P^0_{(x,v)}$ into a new family of states
\[ \hat P_{(x,v)} = D^*_{(x,v)} \hat P^0_{(x,v)}, \tag{2.5a} \]
which is the Palm family of a t.i. full h.r. state $P$ [2], called the "dilated state" corresponding to $P^0$. Conversely, given a h.r. full state $P$ one defines the corresponding "contracted state" on $\mathcal{M}$ as the state with Palm family given by
\[ \hat P^0_{(x,v)} = C^*_{(x,v)} \hat P_{(x,v)}. \tag{2.5b} \]
We refer the reader to [2] for details. Dilation and contraction for t.i. configuration states are defined in the same way. A well-known example of a full state is that of a t.i. Poisson state with i.i.d. velocities, which is identified by a positive number ρ(0) (its “particle density”) and a probability
C. Boldrighini, Yu.M. Suhov
distribution $\nu$ on $\mathbb{R}^1$ (the "velocity distribution"). If $P$ is such a state, then the Palm state $\hat P_{(x,v)}$ associated to it at a point $(x, v)$ is given by the relation
\[ \hat P_{(x,v)}(A) = P(A_{(x,v)}), \qquad A \subseteq \mathcal{M}_{(x,v)}, \tag{2.6} \]
where $A_{(x,v)}$ is the image of the event $A$ under the map $\mathcal{M}_{(x,v)} \to \mathcal{M}$ given by $Y \in \mathcal{M}_{(x,v)} \mapsto Y - \delta_{(x,v)}$, $\delta_{(x,v)}$ being the Dirac mass concentrated at the point $(x, v)$. A similar assertion is valid for t.i. Poisson states on $\mathcal{X}$ (hereafter called configuration Poisson (c.P.) states).

A t.i. full state $P$ is called a "h.r. equilibrium state" with density $\rho \in (0, a^{-1})$ and velocity distribution $\nu$ if the contracted state $P^0$ corresponding to it is the t.i. Poisson full state with density $\rho^{(0)} = \rho(1 - a\rho)^{-1}$ and velocity distribution $\nu$. If $P$ is a h.r. equilibrium state with density $\rho \in (0, a^{-1})$, its configuration projection $Q$ is the dilated state corresponding to the configuration Poisson state $Q^0$ with density $\rho^{(0)}$. $Q$ is called a "pure h.r. Gibbs state" with density $\rho$. Observe that a h.r. equilibrium state $P$ is identified by its MM, which is of the form $R_P = \rho(\lambda \times \nu)$. An equivalent definition of h.r. equilibrium state is that the configuration projection of $P$ is a pure h.r. Gibbs state, and the conditional probabilities $P(\cdot\,|X)$ correspond to i.i.d. velocities with some distribution $\nu$.

2.2. The hard-rod dynamics: Equilibrium properties. For details and proofs of the results of Propositions 2.2.1 and 2.2.2 below we refer to [13, 1, 2].

Proposition 2.2.1. Let $P$ be a h.r. equilibrium state with velocity distribution $\nu$ such that $E_\nu |v| < \infty$. Then the following assertions hold:
(i) $P$ is invariant under the h.r. dynamics: $T_\tau^* P \equiv P$ for all $\tau \in \mathbb{R}^1$.
(ii) If $\nu$ has no atom at $v_0 = E_\nu v$, then $(\mathcal{M}, T_\tau, P)$ is a K-system.
(iii) If, for some $\delta > 0$, $\nu([v_0 - \delta, v_0 + \delta]) = 0$, then $(\mathcal{M}, T_\tau, P)$ is a B-system.

Proposition 2.2.2. Let $G$ be a t.i. h.r. state with MM $R_G = \rho(\lambda \times \nu)$, where $E_\nu |v| < \infty$, and $P$ be the h.r. equilibrium state with the same MM $R_P = R_G$. Let furthermore $G^0$ and $P^0$ denote the corresponding contracted states. Then the state $T_\tau^* G$ converges weakly, as $\tau \to \pm\infty$, to $P$ iff the state $T_\tau^{0*} G^0$ obtained from $G^0$ in the course of the free dynamics converges weakly to $P^0$.

2.3. The Euler equation. Consider a family of initial (full) h.r. states $P^\varepsilon$, $\varepsilon > 0$, which satisfy Conditions 2.3.1 and 2.3.2 below.

2.3.1. i) The MM $R_{P^\varepsilon}$ is absolutely continuous with respect to $\lambda \times \nu_0$, $\nu_0$ being a σ-finite measure on $\mathbb{R}^1$ with $\nu_0(C) \le c\lambda(E(C))$ for any compact $C \subset \mathbb{R}^1$, where $E(C) = C \cup \{x \in C^c : \mathrm{dist}(x, C) \le 1\}$, and $c > 0$ is a constant.
ii) The $\nu_0$-MF $\rho_{P^\varepsilon}(x, v)$ satisfies the bound $\rho_{P^\varepsilon}(x, v) \le \phi(v)$, $x, v \in \mathbb{R}^1$, where $\int \nu_0(dv)\phi(v) < a^{-1}$ and $\int \nu_0(dv)\phi(v)|v| < \infty$.
iii) The following relation holds:
\[ \lim_{\varepsilon \to 0} \rho_{P^\varepsilon}(\varepsilon^{-1} q, v) = f_0(q, v), \qquad q, v \in \mathbb{R}^1, \tag{2.7} \]
where $f_0$ is a function of class $C^1$ in $q$ for which $f_0(q, v) < \phi(v)$.

To state Condition 2.3.2, we need some definitions. Given $q$, $v$ and $t$, we set
\[ r(q, v; t; f_0) = \left( \int_{-\infty}^{v} \nu_0(dw) \int_0^{t(v-w)} ds - \int_v^{\infty} \nu_0(dw) \int_{t(v-w)}^{0} ds \right) (C_q f_0)(q + s, w). \tag{2.8} \]
Here the function $C_q f_0$ is the image of $f_0$ under a continuous analogue of the contraction $C_{(q,v)}$:
\[ (C_q f_0)(q + s, w) = f_0(q + s^*, w)\,(1 - a\rho_0(q + s^*))^{-1}, \tag{2.9} \]
where $\rho_0(q) = \int \nu_0(d\bar w) f_0(q, \bar w)$ and the point $s^* = s^*(q, s; f_0)$ is determined by the equations
\[ s^* - a \int_q^{q+s^*} d\bar s\, \rho_0(\bar s) = s \ \text{ for } s > 0, \qquad s^* + a \int_{q+s^*}^{q} d\bar s\, \rho_0(\bar s) = s \ \text{ for } s < 0. \]
Note that formula (2.8) becomes particularly simple if $\rho_0(q) = \rho \in (0, a^{-1})$ is a constant, i.e., $f_0$ is of the form $f_0(q, v) = \rho h(q, v)$, where $\int \nu_0(dw) h(q, w) = 1$ for any $q \in \mathbb{R}^1$. In this case $r(q, v; t, f_0)$ is given by the expression
\[ \rho^{(0)} \left( \int_{-\infty}^{v} \nu_0(dw) \int_0^{t(v-w)} ds - \int_v^{\infty} \nu_0(dw) \int_{t(v-w)}^{0} ds \right) h(q + s(1 + \rho^{(0)} a), w), \tag{2.10} \]
with $\rho^{(0)} = \rho(1 - \rho a)^{-1}$. Condition 2.3.2 reads as follows:
2.3.2. For any $\delta > 0$, $t \in \mathbb{R}^1$ and any bounded $C \subset \mathbb{R}^1$, uniformly in $(q, v) \in \mathbb{R}^1 \times C$,
\[ \lim_{\varepsilon \to 0} C^*_{(\varepsilon^{-1} q, v)} \widehat{(P^\varepsilon)}_{(\varepsilon^{-1} q, v)} \left( Y : |\varepsilon M(\varepsilon^{-1} q, v; \varepsilon^{-1} t, Y) - r(q, v; t, f_0)| > \delta \right) = 0, \]
where $\widehat{(P^\varepsilon)}_{(\varepsilon^{-1} q, v)}$ is the Palm state associated with $P^\varepsilon$ at $(\varepsilon^{-1} q, v)$ and $C^*_{(\varepsilon^{-1} q, v)} \widehat{(P^\varepsilon)}_{(\varepsilon^{-1} q, v)}$ is its contraction. This condition is a kind of law of large numbers.

Definition. A family of states $\{P^\varepsilon\}$ satisfying Conditions 2.3.1 and 2.3.2 is called a "h.r. family of Euler order, with hydrodynamical profile $f_0$" (or, briefly, a "family of Euler order with profile $f_0$"). Examples of such families are given in [4]. The following assertion was proved in [4].

Theorem 1. Let $\{P^\varepsilon\}$ be a family of Euler order, with profile $f_0$. For any $t \in \mathbb{R}^1$ and any test function $g : \mathbb{R}^1 \times \mathbb{R}^1 \to \mathbb{R}^1$ of class $C^2$ and with compact support, the limit
\[ \lim_{\varepsilon \to 0} \varepsilon \int R_{T^*_{\varepsilon^{-1} t} P^\varepsilon}(dx \times dv)\, g(\varepsilon x, v) = \int dq\, \nu_0(dv)\, f_t(q, v)\, g(q, v) \]
exists, where $f_t$ is the unique weak solution of Eqs. (1.2), (1.3), with initial data $f_0$.

An important formula for the solution of the Cauchy problem for (1.2), (1.3) is
\[ \int dq\, f_t(q, v)\, g(q, v) = \int dq\, f_0(q, v)\, g(q + tv + a\, r(q, v; t, f_0), v), \qquad t \in \mathbb{R}^1, \tag{2.11} \]
where $g$ is an arbitrary measurable bounded function $\mathbb{R}^1 \times \mathbb{R}^1 \to \mathbb{R}^1$ with compact support and $r(q, v; t, f_0)$ is given by Eq. (2.8).
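As a numerical illustration of the constant-density formula (2.10), the following sketch evaluates $r(q, v; t, f_0)$ for a velocity measure $\nu_0$ with finitely many atoms. The discrete $\nu_0$, the midpoint quadrature, and the function name `r_const_density` are assumptions made for the example only. The two integrals of (2.10) are combined into a single signed integral from $0$ to $t(v - w)$.

```python
from typing import Callable, List, Tuple


def r_const_density(q: float, v: float, t: float, rho: float, a: float,
                    nu0: List[Tuple[float, float]],
                    h: Callable[[float, float], float],
                    n_steps: int = 2000) -> float:
    """Midpoint-rule evaluation of Eq. (2.10) for an atomic measure nu0.

    nu0 is a list of (velocity w, weight) atoms; h(q, w) must satisfy
    sum(weight * h(q, w)) == 1 for every q.
    """
    rho0 = rho / (1.0 - rho * a)       # contracted (point-particle) density
    stretch = 1.0 + rho0 * a           # dilation factor, equal to 1/(1 - rho*a)
    total = 0.0
    for w, weight in nu0:
        upper = t * (v - w)            # signed upper limit of the s-integral
        ds = upper / n_steps
        integral = sum(h(q + (k + 0.5) * ds * stretch, w)
                       for k in range(n_steps)) * ds
        total += weight * integral
    return rho0 * total


# Spatially homogeneous case: h does not depend on q, velocities -1 and +1.
nu0 = [(-1.0, 0.5), (1.0, 0.5)]
r = r_const_density(q=0.0, v=2.0, t=1.0, rho=0.5, a=1.0,
                    nu0=nu0, h=lambda q, w: 1.0)
print(r)
```

When $h$ does not depend on $q$, (2.10) reduces to $r = \rho^{(0)} t (v - u)$ with $u = \int \nu_0(dw)\, h(w)\, w$ the mean velocity, so the displacement $tv + ar$ in (2.11) corresponds to the familiar effective velocity $v + a\rho(v - u)/(1 - a\rho)$. In the example above, $\rho^{(0)} = 1$ and $u = 0$, so $r$ should be close to 2.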
2.4. Navier–Stokes correction. Let the measure $\nu_0$, with the properties listed in Condition 2.3.1, be fixed, and consider a non-negative function $h = h(q, v)$, $q, v \in \mathbb{R}^1$, of class $C^4$ in $q$ for any given $v \in \mathbb{R}^1$, such that $\int \nu_0(dw) h(q, w) \equiv 1$, $q \in \mathbb{R}^1$, and
\[ h(q, v),\ \left| \frac{\partial^i}{\partial q^i} h(q, v) \right| \le \phi(v), \qquad q, v \in \mathbb{R}^1, \quad i = 1, 2, 3, 4, \]
where $\int \nu_0(dv)\phi(v)|v|^i < \infty$, $i = 1, 2$. Finally, choose a value $\rho > 0$ such that $\rho \int \nu_0(dv)\phi(v) < a^{-1}$ and set $f(q, v) = \rho h(q, v)$, $q, v \in \mathbb{R}^1$. (We omit from now on the lower index 0 from the notation $f_0$.)

Definition. A family of h.r. states $\{P^\varepsilon, \varepsilon > 0\}$ is called a hydrodynamical family of Navier–Stokes order with constant particle density $\rho$ and velocity distribution $\{h(q, v)\nu_0(dv),\ q \in \mathbb{R}^1\}$, if the following Condition 2.4.1 is fulfilled.

2.4.1. The configuration projection of the state $P^\varepsilon$ is the pure h.r. Gibbs state $Q$ of density $\rho$, and the conditional probabilities $P^\varepsilon(\cdot\,|X)$ correspond to independent velocities, the velocity distribution for a particle with position $q \in X$ being $h(\varepsilon q, v)\nu_0(dv)$.

We interpret the states $\{P^\varepsilon, \varepsilon > 0\}$ as "local equilibrium states" with a constant density. Local equilibrium means here that the family of velocity distributions is $\varepsilon$-scaled, and the first $\nu_0$-MF of the state $P^\varepsilon$ is given by
\[ \rho_{P^\varepsilon}(q, v) = f(\varepsilon q, v), \qquad q, v \in \mathbb{R}^1. \tag{2.12} \]

Remark 2.4.2. The Palm state $\widehat{(P^\varepsilon)}_{(x,v)}$ may be described as the image, under $D^*_{(x,v)}$, of a point-particle state $P^{\varepsilon,0}_{(x,v)}$ which is determined by the following properties A and B.
A. The configuration projection of $P^{\varepsilon,0}_{(x,v)}$ is the Palm state $\hat Q^{(0)}_x$ associated to the configuration Poisson state $Q^{(0)}$ of density $\rho^{(0)}$.
B. The conditional distribution $P^{\varepsilon,0}_{(x,v)}(\cdot\,|X)$ corresponds to independent velocities of the particles positioned at the points $\tilde x \in X$, the velocity distribution being $h(\varepsilon(\tilde x + aN_{(x,\tilde x)} + a), \tilde v)\,\nu_0(d\tilde v)$ if $\tilde x \in X \cap (x, +\infty)$ and $h(\varepsilon(\tilde x - aN_{(\tilde x, x)} - a), \tilde v)\,\nu_0(d\tilde v)$ if $\tilde x \in X \cap (-\infty, x)$. Here, and below, $N_{(b,b')}$ stands for the number of particles with positions in $(b, b')$.
In other words, $C^*_{(x,v)} \widehat{(P^\varepsilon)}_{(x,v)} = P^{\varepsilon,0}_{(x,v)}$, where $P^{\varepsilon,0}_{(x,v)}$ is determined by A and B. It is possible to prove that a hydrodynamical family of Navier–Stokes order with constant density $\rho$ satisfies both Conditions 2.3.1 and 2.3.2, with $f(q, v) = \rho h(q, v)$. For the proof, see [4].

Our aim is to prove the following assertion:

Theorem 2. Let $\{P^\varepsilon, \varepsilon > 0\}$ be a family of Navier–Stokes order with constant particle density $\rho \in (0, a^{-1})$ and velocity distribution $\{h(q, v)\nu_0(dv),\ q \in \mathbb{R}^1\}$, where $\nu_0$ and $h$ satisfy the conditions above. Then, for any test function $g : \mathbb{R}^1 \times \mathbb{R}^1 \to \mathbb{R}^1$ of class $C^4$ and with compact support, the relation (1.4) holds, with $f_0(q, v) = \rho h(q, v)$ and $B$ given by (1.5).
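3. Proof of Theorem 2

Let $t > 0$. The integral $\int dq\, \nu_0(dv)\, \rho_{T^*_{\varepsilon^{-1} t} P^\varepsilon}(\varepsilon^{-1} q, v)\, g(q, v)$ can be written as
\[ \int dq\, \nu_0(dv)\, f(q, v)\, E_{C^*_{(\varepsilon^{-1} q, v)} \widehat{(P^\varepsilon)}_{(\varepsilon^{-1} q, v)}}\, g\big(q + tv + a\varepsilon M(\varepsilon^{-1} q, v; \varepsilon^{-1} t; \cdot), v\big). \tag{3.1} \]
This is the basic expression we shall work with. We write for brevity $\hat E^0$ instead of $E_{C^*_{(\varepsilon^{-1} q, v)} \widehat{(P^\varepsilon)}_{(\varepsilon^{-1} q, v)}}$. (Actually expectations are, in most cases, with respect to $\hat Q^{(0)}_{\varepsilon^{-1} q}$, the Palm state of the c.P. state $Q^{(0)}$ of density $\rho^{(0)}$.) We also write $M$ instead of $M(\varepsilon^{-1} q, v; \varepsilon^{-1} t, \cdot)$ and $\tilde M$ for the centered variable: $\tilde M = M - \hat E^0 M$. (For other random variables a tilde will have a similar meaning.) The derivatives $g'$, $g''$, etc. (and $h'$, $h''$, etc.) are understood as partial derivatives with respect to the first argument. By a Taylor expansion of $g$ to fourth order we write the expression (3.1) as
\[ \int dq\, \nu_0(dv)\, f(q, v) \Big[ g\big(q + tv + a\varepsilon \hat E^0 M, v\big) + \frac{(a\varepsilon)^2}{2} g''\big(q + tv + a\varepsilon \hat E^0 M, v\big) \hat E^0 \tilde M^2 + \frac{(a\varepsilon)^3}{3!} g'''\big(q + tv + a\varepsilon \hat E^0 M, v\big) \hat E^0 \tilde M^3 + \frac{(a\varepsilon)^4}{4!} \hat E^0 g^{IV}(q + tv + \Theta_1, v) \tilde M^4 \Big]. \tag{3.2} \]
Here $\Theta_1$ is a random variable such that $|\Theta_1 - a\varepsilon \hat E^0 M| \le a\varepsilon |\tilde M|$.

Lemma 3.1. Under the conditions of Theorem 2, for any $t$,
\[ \lim_{\varepsilon \to 0} \int dq\, \nu_0(dv)\, h(q, v) \Big[ \frac{\varepsilon^2 a^3}{3!} g'''\big(q + tv + a\varepsilon \hat E^0 M, v\big) \hat E^0 \tilde M^3 + \frac{\varepsilon^3 a^4}{4!} \hat E^0 g^{IV}(q + tv + \Theta_1, v) \tilde M^4 \Big] = 0. \]

We will prove Lemma 3.1 after the end of the proof of Theorem 2. We do the same for the other auxiliary Lemmas 3.2 and 3.4 below. The next step is to analyse the two leading terms in (3.2). Observe that
\[ \varepsilon \hat E^0 M = \rho^{(0)} \int \nu_0(dw) \int_0^{t(v-w)} ds\, \hat E^0 h\big(q + s \pm \varepsilon a(N_\pm + 1), w\big) \tag{3.3} \]
($\int_v^u$ is here understood in the algebraic sense: $\int_v^u = -\int_u^v$). This follows immediately from properties A and B in Remark 2.4.2. Here and below $N_+$ stands for $N_{(\varepsilon^{-1} q,\, \varepsilon^{-1}(q+s))}$ if $s > 0$ and $N_-$ for $N_{(\varepsilon^{-1}(q+s),\, \varepsilon^{-1} q)}$ if $s < 0$, and the sign $\pm$ is chosen in accordance with that of $s$. For $s > 0$ we write $\hat E^0 h(q + s + \varepsilon a(N_+ + 1), w)$ as
\[ h\big(q + s + \varepsilon a(\hat E^0 N_+ + 1), w\big) + \frac{(\varepsilon a)^2}{2} h''\big(q + s + \varepsilon a(\hat E^0 N_+ + 1), w\big) \hat E^0 \tilde N_+^2 + \frac{(\varepsilon a)^3}{3!} h'''\big(q + s + \varepsilon a(\hat E^0 N_+ + 1), w\big) \hat E^0 \tilde N_+^3 + \frac{(\varepsilon a)^4}{4!} \hat E^0 h^{IV}\big(q + s + \Theta_2^{(+)}, w\big) \tilde N_+^4, \tag{3.4a} \]
where $\Theta_2^{(+)}$ is a random variable between $\varepsilon a N_+$ and $\varepsilon a \hat E^0 N_+$. A similar formula holds for $s < 0$. As before, the terms with $h'''$ and $h^{IV}$ will be estimated and neglected. The non-trivial contribution will come from the remaining two terms.

Using again property A yields, for $s > 0$, $\hat E^0 N_+ = \hat E^0 \tilde N_+^2 = \varepsilon^{-1} \rho^{(0)} s$, and similar equalities hold for $s < 0$. We can write, for $s > 0$,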
\[ h\big(q + s + \varepsilon a(\hat E^0 N_+ + 1), w\big) = h\big(q + s(1 + \rho^{(0)} a), w\big) + \varepsilon a\, h'\big(q + s(1 + \rho^{(0)} a), w\big) + \frac{(\varepsilon a)^2}{2} h''\big(q + s(1 + \rho^{(0)} a) + \vartheta_1^{(+)}, w\big), \tag{3.4b} \]
where $\vartheta_1^{(+)}$ is a quantity between zero and $\varepsilon a$. A similar equality holds for $s < 0$. Combining (3.3) and (3.4a,b) we get
\[ \varepsilon \hat E^0 M = r(q, v, t; f) + \varepsilon a\big(\bar r_1(q, v, t; f) + r_1(q, v, t; f)\big) + \frac{(\varepsilon a)^2}{2} r_1'(q, v, t; f), \tag{3.5} \]
where $r(q, v, t; f)$ is given by (2.10); $r_1(q, v, t; f)$, denoting the sign of $s$ by $\sigma(s)$, is equal to
\[ (1 - \rho a)^{-1} \int \nu_0(dw) \int_0^{t(v-w)} ds\, \sigma(s)\, f'\big(q + s(1 + \rho^{(0)} a), w\big); \]
$\bar r_1(q, v, t; f)$ is equal to
\[ \frac{a\rho}{2(1 - \rho a)^2} \int \nu_0(dw) \int_0^{t(v-w)} ds\, s\,\sigma(s)\, f''\big(q + s(1 + \rho^{(0)} a), w\big); \]
and $r_1'$ is the integral of an expression containing the derivatives $h''$, $h'''$, and $h^{IV}$. Going back to (3.2), we have
\[ g\big(q + tv + a\varepsilon \hat E^0 M, v\big) = g\big(q + tv + a r(q, v, t; f), v\big) + \varepsilon a^2 g'\big(q + tv + a r(q, v, t; f), v\big) \big(r_1(q, v, t; f) + \bar r_1(q, v, t; f)\big) + \varepsilon^2 g_1(q, v, t; f), \]
with
\[ g_1 = \frac{a^3}{2} g'(q + tv + ar, v)\, r_1' + \frac{a^4}{2} g''(q + tv + ar + \theta_0, v) \Big(r_1 + \bar r_1 + \frac{\varepsilon a}{2} r_1'\Big)^2. \]
The $\varepsilon^2$-term may be neglected, as shown in Lemma 3.2 below:

Lemma 3.2. Under the conditions of Theorem 2, for any $t$, $\lim_{\varepsilon \to 0} \varepsilon \int dq\, \nu_0(dv)\, h(q, v)\, g_1(q, v, t; f) = 0$.

The assertion of Lemma 3.2 completes the analysis of the $g$-term in (3.2). We now pass to the $g''$-term.

Lemma 3.3. Under the conditions of Theorem 2, for any $t$, $\lim_{\varepsilon \to 0} \varepsilon \hat E^0 \tilde M^2 = r_2(q, v, t; f) + \bar r_2(q, v, t; f)$, where $r_2(q, v, t; f)$ is equal to
\[ (1 - \rho a)^{-1} \int \nu_0(dw) \int_0^{t(v-w)} ds\, \sigma(s)\, f\big(q + s(1 + \rho^{(0)} a), w\big) \]
and $\bar r_2(q, v, t; f)$ to
\[ \frac{1}{(1 - \rho a)^2} \prod_{j=1}^{2} \int \nu_0(dw_j) \int_0^{t(v - w_j)} ds_j\, f'\big(q + s_j(1 + \rho^{(0)} a), w_j\big)\; s_1 \wedge s_2\; \mathbf{1}(s_1 s_2 > 0). \]
Here $s_1 \wedge s_2$ stands for $\min[s_1, s_2]$ when $s_1, s_2 \ge 0$ and $\max[s_1, s_2]$ when $s_1, s_2 < 0$.
From Lemma 3.3 we derive that
\[ \frac{(a\varepsilon)^2}{2} g''\big(q + tv + a\varepsilon \hat E^0 M, v\big) \hat E^0 \tilde M^2 = \frac{\varepsilon a^2}{2} g''\big(q + tv + a r(q, v, t; f), v\big) \big(r_2(q, v, t; f) + \bar r_2(q, v, t; f)\big) + \varepsilon^2 g_2(q, v, t; f). \tag{3.6} \]
It turns out that the $\varepsilon^2$-term is again negligible, as stated by the following lemma.

Lemma 3.4. Under the conditions of Theorem 2, for any $t$, we have
\[ \lim_{\varepsilon \to 0} \varepsilon \int dq\, \nu_0(dv)\, h(q, v)\, g_2(q, v, t; f) = 0. \]

After all calculations are made, we write the "essential" part of (3.2) as
\[ \int dq\, \nu_0(dv)\, f(q, v) \Big[ g\big(q + tv + ar(q, v, t; f), v\big) + \varepsilon a^2 \big(r_1(q, v, t; f) + \bar r_1(q, v, t; f)\big) g'\big(q + tv + ar(q, v, t; f), v\big) + \frac{\varepsilon a^2}{2} \big(r_2(q, v, t; f) + \bar r_2(q, v, t; f)\big) g''\big(q + tv + ar(q, v, t; f), v\big) \Big]. \tag{3.7} \]
The first term in brackets gives $\int dq\, \nu_0(dv)\, f_t(q, v)\, g(q, v)$. The terms of order $\varepsilon$ sum up to the expression
\[ \frac{\varepsilon a^2}{2} \int dq\, \nu_0(dv)\, f(q, v) \Big[ 2\big(r_1(q, v, t; f) + \bar r_1(q, v, t; f)\big) g'\big(q + tv + ar(q, v, t; f), v\big) + \big(r_2(q, v, t; f) + \bar r_2(q, v, t; f)\big) g''\big(q + tv + ar(q, v, t; f), v\big) \Big]. \tag{3.8} \]
We now take the derivative $\partial/\partial t$ (or divide by $t$) and pass to the limit $t \to 0+$. The limiting expression is
\[ \frac{\varepsilon a^2}{2} \int dq\, \nu_0(dv)\, f(q, v) \Big[ 2\, \frac{\partial}{\partial t} \big(r_1(q, v, t; f) + \bar r_1(q, v, t; f)\big)\Big|_{t=0+} g'(q, v) + \frac{\partial}{\partial t} \big(r_2(q, v, t; f) + \bar r_2(q, v, t; f)\big)\Big|_{t=0+} g''(q, v) \Big]. \]
It is easy to check that $\frac{\partial}{\partial t} \bar r_1(q, v, t; f)\big|_{t=0+} = \frac{\partial}{\partial t} \bar r_2(q, v, t; f)\big|_{t=0+} = 0$, and
\[ \frac{\partial}{\partial t} r_1(q, v, t; f)\Big|_{t=0+} = \int \nu_0(dw)\, |v - w|\, \frac{\partial}{\partial q} f(q, w)\,(1 - \rho a)^{-1}, \tag{3.9a} \]
\[ \frac{\partial}{\partial t} r_2(q, v, t; f)\Big|_{t=0+} = \int \nu_0(dw)\, |v - w|\, f(q, w)\,(1 - \rho a)^{-1}. \tag{3.9b} \]
After integrating by parts we arrive at the integral $\int dq\, \nu_0(dv)\, g(q, v)\,(Bf)(q, v)$, where $(Bf)(q, v)$ is given by (1.5). This completes the proof of Theorem 2 for $t > 0$. For $t < 0$, $t \to 0-$, the reasoning is the same except that the right sides of Eqs. (3.9a, b) change sign.
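The expansions above use that the number of points of a Poisson configuration in an interval has mean and variance both equal to $\varepsilon^{-1}\rho^{(0)}|s|$, and the proofs of the lemmas below use the third central moment and the moment identity (3.11) as well. The following sketch checks these standard facts numerically for a Poisson variable with parameter `lam`; the function names and the truncation `cutoff` are assumptions of the example, not part of the paper.

```python
import math
from typing import List


def poisson_raw_moments(lam: float, cutoff: int = 200) -> List[float]:
    """Return [E N^0, E N^1, ..., E N^4] for N ~ Poisson(lam), truncated at cutoff."""
    p = math.exp(-lam)                 # P(N = 0)
    moments = [0.0] * 5
    for k in range(cutoff):
        for j in range(5):
            moments[j] += (k ** j) * p
        p *= lam / (k + 1)             # P(N = k + 1) from P(N = k)
    return moments


def fourth_central_moment(m: List[float]) -> float:
    """Right-hand side of (3.11), written in terms of raw moments m[j] = E N^j."""
    return m[4] - 4 * m[3] * m[1] + 6 * m[2] * m[1] ** 2 - 3 * m[1] ** 4


lam = 3.0
m = poisson_raw_moments(lam)
variance = m[2] - m[1] ** 2
third_central = m[3] - 3 * m[2] * m[1] + 2 * m[1] ** 3
print(m[1], variance, third_central, fourth_central_moment(m))
```

For a Poisson variable the mean, the variance and the third central moment all equal `lam`, while the fourth central moment equals `lam + 3 * lam**2`; with `lam = 3.0` the printed values should be close to 3, 3, 3 and 30.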
Proof of Lemma 3.1. The two terms arising in the expression are analysed in a similar way. To avoid repetitions, we treat only the one containing $g^{IV}$. We have
\[ \varepsilon^3 \int dq\, \nu_0(dv)\, h(q, v)\, \hat E^0 g^{IV}(q + tv + \Theta_1, v) \tilde M^4 \le c\, \varepsilon^3 \int dq\, \nu_0(dv)\, h(q, v)\, \hat E^0 \tilde M^4. \tag{3.10} \]
We now write formulas which will be repeatedly used in what follows in slightly different versions. The fourth semi-invariant (or cumulant) of $M$ is given by
\[ \hat E^0 \tilde M^4 = \hat E^0 M^4 - 4 \hat E^0 M^3\, \hat E^0 M + 6 \hat E^0 M^2 (\hat E^0 M)^2 - 3 (\hat E^0 M)^4. \tag{3.11} \]
One can then write $\hat E^0 M^4$ as a sum:
\[ \begin{aligned} & \varepsilon^{-1} \rho^{(0)} \int \nu_0(dw) \int_0^{t(v-w)} ds\, \sigma(s)\, \hat E^0 h\big(q + s \pm \varepsilon a(1 + N_\pm), w\big) \\ &+ 4 \varepsilon^{-2} (\rho^{(0)})^2 \prod_{j=1}^{2} \int \nu_0(dw_j) \int_0^{t(v-w_j)} ds_j\, \hat E^0 h\big(q + s_j \pm \varepsilon a(1 + N_\pm(s_j)), w_j\big) \\ &+ 6 \varepsilon^{-2} (\rho^{(0)})^2 \prod_{j=1}^{2} \int \nu_0(dw_j) \int_0^{t(v-w_j)} ds_j\, \sigma(s_j)\, \hat E^0 h\big(q + s_j \pm \varepsilon a(1 + N_\pm(s_j)), w_j\big) \\ &+ 6 \varepsilon^{-3} (\rho^{(0)})^3 \prod_{j=1}^{2} \int \nu_0(dw_j) \int_0^{t(v-w_j)} ds_j\, \hat E^0 h\big(q + s_j \pm \varepsilon a(1 + N_\pm(s_j)), w_j\big) \cdot \int \nu_0(dw_3) \int_0^{t(v-w_3)} ds_3\, \sigma(s_3)\, \hat E^0 h\big(q + s_3 \pm \varepsilon a(1 + N_\pm(s_3)), w_3\big) \\ &+ \varepsilon^{-4} (\rho^{(0)})^4 \prod_{j=1}^{4} \int \nu_0(dw_j) \int_0^{t(v-w_j)} ds_j\, \hat E^0 h\big(q + s_j \pm \varepsilon a(1 + N_\pm(s_j)), w_j\big); \end{aligned} \tag{3.12} \]
similar formulas hold for $\hat E^0 M^3$ and $\hat E^0 M^2$.

We have to follow the terms of order $\varepsilon^{-4}$ and $\varepsilon^{-3}$ which arise in the RHS of (3.11). Let us consider the case of $\varepsilon^{-4}$, which is slightly more complicated. Note that in the limit $\varepsilon \to 0$ the coefficient of this term in (3.12) is
\[ \prod_{j=1}^{4} \int \nu_0(dw_j) \int_0^{t(v-w_j)} ds_j\, h\big(q + s_j(1 + \rho^{(0)} a) \pm \varepsilon a, w_j\big); \tag{3.13} \]
this follows immediately from the law of large numbers and the Lebesgue dominated convergence theorem. The same is true for the $\varepsilon^{-4}$-terms which come from the other addends in the RHS of (3.11). The sum of all coefficients gives zero. The deviation of the $\varepsilon^{-4}$-coefficient in the RHS of (3.12) from the limiting value (3.13) remains to be studied: this may create a term of order $\varepsilon$ and, after multiplication by $\varepsilon^{-4}$, a term of order $\varepsilon^{-3}$. A similar problem arises with other terms of order $\varepsilon^{-4}$ figuring in the RHS of (3.11). Comparing the quantities
\[ \hat E^0 \prod_{j=1}^{4} h\big(q + s_j \pm \varepsilon a(1 + N_\pm(s_j)), w_j\big) \quad \text{and} \quad \prod_{j=1}^{4} h\big(q + s_j(1 + \rho^{(0)} a) \pm \varepsilon a, w_j\big), \]
we use again the Taylor expansion formula and formulas for moments of the random variables $\tilde N_\pm$ in a c.P. state. We conclude that the first derivative gives zero contribution and the second one generates an $\varepsilon^2$ factor in front, which makes the term negligible. Similar arguments work for the other $\varepsilon^{-4}$-terms from the RHS of (3.11). This completes the proof of Lemma 3.1.

Proof of Lemma 3.2. In many details the proof of Lemma 3.2 repeats that of Lemma 3.1 (this is also true for Lemmas 3.3 and 3.4), and we shall proceed in a more concise way. The analysis of the various terms of which the quantity $g_1(q, v, t; f)$ is made is similar, so we will analyse only one of them, the one which contains the product of $g'$ times one of the terms containing the derivative $h'''$ in $r_1'$. Up to a constant factor, it can be written as
\[ \varepsilon^3 g'\big(q + tv + ar(q, v; t, f), v\big) \int \nu_0(dw) \int_0^{t(v-w)} ds\, \sigma(s)\, h'''\big(q + s(1 + \rho^{(0)} a), w\big)\, \hat E^0 \tilde N_\pm^3. \]
By our assumptions on $h$ and the equality $\hat E^0 \tilde N_\pm^3 = \varepsilon^{-1} |s| \rho^{(0)}$, we conclude that the "true" order of the last expression is $\varepsilon^2$.

Proof of Lemma 3.3. It is easy to see that
\[ \varepsilon \hat E^0 M^2 = \rho^{(0)} \int \nu_0(dw) \int_0^{t(v-w)} ds\, \sigma(s)\, \hat E^0 h\big(q + s \pm \varepsilon a(1 + N_\pm), w\big) + \varepsilon^{-1} (\rho^{(0)})^2 \int \nu_0(dw_1) \int_0^{t(v-w_1)} ds_1 \int \nu_0(dw_2) \int_0^{t(v-w_2)} ds_2\, \hat E^0 \Big[ h\big(q + s_1 \pm \varepsilon a(1 + N_\pm(s_1)), w_1\big)\, h\big(q + s_2 \pm \varepsilon a(1 + N_\pm(s_2)), w_2\big) \Big]. \tag{3.14} \]
On the other hand, $\varepsilon (\hat E^0 M)^2$ is equal to
\[ \varepsilon^{-1} (\rho^{(0)})^2 \prod_{j=1}^{2} \int \nu_0(dw_j) \int_0^{t(v-w_j)} ds_j\, \hat E^0 h\big(q + s_j \pm \varepsilon a(1 + N_\pm(s_j)), w_j\big). \tag{3.15} \]
We have to look for the limiting non-vanishing terms of $\varepsilon \hat E^0 \tilde M^2$ $(= \varepsilon \hat E^0 M^2 - \varepsilon (\hat E^0 M)^2)$. It is not hard to check that those terms are:
\[ \rho^{(0)} \int \nu_0(dw) \int_0^{t(v-w)} ds\, \sigma(s)\, h\big(q + s(1 + \rho^{(0)} a), w\big) \]
(this is the limit of the first addend on the RHS of (3.14)) and
\[ (\rho^{(0)})^3 \prod_{j=1}^{2} \int \nu_0(dw_j) \int_0^{t(v-w_j)} ds_j\, h'\big(q + s_j(1 + \rho^{(0)} a), w_j\big)\, (|s_1| \wedge |s_2|)\, \mathbf{1}(s_1 s_2 > 0) \]
(which is the difference of the second addend from the RHS of (3.14) and the RHS of (3.15)). This completes the proof of Lemma 3.3.

Proof of Lemma 3.4. After the analysis of the term $\hat E^0 (\tilde M)^2$ which was done in the proof of Lemma 3.3, all that remains is to study the difference
\[ g''\big(q + tv + a\varepsilon \hat E^0 M, v\big) - g''\big(q + tv + ar(q, v; t, f), v\big). \]
This is straightforward in view of (3.3).

Acknowledgement. Parts of this work were done at: Institut für Angewandte Mathematik, Universität Heidelberg; Isaac Newton Institute for Mathematical Sciences, University of Cambridge; School of Theoretical Physics, Dublin Institute for Advanced Studies; Institute for Mathematics and its Applications, University of Minnesota, and Université de Cergy–Pontoise. The authors thank these institutions for the warm hospitality.
References
1. Aizenman, M., Goldstein, S. and Lebowitz, J.L.: Ergodic properties of an infinite one dimensional hard rod system. Commun. Math. Phys. 39, 289–301 (1975)
2. Boldrighini, C., Dobrushin, R.L. and Suhov, Yu.M.: Time asymptotics for some degenerate models of evolution of infinite particle systems. In: Sovremennye Problemy Matematiki. VINITI AN SSSR. VINITI, Moscow [Russian], vol. 14, 147–254 (1979) (English translation published by the University of Camerino (1980))
3. Boldrighini, C., Dobrushin, R.L. and Suhov, Yu.M.: Hydrodynamical limit for a degenerate model of classical statistical mechanics. Uspekhi Matem. Nauk [Russian] 35, no. 4, 152 (1980)
4. Boldrighini, C., Dobrushin, R.L. and Suhov, Yu.M.: One dimensional hard rod caricature of hydrodynamics. J. Stat. Phys. 31, 577–616 (1983)
5. Boldrighini, C. and Wick, D.: Fluctuations in a one-dimensional mechanical system. I. The Euler limit. J. Stat. Phys. 52, 1069–1098 (1988)
6. Dobrushin, R.L.: Caricatures of hydrodynamics. In: IXth International Congress on Mathematical Physics, 17–27 July, 1988, Swansea, Wales (B. Simon, A. Truman and I.M. Davies, Eds). Amsterdam et al: Adam Hilger, 1989, pp. 117–132
7. Dobrushin, R.L.: On the way to the mathematical foundations of Statistical Mechanics. In: R.L. Dobrushin, S. Kusuoka, Statistical Mechanics and Fractals. Lecture Notes in Mathematics, vol. 1567. Berlin: Springer, 1992, pp. 1–37
8. Dobrushin, R.L., Sinai, Ya.G. and Suhov, Yu.M.: Dynamical systems of statistical mechanics. In: Dinamicheskie Sistemy-2. Sovremennye Problemy Matematiki. Fundamentalnye Napravlenija. VINITI AN SSSR [Russian], vol. 2, 233–307 (1985)
9. Lebowitz, J.L., Percus, J.T. and Sykes, J.: Time evolution of the total distribution function of a one-dimensional system of hard rods. Phys. Rev. 171, 224–235 (1968)
10. Matthes, K., Kerstan, J. and Mecke, J.: Infinitely Divisible Point Processes. Chichester et al: Wiley, 1978
11. Olla, S., Varadhan, S.R.S. and Yau, H.T.: Hydrodynamical limit for a Hamiltonian system with weak noise. Commun. Math. Phys. 155, 523–557 (1992)
12. Percus, J.T.: Exact solution of kinetics of a model of classical fluid. Phys. Fluids 12, 1560–1563 (1968)
13. Sinai, Ya.G.: Ergodic properties of a gas of hard rods with an infinite number of degrees of freedom. Funct. Anal. Appl. 6, 35–42 (1972)
14. Sinai, Ya.G.: Dynamics of local equilibrium Gibbs distributions and Euler equations. The one-dimensional case. Selecta Mathematica Sovietica 7, 279–289 (1988)
15. Spohn, H.: Private communication (1982)
16. Spohn, H.: Hydrodynamical theory for equilibrium time correlation functions of hard rods. Ann. Phys. 141, 353–364 (1982)
17. Spohn, H.: Large scale dynamics of interacting particles. Berlin: Springer-Verlag, 1991

Communicated by J.L. Lebowitz
Commun. Math. Phys. 189, 591 – 619 (1997)
Communications in
Mathematical Physics © Springer-Verlag 1997
Dobrushin States in Quantum Lattice Systems
C. Borgs 1,*, J. T. Chayes 2,**, J. Fröhlich 3
1 Institut für Theoretische Physik, Universität Leipzig, Augustusplatz 10/11, D-04109 Leipzig, Germany. E-mail: [email protected]
2 AT&T Research, Murray Hill, New Jersey, USA. E-mail: [email protected]
3 Institut für Theoretische Physik, ETH-Hönggerberg, CH-8093 Zürich, Switzerland. E-mail: [email protected]
Received: 15 October 1996 / Accepted: 21 February 1997
Dedicated to the memory of R. L. Dobrushin
Abstract: We consider quantum lattice systems which are quantum perturbations of suitable classical systems with two translation-invariant ground states, not necessarily related by symmetry. Simple examples of such systems include the anisotropic quantum Heisenberg model and the narrow band extended Hubbard model. Under the assumption that the quantum perturbation is exponentially decaying with a sufficiently large decay constant, we prove that these systems are capable of supporting non-translation-invariant states at sufficiently low temperatures in dimension d ≥ 3. These states are induced by so-called Dobrushin boundary conditions which force an asymptotically horizontal interface into the system. We also discuss quantum and classical interfacial ordering transitions that may occur in these systems.

1. Introduction
The Ising model has been a paradigm for the study of phase transitions in equilibrium statistical mechanics ever since Peierls discovered his famous argument [Pei36] to prove the existence of a first-order phase transition in this model in two or more dimensions and Onsager [Ons44] solved the two-dimensional model in zero magnetic field. The key idea underlying Peierls' analysis is to represent the Ising model as a gas of domain boundaries separating regions of opposite spin orientation. The point of Peierls' argument is to recognize that, at low temperatures, these so-called Peierls contours form a dilute gas (and hence the probability that the spin at the origin has the same orientation as the spins at infinity is > 1/2).

* Present address: Microsoft Research, 1 Microsoft Way, Redmond, WA 98052, USA. Partially supported by the Commission of the European Union under contract CHRX-CT93-0411.
** Present address: Microsoft Research, 1 Microsoft Way, Redmond, WA 98052, USA. Partially supported by NSF Grant No. DMS-9403842.
From a mathematical point of view, Peierls' original argument actually turned out to be somewhat incomplete. It was Griffiths [Gri64] and Dobrushin [Dob65] who converted Peierls' analysis into a rigorous proof of the statement that, in dimension $d \ge 2$, at sufficiently low temperature and in zero magnetic field, there are (at least) two translation-invariant equilibrium states of opposite spontaneous magnetization. It was then natural to ask whether the Ising system could support non-translation-invariant states. This question was answered in the affirmative by Dobrushin [Dob72]. He showed that the Ising model in three (or more) dimensions has equilibrium states with a non-translation-invariant magnetization profile. This can be interpreted as resulting from an interface separating two pure phases of opposite magnetization. At sufficiently low temperatures, the fluctuations of an interface asymptotically parallel to a lattice (hyper)plane are bounded. When the temperature is increased one encounters a transition to a regime with divergent interface fluctuations. A long-standing open problem is to show that, in the three-dimensional Ising model, this so-called roughening transition occurs at a temperature strictly below the Curie temperature above which the spontaneous magnetization vanishes. It has been shown by Gallavotti [Gal72] that, in two dimensions, the roughening temperature is zero (see also [Mer79, Hig79 and Aiz80]), and one expects that in four or more dimensions, the roughening and the Curie temperatures coincide.

The Peierls argument was later generalized and made into a systematic tool by Pirogov and Sinai [PS75]. Finally, Pirogov-Sinai theory has been extended to quantum lattice systems in [BKU96] and [DFF96]. Our concern, in this note, is to extend Dobrushin's result on interface rigidity in dimension $d \ge 3$ to a general class of anisotropic quantum lattice systems. A simple example of a system amenable to our analytical methods is an anisotropic quantum Heisenberg model. This model describes a system of quantum mechanical spins, $S_x = (S_x^{(1)}, S_x^{(2)}, S_x^{(3)})$, located at the sites of the (hyper)cubic lattice $\mathbb{Z}^d$. The dynamics of the system confined to a finite region $\Lambda \subset \mathbb{Z}^d$ is given by a Hamiltonian
\[ H_\Lambda = \pm \sum_{\langle x, y \rangle \subset \Lambda} S_x^{(1)} S_y^{(1)} + \lambda \sum_{A \subset \Lambda} V_A, \tag{1.1} \]
where $\langle x, y \rangle$ denotes a pair of nearest-neighbor sites. The components, $S_x^{(i)}$, of the spin operators $S_x$ are matrices acting on a finite-dimensional Hilbert space $\mathcal{H}_x \simeq \mathbb{C}^{2j+1}$, $j = \tfrac12, 1, \tfrac32, \cdots$; the operators $V_A$ act on the tensor product space
\[ \mathcal{H}_A = \bigotimes_{x \in A} \mathcal{H}_x, \]
their norms decrease exponentially in the size of $A$ at a sufficiently fast rate (see Sect. 2 for precise definitions), and $\lambda$ is a coupling constant. The first term in $H_\Lambda$ is an Ising Hamiltonian; the second term, $\lambda \sum_{A \subset \Lambda} V_A$, is a quantum perturbation which need neither have any symmetries, nor be self-adjoint (e.g., $\lambda$ could be complex).

We study the properties of equilibrium states with Dobrushin boundary conditions forcing an (asymptotically) horizontal interface into the system, in the limit where $\Lambda$ increases to $\mathbb{Z}^d$. We show that, at sufficiently low temperatures $\beta^{-1}$ and for sufficiently small values of $\lambda$, the interface is rigid. The corresponding equilibrium states are translation-invariant in directions parallel to the interface. Expectation values of products of spin operators in these states exhibit exponential clustering, and they approach the corresponding pure phase expectation values exponentially fast in the distance to the flat interface.
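To make the structure of (1.1) concrete, the following sketch builds the Hamiltonian for the smallest possible case: two neighbouring spin-1/2 sites. The choices made here are assumptions for illustration only: the ferromagnetic sign is taken in front of the Ising term, and the perturbation $V_A$ is taken to be the transverse exchange $S_x^{(2)}S_y^{(2)} + S_x^{(3)}S_y^{(3)}$ on the single bond, as in an anisotropic Heisenberg model. The helper names (`kron`, `two_site_hamiltonian`) are hypothetical.

```python
from typing import List

Matrix = List[List[complex]]

# Spin-1/2 operators S^(i) = tau^(i) / 2, with tau^(i) the Pauli matrices.
S1: Matrix = [[0, 0.5], [0.5, 0]]
S2: Matrix = [[0, -0.5j], [0.5j, 0]]
S3: Matrix = [[0.5, 0], [0, -0.5]]


def kron(A: Matrix, B: Matrix) -> Matrix:
    """Kronecker product: rows indexed by (i, k), columns by (j, l)."""
    return [[A[i][j] * B[k][l]
             for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]


def two_site_hamiltonian(lam: complex) -> Matrix:
    """H = -S1 x S1 + lam * (S2 x S2 + S3 x S3) on C^2 tensor C^2."""
    ising = kron(S1, S1)
    pert_a, pert_b = kron(S2, S2), kron(S3, S3)
    return [[-ising[r][c] + lam * (pert_a[r][c] + pert_b[r][c])
             for c in range(4)] for r in range(4)]


H = two_site_hamiltonian(0.1)
for row in H:
    print(row)
```

For real `lam` the resulting 4 by 4 matrix is Hermitian. For complex `lam` it is not, which matches the remark that the perturbation need not be self-adjoint.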
Next, we sketch some of the key ideas involved in the proofs of such results. The first step is to derive a Feynman-Kac type representation for $\exp(-\beta H_\Lambda)$, converting a $d$-dimensional quantum system into a $(d+1)$-dimensional classical system with one continuous, compact dimension. By a resummation procedure, this system is then mapped onto a classical contour model defined on the $(d+1)$-dimensional lattice $\mathbb{Z}^d \times (\mathbb{Z}/N\mathbb{Z})$, where $N$ is proportional to the inverse temperature $\beta$. An unconventional feature of the classical contour model is that its contour weights are not, in general, positive or even real-valued. While the complications arising from non-positive contour weights have been tackled in the analysis of periodic states in [BI92], they have been studied in the context of interfacial states only recently; see [BCF96]. The key idea in the analysis of Dobrushin states is to condition on an arbitrary interfacial contour, or interface, satisfying Dobrushin's boundary conditions and to resum over all remaining contours compatible with the interfacial contour. This yields an effective model for the interface with weights given in terms of pure-phase partition functions. Explicit expressions for these weights can be derived with the help of Pirogov-Sinai theory. The next step is to decompose the interface into flat pieces and interfacial excitations. At low temperatures, these excitations are weakly interacting and form a dilute system that can be studied with the help of expansion methods. As a result, one finds that interface fluctuations are bounded, uniformly in the size of the system. Although in this note we only formulate and prove results for the simplest class of systems, the methods described in [BCF96] and illustrated in this paper have considerably more general applications to the analysis of surface and interface phenomena, some of which will be sketched presently.
First, under suitable assumptions on the weights of interfaces, and borrowing techniques of e.g. [BF85], one can extend Gallavotti’s results [Gal72] to some class of two-dimensional quantum lattice systems. Although at positive temperature interfaces in two-dimensional lattice systems are usually rough, they tend to be rigid at zero temperature. For a large class of classical lattice systems, this can be proven by explicit computation. The question then arises whether quantum-mechanical perturbations might destabilize such zero-temperature interfaces. Our techniques enable us to show that, for sufficiently small perturbations, and under suitable assumptions on the classical interactions, this does not happen. We shall not, however, pursue these issues in this paper. Second, our methods can be extended to the general setting presented in [BKU96] and [DFF96] provided one makes natural, additional assumptions ensuring that interfacial excitations are suppressed at low temperatures and ruling out that a third (pure) phase emerges in between the two coexisting stable phases favored by the boundary conditions, a phenomenon known as (complete) interfacial wetting. Our methods enable us, for example, to analyze the surface separating a dense from a dilute phase in a quantum lattice gas of fermions or (hard core) bosons, or the interface between two distinct charge density wave states of an extended Hubbard model. In order to describe these and further applications more precisely, we consider a t-J Hamiltonian defined by
C. Borgs, J. T. Chayes, J. Fröhlich
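$$
\begin{aligned}
H_\Lambda ={}& -t \sum_{\substack{\langle xy\rangle\subset\Lambda \\ \sigma=\uparrow,\downarrow}} \{c^\dagger_{x\sigma} c_{y\sigma} + c^\dagger_{y\sigma} c_{x\sigma}\} + U \sum_{x\in\Lambda} n_{x\uparrow} n_{x\downarrow} + \frac{U'}{2} \sum_{\langle xy\rangle\subset\Lambda} n_x n_y \\
& - \sum_{\langle xy\rangle\subset\Lambda} \{J S^{(1)}_x S^{(1)}_y + \lambda (S^{(2)}_x S^{(2)}_y + S^{(3)}_x S^{(3)}_y)\} - \mu \sum_{x\in\Lambda} n_x + \cdots
\end{aligned}
\tag{1.2}
$$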
Here $c^\dagger_{x\sigma}$, $c_{x\sigma}$ are standard creation and annihilation operators satisfying canonical anticommutation relations, $n_{x\sigma} = c^\dagger_{x\sigma} c_{x\sigma}$, $n_x = \sum_{\sigma=\uparrow,\downarrow} n_{x\sigma}$, $S^{(\alpha)}_x = \frac{1}{2} \sum_{\sigma,\sigma'} c^\dagger_{x\sigma} \tau^{(\alpha)}_{\sigma\sigma'} c_{x\sigma'}$, where $\tau^{(1)}$, $\tau^{(2)}$ and $\tau^{(3)}$ are the usual Pauli matrices, $U$ and $U'$ are positive constants, J and λ are exchange couplings, and µ is a chemical potential. The dots stand for further terms depending e.g. on an external magnetic field (they will be set to zero below). For J = λ = 0 this model is the extended Hubbard model referred to earlier in this introduction. We consider the parameter range
$$
0 \le |\lambda|, \qquad t \ll |J| \ll U' \ll U. \tag{1.3}
$$
Then, for an open interval of values of the chemical potential µ contained in $0 < \mu < d\,(U' - \frac{|J|}{2})$, the ground states of the Hamiltonian $H_\Lambda$ introduced in (1.2) describe configurations of particles in which two neighboring particles are separated by an empty site. These are “checkerboard configurations” corresponding to a quarter-filled lattice. More precisely, the states described here are the ground states of the model when the hopping term vanishes (t = 0). For t in the range described in (1.3), the true ground states are small perturbations of these states. For t = 0, the ground state energy density is given by $\varepsilon_0 = -\frac{\mu}{2}$. This ground state energy has a macroscopic degeneracy $2 \cdot 2^{|\Lambda|/2}$, where $|\Lambda|$ is the number of sites in Λ, because the spins of the particles can have an arbitrary orientation (note that exchange couplings are nearest-neighbor). It is easy to choose Dobrushin boundary conditions at $\partial\Lambda$ such that, for t = 0, the ground state of the system has an interface separating two periodic ground states (i.e. two “checkerboard configurations”) obtained from each other by translation by one lattice unit, as shown in Fig. 1 below. It is important to note that quantum-mechanical hopping generates effective next-nearest-neighbor and longer-range exchange couplings, as discussed in [DFFR96]. However, as long as t, |J| and |λ| are much smaller than $U'$ and $U$, we can show that there is an intermediate range of temperatures with the property that there are Dobrushin states with stable interfaces of the type indicated in Fig. 1, but the spins of the particles are in a disordered state [BCF97]. This result holds, of course, even when |λ| = |J|. One may now ask what happens when the temperature is lowered. A combination of the methods and results developed in [DFFR96] with the methods of [BCF96] and of the present paper shows that, e.g.
in the range (1.3), the spin orientation of the particles is predominantly σ = ↑ if “up” (↑) boundary conditions are imposed at the boundary $\partial\Lambda$, provided the temperature is sufficiently low. Ferromagnetic spin ordering at low temperatures is a consequence of quantum-mechanical hopping and of the presence of nearest-neighbor ferromagnetic exchange couplings. An interesting phenomenon that is likely to be encountered in the study of these systems is that the interface orders at higher temperatures than the bulk: We again
Fig. 1. A Dobrushin state formed by two translates of a periodic state
impose Dobrushin boundary conditions on particle positions forcing an interface of the type sketched in Fig. 1 into the system and “up” boundary conditions on spins at $\partial\Lambda$. We conjecture that, in an intermediate range of temperatures, the spins along the interface are ferromagnetically ordered, while the spins in the bulk are still disordered. Unfortunately, single-scale expansion methods are probably not powerful enough to prove this conjecture. A result which can, however, be obtained by combining the perturbation-theoretic methods of [DFFR96] with the Pirogov-Sinai theory developed in [BKU96] and [DFF96] and with the methods of [BCF96] and of this paper is that, at low temperatures, there are states with a non-trivial magnetization profile, as sketched in Fig. 2 below, obtained by imposing Dobrushin “up-down” boundary conditions on the spins at $\partial\Lambda$.
Fig. 2. An “up-down” Dobrushin state
For $0 < \mu < d\,(U' - \frac{J}{2})$, such states exist in the range $0 < t \ll J \ll U' \ll U$, $|\lambda| \ll J$, provided the temperature is sufficiently low. The interesting phenomenon is that spin ordering and the magnetization profile are due to quantum-mechanical hopping. If the hopping amplitude t were zero, neither ferromagnetic ordering nor Dobrushin states would exist. The phenomena that surfaces or interfaces order before the bulk does and that surfaces and interfaces can exhibit order-disorder and Kosterlitz-Thouless type transitions in temperature ranges where the bulk is still disordered are interesting in the context of surface magnetism [PP90, KP93]. As remarked above, the t-J model defined in (1.2) is likely to exhibit such phenomena, but it may be too difficult, technically, to establish their existence rigorously. For this reason, we have identified a class of quantum lattice systems, ranging from modified t-J models over systems of localized magnetic ions to valence bond solids with mobile surface electrons, for which we can, in principle, establish the existence of surface ordering and surface phase transitions. However, a detailed description of our ideas and proofs [BCF97] goes beyond the scope of the present paper.
We conclude this introduction by drawing attention to some major, unsettled issues in the theory of surfaces and interfaces.
(1) While there are simple models of lattice surfaces for which the existence of a roughening transition has been established [vB77, FS81], we do not know of any realistic model for which the existence of an interfacial roughening transition at a temperature strictly below the Curie temperature has been established rigorously.
(2) The expansion methods used in [BCF96] and in this paper fail in the analysis of surfaces exhibiting gapless surface modes. The mathematical study of surfaces and interfaces supporting gapless modes is in a rudimentary stage. In this connection, recent results in [Nac96] are of considerable interest.
(3) An interface separating two stable pure phases can be wetted by a third phase. Partial wetting of interfaces is a very common phenomenon and, under suitable technical assumptions, can be analyzed. However, if a point of coexistence of the third phase with the two stable phases is approached, the phenomenon of complete interfacial wetting is encountered. It is a consequence of entropic repulsion. Present analytical methods appear to be inadequate to describe it quantitatively.
We hope that this and subsequent papers will draw renewed attention to some interesting phenomena in the theory of surfaces and interfaces and to some novel analytical methods useful to study them. Our efforts have been helped in important ways by collaborations and discussions with Roberto Fernandez and Roman Kotecký.
2. Definition of the Model and Statement of Results
In this paper, we consider quantum lattice systems with Hamiltonians of the form
$$
H = H^{(0)} + \lambda V, \tag{2.1}
$$
where $H^{(0)}$ is diagonal in a basis $|s\rangle$ labeled by the configurations $s = \{s_x\}$ of a classical lattice system on $\mathbb{Z}^d_{1/2} = (1/2, \dots, 1/2) + \mathbb{Z}^d$, while V is a finite-range or exponentially decaying interaction that is not necessarily diagonal in the basis $|s\rangle$. While our methods apply to many bosonic and fermionic lattice systems, our results are most easily stated for quantum spin systems. The Hilbert space $\mathcal{H}_\Lambda$ associated with a finite subset Λ of $\mathbb{Z}^d_{1/2}$ is a tensor product
$$
\mathcal{H}_\Lambda = \bigotimes_{x\in\Lambda} \mathcal{H}_x, \tag{2.2}
$$
and $\mathcal{H}_x$ is isomorphic to a fixed finite-dimensional Hilbert space $\mathcal{H}_0$. For these models, our main assumption is that there is a basis $\{|s_0\rangle : s_0 \in S\}$ of $\mathcal{H}_0$, with S a finite set, such that $H^{(0)}$ is diagonal in the tensor product basis $|s\rangle = \otimes_x |s_x\rangle$, where $s : \mathbb{Z}^d_{1/2} \to S : x \mapsto s_x$. In [BKU96] and [DFF96], it was assumed that the corresponding classical Hamiltonian, $H^{(0)}(s)$, is finite-range with translation-invariant interactions and has finitely many periodic ground states, and that a suitable Peierls condition for the excitations above these ground states and a so-called degeneracy removing condition for the dependence of the ground state energies on the external parameters of the model hold. The perturbation V was assumed to be a sum of local potentials $V_A$:
$$
V = \sum_A V_A, \tag{2.3}
$$
where $V_A$ is decaying exponentially in the size of the minimal connected set that covers A, with a large enough decay constant $\gamma_Q$. Under these assumptions, it was shown that the periodic states of the system (2.1) can be controlled by convergent expansions if λ and the temperature 1/β are small enough. In order to construct non-periodic states for the models discussed in [BKU96] and [DFF96], one must impose several additional assumptions on the unperturbed Hamiltonian $H^{(0)}$. These assumptions have to be designed to guarantee that local deviations from a flat interface are suppressed and to preclude complete wetting. Complete wetting is the phenomenon of “interface thickening,” leading to a periodic state in the thermodynamic limit. These assumptions are essentially the same as those needed for the construction of Dobrushin states in classical lattice systems, see e.g. [HKZ88] or [BI92, Sect. 5].
While our results can be proven in the general context discussed above, we refrain from doing so here in order to simplify the presentation of proofs and results. In particular, we assume that the system possesses only two ground states, making assumptions on the absence of wetting unnecessary. In addition, we assume that the main contribution to the classical part of the interaction comes from on-site and nearest-neighbor interactions, so that we can absorb the remaining parts of the classical interaction into the perturbation V. Note that this includes, as a special case, purely classical systems which are perturbations of nearest-neighbor Hamiltonians by exponentially decaying classical interactions. From now on, we therefore assume that the Hamiltonian $H^{(0)}$ is of the form
$$
H^{(0)} = \sum_{\langle xy\rangle} h^{(0)}_{xy}, \tag{2.4}
$$
where the sum runs over nearest-neighbor pairs in $\mathbb{Z}^d_{1/2}$, and $h^{(0)}_{xy}$ is a self-adjoint operator on $\mathcal{H}_{\langle xy\rangle}$ diagonal in the basis $|s\rangle$,
$$
h^{(0)}_{xy} |s\rangle = h^{(0)}_{xy}(s_x, s_y)\, |s\rangle, \tag{2.5}
$$
with an eigenvalue $h^{(0)}_{xy}(s_x, s_y)$ transforming covariantly under translations and rotations, so that $h^{(0)}_{xy}(s_x, s_y)$ depends only on the values assigned to $s_x$ and $s_y$, and not on the orientation or position of the bond $\langle xy\rangle$,
$$
h^{(0)}_{xy}(s_y, s_x) = h^{(0)}(s_x, s_y) = h^{(0)}(s_y, s_x). \tag{2.6}
$$
Furthermore, we assume the existence of two translation-invariant reference configurations $r^{(q)} : x \mapsto r^{(q)}_x$, q = ±, and the existence of a Peierls constant $\gamma_{cl} > 0$, such that
$$
h^{(0)}(s_x, s_y) \ge \max_{q=\pm} h^{(0)}(r^{(q)}_x, r^{(q)}_y) + \gamma_{cl} \quad \text{for all } \langle xy\rangle \text{ and all } s \ne r^{(\pm)}. \tag{2.7}
$$
For example, for the ferromagnetic version of the Hamiltonian (1.1), the reference configurations are $r^{(\pm)}_x \equiv \pm j$. The condition (2.7) guarantees that classical excitations above the two reference states are suppressed on the energy scale $\gamma_{cl}$. As for the quantum perturbation, we assume, in addition to translation covariance, that
$$
|||V|||_{\gamma_Q} = \sum_{A : x \in A} \|V_A\|\, e^{\gamma_Q d(A)} < \infty, \tag{2.8}
$$
where $\|V_A\|$ is the operator norm of $V_A$, $\gamma_Q$ is a sufficiently large constant, and d(A) is the size of the smallest tree on $\mathbb{Z}^d_{1/2}$ that contains all points in A.
Remark. Note that periodic Hamiltonians and/or ground states, as e.g. the two staggered ground states of the antiferromagnetic Ising model, can be easily described by the above formalism by considering spin configurations $s_B$ on small blocks B as spins in an enlarged spin space $S' = S^B$.
The above assumptions allow us to apply the methods of [BKU96] and [DFF96]. In order to state their results, we need some notation. Given a finite box
$$
\Lambda = \Lambda(L) = \{x \in \mathbb{Z}^d_{1/2} \mid -L < x_i < L \ \ \forall i = 1, \dots, d\}, \tag{2.9}
$$
we define
$$
\bar\Lambda = \{x \in \mathbb{Z}^d_{1/2} \mid \operatorname{dist}(x, \Lambda) \le 1\} \tag{2.10a}
$$
and
$$
\bar{\bar\Lambda} = \{x \in \mathbb{Z}^d_{1/2} \mid \operatorname{dist}(x, \Lambda) \le 2\} \tag{2.10b}
$$
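and introduce the operators
$$
H^{(0)}_{\Lambda,q} = \sum_{x\in\bar\Lambda} \langle r^{(q)}_{\bar{\bar\Lambda}\setminus\Lambda} |\, H^{(0)}_x \,| r^{(q)}_{\bar{\bar\Lambda}\setminus\Lambda} \rangle \tag{2.11}
$$
and
$$
H_{\Lambda,q} = H^{(0)}_{\Lambda,q} + \lambda \sum_{A\subset\Lambda} V_A. \tag{2.12}
$$
Here $H^{(0)}_x$ is the operator
$$
H^{(0)}_x = \frac{1}{2} \sum_{\langle uv\rangle : x \in \{u,v\}} h^{(0)}_{uv}, \tag{2.13}
$$
and
$$
| r^{(q)}_{\bar{\bar\Lambda}\setminus\Lambda} \rangle = \bigotimes_{x\in\bar{\bar\Lambda}\setminus\Lambda} | r^{(q)}_x \rangle. \tag{2.14}
$$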
We denote the Gibbs state corresponding to the Hamiltonian (2.12) by $\langle\cdot\rangle^{\beta,\lambda}_{\Lambda,q}$. In a similar way, one introduces finite-volume Gibbs states with periodic boundary conditions, denoted by $\langle\cdot\rangle^{\beta,\lambda}_{\Lambda,\mathrm{per}}$ in the sequel.
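Theorem 2.1. [BKU96], [DFF96] Let d ≥ 2. Then there is a set of stable states $Q = Q(H^{(0)}, \lambda V, \beta) = \{+\}$ or $\{-\}$ or $\{+,-\}$, and positive finite constants $\gamma_0 = \gamma_0(d)$ and = (d), such that the following statements are true, provided
$$
\gamma(\beta, \lambda) := \max_{\gamma_Q \ge 0} \min\Big\{ \beta\gamma_{cl},\ \gamma_Q,\ \frac{\gamma_{cl}}{2e|\lambda|\, |||V|||_{\gamma_Q}} \Big\} - \log(2e|S|) > \gamma_0(d), \tag{2.15}
$$
where e = 2.7...; (see Remark (i) following Lemma 3.1 for the interpretation of γ(β, λ)).
i) For q ∈ Q and all local observables A, the limit
$$
\langle A\rangle^{\beta,\lambda}_q = \lim_{L\to\infty} \langle A\rangle^{\beta,\lambda}_{\Lambda(L),q} \tag{2.16}
$$
exists and describes a translation-invariant pure state with exponential clustering for truncated expectation values of local observables.
ii) For all local observables A, there exists a constant $C_A$ such that
$$
\big| \langle A\rangle^{\beta,\lambda}_q - \langle r^{(q)} |\, A \,| r^{(q)} \rangle \big| \le C_A\, e^{-\gamma(\beta,\lambda)}, \tag{2.17}
$$
provided q ∈ Q.
iii) For all local observables A, the limit
$$
\langle A\rangle^{\beta,\lambda}_{\mathrm{per}} = \lim_{L\to\infty} \langle A\rangle^{\beta,\lambda}_{\Lambda(L),\mathrm{per}} \tag{2.18}
$$
exists, and
$$
\langle A\rangle^{\beta,\lambda}_{\mathrm{per}} = \frac{1}{|Q|} \sum_{q\in Q} \langle A\rangle^{\beta,\lambda}_q. \tag{2.19}
$$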
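To make condition (2.15) concrete, the following Python sketch evaluates γ(β, λ) by a grid search over $\gamma_Q$. It is an illustration only: the function names, the grid, and the toy norm $|||V|||_{\gamma_Q} = 2d\, e^{\gamma_Q}$ (a hypothetical nearest-neighbour perturbation with $\|V_A\| = 1$ and $d(A) = 1$) are assumptions made for the example and are not taken from [BKU96] or [DFF96].

```python
import math

def gamma_decay(beta, lam, gamma_cl, n_states, v_norm, gamma_q_grid):
    """Grid-search evaluation of gamma(beta, lambda) as written in (2.15).

    v_norm(gamma_q) must return the norm |||V|||_{gamma_Q} of (2.8) and
    n_states is |S|.  The maximum over gamma_Q >= 0 is only approximated
    on the supplied grid.
    """
    best = -math.inf
    for gamma_q in gamma_q_grid:
        candidates = [beta * gamma_cl, gamma_q]
        if lam != 0:
            # Third entry of the min in (2.15); it is absent for lambda = 0.
            candidates.append(gamma_cl / (2 * math.e * abs(lam) * v_norm(gamma_q)))
        best = max(best, min(candidates))
    return best - math.log(2 * math.e * n_states)

def toy_norm(d):
    """Hypothetical norm: nearest-neighbour V with ||V_A|| = 1 and d(A) = 1."""
    return lambda gamma_q: 2 * d * math.exp(gamma_q)

grid = [0.01 * k for k in range(2001)]  # gamma_Q between 0 and 20
classical = gamma_decay(10.0, 0.0, 1.0, 2, toy_norm(3), grid)
perturbed = gamma_decay(10.0, 1e-3, 1.0, 2, toy_norm(3), grid)
```

In this toy setting the classical value is limited by $\beta\gamma_{cl}$, while switching on a small λ lowers γ(β, λ) through the third entry of the minimum, which is the trade-off between $\gamma_Q$ and $|||V|||_{\gamma_Q}$ that the maximization in (2.15) resolves.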
Remark. As usual in Pirogov-Sinai theory, the set Q of stable states is given in terms of certain “meta-stable free energies” $f_\pm$ as
$$
Q = \{q \in \{+,-\} \mid \operatorname{Re} f_q = f_0\}, \tag{2.20}
$$
where
$$
f_0 = \min\{\operatorname{Re} f_-, \operatorname{Re} f_+\}. \tag{2.21}
$$
In order to obtain the existence of a hypersurface (of codimension one) in the space of coupling constants where two stable states coexist, i.e. Q = {+, −}, one assumes either the existence of a symmetry relating the two phases + and −, or a so-called degeneracy removing condition for the Hamiltonian $H^{(0)}$ together with suitable smoothness assumptions on the perturbation V, see [BKU96] and [DFF96] for the precise statements. For the purpose of this paper, we will assume that $H^{(0)}$ and λV have been chosen such that the coexistence condition $\operatorname{Re} f_+ = \operatorname{Re} f_-$ is satisfied and therefore
$$
Q = Q(H^{(0)}, \lambda V, \beta) = \{+,-\}. \tag{2.22}
$$
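Next we define finite-volume states with Dobrushin boundary conditions in a box of the form
$$
\Lambda(L_\perp, L) = \{x \in \mathbb{Z}^d_{1/2} \mid -L_\perp < x_1 < L_\perp,\ -L < x_i < L \ \ \forall i \ge 2\}. \tag{2.23}
$$
Introducing the reference configuration
$$
r^{(+-)}_x = \begin{cases} r^{(+)}_x & \text{if } x_1 > 0 \\ r^{(-)}_x & \text{if } x_1 < 0, \end{cases} \tag{2.24}
$$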
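we define
$$
H^{(0)}_{\Lambda,+-} = \sum_{x\in\bar\Lambda} \langle r^{(+-)}_{\bar{\bar\Lambda}\setminus\Lambda} |\, H^{(0)}_x \,| r^{(+-)}_{\bar{\bar\Lambda}\setminus\Lambda} \rangle \tag{2.25}
$$
and
$$
H_{\Lambda,+-} = H^{(0)}_{\Lambda,+-} + \lambda \sum_{A\subset\Lambda} V_A, \tag{2.26}
$$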
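where $\bar\Lambda$, $\bar{\bar\Lambda}$ and $H^{(0)}_x$ are again given by (2.10) and (2.13). The expectation value $\langle A\rangle^{\beta,\lambda}_{\Lambda,+-}$ of an observable A in Λ (i.e., of a bounded self-adjoint operator $A : \mathcal{H}_\Lambda \to \mathcal{H}_\Lambda$) is then defined as
$$
\langle A\rangle^{\beta,\lambda}_{\Lambda,+-} = \frac{1}{Z^{\beta,\lambda}_{+-}(\Lambda)} \operatorname{Tr}_{\mathcal{H}_\Lambda} A\, e^{-\beta H_{\Lambda,+-}}, \tag{2.27}
$$
where $Z^{\beta,\lambda}_{+-}(\Lambda)$ is the partition function
$$
Z^{\beta,\lambda}_{+-}(\Lambda) = \operatorname{Tr}_{\mathcal{H}_\Lambda} e^{-\beta H_{\Lambda,+-}}. \tag{2.28}
$$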
In this general setting, the infinite-volume state resulting from $\langle\cdot\rangle^{\beta,\lambda}_{\Lambda,+-}$ can exhibit many interesting physical phenomena, including order-disorder transitions in the interface that are not accompanied by any bulk transition. While our methods allow us to study these phenomena [BCF97], we refrain from doing this here. We will therefore formulate assumptions that guarantee that the minimal energy interface of the classical Hamiltonian $H^{(0)}_{\Lambda,+-}$ is unique and is given by the reference configuration $r^{(+-)}_\Lambda$. Note that this also excludes the cases in which it is energetically favorable to pass through one or more excited states in order to obtain a transition from $r^{(+)}$ to $r^{(-)}$, leading to an interface with width greater than one. While such thick interfaces, assuming uniqueness and suitable bounds on excitations, do not lead to new physics and can be easily treated by our methods, we exclude these cases here to keep our notation as simple as possible. In order to state the assumptions discussed above, we define
$$
\delta(s_x, s_y) = h^{(0)}(s_x, s_y) - \min_{q=\pm} h^{(0)}(r^{(q)}_x, r^{(q)}_y). \tag{2.29}
$$
We then assume that for all finite N ≥ 2 and all $s^{(i)}_0 \in S$, i = 0, 1, ..., N, we have
$$
\delta(s^{(0)}_0, s^{(N)}_0) \le \sum_{i=1}^{N} \delta(s^{(i-1)}_0, s^{(i)}_0), \tag{2.30}
$$
provided $s^{(0)}_0 = r^{(-)}_0$ and $s^{(N)}_0 = r^{(+)}_0$.
Lemma 2.2. i) Assume that
$$
\delta(r^{(-)}_0, r^{(+)}_0) \le \delta(r^{(-)}_0, s_0) + \delta(s_0, r^{(+)}_0) \tag{2.31}
$$
for all $s_0 \in S$, and that
$$
\delta(s_0, r^{(+)}_0) \le \delta(s_0, \tilde s_0) + \delta(\tilde s_0, r^{(+)}_0) \tag{2.32a}
$$
for all $s_0, \tilde s_0 \in S$, or that
$$
\delta(s_0, r^{(-)}_0) \le \delta(s_0, \tilde s_0) + \delta(\tilde s_0, r^{(-)}_0) \tag{2.32b}
$$
for all $s_0, \tilde s_0 \in S$. Then assumption (2.30) is satisfied for all N ≥ 2.
ii) The assumption (2.30) is satisfied for the spin j Hamiltonian (1.1) discussed in the introduction.
Proof. i) We proceed by induction on N. For N = 2, (2.30) is just assumption (2.31). Assume therefore that (2.30) has been proven for N. We then rewrite the right-hand side of (2.30) for N → N + 1 as
$$
\sum_{i=1}^{N+1} \delta(s^{(i-1)}_0, s^{(i)}_0) = \sum_{i=1}^{N-1} \delta(s^{(i-1)}_0, s^{(i)}_0) + \delta(s^{(N-1)}_0, s^{(N)}_0) + \delta(s^{(N)}_0, r^{(+)}_0). \tag{2.33}
$$
Using the bound (2.32a), we bound the sum of the last two terms from below by $\delta(s^{(N-1)}_0, r^{(+)}_0)$, which allows us to use the inductive assumption on the resulting sum. The proof of (2.30) under assumption (2.32b) instead of (2.32a) is similar.
ii) For this model, $\delta(s_0, \tilde s_0) = j^2 - s_0 \tilde s_0$ and the bound (2.31) is saturated. On the other hand, since $-j \le s_0, \tilde s_0 \le j$,
$$
\delta(s_0, \tilde s_0) + \delta(\tilde s_0, r^{(+)}_0) = 2j^2 - \tilde s_0 (s_0 + j) \ge 2j^2 - j(s_0 + j) = j^2 - j s_0 = \delta(s_0, r^{(+)}_0),
$$
which proves (2.32a).
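Part ii) of Lemma 2.2 can also be checked by brute force for small spins. The following Python sketch is an illustration and not part of the proof: it assumes the form $\delta(s_0, \tilde s_0) = j^2 - s_0 \tilde s_0$ quoted above, with $s_0$ ranging over −j, −j+1, ..., j, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def spin_values(two_j):
    """Allowed values -j, -j+1, ..., j of a spin-j variable, with j = two_j / 2."""
    return [Fraction(m, 2) for m in range(-two_j, two_j + 1, 2)]

def delta(s, t, j):
    """Excess bond energy delta(s, t) = j^2 - s*t (assumed form, see lead-in)."""
    return j * j - s * t

def check_path_bound(two_j, max_n):
    """Test (2.30) on every path from r^(-) = -j to r^(+) = +j with 2 <= N <= max_n."""
    j = Fraction(two_j, 2)
    values = spin_values(two_j)
    for n in range(2, max_n + 1):
        # A path has N steps, hence N - 1 free intermediate values.
        for middle in product(values, repeat=n - 1):
            path = (-j,) + middle + (j,)
            total = sum(delta(path[i - 1], path[i], j) for i in range(1, n + 1))
            if delta(-j, j, j) > total:
                return False
    return True

spin_half_ok = check_path_bound(1, 5)
spin_one_ok = check_path_bound(2, 4)
```

Exact rational arithmetic is used so that the saturated case of (2.31), where both sides coincide, is not spoiled by rounding.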
For the models considered in this section, our main result is the following theorem. In order to state it, we recall that a local observable is a self-adjoint operator on $\mathcal{H}_\Lambda$ for some finite Λ. Also, we use the symbol $t_x(A)$ to denote the translate of a local observable by a vector $x \in \mathbb{Z}^d$.
Theorem 2.3. Let d ≥ 3 and let , $\gamma_0$, γ = γ(β, λ), $H^{(0)}$, and λV be as in Theorem 2.1. Assume in addition that $Q = Q(H^{(0)}, \lambda V, \beta) = \{+,-\}$, and that the bound (2.30) is satisfied for all N ≥ 2. Then
i) The limit
$$
\langle A\rangle^{\beta,\lambda}_{+-} = \lim_{L\to\infty} \lim_{L_\perp\to\infty} \langle A\rangle^{\beta,\lambda}_{\Lambda(L_\perp,L),+-} \tag{2.34}
$$
exists for all local observables A, and is translation-invariant in the horizontal directions (i.e., in the directions orthogonal to the 1-direction).
ii) For all local observables A, there exist constants $C_A < \infty$, such that
$$
\big| \langle t_x(A)\rangle^{\beta,\lambda}_{+-} - \langle A\rangle^{\beta,\lambda}_{+} \big| \le C_A\, e^{-(\gamma-\gamma_0)|x_1|} \tag{2.35}
$$
if $x_1 > 0$, and
$$
\big| \langle t_x(A)\rangle^{\beta,\lambda}_{+-} - \langle A\rangle^{\beta,\lambda}_{-} \big| \le C_A\, e^{-(\gamma-\gamma_0)|x_1|} \tag{2.36}
$$
if $x_1 < 0$.
iii) For all local observables A and B, there exist constants $C_{AB} < \infty$ such that
$$
\big| \langle A\, t_x(B)\rangle^{\beta,\lambda}_{+-} - \langle A\rangle^{\beta,\lambda}_{+-} \langle t_x(B)\rangle^{\beta,\lambda}_{+-} \big| \le C_{AB}\, e^{-(\gamma-\gamma_0)|x|}. \tag{2.37}
$$
3. Proofs
The result of this section is the proof of Theorem 2.3, which we divide into several parts.
3.1. Contours and interfaces. In order to prove our results, we use the Duhamel-Phillips (or Dyson) expansion to rewrite our model in terms of space-time contours and interfaces. While we could, in principle, use either the continuous time approach of [DFF96] or the blocked version of [BKU96], here we use the latter since it allows us to map our system onto a classical interface model on a space-time lattice, which can then be treated by standard methods [Dob72, Gal72, BF85, HKZ88, BCF96]. Following [BKU96], we rewrite the partition function (2.28) in the form
$$
Z^{\beta,\lambda}_{+-}(\Lambda) = \operatorname{Tr}_{\mathcal{H}_\Lambda} T^M, \quad \text{where} \quad T = e^{-\tilde\beta H_{\Lambda,+-}} \quad \text{and} \quad \tilde\beta = \frac{\beta}{M}, \tag{3.1}
$$
with an integer M to be chosen later. We then expand the partition function (2.28) around the partition function of the corresponding classical spin system using the Duhamel formula for the transfer matrix $T = e^{-\tilde\beta H_{\Lambda,+-}}$ (see e.g. [SS76] for a discussion of the Duhamel formula). The Duhamel expansion (or Dyson series) for the operator T is
$$
T = \sum_{\mathbf{n}} \Big[ \prod_{A\in\mathcal{A}_0} \frac{(-\lambda)^{n_A}}{n_A!} \int_0^{\tilde\beta} d\tau^1_A \dots d\tau^{n_A}_A \Big]\, T(\tau, \mathbf{n}). \tag{3.2}
$$
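The structure of (3.2) can be checked numerically on a toy example. The Python sketch below is not taken from the paper: it assumes a two-level system with diagonal $H^{(0)} = \mathrm{diag}(0, 1)$, a single off-diagonal perturbation V, $\tilde\beta = 1$, and hypothetical helper names (`expm`, `first_order_duhamel`, `truncation_error`). It compares $e^{-\tilde\beta(H^{(0)} + \lambda V)}$ with the terms of order zero and one in λ, so the discrepancy should scale like $\lambda^2$.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(a, terms=40):
    """exp(a) from its Taylor series; adequate for the small matrices used here."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def first_order_duhamel(energies, v, lam, beta, steps=2000):
    """T^(0) - lam * int_0^beta e^{-s H0} V e^{-(beta - s) H0} ds, H0 = diag(energies).

    The integral over the insertion time s uses the midpoint rule.
    """
    n = len(energies)
    out = [[math.exp(-beta * energies[i]) if i == j else 0.0 for j in range(n)]
           for i in range(n)]
    ds = beta / steps
    for m in range(steps):
        s = (m + 0.5) * ds
        for i in range(n):
            for j in range(n):
                out[i][j] -= (lam * ds * math.exp(-s * energies[i]) * v[i][j]
                              * math.exp(-(beta - s) * energies[j]))
    return out

def truncation_error(lam, energies=(0.0, 1.0), beta=1.0):
    """Largest matrix entry of exp(-beta (H0 + lam V)) minus its first-order expansion."""
    v = [[0.0, 1.0], [1.0, 0.0]]
    n = len(energies)
    h = [[(energies[i] if i == j else 0.0) + lam * v[i][j] for j in range(n)]
         for i in range(n)]
    exact = expm([[-beta * x for x in row] for row in h])
    approx = first_order_duhamel(energies, v, lam, beta)
    return max(abs(exact[i][j] - approx[i][j]) for i in range(n) for j in range(n))

err_coarse = truncation_error(0.1)
err_fine = truncation_error(0.05)
```

Halving λ should reduce the error by a factor close to four, which is the signature of the neglected second-order term with two insertions of V.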
Here $\mathcal{A}_0$ is the family of all sets A contributing to the sum (2.26), $\mathbf{n}$ is a multi-index $\mathbf{n} : \mathcal{A}_0 \to \{0, 1, \dots\} : A \mapsto n_A$ with finite $n = \sum_{A\in\mathcal{A}_0} n_A$, $\tau = \{\tau^1_A, \dots, \tau^{n_A}_A, A \in \mathcal{A}_0\} \in [0, \tilde\beta]^n$, and the operator $T(\tau, \mathbf{n})$ is obtained from $T^{(0)} = e^{-\tilde\beta H^{(0)}_{\Lambda,+-}}$ by “inserting” the operator $V_A$ at the times $\tau^1_A, \dots, \tau^{n_A}_A$. Formally, it can be defined as follows. For a given $\mathbf{n}$ and τ, let $\operatorname{supp} \mathbf{n} \equiv \mathcal{A} = \{A_1, \dots, A_k\}$ be the set of all $A \in \mathcal{A}_0$ with $n_A \ne 0$, $n_i = n_{A_i}$, and $V_i = V_{A_i}$. Let $(s_1, \dots, s_n) = \pi(\tau^1_{A_1}, \dots, \tau^{n_1}_{A_1}, \dots, \tau^1_{A_k}, \dots, \tau^{n_k}_{A_k})$ be a permutation of the times τ such that $s_1 \le s_2 \le \dots \le s_n$, and set $(\tilde V_1, \dots, \tilde V_n) = \pi(V_1, \dots, V_1, \dots, V_k, \dots, V_k)$, where on the right-hand side each $V_i$ appears exactly $n_i$ times. Then $T(\tau, \mathbf{n})$ is defined by
$$
T(\tau, \mathbf{n}) = e^{-s_1 H^{(0)}_{\Lambda,+-}}\, \tilde V_1\, e^{-(s_2 - s_1) H^{(0)}_{\Lambda,+-}}\, \tilde V_2 \cdots e^{-(s_n - s_{n-1}) H^{(0)}_{\Lambda,+-}}\, \tilde V_n\, e^{-(\tilde\beta - s_n) H^{(0)}_{\Lambda,+-}}. \tag{3.3}
$$
Next we resum (3.2) to obtain the expansion
$$
T = \sum_{B\subset\Lambda} T(B), \tag{3.4}
$$
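where
$$
T(B) = \sum_{\substack{\mathcal{A} = \{A_1, \dots, A_k\} \\ \cup_i A_i = B}} \tilde T(\mathcal{A}), \tag{3.5}
$$
with
$$
\tilde T(\mathcal{A}) = \sum_{\mathbf{n} : \operatorname{supp} \mathbf{n} = \mathcal{A}} \Big[ \prod_{A\in\mathcal{A}} \frac{(-\lambda)^{n_A}}{n_A!} \int_0^{\tilde\beta} d\tau^1_A \dots d\tau^{n_A}_A \Big]\, T(\tau, \mathbf{n}). \tag{3.6}
$$
Using the basis $|s_\Lambda\rangle = \otimes_{x\in\Lambda} |s_x\rangle$ to rewrite (3.1) as
$$
Z^{\beta,\lambda}_{+-}(\Lambda) = \sum_{s^{(1)}_\Lambda, \dots, s^{(M)}_\Lambda} \langle s^{(1)}_\Lambda |\, T \,| s^{(2)}_\Lambda \rangle \cdots \langle s^{(M)}_\Lambda |\, T \,| s^{(1)}_\Lambda \rangle \tag{3.7}
$$
and inserting the formula (3.4) to expand T around $T^{(0)} = e^{-\tilde\beta H^{(0)}_{\Lambda,+-}} = T(B = \emptyset)$, we obtain
$$
Z^{\beta,\lambda}_{+-}(\Lambda) = \sum_{s^{(1)}_\Lambda, \dots, s^{(M)}_\Lambda} \ \sum_{B^{(1)}, \dots, B^{(M)}} \ \prod_{t=1}^{M} \langle s^{(t-1)}_\Lambda |\, T(B^{(t)}) \,| s^{(t)}_\Lambda \rangle, \tag{3.8}
$$
where we have identified $s^{(0)}_\Lambda$ and $s^{(M)}_\Lambda$. In order to rewrite (3.8) in terms of interfaces interacting via contours, we define elementary cubes as the unit closed cubes $C = C(x, t) \subset \mathbb{R}^{d+1}$ with centers (x, t) where $t \in \{1, 2, \dots, M\}$ and $x \in \bar\Lambda$. Identifying $t = \frac{1}{2}$ with $t = M + \frac{1}{2}$, we define the cylinder
$$
T_{\bar\Lambda} = \bigcup_{t=1}^{M} \bigcup_{x\in\bar\Lambda} C(x, t) \tag{3.9}
$$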
and say that an elementary cube C(x, t) lies in the t-th time slice of $T_{\bar\Lambda}$. We say that C(x, t) is in a quantum state if $x \in B^{(t)}$, and in a classical state otherwise. Consider now a cube C(x, t) in a classical state. Then $s^{(t-1)}_\Lambda$ and $s^{(t)}_\Lambda$ must assume the same value $s_x \in S$ on the point $x \in \mathbb{Z}^d$. The cube C is said to be in (classical) state $s_x$. If $s_x = r^{(m)}_x$, m = ±, the cube C is said to be part of the ground state region $V_m$. (We trust that the reader can distinguish these ground state regions $V_m$ from the potentials $V_i = V_{A_i}$ considered above.) All cubes which are not part of a ground state region are said to be excited. Notice that each excited cube is either a quantum cube or a classical cube which is not part of a ground state region. Finally, a d-dimensional unit face is said to be excited if it is shared by two classical cubes belonging to two different ground state regions.
Given these definitions, the union of all excited cubes and faces decomposes into one component, $K_0$, that touches the boundary of $T_{\bar\Lambda}$, and finitely many components, $K_1, \dots, K_n$, that do not touch the boundary of $T_{\bar\Lambda}$. As usual, we define the contours $Y_1, \dots, Y_n$ corresponding to a “configuration” $s^{(1)}_\Lambda, \dots, s^{(M)}_\Lambda, B^{(1)}, \dots, B^{(M)}$ as the pairs $Y_i = (\operatorname{supp} Y_i, \alpha_{Y_i})$, where $\operatorname{supp} Y_i$ is the connected component $K_i$ and $\alpha_{Y_i}$ is an assignment of a value $\alpha_{Y_i}(C)$ to each elementary cube $C \not\subset \operatorname{supp} Y_i$ that touches $\operatorname{supp} Y_i$ (in the sense that $C \cap \operatorname{supp} Y_i \ne \emptyset$). We choose $\alpha_{Y_i}$ so that $\alpha_{Y_i}(C) = r^{(m)}_x$ if C = C(x, t) is part of the ground state region $V_m$. In a similar way, we define the (labelled) interface S corresponding to the configuration $s^{(1)}_\Lambda, \dots, s^{(M)}_\Lambda, B^{(1)}, \dots, B^{(M)}$ as the pair $S = (\operatorname{supp} S, \alpha_S)$, where $\operatorname{supp} S = K_0$ and $\alpha_S(C) = r^{(m)}_x$ if C = C(x, t) is part of the ground state region $V_m$. Instead of S, we sometimes use the symbol $Y_0 = (\operatorname{supp} Y_0, \alpha_{Y_0})$ for the interface S.
Resumming all terms in (3.8) which lead to the same interface S and the same set of contours {Y1 , . . . , Yn }, we obtain the partition function of our system as a sum
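over interfaces and contours in $T_{\bar\Lambda}$. In order to show factorization of the corresponding weights, we need some notation. First, let us note that $H^{(0)}_{\Lambda,+-}$ can be rewritten as
$$
H^{(0)}_{\Lambda,+-} = \sum_{\langle xy\rangle \subset \bar{\bar\Lambda}} \tilde h^{(0)}_{xy}, \tag{3.10a}
$$
where
$$
\tilde h^{(0)}_{xy} = \frac{|\bar\Lambda \cap \{x, y\}|}{2}\, \langle r^{(+-)}_{\bar{\bar\Lambda}\setminus\Lambda} |\, h^{(0)}_{xy} \,| r^{(+-)}_{\bar{\bar\Lambda}\setminus\Lambda} \rangle. \tag{3.10b}
$$
Next we let ∂B denote the set of all $x \notin B$ that have a nearest neighbor in B, and define $T_B(B \mid s_{\partial B})$ as the operator on $\mathcal{H}_B$ that is obtained from T(B) on $\mathcal{H}_\Lambda$ by replacing $H^{(0)}_{\Lambda,+-}$ in (3.3) by the operator
$$
H^{(0)}_{\Lambda,+-}(B \mid s_{\partial B}) = \sum_{\langle xy\rangle \subset B} h^{(0)}_{xy} + \sum_{\langle xy\rangle :\, x\in B,\, y\in\partial B} \langle s_{\partial B} |\, \tilde h^{(0)}_{xy} \,| s_{\partial B} \rangle. \tag{3.11}
$$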
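Using the fact that T(B) contains only quantum excitations $V_A$ with A ⊂ B, so that it acts diagonally on $\mathcal{H}_{\Lambda\setminus B}$, we have
$$
\langle s_\Lambda |\, T(B) \,| \tilde s_\Lambda \rangle = e^{-\tilde\beta \sum_{\langle xy\rangle \subset \bar{\bar\Lambda}\setminus B} \langle s_{\{x,y\}} | \tilde h^{(0)}_{xy} | s_{\{x,y\}} \rangle} \times \langle s_B |\, T_B(B \mid s_{\partial B}) \,| \tilde s_B \rangle \times \prod_{x\in\Lambda\setminus B} \delta_{s_x, \tilde s_x}. \tag{3.12}
$$
In addition, if $B_1, \dots, B_k$ are the connected components of B, we have
$$
\langle s_B |\, T_B(B \mid s_{\partial B}) \,| \tilde s_B \rangle = \prod_{i=1}^{k} \langle s_{B_i} |\, T_{B_i}(B_i \mid s_{\partial B_i}) \,| \tilde s_{B_i} \rangle. \tag{3.13}
$$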
The rest is straightforward: Given a term contributing to the sum in (3.8), we first extract a factor $e^{-\tilde\beta e_m}$ for each elementary cube in the ground state m, where $e_m$ is the energy density of the classical ground state $r^{(m)}$,
$$
e_m = \frac{1}{2} \sum_{\langle xy\rangle :\, z \in \langle xy\rangle} h^{(0)}(r^{(m)}_x, r^{(m)}_y) = d\, h^{(0)}(r^{(m)}_0, r^{(m)}_0), \tag{3.14}
$$
and then resum all terms which lead to the same interface S and the same set of contours $\{Y_1, \dots, Y_n\}$. This gives the representation
$$
Z^{\beta,\lambda}_{+-}(\Lambda) = \sum_{S, \{Y_1, \dots, Y_n\}} \rho(S, Y_1, \dots, Y_n) \prod_m e^{-\tilde\beta e_m |V_m|}, \tag{3.15}
$$
where $|V_m|$ is the number of elementary cubes which are in the ground state m, and $\rho(S, Y_1, \dots, Y_n)$ is a suitable activity for the interface S and the contours $Y_1, \dots, Y_n$. By (3.12) and (3.13), this activity factors into a product over S and $Y_1, \dots, Y_n$, so that we get
$$
Z^{\beta,\lambda}_{+-}(\Lambda) = \sum_S \sum_{\{Y_1, \dots, Y_n\}} \rho(S) \prod_i \rho(Y_i) \prod_m e^{-\tilde\beta e_m |V_m|}. \tag{3.16}
$$
Here the first sum ranges over interfaces in $T_{\bar\Lambda}$ and the second runs over sets of non-overlapping contours in $T_{\bar\Lambda} \setminus \operatorname{supp} S$, with the constraint that the labels of S and $Y_1, \dots, Y_n$ are matching in the sense that for all i, j = 0, 1, ..., n, $\alpha_{Y_i}(C) = \alpha_{Y_j}(\tilde C)$ if C and $\tilde C$ belong to the same component of $T_{\bar\Lambda} \setminus (\operatorname{supp} S \cup \operatorname{supp} Y_1 \cup \dots \cup \operatorname{supp} Y_n)$ (recall that we sometimes use the notation $Y_0$ for the interface S). Equation (3.16) is the desired representation for $Z^{\beta,\lambda}_{+-}(\Lambda)$ as a sum over interfaces S and contours $Y_1, \dots, Y_n$.
We will need the following lemma, derived in [BKU96], to prove convergence of the cluster expansion for the pure-phase states.
Lemma 3.1. [BKU96] Let $\lambda \in \mathbb{C}$, $\tilde\beta > 0$, and $\gamma_Q \ge 1$ be such that
$$
e\, \tilde\beta\, |\lambda|\, |||V|||_{\gamma_Q} \le 1. \tag{3.17}
$$
Then
$$
|\rho(Y)| \le e^{-\tilde\beta e_0 N_C(Y) - \gamma |Y|}, \tag{3.18}
$$
where $N_C(Y)$ is the number of elementary cubes in supp Y, |Y| is the sum of $N_C(Y)$ and the number of d-dimensional unit faces in supp Y,
$$
e_0 = \min\{e_+, e_-\}, \tag{3.19}
$$
and
$$
\gamma = \min\{\tilde\beta \gamma_{cl}, \gamma_Q\} - \log(2e|S|). \tag{3.20}
$$
Remarks. i) Choosing $M = M(\gamma_Q) = \max\{1, 2e\beta|\lambda|\, |||V|||_{\gamma_Q}\}$ and optimizing over $\gamma_Q$, one obtains the decay constant γ(β, λ) from Theorem 2.1.
ii) The proof of Lemma 3.1 immediately generalizes to the weights ρ(S), giving the bound
$$
|\rho(S)| \le e^{-\tilde\beta e_0 N_C(S) - \gamma |S|}. \tag{3.21}
$$
However, while the bound (3.18) is enough to prove convergence of the cluster expansion for translation-invariant states (and hence to prove Theorem 2.1), the additional bound (3.21) is not enough to show the convergence of the expansions used in the proof of Theorem 2.3. We discuss this issue further in the next section, where we also prove the necessary strengthening of (3.21), Lemma 3.2.
iii) By construction, the labels $\alpha_S(C)$ of the interface S are constant on (the boundary cubes of) each component K of $T_{\bar\Lambda} \setminus \operatorname{supp} S$. Defining $V_q(S)$ as the union of all components K of $T_{\bar\Lambda} \setminus \operatorname{supp} S$ on which $\alpha_S = q$, we note that $V_+(S)$ consists of one component (the component “above” S) which is bounded by supp S and the boundary of $T_{\bar\Lambda}$, and finitely many (possibly zero) components bounded entirely by supp S, and similarly for $V_-(S)$. We denote the component of $V_q(S)$ that is bounded by supp S and the boundary of $T_{\bar\Lambda}$ by $\operatorname{Ext}_q(S)$, and the union of all other components of $V_q(S)$ by $\operatorname{Int}_q(S)$.
iv) The support of each contour Y contributing to (3.16) is connected and has no overlap with supp S. Each such contour therefore lies either in $V_+(S)$ or in $V_-(S)$, so that the second sum in (3.16) splits into two sums, one over sets $\{Y'_1, \dots, Y'_{n'}\}_+$ of non-overlapping contours with matching labels in $V_+(S)$, and one over sets $\{Y''_1, \dots, Y''_{n''}\}_-$ of non-overlapping contours with matching labels in $V_-(S)$. Since the matching condition for the contours contributing to (3.16) implies in particular that the labels of S are matching with those of $Y_1, \dots, Y_n$, the contours in $\{Y'_1, \dots, Y'_{n'}\}_+$ are such that each
component of $V_+(S) \setminus (\operatorname{supp} Y'_1 \cup \dots \cup \operatorname{supp} Y'_{n'})$ that touches the boundary of $V_+(S)$ carries the label +, and similarly for the contours in $\{Y''_1, \dots, Y''_{n''}\}_-$. As a consequence, we have
$$
Z^{\beta,\lambda}_{+-}(\Lambda) = \sum_S \rho(S)\, Z_+(V_+(S))\, Z_-(V_-(S)), \tag{3.22}
$$
where
$$
Z_q(V_q(S)) = \sum_{\{Y_1, \dots, Y_{\tilde n}\}_q} \prod_i \rho(Y_i) \prod_m e^{-\tilde\beta e_m |V_m|}, \tag{3.23}
$$
with $V_m$ defined as the union of all components of $V_q(S) \setminus (\operatorname{supp} Y_1 \cup \dots \cup \operatorname{supp} Y_{\tilde n})$ that carry the label m.
v) The partition functions $Z_\pm$ defined in (3.23) are partition functions of a contour model defined on the cylinder $T = \lim_{\Lambda\to\mathbb{Z}^d} T_{\bar\Lambda}$. While they are well defined for any subvolume V of T which is a finite union of unit cubes with centers in $\mathbb{Z}^d_{1/2} \times \{1, 2, \dots, M\}$, they are only related to the pure-phase partition functions $Z^{\beta,\lambda}_q(\Lambda) = \operatorname{Tr}_{\mathcal{H}_\Lambda} e^{-\beta H_{\Lambda,q}}$ if V is of the form $V = \cup_{t=1}^{M} \cup_{x\in\bar\Lambda} C(x, t)$, i.e. if V is invariant under translations in the “time direction” t. In this case, $Z_q(V) = Z^{\beta,\lambda}_q(\Lambda)$.
3.2. Walls, flat pieces and the interface tension σ. In the last section, we derived a representation of $Z^{\beta,\lambda}_{+-}(\Lambda)$ as a sum over interfaces. Here we will study the geometry and the weight of such an interface, and, in particular, prove its rigidity. In order to do this, we will describe an arbitrary interface in terms of its deviations from the flat interface. These deviations are just the so-called walls introduced by Dobrushin [Dob72] in the context of the much simpler Ising model. Here we will use the methods of [BCF96] which were developed to treat interface models with complex weights and without a symmetry relating the phases above and below the interface. We have attempted to make our treatment as self-contained as possible. However, the reader is referred to [BCF96] for technical details.
Since the goal of this work is the proof of the rigidity of the interface S, it is natural to describe S in terms of its deviations from the flat interface $S_0 = (\operatorname{supp} S_0, \alpha_{S_0})$, where
$$
\operatorname{supp} S_0 = \Sigma_0 = \{(x, t) = ((x_1, x_2, \dots, x_d), t) \in T_{\bar\Lambda} \mid x_1 = 0\} \tag{3.24}
$$
and αS0 is the map from the set of all elementary cubes C(x, t) in T3¯ that touch supp S0 into +, −, with +1 if x1 = 1/2 αS0 (C(x, t)) = (3.25) −1 if x1 = −1/2. q β,λ β,λ (3) from Z+− (3), which gives First we extract a factor ρ(S0 ) Z+β,λ (3)Z− β,λ (3) Z+−
q β,λ β,λ = ρ(S0 ) Z+β,λ (3)Z− (3) Ze+− (3),
where β,λ (3) = Ze+−
X ρ(S) eφ3 (S) , ρ(S0 )
(3.26)
(3.27)
S
with
Z+ (V+ (S))Z− (V− (S)) . eφ3 (S) = q β,λ Z+β,λ (3)Z− (3)
(3.28)
Due to the bound (3.18), the partition functions in (3.28) can be analyzed by convergent cluster expansions, provided the decay constant $\gamma$ in (3.18) is large enough. This gives a representation for $\phi_\Lambda(S)$ as a sum of two terms: a volume term
$$\tilde\beta\,\Delta F(S)=\tilde\beta f_+\,\Delta_+(S)+\tilde\beta f_-\,\Delta_-(S),\tag{3.29}$$
where
$$\tilde\beta f_m=-\lim_{V\to\mathbb T}\frac{1}{|V|}\log Z_m(V)\tag{3.30}$$
and
$$\Delta_m(S)=|V_m(S_0)|-|V_m(S)|,\tag{3.31}$$
and a surface term $W_\Lambda(S)$ given in terms of “clusters” connected to the support of $S$. See [BCF96] for details. By the convergence of the cluster expansion for $W_\Lambda(S)$, the fact that $\operatorname{Re}(\tilde\beta f_+)=\operatorname{Re}(\tilde\beta f_-)=\tilde\beta e_0+O(e^{-\gamma})$ in the coexistence region (2.22) considered in this paper, and the fact that
$$\Delta_+(S)+\Delta_-(S)=N_C(S),\tag{3.32}$$
one obtains the bound
$$\bigl|e^{\phi_\Lambda(S)}\bigr|\le e^{(\tilde\beta e_0+O(e^{-\gamma}))N_C(S)}\,e^{O(e^{-\gamma})|S|},\tag{3.33}$$
uniformly in $\Lambda$. Combined with (3.21), this shows absolute convergence of the expansion (3.27), uniformly in $L_\perp$ (recall that $\Lambda$ is a box of size $(2L_\perp-1)(2L-1)^{d-1}$, see Eq. (2.23)). Using similar arguments, one obtains the existence of the limit
$$\tilde Z^{\beta,\lambda}_{+-}(\Lambda_\infty)=\lim_{L_\perp\to\infty}\tilde Z^{\beta,\lambda}_{+-}(\Lambda(L_\perp,L))=\sum_S\frac{\rho(S)}{\rho(S_0)}\,e^{\phi(S)},\tag{3.34}$$
where the sum runs over all finite interfaces in $\Lambda_\infty=\Lambda(\infty,L)$, and
$$\phi(S)=\lim_{L_\perp\to\infty}\phi_{\Lambda(L_\perp,L)}(S)\tag{3.35}$$
is a sum of two terms: the term $\tilde\beta\,\Delta F(S)$ (see (3.29)) and a term $W(S)$ which is given as a sum over clusters connected to the support of $S$.

In order to continue our analysis, we need some notation. We refer to the directions parallel to $\Sigma_0$ as horizontal, and to the direction orthogonal to $\Sigma_0$ as vertical. We define $\pi$ as the orthogonal projection from $\mathbb T_{\bar\Lambda_\infty}$ onto $\Sigma_0$, $\pi((x_1,x_2,\dots,x_d),t)=((0,x_2,\dots,x_d),t)$, the height $h(x)$ of a point $x\in\mathbb T_{\bar\Lambda_\infty}$ as $h(x)=x_1$, and the height of a $d$-dimensional unit face $f$ or an elementary cube $C$ as the height of its center.

Let $S=(\operatorname{supp}S,\alpha_S)$ be an interface contributing to (3.34). Let $C_S$ and $F_S$ be the set of cubes and faces in $\operatorname{supp}S$, respectively (we recall that a $d$-dimensional face $f$ is a face in $\operatorname{supp}S$ if it is the intersection of two elementary cubes, $C(x,t)$ and $C(y,t)$, which are in two different ground states). We then say that $f\in F_S$ is simple if it is horizontal and if there is no cube $C\in C_S$ with $\pi(C)=\pi(f)$ and no face $f'\in F_S\setminus\{f\}$ with $\pi(f')=\pi(f)$. The set of all faces $f\in F_S$ that are not simple, or that are touched by a cube $C\in C_S$ or a face $f'\in F_S$ which is not simple, is denoted by $F_S^\star$. Motivated by [Dob72], we now consider the connected components $K_1,\dots,K_k$ of $F_S^\star\cup C_S$ and define the walls $W_1,\dots,W_k$ of $S$ as the pairs $W_i=(\operatorname{supp}W_i,\alpha_{W_i})$, where $\operatorname{supp}W_i$ is the connected component $K_i$ and $\alpha_{W_i}$ is the restriction of the map $\alpha_S$ to the set of cubes which touch the set $K_i$. The flat pieces $F_1,\dots,F_l$ of $S$, on the other hand, are defined as the connected components of $F_S\setminus F_S^\star$. We define $\pi(W)$ as $\pi(\operatorname{supp}W)$. Finally, a simple face $f$ in a wall $W$ or a flat piece $F$ is called a boundary face of $W$ (of $F$) if $\pi(f)$ is connected to $\partial\pi(W)$ (to $\partial\pi(F)$).

Remarks. i) We have defined walls and flat pieces in such a way that two boundary faces, $f_W$ and $f_F$, of a wall $W$ and a flat piece $F$ have the same height if $\pi(f_W)$ and $\pi(f_F)$ are connected. The interface $S$ can therefore be reconstructed uniquely from its walls $W_1,\dots,W_k$.

ii) For an arbitrary set of elementary cubes and unit faces $B$, let $[B]$ be the equivalence class of all translates of $B$ in the vertical direction. By the same argument as in i), it can then be seen that an interface $S$ with walls $W_1,\dots,W_k$ can actually be reconstructed from $[W_1],\dots,[W_k]$. We express this fact by writing $S=S([W_1],\dots,[W_k])$ if $S$ is an interface with walls $W_1,\dots,W_k$, and call $[W_1],\dots,[W_k]$ the floating walls of $S$. Defining two floating walls $[W]$ and $[W']$ as compatible if their projections $\pi(W)$ and $\pi(W')$ are not connected to each other, it is easy to see that the floating walls of an interface $S$ contributing to (3.34) are pairwise compatible, and that each set of pairwise compatible floating walls leads to exactly one interface $S$, provided $d\ge3$ (see footnote 1). As a consequence, the sum over interfaces in (3.34) can be replaced by a sum over sets of pairwise compatible floating walls $[W_1],\dots,[W_k]$,
$$\tilde Z^{\beta,\lambda}_{+-}(\Lambda_\infty)=\sum_{\{[W_1],\dots,[W_k]\}}\frac{\rho(S([W_1],\dots,[W_k]))}{\rho(S_0)}\,e^{\phi(S([W_1],\dots,[W_k]))}.\tag{3.36}$$
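The compatibility condition entering (3.36) can be illustrated by a minimal Python sketch. It is not part of the original argument: the projection $\pi(W)$ of a floating wall is encoded here as a set of integer position tuples for the unit faces of $\Sigma_0$ that it covers, and two projections are treated as connected when some pair of their faces coincides or differs by at most one in every coordinate. Both the encoding and this touching rule are assumptions made for illustration only; the function names are hypothetical.

```python
from itertools import combinations, product


def touches(p, q):
    """Assumed rule: two unit faces of Sigma_0, given by integer position
    tuples, touch if they coincide or differ by at most 1 in every coordinate."""
    return all(abs(a - b) <= 1 for a, b in zip(p, q))


def projections_connected(proj_a, proj_b):
    """True if some face of one projection touches some face of the other."""
    return any(touches(p, q) for p, q in product(proj_a, proj_b))


def pairwise_compatible(projections):
    """Floating walls are pairwise compatible if no two of their
    projections are connected to each other."""
    return not any(
        projections_connected(a, b) for a, b in combinations(projections, 2)
    )


# Hypothetical projections for d = 3: positions are (x2, x3, t).
w1 = {(0, 0, 1), (0, 1, 1)}
w2 = {(3, 0, 1)}
w3 = {(1, 2, 1)}  # touches the face (0, 1, 1) of w1

print(pairwise_compatible([w1, w2]))      # w1 and w2 are far apart
print(pairwise_compatible([w1, w2, w3]))  # w3 is connected to w1
```

In this toy encoding, only families for which `pairwise_compatible` holds would label a term in the sum (3.36).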
Next we analyze the weights $\rho(S([W_1],\dots,[W_k]))$. Starting with the flat interface $S_0$, we note that
$$\rho(S_0)=e^{-\tilde\beta\sigma_0|\operatorname{supp}S_0|}=e^{-\tilde\beta\sigma_0M(2L+1)^{d-1}},\tag{3.37}$$
where
$$\sigma_0=h^{(0)}(r_x^{(-)},r_y^{(+)})-\frac12\Bigl[h^{(0)}(r_x^{(-)},r_y^{(-)})+h^{(0)}(r_x^{(+)},r_y^{(+)})\Bigr]\tag{3.38}$$
(note that $\sigma_0$ is independent of $x$ and $y$ by the translation invariance of the reference configurations $r^{(\pm)}$). For an arbitrary interface $S$, we still get a factor $e^{-\tilde\beta\sigma_0|F|}$ for each flat piece $F$ in $S$. Moreover, due to (3.12) and (3.13), the remaining weight factors over the walls of $S$, so that we get a decomposition of the form
$$\rho(S)=e^{-\tilde\beta\sigma_0\sum_{i=1}^{l}|F_i|}\prod_{j=1}^{k}\rho(W_j)=\rho(S_0)\prod_{j=1}^{k}\rho(W_j)\,e^{\tilde\beta\sigma_0|\pi(W_j)|}.\tag{3.39}$$
Here $F_1,\dots,F_l$ are the flat pieces and $W_1,\dots,W_k$ are the walls of $S$, $\rho(W_j)$ are suitable weights, and $|\pi(W_j)|$ is the $d$-dimensional area of $\pi(W_j)\subset\Sigma_0$.

Footnote 1 (to the condition $d\ge3$ in Remark ii)): For $d=2$, this is not the case, since the sum over interfaces $S$ in (3.34) is a sum over “ribbons” in $\mathbb T_{\bar\Lambda_\infty}$ with both ends fixed at $x_1=0$, while a sum over pairwise compatible floating walls corresponds to a sum over ribbons with only one fixed end.

Observing finally that $\rho(W)=\rho(\tilde W)$ if $\tilde W$ can be obtained from $W$ by a translation in the vertical direction, so that $\rho(W)=\rho([W])$, we rewrite the partition function $\tilde Z^{\beta,\lambda}_{+-}(\Lambda_\infty)$ as
$$\tilde Z^{\beta,\lambda}_{+-}(\Lambda_\infty)=\sum_{\{[W_1],\dots,[W_k]\}}e^{\phi(S([W_1],\dots,[W_k]))}\prod_{j=1}^{k}\rho([W_j])\,e^{\tilde\beta\sigma_0|\pi(W_j)|},\tag{3.40}$$
where the sum is again a sum over sets of pairwise compatible floating walls.

Remarks. In an obvious fashion, a floating wall $[W]$ can be considered as a polymer in $\Sigma_0=\operatorname{supp}S_0$ with countably many internal degrees of freedom. In this interpretation, the partition function (3.40) is the partition function of an interacting polymer system in the “volume” $\Sigma_0$. A low-density phase with exponentially decaying correlations for this polymer system then corresponds to rigid interfaces, while a phase transition into a Kosterlitz-Thouless phase corresponds to roughening. Reformulated in this language, the goal of this paper is to show that, under the hypotheses in Theorem 2.3, the polymer system defined by (3.40) is in a low-density phase with exponentially decaying correlations.

As discussed below Eq. (3.28), the interaction $\phi$ between different walls can be analyzed by a convergent cluster expansion. Due to the bound (3.18), this interaction is weak if $\gamma$ is sufficiently large. In order to show diluteness for the polymer system, characterized e.g. by the convergence of the Mayer expansion for $\log\tilde Z^{\beta,\lambda}_{+-}(\Lambda_\infty)$, it is therefore enough to prove that the activities $\rho([W])$ are sufficiently small; see [BCF96] for details. The necessary bound is the content of the following lemma. In order to state it, we introduce the symbols $C_W$ (and $F_W$) to denote the set of elementary cubes (faces) in $\operatorname{supp}W$, the symbol $F_{W,h}$ to denote the horizontal faces in $F_W$, and the symbol $F_{W,\perp}$ to denote the set $F_W\setminus F_{W,h}$.

Lemma 3.2. Let $\lambda$, $\tilde\beta$, $\gamma_Q$ and $\gamma$ be as in Lemma 3.1, and assume that the coexistence condition (2.22) and condition (2.30) hold. Then
$$\bigl|\rho([W])\,e^{\tilde\beta\sigma_0|\pi(W)|}\bigr|\le e^{-\tilde\beta e_0|C_W|}\exp\Bigl\{-\gamma\Bigl[|C_W|+|F_{W,\perp}|+\sum_{F\in\pi(W)}\max\{0,|F_{W,h}\cap\pi^{-1}(F)|-1\}\Bigr]\Bigr\}\tag{3.41}$$
for all floating walls $[W]$.

Lemma 3.2 is the main technical result of this paper. Its proof is given in the appendix.

Remarks. i) For walls whose support consists only of horizontal and vertical faces, the right hand side of (3.41) reduces to the usual bound $e^{-\tilde\beta e_0|C_W|}e^{-\gamma(|W|-|\pi(W)|)}$. In general, the bound (3.41) is stronger than the bound $e^{-\tilde\beta e_0|C_W|}e^{-\gamma(|W|-|\pi(W)|)}$.

ii) It is easy to see that a bound $e^{-\tilde\beta e_0|C_W|}e^{-\gamma(|W|-|\pi(W)|)}$ is actually not strong enough to imply suppression of surface excitations. Taking, e.g., an interface whose support consists of the flat surface, with one unit face replaced by an elementary cube having this face in its boundary, the bound (3.41) for the corresponding wall $W$ gives a suppression factor $e^{-\gamma}$, while the bound $e^{-\tilde\beta e_0|C_W|}e^{-\gamma(|W|-|\pi(W)|)}$ gives no suppression at all.

Theorem 3.3. Suppose $d\ge3$, and let $\lambda$, $\tilde\beta$, $\gamma_Q$ and $\gamma$ be as in Lemma 3.1. Then there is a constant $\gamma_0=\gamma_0(d)$ such that the following statements are true provided $\gamma>\gamma_0$ and conditions (2.22) and (2.30) hold.
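i) The Mayer expansion for $\frac{1}{\beta(2L+1)^{(d-1)}}\log\tilde Z^{\beta,\lambda}_{+-}(\Lambda_\infty)$ is absolutely convergent, uniformly in $L$.

ii) The surface tension
$$\sigma=-\lim_{L\to\infty}\frac{1}{\beta(2L+1)^{(d-1)}}\log\frac{Z^{\beta,\lambda}_{+-}(\Lambda_\infty)}{\sqrt{Z_+^{\beta,\lambda}(\Lambda)Z_-^{\beta,\lambda}(\Lambda)}}\tag{3.42}$$
exists, and obeys the bound
$$|\tilde\beta(\sigma-\sigma_0)|\le O(e^{-\gamma}),\tag{3.43}$$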
where $\sigma_0$ has been defined in (3.38).

Proof. i) Except for the interaction $\phi(S([W_1],\dots,[W_n]))$, the representation (3.40) for $\tilde Z^{\beta,\lambda}_{+-}(\Lambda_\infty)$ is a representation in terms of a hard-core interacting polymer system with exponentially decaying activities. As in [BCF96], the interaction term $e^{\phi(S([W_1],\dots,[W_n]))}$ can be expanded in such a way that the resulting system is again a hard-core interacting polymer system, with polymers consisting now of several walls, glued together by “decorations” stemming from the cluster expansion for $\phi(S([W_1],\dots,[W_n]))$. Using the bounds (3.18) and (3.41), one obtains exponential decay for the activities of the new polymer system, and hence absolute convergence for the corresponding Mayer expansion; see [BCF96] for details.

ii) This follows immediately from i).

3.3. Dobrushin States. In the last section, we showed that the deviations from the flat interface form a dilute gas of excitations. It follows that the partition function $\tilde Z^{\beta,\lambda}_{+-}(\Lambda)$ is dominated by interfaces $S$ which do not deviate much from the flat interface $S_0$. More precisely, defining the signed distribution $P_{\Lambda_\infty}$ of $S$ as
$$P_{\Lambda_\infty}(S)=\frac{1}{\tilde Z^{\beta,\lambda}_{+-}(\Lambda)}\,\frac{\rho(S)}{\rho(S_0)}\,e^{\phi(S)},\tag{3.44}$$
the expansions of the last section imply
$$\Bigl|\sum_{S:\,\operatorname{Ext}_m(S)\ni x}P_{\Lambda_\infty}(S)\Bigr|\le O(1)\,e^{-\gamma|x_1|}\tag{3.45}$$
if $\operatorname{sgn}x_1=-m$. For positive activities, such an estimate immediately gives the bounds (2.35) and (2.36), and hence the desired breaking of the vertical translation invariance in the state $\langle\,\cdot\,\rangle^{\beta,\lambda}_{+-}$. Unfortunately, this strategy fails here, since the activities $\rho(S)$ and the interaction $\phi(S)$ are not necessarily positive or even real. In order to circumvent this difficulty, it is necessary to replace the use of “probability estimates” of the form (3.45) by full-fledged cluster expansions for the expectations $\langle A\rangle^{\beta,\lambda}_{+-}$ of local observables $A$. Since the machinery needed for systems with complex interface weights has been developed in detail in [BCF96], we only sketch the main steps needed to map our system onto the kind of systems treated in [BCF96].

In a first step, we derive an expansion of the form (3.16) for the unnormalized expectation
$$Z^{\beta,\lambda}_{+-}(A;\Lambda)=\operatorname{Tr}_{\mathcal H_\Lambda}A\,e^{-\beta H_{\Lambda,+-}}.\tag{3.46}$$
To this end, we first retrace the steps leading to (3.8) to get
$$Z^{\beta,\lambda}_{+-}(A;\Lambda)=\sum_{\substack{s_\Lambda^{(1)},\dots,s_\Lambda^{(M)}\\B^{(1)},\dots,B^{(M)}}}\langle s_\Lambda^{(0)}|\,A\,T(B^{(1)})\,|s_\Lambda^{(1)}\rangle\prod_{t=2}^{M}\langle s_\Lambda^{(t-1)}|\,T(B^{(t)})\,|s_\Lambda^{(t)}\rangle,\tag{3.47}$$
where we have again identified $s_\Lambda^{(0)}$ and $s_\Lambda^{(M)}$. Defining contours and interfaces as before, this gives the representation
$$Z^{\beta,\lambda}_{+-}(A;\Lambda)=\sum_{S,\{Y_1,\dots,Y_n\}}\rho_A(S,Y_1,\dots,Y_n)\prod_m e^{-\tilde\beta e_m|V_m|},\tag{3.48}$$
where $\rho_A(S,Y_1,\dots,Y_n)$ is a suitable activity. Recalling that a local observable is an operator on $\mathcal H_A$ for some finite set $A\subset\mathbb Z^d_{1/2}$, we introduce the set $C_A$ as the union over all elementary cubes $C(x,t)$ with $t=1$ and $x\in A$. Denoting the union of all finite components of $\mathbb T\setminus\operatorname{supp}Y_i$ by $\operatorname{Int}Y_i$, we group all contours $Y\in\{Y_1,\dots,Y_n\}$ for which the set $\operatorname{supp}Y\cup\operatorname{Int}Y$ is connected to $C_A$ into a new contour $Y_A$. We then get the factorization
$$\rho_A(S,Y_1,\dots,Y_n)=\rho_A(S,Y_A)\prod_{Y\in\{Y_1,\dots,Y_n\}\setminus Y_A}\rho(Y),\tag{3.49}$$
so that
$$Z^{\beta,\lambda}_{+-}(A;\Lambda)=\sum_{S,Y_A}\rho_A(S,Y_A)\sum_{\{Y_1,\dots,Y_{\tilde n}\}}\ \prod_{i=1}^{\tilde n}\rho(Y_i)\prod_m e^{-\tilde\beta e_m|V_m|}.\tag{3.50}$$
Resumming the contours $Y_1,\dots,Y_{\tilde n}$ and defining $V_q(S,Y_A)$ as the union of all components of $\mathbb T_{\bar\Lambda}\setminus(\operatorname{supp}S\cup\operatorname{supp}Y_A)$ with label $q$, and $V_q^{(0)}(S,Y_A)$ as $V_q^{(0)}(S,Y_A)=V_q(S,Y_A)\setminus C_A$, $q=\pm1$, we obtain the representation
$$Z^{\beta,\lambda}_{+-}(A;\Lambda)=\sum_{S,Y_A}\rho_A(S,Y_A)\prod_{q=\pm1}Z_q\bigl(V_q^{(0)}(S,Y_A)\bigr),\tag{3.51}$$
which is of exactly the form of the representation (6.3) of [BCF96]. Proceeding as in [BCF96], and using the relations (3.12) and (3.13) to show factorization when needed, we obtain a representation for
$$\tilde Z^{\beta,\lambda}_{+-}(A;\Lambda)=\frac{Z^{\beta,\lambda}_{+-}(A;\Lambda)}{\rho(S_0)\sqrt{Z_+^{\beta,\lambda}(\Lambda)Z_-^{\beta,\lambda}(\Lambda)}}\tag{3.52}$$
in terms of a hard-core interacting gas of decorated floating walls. Dividing by the normalization $\tilde Z^{\beta,\lambda}_{+-}(\Lambda)$, we obtain the desired cluster expansion for the expectation values $\langle A\rangle^{\beta,\lambda}_{+-}$, and a proof of Theorem 2.3; see [BCF96] for details.
4. Appendix

In this appendix, we prove Lemma 3.2. Let $S=S([W])$ be the (uniquely defined) interface that has $[W]$ as its only floating wall. By the factorization property (3.39) and the formula (3.37) for the weight of the flat interface, the weight $\rho([W])\,e^{\tilde\beta\sigma_0|\pi(W)|}$ can be rewritten as
$$\rho([W])\,e^{\tilde\beta\sigma_0|\pi(W)|}=\rho(S)\,e^{\tilde\beta\sigma_0|\operatorname{supp}S_0|}.\tag{A.1}$$
The bound (3.41) therefore reduces to the bound
$$|\rho(S)|\,e^{\tilde\beta\sigma_0|\operatorname{supp}S_0|}\,e^{\tilde\beta e_0|C_S|}\le\exp\Bigl\{-\gamma\Bigl[|C_S|+|F_{S,\perp}|+\sum_{F\in\Sigma_0}\max\{0,|F_{S,h}\cap\pi^{-1}(F)|-1\}\Bigr]\Bigr\}.\tag{A.2}$$
Next, let $\Lambda=\Lambda(L_\perp,L)$ be a volume for which $S$ is an interface in $\mathbb T_{\bar\Lambda}$, and let $V_\pm(S)$ be the volumes defined in remark iii) after Lemma 3.1. The weight $\rho(S)$ can then be rewritten as
$$\rho(S)=e^{\tilde\beta e_+|V_+(S)|+\tilde\beta e_-|V_-(S)|}\ \sum_{\substack{s_\Lambda^{(1)},\dots,s_\Lambda^{(M)}\\B^{(1)},\dots,B^{(M)}}}^{\prime}\ \prod_{t=1}^{M}\langle s_\Lambda^{(t-1)}|\,T(B^{(t)})\,|s_\Lambda^{(t)}\rangle,\tag{A.3}$$
where the sum goes over all configurations $s_\Lambda^{(1)},\dots,s_\Lambda^{(M)}$, $B^{(1)},\dots,B^{(M)}$ such that all cubes in $V_m(S)$ are cubes in the classical ground state $m$ and all cubes in $C_S$ are excited cubes.

In order to continue, we decompose the interface $S$ and its projection $\pi(S)=\Sigma_0$ into time slices. We define
$$V_m^{(t)}=\bigcup\{C(x,t)\mid x\in\bar\Lambda,\ C(x,t)\subset V_m(S)\},\tag{A.4}$$
$$C^{(t)}=\{C(x,t)\mid x\in\bar\Lambda,\ C(x,t)\in C_S\}\tag{A.5}$$
and
$$F^{(t)}=\{F\mid\exists\,C(x,t)\subset V_+^{(t)}\text{ and }C(y,t)\subset V_-^{(t)}\text{ such that }F=C(x,t)\cap C(y,t)\},\tag{A.6}$$
and introduce the set $\Sigma_0^{(t)}$ as the set of all faces $F$ in $\Sigma_0=\operatorname{supp}S_0$ that have a center with time-coordinate $t$. Note that
$$\operatorname{supp}S=\bigcup_{t=1}^{M}\bigl(F^{(t)}\cup C^{(t)}\bigr),\tag{A.7}$$
$$V_m(S)=\bigcup_{t=1}^{M}V_m^{(t)},\tag{A.8}$$
$$|S_0|=\sum_{t=1}^{M}|\Sigma_0^{(t)}|\tag{A.9}$$
and
$$N_C(S)=|C_S|=\sum_{t=1}^{M}|C^{(t)}|.\tag{A.10}$$
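As a consequence, Eq. (A.3) can be rewritten in the form
$$\rho(S)=\sum_{\substack{s_\Lambda^{(1)},\dots,s_\Lambda^{(M)}\\B^{(1)},\dots,B^{(M)}}}^{\prime}\ \prod_{t=1}^{M}\langle s_\Lambda^{(t-1)}|\,T(B^{(t)})\,|s_\Lambda^{(t)}\rangle\,e^{\tilde\beta e_+|V_+^{(t)}|+\tilde\beta e_-|V_-^{(t)}|}.\tag{A.11}$$
Consider now a fixed configuration $s_\Lambda^{(1)},\dots,s_\Lambda^{(M)}$, $B^{(1)},\dots,B^{(M)}$, and a fixed time slice $t$. Our goal is a suitable bound on
$$\langle s_\Lambda^{(t-1)}|\,T(B^{(t)})\,|s_\Lambda^{(t)}\rangle\,e^{\tilde\beta e_+|V_+^{(t)}|+\tilde\beta e_-|V_-^{(t)}|}\,e^{\tilde\beta e_0|C^{(t)}|}.\tag{A.12}$$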
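Starting with a relatively simple case, let us assume for the moment that the $t$-th time slice of $\operatorname{supp}S$ consists solely of excited faces and classically excited cubes, so that $B^{(t)}$ is empty. Then necessarily
$$s_\Lambda^{(t-1)}=s_\Lambda^{(t)},\tag{A.13}$$
and
$$\begin{aligned}
\langle s_\Lambda^{(t-1)}|\,T(\emptyset)\,|s_\Lambda^{(t)}\rangle\,&e^{\tilde\beta e_+|V_+^{(t)}|+\tilde\beta e_-|V_-^{(t)}|}\,e^{\tilde\beta e_0|C^{(t)}|}\\
&=\exp\Bigl\{\tilde\beta e_+|V_+^{(t)}|+\tilde\beta e_-|V_-^{(t)}|+\tilde\beta e_0|C^{(t)}|-\frac{\tilde\beta}{2}\sum_{x\in\bar\Lambda}\ \sum_{\langle uv\rangle:\,x\in\{u,v\}}h^{(0)}(s_u,s_v)\Bigr\}\\
&=\exp\Bigl\{\tilde\beta e_+|V_+^{(t)}|+\tilde\beta e_-|V_-^{(t)}|+\tilde\beta e_0|C^{(t)}|-\tilde\beta\sum_{\langle xy\rangle\subset\bar\Lambda}\tilde h^{(0)}_{xy}(s_x,s_y)\Bigr\},
\end{aligned}\tag{A.14}$$
where
$$\tilde h^{(0)}_{xy}(s_x,s_y)=\frac{|\{x,y\}\cap\bar\Lambda|}{2}\,h^{(0)}(s_x,s_y).\tag{A.15}$$
Considering the even simpler case in which $C^{(t)}$ is empty, the expression in the exponent of the right hand side of (A.14) can be evaluated explicitly: first, we note that the terms $\tilde\beta e_m|V_m^{(t)}|$ can be rewritten as
$$\begin{aligned}
\tilde\beta e_m|V_m^{(t)}|&=\frac{\tilde\beta}{2}\sum_{x\in\bar\Lambda:\,C(x,t)\in V_m^{(t)}}\ \sum_{\langle uv\rangle:\,x\in\{u,v\}}h^{(0)}(r_u^{(m)},r_v^{(m)})\\
&=\tilde\beta\sum_{\langle xy\rangle\subset\bar\Lambda}\frac{|\{(x,t),(y,t)\}\cap V_m^{(t)}|}{2}\,h^{(0)}(r_x^{(m)},r_y^{(m)}).
\end{aligned}\tag{A.16}$$
Inserted into (A.14), all terms but those corresponding to a face in $F^{(t)}$ cancel, and we obtain
$$\langle s_\Lambda^{(t-1)}|\,T(\emptyset)\,|s_\Lambda^{(t)}\rangle\,e^{\tilde\beta e_+|V_+^{(t)}|+\tilde\beta e_-|V_-^{(t)}|}=\exp\bigl\{-\tilde\beta\sigma_0|F^{(t)}|\bigr\}.\tag{A.17}$$
Since $\sigma_0\ge\gamma_{cl}$, this gives the bound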
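$$\bigl|\langle s_\Lambda^{(t-1)}|\,T(\emptyset)\,|s_\Lambda^{(t)}\rangle\bigr|\,e^{\tilde\beta e_+|V_+^{(t)}|+\tilde\beta e_-|V_-^{(t)}|}\,e^{\tilde\beta\sigma_0|\Sigma_0^{(t)}|}\le e^{-\gamma_{cl}(|F^{(t)}|-|\Sigma_0^{(t)}|)}\tag{A.18}$$
if $C^{(t)}$ is empty.

Coming back to the more general case in which $C^{(t)}$ consists merely of classical cubes, we now use the condition (2.30). Let us assume that $e_0=e_-$ and hence
$$\min_q h^{(0)}(r_x^{(q)},r_y^{(q)})=h^{(0)}(r_x^{(-)},r_y^{(-)})\tag{A.19}$$
so that, by (2.29),
$$\delta(s,\tilde s)=h^{(0)}(s,\tilde s)-h^{(0)}(r_x^{(-)},r_y^{(-)})\tag{A.20}$$
(the case where $e_0=e_+$ is analogous). As in (A.16), we rewrite
$$\begin{aligned}
\tilde\beta e_0|C^{(t)}|=\tilde\beta e_-|C^{(t)}|&=\frac{\tilde\beta}{2}\sum_{x\in\bar\Lambda:\,C(x,t)\in C^{(t)}}\ \sum_{\langle uv\rangle:\,x\in\{u,v\}}h^{(0)}(r_u^{(-)},r_v^{(-)})\\
&=\tilde\beta\sum_{\langle xy\rangle\subset\bar\Lambda}\frac{|\{(x,t),(y,t)\}\cap C^{(t)}|}{2}\,h^{(0)}(r_x^{(-)},r_y^{(-)}).
\end{aligned}\tag{A.21}$$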
In order to use the condition (2.30), we need some notation. Consider a point $x(0)=((0,x_2,\dots,x_d),t)$ in $\Sigma_0^{(t)}$, and define
$$x(h)=((h,x_2,\dots,x_d),t),\tag{A.22}$$
the line
$$L=\bigcup_{h\in\mathbb R}\{x(h)\}\tag{A.23}$$
and
$$L^\star=L\cap C^{(t)}.\tag{A.24}$$
As a first step, we analyze the contribution to the sum in (A.14) of all bonds $\langle uv\rangle$ with both endpoints in $L$. With a slight abuse of notation, if $\{u,v\}\subset L$, we say that $\langle uv\rangle\subset L$. Inserting (A.16) and (A.21) into (A.14), we get a sum over bonds $\langle uv\rangle\subset\bar\Lambda$ in which each term is a sum of four terms: one from $e_+|V_+^{(t)}|$, one from $e_-|V_-^{(t)}|$, one from $e_0|C^{(t)}|$, and one from the sum in (A.14). There are several cases to consider. First, if $\langle uv\rangle\subset L$ lies entirely in $V_+^{(t)}$ or entirely in $V_-^{(t)}$, then these four terms cancel. Second, if $\langle uv\rangle\subset L$ is dual to a face $F\in F^{(t)}$, then the four terms yield a factor of $\sigma_0$. Finally, we are left with terms $\langle uv\rangle\subset\bar\Lambda$ such that at least one of the two endpoints of $\langle uv\rangle$ lies in $C^{(t)}$. In order to analyze the sum of these terms, we decompose the set $L^\star$ into a finite number of nonempty intervals $I_1=[a_1,b_1],\dots,I_n=[a_n,b_n]$, each corresponding to the intersection of $L$ with one or more cubes $C\in C^{(t)}$, and a finite number of isolated points $p_1,\dots,p_\ell$, corresponding to intersections of $L$ with faces $F\in F^{(t)}$. For each interval $I_k=[a_k,b_k]$, we then denote the last lattice point below $I_k$ by $(x_-^{(k)},t)=a_k-(\tfrac12,0,\dots,0)$, the first lattice point above $I_k$ by $(x_+^{(k)},t)=b_k+(\tfrac12,0,\dots,0)$, and define the interval $\bar I_k$ by $\bar I_k=[(x_-^{(k)},t),(x_+^{(k)},t)]$. For each such interval, either both $(x_-^{(k)},t)$ and $(x_+^{(k)},t)$ lie in $V_+^{(t)}$, both lie in $V_-^{(t)}$, or one lies in $V_-^{(t)}$ and one lies in $V_+^{(t)}$. If both $(x_-^{(k)},t)$ and $(x_+^{(k)},t)$ lie in $V_m^{(t)}$, we use the Peierls estimate (2.7), the fact that $\gamma_{cl}\ge0$, and the fact that $s_{x_-^{(k)}}=s_{x_+^{(k)}}=r_0^{(m)}$ (the value of the reference configuration at the origin), to bound
$$\begin{aligned}
\sum_{\langle xy\rangle\subset\bar I_k}\tilde h^{(0)}_{xy}(s_x,s_y)&=\sum_{\langle xy\rangle\subset\bar I_k}h^{(0)}(s_x,s_y)\\
&=h^{(0)}\bigl(s_{x_-^{(k)}},s_{[x_-^{(k)}+(1,0,\dots,0)]}\bigr)+\sum_{\langle xy\rangle\subset\bar I_k:\,x_-^{(k)}\notin\{x,y\}}h^{(0)}(s_x,s_y)\\
&\ge h^{(0)}(r_0^{(m)},r_0^{(m)})+\sum_{\langle xy\rangle\subset\bar I_k:\,x_-^{(k)}\notin\{x,y\}}h^{(0)}(s_x,s_y)\\
&\ge h^{(0)}(r_0^{(m)},r_0^{(m)})+(|\bar I_k|-1)\,h^{(0)}(r_0^{(-)},r_0^{(-)}).
\end{aligned}\tag{A.25}$$
If one of the two points $(x_-^{(k)},t)$ and $(x_+^{(k)},t)$ lies in $V_-^{(t)}$ and one of them lies in $V_+^{(t)}$, we use the bound (2.30) together with (A.20) to get
$$\begin{aligned}
\sum_{\langle xy\rangle\subset\bar I_k}\tilde h^{(0)}_{xy}(s_x,s_y)&=\sum_{\langle xy\rangle\subset\bar I_k}h^{(0)}(s_x,s_y)\\
&=|\bar I_k|\,h^{(0)}(r_0^{(-)},r_0^{(-)})+\sum_{\langle xy\rangle\subset\bar I_k}\delta(s_x,s_y)\\
&\ge|\bar I_k|\,h^{(0)}(r_0^{(-)},r_0^{(-)})+\delta(r_0^{(-)},r_0^{(+)})\\
&=h^{(0)}(r_0^{(-)},r_0^{(+)})+(|\bar I_k|-1)\,h^{(0)}(r_0^{(-)},r_0^{(-)}).
\end{aligned}\tag{A.26}$$
Combining the bound (A.25) with the corresponding terms from (A.16) and (A.21), we get a contribution to the right hand side of (A.14) that is bounded by one, while the bound (A.26) leads to a contribution which is bounded by $e^{-\tilde\beta\sigma_0}$. Combining these factors for the different intervals $\bar I_k$, and recalling that the isolated points $p_1,\dots,p_\ell$ contribute a factor $e^{-\tilde\beta\sigma_0}$ as well, we conclude that the sum over all bonds in $L$ gives a contribution that can be bounded by
$$e^{-\tilde\beta\sigma_0\max\{1,\ell\}},\tag{A.27}$$
where $\ell=\ell(x(0))$ is the number of horizontal faces in $F^{(t)}$ that intersect the line $L=L(x(0))$.

Next we analyze the sum over horizontal bonds in (A.14), which we write as
$$\frac{\tilde\beta}{2}\sum_{x\in\bar\Lambda}\ \sum_{\langle uv\rangle:\,x\in\{u,v\}}^{\prime}h^{(0)}(s_u,s_v),\tag{A.28}$$
where we have used the symbol $\sum'$ to indicate that the sum is restricted to bonds $\langle uv\rangle$ for which $\langle(u,t),(v,t)\rangle$ is parallel to the flat interface (in which case we say that $\langle uv\rangle$ is horizontal).

Observing that, for each $x\in\bar\Lambda$, either $(x,t)\in V_+^{(t)}\cup V_-^{(t)}$ or $(x,t)\in C^{(t)}$, we first bound
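$$\begin{aligned}
\frac{\tilde\beta}{2}\sum_{m=\pm}\ \sum_{x\in\bar\Lambda:\,(x,t)\in V_m^{(t)}}\ \sum_{\langle uv\rangle:\,x\in\{u,v\}}^{\prime}h^{(0)}(s_u,s_v)\ \ge\ &\frac{\tilde\beta}{2}\sum_{m=\pm}\ \sum_{x\in\bar\Lambda:\,(x,t)\in V_m^{(t)}}\ \sum_{\langle uv\rangle:\,x\in\{u,v\}}^{\prime}h^{(0)}(r_u^{(m)},r_v^{(m)})\\
&+\sum_{\langle uv\rangle}^{\prime\prime}\tilde\beta\sigma_0,
\end{aligned}\tag{A.29}$$
where the last sum goes over all horizontal nearest-neighbor pairs $\langle uv\rangle$ such that $(u,t)$ lies in $V_+^{(t)}$ and $(v,t)$ lies in $V_-^{(t)}$, or vice versa. Note that this sum can also be written as a sum over all faces $F\in F^{(t)}$ that are not parallel to $\Sigma_0$.

Considering finally the points $x\in\bar\Lambda$ for which $(x,t)$ is not in $V_+^{(t)}\cup V_-^{(t)}$ (which is equivalent to $(x,t)\in C^{(t)}$), we use the fact that $s_x\notin\{r_0^{(+)},r_0^{(-)}\}$ and the Peierls bound (2.7) to estimate
$$\begin{aligned}
\frac{\tilde\beta}{2}\sum_{x\in\bar\Lambda:\,(x,t)\in C^{(t)}}\ \sum_{\langle uv\rangle:\,x\in\{u,v\}}^{\prime}h^{(0)}(s_u,s_v)\ \ge\ &\frac{\tilde\beta}{2}\sum_{x\in\bar\Lambda:\,(x,t)\in C^{(t)}}\ \sum_{\langle uv\rangle:\,x\in\{u,v\}}^{\prime}h^{(0)}(r_u^{(-)},r_v^{(-)})\\
&+\tilde\beta\gamma_{cl}(d-1)|C^{(t)}|.
\end{aligned}\tag{A.30}$$
Combining these terms with the corresponding terms from (A.16) and (A.21), we get a suppression coming from the horizontal bonds that is given by a factor
$$e^{-\tilde\beta\gamma_{cl}(d-1)|C^{(t)}|}\,e^{-\tilde\beta\sigma_0|F_\perp^{(t)}|},\tag{A.31}$$
where $F_\perp^{(t)}$ denotes the set of faces $F\in F^{(t)}$ that are dual to a horizontal bond.

Putting everything together, we finally get the bound
$$\begin{aligned}
\bigl|\langle s_\Lambda^{(t-1)}|\,T(\emptyset)\,|s_\Lambda^{(t)}\rangle\bigr|\,&e^{\tilde\beta e_+|V_+^{(t)}|+\tilde\beta e_-|V_-^{(t)}|}\,e^{\tilde\beta e_0|C^{(t)}|}\\
&\le\exp\Bigl\{-\gamma_{cl}\Bigl[|C_W^{(t)}|+|F_{W,\perp}^{(t)}|+\sum_{F\in\Sigma_0}\max\{0,|F_{W,h}^{(t)}\cap\pi^{-1}(F)|-1\}\Bigr]\Bigr\},
\end{aligned}\tag{A.32}$$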
where $F_{W,h}^{(t)}=F^{(t)}\cap F_{W,h}$.

In order to complete the proof, we finally have to bound the general term (A.12) for $B^{(t)}\ne\emptyset$. Recalling the relations (3.5) and (3.6) between $T(B)$ and $T(\tau,n)$, we first consider a term of the form
$$\langle s_\Lambda^{(t-1)}|\,T(\tau,n)\,|s_\Lambda^{(t)}\rangle\,e^{\tilde\beta e_+|V_+^{(t)}|+\tilde\beta e_-|V_-^{(t)}|}\,e^{\tilde\beta e_0|C^{(t)}|},\tag{A.33}$$
where $n$ is chosen in such a way that
$$B:=\bigcup_{A:\,n_A\ne0}A=B^{(t)}.\tag{A.34}$$
Defining $T_B(\tau,n\mid s_{\partial B}^{(t)})$ by replacing the operator $H^{(0)}_{\Lambda,+-}$ in (3.3) by the operator $H^{(0)}_{\Lambda,+-}(B\mid s_{\partial B})$, see Eq. (3.11), we rewrite $\langle s_\Lambda^{(t-1)}|\,T(\tau,n)\,|s_\Lambda^{(t)}\rangle$ as
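$$\langle s_\Lambda^{(t-1)}|\,T(\tau,n)\,|s_\Lambda^{(t)}\rangle=e^{-\tilde\beta\sum_{\langle xy\rangle\subset\bar\Lambda\setminus B}\tilde h^{(0)}_{xy}(s_x^{(t)},s_y^{(t)})}\ \langle s_B^{(t-1)}|\,T_B(\tau,n\mid s_{\partial B}^{(t)})\,|s_B^{(t)}\rangle\prod_{x\in\Lambda\setminus B}\delta_{s_x^{(t-1)},s_x^{(t)}}\tag{A.35}$$
and then bound
$$\bigl|\langle s_B^{(t-1)}|\,T_B(\tau,n\mid s_{\partial B}^{(t)})\,|s_B^{(t)}\rangle\bigr|\le\bigl\|T_B(\tau,n\mid s_{\partial B}^{(t)})\bigr\|\le\bigl\|e^{-\tilde\beta H^{(0)}_{\Lambda,+-}(B\mid s_{\partial B}^{(t)})}\bigr\|\prod_A\|V_A\|^{n_A}.\tag{A.36}$$
Inserted into (A.35), this gives the bound
$$\bigl|\langle s_\Lambda^{(t-1)}|\,T(\tau,n)\,|s_\Lambda^{(t)}\rangle\bigr|\le\sup_{s_\Lambda}e^{-\tilde\beta\sum_{\langle xy\rangle\subset\bar\Lambda}\tilde h^{(0)}_{xy}(s_x,s_y)}\prod_A\|V_A\|^{n_A}\prod_{x\in\Lambda\setminus B}\delta_{s_x^{(t-1)},s_x^{(t)}},\tag{A.37}$$
where the supremum is taken over all configurations $s_\Lambda$ which agree with $s_\Lambda^{(t-1)}$ and $s_\Lambda^{(t)}$ on $\Lambda\setminus B^{(t)}$. Observing that all cubes in $C^{(t)}\setminus B^{(t)}$ must be classically excited cubes, we then use (A.32) to bound
$$\begin{aligned}
\sup_{s_\Lambda}e^{-\tilde\beta\sum_{\langle xy\rangle\subset\bar\Lambda}\tilde h^{(0)}_{xy}(s_x,s_y)}\le\ &e^{-\tilde\beta e_+|V_+^{(t)}|-\tilde\beta e_-|V_-^{(t)}|}\,e^{-\tilde\beta e_0|C^{(t)}|}\\
&\times\exp\Bigl\{-\gamma_{cl}\Bigl[|C_W^{(t)}\setminus B^{(t)}|+|F_{W,\perp}^{(t)}|+\sum_{F\in\Sigma_0}\max\{0,|F_{W,h}^{(t)}\cap\pi^{-1}(F)|-1\}\Bigr]\Bigr\}.
\end{aligned}\tag{A.38}$$
As a consequence,
$$\begin{aligned}
\bigl|\langle s_\Lambda^{(t-1)}|\,T(\tau,n)\,|s_\Lambda^{(t)}\rangle\bigr|\,&e^{\tilde\beta e_+|V_+^{(t)}|+\tilde\beta e_-|V_-^{(t)}|}\,e^{\tilde\beta e_0|C^{(t)}|}\\
&\le\exp\Bigl\{-\gamma_{cl}\Bigl[|C_W^{(t)}\setminus B^{(t)}|+|F_{W,\perp}^{(t)}|+\sum_{F\in\Sigma_0}\max\{0,|F_{W,h}^{(t)}\cap\pi^{-1}(F)|-1\}\Bigr]\Bigr\}\times\prod_A\|V_A\|^{n_A}.
\end{aligned}\tag{A.39}$$
Continuing as in the proof of Lemma 4.2 in [BKU96], we conclude that
$$\begin{aligned}
\bigl|\langle s_\Lambda^{(t-1)}|\,T(B^{(t)})\,|s_\Lambda^{(t)}\rangle\bigr|\,&e^{\tilde\beta e_+|V_+^{(t)}|+\tilde\beta e_-|V_-^{(t)}|}\,e^{\tilde\beta e_0|C^{(t)}|}\le e^{-(\gamma_Q-1)|B^{(t)}|}\\
&\times\exp\Bigl\{-\gamma_{cl}\Bigl[|C_W^{(t)}\setminus B^{(t)}|+|F_{W,\perp}^{(t)}|+\sum_{F\in\Sigma_0}\max\{0,|F_{W,h}^{(t)}\cap\pi^{-1}(F)|-1\}\Bigr]\Bigr\},
\end{aligned}\tag{A.40}$$
provided condition (3.17) of Lemma 3.1 of this paper is satisfied. Inserting this into (A.11) and observing that the sum over $s_\Lambda^{(1)},\dots,s_\Lambda^{(M)}$ can be bounded by the size of the spin space to the power $|S|$, while the sum over $B^{(1)},\dots,B^{(M)}$ can be bounded by $2^{|C_S|}$, we obtain the desired bound (3.41) of Lemma 3.2.

Acknowledgement. C.B. and J.T.C. are grateful for the hospitality of ETH-Hönggerberg, where this work was begun, and the IAS in Princeton, where it was completed.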
References

[Aiz80] Aizenman, M.: Translation invariance and instability of phase coexistence in the two-dimensional Ising system. Commun. Math. Phys. 73, 83–94 (1980)
[BCF96] Borgs, C., Chayes, J. T., Fröhlich, J.: Dobrushin states for classical spin systems with complex interactions. Preprint
[BCF97] Borgs, C., Chayes, J. T., Fröhlich, J.: In preparation
[BI89] Borgs, C., Imbrie, J.: A unified approach to phase diagrams in field theory and statistical mechanics. Commun. Math. Phys. 123, 305–328 (1989)
[BI92] Borgs, C., Imbrie, J.: Finite-size scaling and surface tension from effective one dimensional systems. Commun. Math. Phys. 145, 235–280 (1992)
[BK90] Borgs, C., Kotecký, R.: A rigorous theory of finite size scaling at first order phase transitions. J. Stat. Phys. 61, 79–119 (1990)
[BKU96] Borgs, C., Kotecký, R., Ueltschi, D.: Low temperature phase diagrams for quantum perturbations of classical spin systems. Commun. Math. Phys. 181, 409–446 (1996)
[BF85] Bricmont, J., Fröhlich, J.: Statistical mechanics methods in particle structure analysis of lattice field theories II: Scalar and surface models. Commun. Math. Phys. 98, 553–578 (1985)
[Bry86] Brydges, D.: A short course on cluster expansions. In: Osterwalder, K., Stora, R. (eds.) Critical phenomena, random systems, gauge theories (Les Houches 1984). Amsterdam: North Holland (1986)
[DFF96] Datta, N., Fernandez, R., Fröhlich, J.: Low-temperature phase diagrams of quantum lattice systems. I. Stability for quantum perturbations of classical systems with finitely many ground states. J. Stat. Phys. 84, 455–534 (1996)
[DFFR96] Datta, N., Fernandez, R., Fröhlich, J., Rey-Bellet, L.: Low-temperature phase diagrams of quantum lattice systems. II. Convergent perturbation expansions and stability in systems with infinite degeneracy. Preprint
[Dob65] Dobrushin, R. L.: Existence of a phase transition in the two-dimensional and three-dimensional Ising models. Sov. Phys. Doklady 10, 111–113 (1965)
[Dob72] Dobrushin, R. L.: Gibbs states describing the coexistence of phases for a three-dimensional Ising model. Theor. Prob. Appl. 17, 582–600 (1972)
[FS81] Fröhlich, J., Spencer, T.: The Kosterlitz-Thouless transition in two-dimensional abelian spin systems and the Coulomb gas. Commun. Math. Phys. 81, 527–602 (1981)
[Gal72] Gallavotti, G.: The phase separation line in the two-dimensional Ising model. Commun. Math. Phys. 27, 103–136 (1972)
[GJ85] Glimm, J., Jaffe, A.: Expansions in statistical physics. Commun. Pure and Appl. Math. XXXVIII, 613–630 (1985)
[Gri64] Griffiths, R. B.: Peierls proof of spontaneous magnetization in a two-dimensional Ising ferromagnet. Physical Review A 136, 437–439 (1964)
[Hig79] Higuchi, T.: On some limit theorems related to the phase separation line in the two-dimensional Ising model. Z. Wahrsch. Verw. Gebiete 50, 287–315 (1979)
[HKZ88] Holický, P., Kotecký, R., Zahradník, M.: Rigid interfaces for lattice models at low temperatures. J. Stat. Phys. 50, 755–812 (1988)
[KP93] Kashuba, A., Pokrovsky, V.: Stripe domain structures in a thin ferromagnetic film. Phys. Rev. Lett. 70, 3155–3158 (1993)
[Mer79] Merlini, D.: Boundary conditions and cluster property in two-dimensional Ising ferromagnets. J. Stat. Phys. 21, 739–745 (1979)
[Nac96] Nachtergaele, B.: In preparation
[Ons44] Onsager, L.: Crystal statistics. I. A two-dimensional model with an order-disorder transition. Phys. Rev. (2) 65, 117–149 (1944)
[Pei36] Peierls, R.: On the Ising model of ferromagnetism. Proc. Camb. Phil. Soc. 32, 477–481 (1936)
[PP90] Pescia, D., Pokrovsky, V.: Perpendicular versus in-plane magnetization in a 2D Heisenberg monolayer at finite temperatures. Phys. Rev. Lett. 65, 2599–2601 (1990); 70, 1185 (1993)
[PS75] Pirogov, S. A., Sinai, Ya. G.: Phase diagrams of classical lattice systems. Theoretical and Mathematical Physics 25, 1185–1192 (1975); 26, 39–49 (1976)
[Sei82] Seiler, E.: Gauge theories as a problem of constructive quantum field theory and statistical mechanics. Lecture Notes in Physics, Vol. 159. Berlin–Heidelberg–New York: Springer-Verlag (1982)
[SS76] Seiler, E., Simon, B.: Nelson’s symmetry and all that in Yukawa and $(\phi^4)_3$ theories. Ann. Phys. 97, 470 (1976)
[vB77] van Beijeren, H.: Exactly solvable model for the roughening transition of a crystal surface. Phys. Rev. Lett. 38, 993–996 (1977)
[Zah84] Zahradník, M.: An alternate version of Pirogov-Sinai theory. Commun. Math. Phys. 93, 559–581 (1984)
Communicated by Ya. G. Sinai
Commun. Math. Phys. 189, 621 – 630 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997
Dobrushin’s Uniqueness for Quantum Lattice Systems with Nonlocal Interaction

S. Albeverio (1,4), Yu.G. Kondratiev (2), M. Röckner (3), T.V. Tsikalenko (2)

1 Fakultät für Mathematik, Ruhr-Universität Bochum, D-44780 Bochum, Germany
2 BiBoS, Universität Bielefeld, D-33615 Bielefeld, Germany and Institute of Mathematics, Kiev, Ukraine
3 Fakultät für Mathematik, Universität Bielefeld, D-33615 Bielefeld, Germany
4 BiBoS, Universität Bielefeld, D-33615 Bielefeld, Germany
Received: 25 October 1996 / Accepted: 3 March 1997
This paper is dedicated to the memory of Ronald Dobrushin, to whom we are forever grateful, for all he has given us, by his inspired work, his warm friendship, his being in many ways a true teacher for us.

Abstract: Based on Dobrushin’s fundamental criterion, we prove uniqueness of Euclidean Gibbs states for a certain class of quantum lattice systems with unbounded spins, nonharmonic pair potentials and infinite radius of interaction. The necessary estimates on Dobrushin’s coefficients are obtained from the Log-Sobolev inequality which holds for the one-point conditional distributions on the infinite dimensional single spin (= loop) spaces.

1. Temperature Loop Space Representation for Quantum Gibbs States

Let $\mathbb Z^d$, $d\in\mathbb N$, be the integer lattice with the Euclidean distance $|k-j|$, $k,j\in\mathbb Z^d\subset\mathbb R^d$. By $S(\mathbb Z^d)$ and $S'(\mathbb Z^d)$ we denote the mutually dual spaces of rapidly decreasing resp. slowly increasing sequences over $\mathbb Z^d$, i.e.,
$$S(\mathbb Z^d):=\Bigl\{x=(x_k)_{k\in\mathbb Z^d}\in\mathbb R^{\mathbb Z^d}\ \Big|\ \forall\,p\in\mathbb Z_+,\ \sum_{k\in\mathbb Z^d}(1+|k|)^{p}x_k^2<\infty\Bigr\},$$
$$S'(\mathbb Z^d):=\Bigl\{x=(x_k)_{k\in\mathbb Z^d}\in\mathbb R^{\mathbb Z^d}\ \Big|\ \exists\,p\in\mathbb Z_+,\ \sum_{k\in\mathbb Z^d}(1+|k|)^{-p}x_k^2<\infty\Bigr\}.$$
Note that $S'(\mathbb Z^d)$ obviously is a sub-algebra of $\mathbb R^{\mathbb Z^d}$. We consider a translation invariant system of interacting quantum anharmonic oscillators with the heuristic Hamiltonian
$$H:=-\frac{1}{2m}\sum_{k\in\mathbb Z^d}\Delta_k+\frac{a^2}{2}\sum_{k\in\mathbb Z^d}x_k^2+\sum_{\{k,j\}\subset\mathbb Z^d}J(|k-j|)\,U(x_k-x_j)+\sum_{k\in\mathbb Z^d}V(x_k),\tag{1.1}$$
where $m>0$ (= physical mass). The two-particle interactions $U(x_k-x_j)$ have intensities
$$J:\mathbb R_+\to\mathbb R_+,\quad J(0)=0,\quad(J(|k|))_{k\in\mathbb Z^d}\in S(\mathbb Z^d),\quad\tilde J:=\sum_{k\in\mathbb Z^d}J(|k|),\tag{1.2}$$
and $U\in C^2(\mathbb R^1)$ is a convex function such that $\exists\,b_-,b_+>0$: $\forall q\in\mathbb R^1$,
$$b_-^2\le U''(q)\le b_+^2,\qquad0=U(0)\le U(q)=U(-q)\le b_+^2q^2/2.\tag{1.3}$$
The harmonic self-interaction is defined by a2 x2k /2 with a constant a2 > 0. The anharmonic self-interaction potential has the form V (q) = V0 (q) + W (q) .
(1.4)
Here V0 ∈ C 2 (R1 ) is a convex function with polynomially bounded derivatives, that is, ∃ b, C > 0, M ∈ N: ∀q ∈ R1 , V000 (q) ≥ b2 , |V0(l) (q)| ≤ C(1 + |q|)M , l = 0, 1, 2.
(1.5)
The potential W describes the presence of (possible) wells in the self-interaction V and is given by a bounded function W ∈ Cb (R1 ).
(1.6)
We set $\delta(W) := \sup_{\mathbb{R}^1} W - \inf_{\mathbb{R}^1} W$. Models of the above type in the classical case where the single spin spaces are just $\mathbb{R}^1$ have been studied in [BH-K].

A fundamental problem of equilibrium quantum statistical physics is the study of temperature (i.e., Gibbs) states. We will take the Euclidean approach (see e.g. the early work [AH-K] and its further developments in [KL, GK]) to give a rigorous meaning to the Gibbs state $G_\beta$ of the lattice system (1.1) at inverse temperature $\beta > 0$, i.e., we construct (via a path space representation) the corresponding Euclidean Gibbs measure $\nu_\beta$ on the "loop lattice" $\Omega_\beta$ defined as follows: Let $S_\beta$ be the circle of length $\beta$. The spaces $C(S_\beta)$ and $H_\beta := L^2(S_\beta)$ consist of all continuous resp. (relative to Lebesgue measure) square integrable functions $\omega : S_\beta \to \mathbb{R}^1$, equipped with the sup-norm $\|\cdot\|_{C(S_\beta)}$ resp. $L^2$-norm $\|\cdot\|_\beta = (\cdot,\cdot)_\beta^{1/2}$. For the corresponding Borel σ-algebras we have: $\mathcal{B}(C(S_\beta)) = \mathcal{B}(H_\beta) \cap C(S_\beta)$. As the configuration space we introduce the temperature loop space ("loop lattice")
\[
\Omega_\beta := C(S_\beta)^{\mathbb{Z}^d} = \{ \omega = (\omega_k)_{k\in\mathbb{Z}^d} \mid \omega : S_\beta \to \mathbb{R}^{\mathbb{Z}^d},\ \omega_k \in C(S_\beta) \},
\]
endowed with the product topology and with the Borel σ-algebra $\mathcal{B}(\Omega_\beta)$ (= σ-algebra generated by the cylinder sets $\{\omega \in \Omega_\beta \mid (\omega_k)_{k\in\Lambda} \in B_\Lambda\}$, $B_\Lambda \in \mathcal{B}(C(S_\beta)^\Lambda)$, $\Lambda \subset \mathbb{Z}^d$ with $|\Lambda| < \infty$). Because of the infinite radius of interaction, we also need the subspace of tempered configurations
\[
\Omega_\beta^t := \{ \omega \in \Omega_\beta \mid (\|\omega_k\|_\beta)_{k\in\mathbb{Z}^d} \in S'(\mathbb{Z}^d) \}
\]
with the topology and Borel structure induced by $\Omega_\beta$.

The Euclidean Gibbs measure $\mu_\beta$ is described by a corresponding family of local specifications $\pi_{\beta,\Lambda}$, $\Lambda \subset \mathbb{Z}^d$, $|\Lambda| < \infty$ (cf. [Do1, Do2, Pre, Ge]). Let $\Delta_\beta$ be the
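Laplace-Beltrami operator on the circle $S_\beta$ and $\gamma_\beta(d\omega_k)$ be the Gaussian measure on $(H_\beta, \mathcal{B}(H_\beta))$ with zero mean value and correlation operator $(-m\Delta_\beta + a^2 \mathbb{1})^{-1}$. Actually, the set $C(S_\beta)$ of continuous loops has full measure, i.e., $\gamma_\beta(C(S_\beta)) = 1$, and the measure $\gamma_\beta$ on the space $(C(S_\beta), \mathcal{B}(C(S_\beta)))$ can be viewed as the canonical realization of the well-known oscillator bridge process of length $\beta$ (see e.g. [Si]). We then define
\[
d\sigma_\beta(\omega_k) := \frac{1}{Z_\beta} \exp\Big\{ -\int_{S_\beta} V(\omega_k(\tau))\, d\tau \Big\}\, d\gamma_\beta(\omega_k), \tag{1.7}
\]
as a probability measure on $(C(S_\beta), \mathcal{B}(C(S_\beta)))$ and consider the reference measure $\times_{k\in\mathbb{Z}^d}\, d\sigma_\beta(\omega_k)$ on $(\Omega_\beta^t, \mathcal{B}(\Omega_\beta^t))$. Obviously, the definition (1.7) is meaningful for the potentials $V$ dealt with in (1.4)-(1.6), and moreover, the measure $\sigma_\beta$ has all the moments of the form
\[
E_{\sigma_\beta}\big[ \|\omega_k\|_{C(S_\beta)}^M \big] < \infty, \quad M \in \mathbb{N}, \tag{1.8}
\]
due to the same property of the Gaussian measure $\gamma_\beta$.

The local specifications $\pi_{\beta,\Lambda}$, $\Lambda \subset \mathbb{Z}^d$, $|\Lambda| < \infty$, are defined as stochastic kernels on $(\Omega_\beta^t, \mathcal{B}(\Omega_\beta^t))$ in the following way: $\forall B \in \mathcal{B}(\Omega_\beta^t)$, $\xi \in \Omega_\beta^t$,
\[
\pi_{\beta,\Lambda}(B|\xi) := \frac{1}{Z_{\beta,\Lambda}(\xi)} \int_{\Omega_{\beta,\Lambda}} \exp\{ -I_{\beta,\Lambda}(\omega|\xi) \}\, \mathbb{1}_B(\omega_\Lambda \times \xi_{\Lambda^c}) \times_{k\in\Lambda} d\sigma_\beta(\omega_k). \tag{1.9}
\]
Here
\[
\Omega_{\beta,\Lambda} := C(S_\beta)^\Lambda, \qquad \omega_\Lambda := (\omega_k)_{k\in\Lambda} \in \Omega_{\beta,\Lambda},
\]
$Z_{\beta,\Lambda}(\xi)$ is a normalization constant, and
\[
I_{\beta,\Lambda}(\omega|\xi) := \int_{S_\beta} \Big[ \sum_{\{k,j\}\subset\Lambda} J(|k-j|)\, U(\omega_k(\tau) - \omega_j(\tau)) + \sum_{\substack{\{k,j\}\subset\mathbb{Z}^d \\ k\in\Lambda,\ j\in\Lambda^c}} J(|k-j|)\, U(\omega_k(\tau) - \xi_j(\tau)) \Big]\, d\tau \tag{1.10}
\]
is the pair interaction in the volume $\Lambda$ under the external boundary condition $\xi_{\Lambda^c} := (\xi_j)_{j\in\Lambda^c}$, $\Lambda^c := \mathbb{Z}^d \setminus \Lambda$. Due to assumption (1.2), $I_{\beta,\Lambda}(\omega|\xi)$ is well-defined for all $\xi \in \Omega_\beta^t$ and can be extended by continuity to the class of boundary conditions
\[
\xi \in \tilde\Omega_\beta^t := \big\{ \omega \in L^2(S_\beta)^{\mathbb{Z}^d} \mid (\|\omega_k\|_\beta)_{k\in\mathbb{Z}^d} \in S'(\mathbb{Z}^d) \big\}. \tag{1.11}
\]
An essential point is that for the stochastic kernels (1.9) the consistency condition holds [Do1, Do2, Pre, Ge]: $\forall\, \Lambda \subset \Lambda'$, $|\Lambda'| < \infty$, $B \in \mathcal{B}(\Omega_\beta^t)$, $\xi \in \Omega_\beta^t$,
\[
\pi_{\beta,\Lambda'}(\pi_{\beta,\Lambda}(B|\cdot)\,|\,\xi) := \int_{\Omega_\beta} \pi_{\beta,\Lambda'}(d\omega|\xi)\, \pi_{\beta,\Lambda}(B|\omega) = \pi_{\beta,\Lambda'}(B|\xi). \tag{1.12}
\]

Definition 1.1. A probability measure $\mu_\beta$ on $(\Omega_\beta^t, \mathcal{B}(\Omega_\beta^t))$ is called a (Euclidean) Gibbs state for the local specifications $\pi_{\beta,\Lambda}$, $\Lambda \subset \mathbb{Z}^d$, $|\Lambda| < \infty$, if it satisfies the DLR (Dobrushin-Lanford-Ruelle) equations, i.e., for all $\Lambda \subset \mathbb{Z}^d$, $|\Lambda| < \infty$,
\[
\mu_\beta \pi_{\beta,\Lambda} = \mu_\beta. \tag{1.13}
\]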
As usual, we restrict our considerations to the subset $\mathcal{G}_\beta^t$ of tempered Gibbs measures, i.e., those $\mu_\beta$ from Definition 1.1 with
\[
\big( \langle \|\omega_k\|_\beta \rangle_{\mu_\beta} \big)_{k\in\mathbb{Z}^d} \in S'(\mathbb{Z}^d), \tag{1.14}
\]
where $\langle f \rangle_{\mu_\beta} = \int_{\Omega_\beta} f(\omega)\, d\mu_\beta(\omega)$ for $\mu_\beta$-integrable $f : \Omega_\beta \to \mathbb{R}$. Note that any probability measure $\mu$ on $(\Omega_\beta, \mathcal{B}(\Omega_\beta))$ which satisfies (1.14) is in fact supported on $\Omega_\beta^t$, i.e., $\mu(\Omega_\beta^t) = 1$.

The initial step in the study of Gibbs measures is to ensure that the set $\mathcal{G}_\beta^t$ is nonempty, which is not evident for unbounded spin systems. The existence problem goes back to Dobrushin's papers [Do1, Do2], where general existence criteria for the Gibbs distribution were first given. For further research on the existence problem for quantum lattice models, based on various techniques, see [AH-K, BaK1, BaK2, GK, Pa, PY1, PY2]. Regarding the quantum lattice system with Hamiltonian (1.1), the simplest way to show that $\mathcal{G}_\beta^t \neq \emptyset$ is to take advantage of the fact that the interaction is superstable as soon as assumptions (1.2)-(1.4) hold. As was shown in [PY1], for a subclass of boundary conditions $\xi \in \Omega_\beta^{t,0} \subset \Omega_\beta^t$ (for instance, such that $\sup_k \|\xi_k\|_\beta < \infty$) the family of local specifications $\pi_{\beta,\Lambda}$, $\Lambda \subset \mathbb{Z}^d$, $|\Lambda| < \infty$, has at least one accumulation point $\mu_\beta$ from the subset of so-called Ruelle type "superstable" Gibbs measures $\mathcal{G}_\beta^{t,0} \subset \mathcal{G}_\beta^t$. The aim of our paper is to prove a sufficient condition implying $|\mathcal{G}_\beta^t| = 1$. All corresponding results below have been announced in [AKRT2].
2. Uniqueness of Gibbs States

We present the main result of this paper.

Theorem 2.1. Suppose that the parameters of the system specified in (1.1) (resp. its rigorous implementation through Definition 1.1) satisfy the relation
\[
\frac{e^{\beta\delta(W)}\, b_+^2}{b_-^2 + \tilde J^{-1}(a^2 + b^2)} < 1; \tag{2.1}
\]
then the set $\mathcal{G}_\beta^t$ of tempered Gibbs measures consists of exactly one point.

Proof. The proof is based on Dobrushin's fundamental uniqueness criterion (see [Do2, DoS, DSK, Fö]). The proof is given in full detail in [AKRT1], though for a more special class of quantum lattice models than (1.1). Below we outline the basic ideas and the new features of our more general case. In fact we shall prove more, namely that there exists at most one tempered Gibbs measure $\mu_\beta$ on the enlarged configuration space
\[
\tilde\Omega_\beta^t := \{ \omega \in H_\beta^{\mathbb{Z}^d} \mid (\|\omega_k\|_\beta)_{k\in\mathbb{Z}^d} \in S'(\mathbb{Z}^d) \}, \tag{2.2}
\]
which satisfies the growth condition (1.14) and corresponds to the family of local specifications (1.9) with the definition of the interaction $I_{\beta,\Lambda}(\omega|\xi)$ extended to all $\xi \in \tilde\Omega_\beta^t$. It is enough to consider the one-dimensional Gibbs distributions $\nu_k(d\omega|\xi)$ in volumes $\Lambda_k = \{k\}$, $k \in \mathbb{Z}^d$, with boundary conditions $\xi = (\xi_j)_{j\in\mathbb{Z}^d} \in \tilde\Omega_\beta^t$, which are defined by
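\[
\nu_k(B|\xi) := \pi_{\{k\}}\big( \{ \omega \in \tilde\Omega_\beta^t \mid \omega_k \in B \} \,\big|\, \xi \big), \quad B \in \mathcal{B}(H_\beta).
\]
These are explicitly given by
\[
\nu_k(d\omega_k|\xi) = \frac{1}{Z_{\beta,k}(\xi)} \exp\{ -I_{\beta,k}(\omega_k|\xi) \}\, d\sigma_\beta(\omega_k), \tag{2.3}
\]
where $Z_{\beta,k}(\xi)$ is a normalization constant and
\[
I_{\beta,k}(\omega_k|\xi) = \int_{S_\beta} \Big[ \sum_{i\in\mathbb{Z}^d,\ i\neq k} J(|k-i|)\, U(\omega_k(\tau) - \xi_i(\tau)) \Big]\, d\tau. \tag{2.4}
\]
As is obvious from (1.8), for all $\xi \in \tilde\Omega_\beta^t$,
\[
\int_{H_\beta} \|\omega_k\|_\beta^M\, \nu_k(d\omega_k|\xi) < \infty, \quad M \in \mathbb{N}. \tag{2.5}
\]
The Dobrushin coefficients are defined by
\[
C_{kj} := \sup_{\substack{\xi,\eta\in\tilde\Omega_\beta^t \\ \forall i\neq j:\ \xi_i = \eta_i}} \frac{R(\nu_k(\cdot|\xi), \nu_k(\cdot|\eta))}{\|\xi_j - \eta_j\|_\beta}, \quad k \neq j, \tag{2.6}
\]
where $R$ denotes the Wasserstein distance between the distributions $\nu_k(\cdot|\xi)$ and $\nu_k(\cdot|\eta)$ on $H_\beta$. That is (cf. [Do2, Ra])
\[
R(\nu_k(\cdot|\xi), \nu_k(\cdot|\eta)) := \sup_{f\in \mathrm{Lip}_1(H_\beta)} \Big| \int_{H_\beta} f\, d\nu_k(\cdot|\xi) - \int_{H_\beta} f\, d\nu_k(\cdot|\eta) \Big|, \tag{2.7}
\]
where
\[
\mathrm{Lip}_1(H_\beta) := \Big\{ f : H_\beta \to \mathbb{R}^1 \;\Big|\; [f]_{\mathrm{Lip}} := \sup_{z,z'\in H_\beta,\ z\neq z'} \frac{|f(z) - f(z')|}{\|z - z'\|_{H_\beta}} \le 1 \Big\}.
\]
Actually, in the definition (2.7) one can take the supremum only over the subclass $\mathcal{F}C^\infty(H_\beta)$ of smooth cylinder functions from $\mathrm{Lip}_1(H_\beta)$ (it follows e.g. from the proof of Corollary 5.2 in [AKRT1]). Due to the Dobrushin uniqueness criterion, it suffices to prove that
\[
\sup_{k\in\mathbb{Z}^d} \sum_{j\in\mathbb{Z}^d,\ j\neq k} C_{kj} < 1. \tag{2.8}
\]
For $k, j \in \mathbb{Z}^d$, $k \neq j$, $f \in \mathrm{Lip}_1(H_\beta)$, and $\xi = (\xi_i)_{i\in\mathbb{Z}^d} \in \tilde\Omega_\beta^t$, let us consider the mapping
\[
H_\beta \ni \xi_j \mapsto \langle f \rangle_{\nu_k(d\omega_k|\xi)} := \int_{H_\beta} f(\omega_k)\, \nu_k(d\omega_k|\xi) \in \mathbb{R}^1. \tag{2.9}
\]
Because of (1.2)-(1.4) and (2.5) this mapping is Fréchet differentiable and the derivative in direction $\varphi \in H_\beta$ is the following: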
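\[
\begin{aligned}
\big( \nabla_{\xi_j} \langle f \rangle_{\nu_k(d\omega_k|\xi)}, \varphi \big)_\beta
&= J(|k-j|) \big[ \langle f(\omega_k)\, (U'(\omega_k - \xi_j), \varphi)_\beta \rangle_{\nu_k(d\omega_k|\xi)} - \langle f \rangle_{\nu_k(d\omega_k|\xi)}\, \langle (U'(\omega_k - \xi_j), \varphi)_\beta \rangle_{\nu_k(d\omega_k|\xi)} \big] \\
&= J(|k-j|)\, \mathrm{Cov}_{\nu_k(d\omega_k|\xi)}\big( f(\omega_k),\ (U'(\omega_k - \xi_j), \varphi)_\beta \big).
\end{aligned} \tag{2.10}
\]
The latter can be estimated as
\[
\big| \big( \nabla_{\xi_j} \langle f \rangle_{\nu_k(d\omega_k|\xi)}, \varphi \big)_\beta \big| \le J(|k-j|) \big( \mathrm{Var}_{\nu_k(d\omega_k|\xi)} f \big)^{1/2} \big( \mathrm{Var}_{\nu_k(d\omega_k|\xi)} (U'(\omega_k - \xi_j), \varphi)_\beta \big)^{1/2}, \tag{2.11}
\]
where as usual
\[
\mathrm{Var}_{\nu_k(d\omega_k|\xi)} f := \int_{H_\beta} \big( f - \langle f \rangle_{\nu_k(d\omega_k|\xi)} \big)^2\, \nu_k(d\omega_k|\xi)
\]
denotes the variance of the function $f$ w.r.t. the measure $\nu_k(d\omega_k|\xi)$ and $\mathrm{Cov}_{\nu_k(d\omega_k|\xi)}$ is defined correspondingly.

Now we are at the crucial point in the proof. For the measures $\nu_k(d\omega_k|\xi)$, uniformly w.r.t. the boundary conditions $\xi$, the Log-Sobolev (LS) inequality (see Lemma 2.2 below) holds and, as a consequence, so does the variance estimate (2.19) (see Corollary 2.3). Therefore,
\[
\mathrm{Var}_{\nu_k(d\omega_k|\xi)} f \le C_{LS} := \frac{e^{\beta\delta(W)}}{\tilde J b_-^2 + a^2 + b^2},
\]
and by (1.3)
\[
\mathrm{Var}_{\nu_k(d\omega_k|\xi)} (U'(\omega_k - \xi_j), \varphi)_\beta \le C_{LS}\, b_+^4\, \|\varphi\|_\beta^2. \tag{2.12}
\]
Hence from (2.11) and (2.12),
\[
\big| \big( \nabla_{\xi_j} \langle f \rangle_{\nu_k(d\omega_k|\xi)}, \varphi \big)_\beta \big| \le J(|k-j|)\, C_{LS}\, b_+^2\, \|\varphi\|_\beta,
\]
which by the mean-value theorem implies that for all $\xi, \eta \in \tilde\Omega_\beta^t$ such that $\xi_i = \eta_i$, $\forall i \neq j$,
\[
\Big| \int_{H_\beta} f\, d\nu_k(\cdot|\xi) - \int_{H_\beta} f\, d\nu_k(\cdot|\eta) \Big| \le J(|k-j|)\, C_{LS}\, b_+^2\, \|\xi_j - \eta_j\|_\beta,
\]
or, in terms of the Wasserstein distance,
\[
R(\nu_k(\cdot|\xi), \nu_k(\cdot|\eta)) \le J(|k-j|)\, C_{LS}\, b_+^2\, \|\xi_j - \eta_j\|_\beta. \tag{2.13}
\]
By (2.6) and (2.13) we have $C_{kj} \le J(|k-j|)\, C_{LS}\, b_+^2$. Summing over $j \neq k$ and using (1.2) and the assumption (2.1), the estimate (2.8) follows, which in turn yields $|\mathcal{G}_\beta^t| = 1$.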
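The last step of the proof is a short piece of arithmetic that can be checked numerically. The following Python sketch sums the bound (2.13) over $j \neq k$, using $\sum_{j\neq k} J(|k-j|) = \tilde J$ from (1.2), and compares the result with the left-hand side of (2.1). The function names and the sample parameter values are our own illustrative choices and do not come from the paper.

```python
import math


def log_sobolev_constant(beta, delta_w, j_tilde, b_minus, a, b):
    """C_LS as in (2.18): exp(beta * delta(W)) / (J~ * b_-^2 + a^2 + b^2)."""
    return math.exp(beta * delta_w) / (j_tilde * b_minus**2 + a**2 + b**2)


def dobrushin_row_sum_bound(beta, delta_w, j_tilde, b_minus, b_plus, a, b):
    """Bound on sum_{j != k} C_kj obtained by summing (2.13) over j.

    Each coefficient satisfies C_kj <= J(|k-j|) * C_LS * b_+^2, and the
    intensities J(|k-j|) sum to J~ because J(0) = 0.
    """
    c_ls = log_sobolev_constant(beta, delta_w, j_tilde, b_minus, a, b)
    return j_tilde * c_ls * b_plus**2


def condition_2_1_lhs(beta, delta_w, j_tilde, b_minus, b_plus, a, b):
    """Left-hand side of the uniqueness condition (2.1)."""
    numerator = math.exp(beta * delta_w) * b_plus**2
    denominator = b_minus**2 + (a**2 + b**2) / j_tilde
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    sample = dict(beta=0.5, delta_w=0.2, j_tilde=1.0,
                  b_minus=1.0, b_plus=1.2, a=1.0, b=1.0)
    bound = dobrushin_row_sum_bound(**sample)
    print("row-sum bound:", bound)
    print("lhs of (2.1): ", condition_2_1_lhs(**sample))
    print("Dobrushin criterion (2.8) guaranteed:", bound < 1.0)
```

The two quantities agree identically, so (2.1) is exactly the requirement that the row-sum bound stay below one. Lowering `beta`, `delta_w` or `j_tilde` in the sample decreases the bound, in line with remark (i) of Section 3.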
In order to formulate the LS-inequality and its consequences, first we introduce the spaces of smooth functions on $H_\beta$. We define the space of $k$-times, $k \in \mathbb{N} \cup \{\infty\}$, continuously differentiable cylinder functions on $H_\beta$,
\[
\mathcal{F}C^k(H_\beta) := \{ g((\cdot, l_1)_\beta, \ldots, (\cdot, l_n)_\beta) \mid n \in \mathbb{N},\ g \in C^k(\mathbb{R}^n),\ l_1, \ldots, l_n \in H_\beta \}, \tag{2.14}
\]
and its subspace
\[
\mathcal{F}C_b^k(H_\beta) := \{ g((\cdot, l_1)_\beta, \ldots, (\cdot, l_n)_\beta) \mid n \in \mathbb{N},\ g \in C_b^k(\mathbb{R}^n),\ l_1, \ldots, l_n \in H_\beta \}. \tag{2.15}
\]
We shall use the symbol $\nabla u(z) \in H_\beta$, $z \in H_\beta$, for the gradient realization of the derivative $u'(z) \in L(H_\beta, \mathbb{R}^1)$.

Having fixed a boundary condition $\xi \in \tilde\Omega_\beta^t$, the subject of our study now will be the one-particle measure $\nu^\xi(\cdot) := \nu_k(\cdot|\xi)$ on the spin space $C(S_\beta) \subset L^2(S_\beta) =: H_\beta$ given by (2.3), (2.4). On the domain $\mathcal{F}C_b^\infty(H_\beta)$ we define the classical pre-Dirichlet form associated with the measure $\nu^\xi$:
\[
\mathcal{E}_{\nu^\xi}(u, v) := \int_{H_\beta} (\nabla u(\omega), \nabla v(\omega))_\beta\, d\nu^\xi(\omega). \tag{2.16}
\]
Clearly, $(\mathcal{E}_{\nu^\xi}, \mathcal{F}C_b^\infty(H_\beta))$ is closable on $L^2(\nu^\xi) := L^2(H_\beta, \nu^\xi)$ [AR1]. We denote its closure by $(\mathcal{E}_{\nu^\xi}, D(\mathcal{E}_{\nu^\xi}))$. For a more detailed discussion of infinite-dimensional Dirichlet forms and additional references see [AR, AKR].

Lemma 2.2 (Log-Sobolev inequality). For all $u \in \mathcal{F}C_b^\infty(H_\beta)$ the following inequality is true:
\[
\int_{H_\beta} |u|^2 \log |u|\, d\nu^\xi \le C_{LS} \int_{H_\beta} \|\nabla u\|_\beta^2\, d\nu^\xi + \|u\|_{L^2(\nu^\xi)}^2 \log \|u\|_{L^2(\nu^\xi)}, \tag{2.17}
\]
with the Sobolev coefficient
\[
C_{LS} := \frac{e^{\beta\delta(W)}}{\tilde J b_-^2 + a^2 + b^2}. \tag{2.18}
\]
Proof. Note first that, obviously, by approximation we can at once restrict our considerations to the case of finite radius of interaction (i.e., when $\exists\, 0 < \rho < \infty$: $J(|k-j|) = 0$, $|k-j| > \rho$). We follow completely the method of the proof of [AKRT1, Theorem 5.1]. Secondly, in the case $W = 0$, which corresponds to the quantum lattice system (1.1) with convex interactions, we use an approximation approach (similar to [AKR, Theorem 4]) to obtain an infinite-dimensional version of the well-known Bakry-Emery criterion in the case of the single spin space of loops $C(S_\beta)$. Then the statement of the lemma for general potentials $W \neq 0$ follows from the simple perturbation theorem for the LS-constants, see [St].
Corollary 2.3 (Variance estimate). $\mathrm{Lip}(H_\beta) \subset D(\mathcal{E}_{\nu^\xi})$, and for all $f \in \mathrm{Lip}(H_\beta)$ the following inequality holds:
\[
\mathrm{Var}_{\nu^\xi} f := \int_{H_\beta} (f - \langle f \rangle_{\nu^\xi})^2\, d\nu^\xi \le C_{LS}\, [f]_{\mathrm{Lip}}^2. \tag{2.19}
\]
Proof. The proof is completely analogous to the proof of [AKRT1, Corollary 5.2].

3. Concluding Remarks

(i) The assertion of Theorem 2.1 in particular implies that uniqueness of $\mu_\beta$ can be achieved by keeping fixed the other parameters of the system (1.1), but choosing a sufficiently high temperature, i.e., a small $\beta > 0$, or a small intensity of the pair interaction $\tilde J > 0$, or a $W$ with a small oscillation. These effects have an obvious physical interpretation. For an extended analysis of the connection between the shape of the one-particle potentials $V$ and the uniqueness of the corresponding Gibbs states $\mu_\beta$, see [AKRT1].

(ii) Without discussion of the existence problem, the fact that $|\mathcal{G}_\beta^t| \le 1$ in Theorem 2.1 can also be proved for the more general quantum lattice systems with the heuristic Hamiltonian
\[
H = -\frac{1}{2m} \sum_{k\in\mathbb{Z}^d} \Delta_k + \frac{1}{2} \sum_{k\in\mathbb{Z}^d} a_k^2 x_k^2 + \sum_{\{k,j\}\subset\mathbb{Z}^d} U_{kj}(x_k - x_j) + \sum_{k\in\mathbb{Z}^d} V_k(x_k). \tag{3.1}
\]
The pair interactions are given here by functions $U_{kj} \in C^2(\mathbb{R}^1)$ such that $\forall q \in \mathbb{R}^1$,
\[
0 < B_{kj}^- \le U_{kj}''(q) \le B_{kj}^+ < \infty, \qquad 0 \le U_{kj}(q) = U_{kj}(-q) \le C_{kj}(1+|q|)^2.
\]
The anharmonic self-interactions have the form $V_k(q) = V_{0,k}(q) + W_k(q)$, where $V_{0,k} \in C^2(\mathbb{R}^1)$, $W_k \in C_b(\mathbb{R}^1)$ and, $\forall q \in \mathbb{R}^1$,
\[
0 < b_k^2 \le V_{0,k}''(q), \qquad |V_{0,k}^{(l)}(q)| \le C_k(1+|q|)^M, \quad l = 0, 1, 2, \quad M \in \mathbb{N}.
\]
The sequence $(C_k)_{k\in\mathbb{Z}^d}$ and the matrices $(B_{kj}^-)_{k,j\in\mathbb{Z}^d}$, $(B_{kj}^+)_{k,j\in\mathbb{Z}^d}$, $(C_{kj})_{k,j\in\mathbb{Z}^d}$ are supposed to be rapidly decreasing. In this non-translation-invariant case the uniqueness condition then reads as
\[
\sup_{k\in\mathbb{Z}^d} \frac{e^{\beta\delta(W_k)} \sum_{j\in\mathbb{Z}^d,\ j\neq k} B_{kj}^+}{\sum_{j\in\mathbb{Z}^d,\ j\neq k} B_{kj}^- + b_k^2 + a_k^2} < 1. \tag{3.2}
\]

(iii) A uniqueness result (via cluster expansion techniques) in the more restricted class $\mathcal{G}_\beta^{t,0}$ ($\subsetneq \mathcal{G}_\beta^t$) of quantum Gibbs states satisfying an a priori assumption on the support (the so-called "super-stable" Ruelle type Gibbs measures, cf. [COPP]) has been obtained in [PY2].

(iv) It is worth noting that our methods work both in quantum and classical cases. The analogous uniqueness result for classical lattice systems with harmonic nearest-neighbour and polynomial self-interactions has been proved in [Roy]. The Dobrushin matrix has also been estimated in [BP], using a technique different from ours, for classical systems with potentials $V = V_0 + W$ such that $\inf_{\mathbb{R}^1} V_0'' \ge 0$ and, in addition,
$\sup_{\mathbb{R}^1} |W'| < \infty$. Actually, the idea to get a bound on Dobrushin's coefficients through the spectral gap for the Dirichlet operators associated with one-particle conditional distributions goes back to [DeS, Wa], which concern compact manifolds as spin spaces for classical lattice models.

Acknowledgement. Financial support by the DFG (through SFB 237 Bochum-Düsseldorf-Essen, SFB 343 Bielefeld as well as the Research Project AL 214/9-2), by the Alexander-von-Humboldt Foundation (fellowship for the fourth named author) and by the EC-Science Project SC1*CT92-0784 is gratefully acknowledged.
References

[AH-K] Albeverio, S., Høegh–Krohn, R.: Homogeneous random fields and quantum statistical mechanics. J. Funct. Anal. 19, 242–272 (1975)
[AKR] Albeverio, S., Kondratiev, Yu.G., Röckner, M.: Dirichlet operators via stochastic analysis. J. Funct. Anal. 128, 102–138 (1995)
[AKRT1] Albeverio, S., Kondratiev, Yu.G., Röckner, M., Tsikalenko, T.V.: Uniqueness of Gibbs states for quantum lattice systems. Probab. Th. Rel. Fields 108, 193–218 (1997)
[AKRT2] Albeverio, S., Kondratiev, Yu.G., Röckner, M., Tsikalenko, T.V.: Uniqueness of Gibbs states on loop lattices. C.R. Acad. Sci. Paris 324, ser. I, 1401–1406 (1997)
[AR] Albeverio, S., Röckner, M.: Dirichlet forms on topological vector spaces – closability and a Cameron-Martin formula. J. Funct. Anal. 88, 395–436 (1990)
[BaK1] Barbulyak, V.S., Kondratiev, Yu.G.: The semiclassical limit for the Schrödinger operator and phase transitions in quantum statistical physics. Funct. Anal. Appl. 26, 124 (1992)
[BaK2] Barbulyak, V.S., Kondratiev, Yu.G.: A criterion for the existence of periodic Gibbs states of quantum lattice systems. Selecta Math. Sov. 12, 25–35 (1993)
[BH-K] Bellissard, J., Høegh–Krohn, J.: Compactness and the maximal Gibbs states for random Gibbs fields on a lattice. Comm. Math. Phys. 84, 297–327 (1982)
[BP] Bellissard, J., Picco, J.: Lattice quantum fields: uniqueness and Markov property. Preprint Centre de Physique Theorique, Marseille (1978)
[COPP] Cassandro, M., Olivieri, E., Pellegrinotti, A., Presutti, E.: Existence and uniqueness of DLR measures for unbounded spin systems. Z. Wahrsch. verw. Gebiete 41, 313–334 (1978)
[DeS] Deuschel, J.D., Stroock, D.W.: Hypercontractivity and spectral gap of symmetric diffusions with applications to the stochastic Ising models. J. Funct. Anal. 92, 30–48 (1990)
[Do1] Dobrushin, R.L.: The description of a random field by means of conditional probabilities and conditions of its regularity. Theory Prob. Appl. 13, 197–224 (1968)
[Do2] Dobrushin, R.L.: Prescribing a system of random variables by conditional distributions. Theory Prob. Appl. 15, 458–489 (1970)
[DoS] Dobrushin, R.L., Shlosman, S.B.: Constructive criterion for the uniqueness of Gibbs field. In: Statistical Physics and Dynamical Systems. Rigorous results. Basel–Boston: Birkhäuser, 1985, pp. 347–370
[DSK] Dobrushin, R.L., Shlosman, S.B., Kolafa, I.: Uniqueness of Gibbs random fields. Preprint IITP, Moscow (1986)
[Fö] Föllmer, H.: A covariance estimate for Gibbs measures. J. Funct. Anal. 46, 387–395 (1982)
[Ge] Georgii, H.O.: Gibbs measures and phase transitions. Studies in Mathematics 9, Berlin–New York: Walter de Gruyter, 1988
[GK] Globa, S.A., Kondratiev, Yu.G.: The construction of Gibbs states of quantum lattice systems. Selecta Math. Sov. 9, 297–307 (1990)
[Gr] Gross, L.: Logarithmic Sobolev inequalities and contractivity properties of semigroups. In: Dirichlet forms, G.F. Dell'Antonio et al. (eds.), Lecture Notes in Math. 1563, Berlin: Springer-Verlag, 1993
[KL] Klein, A., Landau, L.: Stochastic processes associated with KMS states. J. Funct. Anal. 42, 368–428 (1981)
[Ko] Kondratiev, Yu.G.: Phase transitions in quantum models of ferroelectrics. In: Stochastic processes, physics and geometry II, Singapore – New Jersey: World Scientific, 1994, pp. 465–475
[Kü] Künsch, H.: Decay of correlations under Dobrushin's uniqueness condition and its applications. Comm. Math. Phys. 84, 207–222 (1982)
[LP] Lebowitz, J.L., Presutti, E.: Statistical mechanics of systems of unbounded spins. Comm. Math. Phys. 50, 195–218 (1976)
[Pa] Park, Y.M.: Quantum statistical mechanics of unbounded continuous spin systems. J. Korean Math. Soc. 22, 43–74 (1985)
[PY1] Park, Y.M., Yoo, H.H.: A characterization of Gibbs states of lattice boson systems. J. Stat. Phys. 75, 215–239 (1994)
[PY2] Park, Y.M., Yoo, H.H.: Uniqueness and clustering properties of Gibbs states for classical and quantum unbounded spin systems. J. Stat. Phys. 80, 223–271 (1995)
[Pre] Preston, C.: Random fields. Lecture Notes in Math. 534, Berlin: Springer-Verlag, 1976
[Ra] Rachev, S.T.: Probability metrics and the stability of stochastic models. Wiley Series in Prob. and Math. Stat., Chichester–New York: Wiley, 1991
[Rot] Rothaus, O.S.: Diffusion on compact Riemannian manifolds and logarithmic Sobolev inequalities. J. Funct. Anal. 42, 102–109 (1981)
[Roy] Royer, O.S.: Étude des champs Euclidiens sur un réseau $\mathbb{Z}^\nu$. J. Math. Pures et Appl. 56, 455–478 (1977)
[Ru] Ruelle, D.: Probability estimates for continuous spin systems. Comm. Math. Phys. 50, 189–194 (1976)
[Si] Simon, B.: Functional integrals in quantum physics. New York: Academic Press, 1986
[St] Stroock, D.W.: Logarithmic Sobolev inequalities for Gibbs states. In: Dirichlet forms, G.F. Dell'Antonio et al. (eds.), Lect. Notes in Math. 1563, Berlin–Heidelberg–New York: Springer-Verlag, 1993
[Wa] Wang, F.: Uniqueness of Gibbs states and exponential $L^2$-convergence for infinite-dimensional reflecting diffusion processes. Science in China 38, 908–917 (1995)
Communicated by Ya. G. Sinai
Commun. Math. Phys. 189, 631 – 640 (1997)
Communications in
Mathematical Physics © Springer-Verlag 1997
Staggered Phases in Diluted Systems with Continuous Spins
L. Chayes*, R. Kotecký**, S. B. Shlosman***
Erwin Schrödinger Institute, Vienna, Austria
Received: 2 January 1997 / Accepted: 1 February 1997
Dedicated to the memory of Roland L'vovich Dobrushin

Abstract: We consider systems with continuous spins and annealed dilution. We show that, as in the discrete case, such systems often undergo a phase transition, which is manifested in the appearance of a staggered intermediate phase. In particular, these phases appear in systems such as the massive Gaussian model where there is no phase transition in the undiluted system.

1. Introduction

In this paper we continue the study, initiated in [CKS], of intermediate phases and first order transitions in annealed dilute spin systems. While in [CKS] we dealt with discrete spin systems, in the present paper we focus on systems with continuous spins. It is normally the case that continuous spin systems are much harder to study than their discrete counterparts. The celebrated papers of Fröhlich, Simon, Spencer [FSS] and Fröhlich, Spencer [FS] were milestones in the rigorous understanding of the phase diagrams of these systems. It therefore came as a surprise to the authors that the problem of the intermediate phases for continuous annealed systems is not significantly harder than the discrete version. On the other hand, we have much less control over the phase diagrams of the continuous spin systems (in particular, systems with continuous symmetry) and of the properties of their different phases. This is reflected in the fact that our

* Permanent address: Department of Mathematics, UCLA, Los Angeles, CA 90095-1555, USA. E-mail: [email protected]
** On leave from Center for Theoretical Study, Charles University, Prague, and Department of Theoretical Physics, Charles University, V Holešovičkách 2, 180 00 Praha 8, Czech Republic, E-mail: [email protected]; partly supported by the grants GAČR 202/96/0731 and GAUK 96/272.
*** On leave from Department of Mathematics, University of California, Irvine, CA 92717, USA, E-mail: [email protected]; partly supported by the NSF through grant DMS 9500958 and by the Russian Fund for Fundamental Research through grant 930101470.
statements about phase diagrams of the dilute annealed systems with continuous spins are less complete than in the discrete spin case of [CKS].

The principal results of this work are all exhibited in the simplest (and best known) example of a continuous spin system: the XY model. Indeed, for brevity, all of the explicit proofs will concern the two-dimensional versions of this problem; we will be content with just stating general conditions under which analogous results can be established in other systems. Thus consider the Hamiltonian
\[
H = -\sum_{i,j\in\mathbb{Z}^2} (\cos(\varphi_i - \varphi_j) - 1). \tag{1.1}
\]
Here the sum is over the nearest neighbors, and the $\varphi$ variables take values in the unit circle, which is identified with the segment $[0, 2\pi)$. The Hamiltonian $H_s$ of the site diluted version of this model is given by
\[
H_s = -\sum_{i,j\in\mathbb{Z}^2} n_i n_j (\cos(\varphi_i - \varphi_j) - 1) - \mu \sum_{i\in\mathbb{Z}^2} n_i - \kappa \sum_{i,j\in\mathbb{Z}^2} n_i n_j. \tag{1.2}
\]
In the last equation the variables $n_i$ take on the values 0 or 1, indicating the absence or presence of a particle at the site $i$. In the formula (1.1) the constant $(-1)$ is for convenience only, and it can be omitted without loss of generality; this is not the case in (1.2), hence the introduction of the term with the parameter $\kappa$.

Our claim for the site diluted XY model concerns the existence of an intermediate phase within which there are two states that are characterized by the preferential occupation of the even/odd sublattices. This phase intercalates between the low temperature/high density regime and the high temperature/low density regime. As in the discrete case, the existence of such a phase is due to entropic repulsion. While in the "large q-models" this is not difficult to understand, in the continuous case the origin of the effect is slightly more subtle. In the case, e.g., of the XY model, the explanation goes roughly as follows. If, at low temperatures, two particles are nearest neighbors, their spins have to be nearly aligned. In this case the available phase volume is $o(1)$ as $\beta \to \infty$. On the other hand, if the sites are isolated, they may enjoy the full freedom of the circle; hence an effective repulsion is provided by the relative restrictions due to low temperature.

Our main result is that if the temperature is low (but not very low) the staggered phase indeed exists. Namely, in a region of intermediate temperature and chemical potential, there are (at least) the two staggered states. Moving out of the staggered phase towards the line $\mu = \infty$ (the uniform undiluted system), a portion of the phase boundary is a line of first order transitions. Here, the two staggered states coexist with the dense phase, which may or may not be magnetically ordered depending on the details of the model. The rest of the phase boundary, the line between the staggered phases and the uniqueness regime, is, possibly, a line of higher order transitions. We will not make any claims about these transitions.

The second sort of dilute systems we consider are the so-called bond-dilute models. For the XY model, the relevant Hamiltonian is given by
\[
H_b = -\sum_{\langle i,j\rangle} n_{i,j} \big( \cos(\varphi_i - \varphi_j) - 1 \big) - \lambda \sum_{\langle i,j\rangle} n_{i,j}, \tag{1.3}
\]
where the $n_{i,j}$ are bond occupation variables that take on the values zero or one and $\lambda$ is the bond fugacity.
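As a small illustration of (1.3), the following Python sketch evaluates $H_b$ for a finite set of occupied bonds. The data layout (a dictionary of angles keyed by site and a list of occupied nearest-neighbour pairs) and the function name are our own assumptions, made only for this example. When all spins are aligned the interaction term vanishes, so the energy depends only on the number of occupied bonds and not on where they sit.

```python
import math


def bond_diluted_energy(phi, occupied_bonds, lam):
    """Energy H_b of (1.3), restricted to a finite list of occupied bonds.

    phi            -- dict mapping a site (x, y) to its angle in [0, 2*pi)
    occupied_bonds -- list of nearest-neighbour pairs (i, j) with n_{i,j} = 1
    lam            -- the parameter lambda multiplying the number of bonds
    """
    interaction = -sum(math.cos(phi[i] - phi[j]) - 1.0
                       for i, j in occupied_bonds)
    return interaction - lam * len(occupied_bonds)


if __name__ == "__main__":
    sites = [(0, 0), (1, 0), (0, 1), (1, 1)]
    aligned = {s: 0.3 for s in sites}  # all spins point the same way
    # Two different arrangements of two occupied bonds (illustrative only).
    adjacent = [((0, 0), (1, 0)), ((1, 0), (1, 1))]
    separated = [((0, 0), (0, 1)), ((1, 0), (1, 1))]
    print(bond_diluted_energy(aligned, adjacent, lam=0.5))
    print(bond_diluted_energy(aligned, separated, lam=0.5))
```

Both calls return the same value, minus lambda times the number of bonds, since each aligned bond contributes zero to the interaction term.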
Here the results are also similar to the analogous problem for the discrete cases but, perhaps, more surprising for continuous spins. In particular, it will be established that extending from the point $\beta = \infty$, $\lambda = 0$, there is a line of first order transitions across which the bond density, and the energy density, are discontinuous. In particular, note that for any configuration with a fixed number of bonds, as $T \to 0$ the energy is independent of the arrangement of bonds. Our results indicate that nevertheless, there is phase separation even at zero temperature.

2. Site diluted models

2.1. The two-dimensional XY model. As in [CKS], our analysis here relies heavily on the fact that all the systems we consider are reflection positive (RP). For two-dimensional systems with nearest neighbor interactions, we may use reflections in the lines
\[
\{ x \pm y = k \}, \quad k = \ldots, -1, 0, 1, \ldots. \tag{2.1}
\]
The corresponding finite volume boxes that are invariant under such reflections are the two-dimensional tori
\[
T_N = \{ i = (x, y) \in \mathbb{Z}^2 : |x \pm y| \le N \}, \tag{2.2}
\]
(or more precisely, the graphs thereof) where the standard identification of the boundary sites is assumed. Use of the tori $T_N$ cuts down considerably on the amount of calculations that have to be performed. However, the above described advantages occur only for the two-dimensional case. In higher dimensions one has to consider reflections with respect to planes perpendicular to coordinate axes as was done in [CKS].

We denote by $H_N$ the restriction of the Hamiltonian $H_s$, given by (1.2), to the box $T_N$. The partition function $Z_{N,\beta}$ is given by
\[
Z_{N,\beta} = \sum_{\substack{n_i = 0,1 \\ i\in T_N}} \int \exp\{ -\beta H_N(n_N, \varphi_N) \} \prod_{i\in T_N;\ n_i = 1} d\varphi_i. \tag{2.3}
\]
This partition function serves as the normalization constant for the finite volume Gibbs state with periodic boundary conditions, $\langle - \rangle_{N,\kappa,\beta,\mu}$, which assigns to the configuration $(n_N, \varphi_N)$ a weight proportional to $\exp\{-\beta H_N(n_N, \varphi_N)\}$.

Let us now introduce the different possible infinite volume phases of the model, the existence of which is the crux of Theorem 2.1. They are denoted by $\langle - \rangle^{o}_{\kappa,\beta,\mu}$, $\langle - \rangle^{A}_{\kappa,\beta,\mu}$ and $\langle - \rangle^{B}_{\kappa,\beta,\mu}$. The state $\langle - \rangle^{o}_{\kappa,\beta,\mu}$ is a small perturbation of the high density state of the XY model. In our context it is characterized by the high probability $\langle \chi_i^1 \rangle^{o}_{\kappa,\beta,\mu}$ for any site to be occupied by a particle. (Here we have used $\chi_i^1$ to denote the indicator of the event $\{ n_N, \varphi_N : n_i = 1 \}$.) Moreover, if we introduce the indicator $\chi_b^{1,1}$ of the event that both sites of a bond $b$ (e.g. $b = [(0,0), (1,0)]$) are occupied by a particle, then its expected value $\langle \chi_b^{1,1} \rangle^{o}_{\kappa,\beta,\mu}$ is also close to one. The state $\langle - \rangle^{A}_{\kappa,\beta,\mu}$ describes the phase where the even sublattice is preferentially occupied and similarly $\langle - \rangle^{B}_{\kappa,\beta,\mu}$ for the odd sublattice. To characterize the state $\langle - \rangle^{A}_{\kappa,\beta,\mu}$ we introduce the indicator $\chi_b^A$ which is one when the even endpoint of the bond $b$ is occupied and the odd endpoint is vacant, and vanishes otherwise. The indicator $\chi_b^B$ is described similarly with the roles of even and odd exchanged. The state $\langle - \rangle^{A}_{\kappa,\beta,\mu}$ is characterized by a value of $\langle \chi_b^A \rangle^{A}_{\kappa,\beta,\mu}$ close to one and similarly for the state $\langle - \rangle^{B}_{\kappa,\beta,\mu}$. We can now state:
Theorem 2.1. Consider the two-dimensional site diluted XY model as described by the Hamiltonian in Eq. (1.2) with $\kappa > 0$. There exist small numbers $\varepsilon$ and $\kappa_0$ and, for every $\kappa < \kappa_0$, a region $F(\kappa) \subset \mathbb{R}$ so that for any $\mu \in F$ there exist inverse temperatures $\beta_1(\kappa, \mu)$, $\beta_2(\kappa, \mu)$, and $\beta_c(\kappa, \mu) \in [\beta_1, \beta_2]$ such that:
i) For any $\beta \in [\beta_c, \beta_2]$ there exists a state $\langle\ \rangle^{(o)}_{\kappa,\beta,\mu}$ for which $\langle \chi_i^1 \rangle^{(o)}_{\kappa,\beta,\mu} \ge 1 - \varepsilon$ for every site $i$.
ii) For any $\beta \in [\beta_1, \beta_c]$ there exist two states $\langle\ \rangle^{(A)}_{\kappa,\beta,\mu}$ and $\langle\ \rangle^{(B)}_{\kappa,\beta,\mu}$ for which
\[
\langle \chi_b^A \rangle^{(A)}_{\kappa,\beta,\mu} \ge 1 - \varepsilon \quad \text{and} \quad \langle \chi_b^B \rangle^{(B)}_{\kappa,\beta,\mu} \ge 1 - \varepsilon,
\]
respectively, for every bond $b$.

The proof of the above theorem via the RP technology requires certain estimates of the partition functions taken over subsets of configurations which exhibit given pattern behavior. In our case of the two-dimensional XY model there are five such patterns. The first three of them are obtained by repeated reflections of the three characteristic patterns on the bond $b$, specified before Theorem 2.1. One has to use the reflections in lines (2.1), until the pattern is disseminated to the whole $T_N$. In this way we obtain the events:
\[
I_N^A = \{ n_N, \varphi_N : n_i = 1;\ i = (x, y) \in T_N,\ x + y \text{ even};\ n_i = 0;\ i \in T_N,\ x + y \text{ odd} \},
\]
the event that the even sublattice is full, the analogous event $I_N^B$ that the odd sublattice is full, and
\[
I_N^o = \{ n_N, \varphi_N : n_i = 1 \text{ for all } i = (x, y) \in T_N \},
\]
the event that the whole box is filled by the particles. In addition we need the empty event, that all sites in $T_N$ are vacant,
\[
I_N^\emptyset = \{ n_N, \varphi_N : n_i = 0 \text{ for all } i = (x, y) \in T_N \},
\]
and the event
\[
I_N^c = \{ n_N, \varphi_N : n_i = 0 \text{ iff } i = (x, y) \in T_N,\ x - y = 4k + 1 \}. \tag{2.4}
\]
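The five occupation patterns are easy to tabulate. The Python sketch below is only an illustration under simplifying assumptions of ours: it uses an ordinary periodic square box whose side is divisible by 4 instead of the rotated torus $T_N$, counts each nearest-neighbour bond once, and uses function and pattern names that do not appear in the paper. It computes the particle density and the number of doubly occupied bonds per site for each pattern, which are the quantities multiplying $\beta\mu$ and $\beta\kappa$ in the estimates of the restricted partition functions.

```python
def pattern_occupation(pattern, x, y):
    """Occupation number n_i at site (x, y) for one of the five patterns."""
    if pattern == "A":       # even sublattice full, odd sublattice empty
        return 1 if (x + y) % 2 == 0 else 0
    if pattern == "B":       # odd sublattice full, even sublattice empty
        return 1 if (x + y) % 2 == 1 else 0
    if pattern == "o":       # every site occupied
        return 1
    if pattern == "empty":   # every site vacant
        return 0
    if pattern == "c":       # vacant exactly on the lines x - y = 4k + 1
        return 0 if (x - y) % 4 == 1 else 1
    raise ValueError(f"unknown pattern: {pattern}")


def density(pattern, side):
    """Fraction of occupied sites in a periodic side x side box."""
    occupied = sum(pattern_occupation(pattern, x, y)
                   for x in range(side) for y in range(side))
    return occupied / side**2


def occupied_bonds_per_site(pattern, side):
    """Nearest-neighbour bonds with both ends occupied, per site.

    Each bond is counted once (to the right and upwards), with periodic
    wrap-around; side should be divisible by 4 so that every pattern is
    compatible with the periodicity.
    """
    total = 0
    for x in range(side):
        for y in range(side):
            here = pattern_occupation(pattern, x, y)
            right = pattern_occupation(pattern, (x + 1) % side, y)
            up = pattern_occupation(pattern, x, (y + 1) % side)
            total += here * right + here * up
    return total / side**2


if __name__ == "__main__":
    for name in ("empty", "A", "B", "o", "c"):
        print(name, density(name, 8), occupied_bonds_per_site(name, 8))
```

For the staggered patterns the density is 1/2 with no doubly occupied bonds, for the full pattern it is 1 with two bonds per site, and for the pattern c it is 3/4 with one bond per site.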
This last event is obtained from the elementary configuration n(0,0) = 1, n(0,1) = 1, n(1,0) = 0 on the union of two bonds, b = [(0, 0), (1, 0)] and b0 = [(0, 0), (0, 1)], by repeated reflections with respect to the lines (2.1), from which we exclude all lines of the form {x − y = 2k, k = . . . , −1, 0, 1, . . .}. Elementary configurations of this type should appear whenever the two phases – ordered and staggered – touch each other; they form then the separating contour between the phases, which explains the superscript c 1,1,0 in our notation. We denote the indicator of the above elementary configuration by χb,b 0 . ∅ A B o c We denote by χN , χN , χN , χN , and χN the indicators of the corresponding events ∅ A B o c IN , IN , IN , IN , and IN . Finally, we define the partition functions, restricted to the above five events, by
\[ Z^*_{N,\beta} = \langle \chi^*_N \rangle_{N,\kappa,\beta,\mu}\, Z_{N,\beta}, \tag{2.5} \]
where ∗ takes five values: A, B, o, ∅ and c. The main step in proving Theorem 1 above consists of the following estimates:

Lemma 2.2. The partition functions (2.5) for the patterns ∅, A, B, o and c satisfy the following estimates:
\[ Z^\emptyset_{N,\beta} = 1, \tag{2.6} \]
\[ (Z^A_{N,\beta})^{1/|T_N|} = (Z^B_{N,\beta})^{1/|T_N|} = e^{\frac12 \beta\mu} (2\pi)^{\frac12}, \tag{2.7} \]
\[ e^{\beta\mu} e^{2\beta\kappa} \frac{e^{-1}}{\sqrt{\beta}} \le (Z^o_{N,\beta})^{1/|T_N|} \le e^{\beta\mu} e^{2\beta\kappa} \frac{4}{\sqrt{\beta}}, \tag{2.8} \]
\[ (Z^c_{N,\beta})^{1/|T_N|} \le e^{\frac34 \beta\mu} e^{\beta\kappa} \Big(\frac{4}{\sqrt{\beta}}\Big)^{\frac34}, \tag{2.9} \]
where $|T_N|$ denotes the number of sites in $T_N$ and various terms in (2.6)-(2.9) may be modified by multiplicative factors that tend to one as $N \to \infty$.

Proof. The identity (2.6) is obvious, and (2.7) is straightforward. To get the lower bound in (2.8) we restrict the range of integration to the product of arcs of length $1/\sqrt{\beta}$, each centered at the origin, i.e. to the set
\[ \Big\{ |\varphi_i| < \frac{1}{2\sqrt{\beta}},\ i \in T_N \Big\}. \]
We then replace the integrand by its minimal value and use the inequality $\cos\varphi - 1 \ge -\frac{\varphi^2}{2}$. To get the upper bound, we estimate the sum $\sum_{i,j \in T_N, |i-j|=1} (\cos(\varphi_i - \varphi_j) - 1)$ from above by restricting the range of summation from the set of all bonds to its subset, which forms a maximal tree. After that the integration can be done site by site, each contributing the factor of $\int \exp\{\beta(\cos\varphi - 1)\}\, d\varphi$, which can be estimated from above with the help of the inequality:
\[ \cos\varphi - 1 \le -\frac{2}{\pi^2}\varphi^2, \qquad |\varphi| \le \pi. \]
The upper estimate (2.9) is obtained in the same way.
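The two elementary inequalities used in the proof, and the resulting single-site bound, can be checked numerically. The following is only an illustrative sketch: the function names, the midpoint rule and the grid sizes are choices made here, not part of the original argument. It assumes the per-site factor is the integral of exp{β(cos ϕ − 1)} over [−π, π] and compares it with the Gaussian integral over the whole line, which equals sqrt(π³/(2β)) and lies just below 4/√β.

```python
import math

def single_site_integral(beta, steps=20000):
    """Midpoint-rule value of the integral of exp{beta (cos phi - 1)} over [-pi, pi]."""
    h = 2.0 * math.pi / steps
    total = 0.0
    for k in range(steps):
        phi = -math.pi + (k + 0.5) * h
        total += math.exp(beta * (math.cos(phi) - 1.0))
    return total * h

def gaussian_upper_bound(beta):
    """Integral over the whole line of exp{-2 beta phi^2 / pi^2}: sqrt(pi^3 / (2 beta))."""
    return math.sqrt(math.pi ** 3 / (2.0 * beta))

def pointwise_inequalities_hold(samples=2001):
    """Check -phi^2/2 <= cos(phi) - 1 <= -2 phi^2 / pi^2 on a grid of [-pi, pi]."""
    tol = 1e-12  # absorbs rounding at phi = 0 and phi = +-pi, where equality holds
    for k in range(samples):
        phi = -math.pi + 2.0 * math.pi * k / (samples - 1)
        c = math.cos(phi) - 1.0
        lower = -phi * phi / 2.0
        upper = -2.0 * phi * phi / math.pi ** 2
        if not (lower - tol <= c <= upper + tol):
            return False
    return True
```

For any β > 0 one expects `single_site_integral(beta) <= gaussian_upper_bound(beta) <= 4 / sqrt(beta)`, which is the origin of the factor 4/√β in (2.8).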
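Proof of Theorem 2.1. The proof of this theorem goes in the manner that is customary for RP systems. We establish below that at very high β and reasonable µ the ordered phase prevails, that for the same µ and lower β's the staggered phase prevails, and finally that their coexistence in space is ruled out in the whole interval of β's (contour estimate). We will also show that there are two different staggered phases by showing that their spatial coexistence is suppressed. To implement this program we start with estimates of the expectations $\langle \chi^A_b \rangle_{N,\kappa,\beta,\mu}$ and $\langle \chi^{0,0}_b \rangle_{N,\kappa,\beta,\mu}$. By the chessboard estimate we have
\[ \langle \chi^A_b \rangle_{N,\kappa,\beta,\mu} \le \langle \chi^A_N \rangle_{N,\kappa,\beta,\mu}^{\frac{1}{2|T_N|}} \le \left( \frac{Z^A_{N,\beta}}{Z^o_{N,\beta}} \right)^{\frac{1}{2|T_N|}} \le \sqrt{e}\, e^{-\frac14 \beta\mu} e^{-\beta\kappa} (2\pi\beta)^{\frac14}, \tag{2.10} \]
and
\[ \langle \chi^{0,0}_b \rangle_{N,\kappa,\beta,\mu} \le \langle \chi^\emptyset_N \rangle_{N,\kappa,\beta,\mu}^{\frac{1}{2|T_N|}} \le (Z^o_{N,\beta})^{-\frac{1}{2|T_N|}} \le \sqrt{e}\, e^{-\frac12 \beta\mu} e^{-\beta\kappa} \beta^{\frac14}, \tag{2.11} \]
which are small for very large β and positive µ. Taking into account the identity
\[ \chi^{0,0}_b + \chi^{1,1}_b + \chi^A_b + \chi^B_b = 1, \tag{2.12} \]
these bounds show that at low temperatures almost all sites are occupied. Hence, one can choose $\beta_2 = \infty$.

In the "staggered" region we estimate similarly $\langle \chi^{1,1}_b \rangle_{N,\kappa,\beta,\mu}$ and $\langle \chi^{0,0}_b \rangle_{N,\kappa,\beta,\mu}$. Again by the chessboard estimate
\[ \langle \chi^{1,1}_b \rangle_{N,\kappa,\beta,\mu} \le \langle \chi^o_N \rangle_{N,\kappa,\beta,\mu}^{\frac{1}{2|T_N|}} \le \left( \frac{Z^o_{N,\beta}}{Z^A_{N,\beta}} \right)^{\frac{1}{2|T_N|}} \le 2 e^{\beta(\frac14 \mu + \kappa)} (2\pi\beta)^{-\frac14}, \tag{2.13} \]
and
\[ \langle \chi^{0,0}_b \rangle_{N,\kappa,\beta,\mu} \le \langle \chi^\emptyset_N \rangle_{N,\kappa,\beta,\mu}^{\frac{1}{2|T_N|}} \le (Z^A_{N,\beta})^{-\frac{1}{2|T_N|}} \le e^{-\frac14 \beta\mu} (2\pi)^{-\frac14}. \tag{2.14} \]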
It is clear, for κ small enough, that we can find a $\mu \gtrsim 0$ and a $\beta_1 = \beta_1(\mu, \kappa)$ with $\beta_1 \gg 1$ such that for β equal to (or slightly larger than) $\beta_1$, the right hand sides of Eqs. (2.13) and (2.14) are both small. To show that under these circumstances two staggered states coexist, we need a contour estimate. A site that belongs to more than one of the A-, B- or o-type bonds (a contour site) will be the center of a $\chi^{1,1,0}_{b,b'}$-type event or be part of an empty bond. The latter are uniformly unlikely in the entire specified region; let us show that the same holds for the principal "contour term" $\langle \chi^{1,1,0}_{b,b'} \rangle_{N,\kappa,\beta,\mu}$. This is readily accomplished:
\[
\langle \chi^{1,1,0}_{b,b'} \rangle_{N,\kappa,\beta,\mu} \le \langle \chi^c_N \rangle_{N,\kappa,\beta,\mu}^{\frac{1}{|T_N|}}
\le \frac{1}{\beta^{1/8}} \frac{4^{3/4} e^{1/2}}{(2\pi)^{1/4}} \times
\left[ \frac{ \Big( \frac{e^{-1/2}}{(2\pi)^{1/4}}\, e^{\frac14 \beta\mu} e^{\beta\kappa} \frac{1}{\beta^{1/4}} \Big)^{|T_N|} }{ 1 + \Big( \frac{e^{-1}}{(2\pi)^{1/2}}\, e^{\frac12 \beta\mu} e^{2\beta\kappa} \frac{1}{\beta^{1/2}} \Big)^{|T_N|} } \right]^{\frac{1}{|T_N|}}
\le \frac{1}{\beta^{1/8}} \frac{4^{3/4} e^{1/2}}{(2\pi)^{1/4}}. \tag{2.15}
\]
We used here the upper bound (2.9) as well as the lower bounds (2.7) and (2.8), first estimating $Z_{N,\beta} \ge Z^A_{N,\beta} + Z^o_{N,\beta}$.

All the required ingredients have now been assembled. It is clear that a finite region of any of the three competing phases is surrounded by contour sites. (In this case, the relevant notion of connectivity for the contours is ∗-connectedness.) It follows that for all β in $[\beta_1, \infty)$, the contours are damped exponentially with their length. In the region where the ordered indicator has small expectation, i.e. $\beta \gtrsim \beta_1$, this implies the existence of the two staggered states. For β large we of course have the ordered state. Applying Lemma (2.4) from [CKS], we may conclude that there is a $\beta_c \in (\beta_1, \infty)$, where the two staggered states coexist with an ordered state.

Remark. The fact that Theorem 2.1 was proved for κ > 0 allowed us to easily demonstrate that the staggered phase does not survive down to zero temperature. In fact, for $\beta \gg 1$, κ fixed and µ allowed to vary, the system undergoes a first order phase transition near µ = −dκ, where the (almost fully) occupied state coexists with a state in which nearly every site is vacant. Such results were established in [CKS] in the discrete cases; here the proof is nearly identical.
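The algebra behind the final constant in (2.15) can be verified with a few lines of arithmetic: dividing the upper bound (2.9) by the geometric mean of the per-site values from (2.7) and the lower bound in (2.8) leaves a quantity that depends on β only. The sketch below is an illustration under that reading; the function names and the sample parameter values are hypothetical choices made here.

```python
import math

def z_A(beta, mu):
    """Per-site value (2.7) of the staggered partition function."""
    return math.exp(0.5 * beta * mu) * math.sqrt(2.0 * math.pi)

def z_o_lower(beta, mu, kappa):
    """Lower bound in (2.8) on the per-site ordered partition function."""
    return math.exp(beta * mu + 2.0 * beta * kappa - 1.0) / math.sqrt(beta)

def z_c_upper(beta, mu, kappa):
    """Upper bound (2.9) on the per-site contour partition function."""
    return math.exp(0.75 * beta * mu + beta * kappa) * (4.0 / math.sqrt(beta)) ** 0.75

def contour_bound(beta):
    """Right-hand side of (2.15)."""
    return 4.0 ** 0.75 * math.exp(0.5) / (beta ** 0.125 * (2.0 * math.pi) ** 0.25)

def ratio(beta, mu, kappa):
    """z_c / sqrt(z_A * z_o); mu and kappa cancel, leaving contour_bound(beta)."""
    return z_c_upper(beta, mu, kappa) / math.sqrt(z_A(beta, mu) * z_o_lower(beta, mu, kappa))
```

Because the bound decays only like β^{-1/8}, it drops below one only once β is in the thousands (between 10³ and 10⁴ in this sketch), which gives a sense of how large β₁ has to be for the argument to apply.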
2.2. The general case.

The results from the preceding section can be extended to a general class of nearest neighbor site diluted models on $\mathbb{Z}^d$, d ≥ 2. Namely, we consider a model with spins ϕ taking values in a Riemannian manifold S equipped with an a priori Borel measure µ(dϕ). The metric on S will be denoted by ρ(·, ·), the corresponding Riemann measure is dϕ, and µ(dϕ) is supposed to be absolutely continuous with respect to dϕ, with a continuous density, such that $\int d\mu < \infty$. The Hamiltonian on the torus $T_N$ has the form
\[ H_N(n_N, \varphi_N) = \sum_{\langle i,j \rangle} n_i n_j U(\varphi_i, \varphi_j) - \mu \sum_i n_i - \kappa \sum_{\langle i,j \rangle} n_i n_j. \tag{2.16} \]
Here µ, κ, and the occupation variables $n_i \in \{0, 1\}$ play the same role as in the particular case (1.2) of the XY model. For ϕ, ψ ∈ S, it is supposed that the interaction U(ϕ, ψ) satisfies the following conditions:
– U(·, ·) is measurable and U(ϕ, ψ) ≥ 0 for each ϕ, ψ ∈ S.
– There exists $\varphi_0 \in S$ such that $U(\varphi_0, \varphi_0) = 0$.
– The minimum of U(·, ·) at the point $\varphi_0$ is essential: namely, using $O_r(\varphi)$, for any ϕ ∈ S, to denote the neighborhood $O_r(\varphi) = \{\psi \in S : \rho(\varphi, \psi) \le r\}$, we suppose that for some k > 0 we have
\[ \max_{\varphi, \psi \in O_r(\varphi_0)} U(\varphi, \psi) \le C_1 r^k \]
for r small enough.
– Attractiveness: for any ϕ, ψ ∈ S we have $U(\varphi, \psi) \ge C_2\, \rho(\varphi, \psi)^k$.

Introducing the indicators $\chi^1_i$, $\chi^A_b$, and $\chi^B_b$ in the same way as in the case of the XY model, we get the anticipated generalization. Its proof is a rather straightforward extension of the proof of Theorem 2.1 above, applying the version of RP used in the proof of Theorem 3.1 in [CKS].
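To make the structure of the Hamiltonian (2.16) concrete, the following sketch evaluates it on a small two-dimensional torus. Everything specific here is an assumption made for illustration: the choice d = 2, the representation of configurations as nested lists, the function names, and the interaction U(ϕ, ψ) = 1 − cos(ϕ − ψ), which is nonnegative and vanishes for equal arguments. It is not taken from the paper.

```python
import math

def xy_interaction(phi, psi):
    """Illustrative choice U = 1 - cos(phi - psi): nonnegative, zero when phi == psi."""
    return 1.0 - math.cos(phi - psi)

def hamiltonian(n, phi, mu, kappa, U=xy_interaction):
    """Evaluate (2.16) on an L x L torus (L >= 3).

    n[x][y] in {0, 1} is the occupation variable, phi[x][y] the spin value.
    """
    L = len(n)
    energy = 0.0
    for x in range(L):
        for y in range(L):
            energy -= mu * n[x][y]
            # Count each nearest-neighbor bond once, via the right and upper neighbors.
            for nx, ny in (((x + 1) % L, y), (x, (y + 1) % L)):
                occupied = n[x][y] * n[nx][ny]
                energy += occupied * (U(phi[x][y], phi[nx][ny]) - kappa)
    return energy
```

With all spins aligned, the fully occupied configuration has energy −µ|T_N| − 2κ|T_N|, while the staggered (checkerboard) configuration has no occupied bonds and energy −µ|T_N|/2. The competition between these two values and the entropy of the free spins on isolated sites is what the estimates of this section quantify.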
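Theorem 2.3. Consider the site diluted model as described by the Hamiltonian $H_N$ in Eq. (2.16) with κ > 0 and with interaction U satisfying the conditions above. There exist small numbers ε and $\kappa_0$ and, for every $\kappa < \kappa_0$, a region $F(\kappa) \subset \mathbb{R}$ so that for any µ ∈ F there exist inverse temperatures $\beta_1(\kappa, \mu)$, $\beta_2(\kappa, \mu)$, and $\beta_c(\kappa, \mu) \in (\beta_1, \beta_2)$ such that:
i) For any $\beta \in [\beta_c, \beta_2]$ there exists a state $\langle\ \rangle^{(o)}_{\kappa,\beta,\mu}$, for which $\langle \chi^1_i \rangle^{(o)}_{\kappa,\beta,\mu} \ge 1 - \epsilon$ for every site i.
ii) For any $\beta \in [\beta_1, \beta_c]$ there exist two states $\langle\ \rangle^{(A)}_{\kappa,\beta,\mu}$ and $\langle\ \rangle^{(B)}_{\kappa,\beta,\mu}$ for which
\[ \langle \chi^A_b \rangle^{(A)}_{\kappa,\beta,\mu} \ge 1 - \epsilon \quad \text{and} \quad \langle \chi^B_b \rangle^{(B)}_{\kappa,\beta,\mu} \ge 1 - \epsilon, \]
respectively, for every bond b.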
Proof (sketch). The basic ideas have been spelled out in the above mentioned references; the major distinctions are technical. We consider elementary hypercubes of side 2 and assign patterns A, B, ∅, etc. to these cubes. Any cube that does not fall into such a pattern is part of a contour. Estimates for the partition functions associated with these cubes are straightforward given the stated conditions. For example, the ordered partition function per site has upper and lower bounds of the form $e^{\beta\mu} e^{d\beta\kappa} \beta^{-w}$ with w > 0 depending on the details of the internal spin-space. Contour cubes are then listed (or classified) and controlled with a chessboard estimate. In general, each constrained site in the contour pattern costs a factor of $\beta^{-w}$ and there are not enough favorable bonds to compensate; the details are straightforward but tedious. The upshot is that all these objects are uniformly suppressed by inverse powers of β. For κ sufficiently small and µ > 0 we find a large $\beta_1$, where staggered order dominates, and for $\beta \gg 1$ any state is a perturbation of the fully occupied state. The claimed results follow, mutatis mutandis, from previous derivations.

Remark. It is evident that this theorem applies even in cases where the undiluted model does not undergo a phase transition. Thus, for example, one can consider a Gaussian lattice field $\varphi_i \in \mathbb{Z}^\nu$ with the Hamiltonian
\[ H_N(n_N, \varphi_N) = \sum_{\langle i,j \rangle} n_i n_j \left[ (\varphi_i - \varphi_j)^2 + m^2 \big( (\varphi_i)^2 + (\varphi_j)^2 \big) \right] - \mu \sum_i n_i - \kappa \sum_{\langle i,j \rangle} n_i n_j, \]
and use Theorem 2.3 to show the existence of a staggered phase.
3. Bond diluted models

3.1. The two-dimensional XY model.

We consider $T_N$ as defined in Eq. (2.2) and the restriction $H_N$ of the Hamiltonian in Eq. (1.3) to $T_N$. The partition function, here denoted by $Z_{N,\beta,\lambda}$, is given by
\[ Z_{N,\beta,\lambda} = \sum_{\substack{n_{i,j} = 0, 1 \\ \langle i,j \rangle \in T_N}} \int \exp\{-\beta H_N(n_N, \phi_N)\} \prod_{i \in T_N} d\phi_i, \tag{3.1} \]
where $n_N$ denotes a bond configuration on $T_N$. Similarly, $H_N$ is used to define the finite volume states $\langle - \rangle_{N,\lambda,\beta}$ on $T_N$.

Here, there are only two relevant infinite volume states: $\langle - \rangle^{(o)}_{\lambda,\beta}$ and $\langle - \rangle^{(\emptyset)}_{\lambda,\beta}$, representing states with nearly all full and nearly all empty bond configurations. For this problem, we consider the variable $n_{i,j}$ itself: $\langle - \rangle^{(o)}_{\lambda,\beta}$ is distinguished by a value of $\langle n_{i,j} \rangle^{(o)}_{\lambda,\beta}$ that is close to one, while $\langle n_{i,j} \rangle^{(\emptyset)}_{\lambda,\beta}$ is close to zero.

Theorem 3.1. Consider the two-dimensional bond-diluted XY-model as described by the Hamiltonian in Eq. (1.3). Then for all β sufficiently large, there is a $\lambda_c(\beta)$ and a small number ε such that at $(\beta, \lambda_c(\beta))$, there are two coexisting infinite volume states $\langle - \rangle^{(o)}_{\lambda_c,\beta}$ and $\langle - \rangle^{(\emptyset)}_{\lambda_c,\beta}$ with $\langle n_{i,j} \rangle^{(o)}_{\lambda_c,\beta} \ge 1 - \epsilon$ and $\langle n_{i,j} \rangle^{(\emptyset)}_{\lambda_c,\beta} \le \epsilon$.
Proof. Denoting by $\chi^o_N$ and $\chi^\emptyset_N$ the indicators of the events that all bonds in $T_N$ are occupied or, respectively, vacant, and $Z^*_{N,\beta,\lambda} = Z_{N,\beta,\lambda} \langle \chi^* \rangle_{N,\lambda,\beta}$ with ∗ = o or ∅, we start with the observation that
\[ (Z^\emptyset_{N,\beta,\lambda})^{1/|T_N|} = 2\pi \quad \text{and} \quad e^{2\beta\lambda} e^{-1} \frac{1}{\sqrt{\beta}} \le (Z^o_{N,\beta,\lambda})^{1/|T_N|} \le e^{2\beta\lambda} \frac{4}{\sqrt{\beta}}. \tag{3.2} \]
Indeed, the first part of Eq. (3.2) is trivial and, after pulling out a factor of $e^{\beta\lambda}$ for each bond, the second part is just the estimates performed for Eq. (2.8) in the proof of Lemma 2.2. It is thus clear that as λ → ±∞, fully occupied/fully vacant states are predominant.

Next we consider the contour term. Here, the relevant contour piece occurs when two bonds, $b = \langle i, j \rangle$ and $b' = \langle i, j' \rangle$, on the corners of a square satisfy $n_{i,j} = 1$ and $n_{i,j'} = 0$. Let $\chi^{1,0}_{b,b'} = n_b (1 - n_{b'})$ denote the indicator for this event and let $\chi^c_N$ denote the indicator for the event that the torus is covered by the bond pattern that is defined by Eq. (2.4). Finally, let $Z^c_{N,\beta,\lambda}$ denote the partition function associated with the indicator $\chi^c_N$. Then, by the chessboard estimate, we have
\[ \langle \chi^{1,0}_{b,b'} \rangle_{N,\beta} \le [Z^c_{N,\beta,\lambda} / Z_{N,\beta,\lambda}]^{1/|T_N|}. \tag{3.3} \]
We claim (modulo terms that tend to one as N gets large) that $(Z^c_{N,\beta,\lambda})^{1/|T_N|} \le (2\pi)^{1/4} e^{\beta\lambda} [4/\sqrt{\beta}]^{3/4}$. Indeed, this is the same sort of estimate as the bound (2.9) and may be proved as in Lemma 2.2. To finish this proof, it remains to show that this contour term is uniformly small if β is large. We follow the reasoning that was used to derive the bound in Eq. (2.15) and arrive at
\[ \langle \chi^{1,0}_{b,b'} \rangle_{N,\beta} \le \frac{1}{\beta^{1/8}} \frac{4^{3/4} e^{1/2}}{(2\pi)^{1/4}}, \tag{3.4} \]
which is manifestly small if β is large. By the previously used arguments, this shows that for some value of λ, there is phase coexistence.

3.2. The general case.

Finally, we consider bond-diluted problems with continuous spins in a more general setting. Thus we write
\[ H_N(n_N, \varphi_N) = \sum_{\langle i,j \rangle} n_{i,j} U(\varphi_i, \varphi_j) - \lambda \sum_{\langle i,j \rangle} n_{i,j} \tag{3.5} \]
for the Hamiltonian restricted to the d-dimensional torus of scale N. It is supposed that the space S and the potential U satisfy the conditions spelled out just prior to Theorem 2.3; the quantities λ and $n_{i,j}$ have the same meaning as in the XY-case. The following can be established:

Theorem 3.2. Consider the d-dimensional bond-diluted models as described by the Hamiltonian in Eq. (3.5) and satisfying the conditions stated above. Then the results stated in Theorem 3.1 hold in these cases.

Proof. The necessary modifications are similar to those required in the generalization of the site-diluted cases; the relevant portion of [CKS] is Theorem 4.2.
References

[CKS] Chayes, L., Kotecký, R., and Shlosman, S.: Aggregation and Intermediate Phases in Dilute Spin Systems. Comm. Math. Phys. 171, 203–232 (1995)
[FS] Fröhlich, J. and Spencer, T.: The Kosterlitz–Thouless Transition in Two-dimensional Abelian Systems and the Coulomb Gas. Comm. Math. Phys. 81, 527–602 (1981)
[FSS] Fröhlich, J., Simon, B., and Spencer, T.: Infrared Bounds, Phase Transitions and Continuous Symmetry Breaking. Comm. Math. Phys. 50, 79–95 (1976)

Communicated by Ya. G. Sinai
Commun. Math. Phys. 189, 641 – 654 (1997)
Communications in
Mathematical Physics © Springer-Verlag 1997
Primitive Vassiliev Invariants and Factorization in Chern-Simons Perturbation Theory*

M. Alvarez (1), J.M.F. Labastida (2)

(1) Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA. E-mail: [email protected]
(2) Departamento de Física de Partículas, Universidade de Santiago, E-15706 Santiago de Compostela, Spain. E-mail: [email protected]

Received: 2 May 1996 / Accepted: 21 March 1997

* This work is supported in part by funds provided by the U.S.A. DOE under cooperative research agreement #DE-FC02-94ER40818 and by the DGICYT of Spain under grant PB93-0344.

Abstract: The general structure of the perturbative expansion of the vacuum expectation value of a Wilson line operator in Chern-Simons gauge field theory is analyzed. The expansion is organized according to the independent group structures that appear at each order. It is shown that the analysis is greatly simplified if the group factors are chosen in a certain way that we call canonical. This enables us to show that the logarithm of a polynomial knot invariant can be written in terms of primitive Vassiliev invariants only.

1. Introduction

Vassiliev invariants, or numerical invariants of finite type, are a set of knot invariants first proposed in [1] to classify knot types. To each knot corresponds an infinite sequence of rational numbers which have to satisfy some consistency conditions in order to be knot class invariants. This infinite sequence is divided into finite subsequences, each one characterized by its order, which form vector spaces. The number of independent elements in each finite subsequence is called the dimension of the space of Vassiliev invariants at that order.

Apart from the original definition [1] of these invariants, there are other approaches to the subject. Since their formulation in terms of inductive relations for singular knots [4, 5] and the discovery of their relation to knot invariants based on quantum groups or on Chern-Simons gauge theory [6, 7, 4, 8, 5], several works have analyzed Vassiliev invariants in both frameworks [9–13]. In [9, 10] it was shown that Vassiliev invariants can be understood in terms of representations of chord diagrams without isolated chords modulo the so-called 4T relations (weight systems), and that weight systems can be constructed using semi-simple Lie algebras. It was also shown in [10], using Kontsevich's representation for Vassiliev invariants [14], that the space of
weight systems is the same as the space of Vassiliev invariants. In [11] it was argued that these representations are precisely the ones underlying quantum-group or Chern-Simons invariants. We observed in [12] that the generalization of the integral or geometrical knot invariant first proposed in [15] and further analyzed in [7], as well as the invariant itself, are Vassiliev invariants. In [12] we proposed an organization of those geometrical invariants and we described a procedure for their calculation from known polynomial knot invariants. This procedure was applied to obtaining Vassiliev knot invariants up to order six for all prime knots up to six crossings. These geometrical invariants have also been studied by Bott and Taubes [16] using a different approach. The relation of this approach to the one in [12] has been studied recently in [17]. An interesting outcome of the analysis presented in [12] is the well known fact that the Vassiliev invariants of a given knot form an algebra in the sense that the product of two invariants of orders i and j is an invariant of order i + j. Therefore the set of independent Vassiliev invariants at a given order can be divided in two subsets: those that are products of invariants of lower orders (composite invariants), and those that are not (primitive invariants). We shall call the decomposition of a Vassiliev invariant as a product of lower order Vassiliev invariants “factorization”. This phenomenon is most clearly exposed after choosing a particular kind of basis of group factors that we will call “canonical”. The detailed description of these bases and its significance to the theory of numerical knot invariants of finite type is the main goal of the present work. 
More precisely, in this paper we shall show that the factorizations observed in [12] can be resummed in a single exponential, which includes only the primitive Vassiliev invariants of the knot C, thus disentangling the contribution of these primitive invariants to all orders in perturbation theory. This, which is the main result of this paper, can be regarded as an extension of the theorems by Birman and Lin in [4, 8, 5], where it is proven that the coefficients of the power expansion of any Chern-Simons or quantum group polynomial invariant are Vassiliev invariants. Our main result is contained in the "Factorization Theorem" presented in Sect. 5 and it can be very simply stated as follows:

Let C be a knot and let $H^R_t(C, G)$ be a Chern-Simons or quantum group polynomial invariant associated to a compact semi-simple Lie group G and a representation R of G (normalized so that for the unknot it takes the value 1). $H^R_t(C, G)$ is a polynomial in t. Let $W^R_x(C, G)$ be obtained from $H^R_t(C, G)$ by replacing the variable t by $e^x$ and let us consider the power series expansion of $\log W^R_x(C, G)$ around x = 0:
\[ \log W^R_x(C, G) = \sum_{i=0}^{\infty} w^c_i x^i. \tag{1} \]
Then, $w^c_0 = 0$ and each $w^c_i$, i ≥ 1, is a primitive Vassiliev invariant relative to a canonical basis.

The proof of this theorem is accomplished through the choice of a canonical basis for the independent group factors at each order in the perturbative expansion of the vacuum expectation value of the Wilson line, $\langle W_R(C, G) \rangle$, which is precisely, up to a normalization factor, $H^R_t(C, G)$, or $W^R_x(C, G)$, above. The use of these bases reveals a simplicity in the perturbative expansion that can hardly be grasped otherwise. A logarithm of a formal power series of generators of group factors was considered in [7], showing that only connected (primitive) group factors remain after introducing a certain algebraic structure. Our result is stronger than that since we show that, after
choosing a canonical basis, the power series expansion corresponding to the logarithm of a Chern-Simons or quantum group polynomial invariant contains only contributions from primitive group factors and geometrical factors associated to primitive group factors.

This paper is organized as follows. Section 2 contains an elementary exposition of Chern-Simons quantum field theory, along with the definition of the object of interest in this article: the Wilson loop operator. In Sect. 3 we introduce the general structure of the perturbative expansion of $\langle W_R(C, G) \rangle$ and the definition of canonical bases. Section 4 presents some consequences of our having chosen a canonical basis to express the perturbative expansion, encoded in a "Master Equation". In Sect. 5 these results are shown to lead to the factorization of the expansion in a single exponential. It is noteworthy that this implies that the logarithm of the polynomial invariant contains primitive Vassiliev invariants only; the concept of primitive invariant has a simple definition in a canonical basis. Changes of basis are analyzed in Sect. 6. Section 7 contains our conclusions.
2. Chern-Simons Theory

In this section we will recall a few facts on Chern-Simons gauge theory. Let us consider a compact semi-simple Lie group G and a connection A on $\mathbb{R}^3$. The Chern-Simons action is defined as:
\[ S_k(A) = \frac{k}{4\pi} \int_{\mathbb{R}^3} \mathrm{Tr}\Big( A \wedge dA + \frac{2}{3} A \wedge A \wedge A \Big), \tag{2} \]
where Tr denotes the trace in the fundamental representation of G. Given a knot C, i.e., an embedding of $S^1$ into $\mathbb{R}^3$, we define the Wilson line associated to C carrying a representation R of G as:
\[ W_R(C, G) = \mathrm{Tr}_R \Big[ \mathrm{P} \exp \oint A \Big], \tag{3} \]
where "P" stands for path ordered and the trace is to be taken in the representation R. The vacuum expectation value is defined as the following ratio of functional integrals:
\[ \langle W_R(C, G) \rangle = \frac{1}{Z_k} \int [DA]\, W_R(C, G)\, e^{i S_k(A)}, \tag{4} \]
where $Z_k$ is the partition function:
\[ Z_k = \int [DA]\, e^{i S_k(A)}. \tag{5} \]
The theory based on the action (2) possesses a gauge symmetry which has to be fixed. In addition, one has to take into account that the theory must be regularized due to the presence of divergent integrals when performing the perturbative expansion of (4). Regarding these two problems we will follow the approach taken in [12]. It is known [19] that in the Landau gauge the contribution of the one-loop gauge field self-energy and one-loop gauge field vertex to the perturbative expansion of any vacuum expectation value can be traded for a shift in the parameter k that multiplies the Chern-Simons action: $k \to k - C_A$, $C_A$ being the quadratic Casimir in the adjoint representation of G. In so doing we do not need to include one-loop two- or three-point gauge field subdiagrams
in the perturbative expansion. Also, it has been shown [20] that higher-order corrections to the gauge field two- and three-point functions vanish. There is one more problem emanating from perturbative quantum field theory which must be considered. Often, products of operators Aµ (x)Aν (y) must be considered at the same point x = y, where they are ambiguous. This situation can be solved [21, 15] without spoiling the topological nature of the theory. In the process one needs to introduce a framing attached to the knot which is characterized by an integer n. It was shown in [18] that working in the standard framing, n = 0, is equivalent to ignoring diagrams containing collapsible propagators in the sense explained in [18, 12].
3. General Structure of the Perturbative Expansion

The facts mentioned in the previous section (exclusion of loop contributions to gauge field two- and three-point functions, and of collapsible propagators) simplify considerably the perturbative analysis of the vacuum expectation value (4). As shown in [12] the perturbative expansion of the vacuum expectation value of the Wilson line (3) has the form:
\[ \langle W_R(C, G) \rangle = d(R) \sum_{i=0}^{\infty} \sum_{j=1}^{d_i} \alpha_i^j\, r_{ij}\, x^i, \tag{6} \]
where $x = \frac{2\pi i}{k - C_A}$, and d(R) is the dimension of the representation R. The factors $\alpha_i^j$ and $r_{ij}$ in (6) incorporate all the dependence dictated by the Feynman rules apart from the dependence on k, which is contained in x. The power of x, i, represents the order in perturbation theory. Of the two factors, $r_{ij}$ and $\alpha_i^j$, the first one contains all the group-theoretical dependence, while the second contains all the geometrical dependence. The quantity $d_i$ denotes the number of independent group structures $r_{ij}$ which appear at order i. The first values of $d_i$, $\alpha_i^j$ and $r_{ij}$ are: $\alpha_0^1 = r_{0,1} = 1$, $d_0 = 1$ and $d_1 = 0$. Notice that we are shifting k in the definition of x and therefore no diagrams with loop contributions to two- and three-point functions should be considered. In addition, there is no linear term in the expansion ($d_1 = 0$) so that diagrams with collapsible propagators (isolated chords) should be ignored in the sense explained in [12]. It was proven in [12] that the quantities $\alpha_i^j$ are Vassiliev invariants of order i.

We now introduce some vocabulary in order to classify Feynman diagrams. We will assume that the reader is familiar with the types of trivalent Feynman diagrams appearing in Chern-Simons perturbation theory. These diagrams are trivalent graphs with a distinguished line, called the Wilson line, which carries the representation chosen. A detailed account of Chern-Simons perturbation theory especially suited for our purposes can be found in [12]. We begin by introducing the notion of a "connected" loop diagram.
We will say that a diagram is a connected loop diagram if it is possible to go from one propagator (or internal line) to another without ever having to go through the Wilson line. If the diagram is “disconnected” that is not possible. In this second case we say that the diagram has subdiagrams, which are the connected components of the whole loop diagram. We say that two subdiagrams are “non-overlapping” if we can move along the Wilson line meeting all the legs of one subdiagram first, and all the legs of the other second. Here, “legs” means the propagators directly attached to the Wilson line. If it is impossible to do that, the subdiagrams are “overlapping”. In Fig. 1 the diagram a is connected while the diagrams b and c are disconnected. Of these last two, diagram b contains subdiagrams which are overlapping while c does not.
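The notion of non-overlapping subdiagrams can be phrased as a simple test on the cyclic order of the legs. The sketch below is only an illustration under assumptions made here: each subdiagram is represented by the list of positions of its legs along the closed Wilson line (distinct numbers, increasing in the direction of travel), and the starting point on the line is taken to be arbitrary. The function name is hypothetical.

```python
def non_overlapping(legs_a, legs_b):
    """Return True if the legs of two subdiagrams are not interleaved on the Wilson line.

    legs_a, legs_b: positions (distinct numbers) of the legs along the closed line.
    """
    labelled = sorted([(pos, "a") for pos in legs_a] + [(pos, "b") for pos in legs_b])
    labels = [label for _, label in labelled]
    # Count label changes between cyclically consecutive legs. Exactly two changes
    # means one can start at a suitable point and meet all legs of one subdiagram
    # before any leg of the other.
    changes = sum(labels[i] != labels[(i + 1) % len(labels)] for i in range(len(labels)))
    return changes <= 2
```

For instance, two chords with endpoints at positions (0, 1) and (2, 3) are non-overlapping, whereas chords with endpoints (0, 2) and (1, 3) cross and are therefore overlapping.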
Fig. 1. Examples of diagrams (panels a, b, c)
In the general expansion (6) there are many possible choices of the independent group factors $r_{ij}$. Given all Feynman diagrams contributing to a given order in perturbation theory, some of the resulting group factors might be related due to the relations among the generators $T^a$ and the structure constants $f_{ijk}$ of semi-simple groups. From a diagrammatic point of view these relations are the so-called STU and IHX relations [10]. Since for a semi-simple group the structure constants can be chosen antisymmetric there is no need to distinguish the orientation of internal three-vertices. The group factors entering (6) are chosen to be associated to diagrams that are independent. Of course, many choices are possible. Each possible set of group factors $r_{ij}$ represents a basis. There are two simple but far-reaching facts about the bases $r_{ij}$ which we summarize in two propositions:

Proposition 1. It is always possible to choose a basis such that the $r_{ij}$ come from connected diagrams, or products of connected diagrams. That is, if there are subdiagrams, they can be chosen so that they do not overlap. The value of such an $r_{ij}$ is the product of the values of its subdiagrams.

Proposition 2. The $r_{ij}$ which are products can be chosen as products of connected $r_{ij}$'s of lower orders.

These propositions follow from two simple facts. First, using STU relations it is always possible to trade overlapping subdiagrams in a disconnected diagram for connected diagrams and disconnected diagrams containing non-overlapping subdiagrams. Second, if a loop diagram is not connected, and its subdiagrams are non-overlapping, its group factor is the product of the group factors of its subdiagrams. This last statement follows from the fact that if one cuts a Wilson line at a given point where no leg is inserted the resulting matrix is a diagonal matrix.
Propositions 1 and 2 are very important because a basis of group factors so constructed shows the following unique feature: a given connected $r_{ij}$ begets a whole family of other group factors at higher orders in which it enters as a subdiagram. A basis constructed following Propositions 1 and 2 shall be called "a canonical basis". The basis used in [12] up to order six is canonical. The diagrams chosen are reproduced in Fig. 2.
4. The Master Equation

Let the gauge group be a product $G \otimes G'$, where G and G′ are compact semi-simple groups. From a path-integral representation of the vacuum expectation values (vev), the following identity is obvious:
\[ \langle W_{R \otimes R'}(C, G \otimes G') \rangle = \langle W_R(C, G) \rangle \langle W_{R'}(C, G') \rangle. \tag{7} \]

Fig. 2. Example of a canonical basis up to order 6 (diagrams $r_{2,1}$; $r_{3,1}$; $r_{4,1}$ to $r_{4,3}$; $r_{5,1}$ to $r_{5,4}$; $r_{6,1}$ to $r_{6,9}$)
When combined with the choice of the same canonical basis for all the vev's, this equation proves to be most fruitful. In order to show this, consider an $r_{ij}$ composed of several connected subdiagrams which we denote by $r_{ij}^{(p)}$ with p = 1, . . . , #(ij). Some of them may be identical. In a canonical basis these $r_{ij}^{(p)}$ are elements of the basis at lower orders, and therefore are associated with geometrical factors which we denote by $(\alpha_i^j)^{(p)}$. If the Lie group is simple, we know that
\[ r_{ij}(G) = \prod_{p=1}^{\#(ij)} r_{ij}^{(p)}(G), \tag{8} \]
but if the Lie group is a product, we find that
\[ r_{ij}(G \otimes G') = \prod_{p=1}^{\#(ij)} \left( r_{ij}^{(p)}(G) + r_{ij}^{(p)}(G') \right). \tag{9} \]
Inserting Eq. (6) in Eq. (7) and putting things together, we arrive at the "Master Equation":
\[ \sum_{i=0}^{\infty} \sum_{j=1}^{d_i} \alpha_i^j(C) \prod_{p=1}^{\#(ij)} \left( r_{ij}^{(p)}(G)\, x^{i_p} + r_{ij}^{(p)}(G')\, x'^{\,i_p} \right) = \left( \sum_{k=0}^{\infty} \sum_{l=1}^{d_k} \alpha_k^l(C)\, r_{kl}(G)\, x^k \right) \left( \sum_{m=0}^{\infty} \sum_{n=1}^{d_m} \alpha_m^n(C)\, r_{mn}(G')\, x'^{\,m} \right), \tag{10} \]
where $x = 2\pi i/(k - C_A)$, $x' = 2\pi i/(k - C'_A)$, and $i_p$ denotes the order of the p-th connected subdiagram, so that $\sum_p i_p = i$. In Eqs. (8), (9) and (10) we have shown explicitly the fact that the group factors $r_{ij}$ depend only on the group-theoretical data while the geometrical factors $\alpha_i^j$ depend only on the knot C. Actually, we ought to have indicated the representations R and R′ in the group factors. We did not do so to avoid a cumbersome notation, but it certainly should be understood.

The matching of the two polynomials in x and x′ in (10) produces an infinite string of identities relating the $\alpha_i^j$ at a given order with products of the $(\alpha_k^l)^{(p)}$ of its components. The general result is as follows. Let us consider a composite $r_{ij}$ that consists of $p^{(k)}_{ij}$ connected non-overlapping subdiagrams of some type k, with k = 1, . . . , N. This means in particular that
\[ \sum_{k=1}^{N} p^{(k)}_{ij} = \#(ij). \tag{11} \]
(The only purpose of this formula is to clarify the notation.) We call $r_{ij;k}$ the connected subdiagram of type k, which in a canonical basis is also an element of the basis at a lower order, and therefore is associated to a geometrical factor denoted by $\alpha_i^{j;k}$. In other words, the element $r_{ij}$ contains the following connected subdiagrams:
\[
\begin{aligned}
r_{ij}^{(1)} = r_{ij}^{(2)} = \ldots = r_{ij}^{(p^{(1)}_{ij})} &\equiv r_{ij;1}, \\
r_{ij}^{(p^{(1)}_{ij}+1)} = r_{ij}^{(p^{(1)}_{ij}+2)} = \ldots = r_{ij}^{(p^{(1)}_{ij}+p^{(2)}_{ij})} &\equiv r_{ij;2}, \\
&\ \text{etc.}
\end{aligned} \tag{12}
\]
The Master Equation (10) leads to a formula for the $\alpha_i^j$ associated with our composite $r_{ij}$:

Theorem 1.
\[ \alpha_i^j = \prod_{k=1}^{N} \frac{1}{p^{(k)}_{ij}!} \left( \alpha_i^{j;k} \right)^{p^{(k)}_{ij}}. \tag{13} \]
Note that this is true only for a canonical choice of basis. This general result is the key to the next sections.
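The origin of the factorials in Theorem 1 can be seen in the simplest case of a composite group factor made of p copies of a single connected subdiagram. Expanding the p-fold product on the left-hand side of the Master Equation binomially and matching the term with q factors depending on G and p − q factors depending on G′ gives the consistency condition a_p C(p, q) = a_q a_{p−q}, where a_p denotes the geometrical factor of the p-fold composite. The sketch below checks with exact rational arithmetic that a_p = α^p / p! satisfies this condition; the restriction to one subdiagram type and all names are simplifying assumptions made here.

```python
from fractions import Fraction
from math import comb, factorial

def composite_alpha(alpha, p):
    """Geometrical factor of p copies of one connected subdiagram, as in (13)."""
    return Fraction(alpha) ** p / factorial(p)

def master_equation_holds(alpha, p_max):
    """Check a_p * C(p, q) == a_q * a_(p - q) for all 0 <= q <= p <= p_max."""
    return all(
        composite_alpha(alpha, p) * comb(p, q)
        == composite_alpha(alpha, q) * composite_alpha(alpha, p - q)
        for p in range(p_max + 1)
        for q in range(p + 1)
    )
```

With several subdiagram types the same matching applies to each type separately, which is how the product over k in (13) arises.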
5. Factorizations When the approach described in the preceding sections was first proposed in [12], the canonical basis did not include any elements at order 1. The reason is that the only contribution to the vacuum expectation value at that order would be the framing factor, which is not intrinsic to the knot. In that paper the interest was centered in extracting
numerical knot invariants from the perturbative expansion, so it was natural to exclude the framing factor and its corresponding element of the basis. We can include the framing factor if we enlarge our basis with $r_{11} = C_2$, the quadratic Casimir. The geometrical factor associated with this new group structure will be denoted by n, for it is the framing. We are including these new elements here in order to show the simplest application of Eq. (13). According to Propositions 1 and 2 there is a new basis in which each $r_{kl}$ originates a family of elements of the form $r_{kl} C_2$, $r_{kl} C_2^2$ and so on. This set of elements of the new basis originated by $r_{kl}$ can be called "the kl-$C_2$-family". Let us focus on a given kl-$C_2$-family and consider its contribution to $\langle W_R(C, G) \rangle$. Theorem 1 shows that this contribution is
\[ d(R) \Big( \alpha_k^l r_{kl} x^k + \alpha_k^l r_{kl} C_2\, n\, x^{k+1} + \frac{1}{2} \alpha_k^l r_{kl} C_2^2\, n^2 x^{k+2} + \ldots \Big), \tag{14} \]
and the following terms are the expansion of an exponential. It follows that Eq. (14) is equal to:
\[ d(R)\, \alpha_k^l r_{kl} x^k e^{C_2 n x}. \tag{15} \]
Therefore, repeating the same argument for each kl-$C_2$-family (which are all independent because the $r_{kl}$ are) we get:

Theorem 2.
\[ \langle W_R(C, G) \rangle = \langle W_R(C, G) \rangle |_{n=0}\; e^{C_2 n x}. \tag{16} \]
This agrees exactly with the non-perturbative result [21] and can be regarded as a simplified proof of the factorization of the framing factor shown in [18]. Now, we can use a similar approach to factorize more structures. The idea is the same: the selection of a given connected element of our canonical basis, the “dressing” of another element of the basis with copies of our selected subdiagram (thus creating a family of diagrams in the sense explained above) and the repeated use of Theorem 1. The rest of this section dwells on this subject. The reasoning that led to the factorization of the $C_2$ structure can be applied without changes to any other connected $r_{ij}$. The only peculiarity of $C_2$ is that its addition to an element of the basis at order k, say $r_{kl}$, leads to another element at order k + 1, its “first descendant” in the kl-$C_2$-family. If we want to factorize $r_{21}$, the members of the kl-$r_{21}$-family would have orders k + 2, k + 4 and so on in double steps. Proposition 2 says that our basis can be constructed so that it contains this new family. Let $r_{k+2q,l}$ be a member of the kl-$r_{21}$-family generated by $r_{kl}$ and a q-fold insertion of $r_{21}$, i.e.
\[
r_{k+2q,l} = r_{kl}\,(r_{21})^{q} .
\tag{17}
\]
Theorem 1, and the fact that we have chosen a canonical basis, enables us to prove that the contribution of this family to $\langle W_R(C,G)\rangle$ is
\[
d(R)\left( \alpha_k^{\,l}\, r_{kl}\, x^{k} + \alpha_k^{\,l}\,\alpha_2^{\,1}\, r_{kl}\, r_{21}\, x^{k+2} + \frac{1}{2}\,\alpha_k^{\,l}\,(\alpha_2^{\,1})^{2}\, r_{kl}\,(r_{21})^{2}\, x^{k+4} + \dots \right)
= d(R)\,\alpha_k^{\,l}\, r_{kl}\, x^{k}\, e^{\alpha_2^{\,1} r_{21} x^{2}} .
\tag{18}
\]
We can repeat the same steps for all kl-r21 -families because they are all independent; in so doing we arrive at:
Theorem 3.
\[
\langle W_R(C,G)\rangle = \langle W_R(C,G)\rangle\big|_{\alpha_2^{\,1}=0}\; e^{\alpha_2^{\,1} r_{21} x^{2}} .
\tag{19}
\]
And nothing hinders the generalization of this theorem to what can be seen as the full expression of the concept of perturbative factorization of the vacuum expectation value:
Factorization Theorem.
\[
\langle W_R(C,G)\rangle = d(R)\,\exp\left\{ \sum_{i=1}^{\infty} \sum_{j=1}^{\hat d_i} \alpha_{ij}^{c}(C)\, r_{ij}^{c}(G)\, x^{i} \right\},
\tag{20}
\]
where $r_{ij}^{c}$ denotes the connected elements of the basis, and $\alpha_{ij}^{c}$ their corresponding geometrical factors. These $\alpha_{ij}^{c}$ do not correspond uniquely to connected diagrams because the $r_{ij}^{c}$ include geometric factors from both connected and disconnected Feynman diagrams. The symbol $\hat d_i$ stands for the number of connected elements in the canonical basis at order i.
Equation (20) can be written in the form
\[
\log\left( \frac{1}{d(R)}\,\langle W_R(C,G)\rangle \right) = \sum_{i=1}^{\infty} \sum_{j=1}^{\hat d_i} \alpha_{ij}^{c}(C)\, r_{ij}^{c}(G)\, x^{i} ,
\tag{21}
\]
which is the result announced in (1). Equation (21) is reminiscent of the well-known fact in quantum field theory that the logarithm of the generating functional can be expanded in terms of connected diagrams only. The relevance of this formula to the theory of knot invariants comes from the identification of the vacuum expectation value of the Wilson line as a polynomial invariant. Actually we are in a much more general situation because we are considering an arbitrary semi-simple gauge group, so our vev is in some sense the most general polynomial invariant possible. Therefore, Eqs. (20) or (21) prove that if a canonical basis is chosen, the logarithm of a polynomial knot invariant can be expanded in terms of the primitive Vassiliev invariants of the knot only. The primitive Vassiliev invariants $\alpha_{ij}^{c}(C)$ have been computed up to order six for all prime knots up to six crossings [12] and for arbitrary torus knots [22]. It was conjectured in [12] that there exists a normalization for the $\alpha_{ij}^{c}(C)$ in which they are integer-valued. The integral expressions for $\alpha_{21}(C)$ and $\alpha_{31}(C)$ were first presented in [15] and in [12] respectively. Properties of these two primitive Vassiliev invariants have been studied in [7] and in [23].
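The relation between Eqs. (20) and (21) can be illustrated with truncated power series. The following is a minimal sketch, assuming that each order i is summarized by a single rational number c[i] standing for the sum over j of the connected contributions at that order; the function names and the sample values are hypothetical and not part of the paper.

```python
from fractions import Fraction as F

def series_exp(c, order):
    """Coefficients of exp(sum_{i>=1} c[i] x^i) up to x^order (c[0] is ignored)."""
    e = [F(0)] * (order + 1)
    e[0] = F(1)
    for n in range(1, order + 1):
        # From E' = C' E:  n e_n = sum_{k=1..n} k c_k e_{n-k}
        e[n] = sum((k * c[k] * e[n - k] for k in range(1, n + 1)), F(0)) / n
    return e

def series_log(e, order):
    """Coefficients of log(E) for a series E with e[0] == 1, up to x^order."""
    c = [F(0)] * (order + 1)
    for n in range(1, order + 1):
        # Invert the same recurrence to isolate c_n
        c[n] = e[n] - sum((k * c[k] * e[n - k] for k in range(1, n)), F(0)) / n
    return c

# Hypothetical "connected" data at orders 1, 2, 3
connected = [F(0), F(2), F(1, 3), F(-1, 5)]
full = series_exp(connected, 3)      # plays the role of <W>/d(R)
recovered = series_log(full, 3)      # plays the role of the right-hand side of (21)
```

In this sketch the order-two coefficient of the exponentiated series is c[1]**2/2 + c[2]: the "disconnected" square of the order-one term appears with the 1/2! weight of Theorem 1, and taking the logarithm removes it again.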
6. Change of Basis In this section we want to investigate to what extent the previous results are independent of the basis chosen. We are thus led to consider changes of basis. First we treat changes of canonical basis. Let B and B′ be two different canonical bases, with elements $r_{ij}$ and $r'_{ij}$ respectively. The most general change of canonical basis is
\[
r'_{ij} = N_j{}^{k}\, r_{ik} ,
\tag{22}
\]
where N is a $d_i \times d_i$ matrix¹ yet to be determined. We are assuming that the vectors in this space are written as columns. To see that (22) is the most general change between B and B′, consider a possible extra term in the right-hand side. It has to be a product of several $r_{kl} \in B$ such that the sum of the orders of its factors is i. But by definition of canonical basis such a product is also an element of B at order i, and therefore the extra term can be absorbed into the first term. This shows that Eq. (22) is indeed the most general change of canonical basis. It is elementary to prove that in order to preserve the independence of the elements of B at each order, the matrix N has to be non-singular:
\[
\det N \neq 0 .
\tag{23}
\]
We now analyze the effect of the change of basis (22) on the geometric factors $\alpha_i^{\,j}$. The vectors in this space of numerical factors are written as rows. The starting point is the invariance of the vacuum expectation value under changes of basis. Therefore, at each order i it is true that
\[
\sum_{j=1}^{d_i} \alpha_i^{\,j}\, r_{ij} = \sum_{j=1}^{d_i} \alpha'^{\,j}_{i}\, r'_{ij} ,
\tag{24}
\]
and it follows that the $\alpha_i^{\,j}$ transform “contravariantly”:
\[
\alpha'^{\,j}_{i} = \alpha_i^{\,k}\, (N^{-1})_k{}^{j} .
\tag{25}
\]
In general, at order i the change of canonical basis involves a matrix $N \in GL(d_i, \mathbb{Q})$. We do not know how to characterize these matrices completely. Only a small subset of the whole linear group is relevant. For example we can discard N's which are mere permutations, or diagonal, since they do not lead to essentially new bases. More important, the elements of the basis B′ will be linear combinations of the elements of B, but these linear combinations must be interpretable as new diagrams because we want B′ to be canonical. In other words, we would have to investigate which linear combinations of independent diagrams can be written as a single diagram. The resulting subgroup of $GL(d_i, \mathbb{Q})$ would be equivalent to the space of independent canonical bases at order i. The N's have some properties that do not depend on these details. Let us order the elements of a canonical basis B at order i as follows:
\[
B_i = \left\{ r_{i1}^{c}, \dots, r_{i,\hat d_i}^{c},\; r_{i,\hat d_i+1}, \dots, r_{i,d_i} \right\},
\tag{26}
\]
i.e. the connected elements first, and the disconnected elements after. A given disconnected element of B can be written as a product of elements of B of lower orders,
\[
r_{ij} = r_{kl}\, r_{i-k,s} ,
\tag{27}
\]
where l and s depend implicitly on j, but this will not be relevant in what follows. If we write the identity (27) in a new canonical basis B′, it reads
\[
N_j{}^{p}\, r'_{ip} = N_l{}^{q}\, r'_{kq}\; N_s{}^{t}\, r'_{i-k,t} .
\tag{28}
\]
Note that the N's operate on different spaces. What we have on the right-hand side is a linear combination of elements of B′ at order i, because all these products of $r'_{kq}$'s times $r'_{i-k,t}$'s must be elements of B′ (it is canonical). On the left-hand side we have other elements of the same basis at the same order. Therefore Eq. (28) is a contradiction unless the $r'_{ip}$ are themselves products and, therefore, correspond to disconnected diagrams. The conclusion is that in a change of canonical basis, the disconnected r's in B come from the disconnected r's in B′. This result can be summarized in the following symbolic representation of a matrix N valid for a change of canonical basis:
\[
N = \begin{pmatrix} C \to C & NC \to C \\ 0 & NC \to NC \end{pmatrix},
\tag{29}
\]
where C and NC mean “connected” and “non-connected” respectively. In this notation a change of basis would be written as
\[
\begin{pmatrix} r^{c} \\ r^{nc} \end{pmatrix} = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} r'^{\,c} \\ r'^{\,nc} \end{pmatrix},
\tag{30}
\]
where C = 0. The blocks in the diagonal are square matrices because all canonical bases must have the same number $\hat d_i$ of connected elements at a given order i. These matrices have an interesting property: they form a subgroup of $GL(d_i, \mathbb{Q})$. The inverse of a given element is
\[
\begin{pmatrix} A & B \\ 0 & D \end{pmatrix}^{-1} = \begin{pmatrix} A^{-1} & -A^{-1} B D^{-1} \\ 0 & D^{-1} \end{pmatrix},
\tag{31}
\]
and the determinant is
\[
\det \begin{pmatrix} A & B \\ 0 & D \end{pmatrix} = \det A \, \det D .
\tag{32}
\]
¹ There is an N at each order i, but we are not indicating this fact.
Therefore, the matrix is non-singular if and only if its diagonal blocks are non-singular. We are assuming that the matrix represents a valid change of basis, thus nothing is singular and the inverses in Eq. (31) do exist. 6.1. Diagrammatic interpretation. We can sharpen the previous result by analyzing the constraints that the diagrammatic origin of the r's imposes on their algebra. For example, the product of two r's is interpretable as a diagram in an obvious way (actually, as an equivalence class of diagrams). The big constraint is on the sum of two r's. We are now investigating the algebra of diagrams which are either connected or disconnected and non-overlapping, regardless of whether they are independent, i.e. of whether they are elements of a canonical basis or not. We shall call these diagrams generically r, and will not consider diagrams with overlapping subdiagrams at all. The question is: when is the sum of two r's interpretable as a diagram? The answer comes in two parts: Lemma. 1. Let $r_1$ and $r_2$ be two different connected diagrams of the same order i. Their sum or difference $r_1 \pm r_2$ exists as a diagram of order i if and only if $r_1$ and $r_2$ are two of the terms in an STU or IHX relation. The sign in ± depends on which two terms of those relations $r_1$ and $r_2$ correspond to.
2. If $r_1$ and $r_2$ are not connected, they can differ only in a single subdiagram. Again, the subdiagrams that are different must be two of the terms in an STU or IHX relation. More complicated linear combinations of r's that are interpretable as diagrams can always be decomposed in elementary steps, each of them satisfying the lemma. We introduce now a new concept. A linear combination of r's that can be interpreted as a diagram shall be called a “valid” linear combination. To be completely rigorous, a valid linear combination should be written in an unambiguous way by using parentheses to indicate which r's are added to which other r's and in what order. For example,
\[
r_1 + r_2 - r_3 = (r_1 + r_2) - r_3 \quad\text{or}\quad r_1 + (r_2 - r_3) \quad\text{or}\quad (r_1 - r_3) + r_2 .
\tag{33}
\]
Which one of the three possibilities is the right one depends on the diagrams. In this sense, the addition of r's is commutative but not associative. This may be a minor point, because the numerical values of the r's can be added freely, but we are now focusing on the formal properties of the r's as elements of an “algebra”². In general, when we write a valid linear combination of r's we shall assume that there is an ordering of the additions such that each step complies with the lemma. This ordering depends on the particular diagrams in the linear combination and thus will not be indicated in general. 6.2. Arithmetic of diagrams. The previous subsection establishes the rules for an arithmetic of diagrams. The r's are elements of an algebraic structure in which we can always multiply, but not always add or subtract (see the lemma above). As for the division, an expression like $r_1/r_2$ is an r only if $r_2$ is a subdiagram of $r_1$, in which case we say that $r_2$ divides $r_1$. There is a neutral element for the product: the empty diagram. Therefore no diagrammatic interpretation exists for $1/r_2$. The set of r's with the addition is not even a group, for not all diagrams can be added. This lack of structure precludes an abstract formulation of the algebra of group factors r. We need to know which diagram a given r represents in order to ascertain whether it can be added to another r or not. Nevertheless we can prove some theorems which have a bearing on our investigation. Theorem 4. Let $r_1$ and $r_2$ be two diagrams of the same order i that can be added or subtracted, and $r_3$, $r_4$ be two diagrams whose orders add to i. Then,
\[
r_1 \pm r_2 = r_3\, r_4 \iff \left( r_3 | r_1 \ \text{and}\ r_3 | r_2 \right) \ \text{or}\ \left( r_4 | r_1 \ \text{and}\ r_4 | r_2 \right),
\tag{34}
\]
where a|b means that a divides b. The proof follows from the lemma. An immediate generalization of this theorem is that a valid linear combination of r's is disconnected non-overlapping if and only if so is each r in the linear combination.
A similar conclusion holds for valid linear combinations of connected diagrams. We want to gather these important conclusions in three remarks:
1. No valid linear combination of r's includes connected and disconnected non-overlapping diagrams at the same time.
2. A valid linear combination of connected diagrams is a connected diagram.
3. A valid linear combination of disconnected non-overlapping diagrams is a disconnected non-overlapping diagram. All of them have the same number of subdiagrams. We can say that “a valid linear combination of diagrams conserves the number of components”.
A closely related result is that a valid change of canonical basis must be represented by a block-diagonal matrix:
² Strictly speaking they do not form an algebra, whence the quotation marks.
Theorem 5. Let N be a matrix corresponding to a change of canonical basis at order i, written in the form
\[
N = \begin{pmatrix} A & B \\ C & D \end{pmatrix},
\tag{35}
\]
where A is a non-singular $\hat d_i \times \hat d_i$ matrix, and D a non-singular $(d_i - \hat d_i) \times (d_i - \hat d_i)$ matrix. Then,
\[
B = C = 0 .
\tag{36}
\]
To prove this theorem notice that under a change of basis the connected r's transform as (see Eq. (30) for notation)
\[
r^{c} = A\, r'^{\,c} + B\, r'^{\,nc} .
\tag{37}
\]
Given that the l.h.s. is a diagram we observe that the r.h.s. is a valid linear combination of diagrams. But the lemma implies that all diagrams in the r.h.s. must be connected, or all of them disconnected non-overlapping, i.e. either A = 0 or B = 0. It is clear from the lemma that the only option is B = 0. A similar argument establishes that C = 0, which we already knew from previous considerations, see Eq. (29). The final picture for a valid N is
\[
N = \begin{pmatrix} A & 0 \\ 0 & D \end{pmatrix}
\tag{38}
\]
with A and D non-singular square matrices. In words, the connected r's transform independently from the disconnected r's; they never mix in a change of canonical basis. In particular this shows that Eq. (20) is consistent under a change of canonical basis. No matter what canonical basis we choose, the only relevant diagrams are those that contribute to the $r_{ij}^{c}$. As for the change from a canonical B to a non-canonical B′, we have little to say. The concept of factorization as described here only makes sense for canonical bases, and only in this case do we have Eq. (20). We regard canonical bases as privileged systems of reference in which the perturbative expansion is at its simplest.
7. Conclusions We have shown that the perturbative expansion of the vev of a Wilson line in Chern-Simons quantum field theory shows a striking simplicity if presented in terms of a canonical basis for the group factors. These bases provide a simple characterization of primitive Vassiliev invariants: they are the geometric factors associated to the connected elements of a canonical basis. Within this framework it is possible to factorize the whole perturbative expansion in separate contributions from each primitive invariant. Each of these factors can be resummed in an exponential. The relevance of this result for the theory of numerical knot invariants is that the logarithm of a polynomial invariant contains only the primitive invariants. This work opens a variety of investigations. One should study further the algebraic structure of the set of canonical bases. As follows from the discussion in Sect. 6, at each
order i this set is characterized by a subgroup of $GL(\hat d_i, \mathbb{Q})$. Methods to find the groups corresponding to each order should be investigated. Another important extension of our work consists of the study of factorization in the context of n-component links. Vassiliev invariants for n-component links have not been much studied and it is very likely that some of the ideas behind factorization can be used in the organization of their structure. The extension of the concept of canonical basis and primitiveness should be explored in that case following a similar analysis to the one presented in this paper. Work on this and other aspects of Vassiliev invariants associated to n-component links will be reported elsewhere. Acknowledgement. M.A. is indebted to Prof. John Negele and the CTP for hospitality, and to the Spanish CICYT for financial support. J.M.F.L. thanks E. Pérez for many interesting discussions on Vassiliev invariants.
References
1. Vassiliev, V.A.: Cohomology of knot spaces. In: Theory of Singularities and its applications, V.I. Arnold, ed., Am. Math. Soc., Providence, RI: 1990
2. Vassiliev, V.A.: Topology of complements to discriminants and loop spaces. In: Theory of Singularities and its applications, V.I. Arnold, ed., Am. Math. Soc., Providence, RI: 1990
3. Vassiliev, V.A.: Complements of discriminants of smooth maps: topology and applications. Trans. Math. Monographs, vol. 98, Providence, RI: AMS, 1992
4. Birman, J.S. and Lin, X.S.: Inv. Math. 111, 225 (1993)
5. Birman, J.S.: Bull. AMS 28, 253 (1993)
6. Bar-Natan, D.: Weights of Feynman diagrams and the Vassiliev knot invariants. Preprint, 1991
7. Bar-Natan, D.: Perturbative aspects of Chern-Simons topological quantum field theory. Ph. D. Thesis, Princeton: Princeton University, 1991
8. Lin, X.S.: Vertex models, quantum groups and Vassiliev knot invariants. Columbia University preprint, 1991
9. Kontsevich, M.: Adv. in Sov. Math. 16, Part 2, 137 (1993)
10. Bar-Natan, D.: Topology 34, 423 (1995)
11. Piunikhin, S.: J. Knot Theory and its Ramif. 4, 163 (1995)
12. Alvarez, M. and Labastida, J.M.F.: Nucl. Phys. B433, 555 (1995); Erratum, Nucl. Phys. B441, 403 (1995)
13. Hayashin, N.: Graph Invariants of Vassiliev Type and Application to 4D Quantum Gravity. UT-Komaba 94-8, March 1995, q-alg/9503010
14. Kontsevich, M.: Graphs, homotopical algebra and low-dimensional topology. Preprint, 1992
15. Guadagnini, E., Martellini, M. and Mintchev, M.: Phys. Lett. B227, 111 (1989); Phys. Lett. B228, 489 (1989); Nucl. Phys. B330, 575 (1990)
16. Bott, R. and Taubes, C.: J. Math. Phys. 35, 5247 (1994)
17. Altschuler, D. and Freidel, L.: Vassiliev knot invariants and Chern-Simons perturbation theory to all orders. ETH-TH/96-35
18. Alvarez, M. and Labastida, J.M.F.: Nucl. Phys. B395, 198 (1993)
19. Alvarez-Gaume, L., Labastida, J.M.F., Ramallo, A.V.: Nucl. Phys. B334, 103 (1990)
20. Giavarini, G., Martín, C.P., Ruiz, F.: Nucl. Phys. B381, 222 (1992)
21. Witten, E.: Commun. Math. Phys. 121, 351 (1989)
22. Alvarez, M. and Labastida, J.M.F.: Journal of Knot Theory and Its Ramifications 5, 779 (1996)
23. Hirshfeld, A.C. and Sassemberg, U.: Derivation of the total twist from Chern-Simons gauge theory. DO-TH 95/02, 1995; Explicit formulation of a third order finite knot invariant derived from Chern-Simons gauge theory. DO-TH 95/16, 1995
Communicated by R.H. Dijkgraaf
Commun. Math. Phys. 189, 655 – 665 (1997)
Communications in
Mathematical Physics c Springer-Verlag 1997
Uniqueness of the Functional Determinant
Matthew J. Gursky*
Department of Mathematics, Indiana University, Bloomington, IN 47405-5701, USA. E-mail: [email protected]
Received: 30 October 1996 / Accepted: 21 March 1997
* Research supported in part by NSF Grant DMS-9623048.
Abstract: The functional determinant of the conformal laplacian and the square of the Dirac operator are known to be extremized at the standard round metric of the four-sphere among all conformal metrics (up to gauge equivalence). In this article we show that this is the unique critical point, thus extending the work of Onofri and of Osgood, Phillips and Sarnak for the functional determinant on $S^2$, which characterized the constant curvature metric as the unique critical point of the determinant. In addition, we introduce a new symmetric two-tensor field which is defined on any conformally flat four-manifold and can be viewed as a fourth order generalization of the Einstein gravitational tensor. As a consequence we prove a Pohozaev identity for manifolds with boundary which admit conformal Killing vector fields.
Introduction The functional determinant introduced by Ray and Singer ([RS]) and studied extensively for compact surfaces by Osgood, Phillips, and Sarnak ([OPS1, OPS2]) has recently received considerable attention in the context of four-manifolds ([BCY, BO, CY1, CQ]). The purpose of the present paper is to establish the uniqueness of critical points for the functional determinant on the four-sphere. We also introduce a new metrically-defined symmetric two-tensor which arose as a technical tool in the proof and which may be of independent interest in the study of four-dimensional conformal geometry. In order to motivate our results it will be helpful to review the work of Onofri for the determinant on $S^2$. If $g = e^{2w} g_0$, where $g_0$ denotes the standard round metric, then the Polyakov ([Po]) formula gives
\[
\log \frac{\det \Delta_g}{\det \Delta_{g_0}} = -\frac{1}{3} \fint_{S^2} \left( |\nabla w|^2 + 2w \right).
\tag{0.1}
\]
(In (0.1) and in what follows we will adopt the notation $\fint = (\mathrm{vol})^{-1} \int$.) Since (0.1) is not scale invariant, we introduce the auxiliary functional
\[
F[w] = \log \fint_{S^2} e^{2w} - \fint_{S^2} \left( |\nabla w|^2 + 2w \right).
\]
Theorem. ([On]). $F[w] \le 0$ with equality if and only if $e^{2w} g_0 = \varphi^* g_0$ for some conformal map φ of $S^2$. Corollary. ([On], see also [OPS1]). Among all metrics on $S^2$ of volume 4π, the standard metric (and its images under the conformal group) maximizes $\det \Delta_g$. Although the preceding corollary completely characterizes maximizers of the determinant, there remains the question of lower energy critical points. A first variation of F shows that critical points satisfy the Euler equation $\Delta w + e^{2w} = 1$. Since the Gauss curvature $K_g$ of $g = e^{2w} g_0$ is given by $\Delta w + K_g e^{2w} = 1$, we conclude that
\[
K_g \equiv 1 .
\]
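The first-variation step can be spelled out as follows. This is a sketch under two assumptions that are not stated explicitly in the text: Δ denotes the divergence of the gradient with respect to $g_0$, and w is normalized so that the average of $e^{2w}$ equals 1, which is harmless because $F[w + c] = F[w]$ for every constant c. The test function φ is introduced only for this computation.

```latex
% First variation of F[w] = log \fint e^{2w} - \fint (|\nabla w|^2 + 2w) in the direction \varphi
\[
\frac{d}{dt} F[w + t\varphi]\Big|_{t=0}
  = \frac{2 \fint_{S^2} e^{2w} \varphi}{\fint_{S^2} e^{2w}}
    - \fint_{S^2} \left( 2 \nabla w \cdot \nabla \varphi + 2 \varphi \right)
  = 2 \fint_{S^2} \varphi \left( \frac{e^{2w}}{\fint_{S^2} e^{2w}} + \Delta w - 1 \right).
\]
% With \fint e^{2w} = 1, vanishing for all \varphi gives the Euler equation
\[
\Delta w + e^{2w} = 1 ,
\]
% and comparison with \Delta w + K_g e^{2w} = 1 yields K_g = 1.
```

The second equality uses integration by parts on the closed surface $S^2$.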
It is then standard that $g = e^{2w} g_0 = \varphi^* g_0$ for a conformal map φ; i.e., w is a maximizer. Turning to the situation in four dimensions, we say that the metrically-defined differential operator A is conformally covariant if whenever $g = e^{2w} g_0$,
\[
A_g u = e^{-3w} A_{g_0} (e^{w} u) .
\tag{0.2}
\]
In [BO] the authors computed an explicit formula for the functional determinant of such an operator. Imitating [CY1] we write this formula as
\[
F[w] = \log \frac{\det A_w}{\det A_0} = \gamma_1 I[w] + \gamma_2 II[w] + \gamma_3 III[w] ,
\tag{0.3}
\]
where $\gamma_i = \gamma_i(A)$. Examples of operators which satisfy (0.2) include
1. The conformal laplacian $L = \Delta - R/6$. In this case $(\gamma_1, \gamma_2, \gamma_3) = (1, -4, -\tfrac{2}{3})$.
2. The spin laplacian $\slashed{\nabla}^2$. In this case $(\gamma_1, \gamma_2, \gamma_3) = (7, -88, -\tfrac{14}{3})$.
Instead of giving the most general formula for each “subfunctional” in (0.3), we restrict our attention to the sphere $(S^4, g_0)$. In this case (and more generally, for any locally conformally flat manifold) the functional I vanishes. For the others we have
\[
II[w] = \fint_{S^4} \left[ (\Delta w)^2 + 2 |\nabla w|^2 \right] - 3 \log \fint_{S^4} e^{4(w - \bar w)} ,
\]
\[
III[w] = 12 \left\{ \fint_{S^4} \left[ \left( \frac{\Delta e^{w}}{e^{w}} \right)^{2} - 4 |\nabla w|^2 \right] \right\} ,
\]
\[
\bar w = \fint_{S^4} w .
\]
Remarkably, both functionals are non-negative. For II this is a consequence of a sharp logarithmic Sobolev inequality due to Beckner; for III it follows from the usual sharp Sobolev inequality on the sphere.
Theorem. (i) ([Bec]). $II[w] \ge 0$ with equality if and only if $e^{2w} g_0 = \varphi^* g_0$ for some conformal map φ of $S^4$. (ii) ([BCY], [Bec1]). $III[w] \ge 0$ with equality if and only if $e^{2w} g_0 = \varphi^* g_0$ for some conformal map φ of $S^4$. For the operators L and $\slashed{\nabla}^2$ the coefficients $\gamma_2$ and $\gamma_3$ have the same sign, so we conclude Corollary. ([BCY]). Among all metrics $g = e^{2w} g_0$ on $S^4$, the standard metric (and its images under the conformal group) extremizes $\log \det L_g / \det L_{g_0}$ and $\log \det \slashed{\nabla}^2_g / \det \slashed{\nabla}^2_{g_0}$. As in the two-dimensional case, the corollary above characterizes extremals of F. But what about other critical points? The Euler equation for a critical point $g = e^{2w} g_0$ is (see [CY1])
\[
\gamma_2 Q_g - \gamma_3 \Delta_g R_g = 3 \gamma_2 ,
\tag{0.4}
\]
where
\[
Q_g = \frac{1}{12} \left( -\Delta_g R_g - 3 |E_g|^2 + \frac{1}{4} R_g^2 \right),
\tag{0.5}
\]
\[
E_g = \mathrm{Ric}(g) - \frac{1}{4} R_g\, g ,
\tag{0.6}
\]
and $R_g$ is the scalar curvature of g. It is now far from clear whether g is conformally equivalent to $g_0$, and yet this is indeed the case, at least when A = L or A = $\slashed{\nabla}^2$.
Theorem A. The standard metric is the unique critical point of $\log \det L_g / \det L_{g_0}$ and $\log \det \slashed{\nabla}^2_g / \det \slashed{\nabla}^2_{g_0}$ (up to equivalence by a conformal transformation). Remarks. 1. In [CY1], a uniqueness result was proved for $\log \det L_g / \det L_{g_0}$ for manifolds which satisfied
\[
\kappa_d \equiv 16 \pi^2 \chi(M) - \frac{3}{2} \int |W_0|^2 \, dv_0 \le 0 ,
\]
where $W_0$ is the Weyl curvature of $g_0$. Examples of such manifolds include $\mathbb{CP}^2$, $S^2 \times S^2$, and the torus $T^4$. But for $S^4$, $\kappa_d = 32\pi^2 > 0$. 2. In [CY2], uniqueness of critical points of the functional II is established. The proof of this fact is quite analytic. Interestingly, our techniques cannot be used to prove their result, and vice versa. Their method is based on the observation that critical points of II, when pulled back to $\mathbb{R}^4$ via stereographic projection, can be identified with solutions of the semilinear P.D.E.
\[
\Delta^2 w = e^{4w} .
\tag{0.7}
\]
The method of moving planes is then used to characterize all solutions of (0.7) on R4 with certain growth at infinity. We are aware of two ways of proving Theorem A. However, we will only present the one which is – to the author’s sensibilities at least – the most elementary. The more involved proof has the virtue of indicating the connection between our uniqueness result and the work of Obata on uniqueness for the Yamabe problem. This connection is worth discussing. Recall that the total scalar curvature of the metric g is
\[
S(g) = (\mathrm{vol}\, g)^{-\frac{n-2}{n}} \int R_g \, d\mathrm{vol}(g) .
\]
The problem of minimizing S in a fixed conformal class is known as the Yamabe problem. While its history and resolution is a subject unto itself (see [LP]), we only wish to point out a few well-known results.
(i) If [g] denotes the conformal class of g, then critical points of $S|_{[g]}$ have constant scalar curvature.
(ii) Let $Y(g) = \inf S|_{[g]}$. Then $Y(g) > -\infty$, and is obviously a conformal invariant. Moreover, if $\mu(-L_g)$ denotes the lowest eigenvalue of the conformal laplacian, then sign $\mu(L_g)$ = sign $Y(g)$.
(iii) If $Y(g) \le 0$ then critical points of $S|_{[g]}$ are unique.
(iv) Suppose $g_0$ is Einstein. Then by (i) $g_0$ is critical for $S|_{[g_0]}$. Obata ([Ob]) proved that $g_0$ is the unique critical point of $S|_{[g_0]}$ up to homothety (and up to a conformal transformation in the exceptional case of the sphere).
Obata's proof involved a clever use of the formula which expresses the trace-free Ricci tensor E of a metric $g = u^2 g_0$ in terms of the trace-free Ricci tensor $E_0$ of $g_0$:
\[
E_{ij} = (E_0)_{ij} - (n-2)\, u^{-1} \left[ \nabla_i \nabla_j u - \frac{1}{n} \Delta u \, g_{ij} \right].
\tag{0.8}
\]
Here the covariant derivatives are with respect to g, not $g_0$. If $g_0$ is Einstein, then $E_0 \equiv 0$. If we pair both sides of (0.8) with $u E_{ij}$ and integrate we get
\[
\begin{aligned}
\int u |E|^2 &= -(n-2) \int E_{ij} \nabla_i \nabla_j u \\
&= (n-2) \int \nabla_i E_{ij} \nabla_j u && \text{(Stokes' Theorem)} \\
&= \frac{(n-2)^2}{2n} \int \nabla_j R \, \nabla_j u && \text{(Contracted 2nd Bianchi)}.
\end{aligned}
\tag{0.9}
\]
Notice that in the first line of (0.9) we used the fact that E is trace-free. If R is constant, we see that $E \equiv 0$; i.e., g is also Einstein. Obata analyzed (0.8) in light of this and was able to conclude uniqueness. Our purpose in partially reproducing his proof is to point out the role that the trace-free Ricci tensor played in it, at least on the level of the calculations in (0.9). A careful analysis shows that there are two properties of E that were important: first, that E has vanishing trace; second, that $\delta E = -\frac{(n-2)}{2n} dR$, where d denotes the differential and δ the divergence. Returning to our setting, one can ask whether an argument imitative of Obata's could be used to establish uniqueness for the functional determinant. The answer is yes, and although we will not give the details, we wish to point out the existence of a tensor which would supplant the trace-free Ricci tensor in such a proof. Theorem B. Let (M, g) be a locally conformally flat four-manifold. Let $\gamma_2$, $\gamma_3$ be given constants, and define
\[
U_g = \gamma_2 Q_g - \gamma_3 \Delta_g R_g .
\tag{0.10}
\]
Then there is a symmetric 2-tensor T = T(g) which satisfies (i) trace T = 0, (ii) δT = −dU.
Remarks. 1. Alternatively, we could have required T to satisfy δT = 0 and tr T = −4Q. The analogy between Q and the scalar curvature R described above implies an obvious analogy between T and the Einstein gravitational tensor $G_{ij} = R_{ij} - \frac{1}{2} R g_{ij}$. G satisfies δG = 0, tr G = $-\left(\frac{n-2}{2}\right) R$. See also Remark 4 below, where T is realized as the gradient of an “action”. 2. For any conformally flat four-manifold, a metric g which is critical for the functional determinant F satisfies $U_g \equiv$ constant (see [CY1]). 3. The assumption that (M, g) is conformally flat seems crucial. We rather doubt that such a tensor exists in general. 4. There is an analogue of the quantity Q in dimensions n ≥ 5 which is important in conformal geometry (see [Br]). In higher dimensions one can demonstrate the existence of a trace-free tensor T satisfying δT = −dQ in a rather systematic way: Define
\[
\mathcal{Q}[g] = \int Q_g \, d\mathrm{vol}(g) .
\]
Let $\tilde T$ be the tensor such that
\[
\mathcal{Q}'_g(h) = \frac{d}{dt} \mathcal{Q}[g + th] \Big|_{t=0} = \int g(\tilde T, h) .
\]
Then $\delta \tilde T \equiv 0$ (see [Be, Prop. 4.11]), while it is not difficult to show that trace $\tilde T = \lambda(n) Q$ for some constant λ(n). Then let $T = -n \lambda(n)^{-1} \tilde T + Q g$, where the sign of the first term is fixed by the requirement that T be trace-free. If we take $\gamma_2 = 1$ and $\gamma_3 = 0$ in (0.10) then T satisfies (i) trace T = 0, (ii) δT = −dQ. The quantity Q is of independent interest and its significance in conformal geometry and spectral theory has been studied in several papers (see [Gu, Br]). We would like to point out the following “Pohozaev identity” which is analogous to the Pohozaev identity for the scalar curvature (see [Sc]), or the Futaki functional in Kähler geometry (see [Be, p. 92]). Theorem C. Let (N, g) be a conformally flat four-manifold with boundary ∂N. Suppose X is a conformal Killing vector field. Then
\[
\int_N \mathcal{L}_X Q \, d\mathrm{vol}(g) = \int_{\partial N} T(X, \nu) \, d\sigma(g) ,
\]
where $\mathcal{L}_X Q$ denotes the Lie derivative of Q with respect to X, ν denotes the outward normal on ∂N and dσ(g) the induced measure on ∂N. In local coordinates,
\[
T_{ij} = \frac{1}{36} \left[ \Delta R \, g_{ij} - 4 \nabla_i \nabla_j R - 36 E_{ik} E_{jk} + 9 |E|^2 g_{ij} + 10 R E_{ij} \right].
\]
The organization of the remainder of the paper is straightforward: in Sect. A we prove Theorem A; in Sect. B, Theorem B, etc.
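As a quick consistency check of the local-coordinate expression for T (a sketch, not part of the original argument), one can take the trace in dimension four, using only $g^{ij} g_{ij} = 4$, $g^{ij} \nabla_i \nabla_j R = \Delta R$, $g^{ij} E_{ik} E_{jk} = |E|^2$ and the fact that E is trace-free:

```latex
% Trace of T_{ij} in dimension n = 4
\[
36\, g^{ij} T_{ij}
  = 4 \Delta R - 4 \Delta R - 36 |E|^2 + 9 \cdot 4\, |E|^2 + 10 R \, g^{ij} E_{ij}
  = 0 ,
\]
```

so the expression is indeed trace-free, in agreement with property (i).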
A. The Proof of Theorem A Let $(S^4, g_0)$ denote the sphere with its standard round metric. Assume $g = u^2 g_0$ is critical for $\log \det A_g / \det A_{g_0}$, where A = L or A = $\slashed{\nabla}^2$. It will be helpful to rewrite the Euler equation (0.4) as
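\[
\Delta R = \lambda - \beta |E|^2 + \frac{1}{12} \beta R^2 ,
\tag{1.1}
\]
where
\[
\lambda = \frac{-3 \gamma_2}{\gamma_3 + \frac{1}{12} \gamma_2} ,
\tag{1.2}
\]
\[
\beta = \frac{\gamma_2}{4 \left( \gamma_3 + \frac{1}{12} \gamma_2 \right)} .
\tag{1.3}
\]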
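The constants in (1.2) and (1.3) are easy to evaluate for the two operators of interest, using the values of $\gamma_2$ and $\gamma_3$ listed in the Introduction. The following is a minimal sketch with exact rational arithmetic; the function name `euler_constants` is an assumption made only for this illustration.

```python
from fractions import Fraction as F

def euler_constants(gamma2, gamma3):
    """Return (lambda, beta) as defined in Eqs. (1.2) and (1.3)."""
    denom = gamma3 + gamma2 / 12
    lam = -3 * gamma2 / denom
    beta = gamma2 / (4 * denom)
    return lam, beta

# gamma_2, gamma_3 for the conformal laplacian L and for the squared Dirac operator
lam_L, beta_L = euler_constants(F(-4), F(-2, 3))
lam_D, beta_D = euler_constants(F(-88), F(-14, 3))
```

Both values of beta lie in the interval (0, 2] and both values of lambda are non-positive, which is the sign information the proof relies on.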
Recall that when A = L, $(\gamma_1, \gamma_2, \gamma_3) = (1, -4, -\tfrac{2}{3})$; and when A = $\slashed{\nabla}^2$, $(\gamma_1, \gamma_2, \gamma_3) = (7, -88, -\tfrac{14}{3})$. According to (0.8) the trace-free Ricci tensor E of g is given by
\[
u E_{ij} = -2 \left[ \nabla_i \nabla_j u - \frac{1}{4} \Delta u \, g_{ij} \right].
\tag{1.4}
\]
Also, a well known formula for the scalar curvature of a conformally related metric gives
\[
\Delta u = -\frac{1}{6} R u + 2 \frac{|\nabla u|^2}{u} + 2 u^{-1} ,
\tag{1.5}
\]
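where the laplacian and gradient are with respect to g. Our proof begins by pairing both sides of (1.4) with the Hessian of the scalar curvature $\nabla_i \nabla_j R$ and integrating. To simplify notation we suppress the volume form dvol(g):
\[
\int u E_{ij} \nabla_i \nabla_j R = \int \left[ -2 \nabla_i \nabla_j R \, \nabla_i \nabla_j u + \frac{1}{2} \Delta u \, \Delta R \right].
\tag{1.6}
\]
Integrating by parts and commuting derivatives we have
\[
\begin{aligned}
-2 \int \nabla_i \nabla_j R \, \nabla_i \nabla_j u &= 2 \int \nabla_j R \, \nabla_i \nabla_i \nabla_j u \\
&= 2 \int \nabla_j R \, \nabla_i \nabla_j \nabla_i u \\
&= 2 \int \nabla_j R \left[ \nabla_j (\Delta u) + R_{ij} \nabla_i u \right].
\end{aligned}
\]
Integrating by parts again gives
\[
\begin{aligned}
-2 \int \nabla_i \nabla_j R \, \nabla_i \nabla_j u &= \int \left[ -2 \Delta R \, \Delta u + 2 R_{ij} \nabla_i u \nabla_j R \right] \\
&= \int \left[ -2 \Delta R \, \Delta u + 2 E_{ij} \nabla_i u \nabla_j R + \frac{1}{2} R \nabla_i R \nabla_i u \right].
\end{aligned}
\tag{1.7}
\]
Substituting (1.7) into (1.6),
\[
\int u E_{ij} \nabla_i \nabla_j R = \int \left[ -\frac{3}{2} \Delta u \, \Delta R + 2 E_{ij} \nabla_i u \nabla_j R + \frac{1}{2} R \nabla_i R \nabla_i u \right].
\]
By the Euler equation (1.1),
\[
\begin{aligned}
\int u E_{ij} \nabla_i \nabla_j R &= \int \left[ -\frac{3}{2} \Delta u \left( \lambda - \beta |E|^2 + \frac{1}{12} \beta R^2 \right) + 2 E_{ij} \nabla_i u \nabla_j R + \frac{1}{2} R \nabla_i R \nabla_i u \right] \\
&= \int \left[ \frac{3}{2} \beta |E|^2 \Delta u - \frac{1}{8} \beta R^2 \Delta u + 2 E_{ij} \nabla_i u \nabla_j R + \frac{1}{2} R \nabla_i R \nabla_i u \right].
\end{aligned}
\tag{1.8}
\]
Notice that we can combine two of the terms in (1.8) by integrating by parts:
\[
\begin{aligned}
\int \left[ -\frac{1}{8} \beta R^2 \Delta u + \frac{1}{2} R \nabla_i R \nabla_i u \right] &= \int \left[ \frac{1}{4} \beta R \nabla_i R \nabla_i u + \frac{1}{2} R \nabla_i R \nabla_i u \right] \\
&= \int \frac{1}{4} (\beta + 2) R \nabla_i R \nabla_i u .
\end{aligned}
\]
Therefore,
\[
\int u E_{ij} \nabla_i \nabla_j R = \int \left[ \frac{3}{2} \beta |E|^2 \Delta u + 2 E_{ij} \nabla_i u \nabla_j R + \frac{1}{4} (\beta + 2) R \nabla_i R \nabla_i u \right].
\tag{1.9}
\]
Substituting (1.5) into (1.9),
\[
\begin{aligned}
\int u E_{ij} \nabla_i \nabla_j R &= \int \left[ \frac{3}{2} \beta |E|^2 \left( -\frac{1}{6} R u + 2 \frac{|\nabla u|^2}{u} + 2 u^{-1} \right) + 2 E_{ij} \nabla_i u \nabla_j R + \frac{1}{4} (\beta + 2) R \nabla_i R \nabla_i u \right] \\
&= \int \left[ -\frac{1}{4} \beta R |E|^2 u + 3 \beta |E|^2 \frac{|\nabla u|^2}{u} + 3 \beta |E|^2 u^{-1} + 2 E_{ij} \nabla_i u \nabla_j R + \frac{1}{4} (\beta + 2) R \nabla_i R \nabla_i u \right]
\end{aligned}
\]
⇒
\[
0 = \int \left[ -u E_{ij} \nabla_i \nabla_j R - \frac{1}{4} \beta R |E|^2 u + 3 \beta |E|^2 \frac{|\nabla u|^2}{u} + 3 \beta |E|^2 u^{-1} + 2 E_{ij} \nabla_i u \nabla_j R + \frac{1}{4} (\beta + 2) R \nabla_i R \nabla_i u \right].
\tag{1.10}
\]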
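There are two more terms to simplify in (1.10): the first and the last. For the first, we integrate by parts once again.

$$\begin{aligned}
\int -u E_{ij} \nabla_i \nabla_j R &= \int u \nabla_i E_{ij} \nabla_j R + \nabla_i u\, E_{ij} \nabla_j R \\
&= \int \tfrac{1}{4}|\nabla R|^2 u + E_{ij} \nabla_i u \nabla_j R.
\end{aligned} \tag{1.11}$$

For the last term, we return to (1.4) and pair both sides with $R E_{ij}$, then integrate by parts:

$$\begin{aligned}
\int R|E|^2 u &= \int -2R\, E_{ij} \nabla_i \nabla_j u \\
&= \int 2R\, \nabla_i E_{ij} \nabla_j u + 2\nabla_i R\, E_{ij} \nabla_j u \\
&= \int \tfrac{1}{2} R \nabla_i R \nabla_i u + 2 E_{ij} \nabla_i u \nabla_j R
\end{aligned}$$

$$\Rightarrow\quad \int \tfrac{1}{4}(\beta + 2) R \nabla_i R \nabla_i u = \int \tfrac{1}{2}(\beta + 2) R|E|^2 u - (\beta + 2) E_{ij} \nabla_i u \nabla_j R. \tag{1.12}$$

Substituting (1.11) and (1.12) into (1.10) and combining terms we finally get

$$0 = \int \tfrac{1}{4}|\nabla R|^2 u + (1 - \beta) E_{ij} \nabla_i u \nabla_j R + 3\beta|E|^2 \frac{|\nabla u|^2}{u} + 3\beta|E|^2 u^{-1} + \left(1 + \tfrac{1}{4}\beta\right) R|E|^2 u. \tag{1.13}$$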
We will now argue that the RHS of (1.13) is non-negative and vanishes if and only if $|E| \equiv 0$; i.e., $g$ is Einstein. Uniqueness is then immediate. The key observation needed to prove this is that the scalar curvature of a metric satisfying (1.1), with the values of $\gamma_2$ and $\gamma_3$ corresponding to $L$ and ${\not\nabla}^2$, is strictly positive. This follows from [Gu, Lemma 1.2], once we have shown that

$$0 < \beta(L),\ \beta({\not\nabla}^2) \le 2, \tag{1.14}$$

$$\lambda(L),\ \lambda({\not\nabla}^2) \le 0, \tag{1.15}$$

$$\mu(g) > 0, \tag{1.16}$$
where $\mu(g)$ is the lowest eigenvalue of $L$. As we pointed out in the introduction, the sign of $\mu(g)$ is a conformal invariant and $\mu(g_0) = 12$. Hence (1.16) holds. Using the values of $\gamma_2$ and $\gamma_3$ for $L$ and ${\not\nabla}^2$, we have by (1.2) and (1.3),

$$\beta(L) = 1, \qquad \beta({\not\nabla}^2) = \tfrac{11}{6},$$

so indeed (1.14) holds. Also,

$$\lambda(L) = -12, \qquad \lambda({\not\nabla}^2) = -22,$$

and therefore (1.15) is true as well. We conclude that $R > 0$ and moreover that the last term in (1.13) is positive. Because $\beta(L) = 1$, it follows that for a metric critical for the log determinant of $L$,

$$0 \ge \int \tfrac{1}{4}|\nabla R|^2 u + 3|E|^2 \frac{|\nabla u|^2}{u} + 3|E|^2 u^{-1}.$$

Hence $E \equiv 0$. On the other hand, if a metric is critical for the log determinant of ${\not\nabla}^2$,

$$0 \ge \int \tfrac{1}{4}|\nabla R|^2 u - \tfrac{5}{6} E_{ij} \nabla_i u \nabla_j R + \tfrac{11}{2}|E|^2 \frac{|\nabla u|^2}{u} + \tfrac{11}{2}|E|^2 u^{-1}.$$

Using the inequality $\tfrac{5}{6}xy \le \tfrac{1}{4}x^2 + \tfrac{25}{36}y^2$ we have

$$\left|-\tfrac{5}{6} E_{ij} \nabla_i u \nabla_j R\right| \le \tfrac{5}{6}|E||\nabla u||\nabla R| \le \tfrac{1}{4}|\nabla R|^2 u + \tfrac{25}{36}|E|^2 \frac{|\nabla u|^2}{u}$$

$$\Rightarrow\quad 0 \ge \int \tfrac{173}{36}|E|^2 \frac{|\nabla u|^2}{u} + \tfrac{11}{2}|E|^2 u^{-1},$$
so that $E \equiv 0$ as well. This completes the proof of Theorem A.

Remark. In the proof involving the tensor $T$ described in Theorem B, we still arrive (after many integrations by parts) at (1.13). The argument presented above is "easier" in the sense that one does not have to first establish the existence of $T$ in order to get to the same identity.

B. The Proof of Theorem B

The condition that the tensor $T$ satisfy $\delta T = -dU$ suggests that it should be a linear combination of the following terms:

$$\Delta E_{ij},\quad \nabla_i \nabla_j R,\quad \Delta R\, g_{ij},\quad E_{ik} E_{jk},\quad |E|^2 g_{ij},\quad R E_{ij},\quad R^2 g_{ij}.$$

Conformal flatness implies that the first term in the above list can be expressed as a linear combination of the rest. We therefore arrive at the following form for $T$:

$$T_{ij} = a_1 \nabla_i \nabla_j R + a_2 \Delta R\, g_{ij} + a_3 E_{ik} E_{jk} + a_4 |E|^2 g_{ij} + a_5 R E_{ij} + a_6 R^2 g_{ij}.$$

Since $\operatorname{tr} T = 0$, we must have

$$a_6 = 0, \qquad a_1 + 4a_2 = 0, \qquad a_3 + 4a_4 = 0.$$

We are now free to choose only $a_2$, $a_4$, and $a_5$. But the condition $\delta T = -dU$ imposes precisely three conditions. To see this, note $(\delta T)_j = -\nabla_i T_{ij}$, so

$$-(\delta T)_j = -4a_2 \nabla_i \nabla_i \nabla_j R + a_2 \nabla_j (\Delta R) - 4a_4 \nabla_i (E_{ik} E_{jk}) + a_4 \nabla_j |E|^2 + a_5 \nabla_i (R E_{ij}). \tag{2.1}$$

Commuting derivatives, the first term in (2.1) can be written

$$\nabla_i \nabla_i \nabla_j R = \nabla_j (\Delta R) + R_{jk} \nabla_k R = \nabla_j (\Delta R) + E_{jk} \nabla_k R + \tfrac{1}{4} R \nabla_j R. \tag{2.2}$$
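Turning to the third term, we will use the fact that on a locally conformally flat manifold,

$$\nabla_i E_{jk} = \nabla_j E_{ik} - \tfrac{1}{12} g_{jk} \nabla_i R + \tfrac{1}{12} g_{ik} \nabla_j R.$$

Hence

$$\begin{aligned}
\nabla_i (E_{ik} E_{jk}) &= \nabla_i E_{ik}\, E_{jk} + E_{ik} \nabla_i E_{jk} \\
&= \tfrac{1}{4} E_{jk} \nabla_k R + E_{ik} \left[\nabla_j E_{ik} - \tfrac{1}{12} g_{jk} \nabla_i R + \tfrac{1}{12} g_{ik} \nabla_j R\right] \\
&= \tfrac{1}{4} E_{jk} \nabla_k R + \tfrac{1}{2} \nabla_j |E|^2 - \tfrac{1}{12} E_{jk} \nabla_k R \\
&= \tfrac{1}{6} E_{jk} \nabla_k R + \tfrac{1}{2} \nabla_j |E|^2.
\end{aligned} \tag{2.3}$$

For the last term in (2.1) we have

$$\nabla_i (R E_{ij}) = E_{jk} \nabla_k R + \tfrac{1}{4} R \nabla_j R. \tag{2.4}$$

Substituting (2.2)-(2.4) into (2.1) and combining terms we get

$$-(\delta T)_j = -3a_2 \nabla_j (\Delta R) - a_4 \nabla_j |E|^2 + \left(\tfrac{1}{8} a_5 - \tfrac{1}{2} a_2\right) \nabla_j (R^2) + \left(-4a_2 - \tfrac{2}{3} a_4 + a_5\right) E_{jk} \nabla_k R.$$

Let $a_5 = 4a_2 + \tfrac{2}{3} a_4$; then

$$\begin{aligned}
-(\delta T)_j &= -3a_2 \nabla_j (\Delta R) - a_4 \nabla_j |E|^2 + \tfrac{1}{12} a_4 \nabla_j R^2 \\
&= \left(\tfrac{1}{3} a_4 - 3a_2\right) \nabla_j (\Delta R) + 4a_4 \nabla_j Q.
\end{aligned}$$

Finally we take $a_4 = \tfrac{1}{4}\gamma_2$ and $a_2 = \tfrac{1}{3}\left(\gamma_3 + \tfrac{1}{12}\gamma_2\right)$ to get the desired identity.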
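The numerical constants used in the proofs of Theorems A and B can be double-checked in exact rational arithmetic. The following is only a minimal sketch: the function names are illustrative and not from the paper, and the value $\gamma_3 = -\tfrac{14}{3}$ for ${\not\nabla}^2$ is taken as stated above.

```python
from fractions import Fraction as F

def beta_and_lambda(gamma2, gamma3):
    """Return (beta, lambda) as defined in (1.2)-(1.3), using exact fractions."""
    denom = gamma3 + gamma2 / 12
    return gamma2 / (4 * denom), -3 * gamma2 / denom

# Conformal Laplacian L and the square of the Dirac operator.
beta_L, lambda_L = beta_and_lambda(F(-4), F(-2, 3))
beta_D, lambda_D = beta_and_lambda(F(-88), F(-14, 3))

# Young's inequality |1 - beta| x y <= x^2/4 + c y^2 holds with c = (1 - beta)^2.
young_loss = (1 - beta_D) ** 2
# Coefficient of |E|^2 |grad u|^2 / u left over in the final inequality.
remaining = 3 * beta_D - young_loss

def a5_choice(a2, a4):
    """The choice a5 = 4 a2 + (2/3) a4 that cancels the E_jk grad_k R term."""
    return 4 * a2 + F(2, 3) * a4

def r_squared_coefficient(a2, a4):
    """Coefficient of grad_j(R^2) after substituting the choice of a5."""
    return a5_choice(a2, a4) / 8 - a2 / 2

print(beta_L, lambda_L, beta_D, lambda_D, remaining)
```

With these inputs the leftover coefficient is the $\tfrac{173}{36}$ appearing at the end of the proof of Theorem A, and `r_squared_coefficient(a2, a4)` reduces to `a4 / 12` for any rational `a2`, `a4`.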
C. The Proof of Theorem C

If we take $\gamma_3 = 0$ and $\gamma_2 = 1$ in (0.10), then it follows that there is a tensor $T$ satisfying (i) $\operatorname{trace} T = 0$, (ii) $\delta T = -dQ$. Let $X$ be a conformal Killing vector field; then $X$ satisfies

$$\nabla_i X_j + \nabla_j X_i = -\tfrac{1}{2} (\delta X) g_{ij}. \tag{3.1}$$

If we pair both sides of (3.1) with $T$ we get $T_{ij} \nabla_i X_j = 0$. Integrating over $N$ and using Stokes' theorem we have

$$\begin{aligned}
0 = \int_N T_{ij} \nabla_i X_j &= \int_N -\nabla_i T_{ij} X_j + \int_{\partial N} T(X, \nu) \\
&= \int_N \langle \delta T, X \rangle + \int_{\partial N} T(X, \nu) \\
&= \int_N -\langle dQ, X \rangle + \int_{\partial N} T(X, \nu)
\end{aligned}$$

$$\Rightarrow\quad \int_N \mathcal{L}_X Q = \int_{\partial N} T(X, \nu).$$
Acknowledgement. The author would like to thank Alice Chang and Paul Yang for bringing to our attention some of the questions addressed in this article.
References

[Be] Besse, A.: Einstein Manifolds. Berlin–Heidelberg: Springer Verlag, 1987
[Bec] Beckner, W.: Sharp Sobolev Inequalities on the Sphere and the Moser-Trudinger Inequality. Ann. Math. 138, 213–242 (1993)
[Bec1] Beckner, W.: Moser–Trudinger inequality in higher dimensions. Int. Mat. Res. Not. 7, 83–91 (1991)
[Br] Branson, T.: The Functional Determinant. Lecture Note Series, 4, Seoul National University, Research Institute of Mathematics, Global Analysis Research Center, Seoul, 1993
[BCY] Branson, T., Chang, S.Y.A., Yang, P.: Estimates and Extremals for Zeta Function Determinants on Four-Manifolds. Comm. Math. Phys. 149, 241–262 (1992)
[BO] Branson, T., Orsted, B.: Explicit Functional Determinants in Four Dimensions. Proc. Am. Math. Soc. 113, 669–682 (1991)
[CQ] Chang, S.Y.A., Qing, J.: Zeta Functional Determinants on Manifolds with Boundary. Preprint
[CY1] Chang, S.Y.A., Yang, P.: Extremal Metrics of Zeta Function Determinants on Four-Manifolds. Ann. Math. 142, 171–212 (1995)
[CY2] Chang, S.Y.A., Yang, P.: Preprint
[Gu] Gursky, M.: The Weyl Functional, DeRham Cohomology, and Kahler-Einstein Metrics. Preprint
[LP] Lee, J.M., Parker, T.: The Yamabe Problem. Bull. Amer. Math. Soc. (N.S.) 17, 37–91 (1987)
[Ob] Obata, M.: The Conjectures on Conformal Transformations of Riemannian Manifolds. J. Diff. Geom. 6, 247–258 (1971)
[On] Onofri, E.: On The Positivity of The Effective Action in a Theory of Random Surfaces. Comm. Math. Phys. 86, 321–326 (1982)
[OPS1] Osgood, B., Phillips, R., Sarnak, P.: Extremals of Determinants of Laplacians. J. Funct. Anal. 80, 148–211 (1988)
[OPS2] Osgood, B., Phillips, R., Sarnak, P.: Compact Isospectral Sets of Surfaces. J. Funct. Anal. 80, 212–234 (1988)
[Po] Polyakov, A.: Quantum Geometry of Bosonic Strings. Phys. Lett. B 103, 207–210 (????)
[RS] Ray, D., Singer, I.: R-Torsion and The Laplacian on Riemannian Manifolds. Adv. Math. 7, 145–210 (1971)
[Sc] Schoen, R.: The Existence of Weak Solutions with Prescribed Singular Behavior for a Conformally Invariant Scalar Equation. Comm. Pure Appl. Math. 41, 317–392 (1988)
Communicated by R.H. Dijkgraaf
Commun. Math. Phys. 189, 667 – 698 (1997)
Communications in
Mathematical Physics © Springer-Verlag 1997
Non-Commutative Martingale Inequalities

Gilles Pisier$^{1,\star}$, Quanhua Xu$^{2}$

1 Texas A&M University and Université Paris VI, Equipe d'Analyse, Case 186, 4, Place Jussieu, 75252 Paris Cedex 05, France. E-mail: [email protected]
2 Université de Franche-Comté, Equipe de Mathématiques, UFR des Sciences et Techniques, 16, Route de Gray, 25030 Besançon Cedex, France. E-mail: [email protected]

Received: 20 March 1997 / Accepted: 21 March 1997
Abstract: We prove the analogue of the classical Burkholder-Gundy inequalities for non-commutative martingales. As applications we give a characterization for an Ito-Clifford integral to be an $L^p$-martingale via its integrand, and then extend the Ito-Clifford integral theory in $L^2$, developed by Barnett, Streater and Wilde, to $L^p$ for all $1 < p < \infty$. We include an appendix on the non-commutative analogue of the classical Fefferman duality between $H^1$ and $BMO$.
0. Introduction

Recently, non-commutative (=quantum) probability theory has developed considerably. In particular, all sorts of non-commutative analogues of Brownian motion and martingales have been studied following the basic work of Parthasarathy and Schmidt. We refer the reader to P. A. Meyer's exposition ([M]) and to the proceedings of the successive conferences on quantum probability [AvW] for more details and references. There are also intimate connections with Harmonic Analysis (cf. e.g. [Mi]). Motivated by quantum physics, and after the pioneering works of Gross (cf. [Gr1-2]), a Fermionic version of Brownian motion and stochastic integrals was developed (see [BSW1]), and the optimal hypercontractive inequalities were finally proved ([CL]).

In this paper we will prove the non-commutative analogue of the classical Burkholder-Gundy inequalities from martingale theory. We should point out that what follows was originally inspired by some recent work of Carlen and Krée, who had considered Fermionic versions of the Burkholder-Gundy inequalities. They obtained the inequality in Theorem 4.1 below in some special cases, as well as some sufficient conditions for the convergence of stochastic integrals in the case $p \le 2$ (see Sect. 4 below for more on this).

$\star$ Partially supported by the NSF
One interesting feature of our work is that the square function is defined differently (and it must be changed!) according to $p < 2$ or $p > 2$. This surprising phenomenon was already discovered by F. Lust-Piquard in [LP] (see also [LPP]) while establishing non-commutative versions of Khintchine's inequalities.

Let us briefly describe our main inequality. Let $\mathcal{M}$ be a finite von Neumann algebra with a normalized normal faithful trace $\tau$, and $(\mathcal{M}_n)_{n \ge 0}$ be an increasing filtration of von Neumann subalgebras of $\mathcal{M}$. Let $1 < p < \infty$ and $(x_n)$ be a martingale with respect to $(\mathcal{M}_n)_{n \ge 0}$ in the usual $L^p$-space $L^p(\mathcal{M}, \tau)$ associated to $(\mathcal{M}, \tau)$. Set $d_0 = x_0$, $d_n = x_n - x_{n-1}$. Then our main result reads as follows. If $p \ge 2$, we have (with equivalence constants depending only on $p$)

$$\sup_n \|x_n\|_p \approx \max\Big\{ \big\|\big(\textstyle\sum d_n^* d_n\big)^{1/2}\big\|_p,\ \big\|\big(\textstyle\sum d_n d_n^*\big)^{1/2}\big\|_p \Big\}. \tag{0.1}$$

This is no longer valid for $p < 2$; however for $p < 2$ the "right" inequalities are

$$\sup_n \|x_n\|_p \approx \inf\Big\{ \big\|\big(\textstyle\sum a_n^* a_n\big)^{1/2}\big\|_p + \big\|\big(\textstyle\sum b_n b_n^*\big)^{1/2}\big\|_p \Big\}, \tag{0.2}$$

where the infimum runs over all decompositions $d_n = a_n + b_n$ of $d_n$ as a sum of martingale difference sequences adapted to the same filtration.

In particular, this applies to martingale transforms: given a martingale $(x_n)$ as above and an adapted bounded sequence $\xi = (\xi_n)$, i.e. such that $\xi_n \in \mathcal{M}_n$ for all $n \ge 0$, we can form the martingale

$$y_n = x_0 + \sum_{k=1}^{n} \xi_{k-1} (x_k - x_{k-1}).$$

Then, if $(x_n)$ is a martingale which converges in $L^p(\mathcal{M}, \tau)$ ($1 < p < \infty$), if the sequence $\xi = (\xi_n)$ is bounded in $\mathcal{M}$ and if $\xi_{n-1}$ commutes with $\mathcal{M}_n$ for all $n$, the transformed martingale $(y_n)$ also converges in $L^p(\mathcal{M}, \tau)$. Indeed, by duality, it suffices to check this for $p \ge 2$, and then it is an easy consequence of (0.1). Note however that the preceding statement can fail if one does not assume that $\xi_{n-1}$ commutes with $\mathcal{M}_n$. In the case $p \ge 2$, it suffices to assume that $\xi_{n-1}$ commutes with $x_n - x_{n-1}$ for all $n$. The latter assumption is used to show that if, say, $\|\xi_{n-1}\| \le 1$, we have

$$(y_n - y_{n-1})(y_n - y_{n-1})^* \le (x_n - x_{n-1})(x_n - x_{n-1})^*.$$

Of course, this assumption can be relaxed further: all that is needed is to be able to compare the "square functions" associated to $(y_n)$ and $(x_n)$ appearing on the right in (0.1).

In Sect. 2 the above inequalities (0.1) and (0.2) are proved. The key point of our proof is the following passage: assuming the above inequalities for some $1 < p < \infty$, we deduce them for $2p$. The rest of the proof can be accomplished by iteration (starting from $p = 2$), interpolation and duality. We would like to emphasize that this proof is entirely self-contained.

The style of proof of (0.1) and (0.2) is rather old fashioned: it is reminiscent of Marcel Riesz's classical argument for the boundedness of the Hilbert transform on $L^p$ ($1 < p < \infty$), and also of Paley's proof of (0.1) in the classical dyadic case ([Pa]), i.e. when $\mathcal{M}_n = L^\infty(\{-1, +1\}^n)$. It has been known for many years that Marcel Riesz's argument could be easily adapted to prove the boundedness of the Hilbert transform on the vector valued $L_p$-space ($p \ge 2$) $L_p(X)$, when the Banach space $X$ is the Schatten $p$-class $S_p$, or a non-commutative $L^p$-space associated to a trace (the first author learned this from P. Muhly back in 1976). More recently, Bourgain ([B1]) used this to show the unconditionality of martingale differences with values in $S_p$. In other words, he showed that $S_p$ is a UMD space, in the terminology of [Bu2]. (See [BGM] for the case of more general non-commutative $L^p$-spaces.) Recall that a Banach space $X$ is called a UMD space if, for any $1 < q < \infty$, there is a constant $C$ such that, for any $q$-integrable $X$-valued finite martingale $(x_n)$ on a probability space $(\Omega, \mathcal{A}, P)$ and for any choice of signs $\varepsilon_n = \pm 1$, we have (here we write briefly $L_q(X)$ instead of $L_q(\Omega, \mathcal{A}, P; X)$)

$$\Big\|\sum \varepsilon_n (x_n - x_{n-1})\Big\|_{L_q(X)} \le C \Big\|\sum x_n - x_{n-1}\Big\|_{L_q(X)} = C \sup_n \|x_n\|_{L_q(X)}. \tag{0.3}$$

We will denote by $C_q(X)$ the best constant $C$ satisfying this. By well known stopping time arguments (the so-called "good $\lambda$ inequalities", see [Bu1]) it suffices to have this for some $1 < q < \infty$, for instance for $q = 2$ say, and there is a positive constant $K_q$ depending only on $q$ such that for all $1 < q < \infty$,

$$K_q^{-1} C_2(X) \le C_q(X) \le K_q C_2(X). \tag{0.4}$$

Of course, when $X$ is a non-commutative $L^p$-space, the choice of $q = p$ gives a nicer form to (0.3). The reader is referred to [Bu2] for more information on UMD spaces. The fact that non-commutative $L^p$-spaces are UMD ([B1-2, BGM]), which is of course a corollary of our main result, can also be used to prove, by some kind of transference argument, several special cases of it. This is explained in Sect. 3. However, although it seems to give better behaved constants (when $p \to \infty$), we do not see how to use this transference idea in the situation of an arbitrary filtration, as treated in Sect. 2.

In Sect. 3 we give three examples. They are respectively the tensor products, Clifford algebras and algebras of free groups. For all of them the preceding inequalities admit a different proof, which we outline in the tensor product case. Its main idea is to transfer a non-commutative martingale to a commutative martingale with values in the corresponding non-commutative $L^p$-space $L^p(\mathcal{M}, \tau)$, and to use its unconditionality. This alternate method is, in fact, our first approach to non-commutative martingale inequalities, as announced in [PX].

Section 4 is devoted to the Ito-Clifford integral. There we apply our main inequalities to give a characterization for an Ito-Clifford integral to be an $L^p$-martingale via its integrand. This is the Fermionic analogue of the square function inequality for the classical Ito integrals. As a consequence, we extend the Ito-Clifford integral theory in $L^2$, developed by Barnett, Streater and Wilde, to $L^p$ for all $1 < p < \infty$. We include an appendix on the non-commutative analogue of the classical Fefferman duality between $H^1$ and $BMO$.
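In the classical dyadic case $\mathcal{M}_n = L^\infty(\{-1, +1\}^n)$ mentioned above, both square functions in (0.1) coincide with the usual one, and the two sides can be compared by brute-force enumeration. The sketch below is only an illustration: the predictable factor `f_k` is an arbitrary choice made here, not something taken from the paper.

```python
from itertools import product

def lp_norm(values, p):
    """L^p norm with respect to the uniform probability on the sample points."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)

def dyadic_example(n, p):
    """Return (||x_n||_p, ||S(x)||_p) for a martingale on {-1, +1}^n.

    The differences are d_k = eps_k * f_k, where f_k depends only on
    eps_0, ..., eps_{k-1}, so that E(d_k | eps_0, ..., eps_{k-1}) = 0.
    """
    finals, squares = [], []
    for eps in product((-1, 1), repeat=n):
        diffs = []
        for k in range(n):
            f_k = 1.0 + 0.5 * sum(eps[:k])  # illustrative predictable factor
            diffs.append(eps[k] * f_k)
        finals.append(sum(diffs))
        squares.append(sum(d * d for d in diffs) ** 0.5)
    return lp_norm(finals, p), lp_norm(squares, p)

for p in (2, 4, 6):
    print(p, dyadic_example(6, p))
```

For $p = 2$ the two numbers agree by orthogonality of martingale differences; for other $p$ they differ, but according to (0.1) their ratio stays bounded by constants depending only on $p$.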
1. Preliminaries

Let $\mathcal{M}$ be a finite von Neumann algebra with a normalized faithful trace $\tau$. For $1 \le p \le \infty$ let $L^p(\mathcal{M}, \tau)$ or simply $L^p(\mathcal{M})$ denote the associated non-commutative $L^p$-space. Note that if $p = \infty$, $L^p(\mathcal{M})$ is just $\mathcal{M}$ itself with the operator norm; also recall that the norm in $L^p(\mathcal{M})$ ($1 \le p < \infty$) is defined as

$$\|x\|_p = (\tau(|x|^p))^{1/p}, \qquad x \in L^p(\mathcal{M}),$$

where $|x| = (x^* x)^{1/2}$ is the usual absolute value of $x$.

Let $a = (a_n)_{n \ge 0}$ be a finite sequence in $L^p(\mathcal{M})$. Define

$$\|a\|_{L^p(\mathcal{M}; l_C^2)} = \Big\|\Big(\sum_{n \ge 0} |a_n|^2\Big)^{1/2}\Big\|_p, \qquad \|a\|_{L^p(\mathcal{M}; l_R^2)} = \Big\|\Big(\sum_{n \ge 0} |a_n^*|^2\Big)^{1/2}\Big\|_p. \tag{1.1}$$

This gives two norms on the family of all finite sequences in $L^p(\mathcal{M})$. To see that, denoting by $B(l^2)$ the algebra of all bounded operators on $l^2$ with its usual trace $\mathrm{tr}$, let us consider the von Neumann algebra tensor product $\mathcal{M} \otimes B(l^2)$ with the product trace $\tau \otimes \mathrm{tr}$; $\tau \otimes \mathrm{tr}$ is a semifinite faithful trace. The associated non-commutative $L^p$-space is denoted by $L^p(\mathcal{M} \otimes B(l^2))$. Now any finite sequence $a = (a_n)_{n \ge 0}$ in $L^p(\mathcal{M})$ can be regarded as an element in $L^p(\mathcal{M} \otimes B(l^2))$ via the following map:

$$a \mapsto T(a) = \begin{pmatrix} a_0 & 0 & \dots \\ a_1 & 0 & \dots \\ \vdots & \vdots & \end{pmatrix},$$

that is, the matrix of $T(a)$ has all vanishing entries except those in the first column, which are the $a_n$'s. Such a matrix is called a column matrix, and the closure in $L^p(\mathcal{M} \otimes B(l^2))$ of all column matrices is called the column subspace of $L^p(\mathcal{M} \otimes B(l^2))$ (when $p = \infty$, we take the $w^*$-closure of all column matrices). Then

$$\|a\|_{L^p(\mathcal{M}; l_C^2)} = \big\|\, |T(a)|\, \big\|_{L^p(\mathcal{M} \otimes B(l^2))} = \|T(a)\|_{L^p(\mathcal{M} \otimes B(l^2))}.$$

Therefore, $\|\cdot\|_{L^p(\mathcal{M}; l_C^2)}$ defines a norm on the family of all finite sequences of $L^p(\mathcal{M})$. The corresponding completion (for $1 \le p < \infty$) is a Banach space, denoted by $L^p(\mathcal{M}; l_C^2)$. Then $L^p(\mathcal{M}; l_C^2)$ is isometric to the column subspace of $L^p(\mathcal{M} \otimes B(l^2))$. For $p = \infty$ we let $L^\infty(\mathcal{M}; l_C^2)$ be the Banach space of sequences in $L^\infty(\mathcal{M})$ isometric by the above map $T$ to the column subspace of $L^\infty(\mathcal{M} \otimes B(l^2))$. It is easy to check that a sequence $a = (a_n)_{n \ge 0}$ in $L^p(\mathcal{M})$ belongs to $L^p(\mathcal{M}; l_C^2)$ iff

$$\sup_{n \ge 0} \Big\|\Big(\sum_{k=0}^{n} |a_k|^2\Big)^{1/2}\Big\|_p < \infty;$$

if this is the case, $\big(\sum_{k=0}^{\infty} |a_k|^2\big)^{1/2}$ belongs to $L^p(\mathcal{M})$ and $\big(\sum_{k=0}^{n} |a_k|^2\big)^{1/2}$ converges to it in $L^p(\mathcal{M})$ (relative to the $w^*$-topology for $p = \infty$).

Similarly (or passing to adjoints), we may show that $\|\cdot\|_{L^p(\mathcal{M}; l_R^2)}$ is a norm on the family of all finite sequences in $L^p(\mathcal{M})$. As above, it defines a Banach space $L^p(\mathcal{M}; l_R^2)$, which now is isometric to the row subspace of $L^p(\mathcal{M} \otimes B(l^2))$ consisting of matrices whose non-zero entries lie only in the first row. Observe that the column and row subspaces of $L^p(\mathcal{M} \otimes B(l^2))$ are 1-complemented subspaces. Therefore, from the classical duality between $L^p(\mathcal{M} \otimes B(l^2))$ and $L^q(\mathcal{M} \otimes B(l^2))$ ($\tfrac{1}{p} + \tfrac{1}{q} = 1$, $1 \le p < \infty$) we deduce that

$$L^p(\mathcal{M}; l_C^2)^* = L^q(\mathcal{M}; l_C^2) \quad\text{and}\quad L^p(\mathcal{M}; l_R^2)^* = L^q(\mathcal{M}; l_R^2).$$

This complementation also shows that the families $\{L^p(\mathcal{M}; l_C^2)\}$ and $\{L^p(\mathcal{M}; l_R^2)\}$ are two interpolation scales, say, for instance, relative to the complex interpolation method.
Note that, for any finite sequence $(a_n)_{n \ge 0}$ in $L^p(\mathcal{M})$, we have, using tensor product notation and denoting again by $\|\cdot\|_p$ the norm in $L^p(\mathcal{M} \otimes B(l^2))$,

$$\Big\|\Big(\sum a_n^* a_n\Big)^{1/2}\Big\|_p = \Big\|\sum a_n \otimes e_{n0}\Big\|_p \quad\text{and}\quad \Big\|\Big(\sum a_n a_n^*\Big)^{1/2}\Big\|_p = \Big\|\sum a_n \otimes e_{0n}\Big\|_p.$$

The following is an extension of a non-commutative version of Hölder's inequality from [LP], which can be established (perhaps at the cost of an extra factor 2) by arguing as in [LP]. For completeness, we include a direct elementary proof (without any extra factor) based on the three lines lemma.

Lemma 1.1. Let $2 \le p \le \infty$. For any finite sequence $a = (a_n)_{n \ge 0}$ in $L^{2p}(\mathcal{M})$ and any $A \in L^{2p}(\mathcal{M})$ we set $B(a, A) = (a_n A)_{n \ge 0}$. Then

$$\|B(a, A)\|_{L^p(\mathcal{M}; l_R^2)} \le \max\big\{ \|a\|_{L^{2p}(\mathcal{M}; l_C^2)},\ \|a\|_{L^{2p}(\mathcal{M}; l_R^2)} \big\}\, \|A\|_{2p}. \tag{1.2}$$

Proof. By definition, the left side of (1.2) is equal to $\big\|\sum a_n A A^* a_n^*\big\|_{p/2}^{1/2}$ and, on the other hand, by duality, we have

$$\Big\|\sum a_n A A^* a_n^*\Big\|_{p/2} = \sup |\psi(B)| \tag{1.3}$$

with

$$\psi(B) = \tau\Big(\sum a_n A A^* a_n^* B\Big)$$

and where the supremum in (1.3) runs over the set of all $B \ge 0$ in $\mathcal{M}$ such that $\tau(B^r) \le 1$ with $r$ conjugate to $p/2$, or equivalently with $1/r = 1 - 2/p$. We will apply the three lines lemma to the analytic function $F$ defined for $0 \le \Re(z) \le 1$ by

$$F(z) = \tau\Big(\sum a_n (A A^*)^{zp/p'} a_n^* B^{(1-z)r/p'}\Big).$$

Let $\theta = p'/p$ so that $1 - \theta = p'/r$. Note that $0 \le \theta \le 1$ and $F(\theta) = \psi(B)$. Hence, by the three lines lemma, we have

$$|\psi(B)| = |F(\theta)| \le \Big(\sup_{t \in \mathbb{R}} |F(it)|\Big)^{1-\theta} \Big(\sup_{t \in \mathbb{R}} |F(1 + it)|\Big)^{\theta}. \tag{1.4}$$

But, by an easy application of Hölder's inequality, we have

$$\sup_{t \in \mathbb{R}} |F(it)| \le \sup\Big\{ \Big\|\sum a_n U a_n^*\Big\|_p \ \Big|\ U \in \mathcal{M},\ \|U\| \le 1 \Big\}, \tag{1.5}$$

and since $\tau$ is a trace, we also find

$$\sup_{t \in \mathbb{R}} |F(1 + it)| \le \big\|(A A^*)^{p/p'}\big\|_{p'} \sup\Big\{ \Big\|\sum a_n^* U a_n\Big\|_p \ \Big|\ U \in \mathcal{M},\ \|U\| \le 1 \Big\}. \tag{1.6}$$

Note that, if $\|U\| \le 1$, we have $\big\|\sum a_n U a_n^*\big\|_p \le \big\|\sum a_n a_n^*\big\|_p$, and similarly with $a_n^*$ instead of $a_n$. Indeed, $\big\|\sum a_n U a_n^*\big\|_p = \big\|\big(\sum a_n U \otimes e_{0n}\big)\big(\sum a_n^* \otimes e_{n0}\big)\big\|_p$, hence

$$\Big\|\sum a_n U a_n^*\Big\|_p \le \Big\|\sum a_n U \otimes e_{0n}\Big\|_{2p} \Big\|\sum a_n^* \otimes e_{n0}\Big\|_{2p} \le \Big\|\sum a_n a_n^*\Big\|_p.$$

Therefore the inequalities (1.4), (1.5) and (1.6) combined with (1.3) immediately yield the announced result (1.2).
Remark 1.2. The following example shows that the right side of (1.2) cannot be simplified too much: let $\mathcal{M}$ be the algebra of all $N \times N$ complex matrices equipped with its usual trace, let $A = e_{11}$ and let $a_n = e_{n1}$ for $n = 1, \dots, N$. Then $\big(\sum a_n A A^* a_n^*\big)^{1/2} = \sum_{1}^{N} e_{nn} = \big(\sum a_n a_n^*\big)^{1/2}$ and $\sum a_n^* a_n = N e_{11}$, so that $\big\|\big(\sum a_n A A^* a_n^*\big)^{1/2}\big\|_p = N^{1/p}$, $\|A\|_{2p} = 1$ and $\big\|\big(\sum a_n^* a_n\big)^{1/2}\big\|_{2p} = N^{1/2}$. Thus, if $2 \le p < \infty$, for no constant $C$ can the inequality $\|B(a, A)\|_{L^p(\mathcal{M}; l_R^2)} \le C \|a\|_{L^{2p}(\mathcal{M}; l_R^2)} \|A\|_{2p}$ be true. This example also shows that (1.2) fails for $p < 2$. Similarly, the inequality $\|B(a, A)\|_{L^p(\mathcal{M}; l_R^2)} \le C \|a\|_{L^{2p}(\mathcal{M}; l_C^2)} \|A\|_{2p}$ also fails if $2 < p \le \infty$ (take $A = 1$ and $a_n = e_{1n}$).

We now turn to the description of non-commutative martingales and their square functions. Let $(\mathcal{M}_n)_{n \ge 0}$ be an increasing sequence of von Neumann subalgebras of $\mathcal{M}$ such that $\bigcup_{n \ge 0} \mathcal{M}_n$ generates $\mathcal{M}$ (in the $w^*$-topology). $(\mathcal{M}_n)_{n \ge 0}$ is called a filtration of $\mathcal{M}$. The restriction of $\tau$ to $\mathcal{M}_n$ is still denoted by $\tau$. Let $E_n = E(\cdot \mid \mathcal{M}_n)$ be the conditional expectation of $\mathcal{M}$ with respect to $\mathcal{M}_n$. $E_n$ is a norm 1 projection of $L^p(\mathcal{M})$ onto $L^p(\mathcal{M}_n)$ for all $1 \le p \le \infty$, and $E_n(x) \ge 0$ whenever $x \ge 0$. A non-commutative $L^p$-martingale with respect to $(\mathcal{M}_n)_{n \ge 0}$ is a sequence $x = (x_n)_{n \ge 0}$ such that $x_n \in L^p(\mathcal{M}_n)$ and

$$E_m(x_n) = x_m, \qquad \forall\, m = 0, 1, \dots, n.$$

Let $\|x\|_p = \sup_{n \ge 0} \|x_n\|_p$. If $\|x\|_p < \infty$, $x$ is said to be bounded.

Remark 1.3. Let $x_\infty \in L^p(\mathcal{M})$. Set $x_n = E_n(x_\infty)$ for all $n \ge 0$. Then $x = (x_n)$ is a bounded $L^p$-martingale and $\|x\|_p = \|x_\infty\|_p$; moreover, $x_n$ converges to $x_\infty$ in $L^p(\mathcal{M})$ (relative to the $w^*$-topology in the case $p = \infty$). Conversely, if $1 < p < \infty$, every bounded $L^p$-martingale converges in $L^p(\mathcal{M})$, and so is given by some $x_\infty \in L^p(\mathcal{M})$ as previously. Thus one can identify the space of all bounded $L^p$-martingales with $L^p(\mathcal{M})$ itself in the case $1 < p < \infty$.

Let $x$ be a martingale. Its difference sequence, denoted by $dx = (dx_n)_{n \ge 0}$, is defined as (with $x_{-1} = 0$ by convention)

$$dx_n = x_n - x_{n-1}, \qquad n \ge 0.$$

Set

$$S_{C,n}(x) = \Big(\sum_{k=0}^{n} |dx_k|^2\Big)^{1/2} \quad\text{and}\quad S_{R,n}(x) = \Big(\sum_{k=0}^{n} |dx_k^*|^2\Big)^{1/2}.$$

By the preceding discussion $dx$ belongs to $L^p(\mathcal{M}; l_C^2)$ (resp. $L^p(\mathcal{M}; l_R^2)$) iff $(S_{C,n}(x))_{n \ge 0}$ (resp. $(S_{R,n}(x))_{n \ge 0}$) is a bounded sequence in $L^p(\mathcal{M})$; in this case,

$$S_C(x) = \Big(\sum_{k=0}^{\infty} |dx_k|^2\Big)^{1/2} \quad\text{and}\quad S_R(x) = \Big(\sum_{k=0}^{\infty} |dx_k^*|^2\Big)^{1/2}$$

are elements in $L^p(\mathcal{M})$. These are the non-commutative analogues of the usual square functions in the commutative martingale theory. It should be pointed out that one of $S_C(x)$ and $S_R(x)$ may exist as an element of $L^p(\mathcal{M})$ without the other making sense; in other words, the two sequences $S_{C,n}(x)$ and $S_{R,n}(x)$ may not be bounded in $L^p(\mathcal{M})$ at the same time.
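The gap between the column and row norms can be seen concretely on the matrix-unit example of Remark 1.2. The following is a small sketch under stated assumptions: the helper names are illustrative, indices are 0-based, and the norm helper only handles diagonal positive matrices, which is all this example needs.

```python
def unit(i, j, size):
    """Matrix unit e_{ij}: 1 in position (i, j), 0 elsewhere (0-based)."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(size)] for r in range(size)]

def matmul(x, y):
    size = len(x)
    return [[sum(x[r][k] * y[k][c] for k in range(size)) for c in range(size)]
            for r in range(size)]

def adjoint(x):
    """Adjoint of a real matrix, i.e. its transpose."""
    return [list(row) for row in zip(*x)]

def matsum(mats):
    size = len(mats[0])
    return [[sum(m[r][c] for m in mats) for c in range(size)] for r in range(size)]

def norm_of_sqrt(x, q):
    """Schatten q-norm (usual trace) of x^{1/2}, for a diagonal matrix x >= 0."""
    size = len(x)
    assert all(x[r][c] == 0 for r in range(size) for c in range(size) if r != c)
    return sum(x[r][r] ** (q / 2.0) for r in range(size)) ** (1.0 / q)

N, p = 16, 4.0
A = unit(0, 0, N)                                      # A = e_11
a = [unit(n, 0, N) for n in range(N)]                  # a_n = e_{n1}
col_sq = matsum([matmul(adjoint(m), m) for m in a])    # sum a_n^* a_n
row_sq = matsum([matmul(m, adjoint(m)) for m in a])    # sum a_n a_n^*
b = [matmul(m, A) for m in a]                          # B(a, A) = (a_n A)
lhs_sq = matsum([matmul(m, adjoint(m)) for m in b])    # sum a_n A A^* a_n^*

col_norm = norm_of_sqrt(col_sq, 2 * p)   # column norm of a in L^{2p}
row_norm = norm_of_sqrt(row_sq, 2 * p)   # row norm of a in L^{2p}
lhs = norm_of_sqrt(lhs_sq, p)            # row norm of B(a, A) in L^p
print(col_norm, row_norm, lhs)
```

Here the left side of (1.2) exceeds the row norm of $a$ alone (recall $\|A\|_{2p} = 1$), while it stays below the maximum of the column and row norms, in line with Lemma 1.1 and Remark 1.2.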
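Let $1 \le p < \infty$. Define $\mathcal{H}_C^p(\mathcal{M})$ (resp. $\mathcal{H}_R^p(\mathcal{M})$) to be the space of all $L^p$-martingales $x$ with respect to $(\mathcal{M}_n)_{n \ge 0}$ such that $dx \in L^p(\mathcal{M}; l_C^2)$ (resp. $dx \in L^p(\mathcal{M}; l_R^2)$), and set

$$\|x\|_{\mathcal{H}_C^p(\mathcal{M})} = \|dx\|_{L^p(\mathcal{M}; l_C^2)} \quad\text{and}\quad \|x\|_{\mathcal{H}_R^p(\mathcal{M})} = \|dx\|_{L^p(\mathcal{M}; l_R^2)}.$$

Equipped respectively with the previous norms, $\mathcal{H}_C^p(\mathcal{M})$ and $\mathcal{H}_R^p(\mathcal{M})$ are Banach spaces. Note that if $x \in \mathcal{H}_C^p(\mathcal{M})$,

$$\|x\|_{\mathcal{H}_C^p(\mathcal{M})} = \sup_{n \ge 0} \|S_{C,n}(x)\|_p = \|S_C(x)\|_p,$$

and similar equalities hold for $\mathcal{H}_R^p(\mathcal{M})$. Then we define the Hardy spaces of non-commutative martingales as follows: if $1 \le p < 2$,

$$\mathcal{H}^p(\mathcal{M}) = \mathcal{H}_C^p(\mathcal{M}) + \mathcal{H}_R^p(\mathcal{M})$$

equipped with the norm

$$\|x\|_{\mathcal{H}^p(\mathcal{M})} = \inf\big\{ \|y\|_{\mathcal{H}_C^p(\mathcal{M})} + \|z\|_{\mathcal{H}_R^p(\mathcal{M})} : x = y + z,\ y \in \mathcal{H}_C^p(\mathcal{M}),\ z \in \mathcal{H}_R^p(\mathcal{M}) \big\}; \tag{1.7}$$

and if $2 \le p < \infty$,

$$\mathcal{H}^p(\mathcal{M}) = \mathcal{H}_C^p(\mathcal{M}) \cap \mathcal{H}_R^p(\mathcal{M})$$

equipped with the norm

$$\|x\|_{\mathcal{H}^p(\mathcal{M})} = \max\big\{ \|x\|_{\mathcal{H}_C^p(\mathcal{M})},\ \|x\|_{\mathcal{H}_R^p(\mathcal{M})} \big\}. \tag{1.8}$$

The reason that we have defined $\mathcal{H}^p(\mathcal{M})$ differently according to $1 \le p < 2$ or $2 \le p < \infty$ will become clear in the next section, where we will show that $\mathcal{H}^p(\mathcal{M}) = L^p(\mathcal{M})$ with equivalent norms for all $1 < p < \infty$.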
2. The Main Result

In this section $(\mathcal{M}, \tau)$ always denotes a finite von Neumann algebra equipped with a normalized faithful trace, and $(\mathcal{M}_n)_{n \ge 0}$ an increasing filtration of subalgebras of $\mathcal{M}$ which generate $\mathcal{M}$. We keep all notations introduced in the last section. In the sequel $\alpha_p$, $\beta_p$, etc., denote positive constants depending only on $p$. The following is the main result of this paper.

Theorem 2.1. Let $1 < p < \infty$. Let $x = (x_n)_{n \ge 0}$ be an $L^p$-martingale with respect to $(\mathcal{M}_n)_{n \ge 0}$. Then $x$ is bounded in $L^p(\mathcal{M})$ iff $x$ belongs to $\mathcal{H}^p(\mathcal{M})$; moreover, if this is the case,

$$\alpha_p^{-1} \|x\|_{\mathcal{H}^p(\mathcal{M})} \le \|x\|_p \le \beta_p \|x\|_{\mathcal{H}^p(\mathcal{M})}. \tag{$BG_p$}$$

Identifying bounded $L^p$-martingales with their limits, we may reformulate Theorem 2.1 as follows.

Corollary 2.2. Let $1 < p < \infty$. Then $\mathcal{H}^p(\mathcal{M}) = L^p(\mathcal{M})$ with equivalent norms.

Corollary 2.2 explains why we have defined, in (1.7) and (1.8), the space $\mathcal{H}^p(\mathcal{M})$ and its norm differently for $p$ in $[1, 2)$ and $[2, \infty)$. One should note that such a different behavior in the non-commutative case already appears in the non-commutative Khintchine inequalities obtained by F. Lust-Piquard, which we will recall later on.

Before proceeding to the proof of Theorem 2.1, let us briefly explain our strategy. Firstly, we prove the implication "$(BG_p) \Longrightarrow (BG_{2p})$" (this is the key point of the proof). Then by iteration (noting that $(BG_2)$ is trivial) and interpolation we deduce $(BG_p)$ for all $2 \le p < \infty$. Finally, duality yields $(BG_p)$ for $1 < p < 2$. This is a well-known approach to the classical Burkholder-Gundy inequalities in the commutative martingale theory. However, in order to adapt it to the non-commutative setting, one encounters several substantial difficulties. Perhaps the main one is the lack of a reasonable maximal function in the non-commutative case. (Note that all the truncation arguments that appeal to stopping times appear unavailable or inefficient.)

In the course of the proof we will show (and also need) the following result, which is the non-commutative analogue of a classical inequality due to Stein [St]. (See also [B1, Lemma 8] for a similar result in the case of commutative martingales with values in a UMD space.)

Theorem 2.3. Let $1 < p < \infty$. Define the map $Q$ on all finite sequences $a = (a_n)_{n \ge 0}$ in $L^p(\mathcal{M})$ by $Q(a) = (E_n a_n)_{n \ge 0}$. Then

$$\|Q(a)\|_{L^p(\mathcal{M}; l_C^2)} \le \gamma_p \|a\|_{L^p(\mathcal{M}; l_C^2)}, \qquad \|Q(a)\|_{L^p(\mathcal{M}; l_R^2)} \le \gamma_p \|a\|_{L^p(\mathcal{M}; l_R^2)}. \tag{$S_p$}$$

Thus $Q$ extends to a bounded projection on $L^p(\mathcal{M}; l_C^2)$ and $L^p(\mathcal{M}; l_R^2)$; consequently, $\mathcal{H}^p(\mathcal{M})$ is complemented in $L^p(\mathcal{M}; l_C^2) + L^p(\mathcal{M}; l_R^2)$ or $L^p(\mathcal{M}; l_C^2) \cap L^p(\mathcal{M}; l_R^2)$ according to $1 < p \le 2$ or $2 \le p < \infty$.
Remark 2.4. The inequalities $(BG_p)$ imply that all martingale difference sequences are unconditional in $L^p(\mathcal{M})$, i.e. there is a positive constant $\beta_p'$ such that for all finite martingales $x$ in $L^p(\mathcal{M})$ we have

$$\Big\|\sum_n \varepsilon_n\, dx_n\Big\|_p \le \beta_p' \Big\|\sum_n dx_n\Big\|_p, \qquad \forall\, \varepsilon_n = \pm 1. \tag{$BG_p'$}$$

Moreover $\beta_p' \le \alpha_p \beta_p$.

We begin the proof of Theorems 2.1 and 2.3 with some elementary lemmas. The inequality below is well known: indeed, it is a consequence of the UMD property of $L^p(\mathcal{M})$. One can also use the Hilbert transform instead. For the sake of completeness, we will show that it follows from $(BG_p)$. The following proof is similar to an argument presented in [HP].

Lemma 2.5. Let $\varepsilon = (\varepsilon_n)_{n \ge 0}$ be a sequence of independent random variables on some probability space $(\Omega, \mathcal{F}, P)$ such that $P(\varepsilon_n = 1) = P(\varepsilon_n = -1) = 1/2$ for all $n \ge 0$. Let $\varepsilon' = (\varepsilon_n')_{n \ge 0}$ be an independent copy of $\varepsilon$. Let $1 < p < \infty$. Suppose $(BG_p)$. Then for all finite double sequences $(a_{ij})_{i,j \ge 0}$ in $L^p(\mathcal{M})$,

$$\Big(\int \Big\|\sum_{0 \le i \le j} \varepsilon_i \varepsilon_j' a_{ij}\Big\|_p^p\, dP(\varepsilon)\, dP(\varepsilon')\Big)^{1/p} \le \alpha_p \beta_p \Big(\int \Big\|\sum_{i,j \ge 0} \varepsilon_i \varepsilon_j' a_{ij}\Big\|_p^p\, dP(\varepsilon)\, dP(\varepsilon')\Big)^{1/p}.$$
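Proof. Given $n \ge 0$ let $\mathcal{F}_{2n}$ and $\mathcal{F}_{2n+1}$ be the sub-$\sigma$-fields of $\mathcal{F}$ generated respectively by $\{\varepsilon_0, \dots, \varepsilon_n\} \cup \{\varepsilon_0', \dots, \varepsilon_n'\}$ and $\{\varepsilon_0, \dots, \varepsilon_n, \varepsilon_{n+1}\} \cup \{\varepsilon_0', \dots, \varepsilon_n'\}$. Then $(\mathcal{F}_n)_{n \ge 0}$ is an increasing filtration of sub-$\sigma$-fields of $\mathcal{F}$. Let $\mathbb{E}$ denote the expectation viewed as a (tracial!) functional on $L^\infty(\Omega, \mathcal{F}, P)$. We consider the tensor product $(\mathcal{M}, \tau) \otimes (L^\infty(\Omega, \mathcal{F}, P), \mathbb{E})$ and its increasing filtration $\mathcal{M} \otimes L^\infty(\Omega, \mathcal{F}_n, P)$. Hence we have $(BG_p)$ for the corresponding martingales (noting that such martingales are in fact commutative martingales with values in $L^p(\mathcal{M})$). Now given a finite double sequence $(a_{ij})_{i,j \ge 0}$ in $L^p(\mathcal{M})$ we define a martingale $f = (f_n)_{n \ge 0}$ by

$$f_n = \mathrm{id}_{\mathcal{M}} \otimes \mathbb{E}_n \Big(\sum_{i,j \ge 0} \varepsilon_i \varepsilon_j' a_{ij}\Big),$$

where $\mathbb{E}_n$ stands for the conditional expectation of $\mathcal{F}$ with respect to $\mathcal{F}_n$. Then $(BG_p)$ yields

$$\Big\|\sum_{n \ge 0} \varepsilon_n''\, df_n\Big\|_p \le \alpha_p \beta_p \|f\|_p, \qquad \forall\, \varepsilon_n'' = \pm 1,$$

where the norm $\|\cdot\|_p$ is understood as it should be, that is, it is the norm on $L^p(\mathcal{M} \otimes L^\infty(\Omega), \tau \otimes \mathbb{E})$. Consequently,

$$\Big\|\sum_{n \ge 0} df_{2n}\Big\|_p \le \alpha_p \beta_p \|f\|_p.$$

However,

$$\sum_{n \ge 0} df_{2n} = \sum_{0 \le i \le j} \varepsilon_i \varepsilon_j' a_{ij},$$

whence the announced result.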
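In the simplest scalar situation ($\mathcal{M} = \mathbb{C}$, $p = 2$) the content of Lemma 2.5 is transparent: the products $\varepsilon_i \varepsilon_j'$ are orthonormal, so both sides are Frobenius-type norms and the truncation to $i \le j$ is a contraction. The sketch below checks this by enumeration; it is only an illustration of that special case, with function names chosen here.

```python
from itertools import product

def chaos_l2_norm(a, upper_only=False):
    """L^2 norm of sum eps_i eps'_j a[i][j] over independent random signs.

    With upper_only=True the sum is restricted to indices i <= j.
    """
    n = len(a)
    signs = list(product((-1, 1), repeat=n))
    total = 0
    for eps in signs:
        for eps2 in signs:
            s = sum(eps[i] * eps2[j] * a[i][j]
                    for i in range(n) for j in range(n)
                    if i <= j or not upper_only)
            total += s * s
    return (total / len(signs) ** 2) ** 0.5

def frobenius(a, upper_only=False):
    """Square root of the sum of squares of the (possibly truncated) entries."""
    n = len(a)
    return sum(a[i][j] ** 2 for i in range(n) for j in range(n)
               if i <= j or not upper_only) ** 0.5

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(chaos_l2_norm(a), chaos_l2_norm(a, upper_only=True))
```

For $p \ne 2$ or non-commuting coefficients no such orthogonality is available, which is why the lemma relies on $(BG_p)$.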
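Lemma 2.6. Let $1 \le p \le \infty$. Then for all finite sequences $a = (a_n)_{n \ge 0} \subset L^p(\mathcal{M})$ we have

$$\Big\|\Big(\sum_{n \ge 0} |a_n|^4\Big)^{1/2}\Big\|_p \le \Big\|\Big(\sum_{n \ge 0} |a_n|^2\Big)^{1/2}\Big\|_{2p} \Big(\sum_{n \ge 0} \|a_n\|_{2p}^{2p}\Big)^{1/(2p)}.$$

Proof. Let $e_{i,j}$ be the matrix in $B(l^2)$ whose entries all vanish but the one on the position $(i, j)$, which equals 1. Using the tensor product $\mathcal{M} \otimes B(l^2)$ (already considered in Sect. 1) we have

$$\begin{aligned}
\Big\|\Big(\sum_{n \ge 0} |a_n|^4\Big)^{1/2}\Big\|_p &= \Big\|\sum_{n \ge 0} |a_n|^2 \otimes e_{n,0}\Big\|_{L^p(\mathcal{M} \otimes B(l^2))} \\
&= \Big\|\Big(\sum_{n \ge 0} a_n^* \otimes e_{n,n}\Big)\Big(\sum_{n \ge 0} a_n \otimes e_{n,0}\Big)\Big\|_{L^p(\mathcal{M} \otimes B(l^2))} \\
&\le \Big\|\sum_{n \ge 0} a_n^* \otimes e_{n,n}\Big\|_{L^{2p}(\mathcal{M} \otimes B(l^2))} \Big\|\sum_{n \ge 0} a_n \otimes e_{n,0}\Big\|_{L^{2p}(\mathcal{M} \otimes B(l^2))} \\
&= \Big(\sum_{n \ge 0} \|a_n\|_{2p}^{2p}\Big)^{1/(2p)} \Big\|\Big(\sum_{n \ge 0} |a_n|^2\Big)^{1/2}\Big\|_{2p}.
\end{aligned}$$

In particular, for martingale differences we get the following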
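Lemma 2.7. Let $1 \le p \le \infty$. Then for all finite martingales $x = (x_n)_{n \ge 0} \subset L^{2p}(\mathcal{M})$ we have

$$\Big\|\Big(\sum_{n \ge 0} |dx_n|^4\Big)^{1/2}\Big\|_p \le 2^{1 - 1/p} \|x\|_{2p} \|x\|_{\mathcal{H}_C^{2p}(\mathcal{M})}.$$

Proof. By Lemma 2.6, it suffices to show

$$\Big(\sum_{n \ge 0} \|dx_n\|_{2p}^{2p}\Big)^{1/(2p)} \le 2^{1 - 1/p} \|x\|_{2p}.$$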
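This is trivial for $p = 1$ and $p = \infty$. Then the general case follows by interpolation.

Now we are prepared to prove Theorems 2.1 and 2.3. The proof is divided into several steps.

Proof of Theorems 2.1 and 2.3.

Step 1. $(BG_p)$ implies $(S_p)$. Let $1 < p < \infty$. Suppose $(BG_p)$ holds. We will show $(S_p)$ holds as well. To this end, fix a finite sequence $a = (a_k)_{0 \le k \le n} \subset L^p(\mathcal{M})$. We consider the tensor product $(\mathcal{M}, \tau) \otimes (\mathcal{N}, \sigma)$, where $\mathcal{N} = B(l_{n+1}^2)$ and $\sigma = (n + 1)^{-1} \mathrm{tr}$ is the normalized trace on $B(l_{n+1}^2)$. Let $\tilde{E}_k = E_k \otimes \mathrm{id}_{\mathcal{N}}$ denote the conditional expectation of $\mathcal{M} \otimes \mathcal{N}$ with respect to $\mathcal{M}_k \otimes \mathcal{N}$. Then we have $(BG_p)$ for all martingales relative to the filtration $(\mathcal{M}_k \otimes \mathcal{N})_{k \ge 0}$. Now set

$$A_k = (n + 1)^{1/p}\, a_k \otimes e_{k,0}, \qquad 0 \le k \le n.$$

Let $\varepsilon = (\varepsilon_n)_{n \ge 0}$ and $\varepsilon' = (\varepsilon_n')_{n \ge 0}$ be the sequences in Lemma 2.5. Then, with $\|\cdot\|_p$ denoting here the norm in the space $L^p(\mathcal{M} \otimes \mathcal{N})$, we have

$$\begin{aligned}
\|Q(a)\|_{L^p(\mathcal{M}; l_C^2)} &= \Big\|\sum_{k=0}^{n} \tilde{E}_k (\varepsilon_k A_k)\Big\|_p = \Big\|\sum_{k=0}^{n} \sum_{j=0}^{k} (\tilde{E}_j - \tilde{E}_{j-1})(\varepsilon_k A_k)\Big\|_p \\
&= \Big\|\sum_{j=0}^{n} (\tilde{E}_j - \tilde{E}_{j-1}) \sum_{k=j}^{n} \varepsilon_k A_k\Big\|_p,
\end{aligned}$$

hence by $(BG_p)$ (cf. Remark 2.4)

$$\le \alpha_p \beta_p \Big(\int \Big\|\sum_{j=0}^{n} \varepsilon_j' (\tilde{E}_j - \tilde{E}_{j-1}) \sum_{k=j}^{n} \varepsilon_k A_k\Big\|_p^p\, dP(\varepsilon)\, dP(\varepsilon')\Big)^{1/p},$$

so by Lemma 2.5,

$$\le (\alpha_p \beta_p)^2 \Big(\int \Big\|\sum_{j=0}^{n} \varepsilon_j' (\tilde{E}_j - \tilde{E}_{j-1}) \sum_{k=0}^{n} \varepsilon_k A_k\Big\|_p^p\, dP(\varepsilon)\, dP(\varepsilon')\Big)^{1/p}.$$

On the other hand, applying $(BG_p)$ once again, this is

$$\begin{aligned}
&\le (\alpha_p \beta_p)^3 \Big(\int \Big\|\sum_{j=0}^{n} (\tilde{E}_j - \tilde{E}_{j-1}) \sum_{k=0}^{n} \varepsilon_k A_k\Big\|_p^p\, dP(\varepsilon)\Big)^{1/p} \\
&= (\alpha_p \beta_p)^3 \Big(\int \Big\|\sum_{k=0}^{n} \varepsilon_k A_k\Big\|_p^p\, dP(\varepsilon)\Big)^{1/p} = (\alpha_p \beta_p)^3 \|a\|_{L^p(\mathcal{M}; l_C^2)}.
\end{aligned}$$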
Thus, we conclude

$$\|Q(a)\|_{L^p(\mathcal{M}; l_C^2)} \le (\alpha_p \beta_p)^3 \|a\|_{L^p(\mathcal{M}; l_C^2)}.$$

Hence $Q$ is bounded on $L^p(\mathcal{M}; l_C^2)$. Passing to adjoints yields the boundedness of $Q$ on $L^p(\mathcal{M}; l_R^2)$.

Step 2. $(BG_p)$ implies $(BG_{2p})$. Let $1 < p < \infty$ and suppose $(BG_p)$. Let $x = (x_n)_{n \ge 0}$ be a martingale in $L^{2p}(\mathcal{M})$. We must show $x$ satisfies $(BG_{2p})$. Clearly, we can assume $x$ finite, that is, there exists $n \in \mathbb{N}$ such that $x_k = x_n$ for all $k \ge n$. For simplicity, set $d_k = dx_k$ (so $d_k = 0$ for all $k > n$). Then we write the classical "Doob identity":

$$|x_n|^2 = x_n^* x_n = S_C(x)^2 + \sum_{k \ge 0} d_k^* x_{k-1} + \sum_{k \ge 0} x_{k-1}^* d_k. \tag{2.1}$$

Hence

$$\begin{aligned}
\|x\|_{2p}^2 = \big\||x_n|^2\big\|_p &\le \|S_C(x)^2\|_p + \Big\|\sum_{k \ge 0} d_k^* x_{k-1}\Big\|_p + \Big\|\sum_{k \ge 0} x_{k-1}^* d_k\Big\|_p \\
&= \|x\|_{\mathcal{H}_C^{2p}(\mathcal{M})}^2 + 2 \Big\|\sum_{k \ge 0} d_k^* x_{k-1}\Big\|_p.
\end{aligned} \tag{2.2}$$

Observe that $(d_k^* x_{k-1})_{k \ge 0}$ is a martingale difference sequence. Letting $y = (y_k)$ be the corresponding martingale, then by $(BG_p)$, we get

$$\|y\|_p \le \beta_p \|y\|_{\mathcal{H}^p(\mathcal{M})}. \tag{2.3}$$

Now note that

$$dy_k = d_k^* x_k - d_k^* d_k = E_k(d_k^* x_n) - |d_k|^2, \qquad 0 \le k \le n. \tag{2.4}$$
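Let us first consider the case $1 < p < 2$. Then $\|y\|_{\mathcal{H}^p(\mathcal{M})} \le \|y\|_{\mathcal{H}_C^p(\mathcal{M})}$, so by (2.3), (2.4), Lemma 2.7 and $(S_p)$ (which, by Step 1, holds under $(BG_p)$), we get
\begin{align*}
\|y\|_{\mathcal{H}^p(\mathcal{M})}
 &\le \Big\| \Big( \sum_{k=0}^n |d_k|^4 \Big)^{1/2} \Big\|_p + \Big\| \Big( \sum_{k=0}^n |\mathcal{E}_k(d_k^* x_n)|^2 \Big)^{1/2} \Big\|_p \\
 &\le 2^{1-1/p} \|x\|_{2p} \|x\|_{\mathcal{H}_C^{2p}(\mathcal{M})} + \gamma_p \Big\| \Big( \sum_{k=0}^n x_n^* d_k d_k^* x_n \Big)^{1/2} \Big\|_p \\
 &\le 2^{1-1/p} \|x\|_{2p} \|x\|_{\mathcal{H}_C^{2p}(\mathcal{M})} + \gamma_p \|x\|_{2p} \|x\|_{\mathcal{H}_R^{2p}(\mathcal{M})} \\
 &\le (2^{1-1/p} + \gamma_p) \|x\|_{2p} \|x\|_{\mathcal{H}^{2p}(\mathcal{M})} .
\end{align*}
If $2 \le p < \infty$, again by (2.3) and (2.4),
\[ \|y\|_{\mathcal{H}^p(\mathcal{M})} \le \Big\| \Big( \sum_{k=0}^n |d_k|^4 \Big)^{1/2} \Big\|_p + \sup \Big\{ \Big\| \Big( \sum_{k=0}^n |\mathcal{E}_k(d_k^* x_n)|^2 \Big)^{1/2} \Big\|_p , \ \Big\| \Big( \sum_{k=0}^n |\mathcal{E}_k(d_k^* x_n)^*|^2 \Big)^{1/2} \Big\|_p \Big\} . \tag{2.5} \]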
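The first two terms on the right are dealt with as before; while by $(S_p)$ and Lemma 1.1, the third term is majorized by $\gamma_p \|x\|_{2p} \|x\|_{\mathcal{H}^{2p}(\mathcal{M})}$. Thus in the case $2 \le p < \infty$, we have
\[ \|y\|_{\mathcal{H}^p(\mathcal{M})} \le (2^{1-1/p} + \gamma_p) \|x\|_{2p} \|x\|_{\mathcal{H}^{2p}(\mathcal{M})} . \tag{2.6} \]
Putting together (2.2), (2.3), (2.5) and (2.6), we obtain finally
\begin{align*}
\|x\|_{2p}^2 &\le \|x\|^2_{\mathcal{H}_C^{2p}(\mathcal{M})} + 2\beta_p (2^{1-1/p} + \gamma_p) \|x\|_{2p} \|x\|_{\mathcal{H}^{2p}(\mathcal{M})} \\
 &\le \|x\|^2_{\mathcal{H}^{2p}(\mathcal{M})} + \delta_p \|x\|_{2p} \|x\|_{\mathcal{H}^{2p}(\mathcal{M})} ,
\end{align*}
where $\delta_p = 2\beta_p (2^{1-1/p} + \gamma_p)$. Dividing by $\|x\|^2_{\mathcal{H}^{2p}(\mathcal{M})}$ and solving the resulting quadratic inequality $t^2 \le 1 + \delta_p t$ for $t = \|x\|_{2p} / \|x\|_{\mathcal{H}^{2p}(\mathcal{M})}$, it follows that
\[ \|x\|_{2p} \le \beta_{2p} \|x\|_{\mathcal{H}^{2p}(\mathcal{M})} \quad \text{with} \quad \beta_{2p} = \tfrac{1}{2} \Big( \delta_p + \sqrt{4 + \delta_p^2} \Big) . \]
Thus we have proved the second inequality of $(BG_{2p})$. The first one can be obtained in a similar way. Indeed, again by (2.1) and the previous argument, we get
\[ \|x\|_{2p}^2 \ge \|x\|^2_{\mathcal{H}_C^{2p}(\mathcal{M})} - \delta_p \|x\|_{2p} \|x\|_{\mathcal{H}^{2p}(\mathcal{M})} . \]
Replacing $x_n$ by $x_n^*$ in (2.1), we also have
\[ \|x\|_{2p}^2 \ge \|x\|^2_{\mathcal{H}_R^{2p}(\mathcal{M})} - \delta_p \|x\|_{2p} \|x\|_{\mathcal{H}^{2p}(\mathcal{M})} . \]
Therefore,
\[ \|x\|^2_{\mathcal{H}^{2p}(\mathcal{M})} \le \|x\|_{2p}^2 + \delta_p \|x\|_{2p} \|x\|_{\mathcal{H}^{2p}(\mathcal{M})} , \]
which gives the first inequality of $(BG_{2p})$.

Step 3. $(BG_p)$ for $2 \le p < \infty$ and $(S_p)$ for $1 < p < \infty$. Evidently, $(BG_2)$ holds with $\alpha_2 = \beta_2 = 1$. Then by Step 2 and iteration we get $(BG_{2^n})$ for all positive integers $n$, and so also $(S_{2^n})$ by virtue of Step 1. Now we use interpolation to cover all values of $p$ in $[2, \infty)$. This is easy for $(S_p)$ and the first inequality of $(BG_p)$. Let us consider, for instance, the first inequality of $(BG_p)$. By what we already know about $(BG_{2^n})$, the linear map $x \mapsto dx$ is bounded from $L^{2^n}(\mathcal{M})$ into $L^{2^n}(\mathcal{M}; l_C^2)$ for every positive integer $n$. Then by complex interpolation, it is bounded from $L^p(\mathcal{M})$ into $L^p(\mathcal{M}; l_C^2)$ for $2^n < p < 2^{n+1}$, and so for all $p \in [2, \infty)$. Hence
\[ \|dx\|_{L^p(\mathcal{M}; l_C^2)} \le \alpha_p \|x\|_p . \]
Passing to adjoints, we get the same inequality with $L^p(\mathcal{M}; l_R^2)$ instead of $L^p(\mathcal{M}; l_C^2)$. Thus the first inequality of $(BG_p)$ holds for all $2 \le p < \infty$. A similar argument applies to $(S_p)$ for all $2 \le p < \infty$. However, the projection $Q$ in Theorem 2.3 is self-adjoint; hence, we get $(S_p)$ for all $1 < p < \infty$, which completes the proof of Theorem 2.3.

Concerning the second inequality of $(BG_p)$, we observe that by duality and the first inequality of $(BG_p)$ just proved in $[2, \infty)$, we deduce that for every $1 < p \le 2$ and any martingale $x$ in $L^p(\mathcal{M})$ we have
\[ \|x\|_p \le \beta_p \inf \big\{ \|x\|_{\mathcal{H}_C^p(\mathcal{M})} , \ \|x\|_{\mathcal{H}_R^p(\mathcal{M})} \big\} . \]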
(Here if $1/p + 1/q = 1$ (so $2 \le q < \infty$), $\beta_p = \alpha_q$ with $\alpha_q$ being the constant in the first inequality of $(BG_q)$; see the next step for more on this.) Examining the proof in Step 2, we see that the implication "$(BG_p) \Longrightarrow (BG_{2p})$" still holds now with the help of $(S_p)$ and the above inequality for all $1 < p \le 2$. It follows that the second inequality of $(BG_p)$ holds for all $2 \le p \le 4$. Then Step 2 and iteration yield the second inequality of $(BG_p)$ for all $2 \le p < \infty$.

Step 4. $(BG_p)$ for $1 < p < 2$. Dualizing $(BG_p)$ in the case $2 < p < \infty$, we obtain that if $1 < p < 2$, then for all martingales $x$ in $L^p(\mathcal{M})$,
\[ \|x\|_p \approx \|dx\|_{L^p(\mathcal{M}; l_C^2) + L^p(\mathcal{M}; l_R^2)} . \]
On the other hand, by Theorem 2.3 (already proved), $\mathcal{H}^p(\mathcal{M})$ is complemented in $L^p(\mathcal{M}; l_C^2) + L^p(\mathcal{M}; l_R^2)$, so the norm of $dx$ in the latter space is equivalent to the norm of $x$ in the former. Therefore, the proof of Theorems 2.1 and 2.3 is now complete.

Remarks. (i) In Step 3 above, for the proof of the second inequality of $(BG_p)$ we have avoided interpolating the intersection spaces $L^p(\mathcal{M}; l_C^2) \cap L^p(\mathcal{M}; l_R^2)$ for $p \ge 2$, although it is shown in [P] that they form an interpolation scale for the complex method.

(ii) The constants $\alpha_p$ and $\beta_p$ given by the above proof are not good. In fact, they grow exponentially as $p \to \infty$ (see also Remark 3.2 below).

The inequalities $(BG_p)$ are intimately related to the non-commutative Khintchine inequalities, which played an important rôle in our first approach to $(BG_p)$ for the examples considered in the next section. Let us recall them here for the convenience of the reader. Let $\varepsilon = (\varepsilon_n)_{n\ge 0}$ be a sequence of independent random variables on some probability space $(\Omega, P)$ such that $P(\varepsilon_n = 1) = P(\varepsilon_n = -1) = 1/2$ for all $n \ge 0$.

Theorem 2.8. (Non-commutative Khintchine inequalities, [LP, LPP]). Let $1 \le p < \infty$. Let $a = (a_n)_{n\ge 0}$ be a finite sequence in $L^p(\mathcal{M})$.

(i) If $2 \le p < \infty$,
\[ \|a\|_{L^p(\mathcal{M}; l_C^2) \cap L^p(\mathcal{M}; l_R^2)} \le \Big( \int \Big\| \sum_{n\ge 0} \varepsilon_n a_n \Big\|_p^2 \, dP(\varepsilon) \Big)^{1/2} \le \delta_p \|a\|_{L^p(\mathcal{M}; l_C^2) \cap L^p(\mathcal{M}; l_R^2)} . \]

(ii) If $1 \le p < 2$,
\[ \alpha \|a\|_{L^p(\mathcal{M}; l_C^2) + L^p(\mathcal{M}; l_R^2)} \le \Big( \int \Big\| \sum_{n\ge 0} \varepsilon_n a_n \Big\|_p^2 \, dP(\varepsilon) \Big)^{1/2} \le \|a\|_{L^p(\mathcal{M}; l_C^2) + L^p(\mathcal{M}; l_R^2)} , \]
where $\alpha > 0$ is an absolute constant.

This result was first proved in [LP] for $1 < p < \infty$ for the Schatten classes. The general statement as above (including $p = 1$) is contained in [LPP]. Let us also mention that, as observed in [P] (independently observed by Marius Junge), a combination of the main result in [LPP] with the type 2 estimate from [TJ] yields that $\delta_p$ is of order $\sqrt{p}$ (the
best possible) as $p \to \infty$. One should emphasize that for $1 < p < \infty$ the above non-commutative Khintchine inequalities all follow from $(BG_p)$ (with some worse constants, of course). In that special case however, our proof essentially reduces to the original one in [LP].

Remark 2.9. (i) Note that, by Theorem 2.8, the unconditionality of martingale differences expressed in $(BG'_p)$ actually implies (hence is equivalent to) $(BG_p)$. Evidently, $(BG_p)$ or $(BG'_p)$ is no longer valid for $p = 1$. However, in this case $p = 1$, the second inequality of $(BG_p)$ remains true (see the corollary in the appendix). Consequently, by the above non-commutative Khintchine inequalities ($p = 1$), we deduce the following substitute for $(BG'_1)$: for any finite martingale $x$ in $L^1(\mathcal{M})$
\[ \sup_{\varepsilon_n = \pm 1} \Big\| \sum_n \varepsilon_n\, dx_n \Big\|_1 \approx \|dx\|_{L^1(\mathcal{M}; l_C^2) + L^1(\mathcal{M}; l_R^2)} . \]

(ii) Clearly, $(BG'_p)$ implies the well-known fact (cf. [B1, BGM]) that $L^p(\mathcal{M})$ is a UMD space for all $1 < p < \infty$ (take $q = p$ in (0.3)). In particular if $f = (f_n)_{n\ge 0}$ is a finite commutative martingale defined on some probability space with values in $L^p(\mathcal{M})$, then
\[ \Big( \int \Big\| \sum_{n\ge 0} \varepsilon_n \big( f_n(\omega) - f_{n-1}(\omega) \big) \Big\|_p^p \, d\omega \Big)^{1/p} \le \beta'_p \Big( \int \sup_{n\ge 0} \|f_n(\omega)\|_p^p \, d\omega \Big)^{1/p} , \qquad \forall\, \varepsilon_n = \pm 1. \tag{2.7} \]
3. Examples

In this section, we give some examples for which the corresponding inequalities $(BG_p)$ can be proved by a different method from the one given in Sect. 2. The key idea of this alternate method is to transfer a non-commutative martingale in $L^p(\mathcal{M})$ to a commutative martingale with values in $L^p(\mathcal{M})$. This then enables us to use the unconditionality of commutative martingale differences with values in $L^p(\mathcal{M})$. (Recall that $L^p(\mathcal{M})$ is a UMD space; see Remark 2.9 in Sect. 2.) Although it does not seem suitable in the general case, this transference approach might be of interest in other situations. This explains why we will give a sketch of this second method in the tensor product case below. Let us also point out that we first obtained the non-commutative martingale inequalities for these examples, before proving the general Theorem 2.1 (see [PX]).

I. Tensor products. Let $(\mathcal{A}_n)$ be a sequence of hyperfinite von Neumann algebras, $\mathcal{A}_n$ being equipped with a normalized faithful trace $\sigma_n$. Let
\[ (\mathcal{M}_n, \tau_n) = \bigotimes_{k=0}^n (\mathcal{A}_k, \sigma_k) \quad \text{and} \quad (\mathcal{M}, \tau) = \bigotimes_{k=0}^\infty (\mathcal{A}_k, \sigma_k) \]
be the tensor products in the sense of von Neumann algebras. Thus we have an increasing filtration $(\mathcal{M}_n)_{n\ge 0}$ of subalgebras of $\mathcal{M}$ which allows us to consider martingales. Let us reformulate Theorem 2.1 in this case as follows.

Theorem 3.1. Let $1 < p < \infty$ and $(\mathcal{M}_n)_{n\ge 0}$ be as above. Then $L^p(\mathcal{M}) = \mathcal{H}^p(\mathcal{M})$ with equivalent norms.
Remark. A special case of Theorem 3.1 is the one where all $\mathcal{A}_n$'s are equal to the algebra of all $2 \times 2$ matrices with its normalized trace. Then $\mathcal{M}$ is the hyperfinite II$_1$ factor, and $(\mathcal{M}_n)_{n\ge 0}$ is its natural filtration.

Sketch of the transference proof of Theorem 3.1. It is not hard to reduce Theorem 3.1 to the case where all $\mathcal{A}_n$'s are finite dimensional and simple. Thus we will consider this special case only. Then let $\Omega_n$ be the unitary group of $\mathcal{A}_n$, equipped with its normalized Haar measure $\mu_n$ (noting that since $\dim \mathcal{A}_n < \infty$, $\Omega_n$ is compact). Set
\[ (\Omega, \mu) = \prod_{n\ge 0} (\Omega_n, \mu_n) . \]
For $\omega = (\omega_0, \omega_1, \cdots) \in \Omega$, we denote by $\pi_{\omega_n}$ the automorphism of $\mathcal{A}_n$ induced by $\omega_n$, i.e.
\[ \pi_{\omega_n}(a) = \omega_n^* a\, \omega_n , \qquad \forall\, a \in \mathcal{A}_n , \]
and we let
\[ \pi_\omega = \bigotimes_{n\ge 0} \pi_{\omega_n} . \]
Then $\pi_\omega$ is an automorphism of $\mathcal{M}$, and extends to an isometry on $L^p(\mathcal{M})$ for all $1 \le p \le \infty$. Now for $a \in L^p(\mathcal{M})$ we define
\[ f(a, \omega) = \pi_\omega(a) , \qquad \forall\, \omega \in \Omega . \]
Then $f(a, \omega)$ is strongly measurable as a function from $\Omega$ to $L^p(\mathcal{M})$ for every $1 \le p < \infty$. Let $\Sigma_n$ be the $\sigma$-field on $\Omega$ generated by $(\omega_k)_{k=0}^n$, and $\mathbb{E}_n = \mathbb{E}(\,\cdot\,|\Sigma_n)$ the corresponding conditional expectation. The key point here is the following observation:
\[ \mathbb{E}_k f(a, \omega) = f(\mathcal{E}_k(a), \omega) \quad \text{a.e. on } \Omega , \qquad \forall\, k \ge 0, \ \forall\, a \in L^1(\mathcal{M}) . \]
(Roughly speaking, the automorphism $\pi_\omega$ intertwines the two conditional expectations $\mathbb{E}_k$ and $\mathcal{E}_k$.) Then let $x$ be a finite $L^p$-martingale (so there is an $n$ such that $x_k = x_n$ for all $k \ge n$). Let $f(\omega) = f(x_n, \omega)$ be the function defined above. Then $(\mathbb{E}_k f)_{k\ge 0}$ is a commutative martingale on $\Omega$ with values in $L^p(\mathcal{M})$, and by the above observation,
\[ \mathbb{E}_k f - \mathbb{E}_{k-1} f = \pi_\omega(dx_k) , \quad \text{a.e.} \]
Therefore, since $L^p(\mathcal{M})$ is a UMD space (see [B1, B2, BGM]), with constant $C_p = C_p(L^p(\mathcal{M}))$ in (0.3), we have
\[ \int \Big\| \sum_{k\ge 0} \varepsilon_k \pi_\omega(dx_k) \Big\|_p^p \, d\omega \le (C_p)^p \int \|\pi_\omega(x_n)\|_p^p \, d\omega , \qquad \forall\, \varepsilon_k = \pm 1 . \]
But $\pi_\omega$ is an isometry on $L^p(\mathcal{M})$; hence
\[ \Big\| \sum_{k\ge 0} \varepsilon_k\, dx_k \Big\|_p \le C_p \|x\|_p , \qquad \forall\, \varepsilon_k = \pm 1 . \]
Thus we obtain the unconditionality of martingale differences in $L^p(\mathcal{M})$, i.e. $(BG'_p)$ (defined at the end of Sect. 2) with $\beta'_p \le C_p$, which, together with the non-commutative Khintchine inequalities, easily implies $(BG_p)$.
Remark 3.2. In this tensor product case (and also in the two following ones) the above transference proof gives better constants $\alpha_p$ and $\beta_p$ in $(BG_p)$ than the general proof in Sect. 2. Indeed, by the argument in [B1-2] or [BGM], one can show that the constant $C_p$ is $O(p^2)$ (resp. $O(1/(p-1)^2)$) as $p \to \infty$ (resp. $p \to 1$). Note that, when $p \ge 2$, the preceding proof yields (in the tensor product case) $\alpha_p \le C_p$ and $\beta_p \le C_p \delta_p$, and when $1 < p \le 2$, $\alpha_p \le \alpha^{-1} C_p \gamma_p$ and $\beta_p \le C_p$. Actually, a more careful use of duality yields that for $p \ge 2$, we still have $\beta_p \le C_p$. Therefore, the preceding sketch of proof yields the following estimates for $\alpha_p$ and $\beta_p$ in $(BG_p)$: $\alpha_p$ and $\beta_p$ are both of order $O(p^2)$ as $p \to \infty$, and respectively of order $O((p-1)^{-6})$ and $O((p-1)^{-2})$ as $p \to 1$.

II. Clifford algebras. Our second example concerns Clifford algebras. We take this opportunity to give a brief introduction to von Neumann Clifford algebras and to prepare ourselves for the next section. The reader is referred to [PR, BR, S and C] for more information on this subject.

Let $H$ be a complex Hilbert space with a conjugation $J$. Let $\mathcal{C}(H, J)$ or simply $\mathcal{C}(H)$ denote the von Neumann Clifford algebra associated to the $J$-real subspace of $H$. $\mathcal{C}(H)$ is a finite von Neumann algebra. Let us briefly describe $\mathcal{C}(H)$ via its Fock representation. Denote by $\Lambda^n(H)$ the $n$-fold antisymmetric product of $H$, equipped with the canonical scalar product:
\[ \langle u_1 \wedge \cdots \wedge u_n , \ v_1 \wedge \cdots \wedge v_n \rangle = \det\big( \langle u_k, v_j \rangle \big)_{1\le k,j\le n} . \]
$\Lambda^0(H) = \mathbb{C}\mathbf{1}$, where $\mathbf{1}$ is the vacuum vector. The antisymmetric Fock space $\Lambda(H)$ is the direct sum of the $\Lambda^n(H)$:
\[ \Lambda(H) = \bigoplus_{n\ge 0} \Lambda^n(H) . \]
Given any $v \in H$ the associated creator $c(v)$ on $\Lambda(H)$ is linearly defined over antisymmetric tensors by
\[ c(v)\, u_1 \wedge \cdots \wedge u_n = v \wedge u_1 \wedge \cdots \wedge u_n . \]
$c(v)$ is bounded on $\Lambda(H)$ and $\|c(v)\| = \|v\|$. Its adjoint $c(v)^*$ is the annihilator $a(v)$ associated to $v$. The creators and annihilators satisfy the following canonical anticommutation relations (CAR):
\[ \{c(u), a(v)\} = \langle u, v \rangle , \qquad \{c(u), c(v)\} = 0 , \qquad \forall\, u, v \in H , \]
where $\{S, T\} = ST + TS$ stands for the anticommutator of $S$ and $T$. The Fermion field $\Phi$ is then defined by
\[ \Phi(v) = c(v) + a(Jv) , \qquad \forall\, v \in H . \]
$\Phi$ is a linear map from $H$ to $B(\Lambda(H))$. Moreover
\[ \{\Phi(u), \Phi(v)\} = 2 \langle u, Jv \rangle , \qquad \forall\, u, v \in H . \]
Therefore, if $u$ and $Jv$ are orthogonal, $\Phi(u)$ and $\Phi(v)$ anticommute. Notice also that $\Phi(v)$ is hermitian for any $J$-real vector $v$ (i.e., $Jv = v$). Then the von Neumann Clifford algebra $\mathcal{C}(H)$ is exactly the subalgebra of $B(\Lambda(H))$ generated by $\{\Phi(v) : v \in H\}$. Observe that if $\{e_i : i \in I\}$ is a $J$-real orthonormal basis of $H$, $\{\Phi(e_i) : i \in I\}$ is a family of anticommuting hermitian unitaries, and it generates $\mathcal{C}(H)$. The vector state on $B(\Lambda(H))$, given by the vacuum $\mathbf{1}$, induces a trace $\tau$ on $\mathcal{C}(H)$: $\tau(x) = \langle x(\mathbf{1}), \mathbf{1} \rangle$ for any $x \in \mathcal{C}(H)$. Let $L^p(\mathcal{C}(H))$ denote the associated non-commutative $L^p$-space.
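The anticommutation relation for the Fermion field follows from the CAR in a few lines. The sketch below makes two assumptions that are standard but not stated explicitly above: the conjugation $J$ is an anti-unitary involution, so that $\langle Jx, Jy \rangle = \langle y, x \rangle$ and $J^2 = \mathrm{id}$, and $\{a(u), a(v)\} = 0$, which is the adjoint of $\{c(v), c(u)\} = 0$.

```latex
\begin{align*}
\{\Phi(u), \Phi(v)\}
  &= \{c(u), c(v)\} + \{c(u), a(Jv)\} + \{a(Ju), c(v)\} + \{a(Ju), a(Jv)\} \\
  &= 0 + \langle u, Jv \rangle + \langle v, Ju \rangle + 0 , \\
\langle v, Ju \rangle
  &= \langle J(Jv), Ju \rangle = \langle u, Jv \rangle
  \quad\Longrightarrow\quad
  \{\Phi(u), \Phi(v)\} = 2 \langle u, Jv \rangle , \\
\Phi(v)^*
  &= c(v)^* + a(Jv)^* = a(v) + c(Jv) = \Phi(Jv) , \\
\Phi(e)^2
  &= \tfrac{1}{2} \{\Phi(e), \Phi(e)\} = \langle e, Je \rangle = \|e\|^2
  \qquad \text{if } Je = e .
\end{align*}
```

In particular, for a $J$-real orthonormal family $(e_i)$ one gets $\{\Phi(e_i), \Phi(e_j)\} = 2\delta_{ij}$ and $\Phi(e_i)^* = \Phi(e_i)$, which is the statement that the $\Phi(e_i)$ are anticommuting hermitian unitaries.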
If $K$ is a $J$-invariant closed subspace of $H$, $\mathcal{C}(K)$ is naturally identified with a subalgebra of $\mathcal{C}(H)$. Now let $(H_n)_{n\ge 0}$ be an increasing sequence of $J$-invariant closed subspaces of $H$ such that $\overline{\bigcup_{n\ge 0} H_n} = H$. Then the corresponding von Neumann Clifford algebras $(\mathcal{C}(H_n))_{n\ge 0}$ form a filtration of von Neumann subalgebras of $\mathcal{C}(H)$. We will call a non-commutative martingale with respect to $(\mathcal{C}(H_n))_{n\ge 0}$ a Clifford martingale. Therefore, by Theorem 2.1, we have inequalities $(BG_p)$ for Clifford martingales. In fact, this Clifford martingale case can be easily reduced to Theorem 3.1 (the tensor product case) with the help of the classical Jordan-Wigner transformation.

Let us consider only a special case for Clifford martingales, where $\dim H_n = n$ for all $n \ge 0$. Fix a $J$-real orthonormal basis $(e_n)_{n\ge 1}$ of $H$ such that $e_n \in H_n \ominus H_{n-1}$ for all $n \ge 1$. Then $\mathcal{C}_n = \mathcal{C}(H_n)$ is the $C^*$-algebra generated by $\{\Phi(e_k)\}_{k=1}^n$ and of dimension $2^n$. For convenience we set $e_0 = 1$ and $e_{-1} = 0$. Let $x = (x_n)_{n\ge 0}$ be a Clifford $L^p$-martingale. Then $dx_n$ can be written as
\[ dx_n = \varphi_n(e_1, \ldots, e_{n-1})\, \Phi(e_n) , \]
where $\varphi_n = \varphi_n(e_1, \ldots, e_{n-1})$ belongs to $L^p(\mathcal{C}_{n-1})$. Let $\varphi = (\varphi_n)_{n\ge 0}$ and $\mathcal{C} = \mathcal{C}(H)$.

Proposition 3.3. Let $1 \le p \le \infty$ and $x = (x_n)_{n\ge 0}$ be a bounded Clifford $L^p$-martingale as above. Then $\|dx\|_{L^p(\mathcal{C}; l_R^2)} = \|\varphi\|_{L^p(\mathcal{C}; l_R^2)}$ and
\[ \tfrac{1}{2} \|\varphi\|_{L^p(\mathcal{C}; l_C^2)} \le \|dx\|_{L^p(\mathcal{C}; l_C^2)} \le 2 \|\varphi\|_{L^p(\mathcal{C}; l_C^2)} . \]

Proof. Since $\Phi(e_n)$ is unitary (and hermitian), we have $\|dx\|_{L^p(\mathcal{C}; l_R^2)} = \|\varphi\|_{L^p(\mathcal{C}; l_R^2)}$. To prove the inequalities on $L^p(\mathcal{C}; l_C^2)$ we need the grading automorphism (or parity) $G$ of $\mathcal{C}$: $G$ is uniquely determined by
\[ G\big( \Phi(v_1) \cdots \Phi(v_n) \big) = \Phi(-v_1) \cdots \Phi(-v_n) , \qquad \forall\, v_k \in H, \ 1 \le k \le n . \]
This means that $G$ is the automorphism induced by minus the identity of $H$. Recall that $a \in L^p(\mathcal{C})$ is called even (resp. odd) if $G(a) = a$ (resp. $G(a) = -a$). We have the decomposition $L^p(\mathcal{C}) = L^p(\mathcal{C}^+) \oplus L^p(\mathcal{C}^-)$ into even and odd parts; more precisely for any $a \in L^p(\mathcal{C})$
\[ a = \frac{a + G(a)}{2} + \frac{a - G(a)}{2} = a^+ + a^- . \]
Since $G$ is isometric on $L^p(\mathcal{C})$,
\[ \max(\|a^+\|_p , \|a^-\|_p) \le \|a\|_p \le \|a^+\|_p + \|a^-\|_p . \]
Now for $x = (x_n)_{n\ge 0}$ as in the proposition we have $G(dx_n) = -G(\varphi_n)\Phi(e_n)$; so $(dx_n)^+ = \varphi_n^- \Phi(e_n)$. Notice that $\varphi_n^- \in L^p(\mathcal{C}_{n-1}^-)$. Then by the anticommutation of $\Phi(e_n)$ with $\Phi(e_k)$ ($1 \le k \le n-1$) we get $\varphi_n^- \Phi(e_n) = -\Phi(e_n) \varphi_n^-$. Therefore $(dx_n)^+ = -\Phi(e_n) \varphi_n^-$; hence, since $\Phi(e_n)$ is unitary,
\[ \|(dx_n^+)_{n\ge 0}\|_{L^p(\mathcal{C}; l_C^2)} = \|(\varphi_n^-)_{n\ge 0}\|_{L^p(\mathcal{C}; l_C^2)} . \]
Similarly,
\[ \|(dx_n^-)_{n\ge 0}\|_{L^p(\mathcal{C}; l_C^2)} = \|(\varphi_n^+)_{n\ge 0}\|_{L^p(\mathcal{C}; l_C^2)} . \]
Combining the preceding inequalities, we get
\[ \tfrac{1}{2} \|\varphi\|_{L^p(\mathcal{C}; l_C^2)} \le \|dx\|_{L^p(\mathcal{C}; l_C^2)} \le 2 \|\varphi\|_{L^p(\mathcal{C}; l_C^2)} , \]
proving the proposition.
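The constants $1/2$ and $2$ come from applying the even/odd decomposition twice. The following is a sketch of that bookkeeping, under the assumption (not spelled out in the proof) that $G$, applied term by term, is also isometric on $L^p(\mathcal{C}; l_C^2)$, so that $\max(\|b^+\|, \|b^-\|) \le \|b\| \le \|b^+\| + \|b^-\|$ holds for sequences $b$ as well. Here $\|\cdot\|$ denotes the norm of $L^p(\mathcal{C}; l_C^2)$, $dx^\pm = ((dx_n)^\pm)_{n\ge 0}$ and $\varphi^\pm = (\varphi_n^\pm)_{n\ge 0}$.

```latex
\begin{align*}
\|dx\| &\le \|dx^+\| + \|dx^-\| = \|\varphi^-\| + \|\varphi^+\| \le 2\,\|\varphi\| , \\
\|\varphi\| &\le \|\varphi^+\| + \|\varphi^-\| = \|dx^-\| + \|dx^+\| \le 2\,\|dx\| .
\end{align*}
```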
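Let us record explicitly the following consequence of Theorem 2.1 and Proposition 3.3.

Corollary 3.4. Let $1 < p < \infty$ and $x = (x_n)_{n\ge 0}$ be as in Proposition 3.3. Then if $2 \le p < \infty$ we have
\[ \|x\|_{\mathcal{H}^p(\mathcal{C})} \approx \max\big\{ \|\varphi\|_{L^p(\mathcal{C}; l_C^2)} , \ \|\varphi\|_{L^p(\mathcal{C}; l_R^2)} \big\} , \]
and if $1 < p < 2$ we have
\[ \|x\|_{\mathcal{H}^p(\mathcal{C})} \approx \inf\big\{ \|\varphi'\|_{L^p(\mathcal{C}; l_C^2)} + \|\varphi''\|_{L^p(\mathcal{C}; l_R^2)} \big\} , \]
where the infimum runs over all $\varphi' \in L^p(\mathcal{C}; l_C^2)$, $\varphi'' \in L^p(\mathcal{C}; l_R^2)$ such that $\varphi = \varphi' + \varphi''$ and $\varphi'_n, \varphi''_n \in L^p(\mathcal{C}_{n-1})$ for all $n \ge 0$.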
III. Free group algebras. Let $F_n$ be the free group on $n$ generators. Let $vN(F_n)$ be the von Neumann algebra of $F_n$, equipped with its standard normalized trace $\tau$. $vN(F_n)$ is naturally identified with a subalgebra of $vN(F_{n+1})$, so that $(vN(F_n))_{n\ge 1}$ is an increasing filtration of von Neumann subalgebras of $vN(F_\infty)$, which generate $vN(F_\infty)$. For convenience, we put $vN(F_0) = \mathbb{C}\mathbf{1}$. Thus we can consider martingales with respect to $(vN(F_n))_{n\ge 0}$. Let $\mathcal{H}^p(vN(F_\infty))$ denote the corresponding Hardy space. Then Theorem 2.1 gives

Theorem 3.5. Let $1 < p < \infty$. Then $\mathcal{H}^p(vN(F_\infty)) = L^p(vN(F_\infty))$ with equivalent norms.

Let us emphasize that, a priori, the above situation is quite different from the one considered in the tensor product case, since $vN(F_n)$ is not hyperfinite as soon as $n \ge 2$. However, Theorem 3.5 also admits an alternate proof, which appears as a limit case of the tensor product case: indeed, as Philippe Biane kindly pointed out to us, this can be done via random matrices with the help of Voiculescu's limit theorem [V]. We omit the details. Note that again this argument yields better constants when $p$ tends to infinity, the same ones as indicated in Remark 3.2.

4. Applications to the Ito-Clifford Integral

In this section $H$ denotes $L^2(\mathbb{R}_+)$ with its usual Lebesgue measure and complex conjugation; $\mathcal{C} = \mathcal{C}(H)$ is the associated von Neumann Clifford algebra equipped with its normalized trace $\tau$. For $t \ge 0$ let $H_t$ denote the subspace $L^2(0, t)$ and $\mathcal{C}_t = \mathcal{C}(H_t)$. Clearly, $\mathcal{C}_0 = \mathbb{C}$ and $\mathcal{C}_s \subset \mathcal{C}_t$ for $0 \le s \le t$. Let $\mathcal{E}_t = \mathcal{E}(\,\cdot\,|\,\mathcal{C}_t)$ be the conditional expectation of $\mathcal{C}$ with respect to $\mathcal{C}_t$. Thus we have a continuous time filtration of von Neumann subalgebras $(\mathcal{C}_t)_{t\ge 0}$ of $\mathcal{C}$, which generate $\mathcal{C}$. All the notions for discrete martingales in Sect. 1 can be transferred to this continuous time setting. Thus a Clifford $L^p$-martingale is a family $X = (X_t)_{t\ge 0}$ such that $X_t \in L^p(\mathcal{C}_t)$ and $\mathcal{E}_s X_t = X_s$ for $0 \le s \le t$; if
additionally $\|X\|_p = \sup_{t\ge 0} \|X_t\|_p < \infty$, $X$ is said to be bounded. In this section, unless otherwise stated all martingales are Clifford martingales with respect to $(\mathcal{C}_t)_{t\ge 0}$.

The main result here is the analogue of Theorem 2.1 for these Clifford martingales. We will deduce it from Theorem 2.1 by discretizing continuous time Clifford martingales. This reduction from continuous time to discrete time will be done via the Ito-Clifford integral developed by Barnett, Streater and Wilde, who extended the classical Ito integral theory to Clifford $L^2$-martingales. They showed that any Clifford $L^2$-martingale admits an Ito-Clifford integral representation. The Clifford martingale inequalities below will allow us to extend this Ito-Clifford integral theory from $L^2$-martingales to $L^p$-martingales for any $1 < p < \infty$. As a consequence, we will show that any Clifford $L^p$-martingale ($1 < p < 2$) has an Ito-Clifford integral representation.

Let us first recall the Ito-Clifford integral defined in [BSW1-2]. For given $t \ge 0$, let $\Phi_t = \Phi(\chi_{[0,t)})$ (recalling that $\Phi$ is the Fermion field defined in Sect. 3). Then $\Phi_t$ is hermitian and belongs to $\mathcal{C}_t$; by the canonical anticommutation relations, $(\Phi_t - \Phi_s)^2 = t - s$ for $0 \le s \le t$. $\Phi_t$ is the Fermion analogue of Brownian motion. As in the classical Ito integral, Barnett, Streater and Wilde develop their Ito-Clifford integral by first defining the integrals of simple processes. A simple adapted $L^p$-process is a function $f : \mathbb{R}_+ \to L^p(\mathcal{C})$ such that $f(t) \in L^p(\mathcal{C}_t)$ for $t \ge 0$ and
\[ f(t) = \sum_{k\ge 0} f(t_k)\, \chi_{[t_k, t_{k+1})}(t) , \]
where $(t_k)_{k\ge 0}$ is a subdivision of $\mathbb{R}_+$, i.e., $0 = t_0 < t_1 < \cdots$ increasing to $+\infty$. For such an $f$ we define its Ito-Clifford integral as follows: for $t_k \le t < t_{k+1}$,
\[ X_t = \int_0^t f(s)\, d\Phi_s = \sum_{j=0}^{k-1} f(t_j) (\Phi_{t_{j+1}} - \Phi_{t_j}) + f(t_k) (\Phi_t - \Phi_{t_k}) . \]
Clearly, $X = (X_t)_{t\ge 0}$ is a Clifford $L^p$-martingale; and if $p = 2$,
\[ \|X_t\|_2^2 = \int_0^t \|f(s)\|_2^2 \, ds , \qquad \forall\, t \ge 0 . \]
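The $L^2$ identity for simple processes can be verified directly from the trace property and the relation $(\Phi_t - \Phi_s)^2 = t - s$. The computation below is a sketch for $t = t_n$ (an assumption made only to shorten the notation); the symbols $d_j$ and $\Delta_j$ are introduced here for illustration and are not part of the original text.

```latex
\begin{align*}
X_t &= \sum_{j=0}^{n-1} d_j , \qquad d_j = f(t_j)\, \Delta_j , \qquad \Delta_j = \Phi_{t_{j+1}} - \Phi_{t_j} , \\
\tau(d_j^* d_j) &= \tau\big( \Delta_j\, f(t_j)^* f(t_j)\, \Delta_j \big)
  = \tau\big( f(t_j)^* f(t_j)\, \Delta_j^2 \big)
  = (t_{j+1} - t_j)\, \|f(t_j)\|_2^2 , \\
\tau(d_i^* d_j) &= \tau\big( d_i^*\, \mathcal{E}_{t_j}(d_j) \big) = 0 \qquad (i < j) , \\
\|X_t\|_2^2 &= \sum_{j=0}^{n-1} \tau(d_j^* d_j)
  = \sum_{j=0}^{n-1} (t_{j+1} - t_j)\, \|f(t_j)\|_2^2
  = \int_0^t \|f(s)\|_2^2 \, ds .
\end{align*}
```

The third line uses that $d_i \in L^2(\mathcal{C}_{t_j})$ for $i < j$ and that $\mathcal{E}_{t_j}(d_j) = \mathcal{E}_{t_j}(X_{t_{j+1}}) - X_{t_j} = 0$ by the martingale property; the case $i > j$ follows by taking adjoints.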
This identity allows one to define the Ito-Clifford integral of any "adapted $L^2$-process" $f$ belonging to $L^2_{loc}(\mathbb{R}_+; L^2(\mathcal{C}))$:
\[ X_t = \int_0^t f(s)\, d\Phi_s , \qquad \forall\, t \ge 0 . \]
$(X_t)_{t\ge 0}$ is again a Clifford $L^2$-martingale and the above identity still holds. Conversely, any Clifford $L^2$-martingale admits such an Ito-Clifford integral representation (cf. [BSW1]).

As in the discrete case, for any simple adapted process $f$ we define
\[ S_{C,t}(f) = \Big( \int_0^t f^*(s) f(s) \, ds \Big)^{1/2} \quad \text{and} \quad S_{R,t}(f) = \Big( \int_0^t f(s) f^*(s) \, ds \Big)^{1/2} . \]
Let $S^p_{ad}$ be the linear space of all simple adapted $L^p$-processes and $S^p_{ad}[0, t]$ its subspace of processes vanishing in $(t, \infty)$. Then as in the case of discrete time, $\|S_{C,t}(f)\|_p$ and $\|S_{R,t}(f)\|_p$ define two norms on $S^p_{ad}[0, t]$. The completions of $S^p_{ad}[0, t]$ with respect to
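them are denoted respectively by $\mathcal{H}^p_C[0, t]$ and $\mathcal{H}^p_R[0, t]$ for $1 \le p < \infty$. Let us point out that elements in $\mathcal{H}^p_C[0, t]$ and $\mathcal{H}^p_R[0, t]$ can be regarded as measurable operators in $L^p(\mathcal{C}_t \otimes B(L^2[0, t]))$ (see Sect. 1 about the column and row subspaces). Let $\mathcal{H}^p_{C,loc}(\mathbb{R}_+)$ (resp. $\mathcal{H}^p_{R,loc}(\mathbb{R}_+)$) denote the space of all functions $f : \mathbb{R}_+ \to L^p(\mathcal{C})$ whose restrictions to $[0, t]$ belong to $\mathcal{H}^p_C[0, t]$ (resp. $\mathcal{H}^p_R[0, t]$) for all $t \ge 0$. We call elements in $\mathcal{H}^p_{C,loc}(\mathbb{R}_+)$ and $\mathcal{H}^p_{R,loc}(\mathbb{R}_+)$ (measurable) adapted $L^p$-processes. As in the discrete case, we define
\[ \mathcal{H}^p[0, t] = \mathcal{H}^p_C[0, t] + \mathcal{H}^p_R[0, t] \quad \text{for} \quad 1 \le p < 2 , \]
and
\[ \mathcal{H}^p[0, t] = \mathcal{H}^p_C[0, t] \cap \mathcal{H}^p_R[0, t] \quad \text{for} \quad 2 \le p < \infty . \]
We endow $\mathcal{H}^p[0, t]$ with the corresponding sum or intersection norm. Similarly, we define $\mathcal{H}^p_{loc}(\mathbb{R}_+)$.

Now we can state the main result of this section.

Theorem 4.1. Let $1 < p < \infty$. Then for any $f \in \mathcal{H}^p_{loc}(\mathbb{R}_+)$ its Ito-Clifford integral
\[ X_t = \int_0^t f(s)\, d\Phi_s , \qquad t \ge 0 , \]
is a well-defined Clifford $L^p$-martingale and
\[ \alpha_p^{-1} \|f\|_{\mathcal{H}^p[0,t]} \le \|X_t\|_p \le \beta_p \|f\|_{\mathcal{H}^p[0,t]} , \qquad \forall\, t \ge 0 . \]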
Remarks. (i) Carlen and Krée [CK] proved that if $p \le 2$ and if $f$ is a simple adapted process, then the Ito-Clifford integral $(X_t)$ of $f$ satisfies
\[ \|X_t\|_p \le \beta_p \min\Big\{ \Big\| \Big( \int_0^t |f(s)|^2 \, ds \Big)^{1/2} \Big\|_p , \ \Big\| \Big( \int_0^t |f(s)^*|^2 \, ds \Big)^{1/2} \Big\|_p \Big\} . \]
(This corresponds essentially to the second inequality of Theorem 4.1 for $p \le 2$.) From this they deduced some sufficient conditions for the existence of Ito-Clifford integrals. They also proved Theorem 4.1 for $p = 4$ (and mentioned that the same argument works for $p = 6$ and $8$).

(ii) If $2 \le p < \infty$, then
\[ \mathcal{H}^p_{loc}(\mathbb{R}_+) \subset L^2_{loc}(\mathbb{R}_+; L^2(\mathcal{C})) ; \]
so adapted $L^p$-processes are adapted $L^2$-processes. Thus the existence of Ito-Clifford integrals of adapted $L^p$-processes ($p \ge 2$) goes back to [BSW1]. Note also that in the case $p = 2$ the inequalities in Theorem 4.1 become equalities (i.e., $\alpha_2 = \beta_2 = 1$). This is the only case already treated in [BSW1]. If $f$ is an adapted $L^1$-process, then its Ito-Clifford integral is also a well-defined Clifford $L^1$-martingale $X = (X_t)_{t\ge 0}$ and we have
\[ \|X_t\|_1 \le \beta_1 \|f\|_{\mathcal{H}^1[0,t]} , \qquad \forall\, t \ge 0 \]
(see the corollary in the appendix and Remark 2.9). Of course, the reverse inequality fails this time.
We will reduce Theorem 4.1 to simple adapted processes and then apply Theorem 2.1. For this reduction to be successful we have to check two things. The first one is the density of $S^p_{ad}[0, t]$ in $\mathcal{H}^p[0, t]$ (this is trivial for $1 \le p \le 2$). The second one is that the norm of a simple adapted $L^p$-process $f$ in $\mathcal{H}^p[0, t]$ for $1 < p < 2$ is equivalent to
\[ \inf\big\{ \|g\|_{\mathcal{H}^p_C[0,t]} + \|h\|_{\mathcal{H}^p_R[0,t]} : f = g + h , \ g, h \in S^p_{ad} \big\} . \]
These will be done by the following lemmas.

Lemma 4.2. Let $\sigma = (t_k)_{k=0}^\infty$ be a subdivision of $\mathbb{R}_+$. Define the map $Q_\sigma$ over simple adapted processes by
\[ Q_\sigma(f)(t) = \frac{1}{t_{k+1} - t_k} \int_{t_k}^{t_{k+1}} \mathcal{E}_{t_k} f(s) \, ds , \qquad t_k \le t < t_{k+1} , \quad t \ge 0 . \]
Then for $1 < p < \infty$, $Q_\sigma$ extends to a bounded projection on $\mathcal{H}^p_C[0, t]$ and $\mathcal{H}^p_R[0, t]$ for all $t \ge 0$.

Proof. Suppose $f$ is a simple adapted $L^p$-process:
\[ f = \sum_{j\ge 0} f(s_j)\, \chi_{[s_j, s_{j+1})} . \]
By refining the subdivision $(s_j)_{j\ge 0}$ if necessary we may assume it is finer than $\sigma$. Then
\[ Q_\sigma f = \sum_{k\ge 0} \Big( \sum_{j : t_k \le s_j < t_{k+1}} \theta_{k,j}\, \mathcal{E}_{t_k} f(s_j) \Big) \chi_{[t_k, t_{k+1})} , \]
where
\[ \theta_{k,j} = \frac{s_{j+1} - s_j}{t_{k+1} - t_k} \quad \text{for} \quad t_k \le s_j < t_{k+1} . \]
Note that
\[ \sum_{j : t_k \le s_j < t_{k+1}} \theta_{k,j} = 1 , \qquad \forall\, k \ge 0 . \]
Observe also the following elementary and well known inequality: for any sequence of operators $(a_j)$ in $B(H)$ ($H$ being a Hilbert space) and for any finitely supported sequence $(\theta_j)$ with $\theta_j \ge 0$ and $\sum \theta_j = 1$, we have (in the order of $B(H)$)
\[ \Big| \sum \theta_j a_j \Big|^2 \le \sum \theta_j |a_j|^2 . \]
(Indeed, for any $h$ in $H$, by convexity of $\|\cdot\|^2$, we have $\|\sum \theta_j a_j h\|^2 \le \sum \theta_j \|a_j h\|^2$, whence the desired inequality.) Therefore, for all $k \ge 0$,
\[ \Big| \sum_{j : t_k \le s_j < t_{k+1}} \theta_{k,j}\, \mathcal{E}_{t_k} f(s_j) \Big|^2 \le \sum_{j : t_k \le s_j < t_{k+1}} \theta_{k,j}\, |\mathcal{E}_{t_k}(f(s_j))|^2 . \]
Now let $t \ge 0$. Without loss of generality we assume $t = t_{n+1}$ for some $n \ge 0$. Then by Theorem 2.3,
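\begin{align*}
\|Q_\sigma f\|_{\mathcal{H}^p_C[0,t]}
 &= \Big\| \Big( \int_0^t (Q_\sigma f(s))^* (Q_\sigma f(s)) \, ds \Big)^{1/2} \Big\|_p \\
 &\le \Big\| \Big( \sum_{k=0}^n \sum_{t_k \le s_j < t_{k+1}} (s_{j+1} - s_j)\, |\mathcal{E}_{t_k} f(s_j)|^2 \Big)^{1/2} \Big\|_p \\
 &\le \beta_p \Big\| \Big( \sum_{k=0}^n \sum_{t_k \le s_j < t_{k+1}} (s_{j+1} - s_j)\, |f(s_j)|^2 \Big)^{1/2} \Big\|_p \\
 &= \beta_p \|f\|_{\mathcal{H}^p_C[0,t]} .
\end{align*}
Therefore $Q_\sigma$ extends to a bounded map (projection) on $\mathcal{H}^p_C[0, t]$. The same reasoning applies to $\mathcal{H}^p_R[0, t]$.

Lemma 4.3. Let $1 < p < \infty$ and $f \in \mathcal{H}^p_{C,loc}(\mathbb{R}_+)$ (resp. $\mathcal{H}^p_{R,loc}(\mathbb{R}_+)$). Then for all $t \ge 0$,
\[ \lim_\sigma Q_\sigma f = f \quad \text{in } \mathcal{H}^p_C[0, t] \ (\text{resp. } \mathcal{H}^p_R[0, t]) , \]
where the limit is taken relative to the subdivision $\sigma = (t_k)_{k\ge 0}$ when $\sup_{k\ge 0} (t_{k+1} - t_k)$ goes to zero.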
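Proof. If $f \in S^p_{ad}$, then $Q_\sigma f = f$ when $\sigma$ is sufficiently fine; so the lemma is true for simple adapted processes. The general case is proved by Lemma 4.2 and the density of $S^p_{ad}[0, t]$ in $\mathcal{H}^p_C[0, t]$ and $\mathcal{H}^p_R[0, t]$.

Lemma 4.4. Let $1 \le p < \infty$. Then $S^p_{ad}[0, t]$ is dense in $\mathcal{H}^p[0, t]$ for all $t \ge 0$.

Proof. This is trivial for $1 \le p < 2$ because $\mathcal{H}^p[0, t] = \mathcal{H}^p_C[0, t] + \mathcal{H}^p_R[0, t]$ in this case. For $2 \le p < \infty$ and $f \in \mathcal{H}^p[0, t]$ Lemma 4.3 implies that
\[ \lim_\sigma Q_\sigma f = f \quad \text{in} \quad \mathcal{H}^p[0, t] . \]
Thus $S^p_{ad}[0, t]$ is also dense in $\mathcal{H}^p[0, t]$.

Lemma 4.5. Let $1 < p < \infty$ and $f \in S^p_{ad}$. Then for all $t \ge 0$,
\[ \|f\|_{\mathcal{H}^p_C[0,t] + \mathcal{H}^p_R[0,t]} \approx \inf\big\{ \|g\|_{\mathcal{H}^p_C[0,t]} + \|h\|_{\mathcal{H}^p_R[0,t]} \big\} , \]
where the infimum is taken over all $g, h \in S^p_{ad}[0, t]$ such that $f = g + h$.

Proof. Let $f$ be a simple adapted $L^p$-process defined by a subdivision $\sigma = (t_k)_{k\ge 0}$:
\[ f = \sum_{k\ge 0} f(t_k)\, \chi_{[t_k, t_{k+1})} . \]
Let $g \in \mathcal{H}^p_C[0, t]$, $h \in \mathcal{H}^p_R[0, t]$ be such that $f = g + h$ and
\[ \|g\|_{\mathcal{H}^p_C[0,t]} + \|h\|_{\mathcal{H}^p_R[0,t]} \le 2 \|f\|_{\mathcal{H}^p_C[0,t] + \mathcal{H}^p_R[0,t]} . \]
Then $f = Q_\sigma f = Q_\sigma g + Q_\sigma h$, and $Q_\sigma g, Q_\sigma h \in S^p_{ad}$. By Lemma 4.2
\[ \|Q_\sigma g\|_{\mathcal{H}^p_C[0,t]} \le \beta_p \|g\|_{\mathcal{H}^p_C[0,t]} , \qquad \|Q_\sigma h\|_{\mathcal{H}^p_R[0,t]} \le \beta_p \|h\|_{\mathcal{H}^p_R[0,t]} ; \]
whence the equivalence in the lemma.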
Now we are ready to show Theorem 4.1.

Proof of Theorem 4.1. First consider the case $2 \le p < \infty$. Let $f \in S^p_{ad}$:
\[ f = \sum_{k\ge 0} f(t_k)\, \chi_{[t_k, t_{k+1})} . \]
Then (assuming $t = t_n$)
\[ X_t = \sum_{k=0}^{n-1} f(t_k) [\Phi(t_{k+1}) - \Phi(t_k)] . \]
Thus $(X_{t_k})_{k=0}^n$ is a finite Clifford $L^p$-martingale with respect to $(\mathcal{C}(H_{t_k}))_{k=0}^n$. Set
\[ d_k = X_{t_{k+1}} - X_{t_k} = f(t_k) [\Phi(t_{k+1}) - \Phi(t_k)] . \]
Then by Theorem 2.1,
\[ \|X_t\|_p \approx \Big\| \Big( \sum_{k=0}^{n-1} |d_k|^2 \Big)^{1/2} \Big\|_p + \Big\| \Big( \sum_{k=0}^{n-1} |d_k^*|^2 \Big)^{1/2} \Big\|_p . \]
Since $\Phi(t_{k+1}) - \Phi(t_k)$ is hermitian and $[\Phi(t_{k+1}) - \Phi(t_k)]^2 = t_{k+1} - t_k$, we have
\[ \sum_{k=0}^{n-1} |d_k^*|^2 = \sum_{k=0}^{n-1} f(t_k) f(t_k)^* (t_{k+1} - t_k) = \int_0^t f(s) f(s)^* \, ds . \]
On the other hand, since $\chi_{[t_k, t_{k+1})}$ is orthogonal to $L^2(0, t_k)$ and since $f(t_k) \in \mathcal{C}_{t_k}$, by Proposition 3.3 and its proof
\[ \Big\| \Big( \sum_{k=0}^{n-1} |d_k|^2 \Big)^{1/2} \Big\|_p \approx \Big\| \Big( \sum_{k=0}^{n-1} f(t_k)^* f(t_k) (t_{k+1} - t_k) \Big)^{1/2} \Big\|_p = \Big\| \Big( \int_0^t f(s)^* f(s) \, ds \Big)^{1/2} \Big\|_p = \|f\|_{\mathcal{H}^p_C[0,t]} . \]
Therefore, we finally deduce that
\[ \|X_t\|_p \approx \max\big( \|f\|_{\mathcal{H}^p_C[0,t]} , \ \|f\|_{\mathcal{H}^p_R[0,t]} \big) , \]
proving Theorem 4.1 in the case $2 \le p < \infty$ for simple adapted $L^p$-processes. The general adapted $L^p$-processes are treated by approximation by means of Lemma 4.4.

Now suppose $1 < p < 2$ and $f \in S^p_{ad}$. Write
\[ f = \sum_{k\ge 0} f(t_k)\, \chi_{[t_k, t_{k+1})} . \]
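Since step functions are dense in $L^2[0, t_k]$, by refining $(t_k)_{k\ge 0}$ if necessary we may assume $f(t_k)$ belongs to the von Neumann algebra generated by $\{\Phi(t_{j+1}) - \Phi(t_j)\}_{j=0}^{k-1}$. Let $L_k$ denote the subspace of $H_{t_k}$ spanned by $\{\chi_{[t_j, t_{j+1})}\}_{j=0}^{k-1}$. Then $\dim L_k = k$ and $f(t_k) \in \mathcal{C}(L_k)$. Let $t = t_n$ for some $n \ge 0$. Then
\[ X_t = \sum_{k=0}^{n-1} f(t_k) [\Phi(t_{k+1}) - \Phi(t_k)] . \]
Thus $(X_{t_k})_{k=1}^n$ is a finite Clifford martingale relative to $(\mathcal{C}(L_k))_{k=1}^n$. Applying Corollary 3.4 to $(X_{t_k})_{k=1}^n$ we get
\[ \|X_t\|_p \approx \inf\Big\{ \Big\| \Big( \sum_{k=0}^{n-1} |a_k|^2 (t_{k+1} - t_k) \Big)^{1/2} \Big\|_p + \Big\| \Big( \sum_{k=0}^{n-1} |b_k^*|^2 (t_{k+1} - t_k) \Big)^{1/2} \Big\|_p \Big\} , \]
where the infimum runs over all $(a_k)$ and $(b_k)$ such that $a_k + b_k = f(t_k)$ and $a_k, b_k \in \mathcal{C}(L_k)$ for all $0 \le k \le n-1$. Let us show that the last infimum is equivalent to $\|f\|_{\mathcal{H}^p[0,t]}$. By Lemma 4.5 (recall that $f \in S^p_{ad}$) there are $g, h \in S^p_{ad}$ such that
\[ \|g\|_{\mathcal{H}^p_C[0,t]} + \|h\|_{\mathcal{H}^p_R[0,t]} \le \beta_p \|f\|_{\mathcal{H}^p[0,t]} ; \]
moreover, we may assume that $g$ and $h$ are given by the same subdivision as $f$. Therefore
\[ \|g\|_{\mathcal{H}^p_C[0,t]} = \Big\| \Big( \sum_{k=0}^{n-1} |g(t_k)|^2 (t_{k+1} - t_k) \Big)^{1/2} \Big\|_p . \]
Applying Theorem 2.3 to the sequence of conditional expectations $\{\mathcal{E}(\,\cdot\,|\,\mathcal{C}(L_k))\}_{k=1}^n$, we deduce that
\[ \Big\| \Big( \sum_{k=0}^{n-1} \big| \mathcal{E}\big( g(t_k) \,|\, \mathcal{C}(L_k) \big) \big|^2 (t_{k+1} - t_k) \Big)^{1/2} \Big\|_p \le \beta_p \|g\|_{\mathcal{H}^p_C[0,t]} . \]
The same inequality holds for $h$ and $\mathcal{H}^p_R[0, t]$ in place of $g$ and $\mathcal{H}^p_C[0, t]$. Since $f(t_k) \in \mathcal{C}(L_k)$,
\[ f = \sum_{k=0}^{n-1} \Big[ \mathcal{E}\big( g(t_k) \,|\, \mathcal{C}(L_k) \big) + \mathcal{E}\big( h(t_k) \,|\, \mathcal{C}(L_k) \big) \Big] \chi_{[t_k, t_{k+1})} . \]
Set $a_k = \mathcal{E}(g(t_k) \,|\, \mathcal{C}(L_k))$ and $b_k = \mathcal{E}(h(t_k) \,|\, \mathcal{C}(L_k))$ for $0 \le k \le n-1$. Then $f(t_k) = a_k + b_k$ and
\[ \Big\| \Big( \sum_{k=0}^{n-1} |a_k|^2 (t_{k+1} - t_k) \Big)^{1/2} \Big\|_p \le \beta_p \|g\|_{\mathcal{H}^p_C[0,t]} , \qquad \Big\| \Big( \sum_{k=0}^{n-1} |b_k^*|^2 (t_{k+1} - t_k) \Big)^{1/2} \Big\|_p \le \beta_p \|h\|_{\mathcal{H}^p_R[0,t]} . \]
Thus the desired equivalence follows, and so
\[ \|X_t\|_p \approx \|f\|_{\mathcal{H}^p[0,t]} . \]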
Therefore, the inequalities of Theorem 4.1 in the case $1 < p < 2$ have been proved for simple adapted processes. Now let $f \in \mathcal{H}^p_{loc}(\mathbb{R}_+)$ ($1 < p < 2$). Let $f_n \in S^p_{ad}[0, t]$ converge to $f$ in $\mathcal{H}^p[0, t]$. Set
\[ X_t^n = \int_0^t f_n(s)\, d\Phi_s . \]
Then
\[ \|X_t^n - X_t^m\|_p \approx \|f_n - f_m\|_{\mathcal{H}^p[0,t]} . \]
Therefore $X_t^n$ converges to some $X_t$ as $n \to \infty$. It is clear that $(X_t)_{t\ge 0}$ is a Clifford $L^p$-martingale and
\[ \|X_t\|_p \approx \|f\|_{\mathcal{H}^p[0,t]} , \qquad \forall\, t \ge 0 . \]
Also $(X_t)_{t\ge 0}$ is uniquely determined by $f$. Then we define the Ito-Clifford integral of $f$ to be $(X_t)_{t\ge 0}$. Hence the proof of Theorem 4.1 is complete.

As a consequence of Theorem 4.1 we get the following Ito-Clifford integral representation for Clifford $L^p$-martingales ($1 < p < \infty$), which extends to any $p \in (1, \infty)$ the Barnett-Streater-Wilde representation theorem for $L^2$-martingales.

Theorem 4.6. Let $1 < p < \infty$. Then for any Clifford $L^p$-martingale $(X_t)_{t\ge 0}$ there exists an adapted $L^p$-process $f \in \mathcal{H}^p_{loc}(\mathbb{R}_+)$ such that
\[ X_t = X_0 + \int_0^t f(s)\, d\Phi_s , \qquad \forall\, t \ge 0 . \]
Proof. Let $(X_t)_{t\ge 0}$ be a Clifford $L^p$-martingale. Without loss of generality assume $X_0 = 0$. It suffices to construct the required adapted process over any interval $[0, T]$. Thus fix $T > 0$. For any subdivision $\sigma$ of $[0, T]$: $0 = t_0 < \cdots < t_n = T$, let $L_\sigma$ denote the subspace of $H_T = L^2[0, T]$ spanned by $\{\chi_{[t_k, t_{k+1})}\}_{k=0}^{n-1}$. Since the union of all $L_\sigma$ is dense in $H_T$, $\mathcal{C}_T = \mathcal{C}(H_T)$ is generated by the union of all Clifford algebras $\mathcal{C}(L_\sigma)$. It follows that $\bigcup_\sigma \mathcal{C}(L_\sigma)$ is dense in $L^p(\mathcal{C}_T)$. Therefore there exists a sequence $(X_T^n)_{n\ge 0}$ of $L^p(\mathcal{C}_T)$ such that $\lim_{n\to\infty} X_T^n = X_T$ in $L^p(\mathcal{C}_T)$ and such that $X_T^n \in \mathcal{C}(L_{\sigma_n})$ for some subdivision $\sigma_n$ of $[0, T]$. Let $\sigma_n = (t_k^n)_{k=0}^{N_n}$. Then $X_T^n$ can be written as
\[ X_T^n = \sum_{k=0}^{N_n - 1} a_{n,k} [\Phi(t_{k+1}^n) - \Phi(t_k^n)] , \]
where $a_{n,k}$ belongs to the $C^*$-algebra generated by $\{\Phi(t_{j+1}^n) - \Phi(t_j^n)\}_{j=0}^{k-1}$ for all $0 \le k \le N_n$ and $n \ge 0$. Put
\[ f_n = \sum_{k=0}^{N_n - 1} a_{n,k}\, \chi_{[t_k^n, t_{k+1}^n)} . \]
Then $f_n$ is a simple adapted $L^p$-process and
\[ X_T^n = \int_0^T f_n(s)\, d\Phi_s . \]
Therefore, by Theorem 4.1,
\[ \|X_T^n - X_T^m\|_p \approx \|f_n - f_m\|_{\mathcal{H}^p[0,T]} , \]
whence $(f_n)_{n\ge 0}$ is a Cauchy sequence in $\mathcal{H}^p[0, T]$, so it converges to some adapted $L^p$-process $f \in \mathcal{H}^p[0, T]$. Then clearly
\[ X_T = \int_0^T f(s)\, d\Phi_s . \]
This finishes the proof of Theorem 4.6.

Remark. If we identify a Clifford $L^p$-martingale with the integrand (adapted $L^p$-process) in its Ito-Clifford integral representation (this is always possible by Theorem 4.6), then Theorem 4.1 can be reformulated as follows: for any $1 < p < \infty$ and any $t \ge 0$,
\[ L_0^p(\mathcal{C}_t) = \mathcal{H}^p[0, t] \quad \text{with equivalent norms,} \]
where
\[ L_0^p(\mathcal{C}_t) = \{ X \in L^p(\mathcal{C}_t) : \tau(X) = 0 \} . \]
This equivalence can be extended to the whole of $\mathbb{R}_+$. Let us say that an adapted $L^p$-process $f$ belongs to $\mathcal{H}^p(\mathbb{R}_+)$ if
\[ \|f\|_{\mathcal{H}^p(\mathbb{R}_+)} = \sup_{t\ge 0} \|f\|_{\mathcal{H}^p[0,t]} < \infty . \]
Then for $1 < p < \infty$ a Clifford $L^p$-martingale $X = (X_t)_{t\ge 0}$ is bounded iff the associated adapted $L^p$-process $f$ belongs to $\mathcal{H}^p(\mathbb{R}_+)$; moreover, in this case we have
\[ \|X\|_p = \sup_{t\ge 0} \|X_t\|_p \approx \|X_0\|_p + \|f\|_{\mathcal{H}^p(\mathbb{R}_+)} . \]
Recall also that $X = (X_t)_{t\ge 0}$ is bounded iff $\lim_{t\to\infty} X_t = X_\infty$ exists in $L^p(\mathcal{C})$. Identifying the three objects $X = (X_t)_{t\ge 0}$ with $X_0 = 0$, $f$ and $X_\infty$, we get that $\mathcal{H}^p(\mathbb{R}_+) = L_0^p(\mathcal{C})$ with equivalent norms.

Appendix

In this appendix we consider the non-commutative analogue of the classical duality between the Hardy space $H^1$ and BMO of martingales (see [G]). We will show this duality remains valid in the non-commutative case. Let us go back to the general situation presented in Sect. 1. In all that follows $(\mathcal{M}, \tau)$ denotes a finite von Neumann algebra with a normalized trace $\tau$, and $(\mathcal{M}_n)$ an increasing filtration of von Neumann subalgebras of $\mathcal{M}$, which generate $\mathcal{M}$. Recall that $\mathcal{E}_n$ denotes the conditional expectation of $\mathcal{M}$ with respect to $\mathcal{M}_n$. In Sect. 1 we have introduced the Hardy spaces $\mathcal{H}^1_C(\mathcal{M})$, $\mathcal{H}^1_R(\mathcal{M})$ and $\mathcal{H}^1(\mathcal{M})$ of martingales with respect to $(\mathcal{M}_n)$. Now let us define the corresponding BMO-spaces. We set
\[ BMO_C(\mathcal{M}) = \Big\{ a \in L^2(\mathcal{M}) : \sup_{n\ge 0} \big\| \mathcal{E}_n |a - \mathcal{E}_{n-1} a|^2 \big\|_\infty < \infty \Big\} , \]
where, as usual, E−1 a = 0 (recall |a|2 = a∗ a). BMOC (M) becomes a Banach space when equipped with the norm
$$\|a\|_{\mathrm{BMO}_C(M)} = \Bigl(\sup_{n\ge 0} \bigl\|E_n|a - E_{n-1}a|^2\bigr\|_\infty\Bigr)^{1/2}.$$
Similarly, we define $\mathrm{BMO}_R(M)$, which is the space of all $a$ such that $a^* \in \mathrm{BMO}_C(M)$, equipped with the natural norm. Finally, $\mathrm{BMO}(M)$ is the intersection of these two spaces
$$\mathrm{BMO}(M) = \mathrm{BMO}_C(M) \cap \mathrm{BMO}_R(M),$$
and for any $a \in \mathrm{BMO}(M)$,
$$\|a\|_{\mathrm{BMO}(M)} = \max\{\|a\|_{\mathrm{BMO}_C(M)}, \|a\|_{\mathrm{BMO}_R(M)}\}.$$
Notice that if $a_n = E_n a$, then
$$E_n|a - E_{n-1}a|^2 = E_n \sum_{k\ge n} |da_k|^2.$$
Note also that $E_n|a|^2 = E_{n-1}|a|^2 + E_n|a - E_{n-1}a|^2$, so that $E_n|a - E_{n-1}a|^2 \le E_n|a|^2$. Therefore, it follows that
$$\|a\|_{\mathrm{BMO}(M)} \le \|a\|_\infty. \tag{A1}$$
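The identity $E_n|a - E_{n-1}a|^2 = E_n \sum_{k\ge n}|da_k|^2$ can be checked by hand in the simplest commutative model. The sketch below is only an illustration under assumptions that are not in the text: $M$ is taken to be the functions on 8 points with the uniform measure, $M_n$ the functions constant on dyadic blocks, and the sample values of `a` are arbitrary. Exact rational arithmetic is used so that the two sides can be compared with `==`.

```python
from fractions import Fraction

def cond_exp(values, n, depth):
    """E_n in the dyadic model: average over blocks of size 2**(depth - n)."""
    block = 2 ** (depth - n)
    out = []
    for start in range(0, len(values), block):
        mean = sum(values[start:start + block], Fraction(0)) / block
        out.extend([mean] * block)
    return out

depth = 3
# Arbitrary sample element a of the (commutative, hypothetical) algebra.
a = [Fraction(v) for v in (4, -1, 0, 7, 2, 2, -5, 3)]

# a_n = E_n a for n = 0..depth, with the convention a_{-1} = 0.
levels = [cond_exp(a, n, depth) for n in range(depth + 1)]
zero = [Fraction(0)] * len(a)
previous = [zero] + levels[:-1]

# Squared martingale differences |da_k|^2, k = 0..depth.
diffs_sq = [[(x - y) ** 2 for x, y in zip(levels[k], previous[k])]
            for k in range(depth + 1)]

def lhs(n):
    """E_n |a - E_{n-1} a|^2."""
    return cond_exp([(x - y) ** 2 for x, y in zip(a, previous[n])], n, depth)

def rhs(n):
    """E_n sum_{k >= n} |da_k|^2."""
    tail = [sum(column) for column in zip(*diffs_sq[n:])]
    return cond_exp(tail, n, depth)

identity_holds = all(lhs(n) == rhs(n) for n in range(depth + 1))
```

In this model the identity is just the conditional orthogonality of martingale differences; the non-commutative statement in the text needs the same cancellation of cross terms under $E_n$.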
For simplicity we will denote $H^1_C(M)$, $\mathrm{BMO}_C(M)$, etc., respectively by $H^1_C$, $\mathrm{BMO}_C$, etc. We will also adopt the identification between a martingale and its limit whenever the latter exists. The result of this appendix is the following duality.

Theorem. We have $(H^1_C)^* = \mathrm{BMO}_C$ with equivalent norms. More precisely,

(i) Every $a \in \mathrm{BMO}_C$ defines a continuous linear functional on $H^1_C$ by
$$\varphi_a(x) = \tau(a^* x), \quad \forall\, x \in L^2(M). \tag{A2}$$

(ii) Conversely, any $\varphi \in (H^1_C)^*$ is given as above by some $a \in \mathrm{BMO}_C$. Moreover,
$$\frac{1}{\sqrt{3}}\,\|a\|_{\mathrm{BMO}_C} \le \|\varphi_a\|_{(H^1_C)^*} \le \sqrt{2}\,\|a\|_{\mathrm{BMO}_C}.$$

The same duality holds between $H^1_R$, $\mathrm{BMO}_R$ and between $H^1$, $\mathrm{BMO}$ as well:
$$(H^1_R)^* = \mathrm{BMO}_R \quad \text{and} \quad (H^1)^* = \mathrm{BMO}.$$

Remark. In the duality (A2) we have identified an element $x \in L^2$ with the martingale $(E_n x)_{n\ge 0}$. It is evident that this martingale is in $H^1_C$ and $\|x\|_{H^1_C} \le \|x\|_2$. Let us also note that from the discussions in Sect. 1 the family of finite martingales is dense in $H^1_C$, and so is $L^2$. Of course, the same remark applies to $H^1_R$ and $H^1$ as well.
Before proceeding to the proof of the theorem, let us note that the equivalence constants in (ii) above are the same as in [G]. In fact, our proof below is modelled on the one presented in [G], although one should be careful about some difficulties caused by the non-commutativity. However, this time, they are much less substantial than those appearing in the proof of Theorem 2.1. We will frequently use the tracial property of $\tau$ and the following elementary property of expectation:
$$E_n(abc) = a\,E_n(b)\,c, \quad \forall\, a, c \in M_n,\ \forall\, b \in M.$$
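Proof of the theorem. (i) Let $a \in \mathrm{BMO}_C$. Define $\varphi_a$ by (A2). We must show that $\varphi_a$ induces a continuous functional on $H^1_C$. To that end let $x$ be a finite $L^2$-martingale. Then (recalling our identification between a martingale and its limit)
$$\varphi_a(x) = \sum_{n\ge 0} \tau(da_n^*\, dx_n).$$
Set, as in Sect. 1,
$$S_{C,n} = \Bigl(\sum_{k=0}^{n} |dx_k|^2\Bigr)^{1/2} \quad \text{and} \quad S_C = \Bigl(\sum_{k=0}^{\infty} |dx_k|^2\Bigr)^{1/2}.$$
By approximation we may assume the $S_{C,n}$'s are invertible elements in $M$. Then by the Cauchy-Schwarz inequality
$$\begin{aligned}
|\varphi_a(x)| &= \Bigl|\sum_{n\ge 0} \tau\bigl(S_{C,n}^{1/2}\, da_n^*\, dx_n\, S_{C,n}^{-1/2}\bigr)\Bigr| \\
&\le \Bigl[\tau\Bigl(\sum_{n\ge 0} S_{C,n}^{-1/2} |dx_n|^2 S_{C,n}^{-1/2}\Bigr)\Bigr]^{1/2} \Bigl[\tau\Bigl(\sum_{n\ge 0} S_{C,n}^{1/2} |da_n|^2 S_{C,n}^{1/2}\Bigr)\Bigr]^{1/2} \\
&= \Bigl[\tau\Bigl(\sum_{n\ge 0} S_{C,n}^{-1} |dx_n|^2\Bigr)\Bigr]^{1/2} \Bigl[\tau\Bigl(\sum_{n\ge 0} S_{C,n} |da_n|^2\Bigr)\Bigr]^{1/2} \\
&= I \cdot II.
\end{aligned}$$
We are going to estimate $I$ and $II$ separately. First for $I$ we have
$$\begin{aligned}
I^2 &= \sum_{n\ge 0} \tau\bigl([S_{C,n}^2 - S_{C,n-1}^2]\, S_{C,n}^{-1}\bigr) \\
&= \sum_{n\ge 0} \tau\bigl([S_{C,n} - S_{C,n-1}][1 + S_{C,n-1} S_{C,n}^{-1}]\bigr) \\
&\le \sum_{n\ge 0} \tau\bigl(S_{C,n} - S_{C,n-1}\bigr)\, \bigl\|1 + S_{C,n-1} S_{C,n}^{-1}\bigr\|_\infty \\
&\le 2\tau\Bigl(\sum_{n\ge 0} \bigl(S_{C,n} - S_{C,n-1}\bigr)\Bigr) \\
&= 2\tau(S_C) = 2\|x\|_{H^1_C},
\end{aligned}$$
where we have used the trivial fact that (noting $S_{C,n-1}^2 \le S_{C,n}^2$)
$$\bigl\|S_{C,n-1} S_{C,n}^{-1}\bigr\|_\infty^2 = \bigl\|S_{C,n}^{-1} S_{C,n-1}^2 S_{C,n}^{-1}\bigr\|_\infty \le 1.$$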
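The second equality in the computation of $I^2$ relies only on the tracial property of $\tau$. The short derivation below spells out that step; the shorthand $S_n$ for $S_{C,n}$ is introduced here for brevity and is not notation from the paper.

```latex
\begin{align*}
\tau\bigl([S_n - S_{n-1}][1 + S_{n-1}S_n^{-1}]\bigr)
  &= \tau(S_n) - \tau(S_{n-1}) + \tau(S_n S_{n-1} S_n^{-1}) - \tau(S_{n-1}^2 S_n^{-1}) \\
  &= \tau(S_n) - \tau(S_{n-1}^2 S_n^{-1})
     && \text{since } \tau(S_n S_{n-1} S_n^{-1}) = \tau(S_{n-1}) \\
  &= \tau\bigl([S_n^2 - S_{n-1}^2]\, S_n^{-1}\bigr).
\end{align*}
```

Since $|dx_n|^2 = S_n^2 - S_{n-1}^2$, the last line is exactly the $n$-th term of $I^2$.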
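As for $II$, set $\theta_0 = S_{C,0}$ and $\theta_n = S_{C,n} - S_{C,n-1}$ for $n \ge 1$. Then $\theta_n \in M_n$, and
$$\begin{aligned}
II^2 &= \sum_{n\ge 0} \tau\bigl(S_{C,n} |da_n|^2\bigr) \\
&= \sum_{k\ge 0} \tau\Bigl(\theta_k \sum_{n\ge k} |da_n|^2\Bigr) \\
&= \sum_{k\ge 0} \tau\Bigl(\theta_k\, E_k \sum_{n\ge k} |da_n|^2\Bigr) \\
&\le \sum_{k\ge 0} \tau(\theta_k)\, \Bigl\|E_k \sum_{n\ge k} |da_n|^2\Bigr\|_\infty \\
&\le \|a\|_{\mathrm{BMO}_C}^2\, \|x\|_{H^1_C}.
\end{aligned}$$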
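Combining the preceding estimates on $I$ and $II$, we obtain, for any finite $L^2$-martingale $x$,
$$|\varphi_a(x)| \le \sqrt{2}\, \|a\|_{\mathrm{BMO}_C} \|x\|_{H^1_C}.$$
Therefore, $\varphi_a$ extends to a continuous functional on $H^1_C$ of norm $\le \sqrt{2}\,\|a\|_{\mathrm{BMO}_C}$.

(ii) Now suppose $\varphi \in (H^1_C)^*$. Then by the Hahn-Banach theorem, $\varphi$ extends to a continuous functional on $L^1(M, l^2_C)$ of the same norm. Thus by the duality (see Sect. 1)
$$\bigl(L^1(M, l^2_C)\bigr)^* = L^\infty(M, l^2_C),$$
there exists a sequence $(b_n) \in L^\infty(M, l^2_C)$ such that
$$\Bigl\|\sum_{n\ge 0} |b_n|^2\Bigr\|_\infty = \|\varphi\|^2 \quad \text{and} \quad \varphi(x) = \tau\Bigl(\sum_{n\ge 0} b_n^*\, dx_n\Bigr), \quad \forall\, x \in H^1_C.$$
Let $a = \sum_{n\ge 0} (E_n b_n - E_{n-1} b_n)$ (and so $da_n = E_n b_n - E_{n-1} b_n$). Then $a \in L^2$ and
$$\varphi(x) = \tau\Bigl(\sum_{n\ge 0} da_n^*\, dx_n\Bigr) = \varphi_a(x), \quad \forall\, x \in H^1_C.$$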
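Therefore, $\varphi$ is given by $\varphi_a$ as in (i). It remains to show $a \in \mathrm{BMO}_C$ and to bound $\|a\|_{\mathrm{BMO}_C}$ by $\|\varphi\|$. This is done as follows. If $k - 1 \ge n \ge 0$,
$$E_n\bigl[(E_k b_k)^* (E_{k-1} b_k)\bigr] = E_n E_{k-1}\bigl[(E_k b_k)^* (E_{k-1} b_k)\bigr] = E_n\bigl[(E_{k-1} b_k)^* (E_{k-1} b_k)\bigr];$$
similarly,
$$E_n\bigl[(E_{k-1} b_k)^* (E_k b_k)\bigr] = E_n\bigl[(E_{k-1} b_k)^* (E_{k-1} b_k)\bigr].$$
It then follows that if $k - 1 \ge n \ge 0$,
$$\begin{aligned}
E_n |da_k|^2 &= E_n\bigl[(E_k b_k - E_{k-1} b_k)^* (E_k b_k - E_{k-1} b_k)\bigr] \\
&= E_n\bigl[(E_k b_k)^* (E_k b_k) - (E_{k-1} b_k)^* (E_{k-1} b_k)\bigr] \\
&\le E_n\bigl[(E_k b_k)^* (E_k b_k)\bigr] \le E_n |b_k|^2.
\end{aligned}$$
Hence,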
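$$\begin{aligned}
\bigl\|E_n |a - E_{n-1} a|^2\bigr\|_\infty &= \Bigl\|E_n \sum_{k\ge n} |da_k|^2\Bigr\|_\infty \\
&\le \Bigl\|E_n\Bigl(|da_n|^2 + \sum_{k\ge n+1} |b_k|^2\Bigr)\Bigr\|_\infty \\
&\le 3\,\Bigl\|\sum_{k\ge 0} |b_k|^2\Bigr\|_\infty \le 3\|\varphi\|^2;
\end{aligned}$$
whence $a \in \mathrm{BMO}_C$ and $\|a\|_{\mathrm{BMO}_C} \le \sqrt{3}\,\|\varphi\|$.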
Thus we have finished the proof of the theorem concerning $H^1_C$ and $\mathrm{BMO}_C$. Passing to adjoints yields the part on $H^1_R$ and $\mathrm{BMO}_R$. Finally, the duality between $H^1$ and $\mathrm{BMO}$ is obtained by the classical (and easy) fact that the dual of a sum is the intersection of the duals.

Corollary. Let $x \in H^1$. Then $x_n$ converges in $L^1$ and
$$\|x\|_1 \le \sqrt{2}\, \|dx\|_{L^1(M; l^2_C) + L^1(M; l^2_R)} \le \sqrt{2}\, \|dx\|_{H^1}. \tag{A3}$$
Proof. Let $x \in H^1$. By the discussions in Sect. 1, the finite martingale $(x_0, \cdots, x_n, x_n, \cdots)$ converges to $x$ in $H^1$. This, together with (A3), implies the convergence of $x_n$ in $L^1$. Thus it remains to show (A3); also it suffices to show the first inequality of (A3) for the second one is trivial. To this end fix $n \ge 0$, and choose $a \in L^1(M_n)$ such that $\|a\|_\infty \le 1$ and $\|x_n\|_1 = \tau(a^* x_n)$. Put $a_k = E_k(a)$ for $k \ge 0$. Then $a_k = a$ for all $k \ge n$, and
$$\begin{aligned}
\|x_n\|_1 &= \tau\Bigl(\sum_{k=0}^{n} da_k^*\, dx_k\Bigr) = \tau\Bigl(\sum_{k=0}^{\infty} da_k^*\, dx_k\Bigr) \\
&\le \|dx\|_{L^1(M; l^2_C) + L^1(M; l^2_R)}\; \|da\|_{\frac{L^\infty(M; l^2_C) \cap L^\infty(M; l^2_R)}{(H^1)^\perp}}.
\end{aligned}$$
However, by the preceding theorem
$$\frac{L^\infty(M; l^2_C) \cap L^\infty(M; l^2_R)}{(H^1)^\perp} = (H^1)^* \cong \mathrm{BMO}.$$
Therefore, by (A1)
$$\|da\|_{\frac{L^\infty(M; l^2_C) \cap L^\infty(M; l^2_R)}{(H^1)^\perp}} \le \sqrt{2}\, \|a\|_{\mathrm{BMO}} \le \sqrt{2}\, \|a\|_\infty \le \sqrt{2}.$$
Combining the previous inequalities we obtain (A3 ), and thus complete the proof of the corollary. Acknowledgement. We are very grateful to Philippe Biane for several fruitful conversations, and also to Eric Carlen for kindly providing us with a copy of a preliminary version of [CK].
References

[AvW] Accardi, L., von Waldenfels, M. (Eds.): Quantum probability and applications. (Proc. 1988) Springer Lecture Notes 1442. Berlin–Heidelberg–New York: Springer-Verlag, 1990
[BSW1] Barnett, C., Streater, R.F., Wilde, I.F.: The Itô-Clifford Integral. J. Funct. Analysis 48, 172–212 (1982); The Itô-Clifford Integral II - Stochastic differential equation. J. London Math. Soc. 27, 373–384 (1983); The Itô-Clifford Integral III – Markov property of solutions to Stochastic differential equation. Commun. Math. Phys. 89, 13–17 (1983); The Itô-Clifford Integral IV – A Radon-Nikodym theorem and bracket processes. J. Operator Theory 11, 255–211 (1984)
[BSW2] Barnett, C., Streater, R.F., Wilde, J.F.: Stochastic integrals in an arbitrary probability gauge space. Math. Proc. Camb. Phil. Soc. 94, 541–551 (1983)
[BGM] Berkson, E., Gillespie, T.A., Muhly, P.S.: Abstract spectral decompositions guaranteed by the Hilbert transform. Proc. Lond. Math. Soc. 53, 489–517 (1986)
[B1] Bourgain, J.: Vector valued singular integrals and the $H^1$-BMO duality. Probability theory and harmonic analysis (Chao-Woyczynski, ed.), New York: Dekker, 1986, pp. 1–19
[B2] Bourgain, J.: Some remarks on Banach spaces in which martingale differences are unconditional. Arkiv för Mat. 21, 163–168 (1983)
[BR] Bratteli, O., Robinson, D.W.: Operator algebras and quantum statistical mechanics II. Berlin–Heidelberg–New York: Springer-Verlag, 1981
[Bu1] Burkholder, D.: Distribution function inequalities for martingales. Ann. Probab. 1, 19–42 (1973)
[Bu2] Burkholder, D.: A geometrical characterization of Banach spaces in which martingale difference sequences are unconditional. Ann. Probab. 9, 997–1011 (1981)
[C] Cook, J.M.: The mathematics of second quantization. Trans. Am. Math. Soc. 74, 222–245 (1953)
[CK] Carlen, E.A., Krée, P.: On martingale inequalities in noncommutative stochastic analysis. Preprint, 1996
[CL] Carlen, E., Lieb, E.: Optimal hypercontractivity for Fermi fields and related non-commutative integration inequalities. Commun. Math. Phys. 155, 27–46 (1993)
[G] Garsia, A.M.: Martingale inequalities, Seminar Notes on Recent Progress. New York: Benjamin Inc., 1973
[Gr1] Gross, L.: Existence and uniqueness of physical ground states. J. Funct. Analysis 10, 52–109 (1972)
[Gr2] Gross, L.: Hypercontractivity and logarithmic Sobolev inequalities for the Clifford-Dirichlet form. Duke Math. J. 42, 383–396 (1975)
[HP] Haagerup, U., Pisier, G.: Factorization of analytic functions with values in non-commutative $L^1$-spaces. Canadian J. Math. 41, 882–906 (1989)
[LP] Lust-Piquard, F.: Inégalités de Khintchine dans $C_p$ ($1 < p < \infty$). C. R. Acad. Sci. Paris 303, 289–292 (1986)
[LPP] Lust-Piquard, F., Pisier, G.: Noncommutative Khintchine and Paley inequalities. Arkiv för Mat. 29, 241–260 (1991)
[M] Meyer, P.A.: Quantum probability for probabilists. Lect. Notes Math. 1538, Berlin–Heidelberg–New York: Springer-Verlag, 1995
[Mi] Mitrea, M.: Clifford wavelets, singular integrals and Hardy spaces. Lect. Notes Math. 1575, Berlin–Heidelberg–New York: Springer-Verlag, 1994
[Pa] Paley, R.E.A.C.: A remarkable series of orthogonal functions (I). Proc. Lond. Math. Soc. 34, 241–264 (1932)
[P] Pisier, G.: Non-commutative vector valued $L^p$-spaces and completely p-summing maps. (Revised February 97). To appear in Astérisque, Soc. Math. France
[PX] Pisier, G., Xu, Q.: Inégalités de martingales non commutatives. C. R. Acad. Sci. Paris 323, 817–822 (1996)
[PR] Plymen, R.J., Robinson, P.L.: Spinors in Hilbert space. Cambridge: Cambridge University Press, 1994
[S] Segal, I.E.: Tensor algebra over Hilbert spaces II. Ann. Math. 63, 160–175 (1956)
[St] Stein, E.M.: Topics in harmonic analysis related to the Littlewood-Paley theory. Princeton, N. J.: Princeton University Press, 1970
[TJ] Tomczak-Jaegermann, N.: The moduli of convexity and smoothness and the Rademacher averages of trace class $S_p$. Studia Math. 50, 163–182 (1974)
[V] Voiculescu, D.V.: Limit laws for random matrices and free products. Invent. Math. 104, 201–220 (1991)
[VDN] Voiculescu, D.V., Dykema, K.J., Nica, A.: Free Random variables. CRM Monograph Series, Vol. 1, Centre de Recherches Mathématiques, Université de Montréal, 1992
Communicated by A. Connes
Commun. Math. Phys. 189, 699 – 707 (1997)
Communications in
Mathematical Physics c Springer-Verlag 1997
Invariants in the Enveloping Algebra of a Semisimple Lie Algebra for the Adjoint Action of a Nilpotent Lie Subalgebra

Philippe Caldero

Institut de Mathématique et d'Informatique, Bat. 101, Université Claude Bernard Lyon I, 69622 Villeurbanne, France. E-mail: [email protected]

Received: 20 December 1996 / Accepted: 21 March 1997
Abstract: Let $\mathfrak{g}$ be a complex semisimple Lie algebra and let $S(\mathfrak{g})$, resp. $U(\mathfrak{g})$, be its symmetric, resp. enveloping, algebra. It is shown in [1] that finite W-algebras can be realized, up to a central extension, as the algebra of invariants $S(\mathfrak{g})^{\mathfrak{m}}$, resp. $U(\mathfrak{g})^{\mathfrak{m}}$, for the adjoint action of a Lie subalgebra $\mathfrak{m}$ of $\mathfrak{g}$. For $\mathfrak{m}$ nilpotent, we use the Taylor lemma to give a description (generators and relations) for these algebras.

0. Introduction

Let $\mathfrak{g}$ be a complex semisimple Lie algebra with enveloping algebra $U(\mathfrak{g})$. We fix a Cartan subalgebra $\mathfrak{h}$ of $\mathfrak{g}$. Let $\mathfrak{g} = \mathfrak{n} \oplus \mathfrak{h} \oplus \mathfrak{n}^-$ be a triangular decomposition of $\mathfrak{g}$. Recall, cf. [7], [Kostant, unpublished] (see also Remark 2.1), that the center of $U(\mathfrak{n})$ is a polynomial algebra, and let $z$ be the product of its natural generators. Then $\{z^n\}$ is an Ore set; let $U(\mathfrak{g})_z$ be the localized algebra. In [1, 2], a class of algebras called finite W-algebras is shown to consist, up to central extension, of invariant algebras of type $U(\mathfrak{g})_z^{\mathfrak{m}}$, where $\mathfrak{m}$ is a Lie subalgebra of $\mathfrak{g}$. The aim of this article is to describe these algebras when $\mathfrak{m}$ is nilpotent. We may without loss of generality restrict ourselves to the case where $\mathfrak{m}$ is a Lie subalgebra of $\mathfrak{n}$.

The paper is organized as follows. Let $A$ be an associative algebra and $\mathfrak{m}$ a Lie algebra of locally finite derivations on $A$. Then $\mathfrak{m}$ is a nilpotent Lie algebra and we may fix a central sequence for $\mathfrak{m}$. This sequence permits us to apply the Taylor Lemma by induction, cf. Proposition 1.1. Therefore, we can give a description of $A$ as an extension of the algebra of invariants $A^{\mathfrak{m}}$ and a projection $\varphi$ from $A$ onto $A^{\mathfrak{m}}$. Let $R^+$ be the set of positive roots relative to the triangular decomposition of $\mathfrak{g}$. We follow [7] for the construction of a maximal set of strongly orthogonal roots and an ordering in $R^+$, cf. 1.3. For each $\operatorname{ad}\mathfrak{h}$-stable Lie subalgebra $\mathfrak{m}$ of $\mathfrak{n}$, we may define, cf. 2.3, a central sequence which is compatible with this ordering.

In the second section, we study the adjoint action of $\mathfrak{m}$ on the (localized) symmetric algebra $S(\mathfrak{g})_z$. This algebra being commutative, the projector $\varphi$ is an algebra projector.
Let $\mathfrak{m}$ be an almost saturated Lie algebra, cf. 2.2; then, up to central extension, $S(\mathfrak{g})_z^{\mathfrak{m}}$ is isomorphic to the symmetric algebra of a quotient Lie algebra of $\mathfrak{g}$, cf. Proposition 2.2. Notice that this isomorphism is not compatible with the Poisson-Kirillov structure of these algebras. Nevertheless, in the case where $\mathfrak{m}$ is saturated, cf. 2.2, we can give generators and relations for $S(\mathfrak{g})_z^{\mathfrak{m}}$, cf. Theorem 2.4 and Remark 2.4.

In the third section, we first use the Taylor Lemma to describe $U(\mathfrak{g})_z$ as an extension of $U(\mathfrak{g})_z^{\mathfrak{n}}$ (see [3] for a quantum analog of these results). T. Levasseur pointed out to me that this method was used in [8] and [9] in the framework of the adjoint action of a nilpotent Lie algebra on its enveloping algebra $U$. Recall that in this case, the invariant algebra is in the center and $U$ is a central extension of a Weyl algebra. We then obtain generators and relations for $U(\mathfrak{g})_z^{\mathfrak{n}}$ (this can be done in the same way for a saturated Lie algebra). $U(\mathfrak{g})_z^{\mathfrak{n}}$ has a polynomial base and the degree of the relations is lower than 2. Moreover, $U(\mathfrak{g})_z^{\mathfrak{n}}$ has a triangular decomposition with a Cartan subalgebra of dimension $\operatorname{rk}\mathfrak{g}$. The algebra $U(\mathfrak{g})_z^{\mathfrak{n}}$ is similar to an enveloping algebra of a semisimple Lie algebra. As in [5], we may define Verma modules, the O-category and the Shapovalov determinant. For $\mathfrak{g} = \mathfrak{sl}_3$, $U(\mathfrak{g})_z^{\mathfrak{n}}$ belongs to a class of algebras similar to $U(\mathfrak{sl}_2)$ defined by Smith [10].
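1. Notations and Preliminaries

1.1. Let $A$ be a $\mathbb{C}$-algebra and $\mathfrak{n}$ be a finite dimensional Lie algebra of locally nilpotent derivations on $A$, $\dim \mathfrak{n} = N$. Then $\mathfrak{n}$ is a nilpotent Lie algebra and admits a central sequence
$$\mathfrak{n} = \mathfrak{n}_1 \supset \mathfrak{n}_2 \supset \ldots \supset \mathfrak{n}_{N+1} = \{0\}$$
of ideals such that $\dim \mathfrak{n}_i/\mathfrak{n}_{i+1} = 1$. For $i$, $1 \le i \le N$, fix $X_i \in \mathfrak{n}_i$, $X_i \notin \mathfrak{n}_{i+1}$. Then $X_i$ acts on the invariant subalgebra $A^{\mathfrak{n}_{i+1}}$, and let $S$ be the set of $i$ such that this action is non trivial. Suppose that for each $i$ in $S$, there exists $a_i \in A^{\mathfrak{n}_{i+1}}$ such that $X_i a_i = 1$. By convention, we set $a_i = 0$ for $i \notin S$. Now, for $i \in S$, let's define the endomorphism of $A^{\mathfrak{n}_{i+1}}$ by
$$\varphi_i := \sum_{n\ge 0} \frac{(-1)^n}{n!}\, a_i^n X_i^n.$$
Let $(I) = (n_i)$ in $\mathbb{N}^S$. Set $X(I) := \prod^{\rightarrow} X_i^{n_i}$ (for the natural ordering), $a(I) := \prod^{\leftarrow} a_i^{n_i}$ (for the reverse ordering), and $(I)! = \prod n_i!$. The following proposition is a revisited version of the Taylor lemma:

Proposition. We have:
(i) $A = \bigoplus_{(I)\in\mathbb{N}^S} a(I)\, A^{\mathfrak{n}}$.
(ii) $\varphi := \prod^{\rightarrow} \varphi_i$ is a projector from $A$ to $A^{\mathfrak{n}}$ whose kernel is $\bigoplus_{(I)\ne\emptyset} a(I)\, A^{\mathfrak{n}}$.
(iii) $\mathrm{Id}_A = \sum_{(I)\in\mathbb{N}^S} \frac{a(I)}{(I)!}\, \varphi\, X(I)$.
(iv) If $A$ is commutative then $\varphi$ is an algebra morphism and $A^{\mathfrak{n}} \simeq A/(a_i, i \in S)$.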
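To make the proposition concrete, here is a minimal sketch in the simplest possible setting, which is chosen here purely for illustration and is not an example from the paper: $A = \mathbb{Q}[x]$, $\mathfrak{n}$ spanned by the single derivation $X = d/dx$, and $a = x$ (so that $Xa = 1$). In that case $\varphi = \sum_n \frac{(-1)^n}{n!} x^n \frac{d^n}{dx^n}$ sends $f$ to its constant term $f(0)$, and (iii) is the usual Taylor formula. Polynomials are stored as coefficient lists, and all helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def derivative(p):
    """Derivative of the polynomial with coefficient list p = [c0, c1, ...]."""
    return [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]

def times_x_power(p, n):
    """Multiply the polynomial p by x**n."""
    return [Fraction(0)] * n + list(p)

def add(p, q):
    m = max(len(p), len(q))
    p = list(p) + [Fraction(0)] * (m - len(p))
    q = list(q) + [Fraction(0)] * (m - len(q))
    return [u + v for u, v in zip(p, q)]

def scale(p, c):
    return [c * u for u in p]

def trim(p):
    """Drop trailing zero coefficients (keeping at least one entry)."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def phi(p):
    """phi = sum_n (-1)^n / n! * a^n X^n with X = d/dx and a = x."""
    result, dp = [Fraction(0)], list(p)
    for n in range(len(p)):
        term = scale(times_x_power(dp, n), Fraction((-1) ** n, factorial(n)))
        result = add(result, term)
        dp = derivative(dp)
    return trim(result)

def reconstruct(p):
    """Right-hand side of (iii): sum_n a^n / n! * phi(X^n p)."""
    result, dp = [Fraction(0)], list(p)
    for n in range(len(p)):
        term = scale(times_x_power(phi(dp), n), Fraction(1, factorial(n)))
        result = add(result, term)
        dp = derivative(dp)
    return trim(result)

f = [Fraction(c) for c in (5, -3, 0, 2)]   # f(x) = 5 - 3x + 2x^3
projected = phi(f)                          # the invariant part of f
rebuilt = reconstruct(f)                    # should reproduce f
```

The invariants here are the constants, `phi` is the projector of (ii), and `reconstruct` implements the decomposition of (iii); in this commutative toy case (iv) reads $\mathbb{Q} \simeq \mathbb{Q}[x]/(x)$.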
1.2. Let $\mathfrak{g}$ be a semisimple complex Lie algebra and $U := U(\mathfrak{g})$ its enveloping algebra. Let $n$ be the rank of $\mathfrak{g}$. We fix a Cartan subalgebra $\mathfrak{h}$ of $\mathfrak{g}$. Let $R$, resp. $R^+$, denote the set of all, resp. positive, roots with $\Delta = \{\alpha_i, 1 \le i \le n\}$ a simple system corresponding to $R^+$. Let $\mathfrak{g}_\alpha$, $\alpha \in R$, be the root subspace, and for each $\alpha$ in $R^+$, fix non zero elements $x_\alpha$ and $y_\alpha$ in, respectively, $\mathfrak{g}_\alpha$ and $\mathfrak{g}_{-\alpha}$. Define $h_\alpha$ the element of $\mathfrak{h}$ corresponding to the root $\alpha$. Set
$$\mathfrak{n} = \bigoplus_{\alpha\in R^+} \mathfrak{g}_\alpha, \qquad \mathfrak{n}^- = \bigoplus_{\alpha\in R^+} \mathfrak{g}_{-\alpha}.$$
Let $P$ denote the associated weight lattice generated by the fundamental weights $\varpi_i$, $1 \le i \le n$, and $P^+ := \sum_i \mathbb{N}\varpi_i$ the semigroup of dominant weights. Let $W$ be the Weyl group associated with $R$. We denote by $(\ ,\ )$ the $W$-invariant form on $P$, as well as its corresponding form on $\mathfrak{h}$. As usual, we write $\operatorname{ht}(\alpha)$ for the height of $\alpha$ and $\check{\alpha} = \frac{2\alpha}{(\alpha,\alpha)}$ for all $\alpha$ in $R$. We have $(\alpha_j, \varpi_i) = \delta_{ij}\frac{(\alpha_i,\alpha_i)}{2}$. Set $d_i = (\alpha_i, \varpi_i)$.

For each Lie algebra $\mathfrak{m}$, let $S(\mathfrak{m})$, resp. $U(\mathfrak{m})$, denote the symmetric, resp. enveloping, algebra of $\mathfrak{m}$. $S(\mathfrak{m})$ is endowed with the so-called Poisson-Kirillov structure defined through the gradation functor, [6, 2.8.7]. Let $\{\ ,\ \}$ be the Poisson-Kirillov bracket. As in [4, 4.2], let $N_{\alpha,\beta}$, $\alpha, \beta \in R$, be the structure constants of $\mathfrak{g}$. In particular, $N_{\alpha,-\alpha} = 1$, $\alpha \in R^+$, $N_{-\alpha,-\beta} = -N_{\alpha,\beta}$, $\alpha, \beta \in R$, and $N_{\alpha,\beta} = 0$ if $\alpha + \beta$ is not a root.

1.3. We now follow the construction of [7]. Let's decompose $R$ into simple components $R_i$, $i \in \mathbb{N}$, with $\beta_i$ the highest root of $R_i$. Then the set of roots of $R_i$ which are orthogonal to $\beta_i$ is again a root system, and let $R_{ij}$, $j \in \mathbb{N}$, be its simple components. This (finite) process defines, inductively, a subset $\mathcal{K}$ of $\mathbb{N} \cup \mathbb{N}^2 \cup \ldots$, and for each $K$ in $\mathcal{K}$ a simple root system $R_K$ with highest root $\beta_K$. Then $M := \{\beta_K, K \in \mathcal{K}\}$ is a maximal set of strongly orthogonal roots. $\mathcal{K}$ admits a partial ordering through $L \le K$ if $L = (K, l_1, \ldots, l_t)$, $l_i \in \mathbb{N}$. For $K$ in $\mathcal{K}$, we define the following sets
$$R_K^+ = R^+ \cap R_K, \quad \Gamma_K = \{\delta \in R_K,\ (\delta, \beta_K) > 0\}, \quad \Delta_K = \Delta \cap \Gamma_K, \quad \Gamma_K^o = \Gamma_K \setminus \{\beta_K\}.$$
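The inductive construction of 1.3 can be traced explicitly in type A. The sketch below is an illustration under stated assumptions rather than the construction of [7] in general: it takes $\mathfrak{g} = \mathfrak{sl}_n$ with positive roots $e_i - e_j$ ($i < j$), where the roots orthogonal to the highest root $e_{lo} - e_{hi}$ are exactly those supported on the indices strictly between `lo` and `hi`, so the index set is a chain. All function names are hypothetical. It builds the pairs $(\beta_K, \Gamma_K)$ so that the properties recorded in the lemma that follows (disjoint union, $(\beta_K, \alpha) = \frac{1}{2}(\beta_K, \beta_K)$, and $\beta_K - \alpha \in \Gamma_K^o$) can be checked directly.

```python
from itertools import combinations

def positive_roots(n):
    """Positive roots e_i - e_j (i < j) of sl_n, encoded as index pairs (i, j)."""
    return list(combinations(range(n), 2))

def vec(root, n):
    """Coordinates of the root e_i - e_j in Z^n."""
    i, j = root
    v = [0] * n
    v[i], v[j] = 1, -1
    return v

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cascade(n):
    """List of (beta_K, Gamma_K) pairs for sl_n (type A only)."""
    result = []
    lo, hi = 0, n - 1
    while lo < hi:
        beta = (lo, hi)                      # highest root of the current subsystem
        b = vec(beta, n)
        gamma = [r for r in positive_roots(n)
                 if lo <= r[0] and r[1] <= hi and dot(vec(r, n), b) > 0]
        result.append((beta, gamma))
        lo, hi = lo + 1, hi - 1              # pass to the roots orthogonal to beta
    return result

def check_lemma(n):
    """Check properties (i) and (ii) of the lemma for sl_n."""
    pairs = cascade(n)
    union = sorted(r for _, gamma in pairs for r in gamma)
    ok = union == sorted(positive_roots(n)) and len(union) == len(set(union))
    for beta, gamma in pairs:
        b = vec(beta, n)
        for alpha in gamma:
            if alpha == beta:
                continue
            a = vec(alpha, n)
            ok = ok and 2 * dot(a, b) == dot(b, b)
            diff = [x - y for x, y in zip(b, a)]          # beta - alpha
            ok = ok and any(vec(r, n) == diff for r in gamma if r != beta)
    return ok

strongly_orthogonal_sl4 = [beta for beta, _ in cascade(4)]
```

For $\mathfrak{sl}_4$ the list `strongly_orthogonal_sl4` consists of $e_0 - e_3$ and $e_1 - e_2$ (with indices starting at 0), which are orthogonal and whose sum and difference are not roots.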
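The following lemma results from [7]:

Lemma. We have:
(i) $R^+$ is a disjoint union of the $\Gamma_K$, $K \in \mathcal{K}$,
(ii) let $K \in \mathcal{K}$ and $\alpha \in \Gamma_K^o$, then $(\beta_K, \alpha) = \frac{1}{2}(\beta_K, \beta_K)$, and $\bar{\alpha} := \beta_K - \alpha \in \Gamma_K^o$,
(iii) let $\alpha \in \Gamma_K$, $\delta \in R^+$. If $\alpha + \delta$ is a root, then it belongs to $\Gamma_L$, $L \ge K$. If $\beta_K + \delta$ is a root, then it belongs to $\Gamma_L$, $L > K$,
(iv) let $\alpha \in \Gamma_K$, $\delta \in \Gamma_L^o$, $K < L$; if $\delta - \alpha$ is a root, then it belongs to $\Gamma_L^o$.

For $\alpha$ in $R^+$, let $K(\alpha)$ denote the element of $\mathcal{K}$ such that $\alpha \in \Gamma_{K(\alpha)}$, cf. (i). Set $K(i) = K(\alpha_i)$.

2. Invariants in the Symmetric Algebra

Let $\mathfrak{m}$ be an $\mathfrak{h}$-stable Lie subalgebra of $\mathfrak{n}$. Then $\mathfrak{m}$ acts on $S(\mathfrak{g})$ and $U(\mathfrak{g})$ by the adjoint action. In order to describe the decomposition given by Proposition 1.1 (i), we first need to study the algebra $S(\mathfrak{n})^{\mathfrak{n}}$.

2.1. Fix a lexicographic (total) ordering $\prec$ in $R^+$ such that $\alpha \prec \beta \Rightarrow (K(\alpha), \operatorname{ht}(\alpha)) < (K(\beta), \operatorname{ht}(\beta))$. Set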
nα = ⊕δα nδ ,
nα = ⊕δα nδ .
By Lemma 1.3, nα , α ∈ R+ is a central sequence for n. Remark that, by Lemma 1.3 (iii): ad yα (xβK + S(nβK )) = NβK ,−α xα + S(nβK ),
K = K(α).
Now, let’s define inductively in an appropriate localization of S(n) Q S zK = ( φδ )yβK , for δ ∈ L>K 0oL , → Y z>K = zL , K ∈ K,
(2.1.1)
(2.1.2)
L>K
2 −1 ad yα (zK(α) )zK(α) , α ∈ R+ \M, K = K(α), (βK , βK ) Y = zL , K ∈ K.
aα = z>K
L>K
Q
As in 1.1, we can construct projectors φα , for α ∈ R+ \M and φ :=
→
φα . This gives:
Proposition. Each zK , K ∈ K is central in the localized Poisson-Kirillov-algebra S(n)z>K . Let z be their product, then φ is an (algebra) projector from S(n)z onto S(n)nz . Proof. Notice first that if zK(α) is adn-invariant, then, by Lemma 1.3 (ii) ad xα (aα ) = 1. Moreover, aα is nα -invariant. Now, by Proposition 1.1 (iv) and (2.1.1), we have by (inverse) induction on K [ [ nβ S(n)z>KK ' S(n)z>K /(aα , α ∈ 0oL ) ' S(n)z>K /(xα , α ∈ 0oL ). (2.1.3) L>K
L>K
Indeed, suppose this is true for L > K. Then, by Lemma 1.3 (iii), the image of yβK is adnβK invariant in the third algebra of (2.1.3). As well, the adjoint action of yβK is trivial on this algebra. This proves the proposition. Remark. This implies that the localized algebra S(n)nz is isomorphic to the algebra of the group generated by M. [7, 4.12] is then a direct consequence of this fact. 2.2. For β ∈ M, set β = βK and aβ =
1 −1 ad yβ (zK )zK . (β, β)
(2.2.1)
By construction, ad xβ (aβ ) = 1. Fix an h-stable Lie subalgebra m of n and let (m) be its set of weights. Let z := z(m) be the product of the zK such that K = K(α) for α in m. We now study the adjoint action of m on S(g)z . We start with some definitions. Definition. The Lie subalgebra m above is called saturated if α ∈ (m) ⇒ 0K(α) ⊂ (m). It is called almost saturated, if it verifies the following property: let δ ∈ 0L , α ∈ 0K ∩ (m), L > K, with δ − α ∈ R+ , then δ − α ∈ (m).
Note that n is a saturated Lie algebra. A saturated Lie subalgebra of n is clearly almost saturated. If m is the nilradical of a parabolic subalgebra of g, then it is an almost saturated Lie subalgebra (but generally not saturated). For α in R+ , with β = βK(α) , set εα = N−α,β . By [4, Theorem 4.2.1], we have 2 εα = ±1 and εα = −εα . Set nα = (β,β) εα . Let’s start with the lemma: Lemma. Let α ∈ R+ \M, K = K(α) = K(β), and β ∈ M. We have: (i)
zK = xβK + S(nβK )z>K .
−1 + S(nβK )z>K , (ii) aα = nα xα zK
aβ = −
1 −1 hβ zK + S(nβ )z>K . (β, β)
(iii) S(nβ )z>K is an adn−α -module. S (iv) The algebra S(nβ )z>K is generated by aδ , δ ∈ L>K 0oL , and zL , L > K. (v) Up to a multiplicative scalar, ( aα−δ if α − δ ∈ 0K if δ = α ad xδ (aα ) = 1 0 if not, and ad xα (aβ ) = aα . Proof. (i) follows from the definition of zK . (ii) and (v) are consequences of (2.1.1), (2.1.2), (2.2.1). (iv) can be deduced, by induction, from (ii). (iii) is consequence of Lemma 1.3 (v). This leads to the following proposition. Proposition. Let m be an almost saturated Lie subalgebra of n, then β S(g)m z ' S(g)z /(xα , h ; α ∈ (m)\M, β ∈ (m) ∩ M).
Proof. Let α ∈ (m)\M, K = K(α). From (i) and (ii) of the previous lemma and Lemma 1.3 (iv), we have, up to a scalar S aα = xα + IK , where IK , is the ideal in S(nβK )z>K generated by the xδ−α , δ ∈ L>K 0oL . As m is almost saturated, IK is generated by elements xµ , µ ∈ (m). β Now, by Proposition 1.1 (iv), S(g)m z ' S(g)z /(aα , a ; α, β ∈ (m)). This proves the proposition. Remark. This proposition shows that, up to a central extension, the invariant algebra S(g)m z is the algebra of regular functions on a (localized) subspace of g. In [1], xα ’s are the constraint conditions and the hβ ’s are the symmetry fixing. 2.3. Fix a saturated Lie subalgebra m and L let m = dim m. From Proposition 1.1 (i), we have the (ordered) decomposition S(g)z = (I)∈Nm a(I) S(g)m z , where a(I) is an ordered product of aα ’s and aβ ’s. Before calculating the commutation relations inside S(g)m z , we first study the Poisson commutations in S(g)z involving the aα ’s and aβ ’s. This is given by the following proposition. Proposition. Let α, δ in 0\M, K = K(α), and β, β 0 in M. Then:
(i)
{aα , aδ } =
−1 nα zK 0
(ii)
if δ = α . otherwise.
0
β
{a , aα } = (iii) {aα , −}, resp. {aβ +
{aβ , aβ } = 0, −1 1 2 aα zK
if K(α) = K(β) . otherwise
0
−1 zK β (β,β) h ), −},
is zero on S(g)m z .
Proof. Suppose that K(α) is not smaller than K(δ). Then, up to a scalar, {aα , aδ } = {aα , xδ } + {aα , S(nK(δ) )z>K(δ) } = 0, cf. Lemma 2.2 (ii), (v). Now, by Lemma 1.3 (ii) if K = K(α) = K(δ), α − δ belongs to 0K ∪ {0} iff α = δ. Thus, by Lemma 2.2 (v), if {aα , xδ } is non zero, then α = δ. This gives the “otherwise” part of (i). −1 by Lemma 2.2 (ii), (v). Thus, (i) holds. Note that {aα , aα } = nα zK (ii) is proved in the same manner. As m is saturated, S(g)m z PK-commutes with S(nK )z>K . This gives (iii). 2.4. We can now give generators and relations for S(g)nz . Let φ denotes the (algebra) projector on S(g)nz . Set w0δ := φ(hδ ), δ ∈ P,
wα := φ(yα ), α ∈ R+ ,
w−β := φ(xβ ) = zK(β) , β ∈ M. (2.4.1) Fix a set 3 of independent weights generating the orthogonal of M in P. Then, by Proposition 1.1, we have the following natural generators of S(g)nz : wα , α ∈ R+ ∪ −M, w0δ , δ ∈ 3. Moreover, by Proposition 1.1 (i), (iv), they generate a polynomial base for the space S(g)nz . Now, we calculate generators of S(g)nz by using the Taylor formula ( Proposition 1.1 (iii)) at order 1. Theorem. The Poisson bracket on S(g)nz is determined by 1) the w−β , β ∈ M are central elements. 2) 0 {w0δ , w0δ } = 0, {w0δ , wα } = −(δ, α)wα , δ ∈ 3, α ∈ R+ , 3)
X
{wα , wα0 } = Nα0 ,α wα+α0 +
K∈K, ν∈0K ∪{0}
ν Mα,α 0 wα−ν wα0 −ν¯ , w−βK
0
with wα−ν = w0α , if ν = α and wα0 −ν¯ = w0α , if ν¯ = α0 . The constants being given by ν 0, if ν ∈ 00K , Mα,α 0 = −nν Nν,−α Nν,−α ¯ β Mα,α 0 = −
NβK ,−α (βK , α0 ) , (βK , βK )
0 Mα,α 0 =
NβK ,−α0 (βK , α) . (βK , βK )
Proof. We sketch the proof of the last formula. Let I denote the kernel of φ. We have, by Proposition 1.1 (iii): yα = w α +
X
aν Nν,−α wα−ν +
X
aβ Nβ,−α wα−β [I 2 ].
This formula, and the corresponding one for α0 ,is given by Proposition 2.3: X
Nα0 ,α yα+α0 = {yα , yα0 } = {wα , wα0 } −
K∈K, ν∈0K ∪{0}
Acting by φ on both sides yields the desired formula.
ν Mα,α 0 wα−ν wα0 −ν¯ [I]. w−β
Remark. These formulas can be easily generalized for a saturated Lie subalgebra of n. Recall that if m is not saturated, Proposition 2.3 (iii) may not hold.
3. Nilpotent Invariants in the Enveloping Algebra With the help of the Taylor Lemma, cf. Proposition 1.1, we can transpose Proposition 2.3 and 2.4 inside the enveloping algebra of g. 3.1. As in the previous section, Proposition 1.1, (2.1.2), (2.2.1), (2.4.1) enable us to define inside some localization of U (g) the analog of φ, zK , z>K , K ∈ K, z, aα , α ∈ R+ \M, aβ , β ∈ M, wα α ∈ R+ , w0δ , δ ∈ P , w−β , β ∈ M. Replacing the PK-bracket by the usual bracket, the formulas of Proposition 2.3 remain exact in U (g)z (with the same proof). Now, in order to formulate an analog of Proposition 2.4, we need to introduce some more notation. 3.2. NR ordered as in 2.1. +
Definition. Let I be the set of I in NR such that Iα = 0 except for α ∈ R+ \M and + α ≺ α. Let I ∈ I, define I in NR by I α = Iα , α ∈ R+ \M, I β = 0, β ∈ M. For I ∈ I, + B ∈ N[M], let I B ∈ NR be such that IαB = Iα , α ∈ R+ \M, Iβ = Bβ , β ∈ M. Define +
β
I in the same way. For I ∈ NR , set +
| I |=
X
Iα ,
| I |K =
X
Iα ,
I! =
α∈0o K
α∈R+
Q
xI =
→
xIαα ,
nI =
Y
Iα !,
αI =
X
Iα α,
α∈R+
Y
nIαα ,
zK(I) =
Y
Iα zK(α) .
Let aI be as in 1.1. For α ∈ R+ , let NI,−α be the constant such ad xI (yα ) = NI,−α yα−αI . In order to understand the commutations inside U (g)nz , we need to compute the commutators [aI wα , aI 0 wα0 ] modulo Ker φ, cf Proposition 1.1 (ii).
Lemma. Let I ∈ I, B ∈ N[M], and let wλ and wλ0 be elements in U (g)nz of weights λ and λ0 in P. Then: −1 φ[aI wλ , aI wλ0 ] = (−1)|I|+1 I!nI zK(I) w λ0 w λ ,
φ[aI B wλ , aI wλ0 ] = (−1)|B|
Y | I |K(β) (β, λ0 ) −1 + )Bβ . zK(B) ( φ[aI wλ , aI wλ0 ], 2 (β, β)
β∈M
φ[wλ , aB wλ0 ] = (−1)|B|
Y (β, λ0 ) −1 )Bβ . zK(B) ( w λ w λ0 . (β, β)
β∈M
We can now state the analog of Theorem 2.4 for U (g)n . Theorem. The algebra U (g)z has the following decomposition: M aI U (g)n . U (g)z = I∈NR+
Moreover, the algebra U (g)nz is generated by the wα ’s, α ∈ R+ , the w0δ ’s, δ ∈ 3 and ±1 , β ∈ M. The relations are: the central elements w−β 0
{w0δ , w0δ } = 0, {wα , wα0 } = Nα0 ,α wα+α0
{w0δ , wα } = −(δ, α)wα , δ ∈ 3, α ∈ R+ , X NI,−α NI 0 ,−α0 φ[aI wα−αI , aI 0 wα0 −αI 0 ], − I!I 0 ! 0 I,I
where the non zero images of the commutators are given by the previous lemma. Proof. The first assertion follows from Proposition 1.1 (i). Now, let’s prove that the algebra U (g)nz is generated by the given elements. Let u, v ∈ U (g)z . Applying Proposition 1.1 (iii) to the equality Id(uv) = Id(u).Id(v), we obtain that φ(uv) is a linear combination of φ(ad xI (u))φ(ad xJ (v)). The adjoint action of n being nilpotent on U (g)z , we obtain the assertion. As in Theorem 2.4, the proofs of the commutations formulas are straightforward. Remark. In the above proof, we showed that φ(uv) is a combination of φ(ad xI (u))φ(ad xJ (v)). This may be precised by a formula provided by Proposition 1.1 (iii) and the previous lemma. ±1 , β ∈ M] and W 0 = R[w0δ , wβ , δ ∈ 3, β ∈ M]. By Theorem 3.2, 3.3. Set k = C[w−β 0 W is a commutative subalgebra of U (g)nz (this is an analog of the Cartan subalgebra but the wβ are not semisimple). Let W + , resp. W − , the k-space with polynomial base generated by the wα ’s, α ∈ R+ \M, α ≺ α, resp. α α. We have the following decomposition W := U (g)nz = W + X ⊗k W 0 ⊗k W − . For λ in k M × k 3 , we can define the Verma module M (λ) := W/ Wwα + W(wβ − λβ ) + W(w0δ − λδ ), where α,β,δ
$w_\alpha \in W^+$. The algebra $U(\mathfrak{g})_z^{\mathfrak{n}}$ is similar to an enveloping algebra of a semisimple Lie algebra. T. Levasseur pointed out to me that for $\mathfrak{g} = \mathfrak{sl}_3$, cf. [1], [2], [5], it belongs to a class of algebras similar to $U(\mathfrak{sl}_2)$ defined by Smith [10].

Acknowledgement. I am indebted to F. Barbarin, E. Ragoucy, and P. Sorba for introducing me to finite W-algebras and their connection with centralizers. I also want to thank T. Levasseur and O. Mathieu for valuable remarks and discussions.
References

1. Barbarin, F., Ragoucy, E., Sorba, P.: Non-polynomial realizations of W-algebras. To appear in Int. J. Mod. Phys. A
2. de Boer, J., Tjin, T.: Quantization and representation theory of finite W algebras. Commun. Math. Phys. 158, 485 (1993)
3. Caldero, P.: On the Gelfand-Kirillov conjecture for quantum algebras. In preparation
4. Carter, R.W.: Simple groups of Lie type. London: Interscience, John Wiley, 1972
5. de Vos, K., Van Driel, P.: The Kazhdan-Lusztig conjecture for finite W-algebras. Preprint
6. Dixmier, J.: Algèbres enveloppantes. Cahiers scientifiques no 37, Paris: Gauthier-Villars, 1974
7. Joseph, A.: A preparation theorem for the prime spectrum of a semisimple Lie algebra. J. Algebra 48, 241–289 (1977)
8. Joseph, A.: A generalization of the Gelfand-Kirillov conjecture. Am. J. Math. 99, 1151–1165 (1977)
9. Nouazé, Y., Gabriel, P.: Idéaux premiers de l'algèbre enveloppante d'une algèbre de Lie nilpotente. J. Algebra 6, 77–99 (1967)
10. Smith, S.P.: A class of algebras similar to the enveloping algebra of sl(2). Trans. Am. Math. Soc. 322, no 1, 285–314 (1990)

Communicated by R. H. Dijkgraaf
Commun. Math. Phys. 189, 709 – 728 (1997)
Communications in
Mathematical Physics c Springer-Verlag 1997
Completely Integrable Equation for the Quantum Correlation Function of Nonlinear Schrödinger Equation

T. Kojima1,*, V. E. Korepin2, N. A. Slavnov3

1 Research Institute for Mathematical Sciences, Kyoto University, Kyoto 606, Japan. E-mail: [email protected]
2 Institute for Theoretical Physics, State University of New York at Stony Brook, Stony Brook, NY 11794-3840, USA. E-mail: [email protected]
3 Steklov Mathematical Institute, Gubkina 8, Moscow 117966, Russia. E-mail: [email protected]

* Research Fellow of the Japan Society for the Promotion of Science.

Received: 15 January 1997 / Accepted: 21 March 1997

Abstract: Correlation functions of exactly solvable models can be described by differential equations [1]. In this paper we show that for the non-free fermionic case, differential equations should be replaced by integro-differential equations. We derive an integro-differential equation, which describes a time and temperature dependent correlation function $\langle\psi(0,0)\psi^\dagger(x,t)\rangle_T$ of the penetrable Bose gas. The integro-differential equation turns out to be the continuum generalization of the classical nonlinear Schrödinger equation.

1. Introduction

We consider exactly solvable models of statistical mechanics in one space and one time dimension. The Quantum Inverse Scattering Method and Algebraic Bethe Ansatz are effective methods for a description of the spectrum of these models. Our aim is the evaluation of correlation functions of exactly solvable models. Our approach is based on the determinant representation for correlation functions. It consists of a few steps: first, the correlation function is represented as a determinant of a Fredholm integral operator; second, the Fredholm integral operator is described by a classical completely integrable equation; third, the classical completely integrable equation is solved by means of the Riemann–Hilbert problem. This permits us to evaluate the long distance and large time asymptotics of the correlation function. The method is described in [2]. The most interesting correlation functions are time dependent correlation functions. The determinant representation for time and temperature dependent correlation functions of a quantum nonlinear Schrödinger equation was obtained in [4]. In this paper we describe the correlation function by means of a completely integrable integro-differential equation. In a forthcoming publication we shall formulate the Riemann–Hilbert problem for this equation and evaluate long distance asymptotics. The quantum nonlinear Schrödinger
710
T. Kojima, V. E. Korepin, N. A. Slavnov
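equation can be described in terms of canonical Bose fields $\psi(x,t)$, $\psi^\dagger(x,t)$ ($x\in\mathbb{R}$) obeying
$$[\psi(x,t),\psi^\dagger(y,t)]=\delta(x-y). \tag{1.1}$$
The Hamiltonian and momentum of the model are
$$H=\int dx\,\bigl(\partial_x\psi^\dagger(x)\,\partial_x\psi(x)+c\,\psi^\dagger(x)\psi^\dagger(x)\psi(x)\psi(x)-h\,\psi^\dagger(x)\psi(x)\bigr), \tag{1.2}$$
$$P=-i\int dx\,\psi^\dagger(x)\,\partial_x\psi(x). \tag{1.3}$$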
Here $0<c\le\infty$ is the coupling constant and $h>0$ is the chemical potential. The spectrum of the model was first described by E. H. Lieb and W. Liniger [5, 6]. The Lax representation for the corresponding classical equation of motion
$$[\psi,H]=-\frac{\partial^2}{\partial x^2}\psi+2c\,\psi^\dagger\psi\psi-h\psi, \tag{1.4}$$
was found by V. E. Zakharov and A. B. Shabat [7]. The Quantum Inverse Scattering Method for the model was formulated by L. D. Faddeev and E. K. Sklyanin [8]. The quantum nonlinear Schrödinger equation is equivalent to the Bose gas with delta-function interaction. In the sector with $N$ particles the Hamiltonian of the Bose gas is given by
$$H_N=-\sum_{j=1}^{N}\frac{\partial^2}{\partial z_j^2}+2c\sum_{1\le j<k\le N}\delta(z_k-z_j)-Nh. \tag{1.5}$$
In this paper we shall consider the thermodynamics of the model. The partition function and the free energy of the model are defined by
$$Z=\operatorname{tr}\,e^{-H/T}=e^{-F/T}. \tag{1.6}$$
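The free energy $F$ can be expressed in terms of the Yang–Yang equation [9],
$$\varepsilon(\lambda)=\lambda^2-h-\frac{T}{2\pi}\int_{-\infty}^{\infty}\frac{2c}{c^2+(\lambda-\mu)^2}\,\ln\!\left(1+e^{-\varepsilon(\mu)/T}\right)d\mu, \tag{1.7}$$
$$F=-\frac{T}{2\pi}\int_{-\infty}^{\infty}\ln\!\left(1+e^{-\varepsilon(\mu)/T}\right)d\mu. \tag{1.8}$$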
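As an illustration only, Eq. (1.7) can be solved numerically by simple fixed-point iteration, after which (1.8) is a single quadrature. The sketch below is not taken from the paper: the function name `solve_yang_yang`, the truncation `lam_max`, the uniform grid and the tolerance are all assumptions made for this example.

```python
import numpy as np

def solve_yang_yang(c, h, T, lam_max=10.0, n=400, tol=1e-12, max_iter=1000):
    """Fixed-point iteration for (1.7) on a truncated, uniformly spaced grid.

    Assumption of this sketch: the integrals over the real axis are replaced
    by rectangle-rule sums over [-lam_max, lam_max].
    """
    lam = np.linspace(-lam_max, lam_max, n)
    dlam = lam[1] - lam[0]
    # kernel 2c / (c^2 + (lambda - mu)^2) sampled on the grid
    kernel = 2.0 * c / (c**2 + (lam[:, None] - lam[None, :])**2)
    eps = lam**2 - h                      # starting guess: bare dispersion
    for _ in range(max_iter):
        # ln(1 + exp(-eps/T)), evaluated in a numerically stable way
        log_term = np.logaddexp(0.0, -eps / T)
        new_eps = lam**2 - h - T / (2.0 * np.pi) * (kernel @ log_term) * dlam
        converged = np.max(np.abs(new_eps - eps)) < tol
        eps = new_eps
        if converged:
            break
    log_term = np.logaddexp(0.0, -eps / T)
    free_energy = -T / (2.0 * np.pi) * np.sum(log_term) * dlam   # Eq. (1.8)
    return lam, eps, free_energy

lam, eps, F = solve_yang_yang(c=1.0, h=1.0, T=1.0)
```

For very large `c` the integral term becomes negligible and the iteration returns the bare dispersion $\lambda^2-h$, in line with the free fermion limit discussed in Appendix C.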
The correlation function, which we shall study in this paper, is defined by
$$\langle\psi(0,0)\psi^\dagger(x,t)\rangle_T=\frac{\operatorname{tr}\left(e^{-H/T}\,\psi(0,0)\psi^\dagger(x,t)\right)}{\operatorname{tr}\,e^{-H/T}}. \tag{1.9}$$
In the previous paper [4] we obtained the determinant representation for this correlation function. In this paper we shall derive completely integrable equations, starting from the determinant representation.

The plan of this paper is the following. In Sect. 2 we remind the reader of the determinant representation and the definition of dual fields. In Sect. 3 we introduce a new Hilbert space and rewrite the kernel of the integral operator in the canonical form. In Sect. 4 we define the resolvent of the integral operator. Sect. 5 is devoted to the construction of the Lax representation. In Sect. 6 we find the logarithmic derivatives of the Fredholm determinant and obtain the completely integrable equation describing the correlation function. In Sect. 7 we summarize the main results. In Appendix A we find some identities for potentials. We give the treatment of the quantum nonlinear Schrödinger equation as a continuum generalization of the classical equation in Appendix B. Appendix C is devoted to the free fermion limit.
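2. Determinant Representation for the Correlation Function

Our starting point is the determinant representation for the temperature correlation function of local fields obtained in [4],
$$\langle\psi(0,0)\psi^\dagger(x,t)\rangle_T=e^{-iht}\,(0|\left(G(x,t)+\frac{\partial}{\partial\alpha}\right)\frac{\det\left(\tilde I+\tilde V-\frac{\alpha}{2\pi}\tilde Y\right)}{\det\left(\tilde I-\frac{1}{2\pi}\tilde K_T\right)}|0)\,\Bigg|_{\alpha=0}. \tag{2.1}$$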
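Let us explain our notations. We begin with the numerator in the r.h.s. of (2.1), which is the Fredholm determinant of the integral operator $\tilde I+\tilde V-\frac{\alpha}{2\pi}\tilde Y$ (here $\tilde I$ is the identity operator: $I(\lambda,\mu)=\delta(\lambda-\mu)$). This operator acts on the real axis. The left and right actions on some trial function $f$ are given by
$$\left(\tilde I+\tilde V-\frac{\alpha}{2\pi}\tilde Y\right)\circ f(\mu)=f(\lambda)+\int_{-\infty}^{\infty}\left(V(\lambda,\mu)-\frac{\alpha}{2\pi}Y(\lambda,\mu)\right)f(\mu)\,d\mu,$$
$$f(\lambda)\circ\left(\tilde I+\tilde V-\frac{\alpha}{2\pi}\tilde Y\right)=f(\mu)+\int_{-\infty}^{\infty}f(\lambda)\left(V(\lambda,\mu)-\frac{\alpha}{2\pi}Y(\lambda,\mu)\right)d\lambda. \tag{2.2}$$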
Here and hereafter we denote by the symbol "$\circ$" the action of integral operators on functions. The kernels of the operators $\tilde V$ and $\tilde Y$ can be written in terms of auxiliary quantum operators (dual fields) acting in an auxiliary Fock space. One can find the detailed definition and properties of dual fields in Sect. 5 and Appendix C of [4]. Here we repeat them briefly.

Consider an auxiliary Fock space having vacuum vector $|0)$ and dual vector $(0|$. Three dual fields $\psi(\lambda)$, $\phi_{D_1}(\lambda)$ and $\phi_{A_2}(\lambda)$ acting in this space are defined as
$$\phi_{A_2}(\lambda)=q_{A_2}(\lambda)+p_{D_2}(\lambda),\qquad \phi_{D_1}(\lambda)=q_{D_1}(\lambda)+p_{A_1}(\lambda),\qquad \psi(\lambda)=q_\psi(\lambda)+p_\psi(\lambda). \tag{2.3}$$
Here $p(\lambda)$ are the annihilation parts of the dual fields: $p(\lambda)|0)=0$; $q(\lambda)$ are the creation parts of the dual fields: $(0|q(\lambda)=0$. Thus, any dual field is the sum of annihilation and creation parts. The nonzero commutation relations are (see [4])
$$[p_{A_1}(\lambda),q_\psi(\mu)]=\ln h(\mu,\lambda),\qquad [p_{D_2}(\lambda),q_\psi(\mu)]=\ln h(\lambda,\mu),$$
$$[p_\psi(\lambda),q_{A_2}(\mu)]=\ln h(\mu,\lambda),\qquad [p_\psi(\lambda),q_{D_1}(\mu)]=\ln h(\lambda,\mu),$$
$$[p_\psi(\lambda),q_\psi(\mu)]=\ln[h(\lambda,\mu)h(\mu,\lambda)],\qquad\text{where}\quad h(\lambda,\mu)=\frac{\lambda-\mu+ic}{ic}. \tag{2.4}$$
Recall that $c$ is the coupling constant in (1.2). It follows immediately from (2.4) that the dual fields belong to an Abelian subalgebra
$$[\psi(\lambda),\psi(\mu)]=[\psi(\lambda),\phi_a(\mu)]=[\phi_b(\lambda),\phi_a(\mu)]=0, \tag{2.5}$$
where $a,b=A_2,D_1$. This property, in fact, permits us to treat the dual fields as some c-number functions. Let us define the function $Z(\lambda,\mu)$:
$$Z(\lambda,\mu)=\frac{e^{-\phi_{D_1}(\lambda)}}{h(\mu,\lambda)}+\frac{e^{-\phi_{A_2}(\lambda)}}{h(\lambda,\mu)}. \tag{2.6}$$
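The kernel of $\tilde V$ is equal to
$$V(\lambda,\mu)=\frac{e^{\frac12(\phi_{D_1}(\lambda)+\phi_{A_2}(\lambda))}\,e^{\frac12(\phi_{D_1}(\mu)+\phi_{A_2}(\mu))}\sqrt{\theta(\lambda)}\sqrt{\theta(\mu)}}{4\pi^2(\lambda-\mu)}$$
$$\times\int_{-\infty}^{\infty}\frac{du}{Z(u,u)}\left(\frac{e^{-\phi_{A_2}(u)}}{u-\lambda-i0}+\frac{e^{-\phi_{D_1}(u)}}{u-\lambda+i0}-\frac{e^{-\phi_{A_2}(u)}}{u-\mu-i0}-\frac{e^{-\phi_{D_1}(u)}}{u-\mu+i0}\right)$$
$$\times\,e^{\psi(u)+\tau(u)}\,e^{-\frac12(\psi(\lambda)+\tau(\lambda)+\psi(\mu)+\tau(\mu))}\,Z(u,\lambda)Z(u,\mu). \tag{2.7}$$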
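The integral operator $\tilde Y$ is a one-dimensional projector
$$Y(\lambda,\mu)=P(\lambda)P(\mu), \tag{2.8}$$
where
$$P(\mu)=\frac{1}{2\pi}\,e^{\frac12(\phi_{D_1}(\mu)+\phi_{A_2}(\mu))}\sqrt{\theta(\mu)}\int_{-\infty}^{\infty}\frac{du}{Z(u,u)}\left(\frac{e^{-\phi_{A_2}(u)}}{u-\mu-i0}+\frac{e^{-\phi_{D_1}(u)}}{u-\mu+i0}\right)$$
$$\times\,e^{\psi(u)+\tau(u)}\,e^{-\frac12(\psi(\mu)+\tau(\mu))}\,Z(u,\mu). \tag{2.9}$$
Here the functions $\theta(\lambda)$ and $\tau(\lambda)$ are equal to
$$\theta(\lambda)=\left(1+\exp\left[\frac{\varepsilon(\lambda)}{T}\right]\right)^{-1}, \tag{2.10}$$
$$\tau(\lambda)=it\lambda^2-ix\lambda. \tag{2.11}$$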
The Fermi weight θ(λ) defines the dependence of the correlation function on temperature T . The energy of the one-particle excitation ε(λ) is given in (1.7). The function τ (λ) depends also on the distance x and the time t. All other functions entering expressions for V (λ, µ) and P (µ) do not depend on x and t.
We would like to draw the reader's attention to the fact that formulae (2.7) and (2.9) slightly differ from formulae (6.25) and (6.26) of [4]. It is explained in Appendices C and D of [4] how one can reduce formulae (6.25) and (6.26) to formulae (2.7) and (2.9). Thus, the integral operator $\tilde I+\tilde V-\frac{\alpha}{2\pi}\tilde Y$ is explained.

The operator $\tilde I-\frac{1}{2\pi}\tilde K_T$ also acts on the whole real axis. Its kernel is given by
$$K_T(\lambda,\mu)=\frac{2c}{c^2+(\lambda-\mu)^2}\sqrt{\theta(\lambda)}\sqrt{\theta(\mu)}. \tag{2.12}$$
Finally the function $G(x,t)$ in the r.h.s. of (2.1) is equal to
$$G(x,t)=\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{\psi(v)+\tau(v)}\,dv. \tag{2.13}$$
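The denominator of (2.1) involves only c-number quantities, so it can be approximated by a standard quadrature (Nyström) discretization of the kernel (2.12). The following is a minimal sketch under stated assumptions, not the method of the paper: the name `fredholm_det_KT`, the cutoff `lam_max` and the number of Gauss–Legendre nodes are choices made here, and in the usage line the bare dispersion $\lambda^2-h$ merely stands in for the solution $\varepsilon(\lambda)$ of (1.7).

```python
import numpy as np

def fredholm_det_KT(c, T, eps, lam_max=8.0, n=80):
    """Nystrom approximation of det(I - K_T / (2 pi)) with the kernel (2.12).

    `eps` is a callable returning the excitation energy eps(lambda);
    the real axis is truncated to [-lam_max, lam_max] (an assumption).
    """
    # Gauss-Legendre nodes and weights mapped from [-1, 1] to [-lam_max, lam_max]
    x, w = np.polynomial.legendre.leggauss(n)
    lam = lam_max * x
    w = lam_max * w
    theta = 1.0 / (1.0 + np.exp(eps(lam) / T))          # Fermi weight (2.10)
    lieb = 2.0 * c / (c**2 + (lam[:, None] - lam[None, :])**2)
    kernel = lieb * np.sqrt(theta)[:, None] * np.sqrt(theta)[None, :]
    # symmetric discretisation: sqrt(w_i) K(lam_i, lam_j) sqrt(w_j)
    sw = np.sqrt(w)
    matrix = np.eye(n) - (sw[:, None] * kernel * sw[None, :]) / (2.0 * np.pi)
    return np.linalg.det(matrix)

# placeholder dispersion with h = 1 (stand-in for the solution of (1.7))
d = fredholm_det_KT(c=1.0, T=1.0, eps=lambda lam: lam**2 - 1.0)
```

The symmetric weighting keeps the discretized matrix symmetric, which mirrors the symmetry of the kernel $K_T(\lambda,\mu)$.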
Thus, we have described the r.h.s. of (2.1). The temperature correlation function of local fields is proportional to the vacuum expectation in the auxiliary Fock space of the Fredholm determinant of the integral operator. The auxiliary quantum operators (dual fields) enter the kernels of $\tilde V$ and $\tilde Y$. Due to the property (2.5) the Fredholm determinant is well defined. Our aim now is a description of the correlation function in terms of solutions of classical completely integrable equations.

3. Vectors and Operators of New Hilbert Space

We introduce the functions $E_\pm$:
$$E_+(\lambda|u)=\frac{1}{2\pi}\frac{Z(u,\lambda)}{Z(u,u)}\left(\frac{e^{-\phi_{D_1}(u)}}{u-\lambda+i0}+\frac{e^{-\phi_{A_2}(u)}}{u-\lambda-i0}\right)\sqrt{\theta(\lambda)}\;e^{\psi(u)+\tau(u)+\frac12(\phi_{D_1}(\lambda)+\phi_{A_2}(\lambda)-\psi(\lambda)-\tau(\lambda))}, \tag{3.1}$$
$$E_-(\lambda|u)=\frac{1}{2\pi}\,Z(u,\lambda)\,e^{\frac12(\phi_{D_1}(\lambda)+\phi_{A_2}(\lambda)-\psi(\lambda)-\tau(\lambda))}\sqrt{\theta(\lambda)}. \tag{3.2}$$
The functions $E_\pm$ depend also on the distance $x$, the time $t$, the temperature $T$ and the chemical potential $h$, but this dependence as a rule is suppressed in the notation. One can rewrite expressions (2.7) and (2.9) for $V(\lambda,\mu)$ and $P(\mu)$ in terms of these functions,
$$V(\lambda,\mu)=\frac{1}{\lambda-\mu}\int_{-\infty}^{\infty}du\,\bigl(E_+(\lambda|u)E_-(\mu|u)-E_-(\lambda|u)E_+(\mu|u)\bigr), \tag{3.3}$$
$$Y(\lambda,\mu)=P(\lambda)P(\mu)=\int_{-\infty}^{\infty}du\,dv\,E_+(\lambda|u)E_+(\mu|v).$$
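Using the obvious equalities
$$\partial_x e^{\tau(\lambda)}=-i\lambda\,e^{\tau(\lambda)},\qquad \partial_t e^{\tau(\lambda)}=i\lambda^2 e^{\tau(\lambda)}, \tag{3.4}$$
we arrive at
$$\partial_x E_+(\lambda|u)=-\frac{i\lambda}{2}E_+(\lambda|u)-i\int_{-\infty}^{\infty}dv\,g(u,v)E_-(\lambda|v),\qquad \partial_x E_-(\lambda|u)=\frac{i\lambda}{2}E_-(\lambda|u), \tag{3.5}$$
$$\partial_t E_+(\lambda|u)=\frac{i\lambda^2}{2}E_+(\lambda|u)+\int_{-\infty}^{\infty}dv\,\bigl(i\lambda\,g(u,v)-\partial_x g(u,v)\bigr)E_-(\lambda|v),\qquad \partial_t E_-(\lambda|u)=-\frac{i\lambda^2}{2}E_-(\lambda|u). \tag{3.6}$$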
Here
$$g(u,v)=\delta(u-v)\,e^{\psi(v)+\tau(v)}. \tag{3.7}$$
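The first relation in (3.5) can be checked directly from the definitions. The following sketch of that computation uses only (2.6), (3.1), (3.2), (3.4) and (3.7), together with $h(u,u)=1$ and the distributional identity $(u-\lambda)/(u-\lambda\pm i0)=1$; it is offered as a plausibility check rather than as the authors' own derivation.

```latex
% Step 1: in (3.1) only e^{\tau(u)-\frac12\tau(\lambda)} depends on x, so by (3.4)
\partial_x E_+(\lambda|u)
  = \Bigl(-iu+\tfrac{i\lambda}{2}\Bigr)E_+(\lambda|u)
  = -\tfrac{i\lambda}{2}\,E_+(\lambda|u) - i\,(u-\lambda)\,E_+(\lambda|u).

% Step 2: since h(u,u)=1, (2.6) gives Z(u,u)=e^{-\phi_{D_1}(u)}+e^{-\phi_{A_2}(u)},
% and multiplying (3.1) by (u-\lambda) removes both poles:
(u-\lambda)\,E_+(\lambda|u)
  = e^{\psi(u)+\tau(u)}\,E_-(\lambda|u)
  = \int_{-\infty}^{\infty} dv\; g(u,v)\,E_-(\lambda|v).

% Step 3: inserting Step 2 into Step 1 reproduces the first relation in (3.5).
```

The time derivative in (3.6) follows in the same way, with $u^2-\tfrac12\lambda^2=\tfrac12\lambda^2+(u-\lambda)(u+\lambda)$ and $-iu\,g(u,v)=\partial_x g(u,v)$.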
The relations (3.5) and (3.6) are important for a description of the correlation function in terms of solutions of completely integrable equations. In order to do this it is necessary to investigate properties of the integral operator $\tilde I+\tilde V$. Later on we shall show how one can take into consideration the contribution of the projector $\tilde Y$.

In order to derive differential equations for the correlation function it is convenient to treat the functions $E_\pm$ as components of vectors of some Hilbert space $\mathcal H$, for example, rigged $L_2(-\infty,\infty)\otimes\mathbb{R}^2$. In order to do it let us introduce the bra-vector $\langle E^L(\lambda)|$ and the ket-vector $|E^R(\lambda)\rangle$ which belong to the Hilbert space $\mathcal H$. Both of these vectors have two discrete components (corresponding to the space $\mathbb{R}^2$), which we shall denote with indices 1 and 2. In turn any discrete component has a continuous "index" (corresponding to the rigged $L_2(-\infty,\infty)$ space) which we shall denote as "$u$" (or "$v$", "$w$", etc.). So
$$\langle E^L(\lambda)|=\bigl(E_1^L(\lambda|u),\,E_2^L(\lambda|u)\bigr),\qquad |E^R(\mu)\rangle=\begin{pmatrix}E_1^R(\mu|u)\\ E_2^R(\mu|u)\end{pmatrix}.$$
The definition of the scalar product is standard:
$$\langle E^L(\lambda)|E^R(\mu)\rangle=\int_{-\infty}^{\infty}du\,\bigl(E_1^L(\lambda|u)E_1^R(\mu|u)+E_2^L(\lambda|u)E_2^R(\mu|u)\bigr). \tag{3.8}$$
On the contrary, the products of type $|E^R(\mu)\rangle\langle E^L(\lambda)|$ are, as usual, operators in the Hilbert space $\mathcal H$. We shall consider such operators below. Let us identify
$$E_1^R(\mu|u)=E_+(\mu|u),\qquad E_1^L(\lambda|u)=-E_-(\lambda|u),$$
$$E_2^R(\mu|u)=E_-(\mu|u),\qquad E_2^L(\lambda|u)=E_+(\lambda|u). \tag{3.9}$$
Then one can rewrite the kernel of the integral operator $\tilde V$ as
$$V(\lambda,\mu)=\frac{\langle E^L(\lambda)|E^R(\mu)\rangle}{\lambda-\mu}. \tag{3.10}$$
Due to (3.9) we have
$$\langle E^L(\lambda)|E^R(\lambda)\rangle=0, \tag{3.11}$$
and hence the kernel $V(\lambda,\mu)$ is not singular at the point $\lambda=\mu$.
The representation (3.10) is the canonical form of the kernels of the completely integrable integral operators. In all examples related to correlation functions, the kernels of integral operators can be presented in the form (3.10). The different realizations of the space $\mathcal H$ correspond to the concrete correlation functions. For example, in the free fermion situation (the coupling constant $c$ goes to infinity) $\mathcal H=\mathbb{R}^n$, where $n$ is the number of fields [10–14]. For equal-time correlation functions of penetrable bosons the representations of type (3.10) were constructed with $\mathcal H=L_2(0,\infty)\otimes\mathbb{R}^{2n}$ in [12, 15, 16] (see also Sect. XIV of [2]). In the present paper we shall follow the method developed in the papers enumerated above.

Operators acting in the space $\mathcal H$ are defined in the standard way. They have discrete and continuous indices: $\hat A=A_{jk}(u,v)$, $j,k=1,2$; $-\infty<u,v<\infty$. We shall denote these operators with the sign "hat" in order to distinguish them from integral operators, which we have denoted with the sign "tilde". The action on vectors is given by
$$\hat A|E^R(\mu)\rangle=\sum_{k=1}^{2}\int_{-\infty}^{\infty}A_{jk}(u,v)E_k^R(\mu|v)\,dv,\qquad \langle E^L(\lambda)|\hat A=\sum_{j=1}^{2}\int_{-\infty}^{\infty}E_j^L(\lambda|u)A_{jk}(u,v)\,du. \tag{3.12}$$
On the contrary, the integral operators of type $\tilde I+\tilde V$ appear to be scalars relative to the space $\mathcal H$, for example
$$\bigl(\tilde I+\tilde V\bigr)\circ|E^R(\mu)\rangle=\begin{pmatrix}E_1^R(\lambda|u)+\displaystyle\int_{-\infty}^{\infty}V(\lambda,\mu)E_1^R(\mu|u)\,d\mu\\[2mm] E_2^R(\lambda|u)+\displaystyle\int_{-\infty}^{\infty}V(\lambda,\mu)E_2^R(\mu|u)\,d\mu\end{pmatrix}.$$
The product of operators of type $\hat A$ is
$$\bigl(\hat A^{(1)}\hat A^{(2)}\bigr)_{jk}(u,v)=\sum_{l=1}^{2}\int_{-\infty}^{\infty}A^{(1)}_{jl}(u,w)A^{(2)}_{lk}(w,v)\,dw.$$
The trace of the operator is defined as usual
$$\operatorname{tr}\hat A=\int_{-\infty}^{\infty}du\,\bigl(A_{11}(u,u)+A_{22}(u,u)\bigr).$$
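In particular for operators of type $|\dots\rangle\langle\dots|$ we have $\operatorname{tr}|\dots\rangle\langle\dots|=\langle\dots|\dots\rangle$. Using the operator notations, one can rewrite the relations (3.5), (3.6) in the form of linear partial differential equations:
$$\partial_x|E^R(\lambda)\rangle=\hat L(\lambda)|E^R(\lambda)\rangle,\qquad \partial_x\langle E^L(\lambda)|=-\langle E^L(\lambda)|\hat L(\lambda), \tag{3.13}$$
$$\partial_t|E^R(\lambda)\rangle=\hat M(\lambda)|E^R(\lambda)\rangle,\qquad \partial_t\langle E^L(\lambda)|=-\langle E^L(\lambda)|\hat M(\lambda), \tag{3.14}$$
where
$$\hat L(\lambda)=\lambda\hat\sigma+[\hat g,\hat\sigma], \tag{3.15}$$
$$\hat M(\lambda)=-\lambda\hat L(\lambda)+\partial_x\hat g, \tag{3.16}$$
and the operators $\hat\sigma$ and $\hat g$ are equal to
$$\hat\sigma=-\frac{i}{2}\sigma_3\,\delta(u-v)=-\frac{i}{2}\begin{pmatrix}1&0\\0&-1\end{pmatrix}\delta(u-v), \tag{3.17}$$
$$\hat g=-\sigma_+\,g(u,v)=\begin{pmatrix}0&-1\\0&0\end{pmatrix}g(u,v). \tag{3.18}$$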
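To see that (3.15) and (3.16) really encode the coefficients in (3.5) and (3.6), one can evaluate the discrete $2\times2$ part of $\hat L(\lambda)$ and $\hat M(\lambda)$ with plain numbers standing in for $g(u,v)$ and $\partial_x g(u,v)$. This is only an illustrative sketch: the helper `lax_matrices` and the sample values are assumptions of this example, and the dependence on the continuous indices $u,v$ is ignored.

```python
import numpy as np

sigma3 = np.array([[1, 0], [0, -1]], dtype=complex)
sigma_plus = np.array([[0, 1], [0, 0]], dtype=complex)

def lax_matrices(lam, g, dg_dx):
    """Discrete (2x2) parts of L(lam) and M(lam) from (3.15)-(3.18).

    g and dg_dx are plain numbers standing in for g(u, v) and its x-derivative.
    """
    sigma = -0.5j * sigma3                    # (3.17) without the delta function
    g_hat = -sigma_plus * g                   # (3.18)
    commutator = g_hat @ sigma - sigma @ g_hat
    L = lam * sigma + commutator              # (3.15)
    M = -lam * L - sigma_plus * dg_dx         # (3.16), using d/dx g_hat = -sigma_plus * dg/dx
    return L, M

L, M = lax_matrices(lam=2.0, g=3.0, dg_dx=5.0)
# Coefficients read off from (3.5) and (3.6) for the same sample values
L_expected = np.array([[-1j, -3j], [0, 1j]])
M_expected = np.array([[2j, 6j - 5], [0, -2j]])
```

The upper right entries $-ig$ and $i\lambda g-\partial_x g$ are exactly the kernels multiplying $E_-$ in the first lines of (3.5) and (3.6).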
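Later we shall use the relations (3.13) and (3.14) in order to construct the nontrivial Lax representation.

4. Vectors $\langle F^L(\lambda)|$, $|F^R(\lambda)\rangle$ and Resolvent

Let us introduce the vectors $\langle F^L(\lambda)|$ and $|F^R(\lambda)\rangle$ which belong to the same space $\mathcal H$:
$$\langle F^L(\lambda)|=\bigl(F_1^L(\lambda|u),\,F_2^L(\lambda|u)\bigr),\qquad |F^R(\mu)\rangle=\begin{pmatrix}F_1^R(\mu|u)\\ F_2^R(\mu|u)\end{pmatrix}, \tag{4.1}$$
defining them as solutions of the integral equations
$$\bigl(\tilde I+\tilde V\bigr)\circ\langle F^L(\mu)|=\langle E^L(\lambda)|, \tag{4.2}$$
$$|F^R(\lambda)\rangle\circ\bigl(\tilde I+\tilde V\bigr)=|E^R(\mu)\rangle. \tag{4.3}$$
More precisely, these formulae mean
$$F_j^L(\lambda)+\int_{-\infty}^{\infty}V(\lambda,\mu)F_j^L(\mu)\,d\mu=E_j^L(\lambda),$$
$$F_j^R(\mu)+\int_{-\infty}^{\infty}F_j^R(\lambda)V(\lambda,\mu)\,d\lambda=E_j^R(\mu).$$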
Define the resolvent $\tilde R$ of the operator $\tilde I+\tilde V$ by
$$\bigl(\tilde I+\tilde V\bigr)\circ\bigl(\tilde I-\tilde R\bigr)=\tilde I. \tag{4.4}$$
Obviously
$$\bigl(\tilde I-\tilde R\bigr)\circ\langle E^L(\mu)|=\langle F^L(\lambda)|, \tag{4.5}$$
$$|E^R(\lambda)\rangle\circ\bigl(\tilde I-\tilde R\bigr)=|F^R(\mu)\rangle. \tag{4.6}$$
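Let us find the kernel of the resolvent. One can rewrite (4.4) as follows:
$$V(\lambda,\mu)-\int_{-\infty}^{\infty}V(\lambda,\nu)R(\nu,\mu)\,d\nu=R(\lambda,\mu).$$
Multiplying both sides of the last equality by $\lambda-\mu$ we get
$$(\lambda-\mu)V(\lambda,\mu)-\int_{-\infty}^{\infty}V(\lambda,\nu)(\lambda-\nu+\nu-\mu)R(\nu,\mu)\,d\nu=(\lambda-\mu)R(\lambda,\mu),$$
or
$$\langle E^L(\lambda)|E^R(\mu)\rangle-\int_{-\infty}^{\infty}\langle E^L(\lambda)|E^R(\nu)\rangle R(\nu,\mu)\,d\nu=\bigl(\tilde I+\tilde V\bigr)\circ(\nu-\mu)R(\nu,\mu),$$
or, due to (4.6),
$$\langle E^L(\lambda)|F^R(\mu)\rangle=\bigl(\tilde I+\tilde V\bigr)\circ(\nu-\mu)R(\nu,\mu).$$
Making $\tilde I-\tilde R$ act on this equation from the left, we get
$$\langle F^L(\lambda)|F^R(\mu)\rangle=(\lambda-\mu)R(\lambda,\mu).$$
Therefore we have come to the following expression for the resolvent kernel:
$$R(\lambda,\mu)=\frac{\langle F^L(\lambda)|F^R(\mu)\rangle}{\lambda-\mu}. \tag{4.7}$$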
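The argument leading to (4.7) is purely algebraic, so it also holds for a finite matrix analogue in which $\lambda$ takes finitely many distinct values and the vectors have finitely many components. The sketch below checks this numerically; it is a toy model assumed for illustration (random vectors, matrix inverse instead of an integral equation, hypothetical function name), not the continuum problem of the paper.

```python
import numpy as np

def resolvent_identity_residual(n=6, dim=2, seed=0):
    """Max deviation from <F^L_i|F^R_j> = (lam_i - lam_j) R_ij in a matrix toy model."""
    rng = np.random.default_rng(seed)
    lam = np.linspace(-1.0, 1.0, n)                 # distinct spectral points
    eL = rng.uniform(-0.1, 0.1, size=(n, dim))      # row i plays the role of <E^L(lam_i)|
    eR = rng.uniform(-0.1, 0.1, size=(n, dim))      # row i plays the role of |E^R(lam_i)>
    # impose the analogue of (3.11): <E^L(lam_i)|E^R(lam_i)> = 0
    for i in range(n):
        eR[i] -= (eL[i] @ eR[i]) / (eL[i] @ eL[i]) * eL[i]
    gram = eL @ eR.T
    diff = lam[:, None] - lam[None, :]
    off = ~np.eye(n, dtype=bool)
    V = np.zeros((n, n))
    V[off] = gram[off] / diff[off]                  # analogue of (3.10)
    inv = np.linalg.inv(np.eye(n) + V)              # equals I - R by (4.4)
    R = np.eye(n) - inv
    fL = inv @ eL                                   # analogue of (4.5): left action
    fR = inv.T @ eR                                 # analogue of (4.6): right action
    return np.max(np.abs(fL @ fR.T - diff * R))

residual = resolvent_identity_residual()
```

The small amplitude of the random vectors keeps $I+V$ strictly diagonally dominant, so the inverse exists; the residual should then be at the level of rounding error.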
It is worth mentioning that this method of calculating the resolvent is a direct generalization of the method described in Section XIV.1 of [2]. In the next section we shall need the operator $\hat B$ (potential), defined as
$$\hat B=\int_{-\infty}^{\infty}|F^R(\lambda)\rangle\langle E^L(\lambda)|\,d\lambda. \tag{4.8}$$
Obviously
$$\hat B=|F^R(\lambda)\rangle\circ\bigl(\tilde I+\tilde V\bigr)\circ\langle F^L(\mu)|=\int_{-\infty}^{\infty}|E^R(\mu)\rangle\langle F^L(\mu)|\,d\mu. \tag{4.9}$$
The components of this operator are
$$B_{jk}(u,v)=\int_{-\infty}^{\infty}F_j^R(\lambda|u)\,E_k^L(\lambda|v)\,d\lambda, \tag{4.10}$$
so
$$\hat B=\begin{pmatrix}B_{11}(u,v)&B_{12}(u,v)\\ B_{21}(u,v)&B_{22}(u,v)\end{pmatrix}. \tag{4.11}$$
The operator $\hat C$, defined in a similar way, is also useful:
$$\hat C=\int_{-\infty}^{\infty}\lambda\,|F^R(\lambda)\rangle\langle E^L(\lambda)|\,d\lambda. \tag{4.12}$$
The components of the operator $\hat C$ are
$$C_{jk}(u,v)=\int_{-\infty}^{\infty}\lambda\,F_j^R(\lambda|u)\,E_k^L(\lambda|v)\,d\lambda, \tag{4.13}$$
so
$$\hat C=\begin{pmatrix}C_{11}(u,v)&C_{12}(u,v)\\ C_{21}(u,v)&C_{22}(u,v)\end{pmatrix}. \tag{4.14}$$
5. The Lax Representation

In this section we construct the Lax representation having a nontrivial compatibility condition. Namely, we establish the following relations
$$\partial_x|F^R(\lambda)\rangle=\hat{\mathcal L}(\lambda)|F^R(\lambda)\rangle, \tag{5.1}$$
$$\partial_t|F^R(\lambda)\rangle=\hat{\mathcal M}(\lambda)|F^R(\lambda)\rangle, \tag{5.2}$$
which are analogous to the relations (3.13) and (3.14). We shall prove that one can obtain the operators $\hat{\mathcal L}(\lambda)$ and $\hat{\mathcal M}(\lambda)$ using formulae (3.15) and (3.16) with the replacement of $\hat g$ by $\hat g+\hat B$. Let us calculate the derivative of $V(\lambda,\mu)$ with respect to $x$ using formulae (3.13):
$$\partial_x V(\lambda,\mu)=-\frac{\langle E^L(\lambda)|\bigl(\hat L(\lambda)-\hat L(\mu)\bigr)|E^R(\mu)\rangle}{\lambda-\mu}=-\langle E^L(\lambda)|\hat\sigma|E^R(\mu)\rangle. \tag{5.3}$$
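For completeness, the two steps behind (5.3) can be spelled out as follows; this is a short sketch that uses only (3.10), (3.13) and (3.15).

```latex
% Differentiate the numerator of (3.10) with the help of (3.13):
\partial_x \langle E^L(\lambda)|E^R(\mu)\rangle
  = -\langle E^L(\lambda)|\hat L(\lambda)|E^R(\mu)\rangle
    +\langle E^L(\lambda)|\hat L(\mu)|E^R(\mu)\rangle .

% By (3.15) the commutator term [\hat g,\hat\sigma] cancels in the difference:
\hat L(\lambda)-\hat L(\mu) = (\lambda-\mu)\,\hat\sigma .

% Dividing by \lambda-\mu gives (5.3):
\partial_x V(\lambda,\mu) = -\langle E^L(\lambda)|\hat\sigma|E^R(\mu)\rangle .
```

In particular the $x$-derivative of the kernel has no pole at $\lambda=\mu$ and is of finite rank in the discrete indices.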
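Thus, from (4.3) we get
$$\partial_x|F^R(\lambda)\rangle\circ\bigl(\tilde I+\tilde V\bigr)-\int_{-\infty}^{\infty}|F^R(\lambda)\rangle\langle E^L(\lambda)|\hat\sigma|E^R(\mu)\rangle\,d\lambda=\hat L(\mu)|E^R(\mu)\rangle, \tag{5.4}$$
or, using the definition of $\hat B$,
$$\partial_x|F^R(\lambda)\rangle\circ\bigl(\tilde I+\tilde V\bigr)-\hat B\cdot\hat\sigma|E^R(\mu)\rangle=\hat L(\mu)|E^R(\mu)\rangle.$$
Making $\tilde I-\tilde R$ act on this equality on the right, we get
$$\partial_x|F^R(\lambda)\rangle-\hat B\cdot\hat\sigma|F^R(\lambda)\rangle=\bigl(\hat L(\mu)-\hat L(\lambda)\bigr)|E^R(\mu)\rangle\circ\bigl(\tilde I-\tilde R\bigr)+\hat L(\lambda)|F^R(\lambda)\rangle.$$
In the r.h.s. we have
$$\bigl(\hat L(\mu)-\hat L(\lambda)\bigr)|E^R(\mu)\rangle\circ\bigl(\tilde I-\tilde R\bigr)=-\hat\sigma\int_{-\infty}^{\infty}|E^R(\mu)\rangle\langle F^L(\mu)|F^R(\lambda)\rangle\,d\mu=-\hat\sigma\cdot\hat B|F^R(\lambda)\rangle.$$
Therefore
$$\partial_x|F^R(\lambda)\rangle=\bigl(\hat L(\lambda)+[\hat B,\hat\sigma]\bigr)|F^R(\lambda)\rangle,$$
or
$$\partial_x|F^R(\lambda)\rangle=\hat{\mathcal L}(\lambda)|F^R(\lambda)\rangle, \tag{5.5}$$
where
$$\hat{\mathcal L}(\lambda)=\lambda\hat\sigma+[\hat b,\hat\sigma], \tag{5.6}$$
and
$$\hat b=\hat B+\hat g. \tag{5.7}$$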
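In the same way one can obtain a similar formula for $\langle F^L(\lambda)|$,
$$\partial_x\langle F^L(\lambda)|=-\langle F^L(\lambda)|\hat{\mathcal L}(\lambda).$$
Now let us turn to the derivative of $|F^R(\lambda)\rangle$ with respect to $t$. As before, we start with differentiation of the kernel $V(\lambda,\mu)$ using formulae (3.14),
$$\partial_t V(\lambda,\mu)=\frac{\langle E^L(\lambda)|\bigl(\lambda\hat L(\lambda)-\mu\hat L(\mu)\bigr)|E^R(\mu)\rangle}{\lambda-\mu}$$
$$=\frac{\langle E^L(\lambda)|\bigl((\lambda-\mu)\hat L(\lambda)-\mu(\hat L(\mu)-\hat L(\lambda))\bigr)|E^R(\mu)\rangle}{\lambda-\mu}$$
$$=\langle E^L(\lambda)|\hat L(\lambda)|E^R(\mu)\rangle+\mu\langle E^L(\lambda)|\hat\sigma|E^R(\mu)\rangle$$
$$=-\partial_x\langle E^L(\lambda)|\cdot|E^R(\mu)\rangle+\mu\langle E^L(\lambda)|\hat\sigma|E^R(\mu)\rangle. \tag{5.8}$$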
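Thus, from (4.3) we get
$$\partial_t|F^R(\lambda)\rangle\circ\bigl(\tilde I+\tilde V\bigr)-\int_{-\infty}^{\infty}|F^R(\lambda)\rangle\,\partial_x\langle E^L(\lambda)|\cdot|E^R(\mu)\rangle\,d\lambda$$
$$+\mu\int_{-\infty}^{\infty}|F^R(\lambda)\rangle\langle E^L(\lambda)|\hat\sigma|E^R(\mu)\rangle\,d\lambda=\bigl[-\mu\hat L(\mu)+\partial_x\hat g\bigr]|E^R(\mu)\rangle. \tag{5.9}$$
Comparing (5.9) with (5.4), we see that
$$\mu\int_{-\infty}^{\infty}|F^R(\lambda)\rangle\langle E^L(\lambda)|\hat\sigma|E^R(\mu)\rangle\,d\lambda+\mu\hat L(\mu)|E^R(\mu)\rangle=\partial_x|F^R(\lambda)\rangle\circ\bigl(\tilde I+\tilde V\bigr)\cdot\mu.$$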
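Substituting this formula into (5.9) we find
$$\partial_t|F^R(\lambda)\rangle\circ\bigl(\tilde I+\tilde V\bigr)+\partial_x|F^R(\lambda)\rangle\circ\bigl(\tilde I+\tilde V\bigr)\cdot\mu=\int_{-\infty}^{\infty}|F^R(\lambda)\rangle\,\partial_x\langle E^L(\lambda)|\cdot|E^R(\mu)\rangle\,d\lambda+\partial_x\hat g\,|E^R(\mu)\rangle. \tag{5.10}$$
Acting on (5.10) by the resolvent from the right we have
$$\partial_t|F^R(\lambda)\rangle+\partial_x|F^R(\lambda)\rangle\circ\bigl(\tilde I+\tilde V\bigr)\cdot\mu\circ\bigl(\tilde I-\tilde R\bigr)=\int_{-\infty}^{\infty}|F^R(\mu)\rangle\,\partial_x\langle E^L(\mu)|\,d\mu\cdot|F^R(\lambda)\rangle+\partial_x\hat g\,|F^R(\lambda)\rangle. \tag{5.11}$$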
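The second term in the l.h.s. of (5.11) is equal to
$$\partial_x|F^R(\lambda)\rangle\circ\bigl(\tilde I+\tilde V\bigr)\cdot\mu\circ\bigl(\tilde I-\tilde R\bigr)=\partial_x|F^R(\lambda)\rangle\circ\bigl(\tilde I+\tilde V\bigr)\cdot(\mu-\lambda+\lambda)\circ\bigl(\tilde I-\tilde R\bigr)$$
$$=\lambda\,\partial_x|F^R(\lambda)\rangle-\partial_x|F^R(\lambda)\rangle\circ\bigl(\tilde I+\tilde V\bigr)\circ\langle F^L(\mu)|F^R(\lambda)\rangle$$
$$=\lambda\,\partial_x|F^R(\lambda)\rangle-\left(\int_{-\infty}^{\infty}\partial_x|F^R(\mu)\rangle\cdot\langle E^L(\mu)|\,d\mu\right)|F^R(\lambda)\rangle.$$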
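Therefore we arrive at
$$\partial_t|F^R(\lambda)\rangle=-\lambda\,\partial_x|F^R(\lambda)\rangle+\int_{-\infty}^{\infty}\partial_x|F^R(\mu)\rangle\cdot\langle E^L(\mu)|\,d\mu\cdot|F^R(\lambda)\rangle$$
$$+\int_{-\infty}^{\infty}|F^R(\mu)\rangle\cdot\partial_x\langle E^L(\mu)|\,d\mu\cdot|F^R(\lambda)\rangle+\partial_x\hat g\,|F^R(\lambda)\rangle$$
$$=-\lambda\,\partial_x|F^R(\lambda)\rangle+\partial_x(\hat B+\hat g)|F^R(\lambda)\rangle, \tag{5.12}$$
or
$$\partial_t|F^R(\lambda)\rangle=\hat{\mathcal M}(\lambda)|F^R(\lambda)\rangle. \tag{5.13}$$
Here
$$\hat{\mathcal M}(\lambda)=-\lambda\hat{\mathcal L}(\lambda)+\partial_x\hat b, \tag{5.14}$$
or, using (5.6), we can rewrite (5.14) as
$$\hat{\mathcal M}(\lambda)=-\lambda^2\hat\sigma-\lambda[\hat b,\hat\sigma]+\partial_x\hat b. \tag{5.15}$$
In the same way one can obtain a similar formula for $\langle F^L(\lambda)|$,
$$\partial_t\langle F^L(\lambda)|=-\langle F^L(\lambda)|\hat{\mathcal M}(\lambda).$$
Thus, we have constructed the Lax representation.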
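6. The Logarithmic Derivatives of the Determinant and the Differential Equations

We remind the reader that the operator describing the correlation function contains the projector $\tilde Y$:
$$Y(\lambda,\mu)=P(\lambda)P(\mu)=\int_{-\infty}^{\infty}du\,dv\,E_+(\lambda|v)E_+(\mu|u).$$
In the operator notations it can be written in the form
$$\int_{-\infty}^{\infty}du\,dv\,E_+(\lambda|v)E_+(\mu|u)=\langle E^L(\lambda)|\hat\sigma_-|E^R(\mu)\rangle, \tag{6.1}$$
where the operator $\hat\sigma_-$ is equal to
$$\hat\sigma_-=\begin{pmatrix}0&0\\1&0\end{pmatrix}.$$
(This is a particular case of an operator: its discrete components are constant functions of the continuous indices $u$ and $v$. The action of this operator on vectors is still given by (3.12).)

Let us calculate the derivative of $\det\bigl(\tilde I+\tilde V-\frac{\alpha}{2\pi}\tilde Y\bigr)$ with respect to $\alpha$:
$$\partial_\alpha\log\det\left(\tilde I+\tilde V-\frac{\alpha}{2\pi}\tilde Y\right)\bigg|_{\alpha=0}=-\frac{1}{2\pi}\operatorname{Tr}\Bigl[\langle E^L(\lambda)|\hat\sigma_-|E^R(\mu)\rangle\circ\bigl(\tilde I-\tilde R\bigr)\Bigr]=-\frac{1}{2\pi}\operatorname{Tr}\langle E^L(\lambda)|\hat\sigma_-|F^R(\lambda)\rangle$$
$$=-\frac{1}{2\pi}\operatorname{tr}\bigl(\hat B\,\hat\sigma_-\bigr)=-\frac{1}{2\pi}\int_{-\infty}^{\infty}B_{12}(u,v)\,du\,dv.$$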
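Here we denote by the symbol "Tr" the trace of the integral kernels. Hence,
$$\partial_\alpha\det\left(\tilde I+\tilde V-\frac{\alpha}{2\pi}\tilde Y\right)\bigg|_{\alpha=0}=-\det\bigl(\tilde I+\tilde V\bigr)\int_{-\infty}^{\infty}\frac{du\,dv}{2\pi}\,B_{12}(u,v). \tag{6.2}$$
Due to the definitions (3.18), (5.7) and the identity
$$G(x,t)=\frac{1}{2\pi}\int_{-\infty}^{\infty}du\,dv\,g(u,v),$$
we come to the equality
$$G(x,t)-\frac{1}{2\pi}\int_{-\infty}^{\infty}B_{12}(u,v)\,du\,dv=-\frac{1}{2\pi}\int_{-\infty}^{\infty}b_{12}(u,v)\,du\,dv. \tag{6.3}$$
Hence the correlation function is equal to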
$$\langle\psi(0,0)\psi^\dagger(x,t)\rangle_T=-\frac{e^{-iht}}{2\pi}\,(0|\,\frac{\det\bigl(\tilde I+\tilde V\bigr)}{\det\bigl(\tilde I-\frac{1}{2\pi}\tilde K_T\bigr)}\int_{-\infty}^{\infty}b_{12}(u,v)\,du\,dv\,|0). \tag{6.4}$$
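The logarithmic derivatives of the determinant of the operator $\tilde I+\tilde V$ with respect to $x$ and $t$ are equal to
$$\partial_x\log\det\bigl(\tilde I+\tilde V\bigr)=\operatorname{Tr}\Bigl[\partial_x\tilde V\circ\bigl(\tilde I-\tilde R\bigr)\Bigr],\qquad \partial_t\log\det\bigl(\tilde I+\tilde V\bigr)=\operatorname{Tr}\Bigl[\partial_t\tilde V\circ\bigl(\tilde I-\tilde R\bigr)\Bigr].$$
Using (5.3) we have
$$\operatorname{Tr}\Bigl[\partial_x\tilde V\circ\bigl(\tilde I-\tilde R\bigr)\Bigr]=-\operatorname{Tr}\Bigl[\langle E^L(\lambda)|\hat\sigma|E^R(\mu)\rangle\circ\bigl(\tilde I-\tilde R\bigr)\Bigr]=-\int_{-\infty}^{\infty}d\lambda\,\langle E^L(\lambda)|\hat\sigma|F^R(\lambda)\rangle=-\operatorname{tr}\bigl(\hat B\cdot\hat\sigma\bigr),$$
and hence
$$\partial_x\log\det\bigl(\tilde I+\tilde V\bigr)=-\operatorname{tr}\bigl(\hat B\cdot\hat\sigma\bigr). \tag{6.5}$$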
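In the same way one can compute the time derivative
$$\operatorname{Tr}\Bigl[\partial_t\tilde V\circ\bigl(\tilde I-\tilde R\bigr)\Bigr]=\operatorname{Tr}\Bigl[\langle E^L(\lambda)|\bigl(\hat\sigma(2\lambda+\mu-\lambda)+[\hat g,\hat\sigma]\bigr)|E^R(\mu)\rangle\circ\bigl(\tilde I-\tilde R\bigr)\Bigr]$$
$$=\int_{-\infty}^{\infty}d\lambda\,\Bigl(2\lambda\langle E^L(\lambda)|\hat\sigma|F^R(\lambda)\rangle+\langle E^L(\lambda)|[\hat g,\hat\sigma]|F^R(\lambda)\rangle\Bigr)-\int_{-\infty}^{\infty}d\lambda\,d\mu\,\langle E^L(\lambda)|\hat\sigma|E^R(\mu)\rangle\langle F^L(\mu)|F^R(\lambda)\rangle$$
$$=\operatorname{tr}\Bigl(2\hat C\cdot\hat\sigma-\hat B^2\cdot\hat\sigma+\hat B\cdot[\hat g,\hat\sigma]\Bigr)=\operatorname{tr}\Bigl(2(\hat C+\hat b\cdot\hat g)\cdot\hat\sigma-\hat b^2\cdot\hat\sigma\Bigr).$$
Therefore
$$\partial_t\log\det\bigl(\tilde I+\tilde V\bigr)=\operatorname{tr}\Bigl(2(\hat C+\hat b\cdot\hat g)\cdot\hat\sigma-\hat b^2\cdot\hat\sigma\Bigr). \tag{6.6}$$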
The second logarithmic derivatives of the determinant can be expressed in terms of the matrix $\hat b$ only. Indeed, using (A.5) and (A.7) we get
$$\partial_x\partial_x\log\det\bigl(\tilde I+\tilde V\bigr)=-\operatorname{tr}\bigl([\hat b,\hat\sigma]\cdot\hat b\cdot\hat\sigma\bigr), \tag{6.7}$$
$$\partial_x\partial_t\log\det\bigl(\tilde I+\tilde V\bigr)=-\operatorname{tr}\bigl(\partial_x\hat b\cdot\hat b\cdot\hat\sigma-\hat b\cdot\partial_x\hat b\cdot\hat\sigma\bigr). \tag{6.8}$$
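The trace in (6.7) is easy to evaluate by hand for the discrete indices, and the result is the product $b_{12}b_{21}$ that appears in (7.7). The short check below does this for an arbitrary $2\times2$ matrix with number-valued entries; it is an illustrative sketch in which the continuous indices are dropped and the function name is an assumption.

```python
import numpy as np

SIGMA = -0.5j * np.array([[1, 0], [0, -1]], dtype=complex)   # discrete part of (3.17)

def second_x_derivative(b):
    """Right-hand side of (6.7) for a 2x2 matrix b with number-valued entries."""
    commutator = b @ SIGMA - SIGMA @ b
    return -np.trace(commutator @ b @ SIGMA)

b = np.array([[1 + 2j, 3 - 1j], [0.5j, -2]], dtype=complex)
value = second_x_derivative(b)
expected = -b[0, 1] * b[1, 0]      # scalar counterpart of (7.7)
```

Only the off-diagonal entries of `b` survive, which is the discrete-index version of the statement that (7.7) contains $b_{12}$ and $b_{21}$ alone.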
Thus, we have expressed the logarithmic derivatives of the Fredholm determinant in terms of the traces of the operators $\hat b$, $\hat B$ and $\hat C$. It is worth mentioning that the second logarithmic derivatives depend on the operator $\hat b$ only.

Now let us turn back to the Lax representation. The relations (5.5) and (5.13),
$$\partial_x|F^R(\lambda)\rangle=\hat{\mathcal L}(\lambda)|F^R(\lambda)\rangle,\qquad \partial_t|F^R(\lambda)\rangle=\hat{\mathcal M}(\lambda)|F^R(\lambda)\rangle,$$
should be compatible. It means that
$$\partial_t\hat{\mathcal L}-\partial_x\hat{\mathcal M}+[\hat{\mathcal L},\hat{\mathcal M}]=0.$$
Substituting (5.6) and (5.15) into the last formula, we get
$$[\partial_t\hat b,\hat\sigma]-\partial_x\partial_x\hat b+\bigl[[\hat b,\hat\sigma],\partial_x\hat b\bigr]=0. \tag{6.9}$$
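The reduction of the compatibility condition to (6.9) is a purely algebraic statement: all $\lambda$-dependent terms cancel. The sketch below verifies this numerically with random $2\times2$ complex matrices standing in for $\hat b$ and its derivatives. Treating these as independent number-valued matrices (and dropping the continuous indices) is an assumption made only for this illustration, and the function names are hypothetical.

```python
import numpy as np

SIGMA = -0.5j * np.array([[1, 0], [0, -1]], dtype=complex)   # discrete part of (3.17)

def comm(a, b):
    """Matrix commutator [a, b]."""
    return a @ b - b @ a

def zero_curvature(lam, b, b_x, b_xx, b_t):
    """d_t L - d_x M + [L, M] with L and M taken from (5.6) and (5.15)."""
    L = lam * SIGMA + comm(b, SIGMA)
    M = -lam**2 * SIGMA - lam * comm(b, SIGMA) + b_x
    dL_dt = comm(b_t, SIGMA)
    dM_dx = -lam * comm(b_x, SIGMA) + b_xx
    return dL_dt - dM_dx + comm(L, M)

def lhs_6_9(b, b_x, b_xx, b_t):
    """Left-hand side of (6.9)."""
    return comm(b_t, SIGMA) - b_xx + comm(comm(b, SIGMA), b_x)

rng = np.random.default_rng(1)
b, b_x, b_xx, b_t = (rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
                     for _ in range(4))
same_for_all_lambda = all(
    np.allclose(zero_curvature(lam, b, b_x, b_xx, b_t), lhs_6_9(b, b_x, b_xx, b_t))
    for lam in (0.0, 0.7, -3.0)
)
```

In this number-valued setting the upper right entry of `lhs_6_9` equals $i\partial_t b_{12}-\partial_x^2 b_{12}+i\,b_{12}(\partial_x b_{22}-\partial_x b_{11})$, which becomes the scalar form of (7.9) once $\partial_x b_{11}=i\,b_{12}b_{21}$ and $\partial_x b_{22}=-i\,b_{21}b_{12}$ are inserted.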
Remark. More accurately, Eq. (6.9) is valid if there exists a sequence of complex numbers $\lambda_n$ such that the corresponding sequence of vectors $|F^R(\lambda_n)\rangle$ generates a basis of the space $\mathcal H$. However, one can check that this condition is not necessary. Indeed, using the identities for potentials from Appendix A (in particular formulae (A.3) and (A.6)) it is possible to prove the equality (6.9) directly, without using the compatibility condition.

Thus, we have arrived at the following results. The second logarithmic derivatives of the Fredholm determinant are presented in terms of the operator $\hat b$. On the other hand this operator satisfies the partial differential Eq. (6.9). In turn this equation appears to be the compatibility condition of the auxiliary linear problem (5.5), (5.13).

7. Main Results in Components of Operators $\hat b$ and $\hat C$

In this section we summarize the main results obtained in the previous sections. Till now we have not used the symmetry of the kernel $V(\lambda,\mu)=V(\mu,\lambda)$ (3.3). Using this property and the definitions (3.9) one can get an additional identity,
$$B_{11}(u,v)=-B_{22}(v,u). \tag{7.1}$$
Due to the definitions (5.7) and (3.18) we have
$$b_{ab}(u,v)=B_{ab}(u,v)\quad\text{for all } a,b \text{ except the case } a=1,\ b=2,\qquad b_{12}(u,v)=B_{12}(u,v)-g(u,v). \tag{7.2}$$
It follows from (A.3), (A.4) and definition (3.17) that
$$\partial_x B_{11}(u,v)=i\int_{-\infty}^{\infty}dw\,b_{12}(u,w)b_{21}(w,v), \tag{7.3}$$
$$\partial_x B_{22}(u,v)=-i\int_{-\infty}^{\infty}dw\,b_{21}(u,w)b_{12}(w,v). \tag{7.4}$$
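Using identities (A.5), (A.7) and expressions for the logarithmic derivatives of the determinant (6.5)–(6.8) we have
$$\partial_x\log\det\bigl(\tilde I+\tilde V\bigr)=i\int_{-\infty}^{\infty}du\,b_{11}(u,u), \tag{7.5}$$
$$\partial_t\log\det\bigl(\tilde I+\tilde V\bigr)=i\int_{-\infty}^{\infty}du\,dv\,\bigl(-b_{21}(u,v)g(v,u)\bigr)+i\int_{-\infty}^{\infty}du\,\bigl(C_{22}(u,u)-C_{11}(u,u)\bigr), \tag{7.6}$$
$$\partial_x\partial_x\log\det\bigl(\tilde I+\tilde V\bigr)=-\int_{-\infty}^{\infty}du\,dv\,b_{12}(u,v)b_{21}(v,u), \tag{7.7}$$
$$\partial_t\partial_x\log\det\bigl(\tilde I+\tilde V\bigr)=i\int_{-\infty}^{\infty}du\,dv\,\bigl(\partial_x b_{12}(u,v)\cdot b_{21}(v,u)-\partial_x b_{21}(u,v)\cdot b_{12}(v,u)\bigr). \tag{7.8}$$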
The diagonal part of the compatibility condition (6.9) is equal to zero due to (7.3) and (7.4). For the off-diagonal part of the compatibility condition we find
$$-i\partial_t b_{12}(u,v)=-\partial_x^2 b_{12}(u,v)+2\int_{-\infty}^{\infty}dw_1\,dw_2\,b_{12}(u,w_1)b_{21}(w_1,w_2)b_{12}(w_2,v), \tag{7.9}$$
$$i\partial_t b_{21}(u,v)=-\partial_x^2 b_{21}(u,v)+2\int_{-\infty}^{\infty}dw_1\,dw_2\,b_{21}(u,w_1)b_{12}(w_1,w_2)b_{21}(w_2,v). \tag{7.10}$$
The system of partial differential Eqs. (7.9) and (7.10) is a continuum generalization of the classical nonlinear Schrödinger equation. The second logarithmic derivatives of the Fredholm determinant (7.7) and (7.8) give us the densities of the first and the second integrals of motion of this equation. It means that the correlation function possesses the properties of a $\tau$-function of the generalized nonlinear Schrödinger equation.

Summary

The main goal of this paper was a description of the quantum correlation function in terms of solutions of the classical completely integrable equation. We have constructed such differential equations, (7.9) and (7.10), which, in fact, are a continual generalization of the nonlinear Schrödinger equation. The correlation function of local fields plays the role of the $\tau$-function of these equations. In particular the second logarithmic derivatives of the Fredholm determinant (7.7) and (7.8) give us the densities of the first and the second integrals of motion. In a forthcoming publication we shall formulate the Riemann–Hilbert problem for the differential equation obtained in this paper. This will permit us to evaluate the long distance asymptotics.

A. Identities for Potentials

Here we establish some identities between matrix elements of the potentials $\hat B$ (or $\hat b$) and $\hat C$. We also introduce the operator $\hat D$:
$$\hat D=\int_{-\infty}^{\infty}\lambda^2\,|F^R(\lambda)\rangle\langle E^L(\lambda)|\,d\lambda. \tag{A.1}$$
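Let us calculate the derivative of the operator $\hat B$ with respect to $x$. We shall use formulae (3.13), (3.15), (5.5) and (5.6). We have
$$\partial_x\hat B=\int_{-\infty}^{\infty}d\lambda\,\Bigl(\hat{\mathcal L}(\lambda)|F^R(\lambda)\rangle\langle E^L(\lambda)|-|F^R(\lambda)\rangle\langle E^L(\lambda)|\hat L(\lambda)\Bigr)$$
$$=\int_{-\infty}^{\infty}d\lambda\,\Bigl((\lambda\hat\sigma+[\hat b,\hat\sigma])|F^R(\lambda)\rangle\langle E^L(\lambda)|-|F^R(\lambda)\rangle\langle E^L(\lambda)|(\lambda\hat\sigma+[\hat g,\hat\sigma])\Bigr)$$
$$=[\hat\sigma,\hat C]+[\hat b,\hat\sigma]\cdot\hat B-\hat B\cdot[\hat g,\hat\sigma]. \tag{A.2}$$
Therefore
$$\partial_x\hat B=[\hat\sigma,\hat C]+[\hat b,\hat\sigma]\cdot\hat B-\hat B\cdot[\hat g,\hat\sigma]. \tag{A.3}$$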
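Taking the trace of this equality we have
$$\operatorname{tr}\bigl(\partial_x\hat B\bigr)=0. \tag{A.4}$$
Here we used the definition $\hat b=\hat B+\hat g$. Multiplying (A.3) by $\hat\sigma$ and taking the trace we have
$$\operatorname{tr}\bigl(\partial_x\hat B\cdot\hat\sigma\bigr)=\operatorname{tr}\bigl([\hat b,\hat\sigma]\cdot\hat b\cdot\hat\sigma\bigr). \tag{A.5}$$
Here we used the property $\hat g^2=(\hat g\hat\sigma)^2=0$, see (3.18).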
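Using the same method one can calculate the derivative of the operator $\hat B$ with respect to $t$. To do it we need formulae (3.14), (3.16), (5.13) and (5.14). We have
$$\partial_t\hat B=\int_{-\infty}^{\infty}d\lambda\,\Bigl(\hat{\mathcal M}(\lambda)|F^R(\lambda)\rangle\langle E^L(\lambda)|-|F^R(\lambda)\rangle\langle E^L(\lambda)|\hat M(\lambda)\Bigr)$$
$$=-[\hat\sigma,\hat D]-[\hat b,\hat\sigma]\cdot\hat C+\hat C\cdot[\hat g,\hat\sigma]+\partial_x\hat b\cdot\hat B-\hat B\cdot\partial_x\hat g. \tag{A.6}$$
Again multiplying by $\hat\sigma$ and taking the trace we have
$$\operatorname{tr}\bigl(\partial_t\hat B\cdot\hat\sigma\bigr)=\operatorname{tr}\Bigl([\hat\sigma,\hat C]\cdot\hat g\cdot\hat\sigma-[\hat\sigma,\hat C]\cdot\hat\sigma\cdot\hat b+\partial_x\hat b\cdot\hat B\cdot\hat\sigma-\hat B\cdot\partial_x\hat g\cdot\hat\sigma\Bigr).$$
(Here we used cyclic permutation under the sign "tr".) Substituting $[\hat\sigma,\hat C]$ from (A.3) we get after simple algebra
$$\operatorname{tr}\bigl(\partial_t\hat B\cdot\hat\sigma\bigr)=\operatorname{tr}\bigl(\partial_x\hat b\cdot\hat b\cdot\hat\sigma-\hat b\cdot\partial_x\hat b\cdot\hat\sigma\bigr). \tag{A.7}$$
These identities were used for the calculation of the second logarithmic derivatives of the Fredholm determinant in Sect. 6. One can also use the method described above in order to prove directly Eq. (6.9),
$$[\partial_t\hat b,\hat\sigma]-\partial_x\partial_x\hat b+\bigl[[\hat b,\hat\sigma],\partial_x\hat b\bigr]=0.$$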
B. Quantum Equation as Continual Differential Equation

Consider the quantum nonlinear Schrödinger equation
$$i\partial_t\psi=-\partial_x^2\psi+2c\,\psi^\dagger\psi\psi, \tag{B.1}$$
where $\psi(x,t)$ is the annihilation operator in the bosonic Fock space. Let us take a matrix element of this equation between the ground state at finite density $|\,\rangle$ and the state with one hole $\langle h_0|$. We shall have in mind that the momentum $k_0$ of the hole is fixed,
$$i\partial_t\langle h_0|\psi|\,\rangle=-\partial_x^2\langle h_0|\psi|\,\rangle+2c\,\langle h_0|\psi^\dagger|h_0,h_1\rangle\langle h_1,h_0|\psi|h_2\rangle\langle h_2|\psi|\,\rangle. \tag{B.2}$$
Here $h_2$ is a hole with momentum $k_2$ and $h_1$ is a hole with momentum $k_1$. In the last term in the r.h.s. of (B.2) we have to integrate with respect to $k_1$ and $k_2$. This shows that the quantum nonlinear Schrödinger equation is equivalent to the classical continual differential equation. The same type of equations describes quantum correlation functions (see (7.9), (7.10)).

C. Free Fermion Limit

In the free fermion limit $c\to\infty$ the function $E_-(\lambda|u)$ (3.2) does not depend on the continuous variable $u$. Indeed, in this limit we have $h(\lambda,\mu)=(\lambda-\mu+ic)/ic\to1$, hence all commutators (2.4) are equal to zero. Thus, one can put all dual fields equal to zero in the expressions for the functions $E_\pm$,
$$E_+(\lambda|u)=\frac{\sqrt{\theta(\lambda)}}{2\pi}\left(\frac{1}{u-\lambda+i0}+\frac{1}{u-\lambda-i0}\right)e^{\tau(u)-\frac12\tau(\lambda)}, \tag{C.1}$$
$$E_-(\lambda|u)=\frac{1}{\pi}\,e^{-\frac12\tau(\lambda)}\sqrt{\theta(\lambda)}. \tag{C.2}$$
Hence the matrix elements of the operator $\hat B$ possess the following properties: $B_{21}(u,v)$ does not depend on $u$ and $v$; $B_{11}(u,v)$ depends only on the first argument $u$; $B_{22}(u,v)$ depends only on the second argument $v$; $B_{12}(u,v)$ depends on both arguments $u$ and $v$. The matrix elements of the operator $\hat C$ have analogous properties. One can make the replacement
$$B_{21}(u,v)\to B_{21},\qquad \int_{-\infty}^{\infty}du\,B_{11}(u,v)\to B_{11},\qquad \int_{-\infty}^{\infty}dv\,B_{22}(u,v)\to B_{22},\qquad \int_{-\infty}^{\infty}du\,dv\,B_{12}(u,v)\to B_{12},$$
$$C_{21}(u,v)\to C_{21},\qquad \int_{-\infty}^{\infty}du\,C_{11}(u,v)\to C_{11},\qquad \int_{-\infty}^{\infty}dv\,C_{22}(u,v)\to C_{22},\qquad \int_{-\infty}^{\infty}du\,dv\,C_{12}(u,v)\to C_{12}.$$
Here the functions $B_{ab}$ and $C_{ab}$ are scalar potentials. In terms of these potentials one can rewrite the correlation function, the logarithmic derivatives of the determinant and the partial differential equations in the following form. The correlation function:
$$\langle\psi(0,0)\psi^\dagger(x,t)\rangle_T=-\frac{e^{-iht}}{2\pi}\,b_{12}\det\bigl(\tilde I+\tilde V\bigr).$$
The logarithmic derivatives of the determinant:
$$\partial_x\log\det\bigl(\tilde I+\tilde V\bigr)=ib_{11},\qquad \partial_t\log\det\bigl(\tilde I+\tilde V\bigr)=i(-C_{11}+C_{22}-b_{21}G),$$
$$\partial_x\partial_x\log\det\bigl(\tilde I+\tilde V\bigr)=-b_{12}b_{21},\qquad \partial_t\partial_x\log\det\bigl(\tilde I+\tilde V\bigr)=i(\partial_x b_{12}\cdot b_{21}-\partial_x b_{21}\cdot b_{12}).$$
The differential equations (one should integrate (7.9) with respect to $u$ and $v$):
$$-i\partial_t b_{12}=-\partial_x^2 b_{12}+2b_{12}^2b_{21},\qquad i\partial_t b_{21}=-\partial_x^2 b_{21}+2b_{21}^2b_{12}.$$
Here $b_{ab}=B_{ab}$ for all $a$ and $b$ except $a=1$, $b=2$; $b_{12}=B_{12}-G$. These results reproduce the results of [12] up to notations and rescaling of the distance $x$ and the time $t$.

Acknowledgement. We wish to thank Professor T. Inami for useful discussions. This work is partly supported by the National Science Foundation (NSF) under Grant No. PHY-9605226, the Japan Society for the Promotion of Science, the Russian Foundation of Basic Research under Grant No. 96-01-00344 and INTAS-01-166-ext.
References
1. Barouch, E., McCoy, B.M., Wu, T.T.: Phys. Rev. Lett. 31, 1409 (1973)
2. Korepin, V.E., Bogoliubov, N.M., Izergin, A.G.: Quantum Inverse Scattering Method and Correlation Functions. Cambridge: Cambridge University Press 1993
3. Jimbo, M., Miwa, T., Môri, Y., Sato, M.: Density matrix of an impenetrable Bose gas and the fifth Painlevé transcendent. Physica 1D, 80–185 (1980)
4. Kojima, T., Korepin, V.E., Slavnov, N.A.: Determinant representation for dynamical correlation functions of the Quantum nonlinear Schrödinger equation. Preprint ITP-SUNY-SB-96-71, hep-th/9611216, Commun. Math. Phys. ???
5. Lieb, E.H., Liniger, W.: Exact analysis of an interacting Bose gas I. The general solution and the ground state. Phys. Rev. 130, 1605–1616 (1963)
6. Lieb, E.H.: Exact analysis of an interacting Bose gas II. The excitation spectrum. Phys. Rev. 130, 1616–1634 (1963)
7. Zakharov, V.E., Shabat, A.B.: Exact theory of two-dimensional self-focusing and one-dimensional self-modulation of waves in nonlinear media. Sov. Phys. JETP 34, 62–69 (1972)
8. Faddeev, L.D., Sklyanin, E.K.: Quantum mechanical approach to completely integrable field theory models. Sov. Phys. Dokl. 23, 902–904 (1978)
9. Yang, C.N., Yang, C.P.: Thermodynamics of a one dimensional system of bosons with repulsive delta-function interaction. J. Math. Phys. 10, 1115–1122 (1969)
10. Its, A.R., Izergin, A.G., Korepin, V.E.: Correlation radius for one-dimensional impenetrable Bosons. Phys. Lett. 141A, 121–125 (1989)
11. Its, A.R., Izergin, A.G., Korepin, V.E.: Temperature correlators of the impenetrable Bose gas as an integrable system. Commun. Math. Phys. 129, 205–222 (1990)
12. Its, A.R., Izergin, A.G., Korepin, V.E., Slavnov, N.A.: Differential equations for quantum correlation functions. Int. J. Mod. Phys. B4, 1003–1037 (1990)
13. Korepin, V.E., Slavnov, N.A.: The time-independent correlation functions of an impenetrable Bose gas as a Fredholm minor. Commun. Math. Phys. 129, 103–113 (1990)
14. Slavnov, N.A.: Differential equations for multipoint correlation functions in a one-dimensional impenetrable Bose gas. Theor. Math. Phys. 106, 131–142 (1996)
15. Korepin, V.E.: Generating functional of correlation functions for the nonlinear Schrödinger equation. Funk. Analiz i jego Prilozh 23, 15–23 (1989) (in Russian)
16. Korepin, V.E., Slavnov, N.A.: Correlation function of fields in a one-dimensional Bose gas. Commun. Math. Phys. 136, 633 (1991)
17. Lenard, A.: One-dimensional impenetrable bosons in thermal equilibrium. J. Math. Phys. 7, 1268–1272 (1966)

Communicated by Ya. G. Sinai
Commun. Math. Phys. 189, 729 – 757 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997
Nowhere Dispersing 3D Billiards with Non-vanishing Lyapunov Exponents Leonid A. Bunimovich, Jan Rehacek School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332, USA Received: 14 February 1996 / Accepted: 21 March 1997
Abstract: We describe a class of 3-dimensional regions with focusing components that generate a billiard system with non-vanishing Lyapunov exponents. To do this we answer affirmatively the long-standing question whether or not the chaotic motion caused by defocusing can be produced in more than two dimensions.

Introduction

Systems of the billiard type represent one of the most popular and best investigated classes of hyperbolic dynamical systems with singularities. The first class of billiards thoroughly studied were the Sinai billiards, which exhibit the strongest stochastic properties, caused by the dispersing character of the boundary components. In work spanning two decades Sinai and his school developed ideas originally expressed by Hopf [H] for geodesic flows to prove ergodicity for hyperbolic systems with singularities. These techniques [S1] were extended and clarified in [B-S] and later in [S-C] and [K-SS]. After Lazutkin [L] proved the existence of caustics for billiards in smooth convex domains, it seemed that regions with focusing components (i.e. convex inwards) would not possess the ergodic properties attributed to dispersing billiards. It has been discovered, however, that even regions with focusing components may produce chaotic behavior, if the focusing at the moments of reflection is compensated for on the average by defocusing. This occurs when the focusing is strong enough and the free path between reflections is long enough. In [B1, B2] a class of regions was introduced (bounded by circle arcs and, perhaps, dispersing and/or neutral components) for which the billiards were shown to be ergodic and K-systems. Later some other regions (with focusing components of more general types) were also studied by Donnay [D3], Markarian [M1, M2], Wojtkowski [W2] and Bunimovich [B4, B5]. All of this work has been done for planar billiards.
Therefore for more than 20 years the following problem was open: Is the mechanism of defocusing, discovered in [B2, B6] purely two-dimensional or does it work in higher
dimensions too? This paper shows that the mechanism of defocusing is rather universal. Even though we study the regions with focusing components only in three dimensions, we believe that the basic idea should apply in the general case. In fact, we have shown how to fight the principal problem that is a weak focusing in the direction orthogonal to the plane of the orbit. The existence of several such directions doesn’t seem to be the principal, but rather a technical problem that demands some cumbersome calculations. We follow the basic guidelines given in [B3] and prove the non-vanishing of Lyapunov exponents for a specific class of regions consisting of flat faces with spherical caps attached to them. To ensure a finite time of focusing (see [B3, W3]), we’ll derive some conditions, limiting the size of the spherical caps. This limitation is due to the fact that the reflections from the sphere have different effects on the nearby trajectories when viewed from different directions. In the plane of the orbit the nearby rays are focused just as in the 2-dimensional case. In any two-dimensional plane orthogonal to it, however, the focusing is weaker, and if the orbit were given sufficiently long series of reflections, they might leave the sphere with the focusing so slow that at the moment of arrival at the next spherical cap the nearby rays might still be converging. That would allow arbitrarily strong contraction of nearby orbits that couldn’t be compensated for by a defocusing in the following evolution. It is worthwhile to recall that in the 2D case the circular components of the boundary may have an arbitrarily large internal angle and may be arbitrarily close to each other. On the contrary, in the 3D case one needs much stronger conditions, namely the spherical caps need to be sufficiently small (in terms of the angle) and the distances between them must be sufficiently big. Our results also open the possibility to construct ergodic geodesic flows on a 3D sphere. 
Corresponding 2D examples that were inspired by the discovery of the mechanism of defocusing have been considered in [D1, D2] and [B-G]. The structure of the paper is as follows. In the first section, we introduce the regions in question and review the necessary background from the theory of billiards. We then derive some relations for the evolution of Jacobi fields, which are useful tools for studying the behavior of trajectories close to a billiard orbit. The third section contains technical lemmas and two, for our purposes fundamental, propositions. These assert that even though the passage of the orbit through the sphere causes local convergence of the neighboring trajectories, this convergence is in all the principal directions restricted to a relatively small time interval. The last section contains the proof of the main result.
1. Description of the model and the main result
We will study billiards in a class of 3-dimensional regions described below (an example of a typical region is shown in Fig. 1). The boundary of the region consists of flat walls and spherical caps attached to them. We will consider only caps whose maximal angle satisfies ω″ < 90°. By the maximal angle of a spherical cap we mean the biggest possible angle subtended by the spherical cap at the center of the sphere. Since every orbit lies in a uniquely determined plane, we will do most of our computations in this plane. Every such plane cuts the spherical cap in a circular arc, whose angle ω′ (see Fig. 2) will be called the effective angle of the spherical cap. In Sects. 2 and 3 we will, for short, call this angle the angle of the spherical cap. After the computations are done in a fixed plane (Proposition 1), we return to the angle ω″ in Proposition 2. Our aim is to obtain relationships between the given angles ω″, the radii of the corresponding spheres and the “distances” between the different caps which will ensure that the motion of our billiards has non-vanishing Lyapunov exponents.
Fig. 1. Billiard trajectory
Fig. 2. Spherical cap
Consider an evolution of a 2-D infinitesimal control surface γ (also called a wavefront) of class C 2 perpendicular to the orbit. The rate at which the neighboring trajectories diverge is measured by the curvature operator (the operator of the second fundamental form) of the surface γ. To demonstrate that the defocusing mechanism works, we show that if the surface γ approaches the spherical cap with the positive curvature which is small in all directions, then after the last reflection from the cap the surface focuses in all directions within a certain fixed distance (see Proposition 1). After passing through focusing points the surface will have a strictly positive curvature operator and, after a sufficiently long free path, will return to some (possibly other) cap with a small positive curvature again. This property ensures that the mechanism of defocusing discovered in [B2] for 2-D billiard works also for our 3-D regions. We introduce the following notation. Let Q ⊂ R3 be a region described above, whose boundary is equipped with a field of inward normal vectors n(q). Further, let M be a restriction to Q of the unit tangent bundle of R3 . Points in M have the form x = (q, v), where q ∈ Q is the support of x and v ∈ Tq (R3 ). By a billiard we mean a dynamical system in M, generated by the motion of x ∈ M along a straight line determined by v with unit speed. When this line reaches the boundary of Q it is reflected according to the rule “the angle of incidence is equal to the angle of reflection”. The angle φ is measured with respect to the normal n(q). This motion generates a flow on the phase space which we shall denote by St . As usual, this flow induces the discrete dynamical system with the induced mapping T obtained by the restriction of St to the boundary (see Fig. 1). The 3-D billiard mapping preserves the measure dµ = const. 
cos(φ) sin(φ) dφ dϑ dr1 dr2, where φ is the angle of reflection, r1, r2 are Cartesian coordinates in the tangent plane and ϑ is the polar angle of the projection of the outgoing ray onto the (r1, r2)-plane. The constant is the usual normalizing constant, chosen so that µ(M) = 1. Strictly speaking, the billiard mapping is defined only for a subset (of full measure) of M. Since detailed information about the singularity set is not necessary for our purposes, we will still call this subset M. In order to describe the linearized dynamics in the vicinity of a billiard orbit, we use the concept of orthogonal Jacobi fields, that is, vector fields ξ(t) = (α(t), β(t)) ∈ V × V satisfying α′ = β and β′ = 0, where V is the plane perpendicular to the orbit. Since our main interest lies in the case when the orbit has a series of reflections from a sphere, we shall also need formulas which describe the change of Jacobi fields at the moment of reflection. Let n(q) be the unit normal vector at the point of reflection and let v⁺ and v⁻ be the unit vectors along the directions of the outgoing and incoming billiard orbit, respectively. Then v⁺ = v⁻ − 2(n · v⁻)n, and the vectors v⁺ and v⁻ span a plane P in which the whole orbit lies as long as it is within the sphere S. This plane is unique and contains any two points of reflection in a series of reflections from the spherical cap along with the center of the corresponding sphere. The plane V, perpendicular to the orbit, is then naturally split into V = Vp ⊕ Vt, where Vp = V ∩ P and Vt is the orthogonal complement to Vp in V. The plane V thus has two distinguished directions, which will be referred to as the planar direction and the transversal (or orthogonal) direction. The Jacobi fields can be split accordingly and
the changes upon reflection of the respective components of Jacobi fields in Vp and Vt are given in Sect. 2. Since the evolution of a Jacobi field along the orbit can also be viewed as the evolution of the infinitesimal surface γ perpendicular to the orbit, we shall adopt the notation ξ(t) = (dr(t), dφ(t)) and often drop the argument t. More precisely, if we denote κ(t) = dφ(t)/dr(t), then κ can be thought of as a curvature of the surface St(γ) in the direction which lies in the plane P or is orthogonal to it, while (dr, dφ) stand for a variation of the arc length and the angle of the normal vector for a curve which is tangent to Vp or to Vt. Since we always investigate the evolution of Jacobi fields separately for both directions, we use the same notation (dr, dφ) in both cases. By St(γ) we mean the surface obtained by following the orbit of each point in γ in the direction of the unit normal vector to this surface. Let us also note that κ′(t) = −κ²(t), which is a special case of the Riccati equation that appears in the geometry of geodesics. If we identify ±∞ and set κ(t) = ∞ when dr(t) = 0, then we say that for this value of t the infinitesimal surface γ goes through the focusing (conjugate) point in the respective direction. And finally let us note that the formulas for the evolution of the Jacobi fields in both planar and transversal directions can be expressed in terms of κ(t) as follows (see also [S1]):
\[ \kappa(t) = \frac{\kappa(0)}{1 + \kappa(0)\,t} \tag{1.1} \]
for evolution along the free path. For reflections in the planar and transversal directions, respectively, we get
\[ \kappa^{+} = \kappa^{-} - \frac{2k}{\cos(\phi)} \tag{1.2} \]
and
\[ \kappa^{+} = \kappa^{-} - 2k\cos(\phi), \tag{1.3} \]
where - and + denote the value of κ before and after the reflection, φ is the angle of incidence and k is the curvature of the reflecting surface at the point of reflection. Since we consider a sphere with radius ρ we have k = 1/ρ. Our main result is the following. Theorem 1. Let the boundary ∂Q contain only flat components and spherical caps attached to some of the flat components that are perpendicular to all its adjoined flat components (see Fig. 1). Suppose that the angles of all spherical caps are smaller than 90◦ and that almost every trajectory enters some spherical cap. Then there exists a number R > 0, such that a billiard in Q has non-vanishing Lyapunov exponents provided that each spherical cap is attached to a certain subregion of the size R (the exact conditions on the region Q are specified at the end of Sect. 3) and that these subregions are separated from each other. Remark 1. The main difficulty in the proof of this theorem is caused by the fact that when the surface γ reapproaches the spherical cap C, the principal curvature directions which are invariant along its free path do not generally coincide with the planar and orthogonal directions. To overcome this, we will enclose the surface γ between two control surfaces of constant positive curvature at the entrance. One of these surfaces will have the maximal curvature of γ and the other the minimal one. For both of these surfaces, the planar and orthogonal directions are principal and are preserved during a series of reflections in one spherical cap. Thus, after leaving the spherical cap we can compute the curvatures in both directions separately. That enables us to state some conditions on the entrance curvature κ so that the focusing points in both principal directions will occur within a certain fixed distance after the last reflection in a series
of reflections in a spherical cap. Therefore a long free path will provide a defocusing leading to the exponential divergence of nearby trajectories (i.e. to non-vanishing of Lyapunov exponents). Then we prove that the curvatures of an arbitrary surface after the last reflection in a series of consecutive reflections from a given spherical cap lie within the range of the curvatures of the two special surfaces considered above. We’ll make this exact in the third section. As a preliminary step, we will derive some formulas describing the evolution of the Jacobi fields along the orbit, and from these we shall derive formulas for their evolution between the entrance and the exit from the spherical cap.
2. Evolution of Jacobi Fields
Consider a spherical cap C, which is a part of a sphere S (with radius ρ) obtained by cutting this sphere by some 2D plane, and such that the angle which the cap subtends at the center is ω′ < 90°. We use Jacobi fields to describe the infinitesimal variations along the billiard orbit from point A to point B (see Fig. 2). Point A lies in the middle of the first chord which the orbit cuts in the sphere S. Similarly point B lies in the middle of the last chord. We choose these two points because the formulas obtained seem to be easier to handle than formulas resulting from considering the points just before the first reflection and immediately after the last one, which might seem to be the most natural choice. We will point out the difficulties with this choice at the end of this section. The only difficulty with our choice is that it forces us to work with the angle ω (the angle which A and B subtend at the center of the sphere), which in general is different from the angle ω′ of the spherical cap. Let us consider a billiard orbit having n reflections between points A and B. We shall handle evolution in the planar and orthogonal directions separately.
In both cases the Jacobi field at point A will be denoted by (dr, dφ) and at point B by (dr′, dφ′). These are related by the equation
\[ \begin{pmatrix} dr' \\ d\phi' \end{pmatrix} = G \begin{pmatrix} dr \\ d\phi \end{pmatrix}, \]
where the matrix G is a product of the following matrices:
\[ T = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix} \]
for a free path of length t, and
\[ P = \begin{pmatrix} -1 & 0 \\ \dfrac{2}{\rho\cos(\phi)} & -1 \end{pmatrix} \]
for the reflection in the planar direction, or
\[ O = \begin{pmatrix} -1 & 0 \\ \dfrac{2\cos(\phi)}{\rho} & -1 \end{pmatrix} \]
for the reflection in the orthogonal direction. It is clear that for the length of the free path between the reflections we get t = 2ρ cos(φ). Because of the scaling, we will mostly consider the case ρ = 1, to which we restrict ourselves in the following derivation. By successive applications of the matrices P and T, we get the formula for the evolution in the planar direction from A to B in the form:
\[ \begin{pmatrix} dr' \\ d\phi' \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \dfrac{2n}{\cos(\phi)} & 1 \end{pmatrix} \begin{pmatrix} dr \\ d\phi \end{pmatrix}, \]
where n is the number of reflections of the billiard orbit between A and B. This, in terms of curvatures, yields
\[ \kappa' = \frac{d\phi'}{dr'} = \frac{2n}{\cos(\phi)} + \frac{d\phi}{dr}. \tag{2.1} \]
Similarly we could obtain a relation between the exit curvature κ′ (just after the last reflection) and the entrance curvature κ = dφ/dr (just before the first one):
\[ \kappa' = \frac{-2n/\cos(\phi) + (2n-1)\kappa}{2n - 1 - (2n-2)\,\kappa\cos(\phi)}. \tag{2.2} \]
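Formulas (2.1) and (2.2) can be checked numerically by multiplying the matrices T and P in the order in which the orbit meets them. The following is only a sketch for ρ = 1; the helper names (`matmul`, `planar_a_to_b`, `planar_entrance_to_exit`) and the sample values of n, φ and κ are illustrative assumptions, not part of the paper.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def free_path(t):
    """Matrix T for a free path of length t."""
    return ((1.0, t), (0.0, 1.0))

def planar_reflection(phi, rho=1.0):
    """Matrix P for a reflection in the planar direction."""
    return ((-1.0, 0.0), (2.0 / (rho * math.cos(phi)), -1.0))

def curvature(g, kappa):
    """Curvature dphi'/dr' after applying g to a field with dphi/dr = kappa."""
    return (g[1][0] + g[1][1] * kappa) / (g[0][0] + g[0][1] * kappa)

def planar_a_to_b(n, phi):
    """Midpoint of the first chord to midpoint of the last chord (rho = 1)."""
    c = math.cos(phi)
    g = free_path(c)  # half chord before the first reflection
    for i in range(n):
        g = matmul(planar_reflection(phi), g)
        # full chord between reflections, half chord after the last one
        g = matmul(free_path(2 * c if i < n - 1 else c), g)
    return g

def planar_entrance_to_exit(n, phi):
    """Just before the first reflection to just after the last one (rho = 1)."""
    g = planar_reflection(phi)
    for _ in range(n - 1):
        g = matmul(free_path(2 * math.cos(phi)), g)
        g = matmul(planar_reflection(phi), g)
    return g

# Illustrative sample values (assumptions made for this check only).
n, phi, kappa = 3, math.radians(70.0), 0.2
c = math.cos(phi)
lhs_21 = curvature(planar_a_to_b(n, phi), kappa)
rhs_21 = 2 * n / c + kappa                      # right-hand side of (2.1)
lhs_22 = curvature(planar_entrance_to_exit(n, phi), kappa)
rhs_22 = (-2 * n / c + (2 * n - 1) * kappa) / (
    2 * n - 1 - (2 * n - 2) * kappa * c)        # right-hand side of (2.2)
print(lhs_21, rhs_21)
print(lhs_22, rhs_22)
```

Each printed pair should agree up to rounding, since the matrix product and the closed formulas describe the same linear fractional map of κ.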
These formulas give an intuitive feeling for what happens in the planar direction. As we increase the number of reflections, the focusing point tends to the middle of the chord between two consecutive reflections. This was already proven in [B1]. At each reflection the curvature is diminished so much that it always focuses before the next reflection. In the transversal direction, however, the curvature drop is much smaller and it usually takes a series of reflections to change its sign. It is easy to see that in the planar direction the curvature after a series of reflections approaches −1/(ρ cos(φ)), while it will be shown that in the transversal direction it oscillates if we move along the sphere indefinitely. The corresponding formulae for the evolution of a control surface in the transversal direction are obtained in a similar way. In the first place we again multiply the matrices T and O in the right order to get:
\[ \begin{pmatrix} dr' \\ d\phi' \end{pmatrix} = G \begin{pmatrix} dr \\ d\phi \end{pmatrix}, \]
where
\[ G = \begin{pmatrix} \cos 2n\phi & \bigl(\cos(2n+1)\phi - \cos(2n-1)\phi\bigr)/2 \\ 2\bigl(\cos\phi + \cos 3\phi + \cdots + \cos(2n-1)\phi\bigr) & \cos 2n\phi \end{pmatrix} = \begin{pmatrix} \cos 2n\phi & -\sin 2n\phi\,\sin\phi \\ \dfrac{\sin 2n\phi}{\sin\phi} & \cos 2n\phi \end{pmatrix}. \]
Now we use the relation between the angles ω and φ, 2nφ = 180°·n − ω, or in trigonometric terms
\[ \sin(\phi) = \cos\Bigl(\frac{\omega}{2n}\Bigr), \tag{2.3} \]
to obtain
\[ G = \begin{pmatrix} \cos(\omega) & \sin(\omega)\sin(\phi) \\ -\dfrac{\sin(\omega)}{\sin(\phi)} & \cos(\omega) \end{pmatrix} \]
or
\[ G = \begin{pmatrix} \cos(\omega) & \sin(\omega)\cos(\omega/2n) \\ -\dfrac{\sin(\omega)}{\cos(\omega/2n)} & \cos(\omega) \end{pmatrix}. \]
This gives us an expression for the curvature at the point B:
\[ \frac{d\phi'}{dr'} = \frac{-\sin(\omega)/\cos\bigl(\frac{\omega}{2n}\bigr) + \cos(\omega)\,\frac{d\phi}{dr}}{\cos(\omega) + \sin(\omega)\cos\bigl(\frac{\omega}{2n}\bigr)\,\frac{d\phi}{dr}}. \tag{2.4} \]
In particular, when the curvature at A is 0 we have
\[ \frac{d\phi'}{dr'} = \frac{-\tan\omega}{\cos\frac{\omega}{2n}}. \tag{2.5} \]
The analog of formula (2.4) in the case when the radius of the sphere is ρ (in (2.4) ρ = 1) is
\[ \frac{d\phi'}{dr'} = \frac{-\sin(\omega)/\bigl(\rho\cos\bigl(\frac{\omega}{2n}\bigr)\bigr) + \cos(\omega)\,\frac{d\phi}{dr}}{\cos(\omega) + \sin(\omega)\,\rho\cos\bigl(\frac{\omega}{2n}\bigr)\,\frac{d\phi}{dr}}. \tag{2.6} \]
And finally let us write the matrix G in the case where we consider the part of the orbit between points C and D only (Fig. 2):
\[ G = \frac{-1}{\sin(\phi)} \begin{pmatrix} \sin((2n-1)\phi) & -\sin((2n-2)\phi) \\ -\sin(2n\phi) & \sin((2n-1)\phi) \end{pmatrix}. \]
From this matrix a curvature formula similar to (2.4) can be derived. Unfortunately this cannot be simplified (using the analog of (2.3)) to a convenient form expressible in terms of the angle ω. That’s why we chose to consider the evolution from the point A to B (in Fig. 2) rather than from C to D.
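The transversal formulas of this section can be checked in the same spirit. The sketch below (ρ = 1) multiplies T and O from midchord to midchord, compares the product with the closed form in φ, and compares the resulting curvature with (2.4). The function names and the sample values are assumptions made for illustration. Note that substituting 2nφ = 180°·n − ω changes the matrix by an overall factor (−1)ⁿ, which cancels in the curvature quotient, so only the curvature is compared with the ω-form.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def transversal_a_to_b(n, phi):
    """Product of T and O from A to B for n reflections (rho = 1)."""
    c = math.cos(phi)
    half_chord = ((1.0, c), (0.0, 1.0))            # T with t = cos(phi)
    reflection = ((-1.0, 0.0), (2.0 * c, -1.0))    # O with rho = 1
    step = matmul(half_chord, matmul(reflection, half_chord))
    g = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(n):
        g = matmul(step, g)
    return g

def closed_form(n, phi):
    """Closed form of G in terms of phi."""
    a = 2 * n * phi
    return ((math.cos(a), -math.sin(a) * math.sin(phi)),
            (math.sin(a) / math.sin(phi), math.cos(a)))

def exit_curvature(n, omega, kappa):
    """Right-hand side of (2.4) for entrance curvature kappa at A."""
    h = math.cos(omega / (2 * n))
    return (-math.sin(omega) / h + math.cos(omega) * kappa) / (
        math.cos(omega) + math.sin(omega) * h * kappa)

# Illustrative sample values (assumptions made for this check only).
n, phi, kappa = 3, math.radians(80.0), 0.1
omega = math.radians(180.0 * n) - 2 * n * phi      # here omega = 60 degrees
g = transversal_a_to_b(n, phi)
via_matrix = (g[1][0] + g[1][1] * kappa) / (g[0][0] + g[0][1] * kappa)
via_24 = exit_curvature(n, omega, kappa)
print(via_matrix, via_24)
```

With κ = 0 the same function reproduces (2.5), since the quotient reduces to −tan(ω)/cos(ω/2n).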
3. Passage of the Control Surface Through the Spherical Cap
In this section we will prove a statement (Proposition 1) which will enable us to say that even though the passage through the spherical cap C may cause some local convergence of trajectories, this convergence is in some sense bounded. Therefore after defocusing the long divergence of trajectories during the free path leads to the divergence (in time average) of nearby trajectories in the billiards under consideration. To achieve this, however, we have to assume that the spherical cap is not too big (in terms of the spherical angle), and so before the proof of this proposition, we would like to illustrate why we need this restriction. Let us look at an orbit coming into the cap and having two reflections in it (as in Fig. 3). Also consider an infinitesimal beam of trajectories along the orbit, represented by a small surface γ perpendicular to it. Since the behavior in the planar direction has been well understood (for a nice geometric description see [D3]), we will concentrate on what happens in the direction perpendicular to it. Let us denote the curvature in this direction at the moment before the first reflection by κ and the angle of the first reflection by φ. Then, using the formulas (1.1) and (1.3), we can directly compute the curvature immediately after the second reflection as
\[ c(\kappa) = \frac{\kappa - 4\cos(\phi) - 4\cos^2(\phi)\,(\kappa - 2\cos(\phi))}{1 + 2\cos(\phi)\,(\kappa - 2\cos(\phi))}. \]
In particular, if the approaching surface is flat in this direction we get
\[ c(0) = \frac{-4\cos(\phi) + 8\cos^3(\phi)}{1 - 4\cos^2(\phi)}. \tag{3.1} \]
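The two-reflection formula can be reproduced by applying (1.3), then (1.1) over the chord of length 2 cos(φ), then (1.3) again. The sketch below (unit sphere, k = 1) does this and also evaluates c(0) on both sides of φ = 45°; the function names and the sample angles are illustrative assumptions.

```python
import math

def free_path(kappa, t):
    """Formula (1.1): curvature after a free path of length t."""
    return kappa / (1.0 + kappa * t)

def reflect_transversal(kappa, phi, rho=1.0):
    """Formula (1.3) with k = 1/rho."""
    return kappa - 2.0 * math.cos(phi) / rho

def two_reflections(kappa, phi):
    """Reflection, chord of length 2 cos(phi), reflection (rho = 1)."""
    k1 = reflect_transversal(kappa, phi)
    k2 = free_path(k1, 2.0 * math.cos(phi))
    return reflect_transversal(k2, phi)

def c_closed(kappa, phi):
    """Closed form c(kappa) for the curvature after the second reflection."""
    c = math.cos(phi)
    return (kappa - 4 * c - 4 * c * c * (kappa - 2 * c)) / (
        1 + 2 * c * (kappa - 2 * c))

# Step-by-step evolution against the closed form for one sample (assumed) input.
print(two_reflections(0.05, math.radians(50.0)), c_closed(0.05, math.radians(50.0)))
# Sign of the exit curvature of an initially flat surface around 45 degrees.
for deg in (44.0, 45.0, 46.0):
    print(deg, c_closed(0.0, math.radians(deg)))
```

The exit curvature of a flat incoming surface comes out negative just below 45°, zero (up to rounding) at 45° and positive just above it.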
Fig. 3. Need for the boundedness of a spherical angle
Analyzing the last formula we see that for the angle of reflection φ = 45°, the curvature at the exit is c(0) = 0 again (this corresponds to the solid line in Fig. 3). It is also easy to see that for φ < 45° (dashed line) the exit curvature is negative and, for φ > 45°, positive. For small initial curvatures κ and for angles φ close to 45°, the denominator of c(κ) is approximately 1 − 4 cos²(φ), the denominator in (3.1). Rewriting the numerator gives us approximately c(κ) = κ + c(0). If we make the angle φ just slightly smaller than 45°, we see that for very small initial curvatures κ > 0 (and such curvatures are to be expected after a long free path) we get negative curvatures of arbitrarily small absolute value at the exit. This means that after our control surface leaves the given cap, it may converge for a very long time, possibly all the time until it hits another cap, which in turn may produce another convergence. Such uncontrollably large convergence may prevent the existence of non-vanishing Lyapunov exponents for billiards. From Fig. 3 it is clear that it is the angular size of the cap which enables this behavior. The limiting case happens when the spherical cap subtends a right angle. This case still permits the square orbit (solid line in Fig. 3), which has the property that the initially flat control surface leaves the cap flat again. If the angle of the cap, however, is smaller than 90°, then a flat control surface leaves the sphere with a curvature that is negative and bounded away from zero. The purpose of the following Proposition 1 is exactly to show that in this case the control surface, after leaving the sphere, focuses in all directions within a certain fixed distance (denoted by R in the Proposition), thus having enough space for expansion. Indeed, by ensuring that there is enough free path between spherical caps, we can generate sufficient expansion to compensate for the local contraction in accordance with the mechanism of defocusing [B1-B4].
Proposition 1. Consider a billiard orbit having n consecutive reflections in a spherical cap, whose radius is ρ and whose maximal angle is ω′ < 90°. Then there exist constants λ(ρ, ω′) > 0 and R(ρ, ω′) > 0 such that every control surface γ whose initial curvature satisfies κ ∈ (0, λ(ρ, ω′)) in every direction (on that surface) has both exit principal curvatures outside the interval (−1/R(ρ, ω′), 0).
Proof. First it is clear from formula (2.6) that it is enough to consider only a spherical cap with radius 1. In the case of a radius ρ the constants λ and R in the formulation of this proposition would be scaled by the factors 1/ρ and ρ, respectively. Indeed, formula (2.6) is invariant under multiplying the radius ρ by an arbitrary constant and dividing the curvatures in the formula by the same constant. The proof is now divided into two steps. In the first part of the proof we will show the result for the special case of a surface whose principal directions coincide with the planar and transversal directions. Two particular surfaces which will later be of interest are the spherical surfaces γf and γd in Fig. 4. In this case both planar and transversal directions are invariant for the spherical cap and we will consider them separately in the first two lemmas. We will then state additional lemmas about the effect of the free path on control surfaces and finish the proof by considering the general surface γ0 whose initial curvatures are bounded by those of γf and γd.
Fig. 4. Curvature graphs of surfaces (curvature κ against the direction θ between −90° and 90° for γf, γ0 and γd)
Lemma 1. Suppose that the planar direction is the principal curvature direction at the entrance to the spherical cap of radius one and that the initial curvature of the surface γ in the planar direction satisfies 0 < κ < 1. Then the exit curvature in the planar direction satisfies κ′ < κ − 1 < 0.
Proof. This lemma follows from well known facts about planar billiards (see [B2]), namely that after leaving the spherical cap, the surface γ actually focuses before it leaves the corresponding sphere. This can be seen from formula (2.2), from which we obtain (for κ > 0):
\[ \kappa' < -\frac{2n}{2n-1}\,\frac{1}{\cos\phi} + \kappa < \frac{-1}{\cos\phi} + \kappa. \]
Lemma 2. Suppose that the transversal direction is the principal curvature direction at the entrance to the spherical cap of unit radius and angular size ω′. Then there exist constants λ(ω′) and R(ω′) such that for the control surface γ the following holds. If all entrance curvatures whose principal directions lie in the transversal direction satisfy
0 < κ < λ(ω′) then the corresponding exit curvatures satisfy either κ′ < −1/R(ω′) or κ′ > 0.
Proof. From (2.5) it follows that there are two ways in which the curvature at the exit can be small and negative. Either ω is very small or it is bigger than 180°. We will consider these two cases separately. The second case (when ω > 180°) can be dismissed immediately by restricting the angle ω″ and thus restricting also any effective angle ω′. Indeed, if the effective angle of any orbit is ω′ ≤ 90° we obtain ω ≤ 180° and from (2.5) we see that the exit curvature is within the stated bounds. For non-zero entrance curvature we obtain the same conclusion using the monotonicity properties (subtraction of some fixed curvature does not change the order of two curvatures and neither does the free path). As for the first case, we consider orbits with very small effective angle ω′. A small effective angle may result in a small angle ω and by (2.5) possibly a small negative curvature. We will first discuss the case n = 1 (one reflection), since it is this case that gives us the numerical values of the constants λ and R. Since the value subtracted from the curvature in one reflection is 2 cos φ = 2 sin(ω′/4), it is enough to set λ = 1/R = sin(ω′/4) and we get the statement of the lemma. In case the sphere doesn’t have unit radius, according to the remark at the beginning of the proof of Proposition 1, the constants are easily rescaled to λ = 1/R = (1/ρ) sin(ω′/4). We now claim that in a given plane the orbit which gives us the largest exit curvature (i.e. the one that is closest to 0 in absolute value) is the one with only one reflection. Indeed, the more reflections, the longer the overall free path within the cap (since such orbits trace out the cap more closely) and that causes a larger curvature drop resulting from the free-path formula. The cumulative value of the curvature drop resulting from all the reflections is also least for one reflection. To see this, let us recall that if the orbit is to have n reflections, its angle of reflection must satisfy
\[ \phi > 90^\circ - \frac{\omega'}{2(n+1)}. \]
Since for each reflection we have to subtract 2 cos φ, we obtain that for n reflections we subtract
\[ 2n\cos\phi \le 2n\cos\Bigl(90^\circ - \frac{\omega'}{2(n+1)}\Bigr) = 2n\sin\frac{\omega'}{2(n+1)}. \]
The claim now follows from the fact that the last expression is increasing as a function of n. These two lemmas prove the statement of Proposition 1 in the case of a surface γ whose principal curvature direction before the first reflection is aligned with the planar direction. Indeed, in that case the second principal curvature direction coincides with the transversal direction, and because both planar and transversal directions are invariant during the series of reflections, we can consider their evolution separately. To conclude the same for an approaching surface with non-aligned principal directions we have to consider a general surface γ0, perpendicular to the direction of the motion. Before we do that, we have to mention a few facts from differential geometry.
Convention. It is clear that on any surface γ the curvature directions can be parametrized by an angle θ measured from some standard direction. During a passage through a spherical cap, the billiard orbit lies in a fixed plane V and this plane intersects the surface γ in a certain curve. This curve defines a definite direction on the surface γ (we called it the “planar” direction) which will be assigned the angular value 0. The angular parameter
for the transversal direction thus becomes 90° (or −90°). Since angles differing by 180° correspond to the same curvature directions we need to consider only θ ∈ (−90°, 90°). Sometimes it will be convenient to parametrize the directions on the control surface by the angle ψ = 2θ with ψ ∈ (−180°, 180°). There are two ways to express a curvature in a general direction v. The first way hinges on the fact that the curvature in a given direction v(θ) on the surface is given by (Bv, v), where B is the second fundamental form of the surface. If we write v in terms of the principal directions w1, w2 as v = cos(θ)w1 + sin(θ)w2, we obtain that the curvature κ in the direction v(θ) is
\[ \kappa(\theta) = \cos^2(\theta)\,\kappa_1 + \sin^2(\theta)\,\kappa_2, \tag{3.2} \]
where κ1 ≤ κ2 are the principal curvatures. The second way of expressing a curvature in an arbitrary direction can be obtained from the “half-angle” formulas (recall that we set ψ = 2θ):
\[ \sin^2(\theta) = \frac{1 - \cos(\psi)}{2}, \qquad \cos^2(\theta) = \frac{1 + \cos(\psi)}{2}. \]
It is more convenient for us to use the angle ψ to parametrize the directions. For the curvature we obtain
\[ \kappa(\psi) = k - K\cos(\psi), \tag{3.3} \]
where k = (κ1 + κ2)/2, K = (κ2 − κ1)/2 > 0. The positive number K, which we will call the “steepness” of the surface, plays an important role in the process of subtraction of curvatures of two surfaces (which we consider in Lemma 3). These two forms both correspond to surfaces whose curvature attains a minimum at the angle 0. If we deal with only one surface, this can always be arranged. If one, however, considers two surfaces, then one of them may attain minimal curvature at some non-zero angle. In that case both (3.2) and (3.3) hold with the provision that we have to shift the arguments on the right hand side to θ − θ0 and ψ − ψ0, respectively. The subscripted angles θ0, ψ0 thus indicate the position of the minimal curvature. The formulas now may be stated as
\[ \kappa(\theta) = \cos^2(\theta - \theta_0)\,\kappa_1 + \sin^2(\theta - \theta_0)\,\kappa_2, \tag{3.2$'$} \]
\[ \kappa(\psi) = k - K\cos(\psi - \psi_0). \tag{3.3$'$} \]
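As a quick sanity check of the two parametrizations, the sketch below evaluates the θ-form and the ψ-form of the directional curvature, with ψ = 2θ and ψ0 = 2θ0, at several directions. The function names and the sample curvatures are illustrative assumptions.

```python
import math

def kappa_theta(theta, k1, k2, theta0=0.0):
    """Directional curvature in the theta-form, minimal direction theta0."""
    return (math.cos(theta - theta0) ** 2 * k1
            + math.sin(theta - theta0) ** 2 * k2)

def kappa_psi(psi, k1, k2, psi0=0.0):
    """Directional curvature in the psi-form with mean k and steepness K."""
    k = (k1 + k2) / 2.0
    K = (k2 - k1) / 2.0
    return k - K * math.cos(psi - psi0)

# Sample principal curvatures and minimal direction (assumed values).
k1, k2, theta0 = 0.1, 0.4, 0.3
thetas = [math.radians(d) for d in range(-90, 91, 15)]
pairs = [(kappa_theta(t, k1, k2, theta0), kappa_psi(2 * t, k1, k2, 2 * theta0))
         for t in thetas]
for a, b in pairs:
    print(a, b)
```

Both columns should coincide, and the minimum κ1 is attained at θ = θ0.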
Thus any surface perpendicular to the orbit is characterized by a triple (κ1 , κ2 , θ0 ) or (k, K, ψ0 ). For any value of ψ we now consider the difference of (directional) curvatures of two such surfaces. The result of this subtraction gives a function of the angular parameter ψ. It turns out that this function again has the form (3.3) and its parameters are explicitly computed in the lemma below. Lemma 3. Consider two surfaces characterised by (k, K, 0) and (l, L, ψ0 ). Then the difference of their curvatures (directionwise) can be expressed as a curvature of a surface (s, S, ψ1 ), where s = l − k and S, ψ1 can be computed from (3.6) below.
Proof. Without loss of generality we have set to 0 the angle of minimal curvature for one of the surfaces. We seek the triple (s, S, ψ1) such that the difference of curvatures of the two surfaces (in each direction) can be expressed as
\[ l - L\cos(\psi - \psi_0) - (k - K\cos(\psi)) = s - S\cos(\psi - \psi_1). \tag{3.4} \]
First, to eliminate one of the variables to be determined (namely s) we differentiate (3.4):
\[ L\sin(\psi - \psi_0) - K\sin(\psi) = S\sin(\psi - \psi_1). \tag{3.5} \]
Expanding this expression yields
\[ L\sin\psi\cos\psi_0 - L\sin\psi_0\cos\psi - K\sin\psi = S\sin\psi\cos\psi_1 - S\sin\psi_1\cos\psi. \]
By comparing the terms with sin(ψ) and cos(ψ) we obtain two equations for the two unknowns S, ψ1:
\[ L\sin(\psi_0) = S\sin(\psi_1), \tag{3.6a} \]
\[ L\cos(\psi_0) - K = S\cos(\psi_1). \tag{3.6b} \]
From (3.6) S and ψ1 can be determined using the usual geometrical considerations,
\[ S^2 = L^2 - 2KL\cos(\psi_0) + K^2, \tag{3.7} \]
and ψ1 satisfies
\[ \tan(\psi_1) = \frac{L\sin\psi_0}{L\cos\psi_0 - K}. \]
With these relations at hand we can finally determine s. Setting ψ = 0 in (3.4) yields s = l − L cos ψ0 − k + K + S cos ψ1, where S and ψ1 have already been determined. Differentiating (3.5) and setting ψ = 0 gives L cos ψ0 − K = S cos ψ1, which simplifies the previous equation to s = l − k.
Remark 2. The effect of the subtraction of the curvatures can be best visualized in Fig. 5. One can think of the “steepness” L and the direction ψ0 as polar coordinates of a point in the plane. Then the “steepness” S of the surface representing the difference of the curvatures and the new direction ψ1 will be the polar coordinates of the point obtained by translating it along the x-axis by −K.
Lemma 4. Consider the two surfaces in Lemma 3 and suppose that their principal curvatures satisfy κ1 ≤ λ1 < κ2 ≤ λ2. Then there exists a unique (up to a sign) value of ψ0, given below by (3.9), such that the curvatures of the surfaces (κ1, κ2, 0) and (λ1, λ2, ψ0) satisfy
\[ k(\psi) \le l(\psi) \tag{3.8} \]
for all values of ψ, and equality is attained for at least one value ψ = ψ1.
Fig. 5. Subtracting curvatures of two surfaces
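The subtraction of Lemma 3, pictured in Fig. 5 as a translation by −K along the x-axis in the polar (steepness, direction) plane, can be sketched as follows. The function name `subtract_surfaces` and the sample triples are assumptions made for illustration; the angle ψ1 is recovered with `atan2` from (3.6a) and (3.6b).

```python
import math

def subtract_surfaces(k, K, l, L, psi0):
    """Return (s, S, psi1) for the surface (l, L, psi0) minus (k, K, 0)."""
    x = L * math.cos(psi0) - K   # (3.6b): S cos(psi1)
    y = L * math.sin(psi0)       # (3.6a): S sin(psi1)
    return l - k, math.hypot(x, y), math.atan2(y, x)

# Sample surfaces (assumed values): (k, K, 0) and (l, L, psi0).
k, K = 0.30, 0.20
l, L, psi0 = 0.60, 0.30, math.radians(70.0)
s, S, psi1 = subtract_surfaces(k, K, l, L, psi0)

# Compare both sides of (3.4) in a range of directions psi.
angles = [math.radians(d) for d in range(-180, 181, 20)]
residuals = [
    (l - L * math.cos(p - psi0)) - (k - K * math.cos(p))
    - (s - S * math.cos(p - psi1))
    for p in angles
]
print(max(abs(r) for r in residuals))
```

The printed residual should be at rounding level, and S² should agree with (3.7).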
Proof. Recall that k = (κ1 + κ2)/2, K = (κ2 − κ1)/2 and similarly for the other surface. In terms of the previous lemma we have to prove that there is a unique value of ψ0 for which the resulting “difference” surface (s, S, ψ1) has positive curvature in all directions except at ψ1, where its curvature is 0. Naturally ψ1 is a principal direction for the difference surface and 0 is the smaller of its principal curvatures. Since the principal curvatures of the difference surface are s + S and s − S, we need to find the value ψ0 for which s = S (recall that s is independent of ψ0, while S depends on it). From (3.7) and Lemma 3 (S = s = l − k) we obtain
\[ \cos\psi_0 = \frac{K^2 + L^2 - S^2}{2KL} = \frac{K^2 + L^2 - (l-k)^2}{2KL}. \]
Expressed in terms of the principal curvatures this yields
\[ \cos\psi_0 = \frac{(\lambda_2 - \lambda_1)^2 + (\kappa_2 - \kappa_1)^2 - (\lambda_1 + \lambda_2 - \kappa_1 - \kappa_2)^2}{2(\lambda_2 - \lambda_1)(\kappa_2 - \kappa_1)}. \]
By expanding the numerator and rearranging the terms we finally obtain the value of ψ0 from
\[ \cos\psi_0 = 1 - 2\,\frac{(\lambda_1 - \kappa_1)(\lambda_2 - \kappa_2)}{(\lambda_2 - \lambda_1)(\kappa_2 - \kappa_1)}. \tag{3.9} \]
The uniqueness of ψ0 follows immediately from (3.9). Note that the sharp inequality λ1 < κ2 prevents the denominator in (3.9) from vanishing.
Definition 1. Two surfaces satisfying the relation (3.8) with equality attained for at least one value of ψ will be called tangential.
Remark 3. Figure 6 shows why we needed the specific inequalities between curvatures in Lemma 4. In the case κ1 < κ2 < λ1 < λ2 the two surfaces can be separated by a spherical surface. In that case no rotation can make them tangential. If κ1 < λ1 < λ2 < κ2, then the difference curvature changes sign. Therefore no rotation of the (l, L, ψ0) surface can make the two surfaces tangential. Actually in both cases the right-hand side of (3.9)
is bigger than 1 (in absolute value) and the value of cosine is not defined. Also notice that the quotient on the right-hand side of (3.9) is the cross-ratio of the four curvatures.

Fig. 6. Nearly tangential surfaces
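As a numerical illustration of Lemma 4 and Remark 3, the following Python sketch evaluates (3.9) and then checks, via (3.7) and s = l − k, that the resulting difference surface satisfies S = s. The function names and the sample curvatures are hypothetical, chosen only for illustration; they are not part of the original argument.

```python
import math


def tangential_angle(k1, k2, l1, l2):
    """Return psi0 from (3.9), or None if the right-hand side lies outside [-1, 1]."""
    c = 1.0 - 2.0 * (l1 - k1) * (l2 - k2) / ((l2 - l1) * (k2 - k1))
    if abs(c) > 1.0:
        return None  # the situation of Remark 3: no rotation gives tangentiality
    return math.acos(c)


def difference_surface(k1, k2, l1, l2, psi0):
    """Return (s, S, psi1) using s = l - k, formula (3.7) and the formula for tan(psi1)."""
    k, K = (k1 + k2) / 2, (k2 - k1) / 2
    l, L = (l1 + l2) / 2, (l2 - l1) / 2
    s = l - k
    S = math.sqrt(L * L - 2 * K * L * math.cos(psi0) + K * K)
    psi1 = math.atan2(L * math.sin(psi0), L * math.cos(psi0) - K)
    return s, S, psi1


# Hypothetical curvatures with kappa1 <= lambda1 < kappa2 <= lambda2.
psi0 = tangential_angle(1.0, 3.0, 2.0, 5.0)
s, S, psi1 = difference_surface(1.0, 3.0, 2.0, 5.0, psi0)
print(psi0, s, S, psi1)
```

For the two orderings excluded in Remark 3, for instance (1, 2, 3, 4) and (1, 4, 2, 3) passed in the order (κ1, κ2, λ1, λ2), the function returns None.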
To finish the proof of Proposition 1 we need to know that the tangentiality of two surfaces is preserved during the free path, which is the content of the next lemma. Before that, we would like to remark that during the free path it may happen that the inequality (3.8) is reversed. As a matter of fact, any time one of the four principal curvatures “goes through the focusing point” the inequality is reversed. But the tangentiality is preserved. Only the “lower” surface becomes an “upper” one and vice versa.

Lemma 5. Tangentiality of two surfaces is preserved during the free path.

Proof. During the free path the cross ratio in (3.9) is preserved (under the linear fractional transformation representing the free path), as can be verified by direct substitution. This means that during the free path the unique value of ψ0 remains constant. But the principal curvature directions remain constant too and hence the statement follows. In other words, the principal curvature direction ψ0 , which is constant during the free path, satisfies the tangentiality condition (3.9) at each time t. Note that the actual angular value ψ1 where the tangency occurs may not be preserved during the free path.

End of the proof of Proposition 1. Consider now a general surface γ0 , with curvatures κ1 < κ2 ≤ λ (λ is a constant from Lemma 2) and direction θ0 (corresponding to κ1 ). We first construct the surfaces γf and γd with constant curvatures κ1 and κ2 , respectively. Thus for each direction θ,
$$f(\theta) \le \kappa(\theta) \le d(\theta), \tag{3.10}$$
that is, the directional curvatures of γ0 lie between those of γf and γd . Moreover both pairs γ0 , γf and γ0 , γd are tangential.
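The invariance used in the proof of Lemma 5 can be checked numerically. The sketch below assumes that a free path of length t acts on each curvature by the linear fractional map κ ↦ κ/(1 + tκ); this explicit form is an assumption of the example and is not stated in this section (the argument only requires the map to be linear fractional). All function names are hypothetical.

```python
def cross_ratio_quotient(k1, k2, l1, l2):
    """Quotient appearing on the right-hand side of (3.9)."""
    return (l1 - k1) * (l2 - k2) / ((l2 - l1) * (k2 - k1))


def free_path(kappa, t):
    """Assumed curvature after a free path of length t (a linear fractional map)."""
    return kappa / (1.0 + t * kappa)


def quotient_after_free_path(k1, k2, l1, l2, t):
    """Apply the same free path to all four curvatures and recompute the quotient."""
    moved = [free_path(c, t) for c in (k1, k2, l1, l2)]
    return cross_ratio_quotient(*moved)


# Hypothetical sample: some of the focusing (negative) curvatures pass through
# their focusing point during the path, yet the quotient is unchanged up to rounding.
before = cross_ratio_quotient(-3.0, -1.0, -2.0, -0.5)
after = quotient_after_free_path(-3.0, -1.0, -2.0, -0.5, 0.6)
print(before, after)
```

Since cos ψ0 in (3.9) depends on the curvatures only through this quotient, the same ψ0 satisfies (3.9) before and after the free path.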
The billiard orbit consists of points of reflection and of free paths. After each reflection the principal curvature directions change in a manner similar to the one described in Lemma 3. The relation (3.8) (or its opposite), however, is preserved, because at each direction θ we subtract the same value from the three curvatures f (θ), κ(θ), d(θ). In particular it follows from this that if two surfaces are tangential before the reflection, they remain tangential after it. Moreover, the actual angular value ψ1 , where the tangency occurs, is preserved, which is not true for the free path. Since initially the pairs of surfaces γd , γ0 and γf , γ0 are tangential, they remain tangential for the whole series of reflections, because tangentiality is not lost during the reflection (as we just concluded) and it follows from Lemma 5 that the tangentiality is also preserved during the free path. This fact gives us control over the curvatures of γ0 provided we can control those of γd and γf . Let us look at the situation at the exit from the spherical cap (point D in Fig. 2). We have three infinitesimal surfaces with the same tangentiality properties as at the entrance (i.e. the pairs γd , γ0 and γf , γ0 are tangential). We know that the principal curvatures of γf and γd lie outside the interval (−1/R, 0), as was already proved in Lemmas 1 and 2. We now show how tangentiality helps to control the general surface. Since, as we have already remarked, the inequality (3.8) may have been reversed during the free path, we have to do this with some care. The surfaces γf and γd assume planar and transversal directions as their principal directions immediately after the first reflection in a series. Thus the evolution of their principal curvatures (corresponding to the angles 0° and 90°) can be understood easily, as was done in Lemmas 1 and 2.
For convenience we denote the curvatures of these two surfaces in the planar direction by fp and dp respectively, and similarly in the transversal direction (θ = 90°) by ft and dt . It is important to remark here that, unless the curvatures of the surface γ0 have the same value, the difference κ2 − κ1 > 0, and that means that at all times fp ≠ dp and ft ≠ dt . The evolution of the curvatures in the planar direction (θ = 0°) was described many times in the literature (e.g. [B2]) and it is known that at the exit point D the curvatures satisfy
$$f_p < d_p \le -1/R < 0. \tag{3.11}$$
This reflects the fact that both surfaces focus between any pair of consecutive reflections in the spherical cap. In the transversal direction the situation is a bit more complicated. The following three cases may happen:
$$f_t < d_t \le -1/R < 0, \tag{3.12}$$
$$d_t \le -1/R \le 0 \le f_t, \tag{3.13}$$
$$0 \le f_t < d_t. \tag{3.14}$$
The case (3.12) corresponds to the situation, when neither the surface γf nor γd went through the focusing point (in the transversal direction). In (3.13) only the surface γf focused, while in (3.14) both surfaces focused in the transversal direction before they arrived at the exit point D (in Fig. 2). No other configuration of curvatures is possible in the transversal direction, since from the proof of Lemma 2 it is clear that after passing through the spherical cap, both initial curvatures must drop below −1/R. That in particular prevents the situation ft < 0 ≤ dt ,
which would spoil our argument. This discussion so far relies only on Lemmas 1 and 2 and doesn’t make any statements about the curvatures of the general surface γ0 . This surface may have focused several times during the series of reflections and its exit curvatures are difficult to compute. We will show, however, that using the tangentiality we can estimate those curvatures in all of the above cases to obtain the desired statement. The cases (3.12) and (3.14) can be treated simultaneously. In these cases the inequalities between curvatures of γf and γd are of the same type and so the curvature graphs of the surfaces (as in Fig. 6) are two non-intersecting curves of the form (3.3). If the surface γ0 is to have tangentiality with both surfaces γf and γd , it is necessary that its curvature graph lies between those of γf and γd . Hence if we denote the exit curvatures of γ0 by k1 ≤ k(θ) ≤ k2 we obtain f (θ) ≤ k(θ) ≤ d(θ),
∀θ,
from which the corresponding inequalities for the principal curvatures are easily deduced. In the case (3.13) the curvature graphs of γf and γd intersect and it is easy to see that the curvature graph now cannot lie between the surfaces γf and γd (it would violate one of the tangentialities). Hence either the curvature graph of γ0 lies above or below both surfaces γf and γd . We thus have to consider two subcases (k(θ) ≤ f (θ)) & (k(θ) ≤ d(θ)),
(3.15a)
(f (θ) ≤ k(θ)) & (d(θ) ≤ k(θ)).
(3.15b)
In the subcase (3.15a) both principal curvatures of γ0 must be smaller than those of γd since γ0 lies “under” γd . The subcase (3.15b) is the most complicated one. Here one curvature k2 of γ0 (the bigger one) must be positive, since the surface γ0 lies “above” γf , one of whose curvatures is positive. On the other hand, the smaller curvature k1 of γ0 must be smaller than the bigger curvature of γd (whichever it is). Indeed, if it were not, the surfaces γ0 and γd would not be tangential. Since both principal curvatures of γd are smaller than −1/R (see (3.11) and (3.13)), we infer the same for k1 . Hence in this last case we have k1 ≤ −1/R < 0 ≤ k2 , and that concludes the proof of Proposition 1.

Now we leave the plane of the orbit which cuts the cap at an angle ω′ and prove that there exists a region in which the outcoming control surface focuses in all directions for any effective angle ω′, no matter how small it may be (see Fig. 7).

Definition 2. A “zone of focusing” of a given spherical cap is a part of the billiard region which is bounded by the spherical cap, by the flat component to which the cap is attached, by flat components perpendicular to the one with the spherical cap, and finally by a transparent virtual flat component, which is parallel to the one with the spherical cap and at the distance R(ρ, ω″) from it. The number R will be called the size of a zone of focusing.

Proposition 2. Let ω″ be the maximal angle of a spherical cap C, let ρ be its radius and let λ = λ(ρ, ω″) and R = R(ρ, ω″) be the constants from Proposition 1. Then every outcoming control surface whose entrance curvatures were in the interval (0, λ) focuses, after leaving the spherical cap C, within the zone of focusing of size R(ρ, ω″), no matter how small the effective angle of the spherical cap may be.
Fig. 7. Zone of focusing
Proof. The size of the focusing zone is chosen to be the constant from Proposition 1, corresponding to the maximal angle ω″. This automatically ensures that all the surfaces that are perpendicular to an orbit with the maximal effective angle and that enter the zone of focusing with positive curvatures will focus while still being in the same zone of focusing. The main problem is to show that when the “effective angle” is arbitrary, we can still ensure focusing within the zone, which depends only on the maximal angle ω″ and not on the effective angle ω′. We show that if the size of the zone of focusing is bigger than ρ/ sin(ω″/4), then every orbit coming out of the cap focuses before it hits the “transparent bottom” of the zone. Since all the lengths can be rescaled, we can again consider ρ = 1. The effective angle for general orbits can be arbitrarily small and we must prove that the length of the free path (from the “bottom”) is proportionately bigger for such orbits. Let us fix one orbit and denote by P the plane of the orbit in the spherical cap. Since the reflections from the flat “side” walls are equivalent to passing of the orbit through them, we can think that after leaving the spherical cap, the orbit stays in the plane P , rather than leaving it as a result of reflections from the “side” walls. In the plane P the “top” and “bottom” walls are represented by a pair of parallel lines t and b whose distance is bigger than the size of the zone of focusing, i.e. 1/ sin(ω″/4). The spherical cap intersected with P is now a piece of a circle attached to the upper line t. The effective angle of the cap corresponding to the plane P will be denoted by ω′ and can be arbitrarily small. The worst situation is when the angle of reflection is closest to 90°, when the curvature drop resulting from the reflection in the transversal direction is very small.
This happens when the approaching orbit is almost horizontal, has one reflection with the sphere and its angle ω < ω′. In this case the angle which the orbit subtends with the line t must be smaller than ω′/2 (this is the angle at which the
circular arc cuts the line t). Since the distance between the lines t and b is bigger than 1/ sin(ω″/4), the length of the free path from the cap to the line b satisfies
$$l \ge \frac{1}{\sin(\omega'/2)}\,\frac{1}{\sin(\omega''/4)} \ge \frac{2}{\sin(\omega'/2)} \ge \frac{1}{\sin(\omega'/4)}.$$
The first inequality follows from geometry, the second one from the fact that ω″ < 90°, which implies sin(ω″/4) < 1/2, and the third one from the concavity of the function sin on (0, π). As a result of these inequalities we obtain that the curvature at the entrance of the spherical cap with the effective angle ω′ is smaller than sin(ω′/4). Hence after passing through the sphere the curvature of the control surface will be smaller than − sin(ω′/4) in all directions (Proposition 1) and the surface will focus while still being in the zone of focusing.
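The chain of inequalities above can be sanity-checked numerically. The following Python sketch evaluates the three lower bounds for ρ = 1 and a few effective angles ω′ not exceeding a maximal angle ω″ < 90°. The function name and the sample angles are hypothetical and serve only as an illustration.

```python
import math


def free_path_bounds(omega_eff, omega_max):
    """Return the three successive lower bounds on the free path (rho = 1, radians)."""
    b1 = 1.0 / (math.sin(omega_eff / 2) * math.sin(omega_max / 4))
    b2 = 2.0 / math.sin(omega_eff / 2)   # uses sin(omega_max / 4) < 1/2
    b3 = 1.0 / math.sin(omega_eff / 4)   # uses sin(2x) <= 2 sin(x)
    return b1, b2, b3


omega_max = math.radians(80.0)  # hypothetical maximal angle, below 90 degrees
for deg in (1.0, 10.0, 45.0, 80.0):
    b1, b2, b3 = free_path_bounds(math.radians(deg), omega_max)
    print(deg, b1, b2, b3)
```

For each sampled angle the three printed bounds decrease from left to right, and all of them grow as the effective angle shrinks, which is the proportionality the proof relies on.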
Fig. 8. A general billiard region Q
To summarize the results from this section, we now formulate three conditions that we impose on the billiard region, which we assume consists only of flat components and spherical caps (an example of such a region is shown in Fig. 8). Let R(ρ, ω″) be the size of a zone of focusing from Proposition 2.

Condition A. The angles of all spherical caps are less than the right angle.

Condition B. Every spherical cap has its own zone of focusing and there is a positive distance between these zones for different spherical caps.
Condition C. The set of all the phase points x ∈ M whose orbit never enters any spherical cap has measure 0.

Remark 4. The purpose of “enclosing” each spherical cap in a zone of focusing is to give the outcoming control surface ample time to defocus. By requiring that the respective zones of focusing do not overlap, we make sure that the control surface always enters the zone of focusing with a positive definite curvature operator (second fundamental form). This in turn causes the control surface to enter the corresponding spherical cap with small curvatures and, after leaving it, to focus while still being in the same focusing zone. It follows from the definition of the size of the zone of focusing that the distance between different spherical caps must be bigger than the sum of their radii. While in our proof this is only a necessary condition, numerical results indicate that it is also a sufficient one. This belief is also supported by a discussion of the stability of a 2-periodic orbit bouncing between two spheres (see the Appendix). The reader may also notice that for some regions (e.g. the one in Fig. 1, which has a rectangular cross-section) Condition C is satisfied automatically. However, for a general position of flat walls it is not yet known whether the set of points x whose orbit never enters spherical caps has zero measure. That is why we have postulated Condition C.

4. Lyapunov exponents

We shall show that the 3-D billiard systems described above have non-vanishing Lyapunov exponents. To achieve this we use the approach via invariant cones discussed in [W1]. The key ingredient is to define a family of cones in the tangent bundle which is invariant (and eventually strictly invariant) with respect to the billiard map. Before we state the theorem, we would like to review a few facts from the geometry of tangent vectors and from symplectic geometry. It is customary to relate tangent vectors for billiard systems to infinitesimal families of trajectories.
More precisely, consider a point x = (r, φ) = (r1 , r2 , φ1 , φ2 ) ∈ M (as in Fig. 9), determining a dashed billiard orbit. A vector x′ = (r′, φ′) = (r′1 , r′2 , φ′1 , φ′2 ) ∈ Tx M can naturally be related to a family of orbits o(σ) = (r(σ), φ(σ)) = (r + σr′, φ + σφ′). Note that this family represents a curve in the phase space, satisfying o(0) = x and o′(0) = x′. Hence this family is a natural representative of a class of equivalent curves from the usual definition of a tangent vector. However, representing tangent vectors as families of trajectories originating from the billiard boundary has one formal drawback. For the purpose of expressing the behavior of nearby trajectories it is convenient to use the curvature of the wavefront corresponding to the family o(σ). Note that two tangent vectors which differ only by a scalar multiple give rise to families with the same curvature. The quantity dφ/dr, which is the natural candidate to look at, is related to this curvature through a factor of cosine of the angle of reflection. For this reason we will consider the new arc-length parameter, which can be thought of as measuring the distances in the plane perpendicular to the orbit rather than along the billiard boundary. In the situation in Fig. 10, the traditional parameters are designated by (s1 , s2 ) and the new ones by r1 , r2 . It is clear from the picture that the directions of the parameter axes can be chosen so that
Fig. 9. Tangent vectors
r1 = s1 , r2 = s2 cos(φ). The differentials of the angular parameters are the same in both cases and we will keep the notation φ1 for the angles in the r1 direction and similarly for the second pair. Also notice that this parametrization is in a certain sense inherited from the parametrization of the tangent bundle to the whole billiard region, quotiented out by the direction of the motion. Indeed, consider an arbitrary point x = (q, v) ∈ Q × S² . By simply adding the arclength parameter r0 in the direction of the motion, we have a complete set of coordinates on Tx (Q × S² ) for any point of the (continuous) billiard orbit. Since the dynamics in the direction of the motion is trivial, this coordinate is usually suppressed. The remaining coordinates parametrize the plane perpendicular to the orbit at any point of Q, which includes the boundary points. In the remainder of this section Tx M will always mean this perpendicular subspace of the tangent space to Q × S² at each point x. Technically the replacement of the s-coordinates by the r-coordinates can be thought of as a rescaling of the tangent bundle T M in such a manner that every tangent space Tx M, x = (r, φ), is parametrized with a different arc-length parameter (depending on the angle of reflection φ). These new coordinates make most of the formulae simpler.

Now we will take a closer look at the tangent vectors and the curvatures of wavefronts defined by the associated families o(σ). First let us recall that the second fundamental form of any smooth surface is a quadratic bilinear form B, expressing the change in the normal vectors of neighboring points. In terms of our coordinates, B can be expressed as
$$B = \begin{pmatrix} \dfrac{d\phi_1}{dr_1} & \dfrac{d\phi_1}{dr_2} \\[2mm] \dfrac{d\phi_2}{dr_1} & \dfrac{d\phi_2}{dr_2} \end{pmatrix}.$$
Suppose that we fix a point x = (r1 , r2 , φ1 , φ2 ) ∈ M and a tangent vector x′ = (r′1 , r′2 , φ′1 , φ′2 ) ∈ Tx M . This vector defines a family of trajectories o(σ), which
Fig. 10. Tangent planes
can also be thought of as an infinitesimal curve (perpendicular to the orbit). We will describe below how to express the curvature of this infinitesimal curve using the second fundamental form. Denote r′ = (r′1 , r′2 ) and φ′ = (φ′1 , φ′2 ). Then the second fundamental form of a surface maps the arc-length vector onto the angular vector, φ′ = Br′, or in more detail
$$\phi_1' = \frac{d\phi_1}{dr_1}\, r_1' + \frac{d\phi_1}{dr_2}\, r_2', \qquad \phi_2' = \frac{d\phi_2}{dr_1}\, r_1' + \frac{d\phi_2}{dr_2}\, r_2'.$$
The curvature of a surface in the direction of a unit vector u is given by (Bu, u). We take u = r′/|r′|. Then by making use of Br′ = φ′ one can define the curvature of the family o(σ) by
$$\kappa = \frac{\phi' \cdot r'}{|r'|^2}.$$
Since the curvature depends only on the direction, we can always rescale the vector x′ so that r′ is a unit vector. In the rest of this paper we will assume this.

Remark 5. Note that if the vectors r′ and φ′ are collinear, that is φ′ = λr′ for some scalar λ, then r′ is a principal curvature direction and λ is a principal curvature. Also note that even if we specify two tangent vectors x′ = (r′, φ′) and y′ = (s′, ψ′), we still may not have specified a surface, because even though we can find a matrix B such that Br′ = φ′, Bs′ = ψ′, this matrix may not be symmetric. In general, the vectors r′ and s′ do not have to be perpendicular, unless, of course, they represent the two principal curvature directions.
The above remark expresses the fact that not every plane in the 4-D tangent space corresponds to an infinitesimal perpendicular surface. In order that span(x′, y′) corresponds to an infinitesimal surface, it is necessary and sufficient that
$$r' \cdot \psi' = s' \cdot \phi'. \tag{4.1}$$
This is just a condition for the symmetry of the matrix B in terms of the vectors x′ = (r′, φ′) and y′ = (s′, ψ′). If we think of R⁴ as a symplectic space with a standard symplectic form ω, then Eq. (4.1) becomes ω(x′, y′) = 0. Hence the infinitesimal surfaces perpendicular to the orbit can be identified with the Lagrangian subspaces of R⁴ , i.e. with planes that are skew-orthogonal to themselves (ω(x′, y′) = 0 for any two vectors from that plane). It is also easy to see that it is the condition (4.1) which guarantees (in fact is equivalent to) the fact that the principal curvature directions of the infinitesimal surface determined by x′ and y′ are perpendicular. Indeed, given the above vectors x′, y′ we can define two 2×2 matrices P = (r′, s′) and Q = (φ′, ψ′), where the vectors in parentheses are understood to be column vectors from R² . Direct computation immediately yields that (4.1) is equivalent to P∗Q = Q∗P , and that in turn guarantees that the matrix QP⁻¹ is symmetric. In fact, the matrix QP⁻¹ is the matrix which maps r′ to φ′ and s′ to ψ′. Hence QP⁻¹ is nothing else than the curvature matrix B for the infinitesimal surface determined by x′, y′ and as such it has real eigenvalues and orthogonal eigenvectors (principal curvature directions). Thus given two vectors x′, y′ satisfying (4.1) there is a unique pair of unit vectors u, u⊥ which are the principal curvature directions of the infinitesimal surface corresponding to x′, y′. On the other hand, if we have only one vector x′ ∈ Tx M , then there are many infinitesimal surfaces which can contain the vector x′. However, if we decide that we want our vector embedded in a surface with principal directions u, u⊥ ∈ R² (where u may be chosen arbitrarily), then we easily compute the corresponding curvatures
$$\kappa = \frac{\phi' \cdot u}{r' \cdot u}, \tag{4.2}$$
$$\kappa^{\perp} = \frac{\phi' \cdot u^{\perp}}{r' \cdot u^{\perp}}. \tag{4.3}$$
Having made the choice of principal curvature directions u, u⊥ , and denoting the corresponding curvatures by κ, κ⊥ , the components of the vector x′ can be expressed as
$$r' = u\cos\theta + u^{\perp}\sin\theta, \tag{4.4}$$
$$\phi' = u\,\kappa\cos\theta + u^{\perp}\kappa^{\perp}\sin\theta. \tag{4.5}$$
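The equivalence between condition (4.1) and the symmetry of the matrix QP⁻¹ can be illustrated numerically. The following Python sketch builds QP⁻¹ for two tangent vectors and compares the case where (4.1) holds with a case where it fails. The helper names and the sample vectors are hypothetical; the symplectic form is taken to be r′·ψ′ − s′·φ′.

```python
def dot(a, b):
    """Euclidean inner product in R^2."""
    return a[0] * b[0] + a[1] * b[1]


def symplectic(x, y):
    """r'.psi' - s'.phi' for x' = (r', phi') and y' = (s', psi')."""
    r, phi = x[:2], x[2:]
    s, psi = y[:2], y[2:]
    return dot(r, psi) - dot(s, phi)


def curvature_matrix(x, y):
    """Return Q P^{-1} with P = (r', s') and Q = (phi', psi') as column matrices."""
    r, phi = x[:2], x[2:]
    s, psi = y[:2], y[2:]
    det = r[0] * s[1] - s[0] * r[1]          # determinant of P, assumed non-zero
    p_inv = [[s[1] / det, -s[0] / det],
             [-r[1] / det, r[0] / det]]
    q = [[phi[0], psi[0]],
         [phi[1], psi[1]]]
    return [[sum(q[i][k] * p_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Hypothetical vectors: phi' = B r' and psi' = B s' for the symmetric B = [[2, 1], [1, 3]].
x = (1.0, 0.0, 2.0, 1.0)
y = (1.0, 1.0, 3.0, 4.0)
print(symplectic(x, y), curvature_matrix(x, y))

# Changing psi' breaks (4.1); the recovered matrix is then no longer symmetric.
y_bad = (1.0, 1.0, 4.0, 4.0)
print(symplectic(x, y_bad), curvature_matrix(x, y_bad))
```

Note that r′ and s′ in the first pair are not perpendicular, in line with Remark 5: the plane is still Lagrangian, and the principal directions are the eigenvectors of the recovered symmetric matrix.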
Before we prove the main theorem, let us introduce the notion of sectors in the tangent space (for a more detailed treatment see [LW]) and recall some elementary facts about them. Let V1 , V2 ⊂ Tx M be two transversal Lagrangian subspaces, i.e. every vector w ∈ Tx M can be uniquely written as w = v1 + v2 , where vi ∈ Vi . This decomposition allows one to define a quadratic form Q(w) = ω(v1 , v2 ) on Tx M . Recall that ω(x′, y′) = r′ · ψ′ − s′ · φ′, where r′, s′, φ′, ψ′ ∈ R² and x′, y′ ∈ R⁴ ≅ Tx M (“≅” denotes an isomorphism between the linear spaces). Given V1 and V2 we can define a sector (cone)
$$C = C(V_1, V_2) = \{\, w \in T_x M : Q(w) \ge 0 \,\}. \tag{4.6}$$
The interior of the sector is then defined as the set of vectors on which the quadratic form Q is strictly positive. Since the definition (4.6) is difficult to work with, we will now evaluate the quadratic form Q explicitly for a particular choice of the Lagrangian subspaces V1 and V2 . Let us fix two real numbers ε > δ > 0. Then we set
$$V_1 = \mathrm{span}\,[(1, 0, \delta, 0),\ (0, 1, 0, \delta)], \tag{4.7}$$
$$V_2 = \mathrm{span}\,[(1, 0, \varepsilon, 0),\ (0, 1, 0, \varepsilon)]. \tag{4.8}$$
It is clear that these two subspaces are Lagrangian and that they are transversal, i.e. R⁴ = V1 ⊕ V2 (“⊕” stands for a direct sum). These subspaces correspond to infinitesimal surfaces in the form of spherical caps of radii 1/δ and 1/ε respectively and their definition is coordinate free. By using elementary linear algebra, we can express an arbitrary vector x′ = (r′, φ′) in terms of the above direct sum as follows:
$$(r', \phi') = \frac{1}{\varepsilon-\delta}\Big( \big(\varepsilon r' - \phi',\ \delta(\varepsilon r' - \phi')\big) + \big(\phi' - \delta r',\ \varepsilon(\phi' - \delta r')\big) \Big).$$
With this decomposition at hand, the quadratic form Q can be easily expressed as
$$Q(x') = \frac{1}{\varepsilon-\delta}\big( (\varepsilon+\delta)\, r' \cdot \phi' - \varepsilon\delta\,\|r'\|^2 - \|\phi'\|^2 \big) = \frac{1}{\varepsilon-\delta}\,(\varepsilon r' - \phi') \cdot (\phi' - \delta r'). \tag{4.9}$$
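The decomposition and the closed form (4.9) can be verified numerically. The sketch below splits a tangent vector into its components along V1 and V2, evaluates the symplectic form on the two components, and compares the result with the right-hand side of (4.9). The function names and the sample values are hypothetical.

```python
def dot(a, b):
    """Euclidean inner product of two vectors of equal length."""
    return sum(ai * bi for ai, bi in zip(a, b))


def symplectic(x, y):
    """r'.psi' - s'.phi' for x' = (r', phi') and y' = (s', psi') in R^4."""
    return dot(x[:2], y[2:]) - dot(y[:2], x[2:])


def decompose(r, phi, delta, eps):
    """Split (r', phi') as v1 + v2 with v1 in V1 of (4.7) and v2 in V2 of (4.8)."""
    a = [(eps * ri - pi_) / (eps - delta) for ri, pi_ in zip(r, phi)]
    b = [(pi_ - delta * ri) / (eps - delta) for ri, pi_ in zip(r, phi)]
    v1 = a + [delta * c for c in a]   # vectors of V1 have the form (a, delta * a)
    v2 = b + [eps * c for c in b]     # vectors of V2 have the form (b, eps * b)
    return v1, v2


def q_form(r, phi, delta, eps):
    """Right-hand side of (4.9)."""
    left = [eps * ri - pi_ for ri, pi_ in zip(r, phi)]
    right = [pi_ - delta * ri for ri, pi_ in zip(r, phi)]
    return dot(left, right) / (eps - delta)


# Hypothetical sample vector and parameters with eps > delta > 0.
r, phi, delta, eps = (1.0, 2.0), (0.5, 3.0), 0.2, 1.5
v1, v2 = decompose(r, phi, delta, eps)
print(v1, v2)
print(symplectic(v1, v2), q_form(r, phi, delta, eps))
```

The two printed values of Q agree up to rounding, and the components v1 and v2 add up to the original vector.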
We finish this part by stating a lemma which looks at the definition of the sector C from the point of view of curvatures.

Lemma 6. Let V1 and V2 be Lagrangian subspaces defined by (4.7), (4.8) and let C(V1 , V2 ) be a sector defined by (4.6). Then any vector x′ ∈ Tx M ≅ R⁴ such that Q(x′) = 0 can be embedded in a 2-D subspace (control surface) with principal curvature directions u, u⊥ and corresponding curvatures ε and δ. Moreover, if a vector y′ ∈ Tx M satisfies Q(y′) > 0, then there exists another pair of principal directions such that the corresponding pair of principal curvatures lies in the interval (δ, ε).

Remark 6. This lemma states that even in terms of curvatures the boundary of the sector is what it is supposed to be, i.e. a set of vectors which can be embedded in a surface with principal curvatures attaining the extreme values δ and ε. Note that it doesn’t say that Q(x′) = 0 automatically implies that every surface (i.e. Lagrangian subspace) containing x′ must have that property. As a matter of fact, for “most” of such surfaces one principal curvature is outside and one inside the interval (δ, ε). Only for a very specific “position” of the pair of principal directions (specified in the proof) are the extreme values simultaneously attained.

Proof. The first part of the lemma is easy. We consider the vector x′ = (r′, φ′). Then for the pair of principal curvature directions we can take φ′ − δr′ and φ′ − εr′ (possibly normalized so that they have unit length). From Q(x′) = 0 and from (4.9) it follows that these two vectors are orthogonal. The principal curvatures corresponding to this pair of directions are (according to (4.2), (4.3)):
$$\kappa = \frac{\phi' \cdot (\phi' - \delta r')}{r' \cdot (\phi' - \delta r')} = \frac{\varepsilon\, r' \cdot (\phi' - \delta r')}{r' \cdot (\phi' - \delta r')} = \varepsilon,$$
$$\kappa^{\perp} = \frac{\phi' \cdot (\phi' - \varepsilon r')}{r' \cdot (\phi' - \varepsilon r')} = \delta.$$
In both equations, the normalizing factors cancel, so we omitted them. As for the second part, we first observe that Q(y′) > 0 with y′ = (s′, ψ′) implies that the angle between the vectors ψ′ − δs′ and ψ′ − εs′ is bigger than 90°. Indeed, from (4.9) it follows that
$$(\psi' - \delta s') \cdot (\psi' - \varepsilon s') < 0. \tag{4.10}$$
Thus we can no longer take this pair of vectors for the principal curvature directions (they are not orthogonal). Instead, we will take an orthogonal pair (ψ′ − δs′) and (ψ′ − δs′)⊥ (we could also take the pair with ε and proceed along the same lines). For the “δ-pair”, one principal curvature is very simple. In fact, the one corresponding to (ψ′ − δs′)⊥ is obtained from (4.3) immediately:
$$\kappa^{\perp} = \frac{\psi' \cdot (\psi' - \delta s')^{\perp}}{s' \cdot (\psi' - \delta s')^{\perp}} = \delta,$$
by using the trivial equality (ψ′ − δs′) · (ψ′ − δs′)⊥ = 0. To estimate the principal curvature corresponding to the direction ψ′ − δs′ we first note that
$$s' \cdot (\psi' - \delta s') > 0. \tag{4.11}$$
Indeed, if (4.11) was not true, we would get
$$(\psi' - \delta s') \cdot (\psi' - \varepsilon s') = (\psi' - \delta s') \cdot \big(\psi' - \delta s' - (\varepsilon - \delta) s'\big) = \|\psi' - \delta s'\|^2 - (\varepsilon - \delta)\, s' \cdot (\psi' - \delta s') \ge 0,$$
and that is a contradiction since (according to (4.10)) the first term in this string of equalities is negative. Using (4.11) we can get the upper bound on the curvature corresponding to the direction (ψ′ − δs′). We can rewrite (4.10) as ψ′ · (ψ′ − δs′) < ε s′ · (ψ′ − δs′), and using (4.2) and (4.11) conclude that
$$\kappa = \frac{\psi' \cdot (\psi' - \delta s')}{s' \cdot (\psi' - \delta s')} < \varepsilon.$$
The other bound κ > δ follows easily from (4.2) and from ‖ψ′ − δs′‖² > 0. Note that if ψ′ = δs′, (4.10) would be violated. So far we have a pair of perpendicular directions, with one curvature already inside the desired interval (δ, ε) and the other one being equal to δ. To finish the proof of Lemma 6 we now rotate the pair of directions ψ′ − δs′ and (ψ′ − δs′)⊥ by an angle θ. If θ is sufficiently small then the curvature κ stays in (δ, ε), no matter whether we turn the pair of directions clockwise or anticlockwise. The other curvature κ⊥ = δ will get inside the said interval either for the clockwise rotation or the anticlockwise one (depending on the values of (s′, ψ′)). This follows from (4.3) if we write u⊥ in the form u⊥ = (cos θ, sin θ). The corresponding curvature then is
$$\kappa(\theta) = \frac{\psi_1' \cos\theta + \psi_2' \sin\theta}{s_1' \cos\theta + s_2' \sin\theta} = \frac{\psi_1' + \psi_2' \tan\theta}{s_1' + s_2' \tan\theta}.$$
Moreover its derivative is always non-zero:
$$\frac{d\kappa}{d\theta} = \frac{\psi_2' s_1' - \psi_1' s_2'}{(s_1' \cos\theta + s_2' \sin\theta)^2}.$$
Indeed, the numerator is non-zero, unless the vectors s′ and ψ′ are collinear, which case, as we have already mentioned, is trivial. The lemma is proved.

Theorem 2. The billiard map T for the region Q satisfying the conditions (A), (B) and (C) has non-vanishing Lyapunov exponents for almost every x ∈ M .

Proof. For a point Ci ∈ ∂Q (see Fig. 11) for which the trajectory is defined we will construct the sectors (cones) first for certain points inside the billiard region and then translate them using the differential of the flow back to the boundary. We define these sectors roughly as those representing surfaces which have very small positive curvatures at the entrance of any spherical cap which the region Q may contain.

Fig. 11. Construction of invariant cones
More precisely, given a point Ci , let us denote by Ai the point in the middle of the chord immediately before the first reflection in the next series of reflections in any spherical cap that belongs to Q. At these points Ai we will define the sectors, which we then move to other points. The fact that we can find such Ai ’s for a set of points of full measure follows from Condition C and from the Poincaré Recurrence Theorem. We denote the unit velocities at the points Ai by vi and set xi = (Ai , vi ). For the given billiard orbit we denote the map carrying xi to xi+1 by s (thus s(xi ) = T^ti (xi ) = xi+1 for a suitable time ti ) and the corresponding differential from Txi M to Txi+1 M by S. Since the xi ’s are not points of the boundary, let us recall from the beginning of this section that Txi M is an orthogonal complement of the velocity vector vi in the 5D tangent space Txi (Q × S² ). Since this space is 4D and plays the same role as Tx M we keep this notation at points xi .
By considering first the dynamics between the configuration points Ai , we establish the non-vanishing of Lyapunov exponents for the first return map (with respect to the spherical caps). The non-vanishing of the Lyapunov exponents for the case of the billiard map itself then follows from the standard argument (see [W1]), i.e. the Lyapunov exponents of the map T and of the “first return map” s (between xi ’s) are proportional, the constant of proportionality being the average of the return time ti (T^ti xi = xi+1 ), whose existence is guaranteed by the Ergodic Theorem. Let λ > 0 be the constant from Proposition 1, characterizing the smallness of the curvature. We now define the cone C(xi ) at a phase point xi , corresponding to a configuration point Ai . We use (4.7) and (4.8) with δ = 0 and ε = λ. Let Q be the quadratic form associated with the thus defined V1 and V2 . In view of transversality of the Lagrangian subspaces V1 and V2 , we can now define the sector C(xi ) as C(xi ) = C = {x′ ∈ Tx M, Q(x′) ≥ 0}. In terms of geometry, the cone C(xi ) consists of those infinitesimal surfaces (i.e. Lagrangian subspaces) whose principal curvatures lie in [0, λ]. Now we recall the result of Lemma 6 for each of the above mentioned tangent spaces at xi . Vectors from the interior of the cones (sectors) C(V1 , V2 ) can be embedded into surfaces with principal curvatures between 0 and λ. The converse of Lemma 6 is also true. Control surfaces at points xi with principal curvatures in the interval (0, λ) consist of vectors belonging to the interior of the cones C. This can be seen as follows. Let us denote the vectors corresponding to the principal curvature directions by x′ = (u, κu) and y′ = (v, κ′v). The value of the quadratic form Q for the linear combination w = a1 x′ + a2 y′ is
$$Q(w) = \kappa\Big(1 - \frac{\kappa}{\lambda}\Big) a_1^2 + \kappa'\Big(1 - \frac{\kappa'}{\lambda}\Big) a_2^2. \tag{4.12}$$
If both principal curvatures κ, κ′ ∈ (0, λ), then Q(w) > 0, and w lies strictly in C(xi ).
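Formula (4.12) can be cross-checked against (4.9) specialized to δ = 0 and ε = λ, that is, Q(x′) = (λr′ − φ′) · φ′/λ. The Python sketch below does this for orthonormal principal directions u, v. The function names, the rotation angle used to generate u and v, and the sample values are hypothetical.

```python
import math


def q_form(r, phi, lam):
    """Q from (4.9) with delta = 0 and epsilon = lam."""
    return sum((lam * ri - pi_) * pi_ for ri, pi_ in zip(r, phi)) / lam


def combination(kappa, kappa_p, a1, a2, theta):
    """Components (r', phi') of w = a1 (u, kappa u) + a2 (v, kappa' v), u and v orthonormal."""
    u = (math.cos(theta), math.sin(theta))
    v = (-math.sin(theta), math.cos(theta))
    r = tuple(a1 * ui + a2 * vi for ui, vi in zip(u, v))
    phi = tuple(a1 * kappa * ui + a2 * kappa_p * vi for ui, vi in zip(u, v))
    return r, phi


def q_412(kappa, kappa_p, a1, a2, lam):
    """Right-hand side of (4.12)."""
    return kappa * (1 - kappa / lam) * a1 ** 2 + kappa_p * (1 - kappa_p / lam) * a2 ** 2


# Hypothetical values with both principal curvatures inside (0, lam).
lam = 2.0
r, phi = combination(0.5, 1.5, 0.3, -1.1, 0.4)
print(q_form(r, phi, lam), q_412(0.5, 1.5, 0.3, -1.1, lam))
```

Both printed values coincide up to rounding and are positive. If one principal curvature is moved outside [0, λ], the corresponding term in (4.12) becomes negative, so vectors along that principal direction leave the cone.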
Thus the position of tangent vectors $x' \in T_{x_i} M$, belonging to some particular surface (Lagrangian subspace), with respect to the cone $C(x_i)$ can be determined solely from the principal curvatures of that surface. We mention that the differential $S$ of the “first return” map $s$ is a symplectic matrix, since it is a product of symplectic matrices representing the differential of the billiard map between the individual reflections. Since both $s$ and $S$ are measurable maps, the pair $(s, S)$ is a measurable cocycle and we can use the results of [W1] about symplectic matrices. The set of 4×4 symplectic matrices will be denoted by $Sp(4)$, and of special interest will be its subset consisting of “cone-preserving” matrices
$$F = \{S \in Sp(4),\ Q(Sx') > 0 \text{ for } x' \in C\}. \tag{4.13}$$
The result of Sect. 3 (Proposition 1) can now be reformulated in terms of the sectors (cones) defined at $x_i$. We embed every tangent vector from the cone $C$ at $x_i$ into an infinitesimal surface (i.e. Lagrangian subspace $V$). From Proposition 1, we infer that after arrival at $x_{i+1}$, this surface (subspace) has small curvatures $\kappa$ in all directions (namely $0 < \kappa < \lambda$). The strictness of these inequalities guarantees the strict invariance of the cones between $x_i$ and $x_{i+1}$ and consequently the eventually strict invariance under the billiard map. The “$\kappa < \lambda$” part of this inequality follows from condition B, where we assumed that zones of focusing are separated by some positive distance. This implies (thanks to (4.12)) that $Sx'$ lies in the interior of the cone $C$. Hence, $Q(w) > 0$ for every vector $w \in S(V)$, and $S$ belongs to $F$ for every pair $x_i$, $x_{i+1}$.
L.A. Bunimovich, J. Rehacek
The proof of our Theorem is now concluded by application of Theorem 5.1 (from [W1]), which states that every measurable cocycle with values in $F$ has only non-vanishing Lyapunov exponents.

Corollary. Let the billiard map $T$ for the region $Q$ satisfy only conditions (A) and (B). Then there exists an invariant subset $M' \subset M$ with $\mu(M') > 0$ such that the billiard system in $Q$ has non-vanishing Lyapunov exponents for every $x \in M'$.

Proof. It is enough to observe that the set of points $x \in M$ for which the corresponding billiard orbit has at least one reflection in some spherical cap is invariant (Poincaré Recurrence Theorem) and has positive measure. For such $x$ the proof of the Theorem can be repeated without change.
Appendix

We will discuss the linear stability of a 2-periodic orbit bouncing between two spherical caps. Since the angle of reflection is 0, there is no difference between the behavior in the planar and transversal directions, and the 4-dimensional problem breaks down into two (identical) 2-dimensional ones. Let us denote the distance between the two points of reflection by $d$ and the radii of the two spheres by $r$ and $R$, respectively. The linearized behavior around the 2-periodic orbit is then described by the 2×2 matrices
$$T = \begin{pmatrix} 1 & d \\ 0 & 1 \end{pmatrix}, \qquad P_r = \begin{pmatrix} 1 & 0 \\ -\frac{2}{r} & 1 \end{pmatrix}, \qquad P_R = \begin{pmatrix} 1 & 0 \\ -\frac{2}{R} & 1 \end{pmatrix}.$$
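In terms of these matrices, one period corresponds to the matrix $A = P_R\, T\, P_r\, T$, which is
$$A = \begin{pmatrix} 1 - \frac{2d}{r} & 2d\left(1 - \frac{d}{r}\right) \\ -\frac{2}{r} - \frac{2}{R} + \frac{4d}{rR} & -\frac{2d}{R} + \left(1 - \frac{2d}{R}\right)\left(1 - \frac{2d}{r}\right) \end{pmatrix}.$$
The trace of this matrix is equal to
$$\mathrm{Tr}(A) = 2 - \frac{4d}{r} - \frac{4d}{R} + \frac{4d^2}{rR}.$$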
In order to obtain $-2 < \mathrm{Tr}(A) < 2$, which characterizes the stable behavior, one has to solve two inequalities. One yields $d < r + R$, and the other
$$d < \min(r, R) \quad \text{or} \quad d > \max(r, R).$$
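The stability criterion can be checked numerically. The following is a minimal sketch, assuming the matrices $T$, $P_r$, $P_R$ as written in this Appendix; the function names (`period_matrix`, `stable_by_trace`, `stable_by_inequalities`) are illustrative and not part of the paper.

```python
def matmul(a, b):
    """Multiply two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def period_matrix(d, r, R):
    """Return A = P_R T P_r T for free flight d and cap radii r, R."""
    T = ((1.0, d), (0.0, 1.0))
    P_r = ((1.0, 0.0), (-2.0 / r, 1.0))
    P_R = ((1.0, 0.0), (-2.0 / R, 1.0))
    return matmul(P_R, matmul(T, matmul(P_r, T)))


def trace(m):
    return m[0][0] + m[1][1]


def stable_by_trace(d, r, R):
    """Linear stability criterion -2 < Tr(A) < 2."""
    return abs(trace(period_matrix(d, r, R))) < 2.0


def stable_by_inequalities(d, r, R):
    """The same criterion written as the two inequalities of the text."""
    return d < r + R and (d < min(r, R) or d > max(r, R))
```

Away from the boundary values of $d$, where rounding could matter, the two tests should agree; comparing them on a few sample distances is a quick consistency check of the trace formula.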
In the special case where $r = R$ the other inequality becomes vacuous, and from the first one we can see that in order to produce stable behavior, the spheres should be close to each other (one should contain the center of the other).

Acknowledgement. The authors wish to express their gratitude to N.I. Chernov for valuable discussions. The authors are also indebted to the referee for careful reading of the manuscript and many useful comments. This work was supported by the NSF grant #DMS-9303769.
References

[A] Arnold, V.I.: Mathematical Methods of Classical Mechanics. New York: Springer-Verlag 1978
[B1] Bunimovich, L.A.: On billiards close to dispersing. Math. USSR Sb. 23, 45–67 (1974)
[B2] Bunimovich, L.A.: On Ergodic properties of Certain Billiards. Funct. Anal. and Its Appl. 8, 254–255 (1974). Also On the Ergodic Properties of Nowhere Dispersing Billiards. Commun. Math. Phys. 65, 295–312 (1979)
[B3] Bunimovich, L.A.: Many-dimensional nowhere dispersing billiards with chaotic behavior. Physica D 33, 58–64 (1988)
[B4] Bunimovich, L.A.: A Theorem on Ergodicity of Two-Dimensional Hyperbolic Billiards. Commun. Math. Phys. 130, 599–621 (1990)
[B5] Bunimovich, L.A.: On absolutely focusing mirrors. In Ergodic Theory and Related Topics III. (ed. by U. Krengel), Lect. Notes 1514, New York: Springer-Verlag 1992, pp. 62–82
[B6] Bunimovich, L.A.: Two mechanisms of Dynamical Chaos: Permanent Stochasticity and Intermittency. In Nonlinear and Turbulent Processes in Physics. (ed. by V.G. Baryakhtar et al.), Kiev: Naukova Dumka 1988
[B-S] Bunimovich, L.A., Sinai, Ya.G.: On the fundamental theorem of dispersing billiards. Math. Sb. 90, 415–431 (1973)
[B-G] Burns, K., Gerber, M.: Real analytic Bernoulli geodesic flows on $S^2$. Ergod. Th. & Dyn. Sys. 9, 27–45 (1989)
[D1] Donnay, V.J.: Geodesic flow on the two-sphere. I: Positive measure entropy. Ergod. Th. & Dyn. Sys. 8, 531–553 (1988)
[D2] Donnay, V.J.: Geodesic flow on the two-sphere. II: Ergodicity. In: Dynamical Systems ed. by J.C. Alexander, Lect. Notes 1342, New York: Springer-Verlag, 1988, pp. 112–153
[D3] Donnay, V.J.: Using Integrability to Produce Chaos: Billiards with Positive Entropy. Commun. Math. Phys. 141, 225–257 (1991)
[H] Hopf, E.: Statistik der Geodätischen Linien in Mannigfaltigkeiten Negativer Krümmung. Ber. Verh. Sächs. Akad. Wiss. 91, 261–304 (1939)
[K-S-S] Kramli, A., Simanyi, N., Szasz, D.: A “Transversal” Fundamental Theorem for Semi-Dispersing Billiard. Commun. Math. Phys. 129, 535–560 (1990)
[L] Lazutkin, V.F.: On the existence of caustics for the billiard ball problem in a convex domain. Math. USSR Izv. 37, 186–216 (1973)
[LW] Liverani, C., Wojtkowski, M.P.: Ergodicity in Hamiltonian systems. Dynamics Reported 4, 130–202 (1995)
[M1] Markarian, R.: Billiards with Pesin region of measure one. Commun. Math. Phys. 118, 87–97 (1988)
[M2] Markarian, R.: New ergodic billiards: Exact results. Nonlinearity 6, 819–841 (1993)
[O] Oseledec, V.I.: The multiplicative ergodic theorem: The Lyapunov characteristic numbers of dynamical system. Trans. Mosc. Math. Soc. 19, 197–231 (1968)
[S1] Sinai, Ya.G.: Dynamical systems with elastic reflections. Russ. Math. Surv. 25, 137–189 (1970)
[S2] Sinai, Ya.G.: Development of Krylov’s ideas. In N.S. Krylov: Works on the foundations of statistical physics, Princeton: Princeton University Press, 1979, pp. 239–281
[S-C] Sinai, Ya.G., Chernov, N.I.: Ergodic Properties of certain systems of two-dimensional discs and three-dimensional balls. Russ. Math. Surv. 42, 181–207 (1987)
[W1] Wojtkowski, M.P.: Invariant families of cones and Lyapunov exponents. Ergod. Th. & Dyn. Sys. 5, 145–161 (1985)
[W2] Wojtkowski, M.P.: Principles for the design of billiards with non-vanishing Lyapunov exponents. Commun. Math. Phys. 105, 391–414 (1986)
[W3] Wojtkowski, M.P.: Linearly Stable Orbits in 3 Dimensional Billiards. Commun. Math. Phys. 129, 319–327 (1990)

Communicated by Ya. G. Sinai
Commun. Math. Phys. 189, 759 – 793 (1997)
Communications in Mathematical Physics
© Springer-Verlag 1997
Wakimoto Realizations of Current Algebras: An Explicit Construction

Jan de Boer$^1$, László Fehér$^2$

1 Department of Physics, University of California at Berkeley, 366 LeConte Hall, Berkeley, CA 94720-7300, USA and Theoretical Physics Group, Mail Stop 50A-5101, Ernest Lawrence Berkeley Laboratory, Berkeley, CA 94720, USA. E-mail: [email protected]
2 Physikalisches Institut, Universität Bonn, Nussallee 12, D-53115 Bonn, Germany. E-mail: [email protected]
Received: 19 December 1996 / Accepted: 21 March 1997
Abstract: A generalized Wakimoto realization of $\widehat{\mathcal{G}}_K$ can be associated with each parabolic subalgebra $\mathcal{P} = (\mathcal{G}_0 + \mathcal{G}_+)$ of a simple Lie algebra $\mathcal{G}$ according to an earlier proposal by Feigin and Frenkel. In this paper the proposal is made explicit by developing the construction of Wakimoto realizations from a simple but unconventional viewpoint. An explicit formula is derived for the Wakimoto current first at the Poisson bracket level by Hamiltonian symmetry reduction of the WZNW model. The quantization is then performed by normal ordering the classical formula and determining the required quantum correction for it to generate $\widehat{\mathcal{G}}_K$ by means of commutators. The affine-Sugawara stress-energy tensor is verified to have the expected quadratic form in the constituents, which are symplectic bosons belonging to $\mathcal{G}_+$ and a current belonging to $\mathcal{G}_0$. The quantization requires a choice of special polynomial coordinates on the big cell of the flag manifold $P\backslash G$. The effect of this choice is investigated in detail by constructing quantum coordinate transformations. Finally, the explicit form of the screening charges for each generalized Wakimoto realization is determined, and some applications are briefly discussed.
1. Introduction

Realizations of various symmetry algebras in terms of simpler algebras are often useful in representation theory and in physical applications. It is enough to mention, for example, the oscillator realizations of simple Lie algebras, the Feigin-Fuks free boson realization of the Virasoro algebra, or the Miura transformations pertinent to the theory of integrable hierarchies and W-algebras. This paper deals with another instance of this general theme, the so-called Wakimoto realizations of the non-twisted affine Lie algebras. Wakimoto [Wa] found the following formula for the generating currents of $\widehat{sl}(2)_K$:
$$I_- = -p, \qquad I_0 = j_0 - 2(pq), \qquad I_+ = -(j_0 q) + K\partial q + (p(qq)), \tag{1.1}$$
where $j_0 = -i\sqrt{2(K+2)}\,\partial\phi$ and the constituents are free fields subject to the singular operator product expansions$^1$ (OPEs)
$$p(z)q(w) = \frac{-1}{z-w}, \qquad \partial\phi(z)\partial\phi(w) = \frac{-1}{(z-w)^2}.$$
The affine-Sugawara stress-energy tensor is quadratic in the free fields,
$$\frac{1}{2(K+2)}\,\mathrm{Tr}\, I^2 = -\frac{1}{2}(\partial\phi\partial\phi) - \frac{i}{\sqrt{2(K+2)}}\,\partial^2\phi - (p\,\partial q).$$
This free field realization raised considerable interest because of its usefulness in computing correlation functions of conformal field theories [FF1, BF, BMP, ATY, Ku] and in analyzing the quantum Hamiltonian reduction of $\widehat{sl}(2)_K$ to the Virasoro algebra [BO]. A detailed study of “Wakimoto realizations” of arbitrary non-twisted affine Lie algebras was undertaken in [FF1, BF, BMP, GMOMS, KS, IKa, IKo, ATY, Ku]. These investigations made use of the observation that at the level of zero modes (1.1) becomes the well-known differential operator realization of $sl(2)$ given by $I_- \to \frac{\partial}{\partial q}$, $I_0 \to -\mu_0 + 2q\frac{\partial}{\partial q}$, $I_+ \to \mu_0 q - q^2\frac{\partial}{\partial q}$, for some $\mu_0 \in \mathbb{C}$. Any simple Lie algebra, $\mathcal{G}$, admits analogous realizations in terms of first order differential operators,
$$X \mapsto \hat{X} = -F^X_\alpha(q)\frac{\partial}{\partial q_\alpha} + H^X_{\mu_0}(q), \qquad \forall\, X \in \mathcal{G}, \tag{1.2}$$
where $F^X_\alpha$, $H^X_{\mu_0}$ are polynomials (see e.g. [Ko, A]). In the context of Wakimoto realizations mainly the so-called “principal case” was considered. In this case $\alpha$ runs over the positive roots and $\mu_0$ is a character of the Borel subalgebra of $\mathcal{G}$, which may be identified with an element of the Cartan subalgebra $\mathcal{H} \subset \mathcal{G}$. The main idea [FF1, BMP, IKo] for generalizing (1.1) was to regard the differential operator realization of $\mathcal{G}$ as the zero mode part of the sought-after realization of $\widehat{\mathcal{G}}_K$, which should be obtained by “affinization”, where one replaces $q_\alpha$ and $-\frac{\partial}{\partial q_\alpha}$ by conjugate quantum fields and $\mu_0$ by a current that (in the principal case) belongs to $\mathcal{H}$. Of course, one also needs to add derivative terms and find the correct normal ordering, which is rather nontrivial. Feigin and Frenkel demonstrated by indirect, homological techniques that the affinization can be performed for any simple Lie algebra $\mathcal{G}$ [FF1] (see also [Ku]). An explicit formula for the currents corresponding to the Chevalley generators of $sl(n)$ was given in [FF1] too, but the method does not lead to explicit formulas for all currents of an arbitrary $\mathcal{G}$ (or $\mathcal{G} = sl(n)$). Explicit formulas were later obtained in [IKo] for any $\mathcal{G}$, but again only for the Chevalley generators, and without complete proofs. The quadraticity of the affine-Sugawara stress-energy tensor in the free fields is not quite transparent in this approach.

Elaborating the preceding announcement [dBF], in this paper we present an explicit construction of the Wakimoto realization. The essence of our method is that we first derive a classical, Poisson bracket version of the Wakimoto realization by Hamiltonian reduction of the WZNW model, and then quantize the classical Wakimoto realization by normal ordering. This procedure makes it possible to obtain an explicit formula for all currents of $\mathcal{G}$ in terms of finite group theoretic data.
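The zero-mode realization of $sl(2)$ quoted above can be checked directly on polynomials in $q$. The sketch below is only an illustration: it represents a polynomial by its list of coefficients, takes $\mu_0$ to be an integer for exact arithmetic, and uses helper names (`zero_mode_operators`, `commutator`, and so on) that are hypothetical rather than taken from the paper. With these conventions one expects $[I_0, I_\pm] = \pm 2 I_\pm$ and $[I_+, I_-] = I_0$.

```python
def trim(p):
    """Drop trailing zero coefficients; p[k] is the coefficient of q**k."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, s):
    n = max(len(p), len(s))
    return trim([(p[k] if k < len(p) else 0) + (s[k] if k < len(s) else 0)
                 for k in range(n)])


def scale(c, p):
    return trim([c * a for a in p])


def times_q(p, n=1):
    """Multiply the polynomial by q**n."""
    return trim([0] * n + list(p))


def d_dq(p):
    """Differentiate with respect to q."""
    return trim([k * p[k] for k in range(1, len(p))])


def zero_mode_operators(mu0):
    """Return (I_minus, I_zero, I_plus) acting on coefficient lists."""
    def i_minus(f):
        return d_dq(f)                      # d/dq

    def i_zero(f):
        return add(scale(-mu0, f), scale(2, times_q(d_dq(f))))   # -mu0 + 2 q d/dq

    def i_plus(f):
        return add(scale(mu0, times_q(f)),
                   scale(-1, times_q(d_dq(f), 2)))               # mu0 q - q^2 d/dq

    return i_minus, i_zero, i_plus


def commutator(a, b):
    """Return the operator f -> a(b(f)) - b(a(f))."""
    return lambda f: add(a(b(f)), scale(-1, b(a(f))))
```

Applying `commutator(i_zero, i_plus)` to a few test polynomials and comparing with `scale(2, i_plus(f))` is enough to see the $sl(2)$ relations at work.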
1 We adopt the conventions of [BBSS] for OPEs and for normal ordering.

We shall achieve the above result in the general case, which was previously investigated by Feigin and Frenkel in their different approach (see the second article
in [FF1]). The general case is characterized by the choice of a parabolic subalgebra, $\mathcal{P} = (\mathcal{G}_0 + \mathcal{G}_+) \subset \mathcal{G}$. Here $\mathcal{G} = \mathcal{G}_- + \mathcal{G}_0 + \mathcal{G}_+$ denotes a triangular decomposition induced by a $\mathbb{Z}$-gradation of $\mathcal{G}$. One also uses a connected complex Lie group $G$ corresponding to $\mathcal{G}$, its connected subgroups $G_0, G_\pm \subset G$ with Lie algebras $\mathcal{G}_0, \mathcal{G}_\pm \subset \mathcal{G}$, and the parabolic subgroup $P = G_0 G_+$. The principal case is that of the principal gradation of $\mathcal{G}$, for which $\mathcal{G}_0 = \mathcal{H}$ and $\mathcal{P}$ is the Borel subalgebra.

Our starting point will be the Hamiltonian interpretation of the differential operator realization of $\mathcal{G}$ mentioned in (1.2). The base space of these differential operators is the big cell of the flag manifold $P\backslash G$, and $\mu_0$ is taken to be a character of $\mathcal{P}$ if one requires the operators to act on scalar, as opposed to vector, valued holomorphic functions. Identifying the big cell with the nilpotent subgroup $G_- \subset G$, we replace the differential operator realization with the equivalent Poisson bracket realization given on the holomorphic cotangent bundle $T^*G_-$ by
$$X \mapsto I^X(q, p) \equiv F^X_\alpha(q)\, p_\alpha + H^X_{\mu_0}(q)$$
in terms of some canonical coordinates $q_\alpha$, $p_\alpha$ on $T^*G_-$. This formula can be elegantly written as $I^X = \mathrm{Tr}(XI)$ with
$$I = g_-^{-1}(-j - \mu_0)\, g_-,$$
where $g_- \in G_-$, $j : T^*G_- \to \mathcal{G}_-^* = \mathcal{G}_+$ is the momentum map for the action of $G_-$ on $T^*G_-$ that comes from left translations, and $\mu_0 \in \mathcal{G}_0 \simeq \mathcal{G}_0^* \subset \mathcal{P}^*$ defines a character, i.e., an element of $[\mathcal{P}, \mathcal{P}]^\perp$. Using dual bases $V^\beta \in \mathcal{G}_+$ and $V_\alpha \in \mathcal{G}_-$, we have$^2$
$$j(q, p) = F_{\alpha\beta}(q)\, p_\beta V^\alpha \tag{1.3}$$
with $-F_{\alpha\beta}(q)\frac{\partial}{\partial q_\beta}$ being the vector field on $G_-$ that is tangent to the curve $g_- \mapsto e^{-tV_\alpha} g_-$. As explained in [dBF], the above formula of $I$, which is equivalent to a formula of Kostant [Ko] for the differential operators on $G_-$, naturally follows from a Hamiltonian symmetry reduction on the holomorphic cotangent bundle $T^*G$. The reduction uses the action of the symmetry group $P$ on $T^*G$ that comes from left translations, and therefore the action of $G$ defined by right translations survives to give an action of $G$ on the reduced space $T^*(P\backslash G)$, whose generator (momentum map) on the big cell $T^*G_- \subset T^*(P\backslash G)$ is just $I$. The value of the momentum map that defines the reduction is given by $-\mu_0 \in \mathcal{G}_0 = \mathcal{G}_0^* \subset \mathcal{P}^*$. The reduction can be considered in the same way if $\mu_0 \in \mathcal{G}_0$ is not necessarily a character but arbitrary, and then the formula of $I$ becomes
$$I(q, p, j_0) = g_-^{-1}(-j + j_0)\, g_-, \tag{1.4}$$
where $j_0$ is a variable on the co-adjoint orbit $\mathcal{O}$ of $G_0$ through $-\mu_0$. Remember that a character is a one-point co-adjoint orbit, and notice that a co-adjoint orbit of $G_0$ can be regarded as a special co-adjoint orbit of $P$. Formula (1.4) is valid on the big cell $T^*G_- \times \mathcal{O}$ in the reduced phase space.$^3$

2 Summation on coinciding indices is understood throughout the text.
3 Globally, the reduced phase space is a fibre bundle over $T^*(P\backslash G)$ whose fibre is the co-adjoint orbit $\mathcal{O}$ of $P$, as follows from general results on reductions of cotangent bundles [GS, Ro].

The Hamiltonian reduction on $T^*G$ that is behind the Poisson bracket realization of $\mathcal{G}$ given by (1.4) can be naturally generalized to a reduction of the WZNW model [Wi] based on $G$. As we shall see, this leads to the classical Wakimoto realization of the affine Lie algebra $\widehat{\mathcal{G}}_K$ generated by the $\mathcal{G}$-valued current
$$I(q, p, j_0) = g_-^{-1}(-j + j_0)\, g_- + K g_-^{-1} g_-', \tag{1.5}$$
where now all constituents have been promoted to be fields on $S^1$. In particular, $q_\alpha$, $p_\alpha$ that define $j$ by (1.3) are now conjugate classical fields and $j_0$ is a $\mathcal{G}_0$-valued current (belonging to a co-adjoint orbit of the centrally extended loop group of $G_0$). After presenting the derivation of formula (1.5) for the classical Wakimoto current in Sect. 2, we shall quantize this formula explicitly. For this we shall replace the constituent free fields $q_\alpha$, $p_\alpha$ and $j_0$ by corresponding quantum fields, normal order the formula and determine an additional quantum correction so that $I$ will indeed generate the current algebra. We then verify the expected quadratic form of the Sugawara expression. The result is given by Theorem 3 in Sect. 3. All previously known explicit formulas for the Wakimoto realization are recovered as special cases of Theorem 3, as shown in Sect. 5.

The classical mechanical Wakimoto realization (1.5) is valid with respect to arbitrary coordinates $q_\alpha$ on the group $G_-$. However, in order to implement the quantization we will first choose certain special coordinates on $G_-$, which we call “upper triangular coordinates”. Upper triangular coordinates are special “polynomial coordinates” whose main examples are the graded exponential coordinates associated to a homogeneous basis $V_\alpha$ of $\mathcal{G}_-$ by $q \mapsto g_-(q) = \exp\left(\sum_\alpha q_\alpha V_\alpha\right)$. The polynomiality of the coordinates ensures that the components of $I$ are given by polynomial expressions, which can be easily rendered well-defined at the quantum mechanical level by normal ordering. The upper triangularity property, defined in Sect. 3, is a technical assumption on polynomial coordinates that simplifies the computations. To also implement the quantization in polynomial coordinates that are not necessarily upper triangular, in Sect. 4 we shall investigate quantum coordinate transformations.
Two systems of coordinates $q_\alpha$ and $Q_\alpha$ on $G_-$ are “polynomially equivalent” if both the coordinate transformation $q_\alpha(Q)$ and the inverse change of coordinates $Q_\alpha(q)$ are given by polynomials. We shall show that the quantization of the classical Wakimoto realization can be performed in every system of coordinates which is polynomially equivalent to an upper triangular system, and all those quantum mechanical Wakimoto realizations are equivalent since they are related by quantum coordinate transformations.

The so-called screening charges play a crucial role in the applications of the Wakimoto realization [FF1, BF, BMP, ATY, Ku]. Screening charges are operators commuting with the Wakimoto current that can be used to build resolutions of irreducible highest weight representations of the affine algebra and to construct chiral primary fields. In Sect. 6, we will find the explicit form of the screening charges for each generalized Wakimoto realization, extending the result previously known in the principal case [BMP, FF1, Fr1, Ku].

Lie algebraic conventions. For later reference, we here collect some notions of Lie theory (see e.g. [GOV]). Let the step operators $E_{\pm\alpha_l}$ and the Cartan elements $H_{\alpha_l}$, associated with the simple roots $\alpha_l$ for $l = 1, \ldots, r = \mathrm{rank}(\mathcal{G})$, be the Chevalley generators of the complex simple Lie algebra $\mathcal{G}$. For any $l = 1, \ldots, r$, choose an integer $n_l \in \{0, 1\}$ and determine the unique Cartan element $H$ for which $[H, E_{\pm\alpha_l}] = \pm n_l E_{\pm\alpha_l}$. The eigenspaces of $H$ in the adjoint representation define a $\mathbb{Z}$-gradation of $\mathcal{G}$,
$$\mathcal{G} = \oplus_m \mathcal{G}_m, \qquad [\mathcal{G}_m, \mathcal{G}_n] \subset \mathcal{G}_{m+n} \qquad \text{with} \qquad \mathcal{G}_m \equiv \{\, X \in \mathcal{G} \mid [H, X] = mX \,\}. \tag{1.6}$$
Denoting the subspaces of positive/negative grades by $\mathcal{G}_\pm$, we obtain the decomposition
$$\mathcal{G} = \mathcal{G}_- + \mathcal{G}_0 + \mathcal{G}_+. \tag{1.7}$$
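As an illustration of the gradation (1.6) and the decomposition (1.7), the following sketch counts the dimensions of $\mathcal{G}_-$, $\mathcal{G}_0$, $\mathcal{G}_+$ for $sl(N)$. The restriction to $sl(N)$, the use of matrix units $E_{ij}$ as root vectors, and the function name `grading_dimensions` are assumptions made for the example; they are not taken from the text.

```python
def grading_dimensions(n_labels):
    """Dimensions of G_-, G_0, G_+ for sl(N), with N = len(n_labels) + 1.

    n_labels[l] in {0, 1} is the grade assigned to the l-th simple root.
    """
    size = len(n_labels) + 1
    # H = diag(h_1, ..., h_N) with h_i - h_{i+1} = n_i.  An overall shift of
    # the h_i does not change the adjoint action, so we start from h_1 = 0.
    h = [0]
    for n in n_labels:
        h.append(h[-1] - n)
    # The Cartan subalgebra (dimension N - 1) always has grade zero.
    dims = {"minus": 0, "zero": size - 1, "plus": 0}
    for i in range(size):
        for j in range(size):
            if i == j:
                continue
            grade = h[i] - h[j]  # [H, E_ij] = (h_i - h_j) E_ij
            if grade > 0:
                dims["plus"] += 1
            elif grade < 0:
                dims["minus"] += 1
            else:
                dims["zero"] += 1
    return dims
```

For example, `grading_dimensions((1, 1))` describes the principal gradation of $sl(3)$, where $\mathcal{G}_0$ is the Cartan subalgebra, while `grading_dimensions((1, 0))` gives a non-principal case with a larger $\mathcal{G}_0$.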
We let $G$ denote a connected complex Lie group whose Lie algebra is $\mathcal{G}$. In terms of the connected subgroups $G_0, G_\pm \subset G$ corresponding to the subalgebras $\mathcal{G}_0, \mathcal{G}_\pm$, we have the dense open submanifold
$$\check{G} \equiv \{\, g = g_+ g_0 g_- \mid g_0 \in G_0,\ g_\pm \in G_\pm \,\} \subset G$$
of “Gauss decomposable” elements. In fact, $\check{G}$ equals $G_+ \times G_0 \times G_-$ as a manifold since the decomposition of any $g \in \check{G}$ is unique. The standard parabolic subalgebra $\mathcal{P} \subset \mathcal{G}$ associated with the fixed set of integers $n_l$, and the corresponding parabolic subgroup $P \subset G$, are given by the semidirect products
$$\mathcal{P} = (\mathcal{G}_0 + \mathcal{G}_+), \qquad P = G_0 G_+. \tag{1.8}$$
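For $SL(2)$ in the principal case the Gauss decomposition $g = g_+ g_0 g_-$ can be written down explicitly. The sketch below assumes that $G_+$ and $G_-$ are the upper and lower unipotent $2\times 2$ matrices and $G_0$ the diagonal ones; these matrix conventions and the function names are illustrative choices, not part of the paper.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def gauss_decompose(g):
    """Return (g_plus, g_zero, g_minus) with g = g_plus g_zero g_minus.

    g is a 2x2 matrix of determinant 1.  The decomposition exists only on
    the dense open subset where the lower right entry is nonzero; None is
    returned otherwise.
    """
    (a, b), (c, d) = g
    if d == 0:
        return None
    d = Fraction(d)
    x, t, y = Fraction(b) / d, 1 / d, Fraction(c) / d
    g_plus = ((1, x), (0, 1))       # upper unipotent factor
    g_zero = ((t, 0), (0, 1 / t))   # diagonal factor
    g_minus = ((1, 0), (y, 1))      # lower unipotent factor
    return g_plus, g_zero, g_minus
```

Multiplying the three factors back together recovers $g$, and the factors are determined uniquely by the entries of $g$, in line with the uniqueness statement above.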
Any parabolic subalgebra is conjugate to a unique standard one. For $X, Y \in \mathcal{G}$, we shall denote an invariant scalar product $\langle X, Y \rangle$ simply as $\mathrm{Tr}(XY)$, as if a matrix representation of $\mathcal{G}$ was chosen. Similarly, we denote, say, $\mathrm{Ad}\, g(X)$ as $gXg^{-1}$ for any $g \in G$, $X \in \mathcal{G}$. This notation is used purely for convenience; a choice of representation is never needed below.

In the principal case, for which $n_l = 1\ \forall l$, $\mathcal{G}_\pm$ are the subalgebras generated by the positive/negative roots, and $\mathcal{G}_0$ is the Cartan subalgebra $\mathcal{H}$. In the general case, the Lie algebra $\mathcal{G}_0$ can be decomposed into an abelian factor, say $\mathcal{G}_0^0$, and simple factors, say $\mathcal{G}_0^i$ for $i > 0$, that are orthogonal with respect to $\mathrm{Tr}$,
$$\mathcal{G}_0 = \oplus_{i \ge 0}\, \mathcal{G}_0^i. \tag{1.9}$$
If $\psi$ and $\psi_i$ are long roots of $\mathcal{G}$ and of $\mathcal{G}_0^i$ for $i > 0$, then the dual Coxeter numbers of $\mathcal{G}$ and of $\mathcal{G}_0^i$ are respectively given by
$$h^* = \frac{c_2(\mathcal{G})}{|\psi|^2} \qquad \text{and} \qquad h_i^* = \frac{c_2(\mathcal{G}_0^i)}{|\psi_i|^2} \qquad \text{for} \quad i > 0. \tag{1.10}$$
The scalar product of the roots is defined by identifying $\mathcal{H}^*$ with $\mathcal{H}$ using the restriction of the scalar product $\mathrm{Tr}$ of $\mathcal{G}$ to $\mathcal{H}$. The quadratic Casimir $c_2(\mathcal{G})$ of $\mathcal{G}$ is defined by $\eta^{ab}[T_a, [T_b, Y]] = c_2(\mathcal{G})\, Y$, where $Y \in \mathcal{G}$ and $\eta^{ab}$ is the inverse of $\eta_{ab} = \mathrm{Tr}(T_a T_b)$ for a basis $T_a$ of $\mathcal{G}$; and $c_2(\mathcal{G}_0^i)$ is defined analogously with the aid of the restriction of the scalar product $\mathrm{Tr}$ to $\mathcal{G}_0^i$.

Finally, we have $\widetilde{A} \equiv C^\infty(S^1, A)$ for any Lie group or Lie algebra $A$. The dual space to $\mathcal{G}$, $\mathcal{G}_0$, $\mathcal{G}_+$ will be respectively identified with $\mathcal{G}$, $\mathcal{G}_0$, $\mathcal{G}_-$ by means of the scalar product, and the analogous identifications will be used for the corresponding loop algebras.

2. Classical Wakimoto Realization

Below we derive Poisson bracket (PB) realizations of $\widehat{\mathcal{G}}_K$, the central extension of the loop algebra $\widetilde{\mathcal{G}} = C^\infty(S^1, \mathcal{G})$, by means of Hamiltonian (Marsden-Weinstein) symmetry reduction [AM] of the WZNW model [Wi] based on $G$. Specifically, the Wakimoto current $I(q, p, j_0)$ in (1.5) will appear as one of the affine Kac-Moody currents of the WZNW model that survives the reduction, while $q_\alpha(\sigma)$, $p_\alpha(\sigma)$, $j_0(\sigma)$ will be interpreted as coordinates on an open submanifold (the “big cell”) in the reduced phase space. The reduced phase space will not be described globally, since this is not needed for the purposes of this paper.

We take the phase space, $M$, of the WZNW model to be the holomorphic cotangent bundle of the loop group $\widetilde{G}$, realized as
$$M \equiv T^*\widetilde{G} = \{\, (g, J) \mid g \in \widetilde{G},\ J \in \widetilde{\mathcal{G}} \,\}.$$
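The phase space $M$ is equipped with the symplectic form (see e.g. [HK])
$$\omega = \int_{S^1} d\, \mathrm{Tr}\left(J\, dg\, g^{-1}\right) + K \int_{S^1} \mathrm{Tr}\left(dg\, g^{-1}\right)\left(dg\, g^{-1}\right)'$$
with some $K \in \mathbb{C}$, which yields the fundamental PBs
$$\{\mathrm{Tr}(T_a J)(\sigma), \mathrm{Tr}(T_b J)(\bar\sigma)\}_{WZ} = \mathrm{Tr}([T_a, T_b] J)(\sigma)\,\delta + K\, \mathrm{Tr}(T_a T_b)\,\delta',$$
$$\{\mathrm{Tr}(T_a J)(\sigma), g(\bar\sigma)\}_{WZ} = -T_a g(\sigma)\,\delta, \qquad \delta = \delta(\sigma - \bar\sigma),$$
where $0 \le \sigma \le 2\pi$ parametrizes $S^1$ and $T_a$ is a basis of $\mathcal{G}$. As a consequence, $I \equiv -g^{-1} J g + K g^{-1} g'$ satisfies the PBs
$$\{\mathrm{Tr}(T_a I)(\sigma), \mathrm{Tr}(T_b I)(\bar\sigma)\}_{WZ} = \mathrm{Tr}([T_a, T_b] I)(\sigma)\,\delta - K\, \mathrm{Tr}(T_a T_b)\,\delta'. \tag{2.1}$$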
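The affine Kac-Moody currents $J$ and $I$ are the generators (momentum maps) for two commuting Hamiltonian actions of $\widetilde{G}$ on $T^*\widetilde{G}$. The action generated by $J$ is given by
$$L_\gamma : (g, J) \mapsto (\gamma g,\ \gamma J \gamma^{-1} + K \gamma' \gamma^{-1}), \qquad \gamma \in \widetilde{G}.$$
The action generated by $I$ is written as
$$R_\gamma : (g, J) \mapsto (g\gamma^{-1}, J), \qquad \gamma \in \widetilde{G}.$$
The action $L_\gamma$ leaves $I$ invariant, while $R_\gamma$ preserves $J$ and transforms $I$ according to the co-adjoint action
$$R_\gamma : I \mapsto \gamma I \gamma^{-1} - K \gamma' \gamma^{-1}. \tag{2.2}$$
Now we define a symmetry reduction using the action of $\widetilde{P} \subset \widetilde{G}$ given by $L_\gamma$ for $\gamma \in \widetilde{P}$. The infinitesimal generators of this symmetry are the components of the momentum map that maps $(g, J) \in M$ to $(J_0 + J_-) \in \widetilde{\mathcal{P}}^* = (\widetilde{\mathcal{G}}_0 + \widetilde{\mathcal{G}}_-)$, where we decompose $J = (J_- + J_0 + J_+)$ using (1.7). The reduction is defined by imposing the constraints
$$J_- = 0, \qquad J_0 = \mu_0,$$
with an arbitrary $\mu_0 \in \widetilde{\mathcal{G}}_0$. The corresponding reduced phase space is the space of orbits $M_{red} = \widetilde{P}(\mu_0)\backslash M_c$, where $M_c \subset M$ is the constrained manifold and $\widetilde{P}(\mu_0)$ is the subgroup of $\widetilde{P}$ whose action preserves the constraints. Using $\widetilde{G}_0 \equiv C^\infty(S^1, G_0)$, $\widetilde{G}_\pm \equiv C^\infty(S^1, G_\pm)$, it is easy to see that
$$\widetilde{P}(\mu_0) = \widetilde{G}_0(\mu_0)\, \widetilde{G}_+ \qquad \text{with} \qquad \widetilde{G}_0(\mu_0) \equiv \{\, g_0 \in \widetilde{G}_0 \mid g_0 \mu_0 g_0^{-1} + K g_0' g_0^{-1} = \mu_0 \,\}.$$
We are only interested in the big cell $\check{M}_c \subset M_c$ where $g$ is Gauss decomposable, which is a property respected by the action of $\widetilde{P}$. Our immediate aim is to characterize the manifold
$$\check{M}_{red} = \widetilde{P}(\mu_0)\backslash \check{M}_c, \qquad \check{M}_c = \{\, (g_+ g_0 g_-,\ J_+ + \mu_0) \mid g_0 \in \widetilde{G}_0,\ g_\pm \in \widetilde{G}_\pm,\ J_+ \in \widetilde{\mathcal{G}}_+ \,\},$$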
and to find the symplectic form $\omega_{red}$ on $\check{M}_{red}$ defined by the canonical map $\eta : \check{M}_c \to \check{M}_{red}$ as $\eta^* \omega_{red} = \omega|_{\check{M}_c}$. For this we notice that a complete set of $\widetilde{P}(\mu_0)$ invariant functions on $\check{M}_c$ is given by
$$g_-, \qquad j \equiv g_0^{-1}\left( g_+^{-1}(J_+ + \mu_0)\, g_+ - \mu_0 - K g_+^{-1} g_+' \right) g_0, \qquad j_0 \equiv -g_0^{-1} \mu_0 g_0 + K g_0^{-1} g_0'. \tag{2.3}$$
The induced mapping
$$(g_-, j, j_0) : \check{M}_{red} \to \widetilde{G}_- \times \widetilde{\mathcal{G}}_+ \times \widetilde{G}_0(\mu_0)\backslash \widetilde{G}_0$$
is in fact one-to-one, and hence $(g_-, j, j_0)$ may be regarded as coordinates on the reduced phase space. Next, one verifies that
$$\omega|_{\check{M}_c} = \int_{S^1} d\, \mathrm{Tr}\left(j\, dg_-\, g_-^{-1}\right) + \int_{S^1} d\, \mathrm{Tr}\left(\mu_0\, dg_0\, g_0^{-1}\right) + K \int_{S^1} \mathrm{Tr}\left(dg_0\, g_0^{-1}\right)\left(dg_0\, g_0^{-1}\right)'. \tag{2.4}$$
The first term is then recognized to be the canonical symplectic form on $T^*\widetilde{G}_- = \widetilde{G}_- \times \widetilde{\mathcal{G}}_+$, where the identification is made using right translations to trivialize $T^*\widetilde{G}_-$ and $\widetilde{\mathcal{G}}_-^* = \widetilde{\mathcal{G}}_+$. To interpret the other two terms, we note that the coset space $\widetilde{G}_0(\mu_0)\backslash \widetilde{G}_0$ can be realized as the orbit, $\mathcal{O}^0_{-K}(-\mu_0)$, of $\widetilde{G}_0$ through the point $j_0 = -\mu_0$ with respect to the following action of $\widetilde{G}_0$ on $\widetilde{\mathcal{G}}_0$:
$$R^0_{g_0} : j_0 \mapsto g_0 j_0 g_0^{-1} - K g_0' g_0^{-1}, \qquad g_0 \in \widetilde{G}_0,\ j_0 \in \widetilde{\mathcal{G}}_0. \tag{2.5}$$
This is the co-adjoint action in (2.2) applied to $\widetilde{G}_0$, and thus it preserves the PB
$$\{\mathrm{Tr}(Y_i j_0(\sigma)), \mathrm{Tr}(Y_l j_0(\bar\sigma))\} = \mathrm{Tr}([Y_i, Y_l]\, j_0(\sigma))\,\delta - K\, \mathrm{Tr}(Y_i Y_l)\,\delta', \tag{2.6}$$
where $Y_i$ is a basis of $\mathcal{G}_0$ and $j_0$ now denotes a coordinate on $\widetilde{\mathcal{G}}_0$. Actually, we can identify the sum of the second and third terms in (2.4) to be just the canonical symplectic form on the co-adjoint orbit $\mathcal{O}^0_{-K}(-\mu_0) = \widetilde{G}_0(\mu_0)\backslash \widetilde{G}_0$ in terms of the redundant coordinate $g_0$. In order to confirm this formula of the canonical symplectic form, it is enough to remark that in the full WZNW model $I$ runs over the co-adjoint orbit of (the central extension of) $\widetilde{G}$ through $-\mu$ for any fixed $J = \mu$, and thus the formula for the symplectic form on this orbit follows from the reduction of the WZNW model defined by the constraint $J = \mu$. Then apply this remark to $\widetilde{G}_0$. The outcome of the foregoing analysis is summarized as follows.

Theorem 1. The big cell $\check{M}_{red}$ of the reduced phase space can be identified with the manifold $T^*\widetilde{G}_- \times \mathcal{O}^0_{-K}(-\mu_0)$, where $\mathcal{O}^0_{-K}(-\mu_0) = \widetilde{G}_0(\mu_0)\backslash \widetilde{G}_0$ is the orbit of $\widetilde{G}_0$ through $-\mu_0$ with respect to the co-adjoint action in (2.5). The symplectic form on $\check{M}_{red}$ defined by $\omega|_{\check{M}_c}$ in (2.4) coincides with the canonical symplectic form on this product manifold.

The Poisson brackets of the coordinate functions $(g_-, j, j_0)$ determined by the symplectic form on $\check{M}_{red}$ (which can be thought of as Dirac brackets) are given by (2.6) together with
$$\{\mathrm{Tr}(V_\alpha j)(\sigma), \mathrm{Tr}(V_\beta j)(\bar\sigma)\} = \mathrm{Tr}([V_\alpha, V_\beta]\, j)(\sigma)\,\delta, \qquad \{\mathrm{Tr}(V_\alpha j)(\sigma), g_-(\bar\sigma)\} = -V_\alpha g_-(\sigma)\,\delta, \tag{2.7}$$
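where $V_\alpha$ is a basis of $\mathcal{G}_-$. Denote by $V^\beta$ the dual basis of $\mathcal{G}_+$, $\mathrm{Tr}(V^\beta V_\alpha) = \delta_{\alpha\beta}$. Let $q_\alpha$ be some global, holomorphic coordinates on $G_-$. Define
$$N_{\alpha\beta}(q) \equiv \mathrm{Tr}\left(V^\beta \frac{\partial g_-}{\partial q_\alpha}\, g_-^{-1}\right). \tag{2.8}$$
Then$^4$
$$j(q, p) = N^{-1}_{\alpha\beta}(q)\, p_\beta V^\alpha \tag{2.9}$$
in terms of the canonical coordinates $q_\alpha(\sigma)$, $p_\beta(\sigma)$ on $T^*\widetilde{G}_-$. Indeed,
$$\int_{S^1} d\, \mathrm{Tr}\left(j\, dg_-\, g_-^{-1}\right) = \int_{S^1} d\left(p_\alpha\, dq_\alpha\right),$$
and therefore
$$\{q_\alpha(\sigma), p_\beta(\bar\sigma)\} = \delta_{\alpha\beta}\, \delta(\sigma - \bar\sigma). \tag{2.10}$$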
The classical Wakimoto realization is given by the statement:

Theorem 2. As a consequence of the PBs of $j_0$ in (2.6) and $q_\alpha$, $p_\beta$ in (2.10), where (2.10) is equivalent to (2.7) by means of (2.9), the classical Wakimoto current
$$I(q, p, j_0) = I(g_-(q), j(q, p), j_0) = g_-^{-1}(-j + j_0)\, g_- + K g_-^{-1} g_-' \tag{2.11}$$
satisfies
$$\{\mathrm{Tr}(T_a I)(\sigma), \mathrm{Tr}(T_b I)(\bar\sigma)\} = \mathrm{Tr}([T_a, T_b] I)(\sigma)\,\delta - K\, \mathrm{Tr}(T_a T_b)\,\delta'. \tag{2.12}$$
The affine-Sugawara stress-energy tensor is quadratic in the free field constituents,
$$\frac{1}{2K}\mathrm{Tr}(I^2) = \frac{1}{2K}\mathrm{Tr}(j_0^2) - \mathrm{Tr}\left(j\, g_-' g_-^{-1}\right) = \frac{1}{2K}\mathrm{Tr}(j_0^2) - \sum_\alpha p_\alpha q_\alpha'. \tag{2.13}$$

Proof. With the $\widetilde{P}(\mu_0)$ invariants in (2.3) on $\check{M}_c$, we have
$$I = -(g_+ g_0 g_-)^{-1}(J_+ + \mu_0)(g_+ g_0 g_-) + K (g_+ g_0 g_-)^{-1}(g_+ g_0 g_-)' = g_-^{-1}(-j + j_0)\, g_- + K g_-^{-1} g_-'.$$
This implies the first statement since the PBs of $I$ in (2.1) survive the reduction as $I$ is invariant under the symmetry group $\widetilde{P}$. The second statement is easily verified.

Remark 1. The validity of Theorem 2 does not require $j_0$ to be restricted to a co-adjoint orbit. This is clear, for example, from the fact that the orbit appearing in Theorem 1 is through an arbitrarily chosen element $-\mu_0$.

4 To compare with (1.3), we have $F_{\alpha\beta} = N^{-1}_{\alpha\beta}$.
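The formulas (2.11) and (2.13) can be made concrete for $sl(2)$ in the principal case. The following sketch evaluates the classical current at a single point of $S^1$, treating the field values as plain numbers. It assumes the $2\times 2$ matrix trace for $\mathrm{Tr}$, the basis $E$, $H$, $F$ of $sl(2)$, $g_- = \exp(qF)$, $j = pE$ and $j_0 = aH$; these choices and all function names are made for illustration only.

```python
def matmul(a, b):
    """Multiply two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def wakimoto_current_sl2(q, p, a, K, q_prime):
    """Evaluate I = g^{-1}(-j + j0) g + K g^{-1} g' pointwise.

    Assumed conventions: g = exp(q F) is lower unipotent, j = p E,
    j0 = a H, and q_prime is the value of the derivative of q.
    """
    g = ((1, 0), (q, 1))
    g_inv = ((1, 0), (-q, 1))
    g_prime = ((0, 0), (q_prime, 0))
    core = ((a, -p), (0, -a))                # -j + j0 = -p E + a H
    conjugated = matmul(g_inv, matmul(core, g))
    derivative = matmul(g_inv, g_prime)
    return tuple(
        tuple(conjugated[i][j] + K * derivative[i][j] for j in range(2))
        for i in range(2)
    )


def components(current):
    """Return (Tr(F I), Tr(H I), Tr(E I))."""
    return current[0][1], current[0][0] - current[1][1], current[1][0]


def trace_of_square(current):
    return sum(current[i][j] * current[j][i]
               for i in range(2) for j in range(2))
```

With these conventions the three components come out as $-p$, $2a - 2pq$ and $pq^2 - 2aq + Kq'$, which is the classical counterpart of (1.1) once $\mathrm{Tr}(H j_0) = 2a$ is identified with the scalar current there, and $\mathrm{Tr}(I^2) = \mathrm{Tr}(j_0^2) - 2K p q'$ in agreement with (2.13).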
It is natural to ask what the result would be if one substituted a Wakimoto realization of the $\mathcal{G}_0$-valued current $j_0$ into the $\mathcal{G}$-valued Wakimoto current $I(g_-, j, j_0)$ in (2.11). In fact, the resulting $\mathcal{G}$-valued current will have the form (2.11) with respect to another parabolic subalgebra of $\mathcal{G}$. Below we explain this “composition property” of the classical Wakimoto realization.

We need to introduce some notation to verify the composition property. Suppose that we are given the parabolic subalgebra $\mathcal{P} = (\mathcal{G}_0 + \mathcal{G}_+)$ associated with the triangular decomposition $\mathcal{G} = \mathcal{G}_- + \mathcal{G}_0 + \mathcal{G}_+$. Then consider a parabolic subalgebra $\mathcal{P}_0 \subset \mathcal{G}_0$, where $\mathcal{P}_0 = (\mathcal{G}_{0,0} + \mathcal{G}_{0,+})$ in terms of a triangular decomposition $\mathcal{G}_0 = \mathcal{G}_{0,-} + \mathcal{G}_{0,0} + \mathcal{G}_{0,+}$, which is defined by the signs of the eigenvalues of some real-semisimple element $H_0 \in \mathcal{G}_0$. On account of standard Lie algebraic results [GOV], $\mathcal{P}^c \equiv (\mathcal{P}_0 + \mathcal{G}_+) \subset \mathcal{G}$ is again a parabolic subalgebra. Here the superscript “c” is our mnemonic for “composite”. We have the new triangular decomposition
$$\mathcal{G} = \mathcal{G}^c_- + \mathcal{G}_{0,0} + \mathcal{G}^c_+ \qquad \text{with} \qquad \mathcal{G}^c_\pm \equiv \mathcal{G}_{0,\pm} + \mathcal{G}_\pm,$$
and the subalgebras $\mathcal{G}_{0,0}$, $\mathcal{G}^c_\pm$ allow us to further identify $\mathcal{P}^c$ in the form
$$\mathcal{P}^c = \mathcal{P}_0 + \mathcal{G}_+ = \mathcal{G}_{0,0} + \mathcal{G}_{0,+} + \mathcal{G}_+ = \mathcal{G}_{0,0} + \mathcal{G}^c_+.$$
Notice that $\mathcal{G}^c_\pm$ are semidirect sums since $[\mathcal{G}_{0,\pm}, \mathcal{G}_\pm] \subset \mathcal{G}_\pm$. In correspondence with all these Lie subalgebras, we also introduce the respective connected subgroups $G_{0,0}$ and $G_{0,\pm}$ of $G_0$, and the semidirect product groups $G^c_\pm = G_{0,\pm} G_\pm$, where $G_\pm$ and $G_0$ are the subgroups of $G$ associated with the original triangular decomposition. Armed with this notation, we can now describe the “sub-Wakimoto” realization of $j_0$ in a form analogous to (2.11), namely
$$j_0 = n_-^{-1}(-i + i_0)\, n_- + K n_-^{-1} n_-',$$
where $i_0$ is $\mathcal{G}_{0,0}$-valued, $i$ is $\mathcal{G}_{0,+}$-valued, and $n_-$ is $G_{0,-}$-valued. Substituting this into (2.11) we then find the identity
$$g_-^{-1}(-j + j_0)\, g_- + K g_-^{-1} g_-' = (g^c_-)^{-1}(-j^c + j^c_0)\, g^c_- + K (g^c_-)^{-1} (g^c_-)'$$
with the composite objects
$$g^c_- = n_- g_-, \qquad j^c = i + n_- j\, n_-^{-1}, \qquad j^c_0 = i_0$$
that belong to $G^c_-$, $\mathcal{G}^c_+$ and $\mathcal{G}_{0,0}$, respectively. This identity expresses the composition property of the Wakimoto realization, which may be symbolically written as
$$I(g_-, j, j_0(n_-, i, i_0)) = I(g^c_-, j^c, j^c_0). \tag{2.14}$$
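The identity behind the composition property can be checked in a few lines. The display below is a sketch that uses only the definitions of the composite objects $g^c_- = n_- g_-$, $j^c = i + n_- j n_-^{-1}$, $j^c_0 = i_0$ and the sub-Wakimoto formula for $j_0$; primes denote derivatives along $S^1$.

```latex
\begin{aligned}
(g^c_-)^{-1}\bigl(-j^c + j^c_0\bigr)\, g^c_- + K (g^c_-)^{-1} (g^c_-)'
&= g_-^{-1} n_-^{-1}\bigl(-i - n_- j\, n_-^{-1} + i_0\bigr)\, n_- g_-
   + K g_-^{-1} n_-^{-1}\bigl(n_-' g_- + n_- g_-'\bigr) \\
&= g_-^{-1}\Bigl(-j + n_-^{-1}(-i + i_0)\, n_- + K n_-^{-1} n_-'\Bigr) g_-
   + K g_-^{-1} g_-' \\
&= g_-^{-1}\bigl(-j + j_0\bigr)\, g_- + K g_-^{-1} g_-' .
\end{aligned}
```

The second line only regroups terms inside the conjugation by $g_-$, and the last line recognizes the bracketed combination as $-j + j_0$.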
After postulating the Poisson brackets of the constituent variables $g_-$, $j$ and $n_-$, $i$, $i_0$, the Poisson brackets of the composite variables $g^c_-$, $j^c$, $j^c_0$ are readily checked to be the correct ones that one requires for the Wakimoto realization based on $\mathcal{P}^c$. The coordinates on $G^c_-$ that are naturally associated with the composition property just exhibited are obtained as unions of independent coordinates on $G_-$ and on $G_{0,-}$. Using such coordinates, the composition property in principle allows one to produce the principal Wakimoto realization of $I \in \mathcal{G}$ by proceeding through a chain of non-principal Wakimoto realizations according to a partial ordering of parabolic subalgebras.
3. Quantization of the Classical Wakimoto Realization

Our goal now is to derive a quantum counterpart of the classical Wakimoto realization (2.11). As this classical realization was derived by means of a Hamiltonian reduction, there seem a priori to be two ways to quantize it. The first possibility would require us to write down a quantization of the phase space of the WZNW model and subsequently to implement a quantum Hamiltonian reduction. Although it might be very interesting to pursue this line of thought further, it seems to be rather cumbersome, and in addition in the case at hand it turns out to be relatively easy to directly quantize the classical Wakimoto realization. Therefore we will restrict ourselves to the latter method.

Since the classical Wakimoto realization expresses the currents of $G$ in terms of currents in $G_0$ and a set of coordinates and momenta that constitute the cotangent bundle $T^*\widetilde{G}_-$, our philosophy will be to first quantize these objects by postulating OPEs for their generators, and subsequently to write down a normal ordered version of (2.11) in terms of these generators. The hard work lies in verifying that the currents defined in this way indeed satisfy the OPEs of the affine Lie algebra based on $G$. This requirement will in addition fix the ambiguities that one has to deal with in normal ordering (2.11). Fixing a basis $T_a$ of $G$, the OPEs corresponding to (2.12) should read as

$$\mathrm{Tr}(T_a I)(z)\,\mathrm{Tr}(T_b I)(w) = \frac{K\,\mathrm{Tr}(T_a T_b)}{(z-w)^2} + \frac{\mathrm{Tr}([T_a, T_b] I)(w)}{z-w}. \tag{3.1}$$
Replacing (2.12) with (3.1) amounts to replacing the PBs of the Fourier modes of the current with corresponding commutators, as is well-known. Naturally, the OPEs of the constituent coordinate and momentum fields are declared to be

$$p^\alpha(z)\, q_\beta(w) = \frac{-\delta^\alpha_\beta}{z-w}. \tag{3.2}$$

Decomposing the $G_0$-valued current $j_0$ as $j_0 = \sum_{i\ge 0} j_0^i$ according to (1.9), we postulate the OPEs of the current $j_0^i$ in $G_0^i$ as

$$\mathrm{Tr}(\pi_0^i(T_a) j_0^i)(z)\,\mathrm{Tr}(\pi_0^i(T_b) j_0^i)(w) = \frac{K_0^i\,\mathrm{Tr}(\pi_0^i(T_a)\pi_0^i(T_b))}{(z-w)^2} + \frac{\mathrm{Tr}([\pi_0^i(T_a), \pi_0^i(T_b)]\, j_0^i)(w)}{z-w}, \tag{3.3}$$
where $\pi_0^i : G \to G_0^i$ is the orthogonal projection onto $G_0^i$. All other OPEs of the constituents are regular. Note that we have taken the central extension $K_0^i$ of $j_0^i$ to be a free parameter, to be determined from requiring (3.1), and that the properly normalized level parameters of $I$ and $j_0^i$ (which are integers in a unitary highest weight representation) are respectively given by

$$k \equiv \frac{2K}{|\psi|^2} \qquad\text{and}\qquad k_0^i \equiv \frac{2K_0^i}{|\psi_i|^2} \quad\text{for}\quad i > 0.$$
Notice now that the classical Wakimoto current in (2.11) is linear in the pα and in j0 , but could contain arbitrary functions of the qα if the coordinates were not chosen with care. However, we here only wish to deal with objects that are polynomial in the basic quantum fields, since those are easily defined in chiral conformal field theory (which is the same as the theory of vertex algebras, see e.g. [G]) by normal ordering. Below we
will define a class of coordinates on $G_-$, the so-called “upper triangular coordinates”, in which the quantum Wakimoto current will be polynomial. The computations will also simplify considerably in these coordinates.

Fortunately, the only ordering problem in (2.11) arises from the term $-g_-^{-1} j g_- = -g_-^{-1} N^{-1}_{\alpha\beta}(q)\, p^\beta V^\alpha g_-$, for which we have to choose where to put the momenta $p^\beta$. We now choose to put them on the left, and replace any classical object $p f(q)$ by the normal ordered object $(p(f(q)))$. From the OPE (footnote 5)

$$p^\alpha(z)\, f(q(w)) = \frac{-\partial^\alpha f}{z-w},$$

we see that $([p^\alpha, f(q)]) = -\partial\,\partial^\alpha f(q) = -\partial^\beta\partial^\alpha f(q)\,\partial q_\beta$. Hence, the difference between two normal orderings of the classical object $p f(q)$ will always be of the form $\Omega^\beta(q)\,\partial q_\beta$ for some function $\Omega^\beta$, and we should allow for an additional term of this type in quantizing (2.11). Altogether this leads to the following proposal for the quantum Wakimoto current:

$$I = -\bigl(p^\beta (N^{-1}_{\alpha\beta}\, g_-^{-1} V^\alpha g_-)\bigr) + g_-^{-1} j_0\, g_- + K g_-^{-1}\partial g_- + g_-^{-1}\Omega^\beta g_-\,\partial q_\beta. \tag{3.4}$$

Our main result will be to give the explicit form of the last term, which represents a quantum correction due to different normal orderings. The function $\Omega^\beta(q)$ is $G$-valued and we inserted some factors of $g_-$ around it for convenience. To define the special coordinates in which our formula for $\Omega^\beta$ will be valid, we first introduce the matrix $R_a{}^b(g_-)$ by

$$g_- T_a\, g_-^{-1} \equiv R_a{}^b(g_-)\, T_b, \qquad g_- \in G_-. \tag{3.5}$$
Definition (polynomial coordinates). We call a system of global, holomorphic coordinates $q_\alpha$ on $G_-$ polynomial if $R_a{}^b(g_-(q))$ is given by a polynomial of the coordinates.

The fact that $\det R = 1$, which follows from the invariance of Tr and from the fact that $G_-$ is topologically trivial, shows that $R_a{}^b(g_-^{-1})$ is also polynomial in the $q_\alpha$. Furthermore, since one can evaluate $N^{\alpha\beta}(q)$ in (2.8) using the adjoint representation, the definition implies that $N^{\alpha\beta}(q)$ is a polynomial, too. The determinant of $N^{\alpha\beta}(q)$ is a nowhere vanishing complex polynomial and must therefore be a constant, so that $N^{-1}_{\alpha\beta}(q)$ is a polynomial as well.

Let deg denote the Z-gradation with respect to which the decomposition (1.7) was made (footnote 6), assume that the basis elements $V_\alpha$ of $G_-$ have well-defined degree, and set $d_\alpha \equiv -\deg(V_\alpha) = \deg(V^\alpha) > 0$. For polynomial coordinates $q_\alpha$ on $G_-$, let us assign degree $d_\alpha$ to $q_\alpha$.

Definition (upper triangular coordinates). We call a system of polynomial coordinates on $G_-$ upper triangular if $N^{\alpha\beta}(q)$ is given by a homogeneous polynomial of degree $(d_\beta - d_\alpha)$ with respect to the above assignment of the degree.

5 The notations $\partial^\alpha f(q) \equiv \frac{\partial f(q)}{\partial q_\alpha}$ and $(\partial F)(z) \equiv \frac{\partial F(z)}{\partial z}$ are used for functions $f$ of $q$ and $F$ of $z$ from now on.

6 The standard “grading operator” $H$ in (1.6), $[H, E_{\pm\alpha_l}] = \pm n_l E_{\pm\alpha_l}$, can be replaced with any $H'$ for which $[H', E_{\pm\alpha_l}] = \pm n'_l E_{\pm\alpha_l}$ in such a way that $n'_l = 0$ whenever $n_l = 0$ and $n'_l n_l > 0$ otherwise. All results remain true if one uses the subsequent definition relative to any such gradation.
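As a concrete illustration of the two definitions above, the following sketch evaluates $N^{\alpha\beta}(q)$ for $SL(3)$ in the coordinates given by the matrix entries below the diagonal, and checks the homogeneity required for upper triangularity by rescaling $q_\alpha \to \lambda^{d_\alpha} q_\alpha$. It assumes that (2.8) defines $N$ through $\partial^\alpha g_-\, g_-^{-1} = N^{\alpha\beta}(q) V_\beta$ (the relation used later in Sect. 5.2) and that the principal grading is used; the function names are hypothetical.

```python
from fractions import Fraction

# Strictly lower-triangular positions (row, col), 0-based, with degrees d_alpha
# in the principal grading (an assumption of this sketch).
POSITIONS = [(1, 0), (2, 1), (2, 0)]
DEGREE = {(1, 0): 1, (2, 1): 1, (2, 0): 2}


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def g_inverse(q):
    """Inverse of g = 1 + n with n strictly lower triangular: 1 - n + n^2."""
    n = [[Fraction(0)] * 3 for _ in range(3)]
    for (a, b), value in q.items():
        n[a][b] = Fraction(value)
    n2 = matmul(n, n)
    return [[(1 if i == j else 0) - n[i][j] + n2[i][j] for j in range(3)]
            for i in range(3)]


def n_matrix(q):
    """N[alpha, beta] read off from (dg/dq_alpha) g^{-1} = N^{alpha beta} e_beta."""
    ginv = g_inverse(q)
    result = {}
    for alpha in POSITIONS:
        e_alpha = [[Fraction(0)] * 3 for _ in range(3)]
        e_alpha[alpha[0]][alpha[1]] = Fraction(1)  # dg/dq_alpha
        product = matmul(e_alpha, ginv)
        for beta in POSITIONS:
            result[alpha, beta] = product[beta[0]][beta[1]]
    return result


def is_homogeneous(q, scale):
    """Check N(scaled q) == scale**(d_beta - d_alpha) * N(q), entry by entry."""
    scaled = {pos: Fraction(scale) ** DEGREE[pos] * val for pos, val in q.items()}
    n0, n1 = n_matrix(q), n_matrix(scaled)
    return all(n1[a, b] == Fraction(scale) ** (DEGREE[b] - DEGREE[a]) * n0[a, b]
               for a in POSITIONS for b in POSITIONS)


q = {(1, 0): 2, (2, 1): 3, (2, 0): 5}
N = n_matrix(q)
```

In this example the only off-diagonal entry is $N^{32,31} = -q_{21}$, of degree $d_{31} - d_{32} = 1$, and all entries with $d_\beta < d_\alpha$ vanish.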
Remark 2. In upper triangular coordinates $N^{\alpha\beta}(q)$ obviously vanishes unless $d_\beta \ge d_\alpha$, which explains the name and implies that $N^{-1}_{\alpha\beta}(q)$ is also a polynomial of degree $(d_\beta - d_\alpha)$. Thus the definition ensures that the vector field $N^{-1}_{\alpha\beta}(q)\frac{\partial}{\partial q_\beta}$, which generates the one-parameter group $g_- \mapsto e^{tV_\alpha} g_-$, has the degree $-d_\alpha$ of $V_\alpha$ when acting on polynomials in the coordinates.

Examples. The most obvious examples of upper triangular coordinates are the “graded exponential coordinates”, given by $g_-(q) = \exp(\sum_\alpha q_\alpha V_\alpha)$. One can also take products of graded exponential coordinates, by distributing the set $\{V_\alpha\}$ over disjoint subsets $S_I$ and taking $g_-(q) = \prod_I \exp(\sum_{\alpha\in S_I} q_\alpha V_\alpha)$. If $G = SL(n)$, there are even simpler coordinates where the $q_\alpha$ are matrix elements, namely $g_-(q) = 1_n + \sum_\alpha q_\alpha V_\alpha$ with $(V_\alpha)_{lk} = \delta_{il}\delta_{jk}$ for some $i > j$. To check the upper triangularity property in these examples, it is useful to think of $g_-(q)$ as a polynomial in the $q_\alpha$ and the $V_\alpha$, which are declared to have degrees $d_\alpha$ and $-d_\alpha$, respectively. Then $g_-(q)$ has “total degree” zero, and $N^{\alpha\beta}(q)V_\beta$ has total degree $-d_\alpha$, implying that $N^{\alpha\beta}(q)$ has degree $(d_\beta - d_\alpha)$. For the same reason, in our examples not only $N^{\alpha\beta}(q)$, but actually also the matrix $R_a{}^b(q)$ in (3.5) is given by homogeneous polynomials. Namely, if $[H, T_a] = \tau_a T_a$ with the grading operator $H$ in (1.6), then $R_a{}^b(q)$ has degree $(\tau_a - \tau_b)$. It follows that the vector field that generates the action of $T_a$ on $G_- \subset P\backslash G$, induced from right multiplication, has degree $\tau_a$ when acting on polynomials in the coordinates.

Now we are ready to state the main result of this section:

Theorem 3. Given a system of upper triangular coordinates $q_\alpha$ on $G_-$, the current $I$ defined in (3.4) satisfies the OPE given in (3.1) if (i) $2K_0^0 = |\psi|^2(k + h^*) = |\psi_i|^2(k_0^i + h_i^*)$ for $i > 0$, and (ii) $\Omega^\beta$ is given by the following $G_-$-valued object

$$\Omega^\beta = N^{\lambda\rho}\,\partial^\beta N^{-1}_{\gamma\lambda}\,[V^\gamma, V_\rho]. \tag{3.6}$$

Furthermore, $\Omega^\beta$ is uniquely determined by (3.1) up to trivial redefinitions of the momenta $p^\beta$ in (3.4),

$$p^\beta \to p^\beta + (\partial^\beta A^\gamma - \partial^\gamma A^\beta)\,\partial q_\gamma \tag{3.7}$$

with an arbitrary polynomial $A^\gamma(q)$. Finally, the affine-Sugawara stress-energy tensor for the current $I$ is equal to a sum consisting of free stress-energy tensors for $p^\beta$, $q_\beta$, affine-Sugawara stress-energy tensors for the currents $j_0^i$ with values in the simple factors $G_0^i \subset G_0$, and an improved stress-energy tensor for the current $j_0^0$ with values in the abelian factor $G_0^0 \subset G_0$,

$$\frac{1}{2y}\mathrm{Tr}(I^2) = -p^\beta\partial q_\beta + \frac{1}{2y}\sum_{i\ge 0}\mathrm{Tr}(j_0^i j_0^i) + \frac{1}{y}\mathrm{Tr}\bigl((\rho_G - \rho_{G_0})\,\partial j_0^0\bigr), \tag{3.8}$$

where $y \equiv \frac{1}{2}|\psi|^2(k + h^*)$ and $\rho_G - \rho_{G_0} = \frac{1}{2}[V^\alpha, V_\alpha]$ is half the sum of those positive roots of $G$ that are not roots of $G_0^i$ for any $i > 0$.

Proof. The proof of this theorem consists of an explicit calculation of the OPE of $I$ with itself and of $\mathrm{Tr}(I^2)$. This calculation is relatively straightforward and not very insightful, so we will not present it in full detail. In it one needs for example the relation between the structure constants of $G_-$ and $N^{\alpha\beta}$, which follows directly from (2.8),

$$N^{-1}_{\delta\gamma}\,\partial^\gamma N^{-1}_{\alpha\beta}\, N^{\beta\mu} - N^{-1}_{\alpha\gamma}\,\partial^\gamma N^{-1}_{\delta\beta}\, N^{\beta\mu} = f_{\alpha\delta}{}^{\mu}. \tag{3.9}$$
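In addition, one can use the fact that the coordinates $q_\alpha$ are upper triangular to show that many terms appearing in the calculation actually vanish. To illustrate how the calculations are done, we compute here only the double pole in the OPE of $\mathrm{Tr}(T_a I)(z)$ with $\mathrm{Tr}(T_b I)(w)$. Denote by $I_1, \ldots, I_4$ the four terms on the right hand side of (3.4). Then the OPE of $\mathrm{Tr}(T_a I_1)(z)$ with $\mathrm{Tr}(T_b I_1)(w)$ has a double pole $D_{1,1}(w)/(z-w)^2$ which is completely absent in the classical case, with

$$D_{1,1} = -\partial^\gamma\mathrm{Tr}(T_a N^{-1}_{\alpha\beta}\, g_-^{-1} V^\alpha g_-)\;\partial^\beta\mathrm{Tr}(T_b N^{-1}_{\delta\gamma}\, g_-^{-1} V^\delta g_-).$$

This follows from the OPE

$$(p^\alpha f(q))(z)\,(p^\beta g(q))(w) = \frac{-(\partial^\beta f(z)\,\partial^\alpha g(w))}{(z-w)^2} + \frac{(p^\alpha(g\,\partial^\beta f) - p^\beta(f\,\partial^\alpha g))(w)}{z-w}.$$

Before we continue, we introduce the following notation. Define

$$\tilde T_a = g_- T_a\, g_-^{-1}, \tag{3.10}$$

and let $\tilde T_a = \tilde T_a^- + \tilde T_a^0 + \tilde T_a^+$ be the decomposition of $\tilde T_a$ in parts with values in $G_+$, $G_0$, $G_-$, and $\tilde T_a^{0,i}$ the part of $\tilde T_a^0$ with values in $G_0^i$. The next double pole, coming from the OPE of $I_2$ with itself, can now be written as $D_{2,2}(w)/(z-w)^2$ with

$$D_{2,2} = \sum_i K_0^i\,\mathrm{Tr}(\tilde T_a^{0,i}\tilde T_b^{0,i}).$$

The only further double poles come from the OPE of $I_1$ with $I_3$ and from the OPE of $I_1$ with $I_4$. These contribute double poles

$$D_{1,3} = K\,\mathrm{Tr}(\tilde T_a^-\tilde T_b^+ + \tilde T_a^+\tilde T_b^-)$$

and

$$D_{1,4} = \mathrm{Tr}(\tilde T_a V^\alpha)\,N^{-1}_{\alpha\beta}\,\mathrm{Tr}(\tilde T_b\Omega^\beta) + \mathrm{Tr}(\tilde T_b V^\alpha)\,N^{-1}_{\alpha\beta}\,\mathrm{Tr}(\tilde T_a\Omega^\beta).$$

According to (3.1), the sum of all the double poles should be equal to $K\,\mathrm{Tr}(T_a T_b)$. Subtracting this from the sum of the double poles yields the equation

$$D \equiv D_{1,1} + D_{1,4} + \sum_i (K_0^i - K)\,\mathrm{Tr}(\tilde T_a^{0,i}\tilde T_b^{0,i}) = 0.$$

More explicitly, $D$ is equal to

$$\begin{aligned}
D = {} & -\mathrm{Tr}([V_\rho, \tilde T_a]V^\alpha)\,\mathrm{Tr}([V_\alpha, \tilde T_b]V^\rho) \\
& -\mathrm{Tr}([V_\rho, \tilde T_a]V^\alpha)\,N^{\gamma\rho} N^{-1}_{\alpha\beta}\,\partial^\beta N^{-1}_{\delta\gamma}\,\mathrm{Tr}(\tilde T_b V^\delta) \\
& -\mathrm{Tr}(\tilde T_a V^\alpha)\,\partial^\gamma N^{-1}_{\alpha\beta}\,N^{\beta\eta} N^{-1}_{\delta\gamma}\,\mathrm{Tr}([V_\eta, \tilde T_b]V^\delta) \\
& -\mathrm{Tr}(\tilde T_a V^\alpha)\,\partial^\gamma N^{-1}_{\alpha\beta}\,\partial^\beta N^{-1}_{\delta\gamma}\,\mathrm{Tr}(\tilde T_b V^\delta) \\
& +\mathrm{Tr}(\tilde T_a V^\alpha)\,N^{-1}_{\alpha\beta}\,\mathrm{Tr}(\tilde T_b\Omega^\beta) \\
& +\mathrm{Tr}(\tilde T_a\Omega^\beta)\,\mathrm{Tr}(\tilde T_b V^\alpha)\,N^{-1}_{\alpha\beta} \\
& +\sum_i (K_0^i - K)\,\mathrm{Tr}(\tilde T_a^{0,i}\tilde T_b^{0,i}).
\end{aligned} \tag{3.11}$$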
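The first four lines of (3.11) come from $D_{1,1}$ by a single differentiation rule. The sketch below assumes that (2.8) reads $\partial^\gamma g_-\, g_-^{-1} = N^{\gamma\rho}V_\rho$ (the relation used in Sect. 5.2) and uses $\tilde T_a = g_- T_a g_-^{-1}$ from (3.10):

```latex
\begin{align*}
\partial^\gamma \tilde T_a
  &= [\partial^\gamma g_-\, g_-^{-1}, \tilde T_a] = N^{\gamma\rho}\,[V_\rho, \tilde T_a], \\
\partial^\gamma\, \mathrm{Tr}(T_a N^{-1}_{\alpha\beta}\, g_-^{-1} V^\alpha g_-)
  &= \partial^\gamma\bigl(N^{-1}_{\alpha\beta}\,\mathrm{Tr}(\tilde T_a V^\alpha)\bigr)
   = \partial^\gamma N^{-1}_{\alpha\beta}\,\mathrm{Tr}(\tilde T_a V^\alpha)
   + N^{-1}_{\alpha\beta} N^{\gamma\rho}\,\mathrm{Tr}([V_\rho, \tilde T_a] V^\alpha).
\end{align*}
```

Multiplying this by the analogous expression for $\partial^\beta\,\mathrm{Tr}(T_b N^{-1}_{\delta\gamma}\, g_-^{-1} V^\delta g_-)$ gives four products. In the product of the two commutator terms the factors of $N$ and $N^{-1}$ cancel pairwise, which yields the first line of (3.11); the remaining three products are the second, third and fourth lines.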
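The fourth term in $D$ vanishes identically in upper triangular coordinates. The first term can be further written as

$$\begin{aligned}
-\mathrm{Tr}([V_\rho, \tilde T_a]V^\alpha)\,\mathrm{Tr}([V_\alpha, \tilde T_b]V^\rho)
= {} & -\mathrm{Tr}([V_\rho, \tilde T_a^0]V^\alpha)\,\mathrm{Tr}([V_\alpha, \tilde T_b^0]V^\rho) \\
& -\mathrm{Tr}([V_\rho, \tilde T_a^-]V^\alpha)\,\mathrm{Tr}([V_\alpha, \tilde T_b^+]V^\rho) \\
& -\mathrm{Tr}([V_\rho, \tilde T_a^+]V^\alpha)\,\mathrm{Tr}([V_\alpha, \tilde T_b^-]V^\rho) \\
= {} & -\mathrm{Tr}([V_\rho, \tilde T_a^0]V^\alpha)\,\mathrm{Tr}([V_\alpha, \tilde T_b^0]V^\rho) \\
& -\mathrm{Tr}(\tilde T_a V^\delta)\,f_{\rho\delta}{}^{\alpha}\,\mathrm{Tr}([V_\alpha, \tilde T_b^+]V^\rho) \\
& -\mathrm{Tr}([V_\rho, \tilde T_a^+]V^\alpha)\,f_{\alpha\delta}{}^{\rho}\,\mathrm{Tr}(\tilde T_b^- V^\delta).
\end{aligned}$$

If we now use (3.9) to replace the structure constants in this last expression by expressions involving $N$, they conspire with the second and third line in (3.11) to yield the following expression for $D$:

$$\begin{aligned}
D = {} & -\mathrm{Tr}([V_\rho, \tilde T_a^0]V^\alpha)\,\mathrm{Tr}([V_\alpha, \tilde T_b^0]V^\rho) \\
& +\mathrm{Tr}(\tilde T_a V^\alpha)\,N^{-1}_{\alpha\beta}\,\mathrm{Tr}\bigl(\tilde T_b(\Omega^\beta - N^{\lambda\rho}\partial^\beta N^{-1}_{\delta\lambda}[V^\delta, V_\rho])\bigr) \\
& +\mathrm{Tr}\bigl(\tilde T_a(\Omega^\beta - N^{\lambda\rho}\partial^\beta N^{-1}_{\delta\lambda}[V^\delta, V_\rho])\bigr)\,\mathrm{Tr}(\tilde T_b V^\alpha)\,N^{-1}_{\alpha\beta} \\
& +\sum_i (K_0^i - K)\,\mathrm{Tr}(\tilde T_a^{0,i}\tilde T_b^{0,i}).
\end{aligned} \tag{3.12}$$

The first and fourth line contain only the projections of $\tilde T_a$ and $\tilde T_b$ on $G_0$, whereas the second and third line contain the projections of $\tilde T_a$ and $\tilde T_b$ on $G_-$, respectively. Therefore the sum of the first and fourth line must vanish independently of the sum of the second and third line. The sum of the first and fourth line can be written as

$$\sum_i (K_0^i - K)\,\mathrm{Tr}(\tilde T_a^{0,i}\tilde T_b^{0,i}) - \frac{1}{2}\mathrm{Tr}(\tilde T_a^0[V_\rho, [V^\rho, \tilde T_b^0]]) - \frac{1}{2}\mathrm{Tr}(\tilde T_a^0[V^\rho, [V_\rho, \tilde T_b^0]]). \tag{3.13}$$

Using the quadratic Casimirs $c_2(G)$, $c_2(G_0^i)$ that appear in (1.10) and $c_2(G_0^0) = 0$, we find from (3.13) that the following identity must hold:

$$\sum_{i\ge 0}\Bigl(K_0^i + \frac{1}{2}c_2(G_0^i) - K - \frac{1}{2}c_2(G)\Bigr)\mathrm{Tr}(\tilde T_a^{0,i}\tilde T_b^{0,i}) = 0.$$

This implies the relation

$$K_0^i + \frac{1}{2}c_2(G_0^i) = K + \frac{1}{2}c_2(G), \qquad \forall\, i \ge 0,$$

which is claimed in Theorem 3. The remaining two terms in (3.12) have to vanish separately, and this gives a linear equation for $\Omega^\beta$. The general solution to this linear equation is given by a particular solution of the inhomogeneous equation plus the general solution, say $\Delta\Omega^\beta$, to the homogeneous equation. From (3.12) we see immediately that a particular solution of the inhomogeneous equation for $\Omega^\beta$ is precisely the one given in Theorem 3. It remains to analyze the homogeneous equation

$$\mathrm{Tr}(\tilde T_a V^\alpha)\,N^{-1}_{\alpha\beta}\,\mathrm{Tr}(\tilde T_b\,\Delta\Omega^\beta) + \mathrm{Tr}(\tilde T_a\,\Delta\Omega^\beta)\,N^{-1}_{\alpha\beta}\,\mathrm{Tr}(\tilde T_b V^\alpha) = 0. \tag{3.14}$$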
Decomposing $\tilde T_a$, $\tilde T_b$ and $\Delta\Omega^\beta$ according to $G = (G_- + G_0 + G_+)$, we first deduce that $\Delta\Omega^\beta$ must be $G_+$-valued. Parametrizing $\Delta\Omega^\beta = -N^{\beta\rho}B_{\rho\sigma}V^\sigma$, we then find that (3.14) is equivalent to $B$ being anti-symmetric. These solutions to the homogeneous equation for $\Delta\Omega^\beta$ correspond to redefinitions of the $p^\beta$ of the form $p^\beta \to p^\beta + N^{\beta\lambda}N^{\rho\mu}B_{\mu\lambda}\,\partial q_\rho$ in the sense that such a redefinition can be absorbed in the change $\Omega^\beta \to (\Omega^\beta + \Delta\Omega^\beta)$ in (3.4). This completes the analysis of the double pole. A further analysis of the single pole shows that $I$ indeed satisfies (3.1), if $N^{\beta\lambda}N^{\rho\mu}B_{\mu\lambda}$ is restricted to be of the form $\partial^\beta A^\rho - \partial^\rho A^\beta$. Finally, using the identities in Appendix A, one verifies by an explicit calculation that $\mathrm{Tr}(I^2)$ is given by the expression in (3.8).

Remark 3. The level shifts given in the theorem have already been found in the second article in [FF1] (see also the appendix in [FFR]). The fact that $\Omega^\beta$ in (3.6) is $G_-$-valued follows from the upper triangularity of the coordinates. We shall see in Sect. 5 that our formula for $\Omega^\beta$ reproduces all the earlier explicit results for the Wakimoto realization.

Remark 4. Let $\rho_G$ and $\rho_{G_0^i}$ denote the Weyl vectors of the Lie algebras $G$ and $G_0^i$ for $i > 0$, respectively, and set $\rho_{G_0} := \sum_{i>0}\rho_{G_0^i}$. Identifying them as elements of the Cartan subalgebra, the Weyl vectors by definition satisfy $\mathrm{Tr}(\rho_G H_{\alpha_l}) = 1$ for any simple root $\alpha_l$ of $G$, and respectively $\mathrm{Tr}(\rho_{G_0^i} H_{\alpha_l}) = 1$ for any such simple root for which the Chevalley generator $H_{\alpha_l}$ lies in $G_0^i$. Since $\rho_G$ is half the sum of the positive roots of $G$, the equality $\frac{1}{2}[V^\alpha, V_\alpha] = (\rho_G - \rho_{G_0})$ follows by evaluating the sum on the left hand side using bases of $G_\pm$ that consist of root vectors. The defining properties of the Weyl vectors imply that $(\rho_G - \rho_{G_0})$ belongs to the abelian factor $G_0^0 \subset G_0$ in (1.9), and thus $\rho_G = (\rho_G - \rho_{G_0}) + \sum_{i>0}\rho_{G_0^i}$ contains pairwise orthogonal terms. Taking this into account and applying the strange formula $24|\rho_G|^2 = |\psi|^2 h^* \dim(G)$, as well as its analogue for $G_0^i$, one readily checks that the central charges of the constituent stress-energy tensors in (3.8) add up correctly to $k\dim(G)/(k + h^*)$.

Remark 5. The analysis of the uniqueness of $\Omega^\beta(q)$ contained in the proof above goes through in the same way with respect to arbitrary polynomial coordinates on $G_-$. We find that if a particular $\Omega^\beta$ exists so that $I$ given by (3.4) satisfies the current algebra, then the most general $\Omega^\beta$ having this property arises from the replacement

$$\Omega^\beta \to (\Omega^\beta + \Delta\Omega^\beta) \qquad\text{where}\qquad \Delta\Omega^\beta = -N^{-1}_{\gamma\alpha}(\partial^\alpha A^\beta - \partial^\beta A^\alpha)V^\gamma$$

with an arbitrary polynomial $A^\alpha(q)$. This corresponds to a redefinition of the momenta in (3.4) similar to (3.7). These redefinitions are in fact canonical transformations since they preserve the basic OPEs. Since such canonical transformations already exist at the classical level, we see that there is no genuine ambiguity in the quantum correction $\Omega^\beta$.
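The central charge bookkeeping mentioned in Remark 4 can be spot-checked numerically. The sketch below is restricted to assumptions that are not part of the text: $G = sl(n)$ with $|\psi|^2 = 2$ (so $K = k$ and $h^* = n$), a block-diagonal $G_0$ specified by block sizes summing to $n$, central charge 2 for each free $(p, q)$ pair, and central charge $\dim - 12|\rho_G - \rho_{G_0}|^2/y$ for the improved abelian stress-energy tensor. The levels of the simple factors are taken from Theorem 3 (i); the function names are hypothetical.

```python
from fractions import Fraction


def wakimoto_central_charge(n, blocks, k):
    """Sum of the constituent central charges in (3.8) for sl(n).

    blocks: sizes of the diagonal blocks defining G_0 (must sum to n).
    """
    assert sum(blocks) == n
    k = Fraction(k)
    y = k + n  # y = |psi|^2 (k + h*) / 2 with |psi|^2 = 2 and h* = n

    # One free (p, q) pair per basis element of G_-, each contributing c = 2.
    pairs = (n * n - sum(b * b for b in blocks)) // 2
    c_free = 2 * pairs

    # Simple factors sl(b) at level k0 = k + n - b, following Theorem 3 (i):
    # Sugawara central charge k0 * dim / (k0 + b) = (k + n - b)(b^2 - 1) / y.
    c_simple = sum((k + n - b) * (b * b - 1) / y for b in blocks if b > 1)

    # Abelian factor of dimension len(blocks) - 1 with background charge
    # rho_G - rho_G0; |rho|^2 for sl(m) is m (m^2 - 1) / 12 in this normalization.
    rho_sq = Fraction(n * (n * n - 1) - sum(b * (b * b - 1) for b in blocks), 12)
    c_abelian = (len(blocks) - 1) - 12 * rho_sq / y

    return c_free + c_simple + c_abelian


def sugawara_central_charge(n, k):
    """Expected total k dim(G) / (k + h*) for sl(n)."""
    return Fraction(k) * (n * n - 1) / (Fraction(k) + n)
```

For instance, `wakimoto_central_charge(3, [1, 2], 2)` combines two free pairs, an $sl(2)$ current at level 3 and one improved boson, and agrees with `sugawara_central_charge(3, 2)`.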
4. Quantizations and Polymorphisms

In Sect. 3 we quantized the classical Wakimoto current (2.11) in special polynomial coordinates on $G_-$, namely in upper triangular ones. It is natural to ask whether the quantization can be performed in other coordinates as well, and whether the quantizations arising in different coordinates are essentially the same. In this section we shall give a partial answer to these questions. We call two systems of global, holomorphic coordinates on $G_-$ polynomially equivalent if the change of coordinates is given by a polymorphism. We shall demonstrate that polymorphisms induce equivalent quantizations of the classical Wakimoto realization.

A map $f : \mathbb{C}^n \to \mathbb{C}^n$ is a polynomial map if its components are given by polynomials. A polynomial map is a polymorphism if it is one-to-one, onto and the inverse is also a polynomial map. It is known that every injective polynomial map on $\mathbb{C}^n$ is a polymorphism [BR, Ru]. Let us also mention the very non-trivial (footnote 7)

Jacobian Conjecture (Ke, BCW). A polynomial map $f : \mathbb{C}^n \to \mathbb{C}^n$ is a polymorphism if and only if the Jacobian determinant $\det(Df)$ is a non-zero constant.

Suppose that $q_\alpha(Q)$ describes the change of coordinates between two global, holomorphic systems of coordinates $\{q_\alpha\}$ and $\{Q_\alpha\}$ on $G_-$. At the classical level, one has the canonical transformation that maps $(Q, P)$ to $(q, p)$ according to

$$q_\alpha = q_\alpha(Q), \qquad p^\alpha = p^\alpha(Q, P) = \frac{\partial Q_\mu}{\partial q_\alpha}P^\mu, \tag{4.1}$$

which follows geometrically if one thinks of $p^\alpha$ as $-\frac{\partial}{\partial q_\alpha}$. It is clear that the classical Wakimoto current in (2.11) is a coordinate independent object, that is, $I(Q, P, j_0) = I(q(Q), p(Q, P), j_0)$. Suppose now that the classical Wakimoto realization has been quantized in the $q_\alpha$-system in the sense that a polynomial $\Omega^\beta(q)$ was found for which $I(q, p, j_0)$ in (3.4) satisfies (3.1). Below we explain how to construct a quantum field theoretic (vertex algebraic) analogue of the classical canonical transformation (4.1) in the case for which the change of coordinates is given by polynomials. After finding this “quantum polymorphism”, we will simply define the quantized Wakimoto current in the $Q_\alpha$-system by

$$I(Q, P, j_0) \equiv I(q(Q), p(Q, P), j_0). \tag{4.2}$$

Then $I(Q, P, j_0)$ will have the form (3.4) for some $\Omega^\beta(Q)$ induced by the construction, and hence this quantization is essentially unique due to the remark at the end of Sect. 3.

We search for the quantum mechanical analogue of (4.1) in the form

$$q_\alpha = q_\alpha(Q), \qquad p^\alpha = p^\alpha(Q, P) = \bigl(P^\mu\,\Gamma(Q)^\alpha_\mu\bigr) + \Theta(Q)^{\alpha,\mu}\,\partial Q_\mu, \tag{4.3}$$

where classically $q_\alpha(Q)$ is a polymorphism. This formula takes into account the ambiguity in normal ordering (4.1) and it guarantees the invariance of the form of $I$ in (3.4). The standard OPEs in the $q_\alpha$-system (3.2) must hold as a consequence of those in the $Q_\alpha$-system:

$$P^\alpha(z)\,Q_\beta(w) = \frac{-\delta^\alpha_\beta}{z-w}.$$

7 It is obvious that $\det(Df)$ is a non-zero constant for any polymorphism $f$. The converse is open at present.
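Next we show that this condition determines the functions $\Gamma(Q)^\alpha_\mu$ and $\Theta(Q)^{\alpha,\mu}$. We first require that the OPE of $p^\alpha$ with $q_\beta$ has the correct form. This determines $\Gamma$ as

$$\Gamma^\alpha_\mu = \frac{\partial Q_\mu}{\partial q_\alpha}.$$

We then require the double pole of the OPE of $p^\alpha$ with $p^\beta$ to vanish, which admits the particular solution $\Theta^{\alpha,\mu} = \Theta_*^{\alpha,\mu}$ with

$$\Theta_*^{\alpha,\mu} = \frac{1}{2}\,\frac{\partial Q_\rho}{\partial q_\alpha}\,\frac{\partial^2 q_\delta}{\partial Q_\lambda\,\partial Q_\rho}\,\frac{\partial^2 Q_\lambda}{\partial q_\delta\,\partial q_\theta}\,\frac{\partial q_\theta}{\partial Q_\mu}.$$

The general solution of the same requirement is found to be

$$\Theta^{\alpha,\mu} = \Theta_*^{\alpha,\mu} + \frac{\partial Q_\eta}{\partial q_\alpha}B^{\eta,\mu},$$

where $B = B^{\alpha,\beta}(Q)\,dQ_\alpha\,dQ_\beta$ is an arbitrary 2-form on $G_-$, $B^{\alpha,\beta} = -B^{\beta,\alpha}$. The question is whether we can choose this 2-form in such a way that the simple pole in the OPE of $p^\alpha$ with $p^\beta$ vanishes. Introducing the matrix $C^\alpha$ by

$$(C^\alpha)_\rho{}^\eta \equiv \frac{\partial Q_\lambda}{\partial q_\rho}\,\frac{\partial}{\partial q_\alpha}\,\frac{\partial q_\eta}{\partial Q_\lambda}$$

and using the preceding formulae, we obtain

$$p^\alpha(z)\,p^\beta(w) = \frac{S^{\alpha,\beta}}{z-w},$$

where

$$S^{\alpha,\beta} = \partial q_\gamma\Bigl(\frac{1}{2}\mathrm{Tr}([C^\alpha, C^\beta]C^\gamma) - (dB)^{\alpha,\beta,\gamma}\Bigr)$$

with

$$(dB)^{\alpha,\beta,\gamma} = \frac{\partial Q_a}{\partial q_\alpha}\,\frac{\partial Q_b}{\partial q_\beta}\,\frac{\partial Q_c}{\partial q_\gamma}\Bigl(\frac{\partial B^{a,b}}{\partial Q_c} + \frac{\partial B^{b,c}}{\partial Q_a} + \frac{\partial B^{c,a}}{\partial Q_b}\Bigr).$$

The notation is justified by the fact that $dB = (dB)^{\alpha,\beta,\gamma}\,dq_\alpha\,dq_\beta\,dq_\gamma$ for the 2-form $B$ on $G_-$. Since the singular part of the OPE of $p^\alpha$ with $p^\beta$ must vanish, we conclude from the above that there exists a quantum canonical transformation from the $Q_\alpha$-system of coordinates to the $q_\alpha$-system if and only if $E \equiv \mathrm{Tr}([C^\alpha, C^\beta]C^\gamma)\,dq_\alpha\,dq_\beta\,dq_\gamma$ is an exact 3-form on $G_-$. In view of the fact that $G_-$ has trivial topology, this is equivalent to $E$ being closed. Now $E$ can be rewritten as $E = \mathrm{Tr}([\Gamma d\Gamma^{-1}, \Gamma d\Gamma^{-1}]\,\Gamma d\Gamma^{-1})$, a form which reminds one of the topological term in the WZNW action, and when written in this form one easily verifies that $dE = 0$. Thus there always exists a $B$ such that $dB = \frac{1}{2}E$. It is also clear that the components of $B$ can be chosen to be polynomials, although it may not be possible to write them down explicitly.

Choosing a solution for $B$, the resulting canonical transformation is unique up to a 1-form $A = A^\alpha(Q)\,dQ_\alpha$, which may be used to modify the 2-form $B$ appearing in the general solution for $\Theta^{\alpha,\mu}$ according to $B \to B + dA$. The meaning of this last ambiguity in $B$ is that it is possible to redefine the momenta while keeping the coordinates fixed. Such a canonical transformation is given by $q_\alpha = Q_\alpha$, $p^\alpha = P^\alpha + (dA)^{\alpha,\beta}\,\partial Q_\beta$, where $(dA)^{\alpha,\beta} = \frac{\partial}{\partial Q_\alpha}A^\beta - \frac{\partial}{\partial Q_\beta}A^\alpha$. These transformations, which already made their appearance in Theorem 3 and in Remark 5, are similarity transformations generated by the operator

$$U = \exp\Bigl(\oint\frac{dz}{2\pi i}\,A^\alpha(Q(z))\,\partial Q_\alpha(z)\Bigr).$$

Having established the existence of the required polynomials $\Gamma(Q)^\alpha_\mu$ and $\Theta(Q)^{\alpha,\mu}$, we can implement the polynomial change of coordinates by the canonical transformation (4.3). It is natural to enquire how the free field stress-energy tensor, $-p^\alpha\partial q_\alpha$, behaves with respect to this transformation. A straightforward OPE calculation reveals the transformation rule

$$-p^\alpha\partial q_\alpha = -P^\alpha\partial Q_\alpha + \frac{1}{2}\partial^2\log\det\Gamma. \tag{4.4}$$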
The extra term vanishes since $\det\Gamma$ is the Jacobian determinant of a polymorphism in our case. Incidentally, if the Jacobian conjecture is true, then we could conclude from the same calculation that the form invariance of $p^\alpha\partial q_\alpha$ under a polynomial map $f$ requires $f$ to be a polymorphism. It seems interesting that the free field form of the stress-energy tensor can be related to the polynomial equivalence of the underlying systems of coordinates in this way. The main result of this section is now summarized as follows:

Theorem 4. Let $q_\alpha$ and $Q_\alpha$ be polynomially equivalent systems of coordinates on $G_-$. Suppose that the quantization of the classical Wakimoto realization can be performed in the $q_\alpha$-system in such a way that $I(q, p, j_0)$ has the form (3.4) with a polynomial $\Omega^\beta(q)$, and the affine-Sugawara stress-energy tensor has the free field form given in Theorem 3. Then, using the quantum coordinate transformation (4.3), $I(Q, P, j_0)$ in (4.2) defines a quantization of the classical Wakimoto realization in the $Q_\alpha$-system that has the same properties.

We wish to conclude with some remarks and conjectures related to the above result. Let us call a system of polynomial coordinates $q_\alpha$ on $G_-$ admissible if there exists a polynomial $\Omega^\beta(q)$ for which $I(q, p, j_0)$ in (3.4) satisfies the current algebra. Combining Theorem 4 and Remark 5, we see that the quantizations of the Wakimoto realization associated with two different systems of admissible coordinates that are polynomially equivalent can always be converted into each other by a canonical transformation (4.3). Hence these quantizations are essentially the same. Theorem 3 tells us that all upper triangular systems of coordinates, and therefore also all those that are polynomially equivalent to upper triangular ones, are admissible. It would be interesting to know whether all polynomial coordinates on $G_-$ are polynomially equivalent to upper triangular ones or not. We have the

Conjecture 1. Any two upper triangular systems of coordinates are polynomially equivalent.

Remark 6. Let $\{q_\alpha\}$ and $\{Q_\alpha\}$ be two polynomially equivalent systems of coordinates on $G_-$ that are each upper triangular with respect to the same underlying gradation of the Lie algebra. Let us further assume that $q_\alpha(Q)$ and $Q_\alpha(q)$ are homogeneous polynomials of degree $d_\alpha$ in their respective arguments, which themselves carry the degrees specified in the notion of upper triangularity. (We can verify this property in our examples, but have no general proof of it.) Under this homogeneity property the object $\Theta_*$ vanishes and $C^\alpha$ is an upper triangular matrix, which implies that the 3-form $E$ vanishes too. Therefore the quantum polymorphism (4.3) simplifies to

$$p^\alpha = \frac{\partial Q_\mu}{\partial q_\alpha}\Bigl(P^\mu + (\partial_Q^\mu A^\nu - \partial_Q^\nu A^\mu)\,\partial Q_\nu\Bigr), \qquad \partial_Q^\alpha = \frac{\partial}{\partial Q_\alpha}.$$

In the present case this formula is free of normal ordering ambiguities.

Remark 7. At the end of Sect. 2, we explained the “composition property” of the classical Wakimoto realization. Clearly, there exists a quantum mechanical version of this property. To formulate a precise assertion, let us continue with the notations in (2.14) and parametrize $n_-$, $g_-$ and $g^c_-$ by upper triangular coordinates $\{q^0_i\}$, $\{q_\alpha\}$ and $\{Q_a\}$ on $G_{0,-}$, $G_-$ and on $G^c_-$, respectively. We can then quantize the classical Wakimoto realization associated with the parabolic subalgebra $P^c$ in two ways: either directly in the $\{Q_a\}$ system or by composing the quantized Wakimoto realizations associated with $P_0$ and $P$ that are obtained in the $\{q^0_i\}$ and $\{q_\alpha\}$ systems of coordinates. As a consequence of the above, the results of these two procedures are related by a quantum canonical transformation whenever the $\{Q_a\}$ system of coordinates on $G^c_- = G_{0,-} \ltimes G_-$ is polynomially equivalent to the system given by the union $\{q^0_i\} \cup \{q_\alpha\}$. This is the case in all examples we know.

In conclusion, up to a few open problems like the polynomial equivalence of all upper triangular coordinates, we can argue from the results of this section that there is a unique quantization of the classical Wakimoto realization up to canonical transformations. We could not completely prove the uniqueness of the quantization because in Theorem 3 we required only a rather abstract condition on the coordinates, whose content is not easy to analyze completely.
If we restrict the allowed coordinates to the polynomial equivalence class of the graded exponential coordinates, of the form $g_- = \exp(\sum_\alpha q_\alpha E_{-\alpha})$, which are the ones used in practice, then our results already imply the expected uniqueness of the quantized Wakimoto realization.
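As a concrete illustration of the notion of polymorphism used in this section, the following sketch takes the hypothetical map $f(x, y) = (x, y + x^2)$ on $\mathbb{C}^2$, which is not taken from the text. It has the polynomial inverse $(x, y) \mapsto (x, y - x^2)$ and Jacobian determinant 1, so for such a change of coordinates the term $\frac{1}{2}\partial^2\log\det\Gamma$ in (4.4) drops out. The check is done on rational sample points only.

```python
from fractions import Fraction


def f(point):
    """Polynomial map (x, y) -> (x, y + x**2)."""
    x, y = point
    return (x, y + x * x)


def f_inverse(point):
    """Polynomial inverse (x, y) -> (x, y - x**2)."""
    x, y = point
    return (x, y - x * x)


def jacobian_det(point):
    """det(Df) computed from the analytic Jacobian [[1, 0], [2x, 1]]."""
    x, _ = point
    return 1 * 1 - 0 * (2 * x)


# Rational sample points; exact arithmetic avoids rounding issues.
samples = [(Fraction(a, 3), Fraction(b, 7))
           for a in range(-3, 4) for b in range(-3, 4)]

round_trips = all(f_inverse(f(p)) == p and f(f_inverse(p)) == p for p in samples)
constant_det = all(jacobian_det(p) == 1 for p in samples)
```

Maps of this triangular type are the simplest polymorphisms; compositions of them with invertible linear maps are again polymorphisms.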
5. Examples of the Wakimoto Realization

We here show that the formulas of the Wakimoto current found in the literature previously follow as special cases of Theorem 3. If needed in applications, one can easily work out other examples along similar lines.

5.1. Recovering the explicit formulas of Feigin and Frenkel. In the first article of ref. [FF1], Feigin and Frenkel found the explicit form of the Chevalley generators $e_i(z)$, $h_i(z)$, $f_i(z)$ for the principal Wakimoto realization of the $sl(n)$ current algebra. See also the second paper in [BMP]. Next we reproduce the Feigin-Frenkel formula.

In accordance with [FF1, BMP], we parametrize the group element $g_- \in G_-$ by its matrix entries in the defining representation of $G = SL(n)$,

$$g_- = 1_n + \sum_{n\ge a>b\ge 1} q_{ab}\,e_{ab},$$

where $e_{ab}$ is the usual elementary matrix. Our task is to determine the projections of the current $I$ in (3.4) on the subspaces of $sl(n)$ with principal grades $0, \pm 1$. Let us decompose $I$ as

$$I = I_{\rm geom} + Y \qquad\text{with}\qquad I_{\rm geom} = -p^\beta N^{-1}_{\alpha\beta}\,g_-^{-1}V^\alpha g_-,$$

where the index $\alpha$ is now a pair $\alpha = ab$ with $n \ge a > b \ge 1$, $V_{ab} = e_{ab}$, $V^{ab} = e_{ba}$, and

$$Y = g_-^{-1}j_0\,g_- + K g_-^{-1}\partial g_- + g_-^{-1}\Omega^\beta g_-\,\partial q_\beta.$$

Clearly, we have $Y_1 = 0$, $Y_0 = j_0$. Specifying the formula (3.6) of $\Omega^\beta$ for the case at hand, the grade $-1$ part of $Y$ turns out to be

$$Y_{-1} = \sum_{i=1}^{n-1} e_{i+1\,i}\Bigl((k + n - i - 1)\,\partial q_{i+1\,i} - q_{i+1\,i}\,\mathrm{Tr}(H_i j_0)\Bigr), \qquad H_i \equiv e_{ii} - e_{i+1\,i+1}.$$
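The shift $(n - i - 1)$ in the coefficient of $\partial q_{i+1\,i}$ can be reproduced by evaluating (3.6) directly in the matrix-entry coordinates. The sketch below does this with exact rational arithmetic at a sample point. It assumes that (2.8) reads $\partial^\alpha g_-\, g_-^{-1} = N^{\alpha\beta}V_\beta$, so that $V_\alpha g_- = N^{-1}_{\alpha\beta}\,\partial^\beta g_-$, and it uses $V_{ab} = e_{ab}$, $V^{ab} = e_{ba}$ as stated above. All function names are hypothetical.

```python
from fractions import Fraction


def zeros(n):
    return [[Fraction(0)] * n for _ in range(n)]


def unit(n, a, b):
    """Elementary matrix e_ab (0-based indices)."""
    m = zeros(n)
    m[a][b] = Fraction(1)
    return m


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def lower_positions(n):
    return [(a, b) for a in range(n) for b in range(a)]


def g_minus(n, q):
    g = zeros(n)
    for i in range(n):
        g[i][i] = Fraction(1)
    for (a, b), value in q.items():
        g[a][b] = Fraction(value)
    return g


def inverse_unipotent(g):
    """(1 + x)^{-1} = sum_m (-x)^m for strictly lower triangular x."""
    n = len(g)
    minus_x = [[(1 if i == j else 0) - g[i][j] for j in range(n)] for i in range(n)]
    result, power = g_minus(n, {}), g_minus(n, {})
    for _ in range(n - 1):
        power = matmul(power, minus_x)
        result = [[result[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return result


def coefficient_table(n, matrix_of):
    """table[a, b] = coefficient of e_b in the strictly lower matrix matrix_of(a)."""
    pos = lower_positions(n)
    table = {}
    for a in pos:
        m = matrix_of(a)
        for b in pos:
            table[a, b] = m[b[0]][b[1]]
    return table


def n_up(n, q):
    """N^{alpha beta}: (dg/dq_alpha) g^{-1} = N^{alpha beta} V_beta."""
    ginv = inverse_unipotent(g_minus(n, q))
    return coefficient_table(n, lambda a: matmul(unit(n, *a), ginv))


def n_down(n, q):
    """N^{-1}_{alpha beta}: V_alpha g = N^{-1}_{alpha beta} (dg/dq_beta)."""
    g = g_minus(n, q)
    return coefficient_table(n, lambda a: matmul(unit(n, *a), g))


def omega(n, q, beta):
    """Omega^beta = N^{lam rho} (d/dq_beta N^{-1}_{gam lam}) [V^gam, V_rho], cf. (3.6)."""
    pos = lower_positions(n)
    up = n_up(n, q)
    shifted = dict(q)
    shifted[beta] = q[beta] + 1
    down0, down1 = n_down(n, q), n_down(n, shifted)
    total = zeros(n)
    for gam in pos:
        v_up = unit(n, gam[1], gam[0])  # V^gamma is the transpose of V_gamma
        for lam in pos:
            # N^{-1} is linear in q in these coordinates, so the unit-step
            # difference equals the partial derivative exactly.
            d = down1[gam, lam] - down0[gam, lam]
            if d == 0:
                continue
            for rho in pos:
                c = up[lam, rho] * d
                if c == 0:
                    continue
                v_rho = unit(n, *rho)
                ab, ba = matmul(v_up, v_rho), matmul(v_rho, v_up)
                for i in range(n):
                    for j in range(n):
                        total[i][j] += c * (ab[i][j] - ba[i][j])
    return total


def correction_shift(n, q, i):
    """Coefficient of e_{i+1,i} (1-based i) in Omega^beta for beta = (i+1, i)."""
    return omega(n, q, (i, i - 1))[i][i - 1]


q3 = {(1, 0): 2, (2, 1): 3, (2, 0): 5}
q4 = {pos: idx + 2 for idx, pos in enumerate(lower_positions(4))}
shifts3 = [correction_shift(3, q3, i) for i in (1, 2)]
shifts4 = [correction_shift(4, q4, i) for i in (1, 2, 3)]
```

Adding $K$ from the term $K g_-^{-1}\partial g_-$ to these shifts gives the coefficients $(k + n - i - 1)$ of $Y_{-1}$, under the normalization $K = k$ that holds for $|\psi|^2 = 2$.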
In order to determine $I_{\rm geom}$ we apply an indirect method based on the finite group version of the Hamiltonian reduction. The method utilizes the fact that $I_{\rm geom}$ is the momentum map for the infinitesimal action of $G$ on $T^*G_-$, which is induced from $G_-$ being the big cell in the coset space $P\backslash G$, where $P = G_+G_0$ is the upper triangular subgroup of $G = SL(n)$. We wish to determine the components of this momentum map,

$$F^X(q, p) \equiv \mathrm{Tr}(X I_{\rm geom}) = F^X_\alpha(q)\,p^\alpha, \qquad \forall\, X \in sl(n).$$

We see from the geometric picture of the Hamiltonian reduction that the Hamiltonian vector field on $T^*G_-$ associated with $F^X$ is the lift of a corresponding vector field, $\mathcal{V}(X)$, on $G_-$, which is given by

$$\mathcal{V}(X) = -F^X_\alpha(q)\,\frac{\partial}{\partial q_\alpha}.$$

$\mathcal{V}(X)$ is obtained by projecting the infinitesimal generator of the one parameter group action $(t, g) \mapsto g e^{tX}$ on $G$ onto the coset space $P\backslash G$, and restricting the result to $G_- \subset P\backslash G$. In other words, $\mathcal{V}(X)$ is the infinitesimal generator of the local one parameter group action $(t, g_-) \mapsto g_-^X(t)$ on $G_-$ defined by the Gauss decomposition

$$g_-\,e^{tX} = g_+^X(t)\,g_0^X(t)\,g_-^X(t), \qquad g_0^X(t) \in G_0, \quad g_\pm^X(t) \in G_\pm.$$

A straightforward computation (like the proof of Theorem 2.4 in the second paper in [BMP]) yields

$$\mathcal{V}(e_{l+1\,l}) = \frac{\partial}{\partial q_{l+1\,l}} + \sum_{i\ge l+2} q_{i\,l+1}\,\frac{\partial}{\partial q_{il}}, \qquad \mathcal{V}(H) = \sum_{i>l}(\Delta_l - \Delta_i)\,q_{il}\,\frac{\partial}{\partial q_{il}}$$

for $H \equiv \mathrm{diag}(\Delta_1, \ldots, \Delta_n)$, and

$$\mathcal{V}(e_{l\,l+1}) = \sum_{i\ge l+2} q_{il}\,\frac{\partial}{\partial q_{i\,l+1}} - \sum_{i\le l-1} q_{l+1\,i}\,\frac{\partial}{\partial q_{li}} + q_{l+1\,l}\Bigl(\sum_{i\le l-1} q_{li}\,\frac{\partial}{\partial q_{li}} - \sum_{i\le l} q_{l+1\,i}\,\frac{\partial}{\partial q_{l+1\,i}}\Bigr).$$
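For $n = 2$ the vector fields above reduce to $\mathcal{V}(e_{21}) = \partial_q$, $\mathcal{V}(H) = 2q\,\partial_q$ for $H = \mathrm{diag}(1, -1)$, and $\mathcal{V}(e_{12}) = -q^2\partial_q$, with $q = q_{21}$. Since they come from a right action, one expects $[\mathcal{V}(X), \mathcal{V}(Y)] = \mathcal{V}([X, Y])$. The sketch below checks this for the three $sl(2)$ brackets; it represents a vector field $a(q)\,\partial_q$ by the coefficient list of the polynomial $a$, and the helper names are hypothetical.

```python
def trim(p):
    """Drop trailing zero coefficients, keeping at least one entry."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p


def multiply(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)


def derivative(a):
    return trim([i * c for i, c in enumerate(a)][1:] or [0])


def subtract(a, b):
    size = max(len(a), len(b))
    a = a + [0] * (size - len(a))
    b = b + [0] * (size - len(b))
    return trim([x - y for x, y in zip(a, b)])


def scale(c, a):
    return trim([c * x for x in a])


def bracket(a, b):
    """[a d/dq, b d/dq] = (a b' - b a') d/dq, as a coefficient list."""
    return subtract(multiply(a, derivative(b)), multiply(b, derivative(a)))


V_F = [1]          # V(e_21) = d/dq
V_H = [0, 2]       # V(H)    = 2 q d/dq
V_E = [0, 0, -1]   # V(e_12) = -q^2 d/dq

# sl(2) relations: [H, e_21] = -2 e_21, [e_12, e_21] = H, [H, e_12] = 2 e_12.
relations_hold = (
    bracket(V_H, V_F) == scale(-2, V_F)
    and bracket(V_E, V_F) == V_H
    and bracket(V_H, V_E) == scale(2, V_E)
)
```

With $\mathcal{V}(X) = -F^X_\alpha\,\partial/\partial q_\alpha$, these fields give $F = -1$, $-2q$ and $q^2$, so the geometric parts of the $n = 2$ currents are $-p$, $-2pq$ and $pqq$.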
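Collecting the above formulas, we arrive at the following expressions for the quantum fields generating the $sl(n)$ current algebra:

$$\begin{aligned}
\mathrm{Tr}(e_{l+1\,l}\,I) = {} & -p_{l+1\,l} - \sum_{i\ge l+2} p_{il}\,q_{i\,l+1}, \\
\mathrm{Tr}(H_l\,I) = {} & \mathrm{Tr}(H_l j_0) - 2p_{l+1\,l}\,q_{l+1\,l} + \sum_{i\ge l+2}(p_{i\,l+1}\,q_{i\,l+1} - p_{il}\,q_{il}) \\
& + \sum_{i\le l-1}(p_{li}\,q_{li} - p_{l+1\,i}\,q_{l+1\,i}), \\
\mathrm{Tr}(e_{l\,l+1}\,I) = {} & -\mathrm{Tr}(H_l j_0)\,q_{l+1\,l} + (k + n - l - 1)\,\partial q_{l+1\,l} + \sum_{i\le l-1} p_{li}\,q_{l+1\,i} - \sum_{i\ge l+2} p_{i\,l+1}\,q_{il} \\
& + \sum_{i\le l} p_{l+1\,i}\,q_{l+1\,i}\,q_{l+1\,l} - \sum_{i\le l-1} p_{li}\,q_{li}\,q_{l+1\,l}.
\end{aligned} \tag{5.1}$$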
Here $p_{ab}$ denotes $p^\alpha$ for $\alpha = ab$ and normal ordering is understood according to our conventions fixed previously. If, using an automorphism of $sl(n)$, we define

$$h_i \equiv -\mathrm{Tr}(H_{n-i}\,I), \qquad f_i \equiv \mathrm{Tr}(e_{n-i,\,n-i+1}\,I), \qquad e_i \equiv \mathrm{Tr}(e_{n-i+1,\,n-i}\,I),$$

and in addition set

$$\beta^{ab} \equiv -p_{n+1-a,\,n+1-b}, \qquad \gamma^{ab} \equiv -q_{n+1-a,\,n+1-b},$$

and bosonize the current $j_0$ of the Cartan subalgebra of $sl(n)$ in the usual way, then formula (5.1) becomes identical to the Feigin-Frenkel formula as described e.g. in Theorem 3.2 of the second article in [BMP]. For later reference, let us note here the formula

$$\mathrm{Tr}(j\,e_{l+1\,l}) = p_{l+1\,l} + \sum_{1\le m<l} p_{l+1\,m}\,q_{lm}, \tag{5.2}$$

which is obtained by using that $\mathrm{Tr}(jV_\alpha)$ corresponds to the differential operator $-N^{-1}_{\alpha\beta}\frac{\partial}{\partial q_\beta}$ whose action on $g_-$ generates the one parameter group $e^{-tV_\alpha}g_-$ for any $V_\alpha \in G_-$.

5.2. Recovering the formulas of Ito and Komata. We now consider the principal case for an arbitrary Lie algebra $G$. We here regard the vector fields on $G_- \subset P\backslash G$ as input data, and only search for the correction to these data for the currents associated with the Chevalley generators of $G$. The derivative term $K g_-^{-1}\partial g_-$ of $I$ in (3.4) yields an obvious correction. The only non-trivial quantum correction that will contribute is at grade $-1$, namely $(\Omega^\beta(q))_{-1}\,\partial q_\beta$. As a preparation to determining this term, let us choose root vectors $E_{\pm\alpha}$ for every positive root $\alpha \in \Phi^+$ in such a way that

$$[E_\alpha, E_{-\alpha}] = H_\alpha, \qquad [H_\alpha, E_{\pm\alpha}] = \pm 2E_{\pm\alpha}.$$

This implies (e.g. [H]) that

$$\mathrm{Tr}(E_\alpha E_{-\beta}) = \frac{2}{|\alpha|^2}\,\delta_{\alpha\beta}, \tag{5.3}$$

and thus the dual bases of $G_-$ and $G_+$ can be taken to be

$$V_\alpha = E_{-\alpha} \qquad\text{and}\qquad V^\alpha = \frac{|\alpha|^2}{2}E_\alpha.$$

For arbitrary roots $\gamma$, $\lambda$ for which $(\gamma + \lambda)$ is a root, we have $[E_\gamma, E_\lambda] = C_{\gamma,\lambda}E_{\gamma+\lambda}$ with some non-zero $C_{\gamma,\lambda}$. We now parametrize $G_-$ by the exponential coordinates

$$g_- = \exp\Bigl(\sum_{\alpha\in\Phi^+} q_\alpha E_{-\alpha}\Bigr).$$
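In order to compute the grade $-1$ part of $\Omega^\beta$ in (3.6), remember that the matrices $N$ and $N^{-1}$ are block upper triangular, and their block-diagonal part is the identity. Thus the only contribution to $(\Omega^\beta)_{-1}$ will come from the block-diagonal part of $N$ multiplied by $\partial^\beta$ of the blocks of $N^{-1}$ just above the diagonal. Before taking the derivative, these blocks of $N^{-1}$ are just the negative of the blocks of $N$ in the same position. If $A(x)$ is a matrix function, then
\[ (\partial_x e^{A}) e^{-A} = \partial_x A + \frac12 [A, \partial_x A] + \text{higher commutators}, \]
and we see from this that
\[ \partial^\alpha g_-\, g_-^{-1} = N^{\alpha\lambda} E_{-\lambda} = E_{-\alpha} + \frac12 \sum_{\gamma\in\Phi^+} q_\gamma [E_{-\gamma}, E_{-\alpha}] + \text{higher commutators}. \]
Hence, for the blocks above the diagonal, we get
\[ N^{\alpha\lambda}\big|_{d_\lambda = d_\alpha+1} = \frac12\, q_{\lambda-\alpha}\, C_{\alpha-\lambda,-\alpha}. \]
Therefore,
\[ \partial^\beta N^{-1}_{\alpha\lambda}\big|_{d_\lambda = d_\alpha+1} = -\frac12\, \delta_{\alpha+\beta,\lambda}\, C_{-\beta,-\alpha}. \]
In particular, this is always zero if $\beta$ is not a simple root. Now,
\[ (\Omega^\beta)_{-1} = \partial^\beta N^{-1}_{\alpha\lambda}\big|_{d_\lambda = d_\alpha+1}\; N^{\lambda\rho}\big|_{d_\lambda = d_\rho}\; [V^\alpha, V_\rho] \]
implies that $(\Omega^\beta)_{-1} = 0$ unless $\beta$ is a simple root, and if $\beta$ is a simple root then
\[ (\Omega^\beta)_{-1} = \sum_{\alpha,\alpha+\beta\in\Phi^+} \partial^\beta N^{-1}_{\alpha,\alpha+\beta}\, N^{\alpha+\beta,\alpha+\beta}\, [V^\alpha, V_{\alpha+\beta}]. \]
Substituting the above formulas, we get
\[ (\Omega^\beta)_{-1} = -\frac14\, E_{-\beta} \sum_{\alpha,\alpha+\beta\in\Phi^+} |\alpha|^2\, C_{-\beta,-\alpha}\, C_{\alpha,-\beta-\alpha}. \]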
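The expansion of $(\partial_x e^A)e^{-A}$ used in this computation can be checked exactly in a small nilpotent example. The following is only a sketch under stated assumptions: the matrices B and C, the value of x and the helper names are illustrative choices, not taken from the paper. For strictly lower triangular 3×3 matrices every product of three factors vanishes, so the series stops after the first commutator.

```python
from fractions import Fraction

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def lin(*terms):
    """Linear combination of 3x3 matrices given as (coefficient, matrix) pairs."""
    return [[sum(c * M[i][j] for c, M in terms) for j in range(3)] for i in range(3)]

I3 = [[int(i == j) for j in range(3)] for i in range(3)]

def exp_nilpotent(A):
    """exp(A) for a strictly lower triangular 3x3 matrix, for which A^3 = 0."""
    return lin((1, I3), (1, A), (Fraction(1, 2), matmul(A, A)))

# A(x) = x*B + C with strictly lower triangular B and C (hypothetical values).
B = [[0, 0, 0], [2, 0, 0], [5, -3, 0]]
C = [[0, 0, 0], [1, 0, 0], [4, 7, 0]]
x = Fraction(3, 2)
A = lin((x, B), (1, C))

# d/dx exp(A) = B + (B A + A B)/2, exact because exp(A) is quadratic in A.
d_exp = lin((1, B), (Fraction(1, 2), matmul(B, A)), (Fraction(1, 2), matmul(A, B)))
lhs = matmul(d_exp, exp_nilpotent(lin((-1, A))))

# dA/dx + (1/2)[A, dA/dx]; all higher commutators vanish in this example.
rhs = lin((1, B), (Fraction(1, 2), matmul(A, B)), (Fraction(-1, 2), matmul(B, A)))
```

Here `lhs` and `rhs` agree entry by entry, and the commutator term is non-trivial because B and C do not commute.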
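To summarize, for any simple root $\beta$ we found the quantum correction
\[ \mathrm{Tr}\,(E_\beta\, \Omega^\gamma\, \partial q_\gamma) = A_\beta\, \partial q_\beta \]
with
\[ A_\beta = -\frac12 \sum_{\alpha,\alpha+\beta\in\Phi^+} \frac{|\alpha|^2}{|\beta|^2}\, C_{-\beta,-\alpha}\, C_{\alpha,-\beta-\alpha}. \tag{5.4} \]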
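The passage from the grade $-1$ part of the correction to (5.4) is not spelled out. A short sketch of that step follows, assuming only the normalization $\mathrm{Tr}\,(E_\beta E_{-\beta}) = 2/|\beta|^2$ implied by (5.3) and writing $\Omega^\gamma$ for the quantum correction term of (3.4):

```latex
\begin{aligned}
\mathrm{Tr}\,(E_\beta\,\Omega^\gamma)\,\partial q_\gamma
  &= \mathrm{Tr}\,\bigl(E_\beta\,(\Omega^\beta)_{-1}\bigr)\,\partial q_\beta
     && \text{(only the $E_{-\beta}$ component pairs with $E_\beta$)} \\
  &= -\frac14\,\mathrm{Tr}\,(E_\beta E_{-\beta})
     \sum_{\alpha,\alpha+\beta\in\Phi^+} |\alpha|^2\,
     C_{-\beta,-\alpha}\,C_{\alpha,-\beta-\alpha}\;\partial q_\beta \\
  &= -\frac12 \sum_{\alpha,\alpha+\beta\in\Phi^+} \frac{|\alpha|^2}{|\beta|^2}\,
     C_{-\beta,-\alpha}\,C_{\alpha,-\beta-\alpha}\;\partial q_\beta .
\end{aligned}
```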
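Using that $|\alpha|^2 C_{\alpha,-\beta-\alpha} = |\beta|^2 C_{-\beta-\alpha,\beta}$, which follows from (5.3), we can rewrite $A_\beta$ as
\[ A_\beta = \frac12 \sum_{\alpha,\alpha+\beta\in\Phi^+} C_{-\alpha,-\beta}\, C_{-\beta-\alpha,\beta}. \tag{5.5} \]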
In fact, this is precisely the quantum correction part of $\mathrm{Tr}\,(E_\beta I)$ found by Ito and Komata [IKo] (see also [Ku]). When making the comparison with Eqs. (24), (26) in [IKo], one should notice that the term in (26) there contains the contribution of $K g_-^{-1}\partial g_-$ too. Note also that the normal ordering used in [IKo] is different from our convention, since there $p$ is placed to the right and the normal ordering is implemented in a nested manner, like for example in $(q(qp))$. Actually, the only term in the Chevalley generators where the normal ordering could be ambiguous is of this kind, and for this term $(q(qp)) = (p(qq))$. We conclude that the formulas of [IKo] are consistent with our result. Finally, let us record the compact formula
\[ A_\beta = \frac{c_2(G)}{2|\beta|^2} - 1. \tag{5.6} \]
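As an illustration, the sum (5.5) can be evaluated directly for sl(n). The sketch below rests on assumptions that are not part of the text: root vectors are taken to be matrix units in the defining representation, with Tr the matrix trace (so that $|\beta|^2 = 2$ and (5.3) holds), and the function names are hypothetical. With these choices (5.5) gives $A_\beta = (n-2)/2$, which agrees with (5.6) if $c_2(sl(n)) = 2n$ in this normalization.

```python
from fractions import Fraction

def comm(x, y):
    """Commutator [e_ab, e_cd] of matrix units, returned as {(row, col): coefficient}."""
    (a, b), (c, d) = x, y
    out = {}
    if b == c:
        out[(a, d)] = out.get((a, d), 0) + 1
    if d == a:
        out[(c, b)] = out.get((c, b), 0) - 1
    return {k: v for k, v in out.items() if v != 0}

def coeff(x, y, target):
    """Structure constant C in [x, y] = C * target, for matrix units x, y, target."""
    return comm(x, y).get(target, 0)

def A_beta(n, l):
    """Evaluate (5.5) for sl(n) and the simple root with E_beta = e_{l,l+1} (0-based l)."""
    beta, minus_beta = (l, l + 1), (l + 1, l)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):          # positive root alpha with E_alpha = e_{ij}
            alpha, minus_alpha = (i, j), (j, i)
            c = comm(alpha, beta)
            if not c:
                continue                   # alpha + beta is not a root
            (a, b), = c.keys()             # E_{alpha+beta} = e_{ab}
            minus_sum = (b, a)             # E_{-(alpha+beta)} = e_{ba}
            total += (coeff(minus_alpha, minus_beta, minus_sum)
                      * coeff(minus_sum, beta, minus_alpha))
    return Fraction(total, 2)
```

For example, `A_beta(2, 0)` vanishes because the sum in (5.5) is empty for sl(2), while every simple root of sl(4) gives the same value.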
This formula was found by Ito who verified it by a case by case inspection of the Lie algebras using a Chevalley basis, that is a Cartan-Weyl basis satisfying [H, GOV]
\[ C_{\alpha,\beta} = C_{-\beta,-\alpha} \]
in addition to (5.3), see the second paper in [IKo]. In Appendix B we present a simple derivation, which in particular shows that (5.6) is valid in any Cartan-Weyl basis subject to the normalization condition (5.3).

Remark 8. In the very recent paper [Ra], explicit formulas are given for arbitrary components of the Wakimoto current in the principal case, using graded exponential coordinates. Since the above quantum correction for the Chevalley generators is recovered in [Ra], too, the formulas of [Ra] must be consistent with our result for $\Omega^\beta$ in (3.6), which is valid with respect to an arbitrary parabolic subgroup $P \subset G$ and arbitrary upper triangular coordinates on $G_-$.

5.3. A simple example: n = 1 + (n − 1). Next we consider a partial free field realization in which $\widehat{sl}(n)$ is realized in terms of $\widehat{sl}(n-1)$, $\widehat{gl}(1)$ and $(n-1)$ pairs of symplectic bosons. This in principle allows for getting the complete free field (principal Wakimoto) realization of $\widehat{sl}(n)$ by iterating the construction (see also [B]). We choose the matrix $H \in sl(n)$ that determines the grading and the parabolic subalgebra to be
\[ H \equiv \frac{1}{n}\,\mathrm{diag}\bigl((n-1), -1_{n-1}\bigr). \]
Correspondingly, $G_-$ is spanned by the grade $-1$ elements $e_{\alpha+1,1}$, $\alpha = 1, \ldots, (n-1)$, and is abelian. Parametrizing $g_- \in G_-$ as
\[ g_- = 1_n + \sum_{\alpha=1}^{n-1} q_\alpha e_{1+\alpha,1} \equiv 1_n + \begin{bmatrix} 0 & 0 \\ q & 0 \end{bmatrix}, \]
we find that $N^{\alpha\beta} = N^{-1}_{\alpha\beta} = \delta_{\alpha\beta}$. This implies that the quantum correction $\Omega^\beta$ vanishes in this case, and that
\[ j(q,p) = \sum_\alpha p^\alpha V^\alpha = \sum_\alpha p^\alpha e_{1,1+\alpha} \equiv \begin{bmatrix} 0 & p \\ 0 & 0 \end{bmatrix}. \]
The above notations mean that we think of $q$ and $p$ as an $(n-1)$-component column vector and row vector, respectively, where the $n \times n$ matrices are written in an obvious block notation associated with the grading. Using this notation the current $j_0 \in G_0$ can be written in matrix form as
\[ j_0 = \xi\, nH + \begin{bmatrix} 0 & 0 \\ 0 & \eta \end{bmatrix}, \]
where $\eta$ belongs to $sl(n-1)$ and $\xi$ is a $gl(1)$ current. We then parametrize the classical current $I \in sl(n)$ as
\[ I = g_-^{-1}(-j + j_0)\, g_- + K g_-^{-1}\partial g_- \equiv \begin{bmatrix} a & b \\ c & d \end{bmatrix}. \tag{5.7} \]
Straightforward matrix multiplication gives
\[ a = (n-1)\xi - pq, \qquad b = -p, \qquad c = qpq + \eta q - nq\xi + K\partial q, \qquad d = \eta - \xi 1_{n-1} + qp, \tag{5.8} \]
where $q$ and $p$ are again understood as column and row vectors. For example, we have $(qpq)_\alpha = q_\alpha p^\beta q_\beta$. Quantization is performed by normal ordering, pulling $p$ to the left. Notice that if $n = 2$ then $\eta = 0$ and (5.8) looks formally identical to the formula of the original Wakimoto realization described at the beginning of the introduction (the current $j_0$ in the introduction may be identified with $2\xi$ in terms of the present notation). Applying the construction to $sl(n-1)$, and so on, one can iteratively obtain the complete free field realization of $I \in sl(n)$. This appears to be an effective way to derive completely explicit formulas for the free field realization. The resulting final formula will be equivalent to that obtained "in one step" as was noted before. A curious interpretation of this result is that Wakimoto's original formula contains all the information about the complete free field realization of $\widehat{sl}(n)$, in the sense that one needs to use only that formula in every step of the iterative procedure.

5.4. The general two-blocks case: n = r + s. Straightforwardly generalizing the preceding example, we define
\[ H = H_{r,s} \equiv \frac{1}{n}\,\mathrm{diag}\,(s 1_r, -r 1_s). \]
Then $g_-$, $j$ can be parametrized similarly as before, and $\Omega^\beta$ again vanishes since $G_-$ is abelian. Now $q$ and $p$ are $s \times r$ and $r \times s$ matrices containing the conjugate pairs in transposed positions, respectively. We parametrize $j_0$ as
\[ j_0 = \xi\, nH_{r,s} + \begin{bmatrix} \eta_r & 0 \\ 0 & \eta_s \end{bmatrix} \]
with $\eta_r \in sl(r)$ and $\eta_s \in sl(s)$. The formula of $I$ in (5.7) becomes
\[ a = \eta_r + s\xi 1_r - pq, \qquad b = -p, \qquad c = qpq + \eta_s q - q\eta_r - nq\xi + K\partial q, \qquad d = \eta_s - r\xi 1_s + qp. \tag{5.9} \]
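The block formulas (5.9), and (5.8) as the special case r = 1, can be checked at the classical level by exact matrix arithmetic. The following is a sketch under stated assumptions: the fields are replaced by commuting numbers at a single point, `dq` is an independent stand-in for the derivative of q, and the sizes, seed and helper names are illustrative.

```python
from fractions import Fraction
import random

random.seed(0)

def zeros(a, b):
    return [[0] * b for _ in range(a)]

def eye(k):
    return [[int(i == j) for j in range(k)] for i in range(k)]

def rand(a, b):
    return [[random.randint(-5, 5) for _ in range(b)] for _ in range(a)]

def rand_traceless(k):
    m = rand(k, k)
    m[k - 1][k - 1] = -sum(m[i][i] for i in range(k - 1))
    return m

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def lin(*terms):
    """Linear combination of equally shaped matrices given as (coefficient, matrix) pairs."""
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(cols)] for i in range(rows)]

def block(A, B, C, D):
    """Assemble the block matrix [[A, B], [C, D]]."""
    return [ra + rb for ra, rb in zip(A, B)] + [rc + rd for rc, rd in zip(C, D)]

r, s = 2, 3
n = r + s
K, xi = Fraction(7, 10), Fraction(4, 3)
eta_r, eta_s = rand_traceless(r), rand_traceless(s)
q, dq = rand(s, r), rand(s, r)
p = rand(r, s)

g = block(eye(r), zeros(r, s), q, eye(s))
g_inv = block(eye(r), zeros(r, s), lin((-1, q)), eye(s))
dg = block(zeros(r, r), zeros(r, s), dq, zeros(s, s))
j = block(zeros(r, r), p, zeros(s, r), zeros(s, s))
j0 = block(lin((s * xi, eye(r)), (1, eta_r)), zeros(r, s),
           zeros(s, r), lin((-r * xi, eye(s)), (1, eta_s)))

# I = g^{-1} (-j + j0) g + K g^{-1} dg, as in (5.7)
I = lin((1, mul(mul(g_inv, lin((-1, j), (1, j0))), g)), (K, mul(g_inv, dg)))

# Blocks predicted by (5.9)
a = lin((1, eta_r), (s * xi, eye(r)), (-1, mul(p, q)))
b = lin((-1, p))
c = lin((1, mul(mul(q, p), q)), (1, mul(eta_s, q)), (-1, mul(q, eta_r)),
        (-n * xi, q), (K, dq))
d = lin((1, eta_s), (-r * xi, eye(s)), (1, mul(q, p)))
matches = I == block(a, b, c, d)
```

The check is purely classical: it says nothing about normal ordering or about the level shifts of Theorem 3, which only enter after quantization.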
Quantization is achieved by normal ordering and choosing the levels of ηr , ηs and ξ according to Theorem 3. This example belongs to the series of cases for which P \G is a hermitian symmetric space, which has already been considered in [B]. 6. Screening Charges In this section, we describe the screening charges relevant for the Wakimoto realization (3.4). The form of the screening charges has been known in the case where G0 is abelian [BMP, FF1, Fr1, Ku]. We are not aware of previous explicit results in the case where G0 is nonabelian. Screening charges are given by contour integrals of screening currents of conformal weight one. The screening currents, in turn, are constructed out of polynomials in pα , qα , and out of certain chiral primary fields for the affine Lie algebra generated by j0 . The crucial property of screening charges is that they commute with the Wakimoto currents given in (3.4); more precisely, for generic K, the centralizer of the screening charges in the chiral algebra generated by pα , qα and j0 should be generated precisely by the Wakimoto currents. Screening charges play an important role in the construction of irreducible representations and in the computation of correlation functions [FF1, BF, BMP, ATY, Ku]. We will be able to define screening charges that commute with the Wakimoto currents for general G and G0 . However, we can prove [Fr2] that the centralizer of these screening charges is generated by the Wakimoto currents only in the case where G0 is abelian. For nonabelian G0 , we conjecture that a similar statement holds. In the case of nonabelian G0 , a further problem is that no explicit expression is known for the chiral primary fields of the current algebra generated by j0 , which are contained in the screening operators. It could be useful to give a free field realization of these chiral vertex operators using a further Wakimoto realization for G0 with respect to the Cartan subalgebra of G0 . 
For abelian $G_0$, one can bosonize $G_0$ in terms of free scalars and the chiral vertex operators are given by exponents of these free scalars. In the literature (see [Ra] and references therein), one sometimes encounters formal screening operators containing negative powers of fields, especially in the analysis of admissible representations. Their precise definition involves further field redefinitions and bosonizations, and we will not consider them here. We next describe the classical version of the screening charges, looking only at the property that they should commute with the Wakimoto currents. The required chiral primary fields can classically be obtained by taking a path-ordered exponent of the $G_0$-valued current. We then consider the quantum case, and discuss the centralizer of the screening charges. Finally, we briefly explain how all this is connected to free field realizations for W-algebras.

6.1. Screening charges: the classical case. To define the classical screening charges, we need to introduce a non-periodic $G_0$-valued field $h_0(\sigma)$ which is related to $j_0$ by
\[ j_0 = K h_0^{-1} h_0'. \tag{6.1} \]
It follows that $h_0$ satisfies $h_0(2\pi) = h_0(0)\cdot m$, where $m$ is the monodromy of $j_0$ on the circle,
\[ m = P \exp\Bigl( \frac{1}{K} \int_0^{2\pi} j_0(\sigma)\, d\sigma \Bigr). \]
We require $h_0$ to be a primary field with respect to $j_0$,
\[ \{\mathrm{Tr}\,(Y_i j_0)(\sigma), h_0(\bar\sigma)\} = h_0(\bar\sigma)\, Y_i\, \delta(\sigma - \bar\sigma), \qquad Y_i \in G_0, \tag{6.2} \]
and to have vanishing PBs with $q_\alpha$, $p^\beta$. The affine current algebra PBs (2.6) of $j_0$ follow from (6.1) and (6.2). We can therefore consistently replace the variables $(j_0, q_\alpha, p^\beta)$ with the new variables $(h_0, q_\alpha, p^\beta)$. Since the solutions of the differential equation (6.1) for $h_0$ are parametrized by the arbitrary initial value $h_0(0)$, the mapping $h_0 \mapsto j_0$ is many-to-one. The PBs of $h_0$ with itself have a quadratic structure given in terms of a classical r-matrix, as described e.g. in [FL] and references therein. We will not need them explicitly. We now show that the screening currents are some of the components of $h_0 j h_0^{-1}$. For any $\xi \in G_-$, define the current $S_\xi(\sigma)$ and the associated charge $S_\xi$ by
\[ S_\xi(\sigma) \equiv \mathrm{Tr}\,(\xi h_0 j h_0^{-1})(\sigma), \qquad S_\xi \equiv \int_0^{2\pi} d\sigma\, S_\xi(\sigma). \tag{6.3} \]
Theorem 5. The charge $S_\xi$ in (6.3) commutes with the classical Wakimoto current $I$ in (2.11) if and only if $\xi \in G_- \cap [G_+, G_+]^\perp$.

Proof. Let us compute the PB
\[ \{\mathrm{Tr}\,(T_a I)(\sigma), \mathrm{Tr}\,(\xi h_0 j h_0^{-1})(\bar\sigma)\}. \tag{6.4} \]
The only unknown ingredient in this computation is the PB of $I$ with $h_0$, which from (6.2) is found to be
\[ \{\mathrm{Tr}\,(T_a I)(\sigma), h_0(\bar\sigma)\} = \bigl(h_0\, \pi_0(g_- T_a g_-^{-1})\bigr)(\sigma)\, \delta(\sigma - \bar\sigma). \]
In addition, when working out (6.4), one encounters $h_0'$, which has to be replaced by $K^{-1} h_0 j_0$, using (6.1). Putting everything together one obtains
\[ \{\mathrm{Tr}\,(T_a I)(\sigma), \mathrm{Tr}\,(\xi h_0 j h_0^{-1})(\bar\sigma)\} = \mathrm{Tr}\,\bigl((h_0^{-1}\xi h_0)[j, \pi_+(g_- T_a g_-^{-1})]\bigr)(\sigma)\, \delta(\sigma - \bar\sigma) + K\, \mathrm{Tr}\,\bigl((h_0^{-1}\xi h_0)(g_- T_a g_-^{-1})\bigr)(\sigma)\, \delta'(\sigma - \bar\sigma). \tag{6.5} \]
The first term on the right hand side vanishes identically if and only if $\xi \in [G_+, G_+]^\perp$, and since $\xi$ was an element of $G_-$ to start with, this completes the proof.

Clearly, the current $S_\xi(\sigma)$ in (6.3) is a primary field with conformal weight one with respect to the stress-energy tensor (2.13). For $\xi \in G_- \cap [G_+, G_+]^\perp$, we call $S_\xi(\sigma)$ and $S_\xi$ the screening current and the screening charge associated with $\xi$. It is important to notice that not all the screening currents are independent in general, since (6.3) leads to the relation
\[ S_\xi(r_0 h_0, j) = S_{r_0^{-1}\xi r_0}(h_0, j) \qquad \forall\, r_0 \in G_0. \tag{6.6} \]
The mapping $r_0 : h_0 \mapsto r_0 h_0$ actually [FL] defines a Poisson-Lie action of the group $G_0$ on the space of fields $h_0$, and (6.6) tells us that this action transforms the screening currents $S_\xi$ into each other according to the natural action of $G_0$ on the space $G_- \cap [G_+, G_+]^\perp$. Hence, there exists one independent screening charge for each irreducible representation of $G_0$ in $G_- \cap [G_+, G_+]^\perp$. Using the gradation in (1.6), which satisfies $[H, E_{\alpha_l}] = n_l E_{\alpha_l}$ with $n_l \in \{0, 1\}$ for any simple root $\alpha_l$ of $G$, it is not difficult to show that
\[ G_- \cap [G_+, G_+]^\perp = G_{-1}. \]
The highest weight vectors of the irreducible representations of $G_0$ in $G_{-1}$ are those root vectors of simple roots that lie in $G_{-1}$, i.e., the $E_{-\alpha_l}$ for which $n_l = 1$. Thus a set of independent screening currents can be chosen to contain
\[ S_{(l)} \equiv S_\xi \qquad\text{for}\qquad \xi = E_{-\alpha_l} \in G_{-1}. \]
In the next section, we shall quantize the classical screening currents $S_{(l)}$. For this, it will be advantageous to rewrite them in the form
\[ S_{(l)} = \mathrm{Tr}\,(j M_{(l)}) = j_\alpha M^\alpha_{(l)} \tag{6.7} \]
with
\[ M_{(l)} \equiv h_0^{-1} E_{-\alpha_l} h_0. \tag{6.8} \]
Here $j_\alpha = \mathrm{Tr}\,(V_\alpha j)$, $M^\alpha_{(l)} = \mathrm{Tr}\,(V^\alpha M_{(l)})$ with respect to dual bases $V_\alpha$ of $G_{-1}$ and $V^\alpha$ of $G_1$. The field $M_{(l)}(\sigma)$ takes its values in the irreducible representation of $G_0$ built on the highest weight vector $E_{-\alpha_l}$, which we denote by $V_{(l)} \subset G_{-1}$, i.e., $M^\alpha_{(l)}$ vanishes if $V_\alpha \in V_{(l')}$ for $l' \neq l$. Incidentally, the $V_{(l)}$ are pairwise inequivalent representations. The crucial property of $M_{(l)}$ is that, as a consequence of (6.2), it is a primary field with respect to $j_0$,
\[ \{\mathrm{Tr}\,(Y_i j_0)(\sigma), M_{(l)}(\bar\sigma)\} = -[Y_i, M_{(l)}(\bar\sigma)]\, \delta(\sigma - \bar\sigma), \qquad Y_i \in G_0. \tag{6.9} \]
6.2. Screening charges: The quantum case. We now turn to the quantum screening charges. This is a more difficult case, because we cannot define an operator $h_0$ by means of (6.1). However, according to (6.7) we do not need $h_0$ itself to construct the screening currents $S_{(l)}$, only the chiral primary fields $M_{(l)}$. The general theory of vertex operators (see e.g. [G, TK]) associates with each irreducible representation of $G_0$ a unique vertex operator that creates the corresponding representation of the affine algebra (3.3) of $j_0$ from the vacuum, and this is precisely sufficient for our purposes. We denote the chiral vertex operator associated with the representation $V_{(l)} \subset G_{-1}$ by $M_{(l)}$, in components $M_{(l)}(z) = V_\alpha M^\alpha_{(l)}(z)$, to indicate that it is a quantization of the corresponding classical object. It satisfies the OPE that is the quantum version of the Poisson bracket (6.9),
\[ \mathrm{Tr}\,(Y_i j_0)(z)\, M_{(l)}(w) = \frac{-[Y_i, M_{(l)}(w)]}{z - w}, \qquad Y_i \in G_0. \tag{6.10} \]
In the case where $G_0$ is abelian, the components of $j_0$ and the $M^\alpha_{(l)}$ can be represented by derivatives and exponentials of free scalar fields, respectively. Using the chiral primary fields $M^\alpha_{(l)}$, we find that the screening currents in the quantum theory are given by the same equation as the classical ones, namely by Eq. (6.7) if we normal order as before by moving the momenta contained in $j_\alpha$ to the left. In the rest of this section, we use upper triangular coordinates on $G_-$ as in Sect. 3, and adopt the definition
\[ S_{(l)}(z) \equiv \bigl(p^\beta (N^{-1}_{\alpha\beta} M^\alpha_{(l)})\bigr)(z), \qquad S_{(l)} \equiv \oint \frac{dz}{2\pi i}\, S_{(l)}(z). \tag{6.11} \]
Then the following result holds.

Theorem 6. The OPEs between the screening current $S_{(l)}(z)$ in (6.11) and the Wakimoto current $I$ in (3.4) have the form
\[ S_{(l)}(z)\, \mathrm{Tr}\,(T_a I)(w) = \frac{-y\, \mathrm{Tr}\,(g_- T_a g_-^{-1} M_{(l)})(w)}{(z - w)^2}, \tag{6.12} \]
where $y = \frac12 |\psi|^2 (k + h^*)$ and $T_a$ is a basis of $G$. As a consequence, the screening charge $S_{(l)}$ commutes with the Wakimoto current. Furthermore, the screening current $S_{(l)}(z)$ is a primary field with conformal weight one with respect to the stress-energy tensor in (3.8).

Proof. The verification of the OPE in (6.12) is a long and tedious exercise, using the various properties of $N^{-1}_{\alpha\beta}$, and the identities given in Appendix A. The only novel ingredient is that in the same way as we encountered $h_0'$ in the proof of Theorem 5, where we could replace it by $K^{-1} h_0 j_0$, we now encounter the operator $\partial M^\alpha_{(l)}$. Chiral primary fields generating highest weight representations of affine algebras have a generic null vector, first found by Knizhnik and Zamolodchikov [KZ]. This null vector translates into an operator identity, which for $M^\alpha_{(l)}$ reads as
\[ y\, \partial M^\alpha_{(l)} = \bigl(\mathrm{Tr}\,([j_0, V^\alpha] V_\mu)\, M^\mu_{(l)}\bigr). \tag{6.13} \]
As for the conformal weight of $S_{(l)}(z)$, this is another standard OPE calculation. It turns out that the conformal weight of $(p^\beta N^{-1}_{\alpha\beta})$ is one and the conformal weight of $M^\alpha_{(l)}$ is zero, completing the proof of the theorem.

Observe that the OPE in (6.12) corresponds precisely to the PB in (6.5). It vanishes for $T_a \in G_{\le 0}$, and for $T_a \in G_1$ the conjugation by $g_-$ drops out. In the principal case, (6.12) reproduces the known OPEs between the screening currents and the components of $I$ associated with the Chevalley generators of $G$ [BMP, Ku]. Of course, in the principal case the formula of the screening currents in (6.11) itself reduces to the known result [BMP, FF1, Fr1, Ku]. As an illustration, let us present the detailed form of the $S_{(l)}$ for the example of the $sl(n)$ current algebra discussed in Section 5.1. In this case, the abelian current algebra generated by $j_0$ can be bosonized by $j_0 = -i\sqrt{y}\, \partial\phi$, where $\phi$ is a $G_0$-valued scalar field with the OPE
\[ \mathrm{Tr}\,(Y_l \phi)(z)\, \mathrm{Tr}\,(Y_m \phi)(w) = -\mathrm{Tr}\,(Y_l Y_m) \log(z - w), \qquad Y_l, Y_m \in G_0, \]
and $G_{-1}$ decomposes into the one-dimensional representations of $G_0$ spanned by the root vectors $E_{-\alpha_l} = e_{l+1,l}$ for $l = 1, \ldots, n-1$. The chiral vertex operators $M_{(l)}$ are now given by $M_{(l)} = e_{l+1,l} :\exp(-\frac{i}{\sqrt{y}} \alpha_l(\phi)):$. Combining this with (6.7) and (5.2), we find the screening currents
\[ S_{(l)} = \Bigl( p^{l+1,l} + \sum_{1\le m<l} p^{l+1,m} q_{l,m} \Bigr) :\exp\bigl(-\tfrac{i}{\sqrt{y}}\, \alpha_l(\phi)\bigr): \]
consistently with their formula in [BMP]. In the case where $G_0$ is abelian, one can define a so-called Wakimoto module $W_\lambda$ as being the Verma module generated by $j_0$, $p^\alpha$, $q_\alpha$ from a highest weight state whose $j_0$ eigenvalue is the weight $\lambda$ [Wa, FF1]. The Wakimoto modules fit into a complex whose $m$th term $C^m$ is $C^m = \oplus_{s\in W,\, l(s)=m} W_{s(\rho)-\rho}$, where $W$ is the Weyl group of $G$, $l(s)$ is the length of the Weyl group element $s \in W$, and $\rho$ is half the sum of the positive roots. The
differential of this complex is given by suitable multiple contour integrals of products of the screening currents. In particular, the differential d0 mapping C 0 to C 1 is given by the direct sum of the screening charges themselves. In [Fr2] it is shown that the zeroth cohomology of this complex is for generic K equal to the vacuum module of GˆK , and all other cohomologies vanish. This means that the kernel of d0 is the vacuum module of GˆK , or in other words, the centralizer of the screening charges acting on the algebra generated by j0 , pα , qα is precisely generated by the Wakimoto current I. Unfortunately, we do not know of a similar finite resolution for nonabelian G0 , but we conjecture that such a resolution should also exist, where the first differential is again given by the direct sum of the screening charges. Such a resolution would immediately imply that for generic K the Wakimoto current generates the centralizer of the screening charges S(l) in the nonabelian case as well, but we have no rigorous proof of this statement at present. 6.3. Applications to W-algebras. With the explicit results for the Wakimoto realization and its screening charges, it is interesting to see what we can learn from this for W-algebras, since it has for example been observed that application of a Hamiltonian reduction to the Wakimoto realization leads to free field realizations and resolutions for representations of W-algebras [BO, FF2, Fr1, Fr2, Fi]. Alternatively, free field realizations for W-algebras can be obtained by expressing the generators of the W-algebra in terms of those of an affine Lie algebra through the Miura map, and by subsequently using a Wakimoto realization for this affine Lie algebra [dBT]. Quite remarkably, the level shifts found in the Miura map in [dBT] match precisely those appearing in Theorem 3. This is no coincidence, and we will now briefly explain the relation. 
Consider those W-algebras that can be obtained by Hamiltonian reduction of an affine Lie algebra based on an $sl_2$ embedding, with a set of first class constraints that constrain the components of the current that lie in $G_+$ and generate the gauge group $\widetilde{G}_-$ at the classical level. The corresponding quantum W-algebra is by definition the BRST cohomology of
\[ Q = \oint \frac{dz}{2\pi i} \Bigl[ (I_\alpha - \chi(I_\alpha))\, c^\alpha - \frac12 f_{\alpha\beta}{}^{\gamma}\, (b_\gamma (c^\alpha c^\beta)) \Bigr] \tag{6.14} \]
acting on the chiral algebra generated by the current $I$ of $\widehat{G}_K$ together with the ghosts and antighosts $c^\alpha$ and $b_\beta$. Here $I_\alpha = \chi(I_\alpha)$ are the first class constraints, $I_\alpha = \mathrm{Tr}\,(V_\alpha I)$ for a basis $V_\alpha$ of $G_-$ using the decomposition $G = G_- + G_0 + G_+$ associated with the $sl_2$ embedding. For notation and the computation of this cohomology, see [dBT]. Let us now repeat the calculation of the cohomology, but first insert the Wakimoto form (3.4) for $I$ in (6.14). In principle we could choose the underlying parabolic subalgebra arbitrarily, but the form of the BRST operator suggests that we should identify the subalgebra $G_+ \subset G$ that appears in the first class constraints in (6.14) with the $G_+$ that is used in the Wakimoto realization. This will indeed lead to a major simplification, and we restrict our attention to this case. Thus we are going to compute the BRST cohomology of $Q$ in (6.14) acting on the algebra generated by $j_0$, $p^\alpha$, $q_\alpha$, $b_\alpha$, $c^\alpha$. Clearly, since $j_0$ does not appear at all in $Q$, it will survive in the cohomology. In addition, we claim that the cohomology on the other fields is trivial, so that the full cohomology is in fact generated by $j_0$. To prove this, we note that the algebra generated by $p^\alpha$, $q_\alpha$, $b_\alpha$, $c^\alpha$ is the same as the algebra generated by $\hat p_\alpha$, $q_\alpha$, $\hat c_\alpha$, $b_\alpha$, where $\hat p_\alpha \equiv Q_0(b_\alpha)$, with $Q_0$ equal to $Q$ without the $\chi(I_\alpha)$ term, and
\[ \hat c_\alpha = c^\gamma\, \mathrm{Tr}\,(V_\gamma N^{-1}_{\beta\alpha}\, g_-^{-1} V^\beta g_-). \]
That the transformation between these two sets of variables is one-to-one and invertible follows from the fact that the coordinates $q_\alpha$ are upper triangular. In terms of the new variables, the BRST transformations are particularly simple,
\[ 0 \to b_\alpha \to \hat p_\alpha - \chi(I_\alpha) \to 0, \qquad 0 \to q_\alpha \to \hat c_\alpha \to 0, \]
from which one immediately derives (“quartet confinement”) that the BRST cohomology is trivial [dBT]. Clearly, the above BRST cohomology has not yet anything to do with the W-algebra, since it is generated by j0 . The W-algebra is defined to be the BRST cohomology of Q acting on a chiral algebra generated by the current algebra GbK , and we have extended this current algebra to the algebra of j0 , qα , pα , explaining the apparent discrepancy. Suppose now that GbK is recovered as the centralizer of the screening charges S(l) in (6.11) acting on the algebra of j0 , qα , pα . Then we still have to take this into account to obtain the W-algebra. The screening charges commute with the Wakimoto current and therefore should give a well-defined action on the BRST cohomology. If we choose as representatives for the cohomology anything generated by j0 , then the action of the screening charges in (6.11) will not preserve this set of representatives. In order to obtain such screening charges that act inside this set of representatives, we have to add BRSTexact pieces to the screening charges in (6.11). Let us therefore take a closer look at the BRST cohomology. If we define rα = pˆα − χ(Iα ), then the action of Q can be summarized as 0 → bα → rα → 0 and 0 → qα → cˆα → 0. We used this to argue that the cohomology on the space generated by b, r, q, cˆ was trivial. In other words, every Q-closed differential polynomial of b, r, q, cˆ is in Q-cohomology equal to a constant. We now associate a weight to b, r, q, cˆ, namely we give b and r weight one, q and cˆ weight zero and ∂ weight one. This then associates a weight to any differential polynomial expression in b, r, q, cˆ, but in the remainder we only look at polynomials whose weight is at most one. Since the BRST operator preserves the weight and the constants have weight zero, a differential polynomial of weight one that is Q-closed is equal to zero in Q-cohomology. 
Any polynomial of weight zero is an ordinary polynomial in the commuting variables $q$ and $\hat c$, and restricted to these $Q$ can be identified with the usual exterior derivative on a $\dim(G_-)$-dimensional plane. In that case the Poincaré lemma tells us that every closed form is exact except for the constants. Altogether this shows that any polynomial $P(b, r, q, \hat c)$ of weight at most one is in $Q$-cohomology equal to $P(0, 0, 0, 0)$. Returning to the screening charges $S_{(l)}$ in (6.11), it is easy to see that they can be written as
\[ S_{(l)} = \oint \frac{dz}{2\pi i}\, \bigl(p^\beta (N^{-1}_{\alpha\beta} M^\alpha_{(l)})\bigr) = \oint \frac{dz}{2\pi i}\, \bigl((\chi(I_\alpha) + f_\alpha(b, r, q, \hat c))\, M^\alpha_{(l)}\bigr), \]
where $f_\alpha$ is some polynomial of weight one. Hence we can apply the result derived in the previous paragraph and deduce that $S_{(l)}$ is in $Q$-cohomology the same as
\[ S^W_{(l)} \equiv \oint \frac{dz}{2\pi i}\, \bigl(\chi(I_\alpha)\, M^\alpha_{(l)}\bigr), \]
and these operators do act in the space generated by $j_0$. However, the above result does not yet imply that for generic $K$ the centralizer of the screening charges $S^W_{(l)}$ acting on the algebra generated by $j_0$ is indeed the W-algebra.
To prove such a statement, we would need a resolution of the vacuum module of $\widehat{G}_K$ as mentioned in the preceding subsection. With such a resolution, the W-algebra would by definition be $H^*_Q(H^*_d(C^* \otimes A_{\rm ghosts}))$, with $d$ the differential of the resolution and $A_{\rm ghosts}$ the algebra generated by $b$ and $c$. This is the second term in a spectral sequence calculation of $H^*_{d+Q}$ which degenerates at the second term. We can then compute $H^*_{d+Q}$ using the opposite spectral sequence with second term $H^*_d(H^*_Q)$. This spectral sequence must also degenerate at the second term, and from this point of view the cohomology is equal to the centralizer of the $S^W_{(l)}$ acting on the algebra generated by $j_0$. For abelian $G_0$ the resolution exists [Fr2] and this argument proves that the W-algebra is the centralizer of the $S^W_{(l)}$ acting on the algebra generated by $j_0$, and everything can easily be represented in terms of free fields. For nonabelian $G_0$ a similar statement follows once we assume the existence of the conjectured resolution.

7. Conclusions

In this paper we have provided explicit expressions for general Wakimoto realizations of current algebras. The advantages of our approach are that (i) we obtain the form of all components of the $G$-valued current, not just those corresponding to the Chevalley generators of $G$, (ii) it has a clear geometrical origin and (iii) we obtain the realizations for nonabelian $G_0$ at one stroke as well. Moreover, our explicit formulas are valid in arbitrary upper triangular coordinates, and we have determined the screening charges for all cases, too. Wakimoto realizations play an important role in the representation theory of affine Lie algebras as the building blocks of resolutions of irreducible highest weight representations [FF1, BF, BMP], where the screening currents are used to construct intertwiners between different Wakimoto modules.
Furthermore, Wakimoto realizations can be used to compute correlation functions in the WZNW model [FF1, BF, BMP, ATY, Ku], and to obtain free field realizations of W-algebras [BO, FF2, Fr1, Fr2, Fi]. For each of these applications, our explicit expressions should be useful if one wants to derive general results (as we illustrated for W-algebras above), or if one simply wants to work out a complicated example in detail. Let us also mention integrable deformations of conformal field theory, Toda theories and generalized KdV hierarchies as subjects where our results might prove useful. In addition to the above-mentioned applications, there are various issues in the construction itself that we would like to understand better. One of them is the role the screening charges play in the Hamiltonian reduction. In this paper we introduced them in a somewhat ad hoc manner. A more conceptual derivation of the screening charges should probably make use of a left-right separated version of the WZNW phase space and its Poisson-Lie geometry. Another very interesting problem is whether we can obtain an explicit expression for the chiral group-valued field of WZNW theory in terms of free fields. Classically, there are several chiral group-valued fields that one can consider, distinguished by the different monodromies they possess, with different Poisson brackets. Upon quantization, quantum group structures may arise, and we would like to know what kind of quantum group structure, if any, will naturally appear in a Wakimoto-type free field realization of the chiral group-valued field. These may be related to the quantum group structures encountered in [BMP], and we hope to come back to this point in the future.
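A. Some OPE Identities

In this appendix we record some identities for OPEs of composite fields formed out of $p^\alpha$, $q_\beta$ that are subject to the OPEs in (3.2). These identities were used in our computations. They follow straightforwardly from the general OPE rules given in the appendix of [BBSS]. Recall that the normal ordered product $(AB)$ of two fields $A$ and $B$ is defined by
\[ (AB)(w) = \oint \frac{dz}{2\pi i}\, \frac{1}{z-w}\, A(z) B(w) \]
using a contour that winds around $w$ counterclockwise. Then for polynomial functions $f$ and $g$ of the $q_\beta$, but not of their derivatives, the Wick rule implies
\[
\begin{aligned}
p^\alpha(z) f(w) &= \frac{-\partial^\alpha f(w)}{z-w}, \\
f(z)(p^\alpha g)(w) &= \frac{(g\,\partial^\alpha f)(w)}{z-w}, \\
p^\beta(z)(p^\alpha g)(w) &= \frac{-(p^\alpha(\partial^\beta g))(w)}{z-w}, \\
(p^\alpha f)(z)(p^\beta g)(w) &= \frac{-(\partial^\beta f(z)\,\partial^\alpha g(w))}{(z-w)^2} + \frac{(p^\alpha(g\,\partial^\beta f) - p^\beta(f\,\partial^\alpha g))(w)}{z-w}, \\
(p^\alpha f)(z)(g\,\partial q_\beta)(w) &= \frac{-(f(z) g(w))\,\delta^\alpha_\beta}{(z-w)^2} + \frac{-(f\,\partial^\alpha g\,\partial q_\beta)(w)}{z-w}.
\end{aligned}
\tag{A.1}
\]
Here a classical object $p f(q)$ is represented by the normal ordered object $(p(f(q)))$, which we denote as $(pf)$ for simplicity, but having this particular ordering in mind. As in the main text, $\partial^\alpha = \frac{\partial}{\partial q_\alpha}$ and $\partial$ means derivation with respect to the complex parameter, $(\partial f)(z) = \frac{\partial f}{\partial z}$. To verify the form of the affine-Sugawara stress-energy tensor in Theorem 3, we have further used the following rearrangement identities for normal ordered products:
\[
\begin{aligned}
(p^\alpha f)(p^\beta g) &= -\frac12 (\partial^\alpha g)\,\partial^2(\partial^\beta f) + \partial p^\alpha (g\,\partial^\beta f) + p^\alpha (g\,\partial\partial^\beta f) - p^\beta(\partial f\,\partial^\alpha g) + p^\beta(p^\alpha(g f)), \\
(p^\alpha f)\, g &= -\partial f\,\partial^\alpha g + p^\alpha (g f), \\
g\,(p^\alpha f) &= f\,\partial\partial^\alpha g + p^\alpha (g f), \\
(p^\alpha f)(g\,\partial q_\beta) &= -\partial f\,(\partial^\alpha g)\,\partial q_\beta - \frac12 \delta^\alpha_\beta\, g\,\partial^2 f + p^\alpha (f g\,\partial q_\beta), \\
(g\,\partial q_\beta)(p^\alpha f) &= f\,\partial((\partial^\alpha g)\,\partial q_\beta) - \frac12 \delta^\alpha_\beta\, f\,\partial^2 g + p^\alpha (f g\,\partial q_\beta).
\end{aligned}
\tag{A.2}
\]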
B. Derivation of Ito's Formula for $A_\beta$

The purpose of this appendix is to derive the formula of $A_\beta$ in (5.6) from that in (5.5). First, let us introduce the notation $E^\alpha \equiv \frac{|\alpha|^2}{2} E_{-\alpha}$ for any root $\alpha$. We use the normalization given in (5.3), and hence $E^\alpha$ is the dual of $E_\alpha$ with respect to the inner product Tr of $G$, that is $\mathrm{Tr}\,(E_\alpha E^\beta) = \delta_{\alpha\beta}$ where $\alpha, \beta$ are arbitrary elements of the root system $\Phi$. We also introduce a basis $\{H_i\}$ for the Cartan subalgebra, with dual basis $\{H^i\}$ with respect to Tr, $\mathrm{Tr}\,(H_i H^k) = \delta_{ik}$. We then proceed to rewrite $A_\beta$ in (5.5) as follows:
\[
\begin{aligned}
A_\beta &= \frac12 \sum_{\alpha,\alpha+\beta\in\Phi^+} \mathrm{Tr}\,([E_{-\alpha}, E_{-\beta}] E^{-\alpha-\beta})\, \mathrm{Tr}\,([E_{-\alpha-\beta}, E_\beta] E^{-\alpha}) \\
&= \frac12 \sum_{\alpha,\alpha+\beta,\gamma\in\Phi^+} \mathrm{Tr}\,([E_{-\alpha}, E_{-\beta}] E_\gamma)\, \mathrm{Tr}\,([E^\gamma, E_\beta] E^{-\alpha}) \\
&= \frac14 \sum_{\alpha\in\Phi^+} \mathrm{Tr}\,([E_{-\alpha}, E_{-\beta}][E_\beta, E^{-\alpha}]) + \frac14 \sum_{\alpha,\gamma\in\Phi^+} \mathrm{Tr}\,([E_{-\beta}, E_\gamma] E_{-\alpha})\, \mathrm{Tr}\,(E^{-\alpha}[E^\gamma, E_\beta]) \\
&= \frac14 \sum_{\alpha\in\Phi^+} \mathrm{Tr}\,([E^{-\alpha}, [E_{-\alpha}, E_{-\beta}]] E_\beta) + \frac14 \sum_{\gamma\in\Phi^+,\,\gamma\neq\beta} \mathrm{Tr}\,([E^\gamma, [E_\gamma, E_{-\beta}]] E_\beta) \\
&= \frac14 c_2(G)\, \mathrm{Tr}\,(E_{-\beta} E_\beta) - \frac14 \sum_i \mathrm{Tr}\,([H^i, [H_i, E_{-\beta}]] E_\beta) - \frac14 \mathrm{Tr}\,([E_\beta, E_{-\beta}][E_\beta, E^\beta]) \\
&= \frac{c_2(G)}{2|\beta|^2} - \frac{1}{2|\beta|^2} \sum_i \beta(H_i)\beta(H^i) - \frac{|\beta|^2}{8}\, \mathrm{Tr}\,(H_\beta H_\beta) \\
&= \frac{c_2(G)}{2|\beta|^2} - 1.
\end{aligned}
\]
In the last step we used that $|\beta|^2 = \sum_i \beta(H_i)\beta(H^i)$, which holds simply by the definition of the scalar product of the roots, and that $\mathrm{Tr}\,(H_\beta H_\beta) = 4/|\beta|^2$, which follows [H] from (5.3).

Acknowledgement. LF is grateful to B. Feigin, E. Frenkel, K. Ito and G. Meisters for useful conversations and correspondence. He also wishes to thank I. Tsutsui for hospitality at the Institute for Nuclear Studies, University of Tokyo during the main course of this work, as well as the Alexander von Humboldt Foundation for support during the final stages of this work. JdB is supported in part by the Director, Office of Energy Research, Office of Basic Energy Services, of the US Department of Energy under Contract DE-AC03-76SF00098, in part by the National Science Foundation under grant PHY95-14797, and is a fellow of the Miller Institute for Basic Research in Science.
References

[AM] Abraham, R., Marsden, J.E.: Foundations of Classical Mechanics. Second edition, Reading, MA: Addison-Wesley, 1978
[A] Akhiezer, D.N.: Lie Group Actions in Complex Analysis, Aspects of Mathematics. Vol. E27, Vieweg, 1995
[ATY] Awata, H., Tsuchiya, A., Yamada, Y.: Integral formulas for the WZNW correlation functions. Nucl. Phys. B365, 680–696 (1991); Awata, H.: Screening currents Ward identity and integral formulas for WZNW correlation functions. Prog. Theor. Phys. Suppl. 110, 303–319 (1992)
[BBSS] Bais, F.A., Bouwknegt, P., Schoutens, K., Surridge, M.: Extensions of the Virasoro algebra constructed from Kac-Moody algebras using higher order Casimir invariants. Nucl. Phys. B304, 348–370 (1988)
[B] Bars, I.: Free fields and new cosets of current algebras. Phys. Lett. B255, 353–358 (1991); Bars, I., Sfetsos, K.: Hermitian symmetric spaces and free field topological theories. Nucl. Phys. B371, 507–518 (1992)
[BCW] Bass, H., Connell, E.H., Wright, D.: The Jacobian conjecture: reduction of degree and formal expansion of the inverse. Bull. AMS 7, 287–330 (1982)
[BF] Bernard, D., Felder, G.: Fock representations and BRST cohomology in SL(2) current algebra. Commun. Math. Phys. 127, 145–168 (1990)
[BO] Bershadsky, M., Ooguri, H.: Hidden SL(n) symmetry in conformal field theories. Commun. Math. Phys. 126, 49–83 (1989)
[BR] Bialynicki-Birula, A., Rosenlicht, M.: Injective morphisms of real algebraic varieties. Proc. AMS 13, 200–203 (1962)
[BMP] Bouwknegt, P., McCarthy, J., Pilch, V: Free field realizations of WZNW models, the BRST complex and its quantum group structure. Phys. Lett. B234, 297–303 (1990); Quantum group structure in Fock space resolutions of $\widehat{sl}(n)$ representations. Commun. Math. Phys. 131, 125–155 (1990); Free field approach to two-dimensional conformal field theory. Prog. Theor. Phys. Suppl. 102, 67–135 (1990); Some aspects of free field resolutions in 2D CFT with applications to quantum Drinfeld-Sokolov reduction. In: Strings and Symmetries 1991, eds. N. Berkovits et al., Singapore: World Scientific, 1992, p. 407
[dBF] de Boer, J., Fehér, L.: An explicit construction of Wakimoto realizations of current algebras. Mod. Phys. Lett. A11, 1999–2011 (1996)
[dBT] de Boer, J., Tjin, T.: The relation between quantum W algebras and Lie algebras. Commun. Math. Phys. 160, 317–332 (1994)
[FL] Fateev, V.A., Lukyanov, S.L.: Poisson-Lie groups and classical W-algebras. Int. J. Mod. Phys. A7, 853–876 (1992)
[FF1] Feigin, B.L., Frenkel, E.V.: The family of representations of affine Lie algebras. Usp. Mat. Nauk. 43, 227–228 (1988), Russ. Math. Surv. 43, 221–222 (1989); Affine Kac-Moody algebras and semi-infinite flag manifolds. Commun. Math. Phys. 128, 161–189 (1990); Representations of affine Kac-Moody algebras, bosonization and resolutions. Lett. Math. Phys. 19, 307–317 (1990); Representations of affine Kac-Moody algebras and bosonization. In: Physics and Mathematics of Strings, eds. L. Brink et al., Singapore: World Scientific, 1990, pp. 271–316; Frenkel, E.: Free field realizations in representation theory and conformal field theory. In: Proceedings of the ICM, Zürich 1994, preprint hep-th/9408109
[FF2] Feigin, B.L., Frenkel, E.V.: Quantization of the Drinfeld-Sokolov reduction. Phys. Lett. 246B, 75–81 (1990); Affine Kac-Moody algebras at the critical level and Gelfand-Dikii algebras. Int. J. Mod. Phys. A7, 197–215 (1992)
[FFR] Feigin, B.L., Frenkel, E.V., Reshetikhin, N.: Gaudin model, Bethe ansatz and critical level. Commun. Math. Phys. 166, 27–62 (1994)
[Fi] Figueroa-O'Farrill, J.M.: On the homological construction of Casimir algebras. Nucl. Phys. B343, 450–466 (1990)
[Fr1] Frenkel, E.: Affine Kac-Moody algebras at the critical level and quantum Drinfeld-Sokolov reduction. PhD thesis, Harvard University, 1991
[Fr2] Frenkel, E.: W-algebras and Langlands-Drinfeld correspondence. In: New symmetry principles in quantum field theory. Proceedings of Cargèse summer school 1991, eds. J. Fröhlich et al., New York: Plenum Press, 1992, pp. 433–447
[G] Gebert, R.W.: Introduction to vertex algebras, Borcherds algebras and the Monster Lie algebra. Int. J. Mod. Phys. A8, 5441–5504 (1993)
Wakimoto Realizations of Current Algebras
793
[GMOMS] Gerasimov, A., Morozov, A., Olshanetsky, M., Marshakov, A., Shatashvili, S.: Wess-ZuminoWitten model as a theory of free fields. Int. J. Mod. Phys. A5, 2495–2589 (1990); Morozov, A.: Bosonization and multiloop corrections for the Wess-Zumino-Witten model. JETP Lett. 49, 345–349 (1989) [GOV] Gorbatsevich, V.V., . Onishchik, A.L, Vinberg, E.B.: Structure of Lie Groups and Lie Algebras. Encyclopaedia of Mathematical Sciences, Vol. 41, eds. A.L. Onishchik, E.B. Vinberg, Berlin– Heidelberg–New York: Springer-Verlag, 1994 [GS] Guillemin, V., Sternberg, S. Symplectic Techniques in Physics. Cambridge: Cambridge University Press, 1984 e Hamiltonian group actions and [HK] Harnad, J., Kupershmidt, B.A. Symplectic geometries on T ∗ G, integrable systems. J. Geometry and Physics 16, 168–206 (1995) [H] Humphreys, J.E.: Introduction to Lie Algebras and Representation Theory. Berlin–Heidelberg– New York: Springer-Verlag, 1972 [IKa] Ito, K., Kazama, Y.: Feigin-Fuchs representations of A(1) n affine Lie algebra and the associated parafermionic algebra. Mod. Phys. Lett. A5, 215–224 (1990) [IKo] Ito, K., Komata, S.: Feigin-Fuchs representations of arbitrary affine Lie algebras. Mod. Phys. Lett. A6, 581–589 (1991); Ito, K.: Feigin-Fuchs representation of generalized parafermions. Phys. Lett. B252, 69–73 (1990) [Ke] Keller, O.H.: Ganze Cremona-Transformationen. Monats. Math. Physik 47, 299–306 (1939) [Ko] Kostant, B.: Verma modules and the existence of quasi-invariant differential operators. Lect. Notes Math. 466, 101–128 (1974) [Ku] Kuroki, G.: Fock space representations of affine Lie algebras and integral representations in the Wess-Zumino-Witten models. Commun. Math. Phys. 142, 511–542 (1991) [KS] Kuwahara, M., Suzuki, H. Coset conformal models of the W -algebra and their Feigin-Fuchs construction. Phys. Lett. B235, 52–56 (1990); Kuwahara, M., Ohta, N., Suzuki, H.: Free field realization of coset conformal field theories. Phys. Lett. 
B235, (1990) 57–62; Conformal field theories realized by free fields. Nucl. Phys. B340, 448–474 (1990) [KZ] Knizhnik, V.G., Zamolodchikov, A.B.: Current algebra and Wess-Zumino model in two dimensions. Nucl. Phys. B247, 83–103 (1984) [Ra] Rasmussen, J.: Applications of free fields in 2D current algebra. PhD thesis, Niels Bohr Institute, 1996, preprint: hep-th/9610167 [Ro] Robson, M.A.: Geometric quantization of reduced cotangent bundles. J. Geometry and Physics 19, 207–245 (1996) [Ru] Rudin, W.: Injective polynomial maps are automorphisms. The Am. Math. Monthly 102(6), 540–543 (1995) [TK] Tsuchiya, A., Kanie, Y.: Vertex operators in conformal field theory on P 1 and monodromy representations of the braid group. Advanced Studies in Pure Mathematics 16, 297–372 (1988) [Wa] Wakimoto, M.: Fock representations of the affine Lie algebra A(1) 1 . Commun. Math. Phys. 104, 605–609 (1986) [Wi] Witten, E. Non-abelian bosonization in two dimensions. Commun. Math. Phys. 92, 455–472 (1984) Communicated by R. H. Dijkgraaf
Commun. Math. Phys. 189, 795 – 828 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997
Abelian BF Theories and Knot Invariants Alberto S. Cattaneo Lyman Laboratory of Physics, Harvard University, Cambridge, MA 02138, USA. E-mail: [email protected] Received: 2 October 1996 / Accepted: 21 March 1997
Abstract: In the context of the Batalin–Vilkovisky formalism, a new observable for the Abelian BF theory is proposed whose vacuum expectation value is related to the Alexander–Conway polynomial. The three-dimensional case is analyzed explicitly, and it is proved to be anomaly free. Moreover, at the second order in perturbation theory, a new formula for the second coefficient of the Alexander–Conway polynomial is obtained. An account of the higher-dimensional generalizations is also given.

1. Introduction

In recent years, the study of three-dimensional topological quantum field theories (TQFT) has shed new light on knot invariants. The nonperturbative analysis of the Chern–Simons theory in [31] has given an intrinsically three-dimensional definition of the Jones [22] and HOMFLY [17] polynomials, while the approach of [19] has shown a more direct connection with the (2 + 1)-dimensional formulation. Later, the perturbative expansion in the covariant gauge [21, 4] has shown that numerical knot invariants can be obtained in terms of integrals over copies of the knot times copies of $\mathbb{R}^3$, and the second-order invariant has been computed explicitly. A rigorous mathematical formulation of these integrals has been given in [10], where some subtleties (“anomalies”) that arise in this framework have also been pointed out.

Another three-dimensional TQFT is the so-called BF theory [29]. The version with a cosmological term gives results that are equivalent to those obtained in Chern–Simons theory [11, 13]. The version without cosmological term (or pure) admits an observable [14, 12, 23] whose vacuum expectation value (v.e.v.) is related to the Alexander–Conway polynomial [1]. This is a “classical” knot invariant that can be defined in any odd dimension; yet it is quite difficult to find a good generalization of the three-dimensional observable of the pure BF theory in higher dimensions (see [16] for an attempt).
However, pure BF theory is essentially an Abelian theory, as one can see by rescaling $A \to \epsilon A$ and $B \to B/\epsilon$ since, in the limit $\epsilon \to 0$, the non-Abelian perturbation $B \wedge A \wedge A$ gets killed. Therefore, we expected to get the same invariants by studying the simpler Abelian version of BF theory. In this framework, we have been led to define a new non-trivial observable which, though rather involved, has a natural generalization in any odd dimension. In the three-dimensional case, we show that this observable is not “anomalous” both in the usual field-theoretical and in the Bott and Taubes [10] meaning; i.e., we prove both that a quantum observable corresponding to the classical one exists and that the topological nature of its v.e.v. is not spoiled by the collapse of more than two points together (hidden faces). Moreover, we show that this v.e.v. yields, at the second order in perturbation theory, a new integral expression for the second coefficient of the Alexander–Conway polynomial (our conjecture is that the whole v.e.v. is the inverse of the Alexander–Conway polynomial). More generally, our approach suggests a new way of defining knot invariants (or even higher-degree forms on the space of imbeddings) as integrals over copies of the knot and a cobounding surface that turn out not to depend on the choice of the surface. Finally, we recall that the Abelian BF theory has a physical application [18] as a tool for studying the bosonization of many-body systems. It would be interesting to see if our observable has a physical interpretation as well.

1.1. Plan of the paper. Explicitly, in the three-dimensional case, we want to consider the classical action
$$S_{\mathrm{cl}} = \int_{\mathbb{R}^3} B \wedge dA,$$
where $A$ and $B$ are one-forms. This theory is invariant under adding an exact form to either $A$ or $B$. A classical observable for this theory is then given by
$$\gamma_{\mathrm{cl}}^{(\Sigma_K,K,x_0)} = \int_{\Sigma_K} B \wedge A + \frac{1}{2}\oint_{x<y\in K\setminus x_0} [A(x)\,B(y) - B(x)\,A(y)],$$
not anomalous in the sense of [10], and generalize this result to any order in perturbation theory.) Finally, in Sect. 6, we describe the steps to be taken to define the higher-dimensional generalizations of our theory.

2. The BV Formalism

The BV formalism [6] is a generalization of the BRST formalism [7] which is applicable to theories whose symmetry closes only on shell. Moreover, even in the case of off-shell closed symmetries, the BV formalism allows dealing with observables that are invariant only on shell, as the ones we are interested in. In this section we give only a brief introduction. We refer to Ref. [3], whose notations we follow, for a thorough exposition of the BV formalism, as well as for a clear-cut discussion of the renormalization issue (which we do not deal with in the present paper since the theory we consider is topological as well as Gaussian).

2.1. Preliminaries. We denote by $\Phi^i$ the fields one needs in a theory (i.e., the physical fields, the ghosts, the antighosts, the Lagrange multipliers and, if necessary, the ghosts for ghosts and so on); the space of fields is called the configuration space. We denote by $\epsilon(\Phi^i)$, or simply by $\epsilon_i$, the ghost number of the field $\Phi^i$. For simplicity, we consider only the case of a theory whose physical fields are bosonic, so the Grassmann parity of $\Phi^i$ is given by $(-1)^{\epsilon_i}$. In the BV formalism, along with every field $\Phi^i$ one introduces an antifield $\Phi^\dagger_i$ with the same characteristics as its partner except for the ghost number, which instead is given by
$$\epsilon(\Phi^\dagger_i) = -\epsilon_i - 1; \tag{1}$$
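this also implies that the Grassmann parity is reversed. The space of fields and antifields is called the phase space [28]. In the next sections we will also use the antifields $\Phi^*_i$ satisfying
$$\Phi^\dagger_i := *\Phi^*_i, \tag{2}$$
where $*$ is the Hodge operator. Over the phase space one introduces a supersymplectic structure [28] which allows the definition of the BV antibracket
$$(X, Y) := \left\langle X\frac{\overleftarrow{\delta}}{\delta\Phi^i}\,,\,\frac{\overrightarrow{\delta}}{\delta\Phi^\dagger_i}Y\right\rangle - \left\langle X\frac{\overleftarrow{\delta}}{\delta\Phi^\dagger_i}\,,\,\frac{\overrightarrow{\delta}}{\delta\Phi^i}Y\right\rangle \tag{3}$$
and the BV Laplacian
$$\Delta X := \sum_i (-1)^{\epsilon_i+1}\left\langle X\frac{\overleftarrow{\delta}}{\delta\Phi^i}\,,\,\frac{\overleftarrow{\delta}}{\delta\Phi^\dagger_i}\right\rangle. \tag{4}$$
Here $X$ and $Y$ are functionals over the phase space and $\langle\cdot\,,\cdot\rangle$ denotes the scalar product
$$\langle\alpha\,,\beta\rangle := \int_M \alpha \wedge *\beta \tag{5}$$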
and $M$ is the manifold over which the theory is defined. In Ref. [32], the BV phase space is interpreted as the tangent space $T\mathcal{M}$ over the space of fields $\mathcal{M}$. In the finite-dimensional case, this is locally isomorphic to the cotangent space $T^*\mathcal{M}$ by using a volume form on $\mathcal{M}$. Then the Laplacian $\Delta$ on $T\mathcal{M}$ is in correspondence with the exterior derivative on $T^*\mathcal{M}$. However, the product of two functionals on $T\mathcal{M}$ does not correspond to the wedge product on $T^*\mathcal{M}$, and the antibracket measures this failure since
$$(X, Y) = (-1)^{\epsilon(Y)}\left[\Delta(XY) - X\,\Delta Y - (-1)^{\epsilon(Y)}\,\Delta X\;Y\right] \tag{6}$$
(notice that the Laplacian as defined in (4) acts from the right). Of course, in the infinite-dimensional case (in which we are interested), this description is only formal. In particular, the Laplacian depends on the regularization we use to define the functional integral.

2.2. BV cohomologies. One can define some cohomologies on the phase space, with respect to the gradation provided by the ghost number. Each cohomology is defined by a coboundary operator, i.e., a nilpotent operator of ghost number one. The simplest coboundary operator is the Laplacian itself. The second interesting coboundary operator is
$$\Omega X := (X, \Sigma) - i\hbar\,\Delta X, \tag{7}$$
where $\Sigma$, the quantum action, is a bosonic functional that has to satisfy the quantum master equation
$$(\Sigma, \Sigma) - 2i\hbar\,\Delta\Sigma = 0 \tag{8}$$
for $\Omega$ to be nilpotent. Notice that (8) is equivalent to asking the Gibbs weight $\exp(i\Sigma/\hbar)$ to be $\Delta$-closed. The third coboundary operator we consider is
$$\sigma X := (X, S), \tag{9}$$
where $S$, the action, is a bosonic functional that has to satisfy the master equation
$$(S, S) = 0 \tag{10}$$
for $\sigma$ to be nilpotent. A particular case, which we encounter in this paper, is provided by a $\Delta$-closed action $S$. In this case
$$\Delta\sigma + \sigma\Delta = 0, \tag{11}$$
so $\Delta$ and $\sigma$ define a double complex. Moreover, by (8), $S$ is also a quantum action for any $\hbar$ and, as such, it defines an $\Omega$-cohomology. The restriction of the operator $\sigma$ to the configuration space defines a new operator,
$$sX := (\sigma X)\big|_{\Phi^\dagger=0}, \tag{12}$$
which can be shown to be nilpotent on shell, i.e., modulo the solutions of
$$S_{\mathrm{cl}}\frac{\overleftarrow{\delta}}{\delta\Phi^i} = 0, \tag{13}$$
where
$$S_{\mathrm{cl}} = S\big|_{\Phi^\dagger=0} \tag{14}$$
is the classical action. Thus, $s$ defines a cohomology on the configuration space on shell. If $s$ is nilpotent also off shell, one says that the symmetry closes. In this case the BRST approach is available and $s$ is actually the BRST operator. Notice, however, that by (12) $s$ is defined to act from the right; the usual BRST operator $s_l$ is the corresponding operator acting from the left, and one has
$$s_l X = (-1)^{\epsilon(X)}\,sX. \tag{15}$$
2.3. BV quantization. The interest in the $\Omega$-cohomology relies on the fact that one can formally show that the class of observables whose v.e.v.’s are gauge-fixing independent is given by the $\Omega$-closed bosonic functionals modulo $\Omega$-exact terms. More precisely, one introduces the partition function
$$Z_\Psi := \int_{\mathcal{L}_\Psi} e^{\frac{i}{\hbar}\Sigma}, \tag{16}$$
where the Lagrangian submanifold $\mathcal{L}_\Psi$ is defined by the equations
$$\Phi^\dagger_i = \frac{\overrightarrow{\delta}}{\delta\Phi^i}\Psi(\Phi), \tag{17}$$
and $\Psi$, the gauge-fixing fermion, is a functional on the configuration space that has ghost number $-1$. In Ref. [32], prescription (17) is shown to amount to selecting the top form in the functional on $T^*\mathcal{M}$ that, under the isomorphism between $T\mathcal{M}$ and $T^*\mathcal{M}$, corresponds to the Gibbs weight $\exp(i\Sigma/\hbar)$. The v.e.v. of a functional $X$ over the phase space is then defined as
$$\langle X\rangle_\Psi = \frac{1}{Z_\Psi}\int_{\mathcal{L}_\Psi} e^{\frac{i}{\hbar}\Sigma}\,X. \tag{18}$$
By using the formal properties of the functional integration, one has then the following [2, 28]:

Statement 1. If $\Sigma$ satisfies the quantum master equation (8), then
1. the partition function $Z_\Psi$ and the expectation values of $\Omega$-closed functionals do not change under infinitesimal variations of the gauge-fixing fermion $\Psi$, and
2. the expectation value of an $\Omega$-exact functional vanishes.

In the finite-dimensional case, Statement 1 becomes a rigorous theorem. One can also show that the definitions (16) and (18) correspond to the usual ones in the BRST formalism whenever applicable. The $\sigma$-cohomology is useful since it is given by the $\Omega$-cohomology in the limit $\hbar \to 0$ and is much easier to study. The idea is to solve the quantum master equation and to study the $\Omega$-cohomology by an expansion in powers of $\hbar$. Notice, however, that from an action satisfying (10) it is not always possible to obtain a quantum action satisfying (8) that, in the limit $\hbar \to 0$, yields the starting action; if this does not happen, one calls the theory anomalous. Moreover, even if the theory is not anomalous, a $\sigma$-closed functional
of ghost number zero does not always produce an $\Omega$-closed functional that, in the limit $\hbar \to 0$, yields the starting one. A sufficient condition for both to happen is that the one-ghost-number $\sigma$-cohomology be trivial. As explained before, the $s$-cohomology is the restriction of the $\sigma$-cohomology to the configuration space on shell. Since it is easier to study than the $\sigma$-cohomology, one can study the latter by an expansion in antifields. Under some mild assumptions [30], one can prove that the extension from a classical action $S_{\mathrm{cl}}$ to an action $S$ satisfying (10) and (14) exists and is unique modulo canonical transformations (i.e., transformations on the phase space that preserve the supersymplectic structure).

3. The Three-Dimensional Abelian BF Theory

In this section we apply the BV formalism to the theory defined by the classical action
$$S^\omega_{\mathrm{cl}} = \int_M B \wedge d_\omega A, \tag{19}$$
where
• $M$ is a three-manifold;
• $A$ and $B$ are fields taking values in $\Omega^1(M)$;
• $d_\omega = d + i\omega$; and
• $\omega$ is an external $d$-closed source in $\Omega^1(M)$ (thus, $d_\omega^2 = 0$).
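In the particular case $\omega = 0$, we will simply write $S_{\mathrm{cl}}$ and speak of the pure theory. We can also split $S^\omega_{\mathrm{cl}}$ as
$$S^\omega_{\mathrm{cl}} = S_{\mathrm{cl}} - i\gamma^\omega_{\mathrm{cl}}, \tag{20}$$
with
$$\gamma^\omega_{\mathrm{cl}} = -\int_M B \wedge \omega \wedge A = \int_M \omega \wedge B \wedge A, \tag{21}$$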
and see $\gamma^\omega_{\mathrm{cl}}$ as a perturbation of the pure classical action. If $H^1(M, d_\omega)$ is trivial (for the general case see App. A), the symmetries of this theory are simply given by
$$s_\omega A = d_\omega c,\quad s_\omega c = 0,\qquad s_\omega B = d_{-\omega}\psi,\quad s_\omega\psi = 0, \tag{22}$$
where $d_{-\omega} = d - i\omega$, and $c$ and $\psi$ are the ghosts, which take values in $\Omega^0(M)$ and have ghost number one. Notice that $\gamma^\omega_{\mathrm{cl}}$ is on-shell invariant under the symmetry (22) of the pure theory.

3.1. The BV action. The BV action corresponding to (19) is given by
$$S^\omega = \int_M \left(B \wedge d_\omega A + A^* \wedge d_\omega c + B^* \wedge d_{-\omega}\psi + \bar{c}^*\,h_c + \bar\psi^*\,h_\psi\right), \tag{23}$$
where $\bar{c}$ and $\bar\psi$ are the antighosts, and $h_c$ and $h_\psi$ are the Lagrange multipliers. The antighosts and the Lagrange multipliers take values in $\Omega^0(M)$; the former have ghost number minus one, the latter have ghost number zero. The additional terms in the
antighosts and Lagrange multipliers are necessary to gauge fix the theory. Notice that we have used here the antifields $*$ instead of the antifields $\dagger$, see (2). If the Laplacian $d^*_\omega d_\omega + d_\omega d^*_\omega$ has zero modes, additional terms are required; for simplicity we suppose that there are no zero modes, i.e., we suppose that the cohomology $H^*(M, d_\omega)$ is trivial (see App. A for the case when $H^1(M, d_\omega)$ is not trivial). It is not difficult to see that $S^\omega$ satisfies the master equation (10) for any closed one-form $\omega$. Notice moreover that $\Delta S^\omega = 0$, so $S^\omega$ also satisfies the quantum master equation (8). Thus, we can quantize the theory with Gibbs weight $\exp(iS^\omega/\hbar)$ for any $\hbar$. In the following we will set $\hbar = 1$. Notice moreover that the action $S^\omega$ does not require the choice of a metric on $M$, so we expect its partition function to be a topological invariant of $M$. The $\sigma^\omega$ operator (9) acts on the fields and antifields as follows:
$$\begin{aligned}
\sigma^\omega\psi^* &= -d_\omega B^*, & \sigma^\omega B^* &= -d_\omega A, & \sigma^\omega A &= d_\omega c, & \sigma^\omega c &= 0,\\
\sigma^\omega c^* &= -d_{-\omega}A^*, & \sigma^\omega A^* &= -d_{-\omega}B, & \sigma^\omega B &= d_{-\omega}\psi, & \sigma^\omega\psi &= 0,
\end{aligned} \tag{24}$$
$$\begin{aligned}
\sigma^\omega h_c^* &= -\bar{c}^*, & \sigma^\omega\bar{c}^* &= 0, & \sigma^\omega\bar{c} &= h_c, & \sigma^\omega h_c &= 0,\\
\sigma^\omega h_\psi^* &= -\bar\psi^*, & \sigma^\omega\bar\psi^* &= 0, & \sigma^\omega\bar\psi &= h_\psi, & \sigma^\omega h_\psi &= 0.
\end{aligned} \tag{25}$$
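It is very useful to consider the following linear combinations:
$$\mathcal{A} = \overset{(3,-2)}{-\psi^*} + \overset{(2,-1)}{B^*} + \overset{(1,0)}{A} + \overset{(0,1)}{c},\qquad \mathcal{B} = -c^* + A^* + B + \psi; \tag{26}$$
where by $(i, j)$ we denote an $i$-form of ghost number $j$. Notice that $\mathcal{A}$ and $\mathcal{B}$ have an overall (i.e., form plus ghost) degree equal to one. By (26), we can rewrite the action as
$$S^\omega = \int_M \left(\mathcal{B} \wedge d_\omega\mathcal{A} + \bar{c}^*\,h_c + \bar\psi^*\,h_\psi\right). \tag{27}$$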
Moreover, we can rewrite (24) as
$$\sigma^\omega_l\mathcal{A} = d_\omega\mathcal{A},\qquad \sigma^\omega_l\mathcal{B} = d_{-\omega}\mathcal{B}, \tag{28}$$
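where $\sigma^\omega_l$ is the operator corresponding to $\sigma^\omega$ but acting from the left (as the exterior derivative); notice that
$$\sigma^\omega_l X = (-1)^{\epsilon(X)}\,\sigma^\omega X. \tag{29}$$
Following (20), we can split $S^\omega$ as
$$S^\omega = S - i\gamma^\omega, \tag{30}$$
where
$$\gamma^\omega = -\int_M \mathcal{B} \wedge \omega \wedge \mathcal{A} = \int_M \omega \wedge \tilde{\mathcal{B}} \wedge \mathcal{A}, \tag{31}$$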
where the operator $\tilde{\ }$ acts by changing the sign of odd-ghost-number terms. The splitting (30) is very convenient since not only do both $S^\omega$ and $S$ satisfy the quantum master equation (8), but we also have
$$(\gamma^\omega, S) = 0, \tag{32}$$
$$(\gamma^\omega, \gamma^\omega) = 0, \tag{33}$$
$$\Delta\gamma^\omega = 0 \tag{34}$$
[notice that, because of (143), (32) does not hold if $H^1(M, d)$ contains nontrivial elements besides $\omega$]. We can also split the Gibbs weight $\exp(iS^\omega)$ into the Gibbs weight $\exp(iS)$ times the observable
$$\Gamma[\omega] = \exp\gamma^\omega, \tag{35}$$
which we can prove to be $\Omega$-closed as a consequence of (32), (33) and (34). Notice that, if $\omega$ is not trivial, the action $S$ has to be modified as in App. A [with b1 = 1 and ϕ1 = ϕ1 = v ω′/⟨ω′, ω′⟩, where ω′ = ω + dα and d∗ω′ = 0]. By using these notations, it is easy to prove that the theory really depends only on the cohomology class of $\omega$. In fact, if we substitute $\omega$ with $\omega + df$, the action $S^\omega$ gets an extra contribution
$$S^\omega \longrightarrow S^\omega + \Omega^\omega T^f \tag{36}$$
with
$$T^f = \int_M f\,\tilde{\mathcal{B}} \wedge \mathcal{A}. \tag{37}$$
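By noticing that $\Delta T^f = 0$, $(\Omega^\omega T^f, \Omega^\omega T^f) = 0$ and $(\Omega^\omega T^f, T^f) = 0$, we can show that
$$\exp(i\,\Omega^\omega T^f) = 1 + \Omega^\omega U^f, \tag{38}$$
with
$$U^f = \sum_{n=1}^{\infty}\frac{i^n}{n!}\,(\Omega^\omega T^f)^{n-1}\,T^f. \tag{39}$$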
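A sketch of why (38) holds, under the assumptions that $\sigma^\omega$ acts from the right as a graded derivation, $\sigma^\omega(XY) = X\,\sigma^\omega Y + (-1)^{\epsilon(Y)}(\sigma^\omega X)\,Y$, and that the antibracket is a derivation in each argument (neither property is spelled out above). Combining that rule with (6) at $\hbar = 1$ gives a product rule for $\Omega^\omega = \sigma^\omega - i\Delta$:
$$\Omega^\omega(XY) = X\,\Omega^\omega Y + (-1)^{\epsilon(Y)}\left[(\Omega^\omega X)\,Y - i\,(X, Y)\right].$$
Take $X = (\Omega^\omega T^f)^{n-1}$ and $Y = T^f$. Then $\Omega^\omega X = 0$, by nilpotency of $\Omega^\omega$ and $(\Omega^\omega T^f, \Omega^\omega T^f) = 0$, and $(X, T^f) = 0$ by $(\Omega^\omega T^f, T^f) = 0$, so that
$$\Omega^\omega\left[(\Omega^\omega T^f)^{n-1}\,T^f\right] = (\Omega^\omega T^f)^n,\qquad \Omega^\omega U^f = \sum_{n=1}^{\infty}\frac{i^n}{n!}\,(\Omega^\omega T^f)^n = \exp(i\,\Omega^\omega T^f) - 1.$$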
Since, by Statement 1, the v.e.v. of an $\Omega^\omega$-exact functional vanishes, we conclude the (formal) proof that the partition function of the theory depends only on the cohomology class of $\omega$.

3.2. The quantization. To quantize the theory, we have to choose a gauge-fixing fermion. A convenient choice is
$$\Psi = \langle d_\omega\bar{c}\,, A\rangle + \langle d_{-\omega}\bar\psi\,, B\rangle. \tag{40}$$
Notice that to gauge fix the theory we need to choose a metric on $M$, but, by Statement 1, the partition function will not depend on it. By (16) and (17), we have
$$Z[M, \omega] = \int [DA\,DB\,Dc\,D\bar{c}\,D\psi\,D\bar\psi\,Dh_c\,Dh_\psi]\,\exp\left(iS^\omega_{\mathrm{g.f.}}\right), \tag{41}$$
where
$$S^\omega_{\mathrm{g.f.}} = \int_M B \wedge d_\omega A + \langle d_\omega\bar{c}\,, d_\omega c\rangle + \langle d_{-\omega}\bar\psi\,, d_{-\omega}\psi\rangle + \langle h_c\,, d^*_\omega A\rangle + \langle h_\psi\,, d^*_{-\omega}B\rangle. \tag{42}$$
The partition function (41) can then be computed by using the zeta-function regularization of the determinants and yields [29]
$$Z[M, \omega] = T(M, d_\omega), \tag{43}$$
where T is the Ray–Singer torsion, which is a topological invariant of the manifold M and depends only on the cohomology class of the closed one-form ω. This explicit result confirms the previously discussed formal arguments. Notice that any multiple of ω is still a closed one-form; thus, we can consider Z[M, λω] as well.
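As a sketch of how the gauge-fixed action (42) arises from prescription (17) with the gauge-fixing fermion (40): we write $d_{-\omega} = d - i\omega$ for the operator acting in the $B$ sector and do not track the overall signs, which depend on the conventions relating the antifields $\dagger$ and $*$ in (2). Varying $\Psi$ with respect to each field fixes the antifields to
$$A^\dagger = d_\omega\bar{c},\qquad B^\dagger = d_{-\omega}\bar\psi,\qquad \bar{c}^\dagger = d^*_\omega A,\qquad \bar\psi^\dagger = d^*_{-\omega}B,\qquad c^\dagger = \psi^\dagger = h_c^\dagger = h_\psi^\dagger = 0,$$
where the adjoints are taken with respect to the scalar product (5). Inserting these into (23), the terms $A^* \wedge d_\omega c$ and $B^* \wedge d_{-\omega}\psi$ become the ghost kinetic terms $\langle d_\omega\bar{c}\,, d_\omega c\rangle$ and $\langle d_{-\omega}\bar\psi\,, d_{-\omega}\psi\rangle$, while $\bar{c}^*h_c$ and $\bar\psi^*h_\psi$ become the gauge conditions $\langle h_c\,, d^*_\omega A\rangle$ and $\langle h_\psi\,, d^*_{-\omega}B\rangle$ enforced by the Lagrange multipliers.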
Now we recall that the Ray–Singer torsion is equal to the Reidemeister torsion [15] and the Reidemeister torsion of the complement of a knot is proportional to the inverse of the Alexander–Conway polynomial of the knot itself [26]; thus, we can see the inverse of the Alexander–Conway polynomial as the partition function (43) of an Abelian BF theory. More precisely, we have the following (cf. [5] and [27])

Theorem 1. If $M = \mathbb{R}^3\setminus\mathrm{Tub}(K)$, where $\mathrm{Tub}(K)$ is a tubular neighborhood of the knot $K \subset \mathbb{R}^3$, and $\omega \in H^1(M)$ is such that
$$\oint_{K_1}\omega = 1, \tag{44}$$
with $K_1$ a closed circle wrapping around $K$ only once, then
$$\frac{Z[M, \lambda\omega]}{Z[M]} = \frac{1}{i\lambda}\,\frac{z(\lambda)}{\Delta(K; z(\lambda))},\qquad z(\lambda) = 2i\sin(\lambda/2), \tag{45}$$
where $\Delta(K; z)$ is the Alexander–Conway polynomial satisfying the skein relation $\Delta(K_+; z) - \Delta(K_-; z) = z\,\Delta(K_0; z)$ and normalized to one on the unknot.

4. Observables for the Pure Theory

From now on, we will consider only the pure theory defined by the action $S$ in (23) with $\omega = 0$. We will look for observables (i.e., $\Omega$-closed zero-ghost-number functionals modulo $\Omega$-exact terms) that are metric independent. By Statement 1, their v.e.v.’s will give topological invariants (up to framing) since the action is metric independent as well. Our survey is not exhaustive; i.e., there could exist other more involved, metric-independent observables that could lead to other topological invariants.

4.1. Loop observables. The simplest observables one can build are
$$\gamma_A^K = \oint_K A,\qquad \gamma_B^K = \oint_K B, \tag{46}$$
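where $K$ is an exact one-cycle [if $K$ were only closed, these functionals would not be closed under (143)]. These observables are always $\Omega$-exact:
$$\gamma_A^K = -\Omega\,\beta^*_\Sigma,\qquad \gamma_B^K = -\Omega\,\alpha^*_\Sigma, \tag{47}$$
where
$$\alpha^*_\Sigma = \int_\Sigma A^*,\qquad \beta^*_\Sigma = \int_\Sigma B^*, \tag{48}$$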
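and $\Sigma$ is a surface cobounding $K$. Any function of $\gamma_A$ or $\gamma_B$ separately will be $\Omega$-exact, too. To get a nontrivial observable, we have to pair them; e.g., we can consider the observable
$$\tau[K_1, K_2] = \gamma_A^{K_1}\,\gamma_B^{K_2}. \tag{49}$$
By (47), we can show that
$$\tau[K_1, K_2] = -i\,\Delta\!\left(\gamma_A^{K_1}\,\alpha^*_{\Sigma_2}\right) - \Omega\!\left(\gamma_A^{K_1}\,\alpha^*_{\Sigma_2}\right). \tag{50}$$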
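Since
$$\Delta\!\left(\gamma_A^{K_1}\,\alpha^*_{\Sigma_2}\right) = -\int_{\Sigma_2}\omega_{K_1} = -\#(K_1, \Sigma_2) = -\mathrm{lk}(K_1, K_2) \tag{51}$$
(where $\omega_{K_1}$ is the Poincaré dual of $K_1$, $\#$ denotes the intersection number and $\mathrm{lk}$ the linking number), we have
$$\langle\tau[K_1, K_2]\rangle = i\,\mathrm{lk}(K_1, K_2). \tag{52}$$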
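The linking number on the right-hand side of (52) can be checked numerically with Gauss's double integral. The following is a minimal sketch, not taken from the paper: the function name `gauss_linking_number`, the polygonal discretization (segment midpoints for positions, chords for line elements) and the test curves are all illustrative assumptions.

```python
import math

def gauss_linking_number(curve1, curve2, n=200):
    """Approximate the Gauss linking integral
        lk = (1 / 4 pi) * double integral of (r1 - r2) . (dr1 x dr2) / |r1 - r2|^3
    for two disjoint closed curves, each given as a 2*pi-periodic map t -> (x, y, z).
    Each curve is replaced by an n-gon (an illustrative discretization choice).
    """
    def segments(curve):
        pts = [curve(2.0 * math.pi * k / n) for k in range(n)]
        mids, chords = [], []
        for k in range(n):
            p, q = pts[k], pts[(k + 1) % n]
            mids.append(tuple(0.5 * (a + b) for a, b in zip(p, q)))  # stands in for r
            chords.append(tuple(b - a for a, b in zip(p, q)))        # stands in for dr
        return mids, chords

    mids1, chords1 = segments(curve1)
    mids2, chords2 = segments(curve2)
    total = 0.0
    for r1, d1 in zip(mids1, chords1):
        for r2, d2 in zip(mids2, chords2):
            rx, ry, rz = r1[0] - r2[0], r1[1] - r2[1], r1[2] - r2[2]
            # cross product d1 x d2
            cx = d1[1] * d2[2] - d1[2] * d2[1]
            cy = d1[2] * d2[0] - d1[0] * d2[2]
            cz = d1[0] * d2[1] - d1[1] * d2[0]
            dist = math.sqrt(rx * rx + ry * ry + rz * rz)
            total += (rx * cx + ry * cy + rz * cz) / dist ** 3
    return total / (4.0 * math.pi)

def circle_xy(t):
    # unit circle in the xy-plane
    return (math.cos(t), math.sin(t), 0.0)

def circle_xz(t):
    # unit circle in the xz-plane threading circle_xy (a Hopf link)
    return (1.0 + math.cos(t), 0.0, math.sin(t))

def far_circle_xz(t):
    # same circle moved away, so that the two curves are unlinked
    return (5.0 + math.cos(t), 0.0, math.sin(t))

hopf = gauss_linking_number(circle_xy, circle_xz)
unlinked = gauss_linking_number(circle_xy, far_circle_xz)
print(hopf, unlinked)
```

The sign of `hopf` depends on the chosen orientations of the two circles; only its absolute value is fixed by the geometry.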
An explicit computation of the l.h.s. with the gauge-fixing (40) actually gives Gauss’s formula.

4.2. Surface observables. As we have seen, the loop observables are rather trivial. A more interesting observable can be built if $\dim H^1(M) = \dim H_2(M, \partial M) = 1$; viz., define
$$\gamma^\Sigma = \int_\Sigma \tilde{\mathcal{B}} \wedge \mathcal{A} = \int_\Sigma (B \wedge A + B^*\psi + cA^*), \tag{53}$$
with $\Sigma \in H_2(M, \partial M)$. This observable is essentially the same as in (31) with $\Sigma$ the Poincaré dual of $\omega$, so we know that its exponential
$$\Gamma[\Sigma, \lambda] = \exp(\lambda\gamma^\Sigma) \tag{54}$$
is an observable as well which, up to $\Omega$-exact terms, depends only on the homology class of $\Sigma$. Moreover, the splitting (30) shows us that
$$\langle\Gamma[\Sigma, \lambda]\rangle_M = \langle\exp(\lambda\gamma^\omega)\rangle_M = \frac{Z[M, \lambda\omega]}{Z[M]}. \tag{55}$$
In particular, this holds when $M$ is as in the hypotheses of Thm. 1. Thus, from the r.h.s. of (45) we can read the v.e.v. of $\Gamma[\Sigma, \lambda]$. Notice that condition (44) on $\omega$ requires its Poincaré dual $\Sigma$ to satisfy $\#(\Sigma, K_1) = 1$. Since any surface $\Sigma_K$ spanning the knot $K$ (i.e., any oriented surface $\Sigma_K$ imbedded in $\mathbb{R}^3$ such that $K$ is identical with the boundary of $\Sigma_K$, and the orientation on $\Sigma_K$ induces the given orientation on $K$) satisfies this property, we have the following

Theorem 2. If $M = \mathbb{R}^3\setminus\mathrm{Tub}(K)$ and $\Sigma = \Sigma_K \cap M \in H_2(M, \partial M)$, then
$$\langle\Gamma[\Sigma, \lambda]\rangle_M = \frac{1}{i\lambda}\,\frac{z(\lambda)}{\Delta(K; z(\lambda))},\qquad z(\lambda) = 2i\sin(\lambda/2). \tag{56}$$
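To make the content of (56) concrete, here is a sketch of its expansion to second order in $\lambda$. It assumes the standard form $\Delta(K; z) = 1 + a_2(K)\,z^2 + O(z^4)$ of the Alexander–Conway polynomial of a knot; the symbol $a_2(K)$ for the second coefficient is notation introduced only for this illustration.
$$\frac{z(\lambda)}{i\lambda} = \frac{2\sin(\lambda/2)}{\lambda} = 1 - \frac{\lambda^2}{24} + O(\lambda^4),\qquad z(\lambda)^2 = -4\sin^2(\lambda/2) = -\lambda^2 + O(\lambda^4),$$
$$\frac{1}{\Delta(K; z(\lambda))} = 1 + a_2(K)\,\lambda^2 + O(\lambda^4),\qquad \langle\Gamma[\Sigma, \lambda]\rangle_M = 1 + \left(a_2(K) - \frac{1}{24}\right)\lambda^2 + O(\lambda^4).$$
For the trefoil, whose polynomial is $1 + z^2$, this gives $1 + \tfrac{23}{24}\lambda^2 + O(\lambda^4)$; the $K$-independent term $-\lambda^2/24$ comes entirely from the prefactor $z(\lambda)/(i\lambda)$.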
Notice that a spanning surface $\Sigma_K$ always exists; e.g., we can take the Seifert surface. An expansion in powers of $\lambda$ of the l.h.s. of (56) would give a representation of the coefficients of the inverse of the Alexander–Conway polynomial as Feynman diagrams involving only bivalent vertices on $\Sigma$. However, the problem of finding the propagators in a manifold like the one described above is very difficult. In the next subsection, we will see how to recast the problem as the computation of a v.e.v. in $\mathbb{R}^3$.

4.3. Surface-plus-knot observables. From now on we work in $\mathbb{R}^3$ and consider a knot $K$ together with a spanning surface $\Sigma_K$. Theorem 2 suggests considering the v.e.v. of the exponential of $\gamma^{\Sigma_K}$. However, since $\Sigma_K$ has a boundary, $\gamma^{\Sigma_K}$ is not $\sigma$-closed anymore; actually,
$$\sigma\gamma^{\Sigma_K} = \oint_K (\psi A - Bc). \tag{57}$$
Therefore, we have to find another functional depending on $K$ (so that it vanishes when $\Sigma_K$ is closed) such that its $\sigma$-variation cancels (57). We first consider
$$\gamma^{(K,x_0)} = \frac{1}{2}\int_{x<y\in K\setminus x_0} [A(x)\,B(y) - B(x)\,A(y)], \tag{58}$$
i 6= j,
(59)
x ∈ Cn (R),
(60)
i, j = 1, . . . , n,
where $\phi^*_{ij}$ denotes the pullback via the map $\phi_{ij}(x) = \mathrm{sgn}(x_i - x_j)$, and $\omega = 1/2$ is the unit volume element on $S^0$. With these notations, we can rewrite (58) as
$$\gamma^{(K,x_0)} = \int_{C_2(K\setminus x_0)} \mathcal{A}_1 \wedge \eta_{12} \wedge \mathcal{B}_2. \tag{61}$$
Now a simple computation shows that
$$\sigma\gamma^{(K,x_0)} = -\oint_K (\psi A - Bc) + \psi(x_0)\oint_K A - \oint_K B\;c(x_0); \tag{62}$$
so the first term cancels (57). We have then to find another functional (vanishing when $\Sigma_K$ has no boundary) whose variation cancels the second and third terms. It is not difficult to see that
$$\gamma^{(\Sigma_K,x_0)} = \psi(x_0)\int_{\Sigma_K} B^* + \int_{\Sigma_K} A^*\;c(x_0) \tag{63}$$
does the job. Thus, we can define the following $\sigma$-closed (actually, $\Omega$-closed) functional
$$\gamma^{(\Sigma_K,K,x_0)} = \gamma^{\Sigma_K} + \gamma^{(K,x_0)} + \gamma^{(\Sigma_K,x_0)}. \tag{64}$$
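The $\sigma$-closure of (64) can be checked term by term. The sketch below assumes that $\sigma$ acts from the right as a graded derivation, $\sigma(XY) = X\,\sigma Y + (-1)^{\epsilon(Y)}(\sigma X)\,Y$, and uses (24) at $\omega = 0$ together with Stokes' theorem on $\Sigma_K$:
$$\sigma\int_{\Sigma_K} B^* = -\oint_K A,\qquad \sigma\int_{\Sigma_K} A^* = -\oint_K B\quad\Longrightarrow\quad \sigma\gamma^{(\Sigma_K,x_0)} = -\psi(x_0)\oint_K A + \oint_K B\;c(x_0).$$
Adding this to (57) and (62) gives
$$\sigma\gamma^{(\Sigma_K,K,x_0)} = \oint_K(\psi A - Bc) + \left[-\oint_K(\psi A - Bc) + \psi(x_0)\oint_K A - \oint_K B\;c(x_0)\right] + \left[-\psi(x_0)\oint_K A + \oint_K B\;c(x_0)\right] = 0.$$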
In the case of links, which we will not consider anymore in the following, the observable has to be modified as
$$\gamma^{(\Sigma_K,K,\{\Sigma_{K_i},x_{0i}\})} = \gamma^{\Sigma_K} + \sum_i\left[\gamma^{(K_i,x_{0i})} + \gamma^{(\Sigma_{K_i},x_{0i})}\right], \tag{65}$$
where $\Sigma_K$ is a spanning surface for the link $K$ while each $\Sigma_{K_i}$ is a spanning surface only for the component $K_i$, whose base point is denoted by $x_{0i}$. Then, recalling (54), we want to consider the exponential of $\gamma^{(\Sigma_K,K,x_0)}$,
$$O_0[K, \lambda] = \exp\left(\lambda\gamma^{(\Sigma_K,K,x_0)}\right), \tag{66}$$
which is $\sigma$-closed and hence a candidate to be an observable. Actually,
$$\Delta O_0[K, \lambda] = \frac{\lambda^2}{2}\,O_0[K, \lambda]\left(\gamma^{(\Sigma_K,K,x_0)}, \gamma^{(\Sigma_K,K,x_0)}\right) \tag{67}$$
vanishes if we are working in standard framing, i.e., if
$$\mathrm{slk}(K) = \int_{\Sigma_K}\omega_K = 0, \tag{68}$$
where $\omega_K$ is the Poincaré dual of $K$ and $\mathrm{slk}$ denotes the self-linking number [whose definition via (68) relies on a choice of regularization]. With this hypothesis, we expect the v.e.v. of $O_0$ not to depend on the gauge fixing and, as a consequence, to be metric independent. By essentially the same proof that led to the invariance (modulo $\Omega$-exact terms) of $\Gamma[\omega]$, see (35), under $\omega \to \omega + d\eta$, we can prove that $O_0$ is invariant (modulo $\Omega$-exact terms) under $\Sigma_K \to \Sigma_K + \partial T$ with $T \in \Omega_3(\mathbb{R}^3)$. From (56) we expect the v.e.v. of $O_0[K, \lambda]$ to be proportional to the inverse of the Alexander–Conway polynomial. The proportionality constant, which depends on $\lambda$, could be spoiled when we send $\mathbb{R}^3\setminus\mathrm{Tub}(K)$ to $\mathbb{R}^3$; thus, we can make only the weaker statement that
$$\frac{\langle O_0[K_{\mathrm{s.f.}}, \lambda]\rangle}{\langle O_0[\bigcirc_{\mathrm{s.f.}}, \lambda]\rangle} = \frac{1}{\Delta(K; z(\lambda))},\qquad z(\lambda) = 2i\sin(\lambda/2), \tag{69}$$
where $K_{\mathrm{s.f.}}$ and $\bigcirc_{\mathrm{s.f.}}$ are, respectively, a generic knot and the unknot in standard framing. This result has to be compared with the similar formulae obtained in the context of the non-Abelian pure BF theory [14, 12]. It should not surprise us that the Abelian and non-Abelian pure BF theories are in this respect equivalent, for, as observed in [12], the v.e.v.’s of the latter can be computed exactly in saddle-point approximation (see also the observation in the Introduction). We want to point out the precise meaning of (69): We are not claiming that the coefficients of the $\lambda$-expansion of $\langle O_0\rangle$ are a sum of numerical knot invariants up to factors containing the self-linking number. We are saying that these numerical knot invariants are well defined only if the knot is in standard framing; otherwise $O_0$ is not an observable, and we are not guaranteed that its v.e.v. is a topological invariant. This means that, to compute this v.e.v., we have to choose a particular presentation of the knot, viz., one in which the self-linking number vanishes, and that this v.e.v. should be invariant only under deformations that do not change the self-linking number. In the next subsection, we will discuss how to drop this cumbersome condition.

4.4. The corrected surface-plus-knot observable. For the purposes of this subsection, it is convenient to rescale
$$B \longrightarrow B/\lambda, \tag{70}$$
so the Gibbs weight becomes $\exp(iS/\lambda)$ and we recognize $\lambda$ as the Planck constant of the theory. Under this rescaling we also have
$$O_0[K, \lambda] \longrightarrow O_0[K] = \exp\gamma^{(\Sigma_K,K,x_0)}; \tag{71}$$
thus, $O_0$ is a classical observable, i.e., it is $\sigma$-closed and does not depend on the Planck constant. Now we look for a quantum generalization
$$O[K, \lambda] = \sum_{n=0}^{\infty}(i\lambda)^n\,O_n[K] \tag{72}$$
satisfying
$$\Omega O = (\sigma - i\lambda\Delta)\,O = 0, \tag{73}$$
and hence
$$\sigma O_n = \Delta O_{n-1},\qquad n = 1, 2, \ldots. \tag{74}$$
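The step from (73) to (74) is an order-by-order comparison; spelled out with the expansion (72):
$$\Omega O = \sum_{n\ge 0}(i\lambda)^n\,\sigma O_n - \sum_{n\ge 0}(i\lambda)^{n+1}\,\Delta O_n = \sigma O_0 + \sum_{n\ge 1}(i\lambda)^n\left(\sigma O_n - \Delta O_{n-1}\right).$$
The order-zero term vanishes because $O_0$ is $\sigma$-closed, and the vanishing of each higher order in $\lambda$ is exactly (74).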
Notice that a solution $O_n$, if it exists, is defined only up to $\sigma$-closed terms. However, if we send $O_n$ into $O_n + \sigma Y_n$, the solution $O_{n+1}$ will be sent into $O_{n+1} - \Delta Y_n$ because of (11). Thus, $O$ will be changed by an $\Omega$-exact term. On the other hand, if we add a nontrivial $\sigma$-closed term to $O_n$, then this, together with the extra contribution $O_{n+1}$ receives by (74), lets $O$ get an extra $\Omega$-closed term. This means that the solution of (73), if it exists, is unique only up to $\Omega$-closed terms, i.e., up to other observables. It is not difficult to see, by (11), that, if (74) holds up to a fixed $n - 1$, then
$$\sigma\Delta O_{n-1} = 0. \tag{75}$$
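A sketch of the argument behind (75), using only (11), (74) and the nilpotency of the Laplacian stated in Sect. 2.2. For $n = 1$ one has $\sigma\Delta O_0 = -\Delta\sigma O_0 = 0$, since $O_0$ is $\sigma$-closed. For $n \ge 2$, if (74) holds at order $n - 1$, then
$$\sigma\Delta O_{n-1} = -\Delta\sigma O_{n-1} = -\Delta\left(\Delta O_{n-2}\right) = 0.$$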
Thus, the r.h.s. of (74) is $\sigma$-closed; however, to solve (74), we want it to be $\sigma$-exact, which is not guaranteed. If this happens, then the observable $O$ exists and we say that $O_0$ is not anomalous. Among the possible solutions of (73), we are interested in the ones that depend only on the triple $\Sigma_K, K, x_0$ and that reduce to $\exp\gamma^{\Sigma_K}$ when $\Sigma_K$ has no boundary. We call these solutions proper. Now notice that the action is invariant under the change of variables $(A, B) \to (B, A)$ while the observable $\gamma^{(\Sigma_K,K,x_0)}$ is odd under it. This means that the corrections can be chosen to have a well-defined parity. By induction, one can see that they can be written as integrals of products of $\mathcal{B} \wedge \mathcal{A}$ and $\tilde{\mathcal{B}} \wedge \mathcal{A}$ over submanifolds of products of configuration spaces of $\mathbb{R}^3$. Moreover, $\Delta O_n$ will have the same structure. Since $\tilde{\mathcal{B}} \wedge \mathcal{A}$ and $\mathcal{B} \wedge \mathcal{A}$ are overall two-forms (i.e., each component has form degree plus ghost number equal to two), a product of them will be an overall even form. As a consequence, the zero-ghost-number part will be an even form, while the one-ghost-number part will be an odd form. However, only the even homology spaces of the configuration spaces of $\mathbb{R}^3$ are nontrivial. Therefore, since $O_n$ has ghost number zero, it can be a non-trivial element of the $\sigma$-cohomology, whereas $\Delta O_n$, which has ghost number one and is $\sigma$-closed, must be $\sigma$-exact. Thus, we have proved the following

Theorem 3. The classical observable $O_0[K]$ is not anomalous and admits a proper extension.

By Statement 1, we expect the v.e.v. of $O$ to be a topological invariant (up to framing) of the triple $\Sigma_K, K, x_0$. If the argument proving the invariance (modulo $\Omega$-exact terms) under a deformation $\Sigma_K \to \Sigma_K + \partial T$ with $T \in \Omega_3(\mathbb{R}^3)$ goes through, we arrive at the following

Conjecture 1. The v.e.v. of a proper solution $O[K, \lambda]$ is a regular-isotopy invariant of the knot $K$.

Now write this invariant as the sum of an ambient-isotopy invariant and a regular-isotopy invariant that vanishes in standard framing. If the second contribution can be written as the v.e.v. of an $\Omega$-closed observable, then we can redefine the proper solution $O$ by subtracting this term; so we arrive at the following

Conjecture 2. There exists a proper solution whose v.e.v. is an ambient-isotopy invariant.
Eventually, since $O$ is a quantum generalization of $O_0$ to which it reduces in standard framing, and the v.e.v. of $O_0$ is expected to satisfy (69), we have our last

Conjecture 3. The proper solution $O[K, \lambda]$ of Conjecture 2 satisfies
\[ \frac{\langle O[K, \lambda] \rangle_\lambda}{\langle O[\bigcirc, \lambda] \rangle_\lambda} = \frac{1}{\Delta(K; z(\lambda))}, \qquad z(\lambda) = 2i \sin(\lambda/2), \tag{76} \]
where $\bigcirc$ denotes the unknot. If only Conjecture 1 holds, we still expect (76) to hold but only if the standard framing is chosen.

4.4.1. The first correction. In App. B, we discuss how to find the first correction to $O_0$. In particular, we show that a proper solution is given by
\[ O_1 = O_0 U_1, \tag{77} \]
with
\[ U_1 = \Delta u_1, \tag{78} \]
and
\[ u_1 = \gamma_{ABB} \int_{\Sigma_K} B^* + \gamma_{BAA} \int_{\Sigma_K} A^*, \tag{79} \]
where
Z 1 B(x) A(y) B(z), 2 x
γABB = γBAA
A1 η12 B2 η23 B3 = −
(80) (81)
Remember that $O_1$ is defined up to a σ-closed term. Our choice, (77), (78) and (79), is particularly convenient since it gives the correct v.e.v. of $O$ to the second order in λ. To see this, we first observe that any Wick contraction in the computation of a v.e.v. carries a factor λ. Thus, at order $\lambda^2$, the v.e.v. of $O_0 + i\lambda O_1$ will contain the v.e.v.'s of $\frac{1}{2}(\gamma^{(\Sigma_K,K,x_0)})^2$ and of $i\lambda U_1$. Since $\Delta U_1 = 0$, we have
\[ \Omega O_2 = 0, \tag{82} \]
with
\[ O_2 = \frac{1}{2} (\gamma^{(\Sigma_K,K,x_0)})^2 + i\lambda U_1; \tag{83} \]
therefore, no other correction is needed to make this second-order term an observable. Notice that any redefinition of $U_1$ obtained by adding a Δ-closed term will have the same property. By (74), this additional term must also be σ-closed, so it will be Ω-closed as well. Thus, as expected, $O_2$ is defined up to Ω-closed terms whose v.e.v.'s are of order $\lambda^2$. There exist only a few such terms, i.e., $\lambda^2$, $\lambda \tau[K]$ and $\tau[K]^2$, where [cf. (49)]
\[ \tau[K] = \gamma_A^K \gamma_B^K. \tag{84} \]
By (52), and remembering (70), we see that
\[ \langle O_2 + k\tau^2 + i\lambda l \tau - \lambda^2 m \rangle_\lambda = \langle O_2 \rangle_\lambda - \lambda^2 [2k (\mathrm{slk} K)^2 + l\, \mathrm{slk} K + m]; \tag{85} \]
i.e., the second order of $\langle O[K, \lambda] \rangle_\lambda$ is defined up to a quadratic function of the self-linking number of $K$. We will see in the next section that, choosing $k = -3/16$ and $l = m = 0$, Conjectures 1, 2 and 3 hold at this order.
Remark. Notice that we are allowed to add to $O_2$ only contributions of the form λ times an observable, for λ-independent contributions would change the classical part of the observable. Thus, $\tau^2$ would not be an allowed contribution. However, as shown at the end of App. B, adding $\tau^2$ to $O_2$ is equivalent to adding a λ-independent correction to $U_1$, s. (159).

5. Computation of v.e.v.'s

In this section we will describe the perturbative expansion of $\langle O_0 \rangle_\lambda$ with the gauge fixing (40). We will also explicitly compute the second-order term of $\langle O_0 \rangle$ and $\langle O \rangle$. We will see that the latter satisfies, at this order, Conjectures 1 and 3.

5.1. Gauge fixing and propagators. We will work in the covariant gauge fixing; i.e., we will choose the gauge-fixing fermion as in (40) with ω = 0. By (17) and (2), this amounts to setting
\[ A^* = *d\bar c, \qquad \bar c^* = *d{*}A, \qquad B^* = *d\bar\psi, \qquad \bar\psi^* = *d{*}B, \tag{86} \]
while all the other antifields are set to zero. Thus, the gauge-fixed action reads [cf. (42)]
\[ S_{g.f.} = \int_{\mathbb{R}^3} B \wedge dA + \langle d\bar c, dc \rangle + \langle d\bar\psi, d\psi \rangle + \langle h_c, d^*A \rangle + \langle h_\psi, d^*B \rangle, \tag{87} \]
from which we can read the propagators (we write only the ones we are interested in)
\[ \begin{aligned}
\langle A_\mu(x)\, B_\nu(y) \rangle_\lambda &= \langle B_\mu(x)\, A_\nu(y) \rangle_\lambda = i\lambda\, \frac{1}{4\pi}\, \epsilon_{\mu\nu\rho}\, \frac{(x-y)^\rho}{|x-y|^3}, \\
\langle c(x)\, \bar c(y) \rangle_\lambda &= -\langle \bar c(x)\, c(y) \rangle_\lambda = i\lambda\, \frac{1}{4\pi}\, \frac{1}{|x-y|}, \\
\langle \psi(x)\, \bar\psi(y) \rangle_\lambda &= -\langle \bar\psi(x)\, \psi(y) \rangle_\lambda = i\lambda\, \frac{1}{4\pi}\, \frac{1}{|x-y|}.
\end{aligned} \tag{88} \]
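As a numerical illustration of the kernel in (88): apart from the factor iλ, the A–B propagator is the Gauss kernel, whose double integral over two disjoint closed curves gives their linking number. The following is only a sketch and not part of the paper's construction; the function name `gauss_linking`, the midpoint quadrature and the two test circles are assumptions chosen for illustration.

```python
import math

def gauss_linking(curve1, curve2, n=200):
    """Approximate (1/4pi) * double integral of (x-y).(dx x dy)/|x-y|^3.

    curve1, curve2: functions s -> (point, tangent) on [0, 2*pi),
    each returning two 3-tuples. A midpoint rule with n nodes per curve is used.
    """
    h = 2.0 * math.pi / n
    samples2 = [curve2((j + 0.5) * h) for j in range(n)]
    total = 0.0
    for i in range(n):
        x, dx = curve1((i + 0.5) * h)
        for y, dy in samples2:
            r = (x[0] - y[0], x[1] - y[1], x[2] - y[2])
            # cross product of the two tangent vectors
            cross = (dx[1] * dy[2] - dx[2] * dy[1],
                     dx[2] * dy[0] - dx[0] * dy[2],
                     dx[0] * dy[1] - dx[1] * dy[0])
            dist = math.sqrt(r[0] ** 2 + r[1] ** 2 + r[2] ** 2)
            total += (r[0] * cross[0] + r[1] * cross[1] + r[2] * cross[2]) / dist ** 3
    return total * h * h / (4.0 * math.pi)

def circle_xy(s):
    """Unit circle in the xy-plane centred at the origin (hypothetical test curve)."""
    return (math.cos(s), math.sin(s), 0.0), (-math.sin(s), math.cos(s), 0.0)

def circle_xz(center_x):
    """Unit circle in the xz-plane centred at (center_x, 0, 0) (hypothetical test curve)."""
    def curve(t):
        return ((center_x + math.cos(t), 0.0, math.sin(t)),
                (-math.sin(t), 0.0, math.cos(t)))
    return curve

hopf = gauss_linking(circle_xy, circle_xz(1.0))      # the two circles are linked once
unlinked = gauss_linking(circle_xy, circle_xz(5.0))  # the two circles are far apart
print(abs(hopf), abs(unlinked))
```

The sign of the result depends on the chosen orientations, so only absolute values are printed. For a single knot the same kernel is singular on the diagonal, which is why the point-splitting prescription of Subsect. 5.1.4 is needed.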
Notice that the propagators are exactly the same as in Chern–Simons theory [21, 4].

5.1.2. Parity. We have already observed that, under $(A, B) \to (B, A)$, the action is left unchanged while $\gamma^{(\Sigma_K,K,x_0)}$ changes sign. As a consequence, in the gauge-fixing defined by (86), the propagators (88) are invariant under
\[ (A, B, c, \bar c, \psi, \bar\psi, h_c, h_\psi) \to (B, A, \psi, \bar\psi, c, \bar c, h_\psi, h_c). \tag{89} \]
Thus, all the terms in perturbation theory that are odd under (89) [like, e.g., the v.e.v. of (γ (ΣK ,K,x0 ) )2n+1 ] will vanish. 5.1.3. Supersymmetry. Another observation that will simplify the discussion of the perturbation theory concerns the supersymmetry of the action (87) (which is the same that holds in its non-Abelian generalization [24]); viz., we can define a fermionic vector operator Q, i.e., an operator satisfying [Qα , Qβ ]+ = Qα Qβ + Qβ Qα = 0,
(90)
which annihilates the action: QS = 0.
(91)
Actually, there exist two such operators (which, moreover, anticommute with each other); the first one acts as
\[ \begin{aligned}
(QA)_{\alpha\beta} &= \epsilon_{\alpha\beta\gamma}\, \partial^\gamma \bar\psi, \\
(QB)_{\alpha\beta} &= \epsilon_{\alpha\beta\gamma}\, \partial^\gamma \bar c, \\
(Qc)_\alpha &= -A_\alpha, \\
(Q\psi)_\alpha &= -B_\alpha, \\
(Q\bar c)_\alpha &= 0, \\
(Q\bar\psi)_\alpha &= 0, \\
(Qh_c)_\alpha &= \partial_\alpha \bar c, \\
(Qh_\psi)_\alpha &= \partial_\alpha \bar\psi.
\end{aligned} \tag{92} \]
The second operator is obtained by exchanging $(c, \bar c, h_c)$ with $(\psi, \bar\psi, h_\psi)$.

A consequence of this supersymmetry is that the v.e.v. of a Q-exact functional vanishes. We want to point out that this supersymmetry is peculiar to $\mathbb{R}^3$, but holds (with a different Q-operator) also for other gauge fixings.

5.1.4. Regularization. The propagators (88) diverge as the two points where the fields are evaluated approach each other. The non-regularized v.e.v.'s of our observables are integrals of these propagators over products of $C_k(K\backslash x_0)$ and $\Sigma_K$. To avoid divergences we have to give a prescription to split the points in these integrals. Our choice will essentially follow the approach of [10] with some important modifications due to the presence of the surface $\Sigma_K$ (s. also [23]). The idea is to start by defining the Fulton–MacPherson [20] compactification $C_n(\bar{\mathbb{R}}^3)$ of the configuration space of $n$ points in $\bar{\mathbb{R}}^3$, where the latter is the compactification of $\mathbb{R}^3$ obtained by replacing the infinity with its blow-up. Then, denoting by $B^2$ a two-dimensional surface whose boundary is diffeomorphic to $S^1$, we can consider the following imbeddings of compact spaces
\[ \mathrm{pt} \hookrightarrow S^1 \hookrightarrow B^2 \hookrightarrow \bar{\mathbb{R}}^3, \tag{93} \]
where pt is a base point on the sphere $S^1$ which is mapped to the boundary of $B^2$. This allows us to define the configuration space $C_n^t$ of $n$ points on the knot distinct from the base point and $t$ points on its spanning surface. Notice that the points on the knot can be ordered, so $C_n^t$ has $n!$ connected components. We will denote by $\tilde C_n^t$ the identity component (i.e., the component with points on $S^1$ ordered as 0, 1, 2, . . . , n).

Our regularization prescription to compute the v.e.v.'s will be to replace $C_k(K\backslash x_0)^n \times (\Sigma_K)^t$ with $C_{kn}^t$. Moreover, we will rewrite the propagators (88) as
\[ \begin{aligned}
\langle A_i \wedge B_j \rangle_\lambda &= \langle B_i \wedge A_j \rangle_\lambda = i\lambda\, \theta_{ij}, \\
\langle c_i\, (*d\bar c)_j \rangle_\lambda &= \langle (*d\bar c)_i\, c_j \rangle_\lambda = -i\lambda\, \theta_{ij}, \\
\langle \psi_i\, (*d\bar\psi)_j \rangle_\lambda &= \langle (*d\bar\psi)_i\, \psi_j \rangle_\lambda = -i\lambda\, \theta_{ij},
\end{aligned} \tag{94} \]
where $\theta_{ij}$ is the tautological form on $\mathbb{R}^3$ defined as the pullback of the SO(3)-invariant unit volume element on $S^2$ by the map
\[ \phi_{ij}(x) = \frac{x_j - x_i}{|x_j - x_i|}, \qquad x \in C_n(\bar{\mathbb{R}}^3). \tag{95} \]
Notice that $\theta_{ij}$ satisfies
\[ d\theta_{ij} = 0, \qquad \theta_{ij}^2 = 0, \qquad \theta_{ji} = -\theta_{ij}. \tag{96} \]
To compute our v.e.v.'s, we will have to integrate these two-forms as well as the tautological zero-forms $\eta_{ij}$ [appearing in (61)] over some $C_n^t$. If we choose the identity component $\tilde C_n^t$, we can eliminate the zero-forms η. Thus, we can represent the contributions to our v.e.v.'s graphically as follows: we represent the knot as a horizontal line (which we suppose directed from left to right) with the base point on its boundary and the spanning surface as the portion of plane above it, and the two-forms $\theta_{ij}$ as arrows connecting the point $i$ to the point $j$ (s. Fig. 1).
Fig. 1. Some examples of diagrams with nonvanishing v.e.v.
We can give an even better description (cf. [10]) if we introduce the bundles
\[ \tilde C_n^t(S^1 \times P \times S) \xrightarrow{\;p\;} S^1 \times P \times S, \tag{97} \]
where $P$ is the space of the maps $S^1 \to S^1$ homotopic to the identity and $S$ is the space of imbeddings of $B^2$ in $\mathbb{R}^3$, and the fiber is $\tilde C_n^t$. Now we can see a generic v.e.v. as the sum of contributions which read
\[ I = p_*[I], \tag{98} \]
where $p_*$ denotes the push forward along the fiber, and $[I]$ is a form on the bundle. A locally constant function on the base space, i.e., a function whose differential vanishes, will be a topological invariant on $S$, for a locally constant function is a constant on $S^1 \times P$, which are connected. If, moreover, this topological invariant does not depend on cohomological deformations of the surface, it will eventually be an invariant of its boundary. As in [10], the differential of $I$ can be written, by Stokes's theorem, as
\[ dI = p_* d[I] + p^\partial_* [I] = p^\partial_* [I], \tag{99} \]
where $p^\partial_*$ denotes the push forward along the boundary of the fiber. It is useful to distinguish on this boundary between principal and hidden faces. The principal faces are essentially of four types:

1. two points on the knot collapse together;
2. one point on the knot collapses to the base point;
3. two points on the surface collapse together;
4. one point on the surface collapses on the knot where either (a) there are no points, or (b) there is one point, or (c) there is the base point.

All the other components of the boundary (viz., when more points come together) are referred to as the hidden faces. Among the principal faces, we will call simple the ones of type 1, 2 and 3a.
The principal-face contribution $\delta I$ to $dI$ can be evaluated "graphically" just by looking at the diagrams. What is not immediate is seeing whether the push forward along the hidden faces vanishes. However, in App. C we prove a vanishing theorem for the push forward along all faces but the simple principal faces (s. Thm. 7).

5.2. The perturbative expansion of $\langle O_0 \rangle$. By (86), the gauge-fixed observable $\gamma^{(\Sigma_K,K,x_0)}$ (64) now reads
\[ \gamma^{(\Sigma_K,K,x_0)}_{g.f.} = \gamma^{\Sigma_K}_{g.f.} + \gamma^{(K,x_0)}_{g.f.} + \gamma^{(\Sigma_K,x_0)}_{g.f.}, \tag{100} \]
with
\[ \gamma^{\Sigma_K}_{g.f.} = \int_{\Sigma_K} [B \wedge A + (*d\bar\psi)\psi + c\,(*d\bar c)], \tag{101} \]
\[ \gamma^{(K,x_0)}_{g.f.} = \int_{C_2(K\backslash x_0)} A_1\, \eta_{12} \wedge B_2, \tag{102} \]
\[ \gamma^{(\Sigma_K,x_0)}_{g.f.} = \psi(x_0) \int_{\Sigma_K} (*d\bar\psi) + \int_{\Sigma_K} (*d\bar c)\; c(x_0). \tag{103} \]
5.2.5. The general structure of the perturbative expansion. The first thing we notice is that all these functionals are odd under (89). Thus, only an even product of them will have a nonvanishing v.e.v. This, by the way, proves that $\langle O_0 \rangle_\lambda$ is even in λ, in accord with (69).

The second observation is that, by (92), the functional $\gamma^{\Sigma_K}_{g.f.}$ is Q-exact, viz.,
\[ \gamma^{\Sigma_K}_{g.f.} = \int_{\Sigma_K} dx^\alpha \wedge dx^\beta\, [Q(\psi A - Bc)]_{\alpha\beta}; \tag{104} \]
thus, the v.e.v. of any of its powers vanishes. This implies that no loops appear among the v.e.v.'s we are computing. In Fig. 2a, we show one of these loops. Notice that, if we were working on a less trivial manifold, (87) would not be supersymmetric, so such loops would exist. As a matter of fact, the v.e.v. considered in (56) consists entirely of such loop diagrams.

Fig. 2. Elements that appear in the diagrams: a) a loop; b) a 5-chord; c) a 4-flagellum

The third observation is that $\gamma^{(\Sigma_K,x_0)}_{g.f.}$ is linear in the Grassmann variables $c(x_0)$ and $\psi(x_0)$. Thus, its square simply reads
\[ (\gamma^{(\Sigma_K,x_0)}_{g.f.})^2 = \psi(x_0) \int_{\Sigma_K} (*d\bar\psi) \int_{\Sigma_K} (*d\bar c)\; c(x_0), \tag{105} \]
while all higher powers vanish.

We can now describe the features of the perturbative expansion of $\langle O_0 \rangle_\lambda$. In the v.e.v.'s where no $\gamma^{(\Sigma_K,x_0)}_{g.f.}$ appears, we have to Wick contract the fields $A$ and $B$ on $K$ and on $\Sigma_K$ together in all possible ways, discarding all the diagrams that contain a loop. Thus, we are left with chains that connect two points on $K$ through a certain number of bivalent vertices on $\Sigma_K$. We will call such a chain an n-chord, where $n > 0$ is the number of links (s. Fig. 2b).

If the v.e.v. contains $\gamma^{(\Sigma_K,x_0)}_{g.f.}$, besides the chords described above, we will have a chain connecting $x_0$ with a point on $\Sigma_K$ through a certain number of bivalent vertices on $\Sigma_K$. We will call such a chain an n-flagellum (s. Fig. 2c).

Eventually, we have v.e.v.'s containing $(\gamma^{(\Sigma_K,x_0)}_{g.f.})^2$. They contain some chords and two flagella.

For a v.e.v. not to vanish, the total number of links in all the chords must be even; moreover, the total number of links in the flagella must be even. In Figs. 1 and 3, some diagrams of nonvanishing v.e.v.'s are shown. At order $2n$ in λ, we will have a sum of diagrams of this kind with a total number of links equal to $2n$.
Fig. 3. The second-order diagrams
From the structure of the perturbative expansion, it is easy to see that, if $\langle O_0 \rangle_\lambda$ is a topological invariant, then it is invariant under $\Sigma_K \to \Sigma_K + \partial T$ with T ∈ Ω3(R3). In fact, if we have a topological invariant, we can move the region where this deformation occurs to infinity. Since all the vertices on $\Sigma_K$ are connected through a finite number of links to a point on $K$, this region at infinity does not contribute. This, of course, would not be true if loops on $\Sigma_K$ were allowed.

As a final remark, we notice that, if we represent the unknot in standard framing as a planar curve and choose its spanning surface to belong to the same plane, then
\[ \langle O_0[\bigcirc_{s.f.}, \lambda] \rangle = 1. \tag{106} \]
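5.2.6. The second order. Now we want to compute explicitly the v.e.v. of $\frac{1}{2}(\gamma^{(\Sigma_K,K,x_0)})^2$. By the supersymmetry argument, we know that the v.e.v. of $(\gamma^{\Sigma_K})^2$ vanishes. Moreover, the v.e.v. of $\gamma^{(K,x_0)} \gamma^{(\Sigma_K,x_0)}$ is equal to the product of the v.e.v.'s of $\gamma^{(K,x_0)}$ and $\gamma^{(\Sigma_K,x_0)}$, which vanish. Thus, we are left with only four contributions, which, after some computations, can be written as
\[ \begin{aligned}
\langle \tfrac{1}{2} (\gamma^{(K,x_0)})^2 \rangle_\lambda &= \lambda^2 (M - C), \\
\langle \gamma^{(K,x_0)}\, \gamma^{\Sigma_K} \rangle_\lambda &= \lambda^2 V, \\
\langle \tfrac{1}{2} (\gamma^{(\Sigma_K,x_0)})^2 \rangle_\lambda &= \lambda^2 I_0, \\
\langle \gamma^{\Sigma_K}\, \gamma^{(\Sigma_K,x_0)} \rangle_\lambda &= -2\lambda^2 J_0,
\end{aligned} \tag{107} \]
where the diagrams $M$, $C$, $V$, $I_0$ and $J_0$ are shown in Fig. 3. Explicitly they read
\[ \begin{aligned}
M &= \int_{\tilde C_4^0} \theta_{12} \wedge \theta_{34}, \\
C &= \int_{\tilde C_4^0} \theta_{14} \wedge \theta_{23}, \\
V &= \int_{\tilde C_2^1} \theta_{13} \wedge \theta_{23}, \\
I_0 &= \int_{\tilde C_0^2} \theta_{01} \wedge \theta_{02}, \\
J_0 &= \int_{\tilde C_0^2} \theta_{01} \wedge \theta_{12}.
\end{aligned} \tag{108} \]
Thus, the second order of $\langle O_0 \rangle_\lambda$ reads
\[ \langle \tfrac{1}{2} (\gamma^{(\Sigma_K,K,x_0)})^2 \rangle_\lambda = \lambda^2 (M - C + V + I_0 - 2J_0). \tag{109} \]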
5.3. The v.e.v. of the corrected observable. As explained in Subsect. 4.3, we do not expect $\langle O_0 \rangle_\lambda$ to be a knot invariant (at least not with a general framing), since, in general, $O_0$ is not an observable. In Subsect. 4.4, we have seen that there is a procedure that leads to an observable $O$ starting from $O_0$. We have computed the first correction (77) explicitly and have shown that the corrected second-order v.e.v. is given by the v.e.v. of the observable $O_2$ defined in (83). In this section, we will compute this v.e.v. explicitly and show that it is a knot invariant.

5.3.7. The v.e.v. of $O_2$. To evaluate the v.e.v. of the correction $i\lambda U_1$, we notice that, by (78) and Statement 1,
\[ \langle i\lambda U_1 \rangle_\lambda = \langle i\lambda \Delta u_1 \rangle_\lambda = \langle \sigma u_1 \rangle_\lambda. \tag{110} \]
By (79), we have then
\[ \langle i\lambda U_1 \rangle_\lambda = \tilde U_1 + \tilde U_2, \tag{111} \]
with
\[ \begin{aligned}
\tilde U_1 &= \Big\langle \gamma_{ABB}\, \sigma \int_{\Sigma_K} B^* + \gamma_{BAA}\, \sigma \int_{\Sigma_K} A^* \Big\rangle_\lambda, \\
\tilde U_2 &= -\Big\langle (\sigma \gamma_{ABB}) \int_{\Sigma_K} B^* + (\sigma \gamma_{BAA}) \int_{\Sigma_K} A^* \Big\rangle_\lambda.
\end{aligned} \tag{112} \]
Finally, an explicit evaluation of these v.e.v.'s yields
\[ \tilde U_1 = 2\lambda^2 (X - 2M - C), \qquad \tilde U_2 = \lambda^2 (H_l + H_r), \tag{113} \]
where the new diagrams $X$, $H_l$ and $H_r$ are shown in Fig. 4. Explicitly they read
\[ \begin{aligned}
X &= \int_{\tilde C_4^0} \theta_{13} \wedge \theta_{24}, \\
H_l &= \int_{\tilde C_2^1} \theta_{12} \wedge \theta_{13}, \\
H_r &= \int_{\tilde C_2^1} \theta_{21} \wedge \theta_{23}.
\end{aligned} \tag{114} \]

Fig. 4. The second-order correction diagrams

Therefore, the correction to $\langle O_0 \rangle_\lambda$ reads
\[ \langle i\lambda U_1 \rangle_\lambda = \lambda^2 (2X - 4M - 2C + H_l + H_r). \tag{115} \]
To get the complete v.e.v. of $O_2$, we have to add (115) to (109). First, however, it is useful to notice that the square of the self-linking number
\[ \mathrm{slk} K = 2 \int_{\tilde C_2^0} \theta_{12} \tag{116} \]
can be written as
\[ (\mathrm{slk} K)^2 = 8 (M + C - X); \tag{117} \]
thus,
\[ \langle O_2 \rangle_\lambda = -\frac{3}{8} \lambda^2 (\mathrm{slk} K)^2 + \lambda^2 w_2, \tag{118} \]
with
\[ w_2 = -X + V + H_l + H_r + I_0 - 2J_0. \tag{119} \]
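The step from (109) and (115) to (118) and (119) is pure bookkeeping of diagram coefficients, and it can be checked mechanically. The sketch below treats each diagram name as a formal symbol; the helper `combine` and the dictionary encoding are assumptions made for this illustration only.

```python
from fractions import Fraction

def combine(*terms):
    """Sum (factor, {diagram: coefficient}) pairs and drop zero entries."""
    out = {}
    for factor, combo in terms:
        for name, coeff in combo.items():
            out[name] = out.get(name, Fraction(0)) + Fraction(factor) * coeff
    return {name: coeff for name, coeff in out.items() if coeff != 0}

# Coefficients of lambda^2, read off from the numbered equations.
order2_O0 = {"M": 1, "C": -1, "V": 1, "I0": 1, "J0": -2}        # (109)
correction = {"X": 2, "M": -4, "C": -2, "Hl": 1, "Hr": 1}       # (115)
slk_squared = {"M": 8, "C": 8, "X": -8}                         # (117)
w2 = {"X": -1, "V": 1, "Hl": 1, "Hr": 1, "I0": 1, "J0": -2}     # (119)

lhs = combine((1, order2_O0), (1, correction))           # (109) + (115)
rhs = combine((Fraction(-3, 8), slk_squared), (1, w2))   # right-hand side of (118)
print(lhs == rhs)  # prints True
```

The same bookkeeping shows why $k = -3/16$ is singled out in (85): the shift $-2k\lambda^2 (\mathrm{slk} K)^2$ then cancels the $-\frac{3}{8}\lambda^2 (\mathrm{slk} K)^2$ term of (118).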
We conclude this subsection by noticing that, if we take the unknot as a planar curve and choose its spanning surface to lie in the same plane, it is immediately proved that
\[ w_2(\bigcirc) = 0. \tag{120} \]
Shortly we will prove that $w_2$ is a knot invariant; thus, (120) actually holds for any presentation of the unknot.

5.3.8. The invariance of the second-order term. Now we will show that the principal-face variation of $w_2$ vanishes; we will follow the approach of Ref. [10], which we have recalled in Subsect. 5.1.4. Referring to Figs. 3, 4 and 5, we start considering the principal-face contributions
\[ \begin{aligned}
\delta X &= c_l - m + c_r, \\
\delta V &= c_l + m + c_r - 2v + \ell_l + \ell_r, \\
\delta H_l &= -m + h - h_l + h_r - \ell_l, \\
\delta H_r &= -m - h - h_l + h_r - \ell_r, \\
\delta I_0 &= 2h_l + 2\ell_0, \\
\delta J_0 &= -v + h_r + \ell_0,
\end{aligned} \tag{121} \]
which by (119) imply
\[ \delta w_2 = 0. \tag{122} \]
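The cancellation leading from (121) to (122) can be verified term by term. In the sketch below the principal-face diagrams are treated as formal symbols; the string labels (`"ll"`, `"lr"`, `"l0"` for $\ell_l$, $\ell_r$, $\ell_0$) and the helper `combine` are assumptions introduced for the illustration.

```python
def combine(*terms):
    """Sum (factor, {symbol: coefficient}) pairs and drop zero entries."""
    out = {}
    for factor, combo in terms:
        for name, coeff in combo.items():
            out[name] = out.get(name, 0) + factor * coeff
    return {name: coeff for name, coeff in out.items() if coeff != 0}

# Principal-face variations, transcribed from (121).
delta = {
    "X":  {"cl": 1, "m": -1, "cr": 1},
    "V":  {"cl": 1, "m": 1, "cr": 1, "v": -2, "ll": 1, "lr": 1},
    "Hl": {"m": -1, "h": 1, "hl": -1, "hr": 1, "ll": -1},
    "Hr": {"m": -1, "h": -1, "hl": -1, "hr": 1, "lr": -1},
    "I0": {"hl": 2, "l0": 2},
    "J0": {"v": -1, "hr": 1, "l0": 1},
}

# w2 = -X + V + Hl + Hr + I0 - 2 J0, as in (119).
w2 = {"X": -1, "V": 1, "Hl": 1, "Hr": 1, "I0": 1, "J0": -2}

delta_w2 = combine(*[(coeff, delta[name]) for name, coeff in w2.items()])
print(delta_w2)  # prints {} : every principal-face term cancels
```

Note that the cancellation holds with $\ell_l$, $\ell_r$ and $\ell_0$ kept as independent symbols, so it does not rely on their separate vanishing.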
It should be clear from Fig. 5 what $c_l$, $m$, $c_r$, $v$, $h_l$ and $h_r$ mean. To write the diagrams $h$, $\ell_l$, $\ell_0$ and $\ell_r$, we need to introduce explicitly the map
\[ \Phi : B^2 \to \mathbb{R}^3 \tag{123} \]
that defines the surface $\Sigma_K$.

Fig. 5. The principal-face diagrams

The diagram $h$ is given by
\[ h = \int_{\tilde C_1^1} \theta_{12} \wedge \theta_{11}, \tag{124} \]
where $\theta_{11}$ is the pull back of the volume form ω through the map
\[ \phi_{11}(x) = \frac{\dot\Phi(x_1)}{|\dot\Phi(x_1)|}, \qquad x \in \tilde C_1^1, \tag{125} \]
and $\dot\Phi$ denotes the derivative of $\Phi$ in the direction tangent to the knot (notice that $\Phi(x_1)$ is on the knot). To describe the remaining diagrams, we have also to introduce $\Phi'$, i.e., the derivative of $\Phi$ w.r.t. the other coordinate in the parametrization of the surface. In general the vector $\Phi'$ will not be orthogonal to $\dot\Phi$. To obtain an orthogonal vector we define
\[ \Phi'_\perp = \Phi' - \frac{\Phi' \cdot \dot\Phi}{|\dot\Phi|^2}\, \dot\Phi. \tag{126} \]
Then the diagrams $\ell_l$ and $\ell_r$ read
\[ \ell_l = \int_{\tilde C_2^0} \Big[ \theta_{12} \wedge \int_{U_2} \theta_2 \Big], \qquad \ell_r = \int_{\tilde C_2^0} \Big[ \theta_{12} \wedge \int_{U_1} \theta_1 \Big], \tag{127} \]
where the one-dimensional manifolds $U_i$ are defined as
\[ U_i = \{ (u_1, u_2^1, u_2^2) \in \mathbb{R} \times \mathbb{R} \times \mathbb{R}^+ \mid [(u_1)^2 + (u_2^1)^2]\, |\dot\Phi(x_i)|^2 + (u_2^2)^2\, |\Phi'_\perp(x_i)|^2 = 1, \; u_1 + u_2^1 = 0 \}, \tag{128} \]
and $\theta_i$ is the pull back of ω to $U_i$ through the map
\[ \phi_i(u_1, u_2^1, u_2^2) = \frac{(u_2^1 - u_1)\, \dot\Phi(x_i) + u_2^2\, \Phi'_\perp(x_i)}{|(u_2^1 - u_1)\, \dot\Phi(x_i) + u_2^2\, \Phi'_\perp(x_i)|}. \tag{129} \]
Finally,
\[ \ell_0 = \int_{\tilde C_0^1} \Big[ \theta_{01} \wedge \int_{\tilde U_0} \tilde\theta_0 \Big], \tag{130} \]
where the one-dimensional manifold $\tilde U_0$ is defined as
\[ \tilde U_0 = \{ (u^1, u^2) \in \mathbb{R} \times \mathbb{R}^+ \mid (u^1)^2\, |\dot\Phi(x_0)|^2 + (u^2)^2\, |\Phi'_\perp(x_0)|^2 = 1 \}, \tag{131} \]
and $\tilde\theta_0$ is the pull back of ω to $\tilde U_0$ through the map
\[ \tilde\phi_0(u^1, u^2) = \frac{u^1\, \dot\Phi(x_0) + u^2\, \Phi'_\perp(x_0)}{|u^1\, \dot\Phi(x_0) + u^2\, \Phi'_\perp(x_0)|}. \tag{132} \]
Notice that in (121) we have written only the nonvanishing contributions. (Actually, more sophisticated arguments, s. App. C, show that also $\ell_l$, $\ell_r$ and $\ell_0$ vanish.) All other possible terms vanish for one of the following reasons:

1. we have to integrate a form on a space of lower dimension;
2. a factor $\theta_{ij}^2$ appears, or
3. the push forward vanishes because of a symmetry.

An example of the first case is the push forward of $[V]$ along the face obtained by sending 3 to 0 which gives
\[ \int_{\tilde U_0} \Big[ \int_{\tilde C_2^0} \theta_{10} \wedge \theta_{20} \Big]. \]
The second case happens, e.g., when we push forward $[V]$ along the face where we send 1 to 2. The third case occurs in the push forward of $[J_0]$ when we send 1 to 2, the symmetry being the exchange of 1 with 2 which does not reverse the orientation of the manifold we are integrating over but changes the sign of the form to be integrated.

For the same reasons, the push forwards of $[X]$, $[I_0]$, $[J_0]$ and $[H_l] + [H_r]$ along the hidden faces vanish. The only nontrivial case is the push forward of $[V]$ along the hidden face where 1, 2 and 3 come together. This case is analyzed in App. C, and a vanishing theorem is proved. These results together with (122) prove the following

Theorem 4. The corrected second-order term $w_2$ is a topological invariant of the imbedding $\Sigma_K$ of $B^2$ in $\mathbb{R}^3$.

As a consequence, if we deform the imbedding $\Sigma_K$ by adding to it the boundary of a three-cycle, we can always move this deformation to infinity. Since all the vertices on $\Sigma_K$ are connected through at most two θ's to a point living on $K$, the deformation at infinity will not contribute. Therefore, $w_2$ actually depends only on $K$, and we have the following

Theorem 5. The corrected second-order term $w_2$ is a knot invariant.

The chord-diagram contribution $-X$ to $w_2$ is exactly the same that appears in the invariant studied in [21, 4]. This invariant is known to be equal to the second coefficient $a_2$ of the Alexander–Conway polynomial plus a constant term (viz., the value it takes on the unknot). Notice that the chord diagram $-X$ alone is not a knot invariant. To get a knot invariant we have to add to it either the other terms that define $w_2$ (let us call $W$ their sum) or the diagram $Y$ considered in [21, 4, 10] (viz., a diagram with a trivalent vertex in $\mathbb{R}^3$). Since both $-X + W$ and $-X + Y$ are knot invariants, also $T = W - Y$ is a knot invariant. Our claim is that $T$ is trivial (i.e., it is the same for all knots).
To prove it, it is enough to check that $T$ takes the same value on two knots $K_+$ and $K_-$ that differ only around a chosen crossing. We notice that the difference $T(K_+) - T(K_-)$ comes from a singularity at the crossing point where the flip occurs, as in [4], or along the line where the two spanning surfaces get to intersect. However, it is not difficult to check that such singularities do not arise; so $T$ is a constant. Therefore, $w_2$ is equal to $a_2$ plus a constant. However, since by definition $a_2(\bigcirc) = 0$, (120) implies
\[ w_2 = a_2, \tag{133} \]
and Conjecture 3 is satisfied at this order.

As a concluding remark, we notice that in passing from $Y$ to $W$ one of the integrations on the knot is replaced by an integration on the spanning surface; so it should be possible to relate $W$ and $Y$ directly via Stokes's theorem.

5.3.9. Higher orders. Theorem 3 ensures that a quantum observable $O$ extending $O_0$ exists. Its v.e.v. at order $\lambda^n$ will be given by diagrams containing $n$ propagators connecting points on the knot and/or on the spanning surface. Of course, the restrictions given in Subsect. 5.2.5 for the v.e.v. of $O_0$ do not hold anymore; in particular, the vertices on the knot will not necessarily be univalent and the vertices on the spanning surface will not necessarily be bivalent (we have already seen a counterexample at the second order). However, no loops on the surface will appear (since the corrections must vanish when the spanning surface is boundaryless). Moreover, since the v.e.v. of $O_0$ vanishes at odd order, we do not need odd-order corrections.

The combinatorics of these diagrams will be dictated by the specific form of the corrected observable $O$. What we expect, by field-theoretical arguments, is that these combinations of diagrams will be metric independent, i.e., will be the sum of invariants possibly times powers of the self-linking number, i.e., "isolated chords" in the diagrams. The true invariants will then be obtained by factorizing the isolated chords. A rigorous mathematical proof that they are actually knot invariants will simply require checking that the principal-face contributions of the diagrams that sum up cancel each other, for Thm. 7 in App. C.2 ensures that the push forwards along hidden faces always vanish.

Notice that now we could also throw away the BF field theory and directly study δ-closed combinations of diagrams with vertices on the knot and/or on the spanning surface. By Thm. 7, these will yield knot invariants as well as higher-degree cohomology classes on the space of imbeddings (the degree being given by $2l - n - 2t$ where $l$ is the number of propagators, $n$ the number of points on the knot and $t$ the number of points on the surface).
6. A Glimpse to Higher Dimensions

There is no problem in defining the Abelian BF theory in any dimension: just take $A$ and $B$ to be fields taking values in $\Omega^p(M)$ and $\Omega^q(M)$ respectively, with $p + q + 1 = d$ and $d = \dim M$. The classical action (19) can easily be extended to a BV action. The partition function then is known to be equal to the Ray–Singer torsion or to its inverse (depending on $p$) [29, 8]. Moreover, it is not difficult to generalize the observable (53), where now $\Sigma \in H_{d-1}(M, \partial M)$. The classical part of this observable reads
\[ \gamma^{\Sigma}_{cl} = \int_\Sigma B \wedge A, \tag{134} \]
and satisfies
\[ s\gamma^{\Sigma}_{cl} = 0, \qquad \text{on shell}, \tag{135} \]
where on shell means modulo the classical equations of motion,
\[ dA = 0, \qquad dB = 0, \tag{136} \]
and $s$ is the BRST operator,
\[ sA = dc, \qquad sB = d\psi \tag{137} \]
(now $c$ and $\psi$ are a $(p-1)$- and a $(q-1)$-form respectively). If $\Sigma_K$ is a spanning surface for a $(d-2)$-knot $K$ (i.e., an imbedding of $S^{d-2}$ in $\mathbb{R}^d$), then
\[ s\gamma^{\Sigma_K}_{cl} = \oint_K [\psi \wedge A + (-1)^q B \wedge c], \qquad \text{on shell}. \tag{138} \]
To get an on-shell $s$-closed functional, we have to add to $\gamma^{\Sigma_K}$ another term canceling the r.h.s. of (138). We first notice that $\gamma^{(K,x_0)}$ as in (61) can be generalized in any dimension, where now η is the tautological $(d-3)$-form on the configuration space of $K\backslash x_0$. An explicit computation shows that
\[ s\gamma^{(K,x_0)}_{cl} = (-1)^{d+1+p} \oint_K [\psi \wedge A + (-1)^{q+d+1} B \wedge c], \qquad \text{on shell}. \tag{139} \]
Therefore, in odd dimension we can define the following on-shell $s$-closed functional
\[ \gamma^{(\Sigma_K,K,x_0)}_{cl} = \gamma^{\Sigma_K}_{cl} + (-1)^{p+1} \gamma^{(K,x_0)}_{cl}. \tag{140} \]
Then, starting from (140), the BV procedure will yield a σ-closed observable. Finally, we would like to consider an object like O0 in (66), for its v.e.v. should be related to the Alexander–Conway polynomial (or its inverse, depending on p). Of course, we do not expect O0 to be an observable, so we should look for corrections as explained in Subsect. 4.4. Notice that in any dimension it is possible to define linear combinations A and B generalizing (26), (27) and (28) (and including the whole set of ghosts for ghosts). Moreover, in odd dimension the classical action is invariant under (A, B) → (B, A) while γcl(ΣK ,K,x0 ) is odd under it. Thus, their BV extensions will share the same property under (A, B) → (B, A). This leads to proving a generalization of Thm. 3 in Subsect. 4.4 stating that O0 is never anomalous. We have only to check that the form degree of the one-ghost-number component of any form with well-defined parity under the above transformation never matches with the dimension of a nontrivial homology space of Cn (Rd ). As a matter of fact, these dimensions are multiples of (d − 1). On the other hand, forms with well-defined parity are obtained by products of B ∧ A and Be ∧ A, both of which are overall (d − 1)-forms, times a certain number r of tautological (d − 3)forms η; thus, the form degree of the one-ghost-number component will be congruent to −1 − 2r mod (d − 1). Then our claim follows from the fact that 2r + 1 ≡ 0 mod(d − 1) has no solutions if d is odd. The v.e.v. of O should then yield metric-independent functionals of the knot and its spanning surface. Eventually, if a vanishing theorem holds, these functionals will be knot invariants. Hence, we could compute numerical knot invariants (presumably the
coefficients of the Alexander–Conway polynomial or its inverse) in any odd dimension in terms of integrals over the configuration spaces of points on the knot and on its spanning surface. Via Stokes’s theorem, the second-order invariants should correspond, up to a constant term, to the ones proposed in [9]. As a final remark, we notice that in three dimensions we could have chosen A to be a zero-form and B a two-form. In this case, the v.e.v. of O should give directly the Alexander–Conway polynomial instead of its inverse. 7. Conclusions In this paper we have considered a new way of obtaining knot invariants from a TQFT. The nice feature of our theory is that it is Abelian. What makes things non-trivial is a rather involved observable, which can be defined only in the context of BV formalism; yet, as observed in the last section, it can be generalized in any odd dimension. In the three-dimensional case, we have shown that at the second order the theory actually produces a numerical knot invariant which, despite the fact it is not new, comes out written in an entirely new way. The next task is to find the other corrections to the observable and to evaluate higherorder v.e.v.’s, in three dimensions as well as in any odd dimension. Of course, an alternative way would be working directly on the space of surfaceplus-knot diagrams, as described in Subsect. 5.1.4, and try to find combinations whose differential vanishes. This would allow studying higher-degree forms on the space of imbeddings as well. Notice that while the Chern-Simons and the BF theory with a cosmological term produce the whole set of HOMFLY polynomials and their “colored" generalizations, pure BF theories, both Abelian and non-Abelian, give only the Alexander–Conway polynomial. However, it is possible that even more involved observables exist whose v.e.v. is a more general knot invariant. As a matter of fact, pure BF theory comes out naturally as a particular limit of the v.e.v. 
of a cabled Wilson loop in the theory with a cosmological term [12]. This limit corresponds to the first diagonal in the (h, d) expansion of the colored Jones function [25]. A generalization of the computation done in [12] should give the observables whose v.e.v.’s correspond to the upper diagonals in this expansion. Then a careful study of the “Abelianizing limit" described in the Introduction should yield the corresponding observables for the Abelian theory. A. The Treatment of the Harmonic Zero Modes If the ω-Laplacian d∗ω dω + dω d∗ω has zero modes, the formulae in Sect. 3 have to be slightly modified. We will essentially follow the approach explained in [8], adapting it to the BV formalism. Notice that we suppose here that H 1 (M, dω ) is not trivial, but we go on assuming that H 0 (M, dω ) is trivial since we want the symmetry group to act freely. The first step is to modify the BV action (23) so as to include the symmetry obtained by adding an ω- [(−ω)-]harmonic form to A [B]; viz., bX 1 [ω] Z † † ω ω ∗ α ∗ α α α ¯ (141) (A ∧ k ϕα + B ∧ p ϕα ) + kα χ + p¯α π , S −→ S + α=1
M
where • b1 [ω] = dim H 1 (M, dω ); • {ϕα } [{ϕα }] is an orthonormal basis of ω- [(−ω)-]harmonic one-forms; it is convenient to choose the normalization
1 (142) hϕα , ϕβ i = ϕα , ϕβ = v 3 δαβ , where v is the volume of the manifold M (which we suppose to be compact); • k α and pα are constant fields with ghost number one; • k¯ α and p¯α are constant fields with ghost number minus one, and • χα and π α are constant fields with ghost number zero. Notice that now the action (141) is not metric independent; as a matter of fact, the choice of the bases {ϕα } and {ϕα } requires a volume form. Thus, we cannot expect the partition function to be a topological invariant. However, it is not difficult to see that the argument given in Sect. 3 to prove that the partition function depends only on the cohomology class of ω still holds. The action of the σ ω operator is the same as in (24) and (25) but on A and B where it acts as follows: ω
σ A = dω c +
bX 1 [ω]
α
k ϕα ,
ω
σ B = dω ψ +
α=1
bX 1 [ω]
pα ϕ α .
(143)
α=1
Moreover, we have
† R σ ω kα = M A∗ ∧ ϕα , σ ω kα = 0, † R σ ω pα = M B ∗ ∧ ϕα , σ ω pα = 0, † † † σ ω χα = −k¯ α , σ ω k¯ α = 0, σ ω k¯ α = χα , σ ω χα = 0, † † † σ ω πα = −p¯α , σ ω p¯α = 0, σ ω p¯α = π α , σ ω π α = 0. The gauge-fixing fermion defined in (40) has now to be modified as 9 −→ 9 +
bX 1 [ω]
k¯ α hϕα , Ai + p¯α hϕα , Bi
(144)
(145)
(146)
α=1
in order to fix the new symmetries (143). With this choice of gauge, we have Z bY 1 [ω] ω Z[M, ω] = [D8] dk α dk¯ α dpα dp¯α dχα dπ α exp (iSg.f. ),
(147)
α=1
where and
ω Sg.f.
[D8] = [DA DB Dc Dc¯ Dψ Dψ¯ Dhc Dhψ ], is the following modification of (42)
ω ω −→ Sg.f. + Sg.f.
bX 1 [ω]
v (k¯ α k α + p¯α pα ) + χα hϕα , Ai + π α hϕα , Bi .
(148)
(149)
α=1
An explicit computation of (147), in zeta-function regularization, shows that
\[ Z[M, \omega] = v^{2 b_1[\omega]/3}\; T(M, d_\omega); \tag{150} \]
thus, apart from a volume factor, the partition function is still a topological invariant.
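B. The First Correction to the Observable $O_0$

In this appendix we solve eqn. (74) for $n = 1$. By (67), we have
\[ \sigma O_1 = \tfrac12\, O_0 \left( \gamma^{(\Sigma_K,K,x_0)},\, \gamma^{(\Sigma_K,K,x_0)} \right); \tag{151} \]
thus, by (77),
\[ \sigma U_1 = \tfrac12 \left( \gamma^{(\Sigma_K,K,x_0)},\, \gamma^{(\Sigma_K,K,x_0)} \right). \tag{152} \]
An explicit computation shows that the nonvanishing terms of (152) are given by
\[ 2\sigma U_1 = \left( \int_{C_2(K\setminus x_0)} A_1 \eta_{12} B_2 \,,\; \int_{\Sigma_K} \left[ B^*(\psi - \psi(x_0)) + (c - c(x_0)) A^* \right] \right). \tag{153} \]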
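(We suppress all the $\wedge$ symbols for simplicity.) Since the antibracket contracts fields evaluated at the same point, we have the following identities:
\[ \begin{aligned}
\left( \int_{C_2(K\setminus x_0)} A_1 \eta_{12} B_2 \,,\; \int_{\Sigma_K} B^*(\psi - \psi(x_0)) \right) &= \left( \int_{C_2(K\setminus x_0)} A_1 \eta_{12} B_2 (\psi_2 - \psi_0) \,,\; \int_{\Sigma_K} B^* \right),\\
\left( \int_{C_2(K\setminus x_0)} A_1 \eta_{12} B_2 \,,\; \int_{\Sigma_K} (c - c(x_0)) A^* \right) &= -\left( \int_{C_2(K\setminus x_0)} (c_1 - c_0) A_1 \eta_{12} B_2 \,,\; \int_{\Sigma_K} A^* \right);
\end{aligned} \tag{154} \]
moreover, since the antibracket of $A$ with $A^*$ is the same as the antibracket of $B$ with $B^*$, we have also the following identities:
\[ \begin{aligned}
\left( \int A_1 \eta_{12} B_2 \psi_2 \,,\; \int B^* \right) &= \left( \int A_1 \eta_{12} A_2 \psi_2 \,,\; \int A^* \right) - \left( \int B_1 \eta_{12} A_2 \psi_2 \,,\; \int B^* \right),\\
\left( \int A_1 c_1 \eta_{12} B_2 \,,\; \int A^* \right) &= \left( \int B_1 c_1 \eta_{12} B_2 \,,\; \int B^* \right) - \left( \int B_1 c_1 \eta_{12} A_2 \,,\; \int A^* \right).
\end{aligned} \tag{155} \]
By (154) and (155), (153) now reads
\[ \begin{aligned}
2\sigma U_1 = {} & \left( \int_{C_2(K\setminus x_0)} \left[ A_1 \eta_{12} A_2 \psi_2 + B_1 c_1 \eta_{12} A_2 + c_0 A_1 \eta_{12} B_2 \right] \,,\; \int_{\Sigma_K} A^* \right)\\
& - \left( \int_{C_2(K\setminus x_0)} \left[ B_1 \eta_{12} A_2 \psi_2 + B_1 c_1 \eta_{12} B_2 + A_1 \eta_{12} B_2 \psi_0 \right] \,,\; \int_{\Sigma_K} B^* \right).
\end{aligned} \tag{156} \]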
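If now we define $\gamma_{ABB}$ and $\gamma_{BAA}$ as in (80) and (81), we see that
\[ \begin{aligned}
\tfrac12 \left( \gamma^{(\Sigma_K,K,x_0)},\, \gamma^{(\Sigma_K,K,x_0)} \right) &= -\left( \sigma\gamma_{ABB} \,,\; \int_{\Sigma_K} B^* \right) - \left( \sigma\gamma_{BAA} \,,\; \int_{\Sigma_K} A^* \right)\\
&= \Delta \left[ (\sigma\gamma_{ABB}) \int_{\Sigma_K} B^* + (\sigma\gamma_{BAA}) \int_{\Sigma_K} A^* \right].
\end{aligned} \tag{157} \]
Since
\[ \Delta \left[ \gamma_{ABB} \left( \sigma \int_{\Sigma_K} B^* \right) + \gamma_{BAA} \left( \sigma \int_{\Sigma_K} A^* \right) \right] = 0, \]
we get
\[ \tfrac12 \left( \gamma^{(\Sigma_K,K,x_0)},\, \gamma^{(\Sigma_K,K,x_0)} \right) = -\Delta\sigma u_1 = \sigma\Delta u_1, \tag{158} \]
where we have used (11), and $u_1$ is defined in (79). By (152) and (77), we arrive at (78) up to a $\sigma$-closed term. In particular, the $\Delta$-, $\sigma$-closed correction leading to (85) can be written as
\[ U_1 \to U_1 - \frac{k}{2}\,\delta U_1 + l\tau + i\lambda m, \tag{159} \]
with $\tau$ defined in (84), and
\[ \delta U_1 = \Delta\,\delta u_1, \tag{160} \]
where
\[ \delta u_1 = \gamma_A^K \left(\gamma_B^K\right)^2 \int_{\Sigma_K} B^* + \gamma_B^K \left(\gamma_A^K\right)^2 \int_{\Sigma_K} A^*. \tag{161} \]
In fact,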
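\[ i\lambda\Delta\,\delta u_1 = \sigma\,\delta u_1 - \Omega\,\delta u_1 = -2\tau^2 - \Omega\,\delta u_1; \tag{162} \]
therefore, by Statement 1 in Subsect. 2.3,
\[ \langle i\lambda\Delta\,\delta u_1 \rangle_\lambda = -2 \left\langle \tau[K]^2 \right\rangle_\lambda. \tag{163} \]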
C. Vanishing Theorems

C.1. The hidden-face contribution to $dV$. In this subsection we show that $dV$ does not have hidden-face contributions. The only nontrivial hidden face is the one where the three points come together (it is easy to show that the push forward of $[V]$ along all other hidden faces vanishes). We will show that the push forward of $[V]$ vanishes on this face, too. To begin with, we parametrize the three points on this face, which we will denote by $S$, as
\[ \begin{aligned}
x_1 &= \Phi(y) + r\,\frac{\dot\Phi(y)}{|\dot\Phi(y)|}\,u_1 + \cdots,\\
x_2 &= \Phi(y) + r\,\frac{\dot\Phi(y)}{|\dot\Phi(y)|}\,u_2 + \cdots,\\
x_3 &= \Phi(y) + r \left[ \frac{\dot\Phi(y)}{|\dot\Phi(y)|}\,u_3 + \frac{\Phi'_\perp(y)}{|\Phi'_\perp(y)|}\,u_4 \right] + \cdots,
\end{aligned} \tag{164} \]
where $y \in S^1$, $r \to 0$ and the $u$ variables live in the two-dimensional manifold
\[ U = \Bigl\{ (u_1,u_2,u_3,u_4) \in \mathbb{R}^3 \times \mathbb{R}^+ \;/\; u_1 < u_2,\ \sum_{i=1}^{4} (u_i)^2 = 1,\ \sum_{i=1}^{3} u_i = 0,\ (u_i,0) \neq (u_3,u_4),\ i = 1,2 \Bigr\}. \tag{165} \]
Following the approach of [10], we want to represent $[V]$ on $S$ as the pull back of a universal form $\hat\lambda$. More precisely, we introduce the maps
\[ \psi : \tilde C_1^0 \to S^2, \qquad y \mapsto \frac{\dot\Phi(y)}{|\dot\Phi(y)|}, \tag{166} \]
\[ \chi : \tilde C_1^0 \to S^2, \qquad y \mapsto \frac{\Phi'_\perp(y)}{|\Phi'_\perp(y)|}, \tag{167} \]
and
\[ f = (\psi,\chi) : \tilde C_1^0 \to S^2 \times S^2. \tag{168} \]
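Then we consider the commutative diagram
\[ \begin{array}{ccc}
S & \xrightarrow{\ \hat f\ } & U \times S^2 \times S^2\\
\Big\downarrow{\scriptstyle \pi} & & \Big\downarrow{\scriptstyle \hat\pi}\\
\tilde C_1^0 & \xrightarrow{\ f\ } & S^2 \times S^2
\end{array} \tag{169} \]
where $\hat f$ maps $S$ into the parametrization $U$ of the blow up times the value of $f$, while $\pi$ and $\hat\pi$ are projections along the fibers. Eventually, we notice that there exists a universal four-form $\hat\lambda$ on $U \times S^2 \times S^2$ such that
\[ [V] = \hat f^* \hat\lambda \quad\text{and}\quad \pi_*[V] = f^* \hat\pi_* \hat\lambda. \tag{170} \]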
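To write $\hat\lambda$ explicitly, we have to introduce the maps $\phi_{13}$, $\phi_{23}$ and $\phi$ defined as
\[ \phi_{i3} : U \times S^2 \times S^2 \to S^2, \qquad (u,a,b) \mapsto \frac{(u_3 - u_i)\,a - u_4\, b}{|(u_3 - u_i)\,a - u_4\, b|}, \qquad i = 1,2, \tag{171} \]
and
\[ \phi = (\phi_{13},\phi_{23}) : U \times S^2 \times S^2 \to S^2 \times S^2. \tag{172} \]
Then we have
\[ \hat\lambda = \phi^*(\omega_1 \wedge \omega_2), \tag{173} \]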
where $\omega_1$ and $\omega_2$ are the SO(3)-invariant unit elements on the two spheres. Now we notice that $\phi$ and $\hat\pi$ are equivariant maps under the action of SO(3) (we mean the diagonal action on $S^2 \times S^2$ and the trivial action on $U$), and that $\omega_1 \wedge \omega_2$ is SO(3)-invariant. Thus, $\hat\pi_*\hat\lambda$ is rotationally invariant as well. Since it is a two-form and the only rotationally invariant forms (up to multiplication by a scalar) on $S^2$ are $1$ and $\omega$, we conclude that $\hat\pi_*\hat\lambda$ reads
\[ \hat\pi_*\hat\lambda = c_1\omega_1 + c_2\omega_2, \tag{174} \]
where $c_1$ and $c_2$ are constants. Finally, we consider the (orientation reversing) automorphism $\Theta$ of $S^2$ that maps a point into its antipode, and notice that $\Theta^*\omega = -\omega$. Therefore, if we consider the diagonal extension of $\Theta$ to $S^2 \times S^2$, we see that
\[ \Theta^* \hat\pi_*\hat\lambda = -\hat\pi_*\hat\lambda. \tag{175} \]
On the other hand, $\phi$ and $\hat\pi$ are equivariant maps under the action of $\Theta$ (we mean the diagonal action on $S^2 \times S^2$ and the trivial action on $U$). Since $\Theta^*(\omega_1 \wedge \omega_2) = \omega_1 \wedge \omega_2$, we conclude that
\[ \Theta^* \hat\pi_*\hat\lambda = \hat\pi_*\hat\lambda. \tag{176} \]
This, together with (175), implies that $\hat\pi_*\hat\lambda$ vanishes. Thus, owing to (170), we have proved the following

Theorem 6. The push forward of $[V]$ along the hidden face where all the points collapse together vanishes.
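The parity facts used in this argument, $\Theta^*\omega = -\omega$ on one sphere and hence $\Theta^*(\omega_1\wedge\omega_2) = \omega_1\wedge\omega_2$ on the product, can be checked numerically. The sketch below is only an illustration and not part of the paper: it assumes the unnormalized area form $\omega_p(u,v) = p\cdot(u\times v)$ on the unit sphere (the normalization does not affect the sign), and all function names are hypothetical.

```python
import random

def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def area_form(p, u, v):
    """Area form on the unit sphere (assumed convention): omega_p(u, v) = p . (u x v)."""
    return dot(p, cross(u, v))

def antipode_pullback(p, u, v):
    """(Theta^* omega)_p(u, v) = omega_{Theta(p)}(dTheta u, dTheta v) with Theta(x) = -x."""
    neg = lambda w: tuple(-c for c in w)
    return area_form(neg(p), neg(u), neg(v))

def random_point_and_tangents(rng):
    """Random point p on S^2 and two tangent vectors at p."""
    while True:
        w = tuple(rng.gauss(0.0, 1.0) for _ in range(3))
        n = dot(w, w) ** 0.5
        if n > 1e-6:
            break
    p = tuple(c / n for c in w)

    def tangent():
        # Project a random vector onto the plane orthogonal to p.
        t = tuple(rng.gauss(0.0, 1.0) for _ in range(3))
        c = dot(t, p)
        return tuple(ti - c * pi for ti, pi in zip(t, p))

    return p, tangent(), tangent()

def parity(rng):
    """Ratio (Theta^* omega) / omega at a random point; mathematically equal to -1."""
    p, u, v = random_point_and_tangents(rng)
    return antipode_pullback(p, u, v) / area_form(p, u, v)
```

Under the diagonal action on $S^2 \times S^2$ each factor contributes one such sign, so the product form picks up `parity(rng) ** 2`, i.e. $+1$. This is exactly the mismatch between (175) and (176) that forces $\hat\pi_*\hat\lambda$ to vanish.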
C.2. The general vanishing theorem. A generalization of the previous argument leads to proving a more general vanishing theorem for diagrams involving tautological forms connecting points on the knot and/or on its spanning surface. The first step is to generalize the commutative diagram (169). In general, the stratum $S$ we are considering will have a natural projection $\pi$ to a configuration space with fewer points [which will replace $\tilde C_1^0$ in the lower left corner of (169)]. Suppose we start with $n$ points on the knot and $t$ points on the surface, i.e., with the $(n + 2t)$-dimensional configuration space $\tilde C_n^t$. We want $q$ points on the knot and $s$ points on the surface to collapse. There are three cases:
Case 1) $q = 0$ and the $s$ points collapse together on the surface;
Case 2) the points collapse on the base point;
Case 3) otherwise.
In case 1, $\pi$ projects $S$ to $\tilde C_n^{t-s+1}$; in case 2, to $\tilde C_{n-q}^{t-s}$; in case 3, to $\tilde C_{n-q+1}^{t-s}$. The dimension of the fiber, which we shall denote by $D$, can be computed noticing that $\dim S = \dim \tilde C_n^t - 1$. In case 1, we have $D = 2s - 3$; in case 2, $D = q + 2s - 1$; in case 3, $D = q + 2s - 2$.

We will prove that the push forward along a face of $\tilde C_n^t$ vanishes if $D > 0$. Since $D = 0$ is satisfied only in case 2, with $(q = 1, s = 0)$, or in case 3, with either $(q = 0, s = 1)$ or $(q = 2, s = 0)$, we have the following

Theorem 7. The push forward along a hidden face always vanishes; so does the push forward along a principal face unless it is simple. (For a definition of principal, hidden and simple principal faces, see the end of Subsect. 5.1.)

In particular, this proves that the contributions $\ell_l$, $\ell_r$ and $\ell_0$, considered in Subsect. 5.3.8, vanish.

Proof. First we split the form on $S$ into the product $\lambda_1 \wedge \pi^*\lambda_2$. The propagators that define $\lambda_2$ connect either two points that do not collapse, or a point that does not collapse with a point that does, or two of the $q$ points on the knot that collapse together. The propagators that define $\lambda_1$ connect two collapsing points at least one of which is on the surface. It follows that $\lambda_1$ is written in terms of products of pull backs of $\omega$ via maps that are linear combinations of the unit vectors $\dot\Phi/|\dot\Phi|$ and $\Phi'_\perp/|\Phi'_\perp|$. Thus, $S$ will be mapped to a proper submanifold of $(S^2)^n$ unless $n = 0$, $n = 1$ or $n = 2$. This means that $\lambda_1$ vanishes unless it is a zero-, a two- or a four-form. In the first case, however, $\pi_*\lambda_1$ clearly vanishes unless $D = 0$. Now the idea is to generalize the commutative diagram (169) and write $\lambda_1$ in terms of a universal form $\hat\lambda_1$:
\[ \lambda_1 = \hat f^*\hat\lambda_1 \quad\text{and}\quad \pi_*\lambda_1 = f^*\hat\pi_*\hat\lambda_1. \tag{177} \]
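The left column of (169) is unchanged if we still denote by $U$ the parametrization of the blow up. Explicitly the manifold $U$ reads
\[ U = \Bigl\{ \bigl((u_1^1,u_1^2),\dots,(u_s^1,u_s^2)\bigr) \in (\mathbb{R} \times \mathbb{R}^+)^s \;/\; \sum_{i=1}^{s}\sum_{\alpha=1}^{2} (u_i^\alpha)^2 = 1,\ \sum_{i=1}^{s} u_i^\alpha = 0,\ u_i^\alpha \neq u_j^\alpha\ \forall\alpha \text{ if } i \text{ and } j \text{ are connected} \Bigr\}, \tag{178} \]
in case 1;
\[ \begin{aligned}
U = \Bigl\{ & \bigl(u_1,\dots,u_q,(u_{q+1}^1,u_{q+1}^2),\dots,(u_{q+s}^1,u_{q+s}^2)\bigr) \in \mathbb{R}^q \times (\mathbb{R} \times \mathbb{R}^+)^s \;/\; u_1 < \cdots < u_q,\\
& \sum_{i=1}^{q} (u_i)^2 + \sum_{i=q+1}^{q+s}\sum_{\alpha=1}^{2} (u_i^\alpha)^2 = 1,\\
& u_i^\alpha \neq u_j^\alpha\ \forall\alpha \text{ and } (u_i,0) \neq (u_j^1,u_j^2) \text{ if } i \text{ and } j \text{ are connected} \Bigr\},
\end{aligned} \tag{179} \]
in case 2, and
\[ \begin{aligned}
U = \Bigl\{ & \bigl(u_1,\dots,u_q,(u_{q+1}^1,u_{q+1}^2),\dots,(u_{q+s}^1,u_{q+s}^2)\bigr) \in \mathbb{R}^q \times (\mathbb{R} \times \mathbb{R}^+)^s \;/\; u_1 < \cdots < u_q,\\
& \sum_{i=1}^{q} (u_i)^2 + \sum_{i=q+1}^{q+s}\sum_{\alpha=1}^{2} (u_i^\alpha)^2 = 1,\quad \sum_{i=1}^{q} u_i + \sum_{i=q+1}^{q+s} u_i^1 = 0,\\
& u_i^\alpha \neq u_j^\alpha\ \forall\alpha \text{ and } (u_i,0) \neq (u_j^1,u_j^2) \text{ if } i \text{ and } j \text{ are connected} \Bigr\},
\end{aligned} \tag{180} \]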
in case 3.

The universal form $\hat\lambda_1$ is written in terms of pullbacks of $\omega$ via $\hat\phi$-maps defined as follows:
\[ \hat\phi_{ij} = \frac{(u_j^1 - u_i)\,a + u_j^2\, b}{|(u_j^1 - u_i)\,a + u_j^2\, b|}, \qquad \text{if } i \le q \text{ and } q < j \le q + s, \tag{181} \]
\[ \hat\phi_{ij} = \frac{(u_j^1 - u_i^1)\,a + (u_j^2 - u_i^2)\,b}{|(u_j^1 - u_i^1)\,a + (u_j^2 - u_i^2)\,b|}, \qquad \text{if } q < i < j \le q + s, \tag{182} \]
and $\hat\phi_{ji} = -\hat\phi_{ij}$. We must now distinguish two subcases: viz., when $\hat\lambda_1$ is a two-form and when it is a four-form.

In the first subcase, $\hat\lambda_1$ is obtained via pull back through a single $\hat\phi$-map to $S^2$. Rotational invariance shows that $\hat\pi_*\hat\lambda_1$ does not vanish only if it is a zero- or a two-form; in the former instance it is a constant, in the latter it is a linear combination of $\omega_1$ and $\omega_2$. However, $\hat\pi_*\hat\lambda_1$ must be odd under the action of the automorphism $\Theta$ that maps a point on $S^2$ into its antipode since $\omega$ is odd. Therefore, it does not vanish only in the latter instance. However, if both $\hat\lambda_1$ and $\hat\pi_*\hat\lambda_1$ are two-forms, we have $D = 0$.

In the second subcase, $\hat\lambda_1$ is obtained via pull back through a product of two $\hat\phi$-maps to $S^2 \times S^2$, as, e.g., in (172). Rotational invariance shows that $\hat\pi_*\hat\lambda_1$ does not vanish only if it is a zero-, a two- or a four-form; in the first instance it is a constant, in the second it is a linear combination of $\omega_1$ and $\omega_2$, in the third it is a multiple of $\omega_1 \wedge \omega_2$. However, $\hat\pi_*\hat\lambda_1$ must be even under the diagonal action of the automorphism $\Theta$ since $\omega_1 \wedge \omega_2$ is even. Thus, $\hat\pi_*\hat\lambda_1$ does not vanish only if it is a zero- or a four-form. Since $\hat\lambda_1$ is a four-form, we have $D = 4$ in the former instance and $D = 0$ in the latter. Therefore, to complete the proof we have only to show that, in the former instance, $\hat\pi_*\hat\lambda_1$ vanishes. This happens since, if $D = 4$, $\hat\pi_*$ selects the $(4,0)$-component of the four-form $\hat\lambda_1$ on $U \times (S^2 \times S^2)$. This component, however, vanishes since, for fixed $(a,b) \in S^2 \times S^2$, $\hat\phi$ maps the four-dimensional manifold $U$ into a two-dimensional submanifold of $S^2 \times S^2$ [this submanifold is parametrized by two unit vectors $z_i(u; a, b)$ satisfying $z_i \cdot (a \times b) = 0$, $i = 1, 2$]. This concludes our proof.

Acknowledgement. I thank D. Anselmi, P. Cotta-Ramusino and R. Longoni for helpful conversations. I am especially thankful to R. Bott for a number of very useful discussions. This work was supported by INFN Grant No. 5077/94 and DOE Grant No. DE-FG02-94ER25228, Amendment No. A003.
References
1. Alexander, J.W.: Topological Invariants of Knots and Links. Trans. Am. Math. Soc. 30, 275–306 (1928); Conway, J.H.: An Enumeration of Knots and Links, and Some of Their Algebraic Properties. In: Computational Problems in Abstract Algebra, edited by J. Leech, New York: Pergamon Press, 1970, pp. 329–358
2. Alfaro, J. and Damgard, P.M.: Origin of Antifields in the Batalin–Vilkovisky Lagrangian Formalism. Nucl. Phys. B 404, 751–793 (1993)
3. Anselmi, D.: Removal of Divergences with the Batalin–Vilkovisky Formalism. Class. Quant. Grav. 11, 2181–2204 (1994); More on the Subtraction Algorithm. Class. Quant. Grav. 12, 319–350 (1995)
4. Bar-Natan, D.: Perturbative Aspects of the Chern–Simons Field Theory. Ph. D. Thesis, Princeton University, 1991; Perturbative Chern–Simons Theory. J. of Knot Theory and its Ramifications 4, 503–548 (1995)
5. Bar-Natan, D. and Garoufalidis, S.: On the Melvin–Morton–Rozansky Conjecture. Harvard University preprint, 1994 (available at ftp://ftp.ma.huji.ac.il/drorbn)
6. Batalin, I.A. and Vilkovisky, G.A.: Relativistic S-Matrix of Dynamical Systems with Boson and Fermion Constraints. Phys. Lett. 69 B, 309–312 (1977); Fradkin, E.S. and Fradkina, T.E.: Quantization of Relativistic Systems with Boson and Fermion First- and Second-Class Constraints. Phys. Lett. 72 B, 343–348 (1978)
7. Becchi, C., Rouet, A. and Stora, R.: Renormalization of the Abelian Higgs–Kibble Model. Commun. Math. Phys. 42, 127 (1975); Tyutin, I.V.: Lebedev Institute preprint N39, 1975
8. Blau, M. and Thompson, G.: Topological Gauge Theories of Antisymmetric Tensor Fields. Ann. Phys. 205, 130–172 (1991)
9. Bott, R.: Configuration Spaces and Imbedding Invariants. Turkish J. of Math. 20, 1–17 (1996)
10. Bott, R. and Taubes, C.: On the Self-Linking of Knots. J. Math. Phys. 35, 5247–5287 (1994)
11. Cattaneo, A.S.: Teorie topologiche di tipo BF ed invarianti dei nodi. Ph. D. Thesis, Milan University, 1995 (available at ftp://pctheor.uni.mi.astro.it/pub/tesi.ps)
12. Cattaneo, A.S.: Cabled Wilson Loops in BF Theories. J. Math. Phys. 37, 3684–3703 (1996)
13. Cattaneo, A.S., Cotta-Ramusino, P., Fröhlich, J. and Martellini, M.: Topological BF Theories in 3 and 4 Dimensions. J. Math. Phys. 36, 6137–6160 (1995)
14. Cattaneo, A.S., Cotta-Ramusino, P. and Martellini, M.: Three-Dimensional BF Theories and the Alexander–Conway Invariant of Knots. Nucl. Phys. B 436, 355–382 (1995)
15. Cheeger, J.: Analytic Torsion and the Heat Equation. Ann. Math. 109, 259–322 (1979); Müller, W.: Analytic Torsion and the R-Torsion of Riemannian Manifolds. Adv. Math. 28, 233–305 (1978)
16. Cotta-Ramusino, P. and Martellini, M.: BF-Theories and 2-Knots. In: Knots and Quantum Gravity, edited by J. Baez, Oxford NY: Oxford University Press, 1994, hep-th/9407097
17. Freyd, P., Yetter, D., Hoste, J., Lickorish, W.B.R., Millett, K. and Ocneanu, A.: A New Polynomial Invariant of Knots and Links. Bull. Am. Math. Soc. 12, 239–246 (1985)
18. Fröhlich, J., Götschmann, R. and Marchetti, P.A.: Bosonization of Fermi Systems in Arbitrary Dimensions in Terms of Gauge Forms. J. Phys. A 28, 1169–1204 (1995); The Effective Gauge Field Action of a System of Non-Relativistic Electrons. Commun. Math. Phys. 173, 417–452 (1995)
19. Fröhlich, J. and King, C.: The Chern–Simons Theory and Knot Polynomials. Commun. Math. Phys. 126, 167–199 (1989)
20. Fulton, W. and MacPherson, R.: Compactification of Configuration Spaces. Ann. Math. 139, 183–225 (1994)
21. Guadagnini, E., Martellini, M. and Mintchev, M.: Chern–Simons Model and New Relations between the HOMFLY Coefficients. Phys. Lett. B 228, 489–494 (1989)
22. Jones, V.F.R.: A Polynomial Invariant for Knots via von Neumann Algebras. Bull. Am. Math. Soc. 12, 103–112 (1985)
23. Longoni, R.: Sviluppo perturbativo delle teorie di campo topologiche di tipo BF e invarianti dei nodi. Laurea Thesis, Milan University, 1996
24. Maggiore, N. and Sorella, S.P.: Finiteness of the Topological Models in the Landau Gauge. Nucl. Phys. B 377, 236–251 (1992)
25. Melvin, P.M. and Morton, H.R.: The Coloured Jones Function. Commun. Math. Phys. 169, 501–520 (1995)
26. Milnor, J.: A Duality Theorem for Reidemeister Torsion. Ann. Math. 76, 137–147 (1962); Turaev, V.G.: Reidemeister Torsion in Knot Theory. Russ. Math. Surveys 41, 97–147 (1986)
27. Rozansky, L.: A Contribution of the Trivial Connection to Jones Polynomial and Witten’s Invariants of 3d Manifolds. I. Commun. Math. Phys. 175, 275–296 (1996); A Contribution of the Trivial Connection to Jones Polynomial and Witten’s Invariants of 3d Manifolds. II. Commun. Math. Phys. 175, 297–318 (1996)
28. Schwarz, A.: Geometry of Batalin–Vilkovisky Quantization. Commun. Math. Phys. 155, 249–260 (1993); Alexandrov, M., Kontsevich, M., Schwarz, A. and Zaboronsky, O.: The Geometry of the Master Equation and Topological Quantum Field Theory. hep-th/9502010
29. Schwarz, A.S.: The Partition Function of Degenerate Quadratic Functionals and Ray–Singer Invariants. Lett. Math. Phys. 2, 247–252 (1978)
30. Voronov, B.L. and Tyutin, I.V.: Formulation of Gauge Theories of General Form. I. Theor. Math. Phys. 50, 218–225 (1982); Batalin, I.A. and Vilkovisky, G.A.: Existence Theorem for Gauge Algebra. J. Math. Phys. 26, 172–184 (1985); Fisch, J.M.L. and Henneaux, M.: Homological Perturbation Theory and the Algebraic Structure of the Antifield–Antibracket Formalism for Gauge Theories. Commun. Math. Phys. 128, 627–640 (1990)
31. Witten, E.: Quantum Field Theory and the Jones Polynomial. Commun. Math. Phys. 121, 351–399 (1989)
32. Witten, E.: A Note on the Antibracket Formalism. Mod. Phys. Lett. A 5, 487–494 (1990)

Communicated by R. H. Dijkgraaf
Commun. Math. Phys. 189, 829 – 853 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997
Instability and Stability of Rolls in the Swift–Hohenberg Equation*
Alexander Mielke
Institut für Angewandte Mathematik, Universität Hannover, Welfengarten 1, 30167 Hannover, Germany. E-mail: [email protected]
Received: 25 October 1996 / Accepted: 24 March 1997
* Research partially supported by Deutsche Forschungsgemeinschaft under Grant Mi 459/2–2.

Dedicated to Professor K. Kirchgässner on the occasion of his sixty-fifth birthday

Abstract: We develop a method for the stability analysis of bifurcating spatially periodic patterns under general nonperiodic perturbations. In particular, it enables us to detect sideband instabilities. We treat in all detail the stability question of roll solutions in the two-dimensional Swift–Hohenberg equation and derive a condition on the amplitude and the wave number of the rolls which is necessary and sufficient for stability. Moreover, we characterize the set of those wave vectors $\sigma \in \mathbb{R}^2$ which give rise to unstable perturbations.

1. Introduction

The bifurcation of periodic patterns for partial differential equations on unbounded domains attracted a lot of attention within the last decade, especially concerning stability aspects. Often stability of bifurcating patterns is studied with respect to perturbations of related symmetry classes. However, for practical purposes it is also important to have stability with respect to general nonperiodic perturbations. To tackle this problem the theory of sideband instabilities was devised starting with the pioneering work of Eckhaus [Eck65]. Yet this theory remained purely formal, due to its usage of multiple scaling arguments. Only very few rigorous results were obtained at that time, as for instance in [KiS69], where instability of bifurcating roll-type solutions in the Navier–Stokes equations was proven whenever the period is not the one which is associated to the critical Reynolds number. However, the Eckhaus criterion for instability was mathematically justified only twenty-five years later: first, for scalar model problems in [CoE90, Mie95] and then for the Navier–Stokes equation in [KvW97, Mi97b]. A more general method, called the principle of reduced instability, was developed in [Mie95, BrM96] which then was applied to the Benjamin–Feir instability of surface waves on a fluid layer of finite depth [BrM95] and to the sideband instabilities of convection rolls in the Rayleigh–Bénard problem [Mi97b]. This principle of reduced instability employs local arguments in the set of wave vectors, and is thus ideally suited to detect sideband instabilities. However, it only provides sufficient conditions for linear instability and is not able to give stability results. It is the purpose of this work to show how necessary conditions for stability can be provided.

Problems with more than one unbounded direction display a more complex behavior and are less well understood. First rigorous (in)stability results for the two-dimensional Swift–Hohenberg equation (SHE) were obtained in [Mie95] and [Kuw96]. The former work gives sufficient conditions for instability while the latter also establishes sufficient conditions for stability. However, there remained a region in parameter space where no result could be obtained, see Remark 2 after Theorem 3.2. Here we generalize the principle of reduced instability such that it provides stability results also. In particular, we are able to derive condition (1.2) below which is necessary and sufficient for stability. Moreover, for the case of instability we can characterize the set of those wave vectors $\sigma \in \mathbb{R}^2$ which give rise to unstable modes.

We will explain the main philosophy of the method in Sect. 2 and work out a first example in Sects. 3 and 4, namely the sideband instabilities for the roll patterns in the SHE:
\[ \partial_t u = -(1 + \Delta)^2 u + \varepsilon u - u^3, \qquad t \ge 0,\ x \in \mathbb{R}^2. \tag{1.1} \]
There are roll solutions $u(t,x) = U_{\varepsilon,\kappa}(kx_1) = \sqrt{4(\varepsilon - \kappa^2)/3}\,\cos(kx_1) + O(|\varepsilon - \kappa^2|^{3/2})$ which are independent of $(t, x_2)$ and periodic in $x_1$ with period $2\pi/k$. For notational convenience we throughout use the parameter $\kappa = k^2 - 1$. These solutions exist for all $\varepsilon \in (\kappa^2, \varepsilon_0]$ for some small positive $\varepsilon_0$.
However, some of these roll patterns are unstable: There are two curves $\kappa = K_Z(\varepsilon) = O(\varepsilon^2)$ and $\varepsilon = E_E(\kappa) = 3\kappa^2 + O(|\kappa|^3)$ such that the rolls with $\varepsilon < E_E(\kappa)$ are Eckhaus unstable and that the rolls with $\kappa < K_Z(\varepsilon)$ are zigzag unstable. These bounds were known on the formal level for more than 25 years, see [Eck65] and [Bus71] for the first studies. Exploiting the ideas introduced in [KiS69] it is possible to prove instability of the rolls with $\varepsilon \in (\kappa^2, (1 + c_0)\kappa^2)$ for some small $c_0$, yet the Eckhaus bound is $c_0 = 2$. A more general theory was developed in [CoE90, KvW97] for the Eckhaus criterion and [Kuw96, Mie95, Mi97b] for both cases. The novel result of the present work is that we are able to show that these conditions are not only sufficient for instability but also necessary:
\[ U_{\varepsilon,\kappa} \text{ is linearly stable if and only if } \kappa \ge K_Z(\varepsilon) \text{ and } \varepsilon \ge E_E(\kappa). \tag{1.2} \]
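To get a feeling for condition (1.2), one can evaluate it with the two boundary curves replaced by their leading terms, $E_E(\kappa) \approx 3\kappa^2$ and $K_Z(\varepsilon) \approx -\varepsilon^2/512$ (the expansions quoted in this introduction). The following is only a rough sketch under that truncation assumption: the exact curves are not given in closed form here, so the answer is unreliable close to the boundaries, and the function names are hypothetical.

```python
import math

def K_Z_leading(eps):
    """Leading term of the zigzag curve: K_Z(eps) = -eps^2/512 + O(eps^3)."""
    return -eps ** 2 / 512.0

def E_E_leading(kappa):
    """Leading term of the Eckhaus curve: E_E(kappa) = 3 kappa^2 + O(|kappa|^3)."""
    return 3.0 * kappa ** 2

def roll_amplitude_leading(eps, kappa):
    """Leading-order roll amplitude sqrt(4 (eps - kappa^2) / 3)."""
    return math.sqrt(4.0 * (eps - kappa ** 2) / 3.0)

def is_stable_leading_order(eps, kappa):
    """Evaluate (1.2) with truncated curves; rolls only exist for eps > kappa^2."""
    if not eps > kappa ** 2:
        raise ValueError("no roll solution: need eps > kappa^2")
    return kappa >= K_Z_leading(eps) and eps >= E_E_leading(kappa)
```

For instance, with $\varepsilon = 0.1$ the truncated zigzag bound is about $-1.95 \cdot 10^{-5}$, so `is_stable_leading_order(0.1, -1e-5)` evaluates to true while `is_stable_leading_order(0.1, -0.01)` evaluates to false; this is the (very thin) band of stable rolls with negative $\kappa$.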
This result is stated in Sect. 3 and proved in Sect. 4. Since $K_Z(\varepsilon) = -\varepsilon^2/512 + O(\varepsilon^3)$ we conclude that there are stable rolls with $\kappa < 0$. As far as we know this result is new. For comparison we derive, in Sect. 3, the sideband instabilities of the roll solution $A_{\varepsilon,\kappa}(x_1) = \sqrt{\varepsilon - \kappa^2}\, e^{ikx_1}$, with $k = \sqrt{1 + \kappa}$, of the complex SHE,
\[ \partial_t A = -(1 + \Delta)^2 A + \varepsilon A - |A|^2 A, \qquad t \ge 0,\ x \in \mathbb{R}^2, \]
where $A(t,x) \in \mathbb{C}$. This stability problem is easier as it reduces to a purely algebraic one. Lengthy algebraic manipulations yield
\[ A_{\varepsilon,\kappa} \text{ is linearly stable if and only if } \kappa \ge 0 \text{ and } \varepsilon \ge \kappa^2\, \frac{6 + 7\kappa}{2 + 3\kappa}. \]
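The algebraic nature of the complex problem can be illustrated numerically. The sketch below is not the paper's derivation. It assumes the perturbation ansatz $A = e^{ikx_1}(r + a)$ with $r^2 = \varepsilon - \kappa^2$, couples the Fourier modes $e^{\pm i\sigma\cdot x}$ of $a$ into a real symmetric $2\times 2$ matrix, and uses an unscaled wave vector $\sigma$ (the text scales $\sigma_1$ by $k$). A coarse grid scan over $\sigma$ is then compared with the closed-form criterion; all names are hypothetical.

```python
import math

def criterion_stable(eps, kappa):
    """Closed-form condition quoted in the text for the complex SHE roll."""
    return kappa >= 0 and eps >= kappa ** 2 * (6 + 7 * kappa) / (2 + 3 * kappa)

def max_growth_rate(eps, kappa, sigma1, sigma2):
    """Largest eigenvalue of the 2x2 symbol for the wave vector (sigma1, sigma2).

    Assumed linearization: da/dt = -L a + eps a - 2 r^2 a - r^2 conj(a),
    where L has symbol (1 - |k e_1 +/- sigma|^2)^2 on the modes e^{+/- i sigma.x}.
    """
    if not eps > kappa ** 2:
        raise ValueError("no roll solution: need eps > kappa^2")
    k = math.sqrt(1.0 + kappa)
    r2 = eps - kappa ** 2                      # squared roll amplitude
    m_plus = (1.0 - (k + sigma1) ** 2 - sigma2 ** 2) ** 2
    m_minus = (1.0 - (k - sigma1) ** 2 - sigma2 ** 2) ** 2
    d_plus = eps - 2.0 * r2 - m_plus           # diagonal entries
    d_minus = eps - 2.0 * r2 - m_minus
    mean = 0.5 * (d_plus + d_minus)
    half_diff = 0.5 * (d_plus - d_minus)
    # Eigenvalues of [[d_plus, -r2], [-r2, d_minus]] are mean +/- sqrt(...).
    return mean + math.sqrt(half_diff ** 2 + r2 ** 2)

def scan_unstable(eps, kappa, n=200, smax=1.5, tol=1e-12):
    """True if some wave vector on an (n+1) x (n+1) grid in [0, smax]^2 grows."""
    for i in range(n + 1):
        for j in range(n + 1):
            if max_growth_rate(eps, kappa, smax * i / n, smax * j / n) > tol:
                return True
    return False
```

A finite scan can of course only detect instability, never prove stability; it is meant as a sanity check of the criterion at sample parameter values, for example an Eckhaus-unstable roll at $(\varepsilon,\kappa) = (0.02, 0.1)$ and a zigzag-unstable roll at $(0.1, -0.1)$.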
To establish result (1.2) we generalize the principle of reduced instability in Sect. 2. There we use a general setting for arbitrary elliptic operators, however in this introduction we give the ideas for the SHE only. Our notion of stability for $U_{\varepsilon,\kappa}$ is always spectral stability, that is, we have to study the spectral problem
\[ \lambda v = B_{\varepsilon,\kappa} v, \qquad \text{where } B_{\varepsilon,\kappa} v \overset{\text{def}}{=} -(1 + \Delta)^2 v + \varepsilon v - 3U_{\varepsilon,\kappa}^2(kx_1)\, v. \]
The main difference from classical approaches is that we allow $v$ to lie in $W^{n,\infty}(\mathbb{R}^2)$ rather than restricting it to the space $H^4(\mathbb{T}_{2\pi})$, containing only the patterns with the same periodicity as $U_{\varepsilon,\kappa}$. (We continue to use $\mathbb{T}_\alpha = \mathbb{R}/\alpha\mathbb{Z}$ for the one-dimensional torus of length $\alpha$.) Following [MiS95] we use the more general space $L^2_{lu}(\mathbb{R}^2)$, the Banach space of uniformly local $L^2$ functions, see Sect. 2 for the definition. The methods developed there imply that SHE defines a global semiflow in $L^2_{lu}(\mathbb{R}^2)$. Using the results in [Sca94] we immediately conclude nonlinear instability of $U_{\varepsilon,\kappa}$ if it is spectrally unstable. In the case of spectral stability the nonlinear stability is less understood. For the one-dimensional case (no $x_2$-dependence) local nonlinear stability in $L^2(\mathbb{R})$ is proved in [Sch96] (for $\kappa = 0$), but the case $L^2_{lu}(\mathbb{R})$ and the two-dimensional problem are still open.

We may treat $B_{\varepsilon,\kappa}$ as operator on $L^2_{lu}(\mathbb{R}^2)$ or $L^2(\mathbb{R}^2)$ with domain of definition $H^4_{lu}(\mathbb{R}^2)$ or $H^4(\mathbb{R}^2)$, respectively. The first variant allows us especially to study so-called Bloch waves $v$ given in the form $v(x) = e^{i(k\sigma_1 x_1 + \sigma_2 x_2)} V(\xi)$ with $\xi = kx_1$, $V \in X = H^4(\mathbb{T}_\pi)$, and wave vector $\sigma \in \mathbb{R}^2$. We use the fact that $3U_{\varepsilon,\kappa}^2(kx_1)$, the only $x_1$-dependent coefficient of $B_{\varepsilon,\kappa}$, has period $\pi/k$ since $U_{\varepsilon,\kappa}(\xi + \pi) = -U_{\varepsilon,\kappa}(\xi)$. The main point is that the whole stability question in $L^2_{lu}(\mathbb{R}^2)$ or $L^2(\mathbb{R}^2)$ can be reduced to the study of Bloch waves. Such results are well known for Schrödinger operators with periodic potentials, cf. [ReS78], and were generalized to reaction diffusion problems in [Sca94] and to the Navier–Stokes equations in [Sca95]. Because of $V \in H^4(\mathbb{T}_\pi)$ it suffices to consider wave vectors $\sigma$ only in $\mathcal{T}^* = \mathbb{T}_2 \times \mathbb{R}$ and for given $\sigma$ we are left with a spectral problem for $V \in X$:
\[ \lambda V = B(\varepsilon,\kappa,\sigma) V \overset{\text{def}}{=} -\bigl(1 + (1 + \kappa)(\partial_\xi + i\sigma_1)^2 - \sigma_2^2\bigr)^2 V + \varepsilon V - 3U_{\varepsilon,\kappa}^2 V. \tag{1.3} \]
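At $\varepsilon = \kappa = 0$ the roll amplitude vanishes, so the operator in (1.3) has constant coefficients and acts diagonally on the Fourier modes $e^{2in\xi}$ of $H^4(\mathbb{T}_\pi)$, where $(\partial_\xi + i\sigma_1)^2$ becomes multiplication by $-(2n + \sigma_1)^2$. The following minimal sketch evaluates this symbol and lists the modes lying in the kernel of $B(0,0,\sigma)$. The function names, the truncation `n_max` and the tolerance are illustrative assumptions.

```python
def symbol(n, sigma1, sigma2, eps=0.0, kappa=0.0):
    """Eigenvalue of the constant-coefficient part of (1.3) on the mode e^{2 i n xi}.

    The term -3 U^2 V is dropped, which is exact only when the roll amplitude is zero.
    """
    return -(1.0 - (1.0 + kappa) * (2 * n + sigma1) ** 2 - sigma2 ** 2) ** 2 + eps

def kernel_modes(sigma1, sigma2, n_max=5, tol=1e-12):
    """Fourier indices n with symbol(n, sigma) = 0 at eps = kappa = 0."""
    return [n for n in range(-n_max, n_max + 1)
            if abs(symbol(n, sigma1, sigma2)) < tol]
```

For $\sigma = (1, 0)$ the modes $n = -1$ and $n = 0$ both lie in the kernel, for $\sigma = (0, 1)$ only $n = 0$ does, and for $\sigma = (0, 0)$ the kernel is trivial. Every symbol value is nonpositive, in line with the unperturbed spectrum lying in $(-\infty, 0]$.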
The operators $B(\varepsilon,\kappa,\sigma)$ are called Bloch operators. The essential feature is the following spectral identity:
\[ L^2\text{-spec}(B_{\varepsilon,\kappa}) = L^2_{lu}\text{-spec}(B_{\varepsilon,\kappa}) = \operatorname{closure}\Bigl( \bigcup_{\sigma \in \mathcal{T}^*} \operatorname{spec}(B(\varepsilon,\kappa,\sigma)) \Bigr). \tag{1.4} \]
We establish this result for general elliptic operators in Appendix A in a short, self-contained way.

For the above-mentioned general theory no smallness assumption on the non-constant parts of the coefficients in the operator $B_{\varepsilon,\kappa}$ was needed. However, for the analysis of the spectra of each $B(\varepsilon,\kappa,\sigma)$ we heavily rely on the fact that we are dealing with small perturbations from a homogeneous state, that is, $\|U_{\varepsilon,\kappa}^2\|_\infty = O(\varepsilon - \kappa^2)$. Thus, we are able to study the Bloch operators $B(\varepsilon,\kappa,\sigma)$ as small perturbations of $B(0,0,\sigma)$, which have constant coefficients. For each $\sigma \in \mathbb{R}^2$ the linear spectral problem (1.3) can be attacked by the Liapunov–Schmidt reduction with a splitting $V = V_0 + V_1$ according to the kernel of $B(0,0,\sigma)$. We find reduced finite-dimensional spectral problems
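\[ 0 = b(\varepsilon,\kappa,\sigma,\lambda) V_0 \overset{\text{def}}{=} P\bigl[ B(\varepsilon,\kappa,\sigma) - \lambda I \bigr] \bigl( V_0 + \mathcal{V}(\varepsilon,\kappa,\sigma,\lambda) V_0 \bigr), \tag{1.5} \]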
where $V_1 = \mathcal{V}(\ldots)V_0$ defines the associated reduction. It is important to note that we can handle general large wave vectors $\sigma$; only the eigenvalue parameter $\lambda \in \mathbb{C}$ needs to be small. However, since the spectrum of $B_{0,0}$ is equal to $(-\infty, 0] \subset \mathbb{C}$, classical perturbation arguments (cf. [Kat76]) show that possible unstable modes can only occur for $|\lambda| = O(\varepsilon - \kappa^2)$. In our case $X_0 = PX$ is, depending on $\sigma$, one- or two-dimensional and thus $b(\varepsilon,\kappa,\sigma,\lambda)$ corresponds to a scalar or a $2 \times 2$-matrix. The control of $\operatorname{spec}(B(\varepsilon,\kappa,\sigma))$ is now managed by solving
\[ 0 = \Lambda(\varepsilon,\kappa,\sigma,\lambda) \overset{\text{def}}{=} \det b(\varepsilon,\kappa,\sigma,\lambda) \]
for $\lambda$ as a function of $(\varepsilon,\kappa,\sigma)$. Our method allows us to characterize the set $S_{\varepsilon,\kappa}$ of unstable wave vectors for the state $U_{\varepsilon,\kappa}$:
\[ S_{\varepsilon,\kappa} = \{ \sigma \in \mathcal{T}^* : B(\varepsilon,\kappa,\sigma) \text{ has an eigenvalue } \lambda \text{ with } \operatorname{Re}\lambda > 0 \}. \]
In Sect. 4 we give all curves in the $(\varepsilon,\kappa)$ plane where the topological structure of $S_{\varepsilon,\kappa}$ changes. Moreover, we point out some differences between the sets $S_{\varepsilon,\kappa}$ and its counterpart $S^A_{\varepsilon,\kappa}$ for the rolls $A_{\varepsilon,\kappa}$ in the complex SHE.

The knowledge of the sets $S_{\varepsilon,\kappa}$ can in fact be used to study the stability of the solution $U_{\varepsilon,\kappa}$ on finite domains $\Omega = (0, 2\pi N/k) \times (0, 2\pi L)$, where $N \in \mathbb{N}$ and $L > 0$, with periodic boundary conditions. Considering functions with such periodicity the stability analysis has to be restricted to perturbations having wave vectors $\sigma$ with $\sigma_1 N,\ \sigma_2 L \in \mathbb{Z}$. Under this periodicity assumption we have stability if and only if
\[ S_{\varepsilon,\kappa} \cap \{ (n/N, l/L) \in \mathcal{T}^* : n = 0,\dots,2N-1,\ l \in \mathbb{Z} \} = \emptyset. \]
Thus, it is possible to rederive and refine the results in [Kuw96] by using the characterization of the set $S_{\varepsilon,\kappa}$ given in the present work.

2. General Theory

We consider systems of partial differential equations which are posed over unbounded physical domains $Q = \mathbb{R}^d \times \Sigma$ with variables $(x,z) \in \mathbb{R}^d \times \Sigma$. We assume for simplicity the form
\[ \partial_t u = A_\mu(\partial_x) u + N(\mu, \partial_x, u) \quad \text{in } Q = \mathbb{R}^d \times \Sigma, \tag{2.1} \]
where $u = u(t,x,z) \in \mathbb{R}^n$ is the state variable, $A_\mu(\partial_x)$ is an elliptic operator of order $2m$ in the $(x,z)$ variables and incorporates the boundary conditions $\mathcal{B}u = 0$ on $\partial Q = \mathbb{R}^d \times \partial\Sigma$. The cross-section $\Sigma$ is a bounded domain in $\mathbb{R}^s$ with Lipschitz boundary, and the vector $\mu \in \mathbb{R}^p$ denotes all parameters. The problem is translationally invariant (no $x$-dependence) while dependence on the cross-sectional variable $z$ is allowed but not explicitly displayed.

Our aim is to study the linearized stability of a given stationary spatially periodic pattern $\tilde u_\mu$ of (2.1) under general nonperiodic perturbations. The linearization at $\tilde u_\mu$ reads
\[ \partial_t v = B_\mu(\partial_x) v \quad \text{with } B_\mu(\partial_x) \overset{\text{def}}{=} A_\mu(\partial_x) + D_u N(\mu, \partial_x, \tilde u_\mu). \tag{2.2} \]
To study (2.1) in a large function space which contains all sufficiently smooth bounded functions we define the uniformly local $L^2$ space as in [MiS95]: Let
\[ \tilde L^2_{lu}(Q) = \{ u \in L^2_{loc}(Q) : \|u\|_{lu} < \infty \}, \qquad \|u\|_{lu}^2 = \sup\Bigl\{ \int_Q T_y\rho(x)\, |u(x,z)|^2 \, dx\, dz : y \in \mathbb{R}^d \Bigr\}, \]
where $\rho : \mathbb{R}^d \to [0,\infty)$ is a suitable bounded and integrable weight function and $T_y$ is the translation operator with $T_y v(x) = v(x - y)$. For definiteness we choose the weight $\rho(x) = e^{-|x|}$. The final uniformly local $L^2$-space is given by
\[ L^2_{lu}(Q) = \{ u \in \tilde L^2_{lu}(Q) : \|T_y u - u\|_{lu} \to 0 \text{ for } y \to 0 \}. \]
As usual we define Sobolev spaces $H^k_{lu}(Q)$ by asking that all partial derivatives in $(x,z)$ up to order $k$ lie in $L^2_{lu}(Q)$. Then, $H^k_{lu}(Q)$ is densely contained in $L^2_{lu}(Q)$, and the classical space $L^2(Q)$ is continuously embedded in $L^2_{lu}(Q)$ but not dense. The linear operator $B_\mu$ can now be defined on two spaces:
\[ \begin{aligned}
\hat B_\mu &: D(\hat A_0) = \{ u \in H^{2m}(Q) : \mathcal{B}u = 0 \text{ on } \partial Q \} \to L^2(Q), & u &\mapsto B_\mu(\partial_x) u;\\
\tilde B_\mu &: D(\tilde A_0) = \{ u \in H^{2m}_{lu}(Q) : \mathcal{B}u = 0 \text{ on } \partial Q \} \to L^2_{lu}(Q), & u &\mapsto B_\mu(\partial_x) u.
\end{aligned} \tag{2.3} \]
The stationary periodic pattern $\tilde u_\mu$ lies in $H^{2m}_{lu}(Q)$. Its stability analysis can first be done with respect to perturbations in $L^2(Q)$, but finally we will show that the spectrum of the linearization around the periodic pattern is the same considered in $L^2(Q)$ and in $L^2_{lu}(Q)$.

For the spectral analysis of $B_\mu$ we exploit the fact that $B_\mu$ has periodic coefficients via $D_u N(\mu, \partial_x, \tilde u_\mu(x))$. Using the translation operators $T_y$ this periodicity is characterized by the lattice group $L \subset \mathbb{R}^d$ such that $B_\mu T_\ell = T_\ell B_\mu$ for all $\ell \in L$. In some cases, see e.g. the SHE in Sect. 3, the lattice group $L$ is larger than $\tilde L = \{ y \in \mathbb{R}^d : T_y \tilde u_\mu = \tilde u_\mu \}$, which is the translation group of $\tilde u_\mu$, but $\tilde L \subset L$ always holds. Restricting the functions in $L^2_{lu}(Q)$ to the subclass with the given lattice group $L$ we obtain as natural space
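\[ L^2_{lu}(Q)/L = \{ u \in L^2_{lu}(Q) : T_\ell u = u \text{ for all } \ell \in L \}, \]
which is easily identifiable with $L^2(Q/L)$ where $Q/L = \mathcal{T} \times \Sigma$ is the periodicity domain and $\mathcal{T} = \mathbb{R}^d/L$. For the wave vectors the dual lattice group $L^* \subset \mathbb{R}^d$ is relevant. It is given by
\[ L^* = \{ h \in \mathbb{R}^d : h \cdot \ell \in 2\pi\mathbb{Z} \text{ for all } \ell \in L \}. \]
Throughout we assume that $L$ contains $d$ linearly independent vectors and that the connected components of $L$ are $\tilde d$-dimensional, then $\mathcal{T}$ is a $(d - \tilde d)$-dimensional torus. Under these conditions on $L$, the dual lattice $L^*$ is discrete and contained in a $(d - \tilde d)$-dimensional subspace. By choosing appropriate coordinates in $\mathbb{R}^d$ we can arrange things such that $L = (2\pi\mathbb{Z})^{d - \tilde d} \times \mathbb{R}^{\tilde d} \subset \mathbb{R}^d$. Then, $\mathcal{T} = (\mathbb{T}_{2\pi})^{d - \tilde d} \times \{0\}$, $L^* = \mathbb{Z}^{d - \tilde d} \times \{0\}$, and $\mathcal{T}^* = (\mathbb{T}_1)^{d - \tilde d} \times \mathbb{R}^{\tilde d}$, where $\mathbb{T}_\alpha = \mathbb{R}/\alpha\mathbb{Z}$ is the one-dimensional torus of length $\alpha$.

The main idea is to reduce the spectral analysis in $L^2(Q)$ to the space $L^2(Q/L)$ by using the Bloch decomposition which is also called the direct integral, cf. [ReS78], XIII.16. It is given by the isomorphism $\mathcal{D} : L^2(\mathcal{T}^*, L^2(Q/L)) \to L^2(Q)$ with
\[ \mathcal{D}(U)(x,z) = \int_{\sigma \in \mathcal{T}^*} e^{i\sigma\cdot x}\, U(\sigma,x,z)\, d\sigma, \tag{2.4} \]
and satisfying $\|\mathcal{D}(U)\|^2_{L^2(Q)} = (2\pi)^d\, \operatorname{vol}(\mathcal{T})^{-1} \int_{\sigma \in \mathcal{T}^*} \|U(\sigma,\cdot)\|^2_{L^2(Q/L)}\, d\sigma$. For more details we refer to [ReS78] and to Appendix A.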
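The definition of the dual lattice group can be made concrete in the simplest situation. The sketch below assumes a fully discrete planar lattice ($d = 2$, $\tilde d = 0$) generated by two basis vectors, which is narrower than the setting of the text, and computes a basis of $L^*$ from the requirement $b_i \cdot a_j = 2\pi\delta_{ij}$. The function names are hypothetical.

```python
import math

def dual_basis_2d(a1, a2):
    """Return (b1, b2) with b_i . a_j = 2*pi*delta_ij for a full-rank planar lattice."""
    det = a1[0] * a2[1] - a1[1] * a2[0]
    if abs(det) < 1e-14:
        raise ValueError("a1 and a2 must be linearly independent")
    c = 2.0 * math.pi / det
    b1 = (c * a2[1], -c * a2[0])
    b2 = (-c * a1[1], c * a1[0])
    return b1, b2

def pairing_in_2pi_Z(h, ell, tol=1e-9):
    """Check the defining property h . ell in 2*pi*Z up to a rounding tolerance."""
    q = (h[0] * ell[0] + h[1] * ell[1]) / (2.0 * math.pi)
    return abs(q - round(q)) < tol
```

For $L = (2\pi\mathbb{Z})^2$ this returns the standard basis of $\mathbb{Z}^2$, matching the normal form $L^* = \mathbb{Z}^{d - \tilde d} \times \{0\}$ stated above with $\tilde d = 0$.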
We define the closed subspaces $X_\sigma = \{ e^{i\sigma\cdot x} U : U \in L^2(Q/L) \} \subset L^2_{lu}(Q)$, such that (2.4) tells us that $L^2(Q)$ can be understood as the direct $L^2$-product of all the spaces $X_\sigma$. It is clear that each $X_\sigma$ is left invariant under the action of $\tilde B_\mu$, and we are able to define the Bloch operators $B(\mu,\sigma) : D(B) \subset L^2(Q/L) \to L^2(Q/L)$ as follows
\[ B(\mu,\sigma) U = e^{-i\sigma\cdot x}\, \tilde B_\mu(\partial_x)\bigl[ e^{i\sigma\cdot x} U \bigr] = B_\mu(i\sigma + \partial_x) U, \tag{2.5} \]
where $D(B) = \{ u \in H^{2m}(Q/L) : \mathcal{B}u = 0 \}$ does not depend on $\sigma$ if the boundary operators $\mathcal{B}$ do not contain tangential derivatives (i.e. $\partial_x$). The family of Bloch operators allows us to gain full control over the operator $B_\mu(\partial_x)$. In fact, assuming that the resolvents $(B(\mu,\sigma) - \lambda I)^{-1} : L^2(Q/L) \to L^2(Q/L)$ exist for all $\sigma \in \mathcal{T}^*$ with their norm uniformly bounded, we have
\[ (\hat B_\mu - \lambda I)^{-1} f = \mathcal{D}\bigl( B(\mu,\cdot) F(\cdot) \bigr), \qquad \text{where } F = \mathcal{D}^{-1} f. \tag{2.6} \]
See Lemma A.3 for the exact statement. In such a way it is possible to reduce the set of perturbations in $L^2(Q)$ to the space $L^2(Q/L)$ while $\sigma \in \mathcal{T}^*$ appears as an additional parameter. If we are able to control the perturbations for all $\sigma \in \mathcal{T}^*$ simultaneously, then we are able to decide on stability. Note that no assumption on self-adjointness is needed for this theory. The only important fact is that we are in a Hilbert space setting, which enables us to use the Bloch decomposition. In Appendix A we show that all this can be made rigorous for general elliptic operators with suitable boundary conditions. The following result is provided there.

Theorem 2.1. Let $B_\mu(\partial_x)$ be an elliptic operator on $Q$ with $L$-periodic coefficients and $\mathcal{B}$ a boundary operator on $\partial Q$ satisfying conditions A.2. Define $\hat B_\mu(\partial_x)$, $\tilde B_\mu(\partial_x)$ according to (2.3) on $L^2(Q)$ and $L^2_{lu}(Q)$, respectively, and the Bloch operators $B(\mu,\sigma)$ according to (2.5). Then we have
\[ \operatorname{spec}(\tilde B_\mu(\partial_x)) = \operatorname{spec}(\hat B_\mu(\partial_x)) = \operatorname{closure}\Bigl( \bigcup_{\sigma \in \mathcal{T}^*} \operatorname{spec}(B(\mu,\sigma)) \Bigr). \tag{2.7} \]
Remarks.
1. The spectra of $\widetilde B_\mu$ and $\widehat B_\mu$ are the same as sets; however, the type of spectrum usually differs dramatically. In fact, it is easy to see that $\bigcup_\sigma \mathrm{spec}(B(\mu,\sigma))$ is contained in $\mathrm{spec}(\widetilde B_\mu)$ as point spectrum. Observe that from $B(\mu,\sigma)U = \lambda U$ the relation $\widetilde B_\mu[e^{i\sigma\cdot x}U] = \lambda e^{i\sigma\cdot x}U \in L^2_{lu}(Q)$ follows immediately. For the operator $\widehat B_\mu$ these points are not necessarily in the point spectrum, since $e^{i\sigma\cdot x}U \notin L^2(Q)$.
2. Another difference appears when approaching the spectrum from inside the resolvent set. For instance, if $\widetilde B_\mu$ is self–adjoint we have
$$\|(\widetilde B_\mu - \lambda I)^{-1}\|_{L^2(Q)\to L^2(Q)} = C\,\mathrm{dist}(\lambda, \mathrm{spec}(\widetilde B_\mu))^{-n}$$
with $C = n = 1$. However, the blow up for the operator $(\widehat B_\mu - \lambda I)^{-1}$ might be much worse, i.e. with $C \ge 1$ and $n \ge 1$. This question plays an important role if spectral
stability has to be improved to linearized stability. Then, we want to estimate the semigroup $(e^{\widetilde B_\mu t})_{t\ge 0}$ or $(e^{\widehat B_\mu t})_{t\ge 0}$ for large $t$. Under the additional assumption that $B_\mu$ is a sectorial operator, one obtains $\|e^{\widetilde B_\mu t}\| \le C(1 + t^{n-1})e^{\nu t}$ for $t \ge 1$.
3. In our application we cannot expect exponential stability since the spectrum always contains the origin $\lambda = 0$ if the periodic solution $\tilde u_\mu$ is non–constant. This is easily seen since some partial derivative $\partial_{x_j}\tilde u_\mu$ is nonzero and it is in the kernel of $\widetilde B_\mu(\partial_x)$.

Thus, it remains to study the spectra of the Bloch operators $B(\mu,\sigma)$. For elliptic operators $B_\mu(\partial_x)$ the Bloch operators are also elliptic and they are defined on the bounded spatial domain $Q/L = T \times \Sigma$. Hence, they are Fredholm operators of index zero with compact resolvent.

In order to analyze the spectrum we assume further on that we are in a bifurcation situation, where the stationary periodic pattern $\tilde u_\mu$ is small. Then, it is natural to assume that $u = 0$ is stable for $\mu = 0$. If $u = 0$ were unstable, then a small $\tilde u_\mu$ could not gain stability. Thus, our main assumption on system (2.1) is that $A_0(\partial_x)$ is an elliptic operator on $L^2(Q)$ which is spectrally stable. More precisely, our method can only work when the spectrum of $A_0(\partial_x)$ is contained in a set $S_g = \{\, \lambda \in \mathbb{C} : \mathrm{Re}\,\lambda \le -g(|\mathrm{Im}\,\lambda|) \,\}$ where $g : [0,\infty) \to [0,\infty)$ satisfies $g(0) = 0$ and $g(t) \ge g(s) > 0$ for $t > s > 0$.

The reason for this spectral bound is that our method involves perturbation arguments. Linearization around a small solution $(\mu, \tilde u_\mu)$ leads to the linear operator $B_\mu = A_\mu(\partial_x) + D_uN(\mu,\partial_x,\tilde u_\mu)$ with $\delta(\mu) = \|(B_\mu - A_0)(A_0 - I)^{-1}\| \to 0$ for $|\mu| \to 0$. Hence, standard perturbation arguments (see [Kat76]) show that the distance of the spectrum of $B_\mu$ from that of $A_0$ is less than $\delta(\mu)$. Our assumption $\mathrm{spec}(A_0) \subset S_g$ now implies that the spectrum of $B_\mu$ is contained in $\{\, \lambda \in \mathbb{C} : \mathrm{dist}(\lambda, S_g) \le \delta(\mu) \,\}$.
Thus, we immediately conclude that for $\mu \to 0$ the unstable part (i.e., $\mathrm{Re}\,\lambda > 0$) of the spectrum of $B_\mu$ is contained in a small neighborhood of zero. More precisely, for each $\varepsilon > 0$ there is a $\mu_0$ such that for all $\mu$ with $|\mu| \le \mu_0$ the spectrum of $B_\mu$ is contained in $\{\, \lambda \in \mathbb{C} : \mathrm{Re}\,\lambda < 0 \text{ or } |\lambda| \le \varepsilon \,\}$.

Our method is devised exactly to study the spectrum close to $\lambda = 0$ in the case that $\tilde u_\mu$ is a small spatially periodic steady state of (2.1). We are not able to control large $\lambda$ nor large solutions $\tilde u_\mu$, since our analysis is based on the exact control of the operator $A_0(\partial_x)$, which can be obtained by Fourier transform with respect to $x \in \mathbb{R}^d$.

For $\mu = 0$ we know that the spectrum of $B(0,\sigma)$ is contained in $\{0\} \cup \{\, \lambda \in \mathbb{C} : \mathrm{Re}\,\lambda < 0 \,\}$. The kernel is finite–dimensional and depends on $\sigma$. The general a–priori estimate (A.9) tells us that for large $\sigma \in T^*$ the kernel is trivial, so that only a compact set $S_0$ of wave vectors $\sigma$ can be important, i.e. $S_0 = \{\, \sigma \in T^* : \dim \mathrm{kernel}(B(0,\sigma)) > 0 \,\}$.

Considering now general small $\mu$ we immediately see that we only have to control the operators in a neighborhood of $S_0$. In fact, defining the set $S_\mu$ of unstable wave vectors as
$$S_\mu = \{\, \sigma \in T^* : B(\mu,\sigma) \text{ has an eigenvalue } \lambda \text{ with } \mathrm{Re}\,\lambda > 0 \,\}, \tag{2.8}$$
perturbation theory for operators with compact resolvent implies $\mathrm{dist}(S_\mu, S_0) \to 0$ for $\mu \to 0$. Thus, it remains to control the finitely many eigenvalues of $B(\mu,\sigma)$ for $\mu \approx 0$ and $\sigma \approx \sigma_0 \in S_0$. This we can do with the help of the Liapunov–Schmidt reduction applied to the linear eigenvalue problem
$$K(\mu,\sigma,\lambda)U \overset{\mathrm{def}}{=} B(\mu,\sigma)U - \lambda U = 0. \tag{2.9}$$
It is our aim to find nontrivial solutions of this equation, and we do this by treating it as a bifurcation problem. Although this is a perturbation problem for linear operators we
use the Liapunov–Schmidt reduction since it is so closely related to the typical way of establishing the bifurcation result for the nonlinear problem, cf. [Mie95, Mi97b]. The main point is that it is sufficient to consider small $\lambda$ as was shown above. Hence, for $\sigma_0 \in S_0$ fixed, $(\mu, \sigma - \sigma_0, \lambda)$ can be treated as a small bifurcation parameter in (2.9).

For $(\mu,\sigma,\lambda) = (0,\sigma_0,0)$ we find splittings $D(B) = X_0(\sigma_0) \oplus X_1(\sigma_0)$ and $L^2(Q/L) = Y_0(\sigma_0) \oplus Y_1(\sigma_0)$ such that $X_0(\sigma_0)$ is the finite–dimensional kernel of $K(0,\sigma_0,0) = B(0,\sigma_0)$ and $Y_1(\sigma_0)$ its range. Since the Fredholm index of $B(\mu,\sigma)$ is 0, the dimensions of $Y_0$ and $X_0$ are the same. Decompose $U = U_0 + U_1$ with $U_j \in X_j$, $F = F_0 + F_1$ with $F_j \in Y_j$, and let $P : L^2(Q/L) \to L^2(Q/L)$ be the projection with $PF = F_0$. Then, $K(\mu,\sigma,\lambda)U = 0$ is equivalent to
$$P K(\mu,\sigma,\lambda)(U_0 + U_1) = 0, \qquad (I-P)K(\mu,\sigma,\lambda)(U_0 + U_1) = 0,$$
where the second relation can be inverted for $(\mu, \sigma - \sigma_0, \lambda)$ sufficiently small in order to obtain $U_1 = \mathcal{U}(\mu,\sigma,\lambda)U_0$. Inserting this result into the first equation we are left with the reduced spectral problem
$$\widetilde K(\mu,\sigma,\lambda)U_0 \overset{\mathrm{def}}{=} P K(\mu,\sigma,\lambda)(U_0 + \mathcal{U}(\mu,\sigma,\lambda)U_0) = 0. \tag{2.10}$$
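The reduction leading to (2.10) can be tried out on a finite–dimensional toy problem. The following is only a sketch under stated assumptions: the symmetric 3×3 matrix `B`, the choice $X_0 = Y_0 = \mathrm{span}(e_1)$, and all function names are hypothetical and not taken from the paper. Solving the range equation for $U_1$ and inserting it into the kernel equation gives a scalar function of $\lambda$ (a Schur complement) whose zero should be an eigenvalue of `B`.

```python
# Finite-dimensional analogue of the Liapunov-Schmidt reduction (2.10).
# B is a hypothetical symmetric 3x3 matrix with one eigenvalue close to 0;
# X0 = Y0 = span(e1) plays the role of the kernel, X1 = Y1 = span(e2, e3).
B = [[0.1, 0.2, 0.0],
     [0.2, -1.0, 0.3],
     [0.0, 0.3, -2.0]]


def reduced_function(lam):
    """Scalar reduced problem K00 - K01 * K11^{-1} * K10 for K = B - lam*I."""
    k00 = B[0][0] - lam
    k01 = (B[0][1], B[0][2])
    k10 = (B[1][0], B[2][0])
    a, b = B[1][1] - lam, B[1][2]
    c, d = B[2][1], B[2][2] - lam
    det11 = a * d - b * c  # K11 has to be invertible for the reduction
    # Range equation: solve K11 * u1 = -K10 (with U0 = e1).
    u1 = (-(d * k10[0] - b * k10[1]) / det11,
          -(-c * k10[0] + a * k10[1]) / det11)
    # Kernel equation: insert U1 = u1 into the first row.
    return k00 + k01[0] * u1[0] + k01[1] * u1[1]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def small_eigenvalue(lo=-0.5, hi=0.5, steps=80):
    """Bisection for the zero of the reduced function in [lo, hi]."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if reduced_function(lo) * reduced_function(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


lam_star = small_eigenvalue()
shifted = [[B[i][j] - (lam_star if i == j else 0.0) for j in range(3)]
           for i in range(3)]
print(lam_star, det3(shifted))  # the determinant should vanish up to rounding
```

As in the text, the reduced function is nonlinear in $\lambda$ although the original problem is linear; the price of the reduction is paid in exchange for a lower dimension.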
This reduced problem is no longer linear in $\lambda$; however, it is finite–dimensional with $\widetilde K(\mu,\sigma,\lambda) : X_0(\sigma_0) \to Y_0(\sigma_0)$. Equation (2.10) has nontrivial solutions $U_0$ if and only if
$$\Lambda(\mu,\sigma,\lambda) \overset{\mathrm{def}}{=} \det \widetilde K(\mu,\sigma,\lambda) = 0.$$
We note that $\sigma$ has to be close to $\sigma_0 \in S_0$. By compactness it is sufficient to do this reduction for finitely many $\sigma_0$, where the subspaces $X_0(\sigma_0)$ and $Y_0(\sigma_0)$ can change dramatically: generally, even the dimension will change.

The present approach does not only provide a tool to decide on stability or instability of the periodic pattern. It also gives a way to describe the set of unstable wave vectors quite precisely. Analyzing the problems $\Lambda(\mu,\sigma,\lambda) = 0$ we obtain information on the set $S_\mu$, cf. (2.8). Moreover, it is possible to find those wave vectors $\sigma \in S_\mu$ which correspond to those $\lambda$ having the largest real part. Such characterizations of $S_\mu$ are important in the theory of pattern formation. One special case has attracted a lot of attention over the last thirty years, namely that of sideband instabilities. This phenomenon is now easily identified in the present context with the situation when $S_\mu$ is contained in a small neighborhood of $\sigma = 0$, but $\sigma = 0$ itself is not in $S_\mu$. We will discover such sideband instabilities in the next section.

3. The Real and Complex Swift–Hohenberg Equation

We work out the details of the method for a simple model problem showing the same theoretical behavior as many other pattern forming systems. The two–dimensional Swift–Hohenberg equation (SHE) is given by
$$u_t = -(1+\Delta)^2 u + \varepsilon u - u^3, \quad\text{for } t > 0,\ x \in Q = \mathbb{R}^2, \tag{3.1}$$
where $\Delta = \partial_{x_1}^2 + \partial_{x_2}^2$ is the Laplace operator. The linearization at zero admits the solutions $v(t,x) = e^{\lambda t + i(k_1x_1 + k_2x_2)}$ with $\lambda(k_1,k_2) = -(1 - k_1^2 - k_2^2)^2 + \varepsilon$. Hence, $u \equiv 0$ is weakly unstable with unstable modes having wave vectors with $k_1^2 + k_2^2 \approx 1$.
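The dispersion relation of the linearization at zero can be evaluated directly. The short sketch below uses hypothetical function names and the sample value $\varepsilon = 0.01$, neither of which comes from the paper; it computes the growth rate $\lambda(k_1,k_2)$ and the annulus $1 - \sqrt{\varepsilon} < k_1^2 + k_2^2 < 1 + \sqrt{\varepsilon}$ on which it is positive.

```python
import math


def growth_rate(k1, k2, eps):
    """Growth rate -(1 - k1^2 - k2^2)^2 + eps of the mode exp(lambda*t + i*k.x)."""
    return -(1.0 - k1 ** 2 - k2 ** 2) ** 2 + eps


def unstable_radii(eps):
    """Inner and outer radius |k| of the unstable annulus (needs 0 < eps < 1)."""
    root = math.sqrt(eps)
    return math.sqrt(1.0 - root), math.sqrt(1.0 + root)


eps = 0.01  # illustrative value only
r_in, r_out = unstable_radii(eps)
print(r_in, r_out)                   # both radii are close to 1
print(growth_rate(1.0, 0.0, eps))    # on the circle |k| = 1 the rate equals eps
print(growth_rate(0.5, 0.0, eps))    # well inside the circle the mode decays
```

The width of the annulus is of order $\sqrt{\varepsilon}$, which is the scale that reappears in the estimates for the sets of unstable wave vectors.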
The basic patterns of interest are so–called rolls, which are independent of time and of $x_2$ (after a suitable rotation), and periodic in $x_1$. Taking the period in $x_1$ to be $2\pi/k$ with $k = \sqrt{1+\kappa}$ we are looking for a solution $u$ of (3.1) in the form $u(t,x) = U(\xi)$ where $\xi = kx_1 \in T_{2\pi} = \mathbb{R}/2\pi\mathbb{Z}$. The problem for $U$ reads
$$0 = N(\varepsilon,\kappa,U) \overset{\mathrm{def}}{=} -(1 + (1+\kappa)\partial_\xi^2)^2 U + \varepsilon U - U^3, \quad U \in H^4(T_{2\pi}), \tag{3.2}$$
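where $N : \mathbb{R}^2 \times H^4(T_{2\pi}) \to L^2(T_{2\pi})$ is an analytical mapping. From [Mie95] (see also [CoE90], Thm. 17.1) we have the following result on the existence of steady roll patterns.

Theorem 3.1. There is an $\varepsilon_0 > 0$ such that for all $\varepsilon \in (0,\varepsilon_0]$ and all $\kappa \in (-\sqrt{\varepsilon}, \sqrt{\varepsilon})$ there is a unique small solution $U = U_{\varepsilon,\kappa} \in H^4(T_{2\pi})$ of (3.2) which is even in $\xi$ and positive at $\xi = 0$. This solution has the expansion
$$U_{\varepsilon,\kappa}(\xi) = a_1\cos\xi + a_3\cos(3\xi) + O(\tilde a^5) \quad\text{for } (\varepsilon,\kappa) \to 0,$$
where $\tilde a = \tilde a(\varepsilon,\kappa) = \sqrt{4(\varepsilon-\kappa^2)/3}$ and
$$a_1 = \frac{1}{\pi}\int_0^{2\pi} U_{\varepsilon,\kappa}(\xi)\cos\xi\,d\xi = \tilde a + \tilde a^3/512 + O(\tilde a^4), \qquad a_3 = \frac{1}{\pi}\int_0^{2\pi} U_{\varepsilon,\kappa}(\xi)\cos(3\xi)\,d\xi = -\tilde a^3/256 + O(\tilde a^4). \tag{3.3}$$
Moreover, $U_{\varepsilon,\kappa}(\pi+\xi) = -U_{\varepsilon,\kappa}(\xi)$.

In light of Sect. 2 we say that the solution $\tilde u_{\varepsilon,\kappa}(x) = U_{\varepsilon,\kappa}(kx_1)$ is (spectrally) unstable, if there exists $\lambda \in \mathbb{C}$ with $\mathrm{Re}\,\lambda > 0$ and a nontrivial smooth bounded function $v$ such that
$$\lambda v = -(1+\Delta)^2 v + (\varepsilon - 3\tilde u_{\varepsilon,\kappa}^2)v.$$
The following necessary and sufficient stability criterion is derived in the next section together with precise information on the set $S_{\varepsilon,\kappa}$ of unstable wave vectors.

Theorem 3.2. There is a positive $\varepsilon_1$, and there are curves $\kappa = K_Z(\varepsilon)$ and $\varepsilon = E_E(\kappa)$, satisfying the expansions
$$K_Z(\varepsilon) = -\varepsilon^2/512 + O(\varepsilon^3), \qquad E_E(\kappa) = 3\kappa^2 - \kappa^3 + O(|\kappa|^4),$$
such that the roll solution $U_{\varepsilon,\kappa}$ with $\varepsilon \in (0,\varepsilon_1]$ and $|\kappa| \le \sqrt{\varepsilon}$ is stable if and only if
$$\varepsilon \ge E_E(\kappa) \quad\text{and}\quad \kappa \ge K_Z(\varepsilon). \tag{3.4}$$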
Remarks.
1. The bound $\varepsilon \ge E_E(\kappa)$ is called the Eckhaus criterion (cf. [Eck65]), which contains the universal factor 3: rolls exist for $\varepsilon > \kappa^2$ but the rolls are stable only for $\varepsilon \ge 3\kappa^2 + O(|\kappa|^3)$. The bound $\kappa \ge K_Z(\varepsilon)$ is the zigzag instability bound, see [Bus71] for a first discussion.
2. Our results are sharper than those in [CoE90], Thm. 20.1+2 and [Kuw96]. Reformulating the latter results in our notation gives a statement as follows: there are curves $K_Z^1(\varepsilon) < 0 < K_Z^2(\varepsilon)$ and $E_E^1(\kappa) < E_E^2(\kappa)$ with $K_Z^2(\varepsilon) - K_Z^1(\varepsilon) = O(\varepsilon^\alpha)$ and $E_E^2(\kappa) - E_E^1(\kappa) = O(|\kappa|^\beta)$ for suitable $\alpha > 1$ and $\beta > 2$ such that stability can be concluded if $\varepsilon \ge E_E^2(\kappa)$ and $\kappa \ge K_Z^2(\varepsilon)$, whereas instability holds if either $\varepsilon < E_E^1(\kappa)$ or $\kappa < K_Z^1(\varepsilon)$. Hence, small tongues around the exact boundaries remained where no conclusion could be made.
3. There are parameters $(\kappa,\varepsilon)$ with $\kappa < 0$ such that the roll $U_{\varepsilon,\kappa}$ is stable. Moreover, all small rolls with $\kappa = 0$ are stable.
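To get a feeling for the criterion (3.4), one can classify a parameter pair with the leading terms of the two expansions. This is only a sketch: the remainders $O(\varepsilon^3)$ and $O(|\kappa|^4)$ of Theorem 3.2 are dropped, so the answer is unreliable for parameters closer to a boundary curve than these remainders, and the function names and sample values are hypothetical.

```python
def eckhaus_bound(kappa):
    """Leading terms 3*kappa^2 - kappa^3 of E_E(kappa); remainder dropped."""
    return 3.0 * kappa ** 2 - kappa ** 3


def zigzag_bound(eps):
    """Leading term -eps^2/512 of K_Z(eps); remainder dropped."""
    return -eps ** 2 / 512.0


def classify_roll(eps, kappa):
    """Classify (eps, kappa) by the truncated version of criterion (3.4)."""
    if eps <= kappa ** 2:
        return "no roll"            # rolls exist only for eps > kappa^2
    if eps < eckhaus_bound(kappa):
        return "Eckhaus unstable"   # reported first if both bounds fail
    if kappa < zigzag_bound(eps):
        return "zigzag unstable"
    return "stable"


for eps, kappa in [(0.01, 0.0), (0.01, 0.09), (0.01, -0.001), (0.01, -1e-7)]:
    print(eps, kappa, classify_roll(eps, kappa))
```

The last sample illustrates Remark 3: a slightly negative $\kappa$ above the zigzag bound is still classified as stable.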
We postpone the proof of this result to Sect. 4 and study first a somewhat similar problem which is much easier as no Liapunov–Schmidt reduction is necessary. Nevertheless, it shows the ideas and technicalities in the discussion of the algebraic eigenvalue problem. The complex SHE is given by
$$\partial_t A = -(1+\Delta)^2 A + \varepsilon A - |A|^2 A, \quad t \ge 0,\ x \in \mathbb{R}^2, \tag{3.5}$$
where $A(t,x,y) \in \mathbb{C}$. In contrast to the real SHE this problem has an additional symmetry group, namely the phase invariance $A \mapsto e^{i\alpha}A$ for $\alpha \in T_{2\pi}$. Obviously, the real SHE is contained in (3.5) by restricting to real–valued $A$. We will study the stability of the explicitly known family of stationary roll solutions given by
$$A(x) = r e^{i(\alpha + k_1x_1 + k_2x_2)}, \quad\text{where } r^2 = \varepsilon - (1 - k_1^2 - k_2^2)^2. \tag{3.6}$$
Using the rotational invariance we may assume $(k_1,k_2) = (k,0)$ with $k = \sqrt{1+\kappa}$ and denote by $A_{\varepsilon,\kappa}$ the unique solution in (3.6) with $\alpha = 0$. These solutions are not related to the previously studied $U_{\varepsilon,\kappa}$, which are, of course, also stationary solutions of (3.5).

To study the stability of $A_{\varepsilon,\kappa}$ we consider the linearization of (3.5) around this steady state:
$$\partial_t B = -(1+\Delta)^2 B + \varepsilon B - 2|A_{\varepsilon,\kappa}|^2 B - A_{\varepsilon,\kappa}^2 \overline{B}, \tag{3.7}$$
where $\overline{B}$ is the complex conjugate of $B$. We let $B = (w_1 + iw_2)e^{ikx_1}$ with $w_1, w_2 \in \mathbb{R}$ and arrive at the constant coefficient problem
$$\partial_t \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} = \begin{pmatrix} L_4 + \varepsilon - 3r^2 & 4kL_2\partial_{x_1} \\ -4kL_2\partial_{x_1} & L_4 + \varepsilon - r^2 \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}, \tag{3.8}$$
where $L_4 = -L_2^2 + 4(1+\kappa)\partial_{x_1}^2$ and $L_2 = \Delta - \kappa$. This linear system can be solved completely by Fourier transform. Looking for solutions in the form $w = e^{\lambda t + i(k(\sigma_1-1)x_1 + \sigma_2x_2)}W$ with constant $W \in \mathbb{C}^2$ we obtain the algebraic problem
$$\begin{pmatrix} \rho + c - \lambda & i\nu \\ -i\nu & \rho - \lambda \end{pmatrix} W = 0, \tag{3.9}$$
where
$$\begin{aligned} \rho &= -(\kappa + (1+\kappa)(\sigma_1-1)^2 + \sigma_2^2)^2 - 4(1+\kappa)^2(\sigma_1-1)^2 + \kappa^2, \\ \nu &= -4(1+\kappa)(\sigma_1-1)(\kappa + (1+\kappa)(\sigma_1-1)^2 + \sigma_2^2), \quad\text{and}\quad c = -2(\varepsilon - \kappa^2). \end{aligned} \tag{3.10}$$
Since roll solutions only exist for $\varepsilon > \kappa^2$ we always have $c < 0$. Note that we have shifted back the vector $\sigma$ by $(-1,0)$ to account for the factor $e^{ikx_1}$ in the ansatz $B = (w_1 + iw_2)e^{ikx_1}$.

The two eigenvalues $\lambda$ obtained from solving (3.9) are real and can be expressed explicitly by solving $\lambda^2 - (2\rho + c)\lambda + \rho(\rho + c) - \nu^2 = 0$. Our aim is to characterize the unstable wave vectors $\sigma$, where at least one eigenvalue is positive. This is the case if and only if either (i) or (ii) hold, where
$$\text{(i)}\ \ \rho + c/2 > 0 \qquad\text{and}\qquad \text{(ii)}\ \ \rho(\rho + c) - \nu^2 < 0. \tag{3.11}$$
To analyze these conditions in more detail we use the abbreviations
$$\tilde s = k^2(\sigma_1 - 1)^2, \quad t = \sigma_2^2, \quad\text{and}\quad \mu = \rho + 8(1+\kappa)\tilde s.$$
In this notation conditions (3.11) take the form
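$$\begin{aligned} &\text{(i)}\ \ \mu - (\varepsilon - \kappa^2) - 8(1+\kappa)\tilde s > 0, \\ &\text{(ii)}\ \ \mu^2 - 2(\varepsilon - \kappa^2)\mu + 16(\varepsilon - 2\kappa^2)(1+\kappa)\tilde s < 0. \end{aligned} \tag{3.12}$$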
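The algebra behind (3.9)–(3.12) can be checked numerically. The following sketch (function names and sample parameters are hypothetical) evaluates $\rho$, $\nu$, $c$ from (3.10), computes the two eigenvalues of (3.9), and compares the determinant $\rho(\rho+c) - \nu^2$ with the left-hand side of (3.12)(ii). The discriminant of the quadratic simplifies to $c^2 + 4\nu^2 \ge 0$, which is why both eigenvalues are real.

```python
import math


def symbol(eps, kappa, s1, s2):
    """Return (rho, nu, c) from (3.10) for the wave vector sigma = (s1, s2)."""
    d = s1 - 1.0
    u = kappa + (1.0 + kappa) * d * d + s2 * s2
    rho = -u * u - 4.0 * (1.0 + kappa) ** 2 * d * d + kappa ** 2
    nu = -4.0 * (1.0 + kappa) * d * u
    c = -2.0 * (eps - kappa ** 2)
    return rho, nu, c


def eigenvalues(eps, kappa, s1, s2):
    """Both roots of lambda^2 - (2 rho + c) lambda + rho (rho + c) - nu^2 = 0."""
    rho, nu, c = symbol(eps, kappa, s1, s2)
    root = math.sqrt(c * c + 4.0 * nu * nu)  # discriminant is c^2 + 4 nu^2
    return 0.5 * (2.0 * rho + c - root), 0.5 * (2.0 * rho + c + root)


def is_unstable(eps, kappa, s1, s2):
    """Condition (i) or (ii) of (3.11)."""
    rho, nu, c = symbol(eps, kappa, s1, s2)
    return rho + 0.5 * c > 0.0 or rho * (rho + c) - nu * nu < 0.0


def condition_ii_lhs(eps, kappa, s1, s2):
    """Left-hand side of (3.12)(ii) in the variables (s_tilde, mu)."""
    rho, _, _ = symbol(eps, kappa, s1, s2)
    s_tilde = (1.0 + kappa) * (s1 - 1.0) ** 2
    mu = rho + 8.0 * (1.0 + kappa) * s_tilde
    return (mu * mu - 2.0 * (eps - kappa ** 2) * mu
            + 16.0 * (eps - 2.0 * kappa ** 2) * (1.0 + kappa) * s_tilde)


# A roll beyond the Eckhaus bound (kappa = 0.09) and one with kappa = 0.
print(is_unstable(0.01, 0.09, 1.02, 0.0), max(eigenvalues(0.01, 0.09, 1.02, 0.0)))
print(is_unstable(0.01, 0.0, 1.02, 0.0), max(eigenvalues(0.01, 0.0, 1.02, 0.0)))
```

The first sample point lies in the sideband of $\sigma^* = (1,0)$ and is unstable, while the same wave vector is harmless for $\kappa = 0$.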
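Of course, only such $(\tilde s,\mu)$ are allowed which can be obtained from $(\tilde s,t) \in [0,\infty)^2$, namely
$$0 \le \tilde s < \infty \quad\text{and}\quad \mu \le g(\kappa,\tilde s) \overset{\mathrm{def}}{=} \begin{cases} q(\kappa,\tilde s) & \text{if } \tilde s > -\kappa, \\ \kappa^2 + 4(1+\kappa)\tilde s & \text{if } \tilde s \in [0,-\kappa], \end{cases}$$
where $q(\kappa,\tilde s) = -\tilde s^2 + 2(2+\kappa)\tilde s$. Then, $t = -\tilde s - \kappa \pm \sqrt{\kappa^2 + 4(1+\kappa)\tilde s - \mu}$, where the minus sign is only allowed if $q(\kappa,\tilde s) \le \mu \le \kappa^2 + 4(1+\kappa)\tilde s$.

Condition (i) in (3.12) can only hold if $\kappa \in [-\sqrt{\varepsilon}, -\sqrt{\varepsilon/2}]$, namely in the region
$$A_1 = \Big\{\, (\tilde s,\mu) : 0 \le \tilde s < \frac{2\kappa^2 - \varepsilon}{4(1+\kappa)},\ \ \varepsilon - \kappa^2 + 8(1+\kappa)\tilde s < \mu < \kappa^2 + 4(1+\kappa)\tilde s \,\Big\}.$$
For condition (ii) we first consider the case $\kappa \in [0,\sqrt{\varepsilon}]$. The instability set is characterized by the intersection of the sets
$$\begin{aligned} A_2 &= \{\, (\tilde s,\mu) \in [0,\infty) \times \mathbb{R} : \mu \le g(\kappa,\tilde s) \,\} \quad\text{and} \\ A_3 &= \{\, (\tilde s,\mu) \in [0,\infty) \times \mathbb{R} : \mu^2 - 2(\varepsilon - \kappa^2)\mu + 16(\varepsilon - 2\kappa^2)(1+\kappa)\tilde s < 0 \,\}. \end{aligned}$$
Both regions are bounded by a parabola which contains the origin. Checking their position it is immediate that $A_2 \cap A_3$ is nonempty if and only if the slope of $\partial A_2$ in the origin is larger than that of $\partial A_3$. This gives the stability condition $2(2+\kappa) \le 8(\varepsilon - 2\kappa^2)(1+\kappa)/(\varepsilon - \kappa^2)$, which is the classical Eckhaus criterion:
$$\varepsilon \ge E_E^C(\kappa) \overset{\mathrm{def}}{=} \kappa^2\,\frac{6 + 7\kappa}{2 + 3\kappa} = 3\kappa^2 - \kappa^3 + 3\kappa^4/2 + O(|\kappa|^5).$$
For $\varepsilon < E_E^C(\kappa)$ we have a nontrivial intersection $A_2 \cap A_3$, which changes its type when $\varepsilon \approx 2\kappa^2$. For $\varepsilon \in (E_3^C(\kappa), E_E^C(\kappa))$ the set $A_2 \setminus A_3$ has one connected component, while for $\varepsilon \in [\kappa^2, E_3^C(\kappa)]$ the set $A_2 \setminus A_3$ has two connected components: one above the line $\mu = 0$ and one below. The boundary $\varepsilon = E_3^C(\kappa)$ is determined by the condition that the boundaries of $A_2$ and $A_3$ touch each other in a point $(\tilde s,\mu) \approx (4,0)$. We find the expansion
$$E_3^C(\kappa) = 2\kappa^2 + \tfrac{1}{64}\kappa^4 + O(|\kappa|^5).$$
The analysis of the case $\kappa \in [-\sqrt{\varepsilon}, 0)$ is more involved, since the set $A_2$ is enlarged due to the fact that $g(\kappa,\tilde s) > q(\kappa,\tilde s)$ for $\tilde s < -\kappa$. Now the intersection $A_2 \cap A_3$ is always nontrivial and hence instability is concluded. To characterize the intersection we note that $A_2 \setminus A_3$ consists of one or two connected components for $\varepsilon > E_3^C(\kappa)$ or $\varepsilon \in [\kappa^2, E_3^C(\kappa)]$, respectively. Moreover,
$$A_2 \cap A_3 = \{\, (\tilde s,\mu) : \mu \le \kappa^2 + 4(1+\kappa)\tilde s,\ \ m_-(\varepsilon,\kappa,\tilde s) < \mu < m_+(\varepsilon,\kappa,\tilde s) \,\},$$
where $m_\pm(\varepsilon,\kappa,\tilde s) = \varepsilon - \kappa^2 \pm \sqrt{(\varepsilon - \kappa^2)^2 + 16(2\kappa^2 - \varepsilon)(1+\kappa)\tilde s}$.

For $\kappa \in [-\sqrt{\varepsilon}, -\sqrt{2\varepsilon/3})$ the bound $m_+$ lies below the straight line $\mu = m_0(\tilde s) = \kappa^2 + 4(1+\kappa)\tilde s$ for small $\tilde s$. However, the set $A^*$ lying between $m_0$ and $m_+$ is contained inside the region $A_1$, where $\rho + c/2 > 0$. Hence, $A^*$ characterizes those $\sigma$ for which both eigenvalues $\lambda_{1,2}$ are positive.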
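The Eckhaus bound $E_E^C(\kappa)$ obtained above from the slope comparison can be verified in exact rational arithmetic. In the sketch below the function names and the sample value $\kappa = 1/10$ are hypothetical; it checks that the two slopes at the origin coincide exactly on the curve $\varepsilon = E_E^C(\kappa)$ and that the three-term series differs from the rational expression only at order $\kappa^5$.

```python
from fractions import Fraction


def eckhaus_complex(kappa):
    """E_E^C(kappa) = kappa^2 (6 + 7 kappa) / (2 + 3 kappa)."""
    return kappa ** 2 * (6 + 7 * kappa) / (2 + 3 * kappa)


def slope_gap(eps, kappa):
    """Slope of the boundary of A3 minus slope of the boundary of A2 at the
    origin (case kappa >= 0); a nonnegative gap is the stability condition."""
    return (8 * (eps - 2 * kappa ** 2) * (1 + kappa) / (eps - kappa ** 2)
            - 2 * (2 + kappa))


kappa = Fraction(1, 10)
eps_crit = eckhaus_complex(kappa)
series = 3 * kappa ** 2 - kappa ** 3 + Fraction(3, 2) * kappa ** 4
print(eps_crit)                     # 67/2300
print(slope_gap(eps_crit, kappa))   # 0
print(float(eps_crit - series))     # remainder of order kappa^5
```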
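Fig. 3.1. The regions $R_1^C$ to $R_6^C$ for the complex SHE (axes: $\kappa$ horizontal, $\varepsilon$ vertical; bounding curves $E_E^C$ and $E_3^C$; region 4 is marked stable)

Fig. 3.2. The set $S^C_{\varepsilon,\kappa}$ of unstable wave vectors for $A_{\varepsilon,\kappa}$ with $(\varepsilon,\kappa) \in R_j^C$ (panels $j = 1, 2, 3, 5, 6$ in the $(\sigma_1,\sigma_2)$–plane)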
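For the interpretation of the above results in terms of $\sigma$ we recall that $A_2 \cap A_3$ always lies in a strip of width $O(\varepsilon)$ around the $\tilde s$–axis. Moreover, the line $\mu = 0$ corresponds in the case $k = 1$ to the two circles $S_0^C = \{\, \sigma \in \mathbb{R}^2 : \sigma_1^2 + \sigma_2^2 = 1 \text{ or } (\sigma_1 - 2)^2 + \sigma_2^2 = 1 \,\}$. For a given solution $A_{\varepsilon,\kappa}$ with $\varepsilon \in [\kappa^2,\varepsilon_0)$ of (3.5) we define the instability set
$$S^C_{\varepsilon,\kappa} = \{\, \sigma \in \mathbb{R}^2 : \text{either (i) or (ii) hold} \,\}.$$
Using the semidistance $\mathrm{dist}(A,B) = \sup\{\, \inf\{\, |a - b| : b \in B \,\} : a \in A \,\}$ for $A, B \subset \mathbb{R}^2$ we have the following results.

Theorem 3.3. There is a positive $\varepsilon_0$ and curves $\varepsilon = E_E^C(\kappa)$ and $\varepsilon = E_3^C(\kappa)$ in the form as given above such that for a roll solution $A_{\varepsilon,\kappa}$ with $\varepsilon \in (\kappa^2,\varepsilon_0)$ of the complex SHE (3.5) the following holds.
(a) $(1,0) \notin S^C_{\varepsilon,\kappa}$, and $(\sigma_1,\sigma_2) \in S^C_{\varepsilon,\kappa}$ implies $(2 - \sigma_1,\sigma_2), (\sigma_1,-\sigma_2) \in S^C_{\varepsilon,\kappa}$.
(b) $\mathrm{dist}(S^C_{\varepsilon,\kappa}, S_0^C) = O(\sqrt{\varepsilon})$ for $\varepsilon \to 0$.
(c) The solution $A_{\varepsilon,\kappa}$ is stable (i.e., $S^C_{\varepsilon,\kappa} = \emptyset$) if and only if $\kappa \ge 0$ and $\varepsilon \ge E_E^C(\kappa)$.
(d) On the curve $\varepsilon = E_3^C(\kappa)$ the boundary of $S^C_{\varepsilon,\kappa}$ has a pair of double points on the line $\sigma_2 = 0$ close to $\sigma_1 = -1$ and $\sigma_1 = 3$.

The curves $\varepsilon = E_E^C(\kappa)$, $\varepsilon = E_3^C(\kappa)$ and $\kappa = 0$ divide the region $\varepsilon \ge \kappa^2$ into six regions $R_j^C$, see Fig. 3.1. The boundaries between $R_j^C$ and $R_{j+1}^C$ are exactly those curves where the topological structure of $S^C_{\varepsilon,\kappa}$ changes. We depict the different shapes in Fig. 3.2.

The cases $\kappa = \sqrt{\varepsilon}$ and $\kappa = -\sqrt{\varepsilon}$ can be given explicitly: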
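$$\begin{aligned} S^C_{\varepsilon,\sqrt{\varepsilon}} &= \Big\{\, \sigma \in \mathbb{R}^2 : \exists\,\beta \in \{0,2\} \text{ such that } \frac{1 - \sqrt{\varepsilon}}{1 + \sqrt{\varepsilon}} < (\sigma_1 - \beta)^2 + \frac{\sigma_2^2}{1 + \sqrt{\varepsilon}} < 1 \,\Big\}, \\ S^C_{\varepsilon,-\sqrt{\varepsilon}} &= \Big\{\, \sigma \in \mathbb{R}^2 : \exists\,\beta \in \{0,2\} \text{ such that } 1 < (\sigma_1 - \beta)^2 + \frac{\sigma_2^2}{1 - \sqrt{\varepsilon}} < \frac{1 + \sqrt{\varepsilon}}{1 - \sqrt{\varepsilon}} \,\Big\}. \end{aligned} \tag{3.13}$$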
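In both cases we have two annuli of radii close to 1 and thickness $2\sqrt{\varepsilon} + O(\varepsilon)$. For $\kappa = \sqrt{\varepsilon}$ the annuli touch each other in $\sigma^* = (1,0)$, while for $\kappa = -\sqrt{\varepsilon}$ they overlap such that $\sigma^*$ remains an isolated point in the complement of $S^C_{\varepsilon,-\sqrt{\varepsilon}}$.

For later reference we consider the case $\varepsilon < 2\kappa^2$ such that the boundary of $S^C_{\varepsilon,\kappa}$ for $\sigma \approx (2,1)$ has two branches $\sigma_2 = \Sigma_{+,-}(\varepsilon,\kappa,\sigma_1)$ with the expansions
$$\Sigma_{+,-}(\varepsilon,\kappa,\sigma_1) = \alpha_{+,-}(\varepsilon,\kappa) + \beta_{+,-}(\varepsilon,\kappa)(\sigma_1 - 2) + O(|\sigma_1 - 2|^2), \tag{3.14}$$
where $\alpha_{+,-} = 1 \pm \tfrac{1}{2}\sqrt{2\kappa^2 - \varepsilon} + O(\varepsilon)$ and $\beta_{+,-} = \mp \dfrac{(\varepsilon - \kappa^2)^2}{32\sqrt{2\kappa^2 - \varepsilon}} + O(\varepsilon^2)$.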
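4. On the Set of Unstable Wave Vectors

We return to the real SHE and study the set $S_{\varepsilon,\kappa}$ of the unstable wave vectors associated to the roll $U_{\varepsilon,\kappa}$. In showing $S_{\varepsilon,\kappa} = \emptyset$ we prove Theorem 3.2. The linearization of (3.1) around the roll solution $U_{\varepsilon,\kappa}$ given in (3.3) defines the full operator
$$\widehat B_{\varepsilon,\kappa}(\partial_\xi) : H^4(\mathbb{R}^2) \subset L^2(\mathbb{R}^2) \to L^2(\mathbb{R}^2), \qquad \widehat B_{\varepsilon,\kappa}(\partial_\xi)v = -(1 + k^2\partial_\xi^2 + \partial_{x_2}^2)^2 v + \big(\varepsilon - 3U_{\varepsilon,\kappa}(\xi)^2\big)v.$$
Of course we can also consider the operator $\widetilde B_{\varepsilon,\kappa} : H^4_{lu}(\mathbb{R}^2) \subset L^2_{lu}(\mathbb{R}^2) \to L^2_{lu}(\mathbb{R}^2)$, which is defined by the same formula.

The basic state $U_{\varepsilon,\kappa}$ is $2\pi$–periodic; however, the coefficient $\varepsilon - 3U_{\varepsilon,\kappa}^2$ is $\pi$–periodic in $\xi$, since $U_{\varepsilon,\kappa}(\xi + \pi) = -U_{\varepsilon,\kappa}(\xi)$. Hence, it is advantageous to work with the lattice group $L = \pi\mathbb{Z}$ rather than with $\widetilde L = 2\pi\mathbb{Z}$, which is the translation group of $U_{\varepsilon,\kappa}$. We apply the abstract theory of Sect. 2 (using the coordinates $(\xi,x_2)$) with $d = 2$, $\tilde d = 1$, $Q = \mathbb{R}^2$, $L = \pi\mathbb{Z} \times \mathbb{R}$, $L^* = 2\mathbb{Z} \times \{0\}$, $T = \mathbb{R}^2/L = T_\pi \times \{0\}$, and $T^* = \mathbb{R}^2/L^* = T_2 \times \mathbb{R}$. The Bloch operator family is given by
$$B(\varepsilon,\kappa,\sigma) : H^4(T_\pi) \subset L^2(T_\pi) \to L^2(T_\pi), \qquad B(\varepsilon,\kappa,\sigma)V = -(1 + k^2(\partial_\xi + i\sigma_1)^2 - \sigma_2^2)^2 V + \big(\varepsilon - 3U_{\varepsilon,\kappa}(\xi)^2\big)V,$$
where $(\varepsilon,\kappa,\sigma) \in \mathbb{R}^4$. Here $B$ is even in $\sigma_2$, and the operator $B(\varepsilon,\kappa,\sigma_1 + m,\sigma_2)$, $m \in 2\mathbb{Z}$, is unitary equivalent to $B(\varepsilon,\kappa,\sigma)$, since it is connected to $B(\varepsilon,\kappa,\sigma)$ by the transformation $V(\xi) \mapsto e^{im\xi}V(\xi)$. Moreover, there are two reflection symmetries given by
$$(R_1V)(\xi) = V(-\xi) \quad\text{and}\quad (R_2V)(\xi) = \overline{V(\xi)}. \tag{4.1}$$
In both cases we have $R_j^{-1} = R_j$ and $B(\varepsilon,\kappa,\sigma) = R_jB(\varepsilon,\kappa,(-\sigma_1,\sigma_2))R_j$. Hence, it is sufficient to study the case $\sigma \in [0,1] \times [0,\infty)$, which is only one quarter of $T^* = T_2 \times \mathbb{R}$.

All the Bloch operators $B(\varepsilon,\kappa,\sigma)$ are selfadjoint, which is helpful but not essential for our theory. We strongly use the fact that the operators are small perturbations of $B^\kappa(\sigma) \overset{\mathrm{def}}{=} B(\kappa^2,\kappa,\sigma)$, which is trivially analyzed as it has constant coefficients: $B^\kappa(\sigma)\phi_m = (\mu_m(\kappa,\sigma) + \kappa^2)\phi_m$ with $\phi_m(\xi) = e^{im\xi}$, $m \in 2\mathbb{Z}$, and
$$\mu_m(\kappa,\sigma) = -(1 - (1+\kappa)(m + \sigma_1)^2 - \sigma_2^2)^2. \tag{4.2}$$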
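Thus, for $\sigma_1 \in [0,1]$ we have the explicit upper bound
$$\int_0^\pi (B^\kappa(\sigma)V)\,\overline{V}\,d\xi \le -\beta(\kappa,\sigma)\|V\|^2_{L^2(T_\pi)},$$
where $\beta(\kappa,\sigma) = \min\{(1 - (1+\kappa)\sigma_1^2 - \sigma_2^2)^2, (1 - (1+\kappa)(2 - \sigma_1)^2 - \sigma_2^2)^2\} - \kappa^2$. Since $\kappa \approx 0$ we immediately identify the dangerous set $S_0 = \{\, \sigma \in T^* = T_2 \times \mathbb{R} : \sigma_1^2 + \sigma_2^2 = 1 \text{ or } (\sigma_1 - 2)^2 + \sigma_2^2 = 1 \,\}$. Hence, if $\sigma$ is bounded away from $S_0$, we obtain a good bound on the spectrum of $B$. Choosing a small $\delta \in (0,1]$ independent of $(\varepsilon,\kappa,\sigma)$ we define the set of good wave vectors as $G_\delta = \{\, \sigma \in [0,1] \times [0,\infty) : \mathrm{dist}(\sigma,S_0) \ge \delta \,\}$. For $\sigma \in G_\delta$ we have $\beta(\sigma,\kappa) \ge \delta^2/2$ for all sufficiently small $\kappa$. For general small $\varepsilon \ge \kappa^2$ we have $\|B(\varepsilon,\kappa,\sigma) - B^\kappa(\sigma)\|_{L^2 \to L^2} = \|\varepsilon - \kappa^2 - 3U_{\varepsilon,\kappa}^2\|_\infty \le \tilde a^2$ for sufficiently small $\varepsilon_0$. Thus, for $\sigma \in G_\delta$ we derive the estimate
$$\int_0^\pi (B(\varepsilon,\kappa,\sigma)V)\,\overline{V}\,d\xi \le -(\delta^2/2 - \tilde a^2)\|V\|^2_{L^2(T_\pi)} \le -(\delta^2/2 - 2\varepsilon)\|V\|^2_{L^2(T_\pi)}.$$
This shows that we may choose the width $\delta$ of the good set $G_\delta$ to be of order $\sqrt{\varepsilon}$, e.g., $\delta = 3\sqrt{\varepsilon}$. However, for our purposes it suffices to fix a small $\delta$ independent of $\varepsilon$.

It remains to study $B(\varepsilon,\kappa,\sigma)$ in the dangerous parts close to the circle $\sigma_1^2 + \sigma_2^2 = 1$. To this end we distinguish the two regions
$$\begin{aligned} C_1 &= \{\, \sigma \in [0,1] \times [0,2] : \mathrm{dist}(\sigma,S_0) \le \delta, \text{ and } \sigma_2 \ge \sqrt{\delta} \,\}, \\ C_2 &= \{\, \sigma \in [1 - 2\delta,1] \times [0,\sqrt{\delta}] : \mathrm{dist}(\sigma,S_0) \le \delta \,\}. \end{aligned}$$
The operator $B^\kappa(\sigma)$ has only one small eigenvalue for $\sigma \in C_1$, while for $\sigma \in C_2$ there are two small eigenvalues. It suffices to control the movement of these small eigenvalues only, since all other eigenvalues are bounded away from the imaginary axis.

Region $C_1$. For $\sigma \in C_1$ the eigenfunction $\phi_0(\xi) \equiv 1$ is the only eigenfunction for $B^\kappa(\sigma)$ associated to a small eigenvalue, namely $\lambda_0 = -(1 - k^2\sigma_1^2 - \sigma_2^2)^2 + \kappa^2$. The associated eigenvalue of $B(\varepsilon,\kappa,\sigma)$ is constructed by Liapunov–Schmidt reduction of the eigenvalue problem $BV - \lambda V = 0$. To this end we define $P_1V = \big(\tfrac{1}{\pi}\int_0^\pi V\,\overline{\phi_0}\,d\xi\big)\phi_0$, which is the orthogonal projection in $L^2(T_\pi)$ onto $\mathrm{span}\{\phi_0\}$, and write the eigenvalue problem as
$$P_1[B(\varepsilon,\kappa,\sigma) - \lambda I][\alpha_0\phi_0 + V_2] = 0, \qquad (I - P_1)[B(\varepsilon,\kappa,\sigma) - \lambda I][\alpha_0\phi_0 + V_2] = 0,$$
where $P_1V_2 = 0$. Since $B^\kappa(\sigma)$ is invertible on $(I - P_1)L^2(T_\pi)$, the second equation can be solved for $V_2 = \mathcal{V}(\varepsilon,\kappa,\sigma,\lambda)\alpha_0$ yielding the expansion
$$\mathcal{V}(\varepsilon,\kappa,\sigma,\lambda) = \frac{3\tilde a^2}{4}\Big(\frac{1}{\mu_2 + \varepsilon - \lambda}\,\phi_2(\xi) + \frac{1}{\mu_{-2} + \varepsilon - \lambda}\,\phi_{-2}(\xi)\Big) + O(\tilde a^4),$$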
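where $\mu_m$ is defined in (4.2) and the error term $O(\tilde a^4)$ is uniform in bounded sets for $(\sigma,\lambda)$, e.g., $|\sigma| \le 3$ and $|\lambda| \le 1$. Inserting the result in the first equation we obtain the reduced spectral problem $b_0(\varepsilon,\kappa,\sigma,\lambda)\alpha_0\phi_0 = 0$ with
$$\begin{aligned} b_0(\varepsilon,\kappa,\sigma,\lambda) &= \mu_0 + \varepsilon - \lambda - \frac{3}{\pi}\int_0^\pi U_{\varepsilon,\kappa}^2(\phi_0 + \mathcal{V})\,\overline{\phi_0}\,d\xi \\ &= \mu_0 + \varepsilon - \lambda - \frac{3a_1^2}{2} - \frac{9\tilde a^4}{16}\Big(\frac{1}{\mu_2 + \varepsilon - \lambda} + \frac{1}{\mu_{-2} + \varepsilon - \lambda}\Big) + O(\tilde a^6). \end{aligned}$$
The small eigenvalue $\lambda$ is determined by solving $b_0(\varepsilon,\kappa,\sigma,\lambda) = 0$. To discuss the sign of $\lambda$ it is convenient to use polar coordinates
$$\sigma = (\sigma_1,\sigma_2) = \sqrt{1 + r}\,\big(\tfrac{1}{k}\sin\gamma, \cos\gamma\big),$$
where the region $C_1$ corresponds to $\gamma \in [0, \pi/2 - \sqrt{\delta}]$ and $|r| \le \delta$. We obtain
$$\lambda = \lambda_0(\varepsilon,\kappa,r,\gamma) = \varepsilon - \frac{3}{2}a_1^2 - r^2 + \frac{9}{128}\,\tilde a^4\,\frac{1 + \sin^2\gamma}{\cos^4\gamma} + O(\tilde a^4(\tilde a + |r|)). \tag{4.3}$$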
For $\varepsilon > 2\kappa^2 + O(|\kappa|^3)$ we always have $\lambda \le 0$, while for smaller $\varepsilon$ there is a band of unstable $\sigma$ of width $O(\sqrt{\varepsilon})$ around the circle $|\sigma| = 1$.

Region $C_2$. We are now in the situation of $\sigma \approx \sigma^* = (1,0)$, where $B^\kappa(\sigma)$ has the critical eigenfunctions $\phi_0 \equiv 1$ and $\phi_{-2}(\xi) = e^{-i2\xi}$. In fact, this is the realm of classical sideband instability as discussed in [Mie95]. There, the analysis was done in a space of functions which are $2\pi$–periodic in $\xi$ such that our region $C_2$ corresponds to $\sigma \approx 0$ there (as $\sigma_1$ is taken modulo 1). There the instability result of Theorem 3.2 was already derived, yet for our stability proof we have to repeat and improve upon these calculations.

To be compatible with the calculations in [Mie95] we use the basis functions $U_1(\xi) = e^{-i\xi}\cos\xi$ and $U_2(\xi) = e^{-i\xi}\sin\xi$ and set $\hat\sigma = \sigma - \sigma^* = (\sigma_1 - 1,\sigma_2)$. Letting $P_2V = \tfrac{2}{\pi}\int_0^\pi V\,\overline{U_1}\,d\xi\,U_1 + \tfrac{2}{\pi}\int_0^\pi V\,\overline{U_2}\,d\xi\,U_2$ and $V = \beta_1U_1 + \beta_2U_2 + V_1$ with $P_2V_1 = 0$, the equation $(I - P_2)[B(\ldots) - \lambda I]V = 0$ can be solved uniquely for $V_1 = \mathcal{V}(\varepsilon,\kappa,\sigma,\lambda)\beta = O(\tilde a^2|\beta|)$, for all sufficiently small $(\varepsilon,\kappa,\hat\sigma,\lambda)$. Again the estimate follows easily from the fact that the coupling only occurs through the term $-3U_{\varepsilon,\kappa}^2V$.

Inserting this expansion into $P_2[B(\ldots) - \lambda I]V = 0$ leads to the reduced eigenvalue problem. It is given by a $2 \times 2$–matrix, which depends nonlinearly on $\lambda$:
$$m(\varepsilon,\kappa,\sigma,\lambda)\beta = \mathcal{P}[B(\varepsilon,\kappa,\sigma) - \lambda I][\beta_1U_1 + \beta_2U_2 + \mathcal{V}(\varepsilon,\kappa,\sigma,\lambda)\beta],$$
where $\mathcal{P} : V \mapsto \tfrac{2}{\pi}\big(\int_0^\pi V\,\overline{U_1}\,d\xi, \int_0^\pi V\,\overline{U_2}\,d\xi\big) \in \mathbb{C}^2$. This gives
$$m(\varepsilon,\kappa,\sigma,\lambda) = \begin{pmatrix} \rho + c(\varepsilon,\kappa) - \lambda & i\nu \\ -i\nu & \rho - \lambda \end{pmatrix} + \tilde a^4\begin{pmatrix} O(|\hat\sigma|^2 + |\lambda|) & O(|\hat\sigma_1|) \\ O(|\hat\sigma_1|) & O(|\hat\sigma|^2 + |\lambda|) \end{pmatrix}, \tag{4.4}$$
with $\rho = (\mu_1 + \mu_{-1})/2$ and $\nu = (\mu_1 - \mu_{-1})/2$ from (3.10) and $c(\varepsilon,\kappa) = -3\tilde a^2/2 + O(\tilde a^4)$. Of course, $m$ is Hermitian and each entry is even in $\sigma_2$. Two additional facts in this expansion are nontrivial. Firstly, the symmetries $R_1$ and $R_2$ in (4.1) show that the diagonal elements are even in $\hat\sigma_1 = \sigma_1 - 1$ while $m_{12} = -m_{21}$ is odd in $\hat\sigma_1$. Secondly, $m(\varepsilon,\kappa,\sigma^*,0)$ takes the form $\begin{pmatrix} c(\varepsilon,\kappa) & 0 \\ 0 & 0 \end{pmatrix}$, where the lower diagonal element vanishes as it corresponds to the eigenvalue $\lambda = 0$ associated to the translational mode $\partial_\xi U_{\varepsilon,\kappa} = -\tilde a\sin\xi + O(\tilde a^3)$ (compare to Lemma 5.3 in [Mi97b]).
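In fact, we need a more refined expansion which follows from determining the term of order $\tilde a^2$ in $\mathcal{V}$:
$$\begin{aligned} m_{11}(\varepsilon,\kappa,\sigma,\lambda) &= \rho + \varepsilon - \lambda - \tfrac{9}{4}a_1^2 - \tfrac{3}{2}a_1a_3 + \eta_+ + O(\tilde a^6), \\ m_{12}(\varepsilon,\kappa,\sigma,\lambda) &= -m_{21}(\varepsilon,\kappa,\sigma,\lambda) = i\big(\nu + \eta_- + O(\tilde a^6)\big), \\ m_{22}(\varepsilon,\kappa,\sigma,\lambda) &= \rho + \varepsilon - \lambda - \tfrac{3}{4}a_1^2 + \tfrac{3}{2}a_1a_3 + \eta_+ + O(\tilde a^6), \end{aligned} \tag{4.5}$$
where $\eta_\pm = \tfrac{9}{32}\tilde a^4\big[(\mu_2 + \varepsilon - \lambda)^{-1} \pm (\mu_{-4} + \varepsilon - \lambda)^{-1}\big]$.

In order to study which wave vectors are stable we have to find $\lambda$ from
$$\Lambda(\varepsilon,\kappa,\sigma,\lambda) = \det m(\varepsilon,\kappa,\sigma,\lambda) = 0.$$
Applying Weierstraß' preparation theorem (see [ChH82], Ch. 2.6) we have
$$\Lambda(\varepsilon,\kappa,\sigma,\lambda) = \Lambda_0(\varepsilon,\kappa,\sigma,\lambda)\big[\lambda^2 + n_1(\varepsilon,\kappa,\sigma)\lambda + n_0(\varepsilon,\kappa,\sigma)\big],$$
where $\Lambda_0$, $n_1$, and $n_0$ are analytical functions with $\Lambda_0(0,0,0,0) = 1$ and
$$n_1 = -2\rho - c + O(\varepsilon^2|\hat\sigma|^2) \quad\text{and}\quad n_0 = \rho(\rho + c) - \nu^2 + O\big[\varepsilon^2|\hat\sigma|^2(\sqrt{\varepsilon} + |\hat\sigma|^2)\big].$$
On the one hand, we know that $m$ is Hermitian, implying that both eigenvalues are real. Hence, without calculation we always have $n_1^2 \ge 4n_0$. On the other hand, these two eigenvalues have negative real part if and only if $n_0 \ge 0$ and $n_1 \ge 0$. Since $n_1(0,0,\sigma^*) > 0$ and since $n_1$ can only change sign when $n_0 < 0$, we conclude that it suffices to consider the condition $n_0 \ge 0$ when we are only interested in the question whether $U_{\varepsilon,\kappa}$ is stable or not. However, for the subsequent calculation of $S_{\varepsilon,\kappa}$ we need to consider both conditions $n_1 \ge 0$ and $n_0 \ge 0$.

Since $n_0$ is a positive multiple of the determinant of $m(\varepsilon,\kappa,\sigma,0)$, the stability condition is $\det m(\varepsilon,\kappa,\sigma,0) \ge 0$ for all $\sigma \in C_1 \cup C_2$. From (4.5) we obtain the following expansion for $\sigma \approx \sigma^*$.

Lemma 4.1. Let $s = (\sigma_1 - 1)^2$, $t = \sigma_2^2$, and $M(\varepsilon,\kappa,\sigma) = \det m(\varepsilon,\kappa,\sigma,0)$. Then, we have
$$M = \mu_{0,1}t + \mu_{1,0}s + \mu_{0,2}t^2 + \mu_{1,1}st + \mu_{0,3}t^3 + \mu_{2,0}s^2 + \mu_{1,2}st^2 + \mu_{0,4}t^4 + O((s + t^2)^{5/2}),$$
where
$$\begin{aligned} \mu_{0,1}(\varepsilon,\kappa) &= c(\varepsilon,\kappa)\big[-2\kappa - (\varepsilon - \kappa^2)^2/256 + O(\varepsilon^{5/2})\big], \\ \mu_{1,0}(\varepsilon,\kappa) &= 8(\varepsilon - 3\kappa^2) + 4\kappa(5\varepsilon - 13\kappa^2) + O(\varepsilon^2), \\ \mu_{0,2}(\varepsilon,\kappa) &= 2(\varepsilon + \kappa^2) + O(\varepsilon^2), \\ \mu_{1,1}(\varepsilon,\kappa) &= -16\kappa + 4(\varepsilon - 7\kappa^2) + O(\varepsilon^{3/2}), \\ \mu_{0,3}(\varepsilon,\kappa) &= 4\kappa + (\varepsilon - \kappa^2)^2/128 + O(\varepsilon^{5/2}), \\ \mu_{2,0}(\varepsilon,\kappa) &= 16 + 48\kappa + 2(\varepsilon + 25\kappa^2) + O(\varepsilon^{3/2}), \\ \mu_{1,2}(\varepsilon,\kappa) &= -8 - 4\kappa + 4\kappa^2 + O(\varepsilon^{3/2}), \\ \mu_{0,4}(\varepsilon,\kappa) &= 1 + O(\varepsilon^{3/2}). \end{aligned} \tag{4.6}$$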
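The two stability curves of Theorem 3.2 can be read off from the zeros of $\mu_{1,0}$ and $\mu_{0,1}$ in (4.6). The sketch below is an approximation in which the $O(\cdot)$ remainders of (4.6) are dropped; function names and sample values are hypothetical. Solving the truncated $\mu_{1,0} = 0$ for $\varepsilon$ gives $\kappa^2(6 + 13\kappa)/(2 + 5\kappa)$, and the truncated $\mu_{0,1} = 0$ is the fixed-point equation $\kappa = -(\varepsilon - \kappa^2)^2/512$.

```python
def eckhaus_from_mu10(kappa):
    """Zero in eps of 8(eps - 3 kappa^2) + 4 kappa (5 eps - 13 kappa^2)."""
    return kappa ** 2 * (6.0 + 13.0 * kappa) / (2.0 + 5.0 * kappa)


def zigzag_from_mu01(eps, iterations=20):
    """Zero in kappa of -2 kappa - (eps - kappa^2)^2 / 256 by fixed-point iteration."""
    kappa = 0.0
    for _ in range(iterations):
        kappa = -(eps - kappa ** 2) ** 2 / 512.0
    return kappa


kappa = 0.05
print(eckhaus_from_mu10(kappa), 3.0 * kappa ** 2 - kappa ** 3)
eps = 0.1
print(zigzag_from_mu01(eps), -eps ** 2 / 512.0)
```

Both pairs of numbers agree up to the order of the terms neglected in Theorem 3.2, namely $O(|\kappa|^4)$ for $E_E$ and $O(\varepsilon^3)$ for $K_Z$.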
This expansion is suitable to discuss the set of unstable wave vectors in region $C_2 \subset [1 - 2\delta,1] \times [0,\sqrt{\delta}]$. While instability is easily obtained from the signs of $\mu_{1,0}$ and $\mu_{0,1}$, it will be a rather delicate task to prove the stability result.

Proof of Theorem 3.2. The expansion $\det m = \mu_{0,1}t + \mu_{1,0}s + O(s^2 + t^2)$ immediately leads to instability if either $\mu_{0,1}$ or $\mu_{1,0}$ in (4.6) is negative. Thus, defining $K_Z$ and $E_E$ via $\mu_{1,0}(E_E(\kappa),\kappa) = 0$ and $\mu_{0,1}(\varepsilon,K_Z(\varepsilon)) = 0$, the instability result follows by choosing suitably small $\sigma_1$ or $\sigma_2$, respectively.

To establish stability we need to conclude $S_{\varepsilon,\kappa} = \emptyset$ for those $(\varepsilon,\kappa)$ satisfying (3.4). In that region we have $\varepsilon \le 2\tilde a^2 \le 3\varepsilon$ such that the error terms in (4.6) can be expressed in terms of powers of $\tilde a$. At first we note that the intersection of $S_{\varepsilon,\kappa}$ with $C_1$ is empty since $\varepsilon \ge E_E(\kappa) = 3\kappa^2 + O(|\kappa|^3)$ clearly implies $\varepsilon \ge E_1(\kappa) = 2\kappa^2 + O(|\kappa|^3)$. Thus, it remains to consider region $C_2$.

As argued before Lemma 4.1 it suffices to show that $\det m(\varepsilon,\kappa,\sigma,0) \ge 0$ in a neighborhood of $\sigma^*$ which is independent of $(\varepsilon,\kappa)$. Before employing Lemma 4.1 we recall the expansion (4.4) which gives
$$M(\varepsilon,\kappa,\sigma) = \rho(\rho + c) - \nu^2 + O\big(\tilde a^4[\tilde a s + t^2 + s^2]\big). \tag{4.7}$$
We define $\widehat M(\varepsilon,\kappa,\sigma) = \sum_{2j+l\le 4}\mu_{j,l}(\varepsilon,\kappa)s^jt^l$ and find
$$R(\varepsilon,\kappa,\sigma) \overset{\mathrm{def}}{=} M(\varepsilon,\kappa,\sigma) - \widehat M(\varepsilon,\kappa,\sigma) = O(\tilde a^4(s + t^2)^{5/2}).$$
Our aim is to use positivity of $\widehat M$ in order to show $2|R| \le \widehat M$ in a neighborhood of $\sigma^*$. This estimate is subtle, since $\widehat M$ is degenerate for $\varepsilon = 0$, namely $\widehat M(0,0,\sigma) = (4s - t^2)^2$. For estimating $\widehat M$ from below we use
$$\mu_{1,2} \ge -2\sqrt{\mu_{2,0}\,\mu_{0,4}} \quad\text{and}\quad -\mu_{1,1}\sqrt{\mu_{0,4}} \le \mu_{0,3}\sqrt{\mu_{2,0}}.$$
Both inequalities hold (after cancellation of the leading order terms) for sufficiently small $(\varepsilon,\kappa)$ satisfying the stability criterion (3.4). Whence,
$$\widehat M(\varepsilon,\kappa,\sigma) \ge \mu_{0,1}t + \mu_{1,0}s + \mu_*t^2 + \Big(\sqrt{\mu_{0,4}}\,t^2 - \sqrt{\mu_{2,0}}\,s - \frac{\mu_{1,1}}{2\sqrt{\mu_{2,0}}}\,t\Big)^2,$$
where
$$\mu_*(\varepsilon,\kappa) = \mu_{0,2} - \mu_{1,1}^2/(4\mu_{2,0}) = 2(\varepsilon - \kappa^2) + O(\tilde a^3) \ge \tilde a^2$$
for sufficiently small $\tilde a$. For all small $s, t \ge 0$ we obtain $\widehat M \ge \mu_{0,1}t + \mu_{1,0}s + \tilde a^2t^2$. Additionally, for $s \ge t$ we have $\sqrt{\mu_{2,0}}\,s - \sqrt{\mu_{0,4}}\,t^2 - \frac{\mu_{1,1}}{2\sqrt{\mu_{2,0}}}\,t \ge s$, implying $\widehat M \ge \mu_{0,1}t + \mu_{1,0}s + \tilde a^2t^2 + s^2$. Together with the previous estimate, this gives $\widehat M(\varepsilon,\kappa,\sigma) \ge \mu_{0,1}t + \mu_{1,0}s + \frac{\tilde a^2}{2}(s^2 + t^2)$. Since $R = O(\tilde a^4|\hat\sigma|^5)$ we conclude that for all small $(\varepsilon,\kappa)$ which satisfy the stability criterion we have
$$M(\varepsilon,\kappa,\sigma) \ge \mu_{0,1}\hat\sigma_2^2 + \mu_{1,0}\hat\sigma_1^2 + \frac{\tilde a^2}{4}|\hat\sigma|^4 \quad\text{for all } \sigma \in C_2. \tag{4.8}$$
This proves Theorem 3.2.
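The completion of the square used in the proof is an exact algebraic identity, which can be tested numerically. The sketch below rests on assumptions: the coefficients of (4.6) are used with their $O(\cdot)$ remainders dropped, $c(\varepsilon,\kappa)$ is replaced by its leading term $-2(\varepsilon - \kappa^2)$, and the sample point $(\varepsilon,\kappa) = (0.01, 0.01)$ as well as all names are hypothetical. The difference between $\widehat M$ and the lower bound consists of exactly the two terms that the inequalities for $\mu_{1,2}$ and $\mu_{0,3}$ make nonnegative.

```python
import math


def coefficients(eps, kappa):
    """Truncated coefficients mu[(j, l)] of s^j t^l from (4.6); remainders dropped."""
    c = -2.0 * (eps - kappa ** 2)  # leading term of c(eps, kappa) only
    return {
        (0, 1): c * (-2.0 * kappa - (eps - kappa ** 2) ** 2 / 256.0),
        (1, 0): 8.0 * (eps - 3.0 * kappa ** 2) + 4.0 * kappa * (5.0 * eps - 13.0 * kappa ** 2),
        (0, 2): 2.0 * (eps + kappa ** 2),
        (1, 1): -16.0 * kappa + 4.0 * (eps - 7.0 * kappa ** 2),
        (0, 3): 4.0 * kappa + (eps - kappa ** 2) ** 2 / 128.0,
        (2, 0): 16.0 + 48.0 * kappa + 2.0 * (eps + 25.0 * kappa ** 2),
        (1, 2): -8.0 - 4.0 * kappa + 4.0 * kappa ** 2,
        (0, 4): 1.0,
    }


def m_hat(mu, s, t):
    """The polynomial sum of mu[(j, l)] * s^j * t^l over 2j + l <= 4."""
    return sum(value * s ** j * t ** l for (j, l), value in mu.items())


def lower_bound(mu, s, t):
    """mu01 t + mu10 s + mu_* t^2 + (completed square) from the proof."""
    r20, r04 = math.sqrt(mu[(2, 0)]), math.sqrt(mu[(0, 4)])
    mu_star = mu[(0, 2)] - mu[(1, 1)] ** 2 / (4.0 * mu[(2, 0)])
    square = (r04 * t * t - r20 * s - mu[(1, 1)] * t / (2.0 * r20)) ** 2
    return mu[(0, 1)] * t + mu[(1, 0)] * s + mu_star * t * t + square


def slack(mu, s, t):
    """m_hat - lower_bound: the two terms controlled by the two inequalities."""
    r20, r04 = math.sqrt(mu[(2, 0)]), math.sqrt(mu[(0, 4)])
    return ((mu[(1, 2)] + 2.0 * r20 * r04) * s * t * t
            + (mu[(0, 3)] + mu[(1, 1)] * r04 / r20) * t ** 3)


mu = coefficients(0.01, 0.01)
for s, t in [(0.0, 0.05), (0.02, 0.0), (0.03, 0.04), (0.1, 0.1)]:
    print(s, t, m_hat(mu, s, t), lower_bound(mu, s, t), slack(mu, s, t))
```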
In the unstable case it is desirable to describe the set $S_{\varepsilon,\kappa}$ of unstable wave vectors $\sigma$. The analyses for $C_1$ and $C_2$ provide us with a lot of information. To formulate the results correctly it is useful to consider the full set of wave vectors, namely $(\sigma_1,\sigma_2) \in T^* = T_2 \times \mathbb{R}$, where $T_2 = \mathbb{R}/2\mathbb{Z}$. Recall that the identification of $\sigma_1$ with $\sigma_1 + m$, where $m \in 2\mathbb{Z}$, is due to the fact that $B(\varepsilon,\kappa,(\sigma_1 + m,\sigma_2))$ is unitary equivalent to $B(\varepsilon,\kappa,\sigma)$. The critical set $S_0 = \{\, \sigma \in [0,2) \times \mathbb{R} : |\sigma| = 1 \text{ or } |(\sigma_1 - 2,\sigma_2)| = 1 \,\}$ considered as a set in the cylinder $T^*$ consists of only one circle which is wrapped around the cylinder once, touching itself in $\sigma^*$. We have the following results.
Fig. 4.3. The regions $R_1$ to $R_6$ for the real SHE (axes: $\kappa$ horizontal, $\varepsilon$ vertical; bounding curves $E_E(\kappa)$, $E_1(\kappa)$ and $\kappa = K_Z(\varepsilon)$; region 4 is marked stable)
Theorem 4.2. There exists an $\varepsilon_0 > 0$ such that for all $(\varepsilon,\kappa)$ with $0 \le \varepsilon \le \varepsilon_0$ and $\kappa^2 \le \varepsilon$ the set $S_{\varepsilon,\kappa} \subset T^*$ of unstable wave vectors for the roll solution $u = U_{\varepsilon,\kappa}$ has the following properties.
(a) $\sigma^* = (1,0) \notin S_{\varepsilon,\kappa}$, and $\sigma \in S_{\varepsilon,\kappa}$ implies $(\sigma_1,-\sigma_2), (2 - \sigma_1,\sigma_2) \in S_{\varepsilon,\kappa}$.
(b) $\mathrm{dist}(S_{\varepsilon,\kappa},S_0) = O(\sqrt{\varepsilon})$ for $\varepsilon \to 0$.
(c) There is a curve $\varepsilon = E_1(\kappa)$ with the expansion
$$E_1(\kappa) = 2\kappa^2 + \tfrac{11}{96}\kappa^4 + O(|\kappa|^5),$$
such that for $\varepsilon = E_1(\kappa)$ the boundary of $S_{\varepsilon,\kappa}$ has a pair of double points on the line $\sigma_1 = 0$ near $\sigma_2 = \pm 1$.
(d) For $\hat\kappa \in (-1/\sqrt{2}, 1/\sqrt{2})$ there exists a constant $C$ such that the estimate
$$S_{\varepsilon,\hat\kappa\sqrt{\varepsilon}} \subset [-C\sqrt{\varepsilon}, C\sqrt{\varepsilon}] \times [-C\varepsilon^{1/4}, C\varepsilon^{1/4}]$$
holds.

Part (c) is obtained by studying the behavior of $\lambda$ in region $C_1$, where formula (4.3) holds. The curve $E_1$ is obtained by solving $\lambda_0(\varepsilon,\kappa,r,0) = 0$ and $\partial_r\lambda_0(\varepsilon,\kappa,r,0) = 0$, that is, we search for a double zero in $\sigma_2 = \sqrt{1 + r}$ on the symmetry line $\sigma_1 = 0$ ($\gamma = 0$).

The curves $\varepsilon = E_E(\kappa)$, $\varepsilon = E_1(\kappa)$, and $\kappa = K_Z(\varepsilon)$ divide the region $\varepsilon \ge \kappa^2$ into six regions $R_1$ to $R_6$, see Fig. 4.1. In each of these regions the shape of the set of unstable wave vectors $S_{\varepsilon,\kappa}$ can be derived from the above analysis. The curves separating $R_j$ and $R_{j+1}$ stand for a topological change in the structure of $S_{\varepsilon,\kappa}$. In $R_4$ the rolls $U_{\varepsilon,\kappa}$ are stable, i.e., $S_{\varepsilon,\kappa} = \emptyset$. In $R_3$ and $R_5$ the set $S_{\varepsilon,\kappa}$ consists of two simply connected components such that the boundary is a figure 8 with the double point in $\sigma^*$. In $R_2$ the set $S_{\varepsilon,\kappa}$ is homeomorphic to a pointed disc, namely a disc–shaped region where the interior point $\sigma^*$ is taken away. Schematic drawings of the boundary of $S_{\varepsilon,\kappa}$ are given in Fig. 4.2 for each of the regions $R_j$.

We now mention a few differences between the stability analyses for the rolls $U_{\varepsilon,\kappa}$ in the real SHE (3.1) and the rolls $A_{\varepsilon,\kappa}$ for the complex SHE (3.5). For this purpose we define the factorization mapping
$$J : \mathbb{R}^2 \to T^* = T_2 \times \mathbb{R}, \qquad (\sigma_1,\sigma_2) \mapsto (\sigma_1 \bmod 2, \sigma_2).$$
Thus, we can compare $S_{\varepsilon,\kappa}$ with $S^A_{\varepsilon,\kappa} = JS^C_{\varepsilon,\kappa}$, which means that we have to interpret the results from Sect. 3 taking $\sigma_1$ modulo 2.
Stability of Rolls in the Swift–Hohenberg Equation
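[Figure: schematic sketches of the boundary of the unstable set in the $(\sigma_1,\sigma_2)$-plane, with $\sigma_1$ ranging over $[0,2]$; panels labelled j = 1, 2, 3, 5, 6.]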
Fig. 4.4. The set Sε,κ of unstable wave vectors for Uε,κ with (ε, κ) ∈ Rj .
As a first result we find, due to (4.7) and $\tilde a = 0$, that $S_{\varepsilon,\pm\sqrt{\varepsilon}} = S^A_{\varepsilon,\pm\sqrt{\varepsilon}}$, which is explicitly given in (3.13). However, the number of unstable modes in the complex case is twice the number in the real case; e.g., for $\sigma \approx \sigma^*$ the complex case has the four unstable modes $\binom{1}{0}$, $\binom{0}{1}$, $\binom{1}{i}e^{-2i\xi}$, and $\binom{1}{-i}e^{2i\xi}$, whereas the real case has only two unstable modes, namely $\varphi_0$ and $\varphi_2$.

We easily find the counterpart of the curve $E_1$. It means that the topological structure of $S^A_{\varepsilon,\kappa}$ changes since the boundary meets itself at points near $(0,\pm 1)$. For the complex SHE this occurs when the boundary of $S^C_{\varepsilon,\kappa}$ touches $\sigma_1 = 2$. Using $\tilde s = (1+\kappa)(\sigma_1 - 1)^2$ we simply have to insert $\tilde s = 1+\kappa$ into (3.12)(ii) (with $< 0$ replaced by $= 0$) and solve for a double zero in $\mu$, giving $E_1^A(\kappa) = 2\kappa^2 + \frac{\kappa^4}{16} - \frac{\kappa^5}{8} + O(|\kappa|^6)$. The difference between the real and the complex SHE is that the boundary of $S^A_{E_1^A(\kappa),\kappa}$ touches itself on the line $\sigma_1 = 0$ whereas the boundary of $S_{E_1(\kappa),\kappa}$ has a double point. Moreover, for smaller $\varepsilon$ the boundary of $S_{\varepsilon,\kappa}$ is smooth close to $\sigma = (0,\pm 1)$, whereas in the complex case the boundary of $S^A_{\varepsilon,\kappa}$ has corners on the line $\sigma_1 = 0$, which follows from the expansions (3.14) where $\beta_{+,-}$ are nonzero.

The shape of $S_{\varepsilon,\kappa}$ inside the region $C_2$ is in fact similar to $S^A_{\varepsilon,\kappa}$. This follows from (4.7) and the scaling $(\varepsilon,\kappa,\sigma) = (a^4,\ a^2\hat\kappa,\ (a^2\hat\sigma_1,\ a\hat\sigma_2))$ giving
$$M(a^4,\ a^2\hat\kappa,\ (a^2\hat\sigma_1,\ a\hat\sigma_2)) = a^8\,\Lambda(\hat\kappa,\hat\sigma) + O(a^{10}).$$
Because of (4.7) the limit function $\Lambda$ is the same for the real and the complex SHE and thus determines for each $\hat\kappa \in [-1,1]$ the shape of $S_{a^4,a^2\hat\kappa}$ and $S^C_{a^4,a^2\hat\kappa}$ to lowest order.

Remark. Instead of working in the space $L^2(T_\pi)$ we could also have used $L^2(T_{2\pi})$ by ignoring the difference in the minimal periods of $U_{\varepsilon,\kappa}$ and $U^2_{\varepsilon,\kappa}$. We would encounter a completely similar analysis with wave vectors $\sigma$ lying in $\tilde T^* = T_1 \times \mathbb{R}$. In fact, the above results can easily be transferred to that case by using the mapping $J_2 : T^* \to \tilde T^*$; $\sigma \mapsto (\sigma_1 \bmod 1,\ \sigma_2)$. The critical set $\tilde S_0 = J_2 S_0$ is still one circle, but now it is wrapped around the cylinder twice such that additional intersections at $\sigma = (1/2,\ \pm\sqrt{3}/2)$ appear. Thus, the sets $\tilde S_{\varepsilon,\kappa} = J_2 S_{\varepsilon,\kappa}$ will undergo an additional topological change along a curve $\varepsilon =$
A. Mielke
$E_2(\kappa)$ which lies slightly above the curve $\varepsilon = E_1(\kappa)$. When the boundary of $S_{\varepsilon,\kappa}$ touches the lines $|\sigma_1 - 1| = 1/2$ this corresponds to a touching of the boundary of $\tilde S_{\varepsilon,\kappa}$ with itself. Employing (4.3) with $\gamma = \pi/6$ yields the expansion $E_2(\kappa) = 2\kappa^2 + 77\kappa^4/288 + O(|\kappa|^5)$.

A. Elliptic Operators

Let $A$ be a differential operator of order $2m$ given in the form
$$(A(\partial_x)u)(x,z) = \sum_{|p|+|q|\le 2m} a_{p,q}(x,z)\,(D_x^p D_z^q u)(x,z) \quad \text{for } (x,z) \in Q \tag{A.1}$$
together with an $m$-dimensional boundary operator $B = (B_1,\dots,B_m)$ with
$$(B_l u)(x,z) = \sum_{|q|\le 2m-1} b_{l,q}(x,z)\,(D_z^q u)(x,z), \quad \text{for } l = 1,\dots,m, \tag{A.2}$$
where $(x,z) \in \partial Q = \mathbb{R}^d \times \partial\Sigma$. We assume that $\partial\Sigma$ is of class $C^{2m}$ and that $A$ is uniformly strongly elliptic on $Q$ and $(A,B)$ satisfies the complementing condition for each $(x,z) \in \partial Q$, see [ReR92], Ch. 8.4, for the definitions. For simplicity, we do not allow for tangential derivatives in the boundary operators $B_l$. Additionally, we assume that all coefficient matrices $a_{p,q}(x,z),\ b_{l,q}(x,z) \in \mathbb{R}^{n\times n}$ are bounded together with their first $2m$ derivatives. (Our main interest lies in the case of periodic coefficients, where uniformity and boundedness are trivial.)

We define the $L^2$-based operator $\hat A : D(\hat A) \subset L^2(Q) \to L^2(Q)$ via
$$D(\hat A) = \{\, u \in H^{2m}(Q) : Bu = 0 \text{ on } \partial Q \,\}, \qquad \hat A u \overset{\mathrm{def}}{=} A(\partial_x)u, \tag{A.3}$$
and similarly the $L^2_{lu}$-based operator $\tilde A : D(\tilde A) \subset L^2_{lu}(Q) \to L^2_{lu}(Q)$ via
$$D(\tilde A) = \{\, u \in H^{2m}_{lu}(Q) : Bu = 0 \text{ on } \partial Q \,\}, \qquad \tilde A u \overset{\mathrm{def}}{=} A(\partial_x)u. \tag{A.4}$$
We simply write $A : D(A) \subset X \to X$ in order to denote both cases simultaneously. Moreover, $X^k$ denotes $H^k(Q)$ or $H^k_{lu}(Q)$, respectively. The associated norms are written as $\|u\|_k$ and $\|u\|_{k,lu}$, where the subscript $k = 0$ is dropped. The general theory of elliptic operators (see [ReR92], Thm. 8.31) provides the a-priori regularity estimate
$$\|u\|_{X^{2m}} \le C\,(\|Au\|_X + \|u\|_X) \quad \text{for all } u \in D(A), \tag{A.5}$$
where $C$ is independent of $u$.

In order to relate the cases $u \in L^2(Q)$ and $u \in L^2_{lu}(Q)$ with each other, we use the weight function $w(x) = \cosh(|x|)$ on $\mathbb{R}^d$ and the scalar $\alpha$ to define the operators
$$A_{\alpha,y}u = w(\cdot - y)^{-\alpha}A(\partial_x)[w(\cdot - y)^{\alpha}u] = A\Big(\partial_x + \alpha\,\frac{\tanh(|x-y|)}{|x-y|}\,(x-y)\Big)u.$$
Mostly we omit the index $y$. Applying this transformation to the boundary operators has no effect; hence for all $\alpha \in \mathbb{R}$ we obtain elliptic operators $A_\alpha : D(A) \subset X \to X$. Moreover, there is a constant $C_1$ such that for all $\alpha \in [-1,1]$ and $y \in \mathbb{R}^d$ we have the estimate
$$\|(A_{\alpha,y} - A)u\|_X \le C_1\,|\alpha|\,\|u\|_{X^{2m-1}} \quad \text{for all } u \in D(A). \tag{A.6}$$
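The second equality in the definition of $A_{\alpha,y}$ can be checked on a single first-order derivative; higher orders follow by iterating. The following is only a sketch of that step, with $x_j$ denoting the $j$-th component of $x$ (notation introduced here for illustration):

```latex
\begin{align*}
w(x-y)^{-\alpha}\,\partial_{x_j}\bigl[w(x-y)^{\alpha}u\bigr]
  &= \partial_{x_j}u + \alpha\,\frac{\partial_{x_j}w(x-y)}{w(x-y)}\,u, \\
\frac{\partial_{x_j}w(x-y)}{w(x-y)}
  &= \frac{\sinh(|x-y|)}{\cosh(|x-y|)}\,\frac{x_j-y_j}{|x-y|}
   = \tanh(|x-y|)\,\frac{x_j-y_j}{|x-y|}.
\end{align*}
```

The added term has modulus at most $|\alpha|$ uniformly in $y$, so the difference $A_{\alpha,y} - A$ should be an operator of order at most $2m-1$ with coefficients of size $O(|\alpha|)$ for $|\alpha| \le 1$, which is what (A.6) expresses.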
Thus, if $A$ is invertible from $X$ into $D(A) \subset X^{2m}$, then for sufficiently small $|\alpha|$ the operator $A_\alpha$ is also invertible and satisfies $A_\alpha^{-1}f = w^{-\alpha}A^{-1}[w^{\alpha}f]$. This follows by combining (A.5) and (A.6). The weight $w$ allows us to go from $L^2_{lu}(Q)$ to $L^2(Q)$ or vice versa via $u \mapsto w^{-\alpha}u$ by using the following simple characterizations.

Lemma A.1. Let $\alpha > 0$ and $w$ as above.
(a) Let $w^{-\alpha}u \in L^2(Q)$. Then $u \in L^2_{lu}(Q)$ if and only if there exists a $C > 0$ such that $\|w(\cdot - y)^{-\alpha}u\| \le C$ for all $y \in \mathbb{R}^d$.
(b) There is a constant $C_\alpha$ such that for all $u \in L^2_{lu}(Q)$
$$\|u\|_{lu} \le C_\alpha \sup\{\, \|w(\cdot - y)^{-\alpha}u\| : y \in \mathbb{R}^d \,\} \le C_\alpha^2\,\|u\|_{lu}.$$
(c) A function $u \in L^2_{lu}(Q)$ lies in $L^2(Q)$ if and only if $\sum_{n\in\mathbb{Z}^d}\|\chi_n u\|^2_{lu} < \infty$, where the partition of unity $\chi_n$, $n \in \mathbb{Z}^d$, is given by $\chi_n(x) = 1$ for $x \in [n, n+\eta)$ and $0$ otherwise (here $\eta = (1,\dots,1) \in \mathbb{Z}^d$).

For a proof we refer to Lemma C.1 in [Mi97a]. We now obtain the first main result.

Theorem A.2. Let the elliptic operator $A(\partial_x)$ from (A.1) satisfy the assumptions from above and let $\hat A$ and $\tilde A$ be the operators defined in (A.3) and (A.4). Then, $\hat A$ is invertible if and only if $\tilde A$ is invertible and moreover, $\mathrm{spec}(\tilde A) = \mathrm{spec}(\hat A)$.

Proof. We assume that $\hat A$ is invertible. Thus we know that there exists an $\alpha > 0$ such that $A_{\alpha,y}$ is invertible from $L^2(Q)$ into $D(\hat A)$ for any $y \in \mathbb{R}^d$ with a bound $C_2$ not depending on $y$. For $f \in L^2_{lu}(Q)$ we know $w^{-\alpha}f \in L^2(Q)$ such that $u = \tilde A^{-1}f = w^{\alpha}A_\alpha^{-1}[w^{-\alpha}f]$ is well defined. Using Lemma A.1(b) we obtain $\|w^{-\alpha}(\cdot - y)u\|_{2m} \le C_2\|w^{-\alpha}(\cdot - y)f\| \le C_2 C_\alpha\|f\|_{lu}$. Thus, we have proved finiteness of the norm and the operator $\tilde A^{-1}$ maps $L^2_{lu}(Q)$ into $\tilde H^{2m}_{lu}(Q)$.

We still have to establish the continuity of the translates $y \mapsto T_y u$. To this end we use that the coefficients of $A(\partial_x)$ are uniformly continuous and define the operators $A^y$ which are obtained by using the translated coefficients $T_y a_{pq}$. Then, $\|(A^y - A)u\|_{lu} \le \gamma(|y|)\,\|u\|_{2m,lu}$, where $\gamma(t) \to 0$ for $t \to 0$. Obviously, $T_y u$ satisfies $\tilde A^y T_y u = T_y f$ if and only if $\tilde A u = f$. Hence, applying $\tilde A^{-1}$ to the equality $\tilde A(u - T_y u) + (\tilde A - \tilde A^y)T_y u = f - T_y f$, we obtain the desired estimate $\|u - T_y u\|_{2m,lu} \le C[\gamma(|y|)\,\|u\|_{2m,lu} + \|f - T_y f\|_{lu}]$.

For the opposite direction we now assume that $\tilde A : D(\tilde A) \to L^2_{lu}(Q)$ is invertible. There is an $\alpha > 0$ such that $\tilde A_{\alpha,y}$ is invertible with bound $C_3$ for any $y \in \mathbb{R}^d$. We use the partition $\chi_n$ as in Lemma A.1(c) and define, for any $f \in L^2(Q)$, the functions $f_n = \chi_n f$ and $u_n = \tilde A^{-1}f_n = w^{-\alpha}(\cdot - n)\,A^{-1}_{-\alpha,n}[w(\cdot - n)^{\alpha}f_n]$. Thus, we obtain for each $n, m \in \mathbb{Z}^d$ the estimate
$$\begin{aligned}
\|\chi_m u_n\|^2 &= \int_{[m,m+\eta)} w(\cdot - n)^{-2\alpha}\,|A^{-1}_{-\alpha,n}g_n|^2\,dx \\
&\le \sup\{\, w(x-n)^{-\alpha} : x \in [m,m+\eta) \,\}\int_{\mathbb{R}^d} w(\cdot - n)^{-\alpha}\,|A^{-1}_{-\alpha,n}g_n|^2\,dx \\
&\le C e^{-\alpha|n-m|}\,\|A^{-1}_{-\alpha,n}g_n\|^2_{lu},
\end{aligned}$$
where $g_n(x) = w(x-n)^{\alpha}f_n(x)$ satisfies $\|g_n\|_{lu} \le C\|f_n\|$. Thus, we can define the function $u$ via $\chi_m u = \sum_{n\in\mathbb{Z}^d}\chi_m u_n$, where the sum converges in $L^2$. To show that this $u$ lies in fact in $L^2(Q)$ we employ Young's inequality for convolutions applied to the
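sequences $(\|\chi_m u\|)_m$ and $(\|f_n\|)_n$ which satisfy the convolutional estimate $\|\chi_m u\| \le \sum_{n\in\mathbb{Z}^d} C e^{-\alpha|n-m|/2}\,\|f_n\|$. Thus, we obtain
$$\|u\| = \Big(\sum_m \|\chi_m u\|^2\Big)^{1/2} \le \Big(\sum_{p\in\mathbb{Z}^d} C e^{-\alpha|p|/2}\Big)\Big(\sum_n \|f_n\|^2\Big)^{1/2} = C_2\,\|f\|,$$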
such that $u = \hat A^{-1}f \in L^2(Q)$, and the invertibility of $\hat A$ is established.
Applying the result on the invertibility to $\tilde A - \lambda I$ and $\hat A - \lambda I$ we conclude that the resolvent sets of $\tilde A$ and $\hat A$ are equal. But this is the desired result on the spectra.

The above result holds for general elliptic operators without any periodicity assumption. We now return to operators where the Bloch decomposition is available. Every function $u \in L^2(Q)$ can be decomposed into $(U(\sigma,\cdot))_{\sigma\in T^*} \in L^2(T^*, L^2(Q/\mathcal{L}))$ via the direct integral $u = D(U) = \int_{\sigma\in T^*} e^{i\sigma\cdot x}\,U(\sigma,\cdot)\,d\sigma$. Recall the notations from Sect. 2: $Q = \mathbb{R}^d \times \Sigma$, $T = \mathbb{R}^d/\mathcal{L}$, and $Q/\mathcal{L} = T \times \Sigma$. The integral $D(U)$ has to be understood in the $L^2(Q)$-sense, see [ReS78]. For example, for simple functions $U(\sigma,\cdot) = \sum_{j=1}^N \chi_{A_j}(\sigma)\,U_j$ with $A_j \subset T^*$ and $U_j \in L^2(Q/\mathcal{L})$ we have
$$u(x) = D(U)(x) = \sum_{j=1}^N v_j(x)\,U_j(x), \qquad \text{with } v_j(x) = \int_{\sigma\in A_j} e^{i\sigma\cdot x}\,d\sigma,$$
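where $v_j \in L^2(\mathbb{R}^d) \cap L^\infty(\mathbb{R}^d)$ and $U_j \in L^2(Q/\mathcal{L}) \subset L^2_{lu}(Q)$ such that $v_j U_j \in L^2(Q)$ is well-defined. The inverse of $D$ can be constructed by using the inverse of the classical Fourier transform in the $x$-variable,
$$(\mathcal{F}u)(k,z) \overset{\mathrm{def}}{=} \frac{1}{(2\pi)^{d/2}}\int_{x\in\mathbb{R}^d} e^{-ik\cdot x}\,u(x,z)\,dx.$$
Setting $k = \sigma + \ell$ with $\sigma \in T^*$ and $\ell \in \mathcal{L}$ we immediately find $u = D(U)$ with
$$U(\sigma,x,z) = \frac{1}{(2\pi)^{d/2}}\sum_{\ell\in\mathcal{L}} e^{i\ell\cdot x}\,(\mathcal{F}u)(\sigma+\ell,z).$$
Using Parseval's identity for $U(\sigma,\cdot)$ we obtain the norm relation
$$\|u\|^2 = \|\mathcal{F}u\|^2 = \int_{\sigma\in T^*}\sum_{\ell\in\mathcal{L}}\int_{z\in\Sigma} |(\mathcal{F}u)(\sigma+\ell,z)|^2\,dz\,d\sigma = \frac{(2\pi)^d}{\mathrm{vol}(T)}\int_{\sigma\in T^*}\|U(\sigma,\cdot)\|^2_{L^2(Q/\mathcal{L})}\,d\sigma.$$
This shows that $D$ defines an isomorphism between $L^2(Q)$ and $L^2(T^*, L^2(Q/\mathcal{L}))$. Additionally, we have the following characterization. A direct integral $u = D(U)$ lies in $H^k(Q)$ if and only if $U(\sigma,\cdot) \in H^k(Q/\mathcal{L})$ for a.e. $\sigma \in T^*$ and
$$\int_{\sigma\in T^*}\Big\{ (1+|\sigma|^{2k})\,\|U(\sigma,\cdot)\|^2_{L^2(Q/\mathcal{L})} + \|U(\sigma,\cdot)\|^2_{H^k(Q/\mathcal{L})} \Big\}\,d\sigma < \infty. \tag{A.7}$$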
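The decomposition and its norm relation have an elementary finite analogue, which may help to see the mechanism. The sketch below is an illustration only and not part of the original argument: it replaces $\mathbb{R}^d$ by a cyclic lattice of $N$ cells of length $p$, the torus $T^*$ by the $N$ quasi-momenta $\sigma_k = 2\pi k/(Np)$, and the constant $(2\pi)^d/\mathrm{vol}(T)$ by $N$. The names `bloch_transform` and `bloch_synthesis` are hypothetical.

```python
import cmath
import math

def bloch_transform(u, p):
    """Finite analogue of D^{-1}: u lives on the cyclic lattice Z_{N*p}.
    Returns U[k][j] for sigma_k = 2*pi*k/(N*p), k = 0..N-1, j = 0..p-1."""
    n_total = len(u)
    assert n_total % p == 0
    N = n_total // p
    U = []
    for k in range(N):
        sigma = 2 * math.pi * k / n_total
        row = []
        for j in range(p):
            # Sum over all cells n of the demodulated values at site j + n*p.
            s = sum(cmath.exp(-1j * sigma * (j + n * p)) * u[j + n * p]
                    for n in range(N))
            row.append(s / N)
        U.append(row)
    return U

def bloch_synthesis(U, p):
    """Finite analogue of D: u(x) = sum_k exp(i*sigma_k*x) * U_k(x mod p)."""
    N = len(U)
    n_total = N * p
    return [sum(cmath.exp(2j * math.pi * k * x / n_total) * U[k][x % p]
                for k in range(N))
            for x in range(n_total)]

p, N = 3, 4
u = [complex(math.sin(0.7 * x) + 0.1 * x, math.cos(1.3 * x)) for x in range(N * p)]
U = bloch_transform(u, p)
u_back = bloch_synthesis(U, p)

# Reconstruction error and the discrete counterpart of the norm relation.
max_err = max(abs(a - b) for a, b in zip(u, u_back))
norm_u = sum(abs(v) ** 2 for v in u)
norm_U = N * sum(abs(v) ** 2 for row in U for v in row)
```

In this toy setting `u_back` reproduces `u` up to rounding and `norm_u` agrees with `norm_U`, mirroring the statement that $D$ is an isomorphism up to a fixed constant.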
Assume now that an elliptic operator $A = A(\partial_x)$ and a boundary operator $B$ with the properties from above are given such that $A$ has periodic coefficients with periodicity lattice $\mathcal{L}$. Applying $A(\partial_x)$ to a Bloch wave leads to the definition of the Bloch operators $B(\sigma) : D(B) \subset L^2(Q/\mathcal{L}) \to L^2(Q/\mathcal{L})$ with
$$B(\sigma)U = e^{-i\sigma\cdot x}A(\partial_x)[e^{i\sigma\cdot x}U] = A(i\sigma + \partial_x)U, \tag{A.8}$$
where $D(B) = \{\, u \in H^{2m}(Q/\mathcal{L}) : Bu = 0 \,\}$ does not depend on $\sigma$ since the boundary operator $B$ does not contain tangential derivatives. Inserting Bloch waves $u = e^{i\sigma\cdot x}U$ into the regularity estimate (A.5) we obtain the a-priori estimate
$$(1+|\sigma|^{2m})\,\|U\|_{L^2(Q/\mathcal{L})} + \|U\|_{H^{2m}(Q/\mathcal{L})} \le C\,\big(\|B(\sigma)U\|_{L^2(Q/\mathcal{L})} + \|U\|_{L^2(Q/\mathcal{L})}\big) \tag{A.9}$$
for any $\sigma \in T^*$ and $U \in D(B)$, where $C$ is independent of $\sigma$ and $U$.

Lemma A.3. Let $\hat A : D(\hat A) \subset L^2(Q) \to L^2(Q)$ be given as above with $\mathcal{L}$-periodic coefficients and associated Bloch operators $B(\sigma)$. Then, $\hat A$ has a bounded inverse $\hat A^{-1} : L^2(Q) \to D(\hat A)$ if and only if all $B(\sigma)$, $\sigma \in T^*$, have a bounded inverse $B(\sigma)^{-1} : L^2(Q/\mathcal{L}) \to D(B)$ with $b = \sup\{\, \|B(\sigma)^{-1}\|_{L^2(Q/\mathcal{L})\to L^2(Q/\mathcal{L})} : \sigma \in T^* \,\} < \infty$. If $b < \infty$ then $\|A^{-1}\|_{L^2(Q)\to L^2(Q)} = b$ and $A^{-1} = D\,B(\cdot)^{-1}D^{-1}$.

Proof. Assume that $\hat A$ is invertible. By Theorem A.2 we know that also $\tilde A$ is invertible on $L^2_{lu}(Q)$. Since Bloch waves lie in $L^2_{lu}(Q)$ the inverse of the Bloch operators is given by $B(\sigma)^{-1}F = e^{-i\sigma\cdot x}\tilde A^{-1}[e^{i\sigma\cdot x}F]$ and there is a constant $C$ such that for all $F \in L^2(Q/\mathcal{L})$,
$$\|B(\sigma)^{-1}F\|_{L^2(Q/\mathcal{L})} \le C\,\|B(\sigma)^{-1}F\|_{L^2_{lu}(Q)} = C\,\|e^{-i\sigma\cdot x}\tilde A^{-1}[e^{i\sigma\cdot x}F]\|_{L^2_{lu}(Q)} \le C^2\,\|e^{i\sigma\cdot x}F\|_{L^2_{lu}(Q)} \le C^3\,\|F\|_{L^2(Q/\mathcal{L})}.$$
This proves the 'only if' part. For the opposite assertion insert $U = B(\sigma)^{-1}F$ into (A.9) giving
$$(1+|\sigma|^{2m})\,\|U\|_{L^2(Q/\mathcal{L})} + \|U\|_{H^{2m}(Q/\mathcal{L})} \le C(1+b)\,\|F\|_{L^2(Q/\mathcal{L})} \tag{A.10}$$
with $C(1+b)$ independent of $F$ and $\sigma$. Thus, we may define $K : f \mapsto D[B(\cdot)^{-1}(D^{-1}f)(\cdot)]$ as a bounded operator from $L^2(Q)$ into $D(\hat A) \subset H^{2m}(Q)$. The boundedness in $H^{2m}(Q)$ is a consequence of (A.10) and the criterion (A.7). The fact that $K$ maps into the closed subspace $D(\hat A)$ of $H^{2m}(Q)$ follows since $B(\sigma)^{-1}$ maps into $D(B)$. Obviously, $K$ is the desired inverse of $\hat A$ and the 'if' part is proved.
The norm identity follows easily, as $B(\sigma)^{-1}$, considered as an operator from $L^2(Q/\mathcal{L})$ into itself, depends continuously on $\sigma$.

The main result of this appendix reads as follows.

Theorem A.4. Let $A(\partial_x)$ be an elliptic operator on $Q$ with $\mathcal{L}$-periodic coefficients and $B$ a boundary operator on $\partial Q$ satisfying the conditions from above. Then, we have
$$\mathrm{spec}(\tilde A) = \mathrm{spec}(\hat A) = \mathrm{closure}\Big(\bigcup_{\sigma\in T^*}\mathrm{spec}(B(\sigma))\Big), \tag{A.11}$$
where $B(\sigma)$ are the associated Bloch operators, cf. (A.8).
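A finite-dimensional toy model may illustrate (A.11). The sketch below is an assumption-laden illustration, not the setting of the theorem: the elliptic operator is replaced by the difference operator $(Au)(x) = u(x+1) + u(x-1) + V(x)u(x)$ on a cyclic chain of $2N$ sites with a 2-periodic potential $V$ (hypothetical values), so that each Bloch operator becomes a real symmetric $2\times 2$ matrix with off-diagonal entry $2\cos\sigma$. Every Bloch eigenpair then yields an eigenvector of the full operator.

```python
import cmath
import math

def apply_A(u, V):
    """(A u)(x) = u(x+1) + u(x-1) + V(x mod 2) * u(x) on the cyclic chain Z_{len(u)}."""
    n = len(u)
    return [u[(x + 1) % n] + u[(x - 1) % n] + V[x % 2] * u[x] for x in range(n)]

def bloch_eigenpairs(sigma, V):
    """Eigenpairs of the 2x2 Bloch matrix [[V0, c], [c, V1]] with c = 2*cos(sigma)."""
    c = 2.0 * math.cos(sigma)
    mean = (V[0] + V[1]) / 2.0
    half_gap = (V[0] - V[1]) / 2.0
    root = math.hypot(half_gap, c)
    # (c, lam - V0) solves the first row; the second row is the characteristic equation.
    return [(lam, (c, lam - V[0])) for lam in (mean + root, mean - root)]

V = (0.5, -0.3)   # hypothetical 2-periodic potential
N = 5             # odd, so c = 2*cos(pi*k/N) never vanishes and (c, lam - V0) is nonzero

spectrum = []
max_residual = 0.0
for k in range(N):
    sigma = math.pi * k / N          # quasi-momenta compatible with the chain length 2N
    for lam, U in bloch_eigenpairs(sigma, V):
        # Bloch wave u(x) = exp(i*sigma*x) * U(x mod 2) on the full chain.
        u = [cmath.exp(1j * sigma * x) * U[x % 2] for x in range(2 * N)]
        Au = apply_A(u, V)
        residual = max(abs(a - lam * b) for a, b in zip(Au, u))
        scale = max(abs(b) for b in u)
        max_residual = max(max_residual, residual / scale)
        spectrum.append(lam)
```

The loop produces $2N$ eigenpairs, as many as the chain has sites, and `max_residual` stays at rounding level; in this toy case the spectrum of the full operator is therefore the union of the Bloch spectra. In the continuous setting of Theorem A.4 the union has to be closed, and the a-priori estimate (A.9) replaces the finite-dimensional counting.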
Proof. The first identity was already proved in Theorem A.2.
Denote by $\mathcal{C}$ the set on the right-hand side of (A.11) and take $\lambda_0 \notin \mathrm{spec}(\hat A)$. Then, by Lemma A.3 we know that $B(\sigma) - \lambda_0 I$, $\sigma \in T^*$, is invertible with the inverse having a uniform bound $b$. Hence, for each $\sigma \in T^*$ the set $\{\, \lambda \in \mathbb{C} : |\lambda - \lambda_0| < 1/b \,\}$ is in the resolvent set of $B(\sigma)$, which implies $\lambda_0 \notin \mathcal{C}$. This proves $\mathcal{C} \subset \mathrm{spec}(\hat A)$.
Now take $\lambda_1 \notin \mathcal{C}$. Then, the function $q : \sigma \mapsto \|(B(\sigma) - \lambda_1 I)^{-1}\|_{L^2(Q/\mathcal{L})\to L^2(Q/\mathcal{L})}$ is well-defined and maps $T^*$ into $(0,\infty)$. It is continuous, since $B(\sigma)$, as a bounded operator from $D(B)$ into $L^2(Q/\mathcal{L})$, is continuous in $\sigma$. Moreover, $q$ decays like $(1+|\sigma|^2)^{-m}$ for large $\sigma$ because of (A.9). Thus, $b = \sup\{\, q(\sigma) : \sigma \in T^* \,\}$ is finite and Lemma A.3 provides a bounded inverse of $\hat A - \lambda_1 I$. This shows $\mathrm{spec}(\hat A) \subset \mathcal{C}$ and the theorem is proved.
References
[BrM95] Bridges, T.J., Mielke, A.: A proof of the Benjamin–Feir instability. Arch. Rat. Mech. Anal. 133, 145–198 (1995)
[BrM96] Bridges, T.J., Mielke, A.: Instability of spatially-periodic states for a family of semilinear PDE's on an infinite strip. Math. Nachr. 179, 5–25 (1996)
[Bus71] Busse, F.H.: Stability regions of cellular fluid flow. In: Instability of Continuous Systems, H. Leipholz (ed.), Proc. of the IUTAM Symposium in Bad Herrenalb 1969, Berlin–Heidelberg–New York: Springer-Verlag, 1971, pp. 41–47
[CoE90] Collet, P., Eckmann, J.-P.: Instabilities and Fronts in Extended Systems. Princeton: Princeton University Press, 1990
[ChH82] Chow, S.-N., Hale, J.K.: Methods of Bifurcation Theory. Berlin–Heidelberg–New York: Springer-Verlag, 1982
[Eck65] Eckhaus, W.: Studies in Non-Linear Stability Theory. Berlin–Heidelberg–New York: Springer-Verlag, Springer Tracts in Nat. Phil. Vol. 6, 1965
[Kat76] Kato, T.: Perturbation Theory for Linear Operators. Berlin–Heidelberg–New York: Springer-Verlag, 1976
[KiS69] Kirchgässner, K., Sorger, P.: Stability analysis of branching solutions of the Navier–Stokes equations. In: Proceedings of the 12th International Congress of Applied Mechanics, M. Hetényi, G. Vincenti (eds), 1969, pp. 257–268
[Kuw96] Kuwamura, M.: The stability of roll solutions of the 2-D Swift–Hohenberg equation and the phase diffusion equation. SIAM J. Math. Anal. 27, 1311–1335 (1996)
[KvW97] Kagei, Y., von Wahl, W.: The Eckhaus criterion for convection roll solutions of the Oberbeck–Boussinesq equations. Int. J. Non-Linear Mech. 32, 563–620 (1997)
[Mie95] Mielke, A.: A new approach to sideband-instabilities using the principle of reduced instability. In: Nonlinear Dynamics and Pattern Formation in the Natural Environment. A. Doelman & A. van Harten (eds). Pitman Research Notes in Math. Vol. 335, 1995, pp. 206–222
[Mi97a] Mielke, A.: The complex Ginzburg–Landau equation on large and unbounded domains: sharper bounds and attractors. Nonlinearity 10, 199–222 (1997)
[Mi97b] Mielke, A.: Mathematical analysis of sideband instabilities with application to Rayleigh–Bénard convection. J. Nonlinear Sci. 7, 57–99 (1997)
[MiS95] Mielke, A., Schneider, G.: Attractors for modulation equations on unbounded domains – existence and comparison. Nonlinearity 8, 743–768 (1995)
[ReR92] Renardy, M., Rogers, R.C.: An Introduction to Partial Differential Equations. Berlin–Heidelberg–New York: Springer-Verlag, 1992
[ReS78] Reed, M., Simon, B.: Methods of Modern Mathematical Physics IV. New York: Academic Press, 1978
[Sca94] Scarpellini, B.: L2-perturbations of periodic equilibria of reaction diffusion systems. Nonlinear Diff. Eqns. Appl. (NoDEA) 3, 281–311 (1994)
[Sca95] Scarpellini, B.: The principle of linearized instability for space-periodic equilibria of Navier–Stokes on an infinite plate. Analysis 15, 359–391 (1995)
[Sch96] Schneider, G.: Diffusive stability of spatial periodic solutions of the Swift–Hohenberg equation. Commun. Math. Phys. 178, 679–702 (1996)

Communicated by A. Kupiainen
Commun. Math. Phys. 189, 855 – 877 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997

Separation of Variables for the Ruijsenaars System
V.B. Kuznetsov$^{1,\star}$, F.W. Nijhoff$^{1}$, E.K. Sklyanin$^{2,\star\star}$
$^1$ Department of Applied Mathematical Studies, University of Leeds, Leeds LS2 9JT, UK. E-mail: [email protected], [email protected]
$^2$ Research Institute for Mathematical Sciences, Kyoto University, Kyoto 606, Japan. E-mail: [email protected]
Received: 10 January 1997 / Accepted: 1 April 1997
Abstract: We construct a separation of variables for the classical n-particle Ruijsenaars system (the relativistic analog of the elliptic Calogero-Moser system). The separated coordinates appear as the poles of the properly normalised eigenvector (Baker-Akhiezer function) of the corresponding Lax matrix. Two different normalisations of the BA functions are analysed. The canonicity of the separated variables is verified with the use of the r-matrix technique. The explicit expressions for the generating function of the separating canonical transform are given in the simplest cases n = 2 and n = 3. Taking the nonrelativistic limit we also construct a separation of variables for the elliptic Calogero-Moser system.
1. Introduction

One of the most powerful methods in studies of Liouville integrable systems is that of Separation of Variables (SoV). Originating with the development of Hamiltonian mechanics as a method to solve the Hamilton-Jacobi equation for particular Hamiltonians, it has by now been applied to many families of finite-dimensional (Liouville) integrable systems (see the recent review [31]). For a very long time a great deal of attention has been given to the so-called coordinate separation of variables, or separation in the configuration space (see, for instance, [8, 28, 14, 15, 4, 9, 10, 31] and references therein). In this case the separation variables $u_j$ do not depend on the momenta $p_i$ and are functions of the coordinates $x_i$ only:
$$u_j = u_j(x_1,\dots,x_N).$$

$\star$ On leave from: Department of Mathematical and Computational Physics, Institute of Physics, St. Petersburg University, St. Petersburg 198904, Russian Federation.
$\star\star$ On leave from: Steklov Mathematical Institute, Fontanka 27, St. Petersburg 191011, Russian Federation. E-mail: [email protected]
Integrable systems admitting such a coordinate (local) separation of variables were studied in detail, although at the same time it was understood that by no means every Liouville integrable system can be separated by such transformations. For a generic integrable system the class of admissible transformations should be enlarged to general canonical transformations
$$u_j = u_j(x_1,\dots,x_N,p_1,\dots,p_N), \qquad v_j = v_j(x_1,\dots,x_N,p_1,\dots,p_N).$$
In the context of the Inverse Scattering Method [13, 3, 31] the separation variables $(u,v)$ appear usually as pairs of canonically conjugate variables sitting on the spectral curve of the related $n \times n$ Lax matrix $L(u)$. The coordinates $u_j$ are obtained respectively as the poles of the associated Baker-Akhiezer (BA) function $f(u)$ satisfying the linear problem
$$L(u)\,f(u) = v\,f(u), \qquad f(u) = (f_1(u),\dots,f_n(u))^t,$$
with some fixed normalisation $\vec\alpha(u)$,
$$\vec\alpha\cdot f \equiv \sum_{j=1}^n \alpha_j(u)\,f_j(u) = 1.$$
The method of SoV in such a formulation was successfully applied to many particular integrable systems; some of the relevant references are [25, 26, 27, 29, 1, 24, 31, 17, 7, 18, 19, 16, 20]. In the present paper we prove the SoV for the classical $n$-particle Ruijsenaars system with the $n \times n$ Lax matrix found in [23] and with the Hamiltonian
$$H_1 = \sum_{j=1}^n e^{p_j}\prod_{k\ne j}\frac{\sigma(x_j - x_k - \lambda)}{\sigma(x_j - x_k)}, \qquad \{p_j, x_k\} = \delta_{jk}, \tag{1.1}$$
where $\sigma(x)$ is the Weierstrass $\sigma$-function, $\lambda \in \mathbb{R}$ is a parameter of the model and $(p_j, x_j)$ are canonical Darboux variables. It is shown that the method of SoV applies to this system if we use the standard normalisation vector $\vec\alpha$,
$$\vec\alpha = \vec\alpha_0 \equiv (0,0,\dots,0,1), \qquad \text{i.e. } f_n(u) = 1.$$
The structure of the paper is the following. In Sect. 2 we collect known information about the Ruijsenaars system (Lax matrix, integrals of motion, etc). In Sect. 3 we give an overview of the method of separation of variables and then apply it, in Sect. 4, to the system in question. In that key section we also discuss the possibility of an alternative choice for the normalisation vector $\vec\alpha(u)$. The generating functions of the canonical separating transform given in terms of the initial and separation variables are constructed in Sect. 5 in explicit form for the case of two and three degrees of freedom. We also provide the separation of variables for the nonrelativistic limit $\lambda \to 0$ to the elliptic Calogero-Moser system in Sect. 6. Section 7 contains some concluding remarks.
2. The System

Let us first recall some properties of the Weierstrass functions which we will need in the main text. Let $2\omega_{1,2} \in \mathbb{C}$ be a fixed pair of the primitive periods and $\Gamma = 2\omega_1\mathbb{Z} + 2\omega_2\mathbb{Z}$ the corresponding period lattice. Let us fix also the primitive domain $D := \{\, z = 2\omega_1 x + 2\omega_2 y \mid x, y \in [0,1) \,\}$ such that $D \sim \mathbb{C}/\Gamma$. The Weierstrass sigma-function is defined by the infinite product (cf., for instance, [33])
$$\sigma(x) = x\prod_{\gamma\in\Gamma\setminus\{0\}}\Big(1 - \frac{x}{\gamma}\Big)\exp\Big[\frac{x}{\gamma} + \frac12\Big(\frac{x}{\gamma}\Big)^2\Big], \tag{2.1}$$
the relations between $\sigma$-, $\zeta$- and $\wp$-functions being given by
$$\zeta(x) = \frac{\sigma'(x)}{\sigma(x)}, \qquad \wp(x) = -\zeta'(x), \tag{2.2}$$
where $\sigma(x)$ and $\zeta(x)$ are odd functions and $\wp(x)$ is an even function of its argument. We recall also that $\sigma(x)$ is an entire function, and $\zeta(x)$ is a meromorphic function having simple poles at $\omega_{kl}$, both being quasi-periodic, obeying
$$\zeta(x + 2\omega_{1,2}) = \zeta(x) + 2\eta_{1,2}, \qquad \sigma(x + 2\omega_{1,2}) = -\sigma(x)\,e^{2\eta_{1,2}(x+\omega_{1,2})},$$
in which $\eta_{1,2}$ satisfy $\eta_1\omega_2 - \eta_2\omega_1 = \frac{\pi i}{2}$, whereas $\wp(x)$ is doubly periodic. From an algebraic point of view, the most important property of these functions is the existence of a number of functional relations, the most fundamental being
$$\zeta(\alpha) + \zeta(\beta) + \zeta(\gamma) - \zeta(\alpha+\beta+\gamma) = \frac{\sigma(\alpha+\beta)\,\sigma(\beta+\gamma)\,\sigma(\gamma+\alpha)}{\sigma(\alpha)\,\sigma(\beta)\,\sigma(\gamma)\,\sigma(\alpha+\beta+\gamma)}, \tag{2.3}$$
which can be cast into the following form:
$$\Phi_\kappa(x)\,\Phi_\kappa(y) = \Phi_\kappa(x+y)\,[\,\zeta(\kappa) + \zeta(x) + \zeta(y) - \zeta(\kappa+x+y)\,] \tag{2.4}$$
with the function $\Phi_\kappa(x)$ defined as follows:
$$\Phi_\kappa(x) := \frac{\sigma(x+\kappa)}{\sigma(x)\,\sigma(\kappa)}.$$
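The relations (2.3) and (2.4) can be spot-checked numerically. The sketch below rests on an assumption not stated in the text: the standard fact that $\sigma$ differs from the Jacobi theta function $\theta_1$ only by a Gaussian factor and a rescaling of the argument, both of which cancel in (2.3) and (2.4). It therefore uses $s(z) = \theta_1(z)/\theta_1'(0)$ as a stand-in for $\sigma$ and its logarithmic derivative as a stand-in for $\zeta$. The nome value and all function names are hypothetical choices made for illustration.

```python
import math

Q_NOME = 0.1   # assumed nome, 0 < q < 1
TERMS = 12     # truncation order of the theta series

def theta1(z):
    """Jacobi theta_1(z, q) = 2 * sum (-1)^n q^((n+1/2)^2) sin((2n+1) z)."""
    return 2.0 * sum((-1) ** n * Q_NOME ** ((n + 0.5) ** 2) * math.sin((2 * n + 1) * z)
                     for n in range(TERMS))

def theta1_prime(z):
    """Termwise derivative of the series above."""
    return 2.0 * sum((-1) ** n * Q_NOME ** ((n + 0.5) ** 2) * (2 * n + 1)
                     * math.cos((2 * n + 1) * z)
                     for n in range(TERMS))

def s(z):
    """Stand-in for sigma: odd, entire, with s'(0) = 1."""
    return theta1(z) / theta1_prime(0.0)

def z_log(z):
    """Stand-in for zeta: logarithmic derivative of s."""
    return theta1_prime(z) / theta1(z)

def phi(kappa, x):
    """Stand-in for Phi_kappa(x) = sigma(x + kappa) / (sigma(x) sigma(kappa))."""
    return s(x + kappa) / (s(x) * s(kappa))

# Relation (2.3) at sample real arguments away from the zeros of theta_1.
a, b, c = 0.3, 0.5, 0.7
lhs_23 = z_log(a) + z_log(b) + z_log(c) - z_log(a + b + c)
rhs_23 = s(a + b) * s(b + c) * s(c + a) / (s(a) * s(b) * s(c) * s(a + b + c))

# Relation (2.4), which is (2.3) with (alpha, beta, gamma) = (kappa, x, y).
kappa, x, y = 0.4, 0.25, 0.9
lhs_24 = phi(kappa, x) * phi(kappa, y)
rhs_24 = phi(kappa, x + y) * (z_log(kappa) + z_log(x) + z_log(y) - z_log(kappa + x + y))
```

Both pairs should agree to rounding accuracy. The second check only restates the first, since dividing the right-hand side of (2.3) at $(\kappa, x, y)$ by $\Phi_\kappa(x+y)$ gives $\Phi_\kappa(x)\Phi_\kappa(y)$.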
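Two other useful identities have the form
$$\Phi_{\kappa-\tilde\kappa}(a-b)\,\Phi_\kappa(x+b)\,\Phi_{\tilde\kappa}(y+a) - \Phi_{\kappa-\tilde\kappa}(x-y)\,\Phi_\kappa(y+a)\,\Phi_{\tilde\kappa}(x+b) = \Phi_\kappa(x+a)\,\Phi_{\tilde\kappa}(y+b)\,[\,\zeta(a-b) + \zeta(x+b) - \zeta(x-y) - \zeta(y+a)\,], \tag{2.5}$$
$$\Phi_{\kappa-\tilde\kappa}(x-y)\,\Phi_\kappa(y+a)\,\Phi_{\tilde\kappa}(x+a) = \Phi_\kappa(x+a)\,\Phi_{\tilde\kappa}(y+a)\,[\,\zeta(x-y) - \zeta(\kappa+x+a) + \zeta(\tilde\kappa+y+a) + \zeta(\kappa-\tilde\kappa)\,]. \tag{2.6}$$
The generalised Cauchy identity has the following form [6]:
$$\det\big(\Phi_\kappa(x_i - y_j)\big) = \Phi_\kappa(\Sigma)\,\sigma(\Sigma)\,\frac{\prod_{k<l}\sigma(x_k - x_l)\,\sigma(y_l - y_k)}{\prod_{k,l}\sigma(x_k - y_l)}, \qquad \Sigma \equiv \sum_i (x_i - y_i). \tag{2.7}$$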
Now we can introduce the $n$-particle ($A_{n-1}$ type) Ruijsenaars system [23]. It is an integrable system with the following integrals of motion ($i = 1,\dots,n$):
$$H_i = \sum_{\substack{J\subset\{1,\dots,n\}\\ |J|=i}} \exp\Big(\sum_{j\in J}p_j\Big)\prod_{\substack{j\in J\\ k\in\{1,\dots,n\}\setminus J}}\frac{\sigma(x_j - x_k - \lambda)}{\sigma(x_j - x_k)}. \tag{2.8}$$
The variables $(p_j, x_j)$, $j = 1,\dots,n$, on a $2n$-dimensional symplectic manifold form a canonical system, i.e. they possess the Poisson brackets
$$\{p_j,p_k\} = \{x_j,x_k\} = 0, \qquad \{p_j,x_k\} = \delta_{jk}, \qquad j,k = 1,\dots,n, \tag{2.9}$$
or, equivalently, the symplectic form $\omega$ is expressed as $\omega = \sum_j dp_j\wedge dx_j = d\big(\sum_j p_j\,dx_j\big)$. Here $\lambda$ is a parameter of the model. This system was proposed by Ruijsenaars as a relativistic analog of the Calogero-Moser system.

Proposition 1 ([23]). The Hamiltonians $H_j$ Poisson commute:
$$\{H_j,H_k\} = 0, \qquad j,k = 1,\dots,n. \tag{2.10}$$
The Lax matrix for this model has the form
$$L(u) = \sum_{i,j=1}^n h_i\,\Phi_u(x_i - x_j + \lambda)\,E_{ij}, \qquad h_i := e^{p_i}\prod_{j\ne i}\frac{\sigma(x_i - x_j - \lambda)}{\sigma(x_i - x_j)}, \tag{2.11}$$
where the matrices $E_{ij}$ have the following entries: $(E_{ij})_{kl} = \delta_{ik}\delta_{jl}$. Notice that Ruijsenaars [23] used another gauge of the momenta; the two are connected by the following canonical transformation:
$$p_i \to p_i + \log\sqrt{\prod_{j\ne i}\frac{\sigma(x_i - x_j + \lambda)}{\sigma(x_i - x_j - \lambda)}}, \qquad x_i \to x_i. \tag{2.12}$$
Proposition 2 ([23]). The characteristic polynomial of the matrix $L(u)$ (2.11) generates the Hamiltonians (2.8),
$$\det(L(u) - v\cdot 1) = \sum_{j=0}^n (-v)^{n-j}\,\frac{H_j}{\sigma^j(\lambda)}\,\frac{\sigma(u + j\lambda)}{\sigma(u)}, \tag{2.13}$$
where we assume $H_0 \equiv 1$.

3. The Method

Recall, first, the standard definitions of Liouville integrability and SoV in the Hamilton-Jacobi equation [2]. An integrable Hamiltonian system with $N$ degrees of freedom is determined by a $2N$-dimensional symplectic manifold (phase space) and $N$ independent functions (Hamiltonians) $H_j$ commuting with respect to the Poisson bracket
$$\{H_j,H_k\} = 0, \qquad j,k = 1,\dots,N. \tag{3.1}$$
To find a SoV means then to find a canonical transformation $M : (x,p) \mapsto (u,v)$, $M : H_i(x,p) \mapsto H_i(u,v)$ such that there exist $N$ relations
$$\Phi_j(u_j, v_j; H_1,\dots,H_N) = 0, \qquad j = 1,\dots,N, \tag{3.2}$$
separating the variables $u_j$. The most common way to describe a canonical transformation is the one in terms of its generating function $F(u|x)$. Presently, no algorithm is known for constructing a SoV for any given integrable system. Nevertheless, there exists a fairly effective practical recipe based on the classical inverse scattering method. A detailed description of the procedure with many examples can be found in the review paper [31], see also the works [29, 17, 18, 16, 19]. Here we describe very briefly its main steps.

A Lax matrix for a given integrable system is a matrix $L(u)$ dependent on a "spectral parameter" $u \in \mathbb{C}$ such that its characteristic polynomial obeys two conditions:
(i) Poisson involutivity: $\{\det(L(u) - v\cdot 1),\ \det(L(\tilde u) - \tilde v\cdot 1)\} = 0$, $\forall u, \tilde u, v, \tilde v \in \mathbb{C}$;
(ii) $\det(L(u) - v\cdot 1)$ generates all integrals of motion $H_i$.

A Baker-Akhiezer (BA) function is the eigenvector
$$L(u)\,f(u) = v(u)\,f(u) \tag{3.3}$$
of the Lax matrix $L(u)$, provided that a normalisation of the eigenvectors $f(u)$ is fixed:
$$\vec\alpha\cdot f \equiv \sum_{i=1}^n \alpha_i(u)\,f_i(u) = 1, \qquad \big(\,f(u) \equiv (f_1(u),\dots,f_n(u))^t\,\big). \tag{3.4}$$
The pair $(u,v)$ can be thought of as a point of the spectral curve
$$\det(L(u) - v\cdot 1) = 0. \tag{3.5}$$
The BA function $f(u)$ is then a meromorphic function on the spectral curve. The recipe for finding an SoV is simple: The separation variables $u_j$ are poles of the Baker-Akhiezer function, provided it is properly normalised. The corresponding eigenvalues $v_j$ of $L(u_j)$, or some functions of them, serve as the canonically conjugated variables. It is easy to see that the pairs $(u_j,v_j)$ thus defined satisfy the separation equations (3.2) for $\Phi_j \equiv \det(L(u_j) - v_j\cdot 1)$. The canonicity of the variables $(u_j,v_j)$ should be verified independently. No general recipe is known for guessing the proper (that is, producing canonical variables) normalisation for the BA function. In many cases the simplest standard normalisation,
$$\vec\alpha(u) = \vec\alpha_0 \equiv (0,0,\dots,0,1), \tag{3.6}$$
works. In other cases the vector $\vec\alpha$ may depend on the spectral parameter $u$ and the dynamical variables $(x,p)$. We shall refer to such a normalisation as a dynamical one. From the linear problem (3.3) and normalisation (3.4) we derive that $\vec\alpha\cdot L^k f = v^k$, $k = 0,\dots,n-1$, hence,
$$f(u) = \begin{pmatrix} \vec\alpha \\ \vec\alpha\cdot L(u) \\ \vdots \\ \vec\alpha\cdot L^{n-1}(u) \end{pmatrix}^{-1}\begin{pmatrix} 1 \\ v \\ \vdots \\ v^{n-1} \end{pmatrix}. \tag{3.7}$$
Another useful representation of the eigenvector $f(u)$, which can be directly verified, is as follows:
$$f_j(u) = \frac{\big((L(u) - v\cdot 1)^\wedge\big)_{jk}}{\big(\vec\alpha\cdot(L(u) - v\cdot 1)^\wedge\big)_k}, \qquad \forall k = 1,\dots,n, \tag{3.8}$$
where the wedge denotes the classical adjoint matrix (matrix of cofactors). To derive equations for the separation variables, let $f_i^{(j)} = \mathrm{res}_{u=u_j} f_i(u)$ and $v_j \equiv v(u_j)$. Then from (3.3)–(3.4) we have the overdetermined system of $n+1$ linear homogeneous equations for $n$ components $f_i^{(j)}$ of the vector $f^{(j)}$:
$$\begin{cases} L(u_j)\,f^{(j)} = v_j\,f^{(j)}, \\ \sum_{i=1}^n \alpha_i(u_j)\,f_i^{(j)} = 0. \end{cases} \tag{3.9}$$
The pair $(u,v) \equiv (u_j,v_j)$ is thus determined from the condition
$$\mathrm{rank}\begin{pmatrix} \vec\alpha(u) \\ L(u) - v\cdot 1 \end{pmatrix} = n-1. \tag{3.10}$$
Finally, the condition (3.10) can be rewritten as the following vector equation:
$$\vec\alpha\cdot(L(u) - v\cdot 1)^\wedge = 0. \tag{3.11}$$
One can eliminate $v$ from (3.11) to get the equation for the $u_j$'s in the following way. From the linear system (3.9) it follows that $\vec\alpha\cdot(L(u_j))^k f^{(j)} = 0$, $k = 0,\dots,n-1$, so that (because $f^{(j)}$ is not a zero vector) the following determinant has to vanish on the separation variables $u_j$:
$$B(u) = \det\begin{pmatrix} \vec\alpha \\ \vec\alpha\cdot L(u) \\ \vdots \\ \vec\alpha\cdot L^{n-1}(u) \end{pmatrix} = 0. \tag{3.12}$$
The formula (3.12) for the separation variables appeared already in [24] (see also [7]) in the case of the standard normalisation $\vec\alpha = \vec\alpha_0$ (3.6) (see, for instance, formula (22) in [24]). Notice that the fact that Eqs. (3.11) and (3.12) are the ones for the poles of the BA function is already hinted at, respectively, by the formulas (3.8) and (3.7). Also, from Eqs. (3.11) we can get many various formulas for $v$ in the form
$$v = A(u) \tag{3.13}$$
with $A(u)$ being rational functions of the entries of $L(u)$. Let us describe those formulas for $A(u)$ explicitly. Define the matrices $L^{(p)}$, $p = 1,\dots,n$, with the following entries:
$$L^{(p)}_{ij} := \sum_{i_1=1}^n\cdots\sum_{i_{p-1}=1}^n \begin{vmatrix} L_{i,j} & L_{i,i_1} & \cdots & L_{i,i_{p-1}} \\ L_{i_1,j} & L_{i_1,i_1} & \cdots & L_{i_1,i_{p-1}} \\ \vdots & \vdots & \ddots & \vdots \\ L_{i_{p-1},j} & L_{i_{p-1},i_1} & \cdots & L_{i_{p-1},i_{p-1}} \end{vmatrix}, \qquad p = 2,3,\dots,n, \tag{3.14}$$
and put $L^{(1)} \equiv L$. These matrices satisfy the recursion relation of the form
$$L^{(p)} = L\,\mathrm{tr}\,L^{(p-1)} - (p-1)\,L^{(p-1)}L. \tag{3.15}$$
Introduce the matrix $\mathcal{B}(u)$ by the formula
$$\mathcal{B}(u) := \begin{pmatrix} \vec\alpha\cdot L^{(1)}(u)\,L^{-1}(u) \\ \vec\alpha\cdot L^{(2)}(u)\,L^{-1}(u) \\ \frac12\,\vec\alpha\cdot L^{(3)}(u)\,L^{-1}(u) \\ \cdots \\ \frac{1}{(n-1)!}\,\vec\alpha\cdot L^{(n)}(u)\,L^{-1}(u) \end{pmatrix}. \tag{3.16}$$
Then we have the following statement.

Proposition 3.
$$\vec\alpha\cdot(L(u) - v\cdot 1)^\wedge = \big((-v)^{n-1}, (-v)^{n-2}, \dots, 1\big)\cdot\mathcal{B}(u). \tag{3.17}$$
Proof. The characteristic determinant $\det(L(u) - v\cdot 1)$ has the following representation:
$$\det(L(u) - v\cdot 1) = (-v)^n + \sum_{j=1}^n \frac{(-v)^{n-j}}{j!}\,\mathrm{tr}\big(L^{(j)}(u)\big). \tag{3.18}$$
The adjoint matrix $(L(u) - v\cdot 1)^\wedge$ is a matrix polynomial in $v$ of degree $n-1$,
$$(L(u) - v\cdot 1)^\wedge = (-v)^{n-1}\cdot 1 + \sum_{j=1}^{n-1}(-v)^{n-1-j}A^{(j)}(u). \tag{3.19}$$
In order to find the matrices $A^{(j)}$, substitute (3.18) and (3.19) into the equality $\det(L(u) - v\cdot 1)\cdot 1 = (L(u) - v\cdot 1)\,(L(u) - v\cdot 1)^\wedge$, and equate coefficients of equal powers of $v$. In this way we get the following recursion relation for the $A^{(j)}$'s:
$$A^{(j)} = \frac{1}{j!}\,\mathrm{tr}\big(L^{(j)}\big) - L\,A^{(j-1)} \tag{3.20}$$
with the initial data
$$A^{(0)} = 1, \qquad A^{(n-1)} = \frac{1}{n!}\,\mathrm{tr}\big(L^{(n)}\big)\,L^{-1}(u). \tag{3.21}$$
The matrix $\frac{1}{j!}L^{(j+1)}L^{-1}$ (cf. (3.15)) satisfies the same recursion and the same initial values, which means that
$$A^{(j)}(u) = \frac{1}{j!}\,L^{(j+1)}(u)\,L^{-1}(u).$$
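The recursion (3.15) and the expansion (3.18) are easy to test on a concrete matrix. The sketch below is an illustration with a hypothetical constant $3\times 3$ rational matrix standing in for $L(u)$ at a fixed $u$. It builds $L^{(p)}$ from (3.15), compares (3.18) with a directly computed determinant, and checks the last step $L^{(n)} = (n-1)!\,\det(L)\cdot 1$, which is equivalent to the second formula in (3.21).

```python
from fractions import Fraction
from math import factorial

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def det(A):
    """Determinant by Laplace expansion along the first row (fine for small n)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def l_hierarchy(L):
    """Return [L^(1), ..., L^(n)] built from the recursion (3.15)."""
    n = len(L)
    out = [L]
    for p in range(2, n + 1):
        prev = out[-1]
        t = trace(prev)
        prev_L = matmul(prev, L)
        out.append([[t * L[i][j] - (p - 1) * prev_L[i][j] for j in range(n)]
                    for i in range(n)])
    return out

def char_det_from_traces(L, v):
    """Right-hand side of (3.18)."""
    n = len(L)
    Ls = l_hierarchy(L)
    return (-v) ** n + sum((-v) ** (n - j) * trace(Ls[j - 1]) / factorial(j)
                           for j in range(1, n + 1))

def char_det_direct(L, v):
    """det(L - v*1) computed directly."""
    n = len(L)
    return det([[L[i][j] - (v if i == j else 0) for j in range(n)] for i in range(n)])

# Hypothetical sample matrix with exact rational entries.
L = [[Fraction(x) for x in row] for row in ([2, 1, 0], [1, 3, 1], [4, 0, 1])]
points = [Fraction(k, 2) for k in range(-3, 4)]
agree = all(char_det_direct(L, v) == char_det_from_traces(L, v) for v in points)
top = l_hierarchy(L)[-1]   # expected to equal (n-1)! * det(L) * identity
```

Since both sides of (3.18) are polynomials of degree 3 in $v$ here, agreement at seven points confirms the identity for this matrix. For the sample matrix $\det L = 9$, so `top` is $18$ times the identity.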
From the system of linear homogeneous equations α ~ · (L(u) − v · 1)∧ ≡ ((−v)n−1 , (−v)n−2 , . . . , 1) · B(u) = 0
(3.22)
(cf. (3.17)) we derive that (−v)j−i =
(B ∧ (u))ki , (B ∧ (u))kj
∀k .
(3.23)
The formula (3.23) gives plenty of different representations for the function A(u), all of them being compatible on the separation variables since, because of the equality
862
V.B. Kuznetsov, F.W. Nijhoff, E.K. Sklyanin
B(u) = det(B(u)) ,
(3.24)
the matrix $\mathcal{B}^{\wedge}(u_j)$ has rank 1. To validate the choice of normalisation $\vec\alpha(u)$ it remains, first, to make sure that the number of $u_j$'s is exactly $N$ (in some degenerate cases one has to supply a couple of extra variables to make a complete set) and, second, to verify (somehow) the canonicity of brackets between the whole set of separation variables, namely: between zeros $u_j$ of $B(u)$ and their conjugated variables $v_j \equiv v(u_j) = A(u_j)$. To do this final calculation one needs information about Poisson brackets between entries of the Lax matrix $L(u)$.

4. The Separation

We now proceed with applying the general method to the system in question. For the Ruijsenaars model the number $N$ of degrees of freedom coincides with the number $n$ of particles and, respectively, with the dimension $n$ of the Lax matrix (2.11), so we can put $N = n$ in the formulas of the above section. Let us first prove two useful lemmas.

Lemma 1. Let $c_i \in \mathbb{C}$, $x_j^{(i)} \in \mathcal{D}$, $i = 1, \dots, M$, $j = 1, \dots, N$, be arbitrary constants such that
$$\sum_{j=1}^{N} x_j^{(i)} \equiv x \pmod{\Gamma}, \qquad \forall i.$$
Then there exist $C \in \mathbb{C}$, $y_j \in \mathcal{D}$, $j = 1, \dots, N$, such that
$$p(u) \equiv \sum_{i=1}^{M} c_i \prod_{j=1}^{N} \sigma(u - x_j^{(i)}) = C \prod_{j=1}^{N} \sigma(u - y_j), \qquad \forall u \in \mathbb{C},$$
where
$$\sum_{j=1}^{N} y_j \equiv x \pmod{\Gamma}.$$
The function $p(u)$ can be thought of as a σ-function version of a polynomial of degree $N$ (in $u$) represented in terms of its zeros $y_j$.

Proof. Let $z_j \in \mathcal{D}$, $j = 1, \dots, N$, be $N$ distinct constants such that $\sum_{j=1}^{N} z_j \equiv x \pmod{\Gamma}$. Consider the elliptic function $\tilde p(u)$ of the form
$$\tilde p(u) = \sum_{i=1}^{M} c_i \prod_{j=1}^{N} \frac{\sigma(u - x_j^{(i)})}{\sigma(u - z_j)}. \tag{4.1}$$
Any elliptic function can be represented through the ratio of products of σ-functions depending on its zeros, $y_j$, and its poles, $z_j$ (cf., for instance, [5]), i.e.
$$\tilde p(u) = C \prod_{j=1}^{N} \frac{\sigma(u - y_j)}{\sigma(u - z_j)}, \tag{4.2}$$
where $\sum_{j=1}^{N} y_j \equiv \sum_{j=1}^{N} z_j \equiv x \pmod{\Gamma}$. The statement follows if we equate the right-hand sides of (4.1) and (4.2).
Consider the Lax matrix $L(u)$ for the Ruijsenaars system
$$L(u) = \sum_{i,j=1}^{n} h_i\,\Phi_u(x_i - x_j + \lambda)\,E_{ij}. \tag{4.3}$$
Lemma 2. For any integer $p = 1, 2, \dots, n$ we have the identity
$$(L(u))^{p} + \sum_{j=1}^{p-1} (-1)^{j}\,\frac{H_j\,\sigma(u + j\lambda)}{\sigma^{j}(\lambda)\,\sigma(u)}\,(L(u))^{p-j} = \sum_{i,j=1}^{n} h_i\,C_{ij}^{(p)}\,\Phi_u(x_i - x_j + p\lambda)\,E_{ij}, \tag{4.4}$$
where the scalars $C_{ij}^{(p)}$ do not depend on the spectral parameter $u$ and are given by the formula
Q X σ(xi − xil ) σ(xil − xik ) (p) p−1 × (4.5) hi1 · · · hip−1 k
1
×
p−1 σ(xi − xj + pλ) Y σ(xi − xik ) σ(xik − xj ) σ(xi − xj + λ) σ(xi − xik + λ) σ(xik − xj + λ) k=1
(1) and Cij = 1.
This lemma says that one can form a degree-$p$ polynomial in $L(u)$ (the left-hand side of (4.4)) whose $(ij)$-entry depends on $u$ only through the factor $\Phi_u(x_i - x_j + p\lambda)$. This fact reflects some hidden internal structure of the Lax matrix $L(u)$ and is essential for the proof of the separation of variables that follows. Notice also that the generalised Cauchy identity plays an important role in the proof of the lemma given below.

Proof. Iterating the recursion (3.15) for the matrix $L^{(p)}(u)$, we get the formula
$$L^{(p)} = (-1)^{p-1}(p-1)!\,L^{p} + \sum_{j=1}^{p-1} (-1)^{p-1-j}\,\frac{(p-1)!}{j!}\,\operatorname{tr}(L^{(j)})\,L^{p-j}. \tag{4.6}$$
Noticing that the traces of the $L^{(j)}$ matrices are expressed in terms of the integrals of motion (cf. (2.13) and (3.18)),
$$\operatorname{tr} L^{(j)} = j!\,\frac{H_j\,\sigma(u + j\lambda)}{\sigma^{j}(\lambda)\,\sigma(u)}, \tag{4.7}$$
we have that
$$h_i\,C_{ij}^{(p)}\,\Phi_u(x_i - x_j + p\lambda) = \frac{(-1)^{p-1}}{(p-1)!}\,L^{(p)}_{ij}.$$
The right-hand side being evaluated with the help of the generalised Cauchy identity (2.7), we arrive at the statement of the lemma.
In order to separate variables in the Ruijsenaars system, first of all we have to fix the normalisation vector $\vec\alpha(u)$. The crucial observation is that we can use the standard normalisation (3.6). Then we have the following “characteristic equations” for the separation variables $u = u_j$ and $v = v_j$ (cf. (3.11)),
$$(L(u) - v\cdot 1)^{\wedge}_{nk} = 0, \qquad k = 1, \dots, n. \tag{4.8}$$
The “σ-polynomial” $B(u)$ (3.12) has now the form
$$B(u) = \det\begin{pmatrix} 0 & \dots & 1 \\ L_{n1} & \dots & L_{nn} \\ \vdots & \ddots & \vdots \\ (L^{n-1})_{n1} & \dots & (L^{n-1})_{nn} \end{pmatrix}. \tag{4.9}$$
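The structure of (4.9) can be illustrated on a generic numerical matrix. The sketch below assumes, as read off from the rows of (4.9), that the rows are $\vec\alpha_0 L^{k}$, $k = 0, \dots, n-1$, with $\vec\alpha_0 = (0, \dots, 0, 1)$; it then checks that expanding along the first row leaves $(-1)^{n-1}$ times an $(n-1)\times(n-1)$ determinant built from the last rows of $L, \dots, L^{n-1}$. The helper names and the sample matrix are hypothetical and are not the Ruijsenaars Lax matrix.

```python
from fractions import Fraction


def det(m):
    """Determinant by Laplace expansion along the first row (small n only)."""
    if not m:
        return Fraction(1)
    return sum((-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
               for c in range(len(m)))


def krylov_rows(L, alpha):
    """Rows alpha, alpha L, ..., alpha L^(n-1), each as a list."""
    n = len(L)
    rows = [list(alpha)]
    for _ in range(n - 1):
        prev = rows[-1]
        rows.append([sum(prev[k] * L[k][c] for k in range(n)) for c in range(n)])
    return rows


# Hypothetical sample matrix with exact rational entries.
L = [[Fraction(x) for x in row] for row in [[2, 1, 0], [1, 3, 1], [5, 1, 4]]]
n = len(L)
alpha0 = [Fraction(0)] * (n - 1) + [Fraction(1)]

rows = krylov_rows(L, alpha0)                       # rows of the matrix in (4.9)
B = det(rows)
reduced = det([row[:-1] for row in rows[1:]])       # drop first row and last column
print(B == (-1) ** (n - 1) * reduced)
```

For $n = 3$ the reduced determinant is simply $L_{31}(L^{2})_{32} - L_{32}(L^{2})_{31}$.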
Its zeros, $u_j$, are the poles of the BA function $f(u)$ and are the separation variables. Let us first verify that we have got the right number of the $u_j$'s.

Theorem 1. The σ-polynomial $B(u)$ (4.9) has $n - 1$ zeros $u_j \in \mathcal{D}$ and can be represented by the formula
$$B(u) = \tilde C \prod_{j=1}^{n-1} \Phi_u(-u_j), \tag{4.10}$$
where $\tilde C$ does not depend on the spectral parameter $u$ and has the form
$$\tilde C = (-1)^{n-1}\,h_n^{\,n-1} \begin{vmatrix} 1 & \dots & 1 \\ C^{(2)}_{n1} & \dots & C^{(2)}_{n,n-1} \\ \vdots & \ddots & \vdots \\ C^{(n-1)}_{n1} & \dots & C^{(n-1)}_{n,n-1} \end{vmatrix}. \tag{4.11}$$
The variables $u_j$ obey the restriction
$$\sum_{j=1}^{n-1} u_j \equiv \sum_{j=1}^{n-1} (x_j - x_n) - \frac{n(n-1)}{2}\,\lambda \pmod{\Gamma}. \tag{4.12}$$
Proof. Using Lemma 2 we can represent $B(u)$ in the form
$$B(u) = (-1)^{n-1}\,h_n^{\,n-1} \times \begin{vmatrix} \Phi_u(x_{n1} + \lambda) & \dots & \Phi_u(x_{n,n-1} + \lambda) \\ C^{(2)}_{n1}\,\Phi_u(x_{n1} + 2\lambda) & \dots & C^{(2)}_{n,n-1}\,\Phi_u(x_{n,n-1} + 2\lambda) \\ \vdots & \ddots & \vdots \\ C^{(n-1)}_{n1}\,\Phi_u(x_{n1} + (n-1)\lambda) & \dots & C^{(n-1)}_{n,n-1}\,\Phi_u(x_{n,n-1} + (n-1)\lambda) \end{vmatrix}. \tag{4.13}$$
Then, using Lemma 1, we conclude that the σ-polynomial $B(u)$ can be rewritten in terms of its zeros in the form (4.10), where $\tilde C$ is given by the formula (4.11), and we also have the restriction (4.12).
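To avoid discontinuities when discussing the Poisson brackets it is convenient to think of the $u_j$'s as lying on the torus $\mathbb{C}/\Gamma$ rather than on $\mathcal{D}$. In the sequel we obtain a few statements which are valid for a general Lax matrix $L(u)$. Let us introduce the following matrices:
$$L(u, v) := L(u) - v\cdot 1, \qquad L^{\wedge}(u, v) := (L(u) - v\cdot 1)^{\wedge}. \tag{4.14}$$
We can express the Poisson brackets of $L^{\wedge}(u, v)$ with $L(\tilde u, \tilde v)$ in terms of the Poisson brackets of $L(u, v)$ with $L(\tilde u, \tilde v)$. The answer is given by the following lemma.

Lemma 3.
$$\{L_1^{\wedge}(u,v), L_2(\tilde u,\tilde v)\} = \Delta_1^{-1}\Big( L_1^{\wedge}(u,v)\,\operatorname{tr}_1\big[ L_1^{\wedge}(u,v)\,\{L_1(u,v), L_2(\tilde u,\tilde v)\} \big] - L_1^{\wedge}(u,v)\,\{L_1(u,v), L_2(\tilde u,\tilde v)\}\,L_1^{\wedge}(u,v) \Big),$$
$$\{L_1(u,v), L_2^{\wedge}(\tilde u,\tilde v)\} = \Delta_2^{-1}\Big( L_2^{\wedge}(\tilde u,\tilde v)\,\operatorname{tr}_2\big[ L_2^{\wedge}(\tilde u,\tilde v)\,\{L_1(u,v), L_2(\tilde u,\tilde v)\} \big] - L_2^{\wedge}(\tilde u,\tilde v)\,\{L_1(u,v), L_2(\tilde u,\tilde v)\}\,L_2^{\wedge}(\tilde u,\tilde v) \Big),$$
where $L_1(u,v) = L(u,v)\otimes 1$, $L_2(\tilde u,\tilde v) = 1\otimes L(\tilde u,\tilde v)$, $L_1^{\wedge}(u,v) = L^{\wedge}(u,v)\otimes 1$, etc., $\Delta_1 = \det(L(u,v))$, $\Delta_2 = \det(L(\tilde u,\tilde v))$, and $\operatorname{tr}_{1,2}$ means the trace in the first, respectively the second, space of the tensor product of two spaces, defined by the rule
$$\operatorname{tr}_1[A_1 B_2] \equiv \operatorname{tr}_1[A\otimes B] := \operatorname{tr}(A)\,(1\otimes B) \equiv \operatorname{tr}(A)\,B_2, \tag{4.15}$$
$$\operatorname{tr}_2[A_1 B_2] \equiv \operatorname{tr}_2[A\otimes B] := \operatorname{tr}(B)\,(A\otimes 1) \equiv \operatorname{tr}(B)\,A_1. \tag{4.16}$$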
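Proof. The matrix $L(u, v)$ and its classical adjoint $L^{\wedge}(u, v)$ satisfy the relation
$$L^{\wedge}(u,v)\,L(u,v) = L(u,v)\,L^{\wedge}(u,v) = \Delta_1\cdot 1. \tag{4.17}$$
Differentiating this formula with respect to a parameter $t$ and using the formula
$$\frac{d}{dt}(\det L) = \operatorname{tr}\Big( L^{\wedge}\,\frac{dL}{dt} \Big),$$
one obtains (cf. (1.45)–(1.47) from [1])
$$\frac{dL^{\wedge}}{dt} = \frac{ L^{\wedge}\,\operatorname{tr}\big( L^{\wedge}\,\frac{dL}{dt} \big) - L^{\wedge}\,\frac{dL}{dt}\,L^{\wedge} }{\Delta_1}.$$
From this we have the following derivatives in component-wise form:
$$\frac{\partial L^{\wedge}_{ij}}{\partial L_{pq}} = \frac{ L^{\wedge}_{qp}\,L^{\wedge}_{ij} - L^{\wedge}_{ip}\,L^{\wedge}_{qj} }{\Delta_1}. \tag{4.18}$$
Now, using the derivation property of the bracket,
$$\{L^{\wedge}_{ij}(u,v), L_{kl}(\tilde u,\tilde v)\} = \sum_{pq} \frac{\partial L^{\wedge}_{ij}(u,v)}{\partial L_{pq}(u,v)}\,\{L_{pq}(u,v), L_{kl}(\tilde u,\tilde v)\},$$
$$\{L_{ij}(u,v), L^{\wedge}_{kl}(\tilde u,\tilde v)\} = \sum_{pq} \frac{\partial L^{\wedge}_{kl}(\tilde u,\tilde v)}{\partial L_{pq}(\tilde u,\tilde v)}\,\{L_{ij}(u,v), L_{pq}(\tilde u,\tilde v)\},$$
we verify both statements of the lemma by substitution and straightforward calculation.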
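The component formula (4.18) lends itself to an exact spot check. The sketch below relies on the observation (an assumption of this illustration, not a statement from the text) that every entry of the classical adjoint is affine in any single matrix entry $L_{pq}$, so the partial derivative equals the exact difference obtained by adding 1 to $L_{pq}$. The sample matrix and helper names are hypothetical.

```python
from fractions import Fraction


def det(m):
    """Determinant by Laplace expansion along the first row (small n only)."""
    if not m:
        return Fraction(1)
    return sum((-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
               for c in range(len(m)))


def adjugate(m):
    """Classical adjoint: adj[i][j] = (-1)^(i+j) * minor obtained by deleting row j, column i."""
    n = len(m)

    def minor(r, c):
        return det([row[:c] + row[c + 1:] for k, row in enumerate(m) if k != r])

    return [[(-1) ** (i + j) * minor(j, i) for j in range(n)] for i in range(n)]


def check_derivative_formula(L):
    """Compare d adj_ij / d L_pq with (adj_qp adj_ij - adj_ip adj_qj) / det L for all indices."""
    n = len(L)
    adj, delta = adjugate(L), det(L)
    for p in range(n):
        for q in range(n):
            bumped = [row[:] for row in L]
            bumped[p][q] += 1                      # exact derivative, by affinity in L_pq
            adj_bumped = adjugate(bumped)
            for i in range(n):
                for j in range(n):
                    derivative = adj_bumped[i][j] - adj[i][j]
                    formula = (adj[q][p] * adj[i][j] - adj[i][p] * adj[q][j]) / delta
                    if derivative != formula:
                        return False
    return True


# Hypothetical invertible sample matrix (det = 23).
L = [[Fraction(x) for x in row] for row in [[2, 1, 0], [1, 3, 1], [5, 1, 4]]]
print(check_derivative_formula(L))
```

The check needs $\det L \neq 0$, mirroring the division by $\Delta_1$ in (4.18).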
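From the involutivity of the characteristic polynomials of a Lax matrix $L$, $\Delta_1$ and $\Delta_2$, we have the equality:
$$\begin{aligned} 0 &= \{\Delta_1\cdot 1\otimes 1,\ \Delta_2\cdot 1\otimes 1\} = \{L_1(u,v)\,L_1^{\wedge}(u,v),\ L_2(\tilde u,\tilde v)\,L_2^{\wedge}(\tilde u,\tilde v)\} \\ &= L_1 L_2\,\{L_1^{\wedge}, L_2^{\wedge}\} + L_1\,\{L_1^{\wedge}, L_2\}\,L_2^{\wedge} + L_2\,\{L_1, L_2^{\wedge}\}\,L_1^{\wedge} + \{L_1, L_2\}\,L_1^{\wedge} L_2^{\wedge}. \end{aligned}$$
Hence, using Lemma 3, we can get from here an expression for the bracket of $L^{\wedge}$ with $L^{\wedge}$ in terms of the brackets of $L$ with $L$.

Lemma 4.
$$\{L_1^{\wedge}(u,v), L_2^{\wedge}(\tilde u,\tilde v)\} = \Delta_1^{-1}\Delta_2^{-1}\Big( L_1^{\wedge} L_2^{\wedge}\,\{L_1, L_2\} - \operatorname{tr}_1\big[ L_1^{\wedge} L_2^{\wedge}\,\{L_1, L_2\} \big] - \operatorname{tr}_2\big[ L_1^{\wedge} L_2^{\wedge}\,\{L_1, L_2\} \big] \Big)\,L_1^{\wedge} L_2^{\wedge}.$$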
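Suppose now that a Lax matrix $L(u)$ satisfies the quadratic (dynamical) $(r, s)$-bracket; then we have the following statement.

Lemma 5. Let a Lax matrix $L(u)$ satisfy the quadratic $(r, s)$-bracket of the form
$$\{L_1(u), L_2(\tilde u)\} = L_1 L_2\,r_+ - r_-\,L_1 L_2 + L_1\,s_+\,L_2 - L_2\,s_-\,L_1, \tag{4.19}$$
where
$$r_+ - r_- + s_+ - s_- = 0, \qquad P\,r_\pm\,P = -r_\pm\big|_{u\leftrightarrow\tilde u}, \qquad P\,s_\pm\,P = s_\mp\big|_{u\leftrightarrow\tilde u}.$$
Here $P$ is the flip in the tensor product of two spaces, i.e. $P\,(A\otimes B)\,P = B\otimes A$. Then the matrix $L^{\wedge}(u, v) \equiv (L(u) - v\cdot 1)^{\wedge}$ obeys the bracket of the form
$$\begin{aligned} \{L_1^{\wedge}, L_2^{\wedge}\} ={}& (r_+ - \operatorname{tr}_1 r_+ - \operatorname{tr}_2 r_+)\,L_1^{\wedge} L_2^{\wedge} - L_1^{\wedge} L_2^{\wedge}\,(r_- - \operatorname{tr}_1 r_- - \operatorname{tr}_2 r_-) \\ &+ L_2^{\wedge}\,(s_+ - \operatorname{tr}_1 s_+ - \operatorname{tr}_2 s_+)\,L_1^{\wedge} - L_1^{\wedge}\,(s_- - \operatorname{tr}_1 s_- - \operatorname{tr}_2 s_-)\,L_2^{\wedge} \\ &+ v\,\Delta_1^{-1}\Big[ \big( L_1^{\wedge}(r_+ - s_-) - \operatorname{tr}_1[ L_1^{\wedge}(r_+ - s_-) ] \big)\,L_1^{\wedge} L_2^{\wedge} - L_1^{\wedge} L_2^{\wedge}\,\big( (r_- - s_+) L_1^{\wedge} - \operatorname{tr}_1[ (r_- - s_+) L_1^{\wedge} ] \big) \Big] \\ &+ \tilde v\,\Delta_2^{-1}\Big[ \big( L_2^{\wedge}(r_+ + s_+) - \operatorname{tr}_2[ L_2^{\wedge}(r_+ + s_+) ] \big)\,L_1^{\wedge} L_2^{\wedge} - L_1^{\wedge} L_2^{\wedge}\,\big( (r_- + s_-) L_2^{\wedge} - \operatorname{tr}_2[ (r_- + s_-) L_2^{\wedge} ] \big) \Big]. \end{aligned} \tag{4.20}$$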
Proposition 4 ([21, 32]). The Lax matrix (2.11) of the Ruijsenaars model satisfies the quadratic $(r, s)$-algebra (4.19), where the $(r, s)$-matrices can be chosen as follows:
$$r_+ = a - b + c - d, \qquad r_- = a + d, \qquad s_+ = b + d, \qquad s_- = c - d, \tag{4.21}$$
where
$$a := \sum_{i\neq j} \Phi_{u-\tilde u}(x_i - x_j)\,E_{ij}\otimes E_{ji} + \zeta(u - \tilde u) \sum_{k} E_{kk}\otimes E_{kk}, \tag{4.22}$$
$$b := \sum_{i\neq j} \Phi_{u}(x_i - x_j)\,E_{ij}\otimes E_{ii} + \zeta(u) \sum_{k} E_{kk}\otimes E_{kk}, \tag{4.23}$$
$$c := \sum_{i\neq j} \Phi_{\tilde u}(x_i - x_j)\,E_{ii}\otimes E_{ij} + \zeta(\tilde u) \sum_{k} E_{kk}\otimes E_{kk}, \tag{4.24}$$
$$d := \sum_{i\neq j} \zeta(x_i - x_j)\,E_{ii}\otimes E_{jj}. \tag{4.25}$$
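As a quick consistency check of the choice (4.21), the linear constraint of Lemma 5 and the combinations that multiply $v$ and $\tilde v$ in (4.20) can be worked out directly. Nothing beyond (4.19)–(4.21) is assumed in this short computation:

```latex
\begin{align*}
r_+ - r_- + s_+ - s_- &= (a-b+c-d) - (a+d) + (b+d) - (c-d) = 0,\\
r_+ - s_- &= (a-b+c-d) - (c-d) = a-b, & r_- - s_+ &= (a+d) - (b+d) = a-b,\\
r_+ + s_+ &= (a-b+c-d) + (b+d) = a+c, & r_- + s_- &= (a+d) + (c-d) = a+c.
\end{align*}
```

So the matrix $d$ drops out of the terms of (4.20) proportional to $v$ and $\tilde v$, which involve only $a - b$ and $a + c$; these are the combinations that appear in (4.29).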
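Notice here that one needs to use three algebraic relations (2.5)–(2.6) and (2.4) for the function $\Phi$ to verify this $(r, s)$-structure (cf. [21]).

Separation variables $(u, v) = (u_j, v_j)$, $j = 1, \dots, n - 1$, for the Ruijsenaars model are implicitly defined by the following system of equations
$$(L^{\wedge}(u, v))_{nk} \equiv (L(u) - v\cdot 1)^{\wedge}_{nk} = 0, \qquad k = 1, \dots, n, \tag{4.26}$$
where $L(u)$ is the Lax matrix (2.11). The Poisson brackets for these new variables are generally given by the expression:
$$\begin{pmatrix} \{u_i, u_j\} & \{u_i, v_j\} \\ \{v_i, u_j\} & \{v_i, v_j\} \end{pmatrix} = (M_{i;kl})^{-1} \times \begin{pmatrix} \{(L^{\wedge}(u,v))_{nk}, (L^{\wedge}(\tilde u,\tilde v))_{nk}\} & \{(L^{\wedge}(u,v))_{nk}, (L^{\wedge}(\tilde u,\tilde v))_{nl}\} \\ \{(L^{\wedge}(u,v))_{nl}, (L^{\wedge}(\tilde u,\tilde v))_{nk}\} & \{(L^{\wedge}(u,v))_{nl}, (L^{\wedge}(\tilde u,\tilde v))_{nl}\} \end{pmatrix}\Bigg|_{A_{ij}} (M^{t}_{j;kl})^{-1}, \tag{4.27}$$
where it is assumed that $k \neq l$, the condition $A_{ij}$ means the substitution
$$A_{ij} := \begin{cases} (u, v) = (u_i, v_i), \\ (\tilde u, \tilde v) = (u_j, v_j), \end{cases}$$
and the matrices $M$ are defined as follows:
$$M_{m;kl} := \begin{pmatrix} \dfrac{\partial (L^{\wedge}(u,v))_{nk}}{\partial u} & \dfrac{\partial (L^{\wedge}(u,v))_{nk}}{\partial v} \\[2mm] \dfrac{\partial (L^{\wedge}(u,v))_{nl}}{\partial u} & \dfrac{\partial (L^{\wedge}(u,v))_{nl}}{\partial v} \end{pmatrix}\Bigg|_{(u,v)=(u_m,v_m)}. \tag{4.28}$$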
Theorem 2. The separation variables $(u_j, v_j)$, $j = 1, \dots, n - 1$, for the Ruijsenaars system, defined by the system of Eqs. (4.26), possess the following Poisson brackets:
(i) $\{u_i, u_j\} = \{u_i, v_j\} = \{v_i, v_j\} = 0$, $i \neq j$,
(ii) $\{v_j, u_j\} = v_j$.

Proof. Generically the matrix $M_{m;kl}$ (4.28) for $k \neq l$ is invertible, which means that in order to prove the statement (i) we have to show that
$$\{(L^{\wedge}(u,v))_{nk}, (L^{\wedge}(\tilde u,\tilde v))_{nl}\}\big|_{A_{ij}} = 0, \qquad \forall k, l = 1, \dots, n,$$
when $i \neq j$. The latter fact follows from Lemma 5 when we substitute in the right-hand side of (4.20) the $(r, s)$-matrices from Proposition 4 and put in both sides $(u, v) = (u_i, v_i)$, $(\tilde u, \tilde v) = (u_j, v_j)$, $i \neq j$. Indeed, using the definition (4.26), we then get an expression of the form
$$\begin{aligned} &\{(L^{\wedge}(u,v))_{nk}, (L^{\wedge}(\tilde u,\tilde v))_{nl}\}\big|_{A_{ij}} \qquad (i \neq j) \\ &\quad= \sum_{pq} \big( (a - b + c - d)_{np,nq} \big)\big|_{A_{ij}}\,(L^{\wedge}(u_i,v_i))_{pk}\,(L^{\wedge}(u_j,v_j))_{ql} \\ &\qquad+ \sum_{pqr} \left[ \frac{ v\,\big[ (L^{\wedge}(u,v))_{np}\,(L^{\wedge}(u,v))_{qk} - (L^{\wedge}(u,v))_{nk}\,(L^{\wedge}(u,v))_{qp} \big] }{ \det(L(u,v)) }\,(a - b)_{pq,nr} \right]_{A_{ij}} (L^{\wedge}(u_j,v_j))_{rl} \\ &\qquad+ \sum_{pqr} \left[ \frac{ \tilde v\,\big[ (L^{\wedge}(\tilde u,\tilde v))_{np}\,(L^{\wedge}(\tilde u,\tilde v))_{ql} - (L^{\wedge}(\tilde u,\tilde v))_{nl}\,(L^{\wedge}(\tilde u,\tilde v))_{qp} \big] }{ \det(L(\tilde u,\tilde v)) }\,(a + c)_{nr,pq} \right]_{A_{ij}} (L^{\wedge}(u_i,v_i))_{rk}. \end{aligned} \tag{4.29}$$
Each of the above three terms equals zero: the first one by simple inspection of the contributions from the matrices $a$, $b$, $c$, $d$; the latter two because the simple zero in the denominator is cancelled by a double zero in the numerator.

In order to prove the statement (ii) we take $i = j$ in (4.27) to get
$$\{u_j, v_j\}\,\det M_{j;kl} = \{(L^{\wedge}(u,v))_{nk}, (L^{\wedge}(u,v))_{nl}\}\big|_{(u,v)=(u_j,v_j)}, \tag{4.30}$$
where we recall that $k \neq l$. Hence, we have to show that
$$-v_j\,\det M_{j;kl} = \{(L^{\wedge}(u,v))_{nk}, (L^{\wedge}(u,v))_{nl}\}\big|_{(u,v)=(u_j,v_j)}. \tag{4.31}$$
To calculate the right-hand side of (4.31) we use Proposition 4 and take the limit $\tilde u \to u$ in the $(r, s)$-bracket (4.19). Using the derivation property of the bracket and substituting $u = u_j$, $v = v_j$ we then conclude that the only non-vanishing term in the right-hand side of (4.31) has the following form:
$$\{(L^{\wedge}(u,v))_{nk}, (L^{\wedge}(u,v))_{nl}\}\big|_{(u,v)=(u_j,v_j)} = v_j \sum_{prs} \left[ \frac{\partial (L^{\wedge}(u,v))_{nk}}{\partial (L(u,v))_{pr}}\,\frac{\partial (L^{\wedge}(u,v))_{nl}}{\partial (L(u,v))_{rs}} - \frac{\partial (L^{\wedge}(u,v))_{nl}}{\partial (L(u,v))_{pr}}\,\frac{\partial (L^{\wedge}(u,v))_{nk}}{\partial (L(u,v))_{rs}} \right] \frac{\partial (L(u,v))_{ps}}{\partial u}\Bigg|_{(u,v)=(u_j,v_j)}.$$
On the other hand the determinant of $M_{j;kl}$ can be evaluated making use of its definition (4.28) and expressing the derivatives with respect to $u$ and $v$ in terms of those with respect to $(L(u, v))_{pq}$. Then we have the following formula for the left-hand side of (4.31):
$$-v_j\,\det M_{j;kl} = v_j \sum_{prs} \left[ \frac{\partial (L^{\wedge}(u,v))_{nk}}{\partial (L(u,v))_{ps}}\,\frac{\partial (L^{\wedge}(u,v))_{nl}}{\partial (L(u,v))_{rr}} - \frac{\partial (L^{\wedge}(u,v))_{nl}}{\partial (L(u,v))_{ps}}\,\frac{\partial (L^{\wedge}(u,v))_{nk}}{\partial (L(u,v))_{rr}} \right] \frac{\partial (L(u,v))_{ps}}{\partial u}\Bigg|_{(u,v)=(u_j,v_j)}. \tag{4.32}$$
Straightforward calculation, using (4.18) and the fact that the matrix $L^{\wedge}(u_j, v_j)$ has rank 1, shows that these two expressions are equal to each other (cf. here the proof of the analogous Theorem 1.3 from [1] establishing the Poisson brackets for the separation variables for the $sl(n)$ Gaudin magnet obeying the simplest linear $r$-matrix algebra with the rational $r$-matrix).

Theorem 3. The variables $(u_j, y_j := \log(v_j))$, $j = 1, \dots, n - 1$, together with the variables $(X, P)$ describing the “motion of the center-of-mass”,
$$X := x_n, \qquad P := \log(H_n) = \sum_{j=1}^{n} p_j, \tag{4.33}$$
constitute the complete canonical set of new (separation) variables.

Proof. The bracket $\{P, X\} = 1$ is easily seen, so, in addition to the statements of Theorem 2, it only remains to check that
$$\{P, u_j\} = \{P, v_j\} = 0, \qquad j = 1, \dots, n - 1, \tag{4.34}$$
$$\{X, u_j\} = \{X, v_j\} = 0, \qquad j = 1, \dots, n - 1. \tag{4.35}$$
The equalities (4.34) are trivial since $(u_j, v_j)$, $j = 1, \dots, n - 1$, are defined by the Eqs. $(L(u) - v\cdot 1)^{\wedge}_{nk} = 0$, $k = 1, \dots, n$, and the entries of the matrix $L(u)$ depend only on differences $x_i - x_j$, therefore
$$\{P, (L(u))_{ij}\} = 0, \qquad \forall i, j = 1, \dots, n.$$
For the brackets in (4.35) we have the following expression ($k \neq l$):
$$M_{j;kl} \begin{pmatrix} \{X, u_j\} \\ \{X, v_j\} \end{pmatrix} = - \begin{pmatrix} \{X, (L^{\wedge}(u,v))_{nk}\} \\ \{X, (L^{\wedge}(u,v))_{nl}\} \end{pmatrix}\Bigg|_{(u,v)=(u_j,v_j)}. \tag{4.36}$$
The vector on the right of (4.36) is equal to zero since, $\forall k = 1, \dots, n$,
$$\{X, (L^{\wedge}(u,v))_{nk}\}\big|_{u=u_j,\,v=v_j} = - \sum_{pq} \left[ \frac{ L^{\wedge}_{np} L^{\wedge}_{qk} - L^{\wedge}_{nk} L^{\wedge}_{qp} }{ \det(L(u,v)) } \right]_{u=u_j,\,v=v_j} \delta_{np}\,(L_{pq} + v\,\delta_{pq}) = 0.$$
The equalities (4.35) follow because the matrix $M_{j;kl}$ is nondegenerate.
The proved SoV for the $A_{n-1}$ ($n$-particle) problem with the standard normalisation vector $\vec\alpha_0 \equiv (0, 0, \dots, 0, 1)$ actually implies another SoV for the $A_{n-2}$ problem with the non-standard normalisation vector $\vec\alpha_1$:
$$\vec\alpha_1 := \big( \Phi_u(\xi - x_1 + \lambda), \dots, \Phi_u(\xi - x_{n-1} + \lambda) \big), \tag{4.37}$$
if we choose $\xi = x_n$. Let us demonstrate this explicitly. Let us take the Lax matrix (2.11) for the $n$-particle system
$$L(u) = \sum_{i,j=1}^{n} h_i\,\Phi_u(x_i - x_j + \lambda)\,E_{ij}.$$
If we remove the last ($n$th) row and the last column from this Lax matrix then we get the following $(n - 1) \times (n - 1)$ matrix:
$$L^{\times n}_{\times n}(u) := \begin{pmatrix} h_1\,\Phi_u(\lambda) & \cdots & h_1\,\Phi_u(x_1 - x_{n-1} + \lambda) \\ \vdots & \ddots & \vdots \\ h_{n-1}\,\Phi_u(x_{n-1} - x_1 + \lambda) & \cdots & h_{n-1}\,\Phi_u(\lambda) \end{pmatrix}, \tag{4.38}$$
which is the Lax matrix for the integrable system with $n - 1$ particles with the Hamiltonian
$$H_1^{(\times n)} = \Phi_u(\lambda) \sum_{i=1}^{n-1} h_i = \Phi_u(\lambda) \sum_{i=1}^{n-1} e^{p_i} \prod_{k\neq i}^{n} \frac{\sigma(x_i - x_k - \lambda)}{\sigma(x_i - x_k)}. \tag{4.39}$$
Under the simple canonical transformation,
$$e^{p_i} \to e^{p_i}\,\frac{\sigma(x_i - x_n)}{\sigma(x_i - x_n - \lambda)}, \qquad x_i \to x_i, \qquad i = 1, \dots, n - 1, \tag{4.40}$$
the system (4.39) turns into Ruijsenaars' system with $n - 1$ particles. This system, which has one degree of freedom fewer, obviously inherits the non-standard SoV with the dynamical normalisation (4.37) from the standard one (with $\vec\alpha_0$) for the system with $n$ degrees of freedom. Indeed, to see this, it is sufficient to note that the separation variables $(u_j, v_j)$, $j = 1, \dots, n - 1$, for both systems are defined from the intersection of two spectral curves:
$$\det(L(u) - v\cdot 1) = 0, \qquad \det(L^{\times n}_{\times n}(u) - v\cdot 1) = 0. \tag{4.41}$$
In other words, the condition of the standard SoV for the first problem,
$$\operatorname{rank}\begin{pmatrix} \vec\alpha_0 \\ L(u) - v\cdot 1 \end{pmatrix} = n - 1, \tag{4.42}$$
implies the following condition of SoV for the second problem:
$$\operatorname{rank}\begin{pmatrix} \vec\alpha_1 \\ L^{\times n}_{\times n}(u) - v\cdot 1 \end{pmatrix} = n - 2, \tag{4.43}$$
where $\vec\alpha_1(u)$ is given by (4.37). The procedure shown above, connecting the standard normalisation vector $\vec\alpha_0$ and the alternative one, $\vec\alpha_1$, obviously reflects an embedding, $gl(n - 1) \subset gl(n)$, of one problem into the other. In other words (and this is true in general, for any integrable system of $A_n$ type), one always has a free choice: to include or not to include the “center-of-mass variable”, $X$, and its conjugate, $P$, in the complete set of separation variables.

5. Generating Functions

In this section we derive the explicit formulas for SoV in the simplest cases: $n = 3$ with the standard normalisation (3.6) of $\vec\alpha$, and $n = 2$ with the dynamical normalisation (4.37) (we skip the trivial case of the purely coordinate SoV $x_{1,2} \to x_1 \pm x_2$ for the 2-particle problem). Since both cases are treated in very much the same manner as their trigonometric prototypes, see, respectively, [18] and [19], we present only the main formulas here, omitting the details of the calculations.

Let us start with the $n = 3$ case. Following [18] define two functions $A_1(u)$ and $A_2(u)$ by the formulas
$$(L(u) - A_k)^{\wedge}_{3,3-k} = 0, \qquad k = 1, 2, \tag{5.1}$$
or explicitly,
$$A_k(u) = L_{kk} - \frac{L_{3k}\,L_{k,3-k}}{L_{3,3-k}} = e^{p_k}\,a_k(u), \qquad k = 1, 2, \tag{5.2}$$
$$a_k(u) = \frac{ \sigma(u + 2\lambda + x_3 - x_{3-k})\,\sigma(x_k - x_{3-k} - \lambda) }{ \sigma(\lambda)\,\sigma(u + \lambda + x_3 - x_{3-k})\,\sigma(x_k - x_{3-k} + \lambda) }, \qquad k = 1, 2. \tag{5.3}$$
The separated variables $u_j$ are defined from the equation
$$A_1(u_j) = A_2(u_j), \tag{5.4}$$
which is equivalent to the equation $B(u_j) = 0$ since $B(u) = h_3^2\,\Phi_u(x_3 - x_1 + \lambda)\,\Phi_u(x_3 - x_2 + \lambda)\,\sigma(\lambda)\,(A_2 - A_1)$, and has two roots $u_{1,2} \in \mathcal{D}$. From the easily verified invariance of the ratio $a_1(u)/a_2(u)$ under the transformation $u \mapsto x_1 + x_2 - 2x_3 - 3\lambda - u$ it follows that
$$u_1 + u_2 \equiv x_1 + x_2 - 2x_3 - 3\lambda \pmod{\Gamma}, \tag{5.5}$$
which agrees with (4.12). The conjugated variables $v_j \equiv e^{y_j}$ are defined as
$$v_j = A_1(u_j) = A_2(u_j) \tag{5.6}$$
or, equivalently, through four equations
$$v_j = e^{p_k}\,a_k(u_j), \qquad j, k \in \{1, 2\}, \tag{5.7}$$
for four variables $u_1$, $u_2$, $v_1$, $v_2$. By virtue of Theorem 3 the variables $(u_1, u_2, X; y_1, y_2, P)$ are canonical. The generating function of the separating canonical transformation $M$ is most conveniently expressed in terms of another set of canonical variables
$$x_+ = x_1 + x_2 - 2x_3, \qquad x_- = x_1 - x_2, \qquad X = x_3, \tag{5.8}$$
$$p_\pm = \tfrac{1}{2}\,(p_1 \pm p_2), \qquad P = p_1 + p_2 + p_3, \tag{5.9}$$
$$u_\pm = u_1 \pm u_2, \qquad y_\pm = \tfrac{1}{2}\,(y_1 \pm y_2). \tag{5.10}$$
We shall need a σ-generalisation of the Euler dilogarithm function,
$$\mathrm{Li}_2(z) = \int_0^z \log(\sin(\zeta))\,d\zeta,$$
which we define as
$$S(z) := \int_0^z \log(\sigma(\zeta))\,d\zeta. \tag{5.11}$$
Notice that this function was introduced in [22] and has been used to construct the Lagrangian function of the integrable map which is a time-discretisation of the Ruijsenaars system. Using the product expansion for the Weierstrass sigma-function ($q = \exp(i\pi\omega_2/\omega_1)$):
$$\sigma(z) = \frac{2\omega_1}{\pi}\,\exp\!\Big( \frac{\eta_1 z^2}{2\omega_1} \Big)\,\sin\!\Big( \frac{\pi z}{2\omega_1} \Big) \prod_{n=1}^{\infty} \frac{ 1 - 2q^{2n}\cos\!\big( \frac{\pi z}{\omega_1} \big) + q^{4n} }{ (1 - q^{2n})^2 }, \tag{5.12}$$
cf. [33], we can express the function $S$ in terms of the following function
$$\mathrm{Li}_3(z; q) := \sum_{k=1}^{\infty} \frac{z^k}{(1 - q^k)\,k^2}, \qquad |q| < 1, \quad |z| < 1. \tag{5.13}$$
Notice that similar, but different from (5.13), $q$-deformations of the Euler (di-) trilogarithm have been proposed in the review article [11]. In terms of (5.13) we obtain
$$S(z) = \frac{\eta_1 z^3}{6\omega_1} + \left( \log\frac{2\omega_1}{\pi} + 2\sum_{k=1}^{\infty} \frac{q^{2k}}{(1 - q^{2k})\,k} \right) z + \frac{i\omega_1}{\pi}\Big( \mathrm{Li}_3(q^2 t; q^2) - \mathrm{Li}_3(q^2 t^{-1}; q^2) \Big), \tag{5.14}$$
where $t = \exp(\pi i z/\omega_1)$. This series representation converges for $|q|^2 \le |t| \le |q|^{-2}$. Let
$$\mathcal{L}(\nu; x, y) := S(\nu + x + y) + S(\nu - x + y) + S(\nu + x - y) + S(\nu - x - y). \tag{5.15}$$
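The generating function $F(y_+, x_+; u_-, x_-)$ of the canonical transformation from $(x_\pm, p_\pm)$ to $(u_\pm, y_\pm)$, satisfying the defining relations
$$\frac{\partial F}{\partial y_+} = u_+, \qquad \frac{\partial F}{\partial x_+} = p_+, \qquad \frac{\partial F}{\partial x_-} = p_-, \qquad \frac{\partial F}{\partial u_-} = -y_-, \tag{5.16}$$
is given then by the expression
$$F = y_+\,(x_+ - 3\lambda) + x_+ \log\sigma(\lambda) - \mathcal{L}\Big( \frac{\lambda}{2}; \frac{x_-}{2}, \frac{u_-}{2} \Big) + S(\lambda - x_-) + S(\lambda + x_-). \tag{5.17}$$
The case $n = 2$ with the normalisation $\vec\alpha_1 = \big( \Phi_u(\xi - x_1 + \lambda), \Phi_u(\xi - x_2 + \lambda) \big)$ (cf. (4.37)) is treated similarly to its trigonometric prototype [19]. Having introduced the functions $A_1(u)$ and $A_2(u)$ by the formulas
$$\big( \vec\alpha_1\cdot(L(u) - A_k)^{\wedge} \big)_{3-k} = 0, \qquad k = 1, 2, \tag{5.18}$$
or explicitly,
$$A_1 = L_{11} - \frac{\Phi_u(\xi - x_1 + \lambda)}{\Phi_u(\xi - x_2 + \lambda)}\,L_{12} = e^{p_1}\,a_1(u), \tag{5.19}$$
$$A_2 = L_{22} - \frac{\Phi_u(\xi - x_2 + \lambda)}{\Phi_u(\xi - x_1 + \lambda)}\,L_{21} = e^{p_2}\,a_2(u), \tag{5.20}$$
$$a_k(u) = \frac{ \sigma(u + \xi + 2\lambda - x_{3-k})\,\sigma(\xi - x_k)\,\sigma(x_k - x_{3-k} - \lambda) }{ \sigma(\lambda)\,\sigma(u + \xi + \lambda - x_{3-k})\,\sigma(\xi + \lambda - x_k)\,\sigma(x_k - x_{3-k} + \lambda) }, \qquad k = 1, 2, \tag{5.21}$$
one proceeds as above with the only difference that the relation (5.5) is replaced by
$$u_1 + u_2 \equiv x_1 + x_2 - 3\lambda - 2\xi \pmod{\Gamma} \tag{5.22}$$
and the variables $x_\pm$ are defined now as $x_\pm = x_1 \pm x_2$. The resulting expression for $F(y_+, x_+; u_-, x_-)$ is
$$F = y_+\,(x_+ - 3\lambda - 2\xi) + x_+ \log\sigma(\lambda) - \mathcal{L}\Big( \frac{\lambda}{2}; \frac{x_-}{2}, \frac{u_-}{2} \Big) - \mathcal{L}\Big( \frac{\lambda}{2}; \frac{x_+ - \lambda}{2} - \xi, \frac{x_-}{2} \Big) + S(\lambda - x_-) + S(\lambda + x_-). \tag{5.23}$$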
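The relation $\partial F/\partial x_+ = p_+$ in (5.16) can be checked by hand for the $n = 3$ function (5.17). The sketch below uses only (5.3), (5.5), (5.7) and the oddness of σ; it takes the representative $u_1 + u_2 = x_1 + x_2 - 2x_3 - 3\lambda$ of (5.5), and the shorthand $w := u_1 + \lambda + x_3 - x_2$ is introduced here for convenience and is not part of the original notation.

```latex
\begin{align*}
&u_2 + 2\lambda + x_3 - x_1 = -w, \qquad u_2 + \lambda + x_3 - x_1 = -(w + \lambda),\\[1mm]
&a_1(u_1)\,a_2(u_2)
 = \frac{1}{\sigma(\lambda)^2}\cdot
   \frac{\sigma(w+\lambda)}{\sigma(w)}\cdot\frac{\sigma(-w)}{\sigma(-w-\lambda)}\cdot
   \frac{\sigma(x_- - \lambda)\,\sigma(-x_- - \lambda)}{\sigma(x_- + \lambda)\,\sigma(-x_- + \lambda)}
 = \frac{1}{\sigma(\lambda)^2},\\[1mm]
&y_1 + y_2 = p_1 + p_2 + \log\big(a_1(u_1)\,a_2(u_2)\big)
 \quad\Longrightarrow\quad y_+ = p_+ - \log\sigma(\lambda).
\end{align*}
```

Here the third line uses (5.7) with $(j, k) = (1, 1)$ and $(2, 2)$. Since $x_+$ enters (5.17) only through the first two terms, $\partial F/\partial x_+ = y_+ + \log\sigma(\lambda) = p_+$, and likewise $\partial F/\partial y_+ = x_+ - 3\lambda = u_+$ reproduces (5.5).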
6. Nonrelativistic Limit to the Calogero-Moser System

The nonrelativistic limit is obtained by letting $\lambda \to 0$ while rescaling the momenta $p_j := i\lambda p_j/g$, $g \in \mathbb{R}$, and making the canonical transformation $p_j := p_j - ig \sum_{k\neq j} \zeta(x_j - x_k)$ such that $h_j \to 1 + i\lambda\,p_j/g + O(\lambda^2)$ in (2.11). The $(r, s)$-matrix structure is linear in that limit since the $L$-matrix behaves as
$$L(u) \to \big( \lambda^{-1} + \zeta(u) \big)\cdot 1 + \frac{i}{g}\,\ell(u) + O(\lambda), \tag{6.1}$$
$$\ell(u) := \sum_{j} p_j\,E_{jj} - ig \sum_{j\neq k} \Phi_u(x_j - x_k)\,E_{jk}. \tag{6.2}$$
The $\ell$-matrix (6.2) is Krichever's [12] Lax operator for the elliptic Calogero-Moser system with the Hamiltonian
$$H = \sum_{j=1}^{n} p_j^2 + g^2 \sum_{j\neq k} \wp(x_j - x_k). \tag{6.3}$$
Proposition 5 ([30]). The Lax matrix $\ell(u)$ (6.2) of the elliptic Calogero-Moser system satisfies the linear $(r, s)$-algebra of the form
$$\{\ell_1(u), \ell_2(\tilde u)\} = [\ell_1, r] + [\ell_2, s], \tag{6.4}$$
where
$$r = a + c, \qquad s = a - b, \qquad s = -P\,r\,P\big|_{u\leftrightarrow\tilde u}, \tag{6.5}$$
(see (4.22), (4.23), (4.24)), and $[\,\cdot\,,\,\cdot\,]$ means the matrix commutator.

The SoV for the elliptic Calogero-Moser system follows, in principle, by taking the limit $\lambda \to 0$ in the corresponding formulas describing SoV for the Ruijsenaars system. However, because this limit is not so simple and straightforward, we prefer to do it independently, repeating the steps used to prove the main statements for the Ruijsenaars system in Sect. 4. The normalisation vector is the same: $\vec\alpha(u) = \vec\alpha_0 \equiv (0, 0, \dots, 0, 1)$. We have now the following characteristic equations for the separation variables $u = u_j$ and $v = v_j$:
$$(\ell(u) - v\cdot 1)^{\wedge}_{nk} = 0, \qquad k = 1, \dots, n. \tag{6.6}$$
The zeros of the σ-polynomial $b(u)$,
$$b(u) := \det\begin{pmatrix} 0 & \dots & 1 \\ \ell_{n1} & \dots & \ell_{nn} \\ \vdots & \ddots & \vdots \\ (\ell^{n-1})_{n1} & \dots & (\ell^{n-1})_{nn} \end{pmatrix}, \tag{6.7}$$
give us the separation variables $u_j$.

Theorem 4. The σ-polynomial $b(u)$ (6.7) has $n - 1$ zeros $u_j \in \mathcal{D}$ and can be represented by the formula
$$b(u) = \tilde C \prod_{j=1}^{n-1} \Phi_u(-u_j), \tag{6.8}$$
where $\tilde C$ does not depend on the spectral parameter $u$. The variables $u_j$ obey the restriction
$$\sum_{j=1}^{n-1} u_j \equiv \sum_{j=1}^{n-1} (x_j - x_n) \pmod{\Gamma}. \tag{6.9}$$
Proof. From the limit (6.1) and the definitions of $B(u)$ and $b(u)$ we conclude that
$$B(u) = b(u) + O(\lambda).$$
Both $B(u)$ and $b(u)$ are σ-polynomials in $u$ and, since the degree of such a polynomial must not change with the analytical continuation of the parameter $\lambda$, $b(u)$ has the same degree as $B(u)$ does. Moreover, now the separation variables have to obey the restriction (6.9), which is the limit of the corresponding relation (4.12).

Let us introduce the following notations:
$$\ell(u, v) := \ell(u) - v\cdot 1, \qquad \ell^{\wedge}(u, v) := (\ell(u) - v\cdot 1)^{\wedge}, \tag{6.10}$$
and also $\Delta_1 = \det(\ell(u, v))$ and $\Delta_2 = \det(\ell(\tilde u, \tilde v))$. Suppose now that a Lax matrix $\ell(u)$ satisfies the linear (dynamical) $(r, s)$-bracket (6.4); then we have the following statement.

Lemma 6. Let a Lax matrix $\ell(u)$ satisfy the linear $(r, s)$-bracket of the form
$$\{\ell_1(u), \ell_2(\tilde u)\} = [\ell_1, r] + [\ell_2, s], \qquad s = -P\,r\,P\big|_{u\leftrightarrow\tilde u}. \tag{6.11}$$
Then the matrix $\ell^{\wedge}(u, v) \equiv (\ell(u) - v\cdot 1)^{\wedge}$ obeys the bracket of the form
$$\begin{aligned} \{\ell_1^{\wedge}, \ell_2^{\wedge}\} ={}& \Delta_1^{-1}\Big[ \big( \ell_1^{\wedge} s - \operatorname{tr}_1[ \ell_1^{\wedge} s ] \big)\,\ell_1^{\wedge}\ell_2^{\wedge} - \ell_1^{\wedge}\ell_2^{\wedge}\,\big( s\,\ell_1^{\wedge} - \operatorname{tr}_1[ s\,\ell_1^{\wedge} ] \big) \Big] \\ &+ \Delta_2^{-1}\Big[ \big( \ell_2^{\wedge} r - \operatorname{tr}_2[ \ell_2^{\wedge} r ] \big)\,\ell_1^{\wedge}\ell_2^{\wedge} - \ell_1^{\wedge}\ell_2^{\wedge}\,\big( r\,\ell_2^{\wedge} - \operatorname{tr}_2[ r\,\ell_2^{\wedge} ] \big) \Big]. \end{aligned} \tag{6.12}$$

Theorem 5. The separation variables $(u_j, v_j)$, $j = 1, \dots, n - 1$, for the elliptic Calogero-Moser system, defined by the system of Eqs. (6.6), possess the following Poisson brackets:
(i) $\{u_i, u_j\} = \{u_i, v_j\} = \{v_i, v_j\} = 0$, $i \neq j$,
(ii) $\{v_j, u_j\} = 1$.
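Proof. In analogy with the proof of Theorem 2 we have to show first that
$$\{(\ell^{\wedge}(u,v))_{nk}, (\ell^{\wedge}(\tilde u,\tilde v))_{nl}\}\big|_{A_{ij}} = 0, \qquad \forall k, l = 1, \dots, n,$$
when $i \neq j$. We have from Lemma 6 and Proposition 5 that
$$\begin{aligned} &\{(\ell^{\wedge}(u,v))_{nk}, (\ell^{\wedge}(\tilde u,\tilde v))_{nl}\}\big|_{A_{ij}} \qquad (i \neq j) \\ &\quad= \Big[ \Delta_1^{-1}\big( \ell_1^{\wedge} s - \operatorname{tr}_1[ \ell_1^{\wedge} s ] \big)\,\ell_1^{\wedge}\ell_2^{\wedge} + \Delta_2^{-1}\big( \ell_2^{\wedge} r - \operatorname{tr}_2[ \ell_2^{\wedge} r ] \big)\,\ell_1^{\wedge}\ell_2^{\wedge} \Big]\Big|_{A_{ij}} \\ &\quad= \sum_{pqr} \left[ \frac{ (\ell^{\wedge}(u,v))_{np}\,(\ell^{\wedge}(u,v))_{qk} - (\ell^{\wedge}(u,v))_{nk}\,(\ell^{\wedge}(u,v))_{qp} }{ \det(\ell(u,v)) }\,(a - b)_{pq,nr} \right]_{A_{ij}} (\ell^{\wedge}(u_j,v_j))_{rl} \\ &\qquad+ \sum_{pqr} \left[ \frac{ (\ell^{\wedge}(\tilde u,\tilde v))_{np}\,(\ell^{\wedge}(\tilde u,\tilde v))_{ql} - (\ell^{\wedge}(\tilde u,\tilde v))_{nl}\,(\ell^{\wedge}(\tilde u,\tilde v))_{qp} }{ \det(\ell(\tilde u,\tilde v)) }\,(a + c)_{nr,pq} \right]_{A_{ij}} (\ell^{\wedge}(u_i,v_i))_{rk}. \end{aligned} \tag{6.13}$$
These two terms on the right-hand side have the same form as the latter two in (4.29) and, again, they are equal to zero since in both expressions the simple zero in the denominator is cancelled by a double zero in the numerator. The matrix of derivatives $M$, instead of (4.28), has now the form
$$M_{m;kl} := \begin{pmatrix} \dfrac{\partial (\ell^{\wedge}(u,v))_{nk}}{\partial u} & \dfrac{\partial (\ell^{\wedge}(u,v))_{nk}}{\partial v} \\[2mm] \dfrac{\partial (\ell^{\wedge}(u,v))_{nl}}{\partial u} & \dfrac{\partial (\ell^{\wedge}(u,v))_{nl}}{\partial v} \end{pmatrix}\Bigg|_{(u,v)=(u_m,v_m)}. \tag{6.14}$$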
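In order to prove the statement (ii) we have to show that
$$-\det M_{j;kl} = \{(\ell^{\wedge}(u,v))_{nk}, (\ell^{\wedge}(u,v))_{nl}\}\big|_{(u,v)=(u_j,v_j)}, \tag{6.15}$$
where $k \neq l$. Again, the right-hand side of (6.15) can be evaluated by first taking the limit $\tilde u \to u$ in the $(r, s)$-bracket of Proposition 5 and then using the derivation property of the bracket. We derive the following expression:
$$\{(\ell^{\wedge}(u,v))_{nk}, (\ell^{\wedge}(u,v))_{nl}\}\big|_{(u,v)=(u_j,v_j)} = \sum_{prs} \left[ \frac{\partial (\ell^{\wedge}(u,v))_{nk}}{\partial (\ell(u,v))_{pr}}\,\frac{\partial (\ell^{\wedge}(u,v))_{nl}}{\partial (\ell(u,v))_{rs}} - \frac{\partial (\ell^{\wedge}(u,v))_{nl}}{\partial (\ell(u,v))_{pr}}\,\frac{\partial (\ell^{\wedge}(u,v))_{nk}}{\partial (\ell(u,v))_{rs}} \right] \frac{\partial (\ell(u,v))_{ps}}{\partial u}\Bigg|_{(u,v)=(u_j,v_j)}.$$
On the other hand the determinant of $M_{j;kl}$ (cf. (4.32)) has the form
$$-\det M_{j;kl} = \sum_{prs} \left[ \frac{\partial (\ell^{\wedge}(u,v))_{nk}}{\partial (\ell(u,v))_{ps}}\,\frac{\partial (\ell^{\wedge}(u,v))_{nl}}{\partial (\ell(u,v))_{rr}} - \frac{\partial (\ell^{\wedge}(u,v))_{nl}}{\partial (\ell(u,v))_{ps}}\,\frac{\partial (\ell^{\wedge}(u,v))_{nk}}{\partial (\ell(u,v))_{rr}} \right] \frac{\partial (\ell(u,v))_{ps}}{\partial u}\Bigg|_{(u,v)=(u_j,v_j)}. \tag{6.16}$$
These two expressions are equal to each other for the reasons pointed out at the end of the proof of Theorem 2.

Theorem 6. The variables $(u_j, v_j \equiv y_j)$, $j = 1, \dots, n - 1$, together with the variables $(X, P)$ describing the motion of the center-of-mass,
$$X := x_n, \qquad P := \operatorname{tr}\,\ell(u) = \sum_{j=1}^{n} p_j, \tag{6.17}$$
constitute the complete canonical set of new (separation) variables.

Proof. The proof repeats that of Theorem 3.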
Consider now the nonrelativistic limit of the generating functions $F$ (5.17) and (5.23) in the two simplest cases. In analogy with the calculations in the previous section, for the case $n = 3$ let us define two functions $A_1(u)$ and $A_2(u)$ by the formulas (5.1), or explicitly,
$$A_k(u) = L_{kk} - \frac{L_{3k}\,L_{k,3-k}}{L_{3,3-k}} = p_k + ig\,a_k(u), \qquad k = 1, 2,$$
$$a_k(u) = \zeta(u) + \zeta(x_3 - x_k) + \zeta(x_k - x_{3-k}) - \zeta(u + x_3 - x_{3-k}), \qquad k = 1, 2. \tag{6.18}$$
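The passage from the matrix entries to the ζ-functions in (6.18) rests on a product identity for $\Phi$. The sketch below checks it exactly in the rational degeneration $\sigma(x) \to x$ mentioned in the concluding remarks, under the assumption (not restated in this section) that $\Phi_u(x) = \sigma(u + x)/(\sigma(u)\,\sigma(x))$, so that $\Phi_u(x) \to (u + x)/(ux)$ and $\zeta(x) \to 1/x$. With the entries of (6.2), $\ell_{kk} = p_k$ and $\ell_{jk} = -ig\,\Phi_u(x_j - x_k)$, the statement $A_k = p_k + ig\,a_k(u)$ reduces to the scalar identity tested here. The sample values and function names are hypothetical.

```python
from fractions import Fraction


def phi(u, x):
    """Rational degeneration of Phi_u(x) = sigma(u + x) / (sigma(u) sigma(x)) (assumed form)."""
    return (u + x) / (u * x)


def zeta(x):
    """Rational degeneration of the Weierstrass zeta-function."""
    return 1 / x


def a_rational(k, u, x):
    """a_k(u) from (6.18) with zeta(x) -> 1/x; x = (x1, x2, x3), k in {1, 2}."""
    xk, xo, x3 = x[k - 1], x[2 - k], x[2]
    return zeta(u) + zeta(x3 - xk) + zeta(xk - xo) - zeta(u + x3 - xo)


def lax_combination(k, u, x):
    """Phi_u(x3 - xk) Phi_u(xk - x_{3-k}) / Phi_u(x3 - x_{3-k}), i.e. (A_k - p_k) / (i g)."""
    xk, xo, x3 = x[k - 1], x[2 - k], x[2]
    return phi(u, x3 - xk) * phi(u, xk - xo) / phi(u, x3 - xo)


# Hypothetical sample values, exact rational arithmetic.
x = (Fraction(1), Fraction(3), Fraction(7))
u = Fraction(2, 5)
print(all(lax_combination(k, u, x) == a_rational(k, u, x) for k in (1, 2)))
```

The same identity, $\Phi_u(x)\,\Phi_u(y) = \Phi_u(x + y)\,[\zeta(x) + \zeta(y) + \zeta(u) - \zeta(x + y + u)]$, is what one would use in the elliptic case; only its rational limit is verified by this sketch.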
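The ±-variables are defined by (5.8)–(5.10) and we have the restriction
$$u_+ \equiv x_+ \pmod{\Gamma}. \tag{6.19}$$
The generating function $F(y_+, x_+; u_-, x_-)$ (cf. formula (7.12) in [31]) is then given by the expression
$$F = y_+ x_+ + ig \log\frac{ \sigma\big( \frac{x_- + u_-}{2} \big)\,\sigma\big( \frac{x_- - u_-}{2} \big)\,\sigma\big( \frac{x_+ + x_-}{2} \big)\,\sigma\big( \frac{x_+ - x_-}{2} \big) }{ \sigma\big( \frac{x_+ + u_-}{2} \big)\,\sigma\big( \frac{x_+ - u_-}{2} \big)\,\sigma(x_-) }. \tag{6.20}$$
Similarly, in the case $n = 2$, the normalisation vector is taken as follows: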
$$\vec\alpha_1 = \big( \Phi_u(\xi - x_1), \Phi_u(\xi - x_2) \big).$$
Introduce the functions $A_1(u)$ and $A_2(u)$ by the formulas (5.18), or explicitly,
$$A_1 = L_{11} - \frac{\Phi_u(\xi - x_1)}{\Phi_u(\xi - x_2)}\,L_{12} = p_1 + ig\,a_1(u), \tag{6.21}$$
$$A_2 = L_{22} - \frac{\Phi_u(\xi - x_2)}{\Phi_u(\xi - x_1)}\,L_{21} = p_2 + ig\,a_2(u), \tag{6.22}$$
$$a_k(u) = \zeta(u) + \zeta(\xi - x_k) + \zeta(x_k - x_{3-k}) - \zeta(u + \xi - x_{3-k}), \qquad k = 1, 2.$$
The variables $x_\pm$ are defined in this case as $x_\pm = x_1 \pm x_2$ and we have the restriction
$$u_+ \equiv x_+ - 2\xi \pmod{\Gamma}. \tag{6.23}$$
The generating function $F(y_+, x_+; u_-, x_-)$ has the following form:
$$F = y_+ x_+ + ig \log\frac{ \sigma\big( \frac{x_- + u_-}{2} \big)\,\sigma\big( \frac{x_- - u_-}{2} \big)\,\sigma\big( \frac{x_+ + x_- - 2\xi}{2} \big)\,\sigma\big( \frac{x_+ - x_- - 2\xi}{2} \big) }{ \sigma\big( \frac{x_+ + u_- - 2\xi}{2} \big)\,\sigma\big( \frac{x_+ - u_- - 2\xi}{2} \big)\,\sigma(x_-) }. \tag{6.24}$$

7. Concluding Remarks

We have performed the separation of variables for the classical $n$-particle Ruijsenaars system. If we replace the σ-function $\sigma(x)$ in all the above formulas by $\sin(x)$ ($\sinh(x)$) or by the identity function, $x \to x$, then all the above statements remain valid for the trigonometric (hyperbolic) or rational Ruijsenaars system, respectively.

We have found the explicit generating function $F(u|x)$ of the separating canonical transform in the cases of two and three particles. It is a challenging problem to obtain such a function for $n > 3$ in any explicit form. Another problem for further study of this integrable system is to produce a quantum SoV, i.e. to find the corresponding kernel $M_\hbar(u|x)$ of the quantum separating integral operator $M_\hbar$ and the related integral representation for eigenfunctions of the quantum integrals of motion $H_j$ (cf. [31, 17, 18, 19, 20]).

Acknowledgement. VBK and FWN wish to acknowledge the support of EPSRC.
References
1. Adams, M.R., Harnad, J. and Hurtubise, J.: Darboux coordinates and Liouville-Arnold integration in loop algebras. Commun. Math. Phys. 155, 385–413 (1993)
2. Arnol'd, V.I.: Mathematical methods of classical mechanics. New York–Heidelberg–Berlin: Springer, 1974
3. Dubrovin, B.A.: Theta functions and nonlinear equations. Russ. Math. Surv. 36, 11–92 (1981)
4. Eilbeck, J.C., Enol'skii, V.Z., Kuznetsov, V.B. and Tsiganov, A.V.: Linear r-matrix algebra for classical separable systems. J. Phys. A: Math. Gen. 27, 567–578 (1994)
5. Erdelyi, A. et al.: Higher transcendental functions, Volume 3. New York: McGraw Hill, 1953
6. Frobenius, G.: Über die elliptischen Funktionen zweiter Art. J. Reine Angew. Math. 93, 53–68 (1882)
7. Gekhtman, M.I.: Separation of variables in the classical SL(N) magnetic chain. Commun. Math. Phys. 167, 593–605 (1995)
8. Kalnins, E.G.: Separation of variables for Riemannian spaces of constant curvature. Pitman Monographs and Surveys in Pure and Applied Mathematics 28, Essex, England: Longman Scientific and Technical, 1986
9. Kalnins, E.G., Kuznetsov, V.B. and Miller Jr., W.: Quadrics on complex Riemannian spaces of constant curvature, separation of variables and the Gaudin magnet. J. Math. Phys. 35, 1710–1731 (1994)
10. Kalnins, E.G., Kuznetsov, V.B. and Miller Jr., W.: Separation of variables and XXZ Gaudin magnet. Rendiconti del Seminario Matematico dell'Università e del Politecnico di Torino 53, 109–120 (1995)
11. Kirillov, A.N.: Dilogarithm identities. Progr. Theor. Phys. Suppl. 118, 61–142 (1995)
12. Krichever, I.M.: Elliptic solutions of the Kadomtsev-Petviashvili equation and integrable systems of particles. Func. Anal. Appl. 14, 282–290 (1980)
13. Krichever, I.M. and Novikov, S.P.: Holomorphic bundles over algebraic curves and nonlinear equations. Russ. Math. Surv. 32, 53–79 (1980)
14. Kuznetsov, V.B.: Quadrics on real Riemannian spaces of constant curvature: separation of variables and connection with Gaudin magnet. J. Math. Phys. 33, 3240–3254 (1992)
15. Kuznetsov, V.B.: Equivalence of two graphical calculi. J. Phys. A: Math. Gen. 25, 6005–6026 (1992)
16. Kuznetsov, V.B.: Separation of variables for the Dn-type periodic Toda lattice. J. Phys. A: Math. Gen. 30, 2127–2138 (1997)
17. Kuznetsov, V.B. and Sklyanin, E.K.: Separation of variables in A2 type Jack polynomials. RIMS Kokyuroku 919, 27–34 (1995)
18. Kuznetsov, V.B. and Sklyanin, E.K.: Separation of variables for the A2 Ruijsenaars model and a new integral representation for the A2 Macdonald polynomials. J. Phys. A: Math. Gen. 29, 2779–2804 (1996)
19. Kuznetsov, V.B. and Sklyanin, E.K.: Separation of variables and integral relations for special functions. October 1996, submitted
20. Kuznetsov, V.B. and Sklyanin, E.K.: Factorisation of Macdonald polynomials. In: Proceedings of the Second Workshop on Symmetries and Integrability of Difference Equations (SIDE II), July 1996, Canterbury, UK, to appear
21. Nijhoff, F.W., Kuznetsov, V.B., Sklyanin, E.K. and Ragnisco, O.: Dynamical r-matrix for the elliptic Ruijsenaars-Schneider system. J. Phys. A: Math. Gen. 29, L333–L340 (1996)
22. Nijhoff, F.W., Ragnisco, O. and Kuznetsov, V.B.: Integrable time-discretization of the Ruijsenaars-Schneider model. Commun. Math. Phys. 176, 681–700 (1996)
23. Ruijsenaars, S.N.M.: Complete integrability of relativistic Calogero-Moser systems and elliptic function identities. Commun. Math. Phys. 110, 191–213 (1987)
24. Scott, D.R.D.: Classical functional Bethe ansatz for SL(N): separation of variables for the magnetic chain. J. Math. Phys. 35, 5831–5843 (1994)
25. Sklyanin, E.K.: Goryachev-Chaplygin top and the inverse scattering method. J. Soviet Math. 31, 3417–3431 (1985)
26. Sklyanin, E.K.: The quantum Toda chain. In: Non-linear equations in classical and quantum field theory. Ed. by N. Sanchez (Lecture Notes in Physics 226), New York: Springer, 1985, pp. 196–233
27. Sklyanin, E.K.: Poisson structure of a periodic classical XYZ chain. J. Soviet Math. 46, 1664–1683 (1989)
28. Sklyanin, E.K.: Separation of variables in the Gaudin model. J. Sov. Math. 47, 2473–2488 (1989)
29. Sklyanin, E.K.: Separation of variables in the classical integrable SL(3) magnetic chain. Commun. Math. Phys. 150, 181–191 (1992)
30. Sklyanin, E.K.: Dynamical r-matrices for the elliptic Calogero-Moser model. St. Petersburg Math. J. 6, 397–406 (1994)
31. Sklyanin, E.K.: Separation of variables. New trends. Progr. Theor. Phys. Suppl. 118, 35–60 (1995)
32. Suris, Yu.B.: Elliptic Ruijsenaars-Schneider and Calogero-Moser hierarchies are governed by the same r-matrix. Preprint, solv-int/9603011
33. Whittaker, E.T. and Watson, G.N.: A course in modern analysis. Cambridge: Cambridge University Press, 4th ed., 1988

Communicated by G. Felder
Commun. Math. Phys. 189, 879 – 890 (1997)
Communications in Mathematical Physics © Springer-Verlag 1997
On the Spectrum of Two-Dimensional Schrödinger Operators with Spherically Symmetric, Radially Periodic Magnetic Fields*

Georg Hoever

Mathematisches Institut, Universität München, Theresienstr. 39, 80333 München, Germany. E-mail: [email protected]

Received: 12 March 1997 / Accepted: 2 April 1997
Abstract: We investigate the spectrum of the two-dimensional Schrödinger operator
\[ H = -\left(\frac{\partial}{\partial x} - i a_1(x,y)\right)^2 - \left(\frac{\partial}{\partial y} - i a_2(x,y)\right)^2 + V(x,y), \]
where the magnetic field $B(x,y) = \frac{\partial}{\partial x} a_2 - \frac{\partial}{\partial y} a_1$ and the electric potential $V$ are spherically symmetric, i.e., $B(x,y) = b(r)$, $r = \sqrt{x^2 + y^2}$, and $b$ is $p$-periodic, similarly for $V$. By considering two different gauges we get the following results: In case $\int_0^p b(s)\,ds = 0$ the spectrum contains a semi-axis that consists alternately of intervals of absolutely continuous and dense point spectrum. In case $\int_0^p b(s)\,ds \neq 0$ the essential spectrum is purely dense point spectrum and possibly there are spectral gaps.

* This research was supported by Deutsche Forschungsgemeinschaft through Graduiertenkolleg "Mathematik im Bereich ihrer Wechselwirkung mit der Physik."

1. Introduction and Results

The fact that the spectrum of a Schrödinger operator with spherically symmetric electric potential (without magnetic field) contains a semi-axis is proved in [HHK], not by separation in spherical coordinates but by separation in rectangular coordinates. In [HHHK] the result is applied in the case of potentials which additionally are periodic with respect to the radius, and combined with results obtained by separation in spherical coordinates to get a more detailed description of the spectrum. Motivated by these methods we will get our results concerning spherically symmetric, radially periodic magnetic fields.

Whereas the electric potential arising in the Schrödinger operator is uniquely determined up to a constant, we can choose very different magnetic potentials which all describe the same magnetic field. In fact we will use different gauges in our examination.

Now we give the results in detail: Let $b : \mathbb{R} \to \mathbb{R}$ be continuous and $p$-periodic ($p > 0$) with piecewise continuous
and bounded derivative a.e., and let $a_1, a_2 \in L^4_{\mathrm{loc}}(\mathbb{R}^2)$, $\frac{\partial}{\partial x} a_1 + \frac{\partial}{\partial y} a_2 \in L^2_{\mathrm{loc}}(\mathbb{R}^2)$ (in distributional sense), satisfying
\[ \frac{\partial}{\partial x} a_2(x,y) - \frac{\partial}{\partial y} a_1(x,y) = b\bigl(\sqrt{x^2 + y^2}\bigr). \]
Let $v : \mathbb{R} \to \mathbb{R}$ be $p$-periodic, piecewise continuous and bounded, and let $V(x,y) = v\bigl(\sqrt{x^2 + y^2}\bigr)$. Let $H$ be the two-dimensional Schrödinger operator
\[ H = -\left(\frac{\partial}{\partial x} - i a_1(x,y)\right)^2 - \left(\frac{\partial}{\partial y} - i a_2(x,y)\right)^2 + V(x,y) \]
acting on $L^2(\mathbb{R}^2)$. Due to [LS, Theorem 3] the operator $H$ is essentially self-adjoint with core $C_0^\infty(\mathbb{R}^2)$ (the set of all infinitely differentiable functions with compact support in $\mathbb{R}^2$). We define
\[ \beta = \int_0^p b(s)\,ds, \qquad d(x) = \int_0^x b(s)\,ds, \qquad \delta = \frac{1}{p}\int_0^p d(s)\,ds \]
and the one-dimensional Schrödinger operator
\[ H_\eta = -\frac{\partial^2}{\partial r^2} + (d(r) - \eta)^2 + v(r) \qquad (\eta \in \mathbb{R}) \]
acting on $L^2(\mathbb{R})$. The structure of the spectrum of $H$ depends on whether $\beta = 0$ or $\beta \neq 0$.

In case $\beta = 0$ obviously the function $d$ (and hence $(d(r) - \eta)^2 + v(r)$) is $p$-periodic. Hence $\sigma(H_\eta)$ consists of bands ([RS, Theorem XIII.90]; the theorem holds for any period, cf. [E]). Let
\[ \mu = \inf_{\eta \in \mathbb{R}} \inf \sigma(H_\eta) \]
and $[\alpha_n, \beta_n]$, $n \in \mathbb{N} = \{1, 2, \ldots\}$, be the spectral bands of $H_\delta$ as defined in [RS, Theorem XIII.90].

Theorem 1. Under the definitions above in case $\beta = 0$ we have:
(a) $[\mu, \infty) \subseteq \sigma(H)$.
(b) $H$ has no continuous spectrum in $(-\infty, \alpha_1)$ and in the gaps $(\beta_n, \alpha_{n+1})$ of $\sigma(H_\delta)$.
(c) Eigenvalues of $H$ are dense in $(\mu, \alpha_1)$ and in the gaps $(\beta_n, \alpha_{n+1})$ of $\sigma(H_\delta)$.
(d) The spectrum of $H$ is purely absolutely continuous in the interior $(\alpha_n, \beta_n)$ of the spectral bands of $H_\delta$.

Remark 1. In general it remains open whether
\[ \mu = \inf_{\eta \in \mathbb{R}} \inf \sigma(H_\eta) < \inf \sigma(H_\delta) = \alpha_1 \]
or not. Below we will give a class of examples where $\mu < \alpha_1$. Then, due to Theorem 1, the lowest part of the essential spectrum is purely dense point spectrum.
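The quantities $\beta$, $d$ and $\delta$ defined above are easy to evaluate numerically. The following Python sketch is an illustration only: the two sample profiles for $b$ and all helper names are assumptions, not taken from the paper. For the first profile $\beta = 0$, so $d$ is $p$-periodic; for the second $\beta \neq 0$ and $d(x + p) = d(x) + \beta$.

```python
import math

def integrate(f, lo, hi, n=2000):
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

def field_data(b, p):
    """Return (beta, d, delta) for a p-periodic field profile b."""
    beta = integrate(b, 0.0, p)                # beta = int_0^p b(s) ds
    d = lambda x: integrate(b, 0.0, x)         # d(x) = int_0^x b(s) ds
    delta = integrate(d, 0.0, p, n=200) / p    # delta = (1/p) int_0^p d(s) ds
    return beta, d, delta

p = 1.0
# Hypothetical sample profiles (not from the paper):
b_zero_flux = lambda s: math.cos(2 * math.pi * s / p)        # beta = 0
b_net_flux = lambda s: 0.5 + math.cos(2 * math.pi * s / p)   # beta = 0.5

beta0, d0, delta0 = field_data(b_zero_flux, p)
beta1, d1, delta1 = field_data(b_net_flux, p)
```

The shift of $d$ by $\beta$ over one period is what separates the two cases treated in Theorems 1 and 2.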
In case $\beta \neq 0$ it is easy to see that $|d(r)| \to \infty$ for $|r| \to \infty$. Thus we have $(d(r) - \eta)^2 + v(r) \to \infty$ ($|r| \to \infty$) and $H_\eta$ has compact resolvent ([RS, Theorem XIII.67]), i.e.,
\[ \sigma(H_\eta) = \{\lambda_1(\eta), \lambda_2(\eta), \ldots\}, \qquad \lambda_1(\eta) \le \lambda_2(\eta) \le \ldots \text{ and } \lambda_n(\eta) \to \infty \ (n \to \infty), \]
where the eigenvalues $\lambda_n$ are counted according to their multiplicity.

Theorem 2. Under the definitions above in case $\beta \neq 0$ we have:
(a) The essential spectrum $\sigma_{\mathrm{ess}}(H)$ of $H$ is purely dense point spectrum, i.e., it contains no continuous spectrum and eigenvalues are dense in $\sigma_{\mathrm{ess}}(H)$.
(b) The maps $\eta \mapsto \lambda_n(\eta)$ are $\beta$-periodic and
\[ \bigcup_{n=1}^{\infty} \lambda_n([0, \beta)) \subseteq \sigma_{\mathrm{ess}}(H). \]

Remark 2. A special example here, which we will examine below, is $b \equiv \mathrm{const} \neq 0$ and $v \equiv 0$, i.e., a homogeneous magnetic field. Then all $\lambda_n$ turn out to be constant, $\lambda_n \equiv (2n - 1)b$ and $\sigma(H) = \sigma_{\mathrm{ess}}(H) = \bigcup_{n=1}^{\infty} \{(2n - 1)b\}$, i.e., the well-known spectral characterisation for the homogeneous case (see, e.g., [I90, Theorem 4.1]).

Essential for the proof is [L, Theorem 1.3]: We may choose an arbitrary $L^4_{\mathrm{loc}}$-gauge with $L^2_{\mathrm{loc}}$-divergence for the magnetic potential without varying the spectrum. We will choose two different gauges (Sects. 2 and 3 below) and in each case we will draw conclusions for $\beta = 0$ and $\beta \neq 0$. In Sect. 4 we analyse the two examples mentioned above.
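The values $(2n - 1)b$ in Remark 2 can be checked by hand for the lowest levels. For $b > 0$ the functions $\psi_1(x) = e^{-bx^2/2}$ and $\psi_2(x) = x e^{-bx^2/2}$ satisfy $-\psi'' + b^2 x^2 \psi = (2n - 1) b \psi$ with $n = 1, 2$, where $-\partial^2/\partial x^2 + b^2 x^2$ is the operator $H_0$ that appears in Sect. 4. The Python sketch below verifies this with central finite differences. The value $b = 1.7$, the sample points and the function names are illustrative assumptions.

```python
import math

def oscillator_residual(psi, eigenvalue, b, x, h=1e-3):
    """Residual of -psi'' + b^2 x^2 psi - eigenvalue * psi at x (central differences)."""
    second = (psi(x + h) - 2.0 * psi(x) + psi(x - h)) / (h * h)
    return -second + b * b * x * x * psi(x) - eigenvalue * psi(x)

b = 1.7  # assumed field strength, b > 0

ground = lambda x: math.exp(-b * x * x / 2.0)        # n = 1, candidate eigenvalue b
excited = lambda x: x * math.exp(-b * x * x / 2.0)   # n = 2, candidate eigenvalue 3b

points = [-1.5, -0.4, 0.0, 0.7, 2.0]
max_residual = max(
    max(abs(oscillator_residual(ground, b, b, x)) for x in points),
    max(abs(oscillator_residual(excited, 3 * b, b, x)) for x in points),
)
```

The residual is only of the size of the discretisation error, which is consistent with $\lambda_1 = b$ and $\lambda_2 = 3b$.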
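2. The First Gauge

A well-known suitable gauge for the spherically symmetric magnetic field is the following (cf. [MS]): Let
\[ a(r) = \frac{1}{r} \int_0^r s\, b(s)\,ds, \qquad r = \sqrt{x^2 + y^2} \]
and
\[ a_{1,\mathrm{rad}}(x,y) = -\frac{y}{r}\, a(r), \qquad a_{2,\mathrm{rad}}(x,y) = \frac{x}{r}\, a(r). \]
Then an easy computation yields
\[ \frac{\partial}{\partial x} a_{2,\mathrm{rad}}(x,y) - \frac{\partial}{\partial y} a_{1,\mathrm{rad}}(x,y) = b(r). \]
Thus we can apply [L, Theorem 1.3] to
\[ H_{\mathrm{rad}} = -\left(\frac{\partial}{\partial x} - i a_{1,\mathrm{rad}}(x,y)\right)^2 - \left(\frac{\partial}{\partial y} - i a_{2,\mathrm{rad}}(x,y)\right)^2 + V(x,y) \]
and obtain that $H$ is unitarily equivalent to $H_{\mathrm{rad}}$ (formally: $H \cong H_{\mathrm{rad}}$), in particular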
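\[ \sigma(H) = \sigma(H_{\mathrm{rad}}) \text{ and the same with } \sigma \text{ replaced by } \sigma_{\mathrm{ess}}, \sigma_{\mathrm{p}}, \sigma_{\mathrm{ac}} \text{ and } \sigma_{\mathrm{sc}}, \tag{1} \]
i.e., by the essential, pure point, absolutely continuous and singular continuous spectrum, respectively. By the standard transformation into polar coordinates (cf. [MS]) we get
\[ H_{\mathrm{rad}} \cong \bigoplus_{m \in \mathbb{Z}} H_{m,\mathrm{rad}}, \]
where
\[ H_{m,\mathrm{rad}} = -\frac{\partial^2}{\partial r^2} + \frac{m^2 - \frac{1}{4}}{r^2} + 2\,\frac{m}{r}\, a(r) + a^2(r) + v(r) \]
acts on $L^2((0, \infty))$. In particular we have
\[ \sigma(H_{\mathrm{rad}}) = \bigcup_{m \in \mathbb{Z}} \sigma(H_{m,\mathrm{rad}}) \text{ and the same with } \sigma \text{ replaced by } \sigma_{\mathrm{ac}} \text{ and } \sigma_{\mathrm{sc}}, \text{ and } \sigma_{\mathrm{p}}(H_{\mathrm{rad}}) = \bigcup_{m \in \mathbb{Z}} \sigma_{\mathrm{p}}(H_{m,\mathrm{rad}}) \tag{2} \]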
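(see [Sch, Lemma 7]; the assertion concerning $\sigma_{\mathrm{sc}}$ can be proved similarly). Now we shall determine $\sigma(H_{m,\mathrm{rad}})$:

Lemma 1. If $\beta = 0$ then, for all $m \in \mathbb{Z}$,
\[ \sigma_{\mathrm{ess}}(H_{m,\mathrm{rad}}) = \sigma_{\mathrm{ac}}(H_{m,\mathrm{rad}}) = \sigma(H_\delta) = \bigcup_{n \in \mathbb{N}} [\alpha_n, \beta_n]. \]
Further $\sigma_{\mathrm{sc}}(H_{m,\mathrm{rad}}) = \emptyset$ and $\sigma_{\mathrm{p}}(H_{m,\mathrm{rad}}) \cap (\alpha_n, \beta_n) = \emptyset$, i.e., $H_{m,\mathrm{rad}}$ has no embedded eigenvalues in the interior of the spectral bands of $H_\delta$.

Using Eqs. (1), (2) and Lemma 1, the claims (b) and (d) of Theorem 1 follow immediately and then (c) is a consequence of (a) and (b). Thus, after proving Lemma 1, it remains to show that $[\mu, \infty) \subseteq \sigma(H)$, which we will do in the third section.

Proof (of Lemma 1). We first rewrite $H_{m,\mathrm{rad}}$: By partial integration we have
\[ a(r) = \frac{1}{r}\left( r\, d(r) - \int_0^r d(s)\,ds \right) = d(r) - \frac{1}{r}\int_0^r d(s)\,ds \tag{3} \]
and, defining
\[ v_m(r) = 2m\,\frac{a(r)}{r} + 2(d(r) - \delta)\left(\delta - \frac{1}{r}\int_0^r d(s)\,ds\right) + \left(\delta - \frac{1}{r}\int_0^r d(s)\,ds\right)^2, \]
it is easy to see that
\[ H_{m,\mathrm{rad}} = -\frac{\partial^2}{\partial r^2} + \frac{m^2 - \frac{1}{4}}{r^2} + v_m(r) + (d(r) - \delta)^2 + v(r). \]
We want to apply [St, Theorem 2(b)] to $q_0(r) = (d(r) - \delta)^2 + v(r)$ (i.e., $\tau_0 = H_\delta$) and $q(r) = \frac{m^2 - \frac{1}{4}}{r^2} + v_m(r)$. This theorem states that the assertions of Lemma 1 are valid if we can show that
(i) $H_{m,\mathrm{rad}}|_{(0,1)}$ has purely discrete spectrum,
(ii) $\int_1^\infty |q(s + p) - q(s)|\,ds < \infty$,
(iii) $\lim_{r \to \infty} \int_r^{r+1} |q(s)|\,ds = 0$.

Obviously (i) is satisfied ([DSch, Theorem XIII 7.17]). Since $\int_0^p (\delta - d(s))\,ds = 0$ by the definition of $\delta$, the $p$-periodicity of $d$ yields
\[ \left|\delta - \frac{1}{r}\int_0^r d(s)\,ds\right| = \frac{1}{r}\left|\int_0^r (\delta - d(s))\,ds\right| \le \frac{1}{r} \max_{t \in [0,p)} \left|\int_0^t (\delta - d(s))\,ds\right|. \tag{4} \]
Thus it is easy to see that $a(r)$ is bounded (using Eq. (3)) and that $v_m(r) \to 0$ as $r \to \infty$. Hence (iii) is fulfilled.

To prove (ii), we first consider $\frac{a(r)}{r}$ and $(d(r) - \delta)\bigl(\delta - \frac{1}{r}\int_0^r d(s)\,ds\bigr)$:

• Using (3),
\[ \begin{aligned} \left|\frac{a(r+p)}{r+p} - \frac{a(r)}{r}\right| &= \left|\frac{1}{r+p}\left(d(r) - \frac{1}{r+p}\left(\int_0^r d(s)\,ds + p\delta\right)\right) - \frac{1}{r}\left(d(r) - \frac{1}{r}\int_0^r d(s)\,ds\right)\right| \\ &\le \frac{p}{r(r+p)}\,|d(r)| + \frac{2pr + p^2}{r(r+p)^2}\left|\frac{1}{r}\int_0^r d(s)\,ds\right| + \frac{p}{(r+p)^2}\,|\delta|. \end{aligned} \]

• Using (4) in the last step,
\[ \begin{aligned} &\left|(d(r+p) - \delta)\left(\delta - \frac{1}{r+p}\int_0^{r+p} d(s)\,ds\right) - (d(r) - \delta)\left(\delta - \frac{1}{r}\int_0^r d(s)\,ds\right)\right| \\ &\quad = \left|(d(r) - \delta)\left(\frac{1}{r}\int_0^r d(s)\,ds - \frac{1}{r+p}\left(\int_0^r d(s)\,ds + p\delta\right)\right)\right| \\ &\quad = |d(r) - \delta|\,\frac{p}{r+p}\left|\frac{1}{r}\int_0^r d(s)\,ds - \delta\right| \\ &\quad \le \frac{p}{r(r+p)}\,|d(r) - \delta| \max_{t \in [0,p)} \left|\int_0^t (\delta - d(s))\,ds\right|. \end{aligned} \]

These estimates, the boundedness of $d(r)$ and (4) yield
\[ |q(r+p) - q(r)| \le \frac{1}{r^2} \cdot \mathrm{const.} \]
and (ii) follows.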
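Identity (3) and the decay estimate (4) can be illustrated numerically. The sketch below assumes the sample field $b(s) = \cos(2\pi s)$ with $p = 1$, for which $d(s) = \sin(2\pi s)/(2\pi)$, $\delta = 0$ and $\int_0^r d(s)\,ds = (1 - \cos(2\pi r))/(4\pi^2)$. These closed forms and all names in the code are choices made for the illustration, not part of the proof.

```python
import math

def b(s):
    """Assumed sample field with period p = 1 and beta = 0."""
    return math.cos(2 * math.pi * s)

def d(s):
    """d(s) = int_0^s b = sin(2 pi s) / (2 pi) for the assumed field."""
    return math.sin(2 * math.pi * s) / (2 * math.pi)

def int_d(r):
    """int_0^r d(s) ds in closed form for the assumed field."""
    return (1.0 - math.cos(2 * math.pi * r)) / (4 * math.pi ** 2)

def a_quadrature(r, n=20000):
    """a(r) = (1/r) int_0^r s b(s) ds, evaluated by the midpoint rule."""
    h = r / n
    return sum((k + 0.5) * h * b((k + 0.5) * h) for k in range(n)) * h / r

def a_formula(r):
    """Right-hand side of Eq. (3): d(r) - (1/r) int_0^r d(s) ds."""
    return d(r) - int_d(r) / r

radii = [0.3, 2.7, 10.4]
gap = max(abs(a_quadrature(r) - a_formula(r)) for r in radii)

# Eq. (4) with delta = 0: |(1/r) int_0^r d| <= M / r,
# where M = max_t |int_0^t d(s) ds| = 1 / (2 pi^2) for this field.
M = 1.0 / (2 * math.pi ** 2)
bound_ok = all(abs(int_d(r) / r) <= M / r + 1e-15 for r in radii)
```

The $1/r$ decay in (4) is what makes $v_m$ a vanishing perturbation of the periodic potential $q_0$.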
Lemma 2. If $\beta \neq 0$ then $\sigma_{\mathrm{ess}}(H_{m,\mathrm{rad}}) = \emptyset$ for all $m \in \mathbb{Z}$.

Using Eqs. (1), (2) and Lemma 2, the assertion (a) of Theorem 2 follows immediately.

Proof (of Lemma 2). We first show that $|a(r)| \to \infty$ as $r \to \infty$: Let $r = np + t$, $t \in (0, p]$, $n \in \{0, 1, 2, \ldots\}$. Then
\[ \begin{aligned} a(r) &= \frac{1}{r}\left( \sum_{j=0}^{n-1} \int_0^p (jp + s)\, b(s)\,ds + \int_0^t (np + s)\, b(s)\,ds \right) \\ &= \frac{n(n-1)}{2r}\, p\beta + \frac{n}{r}\int_0^p s\, b(s)\,ds + \frac{np}{r}\int_0^t b(s)\,ds + \frac{1}{r}\int_0^t s\, b(s)\,ds. \end{aligned} \]
Obviously the last three summands are bounded by a constant that depends only on $b$. Since $\beta \neq 0$ and $\frac{n(n-1)}{2r} \to \infty$ for $r \to \infty$ and corresponding decompositions $r = np + t$, we have $|a(r)| \to \infty$ as $r \to \infty$. Now it is easy to see that
\[ \frac{m^2 - \frac{1}{4}}{r^2} + 2\,\frac{m}{r}\, a(r) + a^2(r) + v(r) \to \infty \qquad (r \to \infty). \]
Thus, due to [DSch, Theorems XIII 7.4, 7.16 and 7.17], the assertion follows.
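The decomposition of $a(r)$ used in this proof shows that $a(r)$ grows like $\beta r/(2p)$, since $n(n-1)p/(2r) \approx r/(2p)$ for $r = np + t$. A minimal numerical sketch of this growth follows. It assumes the sample field $b(s) = 0.5 + \cos(2\pi s)$ with $p = 1$ and $\beta = 0.5$; the field, the radii and the function names are illustrative choices.

```python
import math

P = 1.0     # assumed period
BETA = 0.5  # int_0^P b(s) ds for the assumed field below

def b(s):
    """Assumed sample field with non-zero flux per period."""
    return 0.5 + math.cos(2 * math.pi * s / P)

def a(r, n=40000):
    """a(r) = (1/r) int_0^r s b(s) ds, evaluated by the midpoint rule."""
    h = r / n
    return sum((k + 0.5) * h * b((k + 0.5) * h) for k in range(n)) * h / r

radii = [10.3, 20.3, 40.3]
ratios = [a(r) / r for r in radii]                  # expected to approach BETA / (2 P)
errors = [abs(q - BETA / (2 * P)) for q in ratios]  # deviation from the limiting slope
```

Since $a(r)$ is unbounded, the term $a^2(r)$ dominates the potential of $H_{m,\mathrm{rad}}$ for large $r$, which is the mechanism behind Lemma 2.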
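3. The Second Gauge

Let
\[ a_{\mathrm{rec}}(x,y) = -\int_0^y b\bigl(\sqrt{x^2 + s^2}\bigr)\,ds, \qquad H_{\mathrm{rec}} = -\left(\frac{\partial}{\partial x} - i a_{\mathrm{rec}}(x,y)\right)^2 - \frac{\partial^2}{\partial y^2} + V(x,y). \]
Then $-\frac{\partial}{\partial y} a_{\mathrm{rec}}(x,y) = b(r)$ and again [L, Theorem 1.3] yields
\[ H \cong H_{\mathrm{rec}}, \text{ in particular } \sigma(H) = \sigma(H_{\mathrm{rec}}) \text{ and } \sigma_{\mathrm{ess}}(H) = \sigma_{\mathrm{ess}}(H_{\mathrm{rec}}). \tag{5} \]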
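That both gauges describe the same field can be checked numerically with finite differences. The following Python sketch assumes the sample profile $b(r) = \cos r$ (period $p = 2\pi$), for which $a(r) = (r \sin r + \cos r - 1)/r$ in the first gauge. The profile, the test point and all function names are assumptions made for the illustration.

```python
import math

def b(r):
    """Assumed radial field profile, period p = 2*pi."""
    return math.cos(r)

def a_radial(r):
    """(1/r) int_0^r s cos(s) ds = (r sin r + cos r - 1) / r."""
    return (r * math.sin(r) + math.cos(r) - 1.0) / r

def a1_rad(x, y):
    r = math.hypot(x, y)
    return -y / r * a_radial(r)

def a2_rad(x, y):
    r = math.hypot(x, y)
    return x / r * a_radial(r)

def a_rec(x, y, n=2000):
    """a_rec(x, y) = -int_0^y b(sqrt(x^2 + s^2)) ds by Simpson's rule (n even)."""
    h = y / n
    total = b(math.hypot(x, 0.0)) + b(math.hypot(x, y))
    for k in range(1, n):
        total += (4 if k % 2 else 2) * b(math.hypot(x, k * h))
    return -total * h / 3.0

def curl_rad(x, y, h=1e-4):
    """d/dx a2_rad - d/dy a1_rad by central differences."""
    return ((a2_rad(x + h, y) - a2_rad(x - h, y))
            - (a1_rad(x, y + h) - a1_rad(x, y - h))) / (2 * h)

def curl_rec(x, y, h=1e-4):
    """The second gauge has no y-component, so the field is -d/dy a_rec."""
    return -(a_rec(x, y + h) - a_rec(x, y - h)) / (2 * h)

x0, y0 = 1.3, 0.8                       # arbitrary test point
target = b(math.hypot(x0, y0))
err_rad = abs(curl_rad(x0, y0) - target)
err_rec = abs(curl_rec(x0, y0) - target)
```

Both errors are of the size of the finite-difference error, in line with the two gauge computations stated in Sects. 2 and 3.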
Intuitively by this kind of definition we have $a_{\mathrm{rec}}(x,y) \sim -y \cdot b(x)$ for $|y| \ll |x|$, more precisely:

Lemma 3. Let $w : \mathbb{R} \to \mathbb{R}$ be $p$-periodic ($p > 0$), $u(x,y) = -\int_0^y w\bigl(\sqrt{x^2 + s^2}\bigr)\,ds$. Then for all $y_0$ and each continuity point $x_0$ of $w$ we have
\[ \lim_{k \to \infty} |u(x_0 + kp, y_0) + y_0\, w(x_0 + kp)| = 0. \]

Proof. For $|s| \le |y_0|$ and large integers $k$ (such that $x_0 + kp \ge \frac{kp}{2}$) we have
\[ \left|\sqrt{(x_0 + kp)^2 + s^2} - kp - x_0\right| = \frac{s^2}{\sqrt{(x_0 + kp)^2 + s^2} + kp + x_0} \le \frac{y_0^2}{kp}. \tag{6} \]
Because of the continuity of $w$ in $x_0$, for $\varepsilon > 0$ fixed there is an $\varepsilon_1 > 0$ such that $|w(x) - w(x_0)| < \varepsilon$ for all $|x - x_0| < \varepsilon_1$. Thus, for large $k$ (so that $\frac{y_0^2}{kp} < \varepsilon_1$), using the periodicity of $w$ and (6), we have
\[ \left| -w\bigl(\sqrt{(x_0 + kp)^2 + s^2}\bigr) + w(x_0 + kp) \right| = \left| w\bigl(\sqrt{(x_0 + kp)^2 + s^2} - kp\bigr) - w(x_0) \right| < \varepsilon, \]
hence
\[ |u(x_0 + kp, y_0) + y_0\, w(x_0 + kp)| = \left| \int_0^{y_0} \Bigl( -w\bigl(\sqrt{(x_0 + kp)^2 + s^2}\bigr) + w(x_0 + kp) \Bigr)\,ds \right| \le |y_0|\,\varepsilon \]
and the assertion follows.
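The convergence in Lemma 3 can be observed numerically. The sketch below assumes $w(x) = \cos x$ with $p = 2\pi$ and the point $(x_0, y_0) = (1, 2)$, and evaluates the quantity $|u(x_0 + kp, y_0) + y_0 w(x_0 + kp)|$ for increasing $k$. The choice of $w$, the point and the helper names are illustrative assumptions.

```python
import math

P = 2 * math.pi  # period of the assumed function w

def w(x):
    """Assumed continuous P-periodic function."""
    return math.cos(x)

def u(x, y, n=2000):
    """u(x, y) = -int_0^y w(sqrt(x^2 + s^2)) ds by the midpoint rule."""
    h = y / n
    return -h * sum(w(math.hypot(x, (k + 0.5) * h)) for k in range(n))

def lemma3_gap(x0, y0, k):
    """|u(x0 + kP, y0) + y0 * w(x0 + kP)|, which tends to 0 by Lemma 3."""
    x = x0 + k * P
    return abs(u(x, y0) + y0 * w(x))

gaps = [lemma3_gap(1.0, 2.0, k) for k in (1, 4, 16, 64)]
```

For this smooth $w$ the gap shrinks roughly like $1/k$, matching the bound $y_0^2/(kp)$ in (6).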
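Bearing this in mind, it is reasonable to consider
\[ \hat{H} = -\left(\frac{\partial}{\partial x} + i y\, b(x)\right)^2 - \frac{\partial^2}{\partial y^2} + v(x). \]
Indeed there is a connection between the spectra of $\hat{H}$ and $H_{\mathrm{rec}}$:

Lemma 4. $\sigma(\hat{H}) \subseteq \sigma_{\mathrm{ess}}(H_{\mathrm{rec}})$.

Proof. We use the following spectral characterisation (see, e.g., [W, Theorems 7.22 and 7.24]):

(∗) $\lambda \in \sigma(\hat{H})$ iff there exists a sequence $f_n \in C_0^\infty(\mathbb{R}^2)$, $\|f_n\|_2 = 1$, satisfying $\|\hat{H} f_n - \lambda f_n\|_2 \le \frac{1}{n}$.

(∗∗) $\lambda \in \sigma_{\mathrm{ess}}(H_{\mathrm{rec}})$ iff there exists a sequence $g_n \in C_0^\infty(\mathbb{R}^2)$, $\|g_n\|_2 = 1$, satisfying $g_n \to 0$ weakly and $\lim_{n \to \infty} \|H_{\mathrm{rec}}\, g_n - \lambda g_n\|_2 = 0$.

Now, fix $\lambda \in \sigma(\hat{H})$ and choose $f_n$ corresponding to (∗). The idea is to shift the $f_n$ to the right so that $a_{\mathrm{rec}}(x,y) \sim -y\, b(x)$ on the support of the shifted $f_n$. To specify this, fix an integer $n > 0$. Then there exists a constant $R$ such that $\operatorname{supp} f_n \subseteq [-R, R] \times [-R, R]$. Let $f_{n,k}(x,y) = f_n(x - kp, y)$ for integers $k$. Obviously $\|f_{n,k}\|_2 = \|f_n\|_2 = 1$ and because of the $p$-periodicity of $b$ and $v$,
\[ \|\hat{H} f_{n,k} - \lambda f_{n,k}\|_2 = \|\hat{H} f_n - \lambda f_n\|_2 \le \frac{1}{n}. \]
By a short simple calculation we get
\[ \begin{aligned} \|H_{\mathrm{rec}} f_{n,k} - \hat{H} f_{n,k}\|_2 &= \Bigl\| 2i \bigl(a_{\mathrm{rec}}(x,y) + y\, b(x)\bigr) \tfrac{\partial}{\partial x} f_{n,k}(x,y) \\ &\qquad + \Bigl[ i \bigl(\tfrac{\partial}{\partial x} a_{\mathrm{rec}}(x,y) + y\, b'(x)\bigr) + (a_{\mathrm{rec}}(x,y))^2 - (y\, b(x))^2 + V(x,y) - v(x) \Bigr] f_{n,k}(x,y) \Bigr\|_2 \\ &\le \bigl\| 2 \bigl(a_{\mathrm{rec}}(x + kp, y) + y\, b(x + kp)\bigr) \tfrac{\partial}{\partial x} f_n(x,y) \bigr\|_2 \\ &\quad + \bigl\| \bigl(\tfrac{\partial}{\partial x} a_{\mathrm{rec}}(x + kp, y) + y\, b'(x + kp)\bigr) f_n(x,y) \bigr\|_2 \\ &\quad + \bigl\| \bigl((a_{\mathrm{rec}}(x + kp, y))^2 - (y\, b(x + kp))^2\bigr) f_n(x,y) \bigr\|_2 \\ &\quad + \bigl\| \bigl(V(x + kp, y) - v(x + kp)\bigr) f_n(x,y) \bigr\|_2. \end{aligned} \]
We claim that the four summands tend to zero for $k \to \infty$. To show this we use the dominated convergence theorem. Due to the conditions on $b$ there is a constant $C$ such that $|b(x)| \le C$ and $|b'(x)| \le C$ a.e. on $\mathbb{R}$. Using this we have $|a_{\mathrm{rec}}(x,y)| \le |y|\, C$ a.e. and
\[ \left| \frac{\partial}{\partial x} a_{\mathrm{rec}}(x,y) \right| = \left| \int_0^y \frac{x}{\sqrt{x^2 + s^2}}\, b'\bigl(\sqrt{x^2 + s^2}\bigr)\,ds \right| \le |y|\, C. \]
Now, since $f_n$ has compact support, it is easy to see that the functions arising in the summands above are bounded a.e. by constants independent of $k$ and that we can find integrable majorants. Thus it remains to show that the integrands tend to zero pointwise a.e.: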
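– Due to Lemma 3,
\[ \lim_{k \to \infty} \bigl| 2 \bigl(a_{\mathrm{rec}}(x + kp, y) + y\, b(x + kp)\bigr) \bigr| = 0. \tag{7} \]

– We have
\[ \begin{aligned} \left| \frac{\partial}{\partial x} a_{\mathrm{rec}}(x + kp, y) + y\, b'(x + kp) \right| &\le \left| y\, b'(x + kp) - \int_0^y b'\bigl(\sqrt{(x + kp)^2 + s^2}\bigr)\,ds \right| \\ &\quad + \left| \int_0^y \left(1 - \frac{x + kp}{\sqrt{(x + kp)^2 + s^2}}\right) b'\bigl(\sqrt{(x + kp)^2 + s^2}\bigr)\,ds \right|. \end{aligned} \]
The first summand tends to zero for each continuity point $x$ of $b'$ due to Lemma 3 applied to $b'$. The second summand tends to zero, because $b'$ is bounded and, for $|s| \le |y|$ and large $k$, similarly to (6)
\[ \left| 1 - \frac{x + kp}{\sqrt{(x + kp)^2 + s^2}} \right| = \frac{1}{\sqrt{(x + kp)^2 + s^2}} \left| \sqrt{(x + kp)^2 + s^2} - (x + kp) \right| \le \frac{1}{\frac{1}{2} kp} \cdot \frac{y^2}{kp}. \]

– Because of (7) and the boundedness independent of $k$ we have
\[ \bigl| (a_{\mathrm{rec}}(x + kp, y))^2 - (y\, b(x + kp))^2 \bigr| = |a_{\mathrm{rec}}(x + kp, y) + y\, b(x + kp)| \cdot |a_{\mathrm{rec}}(x + kp, y) - y\, b(x + kp)| \to 0 \qquad (k \to \infty). \]

– Using (6) with $s = y$ and the definition of $V$ it is obvious that in each continuity point $x$ of $v$,
\[ |V(x + kp, y) - v(x + kp)| \to 0 \qquad (k \to \infty). \]

Thus we have shown $\lim_{k \to \infty} \|H_{\mathrm{rec}} f_{n,k} - \hat{H} f_{n,k}\|_2 = 0$, hence we can choose $k_n$ such that $\|H_{\mathrm{rec}} f_{n,k_n} - \hat{H} f_{n,k_n}\|_2 \le \frac{1}{n}$ and $\operatorname{supp} f_{n,k_n} \cap [-n, n]^2 = \emptyset$. Then
\[ \|H_{\mathrm{rec}} f_{n,k_n} - \lambda f_{n,k_n}\|_2 \le \|H_{\mathrm{rec}} f_{n,k_n} - \hat{H} f_{n,k_n}\|_2 + \|\hat{H} f_{n,k_n} - \lambda f_{n,k_n}\|_2 \le \frac{2}{n}, \]
and obviously $f_{n,k_n} \to 0$ weakly. Using the spectral characterisation (∗∗), we obtain $\lambda \in \sigma_{\mathrm{ess}}(H_{\mathrm{rec}})$.

To determine the spectrum of $\hat{H}$ we use partial Fourier-transformation like in [I85, §2]: Defining $(U g)(x,t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-ity} g(x,y)\,dy$ for $g \in C_0^\infty(\mathbb{R}^2)$ and continuing to a unitary isomorphism of $L^2(\mathbb{R}^2)$ we obtain in the usual way
\[ \hat{H} \cong U \hat{H} U^{-1} = -\left(\frac{\partial}{\partial x} - b(x) \frac{\partial}{\partial t}\right)^2 + t^2 + v(x) =: \tilde{H}. \]
Now, the unitary transformation of the variables $\xi = x$, $\eta = d(x) + t$, i.e.,
\[ T : L^2(\mathbb{R}^2) \to L^2(\mathbb{R}^2), \qquad T(w(x,t)) = \bigl((\xi, \eta) \mapsto w(\xi, \eta - d(\xi))\bigr), \]
yields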
\[ \tilde{H} \cong T \tilde{H} T^{-1} = -\frac{\partial^2}{\partial \xi^2} + (d(\xi) - \eta)^2 + v(\xi) =: \tilde{H}_{\mathrm{tra}}. \]
If we decompose $L^2(\mathbb{R}^2)$ as a direct integral over $L^2(\mathbb{R})$ (the two-dimensional function $(\xi, \eta) \mapsto w(\xi, \eta)$ is considered as a collection $(\xi \mapsto w(\xi, \eta))_{\eta \in \mathbb{R}}$ of one-dimensional functions) we get a related decomposition $\tilde{H}_{\mathrm{tra}} = \int_{\mathbb{R}}^{\oplus} H_\eta\,d\eta$, where
\[ H_\eta = -\frac{\partial^2}{\partial \xi^2} + (d(\xi) - \eta)^2 + v(\xi) \]
is an operator acting on $L^2(\mathbb{R})$ (cf. [RS, Sect. XIII.16] and [I85, §2]). Due to [RS, Theorem XIII.85] we have
\[ \sigma(\hat{H}) = \sigma(\tilde{H}) = \sigma(\tilde{H}_{\mathrm{tra}}) = \{\lambda : \forall \varepsilon > 0 : |\{\eta : \sigma(H_\eta) \cap (\lambda - \varepsilon, \lambda + \varepsilon) \neq \emptyset\}| > 0\}, \tag{8} \]
where $|M|$ indicates the Lebesgue measure of the set $M$.

Case $\beta = 0$: As mentioned above, in this case $d$ is $p$-periodic. It is remarkable that $H_\delta$ is exactly the operator that appeared in the second section and induced the bands in the spectrum mentioned there. Here we do not need the precise structure of the spectrum of $H_\eta$ but only the behaviour of
\[ \mu(\eta) = \inf \sigma(H_\eta). \]
Recalling the results of the second section, to prove Theorem 1 it remains to show $[\mu, \infty) \subseteq \sigma(H)$, where $\mu = \inf_{\eta \in \mathbb{R}} \mu(\eta)$. Obviously $H_\eta = H_{\eta_0} + A_{\eta,\eta_0}$, where the operator of multiplication $A_{\eta,\eta_0} = 2(\eta_0 - \eta)(d(\xi) - \eta_0) + (\eta_0 - \eta)^2$ is symmetric and bounded, satisfying $\lim_{\eta \to \eta_0} \|A_{\eta,\eta_0}\| = 0$. Thus, due to [K, Theorem V.4.10], the map $\eta \mapsto \mu(\eta)$ is continuous. Now, using (8), it is easy to see that
\[ \{\mu(\eta) : \eta \in \mathbb{R}\} \subseteq \sigma(\hat{H}). \tag{9} \]
Since $d$ and $v$ are bounded (say by the constants $d_0$ and $v_0$ respectively), for $|\eta| > d_0$ and all $\xi \in \mathbb{R}$, we have $(d(\xi) - \eta)^2 + v(\xi) \ge (|\eta| - d_0)^2 - v_0$, hence $\mu(\eta) \ge (|\eta| - d_0)^2 - v_0$. Thus $\mu(\eta) \to \infty$ for $|\eta| \to \infty$ and by (9) and the continuity of $\mu(\eta)$ we get
\[ [\mu, \infty) = \Bigl[\inf_{\eta \in \mathbb{R}} \mu(\eta), \infty\Bigr) \subseteq \sigma(\hat{H}). \]
(In fact equality holds since, if $\lambda < \inf_{\eta \in \mathbb{R}} \mu(\eta)$, Eq. (8) yields $\lambda \notin \sigma(\hat{H})$.) Now, Lemma 4 and Eq. (5) yield $[\mu, \infty) \subseteq \sigma(H)$ and the proof of Theorem 1 is complete.

Case $\beta \neq 0$: As noticed before Theorem 2, in this case $|d(\xi)| \to \infty$ for $|\xi| \to \infty$ and $H_\eta$ has compact resolvent,
\[ \sigma(H_\eta) = \{\lambda_1(\eta), \lambda_2(\eta), \ldots\}, \qquad \lambda_1(\eta) \le \lambda_2(\eta) \le \ldots \text{ and } \lambda_n(\eta) \to \infty \ (n \to \infty), \]
where the eigenvalues $\lambda_n$ are counted according to their multiplicity. The $\lambda_n(\eta)$ depend continuously (even analytically) on $\eta$ (see [I85, Lemma 2.3.(iii)]; the proof there goes through under our conditions), thus Eq. (8) yields
\[ \sigma(\hat{H}) = \bigcup_{n \in \mathbb{N}} \lambda_n(\mathbb{R}). \]
Since a shift of the independent variable is a unitary isomorphism in $L^2(\mathbb{R})$ we get
\[ \begin{aligned} H_\eta &\cong -\frac{\partial^2}{\partial \xi^2} + (d(\xi + p) - \eta)^2 + v(\xi + p) \\ &= -\frac{\partial^2}{\partial \xi^2} + (d(\xi) + \beta - \eta)^2 + v(\xi) \\ &= H_{\eta - \beta}, \end{aligned} \]
hence $\lambda_n(\eta) = \lambda_n(\eta - \beta)$, i.e., the $\lambda_n$ are $\beta$-periodic and, by Lemma 4 and Eq. (5), Theorem 2(b) is proved.

4. Examples

First let us consider a homogeneous magnetic field which is of course a spherically symmetric, radially periodic field with arbitrary period $p > 0$. Then $b(r) \equiv b = \mathrm{const} \neq 0$ and $\beta = bp \neq 0$, $d(x) = xb$ and
\[ H_\eta = -\frac{\partial^2}{\partial x^2} + (xb - \eta)^2 = -\frac{\partial^2}{\partial x^2} + b^2 \left(x - \frac{\eta}{b}\right)^2. \]
A shift in $x$ of $\frac{\eta}{b}$ gives $H_\eta \cong H_0 = -\frac{\partial^2}{\partial x^2} + b^2 x^2$ and a well-known result gives $\lambda_n(\eta) \equiv (2n - 1)b$. (The fact that the $\lambda_n(\eta)$ are constant is not surprising because Theorem 2(b) states that they are $\beta$- hence $bp$-periodic, where $p$ is arbitrary here.) Since now $a_{\mathrm{rec}}(x,y) = -yb$ we have even $H_{\mathrm{rec}} = \hat{H}$, hence the analysis of Sect. 3 gives
\[ \sigma(\hat{H}) \overset{\text{Lemma 4, (5)}}{\subseteq} \sigma_{\mathrm{ess}}(H) \subseteq \sigma(H) = \sigma(H_{\mathrm{rec}}) = \sigma(\hat{H}). \]
Thus we have $\sigma(H) = \sigma_{\mathrm{ess}}(H) = \{(2n - 1)b : n \ge 1\}$. By this result we can guess that there are also other cases where gaps in $\bigcup_{n \ge 1} \lambda_n([0, \beta))$ are spectral gaps of $\sigma(H)$.

Finally we give a class of examples for the case $\beta = 0$, satisfying
\[ \inf_{\eta \in \mathbb{R}} \inf \sigma(H_\eta) < \inf \sigma(H_\delta). \tag{10} \]
Then, as mentioned in Remark 1, the lowest part of the essential spectrum is purely dense point spectrum. Let $v \equiv 0$, $\varepsilon \in (0, \frac{1}{2})$ and $b = b_\varepsilon$ be 2-periodic consisting of peaks like in Fig. 1 below. Then $\beta = 0$ and the conditions of Theorem 1 are fulfilled. Now, $d(x)$ looks like in Fig. 2 and obviously $\delta = 0$. [E, Theorem 5.5.1] applied to the potential $d^2(x)$ yields
\[ \mu(0) \ge c - \frac{1}{16} \left( \int_0^2 |d^2(x) - c|\,dx \right)^2, \quad \text{where } c = \int_0^2 d^2(x)\,\frac{dx}{2}. \]
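[Figure 1: graph of $b(x)$; vertical axis marks $\frac{2c_0}{\varepsilon}$ and $-\frac{2c_0}{\varepsilon}$, horizontal axis marks $-\varepsilon$, $\varepsilon$, $1 - \varepsilon$, $1$, $1 + \varepsilon$, $2$.]

[Figure 2: graph of $d(x)$; vertical axis marks $c_0$ and $-c_0$, horizontal axis marks $-\varepsilon$, $\varepsilon$, $1$, $2$.]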
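The bounds of this example can be evaluated numerically for concrete parameters. The Python sketch below is an illustration under stated assumptions: the peaks of $b$ are taken to be triangles of height $\pm\frac{2c_0}{\varepsilon}$ and half-width $\varepsilon$ centred at $0$ and $1$ (the exact peak shape is not recoverable from the text), and $c_0 = 10$, $\varepsilon = 0.01$ are sample values. It computes $c$, the lower bound for $\mu(0)$ quoted above from [E, Theorem 5.5.1], and the upper bound $\frac{\pi^2}{2} + c_0^2 + \eta^2 + \sqrt{2}\,\eta \int_0^2 d(\xi) \sin(\pi\xi)\,d\xi$ that the remainder of this section derives for $\mu(\eta)$, at $\eta = -\frac{c_0}{2}$.

```python
import math

def d_profile(x, c0, eps):
    """Assumed d for triangular peaks of b: +c0 on [eps, 1-eps], -c0 on [1+eps, 2-eps]."""
    x = x % 2.0

    def ramp(t):
        # int_0^t of the triangular peak (2*c0/eps) * (1 - |s|/eps), for |t| <= eps
        u = t / eps
        return c0 * (2.0 * u - u * abs(u))

    if x <= eps:
        return ramp(x)
    if x < 1.0 - eps:
        return c0
    if x <= 1.0 + eps:
        return -ramp(x - 1.0)   # negative peak centred at x = 1
    if x < 2.0 - eps:
        return -c0
    return ramp(x - 2.0)        # left half of the next positive peak

def integrate(f, lo, hi, n=20000):
    """Composite midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

c0, eps = 10.0, 0.01            # assumed sample values: c0 large, eps small
d = lambda x: d_profile(x, c0, eps)

c = integrate(lambda x: d(x) ** 2, 0.0, 2.0) / 2.0
spread = integrate(lambda x: abs(d(x) ** 2 - c), 0.0, 2.0)
lower_mu0 = c - spread ** 2 / 16.0          # lower bound for mu(0)

eta = -c0 / 2.0
overlap = integrate(lambda x: d(x) * math.sin(math.pi * x), 0.0, 2.0)
upper_mu_eta = math.pi ** 2 / 2 + c0 ** 2 + eta ** 2 + math.sqrt(2) * eta * overlap
```

For these sample values the upper bound at $\eta = -\frac{c_0}{2}$ lies well below the lower bound at $\eta = 0$, which is the content of (10) since $\delta = 0$.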
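It is easy to see that
\[ c_0^2 \ge c \ge (2 - 4\varepsilon)\, c_0^2 \cdot \frac{1}{2} = c_0^2 - 2\varepsilon c_0^2, \]
and hence
\[ \int_0^2 |d^2(x) - c|\,dx \le 4\varepsilon c_0^2 + (2 - 4\varepsilon)(c_0^2 - c) \le 4\varepsilon c_0^2 + (2 - 4\varepsilon) \cdot 2\varepsilon c_0^2 \le 8\varepsilon c_0^2, \]
thus
\[ \mu(0) \ge c_0^2 - 2\varepsilon c_0^2 - 4\varepsilon^2 c_0^4. \tag{11} \]
Due to [E, (2.2.10), p. 23], we have $\mu(\eta) \le \int_0^2 \bigl( |f'(\xi)|^2 + (d(\xi) - \eta)^2 |f(\xi)|^2 \bigr)\,d\xi$ for every 2-periodic $C^\infty$-function $f$ satisfying $\int_0^2 |f(\xi)|^2\,d\xi = 1$. Choosing $f(\xi) = \frac{1}{2} - \frac{1}{\sqrt{2}} \sin(\pi\xi)$ and, using $\int_0^2 |f|^2 = 1$ and the symmetries of $d$ and $\sin$, we get
\[ \begin{aligned} \mu(\eta) &\le \int_0^2 \left( \frac{\pi^2}{2} \cos^2(\pi\xi) + \bigl(c_0^2 - 2\eta\, d(\xi) + \eta^2\bigr) |f(\xi)|^2 \right) d\xi \\ &= \frac{\pi^2}{2} + c_0^2 + \eta^2 - 2\eta \int_0^2 d(\xi) \left( \frac{1}{4} - \frac{1}{\sqrt{2}} \sin(\pi\xi) + \frac{1}{2} \sin^2(\pi\xi) \right) d\xi \\ &= \frac{\pi^2}{2} + c_0^2 + \eta^2 + \sqrt{2}\,\eta \int_0^2 d(\xi) \sin(\pi\xi)\,d\xi. \end{aligned} \]
Obviously, for $\varepsilon < \frac{1}{4}$,
\[ \int_0^2 d(\xi) \sin(\pi\xi)\,d\xi \ge 2 \int_{1/4}^{3/4} d(\xi) \sin(\pi\xi)\,d\xi \ge 2 \int_{1/4}^{3/4} c_0\, \frac{1}{\sqrt{2}}\,d\xi = \frac{c_0}{\sqrt{2}}. \]
Thus, for $\eta < 0$, we have $\mu(\eta) \le \frac{\pi^2}{2} + c_0^2 + \eta^2 + c_0 \eta$, and choosing $\eta = -\frac{c_0}{2}$ we get for large $c_0$
\[ \mu\bigl(-\tfrac{c_0}{2}\bigr) \le c_0^2 - \tfrac{1}{4} c_0^2 + \frac{\pi^2}{2} \overset{c_0 \text{ large}}{<} c_0^2 - \tfrac{1}{5} c_0^2 - 4 \le c_0^2 - 2\varepsilon c_0^2 - 4\varepsilon^2 c_0^4 \overset{(11)}{\le} \mu(0), \]
and hence (10) holds.

Acknowledgement. It is a pleasure to thank Prof. Dr. H. Kalf for stimulating discussions.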
References

[DSch] Dunford, N., Schwartz, J.T.: Linear Operators, II: Spectral Theory. New York: Interscience, 1963
[E] Eastham, M.S.P.: The spectral theory of periodic differential equations. Edinburgh: Scottish Academic Press, 1973
[I85] Iwatsuka, A.: Examples of absolutely continuous Schrödinger operators in magnetic fields. Publ. Res. Inst. Math. Sci. 21, 385–401 (1985)
[I90] Iwatsuka, A.: On Schrödinger operators with magnetic fields. In: Fujita, H., Ikebe, T., Kuroda, S.T. (eds.) Functional-analytic methods for partial differential equations. Proceedings, Tokyo 1989, Lect. Notes Math. 1450, 157–172 (1990)
[HHK] Hempel, R., Hinz, A.M., Kalf, H.: On the essential spectrum of Schrödinger operators with spherically symmetric potentials. Math. Ann. 277, 197–208 (1987)
[HHHK] Hempel, R., Herbst, I., Hinz, A.M., Kalf, H.: Intervals of dense point spectrum for spherically symmetric Schrödinger operators of the type −Δ + cos |x|. J. Lond. Math. Soc., II. Ser. 43, 295–304 (1991)
[K] Kato, T.: Perturbation Theory for Linear Operators. Berlin-Heidelberg-New York: Springer, 1976
[L] Leinfelder, H.: Gauge invariance of Schrödinger operators and related spectral properties. J. Oper. Theory 9, 163–179 (1983)
[LS] Leinfelder, H., Simader, C.G.: Schrödinger operators with singular magnetic vector potentials. Math. Z. 176, 1–19 (1981)
[MS] Miller, K., Simon, B.: Quantum magnetic Hamiltonians with remarkable spectral properties. Phys. Rev. Lett. 44, 1706–1707 (1980)
[RS] Reed, M., Simon, B.: Methods of Modern Mathematical Physics, IV: Analysis of Operators. New York: Academic Press, 1978
[Sch] Schmidt, K.M.: Dense point spectrum and absolutely continuous spectrum in spherically symmetric Dirac operators. Forum Math. 7, 459–475 (1995)
[St] Stolz, G.: On the absolutely continuous spectrum of perturbed periodic Sturm-Liouville operators. J. Reine Angew. Math. 416, 1–23 (1991)
[W] Weidmann, J.: Linear Operators in Hilbert Spaces. Berlin-Heidelberg-New York: Springer, 1980

Communicated by B. Simon